linux-block.git
11 months agoselftests/bpf: Fix compiler warnings reported in -O2 mode
Andrii Nakryiko [Fri, 6 Oct 2023 17:57:42 +0000 (10:57 -0700)]
selftests/bpf: Fix compiler warnings reported in -O2 mode

Fix a bunch of potentially unitialized variable usage warnings that are
reported by GCC in -O2 mode. Also silence overzealous stringop-truncation
class of warnings.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20231006175744.3136675-1-andrii@kernel.org
11 months agobpf: Inherit system settings for CPU security mitigations
Yafang Shao [Thu, 5 Oct 2023 08:41:23 +0000 (08:41 +0000)]
bpf: Inherit system settings for CPU security mitigations

Currently, there exists a system-wide setting related to CPU security
mitigations, denoted as 'mitigations='. When set to 'mitigations=off', it
deactivates all optional CPU mitigations. Therefore, if we implement a
system-wide 'mitigations=off' setting, it should inherently bypass Spectre
v1 and Spectre v4 in the BPF subsystem.

Please note that there is also a more specific 'nospectre_v1' setting on
x86 and ppc architectures, though it is not currently exported. For the
time being, let's disregard more fine-grained options.

This idea emerged during our discussion about potential Spectre v1 attacks
with Luis [0].

  [0] https://lore.kernel.org/bpf/b4fc15f7-b204-767e-ebb9-fdb4233961fb@iogearbox.net

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <song@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Cc: Luis Gerhorst <gerhorst@cs.fau.de>
Link: https://lore.kernel.org/bpf/20231005084123.1338-1-laoar.shao@gmail.com
11 months agobpf: Fix the comment for bpf_restore_data_end()
Akihiko Odaki [Thu, 5 Oct 2023 07:21:36 +0000 (16:21 +0900)]
bpf: Fix the comment for bpf_restore_data_end()

The comment used to say:
> Restore data saved by bpf_compute_data_pointers().

But bpf_compute_data_pointers() does not save the data;
bpf_compute_and_save_data_end() does.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20231005072137.29870-1-akihiko.odaki@daynix.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
11 months agoselftests/bpf: Enable CONFIG_VSOCKETS in config
Geliang Tang [Thu, 5 Oct 2023 07:21:51 +0000 (15:21 +0800)]
selftests/bpf: Enable CONFIG_VSOCKETS in config

CONFIG_VSOCKETS is required by BPF selftests, otherwise we get errors
like this:

    ./test_progs:socket_loopback_reuseport:386: socket:
Address family not supported by protocol
    socket_loopback_reuseport:FAIL:386
    ./test_progs:vsock_unix_redir_connectible:1496:
vsock_socketpair_connectible() failed
    vsock_unix_redir_connectible:FAIL:1496

So this patch enables it in tools/testing/selftests/bpf/config.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Link: https://lore.kernel.org/r/472e73d285db2ea59aca9bbb95eb5d4048327588.1696490003.git.geliang.tang@suse.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
11 months agoMerge branch 'selftest/bpf, riscv: Improved cross-building support'
Andrii Nakryiko [Wed, 4 Oct 2023 20:35:07 +0000 (13:35 -0700)]
Merge branch 'selftest/bpf, riscv: Improved cross-building support'

Björn Töpel says:

====================
From: Björn Töpel <bjorn@rivosinc.com>

Yet another "more cross-building support for RISC-V" series.

An example how to invoke a gen_tar build:

  | make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- CC=riscv64-linux-gnu-gcc \
  |    HOSTCC=gcc O=/workspace/kbuild FORMAT= \
  |    SKIP_TARGETS="arm64 ia64 powerpc sparc64 x86 sgx" -j $(($(nproc)-1)) \
  |    -C tools/testing/selftests gen_tar

Björn
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
11 months agoselftests/bpf: Add uprobe_multi to gen_tar target
Björn Töpel [Wed, 4 Oct 2023 12:27:21 +0000 (14:27 +0200)]
selftests/bpf: Add uprobe_multi to gen_tar target

The uprobe_multi program was not picked up for the gen_tar target. Fix
by adding it to TEST_GEN_FILES.

Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20231004122721.54525-4-bjorn@kernel.org
11 months agoselftests/bpf: Enable lld usage for RISC-V
Björn Töpel [Wed, 4 Oct 2023 12:27:20 +0000 (14:27 +0200)]
selftests/bpf: Enable lld usage for RISC-V

RISC-V has proper lld support. Use that, similar to what x86 does, for
urandom_read et al.

Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231004122721.54525-3-bjorn@kernel.org
11 months agoselftests/bpf: Add cross-build support for urandom_read et al
Björn Töpel [Wed, 4 Oct 2023 12:27:19 +0000 (14:27 +0200)]
selftests/bpf: Add cross-build support for urandom_read et al

Some userland programs in the BPF test suite, e.g. urandom_read, is
missing cross-build support. Add cross-build support for these
programs

Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231004122721.54525-2-bjorn@kernel.org
11 months agoMerge branch 'libbpf/selftests syscall wrapper fixes for RISC-V'
Andrii Nakryiko [Wed, 4 Oct 2023 20:18:49 +0000 (13:18 -0700)]
Merge branch 'libbpf/selftests syscall wrapper fixes for RISC-V'

Björn Töpel says:

====================
From: Björn Töpel <bjorn@rivosinc.com>

Commit 08d0ce30e0e4 ("riscv: Implement syscall wrappers") introduced
some regressions in libbpf, and the kselftests BPF suite, which are
fixed with these three patches.

Note that there's an outstanding fix [1] for ftrace syscall tracing
which is also a fallout from the commit above.

Björn

[1] https://lore.kernel.org/linux-riscv/20231003182407.32198-1-alexghiti@rivosinc.com/

Alexandre Ghiti (1):
  libbpf: Fix syscall access arguments on riscv
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
11 months agoselftests/bpf: Define SYS_NANOSLEEP_KPROBE_NAME for riscv
Björn Töpel [Wed, 4 Oct 2023 11:09:05 +0000 (13:09 +0200)]
selftests/bpf: Define SYS_NANOSLEEP_KPROBE_NAME for riscv

Add missing sys_nanosleep name for RISC-V, which is used by some tests
(e.g. attach_probe).

Fixes: 08d0ce30e0e4 ("riscv: Implement syscall wrappers")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/bpf/20231004110905.49024-4-bjorn@kernel.org
11 months agoselftests/bpf: Define SYS_PREFIX for riscv
Björn Töpel [Wed, 4 Oct 2023 11:09:04 +0000 (13:09 +0200)]
selftests/bpf: Define SYS_PREFIX for riscv

SYS_PREFIX was missing for a RISC-V, which made a couple of kprobe
tests fail.

Add missing SYS_PREFIX for RISC-V.

Fixes: 08d0ce30e0e4 ("riscv: Implement syscall wrappers")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/bpf/20231004110905.49024-3-bjorn@kernel.org
11 months agolibbpf: Fix syscall access arguments on riscv
Alexandre Ghiti [Wed, 4 Oct 2023 11:09:03 +0000 (13:09 +0200)]
libbpf: Fix syscall access arguments on riscv

Since commit 08d0ce30e0e4 ("riscv: Implement syscall wrappers"), riscv
selects ARCH_HAS_SYSCALL_WRAPPER so let's use the generic implementation
of PT_REGS_SYSCALL_REGS().

Fixes: 08d0ce30e0e4 ("riscv: Implement syscall wrappers")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/bpf/20231004110905.49024-2-bjorn@kernel.org
11 months agoMerge branch 'bpf-xsk-sh-umem'
Daniel Borkmann [Wed, 4 Oct 2023 13:26:02 +0000 (15:26 +0200)]
Merge branch 'bpf-xsk-sh-umem'

Tushar Vyavahare says:

====================
Implement a test for the SHARED_UMEM feature in this patch set and make
necessary changes/improvements. Ensure that the framework now supports
different streams for different sockets.

v2->v3:
- Set the sock_num at the end of the while loop.
- Declare xsk at the top of the while loop.

v1->v2:
- Remove generate_mac_addresses() and generate mac addresses based on
  the number of sockets in __test_spec_init() function. [Magnus]
- Update Makefile to include find_bit.c for compiling xskxceiver.
- Add bitmap_full() function to verify all bits are set to break the
  while loop in the receive_pkts() and send_pkts() functions.
- Replace __test_and_set_bit() function with __set_bit() function.
- Add single return check for wait_for_tx_completion() function call.

Patch series summary:

1: Move the packet stream from the ifobject struct to the xsk_socket_info
   struct to enable the use of different streams for different sockets
   This will facilitate the sending and receiving of data from multiple
   sockets simultaneously using the SHARED_XDP_UMEM feature.

   It gives flexibility of send/recive individual traffic on particular
   socket.

2: Rename the header file to a generic name so that it can be used by all
   future XDP programs.

3: Move the src_mac and dst_mac fields from the ifobject structure to the
   xsk_socket_info structure to achieve per-socket MAC address assignment.
   Require this in order to steer traffic to various sockets in subsequent
   patches.

4: Improve the receive_pkt() function to enable it to receive packets from
   multiple sockets. Define a sock_num variable to iterate through all the
   sockets in the Rx path. Add nb_valid_entries to check that all the
   expected number of packets are received.

5: The pkt_set() function no longer needs the umem parameter. This commit
   removes the umem parameter from the pkt_set() function.

6: Iterate over all the sockets in the send pkts function. Update
   send_pkts() to handle multiple sockets for sending packets. Multiple TX
   sockets are utilized alternately based on the batch size for improve
   packet transmission.

7: Modify xsk_update_xskmap() to accept the index as an argument, enabling
   the addition of multiple sockets to xskmap.

8: Add a new test for testing shared umem feature. This is accomplished by
   adding a new XDP program and using the multiple sockets. The new XDP
   program redirects the packets based on the destination MAC address.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
11 months agoselftests/xsk: Add a test for shared umem feature
Tushar Vyavahare [Wed, 27 Sep 2023 13:52:41 +0000 (19:22 +0530)]
selftests/xsk: Add a test for shared umem feature

Add a new test for testing shared umem feature. This is accomplished by
adding a new XDP program and using the multiple sockets.

The new XDP program redirects the packets based on the destination MAC
address.

Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20230927135241.2287547-9-tushar.vyavahare@intel.com
11 months agoselftests/xsk: Modify xsk_update_xskmap() to accept the index as an argument
Tushar Vyavahare [Wed, 27 Sep 2023 13:52:40 +0000 (19:22 +0530)]
selftests/xsk: Modify xsk_update_xskmap() to accept the index as an argument

Modify xsk_update_xskmap() to accept the index as an argument, enabling
the addition of multiple sockets to xskmap.

Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20230927135241.2287547-8-tushar.vyavahare@intel.com
11 months agoselftests/xsk: Iterate over all the sockets in the send pkts function
Tushar Vyavahare [Wed, 27 Sep 2023 13:52:39 +0000 (19:22 +0530)]
selftests/xsk: Iterate over all the sockets in the send pkts function

Update send_pkts() to handle multiple sockets for sending packets.
Multiple TX sockets are utilized alternately based on the batch size for
improve packet transmission.

Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20230927135241.2287547-7-tushar.vyavahare@intel.com
11 months agoselftests/xsk: Remove unnecessary parameter from pkt_set() function call
Tushar Vyavahare [Wed, 27 Sep 2023 13:52:38 +0000 (19:22 +0530)]
selftests/xsk: Remove unnecessary parameter from pkt_set() function call

The pkt_set() function no longer needs the umem parameter. This commit
removes the umem parameter from the pkt_set() function.

Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20230927135241.2287547-6-tushar.vyavahare@intel.com
11 months agoselftests/xsk: Iterate over all the sockets in the receive pkts function
Tushar Vyavahare [Wed, 27 Sep 2023 13:52:37 +0000 (19:22 +0530)]
selftests/xsk: Iterate over all the sockets in the receive pkts function

Improve the receive_pkt() function to enable it to receive packets from
multiple sockets. Define a sock_num variable to iterate through all the
sockets in the Rx path. Add nb_valid_entries to check that all the
expected number of packets are received.

Revise the function __receive_pkts() to only inspect the receive ring
once, handle any received packets, and promptly return. Implement a bitmap
to store the value of number of sockets. Update Makefile to include
find_bit.c for compiling xskxceiver.

Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20230927135241.2287547-5-tushar.vyavahare@intel.com
11 months agoselftests/xsk: Move src_mac and dst_mac to the xsk_socket_info
Tushar Vyavahare [Wed, 27 Sep 2023 13:52:36 +0000 (19:22 +0530)]
selftests/xsk: Move src_mac and dst_mac to the xsk_socket_info

Move the src_mac and dst_mac fields from the ifobject structure to the
xsk_socket_info structure to achieve per-socket MAC address assignment.

Require this in order to steer traffic to various sockets in subsequent
patches.

Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20230927135241.2287547-4-tushar.vyavahare@intel.com
11 months agoselftests/xsk: Rename xsk_xdp_metadata.h to xsk_xdp_common.h
Tushar Vyavahare [Wed, 27 Sep 2023 13:52:35 +0000 (19:22 +0530)]
selftests/xsk: Rename xsk_xdp_metadata.h to xsk_xdp_common.h

Rename the header file to a generic name so that it can be used by all
future XDP programs. Ensure that the xsk_xdp_common.h header file includes
include guards.

Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20230927135241.2287547-3-tushar.vyavahare@intel.com
11 months agoselftests/xsk: Move pkt_stream to the xsk_socket_info
Tushar Vyavahare [Wed, 27 Sep 2023 13:52:34 +0000 (19:22 +0530)]
selftests/xsk: Move pkt_stream to the xsk_socket_info

Move the packet stream from the ifobject struct to the xsk_socket_info
struct to enable the use of different streams for different sockets. This
will facilitate the sending and receiving of data from multiple sockets
simultaneously using the SHARED_XDP_UMEM feature.

Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20230927135241.2287547-2-tushar.vyavahare@intel.com
11 months agolibbpf: Allow Golang symbols in uprobe secdef
Hengqi Chen [Fri, 29 Sep 2023 15:59:54 +0000 (15:59 +0000)]
libbpf: Allow Golang symbols in uprobe secdef

Golang symbols in ELF files are different from C/C++
which contains special characters like '*', '(' and ')'.
With generics, things get more complicated, there are
symbols like:

  github.com/cilium/ebpf/internal.(*Deque[go.shape.interface { Format(fmt.State, int32); TypeName() string;github.com/cilium/ebpf/btf.copy() github.com/cilium/ebpf/btf.Type}]).Grow

Matching such symbols using `%m[^\n]` in sscanf, this
excludes newline which typically does not appear in ELF
symbols. This should work in most use-cases and also
work for unicode letters in identifiers. If newline do
show up in ELF symbols, users can still attach to such
symbol by specifying bpf_uprobe_opts::func_name.

A working example can be found at this repo ([0]).

  [0]: https://github.com/chenhengqi/libbpf-go-symbols

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230929155954.92448-1-hengqi.chen@gmail.com
11 months agosamples/bpf: Add -fsanitize=bounds to userspace programs
Ruowen Qin [Wed, 27 Sep 2023 04:50:30 +0000 (23:50 -0500)]
samples/bpf: Add -fsanitize=bounds to userspace programs

The sanitizer flag, which is supported by both clang and gcc, would make
it easier to debug array index out-of-bounds problems in these programs.

Make the Makfile smarter to detect ubsan support from the compiler and
add the '-fsanitize=bounds' accordingly.

Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Jinghao Jia <jinghao@linux.ibm.com>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Ruowen Qin <ruowenq2@illinois.edu>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230927045030.224548-2-ruowenq2@illinois.edu
11 months agoMerge branch 'bpf: Add missed stats for kprobes'
Andrii Nakryiko [Mon, 25 Sep 2023 23:37:45 +0000 (16:37 -0700)]
Merge branch 'bpf: Add missed stats for kprobes'

Jiri Olsa says:

====================
hi,
at the moment we can't retrieve the number of missed kprobe
executions and subsequent execution of BPF programs.

This patchset adds:
  - counting of missed execution on attach layer for:
    . kprobes attached through perf link (kprobe/ftrace)
    . kprobes attached through kprobe.multi link (fprobe)
  - counting of recursion_misses for BPF kprobe programs

It's still technically possible to create kprobe without perf link (using
SET_BPF perf ioctl) in which case we don't have a way to retrieve the kprobe's
'missed' count. However both libbpf and cilium/ebpf libraries use perf link
if it's available, and for old kernels without perf link support we can use
BPF program to retrieve the kprobe missed count.

v3 changes:
  - added acks [Song]
  - make test_missed not serial [Andrii]

Also available at:
  https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  bpf/missed_stats

thanks,
jirka
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
11 months agoselftests/bpf: Add test for recursion counts of perf event link tracepoint
Jiri Olsa [Wed, 20 Sep 2023 21:31:45 +0000 (23:31 +0200)]
selftests/bpf: Add test for recursion counts of perf event link tracepoint

Adding selftest that puts kprobe on bpf_fentry_test1 that calls bpf_printk
and invokes bpf_trace_printk tracepoint. The bpf_trace_printk tracepoint
has test[234] programs attached to it.

Because kprobe execution goes through bpf_prog_active check, programs
attached to the tracepoint will fail the recursion check and increment the
recursion_misses stats.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Song Liu <song@kernel.org>
Reviewed-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20230920213145.1941596-10-jolsa@kernel.org
11 months agoselftests/bpf: Add test for recursion counts of perf event link kprobe
Jiri Olsa [Wed, 20 Sep 2023 21:31:44 +0000 (23:31 +0200)]
selftests/bpf: Add test for recursion counts of perf event link kprobe

Adding selftest that puts kprobe.multi on bpf_fentry_test1 that
calls bpf_kfunc_common_test kfunc which has 3 perf event kprobes
and 1 kprobe.multi attached.

Because fprobe (kprobe.multi attach layear) does not have strict
recursion check the kprobe's bpf_prog_active check is hit for test2-5.

Disabling this test for arm64, because there's no fprobe support yet.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Song Liu <song@kernel.org>
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230920213145.1941596-9-jolsa@kernel.org
11 months agoselftests/bpf: Add test for missed counts of perf event link kprobe
Jiri Olsa [Wed, 20 Sep 2023 21:31:43 +0000 (23:31 +0200)]
selftests/bpf: Add test for missed counts of perf event link kprobe

Adding test that puts kprobe on bpf_fentry_test1 that calls
bpf_kfunc_common_test kfunc, which has also kprobe on.

The latter won't get triggered due to kprobe recursion check
and kprobe missed counter is incremented.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230920213145.1941596-8-jolsa@kernel.org
11 months agobpftool: Display missed count for kprobe perf link
Jiri Olsa [Wed, 20 Sep 2023 21:31:42 +0000 (23:31 +0200)]
bpftool: Display missed count for kprobe perf link

Adding 'missed' field to display missed counts for kprobes
attached by perf event link, like:

  # bpftool link
  5: perf_event  prog 82
          kprobe ffffffff815203e0 ksys_write
  6: perf_event  prog 83
          kprobe ffffffff811d1e50 scheduler_tick  missed 682217

  # bpftool link -jp
  [{
          "id": 5,
          "type": "perf_event",
          "prog_id": 82,
          "retprobe": false,
          "addr": 18446744071584220128,
          "func": "ksys_write",
          "offset": 0,
          "missed": 0
      },{
          "id": 6,
          "type": "perf_event",
          "prog_id": 83,
          "retprobe": false,
          "addr": 18446744071580753488,
          "func": "scheduler_tick",
          "offset": 0,
          "missed": 693469
      }
  ]

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20230920213145.1941596-7-jolsa@kernel.org
11 months agobpftool: Display missed count for kprobe_multi link
Jiri Olsa [Wed, 20 Sep 2023 21:31:41 +0000 (23:31 +0200)]
bpftool: Display missed count for kprobe_multi link

Adding 'missed' field to display missed counts for kprobes
attached by kprobe multi link, like:

  # bpftool link
  5: kprobe_multi  prog 76
          kprobe.multi  func_cnt 1  missed 1
          addr             func [module]
          ffffffffa039c030 fp3_test [fprobe_test]

  # bpftool link -jp
  [{
          "id": 5,
          "type": "kprobe_multi",
          "prog_id": 76,
          "retprobe": false,
          "func_cnt": 1,
          "missed": 1,
          "funcs": [{
                  "addr": 18446744072102723632,
                  "func": "fp3_test",
                  "module": "fprobe_test"
              }
          ]
      }
  ]

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20230920213145.1941596-6-jolsa@kernel.org
11 months agobpf: Count missed stats in trace_call_bpf
Jiri Olsa [Wed, 20 Sep 2023 21:31:40 +0000 (23:31 +0200)]
bpf: Count missed stats in trace_call_bpf

Increase misses stats in case bpf array execution is skipped
because of recursion check in trace_call_bpf.

Adding bpf_prog_inc_misses_counters that increase misses
counts for all bpf programs in bpf_prog_array.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Song Liu <song@kernel.org>
Reviewed-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20230920213145.1941596-5-jolsa@kernel.org
11 months agobpf: Add missed value to kprobe perf link info
Jiri Olsa [Wed, 20 Sep 2023 21:31:39 +0000 (23:31 +0200)]
bpf: Add missed value to kprobe perf link info

Add missed value to kprobe attached through perf link info to
hold the stats of missed kprobe handler execution.

The kprobe's missed counter gets incremented when kprobe handler
is not executed due to another kprobe running on the same cpu.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230920213145.1941596-4-jolsa@kernel.org
11 months agobpf: Add missed value to kprobe_multi link info
Jiri Olsa [Wed, 20 Sep 2023 21:31:38 +0000 (23:31 +0200)]
bpf: Add missed value to kprobe_multi link info

Add missed value to kprobe_multi link info to hold the stats of missed
kprobe_multi probe.

The missed counter gets incremented when fprobe fails the recursion
check or there's no rethook available for return probe. In either
case the attached bpf program is not executed.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Song Liu <song@kernel.org>
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230920213145.1941596-3-jolsa@kernel.org
11 months agobpf: Count stats for kprobe_multi programs
Jiri Olsa [Wed, 20 Sep 2023 21:31:37 +0000 (23:31 +0200)]
bpf: Count stats for kprobe_multi programs

Adding support to gather missed stats for kprobe_multi
programs due to bpf_prog_active protection.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Song Liu <song@kernel.org>
Reviewed-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20230920213145.1941596-2-jolsa@kernel.org
11 months agoMerge branch 'add libbpf getters for individual ringbuffers'
Andrii Nakryiko [Mon, 25 Sep 2023 23:22:43 +0000 (16:22 -0700)]
Merge branch 'add libbpf getters for individual ringbuffers'

Martin Kelly says:

====================
This patch series adds a new ring__ API to libbpf exposing getters for
accessing the individual ringbuffers inside a struct ring_buffer. This is
useful for polling individually, getting available data, or similar use
cases. The API looks like this, and was roughly proposed by Andrii Nakryiko
in another thread:

Getting a ring struct:
struct ring *ring_buffer__ring(struct ring_buffer *rb, unsigned int idx);

Using the ring struct:
unsigned long ring__consumer_pos(const struct ring *r);
unsigned long ring__producer_pos(const struct ring *r);
size_t ring__avail_data_size(const struct ring *r);
size_t ring__size(const struct ring *r);
int ring__map_fd(const struct ring *r);
int ring__consume(struct ring *r);

Changes in v2:
- Addressed all feedback from Andrii Nakryiko
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
11 months agoselftests/bpf: Add tests for ring__consume
Martin Kelly [Mon, 25 Sep 2023 21:50:45 +0000 (14:50 -0700)]
selftests/bpf: Add tests for ring__consume

Add tests for new API ring__consume.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-15-martin.kelly@crowdstrike.com
11 months agolibbpf: Add ring__consume
Martin Kelly [Mon, 25 Sep 2023 21:50:44 +0000 (14:50 -0700)]
libbpf: Add ring__consume

Add ring__consume to consume a single ringbuffer, analogous to
ring_buffer__consume.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-14-martin.kelly@crowdstrike.com
11 months agoselftests/bpf: Add tests for ring__map_fd
Martin Kelly [Mon, 25 Sep 2023 21:50:43 +0000 (14:50 -0700)]
selftests/bpf: Add tests for ring__map_fd

Add tests for the new API ring__map_fd.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-13-martin.kelly@crowdstrike.com
11 months agolibbpf: Add ring__map_fd
Martin Kelly [Mon, 25 Sep 2023 21:50:42 +0000 (14:50 -0700)]
libbpf: Add ring__map_fd

Add ring__map_fd to get the file descriptor underlying a given
ringbuffer.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-12-martin.kelly@crowdstrike.com
11 months agoselftests/bpf: Add tests for ring__size
Martin Kelly [Mon, 25 Sep 2023 21:50:41 +0000 (14:50 -0700)]
selftests/bpf: Add tests for ring__size

Add tests for the new API ring__size.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-11-martin.kelly@crowdstrike.com
11 months agolibbpf: Add ring__size
Martin Kelly [Mon, 25 Sep 2023 21:50:40 +0000 (14:50 -0700)]
libbpf: Add ring__size

Add ring__size to get the total size of a given ringbuffer.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-10-martin.kelly@crowdstrike.com
11 months agoselftests/bpf: Add tests for ring__avail_data_size
Martin Kelly [Mon, 25 Sep 2023 21:50:39 +0000 (14:50 -0700)]
selftests/bpf: Add tests for ring__avail_data_size

Add test for the new API ring__avail_data_size.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-9-martin.kelly@crowdstrike.com
11 months agolibbpf: Add ring__avail_data_size
Martin Kelly [Mon, 25 Sep 2023 21:50:38 +0000 (14:50 -0700)]
libbpf: Add ring__avail_data_size

Add ring__avail_data_size for querying the currently available data in
the ringbuffer, similar to the BPF_RB_AVAIL_DATA flag in
bpf_ringbuf_query. This is racy during ongoing operations but is still
useful for overall information on how a ringbuffer is behaving.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-8-martin.kelly@crowdstrike.com
11 months agoselftests/bpf: Add tests for ring__*_pos
Martin Kelly [Mon, 25 Sep 2023 21:50:37 +0000 (14:50 -0700)]
selftests/bpf: Add tests for ring__*_pos

Add tests for the new APIs ring__producer_pos and ring__consumer_pos.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-7-martin.kelly@crowdstrike.com
11 months agolibbpf: Add ring__producer_pos, ring__consumer_pos
Martin Kelly [Mon, 25 Sep 2023 21:50:36 +0000 (14:50 -0700)]
libbpf: Add ring__producer_pos, ring__consumer_pos

Add APIs to get the producer and consumer position for a given
ringbuffer.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-6-martin.kelly@crowdstrike.com
11 months agoselftests/bpf: Add tests for ring_buffer__ring
Martin Kelly [Mon, 25 Sep 2023 21:50:35 +0000 (14:50 -0700)]
selftests/bpf: Add tests for ring_buffer__ring

Add tests for the new API ring_buffer__ring.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-5-martin.kelly@crowdstrike.com
11 months agolibbpf: Add ring_buffer__ring
Martin Kelly [Mon, 25 Sep 2023 21:50:34 +0000 (14:50 -0700)]
libbpf: Add ring_buffer__ring

Add a new function ring_buffer__ring, which exposes struct ring * to the
user, representing a single ringbuffer.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-4-martin.kelly@crowdstrike.com
11 months agolibbpf: Switch rings to array of pointers
Martin Kelly [Mon, 25 Sep 2023 21:50:33 +0000 (14:50 -0700)]
libbpf: Switch rings to array of pointers

Switch rb->rings to be an array of pointers instead of a contiguous
block. This allows for each ring pointer to be stable after
ring_buffer__add is called, which allows us to expose struct ring * to
the user without gotchas. Without this change, the realloc in
ring_buffer__add could invalidate a struct ring *, making it unsafe to
give to the user.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-3-martin.kelly@crowdstrike.com
11 months agolibbpf: Refactor cleanup in ring_buffer__add
Martin Kelly [Mon, 25 Sep 2023 21:50:32 +0000 (14:50 -0700)]
libbpf: Refactor cleanup in ring_buffer__add

Refactor the cleanup code in ring_buffer__add to use a unified err_out
label. This reduces code duplication, as well as plugging a potential
leak if mmap_sz != (__u64)(size_t)mmap_sz (currently this would miss
unmapping tmp because ringbuf_unmap_ring isn't called).

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230925215045.2375758-2-martin.kelly@crowdstrike.com
11 months agoMerge branch 'libbpf: Support symbol versioning for uprobe'
Andrii Nakryiko [Fri, 22 Sep 2023 21:18:56 +0000 (14:18 -0700)]
Merge branch 'libbpf: Support symbol versioning for uprobe'

Hengqi Chen says:

====================
Dynamic symbols in shared library may have the same name, for example:

    $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
    000000000009b1a0 T __pthread_rwlock_wrlock@GLIBC_2.2.5
    000000000009b1a0 T pthread_rwlock_wrlock@@GLIBC_2.34
    000000000009b1a0 T pthread_rwlock_wrlock@GLIBC_2.2.5

    $ readelf -W --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
      706: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 __pthread_rwlock_wrlock@GLIBC_2.2.5
      2568: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@@GLIBC_2.34
      2571: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@GLIBC_2.2.5

There are two pthread_rwlock_wrlock symbols in libc.so .dynsym section.
The one with @@ is the default version, the other is hidden.
Note that the version info is stored in .gnu.version and .gnu.version_d
sections of libc and the two symbols are at the _same_ offset.

Currently, specify `pthread_rwlock_wrlock`, `pthread_rwlock_wrlock@@GLIBC_2.34`
or `pthread_rwlock_wrlock@GLIBC_2.2.5` in bpf_uprobe_opts::func_name won't work.
Because there are two `pthread_rwlock_wrlock` in .dynsym sections without the
version suffix and both are global bind.

We could solve this by introducing symbol versioning ([0]). So that users can
specify func, func@LIB_VERSION or func@@LIB_VERSION to attach a uprobe.

This patchset resolves symbol conflicts and add symbol versioning for uprobe.
  - Patch 1 resolves symbol conflicts at the same offset
  - Patch 2 adds symbol versioning for dynsym
  - Patch 3 adds selftests for the above changes

Changes from v3:
  - Address comments from Andrii

Changes from v2:
  - Add uretprobe selfttest (Alan)
  - Check symbol exact match (Alan)
  - Fix typo (Jiri)

Changes from v1:
  - Address comments from Alan and Jiri
  - Add selftests (Someone reminds me that there is an attempt at [1]
    and part of the selftest code from Andrii is taken from there)

  [0]: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/symversion.html
  [1]: https://lore.kernel.org/lkml/CAEf4BzZTrjjyyOm3ak9JsssPSh6T_ZmGd677a2rt5e5rBLUrpQ@mail.gmail.com/
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
11 months agoselftests/bpf: Add tests for symbol versioning for uprobe
Hengqi Chen [Mon, 18 Sep 2023 02:48:13 +0000 (02:48 +0000)]
selftests/bpf: Add tests for symbol versioning for uprobe

This exercises the newly added dynsym symbol versioning logics.
Now we accept symbols in form of func, func@LIB_VERSION or
func@@LIB_VERSION.

The test rely on liburandom_read.so. For liburandom_read.so, we have:

    $ nm -D liburandom_read.so
                     w __cxa_finalize@GLIBC_2.17
                     w __gmon_start__
                     w _ITM_deregisterTMCloneTable
                     w _ITM_registerTMCloneTable
    0000000000000000 A LIBURANDOM_READ_1.0.0
    0000000000000000 A LIBURANDOM_READ_2.0.0
    000000000000081c T urandlib_api@@LIBURANDOM_READ_2.0.0
    0000000000000814 T urandlib_api@LIBURANDOM_READ_1.0.0
    0000000000000824 T urandlib_api_sameoffset@LIBURANDOM_READ_1.0.0
    0000000000000824 T urandlib_api_sameoffset@@LIBURANDOM_READ_2.0.0
    000000000000082c T urandlib_read_without_sema@@LIBURANDOM_READ_1.0.0
    00000000000007c4 T urandlib_read_with_sema@@LIBURANDOM_READ_1.0.0
    0000000000011018 D urandlib_read_with_sema_semaphore@@LIBURANDOM_READ_1.0.0

For `urandlib_api`, specifying `urandlib_api` will cause a conflict because
there are two symbols named urandlib_api and both are global bind.
For `urandlib_api_sameoffset`, there are also two symbols in the .so, but
both are at the same offset and essentially they refer to the same function
so no conflict.

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230918024813.237475-4-hengqi.chen@gmail.com
11 months agolibbpf: Support symbol versioning for uprobe
Hengqi Chen [Mon, 18 Sep 2023 02:48:12 +0000 (02:48 +0000)]
libbpf: Support symbol versioning for uprobe

In current implementation, we assume that symbol found in .dynsym section
would have a version suffix and use it to compare with symbol user supplied.
According to the spec ([0]), this assumption is incorrect, the version info
of dynamic symbols are stored in .gnu.version and .gnu.version_d sections
of ELF objects. For example:

    $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
    000000000009b1a0 T __pthread_rwlock_wrlock@GLIBC_2.2.5
    000000000009b1a0 T pthread_rwlock_wrlock@@GLIBC_2.34
    000000000009b1a0 T pthread_rwlock_wrlock@GLIBC_2.2.5

    $ readelf -W --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
      706: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 __pthread_rwlock_wrlock@GLIBC_2.2.5
      2568: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@@GLIBC_2.34
      2571: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@GLIBC_2.2.5

In this case, specify pthread_rwlock_wrlock@@GLIBC_2.34 or
pthread_rwlock_wrlock@GLIBC_2.2.5 in bpf_uprobe_opts::func_name won't work.
Because the qualified name does NOT match `pthread_rwlock_wrlock` (without
version suffix) in .dynsym sections.

This commit implements the symbol versioning for dynsym and allows user to
specify symbol in the following forms:
  - func
  - func@LIB_VERSION
  - func@@LIB_VERSION

In case of symbol conflicts, error out and users should resolve it by
specifying a qualified name.

  [0]: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/symversion.html

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230918024813.237475-3-hengqi.chen@gmail.com
11 months agolibbpf: Resolve symbol conflicts at the same offset for uprobe
Hengqi Chen [Mon, 18 Sep 2023 02:48:11 +0000 (02:48 +0000)]
libbpf: Resolve symbol conflicts at the same offset for uprobe

Dynamic symbols in shared library may have the same name, for example:

    $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
    000000000009b1a0 T __pthread_rwlock_wrlock@GLIBC_2.2.5
    000000000009b1a0 T pthread_rwlock_wrlock@@GLIBC_2.34
    000000000009b1a0 T pthread_rwlock_wrlock@GLIBC_2.2.5

    $ readelf -W --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | grep rwlock_wrlock
     706: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 __pthread_rwlock_wrlock@GLIBC_2.2.5
    2568: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@@GLIBC_2.34
    2571: 000000000009b1a0   878 FUNC    GLOBAL DEFAULT   15 pthread_rwlock_wrlock@GLIBC_2.2.5

Currently, users can't attach a uprobe to pthread_rwlock_wrlock because
there are two symbols named pthread_rwlock_wrlock and both are global
bind. And libbpf considers it as a conflict.

Since both of them are at the same offset we could accept one of them
harmlessly. Note that we already does this in elf_resolve_syms_offsets.

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230918024813.237475-2-hengqi.chen@gmail.com
11 months agobpf, docs: Add loongarch64 as arch supporting BPF JIT
Tiezhu Yang [Tue, 19 Sep 2023 08:25:37 +0000 (16:25 +0800)]
bpf, docs: Add loongarch64 as arch supporting BPF JIT

As BPF JIT support for loongarch64 was added about one year ago
with commit 5dc615520c4d ("LoongArch: Add BPF JIT support"), it
is appropriate to add loongarch64 as arch supporting BPF JIT in
bpf and sysctl docs as well.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/1695111937-19697-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
11 months agosamples/bpf: syscall_tp_user: Fix array out-of-bound access
Jinghao Jia [Sun, 17 Sep 2023 21:42:20 +0000 (16:42 -0500)]
samples/bpf: syscall_tp_user: Fix array out-of-bound access

Commit 06744f24696e ("samples/bpf: Add openat2() enter/exit tracepoint
to syscall_tp sample") added two more eBPF programs to support the
openat2() syscall. However, it did not increase the size of the array
that holds the corresponding bpf_links. This leads to an out-of-bound
access on that array in the bpf_object__for_each_program loop and could
corrupt other variables on the stack. On our testing QEMU, it corrupts
the map1_fds array and causes the sample to fail:

  # ./syscall_tp
  prog #0: map ids 4 5
  verify map:4 val: 5
  map_lookup failed: Bad file descriptor

Dynamically allocate the array based on the number of programs reported
by libbpf to prevent similar inconsistencies in the future

Fixes: 06744f24696e ("samples/bpf: Add openat2() enter/exit tracepoint to syscall_tp sample")
Signed-off-by: Jinghao Jia <jinghao@linux.ibm.com>
Signed-off-by: Ruowen Qin <ruowenq2@illinois.edu>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Link: https://lore.kernel.org/r/20230917214220.637721-4-jinghao7@illinois.edu
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
11 months agosamples/bpf: syscall_tp_user: Rename num_progs into nr_tests
Jinghao Jia [Sun, 17 Sep 2023 21:42:19 +0000 (16:42 -0500)]
samples/bpf: syscall_tp_user: Rename num_progs into nr_tests

The variable name num_progs causes confusion because that variable
really controls the number of rounds the test should be executed.

Rename num_progs into nr_tests for the sake of clarity.

Signed-off-by: Jinghao Jia <jinghao@linux.ibm.com>
Signed-off-by: Ruowen Qin <ruowenq2@illinois.edu>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Link: https://lore.kernel.org/r/20230917214220.637721-3-jinghao7@illinois.edu
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
11 months agoMerge branch 'implement-cpuv4-support-for-s390x'
Alexei Starovoitov [Thu, 21 Sep 2023 21:22:01 +0000 (14:22 -0700)]
Merge branch 'implement-cpuv4-support-for-s390x'

Ilya Leoshkevich says:

====================
Implement cpuv4 support for s390x

v1: https://lore.kernel.org/bpf/20230830011128.1415752-1-iii@linux.ibm.com/
v1 -> v2:
- Redo Disable zero-extension for BPF_MEMSX as Puranjay and Alexei
  suggested.
- Drop the bpf_ct_insert_entry() patch, it went in via the bpf tree.
- Rebase, don't apply A-bs because there were fixed conflicts.

Hi,

This series adds the cpuv4 support to the s390x eBPF JIT.
Patches 1-3 are preliminary bugfixes.
Patches 4-8 implement the new instructions.
Patches 9-10 enable the tests.

Best regards,
Ilya
====================

Link: https://lore.kernel.org/r/20230919101336.2223655-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
11 months agoselftests/bpf: Trim DENYLIST.s390x
Ilya Leoshkevich [Tue, 19 Sep 2023 10:09:12 +0000 (12:09 +0200)]
selftests/bpf: Trim DENYLIST.s390x

Enable all selftests, except the 2 that have to do with the userspace
unwinding, and the new exceptions test, in the s390x CI.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230919101336.2223655-11-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
11 months agoselftests/bpf: Enable the cpuv4 tests for s390x
Ilya Leoshkevich [Tue, 19 Sep 2023 10:09:11 +0000 (12:09 +0200)]
selftests/bpf: Enable the cpuv4 tests for s390x

Now that all the cpuv4 support is in place, enable the tests.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230919101336.2223655-10-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
11 months agos390/bpf: Implement signed division
Ilya Leoshkevich [Tue, 19 Sep 2023 10:09:10 +0000 (12:09 +0200)]
s390/bpf: Implement signed division

Implement the cpuv4 signed division. It is encoded as unsigned
division, but with off field set to 1. s390x has the necessary
instructions: dsgfr, dsgf and dsgr.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230919101336.2223655-9-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
11 months agos390/bpf: Implement unconditional jump with 32-bit offset
Ilya Leoshkevich [Tue, 19 Sep 2023 10:09:09 +0000 (12:09 +0200)]
s390/bpf: Implement unconditional jump with 32-bit offset

Implement the cpuv4 unconditional jump with 32-bit offset, which is
encoded as BPF_JMP32 | BPF_JA and stores the offset in the imm field.
Reuse the existing BPF_JMP | BPF_JA logic.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230919101336.2223655-8-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
11 months agos390/bpf: Implement unconditional byte swap
Ilya Leoshkevich [Tue, 19 Sep 2023 10:09:08 +0000 (12:09 +0200)]
s390/bpf: Implement unconditional byte swap

Implement the cpuv4 unconditional byte swap, which is encoded as
BPF_ALU64 | BPF_END | BPF_FROM_LE. Since s390x is big-endian, it's
the same as the existing BPF_ALU | BPF_END | BPF_FROM_LE.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230919101336.2223655-7-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
11 months agos390/bpf: Implement BPF_MEMSX
Ilya Leoshkevich [Tue, 19 Sep 2023 10:09:07 +0000 (12:09 +0200)]
s390/bpf: Implement BPF_MEMSX

Implement the cpuv4 load with sign-extension, which is encoded as
BPF_MEMSX (and, for internal uses cases only, BPF_PROBE_MEMSX).

This is the same as BPF_MEM and BPF_PROBE_MEM, but with sign
extension instead of zero extension, and s390x has the necessary
instructions: lgb, lgh and lgf.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230919101336.2223655-6-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
11 months agos390/bpf: Implement BPF_MOV | BPF_X with sign-extension
Ilya Leoshkevich [Tue, 19 Sep 2023 10:09:06 +0000 (12:09 +0200)]
s390/bpf: Implement BPF_MOV | BPF_X with sign-extension

Implement the cpuv4 register-to-register move with sign extension. It
is distinguished from the normal moves by non-zero values in
insn->off, which determine the source size. s390x has instructions to
deal with all of them: lbr, lhr, lgbr, lghr and lgfr.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230919101336.2223655-5-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
11 months agoselftests/bpf: Add big-endian support to the ldsx test
Ilya Leoshkevich [Tue, 19 Sep 2023 10:09:05 +0000 (12:09 +0200)]
selftests/bpf: Add big-endian support to the ldsx test

Prepare the ldsx test to run on big-endian systems by adding the
necessary endianness checks around narrow memory accesses.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230919101336.2223655-4-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
11 months agoselftests/bpf: Unmount the cgroup2 work directory
Ilya Leoshkevich [Tue, 19 Sep 2023 10:09:04 +0000 (12:09 +0200)]
selftests/bpf: Unmount the cgroup2 work directory

test_progs -t bind_perm,bpf_obj_pinning/mounted-str-rel fails when
the selftests directory is mounted under /mnt, which is a reasonable
thing to do when sharing the selftests residing on the host with a
virtual machine, e.g., using 9p.

The reason is that cgroup2 is mounted at /mnt and not unmounted,
causing subsequent tests that need to access the selftests directory
to fail.

Fix by unmounting it. The kernel maintains a mount stack, so this
reveals what was mounted there before. Introduce cgroup_workdir_mounted
in order to maintain idempotency. Make it thread-local in order to
support test_progs -j.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230919101336.2223655-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
11 months agobpf: Disable zero-extension for BPF_MEMSX
Ilya Leoshkevich [Tue, 19 Sep 2023 10:09:03 +0000 (12:09 +0200)]
bpf: Disable zero-extension for BPF_MEMSX

On the architectures that use bpf_jit_needs_zext(), e.g., s390x, the
verifier incorrectly inserts a zero-extension after BPF_MEMSX, leading
to miscompilations like the one below:

      24:       89 1a ff fe 00 00 00 00 "r1 = *(s16 *)(r10 - 2);"       # zext_dst set
   0x3ff7fdb910e:       lgh     %r2,-2(%r13,%r0)                        # load halfword
   0x3ff7fdb9114:       llgfr   %r2,%r2                                 # wrong!
      25:       65 10 00 03 00 00 7f ff if r1 s> 32767 goto +3 <l0_1>   # check_cond_jmp_op()

Disable such zero-extensions. The JITs need to insert sign-extension
themselves, if necessary.

Suggested-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Link: https://lore.kernel.org/r/20230919101336.2223655-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
11 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Paolo Abeni [Thu, 21 Sep 2023 19:49:45 +0000 (21:49 +0200)]
Merge git://git./linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

No conflicts.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoMerge tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 21 Sep 2023 18:28:16 +0000 (11:28 -0700)]
Merge tag 'net-6.6-rc3' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter and bpf.

  Current release - regressions:

   - bpf: adjust size_index according to the value of KMALLOC_MIN_SIZE

   - netfilter: fix entries val in rule reset audit log

   - eth: stmmac: fix incorrect rxq|txq_stats reference

  Previous releases - regressions:

   - ipv4: fix null-deref in ipv4_link_failure

   - netfilter:
      - fix several GC related issues
      - fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP

   - eth: team: fix null-ptr-deref when team device type is changed

   - eth: i40e: fix VF VLAN offloading when port VLAN is configured

   - eth: ionic: fix 16bit math issue when PAGE_SIZE >= 64KB

  Previous releases - always broken:

   - core: fix ETH_P_1588 flow dissector

   - mptcp: fix several connection hang-up conditions

   - bpf:
      - avoid deadlock when using queue and stack maps from NMI
      - add override check to kprobe multi link attach

   - hsr: properly parse HSRv1 supervisor frames.

   - eth: igc: fix infinite initialization loop with early XDP redirect

   - eth: octeon_ep: fix tx dma unmap len values in SG

   - eth: hns3: fix GRE checksum offload issue"

* tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
  sfc: handle error pointers returned by rhashtable_lookup_get_insert_fast()
  igc: Expose tx-usecs coalesce setting to user
  octeontx2-pf: Do xdp_do_flush() after redirects.
  bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI
  net: ena: Flush XDP packets on error.
  net/handshake: Fix memory leak in __sock_create() and sock_alloc_file()
  net: hinic: Fix warning-hinic_set_vlan_fliter() warn: variable dereferenced before check 'hwdev'
  netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
  netfilter: nf_tables: fix memleak when more than 255 elements expired
  netfilter: nf_tables: disable toggling dormant table state more than once
  vxlan: Add missing entries to vxlan_get_size()
  net: rds: Fix possible NULL-pointer dereference
  team: fix null-ptr-deref when team device type is changed
  net: bridge: use DEV_STATS_INC()
  net: hns3: add 5ms delay before clear firmware reset irq source
  net: hns3: fix fail to delete tc flower rules during reset issue
  net: hns3: only enable unicast promisc when mac table full
  net: hns3: fix GRE checksum offload issue
  net: hns3: add cmdq check for vf periodic service task
  net: stmmac: fix incorrect rxq|txq_stats reference
  ...

11 months agoMerge tag 'v6.6-rc3.vfs.ctime.revert' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 21 Sep 2023 17:15:26 +0000 (10:15 -0700)]
Merge tag 'v6.6-rc3.vfs.ctime.revert' of git://git./linux/kernel/git/vfs/vfs

Pull finegrained timestamp reverts from Christian Brauner:
 "Earlier this week we sent a few minor fixes for the multi-grained
  timestamp work in [1]. While we were polishing those up after Linus
  realized that there might be a nicer way to fix them we received a
  regression report in [2] that fine grained timestamps break gnulib
  tests and thus possibly other tools.

  The kernel will elide fine-grain timestamp updates when no one is
  actively querying for them to avoid performance impacts. So a sequence
  like write(f1) stat(f2) write(f2) stat(f2) write(f1) stat(f1) may
  result in timestamp f1 to be older than the final f2 timestamp even
  though f1 was last written too but the second write didn't update the
  timestamp.

  Such plotholes can lead to subtle bugs when programs compare
  timestamps. For example, the nap() function in [2] will estimate that
  it needs to wait one ns on a fine-grain timestamp enabled filesytem
  between subsequent calls to observe a timestamp change. But in general
  we don't update timestamps with more than one jiffie if we think that
  no one is actively querying for fine-grain timestamps to avoid
  performance impacts.

  While discussing various fixes the decision was to go back to the
  drawing board and ultimately to explore a solution that involves only
  exposing such fine-grained timestamps to nfs internally and never to
  userspace.

  As there are multiple solutions discussed the honest thing to do here
  is not to fix this up or disable it but to cleanly revert. The general
  infrastructure will probably come back but there is no reason to keep
  this code in mainline.

  The general changes to timestamp handling are valid and a good cleanup
  that will stay. The revert is fully bisectable"

Link: https://lore.kernel.org/all/20230918-hirte-neuzugang-4c2324e7bae3@brauner
Link: https://lore.kernel.org/all/bf0524debb976627693e12ad23690094e4514303.camel@linuxfromscratch.org
* tag 'v6.6-rc3.vfs.ctime.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  Revert "fs: add infrastructure for multigrain timestamps"
  Revert "btrfs: convert to multigrain timestamps"
  Revert "ext4: switch to multigrain timestamps"
  Revert "xfs: switch to multigrain timestamps"
  Revert "tmpfs: add support for multigrain timestamps"

11 months agoMerge tag 'powerpc-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Thu, 21 Sep 2023 15:39:24 +0000 (08:39 -0700)]
Merge tag 'powerpc-6.6-2' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - A fix for breakpoint handling which was using get_user() while atomic

 - Fix the Power10 HASHCHK handler which was using get_user() while
   atomic

 - A few build fixes for issues caused by recent changes

Thanks to Benjamin Gray, Christophe Leroy, Kajol Jain, and Naveen N Rao.

* tag 'powerpc-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/dexcr: Move HASHCHK trap handler
  powerpc/82xx: Select FSL_SOC
  powerpc: Fix build issue with LD_DEAD_CODE_DATA_ELIMINATION and FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
  powerpc/watchpoints: Annotate atomic context in more places
  powerpc/watchpoint: Disable pagefaults when getting user instruction
  powerpc/watchpoints: Disable preemption in thread_change_pc()
  powerpc/perf/hv-24x7: Update domain value check

11 months agoMerge tag 'for-linus-6.6a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 21 Sep 2023 15:27:42 +0000 (08:27 -0700)]
Merge tag 'for-linus-6.6a-rc3-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - remove some unused functions in the Xen event channel handling

 - fix a regression (introduced during the merge window) when booting as
   Xen PV guest

 - small cleanup removing another strncpy() instance

* tag 'for-linus-6.6a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/efi: refactor deprecated strncpy
  x86/xen: allow nesting of same lazy mode
  x86/xen: move paravirt lazy code
  arm/xen: remove lazy mode related definitions
  xen: simplify evtchn_do_upcall() call maze

11 months agoMerge tag 'fixes-2023-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt...
Linus Torvalds [Thu, 21 Sep 2023 15:21:23 +0000 (08:21 -0700)]
Merge tag 'fixes-2023-09-21' of git://git./linux/kernel/git/rppt/memblock

Pull memblock test fixes from Mike Rapoport:
 "Fix several compilation errors and warnings in memblock tests"

* tag 'fixes-2023-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  memblock tests: fix warning ‘struct seq_file’ declared inside parameter list
  memblock tests: fix warning: "__ALIGN_KERNEL" redefined
  memblock tests: Fix compilation errors.

11 months agoMerge tag 'sound-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Thu, 21 Sep 2023 15:13:15 +0000 (08:13 -0700)]
Merge tag 'sound-6.6-rc3' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A large collection of fixes around this time.

  All small and mostly trivial fixes.

   - Lots of fixes for the new -Wformat-truncation warnings

   - A fix in ALSA rawmidi core regression and UMP handling

   - Series of Cirrus codec fixes

   - ASoC Intel and Realtek codec fixes

   - Usual HD- and USB-audio quirks and AMD ASoC quirks"

* tag 'sound-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (64 commits)
  ALSA: hda/realtek - ALC287 Realtek I2S speaker platform support
  ALSA: hda: cs35l56: Use the new RUNTIME_PM_OPS() macro
  ALSA: usb-audio: scarlett_gen2: Fix another -Wformat-truncation warning
  ALSA: rawmidi: Fix NULL dereference at proc read
  ASoC: SOF: core: Only call sof_ops_free() on remove if the probe was successful
  ASoC: SOF: Intel: MTL: Reduce the DSP init timeout
  ASoC: cs42l43: Add shared IRQ flag for shutters
  ASoC: imx-audmix: Fix return error with devm_clk_get()
  ASoC: hdaudio.c: Add missing check for devm_kstrdup
  ALSA: riptide: Fix -Wformat-truncation warning for longname string
  ALSA: cs4231: Fix -Wformat-truncation warning for longname string
  ALSA: ad1848: Fix -Wformat-truncation warning for longname string
  ALSA: hda: generic: Check potential mixer name string truncation
  ALSA: cmipci: Fix -Wformat-truncation warning
  ALSA: firewire: Fix -Wformat-truncation warning for MIDI stream names
  ALSA: firewire: Fix -Wformat-truncation warning for longname string
  ALSA: xen: Fix -Wformat-truncation warning
  ALSA: opti9x: Fix -Wformat-truncation warning
  ALSA: es1688: Fix -Wformat-truncation warning
  ALSA: cs4236: Fix -Wformat-truncation warning
  ...

11 months agoMerge tag 'hwmon-for-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groec...
Linus Torvalds [Thu, 21 Sep 2023 15:10:47 +0000 (08:10 -0700)]
Merge tag 'hwmon-for-v6.6-rc3' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
 "One patch to drop a non-existent alarm attribute in the nct6775 driver"

* tag 'hwmon-for-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (nct6775) Fix non-existent ALARM warning

11 months agonet: dsa: sja1105: make read-only const arrays static
Colin Ian King [Tue, 19 Sep 2023 09:36:06 +0000 (10:36 +0100)]
net: dsa: sja1105: make read-only const arrays static

Don't populate read-only const arrays on the stack, instead make them
static.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230919093606.24446-1-colin.i.king@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonetdev: Remove unneeded semicolon
Yang Li [Tue, 19 Sep 2023 01:03:05 +0000 (09:03 +0800)]
netdev: Remove unneeded semicolon

./drivers/dpll/dpll_netlink.c:847:3-4: Unneeded semicolon

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6605
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309190540.RFwfIgO7-lkp@intel.com/
Link: https://lore.kernel.org/r/20230919010305.120991-1-yang.lee@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoMerge branch 'vsock-virtio-vhost-msg_zerocopy-preparations'
Paolo Abeni [Thu, 21 Sep 2023 10:34:02 +0000 (12:34 +0200)]
Merge branch 'vsock-virtio-vhost-msg_zerocopy-preparations'

Arseniy Krasnov says:

====================
vsock/virtio/vhost: MSG_ZEROCOPY preparations

this patchset is first of three parts of another big patchset for
MSG_ZEROCOPY flag support:
https://lore.kernel.org/netdev/20230701063947.3422088-1-AVKrasnov@sberdevices.ru/

During review of this series, Stefano Garzarella <sgarzare@redhat.com>
suggested to split it for three parts to simplify review and merging:

1) virtio and vhost updates (for fragged skbs) <--- this patchset
2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read
   tx completions) and update for Documentation/.
3) Updates for tests and utils.

This series enables handling of fragged skbs in virtio and vhost parts.
Newly logic won't be triggered, because SO_ZEROCOPY options is still
impossible to enable at this moment (next bunch of patches from big
set above will enable it).
====================

Link: https://lore.kernel.org/r/20230916130918.4105122-1-avkrasnov@salutedevices.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agovsock/virtio: MSG_ZEROCOPY flag support
Arseniy Krasnov [Sat, 16 Sep 2023 13:09:18 +0000 (16:09 +0300)]
vsock/virtio: MSG_ZEROCOPY flag support

This adds handling of MSG_ZEROCOPY flag on transmission path:

1) If this flag is set and zerocopy transmission is possible (enabled
   in socket options and transport allows zerocopy), then non-linear
   skb will be created and filled with the pages of user's buffer.
   Pages of user's buffer are locked in memory by 'get_user_pages()'.
2) Replaces way of skb owning: instead of 'skb_set_owner_sk_safe()' it
   calls 'skb_set_owner_w()'. Reason of this change is that
   '__zerocopy_sg_from_iter()' increments 'sk_wmem_alloc' of socket, so
   to decrease this field correctly, proper skb destructor is needed:
   'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'.
3) Adds new callback to 'struct virtio_transport': 'can_msgzerocopy'.
   If this callback is set, then transport needs extra check to be able
   to send provided number of buffers in zerocopy mode. Currently, the
   only transport that needs this callback set is virtio, because this
   transport adds new buffers to the virtio queue and we need to check,
   that number of these buffers is less than size of the queue (it is
   required by virtio spec). vhost and loopback transports don't need
   this check.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agovsock/virtio: non-linear skb handling for tap
Arseniy Krasnov [Sat, 16 Sep 2023 13:09:17 +0000 (16:09 +0300)]
vsock/virtio: non-linear skb handling for tap

For tap device new skb is created and data from the current skb is
copied to it. This adds copying data from non-linear skb to new
the skb.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agovsock/virtio: support to send non-linear skb
Arseniy Krasnov [Sat, 16 Sep 2023 13:09:16 +0000 (16:09 +0300)]
vsock/virtio: support to send non-linear skb

For non-linear skb use its pages from fragment array as buffers in
virtio tx queue. These pages are already pinned by 'get_user_pages()'
during such skb creation.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agovsock/virtio/vhost: read data from non-linear skb
Arseniy Krasnov [Sat, 16 Sep 2023 13:09:15 +0000 (16:09 +0300)]
vsock/virtio/vhost: read data from non-linear skb

This is preparation patch for MSG_ZEROCOPY support. It adds handling of
non-linear skbs by replacing direct calls of 'memcpy_to_msg()' with
'skb_copy_datagram_iter()'. Main advantage of the second one is that it
can handle paged part of the skb by using 'kmap()' on each page, but if
there are no pages in the skb, it behaves like simple copying to iov
iterator. This patch also adds new field to the control block of skb -
this value shows current offset in the skb to read next portion of data
(it doesn't matter linear it or not). Idea behind this field is that
'skb_copy_datagram_iter()' handles both types of skb internally - it
just needs an offset from which to copy data from the given skb. This
offset is incremented on each read from skb. This approach allows to
simplify handling of both linear and non-linear skbs, because for
linear skb we need to call 'skb_pull()' after reading data from it,
while in non-linear case we need to update 'data_len'.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoMerge tag 'nf-23-09-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Paolo Abeni [Thu, 21 Sep 2023 09:09:44 +0000 (11:09 +0200)]
Merge tag 'nf-23-09-20' of https://git./linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter updates for net

The following three patches fix regressions in the netfilter subsystem:

1. Reject attempts to repeatedly toggle the 'dormant' flag in a single
   transaction.  Doing so makes nf_tables lose track of the real state
   vs. the desired state.  This ends with an attempt to unregister hooks
   that were never registered in the first place, which yields a splat.

2. Fix element counting in the new nftables garbage collection infra
   that came with 6.5:  More than 255 expired elements wraps a counter
   which results in memory leak.

3. Since 6.4 ipset can BUG when a set is renamed while a CREATE command
   is in progress, fix from Jozsef Kadlecsik.

* tag 'nf-23-09-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
  netfilter: nf_tables: fix memleak when more than 255 elements expired
  netfilter: nf_tables: disable toggling dormant table state more than once
====================

Link: https://lore.kernel.org/r/20230920084156.4192-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agosfc: handle error pointers returned by rhashtable_lookup_get_insert_fast()
Edward Cree [Tue, 19 Sep 2023 18:39:49 +0000 (19:39 +0100)]
sfc: handle error pointers returned by rhashtable_lookup_get_insert_fast()

Several places in TC offload code assumed that the return from
 rhashtable_lookup_get_insert_fast() was always either NULL or a valid
 pointer to an existing entry, but in fact that function can return an
 error pointer.  In that case, perform the usual cleanup of the newly
 created entry, then pass up the error, rather than attempting to take a
 reference on the old entry.

Fixes: d902e1a737d4 ("sfc: bare bones TC offload on EF100")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20230919183949.59392-1-edward.cree@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoigc: Expose tx-usecs coalesce setting to user
Muhammad Husaini Zulkifli [Tue, 19 Sep 2023 17:03:31 +0000 (10:03 -0700)]
igc: Expose tx-usecs coalesce setting to user

When users attempt to obtain the coalesce setting using the
ethtool command, current code always returns 0 for tx-usecs.
This is because I225/6 always uses a queue pair setting, hence
tx_coalesce_usecs does not return a value during the
igc_ethtool_get_coalesce() callback process. The pair queue
condition checking in igc_ethtool_get_coalesce() is removed by
this patch so that the user gets information of the value of tx-usecs.

Even if i225/6 is using queue pair setting, there is no harm in
notifying the user of the tx-usecs. The implementation of the current
code may have previously been a copy of the legacy code i210.
Since I225 has the queue pair setting enabled, tx-usecs will always adhere
to the user-set rx-usecs value. An error message will appear when the user
attempts to set the tx-usecs value for the input parameters because,
by default, they should only set the rx-usecs value.

This patch also adds the helper function to get the
previous rx coalesce value similar to tx coalesce.

How to test:
User can get the coalesce value using ethtool command.

Example command:
Get: ethtool -c <interface>

Previous output:

rx-usecs: 3
rx-frames: n/a
rx-usecs-irq: n/a
rx-frames-irq: n/a

tx-usecs: 0
tx-frames: n/a
tx-usecs-irq: n/a
tx-frames-irq: n/a

New output:

rx-usecs: 3
rx-frames: n/a
rx-usecs-irq: n/a
rx-frames-irq: n/a

tx-usecs: 3
tx-frames: n/a
tx-usecs-irq: n/a
tx-frames-irq: n/a

Fixes: 8c5ad0dae93c ("igc: Add ethtool support")
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230919170331.1581031-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoMerge branch 'add-missing-xdp_do_flush-invocations'
Paolo Abeni [Thu, 21 Sep 2023 07:01:44 +0000 (09:01 +0200)]
Merge branch 'add-missing-xdp_do_flush-invocations'

Sebastian Andrzej Siewior says:

====================
Add missing xdp_do_flush() invocations.

I've been looking at the drivers/ XDP users and noticed that some
XDP_REDIRECT user don't invoke xdp_do_flush() at the end.
====================

Link: https://lore.kernel.org/r/20230918153611.165722-1-bigeasy@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoocteontx2-pf: Do xdp_do_flush() after redirects.
Sebastian Andrzej Siewior [Mon, 18 Sep 2023 15:36:11 +0000 (17:36 +0200)]
octeontx2-pf: Do xdp_do_flush() after redirects.

xdp_do_flush() should be invoked before leaving the NAPI poll function
if XDP-redirect has been performed.

Invoke xdp_do_flush() before leaving NAPI.

Cc: Geetha sowjanya <gakula@marvell.com>
Cc: Subbaraya Sundeep <sbhatta@marvell.com>
Cc: Sunil Goutham <sgoutham@marvell.com>
Cc: hariprasad <hkelam@marvell.com>
Fixes: 06059a1a9a4a5 ("octeontx2-pf: Add XDP support to netdev PF")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Geethasowjanya Akula <gakula@marvell.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agobnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI
Sebastian Andrzej Siewior [Mon, 18 Sep 2023 15:36:10 +0000 (17:36 +0200)]
bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI

bnxt_poll_nitroa0() invokes bnxt_rx_pkt() which can run a XDP program
which in turn can return XDP_REDIRECT. bnxt_rx_pkt() is also used by
__bnxt_poll_work() which flushes (xdp_do_flush()) the packets after each
round. bnxt_poll_nitroa0() lacks this feature.
xdp_do_flush() should be invoked before leaving the NAPI callback.

Invoke xdp_do_flush() after a redirect in bnxt_poll_nitroa0() NAPI.

Cc: Michael Chan <michael.chan@broadcom.com>
Fixes: f18c2b77b2e4e ("bnxt_en: optimized XDP_REDIRECT support")
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: ena: Flush XDP packets on error.
Sebastian Andrzej Siewior [Mon, 18 Sep 2023 15:36:09 +0000 (17:36 +0200)]
net: ena: Flush XDP packets on error.

xdp_do_flush() should be invoked before leaving the NAPI poll function
after a XDP-redirect. This is not the case if the driver leaves via
the error path (after having a redirect in one of its previous
iterations).

Invoke xdp_do_flush() also in the error path.

Cc: Arthur Kiyanovski <akiyano@amazon.com>
Cc: David Arinzon <darinzon@amazon.com>
Cc: Noam Dagan <ndagan@amazon.com>
Cc: Saeed Bishara <saeedb@amazon.com>
Cc: Shay Agroskin <shayagr@amazon.com>
Fixes: a318c70ad152b ("net: ena: introduce XDP redirect implementation")
Acked-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agoMerge tag 'media/v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Wed, 20 Sep 2023 19:09:23 +0000 (12:09 -0700)]
Merge tag 'media/v6.6-2' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

 - driver fixes due to incorrect fwnode_handle_put() call

 - bt8xx: bttv_risc_packed(): remove field checks

 - vb2: frame_vector.c: replace WARN_ONCE with a comment

 - imx219: a couple typo fixes and perform a full mode set
   unconditionally

 - uvcvideo: Fix OOB read

 - some dependency fixes

* tag 'media/v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: imx-mipi-csis: Remove an incorrect fwnode_handle_put() call
  media: vb2: frame_vector.c: replace WARN_ONCE with a comment
  media: uvcvideo: Fix OOB read
  media: bt8xx: bttv_risc_packed(): remove field checks
  media: i2c: rdacm21: Remove an incorrect fwnode_handle_put() call
  media: i2c: imx219: Perform a full mode set unconditionally
  media: i2c: imx219: Fix crop rectangle setting when changing format
  media: i2c: imx219: Fix a typo referring to a wrong variable
  media: i2c: max9286: Remove an incorrect fwnode_handle_put() call
  media: ivsc: Depend on VIDEO_DEV
  media: via: Use correct dependency for camera sensor drivers
  media: v4l: Use correct dependency for camera sensor drivers
  media: pci: ivsc: Select build dependencies

12 months agoMerge tag 'for-6.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Wed, 20 Sep 2023 18:03:45 +0000 (11:03 -0700)]
Merge tag 'for-6.6-rc2-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more followup fixes to the directory listing.

  People have noticed different behaviour compared to other filesystems
  after changes in 6.5. This is now unified to more "logical" and
  expected behaviour while still within POSIX. And a few more fixes for
  stable.

   - change behaviour of readdir()/rewinddir() when new directory
     entries are created after opendir(), properly tracking the last
     entry

   - fix race in readdir when multiple threads can set the last entry
     index for a directory

  Additionally:

   - use exclusive lock when direct io might need to drop privs and call
     notify_change()

   - don't clear uptodate bit on page after an error, this may lead to a
     deadlock in subpage mode

   - fix waiting pattern when multiple readers block on Merkle tree
     data, switch to folios"

* tag 'for-6.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix race between reading a directory and adding entries to it
  btrfs: refresh dir last index during a rewinddir(3) call
  btrfs: set last dir index to the current last index when opening dir
  btrfs: don't clear uptodate on write errors
  btrfs: file_remove_privs needs an exclusive lock in direct io write
  btrfs: convert btrfs_read_merkle_tree_page() to use a folio

12 months agoMerge tag 'spi-fix-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Linus Torvalds [Wed, 20 Sep 2023 16:28:22 +0000 (09:28 -0700)]
Merge tag 'spi-fix-v6.6-rc2' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A small collection of fixes, plus a new device ID for Intel Granite
  Rapids systems.

  The fix for the i.MX driver is fairly urgent, it's fixing a data
  corruption issue when bits per word isn't 8.

  There's also one fix which was queued but not sent for v6.4 due to
  being minor and arriving at the end of the release"

* tag 'spi-fix-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: imx: Take in account bits per word instead of assuming 8-bits
  spi: intel-pci: Add support for Granite Rapids SPI serial flash
  spi: stm32: add a delay before SPI disable
  spi: nxp-fspi: reset the FLSHxCR1 registers
  spi: zynqmp-gqspi: fix clock imbalance on probe failure

12 months agoMerge tag 'regulator-fix-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 20 Sep 2023 16:26:42 +0000 (09:26 -0700)]
Merge tag 'regulator-fix-v6.6-rc2' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
 "One fix for the tps6287x driver which was incorrectly specifying the
  field for voltage range selection leading to incorrect voltages being
  set"

* tag 'regulator-fix-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: Fix voltage range selection

12 months agoRevert "fs: add infrastructure for multigrain timestamps"
Christian Brauner [Wed, 20 Sep 2023 14:40:13 +0000 (16:40 +0200)]
Revert "fs: add infrastructure for multigrain timestamps"

This reverts commit ffb6cf19e06334062744b7e3493f71e500964f8e.

Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.

Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
12 months agoRevert "btrfs: convert to multigrain timestamps"
Christian Brauner [Wed, 20 Sep 2023 14:41:47 +0000 (16:41 +0200)]
Revert "btrfs: convert to multigrain timestamps"

This reverts commit 50e9ceef1d4f644ee0049e82e360058a64ec284c.

Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.

Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
12 months agoRevert "ext4: switch to multigrain timestamps"
Christian Brauner [Wed, 20 Sep 2023 14:41:45 +0000 (16:41 +0200)]
Revert "ext4: switch to multigrain timestamps"

This reverts commit 0269b585868e59b6a2ecc6ea685d39310e4fc18b.

Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.

Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
12 months agoRevert "xfs: switch to multigrain timestamps"
Christian Brauner [Wed, 20 Sep 2023 14:41:18 +0000 (16:41 +0200)]
Revert "xfs: switch to multigrain timestamps"

This reverts commit e44df2664746aed8b6dd5245eb711a0ce33c5cf5.

Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.

Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
12 months agoRevert "tmpfs: add support for multigrain timestamps"
Christian Brauner [Wed, 20 Sep 2023 14:40:22 +0000 (16:40 +0200)]
Revert "tmpfs: add support for multigrain timestamps"

This reverts commit d48c3397291690c3576d6c983b0a86ecbc203cac.

Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.

Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
12 months agoMerge tag 'asoc-fix-v6.6-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git...
Takashi Iwai [Wed, 20 Sep 2023 13:02:16 +0000 (15:02 +0200)]
Merge tag 'asoc-fix-v6.6-rc2' of https://git./linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.6

Quite a large collection of fixes, with numbers boosted by multiple
vendors sending multi-patch serieses.  Nothing super major, and also one
device quirk.

12 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Wed, 20 Sep 2023 10:56:27 +0000 (11:56 +0100)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
bpf-next 2023-09-19

The following pull-request contains BPF updates for your *net-next* tree.

We've added 4 non-merge commits during the last 1 day(s) which contain
a total of 4 files changed, 9 insertions(+), 13 deletions(-).

The main changes are:

1) A set of fixes for bpf exceptions, from Kumar and Alexei.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
12 months agonet/handshake: Fix memory leak in __sock_create() and sock_alloc_file()
Jinjie Ruan [Tue, 19 Sep 2023 10:44:06 +0000 (18:44 +0800)]
net/handshake: Fix memory leak in __sock_create() and sock_alloc_file()

When making CONFIG_DEBUG_KMEMLEAK=y and CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y,
modprobe handshake-test and then rmmmod handshake-test, the below memory
leak is detected.

The struct socket_alloc which is allocated by alloc_inode_sb() in
__sock_create() is not freed. And the struct dentry which is allocated
by __d_alloc() in sock_alloc_file() is not freed.

Since fput() will call file->f_op->release() which is sock_close() here and
it will call __sock_release(). and fput() will call dput(dentry) to free
the struct dentry. So replace sock_release() with fput() to fix the
below memory leak. After applying this patch, the following memory leak is
never detected.

unreferenced object 0xffff888109165840 (size 768):
  comm "kunit_try_catch", pid 1852, jiffies 4294685807 (age 976.262s)
  hex dump (first 32 bytes):
    01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00  ......ZZ .......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
    [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
    [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
    [<ffffffff8397889c>] sock_alloc+0x3c/0x260
    [<ffffffff83979b46>] __sock_create+0x66/0x3d0
    [<ffffffffa0209ba2>] 0xffffffffa0209ba2
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810f472008 (size 192):
  comm "kunit_try_catch", pid 1852, jiffies 4294685808 (age 976.261s)
  hex dump (first 32 bytes):
    00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00  ..P@............
    00 00 00 00 00 00 00 00 08 20 47 0f 81 88 ff ff  ......... G.....
  backtrace:
    [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
    [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
    [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa0209bbb>] 0xffffffffa0209bbb
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810958e580 (size 224):
  comm "kunit_try_catch", pid 1852, jiffies 4294685808 (age 976.261s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
    [<ffffffff819d4cf9>] alloc_file+0x59/0x730
    [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa0209bbb>] 0xffffffffa0209bbb
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810926dc88 (size 192):
  comm "kunit_try_catch", pid 1854, jiffies 4294685809 (age 976.271s)
  hex dump (first 32 bytes):
    00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00  ..P@............
    00 00 00 00 00 00 00 00 88 dc 26 09 81 88 ff ff  ..........&.....
  backtrace:
    [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
    [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
    [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa0208fdc>] 0xffffffffa0208fdc
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810a241380 (size 224):
  comm "kunit_try_catch", pid 1854, jiffies 4294685809 (age 976.271s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
    [<ffffffff819d4cf9>] alloc_file+0x59/0x730
    [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa0208fdc>] 0xffffffffa0208fdc
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888109165040 (size 768):
  comm "kunit_try_catch", pid 1856, jiffies 4294685811 (age 976.269s)
  hex dump (first 32 bytes):
    01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00  ......ZZ .......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
    [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
    [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
    [<ffffffff8397889c>] sock_alloc+0x3c/0x260
    [<ffffffff83979b46>] __sock_create+0x66/0x3d0
    [<ffffffffa0208860>] 0xffffffffa0208860
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810926d568 (size 192):
  comm "kunit_try_catch", pid 1856, jiffies 4294685811 (age 976.269s)
  hex dump (first 32 bytes):
    00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00  ..P@............
    00 00 00 00 00 00 00 00 68 d5 26 09 81 88 ff ff  ........h.&.....
  backtrace:
    [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
    [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
    [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa0208879>] 0xffffffffa0208879
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810a240580 (size 224):
  comm "kunit_try_catch", pid 1856, jiffies 4294685811 (age 976.347s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
    [<ffffffff819d4cf9>] alloc_file+0x59/0x730
    [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa0208879>] 0xffffffffa0208879
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888109164c40 (size 768):
  comm "kunit_try_catch", pid 1858, jiffies 4294685816 (age 976.342s)
  hex dump (first 32 bytes):
    01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00  ......ZZ .......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
    [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
    [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
    [<ffffffff8397889c>] sock_alloc+0x3c/0x260
    [<ffffffff83979b46>] __sock_create+0x66/0x3d0
    [<ffffffffa0208541>] 0xffffffffa0208541
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810926cd18 (size 192):
  comm "kunit_try_catch", pid 1858, jiffies 4294685816 (age 976.342s)
  hex dump (first 32 bytes):
    00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00  ..P@............
    00 00 00 00 00 00 00 00 18 cd 26 09 81 88 ff ff  ..........&.....
  backtrace:
    [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
    [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
    [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa020855a>] 0xffffffffa020855a
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810a240200 (size 224):
  comm "kunit_try_catch", pid 1858, jiffies 4294685816 (age 976.342s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
    [<ffffffff819d4cf9>] alloc_file+0x59/0x730
    [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa020855a>] 0xffffffffa020855a
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888109164840 (size 768):
  comm "kunit_try_catch", pid 1860, jiffies 4294685817 (age 976.416s)
  hex dump (first 32 bytes):
    01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00  ......ZZ .......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
    [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
    [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
    [<ffffffff8397889c>] sock_alloc+0x3c/0x260
    [<ffffffff83979b46>] __sock_create+0x66/0x3d0
    [<ffffffffa02093e2>] 0xffffffffa02093e2
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810926cab8 (size 192):
  comm "kunit_try_catch", pid 1860, jiffies 4294685817 (age 976.416s)
  hex dump (first 32 bytes):
    00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00  ..P@............
    00 00 00 00 00 00 00 00 b8 ca 26 09 81 88 ff ff  ..........&.....
  backtrace:
    [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
    [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
    [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa02093fb>] 0xffffffffa02093fb
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810a240040 (size 224):
  comm "kunit_try_catch", pid 1860, jiffies 4294685817 (age 976.416s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
    [<ffffffff819d4cf9>] alloc_file+0x59/0x730
    [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa02093fb>] 0xffffffffa02093fb
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888109166440 (size 768):
  comm "kunit_try_catch", pid 1862, jiffies 4294685819 (age 976.489s)
  hex dump (first 32 bytes):
    01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00  ......ZZ .......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
    [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
    [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
    [<ffffffff8397889c>] sock_alloc+0x3c/0x260
    [<ffffffff83979b46>] __sock_create+0x66/0x3d0
    [<ffffffffa02097c1>] 0xffffffffa02097c1
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810926c398 (size 192):
  comm "kunit_try_catch", pid 1862, jiffies 4294685819 (age 976.489s)
  hex dump (first 32 bytes):
    00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00  ..P@............
    00 00 00 00 00 00 00 00 98 c3 26 09 81 88 ff ff  ..........&.....
  backtrace:
    [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
    [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
    [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa02097da>] 0xffffffffa02097da
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888107e0b8c0 (size 224):
  comm "kunit_try_catch", pid 1862, jiffies 4294685819 (age 976.489s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
    [<ffffffff819d4cf9>] alloc_file+0x59/0x730
    [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa02097da>] 0xffffffffa02097da
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888109164440 (size 768):
  comm "kunit_try_catch", pid 1864, jiffies 4294685821 (age 976.487s)
  hex dump (first 32 bytes):
    01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00  ......ZZ .......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
    [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
    [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
    [<ffffffff8397889c>] sock_alloc+0x3c/0x260
    [<ffffffff83979b46>] __sock_create+0x66/0x3d0
    [<ffffffffa020824e>] 0xffffffffa020824e
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810f4cf698 (size 192):
  comm "kunit_try_catch", pid 1864, jiffies 4294685821 (age 976.501s)
  hex dump (first 32 bytes):
    00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00  ..P@............
    00 00 00 00 00 00 00 00 98 f6 4c 0f 81 88 ff ff  ..........L.....
  backtrace:
    [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
    [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
    [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa0208267>] 0xffffffffa0208267
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888107e0b000 (size 224):
  comm "kunit_try_catch", pid 1864, jiffies 4294685821 (age 976.501s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
    [<ffffffff819d4cf9>] alloc_file+0x59/0x730
    [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa0208267>] 0xffffffffa0208267
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20

Fixes: 88232ec1ec5e ("net/handshake: Add Kunit tests for the handshake consumer API")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>