linux-block.git
9 months agoselftests/net: convert gre_gso.sh to run it in unique namespace
Hangbin Liu [Tue, 19 Dec 2023 09:48:49 +0000 (17:48 +0800)]
selftests/net: convert gre_gso.sh to run it in unique namespace

Here is the test result after conversion.

 # ./gre_gso.sh
     TEST: GREv6/v4 - copy file w/ TSO                                   [ OK ]
     TEST: GREv6/v4 - copy file w/ GSO                                   [ OK ]
     TEST: GREv6/v6 - copy file w/ TSO                                   [ OK ]
     TEST: GREv6/v6 - copy file w/ GSO                                   [ OK ]

 Tests passed:   4
 Tests failed:   0

Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoselftests/net: remove unneeded semicolon
Jiapeng Chong [Tue, 19 Dec 2023 05:54:04 +0000 (13:54 +0800)]
selftests/net: remove unneeded semicolon

No functional modification involved.

./tools/testing/selftests/net/tcp_ao/setsockopt-closed.c:121:2-3: Unneeded semicolon.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7771
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoselftest/tcp-ao: Rectify out-of-tree build
Dmitry Safonov [Tue, 19 Dec 2023 02:03:05 +0000 (02:03 +0000)]
selftest/tcp-ao: Rectify out-of-tree build

Trivial fix for out-of-tree build that I wasn't testing previously:

1. Create a directory for library object files, fixes:
> gcc lib/kconfig.c -Wall -O2 -g -D_GNU_SOURCE -fno-strict-aliasing -I ../../../../../usr/include/ -iquote /tmp/kselftest/kselftest/net/tcp_ao/lib -I ../../../../include/  -o /tmp/kselftest/kselftest/net/tcp_ao/lib/kconfig.o -c
> Assembler messages:
> Fatal error: can't create /tmp/kselftest/kselftest/net/tcp_ao/lib/kconfig.o: No such file or directory
> make[1]: *** [Makefile:46: /tmp/kselftest/kselftest/net/tcp_ao/lib/kconfig.o] Error 1

2. Include $(KHDR_INCLUDES) that's exported by selftests/Makefile, fixes:
> In file included from lib/kconfig.c:6:
> lib/aolib.h:320:45: warning: ‘struct tcp_ao_add’ declared inside parameter list will not be visible outside of this definition or declaration
>   320 | extern int test_prepare_key_sockaddr(struct tcp_ao_add *ao, const char *alg,
>       |                                             ^~~~~~~~~~
...

3. While at here, clean-up $(KSFT_KHDR_INSTALL): it's not needed anymore
   since commit f2745dc0ba3d ("selftests: stop using KSFT_KHDR_INSTALL")

4. Also, while at here, drop .DEFAULT_GOAL definition: that has a
   self-explaining comment, that was valid when I made these selftests
   compile on local v4.19 kernel, but not needed since
   commit 8ce72dc32578 ("selftests: fix headers_install circular dependency")

Fixes: cfbab37b3da0 ("selftests/net: Add TCP-AO library")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312190645.q76MmHyq-lkp@intel.com/
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agotipc: Remove some excess struct member documentation
Jonathan Corbet [Tue, 19 Dec 2023 00:28:13 +0000 (17:28 -0700)]
tipc: Remove some excess struct member documentation

Remove documentation for nonexistent struct members, addressing these
warnings:

  ./net/tipc/link.c:228: warning: Excess struct member 'media_addr' description in 'tipc_link'
  ./net/tipc/link.c:228: warning: Excess struct member 'timer' description in 'tipc_link'
  ./net/tipc/link.c:228: warning: Excess struct member 'refcnt' description in 'tipc_link'
  ./net/tipc/link.c:228: warning: Excess struct member 'proto_msg' description in 'tipc_link'
  ./net/tipc/link.c:228: warning: Excess struct member 'pmsg' description in 'tipc_link'
  ./net/tipc/link.c:228: warning: Excess struct member 'backlog_limit' description in 'tipc_link'
  ./net/tipc/link.c:228: warning: Excess struct member 'exp_msg_count' description in 'tipc_link'
  ./net/tipc/link.c:228: warning: Excess struct member 'reset_rcv_checkpt' description in 'tipc_link'
  ./net/tipc/link.c:228: warning: Excess struct member 'transmitq' description in 'tipc_link'
  ./net/tipc/link.c:228: warning: Excess struct member 'snt_nxt' description in 'tipc_link'
  ./net/tipc/link.c:228: warning: Excess struct member 'deferred_queue' description in 'tipc_link'
  ./net/tipc/link.c:228: warning: Excess struct member 'unacked_window' description in 'tipc_link'
  ./net/tipc/link.c:228: warning: Excess struct member 'next_out' description in 'tipc_link'
  ./net/tipc/link.c:228: warning: Excess struct member 'long_msg_seq_no' description in 'tipc_link'
  ./net/tipc/link.c:228: warning: Excess struct member 'bc_rcvr' description in 'tipc_link'

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agonet: skbuff: Remove some excess struct-member documentation
Jonathan Corbet [Tue, 19 Dec 2023 00:26:26 +0000 (17:26 -0700)]
net: skbuff: Remove some excess struct-member documentation

Remove documentation for nonexistent structure members, addressing these
warnings:

  ./include/linux/skbuff.h:1063: warning: Excess struct member 'sp' description in 'sk_buff'
  ./include/linux/skbuff.h:1063: warning: Excess struct member 'nf_bridge' description in 'sk_buff'

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoMerge branch 'tcp-refactor-bhash2'
David S. Miller [Fri, 22 Dec 2023 22:15:35 +0000 (22:15 +0000)]
Merge branch 'tcp-refactor-bhash2'

Kuniyuki Iwashima says:

====================
tcp: Refactor bhash2 and remove sk_bind2_node.

This series refactors code around bhash2 and remove some bhash2-specific
fields; sock.sk_bind2_node, and inet_timewait_sock.tw_bind2_node.

  patch 1      : optimise bind() for non-wildcard v4-mapped-v6 address
  patch 2 -  4 : optimise bind() conflict tests
  patch 5 - 12 : Link bhash2 to bhash and unlink sk from bhash2 to
                 remove sk_bind2_node

The patch 8 will trigger a false-positive error by checkpatch.

v2: resend of https://lore.kernel.org/netdev/20231213082029.35149-1-kuniyu@amazon.com/
  * Rebase on latest net-next
  * Patch 11
    * Add change in inet_diag_dump_icsk() for recent bhash dump patch

v1: https://lore.kernel.org/netdev/20231023190255.39190-1-kuniyu@amazon.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agotcp: Remove dead code and fields for bhash2.
Kuniyuki Iwashima [Tue, 19 Dec 2023 00:18:33 +0000 (09:18 +0900)]
tcp: Remove dead code and fields for bhash2.

Now all sockets including TIME_WAIT are linked to bhash2 using
sock_common.skc_bind_node.

We no longer use inet_bind2_bucket.deathrow, sock.sk_bind2_node,
and inet_timewait_sock.tw_bind2_node.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agotcp: Link sk and twsk to tb2->owners using skc_bind_node.
Kuniyuki Iwashima [Tue, 19 Dec 2023 00:18:32 +0000 (09:18 +0900)]
tcp: Link sk and twsk to tb2->owners using skc_bind_node.

Now we can use sk_bind_node/tw_bind_node for bhash2, which means
we need not link TIME_WAIT sockets separately.

The dead code and sk_bind2_node will be removed in the next patch.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agotcp: Unlink sk from bhash.
Kuniyuki Iwashima [Tue, 19 Dec 2023 00:18:31 +0000 (09:18 +0900)]
tcp: Unlink sk from bhash.

Now we do not use tb->owners and can unlink sockets from bhash.

sk_bind_node/tw_bind_node are available for bhash2 and will be
used in the following patch.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agotcp: Check hlist_empty(&tb->bhash2) instead of hlist_empty(&tb->owners).
Kuniyuki Iwashima [Tue, 19 Dec 2023 00:18:30 +0000 (09:18 +0900)]
tcp: Check hlist_empty(&tb->bhash2) instead of hlist_empty(&tb->owners).

We use hlist_empty(&tb->owners) to check if the bhash bucket has a socket.
We can check the child bhash2 buckets instead.

For this to work, the bhash2 bucket must be freed before the bhash bucket.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agotcp: Iterate tb->bhash2 in inet_csk_bind_conflict().
Kuniyuki Iwashima [Tue, 19 Dec 2023 00:18:29 +0000 (09:18 +0900)]
tcp: Iterate tb->bhash2 in inet_csk_bind_conflict().

Sockets in bhash are also linked to bhash2, but TIME_WAIT sockets
are linked separately in tb2->deathrow.

Let's replace tb->owners iteration in inet_csk_bind_conflict() with
two iterations over tb2->owners and tb2->deathrow.

This can be done safely under bhash's lock because socket insertion/
deletion in bhash2 happens with bhash's lock held.

Note that twsk_for_each_bound_bhash() will be removed later.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agotcp: Rearrange tests in inet_csk_bind_conflict().
Kuniyuki Iwashima [Tue, 19 Dec 2023 00:18:28 +0000 (09:18 +0900)]
tcp: Rearrange tests in inet_csk_bind_conflict().

The following patch adds code in the !inet_use_bhash2_on_bind(sk)
case in inet_csk_bind_conflict().

To avoid adding nest and make the change cleaner, this patch
rearranges tests in inet_csk_bind_conflict().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agotcp: Link bhash2 to bhash.
Kuniyuki Iwashima [Tue, 19 Dec 2023 00:18:27 +0000 (09:18 +0900)]
tcp: Link bhash2 to bhash.

bhash2 added a new member sk_bind2_node in struct sock to link
sockets to bhash2 in addition to bhash.

bhash is still needed to search conflicting sockets efficiently
from a port for the wildcard address.  However, bhash itself need
not have sockets.

If we link each bhash2 bucket to the corresponding bhash bucket,
we can iterate the same set of the sockets from bhash2 via bhash.

This patch links bhash2 to bhash only, and the actual use will be
in the later patches.  Finally, we will remove sk_bind2_node.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agotcp: Rename tb in inet_bind2_bucket_(init|create)().
Kuniyuki Iwashima [Tue, 19 Dec 2023 00:18:26 +0000 (09:18 +0900)]
tcp: Rename tb in inet_bind2_bucket_(init|create)().

Later, we no longer link sockets to bhash.  Instead, each bhash2
bucket is linked to the corresponding bhash bucket.

Then, we pass the bhash bucket to bhash2 allocation functions as
tb.  However, tb is already used in inet_bind2_bucket_create() and
inet_bind2_bucket_init() as the bhash2 bucket.

To make the following diff clear, let's use tb2 for the bhash2 bucket
there.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agotcp: Save address type in inet_bind2_bucket.
Kuniyuki Iwashima [Tue, 19 Dec 2023 00:18:25 +0000 (09:18 +0900)]
tcp: Save address type in inet_bind2_bucket.

inet_bind2_bucket_addr_match() and inet_bind2_bucket_match_addr_any()
are called for each bhash2 bucket to check conflicts.  Thus, we call
ipv6_addr_any() and ipv6_addr_v4mapped() over and over during bind().

Let's avoid calling them by saving the address type in inet_bind2_bucket.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agotcp: Save v4 address as v4-mapped-v6 in inet_bind2_bucket.v6_rcv_saddr.
Kuniyuki Iwashima [Tue, 19 Dec 2023 00:18:24 +0000 (09:18 +0900)]
tcp: Save v4 address as v4-mapped-v6 in inet_bind2_bucket.v6_rcv_saddr.

In bhash2, IPv4/IPv6 addresses are saved in two union members,
which complicate address checks in inet_bind2_bucket_addr_match()
and inet_bind2_bucket_match_addr_any() considering uninitialised
memory and v4-mapped-v6 conflicts.

Let's simplify that by saving IPv4 address as v4-mapped-v6 address
and defining tb2.rcv_saddr as tb2.v6_rcv_saddr.s6_addr32[3].

Then, we can compare v6 address as is, and after checking v4-mapped-v6,
we can compare v4 address easily.  Also, we can remove tb2->family.

Note these functions will be further refactored in the next patch.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agotcp: Rearrange tests in inet_bind2_bucket_(addr_match|match_addr_any)().
Kuniyuki Iwashima [Tue, 19 Dec 2023 00:18:23 +0000 (09:18 +0900)]
tcp: Rearrange tests in inet_bind2_bucket_(addr_match|match_addr_any)().

The protocol family tests in inet_bind2_bucket_addr_match() and
inet_bind2_bucket_match_addr_any() are ordered as follows.

  if (sk->sk_family != tb2->family)
  else if (sk->sk_family == AF_INET6)
  else

This patch rearranges them so that AF_INET6 socket is handled first
to make the following patch tidy, where tb2->family will be removed.

  if (sk->sk_family == AF_INET6)
  else if (tb2->family == AF_INET6)
  else

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agotcp: Use bhash2 for v4-mapped-v6 non-wildcard address.
Kuniyuki Iwashima [Tue, 19 Dec 2023 00:18:22 +0000 (09:18 +0900)]
tcp: Use bhash2 for v4-mapped-v6 non-wildcard address.

While checking port availability in bind() or listen(), we used only
bhash for all v4-mapped-v6 addresses.  But there is no good reason not
to use bhash2 for v4-mapped-v6 non-wildcard addresses.

Let's do it by returning true in inet_use_bhash2_on_bind().  Then, we
also need to add a test in inet_bind2_bucket_match_addr_any() so that
::ffff:X.X.X.X will match with 0.0.0.0.

Note that sk->sk_rcv_saddr is initialised for v4-mapped-v6 sk in
__inet6_bind().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoselftests/net: Fix various spelling mistakes in TCP-AO tests
Colin Ian King [Mon, 18 Dec 2023 13:30:22 +0000 (13:30 +0000)]
selftests/net: Fix various spelling mistakes in TCP-AO tests

There are a handful of spelling mistakes in test messages in the
TCP-AIO selftests. Fix these.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoocteontx2-af: Fix a double free issue
Suman Ghosh [Mon, 18 Dec 2023 18:02:58 +0000 (23:32 +0530)]
octeontx2-af: Fix a double free issue

There was a memory leak during error handling in function
npc_mcam_rsrcs_init().

Fixes: dd7842878633 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs")
Suggested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
David S. Miller [Fri, 22 Dec 2023 12:09:52 +0000 (12:09 +0000)]
Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
intel: use bitfield operations

Jesse Brandeburg says:

After repeatedly getting review comments on new patches, and sporadic
patches to fix parts of our drivers, we should just convert the Intel code
to use FIELD_PREP() and FIELD_GET().  It's then "common" in the code and
hopefully future change-sets will see the context and do-the-right-thing.

This conversion was done with a coccinelle script which is mentioned in the
commit messages. Generally there were only a couple conversions that were
"undone" after the automatic changes because they tried to convert a
non-contiguous mask.

Patch 1 is required at the beginning of this series to fix a "forever"
issue in the e1000e driver that fails the compilation test after conversion
because the shift / mask was out of range.

The second patch just adds all the new #includes in one go.

The patch titled: "ice: fix pre-shifted bit usage" is needed to allow the
use of the FIELD_* macros and fix up the unexpected "shifts included"
defines found while creating this series.

The rest are the conversion to use FIELD_PREP()/FIELD_GET(), and the
occasional leXX_{get,set,encode}_bits() call, as suggested by Alex.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Paolo Abeni [Thu, 21 Dec 2023 21:17:23 +0000 (22:17 +0100)]
Merge git://git./linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Adjacent changes:

drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
  23c93c3b6275 ("bnxt_en: do not map packet buffers twice")
  6d1add95536b ("bnxt_en: Modify TX ring indexing logic.")

tools/testing/selftests/net/Makefile
  2258b666482d ("selftests: add vlan hw filter tests")
  a0bc96c0cd6e ("selftests: net: verify fq per-band packet limit")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agonet/ipv6: Remove gc_link warn on in fib6_info_release
David Ahern [Tue, 19 Dec 2023 03:07:42 +0000 (20:07 -0700)]
net/ipv6: Remove gc_link warn on in fib6_info_release

A revert of
   3dec89b14d37 ("net/ipv6: Remove expired routes with a separated list of routes")
was sent for net-next. Revert the remainder of 5a08d0065a915
which added a warn on if a fib entry is still on the gc_link list
to avoid compile failures when net is merged to net-next

Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231219030742.25715-1-dsahern@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoMerge tag 'net-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 21 Dec 2023 17:15:37 +0000 (09:15 -0800)]
Merge tag 'net-6.7-rc7' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from WiFi and bpf.

  Current release - regressions:

   - bpf: syzkaller found null ptr deref in unix_bpf proto add

   - eth: i40e: fix ST code value for clause 45

  Previous releases - regressions:

   - core: return error from sk_stream_wait_connect() if sk_wait_event()
     fails

   - ipv6: revert remove expired routes with a separated list of routes

   - wifi rfkill:
       - set GPIO direction
       - fix crash with WED rx support enabled

   - bluetooth:
       - fix deadlock in vhci_send_frame
       - fix use-after-free in bt_sock_recvmsg

   - eth: mlx5e: fix a race in command alloc flow

   - eth: ice: fix PF with enabled XDP going no-carrier after reset

   - eth: bnxt_en: do not map packet buffers twice

  Previous releases - always broken:

   - core:
       - check vlan filter feature in vlan_vids_add_by_dev() and
         vlan_vids_del_by_dev()
       - check dev->gso_max_size in gso_features_check()

   - mptcp: fix inconsistent state on fastopen race

   - phy: skip LED triggers on PHYs on SFP modules

   - eth: mlx5e:
       - fix double free of encap_header
       - fix slab-out-of-bounds in mlx5_query_nic_vport_mac_list()"

* tag 'net-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
  net: check dev->gso_max_size in gso_features_check()
  kselftest: rtnetlink.sh: use grep_fail when expecting the cmd fail
  net/ipv6: Revert remove expired routes with a separated list of routes
  net: avoid build bug in skb extension length calculation
  net: ethernet: mtk_wed: fix possible NULL pointer dereference in mtk_wed_wo_queue_tx_clean()
  net: stmmac: fix incorrect flag check in timestamp interrupt
  selftests: add vlan hw filter tests
  net: check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev()
  net: hns3: add new maintainer for the HNS3 ethernet driver
  net: mana: select PAGE_POOL
  net: ks8851: Fix TX stall caused by TX buffer overrun
  ice: Fix PF with enabled XDP going no-carrier after reset
  ice: alter feature support check for SRIOV and LAG
  ice: stop trashing VF VSI aggregator node ID information
  mailmap: add entries for Geliang Tang
  mptcp: fill in missing MODULE_DESCRIPTION()
  mptcp: fix inconsistent state on fastopen race
  selftests: mptcp: join: fix subflow_send_ack lookup
  net: phy: skip LED triggers on PHYs on SFP modules
  bpf: Add missing BPF_LINK_TYPE invocations
  ...

9 months agonet: phy: at803x: replace msleep(1) with usleep_range
Christian Marangi [Sun, 17 Dec 2023 23:25:08 +0000 (00:25 +0100)]
net: phy: at803x: replace msleep(1) with usleep_range

Replace msleep(1) with usleep_range as suggested by timers-howto guide.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20231217232508.26470-1-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agonet: phy: at803x: remove extra space after cast
Christian Marangi [Sun, 17 Dec 2023 23:27:39 +0000 (00:27 +0100)]
net: phy: at803x: remove extra space after cast

Remove extra space after cast as reported by checkpatch to keep code
clean.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20231217232739.27065-1-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Paolo Abeni [Thu, 21 Dec 2023 11:27:28 +0000 (12:27 +0100)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-12-21

Hi David, hi Jakub, hi Paolo, hi Eric,

The following pull-request contains BPF updates for your *net* tree.

We've added 3 non-merge commits during the last 5 day(s) which contain
a total of 4 files changed, 45 insertions(+).

The main changes are:

1) Fix a syzkaller splat which triggered an oob issue in bpf_link_show_fdinfo(),
   from Jiri Olsa.

2) Fix another syzkaller-found issue which triggered a NULL pointer dereference
   in BPF sockmap for unconnected unix sockets, from John Fastabend.

bpf-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Add missing BPF_LINK_TYPE invocations
  bpf: sockmap, test for unconnected af_unix sock
  bpf: syzkaller found null ptr deref in unix_bpf proto add
====================

Link: https://lore.kernel.org/r/20231221104844.1374-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agonet: check dev->gso_max_size in gso_features_check()
Eric Dumazet [Tue, 19 Dec 2023 12:53:31 +0000 (12:53 +0000)]
net: check dev->gso_max_size in gso_features_check()

Some drivers might misbehave if TSO packets get too big.

GVE for instance uses a 16bit field in its TX descriptor,
and will do bad things if a packet is bigger than 2^16 bytes.

Linux TCP stack honors dev->gso_max_size, but there are
other ways for too big packets to reach an ndo_start_xmit()
handler : virtio_net, af_packet, GRO...

Add a generic check in gso_features_check() and fallback
to GSO when needed.

gso_max_size was added in the blamed commit.

Fixes: 82cc1a7a5687 ("[NET]: Add per-connection option to set max TSO frame size")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231219125331.4127498-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agokselftest: rtnetlink.sh: use grep_fail when expecting the cmd fail
Hangbin Liu [Tue, 19 Dec 2023 06:57:37 +0000 (14:57 +0800)]
kselftest: rtnetlink.sh: use grep_fail when expecting the cmd fail

run_cmd_grep_fail should be used when expecting the cmd fail, or the ret
will be set to 1, and the total test return 1 when exiting. This would cause
the result report to fail if run via run_kselftest.sh.

Before fix:
 # ./rtnetlink.sh -t kci_test_addrlft
 PASS: preferred_lft addresses have expired
 # echo $?
 1

After fix:
 # ./rtnetlink.sh -t kci_test_addrlft
 PASS: preferred_lft addresses have expired
 # echo $?
 0

Fixes: 9c2a19f71515 ("kselftest: rtnetlink.sh: add verbose flag")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231219065737.1725120-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agonet/ipv6: Revert remove expired routes with a separated list of routes
David Ahern [Tue, 19 Dec 2023 03:02:43 +0000 (20:02 -0700)]
net/ipv6: Revert remove expired routes with a separated list of routes

This reverts commit 3dec89b14d37ee635e772636dad3f09f78f1ab87.

The commit has some race conditions given how expires is managed on a
fib6_info in relation to gc start, adding the entry to the gc list and
setting the timer value leading to UAF. Revert the commit and try again
in a later release.

Fixes: 3dec89b14d37 ("net/ipv6: Remove expired routes with a separated list of routes")
Cc: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231219030243.25687-1-dsahern@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Paolo Abeni [Thu, 21 Dec 2023 07:34:08 +0000 (08:34 +0100)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-12-18 (ice)

This series contains updates to ice driver only.

Jakes stops clearing of needed aggregator information.

Dave adds a check for LAG device support before initializing the
associated event handler.

Larysa restores accounting of XDP queues in TC configurations.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: Fix PF with enabled XDP going no-carrier after reset
  ice: alter feature support check for SRIOV and LAG
  ice: stop trashing VF VSI aggregator node ID information
====================

Link: https://lore.kernel.org/r/20231218192708.3397702-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agonet: avoid build bug in skb extension length calculation
Thomas Weißschuh [Mon, 18 Dec 2023 17:06:54 +0000 (18:06 +0100)]
net: avoid build bug in skb extension length calculation

GCC seems to incorrectly fail to evaluate skb_ext_total_length() at
compile time under certain conditions.

The issue even occurs if all values in skb_ext_type_len[] are "0",
ruling out the possibility of an actual overflow.

As the patch has been in mainline since v6.6 without triggering the
problem it seems to be a very uncommon occurrence.

As the issue only occurs when -fno-tree-loop-im is specified as part of
CFLAGS_GCOV, disable the BUILD_BUG_ON() only when building with coverage
reporting enabled.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312171924.4FozI5FG-lkp@intel.com/
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/lkml/487cfd35-fe68-416f-9bfd-6bb417f98304@app.fastmail.com/
Fixes: 5d21d0a65b57 ("net: generalize calculation of skb extensions length")
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20231218-net-skbuff-build-bug-v1-1-eefc2fb0a7d3@weissschuh.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agonet: ethernet: mtk_wed: fix possible NULL pointer dereference in mtk_wed_wo_queue_tx_...
Lorenzo Bianconi [Sun, 17 Dec 2023 15:37:40 +0000 (16:37 +0100)]
net: ethernet: mtk_wed: fix possible NULL pointer dereference in mtk_wed_wo_queue_tx_clean()

In order to avoid a NULL pointer dereference, check entry->buf pointer before running
skb_free_frag in mtk_wed_wo_queue_tx_clean routine.

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/3c1262464d215faa8acebfc08869798c81c96f4a.1702827359.git.lorenzo@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoposix-timers: Get rid of [COMPAT_]SYS_NI() uses
Linus Torvalds [Tue, 19 Dec 2023 23:26:59 +0000 (15:26 -0800)]
posix-timers: Get rid of [COMPAT_]SYS_NI() uses

Only the posix timer system calls use this (when the posix timer support
is disabled, which does not actually happen in any normal case), because
they had debug code to print out a warning about missing system calls.

Get rid of that special case, and just use the standard COND_SYSCALL
interface that creates weak system call stubs that return -ENOSYS for
when the system call does not exist.

This fixes a kCFI issue with the SYS_NI() hackery:

  CFI failure at int80_emulation+0x67/0xb0 (target: sys_ni_posix_timers+0x0/0x70; expected type: 0xb02b34d9)
  WARNING: CPU: 0 PID: 48 at int80_emulation+0x67/0xb0

Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 months agoMerge tag '6.7-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Thu, 21 Dec 2023 05:09:47 +0000 (21:09 -0800)]
Merge tag '6.7-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - two multichannel reconnect fixes, one fixing an important refcounting
   problem that can lead to umount problems

 - atime fix

 - five fixes for various potential OOB accesses, including a CVE fix,
   and two additional fixes for problems pointed out by Robert Morris's
   fuzzing investigation

* tag '6.7-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: do not let cifs_chan_update_iface deallocate channels
  cifs: fix a pending undercount of srv_count
  fs: cifs: Fix atime update check
  smb: client: fix potential OOB in smb2_dump_detail()
  smb: client: fix potential OOB in cifs_dump_detail()
  smb: client: fix OOB in smbCalcSize()
  smb: client: fix OOB in SMB2_query_info_init()
  smb: client: fix OOB in cifsd when receiving compounded resps

9 months agoMerge tag 's390-6.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Thu, 21 Dec 2023 00:12:39 +0000 (16:12 -0800)]
Merge tag 's390-6.7-4' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Alexander Gordeev:

 - Fix virtual vs physical address confusion in Storage Class Memory
   (SCM) block device driver.

 - Fix saving and restoring of FPU kernel context, which could lead to
   corruption of vector registers 8-15

 - Update defconfigs

* tag 's390-6.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: update defconfigs
  s390/vx: fix save/restore of fpu kernel context
  s390/scm: fix virtual vs physical address confusion

9 months agoMerge tag 'soc-fixes-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Thu, 21 Dec 2023 00:06:40 +0000 (16:06 -0800)]
Merge tag 'soc-fixes-6.7-2' of git://git./linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
 "There are only a handful of bugfixes this time, which feels almost too
  small, so I hope we are not missing something important.

   - One more mediatek dts warning fix after the previous larger set,
     this should finally result in a clean defconfig build.

   - TI OMAP dts fixes for a spurious hang on am335x and invalid data on
     DTA7

   - One DTS fix for ethernet on Oriange Pi Zero (Allwinner H616)

   - A regression fix for ti-sysc interconnect target module driver to
     not access registers after reset if srst_udelay quirk is needed

   - Reset controller driver fixes for a crash during error handling and
     a build warning"

* tag 'soc-fixes-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  arm64: dts: mediatek: mt8395-genio-1200-evk: add interrupt-parent for mt6360
  ARM: dts: Fix occasional boot hang for am3 usb
  reset: Fix crash when freeing non-existent optional resets
  ARM: OMAP2+: Fix null pointer dereference and memory leak in omap_soc_device_init
  ARM: dts: dra7: Fix DRA7 L3 NoC node register size
  bus: ti-sysc: Flush posted write only after srst_udelay
  reset: hisilicon: hi6220: fix Wvoid-pointer-to-enum-cast warning
  arm64: dts: allwinner: h616: update emac for Orange Pi Zero 3

9 months agoMerge tag 'platform-drivers-x86-v6.7-5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 20 Dec 2023 23:58:18 +0000 (15:58 -0800)]
Merge tag 'platform-drivers-x86-v6.7-5' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform drivers fixes from Ilpo Järvinen:

 - Fan reporting on some ThinkPads

 - Laptop 13 spurious keypresses while suspended

 - Intel PMC correction to avoid crash

* tag 'platform-drivers-x86-v6.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86/amd/pmc: Disable keyboard wakeup on AMD Framework 13
  platform/x86/amd/pmc: Move keyboard wakeup disablement detection to pmc-quirks
  platform/x86/amd/pmc: Only run IRQ1 firmware version check on Cezanne
  platform/x86/amd/pmc: Move platform defines to header
  platform/x86/intel/pmc: Fix hang in pmc_core_send_ltr_ignore()
  platform/x86: thinkpad_acpi: fix for incorrect fan reporting on some ThinkPad systems

9 months agoMerge tag 'ovl-fixes-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/overla...
Linus Torvalds [Wed, 20 Dec 2023 20:04:03 +0000 (12:04 -0800)]
Merge tag 'ovl-fixes-6.7-rc7' of git://git./linux/kernel/git/overlayfs/vfs

Pull overlayfs fix from Amir Goldstein:
 "Fix a regression from this merge window"

* tag 'ovl-fixes-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: fix dentry reference leak after changes to underlying layers

9 months agoMerge tag 'bcachefs-2023-12-19' of https://evilpiepirate.org/git/bcachefs
Linus Torvalds [Wed, 20 Dec 2023 19:24:28 +0000 (11:24 -0800)]
Merge tag 'bcachefs-2023-12-19' of https://evilpiepirate.org/git/bcachefs

Pull more bcachefs fixes from Kent Overstreet:

 - Fix a deadlock in the data move path with nocow locks (vs. update in
   place writes); when trylock failed we were incorrectly waiting for in
   flight ios to flush.

 - Fix reporting of NFS file handle length

 - Fix early error path in bch2_fs_alloc() - list head wasn't being
   initialized early enough

 - Make sure correct (hardware accelerated) crc modules get loaded

 - Fix a rare overflow in the btree split path, when the packed bkey
   format grows and all the keys have no value (LRU btree).

 - Fix error handling in the sector allocator

   This was causing writes to spuriously fail in multidevice setups, and
   another bug meant that the errors weren't being logged, only reported
   via fsync.

* tag 'bcachefs-2023-12-19' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Fix bch2_alloc_sectors_start_trans() error handling
  bcachefs; guard against overflow in btree node split
  bcachefs: btree_node_u64s_with_format() takes nr keys
  bcachefs: print explicit recovery pass message only once
  bcachefs: improve modprobe support by providing softdeps
  bcachefs: fix invalid memory access in bch2_fs_alloc() error path
  bcachefs: Fix determining required file handle length
  bcachefs: Fix nocow locks deadlock

9 months agoMerge tag 'nfsd-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Wed, 20 Dec 2023 19:16:50 +0000 (11:16 -0800)]
Merge tag 'nfsd-6.7-2' of git://git./linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Address a few recently-introduced issues

* tag 'nfsd-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Revert 5f7fc5d69f6e92ec0b38774c387f5cf7812c5806
  NFSD: Revert 738401a9bd1ac34ccd5723d69640a4adbb1a4bc0
  NFSD: Revert 6c41d9a9bd0298002805758216a9c44e38a8500d
  nfsd: hold nfsd_mutex across entire netlink operation
  nfsd: call nfsd_last_thread() before final nfsd_put()

9 months agoMerge tag 'dm-6.7/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device...
Linus Torvalds [Wed, 20 Dec 2023 19:01:28 +0000 (11:01 -0800)]
Merge tag 'dm-6.7/dm-fixes-3' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - DM raid target (and MD raid) fix for reconfig_mutex MD deadlock that
   should have been merged along with recent v6.7-rc6 MD fixes (see MD
   related commits: f2d87a759f68^..b39113349de6)

 - DM integrity target fix to avoid modifying immutable biovec in the
   integrity_metadata() edge case where kmalloc fails.

 - Fix drivers/md/Kconfig so DM_AUDIT depends on BLK_DEV_DM.

 - Update DM entry in MAINTAINERS to remove stale info.

* tag 'dm-6.7/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  MAINTAINERS: remove stale info for DEVICE-MAPPER
  dm audit: fix Kconfig so DM_AUDIT depends on BLK_DEV_DM
  dm-integrity: don't modify bio's immutable bio_vec in integrity_metadata()
  dm-raid: delay flushing event_work() after reconfig_mutex is released

9 months agoarm64: dts: mediatek: mt8395-genio-1200-evk: add interrupt-parent for mt6360
Macpaul Lin [Fri, 15 Dec 2023 07:32:52 +0000 (15:32 +0800)]
arm64: dts: mediatek: mt8395-genio-1200-evk: add interrupt-parent for mt6360

This patch fix the warning introduced by mt6360 node in
mt8395-genio-1200-evk.dts.

arch/arm64/boot/dts/mediatek/mt8195.dtsi:464.4-27: Warning (interrupts_property): /soc/i2c@11d01000/pmic@34:#interrupt-cells: size is (8), expected multiple of 16

Add a missing 'interrupt-parent' to fix this warning.

Fixes: f2b543a191b6 ("arm64: dts: mediatek: add device-tree for Genio 1200 EVK board")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/linux-devicetree/20231212214737.230115-1-arnd@kernel.org/
Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
9 months agoMerge tag 'am3-usb-hang-fix-signed' of git://git.kernel.org/pub/scm/linux/kernel...
Arnd Bergmann [Wed, 20 Dec 2023 12:04:38 +0000 (12:04 +0000)]
Merge tag 'am3-usb-hang-fix-signed' of git://git./linux/kernel/git/tmlind/linux-omap into arm/fixes

Fix for occasional boot hang for am335x USB

A fix for occasional boot hang for am335x USB that I've only recently
started noticing.

This can be merged naturally whenever suitable. This issue has been seen
with other similar SoCs earlier and has clearly existed for a long time.

* tag 'am3-usb-hang-fix-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: Fix occasional boot hang for am3 usb

Link: https://lore.kernel.org/r/pull-1703071616-395333@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
9 months agoMerge tag 'omap-for-v6.7/fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel...
Arnd Bergmann [Wed, 20 Dec 2023 12:02:25 +0000 (12:02 +0000)]
Merge tag 'omap-for-v6.7/fixes-signed' of git://git./linux/kernel/git/tmlind/linux-omap into arm/fixes

Fixes for omaps

A few fixes for omaps:

- A regression fix for ti-sysc interconnect target module driver to not access
  registers after reset if srst_udelay quirk is needed

- DRA7 L3 NoC node register size fix

* tag 'omap-for-v6.7/fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: OMAP2+: Fix null pointer dereference and memory leak in omap_soc_device_init
  ARM: dts: dra7: Fix DRA7 L3 NoC node register size
  bus: ti-sysc: Flush posted write only after srst_udelay

Link: https://lore.kernel.org/r/pull-1702037799-781982@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
9 months agoMerge branch 'net-sched-tc-drop-reason'
David S. Miller [Wed, 20 Dec 2023 11:50:13 +0000 (11:50 +0000)]
Merge branch 'net-sched-tc-drop-reason'

Victor Nogueira says:

====================
net: sched: Make tc-related drop reason more flexible for remaining qdiscs

This patch builds on Daniel's patch[1] to add initial support of tc drop
reason. The main goal is to distinguish between policy and error drops for
the remainder of the egress qdiscs (other than clsact).
The drop reason is set by cls_api and act_api in the tc skb cb in case
any error occurred in the data path.

Also add new skb drop reasons that are idiosyncratic to TC.

[1] https://lore.kernel.org/all/20231009092655.22025-1-daniel@iogearbox.net

Changes in V5:
- Drop "EXT_" from cookie error's drop reason name in doc

Changes in V4:
- Condense all the cookie drop reasons into one

Changes in V3:
- Removed duplicate assignment
- Rename function tc_skb_cb_drop_reason to tcf_get_drop_reason
- Move zone field upwards in struct tc_skb_cb to move hole to the end of
  the struct

Changes in V2:
- Dropped RFC tag
- Removed check for drop reason being overwritten by filter in cls_api.c
- Simplified logic and removed function tcf_init_drop_reason
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agonet: sched: Add initial TC error skb drop reasons
Victor Nogueira [Sat, 16 Dec 2023 20:44:36 +0000 (17:44 -0300)]
net: sched: Add initial TC error skb drop reasons

Continue expanding Daniel's patch by adding new skb drop reasons that
are idiosyncratic to TC.

More specifically:

- SKB_DROP_REASON_TC_COOKIE_ERROR: An error occurred whilst
  processing a tc ext cookie.

- SKB_DROP_REASON_TC_CHAIN_NOTFOUND: tc chain lookup failed.

- SKB_DROP_REASON_TC_RECLASSIFY_LOOP: tc exceeded max reclassify loop
  iterations

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agonet: sched: Make tc-related drop reason more flexible for remaining qdiscs
Victor Nogueira [Sat, 16 Dec 2023 20:44:35 +0000 (17:44 -0300)]
net: sched: Make tc-related drop reason more flexible for remaining qdiscs

Incrementing on Daniel's patch[1], make tc-related drop reason more
flexible for remaining qdiscs - that is, all qdiscs aside from clsact.
In essence, the drop reason will be set by cls_api and act_api in case
any error occurred in the data path. With that, we can give the user more
detailed information so that they can distinguish between a policy drop
or an error drop.

[1] https://lore.kernel.org/all/20231009092655.22025-1-daniel@iogearbox.net

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agonet: sched: Move drop_reason to struct tc_skb_cb
Victor Nogueira [Sat, 16 Dec 2023 20:44:34 +0000 (17:44 -0300)]
net: sched: Move drop_reason to struct tc_skb_cb

Move drop_reason from struct tcf_result to skb cb - more specifically to
struct tc_skb_cb. With that, we'll be able to also set the drop reason for
the remaining qdiscs (aside from clsact) that do not have access to
tcf_result when time comes to set the skb drop reason.

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agor8169: add support for LED's on RTL8168/RTL8101
Heiner Kallweit [Sat, 16 Dec 2023 19:58:10 +0000 (20:58 +0100)]
r8169: add support for LED's on RTL8168/RTL8101

This adds support for the LED's on most chip versions. Excluded are
the old non-PCIe versions and RTL8125. RTL8125 has a different LED
register layout, support for it will follow later.

LED's can be controlled from userspace using the netdev LED trigger.

Tested on RTL8168h.

Note: The driver can't know which LED's are actually physically
wired. Therefore not every LED device may represent a physically
available LED.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoMerge branch 'bridge-mdb-bulk-delete'
David S. Miller [Wed, 20 Dec 2023 11:27:21 +0000 (11:27 +0000)]
Merge branch 'bridge-mdb-bulk-delete'

Ido Schimmel says:

====================
Add MDB bulk deletion support

This patchset adds MDB bulk deletion support, allowing user space to
request the deletion of matching entries instead of dumping the entire
MDB and issuing a separate deletion request for each matching entry.
Support is added in both the bridge and VXLAN drivers in a similar
fashion to the existing FDB bulk deletion support.

The parameters according to which bulk deletion can be performed are
similar to the FDB ones, namely: Destination port, VLAN ID, state (e.g.,
"permanent"), routing protocol, source / destination VNI, destination IP
and UDP port. Flushing based on flags (e.g., "offload", "fast_leave",
"added_by_star_ex", "blocked") is not currently supported, but can be
added in the future, if a use case arises.

Patch #1 adds a new uAPI attribute to allow specifying the state mask
according to which bulk deletion will be performed, if any.

Patch #2 adds a new policy according to which bulk deletion requests
(with 'NLM_F_BULK' flag set) will be parsed.

Patches #3-#4 add a new NDO for MDB bulk deletion and invoke it from the
rtnetlink code when a bulk deletion request is made.

Patches #5-#6 implement the MDB bulk deletion NDO in the bridge and
VXLAN drivers, respectively.

Patch #7 allows user space to issue MDB bulk deletion requests by no
longer rejecting the 'NLM_F_BULK' flag when it is set in 'RTM_DELMDB'
requests.

Patches #8-#9 add selftests for both drivers, for both good and bad
flows.

iproute2 changes can be found here [1].

https://github.com/idosch/iproute2/tree/submit/mdb_flush_v1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoselftests: vxlan_mdb: Add MDB bulk deletion test
Ido Schimmel [Sun, 17 Dec 2023 08:32:44 +0000 (10:32 +0200)]
selftests: vxlan_mdb: Add MDB bulk deletion test

Add test cases to verify the behavior of the MDB bulk deletion
functionality in the VXLAN driver.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoselftests: bridge_mdb: Add MDB bulk deletion test
Ido Schimmel [Sun, 17 Dec 2023 08:32:43 +0000 (10:32 +0200)]
selftests: bridge_mdb: Add MDB bulk deletion test

Add test cases to verify the behavior of the MDB bulk deletion
functionality in the bridge driver.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agortnetlink: bridge: Enable MDB bulk deletion
Ido Schimmel [Sun, 17 Dec 2023 08:32:42 +0000 (10:32 +0200)]
rtnetlink: bridge: Enable MDB bulk deletion

Now that both the common code as well as individual drivers support MDB
bulk deletion, allow user space to make such requests.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agovxlan: mdb: Add MDB bulk deletion support
Ido Schimmel [Sun, 17 Dec 2023 08:32:41 +0000 (10:32 +0200)]
vxlan: mdb: Add MDB bulk deletion support

Implement MDB bulk deletion support in the VXLAN driver, allowing MDB
entries to be deleted in bulk according to provided parameters.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agobridge: mdb: Add MDB bulk deletion support
Ido Schimmel [Sun, 17 Dec 2023 08:32:40 +0000 (10:32 +0200)]
bridge: mdb: Add MDB bulk deletion support

Implement MDB bulk deletion support in the bridge driver, allowing MDB
entries to be deleted in bulk according to provided parameters.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agortnetlink: bridge: Invoke MDB bulk deletion when needed
Ido Schimmel [Sun, 17 Dec 2023 08:32:39 +0000 (10:32 +0200)]
rtnetlink: bridge: Invoke MDB bulk deletion when needed

Invoke the new MDB bulk deletion device operation when the 'NLM_F_BULK'
flag is set in the netlink message header.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agonet: Add MDB bulk deletion device operation
Ido Schimmel [Sun, 17 Dec 2023 08:32:38 +0000 (10:32 +0200)]
net: Add MDB bulk deletion device operation

Add MDB net device operation that will be invoked by rtnetlink code in
response to received 'RTM_DELMDB' messages with the 'NLM_F_BULK' flag
set. Subsequent patches will implement the operation in the bridge and
VXLAN drivers.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agortnetlink: bridge: Use a different policy for MDB bulk delete
Ido Schimmel [Sun, 17 Dec 2023 08:32:37 +0000 (10:32 +0200)]
rtnetlink: bridge: Use a different policy for MDB bulk delete

For MDB bulk delete we will need to validate 'MDBA_SET_ENTRY'
differently compared to regular delete. Specifically, allow the ifindex
to be zero (in case not filtering on bridge port) and force the address
to be zero as bulk delete based on address is not supported.

Do that by introducing a new policy and choosing the correct policy
based on the presence of the 'NLM_F_BULK' flag in the netlink message
header. Use nlmsg_parse() for strict validation.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agobridge: add MDB state mask uAPI attribute
Ido Schimmel [Sun, 17 Dec 2023 08:32:36 +0000 (10:32 +0200)]
bridge: add MDB state mask uAPI attribute

Currently, the 'state' field in 'struct br_port_msg' can be set to 1 if
the MDB entry is permanent or 0 if it is temporary. Additional states
might be added in the future.

In a similar fashion to 'NDA_NDM_STATE_MASK', add an MDB state mask uAPI
attribute that will allow the upcoming bulk deletion API to bulk delete
MDB entries with a certain state or any state.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agonet: stmmac: fix incorrect flag check in timestamp interrupt
Lai Peter Jun Ann [Mon, 18 Dec 2023 07:51:32 +0000 (15:51 +0800)]
net: stmmac: fix incorrect flag check in timestamp interrupt

The driver should continue get the timestamp if STMMAC_FLAG_EXT_SNAPSHOT_EN
flag is set.

Fixes: aa5513f5d95f ("net: stmmac: replace the ext_snapshot_en field with a flag")
Cc: <stable@vger.kernel.org> # 6.6
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Lai Peter Jun Ann <jun.ann.lai@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoocteontx2-af: insert space after include
Wang Jinchao [Mon, 18 Dec 2023 07:04:07 +0000 (15:04 +0800)]
octeontx2-af: insert space after include

Maintain Consistent Formatting: Insert Space after #include

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Wang Jinchao <wangjinchao@xfusion.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agoMerge tag 'for-net-2023-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/bluet...
David S. Miller [Wed, 20 Dec 2023 11:12:12 +0000 (11:12 +0000)]
Merge tag 'for-net-2023-12-15' of git://git./linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Add encryption key size check when acting as peripheral
 - Shut up false-positive build warning
 - Send reject if L2CAP command request is corrupted
 - Fix Use-After-Free in bt_sock_recvmsg
 - Fix not notifying when connection encryption changes
 - Fix not checking if HCI_OP_INQUIRY has been sent
 - Fix address type send over to the MGMT interface
 - Fix deadlock in vhci_send_frame
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 months agobcachefs: Fix bch2_alloc_sectors_start_trans() error handling
Kent Overstreet [Tue, 19 Dec 2023 22:16:34 +0000 (17:16 -0500)]
bcachefs: Fix bch2_alloc_sectors_start_trans() error handling

When we fail to allocate because of insufficient open buckets, we don't
want to retry from the full set of devices - we just want to retry in
blocking mode.

But if the retry in blocking mode fails with a different error code, we
end up squashing the -BCH_ERR_open_buckets_empty error with an error
that makes us thing we won't be able to allocate (insufficient_devices)
- which is incorrect when we didn't try to allocate from the full set of
devices, and causes the write to fail.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs; guard against overflow in btree node split
Kent Overstreet [Mon, 18 Dec 2023 04:31:26 +0000 (23:31 -0500)]
bcachefs; guard against overflow in btree node split

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agobcachefs: btree_node_u64s_with_format() takes nr keys
Kent Overstreet [Mon, 18 Dec 2023 04:20:59 +0000 (23:20 -0500)]
bcachefs: btree_node_u64s_with_format() takes nr keys

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 months agoMerge tag 'trace-v6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Tue, 19 Dec 2023 20:25:43 +0000 (12:25 -0800)]
Merge tag 'trace-v6.7-rc6' of git://git./linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:
 "While working on the ring buffer, I found one more bug with the
  timestamp code, and the fix for this removed the need for the final
  64-bit cmpxchg!

  The ring buffer events hold a "delta" from the previous event. If it
  is determined that the delta can not be calculated, it falls back to
  adding an absolute timestamp value. The way to know if the delta can
  be used is via two stored timestamps in the per-cpu buffer meta data:

   before_stamp and write_stamp

  The before_stamp is written by every event before it tries to allocate
  its space on the ring buffer. The write_stamp is written after it
  allocates its space and knows that nothing came in after it read the
  previous before_stamp and write_stamp and the two matched.

  A previous fix dd9394257078 ("ring-buffer: Do not try to put back
  write_stamp") removed putting back the write_stamp to match the
  before_stamp so that the next event could use the delta, but races
  were found where the two would match, but not be for of the previous
  event.

  It was determined to allow the event reservation to not have a valid
  write_stamp when it is finished, and this fixed a lot of races.

  The last use of the 64-bit timestamp cmpxchg depended on the
  write_stamp being valid after an interruption. But this is no longer
  the case, as if an event is interrupted by a softirq that writes an
  event, and that event gets interrupted by a hardirq or NMI and that
  writes an event, then the softirq could finish its reservation without
  a valid write_stamp.

  In the slow path of the event reservation, a delta can still be used
  if the write_stamp is valid. Instead of using a cmpxchg against the
  write stamp, the before_stamp needs to be read again to validate the
  write_stamp. The cmpxchg is not needed.

  This updates the slowpath to validate the write_stamp by comparing it
  to the before_stamp and removes all rb_time_cmpxchg() as there are no
  more users of that function.

  The removal of the 32-bit updates of rb_time_t will be done in the
  next merge window"

* tag 'trace-v6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Fix slowpath of interrupted event

9 months agoMerge tag 'arc-6.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Linus Torvalds [Tue, 19 Dec 2023 20:19:25 +0000 (12:19 -0800)]
Merge tag 'arc-6.7-fixes' of git://git./linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:

 - build error for hugetlb, sparse and smatch fixes

 - removal of VIPT aliasing cache code

* tag 'arc-6.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: add hugetlb definitions
  ARC: fix smatch warning
  ARC: fix spare error
  ARC: mm: retire support for aliasing VIPT D$
  ARC: entry: move ARCompact specific bits out of entry.h
  ARC: entry: SAVE_ABI_CALLEE_REG: ISA/ABI specific helper

9 months agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf...
Paolo Abeni [Tue, 19 Dec 2023 17:35:28 +0000 (18:35 +0100)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-12-19

Hi David, hi Jakub, hi Paolo, hi Eric,

The following pull-request contains BPF updates for your *net-next* tree.

We've added 2 non-merge commits during the last 1 day(s) which contain
a total of 40 files changed, 642 insertions(+), 2926 deletions(-).

The main changes are:

1) Revert all of BPF token-related patches for now as per list discussion [0],
   from Andrii Nakryiko.

   [0] https://lore.kernel.org/bpf/CAHk-=wg7JuFYwGy=GOMbRCtOL+jwSQsdUaBsRWkDVYbxipbM5A@mail.gmail.com

2) Fix a syzbot-reported use-after-free read in nla_find() triggered from
   bpf_skb_get_nlattr_nest() helper, from Jakub Kicinski.

bpf-next-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  Revert BPF token-related functionality
  bpf: Use nla_ok() instead of checking nla_len directly
====================

Link: https://lore.kernel.org/r/20231219170359.11035-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agocifs: do not let cifs_chan_update_iface deallocate channels
Shyam Prasad N [Fri, 15 Dec 2023 17:16:56 +0000 (17:16 +0000)]
cifs: do not let cifs_chan_update_iface deallocate channels

cifs_chan_update_iface is meant to check and update the server
interface used for a channel when the existing server interface
is no longer available.

So far, this handler had the code to remove an interface entry
even if a new candidate interface is not available. Allowing
this leads to several corner cases to handle.

This change makes the logic much simpler by not deallocating
the current channel interface entry if a new interface is not
found to replace it with.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
9 months agocifs: fix a pending undercount of srv_count
Shyam Prasad N [Fri, 15 Dec 2023 17:16:55 +0000 (17:16 +0000)]
cifs: fix a pending undercount of srv_count

The following commit reverted the changes to ref count
the server struct while scheduling a reconnect work:
823342524868 Revert "cifs: reconnect work should have reference on server struct"

However, a following change also introduced scheduling
of reconnect work, and assumed ref counting. This change
fixes that as well.

Fixes umount problems like:

[73496.157838] CPU: 5 PID: 1321389 Comm: umount Tainted: G        W  OE      6.7.0-060700rc6-generic #202312172332
[73496.157841] Hardware name: LENOVO 20MAS08500/20MAS08500, BIOS N2CET67W (1.50 ) 12/15/2022
[73496.157843] RIP: 0010:cifs_put_tcp_session+0x17d/0x190 [cifs]
[73496.157906] Code: 5d 31 c0 31 d2 31 f6 31 ff c3 cc cc cc cc e8 4a 6e 14 e6 e9 f6 fe ff ff be 03 00 00 00 48 89 d7 e8 78 26 b3 e5 e9 e4 fe ff ff <0f> 0b e9 b1 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90
[73496.157908] RSP: 0018:ffffc90003bcbcb8 EFLAGS: 00010286
[73496.157911] RAX: 00000000ffffffff RBX: ffff8885830fa800 RCX: 0000000000000000
[73496.157913] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[73496.157915] RBP: ffffc90003bcbcc8 R08: 0000000000000000 R09: 0000000000000000
[73496.157917] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[73496.157918] R13: ffff8887d56ba800 R14: 00000000ffffffff R15: ffff8885830fa800
[73496.157920] FS:  00007f1ff0e33800(0000) GS:ffff88887ba80000(0000) knlGS:0000000000000000
[73496.157922] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[73496.157924] CR2: 0000115f002e2010 CR3: 00000003d1e24005 CR4: 00000000003706f0
[73496.157926] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[73496.157928] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[73496.157929] Call Trace:
[73496.157931]  <TASK>
[73496.157933]  ? show_regs+0x6d/0x80
[73496.157936]  ? __warn+0x89/0x160
[73496.157939]  ? cifs_put_tcp_session+0x17d/0x190 [cifs]
[73496.157976]  ? report_bug+0x17e/0x1b0
[73496.157980]  ? handle_bug+0x51/0xa0
[73496.157983]  ? exc_invalid_op+0x18/0x80
[73496.157985]  ? asm_exc_invalid_op+0x1b/0x20
[73496.157989]  ? cifs_put_tcp_session+0x17d/0x190 [cifs]
[73496.158023]  ? cifs_put_tcp_session+0x1e/0x190 [cifs]
[73496.158057]  __cifs_put_smb_ses+0x2b5/0x540 [cifs]
[73496.158090]  ? tconInfoFree+0xc2/0x120 [cifs]
[73496.158130]  cifs_put_tcon.part.0+0x108/0x2b0 [cifs]
[73496.158173]  cifs_put_tlink+0x49/0x90 [cifs]
[73496.158220]  cifs_umount+0x56/0xb0 [cifs]
[73496.158258]  cifs_kill_sb+0x52/0x60 [cifs]
[73496.158306]  deactivate_locked_super+0x32/0xc0
[73496.158309]  deactivate_super+0x46/0x60
[73496.158311]  cleanup_mnt+0xc3/0x170
[73496.158314]  __cleanup_mnt+0x12/0x20
[73496.158330]  task_work_run+0x5e/0xa0
[73496.158333]  exit_to_user_mode_loop+0x105/0x130
[73496.158336]  exit_to_user_mode_prepare+0xa5/0xb0
[73496.158338]  syscall_exit_to_user_mode+0x29/0x60
[73496.158341]  do_syscall_64+0x6c/0xf0
[73496.158344]  ? syscall_exit_to_user_mode+0x37/0x60
[73496.158346]  ? do_syscall_64+0x6c/0xf0
[73496.158349]  ? exit_to_user_mode_prepare+0x30/0xb0
[73496.158353]  ? syscall_exit_to_user_mode+0x37/0x60
[73496.158355]  ? do_syscall_64+0x6c/0xf0

Reported-by: Robert Morris <rtm@csail.mit.edu>
Fixes: 705fc522fe9d ("cifs: handle when server starts supporting multichannel")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
9 months agos390: update defconfigs
Heiko Carstens [Thu, 7 Dec 2023 14:24:34 +0000 (15:24 +0100)]
s390: update defconfigs

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
9 months agofs: cifs: Fix atime update check
Zizhi Wo [Wed, 13 Dec 2023 02:23:53 +0000 (10:23 +0800)]
fs: cifs: Fix atime update check

Commit 9b9c5bea0b96 ("cifs: do not return atime less than mtime") indicates
that in cifs, if atime is less than mtime, some apps will break.
Therefore, it introduce a function to compare this two variables in two
places where atime is updated. If atime is less than mtime, update it to
mtime.

However, the patch was handled incorrectly, resulting in atime and mtime
being exactly equal. A previous commit 69738cfdfa70 ("fs: cifs: Fix atime
update check vs mtime") fixed one place and forgot to fix another. Fix it.

Fixes: 9b9c5bea0b96 ("cifs: do not return atime less than mtime")
Cc: stable@vger.kernel.org
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
9 months agosmb: client: fix potential OOB in smb2_dump_detail()
Paulo Alcantara [Tue, 19 Dec 2023 16:10:31 +0000 (13:10 -0300)]
smb: client: fix potential OOB in smb2_dump_detail()

Validate SMB message with ->check_message() before calling
->calc_smb_size().

This fixes CVE-2023-6610.

Reported-by: j51569436@gmail.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218219
Cc; stable@vger.kernel.org
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
9 months agoRevert BPF token-related functionality
Andrii Nakryiko [Tue, 19 Dec 2023 15:37:35 +0000 (07:37 -0800)]
Revert BPF token-related functionality

This patch includes the following revert (one  conflicting BPF FS
patch and three token patch sets, represented by merge commits):
  - revert 0f5d5454c723 "Merge branch 'bpf-fs-mount-options-parsing-follow-ups'";
  - revert 750e785796bb "bpf: Support uid and gid when mounting bpffs";
  - revert 733763285acf "Merge branch 'bpf-token-support-in-libbpf-s-bpf-object'";
  - revert c35919dcce28 "Merge branch 'bpf-token-and-bpf-fs-based-delegation'".

Link: https://lore.kernel.org/bpf/CAHk-=wg7JuFYwGy=GOMbRCtOL+jwSQsdUaBsRWkDVYbxipbM5A@mail.gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
9 months agoMerge branch 'devlink-introduce-notifications-filtering'
Paolo Abeni [Tue, 19 Dec 2023 14:31:43 +0000 (15:31 +0100)]
Merge branch 'devlink-introduce-notifications-filtering'

Jiri Pirko says:

====================
devlink: introduce notifications filtering

From: Jiri Pirko <jiri@nvidia.com>

Currently the user listening on a socket for devlink notifications
gets always all messages for all existing devlink instances and objects,
even if he is interested only in one of those. That may cause
unnecessary overhead on setups with thousands of instances present.

User is currently able to narrow down the devlink objects replies
to dump commands by specifying select attributes.

Allow similar approach for notifications providing user a new
notify-filter-set command to select attributes with values
the notification message has to match. In that case, it is delivered
to the socket.

Note that the filtering is done per-socket, so multiple users may
specify different selection of attributes with values.

This patchset initially introduces support for following attributes:
DEVLINK_ATTR_BUS_NAME
DEVLINK_ATTR_DEV_NAME
DEVLINK_ATTR_PORT_INDEX

Patches #1 - #4 are preparations in devlink code, patch #3 is
                an optimization done on the way.
Patches #5 - #7 are preparations in netlink and generic netlink code.
Patch #8 is the main one in this set implementing of
         the notify-filter-set command and the actual
         per-socket filtering.
Patch #9 extends the infrastructure allowing to filter according
         to a port index.

Example:
$ devlink mon port pci/0000:08:00.0/32768
[port,new] pci/0000:08:00.0/32768: type notset flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,new] pci/0000:08:00.0/32768: type eth flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,new] pci/0000:08:00.0/32768: type eth netdev eth3 flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,new] pci/0000:08:00.0/32768: type eth netdev eth3 flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,new] pci/0000:08:00.0/32768: type eth flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,new] pci/0000:08:00.0/32768: type notset flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,del] pci/0000:08:00.0/32768: type notset flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
====================

Link: https://lore.kernel.org/r/20231216123001.1293639-1-jiri@resnulli.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agodevlink: extend multicast filtering by port index
Jiri Pirko [Sat, 16 Dec 2023 12:30:01 +0000 (13:30 +0100)]
devlink: extend multicast filtering by port index

Expose the previously introduced notification multicast messages
filtering infrastructure and allow the user to select messages using
port index.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agodevlink: add a command to set notification filter and use it for multicasts
Jiri Pirko [Sat, 16 Dec 2023 12:30:00 +0000 (13:30 +0100)]
devlink: add a command to set notification filter and use it for multicasts

Currently the user listening on a socket for devlink notifications
gets always all messages for all existing instances, even if he is
interested only in one of those. That may cause unnecessary overhead
on setups with thousands of instances present.

User is currently able to narrow down the devlink objects replies
to dump commands by specifying select attributes.

Allow similar approach for notifications. Introduce a new devlink
NOTIFY_FILTER_SET which the user passes the select attributes. Store
these per-socket and use them for filtering messages
during multicast send.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agogenetlink: introduce helpers to do filtered multicast
Jiri Pirko [Sat, 16 Dec 2023 12:29:59 +0000 (13:29 +0100)]
genetlink: introduce helpers to do filtered multicast

Currently it is possible for netlink kernel user to pass custom
filter function to broadcast send function netlink_broadcast_filtered().
However, this is not exposed to multicast send and to generic
netlink users.

Extend the api and introduce a netlink helper nlmsg_multicast_filtered()
and a generic netlink helper genlmsg_multicast_netns_filtered()
to allow generic netlink families to specify filter function
while sending multicast messages.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agonetlink: introduce typedef for filter function
Jiri Pirko [Sat, 16 Dec 2023 12:29:58 +0000 (13:29 +0100)]
netlink: introduce typedef for filter function

Make the code using filter function a bit nicer by consolidating the
filter function arguments using typedef.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agogenetlink: introduce per-sock family private storage
Jiri Pirko [Sat, 16 Dec 2023 12:29:57 +0000 (13:29 +0100)]
genetlink: introduce per-sock family private storage

Introduce an xarray for Generic netlink family to store per-socket
private. Initialize this xarray only if family uses per-socket privs.

Introduce genl_sk_priv_get() to get the socket priv pointer for a family
and initialize it in case it does not exist.
Introduce __genl_sk_priv_get() to obtain socket priv pointer for a
family under RCU read lock.

Allow family to specify the priv size, init() and destroy() callbacks.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agodevlink: introduce a helper for netlink multicast send
Jiri Pirko [Sat, 16 Dec 2023 12:29:56 +0000 (13:29 +0100)]
devlink: introduce a helper for netlink multicast send

Introduce a helper devlink_nl_notify_send() so each object notification
function does not have to call genlmsg_multicast_netns() with the same
arguments.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agodevlink: send notifications only if there are listeners
Jiri Pirko [Sat, 16 Dec 2023 12:29:55 +0000 (13:29 +0100)]
devlink: send notifications only if there are listeners

Introduce devlink_nl_notify_need() helper and using it to check at the
beginning of notification functions to avoid overhead of composing
notification messages in case nobody listens.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agodevlink: introduce __devl_is_registered() helper and use it instead of xa_get_mark()
Jiri Pirko [Sat, 16 Dec 2023 12:29:54 +0000 (13:29 +0100)]
devlink: introduce __devl_is_registered() helper and use it instead of xa_get_mark()

Introduce __devl_is_registered() which does not assert on devlink
instance lock and use it in notifications which may be called
without devlink instance lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agodevlink: use devl_is_registered() helper instead xa_get_mark()
Jiri Pirko [Sat, 16 Dec 2023 12:29:53 +0000 (13:29 +0100)]
devlink: use devl_is_registered() helper instead xa_get_mark()

Instead of checking the xarray mark directly using xa_get_mark() helper
use devl_is_registered() helper which wraps it up. Note that there are
couple more users of xa_get_mark() left which are going to be handled
by the next patch.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agobpf: Use nla_ok() instead of checking nla_len directly
Jakub Kicinski [Mon, 18 Dec 2023 23:19:04 +0000 (15:19 -0800)]
bpf: Use nla_ok() instead of checking nla_len directly

nla_len may also be too short to be sane, in which case after
recent changes nla_len() will return a wrapped value.

Fixes: 172db56d90d2 ("netlink: Return unsigned value for nla_len()")
Reported-by: syzbot+f43a23b6e622797c7a28@syzkaller.appspotmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/bpf/20231218231904.260440-1-kuba@kernel.org
9 months agoMerge branch 'check-vlan-filter-feature-in-vlan_vids_add_by_dev-and-vlan_vids_del_by_dev'
Paolo Abeni [Tue, 19 Dec 2023 12:13:59 +0000 (13:13 +0100)]
Merge branch 'check-vlan-filter-feature-in-vlan_vids_add_by_dev-and-vlan_vids_del_by_dev'

Liu Jian says:

====================
check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev()

v2->v3:
Filter using vlan_hw_filter_capable().
Add one basic test.
====================

Link: https://lore.kernel.org/r/20231216075219.2379123-1-liujian56@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoselftests: add vlan hw filter tests
Liu Jian [Sat, 16 Dec 2023 07:52:19 +0000 (15:52 +0800)]
selftests: add vlan hw filter tests

Add one basic vlan hw filter test.

Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agonet: check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev()
Liu Jian [Sat, 16 Dec 2023 07:52:18 +0000 (15:52 +0800)]
net: check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev()

I got the below warning trace:

WARNING: CPU: 4 PID: 4056 at net/core/dev.c:11066 unregister_netdevice_many_notify
CPU: 4 PID: 4056 Comm: ip Not tainted 6.7.0-rc4+ #15
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:unregister_netdevice_many_notify+0x9a4/0x9b0
Call Trace:
 rtnl_dellink
 rtnetlink_rcv_msg
 netlink_rcv_skb
 netlink_unicast
 netlink_sendmsg
 __sock_sendmsg
 ____sys_sendmsg
 ___sys_sendmsg
 __sys_sendmsg
 do_syscall_64
 entry_SYSCALL_64_after_hwframe

It can be repoduced via:

    ip netns add ns1
    ip netns exec ns1 ip link add bond0 type bond mode 0
    ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2
    ip netns exec ns1 ip link set bond_slave_1 master bond0
[1] ip netns exec ns1 ethtool -K bond0 rx-vlan-filter off
[2] ip netns exec ns1 ip link add link bond_slave_1 name bond_slave_1.0 type vlan id 0
[3] ip netns exec ns1 ip link add link bond0 name bond0.0 type vlan id 0
[4] ip netns exec ns1 ip link set bond_slave_1 nomaster
[5] ip netns exec ns1 ip link del veth2
    ip netns del ns1

This is all caused by command [1] turning off the rx-vlan-filter function
of bond0. The reason is the same as commit 01f4fd270870 ("bonding: Fix
incorrect deletion of ETH_P_8021AD protocol vid from slaves"). Commands
[2] [3] add the same vid to slave and master respectively, causing
command [4] to empty slave->vlan_info. The following command [5] triggers
this problem.

To fix this problem, we should add VLAN_FILTER feature checks in
vlan_vids_add_by_dev() and vlan_vids_del_by_dev() to prevent incorrect
addition or deletion of vlan_vid information.

Fixes: 348a1443cc43 ("vlan: introduce functions to do mass addition/deletion of vids by another device")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agonet: hns3: add new maintainer for the HNS3 ethernet driver
Jijie Shao [Sat, 16 Dec 2023 07:04:13 +0000 (15:04 +0800)]
net: hns3: add new maintainer for the HNS3 ethernet driver

Jijie Shao will be responsible for
maintaining the hns3 driver's code in the future,
so add Jijie to the hns3 driver's matainer list.

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://lore.kernel.org/r/20231216070413.233668-1-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agonet: mana: select PAGE_POOL
Yury Norov [Fri, 15 Dec 2023 20:33:53 +0000 (12:33 -0800)]
net: mana: select PAGE_POOL

Mana uses PAGE_POOL API. x86_64 defconfig doesn't select it:

ld: vmlinux.o: in function `mana_create_page_pool.isra.0':
mana_en.c:(.text+0x9ae36f): undefined reference to `page_pool_create'
ld: vmlinux.o: in function `mana_get_rxfrag':
mana_en.c:(.text+0x9afed1): undefined reference to `page_pool_alloc_pages'
make[3]: *** [/home/yury/work/linux/scripts/Makefile.vmlinux:37: vmlinux] Error 1
make[2]: *** [/home/yury/work/linux/Makefile:1154: vmlinux] Error 2
make[1]: *** [/home/yury/work/linux/Makefile:234: __sub-make] Error 2
make[1]: Leaving directory '/home/yury/work/build-linux-x86_64'
make: *** [Makefile:234: __sub-make] Error 2

So we need to select it explicitly.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Fixes: ca9c54d2 ("net: mana: Add a driver for Microsoft Azure Network Adapter")
Link: https://lore.kernel.org/r/20231215203353.635379-1-yury.norov@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoMerge branch 'add-pf-vf-mailbox-support'
Paolo Abeni [Tue, 19 Dec 2023 11:00:55 +0000 (12:00 +0100)]
Merge branch 'add-pf-vf-mailbox-support'

Shinas Rasheed says:

====================
add PF-VF mailbox support

This patchset aims to add PF-VF mailbox support, its related
version support, and relevant control net support for immediate
functionalities such as firmware notifications to VF.

Changes:
V6:
  - Fixed 1/4 patch to apply to top of net-next merged with net fixes

V5: https://lore.kernel.org/all/20231214164536.2670006-1-srasheed@marvell.com/
  - Refactored patches to cut out redundant changes in 1/4 patch.

V4: https://lore.kernel.org/all/20231213035816.2656851-1-srasheed@marvell.com/
  - Included tag [1/4] in subject of first patch of series which was
    lost in V3

V3: https://lore.kernel.org/all/20231211063355.2630028-1-srasheed@marvell.com/
  - Corrected error cleanup logic for PF-VF mbox setup
  - Removed double inclusion of types.h header file in octep_pfvf_mbox.c

V2: https://lore.kernel.org/all/20231209081450.2613561-1-srasheed@marvell.com/
  - Removed unused variable in PATCH 1/4

V1: https://lore.kernel.org/all/20231208070352.2606192-1-srasheed@marvell.com/
====================

Link: https://lore.kernel.org/r/20231215181425.2681426-1-srasheed@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoocteon_ep: support firmware notifications for VFs
Shinas Rasheed [Fri, 15 Dec 2023 18:14:25 +0000 (10:14 -0800)]
octeon_ep: support firmware notifications for VFs

Notifications from firmware to vf has to pass through PF
control mbox and via PF-VF mailboxes. The notifications have to
be parsed out from the control mbox and passed to the
PF-VF mailbox in order to reach the corresponding VF.
Version compatibility should also be checked before messages
are passed to the mailboxes.

Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoocteon_ep: control net framework to support VF offloads
Shinas Rasheed [Fri, 15 Dec 2023 18:14:24 +0000 (10:14 -0800)]
octeon_ep: control net framework to support VF offloads

Inquire firmware on supported offloads, as well as convey offloads
enabled dynamically to firmware for the VFs. Implement control net API
to support the same.

Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoocteon_ep: PF-VF mailbox version support
Shinas Rasheed [Fri, 15 Dec 2023 18:14:23 +0000 (10:14 -0800)]
octeon_ep: PF-VF mailbox version support

Add PF-VF mailbox initial version support

Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoocteon_ep: add PF-VF mailbox communication
Shinas Rasheed [Fri, 15 Dec 2023 18:14:22 +0000 (10:14 -0800)]
octeon_ep: add PF-VF mailbox communication

Implement mailbox communication between PF and VFs.
PF-VF mailbox is used for all control commands from VF to PF and
asynchronous notification messages from PF to VF.

Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agonet: ks8851: Fix TX stall caused by TX buffer overrun
Ronald Wahl [Thu, 14 Dec 2023 18:11:12 +0000 (19:11 +0100)]
net: ks8851: Fix TX stall caused by TX buffer overrun

There is a bug in the ks8851 Ethernet driver that more data is written
to the hardware TX buffer than actually available. This is caused by
wrong accounting of the free TX buffer space.

The driver maintains a tx_space variable that represents the TX buffer
space that is deemed to be free. The ks8851_start_xmit_spi() function
adds an SKB to a queue if tx_space is large enough and reduces tx_space
by the amount of buffer space it will later need in the TX buffer and
then schedules a work item. If there is not enough space then the TX
queue is stopped.

The worker function ks8851_tx_work() dequeues all the SKBs and writes
the data into the hardware TX buffer. The last packet will trigger an
interrupt after it was send. Here it is assumed that all data fits into
the TX buffer.

In the interrupt routine (which runs asynchronously because it is a
threaded interrupt) tx_space is updated with the current value from the
hardware. Also the TX queue is woken up again.

Now it could happen that after data was sent to the hardware and before
handling the TX interrupt new data is queued in ks8851_start_xmit_spi()
when the TX buffer space had still some space left. When the interrupt
is actually handled tx_space is updated from the hardware but now we
already have new SKBs queued that have not been written to the hardware
TX buffer yet. Since tx_space has been overwritten by the value from the
hardware the space is not accounted for.

Now we have more data queued then buffer space available in the hardware
and ks8851_tx_work() will potentially overrun the hardware TX buffer. In
many cases it will still work because often the buffer is written out
fast enough so that no overrun occurs but for example if the peer
throttles us via flow control then an overrun may happen.

This can be fixed in different ways. The most simple way would be to set
tx_space to 0 before writing data to the hardware TX buffer preventing
the queuing of more SKBs until the TX interrupt has been handled. I have
chosen a slightly more efficient (and still rather simple) way and
track the amount of data that is already queued and not yet written to
the hardware. When new SKBs are to be queued the already queued amount
of data is honoured when checking free TX buffer space.

I tested this with a setup of two linked KS8851 running iperf3 between
the two in bidirectional mode. Before the fix I got a stall after some
minutes. With the fix I saw now issues anymore after hours.

Fixes: 3ba81f3ece3c ("net: Micrel KS8851 SPI network driver")
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231214181112.76052-1-rwahl@gmx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 months agoring-buffer: Fix slowpath of interrupted event
Steven Rostedt (Google) [Tue, 19 Dec 2023 04:07:12 +0000 (23:07 -0500)]
ring-buffer: Fix slowpath of interrupted event

To synchronize the timestamps with the ring buffer reservation, there are
two timestamps that are saved in the buffer meta data.

1. before_stamp
2. write_stamp

When the two are equal, the write_stamp is considered valid, as in, it may
be used to calculate the delta of the next event as the write_stamp is the
timestamp of the previous reserved event on the buffer.

This is done by the following:

 /*A*/ w = current position on the ring buffer
before = before_stamp
after = write_stamp
ts = read current timestamp

if (before != after) {
write_stamp is not valid, force adding an absolute
timestamp.
}

 /*B*/ before_stamp = ts

 /*C*/ write = local_add_return(event length, position on ring buffer)

if (w == write - event length) {
/* Nothing interrupted between A and C */
 /*E*/ write_stamp = ts;
delta = ts - after
/*
 * If nothing interrupted again,
 * before_stamp == write_stamp and write_stamp
 * can be used to calculate the delta for
 * events that come in after this one.
 */
} else {

/*
 * The slow path!
 * Was interrupted between A and C.
 */

This is the place that there's a bug. We currently have:

after = write_stamp
ts = read current timestamp

 /*F*/ if (write == current position on the ring buffer &&
    after < ts && cmpxchg(write_stamp, after, ts)) {

delta = ts - after;

} else {
delta = 0;
}

The assumption is that if the current position on the ring buffer hasn't
moved between C and F, then it also was not interrupted, and that the last
event written has a timestamp that matches the write_stamp. That is the
write_stamp is valid.

But this may not be the case:

If a task context event was interrupted by softirq between B and C.

And the softirq wrote an event that got interrupted by a hard irq between
C and E.

and the hard irq wrote an event (does not need to be interrupted)

We have:

 /*B*/ before_stamp = ts of normal context

   ---> interrupted by softirq

/*B*/ before_stamp = ts of softirq context

  ---> interrupted by hardirq

/*B*/ before_stamp = ts of hard irq context
/*E*/ write_stamp = ts of hard irq context

/* matches and write_stamp valid */
  <----

/*E*/ write_stamp = ts of softirq context

/* No longer matches before_stamp, write_stamp is not valid! */

   <---

 w != write - length, go to slow path

// Right now the order of events in the ring buffer is:
//
// |-- softirq event --|-- hard irq event --|-- normal context event --|
//

 after = write_stamp (this is the ts of softirq)
 ts = read current timestamp

 if (write == current position on the ring buffer [true] &&
     after < ts [true] && cmpxchg(write_stamp, after, ts) [true]) {

delta = ts - after  [Wrong!]

The delta is to be between the hard irq event and the normal context
event, but the above logic made the delta between the softirq event and
the normal context event, where the hard irq event is between the two. This
will shift all the remaining event timestamps on the sub-buffer
incorrectly.

The write_stamp is only valid if it matches the before_stamp. The cmpxchg
does nothing to help this.

Instead, the following logic can be done to fix this:

before = before_stamp
ts = read current timestamp
before_stamp = ts

after = write_stamp

if (write == current position on the ring buffer &&
    after == before && after < ts) {

delta = ts - after

} else {
delta = 0;
}

The above will only use the write_stamp if it still matches before_stamp
and was tested to not have changed since C.

As a bonus, with this logic we do not need any 64-bit cmpxchg() at all!

This means the 32-bit rb_time_t workaround can finally be removed. But
that's for a later time.

Link: https://lore.kernel.org/linux-trace-kernel/20231218175229.58ec3daf@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20231218230712.3a76b081@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: dd93942570789 ("ring-buffer: Do not try to put back write_stamp")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
9 months agoMerge tag 'hid-for-linus-2023121901' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 19 Dec 2023 00:47:21 +0000 (16:47 -0800)]
Merge tag 'hid-for-linus-2023121901' of git://git./linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

 - fix for division by zero in Nintendo driver when generic joycon is
   attached, reported and fixed by SteamOS folks (Guilherme G. Piccoli)

 - GCC-7 build fix (which is a good cleanup anyway) for Nintendo driver
   (Ryan McClelland)

* tag 'hid-for-linus-2023121901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: nintendo: Prevent divide-by-zero on code
  HID: nintendo: fix initializer element is not constant error

9 months agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf...
Jakub Kicinski [Tue, 19 Dec 2023 00:46:07 +0000 (16:46 -0800)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2023-12-18

This PR is larger than usual and contains changes in various parts
of the kernel.

The main changes are:

1) Fix kCFI bugs in BPF, from Peter Zijlstra.

End result: all forms of indirect calls from BPF into kernel
and from kernel into BPF work with CFI enabled. This allows BPF
to work with CONFIG_FINEIBT=y.

2) Introduce BPF token object, from Andrii Nakryiko.

It adds an ability to delegate a subset of BPF features from privileged
daemon (e.g., systemd) through special mount options for userns-bound
BPF FS to a trusted unprivileged application. The design accommodates
suggestions from Christian Brauner and Paul Moore.

Example:
$ sudo mkdir -p /sys/fs/bpf/token
$ sudo mount -t bpf bpffs /sys/fs/bpf/token \
             -o delegate_cmds=prog_load:MAP_CREATE \
             -o delegate_progs=kprobe \
             -o delegate_attachs=xdp

3) Various verifier improvements and fixes, from Andrii Nakryiko, Andrei Matei.

 - Complete precision tracking support for register spills
 - Fix verification of possibly-zero-sized stack accesses
 - Fix access to uninit stack slots
 - Track aligned STACK_ZERO cases as imprecise spilled registers.
   It improves the verifier "instructions processed" metric from single
   digit to 50-60% for some programs.
 - Fix verifier retval logic

4) Support for VLAN tag in XDP hints, from Larysa Zaremba.

5) Allocate BPF trampoline via bpf_prog_pack mechanism, from Song Liu.

End result: better memory utilization and lower I$ miss for calls to BPF
via BPF trampoline.

6) Fix race between BPF prog accessing inner map and parallel delete,
from Hou Tao.

7) Add bpf_xdp_get_xfrm_state() kfunc, from Daniel Xu.

It allows BPF interact with IPSEC infra. The intent is to support
software RSS (via XDP) for the upcoming ipsec pcpu work.
Experiments on AWS demonstrate single tunnel pcpu ipsec reaching
line rate on 100G ENA nics.

8) Expand bpf_cgrp_storage to support cgroup1 non-attach, from Yafang Shao.

9) BPF file verification via fsverity, from Song Liu.

It allows BPF progs get fsverity digest.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (164 commits)
  bpf: Ensure precise is reset to false in __mark_reg_const_zero()
  selftests/bpf: Add more uprobe multi fail tests
  bpf: Fail uprobe multi link with negative offset
  selftests/bpf: Test the release of map btf
  s390/bpf: Fix indirect trampoline generation
  selftests/bpf: Temporarily disable dummy_struct_ops test on s390
  x86/cfi,bpf: Fix bpf_exception_cb() signature
  bpf: Fix dtor CFI
  cfi: Add CFI_NOSEAL()
  x86/cfi,bpf: Fix bpf_struct_ops CFI
  x86/cfi,bpf: Fix bpf_callback_t CFI
  x86/cfi,bpf: Fix BPF JIT call
  cfi: Flip headers
  selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment
  selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test
  selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment
  bpf: Limit the number of kprobes when attaching program to multiple kprobes
  bpf: Limit the number of uprobes when attaching program to multiple uprobes
  bpf: xdp: Register generic_kfunc_set with XDP programs
  selftests/bpf: utilize string values for delegate_xxx mount options
  ...
====================

Link: https://lore.kernel.org/r/20231219000520.34178-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>