Luiz Augusto von Dentz [Tue, 20 May 2025 19:42:21 +0000 (15:42 -0400)]
Bluetooth: MGMT: Protect mgmt_pending list with its own lock
This uses a mutex to protect from concurrent access of mgmt_pending
list which can cause crashes like:
==================================================================
BUG: KASAN: slab-use-after-free in hci_sock_get_channel+0x60/0x68 net/bluetooth/hci_sock.c:91
Read of size 2 at addr
ffff0000c48885b2 by task syz.4.334/7318
CPU: 0 UID: 0 PID: 7318 Comm: syz.4.334 Not tainted
6.15.0-rc7-syzkaller-g187899f4124a #0 PREEMPT
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call trace:
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:466 (C)
__dump_stack+0x30/0x40 lib/dump_stack.c:94
dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120
print_address_description+0xa8/0x254 mm/kasan/report.c:408
print_report+0x68/0x84 mm/kasan/report.c:521
kasan_report+0xb0/0x110 mm/kasan/report.c:634
__asan_report_load2_noabort+0x20/0x2c mm/kasan/report_generic.c:379
hci_sock_get_channel+0x60/0x68 net/bluetooth/hci_sock.c:91
mgmt_pending_find+0x7c/0x140 net/bluetooth/mgmt_util.c:223
pending_find net/bluetooth/mgmt.c:947 [inline]
remove_adv_monitor+0x44/0x1a4 net/bluetooth/mgmt.c:5445
hci_mgmt_cmd+0x780/0xc00 net/bluetooth/hci_sock.c:1712
hci_sock_sendmsg+0x544/0xbb0 net/bluetooth/hci_sock.c:1832
sock_sendmsg_nosec net/socket.c:712 [inline]
__sock_sendmsg net/socket.c:727 [inline]
sock_write_iter+0x25c/0x378 net/socket.c:1131
new_sync_write fs/read_write.c:591 [inline]
vfs_write+0x62c/0x97c fs/read_write.c:684
ksys_write+0x120/0x210 fs/read_write.c:736
__do_sys_write fs/read_write.c:747 [inline]
__se_sys_write fs/read_write.c:744 [inline]
__arm64_sys_write+0x7c/0x90 fs/read_write.c:744
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x58/0x17c arch/arm64/kernel/entry-common.c:767
el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
Allocated by task 7037:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x40/0x78 mm/kasan/common.c:68
kasan_save_alloc_info+0x44/0x54 mm/kasan/generic.c:562
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0x9c/0xb4 mm/kasan/common.c:394
kasan_kmalloc include/linux/kasan.h:260 [inline]
__do_kmalloc_node mm/slub.c:4327 [inline]
__kmalloc_noprof+0x2fc/0x4c8 mm/slub.c:4339
kmalloc_noprof include/linux/slab.h:909 [inline]
sk_prot_alloc+0xc4/0x1f0 net/core/sock.c:2198
sk_alloc+0x44/0x3ac net/core/sock.c:2254
bt_sock_alloc+0x4c/0x300 net/bluetooth/af_bluetooth.c:148
hci_sock_create+0xa8/0x194 net/bluetooth/hci_sock.c:2202
bt_sock_create+0x14c/0x24c net/bluetooth/af_bluetooth.c:132
__sock_create+0x43c/0x91c net/socket.c:1541
sock_create net/socket.c:1599 [inline]
__sys_socket_create net/socket.c:1636 [inline]
__sys_socket+0xd4/0x1c0 net/socket.c:1683
__do_sys_socket net/socket.c:1697 [inline]
__se_sys_socket net/socket.c:1695 [inline]
__arm64_sys_socket+0x7c/0x94 net/socket.c:1695
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x58/0x17c arch/arm64/kernel/entry-common.c:767
el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
Freed by task 6607:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x40/0x78 mm/kasan/common.c:68
kasan_save_free_info+0x58/0x70 mm/kasan/generic.c:576
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x68/0x88 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:233 [inline]
slab_free_hook mm/slub.c:2380 [inline]
slab_free mm/slub.c:4642 [inline]
kfree+0x17c/0x474 mm/slub.c:4841
sk_prot_free net/core/sock.c:2237 [inline]
__sk_destruct+0x4f4/0x760 net/core/sock.c:2332
sk_destruct net/core/sock.c:2360 [inline]
__sk_free+0x320/0x430 net/core/sock.c:2371
sk_free+0x60/0xc8 net/core/sock.c:2382
sock_put include/net/sock.h:1944 [inline]
mgmt_pending_free+0x88/0x118 net/bluetooth/mgmt_util.c:290
mgmt_pending_remove+0xec/0x104 net/bluetooth/mgmt_util.c:298
mgmt_set_powered_complete+0x418/0x5cc net/bluetooth/mgmt.c:1355
hci_cmd_sync_work+0x204/0x33c net/bluetooth/hci_sync.c:334
process_one_work+0x7e8/0x156c kernel/workqueue.c:3238
process_scheduled_works kernel/workqueue.c:3319 [inline]
worker_thread+0x958/0xed8 kernel/workqueue.c:3400
kthread+0x5fc/0x75c kernel/kthread.c:464
ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:847
Fixes:
a380b6cff1a2 ("Bluetooth: Add generic mgmt helper API")
Closes: https://syzkaller.appspot.com/bug?extid=
0a7039d5d9986ff4ecec
Closes: https://syzkaller.appspot.com/bug?extid=
cc0cc52e7f43dc9e6df1
Reported-by: syzbot+0a7039d5d9986ff4ecec@syzkaller.appspotmail.com
Tested-by: syzbot+0a7039d5d9986ff4ecec@syzkaller.appspotmail.com
Tested-by: syzbot+cc0cc52e7f43dc9e6df1@syzkaller.appspotmail.com
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Luiz Augusto von Dentz [Tue, 3 Jun 2025 20:12:39 +0000 (16:12 -0400)]
Bluetooth: MGMT: Fix UAF on mgmt_remove_adv_monitor_complete
This reworks MGMT_OP_REMOVE_ADV_MONITOR to not use mgmt_pending_add to
avoid crashes like bellow:
==================================================================
BUG: KASAN: slab-use-after-free in mgmt_remove_adv_monitor_complete+0xe5/0x540 net/bluetooth/mgmt.c:5406
Read of size 8 at addr
ffff88801c53f318 by task kworker/u5:5/5341
CPU: 0 UID: 0 PID: 5341 Comm: kworker/u5:5 Not tainted
6.15.0-syzkaller-10402-g4cb6c8af8591 #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:408 [inline]
print_report+0xd2/0x2b0 mm/kasan/report.c:521
kasan_report+0x118/0x150 mm/kasan/report.c:634
mgmt_remove_adv_monitor_complete+0xe5/0x540 net/bluetooth/mgmt.c:5406
hci_cmd_sync_work+0x261/0x3a0 net/bluetooth/hci_sync.c:334
process_one_work kernel/workqueue.c:3238 [inline]
process_scheduled_works+0xade/0x17b0 kernel/workqueue.c:3321
worker_thread+0x8a0/0xda0 kernel/workqueue.c:3402
kthread+0x711/0x8a0 kernel/kthread.c:464
ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
Allocated by task 5987:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:394
kasan_kmalloc include/linux/kasan.h:260 [inline]
__kmalloc_cache_noprof+0x230/0x3d0 mm/slub.c:4358
kmalloc_noprof include/linux/slab.h:905 [inline]
kzalloc_noprof include/linux/slab.h:1039 [inline]
mgmt_pending_new+0x65/0x240 net/bluetooth/mgmt_util.c:252
mgmt_pending_add+0x34/0x120 net/bluetooth/mgmt_util.c:279
remove_adv_monitor+0x103/0x1b0 net/bluetooth/mgmt.c:5454
hci_mgmt_cmd+0x9c9/0xef0 net/bluetooth/hci_sock.c:1719
hci_sock_sendmsg+0x6ca/0xef0 net/bluetooth/hci_sock.c:1839
sock_sendmsg_nosec net/socket.c:712 [inline]
__sock_sendmsg+0x219/0x270 net/socket.c:727
sock_write_iter+0x258/0x330 net/socket.c:1131
new_sync_write fs/read_write.c:593 [inline]
vfs_write+0x548/0xa90 fs/read_write.c:686
ksys_write+0x145/0x250 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 5989:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:576
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x62/0x70 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:233 [inline]
slab_free_hook mm/slub.c:2380 [inline]
slab_free mm/slub.c:4642 [inline]
kfree+0x18e/0x440 mm/slub.c:4841
mgmt_pending_foreach+0xc9/0x120 net/bluetooth/mgmt_util.c:242
mgmt_index_removed+0x10d/0x2f0 net/bluetooth/mgmt.c:9366
hci_sock_bind+0xbe9/0x1000 net/bluetooth/hci_sock.c:1314
__sys_bind_socket net/socket.c:1810 [inline]
__sys_bind+0x2c3/0x3e0 net/socket.c:1841
__do_sys_bind net/socket.c:1846 [inline]
__se_sys_bind net/socket.c:1844 [inline]
__x64_sys_bind+0x7a/0x90 net/socket.c:1844
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes:
66bd095ab5d4 ("Bluetooth: advmon offload MSFT remove monitor")
Closes: https://syzkaller.appspot.com/bug?extid=
feb0dc579bbe30a13190
Reported-by: syzbot+feb0dc579bbe30a13190@syzkaller.appspotmail.com
Tested-by: syzbot+feb0dc579bbe30a13190@syzkaller.appspotmail.com
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Chandrashekar Devegowda [Tue, 3 Jun 2025 10:04:40 +0000 (15:34 +0530)]
Bluetooth: btintel_pcie: Reduce driver buffer posting to prevent race condition
Modify the driver to post 3 fewer buffers than the maximum rx buffers
(64) allowed for the firmware. This change mitigates a hardware issue
causing a race condition in the firmware, improving stability and data
handling.
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Signed-off-by: Kiran K <kiran.k@intel.com>
Fixes:
c2b636b3f788 ("Bluetooth: btintel_pcie: Add support for PCIe transport")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Chandrashekar Devegowda [Tue, 3 Jun 2025 10:04:39 +0000 (15:34 +0530)]
Bluetooth: btintel_pcie: Increase the tx and rx descriptor count
This change addresses latency issues observed in HID use cases where
events arrive in bursts. By increasing the Rx descriptor count to 64,
the firmware can handle bursty data more effectively, reducing latency
and preventing buffer overflows.
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Signed-off-by: Kiran K <kiran.k@intel.com>
Fixes:
c2b636b3f788 ("Bluetooth: btintel_pcie: Add support for PCIe transport")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Kiran K [Tue, 3 Jun 2025 10:04:38 +0000 (15:34 +0530)]
Bluetooth: btintel_pcie: Fix driver not posting maximum rx buffers
The driver was posting only 6 rx buffers, despite the maximum rx buffers
being defined as 16. Having fewer RX buffers caused firmware exceptions
in HID use cases when events arrived in bursts.
Exception seen on android 6.12 kernel.
E Bluetooth: hci0: Received hw exception interrupt
E Bluetooth: hci0: Received gp1 mailbox interrupt
D Bluetooth: hci0:
00000000: ff 3e 87 80 03 01 01 01 03 01 0c 0d 02 1c 10 0e
D Bluetooth: hci0:
00000010: 01 00 05 14 66 b0 28 b0 c0 b0 28 b0 ac af 28 b0
D Bluetooth: hci0:
00000020: 14 f1 28 b0 00 00 00 00 fa 04 00 00 00 00 40 10
D Bluetooth: hci0:
00000030: 08 00 00 00 7a 7a 7a 7a 47 00 fb a0 10 00 00 00
D Bluetooth: hci0:
00000000: 10 01 0a
E Bluetooth: hci0: ---- Dump of debug registers —
E Bluetooth: hci0: boot stage: 0xe0fb0047
E Bluetooth: hci0: ipc status: 0x00000004
E Bluetooth: hci0: ipc control: 0x00000000
E Bluetooth: hci0: ipc sleep control: 0x00000000
E Bluetooth: hci0: mbox_1: 0x00badbad
E Bluetooth: hci0: mbox_2: 0x0000101c
E Bluetooth: hci0: mbox_3: 0x00000008
E Bluetooth: hci0: mbox_4: 0x7a7a7a7a
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Signed-off-by: Kiran K <kiran.k@intel.com>
Fixes:
c2b636b3f788 ("Bluetooth: btintel_pcie: Add support for PCIe transport")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Pauli Virtanen [Sat, 31 May 2025 15:24:58 +0000 (18:24 +0300)]
Bluetooth: hci_core: fix list_for_each_entry_rcu usage
Releasing + re-acquiring RCU lock inside list_for_each_entry_rcu() loop
body is not correct.
Fix by taking the update-side hdev->lock instead.
Fixes:
c7eaf80bfb0c ("Bluetooth: Fix hci_link_tx_to RCU lock usage")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Eric Dumazet [Wed, 4 Jun 2025 13:38:26 +0000 (13:38 +0000)]
calipso: unlock rcu before returning -EAFNOSUPPORT
syzbot reported that a recent patch forgot to unlock rcu
in the error path.
Adopt the convention that netlbl_conn_setattr() is already using.
Fixes:
6e9f2df1c550 ("calipso: Don't call calipso functions for AF_INET sk.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://patch.msgid.link/20250604133826.1667664-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Wed, 4 Jun 2025 11:32:52 +0000 (14:32 +0300)]
seg6: Fix validation of nexthop addresses
The kernel currently validates that the length of the provided nexthop
address does not exceed the specified length. This can lead to the
kernel reading uninitialized memory if user space provided a shorter
length than the specified one.
Fix by validating that the provided length exactly matches the specified
one.
Fixes:
d1df6fd8a1d2 ("ipv6: sr: define core operations for seg6local lightweight tunnel")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250604113252.371528-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 4 Jun 2025 10:58:15 +0000 (10:58 +0000)]
net: prevent a NULL deref in rtnl_create_link()
At the time rtnl_create_link() is running, dev->netdev_ops is NULL,
we must not use netdev_lock_ops() or risk a NULL deref if
CONFIG_NET_SHAPER is defined.
Use netif_set_group() instead of dev_set_group().
RIP: 0010:netdev_need_ops_lock include/net/netdev_lock.h:33 [inline]
RIP: 0010:netdev_lock_ops include/net/netdev_lock.h:41 [inline]
RIP: 0010:dev_set_group+0xc0/0x230 net/core/dev_api.c:82
Call Trace:
<TASK>
rtnl_create_link+0x748/0xd10 net/core/rtnetlink.c:3674
rtnl_newlink_create+0x25c/0xb00 net/core/rtnetlink.c:3813
__rtnl_newlink net/core/rtnetlink.c:3940 [inline]
rtnl_newlink+0x16d6/0x1c70 net/core/rtnetlink.c:4055
rtnetlink_rcv_msg+0x7cf/0xb70 net/core/rtnetlink.c:6944
netlink_rcv_skb+0x208/0x470 net/netlink/af_netlink.c:2534
netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
netlink_unicast+0x75b/0x8d0 net/netlink/af_netlink.c:1339
netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1883
sock_sendmsg_nosec net/socket.c:712 [inline]
Reported-by: syzbot+9fc858ba0312b42b577e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
6840265f.
a00a0220.d4325.0009.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes:
7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250604105815.1516973-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 4 Jun 2025 09:39:28 +0000 (09:39 +0000)]
net: annotate data-races around cleanup_net_task
from_cleanup_net() reads cleanup_net_task locklessly.
Add READ_ONCE()/WRITE_ONCE() annotations to avoid
a potential KCSAN warning, even if the race is harmless.
Fixes:
0734d7c3d93c ("net: expedite synchronize_net() for cleanup_net()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20250604093928.1323333-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 4 Jun 2025 01:20:55 +0000 (18:20 -0700)]
selftests: drv-net: tso: make bkg() wait for socat to quit
Commit
846742f7e32f ("selftests: drv-net: add a warning for
bkg + shell + terminate") added a warning for bkg() used
with terminate=True. The tso test was missed as we didn't
have it running anywhere in NIPA. Add exit_wait=True, to avoid:
# Warning: combining shell and terminate is risky!
# SIGTERM may not reach the child on zsh/ksh!
getting printed twice for every variant.
Fixes:
0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250604012055.891431-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 4 Jun 2025 01:20:31 +0000 (18:20 -0700)]
selftests: drv-net: tso: fix the GRE device name
The device type for IPv4 GRE is "gre" not "ipgre",
unlike for IPv6 which uses "ip6gre".
Not sure how I missed this when writing the test, perhaps
because all HW I have access to is on an IPv6-only network.
Fixes:
0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250604012031.891242-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 4 Jun 2025 00:16:52 +0000 (17:16 -0700)]
selftests: drv-net: add configs for the TSO test
Add missing config options for the tso.py test, specifically
to make sure the kernel is built with vxlan and gre tunnels.
I noticed this while adding a TSO-capable device QEMU to the CI.
Previously we only run virtio tests and it doesn't report LSO
stats on the QEMU we have.
Fixes:
0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250604001653.853008-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jun 2025 14:59:31 +0000 (07:59 -0700)]
Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
iavf: get rid of the crit lock
Przemek Kitszel says:
Fix some deadlocks in iavf, and make it less error prone for the future.
Patch 1 is simple and independent from the rest.
Patches 2, 3, 4 are strictly a refactor, but it enables the last patch
to be much smaller.
(Technically Jake given his RB tags not knowing I will send it to -net).
Patch 5 just adds annotations, this also helps prove last patch to be correct.
Patch 6 removes the crit lock, with its unusual try_lock()s.
I have more refactoring for scheduling done for -next, to be sent soon.
There is a simple test:
add VF; decrease number of queueus; remove VF
that was way too hard to pass without this series :)
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: get rid of the crit lock
iavf: sprinkle netdev_assert_locked() annotations
iavf: extract iavf_watchdog_step() out of iavf_watchdog_task()
iavf: simplify watchdog_task in terms of adminq task scheduling
iavf: centralize watchdog requeueing itself
iavf: iavf_suspend(): take RTNL before netdev_lock()
====================
Link: https://patch.msgid.link/20250603171710.2336151-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mirco Barone [Thu, 5 Jun 2025 12:06:16 +0000 (14:06 +0200)]
wireguard: device: enable threaded NAPI
Enable threaded NAPI by default for WireGuard devices in response to low
performance behavior that we observed when multiple tunnels (and thus
multiple wg devices) are deployed on a single host. This affects any
kind of multi-tunnel deployment, regardless of whether the tunnels share
the same endpoints or not (i.e., a VPN concentrator type of gateway
would also be affected).
The problem is caused by the fact that, in case of a traffic surge that
involves multiple tunnels at the same time, the polling of the NAPI
instance of all these wg devices tends to converge onto the same core,
causing underutilization of the CPU and bottlenecking performance.
This happens because NAPI polling is hosted by default in softirq
context, but the WireGuard driver only raises this softirq after the rx
peer queue has been drained, which doesn't happen during high traffic.
In this case, the softirq already active on a core is reused instead of
raising a new one.
As a result, once two or more tunnel softirqs have been scheduled on
the same core, they remain pinned there until the surge ends.
In our experiments, this almost always leads to all tunnel NAPIs being
handled on a single core shortly after a surge begins, limiting
scalability to less than 3× the performance of a single tunnel, despite
plenty of unused CPU cores being available.
The proposed mitigation is to enable threaded NAPI for all WireGuard
devices. This moves the NAPI polling context to a dedicated per-device
kernel thread, allowing the scheduler to balance the load across all
available cores.
On our 32-core gateways, enabling threaded NAPI yields a ~4× performance
improvement with 16 tunnels, increasing throughput from ~13 Gbps to
~48 Gbps. Meanwhile, CPU usage on the receiver (which is the bottleneck)
jumps from 20% to 100%.
We have found no performance regressions in any scenario we tested.
Single-tunnel throughput remains unchanged.
More details are available in our Netdev paper.
Link: https://netdevconf.info/0x18/docs/netdev-0x18-paper23-talk-paper.pdf
Signed-off-by: Mirco Barone <mirco.barone@polito.it>
Fixes:
e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250605120616.2808744-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Thu, 5 Jun 2025 13:19:33 +0000 (15:19 +0200)]
Merge tag 'wireless-2025-06-05' of https://git./linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Couple of quick fixes:
- iwlwifi/iwlmld crash on certain error paths
- iwlwifi/iwlmld regulatory data mixup
- iwlwifi/iwlmld suspend/resume fix
- iwlwifi MSI (without -X) fix
- cfg80211/mac80211 S1G parsing fixes
* tag 'wireless-2025-06-05' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211/mac80211: correctly parse S1G beacon optional elements
wifi: iwlwifi: mld: Move regulatory domain initialization
wifi: iwlwifi: pcie: fix non-MSIX handshake register
wifi: iwlwifi: mld: avoid panic on init failure
wifi: iwlwifi: mvm: fix assert on suspend
====================
Link: https://patch.msgid.link/20250605095443.17874-6-johannes@sipsolutions.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 5 Jun 2025 11:37:02 +0000 (13:37 +0200)]
Merge tag 'nf-25-06-05' of git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Zero out the remainder in nft_pipapo AVX2 implementation, otherwise
next lookup could bogusly report a mismatch. This is followed by two
patches to update nft_pipapo selftests to cover for the previous bug.
From Florian Westphal.
2) Check for reverse tuple too in case of esoteric NAT collisions for
UDP traffic and extend selftest coverage. Also from Florian.
netfilter pull request 25-06-05
* tag 'nf-25-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: netfilter: nft_nat.sh: add test for reverse clash with nat
netfilter: nf_nat: also check reverse tuple to obtain clashing entry
selftests: netfilter: nft_concat_range.sh: add datapath check for map fill bug
selftests: netfilter: nft_concat_range.sh: prefer per element counters for testing
netfilter: nf_set_pipapo_avx2: fix initial map fill
====================
Link: https://patch.msgid.link/20250605085735.52205-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 5 Jun 2025 10:50:12 +0000 (12:50 +0200)]
Merge branch 'netlink-specs-rt-link-decode-ip6gre'
Jakub Kicinski says:
====================
netlink: specs: rt-link: decode ip6gre
Adding GRE tunnels to the .config for driver tests caused
some unhappiness in YNL, as it can't decode all the link
attrs on the system. Add ip6gre support to fix the tests.
This is similar to commit
6ffdbb93a59c ("netlink: specs:
rt_link: decode ip6tnl, vti and vti6 link attrs").
====================
Link: https://patch.msgid.link/20250603135357.502626-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 3 Jun 2025 13:53:57 +0000 (06:53 -0700)]
netlink: specs: rt-link: decode ip6gre
Driver tests now require GRE tunnels, while we don't configure
them with YNL, YNL will complain when it sees link types it
doesn't recognize. Teach it decoding ip6gre tunnels. The attrs
are largely the same as IPv4 GRE.
Correct the type of encap-limit, but note that this attr is
only used in ip6gre, so the mistake didn't matter until now.
Fixes:
0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250603135357.502626-3-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 3 Jun 2025 13:53:56 +0000 (06:53 -0700)]
netlink: specs: rt-link: add missing byte-order properties
A number of fields in the ip tunnels are lacking the big-endian
designation. I suspect this is not intentional, as decoding
the ports with the right endian seems objectively beneficial.
Fixes:
6ffdbb93a59c ("netlink: specs: rt_link: decode ip6tnl, vti and vti6 link attrs")
Fixes:
077b6022d24b ("doc/netlink/specs: Add sub-message type to rt_link family")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250603135357.502626-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 5 Jun 2025 10:37:10 +0000 (12:37 +0200)]
Merge tag 'ovpn-net-
20250603' of https://github.com/OpenVPN/ovpn-net-next
Antonio Quartulli says:
====================
In this batch you can find the following bug fixes:
Patch 1: when releasing a UDP socket we were wrongly invoking
setup_udp_tunnel_sock() with an empty config. This was not
properly shutting down the UDP encap state.
With this patch we simply undo what was done during setup.
Patch 2: ovpn was holding a reference to a 'struct socket'
without increasing its reference counter. This was intended
and worked as expected until we hit a race condition where
user space tries to close the socket while kernel space is
also releasing it. In this case the (struct socket *)->sk
member would disappear under our feet leading to a null-ptr-deref.
This patch fixes this issue by having struct ovpn_socket hold
a reference directly to the sk member while also increasing
its reference counter.
Patch 3: in case of errors along the TCP RX path (softirq)
we want to immediately delete the peer, but this operation may
sleep. With this patch we move the peer deletion to a scheduled
worker.
Patch 4 and 5 are instead fixing minor issues in the ovpn
kselftests.
* tag 'ovpn-net-
20250603' of https://github.com/OpenVPN/ovpn-net-next:
selftest/net/ovpn: fix missing file
selftest/net/ovpn: fix TCP socket creation
ovpn: avoid sleep in atomic context in TCP RX error path
ovpn: ensure sk is still valid during cleanup
ovpn: properly deconfigure UDP-tunnel
====================
Link: https://patch.msgid.link/20250603111110.4575-1-antonio@openvpn.net/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniele Palmas [Tue, 3 Jun 2025 09:12:04 +0000 (11:12 +0200)]
net: wwan: mhi_wwan_mbim: use correct mux_id for multiplexing
Recent Qualcomm chipsets like SDX72/75 require MBIM sessionId mapping
to muxId in the range (0x70-0x8F) for the PCIe tethered use.
This has been partially addressed by the referenced commit, mapping
the default data call to muxId = 112, but the multiplexed data calls
scenario was not properly considered, mapping sessionId = 1 to muxId
1, while it should have been 113.
Fix this by moving the session_id assignment logic to mhi_mbim_newlink,
in order to map sessionId = n to muxId = n + WDS_BIND_MUX_DATA_PORT_MUX_ID.
Fixes:
65bc58c3dcad ("net: wwan: mhi: make default data link id configurable")
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://patch.msgid.link/20250603091204.2802840-1-dnlplm@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Johannes Berg [Thu, 5 Jun 2025 09:33:07 +0000 (11:33 +0200)]
Merge tag 'iwlwifi-fixes-2025-06-04' of https://git./linux/kernel/git/iwlwifi/iwlwifi-next
Miri Korenblit says:
====================
iwlwifi fixes
====================
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Lachlan Hodges [Tue, 3 Jun 2025 05:35:38 +0000 (15:35 +1000)]
wifi: cfg80211/mac80211: correctly parse S1G beacon optional elements
S1G beacons are not traditional beacons but a type of extension frame.
Extension frames contain the frame control and duration fields, followed
by zero or more optional fields before the frame body. These optional
fields are distinct from the variable length elements.
The presence of optional fields is indicated in the frame control field.
To correctly locate the elements offset, the frame control must be parsed
to identify which optional fields are present. Currently, mac80211 parses
S1G beacons based on fixed assumptions about the frame layout, without
inspecting the frame control field. This can result in incorrect offsets
to the "variable" portion of the frame.
Properly parse S1G beacon frames by using the field lengths defined in
IEEE 802.11-2024, section 9.3.4.3, ensuring that the elements offset is
calculated accurately.
Fixes:
9eaffe5078ca ("cfg80211: convert S1G beacon to scan results")
Fixes:
cd418ba63f0c ("mac80211: convert S1G beacon to scan results")
Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com>
Link: https://patch.msgid.link/20250603053538.468562-1-lachlan.hodges@morsemicro.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Paolo Abeni [Thu, 5 Jun 2025 09:07:36 +0000 (11:07 +0200)]
Merge branch 'net-dsa-b53-fix-rgmii-ports'
Jonas Gorski says:
====================
net: dsa: b53: fix RGMII ports
RGMII ports on BCM63xx were not really working, especially with PHYs
that support EEE and are capable of configuring their own RGMII delays.
So let's make them work, and fix additional minor rgmii related issues
found while working on it.
With a BCM96328BU-P300:
Before:
[ 3.580000] b53-switch
10700000.switch GbE3 (uninitialized): validation of rgmii with support
0000000,
00000000,
00000000,
000062ff and advertisement
0000000,
00000000,
00000000,
000062ff failed: -EINVAL
[ 3.600000] b53-switch
10700000.switch GbE3 (uninitialized): failed to connect to PHY: -EINVAL
[ 3.610000] b53-switch
10700000.switch GbE3 (uninitialized): error -22 setting up PHY for tree 0, switch 0, port 4
[ 3.620000] b53-switch
10700000.switch GbE1 (uninitialized): validation of rgmii with support
0000000,
00000000,
00000000,
000062ff and advertisement
0000000,
00000000,
00000000,
000062ff failed: -EINVAL
[ 3.640000] b53-switch
10700000.switch GbE1 (uninitialized): failed to connect to PHY: -EINVAL
[ 3.650000] b53-switch
10700000.switch GbE1 (uninitialized): error -22 setting up PHY for tree 0, switch 0, port 5
[ 3.660000] b53-switch
10700000.switch GbE4 (uninitialized): validation of rgmii with support
0000000,
00000000,
00000000,
000062ff and advertisement
0000000,
00000000,
00000000,
000062ff failed: -EINVAL
[ 3.680000] b53-switch
10700000.switch GbE4 (uninitialized): failed to connect to PHY: -EINVAL
[ 3.690000] b53-switch
10700000.switch GbE4 (uninitialized): error -22 setting up PHY for tree 0, switch 0, port 6
[ 3.700000] b53-switch
10700000.switch GbE5 (uninitialized): validation of rgmii with support
0000000,
00000000,
00000000,
000062ff and advertisement
0000000,
00000000,
00000000,
000062ff failed: -EINVAL
[ 3.720000] b53-switch
10700000.switch GbE5 (uninitialized): failed to connect to PHY: -EINVAL
[ 3.730000] b53-switch
10700000.switch GbE5 (uninitialized): error -22 setting up PHY for tree 0, switch 0, port 7
After:
[ 3.700000] b53-switch
10700000.switch GbE3 (uninitialized): PHY [mdio_mux-0.1:00] driver [Broadcom BCM54612E] (irq=POLL)
[ 3.770000] b53-switch
10700000.switch GbE1 (uninitialized): PHY [mdio_mux-0.1:01] driver [Broadcom BCM54612E] (irq=POLL)
[ 3.850000] b53-switch
10700000.switch GbE4 (uninitialized): PHY [mdio_mux-0.1:18] driver [Broadcom BCM54612E] (irq=POLL)
[ 3.920000] b53-switch
10700000.switch GbE5 (uninitialized): PHY [mdio_mux-0.1:19] driver [Broadcom BCM54612E] (irq=POLL)
====================
Link: https://patch.msgid.link/20250602193953.1010487-1-jonas.gorski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jonas Gorski [Mon, 2 Jun 2025 19:39:53 +0000 (21:39 +0200)]
net: dsa: b53: do not touch DLL_IQQD on bcm53115
According to OpenMDK, bit 2 of the RGMII register has a different
meaning for BCM53115 [1]:
"DLL_IQQD 1: In the IDDQ mode, power is down0: Normal function
mode"
Configuring RGMII delay works without setting this bit, so let's keep it
at the default. For other chips, we always set it, so not clearing it
is not an issue.
One would assume BCM53118 works the same, but OpenMDK is not quite sure
what this bit actually means [2]:
"BYPASS_IMP_2NS_DEL #1: In the IDDQ mode, power is down#0: Normal
function mode1: Bypass dll65_2ns_del IP0: Use
dll65_2ns_del IP"
So lets keep setting it for now.
[1] https://github.com/Broadcom-Network-Switching-Software/OpenMDK/blob/master/cdk/PKG/chip/bcm53115/bcm53115_a0_defs.h#L19871
[2] https://github.com/Broadcom-Network-Switching-Software/OpenMDK/blob/master/cdk/PKG/chip/bcm53118/bcm53118_a0_defs.h#L14392
Fixes:
967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Link: https://patch.msgid.link/20250602193953.1010487-6-jonas.gorski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jonas Gorski [Mon, 2 Jun 2025 19:39:52 +0000 (21:39 +0200)]
net: dsa: b53: allow RGMII for bcm63xx RGMII ports
Add RGMII to supported interfaces for BCM63xx RGMII ports so they can be
actually used in RGMII mode.
Without this, phylink will fail to configure them:
[ 3.580000] b53-switch
10700000.switch GbE3 (uninitialized): validation of rgmii with support
0000000,
00000000,
00000000,
000062ff and advertisement
0000000,
00000000,
00000000,
000062ff failed: -EINVAL
[ 3.600000] b53-switch
10700000.switch GbE3 (uninitialized): failed to connect to PHY: -EINVAL
[ 3.610000] b53-switch
10700000.switch GbE3 (uninitialized): error -22 setting up PHY for tree 0, switch 0, port 4
Fixes:
ce3bf94871f7 ("net: dsa: b53: add support for BCM63xx RGMIIs")
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Link: https://patch.msgid.link/20250602193953.1010487-5-jonas.gorski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jonas Gorski [Mon, 2 Jun 2025 19:39:51 +0000 (21:39 +0200)]
net: dsa: b53: do not configure bcm63xx's IMP port interface
The IMP port is not a valid RGMII interface, but hard wired to internal,
so we shouldn't touch the undefined register B53_RGMII_CTRL_IMP.
While this does not seem to have any side effects, let's not touch it at
all, so limit RGMII configuration on bcm63xx to the actual RGMII ports.
Fixes:
ce3bf94871f7 ("net: dsa: b53: add support for BCM63xx RGMIIs")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250602193953.1010487-4-jonas.gorski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jonas Gorski [Mon, 2 Jun 2025 19:39:50 +0000 (21:39 +0200)]
net: dsa: b53: do not enable RGMII delay on bcm63xx
bcm63xx's RGMII ports are always in MAC mode, never in PHY mode, so we
shouldn't enable any delays and let the PHY handle any delays as
necessary.
This fixes using RGMII ports with normal PHYs like BCM54612E, which will
handle the delay in the PHY.
Fixes:
ce3bf94871f7 ("net: dsa: b53: add support for BCM63xx RGMIIs")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250602193953.1010487-3-jonas.gorski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jonas Gorski [Mon, 2 Jun 2025 19:39:49 +0000 (21:39 +0200)]
net: dsa: b53: do not enable EEE on bcm63xx
BCM63xx internal switches do not support EEE, but provide multiple RGMII
ports where external PHYs may be connected. If one of these PHYs are EEE
capable, we may try to enable EEE for the MACs, which then hangs the
system on access of the (non-existent) EEE registers.
Fix this by checking if the switch actually supports EEE before
attempting to configure it.
Fixes:
22256b0afb12 ("net: dsa: b53: Move EEE functions to b53")
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Álvaro Fernández Rojas <noltari@gmail.com>
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Link: https://patch.msgid.link/20250602193953.1010487-2-jonas.gorski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Meghana Malladi [Tue, 3 Jun 2025 05:29:04 +0000 (10:59 +0530)]
net: ti: icssg-prueth: Fix swapped TX stats for MII interfaces.
In MII mode, Tx lines are swapped for port0 and port1, which means
Tx port0 receives data from PRU1 and the Tx port1 receives data from
PRU0. This is an expected hardware behavior and reading the Tx stats
needs to be handled accordingly in the driver. Update the driver to
read Tx stats from the PRU1 for port0 and PRU0 for port1.
Fixes:
c1e10d5dc7a1 ("net: ti: icssg-prueth: Add ICSSG Stats")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250603052904.431203-1-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Florian Westphal [Fri, 30 May 2025 10:34:03 +0000 (12:34 +0200)]
selftests: netfilter: nft_nat.sh: add test for reverse clash with nat
This will fail without the previous bug fix because we erronously
believe that the clashing entry went way.
However, the clash exists in the opposite direction due to an
existing nat mapping:
PASS: IP statless for ns2-LgTIuS
ERROR: failed to test udp ns1-x4iyOW to ns2-LgTIuS with dnat rule step 2, result: ""
This is partially adapted from test instructions from the below
ubuntu tracker.
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2109889
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Shaun Brady <brady.1345@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Fri, 30 May 2025 10:34:02 +0000 (12:34 +0200)]
netfilter: nf_nat: also check reverse tuple to obtain clashing entry
The logic added in the blamed commit was supposed to only omit nat source
port allocation if neither the existing nor the new entry are subject to
NAT.
However, its not enough to lookup the conntrack based on the proposed
tuple, we must also check the reverse direction.
Otherwise there are esoteric cases where the collision is in the reverse
direction because that colliding connection has a port rewrite, but the
new entry doesn't. In this case, we only check the new entry and then
erronously conclude that no clash exists anymore.
The existing (udp) tuple is:
a:p -> b:P, with nat translation to s:P, i.e. pure daddr rewrite,
reverse tuple in conntrack table is s:P -> a:p.
When another UDP packet is sent directly to s, i.e. a:p->s:P, this is
correctly detected as a colliding entry: tuple is taken by existing reply
tuple in reverse direction.
But the colliding conntrack is only searched for with unreversed
direction, and we can't find such entry matching a:p->s:P.
The incorrect conclusion is that the clashing entry has timed out and
that no port address translation is required.
Such conntrack will then be discarded at nf_confirm time because the
proposed reverse direction clashes with an existing mapping in the
conntrack table.
Search for the reverse tuple too, this will then check the NAT bits of
the colliding entry and triggers port reallocation.
Followp patch extends nft_nat.sh selftest to cover this scenario.
The IPS_SEQ_ADJUST change is also a bug fix:
Instead of checking for SEQ_ADJ this tested for SEEN_REPLY and ASSURED
by accident -- _BIT is only for use with the test_bit() API.
This bug has little consequence in practice, because the sequence number
adjustments are only useful for TCP which doesn't support clash resolution.
The existing test case (conntrack_reverse_clash.sh) exercise a race
condition path (parallel conntrack creation on different CPUs), so
the colliding entries have neither SEEN_REPLY nor ASSURED set.
Thanks to Yafang Shao and Shaun Brady for an initial investigation
of this bug.
Fixes:
d8f84a9bc7c4 ("netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash")
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1795
Reported-by: Yafang Shao <laoar.shao@gmail.com>
Reported-by: Shaun Brady <brady.1345@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Fri, 23 May 2025 12:20:46 +0000 (14:20 +0200)]
selftests: netfilter: nft_concat_range.sh: add datapath check for map fill bug
commit
0935ee6032df ("selftests: netfilter: add test case for recent mismatch bug")
added a regression check for incorrect initial fill of the result map
that was fixed with
791a615b7ad2 ("netfilter: nf_set_pipapo: fix initial map fill").
The test used 'nft get element', i.e., control plane checks for
match/nomatch results.
The control plane however doesn't use avx2 version, so we need to
send+match packets.
As the additional packet match/nomatch is slow, don't do this for
every element added/removed: add and use maybe_send_(no)match
helpers and use them.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Fri, 23 May 2025 12:20:45 +0000 (14:20 +0200)]
selftests: netfilter: nft_concat_range.sh: prefer per element counters for testing
The selftest uses following rule:
... @test counter name "test"
Then sends a packet, then checks if the named counter did increment or
not.
This is fine for the 'no-match' test case: If anything matches the
counter increments and the test fails as expected.
But for the 'should match' test cases this isn't optimal.
Consider buggy matching, where the packet matches entry x, but it
should have matched entry y.
In that case the test would erronously pass.
Rework the selftest to use per-element counters to avoid this.
After sending packet that should have matched entry x, query the
relevant element via 'nft reset element' and check that its counter
had incremented.
The 'nomatch' case isn't altered, no entry should match so the named
counter must be 0, changing it to the per-element counter would then
pass if another entry matches.
The downside of this change is a slight increase in test run-time by
a few seconds.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Fri, 23 May 2025 12:20:44 +0000 (14:20 +0200)]
netfilter: nf_set_pipapo_avx2: fix initial map fill
If the first field doesn't cover the entire start map, then we must zero
out the remainder, else we leak those bits into the next match round map.
The early fix was incomplete and did only fix up the generic C
implementation.
A followup patch adds a test case to nft_concat_range.sh.
Fixes:
791a615b7ad2 ("netfilter: nf_set_pipapo: fix initial map fill")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ilan Peer [Wed, 4 Jun 2025 03:13:21 +0000 (06:13 +0300)]
wifi: iwlwifi: mld: Move regulatory domain initialization
The regulatory domain information was initialized every time the
FW was loaded and the device was restarted. This was unnecessary
and useless as at this stage the wiphy channels information was
not setup yet so while the regulatory domain was set to the wiphy,
the channel information was not updated.
In case that a specific MCC was configured during FW initialization
then following updates with this MCC are ignored, and thus the
wiphy channels information is left with information not matching
the regulatory domain.
This commit moves the regulatory domain initialization to after the
operational firmware is started, i.e., after the wiphy channels were
configured and the regulatory information is needed.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250604061200.f138a7382093.I2fd8b3e99be13c2687da483e2cb1311ffb4fbfce@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Johannes Berg [Wed, 4 Jun 2025 03:13:20 +0000 (06:13 +0300)]
wifi: iwlwifi: pcie: fix non-MSIX handshake register
When reading the interrupt status after a FW reset handshake
timeout, read the actual value not the mask for the non-MSIX
case.
Fixes:
ab606dea80c4 ("wifi: iwlwifi: pcie: add support for the reset handshake in MSI")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250604061200.87a849a55086.I2f8571aafa55aa3b936a30b938de9d260592a584@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Miri Korenblit [Wed, 4 Jun 2025 03:13:19 +0000 (06:13 +0300)]
wifi: iwlwifi: mld: avoid panic on init failure
In case of an error during init, in_hw_restart will be set, but it will
never get cleared.
Instead, we will retry to init again, and then we will act like we are in a
restart when we are actually not.
This causes (among others) to a NULL pointer dereference when canceling
rx_omi::finished_work, that was not even initialized, because we thought
that we are in hw_restart.
Set in_hw_restart to true only if the fw is running, then we know that
FW was loaded successfully and we are not going to the retry loop.
Fixes:
7391b2a4f7db ("wifi: iwlwifi: rework firmware error handling")
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250604061200.e0040e0a4b09.Iae469a0abe6bfa3c26d8a88c066bad75c2e8f121@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Miri Korenblit [Wed, 4 Jun 2025 03:13:18 +0000 (06:13 +0300)]
wifi: iwlwifi: mvm: fix assert on suspend
After using DEFINE_RAW_FLEX, cmd is a pointer to iwl_rxq_sync_cmd,
and not a variable containing both the command and notification.
Adjust hcmd->data and hcmd->len assignment as well.
Fixes:
7438843df8cf ("wifi: iwlwifi: mvm: Avoid -Wflex-array-member-not-at-end warning")
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250604031321.2277481-2-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Alok Tiwari [Mon, 2 Jun 2025 10:34:29 +0000 (03:34 -0700)]
gve: add missing NULL check for gve_alloc_pending_packet() in TX DQO
gve_alloc_pending_packet() can return NULL, but gve_tx_add_skb_dqo()
did not check for this case before dereferencing the returned pointer.
Add a missing NULL check to prevent a potential NULL pointer
dereference when allocation fails.
This improves robustness in low-memory scenarios.
Fixes:
a57e5de476be ("gve: DQO: Add TX path")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Przemek Kitszel [Fri, 4 Apr 2025 10:23:21 +0000 (12:23 +0200)]
iavf: get rid of the crit lock
Get rid of the crit lock.
That frees us from the error prone logic of try_locks.
Thanks to netdev_lock() by Jakub it is now easy, and in most cases we were
protected by it already - replace crit lock by netdev lock when it was not
the case.
Lockdep reports that we should cancel the work under crit_lock [splat1],
and that was the scheme we have mostly followed since [1] by Slawomir.
But when that is done we still got into deadlocks [splat2]. So instead
we should look at the bigger problem, namely "weird locking/scheduling"
of the iavf. The first step to fix that is to remove the crit lock.
I will followup with a -next series that simplifies scheduling/tasks.
Cancel the work without netdev lock (weird unlock+lock scheme),
to fix the [splat2] (which would be totally ugly if we would kept
the crit lock).
Extend protected part of iavf_watchdog_task() to include scheduling
more work.
Note that the removed comment in iavf_reset_task() was misplaced,
it belonged to inside of the removed if condition, so it's gone now.
[splat1] - w/o this patch - The deadlock during VF removal:
WARNING: possible circular locking dependency detected
sh/3825 is trying to acquire lock:
((work_completion)(&(&adapter->watchdog_task)->work)){+.+.}-{0:0}, at: start_flush_work+0x1a1/0x470
but task is already holding lock:
(&adapter->crit_lock){+.+.}-{4:4}, at: iavf_remove+0xd1/0x690 [iavf]
which lock already depends on the new lock.
[splat2] - when cancelling work under crit lock, w/o this series,
see [2] for the band aid attempt
WARNING: possible circular locking dependency detected
sh/3550 is trying to acquire lock:
((wq_completion)iavf){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x26/0x90
but task is already holding lock:
(&dev->lock){+.+.}-{4:4}, at: iavf_remove+0xa6/0x6e0 [iavf]
which lock already depends on the new lock.
[1]
fc2e6b3b132a ("iavf: Rework mutexes for better synchronisation")
[2] https://github.com/pkitszel/linux/commit/
52dddbfc2bb60294083f5711a158a
Fixes:
d1639a17319b ("iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies")
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Przemek Kitszel [Fri, 4 Apr 2025 10:23:20 +0000 (12:23 +0200)]
iavf: sprinkle netdev_assert_locked() annotations
Lockdep annotations help in general, but here it is extra good, as next
commit will remove crit lock.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Przemek Kitszel [Fri, 4 Apr 2025 10:23:19 +0000 (12:23 +0200)]
iavf: extract iavf_watchdog_step() out of iavf_watchdog_task()
Finish up easy refactor of watchdog_task, total for this + prev two
commits is:
1 file changed, 47 insertions(+), 82 deletions(-)
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Przemek Kitszel [Fri, 4 Apr 2025 10:23:18 +0000 (12:23 +0200)]
iavf: simplify watchdog_task in terms of adminq task scheduling
Simplify the decision whether to schedule adminq task. The condition is
the same, but it is executed in more scenarios.
Note that movement of watchdog_done label makes this commit a bit
surprising. (Hence not squashing it to anything bigger).
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Przemek Kitszel [Fri, 4 Apr 2025 10:23:17 +0000 (12:23 +0200)]
iavf: centralize watchdog requeueing itself
Centralize the unlock(critlock); unlock(netdev); queue_delayed_work(watchog_task);
pattern to one place.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Przemek Kitszel [Fri, 4 Apr 2025 10:23:16 +0000 (12:23 +0200)]
iavf: iavf_suspend(): take RTNL before netdev_lock()
Fix an obvious violation of lock ordering.
Jakub's [1] added netdev_lock() call that is wrong ordered wrt RTNL,
but the Fixes tag points to crit_lock being wrongly placed (by lockdep
standards).
Actual reason we got it wrong is dated back to critical section managed by
pure flag checks, which is with us since the very beginning.
[1]
afc664987ab3 ("eth: iavf: extend the netdev_lock usage")
Fixes:
5ac49f3c2702 ("iavf: use mutexes for locking of critical sections")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Antonio Quartulli [Thu, 22 May 2025 13:33:18 +0000 (15:33 +0200)]
selftest/net/ovpn: fix missing file
test-large-mtu.sh is referenced by the Makefile
but does not exist.
Add it along the other scripts.
Fixes:
944f8b6abab6 ("selftest/net/ovpn: extend coverage with more test cases")
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Antonio Quartulli [Tue, 20 May 2025 14:42:39 +0000 (16:42 +0200)]
selftest/net/ovpn: fix TCP socket creation
TCP sockets cannot be created with AF_UNSPEC, but
one among the supported family must be used.
Since commit
944f8b6abab6 ("selftest/net/ovpn: extend
coverage with more test cases") the default address
family for all tests was changed from AF_INET to AF_UNSPEC,
thus breaking all TCP cases.
Restore AF_INET as default address family for TCP listeners.
Fixes:
944f8b6abab6 ("selftest/net/ovpn: extend coverage with more test cases")
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Antonio Quartulli [Tue, 27 May 2025 22:01:10 +0000 (00:01 +0200)]
ovpn: avoid sleep in atomic context in TCP RX error path
Upon error along the TCP data_ready event path, we have
the following chain of calls:
strp_data_ready()
ovpn_tcp_rcv()
ovpn_peer_del()
ovpn_socket_release()
Since strp_data_ready() may be invoked from softirq context, and
ovpn_socket_release() may sleep, the above sequence may cause a
sleep in atomic context like the following:
BUG: sleeping function called from invalid context at ./ovpn-backports-ovpn-net-next-main-6.15.0-rc5-
20250522/drivers/net/ovpn/socket.c:71
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 25, name: ksoftirqd/3
5 locks held by ksoftirqd/3/25:
#0:
ffffffe000cd0580 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0xb8/0x5b0
OpenVPN/ovpn-backports#1:
ffffffe000cd0580 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0xb8/0x5b0
OpenVPN/ovpn-backports#2:
ffffffe000cd0580 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x66/0x1e0
OpenVPN/ovpn-backports#3:
ffffffe003ce9818 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x156e/0x17a0
OpenVPN/ovpn-backports#4:
ffffffe000cd0580 (rcu_read_lock){....}-{1:2}, at: ovpn_tcp_data_ready+0x0/0x1b0 [ovpn]
CPU: 3 PID: 25 Comm: ksoftirqd/3 Not tainted 5.10.104+ #0
Call Trace:
walk_stackframe+0x0/0x1d0
show_stack+0x2e/0x44
dump_stack+0xc2/0x102
___might_sleep+0x29c/0x2b0
__might_sleep+0x62/0xa0
ovpn_socket_release+0x24/0x2d0 [ovpn]
unlock_ovpn+0x6e/0x190 [ovpn]
ovpn_peer_del+0x13c/0x390 [ovpn]
ovpn_tcp_rcv+0x280/0x560 [ovpn]
__strp_recv+0x262/0x940
strp_recv+0x66/0x80
tcp_read_sock+0x122/0x410
strp_data_ready+0x156/0x1f0
ovpn_tcp_data_ready+0x92/0x1b0 [ovpn]
tcp_data_ready+0x6c/0x150
tcp_rcv_established+0xb36/0xc50
tcp_v4_do_rcv+0x25e/0x380
tcp_v4_rcv+0x166a/0x17a0
ip_protocol_deliver_rcu+0x8c/0x250
ip_local_deliver_finish+0xf8/0x1e0
ip_local_deliver+0xc2/0x2d0
ip_rcv+0x1f2/0x330
__netif_receive_skb+0xfc/0x290
netif_receive_skb+0x104/0x5b0
br_pass_frame_up+0x190/0x3f0
br_handle_frame_finish+0x3e2/0x7a0
br_handle_frame+0x750/0xab0
__netif_receive_skb_core.constprop.0+0x4c0/0x17f0
__netif_receive_skb+0xc6/0x290
netif_receive_skb+0x104/0x5b0
xgmac_dma_rx+0x962/0xb40
__napi_poll.constprop.0+0x5a/0x350
net_rx_action+0x1fe/0x4b0
__do_softirq+0x1f8/0x85c
run_ksoftirqd+0x80/0xd0
smpboot_thread_fn+0x1f0/0x3e0
kthread+0x1e6/0x210
ret_from_kernel_thread+0x8/0xc
Fix this issue by postponing the ovpn_peer_del() call to
a scheduled worker, as we already do in ovpn_tcp_send_sock()
for the very same reason.
Fixes:
11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/13
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Antonio Quartulli [Wed, 30 Apr 2025 00:26:49 +0000 (02:26 +0200)]
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O
6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:
ffffc90000c9fb18 EFLAGS:
00010217
RAX:
dffffc0000000000 RBX:
ffff8881148d7940 RCX:
ffffffff817787bb
RDX:
0000000000000077 RSI:
0000000000000008 RDI:
00000000000003be
RBP:
ffffc90000c9fb30 R08:
0000000000000000 R09:
fffffbfff0d3e840
R10:
ffffffff869f4207 R11:
0000000000000000 R12:
0000000000000000
R13:
ffff888115eb9300 R14:
ffffc90000c9fbc8 R15:
000000000000000c
FS:
0000000000000000(0000) GS:
ffff8882b0151000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f37266b6114 CR3:
00000000054a8000 CR4:
0000000000750ef0
PKRU:
55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes:
11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Antonio Quartulli [Tue, 13 May 2025 14:43:59 +0000 (16:43 +0200)]
ovpn: properly deconfigure UDP-tunnel
When deconfiguring a UDP-tunnel from a socket, we cannot
call setup_udp_tunnel_sock() with an empty config, because
this helper is expected to be invoked only during setup.
Get rid of the call to setup_udp_tunnel_sock() and just
revert what it did during socket initialization..
Note that the global udp_encap_needed_key and the GRO state
are left untouched: udp_destroy_socket() will eventually
take care of them.
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes:
ab66abbc769b ("ovpn: implement basic RX path (UDP)")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Closes: https://lore.kernel.org/netdev/
1a47ce02-fd42-4761-8697-
f3f315011cc6@redhat.com
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Paolo Abeni [Tue, 3 Jun 2025 11:04:08 +0000 (13:04 +0200)]
Merge branch 'net-airoha-fix-ipv6-hw-acceleration'
Lorenzo Bianconi says:
====================
net: airoha: Fix IPv6 hw acceleration
Fix IPv6 hw acceleration in bridge mode resolving ib2 and
airoha_foe_mac_info_common overwrite in
airoha_ppe_foe_commit_subflow_entry routine.
Introduce UPDMEM source-mac table used to set source mac address for
L3 IPv6 hw accelerated traffic.
====================
Link: https://patch.msgid.link/20250602-airoha-flowtable-ipv6-fix-v2-0-3287f8b55214@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Mon, 2 Jun 2025 10:55:39 +0000 (12:55 +0200)]
net: airoha: Fix smac_id configuration in bridge mode
Set PPE entry smac_id field to 0xf in airoha_ppe_foe_commit_subflow_entry
routine for IPv6 traffic in order to instruct the hw to keep original
source mac address for IPv6 hw accelerated traffic in bridge mode.
Fixes:
cd53f622611f ("net: airoha: Add L2 hw acceleration support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250602-airoha-flowtable-ipv6-fix-v2-3-3287f8b55214@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Mon, 2 Jun 2025 10:55:38 +0000 (12:55 +0200)]
net: airoha: Fix IPv6 hw acceleration in bridge mode
ib2 and airoha_foe_mac_info_common have not the same offsets in
airoha_foe_bridge and airoha_foe_ipv6 structures. Current codebase does
not accelerate IPv6 traffic in bridge mode since ib2 and l2 info are not
set properly copying airoha_foe_bridge struct into airoha_foe_ipv6 one
in airoha_ppe_foe_commit_subflow_entry routine.
Fix IPv6 hw acceleration in bridge mode resolving ib2 and
airoha_foe_mac_info_common overwrite in
airoha_ppe_foe_commit_subflow_entry() and configuring them with proper
values.
Fixes:
cd53f622611f ("net: airoha: Add L2 hw acceleration support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250602-airoha-flowtable-ipv6-fix-v2-2-3287f8b55214@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Mon, 2 Jun 2025 10:55:37 +0000 (12:55 +0200)]
net: airoha: Initialize PPE UPDMEM source-mac table
UPDMEM source-mac table is a key-value map used to store devices mac
addresses according to the port identifier. UPDMEM source mac table is
used during IPv6 traffic hw acceleration since PPE entries, for space
constraints, do not contain the full source mac address but just the
identifier in the UPDMEM source-mac table.
Configure UPDMEM source-mac table with device mac addresses and set
the source-mac ID field for PPE IPv6 entries in order to select the
proper device mac address as source mac for L3 IPv6 hw accelerated traffic.
Fixes:
00a7678310fe ("net: airoha: Introduce flowtable offload support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250602-airoha-flowtable-ipv6-fix-v2-1-3287f8b55214@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bui Quang Minh [Sun, 1 Jun 2025 14:29:13 +0000 (21:29 +0700)]
selftests: net: build net/lib dependency in all target
We have the logic to include net/lib automatically for net related
selftests. However, currently, this logic is only in install target
which means only `make install` will have net/lib included. This commit
adds the logic to all target so that all `make`, `make run_tests` and
`make install` will have net/lib included in net related selftests.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://patch.msgid.link/20250601142914.13379-1-minhquangbui99@gmail.com
Fixes:
b86761ff6374 ("selftests: net: add scaffolding for Netlink tests in Python")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Ronak Doshi [Fri, 30 May 2025 15:27:00 +0000 (15:27 +0000)]
vmxnet3: correctly report gso type for UDP tunnels
Commit
3d010c8031e3 ("udp: do not accept non-tunnel GSO skbs landing
in a tunnel") added checks in linux stack to not accept non-tunnel
GRO packets landing in a tunnel. This exposed an issue in vmxnet3
which was not correctly reporting GRO packets for tunnel packets.
This patch fixes this issue by setting correct GSO type for the
tunnel packets.
Currently, vmxnet3 does not support reporting inner fields for LRO
tunnel packets. The issue is not seen for egress drivers that do not
use skb inner fields. The workaround is to enable tnl-segmentation
offload on the egress interfaces if the driver supports it. This
problem pre-exists this patch fix and can be addressed as a separate
future patch.
Fixes:
dacce2be3312 ("vmxnet3: add geneve and vxlan tunnel offload support")
Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com>
Acked-by: Guolin Yang <guolin.yang@broadcom.com>
Link: https://patch.msgid.link/20250530152701.70354-1-ronak.doshi@broadcom.com
[pabeni@redhat.com: dropped the changelog]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Fri, 30 May 2025 13:58:00 +0000 (06:58 -0700)]
Revert "kunit: configs: Enable CONFIG_INIT_STACK_ALL_PATTERN in all_tests"
This reverts commit
a571a9a1b120264e24b41eddf1ac5140131bfa84.
The commit in question breaks kunit for older compilers:
$ gcc --version
gcc (GCC) 11.5.0
20240719 (Red Hat 11.5.0-5)
$ ./tools/testing/kunit/kunit.py run --alltests --json --arch=x86_64
Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=x86_64 O=.kunit olddefconfig
ERROR:root:Not all Kconfig options selected in kunitconfig were in the generated .config.
This is probably due to unsatisfied dependencies.
Missing: CONFIG_INIT_STACK_ALL_PATTERN=y
Link: https://lore.kernel.org/20250529083811.778bc31b@kernel.org
Fixes:
a571a9a1b120 ("kunit: configs: Enable CONFIG_INIT_STACK_ALL_PATTERN in all_tests")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://patch.msgid.link/20250530135800.13437-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jinjian Song [Fri, 30 May 2025 03:16:48 +0000 (11:16 +0800)]
net: wwan: t7xx: Fix napi rx poll issue
When driver handles the napi rx polling requests, the netdev might
have been released by the dellink logic triggered by the disconnect
operation on user plane. However, in the logic of processing skb in
polling, an invalid netdev is still being used, which causes a panic.
BUG: kernel NULL pointer dereference, address:
00000000000000f1
Oops: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:dev_gro_receive+0x3a/0x620
[...]
Call Trace:
<IRQ>
? __die_body+0x68/0xb0
? page_fault_oops+0x379/0x3e0
? exc_page_fault+0x4f/0xa0
? asm_exc_page_fault+0x22/0x30
? __pfx_t7xx_ccmni_recv_skb+0x10/0x10 [mtk_t7xx (HASH:1400 7)]
? dev_gro_receive+0x3a/0x620
napi_gro_receive+0xad/0x170
t7xx_ccmni_recv_skb+0x48/0x70 [mtk_t7xx (HASH:1400 7)]
t7xx_dpmaif_napi_rx_poll+0x590/0x800 [mtk_t7xx (HASH:1400 7)]
net_rx_action+0x103/0x470
irq_exit_rcu+0x13a/0x310
sysvec_apic_timer_interrupt+0x56/0x90
</IRQ>
Fixes:
5545b7b9f294 ("net: wwan: t7xx: Add NAPI support")
Signed-off-by: Jinjian Song <jinjian.song@fibocom.com>
Link: https://patch.msgid.link/20250530031648.5592-1-jinjian.song@fibocom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 3 Jun 2025 01:44:37 +0000 (18:44 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-05-30 (ice, idpf)
For ice:
Michal resolves XDP issues related to Tx scheduler configuration with
large number of Tx queues.
Additional information:
https://lore.kernel.org/intel-wired-lan/
20250513105529.241745-1-michal.kubiak@intel.com/
For idpf:
Brian Vazquez updates netif_subqueue_maybe_stop() condition check to
prevent possible races.
Emil shuts down virtchannel mailbox during reset to reduce timeout
delays as it's unavailable during that time.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
idpf: avoid mailbox timeout delays during reset
idpf: fix a race in txq wakeup
ice: fix rebuilding the Tx scheduler tree for large queue counts
ice: create new Tx scheduler nodes for new queues only
ice: fix Tx scheduler error handling in XDP callback
====================
Link: https://patch.msgid.link/20250530211221.2170484-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shiming Cheng [Fri, 30 May 2025 01:26:08 +0000 (09:26 +0800)]
net: fix udp gso skb_segment after pull from frag_list
Commit
a1e40ac5b5e9 ("net: gso: fix udp gso fraglist segmentation after
pull from frag_list") detected invalid geometry in frag_list skbs and
redirects them from skb_segment_list to more robust skb_segment. But some
packets with modified geometry can also hit bugs in that code. We don't
know how many such cases exist. Addressing each one by one also requires
touching the complex skb_segment code, which risks introducing bugs for
other types of skbs. Instead, linearize all these packets that fail the
basic invariants on gso fraglist skbs. That is more robust.
If only part of the fraglist payload is pulled into head_skb, it will
always cause exception when splitting skbs by skb_segment. For detailed
call stack information, see below.
Valid SKB_GSO_FRAGLIST skbs
- consist of two or more segments
- the head_skb holds the protocol headers plus first gso_size
- one or more frag_list skbs hold exactly one segment
- all but the last must be gso_size
Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can
modify fraglist skbs, breaking these invariants.
In extreme cases they pull one part of data into skb linear. For UDP,
this causes three payloads with lengths of (11,11,10) bytes were
pulled tail to become (12,10,10) bytes.
The skbs no longer meets the above SKB_GSO_FRAGLIST conditions because
payload was pulled into head_skb, it needs to be linearized before pass
to regular skb_segment.
skb_segment+0xcd0/0xd14
__udp_gso_segment+0x334/0x5f4
udp4_ufo_fragment+0x118/0x15c
inet_gso_segment+0x164/0x338
skb_mac_gso_segment+0xc4/0x13c
__skb_gso_segment+0xc4/0x124
validate_xmit_skb+0x9c/0x2c0
validate_xmit_skb_list+0x4c/0x80
sch_direct_xmit+0x70/0x404
__dev_queue_xmit+0x64c/0xe5c
neigh_resolve_output+0x178/0x1c4
ip_finish_output2+0x37c/0x47c
__ip_finish_output+0x194/0x240
ip_finish_output+0x20/0xf4
ip_output+0x100/0x1a0
NF_HOOK+0xc4/0x16c
ip_forward+0x314/0x32c
ip_rcv+0x90/0x118
__netif_receive_skb+0x74/0x124
process_backlog+0xe8/0x1a4
__napi_poll+0x5c/0x1f8
net_rx_action+0x154/0x314
handle_softirqs+0x154/0x4b8
[118.376811] [
C201134] rxq0_pus: [name:bug&]kernel BUG at net/core/skbuff.c:4278!
[118.376829] [
C201134] rxq0_pus: [name:traps&]Internal error: Oops - BUG:
00000000f2000800 [#1] PREEMPT SMP
[118.470774] [
C201134] rxq0_pus: [name:mrdump&]Kernel Offset: 0x178cc00000 from 0xffffffc008000000
[118.470810] [
C201134] rxq0_pus: [name:mrdump&]PHYS_OFFSET: 0x40000000
[118.470827] [
C201134] rxq0_pus: [name:mrdump&]pstate:
60400005 (nZCv daif +PAN -UAO)
[118.470848] [
C201134] rxq0_pus: [name:mrdump&]pc : [0xffffffd79598aefc] skb_segment+0xcd0/0xd14
[118.470900] [
C201134] rxq0_pus: [name:mrdump&]lr : [0xffffffd79598a5e8] skb_segment+0x3bc/0xd14
[118.470928] [
C201134] rxq0_pus: [name:mrdump&]sp :
ffffffc008013770
Fixes:
a1e40ac5b5e9 ("gso: fix udp gso fraglist segmentation after pull from frag_list")
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 31 May 2025 02:53:53 +0000 (19:53 -0700)]
Merge branch 'net-fix-inet_proto_csum_replace_by_diff-for-ipv6'
Paul Chaignon says:
====================
net: Fix inet_proto_csum_replace_by_diff for IPv6
This patchset fixes a bug that causes skb->csum to hold an incorrect
value when calling inet_proto_csum_replace_by_diff for an IPv6 packet
in CHECKSUM_COMPLETE state. This bug affects BPF helper
bpf_l4_csum_replace and IPv6 ILA in adj-transport mode.
In those cases, inet_proto_csum_replace_by_diff updates the L4 checksum
field after an IPv6 address change. These two changes cancel each other
in terms of checksum, so skb->csum shouldn't be updated.
v2: https://lore.kernel.org/aCz84JU60wd8etiT@mail.gmail.com
====================
Link: https://patch.msgid.link/cover.1748509484.git.paul.chaignon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Chaignon [Thu, 29 May 2025 10:28:35 +0000 (12:28 +0200)]
bpf: Fix L4 csum update on IPv6 in CHECKSUM_COMPLETE
In Cilium, we use bpf_csum_diff + bpf_l4_csum_replace to, among other
things, update the L4 checksum after reverse SNATing IPv6 packets. That
use case is however not currently supported and leads to invalid
skb->csum values in some cases. This patch adds support for IPv6 address
changes in bpf_l4_csum_update via a new flag.
When calling bpf_l4_csum_replace in Cilium, it ends up calling
inet_proto_csum_replace_by_diff:
1: void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
2: __wsum diff, bool pseudohdr)
3: {
4: if (skb->ip_summed != CHECKSUM_PARTIAL) {
5: csum_replace_by_diff(sum, diff);
6: if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
7: skb->csum = ~csum_sub(diff, skb->csum);
8: } else if (pseudohdr) {
9: *sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
10: }
11: }
The bug happens when we're in the CHECKSUM_COMPLETE state. We've just
updated one of the IPv6 addresses. The helper now updates the L4 header
checksum on line 5. Next, it updates skb->csum on line 7. It shouldn't.
For an IPv6 packet, the updates of the IPv6 address and of the L4
checksum will cancel each other. The checksums are set such that
computing a checksum over the packet including its checksum will result
in a sum of 0. So the same is true here when we update the L4 checksum
on line 5. We'll update it as to cancel the previous IPv6 address
update. Hence skb->csum should remain untouched in this case.
The same bug doesn't affect IPv4 packets because, in that case, three
fields are updated: the IPv4 address, the IP checksum, and the L4
checksum. The change to the IPv4 address and one of the checksums still
cancel each other in skb->csum, but we're left with one checksum update
and should therefore update skb->csum accordingly. That's exactly what
inet_proto_csum_replace_by_diff does.
This special case for IPv6 L4 checksums is also described atop
inet_proto_csum_replace16, the function we should be using in this case.
This patch introduces a new bpf_l4_csum_replace flag, BPF_F_IPV6,
to indicate that we're updating the L4 checksum of an IPv6 packet. When
the flag is set, inet_proto_csum_replace_by_diff will skip the
skb->csum update.
Fixes:
7d672345ed295 ("bpf: add generic bpf_csum_diff helper")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/96a6bc3a443e6f0b21ff7b7834000e17fb549e05.1748509484.git.paul.chaignon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Chaignon [Thu, 29 May 2025 10:28:05 +0000 (12:28 +0200)]
net: Fix checksum update for ILA adj-transport
During ILA address translations, the L4 checksums can be handled in
different ways. One of them, adj-transport, consist in parsing the
transport layer and updating any found checksum. This logic relies on
inet_proto_csum_replace_by_diff and produces an incorrect skb->csum when
in state CHECKSUM_COMPLETE.
This bug can be reproduced with a simple ILA to SIR mapping, assuming
packets are received with CHECKSUM_COMPLETE:
$ ip a show dev eth0
14: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 62:ae:35:9e:0f:8d brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 3333:0:0:1::c078/64 scope global
valid_lft forever preferred_lft forever
inet6 fd00:10:244:1::c078/128 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::60ae:35ff:fe9e:f8d/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
$ ip ila add loc_match fd00:10:244:1 loc 3333:0:0:1 \
csum-mode adj-transport ident-type luid dev eth0
Then I hit [fd00:10:244:1::c078]:8000 with a server listening only on
[3333:0:0:1::c078]:8000. With the bug, the SYN packet is dropped with
SKB_DROP_REASON_TCP_CSUM after inet_proto_csum_replace_by_diff changed
skb->csum. The translation and drop are visible on pwru [1] traces:
IFACE TUPLE FUNC
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) ipv6_rcv
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) ip6_rcv_core
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) nf_hook_slow
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) inet_proto_csum_replace_by_diff
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) tcp_v6_early_demux
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_route_input
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_input
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_input_finish
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_protocol_deliver_rcu
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) raw6_local_deliver
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ipv6_raw_deliver
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) tcp_v6_rcv
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) __skb_checksum_complete
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) kfree_skb_reason(SKB_DROP_REASON_TCP_CSUM)
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_release_head_state
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_release_data
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_free_head
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) kfree_skbmem
This is happening because inet_proto_csum_replace_by_diff is updating
skb->csum when it shouldn't. The L4 checksum is updated such that it
"cancels" the IPv6 address change in terms of checksum computation, so
the impact on skb->csum is null.
Note this would be different for an IPv4 packet since three fields
would be updated: the IPv4 address, the IP checksum, and the L4
checksum. Two would cancel each other and skb->csum would still need
to be updated to take the L4 checksum change into account.
This patch fixes it by passing an ipv6 flag to
inet_proto_csum_replace_by_diff, to skip the skb->csum update if we're
in the IPv6 case. Note the behavior of the only other user of
inet_proto_csum_replace_by_diff, the BPF subsystem, is left as is in
this patch and fixed in the subsequent patch.
With the fix, using the reproduction from above, I can confirm
skb->csum is not touched by inet_proto_csum_replace_by_diff and the TCP
SYN proceeds to the application after the ILA translation.
Link: https://github.com/cilium/pwru
Fixes:
65d7ab8de582 ("net: Identifier Locator Addressing module")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/b5539869e3550d46068504feb02d37653d939c0b.1748509484.git.paul.chaignon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 31 May 2025 02:33:30 +0000 (19:33 -0700)]
Merge branch 'net-stmmac-prevent-div-by-0'
Alexis Lothoré says:
====================
net: stmmac: prevent div by 0
fix a small splat I am observing on a STM32MP157 platform at boot
(see commit 1) due to a division by 0.
v3: https://lore.kernel.org/
20250528-stmmac_tstamp_div-v3-0-
b525ecdfd84c@bootlin.com
v2: https://lore.kernel.org/
20250527-stmmac_tstamp_div-v2-1-
663251b3b542@bootlin.com
v1: https://lore.kernel.org/
20250523-stmmac_tstamp_div-v1-1-
bca8a5a3a477@bootlin.com
====================
Link: https://patch.msgid.link/20250529-stmmac_tstamp_div-v4-0-d73340a794d5@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexis Lothoré [Thu, 29 May 2025 09:07:24 +0000 (11:07 +0200)]
net: stmmac: make sure that ptp_rate is not 0 before configuring EST
If the ptp_rate recorded earlier in the driver happens to be 0, this
bogus value will propagate up to EST configuration, where it will
trigger a division by 0.
Prevent this division by 0 by adding the corresponding check and error
code.
Suggested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Fixes:
8572aec3d0dc ("net: stmmac: Add basic EST support for XGMAC")
Link: https://patch.msgid.link/20250529-stmmac_tstamp_div-v4-2-d73340a794d5@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexis Lothoré [Thu, 29 May 2025 09:07:23 +0000 (11:07 +0200)]
net: stmmac: make sure that ptp_rate is not 0 before configuring timestamping
The stmmac platform drivers that do not open-code the clk_ptp_rate value
after having retrieved the default one from the device-tree can end up
with 0 in clk_ptp_rate (as clk_get_rate can return 0). It will
eventually propagate up to PTP initialization when bringing up the
interface, leading to a divide by 0:
Division by zero in kernel.
CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted
6.12.30-00001-g48313bd5768a #22
Hardware name: STM32 (Device Tree Support)
Call trace:
unwind_backtrace from show_stack+0x18/0x1c
show_stack from dump_stack_lvl+0x6c/0x8c
dump_stack_lvl from Ldiv0_64+0x8/0x18
Ldiv0_64 from stmmac_init_tstamp_counter+0x190/0x1a4
stmmac_init_tstamp_counter from stmmac_hw_setup+0xc1c/0x111c
stmmac_hw_setup from __stmmac_open+0x18c/0x434
__stmmac_open from stmmac_open+0x3c/0xbc
stmmac_open from __dev_open+0xf4/0x1ac
__dev_open from __dev_change_flags+0x1cc/0x224
__dev_change_flags from dev_change_flags+0x24/0x60
dev_change_flags from ip_auto_config+0x2e8/0x11a0
ip_auto_config from do_one_initcall+0x84/0x33c
do_one_initcall from kernel_init_freeable+0x1b8/0x214
kernel_init_freeable from kernel_init+0x24/0x140
kernel_init from ret_from_fork+0x14/0x28
Exception stack(0xe0815fb0 to 0xe0815ff8)
Prevent this division by 0 by adding an explicit check and error log
about the actual issue. While at it, remove the same check from
stmmac_ptp_register, which then becomes duplicate
Fixes:
19d857c9038e ("stmmac: Fix calculations for ptp counters when clock input = 50Mhz.")
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250529-stmmac_tstamp_div-v4-1-d73340a794d5@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Saurabh Sengar [Thu, 29 May 2025 10:18:30 +0000 (03:18 -0700)]
hv_netvsc: fix potential deadlock in netvsc_vf_setxdp()
The MANA driver's probe registers netdevice via the following call chain:
mana_probe()
register_netdev()
register_netdevice()
register_netdevice() calls notifier callback for netvsc driver,
holding the netdev mutex via netdev_lock_ops().
Further this netvsc notifier callback end up attempting to acquire the
same lock again in dev_xdp_propagate() leading to deadlock.
netvsc_netdev_event()
netvsc_vf_setxdp()
dev_xdp_propagate()
This deadlock was not observed so far because net_shaper_ops was never set,
and thus the lock was effectively a no-op in this case. Fix this by using
netif_xdp_propagate() instead of dev_xdp_propagate() to avoid recursive
locking in this path.
And, since no deadlock is observed on the other path which is via
netvsc_probe, add the lock exclusivly for that path.
Also, clean up the unregistration path by removing the unnecessary call to
netvsc_vf_setxdp(), since unregister_netdevice_many_notify() already
performs this cleanup via dev_xdp_uninstall().
Fixes:
97246d6d21c2 ("net: hold netdev instance lock during ndo_bpf")
Cc: stable@vger.kernel.org
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Tested-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1748513910-23963-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pranjal Shrivastava [Wed, 28 May 2025 21:10:58 +0000 (21:10 +0000)]
net: Fix net_devmem_bind_dmabuf for non-devmem configs
Fix the signature of the net_devmem_bind_dmabuf API for
CONFIG_NET_DEVMEM=n.
Fixes:
bd61848900bf ("net: devmem: Implement TX path")
Signed-off-by: Pranjal Shrivastava <praan@google.com>
Link: https://patch.msgid.link/20250528211058.1826608-1-praan@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Álvaro Fernández Rojas [Thu, 29 May 2025 12:44:06 +0000 (14:44 +0200)]
net: dsa: tag_brcm: legacy: fix pskb_may_pull length
BRCM_LEG_PORT_ID was incorrectly used for pskb_may_pull length.
The correct check is BRCM_LEG_TAG_LEN + VLAN_HLEN, or 10 bytes.
Fixes:
964dbf186eaa ("net: dsa: tag_brcm: add support for legacy tags")
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250529124406.2513779-1-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 31 May 2025 02:15:58 +0000 (19:15 -0700)]
Merge tag 'for-net-2025-05-30' of git://git./linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_qca: move the SoC type check to the right place
- MGMT: reject malformed HCI_CMD_SYNC commands
- btnxpuart: Fix missing devm_request_irq() return value check
- L2CAP: Fix not responding with L2CAP_CR_LE_ENCRYPTION
* tag 'for-net-2025-05-30' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: L2CAP: Fix not responding with L2CAP_CR_LE_ENCRYPTION
Bluetooth: hci_qca: move the SoC type check to the right place
Bluetooth: btnxpuart: Fix missing devm_request_irq() return value check
Bluetooth: MGMT: reject malformed HCI_CMD_SYNC commands
====================
Link: https://patch.msgid.link/20250530174835.405726-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Emil Tantilov [Thu, 8 May 2025 18:47:15 +0000 (11:47 -0700)]
idpf: avoid mailbox timeout delays during reset
Mailbox operations are not possible while the driver is in reset.
Operations that require MBX exchange with the control plane will result
in long delays if executed while a reset is in progress:
ethtool -L <inf> combined 8& echo 1 > /sys/class/net/<inf>/device/reset
idpf 0000:83:00.0: HW reset detected
idpf 0000:83:00.0: Device HW Reset initiated
idpf 0000:83:00.0: Transaction timed-out (op:504 cookie:be00 vc_op:504 salt:be timeout:2000ms)
idpf 0000:83:00.0: Transaction timed-out (op:508 cookie:bf00 vc_op:508 salt:bf timeout:2000ms)
idpf 0000:83:00.0: Transaction timed-out (op:512 cookie:c000 vc_op:512 salt:c0 timeout:2000ms)
idpf 0000:83:00.0: Transaction timed-out (op:510 cookie:c100 vc_op:510 salt:c1 timeout:2000ms)
idpf 0000:83:00.0: Transaction timed-out (op:509 cookie:c200 vc_op:509 salt:c2 timeout:60000ms)
idpf 0000:83:00.0: Transaction timed-out (op:509 cookie:c300 vc_op:509 salt:c3 timeout:60000ms)
idpf 0000:83:00.0: Transaction timed-out (op:505 cookie:c400 vc_op:505 salt:c4 timeout:60000ms)
idpf 0000:83:00.0: Failed to configure queues for vport 0, -62
Disable mailbox communication in case of a reset, unless it's done during
a driver load, where the virtchnl operations are needed to configure the
device.
Fixes:
8077c727561aa ("idpf: add controlq init and reset checks")
Co-developed-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Brian Vazquez [Thu, 1 May 2025 17:06:17 +0000 (17:06 +0000)]
idpf: fix a race in txq wakeup
Add a helper function to correctly handle the lockless
synchronization when the sender needs to block. The paradigm is
if (no_resources()) {
stop_queue();
barrier();
if (!no_resources())
restart_queue();
}
netif_subqueue_maybe_stop already handles the paradigm correctly, but
the code split the check for resources in three parts, the first one
(descriptors) followed the protocol, but the other two (completions and
tx_buf) were only doing the first part and so race prone.
Luckily netif_subqueue_maybe_stop macro already allows you to use a
function to evaluate the start/stop conditions so the fix only requires
the right helper function to evaluate all the conditions at once.
The patch removes idpf_tx_maybe_stop_common since it's no longer needed
and instead adjusts separately the conditions for singleq and splitq.
Note that idpf_tx_buf_hw_update doesn't need to check for resources
since that will be covered in idpf_tx_splitq_frame.
To reproduce:
Reduce the threshold for pending completions to increase the chances of
hitting this pause by changing your kernel:
drivers/net/ethernet/intel/idpf/idpf_txrx.h
-#define IDPF_TX_COMPLQ_OVERFLOW_THRESH(txcq) ((txcq)->desc_count >> 1)
+#define IDPF_TX_COMPLQ_OVERFLOW_THRESH(txcq) ((txcq)->desc_count >> 4)
Use pktgen to force the host to push small pkts very aggressively:
./pktgen_sample02_multiqueue.sh -i eth1 -s 100 -6 -d $IP -m $MAC \
-p 10000-10000 -t 16 -n 0 -v -x -c 64
Fixes:
6818c4d5b3c2 ("idpf: add splitq start_xmit")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Josh Hay <joshua.a.hay@intel.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Luigi Rizzo <lrizzo@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Michal Kubiak [Tue, 13 May 2025 10:55:29 +0000 (12:55 +0200)]
ice: fix rebuilding the Tx scheduler tree for large queue counts
The current implementation of the Tx scheduler allows the tree to be
rebuilt as the user adds more Tx queues to the VSI. In such a case,
additional child nodes are added to the tree to support the new number
of queues.
Unfortunately, this algorithm does not take into account that the limit
of the VSI support node may be exceeded, so an additional node in the
VSI layer may be required to handle all the requested queues.
Such a scenario occurs when adding XDP Tx queues on machines with many
CPUs. Although the driver still respects the queue limit returned by
the FW, the Tx scheduler was unable to add those queues to its tree
and returned one of the errors below.
Such a scenario occurs when adding XDP Tx queues on machines with many
CPUs (e.g. at least 321 CPUs, if there is already 128 Tx/Rx queue pairs).
Although the driver still respects the queue limit returned by the FW,
the Tx scheduler was unable to add those queues to its tree and returned
the following errors:
Failed VSI LAN queue config for XDP, error: -5
or:
Failed to set LAN Tx queue context, error: -22
Fix this problem by extending the tree rebuild algorithm to check if the
current VSI node can support the requested number of queues. If it
cannot, create as many additional VSI support nodes as necessary to
handle all the required Tx queues. Symmetrically, adjust the VSI node
removal algorithm to remove all nodes associated with the given VSI.
Also, make the search for the next free VSI node more restrictive. That is,
add queue group nodes only to the VSI support nodes that have a matching
VSI handle.
Finally, fix the comment describing the tree update algorithm to better
reflect the current scenario.
Fixes:
b0153fdd7e8a ("ice: update VSI config dynamically")
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Tested-by: Saritha Sanigani <sarithax.sanigani@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Michal Kubiak [Tue, 13 May 2025 10:55:28 +0000 (12:55 +0200)]
ice: create new Tx scheduler nodes for new queues only
The current implementation of the Tx scheduler tree attempts
to create nodes for all Tx queues, ignoring the fact that some
queues may already exist in the tree. For example, if the VSI
already has 128 Tx queues and the user requests for 16 new queues,
the Tx scheduler will compute the tree for 272 queues (128 existing
queues + 144 new queues), instead of 144 queues (128 existing queues
and 16 new queues).
Fix that by modifying the node count calculation algorithm to skip
the queues that already exist in the tree.
Fixes:
5513b920a4f7 ("ice: Update Tx scheduler tree for VSI multi-Tx queue support")
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Tested-by: Saritha Sanigani <sarithax.sanigani@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Michal Kubiak [Tue, 13 May 2025 10:55:27 +0000 (12:55 +0200)]
ice: fix Tx scheduler error handling in XDP callback
When the XDP program is loaded, the XDP callback adds new Tx queues.
This means that the callback must update the Tx scheduler with the new
queue number. In the event of a Tx scheduler failure, the XDP callback
should also fail and roll back any changes previously made for XDP
preparation.
The previous implementation had a bug that not all changes made by the
XDP callback were rolled back. This caused the crash with the following
call trace:
[ +9.549584] ice 0000:ca:00.0: Failed VSI LAN queue config for XDP, error: -5
[ +0.382335] Oops: general protection fault, probably for non-canonical address 0x50a2250a90495525: 0000 [#1] SMP NOPTI
[ +0.010710] CPU: 103 UID: 0 PID: 0 Comm: swapper/103 Not tainted 6.14.0-net-next-mar-31+ #14 PREEMPT(voluntary)
[ +0.010175] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.
2202160810 02/16/2022
[ +0.010946] RIP: 0010:__ice_update_sample+0x39/0xe0 [ice]
[...]
[ +0.002715] Call Trace:
[ +0.002452] <IRQ>
[ +0.002021] ? __die_body.cold+0x19/0x29
[ +0.003922] ? die_addr+0x3c/0x60
[ +0.003319] ? exc_general_protection+0x17c/0x400
[ +0.004707] ? asm_exc_general_protection+0x26/0x30
[ +0.004879] ? __ice_update_sample+0x39/0xe0 [ice]
[ +0.004835] ice_napi_poll+0x665/0x680 [ice]
[ +0.004320] __napi_poll+0x28/0x190
[ +0.003500] net_rx_action+0x198/0x360
[ +0.003752] ? update_rq_clock+0x39/0x220
[ +0.004013] handle_softirqs+0xf1/0x340
[ +0.003840] ? sched_clock_cpu+0xf/0x1f0
[ +0.003925] __irq_exit_rcu+0xc2/0xe0
[ +0.003665] common_interrupt+0x85/0xa0
[ +0.003839] </IRQ>
[ +0.002098] <TASK>
[ +0.002106] asm_common_interrupt+0x26/0x40
[ +0.004184] RIP: 0010:cpuidle_enter_state+0xd3/0x690
Fix this by performing the missing unmapping of XDP queues from
q_vectors and setting the XDP rings pointer back to NULL after all those
queues are released.
Also, add an immediate exit from the XDP callback in case of ring
preparation failure.
Fixes:
efc2214b6047 ("ice: Add support for XDP")
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Tested-by: Saritha Sanigani <sarithax.sanigani@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Luiz Augusto von Dentz [Wed, 28 May 2025 18:53:11 +0000 (14:53 -0400)]
Bluetooth: L2CAP: Fix not responding with L2CAP_CR_LE_ENCRYPTION
Depending on the security set the response to L2CAP_LE_CONN_REQ shall be
just L2CAP_CR_LE_ENCRYPTION if only encryption when BT_SECURITY_MEDIUM
is selected since that means security mode 2 which doesn't require
authentication which is something that is covered in the qualification
test L2CAP/LE/CFC/BV-25-C.
Link: https://github.com/bluez/bluez/issues/1270
Fixes:
27e2d4c8d28b ("Bluetooth: Add basic LE L2CAP connect request receiving support")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bartosz Golaszewski [Tue, 27 May 2025 07:47:37 +0000 (09:47 +0200)]
Bluetooth: hci_qca: move the SoC type check to the right place
Commit
3d05fc82237a ("Bluetooth: qca: set power_ctrl_enabled on NULL
returned by gpiod_get_optional()") accidentally changed the prevous
behavior where power control would be disabled without the BT_EN GPIO
only on QCA_WCN6750 and QCA_WCN6855 while also getting the error check
wrong. We should treat every IS_ERR() return value from
devm_gpiod_get_optional() as a reason to bail-out while we should only
set power_ctrl_enabled to false on the two models mentioned above. While
at it: use dev_err_probe() to save a LOC.
Cc: stable@vger.kernel.org
Fixes:
3d05fc82237a ("Bluetooth: qca: set power_ctrl_enabled on NULL returned by gpiod_get_optional()")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Tested-by: Hsin-chen Chuang <chharry@chromium.org>
Reviewed-by: Hsin-chen Chuang <chharry@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Krzysztof Kozlowski [Sun, 25 May 2025 19:00:21 +0000 (21:00 +0200)]
Bluetooth: btnxpuart: Fix missing devm_request_irq() return value check
Return value of devm_request_irq() must be checked (function is even
annotated) and without it clang W=1 complains:
btnxpuart.c:494:6: error: unused variable 'ret' [-Werror,-Wunused-variable]
Setting up wakeup IRQ handler is not really critical, because the
handler is empty, so just log the informational message so user could
submit proper bug report and silences the clang warning.
Fixes:
c50b56664e48 ("Bluetooth: btnxpuart: Implement host-wakeup feature")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Neeraj Sanjay Kale <neeraj.sanjaykale@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Dmitry Antipov [Thu, 22 May 2025 18:16:02 +0000 (21:16 +0300)]
Bluetooth: MGMT: reject malformed HCI_CMD_SYNC commands
In 'mgmt_hci_cmd_sync()', check whether the size of parameters passed
in 'struct mgmt_cp_hci_cmd_sync' matches the total size of the data
(i.e. 'sizeof(struct mgmt_cp_hci_cmd_sync)' plus trailing bytes).
Otherwise, large invalid 'params_len' will cause 'hci_cmd_sync_alloc()'
to do 'skb_put_data()' from an area beyond the one actually passed to
'mgmt_hci_cmd_sync()'.
Reported-by: syzbot+5fe2d5bfbfbec0b675a0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
5fe2d5bfbfbec0b675a0
Fixes:
827af4787e74 ("Bluetooth: MGMT: Add initial implementation of MGMT_OP_HCI_CMD_SYNC")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Oliver Neukum [Wed, 28 May 2025 11:03:54 +0000 (13:03 +0200)]
net: usb: aqc111: debug info before sanitation
If we sanitize error returns, the debug statements need
to come before that so that we don't lose information.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Fixes:
405b0d610745 ("net: usb: aqc111: fix error handling of usbnet read calls")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Horatiu Vultur [Wed, 28 May 2025 09:36:19 +0000 (11:36 +0200)]
net: lan966x: Make sure to insert the vlan tags also in host mode
When running these commands on DUT (and similar at the other end)
ip link set dev eth0 up
ip link add link eth0 name eth0.10 type vlan id 10
ip addr add 10.0.0.1/24 dev eth0.10
ip link set dev eth0.10 up
ping 10.0.0.2
The ping will fail.
The reason why is failing is because, the network interfaces for lan966x
have a flag saying that the HW can insert the vlan tags into the
frames(NETIF_F_HW_VLAN_CTAG_TX). Meaning that the frames that are
transmitted don't have the vlan tag inside the skb data, but they have
it inside the skb. We already get that vlan tag and put it in the IFH
but the problem is that we don't configure the HW to rewrite the frame
when the interface is in host mode.
The fix consists in actually configuring the HW to insert the vlan tag
if it is different than 0.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Fixes:
6d2c186afa5d ("net: lan966x: Add vlan support.")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://patch.msgid.link/20250528093619.3738998-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 29 May 2025 10:55:33 +0000 (12:55 +0200)]
Merge tag 'linux-can-fixes-for-6.16-
20250529' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2025-05-29
this is a pull request of 1 patch for net/main.
The patch is by Fedor Pchelkin and fixes a slab-out-of-bounds access
in the kvaser_pciefd driver.
linux-can-fixes-for-6.16-
20250529
* tag 'linux-can-fixes-for-6.16-
20250529' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: kvaser_pciefd: refine error prone echo_skb_max handling logic
====================
Link: https://patch.msgid.link/20250529075313.1101820-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dan Carpenter [Wed, 28 May 2025 08:11:09 +0000 (11:11 +0300)]
net/mlx4_en: Prevent potential integer overflow calculating Hz
The "freq" variable is in terms of MHz and "max_val_cycles" is in terms
of Hz. The fact that "max_val_cycles" is a u64 suggests that support
for high frequency is intended but the "freq_khz * 1000" would overflow
the u32 type if we went above 4GHz. Use unsigned long long type for the
mutliplication to prevent that.
Fixes:
31c128b66e5b ("net/mlx4_en: Choose time-stamping shift value according to HW frequency")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/aDbFHe19juIJKjsb@stanley.mountain
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yanqing Wang [Wed, 28 May 2025 07:53:51 +0000 (15:53 +0800)]
driver: net: ethernet: mtk_star_emac: fix suspend/resume issue
Identify the cause of the suspend/resume hang: netif_carrier_off()
is called during link state changes and becomes stuck while
executing linkwatch_work().
To resolve this issue, call netif_device_detach() during the Ethernet
suspend process to temporarily detach the network device from the
kernel and prevent the suspend/resume hang.
Fixes:
8c7bd5a454ff ("net: ethernet: mtk-star-emac: new driver")
Signed-off-by: Yanqing Wang <ot_yanqing.wang@mediatek.com>
Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Link: https://patch.msgid.link/20250528075351.593068-1-macpaul.lin@mediatek.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Geert Uytterhoeven [Tue, 27 May 2025 19:33:41 +0000 (21:33 +0200)]
hinic3: Remove printed message during module init
No driver should spam the kernel log when merely being loaded.
Fixes:
17fcb3dc12bbee8e ("hinic3: module initialization and tx/rx logic")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/5310dac0b3ab4bd16dd8fb761566f12e73b38cab.1748357352.git.geert+renesas@glider.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Charalampos Mitrodimas [Tue, 27 May 2025 16:35:44 +0000 (16:35 +0000)]
net: tipc: fix refcount warning in tipc_aead_encrypt
syzbot reported a refcount warning [1] caused by calling get_net() on
a network namespace that is being destroyed (refcount=0). This happens
when a TIPC discovery timer fires during network namespace cleanup.
The recently added get_net() call in commit
e279024617134 ("net/tipc:
fix slab-use-after-free Read in tipc_aead_encrypt_done") attempts to
hold a reference to the network namespace. However, if the namespace
is already being destroyed, its refcount might be zero, leading to the
use-after-free warning.
Replace get_net() with maybe_get_net(), which safely checks if the
refcount is non-zero before incrementing it. If the namespace is being
destroyed, return -ENODEV early, after releasing the bearer reference.
[1]: https://lore.kernel.org/all/
68342b55.
a70a0220.253bc2.0091.GAE@google.com/T/#m12019cf9ae77e1954f666914640efa36d52704a2
Reported-by: syzbot+f0c4a4aba757549ae26c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/
68342b55.
a70a0220.253bc2.0091.GAE@google.com/T/#m12019cf9ae77e1954f666914640efa36d52704a2
Fixes:
e27902461713 ("net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_done")
Signed-off-by: Charalampos Mitrodimas <charmitro@posteo.net>
Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech>
Link: https://patch.msgid.link/20250527-net-tipc-warning-v2-1-df3dc398a047@posteo.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
David Howells [Tue, 27 May 2025 15:01:43 +0000 (16:01 +0100)]
rxrpc: Fix return from none_validate_challenge()
Fix the return value of none_validate_challenge() to be explicitly true
(which indicates the source packet should simply be discarded) rather than
implicitly true (because rxrpc_abort_conn() always returns -EPROTO which
gets converted to true).
Note that this change doesn't change the behaviour of the code (which is
correct by accident) and, in any case, we *shouldn't* get a CHALLENGE
packet to an rxnull connection (ie. no security).
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lists.infradead.org/pipermail/linux-afs/2025-April/009738.html
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/10720.1748358103@warthog.procyon.org.uk
Fixes:
5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Alok Tiwari [Tue, 27 May 2025 13:08:16 +0000 (06:08 -0700)]
gve: Fix RX_BUFFERS_POSTED stat to report per-queue fill_cnt
Previously, the RX_BUFFERS_POSTED stat incorrectly reported the
fill_cnt from RX queue 0 for all queues, resulting in inaccurate
per-queue statistics.
Fix this by correctly indexing priv->rx[idx].fill_cnt for each RX queue.
Fixes:
24aeb56f2d38 ("gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags.")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20250527130830.1812903-1-alok.a.tiwari@oracle.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Quentin Schulz [Tue, 27 May 2025 11:56:23 +0000 (13:56 +0200)]
net: stmmac: platform: guarantee uniqueness of bus_id
bus_id is currently derived from the ethernetX alias. If one is missing
for the device, 0 is used. If ethernet0 points to another stmmac device
or if there are 2+ stmmac devices without an ethernet alias, then bus_id
will be 0 for all of those.
This is an issue because the bus_id is used to generate the mdio bus id
(new_bus->id in drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
stmmac_mdio_register) and this needs to be unique.
This allows to avoid needing to define ethernet aliases for devices with
multiple stmmac controllers (such as the Rockchip RK3588) for multiple
stmmac devices to probe properly.
Obviously, the bus_id isn't guaranteed to be stable across reboots if no
alias is set for the device but that is easily fixed by simply adding an
alias if this is desired.
Fixes:
25c83b5c2e82 ("dt:net:stmmac: Add support to dwmac version 3.610 and 3.710")
Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250527-stmmac-mdio-bus_id-v2-1-a5ca78454e3c@cherry.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Fedor Pchelkin [Wed, 28 May 2025 19:27:12 +0000 (22:27 +0300)]
can: kvaser_pciefd: refine error prone echo_skb_max handling logic
echo_skb_max should define the supported upper limit of echo_skb[]
allocated inside the netdevice's priv. The corresponding size value
provided by this driver to alloc_candev() is KVASER_PCIEFD_CAN_TX_MAX_COUNT
which is 17.
But later echo_skb_max is rounded up to the nearest power of two (for the
max case, that would be 32) and the tx/ack indices calculated further
during tx/rx may exceed the upper array boundary. Kasan reported this for
the ack case inside kvaser_pciefd_handle_ack_packet(), though the xmit
function has actually caught the same thing earlier.
BUG: KASAN: slab-out-of-bounds in kvaser_pciefd_handle_ack_packet+0x2d7/0x92a drivers/net/can/kvaser_pciefd.c:1528
Read of size 8 at addr
ffff888105e4f078 by task swapper/4/0
CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Not tainted 6.15.0 #12 PREEMPT(voluntary)
Call Trace:
<IRQ>
dump_stack_lvl lib/dump_stack.c:122
print_report mm/kasan/report.c:521
kasan_report mm/kasan/report.c:634
kvaser_pciefd_handle_ack_packet drivers/net/can/kvaser_pciefd.c:1528
kvaser_pciefd_read_packet drivers/net/can/kvaser_pciefd.c:1605
kvaser_pciefd_read_buffer drivers/net/can/kvaser_pciefd.c:1656
kvaser_pciefd_receive_irq drivers/net/can/kvaser_pciefd.c:1684
kvaser_pciefd_irq_handler drivers/net/can/kvaser_pciefd.c:1733
__handle_irq_event_percpu kernel/irq/handle.c:158
handle_irq_event kernel/irq/handle.c:210
handle_edge_irq kernel/irq/chip.c:833
__common_interrupt arch/x86/kernel/irq.c:296
common_interrupt arch/x86/kernel/irq.c:286
</IRQ>
Tx max count definitely matters for kvaser_pciefd_tx_avail(), but for seq
numbers' generation that's not the case - we're free to calculate them as
would be more convenient, not taking tx max count into account. The only
downside is that the size of echo_skb[] should correspond to the max seq
number (not tx max count), so in some situations a bit more memory would
be consumed than could be.
Thus make the size of the underlying echo_skb[] sufficient for the rounded
max tx value.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes:
8256e0ca6010 ("can: kvaser_pciefd: Fix echo_skb race")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Link: https://patch.msgid.link/20250528192713.63894-1-pchelkin@ispras.ru
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Dong Chenchen [Tue, 27 May 2025 11:41:52 +0000 (19:41 +0800)]
page_pool: Fix use-after-free in page_pool_recycle_in_ring
syzbot reported a uaf in page_pool_recycle_in_ring:
BUG: KASAN: slab-use-after-free in lock_release+0x151/0xa30 kernel/locking/lockdep.c:5862
Read of size 8 at addr
ffff8880286045a0 by task syz.0.284/6943
CPU: 0 UID: 0 PID: 6943 Comm: syz.0.284 Not tainted
6.13.0-rc3-syzkaller-gdfa94ce54f41 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0x169/0x550 mm/kasan/report.c:489
kasan_report+0x143/0x180 mm/kasan/report.c:602
lock_release+0x151/0xa30 kernel/locking/lockdep.c:5862
__raw_spin_unlock_bh include/linux/spinlock_api_smp.h:165 [inline]
_raw_spin_unlock_bh+0x1b/0x40 kernel/locking/spinlock.c:210
spin_unlock_bh include/linux/spinlock.h:396 [inline]
ptr_ring_produce_bh include/linux/ptr_ring.h:164 [inline]
page_pool_recycle_in_ring net/core/page_pool.c:707 [inline]
page_pool_put_unrefed_netmem+0x748/0xb00 net/core/page_pool.c:826
page_pool_put_netmem include/net/page_pool/helpers.h:323 [inline]
page_pool_put_full_netmem include/net/page_pool/helpers.h:353 [inline]
napi_pp_put_page+0x149/0x2b0 net/core/skbuff.c:1036
skb_pp_recycle net/core/skbuff.c:1047 [inline]
skb_free_head net/core/skbuff.c:1094 [inline]
skb_release_data+0x6c4/0x8a0 net/core/skbuff.c:1125
skb_release_all net/core/skbuff.c:1190 [inline]
__kfree_skb net/core/skbuff.c:1204 [inline]
sk_skb_reason_drop+0x1c9/0x380 net/core/skbuff.c:1242
kfree_skb_reason include/linux/skbuff.h:1263 [inline]
__skb_queue_purge_reason include/linux/skbuff.h:3343 [inline]
root cause is:
page_pool_recycle_in_ring
ptr_ring_produce
spin_lock(&r->producer_lock);
WRITE_ONCE(r->queue[r->producer++], ptr)
//recycle last page to pool
page_pool_release
page_pool_scrub
page_pool_empty_ring
ptr_ring_consume
page_pool_return_page //release all page
__page_pool_destroy
free_percpu(pool->recycle_stats);
free(pool) //free
spin_unlock(&r->producer_lock); //pool->ring uaf read
recycle_stat_inc(pool, ring);
page_pool can be free while page pool recycle the last page in ring.
Add producer-lock barrier to page_pool_release to prevent the page
pool from being free before all pages have been recycled.
recycle_stat_inc() is empty when CONFIG_PAGE_POOL_STATS is not
enabled, which will trigger Wempty-body build warning. Add definition
for pool stat macro to fix warning.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20250513083123.3514193-1-dongchenchen2@huawei.com
Fixes:
ff7d6b27f894 ("page_pool: refurbish version of page_pool code")
Reported-by: syzbot+204a4382fcb3311f3858@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
204a4382fcb3311f3858
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250527114152.3119109-1-dongchenchen2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Qasim Ijaz [Mon, 26 May 2025 18:36:07 +0000 (19:36 +0100)]
net: ch9200: fix uninitialised access during mii_nway_restart
In mii_nway_restart() the code attempts to call
mii->mdio_read which is ch9200_mdio_read(). ch9200_mdio_read()
utilises a local buffer called "buff", which is initialised
with control_read(). However "buff" is conditionally
initialised inside control_read():
if (err == size) {
memcpy(data, buf, size);
}
If the condition of "err == size" is not met, then
"buff" remains uninitialised. Once this happens the
uninitialised "buff" is accessed and returned during
ch9200_mdio_read():
return (buff[0] | buff[1] << 8);
The problem stems from the fact that ch9200_mdio_read()
ignores the return value of control_read(), leading to
uinit-access of "buff".
To fix this we should check the return value of
control_read() and return early on error.
Reported-by: syzbot <syzbot+3361c2d6f78a3e0892f9@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=
3361c2d6f78a3e0892f9
Tested-by: syzbot <syzbot+3361c2d6f78a3e0892f9@syzkaller.appspotmail.com>
Fixes:
4a476bd6d1d9 ("usbnet: New driver for QinHeng CH9200 devices")
Cc: stable@vger.kernel.org
Signed-off-by: Qasim Ijaz <qasdev00@gmail.com>
Link: https://patch.msgid.link/20250526183607.66527-1-qasdev00@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tengteng Yang [Tue, 27 May 2025 03:04:19 +0000 (11:04 +0800)]
Fix sock_exceed_buf_limit not being triggered in __sk_mem_raise_allocated
When a process under memory pressure is not part of any cgroup and
the charged flag is false, trace_sock_exceed_buf_limit was not called
as expected.
This regression was introduced by commit
2def8ff3fdb6 ("sock:
Code cleanup on __sk_mem_raise_allocated()"). The fix changes the
default value of charged to true while preserving existing logic.
Fixes:
2def8ff3fdb6 ("sock: Code cleanup on __sk_mem_raise_allocated()")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
Signed-off-by: Tengteng Yang <yangtengteng@bytedance.com>
Link: https://patch.msgid.link/20250527030419.67693-1-yangtengteng@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 28 May 2025 22:52:42 +0000 (15:52 -0700)]
Merge tag 'bpf-next-6.16' of git://git./linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Fix and improve BTF deduplication of identical BTF types (Alan
Maguire and Andrii Nakryiko)
- Support up to 12 arguments in BPF trampoline on arm64 (Xu Kuohai and
Alexis Lothoré)
- Support load-acquire and store-release instructions in BPF JIT on
riscv64 (Andrea Parri)
- Fix uninitialized values in BPF_{CORE,PROBE}_READ macros (Anton
Protopopov)
- Streamline allowed helpers across program types (Feng Yang)
- Support atomic update for hashtab of BPF maps (Hou Tao)
- Implement json output for BPF helpers (Ihor Solodrai)
- Several s390 JIT fixes (Ilya Leoshkevich)
- Various sockmap fixes (Jiayuan Chen)
- Support mmap of vmlinux BTF data (Lorenz Bauer)
- Support BPF rbtree traversal and list peeking (Martin KaFai Lau)
- Tests for sockmap/sockhash redirection (Michal Luczaj)
- Introduce kfuncs for memory reads into dynptrs (Mykyta Yatsenko)
- Add support for dma-buf iterators in BPF (T.J. Mercier)
- The verifier support for __bpf_trap() (Yonghong Song)
* tag 'bpf-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (135 commits)
bpf, arm64: Remove unused-but-set function and variable.
selftests/bpf: Add tests with stack ptr register in conditional jmp
bpf: Do not include stack ptr register in precision backtracking bookkeeping
selftests/bpf: enable many-args tests for arm64
bpf, arm64: Support up to 12 function arguments
bpf: Check rcu_read_lock_trace_held() in bpf_map_lookup_percpu_elem()
bpf: Avoid __bpf_prog_ret0_warn when jit fails
bpftool: Add support for custom BTF path in prog load/loadall
selftests/bpf: Add unit tests with __bpf_trap() kfunc
bpf: Warn with __bpf_trap() kfunc maybe due to uninitialized variable
bpf: Remove special_kfunc_set from verifier
selftests/bpf: Add test for open coded dmabuf_iter
selftests/bpf: Add test for dmabuf_iter
bpf: Add open coded dmabuf iterator
bpf: Add dmabuf iterator
dma-buf: Rename debugfs symbols
bpf: Fix error return value in bpf_copy_from_user_dynptr
libbpf: Use mmap to parse vmlinux BTF from sysfs
selftests: bpf: Add a test for mmapable vmlinux BTF
btf: Allow mmap of vmlinux btf
...
Linus Torvalds [Wed, 28 May 2025 22:24:36 +0000 (15:24 -0700)]
Merge tag 'net-next-6.16' of git://git./linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Implement the Device Memory TCP transmit path, allowing zero-copy
data transmission on top of TCP from e.g. GPU memory to the wire.
- Move all the IPv6 routing tables management outside the RTNL scope,
under its own lock and RCU. The route control path is now 3x times
faster.
- Convert queue related netlink ops to instance lock, reducing again
the scope of the RTNL lock. This improves the control plane
scalability.
- Refactor the software crc32c implementation, removing unneeded
abstraction layers and improving significantly the related
micro-benchmarks.
- Optimize the GRO engine for UDP-tunneled traffic, for a 10%
performance improvement in related stream tests.
- Cover more per-CPU storage with local nested BH locking; this is a
prep work to remove the current per-CPU lock in local_bh_disable()
on PREMPT_RT.
- Introduce and use nlmsg_payload helper, combining buffer bounds
verification with accessing payload carried by netlink messages.
Netfilter:
- Rewrite the procfs conntrack table implementation, improving
considerably the dump performance. A lot of user-space tools still
use this interface.
- Implement support for wildcard netdevice in netdev basechain and
flowtables.
- Integrate conntrack information into nft trace infrastructure.
- Export set count and backend name to userspace, for better
introspection.
BPF:
- BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops
programs and can be controlled in similar way to traditional qdiscs
using the "tc qdisc" command.
- Refactor the UDP socket iterator, addressing long standing issues
WRT duplicate hits or missed sockets.
Protocols:
- Improve TCP receive buffer auto-tuning and increase the default
upper bound for the receive buffer; overall this improves the
single flow maximum thoughput on 200Gbs link by over 60%.
- Add AFS GSSAPI security class to AF_RXRPC; it provides transport
security for connections to the AFS fileserver and VL server.
- Improve TCP multipath routing, so that the sources address always
matches the nexthop device.
- Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS,
and thus preventing DoS caused by passing around problematic FDs.
- Retire DCCP socket. DCCP only receives updates for bugs, and major
distros disable it by default. Its removal allows for better
organisation of TCP fields to reduce the number of cache lines hit
in the fast path.
- Extend TCP drop-reason support to cover PAWS checks.
Driver API:
- Reorganize PTP ioctl flag support to require an explicit opt-in for
the drivers, avoiding the problem of drivers not rejecting new
unsupported flags.
- Converted several device drivers to timestamping APIs.
- Introduce per-PHY ethtool dump helpers, improving the support for
dump operations targeting PHYs.
Tests and tooling:
- Add support for classic netlink in user space C codegen, so that
ynl-c can now read, create and modify links, routes addresses and
qdisc layer configuration.
- Add ynl sub-types for binary attributes, allowing ynl-c to output
known struct instead of raw binary data, clarifying the classic
netlink output.
- Extend MPTCP selftests to improve the code-coverage.
- Add tests for XDP tail adjustment in AF_XDP.
New hardware / drivers:
- OpenVPN virtual driver: offload OpenVPN data channels processing to
the kernel-space, increasing the data transfer throughput WRT the
user-space implementation.
- Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC.
- Broadcom asp-v3.0 ethernet driver.
- AMD Renoir ethernet device.
- ReakTek MT9888 2.5G ethernet PHY driver.
- Aeonsemi 10G C45 PHYs driver.
Drivers:
- Ethernet high-speed NICs:
- nVidia/Mellanox (mlx5):
- refactor the steering table handling to significantly
reduce the amount of memory used
- add support for complex matches in H/W flow steering
- improve flow streeing error handling
- convert to netdev instance locking
- Intel (100G, ice, igb, ixgbe, idpf):
- ice: add switchdev support for LLDP traffic over VF
- ixgbe: add firmware manipulation and regions devlink support
- igb: introduce support for frame transmission premption
- igb: adds persistent NAPI configuration
- idpf: introduce RDMA support
- idpf: add initial PTP support
- Meta (fbnic):
- extend hardware stats coverage
- add devlink dev flash support
- Broadcom (bnxt):
- add support for RX-side device memory TCP
- Wangxun (txgbe):
- implement support for udp tunnel offload
- complete PTP and SRIOV support for AML 25G/10G devices
- Ethernet NICs embedded and virtual:
- Google (gve):
- add device memory TCP TX support
- Amazon (ena):
- support persistent per-NAPI config
- Airoha:
- add H/W support for L2 traffic offload
- add per flow stats for flow offloading
- RealTek (rtl8211): add support for WoL magic packet
- Synopsys (stmmac):
- dwmac-socfpga 1000BaseX support
- add Loongson-2K3000 support
- introduce support for hardware-accelerated VLAN stripping
- Broadcom (bcmgenet):
- expose more H/W stats
- Freescale (enetc, dpaa2-eth):
- enetc: add MAC filter, VLAN filter RSS and loopback support
- dpaa2-eth: convert to H/W timestamping APIs
- vxlan: convert FDB table to rhashtable, for better scalabilty
- veth: apply qdisc backpressure on full ring to reduce TX drops
- Ethernet switches:
- Microchip (kzZ88x3): add ETS scheduler support
- Ethernet PHYs:
- RealTek (rtl8211):
- add support for WoL magic packet
- add support for PHY LEDs
- CAN:
- Adds RZ/G3E CANFD support to the rcar_canfd driver.
- Preparatory work for CAN-XL support.
- Add self-tests framework with support for CAN physical interfaces.
- WiFi:
- mac80211:
- scan improvements with multi-link operation (MLO)
- Qualcomm (ath12k):
- enable AHB support for IPQ5332
- add monitor interface support to QCN9274
- add multi-link operation support to WCN7850
- add 802.11d scan offload support to WCN7850
- monitor mode for WCN7850, better 6 GHz regulatory
- Qualcomm (ath11k):
- restore hibernation support
- MediaTek (mt76):
- WiFi-7 improvements
- implement support for mt7990
- Intel (iwlwifi):
- enhanced multi-link single-radio (EMLSR) support on 5 GHz links
- rework device configuration
- RealTek (rtw88):
- improve throughput for RTL8814AU
- RealTek (rtw89):
- add multi-link operation support
- STA/P2P concurrency improvements
- support different SAR configs by antenna
- Bluetooth:
- introduce HCI Driver protocol
- btintel_pcie: do not generate coredump for diagnostic events
- btusb: add HCI Drv commands for configuring altsetting
- btusb: add RTL8851BE device 0x0bda:0xb850
- btusb: add new VID/PID 13d3/3584 for MT7922
- btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925
- btnxpuart: implement host-wakeup feature"
* tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1611 commits)
selftests/bpf: Fix bpf selftest build warning
selftests: netfilter: Fix skip of wildcard interface test
net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 frames
net: openvswitch: Fix the dead loop of MPLS parse
calipso: Don't call calipso functions for AF_INET sk.
selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem
net_sched: hfsc: Address reentrant enqueue adding class to eltree twice
octeontx2-pf: QOS: Refactor TC_HTB_LEAF_DEL_LAST callback
octeontx2-pf: QOS: Perform cache sync on send queue teardown
net: mana: Add support for Multi Vports on Bare metal
net: devmem: ncdevmem: remove unused variable
net: devmem: ksft: upgrade rx test to send 1K data
net: devmem: ksft: add 5 tuple FS support
net: devmem: ksft: add exit_wait to make rx test pass
net: devmem: ksft: add ipv4 support
net: devmem: preserve sockc_err
page_pool: fix ugly page_pool formatting
net: devmem: move list_add to net_devmem_bind_dmabuf.
selftests: netfilter: nft_queue.sh: include file transfer duration in log message
net: phy: mscc: Fix memory leak when using one step timestamping
...
Linus Torvalds [Wed, 28 May 2025 21:55:35 +0000 (14:55 -0700)]
Merge tag 'arm64-upstream' of git://git./linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The headline feature is the re-enablement of support for Arm's
Scalable Matrix Extension (SME) thanks to a bumper crop of fixes
from Mark Rutland.
If matrices aren't your thing, then Ryan's page-table optimisation
work is much more interesting.
Summary:
ACPI, EFI and PSCI:
- Decouple Arm's "Software Delegated Exception Interface" (SDEI)
support from the ACPI GHES code so that it can be used by platforms
booted with device-tree
- Remove unnecessary per-CPU tracking of the FPSIMD state across EFI
runtime calls
- Fix a node refcount imbalance in the PSCI device-tree code
CPU Features:
- Ensure register sanitisation is applied to fields in ID_AA64MMFR4
- Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM
guests can reliably query the underlying CPU types from the VMM
- Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes
to our context-switching, signal handling and ptrace code
Entry code:
- Hook up TIF_NEED_RESCHED_LAZY so that CONFIG_PREEMPT_LAZY can be
selected
Memory management:
- Prevent BSS exports from being used by the early PI code
- Propagate level and stride information to the low-level TLB
invalidation routines when operating on hugetlb entries
- Use the page-table contiguous hint for vmap() mappings with
VM_ALLOW_HUGE_VMAP where possible
- Optimise vmalloc()/vmap() page-table updates to use "lazy MMU mode"
and hook this up on arm64 so that the trailing DSB (used to publish
the updates to the hardware walker) can be deferred until the end
of the mapping operation
- Extend mmap() randomisation for 52-bit virtual addresses (on par
with 48-bit addressing) and remove limited support for
randomisation of the linear map
Perf and PMUs:
- Add support for probing the CMN-S3 driver using ACPI
- Minor driver fixes to the CMN, Arm-NI and amlogic PMU drivers
Selftests:
- Fix FPSIMD and SME tests to align with the freshly re-enabled SME
support
- Fix default setting of the OUTPUT variable so that tests are
installed in the right location
vDSO:
- Replace raw counter access from inline assembly code with a call to
the the __arch_counter_get_cntvct() helper function
Miscellaneous:
- Add some missing header inclusions to the CCA headers
- Rework rendering of /proc/cpuinfo to follow the x86-approach and
avoid repeated buffer expansion (the user-visible format remains
identical)
- Remove redundant selection of CONFIG_CRC32
- Extend early error message when failing to map the device-tree
blob"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits)
arm64: cputype: Add cputype definition for HIP12
arm64: el2_setup.h: Make __init_el2_fgt labels consistent, again
perf/arm-cmn: Add CMN S3 ACPI binding
arm64/boot: Disallow BSS exports to startup code
arm64/boot: Move global CPU override variables out of BSS
arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace
perf/arm-cmn: Initialise cmn->cpu earlier
kselftest/arm64: Set default OUTPUT path when undefined
arm64: Update comment regarding values in __boot_cpu_mode
arm64: mm: Drop redundant check in pmd_trans_huge()
arm64/mm: Re-organise setting up FEAT_S1PIE registers PIRE0_EL1 and PIR_EL1
arm64/mm: Permit lazy_mmu_mode to be nested
arm64/mm: Disable barrier batching in interrupt contexts
arm64/cpuinfo: only show one cpu's info in c_show()
arm64/mm: Batch barriers when updating kernel mappings
mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes
arm64/mm: Support huge pte-mapped pages in vmap
mm/vmalloc: Gracefully unmap huge ptes
mm/vmalloc: Warn on improper use of vunmap_range()
arm64/mm: Hoist barriers out of set_ptes_anysz() loop
...
Linus Torvalds [Wed, 28 May 2025 21:52:12 +0000 (14:52 -0700)]
Merge tag 'nios2_updates_for_v6.16' of git://git./linux/kernel/git/dinguyen/linux
Pull nios2 updates fromDinh Nguyen:
- Use strscpy() and simply setup_cpuinfo()
- Remove conflicting mappings when flushing tlb entries
- Force update_mmu_cache on spurious pagefaults
* tag 'nios2_updates_for_v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
nios2: Replace strcpy() with strscpy() and simplify setup_cpuinfo()
nios2: do not introduce conflicting mappings when flushing tlb entries
nios2: force update_mmu_cache on spurious tlb-permission--related pagefaults
Linus Torvalds [Wed, 28 May 2025 21:43:00 +0000 (14:43 -0700)]
Merge tag 'v6.16-p2' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a buffer overflow regression in shash"
* tag 'v6.16-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: shash - Fix buffer overrun in import function