D. Wythe [Mon, 18 Aug 2025 05:46:18 +0000 (13:46 +0800)]
net/smc: fix UAF on smcsk after smc_listen_out()
BPF CI testing report a UAF issue:
[ 16.446633] BUG: kernel NULL pointer dereference, address:
000000000000003 0
[ 16.447134] #PF: supervisor read access in kernel mod e
[ 16.447516] #PF: error_code(0x0000) - not-present pag e
[ 16.447878] PGD 0 P4D 0
[ 16.448063] Oops: Oops: 0000 [#1] PREEMPT SMP NOPT I
[ 16.448409] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:1 Tainted: G OE
6.13.0-rc3-g89e8a75fda73-dirty #4 2
[ 16.449124] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODUL E
[ 16.449502] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/201 4
[ 16.450201] Workqueue: smc_hs_wq smc_listen_wor k
[ 16.450531] RIP: 0010:smc_listen_work+0xc02/0x159 0
[ 16.452158] RSP: 0018:
ffffb5ab40053d98 EFLAGS:
0001024 6
[ 16.452526] RAX:
0000000000000001 RBX:
0000000000000002 RCX:
000000000000030 0
[ 16.452994] RDX:
0000000000000280 RSI:
00003513840053f0 RDI:
000000000000000 0
[ 16.453492] RBP:
ffffa097808e3800 R08:
ffffa09782dba1e0 R09:
000000000000000 5
[ 16.453987] R10:
0000000000000000 R11:
0000000000000000 R12:
ffffa0978274640 0
[ 16.454497] R13:
0000000000000000 R14:
0000000000000000 R15:
ffffa09782d4092 0
[ 16.454996] FS:
0000000000000000(0000) GS:
ffffa097bbc00000(0000) knlGS:
000000000000000 0
[ 16.455557] CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003 3
[ 16.455961] CR2:
0000000000000030 CR3:
0000000102788004 CR4:
0000000000770ef 0
[ 16.456459] PKRU:
5555555 4
[ 16.456654] Call Trace :
[ 16.456832] <TASK >
[ 16.456989] ? __die+0x23/0x7 0
[ 16.457215] ? page_fault_oops+0x180/0x4c 0
[ 16.457508] ? __lock_acquire+0x3e6/0x249 0
[ 16.457801] ? exc_page_fault+0x68/0x20 0
[ 16.458080] ? asm_exc_page_fault+0x26/0x3 0
[ 16.458389] ? smc_listen_work+0xc02/0x159 0
[ 16.458689] ? smc_listen_work+0xc02/0x159 0
[ 16.458987] ? lock_is_held_type+0x8f/0x10 0
[ 16.459284] process_one_work+0x1ea/0x6d 0
[ 16.459570] worker_thread+0x1c3/0x38 0
[ 16.459839] ? __pfx_worker_thread+0x10/0x1 0
[ 16.460144] kthread+0xe0/0x11 0
[ 16.460372] ? __pfx_kthread+0x10/0x1 0
[ 16.460640] ret_from_fork+0x31/0x5 0
[ 16.460896] ? __pfx_kthread+0x10/0x1 0
[ 16.461166] ret_from_fork_asm+0x1a/0x3 0
[ 16.461453] </TASK >
[ 16.461616] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod(OE) ]
[ 16.462134] CR2:
000000000000003 0
[ 16.462380] ---[ end trace
0000000000000000 ]---
[ 16.462710] RIP: 0010:smc_listen_work+0xc02/0x1590
The direct cause of this issue is that after smc_listen_out_connected(),
newclcsock->sk may be NULL since it will releases the smcsk. Therefore,
if the application closes the socket immediately after accept,
newclcsock->sk can be NULL. A possible execution order could be as
follows:
smc_listen_work | userspace
-----------------------------------------------------------------
lock_sock(sk) |
smc_listen_out_connected() |
| \- smc_listen_out |
| | \- release_sock |
| |- sk->sk_data_ready() |
| fd = accept();
| close(fd);
| \- socket->sk = NULL;
/* newclcsock->sk is NULL now */
SMC_STAT_SERV_SUCC_INC(sock_net(newclcsock->sk))
Since smc_listen_out_connected() will not fail, simply swapping the order
of the code can easily fix this issue.
Fixes:
3b2dec2603d5 ("net/smc: restructure client and server code in af_smc")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Link: https://patch.msgid.link/20250818054618.41615-1-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yao Zi [Fri, 15 Aug 2025 10:48:03 +0000 (10:48 +0000)]
net: stmmac: thead: Enable TX clock before MAC initialization
The clk_tx_i clock must be supplied to the MAC for successful
initialization. On TH1520 SoC, the clock is provided by an internal
divider configured through GMAC_PLLCLK_DIV register when using RGMII
interface. However, currently we don't setup the divider before
initialization of the MAC, resulting in DMA reset failures if the
bootloader/firmware doesn't enable the divider,
[ 7.839601] thead-dwmac
ffe7060000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
[ 7.938338] thead-dwmac
ffe7060000.ethernet eth0: PHY [stmmac-0:02] driver [RTL8211F Gigabit Ethernet] (irq=POLL)
[ 8.160746] thead-dwmac
ffe7060000.ethernet eth0: Failed to reset the dma
[ 8.170118] thead-dwmac
ffe7060000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed
[ 8.179384] thead-dwmac
ffe7060000.ethernet eth0: __stmmac_open: Hw setup failed
Let's simply write GMAC_PLLCLK_DIV_EN to GMAC_PLLCLK_DIV to enable the
divider before MAC initialization. Note that for reconfiguring the
divisor, the divider must be disabled first and re-enabled later to make
sure the new divisor take effect.
The exact clock rate doesn't affect MAC's initialization according to my
test. It's set to the speed required by RGMII when the linkspeed is
1Gbps and could be reclocked later after link is up if necessary.
Fixes:
33a1a01e3afa ("net: stmmac: Add glue layer for T-HEAD TH1520 SoC")
Signed-off-by: Yao Zi <ziyao@disroot.org>
Reviewed-by: Drew Fustini <fustini@kernel.org>
Link: https://patch.msgid.link/20250815104803.55294-1-ziyao@disroot.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jordan Rhee [Mon, 18 Aug 2025 21:12:45 +0000 (14:12 -0700)]
gve: prevent ethtool ops after shutdown
A crash can occur if an ethtool operation is invoked
after shutdown() is called.
shutdown() is invoked during system shutdown to stop DMA operations
without performing expensive deallocations. It is discouraged to
unregister the netdev in this path, so the device may still be visible
to userspace and kernel helpers.
In gve, shutdown() tears down most internal data structures. If an
ethtool operation is dispatched after shutdown(), it will dereference
freed or NULL pointers, leading to a kernel panic. While graceful
shutdown normally quiesces userspace before invoking the reboot
syscall, forced shutdowns (as observed on GCP VMs) can still trigger
this path.
Fix by calling netif_device_detach() in shutdown().
This marks the device as detached so the ethtool ioctl handler
will skip dispatching operations to the driver.
Fixes:
974365e51861 ("gve: Implement suspend/resume/shutdown")
Signed-off-by: Jordan Rhee <jordanrhee@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://patch.msgid.link/20250818211245.1156919-1-jeroendb@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yuichiro Tsuji [Mon, 18 Aug 2025 08:45:07 +0000 (17:45 +0900)]
net: usb: asix_devices: Fix PHY address mask in MDIO bus initialization
Syzbot reported shift-out-of-bounds exception on MDIO bus initialization.
The PHY address should be masked to 5 bits (0-31). Without this
mask, invalid PHY addresses could be used, potentially causing issues
with MDIO bus operations.
Fix this by masking the PHY address with 0x1f (31 decimal) to ensure
it stays within the valid range.
Fixes:
4faff70959d5 ("net: usb: asix_devices: add phy_mask for ax88772 mdio bus")
Reported-by: syzbot+20537064367a0f98d597@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
20537064367a0f98d597
Tested-by: syzbot+20537064367a0f98d597@syzkaller.appspotmail.com
Signed-off-by: Yuichiro Tsuji <yuichtsu@amazon.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250818084541.1958-1-yuichtsu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Horatiu Vultur [Mon, 18 Aug 2025 08:10:29 +0000 (10:10 +0200)]
phy: mscc: Fix timestamping for vsc8584
There was a problem when we received frames and the frames were
timestamped. The driver is configured to store the nanosecond part of
the timestmap in the ptp reserved bits and it would take the second part
by reading the LTC. The problem is that when reading the LTC we are in
atomic context and to read the second part will go over mdio bus which
might sleep, so we get an error.
The fix consists in actually put all the frames in a queue and start the
aux work and in that work to read the LTC and then calculate the full
received time.
Fixes:
7d272e63e0979d ("net: phy: mscc: timestamping and PHC support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250818081029.1300780-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Victor Nogueira [Fri, 15 Aug 2025 13:53:17 +0000 (10:53 -0300)]
net/sched: sch_dualpi2: Run prob update timer in softirq to avoid deadlock
When a user creates a dualpi2 qdisc it automatically sets a timer. This
timer will run constantly and update the qdisc's probability field.
The issue is that the timer acquires the qdisc root lock and runs in
hardirq. The qdisc root lock is also acquired in dev.c whenever a packet
arrives for this qdisc. Since the dualpi2 timer callback runs in hardirq,
it may interrupt the packet processing running in softirq. If that happens
and it runs on the same CPU, it will acquire the same lock and cause a
deadlock. The following splat shows up when running a kernel compiled with
lock debugging:
[ +0.000224] WARNING: inconsistent lock state
[ +0.000224] 6.16.0+ #10 Not tainted
[ +0.000169] --------------------------------
[ +0.000029] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[ +0.000000] ping/156 [HC0[0]:SC0[2]:HE1:SE0] takes:
[ +0.000000]
ffff897841242110 (&sch->root_lock_key){?.-.}-{3:3}, at: __dev_queue_xmit+0x86d/0x1140
[ +0.000000] {IN-HARDIRQ-W} state was registered at:
[ +0.000000] lock_acquire.part.0+0xb6/0x220
[ +0.000000] _raw_spin_lock+0x31/0x80
[ +0.000000] dualpi2_timer+0x6f/0x270
[ +0.000000] __hrtimer_run_queues+0x1c5/0x360
[ +0.000000] hrtimer_interrupt+0x115/0x260
[ +0.000000] __sysvec_apic_timer_interrupt+0x6d/0x1a0
[ +0.000000] sysvec_apic_timer_interrupt+0x6e/0x80
[ +0.000000] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ +0.000000] pv_native_safe_halt+0xf/0x20
[ +0.000000] default_idle+0x9/0x10
[ +0.000000] default_idle_call+0x7e/0x1e0
[ +0.000000] do_idle+0x1e8/0x250
[ +0.000000] cpu_startup_entry+0x29/0x30
[ +0.000000] rest_init+0x151/0x160
[ +0.000000] start_kernel+0x6f3/0x700
[ +0.000000] x86_64_start_reservations+0x24/0x30
[ +0.000000] x86_64_start_kernel+0xc8/0xd0
[ +0.000000] common_startup_64+0x13e/0x148
[ +0.000000] irq event stamp: 6884
[ +0.000000] hardirqs last enabled at (6883): [<
ffffffffa75700b3>] neigh_resolve_output+0x223/0x270
[ +0.000000] hardirqs last disabled at (6882): [<
ffffffffa7570078>] neigh_resolve_output+0x1e8/0x270
[ +0.000000] softirqs last enabled at (6880): [<
ffffffffa757006b>] neigh_resolve_output+0x1db/0x270
[ +0.000000] softirqs last disabled at (6884): [<
ffffffffa755b533>] __dev_queue_xmit+0x73/0x1140
[ +0.000000]
other info that might help us debug this:
[ +0.000000] Possible unsafe locking scenario:
[ +0.000000] CPU0
[ +0.000000] ----
[ +0.000000] lock(&sch->root_lock_key);
[ +0.000000] <Interrupt>
[ +0.000000] lock(&sch->root_lock_key);
[ +0.000000]
*** DEADLOCK ***
[ +0.000000] 4 locks held by ping/156:
[ +0.000000] #0:
ffff897842332e08 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x41e/0xf40
[ +0.000000] #1:
ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_output+0x2c/0x190
[ +0.000000] #2:
ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_finish_output2+0xad/0x950
[ +0.000000] #3:
ffffffffa816f840 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x73/0x1140
I am able to reproduce it consistently when running the following:
tc qdisc add dev lo handle 1: root dualpi2
ping -f 127.0.0.1
To fix it, make the timer run in softirq.
Fixes:
320d031ad6e4 ("sched: Struct definition and parsing of dualpi2 qdisc")
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250815135317.664993-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lubomir Rintel [Thu, 14 Aug 2025 15:42:14 +0000 (17:42 +0200)]
cdc_ncm: Flag Intel OEM version of Fibocom L850-GL as WWAN
This lets NetworkManager/ModemManager know that this is a modem and
needs to be connected first.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Link: https://patch.msgid.link/20250814154214.250103-1-lkundrak@v3.sk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MD Danish Anwar [Thu, 14 Aug 2025 10:51:06 +0000 (16:21 +0530)]
net: ti: icssg-prueth: Fix HSR and switch offload Enablement during firwmare reload.
To enable HSR / Switch offload, certain configurations are needed.
Currently they are done inside icssg_change_mode(). This function only
gets called if we move from one mode to another without bringing the
links up / down.
Once in HSR / Switch mode, if we bring the links down and bring it back
up again. The callback sequence is,
- emac_ndo_stop()
Firmwares are stopped
- emac_ndo_open()
Firmwares are loaded
In this path icssg_change_mode() doesn't get called and as a result the
configurations needed for HSR / Switch is not done.
To fix this, put all these configurations in a separate function
icssg_enable_fw_offload() and call this from both icssg_change_mode()
and emac_ndo_open()
Fixes:
56375086d093 ("net: ti: icssg-prueth: Enable HSR Tx duplication, Tx Tag and Rx Tag offload")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://patch.msgid.link/20250814105106.1491871-1-danishanwar@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Qingfang Deng [Thu, 14 Aug 2025 01:25:58 +0000 (09:25 +0800)]
ppp: fix race conditions in ppp_fill_forward_path
ppp_fill_forward_path() has two race conditions:
1. The ppp->channels list can change between list_empty() and
list_first_entry(), as ppp_lock() is not held. If the only channel
is deleted in ppp_disconnect_channel(), list_first_entry() may
access an empty head or a freed entry, and trigger a panic.
2. pch->chan can be NULL. When ppp_unregister_channel() is called,
pch->chan is set to NULL before pch is removed from ppp->channels.
Fix these by using a lockless RCU approach:
- Use list_first_or_null_rcu() to safely test and access the first list
entry.
- Convert list modifications on ppp->channels to their RCU variants and
add synchronize_net() after removal.
- Check for a NULL pch->chan before dereferencing it.
Fixes:
f6efc675c9dd ("net: ppp: resolve forwarding path for bridge pppoe devices")
Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Link: https://patch.msgid.link/20250814012559.3705-2-dqfext@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Qingfang Deng [Thu, 14 Aug 2025 01:25:57 +0000 (09:25 +0800)]
net: ethernet: mtk_ppe: add RCU lock around dev_fill_forward_path
Ensure ndo_fill_forward_path() is called with RCU lock held.
Fixes:
2830e314778d ("net: ethernet: mtk-ppe: fix traffic offload with bridged wlan")
Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Link: https://patch.msgid.link/20250814012559.3705-1-dqfext@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michael Chan [Sat, 16 Aug 2025 18:38:50 +0000 (11:38 -0700)]
bnxt_en: Fix lockdep warning during rmmod
The commit under the Fixes tag added a netdev_assert_locked() in
bnxt_free_ntp_fltrs(). The lock should be held during normal run-time
but the assert will be triggered (see below) during bnxt_remove_one()
which should not need the lock. The netdev is already unregistered by
then. Fix it by calling netdev_assert_locked_or_invisible() which will
not assert if the netdev is unregistered.
WARNING: CPU: 5 PID: 2241 at ./include/net/netdev_lock.h:17 bnxt_free_ntp_fltrs+0xf8/0x100 [bnxt_en]
Modules linked in: rpcrdma rdma_cm iw_cm ib_cm configfs ib_core bnxt_en(-) bridge stp llc x86_pkg_temp_thermal xfs tg3 [last unloaded: bnxt_re]
CPU: 5 UID: 0 PID: 2241 Comm: rmmod Tainted: G S W 6.16.0 #2 PREEMPT(voluntary)
Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
RIP: 0010:bnxt_free_ntp_fltrs+0xf8/0x100 [bnxt_en]
Code: 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 8b 47 60 be ff ff ff ff 48 8d b8 28 0c 00 00 e8 d0 cf 41 c3 85 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0018:
ffffa92082387da0 EFLAGS:
00010246
RAX:
0000000000000000 RBX:
ffff9e5b593d8000 RCX:
0000000000000001
RDX:
0000000000000001 RSI:
ffffffff83dc9a70 RDI:
ffffffff83e1a1cf
RBP:
ffff9e5b593d8c80 R08:
0000000000000000 R09:
ffffffff8373a2b3
R10:
000000008100009f R11:
0000000000000001 R12:
0000000000000001
R13:
ffffffffc01c4478 R14:
dead000000000122 R15:
dead000000000100
FS:
00007f3a8a52c740(0000) GS:
ffff9e631ad1c000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
000055bb289419c8 CR3:
000000011274e001 CR4:
00000000003706f0
Call Trace:
<TASK>
bnxt_remove_one+0x57/0x180 [bnxt_en]
pci_device_remove+0x39/0xc0
device_release_driver_internal+0xa5/0x130
driver_detach+0x42/0x90
bus_remove_driver+0x61/0xc0
pci_unregister_driver+0x38/0x90
bnxt_exit+0xc/0x7d0 [bnxt_en]
Fixes:
004b5008016a ("eth: bnxt: remove most dependencies on RTNL")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250816183850.4125033-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonas Gorski [Fri, 15 Aug 2025 20:18:09 +0000 (22:18 +0200)]
net: dsa: b53: fix reserved register access in b53_fdb_dump()
When BCM5325 support was added in
c45655386e53 ("net: dsa: b53: add
support for FDB operations on 5325/5365"), the register used for ARL access
was made conditional on the chip.
But in b53_fdb_dump(), instead of the register argument the page
argument was replaced, causing it to write to a reserved page 0x50 on
!BCM5325*. Writing to this page seems to completely lock the switch up:
[ 89.680000] b53-switch spi0.1 lan2: Link is Down
[ 89.680000] WARNING: CPU: 1 PID: 26 at drivers/net/phy/phy.c:1350 _phy_state_machine+0x1bc/0x454
[ 89.720000] phy_check_link_status+0x0/0x114: returned: -5
[ 89.730000] Modules linked in: nft_fib_inet nf_flow_table_inet nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 cls_flower sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact vrf md5 crc32c_cryptoapi
[ 89.780000] CPU: 1 UID: 0 PID: 26 Comm: kworker/u10:0 Tainted: G W 6.16.0-rc1+ #0 NONE
[ 89.780000] Tainted: [W]=WARN
[ 89.780000] Hardware name: Netgear DGND3700 v1
[ 89.780000] Workqueue: events_power_efficient phy_state_machine
[ 89.780000] Stack :
809c762c 8006b050 00000001 820a9ce3 0000114c 000affff 805d22d0 8200ba00
[ 89.780000]
82005000 6576656e 74735f70 6f776572 5f656666 10008b00 820a9cb8 82088700
[ 89.780000]
00000000 00000000 809c762c 820a9a98 00000000 00000000 ffffefff 80a7a76c
[ 89.780000]
80a70000 820a9af8 80a70000 80a70000 80a70000 00000000 809c762c 820a9dd4
[ 89.780000]
00000000 805d1494 80a029e4 80a70000 00000003 00000000 00000004 81a60004
[ 89.780000] ...
[ 89.780000] Call Trace:
[ 89.780000] [<
800228b8>] show_stack+0x38/0x118
[ 89.780000] [<
8001afc4>] dump_stack_lvl+0x6c/0xac
[ 89.780000] [<
80046b90>] __warn+0x9c/0x114
[ 89.780000] [<
80046da8>] warn_slowpath_fmt+0x1a0/0x1b0
[ 89.780000] [<
805d1494>] _phy_state_machine+0x1bc/0x454
[ 89.780000] [<
805d22fc>] phy_state_machine+0x2c/0x70
[ 89.780000] [<
80066b08>] process_one_work+0x1e8/0x3e0
[ 89.780000] [<
80067a1c>] worker_thread+0x354/0x4e4
[ 89.780000] [<
800706cc>] kthread+0x130/0x274
[ 89.780000] [<
8001d808>] ret_from_kernel_thread+0x14/0x1c
And any further accesses fail:
[ 120.790000] b53-switch spi0.1: timeout waiting for ARL to finish: 0x81
[ 120.800000] b53-switch spi0.1: port 2 failed to add 2c:b0:5d:27:9a:bd vid 3 to fdb: -145
[ 121.010000] b53-switch spi0.1: timeout waiting for ARL to finish: 0xbf
[ 121.020000] b53-switch spi0.1: port 3 failed to add 2c:b0:5d:27:9a:bd vid 3 to fdb: -145
Restore the correct page B53_ARLIO_PAGE again, and move the offset
argument to the correct place.
*On BCM5325, this became a write to the MIB page of Port 1. Still
a reserved offset, but likely less brokenness from that write.
Fixes:
c45655386e53 ("net: dsa: b53: add support for FDB operations on 5325/5365")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250815201809.549195-1-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 19 Aug 2025 00:40:03 +0000 (17:40 -0700)]
Merge branch 'mptcp-misc-fixes-for-v6-17-rc'
Matthieu Baerts says:
====================
mptcp: misc fixes for v6.17-rc
Here are various fixes:
- Patch 1: Better handling SKB extension allocation failures.
A fix for v5.7.
- Patches 2, 3: Avoid resetting MPTCP limits when flushing MPTCP
endpoints. With a validation in the selftests. Fixes for v5.7.
- Patches 4, 5, 6: Disallow '0' as ADD_ADDR retransmission timeout.
With a preparation patch, and a validation in the selftests.
Fixes for v5.11.
- Patches 8, 9: Fix C23 extension warnings in the selftests,
spotted by GCC. Fixes for v6.16.
====================
Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-0-521fe9957892@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Fri, 15 Aug 2025 17:28:26 +0000 (19:28 +0200)]
selftests: mptcp: sockopt: fix C23 extension warning
GCC was complaining about the new label:
mptcp_inq.c:79:2: warning: label followed by a declaration is a C23 extension [-Wc23-extensions]
79 | int err = getaddrinfo(node, service, hints, res);
| ^
mptcp_sockopt.c:166:2: warning: label followed by a declaration is a C23 extension [-Wc23-extensions]
166 | int err = getaddrinfo(node, service, hints, res);
| ^
Simply declare 'err' before the label to avoid this warning.
Fixes:
dd367e81b79a ("selftests: mptcp: sockopt: use IPPROTO_MPTCP for getaddrinfo")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-8-521fe9957892@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Fri, 15 Aug 2025 17:28:25 +0000 (19:28 +0200)]
selftests: mptcp: connect: fix C23 extension warning
GCC was complaining about the new label:
mptcp_connect.c:187:2: warning: label followed by a declaration is a C23 extension [-Wc23-extensions]
187 | int err = getaddrinfo(node, service, hints, res);
| ^
Simply declare 'err' before the label to avoid this warning.
Fixes:
a862771d1aa4 ("selftests: mptcp: use IPPROTO_MPTCP for getaddrinfo")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-7-521fe9957892@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Fri, 15 Aug 2025 17:28:24 +0000 (19:28 +0200)]
selftests: mptcp: disable add_addr retrans in endpoint_tests
To prevent test instability in the "delete re-add signal" test caused by
ADD_ADDR retransmissions, disable retransmissions for this test by setting
net.mptcp.add_addr_timeout to 0.
Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-6-521fe9957892@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Fri, 15 Aug 2025 17:28:23 +0000 (19:28 +0200)]
mptcp: disable add_addr retransmission when timeout is 0
When add_addr_timeout was set to 0, this caused the ADD_ADDR to be
retransmitted immediately, which looks like a buggy behaviour. Instead,
interpret 0 as "no retransmissions needed".
The documentation is updated to explicitly state that setting the timeout
to 0 disables retransmission.
Fixes:
93f323b9cccc ("mptcp: add a new sysctl add_addr_timeout")
Cc: stable@vger.kernel.org
Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-5-521fe9957892@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Fri, 15 Aug 2025 17:28:22 +0000 (19:28 +0200)]
mptcp: remove duplicate sk_reset_timer call
sk_reset_timer() was called twice in mptcp_pm_alloc_anno_list.
Simplify the code by using a 'goto' statement to eliminate the
duplication.
Note that this is not a fix, but it will help backporting the following
patch. The same "Fixes" tag has been added for this reason.
Fixes:
93f323b9cccc ("mptcp: add a new sysctl add_addr_timeout")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-4-521fe9957892@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Fri, 15 Aug 2025 17:28:21 +0000 (19:28 +0200)]
selftests: mptcp: pm: check flush doesn't reset limits
This modification is linked to the parent commit where the received
ADD_ADDR limit was accidentally reset when the endpoints were flushed.
To validate that, the test is now flushing endpoints after having set
new limits, and before checking them.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes:
01cacb00b35c ("mptcp: add netlink-based PM")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-3-521fe9957892@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Fri, 15 Aug 2025 17:28:20 +0000 (19:28 +0200)]
mptcp: pm: kernel: flush: do not reset ADD_ADDR limit
A flush of the MPTCP endpoints should not affect the MPTCP limits. In
other words, 'ip mptcp endpoint flush' should not change 'ip mptcp
limits'.
But it was the case: the MPTCP_PM_ATTR_RCV_ADD_ADDRS (add_addr_accepted)
limit was reset by accident. Removing the reset of this counter during a
flush fixes this issue.
Fixes:
01cacb00b35c ("mptcp: add netlink-based PM")
Cc: stable@vger.kernel.org
Reported-by: Thomas Dreibholz <dreibh@simula.no>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/579
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-2-521fe9957892@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christoph Paasch [Fri, 15 Aug 2025 17:28:19 +0000 (19:28 +0200)]
mptcp: drop skb if MPTCP skb extension allocation fails
When skb_ext_add(skb, SKB_EXT_MPTCP) fails in mptcp_incoming_options(),
we used to return true, letting the segment proceed through the TCP
receive path without a DSS mapping. Such segments can leave inconsistent
mapping state and trigger a mid-stream fallback to TCP, which in testing
collapsed (by artificially forcing failures in skb_ext_add) throughput
to zero.
Return false instead so the TCP input path drops the skb (see
tcp_data_queue() and step-7 processing). This is the safer choice
under memory pressure: it preserves MPTCP correctness and provides
backpressure to the sender.
Control packets remain unaffected: ACK updates and DATA_FIN handling
happen before attempting the extension allocation, and tcp_reset()
continues to ignore the return value.
With this change, MPTCP continues to work at high throughput if we
artificially inject failures into skb_ext_add.
Fixes:
6787b7e350d3 ("mptcp: avoid processing packet if a subflow reset")
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Paasch <cpaasch@openai.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-1-521fe9957892@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Minhong He [Fri, 15 Aug 2025 06:38:45 +0000 (14:38 +0800)]
ipv6: sr: validate HMAC algorithm ID in seg6_hmac_info_add
The seg6_genl_sethmac() directly uses the algorithm ID provided by the
userspace without verifying whether it is an HMAC algorithm supported
by the system.
If an unsupported HMAC algorithm ID is configured, packets using SRv6 HMAC
will be dropped during encapsulation or decapsulation.
Fixes:
4f4853dc1c9c ("ipv6: sr: implement API to control SR HMAC structure")
Signed-off-by: Minhong He <heminhong@kylinos.cn>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250815063845.85426-1-heminhong@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Ramaseuski [Thu, 14 Aug 2025 10:51:19 +0000 (12:51 +0200)]
net: gso: Forbid IPv6 TSO with extensions on devices with only IPV6_CSUM
When performing Generic Segmentation Offload (GSO) on an IPv6 packet that
contains extension headers, the kernel incorrectly requests checksum offload
if the egress device only advertises NETIF_F_IPV6_CSUM feature, which has
a strict contract: it supports checksum offload only for plain TCP or UDP
over IPv6 and explicitly does not support packets with extension headers.
The current GSO logic violates this contract by failing to disable the feature
for packets with extension headers, such as those used in GREoIPv6 tunnels.
This violation results in the device being asked to perform an operation
it cannot support, leading to a `skb_warn_bad_offload` warning and a collapse
of network throughput. While device TSO/USO is correctly bypassed in favor
of software GSO for these packets, the GSO stack must be explicitly told not
to request checksum offload.
Mask NETIF_F_IPV6_CSUM, NETIF_F_TSO6 and NETIF_F_GSO_UDP_L4
in gso_features_check if the IPv6 header contains extension headers to compute
checksum in software.
The exception is a BIG TCP extension, which, as stated in commit
68e068cabd2c6c53 ("net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets"):
"The feature is only enabled on devices that support BIG TCP TSO.
The header is only present for PF_PACKET taps like tcpdump,
and not transmitted by physical devices."
kernel log output (truncated):
WARNING: CPU: 1 PID: 5273 at net/core/dev.c:3535 skb_warn_bad_offload+0x81/0x140
...
Call Trace:
<TASK>
skb_checksum_help+0x12a/0x1f0
validate_xmit_skb+0x1a3/0x2d0
validate_xmit_skb_list+0x4f/0x80
sch_direct_xmit+0x1a2/0x380
__dev_xmit_skb+0x242/0x670
__dev_queue_xmit+0x3fc/0x7f0
ip6_finish_output2+0x25e/0x5d0
ip6_finish_output+0x1fc/0x3f0
ip6_tnl_xmit+0x608/0xc00 [ip6_tunnel]
ip6gre_tunnel_xmit+0x1c0/0x390 [ip6_gre]
dev_hard_start_xmit+0x63/0x1c0
__dev_queue_xmit+0x6d0/0x7f0
ip6_finish_output2+0x214/0x5d0
ip6_finish_output+0x1fc/0x3f0
ip6_xmit+0x2ca/0x6f0
ip6_finish_output+0x1fc/0x3f0
ip6_xmit+0x2ca/0x6f0
inet6_csk_xmit+0xeb/0x150
__tcp_transmit_skb+0x555/0xa80
tcp_write_xmit+0x32a/0xe90
tcp_sendmsg_locked+0x437/0x1110
tcp_sendmsg+0x2f/0x50
...
skb linear:
00000000: e4 3d 1a 7d ec 30 e4 3d 1a 7e 5d 90 86 dd 60 0e
skb linear:
00000010: 00 0a 1b 34 3c 40 20 11 00 00 00 00 00 00 00 00
skb linear:
00000020: 00 00 00 00 00 12 20 11 00 00 00 00 00 00 00 00
skb linear:
00000030: 00 00 00 00 00 11 2f 00 04 01 04 01 01 00 00 00
skb linear:
00000040: 86 dd 60 0e 00 0a 1b 00 06 40 20 23 00 00 00 00
skb linear:
00000050: 00 00 00 00 00 00 00 00 00 12 20 23 00 00 00 00
skb linear:
00000060: 00 00 00 00 00 00 00 00 00 11 bf 96 14 51 13 f9
skb linear:
00000070: ae 27 a0 a8 2b e3 80 18 00 40 5b 6f 00 00 01 01
skb linear:
00000080: 08 0a 42 d4 50 d5 4b 70 f8 1a
Fixes:
04c20a9356f283da ("net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension")
Reported-by: Tianhao Zhao <tizhao@redhat.com>
Suggested-by: Michal Schmidt <mschmidt@redhat.com>
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Jakub Ramaseuski <jramaseu@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250814105119.1525687-1-jramaseu@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 14 Aug 2025 19:43:23 +0000 (12:43 -0700)]
selftests: tls: make the new data_steal test less flaky
The CI has hit a couple of cases of:
RUN global.data_steal ...
tls.c:2762:data_steal:Expected recv(cfd, buf2, sizeof(buf2), MSG_DONTWAIT) (20000) == -1 (-1)
data_steal: Test terminated by timeout
FAIL global.data_steal
Looks like the 2msec sleep is not long enough. Make the sleep longer,
and then instead of second sleep wait for the thieving process to exit.
That way we can be sure it called recv() before us.
While at it also avoid trying to steal more than a record, this seems
to be causing issues in manual testing as well.
Fixes:
d7e82594a45c ("selftests: tls: test TCP stealing data from under the TLS socket")
Link: https://patch.msgid.link/20250814194323.2014650-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chandra Mohan Sundar [Thu, 14 Aug 2025 16:30:10 +0000 (22:00 +0530)]
net: libwx: Fix the size in RSS hash key population
While trying to fill a random RSS key, the size of the pointer
is being used rather than the actual size of the RSS key.
Fix by passing an appropriate value of the RSS key.
This issue was reported by static coverity analyser.
Fixes:
eb4898fde1de8 ("net: libwx: add wangxun vf common api")
Signed-off-by: Chandra Mohan Sundar <chandramohan.explore@gmail.com>
Link: https://patch.msgid.link/20250814163014.613004-1-chandramohan.explore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 15 Aug 2025 17:56:11 +0000 (10:56 -0700)]
Merge tag 'for-net-2025-08-15' of git://git./linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_conn: Fix running bis_cleanup for hci_conn->type PA_LINK
- hci_conn: Fix not cleaning up Broadcaster/Broadcast Source
- hci_core: Fix using {cis,bis}_capable for current settings
- hci_core: Fix using ll_privacy_capable for current settings
- hci_core: Fix not accounting for BIS/CIS/PA links separately
- hci_conn: do return error from hci_enhanced_setup_sync()
- hci_event: fix MTU for BN == 0 in CIS Established
- hci_sync: Fix scan state after PA Sync has been established
- hci_sync: Avoid adding default advertising on startup
- hci_sync: Prevent unintended PA sync when SID is 0xFF
- ISO: Fix getname not returning broadcast fields
- btmtk: Fix wait_on_bit_timeout interruption during shutdown
- btnxpuart: Uses threaded IRQ for host wakeup handling
* tag 'for-net-2025-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_core: Fix not accounting for BIS/CIS/PA links separately
Bluetooth: btnxpuart: Uses threaded IRQ for host wakeup handling
Bluetooth: hci_conn: do return error from hci_enhanced_setup_sync()
Bluetooth: hci_event: fix MTU for BN == 0 in CIS Established
Bluetooth: hci_sync: Prevent unintended PA sync when SID is 0xFF
Bluetooth: hci_core: Fix using ll_privacy_capable for current settings
Bluetooth: hci_core: Fix using {cis,bis}_capable for current settings
Bluetooth: btmtk: Fix wait_on_bit_timeout interruption during shutdown
Bluetooth: hci_conn: Fix not cleaning up Broadcaster/Broadcast Source
Bluetooth: hci_conn: Fix running bis_cleanup for hci_conn->type PA_LINK
Bluetooth: ISO: Fix getname not returning broadcast fields
Bluetooth: hci_sync: Fix scan state after PA Sync has been established
Bluetooth: hci_sync: Avoid adding default advertising on startup
====================
Link: https://patch.msgid.link/20250815142229.253052-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 15 Aug 2025 17:44:50 +0000 (10:44 -0700)]
Merge branch 'mlxsw-spectrum-forward-packets-with-an-ipv4-link-local-source-ip'
Petr Machata says:
====================
mlxsw: spectrum: Forward packets with an IPv4 link-local source IP
By default, Spectrum devices do not forward IPv4 packets with a link-local
source IP (i.e., 169.254.0.0/16). This behavior does not align with the
kernel which does forward them. Fix the issue and add a selftest.
====================
Link: https://patch.msgid.link/cover.1755174341.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Thu, 14 Aug 2025 13:06:41 +0000 (15:06 +0200)]
selftest: forwarding: router: Add a test case for IPv4 link-local source IP
Add a test case which checks that packets with an IPv4 link-local source
IP are forwarded and not dropped.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/3c2e0b17d99530f57bef5ddff9af284fa0c9b667.1755174341.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Thu, 14 Aug 2025 13:06:40 +0000 (15:06 +0200)]
mlxsw: spectrum: Forward packets with an IPv4 link-local source IP
By default, the device does not forward IPv4 packets with a link-local
source IP (i.e., 169.254.0.0/16). This behavior does not align with the
kernel which does forward them.
Fix by instructing the device to forward such packets instead of
dropping them.
Fixes:
ca360db4b825 ("mlxsw: spectrum: Disable DIP_LINK_LOCAL check in hardware pipeline")
Reported-by: Zoey Mertes <zoey@cloudflare.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/6721e6b2c96feb80269e72ce8d0b426e2f32d99c.1755174341.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Luiz Augusto von Dentz [Thu, 14 Aug 2025 15:57:19 +0000 (11:57 -0400)]
Bluetooth: hci_core: Fix not accounting for BIS/CIS/PA links separately
This fixes the likes of hci_conn_num(CIS_LINK) returning the total of
ISO connection which includes BIS_LINK as well, so this splits the
iso_num into each link type and introduces hci_iso_num that can be used
in places where the total number of ISO connection still needs to be
used.
Fixes:
23205562ffc8 ("Bluetooth: separate CIS_LINK and BIS_LINK link types")
Fixes:
a7bcffc673de ("Bluetooth: Add PA_LINK to distinguish BIG sync and PA sync connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Neeraj Sanjay Kale [Mon, 4 Aug 2025 10:30:15 +0000 (16:00 +0530)]
Bluetooth: btnxpuart: Uses threaded IRQ for host wakeup handling
This replaces devm_request_irq() with devm_request_threaded_irq().
On iMX93 11x11 EVK platform, the BT chip's BT_WAKE_OUT pin is connected
to an I2C GPIO expander instead of directly been connected to iMX GPIO.
When I2C GPIO expander's (PCAL6524) host driver receives an interrupt on
it's INTR line, the driver's interrupt handler needs to query the
interrupt source with PCAL6524 first, and then call the actual interrupt
handler, in this case the IRQ handler in BTNXPUART.
In order to handle interrupts when such I2C GPIO expanders are between
the host and interrupt source, devm_request_threaded_irq() is needed.
This commit also removes the IRQF_TRIGGER_FALLING flag, to allow setting
the IRQ trigger type from the device tree setting instead of hardcoding
in the driver.
Signed-off-by: Neeraj Sanjay Kale <neeraj.sanjaykale@nxp.com>
Reviewed-by: Sherry Sun <sherry.sun@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Sergey Shtylyov [Tue, 5 Aug 2025 19:14:51 +0000 (22:14 +0300)]
Bluetooth: hci_conn: do return error from hci_enhanced_setup_sync()
The commit
e07a06b4eb41 ("Bluetooth: Convert SCO configure_datapath to
hci_sync") missed to update the *return* statement under the *case* of
BT_CODEC_TRANSPARENT in hci_enhanced_setup_sync(), which led to returning
success (0) instead of the negative error code (-EINVAL). However, the
result of hci_enhanced_setup_sync() seems to be ignored anyway, since NULL
gets passed to hci_cmd_sync_queue() as the last argument in that case and
the only function interested in that result is specified by that argument.
Fixes:
e07a06b4eb41 ("Bluetooth: Convert SCO configure_datapath to hci_sync")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Pauli Virtanen [Sat, 9 Aug 2025 08:36:20 +0000 (11:36 +0300)]
Bluetooth: hci_event: fix MTU for BN == 0 in CIS Established
BN == 0x00 in CIS Established means no isochronous data for the
corresponding direction (Core v6.1 pp. 2394). In this case SDU MTU
should be 0.
However, the specification does not say the Max_PDU_C_To_P or P_To_C are
then zero. Intel AX210 in Framed CIS mode sets nonzero Max_PDU for
direction with zero BN. This causes failure later when we try to LE
Setup ISO Data Path for disabled direction, which is disallowed (Core
v6.1 pp. 2750).
Fix by setting SDU MTU to 0 if BN == 0.
Fixes:
2be22f1941d5f ("Bluetooth: hci_event: Fix parsing of CIS Established Event")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Yang Li [Thu, 7 Aug 2025 07:56:03 +0000 (15:56 +0800)]
Bluetooth: hci_sync: Prevent unintended PA sync when SID is 0xFF
After LE Extended Scan times out, conn->sid remains 0xFF,
so the PA sync creation process should be aborted.
Btmon snippet from PA sync with SID=0xFF:
< HCI Command: LE Set Extended.. (0x08|0x0042) plen 6 #74726 [hci0] 863.107927
Extended scan: Enabled (0x01)
Filter duplicates: Enabled (0x01)
Duration: 0 msec (0x0000)
Period: 0.00 sec (0x0000)
> HCI Event: Command Complete (0x0e) plen 4 #74727 [hci0] 863.109389
LE Set Extended Scan Enable (0x08|0x0042) ncmd 1
Status: Success (0x00)
< HCI Command: LE Periodic Ad.. (0x08|0x0044) plen 14 #74728 [hci0] 865.141168
Options: 0x0000
Use advertising SID, Advertiser Address Type and address
Reporting initially enabled
SID: 0xff
Adv address type: Random (0x01)
Adv address: 0D:D7:2C:E7:42:46 (Non-Resolvable)
Skip: 0x0000
Sync timeout: 20000 msec (0x07d0)
Sync CTE type: 0x0000
> HCI Event: Command Status (0x0f) plen 4 #74729 [hci0] 865.143223
LE Periodic Advertising Create Sync (0x08|0x0044) ncmd 1
Status: Success (0x00)
Fixes:
e2d471b7806b ("Bluetooth: ISO: Fix not using SID from adv report")
Signed-off-by: Yang Li <yang.li@amlogic.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Luiz Augusto von Dentz [Mon, 4 Aug 2025 18:05:03 +0000 (14:05 -0400)]
Bluetooth: hci_core: Fix using ll_privacy_capable for current settings
ll_privacy_capable only indicates that the controller supports the
feature but it doesnt' check that LE is enabled so it end up being
marked as active in the current settings when it shouldn't.
Fixes:
ad383c2c65a5 ("Bluetooth: hci_sync: Enable advertising when LL privacy is enabled")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Luiz Augusto von Dentz [Mon, 4 Aug 2025 13:54:05 +0000 (09:54 -0400)]
Bluetooth: hci_core: Fix using {cis,bis}_capable for current settings
{cis,bis}_capable only indicates the controller supports the feature
since it doesn't check that LE is enabled so it shall not be used for
current setting, instead this introduces {cis,bis}_enabled macros that
can be used to indicate that these features are currently enabled.
Fixes:
26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections")
Fixes:
eca0ae4aea66 ("Bluetooth: Add initial implementation of BIS connections")
Fixes:
ae7533613133 ("Bluetooth: Check for ISO support in controller")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Jiande Lu [Thu, 24 Jul 2025 08:51:17 +0000 (16:51 +0800)]
Bluetooth: btmtk: Fix wait_on_bit_timeout interruption during shutdown
During the shutdown process, an interrupt occurs that
prematurely terminates the wait for the expected event.
This change replaces TASK_INTERRUPTIBLE with
TASK_UNINTERRUPTIBLE in the wait_on_bit_timeout call to ensure
the shutdown process completes as intended without being
interrupted by signals.
Fixes:
d019930b0049 ("Bluetooth: btmtk: move btusb_mtk_hci_wmt_sync to btmtk.c")
Signed-off-by: Jiande Lu <jiande.lu@mediatek.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Luiz Augusto von Dentz [Tue, 29 Jul 2025 16:11:09 +0000 (12:11 -0400)]
Bluetooth: hci_conn: Fix not cleaning up Broadcaster/Broadcast Source
This fixes Broadcaster/Broadcast Source not sending HCI_OP_LE_TERM_BIG
because HCI_CONN_PER_ADV where not being set.
Fixes:
a7bcffc673de ("Bluetooth: Add PA_LINK to distinguish BIG sync and PA sync connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Luiz Augusto von Dentz [Mon, 28 Jul 2025 17:51:01 +0000 (13:51 -0400)]
Bluetooth: hci_conn: Fix running bis_cleanup for hci_conn->type PA_LINK
Connections with type of PA_LINK shall be considered temporary just to
track the lifetime of PA Sync setup, once the BIG Sync is established
and connection are created with BIS_LINK the existing PA_LINK
connection shall not longer use bis_cleanup otherwise it terminates the
PA Sync when that shall be left to BIS_LINK connection to do it.
Fixes:
a7bcffc673de ("Bluetooth: Add PA_LINK to distinguish BIG sync and PA sync connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Luiz Augusto von Dentz [Thu, 24 Jul 2025 20:36:27 +0000 (16:36 -0400)]
Bluetooth: ISO: Fix getname not returning broadcast fields
getname shall return iso_bc fields for both BIS_LINK and PA_LINK since
the likes of bluetoothd do use the getpeername to retrieve the SID both
when enumerating the broadcasters and when synchronizing.
Fixes:
a7bcffc673de ("Bluetooth: Add PA_LINK to distinguish BIG sync and PA sync connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Luiz Augusto von Dentz [Thu, 24 Jul 2025 20:43:18 +0000 (16:43 -0400)]
Bluetooth: hci_sync: Fix scan state after PA Sync has been established
Passive scanning is used to program the address of the peer to be
synchronized, so once HCI_EV_LE_PA_SYNC_ESTABLISHED is received it
needs to be updated after clearing HCI_PA_SYNC then call
hci_update_passive_scan_sync to return it to its original state.
Fixes:
6d0417e4e1cf ("Bluetooth: hci_conn: Fix not setting conn_timeout for Broadcast Receiver")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Yang Li [Mon, 28 Jul 2025 09:08:44 +0000 (17:08 +0800)]
Bluetooth: hci_sync: Avoid adding default advertising on startup
list_empty(&hdev->adv_instances) is always true during startup,
so an advertising instance is added by default.
Call trace:
dump_backtrace+0x94/0xec
show_stack+0x18/0x24
dump_stack_lvl+0x48/0x60
dump_stack+0x18/0x24
hci_setup_ext_adv_instance_sync+0x17c/0x328
hci_powered_update_adv_sync+0xb4/0x12c
hci_powered_update_sync+0x54/0x70
hci_power_on_sync+0xe4/0x278
hci_set_powered_sync+0x28/0x34
set_powered_sync+0x40/0x58
hci_cmd_sync_work+0x94/0x100
process_one_work+0x168/0x444
worker_thread+0x378/0x3f4
kthread+0x108/0x10c
ret_from_fork+0x10/0x20
Link: https://github.com/bluez/bluez/issues/1442
Signed-off-by: Yang Li <yang.li@amlogic.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Justin Lai [Wed, 13 Aug 2025 07:16:31 +0000 (15:16 +0800)]
rtase: Fix Rx descriptor CRC error bit definition
The CRC error bit is located at bit 17 in the Rx descriptor, but the
driver was incorrectly using bit 16. Fix it.
Fixes:
a36e9f5cfe9e ("rtase: Add support for a pci table in this module")
Signed-off-by: Justin Lai <justinlai0215@realtek.com>
Link: https://patch.msgid.link/20250813071631.7566-1-justinlai0215@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
William Liu [Tue, 12 Aug 2025 23:58:26 +0000 (23:58 +0000)]
selftests/tc-testing: Check backlog stats in gso_skb case
Add tests to ensure proper backlog accounting in hhf, codel, pie, fq,
fq_pie, and fq_codel qdiscs. We check for the bug pattern originally
found in fq, fq_pie, and fq_codel, which was an underflow in the tbf
parent backlog stats upon child qdisc removal.
Signed-off-by: William Liu <will@willsroot.io>
Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>
Link: https://patch.msgid.link/20250812235808.45281-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
William Liu [Tue, 12 Aug 2025 23:57:57 +0000 (23:57 +0000)]
net/sched: Fix backlog accounting in qdisc_dequeue_internal
This issue applies for the following qdiscs: hhf, fq, fq_codel, and
fq_pie, and occurs in their change handlers when adjusting to the new
limit. The problem is the following in the values passed to the
subsequent qdisc_tree_reduce_backlog call given a tbf parent:
When the tbf parent runs out of tokens, skbs of these qdiscs will
be placed in gso_skb. Their peek handlers are qdisc_peek_dequeued,
which accounts for both qlen and backlog. However, in the case of
qdisc_dequeue_internal, ONLY qlen is accounted for when pulling
from gso_skb. This means that these qdiscs are missing a
qdisc_qstats_backlog_dec when dropping packets to satisfy the
new limit in their change handlers.
One can observe this issue with the following (with tc patched to
support a limit of 0):
export TARGET=fq
tc qdisc del dev lo root
tc qdisc add dev lo root handle 1: tbf rate 8bit burst 100b latency 1ms
tc qdisc replace dev lo handle 3: parent 1:1 $TARGET limit 1000
echo ''; echo 'add child'; tc -s -d qdisc show dev lo
ping -I lo -f -c2 -s32 -W0.001 127.0.0.1 2>&1 >/dev/null
echo ''; echo 'after ping'; tc -s -d qdisc show dev lo
tc qdisc change dev lo handle 3: parent 1:1 $TARGET limit 0
echo ''; echo 'after limit drop'; tc -s -d qdisc show dev lo
tc qdisc replace dev lo handle 2: parent 1:1 sfq
echo ''; echo 'post graft'; tc -s -d qdisc show dev lo
The second to last show command shows 0 packets but a positive
number (74) of backlog bytes. The problem becomes clearer in the
last show command, where qdisc_purge_queue triggers
qdisc_tree_reduce_backlog with the positive backlog and causes an
underflow in the tbf parent's backlog (4096 Mb instead of 0).
To fix this issue, the codepath for all clients of qdisc_dequeue_internal
has been simplified: codel, pie, hhf, fq, fq_pie, and fq_codel.
qdisc_dequeue_internal handles the backlog adjustments for all cases that
do not directly use the dequeue handler.
The old fq_codel_change limit adjustment loop accumulated the arguments to
the subsequent qdisc_tree_reduce_backlog call through the cstats field.
However, this is confusing and error prone as fq_codel_dequeue could also
potentially mutate this field (which qdisc_dequeue_internal calls in the
non gso_skb case), so we have unified the code here with other qdiscs.
Fixes:
2d3cbfd6d54a ("net_sched: Flush gso_skb list too during ->change()")
Fixes:
4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
Fixes:
10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
Signed-off-by: William Liu <will@willsroot.io>
Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>
Link: https://patch.msgid.link/20250812235725.45243-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wang Liang [Wed, 13 Aug 2025 02:10:54 +0000 (10:10 +0800)]
net: bridge: fix soft lockup in br_multicast_query_expired()
When set multicast_query_interval to a large value, the local variable
'time' in br_multicast_send_query() may overflow. If the time is smaller
than jiffies, the timer will expire immediately, and then call mod_timer()
again, which creates a loop and may trigger the following soft lockup
issue.
watchdog: BUG: soft lockup - CPU#1 stuck for 221s! [rb_consumer:66]
CPU: 1 UID: 0 PID: 66 Comm: rb_consumer Not tainted 6.16.0+ #259 PREEMPT(none)
Call Trace:
<IRQ>
__netdev_alloc_skb+0x2e/0x3a0
br_ip6_multicast_alloc_query+0x212/0x1b70
__br_multicast_send_query+0x376/0xac0
br_multicast_send_query+0x299/0x510
br_multicast_query_expired.constprop.0+0x16d/0x1b0
call_timer_fn+0x3b/0x2a0
__run_timers+0x619/0x950
run_timer_softirq+0x11c/0x220
handle_softirqs+0x18e/0x560
__irq_exit_rcu+0x158/0x1a0
sysvec_apic_timer_interrupt+0x76/0x90
</IRQ>
This issue can be reproduced with:
ip link add br0 type bridge
echo 1 > /sys/class/net/br0/bridge/multicast_querier
echo 0xffffffffffffffff >
/sys/class/net/br0/bridge/multicast_query_interval
ip link set dev br0 up
The multicast_startup_query_interval can also cause this issue. Similar to
the commit
99b40610956a ("net: bridge: mcast: add and enforce query
interval minimum"), add check for the query interval maximum to fix this
issue.
Link: https://lore.kernel.org/netdev/20250806094941.1285944-1-wangliang74@huawei.com/
Link: https://lore.kernel.org/netdev/20250812091818.542238-1-wangliang74@huawei.com/
Fixes:
d902eee43f19 ("bridge: Add multicast count/interval sysfs entries")
Suggested-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250813021054.1643649-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Suraj Gupta [Wed, 13 Aug 2025 13:55:59 +0000 (19:25 +0530)]
net: xilinx: axienet: Fix RX skb ring management in DMAengine mode
Submit multiple descriptors in axienet_rx_cb() to fill Rx skb ring. This
ensures the ring "catches up" on previously missed allocations.
Increment Rx skb ring head pointer after BD is successfully allocated.
Previously, head pointer was incremented before verifying if descriptor is
successfully allocated and has valid entries, which could lead to ring
state inconsistency if descriptor setup failed.
These changes improve reliability by maintaining adequate descriptor
availability and ensuring proper ring buffer state management.
Fixes:
6a91b846af85 ("net: axienet: Introduce dmaengine support")
Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
Link: https://patch.msgid.link/20250813135559.1555652-1-suraj.gupta2@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexandra Winter [Wed, 13 Aug 2025 11:16:33 +0000 (13:16 +0200)]
MAINTAINERS: update s390/net
Remove Thorsten Winkler as maintainer and add Aswin Karuvally as reviewer.
Thank you Thorsten for your support, welcome Aswin!
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Acked-by: Thorsten Winkler <twinkler@linux.ibm.com>
Acked-by: Aswin Karuvally <aswin@linux.ibm.com>
Link: https://patch.msgid.link/20250813111633.241111-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 14 Aug 2025 14:14:30 +0000 (07:14 -0700)]
Merge tag 'net-6.17-rc2' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from Netfilter and IPsec.
Current release - regressions:
- netfilter: nft_set_pipapo:
- don't return bogus extension pointer
- fix null deref for empty set
Current release - new code bugs:
- core: prevent deadlocks when enabling NAPIs with mixed kthread
config
- eth: netdevsim: Fix wild pointer access in nsim_queue_free().
Previous releases - regressions:
- page_pool: allow enabling recycling late, fix false positive
warning
- sched: ets: use old 'nbands' while purging unused classes
- xfrm:
- restore GSO for SW crypto
- bring back device check in validate_xmit_xfrm
- tls: handle data disappearing from under the TLS ULP
- ptp: prevent possible ABBA deadlock in ptp_clock_freerun()
- eth:
- bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
- hv_netvsc: fix panic during namespace deletion with VF
Previous releases - always broken:
- netfilter: fix refcount leak on table dump
- vsock: do not allow binding to VMADDR_PORT_ANY
- sctp: linearize cloned gso packets in sctp_rcv
- eth:
- hibmcge: fix the division by zero issue
- microchip: fix KSZ8863 reset problem"
* tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
net: usb: asix_devices: add phy_mask for ax88772 mdio bus
net: kcm: Fix race condition in kcm_unattach()
selftests: net/forwarding: test purge of active DWRR classes
net/sched: ets: use old 'nbands' while purging unused classes
bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
netdevsim: Fix wild pointer access in nsim_queue_free().
net: mctp: Fix bad kfree_skb in bind lookup test
netfilter: nf_tables: reject duplicate device on updates
ipvs: Fix estimator kthreads preferred affinity
netfilter: nft_set_pipapo: fix null deref for empty set
selftests: tls: test TCP stealing data from under the TLS socket
tls: handle data disappearing from under the TLS ULP
ptp: prevent possible ABBA deadlock in ptp_clock_freerun()
ixgbe: prevent from unwanted interface name changes
devlink: let driver opt out of automatic phys_port_name generation
net: prevent deadlocks when enabling NAPIs with mixed kthread config
net: update NAPI threaded config even for disabled NAPIs
selftests: drv-net: don't assume device has only 2 queues
docs: Fix name for net.ipv4.udp_child_hash_entries
riscv: dts: thead: Add APB clocks for TH1520 GMACs
...
Xu Yang [Mon, 11 Aug 2025 09:29:31 +0000 (17:29 +0800)]
net: usb: asix_devices: add phy_mask for ax88772 mdio bus
Without setting phy_mask for ax88772 mdio bus, current driver may create
at most 32 mdio phy devices with phy address range from 0x00 ~ 0x1f.
DLink DUB-E100 H/W Ver B1 is such a device. However, only one main phy
device will bind to net phy driver. This is creating issue during system
suspend/resume since phy_polling_mode() in phy_state_machine() will
directly deference member of phydev->drv for non-main phy devices. Then
NULL pointer dereference issue will occur. Due to only external phy or
internal phy is necessary, add phy_mask for ax88772 mdio bus to workarnoud
the issue.
Closes: https://lore.kernel.org/netdev/
20250806082931.
3289134-1-xu.yang_2@nxp.com
Fixes:
e532a096be0e ("net: usb: asix: ax88772: add phylib support")
Cc: stable@vger.kernel.org
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250811092931.860333-1-xu.yang_2@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Linus Torvalds [Thu, 14 Aug 2025 03:23:32 +0000 (20:23 -0700)]
Merge tag 'probes-fixes-v6.17-rc1' of git://git./linux/kernel/git/trace/linux-trace
Pull probes fix from Masami Hiramatsu:
- MAINTAINERS: Remove bouncing kprobes maintainer
* tag 'probes-fixes-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
MAINTAINERS: Remove bouncing kprobes maintainer
Dave Hansen [Thu, 14 Aug 2025 02:38:58 +0000 (11:38 +0900)]
MAINTAINERS: Remove bouncing kprobes maintainer
The kprobes MAINTAINERS entry includes anil.s.keshavamurthy@intel.com.
That address is bouncing. Remove it.
This still leaves three other listed maintainers.
Link: https://lore.kernel.org/all/20250808180124.7DDE2ECD@davehans-spike.ostc.intel.com/
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: linux-trace-kernel@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Sven Stegemann [Tue, 12 Aug 2025 19:18:03 +0000 (21:18 +0200)]
net: kcm: Fix race condition in kcm_unattach()
syzbot found a race condition when kcm_unattach(psock)
and kcm_release(kcm) are executed at the same time.
kcm_unattach() is missing a check of the flag
kcm->tx_stopped before calling queue_work().
If the kcm has a reserved psock, kcm_unattach() might get executed
between cancel_work_sync() and unreserve_psock() in kcm_release(),
requeuing kcm->tx_work right before kcm gets freed in kcm_done().
Remove kcm->tx_stopped and replace it by the less
error-prone disable_work_sync().
Fixes:
ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Reported-by: syzbot+e62c9db591c30e174662@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
e62c9db591c30e174662
Reported-by: syzbot+d199b52665b6c3069b94@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
d199b52665b6c3069b94
Reported-by: syzbot+be6b1fdfeae512726b4e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
be6b1fdfeae512726b4e
Signed-off-by: Sven Stegemann <sven@stegemann.de>
Link: https://patch.msgid.link/20250812191810.27777-1-sven@stegemann.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 14 Aug 2025 01:11:56 +0000 (18:11 -0700)]
Merge branch 'ets-use-old-nbands-while-purging-unused-classes'
Davide Caratti says:
====================
ets: use old 'nbands' while purging unused classes
- patch 1/2 fixes a NULL dereference in the control path of sch_ets qdisc
- patch 2/2 extends kselftests to verify effectiveness of the above fix
====================
Link: https://patch.msgid.link/cover.1755016081.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Davide Caratti [Tue, 12 Aug 2025 16:40:30 +0000 (18:40 +0200)]
selftests: net/forwarding: test purge of active DWRR classes
Extend sch_ets.sh to add a reproducer for problematic list deletions when
active DWRR class are purged by ets_qdisc_change() [1] [2].
[1] https://lore.kernel.org/netdev/
e08c7f4a6882f260011909a868311c6e9b54f3e4.
1639153474.git.dcaratti@redhat.com/
[2] https://lore.kernel.org/netdev/
f3b9bacc73145f265c19ab80785933da5b7cbdec.
1754581577.git.dcaratti@redhat.com/
Suggested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/489497cb781af7389011ca1591fb702a7391f5e7.1755016081.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Davide Caratti [Tue, 12 Aug 2025 16:40:29 +0000 (18:40 +0200)]
net/sched: ets: use old 'nbands' while purging unused classes
Shuang reported sch_ets test-case [1] crashing in ets_class_qlen_notify()
after recent changes from Lion [2]. The problem is: in ets_qdisc_change()
we purge unused DWRR queues; the value of 'q->nbands' is the new one, and
the cleanup should be done with the old one. The problem is here since my
first attempts to fix ets_qdisc_change(), but it surfaced again after the
recent qdisc len accounting fixes. Fix it purging idle DWRR queues before
assigning a new value of 'q->nbands', so that all purge operations find a
consistent configuration:
- old 'q->nbands' because it's needed by ets_class_find()
- old 'q->nstrict' because it's needed by ets_class_is_strict()
BUG: kernel NULL pointer dereference, address:
0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 62 UID: 0 PID: 39457 Comm: tc Kdump: loaded Not tainted 6.12.0-116.el10.x86_64 #1 PREEMPT(voluntary)
Hardware name: Dell Inc. PowerEdge R640/06DKY5, BIOS 2.12.2 07/09/2021
RIP: 0010:__list_del_entry_valid_or_report+0x4/0x80
Code: ff 4c 39 c7 0f 84 39 19 8e ff b8 01 00 00 00 c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <48> 8b 17 48 8b 4f 08 48 85 d2 0f 84 56 19 8e ff 48 85 c9 0f 84 ab
RSP: 0018:
ffffba186009f400 EFLAGS:
00010202
RAX:
00000000000000d6 RBX:
0000000000000000 RCX:
0000000000000004
RDX:
ffff9f0fa29b69c0 RSI:
0000000000000000 RDI:
0000000000000000
RBP:
ffffffffc12c2400 R08:
0000000000000008 R09:
0000000000000004
R10:
ffffffffffffffff R11:
0000000000000004 R12:
0000000000000000
R13:
ffff9f0f8cfe0000 R14:
0000000000100005 R15:
0000000000000000
FS:
00007f2154f37480(0000) GS:
ffff9f269c1c0000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000000 CR3:
00000001530be001 CR4:
00000000007726f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
PKRU:
55555554
Call Trace:
<TASK>
ets_class_qlen_notify+0x65/0x90 [sch_ets]
qdisc_tree_reduce_backlog+0x74/0x110
ets_qdisc_change+0x630/0xa40 [sch_ets]
__tc_modify_qdisc.constprop.0+0x216/0x7f0
tc_modify_qdisc+0x7c/0x120
rtnetlink_rcv_msg+0x145/0x3f0
netlink_rcv_skb+0x53/0x100
netlink_unicast+0x245/0x390
netlink_sendmsg+0x21b/0x470
____sys_sendmsg+0x39d/0x3d0
___sys_sendmsg+0x9a/0xe0
__sys_sendmsg+0x7a/0xd0
do_syscall_64+0x7d/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f2155114084
Code: 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 80 3d 25 f0 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89
RSP: 002b:
00007fff1fd7a988 EFLAGS:
00000202 ORIG_RAX:
000000000000002e
RAX:
ffffffffffffffda RBX:
0000560ec063e5e0 RCX:
00007f2155114084
RDX:
0000000000000000 RSI:
00007fff1fd7a9f0 RDI:
0000000000000003
RBP:
00007fff1fd7aa60 R08:
0000000000000010 R09:
000000000000003f
R10:
0000560ee9b3a010 R11:
0000000000000202 R12:
00007fff1fd7aae0
R13:
000000006891ccde R14:
0000560ec063e5e0 R15:
00007fff1fd7aad0
</TASK>
[1] https://lore.kernel.org/netdev/
e08c7f4a6882f260011909a868311c6e9b54f3e4.
1639153474.git.dcaratti@redhat.com/
[2] https://lore.kernel.org/netdev/
d912cbd7-193b-4269-9857-
525bee8bbb6a@gmail.com/
Cc: stable@vger.kernel.org
Fixes:
103406b38c60 ("net/sched: Always pass notifications when child class becomes empty")
Fixes:
c062f2a0b04d ("net/sched: sch_ets: don't remove idle classes from the round-robin list")
Fixes:
dcc68b4d8084 ("net: sch_ets: Add a new Qdisc")
Reported-by: Li Shuang <shuali@redhat.com>
Closes: https://issues.redhat.com/browse/RHEL-108026
Reviewed-by: Petr Machata <petrm@nvidia.com>
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://patch.msgid.link/7928ff6d17db47a2ae7cc205c44777b1f1950545.1755016081.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 14 Aug 2025 00:31:46 +0000 (17:31 -0700)]
Merge branch '10GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
ixgbe: bypass devlink phys_port_name generation
Jedrzej adds option to skip phys_port_name generation and opts
ixgbe into it as some configurations rely on pre-devlink naming
which could end up broken as a result.
* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ixgbe: prevent from unwanted interface name changes
devlink: let driver opt out of automatic phys_port_name generation
====================
Link: https://patch.msgid.link/20250812205226.1984369-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Wei [Tue, 12 Aug 2025 18:29:07 +0000 (11:29 -0700)]
bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
The data page pool always fills the HW rx ring with pages. On arm64 with
64K pages, this will waste _at least_ 32K of memory per entry in the rx
ring.
Fix by fragmenting the pages if PAGE_SIZE > BNXT_RX_PAGE_SIZE. This
makes the data page pool the same as the header pool.
Tested with iperf3 with a small (64 entries) rx ring to encourage buffer
circulation.
Fixes:
cd1fafe7da1f ("eth: bnxt: add support rx side device memory TCP")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250812182907.1540755-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Tue, 12 Aug 2025 16:21:26 +0000 (16:21 +0000)]
netdevsim: Fix wild pointer access in nsim_queue_free().
syzbot reported the splat below. [0]
When nsim_queue_uninit() is called from nsim_init_netdevsim(),
register_netdevice() has not been called, thus dev->dstats has
not been allocated.
Let's not call dev_dstats_rx_dropped_add() in such a case.
[0]
BUG: unable to handle page fault for address:
ffff88809782c020
PF: supervisor write access in kernel mode
PF: error_code(0x0002) - not-present page
PGD
1b401067 P4D
1b401067 PUD 0
Oops: Oops: 0002 [#1] SMP KASAN NOPTI
CPU: 3 UID: 0 PID: 8476 Comm: syz.1.251 Not tainted
6.16.0-syzkaller-06699-ge8d780dcd957 #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:local_add arch/x86/include/asm/local.h:33 [inline]
RIP: 0010:u64_stats_add include/linux/u64_stats_sync.h:89 [inline]
RIP: 0010:dev_dstats_rx_dropped_add include/linux/netdevice.h:3027 [inline]
RIP: 0010:nsim_queue_free+0xba/0x120 drivers/net/netdevsim/netdev.c:714
Code: 07 77 6c 4a 8d 3c ed 20 7e f1 8d 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 46 4a 03 1c ed 20 7e f1 8d <4c> 01 63 20 be 00 02 00 00 48 8d 3d 00 00 00 00 e8 61 2f 58 fa 48
RSP: 0018:
ffffc900044af150 EFLAGS:
00010286
RAX:
dffffc0000000000 RBX:
ffff88809782c000 RCX:
00000000000079c3
RDX:
1ffffffff1be2fc7 RSI:
ffffffff8c15f380 RDI:
ffffffff8df17e38
RBP:
ffff88805f59d000 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000001 R11:
0000000000000001 R12:
0000000000000000
R13:
0000000000000003 R14:
ffff88806ceb3d00 R15:
ffffed100dfd308e
FS:
0000000000000000(0000) GS:
ffff88809782c000(0063) knlGS:
00000000f505db40
CS: 0010 DS: 002b ES: 002b CR0:
0000000080050033
CR2:
ffff88809782c020 CR3:
000000006fc6a000 CR4:
0000000000352ef0
Call Trace:
<TASK>
nsim_queue_uninit drivers/net/netdevsim/netdev.c:993 [inline]
nsim_init_netdevsim drivers/net/netdevsim/netdev.c:1049 [inline]
nsim_create+0xd0a/0x1260 drivers/net/netdevsim/netdev.c:1101
__nsim_dev_port_add+0x435/0x7d0 drivers/net/netdevsim/dev.c:1438
nsim_dev_port_add_all drivers/net/netdevsim/dev.c:1494 [inline]
nsim_dev_reload_create drivers/net/netdevsim/dev.c:1546 [inline]
nsim_dev_reload_up+0x5b8/0x860 drivers/net/netdevsim/dev.c:1003
devlink_reload+0x322/0x7c0 net/devlink/dev.c:474
devlink_nl_reload_doit+0xe31/0x1410 net/devlink/dev.c:584
genl_family_rcv_msg_doit+0x206/0x2f0 net/netlink/genetlink.c:1115
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0x55c/0x800 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x155/0x420 net/netlink/af_netlink.c:2552
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline]
netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1346
netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896
sock_sendmsg_nosec net/socket.c:714 [inline]
__sock_sendmsg net/socket.c:729 [inline]
____sys_sendmsg+0xa95/0xc70 net/socket.c:2614
___sys_sendmsg+0x134/0x1d0 net/socket.c:2668
__sys_sendmsg+0x16d/0x220 net/socket.c:2700
do_syscall_32_irqs_on arch/x86/entry/syscall_32.c:83 [inline]
__do_fast_syscall_32+0x7c/0x3a0 arch/x86/entry/syscall_32.c:306
do_fast_syscall_32+0x32/0x80 arch/x86/entry/syscall_32.c:331
entry_SYSENTER_compat_after_hwframe+0x84/0x8e
RIP: 0023:0xf708e579
Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
RSP: 002b:
00000000f505d55c EFLAGS:
00000296 ORIG_RAX:
0000000000000172
RAX:
ffffffffffffffda RBX:
0000000000000007 RCX:
0000000080000080
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000000000
RBP:
0000000000000000 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000296 R12:
0000000000000000
R13:
0000000000000000 R14:
0000000000000000 R15:
0000000000000000
</TASK>
Modules linked in:
CR2:
ffff88809782c020
Fixes:
2a68a22304f9 ("netdevsim: account dropped packet length in stats on queue free")
Reported-by: syzbot+8aa80c6232008f7b957d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
688bb9ca.
a00a0220.26d0e1.0050.GAE@google.com/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250812162130.4129322-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matt Johnston [Tue, 12 Aug 2025 05:08:58 +0000 (13:08 +0800)]
net: mctp: Fix bad kfree_skb in bind lookup test
The kunit test's skb_pkt is consumed by mctp_dst_input() so shouldn't be
freed separately.
Fixes:
e6d8e7dbc5a3 ("net: mctp: Add bind lookup test")
Reported-by: Alexandre Ghiti <alex@ghiti.fr>
Closes: https://lore.kernel.org/all/
734b02a3-1941-49df-a0da-
ec14310d41e4@ghiti.fr/
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://patch.msgid.link/20250812-fix-mctp-bind-test-v1-1-5e2128664eb3@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 13 Aug 2025 21:51:51 +0000 (14:51 -0700)]
Merge tag 'nf-25-08-13' of https://git./linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for *net*:
1) I managed to add a null dereference crash in nft_set_pipapo
in the current development cycle, was not caught by CI
because the avx2 implementation is fine, but selftest
splats when run on non-avx2 host.
2) Fix the ipvs estimater kthread affinity, was incorrect
since 6.14. From Frederic Weisbecker.
3) nf_tables should not allow to add a device to a flowtable
or netdev chain more than once -- reject this.
From Pablo Neira Ayuso. This has been broken for long time,
blamed commit dates from v5.8.
* tag 'nf-25-08-13' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: reject duplicate device on updates
ipvs: Fix estimator kthreads preferred affinity
netfilter: nft_set_pipapo: fix null deref for empty set
====================
Link: https://patch.msgid.link/20250813113800.20775-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 13 Aug 2025 18:29:27 +0000 (11:29 -0700)]
Merge tag 'erofs-for-6.17-rc2-fixes' of git://git./linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- Align FSDAX enablement among multiple devices
- Fix EROFS_FS_ZIP_ACCEL build dependency again to prevent forcing
CRYPTO{,_DEFLATE}=y even if EROFS=m
- Fix atomic context detection to properly launch kworkers on demand
- Fix block count statistics for 48-bit addressing support
* tag 'erofs-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix block count report when 48-bit layout is on
erofs: fix atomic context detection when !CONFIG_DEBUG_LOCK_ALLOC
erofs: Do not select tristate symbols from bool symbols
erofs: Fallback to normal access if DAX is not supported on extra device
Linus Torvalds [Wed, 13 Aug 2025 17:23:28 +0000 (10:23 -0700)]
Merge tag 'rcu.fixes.6.17' of git://git./linux/kernel/git/rcu/linux
Pull RCU fix from Neeraj Upadhyay:
"Fix a regression introduced by commit
b41642c87716 ("rcu: Fix
rcu_read_unlock() deadloop due to IRQ work") which results in boot
hang as reported by kernel test bot at [1].
This issue happens because RCU re-initializes the deferred QS IRQ work
everytime it is queued. With commit
b41642c87716, the IRQ work
re-initialization can happen while it is already queued. This results
in IRQ work being requeued to itself. When IRQ work finally fires, as
it is requeued to itself, it is repeatedly executed and results in
hang.
Fix this with initializing the IRQ work only once before the CPU
boots"
Link: https://lore.kernel.org/rcu/202508071303.c1134cce-lkp@intel.com/
* tag 'rcu.fixes.6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
rcu: Fix racy re-initialization of irq_work causing hangs
Linus Torvalds [Wed, 13 Aug 2025 15:28:33 +0000 (08:28 -0700)]
Merge tag 'mm-hotfixes-stable-2025-08-12-20-50' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"12 hotfixes. 5 are cc:stable and the remainder address post-6.16
issues or aren't considered necessary for -stable kernels.
10 of these fixes are for MM"
* tag 'mm-hotfixes-stable-2025-08-12-20-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
proc: proc_maps_open allow proc_mem_open to return NULL
mm/mremap: avoid expensive folio lookup on mremap folio pte batch
userfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entry
mm: pass page directly instead of using folio_page
selftests/proc: fix string literal warning in proc-maps-race.c
fs/proc/task_mmu: hold PTL in pagemap_hugetlb_range and gather_hugetlb_stats
mm/smaps: fix race between smaps_hugetlb_range and migration
mm: fix the race between collapse and PT_RECLAIM under per-vma lock
mm/kmemleak: avoid soft lockup in __kmemleak_do_cleanup()
MAINTAINERS: add Masami as a reviewer of hung task detector
mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock
kasan/test: fix protection against compiler elision
Pablo Neira Ayuso [Wed, 13 Aug 2025 00:38:50 +0000 (02:38 +0200)]
netfilter: nf_tables: reject duplicate device on updates
A chain/flowtable update with duplicated devices in the same batch is
possible. Unfortunately, netdev event path only removes the first
device that is found, leaving unregistered the hook of the duplicated
device.
Check if a duplicated device exists in the transaction batch, bail out
with EEXIST in such case.
WARNING is hit when unregistering the hook:
[49042.221275] WARNING: CPU: 4 PID: 8425 at net/netfilter/core.c:340 nf_hook_entry_head+0xaa/0x150
[49042.221375] CPU: 4 UID: 0 PID: 8425 Comm: nft Tainted: G S 6.16.0+ #170 PREEMPT(full)
[...]
[49042.221382] RIP: 0010:nf_hook_entry_head+0xaa/0x150
Fixes:
78d9f48f7f44 ("netfilter: nf_tables: add devices to existing flowtable")
Fixes:
b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Frederic Weisbecker [Tue, 29 Jul 2025 12:26:11 +0000 (14:26 +0200)]
ipvs: Fix estimator kthreads preferred affinity
The estimator kthreads' affinity are defined by sysctl overwritten
preferences and applied through a plain call to the scheduler's affinity
API.
However since the introduction of managed kthreads preferred affinity,
such a practice shortcuts the kthreads core code which eventually
overwrites the target to the default unbound affinity.
Fix this with using the appropriate kthread's API.
Fixes:
d1a89197589c ("kthread: Default affine kthread to its preferred NUMA node")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Mon, 11 Aug 2025 10:26:10 +0000 (12:26 +0200)]
netfilter: nft_set_pipapo: fix null deref for empty set
Blamed commit broke the check for a null scratch map:
- if (unlikely(!m || !*raw_cpu_ptr(m->scratch)))
+ if (unlikely(!raw_cpu_ptr(m->scratch)))
This should have been "if (!*raw_ ...)".
Use the pattern of the avx2 version which is more readable.
This can only be reproduced if avx2 support isn't available.
Fixes:
d8d871a35ca9 ("netfilter: nft_set_pipapo: merge pipapo_get/lookup")
Signed-off-by: Florian Westphal <fw@strlen.de>
Jakub Kicinski [Thu, 7 Aug 2025 23:29:07 +0000 (16:29 -0700)]
selftests: tls: test TCP stealing data from under the TLS socket
Check a race where data disappears from the TCP socket after
TLS signaled that its ready to receive.
ok 6 global.data_steal
# RUN tls_basic.base_base ...
# OK tls_basic.base_base
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250807232907.600366-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 7 Aug 2025 23:29:06 +0000 (16:29 -0700)]
tls: handle data disappearing from under the TLS ULP
TLS expects that it owns the receive queue of the TCP socket.
This cannot be guaranteed in case the reader of the TCP socket
entered before the TLS ULP was installed, or uses some non-standard
read API (eg. zerocopy ones). Replace the WARN_ON() and a buggy
early exit (which leaves anchor pointing to a freed skb) with real
error handling. Wipe the parsing state and tell the reader to retry.
We already reload the anchor every time we (re)acquire the socket lock,
so the only condition we need to avoid is an out of bounds read
(not having enough bytes in the socket for previously parsed record len).
If some data was read from under TLS but there's enough in the queue
we'll reload and decrypt what is most likely not a valid TLS record.
Leading to some undefined behavior from TLS perspective (corrupting
a stream? missing an alert? missing an attack?) but no kernel crash
should take place.
Reported-by: William Liu <will@willsroot.io>
Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
Link: https://lore.kernel.org/tFjq_kf7sWIG3A7CrCg_egb8CVsT_gsmHAK0_wxDPJXfIzxFAMxqmLwp3MlU5EHiet0AwwJldaaFdgyHpeIUCS-3m3llsmRzp9xIOBR4lAI=@syst3mfailure.io
Fixes:
84c61fe1a75b ("tls: rx: do not use the standard strparser")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250807232907.600366-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jeongjun Park [Mon, 28 Jul 2025 06:26:49 +0000 (15:26 +0900)]
ptp: prevent possible ABBA deadlock in ptp_clock_freerun()
syzbot reported the following ABBA deadlock:
CPU0 CPU1
---- ----
n_vclocks_store()
lock(&ptp->n_vclocks_mux) [1]
(physical clock)
pc_clock_adjtime()
lock(&clk->rwsem) [2]
(physical clock)
...
ptp_clock_freerun()
ptp_vclock_in_use()
lock(&ptp->n_vclocks_mux) [3]
(physical clock)
ptp_clock_unregister()
posix_clock_unregister()
lock(&clk->rwsem) [4]
(virtual clock)
Since ptp virtual clock is registered only under ptp physical clock, both
ptp_clock and posix_clock must be physical clocks for ptp_vclock_in_use()
to lock &ptp->n_vclocks_mux and check ptp->n_vclocks.
However, when unregistering vclocks in n_vclocks_store(), the locking
ptp->n_vclocks_mux is a physical clock lock, but clk->rwsem of
ptp_clock_unregister() called through device_for_each_child_reverse()
is a virtual clock lock.
Therefore, clk->rwsem used in CPU0 and clk->rwsem used in CPU1 are
different locks, but in lockdep, a false positive occurs because the
possibility of deadlock is determined through lock-class.
To solve this, lock subclass annotation must be added to the posix_clock
rwsem of the vclock.
Reported-by: syzbot+7cfb66a237c4a5fb22ad@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
7cfb66a237c4a5fb22ad
Fixes:
73f37068d540 ("ptp: support ptp physical/virtual clocks conversion")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250728062649.469882-1-aha310510@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jedrzej Jagielski [Thu, 3 Jul 2025 10:41:15 +0000 (12:41 +0200)]
ixgbe: prevent from unwanted interface name changes
Users of the ixgbe driver report that after adding devlink support by
the commit
a0285236ab93 ("ixgbe: add initial devlink support") their
configs got broken due to unwanted changes of interface names. It's
caused by automatic phys_port_name generation during devlink port
initialization flow.
To prevent from that set no_phys_port_name flag for ixgbe devlink ports.
Reported-by: David Howells <dhowells@redhat.com>
Closes: https://lore.kernel.org/netdev/
3452224.
1745518016@warthog.procyon.org.uk/
Reported-by: David Kaplan <David.Kaplan@amd.com>
Closes: https://lore.kernel.org/netdev/LV3PR12MB92658474624CCF60220157199470A@LV3PR12MB9265.namprd12.prod.outlook.com/
Fixes:
a0285236ab93 ("ixgbe: add initial devlink support")
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jedrzej Jagielski [Fri, 4 Jul 2025 11:17:47 +0000 (13:17 +0200)]
devlink: let driver opt out of automatic phys_port_name generation
Currently when adding devlink port, phys_port_name is automatically
generated within devlink port initialization flow. As a result adding
devlink port support to driver may result in forced changes of interface
names, which breaks already existing network configs.
This is an expected behavior but in some scenarios it would not be
preferable to provide such limitation for legacy driver not being able to
keep 'pre-devlink' interface name.
Add flag no_phys_port_name to devlink_port_attrs struct which indicates
if devlink should not alter name of interface.
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Link: https://lore.kernel.org/all/nbwrfnjhvrcduqzjl4a2jafnvvud6qsbxlvxaxilnryglf4j7r@btuqrimnfuly/
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Linus Torvalds [Tue, 12 Aug 2025 19:10:33 +0000 (12:10 -0700)]
Merge tag 'pull-fixes' of git://git./linux/kernel/git/viro/vfs
Pull habanalabs fix from Al Viro:
"Yet another use-after-free fix due to dma_buf_fd() misuse"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
habanalabs: fix UAF in export_dmabuf()
Linus Torvalds [Tue, 12 Aug 2025 15:52:05 +0000 (08:52 -0700)]
Merge tag 'for-6.17-rc1-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix bug in qgroups reporting incorrect usage for higher level qgroups
- in zoned mode, do not select metadata group as finish target
- convert xarray lock to RCU when trying to release extent buffer to
avoid a deadlock
- do not allow relocation on partially dropped subvolumes, which is
normally not possible but has been reported on old filesystems
- in tree-log, report errors on missing block group when unaccounting
log tree extent buffers
- with large folios, fix range length when processing ordered extents
* tag 'for-6.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix iteration bug in __qgroup_excl_accounting()
btrfs: zoned: do not select metadata BG as finish target
btrfs: do not allow relocation of partially dropped subvolumes
btrfs: error on missing block group when unaccounting log tree extent buffers
btrfs: fix wrong length parameter for btrfs_cleanup_ordered_extents()
btrfs: make btrfs_cleanup_ordered_extents() support large folios
btrfs: fix subpage deadlock in try_release_subpage_extent_buffer()
Linus Torvalds [Tue, 12 Aug 2025 15:19:23 +0000 (08:19 -0700)]
Merge tag 'snp_cache_coherency' of git://git./linux/kernel/git/tip/tip
- Add a mitigation for a cache coherency vulnerability when running an
SNP guest which makes sure all cache lines belonging to a 4K page are
evicted after latter has been converted to a guest-private page
[ SNP: Secure Nested Paging - not to be confused with Single Nucleotide
Polymorphism, which is the more common use of that TLA. I am on a
mission to write out the more obscure TLAs in order to keep track of
them.
Because while math tells us that there are only about 17k different
combinations of three-letter acronyms using English letters (26^3), I
am convinced that somehow Intel, AMD and ARM have together figured out
new mathematics, and have at least a million different TLAs that they
use. - Linus ]
* tag 'snp_cache_coherency' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Evict cache lines during SNP memory validation
Paolo Abeni [Tue, 12 Aug 2025 12:57:02 +0000 (14:57 +0200)]
Merge tag 'ipsec-2025-08-11' of git://git./linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2025-08-11
1) Fix flushing of all states in xfrm_state_fini.
From Sabrina Dubroca.
2) Fix some IPsec software offload features. These
got lost with some recent HW offload changes.
From Sabrina Dubroca.
Please pull or let me know if there are problems.
* tag 'ipsec-2025-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
udp: also consider secpath when evaluating ipsec use for checksumming
xfrm: bring back device check in validate_xmit_xfrm
xfrm: restore GSO for SW crypto
xfrm: flush all states in xfrm_state_fini
====================
Link: https://patch.msgid.link/20250811092008.731573-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 12 Aug 2025 12:43:08 +0000 (14:43 +0200)]
Merge branch 'net-prevent-deadlocks-and-mis-configuration-with-per-napi-threaded-config'
Jakub Kicinski says:
====================
net: prevent deadlocks and mis-configuration with per-NAPI threaded config
Running the test added with a recent fix on a driver with persistent
NAPI config leads to a deadlock. The deadlock is fixed by patch 3,
patch 2 is I think a more fundamental problem with the way we
implemented the config.
I hope the fix makes sense, my own thinking is definitely colored
by my preference (IOW how the per-queue config RFC was implemented).
v1: https://lore.kernel.org/
20250808014952.724762-1-kuba@kernel.org
====================
Link: https://patch.msgid.link/20250809001205.1147153-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Sat, 9 Aug 2025 00:12:05 +0000 (17:12 -0700)]
net: prevent deadlocks when enabling NAPIs with mixed kthread config
The following order of calls currently deadlocks if:
- device has threaded=1; and
- NAPI has persistent config with threaded=0.
netif_napi_add_weight_config()
dev->threaded == 1
napi_kthread_create()
napi_enable()
napi_restore_config()
napi_set_threaded(0)
napi_stop_kthread()
while (NAPIF_STATE_SCHED)
msleep(20)
We deadlock because disabled NAPI has STATE_SCHED set.
Creating a thread in netif_napi_add() just to destroy it in
napi_disable() is fairly ugly in the first place. Let's read
both the device config and the NAPI config in netif_napi_add().
Fixes:
e6d76268813d ("net: Update threaded state in napi config in netif_set_threaded")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250809001205.1147153-4-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Sat, 9 Aug 2025 00:12:04 +0000 (17:12 -0700)]
net: update NAPI threaded config even for disabled NAPIs
We have to make sure that all future NAPIs will have the right threaded
state when the state is configured on the device level.
We chose not to have an "unset" state for threaded, and not to wipe
the NAPI config clean when channels are explicitly disabled.
This means the persistent config structs "exist" even when their NAPIs
are not instantiated.
Differently put - the NAPI persistent state lives in the net_device
(ncfg == struct napi_config):
,--- [napi 0] - [napi 1]
[dev] | |
`--- [ncfg 0] - [ncfg 1]
so say we a device with 2 queues but only 1 enabled:
,--- [napi 0]
[dev] |
`--- [ncfg 0] - [ncfg 1]
now we set the device to threaded=1:
,---------- [napi 0 (thr:1)]
[dev(thr:1)] |
`---------- [ncfg 0 (thr:1)] - [ncfg 1 (thr:?)]
Since [ncfg 1] was not attached to a NAPI during configuration we
skipped it. If we create a NAPI for it later it will have the old
setting (presumably disabled). One could argue if this is right
or not "in principle", but it's definitely not how things worked
before per-NAPI config..
Fixes:
2677010e7793 ("Add support to set NAPI threaded for individual NAPI")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250809001205.1147153-3-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Sat, 9 Aug 2025 00:12:03 +0000 (17:12 -0700)]
selftests: drv-net: don't assume device has only 2 queues
The test is implicitly assuming the device only has 2 queues.
A real device will likely have more. The exact problem is that
because NAPIs get added to the list from the head, the netlink
dump reports them in reverse order. So the naive napis[0] will
actually likely give us the _last_ NAPI, not the first one.
Re-enable all the NAPIs instead of hard-coding 2 in the test.
This way the NAPIs we operated on will always reappear,
doesn't matter where they were in the registration order.
Fixes:
e6d76268813d ("net: Update threaded state in napi config in netif_set_threaded")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250809001205.1147153-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jordan Rife [Fri, 8 Aug 2025 18:57:56 +0000 (11:57 -0700)]
docs: Fix name for net.ipv4.udp_child_hash_entries
udp_child_ehash_entries -> udp_child_hash_entries
Fixes:
9804985bf27f ("udp: Introduce optional per-netns hash table.")
Signed-off-by: Jordan Rife <jordan@jrife.io>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250808185800.1189042-1-jordan@jrife.io
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 12 Aug 2025 10:52:24 +0000 (12:52 +0200)]
Merge branch 'fix-broken-link-with-th1520-gmac-when-linkspeed-changes'
Yao Zi says:
====================
Fix broken link with TH1520 GMAC when linkspeed changes
It's noted that on TH1520 SoC, the GMAC's link becomes broken after
the link speed is changed (for example, running ethtool -s eth0 speed
100 on the peer when negotiated to 1Gbps), but the GMAC could function
normally if the speed is brought back to the initial.
Just like many other SoCs utilizing STMMAC IP, we need to adjust the TX
clock supplying TH1520's GMAC through some SoC-specific glue registers
when linkspeed changes. But it's found that after the full kernel
startup, reading from them results in garbage and writing to them makes
no effect, which is the cause of broken link.
Further testing shows perisys-apb4-hclk must be ungated for normal
access to Th1520 GMAC APB glue registers, which is neither described in
dt-binding nor acquired by the driver.
This series expands the dt-binding of TH1520's GMAC to allow an extra
"APB glue registers interface clock", instructs the driver to acquire
and enable the clock, and finally supplies CLK_PERISYS_APB4_HCLK for
TH1520's GMACs in SoC devicetree.
v2: https://lore.kernel.org/netdev/
20250801091240.46114-1-ziyao@disroot.org/
v1: https://lore.kernel.org/all/
20250729093734.40132-1-ziyao@disroot.org/
====================
Link: https://patch.msgid.link/20250808093655.48074-2-ziyao@disroot.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yao Zi [Fri, 8 Aug 2025 09:36:56 +0000 (09:36 +0000)]
riscv: dts: thead: Add APB clocks for TH1520 GMACs
Describe perisys-apb4-hclk as the APB clock for TH1520 SoC, which is
essential for accessing GMAC glue registers.
Fixes:
7e756671a664 ("riscv: dts: thead: Add TH1520 ethernet nodes")
Signed-off-by: Yao Zi <ziyao@disroot.org>
Reviewed-by: Drew Fustini <fustini@kernel.org>
Tested-by: Drew Fustini <fustini@kernel.org>
Link: https://patch.msgid.link/20250808093655.48074-5-ziyao@disroot.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yao Zi [Fri, 8 Aug 2025 09:36:55 +0000 (09:36 +0000)]
net: stmmac: thead: Get and enable APB clock on initialization
It's necessary to adjust the MAC TX clock when the linkspeed changes,
but it's noted such adjustment always fails on TH1520 SoC, and reading
back from APB glue registers that control clock generation results in
garbage, causing broken link.
With some testing, it's found a clock must be ungated for access to APB
glue registers. Without any consumer, the clock is automatically
disabled during late kernel startup. Let's get and enable it if it's
described in devicetree.
For backward compatibility with older devicetrees, probing won't fail if
the APB clock isn't found. In this case, we emit a warning since the
link will break if the speed changes.
Fixes:
33a1a01e3afa ("net: stmmac: Add glue layer for T-HEAD TH1520 SoC")
Signed-off-by: Yao Zi <ziyao@disroot.org>
Tested-by: Drew Fustini <fustini@kernel.org>
Reviewed-by: Drew Fustini <fustini@kernel.org>
Link: https://patch.msgid.link/20250808093655.48074-4-ziyao@disroot.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yao Zi [Fri, 8 Aug 2025 09:36:54 +0000 (09:36 +0000)]
dt-bindings: net: thead,th1520-gmac: Describe APB interface clock
Besides ones for GMAC core and peripheral registers, the TH1520 GMAC
requires one more clock for configuring APB glue registers. Describe
it in the binding.
Fixes:
f920ce04c399 ("dt-bindings: net: Add T-HEAD dwmac support")
Signed-off-by: Yao Zi <ziyao@disroot.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Drew Fustini <fustini@kernel.org>
Link: https://patch.msgid.link/20250808093655.48074-3-ziyao@disroot.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Buday Csaba [Thu, 7 Aug 2025 13:54:49 +0000 (15:54 +0200)]
net: mdiobus: release reset_gpio in mdiobus_unregister_device()
reset_gpio is claimed in mdiobus_register_device(), but it is not
released in mdiobus_unregister_device(). It is instead only
released when the whole MDIO bus is unregistered.
When a device uses the reset_gpio property, it becomes impossible
to unregister it and register it again, because the GPIO remains
claimed.
This patch resolves that issue.
Fixes:
bafbdd527d56 ("phylib: Add device reset GPIO support") # see notes
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Cc: Csókás Bence <csokas.bence@prolan.hu>
[ csokas.bence: Resolve rebase conflict and clarify msg ]
Signed-off-by: Buday Csaba <buday.csaba@prolan.hu>
Link: https://patch.msgid.link/20250807135449.254254-2-csokas.bence@prolan.hu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Clark Wang [Thu, 7 Aug 2025 04:08:32 +0000 (12:08 +0800)]
net: phy: nxp-c45-tja11xx: fix the PHY ID mismatch issue when using C45
TJA1103/04/20/21 support both C22 and C45 accessing methods.
The TJA11xx driver has implemented the match_phy_device() API.
However, it does not handle the C45 ID. If C45 was used to access
TJA11xx, match_phy_device() would always return false due to
phydev->phy_id only used by C22 being empty, resulting in the
generic phy driver being used for TJA11xx PHYs.
Therefore, check phydev->c45_ids.device_ids[MDIO_MMD_PMAPMD] when
using C45.
Fixes:
1b76b2497aba ("net: phy: nxp-c45-tja11xx: simplify .match_phy_device OP")
Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
Link: https://patch.msgid.link/20250807040832.2455306-1-xiaoning.wang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jialin Wang [Thu, 7 Aug 2025 16:54:55 +0000 (00:54 +0800)]
proc: proc_maps_open allow proc_mem_open to return NULL
The commit
65c66047259f ("proc: fix the issue of proc_mem_open returning
NULL") caused proc_maps_open() to return -ESRCH when proc_mem_open()
returns NULL. This breaks legitimate /proc/<pid>/maps access for kernel
threads since kernel threads have NULL mm_struct.
The regression causes perf to fail and exit when profiling a kernel
thread:
# perf record -v -g -p $(pgrep kswapd0)
...
couldn't open /proc/65/task/65/maps
This patch partially reverts the commit to fix it.
Link: https://lkml.kernel.org/r/20250807165455.73656-1-wjl.linux@gmail.com
Fixes:
65c66047259f ("proc: fix the issue of proc_mem_open returning NULL")
Signed-off-by: Jialin Wang <wjl.linux@gmail.com>
Cc: Penglei Jiang <superman.xpt@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Thu, 7 Aug 2025 18:58:19 +0000 (19:58 +0100)]
mm/mremap: avoid expensive folio lookup on mremap folio pte batch
It was discovered in the attached report that commit
f822a9a81a31 ("mm:
optimize mremap() by PTE batching") introduced a significant performance
regression on a number of metrics on x86-64, most notably
stress-ng.bigheap.realloc_calls_per_sec - indicating a 37.3% regression in
number of mremap() calls per second.
I was able to reproduce this locally on an intel x86-64 raptor lake
system, noting an average of 143,857 realloc calls/sec (with a stddev of
4,531 or 3.1%) prior to this patch being applied, and 81,503 afterwards
(stddev of 2,131 or 2.6%) - a 43.3% regression.
During testing I was able to determine that there was no meaningful
difference in efforts to optimise the folio_pte_batch() operation, nor
checking folio_test_large().
This is within expectation, as a regression this large is likely to
indicate we are accessing memory that is not yet in a cache line (and
perhaps may even cause a main memory fetch).
The expectation by those discussing this from the start was that
vm_normal_folio() (invoked by mremap_folio_pte_batch()) would likely be
the culprit due to having to retrieve memory from the vmemmap (which
mremap() page table moves does not otherwise do, meaning this is
inevitably cold memory).
I was able to definitively determine that this theory is indeed correct
and the cause of the issue.
The solution is to restore part of an approach previously discarded on
review, that is to invoke pte_batch_hint() which explicitly determines,
through reference to the PTE alone (thus no vmemmap lookup), what the PTE
batch size may be.
On platforms other than arm64 this is currently hardcoded to return 1, so
this naturally resolves the issue for x86-64, and for arm64 introduces
little to no overhead as the pte cache line will be hot.
With this patch applied, we move from 81,503 realloc calls/sec to 138,701
(stddev of 496.1 or 0.4%), which is a -3.6% regression, however accounting
for the variance in the original result, this is broadly restoring
performance to its prior state.
Link: https://lkml.kernel.org/r/20250807185819.199865-1-lorenzo.stoakes@oracle.com
Fixes:
f822a9a81a31 ("mm: optimize mremap() by PTE batching")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/
202508071609.
4e743d7c-lkp@intel.com
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Suren Baghdasaryan [Wed, 6 Aug 2025 22:00:22 +0000 (15:00 -0700)]
userfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entry
When UFFDIO_MOVE encounters a migration PMD entry, it proceeds with
obtaining a folio and accessing it even though the entry is swp_entry_t.
Add the missing check and let split_huge_pmd() handle migration entries.
While at it also remove unnecessary folio check.
[surenb@google.com: remove extra folio check, per David]
Link: https://lkml.kernel.org/r/20250807200418.1963585-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250806220022.926763-1-surenb@google.com
Fixes:
adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: syzbot+b446dbe27035ef6bd6c2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/
68794b5c.
a70a0220.693ce.0050.GAE@google.com/
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dev Jain [Wed, 6 Aug 2025 14:56:11 +0000 (20:26 +0530)]
mm: pass page directly instead of using folio_page
In commit_anon_folio_batch(), we iterate over all pages pointed to by the
PTE batch. Therefore we need to know the first page of the batch;
currently we derive that via folio_page(folio, 0), but, that takes us to
the first (head) page of the folio instead - our PTE batch may lie in the
middle of the folio, leading to incorrectness.
Bite the bullet and throw away the micro-optimization of reusing the folio
in favour of code simplicity. Derive the page and the folio in
change_pte_range, and pass the page too to commit_anon_folio_batch to fix
the aforementioned issue.
Link: https://lkml.kernel.org/r/20250806145611.3962-1-dev.jain@arm.com
Fixes:
cac1db8c3aad ("mm: optimize mprotect() by PTE batching")
Reported-by: syzbot+57bcc752f0df8bb1365c@syzkaller.appspotmail.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Debugged-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sukrut Heroorkar [Mon, 4 Aug 2025 22:56:14 +0000 (00:56 +0200)]
selftests/proc: fix string literal warning in proc-maps-race.c
This change resolves non literal string format warning invoked for
proc-maps-race.c while compiling.
proc-maps-race.c:205:17: warning: format not a string literal and no format arguments [-Wformat-security]
205 | printf(text);
| ^~~~~~
proc-maps-race.c:209:17: warning: format not a string literal and no format arguments [-Wformat-security]
209 | printf(text);
| ^~~~~~
proc-maps-race.c: In function `print_last_lines':
proc-maps-race.c:224:9: warning: format not a string literal and no format arguments [-Wformat-security]
224 | printf(start);
| ^~~~~~
Add string format specifier %s for the printf calls in both
print_first_lines() and print_last_lines() thus resolving the warnings.
The test executes fine after this change thus causing no effect to the
functional behavior of the test.
Link: https://lkml.kernel.org/r/20250804225633.841777-1-hsukrut3@gmail.com
Fixes:
aadc099c480f ("selftests/proc: add verbose mode for /proc/pid/maps tearing tests")
Signed-off-by: Sukrut Heroorkar <hsukrut3@gmail.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: David Hunter <david.hunter.linux@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Russell King (Oracle) [Fri, 8 Aug 2025 12:16:39 +0000 (13:16 +0100)]
net: stmmac: dwc-qos: fix clk prepare/enable leak on probe failure
dwc_eth_dwmac_probe() gets bulk clocks, and then prepares and enables
them. Unfortunately, if dwc_eth_dwmac_config_dt() or stmmac_dvr_probe()
fail, we leave the clocks prepared and enabled. Fix this by using
devm_clk_bulk_get_all_enabled() to combine the steps and provide devm
based release of the prepare and enable state.
This also fixes a similar leakin dwc_eth_dwmac_remove() which wasn't
correctly retrieving the struct plat_stmmacenet_data. This becomes
unnecessary.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes:
a045e40645df ("net: stmmac: refactor clock management in EQoS driver")
Link: https://patch.msgid.link/E1ukM1X-0086qu-Td@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Fri, 8 Aug 2025 12:16:34 +0000 (13:16 +0100)]
net: stmmac: rk: put the PHY clock on remove
The PHY clock (bsp_priv->clk_phy) is obtained using of_clk_get(), which
doesn't take part in the devm release. Therefore, when a device is
unbound, this clock needs to be explicitly put. Fix this.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes:
fecd4d7eef8b ("net: stmmac: dwmac-rk: Add integrated PHY support")
Link: https://patch.msgid.link/E1ukM1S-0086qo-PC@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jeff Layton [Fri, 8 Aug 2025 11:45:23 +0000 (07:45 -0400)]
ref_tracker: use %p instead of %px in debugfs dentry name
As Kees points out, this is a kernel address leak, and debugging is
not a sufficiently good reason to expose the real kernel address.
Fixes:
65b584f53611 ("ref_tracker: automatically register a file in debugfs for a ref_tracker_dir")
Reported-by: Kees Cook <kees@kernel.org>
Closes: https://lore.kernel.org/netdev/
202507301603.
62E553F93@keescook/
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fabio Porcedda [Fri, 8 Aug 2025 13:31:08 +0000 (15:31 +0200)]
net: usb: qmi_wwan: add Telit Cinterion FN990A w/audio composition
Add the following Telit Cinterion FN990A w/audio composition:
0x1077: tty (diag) + adb + rmnet + audio + tty (AT/NMEA) + tty (AT) +
tty (AT) + tty (AT)
T: Bus=01 Lev=01 Prnt=01 Port=09 Cnt=01 Dev#= 8 Spd=480 MxCh= 0
D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1077 Rev=05.04
S: Manufacturer=Telit Wireless Solutions
S: Product=FN990
S: SerialNumber=
67e04c35
C: #Ifs=10 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E: Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
E: Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 3 Alt= 0 #EPs= 0 Cls=01(audio) Sub=01 Prot=20 Driver=snd-usb-audio
I: If#= 4 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E: Ad=03(O) Atr=0d(Isoc) MxPS= 68 Ivl=1ms
I: If#= 5 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E: Ad=84(I) Atr=0d(Isoc) MxPS= 68 Ivl=1ms
I: If#= 6 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 7 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=88(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8a(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 9 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8b(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8c(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
Cc: stable@vger.kernel.org
Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dave Hansen [Fri, 8 Aug 2025 17:39:25 +0000 (10:39 -0700)]
MAINTAINERS: Remove bouncing T7XX reviewer
This reviewer's email no longer works. Remove it from MAINTAINERS.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Cc: Liu Haijun <haijun.liu@mediatek.com>
Cc: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Link: https://patch.msgid.link/20250808173925.FECE3782@davehans-spike.ostc.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dave Hansen [Fri, 8 Aug 2025 17:53:24 +0000 (10:53 -0700)]
MAINTAINERS: Mark Intel PTP DFL ToD as orphaned
This maintainer's email no longer works. Remove it from MAINTAINERS.
Also mark the code as an Orphan.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Tianfei Zhang <tianfei.zhang@intel.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Link: https://patch.msgid.link/20250808175324.8C4B7354@davehans-spike.ostc.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dave Hansen [Fri, 8 Aug 2025 17:45:05 +0000 (10:45 -0700)]
MAINTAINERS: Mark Intel WWAN IOSM driver as orphaned
This maintainer's email no longer works. Remove it from MAINTAINERS.
I've been unable to locate a new maintainer for this at Intel. Mark
the driver as Orphaned.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Loic Poulain <loic.poulain@oss.qualcomm.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Link: https://patch.msgid.link/20250808174505.C9FF434F@davehans-spike.ostc.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Mon, 11 Aug 2025 14:38:55 +0000 (07:38 -0700)]
Merge tag 'nfsd-6.17-1' of git://git./linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- A correctness fix for delegated timestamps
- Address an NFSD shutdown hang when LOCALIO is in use
- Prevent a remotely exploitable crasher when TLS is in use
* tag 'nfsd-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
sunrpc: fix handling of server side tls alerts
nfsd: avoid ref leak in nfsd_open_local_fh()
nfsd: don't set the ctime on delegated atime updates