Linus Torvalds [Fri, 2 May 2025 15:50:10 +0000 (08:50 -0700)]
Merge tag 'slab-for-6.15-rc5' of git://git./linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
- Stable fix to avoid bugs due to leftover obj_ext after allocation
profiling is disabled at runtime (Zhenhua Huang)
* tag 'slab-for-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm, slab: clean up slab->obj_exts always
Linus Torvalds [Thu, 1 May 2025 17:37:49 +0000 (10:37 -0700)]
Merge tag 'net-6.15-rc5' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Happy May Day.
Things have calmed down on our end (knock on wood), no outstanding
investigations. Including fixes from Bluetooth and WiFi.
Current release - fix to a fix:
- igc: fix lock order in igc_ptp_reset
Current release - new code bugs:
- Revert "wifi: iwlwifi: make no_160 more generic", fixes regression
to Killer line of devices reported by a number of people
- Revert "wifi: iwlwifi: add support for BE213", initial FW is too
buggy
- number of fixes for mld, the new Intel WiFi subdriver
Previous releases - regressions:
- wifi: mac80211: restore monitor for outgoing frames
- drv: vmxnet3: fix malformed packet sizing in vmxnet3_process_xdp
- eth: bnxt_en: fix timestamping FIFO getting out of sync on reset,
delivering stale timestamps
- use sock_gen_put() in the TCP fraglist GRO heuristic, don't assume
every socket is a full socket
Previous releases - always broken:
- sched: adapt qdiscs for reentrant enqueue cases, fix list
corruptions
- xsk: fix race condition in AF_XDP generic RX path, shared UMEM
can't be protected by a per-socket lock
- eth: mtk-star-emac: fix spinlock recursion issues on rx/tx poll
- btusb: avoid NULL pointer dereference in skb_dequeue()
- dsa: felix: fix broken taprio gate states after clock jump"
* tag 'net-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
net: vertexcom: mse102x: Fix RX error handling
net: vertexcom: mse102x: Add range check for CMD_RTS
net: vertexcom: mse102x: Fix LEN_MASK
net: vertexcom: mse102x: Fix possible stuck of SPI interrupt
net: hns3: defer calling ptp_clock_register()
net: hns3: fixed debugfs tm_qset size
net: hns3: fix an interrupt residual problem
net: hns3: store rx VLAN tag offload state for VF
octeon_ep: Fix host hang issue during device reboot
net: fec: ERR007885 Workaround for conventional TX
net: lan743x: Fix memleak issue when GSO enabled
ptp: ocp: Fix NULL dereference in Adva board SMA sysfs operations
net: use sock_gen_put() when sk_state is TCP_TIME_WAIT
bnxt_en: fix module unload sequence
bnxt_en: Fix ethtool -d byte order for 32-bit values
bnxt_en: Fix out-of-bound memcpy() during ethtool -w
bnxt_en: Fix coredump logic to free allocated buffer
bnxt_en: delay pci_alloc_irq_vectors() in the AER path
bnxt_en: call pci_alloc_irq_vectors() after bnxt_reserve_rings()
bnxt_en: Add missing skb_mark_for_recycle() in bnxt_rx_vlan()
...
Jakub Kicinski [Thu, 1 May 2025 14:24:08 +0000 (07:24 -0700)]
Merge branch 'net-vertexcom-mse102x-fix-rx-handling'
Stefan Wahren says:
====================
net: vertexcom: mse102x: Fix RX handling
This series is the first part of two series for the Vertexcom driver.
It contains substantial fixes for the RX handling of the Vertexcom MSE102x.
====================
Link: https://patch.msgid.link/20250430133043.7722-1-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefan Wahren [Wed, 30 Apr 2025 13:30:43 +0000 (15:30 +0200)]
net: vertexcom: mse102x: Fix RX error handling
In case the CMD_RTS got corrupted by interferences, the MSE102x
doesn't allow a retransmission of the command. Instead the Ethernet
frame must be shifted out of the SPI FIFO. Since the actual length is
unknown, assume the maximum possible value.
Fixes:
2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250430133043.7722-5-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefan Wahren [Wed, 30 Apr 2025 13:30:42 +0000 (15:30 +0200)]
net: vertexcom: mse102x: Add range check for CMD_RTS
Since there is no protection in the SPI protocol against electrical
interferences, the driver shouldn't blindly trust the length payload
of CMD_RTS. So introduce a bounds check for incoming frames.
Fixes:
2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250430133043.7722-4-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefan Wahren [Wed, 30 Apr 2025 13:30:41 +0000 (15:30 +0200)]
net: vertexcom: mse102x: Fix LEN_MASK
The LEN_MASK for CMD_RTS doesn't cover the whole parameter mask.
The Bit 11 is reserved, so adjust LEN_MASK accordingly.
Fixes:
2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250430133043.7722-3-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefan Wahren [Wed, 30 Apr 2025 13:30:40 +0000 (15:30 +0200)]
net: vertexcom: mse102x: Fix possible stuck of SPI interrupt
The MSE102x doesn't provide any SPI commands for interrupt handling.
So in case the interrupt fired before the driver requests the IRQ,
the interrupt will never fire again. In order to fix this always poll
for pending packets after opening the interface.
Fixes:
2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250430133043.7722-2-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 1 May 2025 14:19:52 +0000 (07:19 -0700)]
Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'
Jijie Shao says:
====================
There are some bugfix for the HNS3 ethernet driver
====================
Link: https://patch.msgid.link/20250430093052.2400464-1-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jian Shen [Wed, 30 Apr 2025 09:30:52 +0000 (17:30 +0800)]
net: hns3: defer calling ptp_clock_register()
Currently the ptp_clock_register() is called before relative
ptp resource ready. It may cause unexpected result when upper
layer called the ptp API during the timewindow. Fix it by
moving the ptp_clock_register() to the function end.
Fixes:
0bf5eb788512 ("net: hns3: add support for PTP")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250430093052.2400464-5-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hao Lan [Wed, 30 Apr 2025 09:30:51 +0000 (17:30 +0800)]
net: hns3: fixed debugfs tm_qset size
The size of the tm_qset file of debugfs is limited to 64 KB,
which is too small in the scenario with 1280 qsets.
The size needs to be expanded to 1 MB.
Fixes:
5e69ea7ee2a6 ("net: hns3: refactor the debugfs process")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20250430093052.2400464-4-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yonglong Liu [Wed, 30 Apr 2025 09:30:50 +0000 (17:30 +0800)]
net: hns3: fix an interrupt residual problem
When a VF is passthrough to a VM, and the VM is killed, the reported
interrupt may not been handled, it will remain, and won't be clear by
the nic engine even with a flr or tqp reset. When the VM restart, the
interrupt of the first vector may be dropped by the second enable_irq
in vfio, see the issue below:
https://gitlab.com/qemu-project/qemu/-/issues/2884#note_2423361621
We notice that the vfio has always behaved this way, and the interrupt
is a residue of the nic engine, so we fix the problem by moving the
vector enable process out of the enable_irq loop.
Fixes:
08a100689d4b ("net: hns3: re-organize vector handle")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20250430093052.2400464-3-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jian Shen [Wed, 30 Apr 2025 09:30:49 +0000 (17:30 +0800)]
net: hns3: store rx VLAN tag offload state for VF
The VF driver missed to store the rx VLAN tag strip state when
user change the rx VLAN tag offload state. And it will default
to enable the rx vlan tag strip when re-init VF device after
reset. So if user disable rx VLAN tag offload, and trig reset,
then the HW will still strip the VLAN tag from packet nad fill
into RX BD, but the VF driver will ignore it for rx VLAN tag
offload disabled. It may cause the rx VLAN tag dropped.
Fixes:
b2641e2ad456 ("net: hns3: Add support of hardware rx-vlan-offload to HNS3 VF driver")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250430093052.2400464-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 1 May 2025 14:17:15 +0000 (07:17 -0700)]
Merge branch '200GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-04-29 (idpf, igc)
For idpf:
Michal fixes error path handling to remove memory leak.
Larysa prevents reset from being called during shutdown.
For igc:
Jake adjusts locking order to resolve sleeping in atomic context.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
igc: fix lock order in igc_ptp_reset
idpf: protect shutdown from reset
idpf: fix potential memory leak on kcalloc() failure
====================
Link: https://patch.msgid.link/20250429221034.3909139-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sathesh B Edara [Tue, 29 Apr 2025 11:46:24 +0000 (04:46 -0700)]
octeon_ep: Fix host hang issue during device reboot
When the host loses heartbeat messages from the device,
the driver calls the device-specific ndo_stop function,
which frees the resources. If the driver is unloaded in
this scenario, it calls ndo_stop again, attempting to free
resources that have already been freed, leading to a host
hang issue. To resolve this, dev_close should be called
instead of the device-specific stop function.dev_close
internally calls ndo_stop to stop the network interface
and performs additional cleanup tasks. During the driver
unload process, if the device is already down, ndo_stop
is not called.
Fixes:
5cb96c29aa0e ("octeon_ep: add heartbeat monitor")
Signed-off-by: Sathesh B Edara <sedara@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250429114624.19104-1-sedara@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mattias Barthel [Tue, 29 Apr 2025 09:08:26 +0000 (11:08 +0200)]
net: fec: ERR007885 Workaround for conventional TX
Activate TX hang workaround also in
fec_enet_txq_submit_skb() when TSO is not enabled.
Errata: ERR007885
Symptoms: NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
commit
37d6017b84f7 ("net: fec: Workaround for imx6sx enet tx hang when enable three queues")
There is a TDAR race condition for mutliQ when the software sets TDAR
and the UDMA clears TDAR simultaneously or in a small window (2-4 cycles).
This will cause the udma_tx and udma_tx_arbiter state machines to hang.
So, the Workaround is checking TDAR status four time, if TDAR cleared by
hardware and then write TDAR, otherwise don't set TDAR.
Fixes:
53bb20d1faba ("net: fec: add variable reg_desc_active to speed things up")
Signed-off-by: Mattias Barthel <mattias.barthel@atlascopco.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250429090826.3101258-1-mattiasbarthel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thangaraj Samynathan [Tue, 29 Apr 2025 05:25:27 +0000 (10:55 +0530)]
net: lan743x: Fix memleak issue when GSO enabled
Always map the `skb` to the LS descriptor. Previously skb was
mapped to EXT descriptor when the number of fragments is zero with
GSO enabled. Mapping the skb to EXT descriptor prevents it from
being freed, leading to a memory leak
Fixes:
23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250429052527.10031-1-thangaraj.s@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sagi Maimon [Tue, 29 Apr 2025 07:33:20 +0000 (10:33 +0300)]
ptp: ocp: Fix NULL dereference in Adva board SMA sysfs operations
On Adva boards, SMA sysfs store/get operations can call
__handle_signal_outputs() or __handle_signal_inputs() while the `irig`
and `dcf` pointers are uninitialized, leading to a NULL pointer
dereference in __handle_signal() and causing a kernel crash. Adva boards
don't use `irig` or `dcf` functionality, so add Adva-specific callbacks
`ptp_ocp_sma_adva_set_outputs()` and `ptp_ocp_sma_adva_set_inputs()` that
avoid invoking `irig` or `dcf` input/output routines.
Fixes:
ef61f5528fca ("ptp: ocp: add Adva timecard support")
Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250429073320.33277-1-maimon.sagi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jibin Zhang [Tue, 29 Apr 2025 01:59:48 +0000 (09:59 +0800)]
net: use sock_gen_put() when sk_state is TCP_TIME_WAIT
It is possible for a pointer of type struct inet_timewait_sock to be
returned from the functions __inet_lookup_established() and
__inet6_lookup_established(). This can cause a crash when the
returned pointer is of type struct inet_timewait_sock and
sock_put() is called on it. The following is a crash call stack that
shows sk->sk_wmem_alloc being accessed in sk_free() during the call to
sock_put() on a struct inet_timewait_sock pointer. To avoid this issue,
use sock_gen_put() instead of sock_put() when sk->sk_state
is TCP_TIME_WAIT.
mrdump.ko ipanic() + 120
vmlinux notifier_call_chain(nr_to_call=-1, nr_calls=0) + 132
vmlinux atomic_notifier_call_chain(val=0) + 56
vmlinux panic() + 344
vmlinux add_taint() + 164
vmlinux end_report() + 136
vmlinux kasan_report(size=0) + 236
vmlinux report_tag_fault() + 16
vmlinux do_tag_recovery() + 16
vmlinux __do_kernel_fault() + 88
vmlinux do_bad_area() + 28
vmlinux do_tag_check_fault() + 60
vmlinux do_mem_abort() + 80
vmlinux el1_abort() + 56
vmlinux el1h_64_sync_handler() + 124
vmlinux > 0xFFFFFFC080011294()
vmlinux __lse_atomic_fetch_add_release(v=0xF2FFFF82A896087C)
vmlinux __lse_atomic_fetch_sub_release(v=0xF2FFFF82A896087C)
vmlinux arch_atomic_fetch_sub_release(i=1, v=0xF2FFFF82A896087C)
+ 8
vmlinux raw_atomic_fetch_sub_release(i=1, v=0xF2FFFF82A896087C)
+ 8
vmlinux atomic_fetch_sub_release(i=1, v=0xF2FFFF82A896087C) + 8
vmlinux __refcount_sub_and_test(i=1, r=0xF2FFFF82A896087C,
oldp=0) + 8
vmlinux __refcount_dec_and_test(r=0xF2FFFF82A896087C, oldp=0) + 8
vmlinux refcount_dec_and_test(r=0xF2FFFF82A896087C) + 8
vmlinux sk_free(sk=0xF2FFFF82A8960700) + 28
vmlinux sock_put() + 48
vmlinux tcp6_check_fraglist_gro() + 236
vmlinux tcp6_gro_receive() + 624
vmlinux ipv6_gro_receive() + 912
vmlinux dev_gro_receive() + 1116
vmlinux napi_gro_receive() + 196
ccmni.ko ccmni_rx_callback() + 208
ccmni.ko ccmni_queue_recv_skb() + 388
ccci_dpmaif.ko dpmaif_rxq_push_thread() + 1088
vmlinux kthread() + 268
vmlinux 0xFFFFFFC08001F30C()
Fixes:
c9d1d23e5239 ("net: add heuristic for enabling TCP fraglist GRO")
Signed-off-by: Jibin Zhang <jibin.zhang@mediatek.com>
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250429020412.14163-1-shiming.cheng@mediatek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vadim Fedorenko [Wed, 30 Apr 2025 17:03:43 +0000 (10:03 -0700)]
bnxt_en: fix module unload sequence
Recent updates to the PTP part of bnxt changed the way PTP FIFO is
cleared, skbs waiting for TX timestamps are now cleared during
ndo_close() call. To do clearing procedure, the ptp structure must
exist and point to a valid address. Module destroy sequence had ptp
clear code running before netdev close causing invalid memory access and
kernel crash. Change the sequence to destroy ptp structure after device
close.
Fixes:
8f7ae5a85137 ("bnxt_en: improve TX timestamping FIFO configuration")
Reported-by: Taehee Yoo <ap420073@gmail.com>
Closes: https://lore.kernel.org/netdev/CAMArcTWDe2cd41=ub=zzvYifaYcYv-N-csxfqxUvejy_L0D6UQ@mail.gmail.com/
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Tested-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250430170343.759126-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nathan Chancellor [Wed, 30 Apr 2025 22:56:34 +0000 (15:56 -0700)]
kbuild: Properly disable -Wunterminated-string-initialization for clang
Clang and GCC have different behaviors around disabling warnings
included in -Wall and -Wextra and the order in which flags are
specified, which is exposed by clang's new support for
-Wunterminated-string-initialization.
$ cat test.c
const char foo[3] = "FOO";
const char bar[3] __attribute__((__nonstring__)) = "BAR";
$ clang -fsyntax-only -Wextra test.c
test.c:1:21: warning: initializer-string for character array is too long, array size is 3 but initializer has size 4 (including the null terminating character); did you mean to use the 'nonstring' attribute? [-Wunterminated-string-initialization]
1 | const char foo[3] = "FOO";
| ^~~~~
$ clang -fsyntax-only -Wextra -Wno-unterminated-string-initialization test.c
$ clang -fsyntax-only -Wno-unterminated-string-initialization -Wextra test.c
test.c:1:21: warning: initializer-string for character array is too long, array size is 3 but initializer has size 4 (including the null terminating character); did you mean to use the 'nonstring' attribute? [-Wunterminated-string-initialization]
1 | const char foo[3] = "FOO";
| ^~~~~
$ gcc -fsyntax-only -Wextra test.c
test.c:1:21: warning: initializer-string for array of ‘char’ truncates NUL terminator but destination lacks ‘nonstring’ attribute (4 chars into 3 available) [-Wunterminated-string-initialization]
1 | const char foo[3] = "FOO";
| ^~~~~
$ gcc -fsyntax-only -Wextra -Wno-unterminated-string-initialization test.c
$ gcc -fsyntax-only -Wno-unterminated-string-initialization -Wextra test.c
Move -Wextra up right below -Wall in Makefile.extrawarn to ensure these
flags are at the beginning of the warning options list. Move the couple
of warning options that have been added to the main Makefile since
commit
e88ca24319e4 ("kbuild: consolidate warning flags in
scripts/Makefile.extrawarn") to scripts/Makefile.extrawarn after -Wall /
-Wextra to ensure they get properly disabled for all compilers.
Fixes:
9d7a0577c9db ("gcc-15: disable '-Wunterminated-string-initialization' entirely for now")
Link: https://github.com/llvm/llvm-project/issues/10359
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 30 Apr 2025 15:56:50 +0000 (08:56 -0700)]
Merge tag 'for-6.15-rc4-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix potential inode leak in iget() after memory allocation failure
- in subpage mode, fix extent buffer bitmap iteration when writing out
dirty sectors
- fix range calculation when falling back to COW for a NOCOW file
* tag 'for-6.15-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: adjust subpage bit start based on sectorsize
btrfs: fix the inode leak in btrfs_iget()
btrfs: fix COW handling in run_delalloc_nocow()
Linus Torvalds [Wed, 30 Apr 2025 15:37:52 +0000 (08:37 -0700)]
Merge tag 'modules-6.15-rc5' of git://git./linux/kernel/git/modules/linux
Pull modules fixes from Petr Pavlu:
"A single series to properly handle the module_kobject creation.
This fixes a problem with missing /sys/module/<module>/drivers for
built-in modules"
* tag 'modules-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
drivers: base: handle module_kobject creation
kernel: globalize lookup_or_create_module_kobject()
kernel: refactor lookup_or_create_module_kobject()
kernel: param: rename locate_module_kobject
David S. Miller [Wed, 30 Apr 2025 12:03:22 +0000 (13:03 +0100)]
Merge branch 'bnxt_en-fixes'
Michael Chan says:
====================
bnxt_en: Misc. bug fixes
This series fixes a bug in the driver initialization path, MSIX
setup sequencing issue in the FW error and AER paths, a missing
skb_mark_for_recycle() in the VLAN error path, some ethtool coredump
fixes, an ethtool selftest fix, and an ethtool register dump byte order
fix.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 28 Apr 2025 22:59:03 +0000 (15:59 -0700)]
bnxt_en: Fix ethtool -d byte order for 32-bit values
For version 1 register dump that includes the PCIe stats, the existing
code incorrectly assumes that all PCIe stats are 64-bit values. Fix it
by using an array containing the starting and ending index of the 32-bit
values. The loop in bnxt_get_regs() will use the array to do proper
endian swap for the 32-bit values.
Fixes:
b5d600b027eb ("bnxt_en: Add support for 'ethtool -d'")
Reviewed-by: Shruti Parab <shruti.parab@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shruti Parab [Mon, 28 Apr 2025 22:59:02 +0000 (15:59 -0700)]
bnxt_en: Fix out-of-bound memcpy() during ethtool -w
When retrieving the FW coredump using ethtool, it can sometimes cause
memory corruption:
BUG: KFENCE: memory corruption in __bnxt_get_coredump+0x3ef/0x670 [bnxt_en]
Corrupted memory at 0x000000008f0f30e8 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#45):
__bnxt_get_coredump+0x3ef/0x670 [bnxt_en]
ethtool_get_dump_data+0xdc/0x1a0
__dev_ethtool+0xa1e/0x1af0
dev_ethtool+0xa8/0x170
dev_ioctl+0x1b5/0x580
sock_do_ioctl+0xab/0xf0
sock_ioctl+0x1ce/0x2e0
__x64_sys_ioctl+0x87/0xc0
do_syscall_64+0x5c/0xf0
entry_SYSCALL_64_after_hwframe+0x78/0x80
...
This happens when copying the coredump segment list in
bnxt_hwrm_dbg_dma_data() with the HWRM_DBG_COREDUMP_LIST FW command.
The info->dest_buf buffer is allocated based on the number of coredump
segments returned by the FW. The segment list is then DMA'ed by
the FW and the length of the DMA is returned by FW. The driver then
copies this DMA'ed segment list to info->dest_buf.
In some cases, this DMA length may exceed the info->dest_buf length
and cause the above BUG condition. Fix it by capping the copy
length to not exceed the length of info->dest_buf. The extra
DMA data contains no useful information.
This code path is shared for the HWRM_DBG_COREDUMP_LIST and the
HWRM_DBG_COREDUMP_RETRIEVE FW commands. The buffering is different
for these 2 FW commands. To simplify the logic, we need to move
the line to adjust the buffer length for HWRM_DBG_COREDUMP_RETRIEVE
up, so that the new check to cap the copy length will work for both
commands.
Fixes:
c74751f4c392 ("bnxt_en: Return error if FW returns more data than dump length")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shruti Parab [Mon, 28 Apr 2025 22:59:01 +0000 (15:59 -0700)]
bnxt_en: Fix coredump logic to free allocated buffer
When handling HWRM_DBG_COREDUMP_LIST FW command in
bnxt_hwrm_dbg_dma_data(), the allocated buffer info->dest_buf is
not freed in the error path. In the normal path, info->dest_buf
is assigned to coredump->data and it will eventually be freed after
the coredump is collected.
Free info->dest_buf immediately inside bnxt_hwrm_dbg_dma_data() in
the error path.
Fixes:
c74751f4c392 ("bnxt_en: Return error if FW returns more data than dump length")
Reported-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kashyap Desai [Mon, 28 Apr 2025 22:59:00 +0000 (15:59 -0700)]
bnxt_en: delay pci_alloc_irq_vectors() in the AER path
This patch is similar to the last patch to delay the
pci_alloc_irq_vectors() call in the AER path until after calling
bnxt_reserve_rings(). bnxt_reserve_rings() needs to properly map
the MSIX table first before we call pci_alloc_irq_vectors() which
may immediately write to the MSIX table in some architectures.
Move the bnxt_init_int_mode() call from bnxt_io_slot_reset() to
bnxt_io_resume() after calling bnxt_reserve_rings().
With this change, the AER path may call bnxt_open() ->
bnxt_hwrm_if_change() with bp->irq_tbl set to NULL. bp->irq_tbl is
cleared when we call bnxt_clear_int_mode() in bnxt_io_slot_reset().
So we cannot use !bp->irq_tbl to detect aborted FW reset. Add a
new BNXT_FW_RESET_STATE_ABORT to detect aborted FW reset in
bnxt_hwrm_if_change().
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kashyap Desai [Mon, 28 Apr 2025 22:58:59 +0000 (15:58 -0700)]
bnxt_en: call pci_alloc_irq_vectors() after bnxt_reserve_rings()
On some architectures (e.g. ARM), calling pci_alloc_irq_vectors()
will immediately cause the MSIX table to be written. This will not
work if we haven't called bnxt_reserve_rings() to properly map
the MSIX table to the MSIX vectors reserved by FW.
Fix the FW error recovery path to delay the bnxt_init_int_mode() ->
pci_alloc_irq_vectors() call by removing it from bnxt_hwrm_if_change().
bnxt_request_irq() later in the code path will call it and by then the
MSIX table is properly mapped.
Fixes:
4343838ca5eb ("bnxt_en: Replace deprecated PCI MSIX APIs")
Suggested-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Somnath Kotur [Mon, 28 Apr 2025 22:58:58 +0000 (15:58 -0700)]
bnxt_en: Add missing skb_mark_for_recycle() in bnxt_rx_vlan()
If bnxt_rx_vlan() fails because the VLAN protocol ID is invalid,
the SKB is freed but we're missing the call to recycle it. This
may cause the warning:
"page_pool_release_retry() stalled pool shutdown"
Add the missing skb_mark_for_recycle() in bnxt_rx_vlan().
Fixes:
86b05508f775 ("bnxt_en: Use the unified RX page pool buffers for XDP and non-XDP")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Mon, 28 Apr 2025 22:58:57 +0000 (15:58 -0700)]
bnxt_en: Fix ethtool selftest output in one of the failure cases
When RDMA driver is loaded, running offline self test is not
supported and driver returns failure early. But it is not clearing
the input buffer and hence the application prints some junk
characters for individual test results.
Fix it by clearing the buffer before returning.
Fixes:
895621f1c816 ("bnxt_en: Don't support offline self test when RoCE driver is loaded")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shravya KN [Mon, 28 Apr 2025 22:58:56 +0000 (15:58 -0700)]
bnxt_en: Fix error handling path in bnxt_init_chip()
WARN_ON() is triggered in __flush_work() if bnxt_init_chip() fails
because we call cancel_work_sync() on dim work that has not been
initialized.
WARNING: CPU: 37 PID: 5223 at kernel/workqueue.c:4201 __flush_work.isra.0+0x212/0x230
The driver relies on the BNXT_STATE_NAPI_DISABLED bit to check if dim
work has already been cancelled. But in the bnxt_open() path,
BNXT_STATE_NAPI_DISABLED is not set and this causes the error
path to think that it needs to cancel the uninitalized dim work.
Fix it by setting BNXT_STATE_NAPI_DISABLED during initialization.
The bit will be cleared when we enable NAPI and initialize dim work.
Fixes:
40452969a506 ("bnxt_en: Fix DIM shutdown")
Suggested-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Shravya KN <shravya.k-n@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 30 Apr 2025 03:59:42 +0000 (20:59 -0700)]
Merge tag 'v6.15-p6' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a regression in scompress"
* tag 'v6.15-p6' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: scompress - increment scomp_scratch_users when already allocated
Felix Fietkau [Sat, 26 Apr 2025 15:32:09 +0000 (17:32 +0200)]
net: ipv6: fix UDPv6 GSO segmentation with NAT
If any address or port is changed, update it in all packets and recalculate
checksum.
Fixes:
9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250426153210.14044-1-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 29 Apr 2025 21:44:39 +0000 (14:44 -0700)]
Merge branch 'fix-felix-dsa-taprio-gates-after-clock-jump'
Vladimir Oltean says:
====================
Fix Felix DSA taprio gates after clock jump
Richie Pearn presented a reproducible situation where traffic would get
blocked on the NXP LS1028A switch if a certain taprio schedule was
applied, and stepping the PTP clock would take place. The latter event
is an expected initial occurrence, but also at runtime, for example when
transitioning from one grandmaster to another.
The issue is completely described in patch 1/4, which also contains
the fix, but it has left me with some doubts regarding the need for
vsc9959_tas_clock_adjust() in general.
In order to prove to myself that vsc9959_tas_clock_adjust() is needed in
general, I have written a selftest for the tc-taprio data path in patch
4/4. On the LS1028A, we can clearly see the following failures without
that function:
INFO: Forcing a backward clock jump
TEST: ping [FAIL]
INFO: Setting up taprio after PTP
TEST: In band with gate [FAIL]
Reception of 100 packets failed
TEST: Out of band with gate [FAIL]
Reception of 100 packets failed
As for testing my fix from patch 1/4, that was quite a bit more complex
to do automatically. In fact, I couldn't find any other schedule that
would fail to be updated by vsc9959_tas_clock_adjust() as cleanly as
the schedule from Richie, so I've added that specific schedule as the
test_clock_jump_backward() test.
The test ordering is also (unfortunately) very strategic. Running the
selftest to the end dirties the GCL RAM, and when running
test_clock_jump_backward() once again, the GCL entries won't be all
zeroes as they were the first time around. They will contain bits and
pieces of old schedules, making it very challenging to make it fail.
Thus, test_clock_jump_backward() is the first in the test suite, and
without patch 1/4, it is only supposed to fail the _first_ time when
running after a clean boot.
====================
Link: https://patch.msgid.link/20250426144859.3128352-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Sat, 26 Apr 2025 14:48:58 +0000 (17:48 +0300)]
selftests: net: tc_taprio: new test
Add a forwarding path test for tc-taprio, based on isochron. This is
specifically intended for NICs with an offloaded data path (switchdev/DSA)
and requires taprio 'flags 2'. Also, $h1 and $h2 must support hardware
timestamping, and $h1 tc-etf offload, for isochron to work.
Packets received by a switch while the egress port has a taprio schedule
with an open gate for the traffic class must be sent right away.
Packets received by the switch while the traffic class gate must be
delayed until it opens.
Packets received by the switch must be dropped if the gate for the
traffic class never opens.
Packets should pass if the maximum SDU for the traffic class allows it,
and should be dropped otherwise.
The schedule should auto-update itself if clock jumps take place while
taprio is installed. Repeat most of the above tests after forcing two
clock jumps, one backwards (in Jan 1970) and one back into the present.
Symlink it from tools/testing/selftests/drivers/net/dsa, because usually
DSA ports have the same MAC address, and we need STABLE_MAC_ADDRS=yes
from its forwarding.config for the test to run successfully.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250426144859.3128352-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Sat, 26 Apr 2025 14:48:57 +0000 (17:48 +0300)]
selftests: net: tsn_lib: add window_size argument to isochron_do()
Make out-of-band testing (send a packet when its traffic class gate is
closed, expecting it to be delayed) more predictable by allowing the
window size to be customized by isochron_do().
From man isochron-send, the window size alters the advance time (the
delta between the transmission time of the packet, and its expected TX
time when using SO_TXTIME or tc-taprio on the sender). In absence of the
argument, isochron-send defaults to maximizing the advance time (making
it equal to the cycle length).
The default behavior is exactly what is problematic. An advance time
that is too large will make packets intended to be out-of-band still be
potentially in-band with an open gate from the schedule's previous cycle.
We need to allow that advance time to be reduced.
Perhaps a bit confusingly, isochron_do() has a shift_time argument
currently, but that does not help here. The shift time shifts both the
user space wakeup time and the expected TX time by equal amounts, it is
unable of bringing them closer to one another.
Set the window size properly for the Ocelot PSFP selftest as well.
That used to work due to a very carefully chosen SHIFT_TIME_NS.
I've re-tested that the test still works properly.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250426144859.3128352-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Sat, 26 Apr 2025 14:48:56 +0000 (17:48 +0300)]
selftests: net: tsn_lib: create common helper for counting received packets
This snippet will be necessary for a future isochron-based test, so
provide a simpler high-level interface for counting the received
packets.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250426144859.3128352-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Sat, 26 Apr 2025 14:48:55 +0000 (17:48 +0300)]
net: dsa: felix: fix broken taprio gate states after clock jump
Simplest setup to reproduce the issue: connect 2 ports of the
LS1028A-RDB together (eno0 with swp0) and run:
$ ip link set eno0 up && ip link set swp0 up
$ tc qdisc replace dev swp0 parent root handle 100 taprio num_tc 8 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 map 0 1 2 3 4 5 6 7 \
base-time 0 sched-entry S 20 300000 sched-entry S 10 200000 \
sched-entry S 20 300000 sched-entry S 48 200000 \
sched-entry S 20 300000 sched-entry S 83 200000 \
sched-entry S 40 300000 sched-entry S 00 200000 flags 2
$ ptp4l -i eno0 -f /etc/linuxptp/configs/gPTP.cfg -m &
$ ptp4l -i swp0 -f /etc/linuxptp/configs/gPTP.cfg -m
One will observe that the PTP state machine on swp0 starts
synchronizing, then it attempts to do a clock step, and after that, it
never fails to recover from the condition below.
ptp4l[82.427]: selected best master clock 00049f.fffe.05f627
ptp4l[82.428]: port 1 (swp0): MASTER to UNCALIBRATED on RS_SLAVE
ptp4l[83.252]: port 1 (swp0): UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[83.886]: rms
4537731277 max
9075462553 freq -18518 +/- 11467 delay 818 +/- 0
ptp4l[84.170]: timed out while polling for tx timestamp
ptp4l[84.171]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[84.172]: port 1 (swp0): send peer delay request failed
ptp4l[84.173]: port 1 (swp0): clearing fault immediately
ptp4l[84.269]: port 1 (swp0): SLAVE to LISTENING on INIT_COMPLETE
ptp4l[85.303]: timed out while polling for tx timestamp
ptp4l[84.171]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[84.172]: port 1 (swp0): send peer delay request failed
ptp4l[84.173]: port 1 (swp0): clearing fault immediately
ptp4l[84.269]: port 1 (swp0): SLAVE to LISTENING on INIT_COMPLETE
ptp4l[85.303]: timed out while polling for tx timestamp
ptp4l[85.304]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[85.305]: port 1 (swp0): send peer delay response failed
ptp4l[85.306]: port 1 (swp0): clearing fault immediately
ptp4l[86.304]: timed out while polling for tx timestamp
A hint is given by the non-zero statistics for dropped packets which
were expecting hardware TX timestamps:
$ ethtool --include-statistics -T swp0
(...)
Statistics:
tx_pkts: 30
tx_lost: 11
tx_err: 0
We know that when PTP clock stepping takes place (from ocelot_ptp_settime64()
or from ocelot_ptp_adjtime()), vsc9959_tas_clock_adjust() is called.
Another interesting hint is that placing an early return in
vsc9959_tas_clock_adjust(), so as to neutralize this function, fixes the
issue and TX timestamps are no longer dropped.
The debugging function written by me and included below is intended to
read the GCL RAM, after the admin schedule became operational, through
the two status registers available for this purpose:
QSYS_GCL_STATUS_REG_1 and QSYS_GCL_STATUS_REG_2.
static void vsc9959_print_tas_gcl(struct ocelot *ocelot)
{
u32 val, list_length, interval, gate_state;
int i, err;
err = read_poll_timeout(ocelot_read, val,
!(val & QSYS_PARAM_STATUS_REG_8_CONFIG_PENDING),
10, 100000, false, ocelot, QSYS_PARAM_STATUS_REG_8);
if (err) {
dev_err(ocelot->dev,
"Failed to wait for TAS config pending bit to clear: %pe\n",
ERR_PTR(err));
return;
}
val = ocelot_read(ocelot, QSYS_PARAM_STATUS_REG_3);
list_length = QSYS_PARAM_STATUS_REG_3_LIST_LENGTH_X(val);
dev_info(ocelot->dev, "GCL length: %u\n", list_length);
for (i = 0; i < list_length; i++) {
ocelot_rmw(ocelot,
QSYS_GCL_STATUS_REG_1_GCL_ENTRY_NUM(i),
QSYS_GCL_STATUS_REG_1_GCL_ENTRY_NUM_M,
QSYS_GCL_STATUS_REG_1);
interval = ocelot_read(ocelot, QSYS_GCL_STATUS_REG_2);
val = ocelot_read(ocelot, QSYS_GCL_STATUS_REG_1);
gate_state = QSYS_GCL_STATUS_REG_1_GATE_STATE_X(val);
dev_info(ocelot->dev, "GCL entry %d: states 0x%x interval %u\n",
i, gate_state, interval);
}
}
Calling it from two places: after the initial QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE
performed by vsc9959_qos_port_tas_set(), and after the one done by
vsc9959_tas_clock_adjust(), I notice the following difference.
From the tc-taprio process context, where the schedule was initially
configured, the GCL looks like this:
mscc_felix 0000:00:00.5: GCL length: 8
mscc_felix 0000:00:00.5: GCL entry 0: states 0x20 interval 300000
mscc_felix 0000:00:00.5: GCL entry 1: states 0x10 interval 200000
mscc_felix 0000:00:00.5: GCL entry 2: states 0x20 interval 300000
mscc_felix 0000:00:00.5: GCL entry 3: states 0x48 interval 200000
mscc_felix 0000:00:00.5: GCL entry 4: states 0x20 interval 300000
mscc_felix 0000:00:00.5: GCL entry 5: states 0x83 interval 200000
mscc_felix 0000:00:00.5: GCL entry 6: states 0x40 interval 300000
mscc_felix 0000:00:00.5: GCL entry 7: states 0x0 interval 200000
But from the ptp4l clock stepping process context, when the
vsc9959_tas_clock_adjust() hook is called, the GCL RAM of the
operational schedule now looks like this:
mscc_felix 0000:00:00.5: GCL length: 8
mscc_felix 0000:00:00.5: GCL entry 0: states 0x0 interval 0
mscc_felix 0000:00:00.5: GCL entry 1: states 0x0 interval 0
mscc_felix 0000:00:00.5: GCL entry 2: states 0x0 interval 0
mscc_felix 0000:00:00.5: GCL entry 3: states 0x0 interval 0
mscc_felix 0000:00:00.5: GCL entry 4: states 0x0 interval 0
mscc_felix 0000:00:00.5: GCL entry 5: states 0x0 interval 0
mscc_felix 0000:00:00.5: GCL entry 6: states 0x0 interval 0
mscc_felix 0000:00:00.5: GCL entry 7: states 0x0 interval 0
I do not have a formal explanation, just experimental conclusions.
It appears that after triggering QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE
for a port's TAS, the GCL entry RAM is updated anyway, despite what the
documentation claims: "Specify the time interval in
QSYS::GCL_CFG_REG_2.TIME_INTERVAL. This triggers the actual RAM
write with the gate state and the time interval for the entry number
specified". We don't touch that register (through vsc9959_tas_gcl_set())
from vsc9959_tas_clock_adjust(), yet the GCL RAM is updated anyway.
It seems to be updated with effectively stale memory, which in my
testing can hold a variety of things, including even pieces of the
previously applied schedule, for particular schedule lengths.
As such, in most circumstances it is very difficult to pinpoint this
issue, because the newly updated schedule would "behave strangely",
but ultimately might still pass traffic to some extent, due to some
gate entries still being present in the stale GCL entry RAM. It is easy
to miss.
With the particular schedule given at the beginning, the GCL RAM
"happens" to be reproducibly rewritten with all zeroes, and this is
consistent with what we see: when the time-aware shaper has gate entries
with all gates closed, traffic is dropped on TX, no wonder we can't
retrieve TX timestamps.
Rewriting the GCL entry RAM when reapplying the new base time fixes the
observed issue.
Fixes:
8670dc33f48b ("net: dsa: felix: update base time of time-aware shaper when adjusting PTP time")
Reported-by: Richie Pearn <richard.pearn@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250426144859.3128352-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chad Monroe [Sun, 27 Apr 2025 01:05:44 +0000 (02:05 +0100)]
net: ethernet: mtk_eth_soc: fix SER panic with 4GB+ RAM
If the mtk_poll_rx() function detects the MTK_RESETTING flag, it will
jump to release_desc and refill the high word of the SDP on the 4GB RFB.
Subsequently, mtk_rx_clean will process an incorrect SDP, leading to a
panic.
Add patch from MediaTek's SDK to resolve this.
Fixes:
2d75891ebc09 ("net: ethernet: mtk_eth_soc: support 36-bit DMA addressing on MT7988")
Link: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/71f47ea785699c6aa3b922d66c2bdc1a43da25b1
Signed-off-by: Chad Monroe <chad@monroe.io>
Link: https://patch.msgid.link/4adc2aaeb0fb1b9cdc56bf21cf8e7fa328daa345.1745715843.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jacob Keller [Tue, 22 Apr 2025 21:03:09 +0000 (14:03 -0700)]
igc: fix lock order in igc_ptp_reset
Commit
1a931c4f5e68 ("igc: add lock preventing multiple simultaneous PTM
transactions") added a new mutex to protect concurrent PTM transactions.
This lock is acquired in igc_ptp_reset() in order to ensure the PTM
registers are properly disabled after a device reset.
The flow where the lock is acquired already holds a spinlock, so acquiring
a mutex leads to a sleep-while-locking bug, reported both by smatch,
and the kernel test robot.
The critical section in igc_ptp_reset() does correctly use the
readx_poll_timeout_atomic variants, but the standard PTM flow uses regular
sleeping variants. This makes converting the mutex to a spinlock a bit
tricky.
Instead, re-order the locking in igc_ptp_reset. Acquire the mutex first,
and then the tmreg_lock spinlock. This is safe because there is no other
ordering dependency on these locks, as this is the only place where both
locks were acquired simultaneously. Indeed, any other flow acquiring locks
in that order would be wrong regardless.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Fixes:
1a931c4f5e68 ("igc: add lock preventing multiple simultaneous PTM transactions")
Link: https://lore.kernel.org/intel-wired-lan/Z_-P-Hc1yxcw0lTB@stanley.mountain/
Link: https://lore.kernel.org/intel-wired-lan/202504211511.f7738f5d-lkp@intel.com/T/#u
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Larysa Zaremba [Thu, 10 Apr 2025 11:52:23 +0000 (13:52 +0200)]
idpf: protect shutdown from reset
Before the referenced commit, the shutdown just called idpf_remove(),
this way IDPF_REMOVE_IN_PROG was protecting us from the serv_task
rescheduling reset. Without this flag set the shutdown process is
vulnerable to HW reset or any other triggering conditions (such as
default mailbox being destroyed).
When one of conditions checked in idpf_service_task becomes true,
vc_event_task can be rescheduled during shutdown, this leads to accessing
freed memory e.g. idpf_req_rel_vector_indexes() trying to read
vport->q_vector_idxs. This in turn causes the system to become defunct
during e.g. systemctl kexec.
Considering using IDPF_REMOVE_IN_PROG would lead to more heavy shutdown
process, instead just cancel the serv_task before cancelling
adapter->serv_task before cancelling adapter->vc_event_task to ensure that
reset will not be scheduled while we are doing a shutdown.
Fixes:
4c9106f4906a ("idpf: fix adapter NULL pointer dereference on reboot")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Michal Swiatkowski [Fri, 4 Apr 2025 10:54:21 +0000 (12:54 +0200)]
idpf: fix potential memory leak on kcalloc() failure
In case of failing on rss_data->rss_key allocation the function is
freeing vport without freeing earlier allocated q_vector_idxs. Fix it.
Move from freeing in error branch to goto scheme.
Fixes:
d4d558718266 ("idpf: initialize interrupts and enable vport")
Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Suggested-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Linus Torvalds [Tue, 29 Apr 2025 21:23:36 +0000 (14:23 -0700)]
Merge tag 'mmc-v6.15-rc1' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"Renesas SDHI fixes:
- Fix error-paths in probe
- Fix build-error when CONFIG_REGULATOR is unset"
* tag 'mmc-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: renesas_sdhi: disable clocks if registering regulator failed
mmc: renesas_sdhi: add regulator dependency
mmc: renesas_sdhi: Fix error handling in renesas_sdhi_probe
Da Xue [Fri, 25 Apr 2025 19:20:09 +0000 (15:20 -0400)]
net: mdio: mux-meson-gxl: set reversed bit when using internal phy
This bit is necessary to receive packets from the internal PHY.
Without this bit set, no activity occurs on the interface.
Normally u-boot sets this bit, but if u-boot is compiled without
net support, the interface will be up but without any activity.
If bit is set once, it will work until the IP is powered down or reset.
The vendor SDK sets this bit along with the PHY_ID bits.
Signed-off-by: Da Xue <da@libre.computer>
Fixes:
9a24e1ff4326 ("net: mdio: add amlogic gxl mdio mux support")
Link: https://patch.msgid.link/20250425192009.1439508-1-da@libre.computer
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Fri, 25 Apr 2025 15:50:47 +0000 (16:50 +0100)]
net: dlink: Correct endianness handling of led_mode
As it's name suggests, parse_eeprom() parses EEPROM data.
This is done by reading data, 16 bits at a time as follows:
for (i = 0; i < 128; i++)
((__le16 *) sromdata)[i] = cpu_to_le16(read_eeprom(np, i));
sromdata is at the same memory location as psrom.
And the type of psrom is a pointer to struct t_SROM.
As can be seen in the loop above, data is stored in sromdata, and thus psrom,
as 16-bit little-endian values.
However, the integer fields of t_SROM are host byte order integers.
And in the case of led_mode this leads to a little endian value
being incorrectly treated as host byte order.
Looking at rio_set_led_mode, this does appear to be a bug as that code
masks led_mode with 0x1, 0x2 and 0x8. Logic that would be effected by a
reversed byte order.
This problem would only manifest on big endian hosts.
Found by inspection while investigating a sparse warning
regarding the crc field of t_SROM.
I believe that warning is a false positive. And although I plan
to send a follow-up to use little-endian types for other the integer
fields of PSROM_t I do not believe that will involve any bug fixes.
Compile tested only.
Fixes:
c3f45d322cbd ("dl2k: Add support for IP1000A-based cards")
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250425-dlink-led-mode-v1-1-6bae3c36e736@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 29 Apr 2025 18:23:53 +0000 (11:23 -0700)]
Merge tag 'fsnotify_for_v6.15-rc5' of git://git./linux/kernel/git/jack/linux-fs
Pull fsnotify fix from Jan Kara:
"A fix for the recently merged mount notification support"
* tag 'fsnotify_for_v6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
selftests/fs/mount-notify: test also remove/flush of mntns marks
fanotify: fix flush of mntns marks
Linus Torvalds [Tue, 29 Apr 2025 18:18:45 +0000 (11:18 -0700)]
Merge tag 'platform-drivers-x86-v6.15-4' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform drivers fixes from Ilpo Järvinen:
"Fixes and new HW support
- amd/pmc: Require at least 2.5 seconds between HW sleep cycles
- alienware-wmi-wmax:
- Add support for Alienware m15 R7
- Fix error handling to avoid uninitialized variable
- asus-wmi: Disable OOBE state also on resume
- ideapad-laptop: Support a few new buttons
- intel/hid: Add Panther Lake support
- intel-uncore-freq: Fix missing uncore sysfs during CPU hotplug"
* tag 'platform-drivers-x86-v6.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: ideapad-laptop: add support for some new buttons
platform/x86: asus-wmi: Disable OOBE state after resume from hibernation
platform/x86: alienware-wmi-wmax: Add support for Alienware m15 R7
platform/x86/intel: hid: Add Pantherlake support
platform/x86: alienware-wmi-wmax: Fix uninitialized variable due to bad error handling
platform/x86/intel-uncore-freq: Fix missing uncore sysfs during CPU hotplug
platform/x86/amd: pmc: Require at least 2.5 seconds between HW sleep cycles
Linus Torvalds [Tue, 29 Apr 2025 18:10:46 +0000 (11:10 -0700)]
Merge tag 'fixes-2025-04-29' of git://git./linux/kernel/git/rppt/memblock
Pull memblock fixes from Mike Rapoport:
"Fixes for nid setting in memmap_init_reserved_pages():
- pass 'size' rather than 'end' to memblock_set_node() as that
function expects
- fix a corner case when memblock.reserved is doubled at
memmap_init_reserved_pages() and the newly reserved block
won't have nid assigned"
* tag 'fixes-2025-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock tests: add test for memblock_set_node
mm/memblock: repeat setting reserved region nid if array is doubled
mm/memblock: pass size instead of end to memblock_set_node()
Linus Torvalds [Mon, 28 Apr 2025 23:56:01 +0000 (16:56 -0700)]
Merge tag 'v6.15-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix three potential use after frees: in session logoff, in krb5 auth,
and in RPC open
- Fix missing rc check in session setup authentication
* tag 'v6.15-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix use-after-free in session logoff
ksmbd: fix use-after-free in kerberos authentication
ksmbd: fix use-after-free in ksmbd_session_rpc_open
smb: server: smb2pdu: check return value of xa_store()
Jakub Kicinski [Mon, 28 Apr 2025 22:59:15 +0000 (15:59 -0700)]
Merge branch 'intel-net-queue-100GbE'
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-04-22 (ice, idpf)
For ice:
Paul removes setting of ICE_AQ_FLAG_RD in ice_get_set_tx_topo() on
E830 devices.
Xuanqiang Luo adds error check for NULL VF VSI.
For idpf:
Madhu fixes misreporting of, currently, unsupported encapsulated
packets.
====================
Link: https://patch.msgid.link/20250425222636.3188441-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Madhu Chittim [Fri, 25 Apr 2025 22:26:33 +0000 (15:26 -0700)]
idpf: fix offloads support for encapsulated packets
Split offloads into csum, tso and other offloads so that tunneled
packets do not by default have all the offloads enabled.
Stateless offloads for encapsulated packets are not yet supported in
firmware/software but in the driver we were setting the features same as
non encapsulated features.
Fixed naming to clarify CSUM bits are being checked for Tx.
Inherit netdev features to VLAN interfaces as well.
Fixes:
0fe45467a104 ("idpf: add create vport and netdev configuration")
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Tested-by: Zachary Goldstein <zachmgoldstein@google.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250425222636.3188441-4-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xuanqiang Luo [Fri, 25 Apr 2025 22:26:32 +0000 (15:26 -0700)]
ice: Check VF VSI Pointer Value in ice_vc_add_fdir_fltr()
As mentioned in the commit
baeb705fd6a7 ("ice: always check VF VSI
pointer values"), we need to perform a null pointer check on the return
value of ice_get_vf_vsi() before using it.
Fixes:
6ebbe97a4881 ("ice: Add a per-VF limit on number of FDIR filters")
Signed-off-by: Xuanqiang Luo <luoxuanqiang@kylinos.cn>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250425222636.3188441-3-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Greenwalt [Fri, 25 Apr 2025 22:26:31 +0000 (15:26 -0700)]
ice: fix Get Tx Topology AQ command error on E830
The Get Tx Topology AQ command (opcode 0x0418) has different read flag
requirements depending on the hardware/firmware. For E810, E822, and E823
firmware the read flag must be set, and for newer hardware (E825 and E830)
it must not be set.
This results in failure to configure Tx topology and the following warning
message during probe:
DDP package does not support Tx scheduling layers switching feature -
please update to the latest DDP package and try again
The current implementation only handles E825-C but not E830. It is
confusing as we first check ice_is_e825c() and then set the flag in the set
case. Finally, we check ice_is_e825c() again and set the flag for all other
hardware in both the set and get case.
Instead, notice that we always need the read flag for set, but only need
the read flag for get on E810, E822, and E823 firmware. Fix the logic to
check the MAC type and set the read flag in get only on the older devices
which require it.
Fixes:
ba1124f58afd ("ice: Add E830 device IDs, MAC type and registers")
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250425222636.3188441-2-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 28 Apr 2025 22:55:11 +0000 (15:55 -0700)]
Merge branch 'net_sched-adapt-qdiscs-for-reentrant-enqueue-cases'
Victor Nogueira says:
====================
net_sched: Adapt qdiscs for reentrant enqueue cases
As described in Gerrard's report [1], there are cases where netem can
make the qdisc enqueue callback reentrant. Some qdiscs (drr, hfsc, ets,
qfq) break whenever the enqueue callback has reentrant behaviour.
This series addresses these issues by adding extra checks that cater for
these reentrant corner cases. This series has passed all relevant test
cases in the TDC suite.
[1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/
====================
Link: https://patch.msgid.link/20250425220710.3964791-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Victor Nogueira [Fri, 25 Apr 2025 22:07:09 +0000 (19:07 -0300)]
selftests: tc-testing: Add TDC tests that exercise reentrant enqueue behaviour
Add 5 TDC tests that exercise the reentrant enqueue behaviour in drr,
ets, qfq, and hfsc:
- Test DRR's enqueue reentrant behaviour with netem (which caused a
double list add)
- Test ETS's enqueue reentrant behaviour with netem (which caused a double
list add)
- Test QFQ's enqueue reentrant behaviour with netem (which caused a double
list add)
- Test HFSC's enqueue reentrant behaviour with netem (which caused a UAF)
- Test nested DRR's enqueue reentrant behaviour with netem (which caused a
double list add)
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250425220710.3964791-6-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Victor Nogueira [Fri, 25 Apr 2025 22:07:08 +0000 (19:07 -0300)]
net_sched: qfq: Fix double list add in class with netem as child qdisc
As described in Gerrard's report [1], there are use cases where a netem
child qdisc will make the parent qdisc's enqueue callback reentrant.
In the case of qfq, there won't be a UAF, but the code will add the same
classifier to the list twice, which will cause memory corruption.
This patch checks whether the class was already added to the agg->active
list (cl_is_active) before doing the addition to cater for the reentrant
case.
[1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/
Fixes:
37d9cf1a3ce3 ("sched: Fix detection of empty queues in child qdiscs")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250425220710.3964791-5-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Victor Nogueira [Fri, 25 Apr 2025 22:07:07 +0000 (19:07 -0300)]
net_sched: ets: Fix double list add in class with netem as child qdisc
As described in Gerrard's report [1], there are use cases where a netem
child qdisc will make the parent qdisc's enqueue callback reentrant.
In the case of ets, there won't be a UAF, but the code will add the same
classifier to the list twice, which will cause memory corruption.
In addition to checking for qlen being zero, this patch checks whether
the class was already added to the active_list (cl_is_active) before
doing the addition to cater for the reentrant case.
[1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/
Fixes:
37d9cf1a3ce3 ("sched: Fix detection of empty queues in child qdiscs")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250425220710.3964791-4-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Victor Nogueira [Fri, 25 Apr 2025 22:07:06 +0000 (19:07 -0300)]
net_sched: hfsc: Fix a UAF vulnerability in class with netem as child qdisc
As described in Gerrard's report [1], we have a UAF case when an hfsc class
has a netem child qdisc. The crux of the issue is that hfsc is assuming
that checking for cl->qdisc->q.qlen == 0 guarantees that it hasn't inserted
the class in the vttree or eltree (which is not true for the netem
duplicate case).
This patch checks the n_active class variable to make sure that the code
won't insert the class in the vttree or eltree twice, catering for the
reentrant case.
[1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/
Fixes:
37d9cf1a3ce3 ("sched: Fix detection of empty queues in child qdiscs")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250425220710.3964791-3-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Victor Nogueira [Fri, 25 Apr 2025 22:07:05 +0000 (19:07 -0300)]
net_sched: drr: Fix double list add in class with netem as child qdisc
As described in Gerrard's report [1], there are use cases where a netem
child qdisc will make the parent qdisc's enqueue callback reentrant.
In the case of drr, there won't be a UAF, but the code will add the same
classifier to the list twice, which will cause memory corruption.
In addition to checking for qlen being zero, this patch checks whether the
class was already added to the active_list (cl_is_active) before adding
to the list to cover for the reentrant case.
[1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/
Fixes:
37d9cf1a3ce3 ("sched: Fix detection of empty queues in child qdiscs")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250425220710.3964791-2-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Fri, 25 Apr 2025 20:38:57 +0000 (13:38 -0700)]
pds_core: remove write-after-free of client_id
A use-after-free error popped up in stress testing:
[Mon Apr 21 21:21:33 2025] BUG: KFENCE: use-after-free write in pdsc_auxbus_dev_del+0xef/0x160 [pds_core]
[Mon Apr 21 21:21:33 2025] Use-after-free write at 0x000000007013ecd1 (in kfence-#47):
[Mon Apr 21 21:21:33 2025] pdsc_auxbus_dev_del+0xef/0x160 [pds_core]
[Mon Apr 21 21:21:33 2025] pdsc_remove+0xc0/0x1b0 [pds_core]
[Mon Apr 21 21:21:33 2025] pci_device_remove+0x24/0x70
[Mon Apr 21 21:21:33 2025] device_release_driver_internal+0x11f/0x180
[Mon Apr 21 21:21:33 2025] driver_detach+0x45/0x80
[Mon Apr 21 21:21:33 2025] bus_remove_driver+0x83/0xe0
[Mon Apr 21 21:21:33 2025] pci_unregister_driver+0x1a/0x80
The actual device uninit usually happens on a separate thread
scheduled after this code runs, but there is no guarantee of order
of thread execution, so this could be a problem. There's no
actual need to clear the client_id at this point, so simply
remove the offending code.
Fixes:
10659034c622 ("pds_core: add the aux client API")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250425203857.71547-1-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 28 Apr 2025 22:51:44 +0000 (15:51 -0700)]
Merge tag 'for-net-2025-04-25' of git://git./linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- btmtksdio: Check function enabled before doing close
- btmtksdio: Do close if SDIO card removed without close
- btusb: avoid NULL pointer dereference in skb_dequeue()
- btintel_pcie: Avoid redundant buffer allocation
- btintel_pcie: Add additional to checks to clear TX/RX paths
- hci_conn: Fix not setting conn_timeout for Broadcast Receiver
- hci_conn: Fix not setting timeout for BIG Create Sync
* tag 'for-net-2025-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: L2CAP: copy RX timestamp to new fragments
Bluetooth: btintel_pcie: Add additional to checks to clear TX/RX paths
Bluetooth: btmtksdio: Do close if SDIO card removed without close
Bluetooth: btmtksdio: Check function enabled before doing close
Bluetooth: btusb: avoid NULL pointer dereference in skb_dequeue()
Bluetooth: btintel_pcie: Avoid redundant buffer allocation
Bluetooth: hci_conn: Fix not setting timeout for BIG Create Sync
Bluetooth: hci_conn: Fix not setting conn_timeout for Broadcast Receiver
====================
Link: https://patch.msgid.link/20250425192412.1578759-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kory Maincent [Fri, 25 Apr 2025 17:14:18 +0000 (19:14 +0200)]
netlink: specs: ethtool: Remove UAPI duplication of phy-upstream enum
The phy-upstream enum is already defined in the ethtool.h UAPI header
and used by the ethtool userspace tool. However, the ethtool spec does
not reference it, causing YNL to auto-generate a duplicate and redundant
enum.
Fix this by updating the spec to reference the existing UAPI enum
in ethtool.h.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250425171419.947352-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Golle [Fri, 25 Apr 2025 04:29:53 +0000 (05:29 +0100)]
net: ethernet: mtk_eth_soc: sync mtk_clks_source_name array
When removing the clock bits for clocks which aren't used by the
Ethernet driver their names should also have been removed from the
mtk_clks_source_name array.
Remove them now as enum mtk_clks_map needs to match the
mtk_clks_source_name array so the driver can make sure that all required
clocks are present and correctly name missing clocks.
Fixes:
887b1d1adb2e ("net: ethernet: mtk_eth_soc: drop clocks unused by Ethernet driver")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/d075e706ff1cebc07f9ec666736d0b32782fd487.1745555321.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vishal Badole [Thu, 24 Apr 2025 13:02:48 +0000 (18:32 +0530)]
amd-xgbe: Fix to ensure dependent features are toggled with RX checksum offload
According to the XGMAC specification, enabling features such as Layer 3
and Layer 4 Packet Filtering, Split Header and Virtualized Network support
automatically selects the IPC Full Checksum Offload Engine on the receive
side.
When RX checksum offload is disabled, these dependent features must also
be disabled to prevent abnormal behavior caused by mismatched feature
dependencies.
Ensure that toggling RX checksum offload (disabling or enabling) properly
disables or enables all dependent features, maintaining consistent and
expected behavior in the network device.
Cc: stable@vger.kernel.org
Fixes:
1a510ccf5869 ("amd-xgbe: Add support for VXLAN offload capabilities")
Signed-off-by: Vishal Badole <Vishal.Badole@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250424130248.428865-1-Vishal.Badole@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Mon, 28 Apr 2025 19:18:21 +0000 (12:18 -0700)]
Merge tag 'for-6.15/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mikulas Patocka:
- always update the array size in realloc_argv on success
- dm-integrity: fix a warning on invalid table line
- dm-bufio: don't schedule in atomic context
- Fix W=1 build with clang
* tag 'for-6.15/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: always update the array size in realloc_argv on success
dm-integrity: fix a warning on invalid table line
dm-bufio: don't schedule in atomic context
dm table: Fix W=1 build warning when mempool_needs_integrity is unused
Linus Torvalds [Mon, 28 Apr 2025 16:29:12 +0000 (09:29 -0700)]
Merge tag 'powerpc-6.15-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Madhavan Srinivasan:
- fix to handle patchable function entries during module load
- fix to align vmemmap start to page size
- fixes to handle compilation errors and warnings
Thanks to Anthony Iliopoulos, Donet Tom, Ritesh Harjani (IBM), Venkat
Rao Bagalkote, and Stephen Rothwell.
* tag 'powerpc-6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/boot: Fix dash warning
powerpc/boot: Check for ld-option support
powerpc: Add check to select PPC_RADIX_BROADCAST_TLBIE
powerpc64/ftrace: fix module loading without patchable function entries
book3s64/radix : Align section vmemmap start address to PAGE_SIZE
book3s64/radix: Fix compile errors when CONFIG_ARCH_WANT_OPTIMIZE_DAX_VMEMMAP=n
Linus Torvalds [Mon, 28 Apr 2025 16:24:19 +0000 (09:24 -0700)]
Merge tag 'hyperv-fixes-signed-
20250427' of git://git./linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Bug fixes for the Hyper-V driver and kvp_daemon
* tag 'hyperv-fixes-signed-
20250427' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: Fix bad ref to hv_synic_eventring_tail when CPU goes offline
tools/hv: update route parsing in kvp daemon
Drivers: hv: Fix bad pointer dereference in hv_get_partition_id
Benjamin Marzinski [Tue, 15 Apr 2025 04:17:16 +0000 (00:17 -0400)]
dm: always update the array size in realloc_argv on success
realloc_argv() was only updating the array size if it was called with
old_argv already allocated. The first time it was called to create an
argv array, it would allocate the array but return the array size as
zero. dm_split_args() would think that it couldn't store any arguments
in the array and would call realloc_argv() again, causing it to
reallocate the initial slots (this time using GPF_KERNEL) and finally
return a size. Aside from being wasteful, this could cause deadlocks on
targets that need to process messages without starting new IO. Instead,
realloc_argv should always update the allocated array size on success.
Fixes:
a0651926553c ("dm table: don't copy from a NULL pointer in realloc_argv()")
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Linus Torvalds [Sun, 27 Apr 2025 22:19:23 +0000 (15:19 -0700)]
Linux 6.15-rc4
Linus Torvalds [Sat, 26 Apr 2025 20:02:36 +0000 (13:02 -0700)]
Merge tag 'pci-v6.15-fixes-3' of git://git./linux/kernel/git/pci/pci
Pull PCI fixes from Bjorn Helgaas:
- When releasing a start-aligned resource, e.g., a bridge window, save
start/end/flags for the next assignment attempt; fixes a v6.15-rc1
regression (Ilpo Järvinen)
- Move set_pcie_speed.sh from TEST_PROGS to TEST_FILE; fixes a bwctrl
selftest v6.15-rc1 regression (Ilpo Järvinen)
- Add Manivannan Sadhasivam as maintainer of native host bridge and
endpoint drivers (Manivannan Sadhasivam)
- In endpoint test driver, defer IRQ allocation from .probe() until
ioctl() to fix a regression on platforms where the Vendor/Device ID
match doesn't include driver_data (Niklas Cassel)
* tag 'pci-v6.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
misc: pci_endpoint_test: Defer IRQ allocation until ioctl(PCITEST_SET_IRQTYPE)
MAINTAINERS: Move Manivannan Sadhasivam as PCI Native host bridge and endpoint maintainer
selftests/pcie_bwctrl: Fix test progs list
PCI: Restore assigned resources fully after release
Linus Torvalds [Sat, 26 Apr 2025 17:43:03 +0000 (10:43 -0700)]
Merge tag 'nfsd-6.15-2' of git://git./linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Revert a v6.15 patch due to a report of SELinux test failures
* tag 'nfsd-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
Revert "sunrpc: clean cache_detail immediately when flush is written frequently"
Linus Torvalds [Sat, 26 Apr 2025 16:45:54 +0000 (09:45 -0700)]
Merge tag 'x86-urgent-2025-04-26' of git://git./linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Fix 32-bit kernel boot crash if passed physical memory with more than
32 address bits
- Fix Xen PV crash
- Work around build bug in certain limited build environments
- Fix CTEST instruction decoding in insn_decoder_test
* tag 'x86-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/insn: Fix CTEST instruction decoding
x86/boot: Work around broken busybox 'truncate' tool
x86/mm: Fix _pgd_alloc() for Xen PV mode
x86/e820: Discard high memory that can't be addressed by 32-bit systems
Linus Torvalds [Sat, 26 Apr 2025 16:23:20 +0000 (09:23 -0700)]
Merge tag 'sched-urgent-2025-04-26' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix sporadic crashes in dequeue_entities() due to ... bad math.
[ Arguably if pick_eevdf()/pick_next_entity() was less trusting of
complex math being correct it could have de-escalated a crash into
a warning, but that's for a different patch ]"
* tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash
Linus Torvalds [Sat, 26 Apr 2025 16:13:09 +0000 (09:13 -0700)]
Merge tag 'perf-urgent-2025-04-26' of git://git./linux/kernel/git/tip/tip
Pull misc perf events fixes from Ingo Molnar:
- Use POLLERR for events in error state, instead of the ambiguous
POLLHUP error value
- Fix non-sampling (counting) events on certain x86 platforms
* tag 'perf-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix non-sampling (counting) events on certain x86 platforms
perf/core: Change to POLLERR for pinned events with error
Linus Torvalds [Sat, 26 Apr 2025 16:08:45 +0000 (09:08 -0700)]
Merge tag 'irq-urgent-2025-04-26' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
"Fix crashes in the gic-v2m irqchip driver, caused by an incorrect
__init annotation"
* tag 'irq-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v2m: Prevent use after free of gicv2m_get_fwnode()
Linus Torvalds [Sat, 26 Apr 2025 16:02:41 +0000 (09:02 -0700)]
Merge tag 'loongarch-fixes-6.15-1' of git://git./linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Add a missing Kconfig option, fix some bugs in exception handlers,
memory management and KVM"
* tag 'loongarch-fixes-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Fix PMU pass-through issue if VM exits to host finally
LoongArch: KVM: Fully clear some CSRs when VM reboot
LoongArch: KVM: Fix multiple typos of KVM code
LoongArch: Return NULL from huge_pte_offset() for invalid PMD
LoongArch: Remove a bogus reference to ZONE_DMA
LoongArch: Handle fp, lsx, lasx and lbt assembly symbols
LoongArch: Make do_xyz() exception handlers more robust
LoongArch: Make regs_irqs_disabled() more clear
LoongArch: Select ARCH_USE_MEMTEST
Linus Torvalds [Sat, 26 Apr 2025 16:01:13 +0000 (09:01 -0700)]
Merge tag 'for-linus' of https://github.com/openrisc/linux
Pull OpenRISC updates from Stafford Horne:
- Support for cacheinfo API to expose OpenRISC cache info via sysfs,
this also translated to some cleanups to OpenRISC cache flush and
invalidate API's
- Documentation updates for new mailing list and toolchain binaries
* tag 'for-linus' of https://github.com/openrisc/linux:
Documentation: openrisc: Update toolchain binaries URL
Documentation: openrisc: Update mailing list
openrisc: Add cacheinfo support
openrisc: Introduce new utility functions to flush and invalidate caches
openrisc: Refactor struct cpuinfo_or1k to reduce duplication
Chuck Lever [Thu, 24 Apr 2025 13:27:35 +0000 (09:27 -0400)]
Revert "sunrpc: clean cache_detail immediately when flush is written frequently"
Ondrej reports that certain SELinux tests are failing after commit
fc2a169c56de ("sunrpc: clean cache_detail immediately when flush is
written frequently"), merged during the v6.15 merge window.
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Fixes:
fc2a169c56de ("sunrpc: clean cache_detail immediately when flush is written frequently")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Linus Torvalds [Sat, 26 Apr 2025 15:55:24 +0000 (08:55 -0700)]
Merge tag 'move-lib-kunit-v6.15-rc4' of git://git./linux/kernel/git/kees/linux
Pull kunit fix from Kees Cook:
"A single fix for the kunit lib/tests/ relocation:
- Ensure prime numbers tests are included in KUnit test runs (Mark Brown)"
* tag 'move-lib-kunit-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lib: Ensure prime numbers tests are included in KUnit test runs
Linus Torvalds [Sat, 26 Apr 2025 15:32:29 +0000 (08:32 -0700)]
Merge tag 'drm-fixes-2025-04-26' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly drm fixes, mostly amdgpu, with some exynos cleanups and a
couple of minor fixes, seems a bit quiet, but probably some lag from
Easter holidays.
amdgpu:
- P2P DMA fixes
- Display reset fixes
- DCN 3.5 fixes
- ACPI EDID fix
- LTTPR fix
- mode_valid() fix
exynos:
- fix spelling error
- remove redundant error handling in exynos_drm_vidi.c module
- marks struct decon_data as const in the exynos7_drm_decon driver
since it is only read
- Remove unnecessary checking in exynos_drm_drv.c module
meson:
- Fix VCLK calculation
panel:
- jd9365a: Fix reset polarity"
* tag 'drm-fixes-2025-04-26' of https://gitlab.freedesktop.org/drm/kernel:
drm/exynos: Fix spelling mistake "enqueu" -> "enqueue"
drm/exynos: exynos7_drm_decon: Consstify struct decon_data
drm/exynos: fixed a spelling error
drm/exynos/vidi: Remove redundant error handling in vidi_get_modes()
drm/exynos: Remove unnecessary checking
drm/amd/display: do not copy invalid CRTC timing info
drm/amd/display: Default IPS to RCG_IN_ACTIVE_IPS2_IN_OFF
drm/amd/display: Use 16ms AUX read interval for LTTPR with old sinks
drm/amd/display: Fix ACPI edid parsing on some Lenovo systems
drm/amdgpu: Allow P2P access through XGMI
drm/amd/display: Enable urgent latency adjustment on DCN35
drm/amd/display: Force full update in gpu reset
drm/amd/display: Fix gpu reset in multidisplay config
drm/amdgpu: Don't pin VRAM without DMABUF_MOVE_NOTIFY
drm/amdgpu: Use allowed_domains for pinning dmabufs
drm: panel: jd9365da: fix reset signal polarity in unprepare
drm/meson: use unsigned long long / Hz for frequency types
Revert "drm/meson: vclk: fix calculation of 59.94 fractional rates"
Omar Sandoval [Fri, 25 Apr 2025 08:51:24 +0000 (01:51 -0700)]
sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash
There is a code path in dequeue_entities() that can set the slice of a
sched_entity to U64_MAX, which sometimes results in a crash.
The offending case is when dequeue_entities() is called to dequeue a
delayed group entity, and then the entity's parent's dequeue is delayed.
In that case:
1. In the if (entity_is_task(se)) else block at the beginning of
dequeue_entities(), slice is set to
cfs_rq_min_slice(group_cfs_rq(se)). If the entity was delayed, then
it has no queued tasks, so cfs_rq_min_slice() returns U64_MAX.
2. The first for_each_sched_entity() loop dequeues the entity.
3. If the entity was its parent's only child, then the next iteration
tries to dequeue the parent.
4. If the parent's dequeue needs to be delayed, then it breaks from the
first for_each_sched_entity() loop _without updating slice_.
5. The second for_each_sched_entity() loop sets the parent's ->slice to
the saved slice, which is still U64_MAX.
This throws off subsequent calculations with potentially catastrophic
results. A manifestation we saw in production was:
6. In update_entity_lag(), se->slice is used to calculate limit, which
ends up as a huge negative number.
7. limit is used in se->vlag = clamp(vlag, -limit, limit). Because limit
is negative, vlag > limit, so se->vlag is set to the same huge
negative number.
8. In place_entity(), se->vlag is scaled, which overflows and results in
another huge (positive or negative) number.
9. The adjusted lag is subtracted from se->vruntime, which increases or
decreases se->vruntime by a huge number.
10. pick_eevdf() calls entity_eligible()/vruntime_eligible(), which
incorrectly returns false because the vruntime is so far from the
other vruntimes on the queue, causing the
(vruntime - cfs_rq->min_vruntime) * load calulation to overflow.
11. Nothing appears to be eligible, so pick_eevdf() returns NULL.
12. pick_next_entity() tries to dereference the return value of
pick_eevdf() and crashes.
Dumping the cfs_rq states from the core dumps with drgn showed tell-tale
huge vruntime ranges and bogus vlag values, and I also traced se->slice
being set to U64_MAX on live systems (which was usually "benign" since
the rest of the runqueue needed to be in a particular state to crash).
Fix it in dequeue_entities() by always setting slice from the first
non-empty cfs_rq.
Fixes:
aef6987d8954 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/f0c2d1072be229e1bdddc73c0703919a8b00c652.1745570998.git.osandov@fb.com
Suzuki K Poulose [Tue, 22 Apr 2025 16:16:16 +0000 (17:16 +0100)]
irqchip/gic-v2m: Prevent use after free of gicv2m_get_fwnode()
With ACPI in place, gicv2m_get_fwnode() is registered with the pci
subsystem as pci_msi_get_fwnode_cb(), which may get invoked at runtime
during a PCI host bridge probe. But, the call back is wrongly marked as
__init, causing it to be freed, while being registered with the PCI
subsystem and could trigger:
Unable to handle kernel paging request at virtual address
ffff8000816c0400
gicv2m_get_fwnode+0x0/0x58 (P)
pci_set_bus_msi_domain+0x74/0x88
pci_register_host_bridge+0x194/0x548
This is easily reproducible on a Juno board with ACPI boot.
Retain the function for later use.
Fixes:
0644b3daca28 ("irqchip/gic-v2m: acpi: Introducing GICv2m ACPI support")
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Jakub Kicinski [Sat, 26 Apr 2025 02:06:58 +0000 (19:06 -0700)]
Merge branch 'net-ethernet-mtk-star-emac-fix-several-issues-on-rx-tx-poll'
Louis-Alexis Eyraud says:
====================
net: ethernet: mtk-star-emac: fix several issues on rx/tx poll
This patchset fixes two issues with the mtk-star-emac driver.
The first patch fixes spin lock recursion issues I've observed on the
Mediatek Genio 350-EVK board using this driver when the Ethernet
functionality is enabled on the board (requires a correct jumper and
DIP switch configuration, as well as enabling the device in the
devicetree).
The issues can be easily reproduced with apt install or ssh commands
especially and with the CONFIG_DEBUG_SPINLOCK parameter, when
one occurs, there is backtrace similar to this:
```
BUG: spinlock recursion on CPU#0, swapper/0/0
lock: 0xffff00000db9cf20, .magic:
dead4ead, .owner: swapper/0/0,
.owner_cpu: 0
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted
6.15.0-rc2-next-20250417-00001-gf6a27738686c-dirty #28 PREEMPT
Hardware name: MediaTek MT8365 Open Platform EVK (DT)
Call trace:
show_stack+0x18/0x24 (C)
dump_stack_lvl+0x60/0x80
dump_stack+0x18/0x24
spin_dump+0x78/0x88
do_raw_spin_lock+0x11c/0x120
_raw_spin_lock+0x20/0x2c
mtk_star_handle_irq+0xc0/0x22c [mtk_star_emac]
__handle_irq_event_percpu+0x48/0x140
handle_irq_event+0x4c/0xb0
handle_fasteoi_irq+0xa0/0x1bc
handle_irq_desc+0x34/0x58
generic_handle_domain_irq+0x1c/0x28
gic_handle_irq+0x4c/0x120
do_interrupt_handler+0x50/0x84
el1_interrupt+0x34/0x68
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x6c/0x70
regmap_mmio_read32le+0xc/0x20 (P)
_regmap_bus_reg_read+0x6c/0xac
_regmap_read+0x60/0xdc
regmap_read+0x4c/0x80
mtk_star_rx_poll+0x2f4/0x39c [mtk_star_emac]
__napi_poll+0x38/0x188
net_rx_action+0x164/0x2c0
handle_softirqs+0x100/0x244
__do_softirq+0x14/0x20
____do_softirq+0x10/0x20
call_on_irq_stack+0x24/0x64
do_softirq_own_stack+0x1c/0x40
__irq_exit_rcu+0xd4/0x10c
irq_exit_rcu+0x10/0x1c
el1_interrupt+0x38/0x68
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x6c/0x70
cpuidle_enter_state+0xac/0x320 (P)
cpuidle_enter+0x38/0x50
do_idle+0x1e4/0x260
cpu_startup_entry+0x34/0x3c
rest_init+0xdc/0xe0
console_on_rootfs+0x0/0x6c
__primary_switched+0x88/0x90
```
The second patch is a cleanup patch to fix a inconsistency in the
mtk_star_rx_poll function between the napi_complete_done api usage and
its description in documentation.
I've tested this patchset on Mediatek Genio 350-EVK board with a kernel
based on linux-next (tag: next-
20250422).
v1: https://lore.kernel.org/
20250422-mtk_star_emac-fix-spinlock-recursion-issue-v1-0-
1e94ea430360@collabora.com
====================
Link: https://patch.msgid.link/20250424-mtk_star_emac-fix-spinlock-recursion-issue-v2-0-f3fde2e529d8@collabora.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Louis-Alexis Eyraud [Thu, 24 Apr 2025 08:38:49 +0000 (10:38 +0200)]
net: ethernet: mtk-star-emac: rearm interrupts in rx_poll only when advised
In mtk_star_rx_poll function, on event processing completion, the
mtk_star_emac driver calls napi_complete_done but ignores its return
code and enable RX DMA interrupts inconditionally. This return code
gives the info if a device should avoid rearming its interrupts or not,
so fix this behaviour by taking it into account.
Fixes:
8c7bd5a454ff ("net: ethernet: mtk-star-emac: new driver")
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://patch.msgid.link/20250424-mtk_star_emac-fix-spinlock-recursion-issue-v2-2-f3fde2e529d8@collabora.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Louis-Alexis Eyraud [Thu, 24 Apr 2025 08:38:48 +0000 (10:38 +0200)]
net: ethernet: mtk-star-emac: fix spinlock recursion issues on rx/tx poll
Use spin_lock_irqsave and spin_unlock_irqrestore instead of spin_lock
and spin_unlock in mtk_star_emac driver to avoid spinlock recursion
occurrence that can happen when enabling the DMA interrupts again in
rx/tx poll.
```
BUG: spinlock recursion on CPU#0, swapper/0/0
lock: 0xffff00000db9cf20, .magic:
dead4ead, .owner: swapper/0/0,
.owner_cpu: 0
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted
6.15.0-rc2-next-20250417-00001-gf6a27738686c-dirty #28 PREEMPT
Hardware name: MediaTek MT8365 Open Platform EVK (DT)
Call trace:
show_stack+0x18/0x24 (C)
dump_stack_lvl+0x60/0x80
dump_stack+0x18/0x24
spin_dump+0x78/0x88
do_raw_spin_lock+0x11c/0x120
_raw_spin_lock+0x20/0x2c
mtk_star_handle_irq+0xc0/0x22c [mtk_star_emac]
__handle_irq_event_percpu+0x48/0x140
handle_irq_event+0x4c/0xb0
handle_fasteoi_irq+0xa0/0x1bc
handle_irq_desc+0x34/0x58
generic_handle_domain_irq+0x1c/0x28
gic_handle_irq+0x4c/0x120
do_interrupt_handler+0x50/0x84
el1_interrupt+0x34/0x68
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x6c/0x70
regmap_mmio_read32le+0xc/0x20 (P)
_regmap_bus_reg_read+0x6c/0xac
_regmap_read+0x60/0xdc
regmap_read+0x4c/0x80
mtk_star_rx_poll+0x2f4/0x39c [mtk_star_emac]
__napi_poll+0x38/0x188
net_rx_action+0x164/0x2c0
handle_softirqs+0x100/0x244
__do_softirq+0x14/0x20
____do_softirq+0x10/0x20
call_on_irq_stack+0x24/0x64
do_softirq_own_stack+0x1c/0x40
__irq_exit_rcu+0xd4/0x10c
irq_exit_rcu+0x10/0x1c
el1_interrupt+0x38/0x68
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x6c/0x70
cpuidle_enter_state+0xac/0x320 (P)
cpuidle_enter+0x38/0x50
do_idle+0x1e4/0x260
cpu_startup_entry+0x34/0x3c
rest_init+0xdc/0xe0
console_on_rootfs+0x0/0x6c
__primary_switched+0x88/0x90
```
Fixes:
0a8bd81fd6aa ("net: ethernet: mtk-star-emac: separate tx/rx handling with two NAPIs")
Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://patch.msgid.link/20250424-mtk_star_emac-fix-spinlock-recursion-issue-v2-1-f3fde2e529d8@collabora.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Justin Lai [Thu, 24 Apr 2025 04:04:44 +0000 (12:04 +0800)]
rtase: Modify the condition used to detect overflow in rtase_calc_time_mitigation
Fix the following compile error reported by the kernel test
robot by modifying the condition used to detect overflow in
rtase_calc_time_mitigation.
In file included from include/linux/mdio.h:10:0,
from drivers/net/ethernet/realtek/rtase/rtase_main.c:58:
In function 'u16_encode_bits',
inlined from 'rtase_calc_time_mitigation.constprop' at drivers/net/
ethernet/realtek/rtase/rtase_main.c:1915:13,
inlined from 'rtase_init_software_variable.isra.41' at drivers/net/
ethernet/realtek/rtase/rtase_main.c:1961:13,
inlined from 'rtase_init_one' at drivers/net/ethernet/realtek/
rtase/rtase_main.c:2111:2:
>> include/linux/bitfield.h:178:3: error: call to '__field_overflow'
declared with attribute error: value doesn't fit into mask
__field_overflow(); \
^~~~~~~~~~~~~~~~~~
include/linux/bitfield.h:198:2: note: in expansion of macro
'____MAKE_OP'
____MAKE_OP(u##size,u##size,,)
^~~~~~~~~~~
include/linux/bitfield.h:200:1: note: in expansion of macro
'__MAKE_OP'
__MAKE_OP(16)
^~~~~~~~~
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/
202503182158.nkAlbJWX-lkp@intel.com/
Fixes:
a36e9f5cfe9e ("rtase: Add support for a pci table in this module")
Signed-off-by: Justin Lai <justinlai0215@realtek.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250424040444.5530-1-justinlai0215@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bibo Mao [Thu, 24 Apr 2025 12:15:52 +0000 (20:15 +0800)]
LoongArch: KVM: Fix PMU pass-through issue if VM exits to host finally
In function kvm_pre_enter_guest(), it prepares to enter guest and check
whether there are pending signals or events. And it will not enter guest
if there are, PMU pass-through preparation for guest should be cancelled
and host should own PMU hardware.
Cc: stable@vger.kernel.org
Fixes:
f4e40ea9f78f ("LoongArch: KVM: Add PMU support for guest")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Thu, 24 Apr 2025 12:15:52 +0000 (20:15 +0800)]
LoongArch: KVM: Fully clear some CSRs when VM reboot
Some registers such as LOONGARCH_CSR_ESTAT and LOONGARCH_CSR_GINTC are
partly cleared with function _kvm_setcsr(). This comes from the hardware
specification, some bits are read only in VM mode, and however they can
be written in host mode. So they are partly cleared in VM mode, and can
be fully cleared in host mode.
These read only bits show pending interrupt or exception status. When VM
reset, the read-only bits should be cleared, otherwise vCPU will receive
unknown interrupts in boot stage.
Here registers LOONGARCH_CSR_ESTAT/LOONGARCH_CSR_GINTC are fully cleared
in ioctl KVM_REG_LOONGARCH_VCPU_RESET vCPU reset path.
Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Yulong Han [Thu, 24 Apr 2025 12:15:52 +0000 (20:15 +0800)]
LoongArch: KVM: Fix multiple typos of KVM code
Fix multiple typos inside arch/loongarch/kvm.
Cc: stable@vger.kernel.org
Reviewed-by: Yuli Wang <wangyuli@uniontech.com>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Yulong Han <wheatfox17@icloud.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Ming Wang [Thu, 24 Apr 2025 12:15:47 +0000 (20:15 +0800)]
LoongArch: Return NULL from huge_pte_offset() for invalid PMD
LoongArch's huge_pte_offset() currently returns a pointer to a PMD slot
even if the underlying entry points to invalid_pte_table (indicating no
mapping). Callers like smaps_hugetlb_range() fetch this invalid entry
value (the address of invalid_pte_table) via this pointer.
The generic is_swap_pte() check then incorrectly identifies this address
as a swap entry on LoongArch, because it satisfies the "!pte_present()
&& !pte_none()" conditions. This misinterpretation, combined with a
coincidental match by is_migration_entry() on the address bits, leads to
kernel crashes in pfn_swap_entry_to_page().
Fix this at the architecture level by modifying huge_pte_offset() to
check the PMD entry's content using pmd_none() before returning. If the
entry is invalid (i.e., it points to invalid_pte_table), return NULL
instead of the pointer to the slot.
Cc: stable@vger.kernel.org
Acked-by: Peter Xu <peterx@redhat.com>
Co-developed-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Ming Wang <wangming01@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Petr Tesarik [Thu, 24 Apr 2025 12:15:41 +0000 (20:15 +0800)]
LoongArch: Remove a bogus reference to ZONE_DMA
Remove dead code. LoongArch does not have a DMA memory zone (24bit DMA).
The architecture does not even define MAX_DMA_PFN.
Cc: stable@vger.kernel.org
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Tiezhu Yang [Thu, 24 Apr 2025 12:15:41 +0000 (20:15 +0800)]
LoongArch: Handle fp, lsx, lasx and lbt assembly symbols
Like the other relevant symbols, export some fp, lsx, lasx and lbt
assembly symbols and put the function declarations in header files
rather than source files.
While at it, use "asmlinkage" for the other existing C prototypes
of assembly functions and also do not use the "extern" keyword with
function declarations according to the document coding-style.rst.
Cc: stable@vger.kernel.org # 6.6+
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Tiezhu Yang [Thu, 24 Apr 2025 12:15:41 +0000 (20:15 +0800)]
LoongArch: Make do_xyz() exception handlers more robust
Currently, interrupts need to be disabled before single-step mode is
set, it requires that CSR_PRMD_PIE be cleared in save_local_irqflag()
which is called by setup_singlestep(), this is reasonable.
But in the first kprobe breakpoint exception, if the irq is enabled at
the beginning of do_bp(), it will not be disabled at the end of do_bp()
due to the CSR_PRMD_PIE has been cleared in save_local_irqflag(). So for
this case, it may corrupt exception context when restoring the exception
after do_bp() in handle_bp(), this is not reasonable.
In order to restore exception safely in handle_bp(), it needs to ensure
the irq is disabled at the end of do_bp(), so just add a local variable
to record the original interrupt status in the parent context, then use
it as the check condition to enable and disable irq in do_bp().
While at it, do the similar thing for other do_xyz() exception handlers
to make them more robust.
Fixes:
6d4cc40fb5f5 ("LoongArch: Add kprobes support")
Suggested-by: Jinyang He <hejinyang@loongson.cn>
Suggested-by: Huacai Chen <chenhuacai@loongson.cn>
Co-developed-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Tiezhu Yang [Thu, 24 Apr 2025 12:15:41 +0000 (20:15 +0800)]
LoongArch: Make regs_irqs_disabled() more clear
In the current code, the definition of regs_irqs_disabled() is actually
"!(regs->csr_prmd & CSR_CRMD_IE)" because arch_irqs_disabled_flags() is
defined as "!(flags & CSR_CRMD_IE)", it looks a little strange.
Define regs_irqs_disabled() as !(regs->csr_prmd & CSR_PRMD_PIE) directly
to make it more clear, no functional change.
While at it, the return value of regs_irqs_disabled() is true or false,
so change its type to reflect that and also make it always inline.
Fixes:
803b0fc5c3f2 ("LoongArch: Add process management")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Yuli Wang [Thu, 24 Apr 2025 12:15:22 +0000 (20:15 +0800)]
LoongArch: Select ARCH_USE_MEMTEST
As of commit
dce44566192e ("mm/memtest: add ARCH_USE_MEMTEST"),
architectures must select ARCH_USE_MEMTESET to enable CONFIG_MEMTEST.
Commit
628c3bb40e9a ("LoongArch: Add boot and setup routines") added
support for early_memtest but did not select ARCH_USE_MEMTESET.
Fixes:
628c3bb40e9a ("LoongArch: Add boot and setup routines")
Tested-by: Erpeng Xu <xuerpeng@uniontech.com>
Tested-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Vadim Fedorenko [Thu, 24 Apr 2025 12:55:47 +0000 (05:55 -0700)]
bnxt_en: improve TX timestamping FIFO configuration
Reconfiguration of netdev may trigger close/open procedure which can
break FIFO status by adjusting the amount of empty slots for TX
timestamps. But it is not really needed because timestamps for the
packets sent over the wire still can be retrieved. On the other side,
during netdev close procedure any skbs waiting for TX timestamps can be
leaked because there is no cleaning procedure called. Free skbs waiting
for TX timestamps when closing netdev.
Fixes:
8aa2a79e9b95 ("bnxt_en: Increase the max total outstanding PTP TX packets to 4")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://patch.msgid.link/20250424125547.460632-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sathesh B Edara [Thu, 24 Apr 2025 13:39:44 +0000 (06:39 -0700)]
octeon_ep_vf: Resolve netdevice usage count issue
The netdevice usage count increases during transmit queue timeouts
because netdev_hold is called in ndo_tx_timeout, scheduling a task
to reinitialize the card. Although netdev_put is called at the end
of the scheduled work, rtnl_unlock checks the reference count during
cleanup. This could cause issues if transmit timeout is called on
multiple queues.
Fixes:
cb7dd712189f ("octeon_ep_vf: Add driver framework and device initialization")
Signed-off-by: Sathesh B Edara <sedara@marvell.com>
Link: https://patch.msgid.link/20250424133944.28128-1-sedara@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christian Heusel [Thu, 24 Apr 2025 14:00:28 +0000 (16:00 +0200)]
Revert "rndis_host: Flag RNDIS modems as WWAN devices"
This reverts commit
67d1a8956d2d62fe6b4c13ebabb57806098511d8. Since this
commit has been proven to be problematic for the setup of USB-tethered
ethernet connections and the related breakage is very noticeable for
users it should be reverted until a fixed version of the change can be
rolled out.
Closes: https://lore.kernel.org/all/
e0df2d85-1296-4317-b717-
bd757e3ab928@heusel.eu/
Link: https://chaos.social/@gromit/114377862699921553
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220002
Link: https://bugs.gentoo.org/953555
Link: https://bbs.archlinux.org/viewtopic.php?id=304892
Cc: stable@vger.kernel.org
Acked-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Christian Heusel <christian@heusel.eu>
Link: https://patch.msgid.link/20250424-usb-tethering-fix-v1-1-b65cf97c740e@heusel.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 24 Apr 2025 22:37:34 +0000 (01:37 +0300)]
selftests: net: bridge_vlan_aware: test untagged/8021p-tagged with and without PVID
Recent discussions around commit
ad1afb003939 ("vlan_dev: VLAN 0 should
be treated as "no vlan tag" (802.1p packet)") have sparked the question
what happens with the DSA (and possibly other switchdev) data path when
the bridge says that ports should have no PVID VLAN, but the 8021q
module, as the result of a NETDEV_UP event, decides it should add VID 0
to the RX filter of those bridge ports. Do those bridge ports receive
packets tagged with VID 0 or not, now? We don't know, there is no test.
In the veth realm, this passes trivially, because veth is not VLAN
filtering and this, the 8021q module lacks the instinct to add VID 0 in
the first place.
In the realm of VLAN filtering NICs with no switchdev offload, this
should also pass, because the VLAN groups of the software bridge are
consulted, where it can clearly be seen that a PVID is missing, even
though the packet was initially accepted by the NIC.
The test only poses a challenge for switchdev drivers, which usually
have to program to hardware both VLANs from RX filtering, as well as
from switchdev. Especially when a switchdev port joins a VLAN-aware
bridge, it is unavoidable that it gains the NETIF_F_HW_VLAN_CTAG_FILTER
feature, i.e. any 8021q uppers that the bridge port may have must also
be committed to the RX filtering table of the interface. When a
VLAN-tagged packet is physically received by the port, it is initially
indistinguishable whether it will reach the bridge data path or the
8021q upper data path.
That is rather the final step of the new tests that we introduce.
We need to build context up to that stage, which means the following:
- we need to test that 802.1p (VID 0) tagged traffic is received in the
first place (on bridge ports with a valid PVID). This is the "8021p"
test.
- we need to test that the usual paths of reaching a configuration with
no PVID on a bridge port are all covered and they all reach the same
state.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250424223734.3096202-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 24 Apr 2025 22:37:33 +0000 (01:37 +0300)]
net: mscc: ocelot: delete PVID VLAN when readding it as non-PVID
The following set of commands:
ip link add br0 type bridge vlan_filtering 1 # vlan_default_pvid 1 is implicit
ip link set swp0 master br0
bridge vlan add dev swp0 vid 1
should result in the dropping of untagged and 802.1p-tagged traffic, but
we see that it continues to be accepted. Whereas, had we deleted VID 1
instead, the aforementioned dropping would have worked
This is because the ANA_PORT_DROP_CFG update logic doesn't run, because
ocelot_vlan_add() only calls ocelot_port_set_pvid() if the new VLAN has
the BRIDGE_VLAN_INFO_PVID flag.
Similar to other drivers like mt7530_port_vlan_add() which handle this
case correctly, we need to test whether the VLAN we're changing used to
have the BRIDGE_VLAN_INFO_PVID flag, but lost it now. That amounts to a
PVID deletion and should be treated as such.
Regarding blame attribution: this never worked properly since the
introduction of bridge VLAN filtering in commit
7142529f1688 ("net:
mscc: ocelot: add VLAN filtering"). However, there was a significant
paradigm shift which aligned the ANA_PORT_DROP_CFG register with the
PVID concept rather than with the native VLAN concept, and that change
wasn't targeted for 'stable'. Realistically, that is as far as this fix
needs to be propagated to.
Fixes:
be0576fed6d3 ("net: mscc: ocelot: move the logic to drop 802.1p traffic to the pvid deletion")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250424223734.3096202-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>