Eric Dumazet [Wed, 11 Jun 2025 11:15:12 +0000 (11:15 +0000)]
net_sched: red: fix a race in __red_change()
Gerrard Tai reported a race condition in RED, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes:
0c8d13ac9607 ("net: sched: red: delay destroying child qdisc on replace")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 11 Jun 2025 11:15:11 +0000 (11:15 +0000)]
net_sched: prio: fix a race in prio_tune()
Gerrard Tai reported a race condition in PRIO, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes:
7b8e0b6e6599 ("net: sched: prio: delay destroying child qdiscs on change")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 11 Jun 2025 08:35:01 +0000 (08:35 +0000)]
net_sched: sch_sfq: reject invalid perturb period
Gerrard Tai reported that SFQ perturb_period has no range check yet,
and this can be used to trigger a race condition fixed in a separate patch.
We want to make sure ctl->perturb_period * HZ will not overflow
and is positive.
Tested:
tc qd add dev lo root sfq perturb -10 # negative value : error
Error: sch_sfq: invalid perturb period.
tc qd add dev lo root sfq perturb
1000000000 # too big : error
Error: sch_sfq: invalid perturb period.
tc qd add dev lo root sfq perturb
2000000 # acceptable value
tc -s -d qd sh dev lo
qdisc sfq 8005: root refcnt 2 limit 127p quantum 64Kb depth 127 flows 128 divisor 1024 perturb 2000000sec
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250611083501.1810459-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxime Chevallier [Fri, 6 Jun 2025 09:43:20 +0000 (11:43 +0200)]
net: phy: phy_caps: Don't skip better duplex macth on non-exact match
When performing a non-exact phy_caps lookup, we are looking for a
supported mode that matches as closely as possible the passed speed/duplex.
Blamed patch broke that logic by returning a match too early in case
the caller asks for half-duplex, as a full-duplex linkmode may match
first, and returned as a non-exact match without even trying to mach on
half-duplex modes.
Reported-by: Jijie Shao <shaojijie@huawei.com>
Closes: https://lore.kernel.org/netdev/
20250603102500.
4ec743cf@fedora/T/#m22ed60ca635c67dc7d9cbb47e8995b2beb5c1576
Tested-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Fixes:
fc81e257d19f ("net: phy: phy_caps: Allow looking-up link caps based on speed and duplex")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250606094321.483602-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Tue, 10 Jun 2025 23:56:58 +0000 (23:56 +0000)]
MAINTAINERS: Update Kuniyuki Iwashima's email address.
I left Amazon and joined Google, so let's map the email
addresses accordingly.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250610235734.88540-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 10 Jun 2025 00:12:45 +0000 (17:12 -0700)]
selftests: net: add test case for NAT46 looping back dst
Simple test for crash involving multicast loopback and stale dst.
Reuse exising NAT46 program.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250610001245.1981782-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 10 Jun 2025 00:12:44 +0000 (17:12 -0700)]
net: clear the dst when changing skb protocol
A not-so-careful NAT46 BPF program can crash the kernel
if it indiscriminately flips ingress packets from v4 to v6:
BUG: kernel NULL pointer dereference, address:
0000000000000000
ip6_rcv_core (net/ipv6/ip6_input.c:190:20)
ipv6_rcv (net/ipv6/ip6_input.c:306:8)
process_backlog (net/core/dev.c:6186:4)
napi_poll (net/core/dev.c:6906:9)
net_rx_action (net/core/dev.c:7028:13)
do_softirq (kernel/softirq.c:462:3)
netif_rx (net/core/dev.c:5326:3)
dev_loopback_xmit (net/core/dev.c:4015:2)
ip_mc_finish_output (net/ipv4/ip_output.c:363:8)
NF_HOOK (./include/linux/netfilter.h:314:9)
ip_mc_output (net/ipv4/ip_output.c:400:5)
dst_output (./include/net/dst.h:459:9)
ip_local_out (net/ipv4/ip_output.c:130:9)
ip_send_skb (net/ipv4/ip_output.c:1496:8)
udp_send_skb (net/ipv4/udp.c:1040:8)
udp_sendmsg (net/ipv4/udp.c:1328:10)
The output interface has a 4->6 program attached at ingress.
We try to loop the multicast skb back to the sending socket.
Ingress BPF runs as part of netif_rx(), pushes a valid v6 hdr
and changes skb->protocol to v6. We enter ip6_rcv_core which
tries to use skb_dst(). But the dst is still an IPv4 one left
after IPv4 mcast output.
Clear the dst in all BPF helpers which change the protocol.
Try to preserve metadata dsts, those may carry non-routing
metadata.
Cc: stable@vger.kernel.org
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Fixes:
d219df60a70e ("bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()")
Fixes:
1b00e0dfe7d0 ("bpf: update skb->protocol in bpf_skb_net_grow")
Fixes:
6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250610001245.1981782-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 11 Jun 2025 21:41:15 +0000 (14:41 -0700)]
Merge branch 'mlx5-misc-fixes-2025-06-10'
Mark Bloch says:
====================
mlx5 misc fixes 2025-06-10
This patchset includes misc fixes from the team for the mlx5 core
and Ethernet drivers.
====================
Link: https://patch.msgid.link/20250610151514.1094735-1-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shahar Shitrit [Tue, 10 Jun 2025 15:15:14 +0000 (18:15 +0300)]
net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper
When the link is up, either eth_proto_oper or ext_eth_proto_oper
typically reports the active link protocol, from which both speed
and number of lanes can be retrieved. However, in certain cases,
such as when a NIC is connected via a non-standard cable, the
firmware may not report the protocol.
In such scenarios, the speed can still be obtained from the
data_rate_oper field in PTYS register. Since data_rate_oper
provides only speed information and lacks lane details, it is
incorrect to derive the number of lanes from it.
This patch corrects the behavior by setting the number of lanes to
UNKNOWN instead of incorrectly using MAX_LANES when relying on
data_rate_oper.
Fixes:
7e959797f021 ("net/mlx5e: Enable lanes configuration when auto-negotiation is off")
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-10-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jianbo Liu [Tue, 10 Jun 2025 15:15:13 +0000 (18:15 +0300)]
net/mlx5e: Fix leak of Geneve TLV option object
Previously, a unique tunnel id was added for the matching on TC
non-zero chains, to support inner header rewrite with goto action.
Later, it was used to support VF tunnel offload for vxlan, then for
Geneve and GRE. To support VF tunnel, a temporary mlx5_flow_spec is
used to parse tunnel options. For Geneve, if there is TLV option, a
object is created, or refcnt is added if already exists. But the
temporary mlx5_flow_spec is directly freed after parsing, which causes
the leak because no information regarding the object is saved in
flow's mlx5_flow_spec, which is used to free the object when deleting
the flow.
To fix the leak, call mlx5_geneve_tlv_option_del() before free the
temporary spec if it has TLV object.
Fixes:
521933cdc4aa ("net/mlx5e: Support Geneve and GRE with VF tunnel offload")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Alex Lazar <alazar@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-9-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vlad Dogaru [Tue, 10 Jun 2025 15:15:11 +0000 (18:15 +0300)]
net/mlx5: HWS, make sure the uplink is the last destination
When there are more than one destinations, we create a FW flow
table and provide it with all the destinations. FW requires to
have wire as the last destination in the list (if it exists),
otherwise the operation fails with FW syndrome.
This patch fixes the destination array action creation: if it
contains a wire destination, it is moved to the end.
Fixes:
504e536d9010 ("net/mlx5: HWS, added actions handling")
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-7-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yevgeny Kliteynik [Tue, 10 Jun 2025 15:15:10 +0000 (18:15 +0300)]
net/mlx5: HWS, fix missing ip_version handling in definer
Fix missing field handling in definer - outer IP version.
Fixes:
74a778b4a63f ("net/mlx5: HWS, added definers handling")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-6-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vlad Dogaru [Tue, 10 Jun 2025 15:15:09 +0000 (18:15 +0300)]
net/mlx5: HWS, Init mutex on the correct path
The newly introduced mutex is only used for reformat actions, but it was
initialized for modify header instead.
The struct that contains the mutex is zero-initialized and an all-zero
mutex is valid, so the issue only shows up with CONFIG_DEBUG_MUTEXES.
Fixes:
b206d9ec19df ("net/mlx5: HWS, register reformat actions with fw")
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-5-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Patrisious Haddad [Tue, 10 Jun 2025 15:15:08 +0000 (18:15 +0300)]
net/mlx5: Fix return value when searching for existing flow group
When attempting to add a rule to an existing flow group, if a matching
flow group exists but is not active, the error code returned should be
EAGAIN, so that the rule can be added to the matching flow group once
it is active, rather than ENOENT, which indicates that no matching
flow group was found.
Fixes:
bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Gavi Teitz <gavi@nvidia.com>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-4-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Amir Tzin [Tue, 10 Jun 2025 15:15:07 +0000 (18:15 +0300)]
net/mlx5: Fix ECVF vports unload on shutdown flow
Fix shutdown flow UAF when a virtual function is created on the embedded
chip (ECVF) of a BlueField device. In such case the vport acl ingress
table is not properly destroyed.
ECVF functionality is independent of ecpf_vport_exists capability and
thus functions mlx5_eswitch_(enable|disable)_pf_vf_vports() should not
test it when enabling/disabling ECVF vports.
kernel log:
[] refcount_t: underflow; use-after-free.
[] WARNING: CPU: 3 PID: 1 at lib/refcount.c:28
refcount_warn_saturate+0x124/0x220
----------------
[] Call trace:
[] refcount_warn_saturate+0x124/0x220
[] tree_put_node+0x164/0x1e0 [mlx5_core]
[] mlx5_destroy_flow_table+0x98/0x2c0 [mlx5_core]
[] esw_acl_ingress_table_destroy+0x28/0x40 [mlx5_core]
[] esw_acl_ingress_lgcy_cleanup+0x80/0xf4 [mlx5_core]
[] esw_legacy_vport_acl_cleanup+0x44/0x60 [mlx5_core]
[] esw_vport_cleanup+0x64/0x90 [mlx5_core]
[] mlx5_esw_vport_disable+0xc0/0x1d0 [mlx5_core]
[] mlx5_eswitch_unload_ec_vf_vports+0xcc/0x150 [mlx5_core]
[] mlx5_eswitch_disable_sriov+0x198/0x2a0 [mlx5_core]
[] mlx5_device_disable_sriov+0xb8/0x1e0 [mlx5_core]
[] mlx5_sriov_detach+0x40/0x50 [mlx5_core]
[] mlx5_unload+0x40/0xc4 [mlx5_core]
[] mlx5_unload_one_devl_locked+0x6c/0xe4 [mlx5_core]
[] mlx5_unload_one+0x3c/0x60 [mlx5_core]
[] shutdown+0x7c/0xa4 [mlx5_core]
[] pci_device_shutdown+0x3c/0xa0
[] device_shutdown+0x170/0x340
[] __do_sys_reboot+0x1f4/0x2a0
[] __arm64_sys_reboot+0x2c/0x40
[] invoke_syscall+0x78/0x100
[] el0_svc_common.constprop.0+0x54/0x184
[] do_el0_svc+0x30/0xac
[] el0_svc+0x48/0x160
[] el0t_64_sync_handler+0xa4/0x12c
[] el0t_64_sync+0x1a4/0x1a8
[] --[ end trace
9c4601d68c70030e ]---
Fixes:
a7719b29a821 ("net/mlx5: Add management of EC VF vports")
Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-3-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Moshe Shemesh [Tue, 10 Jun 2025 15:15:06 +0000 (18:15 +0300)]
net/mlx5: Ensure fw pages are always allocated on same NUMA
When firmware asks the driver to allocate more pages, using event of
give_pages, the driver should always allocate it from same NUMA, the
original device NUMA. Current code uses dev_to_node() which can result
in different NUMA as it is changed by other driver flows, such as
mlx5_dma_zalloc_coherent_node(). Instead, use saved numa node for
allocating firmware pages.
Fixes:
311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on reader NUMA node")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250610151514.1094735-2-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 11 Jun 2025 21:28:51 +0000 (14:28 -0700)]
Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-06-10 (i40e, iavf, ice, e1000)
For i40e:
Robert Malz improves reset handling for situations where multiple reset
requests could cause some to be missed.
For iavf:
Ahmed adds detection, and handling, of reset that could occur early in
the initialization process to stop long wait/hangs.
For ice:
Anton, properly, sets missed use_nsecs value.
For e1000:
Joe Damato moves cancel_work_sync() call to avoid deadlock.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
e1000: Move cancel_work_sync to avoid deadlock
ice/ptp: fix crosstimestamp reporting
iavf: fix reset_task for early reset event
i40e: retry VFLR handling if there is ongoing VF reset
i40e: return false from i40e_reset_vf if reset is in progress
====================
Link: https://patch.msgid.link/20250610171348.1476574-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Raczynski [Mon, 9 Jun 2025 15:31:47 +0000 (17:31 +0200)]
net/mdiobus: Fix potential out-of-bounds clause 45 read/write access
When using publicly available tools like 'mdio-tools' to read/write data
from/to network interface and its PHY via C45 (clause 45) mdiobus,
there is no verification of parameters passed to the ioctl and
it accepts any mdio address.
Currently there is support for 32 addresses in kernel via PHY_MAX_ADDR define,
but it is possible to pass higher value than that via ioctl.
While read/write operation should generally fail in this case,
mdiobus provides stats array, where wrong address may allow out-of-bounds
read/write.
Fix that by adding address verification before C45 read/write operation.
While this excludes this access from any statistics, it improves security of
read/write operation.
Fixes:
4e4aafcddbbf ("net: mdio: Add dedicated C45 API to MDIO bus drivers")
Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
Reported-by: Wenjing Shan <wenjing.shan@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Raczynski [Mon, 9 Jun 2025 15:31:46 +0000 (17:31 +0200)]
net/mdiobus: Fix potential out-of-bounds read/write access
When using publicly available tools like 'mdio-tools' to read/write data
from/to network interface and its PHY via mdiobus, there is no verification of
parameters passed to the ioctl and it accepts any mdio address.
Currently there is support for 32 addresses in kernel via PHY_MAX_ADDR define,
but it is possible to pass higher value than that via ioctl.
While read/write operation should generally fail in this case,
mdiobus provides stats array, where wrong address may allow out-of-bounds
read/write.
Fix that by adding address verification before read/write operation.
While this excludes this access from any statistics, it improves security of
read/write operation.
Fixes:
080bb352fad00 ("net: phy: Maintain MDIO device and bus statistics")
Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
Reported-by: Wenjing Shan <wenjing.shan@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Carlos Fernandez [Mon, 9 Jun 2025 07:26:26 +0000 (09:26 +0200)]
macsec: MACsec SCI assignment for ES = 0
According to 802.1AE standard, when ES and SC flags in TCI are zero,
used SCI should be the current active SC_RX. Current code uses the
header MAC address. Without this patch, when ES flag is 0 (using a
bridge or switch), header MAC will not fit the SCI and MACSec frames
will be discarted.
In order to test this issue, MACsec link should be stablished between
two interfaces, setting SC and ES flags to zero and a port identifier
different than one. For example, using ip macsec tools:
ip link add link $ETH0 macsec0 type macsec port 11 send_sci off
end_station off
ip macsec add macsec0 tx sa 0 pn 2 on key 01 $ETH1_KEY
ip macsec add macsec0 rx port 11 address $ETH1_MAC
ip macsec add macsec0 rx port 11 address $ETH1_MAC sa 0 pn 2 on key 02
ip link set dev macsec0 up
ip link add link $ETH1 macsec1 type macsec port 11 send_sci off
end_station off
ip macsec add macsec1 tx sa 0 pn 2 on key 01 $ETH0_KEY
ip macsec add macsec1 rx port 11 address $ETH0_MAC
ip macsec add macsec1 rx port 11 address $ETH0_MAC sa 0 pn 2 on key 02
ip link set dev macsec1 up
Fixes:
c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Co-developed-by: Andreu Montiel <Andreu.Montiel@technica-engineering.de>
Signed-off-by: Andreu Montiel <Andreu.Montiel@technica-engineering.de>
Signed-off-by: Carlos Fernandez <carlos.fernandez@technica-engineering.de>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Mon, 9 Jun 2025 20:40:35 +0000 (22:40 +0200)]
net: airoha: Enable RX queues 16-31
Fix RX_DONE_INT_MASK definition in order to enable RX queues 16-31.
Fixes:
f252493e18353 ("net: airoha: Enable multiple IRQ lines support in airoha_eth driver.")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250609-aioha-fix-rx-queue-mask-v1-1-f33706a06fa2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gustavo Luiz Duarte [Mon, 9 Jun 2025 18:24:20 +0000 (11:24 -0700)]
netconsole: fix appending sysdata when sysdata_fields == SYSDATA_RELEASE
Before appending sysdata, prepare_extradata() checks if any feature is
enabled in sysdata_fields (and exits early if none is enabled).
When SYSDATA_RELEASE was introduced, we missed adding it to the list of
features being checked against sysdata_fields in prepare_extradata().
The result was that, if only SYSDATA_RELEASE is enabled in
sysdata_fields, we incorreclty exit early and fail to append the
release.
Instead of checking specific bits in sysdata_fields, check if
sysdata_fields has ALL bit zeroed and exit early if true. This fixes
case when only SYSDATA_RELEASE enabled and makes the code more general /
less error prone in future feature implementation.
Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Fixes:
cfcc9239e78a ("netconsole: append release to sysdata")
Link: https://patch.msgid.link/20250609-netconsole-fix-v1-1-17543611ae31@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Luczaj [Mon, 9 Jun 2025 17:08:03 +0000 (19:08 +0200)]
net: Fix TOCTOU issue in sk_is_readable()
sk->sk_prot->sock_is_readable is a valid function pointer when sk resides
in a sockmap. After the last sk_psock_put() (which usually happens when
socket is removed from sockmap), sk->sk_prot gets restored and
sk->sk_prot->sock_is_readable becomes NULL.
This makes sk_is_readable() racy, if the value of sk->sk_prot is reloaded
after the initial check. Which in turn may lead to a null pointer
dereference.
Ensure the function pointer does not turn NULL after the check.
Fixes:
8934ce2fd081 ("bpf: sockmap redirect ingress support")
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250609-skisreadable-toctou-v1-1-d0dfb2d62c37@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lucas Sanchez Sagrado [Mon, 9 Jun 2025 14:55:36 +0000 (16:55 +0200)]
net: usb: r8152: Add device ID for TP-Link UE200
The TP-Link UE200 is a RTL8152B based USB 2.0 Fast Ethernet adapter. This
patch adds its device ID. It has been tested on Ubuntu 22.04.5.
Signed-off-by: Lucas Sanchez Sagrado <lucsansag@gmail.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/20250609145536.26648-1-lucsansag@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Tue, 3 Jun 2025 16:34:01 +0000 (16:34 +0000)]
e1000: Move cancel_work_sync to avoid deadlock
Previously, e1000_down called cancel_work_sync for the e1000 reset task
(via e1000_down_and_stop), which takes RTNL.
As reported by users and syzbot, a deadlock is possible in the following
scenario:
CPU 0:
- RTNL is held
- e1000_close
- e1000_down
- cancel_work_sync (cancel / wait for e1000_reset_task())
CPU 1:
- process_one_work
- e1000_reset_task
- take RTNL
To remedy this, avoid calling cancel_work_sync from e1000_down
(e1000_reset_task does nothing if the device is down anyway). Instead,
call cancel_work_sync for e1000_reset_task when the device is being
removed.
Fixes:
e400c7444d84 ("e1000: Hold RTNL when e1000_down can be called")
Reported-by: syzbot+846bb38dc67fe62cc733@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
683837bf.
a00a0220.52848.0003.GAE@google.com/
Reported-by: John <john.cs.hey@gmail.com>
Closes: https://lore.kernel.org/netdev/CAP=Rh=OEsn4y_2LvkO3UtDWurKcGPnZ_NPSXK=FbgygNXL37Sw@mail.gmail.com/
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Anton Nadezhdin [Tue, 20 May 2025 08:42:16 +0000 (10:42 +0200)]
ice/ptp: fix crosstimestamp reporting
Set use_nsecs=true as timestamp is reported in ns. Lack of this result
in smaller timestamp error window which cause error during phc2sys
execution on E825 NICs:
phc2sys[1768.256]: ioctl PTP_SYS_OFFSET_PRECISE: Invalid argument
This problem was introduced in the cited commit which omitted setting
use_nsecs to true when converting the ice driver to use
convert_base_to_cs().
Testing hints (ethX is PF netdev):
phc2sys -s ethX -c CLOCK_REALTIME -O 37 -m
phc2sys[1769.256]: CLOCK_REALTIME phc offset -5 s0 freq -0 delay 0
Fixes:
d4bea547ebb57 ("ice/ptp: Remove convert_art_to_tsc()")
Signed-off-by: Anton Nadezhdin <anton.nadezhdin@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Ahmed Zaki [Thu, 24 Apr 2025 13:50:13 +0000 (15:50 +0200)]
iavf: fix reset_task for early reset event
If a reset event is received from the PF early in the init cycle, the
state machine hangs for about 25 seconds.
Reproducer:
echo 1 > /sys/class/net/$PF0/device/sriov_numvfs
ip link set dev $PF0 vf 0 mac $NEW_MAC
The log shows:
[792.620416] ice 0000:5e:00.0: Enabling 1 VFs
[792.738812] iavf 0000:5e:01.0: enabling device (0000 -> 0002)
[792.744182] ice 0000:5e:00.0: Enabling 1 VFs with 17 vectors and 16 queues per VF
[792.839964] ice 0000:5e:00.0: Setting MAC 52:54:00:00:00:11 on VF 0. VF driver will be reinitialized
[813.389684] iavf 0000:5e:01.0: Failed to communicate with PF; waiting before retry
[818.635918] iavf 0000:5e:01.0: Hardware came out of reset. Attempting reinit.
[818.766273] iavf 0000:5e:01.0: Multiqueue Enabled: Queue pair count = 16
Fix it by scheduling the reset task and making the reset task capable of
resetting early in the init cycle.
Fixes:
ef8693eb90ae3 ("i40evf: refactor reset handling")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Robert Malz [Tue, 20 May 2025 08:31:52 +0000 (10:31 +0200)]
i40e: retry VFLR handling if there is ongoing VF reset
When a VFLR interrupt is received during a VF reset initiated from a
different source, the VFLR may be not fully handled. This can
leave the VF in an undefined state.
To address this, set the I40E_VFLR_EVENT_PENDING bit again during VFLR
handling if the reset is not yet complete. This ensures the driver
will properly complete the VF reset in such scenarios.
Fixes:
52424f974bc5 ("i40e: Fix VF hang when reset is triggered on another VF")
Signed-off-by: Robert Malz <robert.malz@canonical.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Robert Malz [Tue, 20 May 2025 08:31:51 +0000 (10:31 +0200)]
i40e: return false from i40e_reset_vf if reset is in progress
The function i40e_vc_reset_vf attempts, up to 20 times, to handle a
VF reset request, using the return value of i40e_reset_vf as an indicator
of whether the reset was successfully triggered. Currently, i40e_reset_vf
always returns true, which causes new reset requests to be ignored if a
different VF reset is already in progress.
This patch updates the return value of i40e_reset_vf to reflect when
another VF reset is in progress, allowing the caller to properly use
the retry mechanism.
Fixes:
52424f974bc5 ("i40e: Fix VF hang when reset is triggered on another VF")
Signed-off-by: Robert Malz <robert.malz@canonical.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jakub Kicinski [Mon, 9 Jun 2025 22:47:29 +0000 (15:47 -0700)]
Merge tag 'for-net-2025-06-05' of git://git./linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- MGMT: Fix UAF on mgmt_remove_adv_monitor_complete
- MGMT: Protect mgmt_pending list with its own lock
- hci_core: fix list_for_each_entry_rcu usage
- btintel_pcie: Increase the tx and rx descriptor count
- btintel_pcie: Reduce driver buffer posting to prevent race condition
- btintel_pcie: Fix driver not posting maximum rx buffers
* tag 'for-net-2025-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: MGMT: Protect mgmt_pending list with its own lock
Bluetooth: MGMT: Fix UAF on mgmt_remove_adv_monitor_complete
Bluetooth: btintel_pcie: Reduce driver buffer posting to prevent race condition
Bluetooth: btintel_pcie: Increase the tx and rx descriptor count
Bluetooth: btintel_pcie: Fix driver not posting maximum rx buffers
Bluetooth: hci_core: fix list_for_each_entry_rcu usage
====================
Link: https://patch.msgid.link/20250605191136.904411-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Fri, 6 Jun 2025 16:51:27 +0000 (16:51 +0000)]
net_sched: sch_sfq: fix a potential crash on gso_skb handling
SFQ has an assumption of always being able to queue at least one packet.
However, after the blamed commit, sch->q.len can be inflated by packets
in sch->gso_skb, and an enqueue() on an empty SFQ qdisc can be followed
by an immediate drop.
Fix sfq_drop() to properly clear q->tail in this situation.
Tested:
ip netns add lb
ip link add dev to-lb type veth peer name in-lb netns lb
ethtool -K to-lb tso off # force qdisc to requeue gso_skb
ip netns exec lb ethtool -K in-lb gro on # enable NAPI
ip link set dev to-lb up
ip -netns lb link set dev in-lb up
ip addr add dev to-lb 192.168.20.1/24
ip -netns lb addr add dev in-lb 192.168.20.2/24
tc qdisc replace dev to-lb root sfq limit 100
ip netns exec lb netserver
netperf -H 192.168.20.2 -l 100 &
netperf -H 192.168.20.2 -l 100 &
netperf -H 192.168.20.2 -l 100 &
netperf -H 192.168.20.2 -l 100 &
Fixes:
a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback")
Reported-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Closes: https://lore.kernel.org/netdev/
9da42688-bfaa-4364-8797-
e9271f3bdaef@hetzner-cloud.de/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250606165127.3629486-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wei Fang [Thu, 5 Jun 2025 06:08:36 +0000 (14:08 +0800)]
net: enetc: fix the netc-lib driver build dependency
The kernel robot reported the following errors when the netc-lib driver
was compiled as a loadable module and the enetc-core driver was built-in.
ld.lld: error: undefined symbol: ntmp_init_cbdr
referenced by enetc_cbdr.c:88 (drivers/net/ethernet/freescale/enetc/enetc_cbdr.c:88)
ld.lld: error: undefined symbol: ntmp_free_cbdr
referenced by enetc_cbdr.c:96 (drivers/net/ethernet/freescale/enetc/enetc_cbdr.c:96)
Simply changing "tristate" to "bool" can fix this issue, but considering
that the netc-lib driver needs to support being compiled as a loadable
module and LS1028 does not need the netc-lib driver. Therefore, we add a
boolean symbol 'NXP_NTMP' to enable 'NXP_NETC_LIB' as needed. And when
adding NETC switch driver support in the future, there is no need to
modify the dependency, just select "NXP_NTMP" and "NXP_NETC_LIB" at the
same time.
Reported-by: Arnd Bergmann <arnd@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/
202505220734.x6TF6oHR-lkp@intel.com/
Fixes:
4701073c3deb ("net: enetc: add initial netc-lib driver to support NTMP")
Suggested-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeongjun Park [Tue, 20 May 2025 16:07:17 +0000 (01:07 +0900)]
ptp: remove ptp->n_vclocks check logic in ptp_vclock_in_use()
There is no disagreement that we should check both ptp->is_virtual_clock
and ptp->n_vclocks to check if the ptp virtual clock is in use.
However, when we acquire ptp->n_vclocks_mux to read ptp->n_vclocks in
ptp_vclock_in_use(), we observe a recursive lock in the call trace
starting from n_vclocks_store().
============================================
WARNING: possible recursive locking detected
6.15.0-rc6 #1 Not tainted
--------------------------------------------
syz.0.1540/13807 is trying to acquire lock:
ffff888035a24868 (&ptp->n_vclocks_mux){+.+.}-{4:4}, at:
ptp_vclock_in_use drivers/ptp/ptp_private.h:103 [inline]
ffff888035a24868 (&ptp->n_vclocks_mux){+.+.}-{4:4}, at:
ptp_clock_unregister+0x21/0x250 drivers/ptp/ptp_clock.c:415
but task is already holding lock:
ffff888030704868 (&ptp->n_vclocks_mux){+.+.}-{4:4}, at:
n_vclocks_store+0xf1/0x6d0 drivers/ptp/ptp_sysfs.c:215
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&ptp->n_vclocks_mux);
lock(&ptp->n_vclocks_mux);
*** DEADLOCK ***
....
============================================
The best way to solve this is to remove the logic that checks
ptp->n_vclocks in ptp_vclock_in_use().
The reason why this is appropriate is that any path that uses
ptp->n_vclocks must unconditionally check if ptp->n_vclocks is greater
than 0 before unregistering vclocks, and all functions are already
written this way. And in the function that uses ptp->n_vclocks, we
already get ptp->n_vclocks_mux before unregistering vclocks.
Therefore, we need to remove the redundant check for ptp->n_vclocks in
ptp_vclock_in_use() to prevent recursive locking.
Fixes:
73f37068d540 ("ptp: support ptp physical/virtual clocks conversion")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://patch.msgid.link/20250520160717.7350-1-aha310510@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonas Gorski [Mon, 2 Jun 2025 19:49:14 +0000 (21:49 +0200)]
net: dsa: b53: fix untagged traffic sent via cpu tagged with VID 0
When Linux sends out untagged traffic from a port, it will enter the CPU
port without any VLAN tag, even if the port is a member of a vlan
filtering bridge with a PVID egress untagged VLAN.
This makes the CPU port's PVID take effect, and the PVID's VLAN
table entry controls if the packet will be tagged on egress.
Since commit
45e9d59d3950 ("net: dsa: b53: do not allow to configure
VLAN 0") we remove bridged ports from VLAN 0 when joining or leaving a
VLAN aware bridge. But we also clear the untagged bit, causing untagged
traffic from the controller to become tagged with VID 0 (and priority
0).
Fix this by not touching the untagged map of VLAN 0. Additionally,
always keep the CPU port as a member, as the untag map is only effective
as long as there is at least one member, and we would remove it when
bridging all ports and leaving no standalone ports.
Since Linux (and the switch) treats VLAN 0 tagged traffic like untagged,
the actual impact of this is rather low, but this also prevented earlier
detection of the issue.
Fixes:
45e9d59d3950 ("net: dsa: b53: do not allow to configure VLAN 0")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/20250602194914.1011890-1-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 5 Jun 2025 19:34:55 +0000 (12:34 -0700)]
Merge tag 'net-6.16-rc1' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from CAN, wireless, Bluetooth, and Netfilter.
Current release - regressions:
- Revert "kunit: configs: Enable CONFIG_INIT_STACK_ALL_PATTERN in
all_tests", makes kunit error out if compiler is old
- wifi: iwlwifi: mvm: fix assert on suspend
- rxrpc: fix return from none_validate_challenge()
Current release - new code bugs:
- ovpn: couple of fixes for socket cleanup and UDP-tunnel teardown
- can: kvaser_pciefd: refine error prone echo_skb_max handling logic
- fix net_devmem_bind_dmabuf() stub when DEVMEM not compiled
- eth: airoha: fixes for config / accel in bridge mode
Previous releases - regressions:
- Bluetooth: hci_qca: move the SoC type check to the right place, fix
GPIO integration
- prevent a NULL deref in rtnl_create_link() after locking changes
- fix udp gso skb_segment after pull from frag_list
- hv_netvsc: fix potential deadlock in netvsc_vf_setxdp()
Previous releases - always broken:
- netfilter:
- nf_nat: also check reverse tuple to obtain clashing entry
- nf_set_pipapo_avx2: fix initial map fill (zeroing)
- fix the helper for incremental update of packet checksums after
modifying the IP address, used by ILA and BPF
- eth:
- stmmac: prevent div by 0 when clock rate is misconfigured
- ice: fix Tx scheduler handling of XDP and changing queue count
- eth: fix support for the RGMII interface when delays configured"
* tag 'net-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
calipso: unlock rcu before returning -EAFNOSUPPORT
seg6: Fix validation of nexthop addresses
net: prevent a NULL deref in rtnl_create_link()
net: annotate data-races around cleanup_net_task
selftests: drv-net: tso: make bkg() wait for socat to quit
selftests: drv-net: tso: fix the GRE device name
selftests: drv-net: add configs for the TSO test
wireguard: device: enable threaded NAPI
netlink: specs: rt-link: decode ip6gre
netlink: specs: rt-link: add missing byte-order properties
net: wwan: mhi_wwan_mbim: use correct mux_id for multiplexing
wifi: cfg80211/mac80211: correctly parse S1G beacon optional elements
net: dsa: b53: do not touch DLL_IQQD on bcm53115
net: dsa: b53: allow RGMII for bcm63xx RGMII ports
net: dsa: b53: do not configure bcm63xx's IMP port interface
net: dsa: b53: do not enable RGMII delay on bcm63xx
net: dsa: b53: do not enable EEE on bcm63xx
net: ti: icssg-prueth: Fix swapped TX stats for MII interfaces.
selftests: netfilter: nft_nat.sh: add test for reverse clash with nat
netfilter: nf_nat: also check reverse tuple to obtain clashing entry
...
Eric Biggers [Thu, 5 Jun 2025 17:11:56 +0000 (10:11 -0700)]
MAINTAINERS: add entry for crypto library
I am volunteering to maintain the kernel's crypto library code.
[ And Jason and Ard piped up too - Linus ]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Luiz Augusto von Dentz [Tue, 20 May 2025 19:42:21 +0000 (15:42 -0400)]
Bluetooth: MGMT: Protect mgmt_pending list with its own lock
This uses a mutex to protect from concurrent access of mgmt_pending
list which can cause crashes like:
==================================================================
BUG: KASAN: slab-use-after-free in hci_sock_get_channel+0x60/0x68 net/bluetooth/hci_sock.c:91
Read of size 2 at addr
ffff0000c48885b2 by task syz.4.334/7318
CPU: 0 UID: 0 PID: 7318 Comm: syz.4.334 Not tainted
6.15.0-rc7-syzkaller-g187899f4124a #0 PREEMPT
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call trace:
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:466 (C)
__dump_stack+0x30/0x40 lib/dump_stack.c:94
dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120
print_address_description+0xa8/0x254 mm/kasan/report.c:408
print_report+0x68/0x84 mm/kasan/report.c:521
kasan_report+0xb0/0x110 mm/kasan/report.c:634
__asan_report_load2_noabort+0x20/0x2c mm/kasan/report_generic.c:379
hci_sock_get_channel+0x60/0x68 net/bluetooth/hci_sock.c:91
mgmt_pending_find+0x7c/0x140 net/bluetooth/mgmt_util.c:223
pending_find net/bluetooth/mgmt.c:947 [inline]
remove_adv_monitor+0x44/0x1a4 net/bluetooth/mgmt.c:5445
hci_mgmt_cmd+0x780/0xc00 net/bluetooth/hci_sock.c:1712
hci_sock_sendmsg+0x544/0xbb0 net/bluetooth/hci_sock.c:1832
sock_sendmsg_nosec net/socket.c:712 [inline]
__sock_sendmsg net/socket.c:727 [inline]
sock_write_iter+0x25c/0x378 net/socket.c:1131
new_sync_write fs/read_write.c:591 [inline]
vfs_write+0x62c/0x97c fs/read_write.c:684
ksys_write+0x120/0x210 fs/read_write.c:736
__do_sys_write fs/read_write.c:747 [inline]
__se_sys_write fs/read_write.c:744 [inline]
__arm64_sys_write+0x7c/0x90 fs/read_write.c:744
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x58/0x17c arch/arm64/kernel/entry-common.c:767
el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
Allocated by task 7037:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x40/0x78 mm/kasan/common.c:68
kasan_save_alloc_info+0x44/0x54 mm/kasan/generic.c:562
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0x9c/0xb4 mm/kasan/common.c:394
kasan_kmalloc include/linux/kasan.h:260 [inline]
__do_kmalloc_node mm/slub.c:4327 [inline]
__kmalloc_noprof+0x2fc/0x4c8 mm/slub.c:4339
kmalloc_noprof include/linux/slab.h:909 [inline]
sk_prot_alloc+0xc4/0x1f0 net/core/sock.c:2198
sk_alloc+0x44/0x3ac net/core/sock.c:2254
bt_sock_alloc+0x4c/0x300 net/bluetooth/af_bluetooth.c:148
hci_sock_create+0xa8/0x194 net/bluetooth/hci_sock.c:2202
bt_sock_create+0x14c/0x24c net/bluetooth/af_bluetooth.c:132
__sock_create+0x43c/0x91c net/socket.c:1541
sock_create net/socket.c:1599 [inline]
__sys_socket_create net/socket.c:1636 [inline]
__sys_socket+0xd4/0x1c0 net/socket.c:1683
__do_sys_socket net/socket.c:1697 [inline]
__se_sys_socket net/socket.c:1695 [inline]
__arm64_sys_socket+0x7c/0x94 net/socket.c:1695
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x58/0x17c arch/arm64/kernel/entry-common.c:767
el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
Freed by task 6607:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x40/0x78 mm/kasan/common.c:68
kasan_save_free_info+0x58/0x70 mm/kasan/generic.c:576
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x68/0x88 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:233 [inline]
slab_free_hook mm/slub.c:2380 [inline]
slab_free mm/slub.c:4642 [inline]
kfree+0x17c/0x474 mm/slub.c:4841
sk_prot_free net/core/sock.c:2237 [inline]
__sk_destruct+0x4f4/0x760 net/core/sock.c:2332
sk_destruct net/core/sock.c:2360 [inline]
__sk_free+0x320/0x430 net/core/sock.c:2371
sk_free+0x60/0xc8 net/core/sock.c:2382
sock_put include/net/sock.h:1944 [inline]
mgmt_pending_free+0x88/0x118 net/bluetooth/mgmt_util.c:290
mgmt_pending_remove+0xec/0x104 net/bluetooth/mgmt_util.c:298
mgmt_set_powered_complete+0x418/0x5cc net/bluetooth/mgmt.c:1355
hci_cmd_sync_work+0x204/0x33c net/bluetooth/hci_sync.c:334
process_one_work+0x7e8/0x156c kernel/workqueue.c:3238
process_scheduled_works kernel/workqueue.c:3319 [inline]
worker_thread+0x958/0xed8 kernel/workqueue.c:3400
kthread+0x5fc/0x75c kernel/kthread.c:464
ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:847
Fixes:
a380b6cff1a2 ("Bluetooth: Add generic mgmt helper API")
Closes: https://syzkaller.appspot.com/bug?extid=
0a7039d5d9986ff4ecec
Closes: https://syzkaller.appspot.com/bug?extid=
cc0cc52e7f43dc9e6df1
Reported-by: syzbot+0a7039d5d9986ff4ecec@syzkaller.appspotmail.com
Tested-by: syzbot+0a7039d5d9986ff4ecec@syzkaller.appspotmail.com
Tested-by: syzbot+cc0cc52e7f43dc9e6df1@syzkaller.appspotmail.com
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Luiz Augusto von Dentz [Tue, 3 Jun 2025 20:12:39 +0000 (16:12 -0400)]
Bluetooth: MGMT: Fix UAF on mgmt_remove_adv_monitor_complete
This reworks MGMT_OP_REMOVE_ADV_MONITOR to not use mgmt_pending_add to
avoid crashes like bellow:
==================================================================
BUG: KASAN: slab-use-after-free in mgmt_remove_adv_monitor_complete+0xe5/0x540 net/bluetooth/mgmt.c:5406
Read of size 8 at addr
ffff88801c53f318 by task kworker/u5:5/5341
CPU: 0 UID: 0 PID: 5341 Comm: kworker/u5:5 Not tainted
6.15.0-syzkaller-10402-g4cb6c8af8591 #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:408 [inline]
print_report+0xd2/0x2b0 mm/kasan/report.c:521
kasan_report+0x118/0x150 mm/kasan/report.c:634
mgmt_remove_adv_monitor_complete+0xe5/0x540 net/bluetooth/mgmt.c:5406
hci_cmd_sync_work+0x261/0x3a0 net/bluetooth/hci_sync.c:334
process_one_work kernel/workqueue.c:3238 [inline]
process_scheduled_works+0xade/0x17b0 kernel/workqueue.c:3321
worker_thread+0x8a0/0xda0 kernel/workqueue.c:3402
kthread+0x711/0x8a0 kernel/kthread.c:464
ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
Allocated by task 5987:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:394
kasan_kmalloc include/linux/kasan.h:260 [inline]
__kmalloc_cache_noprof+0x230/0x3d0 mm/slub.c:4358
kmalloc_noprof include/linux/slab.h:905 [inline]
kzalloc_noprof include/linux/slab.h:1039 [inline]
mgmt_pending_new+0x65/0x240 net/bluetooth/mgmt_util.c:252
mgmt_pending_add+0x34/0x120 net/bluetooth/mgmt_util.c:279
remove_adv_monitor+0x103/0x1b0 net/bluetooth/mgmt.c:5454
hci_mgmt_cmd+0x9c9/0xef0 net/bluetooth/hci_sock.c:1719
hci_sock_sendmsg+0x6ca/0xef0 net/bluetooth/hci_sock.c:1839
sock_sendmsg_nosec net/socket.c:712 [inline]
__sock_sendmsg+0x219/0x270 net/socket.c:727
sock_write_iter+0x258/0x330 net/socket.c:1131
new_sync_write fs/read_write.c:593 [inline]
vfs_write+0x548/0xa90 fs/read_write.c:686
ksys_write+0x145/0x250 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 5989:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:576
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x62/0x70 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:233 [inline]
slab_free_hook mm/slub.c:2380 [inline]
slab_free mm/slub.c:4642 [inline]
kfree+0x18e/0x440 mm/slub.c:4841
mgmt_pending_foreach+0xc9/0x120 net/bluetooth/mgmt_util.c:242
mgmt_index_removed+0x10d/0x2f0 net/bluetooth/mgmt.c:9366
hci_sock_bind+0xbe9/0x1000 net/bluetooth/hci_sock.c:1314
__sys_bind_socket net/socket.c:1810 [inline]
__sys_bind+0x2c3/0x3e0 net/socket.c:1841
__do_sys_bind net/socket.c:1846 [inline]
__se_sys_bind net/socket.c:1844 [inline]
__x64_sys_bind+0x7a/0x90 net/socket.c:1844
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes:
66bd095ab5d4 ("Bluetooth: advmon offload MSFT remove monitor")
Closes: https://syzkaller.appspot.com/bug?extid=
feb0dc579bbe30a13190
Reported-by: syzbot+feb0dc579bbe30a13190@syzkaller.appspotmail.com
Tested-by: syzbot+feb0dc579bbe30a13190@syzkaller.appspotmail.com
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Chandrashekar Devegowda [Tue, 3 Jun 2025 10:04:40 +0000 (15:34 +0530)]
Bluetooth: btintel_pcie: Reduce driver buffer posting to prevent race condition
Modify the driver to post 3 fewer buffers than the maximum rx buffers
(64) allowed for the firmware. This change mitigates a hardware issue
causing a race condition in the firmware, improving stability and data
handling.
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Signed-off-by: Kiran K <kiran.k@intel.com>
Fixes:
c2b636b3f788 ("Bluetooth: btintel_pcie: Add support for PCIe transport")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Chandrashekar Devegowda [Tue, 3 Jun 2025 10:04:39 +0000 (15:34 +0530)]
Bluetooth: btintel_pcie: Increase the tx and rx descriptor count
This change addresses latency issues observed in HID use cases where
events arrive in bursts. By increasing the Rx descriptor count to 64,
the firmware can handle bursty data more effectively, reducing latency
and preventing buffer overflows.
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Signed-off-by: Kiran K <kiran.k@intel.com>
Fixes:
c2b636b3f788 ("Bluetooth: btintel_pcie: Add support for PCIe transport")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Kiran K [Tue, 3 Jun 2025 10:04:38 +0000 (15:34 +0530)]
Bluetooth: btintel_pcie: Fix driver not posting maximum rx buffers
The driver was posting only 6 rx buffers, despite the maximum rx buffers
being defined as 16. Having fewer RX buffers caused firmware exceptions
in HID use cases when events arrived in bursts.
Exception seen on android 6.12 kernel.
E Bluetooth: hci0: Received hw exception interrupt
E Bluetooth: hci0: Received gp1 mailbox interrupt
D Bluetooth: hci0:
00000000: ff 3e 87 80 03 01 01 01 03 01 0c 0d 02 1c 10 0e
D Bluetooth: hci0:
00000010: 01 00 05 14 66 b0 28 b0 c0 b0 28 b0 ac af 28 b0
D Bluetooth: hci0:
00000020: 14 f1 28 b0 00 00 00 00 fa 04 00 00 00 00 40 10
D Bluetooth: hci0:
00000030: 08 00 00 00 7a 7a 7a 7a 47 00 fb a0 10 00 00 00
D Bluetooth: hci0:
00000000: 10 01 0a
E Bluetooth: hci0: ---- Dump of debug registers —
E Bluetooth: hci0: boot stage: 0xe0fb0047
E Bluetooth: hci0: ipc status: 0x00000004
E Bluetooth: hci0: ipc control: 0x00000000
E Bluetooth: hci0: ipc sleep control: 0x00000000
E Bluetooth: hci0: mbox_1: 0x00badbad
E Bluetooth: hci0: mbox_2: 0x0000101c
E Bluetooth: hci0: mbox_3: 0x00000008
E Bluetooth: hci0: mbox_4: 0x7a7a7a7a
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Signed-off-by: Kiran K <kiran.k@intel.com>
Fixes:
c2b636b3f788 ("Bluetooth: btintel_pcie: Add support for PCIe transport")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Pauli Virtanen [Sat, 31 May 2025 15:24:58 +0000 (18:24 +0300)]
Bluetooth: hci_core: fix list_for_each_entry_rcu usage
Releasing + re-acquiring RCU lock inside list_for_each_entry_rcu() loop
body is not correct.
Fix by taking the update-side hdev->lock instead.
Fixes:
c7eaf80bfb0c ("Bluetooth: Fix hci_link_tx_to RCU lock usage")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Linus Torvalds [Thu, 5 Jun 2025 18:45:33 +0000 (11:45 -0700)]
Merge tag 'uml-for-linux-6.16-rc1' of git://git./linux/kernel/git/uml/linux
Pull UML updates from Johannes Berg:
"The only really new thing is the long-standing seccomp work
(originally from 2021!). Wven if it still isn't enabled by default due
to security concerns it can still be used e.g. for tests.
- remove obsolete network transports
- remove PCI IO port support
- start adding seccomp-based process handling instead of ptrace"
* tag 'uml-for-linux-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (29 commits)
um: remove "extern" from implementation of sigchld_handler
um: fix unused variable warning
um: fix SECCOMP 32bit xstate register restore
um: pass FD for memory operations when needed
um: Add SECCOMP support detection and initialization
um: Implement kernel side of SECCOMP based process handling
um: Track userspace children dying in SECCOMP mode
um: Add helper functions to get/set state for SECCOMP
um: Add stub side of SECCOMP/futex based process handling
um: Move faultinfo extraction into userspace routine
um: vector: Use mac_pton() for MAC address parsing
um: vector: Clean up and modernize log messages
um: chan_kern: use raw spinlock for irqs_to_free_lock
MAINTAINERS: remove obsolete file entry in TUN/TAP DRIVER
um: Fix tgkill compile error on old host OSes
um: stop using PCI port I/O
um: Remove legacy network transport infrastructure
um: vector: Eliminate the dependency on uml_net
um: Remove obsolete legacy network transports
um/asm: Replace "REP; NOP" with PAUSE mnemonic
...
Linus Torvalds [Thu, 5 Jun 2025 18:39:17 +0000 (11:39 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"We've got a couple of build fixes when using LLD, a missing TLB
invalidation and a workaround for broken firmware on SoCs with CPUs
that implement MPAM:
- Disable problematic linker assertions for broken versions of LLD
- Work around sporadic link failure with LLD and various randconfig
builds
- Fix missing invalidation in the TLB batching code when reclaim
races with mprotect() and friends
- Add a command-line override for MPAM to allow booting on systems
with broken firmware"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Add override for MPAM
arm64/mm: Close theoretical race where stale TLB entry remains valid
arm64: Work around convergence issue with LLD linker
arm64: Disable LLD linker ASSERT()s for the time being
Linus Torvalds [Thu, 5 Jun 2025 18:33:09 +0000 (11:33 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rmk/linux
Pull ARM fixes from Russell King:
- Fix arch_memremap_can_ram_remap() which incorrectly passed a PFN to
memblock_is_map_memory rather than the actual address.
- Disallow kernel mode NEON when IRQs are disabled
Explanation:
"To avoid having to preserve/restore kernel mode NEON state when
such a softirq is taken softirqs are now disabled when using the
NEON from task context."
should explain that it's nested kernel mode.
In other words, softirqs from user mode are fine, because the context
will be preserved. softirqs from kernel mode may be from a context
that has already saved the user NEON state, and thus we would need to
preserve the NEON state for the parent kernel mode context, and this
we don't allow.
The problem occurs when the kernel context disables hard IRQs, and
then uses NEON. When it's finished, and restores the userspace NEON
state, we call local_bh_enable() with hard IRQs disabled, which
causes a warning.
This commit addresses that by disallowing the use of NEON with hard
IRQs disabled.
https://lore.kernel.org/all/
20250516231858.27899-4-ebiggers@kernel.org/T/#m104841b6e9346b1814c8b0fb9f2340551b0cd3e8
has some further context
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9446/1: Disallow kernel mode NEON when IRQs are disabled
ARM: 9447/1: arm/memremap: fix arch_memremap_can_ram_remap()
Linus Torvalds [Thu, 5 Jun 2025 15:54:47 +0000 (08:54 -0700)]
Merge tag 'rtc-6.16' of git://git./linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"There are two new drivers this cycle. There is also support for a
negative offset for RTCs that have been shipped with a date set using
an epoch that is before 1970. This unfortunately happens with some
products that ship with a vendor kernel and an out of tree driver.
Core:
- support negative offsets for RTCs that have shipped with an epoch
earlier than 1970
New drivers:
- NXP S32G2/S32G3
- Sophgo CV1800
Drivers:
- loongson: fix missing alarm notifications for ACPI
- m41t80: kickstart ocillator upon failure
- mt6359: mt6357 support
- pcf8563: fix wrong alarm register
- sh: cleanups"
* tag 'rtc-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (39 commits)
rtc: mt6359: Add mt6357 support
rtc: test: Test date conversion for dates starting in 1900
rtc: test: Also test time and wday outcome of rtc_time64_to_tm()
rtc: test: Emit the seconds-since-1970 value instead of days-since-1970
rtc: Fix offset calculation for .start_secs < 0
rtc: Make rtc_time64_to_tm() support dates before 1970
rtc: pcf8563: fix wrong alarm register
rtc: rzn1: support input frequencies other than 32768Hz
rtc: rzn1: Disable controller before initialization
dt-bindings: rtc: rzn1: add optional second clock
rtc: m41t80: reduce verbosity
rtc: m41t80: kickstart ocillator upon failure
rtc: s32g: add NXP S32G2/S32G3 SoC support
dt-bindings: rtc: add schema for NXP S32G2/S32G3 SoCs
dt-bindings: at91rm9260-rtt: add microchip,sama7d65-rtt
dt-bindings: rtc: at91rm9200: add microchip,sama7d65-rtc
rtc: loongson: Add missing alarm notifications for ACPI RTC events
rtc: sophgo: add rtc support for Sophgo CV1800 SoC
rtc: stm32: drop unused module alias
rtc: s3c: drop unused module alias
...
Linus Torvalds [Thu, 5 Jun 2025 15:49:30 +0000 (08:49 -0700)]
Merge tag 'dmaengine-6.16-rc1' of git://git./linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
"A fairly small update for the dmaengine subsystem. This has a new ARM
dmaengine driver and couple of new device support and few driver
changes:
New support:
- Renesas RZ/V2H(P) dma support for r9a09g057
- Arm DMA-350 driver
- Tegra Tegra264 ADMA support
Updates:
- AMD ptdma driver code removal and optimizations
- Freescale edma error interrupt handler support"
* tag 'dmaengine-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (27 commits)
dmaengine: idxd: Remove unused pointer and macro
arm64: dts: renesas: r9a09g057: Add DMAC nodes
dmaengine: sh: rz-dmac: Add RZ/V2H(P) support
dmaengine: sh: rz-dmac: Allow for multiple DMACs
irqchip/renesas-rzv2h: Add rzv2h_icu_register_dma_req()
dt-bindings: dma: rz-dmac: Document RZ/V2H(P) family of SoCs
dt-bindings: dma: rz-dmac: Restrict properties for RZ/A1H
dmaengine: idxd: Narrow the restriction on BATCH to ver. 1 only
dmaengine: ti: Add NULL check in udma_probe()
fsldma: Set correct dma_mask based on hw capability
dmaengine: idxd: Check availability of workqueue allocated by idxd wq driver before using
dmaengine: xilinx_dma: Set dma_device directions
dmaengine: tegra210-adma: Add Tegra264 support
dt-bindings: Document Tegra264 ADMA support
dmaengine: dw-edma: Add HDMA NATIVE map check
dmaegnine: fsl-edma: add edma error interrupt handler
dt-bindings: dma: fsl-edma: increase maxItems of interrupts and interrupt-names
dmaengine: ARM_DMA350 should depend on ARM/ARM64
dt-bindings: dma: qcom,bam: Document dma-coherent property
dmaengine: Add Arm DMA-350 driver
...
Linus Torvalds [Thu, 5 Jun 2025 15:20:21 +0000 (08:20 -0700)]
Merge tag 'phy-for-6.16' of git://git./linux/kernel/git/phy/linux-phy
Pull phy updates from Vinod Koul:
"As usual featuring couple of new driver and bunch of new device
support and some driver changes to Freescale, rockchip driver along
with couple of yaml binding conversions.
New Support:
- Qualcomm IPQ5424 qusb2 support, IPQ5018 uniphy-pcie driver
- Rockchip usb2 support for RK3562, RK3036 usb2 phy support
- Samsung exynos2200 eusb2 phy support and driver refactoring for
this support, exynos7870 USBDRD support
- Mediatek MT7988 xs-phy support
- Broadcom BCM74110 usb phy support
- Renesas RZ/V2H(P) usb2 phy support
Updates:
- Freescale phy rate claculation updates, i.MX95 tuning support
- Better error handling for amlogic pcie phy
- Rockchip color depth configuration and management support
- Yaml binding conversion for RK3399 Type-C and PCIe Phy"
* tag 'phy-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (77 commits)
phy: tegra: p2u: Broaden architecture dependency
phy: rockchip: inno-usb2: Add usb2 phy support for rk3562
dt-bindings: phy: rockchip,inno-usb2phy: add rk3562
phy: rockchip: inno-usb2: add phy definition for rk3036
dt-bindings: phy: rockchip,inno-usb2phy: add rk3036 compatible
phy: freescale: fsl-samsung-hdmi: Improve LUT search for best clock
phy: freescale: fsl-samsung-hdmi: Refactor finding PHY settings
phy: freescale: fsl-samsung-hdmi: Rename phy_clk_round_rate
phy: renesas: phy-rcar-gen3-usb2: Add USB2.0 PHY support for RZ/V2H(P)
phy: renesas: phy-rcar-gen3-usb2: Sort compatible entries by SoC part number
dt-bindings: phy: renesas,usb2-phy: Document RZ/V2H(P) SoC
dt-bindings: phy: renesas,usb2-phy: Add clock constraint for RZ/G2L family
phy: exynos5-usbdrd: support Exynos USBDRD 3.2 4nm controller
phy: phy-snps-eusb2: add support for exynos2200
phy: phy-snps-eusb2: refactor reference clock init
phy: phy-snps-eusb2: make reset control optional
phy: phy-snps-eusb2: make repeater optional
phy: phy-snps-eusb2: split phy init code
phy: phy-snps-eusb2: refactor constructs names
phy: move phy-qcom-snps-eusb2 out of its vendor sub-directory
...
Linus Torvalds [Thu, 5 Jun 2025 15:07:24 +0000 (08:07 -0700)]
Merge tag 'soundwire-6.16-rc1' of git://git./linux/kernel/git/vkoul/soundwire
Pull soundwire updates from Vinod Koul:
"A couple of small core changes and an Intel driver change:
- sdw_assign_device_num() logic simplification, using internal slave
id for irqs and optimizing computing of port params in specific
stream states
- Intel driver updates for ACE3+ microphone privacy status reporting
and enabling the status in HDA Intel driver"
* tag 'soundwire-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: only compute port params in specific stream states
ASoC: SOF: Intel: hda: Set the mic_privacy flag for soundwire with ACE3+
soundwire: intel: Add awareness of ACE3+ microphone privacy
soundwire: bus: Add internal slave ID and use for IRQs
soundwire: bus: Simplify sdw_assign_device_num()
Eric Dumazet [Wed, 4 Jun 2025 13:38:26 +0000 (13:38 +0000)]
calipso: unlock rcu before returning -EAFNOSUPPORT
syzbot reported that a recent patch forgot to unlock rcu
in the error path.
Adopt the convention that netlbl_conn_setattr() is already using.
Fixes:
6e9f2df1c550 ("calipso: Don't call calipso functions for AF_INET sk.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://patch.msgid.link/20250604133826.1667664-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Wed, 4 Jun 2025 11:32:52 +0000 (14:32 +0300)]
seg6: Fix validation of nexthop addresses
The kernel currently validates that the length of the provided nexthop
address does not exceed the specified length. This can lead to the
kernel reading uninitialized memory if user space provided a shorter
length than the specified one.
Fix by validating that the provided length exactly matches the specified
one.
Fixes:
d1df6fd8a1d2 ("ipv6: sr: define core operations for seg6local lightweight tunnel")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250604113252.371528-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 4 Jun 2025 10:58:15 +0000 (10:58 +0000)]
net: prevent a NULL deref in rtnl_create_link()
At the time rtnl_create_link() is running, dev->netdev_ops is NULL,
we must not use netdev_lock_ops() or risk a NULL deref if
CONFIG_NET_SHAPER is defined.
Use netif_set_group() instead of dev_set_group().
RIP: 0010:netdev_need_ops_lock include/net/netdev_lock.h:33 [inline]
RIP: 0010:netdev_lock_ops include/net/netdev_lock.h:41 [inline]
RIP: 0010:dev_set_group+0xc0/0x230 net/core/dev_api.c:82
Call Trace:
<TASK>
rtnl_create_link+0x748/0xd10 net/core/rtnetlink.c:3674
rtnl_newlink_create+0x25c/0xb00 net/core/rtnetlink.c:3813
__rtnl_newlink net/core/rtnetlink.c:3940 [inline]
rtnl_newlink+0x16d6/0x1c70 net/core/rtnetlink.c:4055
rtnetlink_rcv_msg+0x7cf/0xb70 net/core/rtnetlink.c:6944
netlink_rcv_skb+0x208/0x470 net/netlink/af_netlink.c:2534
netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
netlink_unicast+0x75b/0x8d0 net/netlink/af_netlink.c:1339
netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1883
sock_sendmsg_nosec net/socket.c:712 [inline]
Reported-by: syzbot+9fc858ba0312b42b577e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
6840265f.
a00a0220.d4325.0009.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes:
7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250604105815.1516973-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 4 Jun 2025 09:39:28 +0000 (09:39 +0000)]
net: annotate data-races around cleanup_net_task
from_cleanup_net() reads cleanup_net_task locklessly.
Add READ_ONCE()/WRITE_ONCE() annotations to avoid
a potential KCSAN warning, even if the race is harmless.
Fixes:
0734d7c3d93c ("net: expedite synchronize_net() for cleanup_net()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20250604093928.1323333-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 4 Jun 2025 01:20:55 +0000 (18:20 -0700)]
selftests: drv-net: tso: make bkg() wait for socat to quit
Commit
846742f7e32f ("selftests: drv-net: add a warning for
bkg + shell + terminate") added a warning for bkg() used
with terminate=True. The tso test was missed as we didn't
have it running anywhere in NIPA. Add exit_wait=True, to avoid:
# Warning: combining shell and terminate is risky!
# SIGTERM may not reach the child on zsh/ksh!
getting printed twice for every variant.
Fixes:
0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250604012055.891431-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 4 Jun 2025 01:20:31 +0000 (18:20 -0700)]
selftests: drv-net: tso: fix the GRE device name
The device type for IPv4 GRE is "gre" not "ipgre",
unlike for IPv6 which uses "ip6gre".
Not sure how I missed this when writing the test, perhaps
because all HW I have access to is on an IPv6-only network.
Fixes:
0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250604012031.891242-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 4 Jun 2025 00:16:52 +0000 (17:16 -0700)]
selftests: drv-net: add configs for the TSO test
Add missing config options for the tso.py test, specifically
to make sure the kernel is built with vxlan and gre tunnels.
I noticed this while adding a TSO-capable device QEMU to the CI.
Previously we only run virtio tests and it doesn't report LSO
stats on the QEMU we have.
Fixes:
0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250604001653.853008-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Jun 2025 14:59:31 +0000 (07:59 -0700)]
Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
iavf: get rid of the crit lock
Przemek Kitszel says:
Fix some deadlocks in iavf, and make it less error prone for the future.
Patch 1 is simple and independent from the rest.
Patches 2, 3, 4 are strictly a refactor, but it enables the last patch
to be much smaller.
(Technically Jake given his RB tags not knowing I will send it to -net).
Patch 5 just adds annotations, this also helps prove last patch to be correct.
Patch 6 removes the crit lock, with its unusual try_lock()s.
I have more refactoring for scheduling done for -next, to be sent soon.
There is a simple test:
add VF; decrease number of queueus; remove VF
that was way too hard to pass without this series :)
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: get rid of the crit lock
iavf: sprinkle netdev_assert_locked() annotations
iavf: extract iavf_watchdog_step() out of iavf_watchdog_task()
iavf: simplify watchdog_task in terms of adminq task scheduling
iavf: centralize watchdog requeueing itself
iavf: iavf_suspend(): take RTNL before netdev_lock()
====================
Link: https://patch.msgid.link/20250603171710.2336151-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mirco Barone [Thu, 5 Jun 2025 12:06:16 +0000 (14:06 +0200)]
wireguard: device: enable threaded NAPI
Enable threaded NAPI by default for WireGuard devices in response to low
performance behavior that we observed when multiple tunnels (and thus
multiple wg devices) are deployed on a single host. This affects any
kind of multi-tunnel deployment, regardless of whether the tunnels share
the same endpoints or not (i.e., a VPN concentrator type of gateway
would also be affected).
The problem is caused by the fact that, in case of a traffic surge that
involves multiple tunnels at the same time, the polling of the NAPI
instance of all these wg devices tends to converge onto the same core,
causing underutilization of the CPU and bottlenecking performance.
This happens because NAPI polling is hosted by default in softirq
context, but the WireGuard driver only raises this softirq after the rx
peer queue has been drained, which doesn't happen during high traffic.
In this case, the softirq already active on a core is reused instead of
raising a new one.
As a result, once two or more tunnel softirqs have been scheduled on
the same core, they remain pinned there until the surge ends.
In our experiments, this almost always leads to all tunnel NAPIs being
handled on a single core shortly after a surge begins, limiting
scalability to less than 3× the performance of a single tunnel, despite
plenty of unused CPU cores being available.
The proposed mitigation is to enable threaded NAPI for all WireGuard
devices. This moves the NAPI polling context to a dedicated per-device
kernel thread, allowing the scheduler to balance the load across all
available cores.
On our 32-core gateways, enabling threaded NAPI yields a ~4× performance
improvement with 16 tunnels, increasing throughput from ~13 Gbps to
~48 Gbps. Meanwhile, CPU usage on the receiver (which is the bottleneck)
jumps from 20% to 100%.
We have found no performance regressions in any scenario we tested.
Single-tunnel throughput remains unchanged.
More details are available in our Netdev paper.
Link: https://netdevconf.info/0x18/docs/netdev-0x18-paper23-talk-paper.pdf
Signed-off-by: Mirco Barone <mirco.barone@polito.it>
Fixes:
e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250605120616.2808744-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Thu, 5 Jun 2025 13:19:33 +0000 (15:19 +0200)]
Merge tag 'wireless-2025-06-05' of https://git./linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Couple of quick fixes:
- iwlwifi/iwlmld crash on certain error paths
- iwlwifi/iwlmld regulatory data mixup
- iwlwifi/iwlmld suspend/resume fix
- iwlwifi MSI (without -X) fix
- cfg80211/mac80211 S1G parsing fixes
* tag 'wireless-2025-06-05' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211/mac80211: correctly parse S1G beacon optional elements
wifi: iwlwifi: mld: Move regulatory domain initialization
wifi: iwlwifi: pcie: fix non-MSIX handshake register
wifi: iwlwifi: mld: avoid panic on init failure
wifi: iwlwifi: mvm: fix assert on suspend
====================
Link: https://patch.msgid.link/20250605095443.17874-6-johannes@sipsolutions.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 5 Jun 2025 11:37:02 +0000 (13:37 +0200)]
Merge tag 'nf-25-06-05' of git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Zero out the remainder in nft_pipapo AVX2 implementation, otherwise
next lookup could bogusly report a mismatch. This is followed by two
patches to update nft_pipapo selftests to cover for the previous bug.
From Florian Westphal.
2) Check for reverse tuple too in case of esoteric NAT collisions for
UDP traffic and extend selftest coverage. Also from Florian.
netfilter pull request 25-06-05
* tag 'nf-25-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: netfilter: nft_nat.sh: add test for reverse clash with nat
netfilter: nf_nat: also check reverse tuple to obtain clashing entry
selftests: netfilter: nft_concat_range.sh: add datapath check for map fill bug
selftests: netfilter: nft_concat_range.sh: prefer per element counters for testing
netfilter: nf_set_pipapo_avx2: fix initial map fill
====================
Link: https://patch.msgid.link/20250605085735.52205-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 5 Jun 2025 10:50:12 +0000 (12:50 +0200)]
Merge branch 'netlink-specs-rt-link-decode-ip6gre'
Jakub Kicinski says:
====================
netlink: specs: rt-link: decode ip6gre
Adding GRE tunnels to the .config for driver tests caused
some unhappiness in YNL, as it can't decode all the link
attrs on the system. Add ip6gre support to fix the tests.
This is similar to commit
6ffdbb93a59c ("netlink: specs:
rt_link: decode ip6tnl, vti and vti6 link attrs").
====================
Link: https://patch.msgid.link/20250603135357.502626-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 3 Jun 2025 13:53:57 +0000 (06:53 -0700)]
netlink: specs: rt-link: decode ip6gre
Driver tests now require GRE tunnels, while we don't configure
them with YNL, YNL will complain when it sees link types it
doesn't recognize. Teach it decoding ip6gre tunnels. The attrs
are largely the same as IPv4 GRE.
Correct the type of encap-limit, but note that this attr is
only used in ip6gre, so the mistake didn't matter until now.
Fixes:
0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250603135357.502626-3-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 3 Jun 2025 13:53:56 +0000 (06:53 -0700)]
netlink: specs: rt-link: add missing byte-order properties
A number of fields in the ip tunnels are lacking the big-endian
designation. I suspect this is not intentional, as decoding
the ports with the right endian seems objectively beneficial.
Fixes:
6ffdbb93a59c ("netlink: specs: rt_link: decode ip6tnl, vti and vti6 link attrs")
Fixes:
077b6022d24b ("doc/netlink/specs: Add sub-message type to rt_link family")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250603135357.502626-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 5 Jun 2025 10:37:10 +0000 (12:37 +0200)]
Merge tag 'ovpn-net-
20250603' of https://github.com/OpenVPN/ovpn-net-next
Antonio Quartulli says:
====================
In this batch you can find the following bug fixes:
Patch 1: when releasing a UDP socket we were wrongly invoking
setup_udp_tunnel_sock() with an empty config. This was not
properly shutting down the UDP encap state.
With this patch we simply undo what was done during setup.
Patch 2: ovpn was holding a reference to a 'struct socket'
without increasing its reference counter. This was intended
and worked as expected until we hit a race condition where
user space tries to close the socket while kernel space is
also releasing it. In this case the (struct socket *)->sk
member would disappear under our feet leading to a null-ptr-deref.
This patch fixes this issue by having struct ovpn_socket hold
a reference directly to the sk member while also increasing
its reference counter.
Patch 3: in case of errors along the TCP RX path (softirq)
we want to immediately delete the peer, but this operation may
sleep. With this patch we move the peer deletion to a scheduled
worker.
Patch 4 and 5 are instead fixing minor issues in the ovpn
kselftests.
* tag 'ovpn-net-
20250603' of https://github.com/OpenVPN/ovpn-net-next:
selftest/net/ovpn: fix missing file
selftest/net/ovpn: fix TCP socket creation
ovpn: avoid sleep in atomic context in TCP RX error path
ovpn: ensure sk is still valid during cleanup
ovpn: properly deconfigure UDP-tunnel
====================
Link: https://patch.msgid.link/20250603111110.4575-1-antonio@openvpn.net/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniele Palmas [Tue, 3 Jun 2025 09:12:04 +0000 (11:12 +0200)]
net: wwan: mhi_wwan_mbim: use correct mux_id for multiplexing
Recent Qualcomm chipsets like SDX72/75 require MBIM sessionId mapping
to muxId in the range (0x70-0x8F) for the PCIe tethered use.
This has been partially addressed by the referenced commit, mapping
the default data call to muxId = 112, but the multiplexed data calls
scenario was not properly considered, mapping sessionId = 1 to muxId
1, while it should have been 113.
Fix this by moving the session_id assignment logic to mhi_mbim_newlink,
in order to map sessionId = n to muxId = n + WDS_BIND_MUX_DATA_PORT_MUX_ID.
Fixes:
65bc58c3dcad ("net: wwan: mhi: make default data link id configurable")
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://patch.msgid.link/20250603091204.2802840-1-dnlplm@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Johannes Berg [Thu, 5 Jun 2025 09:33:07 +0000 (11:33 +0200)]
Merge tag 'iwlwifi-fixes-2025-06-04' of https://git./linux/kernel/git/iwlwifi/iwlwifi-next
Miri Korenblit says:
====================
iwlwifi fixes
====================
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Lachlan Hodges [Tue, 3 Jun 2025 05:35:38 +0000 (15:35 +1000)]
wifi: cfg80211/mac80211: correctly parse S1G beacon optional elements
S1G beacons are not traditional beacons but a type of extension frame.
Extension frames contain the frame control and duration fields, followed
by zero or more optional fields before the frame body. These optional
fields are distinct from the variable length elements.
The presence of optional fields is indicated in the frame control field.
To correctly locate the elements offset, the frame control must be parsed
to identify which optional fields are present. Currently, mac80211 parses
S1G beacons based on fixed assumptions about the frame layout, without
inspecting the frame control field. This can result in incorrect offsets
to the "variable" portion of the frame.
Properly parse S1G beacon frames by using the field lengths defined in
IEEE 802.11-2024, section 9.3.4.3, ensuring that the elements offset is
calculated accurately.
Fixes:
9eaffe5078ca ("cfg80211: convert S1G beacon to scan results")
Fixes:
cd418ba63f0c ("mac80211: convert S1G beacon to scan results")
Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com>
Link: https://patch.msgid.link/20250603053538.468562-1-lachlan.hodges@morsemicro.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Benjamin Berg [Thu, 5 Jun 2025 05:03:25 +0000 (07:03 +0200)]
um: remove "extern" from implementation of sigchld_handler
There is no need to mark the function as extern in the implementation.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/
202506051226.X8r7X5aa-lkp@intel.com/
Fixes:
8420e08fe3a5 ("um: Track userspace children dying in SECCOMP mode")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250605050325.1077208-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Benjamin Berg [Thu, 5 Jun 2025 05:03:24 +0000 (07:03 +0200)]
um: fix unused variable warning
The code was updated to access the PID of the userspace stub process in
a different way, making the local cpu variable obsolete. Remove it.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/
202506050008.AwXLNxQX-lkp@intel.com/
Fixes:
406d17c6c370 ("um: Implement kernel side of SECCOMP based process handling")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250605050325.1077208-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Paolo Abeni [Thu, 5 Jun 2025 09:07:36 +0000 (11:07 +0200)]
Merge branch 'net-dsa-b53-fix-rgmii-ports'
Jonas Gorski says:
====================
net: dsa: b53: fix RGMII ports
RGMII ports on BCM63xx were not really working, especially with PHYs
that support EEE and are capable of configuring their own RGMII delays.
So let's make them work, and fix additional minor rgmii related issues
found while working on it.
With a BCM96328BU-P300:
Before:
[ 3.580000] b53-switch
10700000.switch GbE3 (uninitialized): validation of rgmii with support
0000000,
00000000,
00000000,
000062ff and advertisement
0000000,
00000000,
00000000,
000062ff failed: -EINVAL
[ 3.600000] b53-switch
10700000.switch GbE3 (uninitialized): failed to connect to PHY: -EINVAL
[ 3.610000] b53-switch
10700000.switch GbE3 (uninitialized): error -22 setting up PHY for tree 0, switch 0, port 4
[ 3.620000] b53-switch
10700000.switch GbE1 (uninitialized): validation of rgmii with support
0000000,
00000000,
00000000,
000062ff and advertisement
0000000,
00000000,
00000000,
000062ff failed: -EINVAL
[ 3.640000] b53-switch
10700000.switch GbE1 (uninitialized): failed to connect to PHY: -EINVAL
[ 3.650000] b53-switch
10700000.switch GbE1 (uninitialized): error -22 setting up PHY for tree 0, switch 0, port 5
[ 3.660000] b53-switch
10700000.switch GbE4 (uninitialized): validation of rgmii with support
0000000,
00000000,
00000000,
000062ff and advertisement
0000000,
00000000,
00000000,
000062ff failed: -EINVAL
[ 3.680000] b53-switch
10700000.switch GbE4 (uninitialized): failed to connect to PHY: -EINVAL
[ 3.690000] b53-switch
10700000.switch GbE4 (uninitialized): error -22 setting up PHY for tree 0, switch 0, port 6
[ 3.700000] b53-switch
10700000.switch GbE5 (uninitialized): validation of rgmii with support
0000000,
00000000,
00000000,
000062ff and advertisement
0000000,
00000000,
00000000,
000062ff failed: -EINVAL
[ 3.720000] b53-switch
10700000.switch GbE5 (uninitialized): failed to connect to PHY: -EINVAL
[ 3.730000] b53-switch
10700000.switch GbE5 (uninitialized): error -22 setting up PHY for tree 0, switch 0, port 7
After:
[ 3.700000] b53-switch
10700000.switch GbE3 (uninitialized): PHY [mdio_mux-0.1:00] driver [Broadcom BCM54612E] (irq=POLL)
[ 3.770000] b53-switch
10700000.switch GbE1 (uninitialized): PHY [mdio_mux-0.1:01] driver [Broadcom BCM54612E] (irq=POLL)
[ 3.850000] b53-switch
10700000.switch GbE4 (uninitialized): PHY [mdio_mux-0.1:18] driver [Broadcom BCM54612E] (irq=POLL)
[ 3.920000] b53-switch
10700000.switch GbE5 (uninitialized): PHY [mdio_mux-0.1:19] driver [Broadcom BCM54612E] (irq=POLL)
====================
Link: https://patch.msgid.link/20250602193953.1010487-1-jonas.gorski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jonas Gorski [Mon, 2 Jun 2025 19:39:53 +0000 (21:39 +0200)]
net: dsa: b53: do not touch DLL_IQQD on bcm53115
According to OpenMDK, bit 2 of the RGMII register has a different
meaning for BCM53115 [1]:
"DLL_IQQD 1: In the IDDQ mode, power is down0: Normal function
mode"
Configuring RGMII delay works without setting this bit, so let's keep it
at the default. For other chips, we always set it, so not clearing it
is not an issue.
One would assume BCM53118 works the same, but OpenMDK is not quite sure
what this bit actually means [2]:
"BYPASS_IMP_2NS_DEL #1: In the IDDQ mode, power is down#0: Normal
function mode1: Bypass dll65_2ns_del IP0: Use
dll65_2ns_del IP"
So lets keep setting it for now.
[1] https://github.com/Broadcom-Network-Switching-Software/OpenMDK/blob/master/cdk/PKG/chip/bcm53115/bcm53115_a0_defs.h#L19871
[2] https://github.com/Broadcom-Network-Switching-Software/OpenMDK/blob/master/cdk/PKG/chip/bcm53118/bcm53118_a0_defs.h#L14392
Fixes:
967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Link: https://patch.msgid.link/20250602193953.1010487-6-jonas.gorski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jonas Gorski [Mon, 2 Jun 2025 19:39:52 +0000 (21:39 +0200)]
net: dsa: b53: allow RGMII for bcm63xx RGMII ports
Add RGMII to supported interfaces for BCM63xx RGMII ports so they can be
actually used in RGMII mode.
Without this, phylink will fail to configure them:
[ 3.580000] b53-switch
10700000.switch GbE3 (uninitialized): validation of rgmii with support
0000000,
00000000,
00000000,
000062ff and advertisement
0000000,
00000000,
00000000,
000062ff failed: -EINVAL
[ 3.600000] b53-switch
10700000.switch GbE3 (uninitialized): failed to connect to PHY: -EINVAL
[ 3.610000] b53-switch
10700000.switch GbE3 (uninitialized): error -22 setting up PHY for tree 0, switch 0, port 4
Fixes:
ce3bf94871f7 ("net: dsa: b53: add support for BCM63xx RGMIIs")
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Link: https://patch.msgid.link/20250602193953.1010487-5-jonas.gorski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jonas Gorski [Mon, 2 Jun 2025 19:39:51 +0000 (21:39 +0200)]
net: dsa: b53: do not configure bcm63xx's IMP port interface
The IMP port is not a valid RGMII interface, but hard wired to internal,
so we shouldn't touch the undefined register B53_RGMII_CTRL_IMP.
While this does not seem to have any side effects, let's not touch it at
all, so limit RGMII configuration on bcm63xx to the actual RGMII ports.
Fixes:
ce3bf94871f7 ("net: dsa: b53: add support for BCM63xx RGMIIs")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250602193953.1010487-4-jonas.gorski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jonas Gorski [Mon, 2 Jun 2025 19:39:50 +0000 (21:39 +0200)]
net: dsa: b53: do not enable RGMII delay on bcm63xx
bcm63xx's RGMII ports are always in MAC mode, never in PHY mode, so we
shouldn't enable any delays and let the PHY handle any delays as
necessary.
This fixes using RGMII ports with normal PHYs like BCM54612E, which will
handle the delay in the PHY.
Fixes:
ce3bf94871f7 ("net: dsa: b53: add support for BCM63xx RGMIIs")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250602193953.1010487-3-jonas.gorski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jonas Gorski [Mon, 2 Jun 2025 19:39:49 +0000 (21:39 +0200)]
net: dsa: b53: do not enable EEE on bcm63xx
BCM63xx internal switches do not support EEE, but provide multiple RGMII
ports where external PHYs may be connected. If one of these PHYs are EEE
capable, we may try to enable EEE for the MACs, which then hangs the
system on access of the (non-existent) EEE registers.
Fix this by checking if the switch actually supports EEE before
attempting to configure it.
Fixes:
22256b0afb12 ("net: dsa: b53: Move EEE functions to b53")
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Álvaro Fernández Rojas <noltari@gmail.com>
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Link: https://patch.msgid.link/20250602193953.1010487-2-jonas.gorski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Meghana Malladi [Tue, 3 Jun 2025 05:29:04 +0000 (10:59 +0530)]
net: ti: icssg-prueth: Fix swapped TX stats for MII interfaces.
In MII mode, Tx lines are swapped for port0 and port1, which means
Tx port0 receives data from PRU1 and the Tx port1 receives data from
PRU0. This is an expected hardware behavior and reading the Tx stats
needs to be handled accordingly in the driver. Update the driver to
read Tx stats from the PRU1 for port0 and PRU0 for port1.
Fixes:
c1e10d5dc7a1 ("net: ti: icssg-prueth: Add ICSSG Stats")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250603052904.431203-1-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Florian Westphal [Fri, 30 May 2025 10:34:03 +0000 (12:34 +0200)]
selftests: netfilter: nft_nat.sh: add test for reverse clash with nat
This will fail without the previous bug fix because we erronously
believe that the clashing entry went way.
However, the clash exists in the opposite direction due to an
existing nat mapping:
PASS: IP statless for ns2-LgTIuS
ERROR: failed to test udp ns1-x4iyOW to ns2-LgTIuS with dnat rule step 2, result: ""
This is partially adapted from test instructions from the below
ubuntu tracker.
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2109889
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Shaun Brady <brady.1345@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Fri, 30 May 2025 10:34:02 +0000 (12:34 +0200)]
netfilter: nf_nat: also check reverse tuple to obtain clashing entry
The logic added in the blamed commit was supposed to only omit nat source
port allocation if neither the existing nor the new entry are subject to
NAT.
However, its not enough to lookup the conntrack based on the proposed
tuple, we must also check the reverse direction.
Otherwise there are esoteric cases where the collision is in the reverse
direction because that colliding connection has a port rewrite, but the
new entry doesn't. In this case, we only check the new entry and then
erronously conclude that no clash exists anymore.
The existing (udp) tuple is:
a:p -> b:P, with nat translation to s:P, i.e. pure daddr rewrite,
reverse tuple in conntrack table is s:P -> a:p.
When another UDP packet is sent directly to s, i.e. a:p->s:P, this is
correctly detected as a colliding entry: tuple is taken by existing reply
tuple in reverse direction.
But the colliding conntrack is only searched for with unreversed
direction, and we can't find such entry matching a:p->s:P.
The incorrect conclusion is that the clashing entry has timed out and
that no port address translation is required.
Such conntrack will then be discarded at nf_confirm time because the
proposed reverse direction clashes with an existing mapping in the
conntrack table.
Search for the reverse tuple too, this will then check the NAT bits of
the colliding entry and triggers port reallocation.
Followp patch extends nft_nat.sh selftest to cover this scenario.
The IPS_SEQ_ADJUST change is also a bug fix:
Instead of checking for SEQ_ADJ this tested for SEEN_REPLY and ASSURED
by accident -- _BIT is only for use with the test_bit() API.
This bug has little consequence in practice, because the sequence number
adjustments are only useful for TCP which doesn't support clash resolution.
The existing test case (conntrack_reverse_clash.sh) exercise a race
condition path (parallel conntrack creation on different CPUs), so
the colliding entries have neither SEEN_REPLY nor ASSURED set.
Thanks to Yafang Shao and Shaun Brady for an initial investigation
of this bug.
Fixes:
d8f84a9bc7c4 ("netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash")
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1795
Reported-by: Yafang Shao <laoar.shao@gmail.com>
Reported-by: Shaun Brady <brady.1345@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Fri, 23 May 2025 12:20:46 +0000 (14:20 +0200)]
selftests: netfilter: nft_concat_range.sh: add datapath check for map fill bug
commit
0935ee6032df ("selftests: netfilter: add test case for recent mismatch bug")
added a regression check for incorrect initial fill of the result map
that was fixed with
791a615b7ad2 ("netfilter: nf_set_pipapo: fix initial map fill").
The test used 'nft get element', i.e., control plane checks for
match/nomatch results.
The control plane however doesn't use avx2 version, so we need to
send+match packets.
As the additional packet match/nomatch is slow, don't do this for
every element added/removed: add and use maybe_send_(no)match
helpers and use them.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Fri, 23 May 2025 12:20:45 +0000 (14:20 +0200)]
selftests: netfilter: nft_concat_range.sh: prefer per element counters for testing
The selftest uses following rule:
... @test counter name "test"
Then sends a packet, then checks if the named counter did increment or
not.
This is fine for the 'no-match' test case: If anything matches the
counter increments and the test fails as expected.
But for the 'should match' test cases this isn't optimal.
Consider buggy matching, where the packet matches entry x, but it
should have matched entry y.
In that case the test would erronously pass.
Rework the selftest to use per-element counters to avoid this.
After sending packet that should have matched entry x, query the
relevant element via 'nft reset element' and check that its counter
had incremented.
The 'nomatch' case isn't altered, no entry should match so the named
counter must be 0, changing it to the per-element counter would then
pass if another entry matches.
The downside of this change is a slight increase in test run-time by
a few seconds.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Fri, 23 May 2025 12:20:44 +0000 (14:20 +0200)]
netfilter: nf_set_pipapo_avx2: fix initial map fill
If the first field doesn't cover the entire start map, then we must zero
out the remainder, else we leak those bits into the next match round map.
The early fix was incomplete and did only fix up the generic C
implementation.
A followup patch adds a test case to nft_concat_range.sh.
Fixes:
791a615b7ad2 ("netfilter: nf_set_pipapo: fix initial map fill")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Thu, 5 Jun 2025 04:18:37 +0000 (21:18 -0700)]
Merge tag 'rust-6.16' of git://git./linux/kernel/git/ojeda/linux
Pull Rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- KUnit '#[test]'s:
- Support KUnit-mapped 'assert!' macros.
The support that landed last cycle was very basic, and the
'assert!' macros panicked since they were the standard library
ones. Now, they are mapped to the KUnit ones in a similar way to
how is done for doctests, reusing the infrastructure there.
With this, a failing test like:
#[test]
fn my_first_test() {
assert_eq!(42, 43);
}
will report:
# my_first_test: ASSERTION FAILED at rust/kernel/lib.rs:251
Expected 42 == 43 to be true, but is false
# my_first_test.speed: normal
not ok 1 my_first_test
- Support tests with checked 'Result' return types.
The return value of test functions that return a 'Result' will
be checked, thus one can now easily catch errors when e.g. using
the '?' operator in tests.
With this, a failing test like:
#[test]
fn my_test() -> Result {
f()?;
Ok(())
}
will report:
# my_test: ASSERTION FAILED at rust/kernel/lib.rs:321
Expected is_test_result_ok(my_test()) to be true, but is false
# my_test.speed: normal
not ok 1 my_test
- Add 'kunit_tests' to the prelude.
- Clarify the remaining language unstable features in use.
- Compile 'core' with edition 2024 for Rust >= 1.87.
- Workaround 'bindgen' issue with forward references to 'enum' types.
- objtool: relax slice condition to cover more 'noreturn' functions.
- Use absolute paths in macros referencing 'core' and 'kernel'
crates.
- Skip '-mno-fdpic' flag for bindgen in GCC 32-bit arm builds.
- Clean some 'doc_markdown' lint hits -- we may enable it later on.
'kernel' crate:
- 'alloc' module:
- 'Box': support for type coercion, e.g. 'Box<T>' to 'Box<dyn U>'
if 'T' implements 'U'.
- 'Vec': implement new methods (prerequisites for nova-core and
binder): 'truncate', 'resize', 'clear', 'pop',
'push_within_capacity' (with new error type 'PushError'),
'drain_all', 'retain', 'remove' (with new error type
'RemoveError'), insert_within_capacity' (with new error type
'InsertError').
In addition, simplify 'push' using 'spare_capacity_mut', split
'set_len' into 'inc_len' and 'dec_len', add type invariant 'len
<= capacity' and simplify 'truncate' using 'dec_len'.
- 'time' module:
- Morph the Rust hrtimer subsystem into the Rust timekeeping
subsystem, covering delay, sleep, timekeeping, timers. This new
subsystem has all the relevant timekeeping C maintainers listed
in the entry.
- Replace 'Ktime' with 'Delta' and 'Instant' types to represent a
duration of time and a point in time.
- Temporarily add 'Ktime' to 'hrtimer' module to allow 'hrtimer'
to delay converting to 'Instant' and 'Delta'.
- 'xarray' module:
- Add a Rust abstraction for the 'xarray' data structure. This
abstraction allows Rust code to leverage the 'xarray' to store
types that implement 'ForeignOwnable'. This support is a
dependency for memory backing feature of the Rust null block
driver, which is waiting to be merged.
- Set up an entry in 'MAINTAINERS' for the XArray Rust support.
Patches will go to the new Rust XArray tree and then via the
Rust subsystem tree for now.
- Allow 'ForeignOwnable' to carry information about the pointed-to
type. This helps asserting alignment requirements for the
pointer passed to the foreign language.
- 'container_of!': retain pointer mut-ness and add a compile-time
check of the type of the first parameter ('$field_ptr').
- Support optional message in 'static_assert!'.
- Add C FFI types (e.g. 'c_int') to the prelude.
- 'str' module: simplify KUnit tests 'format!' macro, convert
'rusttest' tests into KUnit, take advantage of the '-> Result'
support in KUnit '#[test]'s.
- 'list' module: add examples for 'List', fix path of
'assert_pinned!' (so far unused macro rule).
- 'workqueue' module: remove 'HasWork::OFFSET'.
- 'page' module: add 'inline' attribute.
'macros' crate:
- 'module' macro: place 'cleanup_module()' in '.exit.text' section.
'pin-init' crate:
- Add 'Wrapper<T>' trait for creating pin-initializers for wrapper
structs with a structurally pinned value such as 'UnsafeCell<T>' or
'MaybeUninit<T>'.
- Add 'MaybeZeroable' derive macro to try to derive 'Zeroable', but
not error if not all fields implement it. This is needed to derive
'Zeroable' for all bindgen-generated structs.
- Add 'unsafe fn cast_[pin_]init()' functions to unsafely change the
initialized type of an initializer. These are utilized by the
'Wrapper<T>' implementations.
- Add support for visibility in 'Zeroable' derive macro.
- Add support for 'union's in 'Zeroable' derive macro.
- Upstream dev news: streamline CI, fix some bugs. Add new workflows
to check if the user-space version and the one in the kernel tree
have diverged. Use the issues tab [1] to track them, which should
help folks report and diagnose issues w.r.t. 'pin-init' better.
[1] https://github.com/rust-for-linux/pin-init/issues
Documentation:
- Testing: add docs on the new KUnit '#[test]' tests.
- Coding guidelines: explain that '///' vs. '//' applies to private
items too. Add section on C FFI types.
- Quick Start guide: update Ubuntu instructions and split them into
"25.04" and "24.04 LTS and older".
And a few other cleanups and improvements"
* tag 'rust-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (78 commits)
rust: list: Fix typo `much` in arc.rs
rust: check type of `$ptr` in `container_of!`
rust: workqueue: remove HasWork::OFFSET
rust: retain pointer mut-ness in `container_of!`
Documentation: rust: testing: add docs on the new KUnit `#[test]` tests
Documentation: rust: rename `#[test]`s to "`rusttest` host tests"
rust: str: take advantage of the `-> Result` support in KUnit `#[test]`'s
rust: str: simplify KUnit tests `format!` macro
rust: str: convert `rusttest` tests into KUnit
rust: add `kunit_tests` to the prelude
rust: kunit: support checked `-> Result`s in KUnit `#[test]`s
rust: kunit: support KUnit-mapped `assert!` macros in `#[test]`s
rust: make section names plural
rust: list: fix path of `assert_pinned!`
rust: compile libcore with edition 2024 for 1.87+
rust: dma: add missing Markdown code span
rust: task: add missing Markdown code spans and intra-doc links
rust: pci: fix docs related to missing Markdown code spans
rust: alloc: add missing Markdown code span
rust: alloc: add missing Markdown code spans
...
Linus Torvalds [Thu, 5 Jun 2025 02:46:22 +0000 (19:46 -0700)]
Merge tag 'bpf-fixes' of git://git./linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
"Two small fixes to selftests"
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Fix selftest btf_tag/btf_type_tag_percpu_vmlinux_helper failure
selftests/bpf: Fix bpf selftest build error
Linus Torvalds [Thu, 5 Jun 2025 02:23:37 +0000 (19:23 -0700)]
Merge tag '6.16-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server updates from Steve French:
"Four smb3 server fixes:
- Fix for special character handling when mounting with "posix"
- Fix for mounts from Mac for fs that don't provide unique inode
numbers
- Two cleanup patches (e.g. for crypto calls)"
* tag '6.16-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: allow a filename to contain special characters on SMB3.1.1 posix extension
ksmbd: provide zero as a unique ID to the Mac client
ksmbd: remove unnecessary softdep on crc32
ksmbd: use SHA-256 library API instead of crypto_shash API
Linus Torvalds [Thu, 5 Jun 2025 02:14:24 +0000 (19:14 -0700)]
Merge tag 'bcachefs-2025-06-04' of git://evilpiepirate.org/bcachefs
Pull more bcachefs updates from Kent Overstreet:
"More bcachefs updates:
- More stack usage improvements (~600 bytes)
- Define CLASS()es for some commonly used types, and convert most
rcu_read_lock() uses to the new lock guards
- New introspection:
- Superblock error counters are now available in sysfs:
previously, they were only visible with 'show-super', which
doesn't provide a live view
- New tracepoint, error_throw(), which is called any time we
return an error and start to unwind
- Repair
- check_fix_ptrs() can now repair btree node roots
- We can now repair when we've somehow ended up with the journal
using a superblock bucket
- Revert some leftovers from the aborted directory i_size feature,
and add repair code: some userspace programs (e.g. sshfs) were
getting confused
It seems in 6.15 there's a bug where i_nlink on the vfs inode has been
getting incorrectly set to 0, with some unfortunate results;
list_journal analysis showed bch2_inode_rm() being called (by
bch2_evict_inode()) when it clearly should not have been.
- bch2_inode_rm() now runs "should we be deleting this inode?" checks
that were previously only run when deleting unlinked inodes in
recovery
- check_subvol() was treating a dangling subvol (pointing to a
missing root inode) like a dangling dirent, and deleting it. This
was the really unfortunate one: check_subvol() will now recreate
the root inode if necessary
This took longer to debug than it should have, and we lost several
filesystems unnecessarily, because users have been ignoring the
release notes and blindly running 'fsck -y'. Debugging required
reconstructing what happened through analyzing the journal, when
ideally someone would have noticed 'hey, fsck is asking me if I want
to repair this: it usually doesn't, maybe I should run this in dry run
mode and check what's going on?'
As a reminder, fsck errors are being marked as autofix once we've
verified, in real world usage, that they're working correctly; blindly
running 'fsck -y' on an experimental filesystem is playing with fire
Up to this incident we've had an excellent track record of not losing
data, so let's try to learn from this one
This is a community effort, I wouldn't be able to get this done
without the help of all the people QAing and providing excellent bug
reports and feedback based on real world usage. But please don't
ignore advice and expect me to pick up the pieces
If an error isn't marked as autofix, and it /is/ happening in the
wild, that's also something I need to know about so we can check it
out and add it to the autofix list if repair looks good. I haven't
been getting those reports, and I should be; since we don't have any
sort of telemetry yet I am absolutely dependent on user reports
Now I'll be spending the weekend working on new repair code to see if
I can get a filesystem back for a user who didn't have backups"
* tag 'bcachefs-2025-06-04' of git://evilpiepirate.org/bcachefs: (69 commits)
bcachefs: add cond_resched() to handle_overwrites()
bcachefs: Make journal read log message a bit quieter
bcachefs: Fix subvol to missing root repair
bcachefs: Run may_delete_deleted_inode() checks in bch2_inode_rm()
bcachefs: delete dead code from may_delete_deleted_inode()
bcachefs: Add flags to subvolume_to_text()
bcachefs: Fix oops in btree_node_seq_matches()
bcachefs: Fix dirent_casefold_mismatch repair
bcachefs: Fix bch2_fsck_rename_dirent() for casefold
bcachefs: Redo bch2_dirent_init_name()
bcachefs: Fix -Wc23-extensions in bch2_check_dirents()
bcachefs: Run check_dirents second time if required
bcachefs: Run snapshot deletion out of system_long_wq
bcachefs: Make check_key_has_snapshot safer
bcachefs: BCH_RECOVERY_PASS_NO_RATELIMIT
bcachefs: bch2_require_recovery_pass()
bcachefs: bch_err_throw()
bcachefs: Repair code for directory i_size
bcachefs: Kill un-reverted directory i_size code
bcachefs: Delete redundant fsck_err()
...
Kent Overstreet [Tue, 3 Jun 2025 15:47:51 +0000 (11:47 -0400)]
bcachefs: add cond_resched() to handle_overwrites()
Fix soft lockup warnings in btree nodes can.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 3 Jun 2025 13:31:58 +0000 (09:31 -0400)]
bcachefs: Make journal read log message a bit quieter
Users seem to be assuming that the 'dropped unflushed entries' message
at the end of journal read indicates some sort of problem, when it does
not - we expect there to be entries in the journal that weren't
commited, it's purely informational so that we can correlate journal
sequence numbers elsewhere when debugging.
Shorten the log message a bit to hopefully make this clearer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 2 Jun 2025 23:48:27 +0000 (19:48 -0400)]
bcachefs: Fix subvol to missing root repair
We had a bug where the root inode of a subvolume was erronously deleted:
bch2_evict_inode() called bch2_inode_rm(), meaning the VFS inode's
i_nlink was somehow set to 0 when it shouldn't have - the inode in the
btree indicated it clearly was not unlinked.
This has been addressed with additional safety checks in
bch2_inode_rm() - pulling in the safety checks we already were doing
when deleting unlinked inodes in recovery - but the really disastrous
bug was in check_subvols(), which on finding a dangling subvol (subvol
with a missing root inode) would delete the subvolume.
I assume this bug dates from early check_directory_structure() code,
which originally handled subvolumes and normal paths - the idea being
that still live contents of the subvolume would get reattached
somewhere.
But that's incorrect, and disastrously so; deleting a subvolume triggers
deleting the snapshot ID it points to, deleting the entire contents.
The correct way to repair is to recreate the root inode if it's missing;
then any contents will get reattached under that subvolume's lost+found.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 2 Jun 2025 21:43:36 +0000 (17:43 -0400)]
bcachefs: Run may_delete_deleted_inode() checks in bch2_inode_rm()
We had a bug where bch2_evict_inode() incorrectly called bch2_inode_rm()
- the journal clearly showed the inode was not unlinked.
We've got checks that we use in recovery when cleaning up deleted
inodes, lift them to bch2_inode_rm() as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 2 Jun 2025 22:26:44 +0000 (18:26 -0400)]
bcachefs: delete dead code from may_delete_deleted_inode()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 2 Jun 2025 21:23:49 +0000 (17:23 -0400)]
bcachefs: Add flags to subvolume_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 2 Jun 2025 13:26:20 +0000 (09:26 -0400)]
bcachefs: Fix oops in btree_node_seq_matches()
btree_update_nodes_written() needs to wait on in-flight writes to old
nodes before marking them as freed. But it has no reason to pin those
old nodes in memory, so some trickyness ensues.
The update we're completing deleted references to those nodes from the
btree, so we know if they've been evicted they can't be pulled back in.
We just have to check if the nodes we have pointers to are still those
old nodes, and haven't been reused.
To do that we check the node's "sequence number" (actually a random 64
bit cookie), but that lives in the node's data buffer. 'struct btree'
can't be freed until filesystem shutdown (as they're quite small), but
the data buffers can be freed or swapped around.
Commit
1f88c3567495, which was fixing a kmsan warning, assumed that we
could safely do this locklessly with just a READ_ONCE() - if we've got a
non-null ptr it would be safe to read from.
But that's not true if the data buffer is a vmalloc allocation, so we
need to restore the locking that commit deleted (or alternatively RCU
free those data buffers, but there's no other reason for that).
Fixes:
1f88c3567495 ("bcachefs: Fix a KMSAN splat in btree_update_nodes_written()")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 31 May 2025 04:11:52 +0000 (00:11 -0400)]
bcachefs: Fix dirent_casefold_mismatch repair
Instead of simply recreating a mis-casefolded dirent, use the str_hash
repair code, which will rename it if necessary - the dirent might have
been created again with the correct casefolding.
Factor out out bch2_str_hash_repair key() from
__bch2_str_hash_check_key() for the new path to use, and export
bch2_dirent_create_key() as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 31 May 2025 21:00:00 +0000 (17:00 -0400)]
bcachefs: Fix bch2_fsck_rename_dirent() for casefold
bch2_fsck_renamed_dirent was creating bch_dirent keys open-coded - but
we need to use the appropriate helper, if the directory is casefolded.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 1 Jun 2025 22:35:18 +0000 (18:35 -0400)]
bcachefs: Redo bch2_dirent_init_name()
Redo (and simplify somewhat) how casefolded and non casefolded dirents
are initialized, and export this to be used by fsck_rename_dirent().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Nathan Chancellor [Wed, 4 Jun 2025 19:38:27 +0000 (12:38 -0700)]
bcachefs: Fix -Wc23-extensions in bch2_check_dirents()
Clang warns (or errors with CONFIG_WERROR=y):
fs/bcachefs/fsck.c:2325:2: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
2325 | int ret = bch2_trans_run(c,
| ^
On clang-17 and older, this is an unconditional error:
fs/bcachefs/fsck.c:2325:2: error: expected expression
2325 | int ret = bch2_trans_run(c,
| ^
Move the declaration of ret to the top of the function to resolve both
ways this issue manifests.
Fixes:
c72def523799 ("bcachefs: Run check_dirents second time if required")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Linus Torvalds [Wed, 4 Jun 2025 19:07:16 +0000 (12:07 -0700)]
Merge tag 'sched_ext-for-6.16-rc1-fixes' of git://git./linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
"Two fixes in the built-in idle selection helpers:
- Fix prev_cpu handling to guarantee that idle selection never
returns a CPU that's not allowed
- Skip cross-node search when !NUMA which could lead to infinite
looping due to a bug in NUMA iterator"
* tag 'sched_ext-for-6.16-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: idle: Skip cross-node search with !CONFIG_NUMA
sched_ext: idle: Properly handle invalid prev_cpu during idle selection
Linus Torvalds [Wed, 4 Jun 2025 18:26:17 +0000 (11:26 -0700)]
Merge tag 'pci-v6.16-changes' of git://git./linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Print the actual delay time in pci_bridge_wait_for_secondary_bus()
instead of assuming it was 1000ms (Wilfred Mallawa)
- Revert 'iommu/amd: Prevent binding other PCI drivers to IOMMU PCI
devices', which broke resume from system sleep on AMD platforms and
has been fixed by other commits (Lukas Wunner)
Resource management:
- Remove mtip32xx use of pcim_iounmap_regions(), which is deprecated
and unnecessary (Philipp Stanner)
- Remove pcim_iounmap_regions() and pcim_request_region_exclusive()
and related flags since all uses have been removed (Philipp
Stanner)
- Rework devres 'request' functions so they are no longer 'hybrid',
i.e., their behavior no longer depends on whether
pcim_enable_device or pci_enable_device() was used, and remove
related code (Philipp Stanner)
- Warn (not BUG()) about failure to assign optional resources (Ilpo
Järvinen)
Error handling:
- Log the DPC Error Source ID only when it's actually valid (when
ERR_FATAL or ERR_NONFATAL was received from a downstream device)
and decode into bus/device/function (Bjorn Helgaas)
- Determine AER log level once and save it so all related messages
use the same level (Karolina Stolarek)
- Use KERN_WARNING, not KERN_ERR, when logging PCIe Correctable
Errors (Karolina Stolarek)
- Ratelimit PCIe Correctable and Non-Fatal error logging, with sysfs
controls on interval and burst count, to avoid flooding logs and
RCU stall warnings (Jon Pan-Doh)
Power management:
- Increment PM usage counter when probing reset methods so we don't
try to read config space of a powered-off device (Alex Williamson)
- Set all devices to D0 during enumeration to ensure ACPI opregion is
connected via _REG (Mario Limonciello)
Power control:
- Rename pwrctrl Kconfig symbols from 'PWRCTL' to 'PWRCTRL' to match
the filename paths. Retain old deprecated symbols for
compatibility, except for the pwrctrl slot driver
(PCI_PWRCTRL_SLOT) (Johan Hovold)
- When unregistering pwrctrl, cancel outstanding rescan work before
cleaning up data structures to avoid use-after-free issues (Brian
Norris)
Bandwidth control:
- Simplify link bandwidth controller by replacing the count of Link
Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN
flag (Ilpo Järvinen)
- Update the Link Speed after retraining, since the Link Speed may
have changed (Ilpo Järvinen)
PCIe native device hotplug:
- Ignore Presence Detect Changed caused by DPC.
pciehp already ignores Link Down/Up events caused by DPC, but on
slots using in-band presence detect, DPC causes a spurious Presence
Detect Changed event (Lukas Wunner)
- Ignore Link Down/Up caused by Secondary Bus Reset.
On hotplug ports using in-band presence detect, the reset causes a
Presence Detect Changed event, which mistakenly caused teardown and
re-enumeration of the device. Drivers may need to annotate code
that resets their device (Lukas Wunner)
Virtualization:
- Add an ACS quirk for Loongson Root Ports that don't advertise ACS
but don't allow peer-to-peer transactions between Root Ports; the
quirk allows each Root Port to be in a separate IOMMU group (Huacai
Chen)
Endpoint framework:
- For fixed-size BARs, retain both the actual size and the possibly
larger size allocated to accommodate iATU alignment requirements
(Jerome Brunet)
- Simplify ctrl/SPAD space allocation and avoid allocating more space
than needed (Jerome Brunet)
- Correct MSI-X PBA offset calculations for DesignWare and Cadence
endpoint controllers (Niklas Cassel)
- Align the return value (number of interrupts) encoding for
pci_epc_get_msi()/pci_epc_ops::get_msi() and
pci_epc_get_msix()/pci_epc_ops::get_msix() (Niklas Cassel)
- Align the nr_irqs parameter encoding for
pci_epc_set_msi()/pci_epc_ops::set_msi() and
pci_epc_set_msix()/pci_epc_ops::set_msix() (Niklas Cassel)
Common host controller library:
- Convert pci-host-common to a library so platforms that don't need
native host controller drivers don't need to include these helper
functions (Manivannan Sadhasivam)
Apple PCIe controller driver:
- Extract ECAM bridge creation helper from pci_host_common_probe() to
separate driver-specific things like MSI from PCI things (Marc
Zyngier)
- Dynamically allocate RID-to_SID bitmap to prepare for SoCs with
varying capabilities (Marc Zyngier)
- Skip ports disabled in DT when setting up ports (Janne Grunau)
- Add t6020 compatible string (Alyssa Rosenzweig)
- Add T602x PCIe support (Hector Martin)
- Directly set/clear INTx mask bits because T602x dropped the
accessors that could do this without locking (Marc Zyngier)
- Move port PHY registers to their own reg items to accommodate
T602x, which moves them around; retain default offsets for existing
DTs that lack phy%d entries with the reg offsets (Hector Martin)
- Stop polling for core refclk, which doesn't work on T602x and the
bootloader has already done anyway (Hector Martin)
- Use gpiod_set_value_cansleep() when asserting PERST# in probe
because we're allowed to sleep there (Hector Martin)
Cadence PCIe controller driver:
- Drop a runtime PM 'put' to resolve a runtime atomic count underflow
(Hans Zhang)
- Make the cadence core buildable as a module (Kishon Vijay Abraham I)
- Add cdns_pcie_host_disable() and cdns_pcie_ep_disable() for use by
loadable drivers when they are removed (Siddharth Vadapalli)
Freescale i.MX6 PCIe controller driver:
- Apply link training workaround only on IMX6Q, IMX6SX, IMX6SP
(Richard Zhu)
- Remove redundant dw_pcie_wait_for_link() from
imx_pcie_start_link(); since the DWC core does this, imx6 only
needs it when retraining for a faster link speed (Richard Zhu)
- Toggle i.MX95 core reset to align with PHY powerup (Richard Zhu)
- Set SYS_AUX_PWR_DET to work around i.MX95 ERR051624 erratum: in
some cases, the controller can't exit 'L23 Ready' through Beacon or
PERST# deassertion (Richard Zhu)
- Clear GEN3_ZRXDC_NONCOMPL to work around i.MX95 ERR051586 erratum:
controller can't meet 2.5 GT/s ZRX-DC timing when operating at 8
GT/s, causing timeouts in L1 (Richard Zhu)
- Wait for i.MX95 PLL lock before enabling controller (Richard Zhu)
- Save/restore i.MX95 LUT for suspend/resume (Richard Zhu)
Mobiveil PCIe controller driver:
- Return bool (not int) for link-up check in
mobiveil_pab_ops.link_up() and layerscape-gen4, mobiveil (Hans
Zhang)
NVIDIA Tegra194 PCIe controller driver:
- Create debugfs directory for 'aspm_state_cnt' only when
CONFIG_PCIEASPM is enabled, since there are no other entries (Hans
Zhang)
Qualcomm PCIe controller driver:
- Add OF support for parsing DT 'eq-presets-<N>gts' property for lane
equalization presets (Krishna Chaitanya Chundru)
- Read Maximum Link Width from the Link Capabilities register if DT
lacks 'num-lanes' property (Krishna Chaitanya Chundru)
- Add Physical Layer 64 GT/s Capability ID and register offsets for
8, 32, and 64 GT/s lane equalization registers (Krishna Chaitanya
Chundru)
- Add generic dwc support for configuring lane equalization presets
(Krishna Chaitanya Chundru)
- Add DT and driver support for PCIe on IPQ5018 SoC (Nitheesh Sekar)
Renesas R-Car PCIe controller driver:
- Describe endpoint BAR 4 as being fixed size (Jerome Brunet)
- Document how to obtain R-Car V4H (r8a779g0) controller firmware
(Yoshihiro Shimoda)
Rockchip PCIe controller driver:
- Reorder rockchip_pci_core_rsts because
reset_control_bulk_deassert() deasserts in reverse order, to fix a
link training regression (Jensen Huang)
- Mark RK3399 as being capable of raising INTx interrupts (Niklas
Cassel)
Rockchip DesignWare PCIe controller driver:
- Check only PCIE_LINKUP, not LTSSM status, to determine whether the
link is up (Shawn Lin)
- Increase N_FTS (used in L0s->L0 transitions) and enable ASPM L0s
for Root Complex and Endpoint modes (Shawn Lin)
- Hide the broken ATS Capability in rockchip_pcie_ep_init() instead
of rockchip_pcie_ep_pre_init() so it stays hidden after PERST#
resets non-sticky registers (Shawn Lin)
- Call phy_power_off() before phy_exit() in rockchip_pcie_phy_deinit()
(Diederik de Haas)
Synopsys DesignWare PCIe controller driver:
- Set PORT_LOGIC_LINK_WIDTH to one lane to make initial link training
more robust; this will not affect the intended link width if all
lanes are functional (Wenbin Yao)
- Return bool (not int) for link-up check in dw_pcie_ops.link_up()
and armada8k, dra7xx, dw-rockchip, exynos, histb, keembay,
keystone, kirin, meson, qcom, qcom-ep, rcar_gen4, spear13xx,
tegra194, uniphier, visconti (Hans Zhang)
- Add debugfs support for exposing DWC device-specific PTM context
(Manivannan Sadhasivam)
TI J721E PCIe driver:
- Make j721e buildable as a loadable and removable module (Siddharth
Vadapalli)
- Fix j721e host/endpoint dependencies that result in link failures
in some configs (Arnd Bergmann)
Device tree bindings:
- Add qcom DT binding for 'global' interrupt (PCIe controller and
link-specific events) for ipq8074, ipq8074-gen3, ipq6018, sa8775p,
sc7280, sc8180x sdm845, sm8150, sm8250, sm8350 (Manivannan
Sadhasivam)
- Add qcom DT binding for 8 MSI SPI interrupts for msm8998, ipq8074,
ipq8074-gen3, ipq6018 (Manivannan Sadhasivam)
- Add dw rockchip DT binding for rk3576 and rk3562 (Kever Yang)
- Correct indentation and style of examples in brcm,stb-pcie,
cdns,cdns-pcie-ep, intel,keembay-pcie-ep, intel,keembay-pcie,
microchip,pcie-host, rcar-pci-ep, rcar-pci-host, xilinx-versal-cpm
(Krzysztof Kozlowski)
- Convert Marvell EBU (dove, kirkwood, armada-370, armada-xp) and
armada8k from text to schema DT bindings (Rob Herring)
- Remove obsolete .txt DT bindings for content that has been moved to
schemas (Rob Herring)
- Add qcom DT binding for MHI registers in IPQ5332, IPQ6018, IPQ8074
and IPQ9574 (Varadarajan Narayanan)
- Convert v3,v360epc-pci from text to DT schema binding (Rob Herring)
- Change microchip,pcie-host DT binding to be 'dma-noncoherent' since
PolarFire may be configured that way (Conor Dooley)
Miscellaneous:
- Drop 'pci' suffix from intel_mid_pci.c filename to match similar
files (Andy Shevchenko)
- All platforms with PCI have an MMU, so add PCI Kconfig dependency
on MMU to simplify build testing and avoid inadvertent build
regressions (Arnd Bergmann)
- Update Krzysztof Wilczyński's email address in MAINTAINERS
(Krzysztof Wilczyński)
- Update Manivannan Sadhasivam's email address in MAINTAINERS
(Manivannan Sadhasivam)"
* tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (147 commits)
MAINTAINERS: Update Manivannan Sadhasivam email address
PCI: j721e: Fix host/endpoint dependencies
PCI: j721e: Add support to build as a loadable module
PCI: cadence-ep: Introduce cdns_pcie_ep_disable() helper for cleanup
PCI: cadence-host: Introduce cdns_pcie_host_disable() helper for cleanup
PCI: cadence: Add support to build pcie-cadence library as a kernel module
MAINTAINERS: Update Krzysztof Wilczyński email address
PCI: Remove unnecessary linesplit in __pci_setup_bridge()
PCI: WARN (not BUG()) when we fail to assign optional resources
PCI: Remove unused pci_printk()
PCI: qcom: Replace PERST# sleep time with proper macro
PCI: dw-rockchip: Replace PERST# sleep time with proper macro
PCI: host-common: Convert to library for host controller drivers
PCI/ERR: Remove misleading TODO regarding kernel panic
PCI: cadence: Remove duplicate message code definitions
PCI: endpoint: Align pci_epc_set_msix(), pci_epc_ops::set_msix() nr_irqs encoding
PCI: endpoint: Align pci_epc_set_msi(), pci_epc_ops::set_msi() nr_irqs encoding
PCI: endpoint: Align pci_epc_get_msix(), pci_epc_ops::get_msix() return value encoding
PCI: endpoint: Align pci_epc_get_msi(), pci_epc_ops::get_msi() return value encoding
PCI: cadence-ep: Correct PBA offset in .set_msix() callback
...
Ilan Peer [Wed, 4 Jun 2025 03:13:21 +0000 (06:13 +0300)]
wifi: iwlwifi: mld: Move regulatory domain initialization
The regulatory domain information was initialized every time the
FW was loaded and the device was restarted. This was unnecessary
and useless as at this stage the wiphy channels information was
not setup yet so while the regulatory domain was set to the wiphy,
the channel information was not updated.
In case that a specific MCC was configured during FW initialization
then following updates with this MCC are ignored, and thus the
wiphy channels information is left with information not matching
the regulatory domain.
This commit moves the regulatory domain initialization to after the
operational firmware is started, i.e., after the wiphy channels were
configured and the regulatory information is needed.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250604061200.f138a7382093.I2fd8b3e99be13c2687da483e2cb1311ffb4fbfce@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Johannes Berg [Wed, 4 Jun 2025 03:13:20 +0000 (06:13 +0300)]
wifi: iwlwifi: pcie: fix non-MSIX handshake register
When reading the interrupt status after a FW reset handshake
timeout, read the actual value not the mask for the non-MSIX
case.
Fixes:
ab606dea80c4 ("wifi: iwlwifi: pcie: add support for the reset handshake in MSI")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250604061200.87a849a55086.I2f8571aafa55aa3b936a30b938de9d260592a584@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>