Patrisious Haddad [Tue, 3 Dec 2024 20:49:17 +0000 (22:49 +0200)]
net/mlx5: E-Switch, Fix switching to switchdev mode with IB device disabled
In case that IB device is already disabled when moving to switchdev mode,
which can happen when working with LAG, need to do rescan_drivers()
before leaving in order to add ethernet representor auxiliary device.
Fixes:
ab85ebf43723 ("net/mlx5: E-switch, refactor eswitch mode change")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241203204920.232744-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cosmin Ratiu [Tue, 3 Dec 2024 20:49:16 +0000 (22:49 +0200)]
net/mlx5: HWS: Properly set bwc queue locks lock classes
The mentioned "Fixes" patch forgot to do that.
Fixes:
9addffa34359 ("net/mlx5: HWS, use lock classes for bwc locks")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241203204920.232744-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cosmin Ratiu [Tue, 3 Dec 2024 20:49:15 +0000 (22:49 +0200)]
net/mlx5: HWS: Fix memory leak in mlx5hws_definer_calc_layout
It allocates a match template, which creates a compressed definer fc
struct, but that is not deallocated.
This commit fixes that.
Fixes:
74a778b4a63f ("net/mlx5: HWS, added definers handling")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241203204920.232744-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Dec 2024 03:23:36 +0000 (19:23 -0800)]
Merge branch 'bnxt_en-support-header-page-pool-in-queue-api'
David Wei says:
====================
bnxt_en: support header page pool in queue API
Commit
7ed816be35ab ("eth: bnxt: use page pool for head frags") added a
separate page pool for header frags. Now, frags are allocated from this
header page pool e.g. rxr->tpa_info.data.
The queue API did not properly handle rxr->tpa_info and so using the
queue API to i.e. reset any queues will result in pages being returned
to the incorrect page pool, causing inflight != 0 warnings.
Fix this bug by properly allocating/freeing tpa_info and copying/freeing
head_pool in the queue API implementation.
The 1st patch is a prep patch that refactors helpers out to be used by
the implementation patch later.
The 2nd patch is a drive-by refactor. Happy to take it out and re-send
to net-next if there are any objections.
The 3rd patch is the implementation patch that will properly alloc/free
rxr->tpa_info.
====================
Link: https://patch.msgid.link/20241204041022.56512-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Wei [Wed, 4 Dec 2024 04:10:22 +0000 (20:10 -0800)]
bnxt_en: handle tpa_info in queue API implementation
Commit
7ed816be35ab ("eth: bnxt: use page pool for head frags") added a
page pool for header frags, which may be distinct from the existing pool
for the aggregation ring. Prior to this change, frags used in the TPA
ring rx_tpa were allocated from system memory e.g. napi_alloc_frag()
meaning their lifetimes were not associated with a page pool. They can
be returned at any time and so the queue API did not alloc or free
rx_tpa.
But now frags come from a separate head_pool which may be different to
page_pool. Without allocating and freeing rx_tpa, frags allocated from
the old head_pool may be returned to a different new head_pool which
causes a mismatch between the pp hold/release count.
Fix this problem by properly freeing and allocating rx_tpa in the queue
API implementation.
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241204041022.56512-4-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Wei [Wed, 4 Dec 2024 04:10:21 +0000 (20:10 -0800)]
bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap()
Refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap() for
allocating rx_agg_bmap.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241204041022.56512-3-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Wei [Wed, 4 Dec 2024 04:10:20 +0000 (20:10 -0800)]
bnxt_en: refactor tpa_info alloc/free into helpers
Refactor bnxt_rx_ring_info->tpa_info operations into helpers that work
on a single tpa_info in prep for queue API using them.
There are 2 pairs of operations:
* bnxt_alloc_one_tpa_info()
* bnxt_free_one_tpa_info()
These alloc/free the tpa_info array itself.
* bnxt_alloc_one_tpa_info_data()
* bnxt_free_one_tpa_info_data()
These alloc/free the frags stored in tpa_info array.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241204041022.56512-2-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 3 Dec 2024 18:21:21 +0000 (18:21 +0000)]
geneve: do not assume mac header is set in geneve_xmit_skb()
We should not assume mac header is set in output path.
Use skb_eth_hdr() instead of eth_hdr() to fix the issue.
sysbot reported the following :
WARNING: CPU: 0 PID: 11635 at include/linux/skbuff.h:3052 skb_mac_header include/linux/skbuff.h:3052 [inline]
WARNING: CPU: 0 PID: 11635 at include/linux/skbuff.h:3052 eth_hdr include/linux/if_ether.h:24 [inline]
WARNING: CPU: 0 PID: 11635 at include/linux/skbuff.h:3052 geneve_xmit_skb drivers/net/geneve.c:898 [inline]
WARNING: CPU: 0 PID: 11635 at include/linux/skbuff.h:3052 geneve_xmit+0x4c38/0x5730 drivers/net/geneve.c:1039
Modules linked in:
CPU: 0 UID: 0 PID: 11635 Comm: syz.4.1423 Not tainted
6.12.0-syzkaller-10296-gaaf20f870da0 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:skb_mac_header include/linux/skbuff.h:3052 [inline]
RIP: 0010:eth_hdr include/linux/if_ether.h:24 [inline]
RIP: 0010:geneve_xmit_skb drivers/net/geneve.c:898 [inline]
RIP: 0010:geneve_xmit+0x4c38/0x5730 drivers/net/geneve.c:1039
Code: 21 c6 02 e9 35 d4 ff ff e8 a5 48 4c fb 90 0f 0b 90 e9 fd f5 ff ff e8 97 48 4c fb 90 0f 0b 90 e9 d8 f5 ff ff e8 89 48 4c fb 90 <0f> 0b 90 e9 41 e4 ff ff e8 7b 48 4c fb 90 0f 0b 90 e9 cd e7 ff ff
RSP: 0018:
ffffc90003b2f870 EFLAGS:
00010283
RAX:
000000000000037a RBX:
000000000000ffff RCX:
ffffc9000dc3d000
RDX:
0000000000080000 RSI:
ffffffff86428417 RDI:
0000000000000003
RBP:
ffffc90003b2f9f0 R08:
0000000000000003 R09:
000000000000ffff
R10:
000000000000ffff R11:
0000000000000002 R12:
ffff88806603c000
R13:
0000000000000000 R14:
ffff8880685b2780 R15:
0000000000000e23
FS:
00007fdc2deed6c0(0000) GS:
ffff8880b8600000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000001b30a1dff8 CR3:
0000000056b8c000 CR4:
00000000003526f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
<TASK>
__netdev_start_xmit include/linux/netdevice.h:5002 [inline]
netdev_start_xmit include/linux/netdevice.h:5011 [inline]
__dev_direct_xmit+0x58a/0x720 net/core/dev.c:4490
dev_direct_xmit include/linux/netdevice.h:3181 [inline]
packet_xmit+0x1e4/0x360 net/packet/af_packet.c:285
packet_snd net/packet/af_packet.c:3146 [inline]
packet_sendmsg+0x2700/0x5660 net/packet/af_packet.c:3178
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg net/socket.c:726 [inline]
__sys_sendto+0x488/0x4f0 net/socket.c:2197
__do_sys_sendto net/socket.c:2204 [inline]
__se_sys_sendto net/socket.c:2200 [inline]
__x64_sys_sendto+0xe0/0x1c0 net/socket.c:2200
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes:
a025fb5f49ad ("geneve: Allow configuration of DF behaviour")
Reported-by: syzbot+3ec5271486d7cb2d242a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
674f4b72.
050a0220.17bd51.004a.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Link: https://patch.msgid.link/20241203182122.2725517-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Tue, 3 Dec 2024 15:16:05 +0000 (16:16 +0100)]
mlxsw: spectrum_acl_flex_keys: Use correct key block on Spectrum-4
The driver is currently using an ACL key block that is not supported by
Spectrum-4. This works because the driver is only using a single field
from this key block which is located in the same offset in the
equivalent Spectrum-4 key block.
The issue was discovered when the firmware started rejecting the use of
the unsupported key block. The change has been reverted to avoid
breaking users that only update their firmware.
Nonetheless, fix the issue by using the correct key block.
Fixes:
07ff135958dd ("mlxsw: Introduce flex key elements for Spectrum-4")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/35e72c97bdd3bc414fb8e4d747e5fb5d26c29658.1733237440.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kory Maincent [Mon, 2 Dec 2024 15:33:57 +0000 (16:33 +0100)]
ethtool: Fix wrong mod state in case of verbose and no_mask bitset
A bitset without mask in a _SET request means we want exactly the bits in
the bitset to be set. This works correctly for compact format but when
verbose format is parsed, ethnl_update_bitset32_verbose() only sets the
bits present in the request bitset but does not clear the rest. The commit
6699170376ab ("ethtool: fix application of verbose no_mask bitset") fixes
this issue by clearing the whole target bitmap before we start iterating.
The solution proposed brought an issue with the behavior of the mod
variable. As the bitset is always cleared the old value will always
differ to the new value.
Fix it by adding a new function to compare bitmaps and a temporary variable
which save the state of the old bitmap.
Fixes:
6699170376ab ("ethtool: fix application of verbose no_mask bitset")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20241202153358.1142095-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 3 Dec 2024 09:48:15 +0000 (10:48 +0100)]
ipmr: tune the ipmr_can_free_table() checks.
Eric reported a syzkaller-triggered splat caused by recent ipmr changes:
WARNING: CPU: 2 PID: 6041 at net/ipv6/ip6mr.c:419
ip6mr_free_table+0xbd/0x120 net/ipv6/ip6mr.c:419
Modules linked in:
CPU: 2 UID: 0 PID: 6041 Comm: syz-executor183 Not tainted
6.12.0-syzkaller-10681-g65ae975e97d5 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:ip6mr_free_table+0xbd/0x120 net/ipv6/ip6mr.c:419
Code: 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c
02 00 75 58 49 83 bc 24 c0 0e 00 00 00 74 09 e8 44 ef a9 f7 90 <0f> 0b
90 e8 3b ef a9 f7 48 8d 7b 38 e8 12 a3 96 f7 48 89 df be 0f
RSP: 0018:
ffffc90004267bd8 EFLAGS:
00010293
RAX:
0000000000000000 RBX:
ffff88803c710000 RCX:
ffffffff89e4d844
RDX:
ffff88803c52c880 RSI:
ffffffff89e4d87c RDI:
ffff88803c578ec0
RBP:
0000000000000001 R08:
0000000000000005 R09:
0000000000000000
R10:
0000000000000001 R11:
0000000000000001 R12:
ffff88803c578000
R13:
ffff88803c710000 R14:
ffff88803c710008 R15:
dead000000000100
FS:
00007f7a855ee6c0(0000) GS:
ffff88806a800000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f7a85689938 CR3:
000000003c492000 CR4:
0000000000352ef0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
<TASK>
ip6mr_rules_exit+0x176/0x2d0 net/ipv6/ip6mr.c:283
ip6mr_net_exit_batch+0x53/0xa0 net/ipv6/ip6mr.c:1388
ops_exit_list+0x128/0x180 net/core/net_namespace.c:177
setup_net+0x4fe/0x860 net/core/net_namespace.c:394
copy_net_ns+0x2b4/0x6b0 net/core/net_namespace.c:500
create_new_namespaces+0x3ea/0xad0 kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
ksys_unshare+0x45d/0xa40 kernel/fork.c:3334
__do_sys_unshare kernel/fork.c:3405 [inline]
__se_sys_unshare kernel/fork.c:3403 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:3403
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f7a856332d9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 18 00 00 90 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:
00007f7a855ee238 EFLAGS:
00000246 ORIG_RAX:
0000000000000110
RAX:
ffffffffffffffda RBX:
00007f7a856bd308 RCX:
00007f7a856332d9
RDX:
00007f7a8560f8c6 RSI:
0000000000000000 RDI:
0000000062040200
RBP:
00007f7a856bd300 R08:
00007fff932160a7 R09:
00007f7a855ee6c0
R10:
0000000000000000 R11:
0000000000000246 R12:
00007f7a856bd30c
R13:
0000000000000000 R14:
00007fff93215fc0 R15:
00007fff932160a8
</TASK>
The root cause is a network namespace creation failing after successful
initialization of the ipmr subsystem. Such a case is not currently
matched by the ipmr_can_free_table() helper.
New namespaces are zeroed on allocation and inserted into net ns list
only after successful creation; when deleting an ipmr table, the list
next pointer can be NULL only on netns initialization failure.
Update the ipmr_can_free_table() checks leveraging such condition.
Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+6e8cb445d4b43d006e0c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
6e8cb445d4b43d006e0c
Fixes:
11b6e701bce9 ("ipmr: add debug check for mr table cleanup")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/8bde975e21bbca9d9c27e36209b2dd4f1d7a3f00.1733212078.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lion Ackermann [Mon, 2 Dec 2024 16:22:57 +0000 (17:22 +0100)]
net: sched: fix ordering of qlen adjustment
Changes to sch->q.qlen around qdisc_tree_reduce_backlog() need to happen
_before_ a call to said function because otherwise it may fail to notify
parent qdiscs when the child is about to become empty.
Signed-off-by: Lion Ackermann <nnamrec@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 2 Dec 2024 15:21:38 +0000 (10:21 -0500)]
net: sched: fix erspan_opt settings in cls_flower
When matching erspan_opt in cls_flower, only the (version, dir, hwid)
fields are relevant. However, in fl_set_erspan_opt() it initializes
all bits of erspan_opt and its mask to 1. This inadvertently requires
packets to match not only the (version, dir, hwid) fields but also the
other fields that are unexpectedly set to 1.
This patch resolves the issue by ensuring that only the (version, dir,
hwid) fields are configured in fl_set_erspan_opt(), leaving the other
fields to 0 in erspan_opt.
Fixes:
79b1011cb33d ("net: sched: allow flower to match erspan options")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Mon, 2 Dec 2024 16:48:05 +0000 (18:48 +0200)]
ethtool: Fix access to uninitialized fields in set RXNFC command
The check for non-zero ring with RSS is only relevant for
ETHTOOL_SRXCLSRLINS command, in other cases the check tries to access
memory which was not initialized by the userspace tool. Only perform the
check in case of ETHTOOL_SRXCLSRLINS.
Without this patch, filter deletion (for example) could statistically
result in a false error:
# ethtool --config-ntuple eth3 delete 484
rmgr: Cannot delete RX class rule: Invalid argument
Cannot delete classification rule
Fixes:
9e43ad7a1ede ("net: ethtool: only allow set_rxnfc with rss + ring_cookie if driver opts in")
Link: https://lore.kernel.org/netdev/871a9ecf-1e14-40dd-bbd7-e90c92f89d47@nvidia.com/
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20241202164805.1637093-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fernando Fernandez Mancera [Mon, 2 Dec 2024 15:56:08 +0000 (15:56 +0000)]
Revert "udp: avoid calling sock_def_readable() if possible"
This reverts commit
612b1c0dec5bc7367f90fc508448b8d0d7c05414. On a
scenario with multiple threads blocking on a recvfrom(), we need to call
sock_def_readable() on every __udp_enqueue_schedule_skb() otherwise the
threads won't be woken up as __skb_wait_for_more_packets() is using
prepare_to_wait_exclusive().
Link: https://bugzilla.redhat.com/2308477
Fixes:
612b1c0dec5b ("udp: avoid calling sock_def_readable() if possible")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241202155620.1719-1-ffmancera@riseup.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Mon, 2 Dec 2024 18:21:02 +0000 (18:21 +0000)]
net: Make napi_hash_lock irq safe
Make napi_hash_lock IRQ safe. It is used during the control path, and is
taken and released in napi_hash_add and napi_hash_del, which will
typically be called by calls to napi_enable and napi_disable.
This change avoids a deadlock in pcnet32 (and other any other drivers
which follow the same pattern):
CPU 0:
pcnet32_open
spin_lock_irqsave(&lp->lock, ...)
napi_enable
napi_hash_add <- before this executes, CPU 1 proceeds
spin_lock(napi_hash_lock)
[...]
spin_unlock_irqrestore(&lp->lock, flags);
CPU 1:
pcnet32_close
napi_disable
napi_hash_del
spin_lock(napi_hash_lock)
< INTERRUPT >
pcnet32_interrupt
spin_lock(lp->lock) <- DEADLOCK
Changing the napi_hash_lock to be IRQ safe prevents the IRQ from firing
on CPU 1 until napi_hash_lock is released, preventing the deadlock.
Cc: stable@vger.kernel.org
Fixes:
86e25f40aa1e ("net: napi: Add napi_config")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/netdev/
85dd4590-ea6b-427d-876a-
1d8559c7ad82@roeck-us.net/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241202182103.363038-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Mon, 2 Dec 2024 10:05:58 +0000 (10:05 +0000)]
net: hsr: must allocate more bytes for RedBox support
Blamed commit forgot to change hsr_init_skb() to allocate
larger skb for RedBox case.
Indeed, send_hsr_supervision_frame() will add
two additional components (struct hsr_sup_tlv
and struct hsr_sup_payload)
syzbot reported the following crash:
skbuff: skb_over_panic: text:
ffffffff8afd4b0a len:34 put:6 head:
ffff88802ad29e00 data:
ffff88802ad29f22 tail:0x144 end:0x140 dev:gretap0
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:206 !
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 2 UID: 0 PID: 7611 Comm: syz-executor Not tainted 6.12.0-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:skb_panic+0x157/0x1d0 net/core/skbuff.c:206
Code: b6 04 01 84 c0 74 04 3c 03 7e 21 8b 4b 70 41 56 45 89 e8 48 c7 c7 a0 7d 9b 8c 41 57 56 48 89 ee 52 4c 89 e2 e8 9a 76 79 f8 90 <0f> 0b 4c 89 4c 24 10 48 89 54 24 08 48 89 34 24 e8 94 76 fb f8 4c
RSP: 0018:
ffffc90000858ab8 EFLAGS:
00010282
RAX:
0000000000000087 RBX:
ffff8880598c08c0 RCX:
ffffffff816d3e69
RDX:
0000000000000000 RSI:
ffffffff816de786 RDI:
0000000000000005
RBP:
ffffffff8c9b91c0 R08:
0000000000000005 R09:
0000000000000000
R10:
0000000000000302 R11:
ffffffff961cc1d0 R12:
ffffffff8afd4b0a
R13:
0000000000000006 R14:
ffff88804b938130 R15:
0000000000000140
FS:
000055558a3d6500(0000) GS:
ffff88806a800000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f1295974ff8 CR3:
000000002ab6e000 CR4:
0000000000352ef0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
<IRQ>
skb_over_panic net/core/skbuff.c:211 [inline]
skb_put+0x174/0x1b0 net/core/skbuff.c:2617
send_hsr_supervision_frame+0x6fa/0x9e0 net/hsr/hsr_device.c:342
hsr_proxy_announce+0x1a3/0x4a0 net/hsr/hsr_device.c:436
call_timer_fn+0x1a0/0x610 kernel/time/timer.c:1794
expire_timers kernel/time/timer.c:1845 [inline]
__run_timers+0x6e8/0x930 kernel/time/timer.c:2419
__run_timer_base kernel/time/timer.c:2430 [inline]
__run_timer_base kernel/time/timer.c:2423 [inline]
run_timer_base+0x111/0x190 kernel/time/timer.c:2439
run_timer_softirq+0x1a/0x40 kernel/time/timer.c:2449
handle_softirqs+0x213/0x8f0 kernel/softirq.c:554
__do_softirq kernel/softirq.c:588 [inline]
invoke_softirq kernel/softirq.c:428 [inline]
__irq_exit_rcu kernel/softirq.c:637 [inline]
irq_exit_rcu+0xbb/0x120 kernel/softirq.c:649
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1049 [inline]
sysvec_apic_timer_interrupt+0xa4/0xc0 arch/x86/kernel/apic/apic.c:1049
</IRQ>
Fixes:
5055cccfc2d1 ("net: hsr: Provide RedBox support (HSR-SAN)")
Reported-by: syzbot+7f4643b267cc680bfa1c@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Lukasz Majewski <lukma@denx.de>
Link: https://patch.msgid.link/20241202100558.507765-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Cong Wang [Fri, 29 Nov 2024 21:25:19 +0000 (13:25 -0800)]
rtnetlink: fix double call of rtnl_link_get_net_ifla()
Currently rtnl_link_get_net_ifla() gets called twice when we create
peer devices, once in rtnl_add_peer_net() and once in each ->newlink()
implementation.
This looks safer, however, it leads to a classic Time-of-Check to
Time-of-Use (TOCTOU) bug since IFLA_NET_NS_PID is very dynamic. And
because of the lack of checking error pointer of the second call, it
also leads to a kernel crash as reported by syzbot.
Fix this by getting rid of the second call, which already becomes
redudant after Kuniyuki's work. We have to propagate the result of the
first rtnl_link_get_net_ifla() down to each ->newlink().
Reported-by: syzbot+21ba4d5adff0b6a7cfc6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
21ba4d5adff0b6a7cfc6
Fixes:
0eb87b02a705 ("veth: Set VETH_INFO_PEER to veth_link_ops.peer_type.")
Fixes:
6b84e558e95d ("vxcan: Set VXCAN_INFO_PEER to vxcan_link_ops.peer_type.")
Fixes:
fefd5d082172 ("netkit: Set IFLA_NETKIT_PEER_INFO to netkit_link_ops.peer_type.")
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241129212519.825567-1-xiyou.wangcong@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Louis Leseur [Thu, 28 Nov 2024 08:33:58 +0000 (09:33 +0100)]
net/qed: allow old cards not supporting "num_images" to work
Commit
43645ce03e00 ("qed: Populate nvm image attribute shadow.")
added support for populating flash image attributes, notably
"num_images". However, some cards were not able to return this
information. In such cases, the driver would return EINVAL, causing the
driver to exit.
Add check to return EOPNOTSUPP instead of EINVAL when the card is not
able to return these information. The caller function already handles
EOPNOTSUPP without error.
Fixes:
43645ce03e00 ("qed: Populate nvm image attribute shadow.")
Co-developed-by: Florian Forestier <florian@forestier.re>
Signed-off-by: Florian Forestier <florian@forestier.re>
Signed-off-by: Louis Leseur <louis.leseur@gmail.com>
Link: https://patch.msgid.link/20241128083633.26431-1-louis.leseur@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 3 Dec 2024 09:42:36 +0000 (10:42 +0100)]
Merge branch 'two-fixes-for-smc'
Wen Gu says:
====================
two fixes for SMC
This patch set contains two bugfixes, to fix SMC warning and panic
issues in race conditions.
====================
Link: https://patch.msgid.link/20241127133014.100509-1-guwen@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wen Gu [Wed, 27 Nov 2024 13:30:14 +0000 (21:30 +0800)]
net/smc: fix LGR and link use-after-free issue
We encountered a LGR/link use-after-free issue, which manifested as
the LGR/link refcnt reaching 0 early and entering the clear process,
making resource access unsafe.
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 14 PID: 107447 at lib/refcount.c:25 refcount_warn_saturate+0x9c/0x140
Workqueue: events smc_lgr_terminate_work [smc]
Call trace:
refcount_warn_saturate+0x9c/0x140
__smc_lgr_terminate.part.45+0x2a8/0x370 [smc]
smc_lgr_terminate_work+0x28/0x30 [smc]
process_one_work+0x1b8/0x420
worker_thread+0x158/0x510
kthread+0x114/0x118
or
refcount_t: underflow; use-after-free.
WARNING: CPU: 6 PID: 93140 at lib/refcount.c:28 refcount_warn_saturate+0xf0/0x140
Workqueue: smc_hs_wq smc_listen_work [smc]
Call trace:
refcount_warn_saturate+0xf0/0x140
smcr_link_put+0x1cc/0x1d8 [smc]
smc_conn_free+0x110/0x1b0 [smc]
smc_conn_abort+0x50/0x60 [smc]
smc_listen_find_device+0x75c/0x790 [smc]
smc_listen_work+0x368/0x8a0 [smc]
process_one_work+0x1b8/0x420
worker_thread+0x158/0x510
kthread+0x114/0x118
It is caused by repeated release of LGR/link refcnt. One suspect is that
smc_conn_free() is called repeatedly because some smc_conn_free() from
server listening path are not protected by sock lock.
e.g.
Calls under socklock | smc_listen_work
-------------------------------------------------------
lock_sock(sk) | smc_conn_abort
smc_conn_free | \- smc_conn_free
\- smcr_link_put | \- smcr_link_put (duplicated)
release_sock(sk)
So here add sock lock protection in smc_listen_work() path, making it
exclusive with other connection operations.
Fixes:
3b2dec2603d5 ("net/smc: restructure client and server code in af_smc")
Co-developed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Co-developed-by: Kai <KaiShen@linux.alibaba.com>
Signed-off-by: Kai <KaiShen@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wen Gu [Wed, 27 Nov 2024 13:30:13 +0000 (21:30 +0800)]
net/smc: initialize close_work early to avoid warning
We encountered a warning that close_work was canceled before
initialization.
WARNING: CPU: 7 PID: 111103 at kernel/workqueue.c:3047 __flush_work+0x19e/0x1b0
Workqueue: events smc_lgr_terminate_work [smc]
RIP: 0010:__flush_work+0x19e/0x1b0
Call Trace:
? __wake_up_common+0x7a/0x190
? work_busy+0x80/0x80
__cancel_work_timer+0xe3/0x160
smc_close_cancel_work+0x1a/0x70 [smc]
smc_close_active_abort+0x207/0x360 [smc]
__smc_lgr_terminate.part.38+0xc8/0x180 [smc]
process_one_work+0x19e/0x340
worker_thread+0x30/0x370
? process_one_work+0x340/0x340
kthread+0x117/0x130
? __kthread_cancel_work+0x50/0x50
ret_from_fork+0x22/0x30
This is because when smc_close_cancel_work is triggered, e.g. the RDMA
driver is rmmod and the LGR is terminated, the conn->close_work is
flushed before initialization, resulting in WARN_ON(!work->func).
__smc_lgr_terminate | smc_connect_{rdma|ism}
-------------------------------------------------------------
| smc_conn_create
| \- smc_lgr_register_conn
for conn in lgr->conns_all |
\- smc_conn_kill |
\- smc_close_active_abort |
\- smc_close_cancel_work |
\- cancel_work_sync |
\- __flush_work |
(close_work) |
| smc_close_init
| \- INIT_WORK(&close_work)
So fix this by initializing close_work before establishing the
connection.
Fixes:
46c28dbd4c23 ("net/smc: no socket state changes in tasklet context")
Fixes:
413498440e30 ("net/smc: add SMC-D support in af_smc")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kuniyuki Iwashima [Wed, 27 Nov 2024 05:05:12 +0000 (14:05 +0900)]
tipc: Fix use-after-free of kernel socket in cleanup_bearer().
syzkaller reported a use-after-free of UDP kernel socket
in cleanup_bearer() without repro. [0][1]
When bearer_disable() calls tipc_udp_disable(), cleanup
of the UDP kernel socket is deferred by work calling
cleanup_bearer().
tipc_net_stop() waits for such works to finish by checking
tipc_net(net)->wq_count. However, the work decrements the
count too early before releasing the kernel socket,
unblocking cleanup_net() and resulting in use-after-free.
Let's move the decrement after releasing the socket in
cleanup_bearer().
[0]:
ref_tracker: net notrefcnt@
000000009b3d1faf has 1/1 users at
sk_alloc+0x438/0x608
inet_create+0x4c8/0xcb0
__sock_create+0x350/0x6b8
sock_create_kern+0x58/0x78
udp_sock_create4+0x68/0x398
udp_sock_create+0x88/0xc8
tipc_udp_enable+0x5e8/0x848
__tipc_nl_bearer_enable+0x84c/0xed8
tipc_nl_bearer_enable+0x38/0x60
genl_family_rcv_msg_doit+0x170/0x248
genl_rcv_msg+0x400/0x5b0
netlink_rcv_skb+0x1dc/0x398
genl_rcv+0x44/0x68
netlink_unicast+0x678/0x8b0
netlink_sendmsg+0x5e4/0x898
____sys_sendmsg+0x500/0x830
[1]:
BUG: KMSAN: use-after-free in udp_hashslot include/net/udp.h:85 [inline]
BUG: KMSAN: use-after-free in udp_lib_unhash+0x3b8/0x930 net/ipv4/udp.c:1979
udp_hashslot include/net/udp.h:85 [inline]
udp_lib_unhash+0x3b8/0x930 net/ipv4/udp.c:1979
sk_common_release+0xaf/0x3f0 net/core/sock.c:3820
inet_release+0x1e0/0x260 net/ipv4/af_inet.c:437
inet6_release+0x6f/0xd0 net/ipv6/af_inet6.c:489
__sock_release net/socket.c:658 [inline]
sock_release+0xa0/0x210 net/socket.c:686
cleanup_bearer+0x42d/0x4c0 net/tipc/udp_media.c:819
process_one_work kernel/workqueue.c:3229 [inline]
process_scheduled_works+0xcaf/0x1c90 kernel/workqueue.c:3310
worker_thread+0xf6c/0x1510 kernel/workqueue.c:3391
kthread+0x531/0x6b0 kernel/kthread.c:389
ret_from_fork+0x60/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:244
Uninit was created at:
slab_free_hook mm/slub.c:2269 [inline]
slab_free mm/slub.c:4580 [inline]
kmem_cache_free+0x207/0xc40 mm/slub.c:4682
net_free net/core/net_namespace.c:454 [inline]
cleanup_net+0x16f2/0x19d0 net/core/net_namespace.c:647
process_one_work kernel/workqueue.c:3229 [inline]
process_scheduled_works+0xcaf/0x1c90 kernel/workqueue.c:3310
worker_thread+0xf6c/0x1510 kernel/workqueue.c:3391
kthread+0x531/0x6b0 kernel/kthread.c:389
ret_from_fork+0x60/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:244
CPU: 0 UID: 0 PID: 54 Comm: kworker/0:2 Not tainted
6.12.0-rc1-00131-gf66ebf37d69c #7
91723d6f74857f70725e1583cba3cf4adc716cfa
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: events cleanup_bearer
Fixes:
26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241127050512.28438-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Ivan Solodovnikov [Tue, 26 Nov 2024 14:39:02 +0000 (17:39 +0300)]
dccp: Fix memory leak in dccp_feat_change_recv
If dccp_feat_push_confirm() fails after new value for SP feature was accepted
without reconciliation ('entry == NULL' branch), memory allocated for that value
with dccp_feat_clone_sp_val() is never freed.
Here is the kmemleak stack for this:
unreferenced object 0xffff88801d4ab488 (size 8):
comm "syz-executor310", pid 1127, jiffies
4295085598 (age 41.666s)
hex dump (first 8 bytes):
01 b4 4a 1d 80 88 ff ff ..J.....
backtrace:
[<
00000000db7cabfe>] kmemdup+0x23/0x50 mm/util.c:128
[<
0000000019b38405>] kmemdup include/linux/string.h:465 [inline]
[<
0000000019b38405>] dccp_feat_clone_sp_val net/dccp/feat.c:371 [inline]
[<
0000000019b38405>] dccp_feat_clone_sp_val net/dccp/feat.c:367 [inline]
[<
0000000019b38405>] dccp_feat_change_recv net/dccp/feat.c:1145 [inline]
[<
0000000019b38405>] dccp_feat_parse_options+0x1196/0x2180 net/dccp/feat.c:1416
[<
00000000b1f6d94a>] dccp_parse_options+0xa2a/0x1260 net/dccp/options.c:125
[<
0000000030d7b621>] dccp_rcv_state_process+0x197/0x13d0 net/dccp/input.c:650
[<
000000001f74c72e>] dccp_v4_do_rcv+0xf9/0x1a0 net/dccp/ipv4.c:688
[<
00000000a6c24128>] sk_backlog_rcv include/net/sock.h:1041 [inline]
[<
00000000a6c24128>] __release_sock+0x139/0x3b0 net/core/sock.c:2570
[<
00000000cf1f3a53>] release_sock+0x54/0x1b0 net/core/sock.c:3111
[<
000000008422fa23>] inet_wait_for_connect net/ipv4/af_inet.c:603 [inline]
[<
000000008422fa23>] __inet_stream_connect+0x5d0/0xf70 net/ipv4/af_inet.c:696
[<
0000000015b6f64d>] inet_stream_connect+0x53/0xa0 net/ipv4/af_inet.c:735
[<
0000000010122488>] __sys_connect_file+0x15c/0x1a0 net/socket.c:1865
[<
00000000b4b70023>] __sys_connect+0x165/0x1a0 net/socket.c:1882
[<
00000000f4cb3815>] __do_sys_connect net/socket.c:1892 [inline]
[<
00000000f4cb3815>] __se_sys_connect net/socket.c:1889 [inline]
[<
00000000f4cb3815>] __x64_sys_connect+0x6e/0xb0 net/socket.c:1889
[<
00000000e7b1e839>] do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
[<
0000000055e91434>] entry_SYSCALL_64_after_hwframe+0x67/0xd1
Clean up the allocated memory in case of dccp_feat_push_confirm() failure
and bail out with an error reset code.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes:
e77b8363b2ea ("dccp: Process incoming Change feature-negotiation options")
Signed-off-by: Ivan Solodovnikov <solodovnikov.ia@phystech.edu>
Link: https://patch.msgid.link/20241126143902.190853-1-solodovnikov.ia@phystech.edu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jiri Wiesner [Thu, 28 Nov 2024 08:59:50 +0000 (09:59 +0100)]
net/ipv6: release expired exception dst cached in socket
Dst objects get leaked in ip6_negative_advice() when this function is
executed for an expired IPv6 route located in the exception table. There
are several conditions that must be fulfilled for the leak to occur:
* an ICMPv6 packet indicating a change of the MTU for the path is received,
resulting in an exception dst being created
* a TCP connection that uses the exception dst for routing packets must
start timing out so that TCP begins retransmissions
* after the exception dst expires, the FIB6 garbage collector must not run
before TCP executes ip6_negative_advice() for the expired exception dst
When TCP executes ip6_negative_advice() for an exception dst that has
expired and if no other socket holds a reference to the exception dst, the
refcount of the exception dst is 2, which corresponds to the increment
made by dst_init() and the increment made by the TCP socket for which the
connection is timing out. The refcount made by the socket is never
released. The refcount of the dst is decremented in sk_dst_reset() but
that decrement is counteracted by a dst_hold() intentionally placed just
before the sk_dst_reset() in ip6_negative_advice(). After
ip6_negative_advice() has finished, there is no other object tied to the
dst. The socket lost its reference stored in sk_dst_cache and the dst is
no longer in the exception table. The exception dst becomes a leaked
object.
As a result of this dst leak, an unbalanced refcount is reported for the
loopback device of a net namespace being destroyed under kernels that do
not contain
e5f80fcf869a ("ipv6: give an IPv6 dev to blackhole_netdev"):
unregister_netdevice: waiting for lo to become free. Usage count = 2
Fix the dst leak by removing the dst_hold() in ip6_negative_advice(). The
patch that introduced the dst_hold() in ip6_negative_advice() was
92f1655aa2b22 ("net: fix __dst_negative_advice() race"). But
92f1655aa2b22
merely refactored the code with regards to the dst refcount so the issue
was present even before
92f1655aa2b22. The bug was introduced in
54c1a859efd9f ("ipv6: Don't drop cache route entry unless timer actually
expired.") where the expired cached route is deleted and the sk_dst_cache
member of the socket is set to NULL by calling dst_negative_advice() but
the refcount belonging to the socket is left unbalanced.
The IPv4 version - ipv4_negative_advice() - is not affected by this bug.
When the TCP connection times out ipv4_negative_advice() merely resets the
sk_dst_cache of the socket while decrementing the refcount of the
exception dst.
Fixes:
92f1655aa2b22 ("net: fix __dst_negative_advice() race")
Fixes:
54c1a859efd9f ("ipv6: Don't drop cache route entry unless timer actually expired.")
Link: https://lore.kernel.org/netdev/20241113105611.GA6723@incl/T/#u
Signed-off-by: Jiri Wiesner <jwiesner@suse.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241128085950.GA4505@incl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oleksij Rempel [Mon, 25 Nov 2024 08:40:50 +0000 (09:40 +0100)]
net: phy: microchip: Reset LAN88xx PHY to ensure clean link state on LAN7800/7850
Fix outdated MII_LPA data in the LAN88xx PHY, which is used in LAN7800
and LAN7850 USB Ethernet controllers. Due to a hardware limitation, the
PHY cannot reliably update link status after parallel detection when the
link partner does not support auto-negotiation. To mitigate this, add a
PHY reset in `lan88xx_link_change_notify()` when `phydev->state` is
`PHY_NOLINK`, ensuring the PHY starts in a clean state and reports
accurate fixed link parallel detection results.
Fixes:
792aec47d59d9 ("add microchip LAN88xx phy driver")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241125084050.414352-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 3 Dec 2024 02:04:10 +0000 (18:04 -0800)]
Merge tag 'linux-can-fixes-for-6.13-
20241202' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2024-12-02
The first patch is by me and allows the use of sleeping GPIOs to set
termination GPIOs.
Alexander Kozhinov fixes the gs_usb driver to use the endpoints
provided by the usb endpoint descriptions instead of hard coded ones.
Dario Binacchi contributes 11 statistics related patches for various
CAN driver. A potential use after free in the hi311x is fixed. The
statistics for the c_can, sun4i_can, hi311x, m_can, ifi_canfd,
sja1000, sun4i_can, ems_usb, f81604 are fixed: update statistics even
if the allocation of the error skb fails and fix the incrementing of
the rx,tx error counters.
A patch by me fixes the workaround for DS80000789E 6 erratum in the
mcp251xfd driver.
The last patch is by Dmitry Antipov, targets the j1939 CAN protocol
and fixes a skb reference counting issue.
* tag 'linux-can-fixes-for-6.13-
20241202' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: j1939: j1939_session_new(): fix skb reference counting
can: mcp251xfd: mcp251xfd_get_tef_len(): work around erratum DS80000789E 6.
can: f81604: f81604_handle_can_bus_errors(): fix {rx,tx}_errors statistics
can: ems_usb: ems_usb_rx_err(): fix {rx,tx}_errors statistics
can: sun4i_can: sun4i_can_err(): fix {rx,tx}_errors statistics
can: sja1000: sja1000_err(): fix {rx,tx}_errors statistics
can: hi311x: hi3110_can_ist(): fix {rx,tx}_errors statistics
can: ifi_canfd: ifi_canfd_handle_lec_err(): fix {rx,tx}_errors statistics
can: m_can: m_can_handle_lec_err(): fix {rx,tx}_errors statistics
can: hi311x: hi3110_can_ist(): update state error statistics if skb allocation fails
can: hi311x: hi3110_can_ist(): fix potential use-after-free
can: sun4i_can: sun4i_can_err(): call can_change_state() even if cf is NULL
can: c_can: c_can_handle_bus_err(): update statistics if skb allocation fails
can: gs_usb: add usb endpoint address detection at driver probe step
can: dev: can_set_termination(): allow sleeping GPIOs
====================
Link: https://patch.msgid.link/20241202090040.1110280-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 30 Nov 2024 21:41:00 +0000 (13:41 -0800)]
MAINTAINERS: list PTP drivers under networking
PTP patches go via the netdev trees, add drivers/ptp/ to the networking
entry so that get_maintainer.pl --scm lists those trees above Linus's
tree.
Thanks to the real entry using drivers/ptp/* the original entry will
still be considered more specific / higher prio.
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://patch.msgid.link/20241130214100.125325-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geetha sowjanya [Tue, 26 Nov 2024 11:44:31 +0000 (17:14 +0530)]
octeontx2-af: Fix SDP MAC link credits configuration
Current driver allows only packet size < 512B as SDP_LINK_CREDIT
register is set to default value.
This patch fixes this issue by configure the register with
maximum HW supported value to allow packet size > 512B.
Fixes:
2f7f33a09516 ("octeontx2-pf: Add representors for sdp MAC")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Antipov [Tue, 5 Nov 2024 09:48:23 +0000 (12:48 +0300)]
can: j1939: j1939_session_new(): fix skb reference counting
Since j1939_session_skb_queue() does an extra skb_get() for each new
skb, do the same for the initial one in j1939_session_new() to avoid
refcount underflow.
Reported-by: syzbot+d4e8dc385d9258220c31@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
d4e8dc385d9258220c31
Fixes:
9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241105094823.2403806-1-dmantipov@yandex.ru
[mkl: clean up commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Eric Dumazet [Tue, 26 Nov 2024 19:28:27 +0000 (19:28 +0000)]
ipv6: avoid possible NULL deref in modify_prefix_route()
syzbot found a NULL deref [1] in modify_prefix_route(), caused by one
fib6_info without a fib6_table pointer set.
This can happen for net->ipv6.fib6_null_entry
[1]
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
CPU: 1 UID: 0 PID: 5837 Comm: syz-executor888 Not tainted
6.12.0-syzkaller-09567-g7eef7e306d3c #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:__lock_acquire+0xe4/0x3c40 kernel/locking/lockdep.c:5089
Code: 08 84 d2 0f 85 15 14 00 00 44 8b 0d ca 98 f5 0e 45 85 c9 0f 84 b4 0e 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 96 2c 00 00 49 8b 04 24 48 3d a0 07 7f 93 0f 84
RSP: 0018:
ffffc900035d7268 EFLAGS:
00010006
RAX:
dffffc0000000000 RBX:
0000000000000000 RCX:
0000000000000000
RDX:
0000000000000006 RSI:
1ffff920006bae5f RDI:
0000000000000030
RBP:
0000000000000000 R08:
0000000000000001 R09:
0000000000000001
R10:
ffffffff90608e17 R11:
0000000000000001 R12:
0000000000000030
R13:
ffff888036334880 R14:
0000000000000000 R15:
0000000000000000
FS:
0000555579e90380(0000) GS:
ffff8880b8700000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007ffc59cc4278 CR3:
0000000072b54000 CR4:
00000000003526f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
<TASK>
lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5849
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x33/0x40 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
modify_prefix_route+0x30b/0x8b0 net/ipv6/addrconf.c:4831
inet6_addr_modify net/ipv6/addrconf.c:4923 [inline]
inet6_rtm_newaddr+0x12c7/0x1ab0 net/ipv6/addrconf.c:5055
rtnetlink_rcv_msg+0x3c7/0xea0 net/core/rtnetlink.c:6920
netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2541
netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1347
netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1891
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg net/socket.c:726 [inline]
____sys_sendmsg+0xaaf/0xc90 net/socket.c:2583
___sys_sendmsg+0x135/0x1e0 net/socket.c:2637
__sys_sendmsg+0x16e/0x220 net/socket.c:2669
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fd1dcef8b79
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:
00007ffc59cc4378 EFLAGS:
00000246 ORIG_RAX:
000000000000002e
RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007fd1dcef8b79
RDX:
0000000000040040 RSI:
0000000020000140 RDI:
0000000000000004
RBP:
00000000000113fd R08:
0000000000000006 R09:
0000000000000006
R10:
0000000000000006 R11:
0000000000000246 R12:
00007ffc59cc438c
R13:
431bde82d7b634db R14:
0000000000000001 R15:
0000000000000001
</TASK>
Fixes:
5eb902b8e719 ("net/ipv6: Remove expired routes with a separated list of routes.")
Reported-by: syzbot+1de74b0794c40c8eb300@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
67461f7f.
050a0220.1286eb.0021.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
CC: Kui-Feng Lee <thinker.li@gmail.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dong Chenchen [Wed, 27 Nov 2024 04:08:50 +0000 (12:08 +0800)]
net: Fix icmp host relookup triggering ip_rt_bug
arp link failure may trigger ip_rt_bug while xfrm enabled, call trace is:
WARNING: CPU: 0 PID: 0 at net/ipv4/route.c:1241 ip_rt_bug+0x14/0x20
Modules linked in:
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted
6.12.0-rc6-00077-g2e1b3cc9d7f7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:ip_rt_bug+0x14/0x20
Call Trace:
<IRQ>
ip_send_skb+0x14/0x40
__icmp_send+0x42d/0x6a0
ipv4_link_failure+0xe2/0x1d0
arp_error_report+0x3c/0x50
neigh_invalidate+0x8d/0x100
neigh_timer_handler+0x2e1/0x330
call_timer_fn+0x21/0x120
__run_timer_base.part.0+0x1c9/0x270
run_timer_softirq+0x4c/0x80
handle_softirqs+0xac/0x280
irq_exit_rcu+0x62/0x80
sysvec_apic_timer_interrupt+0x77/0x90
The script below reproduces this scenario:
ip xfrm policy add src 0.0.0.0/0 dst 0.0.0.0/0 \
dir out priority 0 ptype main flag localok icmp
ip l a veth1 type veth
ip a a 192.168.141.111/24 dev veth0
ip l s veth0 up
ping 192.168.141.155 -c 1
icmp_route_lookup() create input routes for locally generated packets
while xfrm relookup ICMP traffic.Then it will set input route
(dst->out = ip_rt_bug) to skb for DESTUNREACH.
For ICMP err triggered by locally generated packets, dst->dev of output
route is loopback. Generally, xfrm relookup verification is not required
on loopback interfaces (net.ipv4.conf.lo.disable_xfrm = 1).
Skip icmp relookup for locally generated packets to fix it.
Fixes:
8b7817f3a959 ("[IPSEC]: Add ICMP host relookup support")
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241127040850.1513135-1-dongchenchen2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 30 Nov 2024 22:16:13 +0000 (14:16 -0800)]
Merge branch 'bnxt-fix-failure-to-report-rss-context-in-ntuple-rule'
Daniel Xu says:
====================
bnxt: Fix failure to report RSS context in ntuple rule
This patchset fixes a bug where bnxt driver was failing to report that
an ntuple rule is redirecting to an RSS context. First commit is the
fix, then second commit extends selftests to detect if other/new drivers
are compliant with ntuple/rss_ctx API.
====================
Link: https://patch.msgid.link/cover.1732748253.git.dxu@dxuuu.xyz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Xu [Wed, 27 Nov 2024 22:58:30 +0000 (15:58 -0700)]
selftests: drv-net: rss_ctx: Add test for ntuple rule
Extend the rss_ctx test suite to test that an ntuple action that
redirects to an RSS context contains that information in `ethtool -n`.
Otherwise the output from ethtool is highly deceiving. This test helps
ensure drivers are compliant with the API.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://patch.msgid.link/759870e430b7c93ecaae6e448f30a47284c59637.1732748253.git.dxu@dxuuu.xyz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Xu [Wed, 27 Nov 2024 22:58:29 +0000 (15:58 -0700)]
bnxt_en: ethtool: Supply ntuple rss context action
Commit
2f4f9fe5bf5f ("bnxt_en: Support adding ntuple rules on RSS
contexts") added support for redirecting to an RSS context as an ntuple
rule action. However, it forgot to update the ETHTOOL_GRXCLSRULE
codepath. This caused `ethtool -n` to always report the action as
"Action: Direct to queue 0" which is wrong.
Fix by teaching bnxt driver to report the RSS context when applicable.
Fixes:
2f4f9fe5bf5f ("bnxt_en: Support adding ntuple rules on RSS contexts")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://patch.msgid.link/2e884ae39e08dc5123be7c170a6089cefe6a78f7.1732748253.git.dxu@dxuuu.xyz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 26 Nov 2024 14:43:44 +0000 (14:43 +0000)]
net: hsr: avoid potential out-of-bound access in fill_frame_info()
syzbot is able to feed a packet with 14 bytes, pretending
it is a vlan one.
Since fill_frame_info() is relying on skb->mac_len already,
extend the check to cover this case.
BUG: KMSAN: uninit-value in fill_frame_info net/hsr/hsr_forward.c:709 [inline]
BUG: KMSAN: uninit-value in hsr_forward_skb+0x9ee/0x3b10 net/hsr/hsr_forward.c:724
fill_frame_info net/hsr/hsr_forward.c:709 [inline]
hsr_forward_skb+0x9ee/0x3b10 net/hsr/hsr_forward.c:724
hsr_dev_xmit+0x2f0/0x350 net/hsr/hsr_device.c:235
__netdev_start_xmit include/linux/netdevice.h:5002 [inline]
netdev_start_xmit include/linux/netdevice.h:5011 [inline]
xmit_one net/core/dev.c:3590 [inline]
dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3606
__dev_queue_xmit+0x366a/0x57d0 net/core/dev.c:4434
dev_queue_xmit include/linux/netdevice.h:3168 [inline]
packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3146 [inline]
packet_sendmsg+0x91ae/0xa6f0 net/packet/af_packet.c:3178
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:726
__sys_sendto+0x594/0x750 net/socket.c:2197
__do_sys_sendto net/socket.c:2204 [inline]
__se_sys_sendto net/socket.c:2200 [inline]
__x64_sys_sendto+0x125/0x1d0 net/socket.c:2200
x64_sys_call+0x346a/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:45
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was created at:
slab_post_alloc_hook mm/slub.c:4091 [inline]
slab_alloc_node mm/slub.c:4134 [inline]
kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4186
kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
__alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
alloc_skb include/linux/skbuff.h:1323 [inline]
alloc_skb_with_frags+0xc8/0xd00 net/core/skbuff.c:6612
sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2881
packet_alloc_skb net/packet/af_packet.c:2995 [inline]
packet_snd net/packet/af_packet.c:3089 [inline]
packet_sendmsg+0x74c6/0xa6f0 net/packet/af_packet.c:3178
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:726
__sys_sendto+0x594/0x750 net/socket.c:2197
__do_sys_sendto net/socket.c:2204 [inline]
__se_sys_sendto net/socket.c:2200 [inline]
__x64_sys_sendto+0x125/0x1d0 net/socket.c:2200
x64_sys_call+0x346a/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:45
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes:
48b491a5cc74 ("net: hsr: fix mac_len checks")
Reported-by: syzbot+671e2853f9851d039551@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
6745dc7f.
050a0220.21d33d.0018.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: WingMan Kwok <w-kwok2@ti.com>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: MD Danish Anwar <danishanwar@ti.com>
Cc: Jiri Pirko <jiri@nvidia.com>
Cc: George McCollister <george.mccollister@gmail.com>
Link: https://patch.msgid.link/20241126144344.4177332-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vyshnav Ajith [Thu, 21 Nov 2024 22:48:27 +0000 (04:18 +0530)]
docs: net: bareudp: fix spelling and grammar mistakes
The BareUDP documentation had several grammar and spelling mistakes,
making it harder to read. This patch fixes those errors to improve
clarity and readability for developers.
Signed-off-by: Vyshnav Ajith <puthen1977@gmail.com>
Link: https://patch.msgid.link/20241121224827.12293-1-puthen1977@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 26 Nov 2024 14:59:11 +0000 (14:59 +0000)]
selinux: use sk_to_full_sk() in selinux_ip_output()
In blamed commit, TCP started to attach timewait sockets to
some skbs.
syzbot reported that selinux_ip_output() was not expecting them yet.
Note that using sk_to_full_sk() is still allowing the
following sk_listener() check to work as before.
BUG: KASAN: slab-out-of-bounds in selinux_sock security/selinux/include/objsec.h:207 [inline]
BUG: KASAN: slab-out-of-bounds in selinux_ip_output+0x1e0/0x1f0 security/selinux/hooks.c:5761
Read of size 8 at addr
ffff88804e86e758 by task syz-executor347/5894
CPU: 0 UID: 0 PID: 5894 Comm: syz-executor347 Not tainted
6.12.0-syzkaller-05480-gfcc79e1714e8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:377 [inline]
print_report+0xc3/0x620 mm/kasan/report.c:488
kasan_report+0xd9/0x110 mm/kasan/report.c:601
selinux_sock security/selinux/include/objsec.h:207 [inline]
selinux_ip_output+0x1e0/0x1f0 security/selinux/hooks.c:5761
nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
nf_hook_slow+0xbb/0x200 net/netfilter/core.c:626
nf_hook+0x386/0x6d0 include/linux/netfilter.h:269
__ip_local_out+0x339/0x640 net/ipv4/ip_output.c:119
ip_local_out net/ipv4/ip_output.c:128 [inline]
ip_send_skb net/ipv4/ip_output.c:1505 [inline]
ip_push_pending_frames+0xa0/0x5b0 net/ipv4/ip_output.c:1525
ip_send_unicast_reply+0xd0e/0x1650 net/ipv4/ip_output.c:1672
tcp_v4_send_ack+0x976/0x13f0 net/ipv4/tcp_ipv4.c:1024
tcp_v4_timewait_ack net/ipv4/tcp_ipv4.c:1077 [inline]
tcp_v4_rcv+0x2f96/0x4390 net/ipv4/tcp_ipv4.c:2428
ip_protocol_deliver_rcu+0xba/0x4c0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x316/0x570 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
NF_HOOK include/linux/netfilter.h:308 [inline]
ip_local_deliver+0x18e/0x1f0 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_rcv_finish net/ipv4/ip_input.c:447 [inline]
NF_HOOK include/linux/netfilter.h:314 [inline]
NF_HOOK include/linux/netfilter.h:308 [inline]
ip_rcv+0x2c3/0x5d0 net/ipv4/ip_input.c:567
__netif_receive_skb_one_core+0x199/0x1e0 net/core/dev.c:5672
__netif_receive_skb+0x1d/0x160 net/core/dev.c:5785
process_backlog+0x443/0x15f0 net/core/dev.c:6117
__napi_poll.constprop.0+0xb7/0x550 net/core/dev.c:6877
napi_poll net/core/dev.c:6946 [inline]
net_rx_action+0xa94/0x1010 net/core/dev.c:7068
handle_softirqs+0x213/0x8f0 kernel/softirq.c:554
do_softirq kernel/softirq.c:455 [inline]
do_softirq+0xb2/0xf0 kernel/softirq.c:442
</IRQ>
<TASK>
__local_bh_enable_ip+0x100/0x120 kernel/softirq.c:382
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline]
__dev_queue_xmit+0x8af/0x43e0 net/core/dev.c:4461
dev_queue_xmit include/linux/netdevice.h:3168 [inline]
neigh_hh_output include/net/neighbour.h:523 [inline]
neigh_output include/net/neighbour.h:537 [inline]
ip_finish_output2+0xc6c/0x2150 net/ipv4/ip_output.c:236
__ip_finish_output net/ipv4/ip_output.c:314 [inline]
__ip_finish_output+0x49e/0x950 net/ipv4/ip_output.c:296
ip_finish_output+0x35/0x380 net/ipv4/ip_output.c:324
NF_HOOK_COND include/linux/netfilter.h:303 [inline]
ip_output+0x13b/0x2a0 net/ipv4/ip_output.c:434
dst_output include/net/dst.h:450 [inline]
ip_local_out+0x33e/0x4a0 net/ipv4/ip_output.c:130
__ip_queue_xmit+0x777/0x1970 net/ipv4/ip_output.c:536
__tcp_transmit_skb+0x2b39/0x3df0 net/ipv4/tcp_output.c:1466
tcp_transmit_skb net/ipv4/tcp_output.c:1484 [inline]
tcp_write_xmit+0x12b1/0x8560 net/ipv4/tcp_output.c:2827
__tcp_push_pending_frames+0xaf/0x390 net/ipv4/tcp_output.c:3010
tcp_send_fin+0x154/0xc70 net/ipv4/tcp_output.c:3616
__tcp_close+0x96b/0xff0 net/ipv4/tcp.c:3130
tcp_close+0x28/0x120 net/ipv4/tcp.c:3221
inet_release+0x13c/0x280 net/ipv4/af_inet.c:435
__sock_release net/socket.c:640 [inline]
sock_release+0x8e/0x1d0 net/socket.c:668
smc_clcsock_release+0xb7/0xe0 net/smc/smc_close.c:34
__smc_release+0x5c2/0x880 net/smc/af_smc.c:301
smc_release+0x1fc/0x5f0 net/smc/af_smc.c:344
__sock_release+0xb0/0x270 net/socket.c:640
sock_close+0x1c/0x30 net/socket.c:1408
__fput+0x3f8/0xb60 fs/file_table.c:450
__fput_sync+0xa1/0xc0 fs/file_table.c:535
__do_sys_close fs/open.c:1550 [inline]
__se_sys_close fs/open.c:1535 [inline]
__x64_sys_close+0x86/0x100 fs/open.c:1535
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6814c9ae10
Code: ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 80 3d b1 e2 07 00 00 74 17 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
RSP: 002b:
00007fffb2389758 EFLAGS:
00000202 ORIG_RAX:
0000000000000003
RAX:
ffffffffffffffda RBX:
0000000000000004 RCX:
00007f6814c9ae10
RDX:
0000000000000010 RSI:
0000000020000000 RDI:
0000000000000003
RBP:
00000000000f4240 R08:
0000000000000001 R09:
0000000000000001
R10:
0000000000000001 R11:
0000000000000202 R12:
00007fffb23897b0
R13:
00000000000141c3 R14:
00007fffb238977c R15:
00007fffb2389790
</TASK>
Fixes:
79636038d37e ("ipv4: tcp: give socket pointer to control skbs")
Reported-by: syzbot+2d9f5f948c31dcb7745e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/
6745e1a2.
050a0220.1286eb.001c.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241126145911.4187198-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Ottens [Mon, 25 Nov 2024 17:46:07 +0000 (18:46 +0100)]
net/sched: tbf: correct backlog statistic for GSO packets
When the length of a GSO packet in the tbf qdisc is larger than the burst
size configured the packet will be segmented by the tbf_segment function.
Whenever this function is used to enqueue SKBs, the backlog statistic of
the tbf is not increased correctly. This can lead to underflows of the
'backlog' byte-statistic value when these packets are dequeued from tbf.
Reproduce the bug:
Ensure that the sender machine has GSO enabled. Configured the tbf on
the outgoing interface of the machine as follows (burstsize = 1 MTU):
$ tc qdisc add dev <oif> root handle 1: tbf rate 50Mbit burst 1514 latency 50ms
Send bulk TCP traffic out via this interface, e.g., by running an iPerf3
client on this machine. Check the qdisc statistics:
$ tc -s qdisc show dev <oif>
The 'backlog' byte-statistic has incorrect values while traffic is
transferred, e.g., high values due to u32 underflows. When the transfer
is stopped, the value is != 0, which should never happen.
This patch fixes this bug by updating the statistics correctly, even if
single SKBs of a GSO SKB cannot be enqueued.
Fixes:
e43ac79a4bc6 ("sch_tbf: segment too big GSO packets")
Signed-off-by: Martin Ottens <martin.ottens@fau.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241125174608.1484356-1-martin.ottens@fau.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ajay Kaher [Mon, 25 Nov 2024 10:59:54 +0000 (10:59 +0000)]
ptp: Add error handling for adjfine callback in ptp_clock_adjtime
ptp_clock_adjtime sets ptp->dialed_frequency even when adjfine
callback returns an error. This causes subsequent reads to return
an incorrect value.
Fix this by adding error check before ptp->dialed_frequency is set.
Fixes:
39a8cbd9ca05 ("ptp: remember the adjusted frequency")
Signed-off-by: Ajay Kaher <ajay.kaher@broadcom.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://patch.msgid.link/20241125105954.1509971-1-ajay.kaher@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Mon, 25 Nov 2024 09:30:39 +0000 (09:30 +0000)]
tcp: populate XPS related fields of timewait sockets
syzbot reported that netdev_core_pick_tx() was reading an uninitialized
field [1].
This is indeed hapening for timewait sockets after recent commits.
We can copy the original established socket sk_tx_queue_mapping
and sk_rx_queue_mapping fields, instead of adding more checks
in fast paths.
As a bonus, packets will use the same transmit queue than
prior ones, this potentially can avoid reordering.
[1]
BUG: KMSAN: uninit-value in netdev_pick_tx+0x5c7/0x1550
netdev_pick_tx+0x5c7/0x1550
netdev_core_pick_tx+0x1d2/0x4a0 net/core/dev.c:4312
__dev_queue_xmit+0x128a/0x57d0 net/core/dev.c:4394
dev_queue_xmit include/linux/netdevice.h:3168 [inline]
neigh_hh_output include/net/neighbour.h:523 [inline]
neigh_output include/net/neighbour.h:537 [inline]
ip_finish_output2+0x187c/0x1b70 net/ipv4/ip_output.c:236
__ip_finish_output+0x287/0x810
ip_finish_output+0x4b/0x600 net/ipv4/ip_output.c:324
NF_HOOK_COND include/linux/netfilter.h:303 [inline]
ip_output+0x15f/0x3f0 net/ipv4/ip_output.c:434
dst_output include/net/dst.h:450 [inline]
ip_local_out net/ipv4/ip_output.c:130 [inline]
ip_send_skb net/ipv4/ip_output.c:1505 [inline]
ip_push_pending_frames+0x444/0x570 net/ipv4/ip_output.c:1525
ip_send_unicast_reply+0x18c1/0x1b30 net/ipv4/ip_output.c:1672
tcp_v4_send_reset+0x238d/0x2a40 net/ipv4/tcp_ipv4.c:910
tcp_v4_rcv+0x48f8/0x5750 net/ipv4/tcp_ipv4.c:2431
ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x336/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_sublist_rcv_finish net/ipv4/ip_input.c:578 [inline]
ip_list_rcv_finish net/ipv4/ip_input.c:628 [inline]
ip_sublist_rcv+0x15f3/0x17f0 net/ipv4/ip_input.c:636
ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:670
__netif_receive_skb_list_ptype net/core/dev.c:5715 [inline]
__netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5762
__netif_receive_skb_list net/core/dev.c:5814 [inline]
netif_receive_skb_list_internal+0x1085/0x1700 net/core/dev.c:5905
gro_normal_list include/net/gro.h:515 [inline]
napi_complete_done+0x3d4/0x810 net/core/dev.c:6256
virtqueue_napi_complete drivers/net/virtio_net.c:758 [inline]
virtnet_poll+0x5d80/0x6bf0 drivers/net/virtio_net.c:3013
__napi_poll+0xe7/0x980 net/core/dev.c:6877
napi_poll net/core/dev.c:6946 [inline]
net_rx_action+0xa5a/0x19b0 net/core/dev.c:7068
handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:554
__do_softirq kernel/softirq.c:588 [inline]
invoke_softirq kernel/softirq.c:428 [inline]
__irq_exit_rcu+0x68/0x180 kernel/softirq.c:655
irq_exit_rcu+0x12/0x20 kernel/softirq.c:671
common_interrupt+0x97/0xb0 arch/x86/kernel/irq.c:278
asm_common_interrupt+0x2b/0x40 arch/x86/include/asm/idtentry.h:693
__preempt_count_sub arch/x86/include/asm/preempt.h:84 [inline]
kmsan_virt_addr_valid arch/x86/include/asm/kmsan.h:95 [inline]
virt_to_page_or_null+0xfb/0x150 mm/kmsan/shadow.c:75
kmsan_get_metadata+0x13e/0x1c0 mm/kmsan/shadow.c:141
kmsan_get_shadow_origin_ptr+0x4d/0xb0 mm/kmsan/shadow.c:102
get_shadow_origin_ptr mm/kmsan/instrumentation.c:38 [inline]
__msan_metadata_ptr_for_store_4+0x27/0x40 mm/kmsan/instrumentation.c:93
rcu_preempt_read_enter kernel/rcu/tree_plugin.h:390 [inline]
__rcu_read_lock+0x46/0x70 kernel/rcu/tree_plugin.h:413
rcu_read_lock include/linux/rcupdate.h:847 [inline]
batadv_nc_purge_orig_hash net/batman-adv/network-coding.c:408 [inline]
batadv_nc_worker+0x114/0x19e0 net/batman-adv/network-coding.c:719
process_one_work kernel/workqueue.c:3229 [inline]
process_scheduled_works+0xae0/0x1c40 kernel/workqueue.c:3310
worker_thread+0xea7/0x14f0 kernel/workqueue.c:3391
kthread+0x3e2/0x540 kernel/kthread.c:389
ret_from_fork+0x6d/0x90 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Uninit was created at:
__alloc_pages_noprof+0x9a7/0xe00 mm/page_alloc.c:4774
alloc_pages_mpol_noprof+0x299/0x990 mm/mempolicy.c:2265
alloc_pages_noprof+0x1bf/0x1e0 mm/mempolicy.c:2344
alloc_slab_page mm/slub.c:2412 [inline]
allocate_slab+0x320/0x12e0 mm/slub.c:2578
new_slab mm/slub.c:2631 [inline]
___slab_alloc+0x12ef/0x35e0 mm/slub.c:3818
__slab_alloc mm/slub.c:3908 [inline]
__slab_alloc_node mm/slub.c:3961 [inline]
slab_alloc_node mm/slub.c:4122 [inline]
kmem_cache_alloc_noprof+0x57a/0xb20 mm/slub.c:4141
inet_twsk_alloc+0x11f/0x9d0 net/ipv4/inet_timewait_sock.c:188
tcp_time_wait+0x83/0xf50 net/ipv4/tcp_minisocks.c:309
tcp_rcv_state_process+0x145a/0x49d0
tcp_v4_do_rcv+0xbf9/0x11a0 net/ipv4/tcp_ipv4.c:1939
tcp_v4_rcv+0x51df/0x5750 net/ipv4/tcp_ipv4.c:2351
ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x336/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_sublist_rcv_finish net/ipv4/ip_input.c:578 [inline]
ip_list_rcv_finish net/ipv4/ip_input.c:628 [inline]
ip_sublist_rcv+0x15f3/0x17f0 net/ipv4/ip_input.c:636
ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:670
__netif_receive_skb_list_ptype net/core/dev.c:5715 [inline]
__netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5762
__netif_receive_skb_list net/core/dev.c:5814 [inline]
netif_receive_skb_list_internal+0x1085/0x1700 net/core/dev.c:5905
gro_normal_list include/net/gro.h:515 [inline]
napi_complete_done+0x3d4/0x810 net/core/dev.c:6256
virtqueue_napi_complete drivers/net/virtio_net.c:758 [inline]
virtnet_poll+0x5d80/0x6bf0 drivers/net/virtio_net.c:3013
__napi_poll+0xe7/0x980 net/core/dev.c:6877
napi_poll net/core/dev.c:6946 [inline]
net_rx_action+0xa5a/0x19b0 net/core/dev.c:7068
handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:554
__do_softirq kernel/softirq.c:588 [inline]
invoke_softirq kernel/softirq.c:428 [inline]
__irq_exit_rcu+0x68/0x180 kernel/softirq.c:655
irq_exit_rcu+0x12/0x20 kernel/softirq.c:671
common_interrupt+0x97/0xb0 arch/x86/kernel/irq.c:278
asm_common_interrupt+0x2b/0x40 arch/x86/include/asm/idtentry.h:693
CPU: 0 UID: 0 PID: 3962 Comm: kworker/u8:18 Not tainted
6.12.0-syzkaller-09073-g9f16d5e6f220 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Workqueue: bat_events batadv_nc_worker
Fixes:
79636038d37e ("ipv4: tcp: give socket pointer to control skbs")
Fixes:
507a96737d99 ("ipv6: tcp: give socket pointer to control skbs")
Reported-by: syzbot+8b0959fc16551d55896b@syzkaller.appspotmail.com
Link: https://lore.kernel.org/netdev/674442bd.050a0220.1cc393.0072.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241125093039.3095790-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Fri, 29 Nov 2024 12:52:04 +0000 (12:52 +0000)]
Merge branch 'enetc-mqprio-fixes'
Wei Fang sayus:
====================
fix crash issue when setting MQPRIO for VFs
There is a crash issue when setting MQPRIO for ENETC VFs, the root casue
is that ENETC VFs don't like ENETC PFs, they don't have port registers,
so hw->port of VFs is NULL. However, this NULL pointer will be accessed
without any checks in enetc_mm_commit_preemptible_tcs() when configuring
MQPRIO for VFs. Therefore, two patches are added to fix this issue. The
first patch sets ENETC_SI_F_QBU flag only for SIs that support 802.1Qbu.
The second patch adds a check in enetc_change_preemptible_tcs() to ensure
that SIs that do not support 802.1Qbu do not configure preemptible TCs.
---
Link: https://lore.kernel.org/imx/20241030082117.1172634-1-wei.fang@nxp.com/
Link: https://lore.kernel.org/imx/20241104054309.1388433-1-wei.fang@nxp.com/
---
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Fang [Mon, 25 Nov 2024 09:07:19 +0000 (17:07 +0800)]
net: enetc: Do not configure preemptible TCs if SIs do not support
Both ENETC PF and VF drivers share enetc_setup_tc_mqprio() to configure
MQPRIO. And enetc_setup_tc_mqprio() calls enetc_change_preemptible_tcs()
to configure preemptible TCs. However, only PF is able to configure
preemptible TCs. Because only PF has related registers, while VF does not
have these registers. So for VF, its hw->port pointer is NULL. Therefore,
VF will access an invalid pointer when accessing a non-existent register,
which will cause a crash issue. The simplified log is as follows.
root@ls1028ardb:~# tc qdisc add dev eno0vf0 parent root handle 100: \
mqprio num_tc 4 map 0 0 1 1 2 2 3 3 queues 1@0 1@1 1@2 1@3 hw 1
[ 187.290775] Unable to handle kernel paging request at virtual address
0000000000001f00
[ 187.424831] pc : enetc_mm_commit_preemptible_tcs+0x1c4/0x400
[ 187.430518] lr : enetc_mm_commit_preemptible_tcs+0x30c/0x400
[ 187.511140] Call trace:
[ 187.513588] enetc_mm_commit_preemptible_tcs+0x1c4/0x400
[ 187.518918] enetc_setup_tc_mqprio+0x180/0x214
[ 187.523374] enetc_vf_setup_tc+0x1c/0x30
[ 187.527306] mqprio_enable_offload+0x144/0x178
[ 187.531766] mqprio_init+0x3ec/0x668
[ 187.535351] qdisc_create+0x15c/0x488
[ 187.539023] tc_modify_qdisc+0x398/0x73c
[ 187.542958] rtnetlink_rcv_msg+0x128/0x378
[ 187.547064] netlink_rcv_skb+0x60/0x130
[ 187.550910] rtnetlink_rcv+0x18/0x24
[ 187.554492] netlink_unicast+0x300/0x36c
[ 187.558425] netlink_sendmsg+0x1a8/0x420
[ 187.606759] ---[ end trace
0000000000000000 ]---
In addition, some PFs also do not support configuring preemptible TCs,
such as eno1 and eno3 on LS1028A. It won't crash like it does for VFs,
but we should prevent these PFs from accessing these unimplemented
registers.
Fixes:
827145392a4a ("net: enetc: only commit preemptible TCs to hardware when MM TX is active")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 25 Nov 2024 09:07:18 +0000 (17:07 +0800)]
net: enetc: read TSN capabilities from port register, not SI
Configuring TSN (Qbv, Qbu, PSFP) capabilities requires access to port
registers, which are available to the PSI but not the VSI.
Yet, the SI port capability register 0 (PSICAPR0), exposed to both PSIs
and VSIs, presents the same capabilities to the VF as to the PF, thus
leading the VF driver into thinking it can configure these features.
In the case of ENETC_SI_F_QBU, having it set in the VF leads to a crash:
root@ls1028ardb:~# tc qdisc add dev eno0vf0 parent root handle 100: \
mqprio num_tc 4 map 0 0 1 1 2 2 3 3 queues 1@0 1@1 1@2 1@3 hw 1
[ 187.290775] Unable to handle kernel paging request at virtual address
0000000000001f00
[ 187.424831] pc : enetc_mm_commit_preemptible_tcs+0x1c4/0x400
[ 187.430518] lr : enetc_mm_commit_preemptible_tcs+0x30c/0x400
[ 187.511140] Call trace:
[ 187.513588] enetc_mm_commit_preemptible_tcs+0x1c4/0x400
[ 187.518918] enetc_setup_tc_mqprio+0x180/0x214
[ 187.523374] enetc_vf_setup_tc+0x1c/0x30
[ 187.527306] mqprio_enable_offload+0x144/0x178
[ 187.531766] mqprio_init+0x3ec/0x668
[ 187.535351] qdisc_create+0x15c/0x488
[ 187.539023] tc_modify_qdisc+0x398/0x73c
[ 187.542958] rtnetlink_rcv_msg+0x128/0x378
[ 187.547064] netlink_rcv_skb+0x60/0x130
[ 187.550910] rtnetlink_rcv+0x18/0x24
[ 187.554492] netlink_unicast+0x300/0x36c
[ 187.558425] netlink_sendmsg+0x1a8/0x420
[ 187.606759] ---[ end trace
0000000000000000 ]---
while the other TSN features in the VF are harmless, because the
net_device_ops used for the VF driver do not expose entry points for
these other features.
These capability bits are in the process of being defeatured from the SI
registers. We should read them from the port capability register, where
they are also present, and which is naturally only exposed to the PF.
The change to blame (relevant for stable backports) is the one where
this started being a problem, aka when the kernel started to crash due
to the wrong capability seen by the VF driver.
Fixes:
827145392a4a ("net: enetc: only commit preemptible TCs to hardware when MM TX is active")
Reported-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 28 Nov 2024 18:15:20 +0000 (10:15 -0800)]
Merge tag 'net-6.13-rc1' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth.
Current release - regressions:
- rtnetlink: fix rtnl_dump_ifinfo() error path
- bluetooth: remove the redundant sco_conn_put
Previous releases - regressions:
- netlink: fix false positive warning in extack during dumps
- sched: sch_fq: don't follow the fast path if Tx is behind now
- ipv6: delete temporary address if mngtmpaddr is removed or
unmanaged
- tcp: fix use-after-free of nreq in reqsk_timer_handler().
- bluetooth: fix slab-use-after-free Read in set_powered_sync
- l2tp: fix warning in l2tp_exit_net found
- eth:
- bnxt_en: fix receive ring space parameters when XDP is active
- lan78xx: fix double free issue with interrupt buffer allocation
- tg3: set coherent DMA mask bits to 31 for BCM57766 chipsets
Previous releases - always broken:
- ipmr: fix tables suspicious RCU usage
- iucv: MSG_PEEK causes memory leak in iucv_sock_destruct()
- eth:
- octeontx2-af: fix low network performance
- stmmac: dwmac-socfpga: set RX watchdog interrupt as broken
- rtase: correct the speed for RTL907XD-V1
Misc:
- some documentation fixup"
* tag 'net-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
ipmr: fix build with clang and DEBUG_NET disabled.
Documentation: tls_offload: fix typos and grammar
Fix spelling mistake
ipmr: fix tables suspicious RCU usage
ip6mr: fix tables suspicious RCU usage
ipmr: add debug check for mr table cleanup
selftests: rds: move test.py to TEST_FILES
net_sched: sch_fq: don't follow the fast path if Tx is behind now
tcp: Fix use-after-free of nreq in reqsk_timer_handler().
net: phy: fix phy_ethtool_set_eee() incorrectly enabling LPI
net: Comment copy_from_sockptr() explaining its behaviour
rxrpc: Improve setsockopt() handling of malformed user input
llc: Improve setsockopt() handling of malformed user input
Bluetooth: SCO: remove the redundant sco_conn_put
Bluetooth: MGMT: Fix possible deadlocks
Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync
bnxt_en: Unregister PTP during PCI shutdown and suspend
bnxt_en: Refactor bnxt_ptp_init()
bnxt_en: Fix receive ring space parameters when XDP is active
bnxt_en: Fix queue start to update vnic RSS table
...
Linus Torvalds [Thu, 28 Nov 2024 18:06:00 +0000 (10:06 -0800)]
Merge tag 'spi-fix-v6.13-merge-window' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few fairly minor driver specific fixes, plus one core fix for the
handling of deferred probe on ACPI systems - ignoring probe deferral
and incorrectly treating it like a fatal error while parsing the
generic ACPI bindings for SPI devices"
* tag 'spi-fix-v6.13-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: Fix acpi deferred irq probe
spi: atmel-quadspi: Fix register name in verbose logging function
spi-imx: prevent overflow when estimating transfer time
spi: rockchip-sfc: Embedded DMA only support 4B aligned address
Linus Torvalds [Thu, 28 Nov 2024 17:40:53 +0000 (09:40 -0800)]
Merge tag 'regulator-fix-v6.13-merge-window' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of fixes that came in during the merge window, plus
documetation of a new device ID for the Qualcomm LABIBB driver.
There's a core fix for the rarely used current constraints and a fix
for the Qualcomm RPMH driver which had described only one of the two
voltage ranges that the hardware could control, creating a potential
incompatibility with the configuration left by firmware"
* tag 'regulator-fix-v6.13-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: Ignore unset max_uA constraints in current limit check
dt-bindings: regulator: qcom-labibb-regulator: document the pmi8950 labibb regulator
regulator: qcom-rpmh: Update ranges for FTSMPS525
Linus Torvalds [Thu, 28 Nov 2024 17:28:09 +0000 (09:28 -0800)]
Merge tag 'for-v6.13' of git://git./linux/kernel/git/sre/linux-power-supply
Pull power supply and reset updates from Sebastian Reichel:
"Power-supply core:
- replace power_supply_register_no_ws() with power_supply_register()
and a new "no_wakeup_source" field in struct power_supply_config
- constify battery info tables in the core and all drivers
- switch back to remove callback for all platform drivers
- allow power_supply_put() to be called from atomic context
- mark attribute arrays read-only after init
Power-supply drivers:
- new driver for TWL6030 and TWL6032
- rk817: improve battery capacity calibration
- misc small cleanups and fixes"
* tag 'for-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (42 commits)
power: reset: ep93xx: add AUXILIARY_BUS dependency
dt-bindings: power: reset: Convert mode-.* properties to array
power: supply: sc27xx: Fix battery detect GPIO probe
dt-bindings: power: supply: sc27xx-fg: document deprecated bat-detect-gpio
reset: keystone-reset: remove unused macros
power: supply: axp20x_battery: Use scaled iio_read_channel
power: supply: axp20x_usb_power: Use scaled iio_read_channel
power: supply: generic-adc-battery: change my gmail
power: supply: pmu_battery: Set power supply type to BATTERY
power: Switch back to struct platform_driver::remove()
power: supply: hwmon: move interface to private header
power: supply: rk817: Update battery capacity calibration
power: supply: rk817: stop updating info in suspend
power: supply: rt9471: Use IC status regfield to report real charger status
power: supply: rt9471: Fix wrong WDT function regfield declaration
dt-bindings: power/supply: qcom,pmi8998-charger: Drop incorrect "#interrupt-cells" from example
power: supply: core: mark attribute arrays as ro_after_init
power: supply: core: unexport power_supply_property_is_writeable()
power: supply: core: use device mutex wrappers
power: supply: bq27xxx: Fix registers of bq27426
...
Linus Torvalds [Thu, 28 Nov 2024 17:22:00 +0000 (09:22 -0800)]
Merge tag 'ntfs3_for_6.13' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:
- additional checks to address issues identified by syzbot
- continuation of the transition from 'page' to 'folio'
* tag 'ntfs3_for_6.13' of https://github.com/Paragon-Software-Group/linux-ntfs3:
fs/ntfs3: Accumulated refactoring changes
fs/ntfs3: Switch to folio to release resources
fs/ntfs3: Add check in ntfs_extend_initialized_size
fs/ntfs3: Add more checks in mi_enum_attr (part 2)
fs/ntfs3: Equivalent transition from page to folio
fs/ntfs3: Fix case when unmarked clusters intersect with zone
fs/ntfs3: Fix warning in ni_fiemap
Linus Torvalds [Thu, 28 Nov 2024 17:18:11 +0000 (09:18 -0800)]
Merge tag 'exfat-for-6.13-rc1' of git://git./linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:
- If the start cluster of stream entry is invalid, treat it as the
empty directory
- Valid size of steam entry cannot be greater than data size. If
valid_size is invalid, use data_size
- Move Direct-IO alignment check to before extending the valid size
- Fix uninit-value issue reported by syzbot
- Optimize finding directory entry-set in write_inode, rename, unlink
* tag 'exfat-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: reduce FAT chain traversal
exfat: code cleanup for exfat_readdir()
exfat: remove argument 'p_dir' from exfat_add_entry()
exfat: move exfat_chain_set() out of __exfat_resolve_path()
exfat: add exfat_get_dentry_set_by_ei() helper
exfat: rename argument name for exfat_move_file and exfat_rename_file
exfat: remove unnecessary read entry in __exfat_rename()
exfat: fix file being changed by unaligned direct write
exfat: fix uninit-value in __exfat_get_dentry_set
exfat: fix out-of-bounds access of directory entries
Paolo Abeni [Thu, 28 Nov 2024 16:18:04 +0000 (17:18 +0100)]
ipmr: fix build with clang and DEBUG_NET disabled.
Sasha reported a build issue in ipmr::
net/ipv4/ipmr.c:320:13: error: function 'ipmr_can_free_table' is not \
needed and will not be emitted \
[-Werror,-Wunneeded-internal-declaration]
320 | static bool ipmr_can_free_table(struct net *net)
Apparently clang is too smart with BUILD_BUG_ON_INVALID(), let's
fallback to a plain WARN_ON_ONCE().
Reported-by: Sasha Levin <sashal@kernel.org>
Closes: https://qa-reports.linaro.org/lkft/sashal-linus-next/build/
v6.11-25635-g6813e2326f1e/testrun/
26111580/suite/build/test/clang-nightly-lkftconfig/details/
Fixes:
11b6e701bce9 ("ipmr: add debug check for mr table cleanup")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/ee75faa926b2446b8302ee5fc30e129d2df73b90.1732810228.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Leo Stone [Sun, 24 Nov 2024 23:00:02 +0000 (15:00 -0800)]
Documentation: tls_offload: fix typos and grammar
Fix typos and grammar where it improves readability.
Signed-off-by: Leo Stone <leocstone@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://patch.msgid.link/20241124230002.56058-1-leocstone@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vyshnav Ajith [Thu, 21 Nov 2024 22:18:52 +0000 (03:48 +0530)]
Fix spelling mistake
Changed from reequires to require. A minute typo.
Signed-off-by: Vyshnav Ajith <puthen1977@gmail.com>
Link: https://patch.msgid.link/20241121221852.10754-1-puthen1977@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 28 Nov 2024 09:23:26 +0000 (10:23 +0100)]
Merge branch 'net-fix-mcast-rcu-splats'
Paolo Abeni says:
====================
net: fix mcast RCU splats
This series addresses the RCU splat triggered by the forwarding
mroute tests.
The first patch does not address any specific issue, but makes the
following ones more clear. Patch 2 and 3 address the issue for ipv6 and
ipv4 respectively.
====================
Link: https://patch.msgid.link/cover.1732289799.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Sun, 24 Nov 2024 15:40:58 +0000 (16:40 +0100)]
ipmr: fix tables suspicious RCU usage
Similar to the previous patch, plumb the RCU lock inside
the ipmr_get_table(), provided a lockless variant and apply
the latter in the few spots were the lock is already held.
Fixes:
709b46e8d90b ("net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT")
Fixes:
f0ad0860d01e ("ipv4: ipmr: support multiple tables")
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Sun, 24 Nov 2024 15:40:57 +0000 (16:40 +0100)]
ip6mr: fix tables suspicious RCU usage
Several places call ip6mr_get_table() with no RCU nor RTNL lock.
Add RCU protection inside such helper and provide a lockless variant
for the few callers that already acquired the relevant lock.
Note that some users additionally reference the table outside the RCU
lock. That is actually safe as the table deletion can happen only
after all table accesses are completed.
Fixes:
e2d57766e674 ("net: Provide compat support for SIOCGETMIFCNT_IN6 and SIOCGETSGCNT_IN6.")
Fixes:
d7c31cbde4bc ("net: ip6mr: add RTM_GETROUTE netlink op")
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Sun, 24 Nov 2024 15:40:56 +0000 (16:40 +0100)]
ipmr: add debug check for mr table cleanup
The multicast route tables lifecycle, for both ipv4 and ipv6, is
protected by RCU using the RTNL lock for write access. In many
places a table pointer escapes the RCU (or RTNL) protected critical
section, but such scenarios are actually safe because tables are
deleted only at namespace cleanup time or just after allocation, in
case of default rule creation failure.
Tables freed at namespace cleanup time are assured to be alive for the
whole netns lifetime; tables freed just after creation time are never
exposed to other possible users.
Ensure that the free conditions are respected in ip{,6}mr_free_table, to
document the locking schema and to prevent future possible introduction
of 'table del' operation from breaking it.
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hangbin Liu [Sun, 24 Nov 2024 07:32:43 +0000 (07:32 +0000)]
selftests: rds: move test.py to TEST_FILES
The test.py should not be run separately. It should be run via run.sh,
which will do some sanity checks first. Move the test.py from TEST_PROGS
to TEST_FILES.
Reported-by: Maximilian Heyne <mheyne@amazon.de>
Closes: https://lore.kernel.org/netdev/
20241122150129.GB18887@dev-dsk-mheyne-1b-
55676e6a.eu-west-1.amazon.com
Fixes:
3ade6ce1255e ("selftests: rds: add testing infrastructure")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20241124073243.847932-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Sun, 24 Nov 2024 02:21:48 +0000 (18:21 -0800)]
net_sched: sch_fq: don't follow the fast path if Tx is behind now
Recent kernels cause a lot of TCP retransmissions
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.24 GBytes 19.2 Gbits/sec 2767 442 KBytes
[ 5] 1.00-2.00 sec 2.23 GBytes 19.1 Gbits/sec 2312 350 KBytes
^^^^
Replacing the qdisc with pfifo makes retransmissions go away.
It appears that a flow may have a delayed packet with a very near
Tx time. Later, we may get busy processing Rx and the target Tx time
will pass, but we won't service Tx since the CPU is busy with Rx.
If Rx sees an ACK and we try to push more data for the delayed flow
we may fastpath the skb, not realizing that there are already "ready
to send" packets for this flow sitting in the qdisc.
Don't trust the fastpath if we are "behind" according to the projected
Tx time for next flow waiting in the Qdisc. Because we consider anything
within the offload window to be okay for fastpath we must consider
the entire offload window as "now".
Qdisc config:
qdisc fq 8001: dev eth0 parent 1234:1 limit 10000p flow_limit 100p \
buckets 32768 orphan_mask 1023 bands 3 \
priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 \
weights 589824 196608 65536 quantum 3028b initial_quantum 15140b \
low_rate_threshold 550Kbit \
refill_delay 40ms timer_slack 10us horizon 10s horizon_drop
For iperf this change seems to do fine, the reordering is gone.
The fastpath still gets used most of the time:
gc 0 highprio 0 fastpath 142614 throttled 418309 latency 19.1us
xx_behind 2731
where "xx_behind" counts how many times we hit the new "return false".
CC: stable@vger.kernel.org
Fixes:
076433bd78d7 ("net_sched: sch_fq: add fast path for mostly idle qdisc")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241124022148.3126719-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kuniyuki Iwashima [Sat, 23 Nov 2024 17:42:36 +0000 (09:42 -0800)]
tcp: Fix use-after-free of nreq in reqsk_timer_handler().
The cited commit replaced inet_csk_reqsk_queue_drop_and_put() with
__inet_csk_reqsk_queue_drop() and reqsk_put() in reqsk_timer_handler().
Then, oreq should be passed to reqsk_put() instead of req; otherwise
use-after-free of nreq could happen when reqsk is migrated but the
retry attempt failed (e.g. due to timeout).
Let's pass oreq to reqsk_put().
Fixes:
e8c526f2bdf1 ("tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().")
Reported-by: Liu Jian <liujian56@huawei.com>
Closes: https://lore.kernel.org/netdev/
1284490f-9525-42ee-b7b8-
ccadf6606f6d@huawei.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20241123174236.62438-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Russell King (Oracle) [Sat, 23 Nov 2024 14:50:12 +0000 (14:50 +0000)]
net: phy: fix phy_ethtool_set_eee() incorrectly enabling LPI
When phy_ethtool_set_eee_noneg() detects a change in the LPI
parameters, it attempts to update phylib state and trigger the link
to cycle so the MAC sees the updated parameters.
However, in doing so, it sets phydev->enable_tx_lpi depending on
whether the EEE configuration allows the MAC to generate LPI without
taking into account the result of negotiation.
This can be demonstrated with a 1000base-T FD interface by:
# ethtool --set-eee eno0 advertise 8 # cause EEE to be not negotiated
# ethtool --set-eee eno0 tx-lpi off
# ethtool --set-eee eno0 tx-lpi on
This results in being true, despite EEE not having been negotiated and:
# ethtool --show-eee eno0
EEE status: enabled - inactive
Tx LPI: 250 (us)
Supported EEE link modes: 100baseT/Full
1000baseT/Full
Advertised EEE link modes: 100baseT/Full
1000baseT/Full
Fix this by keeping track of whether EEE was negotiated via a new
eee_active member in struct phy_device, and include this state in
the decision whether phydev->enable_tx_lpi should be set.
Fixes:
3e43b903da04 ("net: phy: Immediately call adjust_link if only tx_lpi_enabled changes")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tErSe-005RhB-2R@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 28 Nov 2024 08:23:02 +0000 (09:23 +0100)]
Merge tag 'for-net-2024-11-26' of git://git./linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- SCO: remove the redundant sco_conn_put
- MGMT: Fix slab-use-after-free Read in set_powered_sync
- MGMT: Fix possible deadlocks
* tag 'for-net-2024-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: SCO: remove the redundant sco_conn_put
Bluetooth: MGMT: Fix possible deadlocks
Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync
====================
Link: https://patch.msgid.link/20241126165149.899213-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 28 Nov 2024 07:57:42 +0000 (08:57 +0100)]
Merge branch 'net-fix-some-callers-of-copy_from_sockptr'
Michal Luczaj says:
====================
net: Fix some callers of copy_from_sockptr()
Some callers misinterpret copy_from_sockptr()'s return value. The function
follows copy_from_user(), i.e. returns 0 for success, or the number of
bytes not copied on error. Simply returning the result in a non-zero case
isn't usually what was intended.
Compile tested with CONFIG_LLC, CONFIG_AF_RXRPC, CONFIG_BT enabled.
Last patch probably belongs more to net-next, if any. Here as an RFC.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
====================
Link: https://patch.msgid.link/20241119-sockptr-copy-fixes-v3-0-d752cac4be8e@rbox.co
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michal Luczaj [Tue, 19 Nov 2024 13:31:43 +0000 (14:31 +0100)]
net: Comment copy_from_sockptr() explaining its behaviour
copy_from_sockptr() has a history of misuse. Add a comment explaining that
the function follows API of copy_from_user(), i.e. returns 0 for success,
or number of bytes not copied on error.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michal Luczaj [Tue, 19 Nov 2024 13:31:42 +0000 (14:31 +0100)]
rxrpc: Improve setsockopt() handling of malformed user input
copy_from_sockptr() does not return negative value on error; instead, it
reports the number of bytes that failed to copy. Since it's deprecated,
switch to copy_safe_from_sockptr().
Note: Keeping the `optlen != sizeof(unsigned int)` check as
copy_safe_from_sockptr() by itself would also accept
optlen > sizeof(unsigned int). Which would allow a more lenient handling
of inputs.
Fixes:
17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michal Luczaj [Tue, 19 Nov 2024 13:31:41 +0000 (14:31 +0100)]
llc: Improve setsockopt() handling of malformed user input
copy_from_sockptr() is used incorrectly: return value is the number of
bytes that could not be copied. Since it's deprecated, switch to
copy_safe_from_sockptr().
Note: Keeping the `optlen != sizeof(int)` check as copy_safe_from_sockptr()
by itself would also accept optlen > sizeof(int). Which would allow a more
lenient handling of inputs.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: David Wei <dw@davidwei.uk>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Linus Torvalds [Wed, 27 Nov 2024 22:50:31 +0000 (14:50 -0800)]
Merge tag 'acpi-6.13-rc1-2' of git://git./linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"These add a common init function for arch-specific ACPI
initialization, clean up idle states initialization in the ACPI
processor_idle driver and update quirks:
- Introduce acpi_arch_init() for architecture-specific ACPI subsystem
initialization (Miao Wang)
- Clean up Asus quirks in acpi_quirk_skip_dmi_ids[] and add a quirk
to skip I2C clients on Acer Iconia One 8 A1-840 (Hans de Goede)
- Make the ACPI processor_idle driver use acpi_idle_play_dead() for
all idle states regardless of their types (Rafael Wysocki)"
* tag 'acpi-6.13-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: introduce acpi_arch_init()
ACPI: x86: Clean up Asus entries in acpi_quirk_skip_dmi_ids[]
ACPI: x86: Add skip i2c clients quirk for Acer Iconia One 8 A1-840
ACPI: processor_idle: Use acpi_idle_play_dead() for all C-states
Linus Torvalds [Wed, 27 Nov 2024 22:40:33 +0000 (14:40 -0800)]
Merge tag 'pm-6.13-rc1-3' of git://git./linux/kernel/git/rafael/linux-pm
Pull morepower management updates from Rafael Wysocki:
"These update the OPP (Operating Performance Points) DT bindings for
ti-cpu (Dhruva Gole) and remove unused declarations from the OPP
header file (Zhang Zekun)"
* tag 'pm-6.13-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
dt-bindings: opp: operating-points-v2-ti-cpu: Describe opp-supported-hw
OPP: Remove unused declarations in header file
Linus Torvalds [Wed, 27 Nov 2024 22:36:00 +0000 (14:36 -0800)]
Merge tag 'thermal-6.13-rc1-3' of git://git./linux/kernel/git/rafael/linux-pm
Pull more thermal control updates from Rafael Wysocki:
"These fix a Power Allocator thermal governor issue reported recently,
update the Intel int3400 thermal driver and simplify DT data parsing
in the thermal control subsystem:
- Add a NULL pointer check that was missed by recent modifications of
the Power Allocator thermal governor (Rafael Wysocki)
- Remove the data_vault attribute_group from int3400 because it is
only used for exposing one binary file that can be exposed directly
(Thomas Weißschuh)
- Prevent the current_uuid sysfs attribute in int3400 from mistakenly
treating valid UUID values as invalid on some older systems
(Srinivas Pandruvada)
- Use the cleanup.h mechanics to simplify DT data parsing in the
thermal core and some drivers (Krzysztof Kozlowski)"
* tag 'thermal-6.13-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: sun8i: Use scoped device node handling to simplify error paths
thermal: tegra: Simplify with scoped for each OF child loop
thermal: qcom-spmi-adc-tm5: Simplify with scoped for each OF child loop
thermal: of: Use scoped device node handling to simplify of_thermal_zone_find()
thermal: of: Use scoped memory and OF handling to simplify thermal_of_trips_init()
thermal: of: Simplify thermal_of_should_bind with scoped for each OF child
thermal: gov_power_allocator: Add missing NULL pointer check
thermal: int3400: Remove unneeded data_vault attribute_group
thermal: int3400: Fix reading of current_uuid for active policy
Linus Torvalds [Wed, 27 Nov 2024 22:24:34 +0000 (14:24 -0800)]
Merge tag 'for-linus-iommufd' of git://git./linux/kernel/git/jgg/iommufd
Pull more iommufd updates from Jason Gunthorpe:
"Change the driver callback op domain_alloc_user() into two ops:
domain_alloc_paging_flags() and domain_alloc_nesting() that better
describe what the ops are expected to do.
There will be per-driver cleanup based on this going into the next
cycle via the driver trees"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommu: Rename ops->domain_alloc_user() to domain_alloc_paging_flags()
iommu: Add ops->domain_alloc_nested()
Linus Torvalds [Wed, 27 Nov 2024 21:38:09 +0000 (13:38 -0800)]
Merge tag 'soundwire-6.13-rc1' of git://git./linux/kernel/git/vkoul/soundwire
Pull soundwire updates from Vinod Koul:
- structure optimization of few bus structures and header updates
- support for 2.0 disco spec
- amd driver updates for acp revision, refactoring code and support for
acp6.3
- soft reset support for cadence driver
* tag 'soundwire-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: (24 commits)
soundwire: Minor formatting fixups in sdw.h header
soundwire: Update the includes on the sdw.h header
soundwire: cadence: clear MCP BLOCK_WAKEUP in init
soundwire: cadence: add soft-reset on startup
soundwire: intel_auxdevice: add kernel parameter for mclk divider
soundwire: mipi-disco: add support for DP0/DPn 'lane-list' property
soundwire: mipi-disco: add new properties from 2.0 spec
soundwire: mipi-disco: add comment on DP0-supported property
soundwire: mipi-disco: add support for peripheral channelprepare timeout
soundwire: mipi_disco: add support for clock-scales property
soundwire: mipi-disco: add error handling for property array read
soundwire: mipi-disco: remove DPn audio-modes
soundwire: optimize sdw_dpn_prop
soundwire: optimize sdw_dp0_prop
soundwire: optimize sdw_slave_prop
soundwire: optimize sdw_bus structure
soundwire: optimize sdw_master_prop
soundwire: optimize sdw_stream_runtime memory layout
soundwire: mipi_disco: add MIPI-specific property_read_bool() helpers
soundwire: Correct some typos in comments
...
Linus Torvalds [Wed, 27 Nov 2024 21:33:43 +0000 (13:33 -0800)]
Merge tag 'phy-for-6.13' of git://git./linux/kernel/git/phy/linux-phy
Pull phy updates from Vinod Koul:
"New hardware support:
- ST STM32MP25 combophy support
- Sparx5 support for lan969x serdes and updates to driver to support
this
- NXP PTN3222 eUSB2 to USB2 redriver
- Qualcomm SAR2130P eusb2 support, QCS8300 USB DW3 and QMP USB2
support, X1E80100 QMP PCIe PHY Gen4 support, QCS615 and QCS8300 QMP
UFS PHY support and SA8775P eDP PHY support
- Rockchip rk3576 usbdp and rk3576 usb2 phy support
- Binding for Microchip ATA6561 can phy
Updates:
- Freescale driver updates from hdmi support
- Conversion of rockchip rk3228 hdmi phy binding to yaml
- Broadcom usb2-phy deprecated support dropped and USB init array
update for BCM4908
- TI USXGMII mode support in J7200
- Switch back to platform_driver::remove() subsystem update"
* tag 'phy-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (59 commits)
phy: qcom: qmp: Fix lecacy-legacy typo
phy: lan969x-serdes: add support for lan969x serdes driver
dt-bindings: phy: sparx5: document lan969x
phy: sparx5-serdes: add support for branching on chip type
phy: sparx5-serdes: add indirection layer to register macros
phy: sparx5-serdes: add function for getting the CMU index
phy: sparx5-serdes: add ops to match data
phy: sparx5-serdes: add constant for the number of CMU's
phy: sparx5-serdes: add constants to match data
phy: sparx5-serdes: add support for private match data
phy: bcm-ns-usb2: drop support for old binding variant
dt-bindings: phy: bcm-ns-usb2-phy: drop deprecated variant
dt-bindings: phy: Add QMP UFS PHY compatible for QCS8300
dt-bindings: phy: qcom: snps-eusb2: Add SAR2130P compatible
dt-bindings: phy: ti,tcan104x-can: Document Microchip ATA6561
phy: airoha: Fix REG_CSR_2L_RX{0,1}_REV0 definitions
phy: airoha: Fix REG_CSR_2L_JCPLL_SDM_HREN config in airoha_pcie_phy_init_ssc_jcpll()
phy: airoha: Fix REG_PCIE_PMA_TX_RESET config in airoha_pcie_phy_init_csr_2l()
phy: airoha: Fix REG_CSR_2L_PLL_CMN_RESERVE0 config in airoha_pcie_phy_init_clk_out()
phy: phy-rockchip-samsung-hdptx: Don't request RST_PHY/RST_ROPLL/RST_LCPLL
...
Linus Torvalds [Wed, 27 Nov 2024 21:25:47 +0000 (13:25 -0800)]
Merge tag 'dmaengine-6.13-rc1' of git://git./linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
"New hardware support:
- Qualcomm SAR2130P GPI dma support
- Sifive PIC64GX pdma support
- Rcar r7s72100 support and associated updates
Updates:
- STM32 DMA3 updates for packing/unpacking mode and prevention of
additional xfers
- Simplification of devm_acpi_dma_controller_register() and associate
cleanup including headers
- loongson prefix renames
- Switch back to platform_driver::remove() subsystem update"
* tag 'dmaengine-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: loongson2-apb: Rename the prefix ls2x to loongson2
dt-bindings: dma: sifive pdma: Add PIC64GX to compatibles
dmaengine: fix typo in the comment
dmaengine: stm32-dma3: clamp AXI burst using match data
dmaengine: stm32-dma3: prevent LL refactoring thanks to DT configuration
dt-bindings: dma: stm32-dma3: prevent additional transfers
dmaengine: stm32-dma3: refactor HW linked-list to optimize memory accesses
dmaengine: stm32-dma3: prevent pack/unpack thanks to DT configuration
dt-bindings: dma: stm32-dma3: prevent packing/unpacking mode
dmaengine: idxd: Move DSA/IAA device IDs to IDXD driver
dt-bindings: dma: qcom,gpi: Add SAR2130P compatible
dmaengine: Switch back to struct platform_driver::remove()
dmaengine: ep93xx: Fix unsigned compared against 0
dmaengine: acpi: Clean up headers
dmaengine: acpi: Simplify devm_acpi_dma_controller_register()
dmaengine: acpi: Drop unused devm_acpi_dma_controller_free()
dmaengine: sh: rz-dmac: add r7s72100 support
dt-bindings: dma: rz-dmac: Document RZ/A1H SoC
Linus Torvalds [Wed, 27 Nov 2024 21:23:13 +0000 (13:23 -0800)]
Merge tag 'gpio-fixes-for-v6.13-rc1' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"Apart from the gpio-exar fix which addresses an older issue, they all
fix regressions from this release cycle:
- fix missing GPIO chip labels in gpio-zevio and gpio-altera
- for the latter: also set GPIO base to -1 to use dynamic range
allocation
- fix value setting with external pull-up/down resistor in gpio-exar
- use the recommended IDA interfaces in gpio-mpsse"
* tag 'gpio-fixes-for-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: mpsse: Remove usage of the deprecated ida_simple_xx() API
gpio: exar: set value when external pull-up or pull-down is present
gpio: altera: Add missed base and label initialisations
gpio: zevio: Add missed label initialisation
Linus Torvalds [Wed, 27 Nov 2024 21:11:58 +0000 (13:11 -0800)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"A small number of improvements all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_vdpa: remove redundant check on desc
virtio_fs: store actual queue index in mq_map
virtio_fs: add informative log for new tag discovery
virtio: Make vring_new_virtqueue support packed vring
virtio_pmem: Add freeze/restore callbacks
vdpa/mlx5: Fix suboptimal range on iotlb iteration
Linus Torvalds [Wed, 27 Nov 2024 20:57:03 +0000 (12:57 -0800)]
Merge tag 'vfio-v6.13-rc1' of https://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- Constify an unmodified structure used in linking vfio and kvm
(Christophe JAILLET)
- Add ID for an additional hardware SKU supported by the nvgrace-gpu
vfio-pci variant driver (Ankit Agrawal)
- Fix incorrect signed cast in QAT vfio-pci variant driver, negating
test in check_add_overflow(), though still caught by later tests
(Giovanni Cabiddu)
- Additional debugfs attributes exposed in hisi_acc vfio-pci variant
driver for migration debugging (Longfang Liu)
- Migration support is added to the virtio vfio-pci variant driver,
becoming the primary feature of the driver while retaining emulation
of virtio legacy support as a secondary option (Yishai Hadas)
- Fixes to a few unwind flows in the mlx5 vfio-pci driver discovered
through reviews of the virtio variant driver (Yishai Hadas)
- Fix an unlikely issue where a PCI device exposed to userspace with an
unknown capability at the base of the extended capability chain can
overflow an array index (Avihai Horon)
* tag 'vfio-v6.13-rc1' of https://github.com/awilliam/linux-vfio:
vfio/pci: Properly hide first-in-list PCIe extended capability
vfio/mlx5: Fix unwind flows in mlx5vf_pci_save/resume_device_data()
vfio/mlx5: Fix an unwind issue in mlx5vf_add_migration_pages()
vfio/virtio: Enable live migration once VIRTIO_PCI was configured
vfio/virtio: Add PRE_COPY support for live migration
vfio/virtio: Add support for the basic live migration functionality
virtio-pci: Introduce APIs to execute device parts admin commands
virtio: Manage device and driver capabilities via the admin commands
virtio: Extend the admin command to include the result size
virtio_pci: Introduce device parts access commands
Documentation: add debugfs description for hisi migration
hisi_acc_vfio_pci: register debugfs for hisilicon migration driver
hisi_acc_vfio_pci: create subfunction for data reading
hisi_acc_vfio_pci: extract public functions for container_of
vfio/qat: fix overflow check in qat_vf_resume_write()
vfio/nvgrace-gpu: Add a new GH200 SKU to the devid table
kvm/vfio: Constify struct kvm_device_ops
Linus Torvalds [Wed, 27 Nov 2024 19:19:09 +0000 (11:19 -0800)]
Merge tag 'riscv-for-linus-6.13-mw1' of git://git./linux/kernel/git/riscv/linux
Pull RISC-v updates from Palmer Dabbelt:
- Support for pointer masking in userspace
- Support for probing vector misaligned access performance
- Support for qspinlock on systems with Zacas and Zabha
* tag 'riscv-for-linus-6.13-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (38 commits)
RISC-V: Remove unnecessary include from compat.h
riscv: Fix default misaligned access trap
riscv: Add qspinlock support
dt-bindings: riscv: Add Ziccrse ISA extension description
riscv: Add ISA extension parsing for Ziccrse
asm-generic: ticket-lock: Add separate ticket-lock.h
asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock
riscv: Implement xchg8/16() using Zabha
riscv: Implement arch_cmpxchg128() using Zacas
riscv: Improve zacas fully-ordered cmpxchg()
riscv: Implement cmpxchg8/16() using Zabha
dt-bindings: riscv: Add Zabha ISA extension description
riscv: Implement cmpxchg32/64() using Zacas
riscv: Do not fail to build on byte/halfword operations with Zawrs
riscv: Move cpufeature.h macros into their own header
KVM: riscv: selftests: Add Smnpm and Ssnpm to get-reg-list test
RISC-V: KVM: Allow Smnpm and Ssnpm extensions for guests
riscv: hwprobe: Export the Supm ISA extension
riscv: selftests: Add a pointer masking test
riscv: Allow ptrace control of the tagged address ABI
...
Linus Torvalds [Wed, 27 Nov 2024 19:15:27 +0000 (11:15 -0800)]
Merge tag 'loongarch-6.13' of git://git./linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Fix build failure with GCC 15 due to default -std=gnu23
- Add PREEMPT_RT/PREEMPT_LAZY support
- Add I2S in DTS for Loongson-2K1000/Loongson-2K2000
- Some bug fixes and other small changes
* tag 'loongarch-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Update Loongson-3 default config file
LoongArch: dts: Add I2S support to Loongson-2K2000
LoongArch: dts: Add I2S support to Loongson-2K1000
LoongArch: Allow to enable PREEMPT_LAZY
LoongArch: Allow to enable PREEMPT_RT
LoongArch: Select HAVE_POSIX_CPU_TIMERS_TASK_WORK
LoongArch: Fix sleeping in atomic context for PREEMPT_RT
LoongArch: Reduce min_delta for the arch clockevent device
LoongArch: BPF: Sign-extend return values
LoongArch: Fix build failure with GCC 15 (-std=gnu23)
LoongArch: Explicitly specify code model in Makefile
Linus Torvalds [Wed, 27 Nov 2024 19:13:25 +0000 (11:13 -0800)]
Merge tag 'memblock-v6.13-rc1' of git://git./linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
- replace hardcoded strings with str_on_off() in report_meminit()
- initialize reserved pages to MIGRATE_MOVABLE when deferred struct
page initialization is enabled so that if the reserved pages are
freed they are put on movable free lists like it is done now when
deferred struct page initialization is disabled
* tag 'memblock-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock: uniformly initialize all reserved pages to MIGRATE_MOVABLE
mm: Use str_on_off() helper function in report_meminit()
Linus Torvalds [Wed, 27 Nov 2024 18:20:50 +0000 (10:20 -0800)]
Merge tag 'modules-6.13-rc1' of git://git./linux/kernel/git/modules/linux
Pull modules updates from Luis Chamberlain:
- The whole caching of module code into huge pages by Mike Rapoport is
going in through Andrew Morton's tree due to some other code
dependencies. That's really the biggest highlight for Linux kernel
modules in this release. With it we share huge pages for modules,
starting off with x86. Expect to see that soon through Andrew!
- Helge Deller addressed some lingering low hanging fruit alignment
enhancements by. It is worth pointing out that from his old patch
series I dropped his vmlinux.lds.h change at Masahiro's request as he
would prefer this to be specified in asm code [0].
[0] https://lore.kernel.org/all/
20240129192644.
3359978-5-mcgrof@kernel.org/T/#m9efef5e700fbecd28b7afb462c15eed8ba78ef5a
- Matthew Maurer and Sami Tolvanen have been tag teaming to help get us
closer to a modversions for Rust. In this cycle we take in quite a
lot of the refactoring for ELF validation. I expect modversions for
Rust will be merged by v6.14 as that code is mostly ready now.
- Adds a new modules selftests: kallsyms which helps us tests
find_symbol() and the limits of kallsyms on Linux today.
- We have a realtime mailing list to kernel-ci testing for modules now
which relies and combines patchwork, kpd and kdevops:
https://patchwork.kernel.org/project/linux-modules/list/
https://github.com/linux-kdevops/kdevops/blob/main/docs/kernel-ci/README.md
https://github.com/linux-kdevops/kdevops/blob/main/docs/kernel-ci/kernel-ci-kpd.md
https://github.com/linux-kdevops/kdevops/blob/main/docs/kernel-ci/linux-modules-kdevops-ci.md
If you want to help avoid Linux kernel modules regressions, now its
simple, just add a new Linux modules sefltests under
tools/testing/selftests/module/ That is it. All new selftests will be
used and leveraged automatically by the CI.
* tag 'modules-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
tests/module/gen_test_kallsyms.sh: use 0 value for variables
scripts: Remove export_report.pl
selftests: kallsyms: add MODULE_DESCRIPTION
selftests: add new kallsyms selftests
module: Reformat struct for code style
module: Additional validation in elf_validity_cache_strtab
module: Factor out elf_validity_cache_strtab
module: Group section index calculations together
module: Factor out elf_validity_cache_index_str
module: Factor out elf_validity_cache_index_sym
module: Factor out elf_validity_cache_index_mod
module: Factor out elf_validity_cache_index_info
module: Factor out elf_validity_cache_secstrings
module: Factor out elf_validity_cache_sechdrs
module: Factor out elf_validity_ehdr
module: Take const arg in validate_section_offset
modules: Add missing entry for __ex_table
modules: Ensure 64-bit alignment on __ksymtab_* sections
Rafael J. Wysocki [Wed, 27 Nov 2024 17:59:16 +0000 (18:59 +0100)]
Merge branch 'thermal-intel'
Merge updates of Intel int3400 thermal driver for 6.13-rc1:
- Remove the data_vault attribute_group from int3400 because it is only
used for exposing one binary file that can be exposed directly (Thomas
Weißschuh).
- Prevent the current_uuid sysfs attribute in int3400 from mistakenly
treating valid UUID values as invalid on some older systems (Srinivas
Pandruvada).
* thermal-intel:
thermal: int3400: Remove unneeded data_vault attribute_group
thermal: int3400: Fix reading of current_uuid for active policy
Rafael J. Wysocki [Wed, 27 Nov 2024 17:49:48 +0000 (18:49 +0100)]
Merge branches 'acpi-misc' and 'acpi-x86'
Merge miscellaneous ACPI changes and x86-specific ACPI updates for
6.13-rc1:
- Introduce acpi_arch_init() for architecture-specific ACPI subsystem
initialization (Miao Wang).
- Clean up Asus quirks in acpi_quirk_skip_dmi_ids[] and add a quirk
to skip I2C clients on Acer Iconia One 8 A1-840 (Hans de Goede).
* acpi-misc:
ACPI: introduce acpi_arch_init()
* acpi-x86:
ACPI: x86: Clean up Asus entries in acpi_quirk_skip_dmi_ids[]
ACPI: x86: Add skip i2c clients quirk for Acer Iconia One 8 A1-840
Rafael J. Wysocki [Wed, 27 Nov 2024 17:41:48 +0000 (18:41 +0100)]
Merge branch 'pm-opp'
Merge OPP (Operating Performance Points) changes for 6.13-rc1:
- Describe opp-supported-hw property for ti-cpu (Dhruva Gole).
- Remove unused declarations from the OPP header file (Zhang Zekun).
* pm-opp:
dt-bindings: opp: operating-points-v2-ti-cpu: Describe opp-supported-hw
OPP: Remove unused declarations in header file
Linus Torvalds [Wed, 27 Nov 2024 16:11:46 +0000 (08:11 -0800)]
Merge tag 'vfs-6.13-rc1.fixes' of git://git./linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix a few iomap bugs
- Fix a wrong argument in backing file callback
- Fix security mount option retrieval in statmount()
- Cleanup how statmount() handles unescaped options
- Add a missing inode_owner_or_capable() check for setting write hints
- Clear the return value in read_kcore_iter() after a successful
iov_iter_zero()
- Fix a mount_setattr() selftest
- Fix function signature in mount api documentation
- Remove duplicate include header in the fscache code
* tag 'vfs-6.13-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs/backing_file: fix wrong argument in callback
fs_parser: update mount_api doc to match function signature
fs: require inode_owner_or_capable for F_SET_RW_HINT
fs/proc/kcore.c: Clear ret value in read_kcore_iter after successful iov_iter_zero
statmount: fix security option retrieval
statmount: clean up unescaped option handling
fscache: Remove duplicate included header
iomap: elide flush from partial eof zero range
iomap: lift zeroed mapping handling into iomap_zero_range()
iomap: reset per-iter state on non-error iter advances
iomap: warn on zero range of a post-eof folio
selftests/mount_setattr: Fix failures on 64K PAGE_SIZE kernels
Linus Torvalds [Wed, 27 Nov 2024 16:03:38 +0000 (08:03 -0800)]
Merge tag 'vfs-6.13.exec.deny_write_access.revert' of git://git./linux/kernel/git/vfs/vfs
Pull deny_write_access revert from Christian Brauner:
"It turns out that the mold linker relies on the deny_write_access()
mechanism for executables.
The mold linker tries to open a file for writing and if ETXTBSY is
returned mold falls back to creating a new file"
* tag 'vfs-6.13.exec.deny_write_access.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
Revert "fs: don't block i_writecount during exec"
Christian Brauner [Wed, 27 Nov 2024 11:45:02 +0000 (12:45 +0100)]
Revert "fs: don't block i_writecount during exec"
This reverts commit
2a010c41285345da60cece35575b4e0af7e7bf44.
Rui Ueyama <rui314@gmail.com> writes:
> I'm the creator and the maintainer of the mold linker
> (https://github.com/rui314/mold). Recently, we discovered that mold
> started causing process crashes in certain situations due to a change
> in the Linux kernel. Here are the details:
>
> - In general, overwriting an existing file is much faster than
> creating an empty file and writing to it on Linux, so mold attempts to
> reuse an existing executable file if it exists.
>
> - If a program is running, opening the executable file for writing
> previously failed with ETXTBSY. If that happens, mold falls back to
> creating a new file.
>
> - However, the Linux kernel recently changed the behavior so that
> writing to an executable file is now always permitted
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=
2a010c412853).
>
> That caused mold to write to an executable file even if there's a
> process running that file. Since changes to mmap'ed files are
> immediately visible to other processes, any processes running that
> file would almost certainly crash in a very mysterious way.
> Identifying the cause of these random crashes took us a few days.
>
> Rejecting writes to an executable file that is currently running is a
> well-known behavior, and Linux had operated that way for a very long
> time. So, I don’t believe relying on this behavior was our mistake;
> rather, I see this as a regression in the Linux kernel.
Quoting myself from commit
2a010c412853 ("fs: don't block i_writecount during exec")
> Yes, someone in userspace could potentially be relying on this. It's not
> completely out of the realm of possibility but let's find out if that's
> actually the case and not guess.
It seems we found out that someone is relying on this obscure behavior.
So revert the change.
Link: https://github.com/rui314/mold/issues/1361
Link: https://lore.kernel.org/r/4a2bc207-76be-4715-8e12-7fc45a76a125@leemhuis.info
Cc: <stable@vger.kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Linus Torvalds [Wed, 27 Nov 2024 02:36:55 +0000 (18:36 -0800)]
Merge tag 'rpmsg-v6.13' of git://git./linux/kernel/git/remoteproc/linux
Pull rpmsg update from Bjorn Andersson:
"Correct GLINK driver's decoding of the CMD_OPEN message, as upper half
of the second parameter encodes 'priority', and 'length' is only the
lower half"
* tag 'rpmsg-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
rpmsg: glink: use only lower 16-bits of param2 for CMD_OPEN name length
Linus Torvalds [Wed, 27 Nov 2024 02:34:13 +0000 (18:34 -0800)]
Merge tag 'rproc-v6.13' of git://git./linux/kernel/git/remoteproc/linux
Pull remoteproc updates from Bjorn Andersson:
"Make Qualcomm TrustZone Peripherial Authentication Service-remoteproc
identifier/name human friendly. Add audio DSP support for the Qualcomm
SAR2130P. Ensure IMEM access in the Qualcomm modem remoteproc driver
is performed prior to the firmware enabling the XPU and locking us
out.
Improve error handling, error logging, compile testing support, and a
few other stylistic things across a variety of the drivers"
* tag 'rproc-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: (30 commits)
remoteproc: qcom: wcss: Remove double assignment in q6v5_wcss_probe()
remoteproc: qcom_q6v5_mss: Re-order writes to the IMEM region
remoteproc: qcom_wcnss_iris: Simplify with dev_err_probe()
remoteproc: qcom_q6v5_wcss: Simplify with dev_err_probe()
remoteproc: qcom_q6v5_pas: Simplify with dev_err_probe()
remoteproc: qcom_q6v5_mss: Drop redundant error printks in probe
remoteproc: qcom_q6v5_mss: Simplify with dev_err_probe()
remoteproc: qcom_q6v5_adsp: Simplify with dev_err_probe()
remoteproc: qcom_q6v5_pas: disable auto boot for wpss
remoteproc: qcom: pas: Make remoteproc name human friendly
remoteproc: qcom: pas: enable SAR2130P audio DSP support
remoteproc: qcom: pas: add minidump_id to SM8350 resources
dt-bindings: remoteproc: qcom,sm8350-pas: add SAR2130P aDSP compatible
dt-bindings: remoteproc: qcom,sm8550-pas: Add SM8750 ADSP
remoteproc: qcom: wcss: Remove subdevs on the error path of q6v5_wcss_probe()
remoteproc: qcom: adsp: Remove subdevs on the error path of adsp_probe()
remoteproc: qcom: pas: Remove subdevs on the error path of adsp_probe()
remoteproc: Switch back to struct platform_driver::remove()
remoteproc: k3-dsp: Force cast from iomem address space
remoteproc: k3-r5: Force cast from iomem address space
...
Linus Torvalds [Wed, 27 Nov 2024 02:28:11 +0000 (18:28 -0800)]
Merge tag 'hwmon-for-v6.13-rc1-take2' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- aquacomputer_d5next: Fix length of speed_input array
- tps23861: Fix reporting of negative temperatures
- tmp108: Do not fail in I3C probe when I3C regmap is a module
* tag 'hwmon-for-v6.13-rc1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (aquacomputer_d5next) Fix length of speed_input array
hwmon: (tps23861) Fix reporting of negative temperatures
hwmon: (tmp108) Do not fail in I3C probe when I3C regmap is a module
Linus Torvalds [Wed, 27 Nov 2024 02:23:31 +0000 (18:23 -0800)]
Merge tag 'i3c/for-6.13' of git://git./linux/kernel/git/i3c/linux
Pull i3c updates from Alexandre Belloni:
"Core:
- avoid possible deadlock on probe
- ensured preferred address is used on hot-join
Drivers:
- dw: add AMD I3C controller support
- mipi-i3c-hci: fix SETDASA, DMA interrupts fixes
- svc: many fixes for IBI and hotjoin"
* tag 'i3c/for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: Use i3cdev->desc->info instead of calling i3c_device_get_info() to avoid deadlock
i3c: mipi-i3c-hci: Support SETDASA CCC
i3c: dw: Add quirk to address OD/PP timing issue on AMD platform
i3c: dw: Add support for AMDI0015 ACPI ID
i3c: master: svc: Modify enabled_events bit 7:0 to act as IBI enable counter
i3c: Document I3C_ADDR_SLOT_EXT_STATUS_MASK
i3c: master: svc: Fix pm_runtime_set_suspended() with runtime pm enabled
i3c: mipi-i3c-hci: Handle interrupts according to current specifications
i3c: mipi-i3c-hci: Mask ring interrupts before ring stop request
i3c: master: Fix miss free init_dyn_addr at i3c_master_put_i3c_addrs()
i3c: master: Remove i3c_dev_disable_ibi_locked(olddev) on device hotjoin
i3c: master: svc: fix possible assignment of the same address to two devices
i3c: master: svc: wait for Manual ACK/NACK Done before next step
i3c: master: svc: use spin_lock_irqsave at svc_i3c_master_ibi_work()
i3c: master: svc: need check IBIWON for dynamic address assignment
i3c: master: svc: manually emit NACK/ACK for hotjoin
i3c: master: svc: use repeat start when IBI WIN happens
i3c: master: Fix dynamic address leak when 'assigned-address' is present
i3c: master: Extend address status bit to 4 and add I3C_ADDR_SLOT_EXT_DESIRED
i3c: master: Replace hard code 2 with macro I3C_ADDR_SLOT_STATUS_BITS
Linus Torvalds [Wed, 27 Nov 2024 02:05:44 +0000 (18:05 -0800)]
Merge tag 'pci-v6.13-changes' of git://git./linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Make pci_stop_dev() and pci_destroy_dev() safe so concurrent
callers can't stop a device multiple times, even as we migrate from
the global pci_rescan_remove_lock to finer-grained locking (Keith
Busch)
- Improve pci_walk_bus() implementation by making it recursive and
moving locking up to avoid need for a 'locked' parameter (Keith
Busch)
- Unexport pci_walk_bus_locked(), which is only used internally by
the PCI core (Keith Busch)
- Detect some Thunderbolt chips that are built-in and hence
'trustworthy' by a heuristic since the 'ExternalFacingPort' and
'usb4-host-interface' ACPI properties are not quite enough (Esther
Shimanovich)
Resource management:
- Use PCI bus addresses (not CPU addresses) in 'ranges' properties
when building dynamic DT nodes so systems where PCI and CPU
addresses differ work correctly (Andrea della Porta)
- Tidy resource sizing and assignment with helpers to reduce
redundancy (Ilpo Järvinen)
- Improve pdev_sort_resources() 'bogus alignment' warning to be more
specific (Ilpo Järvinen)
Driver binding:
- Convert driver .remove_new() callbacks to .remove() again to finish
the conversion from returning 'int' to being 'void' (Sergio
Paracuellos)
- Export pcim_request_all_regions(), a managed interface to request
all BARs (Philipp Stanner)
- Replace pcim_iomap_regions_request_all() with
pcim_request_all_regions(), and pcim_iomap_table()[n] with
pcim_iomap(n), in the following drivers: ahci, crypto qat, crypto
octeontx2, intel_th, iwlwifi, ntb idt, serial rp2, ALSA korg1212
(Philipp Stanner)
- Remove the now unused pcim_iomap_regions_request_all() (Philipp
Stanner)
- Export pcim_iounmap_region(), a managed interface to unmap and
release a PCI BAR (Philipp Stanner)
- Replace pcim_iomap_regions(mask) with pcim_iomap_region(n), and
pcim_iounmap_regions(mask) with pcim_iounmap_region(n), in the
following drivers: fpga dfl-pci, block mtip32xx, gpio-merrifield,
cavium (Philipp Stanner)
Error handling:
- Add sysfs 'reset_subordinate' to reset the entire hierarchy below a
bridge; previously Secondary Bus Reset could only be used when
there was a single device below a bridge (Keith Busch)
- Warn if we reset a running device where the driver didn't register
pci_error_handlers notification callbacks (Keith Busch)
ASPM:
- Disable ASPM L1 before touching L1 PM Substates to follow the spec
closer and avoid a CPU load timeout on some platforms (Ajay
Agarwal)
- Set devices below Intel VMD to D0 before enabling ASPM L1 Substates
as required per spec for all L1 Substates changes (Jian-Hong Pan)
Power management:
- Enable starfive controller runtime PM before probing host bridge
(Mayank Rana)
- Enable runtime power management for host bridges (Krishna chaitanya
chundru)
Power control:
- Use of_platform_device_create() instead of of_platform_populate()
to create pwrctl platform devices so we can control it based on the
child nodes (Manivannan Sadhasivam)
- Create pwrctrl platform devices only if there's a relevant power
supply property (Manivannan Sadhasivam)
- Add device link from the pwrctl supplier to the PCI dev to ensure
pwrctl drivers are probed before the PCI dev driver; this avoids a
race where pwrctl could change device power state while the PCI
driver was active (Manivannan Sadhasivam)
- Find pwrctl device for removal with of_find_device_by_node()
instead of searching all children of the parent (Manivannan
Sadhasivam)
- Rename 'pwrctl' to 'pwrctrl' to match new bandwidth controller
('bwctrl') and hotplug files (Bjorn Helgaas)
Bandwidth control:
- Add read/modify/write locking for Link Control 2, which is used to
manage Link speed (Ilpo Järvinen)
- Extract Link Bandwidth Management Status check into
pcie_lbms_seen(), where it can be shared between the bandwidth
controller and quirks that use it to help retrain failed links
(Ilpo Järvinen)
- Re-add Link Bandwidth notification support with updates to address
the reasons it was previously reverted (Alexandru Gagniuc, Ilpo
Järvinen)
- Add pcie_set_target_speed() and related functionality so drivers
can manage PCIe Link speed based on thermal or other constraints
(Ilpo Järvinen)
- Add a thermal cooling driver to throttle PCIe Links via the
existing thermal management framework (Ilpo Järvinen)
- Add a userspace selftest for the PCIe bandwidth controller (Ilpo
Järvinen)
PCI device hotplug:
- Add hotplug controller driver for Marvell OCTEON multi-function
device where function 0 has a management console interface to
enable/disable and provision various personalities for the other
functions (Shijith Thotton)
- Retain a reference to the pci_bus for the lifetime of a pci_slot to
avoid a use-after-free when the thunderbolt driver resets USB4 host
routers on boot, causing hotplug remove/add of downstream docks or
other devices (Lukas Wunner)
- Remove unused cpcihp struct cpci_hp_controller_ops.hardware_test
(Guilherme Giacomo Simoes)
- Remove unused cpqphp struct ctrl_dbg.ctrl (Christophe JAILLET)
- Use pci_bus_read_dev_vendor_id() instead of hand-coded presence
detection in cpqphp (Ilpo Järvinen)
- Simplify cpqphp enumeration, which is already simple-minded and
doesn't handle devices below hot-added bridges (Ilpo Järvinen)
Virtualization:
- Add ACS quirk for Wangxun FF5xxx NICs, which don't advertise an ACS
capability but do isolate functions as though PCI_ACS_RR and
PCI_ACS_CR were set, so the functions can be in independent IOMMU
groups (Mengyuan Lou)
TLP Processing Hints (TPH):
- Add and document TLP Processing Hints (TPH) support so drivers can
enable and disable TPH and the kernel can save/restore TPH
configuration (Wei Huang)
- Add TPH Steering Tag support so drivers can retrieve Steering Tag
values associated with specific CPUs via an ACPI _DSM to improve
performance by directing DMA writes closer to their consumers (Wei
Huang)
Data Object Exchange (DOE):
- Wait up to 1 second for DOE Busy bit to clear before writing a
request to the mailbox to avoid failures if the mailbox is still
busy from a previous transfer (Gregory Price)
Endpoint framework:
- Skip attempts to allocate from endpoint controller memory window if
the requested size is larger than the window (Damien Le Moal)
- Add and document pci_epc_mem_map() and pci_epc_mem_unmap() to
handle controller-specific size and alignment constraints, and add
test cases to the endpoint test driver (Damien Le Moal)
- Implement dwc pci_epc_ops.align_addr() so pci_epc_mem_map() can
observe DWC-specific alignment requirements (Damien Le Moal)
- Synchronously cancel command handler work in endpoint test before
cleaning up DMA and BARs (Damien Le Moal)
- Respect endpoint page size in dw_pcie_ep_align_addr() (Niklas
Cassel)
- Use dw_pcie_ep_align_addr() in dw_pcie_ep_raise_msi_irq() and
dw_pcie_ep_raise_msix_irq() instead of open coding the equivalent
(Niklas Cassel)
- Avoid NULL dereference if Modem Host Interface Endpoint lacks
'mmio' DT property (Zhongqiu Han)
- Release PCI domain ID of Endpoint controller parent (not controller
itself) and before unregistering the controller, to avoid
use-after-free (Zijun Hu)
- Clear secondary (not primary) EPC in pci_epc_remove_epf() when
removing the secondary controller associated with an NTB (Zijun Hu)
Cadence PCIe controller driver:
- Lower severity of 'phy-names' message (Bartosz Wawrzyniak)
Freescale i.MX6 PCIe controller driver:
- Fix suspend/resume support on i.MX6QDL, which has a hardware
erratum that prevents use of L2 (Stefan Eichenberger)
Intel VMD host bridge driver:
- Add 0xb60b and 0xb06f Device IDs for client SKUs (Nirmal Patel)
MediaTek PCIe Gen3 controller driver:
- Update mediatek-gen3 DT binding to require the exact number of
clocks for each SoC (Fei Shao)
- Add support for DT 'max-link-speed' and 'num-lanes' properties to
restrict the link speed and width (AngeloGioacchino Del Regno)
Microchip PolarFlare PCIe controller driver:
- Add DT and driver support for using either of the two PolarFire
Root Ports (Conor Dooley)
NVIDIA Tegra194 PCIe controller driver:
- Move endpoint controller cleanups that depend on refclk from the
host to the notifier that tells us the host has deasserted PERST#,
when refclk should be valid (Manivannan Sadhasivam)
Qualcomm PCIe controller driver:
- Add qcom SAR2130P DT binding with an additional clock (Dmitry
Baryshkov)
- Enable MSI interrupts if 'global' IRQ is supported, since a
previous commit unintentionally masked them (Manivannan Sadhasivam)
- Move endpoint controller cleanups that depend on refclk from the
host to the notifier that tells us the host has deasserted PERST#,
when refclk should be valid (Manivannan Sadhasivam)
- Add DT binding and driver support for IPQ9574, with Synopsys IP
v5.80a and Qcom IP 1.27.0 (devi priya)
- Move the OPP "operating-points-v2" table from the
qcom,pcie-sm8450.yaml DT binding to qcom,pcie-common.yaml, where it
can be used by other Qcom platforms (Qiang Yu)
- Add 'global' SPI interrupt for events like link-up, link-down to
qcom,pcie-x1e80100 DT binding so we can start enumeration when the
link comes up (Qiang Yu)
- Disable ASPM L0s for qcom,pcie-x1e80100 since the PHY is not tuned
to support this (Qiang Yu)
- Add ops_1_21_0 for SC8280X family SoC, which doesn't use the
'iommu-map' DT property and doesn't need BDF-to-SID translation
(Qiang Yu)
Rockchip PCIe controller driver:
- Define ROCKCHIP_PCIE_AT_SIZE_ALIGN to replace magic 256 endpoint
.align value (Damien Le Moal)
- When unmapping an endpoint window, compute the region index instead
of searching for it, and verify that the address was mapped (Damien
Le Moal)
- When mapping an endpoint window, verify that the address hasn't
been mapped already (Damien Le Moal)
- Implement pci_epc_ops.align_addr() for rockchip-ep (Damien Le Moal)
- Fix MSI IRQ data mapping to observe the alignment constraint, which
fixes intermittent page faults in memcpy_toio() and memcpy_fromio()
(Damien Le Moal)
- Rename rockchip_pcie_parse_ep_dt() to
rockchip_pcie_ep_get_resources() for consistency with similar DT
interfaces (Damien Le Moal)
- Skip the unnecessary link train in rockchip_pcie_ep_probe() and do
it only in the endpoint start operation (Damien Le Moal)
- Implement pci_epc_ops.stop_link() to disable link training and
controller configuration (Damien Le Moal)
- Attempt link training at 5 GT/s when both partners support it
(Damien Le Moal)
- Add a handler for PERST# signal so we can detect host-initiated
resets and start link training after PERST# is deasserted (Damien
Le Moal)
Synopsys DesignWare PCIe controller driver:
- Clear outbound address on unmap so dw_pcie_find_index() won't match
an ATU index that was already unmapped (Damien Le Moal)
- Use of_property_present() instead of of_property_read_bool() when
testing for presence of non-boolean DT properties (Rob Herring)
- Advertise 1MB size if endpoint supports Resizable BARs, which was
inadvertently lost in v6.11 (Niklas Cassel)
TI J721E PCIe driver:
- Add PCIe support for J722S SoC (Siddharth Vadapalli)
- Delay PCIE_T_PVPERL_MS (100 ms), not just PCIE_T_PERST_CLK_US (100
us), before deasserting PERST# to ensure power and refclk are
stable (Siddharth Vadapalli)
TI Keystone PCIe controller driver:
- Set the 'ti,keystone-pcie' mode so v3.65a devices work in Root
Complex mode (Kishon Vijay Abraham I)
- Try to avoid unrecoverable SError for attempts to issue config
transactions when the link is down; this is racy but the best we
can do (Kishon Vijay Abraham I)
Miscellaneous:
- Reorganize kerneldoc parameter names to match order in function
signature (Julia Lawall)
- Fix sysfs reset_method_store() memory leak (Todd Kjos)
- Simplify pci_create_slot() (Ilpo Järvinen)
- Fix incorrect printf format specifiers in pcitest (Luo Yifan)"
* tag 'pci-v6.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (127 commits)
PCI: rockchip-ep: Handle PERST# signal in EP mode
PCI: rockchip-ep: Improve link training
PCI: rockship-ep: Implement the pci_epc_ops::stop_link() operation
PCI: rockchip-ep: Refactor endpoint link training enable
PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() MSI-X hiding
PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() memory allocations
PCI: rockchip-ep: Rename rockchip_pcie_parse_ep_dt()
PCI: rockchip-ep: Fix MSI IRQ data mapping
PCI: rockchip-ep: Implement the pci_epc_ops::align_addr() operation
PCI: rockchip-ep: Improve rockchip_pcie_ep_map_addr()
PCI: rockchip-ep: Improve rockchip_pcie_ep_unmap_addr()
PCI: rockchip-ep: Use a macro to define EP controller .align feature
PCI: rockchip-ep: Fix address translation unit programming
PCI/pwrctrl: Rename pwrctrl functions and structures
PCI/pwrctrl: Rename pwrctl files to pwrctrl
PCI/pwrctl: Remove pwrctl device without iterating over all children of pwrctl parent
PCI/pwrctl: Ensure that pwrctl drivers are probed before PCI client drivers
PCI/pwrctl: Create pwrctl device only if at least one power supply is present
PCI/pwrctl: Use of_platform_device_create() to create pwrctl devices
tools: PCI: Fix incorrect printf format specifiers
...
Linus Torvalds [Wed, 27 Nov 2024 01:54:58 +0000 (17:54 -0800)]
rust: fix up formatting after merge
When I merged the rust 'use' imports, I didn't realize that there's
an offical preferred idiomatic format - so while it all worked fine,
it doesn't match what 'make rustfmt' wants to make it.
Fix it up appropriately.
Suggested-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 26 Nov 2024 22:54:00 +0000 (14:54 -0800)]
Merge tag 'perf-tools-for-v6.13-2024-11-24' of git://git./linux/kernel/git/perf/perf-tools
Pull perf tools updates from Namhyung Kim:
"perf record:
- Enable leader sampling for inherited task events. It was supported
only for system-wide events but the kernel started to support such
a setup since v6.12.
This is to reduce the number of PMU interrupts. The samples of the
leader event will contain counts of other events and no samples
will be generated for the other member events.
$ perf record -e '{cycles,instructions}:S' ${MYPROG}
perf report:
- Fix --branch-history option to display more branch-related
information like prediction, abort and cycles which is available
on Intel machines.
$ perf record -bg -- perf test -w brstack
$ perf report --branch-history
...
#
# Overhead Source:Line Symbol Shared Object Predicted Abort Cycles IPC [IPC Coverage]
# ........ ........................ .............. .................... ......... ..... ...... ....................
#
8.17% copy_page_64.S:19 [k] copy_page [kernel.kallsyms] 50.0% 0 5 - -
|
---xas_load xarray.h:171
|
|--5.68%--xas_load xarray.c:245 (cycles:1)
| xas_load xarray.c:242
| xas_load xarray.h:1260 (cycles:1)
| xas_descend xarray.c:146
| xas_load xarray.c:244 (cycles:2)
| xas_load xarray.c:245
| xas_descend xarray.c:218 (cycles:10)
...
perf stat:
- Add HWMON PMU support.
The HWMON provides various system information like CPU/GPU
temperature, fan speed and so on. Expose them as PMU events so that
users can see the values using perf stat commands.
$ perf stat -e temp_cpu,fan1 true
Performance counter stats for 'true':
60.00 'C temp_cpu
0 rpm fan1
0.
000745382 seconds time elapsed
0.
000883000 seconds user
0.
000000000 seconds sys
- Display metric threshold in JSON output.
Some metrics define thresholds to classify value ranges. It used to
be in a different color but it won't work for JSON.
Add "metric-threshold" field to the JSON that can be one of "good",
"less good", "nearly bad" and "bad".
# perf stat -a -M TopdownL1 -j true
{"counter-value" : "
18693525.000000", "unit" : "", "event" : "TOPDOWN.SLOTS", "event-runtime" :
5552708, "pcnt-running" : 100.00, "metric-value" : "43.226002", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"}
{"metric-value" : "29.212267", "metric-unit" : "% tma_frontend_bound", "metric-threshold" : "bad"}
{"metric-value" : "7.138972", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"}
{"metric-value" : "20.422759", "metric-unit" : "% tma_retiring", "metric-threshold" : "good"}
{"counter-value" : "
3817732.000000", "unit" : "", "event" : "topdown-retiring", "event-runtime" :
5552708, "pcnt-running" : 100.00, }
{"counter-value" : "
5472824.000000", "unit" : "", "event" : "topdown-fe-bound", "event-runtime" :
5552708, "pcnt-running" : 100.00, }
{"counter-value" : "
7984780.000000", "unit" : "", "event" : "topdown-be-bound", "event-runtime" :
5552708, "pcnt-running" : 100.00, }
{"counter-value" : "
1418181.000000", "unit" : "", "event" : "topdown-bad-spec", "event-runtime" :
5552708, "pcnt-running" : 100.00, }
...
perf sched:
- Add -P/--pre-migrations option for 'timehist' sub-command to track
time a task waited on a run-queue before migrating to a different
CPU.
$ perf sched timehist -P
time cpu task name wait time sch delay run time pre-mig time
[tid/pid] (msec) (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- --------- ---------
585940.535527 [0000] perf[584885] 0.000 0.000 0.000 0.000
585940.535535 [0000] migration/0[20] 0.000 0.002 0.008 0.000
585940.535559 [0001] perf[584885] 0.000 0.000 0.000 0.000
585940.535563 [0001] migration/1[25] 0.000 0.001 0.004 0.000
585940.535678 [0002] perf[584885] 0.000 0.000 0.000 0.000
585940.535686 [0002] migration/2[31] 0.000 0.002 0.008 0.000
585940.535905 [0001] <idle> 0.000 0.000 0.342 0.000
585940.535938 [0003] perf[584885] 0.000 0.000 0.000 0.000
585940.537048 [0001] sleep[584886] 0.000 0.019 1.142 0.001
585940.537749 [0002] <idle> 0.000 0.000 2.062 0.000
...
Build:
- Make libunwind opt-in (LIBUNWIND=1) rather than opt-out.
The perf tools are generally built with libelf and libdw which has
unwinder functionality. The libunwind support predates it and no
need to have duplicate unwinders by default.
- Rename NO_DWARF=1 build option to NO_LIBDW=1 in order to clarify
it's using libdw for handling DWARF information.
Internals:
- Do not set exclude_guest bit in the perf_event_attr by default.
This was causing a trouble in AMD IBS PMU as it doesn't support the
bit. The bit will be set when it's needed later by the fallback
logic. Also update the missing feature detection logic to make sure
not clear supported bits unnecessarily.
- Run perf test in parallel by default and mark flaky tests
"exclusive" to run them serially at the end. Some test numbers are
changed but the test can complete in less than half the time.
JSON vendor events:
- Add AMD Zen 5 events and metrics.
- Add i.MX91 and i.MX95 DDR metrics
- Fix HiSilicon HIP08 Topdown metric name.
- Support compat events on PowerPC"
* tag 'perf-tools-for-v6.13-2024-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (232 commits)
perf tests: Fix hwmon parsing with PMU name test
perf hwmon_pmu: Ensure hwmon key union is zeroed before use
perf tests hwmon_pmu: Remove double evlist__delete()
perf/test: fix perf ftrace test on s390
perf bpf-filter: Return -ENOMEM directly when pfi allocation fails
perf test: Correct hwmon test PMU detection
perf: Remove unused del_perf_probe_events()
perf pmu: Move pmu_metrics_table__find and remove ARM override
perf jevents: Add map_for_cpu()
perf header: Pass a perf_cpu rather than a PMU to get_cpuid_str
perf header: Avoid transitive PMU includes
perf arm64 header: Use cpu argument in get_cpuid
perf header: Refactor get_cpuid to take a CPU for ARM
perf header: Move is_cpu_online to numa bench
perf jevents: fix breakage when do perf stat on system metric
perf test: Add missing __exit calls in tool/hwmon tests
perf tests: Make leader sampling test work without branch event
perf util: Remove kernel version deadcode
perf test shell trace_exit_race: Use --no-comm to avoid cases where COMM isn't resolved
perf test shell trace_exit_race: Show what went wrong in verbose mode
...
Linus Torvalds [Tue, 26 Nov 2024 22:49:20 +0000 (14:49 -0800)]
Merge tag 'parisc-for-6.13-rc1' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc architecture update from Helge Deller:
- Fix function graph tracing disablement on parisc
* tag 'parisc-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc/ftrace: Fix function graph tracing disablement
Linus Torvalds [Tue, 26 Nov 2024 22:47:58 +0000 (14:47 -0800)]
Merge tag 'm68knommu-for-v6.13' of git://git./linux/kernel/git/gerg/m68knommu
Pull m68knommu updates from Greg Ungerer:
- only include FEC platform entries when hardware supports it
- fix typo in ifdef config name
* tag 'm68knommu-for-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: coldfire/device.c: only build FEC when HW macros are defined
m68k: mcfgpio: Fix incorrect register offset for CONFIG_M5441x
Linus Torvalds [Tue, 26 Nov 2024 22:00:26 +0000 (14:00 -0800)]
Merge tag 'rust-6.13' of https://github.com/Rust-for-Linux/linux
Pull rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Enable a series of lints, including safety-related ones, e.g. the
compiler will now warn about missing safety comments, as well as
unnecessary ones. How safety documentation is organized is a
frequent source of review comments, thus having the compiler guide
new developers on where they are expected (and where not) is very
nice.
- Start using '#[expect]': an interesting feature in Rust (stabilized
in 1.81.0) that makes the compiler warn if an expected warning was
_not_ emitted. This is useful to avoid forgetting cleaning up
locally ignored diagnostics ('#[allow]'s).
- Introduce '.clippy.toml' configuration file for Clippy, the Rust
linter, which will allow us to tweak its behaviour. For instance,
our first use cases are declaring a disallowed macro and, more
importantly, enabling the checking of private items.
- Lints-related fixes and cleanups related to the items above.
- Migrate from 'receiver_trait' to 'arbitrary_self_types': to get the
kernel into stable Rust, one of the major pieces of the puzzle is
the support to write custom types that can be used as 'self', i.e.
as receivers, since the kernel needs to write types such as 'Arc'
that common userspace Rust would not. 'arbitrary_self_types' has
been accepted to become stable, and this is one of the steps
required to get there.
- Remove usage of the 'new_uninit' unstable feature.
- Use custom C FFI types. Includes a new 'ffi' crate to contain our
custom mapping, instead of using the standard library 'core::ffi'
one. The actual remapping will be introduced in a later cycle.
- Map '__kernel_{size_t,ssize_t,ptrdiff_t}' to 'usize'/'isize'
instead of 32/64-bit integers.
- Fix 'size_t' in bindgen generated prototypes of C builtins.
- Warn on bindgen < 0.69.5 and libclang >= 19.1 due to a double issue
in the projects, which we managed to trigger with the upcoming
tracepoint support. It includes a build test since some
distributions backported the fix (e.g. Debian -- thanks!). All
major distributions we list should be now OK except Ubuntu non-LTS.
'macros' crate:
- Adapt the build system to be able run the doctests there too; and
clean up and enable the corresponding doctests.
'kernel' crate:
- Add 'alloc' module with generic kernel allocator support and remove
the dependency on the Rust standard library 'alloc' and the
extension traits we used to provide fallible methods with flags.
Add the 'Allocator' trait and its implementations '{K,V,KV}malloc'.
Add the 'Box' type (a heap allocation for a single value of type
'T' that is also generic over an allocator and considers the
kernel's GFP flags) and its shorthand aliases '{K,V,KV}Box'. Add
'ArrayLayout' type. Add 'Vec' (a contiguous growable array type)
and its shorthand aliases '{K,V,KV}Vec', including iterator
support.
For instance, now we may write code such as:
let mut v = KVec::new();
v.push(1, GFP_KERNEL)?;
assert_eq!(&v, &[1]);
Treewide, move as well old users to these new types.
- 'sync' module: add global lock support, including the
'GlobalLockBackend' trait; the 'Global{Lock,Guard,LockedBy}' types
and the 'global_lock!' macro. Add the 'Lock::try_lock' method.
- 'error' module: optimize 'Error' type to use 'NonZeroI32' and make
conversion functions public.
- 'page' module: add 'page_align' function.
- Add 'transmute' module with the existing 'FromBytes' and 'AsBytes'
traits.
- 'block::mq::request' module: improve rendered documentation.
- 'types' module: extend 'Opaque' type documentation and add simple
examples for the 'Either' types.
drm/panic:
- Clean up a series of Clippy warnings.
Documentation:
- Add coding guidelines for lints and the '#[expect]' feature.
- Add Ubuntu to the list of distributions in the Quick Start guide.
MAINTAINERS:
- Add Danilo Krummrich as maintainer of the new 'alloc' module.
And a few other small cleanups and fixes"
* tag 'rust-6.13' of https://github.com/Rust-for-Linux/linux: (82 commits)
rust: alloc: Fix `ArrayLayout` allocations
docs: rust: remove spurious item in `expect` list
rust: allow `clippy::needless_lifetimes`
rust: warn on bindgen < 0.69.5 and libclang >= 19.1
rust: use custom FFI integer types
rust: map `__kernel_size_t` and friends also to usize/isize
rust: fix size_t in bindgen prototypes of C builtins
rust: sync: add global lock support
rust: macros: enable the rest of the tests
rust: macros: enable paste! use from macro_rules!
rust: enable macros::module! tests
rust: kbuild: expand rusttest target for macros
rust: types: extend `Opaque` documentation
rust: block: fix formatting of `kernel::block::mq::request` module
rust: macros: fix documentation of the paste! macro
rust: kernel: fix THIS_MODULE header path in ThisModule doc comment
rust: page: add Rust version of PAGE_ALIGN
rust: helpers: remove unnecessary header includes
rust: exports: improve grammar in commentary
drm/panic: allow verbose version check
...
Linus Torvalds [Tue, 26 Nov 2024 21:44:27 +0000 (13:44 -0800)]
Merge tag 'docs-6.13-2' of git://git.lwn.net/linux
Pull more documentation updates from Jonathan Corbet:
"A few late-arriving fixes, plus two more significant changes that were
*almost* ready at the beginning of the merge window:
- A new document on debugging techniques from Sebastian Fricke
- A clarification on MODULE_LICENSE terms meant to head off the sort
of confusion that led to the recent Tuxedo Computers mess"
* tag 'docs-6.13-2' of git://git.lwn.net/linux:
docs: Add debugging guide for the media subsystem
docs: Add debugging section to process
docs/licensing: Clarify wording about "GPL" and "Proprietary"
docs: core-api/gfp_mask-from-fs-io: indicate that vmalloc supports GFP_NOFS/GFP_NOIO
Documentation: kernel-doc: enumerate identifier *type*s
Documentation: pwrseq: Fix trivial misspellings
Documentation: filesystems: update filename extensions
Linus Torvalds [Tue, 26 Nov 2024 21:39:02 +0000 (13:39 -0800)]
Merge tag 'vfs-6.13.ecryptfs.mount.api' of git://git./linux/kernel/git/vfs/vfs
Pull ecryptfs mount api conversion from Christian Brauner:
"Convert ecryptfs to the new mount api"
* tag 'vfs-6.13.ecryptfs.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
ecryptfs: Fix spelling mistake "validationg" -> "validating"
ecryptfs: Convert ecryptfs to use the new mount API
ecryptfs: Factor out mount option validation
Linus Torvalds [Tue, 26 Nov 2024 21:26:15 +0000 (13:26 -0800)]
Merge tag 'vfs-6.13.exportfs' of git://git./linux/kernel/git/vfs/vfs
Pull vfs exportfs updates from Christian Brauner:
"This contains work to bring NFS connectable file handles to userspace
servers.
The name_to_handle_at() system call is extended to encode connectable
file handles. Such file handles can be resolved to an open file with a
connected path. So far userspace NFS servers couldn't make use of this
functionality even though the kernel does already support it. This is
achieved by introducing a new flag for name_to_handle_at().
Similarly, the open_by_handle_at() system call is tought to understand
connectable file handles explicitly created via name_to_handle_at()"
* tag 'vfs-6.13.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: open_by_handle_at() support for decoding "explicit connectable" file handles
fs: name_to_handle_at() support for "explicit connectable" file handles
fs: prepare for "explicit connectable" file handles
Linus Torvalds [Tue, 26 Nov 2024 21:18:00 +0000 (13:18 -0800)]
Merge tag 'vfs-6.13.rust.pid_namespace' of git://git./linux/kernel/git/vfs/vfs
Pull pid_namespace rust bindings from Christian Brauner:
"This contains my Rust bindings for pid namespaces needed for various
rust drivers. Here's a description of the basic C semantics and how
they are mapped to Rust.
The pid namespace of a task doesn't ever change once the task is
alive. A unshare(CLONE_NEWPID) or setns(fd_pidns/pidfd, CLONE_NEWPID)
will not have an effect on the calling task's pid namespace. It will
only effect the pid namespace of children created by the calling task.
This invariant guarantees that after having acquired a reference to a
task's pid namespace it will remain unchanged.
When a task has exited and been reaped release_task() will be called.
This will set the pid namespace of the task to NULL. So retrieving the
pid namespace of a task that is dead will return NULL. Note, that
neither holding the RCU lock nor holding a reference count to the task
will prevent release_task() from being called.
In order to retrieve the pid namespace of a task the
task_active_pid_ns() function can be used. There are two cases to
consider:
(1) retrieving the pid namespace of the current task
(2) retrieving the pid namespace of a non-current task
From system call context retrieving the pid namespace for case (1) is
always safe and requires neither RCU locking nor a reference count to
be held. Retrieving the pid namespace after release_task() for current
will return NULL but no codepath like that is exposed to Rust.
Retrieving the pid namespace from system call context for (2) requires
RCU protection. Accessing a pid namespace outside of RCU protection
requires a reference count that must've been acquired while holding
the RCU lock. Note that accessing a non-current task means NULL can be
returned as the non-current task could have already passed through
release_task().
To retrieve (1) the current_pid_ns!() macro should be used. It ensures
that the returned pid namespace cannot outlive the calling scope. The
associated current_pid_ns() function should not be called directly as
it could be abused to created an unbounded lifetime for the pid
namespace. The current_pid_ns!() macro allows Rust to handle the
common case of accessing current's pid namespace without RCU
protection and without having to acquire a reference count.
For (2) the task_get_pid_ns() method must be used. This will always
acquire a reference on the pid namespace and will return an Option to
force the caller to explicitly handle the case where pid namespace is
None. Something that tends to be forgotten when doing the equivalent
operation in C.
Missing RCU primitives make it difficult to perform operations that
are otherwise safe without holding a reference count as long as RCU
protection is guaranteed. But it is not important currently. But we do
want it in the future.
Note that for (2) the required RCU protection around calling
task_active_pid_ns() synchronizes against putting the last reference
of the associated struct pid of task->thread_pid. The struct pid
stored in that field is used to retrieve the pid namespace of the
caller. When release_task() is called task->thread_pid will be NULLed
and put_pid() on said struct pid will be delayed in free_pid() via
call_rcu() allowing everyone with an RCU protected access to the
struct pid acquired from task->thread_pid to finish"
* tag 'vfs-6.13.rust.pid_namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
rust: add PidNamespace