Nikita Zhandarovich [Wed, 15 Jan 2025 16:42:20 +0000 (08:42 -0800)]
net/rose: prevent integer overflows in rose_setsockopt()
In case of possible unpredictably large arguments passed to
rose_setsockopt() and multiplied by extra values on top of that,
integer overflows may occur.
Do the safest minimum and fix these issues by checking the
contents of 'opt' and returning -EINVAL if they are too large. Also,
switch to unsigned int and remove useless check for negative 'opt'
in ROSE_IDLE case.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Link: https://patch.msgid.link/20250115164220.19954-1-n.zhandarovich@fintech.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mahdi Arghavani [Fri, 17 Jan 2025 21:37:51 +0000 (21:37 +0000)]
tcp_cubic: fix incorrect HyStart round start detection
I noticed that HyStart incorrectly marks the start of rounds,
leading to inaccurate measurements of ACK train lengths and
resetting the `ca->sample_cnt` variable. This inaccuracy can impact
HyStart's functionality in terminating exponential cwnd growth during
Slow-Start, potentially degrading TCP performance.
The issue arises because the changes introduced in commit
4e1fddc98d25
("tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows")
moved the caller of the `bictcp_hystart_reset` function inside the `hystart_update` function.
This modification added an additional condition for triggering the caller,
requiring that (tcp_snd_cwnd(tp) >= hystart_low_window) must also
be satisfied before invoking `bictcp_hystart_reset`.
This fix ensures that `bictcp_hystart_reset` is correctly called
at the start of a new round, regardless of the congestion window size.
This is achieved by moving the condition
(tcp_snd_cwnd(tp) >= hystart_low_window)
from before calling `bictcp_hystart_reset` to after it.
I tested with a client and a server connected through two Linux software routers.
In this setup, the minimum RTT was 150 ms, the bottleneck bandwidth was 50 Mbps,
and the bottleneck buffer size was 1 BDP, calculated as (50M / 1514 / 8) * 0.150 = 619 packets.
I conducted the test twice, transferring data from the server to the client for 1.5 seconds.
Before the patch was applied, HYSTART-DELAY stopped the exponential growth of cwnd when
cwnd = 516, and the bottleneck link was not yet saturated (516 < 619).
After the patch was applied, HYSTART-ACK-TRAIN stopped the exponential growth of cwnd when
cwnd = 632, and the bottleneck link was saturated (632 > 619).
In this test, applying the patch resulted in 300 KB more data delivered.
Fixes:
4e1fddc98d25 ("tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows")
Signed-off-by: Mahdi Arghavani <ma.arghavani@yahoo.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Haibo Zhang <haibo.zhang@otago.ac.nz>
Cc: David Eyers <david.eyers@otago.ac.nz>
Cc: Abbas Arghavani <abbas.arghavani@mdu.se>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roger Quadros [Thu, 16 Jan 2025 13:54:49 +0000 (15:54 +0200)]
net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns()
When getting the IRQ we use k3_udma_glue_tx_get_irq() which returns
negative error value on error. So not NULL check is not sufficient
to deteremine if IRQ is valid. Check that IRQ is greater then zero
to ensure it is valid.
There is no issue at probe time but at runtime user can invoke
.set_channels which results in the following call chain.
am65_cpsw_set_channels()
am65_cpsw_nuss_update_tx_rx_chns()
am65_cpsw_nuss_remove_tx_chns()
am65_cpsw_nuss_init_tx_chns()
At this point if am65_cpsw_nuss_init_tx_chns() fails due to
k3_udma_glue_tx_get_irq() then tx_chn->irq will be set to a
negative value.
Then, at subsequent .set_channels with higher channel count we
will attempt to free an invalid IRQ in am65_cpsw_nuss_remove_tx_chns()
leading to a kernel warning.
The issue is present in the original commit that introduced this driver,
although there, am65_cpsw_nuss_update_tx_rx_chns() existed as
am65_cpsw_nuss_update_tx_chns().
Fixes:
93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Thu, 16 Jan 2025 01:37:13 +0000 (17:37 -0800)]
net: sched: Disallow replacing of child qdisc from one parent to another
Lion Ackermann was able to create a UAF which can be abused for privilege
escalation with the following script
Step 1. create root qdisc
tc qdisc add dev lo root handle 1:0 drr
step2. a class for packet aggregation do demonstrate uaf
tc class add dev lo classid 1:1 drr
step3. a class for nesting
tc class add dev lo classid 1:2 drr
step4. a class to graft qdisc to
tc class add dev lo classid 1:3 drr
step5.
tc qdisc add dev lo parent 1:1 handle 2:0 plug limit 1024
step6.
tc qdisc add dev lo parent 1:2 handle 3:0 drr
step7.
tc class add dev lo classid 3:1 drr
step 8.
tc qdisc add dev lo parent 3:1 handle 4:0 pfifo
step 9. Display the class/qdisc layout
tc class ls dev lo
class drr 1:1 root leaf 2: quantum 64Kb
class drr 1:2 root leaf 3: quantum 64Kb
class drr 3:1 root leaf 4: quantum 64Kb
tc qdisc ls
qdisc drr 1: dev lo root refcnt 2
qdisc plug 2: dev lo parent 1:1
qdisc pfifo 4: dev lo parent 3:1 limit 1000p
qdisc drr 3: dev lo parent 1:2
step10. trigger the bug <=== prevented by this patch
tc qdisc replace dev lo parent 1:3 handle 4:0
step 11. Redisplay again the qdiscs/classes
tc class ls dev lo
class drr 1:1 root leaf 2: quantum 64Kb
class drr 1:2 root leaf 3: quantum 64Kb
class drr 1:3 root leaf 4: quantum 64Kb
class drr 3:1 root leaf 4: quantum 64Kb
tc qdisc ls
qdisc drr 1: dev lo root refcnt 2
qdisc plug 2: dev lo parent 1:1
qdisc pfifo 4: dev lo parent 3:1 refcnt 2 limit 1000p
qdisc drr 3: dev lo parent 1:2
Observe that a) parent for 4:0 does not change despite the replace request.
There can only be one parent. b) refcount has gone up by two for 4:0 and
c) both class 1:3 and 3:1 are pointing to it.
Step 12. send one packet to plug
echo "" | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888,priority=$((0x10001))
step13. send one packet to the grafted fifo
echo "" | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888,priority=$((0x10003))
step14. lets trigger the uaf
tc class delete dev lo classid 1:3
tc class delete dev lo classid 1:1
The semantics of "replace" is for a del/add _on the same node_ and not
a delete from one node(3:1) and add to another node (1:3) as in step10.
While we could "fix" with a more complex approach there could be
consequences to expectations so the patch takes the preventive approach of
"disallow such config".
Joint work with Lion Ackermann <nnamrec@gmail.com>
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250116013713.900000-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Antoine Tenart [Thu, 16 Jan 2025 09:21:57 +0000 (10:21 +0100)]
net: avoid race between device unregistration and ethnl ops
The following trace can be seen if a device is being unregistered while
its number of channels are being modified.
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 3 PID: 3754 at kernel/locking/mutex.c:564 __mutex_lock+0xc8a/0x1120
CPU: 3 UID: 0 PID: 3754 Comm: ethtool Not tainted 6.13.0-rc6+ #771
RIP: 0010:__mutex_lock+0xc8a/0x1120
Call Trace:
<TASK>
ethtool_check_max_channel+0x1ea/0x880
ethnl_set_channels+0x3c3/0xb10
ethnl_default_set_doit+0x306/0x650
genl_family_rcv_msg_doit+0x1e3/0x2c0
genl_rcv_msg+0x432/0x6f0
netlink_rcv_skb+0x13d/0x3b0
genl_rcv+0x28/0x40
netlink_unicast+0x42e/0x720
netlink_sendmsg+0x765/0xc20
__sys_sendto+0x3ac/0x420
__x64_sys_sendto+0xe0/0x1c0
do_syscall_64+0x95/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
This is because unregister_netdevice_many_notify might run before the
rtnl lock section of ethnl operations, eg. set_channels in the above
example. In this example the rss lock would be destroyed by the device
unregistration path before being used again, but in general running
ethnl operations while dismantle has started is not a good idea.
Fix this by denying any operation on devices being unregistered. A check
was already there in ethnl_ops_begin, but not wide enough.
Note that the same issue cannot be seen on the ioctl version
(__dev_ethtool) because the device reference is retrieved from within
the rtnl lock section there. Once dismantle started, the net device is
unlisted and no reference will be found.
Fixes:
dde91ccfa25f ("ethtool: do not perform operations on net devices being unregistered")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250116092159.50890-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sun, 19 Jan 2025 01:26:42 +0000 (17:26 -0800)]
Merge branch 'fix-race-conditions-in-ndo_get_stats64'
Shinas Rasheed says:
====================
Fix race conditions in ndo_get_stats64
Fix race conditions in ndo_get_stats64 by storing tx/rx stats
locally and not availing per queue resources which could be torn
down during interface stop. Also remove stats fetch from
firmware which is currently unnecessary
====================
Link: https://patch.msgid.link/20250117094653.2588578-1-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shinas Rasheed [Fri, 17 Jan 2025 09:46:53 +0000 (01:46 -0800)]
octeon_ep_vf: update tx/rx stats locally for persistence
Update tx/rx stats locally, so that ndo_get_stats64()
can use that and not rely on per queue resources to obtain statistics.
The latter used to cause race conditions when the device stopped.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://patch.msgid.link/20250117094653.2588578-5-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shinas Rasheed [Fri, 17 Jan 2025 09:46:52 +0000 (01:46 -0800)]
octeon_ep_vf: remove firmware stats fetch in ndo_get_stats64
The firmware stats fetch call that happens in ndo_get_stats64()
is currently not required, and causes a warning to issue.
The corresponding warn log for the PF is given below:
[ 123.316837] ------------[ cut here ]------------
[ 123.316840] Voluntary context switch within RCU read-side critical section!
[ 123.316917] pc : rcu_note_context_switch+0x2e4/0x300
[ 123.316919] lr : rcu_note_context_switch+0x2e4/0x300
[ 123.316947] Call trace:
[ 123.316949] rcu_note_context_switch+0x2e4/0x300
[ 123.316952] __schedule+0x84/0x584
[ 123.316955] schedule+0x38/0x90
[ 123.316956] schedule_timeout+0xa0/0x1d4
[ 123.316959] octep_send_mbox_req+0x190/0x230 [octeon_ep]
[ 123.316966] octep_ctrl_net_get_if_stats+0x78/0x100 [octeon_ep]
[ 123.316970] octep_get_stats64+0xd4/0xf0 [octeon_ep]
[ 123.316975] dev_get_stats+0x4c/0x114
[ 123.316977] dev_seq_printf_stats+0x3c/0x11c
[ 123.316980] dev_seq_show+0x1c/0x40
[ 123.316982] seq_read_iter+0x3cc/0x4e0
[ 123.316985] seq_read+0xc8/0x110
[ 123.316987] proc_reg_read+0x9c/0xec
[ 123.316990] vfs_read+0xc8/0x2ec
[ 123.316993] ksys_read+0x70/0x100
[ 123.316995] __arm64_sys_read+0x20/0x30
[ 123.316997] invoke_syscall.constprop.0+0x7c/0xd0
[ 123.317000] do_el0_svc+0xb4/0xd0
[ 123.317002] el0_svc+0xe8/0x1f4
[ 123.317005] el0t_64_sync_handler+0x134/0x150
[ 123.317006] el0t_64_sync+0x17c/0x180
[ 123.317008] ---[ end trace
63399811432ab69b ]---
Fixes:
c3fad23cdc06 ("octeon_ep_vf: add support for ndo ops")
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://patch.msgid.link/20250117094653.2588578-4-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shinas Rasheed [Fri, 17 Jan 2025 09:46:51 +0000 (01:46 -0800)]
octeon_ep: update tx/rx stats locally for persistence
Update tx/rx stats locally, so that ndo_get_stats64()
can use that and not rely on per queue resources to obtain statistics.
The latter used to cause race conditions when the device stopped.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://patch.msgid.link/20250117094653.2588578-3-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shinas Rasheed [Fri, 17 Jan 2025 09:46:50 +0000 (01:46 -0800)]
octeon_ep: remove firmware stats fetch in ndo_get_stats64
The firmware stats fetch call that happens in ndo_get_stats64()
is currently not required, and causes a warning to issue.
The warn log is given below:
[ 123.316837] ------------[ cut here ]------------
[ 123.316840] Voluntary context switch within RCU read-side critical section!
[ 123.316917] pc : rcu_note_context_switch+0x2e4/0x300
[ 123.316919] lr : rcu_note_context_switch+0x2e4/0x300
[ 123.316947] Call trace:
[ 123.316949] rcu_note_context_switch+0x2e4/0x300
[ 123.316952] __schedule+0x84/0x584
[ 123.316955] schedule+0x38/0x90
[ 123.316956] schedule_timeout+0xa0/0x1d4
[ 123.316959] octep_send_mbox_req+0x190/0x230 [octeon_ep]
[ 123.316966] octep_ctrl_net_get_if_stats+0x78/0x100 [octeon_ep]
[ 123.316970] octep_get_stats64+0xd4/0xf0 [octeon_ep]
[ 123.316975] dev_get_stats+0x4c/0x114
[ 123.316977] dev_seq_printf_stats+0x3c/0x11c
[ 123.316980] dev_seq_show+0x1c/0x40
[ 123.316982] seq_read_iter+0x3cc/0x4e0
[ 123.316985] seq_read+0xc8/0x110
[ 123.316987] proc_reg_read+0x9c/0xec
[ 123.316990] vfs_read+0xc8/0x2ec
[ 123.316993] ksys_read+0x70/0x100
[ 123.316995] __arm64_sys_read+0x20/0x30
[ 123.316997] invoke_syscall.constprop.0+0x7c/0xd0
[ 123.317000] do_el0_svc+0xb4/0xd0
[ 123.317002] el0_svc+0xe8/0x1f4
[ 123.317005] el0t_64_sync_handler+0x134/0x150
[ 123.317006] el0t_64_sync+0x17c/0x180
[ 123.317008] ---[ end trace
63399811432ab69b ]---
Fixes:
6a610a46bad1 ("octeon_ep: add support for ndo ops")
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://patch.msgid.link/20250117094653.2588578-2-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maher Sanalla [Thu, 16 Jan 2025 12:33:16 +0000 (14:33 +0200)]
net/mlxfw: Drop hard coded max FW flash image size
Currently, mlxfw kernel module limits FW flash image size to be
10MB at most, preventing the ability to burn recent BlueField-3
FW that exceeds the said size limit.
Thus, drop the hard coded limit. Instead, rely on FW's
max_component_size threshold that is reported in MCQI register
as the size limit for FW image.
Fixes:
410ed13cae39 ("Add the mlxfw module for Mellanox firmware flash process")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1737030796-1441634-1-git-send-email-moshe@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Liu Jian [Thu, 16 Jan 2025 14:30:53 +0000 (22:30 +0800)]
net: let net.core.dev_weight always be non-zero
The following problem was encountered during stability test:
(NULL net_device): NAPI poll function process_backlog+0x0/0x530 \
returned 1, exceeding its budget of 0.
------------[ cut here ]------------
list_add double add: new=
ffff88905f746f48, prev=
ffff88905f746f48, \
next=
ffff88905f746e40.
WARNING: CPU: 18 PID: 5462 at lib/list_debug.c:35 \
__list_add_valid_or_report+0xf3/0x130
CPU: 18 UID: 0 PID: 5462 Comm: ping Kdump: loaded Not tainted 6.13.0-rc7+
RIP: 0010:__list_add_valid_or_report+0xf3/0x130
Call Trace:
? __warn+0xcd/0x250
? __list_add_valid_or_report+0xf3/0x130
enqueue_to_backlog+0x923/0x1070
netif_rx_internal+0x92/0x2b0
__netif_rx+0x15/0x170
loopback_xmit+0x2ef/0x450
dev_hard_start_xmit+0x103/0x490
__dev_queue_xmit+0xeac/0x1950
ip_finish_output2+0x6cc/0x1620
ip_output+0x161/0x270
ip_push_pending_frames+0x155/0x1a0
raw_sendmsg+0xe13/0x1550
__sys_sendto+0x3bf/0x4e0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x5b/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The reproduction command is as follows:
sysctl -w net.core.dev_weight=0
ping 127.0.0.1
This is because when the napi's weight is set to 0, process_backlog() may
return 0 and clear the NAPI_STATE_SCHED bit of napi->state, causing this
napi to be re-polled in net_rx_action() until __do_softirq() times out.
Since the NAPI_STATE_SCHED bit has been cleared, napi_schedule_rps() can
be retriggered in enqueue_to_backlog(), causing this issue.
Making the napi's weight always non-zero solves this problem.
Triggering this issue requires system-wide admin (setting is
not namespaced).
Fixes:
e38766054509 ("[NET]: Fix sysctl net.core.dev_weight")
Fixes:
3d48b53fb2ae ("net: dev_weight: TX/RX orthogonality")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Link: https://patch.msgid.link/20250116143053.4146855-1-liujian56@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Fri, 17 Jan 2025 12:01:41 +0000 (12:01 +0000)]
Merge branch 'realtek-link-down'
Daniel Golle says:
====================
net: phy: realtek: fix status when link is down
The .read_status method for RealTek RTL822x PHYs (both C22 and C45) has
multilpe issues which result in reporting bogus link partner advertised
modes as well as speed and duplex while the link is down and no cable
is plugged in.
Example: ethtool after disconnecting a 1000M/Full capable link partner,
now with no wire plugged:
Settings for lan1:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
2500baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
2500baseT/Full
Advertised pause frame use: Symmetric Receive-only
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes: 1000baseT/Full
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: No
Link partner advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: on
Port: Twisted Pair
PHYAD: 7
Transceiver: external
MDI-X: Unknown
Supports Wake-on: d
Wake-on: d
Link detected: no
Fix this by making sure all of the fields populated by
rtl822x_c45_read_status() or rtl822x_read_status() get reset, also in
case the link is down.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Golle [Wed, 15 Jan 2025 14:45:00 +0000 (14:45 +0000)]
net: phy: realtek: always clear NBase-T lpa
Clear NBase-T link partner advertisement before calling
rtlgen_read_status() to avoid phy_resolve_aneg_linkmode() wrongly
setting speed and duplex.
This fixes bogus 2.5G/5G/10G link partner advertisement and thus
speed and duplex being set by phy_resolve_aneg_linkmode() due to stale
NBase-T lpa.
Fixes:
68d5cd09e891 ("net: phy: realtek: change order of calls in C22 read_status()")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Golle [Wed, 15 Jan 2025 14:43:43 +0000 (14:43 +0000)]
net: phy: realtek: clear master_slave_state if link is down
rtlgen_decode_physr() which sets master_slave_state isn't called in case
the link is down and other than rtlgen_read_status(),
rtl822x_c45_read_status() doesn't implicitely clear master_slave_state.
Avoid stale master_slave_state by always setting it to
MASTER_SLAVE_STATE_UNKNOWN in rtl822x_c45_read_status() in case the link
is down.
Fixes:
081c9c0265c9 ("net: phy: realtek: read duplex and gbit master from PHYSR register")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Golle [Wed, 15 Jan 2025 14:43:35 +0000 (14:43 +0000)]
net: phy: realtek: clear 1000Base-T lpa if link is down
Only read 1000Base-T link partner advertisement if autonegotiation has
completed and otherwise 1000Base-T link partner advertisement bits.
This fixes bogus 1000Base-T link partner advertisement after link goes
down (eg. by disconnecting the wire).
Fixes:
5cb409b3960e ("net: phy: realtek: clear 1000Base-T link partner advertisement")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 16 Jan 2025 17:09:44 +0000 (09:09 -0800)]
Merge tag 'net-6.13-rc8' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Notably this includes fixes for a few regressions spotted very
recently. No known outstanding ones.
Current release - regressions:
- core: avoid CFI problems with sock priv helpers
- xsk: bring back busy polling support
- netpoll: ensure skb_pool list is always initialized
Current release - new code bugs:
- core: make page_pool_ref_netmem work with net iovs
- ipv4: route: fix drop reason being overridden in
ip_route_input_slow
- udp: make rehash4 independent in udp_lib_rehash()
Previous releases - regressions:
- bpf: fix bpf_sk_select_reuseport() memory leak
- openvswitch: fix lockup on tx to unregistering netdev with carrier
- mptcp: be sure to send ack when mptcp-level window re-opens
- eth:
- bnxt: always recalculate features after XDP clearing, fix
null-deref
- mlx5: fix sub-function add port error handling
- fec: handle page_pool_dev_alloc_pages error
Previous releases - always broken:
- vsock: some fixes due to transport de-assignment
- eth:
- ice: fix E825 initialization
- mlx5e: fix inversion dependency warning while enabling IPsec
tunnel
- gtp: destroy device along with udp socket's netns dismantle.
- xilinx: axienet: Fix IRQ coalescing packet count overflow"
* tag 'net-6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits)
netdev: avoid CFI problems with sock priv helpers
net/mlx5e: Always start IPsec sequence number from 1
net/mlx5e: Rely on reqid in IPsec tunnel mode
net/mlx5e: Fix inversion dependency warning while enabling IPsec tunnel
net/mlx5: Clear port select structure when fail to create
net/mlx5: SF, Fix add port error handling
net/mlx5: Fix a lockdep warning as part of the write combining test
net/mlx5: Fix RDMA TX steering prio
net: make page_pool_ref_netmem work with net iovs
net: ethernet: xgbe: re-add aneg to supported features in PHY quirks
net: pcs: xpcs: actively unset DW_VR_MII_DIG_CTRL1_2G5_EN for 1G SGMII
net: pcs: xpcs: fix DW_VR_MII_DIG_CTRL1_2G5_EN bit being set for 1G SGMII w/o inband
selftests: net: Adapt ethtool mq tests to fix in qdisc graft
net: fec: handle page_pool_dev_alloc_pages error
net: netpoll: ensure skb_pool list is always initialized
net: xilinx: axienet: Fix IRQ coalescing packet count overflow
nfp: bpf: prevent integer overflow in nfp_bpf_event_output()
selftests: mptcp: avoid spurious errors on disconnect
mptcp: fix spurious wake-up on under memory pressure
mptcp: be sure to send ack when mptcp-level window re-opens
...
Linus Torvalds [Thu, 16 Jan 2025 17:04:10 +0000 (09:04 -0800)]
Merge tag 'pm-6.13-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Update the documentation of cpuidle governors that does not match the
code any more after previous functional changes (Rafael Wysocki) and
fix up the cpufreq Kconfig file broken inadvertently by a previous
update (Viresh Kumar)"
* tag 'pm-6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Move endif to the end of Kconfig file
cpuidle: teo: Update documentation after previous changes
cpuidle: menu: Update documentation after previous changes
Linus Torvalds [Thu, 16 Jan 2025 17:02:10 +0000 (09:02 -0800)]
Merge tag 'acpi-6.13-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Prevent acpi_video_device_EDID() from returning a pointer to a memory
region that should not be passed to kfree() which causes one of its
users to crash randomly on attempts to free it (Chris Bainbridge)"
* tag 'acpi-6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Fix random crashes due to bad kfree()
Linus Torvalds [Thu, 16 Jan 2025 16:54:33 +0000 (08:54 -0800)]
Merge tag 'for-6.13-rc7-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
- handle d_path() errors when canonicalizing device mapper paths during
device scan
* tag 'for-6.13-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: add the missing error handling inside get_canonical_dev_path
Rafael J. Wysocki [Thu, 16 Jan 2025 14:36:41 +0000 (15:36 +0100)]
Merge branch 'pm-cpufreq'
Merge a cpufreq fix for 6.13:
- Fix cpufreq Kconfig breakage after previous changes (Viresh Kumar).
* pm-cpufreq:
cpufreq: Move endif to the end of Kconfig file
Jakub Kicinski [Wed, 15 Jan 2025 16:14:36 +0000 (08:14 -0800)]
netdev: avoid CFI problems with sock priv helpers
Li Li reports that casting away callback type may cause issues
for CFI. Let's generate a small wrapper for each callback,
to make sure compiler sees the anticipated types.
Reported-by: Li Li <dualli@chromium.org>
Link: https://lore.kernel.org/CANBPYPjQVqmzZ4J=rVQX87a9iuwmaetULwbK_5_3YWk2eGzkaA@mail.gmail.com
Fixes:
170aafe35cb9 ("netdev: support binding dma-buf to netdevice")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250115161436.648646-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 16 Jan 2025 11:45:50 +0000 (12:45 +0100)]
Merge branch 'mlx5-misc-fixes-2025-01-15'
Tariq Toukan says:
====================
mlx5 misc fixes 2025-01-15
This patchset provides misc bug fixes from the team to the mlx5 core and
Eth drivers.
====================
Link: https://patch.msgid.link/20250115113910.1990174-1-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Leon Romanovsky [Wed, 15 Jan 2025 11:39:10 +0000 (13:39 +0200)]
net/mlx5e: Always start IPsec sequence number from 1
According to RFC4303, section "3.3.3. Sequence Number Generation",
the first packet sent using a given SA will contain a sequence
number of 1.
This is applicable to both ESN and non-ESN mode, which was not covered
in commit mentioned in Fixes line.
Fixes:
3d42c8cc67a8 ("net/mlx5e: Ensure that IPsec sequence packet number starts from 1")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Leon Romanovsky [Wed, 15 Jan 2025 11:39:09 +0000 (13:39 +0200)]
net/mlx5e: Rely on reqid in IPsec tunnel mode
All packet offloads SAs have reqid in it to make sure they have
corresponding policy. While it is not strictly needed for transparent
mode, it is extremely important in tunnel mode. In that mode, policy and
SAs have different match criteria.
Policy catches the whole subnet addresses, and SA catches the tunnel gateways
addresses. The source address of such tunnel is not known during egress packet
traversal in flow steering as it is added only after successful encryption.
As reqid is required for packet offload and it is unique for every SA,
we can safely rely on it only.
The output below shows the configured egress policy and SA by strongswan:
[leonro@vm ~]$ sudo ip x s
src 192.169.101.2 dst 192.169.101.1
proto esp spi 0xc88b7652 reqid 1 mode tunnel
replay-window 0 flag af-unspec esn
aead rfc4106(gcm(aes)) 0xe406a01083986e14d116488549094710e9c57bc6 128
anti-replay esn context:
seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0
replay_window 1, bitmap-length 1
00000000
crypto offload parameters: dev eth2 dir out mode packet
[leonro@064 ~]$ sudo ip x p
src 192.170.0.0/16 dst 192.170.0.0/16
dir out priority 383615 ptype main
tmpl src 192.169.101.2 dst 192.169.101.1
proto esp spi 0xc88b7652 reqid 1 mode tunnel
crypto offload parameters: dev eth2 mode packet
Fixes:
b3beba1fb404 ("net/mlx5e: Allow policies with reqid 0, to support IKE policy holes")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Leon Romanovsky [Wed, 15 Jan 2025 11:39:08 +0000 (13:39 +0200)]
net/mlx5e: Fix inversion dependency warning while enabling IPsec tunnel
Attempt to enable IPsec packet offload in tunnel mode in debug kernel
generates the following kernel panic, which is happening due to two
issues:
1. In SA add section, the should be _bh() variant when marking SA mode.
2. There is not needed flush_workqueue in SA delete routine. It is not
needed as at this stage as it is removed from SADB and the running work
will be canceled later in SA free.
=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
6.12.0+ #4 Not tainted
-----------------------------------------------------
charon/1337 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire:
ffff88810f365020 (&xa->xa_lock#24){+.+.}-{3:3}, at: mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
and this task is already holding:
ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30
which would create a new lock dependency:
(&x->lock){+.-.}-{3:3} -> (&xa->xa_lock#24){+.+.}-{3:3}
but this new dependency connects a SOFTIRQ-irq-safe lock:
(&x->lock){+.-.}-{3:3}
... which became SOFTIRQ-irq-safe at:
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
xfrm_timer_handler+0x91/0xd70
__hrtimer_run_queues+0x1dd/0xa60
hrtimer_run_softirq+0x146/0x2e0
handle_softirqs+0x266/0x860
irq_exit_rcu+0x115/0x1a0
sysvec_apic_timer_interrupt+0x6e/0x90
asm_sysvec_apic_timer_interrupt+0x16/0x20
default_idle+0x13/0x20
default_idle_call+0x67/0xa0
do_idle+0x2da/0x320
cpu_startup_entry+0x50/0x60
start_secondary+0x213/0x2a0
common_startup_64+0x129/0x138
to a SOFTIRQ-irq-unsafe lock:
(&xa->xa_lock#24){+.+.}-{3:3}
... which became SOFTIRQ-irq-unsafe at:
...
lock_acquire+0x1be/0x520
_raw_spin_lock+0x2c/0x40
xa_set_mark+0x70/0x110
mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core]
xfrm_dev_state_add+0x3bb/0xd70
xfrm_add_sa+0x2451/0x4a90
xfrm_user_rcv_msg+0x493/0x880
netlink_rcv_skb+0x12e/0x380
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
netlink_sendmsg+0x745/0xbe0
__sock_sendmsg+0xc5/0x190
__sys_sendto+0x1fe/0x2c0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&xa->xa_lock#24);
local_irq_disable();
lock(&x->lock);
lock(&xa->xa_lock#24);
<Interrupt>
lock(&x->lock);
*** DEADLOCK ***
2 locks held by charon/1337:
#0:
ffffffff87f8f858 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{4:4}, at: xfrm_netlink_rcv+0x5e/0x90
#1:
ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30
the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
-> (&x->lock){+.-.}-{3:3} ops: 29 {
HARDIRQ-ON-W at:
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
xfrm_alloc_spi+0xc0/0xe60
xfrm_alloc_userspi+0x5f6/0xbc0
xfrm_user_rcv_msg+0x493/0x880
netlink_rcv_skb+0x12e/0x380
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
netlink_sendmsg+0x745/0xbe0
__sock_sendmsg+0xc5/0x190
__sys_sendto+0x1fe/0x2c0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
IN-SOFTIRQ-W at:
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
xfrm_timer_handler+0x91/0xd70
__hrtimer_run_queues+0x1dd/0xa60
hrtimer_run_softirq+0x146/0x2e0
handle_softirqs+0x266/0x860
irq_exit_rcu+0x115/0x1a0
sysvec_apic_timer_interrupt+0x6e/0x90
asm_sysvec_apic_timer_interrupt+0x16/0x20
default_idle+0x13/0x20
default_idle_call+0x67/0xa0
do_idle+0x2da/0x320
cpu_startup_entry+0x50/0x60
start_secondary+0x213/0x2a0
common_startup_64+0x129/0x138
INITIAL USE at:
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
xfrm_alloc_spi+0xc0/0xe60
xfrm_alloc_userspi+0x5f6/0xbc0
xfrm_user_rcv_msg+0x493/0x880
netlink_rcv_skb+0x12e/0x380
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
netlink_sendmsg+0x745/0xbe0
__sock_sendmsg+0xc5/0x190
__sys_sendto+0x1fe/0x2c0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
}
... key at: [<
ffffffff87f9cd20>] __key.18+0x0/0x40
the dependencies between the lock to be acquired
and SOFTIRQ-irq-unsafe lock:
-> (&xa->xa_lock#24){+.+.}-{3:3} ops: 9 {
HARDIRQ-ON-W at:
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core]
xfrm_dev_state_add+0x3bb/0xd70
xfrm_add_sa+0x2451/0x4a90
xfrm_user_rcv_msg+0x493/0x880
netlink_rcv_skb+0x12e/0x380
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
netlink_sendmsg+0x745/0xbe0
__sock_sendmsg+0xc5/0x190
__sys_sendto+0x1fe/0x2c0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
SOFTIRQ-ON-W at:
lock_acquire+0x1be/0x520
_raw_spin_lock+0x2c/0x40
xa_set_mark+0x70/0x110
mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core]
xfrm_dev_state_add+0x3bb/0xd70
xfrm_add_sa+0x2451/0x4a90
xfrm_user_rcv_msg+0x493/0x880
netlink_rcv_skb+0x12e/0x380
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
netlink_sendmsg+0x745/0xbe0
__sock_sendmsg+0xc5/0x190
__sys_sendto+0x1fe/0x2c0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
INITIAL USE at:
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core]
xfrm_dev_state_add+0x3bb/0xd70
xfrm_add_sa+0x2451/0x4a90
xfrm_user_rcv_msg+0x493/0x880
netlink_rcv_skb+0x12e/0x380
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
netlink_sendmsg+0x745/0xbe0
__sock_sendmsg+0xc5/0x190
__sys_sendto+0x1fe/0x2c0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
}
... key at: [<
ffffffffa078ff60>] __key.48+0x0/0xfffffffffff210a0 [mlx5_core]
... acquired at:
__lock_acquire+0x30a0/0x5040
lock_acquire+0x1be/0x520
_raw_spin_lock_bh+0x34/0x40
mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
xfrm_dev_state_delete+0x90/0x160
__xfrm_state_delete+0x662/0xae0
xfrm_state_delete+0x1e/0x30
xfrm_del_sa+0x1c2/0x340
xfrm_user_rcv_msg+0x493/0x880
netlink_rcv_skb+0x12e/0x380
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
netlink_sendmsg+0x745/0xbe0
__sock_sendmsg+0xc5/0x190
__sys_sendto+0x1fe/0x2c0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
stack backtrace:
CPU: 7 UID: 0 PID: 1337 Comm: charon Not tainted 6.12.0+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x74/0xd0
check_irq_usage+0x12e8/0x1d90
? print_shortest_lock_dependencies_backwards+0x1b0/0x1b0
? check_chain_key+0x1bb/0x4c0
? __lockdep_reset_lock+0x180/0x180
? check_path.constprop.0+0x24/0x50
? mark_lock+0x108/0x2fb0
? print_circular_bug+0x9b0/0x9b0
? mark_lock+0x108/0x2fb0
? print_usage_bug.part.0+0x670/0x670
? check_prev_add+0x1c4/0x2310
check_prev_add+0x1c4/0x2310
__lock_acquire+0x30a0/0x5040
? lockdep_set_lock_cmp_fn+0x190/0x190
? lockdep_set_lock_cmp_fn+0x190/0x190
lock_acquire+0x1be/0x520
? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
? lockdep_hardirqs_on_prepare+0x400/0x400
? __xfrm_state_delete+0x5f0/0xae0
? lock_downgrade+0x6b0/0x6b0
_raw_spin_lock_bh+0x34/0x40
? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
xfrm_dev_state_delete+0x90/0x160
__xfrm_state_delete+0x662/0xae0
xfrm_state_delete+0x1e/0x30
xfrm_del_sa+0x1c2/0x340
? xfrm_get_sa+0x250/0x250
? check_chain_key+0x1bb/0x4c0
xfrm_user_rcv_msg+0x493/0x880
? copy_sec_ctx+0x270/0x270
? check_chain_key+0x1bb/0x4c0
? lockdep_set_lock_cmp_fn+0x190/0x190
? lockdep_set_lock_cmp_fn+0x190/0x190
netlink_rcv_skb+0x12e/0x380
? copy_sec_ctx+0x270/0x270
? netlink_ack+0xd90/0xd90
? netlink_deliver_tap+0xcd/0xb60
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x42f/0x740
? netlink_attachskb+0x730/0x730
? lock_acquire+0x1be/0x520
netlink_sendmsg+0x745/0xbe0
? netlink_unicast+0x740/0x740
? __might_fault+0xbb/0x170
? netlink_unicast+0x740/0x740
__sock_sendmsg+0xc5/0x190
? fdget+0x163/0x1d0
__sys_sendto+0x1fe/0x2c0
? __x64_sys_getpeername+0xb0/0xb0
? do_user_addr_fault+0x856/0xe30
? lock_acquire+0x1be/0x520
? __task_pid_nr_ns+0x117/0x410
? lock_downgrade+0x6b0/0x6b0
__x64_sys_sendto+0xdc/0x1b0
? lockdep_hardirqs_on_prepare+0x284/0x400
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f7d31291ba4
Code: 7d e8 89 4d d4 e8 4c 42 f7 ff 44 8b 4d d0 4c 8b 45 c8 89 c3 44 8b 55 d4 8b 7d e8 b8 2c 00 00 00 48 8b 55 d8 48 8b 75 e0 0f 05 <48> 3d 00 f0 ff ff 77 34 89 df 48 89 45 e8 e8 99 42 f7 ff 48 8b 45
RSP: 002b:
00007f7d2ccd94f0 EFLAGS:
00000297 ORIG_RAX:
000000000000002c
RAX:
ffffffffffffffda RBX:
0000000000000001 RCX:
00007f7d31291ba4
RDX:
0000000000000028 RSI:
00007f7d2ccd96a0 RDI:
000000000000000a
RBP:
00007f7d2ccd9530 R08:
00007f7d2ccd9598 R09:
000000000000000c
R10:
0000000000000000 R11:
0000000000000297 R12:
0000000000000028
R13:
00007f7d2ccd9598 R14:
00007f7d2ccd96a0 R15:
00000000000000e1
</TASK>
Fixes:
4c24272b4e2b ("net/mlx5e: Listen to ARP events to update IPsec L2 headers in tunnel mode")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Mark Zhang [Wed, 15 Jan 2025 11:39:07 +0000 (13:39 +0200)]
net/mlx5: Clear port select structure when fail to create
Clear the port select structure on error so no stale values left after
definers are destroyed. That's because the mlx5_lag_destroy_definers()
always try to destroy all lag definers in the tt_map, so in the flow
below lag definers get double-destroyed and cause kernel crash:
mlx5_lag_port_sel_create()
mlx5_lag_create_definers()
mlx5_lag_create_definer() <- Failed on tt 1
mlx5_lag_destroy_definers() <- definers[tt=0] gets destroyed
mlx5_lag_port_sel_create()
mlx5_lag_create_definers()
mlx5_lag_create_definer() <- Failed on tt 0
mlx5_lag_destroy_definers() <- definers[tt=0] gets double-destroyed
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000008
Mem abort info:
ESR = 0x0000000096000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
Data abort info:
ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 64k pages, 48-bit VAs, pgdp=
0000000112ce2e00
[
0000000000000008] pgd=
0000000000000000, p4d=
0000000000000000, pud=
0000000000000000
Internal error: Oops:
0000000096000005 [#1] PREEMPT SMP
Modules linked in: iptable_raw bonding ip_gre ip6_gre gre ip6_tunnel tunnel6 geneve ip6_udp_tunnel udp_tunnel ipip tunnel4 ip_tunnel rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_fwctl(OE) fwctl(OE) mlx5_core(OE) mlxdevm(OE) ib_core(OE) mlxfw(OE) memtrack(OE) mlx_compat(OE) openvswitch nsh nf_conncount psample xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc netconsole overlay efi_pstore sch_fq_codel zram ip_tables crct10dif_ce qemu_fw_cfg fuse ipv6 crc_ccitt [last unloaded: mlx_compat(OE)]
CPU: 3 UID: 0 PID: 217 Comm: kworker/u53:2 Tainted: G OE 6.11.0+ #2
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core]
pstate:
60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core]
lr : mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core]
sp :
ffff800085fafb00
x29:
ffff800085fafb00 x28:
ffff0000da0c8000 x27:
0000000000000000
x26:
ffff0000da0c8000 x25:
ffff0000da0c8000 x24:
ffff0000da0c8000
x23:
ffff0000c31f81a0 x22:
0400000000000000 x21:
ffff0000da0c8000
x20:
0000000000000000 x19:
0000000000000001 x18:
0000000000000000
x17:
0000000000000000 x16:
0000000000000000 x15:
0000ffff8b0c9350
x14:
0000000000000000 x13:
ffff800081390d18 x12:
ffff800081dc3cc0
x11:
0000000000000001 x10:
0000000000000b10 x9 :
ffff80007ab7304c
x8 :
ffff0000d00711f0 x7 :
0000000000000004 x6 :
0000000000000190
x5 :
ffff00027edb3010 x4 :
0000000000000000 x3 :
0000000000000000
x2 :
ffff0000d39b8000 x1 :
ffff0000d39b8000 x0 :
0400000000000000
Call trace:
mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core]
mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core]
mlx5_lag_destroy_definers+0xa0/0x108 [mlx5_core]
mlx5_lag_port_sel_create+0x2d4/0x6f8 [mlx5_core]
mlx5_activate_lag+0x60c/0x6f8 [mlx5_core]
mlx5_do_bond_work+0x284/0x5c8 [mlx5_core]
process_one_work+0x170/0x3e0
worker_thread+0x2d8/0x3e0
kthread+0x11c/0x128
ret_from_fork+0x10/0x20
Code:
a9025bf5 aa0003f6 a90363f7 f90023f9 (
f9400400)
---[ end trace
0000000000000000 ]---
Fixes:
dc48516ec7d3 ("net/mlx5: Lag, add support to create definers for LAG")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Chris Mi [Wed, 15 Jan 2025 11:39:06 +0000 (13:39 +0200)]
net/mlx5: SF, Fix add port error handling
If failed to add SF, error handling doesn't delete the SF from the
SF table. But the hw resources are deleted. So when unload driver,
hw resources will be deleted again. Firmware will report syndrome
0x68def3 which means "SF is not allocated can not deallocate".
Fix it by delete SF from SF table if failed to add SF.
Fixes:
2597ee190b4e ("net/mlx5: Call mlx5_sf_id_erase() once in mlx5_sf_dealloc()")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yishai Hadas [Wed, 15 Jan 2025 11:39:05 +0000 (13:39 +0200)]
net/mlx5: Fix a lockdep warning as part of the write combining test
Fix a lockdep warning [1] observed during the write combining test.
The warning indicates a potential nested lock scenario that could lead
to a deadlock.
However, this is a false positive alarm because the SF lock and its
parent lock are distinct ones.
The lockdep confusion arises because the locks belong to the same object
class (i.e., struct mlx5_core_dev).
To resolve this, the code has been refactored to avoid taking both
locks. Instead, only the parent lock is acquired.
[1]
raw_ethernet_bw/2118 is trying to acquire lock:
[ 213.619032]
ffff88811dd75e08 (&dev->wc_state_lock){+.+.}-{3:3}, at:
mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.620270]
[ 213.620270] but task is already holding lock:
[ 213.620943]
ffff88810b585e08 (&dev->wc_state_lock){+.+.}-{3:3}, at:
mlx5_wc_support_get+0x10c/0x210 [mlx5_core]
[ 213.622045]
[ 213.622045] other info that might help us debug this:
[ 213.622778] Possible unsafe locking scenario:
[ 213.622778]
[ 213.623465] CPU0
[ 213.623815] ----
[ 213.624148] lock(&dev->wc_state_lock);
[ 213.624615] lock(&dev->wc_state_lock);
[ 213.625071]
[ 213.625071] *** DEADLOCK ***
[ 213.625071]
[ 213.625805] May be due to missing lock nesting notation
[ 213.625805]
[ 213.626522] 4 locks held by raw_ethernet_bw/2118:
[ 213.627019] #0:
ffff88813f80d578 (&uverbs_dev->disassociate_srcu){.+.+}-{0:0},
at: ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs]
[ 213.628088] #1:
ffff88810fb23930 (&file->hw_destroy_rwsem){.+.+}-{3:3},
at: ib_init_ucontext+0x2d/0xf0 [ib_uverbs]
[ 213.629094] #2:
ffff88810fb23878 (&file->ucontext_lock){+.+.}-{3:3},
at: ib_init_ucontext+0x49/0xf0 [ib_uverbs]
[ 213.630106] #3:
ffff88810b585e08 (&dev->wc_state_lock){+.+.}-{3:3},
at: mlx5_wc_support_get+0x10c/0x210 [mlx5_core]
[ 213.631185]
[ 213.631185] stack backtrace:
[ 213.631718] CPU: 1 UID: 0 PID: 2118 Comm: raw_ethernet_bw Not tainted
6.12.0-rc7_internal_net_next_mlx5_89a0ad0 #1
[ 213.632722] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 213.633785] Call Trace:
[ 213.634099]
[ 213.634393] dump_stack_lvl+0x7e/0xc0
[ 213.634806] print_deadlock_bug+0x278/0x3c0
[ 213.635265] __lock_acquire+0x15f4/0x2c40
[ 213.635712] lock_acquire+0xcd/0x2d0
[ 213.636120] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.636722] ? mlx5_ib_enable_lb+0x24/0xa0 [mlx5_ib]
[ 213.637277] __mutex_lock+0x81/0xda0
[ 213.637697] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.638305] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.638902] ? rcu_read_lock_sched_held+0x3f/0x70
[ 213.639400] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.640016] mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[ 213.640615] set_ucontext_resp+0x68/0x2b0 [mlx5_ib]
[ 213.641144] ? debug_mutex_init+0x33/0x40
[ 213.641586] mlx5_ib_alloc_ucontext+0x18e/0x7b0 [mlx5_ib]
[ 213.642145] ib_init_ucontext+0xa0/0xf0 [ib_uverbs]
[ 213.642679] ib_uverbs_handler_UVERBS_METHOD_GET_CONTEXT+0x95/0xc0
[ib_uverbs]
[ 213.643426] ? _copy_from_user+0x46/0x80
[ 213.643878] ib_uverbs_cmd_verbs+0xa6b/0xc80 [ib_uverbs]
[ 213.644426] ? ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x130/0x130
[ib_uverbs]
[ 213.645213] ? __lock_acquire+0xa99/0x2c40
[ 213.645675] ? lock_acquire+0xcd/0x2d0
[ 213.646101] ? ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs]
[ 213.646625] ? reacquire_held_locks+0xcf/0x1f0
[ 213.647102] ? do_user_addr_fault+0x45d/0x770
[ 213.647586] ib_uverbs_ioctl+0xe0/0x170 [ib_uverbs]
[ 213.648102] ? ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs]
[ 213.648632] __x64_sys_ioctl+0x4d3/0xaa0
[ 213.649060] ? do_user_addr_fault+0x4a8/0x770
[ 213.649528] do_syscall_64+0x6d/0x140
[ 213.649947] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 213.650478] RIP: 0033:0x7fa179b0737b
[ 213.650893] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c
89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8
10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d
7d 2a 0f 00 f7 d8 64 89 01 48
[ 213.652619] RSP: 002b:
00007ffd2e6d46e8 EFLAGS:
00000246 ORIG_RAX:
0000000000000010
[ 213.653390] RAX:
ffffffffffffffda RBX:
00007ffd2e6d47f8 RCX:
00007fa179b0737b
[ 213.654084] RDX:
00007ffd2e6d47e0 RSI:
00000000c0181b01 RDI:
0000000000000003
[ 213.654767] RBP:
00007ffd2e6d47c0 R08:
00007fa1799be010 R09:
0000000000000002
[ 213.655453] R10:
00007ffd2e6d4960 R11:
0000000000000246 R12:
00007ffd2e6d487c
[ 213.656170] R13:
0000000000000027 R14:
0000000000000001 R15:
00007ffd2e6d4f70
Fixes:
d98995b4bf98 ("net/mlx5: Reimplement write combining test")
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Patrisious Haddad [Wed, 15 Jan 2025 11:39:04 +0000 (13:39 +0200)]
net/mlx5: Fix RDMA TX steering prio
User added steering rules at RDMA_TX were being added to the first prio,
which is the counters prio.
Fix that so that they are correctly added to the BYPASS_PRIO instead.
Fixes:
24670b1a3166 ("net/mlx5: Add support for RDMA TX steering")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pavel Begunkov [Wed, 8 Jan 2025 22:06:22 +0000 (14:06 -0800)]
net: make page_pool_ref_netmem work with net iovs
page_pool_ref_netmem() should work with either netmem representation, but
currently it casts to a page with netmem_to_page(), which will fail with
net iovs. Use netmem_get_pp_ref_count_ref() instead.
Fixes:
8ab79ed50cf1 ("page_pool: devmem support")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://lore.kernel.org/20250108220644.3528845-2-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Sun, 12 Jan 2025 21:59:59 +0000 (22:59 +0100)]
net: ethernet: xgbe: re-add aneg to supported features in PHY quirks
In 4.19, before the switch to linkmode bitmaps, PHY_GBIT_FEATURES
included feature bits for aneg and TP/MII ports.
SUPPORTED_TP | \
SUPPORTED_MII)
SUPPORTED_10baseT_Full)
SUPPORTED_100baseT_Full)
SUPPORTED_1000baseT_Full)
PHY_100BT_FEATURES | \
PHY_DEFAULT_FEATURES)
PHY_1000BT_FEATURES)
Referenced commit expanded PHY_GBIT_FEATURES, silently removing
PHY_DEFAULT_FEATURES. The removed part can be re-added by using
the new PHY_GBIT_FEATURES definition.
Not clear to me is why nobody seems to have noticed this issue.
I stumbled across this when checking what it takes to make
phy_10_100_features_array et al private to phylib.
Fixes:
d0939c26c53a ("net: ethernet: xgbe: expand PHY_GBIT_FEAUTRES")
Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/46521973-7738-4157-9f5e-0bb6f694acba@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 14 Jan 2025 16:47:21 +0000 (18:47 +0200)]
net: pcs: xpcs: actively unset DW_VR_MII_DIG_CTRL1_2G5_EN for 1G SGMII
xpcs_config_2500basex() sets DW_VR_MII_DIG_CTRL1_2G5_EN, but
xpcs_config_aneg_c37_sgmii() never unsets it. So, on a protocol change
from 2500base-x to sgmii, the DW_VR_MII_DIG_CTRL1_2G5_EN bit will remain
set.
Fixes:
f27abde3042a ("net: pcs: add 2500BASEX support for Intel mGbE controller")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250114164721.2879380-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 14 Jan 2025 16:47:20 +0000 (18:47 +0200)]
net: pcs: xpcs: fix DW_VR_MII_DIG_CTRL1_2G5_EN bit being set for 1G SGMII w/o inband
On a port with SGMII fixed-link at SPEED_1000, DW_VR_MII_DIG_CTRL1 gets
set to 0x2404. This is incorrect, because bit 2 (DW_VR_MII_DIG_CTRL1_2G5_EN)
is set.
It comes from the previous write to DW_VR_MII_AN_CTRL, because the "val"
variable is reused and is dirty. Actually, its value is 0x4, aka
FIELD_PREP(DW_VR_MII_PCS_MODE_MASK, DW_VR_MII_PCS_MODE_C37_SGMII).
Resolve the issue by clearing "val" to 0 when writing to a new register.
After the fix, the register value is 0x2400.
Prior to the blamed commit, when the read-modify-write was open-coded,
the code saved the content of the DW_VR_MII_DIG_CTRL1 register in the
"ret" variable.
Fixes:
ce8d6081fcf4 ("net: pcs: xpcs: add _modify() accessors")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250114164721.2879380-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Victor Nogueira [Sat, 11 Jan 2025 21:15:15 +0000 (18:15 -0300)]
selftests: net: Adapt ethtool mq tests to fix in qdisc graft
Because of patch[1] the graft behaviour changed
So the command:
tcq replace parent 100:1 handle 204:
Is no longer valid and will not delete 100:4 added by command:
tcq replace parent 100:4 handle 204: pfifo_fast
So to maintain the original behaviour, this patch manually deletes 100:4
and grafts 100:1
Note: This change will also work fine without [1]
[1] https://lore.kernel.org/netdev/
20250111151455.75480-1-jhs@mojatatu.com/T/#u
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kevin Groeneveld [Mon, 13 Jan 2025 15:48:45 +0000 (10:48 -0500)]
net: fec: handle page_pool_dev_alloc_pages error
The fec_enet_update_cbd function calls page_pool_dev_alloc_pages but did
not handle the case when it returned NULL. There was a WARN_ON(!new_page)
but it would still proceed to use the NULL pointer and then crash.
This case does seem somewhat rare but when the system is under memory
pressure it can happen. One case where I can duplicate this with some
frequency is when writing over a smbd share to a SATA HDD attached to an
imx6q.
Setting /proc/sys/vm/min_free_kbytes to higher values also seems to solve
the problem for my test case. But it still seems wrong that the fec driver
ignores the memory allocation error and can crash.
This commit handles the allocation error by dropping the current packet.
Fixes:
95698ff6177b5 ("net: fec: using page pool to manage RX buffers")
Signed-off-by: Kevin Groeneveld <kgroeneveld@lenbrook.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20250113154846.1765414-1-kgroeneveld@lenbrook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
John Sperbeck [Tue, 14 Jan 2025 01:13:54 +0000 (17:13 -0800)]
net: netpoll: ensure skb_pool list is always initialized
When __netpoll_setup() is called directly, instead of through
netpoll_setup(), the np->skb_pool list head isn't initialized.
If skb_pool_flush() is later called, then we hit a NULL pointer
in skb_queue_purge_reason(). This can be seen with this repro,
when CONFIG_NETCONSOLE is enabled as a module:
ip tuntap add mode tap tap0
ip link add name br0 type bridge
ip link set dev tap0 master br0
modprobe netconsole netconsole=4444@10.0.0.1/br0,9353@10.0.0.2/
rmmod netconsole
The backtrace is:
BUG: kernel NULL pointer dereference, address:
0000000000000008
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
... ... ...
Call Trace:
<TASK>
__netpoll_free+0xa5/0xf0
br_netpoll_cleanup+0x43/0x50 [bridge]
do_netpoll_cleanup+0x43/0xc0
netconsole_netdev_event+0x1e3/0x300 [netconsole]
unregister_netdevice_notifier+0xd9/0x150
cleanup_module+0x45/0x920 [netconsole]
__se_sys_delete_module+0x205/0x290
do_syscall_64+0x70/0x150
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Move the skb_pool list setup and initial skb fill into __netpoll_setup().
Fixes:
221a9c1df790 ("net: netpoll: Individualize the skb pool")
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250114011354.2096812-1-jsperbeck@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Mon, 13 Jan 2025 16:30:00 +0000 (11:30 -0500)]
net: xilinx: axienet: Fix IRQ coalescing packet count overflow
If coalesce_count is greater than 255 it will not fit in the register and
will overflow. This can be reproduced by running
# ethtool -C ethX rx-frames 256
which will result in a timeout of 0us instead. Fix this by checking for
invalid values and reporting an error.
Fixes:
8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://patch.msgid.link/20250113163001.2335235-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Mon, 13 Jan 2025 06:18:39 +0000 (09:18 +0300)]
nfp: bpf: prevent integer overflow in nfp_bpf_event_output()
The "sizeof(struct cmsg_bpf_event) + pkt_size + data_size" math could
potentially have an integer wrapping bug on 32bit systems. Check for
this and return an error.
Fixes:
9816dd35ecec ("nfp: bpf: perf event output helpers support")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/6074805b-e78d-4b8a-bf05-e929b5377c28@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 14 Jan 2025 22:10:17 +0000 (14:10 -0800)]
Merge tag 'seccomp-v6.13-rc8' of git://git./linux/kernel/git/kees/linux
Pull seccomp fix from Kees Cook:
"Fix a randconfig failure:
- Unconditionally define stub for !CONFIG_SECCOMP (Linus Walleij)"
* tag 'seccomp-v6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: Stub for !CONFIG_SECCOMP
Jakub Kicinski [Tue, 14 Jan 2025 21:41:16 +0000 (13:41 -0800)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Fix E825 initialization
Grzegorz Nitka says:
E825 products have incorrect initialization procedure, which may lead to
initialization failures and register values.
Fix E825 products initialization by adding correct sync delay, checking
the PHY revision only for current PHY and adding proper destination
device when reading port/quad.
In addition, E825 uses PF ID for indexing per PF registers and as
a primary PHY lane number, which is incorrect.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Add correct PHY lane assignment
ice: Fix ETH56G FC-FEC Rx offset value
ice: Fix quad registers read on E825
ice: Fix E825 initialization
====================
Link: https://patch.msgid.link/20250113182840.3564250-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 14 Jan 2025 21:32:13 +0000 (13:32 -0800)]
Merge branch 'mptcp-fixes-for-connect-selftest-flakes'
Matthieu Baerts says:
====================
mptcp: fixes for connect selftest flakes
Last week, Jakub reported [1] that the MPTCP Connect selftest was
unstable. It looked like it started after the introduction of some fixes
[2]. After analysis from Paolo, these patches revealed existing bugs,
that should be fixed by the following patches.
- Patch 1: Make sure ACK are sent when MPTCP-level window re-opens. In
some corner cases, the other peer was not notified when more data
could be sent. A fix for v5.11, but depending on a feature introduced
in v5.19.
- Patch 2: Fix spurious wake-up under memory pressure. In this
situation, the userspace could be invited to read data not being there
yet. A fix for v6.7.
- Patch 3: Fix a false positive error when running the MPTCP Connect
selftest with the "disconnect" cases. The userspace could disconnect
the socket too soon, which would reset (MP_FASTCLOSE) the connection,
interpreted as an error by the test. A fix for v5.17.
Link: https://lore.kernel.org/20250107131845.5e5de3c5@kernel.org
Link: https://lore.kernel.org/20241230-net-mptcp-rbuf-fixes-v1-0-8608af434ceb@kernel.org
====================
Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-0-0d986ee7b1b6@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 13 Jan 2025 15:44:58 +0000 (16:44 +0100)]
selftests: mptcp: avoid spurious errors on disconnect
The disconnect test-case generates spurious errors:
INFO: disconnect
INFO: extra options: -I 3 -i /tmp/tmp.r43niviyoI
01 ns1 MPTCP -> ns1 (10.0.1.1:10000 ) MPTCP (duration 140ms) [FAIL]
file received by server does not match (in, out):
Unexpected revents: POLLERR/POLLNVAL(19)
-rw-r--r-- 1 root root
10028676 Jan 10 10:47 /tmp/tmp.r43niviyoI.disconnect
Trailing bytes are:
��\����R���!8��u2��5N%
-rw------- 1 root root
9992290 Jan 10 10:47 /tmp/tmp.Os4UbnWbI1
Trailing bytes are:
��\����R���!8��u2��5N%
02 ns1 MPTCP -> ns1 (dead:beef:1::1:10001) MPTCP (duration 206ms) [ OK ]
03 ns1 MPTCP -> ns1 (dead:beef:1::1:10002) TCP (duration 31ms) [ OK ]
04 ns1 TCP -> ns1 (dead:beef:1::1:10003) MPTCP (duration 26ms) [ OK ]
[FAIL] Tests of the full disconnection have failed
Time: 2 seconds
The root cause is actually in the user-space bits: the test program
currently disconnects as soon as all the pending data has been spooled,
generating an FASTCLOSE. If such option reaches the peer before the
latter has reached the closed status, the msk socket will report an
error to the user-space, as per protocol specification, causing the
above failure.
Address the issue explicitly waiting for all the relevant sockets to
reach a closed status before performing the disconnect.
Fixes:
05be5e273c84 ("selftests: mptcp: add disconnect tests")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-3-0d986ee7b1b6@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 13 Jan 2025 15:44:57 +0000 (16:44 +0100)]
mptcp: fix spurious wake-up on under memory pressure
The wake-up condition currently implemented by mptcp_epollin_ready()
is wrong, as it could mark the MPTCP socket as readable even when
no data are present and the system is under memory pressure.
Explicitly check for some data being available in the receive queue.
Fixes:
5684ab1a0eff ("mptcp: give rcvlowat some love")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-2-0d986ee7b1b6@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 13 Jan 2025 15:44:56 +0000 (16:44 +0100)]
mptcp: be sure to send ack when mptcp-level window re-opens
mptcp_cleanup_rbuf() is responsible to send acks when the user-space
reads enough data to update the receive windows significantly.
It tries hard to avoid acquiring the subflow sockets locks by checking
conditions similar to the ones implemented at the TCP level.
To avoid too much code duplication - the MPTCP protocol can't reuse the
TCP helpers as part of the relevant status is maintained into the msk
socket - and multiple costly window size computation, mptcp_cleanup_rbuf
uses a rough estimate for the most recently advertised window size:
the MPTCP receive free space, as recorded as at last-ack time.
Unfortunately the above does not allow mptcp_cleanup_rbuf() to detect
a zero to non-zero win change in some corner cases, skipping the
tcp_cleanup_rbuf call and leaving the peer stuck.
After commit
ea66758c1795 ("tcp: allow MPTCP to update the announced
window"), MPTCP has actually cheap access to the announced window value.
Use it in mptcp_cleanup_rbuf() for a more accurate ack generation.
Fixes:
e3859603ba13 ("mptcp: better msk receive window updates")
Cc: stable@vger.kernel.org
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/
20250107131845.
5e5de3c5@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-1-0d986ee7b1b6@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Viresh Kumar [Fri, 10 Jan 2025 05:53:10 +0000 (11:23 +0530)]
cpufreq: Move endif to the end of Kconfig file
It is possible to enable few cpufreq drivers, without the framework
being enabled. This happened due to a bug while moving the entries
earlier. Fix it.
Fixes:
7ee1378736f0 ("cpufreq: Move CPPC configs to common Kconfig and add RISC-V")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://patch.msgid.link/84ac7a8fa72a8fe20487bb0a350a758bce060965.1736488384.git.viresh.kumar@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Tue, 14 Jan 2025 19:32:14 +0000 (11:32 -0800)]
Merge tag 'pci-v6.13-fixes-3' of git://git./linux/kernel/git/pci/pci
Pull pci fix from Bjorn Helgaas:
- Prevent bwctrl NULL pointer dereference that caused hangs on shutdown
on ASUS ROG Strix SCAR 17 G733PYV (Lukas Wunner)
* tag 'pci-v6.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI/bwctrl: Fix NULL pointer deref on unbind and bind
Linus Torvalds [Tue, 14 Jan 2025 18:07:40 +0000 (10:07 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"One iscsi driver fix and one core fix.
The core fix is an important one because a retry efficiency update is
now causing some USB devices to get the wrong size on discovery (it
upset their retry logic for READ_CAPACITY_16)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: iscsi: Fix redundant response for ISCSI_UEVENT_GET_HOST_STATS request
scsi: core: Fix command pass through retry regression
Linus Torvalds [Tue, 14 Jan 2025 17:54:57 +0000 (09:54 -0800)]
Merge tag 'sound-6.13' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Hopefully the last PR for 6.13. This became bigger than wished due to
the timing after holiday breaks.
The only large LOC is the additional document for Cirrus codec which
is nice for users (and absolutely safe). All the rest are small fixes
in ASoC Rcar and codecs as well as HD-audio quirks (And no fix for USB
guitar pedals seen yet :)"
* tag 'sound-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Fix volume adjustment issue on Lenovo ThinkBook 16P Gen5
ALSA: hda/realtek: fixup ASUS H7606W
ALSA: hda/realtek: fixup ASUS GA605W
ALSA: hda/realtek: Add support for Ayaneo System using CS35L41 HDA
ASoC: rsnd: check rsnd_adg_clk_enable() return value
ASoC: cs42l43: Add codec force suspend/resume ops
ALSA: doc: Add codecs/index.rst to top-level index
ALSA: doc: cs35l56: Add information about Cirrus Logic CS35L54/56/57
ASoC: samsung: Add missing depends on I2C
MAINTAINERS: add missing maintainers for Simple Audio Card
ASoC: samsung: Add missing selects for MFD_WM8994
ASoC: codecs: es8316: Fix HW rate calculation for 48Mhz MCLK
ASoC: wm8994: Add depends on MFD core
ASoC: tas2781: Fix occasional calibration failture
ASoC: codecs: ES8326: Adjust ANA_MICBIAS to reduce pop noise
Paolo Abeni [Tue, 14 Jan 2025 11:29:39 +0000 (12:29 +0100)]
Merge branch 'vsock-some-fixes-due-to-transport-de-assignment'
Stefano Garzarella says:
====================
vsock: some fixes due to transport de-assignment
v1: https://lore.kernel.org/netdev/
20250108180617.154053-1-sgarzare@redhat.com/
v2:
- Added patch 3 to cancel the virtio close delayed work when de-assigning
the transport
- Added patch 4 to clean the socket state after de-assigning the transport
- Added patch 5 as suggested by Michael and Hyunwoo Kim. It's based on
Hyunwoo Kim and Wongi Lee patch [1] but using WARN_ON and covering more
functions
- Added R-b/T-b tags
This series includes two patches discussed in the thread started by
Hyunwoo Kim a few weeks ago [1], plus 3 more patches added after some
discussions on v1 (see changelog). All related to the case where a vsock
socket is de-assigned from a transport (e.g., because the connect fails
or is interrupted by a signal) and then assigned to another transport
or to no-one (NULL).
I tested with usual vsock test suite, plus Michal repro [2]. (Note: the repo
works only if a G2H transport is not loaded, e.g. virtio-vsock driver).
The first patch is a fix more appropriate to the problem reported in
that thread, the second patch on the other hand is a related fix but
of a different problem highlighted by Michal Luczaj. It's present only
in vsock_bpf and already handled in af_vsock.c
The third patch is to cancel the virtio close delayed work when de-assigning
the transport, the fourth patch is to clean the socket state after de-assigning
the transport, the last patch adds warnings and prevents null-ptr-deref in
vsock_*[has_data|has_space].
Hyunwoo Kim, Michal, if you can test and report your Tested-by that
would be great!
[1] https://lore.kernel.org/netdev/Z2K%2FI4nlHdfMRTZC@v4bel-B760M-AORUS-ELITE-AX/
[2] https://lore.kernel.org/netdev/
2b3062e3-bdaa-4c94-a3c0-
2930595b9670@rbox.co/
====================
Link: https://patch.msgid.link/20250110083511.30419-1-sgarzare@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stefano Garzarella [Fri, 10 Jan 2025 08:35:11 +0000 (09:35 +0100)]
vsock: prevent null-ptr-deref in vsock_*[has_data|has_space]
Recent reports have shown how we sometimes call vsock_*_has_data()
when a vsock socket has been de-assigned from a transport (see attached
links), but we shouldn't.
Previous commits should have solved the real problems, but we may have
more in the future, so to avoid null-ptr-deref, we can return 0
(no space, no data available) but with a warning.
This way the code should continue to run in a nearly consistent state
and have a warning that allows us to debug future problems.
Fixes:
c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/netdev/Z2K%2FI4nlHdfMRTZC@v4bel-B760M-AORUS-ELITE-AX/
Link: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/
Link: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/
Co-developed-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Co-developed-by: Wongi Lee <qwerty@theori.io>
Signed-off-by: Wongi Lee <qwerty@theori.io>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stefano Garzarella [Fri, 10 Jan 2025 08:35:10 +0000 (09:35 +0100)]
vsock: reset socket state when de-assigning the transport
Transport's release() and destruct() are called when de-assigning the
vsock transport. These callbacks can touch some socket state like
sock flags, sk_state, and peer_shutdown.
Since we are reassigning the socket to a new transport during
vsock_connect(), let's reset these fields to have a clean state with
the new transport.
Fixes:
c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable@vger.kernel.org
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stefano Garzarella [Fri, 10 Jan 2025 08:35:09 +0000 (09:35 +0100)]
vsock/virtio: cancel close work in the destructor
During virtio_transport_release() we can schedule a delayed work to
perform the closing of the socket before destruction.
The destructor is called either when the socket is really destroyed
(reference counter to zero), or it can also be called when we are
de-assigning the transport.
In the former case, we are sure the delayed work has completed, because
it holds a reference until it completes, so the destructor will
definitely be called after the delayed work is finished.
But in the latter case, the destructor is called by AF_VSOCK core, just
after the release(), so there may still be delayed work scheduled.
Refactor the code, moving the code to delete the close work already in
the do_close() to a new function. Invoke it during destruction to make
sure we don't leave any pending work.
Fixes:
c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable@vger.kernel.org
Reported-by: Hyunwoo Kim <v4bel@theori.io>
Closes: https://lore.kernel.org/netdev/Z37Sh+utS+iV3+eb@v4bel-B760M-AORUS-ELITE-AX/
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Tested-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stefano Garzarella [Fri, 10 Jan 2025 08:35:08 +0000 (09:35 +0100)]
vsock/bpf: return early if transport is not assigned
Some of the core functions can only be called if the transport
has been assigned.
As Michal reported, a socket might have the transport at NULL,
for example after a failed connect(), causing the following trace:
BUG: kernel NULL pointer dereference, address:
00000000000000a0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD
12faf8067 P4D
12faf8067 PUD
113670067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 15 UID: 0 PID: 1198 Comm: a.out Not tainted 6.13.0-rc2+
RIP: 0010:vsock_connectible_has_data+0x1f/0x40
Call Trace:
vsock_bpf_recvmsg+0xca/0x5e0
sock_recvmsg+0xb9/0xc0
__sys_recvfrom+0xb3/0x130
__x64_sys_recvfrom+0x20/0x30
do_syscall_64+0x93/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
So we need to check the `vsk->transport` in vsock_bpf_recvmsg(),
especially for connected sockets (stream/seqpacket) as we already
do in __vsock_connectible_recvmsg().
Fixes:
634f1a7110b4 ("vsock: support sockmap")
Cc: stable@vger.kernel.org
Reported-by: Michal Luczaj <mhal@rbox.co>
Closes: https://lore.kernel.org/netdev/
5ca20d4c-1017-49c2-9516-
f6f75fd331e9@rbox.co/
Tested-by: Michal Luczaj <mhal@rbox.co>
Reported-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
677f84a8.
050a0220.25a300.01b3.GAE@google.com/
Tested-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com
Reviewed-by: Hyunwoo Kim <v4bel@theori.io>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stefano Garzarella [Fri, 10 Jan 2025 08:35:07 +0000 (09:35 +0100)]
vsock/virtio: discard packets if the transport changes
If the socket has been de-assigned or assigned to another transport,
we must discard any packets received because they are not expected
and would cause issues when we access vsk->transport.
A possible scenario is described by Hyunwoo Kim in the attached link,
where after a first connect() interrupted by a signal, and a second
connect() failed, we can find `vsk->transport` at NULL, leading to a
NULL pointer dereference.
Fixes:
c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: stable@vger.kernel.org
Reported-by: Hyunwoo Kim <v4bel@theori.io>
Reported-by: Wongi Lee <qwerty@theori.io>
Closes: https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 14 Jan 2025 10:20:06 +0000 (11:20 +0100)]
Merge branch 'gtp-pfcp-fix-use-after-free-of-udp-tunnel-socket'
Kuniyuki Iwashima says:
====================
gtp/pfcp: Fix use-after-free of UDP tunnel socket.
Xiao Liang pointed out weird netns usages in ->newlink() of
gtp and pfcp.
This series fixes the issues.
Link: https://lore.kernel.org/netdev/20250104125732.17335-1-shaw.leon@gmail.com/
Changes:
v2:
* Patch 1
* Fix uninit/unused local var
v1: https://lore.kernel.org/netdev/
20250108062834.11117-1-kuniyu@amazon.com/
====================
Link: https://patch.msgid.link/20250110014754.33847-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kuniyuki Iwashima [Fri, 10 Jan 2025 01:47:54 +0000 (10:47 +0900)]
pfcp: Destroy device along with udp socket's netns dismantle.
pfcp_newlink() links the device to a list in dev_net(dev) instead
of net, where a udp tunnel socket is created.
Even when net is removed, the device stays alive on dev_net(dev).
Then, removing net triggers the splat below. [0]
In this example, pfcp0 is created in ns2, but the udp socket is
created in ns1.
ip netns add ns1
ip netns add ns2
ip -n ns1 link add netns ns2 name pfcp0 type pfcp
ip netns del ns1
Let's link the device to the socket's netns instead.
Now, pfcp_net_exit() needs another netdev iteration to remove
all pfcp devices in the netns.
pfcp_dev_list is not used under RCU, so the list API is converted
to the non-RCU variant.
pfcp_net_exit() can be converted to .exit_batch_rtnl() in net-next.
[0]:
ref_tracker: net notrefcnt@
00000000128b34dc has 1/1 users at
sk_alloc (./include/net/net_namespace.h:345 net/core/sock.c:2236)
inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252)
__sock_create (net/socket.c:1558)
udp_sock_create4 (net/ipv4/udp_tunnel_core.c:18)
pfcp_create_sock (drivers/net/pfcp.c:168)
pfcp_newlink (drivers/net/pfcp.c:182 drivers/net/pfcp.c:197)
rtnl_newlink (net/core/rtnetlink.c:3786 net/core/rtnetlink.c:3897 net/core/rtnetlink.c:4012)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6922)
netlink_rcv_skb (net/netlink/af_netlink.c:2542)
netlink_unicast (net/netlink/af_netlink.c:1321 net/netlink/af_netlink.c:1347)
netlink_sendmsg (net/netlink/af_netlink.c:1891)
____sys_sendmsg (net/socket.c:711 net/socket.c:726 net/socket.c:2583)
___sys_sendmsg (net/socket.c:2639)
__sys_sendmsg (net/socket.c:2669)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
WARNING: CPU: 1 PID: 11 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179)
Modules linked in:
CPU: 1 UID: 0 PID: 11 Comm: kworker/u16:0 Not tainted
6.13.0-rc5-00147-g4c1224501e9d #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: netns cleanup_net
RIP: 0010:ref_tracker_dir_exit (lib/ref_tracker.c:179)
Code: 00 00 00 fc ff df 4d 8b 26 49 bd 00 01 00 00 00 00 ad de 4c 39 f5 0f 85 df 00 00 00 48 8b 74 24 08 48 89 df e8 a5 cc 12 02 90 <0f> 0b 90 48 8d 6b 44 be 04 00 00 00 48 89 ef e8 80 de 67 ff 48 89
RSP: 0018:
ff11000007f3fb60 EFLAGS:
00010286
RAX:
00000000000020ef RBX:
ff1100000d6481e0 RCX:
1ffffffff0e40d82
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
ffffffff8423ee3c
RBP:
ff1100000d648230 R08:
0000000000000001 R09:
fffffbfff0e395af
R10:
0000000000000001 R11:
0000000000000000 R12:
ff1100000d648230
R13:
dead000000000100 R14:
ff1100000d648230 R15:
dffffc0000000000
FS:
0000000000000000(0000) GS:
ff1100006ce80000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00005620e1363990 CR3:
000000000eeb2002 CR4:
0000000000771ef0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe07f0 DR7:
0000000000000400
PKRU:
55555554
Call Trace:
<TASK>
? __warn (kernel/panic.c:748)
? ref_tracker_dir_exit (lib/ref_tracker.c:179)
? report_bug (lib/bug.c:201 lib/bug.c:219)
? handle_bug (arch/x86/kernel/traps.c:285)
? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1))
? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
? ref_tracker_dir_exit (lib/ref_tracker.c:179)
? __pfx_ref_tracker_dir_exit (lib/ref_tracker.c:158)
? kfree (mm/slub.c:4613 mm/slub.c:4761)
net_free (net/core/net_namespace.c:476 net/core/net_namespace.c:467)
cleanup_net (net/core/net_namespace.c:664 (discriminator 3))
process_one_work (kernel/workqueue.c:3229)
worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391)
kthread (kernel/kthread.c:389)
ret_from_fork (arch/x86/kernel/process.c:147)
ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
</TASK>
Fixes:
76c8764ef36a ("pfcp: add PFCP module")
Reported-by: Xiao Liang <shaw.leon@gmail.com>
Closes: https://lore.kernel.org/netdev/
20250104125732.17335-1-shaw.leon@gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kuniyuki Iwashima [Fri, 10 Jan 2025 01:47:53 +0000 (10:47 +0900)]
gtp: Destroy device along with udp socket's netns dismantle.
gtp_newlink() links the device to a list in dev_net(dev) instead of
src_net, where a udp tunnel socket is created.
Even when src_net is removed, the device stays alive on dev_net(dev).
Then, removing src_net triggers the splat below. [0]
In this example, gtp0 is created in ns2, and the udp socket is created
in ns1.
ip netns add ns1
ip netns add ns2
ip -n ns1 link add netns ns2 name gtp0 type gtp role sgsn
ip netns del ns1
Let's link the device to the socket's netns instead.
Now, gtp_net_exit_batch_rtnl() needs another netdev iteration to remove
all gtp devices in the netns.
[0]:
ref_tracker: net notrefcnt@
000000003d6e7d05 has 1/2 users at
sk_alloc (./include/net/net_namespace.h:345 net/core/sock.c:2236)
inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252)
__sock_create (net/socket.c:1558)
udp_sock_create4 (net/ipv4/udp_tunnel_core.c:18)
gtp_create_sock (./include/net/udp_tunnel.h:59 drivers/net/gtp.c:1423)
gtp_create_sockets (drivers/net/gtp.c:1447)
gtp_newlink (drivers/net/gtp.c:1507)
rtnl_newlink (net/core/rtnetlink.c:3786 net/core/rtnetlink.c:3897 net/core/rtnetlink.c:4012)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6922)
netlink_rcv_skb (net/netlink/af_netlink.c:2542)
netlink_unicast (net/netlink/af_netlink.c:1321 net/netlink/af_netlink.c:1347)
netlink_sendmsg (net/netlink/af_netlink.c:1891)
____sys_sendmsg (net/socket.c:711 net/socket.c:726 net/socket.c:2583)
___sys_sendmsg (net/socket.c:2639)
__sys_sendmsg (net/socket.c:2669)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
WARNING: CPU: 1 PID: 60 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179)
Modules linked in:
CPU: 1 UID: 0 PID: 60 Comm: kworker/u16:2 Not tainted
6.13.0-rc5-00147-g4c1224501e9d #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: netns cleanup_net
RIP: 0010:ref_tracker_dir_exit (lib/ref_tracker.c:179)
Code: 00 00 00 fc ff df 4d 8b 26 49 bd 00 01 00 00 00 00 ad de 4c 39 f5 0f 85 df 00 00 00 48 8b 74 24 08 48 89 df e8 a5 cc 12 02 90 <0f> 0b 90 48 8d 6b 44 be 04 00 00 00 48 89 ef e8 80 de 67 ff 48 89
RSP: 0018:
ff11000009a07b60 EFLAGS:
00010286
RAX:
0000000000002bd3 RBX:
ff1100000f4e1aa0 RCX:
1ffffffff0e40ac6
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
ffffffff8423ee3c
RBP:
ff1100000f4e1af0 R08:
0000000000000001 R09:
fffffbfff0e395ae
R10:
0000000000000001 R11:
0000000000036001 R12:
ff1100000f4e1af0
R13:
dead000000000100 R14:
ff1100000f4e1af0 R15:
dffffc0000000000
FS:
0000000000000000(0000) GS:
ff1100006ce80000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f9b2464bd98 CR3:
0000000005286005 CR4:
0000000000771ef0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe07f0 DR7:
0000000000000400
PKRU:
55555554
Call Trace:
<TASK>
? __warn (kernel/panic.c:748)
? ref_tracker_dir_exit (lib/ref_tracker.c:179)
? report_bug (lib/bug.c:201 lib/bug.c:219)
? handle_bug (arch/x86/kernel/traps.c:285)
? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1))
? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
? ref_tracker_dir_exit (lib/ref_tracker.c:179)
? __pfx_ref_tracker_dir_exit (lib/ref_tracker.c:158)
? kfree (mm/slub.c:4613 mm/slub.c:4761)
net_free (net/core/net_namespace.c:476 net/core/net_namespace.c:467)
cleanup_net (net/core/net_namespace.c:664 (discriminator 3))
process_one_work (kernel/workqueue.c:3229)
worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391)
kthread (kernel/kthread.c:389)
ret_from_fork (arch/x86/kernel/process.c:147)
ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
</TASK>
Fixes:
459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Reported-by: Xiao Liang <shaw.leon@gmail.com>
Closes: https://lore.kernel.org/netdev/
20250104125732.17335-1-shaw.leon@gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kuniyuki Iwashima [Fri, 10 Jan 2025 01:47:52 +0000 (10:47 +0900)]
gtp: Use for_each_netdev_rcu() in gtp_genl_dump_pdp().
gtp_newlink() links the gtp device to a list in dev_net(dev).
However, even after the gtp device is moved to another netns,
it stays on the list but should be invisible.
Let's use for_each_netdev_rcu() for netdev traversal in
gtp_genl_dump_pdp().
Note that gtp_dev_list is no longer used under RCU, so list
helpers are converted to the non-RCU variant.
Fixes:
459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Reported-by: Xiao Liang <shaw.leon@gmail.com>
Closes: https://lore.kernel.org/netdev/CABAhCOQdBL6h9M2C+kd+bGivRJ9Q72JUxW+-gur0nub_=PmFPA@mail.gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Philo Lu [Fri, 10 Jan 2025 01:08:10 +0000 (09:08 +0800)]
udp: Make rehash4 independent in udp_lib_rehash()
As discussed in [0], rehash4 could be missed in udp_lib_rehash() when
udp hash4 changes while hash2 doesn't change. This patch fixes this by
moving rehash4 codes out of rehash2 checking, and then rehash2 and
rehash4 are done separately.
By doing this, we no longer need to call rehash4 explicitly in
udp_lib_hash4(), as the rehash callback in __ip4_datagram_connect takes
it. Thus, now udp_lib_hash4() returns directly if the sk is already
hashed.
Note that uhash4 may fail to work under consecutive connect(<dst
address>) calls because rehash() is not called with every connect(). To
overcome this, connect(<AF_UNSPEC>) needs to be called after the next
connect to a new destination.
[0]
https://lore.kernel.org/all/
4761e466ab9f7542c68cdc95f248987d127044d2.
1733499715.git.pabeni@redhat.com/
Fixes:
78c91ae2c6de ("ipv4/udp: Add 4-tuple hash for connected socket")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Link: https://patch.msgid.link/20250110010810.107145-1-lulie@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Heiner Kallweit [Thu, 9 Jan 2025 22:43:12 +0000 (23:43 +0100)]
r8169: remove redundant hwmon support
The temperature sensor is actually part of the integrated PHY and available
also on the standalone versions of the PHY. Therefore hwmon support will
be added to the Realtek PHY driver and can be removed here.
Fixes:
1ffcc8d41306 ("r8169: add support for the temperature sensor being available from RTL8125B")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/afba85f5-987b-4449-83cc-350438af7fe7@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Fertser [Thu, 9 Jan 2025 14:50:54 +0000 (17:50 +0300)]
net/ncsi: fix locking in Get MAC Address handling
Obtaining RTNL lock in a response handler is not allowed since it runs
in an atomic softirq context. Postpone setting the MAC address by adding
a dedicated step to the configuration FSM.
Fixes:
790071347a0a ("net/ncsi: change from ndo_set_mac_address to dev_set_mac_address")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20241129-potin-revert-ncsi-set-mac-addr-v1-1-94ea2cb596af@gmail.com
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Tested-by: Potin Lai <potin.lai.pt@gmail.com>
Link: https://patch.msgid.link/20250109145054.30925-1-fercerpav@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Qu Wenruo [Wed, 8 Jan 2025 03:44:04 +0000 (14:14 +1030)]
btrfs: add the missing error handling inside get_canonical_dev_path
Inside function get_canonical_dev_path(), we call d_path() to get the
final device path.
But d_path() can return error, and in that case the next strscpy() call
will trigger an invalid memory access.
Add back the missing error handling for d_path().
Reported-by: Boris Burkov <boris@bur.io>
Fixes:
7e06de7c83a7 ("btrfs: canonicalize the device path before adding it")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Chris Bainbridge [Sat, 11 Jan 2025 18:59:45 +0000 (18:59 +0000)]
ACPI: video: Fix random crashes due to bad kfree()
Commit
c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if
available for eDP") added function dm_helpers_probe_acpi_edid(), which
fetches the EDID from the BIOS by calling acpi_video_get_edid().
acpi_video_get_edid() returns a pointer to the EDID, but this pointer
does not originate from kmalloc() - it is actually the internal
"pointer" field from an acpi_buffer struct (which did come from
kmalloc()).
dm_helpers_probe_acpi_edid() then attempts to kfree() the EDID pointer,
resulting in memory corruption which leads to random, intermittent
crashes (e.g. 4% of boots will fail with some Oops).
Fix this by allocating a new array (which can be safely freed) for the
EDID data, and correctly freeing the acpi_buffer pointer.
The only other caller of acpi_video_get_edid() is nouveau_acpi_edid():
remove the extraneous kmemdup() here as the EDID data is now copied in
acpi_video_device_EDID().
Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Fixes:
c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if available for eDP")
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reported-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Closes: https://lore.kernel.org/amd-gfx/
20250110175252.GBZ4FedNKqmBRaY4T3@fat_crate.local/T/#m324a23eb4c4c32fa7e89e31f8ba96c781e496fb1
Link: https://patch.msgid.link/Z4K_oQL7eA9Owkbs@debian.local
[ rjw: Changed function description comment into a kerneldoc one ]
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Fri, 10 Jan 2025 12:48:10 +0000 (13:48 +0100)]
cpuidle: teo: Update documentation after previous changes
After previous changes, the description of the teo governor in the
documentation comment does not match the code any more, so update it
as appropriate.
Fixes:
449914398083 ("cpuidle: teo: Remove recent intercepts metric")
Fixes:
2662342079f5 ("cpuidle: teo: Gather statistics regarding whether or not to stop the tick")
Fixes:
6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/6120335.lOV4Wx5bFT@rjwysocki.net
[ rjw: Corrected 3 typos found by Christian ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Fri, 10 Jan 2025 12:46:43 +0000 (13:46 +0100)]
cpuidle: menu: Update documentation after previous changes
After commit
38f83090f515 ("cpuidle: menu: Remove iowait influence") and
other previous changes, the description of the menu governor in the
documentation does not match the code any more, so update it as
appropriate.
Fixes:
38f83090f515 ("cpuidle: menu: Remove iowait influence")
Fixes:
5484e31bbbff ("cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases")
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/12589281.O9o76ZdvQC@rjwysocki.net
Karol Kolacinski [Tue, 5 Nov 2024 12:29:16 +0000 (13:29 +0100)]
ice: Add correct PHY lane assignment
Driver always naively assumes, that for PTP purposes, PHY lane to
configure is corresponding to PF ID.
This is not true for some port configurations, e.g.:
- 2x50G per quad, where lanes used are 0 and 2 on each quad, but PF IDs
are 0 and 1
- 100G per quad on 2 quads, where lanes used are 0 and 4, but PF IDs are
0 and 1
Use correct PHY lane assignment by getting and parsing port options.
This is read from the NVM by the FW and provided to the driver with
the indication of active port split.
Remove ice_is_muxed_topo(), which is no longer needed.
Fixes:
4409ea1726cb ("ice: Adjust PTP init for 2x50G E825C devices")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Arkadiusz Kubalewski <Arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Karol Kolacinski [Tue, 5 Nov 2024 12:29:15 +0000 (13:29 +0100)]
ice: Fix ETH56G FC-FEC Rx offset value
Fix ETH56G FC-FEC incorrect Rx offset value by changing it from -255.96
to -469.26 ns.
Those values are derived from HW spec and reflect internal delays.
Hex value is a fixed point representation in Q23.9 format.
Fixes:
7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Karol Kolacinski [Tue, 5 Nov 2024 12:29:14 +0000 (13:29 +0100)]
ice: Fix quad registers read on E825
Quad registers are read/written incorrectly. E825 devices always use
quad 0 address and differentiate between the PHYs by changing SBQ
destination device (phy_0 or phy_0_peer).
Add helpers for reading/writing PTP registers shared per quad and use
correct quad address and SBQ destination device based on port.
Fixes:
7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Karol Kolacinski [Tue, 5 Nov 2024 12:29:13 +0000 (13:29 +0100)]
ice: Fix E825 initialization
Current implementation checks revision of all PHYs on all PFs, which is
incorrect and may result in initialization failure. Check only the
revision of the current PHY.
Fixes:
7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Linus Torvalds [Mon, 13 Jan 2025 17:03:18 +0000 (09:03 -0800)]
Merge tag 'mm-hotfixes-stable-2025-01-13-00-03' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"18 hotfixes. 11 are cc:stable. 13 are MM and 5 are non-MM.
All patches are singletons - please see the relevant changelogs for
details"
* tag 'mm-hotfixes-stable-2025-01-13-00-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
fs/proc: fix softlockup in __read_vmcore (part 2)
mm: fix assertion in folio_end_read()
mm: vmscan : pgdemote vmstat is not getting updated when MGLRU is enabled.
vmstat: disable vmstat_work on vmstat_cpu_down_prep()
zram: fix potential UAF of zram table
selftests/mm: set allocated memory to non-zero content in cow test
mm: clear uffd-wp PTE/PMD state on mremap()
module: fix writing of livepatch relocations in ROX text
mm: zswap: properly synchronize freeing resources during CPU hotunplug
Revert "mm: zswap: fix race between [de]compression and CPU hotunplug"
hugetlb: fix NULL pointer dereference in trace_hugetlbfs_alloc_inode
mm: fix div by zero in bdi_ratio_from_pages
x86/execmem: fix ROX cache usage in Xen PV guests
filemap: avoid truncating 64-bit offset to 32 bits
tools: fix atomic_set() definition to set the value correctly
mm/mempolicy: count MPOL_WEIGHTED_INTERLEAVE to "interleave_hit"
scripts/decode_stacktrace.sh: fix decoding of lines with an additional info
mm/kmemleak: fix percpu memory leak detection failure
Artem Chernyshev [Thu, 9 Jan 2025 08:30:39 +0000 (11:30 +0300)]
pktgen: Avoid out-of-bounds access in get_imix_entries
Passing a sufficient amount of imix entries leads to invalid access to the
pkt_dev->imix_entries array because of the incorrect boundary check.
UBSAN: array-index-out-of-bounds in net/core/pktgen.c:874:24
index 20 is out of range for type 'imix_pkt [20]'
CPU: 2 PID: 1210 Comm: bash Not tainted 6.10.0-rc1 #121
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<TASK>
dump_stack_lvl lib/dump_stack.c:117
__ubsan_handle_out_of_bounds lib/ubsan.c:429
get_imix_entries net/core/pktgen.c:874
pktgen_if_write net/core/pktgen.c:1063
pde_write fs/proc/inode.c:334
proc_reg_write fs/proc/inode.c:346
vfs_write fs/read_write.c:593
ksys_write fs/read_write.c:644
do_syscall_64 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe arch/x86/entry/entry_64.S:130
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes:
52a62f8603f9 ("pktgen: Parse internet mix (imix) input")
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
[ fp: allow to fill the array completely; minor changelog cleanup ]
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yage Geng [Mon, 13 Jan 2025 08:52:08 +0000 (16:52 +0800)]
ALSA: hda/realtek: Fix volume adjustment issue on Lenovo ThinkBook 16P Gen5
This patch fixes the volume adjustment issue on the Lenovo ThinkBook 16P Gen5
by applying the necessary quirk configuration for the Realtek ALC287 codec.
The issue was caused by incorrect configuration in the driver,
which prevented proper volume control on certain systems.
Signed-off-by: Yage Geng <icoderdev@gmail.com>
Link: https://patch.msgid.link/20250113085208.15351-1-icoderdev@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Rik van Riel [Fri, 10 Jan 2025 15:28:21 +0000 (10:28 -0500)]
fs/proc: fix softlockup in __read_vmcore (part 2)
Since commit
5cbcb62dddf5 ("fs/proc: fix softlockup in __read_vmcore") the
number of softlockups in __read_vmcore at kdump time have gone down, but
they still happen sometimes.
In a memory constrained environment like the kdump image, a softlockup is
not just a harmless message, but it can interfere with things like RCU
freeing memory, causing the crashdump to get stuck.
The second loop in __read_vmcore has a lot more opportunities for natural
sleep points, like scheduling out while waiting for a data write to
happen, but apparently that is not always enough.
Add a cond_resched() to the second loop in __read_vmcore to (hopefully)
get rid of the softlockups.
Link: https://lkml.kernel.org/r/20250110102821.2a37581b@fangorn
Fixes:
5cbcb62dddf5 ("fs/proc: fix softlockup in __read_vmcore")
Signed-off-by: Rik van Riel <riel@surriel.com>
Reported-by: Breno Leitao <leitao@debian.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Fri, 10 Jan 2025 16:32:57 +0000 (16:32 +0000)]
mm: fix assertion in folio_end_read()
We only need to assert that the uptodate flag is clear if we're going to
set it. This hasn't been a problem before now because we have only used
folio_end_read() when completing with an error, but it's convenient to use
it in squashfs if we discover the folio is already uptodate.
Link: https://lkml.kernel.org/r/20250110163300.3346321-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Donet Tom [Thu, 9 Jan 2025 06:05:39 +0000 (00:05 -0600)]
mm: vmscan : pgdemote vmstat is not getting updated when MGLRU is enabled.
When MGLRU is enabled, the pgdemote_kswapd, pgdemote_direct, and
pgdemote_khugepaged stats in vmstat are not being updated.
Commit
f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA
balancing operations") moved the pgdemote vmstat update from
demote_folio_list() to shrink_inactive_list(), which is in the normal LRU
path. As a result, the pgdemote stats are updated correctly for the
normal LRU but not for MGLRU.
To address this, we have added the pgdemote stat update in the
evict_folios() function, which is in the MGLRU path. With this patch, the
pgdemote stats will now be updated correctly when MGLRU is enabled.
Without this patch vmstat output when MGLRU is enabled
======================================================
pgdemote_kswapd 0
pgdemote_direct 0
pgdemote_khugepaged 0
With this patch vmstat output when MGLRU is enabled
===================================================
pgdemote_kswapd 43234
pgdemote_direct 4691
pgdemote_khugepaged 0
Link: https://lkml.kernel.org/r/20250109060540.451261-1-donettom@linux.ibm.com
Fixes:
f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations")
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Wei Xu <weixugc@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Koichiro Den [Wed, 8 Jan 2025 04:28:07 +0000 (13:28 +0900)]
vmstat: disable vmstat_work on vmstat_cpu_down_prep()
The upstream commit
adcfb264c3ed ("vmstat: disable vmstat_work on
vmstat_cpu_down_prep()") introduced another warning during the boot phase
so was soon reverted on upstream by commit
cd6313beaeae ("Revert "vmstat:
disable vmstat_work on vmstat_cpu_down_prep()""). This commit resolves it
and reattempts the original fix.
Even after mm/vmstat:online teardown, shepherd may still queue work for
the dying cpu until the cpu is removed from online mask. While it's quite
rare, this means that after unbind_workers() unbinds a per-cpu kworker, it
potentially runs vmstat_update for the dying CPU on an irrelevant cpu
before entering atomic AP states. When CONFIG_DEBUG_PREEMPT=y, it results
in the following error with the backtrace.
BUG: using smp_processor_id() in preemptible [
00000000] code: \
kworker/7:3/1702
caller is refresh_cpu_vm_stats+0x235/0x5f0
CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
Tainted: [N]=TEST
Workqueue: mm_percpu_wq vmstat_update
Call Trace:
<TASK>
dump_stack_lvl+0x8d/0xb0
check_preemption_disabled+0xce/0xe0
refresh_cpu_vm_stats+0x235/0x5f0
vmstat_update+0x17/0xa0
process_one_work+0x869/0x1aa0
worker_thread+0x5e5/0x1100
kthread+0x29e/0x380
ret_from_fork+0x2d/0x70
ret_from_fork_asm+0x1a/0x30
</TASK>
So, for mm/vmstat:online, disable vmstat_work reliably on teardown and
symmetrically enable it on startup.
For secondary CPUs during CPU hotplug scenarios, ensure the delayed work
is disabled immediately after the initialization. These CPUs are not yet
online when start_shepherd_timer() runs on boot CPU. vmstat_cpu_online()
will enable the work for them.
Link: https://lkml.kernel.org/r/20250108042807.3429745-1-koichiro.den@canonical.com
Signed-off-by: Huacai Chen <chenhuacai@kernel.org>
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Suggested-by: Huacai Chen <chenhuacai@kernel.org>
Tested-by: Charalampos Mitrodimas <charmitro@posteo.net>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kairui Song [Tue, 7 Jan 2025 06:54:46 +0000 (14:54 +0800)]
zram: fix potential UAF of zram table
If zram_meta_alloc failed early, it frees allocated zram->table without
setting it NULL. Which will potentially cause zram_meta_free to access
the table if user reset an failed and uninitialized device.
Link: https://lkml.kernel.org/r/20250107065446.86928-1-ryncsn@gmail.com
Fixes:
74363ec674cb ("zram: fix uninitialized ZRAM not releasing backing device")
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ryan Roberts [Tue, 7 Jan 2025 14:25:53 +0000 (14:25 +0000)]
selftests/mm: set allocated memory to non-zero content in cow test
After commit
b1f202060afe ("mm: remap unused subpages to shared zeropage
when splitting isolated thp"), cow test cases involving swapping out THPs
via madvise(MADV_PAGEOUT) started to be skipped due to the subsequent
check via pagemap determining that the memory was not actually swapped
out. Logs similar to this were emitted:
...
# [RUN] Basic COW after fork() ... with swapped-out, PTE-mapped THP (16 kB)
ok 2 # SKIP MADV_PAGEOUT did not work, is swap enabled?
# [RUN] Basic COW after fork() ... with single PTE of swapped-out THP (16 kB)
ok 3 # SKIP MADV_PAGEOUT did not work, is swap enabled?
# [RUN] Basic COW after fork() ... with swapped-out, PTE-mapped THP (32 kB)
ok 4 # SKIP MADV_PAGEOUT did not work, is swap enabled?
...
The commit in question introduces the behaviour of scanning THPs and if
their content is predominantly zero, it splits them and replaces the pages
which are wholly zero with the zero page. These cow test cases were
getting caught up in this.
So let's avoid that by filling the contents of all allocated memory with
a non-zero value. With this in place, the tests are passing again.
Link: https://lkml.kernel.org/r/20250107142555.1870101-1-ryan.roberts@arm.com
Fixes:
b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ryan Roberts [Tue, 7 Jan 2025 14:47:52 +0000 (14:47 +0000)]
mm: clear uffd-wp PTE/PMD state on mremap()
When mremap()ing a memory region previously registered with userfaultfd as
write-protected but without UFFD_FEATURE_EVENT_REMAP, an inconsistency in
flag clearing leads to a mismatch between the vma flags (which have
uffd-wp cleared) and the pte/pmd flags (which do not have uffd-wp
cleared). This mismatch causes a subsequent mprotect(PROT_WRITE) to
trigger a warning in page_table_check_pte_flags() due to setting the pte
to writable while uffd-wp is still set.
Fix this by always explicitly clearing the uffd-wp pte/pmd flags on any
such mremap() so that the values are consistent with the existing clearing
of VM_UFFD_WP. Be careful to clear the logical flag regardless of its
physical form; a PTE bit, a swap PTE bit, or a PTE marker. Cover PTE,
huge PMD and hugetlb paths.
Link: https://lkml.kernel.org/r/20250107144755.1871363-2-ryan.roberts@arm.com
Co-developed-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Closes: https://lore.kernel.org/linux-mm/
810b44a8-d2ae-4107-b665-
5a42eae2d948@arm.com/
Fixes:
63b2d4174c4a ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl")
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Petr Pavlu [Tue, 7 Jan 2025 15:34:57 +0000 (16:34 +0100)]
module: fix writing of livepatch relocations in ROX text
A livepatch module can contain a special relocation section
.klp.rela.<objname>.<secname> to apply its relocations at the appropriate
time and to additionally access local and unexported symbols. When
<objname> points to another module, such relocations are processed
separately from the regular module relocation process. For instance, only
when the target <objname> actually becomes loaded.
With CONFIG_STRICT_MODULE_RWX, when the livepatch core decides to apply
these relocations, their processing results in the following bug:
[ 25.827238] BUG: unable to handle page fault for address:
00000000000012ba
[ 25.827819] #PF: supervisor read access in kernel mode
[ 25.828153] #PF: error_code(0x0000) - not-present page
[ 25.828588] PGD 0 P4D 0
[ 25.829063] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 25.829742] CPU: 2 UID: 0 PID: 452 Comm: insmod Tainted: G O K
6.13.0-rc4-00078-g059dd502b263 #7820
[ 25.830417] Tainted: [O]=OOT_MODULE, [K]=LIVEPATCH
[ 25.830768] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[ 25.831651] RIP: 0010:memcmp+0x24/0x60
[ 25.832190] Code: [...]
[ 25.833378] RSP: 0018:
ffffa40b403a3ae8 EFLAGS:
00000246
[ 25.833637] RAX:
0000000000000000 RBX:
ffff93bc81d8e700 RCX:
ffffffffc0202000
[ 25.834072] RDX:
0000000000000000 RSI:
0000000000000004 RDI:
00000000000012ba
[ 25.834548] RBP:
ffffa40b403a3b68 R08:
ffffa40b403a3b30 R09:
0000004a00000002
[ 25.835088] R10:
ffffffffffffd222 R11:
f000000000000000 R12:
0000000000000000
[ 25.835666] R13:
ffffffffc02032ba R14:
ffffffffc007d1e0 R15:
0000000000000004
[ 25.836139] FS:
00007fecef8c3080(0000) GS:
ffff93bc8f900000(0000) knlGS:
0000000000000000
[ 25.836519] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 25.836977] CR2:
00000000000012ba CR3:
0000000002f24000 CR4:
00000000000006f0
[ 25.837442] Call Trace:
[ 25.838297] <TASK>
[ 25.841083] __write_relocate_add.constprop.0+0xc7/0x2b0
[ 25.841701] apply_relocate_add+0x75/0xa0
[ 25.841973] klp_write_section_relocs+0x10e/0x140
[ 25.842304] klp_write_object_relocs+0x70/0xa0
[ 25.842682] klp_init_object_loaded+0x21/0xf0
[ 25.842972] klp_enable_patch+0x43d/0x900
[ 25.843572] do_one_initcall+0x4c/0x220
[ 25.844186] do_init_module+0x6a/0x260
[ 25.844423] init_module_from_file+0x9c/0xe0
[ 25.844702] idempotent_init_module+0x172/0x270
[ 25.845008] __x64_sys_finit_module+0x69/0xc0
[ 25.845253] do_syscall_64+0x9e/0x1a0
[ 25.845498] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 25.846056] RIP: 0033:0x7fecef9eb25d
[ 25.846444] Code: [...]
[ 25.847563] RSP: 002b:
00007ffd0c5d6de8 EFLAGS:
00000246 ORIG_RAX:
0000000000000139
[ 25.848082] RAX:
ffffffffffffffda RBX:
000055b03f05e470 RCX:
00007fecef9eb25d
[ 25.848456] RDX:
0000000000000000 RSI:
000055b001e74e52 RDI:
0000000000000003
[ 25.848969] RBP:
00007ffd0c5d6ea0 R08:
0000000000000040 R09:
0000000000004100
[ 25.849411] R10:
00007fecefac7b20 R11:
0000000000000246 R12:
000055b001e74e52
[ 25.849905] R13:
0000000000000000 R14:
000055b03f05e440 R15:
0000000000000000
[ 25.850336] </TASK>
[ 25.850553] Modules linked in: deku(OK+) uinput
[ 25.851408] CR2:
00000000000012ba
[ 25.852085] ---[ end trace
0000000000000000 ]---
The problem is that the .klp.rela.<objname>.<secname> relocations are
processed after the module was already formed and mod->rw_copy was reset.
However, the code in __write_relocate_add() calls
module_writable_address() which translates the target address 'loc' still
to 'loc + (mem->rw_copy - mem->base)', with mem->rw_copy now being 0.
Fix the problem by returning directly 'loc' in module_writable_address()
when the module is already formed. Function __write_relocate_add() knows
to use text_poke() in such a case.
Link: https://lkml.kernel.org/r/20250107153507.14733-1-petr.pavlu@suse.com
Fixes:
0c133b1e78cd ("module: prepare to handle ROX allocations for text")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reported-by: Marek Maslanka <mmaslanka@google.com>
Closes: https://lore.kernel.org/linux-modules/CAGcaFA2hdThQV6mjD_1_U+GNHThv84+MQvMWLgEuX+LVbAyDxg@mail.gmail.com/
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yosry Ahmed [Wed, 8 Jan 2025 22:24:41 +0000 (22:24 +0000)]
mm: zswap: properly synchronize freeing resources during CPU hotunplug
In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the
current CPU at the beginning of the operation is retrieved and used
throughout. However, since neither preemption nor migration are disabled,
it is possible that the operation continues on a different CPU.
If the original CPU is hotunplugged while the acomp_ctx is still in use,
we run into a UAF bug as some of the resources attached to the acomp_ctx
are freed during hotunplug in zswap_cpu_comp_dead() (i.e.
acomp_ctx.buffer, acomp_ctx.req, or acomp_ctx.acomp).
The problem was introduced in commit
1ec3b5fe6eec ("mm/zswap: move to use
crypto_acomp API for hardware acceleration") when the switch to the
crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was
retrieved using get_cpu_ptr() which disables preemption and makes sure the
CPU cannot go away from under us. Preemption cannot be disabled with the
crypto_acomp API as a sleepable context is needed.
Use the acomp_ctx.mutex to synchronize CPU hotplug callbacks allocating
and freeing resources with compression/decompression paths. Make sure
that acomp_ctx.req is NULL when the resources are freed. In the
compression/decompression paths, check if acomp_ctx.req is NULL after
acquiring the mutex (meaning the CPU was offlined) and retry on the new
CPU.
The initialization of acomp_ctx.mutex is moved from the CPU hotplug
callback to the pool initialization where it belongs (where the mutex is
allocated). In addition to adding clarity, this makes sure that CPU
hotplug cannot reinitialize a mutex that is already locked by
compression/decompression.
Previously a fix was attempted by holding cpus_read_lock() [1]. This
would have caused a potential deadlock as it is possible for code already
holding the lock to fall into reclaim and enter zswap (causing a
deadlock). A fix was also attempted using SRCU for synchronization, but
Johannes pointed out that synchronize_srcu() cannot be used in CPU hotplug
notifiers [2].
Alternative fixes that were considered/attempted and could have worked:
- Refcounting the per-CPU acomp_ctx. This involves complexity in
handling the race between the refcount dropping to zero in
zswap_[de]compress() and the refcount being re-initialized when the
CPU is onlined.
- Disabling migration before getting the per-CPU acomp_ctx [3], but
that's discouraged and is a much bigger hammer than needed, and could
result in subtle performance issues.
[1]https://lkml.kernel.org/
20241219212437.
2714151-1-yosryahmed@google.com/
[2]https://lkml.kernel.org/
20250107074724.
1756696-2-yosryahmed@google.com/
[3]https://lkml.kernel.org/
20250107222236.
2715883-2-yosryahmed@google.com/
[yosryahmed@google.com: remove comment]
Link: https://lkml.kernel.org/r/CAJD7tkaxS1wjn+swugt8QCvQ-rVF5RZnjxwPGX17k8x9zSManA@mail.gmail.com
Link: https://lkml.kernel.org/r/20250108222441.3622031-1-yosryahmed@google.com
Fixes:
1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Closes: https://lore.kernel.org/lkml/
20241113213007.GB1564047@cmpxchg.org/
Reported-by: Sam Sun <samsun1006219@gmail.com>
Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tPg6OaQ@mail.gmail.com/
Cc: Barry Song <baohua@kernel.org>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yosry Ahmed [Tue, 7 Jan 2025 22:22:34 +0000 (22:22 +0000)]
Revert "mm: zswap: fix race between [de]compression and CPU hotunplug"
This reverts commit
eaebeb93922ca6ab0dd92027b73d0112701706ef.
Commit
eaebeb93922c ("mm: zswap: fix race between [de]compression and CPU
hotunplug") used the CPU hotplug lock in zswap compress/decompress
operations to protect against a race with CPU hotunplug making some
per-CPU resources go away.
However, zswap compress/decompress can be reached through reclaim while
the lock is held, resulting in a potential deadlock as reported by syzbot:
======================================================
WARNING: possible circular locking dependency detected
6.13.0-rc6-syzkaller-00006-g5428dc1906dd #0 Not tainted
------------------------------------------------------
kswapd0/89 is trying to acquire lock:
ffffffff8e7d2ed0 (cpu_hotplug_lock){++++}-{0:0}, at: acomp_ctx_get_cpu mm/zswap.c:886 [inline]
ffffffff8e7d2ed0 (cpu_hotplug_lock){++++}-{0:0}, at: zswap_compress mm/zswap.c:908 [inline]
ffffffff8e7d2ed0 (cpu_hotplug_lock){++++}-{0:0}, at: zswap_store_page mm/zswap.c:1439 [inline]
ffffffff8e7d2ed0 (cpu_hotplug_lock){++++}-{0:0}, at: zswap_store+0xa74/0x1ba0 mm/zswap.c:1546
but task is already holding lock:
ffffffff8ea355a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:6871 [inline]
ffffffff8ea355a0 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0xb58/0x2f30 mm/vmscan.c:7253
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (fs_reclaim){+.+.}-{0:0}:
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
__fs_reclaim_acquire mm/page_alloc.c:3853 [inline]
fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3867
might_alloc include/linux/sched/mm.h:318 [inline]
slab_pre_alloc_hook mm/slub.c:4070 [inline]
slab_alloc_node mm/slub.c:4148 [inline]
__kmalloc_cache_node_noprof+0x40/0x3a0 mm/slub.c:4337
kmalloc_node_noprof include/linux/slab.h:924 [inline]
alloc_worker kernel/workqueue.c:2638 [inline]
create_worker+0x11b/0x720 kernel/workqueue.c:2781
workqueue_prepare_cpu+0xe3/0x170 kernel/workqueue.c:6628
cpuhp_invoke_callback+0x48d/0x830 kernel/cpu.c:194
__cpuhp_invoke_callback_range kernel/cpu.c:965 [inline]
cpuhp_invoke_callback_range kernel/cpu.c:989 [inline]
cpuhp_up_callbacks kernel/cpu.c:1020 [inline]
_cpu_up+0x2b3/0x580 kernel/cpu.c:1690
cpu_up+0x184/0x230 kernel/cpu.c:1722
cpuhp_bringup_mask+0xdf/0x260 kernel/cpu.c:1788
cpuhp_bringup_cpus_parallel+0xf9/0x160 kernel/cpu.c:1878
bringup_nonboot_cpus+0x2b/0x50 kernel/cpu.c:1892
smp_init+0x34/0x150 kernel/smp.c:1009
kernel_init_freeable+0x417/0x5d0 init/main.c:1569
kernel_init+0x1d/0x2b0 init/main.c:1466
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
-> #0 (cpu_hotplug_lock){++++}-{0:0}:
check_prev_add kernel/locking/lockdep.c:3161 [inline]
check_prevs_add kernel/locking/lockdep.c:3280 [inline]
validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
__lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
cpus_read_lock+0x42/0x150 kernel/cpu.c:490
acomp_ctx_get_cpu mm/zswap.c:886 [inline]
zswap_compress mm/zswap.c:908 [inline]
zswap_store_page mm/zswap.c:1439 [inline]
zswap_store+0xa74/0x1ba0 mm/zswap.c:1546
swap_writepage+0x647/0xce0 mm/page_io.c:279
shmem_writepage+0x1248/0x1610 mm/shmem.c:1579
pageout mm/vmscan.c:696 [inline]
shrink_folio_list+0x35ee/0x57e0 mm/vmscan.c:1374
shrink_inactive_list mm/vmscan.c:1967 [inline]
shrink_list mm/vmscan.c:2205 [inline]
shrink_lruvec+0x16db/0x2f30 mm/vmscan.c:5734
mem_cgroup_shrink_node+0x385/0x8e0 mm/vmscan.c:6575
mem_cgroup_soft_reclaim mm/memcontrol-v1.c:312 [inline]
memcg1_soft_limit_reclaim+0x346/0x810 mm/memcontrol-v1.c:362
balance_pgdat mm/vmscan.c:6975 [inline]
kswapd+0x17b3/0x2f30 mm/vmscan.c:7253
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(fs_reclaim);
lock(cpu_hotplug_lock);
lock(fs_reclaim);
rlock(cpu_hotplug_lock);
*** DEADLOCK ***
1 lock held by kswapd0/89:
#0:
ffffffff8ea355a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:6871 [inline]
#0:
ffffffff8ea355a0 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0xb58/0x2f30 mm/vmscan.c:7253
stack backtrace:
CPU: 0 UID: 0 PID: 89 Comm: kswapd0 Not tainted
6.13.0-rc6-syzkaller-00006-g5428dc1906dd #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2074
check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2206
check_prev_add kernel/locking/lockdep.c:3161 [inline]
check_prevs_add kernel/locking/lockdep.c:3280 [inline]
validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
__lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
cpus_read_lock+0x42/0x150 kernel/cpu.c:490
acomp_ctx_get_cpu mm/zswap.c:886 [inline]
zswap_compress mm/zswap.c:908 [inline]
zswap_store_page mm/zswap.c:1439 [inline]
zswap_store+0xa74/0x1ba0 mm/zswap.c:1546
swap_writepage+0x647/0xce0 mm/page_io.c:279
shmem_writepage+0x1248/0x1610 mm/shmem.c:1579
pageout mm/vmscan.c:696 [inline]
shrink_folio_list+0x35ee/0x57e0 mm/vmscan.c:1374
shrink_inactive_list mm/vmscan.c:1967 [inline]
shrink_list mm/vmscan.c:2205 [inline]
shrink_lruvec+0x16db/0x2f30 mm/vmscan.c:5734
mem_cgroup_shrink_node+0x385/0x8e0 mm/vmscan.c:6575
mem_cgroup_soft_reclaim mm/memcontrol-v1.c:312 [inline]
memcg1_soft_limit_reclaim+0x346/0x810 mm/memcontrol-v1.c:362
balance_pgdat mm/vmscan.c:6975 [inline]
kswapd+0x17b3/0x2f30 mm/vmscan.c:7253
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Revert the change. A different fix for the race with CPU hotunplug will
follow.
Link: https://lkml.kernel.org/r/20250107222236.2715883-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Sam Sun <samsun1006219@gmail.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Mon, 6 Jan 2025 03:31:17 +0000 (11:31 +0800)]
hugetlb: fix NULL pointer dereference in trace_hugetlbfs_alloc_inode
hugetlb_file_setup() will pass a NULL @dir to hugetlbfs_get_inode(), so we
will access a NULL pointer for @dir. Fix it and set __entry->dr to 0 if
@dir is NULL. Because ->i_ino cannot be 0 (see get_next_ino()), there is
no confusing if user sees a 0 inode number.
Link: https://lkml.kernel.org/r/20250106033118.4640-1-songmuchun@bytedance.com
Fixes:
318580ad7f28 ("hugetlbfs: support tracepoint")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reported-by: Cheung Wall <zzqq0103.hey@gmail.com>
Closes: https://lore.kernel.org/linux-mm/
02858D60-43C1-4863-A84F-
3C76A8AF1F15@linux.dev/T/#
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Cc: cheung wall <zzqq0103.hey@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stefan Roesch [Sat, 4 Jan 2025 01:20:37 +0000 (17:20 -0800)]
mm: fix div by zero in bdi_ratio_from_pages
During testing it has been detected, that it is possible to get div by
zero error in bdi_set_min_bytes. The error is caused by the function
bdi_ratio_from_pages(). bdi_ratio_from_pages() calls global_dirty_limits.
If the dirty threshold is 0, the div by zero is raised. This can happen
if the root user is setting:
echo 0 > /proc/sys/vm/dirty_ratio
The following is a test case:
echo 0 > /proc/sys/vm/dirty_ratio
cd /sys/class/bdi/<device>
echo 1 > strict_limit
echo 8192 > min_bytes
==> error is raised.
The problem is addressed by returning -EINVAL if dirty_ratio or
dirty_bytes is set to 0.
[shr@devkernel.io: check for -EINVAL in bdi_set_min_bytes() and bdi_set_max_bytes()]
Link: https://lkml.kernel.org/r/20250108014723.166637-1-shr@devkernel.io
[shr@devkernel.io: v3]
Link: https://lkml.kernel.org/r/20250109063411.6591-1-shr@devkernel.io
Link: https://lkml.kernel.org/r/20250104012037.159386-1-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Reported-by: cheung wall <zzqq0103.hey@gmail.com>
Closes: https://lore.kernel.org/linux-mm/87pll35yd0.fsf@devkernel.io/T/#t
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Qiang Zhang <zzqq0103.hey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Juergen Gross [Fri, 3 Jan 2025 06:56:31 +0000 (07:56 +0100)]
x86/execmem: fix ROX cache usage in Xen PV guests
The recently introduced ROX cache for modules is assuming large page
support in 64-bit mode without testing the related feature bit. This
results in breakage when running as a Xen PV guest, as in this mode large
pages are not supported.
Fix that by testing the X86_FEATURE_PSE capability when deciding whether
to enable the ROX cache.
Link: https://lkml.kernel.org/r/20250103065631.26459-1-jgross@suse.com
Fixes:
2e45474ab14f ("execmem: add support for cache of large ROX pages")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Marco Nelissen [Thu, 2 Jan 2025 19:04:11 +0000 (11:04 -0800)]
filemap: avoid truncating 64-bit offset to 32 bits
On 32-bit kernels, folio_seek_hole_data() was inadvertently truncating a
64-bit value to 32 bits, leading to a possible infinite loop when writing
to an xfs filesystem.
Link: https://lkml.kernel.org/r/20250102190540.1356838-1-marco.nelissen@gmail.com
Fixes:
54fa39ac2e00 ("iomap: use mapping_seek_hole_data")
Signed-off-by: Marco Nelissen <marco.nelissen@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Suren Baghdasaryan [Fri, 27 Dec 2024 22:22:20 +0000 (14:22 -0800)]
tools: fix atomic_set() definition to set the value correctly
Currently vma test is failing because of the new vma_assert_attached()
assertion. The check is failing because previous refcount_set() inside
vma_mark_attached() is a NoOp. Fix the definition of atomic_set() to
correctly set the value of the atomic.
Link: https://lkml.kernel.org/r/20241227222220.1726384-1-surenb@google.com
Fixes:
9325b8b5a1cb ("tools: add skeleton code for userland testing of VMA logic")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Honggyu Kim [Fri, 27 Dec 2024 09:57:37 +0000 (18:57 +0900)]
mm/mempolicy: count MPOL_WEIGHTED_INTERLEAVE to "interleave_hit"
Commit
fa3bea4e1f82 introduced MPOL_WEIGHTED_INTERLEAVE but it missed
adding its counter to "interleave_hit" of numastat, which is located at
/sys/devices/system/node/nodeN/ directory.
It'd be better to add weighted interleving counter info to the existing
"interleave_hit" instead of introducing a new counter
"weighted_interleave_hit".
Link: https://lkml.kernel.org/r/20241227095737.645-1-honggyu.kim@sk.com
Fixes:
fa3bea4e1f82 ("mm/mempolicy: introduce MPOL_WEIGHTED_INTERLEAVE for weighted interleaving")
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Hyeonggon Yoo <hyeonggon.yoo@sk.com>
Tested-by: Yunjeong Mun <yunjeong.mun@sk.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Luca Ceresoli [Mon, 30 Dec 2024 21:55:10 +0000 (22:55 +0100)]
scripts/decode_stacktrace.sh: fix decoding of lines with an additional info
Since commit
bdf8eafbf7f5 ("arm64: stacktrace: report source of unwind
data") a stack trace line can contain an additional info field that was not
present before, in the form of one or more letters in parentheses. E.g.:
[ 504.517915] led_sysfs_enable+0x54/0x80 (P)
^^^
When this is present, decode_stacktrace decodes the line incorrectly:
[ 504.517915] led_sysfs_enable+0x54/0x80 P
Extend parsing to decode it correctly:
[ 504.517915] led_sysfs_enable (drivers/leds/led-core.c:455 (discriminator 7)) (P)
The regex to match such lines assumes the info can be extended in the
future to other uppercase characters, and will need to be extended in case
other characters will be used. Using a much more generic regex might incur
in false positives, so this looked like a good tradeoff.
Link: https://lkml.kernel.org/r/20241230-decode_stacktrace-fix-info-v1-1-984910659173@bootlin.com
Fixes:
bdf8eafbf7f5 ("arm64: stacktrace: report source of unwind data")
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Puranjay Mohan <puranjay@kernel.org>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Guo Weikang [Fri, 27 Dec 2024 09:23:10 +0000 (17:23 +0800)]
mm/kmemleak: fix percpu memory leak detection failure
kmemleak_alloc_percpu gives an incorrect min_count parameter, causing
percpu memory to be considered a gray object.
Link: https://lkml.kernel.org/r/20241227092311.3572500-1-guoweikang.kernel@gmail.com
Fixes:
8c8685928910 ("mm/kmemleak: use IS_ERR_PCPU() for pointer in the percpu address space")
Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com>
Acked-by: Uros Bizjak <ubizjak@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Sun, 12 Jan 2025 22:37:56 +0000 (14:37 -0800)]
Linux 6.13-rc7
Linus Torvalds [Sun, 12 Jan 2025 22:34:00 +0000 (14:34 -0800)]
Merge tag 'char-misc-6.13-rc7' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc/IIO driver fixes from Greg KH:
"Here are a bunch of small IIO and interconnect and other driver fixes
to resolve reported issues. Included in here are:
- loads of iio driver fixes as a result of an audit of places where
uninitialized data would leak to userspace.
- other smaller, and normal, iio driver fixes.
- mhi driver fix
- interconnect driver fixes
- pci1xxxx driver fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (32 commits)
misc: microchip: pci1xxxx: Resolve return code mismatch during GPIO set config
misc: microchip: pci1xxxx: Resolve kernel panic during GPIO IRQ handling
interconnect: icc-clk: check return values of devm_kasprintf()
interconnect: qcom: icc-rpm: Set the count member before accessing the flex array
iio: adc: ti-ads1119: fix sample size in scan struct for triggered buffer
iio: temperature: tmp006: fix information leak in triggered buffer
iio: inkern: call iio_device_put() only on mapped devices
iio: adc: ad9467: Fix the "don't allow reading vref if not available" case
iio: adc: at91: call input_free_device() on allocated iio_dev
iio: adc: ad7173: fix using shared static info struct
iio: adc: ti-ads124s08: Use gpiod_set_value_cansleep()
iio: adc: ti-ads1119: fix information leak in triggered buffer
iio: pressure: zpa2326: fix information leak in triggered buffer
iio: adc: rockchip_saradc: fix information leak in triggered buffer
iio: imu: kmx61: fix information leak in triggered buffer
iio: light: vcnl4035: fix information leak in triggered buffer
iio: light: bh1745: fix information leak in triggered buffer
iio: adc: ti-ads8688: fix information leak in triggered buffer
iio: dummy: iio_simply_dummy_buffer: fix information leak in triggered buffer
iio: test: Fix GTS test config
...
Linus Torvalds [Sun, 12 Jan 2025 22:26:31 +0000 (14:26 -0800)]
Merge tag 'driver-core-6.13-rc7' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core and debugfs fixes from Greg KH:
"Here are some small driver core and debugfs fixes that resolve some
reported problems:
- debugfs runtime error reporting fixes
- topology cpumask race-condition fix
- MAINTAINERS file email update
All of these have been in linux-next this week with no reported
issues"
* tag 'driver-core-6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
fs: debugfs: fix open proxy for unsafe files
MAINTAINERS: align Danilo's maintainer entries
topology: Keep the cpumask unchanged when printing cpumap
debugfs: fix missing mutex_destroy() in short_fops case
fs: debugfs: differentiate short fops with proxy ops
Linus Torvalds [Sun, 12 Jan 2025 22:22:13 +0000 (14:22 -0800)]
Merge tag 'staging-6.13-rc7' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes that resolve some reported
issues and have been in my tree for too long due to the holiday break.
They resolve the following issues:
- lots of gpib build-time fixes as reported by testers and 0-day
- gpib logical fixes
- mailmap fix
All of these have been in linux-next for a while, with no reported
issues other than the duplicated change"
* tag 'staging-6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: gpib: mite: remove unused global functions
staging: gpib: refer to correct config symbol in tnt4882 Makefile
mailmap: update Bingwu Zhang's email address
staging: gpib: fix address space mixup
staging: gpib: use ioport_map
staging: gpib: fix pcmcia dependencies
staging: gpib: add module author and description fields
staging: gpib: fix Makefiles
staging: gpib: make global 'usec_diff' functions static
staging: gpib: Modify mismatched function name
staging: gpib: Add lower bound check for secondary address
staging: gpib: Fix erroneous removal of blank before newline
Linus Torvalds [Sun, 12 Jan 2025 21:27:15 +0000 (13:27 -0800)]
Merge tag 'tty-6.13-rc7' of git://git./linux/kernel/git/gregkh/tty
Pull serial driver fixes from Greg KH:
"Here are three small serial driver fixes tree. They resolve some
reported issues:
- stm32 break control fix
- 8250 runtime pm usage counter fix
- imx driver locking fix
All have been in my tree and linux-next for three weeks now, with no
reported issues"
* tag 'tty-6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: stm32: use port lock wrappers for break control
serial: imx: Use uart_port_lock_irq() instead of uart_port_lock()
tty: serial: 8250: Fix another runtime PM usage counter underflow
Linus Torvalds [Sun, 12 Jan 2025 21:09:00 +0000 (13:09 -0800)]
Merge tag 'usb-6.13-rc7' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB driver fixes and new device ids for 6.13-rc7.
Included in here are:
- usb serial new device ids
- typec bugfixes for reported issues
- dwc3 driver fixes
- chipidea driver fixes
- gadget driver fixes
- other minor fixes for reported problems.
All of these have been in linux-next for a while, with no reported
issues"
* tag 'usb-6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: serial: option: add Neoway N723-EA support
USB: serial: option: add MeiG Smart SRM815
USB: serial: cp210x: add Phoenix Contact UPS Device
usb: typec: fix pm usage counter imbalance in ucsi_ccg_sync_control()
usb-storage: Add max sectors quirk for Nokia 208
usb: gadget: midi2: Reverse-select at the right place
usb: gadget: f_fs: Remove WARN_ON in functionfs_bind
USB: core: Disable LPM only for non-suspended ports
usb: fix reference leak in usb_new_device()
usb: typec: tcpci: fix NULL pointer issue on shared irq case
usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null
usb: chipidea: ci_hdrc_imx: decrement device's refcount in .remove() and in the error path of .probe()
usb: typec: ucsi: Set orientation as none when connector is unplugged
usb: gadget: configfs: Ignore trailing LF for user strings to cdev
USB: usblp: return error when setting unsupported protocol
usb: gadget: f_uac2: Fix incorrect setting of bNumEndpoints
usb: typec: tcpm/tcpci_maxim: fix error code in max_contaminant_read_resistance_kohm()
usb: host: xhci-plat: set skip_phy_initialization if software node has XHCI_SKIP_PHY_INIT property
usb: dwc3-am62: Disable autosuspend during remove
usb: dwc3: gadget: fix writing NYET threshold
Linus Torvalds [Sun, 12 Jan 2025 20:04:53 +0000 (12:04 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"The largest part here is for KVM/PPC, where a NULL pointer dereference
was introduced in the 6.13 merge window and is now fixed.
There's some "holiday-induced lateness", as the s390 submaintainer put
it, but otherwise things looks fine.
s390:
- fix a latent bug when the kernel is compiled in debug mode
- two small UCONTROL fixes and their selftests
arm64:
- always check page state in hyp_ack_unshare()
- align set_id_regs selftest with the fact that ASIDBITS field is RO
- various vPMU fixes for bugs that only affect nested virt
PPC e500:
- Fix a mostly impossible (but just wrong) case where IRQs were never
re-enabled
- Observe host permissions instead of mapping readonly host pages as
guest-writable. This fixes a NULL-pointer dereference in 6.13
- Replace brittle VMA-based attempts at building huge shadow TLB
entries with PTE lookups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: e500: perform hugepage check after looking up the PFN
KVM: e500: map readonly host pages for read
KVM: e500: track host-writability of pages
KVM: e500: use shadow TLB entry as witness for writability
KVM: e500: always restore irqs
KVM: s390: selftests: Add has device attr check to uc_attr_mem_limit selftest
KVM: s390: selftests: Add ucontrol gis routing test
KVM: s390: Reject KVM_SET_GSI_ROUTING on ucontrol VMs
KVM: s390: selftests: Add ucontrol flic attr selftests
KVM: s390: Reject setting flic pfault attributes on ucontrol VMs
KVM: s390: vsie: fix virtual/physical address in unpin_scb()
KVM: arm64: Only apply PMCR_EL0.P to the guest range of counters
KVM: arm64: nv: Reload PMU events upon MDCR_EL2.HPME change
KVM: arm64: Use KVM_REQ_RELOAD_PMU to handle PMCR_EL0.E change
KVM: arm64: Add unified helper for reprogramming counters by mask
KVM: arm64: Always check the state from hyp_ack_unshare()
KVM: arm64: Fix set_id_regs selftest for ASIDBITS becoming unwritable
Linus Torvalds [Sun, 12 Jan 2025 19:57:45 +0000 (11:57 -0800)]
Merge tag 'perf_urgent_for_v6.13_rc7' of git://git./linux/kernel/git/tip/tip
Pull perf fix from Borislav Petkov:
- Fix a #GP in the perf user callchain code caused by a race between
uprobe freeing the task and the bpf profiler unwinding the task's
user stack
* tag 'perf_urgent_for_v6.13_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
uprobes: Fix race in uprobe_free_utask
Linus Torvalds [Sun, 12 Jan 2025 19:55:48 +0000 (11:55 -0800)]
Merge tag 'x86_urgent_for_v6.13_rc7' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Check whether shadow stack is active before using the ptrace regset
getter
- Remove a wrong BUG_ON in the early static call code which breaks Xen
PVH when booting as dom0
* tag 'x86_urgent_for_v6.13_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Ensure shadow stack is active before "getting" registers
x86/static-call: Remove early_boot_irqs_disabled check to fix Xen PVH dom0