linux-2.6-block.git
5 weeks agousbnet: ipheth: drop RX URBs with no payload
Foster Snowhill [Tue, 6 Aug 2024 17:28:07 +0000 (19:28 +0200)]
usbnet: ipheth: drop RX URBs with no payload

On iPhone 15 Pro Max one can observe periodic URBs with no payload
on the "bulk in" (RX) endpoint. These don't seem to do anything
meaningful. Reproduced on iOS 17.5.1 and 17.6.

This behaviour isn't observed on iPhone 11 on the same iOS version. The
nature of these zero-length URBs is so far unknown.

Drop RX URBs with no payload.

Signed-off-by: Foster Snowhill <forst@pen.gy>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agousbnet: ipheth: remove extraneous rx URB length check
Foster Snowhill [Tue, 6 Aug 2024 17:28:06 +0000 (19:28 +0200)]
usbnet: ipheth: remove extraneous rx URB length check

Rx URB length was already checked in ipheth_rcvbulk_callback_legacy()
and ipheth_rcvbulk_callback_ncm(), depending on the current mode.
The check in ipheth_rcvbulk_callback() was thus mostly a duplicate.

The only place in ipheth_rcvbulk_callback() where we care about the URB
length is for the initial control frame. These frames are always 4 bytes
long. This has been checked as far back as iOS 4.2.1 on iPhone 3G.

Remove the extraneous URB length check. For control frames, check for
the specific 4-byte length instead.

Signed-off-by: Foster Snowhill <forst@pen.gy>
Tested-by: Georgi Valkov <gvalkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agousbnet: ipheth: race between ipheth_close and error handling
Oliver Neukum [Tue, 6 Aug 2024 17:28:05 +0000 (19:28 +0200)]
usbnet: ipheth: race between ipheth_close and error handling

ipheth_sndbulk_callback() can submit carrier_work
as a part of its error handling. That means that
the driver must make sure that the work is cancelled
after it has made sure that no more URB can terminate
with an error condition.

Hence the order of actions in ipheth_close() needs
to be inverted.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Foster Snowhill <forst@pen.gy>
Tested-by: Georgi Valkov <gvalkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 weeks agoMerge tag 'net-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 8 Aug 2024 20:51:44 +0000 (13:51 -0700)]
Merge tag 'net-6.11-rc3' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth.

  Current release - regressions:

   - eth: bnxt_en: fix memory out-of-bounds in bnxt_fill_hw_rss_tbl() on
     older chips

  Current release - new code bugs:

   - ethtool: fix off-by-one error / kdoc contradicting the code for max
     RSS context IDs

   - Bluetooth: hci_qca:
      - QCA6390: fix support on non-DT platforms
      - QCA6390: don't call pwrseq_power_off() twice
      - fix a NULL-pointer derefence at shutdown

   - eth: ice: fix incorrect assigns of FEC counters

  Previous releases - regressions:

   - mptcp: fix handling endpoints with both 'signal' and 'subflow'
     flags set

   - virtio-net: fix changing ring count when vq IRQ coalescing not
     supported

   - eth: gve: fix use of netif_carrier_ok() during reconfig / reset

  Previous releases - always broken:

   - eth: idpf: fix bugs in queue re-allocation on reconfig / reset

   - ethtool: fix context creation with no parameters

  Misc:

   - linkwatch: use system_unbound_wq to ease RTNL contention"

* tag 'net-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (41 commits)
  net: dsa: microchip: disable EEE for KSZ8567/KSZ9567/KSZ9896/KSZ9897.
  ethtool: Fix context creation with no parameters
  net: ethtool: fix off-by-one error in max RSS context IDs
  net: pse-pd: tps23881: include missing bitfield.h header
  net: fec: Stop PPS on driver remove
  net: bcmgenet: Properly overlay PHY and MAC Wake-on-LAN capabilities
  l2tp: fix lockdep splat
  net: stmmac: dwmac4: fix PCS duplex mode decode
  idpf: fix UAFs when destroying the queues
  idpf: fix memleak in vport interrupt configuration
  idpf: fix memory leaks and crashes while performing a soft reset
  bnxt_en : Fix memory out-of-bounds in bnxt_fill_hw_rss_tbl()
  net: dsa: bcm_sf2: Fix a possible memory leak in bcm_sf2_mdio_register()
  net/smc: add the max value of fallback reason count
  Bluetooth: hci_sync: avoid dup filtering when passive scanning with adv monitor
  Bluetooth: l2cap: always unlock channel in l2cap_conless_channel()
  Bluetooth: hci_qca: fix a NULL-pointer derefence at shutdown
  Bluetooth: hci_qca: fix QCA6390 support on non-DT platforms
  Bluetooth: hci_qca: don't call pwrseq_power_off() twice for QCA6390
  ice: Fix incorrect assigns of FEC counts
  ...

6 weeks agoMerge tag 'trace-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Thu, 8 Aug 2024 20:32:59 +0000 (13:32 -0700)]
Merge tag 'trace-v6.11-rc2' of git://git./linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Have reading of event format files test if the metadata still exists.

   When a event is freed, a flag (EVENT_FILE_FL_FREED) in the metadata
   is set to state that it is to prevent any new references to it from
   happening while waiting for existing references to close. When the
   last reference closes, the metadata is freed. But the "format" was
   missing a check to this flag (along with some other files) that
   allowed new references to happen, and a use-after-free bug to occur.

 - Have the trace event meta data use the refcount infrastructure
   instead of relying on its own atomic counters.

 - Have tracefs inodes use alloc_inode_sb() for allocation instead of
   using kmem_cache_alloc() directly.

 - Have eventfs_create_dir() return an ERR_PTR instead of NULL as the
   callers expect a real object or an ERR_PTR.

 - Have release_ei() use call_srcu() and not call_rcu() as all the
   protection is on SRCU and not RCU.

 - Fix ftrace_graph_ret_addr() to use the task passed in and not
   current.

 - Fix overflow bug in get_free_elt() where the counter can overflow the
   integer and cause an infinite loop.

 - Remove unused function ring_buffer_nr_pages()

 - Have tracefs freeing use the inode RCU infrastructure instead of
   creating its own.

   When the kernel had randomize structure fields enabled, the rcu field
   of the tracefs_inode was overlapping the rcu field of the inode
   structure, and corrupting it. Instead, use the destroy_inode()
   callback to do the initial cleanup of the code, and then have
   free_inode() free it.

* tag 'trace-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracefs: Use generic inode RCU for synchronizing freeing
  ring-buffer: Remove unused function ring_buffer_nr_pages()
  tracing: Fix overflow in get_free_elt()
  function_graph: Fix the ret_stack used by ftrace_graph_ret_addr()
  eventfs: Use SRCU for freeing eventfs_inodes
  eventfs: Don't return NULL in eventfs_create_dir()
  tracefs: Fix inode allocation
  tracing: Use refcount for trace_event_file reference counter
  tracing: Have format file honor EVENT_FILE_FL_FREED

6 weeks agoMerge tag 'bcachefs-2024-08-08' of git://evilpiepirate.org/bcachefs
Linus Torvalds [Thu, 8 Aug 2024 20:27:31 +0000 (13:27 -0700)]
Merge tag 'bcachefs-2024-08-08' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "Assorted little stuff:

   - lockdep fixup for lockdep_set_notrack_class()

   - we can now remove a device when using erasure coding without
     deadlocking, though we still hit other issues

   - the 'allocator stuck' timeout is now configurable, and messages are
     ratelimited. The default timeout has been increased from 10 seconds
     to 30"

* tag 'bcachefs-2024-08-08' of git://evilpiepirate.org/bcachefs:
  bcachefs: Use bch2_wait_on_allocator() in btree node alloc path
  bcachefs: Make allocator stuck timeout configurable, ratelimit messages
  bcachefs: Add missing path_traverse() to btree_iter_next_node()
  bcachefs: ec should not allocate from ro devs
  bcachefs: Improved allocator debugging for ec
  bcachefs: Add missing bch2_trans_begin() call
  bcachefs: Add a comment for bucket helper types
  bcachefs: Don't rely on implicit unsigned -> signed integer conversion
  lockdep: Fix lockdep_set_notrack_class() for CONFIG_LOCK_STAT
  bcachefs: Fix double free of ca->buckets_nouse

6 weeks agomodule: warn about excessively long module waits
Linus Torvalds [Thu, 8 Aug 2024 19:29:40 +0000 (12:29 -0700)]
module: warn about excessively long module waits

Russell King reported that the arm cbc(aes) crypto module hangs when
loaded, and Herbert Xu bisected it to commit 9b9879fc0327 ("modules:
catch concurrent module loads, treat them as idempotent"), and noted:

 "So what's happening here is that the first modprobe tries to load a
  fallback CBC implementation, in doing so it triggers a load of the
  exact same module due to module aliases.

  IOW we're loading aes-arm-bs which provides cbc(aes). However, this
  needs a fallback of cbc(aes) to operate, which is made out of the
  generic cbc module + any implementation of aes, or ecb(aes). The
  latter happens to also be provided by aes-arm-cb so that's why it
  tries to load the same module again"

So loading the aes-arm-bs module ends up wanting to recursively load
itself, and the recursive load then ends up waiting for the original
module load to complete.

This is a regression, in that it used to be that we just tried to load
the module multiple times, and then as we went on to install it the
second time we would instead just error out because the module name
already existed.

That is actually also exactly what the original "catch concurrent loads"
patch did in commit 9828ed3f695a ("module: error out early on concurrent
load of the same module file"), but it turns out that it ends up being
racy, in that erroring out before the module has been fully initialized
will cause failures in dependent module loading.

See commit ac2263b588df (which was the revert of that "error out early")
commit for details about why erroring out before the module has been
initialized is actually fundamentally racy.

Now, for the actual recursive module load (as opposed to just
concurrently loading the same module twice), the race is not an issue.

At the same time it's hard for the kernel to see that this is recursion,
because the module load is always done from a usermode helper, so the
recursion is not some simple callchain within the kernel.

End result: this is not the real fix, but this at least adds a warning
for the situation (admittedly much too late for all the debugging pain
that Russell and Herbert went through) and if we can come to a
resolution on how to detect the recursion properly, this re-organizes
the code to make that easier.

Link: https://lore.kernel.org/all/ZrFHLqvFqhzykuYw@shell.armlinux.org.uk/
Reported-by: Russell King <linux@armlinux.org.uk>
Debugged-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 weeks agoMerge tag 'loongarch-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 8 Aug 2024 18:22:04 +0000 (11:22 -0700)]
Merge tag 'loongarch-fixes-6.11-1' of git://git./linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
 "Enable general EFI poweroff method to make poweroff usable on
  hardwares which lack ACPI S5, use accessors to page table entries
  instead of direct dereference to avoid potential problems, and two
  trivial kvm cleanups"

* tag 'loongarch-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Remove undefined a6 argument comment for kvm_hypercall()
  LoongArch: KVM: Remove unnecessary definition of KVM_PRIVATE_MEM_SLOTS
  LoongArch: Use accessors to page table entries instead of direct dereference
  LoongArch: Enable general EFI poweroff method

6 weeks agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Thu, 8 Aug 2024 16:02:25 +0000 (09:02 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-08-07 (ice)

This series contains updates to ice driver only.

Grzegorz adds IRQ synchronization call before performing reset and
prevents writing to hardware when it is resetting.

Mateusz swaps incorrect assignment of FEC statistics.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: Fix incorrect assigns of FEC counts
  ice: Skip PTP HW writes during PTP reset procedure
  ice: Fix reset handler
====================

Link: https://patch.msgid.link/20240807224521.3819189-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: dsa: microchip: disable EEE for KSZ8567/KSZ9567/KSZ9896/KSZ9897.
Martin Whitaker [Wed, 7 Aug 2024 20:52:09 +0000 (21:52 +0100)]
net: dsa: microchip: disable EEE for KSZ8567/KSZ9567/KSZ9896/KSZ9897.

As noted in the device errata [1-8], EEE support is not fully operational
in the KSZ8567, KSZ9477, KSZ9567, KSZ9896, and KSZ9897 devices, causing
link drops when connected to another device that supports EEE. The patch
series "net: add EEE support for KSZ9477 switch family" merged in commit
9b0bf4f77162 caused EEE support to be enabled in these devices. A fix for
this regression for the KSZ9477 alone was merged in commit 08c6d8bae48c2.
This patch extends this fix to the other affected devices.

[1] https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/Errata/KSZ8567R-Errata-DS80000752.pdf
[2] https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/Errata/KSZ8567S-Errata-DS80000753.pdf
[3] https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/Errata/KSZ9477S-Errata-DS80000754.pdf
[4] https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/Errata/KSZ9567R-Errata-DS80000755.pdf
[5] https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/Errata/KSZ9567S-Errata-DS80000756.pdf
[6] https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/Errata/KSZ9896C-Errata-DS80000757.pdf
[7] https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/Errata/KSZ9897R-Errata-DS80000758.pdf
[8] https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/Errata/KSZ9897S-Errata-DS80000759.pdf

Fixes: 69d3b36ca045 ("net: dsa: microchip: enable EEE support") # for KSZ8567/KSZ9567/KSZ9896/KSZ9897
Link: https://lore.kernel.org/netdev/137ce1ee-0b68-4c96-a717-c8164b514eec@martin-whitaker.me.uk/
Signed-off-by: Martin Whitaker <foss@martin-whitaker.me.uk>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Link: https://patch.msgid.link/20240807205209.21464-1-foss@martin-whitaker.me.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoethtool: Fix context creation with no parameters
Gal Pressman [Wed, 7 Aug 2024 17:33:52 +0000 (20:33 +0300)]
ethtool: Fix context creation with no parameters

The 'at least one change' requirement is not applicable for context
creation, skip the check in such case.
This allows a command such as 'ethtool -X eth0 context new' to work.

The command works by mistake when using older versions of userspace
ethtool due to an incompatibility issue where rxfh.input_xfrm is passed
as zero (unset) instead of RXH_XFRM_NO_CHANGE as done with recent
userspace. This patch does not try to solve the incompatibility issue.

Link: https://lore.kernel.org/netdev/05ae8316-d3aa-4356-98c6-55ed4253c8a7@nvidia.com/
Fixes: 84a1d9c48200 ("net: ethtool: extend RXNFC API to support RSS spreading of filter matches")
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20240807173352.3501746-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: ethtool: fix off-by-one error in max RSS context IDs
Edward Cree [Wed, 7 Aug 2024 16:06:12 +0000 (17:06 +0100)]
net: ethtool: fix off-by-one error in max RSS context IDs

Both ethtool_ops.rxfh_max_context_id and the default value used when
 it's not specified are supposed to be exclusive maxima (the former
 is documented as such; the latter, U32_MAX, cannot be used as an ID
 since it equals ETH_RXFH_CONTEXT_ALLOC), but xa_alloc() expects an
 inclusive maximum.
Subtract one from 'limit' to produce an inclusive maximum, and pass
 that to xa_alloc().
Increase bnxt's max by one to prevent a (very minor) regression, as
 BNXT_MAX_ETH_RSS_CTX is an inclusive max.  This is safe since bnxt
 is not actually hard-limited; BNXT_MAX_ETH_RSS_CTX is just a
 leftover from old driver code that managed context IDs itself.
Rename rxfh_max_context_id to rxfh_max_num_contexts to make its
 semantics (hopefully) more obvious.

Fixes: 847a8ab18676 ("net: ethtool: let the core choose RSS context IDs")
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/5a2d11a599aa5b0cc6141072c01accfb7758650c.1723045898.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: pse-pd: tps23881: include missing bitfield.h header
Arnd Bergmann [Wed, 7 Aug 2024 07:54:22 +0000 (09:54 +0200)]
net: pse-pd: tps23881: include missing bitfield.h header

Using FIELD_GET() fails in configurations that don't already include
the header file indirectly:

drivers/net/pse-pd/tps23881.c: In function 'tps23881_i2c_probe':
drivers/net/pse-pd/tps23881.c:755:13: error: implicit declaration of function 'FIELD_GET' [-Wimplicit-function-declaration]
  755 |         if (FIELD_GET(TPS23881_REG_DEVID_MASK, ret) != TPS23881_DEVICE_ID) {
      |             ^~~~~~~~~

Fixes: 89108cb5c285 ("net: pse-pd: tps23881: Fix the device ID check")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240807075455.2055224-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: fec: Stop PPS on driver remove
Csókás, Bence [Wed, 7 Aug 2024 08:09:56 +0000 (10:09 +0200)]
net: fec: Stop PPS on driver remove

PPS was not stopped in `fec_ptp_stop()`, called when
the adapter was removed. Consequentially, you couldn't
safely reload the driver with the PPS signal on.

Fixes: 32cba57ba74b ("net: fec: introduce fec_ptp_stop and use in probe fail path")
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Link: https://lore.kernel.org/netdev/CAOMZO5BzcZR8PwKKwBssQq_wAGzVgf1ffwe_nhpQJjviTdxy-w@mail.gmail.com/T/#m01dcb810bfc451a492140f6797ca77443d0cb79f
Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20240807080956.2556602-1-csokas.bence@prolan.hu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: bcmgenet: Properly overlay PHY and MAC Wake-on-LAN capabilities
Florian Fainelli [Tue, 6 Aug 2024 17:56:59 +0000 (10:56 -0700)]
net: bcmgenet: Properly overlay PHY and MAC Wake-on-LAN capabilities

Some Wake-on-LAN modes such as WAKE_FILTER may only be supported by the MAC,
while others might be only supported by the PHY. Make sure that the .get_wol()
returns the union of both rather than only that of the PHY if the PHY supports
Wake-on-LAN.

Fixes: 7e400ff35cbe ("net: bcmgenet: Add support for PHY-based Wake-on-LAN")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20240806175659.3232204-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agol2tp: fix lockdep splat
James Chapman [Tue, 6 Aug 2024 16:06:26 +0000 (17:06 +0100)]
l2tp: fix lockdep splat

When l2tp tunnels use a socket provided by userspace, we can hit
lockdep splats like the below when data is transmitted through another
(unrelated) userspace socket which then gets routed over l2tp.

This issue was previously discussed here:
https://lore.kernel.org/netdev/87sfialu2n.fsf@cloudflare.com/

The solution is to have lockdep treat socket locks of l2tp tunnel
sockets separately than those of standard INET sockets. To do so, use
a different lockdep subclass where lock nesting is possible.

  ============================================
  WARNING: possible recursive locking detected
  6.10.0+ #34 Not tainted
  --------------------------------------------
  iperf3/771 is trying to acquire lock:
  ffff8881027601d8 (slock-AF_INET/1){+.-.}-{2:2}, at: l2tp_xmit_skb+0x243/0x9d0

  but task is already holding lock:
  ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(slock-AF_INET/1);
    lock(slock-AF_INET/1);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  10 locks held by iperf3/771:
   #0: ffff888102650258 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x1a/0x40
   #1: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0
   #2: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130
   #3: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x28b/0x9f0
   #4: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0xf9/0x260
   #5: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10
   #6: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0
   #7: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130
   #8: ffffffff822ac1e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0xcc/0x1450
   #9: ffff888101f33258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock#2){+...}-{2:2}, at: __dev_queue_xmit+0x513/0x1450

  stack backtrace:
  CPU: 2 UID: 0 PID: 771 Comm: iperf3 Not tainted 6.10.0+ #34
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  Call Trace:
   <IRQ>
   dump_stack_lvl+0x69/0xa0
   dump_stack+0xc/0x20
   __lock_acquire+0x135d/0x2600
   ? srso_alias_return_thunk+0x5/0xfbef5
   lock_acquire+0xc4/0x2a0
   ? l2tp_xmit_skb+0x243/0x9d0
   ? __skb_checksum+0xa3/0x540
   _raw_spin_lock_nested+0x35/0x50
   ? l2tp_xmit_skb+0x243/0x9d0
   l2tp_xmit_skb+0x243/0x9d0
   l2tp_eth_dev_xmit+0x3c/0xc0
   dev_hard_start_xmit+0x11e/0x420
   sch_direct_xmit+0xc3/0x640
   __dev_queue_xmit+0x61c/0x1450
   ? ip_finish_output2+0xf4c/0x1130
   ip_finish_output2+0x6b6/0x1130
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __ip_finish_output+0x217/0x380
   ? srso_alias_return_thunk+0x5/0xfbef5
   __ip_finish_output+0x217/0x380
   ip_output+0x99/0x120
   __ip_queue_xmit+0xae4/0xbc0
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? tcp_options_write.constprop.0+0xcb/0x3e0
   ip_queue_xmit+0x34/0x40
   __tcp_transmit_skb+0x1625/0x1890
   __tcp_send_ack+0x1b8/0x340
   tcp_send_ack+0x23/0x30
   __tcp_ack_snd_check+0xa8/0x530
   ? srso_alias_return_thunk+0x5/0xfbef5
   tcp_rcv_established+0x412/0xd70
   tcp_v4_do_rcv+0x299/0x420
   tcp_v4_rcv+0x1991/0x1e10
   ip_protocol_deliver_rcu+0x50/0x220
   ip_local_deliver_finish+0x158/0x260
   ip_local_deliver+0xc8/0xe0
   ip_rcv+0xe5/0x1d0
   ? __pfx_ip_rcv+0x10/0x10
   __netif_receive_skb_one_core+0xce/0xe0
   ? process_backlog+0x28b/0x9f0
   __netif_receive_skb+0x34/0xd0
   ? process_backlog+0x28b/0x9f0
   process_backlog+0x2cb/0x9f0
   __napi_poll.constprop.0+0x61/0x280
   net_rx_action+0x332/0x670
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? find_held_lock+0x2b/0x80
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   handle_softirqs+0xda/0x480
   ? __dev_queue_xmit+0xa2c/0x1450
   do_softirq+0xa1/0xd0
   </IRQ>
   <TASK>
   __local_bh_enable_ip+0xc8/0xe0
   ? __dev_queue_xmit+0xa2c/0x1450
   __dev_queue_xmit+0xa48/0x1450
   ? ip_finish_output2+0xf4c/0x1130
   ip_finish_output2+0x6b6/0x1130
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __ip_finish_output+0x217/0x380
   ? srso_alias_return_thunk+0x5/0xfbef5
   __ip_finish_output+0x217/0x380
   ip_output+0x99/0x120
   __ip_queue_xmit+0xae4/0xbc0
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? tcp_options_write.constprop.0+0xcb/0x3e0
   ip_queue_xmit+0x34/0x40
   __tcp_transmit_skb+0x1625/0x1890
   tcp_write_xmit+0x766/0x2fb0
   ? __entry_text_end+0x102ba9/0x102bad
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __might_fault+0x74/0xc0
   ? srso_alias_return_thunk+0x5/0xfbef5
   __tcp_push_pending_frames+0x56/0x190
   tcp_push+0x117/0x310
   tcp_sendmsg_locked+0x14c1/0x1740
   tcp_sendmsg+0x28/0x40
   inet_sendmsg+0x5d/0x90
   sock_write_iter+0x242/0x2b0
   vfs_write+0x68d/0x800
   ? __pfx_sock_write_iter+0x10/0x10
   ksys_write+0xc8/0xf0
   __x64_sys_write+0x3d/0x50
   x64_sys_call+0xfaf/0x1f50
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f4d143af992
  Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 01 cc ff ff 41 54 b8 02 00 00 0
  RSP: 002b:00007ffd65032058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f4d143af992
  RDX: 0000000000000025 RSI: 00007f4d143f3bcc RDI: 0000000000000005
  RBP: 00007f4d143f2b28 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4d143f3bcc
  R13: 0000000000000005 R14: 0000000000000000 R15: 00007ffd650323f0
   </TASK>

Fixes: 0b2c59720e65 ("l2tp: close all race conditions in l2tp_tunnel_register()")
Suggested-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+6acef9e0a4d1f46c83d4@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6acef9e0a4d1f46c83d4
CC: gnault@redhat.com
CC: cong.wang@bytedance.com
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Link: https://patch.msgid.link/20240806160626.1248317-1-jchapman@katalix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: stmmac: dwmac4: fix PCS duplex mode decode
Russell King (Oracle) [Tue, 6 Aug 2024 13:08:41 +0000 (14:08 +0100)]
net: stmmac: dwmac4: fix PCS duplex mode decode

dwmac4 was decoding the duplex mode from the GMAC_PHYIF_CONTROL_STATUS
register incorrectly, using GMAC_PHYIF_CTRLSTATUS_LNKMOD_MASK (value 1)
rather than GMAC_PHYIF_CTRLSTATUS_LNKMOD (bit 16). Fix this.

Fixes: 70523e639bf8c ("drivers: net: stmmac: reworking the PCS code.")
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1sbJvd-001rGD-E3@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge tag 'mm-hotfixes-stable-2024-08-07-18-32' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Thu, 8 Aug 2024 14:32:20 +0000 (07:32 -0700)]
Merge tag 'mm-hotfixes-stable-2024-08-07-18-32' of git://git./linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "Nine hotfixes. Five are cc:stable, the others either pertain to
  post-6.10 material or aren't considered necessary for earlier kernels.

  Five are MM and four are non-MM. No identifiable theme here - please
  see the individual changelogs"

* tag 'mm-hotfixes-stable-2024-08-07-18-32' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  padata: Fix possible divide-by-0 panic in padata_mt_helper()
  mailmap: update entry for David Heidelberg
  memcg: protect concurrent access to mem_cgroup_idr
  mm: shmem: fix incorrect aligned index when checking conflicts
  mm: shmem: avoid allocating huge pages larger than MAX_PAGECACHE_ORDER for shmem
  mm: list_lru: fix UAF for memory cgroup
  kcov: properly check for softirq context
  MAINTAINERS: Update LTP members and web
  selftests: mm: add s390 to ARCH check

6 weeks agoMerge tag 'for-net-2024-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/bluet...
Jakub Kicinski [Thu, 8 Aug 2024 03:31:42 +0000 (20:31 -0700)]
Merge tag 'for-net-2024-08-07' of git://git./linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - hci_sync: avoid dup filtering when passive scanning with adv monitor
 - hci_qca: don't call pwrseq_power_off() twice for QCA6390
 - hci_qca: fix QCA6390 support on non-DT platforms
 - hci_qca: fix a NULL-pointer derefence at shutdown
 - l2cap: always unlock channel in l2cap_conless_channel()

* tag 'for-net-2024-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: hci_sync: avoid dup filtering when passive scanning with adv monitor
  Bluetooth: l2cap: always unlock channel in l2cap_conless_channel()
  Bluetooth: hci_qca: fix a NULL-pointer derefence at shutdown
  Bluetooth: hci_qca: fix QCA6390 support on non-DT platforms
  Bluetooth: hci_qca: don't call pwrseq_power_off() twice for QCA6390
====================

Link: https://patch.msgid.link/20240807210103.142483-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'idpf-fix-3-bugs-revealed-by-the-chapter-i'
Jakub Kicinski [Thu, 8 Aug 2024 03:26:57 +0000 (20:26 -0700)]
Merge branch 'idpf-fix-3-bugs-revealed-by-the-chapter-i'

Tony Nguyen says:

====================
idpf: fix 3 bugs revealed by the Chapter I

Alexander Lobakin says:

The libeth conversion revealed 2 serious issues which lead to sporadic
crashes or WARNs under certain configurations. Additional one was found
while debugging these two with kmemleak.
This one is targeted stable, the rest can be backported manually later
if needed. They can be reproduced only after the conversion is applied
anyway.
====================

Link: https://patch.msgid.link/20240806220923.3359860-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoidpf: fix UAFs when destroying the queues
Alexander Lobakin [Tue, 6 Aug 2024 22:09:22 +0000 (15:09 -0700)]
idpf: fix UAFs when destroying the queues

The second tagged commit started sometimes (very rarely, but possible)
throwing WARNs from
net/core/page_pool.c:page_pool_disable_direct_recycling().
Turned out idpf frees interrupt vectors with embedded NAPIs *before*
freeing the queues making page_pools' NAPI pointers lead to freed
memory before these pools are destroyed by libeth.
It's not clear whether there are other accesses to the freed vectors
when destroying the queues, but anyway, we usually free queue/interrupt
vectors only when the queues are destroyed and the NAPIs are guaranteed
to not be referenced anywhere.

Invert the allocation and freeing logic making queue/interrupt vectors
be allocated first and freed last. Vectors don't require queues to be
present, so this is safe. Additionally, this change allows to remove
that useless queue->q_vector pointer cleanup, as vectors are still
valid when freeing the queues (+ both are freed within one function,
so it's not clear why nullify the pointers at all).

Fixes: 1c325aac10a8 ("idpf: configure resources for TX queues")
Fixes: 90912f9f4f2d ("idpf: convert header split mode to libeth + napi_build_skb()")
Reported-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240806220923.3359860-4-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoidpf: fix memleak in vport interrupt configuration
Michal Kubiak [Tue, 6 Aug 2024 22:09:21 +0000 (15:09 -0700)]
idpf: fix memleak in vport interrupt configuration

The initialization of vport interrupt consists of two functions:
 1) idpf_vport_intr_init() where a generic configuration is done
 2) idpf_vport_intr_req_irq() where the irq for each q_vector is
   requested.

The first function used to create a base name for each interrupt using
"kasprintf()" call. Unfortunately, although that call allocated memory
for a text buffer, that memory was never released.

Fix this by removing creating the interrupt base name in 1).
Instead, always create a full interrupt name in the function 2), because
there is no need to create a base name separately, considering that the
function 2) is never called out of idpf_vport_intr_init() context.

Fixes: d4d558718266 ("idpf: initialize interrupts and enable vport")
Cc: stable@vger.kernel.org # 6.7
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240806220923.3359860-3-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoidpf: fix memory leaks and crashes while performing a soft reset
Alexander Lobakin [Tue, 6 Aug 2024 22:09:20 +0000 (15:09 -0700)]
idpf: fix memory leaks and crashes while performing a soft reset

The second tagged commit introduced a UAF, as it removed restoring
q_vector->vport pointers after reinitializating the structures.
This is due to that all queue allocation functions are performed here
with the new temporary vport structure and those functions rewrite
the backpointers to the vport. Then, this new struct is freed and
the pointers start leading to nowhere.

But generally speaking, the current logic is very fragile. It claims
to be more reliable when the system is low on memory, but in fact, it
consumes two times more memory as at the moment of running this
function, there are two vports allocated with their queues and vectors.
Moreover, it claims to prevent the driver from running into "bad state",
but in fact, any error during the rebuild leaves the old vport in the
partially allocated state.
Finally, if the interface is down when the function is called, it always
allocates a new queue set, but when the user decides to enable the
interface later on, vport_open() allocates them once again, IOW there's
a clear memory leak here.

Just don't allocate a new queue set when performing a reset, that solves
crashes and memory leaks. Readd the old queue number and reopen the
interface on rollback - that solves limbo states when the device is left
disabled and/or without HW queues enabled.

Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks")
Fixes: e4891e4687c8 ("idpf: split &idpf_queue into 4 strictly-typed queue structures")
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240806220923.3359860-2-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agobnxt_en : Fix memory out-of-bounds in bnxt_fill_hw_rss_tbl()
Michael Chan [Tue, 6 Aug 2024 05:37:42 +0000 (22:37 -0700)]
bnxt_en : Fix memory out-of-bounds in bnxt_fill_hw_rss_tbl()

A recent commit has modified the code in __bnxt_reserve_rings() to
set the default RSS indirection table to default only when the number
of RX rings is changing.  While this works for newer firmware that
requires RX ring reservations, it causes the regression on older
firmware not requiring RX ring resrvations (BNXT_NEW_RM() returns
false).

With older firmware, RX ring reservations are not required and so
hw_resc->resv_rx_rings is not always set to the proper value.  The
comparison:

if (old_rx_rings != bp->hw_resc.resv_rx_rings)

in __bnxt_reserve_rings() may be false even when the RX rings are
changing.  This will cause __bnxt_reserve_rings() to skip setting
the default RSS indirection table to default to match the current
number of RX rings.  This may later cause bnxt_fill_hw_rss_tbl() to
use an out-of-range index.

We already have bnxt_check_rss_tbl_no_rmgr() to handle exactly this
scenario.  We just need to move it up in bnxt_need_reserve_rings()
to be called unconditionally when using older firmware.  Without the
fix, if the TX rings are changing, we'll skip the
bnxt_check_rss_tbl_no_rmgr() call and __bnxt_reserve_rings() may also
skip the bnxt_set_dflt_rss_indir_tbl() call for the reason explained
in the last paragraph.  Without setting the default RSS indirection
table to default, it causes the regression:

BUG: KASAN: slab-out-of-bounds in __bnxt_hwrm_vnic_set_rss+0xb79/0xe40
Read of size 2 at addr ffff8881c5809618 by task ethtool/31525
Call Trace:
__bnxt_hwrm_vnic_set_rss+0xb79/0xe40
 bnxt_hwrm_vnic_rss_cfg_p5+0xf7/0x460
 __bnxt_setup_vnic_p5+0x12e/0x270
 __bnxt_open_nic+0x2262/0x2f30
 bnxt_open_nic+0x5d/0xf0
 ethnl_set_channels+0x5d4/0xb30
 ethnl_default_set_doit+0x2f1/0x620

Reported-by: Breno Leitao <leitao@debian.org>
Closes: https://lore.kernel.org/netdev/ZrC6jpghA3PWVWSB@gmail.com/
Fixes: 98ba1d931f61 ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Tested-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20240806053742.140304-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: dsa: bcm_sf2: Fix a possible memory leak in bcm_sf2_mdio_register()
Joe Hattori [Tue, 6 Aug 2024 01:13:27 +0000 (10:13 +0900)]
net: dsa: bcm_sf2: Fix a possible memory leak in bcm_sf2_mdio_register()

bcm_sf2_mdio_register() calls of_phy_find_device() and then
phy_device_remove() in a loop to remove existing PHY devices.
of_phy_find_device() eventually calls bus_find_device(), which calls
get_device() on the returned struct device * to increment the refcount.
The current implementation does not decrement the refcount, which causes
memory leak.

This commit adds the missing phy_device_free() call to decrement the
refcount via put_device() to balance the refcount.

Fixes: 771089c2a485 ("net: dsa: bcm_sf2: Ensure that MDIO diversion is used")
Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20240806011327.3817861-1-joe@pf.is.s.u-tokyo.ac.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet/smc: add the max value of fallback reason count
Zhengchao Shao [Mon, 5 Aug 2024 04:38:56 +0000 (12:38 +0800)]
net/smc: add the max value of fallback reason count

The number of fallback reasons defined in the smc_clc.h file has reached
36. For historical reasons, some are no longer quoted, and there's 33
actually in use. So, add the max value of fallback reason count to 36.

Fixes: 6ac1e6563f59 ("net/smc: support smc v2.x features validate")
Fixes: 7f0620b9940b ("net/smc: support max connections per lgr negotiation")
Fixes: 69b888e3bb4b ("net/smc: support max links per lgr negotiation in clc handshake")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
Link: https://patch.msgid.link/20240805043856.565677-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agopadata: Fix possible divide-by-0 panic in padata_mt_helper()
Waiman Long [Tue, 6 Aug 2024 17:46:47 +0000 (13:46 -0400)]
padata: Fix possible divide-by-0 panic in padata_mt_helper()

We are hit with a not easily reproducible divide-by-0 panic in padata.c at
bootup time.

  [   10.017908] Oops: divide error: 0000 1 PREEMPT SMP NOPTI
  [   10.017908] CPU: 26 PID: 2627 Comm: kworker/u1666:1 Not tainted 6.10.0-15.el10.x86_64 #1
  [   10.017908] Hardware name: Lenovo ThinkSystem SR950 [7X12CTO1WW]/[7X12CTO1WW], BIOS [PSE140J-2.30] 07/20/2021
  [   10.017908] Workqueue: events_unbound padata_mt_helper
  [   10.017908] RIP: 0010:padata_mt_helper+0x39/0xb0
    :
  [   10.017963] Call Trace:
  [   10.017968]  <TASK>
  [   10.018004]  ? padata_mt_helper+0x39/0xb0
  [   10.018084]  process_one_work+0x174/0x330
  [   10.018093]  worker_thread+0x266/0x3a0
  [   10.018111]  kthread+0xcf/0x100
  [   10.018124]  ret_from_fork+0x31/0x50
  [   10.018138]  ret_from_fork_asm+0x1a/0x30
  [   10.018147]  </TASK>

Looking at the padata_mt_helper() function, the only way a divide-by-0
panic can happen is when ps->chunk_size is 0.  The way that chunk_size is
initialized in padata_do_multithreaded(), chunk_size can be 0 when the
min_chunk in the passed-in padata_mt_job structure is 0.

Fix this divide-by-0 panic by making sure that chunk_size will be at least
1 no matter what the input parameters are.

Link: https://lkml.kernel.org/r/20240806174647.1050398-1-longman@redhat.com
Fixes: 004ed42638f4 ("padata: add basic support for multithreaded jobs")
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Waiman Long <longman@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
6 weeks agomailmap: update entry for David Heidelberg
David Heidelberg [Sun, 4 Aug 2024 05:45:54 +0000 (14:45 +0900)]
mailmap: update entry for David Heidelberg

Link my old gmail address to my active email.

Link: https://lkml.kernel.org/r/20240804054704.859503-1-david@ixit.cz
Signed-off-by: David Heidelberg <david@ixit.cz>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
6 weeks agomemcg: protect concurrent access to mem_cgroup_idr
Shakeel Butt [Fri, 2 Aug 2024 23:58:22 +0000 (16:58 -0700)]
memcg: protect concurrent access to mem_cgroup_idr

Commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after
many small jobs") decoupled the memcg IDs from the CSS ID space to fix the
cgroup creation failures.  It introduced IDR to maintain the memcg ID
space.  The IDR depends on external synchronization mechanisms for
modifications.  For the mem_cgroup_idr, the idr_alloc() and idr_replace()
happen within css callback and thus are protected through cgroup_mutex
from concurrent modifications.  However idr_remove() for mem_cgroup_idr
was not protected against concurrency and can be run concurrently for
different memcgs when they hit their refcnt to zero.  Fix that.

We have been seeing list_lru based kernel crashes at a low frequency in
our fleet for a long time.  These crashes were in different part of
list_lru code including list_lru_add(), list_lru_del() and reparenting
code.  Upon further inspection, it looked like for a given object (dentry
and inode), the super_block's list_lru didn't have list_lru_one for the
memcg of that object.  The initial suspicions were either the object is
not allocated through kmem_cache_alloc_lru() or somehow
memcg_list_lru_alloc() failed to allocate list_lru_one() for a memcg but
returned success.  No evidence were found for these cases.

Looking more deeply, we started seeing situations where valid memcg's id
is not present in mem_cgroup_idr and in some cases multiple valid memcgs
have same id and mem_cgroup_idr is pointing to one of them.  So, the most
reasonable explanation is that these situations can happen due to race
between multiple idr_remove() calls or race between
idr_alloc()/idr_replace() and idr_remove().  These races are causing
multiple memcgs to acquire the same ID and then offlining of one of them
would cleanup list_lrus on the system for all of them.  Later access from
other memcgs to the list_lru cause crashes due to missing list_lru_one.

Link: https://lkml.kernel.org/r/20240802235822.1830976-1-shakeel.butt@linux.dev
Fixes: 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
6 weeks agomm: shmem: fix incorrect aligned index when checking conflicts
Baolin Wang [Wed, 31 Jul 2024 05:46:20 +0000 (13:46 +0800)]
mm: shmem: fix incorrect aligned index when checking conflicts

In the shmem_suitable_orders() function, xa_find() is used to check for
conflicts in the pagecache to select suitable huge orders.  However, when
checking each huge order in every loop, the aligned index is calculated
from the previous iteration, which may cause suitable huge orders to be
missed.

We should use the original index each time in the loop to calculate a new
aligned index for checking conflicts to avoid this issue.

Link: https://lkml.kernel.org/r/07433b0f16a152bffb8cee34934a5c040e8e2ad6.1722404078.git.baolin.wang@linux.alibaba.com
Fixes: e7a2ab7b3bb5 ("mm: shmem: add mTHP support for anonymous shmem")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
6 weeks agomm: shmem: avoid allocating huge pages larger than MAX_PAGECACHE_ORDER for shmem
Baolin Wang [Wed, 31 Jul 2024 05:46:19 +0000 (13:46 +0800)]
mm: shmem: avoid allocating huge pages larger than MAX_PAGECACHE_ORDER for shmem

Similar to commit d659b715e94ac ("mm/huge_memory: avoid PMD-size page
cache if needed"), ARM64 can support 512MB PMD-sized THP when the base
page size is 64KB, which is larger than the maximum supported page cache
size MAX_PAGECACHE_ORDER.

This is not expected.  To fix this issue, use THP_ORDERS_ALL_FILE_DEFAULT
for shmem to filter allowable huge orders.

[baolin.wang@linux.alibaba.com: remove comment, per Barry]
Link: https://lkml.kernel.org/r/c55d7ef7-78aa-4ed6-b897-c3e03a3f3ab7@linux.alibaba.com
[wangkefeng.wang@huawei.com: remove local `orders']
Link: https://lkml.kernel.org/r/87769ae8-b6c6-4454-925d-1864364af9c8@huawei.com
Link: https://lkml.kernel.org/r/117121665254442c3c7f585248296495e5e2b45c.1722404078.git.baolin.wang@linux.alibaba.com
Fixes: e7a2ab7b3bb5 ("mm: shmem: add mTHP support for anonymous shmem")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Cc: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
6 weeks agomm: list_lru: fix UAF for memory cgroup
Muchun Song [Thu, 18 Jul 2024 08:36:07 +0000 (16:36 +0800)]
mm: list_lru: fix UAF for memory cgroup

The mem_cgroup_from_slab_obj() is supposed to be called under rcu lock or
cgroup_mutex or others which could prevent returned memcg from being
freed.  Fix it by adding missing rcu read lock.

Found by code inspection.

[songmuchun@bytedance.com: only grab rcu lock when necessary, per Vlastimil]
Link: https://lkml.kernel.org/r/20240801024603.1865-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20240718083607.42068-1-songmuchun@bytedance.com
Fixes: 0a97c01cd20b ("list_lru: allow explicit memcg and NUMA node selection")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
6 weeks agokcov: properly check for softirq context
Andrey Konovalov [Mon, 29 Jul 2024 02:21:58 +0000 (04:21 +0200)]
kcov: properly check for softirq context

When collecting coverage from softirqs, KCOV uses in_serving_softirq() to
check whether the code is running in the softirq context.  Unfortunately,
in_serving_softirq() is > 0 even when the code is running in the hardirq
or NMI context for hardirqs and NMIs that happened during a softirq.

As a result, if a softirq handler contains a remote coverage collection
section and a hardirq with another remote coverage collection section
happens during handling the softirq, KCOV incorrectly detects a nested
softirq coverate collection section and prints a WARNING, as reported by
syzbot.

This issue was exposed by commit a7f3813e589f ("usb: gadget: dummy_hcd:
Switch to hrtimer transfer scheduler"), which switched dummy_hcd to using
hrtimer and made the timer's callback be executed in the hardirq context.

Change the related checks in KCOV to account for this behavior of
in_serving_softirq() and make KCOV ignore remote coverage collection
sections in the hardirq and NMI contexts.

This prevents the WARNING printed by syzbot but does not fix the inability
of KCOV to collect coverage from the __usb_hcd_giveback_urb when dummy_hcd
is in use (caused by a7f3813e589f); a separate patch is required for that.

Link: https://lkml.kernel.org/r/20240729022158.92059-1-andrey.konovalov@linux.dev
Fixes: 5ff3b30ab57d ("kcov: collect coverage from interrupts")
Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
Reported-by: syzbot+2388cdaeb6b10f0c13ac@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2388cdaeb6b10f0c13ac
Acked-by: Marco Elver <elver@google.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Aleksandr Nogikh <nogikh@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Marcello Sylvester Bauer <sylv@sylv.io>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
6 weeks agoMAINTAINERS: Update LTP members and web
Petr Vorel [Fri, 26 Jul 2024 07:20:09 +0000 (09:20 +0200)]
MAINTAINERS: Update LTP members and web

LTP project uses now readthedocs.org instance instead of GitHub wiki.

LTP maintainers are listed in alphabetical order.

Link: https://lkml.kernel.org/r/20240726072009.1021599-1-pvorel@suse.cz
Signed-off-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: Li Wang <liwang@redhat.com>
Reviewed-by: Cyril Hrubis <chrubis@suse.cz>
Cc: Jan Stancek <jstancek@redhat.com>
Cc: Xiao Yang <yangx.jy@fujitsu.com>
Cc: Yang Xu <xuyang2018.jy@fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
6 weeks agoselftests: mm: add s390 to ARCH check
Nico Pache [Wed, 24 Jul 2024 21:35:17 +0000 (15:35 -0600)]
selftests: mm: add s390 to ARCH check

commit 0518dbe97fe6 ("selftests/mm: fix cross compilation with LLVM")
changed the env variable for the architecture from MACHINE to ARCH.

This is preventing 3 required TEST_GEN_FILES from being included when
cross compiling s390x and errors when trying to run the test suite.  This
is due to the ARCH variable already being set and the arch folder name
being s390.

Add "s390" to the filtered list to cover this case and have the 3 files
included in the build.

Link: https://lkml.kernel.org/r/20240724213517.23918-1-npache@redhat.com
Fixes: 0518dbe97fe6 ("selftests/mm: fix cross compilation with LLVM")
Signed-off-by: Nico Pache <npache@redhat.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
6 weeks agobcachefs: Use bch2_wait_on_allocator() in btree node alloc path
Kent Overstreet [Wed, 7 Aug 2024 20:40:14 +0000 (16:40 -0400)]
bcachefs: Use bch2_wait_on_allocator() in btree node alloc path

If the allocator gets stuck, we need to know why.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Make allocator stuck timeout configurable, ratelimit messages
Kent Overstreet [Wed, 7 Aug 2024 17:58:57 +0000 (13:58 -0400)]
bcachefs: Make allocator stuck timeout configurable, ratelimit messages

Limit these messages to once every 2 minutes to avoid spamming logs;
with multiple devices the output can be quite significant.

Also, up the default timeout to 30 seconds from 10 seconds.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Add missing path_traverse() to btree_iter_next_node()
Kent Overstreet [Wed, 7 Aug 2024 22:03:54 +0000 (18:03 -0400)]
bcachefs: Add missing path_traverse() to btree_iter_next_node()

This fixes a bug exposed by the next path - we pop an assert in
path_set_should_be_locked().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agotracefs: Use generic inode RCU for synchronizing freeing
Steven Rostedt [Wed, 7 Aug 2024 22:54:02 +0000 (18:54 -0400)]
tracefs: Use generic inode RCU for synchronizing freeing

With structure layout randomization enabled for 'struct inode' we need to
avoid overlapping any of the RCU-used / initialized-only-once members,
e.g. i_lru or i_sb_list to not corrupt related list traversals when making
use of the rcu_head.

For an unlucky structure layout of 'struct inode' we may end up with the
following splat when running the ftrace selftests:

[<...>] list_del corruption, ffff888103ee2cb0->next (tracefs_inode_cache+0x0/0x4e0 [slab object]) is NULL (prev is tracefs_inode_cache+0x78/0x4e0 [slab object])
[<...>] ------------[ cut here ]------------
[<...>] kernel BUG at lib/list_debug.c:54!
[<...>] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[<...>] CPU: 3 PID: 2550 Comm: mount Tainted: G                 N  6.8.12-grsec+ #122 ed2f536ca62f28b087b90e3cc906a8d25b3ddc65
[<...>] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[<...>] RIP: 0010:[<ffffffff84656018>] __list_del_entry_valid_or_report+0x138/0x3e0
[<...>] Code: 48 b8 99 fb 65 f2 ff ff ff ff e9 03 5c d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff e9 33 5a d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff <0f> 0b 4c 89 e9 48 89 ea 48 89 ee 48 c7 c7 60 8f dd 89 31 c0 e8 2f
[<...>] RSP: 0018:fffffe80416afaf0 EFLAGS: 00010283
[<...>] RAX: 0000000000000098 RBX: ffff888103ee2cb0 RCX: 0000000000000000
[<...>] RDX: ffffffff84655fe8 RSI: ffffffff89dd8b60 RDI: 0000000000000001
[<...>] RBP: ffff888103ee2cb0 R08: 0000000000000001 R09: fffffbd0082d5f25
[<...>] R10: fffffe80416af92f R11: 0000000000000001 R12: fdf99c16731d9b6d
[<...>] R13: 0000000000000000 R14: ffff88819ad4b8b8 R15: 0000000000000000
[<...>] RBX: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RDX: __list_del_entry_valid_or_report+0x108/0x3e0
[<...>] RSI: __func__.47+0x4340/0x4400
[<...>] RBP: tracefs_inode_cache+0x0/0x4e0 [slab object]
[<...>] RSP: process kstack fffffe80416afaf0+0x7af0/0x8000 [mount 2550 2550]
[<...>] R09: kasan shadow of process kstack fffffe80416af928+0x7928/0x8000 [mount 2550 2550]
[<...>] R10: process kstack fffffe80416af92f+0x792f/0x8000 [mount 2550 2550]
[<...>] R14: tracefs_inode_cache+0x78/0x4e0 [slab object]
[<...>] FS:  00006dcb380c1840(0000) GS:ffff8881e0600000(0000) knlGS:0000000000000000
[<...>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[<...>] CR2: 000076ab72b30e84 CR3: 000000000b088004 CR4: 0000000000360ef0 shadow CR4: 0000000000360ef0
[<...>] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[<...>] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[<...>] ASID: 0003
[<...>] Stack:
[<...>]  ffffffff818a2315 00000000f5c856ee ffffffff896f1840 ffff888103ee2cb0
[<...>]  ffff88812b6b9750 0000000079d714b6 fffffbfff1e9280b ffffffff8f49405f
[<...>]  0000000000000001 0000000000000000 ffff888104457280 ffffffff8248b392
[<...>] Call Trace:
[<...>]  <TASK>
[<...>]  [<ffffffff818a2315>] ? lock_release+0x175/0x380 fffffe80416afaf0
[<...>]  [<ffffffff8248b392>] list_lru_del+0x152/0x740 fffffe80416afb48
[<...>]  [<ffffffff8248ba93>] list_lru_del_obj+0x113/0x280 fffffe80416afb88
[<...>]  [<ffffffff8940fd19>] ? _atomic_dec_and_lock+0x119/0x200 fffffe80416afb90
[<...>]  [<ffffffff8295b244>] iput_final+0x1c4/0x9a0 fffffe80416afbb8
[<...>]  [<ffffffff8293a52b>] dentry_unlink_inode+0x44b/0xaa0 fffffe80416afbf8
[<...>]  [<ffffffff8293fefc>] __dentry_kill+0x23c/0xf00 fffffe80416afc40
[<...>]  [<ffffffff8953a85f>] ? __this_cpu_preempt_check+0x1f/0xa0 fffffe80416afc48
[<...>]  [<ffffffff82949ce5>] ? shrink_dentry_list+0x1c5/0x760 fffffe80416afc70
[<...>]  [<ffffffff82949b71>] ? shrink_dentry_list+0x51/0x760 fffffe80416afc78
[<...>]  [<ffffffff82949da8>] shrink_dentry_list+0x288/0x760 fffffe80416afc80
[<...>]  [<ffffffff8294ae75>] shrink_dcache_sb+0x155/0x420 fffffe80416afcc8
[<...>]  [<ffffffff8953a7c3>] ? debug_smp_processor_id+0x23/0xa0 fffffe80416afce0
[<...>]  [<ffffffff8294ad20>] ? do_one_tree+0x140/0x140 fffffe80416afcf8
[<...>]  [<ffffffff82997349>] ? do_remount+0x329/0xa00 fffffe80416afd18
[<...>]  [<ffffffff83ebf7a1>] ? security_sb_remount+0x81/0x1c0 fffffe80416afd38
[<...>]  [<ffffffff82892096>] reconfigure_super+0x856/0x14e0 fffffe80416afd70
[<...>]  [<ffffffff815d1327>] ? ns_capable_common+0xe7/0x2a0 fffffe80416afd90
[<...>]  [<ffffffff82997436>] do_remount+0x416/0xa00 fffffe80416afdd0
[<...>]  [<ffffffff829b2ba4>] path_mount+0x5c4/0x900 fffffe80416afe28
[<...>]  [<ffffffff829b25e0>] ? finish_automount+0x13a0/0x13a0 fffffe80416afe60
[<...>]  [<ffffffff82903812>] ? user_path_at_empty+0xb2/0x140 fffffe80416afe88
[<...>]  [<ffffffff829b2ff5>] do_mount+0x115/0x1c0 fffffe80416afeb8
[<...>]  [<ffffffff829b2ee0>] ? path_mount+0x900/0x900 fffffe80416afed8
[<...>]  [<ffffffff8272461c>] ? __kasan_check_write+0x1c/0xa0 fffffe80416afee0
[<...>]  [<ffffffff829b31cf>] __do_sys_mount+0x12f/0x280 fffffe80416aff30
[<...>]  [<ffffffff829b36cd>] __x64_sys_mount+0xcd/0x2e0 fffffe80416aff70
[<...>]  [<ffffffff819f8818>] ? syscall_trace_enter+0x218/0x380 fffffe80416aff88
[<...>]  [<ffffffff8111655e>] x64_sys_call+0x5d5e/0x6720 fffffe80416affa8
[<...>]  [<ffffffff8952756d>] do_syscall_64+0xcd/0x3c0 fffffe80416affb8
[<...>]  [<ffffffff8100119b>] entry_SYSCALL_64_safe_stack+0x4c/0x87 fffffe80416affe8
[<...>]  </TASK>
[<...>]  <PTREGS>
[<...>] RIP: 0033:[<00006dcb382ff66a>] vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] Code: 48 8b 0d 29 18 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f6 17 0d 00 f7 d8 64 89 01 48
[<...>] RSP: 002b:0000763d68192558 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[<...>] RAX: ffffffffffffffda RBX: 00006dcb38433264 RCX: 00006dcb382ff66a
[<...>] RDX: 000017c3e0d11210 RSI: 000017c3e0d1a5a0 RDI: 000017c3e0d1ae70
[<...>] RBP: 000017c3e0d10fb0 R08: 000017c3e0d11260 R09: 00006dcb383d1be0
[<...>] R10: 000000000020002e R11: 0000000000000246 R12: 0000000000000000
[<...>] R13: 000017c3e0d1ae70 R14: 000017c3e0d11210 R15: 000017c3e0d10fb0
[<...>] RBX: vm_area_struct[mount 2550 2550 file 6dcb38433000-6dcb38434000 5b 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RCX: vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
[<...>] RDX: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RDI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RBP: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] RSP: vm_area_struct[mount 2550 2550 anon 763d68173000-763d68195000 7ffffffdd 100133(read|write|mayread|maywrite|growsdown|account)]+0x0/0xb8 [userland map]
[<...>] R08: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R09: vm_area_struct[mount 2550 2550 file 6dcb383d1000-6dcb383d3000 1cd 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R13: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R14: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>] R15: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
[<...>]  </PTREGS>
[<...>] Modules linked in:
[<...>] ---[ end trace 0000000000000000 ]---

The list debug message as well as RBX's symbolic value point out that the
object in question was allocated from 'tracefs_inode_cache' and that the
list's '->next' member is at offset 0. Dumping the layout of the relevant
parts of 'struct tracefs_inode' gives the following:

  struct tracefs_inode {
    union {
      struct inode {
        struct list_head {
          struct list_head * next;                    /*     0     8 */
          struct list_head * prev;                    /*     8     8 */
        } i_lru;
        [...]
      } vfs_inode;
      struct callback_head {
        void (*func)(struct callback_head *);         /*     0     8 */
        struct callback_head * next;                  /*     8     8 */
      } rcu;
    };
    [...]
  };

Above shows that 'vfs_inode.i_lru' overlaps with 'rcu' which will
destroy the 'i_lru' list as soon as the 'rcu' member gets used, e.g. in
call_rcu() or later when calling the RCU callback. This will disturb
concurrent list traversals as well as object reuse which assumes these
list heads will keep their integrity.

For reproduction, the following diff manually overlays 'i_lru' with
'rcu' as, otherwise, one would require some good portion of luck for
gambling an unlucky RANDSTRUCT seed:

  --- a/include/linux/fs.h
  +++ b/include/linux/fs.h
  @@ -629,6 +629,7 @@ struct inode {
    umode_t i_mode;
    unsigned short i_opflags;
    kuid_t i_uid;
  + struct list_head i_lru; /* inode LRU list */
    kgid_t i_gid;
    unsigned int i_flags;

  @@ -690,7 +691,6 @@ struct inode {
    u16 i_wb_frn_avg_time;
    u16 i_wb_frn_history;
   #endif
  - struct list_head i_lru; /* inode LRU list */
    struct list_head i_sb_list;
    struct list_head i_wb_list; /* backing dev writeback list */
    union {

The tracefs inode does not need to supply its own RCU delayed destruction
of its inode. The inode code itself offers both a "destroy_inode()"
callback that gets called when the last reference of the inode is
released, and the "free_inode()" which is called after a RCU
synchronization period from the "destroy_inode()".

The tracefs code can unlink the inode from its list in the destroy_inode()
callback, and the simply free it from the free_inode() callback. This
should provide the same protection.

Link: https://lore.kernel.org/all/20240807115143.45927-3-minipli@grsecurity.net/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Ilkka =?utf-8?b?TmF1bGFww6TDpA==?= <digirigawa@gmail.com>
Link: https://lore.kernel.org/20240807185402.61410544@gandalf.local.home
Fixes: baa23a8d4360 ("tracefs: Reset permissions on remount if permissions are options")
Reported-by: Mathias Krause <minipli@grsecurity.net>
Reported-by: Brad Spengler <spender@grsecurity.net>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
6 weeks agoring-buffer: Remove unused function ring_buffer_nr_pages()
Jianhui Zhou [Mon, 5 Aug 2024 11:36:31 +0000 (19:36 +0800)]
ring-buffer: Remove unused function ring_buffer_nr_pages()

Because ring_buffer_nr_pages() is not an inline function and user accesses
buffer->buffers[cpu]->nr_pages directly, the function ring_buffer_nr_pages
is removed.

Signed-off-by: Jianhui Zhou <912460177@qq.com>
Link: https://lore.kernel.org/tencent_F4A7E9AB337F44E0F4B858D07D19EF460708@qq.com
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
6 weeks agotracing: Fix overflow in get_free_elt()
Tze-nan Wu [Mon, 5 Aug 2024 05:59:22 +0000 (13:59 +0800)]
tracing: Fix overflow in get_free_elt()

"tracing_map->next_elt" in get_free_elt() is at risk of overflowing.

Once it overflows, new elements can still be inserted into the tracing_map
even though the maximum number of elements (`max_elts`) has been reached.
Continuing to insert elements after the overflow could result in the
tracing_map containing "tracing_map->max_size" elements, leaving no empty
entries.
If any attempt is made to insert an element into a full tracing_map using
`__tracing_map_insert()`, it will cause an infinite loop with preemption
disabled, leading to a CPU hang problem.

Fix this by preventing any further increments to "tracing_map->next_elt"
once it reaches "tracing_map->max_elt".

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 08d43a5fa063e ("tracing: Add lock-free tracing_map")
Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Link: https://lore.kernel.org/20240805055922.6277-1-Tze-nan.Wu@mediatek.com
Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
6 weeks agofunction_graph: Fix the ret_stack used by ftrace_graph_ret_addr()
Petr Pavlu [Sat, 3 Aug 2024 13:09:26 +0000 (15:09 +0200)]
function_graph: Fix the ret_stack used by ftrace_graph_ret_addr()

When ftrace_graph_ret_addr() is invoked to convert a found stack return
address to its original value, the function can end up producing the
following crash:

[   95.442712] BUG: kernel NULL pointer dereference, address: 0000000000000028
[   95.442720] #PF: supervisor read access in kernel mode
[   95.442724] #PF: error_code(0x0000) - not-present page
[   95.442727] PGD 0 P4D 0-
[   95.442731] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[   95.442736] CPU: 1 UID: 0 PID: 2214 Comm: insmod Kdump: loaded Tainted: G           OE K    6.11.0-rc1-default #1 67c62a3b3720562f7e7db5f11c1fdb40b7a2857c
[   95.442747] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [K]=LIVEPATCH
[   95.442750] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
[   95.442754] RIP: 0010:ftrace_graph_ret_addr+0x42/0xc0
[   95.442766] Code: [...]
[   95.442773] RSP: 0018:ffff979b80ff7718 EFLAGS: 00010006
[   95.442776] RAX: ffffffff8ca99b10 RBX: ffff979b80ff7760 RCX: ffff979b80167dc0
[   95.442780] RDX: ffffffff8ca99b10 RSI: ffff979b80ff7790 RDI: 0000000000000005
[   95.442783] RBP: 0000000000000001 R08: 0000000000000005 R09: 0000000000000000
[   95.442786] R10: 0000000000000005 R11: 0000000000000000 R12: ffffffff8e9491e0
[   95.442790] R13: ffffffff8d6f70f0 R14: ffff979b80167da8 R15: ffff979b80167dc8
[   95.442793] FS:  00007fbf83895740(0000) GS:ffff8a0afdd00000(0000) knlGS:0000000000000000
[   95.442797] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   95.442800] CR2: 0000000000000028 CR3: 0000000005070002 CR4: 0000000000370ef0
[   95.442806] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   95.442809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   95.442816] Call Trace:
[   95.442823]  <TASK>
[   95.442896]  unwind_next_frame+0x20d/0x830
[   95.442905]  arch_stack_walk_reliable+0x94/0xe0
[   95.442917]  stack_trace_save_tsk_reliable+0x7d/0xe0
[   95.442922]  klp_check_and_switch_task+0x55/0x1a0
[   95.442931]  task_call_func+0xd3/0xe0
[   95.442938]  klp_try_switch_task.part.5+0x37/0x150
[   95.442942]  klp_try_complete_transition+0x79/0x2d0
[   95.442947]  klp_enable_patch+0x4db/0x890
[   95.442960]  do_one_initcall+0x41/0x2e0
[   95.442968]  do_init_module+0x60/0x220
[   95.442975]  load_module+0x1ebf/0x1fb0
[   95.443004]  init_module_from_file+0x88/0xc0
[   95.443010]  idempotent_init_module+0x190/0x240
[   95.443015]  __x64_sys_finit_module+0x5b/0xc0
[   95.443019]  do_syscall_64+0x74/0x160
[   95.443232]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   95.443236] RIP: 0033:0x7fbf82f2c709
[   95.443241] Code: [...]
[   95.443247] RSP: 002b:00007fffd5ea3b88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   95.443253] RAX: ffffffffffffffda RBX: 000056359c48e750 RCX: 00007fbf82f2c709
[   95.443257] RDX: 0000000000000000 RSI: 000056356ed4efc5 RDI: 0000000000000003
[   95.443260] RBP: 000056356ed4efc5 R08: 0000000000000000 R09: 00007fffd5ea3c10
[   95.443263] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
[   95.443267] R13: 000056359c48e6f0 R14: 0000000000000000 R15: 0000000000000000
[   95.443272]  </TASK>
[   95.443274] Modules linked in: [...]
[   95.443385] Unloaded tainted modules: intel_uncore_frequency(E):1 isst_if_common(E):1 skx_edac(E):1
[   95.443414] CR2: 0000000000000028

The bug can be reproduced with kselftests:

 cd linux/tools/testing/selftests
 make TARGETS='ftrace livepatch'
 (cd ftrace; ./ftracetest test.d/ftrace/fgraph-filter.tc)
 (cd livepatch; ./test-livepatch.sh)

The problem is that ftrace_graph_ret_addr() is supposed to operate on the
ret_stack of a selected task but wrongly accesses the ret_stack of the
current task. Specifically, the above NULL dereference occurs when
task->curr_ret_stack is non-zero, but current->ret_stack is NULL.

Correct ftrace_graph_ret_addr() to work with the right ret_stack.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reported-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lore.kernel.org/20240803131211.17255-1-petr.pavlu@suse.com
Fixes: 7aa1eaef9f42 ("function_graph: Allow multiple users to attach to function graph")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
6 weeks agoeventfs: Use SRCU for freeing eventfs_inodes
Mathias Krause [Tue, 23 Jul 2024 21:07:53 +0000 (23:07 +0200)]
eventfs: Use SRCU for freeing eventfs_inodes

To mirror the SRCU lock held in eventfs_iterate() when iterating over
eventfs inodes, use call_srcu() to free them too.

This was accidentally(?) degraded to RCU in commit 43aa6f97c2d0
("eventfs: Get rid of dentry pointers without refcounts").

Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20240723210755.8970-1-minipli@grsecurity.net
Fixes: 43aa6f97c2d0 ("eventfs: Get rid of dentry pointers without refcounts")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
6 weeks agoeventfs: Don't return NULL in eventfs_create_dir()
Mathias Krause [Tue, 23 Jul 2024 12:25:21 +0000 (14:25 +0200)]
eventfs: Don't return NULL in eventfs_create_dir()

Commit 77a06c33a22d ("eventfs: Test for ei->is_freed when accessing
ei->dentry") added another check, testing if the parent was freed after
we released the mutex. If so, the function returns NULL. However, all
callers expect it to either return a valid pointer or an error pointer,
at least since commit 5264a2f4bb3b ("tracing: Fix a NULL vs IS_ERR() bug
in event_subsystem_dir()"). Returning NULL will therefore fail the error
condition check in the caller.

Fix this by substituting the NULL return value with a fitting error
pointer.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org
Fixes: 77a06c33a22d ("eventfs: Test for ei->is_freed when accessing ei->dentry")
Link: https://lore.kernel.org/20240723122522.2724-1-minipli@grsecurity.net
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Ajay Kaher <ajay.kaher@broadcom.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
6 weeks agotracefs: Fix inode allocation
Mathias Krause [Wed, 7 Aug 2024 11:51:38 +0000 (13:51 +0200)]
tracefs: Fix inode allocation

The leading comment above alloc_inode_sb() is pretty explicit about it:

  /*
   * This must be used for allocating filesystems specific inodes to set
   * up the inode reclaim context correctly.
   */

Switch tracefs over to alloc_inode_sb() to make sure inodes are properly
linked.

Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20240807115143.45927-2-minipli@grsecurity.net
Fixes: ba37ff75e04b ("eventfs: Implement tracefs_inode_cache")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
6 weeks agotracing: Use refcount for trace_event_file reference counter
Steven Rostedt [Fri, 26 Jul 2024 18:42:08 +0000 (14:42 -0400)]
tracing: Use refcount for trace_event_file reference counter

Instead of using an atomic counter for the trace_event_file reference
counter, use the refcount interface. It has various checks to make sure
the reference counting is correct, and will warn if it detects an error
(like refcount_inc() on '0').

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20240726144208.687cce24@rorschach.local.home
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
6 weeks agotracing: Have format file honor EVENT_FILE_FL_FREED
Steven Rostedt [Tue, 30 Jul 2024 15:06:57 +0000 (11:06 -0400)]
tracing: Have format file honor EVENT_FILE_FL_FREED

When eventfs was introduced, special care had to be done to coordinate the
freeing of the file meta data with the files that are exposed to user
space. The file meta data would have a ref count that is set when the file
is created and would be decremented and freed after the last user that
opened the file closed it. When the file meta data was to be freed, it
would set a flag (EVENT_FILE_FL_FREED) to denote that the file is freed,
and any new references made (like new opens or reads) would fail as it is
marked freed. This allowed other meta data to be freed after this flag was
set (under the event_mutex).

All the files that were dynamically created in the events directory had a
pointer to the file meta data and would call event_release() when the last
reference to the user space file was closed. This would be the time that it
is safe to free the file meta data.

A shortcut was made for the "format" file. It's i_private would point to
the "call" entry directly and not point to the file's meta data. This is
because all format files are the same for the same "call", so it was
thought there was no reason to differentiate them.  The other files
maintain state (like the "enable", "trigger", etc). But this meant if the
file were to disappear, the "format" file would be unaware of it.

This caused a race that could be trigger via the user_events test (that
would create dynamic events and free them), and running a loop that would
read the user_events format files:

In one console run:

 # cd tools/testing/selftests/user_events
 # while true; do ./ftrace_test; done

And in another console run:

 # cd /sys/kernel/tracing/
 # while true; do cat events/user_events/__test_event/format; done 2>/dev/null

With KASAN memory checking, it would trigger a use-after-free bug report
(which was a real bug). This was because the format file was not checking
the file's meta data flag "EVENT_FILE_FL_FREED", so it would access the
event that the file meta data pointed to after the event was freed.

After inspection, there are other locations that were found to not check
the EVENT_FILE_FL_FREED flag when accessing the trace_event_file. Add a
new helper function: event_file_file() that will make sure that the
event_mutex is held, and will return NULL if the trace_event_file has the
EVENT_FILE_FL_FREED flag set. Have the first reference of the struct file
pointer use event_file_file() and check for NULL. Later uses can still use
the event_file_data() helper function if the event_mutex is still held and
was not released since the event_file_file() call.

Link: https://lore.kernel.org/all/20240719204701.1605950-1-minipli@grsecurity.net/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Ilkka Naulapää <digirigawa@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Florian Fainelli <florian.fainelli@broadcom.com>
Cc: Alexey Makhalov <alexey.makhalov@broadcom.com>
Cc: Vasavi Sirnapalli <vasavi.sirnapalli@broadcom.com>
Link: https://lore.kernel.org/20240730110657.3b69d3c1@gandalf.local.home
Fixes: b63db58e2fa5d ("eventfs/tracing: Add callback for release of an eventfs_inode")
Reported-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
6 weeks agoBluetooth: hci_sync: avoid dup filtering when passive scanning with adv monitor
Anton Khirnov [Mon, 29 Jul 2024 19:58:10 +0000 (21:58 +0200)]
Bluetooth: hci_sync: avoid dup filtering when passive scanning with adv monitor

This restores behaviour (including the comment) from now-removed
hci_request.c, and also matches existing code for active scanning.

Without this, the duplicates filter is always active when passive
scanning, which makes it impossible to work with devices that send
nontrivial dynamic data in their advertisement reports.

Fixes: abfeea476c68 ("Bluetooth: hci_sync: Convert MGMT_OP_START_DISCOVERY")
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoBluetooth: l2cap: always unlock channel in l2cap_conless_channel()
Dmitry Antipov [Wed, 31 Jul 2024 09:19:36 +0000 (12:19 +0300)]
Bluetooth: l2cap: always unlock channel in l2cap_conless_channel()

Add missing call to 'l2cap_chan_unlock()' on receive error handling
path in 'l2cap_conless_channel()'.

Fixes: a24cce144b98 ("Bluetooth: Fix reference counting of global L2CAP channels")
Reported-by: syzbot+45ac74737e866894acb0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=45ac74737e866894acb0
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoBluetooth: hci_qca: fix a NULL-pointer derefence at shutdown
Bartosz Golaszewski [Wed, 31 Jul 2024 15:20:50 +0000 (17:20 +0200)]
Bluetooth: hci_qca: fix a NULL-pointer derefence at shutdown

Unlike qca_regulator_init(), qca_power_shutdown() may be called for
QCA_ROME which does not have qcadev->bt_power assigned. Add a
NULL-pointer check before dereferencing the struct qca_power pointer.

Fixes: eba1718717b0 ("Bluetooth: hci_qca: make pwrseq calls the default if available")
Reported-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Closes: https://lore.kernel.org/linux-bluetooth/su3wp6s44hrxf4ijvsdfzbvv4unu4ycb7kkvwbx6ltdafkldir@4g7ydqm2ap5j/
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoBluetooth: hci_qca: fix QCA6390 support on non-DT platforms
Bartosz Golaszewski [Wed, 31 Jul 2024 15:20:49 +0000 (17:20 +0200)]
Bluetooth: hci_qca: fix QCA6390 support on non-DT platforms

QCA6390 can albo be used on non-DT systems so we must not make the power
sequencing the only option. Check if the serdev device consumes an OF
node. If so: honor the new contract as per the DT bindings. If not: fall
back to the previous behavior by falling through to the existing
default label.

Fixes: 9a15ce685706 ("Bluetooth: qca: use the power sequencer for QCA6390")
Reported-by: Wren Turkal <wt@penguintechs.org>
Closes: https://lore.kernel.org/linux-bluetooth/27e6a6c5-fb63-4219-be0b-eefa2c116e06@penguintechs.org/
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoBluetooth: hci_qca: don't call pwrseq_power_off() twice for QCA6390
Bartosz Golaszewski [Wed, 31 Jul 2024 15:20:48 +0000 (17:20 +0200)]
Bluetooth: hci_qca: don't call pwrseq_power_off() twice for QCA6390

Now that we call pwrseq_power_off() for all models that hold a valid
power sequencing handle, we can remove the switch case for QCA_6390. The
switch will now use the default label for this model but that's fine: if
it has the BT-enable GPIO than we should use it.

Fixes: eba1718717b0 ("Bluetooth: hci_qca: make pwrseq calls the default if available")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoice: Fix incorrect assigns of FEC counts
Mateusz Polchlopek [Fri, 26 Jul 2024 10:19:28 +0000 (06:19 -0400)]
ice: Fix incorrect assigns of FEC counts

Commit ac21add2540e ("ice: Implement driver functionality to dump fec
statistics") introduces obtaining FEC correctable and uncorrectable
stats per netdev in ICE driver. Unfortunately the assignment of values
to fec_stats structure has been done incorrectly. This commit fixes the
assignments.

Fixes: ac21add2540e ("ice: Implement driver functionality to dump fec statistics")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 weeks agoice: Skip PTP HW writes during PTP reset procedure
Grzegorz Nitka [Mon, 15 Jul 2024 15:39:11 +0000 (17:39 +0200)]
ice: Skip PTP HW writes during PTP reset procedure

Block HW write access for the driver while the device is in reset to
avoid potential race condition and access to the PTP HW in
non-nominal state which could lead to undesired effects

Fixes: 4aad5335969f ("ice: add individual interrupt allocation")
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 weeks agoice: Fix reset handler
Grzegorz Nitka [Mon, 15 Jul 2024 15:39:10 +0000 (17:39 +0200)]
ice: Fix reset handler

Synchronize OICR IRQ when preparing for reset to avoid potential
race conditions between the reset procedure and OICR

Fixes: 4aad5335969f ("ice: add individual interrupt allocation")
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 weeks agoMerge tag 'for-6.11-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Wed, 7 Aug 2024 16:53:41 +0000 (09:53 -0700)]
Merge tag 'for-6.11-rc2-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fix double inode unlock for direct IO sync writes (reported by
   syzbot)

 - fix root tree id/name map definitions, don't use fixed size buffers
   for name (reported by -Werror=unterminated-string-initialization)

 - fix qgroup reserve leaks in bufferd write path

 - update scrub status structure more often so it can be reported in
   user space more accurately and let 'resume' not repeat work

 - in preparation to remove space cache v1 in the future print a warning
   if it's detected

* tag 'for-6.11-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: avoid using fixed char array size for tree names
  btrfs: fix double inode unlock for direct IO sync writes
  btrfs: emit a warning about space cache v1 being deprecated
  btrfs: fix qgroup reserve leaks in cow_file_range
  btrfs: implement launder_folio for clearing dirty page reserve
  btrfs: scrub: update last_physical after scrubbing one stripe
  btrfs: factor out stripe length calculation into a helper

6 weeks agoMerge tag 'for-v6.11-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux...
Linus Torvalds [Wed, 7 Aug 2024 16:45:21 +0000 (09:45 -0700)]
Merge tag 'for-v6.11-rc' of git://git./linux/kernel/git/sre/linux-power-supply

Pull power supply fixes from Sebastian Reichel:
 "rt5033:
   - fix driver regression causing kernel oops

  axp288-charger:
   - fix charge voltage setup

  qcom-battmgr:
   - fix thermal zone spamming errors
   - fix init on Qualcomm X Elite"

* tag 'for-v6.11-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
  power: supply: qcom_battmgr: Ignore extra __le32 in info payload
  power: supply: qcom_battmgr: return EAGAIN when firmware service is not up
  power: supply: axp288_charger: Round constant_charge_voltage writes down
  power: supply: axp288_charger: Fix constant_charge_voltage writes
  power: supply: rt5033: Bring back i2c_set_clientdata

6 weeks agobcachefs: ec should not allocate from ro devs
Kent Overstreet [Wed, 7 Aug 2024 03:30:48 +0000 (23:30 -0400)]
bcachefs: ec should not allocate from ro devs

This fixes a device removal deadlock when using erasure coding.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Improved allocator debugging for ec
Kent Overstreet [Wed, 7 Aug 2024 01:02:34 +0000 (21:02 -0400)]
bcachefs: Improved allocator debugging for ec

chasing down a device removal deadlock with erasure coding

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Add missing bch2_trans_begin() call
Kent Overstreet [Wed, 7 Aug 2024 02:49:31 +0000 (22:49 -0400)]
bcachefs: Add missing bch2_trans_begin() call

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Add a comment for bucket helper types
Kent Overstreet [Wed, 31 Jul 2024 00:35:59 +0000 (20:35 -0400)]
bcachefs: Add a comment for bucket helper types

We've had bugs in the past with incorrect integer conversions in disk
accounting code, which is why bucket helpers now always return s64s; add
a comment explaining this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Don't rely on implicit unsigned -> signed integer conversion
Kent Overstreet [Wed, 31 Jul 2024 00:33:25 +0000 (20:33 -0400)]
bcachefs: Don't rely on implicit unsigned -> signed integer conversion

implicit integer conversion is a fertile source of bugs, and we really
would rather not have the min()/max() macros doing it implicitly.
bcachefs appears to be the only place in the kernel where this happens,
so let's fix it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agolockdep: Fix lockdep_set_notrack_class() for CONFIG_LOCK_STAT
Kent Overstreet [Wed, 31 Jul 2024 01:14:08 +0000 (21:14 -0400)]
lockdep: Fix lockdep_set_notrack_class() for CONFIG_LOCK_STAT

We won't find a contended lock if it's not being tracked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agoLoongArch: KVM: Remove undefined a6 argument comment for kvm_hypercall()
Dandan Zhang [Wed, 7 Aug 2024 09:37:14 +0000 (17:37 +0800)]
LoongArch: KVM: Remove undefined a6 argument comment for kvm_hypercall()

The kvm_hypercall() set for LoongArch is limited to a1-a5. So the
mention of a6 in the comment is undefined that needs to be rectified.

Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Dandan Zhang <zhangdandan@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
6 weeks agoLoongArch: KVM: Remove unnecessary definition of KVM_PRIVATE_MEM_SLOTS
Yuli Wang [Wed, 7 Aug 2024 09:37:14 +0000 (17:37 +0800)]
LoongArch: KVM: Remove unnecessary definition of KVM_PRIVATE_MEM_SLOTS

1. "KVM_PRIVATE_MEM_SLOTS" is renamed as "KVM_INTERNAL_MEM_SLOTS".

2. "KVM_INTERNAL_MEM_SLOTS" defaults to zero, so it is not necessary to
define it in LoongArch's asm/kvm_host.h.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bdd1c37a315bc50ab14066c4852bc8dcf070451e
Link: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b075450868dbc0950f0942617f222eeb989cad10
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
6 weeks agoLoongArch: Use accessors to page table entries instead of direct dereference
Huacai Chen [Wed, 7 Aug 2024 09:37:11 +0000 (17:37 +0800)]
LoongArch: Use accessors to page table entries instead of direct dereference

As very well explained in commit 20a004e7b017cce282 ("arm64: mm: Use
READ_ONCE/WRITE_ONCE when accessing page tables"), an architecture whose
page table walker can modify the PTE in parallel must use READ_ONCE()/
WRITE_ONCE() macro to avoid any compiler transformation.

So apply that to LoongArch which is such an architecture, in order to
avoid potential problems.

Similar to commit edf955647269422e ("riscv: Use accessors to page table
entries instead of direct dereference").

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
6 weeks agoLoongArch: Enable general EFI poweroff method
Miao Wang [Wed, 7 Aug 2024 09:37:11 +0000 (17:37 +0800)]
LoongArch: Enable general EFI poweroff method

efi_shutdown_init() can register a general sys_off handler named
efi_power_off(). Enable this by providing efi_poweroff_required(),
like arm and x86. Since EFI poweroff is also supported on LoongArch,
and the enablement makes the poweroff function usable for hardwares
which lack ACPI S5.

We prefer ACPI poweroff rather than EFI poweroff (like x86), so we only
require EFI poweroff if acpi_gbl_reduced_hardware or acpi_no_s5 is true.

Cc: stable@vger.kernel.org
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Miao Wang <shankerwangmiao@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
6 weeks agonet: usb: qmi_wwan: add MeiG Smart SRM825L
ZHANG Yuntian [Sat, 3 Aug 2024 07:46:51 +0000 (15:46 +0800)]
net: usb: qmi_wwan: add MeiG Smart SRM825L

Add support for MeiG Smart SRM825L which is based on Qualcomm 315 chip.

T:  Bus=04 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  2 Spd=5000 MxCh= 0
D:  Ver= 3.20 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs=  1
P:  Vendor=2dee ProdID=4d22 Rev= 4.14
S:  Manufacturer=MEIG
S:  Product=LTE-A Module
S:  SerialNumber=6f345e48
C:* #Ifs= 6 Cfg#= 1 Atr=80 MxPwr=896mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=01(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=83(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
E:  Ad=82(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=85(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
E:  Ad=84(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=03(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E:  Ad=87(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
E:  Ad=86(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=04(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
I:* If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E:  Ad=05(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=88(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
I:* If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E:  Ad=89(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
E:  Ad=8e(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=0f(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms

Signed-off-by: ZHANG Yuntian <yt@radxa.com>
Link: https://patch.msgid.link/D1EB81385E405DFE+20240803074656.567061-1-yt@radxa.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: dsa: microchip: Fix Wake-on-LAN check to not return an error
Tristram Ha [Mon, 5 Aug 2024 23:52:00 +0000 (16:52 -0700)]
net: dsa: microchip: Fix Wake-on-LAN check to not return an error

The wol variable in ksz_port_set_mac_address() is declared with random
data, but the code in ksz_get_wol call may not be executed so the
WAKE_MAGIC check may be invalid resulting in an error message when
setting a MAC address after starting the DSA driver.

Fixes: 3b454b6390c3 ("net: dsa: microchip: ksz9477: Add Wake on Magic Packet support")
Signed-off-by: Tristram Ha <tristram.ha@microchip.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20240805235200.24982-1-Tristram.Ha@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: linkwatch: use system_unbound_wq
Eric Dumazet [Mon, 5 Aug 2024 08:58:21 +0000 (08:58 +0000)]
net: linkwatch: use system_unbound_wq

linkwatch_event() grabs possibly very contended RTNL mutex.

system_wq is not suitable for such work.

Inspired by many noisy syzbot reports.

3 locks held by kworker/0:7/5266:
 #0: ffff888015480948 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3206 [inline]
 #0: ffff888015480948 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works+0x90a/0x1830 kernel/workqueue.c:3312
 #1: ffffc90003f6fd00 ((linkwatch_work).work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3207 [inline]
 , at: process_scheduled_works+0x945/0x1830 kernel/workqueue.c:3312
 #2: ffffffff8fa6f208 (rtnl_mutex){+.+.}-{3:3}, at: linkwatch_event+0xe/0x60 net/core/link_watch.c:276

Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240805085821.1616528-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Tue, 6 Aug 2024 17:58:28 +0000 (10:58 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio fix from Michael Tsirkin:
 "Fix a single, long-standing issue with kick pass-through vdpa"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost-vdpa: switch to use vmf_insert_pfn() in the fault handler

6 weeks agoMerge tag 'platform-drivers-x86-v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 6 Aug 2024 14:52:10 +0000 (07:52 -0700)]
Merge tag 'platform-drivers-x86-v6.11-2' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:
 "Fixes:

   - Fix ACPI notifier racing with itself (intel-vbtn)

   - Initialize local variable to cover a timeout corner case
     (intel/ifs)

   - WMI docs spelling

  New device IDs:

   - amd/{pmc,pmf}: AMD 1Ah model 60h series.

   - amd/pmf: SPS quirk support for ASUS ROG Ally X"

* tag 'platform-drivers-x86-v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86/intel/ifs: Initialize union ifs_status to zero
  platform/x86: msi-wmi-platform: Fix spelling mistakes
  platform/x86/amd/pmf: Add new ACPI ID AMDI0107
  platform/x86/amd/pmc: Send OS_HINT command for new AMD platform
  platform/x86/amd: pmf: Add quirk for ROG Ally X
  platform/x86: intel-vbtn: Protect ACPI notify handler against recursion

6 weeks agonet: bridge: mcast: wait for previous gc cycles when removing port
Nikolay Aleksandrov [Fri, 2 Aug 2024 08:07:30 +0000 (11:07 +0300)]
net: bridge: mcast: wait for previous gc cycles when removing port

syzbot hit a use-after-free[1] which is caused because the bridge doesn't
make sure that all previous garbage has been collected when removing a
port. What happens is:
      CPU 1                   CPU 2
 start gc cycle           remove port
                         acquire gc lock first
 wait for lock
                         call br_multicasg_gc() directly
 acquire lock now but    free port
 the port can be freed
 while grp timers still
 running

Make sure all previous gc cycles have finished by using flush_work before
freeing the port.

[1]
  BUG: KASAN: slab-use-after-free in br_multicast_port_group_expired+0x4c0/0x550 net/bridge/br_multicast.c:861
  Read of size 8 at addr ffff888071d6d000 by task syz.5.1232/9699

  CPU: 1 PID: 9699 Comm: syz.5.1232 Not tainted 6.10.0-rc5-syzkaller-00021-g24ca36a562d6 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
  Call Trace:
   <IRQ>
   __dump_stack lib/dump_stack.c:88 [inline]
   dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:114
   print_address_description mm/kasan/report.c:377 [inline]
   print_report+0xc3/0x620 mm/kasan/report.c:488
   kasan_report+0xd9/0x110 mm/kasan/report.c:601
   br_multicast_port_group_expired+0x4c0/0x550 net/bridge/br_multicast.c:861
   call_timer_fn+0x1a3/0x610 kernel/time/timer.c:1792
   expire_timers kernel/time/timer.c:1843 [inline]
   __run_timers+0x74b/0xaf0 kernel/time/timer.c:2417
   __run_timer_base kernel/time/timer.c:2428 [inline]
   __run_timer_base kernel/time/timer.c:2421 [inline]
   run_timer_base+0x111/0x190 kernel/time/timer.c:2437

Reported-by: syzbot+263426984509be19c9a0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=263426984509be19c9a0
Fixes: e12cec65b554 ("net: bridge: mcast: destroy all entries via gc")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20240802080730.3206303-1-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge tag 'linux_kselftest-fixes-6.11-rc3' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Mon, 5 Aug 2024 21:31:12 +0000 (14:31 -0700)]
Merge tag 'linux_kselftest-fixes-6.11-rc3' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull kselftest fix from Shuah Khan:
 "A single fix to the conditional in ksft.py script which incorrectly
  flags a test suite failed when there are skipped tests in the mix.

  The logic is fixed to take skipped tests into account and report the
  test as passed"

* tag 'linux_kselftest-fixes-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: ksft: Fix finished() helper exit code on skipped tests

6 weeks agoMerge tag 'slab-fixes-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Mon, 5 Aug 2024 16:23:00 +0000 (09:23 -0700)]
Merge tag 'slab-fixes-for-6.11-rc2' of git://git./linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:
 "Since v6.8 we've had a subtle breakage in SLUB with KFENCE enabled,
  that can cause a crash. It hasn't been found earlier due to quite
  specific conditions necessary (OOM during kmem_cache_alloc_bulk())"

* tag 'slab-fixes-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm, slub: do not call do_slab_free for kfence object

6 weeks agonet: usb: qmi_wwan: fix memory leak for not ip packets
Daniele Palmas [Thu, 1 Aug 2024 13:55:12 +0000 (15:55 +0200)]
net: usb: qmi_wwan: fix memory leak for not ip packets

Free the unused skb when not ip packets arrive.

Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support")
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 weeks agoMerge branch 'virtio-net-rq-coalescing' into main
David S. Miller [Mon, 5 Aug 2024 10:10:22 +0000 (11:10 +0100)]
Merge branch 'virtio-net-rq-coalescing' into main

Heng Qi says:

====================
virtio-net: unbreak vq resizing if vq coalescing is not supported

Currently, if the driver does not negotiate the vq coalescing feature
but supports vq resize, the vq resize action, which could have been
successfully executed, is interrupted due to the failure in configuring
the vq coalescing parameters. This issue needs to be fixed.

Changelog
=========
v3->v4:
  - Add a comment for patch[2/2].

v2->v3:
  - Break out the feature check and the fix into separate patches.

v1->v2:
  - Rephrase the subject.
  - Put the feature check inside the virtnet_send_{r,t}x_ctrl_coal_vq_cmd.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 weeks agovirtio-net: unbreak vq resizing when coalescing is not negotiated
Heng Qi [Thu, 1 Aug 2024 13:23:38 +0000 (21:23 +0800)]
virtio-net: unbreak vq resizing when coalescing is not negotiated

Don't break the resize action if the vq coalescing feature
named VIRTIO_NET_F_VQ_NOTF_COAL is not negotiated.

Fixes: f61fe5f081cf ("virtio-net: fix the vq coalescing setting for vq resize")
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Eugenio Pé rez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 weeks agovirtio-net: check feature before configuring the vq coalescing command
Heng Qi [Thu, 1 Aug 2024 13:23:37 +0000 (21:23 +0800)]
virtio-net: check feature before configuring the vq coalescing command

Virtio spec says:

The driver MUST have negotiated the VIRTIO_NET_F_VQ_NOTF_COAL
feature when issuing commands VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET
and VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET.

So we add the feature negotiation check to
virtnet_send_{r,t}x_ctrl_coal_vq_cmd as a basis for the next bugfix patch.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 weeks agoLinux 6.11-rc2
Linus Torvalds [Sun, 4 Aug 2024 20:50:53 +0000 (13:50 -0700)]
Linux 6.11-rc2

6 weeks agoprofiling: remove profile=sleep support
Tetsuo Handa [Sun, 4 Aug 2024 09:48:10 +0000 (18:48 +0900)]
profiling: remove profile=sleep support

The kernel sleep profile is no longer working due to a recursive locking
bug introduced by commit 42a20f86dc19 ("sched: Add wrapper for get_wchan()
to keep task blocked")

Booting with the 'profile=sleep' kernel command line option added or
executing

  # echo -n sleep > /sys/kernel/profiling

after boot causes the system to lock up.

Lockdep reports

  kthreadd/3 is trying to acquire lock:
  ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70

  but task is already holding lock:
  ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370

with the call trace being

   lock_acquire+0xc8/0x2f0
   get_wchan+0x32/0x70
   __update_stats_enqueue_sleeper+0x151/0x430
   enqueue_entity+0x4b0/0x520
   enqueue_task_fair+0x92/0x6b0
   ttwu_do_activate+0x73/0x140
   try_to_wake_up+0x213/0x370
   swake_up_locked+0x20/0x50
   complete+0x2f/0x40
   kthread+0xfb/0x180

However, since nobody noticed this regression for more than two years,
let's remove 'profile=sleep' support based on the assumption that nobody
needs this functionality.

Fixes: 42a20f86dc19 ("sched: Add wrapper for get_wchan() to keep task blocked")
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 weeks agoMerge tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 4 Aug 2024 15:57:08 +0000 (08:57 -0700)]
Merge tag 'x86-urgent-2024-08-04' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:

 - Prevent a deadlock on cpu_hotplug_lock in the aperf/mperf driver.

   A recent change in the ACPI code which consolidated code pathes moved
   the invocation of init_freq_invariance_cppc() to be moved to a CPU
   hotplug handler. The first invocation on AMD CPUs ends up enabling a
   static branch which dead locks because the static branch enable tries
   to acquire cpu_hotplug_lock but that lock is already held write by
   the hotplug machinery.

   Use static_branch_enable_cpuslocked() instead and take the hotplug
   lock read for the Intel code path which is invoked from the
   architecture code outside of the CPU hotplug operations.

 - Fix the number of reserved bits in the sev_config structure bit field
   so that the bitfield does not exceed 64 bit.

 - Add missing Zen5 model numbers

 - Fix the alignment assumptions of pti_clone_pgtable() and
   clone_entry_text() on 32-bit:

   The code assumes PMD aligned code sections, but on 32-bit the kernel
   entry text is not PMD aligned. So depending on the code size and
   location, which is configuration and compiler dependent, entry text
   can cross a PMD boundary. As the start is not PMD aligned adding PMD
   size to the start address is larger than the end address which
   results in partially mapped entry code for user space. That causes
   endless recursion on the first entry from userspace (usually #PF).

   Cure this by aligning the start address in the addition so it ends up
   at the next PMD start address.

   clone_entry_text() enforces PMD mapping, but on 32-bit the tail might
   eventually be PTE mapped, which causes a map fail because the PMD for
   the tail is not a large page mapping. Use PTI_LEVEL_KERNEL_IMAGE for
   the clone() invocation which resolves to PTE on 32-bit and PMD on
   64-bit.

 - Zero the 8-byte case for get_user() on range check failure on 32-bit

   The recend consolidation of the 8-byte get_user() case broke the
   zeroing in the failure case again. Establish it by clearing ECX
   before the range check and not afterwards as that obvioulsy can't be
   reached when the range check fails

* tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/uaccess: Zero the 8-byte get_range case on failure on 32-bit
  x86/mm: Fix pti_clone_entry_text() for i386
  x86/mm: Fix pti_clone_pgtable() alignment assumption
  x86/setup: Parse the builtin command line before merging
  x86/CPU/AMD: Add models 0x60-0x6f to the Zen5 range
  x86/sev: Fix __reserved field in sev_config
  x86/aperfmperf: Fix deadlock on cpu_hotplug_lock

6 weeks agoMerge tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 4 Aug 2024 15:50:16 +0000 (08:50 -0700)]
Merge tag 'timers-urgent-2024-08-04' of git://git./linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "Two fixes for the timer/clocksource code:

   - The recent fix to make the take over of the broadcast timer more
     reliable retrieves a per CPU pointer in preemptible context.

     This went unnoticed in testing as some compilers hoist the access
     into the non-preemotible section where the pointer is actually
     used, but obviously compilers can rightfully invoke it where the
     code put it.

     Move it into the non-preemptible section right to the actual usage
     side to cure it.

   - The clocksource watchdog is supposed to emit a warning when the
     retry count is greater than one and the number of retries reaches
     the limit.

     The condition is backwards and warns always when the count is
     greater than one. Fixup the condition to prevent spamming dmesg"

* tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: Fix brown-bag boolean thinko in cs_watchdog_read()
  tick/broadcast: Move per CPU pointer access into the atomic section

6 weeks agoMerge tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 4 Aug 2024 15:46:14 +0000 (08:46 -0700)]
Merge tag 'sched-urgent-2024-08-04' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Thomas Gleixner:

 - When stime is larger than rtime due to accounting imprecision, then
   utime = rtime - stime becomes negative. As this is unsigned math, the
   result becomes a huge positive number.

   Cure it by resetting stime to rtime in that case, so utime becomes 0.

 - Restore consistent state when sched_cpu_deactivate() fails.

   When offlining a CPU fails in sched_cpu_deactivate() after the SMT
   present counter has been decremented, then the function aborts but
   fails to increment the SMT present counter and leaves it imbalanced.
   Consecutive operations cause it to underflow. Add the missing fixup
   for the error path.

   For SMT accounting the runqueue needs to marked online again in the
   error exit path to restore consistent state.

* tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Fix unbalance set_rq_online/offline() in sched_cpu_deactivate()
  sched/core: Introduce sched_set_rq_on/offline() helper
  sched/smt: Fix unbalance sched_smt_present dec/inc
  sched/smt: Introduce sched_smt_present_inc/dec() helper
  sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime

6 weeks agoMerge tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 4 Aug 2024 15:42:18 +0000 (08:42 -0700)]
Merge tag 'perf-urgent-2024-08-04' of git://git./linux/kernel/git/tip/tip

Pull x86 perf fixes from Thomas Gleixner:

 - Move the smp_processor_id() invocation back into the non-preemtible
   region, so that the result is valid to use

 - Add the missing package C2 residency counters for Sierra Forest CPUs
   to make the newly added support actually useful

* tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix smp_processor_id()-in-preemptible warnings
  perf/x86/intel/cstate: Add pkg C2 residency counter for Sierra Forest

6 weeks agoMerge tag 'irq-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 4 Aug 2024 15:36:57 +0000 (08:36 -0700)]
Merge tag 'irq-urgent-2024-08-04' of git://git./linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A couple of fixes for interrupt chip drivers:

   - Make sure to skip the clear register space in the MBIGEN driver
     when calculating the node register index. Otherwise the clear
     register is clobbered and the wrong node registers are accessed.

   - Fix a signed/unsigned confusion in the loongarch CPU driver which
     converts an error code to a huge "valid" interrupt number.

   - Convert the mesion GPIO interrupt controller lock to a raw spinlock
     so it works on RT.

   - Add a missing static to a internal function in the pic32 EVIC
     driver"

* tag 'irq-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mbigen: Fix mbigen node address layout
  irqchip/meson-gpio: Convert meson_gpio_irq_controller::lock to 'raw_spinlock_t'
  irqchip/irq-pic32-evic: Add missing 'static' to internal function
  irqchip/loongarch-cpu: Fix return value of lpic_gsi_to_irq()

6 weeks agoMerge tag 'locking-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 4 Aug 2024 15:32:31 +0000 (08:32 -0700)]
Merge tag 'locking-urgent-2024-08-04' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "Two fixes for locking and jump labels:

   - Ensure that the atomic_cmpxchg() conditions are correct and
     evaluating to true on any non-zero value except 1. The missing
     check of the return value leads to inconsisted state of the jump
     label counter.

   - Add a missing type conversion in the paravirt spinlock code which
     makes loongson build again"

* tag 'locking-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  jump_label: Fix the fix, brown paper bags galore
  locking/pvqspinlock: Correct the type of "old" variable in pv_kick_node()

6 weeks agoarm: dts: arm: versatile-ab: Fix duplicate clock node name
Rob Herring (Arm) [Tue, 30 Jul 2024 21:00:30 +0000 (15:00 -0600)]
arm: dts: arm: versatile-ab: Fix duplicate clock node name

Commit 04f08ef291d4 ("arm/arm64: dts: arm: Use generic clock and
regulator nodenames") renamed nodes and created 2 "clock-24000000" nodes
(at different paths).

The kernel can't handle these duplicate names even though they are at
different paths.  Fix this by renaming one of the nodes to "clock-pclk".

This name is aligned with other Arm boards (those didn't have a known
frequency to use in the node name).

Fixes: 04f08ef291d4 ("arm/arm64: dts: arm: Use generic clock and regulator nodenames")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 weeks agoMerge tag '6.11-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 4 Aug 2024 15:18:40 +0000 (08:18 -0700)]
Merge tag '6.11-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - two reparse point fixes

 - minor cleanup

 - additional trace point (to help debug a recent problem)

* tag '6.11-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal version number
  smb: client: fix FSCTL_GET_REPARSE_POINT against NetApp
  smb3: add dynamic tracepoints for shutdown ioctl
  cifs: Remove cifs_aio_ctx
  smb: client: handle lack of FSCTL_GET_REPARSE_POINT support

6 weeks agoMerge tag 'media/v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Sun, 4 Aug 2024 15:12:33 +0000 (08:12 -0700)]
Merge tag 'media/v6.11-2' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

 - two Kconfig fixes

 - one fix for the UVC driver addressing probing time detection of a UVC
   custom controls

 - one fix related to PDF generation

* tag 'media/v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: v4l: Fix missing tabular column hint for Y14P format
  media: intel/ipu6: select AUXILIARY_BUS in Kconfig
  media: ipu-bridge: fix ipu6 Kconfig dependencies
  media: uvcvideo: Fix custom control mapping probing

6 weeks agonet/tcp: Disable TCP-AO static key after RCU grace period
Dmitry Safonov [Thu, 1 Aug 2024 00:13:28 +0000 (01:13 +0100)]
net/tcp: Disable TCP-AO static key after RCU grace period

The lifetime of TCP-AO static_key is the same as the last
tcp_ao_info. On the socket destruction tcp_ao_info ceases to be
with RCU grace period, while tcp-ao static branch is currently deferred
destructed. The static key definition is
: DEFINE_STATIC_KEY_DEFERRED_FALSE(tcp_ao_needed, HZ);

which means that if RCU grace period is delayed by more than a second
and tcp_ao_needed is in the process of disablement, other CPUs may
yet see tcp_ao_info which atent dead, but soon-to-be.
And that breaks the assumption of static_key_fast_inc_not_disabled().

See the comment near the definition:
> * The caller must make sure that the static key can't get disabled while
> * in this function. It doesn't patch jump labels, only adds a user to
> * an already enabled static key.

Originally it was introduced in commit eb8c507296f6 ("jump_label:
Prevent key->enabled int overflow"), which is needed for the atomic
contexts, one of which would be the creation of a full socket from a
request socket. In that atomic context, it's known by the presence
of the key (md5/ao) that the static branch is already enabled.
So, the ref counter for that static branch is just incremented
instead of holding the proper mutex.
static_key_fast_inc_not_disabled() is just a helper for such usage
case. But it must not be used if the static branch could get disabled
in parallel as it's not protected by jump_label_mutex and as a result,
races with jump_label_update() implementation details.

Happened on netdev test-bot[1], so not a theoretical issue:

[] jump_label: Fatal kernel bug, unexpected op at tcp_inbound_hash+0x1a7/0x870 [ffffffffa8c4e9b7] (eb 50 0f 1f 44 != 66 90 0f 1f 00)) size:2 type:1
[] ------------[ cut here ]------------
[] kernel BUG at arch/x86/kernel/jump_label.c:73!
[] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[] CPU: 3 PID: 243 Comm: kworker/3:3 Not tainted 6.10.0-virtme #1
[] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[] Workqueue: events jump_label_update_timeout
[] RIP: 0010:__jump_label_patch+0x2f6/0x350
...
[] Call Trace:
[]  <TASK>
[]  arch_jump_label_transform_queue+0x6c/0x110
[]  __jump_label_update+0xef/0x350
[]  __static_key_slow_dec_cpuslocked.part.0+0x3c/0x60
[]  jump_label_update_timeout+0x2c/0x40
[]  process_one_work+0xe3b/0x1670
[]  worker_thread+0x587/0xce0
[]  kthread+0x28a/0x350
[]  ret_from_fork+0x31/0x70
[]  ret_from_fork_asm+0x1a/0x30
[]  </TASK>
[] Modules linked in: veth
[] ---[ end trace 0000000000000000 ]---
[] RIP: 0010:__jump_label_patch+0x2f6/0x350

[1]: https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/696681/5-connect-deny-ipv6/stderr

Cc: stable@kernel.org
Fixes: 67fa83f7c86a ("net/tcp: Add static_key for TCP-AO")
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 weeks agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 3 Aug 2024 22:12:56 +0000 (15:12 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "One core change that reverts the double message print patch in sd.c
  (it was causing regressions on embedded systems).

  The rest are driver fixes in ufs, mpt3sas and mpi3mr"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: exynos: Don't resume FMP when crypto support is disabled
  scsi: mpt3sas: Avoid IOMMU page faults on REPORT ZONES
  scsi: mpi3mr: Avoid IOMMU page faults on REPORT ZONES
  scsi: ufs: core: Do not set link to OFF state while waking up from hibernation
  scsi: Revert "scsi: sd: Do not repeat the starting disk message"
  scsi: ufs: core: Fix deadlock during RTC update
  scsi: ufs: core: Bypass quick recovery if force reset is needed
  scsi: ufs: core: Check LSDBS cap when !mcq

6 weeks agoMerge tag 'xfs-6.11-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sat, 3 Aug 2024 16:09:25 +0000 (09:09 -0700)]
Merge tag 'xfs-6.11-fixes-1' of git://git./fs/xfs/xfs-linux

Pull xfs fixes from Chandan Babu:

 - Fix memory leak when corruption is detected during scrubbing parent
   pointers

 - Allow SECURE namespace xattrs to use reserved block pool to in order
   to prevent ENOSPC

 - Save stack space by passing tracepoint's char array to file_path()
   instead of another stack variable

 - Remove unused parameter in macro XFS_DQUOT_LOGRES

 - Replace comma with semicolon in a couple of places

* tag 'xfs-6.11-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: convert comma to semicolon
  xfs: convert comma to semicolon
  xfs: remove unused parameter in macro XFS_DQUOT_LOGRES
  xfs: fix file_path handling in tracepoints
  xfs: allow SECURE namespace xattrs to use reserved block pool
  xfs: fix a memory leak

6 weeks agoMerge tag 'parisc-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 3 Aug 2024 16:03:21 +0000 (09:03 -0700)]
Merge tag 'parisc-for-6.11-rc2' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc architecture fixes from Helge Deller:

 - fix unaligned memory accesses when calling BPF functions

 - adjust memory size constants to fix possible DMA corruptions

* tag 'parisc-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: fix a possible DMA corruption
  parisc: fix unaligned accesses in BPF

6 weeks agoruntime constants: deal with old decrepit linkers
Linus Torvalds [Sat, 3 Aug 2024 01:12:06 +0000 (18:12 -0700)]
runtime constants: deal with old decrepit linkers

The runtime constants linker script depended on documented linker
behavior [1]:

 "If an output section’s name is the same as the input section’s name
  and is representable as a C identifier, then the linker will
  automatically PROVIDE two symbols: __start_SECNAME and __stop_SECNAME,
  where SECNAME is the name of the section. These indicate the start
  address and end address of the output section respectively"

to just automatically define the symbol names for the bounds of the
runtime constant arrays.

It turns out that this isn't actually something we can rely on, with old
linkers not generating these automatic symbols.  It looks to have been
introduced in binutils-2.29 back in 2017, and we still support building
with versions all the way back to binutils-2.25 (from 2015).

And yes, Oleg actually seems to be using such ancient versions of
binutils.

So instead of depending on the implicit symbols from "section names
match and are representable C identifiers", just do this all manually.
It's not like it causes us any extra pain, we already have to do that
for all the other sections that we use that often have special
characters in them.

Reported-and-tested-by: Oleg Nesterov <oleg@redhat.com>
Link: https://sourceware.org/binutils/docs/ld/Input-Section-Example.html
Link: https://lore.kernel.org/all/20240802114518.GA20924@redhat.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 weeks agoMerge tag 'tags/fixes-media-uvc-20230722' of git://git.kernel.org/pub/scm/linux/kerne...
Hans Verkuil [Sat, 3 Aug 2024 09:01:04 +0000 (11:01 +0200)]
Merge tag 'tags/fixes-media-uvc-20230722' of git://git./linux/kernel/git/pinchartl/linux.git

uvcvideo v6.11 regression fix: fix custom control mapping probing

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
6 weeks agogve: Fix use of netif_carrier_ok()
Praveen Kaligineedi [Thu, 1 Aug 2024 20:56:19 +0000 (13:56 -0700)]
gve: Fix use of netif_carrier_ok()

GVE driver wrongly relies on netif_carrier_ok() to check the
interface administrative state when resources are being
allocated/deallocated for queue(s). netif_carrier_ok() needs
to be replaced with netif_running() for all such cases.

Administrative state is the result of "ip link set dev <dev>
up/down". It reflects whether the administrator wants to use
the device for traffic and the corresponding resources have
been allocated.

Fixes: 5f08cd3d6423 ("gve: Alloc before freeing when adjusting queues")
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240801205619.987396-1-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: pse-pd: tps23881: Fix the device ID check
Kyle Swenson [Wed, 31 Jul 2024 15:42:14 +0000 (15:42 +0000)]
net: pse-pd: tps23881: Fix the device ID check

The DEVID register contains two pieces of information: the device ID in
the upper nibble, and the silicon revision number in the lower nibble.
The driver should work fine with any silicon revision, so let's mask
that out in the device ID check.

Fixes: 20e6d190ffe1 ("net: pse-pd: Add TI TPS23881 PSE controller driver")
Signed-off-by: Kyle Swenson <kyle.swenson@est.tech>
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240731154152.4020668-1-kyle.swenson@est.tech
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agosctp: Fix null-ptr-deref in reuseport_add_sock().
Kuniyuki Iwashima [Wed, 31 Jul 2024 23:46:24 +0000 (16:46 -0700)]
sctp: Fix null-ptr-deref in reuseport_add_sock().

syzbot reported a null-ptr-deref while accessing sk2->sk_reuseport_cb in
reuseport_add_sock(). [0]

The repro first creates a listener with SO_REUSEPORT.  Then, it creates
another listener on the same port and concurrently closes the first
listener.

The second listen() calls reuseport_add_sock() with the first listener as
sk2, where sk2->sk_reuseport_cb is not expected to be cleared concurrently,
but the close() does clear it by reuseport_detach_sock().

The problem is SCTP does not properly synchronise reuseport_alloc(),
reuseport_add_sock(), and reuseport_detach_sock().

The caller of reuseport_alloc() and reuseport_{add,detach}_sock() must
provide synchronisation for sockets that are classified into the same
reuseport group.

Otherwise, such sockets form multiple identical reuseport groups, and
all groups except one would be silently dead.

  1. Two sockets call listen() concurrently
  2. No socket in the same group found in sctp_ep_hashtable[]
  3. Two sockets call reuseport_alloc() and form two reuseport groups
  4. Only one group hit first in __sctp_rcv_lookup_endpoint() receives
      incoming packets

Also, the reported null-ptr-deref could occur.

TCP/UDP guarantees that would not happen by holding the hash bucket lock.

Let's apply the locking strategy to __sctp_hash_endpoint() and
__sctp_unhash_endpoint().

[0]:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
CPU: 1 UID: 0 PID: 10230 Comm: syz-executor119 Not tainted 6.10.0-syzkaller-12585-g301927d2d2eb #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024
RIP: 0010:reuseport_add_sock+0x27e/0x5e0 net/core/sock_reuseport.c:350
Code: 00 0f b7 5d 00 bf 01 00 00 00 89 de e8 1b a4 ff f7 83 fb 01 0f 85 a3 01 00 00 e8 6d a0 ff f7 49 8d 7e 12 48 89 f8 48 c1 e8 03 <42> 0f b6 04 28 84 c0 0f 85 4b 02 00 00 41 0f b7 5e 12 49 8d 7e 14
RSP: 0018:ffffc9000b947c98 EFLAGS: 00010202
RAX: 0000000000000002 RBX: ffff8880252ddf98 RCX: ffff888079478000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000012
RBP: 0000000000000001 R08: ffffffff8993e18d R09: 1ffffffff1fef385
R10: dffffc0000000000 R11: fffffbfff1fef386 R12: ffff8880252ddac0
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f24e45b96c0(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffcced5f7b8 CR3: 00000000241be000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __sctp_hash_endpoint net/sctp/input.c:762 [inline]
 sctp_hash_endpoint+0x52a/0x600 net/sctp/input.c:790
 sctp_listen_start net/sctp/socket.c:8570 [inline]
 sctp_inet_listen+0x767/0xa20 net/sctp/socket.c:8625
 __sys_listen_socket net/socket.c:1883 [inline]
 __sys_listen+0x1b7/0x230 net/socket.c:1894
 __do_sys_listen net/socket.c:1902 [inline]
 __se_sys_listen net/socket.c:1900 [inline]
 __x64_sys_listen+0x5a/0x70 net/socket.c:1900
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f24e46039b9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 91 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f24e45b9228 EFLAGS: 00000246 ORIG_RAX: 0000000000000032
RAX: ffffffffffffffda RBX: 00007f24e468e428 RCX: 00007f24e46039b9
RDX: 00007f24e46039b9 RSI: 0000000000000003 RDI: 0000000000000004
RBP: 00007f24e468e420 R08: 00007f24e45b96c0 R09: 00007f24e45b96c0
R10: 00007f24e45b96c0 R11: 0000000000000246 R12: 00007f24e468e42c
R13: 00007f24e465a5dc R14: 0020000000000001 R15: 00007ffcced5f7d8
 </TASK>
Modules linked in:

Fixes: 6ba845740267 ("sctp: process sk_reuseport in sctp_get_port_local")
Reported-by: syzbot+e6979a5d2f10ecb700e4@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e6979a5d2f10ecb700e4
Tested-by: syzbot+e6979a5d2f10ecb700e4@syzkaller.appspotmail.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20240731234624.94055-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMAINTAINERS: update status of sky2 and skge drivers
Stephen Hemminger [Thu, 1 Aug 2024 16:28:42 +0000 (09:28 -0700)]
MAINTAINERS: update status of sky2 and skge drivers

The old SysKonnect NIc's are not used or actively maintained anymore.
My sky2 NIC's are all in box in back corner of attic.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://patch.msgid.link/20240801162930.212299-1-stephen@networkplumber.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>