git.kernel.dk Git - linux-block.git/log

Merge tag 'bcachefs-2025-04-24' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:

- Case insensitive directories now work

- Ciemap now correctly reports on unwritten pagecache data

- bcachefs tools 1.25.1 was incorrectly picking unaligned bucket sizes;
   fix journal and write path bugs this uncovered

And assorted smaller fixes...

* tag 'bcachefs-2025-04-24' of git://evilpiepirate.org/bcachefs: (24 commits)
  bcachefs: Rework fiemap transaction restart handling
  bcachefs: add fiemap delalloc extent detection
  bcachefs: refactor fiemap processing into extent helper and struct
  bcachefs: track current fiemap offset in start variable
  bcachefs: drop duplicate fiemap sync flag
  bcachefs: Fix btree_iter_peek_prev() at end of inode
  bcachefs: Make btree_iter_peek_prev() assert more precise
  bcachefs: Unit test fixes
  bcachefs: Print mount opts earlier
  bcachefs: unlink: casefold d_invalidate
  bcachefs: Fix casefold lookups
  bcachefs: Casefold is now a regular opts.h option
  bcachefs: Implement fileattr_(get|set)
  bcachefs: Allocator now copes with unaligned buckets
  bcachefs: Start copygc, rebalance threads earlier
  bcachefs: Refactor bch2_run_recovery_passes()
  bcachefs: bch2_copygc_wakeup()
  bcachefs: Fix ref leak in write_super()
  bcachefs: Change __journal_entry_close() assert to ERO
  bcachefs: Ensure journal space is block size aligned
  ...

bcachefs: Rework fiemap transaction restart handling

Restart handling in the previous patch was incorrect, so: move btree
operations into a separate helper, and run it with a lockrestart_do().

Additionally, clarify whether pagecache or the btree takes precedence.

Right now, the btree takes precedence: this is incorrect, but it's
needed to pass fstests. Add a giant comment explaining why.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: add fiemap delalloc extent detection

bcachefs currently populates fiemap data from the extents btree.
This works correctly when the fiemap sync flag is provided, but if
not, it skips all delalloc extents that have not yet been flushed.
This is because delalloc extents from buffered writes are first
stored as reservation in the pagecache, and only become resident in
the extents btree after writeback completes.

Update the fiemap implementation to process holes between extents by
scanning pagecache for data, via seek data/hole. If a valid data
range is found over a hole in the extent btree, fake up an extent
key and flag the extent as delalloc for reporting to userspace.

Note that this does not necessarily change behavior for the case
where there is dirty pagecache over already written extents, where
when in COW mode, writeback will allocate new blocks for the
underlying ranges. The existing behavior is consistent with btrfs
and it is recommended to use the sync flag for the most up to date
extent state from fiemap.

Signed-off-by: Brian Foster <bfoster@redhat.com>

bcachefs: refactor fiemap processing into extent helper and struct

The bulk of the loop in bch2_fiemap() involves processing the
current extent key from the iter, including following indirections
and trimming the extent size and such. This patch makes a few
changes to reduce the size of the loop and facilitate future changes
to support delalloc extents.

Define a new bch_fiemap_extent structure to wrap the bkey buffer
that holds the extent key to report to userspace along with
associated fiemap flags. Update bch2_fill_extent() to take the
bch_fiemap_extent as a param instead of the individual fields.
Finally, lift the bulk of the extent processing into a
bch2_fiemap_extent() helper that takes the current key and formats
the bch_fiemap_extent appropriately for the fill function.

No functional changes intended by this patch.

Signed-off-by: Brian Foster <bfoster@redhat.com>

bcachefs: track current fiemap offset in start variable

Signed-off-by: Brian Foster <bfoster@redhat.com>

bcachefs: drop duplicate fiemap sync flag

FIEMAP_FLAG_SYNC handling was deliberately moved into core code in
commit 45dd052e67ad ("fs: handle FIEMAP_FLAG_SYNC in fiemap_prep"),
released in kernel v5.8. Update bcachefs accordingly.

Signed-off-by: Brian Foster <bfoster@redhat.com>

bcachefs: Fix btree_iter_peek_prev() at end of inode

At the end of the inode, on an extents iterator, peek_slot() has to
advance to the next position to avoid returning a 0 size extent, which
is not allowed.

Changing iter->pos confuses peek_prev(), but we don't need to call
peek_slot() in this case.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Make btree_iter_peek_prev() assert more precise

The issue this assert is guarding against is that in
BTREE_ITER_filter_snapshots mode we only want to be iterating within a
single inode number - if we iterate into another inode number with keys
for a different snapshot tree, we'll loop arbitrarily long before
finding a key we can return.

This comes up in the unit tests, where we're using inode 0 for our test
keys.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Unit test fixes

The peek_end() tests expect an empty btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Print mount opts earlier

If we aren't mounting with the correct degraded option, it's helpful to
know that before we fail to mount degraded.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: unlink: casefold d_invalidate

casefolding results in additional aliases on lookup for the
non-casefolded names - these need invalidating on unlink.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix casefold lookups

Add casefolding to bch2_lookup_trans:

During the delay between when casefolding was written and when it was
merged, the main filesystem lookup path grew self healing - which meant
it was no longer using bch2_dirent_lookup_trans(), where casefolding on
lookups happens.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Casefold is now a regular opts.h option

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"The single core change is an obvious bug fix (and falls within the LF
  guidelines for patches from sanctioned entities). The other driver
  changes are a bit larger but likewise pretty obvious"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: mpi3mr: Add level check to control event logging
  scsi: ufs: core: Add NULL check in ufshcd_mcq_compl_pending_transfer()
  scsi: core: Clear flags for scsi_cmnd that did not complete
  scsi: ufs: Introduce quirk to extend PA_HIBERN8TIME for UFS devices
  scsi: ufs: qcom: Add quirks for Samsung UFS devices
  scsi: target: iscsi: Fix timeout on deleted connection
  scsi: mpi3mr: Reset the pending interrupt flag
  scsi: mpi3mr: Fix pending I/O counter
  scsi: ufs: mcq: Add NULL check in ufshcd_mcq_abort()

Merge tag 'landlock-6.15-rc4' of git://git./linux/kernel/git/mic/linux

Pull landlock fixes from Mickaël Salaün:
"Fix some Landlock audit issues, add related tests, and updates
  documentation"

* tag 'landlock-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  landlock: Update log documentation
  landlock: Fix documentation for landlock_restrict_self(2)
  landlock: Fix documentation for landlock_create_ruleset(2)
  selftests/landlock: Add PID tests for audit records
  selftests/landlock: Factor out audit fixture in audit_test
  landlock: Log the TGID of the domain creator
  landlock: Remove incorrect warning

Merge tag 'net-6.15-rc4' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"No fixes from any subtree.

  Current release - regressions:

   - net: fix the missing unlock for detached devices

  Previous releases - regressions:

   - sched: fix UAF vulnerability in HFSC qdisc

   - lwtunnel: disable BHs when required

   - mptcp: pm: defer freeing of MPTCP userspace path manager entries

   - tipc: fix NULL pointer dereference in tipc_mon_reinit_self()

   - eth: virtio-net: disable delayed refill when pausing rx

  Previous releases - always broken:

   - phylink: fix suspend/resume with WoL enabled and link down

   - eth:
       - mlx5: fix null-ptr-deref in mlx5_create_{inner_,}ttc_table()
       - xen-netfront: handle NULL returned by xdp_convert_buff_to_frame()
       - enetc: fix frame corruption on bpf_xdp_adjust_head/tail() and XDP_PASS
       - stmmac: fix dwmac1000 ptp timestamp status offset
       - pds_core: prevent possible adminq overflow/stuck condition

  Misc:

   - a bunch of MAINTAINERS updates"

* tag 'net-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (32 commits)
  net: stmmac: fix multiplication overflow when reading timestamp
  net: stmmac: fix dwmac1000 ptp timestamp status offset
  net: dp83822: Fix OF_MDIO config check
  pds_core: make wait_context part of q_info
  pds_core: Remove unnecessary check in pds_client_adminq_cmd()
  pds_core: handle unsupported PDS_CORE_CMD_FW_CONTROL result
  pds_core: Prevent possible adminq overflow/stuck condition
  net: dsa: mt7530: sync driver-specific behavior of MT7531 variants
  selftests/tc-testing: Add test for HFSC queue emptying during peek operation
  net_sched: hfsc: Fix a potential UAF in hfsc_dequeue() too
  net_sched: hfsc: Fix a UAF vulnerability in class handling
  selftests: mptcp: diag: use mptcp_lib_get_info_value
  mptcp: pm: Defer freeing of MPTCP userspace path manager entries
  net: ethernet: mtk_eth_soc: net: revise NETSYSv3 hardware configuration
  tipc: fix NULL pointer dereference in tipc_mon_reinit_self()
  virtio-net: disable delayed refill when pausing rx
  net: phy: leds: fix memory leak
  net: phylink: mac_link_(up|down)() clarifications
  net: phylink: fix suspend/resume with WoL enabled and link down
  net: lwtunnel: disable BHs when required
  ...

Merge tag 'v6.15-p5' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

- Revert acomp multibuffer tests which were buggy

- Fix off-by-one regression in new scomp code

- Lower quality setting on atmel-sha204a as it may not be random

* tag 'v6.15-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: atmel-sha204a - Set hwrng quality to lowest possible
  crypto: scomp - Fix off-by-one bug when calculating last page
  Revert "crypto: testmgr - Add multibuffer acomp testing"

Merge branch 'net-stmmac-fix-timestamp-snapshots-on-dwmac1000'

Alexis Lothore says:

====================
net: stmmac: fix timestamp snapshots on dwmac1000

this is the v2 of a small series containing two small fixes for the
timestamp snapshot feature on stmmac, especially on dwmac1000 version.
Those issues have been detected on a socfpga (Cyclone V) platform. They
kind of follow the big rework sent by Maxime at the end of last year to
properly split this feature support between different versions of the
DWMAC IP.

v1: https://lore.kernel.org/r/20250422-stmmac_ts-v1-0-b59c9f406041@bootlin.com

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
====================

Link: https://patch.msgid.link/20250423-stmmac_ts-v2-0-e2cf2bbd61b1@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: stmmac: fix multiplication overflow when reading timestamp

The current way of reading a timestamp snapshot in stmmac can lead to
integer overflow, as the computation is done on 32 bits. The issue has
been observed on a dwmac-socfpga platform returning chaotic timestamp
values due to this overflow. The corresponding multiplication is done
with a MUL instruction, which returns 32 bit values. Explicitly casting
the value to 64 bits replaced the MUL with a UMLAL, which computes and
returns the result on 64 bits, and so returns correctly the timestamps.

Prevent this overflow by explicitly casting the intermediate value to
u64 to make sure that the whole computation is made on u64. While at it,
apply the same cast on the other dwmac variant (GMAC4) method for
snapshot retrieval.

Fixes: 477c3e1f6363 ("net: stmmac: Introduce dwmac1000 timestamping operations")
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250423-stmmac_ts-v2-2-e2cf2bbd61b1@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: stmmac: fix dwmac1000 ptp timestamp status offset

When a PTP interrupt occurs, the driver accesses the wrong offset to
learn about the number of available snapshots in the FIFO for dwmac1000:
it should be accessing bits 29..25, while it is currently reading bits
19..16 (those are bits about the auxiliary triggers which have generated
the timestamps). As a consequence, it does not compute correctly the
number of available snapshots, and so possibly do not generate the
corresponding clock events if the bogus value ends up being 0.

Fix clock events generation by reading the correct bits in the timestamp
register for dwmac1000.

Fixes: 477c3e1f6363 ("net: stmmac: Introduce dwmac1000 timestamping operations")
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250423-stmmac_ts-v2-1-e2cf2bbd61b1@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dp83822: Fix OF_MDIO config check

When CONFIG_OF_MDIO is set to be a module the code block is not
compiled. Use the IS_ENABLED macro that checks for both built in as
well as module.

Fixes: 5dc39fd5ef35 ("net: phy: DP83822: Add ability to advertise Fiber connection")
Signed-off-by: Johannes Schneider <johannes.schneider@leica-geosystems.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250423044724.1284492-1-johannes.schneider@leica-geosystems.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'pds_core-updates-and-fixes'

Shannon Nelson says:

====================
pds_core: updates and fixes

This patchset has fixes for issues seen in recent internal testing
of error conditions and stress handling.

Note that the first patch in this series is a leftover from an
earlier patchset that was abandoned:
Link: https://lore.kernel.org/netdev/20250129004337.36898-2-shannon.nelson@amd.com/
====================

Link: https://patch.msgid.link/20250421174606.3892-1-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pds_core: make wait_context part of q_info

Make the wait_context a full part of the q_info struct rather
than a stack variable that goes away after pdsc_adminq_post()
is done so that the context is still available after the wait
loop has given up.

There was a case where a slow development firmware caused
the adminq request to time out, but then later the FW finally
finished the request and sent the interrupt. The handler tried
to complete_all() the completion context that had been created
on the stack in pdsc_adminq_post() but no longer existed.
This caused bad pointer usage, kernel crashes, and much wailing
and gnashing of teeth.

Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250421174606.3892-5-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pds_core: Remove unnecessary check in pds_client_adminq_cmd()

When the pds_core driver was first created there were some race
conditions around using the adminq, especially for client drivers.
To reduce the possibility of a race condition there's a check
against pf->state in pds_client_adminq_cmd(). This is problematic
for a couple of reasons:

1. The PDSC_S_INITING_DRIVER bit is set during probe, but not
   cleared until after everything in probe is complete, which
   includes creating the auxiliary devices. For pds_fwctl this
   means it can't make any adminq commands until after pds_core's
   probe is complete even though the adminq is fully up by the
   time pds_fwctl's auxiliary device is created.

2. The race conditions around using the adminq have been fixed
   and this path is already protected against client drivers
   calling pds_client_adminq_cmd() if the adminq isn't ready,
   i.e. see pdsc_adminq_post() -> pdsc_adminq_inc_if_up().

Fix this by removing the pf->state check in pds_client_adminq_cmd()
because invalid accesses to pds_core's adminq is already handled by
pdsc_adminq_post()->pdsc_adminq_inc_if_up().

Fixes: 10659034c622 ("pds_core: add the aux client API")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250421174606.3892-4-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pds_core: handle unsupported PDS_CORE_CMD_FW_CONTROL result

If the FW doesn't support the PDS_CORE_CMD_FW_CONTROL command
the driver might at the least print garbage and at the worst
crash when the user runs the "devlink dev info" devlink command.

This happens because the stack variable fw_list is not 0
initialized which results in fw_list.num_fw_slots being a
garbage value from the stack. Then the driver tries to access
fw_list.fw_names[i] with i >= ARRAY_SIZE and runs off the end
of the array.

Fix this by initializing the fw_list and by not failing
completely if the devcmd fails because other useful information
is printed via devlink dev info even if the devcmd fails.

Fixes: 45d76f492938 ("pds_core: set up device and adminq")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250421174606.3892-3-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pds_core: Prevent possible adminq overflow/stuck condition

The pds_core's adminq is protected by the adminq_lock, which prevents
more than 1 command to be posted onto it at any one time. This makes it
so the client drivers cannot simultaneously post adminq commands.
However, the completions happen in a different context, which means
multiple adminq commands can be posted sequentially and all waiting
on completion.

On the FW side, the backing adminq request queue is only 16 entries
long and the retry mechanism and/or overflow/stuck prevention is
lacking. This can cause the adminq to get stuck, so commands are no
longer processed and completions are no longer sent by the FW.

As an initial fix, prevent more than 16 outstanding adminq commands so
there's no way to cause the adminq from getting stuck. This works
because the backing adminq request queue will never have more than 16
pending adminq commands, so it will never overflow. This is done by
reducing the adminq depth to 16.

Fixes: 45d76f492938 ("pds_core: set up device and adminq")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250421174606.3892-2-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mt7530: sync driver-specific behavior of MT7531 variants

MT7531 standalone and MMIO variants found in MT7988 and EN7581 share
most basic properties. Despite that, assisted_learning_on_cpu_port and
mtu_enforcement_ingress were only applied for MT7531 but not for MT7988
or EN7581, causing the expected issues on MMIO devices.

Apply both settings equally also for MT7988 and EN7581 by moving both
assignments form mt7531_setup() to mt7531_setup_common().

This fixes unwanted flooding of packets due to unknown unicast
during DA lookup, as well as issues with heterogenous MTU settings.

Fixes: 7f54cc9772ce ("net: dsa: mt7530: split-off common parts from mt7531_setup")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Chester A. Unal <chester.a.unal@arinc9.com>
Link: https://patch.msgid.link/89ed7ec6d4fa0395ac53ad2809742bb1ce61ed12.1745290867.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net_sched-fix-uaf-vulnerability-in-hfsc-qdisc'

Cong Wang says:

====================
net_sched: Fix UAF vulnerability in HFSC qdisc

This patchset contains two bug fixes and a selftest for the first one
which we have a reliable reproducer, please check each patch
description for details.
====================

Link: https://patch.msgid.link/20250417184732.943057-1-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/tc-testing: Add test for HFSC queue emptying during peek operation

Add a selftest to exercise the condition where qdisc implementations
like netem or codel might empty the queue during a peek operation.
This tests the defensive code path in HFSC that checks the queue length
again after peeking to handle this case.

Based on the reproducer from Gerrard, improved by Jamal.

Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250417184732.943057-4-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net_sched: hfsc: Fix a potential UAF in hfsc_dequeue() too

Similarly to the previous patch, we need to safe guard hfsc_dequeue()
too. But for this one, we don't have a reliable reproducer.

Fixes: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 ("Linux-2.6.12-rc2")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250417184732.943057-3-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net_sched: hfsc: Fix a UAF vulnerability in class handling

This patch fixes a Use-After-Free vulnerability in the HFSC qdisc class
handling. The issue occurs due to a time-of-check/time-of-use condition
in hfsc_change_class() when working with certain child qdiscs like netem
or codel.

The vulnerability works as follows:
1. hfsc_change_class() checks if a class has packets (q.qlen != 0)
2. It then calls qdisc_peek_len(), which for certain qdiscs (e.g.,
   codel, netem) might drop packets and empty the queue
3. The code continues assuming the queue is still non-empty, adding
   the class to vttree
4. This breaks HFSC scheduler assumptions that only non-empty classes
   are in vttree
5. Later, when the class is destroyed, this can lead to a Use-After-Free

The fix adds a second queue length check after qdisc_peek_len() to verify
the queue wasn't emptied.

Fixes: 21f4d5cc25ec ("net_sched/hfsc: fix curve activation in hfsc_change_class()")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Reviewed-by: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250417184732.943057-2-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mptcp-pm-defer-freeing-userspace-pm-entries'

Matthieu Baerts says:

====================
mptcp: pm: Defer freeing userspace pm entries

Here are two unrelated fixes for MPTCP:

- Patch 1: free userspace PM entry with RCU helpers. A fix for v6.14.

- Patch 2: avoid a warning when running diag.sh selftest. A fix for
v6.15-rc1.
====================

Link: https://patch.msgid.link/20250421-net-mptcp-pm-defer-freeing-v1-0-e731dc6e86b9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: diag: use mptcp_lib_get_info_value

When running diag.sh in a loop, chk_dump_one will report the following
"grep: write error":

13 ....chk 2 cestab                                  [ OK ]
grep: write error
14 ....chk dump_one                                  [ OK ]
15 ....chk 2->0 msk in use after flush               [ OK ]
16 ....chk 2->0 cestab after flush                   [ OK ]

This error is caused by a broken pipe. When the output of 'ss' is processed
by grep, 'head -n 1' will exit immediately after getting the first line,
causing the subsequent pipe to close. At this time, if 'grep' is still
trying to write data to the closed pipe, it will trigger a SIGPIPE signal,
causing a write error.

One solution is not to use this problematic "head -n 1" command, but to use
mptcp_lib_get_info_value() helper defined in mptcp_lib.sh to get the value
of 'token'.

Fixes: ba2400166570 ("selftests: mptcp: add a test for mptcp_diag_dump_one")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Tested-by: Gang Yan <yangang@kylinos.cn>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250421-net-mptcp-pm-defer-freeing-v1-2-e731dc6e86b9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: Defer freeing of MPTCP userspace path manager entries

When path manager entries are deleted from the local address list, they
are first unlinked from the address list using list_del_rcu(). The
entries must not be freed until after the RCU grace period, but the
existing code immediately frees the entry.

Use kfree_rcu_mightsleep() and adjust sk_omem_alloc in open code instead
of using the sock_kfree_s() helper. This code path is only called in a
netlink handler, so the "might sleep" function is preferable to adding
a rarely-used rcu_head member to struct mptcp_pm_addr_entry.

Fixes: 88d097316371 ("mptcp: drop free_list for deleting entries")
Cc: stable@vger.kernel.org
Signed-off-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250421-net-mptcp-pm-defer-freeing-v1-1-e731dc6e86b9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Fix mis-uses of 'cc-option' for warning disablement

This was triggered by one of my mis-uses causing odd build warnings on
sparc in linux-next, but while figuring out why the "obviously correct"
use of cc-option caused such odd breakage, I found eight other cases of
the same thing in the tree.

The root cause is that 'cc-option' doesn't work for checking negative
warning options (ie things like '-Wno-stringop-overflow') because gcc
will silently accept options it doesn't recognize, and so 'cc-option'
ends up thinking they are perfectly fine.

And it all works, until you have a situation where _another_ warning is
emitted. At that point the compiler will go "Hmm, maybe the user
intended to disable this warning but used that wrong option that I
didn't recognize", and generate a warning for the unrecognized negative
option.

Which explains why we have several cases of this in the tree: the
'cc-option' test really doesn't work for this situation, but most of the
time it simply doesn't matter that ity doesn't work.

The reason my recently added case caused problems on sparc was pointed
out by Thomas Weißschuh: the sparc build had a previous explicit warning
that then triggered the new one.

I think the best fix for this would be to make 'cc-option' a bit smarter
about this sitation, possibly by adding an intentional warning to the
test case that then triggers the unrecognized option warning reliably.

But the short-term fix is to replace 'cc-option' with an existing helper
designed for this exact case: 'cc-disable-warning', which picks the
negative warning but uses the positive form for testing the compiler
support.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/all/20250422204718.0b4e3f81@canb.auug.org.au/
Explained-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

locking/local_lock: fix _Generic() matching of local_trylock_t

Michael Larabel reported [1] a nginx performance regression in v6.15-rc3
and bisected it to commit 51339d99c013 ("locking/local_lock, mm: replace
localtry_ helpers with local_trylock_t type")

The problem is the _Generic() usage with a default association that
masks the fact that "local_trylock_t *" association is not being
selected as expected.  Replacing the default with the only other
expected type "local_lock_t *" reveals the underlying problem:

  include/linux/local_lock_internal.h:174:26: error: ‘_Generic’ selector of type ‘__seg_gs local_lock_t *’ is not compatible with any association

The local_locki's are part of __percpu structures and thus the __percpu
attribute is needed to associate the type properly.  Add the attribute
and keep the default replaced to turn any further mismatches into
compile errors.

The failure to recognize local_try_lock_t in __local_lock_release()
means that a local_trylock[_irqsave]() operation will set tl->acquired
to 1 (there's no _Generic() part in the trylock code), but then
local_unlock[_irqrestore]() will not set tl->acquired back to 0, so
further trylock operations will always fail on the same cpu+lock, while
non-trylock operations continue to work - a lockdep_assert() is also not
being executed in the _Generic() part of local_lock() code.

This means consume_stock() and refill_stock() operations will fail
deterministically, resulting in taking the slow paths and worse
performance.

Fixes: 51339d99c013 ("locking/local_lock, mm: replace localtry_ helpers with local_trylock_t type")
Reported-by: Michael Larabel <Michael@phoronix.com>
Closes: https://www.phoronix.com/review/linux-615-nginx-regression/2 [1]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
"A small number of fixes:

   - virtgpu is exempt from reset shutdown fow now - a more complete fix
     is in the works

   - spec compliance fixes in:
       - virtio-pci cap commands
       - vhost_scsi_send_bad_target
       - virtio console resize

   - missing locking fix in vhost-scsi

   - virtio ring - a KCSAN false positive fix

   - VHOST_*_OWNER documentation fix"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost-scsi: Fix vhost_scsi_send_status()
  vhost-scsi: Fix vhost_scsi_send_bad_target()
  vhost-scsi: protect vq->log_used with vq->mutex
  vhost_task: fix vhost_task_create() documentation
  virtio_console: fix order of fields cols and rows
  virtio_console: fix missing byte order handling for cols and rows
  virtgpu: don't reset on shutdown
  virtio_ring: Fix data race by tagging event_triggered as racy for KCSAN
  vhost: fix VHOST_*_OWNER documentation
  virtio_pci: Use self group type for cap commands

net: ethernet: mtk_eth_soc: net: revise NETSYSv3 hardware configuration

Change hardware configuration for the NETSYSv3.
- Enable PSE dummy page mechanism for the GDM1/2/3
- Enable PSE drop mechanism when the WDMA Rx ring full
- Enable PSE no-drop mechanism for packets from the WDMA Tx
- Correct PSE free drop threshold
- Correct PSE CDMA high threshold

Fixes: 1953f134a1a8b ("net: ethernet: mtk_eth_soc: add NETSYS_V3 version support")
Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/b71f8fd9d4bb69c646c4d558f9331dd965068606.1744907886.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tipc: fix NULL pointer dereference in tipc_mon_reinit_self()

syzbot reported:

tipc: Node number set to 1055423674
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 3 UID: 0 PID: 6017 Comm: kworker/3:5 Not tainted 6.15.0-rc1-syzkaller-00246-g900241a5cc15 #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Workqueue: events tipc_net_finalize_work
RIP: 0010:tipc_mon_reinit_self+0x11c/0x210 net/tipc/monitor.c:719
...
RSP: 0018:ffffc9000356fb68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000003ee87cba
RDX: 0000000000000000 RSI: ffffffff8dbc56a7 RDI: ffff88804c2cc010
RBP: dffffc0000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000007
R13: fffffbfff2111097 R14: ffff88804ead8000 R15: ffff88804ead9010
FS:  0000000000000000(0000) GS:ffff888097ab9000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000f720eb00 CR3: 000000000e182000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tipc_net_finalize+0x10b/0x180 net/tipc/net.c:140
process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3238
process_scheduled_works kernel/workqueue.c:3319 [inline]
worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
kthread+0x3c2/0x780 kernel/kthread.c:464
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:153
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
...
RIP: 0010:tipc_mon_reinit_self+0x11c/0x210 net/tipc/monitor.c:719
...
RSP: 0018:ffffc9000356fb68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000003ee87cba
RDX: 0000000000000000 RSI: ffffffff8dbc56a7 RDI: ffff88804c2cc010
RBP: dffffc0000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000007
R13: fffffbfff2111097 R14: ffff88804ead8000 R15: ffff88804ead9010
FS:  0000000000000000(0000) GS:ffff888097ab9000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000f720eb00 CR3: 000000000e182000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

There is a racing condition between workqueue created when enabling
bearer and another thread created when disabling bearer right after
that as follow:

enabling_bearer                          | disabling_bearer
---------------                          | ----------------
tipc_disc_timeout()                      |
{                                        | bearer_disable()
...                                     | {
schedule_work(&tn->work);               |  tipc_mon_delete()
...                                     |  {
}                                        |   ...
                                         |   write_lock_bh(&mon->lock);
                                         |   mon->self = NULL;
                                         |   write_unlock_bh(&mon->lock);
                                         |   ...
                                         |  }
tipc_net_finalize_work()                 | }
{                                        |
...                                     |
tipc_net_finalize()                     |
{                                       |
  ...                                    |
  tipc_mon_reinit_self()                 |
  {                                      |
   ...                                   |
   write_lock_bh(&mon->lock);            |
   mon->self->addr = tipc_own_addr(net); |
   write_unlock_bh(&mon->lock);          |
   ...                                   |
  }                                      |
  ...                                    |
}                                       |
...                                     |
}                                        |

'mon->self' is set to NULL in disabling_bearer thread and dereferenced
later in enabling_bearer thread.

This commit fixes this issue by validating 'mon->self' before assigning
node address to it.

Reported-by: syzbot+ed60da8d686dc709164c@syzkaller.appspotmail.com
Fixes: 46cb01eeeb86 ("tipc: update mon's self addr when node addr generated")
Signed-off-by: Tung Nguyen <tung.quang.nguyen@est.tech>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250417074826.578115-1-tung.quang.nguyen@est.tech
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

crypto: atmel-sha204a - Set hwrng quality to lowest possible

According to the review by Bill Cox [1], the Atmel SHA204A random number
generator produces random numbers with very low entropy.

Set the lowest possible entropy for this chip just to be safe.

[1] https://www.metzdowd.com/pipermail/cryptography/2014-December/023858.html

Fixes: da001fb651b00e1d ("crypto: atmel-i2c - add support for SHA204A random number generator")
Cc: <stable@vger.kernel.org>
Signed-off-by: Marek Behún <kabel@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: scomp - Fix off-by-one bug when calculating last page

Fix off-by-one bug in the last page calculation for src and dst.

Reported-by: Nhat Pham <nphamcs@gmail.com>
Fixes: 2d3553ecb4e3 ("crypto: scomp - Remove support for some non-trivial SG lists")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

virtio-net: disable delayed refill when pausing rx

When pausing rx (e.g. set up xdp, xsk pool, rx resize), we call
napi_disable() on the receive queue's napi. In delayed refill_work, it
also calls napi_disable() on the receive queue's napi. When
napi_disable() is called on an already disabled napi, it will sleep in
napi_disable_locked while still holding the netdev_lock. As a result,
later napi_enable gets stuck too as it cannot acquire the netdev_lock.
This leads to refill_work and the pause-then-resume tx are stuck
altogether.

This scenario can be reproducible by binding a XDP socket to virtio-net
interface without setting up the fill ring. As a result, try_fill_recv
will fail until the fill ring is set up and refill_work is scheduled.

This commit adds virtnet_rx_(pause/resume)_all helpers and fixes up the
virtnet_rx_resume to disable future and cancel all inflights delayed
refill_work before calling napi_disable() to pause the rx.

Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250417072806.18660-2-minhquangbui99@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: leds: fix memory leak

A network restart test on a router led to an out-of-memory condition,
which was traced to a memory leak in the PHY LED trigger code.

The root cause is misuse of the devm API. The registration function
(phy_led_triggers_register) is called from phy_attach_direct, not
phy_probe, and the unregister function (phy_led_triggers_unregister)
is called from phy_detach, not phy_remove. This means the register and
unregister functions can be called multiple times for the same PHY
device, but devm-allocated memory is not freed until the driver is
unbound.

This also prevents kmemleak from detecting the leak, as the devm API
internally stores the allocated pointer.

Fix this by replacing devm_kzalloc/devm_kcalloc with standard
kzalloc/kcalloc, and add the corresponding kfree calls in the unregister
path.

Fixes: 3928ee6485a3 ("net: phy: leds: Add support for "link" trigger")
Fixes: 2e0bc452f472 ("net: phy: leds: add support for led triggers on phy link state change")
Signed-off-by: Hao Guan <hao.guan@siflower.com.cn>
Signed-off-by: Qingfang Deng <qingfang.deng@siflower.com.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250417032557.2929427-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phylink: mac_link_(up|down)() clarifications

As a result of an email from the fbnic author, I reviewed the phylink
documentation, and I have decided to clarify the wording in the
mac_link_(up|down)() kernel documentation as this was written from the
point of view of mvneta/mvpp2 and is misleading.

The documentation talks about forcing the link - indeed, this is what
is done in the mvneta and mvpp2 drivers but not at the physical layer
but the MACs idea, which has the effect of only allowing or stopping
packet flow at the MAC. This "link" needs to be controlled when using
a PHY or fixed link to start or stop packet flow at the MAC. However,
as the MAC and PCS are tightly integrated, if the MACs idea of the
link is forced down, it has the side effect that there is no way to
determine that the media link has come up - in this mode, the MAC must
be allowed to follow its built-in PCS so we can read the link state.

Frame the documentation in more generic terms, to avoid the thought
that the physical media link to the partner needs in some way to be
forced up or down with these calls; it does not. If that were to be
done, it would be a self-fulfilling prophecy - e.g. if the media link
goes down, then mac_link_down() will be called, and if the media link
is then placed into a forced down state, there is no possibility
that the media link will ever come up again - clearly this is a wrong
interpretation.

These methods are notifications to the MAC about what has happened to
the media link state - either from the PHY, or a PCS, or whatever
mechanism fixed-link is using. Thus, reword them to get away from
talking about changing link state to avoid confusion with media link
state.

This is not a change of any requirements of these methods.

Also, remove the obsolete references to EEE for these methods, we now
have the LPI functions for configuring the EEE parameters which
renders this redundant, and also makes the passing of "phy" to the
mac_link_up() function obsolete.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1u5Ah5-001GO1-7E@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phylink: fix suspend/resume with WoL enabled and link down

When WoL is enabled, we update the software state in phylink to
indicate that the link is down, and disable the resolver from
bringing the link back up.

On resume, we attempt to bring the overall state into consistency
by calling the .mac_link_down() method, but this is wrong if the
link was already down, as phylink strictly orders the .mac_link_up()
and .mac_link_down() methods - and this would break that ordering.

Fixes: f97493657c63 ("net: phylink: add suspend/resume support")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1u55Qf-0016RN-PA@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-6.15-rc3-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- subpage mode fixes:
     - access correct object (folio) when looking up bit offset
     - fix assertion condition for number of blocks per folio
     - fix upper boundary of locking range in hole punch

- zoned fixes:
     - fix potential deadlock caught by lockdep when zone reporting and
       device freeze run in parallel
     - fix zone write pointer mismatch and NULL pointer dereference when
       metadata are converted from DUP to RAID1

- fix error handling when reloc inode creation fails

- in tree-checker, unify error code for header level check

- block layer: add helpers to read zone capacity

* tag 'for-6.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: skip reporting zone for new block group
  block: introduce zone capacity helper
  btrfs: tree-checker: adjust error code for header level check
  btrfs: fix invalid inode pointer after failure to create reloc inode
  btrfs: zoned: return EIO on RAID1 block group write pointer mismatch
  btrfs: fix the ASSERT() inside GET_SUBPAGE_BITMAP()
  btrfs: avoid page_lockend underflow in btrfs_punch_hole_lock_range()
  btrfs: subpage: access correct object when reading bitmap start in subpage_calc_start_bit()

Merge tag 'integrity-6.15-rc3-fix' of https://github.com/linux-integrity/linux

Pull integrity fix from Roberto Sassu:
"One performance fix to avoid unnecessarily taking the inode lock"

* tag 'integrity-6.15-rc3-fix' of https://github.com/linux-integrity/linux:
ima: process_measurement() needlessly takes inode_lock() on MAY_READ

ima: process_measurement() needlessly takes inode_lock() on MAY_READ

On IMA policy update, if a measure rule exists in the policy,
IMA_MEASURE is set for ima_policy_flags which makes the violation_check
variable always true. Coupled with a no-action on MAY_READ for a
FILE_CHECK call, we're always taking the inode_lock().

This becomes a performance problem for extremely heavy read-only workloads.
Therefore, prevent this only in the case there's no action to be taken.

Signed-off-by: Frederick Lawler <fred@cloudflare.com>
Acked-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

net: lwtunnel: disable BHs when required

In lwtunnel_{output|xmit}(), dev_xmit_recursion() may be called in
preemptible scope for PREEMPT kernels. This patch disables BHs before
calling dev_xmit_recursion(). BHs are re-enabled only at the end, since
we must ensure the same CPU is used for both dev_xmit_recursion_inc()
and dev_xmit_recursion_dec() (and any other recursion levels in some
cases) in order to maintain valid per-cpu counters.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Closes: https://lore.kernel.org/netdev/CAADnVQJFWn3dBFJtY+ci6oN1pDFL=TzCmNbRgey7MdYxt_AP2g@mail.gmail.com/
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Closes: https://lore.kernel.org/netdev/m2h62qwf34.fsf@gmail.com/
Fixes: 986ffb3a57c5 ("net: lwtunnel: fix recursion loops")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250416160716.8823-1-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: selftests: initialize TCP header and skb payload with zero

Zero-initialize TCP header via memset() to avoid garbage values that
may affect checksum or behavior during test transmission.

Also zero-fill allocated payload and padding regions using memset()
after skb_put(), ensuring deterministic content for all outgoing
test packets.

Fixes: 3e1e58d64c3d ("net: add generic selftest support")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250416160125.2914724-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: microchip: force IRQ polling mode for lan88xx

With lan88xx based devices the lan78xx driver can get stuck in an
interrupt loop while bringing the device up, flooding the kernel log
with messages like the following:

lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped

Removing interrupt support from the lan88xx PHY driver forces the
driver to use polling instead, which avoids the problem.

The issue has been observed with Raspberry Pi devices at least since
4.14 (see [1], bug report for their downstream kernel), as well as
with Nvidia devices [2] in 2020, where disabling interrupts was the
vendor-suggested workaround (together with the claim that phylib
changes in 4.9 made the interrupt handling in lan78xx incompatible).

Iperf reports well over 900Mbits/sec per direction with client in
--dualtest mode, so there does not seem to be a significant impact on
throughput (lan88xx device connected via switch to the peer).

[1] https://github.com/raspberrypi/linux/issues/2447
[2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11

Link: https://lore.kernel.org/0901d90d-3f20-4a10-b680-9c978e04ddda@lunn.ch
Fixes: 792aec47d59d ("add microchip LAN88xx phy driver")
Signed-off-by: Fiona Klute <fiona.klute@gmx.de>
Cc: kernel-list@raspberrypi.com
Cc: stable@vger.kernel.org
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250416102413.30654-1-fiona.klute@gmx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'sched_ext-for-6.15-rc3-fixes' of git://git./linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

- Use kvzalloc() so that large exit_dump buffer allocations don't fail
   easily

- Remove cpu.weight / cpu.idle unimplemented warnings which are more
   annoying than helpful.

   This makes SCX_OPS_HAS_CGROUP_WEIGHT unnecessary. Mark it for
   deprecation

* tag 'sched_ext-for-6.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Mark SCX_OPS_HAS_CGROUP_WEIGHT for deprecation
  sched_ext: Remove cpu.weight / cpu.idle unimplemented warnings
  sched_ext: Use kvzalloc for large exit_dump allocation

Merge tag 'cgroup-for-6.15-rc3-fixes' of git://git./linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

- Fix compilation in CONFIG_LOCKDEP && !CONFIG_PROVE_RCU configurations

- Allow "cpuset_v2_mode" mount option for "cpuset" filesystem type to
   make life easier for android

* tag 'cgroup-for-6.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset-v1: Add missing support for cpuset_v2_mode
  cgroup: Fix compilation issue due to cgroup_mutex not being exported

Merge branch 'enetc-bug-fixes-for-bpf_xdp_adjust_head-and-bpf_xdp_adjust_tail'

Vladimir Oltean says:

====================
ENETC bug fixes for bpf_xdp_adjust_head() and bpf_xdp_adjust_tail()

It has been reported that on the ENETC driver, bpf_xdp_adjust_head()
and bpf_xdp_adjust_tail() are broken in combination with the XDP_PASS
verdict. I have constructed a series a simple XDP programs and tested
with various packet sizes and confirmed that this is the case.

Patch 3/3 fixes the core issue, which is that the sk_buff created on
XDP_PASS is created by the driver as if XDP never ran, but in fact the
geometry needs to be adjusted according to the delta applied by the
program on the original xdp_buff. It depends on commit 539c1fba1ac7
("xdp: add generic xdp_build_skb_from_buff()") which is not available in
"stable" but perhaps should be.

Patch 2/3 is a small refactor necessary for 3/3.

Patch 1/3 fixes a related issue I noticed, which is that
bpf_xdp_adjust_tail() with a positive offset works for linear XDP
buffers, but returns an error for non-linear ones, even if there is
plenty of space in the final page fragment.
====================

Link: https://patch.msgid.link/20250417120005.3288549-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: fix frame corruption on bpf_xdp_adjust_head/tail() and XDP_PASS

Vlatko Markovikj reported that XDP programs attached to ENETC do not
work well if they use bpf_xdp_adjust_head() or bpf_xdp_adjust_tail(),
combined with the XDP_PASS verdict. A typical use case is to add or
remove a VLAN tag.

The resulting sk_buff passed to the stack is corrupted, because the
algorithm used by the driver for XDP_PASS is to unwind the current
buffer pointer in the RX ring and to re-process the current frame with
enetc_build_skb() as if XDP hadn't run. That is incorrect because XDP
may have modified the geometry of the buffer, which we then are
completely unaware of. We are looking at a modified buffer with the
original geometry.

The initial reaction, both from me and from Vlatko, was to shop around
the kernel for code to steal that would calculate a delta between the
old and the new XDP buffer geometry, and apply that to the sk_buff too.
We noticed that veth and generic xdp have such code.

The headroom adjustment is pretty uncontroversial, but what turned out
severely problematic is the tailroom.

veth has this snippet:

__skb_put(skb, off); /* positive on grow, negative on shrink */

which on first sight looks decent enough, except __skb_put() takes an
"unsigned int" for the second argument, and the arithmetic seems to only
work correctly by coincidence. Second issue, __skb_put() contains a
SKB_LINEAR_ASSERT(). It's not a great pattern to make more widespread.
The skb may still be nonlinear at that point - it only becomes linear
later when resetting skb->data_len to zero.

To avoid the above, bpf_prog_run_generic_xdp() does this instead:

skb_set_tail_pointer(skb, xdp->data_end - xdp->data);
skb->len += off; /* positive on grow, negative on shrink */

which is more open-coded, uses lower-level functions and is in general a
bit too much to spread around in driver code.

Then there is the snippet:

if (xdp_buff_has_frags(xdp))
skb->data_len = skb_shinfo(skb)->xdp_frags_size;
else
skb->data_len = 0;

One would have expected __pskb_trim() to be the function of choice for
this task. But it's not used in veth/xdpgeneric because the extraneous
fragments were _already_ freed by bpf_xdp_adjust_tail() ->
bpf_xdp_frags_shrink_tail() -> ... -> __xdp_return() - the backing
memory for the skb frags and the xdp frags is the same, but they don't
keep individual references.

In fact, that is the biggest reason why this snippet cannot be reused
as-is, because ENETC temporarily constructs an skb with the original len
and the original number of frags. Because the extraneous frags are
already freed by bpf_xdp_adjust_tail() and returned to the page
allocator, it means the entire approach of using enetc_build_skb() is
questionable for XDP_PASS. To avoid that, one would need to elevate the
page refcount of all frags before calling bpf_prog_run_xdp() and drop it
after XDP_PASS.

There are other things that are missing in ENETC's handling of XDP_PASS,
like for example updating skb_shinfo(skb)->meta_len.

These are all handled correctly and cleanly in commit 539c1fba1ac7
("xdp: add generic xdp_build_skb_from_buff()"), added to net-next in
Dec 2024, and in addition might even be quicker that way. I have a very
strong preference towards backporting that commit for "stable", and that
is what is used to fix the handling bugs. It is way too messy to go
this deep into the guts of an sk_buff from the code of a device driver.

Fixes: d1b15102dd16 ("net: enetc: add support for XDP_DROP and XDP_PASS")
Reported-by: Vlatko Markovikj <vlatko.markovikj@etas.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20250417120005.3288549-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: refactor bulk flipping of RX buffers to separate function

This small snippet of code ensures that we do something with the array
of RX software buffer descriptor elements after passing the skb to the
stack. In this case, we see if the other half of the page is reusable,
and if so, we "turn around" the buffers, making them directly usable by
enetc_refill_rx_ring() without going to enetc_new_page().

We will need to perform this kind of buffer flipping from a new code
path, i.e. from XDP_PASS. Currently, enetc_build_skb() does it there
buffer by buffer, but in a subsequent change we will stop using
enetc_build_skb() for XDP_PASS.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20250417120005.3288549-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: register XDP RX queues with frag_size

At the time when bpf_xdp_adjust_tail() gained support for non-linear
buffers, ENETC was already generating this kind of geometry on RX, due
to its use of 2K half page buffers. Frames larger than 1472 bytes
(without FCS) are stored as multi-buffer, presenting a need for multi
buffer support to work properly even in standard MTU circumstances.

Allow bpf_xdp_frags_increase_tail() to know the allocation size of paged
data, so it can safely permit growing the tailroom of the buffer from
XDP programs.

Fixes: bf25146a5595 ("bpf: add frags support to the bpf_xdp_adjust_tail() API")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20250417120005.3288549-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xen-netfront: handle NULL returned by xdp_convert_buff_to_frame()

The function xdp_convert_buff_to_frame() may return NULL if it fails
to correctly convert the XDP buffer into an XDP frame due to memory
constraints, internal errors, or invalid data. Failing to check for NULL
may lead to a NULL pointer dereference if the result is used later in
processing, potentially causing crashes, data corruption, or undefined
behavior.

On XDP redirect failure, the associated page must be released explicitly
if it was previously retained via get_page(). Failing to do so may result
in a memory leak, as the pages reference count is not decremented.

Cc: stable@vger.kernel.org # v5.9+
Fixes: 6c5aa6fc4def ("xen networking: add basic XDP support for xen-netfront")
Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru>
Link: https://patch.msgid.link/20250417122118.1009824-1-sdl@nppct.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'maintainers-update-entries-for-s390-network-driver-files'

Simon Horman says:

====================
MAINTAINERS: Update entries for s390 network driver files

Update the entries for s390 network driver files to:

* Add include/linux/ism.h to MAINTAINERS
* Add s390 network driver files to the NETWORKING DRIVERS section

This is to aid developers, and tooling such as get_maintainer.pl alike
to CC patches to all the appropriate people and mailing lists. And is
in keeping with an ongoing effort for NETWORKING entries in MAINTAINERS
to more accurately reflect the way code is maintained.
====================

Link: https://patch.msgid.link/20250417-ism-maint-v1-0-b001be8545ce@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: Add s390 networking drivers to NETWORKING DRIVERS

These files are already correctly covered by the S390 NETWORKING DRIVERS
section. In practice commits for these drivers feed into the Networking
subsystem. So it seems appropriate to also list them under NETWORKING
DRIVERS.

This aids developers, and tooling such as get_maintainer.pl
alike to CC patches to all the appropriate people and mailing lists.
And is in keeping with an ongoing effort for NETWORKING entries
in MAINTAINERS to more accurately reflect the way code is maintained.

Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250417-ism-maint-v1-2-b001be8545ce@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: Add ism.h to S390 NETWORKING DRIVERS

ism.h appears to be part of s390 networking drivers
so add it to the corresponding section in MAINTAINERS.

This aids developers, and tooling such as get_maintainer.pl
alike to CC patches to the appropriate people and mailing lists.
And is in keeping with an ongoing effort for NETWORKING entries
in MAINTAINERS to more accurately reflect the way code is maintained.

Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250417-ism-maint-v1-1-b001be8545ce@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

scsi: mpi3mr: Add level check to control event logging

Ensure event logs are only generated when the debug logging level
MPI3_DEBUG_EVENT is enabled. This prevents unnecessary logging.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20250415101546.204018-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Add NULL check in ufshcd_mcq_compl_pending_transfer()

Add a NULL check for the returned hwq pointer by ufshcd_mcq_req_to_hwq().

This is similar to the fix in commit 74736103fb41 ("scsi: ufs: core: Fix
ufshcd_abort_one racing issue").

Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com>
Link: https://lore.kernel.org/r/20250412195909.315418-1-chenyuan0y@gmail.com
Fixes: ab248643d3d6 ("scsi: ufs: core: Add error handling for MCQ mode")
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Clear flags for scsi_cmnd that did not complete

Commands that have not been completed with scsi_done() do not clear the
SCMD_INITIALIZED flag and therefore will not be properly reinitialized.
Thus, the next time the scsi_cmnd structure is used, the command may
fail in scsi_cmd_runtime_exceeded() due to the old jiffies_at_alloc
value:

kernel: sd 16:0:1:84: [sdts] tag#405 timing out command, waited 720s
kernel: sd 16:0:1:84: [sdts] tag#405 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=66636s

Clear flags for commands that have not been completed by SCSI.

Fixes: 4abafdc4360d ("block: remove the initialize_rq_fn blk_mq_ops method")
Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
Link: https://lore.kernel.org/r/20250324084933.15932-2-a.kovaleva@yadro.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

net: fix the missing unlock for detached devices

The combined condition was left as is when we converted
from __dev_get_by_index() to netdev_get_by_index_lock().
There was no need to undo anything with the former, for
the latter we need an unlock.

Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250418015317.1954107-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mlx5-fix-null-dereference-and-memory-leak-in-ttc_table-creation'

Henry Martin says:

====================
net/mlx5: Fix NULL dereference and memory leak in ttc_table creation

This patch series addresses two issues in the
mlx5_create_inner_ttc_table() and mlx5_create_ttc_table() functions:

1. A potential NULL pointer dereference if mlx5_get_flow_namespace()
returns NULL.

2. A memory leak in the error path when ttc_type is invalid (default:
switch case).
====================

Link: https://patch.msgid.link/20250418023814.71789-1-bsdhenrymartin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Move ttc allocation after switch case to prevent leaks

Relocate the memory allocation for ttc table after the switch statement
that validates params->ns_type in both mlx5_create_inner_ttc_table() and
mlx5_create_ttc_table(). This ensures memory is only allocated after
confirming valid input, eliminating potential memory leaks when invalid
ns_type cases occur.

Fixes: 137f3d50ad2a ("net/mlx5: Support matching on l4_type for ttc_table")
Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250418023814.71789-3-bsdhenrymartin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Fix null-ptr-deref in mlx5_create_{inner_,}ttc_table()

Add NULL check for mlx5_get_flow_namespace() returns in
mlx5_create_inner_ttc_table() and mlx5_create_ttc_table() to prevent
NULL pointer dereference.

Fixes: 137f3d50ad2a ("net/mlx5: Support matching on l4_type for ttc_table")
Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250418023814.71789-2-bsdhenrymartin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bcachefs: Implement fileattr_(get|set)

inode_operations.fileattr_(get|set) didn't exist when the various flag
ioctls where implemented - but they do now, which means we can delete a
bunch of ioctl code in favor of standard VFS level wrappers.

Closes: https://lore.kernel.org/linux-bcachefs/7ltgrgqgfummyrlvw7hnfhnu42rfiamoq3lpcvrjnlyytldmzp@yazbhusnztqn/
Cc: Petr Vorel <pvorel@suse.cz>
Cc: Andrea Cervesato <andrea.cervesato@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Allocator now copes with unaligned buckets

We had a buggy release of bcachefs-tools that wasn't properly aligning
bucket sizes.

We can't ask users to reformat - and it's easy to teach the allocator to
make sure writes are properly aligned.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Start copygc, rebalance threads earlier

Previously, copygc and rebalance weren't started until the very end of
mounting, after all recvoery passes have finished.

But copygc really should be started earlier, since it may be needed for
allocations to make forward progress. Additionally, we've been seeing
occasional bug reports where starting the kthread fails due to a pending
signal - i.e. we're getting timed out by systemd (during a version
upgrade), but we're not seeing the signal until mount is about to
complete.

Additionally, we now have copygc/rebalance explicitly wait for
check_snapshots to complete (if being run); they require that for
snapshot_is_ancestor() in the data move path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Refactor bch2_run_recovery_passes()

Don't use a continue; this simplifies the next patch where
run_recovery_passes() will be responsible for waking up copygc and
rebalance at the appropriate time.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_copygc_wakeup()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix ref leak in write_super()

found with the new enumerated_ref code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Change __journal_entry_close() assert to ERO

We've got some reports of this happening in the wild, and need a bit
more info to debug it:

https://github.com/koverstreet/bcachefs/issues/854
https://www.reddit.com/r/bcachefs/comments/1k28kjm/surprise_soft_lockup/

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Ensure journal space is block size aligned

We don't require that bucket size is block size aligned (although it
should be!) - so we need to handle this in the journal code.

This fixes an assertion pop in jorunal_entry_close(), where the journal
entry overruns available space - after rounding it up to block size.

Fixes: https://github.com/koverstreet/bcachefs/issues/854
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Stricter checks on "key allowed in this btree"

Syzbot managed to come up with a filesystem where check/repair got
rather confused at finding a reflink pointer in the inodes btree.

Currently, the "key allowed in this btree" checks only apply at commit
time, not read time - for forwards compatibility. It seems this is too
loose.

Now, strict key type allowed checks apply:
- at commit time (no forward compatibility issues)
- for btree node pointers
- if it's a known btree, known key type, and the key type has the
"BKEY_TYPE_strict_btree_checks" flag.

This means we still have the option of using generic key types - e.g.
KEY_TYPE_error, KEY_TYPE_set - on more existing btrees in the future,
while most key types that are intended for only a specific btree get
stricter checks.

Reported-by: syzbot+baee8591f336cab0958b@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Error ratelimiting is no longer only during fsck

We now more often do repair automatically, without the user invoking
fsck - and sometimes that can involve fixing lots of errors, so let's
avoid flooding the dmesg log.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix null ptr deref in bch2_snapshot_tree_oldest_subvol()

Reported-by: syzbot+baee8591f336cab0958b@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix early startup error path

Don't set JOURNAL_running until we're also calling
journal_space_available() for the first time.

If JOURNAL_running is set, shutdown will write an empty journal entry -
but this will hit an assert in journal_entry_open() if we've never
called journal_space_available().

Reported-by: syzbot+53bb24d476ef8368a7f0@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

gcc-15: disable '-Wunterminated-string-initialization' entirely for now

I had left the warning around but as a non-fatal error to get my gcc-15
builds going, but fixed up some of the most annoying warning cases so
that it wouldn't be *too* verbose.

Because I like the _concept_ of the warning, even if I detested the
implementation to shut it up.

It turns out the implementation to shut it up is even more broken than I
thought, and my "shut up most of the warnings" patch just caused fatal
errors on gcc-14 instead.

I had tested with clang, but when I upgrade my development environment,
I try to do it on all machines because I hate having different systems
to maintain, and hadn't realized that gcc-14 now had issues.

The ACPI case is literally why I wanted to have a *type* that doesn't
trigger the warning (see commit d5d45a7f2619: "gcc-15: make
'unterminated string initialization' just a warning"), instead of
marking individual places as "__nonstring".

But gcc-14 doesn't like that __nonstring location that shut gcc-15 up,
because it's on an array of char arrays, not on one single array:

  drivers/acpi/tables.c:399:1: error: 'nonstring' attribute ignored on objects of type 'const char[][4]' [-Werror=attributes]
    399 | static const char table_sigs[][ACPI_NAMESEG_SIZE] __initconst __nonstring = {
        | ^~~~~~

and my attempts to nest it properly with a type had failed, because of
how gcc doesn't like marking the types as having attributes, only
symbols.

There may be some trick to it, but I was already annoyed by the bad
attribute design, now I'm just entirely fed up with it.

I wish gcc had a proper way to say "this type is a *byte* array, not a
string".

The obvious thing would be to distinguish between "char []" and an
explicitly signed "unsigned char []" (as opposed to an implicitly
unsigned char, which is typically an architecture-specific default, but
for the kernel is universal thanks to '-funsigned-char').

But any "we can typedef a 8-bit type to not become a string just because
it's an array" model would be fine.

But "__attribute__((nonstring))" is sadly not that sane model.

Reported-by: Chris Clayton <chris2553@googlemail.com>
Fixes: 4b4bd8c50f48 ("gcc-15: acpi: sprinkle random '__nonstring' crumbles around")
Fixes: d5d45a7f2619 ("gcc-15: make 'unterminated string initialization' just a warning")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Linux 6.15-rc3

gcc-15: work around sequence-point warning

The C sequence points are complicated things, and gcc-15 has apparently
added a warning for the case where an object is both used and modified
multiple times within the same sequence point.

That's a great warning.

Or rather, it would be a great warning, except gcc-15 seems to not
really be very exact about it, and doesn't notice that the modification
are to two entirely different members of the same object: the array
counter and the array entries.

So that seems kind of silly.

That said, the code that gcc complains about is unnecessarily
complicated, so moving the array counter update into a separate
statement seems like the most straightforward fix for these warnings:

  drivers/net/wireless/intel/iwlwifi/mld/d3.c: In function ‘iwl_mld_set_netdetect_info’:
  drivers/net/wireless/intel/iwlwifi/mld/d3.c:1102:66: error: operation on ‘netdetect_info->n_matches’ may be undefined [-Werror=sequence-point]
   1102 |                 netdetect_info->matches[netdetect_info->n_matches++] = match;
        |                                         ~~~~~~~~~~~~~~~~~~~~~~~~~^~

  drivers/net/wireless/intel/iwlwifi/mld/d3.c:1120:58: error: operation on ‘match->n_channels’ may be undefined [-Werror=sequence-point]
   1120 |                         match->channels[match->n_channels++] =
        |                                         ~~~~~~~~~~~~~~~~~^~

side note: the code at that second warning is actively buggy, and only
works on little-endian machines that don't do strict alignment checks.

The code casts an array of integers into an array of unsigned long in
order to use our bitmap iterators.  That happens to work fine on any
sane architecture, but it's still wrong.

This does *not* fix that more serious problem.  This only splits the two
assignments into two statements and fixes the compiler warning.  I need
to get rid of the new warnings in order to be able to actually do any
build testing.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gcc-15: add '__nonstring' markers to byte arrays

All of these cases are perfectly valid and good traditional C, but hit
by the "you're not NUL-terminating your byte array" warning.

And none of the cases want any terminating NUL character.

Mark them __nonstring to shut up gcc-15 (and in the case of the ak8974
magnetometer driver, I just removed the explicit array size and let gcc
expand the 3-byte and 6-byte arrays by one extra byte, because it was
the simpler change).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gcc-15: get rid of misc extra NUL character padding

This removes two cases of explicit NUL padding that now causes warnings
because of '-Wunterminated-string-initialization' being part of -Wextra
in gcc-15.

Gcc is being silly in this case when it says that it truncates a NUL
terminator, because in these cases there were _multiple_ NUL characters.

But we can get rid of the warning by just simplifying the two
initializers that trigger the warning for me, so this does exactly that.

I'm not sure why the power supply code did that odd

.attr_name = #_name "\0",

pattern: it was introduced in commit 2cabeaf15129 ("power: supply: core:
Cleanup power supply sysfs attribute list"), but that 'attr_name[]'
field is an explicitly sized character array in a statically initialized
variable, and a string initializer always has a terminating NUL _and_
statically initialized character arrays are zero-padded anyway, so it
really seems to be rather extraneous belt-and-suspenders.

The zero_uuid[16] initialization in drivers/md/bcache/super.c makes
perfect sense, but it isn't necessary for the same reasons, and not
worth the new gcc warning noise.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gcc-15: acpi: sprinkle random '__nonstring' crumbles around

This is not great: I'd much rather introduce a typedef that is a "ACPI
name byte buffer", and use that to mark these special 4-byte ACPI names
that do not use NUL termination.

But as noted in the previous commit ("gcc-15: make 'unterminated string
initialization' just a warning") gcc doesn't actually seem to support
that notion, so instead you have to just mark every single array
declaration individually.

So this is not pretty, but this gets rid of the bulk of the annoying
warnings during an allmodconfig build for me.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gcc-15: make 'unterminated string initialization' just a warning

gcc-15 enabling -Wunterminated-string-initialization in -Wextra by
default was done with the best intentions, but the warning is still
quite broken.

What annoys me about the warning is that this is a very traditional AND
CORRECT way to initialize fixed byte arrays in C:

unsigned char hex[16] = "0123456789abcdef";

and we use this all over the kernel.  And the warning is fine, but gcc
developers apparently never made a reasonable way to disable it.  As is
(sadly) tradition with these things.

Yes, there's "__attribute__((nonstring))", and we have a macro to make
that absolutely disgusting syntax more palatable (ie the kernel syntax
for that monstrosity is just "__nonstring").

But that attribute is misdesigned.  What you'd typically want to do is
tell the compiler that you are using a type that isn't a string but a
byte array, but that doesn't work at all:

warning: ‘nonstring’ attribute does not apply to types [-Wattributes]

and because of this fundamental mis-design, you then have to mark each
instance of that pattern.

This is particularly noticeable in our ACPI code, because ACPI has this
notion of a 4-byte "type name" that gets used all over, and is exactly
this kind of byte array.

This is a sad oversight, because the warning is useful, but really would
be so much better if gcc had also given a sane way to indicate that we
really just want a byte array type at a type level, not the broken "each
and every array definition" level.

So now instead of creating a nice "ACPI name" type using something like

typedef char acpi_name_t[4] __nonstring;

we have to do things like

char name[ACPI_NAMESEG_SIZE] __nonstring;

in every place that uses this concept and then happens to have the
typical initializers.

This is annoying me mainly because I think the warning _is_ a good
warning, which is why I'm not just turning it off in disgust.  But it is
hampered by this bad implementation detail.

[ And obviously I'm doing this now because system upgrades for me are
  something that happen in the middle of the release cycle: don't do it
  before or during travel, or just before or during the busy merge
  window period. ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'mm-hotfixes-stable-2025-04-19-21-24' of git://git./linux/kernel/git/akpm/mm

Pull misc hotfixes from Andrew Morton:
"16 hotfixes. 2 are cc:stable and the remainder address post-6.14
  issues or aren't considered necessary for -stable kernels.

  All patches are basically for MM although five are alterations to
  MAINTAINERS"

[ Basic counting skills are clearly not a strictly necessary requirement
  for kernel maintainers.     - Linus ]

* tag 'mm-hotfixes-stable-2025-04-19-21-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  MAINTAINERS: add section for locking of mm's and VMAs
  mm: vmscan: fix kswapd exit condition in defrag_mode
  mm: vmscan: restore high-cpu watermark safety in kswapd
  MAINTAINERS: add Pedro as reviewer to the MEMORY MAPPING section
  mm/memory: move sanity checks in do_wp_page() after mapcount vs. refcount stabilization
  mm, hugetlb: increment the number of pages to be reset on HVO
  writeback: fix false warning in inode_to_wb()
  docs: ABI: replace mcroce@microsoft.com with new Meta address
  mm/gup: fix wrongly calculated returned value in fault_in_safe_writeable()
  MAINTAINERS: add memory advice section
  MAINTAINERS: add mmap trace events to MEMORY MAPPING
  mm: memcontrol: fix swap counter leak from offline cgroup
  MAINTAINERS: add MM subsection for the page allocator
  MAINTAINERS: update SLAB ALLOCATOR maintainers
  fs/dax: fix folio splitting issue by resetting old folio order + _nr_pages
  mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()

Merge tag 'vfs-6.15-rc3.fixes.2' of git://git./linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

- Revert the hfs{plus} deprecation warning that's also included in this
   pull request. The commit introducing the deprecation warning resides
   rather early in this branch. So simply dropping it would've rebased
   all other commits which I decided to avoid. Hence the revert in the
   same branch

   [ Background - the deprecation warning discussion resulted in people
     stepping up, and so hfs{plus} will have a maintainer taking care of
     it after all..   - Linus ]

- Switch CONFIG_SYSFS_SYCALL default to n and decouple from
   CONFIG_EXPERT

- Fix an audit bug caused by changes to our kernel path lookup helpers
   this cycle. Audit needs the parent path even if the dentry it tried
   to look up is negative

- Ensure that the kernel path lookup helpers leave the passed in path
   argument clean when they return an error. This is consistent with all
   our other helpers

- Ensure that vfs_getattr_nosec() calls bdev_statx() so the relevant
   information is available to kernel consumers as well

- Don't set a timer and call schedule() if the timer will expire
   immediately in epoll

- Make netfs lookup tables with __nonstring

* tag 'vfs-6.15-rc3.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  Revert "hfs{plus}: add deprecation warning"
  fs: move the bdex_statx call to vfs_getattr_nosec
  netfs: Mark __nonstring lookup tables
  eventpoll: Set epoll timeout if it's in the future
  fs: ensure that *path_locked*() helpers leave passed path pristine
  fs: add kern_path_locked_negative()
  hfs{plus}: add deprecation warning
  Kconfig: switch CONFIG_SYSFS_SYCALL default to n

Merge tag 'i2c-for-6.15-rc3' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:

- Address translator: fix wrong include

- ChromeOS EC tunnel: fix potential NULL pointer dereference

* tag 'i2c-for-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: atr: Fix wrong include
i2c: cros-ec-tunnel: defer probe if parent EC is not present

Revert "hfs{plus}: add deprecation warning"

This reverts commit ddee68c499f76ae47c011549df5be53db0057402.

There's ongoing discussion about better maintenance of at least hfsplus.
Rever the deprecation warning for now.

Signed-off-by: Christian Brauner <brauner@kernel.org>

Merge tag 'trace-v6.15-rc2' of git://git./linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

- Initialize hash variables in ftrace subops logic

   The fix that simplified the ftrace subops logic opened a path where
   some variables could be used without being initialized, and done
   subtly where the compiler did not catch it. Initialize those
   variables to the EMPTY_HASH, which is the default hash.

- Reinitialize the hash pointers after they are freed

   Some of the hash pointers in the subop logic were freed but may still
   be referenced later. To prevent use-after-free bugs, initialize them
   back to the EMPTY_HASH.

- Free the ftrace hashes when they are replaced

   The fix that simplified the subops logic updated some hash pointers,
   but left the original hash that they were pointing to where they are
   no longer used. This caused a memory leak. Free the hashes that are
   pointed to by the pointers when they are replaced.

- Fix size initialization of ftrace direct function hash

   The ftrace direct function hash used by BPF initialized the hash size
   incorrectly. It checked the size of items to a hard coded 32, which
   made the hash bit size of 5. The hash size is supposed to be limited
   by the bit size of the hash, as the bitmask is allowed to be greater
   than 5. Rework the size check to first pass the number of elements to
   fls() and then compare that to FTRACE_HASH_MAX_BITS before allocating
   the hash.

- Fix format output of ftrace_graph_ent_entry event

   The field depth of the ftrace_graph_ent_entry event is of size 4 but
   the output showed it as unsigned long and use "%lu". Change it to
   unsigned int and use "%u" in the print format that is displayed to
   user space.

- Fix the trace event filter on strings

   Events can be filtered on numbers or string values. The return value
   checked from strncpy_from_kernel_nofault() and
   strncpy_from_user_nofault() was used to determine if reading the
   strings would fault or not. It would return fault if the value was
   non zero, which is basically meant that it was always considering the
   read as a fault.

- Add selftest to test trace event string filtering

   In order to catch the breakage of the string filtering, add a self
   test to make sure that it continues to work.

* tag 'trace-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: selftests: Add testing a user string to filters
  tracing: Fix filter string testing
  ftrace: Fix type of ftrace_graph_ent_entry.depth
  ftrace: fix incorrect hash size in register_ftrace_direct()
  ftrace: Free ftrace hashes after they are replaced in the subops code
  ftrace: Reinitialize hash to EMPTY_HASH after freeing
  ftrace: Initialize variables for ftrace_startup/shutdown_subops()

Merge tag 'nfsd-6.15-1' of git://git./linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

- v6.15 libcrc clean-up makes invalid configurations possible

- Fix a potential deadlock introduced during the v6.15 merge window

* tag 'nfsd-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: decrease sc_count directly if fail to queue dl_recall
nfs: add missing selections of CONFIG_CRC32

Merge tag 'rust-fixes-6.15' of git://git./linux/kernel/git/ojeda/linux

Pull rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:

   - Fix missing KASAN LLVM flags on first build (and fix spurious
     rebuilds) by skipping '--target'

   - Fix Make < 4.3 build error by using '$(pound)'

   - Fix UML build error by removing 'volatile' qualifier from io
     helpers

   - Fix UML build error by adding 'dma_{alloc,free}_attrs()' helpers

   - Clean gendwarfksyms warnings by avoiding to export '__pfx' symbols

   - Clean objtool warning by adding a new 'noreturn' function for
     1.86.0

   - Disable 'needless_continue' Clippy lint due to new 1.86.0 warnings

   - Add missing 'ffi' crate to 'generate_rust_analyzer.py'

  'pin-init' crate:

   - Import a couple fixes from upstream"

* tag 'rust-fixes-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
  rust: helpers: Add dma_alloc_attrs() and dma_free_attrs()
  rust: helpers: Remove volatile qualifier from io helpers
  rust: kbuild: use `pound` to support GNU Make < 4.3
  objtool/rust: add one more `noreturn` Rust function for Rust 1.86.0
  rust: kasan/kbuild: fix missing flags on first build
  rust: disable `clippy::needless_continue`
  rust: kbuild: Don't export __pfx symbols
  rust: pin-init: use Markdown autolinks in Rust comments
  rust: pin-init: alloc: restrict `impl ZeroableOption` for `Box` to `T: Sized`
  scripts: generate_rust_analyzer: Add ffi crate

Merge tag 'drm-fixes-2025-04-19' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Easter rc3 pull request, fixes in all the usuals, amdgpu, xe, msm,
  with some i915/ivpu/mgag200/v3d fixes, then a couple of bits in
  dma-buf/gem.

  Hopefully has no easter eggs in it.

  dma-buf:
   - Correctly decrement refcounter on errors

  gem:
   - Fix test for imported buffers

  amdgpu:
   - Cleaner shader sysfs fix
   - Suspend fix
   - Fix doorbell free ordering
   - Video caps fix
   - DML2 memory allocation optimization
   - HDP fix

  i915:
   - Fix DP DSC configurations that require 3 DSC engines per pipe

  xe:
   - Fix LRC address being written too late for GuC
   - Fix notifier vs folio deadlock
   - Fix race betwen dma_buf unmap and vram eviction
   - Fix debugfs handling PXP terminations unconditionally

  msm:
   - Display:
       - Fix to call dpu_plane_atomic_check_pipe() for both SSPPs in
         case of multi-rect
       - Fix to validate plane_state pointer before using it in
         dpu_plane_virtual_atomic_check()
       - Fix to make sure dereferencing dpu_encoder_phys happens after
         making sure it is valid in _dpu_encoder_trigger_start()
       - Remove the remaining intr_tear_rd_ptr which we initialized to
         -1 because NO_IRQ indices start from 0 now
   - GPU:
       - Fix IB_SIZE overflow

  ivpu:
   - Fix debugging
   - Fixes to frequency
   - Support firmware API 3.28.3
   - Flush jobs upon reset

  mgag200:
   - Set vblank start to correct values

  v3d:
   - Fix Indirect Dispatch"

* tag 'drm-fixes-2025-04-19' of https://gitlab.freedesktop.org/drm/kernel: (26 commits)
  drm/msm/a6xx+: Don't let IB_SIZE overflow
  drm/xe/pxp: do not queue unneeded terminations from debugfs
  drm/xe/dma_buf: stop relying on placement in unmap
  drm/xe/userptr: fix notifier vs folio deadlock
  drm/xe: Set LRC addresses before guc load
  drm/mgag200: Fix value in <VBLKSTR> register
  drm/gem: Internally test import_attach for imported objects
  drm/amdgpu: Use the right function for hdp flush
  drm/amd/display/dml2: use vzalloc rather than kzalloc
  drm/amdgpu: Add back JPEG to video caps for carrizo and newer
  drm/amdgpu: fix warning of drm_mm_clean
  drm/amd: Forbid suspending into non-default suspend states
  drm/amdgpu: use a dummy owner for sysfs triggered cleaner shaders v4
  drm/i915/dp: Check for HAS_DSC_3ENGINES while configuring DSC slices
  drm/i915/display: Add macro for checking 3 DSC engines
  dma-buf/sw_sync: Decrement refcount on error in sw_sync_ioctl_get_deadline()
  accel/ivpu: Add cmdq_id to job related logs
  accel/ivpu: Show NPU frequency in sysfs
  accel/ivpu: Fix the NPU's DPU frequency calculation
  accel/ivpu: Update FW Boot API to version 3.28.3
  ...

Merge tag 'drm-msm-fixes-2025-04-18' of https://gitlab.freedesktop.org/drm/msm into drm-fixes

Fixes for v6.15-rc3

Display:
- Fix to call dpu_plane_atomic_check_pipe() for both SSPPs in
  case of multi-rect
- Fix to validate plane_state pointer before using it in
  dpu_plane_virtual_atomic_check()
- Fix to make sure dereferencing dpu_encoder_phys happens after
  making sure it is valid in _dpu_encoder_trigger_start()
- Remove the remaining intr_tear_rd_ptr which we initialized
  to -1 because NO_IRQ indices start from 0 now

GPU:
- Fix IB_SIZE overflow

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://lore.kernel.org/r/CAF6AEGtVKXEVdzUzFWmQE8JmK3nx_hp+ynOd-5j3vnfcU-sgOA@mail.gmail.com

Merge tag 'drm-xe-fixes-2025-04-18' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Fix LRC address being written too late for GuC
- Fix notifier vs folio deadlock
- Fix race betwen dma_buf unmap and vram eviction
- Fix debugfs handling PXP terminations unconditionally

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/ndinq644zenywaaycxyfqqivsb2xer4z7err3dlpalbz33jfkm@ttabzsg6wnet

Merge tag '6.15-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Fix hard link lease key problem when close is deferred

- Revert the socket lockdep/refcount workarounds done in cifs.ko now
   that it is fixed at the socket layer

* tag '6.15-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  Revert "smb: client: fix TCP timers deadlock after rmmod"
  Revert "smb: client: Fix netns refcount imbalance causing leaks and use-after-free"
  smb3 client: fix open hardlink on deferred close file error

Revert "crypto: testmgr - Add multibuffer acomp testing"

This reverts commit 99585c2192cb1ce212876e82ef01d1c98c7f4699.

Remove the acomp multibuffer tests as they are buggy.

Reported-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

drm/msm/a6xx+: Don't let IB_SIZE overflow

IB_SIZE is only b0..b19.  Starting with a6xx gen3, additional fields
were added above the IB_SIZE.  Accidentially setting them can cause
badness.  Fix this by properly defining the CP_INDIRECT_BUFFER packet
and using the generated builder macro to ensure unintended bits are not
set.

v2: add missing type attribute for IB_BASE
v3: fix offset attribute in xml

Reported-by: Connor Abbott <cwabbott0@gmail.com>
Fixes: a83366ef19ea ("drm/msm/a6xx: add A640/A650 to gpulist")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/643396/