git.kernel.dk Git - linux-2.6-block.git/log

xfs: reject max_atomic_write mount option for no reflink

If the FS has no reflink, then atomic writes greater than 1x block are not
supported. As such, for no reflink it is pointless to accept setting
max_atomic_write when it cannot be supported, so reject max_atomic_write
mount option in this case.

It could be still possible to accept max_atomic_write option of size 1x
block if HW atomics are supported, so check for this specifically.

Fixes: 4528b9052731 ("xfs: allow sysadmins to specify a maximum atomic write limit at mount time")
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: disallow atomic writes on DAX

Atomic writes are not currently supported for DAX, but two problems exist:
- we may go down DAX write path for IOCB_ATOMIC, which does not handle
IOCB_ATOMIC properly
- we report non-zero atomic write limits in statx (for DAX inodes)

We may want atomic writes support on DAX in future, but just disallow for
now.

For this, ensure when IOCB_ATOMIC is set that we check the write size
versus the atomic write min and max before branching off to the DAX write
path. This is not strictly required for DAX, as we should not get this far
in the write path as FMODE_CAN_ATOMIC_WRITE should not be set.

In addition, due to reflink being supported for DAX, we automatically get
CoW-based atomic writes support being advertised. Remedy this by
disallowing atomic writes for a DAX inode for both sw and hw modes.

Reported-by: Darrick J. Wong <djwong@kernel.org>
Fixes: 9dffc58f2384 ("xfs: update atomic write limits")
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

fs/dax: Reject IOCB_ATOMIC in dax_iomap_rw()

The DAX write path does not support IOCB_ATOMIC, so reject it when set.

Suggested-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: remove XFS_IBULK_SAME_AG

Add a new field to struct xfs_ibulk to directly pass XFS_IWALK* flags,
and thus remove the need to indirect the SAME_AG flag through
XFS_IBULK*.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: fully decouple XFS_IBULK* flags from XFS_IWALK* flags

Fix up xfs_inumbers to now pass in the XFS_IBULK* flags into the flags
argument to xfs_inobt_walk, which expects the XFS_IWALK* flags.

Currently passing the wrong flags works for non-debug builds because
the only XFS_IWALK* flag has the same encoding as the corresponding
XFS_IBULK* flag, but in debug builds it can trigger an assert that no
incorrect flag is passed. Instead just extra the relevant flag.

Fixes: 5b35d922c52798 ("xfs: Decouple XFS_IBULK flags from XFS_IWALK flags")
Cc: <stable@vger.kernel.org> # v5.19
Reported-by: cen zhang <zzzccc427@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: fix frozen file system assert in xfs_trans_alloc

Commit 83a80e95e797 ("xfs: decouple xfs_trans_alloc_empty from
xfs_trans_alloc") move the place of the assert for a frozen file system
after the sb_start_intwrite call that ensures it doesn't run on frozen
file systems, and thus allows to incorrect trigger it.

Fix that by moving it back to where it belongs.

Fixes: 83a80e95e797 ("xfs: decouple xfs_trans_alloc_empty from xfs_trans_alloc")
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

soc/tegra: pmc: Ensure power-domains are in a known state

After commit 13a4b7fb6260 ("pmdomain: core: Leave powered-on genpds on
until late_initcall_sync") was applied, the Tegra210 Jetson TX1 board
failed to boot. Looking into this issue, before this commit was applied,
if any of the Tegra power-domains were in 'on' state when the kernel
booted, they were being turned off by the genpd core before any driver
had chance to request them. This was purely by luck and a consequence of
the power-domains being turned off earlier during boot. After this
commit was applied, any power-domains in the 'on' state are kept on for
longer during boot and therefore, may never transitioned to the off
state before they are requested/used. The hang on the Tegra210 Jetson
TX1 is caused because devices in some power-domains are accessed without
the power-domain being turned off and on, indicating that the
power-domain is not in a completely on state.

>From reviewing the Tegra PMC driver code, if a power-domain is in the
'on' state there is no guarantee that all the necessary clocks
associated with the power-domain are on and even if they are they would
not have been requested via the clock framework and so could be turned
off later. Some power-domains also have a 'clamping' register that needs
to be configured as well. In short, if a power-domain is already 'on' it
is difficult to know if it has been configured correctly. Given that the
power-domains happened to be switched off during boot previously, to
ensure that they are in a good known state on boot, fix this by
switching off any power-domains that are on initially when registering
the power-domains with the genpd framework.

Note that commit 05cfb988a4d0 ("soc/tegra: pmc: Initialise resets
associated with a power partition") updated the
tegra_powergate_of_get_resets() function to pass the 'off' to ensure
that the resets for the power-domain are in the correct state on boot.
However, now that we may power off a domain on boot, if it is on, it is
better to move this logic into the tegra_powergate_add() function so
that there is a single place where we are handling the initial state of
the power-domain.

Fixes: a38045121bf4 ("soc/tegra: pmc: Add generic PM domain support")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250731121832.213671-1-jonathanh@nvidia.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

ALSA: hda/realtek: Add Framework Laptop 13 (AMD Ryzen AI 300) to quirks

Framework Laptop 13 (AMD Ryzen AI 300) requires the same quirk for
headset detection as other Framework 13 models.

Signed-off-by: Christopher Eby <kreed@kreed.org>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20250810030006.9060-1-kreed@kreed.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>

rcu: Fix racy re-initialization of irq_work causing hangs

RCU re-initializes the deferred QS irq work everytime before attempting
to queue it. However there are situations where the irq work is
attempted to be queued even though it is already queued. In that case
re-initializing messes-up with the irq work queue that is about to be
handled.

The chances for that to happen are higher when the architecture doesn't
support self-IPIs and irq work are then all lazy, such as with the
following sequence:

1) rcu_read_unlock() is called when IRQs are disabled and there is a
   grace period involving blocked tasks on the node. The irq work
   is then initialized and queued.

2) The related tasks are unblocked and the CPU quiescent state
   is reported. rdp->defer_qs_iw_pending is reset to DEFER_QS_IDLE,
   allowing the irq work to be requeued in the future (note the previous
   one hasn't fired yet).

3) A new grace period starts and the node has blocked tasks.

4) rcu_read_unlock() is called when IRQs are disabled again. The irq work
   is re-initialized (but it's queued! and its node is cleared) and
   requeued. Which means it's requeued to itself.

5) The irq work finally fires with the tick. But since it was requeued
   to itself, it loops and hangs.

Fix this with initializing the irq work only once before the CPU boots.

Fixes: b41642c87716 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>

erofs: fix block count report when 48-bit layout is on

Fix incorrect shift order when combining the 48-bit block count.

Fixes: 2e1473d5195f ("erofs: implement 48-bit block addressing for unencoded inodes")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250807082019.3093539-1-hsiangkao@linux.alibaba.com

erofs: fix atomic context detection when !CONFIG_DEBUG_LOCK_ALLOC

Since EROFS handles decompression in non-atomic contexts due to
uncontrollable decompression latencies and vmap() usage, it tries
to detect atomic contexts and only kicks off a kworker on demand
in order to reduce unnecessary scheduling overhead.

However, the current approach is insufficient and can lead to
sleeping function calls in invalid contexts, causing kernel
warnings and potential system instability. See the stacktrace [1]
and previous discussion [2].

The current implementation only checks rcu_read_lock_any_held(),
which behaves inconsistently across different kernel configurations:

- When CONFIG_DEBUG_LOCK_ALLOC is enabled: correctly detects
  RCU critical sections by checking rcu_lock_map
- When CONFIG_DEBUG_LOCK_ALLOC is disabled: compiles to
  "!preemptible()", which only checks preempt_count and misses
  RCU critical sections

This patch introduces z_erofs_in_atomic() to provide comprehensive
atomic context detection:

1. Check RCU preemption depth when CONFIG_PREEMPTION is enabled,
   as RCU critical sections may not affect preempt_count but still
   require atomic handling

2. Always use async processing when CONFIG_PREEMPT_COUNT is disabled,
   as preemption state cannot be reliably determined

3. Fall back to standard preemptible() check for remaining cases

The function replaces the previous complex condition check and ensures
that z_erofs always uses (kthread_)work in atomic contexts to minimize
scheduling overhead and prevent sleeping in invalid contexts.

[1] Problem stacktrace
[ 61.266692] BUG: sleeping function called from invalid context at kernel/locking/rtmutex_api.c:510
[ 61.266702] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 107, name: irq/54-ufshcd
[ 61.266704] preempt_count: 0, expected: 0
[ 61.266705] RCU nest depth: 2, expected: 0
[ 61.266710] CPU: 0 UID: 0 PID: 107 Comm: irq/54-ufshcd Tainted: G W O 6.12.17 #1
[ 61.266714] Tainted: [W]=WARN, [O]=OOT_MODULE
[ 61.266715] Hardware name: schumacher (DT)
[ 61.266717] Call trace:
[ 61.266718] dump_backtrace+0x9c/0x100
[ 61.266727] show_stack+0x20/0x38
[ 61.266728] dump_stack_lvl+0x78/0x90
[ 61.266734] dump_stack+0x18/0x28
[ 61.266736] __might_resched+0x11c/0x180
[ 61.266743] __might_sleep+0x64/0xc8
[ 61.266745] mutex_lock+0x2c/0xc0
[ 61.266748] z_erofs_decompress_queue+0xe8/0x978
[ 61.266753] z_erofs_decompress_kickoff+0xa8/0x190
[ 61.266756] z_erofs_endio+0x168/0x288
[ 61.266758] bio_endio+0x160/0x218
[ 61.266762] blk_update_request+0x244/0x458
[ 61.266766] scsi_end_request+0x38/0x278
[ 61.266770] scsi_io_completion+0x4c/0x600
[ 61.266772] scsi_finish_command+0xc8/0xe8
[ 61.266775] scsi_complete+0x88/0x148
[ 61.266777] blk_mq_complete_request+0x3c/0x58
[ 61.266780] scsi_done_internal+0xcc/0x158
[ 61.266782] scsi_done+0x1c/0x30
[ 61.266783] ufshcd_compl_one_cqe+0x12c/0x438
[ 61.266786] __ufshcd_transfer_req_compl+0x2c/0x78
[ 61.266788] ufshcd_poll+0xf4/0x210
[ 61.266789] ufshcd_transfer_req_compl+0x50/0x88
[ 61.266791] ufshcd_intr+0x21c/0x7c8
[ 61.266792] irq_forced_thread_fn+0x44/0xd8
[ 61.266796] irq_thread+0x1a4/0x358
[ 61.266799] kthread+0x12c/0x138
[ 61.266802] ret_from_fork+0x10/0x20

[2] https://lore.kernel.org/r/58b661d0-0ebb-4b45-a10d-c5927fb791cd@paulmck-laptop

Signed-off-by: Junli Liu <liujunli@lixiang.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250805011957.911186-1-liujunli@lixiang.com
[ Gao Xiang: Use the original trace in v1. ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>

erofs: Do not select tristate symbols from bool symbols

The EROFS filesystem has many configurable options, controlled through
boolean Kconfig symbols.  When enabled, these options may need to enable
additional library functionality elsewhere.  Currently this is done by
selecting the symbol for the additional functionality.  However, if
EROFS_FS itself is modular, and the target symbol is a tristate symbol,
the additional functionality is always forced built-in.

Selecting tristate symbols from a tristate symbol does keep modular
transitivity.  Hence fix this by moving selects of tristate symbols to
the main EROFS_FS symbol.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/da1b899e511145dd43fd2d398f64b2e03c6a39e7.1753879351.git.geert+renesas@glider.be
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>

erofs: Fallback to normal access if DAX is not supported on extra device

If using multiple devices, we should check if the extra device support
DAX instead of checking the primary device when deciding if to use DAX
to access a file.

If an extra device does not support DAX we should fallback to normal
access otherwise the data on that device will be inaccessible.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Friendy Su <friendy.su@sony.com>
Reviewed-by: Jacky Cao <jacky.cao@sony.com>
Reviewed-by: Daniel Palmer <daniel.palmer@sony.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://lore.kernel.org/r/20250804082030.3667257-2-Yuezhang.Mo@sony.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>

ASoC: tas2781: Fix spelling mistake "dismatch" -> "mismatch"

There is a spelling mistake (or neologism of dis and match) in a
dev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://patch.msgid.link/20250808104943.829668-1-colin.i.king@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: rt1320: fix random cycle mute issue

This patch fixed the random cycle mute issue that occurs during long-time playback.

Signed-off-by: Shuming Fan <shumingf@realtek.com>
Link: https://patch.msgid.link/20250807092432.997989-1-shumingf@realtek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: rt721: fix FU33 Boost Volume control not working

This patch fixed FU33 Boost Volume control not working.

Signed-off-by: Shuming Fan <shumingf@realtek.com>
Link: https://patch.msgid.link/20250808055706.1110766-1-shumingf@realtek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: generic: tidyup standardized ASoC menu for generic

commit acc84d15e45393fb ("ASoC: generic: Standardize ASoC menu")
standardized ASoC generic menu. Then, it moved generic menu position
under SoC group. It should be kept generic position. Tidyup it.

Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/87v7n0c9d0.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: codec: sma1307: replace spelling mistake with new error message

There is a spelling mistake in a failure message, replace the
message with something a little more meaningful.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://patch.msgid.link/20250808105324.829883-1-colin.i.king@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: codecs: tx-macro: correct tx_macro_component_drv name

We already have a component driver named "RX-MACRO", which is
lpass-rx-macro.c. The tx macro component driver's name should
be "TX-MACRO" accordingly. Fix it.

Cc: Srinivas Kandagatla <srini@kernel.org>
Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20250806140030.691477-1-alexey.klimov@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: fsl_sai: replace regmap_write with regmap_update_bits

Use the regmap_write() for software reset in fsl_sai_config_disable would
cause the FSL_SAI_CSR_BCE bit to be cleared. Refer to
commit 197c53c8ecb34 ("ASoC: fsl_sai: Don't disable bitclock for i.MX8MP")
FSL_SAI_CSR_BCE should not be cleared. So need to use regmap_update_bits()
instead of regmap_write() for these bit operations.

Fixes: dc78f7e59169d ("ASoC: fsl_sai: Force a software reset when starting in consumer mode")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Link: https://patch.msgid.link/20250807020318.2143219-1-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>

smb: client: fix race with concurrent opens in rename(2)

Besides sending the rename request to the server, the rename process
also involves closing any deferred close, waiting for outstanding I/O
to complete as well as marking all existing open handles as deleted to
prevent them from deferring closes, which increases the race window
for potential concurrent opens on the target file.

Fix this by unhashing the dentry in advance to prevent any concurrent
opens on the target.

Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: fix race with concurrent opens in unlink(2)

According to some logs reported by customers, CIFS client might end up
reporting unlinked files as existing in stat(2) due to concurrent
opens racing with unlink(2).

Besides sending the removal request to the server, the unlink process
could involve closing any deferred close as well as marking all
existing open handles as deleted to prevent them from deferring
closes, which increases the race window for potential concurrent
opens.

Fix this by unhashing the dentry in cifs_unlink() to prevent any
subsequent opens. Any open attempts, while we're still unlinking,
will block on parent's i_rwsem.

Reported-by: Jay Shin <jaeshin@redhat.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>

Linux 6.17-rc1

Merge tag 'turbostat-2025.09.09' of git://git./linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown:
"tools/power turbostat: version 2025.09.09

   - Probe and display L3 Cache topology

   - Add ability to average an added counter (useful for pre-integrated
     "counters", such as Watts)

   - Break the limit of 64 built-in counters

   - Assorted bug fixes and minor feature tweaks"

* tag 'turbostat-2025.09.09' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: version 2025.09.09
  tools/power turbostat: Handle non-root legacy-uncore sysfs permissions
  tools/power turbostat: standardize PER_THREAD_PARAMS
  tools/power turbostat: Fix DMR support
  tools/power turbostat: add format "average" for external attributes
  tools/power turbostat: delete GET_PKG()
  tools/power turbostat: probe and display L3 cache topology
  tools/power turbostat: Support more than 64 built-in-counters
  tools/power turbostat.8: Document Totl%C0, Any%C0, GFX%C0, CPUGFX% columns
  tools/power turbostat: Fix bogus SysWatt for forked program
  tools/power turbostat: Handle cap_get_proc() ENOSYS
  tools/power turbostat: Fix build with musl
  tools/power turbostat: verify arguments to params --show and --hide
  tools/power turbostat: regression fix: --show C1E%

Merge tag 'smp_urgent_for_v6.17_rc1' of git://git./linux/kernel/git/tip/tip

Pull smp fixes from Borislav Petkov:

- Remove an obsolete comment and fix spelling

* tag 'smp_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu: Remove obsolete comment from takedown_cpu()
smp: Fix spelling in on_each_cpu_cond_mask()'s doc-comment

Merge tag 'irq_urgent_for_v6.17_rc1' of git://git./linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

- Fix a wrong ioremap size in mvebu-gicp

- Remove yet another compile-test case for a driver which needs an
   additional dependency

- Fix a lock inversion scenario in the IRQ unit test suite

- Remove an impossible flag situation in gic-v5

- Do not iounmap resources in gic-v5 which are managed by devm

- Make sure stale, left-over interrupts in mvebu-gicp are cleared on
   driver init

- Fix a reference counting mishap in msi-lib

- Fix a dereference-before-null-ptr-check case in the riscv-imsic
   irqchip driver

* tag 'irq_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mvebu-gicp: Use resource_size() for ioremap()
  irqchip: Build IMX_MU_MSI only on ARM
  genirq/test: Resolve irq lock inversion warnings
  irqchip/gic-v5: Remove IRQD_RESEND_WHEN_IN_PROGRESS for ITS IRQs
  irqchip/gic-v5: iwb: Fix iounmap probe failure path
  irqchip/mvebu-gicp: Clear pending interrupts on init
  irqchip/msi-lib: Fix fwnode refcount in msi_lib_irq_domain_select()
  irqchip/riscv-imsic: Don't dereference before NULL pointer check

Merge tag 'x86_urgent_for_v6.17_rc1' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- Fix an interrupt vector setup race which leads to a non-functioning
   device

- Add new Intel CPU models *and* a family: 0x12. Finally. Yippie! :-)

* tag 'x86_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Plug vector setup race
  x86/cpu: Add new Intel CPU model numbers for Wildcatlake and Novalake

Merge tag 'locking_urgent_for_v6.17_rc1' of git://git./linux/kernel/git/tip/tip

Pull locking fix from Borislav Petkov:

- Prevent a futex hash leak due to different mm lifetimes

* tag 'locking_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Move futex cleanup to __mmdrop()

tools/power turbostat: version 2025.09.09

Probe and display L3 Cache topology
Add ability to average an added counter
(useful for pre-integrated "counters", such as Watts)
Break the limit of 64 built-in counters.
Assorted bug fixes and minor feature tweaks

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: Handle non-root legacy-uncore sysfs permissions

/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/
may be readable by all, but
/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/current_freq_khz
may be readable only by root.

Non-root turbostat users see complaints in this scenario.

Fail probe of the interface if we can't read current_freq_khz.

Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Original-patch-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: standardize PER_THREAD_PARAMS

use a macro for PER_THREAD_PARAMS to make adding one later more clear.

no functional change

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: Fix DMR support

Together with the RAPL MSRs, there are more MSRs gone on DMR, including
PLR (Perf Limit Reasons), and IRTL (Package cstate Interrupt Response
Time Limit) MSRs. The configurable TDP info should also be retrieved
from TPMI based Intel Speed Select Technology feature.

Remove the access of these MSRs for DMR. Improve the DMR platform
feature table to make it more readable at the same time.

Fixes: 83075bd59de2 ("tools/power turbostat: Add initial support for DMR")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: add format "average" for external attributes

External atributes with format "raw" are not printed in summary lines
for nodes/packages (or with option -S). The new format "average"
behaves like "raw" but also adds the summary data

Signed-off-by: Michael Hebenstreit <michael.hebenstreit@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: delete GET_PKG()

pkg_base[pkg_id] is a simple array of structure pointers,
let the compiler treat it that way.

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: probe and display L3 cache topology

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: Support more than 64 built-in-counters

We have out-grown the ability to use a 64-bit memory location
to inventory every possible built-in counter.
Leverage the the CPU_SET(3) macros to break this barrier.

Also, break the Joules & Watts counters into two,
since we can no longer 'or' them together...

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat.8: Document Totl%C0, Any%C0, GFX%C0, CPUGFX% columns

Explain the meaning of the Totl%C0, Any%C0, GFX%C0, CPUGFX% columns.

Signed-off-by: Len Brown <len.brown@intel.com>

Merge tag 'tty-6.16-rc1-2' of git://git./linux/kernel/git/gregkh/tty

Pull TTY fix from Greg KH:
"Here is a single revert of one of the previous patches that went in
  the last tty/serial merge that is breaking userspace on some platforms
  (specifically powerpc, probably a few others.)

  It accidentially changed the ioctl values of some tty ioctls, which
  breaks xorg.

  The revert has been in linux-next all this week with no reported
  issues"

* tag 'tty-6.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "tty: vt: use _IO() to define ioctl numbers"

Merge tag 'efi-next-for-v6.17' of git://git./linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:

- Expose the OVMF firmware debug log via sysfs

- Lower the default log level for the EFI stub to avoid corrupting any
   splash screens with unimportant diagnostic output

* tag 'efi-next-for-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: add API doc entry for ovmf_debug_log
  efistub: Lower default log level
  efi: add ovmf debug log driver

Merge tag 'bpf-fixes' of git://git./linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

- Fix memory leak of bpf_scc_info objects (Eduard Zingerman)

- Fix a regression in the 'perf' tool caused by moving UID filtering to
   BPF (Ilya Leoshkevich)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  perf bpf-filter: Enable events manually
  libbpf: Add the ability to suppress perf event enablement
  bpf: Fix memory leak of bpf_scc_info objects

Merge tag 'block-6.17-20250808' of git://git.kernel.dk/linux

Pull more block updates from Jens Axboe:

- MD pull request via Yu:
      - mddev null-ptr-dereference fix, by Erkun
      - md-cluster fail to remove the faulty disk regression fix, by
        Heming
      - minor cleanup, by Li Nan and Jinchao
      - mdadm lifetime regression fix reported by syzkaller, by Yu Kuai

- MD pull request via Christoph
      - add support for getting the FDP featuee in fabrics passthru path
        (Nitesh Shetty)
      - add capability to connect to an administrative controller
        (Kamaljit Singh)
      - fix a leak on sgl setup error (Keith Busch)
      - initialize discovery subsys after debugfs is initialized
        (Mohamed Khalfella)
      - fix various comment typos (Bjorn Helgaas)
      - remove unneeded semicolons (Jiapeng Chong)

- nvmet debugfs ordering issue fix

- Fix UAF in the tag_set in zloop

- Ensure sbitmap shallow depth covers entire set

- Reduce lock roundtrips in io context lookup

- Move scheduler tags alloc/free out of elevator and freeze lock, to
   fix some lockdep found issues

- Improve robustness of queue limits checking

- Fix a regression with IO priorities, if no io context exists

* tag 'block-6.17-20250808' of git://git.kernel.dk/linux: (26 commits)
  lib/sbitmap: make sbitmap_get_shallow() internal
  lib/sbitmap: convert shallow_depth from one word to the whole sbitmap
  nvmet: exit debugfs after discovery subsystem exits
  block, bfq: Reorder struct bfq_iocq_bfqq_data
  md: make rdev_addable usable for rcu mode
  md/raid1: remove struct pool_info and related code
  md/raid1: change r1conf->r1bio_pool to a pointer type
  block: ensure discard_granularity is zero when discard is not supported
  zloop: fix KASAN use-after-free of tag set
  block: Fix default IO priority if there is no IO context
  nvme: fix various comment typos
  nvme-auth: remove unneeded semicolon
  nvme-pci: fix leak on sgl setup error
  nvmet: initialize discovery subsys after debugfs is initialized
  nvme: add capability to connect to an administrative controller
  nvmet: add support for FDP in fabrics passthru path
  md: rename recovery_cp to resync_offset
  md/md-cluster: handle REMOVE message earlier
  md: fix create on open mddev lifetime regression
  block: fix potential deadlock while running nr_hw_queue update
  ...

Merge tag 'io_uring-6.17-20250808' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

- Allow vectorized payloads for send/send-zc - like sendmsg, but
   without the hassle of a msghdr.

- Fix for an integer wrap that should go to stable, spotted by syzbot.
   Nothing alarming here, as you need to be root to hit this.
   Nevertheless, it should get fixed.

   FWIW, kudos to the syzbot crew for having much nicer reproducers now,
   and with nicely annotated source code as well. This is particularly
   useful as syzbot uses the raw interface rather than liburing,
   historically it's been difficult to turn a syzbot reproducer into a
   meaningful test case. With the recent changes, not true anymore!

* tag 'io_uring-6.17-20250808' of git://git.kernel.dk/linux:
  io_uring/memmap: cast nr_pages to size_t before shifting
  io_uring/net: Allow to do vectorized send

Merge tag 'spi-fix-v6.17-merge-window' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"There's one fix here for an issue with the CS42L43 where we were
  allocating a single property for client devices as just that property
  rather than a terminated array of properties like we are supposed to.

  We also have an update to the MAINTAINERS file for some Renesas
  devices"

* tag 'spi-fix-v6.17-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: cs42l43: Property entry should be a null-terminated array
  MAINTAINERS: Add entries for the RZ/V2H(P) RSPI

Merge tag 'regulator-fix-v6.17-merge-window' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
"This fixes an issue with the newly added code for handling large
  voltage changes on regulators which require that individual voltage
  changes cover a limited range, the check for convergence was broken"

* tag 'regulator-fix-v6.17-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: core: correct convergence check in regulator_set_voltage()

Merge tag 'regmap-fix-v6.17-merge-window' of git://git./linux/kernel/git/broonie/regmap

Pull regmap fixes from Mark Brown:
"These patches fix a lockdep issue Russell King reported with nested
  regmap-irqs (unusual since regmap is generally for devices on slow
  buses so devices don't get nested), plus add a missing mutex free
  which I noticed while implementing a fix for that issue"

* tag 'regmap-fix-v6.17-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: irq: Avoid lockdep warnings with nested regmap-irq chips
  regmap: irq: Free the regmap-irq mutex

Merge tag 'pci-v6.17-fixes-1' of git://git./linux/kernel/git/pci/pci

Pull pci fix from Bjorn Helgaas:

- Fix vmd MSI interrupt domain restructure that caused crash early in
boot (Nam Cao)

* tag 'pci-v6.17-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: vmd: Fix wrong kfree() in vmd_msi_free()

Merge tag 'mailbox-v6.17' of git://git./linux/kernel/git/jassibrar/mailbox

Pull mailbox updates from Jassi Brar:

- aspeed: add driver and bindings for ast2700

- broadcom: add driver and bindings for bcm74110

- mediatek: fix RPM api usage

- qcom: use dev_fwnode

- pcc: support shared buffer

- misc dt-bindings cleanup

* tag 'mailbox-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox:
  mailbox/pcc: support mailbox management of the shared buffer
  mailbox: bcm74110: Fix spelling mistake
  mailbox: bcm74110: remove unneeded semicolon
  mailbox: aspeed: add mailbox driver for AST27XX series SoC
  dt-bindings: mailbox: Add ASPEED AST2700 series SoC
  dt-bindings: mailbox: Drop consumers example DTS
  dt-bindings: mailbox: nvidia,tegra186-hsp: Use generic node name
  dt-bindings: mailbox: Correct example indentation
  dt-bindings: mailbox: ti,secure-proxy: Add missing reg maxItems
  dt-bindings: mailbox: amlogic,meson-gxbb-mhu: Add missing interrupts maxItems
  dt-bindings: mailbox: qcom-ipcc: document the Milos Inter-Processor Communication Controller
  mailbox: Add support for bcm74110
  dt-bindings: mailbox: Add support for bcm74110
  mailbox: Use dev_fwnode()
  mailbox: mtk-cmdq: Switch to pm_runtime_put_autosuspend()

Merge tag 'gpio-updates-for-v6.17-rc1-part2' of git://git./linux/kernel/git/brgl/linux

Pull gpio updates from Bartosz Golaszewski:
"As discussed: there's a small commit that removes the legacy GPIO line
  value setter callbacks as they're no longer used and a big, treewide
  commit that renames the new ones to the old names across all GPIO
  drivers at once.

  While at it: there are also two fixes that I picked up over the course
  of the merge window:

   - remove unused, legacy GPIO line value setters from struct gpio_chip

   - rename the new set callbacks back to the original names treewide

   - fix interrupt handling in gpio-mlxbf2

   - revert a buggy immutable irqchip conversion"

* tag 'gpio-updates-for-v6.17-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  treewide: rename GPIO set callbacks back to their original names
  gpio: remove legacy GPIO line value setter callbacks
  gpio: mlxbf2: use platform_get_irq_optional()
  Revert "gpio: pxa: Make irq_chip immutable"

Merge tag 'sound-fix-6.17-rc1' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:

- Support for ASoC AMD ACP 7.2 with new IDs

- ASoC Intel AVS and SOF fixes

- Yet more kconfig adjustments for HD-audio codecs

- TAS2781 codec fixes

- Fixes for longstanding (rather minor) bugs in Intel LPE audio and
   USB-audio drivers

* tag 'sound-fix-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/cirrus: Restrict prompt only for CONFIG_EXPERT
  ALSA: hda/hdmi: Restrict prompt only for CONFIG_EXPERT
  ALSA: hda/realtek: Restrict prompt only for CONFIG_EXPERT
  ALSA: hda/ca0132: Fix missing error handling in ca0132_alt_select_out()
  ASoC: SOF: Intel: hda-sdw-bpt: fix SND_SOF_SOF_HDA_SDW_BPT dependencies
  ALSA: hda/tas2781: Support L"SmartAmpCalibrationData" to save calibrated data
  ALSA: intel_hdmi: Fix off-by-one error in __hdmi_lpe_audio_probe()
  ALSA: hda/realtek: add LG gram 16Z90R-A to alc269 fixup table
  ALSA: usb-audio: Don't use printk_ratelimit for debug prints
  ASoC: Intel: sof_sdw: Add quirk for Alienware Area 51 (2025) 0CCC SKU
  ASoC: tas2781: Fix the wrong step for TLV on tas2781
  ASoC: amd: acp: Add SoundWire SOF machine driver support for acp7.2 platform
  ASoC: amd: acp: Add SoundWire legacy machine driver support for acp7.2 platform
  ASoC: amd: ps: Add SoundWire pci and dma driver support for acp7.2 platform
  ASoC: SOF: amd: Add sof audio support for acp7.2 platform
  ASoC: Intel: avs: Fix uninitialized pointer error in probe()
  ASoC: wm8962: Clear master mode when enter runtime suspend
  ASoC: SOF: amd: acp-loader: Use GFP_KERNEL for DMA allocations in resume context

Merge tag 'soc-fixes-6.17-1' of git://git./linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
"These are a few patches to fix up bits that went missing during the
  merge window: The tegra and s3c patches address trivial regressions
  from conflicts, the bcm7445 makes the dt conform to the binding that
  was made stricter"

* tag 'soc-fixes-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  arm64: tegra: Remove numa-node-id properties
  ARM: s3c/gpio: complete the conversion to new GPIO value setters
  ARM: dts: broadcom: Fix bcm7445 memory controller compatible

Merge tag 'xtensa-20250808' of https://github.com/jcmvbkbc/linux-xtensa

Pull xtensa update from Max Filippov:

- replace __ASSEMBLY__ with __ASSEMBLER__ in arch headers

* tag 'xtensa-20250808' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
xtensa: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers

Merge tag 'v6.17-p2' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
"Fix a regression that broke hmac(sha3-224-s390)"

* tag 'v6.17-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: hash - Increase HASH_MAX_DESCSIZE for hmac(sha3-224-s390)

Merge tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
"Highlights include:

  Stable fixes:
   - don't inherit NFS filesystem capabilities when crossing from one
     filesystem to another

  Bugfixes:
   - NFS wakeup of __nfs_lookup_revalidate() needs memory barriers
   - NFS improve bounds checking in nfs_fh_to_dentry()
   - NFS Fix allocation errors when writing to a NFS file backed
     loopback device
   - NFSv4: More listxattr fixes
   - SUNRPC: fix client handling of TLS alerts
   - pNFS block/scsi layout fix for an uninitialised pointer
     dereference
   - pNFS block/scsi layout fixes for the extent encoding, stripe
     mapping, and disk offset overflows
   - pNFS layoutcommit work around for RPC size limitations
   - pNFS/flexfiles avoid looping when handling fatal errors after
     layoutget
   - localio: fix various race conditions

  Features and cleanups:
   - Add NFSv4 support for retrieving the btime
   - NFS: Allow folio migration for the case of mode == MIGRATE_SYNC
   - NFS: Support using a kernel keyring to store TLS certificates
   - NFSv4: Speed up delegation lookup using a hash table
   - Assorted cleanups to remove unused variables and struct fields
   - Assorted new tracepoints to improve debugging"

* tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits)
  NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
  NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()
  NFS/localio: nfs_close_local_fh() fix check for file closed
  NFSv4: Remove duplicate lookups, capability probes and fsinfo calls
  NFS: Fix the setting of capabilities when automounting a new filesystem
  sunrpc: fix client side handling of tls alerts
  nfs/localio: use read_seqbegin() rather than read_seqbegin_or_lock()
  NFS: Fixup allocation flags for nfsiod's __GFP_NORETRY
  NFSv4.2: another fix for listxattr
  NFS: Fix filehandle bounds checking in nfs_fh_to_dentry()
  SUNRPC: Silence warnings about parameters not being described
  NFS: Clean up pnfs_put_layout_hdr()/pnfs_destroy_layout_final()
  NFS: Fix wakeup of __nfs_lookup_revalidate() in unblock_revalidate()
  NFS: use a hash table for delegation lookup
  NFS: track active delegations per-server
  NFS: move the delegation_watermark module parameter
  NFS: cleanup nfs_inode_reclaim_delegation
  NFS: cleanup error handling in nfs4_server_common_setup
  pNFS/flexfiles: don't attempt pnfs on fatal DS errors
  NFS: drop __exit from nfs_exit_keyring
  ...

Merge tag 'v6.17rc-part2-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull more smb client updates from Steve French:
"Non-smbdirect:
   - Fix null ptr deref caused by delay in global spinlock
     initialization
   - Two fixes for native symlink creation with SMB3.1.1 POSIX
     Extensions
   - Fix for socket special file creation with SMB3.1.1 POSIX Exensions
   - Reduce lock contention by splitting out mid_counter_lock
   - move SMB1 transport code to separate file to reduce module size
     when support for legacy servers is disabled
   - Two cleanup patches: rename mid_lock to make it clearer what it
     protects and one to convert mid flags to bool to make clearer

  Smbdirect/RDMA restructuring and fixes:
   - Fix for error handling in send done
   - Remove unneeded empty packet queue
   - Fix put_receive_buffer error path
   - Two fixes to recv_done error paths
   - Remove unused variable
   - Improve response and recvmsg type handling
   - Fix handling of incoming message type
   - Two cleanup fixes for better handling smbdirect recv io
   - Two cleanup fixes for socket spinlock
   - Two patches that add socket reassembly struct
   - Remove unused connection_status enum
   - Use flag in common header for SMBDIRECT_RECV_IO_MAX_SGE
   - Two cleanup patches to introduce and use smbdirect send io
   - Two cleanup patches to introduce and use smbdirect send_io struct
   - Fix to return error if rdma connect takes longer than 5 seconds
   - Error logging improvements
   - Fix redundand call to init_waitqueue_head
   - Remove unneeded wait queue"

* tag 'v6.17rc-part2-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: (33 commits)
  smb: client: only use a single wait_queue to monitor smbdirect connection status
  smb: client: don't call init_waitqueue_head(&info->conn_wait) twice in _smbd_get_connection
  smb: client: improve logging in smbd_conn_upcall()
  smb: client: return an error if rdma_connect does not return within 5 seconds
  smb: client: make use of smbdirect_socket.{send,recv}_io.mem.{cache,pool}
  smb: smbdirect: add smbdirect_socket.{send,recv}_io.mem.{cache,pool}
  smb: client: make use of struct smbdirect_send_io
  smb: smbdirect: introduce struct smbdirect_send_io
  smb: client: make use of SMBDIRECT_RECV_IO_MAX_SGE
  smb: smbdirect: add SMBDIRECT_RECV_IO_MAX_SGE
  smb: client: remove unused enum smbd_connection_status
  smb: client: make use of smbdirect_socket.recv_io.reassembly.*
  smb: smbdirect: introduce smbdirect_socket.recv_io.reassembly.*
  smb: client: make use of smb: smbdirect_socket.recv_io.free.{list,lock}
  smb: smbdirect: introduce smbdirect_socket.recv_io.free.{list,lock}
  smb: client: make use of struct smbdirect_recv_io
  smb: smbdirect: introduce struct smbdirect_recv_io
  smb: client: make use of smbdirect_socket->recv_io.expected
  smb: smbdirect: introduce smbdirect_socket.recv_io.expected
  smb: client: remove unused smbd_connection->fragment_reassembly_remaining
  ...

Merge tag 'v6.17rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

- Fix limiting repeated connections from same IP

- Fix for extracting shortname when name begins with a dot

- Four smbdirect fixes:
     - three fixes to the receive path: potential unmap bug, potential
       resource leaks and stale connections, and also potential use
       after free race
     - cleanup to remove unneeded queue

* tag 'v6.17rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  smb: server: Fix extension string in ksmbd_extract_shortname()
  ksmbd: limit repeated connections from clients with the same IP
  smb: server: let recv_done() avoid touching data_transfer after cleanup/move
  smb: server: let recv_done() consistently call put_recvmsg/smb_direct_disconnect_rdma_connection
  smb: server: make sure we call ib_dma_unmap_single() only if we called ib_dma_map_single already
  smb: server: remove separate empty_recvmsg_queue

Merge tag 'tegra-for-6.17-arm64-dt-v3' of https://git./linux/kernel/git/tegra/linux into arm/fixes

arm64: tegra: Device tree changes for v6.17-rc1

This contains an extra patch that drops numa-node-id properties that
were added to the Tegra264 DT files by mistake.

* tag 'tegra-for-6.17-arm64-dt-v3' of https://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  arm64: tegra: Remove numa-node-id properties
  arm64: tegra: Add p3971-0089+p3834-0008 support
  arm64: tegra: Add memory controller on Tegra264
  arm64: tegra: Add Tegra264 support
  dt-bindings: memory: tegra: Add Tegra264 support

Link: https://lore.kernel.org/r/20250731162920.3329820-1-thierry.reding@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

hv_netvsc: Fix panic during namespace deletion with VF

The existing code move the VF NIC to new namespace when NETDEV_REGISTER is
received on netvsc NIC. During deletion of the namespace,
default_device_exit_batch() >> default_device_exit_net() is called. When
netvsc NIC is moved back and registered to the default namespace, it
automatically brings VF NIC back to the default namespace. This will cause
the default_device_exit_net() >> for_each_netdev_safe loop unable to detect
the list end, and hit NULL ptr:

[  231.449420] mana 7870:00:00.0 enP30832s1: Moved VF to namespace with: eth0
[  231.449656] BUG: kernel NULL pointer dereference, address: 0000000000000010
[  231.450246] #PF: supervisor read access in kernel mode
[  231.450579] #PF: error_code(0x0000) - not-present page
[  231.450916] PGD 17b8a8067 P4D 0
[  231.451163] Oops: Oops: 0000 [#1] SMP NOPTI
[  231.451450] CPU: 82 UID: 0 PID: 1394 Comm: kworker/u768:1 Not tainted 6.16.0-rc4+ #3 VOLUNTARY
[  231.452042] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/21/2024
[  231.452692] Workqueue: netns cleanup_net
[  231.452947] RIP: 0010:default_device_exit_batch+0x16c/0x3f0
[  231.453326] Code: c0 0c f5 b3 e8 d5 db fe ff 48 85 c0 74 15 48 c7 c2 f8 fd ca b2 be 10 00 00 00 48 8d 7d c0 e8 7b 77 25 00 49 8b 86 28 01 00 00 <48> 8b 50 10 4c 8b 2a 4c 8d 62 f0 49 83 ed 10 4c 39 e0 0f 84 d6 00
[  231.454294] RSP: 0018:ff75fc7c9bf9fd00 EFLAGS: 00010246
[  231.454610] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 61c8864680b583eb
[  231.455094] RDX: ff1fa9f71462d800 RSI: ff75fc7c9bf9fd38 RDI: 0000000030766564
[  231.455686] RBP: ff75fc7c9bf9fd78 R08: 0000000000000000 R09: 0000000000000000
[  231.456126] R10: 0000000000000001 R11: 0000000000000004 R12: ff1fa9f70088e340
[  231.456621] R13: ff1fa9f70088e340 R14: ffffffffb3f50c20 R15: ff1fa9f7103e6340
[  231.457161] FS:  0000000000000000(0000) GS:ff1faa6783a08000(0000) knlGS:0000000000000000
[  231.457707] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  231.458031] CR2: 0000000000000010 CR3: 0000000179ab2006 CR4: 0000000000b73ef0
[  231.458434] Call Trace:
[  231.458600]  <TASK>
[  231.458777]  ops_undo_list+0x100/0x220
[  231.459015]  cleanup_net+0x1b8/0x300
[  231.459285]  process_one_work+0x184/0x340

To fix it, move the ns change to a workqueue, and take rtnl_lock to avoid
changing the netdev list when default_device_exit_net() is using it.

Cc: stable@vger.kernel.org
Fixes: 4c262801ea60 ("hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1754511711-11188-1-git-send-email-haiyangz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hamradio: ignore ops-locked netdevs

Syzkaller managed to trigger lock dependency in xsk_notify via
register_netdevice. As discussed in [0], using register_netdevice
in the notifiers is problematic so skip adding hamradio for ops-locked
devices.

       xsk_notifier+0x89/0x230 net/xdp/xsk.c:1664
       notifier_call_chain+0x1b6/0x3e0 kernel/notifier.c:85
       call_netdevice_notifiers_extack net/core/dev.c:2267 [inline]
       call_netdevice_notifiers net/core/dev.c:2281 [inline]
       unregister_netdevice_many_notify+0x14d7/0x1ff0 net/core/dev.c:12156
       unregister_netdevice_many net/core/dev.c:12219 [inline]
       unregister_netdevice_queue+0x33c/0x380 net/core/dev.c:12063
       register_netdevice+0x1689/0x1ae0 net/core/dev.c:11241
       bpq_new_device drivers/net/hamradio/bpqether.c:481 [inline]
       bpq_device_event+0x491/0x600 drivers/net/hamradio/bpqether.c:523
       notifier_call_chain+0x1b6/0x3e0 kernel/notifier.c:85
       call_netdevice_notifiers_extack net/core/dev.c:2267 [inline]
       call_netdevice_notifiers net/core/dev.c:2281 [inline]
       __dev_notify_flags+0x18d/0x2e0 net/core/dev.c:-1
       netif_change_flags+0xe8/0x1a0 net/core/dev.c:9608
       dev_change_flags+0x130/0x260 net/core/dev_api.c:68
       devinet_ioctl+0xbb4/0x1b50 net/ipv4/devinet.c:1200
       inet_ioctl+0x3c0/0x4c0 net/ipv4/af_inet.c:1001

0: https://lore.kernel.org/netdev/20250625140357.6203d0af@kernel.org/
Fixes: 4c975fd70002 ("net: hold instance lock during NETDEV_REGISTER/UP")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: syzbot+e6300f66a999a6612477@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e6300f66a999a6612477
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250806213726.1383379-2-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lapbether: ignore ops-locked netdevs

Syzkaller managed to trigger lock dependency in xsk_notify via
register_netdevice. As discussed in [0], using register_netdevice
in the notifiers is problematic so skip adding lapbeth for ops-locked
devices.

       xsk_notifier+0xa4/0x280 net/xdp/xsk.c:1645
       notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
       call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
       call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
       call_netdevice_notifiers net/core/dev.c:2282 [inline]
       unregister_netdevice_many_notify+0xf9d/0x2700 net/core/dev.c:12077
       unregister_netdevice_many net/core/dev.c:12140 [inline]
       unregister_netdevice_queue+0x305/0x3f0 net/core/dev.c:11984
       register_netdevice+0x18f1/0x2270 net/core/dev.c:11149
       lapbeth_new_device drivers/net/wan/lapbether.c:420 [inline]
       lapbeth_device_event+0x5b1/0xbe0 drivers/net/wan/lapbether.c:462
       notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
       call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
       call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
       call_netdevice_notifiers net/core/dev.c:2282 [inline]
       __dev_notify_flags+0x12c/0x2e0 net/core/dev.c:9497
       netif_change_flags+0x108/0x160 net/core/dev.c:9526
       dev_change_flags+0xba/0x250 net/core/dev_api.c:68
       devinet_ioctl+0x11d5/0x1f50 net/ipv4/devinet.c:1200
       inet_ioctl+0x3a7/0x3f0 net/ipv4/af_inet.c:1001

0: https://lore.kernel.org/netdev/20250625140357.6203d0af@kernel.org/
Fixes: 4c975fd70002 ("net: hold instance lock during NETDEV_REGISTER/UP")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: syzbot+e67ea9c235b13b4f0020@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e67ea9c235b13b4f0020
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250806213726.1383379-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: microchip: Fix KSZ8863 reset problem

ksz8873_valid_regs[] was added for register access for KSZ8863/KSZ8873
switches, but the reset register is not in the list so
ksz8_reset_switch() does not take any effect.

Replace regmap_update_bits() using ksz_regmap_8 with ksz_rmw8() so that
an error message will be given if the register is not defined.

A side effect of not resetting the switch is the static MAC table is not
cleared. Further additions to the table will show write error as there
are only 8 entries in the table.

Fixes: d0dec3333040 ("net: dsa: microchip: Add register access control for KSZ8873 chip")
Signed-off-by: Tristram Ha <tristram.ha@microchip.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250807005453.8306-1-Tristram.Ha@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sctp: linearize cloned gso packets in sctp_rcv

A cloned head skb still shares these frag skbs in fraglist with the
original head skb. It's not safe to access these frag skbs.

syzbot reported two use-of-uninitialized-memory bugs caused by this:

  BUG: KMSAN: uninit-value in sctp_inq_pop+0x15b7/0x1920 net/sctp/inqueue.c:211
   sctp_inq_pop+0x15b7/0x1920 net/sctp/inqueue.c:211
   sctp_assoc_bh_rcv+0x1a7/0xc50 net/sctp/associola.c:998
   sctp_inq_push+0x2ef/0x380 net/sctp/inqueue.c:88
   sctp_backlog_rcv+0x397/0xdb0 net/sctp/input.c:331
   sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1122
   __release_sock+0x1da/0x330 net/core/sock.c:3106
   release_sock+0x6b/0x250 net/core/sock.c:3660
   sctp_wait_for_connect+0x487/0x820 net/sctp/socket.c:9360
   sctp_sendmsg_to_asoc+0x1ec1/0x1f00 net/sctp/socket.c:1885
   sctp_sendmsg+0x32b9/0x4a80 net/sctp/socket.c:2031
   inet_sendmsg+0x25a/0x280 net/ipv4/af_inet.c:851
   sock_sendmsg_nosec net/socket.c:718 [inline]

and

  BUG: KMSAN: uninit-value in sctp_assoc_bh_rcv+0x34e/0xbc0 net/sctp/associola.c:987
   sctp_assoc_bh_rcv+0x34e/0xbc0 net/sctp/associola.c:987
   sctp_inq_push+0x2a3/0x350 net/sctp/inqueue.c:88
   sctp_backlog_rcv+0x3c7/0xda0 net/sctp/input.c:331
   sk_backlog_rcv+0x142/0x420 include/net/sock.h:1148
   __release_sock+0x1d3/0x330 net/core/sock.c:3213
   release_sock+0x6b/0x270 net/core/sock.c:3767
   sctp_wait_for_connect+0x458/0x820 net/sctp/socket.c:9367
   sctp_sendmsg_to_asoc+0x223a/0x2260 net/sctp/socket.c:1886
   sctp_sendmsg+0x3910/0x49f0 net/sctp/socket.c:2032
   inet_sendmsg+0x269/0x2a0 net/ipv4/af_inet.c:851
   sock_sendmsg_nosec net/socket.c:712 [inline]

This patch fixes it by linearizing cloned gso packets in sctp_rcv().

Fixes: 90017accff61 ("sctp: Add GSO support")
Reported-by: syzbot+773e51afe420baaf0e2b@syzkaller.appspotmail.com
Reported-by: syzbot+70a42f45e76bede082be@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://patch.msgid.link/dd7dc337b99876d4132d0961f776913719f7d225.1754595611.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools/power turbostat: Fix bogus SysWatt for forked program

Similar to delta_cpu(), delta_platform() is called in turbostat main
loop. This ensures accurate SysWatt readings in periodic monitoring mode
$ sudo turbostat -S -q --show power -i 1
CoreTmp PkgTmp PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% SysWatt
60 61 6.21 1.13 0.16 0.00 0.00 0.00 13.07
58 61 6.00 1.07 0.18 0.00 0.00 0.00 12.75
58 61 5.74 1.05 0.17 0.00 0.00 0.00 12.22
58 60 6.27 1.11 0.24 0.00 0.00 0.00 13.55

However, delta_platform() is missing for forked program and causes bogus
SysWatt reporting,
$ sudo turbostat -S -q --show power sleep 1
1.004736 sec
CoreTmp PkgTmp PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% SysWatt
57 58 6.05 1.02 0.16 0.00 0.00 0.00 0.03

Add missing delta_platform() for forked program.

Fixes: e5f687b89bc2 ("tools/power turbostat: Add RAPL psys as a built-in counter")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: Handle cap_get_proc() ENOSYS

Kernels configured with CONFIG_MULTIUSER=n have no cap_get_proc().
Check for ENOSYS to recognize this case, and continue on to
attempt to access the requested MSRs (such as temperature).

Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: Fix build with musl

turbostat.c: In function 'parse_int_file':
    turbostat.c:5567:19: error: 'PATH_MAX' undeclared (first use in this function)
     5567 |         char path[PATH_MAX];
          |                   ^~~~~~~~

    turbostat.c: In function 'probe_graphics':
    turbostat.c:6787:19: error: 'PATH_MAX' undeclared (first use in this function)
     6787 |         char path[PATH_MAX];
          |                   ^~~~~~~~

Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: verify arguments to params --show and --hide

$ sudo turbostat --quiet --show junk
turbostat: Counter 'junk' can not be added.

Previously, invalid arguments to --show and --hide were silently ignored

Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

vsock: Do not allow binding to VMADDR_PORT_ANY

It is possible for a vsock to autobind to VMADDR_PORT_ANY. This can
cause a use-after-free when a connection is made to the bound socket.
The socket returned by accept() also has port VMADDR_PORT_ANY but is not
on the list of unbound sockets. Binding it will result in an extra
refcount decrement similar to the one fixed in fcdd2242c023 (vsock: Keep
the binding until socket destruction).

Modify the check in __vsock_bind_connectible() to also prevent binding
to VMADDR_PORT_ANY.

Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Reported-by: Budimir Markovic <markovicbudimir@gmail.com>
Signed-off-by: Budimir Markovic <markovicbudimir@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250807041811.678-1-markovicbudimir@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ti: icss-iep: Fix incorrect type for return value in extts_enable()

The variable ret in icss_iep_extts_enable() was incorrectly declared
as u32, while the function returns int and may return negative error
codes. This will cause sign extension issues and incorrect error
propagation. Update ret to be int to fix error handling.

This change corrects the declaration to avoid potential type mismatch.

Fixes: c1e0230eeaab ("net: ti: icss-iep: Add IEP driver")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250805142323.1949406-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: page_pool: allow enabling recycling late, fix false positive warning

Page pool can have pages "directly" (locklessly) recycled to it,
if the NAPI that owns the page pool is scheduled to run on the same CPU.
To make this safe we check that the NAPI is disabled while we destroy
the page pool. In most cases NAPI and page pool lifetimes are tied
together so this happens naturally.

The queue API expects the following order of calls:
-> mem_alloc
    alloc new pp
-> stop
    napi_disable
-> start
    napi_enable
-> mem_free
    free old pp

Here we allocate the page pool in ->mem_alloc and free in ->mem_free.
But the NAPIs are only stopped between ->stop and ->start. We created
page_pool_disable_direct_recycling() to safely shut down the recycling
in ->stop. This way the page_pool_destroy() call in ->mem_free doesn't
have to worry about recycling any more.

Unfortunately, the page_pool_disable_direct_recycling() is not enough
to deal with failures which necessitate freeing the _new_ page pool.
If we hit a failure in ->mem_alloc or ->stop the new page pool has
to be freed while the NAPI is active (assuming driver attaches the
page pool to an existing NAPI instance and doesn't reallocate NAPIs).

Freeing the new page pool is technically safe because it hasn't been
used for any packets, yet, so there can be no recycling. But the check
in napi_assert_will_not_race() has no way of knowing that. We could
check if page pool is empty but that'd make the check much less likely
to trigger during development.

Add page_pool_enable_direct_recycling(), pairing with
page_pool_disable_direct_recycling(). It will allow us to create the new
page pools in "disabled" state and only enable recycling when we know
the reconfig operation will not fail.

Coincidentally it will also let us re-enable the recycling for the old
pool, if the reconfig failed:

-> mem_alloc (new)
-> stop (old)
    # disables direct recycling for old
-> start (new)
    # fail!!
-> start (old)
    # go back to old pp but direct recycling is lost :(
-> mem_free (new)

The new helper is idempotent to make the life easier for drivers,
which can operate in HDS mode and support zero-copy Rx.
The driver can call the helper twice whether there are two pools
or it has multiple references to a single pool.

Fixes: 40eca00ae605 ("bnxt_en: unlink page pool when stopping Rx queue")
Tested-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250805003654.2944974-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ti: icssg-prueth: Fix emac link speed handling

When link settings are changed emac->speed is populated by
emac_adjust_link(). The link speed and other settings are then written into
the DRAM. However if both ports are brought down after this and brought up
again or if the operating mode is changed and a firmware reload is needed,
the DRAM is cleared by icssg_config(). As a result the link settings are
lost.

Fix this by calling emac_adjust_link() after icssg_config(). This re
populates the settings in the DRAM after a new firmware load.

Fixes: 9facce84f406 ("net: ti: icssg-prueth: Fix firmware load sequence.")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Message-ID: <20250805173812.2183161-1-danishanwar@ti.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'there-are-some-bugfix-for-hibmcge-ethernet-driver'

Jijie Shao says:

====================
There are some bugfix for hibmcge ethernet driver

This patch set is intended to fix several issues for hibmcge driver:
1. Holding the rtnl_lock in pci_error_handlers->reset_prepare()
   may lead to a deadlock issue.
   2. A division by zero issue caused by debugfs when the port is down.
   3. A probabilistic false positive issue with np_link_fail.

v2: https://lore.kernel.org/20250805181446.3deaceb9@kernel.org
v1: https://lore.kernel.org/20250731134749.4090041-1-shaojijie@huawei.com
====================

Link: https://patch.msgid.link/20250806102758.3632674-1-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hibmcge: fix the np_link_fail error reporting issue

Currently, after modifying device port mode, the np_link_ok state
is immediately checked. At this point, the device may not yet ready,
leading to the querying of an intermediate state.

This patch will poll to check if np_link is ok after
modifying device port mode, and only report np_link_fail upon timeout.

Fixes: e0306637e85d ("net: hibmcge: Add support for mac link exception handling feature")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hibmcge: fix the division by zero issue

When the network port is down, the queue is released, and ring->len is 0.
In debugfs, hbg_get_queue_used_num() will be called,
which may lead to a division by zero issue.

This patch adds a check, if ring->len is 0,
hbg_get_queue_used_num() directly returns 0.

Fixes: 40735e7543f9 ("net: hibmcge: Implement .ndo_start_xmit function")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hibmcge: fix rtnl deadlock issue

Currently, the hibmcge netdev acquires the rtnl_lock in
pci_error_handlers.reset_prepare() and releases it in
pci_error_handlers.reset_done().

However, in the PCI framework:
pci_reset_bus - __pci_reset_slot - pci_slot_save_and_disable_locked -
pci_dev_save_and_disable - err_handler->reset_prepare(dev);

In pci_slot_save_and_disable_locked():
list_for_each_entry(dev, &slot->bus->devices, bus_list) {
if (!dev->slot || dev->slot!= slot)
continue;
pci_dev_save_and_disable(dev);
if (dev->subordinate)
pci_bus_save_and_disable_locked(dev->subordinate);
}

This will iterate through all devices under the current bus and execute
err_handler->reset_prepare(), causing two devices of the hibmcge driver
to sequentially request the rtnl_lock, leading to a deadlock.

Since the driver now executes netif_device_detach()
before the reset process, it will not concurrently with
other netdev APIs, so there is no need to hold the rtnl_lock now.

Therefore, this patch removes the rtnl_lock during the reset process and
adjusts the position of HBG_NIC_STATE_RESETTING to ensure
that multiple resets are not executed concurrently.

Fixes: 3f5a61f6d504f ("net: hibmcge: Add reset supported in this module")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-25-08-07' of git://git./linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Reinstantiate Florian Westphal as a Netfilter maintainer.

2) Depend on both NETFILTER_XTABLES and NETFILTER_XTABLES_LEGACY,
   from Arnd Bergmann.

3) Use id to annotate last conntrack/expectation visited to resume
   netlink dump, patches from Florian Westphal.

4) Fix bogus element in nft_pipapo avx2 lookup, introduced in
   the last nf-next batch of updates, also from Florian.

5) Return 0 instead of recycling ret variable in
   nf_conntrack_log_invalid_sysctl(), introduced in the last
   nf-next batch of updates, from Dan Carpenter.

6) Fix WARN_ON_ONCE triggered by syzbot with larger cgroup level
   in nft_socket.

* tag 'nf-25-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_socket: remove WARN_ON_ONCE with huge level value
  netfilter: conntrack: clean up returns in nf_conntrack_log_invalid_sysctl()
  netfilter: nft_set_pipapo: don't return bogus extension pointer
  netfilter: ctnetlink: remove refcounting in expectation dumpers
  netfilter: ctnetlink: fix refcount leak on table dump
  netfilter: add back NETFILTER_XTABLES dependencies
  MAINTAINERS: resurrect my netfilter maintainer entry
====================

Link: https://patch.msgid.link/20250807112948.1400523-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

io_uring/memmap: cast nr_pages to size_t before shifting

If the allocated size exceeds UINT_MAX, then it's necessary to cast
the mr->nr_pages value to size_t to prevent it from overflowing. In
practice this isn't much of a concern as the required memory size will
have been validated upfront, and accounted to the user. And > 4GB sizes
will be necessary to make the lack of a cast a problem, which greatly
exceeds normal user locked_vm settings that are generally in the kb to
mb range. However, if root is used, then accounting isn't done, and
then it's possible to hit this issue.

Link: https://lore.kernel.org/all/6895b298.050a0220.7f033.0059.GAE@google.com/
Cc: stable@vger.kernel.org
Reported-by: syzbot+23727438116feb13df15@syzkaller.appspotmail.com
Fixes: 087f997870a9 ("io_uring/memmap: implement mmap for regions")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'xfrm: some fixes for GSO with SW crypto'

Sabrina Dubroca says:

====================
This series fixes a few issues with GSO. Some recent patches made the
incorrect assumption that GSO is only used by offload. The first two
patches in this series restore the old behavior.

The final patch is in the UDP GSO code, but fixes an issue with IPsec
that is currently masked by the lack of GSO for SW crypto. With GSO,
VXLAN over IPsec doesn't get checksummed.
====================

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

mailbox/pcc: support mailbox management of the shared buffer

Define a new, optional, callback that allows the driver to
specify how the return data buffer is allocated.  If that callback
is set,  mailbox/pcc.c is now responsible for reading from and
writing to the PCC shared buffer.

This also allows for proper checks of the Commnand complete flag
between the PCC sender and receiver.

For Type 4 channels, initialize the command complete flag prior
to accepting messages.

Since the mailbox does not know what memory allocation scheme
to use for response messages, the client now has an optional
callback that allows it to allocate the buffer for a response
message.

When an outbound message is written to the buffer, the mailbox
checks for the flag indicating the client wants an tx complete
notification via IRQ.  Upon receipt of the interrupt It will
pair it with the outgoing message. The expected use is to
free the kernel memory buffer for the previous outgoing message.

Signed-off-by: Adam Young <admiyo@os.amperecomputing.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>

Merge tag 'net-6.17-rc1' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
  Previous releases - regressions:

   - netlink: avoid infinite retry looping in netlink_unicast()

  Previous releases - always broken:

   - packet: fix a race in packet_set_ring() and packet_notifier()

   - ipv6: reject malicious packets in ipv6_gso_segment()

   - sched: mqprio: fix stack out-of-bounds write in tc entry parsing

   - net: drop UFO packets (injected via virtio) in udp_rcv_segment()

   - eth: mlx5: correctly set gso_segs when LRO is used, avoid false
     positive checksum validation errors

   - netpoll: prevent hanging NAPI when netcons gets enabled

   - phy: mscc: fix parsing of unicast frames for PTP timestamping

   - a number of device tree / OF reference leak fixes"

* tag 'net-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits)
  pptp: fix pptp_xmit() error path
  net: ti: icssg-prueth: Fix skb handling for XDP_PASS
  net: Update threaded state in napi config in netif_set_threaded
  selftests: netdevsim: Xfail nexthop test on slow machines
  eth: fbnic: Lock the tx_dropped update
  eth: fbnic: Fix tx_dropped reporting
  eth: fbnic: remove the debugging trick of super high page bias
  net: ftgmac100: fix potential NULL pointer access in ftgmac100_phy_disconnect
  dt-bindings: net: Replace bouncing Alexandru Tachici emails
  dpll: zl3073x: ZL3073X_I2C and ZL3073X_SPI should depend on NET
  net/sched: mqprio: fix stack out-of-bounds write in tc entry parsing
  Revert "net: mdio_bus: Use devm for getting reset GPIO"
  selftests: net: packetdrill: xfail all problems on slow machines
  net/packet: fix a race in packet_set_ring() and packet_notifier()
  benet: fix BUG when creating VFs
  net: airoha: npu: Add missing MODULE_FIRMWARE macros
  net: devmem: fix DMA direction on unmapping
  ipa: fix compile-testing with qcom-mdt=m
  eth: fbnic: unlink NAPIs from queues on error to open
  net: Add locking to protect skb->dev access in ip_output
  ...

Merge tag 's390-6.17-2' of git://git./linux/kernel/git/s390/linux

Pull more s390 updates from Alexander Gordeev:

- Support MMIO read/write tracing

- Enable THP swapping and THP migration

- Unmask SLCF bit ("stateless command filtering") introduced with CEX8
   cards, so that user space applications like lszcrypt could evaluate
   and list this feature

- Fix the value of high_memory variable, so it considers possible
   tailing offline memory blocks

- Make vmem_pte_alloc() consistent and always allocate memory of
   PAGE_SIZE for page tables. This ensures a page table occupies the
   whole page, as the rest of the code assumes

- Fix kernel image end address in the decompressor debug output

- Fix a typo in debug_sprintf_format_fn() comment

* tag 's390-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/debug: Fix typo in debug_sprintf_format_fn() comment
  s390/boot: Fix startup debugging log
  s390/mm: Allocate page table with PAGE_SIZE granularity
  s390/mm: Enable THP_SWAP and THP_MIGRATION
  s390: Support CONFIG_TRACE_MMIO_ACCESS
  s390/mm: Set high_memory at the end of the identity mapping
  s390/ap: Unmask SLCF bit in card and queue ap functions sysfs

Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull vhost fix from Michael Tsirkin:
"A single fix for a regression in vhost"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: initialize vq->nheads properly

Merge tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"This is the fixes that built up in the merge window, mostly amdgpu and
  xe with one i915 display fix, seems like things are pretty good for
  rc1.

  i915:
   - DP LPFS fixes

  xe:
   - SRIOV: PF fixes and removal of need of module param
   - Fix driver unbind around Devcoredump
   - Mark xe driver as BROKEN if kernel page size is not 4kB

  amdgpu:
   - GC 9.5.0 fixes
   - SMU fix
   - DCE 6 DC fixes
   - mmhub client ID fixes
   - VRR fix
   - Backlight fix
   - UserQ fix
   - Legacy reset fix
   - Misc fixes

  amdkfd:
   - CRIU fix
   - Debugfs fix"

* tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel: (28 commits)
  drm/amdgpu: add missing vram lost check for LEGACY RESET
  drm/amdgpu/discovery: fix fw based ip discovery
  drm/amdkfd: Destroy KFD debugfs after destroy KFD wq
  amdgpu/amdgpu_discovery: increase timeout limit for IFWI init
  drm/amdgpu: Update SDMA firmware version check for user queue support
  drm/amdgpu: Add NULL check for asic_funcs
  drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value"
  drm/amd/display: fix a Null pointer dereference vulnerability
  drm/amd/display: Add primary plane to commits for correct VRR handling
  drm/amdgpu: update mmhub 3.3 client id mappings
  drm/amdgpu: update mmhub 3.0.1 client id mappings
  drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job
  drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming.
  drm/amd/display: Don't overwrite dce60_clk_mgr
  drm/amdkfd: Fix checkpoint-restore on multi-xcc
  drm/amd: Restore cached manual clock settings during resume
  drm/amd: Restore cached power limit during resume
  drm/amdgpu: Update external revid for GC v9.5.0
  drm/amdgpu: Update supported modes for GC v9.5.0
  Mark xe driver as BROKEN if kernel page size is not 4kB
  ...

Merge tag 'fbdev-for-6.17-rc1-2' of git://git./linux/kernel/git/deller/linux-fbdev

Pull fbdev fixes for 6.17-rc1:

- Revert a patch which broke VGA console

- Fix an out-of-bounds access bug which may happen during console
   resizing when a console is mapped to a frame buffer

* tag 'fbdev-for-6.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  Revert "vgacon: Add check for vc_origin address range in vgacon_scroll()"
  fbdev: Fix vmalloc out-of-bounds write in fast_imageblit

Merge tag 'loongarch-6.17' of git://git./linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch updates from Huacai Chen:

- Complete KSave registers definition

- Support the mem=<size> kernel parameter

- Support BPF dynamic modification & trampoline

- Add MMC/SDIO controller nodes in dts

- Some bug fixes and other small changes

* tag 'loongarch-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: vDSO: Remove -nostdlib complier flag
  LoongArch: dts: Add eMMC/SDIO controller support to Loongson-2K2000
  LoongArch: dts: Add SDIO controller support to Loongson-2K1000
  LoongArch: dts: Add SDIO controller support to Loongson-2K0500
  LoongArch: BPF: Set bpf_jit_bypass_spec_v1/v4()
  LoongArch: BPF: Fix the tailcall hierarchy
  LoongArch: BPF: Fix jump offset calculation in tailcall
  LoongArch: BPF: Add struct ops support for trampoline
  LoongArch: BPF: Add basic bpf trampoline support
  LoongArch: BPF: Add dynamic code modification support
  LoongArch: BPF: Rename and refactor validate_code()
  LoongArch: Add larch_insn_gen_{beq,bne} helpers
  LoongArch: Don't use %pK through printk() in unwinder
  LoongArch: Avoid in-place string operation on FDT content
  LoongArch: Support mem=<size> kernel parameter
  LoongArch: Make relocate_new_kernel_size be a .quad value
  LoongArch: Complete KSave registers definition

smb: server: Fix extension string in ksmbd_extract_shortname()

In ksmbd_extract_shortname(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in "__" being copied
to 'extension' rather than "___" (two underscores instead of three).

Use the destination buffer size instead to ensure that the string "___"
(three underscores) is copied correctly.

Cc: stable@vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

ksmbd: limit repeated connections from clients with the same IP

Repeated connections from clients with the same IP address may exhaust
the max connections and prevent other normal client connections.
This patch limit repeated connections from clients with the same IP.

Reported-by: tianshuo han <hantianshuo233@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

Merge tag 'amd-drm-fixes-6.17-2025-08-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-fixes-6.17-2025-08-07:

amdgpu:
- GC 9.5.0 fixes
- SMU fix
- DCE 6 DC fixes
- mmhub client ID fixes
- VRR fix
- Backlight fix
- UserQ fix
- Legacy reset fix
- Misc fixes

amdkfd:
- CRIU fix
- Debugfs fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250807132030.1168068-1-alexander.deucher@amd.com

Merge tag 'drm-xe-next-fixes-2025-08-06' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

- SRIOV: PF fixes and removal of need of module param (Michal)
- Fix driver unbind around Devcoredump (Bala)
- Mark xe driver as BROKEN if kernel page size is not 4kB (Simon)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aJNXnIAp2Cq-2pZj@intel.com

smb: client: only use a single wait_queue to monitor smbdirect connection status

There's no need for separate conn_wait and disconn_wait queues.

This will simplify the move to common code, the server code
already a single wait_queue for this.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: don't call init_waitqueue_head(&info->conn_wait) twice in _smbd_get_connection

It is already called long before we may hit this cleanup code path.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: improve logging in smbd_conn_upcall()

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: return an error if rdma_connect does not return within 5 seconds

This matches the timeout for tcp connections.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

PCI: vmd: Fix wrong kfree() in vmd_msi_free()

vmd_msi_alloc() allocates struct vmd_irq and stashes it into
irq_data->chip_data associated with the VMD's interrupt domain.
vmd_msi_free() extracts the pointer by calling irq_get_chip_data() and
frees it.

irq_get_chip_data() returns the chip_data associated with the top interrupt
domain. This worked in the past because VMD's interrupt domain was the top
domain.

But d7d8ab87e3e7 ("PCI: vmd: Switch to msi_create_parent_irq_domain()")
changed the interrupt domain hierarchy so VMD's interrupt domain is not the
top domain anymore. irq_get_chip_data() now returns the chip_data at the
MSI devices' interrupt domains. It is therefore broken for vmd_msi_free()
to kfree() this chip_data.

Fix by extracting the chip_data associated with the VMD's interrupt domain.

Fixes: d7d8ab87e3e7 ("PCI: vmd: Switch to msi_create_parent_irq_domain()")
Reported-by: Kenneth Crudup <kenny@panix.com>
Closes: https://lore.kernel.org/linux-pci/dfa40e48-8840-4e61-9fda-25cdb3ad81c1@panix.com/
Reported-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Closes: https://lore.kernel.org/linux-pci/ed53280ed15d1140700b96cca2734bf327ee92539e5eb68e80f5bbbf0f01@linux.gnuweeb.org/
Tested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Tested-by: Kenneth Crudup <kenny@panix.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20250807081051.2253962-1-namcao@linutronix.de

Merge branch 'perf-s390-regression-move-uid-filtering-to-bpf-filters'

Ilya Leoshkevich says:

====================
perf/s390: Regression: Move uid filtering to BPF filters

v4: https://lore.kernel.org/bpf/20250806114227.14617-1-iii@linux.ibm.com/
v4 -> v5: Fix a typo in the commit message (Yonghong).

v3: https://lore.kernel.org/bpf/20250805130346.1225535-1-iii@linux.ibm.com/
v3 -> v4: Rename the new field to dont_enable (Alexei, Eduard).
Switch the Fixes: tag in patch 2 (Alexander, Thomas).
Fix typos in the cover letter (Thomas).

v2: https://lore.kernel.org/bpf/20250728144340.711196-1-tmricht@linux.ibm.com/
v2 -> v3: Use no_ioctl_enable in perf.

v1: https://lore.kernel.org/bpf/20250725093405.3629253-1-tmricht@linux.ibm.com/
v1 -> v2: Introduce no_ioctl_enable (Jiri).

Hi,

This series fixes a regression caused by moving UID filtering to BPF.
The regression affects all events that support auxiliary data, most
notably, "cycles" events on s390, but also PT events on Intel. The
symptom is missing events when UID filtering is enabled.

Patch 1 introduces a new option for the
bpf_program__attach_perf_event_opts() function.
Patch 2 makes use of it in perf, and also contains a lot of technical
details of why exactly the problem is occurring.

Thanks to Thomas Richter for the investigation and the initial version
of this fix, and to Jiri Olsa for suggestions.

Best regards,
Ilya
====================

Link: https://patch.msgid.link/20250806162417.19666-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

perf bpf-filter: Enable events manually

On s390, and, in general, on all platforms where the respective event
supports auxiliary data gathering, the command:

   # ./perf record -u 0 -aB --synth=no -- ./perf test -w thloop
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.011 MB perf.data ]
   # ./perf report --stats | grep SAMPLE
   #

does not generate samples in the perf.data file. On x86 the command:

  # sudo perf record -e intel_pt// -u 0 ls

is broken too.

Looking at the sequence of calls in 'perf record' reveals this
behavior:

1. The event 'cycles' is created and enabled:

   record__open()
   +-> evlist__apply_filters()
       +-> perf_bpf_filter__prepare()
   +-> bpf_program.attach_perf_event()
       +-> bpf_program.attach_perf_event_opts()
           +-> __GI___ioctl(..., PERF_EVENT_IOC_ENABLE, ...)

   The event 'cycles' is enabled and active now. However the event's
   ring-buffer to store the samples generated by hardware is not
   allocated yet.

2. The event's fd is mmap()ed to create the ring buffer:

   record__open()
   +-> record__mmap()
       +-> record__mmap_evlist()
   +-> evlist__mmap_ex()
       +-> perf_evlist__mmap_ops()
           +-> mmap_per_cpu()
               +-> mmap_per_evsel()
                   +-> mmap__mmap()
                       +-> perf_mmap__mmap()
                           +-> mmap()

   This allocates the ring buffer for the event 'cycles'. With mmap()
   the kernel creates the ring buffer:

   perf_mmap(): kernel function to create the event's ring
   |            buffer to save the sampled data.
   |
   +-> ring_buffer_attach(): Allocates memory for ring buffer.
       |        The PMU has auxiliary data setup function. The
       |        has_aux(event) condition is true and the PMU's
       |        stop() is called to stop sampling. It is not
       |        restarted:
       |
       |        if (has_aux(event))
       |                perf_event_stop(event, 0);
       |
       +-> cpumsf_pmu_stop():

   Hardware sampling is stopped. No samples are generated and saved
   anymore.

3. After the event 'cycles' has been mapped, the event is enabled a
   second time in:

   __cmd_record()
   +-> evlist__enable()
       +-> __evlist__enable()
   +-> evsel__enable_cpu()
       +-> perf_evsel__enable_cpu()
           +-> perf_evsel__run_ioctl()
               +-> perf_evsel__ioctl()
                   +-> __GI___ioctl(., PERF_EVENT_IOC_ENABLE, .)

   The second

      ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

   is just a NOP in this case. The first invocation in (1.) sets the
   event::state to PERF_EVENT_STATE_ACTIVE. The kernel functions

   perf_ioctl()
   +-> _perf_ioctl()
       +-> _perf_event_enable()
           +-> __perf_event_enable()

   return immediately because event::state is already set to
   PERF_EVENT_STATE_ACTIVE.

This happens on s390, because the event 'cycles' offers the possibility
to save auxilary data. The PMU callbacks setup_aux() and free_aux() are
defined. Without both callback functions, cpumsf_pmu_stop() is not
invoked and sampling continues.

To remedy this, remove the first invocation of

   ioctl(..., PERF_EVENT_IOC_ENABLE, ...).

in step (1.) Create the event in step (1.) and enable it in step (3.)
after the ring buffer has been mapped.

Output after:

# ./perf record -aB --synth=no -u 0 -- ./perf test -w thloop 2
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.876 MB perf.data ]
# ./perf  report --stats | grep SAMPLE
              SAMPLE events:      16200  (99.5%)
              SAMPLE events:      16200
#

The software event succeeded both before and after the patch:

# ./perf record -e cpu-clock -aB --synth=no -u 0 -- \
  ./perf test -w thloop 2
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 2.870 MB perf.data ]
# ./perf  report --stats | grep SAMPLE
              SAMPLE events:      53506  (99.8%)
              SAMPLE events:      53506
#

Fixes: b4c658d4d63d61 ("perf target: Remove uid from target")
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250806162417.19666-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Add the ability to suppress perf event enablement

Automatically enabling a perf event after attaching a BPF prog to it is
not always desirable.

Add a new "dont_enable" field to struct bpf_perf_event_opts. While
introducing "enable" instead would be nicer in that it would avoid
a double negation in the implementation, it would make
DECLARE_LIBBPF_OPTS() less efficient.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250806162417.19666-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

btrfs: fix iteration bug in __qgroup_excl_accounting()

__qgroup_excl_accounting() uses the qgroup iterator machinery to
update the account of one qgroups usage for all its parent hierarchy,
when we either add or remove a relation and have only exclusive usage.

However, there is a small bug there: we loop with an extra iteration
temporary qgroup called `cur` but never actually refer to that in the
body of the loop. As a result, we redundantly account the same usage to
the first qgroup in the list.

This can be reproduced in the following way:

  mkfs.btrfs -f -O squota <dev>
  mount <dev> <mnt>
  btrfs subvol create <mnt>/sv
  dd if=/dev/zero of=<mnt>/sv/f bs=1M count=1
  sync
  btrfs qgroup create 1/100 <mnt>
  btrfs qgroup create 2/200 <mnt>
  btrfs qgroup assign 1/100 2/200 <mnt>
  btrfs qgroup assign 0/256 1/100 <mnt>
  btrfs qgroup show <mnt>

and the broken result is (note the 2MiB on 1/100 and 0Mib on 2/100):

  Qgroupid    Referenced    Exclusive   Path
  --------    ----------    ---------   ----
  0/5           16.00KiB     16.00KiB   <toplevel>
  0/256          1.02MiB      1.02MiB   sv

  Qgroupid    Referenced    Exclusive   Path
  --------    ----------    ---------   ----
  0/5           16.00KiB     16.00KiB   <toplevel>
  0/256          1.02MiB      1.02MiB   sv
  1/100          2.03MiB      2.03MiB   2/100<1 member qgroup>
  2/100            0.00B        0.00B   <0 member qgroups>

With this fix, which simply re-uses `qgroup` as the iteration variable,
we see the expected result:

  Qgroupid    Referenced    Exclusive   Path
  --------    ----------    ---------   ----
  0/5           16.00KiB     16.00KiB   <toplevel>
  0/256          1.02MiB      1.02MiB   sv

  Qgroupid    Referenced    Exclusive   Path
  --------    ----------    ---------   ----
  0/5           16.00KiB     16.00KiB   <toplevel>
  0/256          1.02MiB      1.02MiB   sv
  1/100          1.02MiB      1.02MiB   2/100<1 member qgroup>
  2/100          1.02MiB      1.02MiB   <0 member qgroups>

The existing fstests did not exercise two layer inheritance so this bug
was missed. I intend to add that testing there, as well.

Fixes: a0bdc04b0732 ("btrfs: qgroup: use qgroup_iterator in __qgroup_excl_accounting()")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: zoned: do not select metadata BG as finish target

We call btrfs_zone_finish_one_bg() to zone finish one block group and make
room to activate another block group. Currently, we can choose a metadata
block group as a target. But, as we reserve an active metadata block group,
we no longer want to select a metadata block group. So, skip it in the
loop.

CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: do not allow relocation of partially dropped subvolumes

[BUG]
There is an internal report that balance triggered transaction abort,
with the following call trace:

  item 85 key (594509824 169 0) itemoff 12599 itemsize 33
          extent refs 1 gen 197740 flags 2
          ref#0: tree block backref root 7
  item 86 key (594558976 169 0) itemoff 12566 itemsize 33
          extent refs 1 gen 197522 flags 2
          ref#0: tree block backref root 7
...
BTRFS error (device loop0): extent item not found for insert, bytenr 594526208 num_bytes 16384 parent 449921024 root_objectid 934 owner 1 offset 0
BTRFS error (device loop0): failed to run delayed ref for logical 594526208 num_bytes 16384 type 182 action 1 ref_mod 1: -117
------------[ cut here ]------------
BTRFS: Transaction aborted (error -117)
WARNING: CPU: 1 PID: 6963 at ../fs/btrfs/extent-tree.c:2168 btrfs_run_delayed_refs+0xfa/0x110 [btrfs]

And btrfs check doesn't report anything wrong related to the extent
tree.

[CAUSE]
The cause is a little complex, firstly the extent tree indeed doesn't
have the backref for 594526208.

The extent tree only have the following two backrefs around that bytenr
on-disk:

        item 65 key (594509824 METADATA_ITEM 0) itemoff 13880 itemsize 33
                refs 1 gen 197740 flags TREE_BLOCK
                tree block skinny level 0
                (176 0x7) tree block backref root CSUM_TREE
        item 66 key (594558976 METADATA_ITEM 0) itemoff 13847 itemsize 33
                refs 1 gen 197522 flags TREE_BLOCK
                tree block skinny level 0
                (176 0x7) tree block backref root CSUM_TREE

But the such missing backref item is not an corruption on disk, as the
offending delayed ref belongs to subvolume 934, and that subvolume is
being dropped:

        item 0 key (934 ROOT_ITEM 198229) itemoff 15844 itemsize 439
                generation 198229 root_dirid 256 bytenr 10741039104 byte_limit 0 bytes_used 345571328
                last_snapshot 198229 flags 0x1000000000001(RDONLY) refs 0
                drop_progress key (206324 EXTENT_DATA 2711650304) drop_level 2
                level 2 generation_v2 198229

And that offending tree block 594526208 is inside the dropped range of
that subvolume.  That explains why there is no backref item for that
bytenr and why btrfs check is not reporting anything wrong.

But this also shows another problem, as btrfs will do all the orphan
subvolume cleanup at a read-write mount.

So half-dropped subvolume should not exist after an RW mount, and
balance itself is also exclusive to subvolume cleanup, meaning we
shouldn't hit a subvolume half-dropped during relocation.

The root cause is, there is no orphan item for this subvolume.
In fact there are 5 subvolumes from around 2021 that have the same
problem.

It looks like the original report has some older kernels running, and
caused those zombie subvolumes.

Thankfully upstream commit 8d488a8c7ba2 ("btrfs: fix subvolume/snapshot
deletion not triggered on mount") has long fixed the bug.

[ENHANCEMENT]
For repairing such old fs, btrfs-progs will be enhanced.

Considering how delayed the problem will show up (at run delayed ref
time) and at that time we have to abort transaction already, it is too
late.

Instead here we reject any half-dropped subvolume for reloc tree at the
earliest time, preventing confusion and extra time wasted on debugging
similar bugs.

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: error on missing block group when unaccounting log tree extent buffers

Currently we only log an error message if we can't find the block group
for a log tree extent buffer when unaccounting it (while freeing a log
tree). A missing block group means something is seriously wrong and we
end up leaking space from the metadata space info. So return -ENOENT in
case we don't find the block group.

CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: fix wrong length parameter for btrfs_cleanup_ordered_extents()

Inside nocow_one_range(), if the checksum cloning for data reloc inode
failed, we call btrfs_cleanup_ordered_extents() to cleanup the just
allocated ordered extents.

But unlike extent_clear_unlock_delalloc(),
btrfs_cleanup_ordered_extents() requires a length, not an inclusive end
bytenr.

This can be problematic, as the @end is normally way larger than @len.

This means btrfs_cleanup_ordered_extents() can be called on folios
out of the correct range, and if the out-of-range folio is under
writeback, we can incorrectly clear the ordered flag of the folio, and
trigger the DEBUG_WARN() inside btrfs_writepage_cow_fixup().

Fix the wrong parameter with correct length instead.

Fixes: 94f6c5c17e52 ("btrfs: move ordered extent cleanup to where they are allocated")
CC: stable@vger.kernel.org # 6.15+
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>