git.kernel.dk Git - linux-2.6-block.git/log

io_uring/kbuf: check for ring provided buffers first in recycling

This is the most likely of paths if a provided buffer is used, so offer
it up first and push the legacy buffers to later.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove async/poll related provided buffer recycles

These aren't necessary anymore, get rid of them.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: switch to storing struct io_buffer_list locally

Currently the buffer list is stored in struct io_kiocb. The buffer list
can be of two types:

1) Classic/legacy buffer list. These don't need to get referenced after
   a buffer pick, and hence storing them in struct io_kiocb is perfectly
   fine.

2) Ring provided buffer lists. These DO need to be referenced after the
   initial buffer pick, as they need to get consumed later on. This can
   be either just incrementing the head of the ring, or it can be
   consuming parts of a buffer if incremental buffer consumptions has
   been configured.

For case 2, io_uring needs to be careful not to access the buffer list
after the initial pick-and-execute context. The core does recycling of
these, but it's easy to make a mistake, because it's stored in the
io_kiocb which does persist across multiple execution contexts. Either
because it's a multishot request, or simply because it needed some kind
of async trigger (eg poll) for retry purposes.

Add a struct io_buffer_list to struct io_br_sel, which is always on
stack for the various users of it. This prevents the buffer list from
leaking outside of that execution context, and additionally it enables
kbuf to not even pass back the struct io_buffer_list if the given
context isn't appropriately locked already.

This doesn't fix any bugs, it's simply a defensive measure to prevent
any issues with reuse of a buffer list.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/net: use struct io_br_sel->val as the send finish value

Currently a pointer is passed in to the 'ret' in the send mshot handler,
but since we already have a value field in io_br_sel, just use that.
This is also in preparation for needing to pass in struct io_br_sel
to io_send_finish() anyway.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/net: use struct io_br_sel->val as the recv finish value

Currently a pointer is passed in to the 'ret' in the receive handlers,
but since we already have a value field in io_br_sel, just use that.
This is also in preparation for needing to pass in struct io_br_sel
to io_recv_finish() anyway.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: use struct io_br_sel for multiple buffers picking

The networking side uses bundles, which is picking multiple buffers at
the same time. Pass in struct io_br_sel to those helpers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/rw: recycle buffers manually for non-mshot reads

The mshot side of reads already does this, but the regular read path
does not. This leads to needing recycling checks sprinkled in various
spots in the "go async" path, like arming poll. In preparation for
getting rid of those, ensure that read recycles appropriately.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: introduce struct io_br_sel

Rather than return addresses directly from buffer selection, add a
struct around it. No functional changes in this patch, it's in
preparation for storing more buffer related information locally, rather
than in struct io_kiocb.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: pass in struct io_buffer_list to commit/recycle helpers

Rather than have this implied being in the io_kiocb, pass it in directly
so it's immediately obvious where these users of ->buf_list are coming
from.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/net: clarify io_recv_buf_select() return value

It returns 0 on success, less than zero on error.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/net: don't use io_net_kbuf_recyle() for non-provided cases

A previous commit used io_net_kbuf_recyle() for any network helper that
did IO and needed partial retry. However, that's only needed if the
opcode does buffer selection, which isnt support for sendzc, sendmsg_zc,
or sendmsg. Just remove them - they don't do any harm, but it is a bit
confusing when reading the code.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: drop 'issue_flags' from io_put_kbuf(s)() arguments

Picking multiple buffers always requires the ring lock to be held across
the operation, so there's no need to pass in the issue_flags to
io_put_kbufs(). On the single buffer side, if the initial picking of a
ring buffer was unlocked, then it will have been committed already. For
legacy buffers, no locking is required, as they will simply be freed.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-6.18/io_uring' into io_uring-buf-list

* for-6.18/io_uring:
io_uring/zctx: check chained notif contexts
io_uring: add request poisoning

io_uring/zctx: check chained notif contexts

Send zc only links ubuf_info for requests coming from the same context.
There are some ambiguous syz reports, so let's check the assumption on
notification completion.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fd527d8638203fe0f1c5ff06ff2e1d8fd68f831b.1755179962.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'pull-fixes' of git://git./linux/kernel/git/viro/vfs

Pull mount fixes from Al Viro:
"Fixes for several recent mount-related regressions"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  change_mnt_propagation(): calculate propagation source only if we'll need it
  use uniform permission checks for all mount propagation changes
  propagate_umount(): only surviving overmounts should be reparented
  fix the softlockups in attach_recursive_mnt()

Merge tag 'ovl-fixes-6.17-rc1' of git://git./linux/kernel/git/overlayfs/vfs

Pull overlayfs fixes from Amir Goldstein:
"Fixes for two fallouts from Neil's directory locking changes"

* tag 'ovl-fixes-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: fix possible double unlink
ovl: use I_MUTEX_PARENT when locking parent in ovl_create_temp()

Merge tag 'vfs-6.17-rc3.fixes' of git://git./linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

- Fix two memory leaks in pidfs

- Prevent changing the idmapping of an already idmapped mount without
   OPEN_TREE_CLONE through open_tree_attr()

- Don't fail listing extended attributes in kernfs when no extended
   attributes are set

- Fix the return value in coredump_parse()

- Fix the error handling for unbuffered writes in netfs

- Fix broken data integrity guarantees for O_SYNC writes via iomap

- Fix UAF in __mark_inode_dirty()

- Keep inode->i_blkbits constant in fuse

- Fix coredump selftests

- Fix get_unused_fd_flags() usage in do_handle_open()

- Rename EXPORT_SYMBOL_GPL_FOR_MODULES to EXPORT_SYMBOL_FOR_MODULES

- Fix use-after-free in bh_read()

- Fix incorrect lflags value in the move_mount() syscall

* tag 'vfs-6.17-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  signal: Fix memory leak for PIDFD_SELF* sentinels
  kernfs: don't fail listing extended attributes
  coredump: Fix return value in coredump_parse()
  fs/buffer: fix use-after-free when call bh_read() helper
  pidfs: Fix memory leak in pidfd_info()
  netfs: Fix unbuffered write error handling
  fhandle: do_handle_open() should get FD with user flags
  module: Rename EXPORT_SYMBOL_GPL_FOR_MODULES to EXPORT_SYMBOL_FOR_MODULES
  fs: fix incorrect lflags value in the move_mount syscall
  selftests/coredump: Remove the read() that fails the test
  fuse: keep inode->i_blkbits constant
  iomap: Fix broken data integrity guarantees for O_SYNC writes
  selftests/mount_setattr: add smoke tests for open_tree_attr(2) bug
  open_tree_attr: do not allow id-mapping changes without OPEN_TREE_CLONE
  fs: writeback: fix use-after-free in __mark_inode_dirty()

change_mnt_propagation(): calculate propagation source only if we'll need it

We only need it when mount in question was sending events downstream (then
recepients need to switch to new master) or the mount is being turned into
slave (then we need a new master for it).

That wouldn't be a big deal, except that it causes quite a bit of work
when umount_tree() is taking a large peer group out. Adding a trivial
"don't bother calling propagation_source() unless we are going to use
its results" logics improves the things quite a bit.

We are still doing unnecessary work on bulk removals from propagation graph,
but the full solution for that will have to wait for the next merge window.

Fixes: 955336e204ab "do_make_slave(): choose new master sanely"
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

use uniform permission checks for all mount propagation changes

do_change_type() and do_set_group() are operating on different
aspects of the same thing - propagation graph.  The latter
asks for mounts involved to be mounted in namespace(s) the caller
has CAP_SYS_ADMIN for.  The former is a mess - originally it
didn't even check that mount *is* mounted.  That got fixed,
but the resulting check turns out to be too strict for userland -
in effect, we check that mount is in our namespace, having already
checked that we have CAP_SYS_ADMIN there.

What we really need (in both cases) is
* only touch mounts that are mounted.  That's a must-have
constraint - data corruption happens if it get violated.
* don't allow to mess with a namespace unless you already
have enough permissions to do so (i.e. CAP_SYS_ADMIN in its userns).

That's an equivalent of what do_set_group() does; let's extract that
into a helper (may_change_propagation()) and use it in both
do_set_group() and do_change_type().

Fixes: 12f147ddd6de "do_change_type(): refuse to operate on unmounted/not ours mounts"
Acked-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Tested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

propagate_umount(): only surviving overmounts should be reparented

... as the comments in reparent() clearly say. As it is, we reparent
*all* overmounts of the mounts being taken out, including those that
are taken out themselves. It's not only a potentially massive slowdown
(on a pathological setup we might end up with O(N^2) time for N mounts
being kicked out), it can end up with incorrect ->overmount in the
surviving mounts.

Fixes: f0d0ba19985d "Rewrite of propagate_umount()"
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fix the softlockups in attach_recursive_mnt()

In case when we mounting something on top of a large stack of overmounts,
all of them being peers of each other, we get quadratic time by the
depth of overmount stack. Easily fixed by doing commit_tree() before
reparenting the overmount; simplifies commit_tree() as well - it doesn't
need to skip the already mounted stuff that had been reparented on top
of the new mounts.

Since we are holding mount_lock through both reparenting and call of
commit_tree(), the order does not matter from the mount hash point
of view.

Reported-by: "Lai, Yi" <yi1.lai@linux.intel.com>
Tested-by: "Lai, Yi" <yi1.lai@linux.intel.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Fixes: 663206854f02 "copy_tree(): don't link the mounts via mnt_list"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

signal: Fix memory leak for PIDFD_SELF* sentinels

Commit f08d0c3a7111 ("pidfd: add PIDFD_SELF* sentinels to refer to own
thread/process") introduced a leak by acquiring a pid reference through
get_task_pid(), which increments pid->count but never drops it with
put_pid().

As a result, kmemleak reports unreferenced pid objects after running
tools/testing/selftests/pidfd/pidfd_test, for example:

  unreferenced object 0xff1100206757a940 (size 160):
    comm "pidfd_test", pid 16965, jiffies 4294853028
    hex dump (first 32 bytes):
      01 00 00 00 00 00 00 00 00 00 00 00 fd 57 50 04  .............WP.
      5e 44 00 00 00 00 00 00 18 de 34 17 01 00 11 ff  ^D........4.....
    backtrace (crc cd8844d4):
      kmem_cache_alloc_noprof+0x2f4/0x3f0
      alloc_pid+0x54/0x3d0
      copy_process+0xd58/0x1740
      kernel_clone+0x99/0x3b0
      __do_sys_clone3+0xbe/0x100
      do_syscall_64+0x7b/0x2c0
      entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fix this by calling put_pid() after do_pidfd_send_signal() returns.

Fixes: f08d0c3a7111 ("pidfd: add PIDFD_SELF* sentinels to refer to own thread/process")
Signed-off-by: Adrian Huang (Lenovo) <adrianhuang0701@gmail.com>
Link: https://lore.kernel.org/20250818134310.12273-1-adrianhuang0701@gmail.com
Tested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

kernfs: don't fail listing extended attributes

Userspace doesn't expect a failure to list extended attributes:

  $ ls -lA /sys/
  ls: /sys/: No data available
  ls: /sys/kernel: No data available
  ls: /sys/power: No data available
  ls: /sys/class: No data available
  ls: /sys/devices: No data available
  ls: /sys/dev: No data available
  ls: /sys/hypervisor: No data available
  ls: /sys/fs: No data available
  ls: /sys/bus: No data available
  ls: /sys/firmware: No data available
  ls: /sys/block: No data available
  ls: /sys/module: No data available
  total 0
  drwxr-xr-x   2 root root 0 Jan  1  1970 block
  drwxr-xr-x  52 root root 0 Jan  1  1970 bus
  drwxr-xr-x  88 root root 0 Jan  1  1970 class
  drwxr-xr-x   4 root root 0 Jan  1  1970 dev
  drwxr-xr-x  11 root root 0 Jan  1  1970 devices
  drwxr-xr-x   3 root root 0 Jan  1  1970 firmware
  drwxr-xr-x  10 root root 0 Jan  1  1970 fs
  drwxr-xr-x   2 root root 0 Jul  2 09:43 hypervisor
  drwxr-xr-x  14 root root 0 Jan  1  1970 kernel
  drwxr-xr-x 251 root root 0 Jan  1  1970 module
  drwxr-xr-x   3 root root 0 Jul  2 09:43 power

Fix it by simply reporting success when no extended attributes are
available instead of reporting ENODATA.

Link: https://lore.kernel.org/78b13bcdae82ade95e88f315682966051f461dde.camel@linaro.org
Fixes: d1f4e9026007 ("kernfs: remove iattr_mutex") # mainline only
Reported-by: André Draszik <andre.draszik@linaro.org>
Link: https://lore.kernel.org/20250819-ahndung-abgaben-524a535f8101@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>

coredump: Fix return value in coredump_parse()

The coredump_parse() function is bool type.  It should return true on
success and false on failure.  The cn_printf() returns zero on success
or negative error codes.  This mismatch means that when "return err;"
here, it is treated as success instead of failure.  Change it to return
false instead.

Fixes: a5715af549b2 ("coredump: make coredump_parse() return bool")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/aKRGu14w5vPSZLgv@stanley.mountain
Signed-off-by: Christian Brauner <brauner@kernel.org>

fs/buffer: fix use-after-free when call bh_read() helper

There's issue as follows:
BUG: KASAN: stack-out-of-bounds in end_buffer_read_sync+0xe3/0x110
Read of size 8 at addr ffffc9000168f7f8 by task swapper/3/0
CPU: 3 UID: 0 PID: 0 Comm: swapper/3 Not tainted 6.16.0-862.14.0.6.x86_64
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<IRQ>
dump_stack_lvl+0x55/0x70
print_address_description.constprop.0+0x2c/0x390
print_report+0xb4/0x270
kasan_report+0xb8/0xf0
end_buffer_read_sync+0xe3/0x110
end_bio_bh_io_sync+0x56/0x80
blk_update_request+0x30a/0x720
scsi_end_request+0x51/0x2b0
scsi_io_completion+0xe3/0x480
? scsi_device_unbusy+0x11e/0x160
blk_complete_reqs+0x7b/0x90
handle_softirqs+0xef/0x370
irq_exit_rcu+0xa5/0xd0
sysvec_apic_timer_interrupt+0x6e/0x90
</IRQ>

Above issue happens when do ntfs3 filesystem mount, issue may happens
as follows:
           mount                            IRQ
ntfs_fill_super
  read_cache_page
    do_read_cache_folio
      filemap_read_folio
        mpage_read_folio
do_mpage_readpage
  ntfs_get_block_vbo
   bh_read
     submit_bh
     wait_on_buffer(bh);
                            blk_complete_reqs
     scsi_io_completion
      scsi_end_request
       blk_update_request
        end_bio_bh_io_sync
end_buffer_read_sync
  __end_buffer_read_notouch
   unlock_buffer

            wait_on_buffer(bh);--> return will return to caller

  put_bh
    --> trigger stack-out-of-bounds
In the mpage_read_folio() function, the stack variable 'map_bh' is
passed to ntfs_get_block_vbo(). Once unlock_buffer() unlocks and
wait_on_buffer() returns to continue processing, the stack variable
is likely to be reclaimed. Consequently, during the end_buffer_read_sync()
process, calling put_bh() may result in stack overrun.

If the bh is not allocated on the stack, it belongs to a folio.  Freeing
a buffer head which belongs to a folio is done by drop_buffers() which
will fail to free buffers which are still locked.  So it is safe to call
put_bh() before __end_buffer_read_notouch().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/20250811141830.343774-1-yebin@huaweicloud.com
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

io_uring: add request poisoning

Poison various request fields on free. __io_req_caches_free() is a slow
path, so can be done unconditionally, but gate it on kasan for
io_req_add_to_cache(). Note that some fields are logically retained
between cache allocations and can't be poisoned in
io_req_add_to_cache().

Ideally, it'd be replaced with KASAN'ed caches, but that can't be
enabled because of some synchronisation nuances.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7a78e8a7f5be434313c400650b862e36c211b312.1755459452.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'for-6.17-rc2-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"Several zoned mode fixes, mount option printing fixups, folio state
  handling fixes and one log replay fix.

   - zoned mode:
       - zone activation and finish fixes
       - block group reservation fixes

   - mount option fixes:
       - bring back printing of mount options with key=value that got
         accidentally dropped during mount option parsing in 6.8
       - fix inverse logic or typos when printing nodatasum/nodatacow

   - folio status fixes:
       - writeback fixes in zoned mode
       - properly reset dirty/writeback if submission fails
       - properly handle TOWRITE xarray mark/tag

   - do not set mtime/ctime to current time when unlinking for log
     replay"

* tag 'for-6.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix printing of mount info messages for NODATACOW/NODATASUM
  btrfs: restore mount option info messages during mount
  btrfs: fix incorrect log message for nobarrier mount option
  btrfs: fix buffer index in wait_eb_writebacks()
  btrfs: subpage: keep TOWRITE tag until folio is cleaned
  btrfs: clear TAG_TOWRITE from buffer tree when submitting a tree block
  btrfs: do not set mtime/ctime to current time when unlinking for log replay
  btrfs: clear block dirty if btrfs_writepage_cow_fixup() failed
  btrfs: clear block dirty if submit_one_sector() failed
  btrfs: zoned: limit active zones to max_open_zones
  btrfs: zoned: fix write time activation failure for metadata block group
  btrfs: zoned: fix data relocation block group reservation
  btrfs: zoned: skip ZONE FINISH of conventional zones

Merge tag 'ext4_for_linus-6.17-rc3' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:

- Fix fast commit checks for file systems with ea_inode enabled

- Don't drop the i_version mount option on a remount

- Fix FIEMAP reporting when there are holes in a bigalloc file system

- Don't fail when mounting read-only when there are inodes in the
   orphan file

- Fix hole length overflow for indirect mapped files on file systems
   with an 8k or 16k block file system

* tag 'ext4_for_linus-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: prevent softlockup in jbd2_log_do_checkpoint()
  ext4: fix incorrect function name in comment
  ext4: use kmalloc_array() for array space allocation
  ext4: fix hole length calculation overflow in non-extent inodes
  ext4: don't try to clear the orphan_present feature block device is r/o
  ext4: fix reserved gdt blocks handling in fsmap
  ext4: fix fsmap end of range reporting with bigalloc
  ext4: remove redundant __GFP_NOWARN
  ext4: fix unused variable warning in ext4_init_new_dir
  ext4: remove useless if check
  ext4: check fast symlink for ea_inode correctly
  ext4: preserve SB_I_VERSION on remount
  ext4: show the default enabled i_version option

ovl: fix possible double unlink

commit 9d23967b18c6 ("ovl: simplify an error path in
ovl_copy_up_workdir()") introduced the helper ovl_cleanup_unlocked(),
which is later used in several following patches to re-acquire the parent
inode lock and unlink a dentry that was earlier found using lookup.
This helper was eventually renamed to ovl_cleanup().

The helper ovl_parent_lock() is used to re-acquire the parent inode lock.
After acquiring the parent inode lock, the helper verifies that the
dentry has not since been moved to another parent, but it failed to
verify that the dentry wasn't unlinked from the parent.

This means that now every call to ovl_cleanup() could potentially
race with another thread, unlinking the dentry to be cleaned up
underneath overlayfs and trigger a vfs assertion.

Reported-by: syzbot+ec9fab8b7f0386b98a17@syzkaller.appspotmail.com
Tested-by: syzbot+ec9fab8b7f0386b98a17@syzkaller.appspotmail.com
Fixes: 9d23967b18c6 ("ovl: simplify an error path in ovl_copy_up_workdir()")
Suggested-by: NeilBrown <neil@brown.name>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>

ovl: use I_MUTEX_PARENT when locking parent in ovl_create_temp()

ovl_create_temp() treats "workdir" as a parent in which it creates an
object so it should use I_MUTEX_PARENT.

Prior to the commit identified below the lock was taken by the caller
which sometimes used I_MUTEX_PARENT and sometimes used I_MUTEX_NORMAL.
The use of I_MUTEX_NORMAL was incorrect but unfortunately copied into
ovl_create_temp().

Note to backporters: This patch only applies after the last Fixes given
below (post v6.16). To fix the bug in v6.7 and later the
inode_lock() call in ovl_copy_up_workdir() needs to nest using
I_MUTEX_PARENT.

Link: https://lore.kernel.org/all/67a72070.050a0220.3d72c.0022.GAE@google.com/
Cc: stable@vger.kernel.org
Reported-by: syzbot+7836a68852a10ec3d790@syzkaller.appspotmail.com
Tested-by: syzbot+7836a68852a10ec3d790@syzkaller.appspotmail.com
Fixes: c63e56a4a652 ("ovl: do not open/llseek lower file with upper sb_writers held")
Fixes: d2c995581c7c ("ovl: Call ovl_create_temp() without lock held.")
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Linux 6.17-rc2

Merge tag 'x86_urgent_for_v6.17_rc2' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- Remove a transitional asm/cpuid.h header which was added only as a
   fallback during cpuid helpers reorg

- Initialize reserved fields in the SVSM page validation calls
   structure to zero in order to allow for future structure extensions

- Have the sev-guest driver's buffers used in encryption operations be
   in linear mapping space as the encryption operation can be offloaded
   to an accelerator

- Have a read-only MSR write when in an AMD SNP guest trap to the
   hypervisor as it is usually done. This makes the guest user
   experience better by simply raising a #GP instead of terminating said
   guest

- Do not output AVX512 elapsed time for kernel threads because the data
   is wrong and fix a NULL pointer dereferencing in the process

- Adjust the SRSO mitigation selection to the new attack vectors

* tag 'x86_urgent_for_v6.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpuid: Remove transitional <asm/cpuid.h> header
  x86/sev: Ensure SVSM reserved fields in a page validation entry are initialized to zero
  virt: sev-guest: Satisfy linear mapping requirement in get_derived_key()
  x86/sev: Improve handling of writes to intercepted TSC MSRs
  x86/fpu: Fix NULL dereference in avx512_status()
  x86/bugs: Select best SRSO mitigation

Merge tag 'locking_urgent_for_v6.17_rc2' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Borislav Petkov:

- Make sure sanity checks down in the mutex lock path happen on the
   correct type of task so that they don't trigger falsely

- Use the write unsafe user access pairs when writing a futex value to
   prevent an error on PowerPC which does user read and write accesses
   differently

* tag 'locking_urgent_for_v6.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking: Fix __clear_task_blocked_on() warning from __ww_mutex_wound() path
  futex: Use user_write_access_begin/_end() in futex_put_value()

Merge tag 'rust-fixes-6.17' of git://git./linux/kernel/git/ojeda/linux

Pull rust fixes from Miguel Ojeda:

- Workaround 'rustdoc' target modifiers bug in Rust >= 1.88.0. It will
   be fixed in Rust 1.90.0 (expected 2025-09-18).

- Clean 'rustdoc' output before running it to avoid confusing the tool
   when files from previous versions remain.

* tag 'rust-fixes-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
  rust: kbuild: clean output before running `rustdoc`
  rust: workaround `rustdoc` target modifiers bug

Merge tag 'ata-ata-6.17-rc2' of git://git./linux/kernel/git/libata/linux

Pull ata fixes from Damien Le Moal:

- Fix a regression affecting old IDE/PATA device scan and introduced by
   the recent link power management cleanups & fixes. The regression
   prevented devices from being properly detected (me)

- Fix command duration limits (CDL) feature control: attempting to
   enable the feature while NCQ commands are being executed resulted in
   a silent failure to enable CDL when needed (Igor)

* tag 'ata-ata-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata-scsi: Fix CDL control
  ata: libata-eh: Fix link state check for IDE/PATA ports

Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"One core change removing the 'w' access flag of attributes that don't
  have a set routine (and therefore can't be written to) which should
  have no practical impact. The big scsi_debug update is caused by
  reformatting lots of arrays and the rest of the bug fixes in drivers
  are trivial"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Remove error print for devm_add_action_or_reset()
  scsi: ufs: mediatek: Fix out-of-bounds access in MCQ IRQ mapping
  scsi: lpfc: Remove redundant assignment to avoid memory leak
  scsi: lpfc: Fix wrong function reference in a comment
  scsi: ufs: core: Fix interrupt handling for MCQ Mode
  scsi: scsi_debug: Make read-only arrays static const
  scsi: core: sysfs: Correct sysfs attributes access rights

Merge tag 'drm-fixes-2025-08-16' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Relatively quiet week, usual amdgpu/i915/xe fixes along with a set of
  fixes for fbdev format info, which fix some regressions seen in with
  rc1.

  bridge:
   - fix OF-node leak
   - fix documentation

  fbdev-emulation:
   - pass correct format info to drm_helper_mode_fill_fb_struct()

  panfrost:
   - print correct RSS size

  amdgpu:
   - PSP fix
   - VRAM reservation fix
   - CSA fix
   - Process kill fix

  i915:
   - Fix the implementation of wa_18038517565 [fbc]
   - Do not trigger Frame Change events from frontbuffer flush [psr]

  xe:
   - Some more xe_migrate_access_memory fixes (Auld)
   - Defer buffer object shrinker write-backs and GPU waits (Thomas)
   - HWMON fix for clamping limits (Karthik)
   - SRIOV-PF: Set VF LMEM BAR size (Michal)"

* tag 'drm-fixes-2025-08-16' of https://gitlab.freedesktop.org/drm/kernel:
  drm/xe/pf: Set VF LMEM BAR size
  drm/amdgpu: fix task hang from failed job submission during process kill
  drm/amdgpu: fix incorrect vm flags to map bo
  drm/amdgpu: fix vram reservation issue
  drm/amdgpu: Add PSP fw version check for fw reserve GFX command
  drm/xe/hwmon: Add SW clamp for power limits writes
  drm/xe: Defer buffer object shrinker write-backs and GPU waits
  drm/xe/migrate: prevent potential UAF
  drm/xe/migrate: don't overflow max copy size
  drm/xe/migrate: prevent infinite recursion
  drm/i915/psr: Do not trigger Frame Change events from frontbuffer flush
  drm/i915/fbc: fix the implementation of wa_18038517565
  drm/panfrost: Print RSS for tiler heap BO's in debugfs GEMS file
  drm/radeon: Pass along the format info from .fb_create() to drm_helper_mode_fill_fb_struct()
  drm/nouveau: Pass along the format info from .fb_create() to drm_helper_mode_fill_fb_struct()
  drm/omap: Pass along the format info from .fb_create() to drm_helper_mode_fill_fb_struct()
  drm/bridge: document HDMI CEC callbacks
  drm/bridge: Describe the newly introduced drm_connector parameter for drm_bridge_detect
  drm/bridge: fix OF node leak

Merge tag 'xfs-fixes-6.17-rc2' of git://git./fs/xfs/xfs-linux

Pull xfs fixes from Carlos Maiolino:

- Fix an assert trigger introduced during the merge window

- Prevent atomic writes to be used with DAX

- Prevent users from using the max_atomic_write mount option without
   reflink, as atomic writes > 1block are not supported without reflink

- Fix a null-pointer-deref in a tracepoint

* tag 'xfs-fixes-6.17-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: split xfs_zone_record_blocks
  xfs: fix scrub trace with null pointer in quotacheck
  xfs: reject max_atomic_write mount option for no reflink
  xfs: disallow atomic writes on DAX
  fs/dax: Reject IOCB_ATOMIC in dax_iomap_rw()
  xfs: remove XFS_IBULK_SAME_AG
  xfs: fully decouple XFS_IBULK* flags from XFS_IWALK* flags
  xfs: fix frozen file system assert in xfs_trans_alloc

Merge tag 'block-6.17-20250815' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

- Fix for unprivileged daemons in ublk

- Speedup ublk release by removing unnecessary quiesce

- Fix for blk-wbt, where a regression caused it to not be possible to
   enable at runtime

- blk-wbt cleanups

- Kill the page pool from drbd

- Remove redundant __GFP_NOWARN uses in a few spots

- Fix for a kobject double initialization issues

* tag 'block-6.17-20250815' of git://git.kernel.dk/linux:
  block: restore default wbt enablement
  Docs: admin-guide: Correct spelling mistake
  blk-wbt: doc: Update the doc of the wbt_lat_usec interface
  blk-wbt: Eliminate ambiguity in the comments of struct rq_wb
  blk-wbt: Optimize wbt_done() for non-throttled writes
  block: fix kobject double initialization in add_disk
  blk-cgroup: remove redundant __GFP_NOWARN
  block, bfq: remove redundant __GFP_NOWARN
  ublk: check for unprivileged daemon on each I/O fetch
  ublk: don't quiesce in ublk_ch_release
  drbd: Remove the open-coded page pool

x86/cpuid: Remove transitional <asm/cpuid.h> header

All CPUID call sites were updated at commit:

968e30006807 ("x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID header")

to include <asm/cpuid/api.h> instead of <asm/cpuid.h>.

The <asm/cpuid.h> header was still retained as a wrapper, just in case
some new code in -next started using it. Now that everything is merged
to Linus' tree, remove the header.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250815070227.19981-2-darwi@linutronix.de

x86/sev: Ensure SVSM reserved fields in a page validation entry are initialized to zero

In order to support future versions of the SVSM_CORE_PVALIDATE call, all
reserved fields within a PVALIDATE entry must be set to zero as an SVSM should
be ensuring all reserved fields are zero in order to support future usage of
reserved areas based on the protocol version.

Fixes: fcd042e86422 ("x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/7cde412f8b057ea13a646fb166b1ca023f6a5031.1755098819.git.thomas.lendacky@amd.com

virt: sev-guest: Satisfy linear mapping requirement in get_derived_key()

Commit

7ffeb2fc2670 ("x86/sev: Document requirement for linear mapping of guest request buffers")

added a check that requires the guest request buffers to be in the linear
mapping. The get_derived_key() function was passing a buffer that was
allocated on the stack, resulting in the call to snp_send_guest_request()
returning an error.

Update the get_derived_key() function to use an allocated buffer instead
of a stack buffer.

Fixes: 7ffeb2fc2670 ("x86/sev: Document requirement for linear mapping of guest request buffers")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/9b764ca9fc79199a091aac684c4926e2080ca7a8.1752698495.git.thomas.lendacky@amd.com

Merge tag 'io_uring-6.17-20250815' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

- Tweak for the fairly recent changes of minimizing io-wq worker
   creations when it's pointless to create them.

- Fix for an issue with ring provided buffers, which could cause issues
   with reuse or corrupt application data.

* tag 'io_uring-6.17-20250815' of git://git.kernel.dk/linux:
  io_uring/io-wq: add check free worker before create new worker
  io_uring/net: commit partial buffers on retry

pidfs: Fix memory leak in pidfd_info()

After running the program 'ioctl_pidfd03' of Linux Test Project (LTP) or
the program 'pidfd_info_test' in 'tools/testing/selftests/pidfd' of the
kernel source, kmemleak reports the following memory leaks:

  # cat /sys/kernel/debug/kmemleak
  unreferenced object 0xff110020e5988000 (size 8216):
    comm "ioctl_pidfd03", pid 10853, jiffies 4294800031
    hex dump (first 32 bytes):
      02 40 00 00 00 00 00 00 10 00 00 00 00 00 00 00  .@..............
      00 00 00 00 af 01 00 00 80 00 00 00 00 00 00 00  ................
    backtrace (crc 69483047):
      kmem_cache_alloc_node_noprof+0x2fb/0x410
      copy_process+0x178/0x1740
      kernel_clone+0x99/0x3b0
      __do_sys_clone3+0xbe/0x100
      do_syscall_64+0x7b/0x2c0
      entry_SYSCALL_64_after_hwframe+0x76/0x7e
  ...
  unreferenced object 0xff11002097b70000 (size 8216):
  comm "pidfd_info_test", pid 11840, jiffies 4294889165
  hex dump (first 32 bytes):
    06 40 00 00 00 00 00 00 10 00 00 00 00 00 00 00  .@..............
    00 00 00 00 b5 00 00 00 80 00 00 00 00 00 00 00  ................
  backtrace (crc a6286bb7):
    kmem_cache_alloc_node_noprof+0x2fb/0x410
    copy_process+0x178/0x1740
    kernel_clone+0x99/0x3b0
    __do_sys_clone3+0xbe/0x100
    do_syscall_64+0x7b/0x2c0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
  ...

The leak occurs because pidfd_info() obtains a task_struct via
get_pid_task() but never calls put_task_struct() to drop the reference,
leaving task->usage unbalanced.

Fix the issue by adding '__free(put_task) = NULL' to the local variable
'task', ensuring that put_task_struct() is automatically invoked when
the variable goes out of scope.

Fixes: 7477d7dce48a ("pidfs: allow to retrieve exit information")
Signed-off-by: Adrian Huang (Lenovo) <adrianhuang0701@gmail.com>
Link: https://lore.kernel.org/20250814094453.15232-1-adrianhuang0701@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>

Merge tag 'sound-6.17-rc2' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of small fixes:

   - Potential OOB access fixes in USB-audio driver

   - ASoC kconfig menu fix for improving the generic drivers

   - HD-audio quirks and a fix revert

   - Codec and platform-specific small fixes for ASoC"

* tag 'sound-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/tas2781: Normalize the volume kcontrol name
  ALSA: usb-audio: Validate UAC3 cluster segment descriptors
  ALSA: usb-audio: Validate UAC3 power domain descriptors, too
  Revert "ALSA: hda: Add ASRock X670E Taichi to denylist"
  ALSA: azt3328: Put __maybe_unused for inline functions for gameport
  ASoC: tas2781: Normalize the volume kcontrol name
  ASoC: stm: stm32_i2s: Fix calc_clk_div() error handling in determine_rate()
  ASoC: codecs: Call strscpy() with correct size argument
  ALSA: hda/realtek: Fix headset mic on HONOR BRB-X
  ALSA: hda/realtek: Add Framework Laptop 13 (AMD Ryzen AI 300) to quirks
  ASoC: tas2781: Fix spelling mistake "dismatch" -> "mismatch"
  ASoC: rt1320: fix random cycle mute issue
  ASoC: rt721: fix FU33 Boost Volume control not working
  ASoC: generic: tidyup standardized ASoC menu for generic
  ASoC: codec: sma1307: replace spelling mistake with new error message
  ASoC: codecs: tx-macro: correct tx_macro_component_drv name
  ASoC: fsl_sai: replace regmap_write with regmap_update_bits

netfs: Fix unbuffered write error handling

If all the subrequests in an unbuffered write stream fail, the subrequest
collector doesn't update the stream->transferred value and it retains its
initial LONG_MAX value.  Unfortunately, if all active streams fail, then we
take the smallest value of { LONG_MAX, LONG_MAX, ... } as the value to set
in wreq->transferred - which is then returned from ->write_iter().

LONG_MAX was chosen as the initial value so that all the streams can be
quickly assessed by taking the smallest value of all stream->transferred -
but this only works if we've set any of them.

Fix this by adding a flag to indicate whether the value in
stream->transferred is valid and checking that when we integrate the
values.  stream->transferred can then be initialised to zero.

This was found by running the generic/750 xfstest against cifs with
cache=none.  It splices data to the target file.  Once (if) it has used up
all the available scratch space, the writes start failing with ENOSPC.
This causes ->write_iter() to fail.  However, it was returning
wreq->transferred, i.e. LONG_MAX, rather than an error (because it thought
the amount transferred was non-zero) and iter_file_splice_write() would
then try to clean up that amount of pipe bufferage - leading to an oops
when it overran.  The kernel log showed:

    CIFS: VFS: Send error in write = -28

followed by:

    BUG: kernel NULL pointer dereference, address: 0000000000000008

with:

    RIP: 0010:iter_file_splice_write+0x3a4/0x520
    do_splice+0x197/0x4e0

or:

    RIP: 0010:pipe_buf_release (include/linux/pipe_fs_i.h:282)
    iter_file_splice_write (fs/splice.c:755)

Also put a warning check into splice to announce if ->write_iter() returned
that it had written more than it was asked to.

Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Reported-by: Xiaoli Feng <fengxiaoli0714@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220445
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/915443.1755207950@warthog.procyon.org.uk
cc: Paulo Alcantara <pc@manguebit.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>

Merge tag 'gpio-fixes-for-v6.17-rc2' of git://git./linux/kernel/git/brgl/linux

Pull gpio fix from Bartosz Golaszewski:

- fix the way optional interrupts are retrieved from firmware in
   gpio-mlxbf3

* tag 'gpio-fixes-for-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: mlxbf3: use platform_get_irq_optional()
  Revert "gpio: mlxbf3: only get IRQ for device instance 0"

fhandle: do_handle_open() should get FD with user flags

In f07c7cc4684a, do_handle_open() was switched to use the automatic
cleanup method for getting a FD. In that change it was also switched
to pass O_CLOEXEC unconditionally to get_unused_fd_flags() instead
of passing the user-specified flags.

I don't see anything in that commit description that indicates this was
intentional, so I am assuming it was an oversight.

With this fix, the FD will again be opened with, or without, O_CLOEXEC
according to what the user requested.

Fixes: f07c7cc4684a ("fhandle: simplify error handling")
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Link: https://lore.kernel.org/20250814235431.995876-4-tahbertschinger@gmail.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

Merge tag 'pmdomain-v6.17-rc1' of git://git./linux/kernel/git/ulfh/linux-pm

Pull pmdomain fix from Ulf Hansson:

- tegra: Ensure pmc power-domains are in a known state

* tag 'pmdomain-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
soc/tegra: pmc: Ensure power-domains are in a known state

Merge tag '6.17-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Fix unlink race and rename races

- SMB3.1.1 compression fix

- Avoid unneeded strlen calls in cifs_get_spnego_key

- Fix slab out of bounds in parse_server_interfaces()

- Fix mid leak and server buffer leak

- smbdirect send error path fix

- update internal version #

- Fix unneeded response time update in negotiate protocol

* tag '6.17-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: remove redundant lstrp update in negotiate protocol
  cifs: update internal version number
  smb: client: don't wait for info->send_pending == 0 on error
  smb: client: fix mid_q_entry memleak leak with per-mid locking
  smb3: fix for slab out of bounds on mount to ksmbd
  cifs: avoid extra calls to strlen() in cifs_get_spnego_key()
  cifs: Fix collect_sample() to handle any iterator type
  smb: client: fix race with concurrent opens in rename(2)
  smb: client: fix race with concurrent opens in unlink(2)

Merge tag 'firewire-fixes-6.17-rc1' of git://git./linux/kernel/git/ieee1394/linux1394

Pull firewire fixes from Takashi Sakamoto:
"This fixes a potential call to schedule() within an RCU read-side
  critical section. The solution applies reference counting to ensure
  that handlers which may call schedule() are invoked safely outside of
  the critical section"

* tag 'firewire-fixes-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: core: reallocate buffer for FCP address handlers when more than 4 are registered
  firewire: core: call FCP address handlers outside RCU read-side critical section
  firewire: core: call handler for exclusive regions outside RCU read-side critical section
  firewire: core: use reference counting to invoke address handlers safely

Merge tag 'drm-xe-fixes-2025-08-14' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

- Some more xe_migrate_access_memory fixes (Auld)
- Defer buffer object shrinker write-backs and GPU waits (Thomas)
- HWMON fix for clamping limits (Karthik)
- SRIOV-PF: Set VF LMEM BAR size (Michal)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aJ4MIZQurSo0uNxn@intel.com

Merge tag 'drm-intel-fixes-2025-08-13' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Fix the implementation of wa_18038517565 [fbc] (Vinod Govindapillai)
- Do not trigger Frame Change events from frontbuffer flush [psr] (Jouni Högander)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://lore.kernel.org/r/aJ0HAh06VHWVdv63@linux

Merge tag 'acpi-6.17-rc2' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These restore corner case behavior of the EC driver related to the
  handling of defective ACPI tables and fix a recent regression in the
  ACPI processor driver:

   - Prevent the ACPI EC driver from ignoring ECDT information in the
     cases when the ID string in the ECDT is invalid, but not empty, to
     fix thouchpad detection on ThinkBook 14 G7 IML (Armin Wolf)

   - Rearrange checks in acpi_processor_ppc_init() to restore the
     handling of frequency QoS requests related to _PPC limits
     inadvertently broken by a recent update (Rafael Wysocki)"

* tag 'acpi-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: EC: Relax sanity check of the ECDT ID string
  ACPI: processor: perflib: Move problematic pr->performance check

Merge tag 'pm-6.17-rc2' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These remove an artificial limitation from the intel_idle driver,
  update the menu cpuidle governor to restore its previous behavior in a
  corner case and add one more supported platform configuration to the
  intel_pstate driver:

   - Allow intel_idle to use _CST information from ACPI tables for idle
     states enumeration on any family of processors (Len Brown)

   - Restore corner case behavior of the menu cpuidle governor, related
     to the handling of systems where idle states selected by the
     governor are rejected by the cpuidle driver, inadvertently changed
     during the 6.15 development cycle (Rafael Wysocki)

   - Add support for Clearwater Forest in the out-of-band (OOB) mode to
     the intel_pstate driver (Srinivas Pandruvada)"

* tag 'pm-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Support Clearwater Forest OOB mode
  cpuidle: governors: menu: Avoid using invalid recent intervals data
  intel_idle: Allow loading ACPI tables for any family

drm/xe/pf: Set VF LMEM BAR size

LMEM is partitioned between multiple VFs and we expect that the more
VFs we have, the less LMEM is assigned to each VF.
This means that we can achieve full LMEM BAR access without the need to
attempt full VF LMEM BAR resize via pci_resize_resource().

Always try to set the largest possible BAR size that allows to fit the
number of enabled VFs and inform the user in case the resize attempt is
not successful.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20250527120637.665506-7-michal.winiarski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 32a4d1b98e6663101fd0abfaf151c48feea7abb1)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge tag 'net-6.17-rc2' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from Netfilter and IPsec.

  Current release - regressions:

   - netfilter: nft_set_pipapo:
      - don't return bogus extension pointer
      - fix null deref for empty set

  Current release - new code bugs:

   - core: prevent deadlocks when enabling NAPIs with mixed kthread
     config

   - eth: netdevsim: Fix wild pointer access in nsim_queue_free().

  Previous releases - regressions:

   - page_pool: allow enabling recycling late, fix false positive
     warning

   - sched: ets: use old 'nbands' while purging unused classes

   - xfrm:
      - restore GSO for SW crypto
      - bring back device check in validate_xmit_xfrm

   - tls: handle data disappearing from under the TLS ULP

   - ptp: prevent possible ABBA deadlock in ptp_clock_freerun()

   - eth:
      - bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
      - hv_netvsc: fix panic during namespace deletion with VF

  Previous releases - always broken:

   - netfilter: fix refcount leak on table dump

   - vsock: do not allow binding to VMADDR_PORT_ANY

   - sctp: linearize cloned gso packets in sctp_rcv

   - eth:
      - hibmcge: fix the division by zero issue
      - microchip: fix KSZ8863 reset problem"

* tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
  net: usb: asix_devices: add phy_mask for ax88772 mdio bus
  net: kcm: Fix race condition in kcm_unattach()
  selftests: net/forwarding: test purge of active DWRR classes
  net/sched: ets: use old 'nbands' while purging unused classes
  bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
  netdevsim: Fix wild pointer access in nsim_queue_free().
  net: mctp: Fix bad kfree_skb in bind lookup test
  netfilter: nf_tables: reject duplicate device on updates
  ipvs: Fix estimator kthreads preferred affinity
  netfilter: nft_set_pipapo: fix null deref for empty set
  selftests: tls: test TCP stealing data from under the TLS socket
  tls: handle data disappearing from under the TLS ULP
  ptp: prevent possible ABBA deadlock in ptp_clock_freerun()
  ixgbe: prevent from unwanted interface name changes
  devlink: let driver opt out of automatic phys_port_name generation
  net: prevent deadlocks when enabling NAPIs with mixed kthread config
  net: update NAPI threaded config even for disabled NAPIs
  selftests: drv-net: don't assume device has only 2 queues
  docs: Fix name for net.ipv4.udp_child_hash_entries
  riscv: dts: thead: Add APB clocks for TH1520 GMACs
  ...

Merge branches 'acpi-ec' and 'acpi-processor'

* acpi-ec:
ACPI: EC: Relax sanity check of the ECDT ID string

* acpi-processor:
ACPI: processor: perflib: Move problematic pr->performance check

Merge branches 'pm-cpuidle' and 'pm-cpufreq'

* pm-cpuidle:
  cpuidle: governors: menu: Avoid using invalid recent intervals data
  intel_idle: Allow loading ACPI tables for any family

* pm-cpufreq:
  cpufreq: intel_pstate: Support Clearwater Forest OOB mode

ata: libata-scsi: Fix CDL control

Delete extra checks for the ATA_DFLAG_CDL_ENABLED flag that prevent
SET FEATURES command from being issued to a drive when NCQ commands
are active.

ata_mselect_control_ata_feature() sets / clears the ATA_DFLAG_CDL_ENABLED
flag during the translation of MODE SELECT to SET FEATURES. If SET FEATURES
gets deferred due to outstanding NCQ commands, the original MODE SELECT
command will be re-queued. When the re-queued MODE SELECT goes through
the ata_mselect_control_ata_feature() translation again, SET FEATURES
will not be issued because ATA_DFLAG_CDL_ENABLED has been already set or
cleared by the initial translation of MODE SELECT.

The ATA_DFLAG_CDL_ENABLED checks in ata_mselect_control_ata_feature()
are safe to remove because scsi_cdl_enable() implements a similar logic
that avoids enabling CDL if it has been enabled already.

Fixes: 17e897a45675 ("ata: libata-scsi: Improve CDL control")
Cc: stable@vger.kernel.org
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

ata: libata-eh: Fix link state check for IDE/PATA ports

Commit 4371fe1ba400 ("ata: libata-eh: Avoid unnecessary resets when
revalidating devices") replaced the call to ata_phys_link_offline() in
ata_eh_revalidate_and_attach() with the new function
ata_eh_link_established() which relaxes the checks on a device link
state to account for low power mode transitions. However, this change
assumed that the device port has a valid scr_read method to obtain the
SStatus register for the port. This is not always the case, especially
with older IDE/PATA adapters (e.g. PATA/IDE devices emulated with QEMU).
For such adapter, ata_eh_link_established() will always return false,
causing ata_eh_revalidate_and_attach() to go into its error path and
ultimately to the device being disabled.

Avoid this by restoring the previous behavior, which is to assume that
the link is online if reading the port SStatus register fails.

While at it, also fix the spelling of SStatus in the comment describing
the function ata_eh_link_established().

Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 4371fe1ba400 ("ata: libata-eh: Avoid unnecessary resets when revalidating devices")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>

ALSA: hda/tas2781: Normalize the volume kcontrol name

Change the name of the kcontrol from "Gain" to "Volume".

Signed-off-by: Baojun Xu <baojun.xu@ti.com>
Link: https://patch.msgid.link/20250813100842.12224-1-baojun.xu@ti.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: usb-audio: Validate UAC3 cluster segment descriptors

UAC3 class segment descriptors need to be verified whether their sizes
match with the declared lengths and whether they fit with the
allocated buffer sizes, too. Otherwise malicious firmware may lead to
the unexpected OOB accesses.

Fixes: 11785ef53228 ("ALSA: usb-audio: Initial Power Domain support")
Reported-and-tested-by: Youngjun Lee <yjjuny.lee@samsung.com>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20250814081245.8902-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: usb-audio: Validate UAC3 power domain descriptors, too

UAC3 power domain descriptors need to be verified with its variable
bLength for avoiding the unexpected OOB accesses by malicious
firmware, too.

Fixes: 9a2fe9b801f5 ("ALSA: usb: initial USB Audio Device Class 3.0 support")
Reported-and-tested-by: Youngjun Lee <yjjuny.lee@samsung.com>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20250814081245.8902-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>

net: usb: asix_devices: add phy_mask for ax88772 mdio bus

Without setting phy_mask for ax88772 mdio bus, current driver may create
at most 32 mdio phy devices with phy address range from 0x00 ~ 0x1f.
DLink DUB-E100 H/W Ver B1 is such a device. However, only one main phy
device will bind to net phy driver. This is creating issue during system
suspend/resume since phy_polling_mode() in phy_state_machine() will
directly deference member of phydev->drv for non-main phy devices. Then
NULL pointer dereference issue will occur. Due to only external phy or
internal phy is necessary, add phy_mask for ax88772 mdio bus to workarnoud
the issue.

Closes: https://lore.kernel.org/netdev/20250806082931.3289134-1-xu.yang_2@nxp.com
Fixes: e532a096be0e ("net: usb: asix: ax88772: add phylib support")
Cc: stable@vger.kernel.org
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250811092931.860333-1-xu.yang_2@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'asoc-fix-v6.17-rc1' of https://git./linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.17

A reasonably small collection of fixes that came in since the merge
window, mostly small and driver specific plus a cleanup of the menu
reorganisation to address some user confusion with the way the generic
drivers had been handled.

Merge tag 'probes-fixes-v6.17-rc1' of git://git./linux/kernel/git/trace/linux-trace

Pull probes fix from Masami Hiramatsu:

- MAINTAINERS: Remove bouncing kprobes maintainer

* tag 'probes-fixes-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
MAINTAINERS: Remove bouncing kprobes maintainer

MAINTAINERS: Remove bouncing kprobes maintainer

The kprobes MAINTAINERS entry includes anil.s.keshavamurthy@intel.com.
That address is bouncing. Remove it.

This still leaves three other listed maintainers.

Link: https://lore.kernel.org/all/20250808180124.7DDE2ECD@davehans-spike.ostc.intel.com/
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: linux-trace-kernel@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

net: kcm: Fix race condition in kcm_unattach()

syzbot found a race condition when kcm_unattach(psock)
and kcm_release(kcm) are executed at the same time.

kcm_unattach() is missing a check of the flag
kcm->tx_stopped before calling queue_work().

If the kcm has a reserved psock, kcm_unattach() might get executed
between cancel_work_sync() and unreserve_psock() in kcm_release(),
requeuing kcm->tx_work right before kcm gets freed in kcm_done().

Remove kcm->tx_stopped and replace it by the less
error-prone disable_work_sync().

Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Reported-by: syzbot+e62c9db591c30e174662@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e62c9db591c30e174662
Reported-by: syzbot+d199b52665b6c3069b94@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d199b52665b6c3069b94
Reported-by: syzbot+be6b1fdfeae512726b4e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=be6b1fdfeae512726b4e
Signed-off-by: Sven Stegemann <sven@stegemann.de>
Link: https://patch.msgid.link/20250812191810.27777-1-sven@stegemann.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ets-use-old-nbands-while-purging-unused-classes'

Davide Caratti says:

====================
ets: use old 'nbands' while purging unused classes

- patch 1/2 fixes a NULL dereference in the control path of sch_ets qdisc
- patch 2/2 extends kselftests to verify effectiveness of the above fix
====================

Link: https://patch.msgid.link/cover.1755016081.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net/forwarding: test purge of active DWRR classes

Extend sch_ets.sh to add a reproducer for problematic list deletions when
active DWRR class are purged by ets_qdisc_change() [1] [2].

[1] https://lore.kernel.org/netdev/e08c7f4a6882f260011909a868311c6e9b54f3e4.1639153474.git.dcaratti@redhat.com/
[2] https://lore.kernel.org/netdev/f3b9bacc73145f265c19ab80785933da5b7cbdec.1754581577.git.dcaratti@redhat.com/

Suggested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/489497cb781af7389011ca1591fb702a7391f5e7.1755016081.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: ets: use old 'nbands' while purging unused classes

Shuang reported sch_ets test-case [1] crashing in ets_class_qlen_notify()
after recent changes from Lion [2]. The problem is: in ets_qdisc_change()
we purge unused DWRR queues; the value of 'q->nbands' is the new one, and
the cleanup should be done with the old one. The problem is here since my
first attempts to fix ets_qdisc_change(), but it surfaced again after the
recent qdisc len accounting fixes. Fix it purging idle DWRR queues before
assigning a new value of 'q->nbands', so that all purge operations find a
consistent configuration:

- old 'q->nbands' because it's needed by ets_class_find()
- old 'q->nstrict' because it's needed by ets_class_is_strict()

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 62 UID: 0 PID: 39457 Comm: tc Kdump: loaded Not tainted 6.12.0-116.el10.x86_64 #1 PREEMPT(voluntary)
Hardware name: Dell Inc. PowerEdge R640/06DKY5, BIOS 2.12.2 07/09/2021
RIP: 0010:__list_del_entry_valid_or_report+0x4/0x80
Code: ff 4c 39 c7 0f 84 39 19 8e ff b8 01 00 00 00 c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <48> 8b 17 48 8b 4f 08 48 85 d2 0f 84 56 19 8e ff 48 85 c9 0f 84 ab
RSP: 0018:ffffba186009f400 EFLAGS: 00010202
RAX: 00000000000000d6 RBX: 0000000000000000 RCX: 0000000000000004
RDX: ffff9f0fa29b69c0 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffffffc12c2400 R08: 0000000000000008 R09: 0000000000000004
R10: ffffffffffffffff R11: 0000000000000004 R12: 0000000000000000
R13: ffff9f0f8cfe0000 R14: 0000000000100005 R15: 0000000000000000
FS:  00007f2154f37480(0000) GS:ffff9f269c1c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001530be001 CR4: 00000000007726f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
  <TASK>
  ets_class_qlen_notify+0x65/0x90 [sch_ets]
  qdisc_tree_reduce_backlog+0x74/0x110
  ets_qdisc_change+0x630/0xa40 [sch_ets]
  __tc_modify_qdisc.constprop.0+0x216/0x7f0
  tc_modify_qdisc+0x7c/0x120
  rtnetlink_rcv_msg+0x145/0x3f0
  netlink_rcv_skb+0x53/0x100
  netlink_unicast+0x245/0x390
  netlink_sendmsg+0x21b/0x470
  ____sys_sendmsg+0x39d/0x3d0
  ___sys_sendmsg+0x9a/0xe0
  __sys_sendmsg+0x7a/0xd0
  do_syscall_64+0x7d/0x160
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f2155114084
Code: 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 80 3d 25 f0 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89
RSP: 002b:00007fff1fd7a988 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000560ec063e5e0 RCX: 00007f2155114084
RDX: 0000000000000000 RSI: 00007fff1fd7a9f0 RDI: 0000000000000003
RBP: 00007fff1fd7aa60 R08: 0000000000000010 R09: 000000000000003f
R10: 0000560ee9b3a010 R11: 0000000000000202 R12: 00007fff1fd7aae0
R13: 000000006891ccde R14: 0000560ec063e5e0 R15: 00007fff1fd7aad0
  </TASK>

[1] https://lore.kernel.org/netdev/e08c7f4a6882f260011909a868311c6e9b54f3e4.1639153474.git.dcaratti@redhat.com/
[2] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/

Cc: stable@vger.kernel.org
Fixes: 103406b38c60 ("net/sched: Always pass notifications when child class becomes empty")
Fixes: c062f2a0b04d ("net/sched: sch_ets: don't remove idle classes from the round-robin list")
Fixes: dcc68b4d8084 ("net: sch_ets: Add a new Qdisc")
Reported-by: Li Shuang <shuali@redhat.com>
Closes: https://issues.redhat.com/browse/RHEL-108026
Reviewed-by: Petr Machata <petrm@nvidia.com>
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://patch.msgid.link/7928ff6d17db47a2ae7cc205c44777b1f1950545.1755016081.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '10GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
ixgbe: bypass devlink phys_port_name generation

Jedrzej adds option to skip phys_port_name generation and opts
ixgbe into it as some configurations rely on pre-devlink naming
which could end up broken as a result.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ixgbe: prevent from unwanted interface name changes
devlink: let driver opt out of automatic phys_port_name generation
====================

Link: https://patch.msgid.link/20250812205226.1984369-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE

The data page pool always fills the HW rx ring with pages. On arm64 with
64K pages, this will waste _at least_ 32K of memory per entry in the rx
ring.

Fix by fragmenting the pages if PAGE_SIZE > BNXT_RX_PAGE_SIZE. This
makes the data page pool the same as the header pool.

Tested with iperf3 with a small (64 entries) rx ring to encourage buffer
circulation.

Fixes: cd1fafe7da1f ("eth: bnxt: add support rx side device memory TCP")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250812182907.1540755-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netdevsim: Fix wild pointer access in nsim_queue_free().

syzbot reported the splat below. [0]

When nsim_queue_uninit() is called from nsim_init_netdevsim(),
register_netdevice() has not been called, thus dev->dstats has
not been allocated.

Let's not call dev_dstats_rx_dropped_add() in such a case.

[0]
BUG: unable to handle page fault for address: ffff88809782c020
PF: supervisor write access in kernel mode
PF: error_code(0x0002) - not-present page
PGD 1b401067 P4D 1b401067 PUD 0
Oops: Oops: 0002 [#1] SMP KASAN NOPTI
CPU: 3 UID: 0 PID: 8476 Comm: syz.1.251 Not tainted 6.16.0-syzkaller-06699-ge8d780dcd957 #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:local_add arch/x86/include/asm/local.h:33 [inline]
RIP: 0010:u64_stats_add include/linux/u64_stats_sync.h:89 [inline]
RIP: 0010:dev_dstats_rx_dropped_add include/linux/netdevice.h:3027 [inline]
RIP: 0010:nsim_queue_free+0xba/0x120 drivers/net/netdevsim/netdev.c:714
Code: 07 77 6c 4a 8d 3c ed 20 7e f1 8d 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 46 4a 03 1c ed 20 7e f1 8d <4c> 01 63 20 be 00 02 00 00 48 8d 3d 00 00 00 00 e8 61 2f 58 fa 48
RSP: 0018:ffffc900044af150 EFLAGS: 00010286
RAX: dffffc0000000000 RBX: ffff88809782c000 RCX: 00000000000079c3
RDX: 1ffffffff1be2fc7 RSI: ffffffff8c15f380 RDI: ffffffff8df17e38
RBP: ffff88805f59d000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000003 R14: ffff88806ceb3d00 R15: ffffed100dfd308e
FS: 0000000000000000(0000) GS:ffff88809782c000(0063) knlGS:00000000f505db40
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: ffff88809782c020 CR3: 000000006fc6a000 CR4: 0000000000352ef0
Call Trace:
<TASK>
nsim_queue_uninit drivers/net/netdevsim/netdev.c:993 [inline]
nsim_init_netdevsim drivers/net/netdevsim/netdev.c:1049 [inline]
nsim_create+0xd0a/0x1260 drivers/net/netdevsim/netdev.c:1101
__nsim_dev_port_add+0x435/0x7d0 drivers/net/netdevsim/dev.c:1438
nsim_dev_port_add_all drivers/net/netdevsim/dev.c:1494 [inline]
nsim_dev_reload_create drivers/net/netdevsim/dev.c:1546 [inline]
nsim_dev_reload_up+0x5b8/0x860 drivers/net/netdevsim/dev.c:1003
devlink_reload+0x322/0x7c0 net/devlink/dev.c:474
devlink_nl_reload_doit+0xe31/0x1410 net/devlink/dev.c:584
genl_family_rcv_msg_doit+0x206/0x2f0 net/netlink/genetlink.c:1115
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0x55c/0x800 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x155/0x420 net/netlink/af_netlink.c:2552
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline]
netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1346
netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896
sock_sendmsg_nosec net/socket.c:714 [inline]
__sock_sendmsg net/socket.c:729 [inline]
____sys_sendmsg+0xa95/0xc70 net/socket.c:2614
___sys_sendmsg+0x134/0x1d0 net/socket.c:2668
__sys_sendmsg+0x16d/0x220 net/socket.c:2700
do_syscall_32_irqs_on arch/x86/entry/syscall_32.c:83 [inline]
__do_fast_syscall_32+0x7c/0x3a0 arch/x86/entry/syscall_32.c:306
do_fast_syscall_32+0x32/0x80 arch/x86/entry/syscall_32.c:331
entry_SYSENTER_compat_after_hwframe+0x84/0x8e
RIP: 0023:0xf708e579
Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
RSP: 002b:00000000f505d55c EFLAGS: 00000296 ORIG_RAX: 0000000000000172
RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 0000000080000080
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Modules linked in:
CR2: ffff88809782c020

Fixes: 2a68a22304f9 ("netdevsim: account dropped packet length in stats on queue free")
Reported-by: syzbot+8aa80c6232008f7b957d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/688bb9ca.a00a0220.26d0e1.0050.GAE@google.com/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250812162130.4129322-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mctp: Fix bad kfree_skb in bind lookup test

The kunit test's skb_pkt is consumed by mctp_dst_input() so shouldn't be
freed separately.

Fixes: e6d8e7dbc5a3 ("net: mctp: Add bind lookup test")
Reported-by: Alexandre Ghiti <alex@ghiti.fr>
Closes: https://lore.kernel.org/all/734b02a3-1941-49df-a0da-ec14310d41e4@ghiti.fr/
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://patch.msgid.link/20250812-fix-mctp-bind-test-v1-1-5e2128664eb3@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'amd-drm-fixes-6.17-2025-08-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.17-2025-08-13:

amdgpu:
- PSP fix
- VRAM reservation fix
- CSA fix
- Process kill fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250813151905.2040816-1-alexander.deucher@amd.com

Merge tag 'nf-25-08-13' of https://git./linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for *net*:

1) I managed to add a null dereference crash in nft_set_pipapo
   in the current development cycle, was not caught by CI
   because the avx2 implementation is fine, but selftest
   splats when run on non-avx2 host.

2) Fix the ipvs estimater kthread affinity, was incorrect
   since 6.14. From Frederic Weisbecker.

3) nf_tables should not allow to add a device to a flowtable
   or netdev chain more than once -- reject this.
   From Pablo Neira Ayuso.  This has been broken for long time,
   blamed commit dates from v5.8.

* tag 'nf-25-08-13' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: reject duplicate device on updates
  ipvs: Fix estimator kthreads preferred affinity
  netfilter: nft_set_pipapo: fix null deref for empty set
====================

Link: https://patch.msgid.link/20250813113800.20775-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'drm-misc-next-fixes-2025-08-12' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

bridge:
- fix OF-node leak
- fix documentation

fbdev-emulation:
- pass correct format info to drm_helper_mode_fill_fb_struct()

panfrost:
- print correct RSS size

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250812064712.GA14554@2a02-2454-fd5e-fd00-2c49-c639-c55f-a125.dyn6.pyur.net

Merge tag 'erofs-for-6.17-rc2-fixes' of git://git./linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:

- Align FSDAX enablement among multiple devices

- Fix EROFS_FS_ZIP_ACCEL build dependency again to prevent forcing
   CRYPTO{,_DEFLATE}=y even if EROFS=m

- Fix atomic context detection to properly launch kworkers on demand

- Fix block count statistics for 48-bit addressing support

* tag 'erofs-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix block count report when 48-bit layout is on
  erofs: fix atomic context detection when !CONFIG_DEBUG_LOCK_ALLOC
  erofs: Do not select tristate symbols from bool symbols
  erofs: Fallback to normal access if DAX is not supported on extra device

jbd2: prevent softlockup in jbd2_log_do_checkpoint()

Both jbd2_log_do_checkpoint() and jbd2_journal_shrink_checkpoint_list()
periodically release j_list_lock after processing a batch of buffers to
avoid long hold times on the j_list_lock. However, since both functions
contend for j_list_lock, the combined time spent waiting and processing
can be significant.

jbd2_journal_shrink_checkpoint_list() explicitly calls cond_resched() when
need_resched() is true to avoid softlockups during prolonged operations.
But jbd2_log_do_checkpoint() only exits its loop when need_resched() is
true, relying on potentially sleeping functions like __flush_batch() or
wait_on_buffer() to trigger rescheduling. If those functions do not sleep,
the kernel may hit a softlockup.

watchdog: BUG: soft lockup - CPU#3 stuck for 156s! [kworker/u129:2:373]
CPU: 3 PID: 373 Comm: kworker/u129:2 Kdump: loaded Not tainted 6.6.0+ #10
Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.27 06/13/2017
Workqueue: writeback wb_workfn (flush-7:2)
pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : native_queued_spin_lock_slowpath+0x358/0x418
lr : jbd2_log_do_checkpoint+0x31c/0x438 [jbd2]
Call trace:
native_queued_spin_lock_slowpath+0x358/0x418
jbd2_log_do_checkpoint+0x31c/0x438 [jbd2]
__jbd2_log_wait_for_space+0xfc/0x2f8 [jbd2]
add_transaction_credits+0x3bc/0x418 [jbd2]
start_this_handle+0xf8/0x560 [jbd2]
jbd2__journal_start+0x118/0x228 [jbd2]
__ext4_journal_start_sb+0x110/0x188 [ext4]
ext4_do_writepages+0x3dc/0x740 [ext4]
ext4_writepages+0xa4/0x190 [ext4]
do_writepages+0x94/0x228
__writeback_single_inode+0x48/0x318
writeback_sb_inodes+0x204/0x590
__writeback_inodes_wb+0x54/0xf8
wb_writeback+0x2cc/0x3d8
wb_do_writeback+0x2e0/0x2f8
wb_workfn+0x80/0x2a8
process_one_work+0x178/0x3e8
worker_thread+0x234/0x3b8
kthread+0xf0/0x108
ret_from_fork+0x10/0x20

So explicitly call cond_resched() in jbd2_log_do_checkpoint() to avoid
softlockup.

Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://patch.msgid.link/20250812063752.912130-1-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix incorrect function name in comment

Since commit 6b730a405037 “ext4: hoist ext4_block_write_begin and
replace the __block_write_begin”, the comment should be updated
accordingly from "__block_write_begin" to "ext4_block_write_begin".

Fixes: 6b730a405037 (“ext4: hoist ext4_block_write_begin and replace...")
Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://patch.msgid.link/20250812021709.1120716-1-liubaolin12138@163.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Merge tag 'rcu.fixes.6.17' of git://git./linux/kernel/git/rcu/linux

Pull RCU fix from Neeraj Upadhyay:
"Fix a regression introduced by commit b41642c87716 ("rcu: Fix
  rcu_read_unlock() deadloop due to IRQ work") which results in boot
  hang as reported by kernel test bot at [1].

  This issue happens because RCU re-initializes the deferred QS IRQ work
  everytime it is queued. With commit b41642c87716, the IRQ work
  re-initialization can happen while it is already queued. This results
  in IRQ work being requeued to itself. When IRQ work finally fires, as
  it is requeued to itself, it is repeatedly executed and results in
  hang.

  Fix this with initializing the IRQ work only once before the CPU
  boots"

Link: https://lore.kernel.org/rcu/202508071303.c1134cce-lkp@intel.com/
* tag 'rcu.fixes.6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
  rcu: Fix racy re-initialization of irq_work causing hangs

smb: client: remove redundant lstrp update in negotiate protocol

Commit 34331d7beed7 ("smb: client: fix first command failure during
re-negotiation") addressed a race condition by updating lstrp before
entering negotiate state. However, this approach may have some unintended
side effects.

The lstrp field is documented as "when we got last response from this
server", and updating it before actually receiving a server response
could potentially affect other mechanisms that rely on this timestamp.
For example, the SMB echo detection logic also uses lstrp as a reference
point. In scenarios with frequent user operations during reconnect states,
the repeated calls to cifs_negotiate_protocol() might continuously
update lstrp, which could interfere with the echo detection timing.

Additionally, commit 266b5d02e14f ("smb: client: fix race condition in
negotiate timeout by using more precise timing") introduced a dedicated
neg_start field specifically for tracking negotiate start time. This
provides a more precise solution for the original race condition while
preserving the intended semantics of lstrp.

Since the race condition is now properly handled by the neg_start
mechanism, the lstrp update in cifs_negotiate_protocol() is no longer
necessary and can be safely removed.

Fixes: 266b5d02e14f ("smb: client: fix race condition in negotiate timeout by using more precise timing")
Cc: stable@vger.kernel.org
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

cifs: update internal version number

to 2.56

Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: don't wait for info->send_pending == 0 on error

We already called ib_drain_qp() before and that makes sure
send_done() was called with IB_WC_WR_FLUSH_ERR, but
didn't called atomic_dec_and_test(&sc->send_io.pending.count)

So we may never reach the info->send_pending == 0 condition.

Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: 5349ae5e05fa ("smb: client: let send_done() cleanup before calling smbd_disconnect_rdma_connection()")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: fix mid_q_entry memleak leak with per-mid locking

This is step 4/4 of a patch series to fix mid_q_entry memory leaks
caused by race conditions in callback execution.

In compound_send_recv(), when wait_for_response() is interrupted by
signals, the code attempts to cancel pending requests by changing
their callbacks to cifs_cancelled_callback. However, there's a race
condition between signal interruption and network response processing
that causes both mid_q_entry and server buffer leaks:

```
User foreground process                    cifsd
cifs_readdir
open_cached_dir
  cifs_send_recv
   compound_send_recv
    smb2_setup_request
     smb2_mid_entry_alloc
      smb2_get_mid_entry
       smb2_mid_entry_alloc
        mempool_alloc // alloc mid
        kref_init(&temp->refcount); // refcount = 1
     mid[0]->callback = cifs_compound_callback;
     mid[1]->callback = cifs_compound_last_callback;
     smb_send_rqst
     rc = wait_for_response
      wait_event_state TASK_KILLABLE
                                  cifs_demultiplex_thread
                                    allocate_buffers
                                      server->bigbuf = cifs_buf_get()
                                    standard_receive3
                                      ->find_mid()
                                        smb2_find_mid
                                          __smb2_find_mid
                                           kref_get(&mid->refcount) // +1
                                      cifs_handle_standard
                                        handle_mid
                                         /* bigbuf will also leak */
                                         mid->resp_buf = server->bigbuf
                                         server->bigbuf = NULL;
                                         dequeue_mid
                                     /* in for loop */
                                    mids[0]->callback
                                      cifs_compound_callback
    /* Signal interrupts wait: rc = -ERESTARTSYS */
    /* if (... || midQ[i]->mid_state == MID_RESPONSE_RECEIVED) *?
    midQ[0]->callback = cifs_cancelled_callback;
    cancelled_mid[i] = true;
                                       /* The change comes too late */
                                       mid->mid_state = MID_RESPONSE_READY
                                    release_mid  // -1
    /* cancelled_mid[i] == true causes mid won't be released
       in compound_send_recv cleanup */
    /* cifs_cancelled_callback won't executed to release mid */
```

The root cause is that there's a race between callback assignment and
execution.

Fix this by introducing per-mid locking:

- Add spinlock_t mid_lock to struct mid_q_entry
- Add mid_execute_callback() for atomic callback execution
- Use mid_lock in cancellation paths to ensure atomicity

This ensures that either the original callback or the cancellation
callback executes atomically, preventing reference count leaks when
requests are interrupted by signals.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=220404
Fixes: ee258d79159a ("CIFS: Move credit processing to mid callbacks for SMB3")
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb3: fix for slab out of bounds on mount to ksmbd

With KASAN enabled, it is possible to get a slab out of bounds
during mount to ksmbd due to missing check in parse_server_interfaces()
(see below):

BUG: KASAN: slab-out-of-bounds in
parse_server_interfaces+0x14ee/0x1880 [cifs]
Read of size 4 at addr ffff8881433dba98 by task mount/9827

CPU: 5 UID: 0 PID: 9827 Comm: mount Tainted: G
OE       6.16.0-rc2-kasan #2 PREEMPT(voluntary)
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Dell Inc. Precision Tower 3620/0MWYPT,
BIOS 2.13.1 06/14/2019
Call Trace:
  <TASK>
dump_stack_lvl+0x9f/0xf0
print_report+0xd1/0x670
__virt_addr_valid+0x22c/0x430
? parse_server_interfaces+0x14ee/0x1880 [cifs]
? kasan_complete_mode_report_info+0x2a/0x1f0
? parse_server_interfaces+0x14ee/0x1880 [cifs]
   kasan_report+0xd6/0x110
   parse_server_interfaces+0x14ee/0x1880 [cifs]
   __asan_report_load_n_noabort+0x13/0x20
   parse_server_interfaces+0x14ee/0x1880 [cifs]
? __pfx_parse_server_interfaces+0x10/0x10 [cifs]
? trace_hardirqs_on+0x51/0x60
SMB3_request_interfaces+0x1ad/0x3f0 [cifs]
? __pfx_SMB3_request_interfaces+0x10/0x10 [cifs]
? SMB2_tcon+0x23c/0x15d0 [cifs]
smb3_qfs_tcon+0x173/0x2b0 [cifs]
? __pfx_smb3_qfs_tcon+0x10/0x10 [cifs]
? cifs_get_tcon+0x105d/0x2120 [cifs]
? do_raw_spin_unlock+0x5d/0x200
? cifs_get_tcon+0x105d/0x2120 [cifs]
? __pfx_smb3_qfs_tcon+0x10/0x10 [cifs]
cifs_mount_get_tcon+0x369/0xb90 [cifs]
? dfs_cache_find+0xe7/0x150 [cifs]
dfs_mount_share+0x985/0x2970 [cifs]
? check_path.constprop.0+0x28/0x50
? save_trace+0x54/0x370
? __pfx_dfs_mount_share+0x10/0x10 [cifs]
? __lock_acquire+0xb82/0x2ba0
? __kasan_check_write+0x18/0x20
cifs_mount+0xbc/0x9e0 [cifs]
? __pfx_cifs_mount+0x10/0x10 [cifs]
? do_raw_spin_unlock+0x5d/0x200
? cifs_setup_cifs_sb+0x29d/0x810 [cifs]
cifs_smb3_do_mount+0x263/0x1990 [cifs]

Reported-by: Namjae Jeon <linkinjeon@kernel.org>
Tested-by: Namjae Jeon <linkinjeon@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>

Revert "ALSA: hda: Add ASRock X670E Taichi to denylist"

On a motherboard with an AMD Granite Ridge CPU there is a report
that 3.5mm microphone and headphones aren't working. In the
log it's observed:

snd_hda_intel 0000:02:00.6: Skipping the device on the denylist

This was because of commit df42ee7e22f03 ("ALSA: hda: Add ASRock
X670E Taichi to denylist"). Reverting this commit allows the
microphone and headphones to work again. As at least some combinations
of this motherboard do have applicable devices, revert so that they
can be probed.

Cc: Richard Gong <richard.gong@amd.com>
Cc: Juan Martinez <juan.martinez@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/20250813140427.1577172-1-superm1@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: azt3328: Put __maybe_unused for inline functions for gameport

Some inline functions are unused depending on kconfig, and the recent
change for clang builds made those handled as errors with W=1.
For avoiding pitfalls, mark those with __maybe_unused attributes.

Link: https://patch.msgid.link/20250813153628.12303-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge tag 'mm-hotfixes-stable-2025-08-12-20-50' of git://git./linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
"12 hotfixes. 5 are cc:stable and the remainder address post-6.16
  issues or aren't considered necessary for -stable kernels.

  10 of these fixes are for MM"

* tag 'mm-hotfixes-stable-2025-08-12-20-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  proc: proc_maps_open allow proc_mem_open to return NULL
  mm/mremap: avoid expensive folio lookup on mremap folio pte batch
  userfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entry
  mm: pass page directly instead of using folio_page
  selftests/proc: fix string literal warning in proc-maps-race.c
  fs/proc/task_mmu: hold PTL in pagemap_hugetlb_range and gather_hugetlb_stats
  mm/smaps: fix race between smaps_hugetlb_range and migration
  mm: fix the race between collapse and PT_RECLAIM under per-vma lock
  mm/kmemleak: avoid soft lockup in __kmemleak_do_cleanup()
  MAINTAINERS: add Masami as a reviewer of hung task detector
  mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock
  kasan/test: fix protection against compiler elision

ASoC: tas2781: Normalize the volume kcontrol name

Change the name of the kcontrol from "Gain" to "Volume".

Signed-off-by: Baojun Xu <baojun.xu@ti.com>
Link: https://patch.msgid.link/20250813100708.12197-1-baojun.xu@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>

io_uring/io-wq: add check free worker before create new worker

After commit 0b2b066f8a85 ("io_uring/io-wq: only create a new worker
if it can make progress"), in our produce environment, we still
observe that part of io_worker threads keeps creating and destroying.
After analysis, it was confirmed that this was due to a more complex
scenario involving a large number of fsync operations, which can be
abstracted as frequent write + fsync operations on multiple files in
a single uring instance. Since write is a hash operation while fsync
is not, and fsync is likely to be suspended during execution, the
action of checking the hash value in
io_wqe_dec_running cannot handle such scenarios.
Similarly, if hash-based work and non-hash-based work are sent at the
same time, similar issues are likely to occur.
Returning to the starting point of the issue, when a new work
arrives, io_wq_enqueue may wake up free worker A, while
io_wq_dec_running may create worker B. Ultimately, only one of A and
B can obtain and process the task, leaving the other in an idle
state. In the end, the issue is caused by inconsistent logic in the
checks performed by io_wq_enqueue and io_wq_dec_running.
Therefore, the problem can be resolved by checking for available
workers in io_wq_dec_running.

Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
Reviewed-by: Diangang Li <lidiangang@bytedance.com>
Link: https://lore.kernel.org/r/20250813120214.18729-1-changfengnan@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

btrfs: fix printing of mount info messages for NODATACOW/NODATASUM

The NODATASUM message was printed twice by mistake and the NODATACOW was
missing from the 'unset' part. Fix the duplication and make the output
look the same.

Fixes: eddb1a433f26 ("btrfs: add reconfigure callback for fs_context")
CC: stable@vger.kernel.org # 6.8+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Kyoji Ogasawara <sawara04.o@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: restore mount option info messages during mount

After the fsconfig migration in 6.8, mount option info messages are no
longer displayed during mount operations because btrfs_emit_options() is
only called during remount, not during initial mount.

Fix this by calling btrfs_emit_options() in btrfs_fill_super() after
open_ctree() succeeds. Additionally, prevent log duplication by ensuring
btrfs_check_options() handles validation with warn-level and err-level
messages, while btrfs_emit_options() provides info-level messages.

Fixes: eddb1a433f26 ("btrfs: add reconfigure callback for fs_context")
CC: stable@vger.kernel.org # 6.8+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Kyoji Ogasawara <sawara04.o@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: fix incorrect log message for nobarrier mount option

Fix a wrong log message that appears when the "nobarrier" mount option
is unset. When "nobarrier" is unset, barrier is actually enabled.
However, the log incorrectly stated "turning off barriers".

Fixes: eddb1a433f26 ("btrfs: add reconfigure callback for fs_context")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Kyoji Ogasawara <sawara04.o@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: fix buffer index in wait_eb_writebacks()

The commit f2cb97ee964a ("btrfs: index buffer_tree using node size")
changed the index of buffer_tree from "start >> sectorsize_bits" to "start
>> nodesize_bits". However, the change is not applied for
wait_eb_writebacks() and caused IO failures by writing in a full zone. Use
the index properly.

Fixes: f2cb97ee964a ("btrfs: index buffer_tree using node size")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: subpage: keep TOWRITE tag until folio is cleaned

btrfs_subpage_set_writeback() calls folio_start_writeback() the first time
a folio is written back, and it also clears the PAGECACHE_TAG_TOWRITE tag
even if there are still dirty blocks in the folio. This can break ordering
guarantees, such as those required by btrfs_wait_ordered_extents().

That ordering breakage leads to a real failure. For example, running
generic/464 on a zoned setup will hit the following ASSERT. This happens
because the broken ordering fails to flush existing dirty pages before the
file size is truncated.

  assertion failed: !list_empty(&ordered->list) :: 0, in fs/btrfs/zoned.c:1899
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/zoned.c:1899!
  Oops: invalid opcode: 0000 [#1] SMP NOPTI
  CPU: 2 UID: 0 PID: 1906169 Comm: kworker/u130:2 Kdump: loaded Not tainted 6.16.0-rc6-BTRFS-ZNS+ #554 PREEMPT(voluntary)
  Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
  Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
  RIP: 0010:btrfs_finish_ordered_zoned.cold+0x50/0x52 [btrfs]
  RSP: 0018:ffffc9002efdbd60 EFLAGS: 00010246
  RAX: 000000000000004c RBX: ffff88811923c4e0 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffffffff827e38b1 RDI: 00000000ffffffff
  RBP: ffff88810005d000 R08: 00000000ffffdfff R09: ffffffff831051c8
  R10: ffffffff83055220 R11: 0000000000000000 R12: ffff8881c2458c00
  R13: ffff88811923c540 R14: ffff88811923c5e8 R15: ffff8881c1bd9680
  FS:  0000000000000000(0000) GS:ffff88a04acd0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f907c7a918c CR3: 0000000004024000 CR4: 0000000000350ef0
  Call Trace:
   <TASK>
   ? srso_return_thunk+0x5/0x5f
   btrfs_finish_ordered_io+0x4a/0x60 [btrfs]
   btrfs_work_helper+0xf9/0x490 [btrfs]
   process_one_work+0x204/0x590
   ? srso_return_thunk+0x5/0x5f
   worker_thread+0x1d6/0x3d0
   ? __pfx_worker_thread+0x10/0x10
   kthread+0x118/0x230
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x205/0x260
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1a/0x30
   </TASK>

Consider process A calling writepages() with WB_SYNC_NONE. In zoned mode or
for compressed writes, it locks several folios for delalloc and starts
writing them out. Let's call the last locked folio folio X. Suppose the
write range only partially covers folio X, leaving some pages dirty.
Process A calls btrfs_subpage_set_writeback() when building a bio. This
function call clears the TOWRITE tag of folio X, whose size = 8K and
the block size = 4K. It is following state.

   0     4K    8K
   |/////|/////|  (flag: DIRTY, tag: DIRTY)
   <-----> Process A will write this range.

Now suppose process B concurrently calls writepages() with WB_SYNC_ALL. It
calls tag_pages_for_writeback() to tag dirty folios with
PAGECACHE_TAG_TOWRITE. Since folio X is still dirty, it gets tagged. Then,
B collects tagged folios using filemap_get_folios_tag() and must wait for
folio X to be written before returning from writepages().

   0     4K    8K
   |/////|/////|  (flag: DIRTY, tag: DIRTY|TOWRITE)

However, between tagging and collecting, process A may call
btrfs_subpage_set_writeback() and clear folio X's TOWRITE tag.
   0     4K    8K
   |     |/////|  (flag: DIRTY|WRITEBACK, tag: DIRTY)

As a result, process B won't see folio X in its batch, and returns without
waiting for it. This breaks the WB_SYNC_ALL ordering requirement.

Fix this by using btrfs_subpage_set_writeback_keepwrite(), which retains
the TOWRITE tag. We now manually clear the tag only after the folio becomes
clean, via the xas operation.

Fixes: 3470da3b7d87 ("btrfs: subpage: introduce helpers for writeback status")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: clear TAG_TOWRITE from buffer tree when submitting a tree block

[POSSIBLE BUG]
After commit 5e121ae687b8 ("btrfs: use buffer xarray for extent buffer
writeback operations"), we have a dedicated xarray for extent buffers,
and a lot of tags are migrated to that buffer tree, like
PAGECACHE_TAG_TOWRITE/DIRTY/WRITEBACK.

This frees us from the limits of page flags, but there is a new
asymmetric behavior, we call buffer_tree_tag_for_writeback() to set
PAGECACHE_TAG_TOWRITE for the involved ranges, but there is no one to
clear that tag.

Before that rework, we relied on the page cache tag which was cleared
when folio_start_writeback() was called.
Although this has its own problems (e.g. the first one calling
folio_start_writeback() will clear the tag for the whole page), it at
least cleared the tag.

But now our real tags are stored in the buffer tree, no one is really
clearing the PAGECACHE_TAG_TOWRITE tag now.

[FIX]
Thankfully this is not going to cause any real bug, but just some
inefficiency iterating the extent buffers.

As if we hit an extent buffer which is not dirty but still has the
PAGECACHE_TAG_TOWRITE tag, lock_extent_buffer_for_io() will skip it so
we won't writeback the extent buffer again.

To properly fix the inefficiency, just clear the PAGECACHE_TAG_TOWRITE
inside lock_extent_buffer_for_io().

There is no error path between lock_extent_buffer_for_io() and
write_one_eb(), so we're safe to clear the tag there.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: do not set mtime/ctime to current time when unlinking for log replay

If we are doing an unlink for log replay, we are updating the directory's
mtime and ctime to the current time, and this is incorrect since it should
stay with the mtime and ctime that were set when the directory was logged.

This is the same as when adding a link to an inode during log replay (with
btrfs_add_link()), where we want the mtime and ctime to be the values that
were in place when the inode was logged.

This was found with generic/547 using LOAD_FACTOR=20 and TIME_FACTOR=20,
where due to large log trees we have longer log replay times and fssum
could detect a mismatch of the mtime and ctime of a directory.

Fix this by skipping the mtime and ctime update at __btrfs_unlink_inode()
if we are in log replay context (just like btrfs_add_link()).

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>