git.kernel.dk Git - linux-block.git/log

nfsd: decrease sc_count directly if fail to queue dl_recall

A deadlock warning occurred when invoking nfs4_put_stid following a failed
dl_recall queue operation:
            T1                            T2
                                nfs4_laundromat
                                 nfs4_get_client_reaplist
                                  nfs4_anylock_blockers
__break_lease
spin_lock // ctx->flc_lock
                                   spin_lock // clp->cl_lock
                                   nfs4_lockowner_has_blockers
                                    locks_owner_has_blockers
                                     spin_lock // flctx->flc_lock
nfsd_break_deleg_cb
  nfsd_break_one_deleg
   nfs4_put_stid
    refcount_dec_and_lock
     spin_lock // clp->cl_lock

When a file is opened, an nfs4_delegation is allocated with sc_count
initialized to 1, and the file_lease holds a reference to the delegation.
The file_lease is then associated with the file through kernel_setlease.

The disassociation is performed in nfsd4_delegreturn via the following
call chain:
nfsd4_delegreturn --> destroy_delegation --> destroy_unhashed_deleg -->
nfs4_unlock_deleg_lease --> kernel_setlease --> generic_delete_lease
The corresponding sc_count reference will be released after this
disassociation.

Since nfsd_break_one_deleg executes while holding the flc_lock, the
disassociation process becomes blocked when attempting to acquire flc_lock
in generic_delete_lease. This means:
1) sc_count in nfsd_break_one_deleg will not be decremented to 0;
2) The nfs4_put_stid called by nfsd_break_one_deleg will not attempt to
acquire cl_lock;
3) Consequently, no deadlock condition is created.

Given that sc_count in nfsd_break_one_deleg remains non-zero, we can
safely perform refcount_dec on sc_count directly. This approach
effectively avoids triggering deadlock warnings.

Fixes: 230ca758453c ("nfsd: put dl_stid if fail to queue dl_recall")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfs: add missing selections of CONFIG_CRC32

nfs.ko, nfsd.ko, and lockd.ko all use crc32_le(), which is available
only when CONFIG_CRC32 is enabled.  But the only NFS kconfig option that
selected CONFIG_CRC32 was CONFIG_NFS_DEBUG, which is client-specific and
did not actually guard the use of crc32_le() even on the client.

The code worked around this bug by only actually calling crc32_le() when
CONFIG_CRC32 is built-in, instead hard-coding '0' in other cases.  This
avoided randconfig build errors, and in real kernels the fallback code
was unlikely to be reached since CONFIG_CRC32 is 'default y'.  But, this
really needs to just be done properly, especially now that I'm planning
to update CONFIG_CRC32 to not be 'default y'.

Therefore, make CONFIG_NFS_FS, CONFIG_NFSD, and CONFIG_LOCKD select
CONFIG_CRC32.  Then remove the fallback code that becomes unnecessary,
as well as the selection of CONFIG_CRC32 from CONFIG_NFS_DEBUG.

Fixes: 1264a2f053a3 ("NFS: refactor code for calculating the crc32 hash of a filehandle")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

NFSD: Add a Kconfig setting to enable delegated timestamps

After three tries, we still see test failures with delegated
timestamps. Disable them by default, but leave the implementation
intact so that development can continue.

Cc: stable@vger.kernel.org # v6.14
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

sysctl: Fixes nsm_local_state bounds

Bound nsm_local_state sysctl writings between SYSCTL_ZERO
and SYSCTL_INT_MAX.

The proc_handler has thus been updated to proc_dointvec_minmax.

Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
[ cel: updated to handle zero - UINT_MAX instead ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: use a long for the count in nfsd4_state_shrinker_count()

If there are no courtesy clients then the return value from the
atomic_long_read() could overflow an int. Use a long to store the value
instead.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: remove obsolete comment from nfs4_alloc_stid

idr_alloc_cyclic() is what guarantees that now, not this long-gone trick.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: remove unneeded forward declaration of nfsd4_mark_cb_fault()

This isn't needed.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: reorganize struct nfs4_delegation for better packing

Move dl_type field above dl_time, which shaves 8 bytes off this struct.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: handle errors from rpc_call_async()

It's possible for rpc_call_async() to fail (mainly due to memory
allocation failure). If it does, there isn't much recourse other than to
requeue the callback and try again later.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: move cb_need_restart flag into cb_flags

Since there is now a cb_flags word, use a new NFSD4_CALLBACK_REQUEUE
flag in that instead of a separate boolean.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNING

These flags serve essentially the same purpose and get set and cleared
at the same time. Drop CB_GETATTR_BUSY and just use
NFSD4_CALLBACK_RUNNING instead.

For this to work, we must use clear_and_wake_up_bit(), but doing that on
for other types of callbacks is wasteful. Declare a new NFSD4_CALLBACK_WAKE
flag in cb_flags to indicate that wake_up is needed, and only set that
for CB_GETATTRs.

Also, make the wait use a TASK_UNINTERRUPTIBLE sleep. This is done in
the context of an nfsd thread, and it should never need to deal with
signals.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: eliminate cl_ra_cblist and NFSD4_CLIENT_CB_RECALL_ANY

deleg_reaper() will walk the client_lru list and put any suitable
entries onto "cblist" using the cl_ra_cblist pointer. It then walks the
objects outside the spinlock and queues callbacks for them.

None of the operations that deleg_reaper() does outside the
nn->client_lock are blocking operations. Just queue their workqueue jobs
under the nn->client_lock instead.

Also, the NFSD4_CLIENT_CB_RECALL_ANY and NFSD4_CALLBACK_RUNNING flags
serve an identical purpose now. Drop the NFSD4_CLIENT_CB_RECALL_ANY flag
and just use the one in the callback.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: prevent callback tasks running concurrently

The nfsd4_callback workqueue jobs exist to queue backchannel RPCs to
rpciod. Because they run in different workqueue contexts, the rpc_task
can run concurrently with the workqueue job itself, should it become
requeued. This is problematic as there is no locking when accessing the
fields in the nfsd4_callback.

Add a new unsigned long to nfsd4_callback and declare a new
NFSD4_CALLBACK_RUNNING flag to be set in it. When attempting to run a
workqueue job, do a test_and_set_bit() on that flag first, and don't
queue the workqueue job if it returns true. Clear NFSD4_CALLBACK_RUNNING
in nfsd41_destroy_cb().

This also gives us a more reliable mechanism for handling queueing
failures in codepaths where we have to take references under spinlocks.
We can now do the test_and_set_bit on NFSD4_CALLBACK_RUNNING first, and
only take references to the objects if that returns false.

Most of the nfsd4_run_cb() callers are converted to use this new flag or
the nfsd4_try_run_cb() wrapper. The main exception is the callback
channel probe, which has its own synchronization.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: disallow file locking and delegations for NFSv4 reexport

We do not and cannot support file locking with NFS reexport over
NFSv4.x for the same reason we don't do it for NFSv3: NFS reexport
server reboot cannot allow clients to recover locks because the source
NFS server has not rebooted, and so it is not in grace. Since the
source NFS server is not in grace, it cannot offer any guarantees that
the file won't have been changed between the locks getting lost and
any attempt to recover/reclaim them. The same applies to delegations
and any associated locks, so disallow them too.

Clients are no longer allowed to get file locks or delegations from a
reexport server, any attempts will fail with operation not supported.

Update the "Reboot recovery" section accordingly in
Documentation/filesystems/nfs/reexport.rst

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: filecache: drop the list_lru lock during lock gc scans

Under a high NFSv3 load with lots of different files being accessed,
the LRU list of garbage-collectable files can become quite long.

Asking list_lru_scan_node() to scan the whole list can result in a long
period during which a spinlock is held, blocking the addition of new LRU
items.

So ask list_lru_scan_node() to scan only a few entries at a time, and
repeat until the scan is complete.

If the shrinker runs between two consecutive calls of
list_lru_scan_node() it could invalidate the "remaining" counter which
could lead to premature freeing. So add a spinlock to avoid that.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: filecache: don't repeatedly add/remove files on the lru list

There is no need to remove a file from the lru every time we access it,
and then add it back. It is sufficient to set the REFERENCED flag every
time we put the file. The order in the lru of REFERENCED files is
largely irrelevant as they will all be moved to the end.

With this patch, files are added only when they are allocated (if
want_gc) and they are removed only by the list_lru_(shrink_)walk
callback or when forcibly removing a file.

This should reduce contention on the list_lru spinlock(s) and reduce
memory traffic a little.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: filecache: introduce NFSD_FILE_RECENT

The filecache lru is walked in 2 circumstances for 2 different reasons.

1/ When called from the shrinker we want to discard the first few
   entries on the list, ignoring any with NFSD_FILE_REFERENCED set
   because they should really be at the end of the LRU as they have been
   referenced recently.  So those ones are ROTATED.

2/ When called from the nfsd_file_gc() timer function we want to discard
   anything that hasn't been used since before the previous call, and
   mark everything else as unused at this point in time.

Using the same flag for both of these can result in some unexpected
outcomes.  If the shrinker callback clears NFSD_FILE_REFERENCED then
nfsd_file_gc() will think the file hasn't been used in a while, while
really it has.

I think it is easier to reason about the behaviour if we instead have
two flags.

NFSD_FILE_REFERENCED means "this should be at the end of the LRU, please
     put it there when convenient"
NFSD_FILE_RECENT means "this has been used recently - since the last
     run of nfsd_file_gc()

When either caller finds an NFSD_FILE_REFERENCED entry, that entry
should be moved to the end of the LRU and the flag cleared.  This can
safely happen at any time.  The actual order on the lru might not be
strictly least-recently-used, but that is normal for linux lrus.

The shrinker callback can ignore the "recent" flag.  If it ends up
freeing something that is "recent" that simply means that memory
pressure is sufficient to limit the acceptable cache age to less than
the nfsd_file_gc frequency.

The gc callback should primarily focus on NFSD_FILE_RECENT.  It should
free everything that doesn't have this flag set, and should clear the
flag on everything else.  When it clears the flag it is convenient to
clear the "REFERENCED" flag and move to the end of the LRU too.

With this, calls from the shrinker do not prematurely age files.  It
will focus only on freeing those that are least recently used.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc()

list_lru_walk() is only useful when the aim is to remove all elements
from the list_lru. It will repeatedly visit rotated elements of the
first per-node sublist before proceeding to subsequent sublists.

This patch changes nfsd_file_gc() to use list_lru_walk_node() and
list_lru_count_node() on each NUMA node.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: filecache: use nfsd_file_dispose_list() in nfsd_file_close_inode_sync()

nfsd_file_close_inode_sync() contains an exact copy of
nfsd_file_dispose_list().

This patch removes that copy and calls nfsd_file_dispose_list()
instead.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

NFSD: Re-organize nfsd_file_gc_worker()

Dave opines:

IMO, there is no need to do this unnecessary work on every object
that is added to the LRU. Changing the gc worker to always run
every 2s and check if it has work to do like so:

static void
nfsd_file_gc_worker(struct work_struct *work)
{
- nfsd_file_gc();
- if (list_lru_count(&nfsd_file_lru))
- nfsd_file_schedule_laundrette();
+ if (list_lru_count(&nfsd_file_lru))
+ nfsd_file_gc();
+ nfsd_file_schedule_laundrette();
}

means that nfsd_file_gc() will be run the same way and have the same
behaviour as the current code. When the system it idle, it does a
list_lru_count() check every 2 seconds and goes back to sleep.
That's going to be pretty much unnoticable on most machines that run
NFS servers.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: filecache: remove race handling.

The race that this code tries to protect against is not interesting.
The code is problematic as we access the "nf" after we have given our
reference to the lru system.  While that takes 2+ seconds to free
things, it is still poor form.

The only interesting race I can find would be with
nfsd_file_close_inode_sync();
This is the only place that really doesn't want the file to stay on the
LRU when unhashed (which is the direct consequence of the race).

However for the race to happen, some other thread must own a reference
to a file and be putting it while nfsd_file_close_inode_sync() is trying
to close all files for an inode.  If this is possible, that other thread
could simply call nfsd_file_put() a little bit later and the result
would be the same: not all files are closed when
nfsd_file_close_inode_sync() completes.

If this was really a problem, we would need to wait in close_inode_sync
for the other references to be dropped.  We probably don't want to do
that.

So it is best to simply remove this code.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

fs: nfs: acl: Avoid -Wflex-array-member-not-at-end warning

-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

So, in order to avoid ending up with a flexible-array member in the
middle of other structs, we use the `struct_group_tagged()` helper
to create a new tagged `struct posix_acl_hdr`. This structure
groups together all the members of the flexible `struct posix_acl`
except the flexible array.

As a result, the array is effectively separated from the rest of the
members without modifying the memory layout of the flexible structure.
We then change the type of the middle struct member currently causing
trouble from `struct posix_acl` to `struct posix_acl_hdr`.

We also want to ensure that when new members need to be added to the
flexible structure, they are always included within the newly created
tagged struct. For this, we use `static_assert()`. This ensures that the
memory layout for both the flexible structure and the new tagged struct
is the same after any changes.

This approach avoids having to implement `struct posix_acl_hdr` as a
completely separate structure, thus preventing having to maintain two
independent but basically identical structures, closing the door to
potential bugs in the future.

We also use `container_of()` whenever we need to retrieve a pointer to
the flexible structure, through which we can access the flexible-array
member, if necessary.

So, with these changes, fix the following warning:

fs/nfs_common/nfsacl.c:45:26: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix callback decoder status codes

fs/nfsd/nfs4callback.c implements a callback client. Thus its XDR
decoders are decoding replies, not calls.

NFS4ERR_BAD_XDR is an on-the-wire status code that reports that the
client sent a corrupted RPC /call/. It's not used as the internal
error code when a /reply/ can't be decoded, since that kind of
failure is never reported to the sender of that RPC message.

Instead, a reply decoder should return -EIO, as the reply decoders
in the NFS client do.

Fixes: 6487a13b5c6b ("NFSD: add support for CB_GETATTR callback")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: eliminate special handling of NFS4ERR_SEQ_MISORDERED

On a SEQ_MISORDERED error, the current code will reattempt the call, but
set the slot sequence ID to 1. I can find no mention of this remedy in
the spec, and it seems potentially dangerous. It's possible that the
last call was sent with seqid 1, and doing this will cause a
retransmission of the reply.

Drop this special handling, and always treat SEQ_MISORDERED like
BADSLOT. Retry the call, but leak the slot so that it is no longer used.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: handle NFS4ERR_BADSLOT on CB_SEQUENCE better

Currently it just restarts the call, without getting a new slot.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: when CB_SEQUENCE gets ESERVERFAULT don't increment seq_nr

ESERVERFAULT means that the server sent a successful and legitimate
reply, but the session info didn't match what was expected. Don't
increment the seq_nr in that case.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: only check RPC_SIGNALLED() when restarting rpc_task

nfsd4_cb_sequence_done() currently checks RPC_SIGNALLED() when
processing the compound and releasing the slot. If RPC_SIGNALLED()
returns true, then that means that the client is going to be torn down.

Don't check RPC_SIGNALLED() after processing a successful reply. Check
it only before restarting the rpc_task. If it returns true, then requeue
the callback instead of restarting the task.

Also, handle rpc_restart_call() and rpc_restart_call_prepare() failures
correctly, by requeueing the callback.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: always release slot when requeueing callback

If the callback is going to be requeued to the workqueue, then release
the slot. The callback client and session could change and the slot may
no longer be valid after that point.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: lift NFSv4.0 handling out of nfsd4_cb_sequence_done()

It's a bit strange to call nfsd4_cb_sequence_done() on a callback with no
CB_SEQUENCE. Lift the handling of restarting a call into a new helper,
and move the handling of NFSv4.0 into nfsd4_cb_done().

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: prepare nfsd4_cb_sequence_done() for error handling rework

There is only one case where we want to proceed with processing the rest
of the CB_COMPOUND, and that's when the cb_seq_status is 0. Make the
default return value be false, and only set it to true in that case.

Rename the "need_restart" label to "requeue", to better indicate that
it's being requeued to the workqueue.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: put dl_stid if fail to queue dl_recall

Before calling nfsd4_run_cb to queue dl_recall to the callback_wq, we
increment the reference count of dl_stid.
We expect that after the corresponding work_struct is processed, the
reference count of dl_stid will be decremented through the callback
function nfsd4_cb_recall_release.
However, if the call to nfsd4_run_cb fails, the incremented reference
count of dl_stid will not be decremented correspondingly, leading to the
following nfs4_stid leak:
unreferenced object 0xffff88812067b578 (size 344):
  comm "nfsd", pid 2761, jiffies 4295044002 (age 5541.241s)
  hex dump (first 32 bytes):
    01 00 00 00 6b 6b 6b 6b b8 02 c0 e2 81 88 ff ff  ....kkkk........
    00 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 ad 4e ad de  .kkkkkkk.....N..
  backtrace:
    kmem_cache_alloc+0x4b9/0x700
    nfsd4_process_open1+0x34/0x300
    nfsd4_open+0x2d1/0x9d0
    nfsd4_proc_compound+0x7a2/0xe30
    nfsd_dispatch+0x241/0x3e0
    svc_process_common+0x5d3/0xcc0
    svc_process+0x2a3/0x320
    nfsd+0x180/0x2e0
    kthread+0x199/0x1d0
    ret_from_fork+0x30/0x50
    ret_from_fork_asm+0x1b/0x30
unreferenced object 0xffff8881499f4d28 (size 368):
  comm "nfsd", pid 2761, jiffies 4295044005 (age 5541.239s)
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 30 4d 9f 49 81 88 ff ff  ........0M.I....
    30 4d 9f 49 81 88 ff ff 20 00 00 00 01 00 00 00  0M.I.... .......
  backtrace:
    kmem_cache_alloc+0x4b9/0x700
    nfs4_alloc_stid+0x29/0x210
    alloc_init_deleg+0x92/0x2e0
    nfs4_set_delegation+0x284/0xc00
    nfs4_open_delegation+0x216/0x3f0
    nfsd4_process_open2+0x2b3/0xee0
    nfsd4_open+0x770/0x9d0
    nfsd4_proc_compound+0x7a2/0xe30
    nfsd_dispatch+0x241/0x3e0
    svc_process_common+0x5d3/0xcc0
    svc_process+0x2a3/0x320
    nfsd+0x180/0x2e0
    kthread+0x199/0x1d0
    ret_from_fork+0x30/0x50
    ret_from_fork_asm+0x1b/0x30
Fix it by checking the result of nfsd4_run_cb and call nfs4_put_stid if
fail to queue dl_recall.

Cc: stable@vger.kernel.org
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: allow SC_STATUS_FREEABLE when searching via nfs4_lookup_stateid()

The pynfs DELEG8 test fails when run against nfsd. It acquires a
delegation and then lets the lease time out. It then tries to use the
deleg stateid and expects to see NFS4ERR_DELEG_REVOKED, but it gets
bad NFS4ERR_BAD_STATEID instead.

When a delegation is revoked, it's initially marked with
SC_STATUS_REVOKED, or SC_STATUS_ADMIN_REVOKED and later, it's marked
with the SC_STATUS_FREEABLE flag, which denotes that it is waiting for
s FREE_STATEID call.

nfs4_lookup_stateid() accepts a statusmask that includes the status
flags that a found stateid is allowed to have. Currently, that mask
never includes SC_STATUS_FREEABLE, which means that revoked delegations
are (almost) never found.

Add SC_STATUS_FREEABLE to the always-allowed status flags, and remove it
from nfsd4_delegreturn() since it's now always implied.

Fixes: 8dd91e8d31fe ("nfsd: fix race between laundromat and free_stateid")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

svcrdma: do not unregister device for listeners

On an rdma-capable machine, a start/stop/start and then on a stop of
a knfsd server would lead kref underflow warning because svc_rdma_free
would indiscriminately unregister the rdma device but a listening
transport never calls the rdma_rn_register() thus leading to kref
going down to 0 on the 1st stop of the server and on the 2nd stop
it leads to a problem.

Suggested-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: c4de97f7c454 ("svcrdma: Handle device removal outside of the CM event handler")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: don't ignore the return code of svc_proc_register()

Currently, nfsd_proc_stat_init() ignores the return value of
svc_proc_register(). If the procfile creation fails, then the kernel
will WARN when it tries to remove the entry later.

Fix nfsd_proc_stat_init() to return the same type of pointer as
svc_proc_register(), and fix up nfsd_net_init() to check that and fail
the nfsd_net construction if it occurs.

svc_proc_register() can fail if the dentry can't be allocated, or if an
identical dentry already exists. The second case is pretty unlikely in
the nfsd_net construction codepath, so if this happens, return -ENOMEM.

Reported-by: syzbot+e34ad04f27991521104c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-nfs/67a47501.050a0220.19061f.05f9.GAE@google.com/
Cc: stable@vger.kernel.org # v6.9
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix trace_nfsd_slot_seqid_sequence

While running down the problem triggered by disconnect injection,
I noticed the "in use" string was actually never hooked up in this
trace point, so it always showed the traced slot as not in use. But
what might be more useful is showing all the slot status flags.

Also, this trace point can record and report the slot's index
number, which among other things is useful for troubleshooting slot
table expansion and contraction.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Remove unused make_checksum

Commit ec596aaf9b48 ("SUNRPC: Remove code behind
CONFIG_RPCSEC_GSS_KRB5_SIMPLIFIED") was the last user of the
make_checksum() function.

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

NFSD: Return NFS4ERR_FILE_OPEN only when linking an open file

RFC 8881 Section 18.9.4 paragraphs 1 - 2 tell us that RENAME should
return NFS4ERR_FILE_OPEN only when the target object is a file that
is currently open. If the target is a directory, some other status
must be returned.

The VFS is unlikely to return -EBUSY, but NFSD has to ensure that
errno does not leak to clients as a status code that is not
permitted by spec.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

NFSD: Return NFS4ERR_FILE_OPEN only when renaming over an open file

RFC 8881 Section 18.26.4 paragraphs 1 - 3 tell us that RENAME should
return NFS4ERR_FILE_OPEN only when the target object is a file that
is currently open. If the target is a directory, some other status
must be returned.

Generally I expect that a delegation recall will be triggered in
some of these circumstances. In other cases, the VFS might return
-EBUSY for other reasons, and NFSD has to ensure that errno does
not leak to clients as a status code that is not permitted by spec.

There are some error flows where the target dentry hasn't been
found yet. The default value for @type therefore is S_IFDIR to return
an alternate status code in those cases.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

NFSD: Never return NFS4ERR_FILE_OPEN when removing a directory

RFC 8881 Section 18.25.4 paragraph 5 tells us that the server
should return NFS4ERR_FILE_OPEN only if the target object is an
opened file. This suggests that returning this status when removing
a directory will confuse NFS clients.

This is a version-specific issue; nfsd_proc_remove/rmdir() and
nfsd3_proc_remove/rmdir() already return nfserr_access as
appropriate.

Unfortunately there is no quick way for nfsd4_remove() to determine
whether the target object is a file or not, so the check is done in
in nfsd_unlink() for now.

Reported-by: Trond Myklebust <trondmy@hammerspace.com>
Fixes: 466e16f0920f ("nfsd: check for EBUSY from vfs_rmdir/vfs_unink.")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

NFSD: nfsd_unlink() clobbers non-zero status returned from fh_fill_pre_attrs()

If fh_fill_pre_attrs() returns a non-zero status, the error flow
takes it through out_unlock, which then overwrites the returned
status code with

err = nfserrno(host_err);

Fixes: a332018a91c4 ("nfsd: handle failure to collect pre/post-op attrs more sanely")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: remove the redundant mapping of nfserr_mlink

There two mappings of nfserr_mlink in nfs_errtbl.
Remove one of them.

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

NFSD: Skip sending CB_RECALL_ANY when the backchannel isn't up

NFSD sends CB_RECALL_ANY to clients when the server is low on
memory or that client has a large number of delegations outstanding.

We've seen cases where NFSD attempts to send CB_RECALL_ANY requests
to disconnected clients, and gets confused. These calls never go
anywhere if a backchannel transport to the target client isn't
available. Before the server can send any backchannel operation, the
client has to connect first and then do a BIND_CONN_TO_SESSION.

This patch doesn't address the root cause of the confusion, but
there's no need to queue up these optional operations if they can't
go anywhere.

Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: adjust WARN_ON_ONCE in revoke_delegation

A WARN_ON_ONCE() is added to revoke delegations to make sure that the
state has been marked for revocation. However, that's only true for 4.1+
stateids. For 4.0 stateids, in unhash_delegation_locked() the sc_status
is set to SC_STATUS_CLOSED. Modify the check to reflect it, otherwise
a WARN_ON_ONCE is erronously triggered.

Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: fix management of listener transports

Currently, when no active threads are running, a root user using nfsdctl
command can try to remove a particular listener from the list of previously
added ones, then start the server by increasing the number of threads,
it leads to the following problem:

[  158.835354] refcount_t: addition on 0; use-after-free.
[  158.835603] WARNING: CPU: 2 PID: 9145 at lib/refcount.c:25 refcount_warn_saturate+0x160/0x1a0
[  158.836017] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace overlay isofs uinput snd_seq_dummy snd_hrtimer nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables qrtr sunrpc vfat fat uvcvideo videobuf2_vmalloc videobuf2_memops uvc videobuf2_v4l2 videodev videobuf2_common snd_hda_codec_generic mc e1000e snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore sg loop dm_multipath dm_mod nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock xfs libcrc32c crct10dif_ce ghash_ce vmwgfx sha2_ce sha256_arm64 sr_mod sha1_ce cdrom nvme drm_client_lib drm_ttm_helper ttm nvme_core drm_kms_helper nvme_auth drm fuse
[  158.840093] CPU: 2 UID: 0 PID: 9145 Comm: nfsd Kdump: loaded Tainted: G    B   W          6.13.0-rc6+ #7
[  158.840624] Tainted: [B]=BAD_PAGE, [W]=WARN
[  158.840802] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24006586.BA64.2406042154 06/04/2024
[  158.841220] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[  158.841563] pc : refcount_warn_saturate+0x160/0x1a0
[  158.841780] lr : refcount_warn_saturate+0x160/0x1a0
[  158.842000] sp : ffff800089be7d80
[  158.842147] x29: ffff800089be7d80 x28: ffff00008e68c148 x27: ffff00008e68c148
[  158.842492] x26: ffff0002e3b5c000 x25: ffff600011cd1829 x24: ffff00008653c010
[  158.842832] x23: ffff00008653c000 x22: 1fffe00011cd1829 x21: ffff00008653c028
[  158.843175] x20: 0000000000000002 x19: ffff00008653c010 x18: 0000000000000000
[  158.843505] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[  158.843836] x14: 0000000000000000 x13: 0000000000000001 x12: ffff600050a26493
[  158.844143] x11: 1fffe00050a26492 x10: ffff600050a26492 x9 : dfff800000000000
[  158.844475] x8 : 00009fffaf5d9b6e x7 : ffff000285132493 x6 : 0000000000000001
[  158.844823] x5 : ffff000285132490 x4 : ffff600050a26493 x3 : ffff8000805e72bc
[  158.845174] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000098588000
[  158.845528] Call trace:
[  158.845658]  refcount_warn_saturate+0x160/0x1a0 (P)
[  158.845894]  svc_recv+0x58c/0x680 [sunrpc]
[  158.846183]  nfsd+0x1fc/0x348 [nfsd]
[  158.846390]  kthread+0x274/0x2f8
[  158.846546]  ret_from_fork+0x10/0x20
[  158.846714] ---[ end trace 0000000000000000 ]---

nfsd_nl_listener_set_doit() would manipulate the list of transports of
server's sv_permsocks and close the specified listener but the other
list of transports (server's sp_xprts list) would not be changed leading
to the problem above.

Instead, determined if the nfsdctl is trying to remove a listener, in
which case, delete all the existing listener transports and re-create
all-but-the-removed ones.

Fixes: 16a471177496 ("NFSD: add listener-{set,get} netlink command")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Remove unused krb5_decrypt

The last use of krb5_decrypt() was removed in 2023 by
commit 2a9893f796a3 ("SUNRPC:
Remove net/sunrpc/auth_gss/gss_krb5_seqnum.c")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

lockd: add netlink control interface

The legacy rpc.nfsd tool will set the nlm_grace_period if the NFSv4
grace period is set. nfsdctl is missing this functionality, so add a new
netlink control interface for lockd that it can use. For now, it only
allows setting the grace period, and the tcp and udp listener ports.

lockd currently uses module parameters and sysctls for configuration, so
all of its settings are global. With this change, lockd now tracks these
values on a per-net-ns basis. It will only fall back to using the global
values if any of them are 0.

Finally, as a backward compatibility measure, if updating the nlm
settings in the init_net namespace, also update the legacy global
values to match.

Link: https://issues.redhat.com/browse/RHEL-71698
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

sunrpc: clean cache_detail immediately when flush is written frequently

We will write /proc/net/rpc/xxx/flush if we want to clean cache_detail.
This updates nextcheck to the current time and calls cache_flush -->
cache_clean to clean cache_detail.
If we write this interface again within one second, it will only increase
flush_time and nextcheck without actually cleaning cache_detail.
Therefore, if we keep writing this interface repeatedly within one second,
flush_time and nextcheck will keep increasing, even far exceeding the
current time, making it impossible to clear cache_detail through the flush
interface or cache_cleaner.
If someone frequently calls the flush interface, we should immediately
clean the corresponding cache_detail instead of continuously accumulating
nextcheck.

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Linux 6.14-rc6

Merge tag 'kbuild-fixes-v6.14-3' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

- Use the specified $(LD) when building userprogs with Clang

- Pass the correct target triple when compile-testing UAPI headers
   with Clang

- Fix pacman-pkg build error with KBUILD_OUTPUT

* tag 'kbuild-fixes-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: install-extmod-build: Fix build when specifying KBUILD_OUTPUT
  docs: Kconfig: fix defconfig description
  kbuild: hdrcheck: fix cross build with clang
  kbuild: userprogs: use correct lld when linking through clang

Merge tag 'usb-6.14-rc6' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB driver fixes for some reported issues. These
  contain:

   - typec driver fixes

   - dwc3 driver fixes

   - xhci driver fixes

   - renesas controller fixes

   - gadget driver fixes

   - a new USB quirk added

  All of these have been in linux-next with no reported issues"

* tag 'usb-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: ucsi: Fix NULL pointer access
  usb: quirks: Add DELAY_INIT and NO_LPM for Prolific Mass Storage Card Reader
  usb: xhci: Fix host controllers "dying" after suspend and resume
  usb: dwc3: Set SUSPENDENABLE soon after phy init
  usb: hub: lack of clearing xHC resources
  usb: renesas_usbhs: Flush the notify_hotplug_work
  usb: renesas_usbhs: Use devm_usb_get_phy()
  usb: renesas_usbhs: Call clk_put()
  usb: dwc3: gadget: Prevent irq storm when TH re-executes
  usb: gadget: Check bmAttributes only if configuration is valid
  xhci: Restrict USB4 tunnel detection for USB3 devices to Intel hosts
  usb: xhci: Enable the TRB overfetch quirk on VIA VL805
  usb: gadget: Fix setting self-powered state on suspend
  usb: typec: ucsi: increase timeout for PPM reset operations
  acpi: typec: ucsi: Introduce a ->poll_cci method
  usb: typec: tcpci_rt1711h: Unmask alert interrupts to fix functionality
  usb: gadget: Set self-powered based on MaxPower and bmAttributes
  usb: gadget: u_ether: Set is_suspend flag if remote wakeup fails
  usb: atm: cxacru: fix a flaw in existing endpoint checks

Merge tag 'driver-core-6.14-rc6' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core fix from Greg KH:
"Here is a single driver core fix that resolves a reported memory leak.

It's been in linux-next for 2 weeks now with no reported problems"

* tag 'driver-core-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
drivers: core: fix device leak in __fw_devlink_relax_cycles()

Merge tag 'char-misc-6.14-rc6' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc/IIO driver fixes from Greg KH:
"Here are a number of misc and char and iio driver fixes that have been
  sitting in my tree for way too long. They contain:

   - iio driver fixes for reported issues

   - regression fix for rtsx_usb card reader

   - mei and mhi driver fixes

   - small virt driver fixes

   - ntsync permissions fix

   - other tiny driver fixes for reported problems.

  All of these have been in linux-next for quite a while with no
  reported issues"

* tag 'char-misc-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (30 commits)
  Revert "drivers/card_reader/rtsx_usb: Restore interrupt based detection"
  ntsync: Check wait count based on byte size.
  bus: simple-pm-bus: fix forced runtime PM use
  char: misc: deallocate static minor in error path
  eeprom: digsy_mtc: Make GPIO lookup table match the device
  drivers: virt: acrn: hsm: Use kzalloc to avoid info leak in pmcmd_ioctl
  binderfs: fix use-after-free in binder_devices
  slimbus: messaging: Free transaction ID in delayed interrupt scenario
  vbox: add HAS_IOPORT dependency
  cdx: Fix possible UAF error in driver_override_show()
  intel_th: pci: Add Panther Lake-P/U support
  intel_th: pci: Add Panther Lake-H support
  intel_th: pci: Add Arrow Lake support
  intel_th: msu: Fix less trivial kernel-doc warnings
  intel_th: msu: Fix kernel-doc warnings
  MAINTAINERS: change maintainer for FSI
  ntsync: Set the permissions to be 0666
  bus: mhi: host: pci_generic: Use pci_try_reset_function() to avoid deadlock
  mei: vsc: Use "wakeuphostint" when getting the host wakeup GPIO
  mei: me: add panther lake P DID
  ...

Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
"arm64:

   - Fix a couple of bugs affecting pKVM's PSCI relay implementation
     when running in the hVHE mode, resulting in the host being entered
     with the MMU in an unknown state, and EL2 being in the wrong mode

  x86:

   - Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow

   - Ensure DEBUGCTL is context switched on AMD to avoid running the
     guest with the host's value, which can lead to unexpected bus lock
     #DBs

   - Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't
     properly emulate BTF. KVM's lack of context switching has meant BTF
     has always been broken to some extent

   - Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as
     the guest can enable DebugSwap without KVM's knowledge

   - Fix a bug in mmu_stress_tests where a vCPU could finish the "writes
     to RO memory" phase without actually generating a write-protection
     fault

   - Fix a printf() goof in the SEV smoke test that causes build
     failures with -Werror

   - Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when
     PERFMON_V2 isn't supported by KVM"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Explicitly zero EAX and EBX when PERFMON_V2 isn't supported by KVM
  KVM: selftests: Fix printf() format goof in SEV smoke test
  KVM: selftests: Ensure all vCPUs hit -EFAULT during initial RO stage
  KVM: SVM: Don't rely on DebugSwap to restore host DR0..DR3
  KVM: SVM: Save host DR masks on CPUs with DebugSwap
  KVM: arm64: Initialize SCTLR_EL1 in __kvm_hyp_init_cpu()
  KVM: arm64: Initialize HCR_EL2.E2H early
  KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
  KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
  KVM: x86: Snapshot the host's DEBUGCTL in common x86
  KVM: SVM: Suppress DEBUGCTL.BTF on AMD
  KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value
  KVM: selftests: Assert that STI blocking isn't set after event injection
  KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the STI shadow

Merge tag 'kvm-x86-fixes-6.14-rcN.2' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fixes for 6.14-rcN #2

- Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow.

- Ensure DEBUGCTL is context switched on AMD to avoid running the guest with
   the host's value, which can lead to unexpected bus lock #DBs.

- Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't properly
   emulate BTF.  KVM's lack of context switching has meant BTF has always been
   broken to some extent.

- Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as the guest
   can enable DebugSwap without KVM's knowledge.

- Fix a bug in mmu_stress_tests where a vCPU could finish the "writes to RO
   memory" phase without actually generating a write-protection fault.

- Fix a printf() goof in the SEV smoke test that causes build failures with
   -Werror.

- Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when PERFMON_V2
   isn't supported by KVM.

Merge tag 'kvmarm-fixes-6.14-4' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.14, take #4

- Fix a couple of bugs affecting pKVM's PSCI relay implementation
when running in the hVHE mode, resulting in the host being entered
with the MMU in an unknown state, and EL2 being in the wrong mode.

Merge tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git./linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
"33 hotfixes. 24 are cc:stable and the remainder address post-6.13
  issues or aren't considered necessary for -stable kernels.

  26 are for MM and 7 are for non-MM.

   - "mm: memory_failure: unmap poisoned folio during migrate properly"
     from Ma Wupeng fixes a couple of two year old bugs involving the
     migration of hwpoisoned folios.

   - "selftests/damon: three fixes for false results" from SeongJae Park
     fixes three one year old bugs in the SAMON selftest code.

  The remainder are singletons and doubletons. Please see the individual
  changelogs for details"

* tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (33 commits)
  mm/page_alloc: fix uninitialized variable
  rapidio: add check for rio_add_net() in rio_scan_alloc_net()
  rapidio: fix an API misues when rio_add_net() fails
  MAINTAINERS: .mailmap: update Sumit Garg's email address
  Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone"
  mm: fix finish_fault() handling for large folios
  mm: don't skip arch_sync_kernel_mappings() in error paths
  mm: shmem: remove unnecessary warning in shmem_writepage()
  userfaultfd: fix PTE unmapping stack-allocated PTE copies
  userfaultfd: do not block on locking a large folio with raised refcount
  mm: zswap: use ATOMIC_LONG_INIT to initialize zswap_stored_pages
  mm: shmem: fix potential data corruption during shmem swapin
  mm: fix kernel BUG when userfaultfd_move encounters swapcache
  selftests/damon/damon_nr_regions: sort collected regiosn before checking with min/max boundaries
  selftests/damon/damon_nr_regions: set ops update for merge results check to 100ms
  selftests/damon/damos_quota: make real expectation of quota exceeds
  include/linux/log2.h: mark is_power_of_2() with __always_inline
  NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback
  mm, swap: avoid BUG_ON in relocate_cluster()
  mm: swap: use correct step in loop to wait all clusters in wait_for_allocation()
  ...

Merge tag 'x86-urgent-2025-03-08' of git://git./linux/kernel/git/tip/tip

Pull more x86 fixes from Ingo Molnar:

- Add more model IDs to the AMD microcode version check, more people
   are hitting these checks

- Fix a Xen guest boot warning related to AMD northbridge setup

- Fix SEV guest bugs related to a recent changes in its locking logic

- Fix a missing definition of PTRS_PER_PMD that assembly builds can hit

* tag 'x86-urgent-2025-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Add some forgotten models to the SHA check
  x86/mm: Define PTRS_PER_PMD for assembly code too
  virt: sev-guest: Move SNP Guest Request data pages handling under snp_cmd_mutex
  virt: sev-guest: Allocate request data dynamically
  x86/amd_nb: Use rdmsr_safe() in amd_get_mmconfig_range()

x86/microcode/AMD: Add some forgotten models to the SHA check

Add some more forgotten models to the SHA check.

Fixes: 50cef76d5cb0 ("x86/microcode/AMD: Load only SHA256-checksummed patches")
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Link: https://lore.kernel.org/r/20250307220256.11816-1-bp@kernel.org

Merge branch 'linus' into x86/urgent, to pick up dependent patches

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'loongarch-fixes-6.14-2' of git://git./linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
"Fix bugs in kernel build, hibernation, memory management and KVM"

* tag 'loongarch-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Fix GPA size issue about VM
  LoongArch: KVM: Reload guest CSR registers after sleep
  LoongArch: KVM: Add interrupt checking for AVEC
  LoongArch: Set hugetlb mmap base address aligned with pmd size
  LoongArch: Set max_pfn with the PFN of the last page
  LoongArch: Use polling play_dead() when resuming from hibernation
  LoongArch: Eliminate superfluous get_numa_distances_cnt()
  LoongArch: Convert unreachable() to BUG()

LoongArch: KVM: Fix GPA size issue about VM

Physical address space is 48 bit on Loongson-3A5000 physical machine,
however it is 47 bit for VM on Loongson-3A5000 system. Size of physical
address space of VM is the same with the size of virtual user space (a
half) of physical machine.

Variable cpu_vabits represents user address space, kernel address space
is not included (user space and kernel space are both a half of total).
Here cpu_vabits, rather than cpu_vabits - 1, is to represent the size of
guest physical address space.

Also there is strict checking about page fault GPA address, inject error
if it is larger than maximum GPA address of VM.

Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: KVM: Reload guest CSR registers after sleep

On host, the HW guest CSR registers are lost after suspend and resume
operation. Since last_vcpu of boot CPU still records latest vCPU pointer
so that the guest CSR register skips to reload when boot CPU resumes and
vCPU is scheduled.

Here last_vcpu is cleared so that guest CSR registers will reload from
scheduled vCPU context after suspend and resume.

Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: KVM: Add interrupt checking for AVEC

There is a newly added macro INT_AVEC with CSR ESTAT register, which is
bit 14 used for LoongArch AVEC support. AVEC interrupt status bit 14 is
supported with macro CSR_ESTAT_IS, so here replace the hard-coded value
0x1fff with macro CSR_ESTAT_IS so that the AVEC interrupt status is also
supported by KVM.

Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Set hugetlb mmap base address aligned with pmd size

With ltp test case "testcases/bin/hugefork02", there is a dmesg error
report message such as:

kernel BUG at mm/hugetlb.c:5550!
Oops - BUG[#1]:
CPU: 0 UID: 0 PID: 1517 Comm: hugefork02 Not tainted 6.14.0-rc2+ #241
Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022
pc 90000000004eaf1c ra 9000000000485538 tp 900000010edbc000 sp 900000010edbf940
a0 900000010edbfb00 a1 9000000108d20280 a2 00007fffe9474000 a3 00007ffff3474000
a4 0000000000000000 a5 0000000000000003 a6 00000000003cadd3 a7 0000000000000000
t0 0000000001ffffff t1 0000000001474000 t2 900000010ecd7900 t3 00007fffe9474000
t4 00007fffe9474000 t5 0000000000000040 t6 900000010edbfb00 t7 0000000000000001
t8 0000000000000005 u0 90000000004849d0 s9 900000010edbfa00 s0 9000000108d20280
s1 00007fffe9474000 s2 0000000002000000 s3 9000000108d20280 s4 9000000002b38b10
s5 900000010edbfb00 s6 00007ffff3474000 s7 0000000000000406 s8 900000010edbfa08
    ra: 9000000000485538 unmap_vmas+0x130/0x218
   ERA: 90000000004eaf1c __unmap_hugepage_range+0x6f4/0x7d0
  PRMD: 00000004 (PPLV0 +PIE -PWE)
  EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
PRID: 0014c010 (Loongson-64bit, Loongson-3A5000)
Process hugefork02 (pid: 1517, threadinfo=00000000a670eaf4, task=000000007a95fc64)
Call Trace:
[<90000000004eaf1c>] __unmap_hugepage_range+0x6f4/0x7d0
[<9000000000485534>] unmap_vmas+0x12c/0x218
[<9000000000494068>] exit_mmap+0xe0/0x308
[<900000000025fdc4>] mmput+0x74/0x180
[<900000000026a284>] do_exit+0x294/0x898
[<900000000026aa30>] do_group_exit+0x30/0x98
[<900000000027bed4>] get_signal+0x83c/0x868
[<90000000002457b4>] arch_do_signal_or_restart+0x54/0xfa0
[<90000000015795e8>] irqentry_exit_to_user_mode+0xb8/0x138
[<90000000002572d0>] tlb_do_page_fault_1+0x114/0x1b4

The problem is that base address allocated from hugetlbfs is not aligned
with pmd size. Here add a checking for hugetlbfs and align base address
with pmd size. After this patch the test case "testcases/bin/hugefork02"
passes to run.

This is similar to the commit 7f24cbc9c4d42db8a3c8484d1 ("mm/mmap: teach
generic_get_unmapped_area{_topdown} to handle hugetlb mappings").

Cc: stable@vger.kernel.org # 6.13+
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Set max_pfn with the PFN of the last page

The current max_pfn equals to zero. In this case, it causes user cannot
get some page information through /proc filesystem such as kpagecount.
The following message is displayed by stress-ng test suite with command
"stress-ng --verbose --physpage 1 -t 1".

# stress-ng --verbose --physpage 1 -t 1
stress-ng: error: [1691] physpage: cannot read page count for address 0x134ac000 in /proc/kpagecount, errno=22 (Invalid argument)
stress-ng: error: [1691] physpage: cannot read page count for address 0x7ffff207c3a8 in /proc/kpagecount, errno=22 (Invalid argument)
stress-ng: error: [1691] physpage: cannot read page count for address 0x134b0000 in /proc/kpagecount, errno=22 (Invalid argument)
...

After applying this patch, the kernel can pass the test.

# stress-ng --verbose --physpage 1 -t 1
stress-ng: debug: [1701] physpage: [1701] started (instance 0 on CPU 3)
stress-ng: debug: [1701] physpage: [1701] exited (instance 0 on CPU 3)
stress-ng: debug: [1700] physpage: [1701] terminated (success)

Cc: stable@vger.kernel.org # 6.8+
Fixes: ff6c3d81f2e8 ("NUMA: optimize detection of memory with no node id assigned by firmware")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Use polling play_dead() when resuming from hibernation

When CONFIG_RANDOM_KMALLOC_CACHES or other randomization infrastructrue
enabled, the idle_task's stack may different between the booting kernel
and target kernel. So when resuming from hibernation, an ACTION_BOOT_CPU
IPI wakeup the idle instruction in arch_cpu_idle_dead() and jump to the
interrupt handler. But since the stack pointer is changed, the interrupt
handler cannot restore correct context.

So rename the current arch_cpu_idle_dead() to idle_play_dead(), make it
as the default version of play_dead(), and the new arch_cpu_idle_dead()
call play_dead() directly. For hibernation, implement an arch-specific
hibernate_resume_nonboot_cpu_disable() to use the polling version (idle
instruction is replace by nop, and irq is disabled) of play_dead(), i.e.
poll_play_dead(), to avoid IPI handler corrupting the idle_task's stack
when resuming from hibernation.

This solution is a little similar to commit 406f992e4a372dafbe3c ("x86 /
hibernate: Use hlt_play_dead() when resuming from hibernation").

Cc: stable@vger.kernel.org
Tested-by: Erpeng Xu <xuerpeng@uniontech.com>
Tested-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Eliminate superfluous get_numa_distances_cnt()

In LoongArch, get_numa_distances_cnt() isn't in use, resulting in a
compiler warning.

Fix follow errors with clang-18 when W=1e:

arch/loongarch/kernel/acpi.c:259:28: error: unused function 'get_numa_distances_cnt' [-Werror,-Wunused-function]
259 | static inline unsigned int get_numa_distances_cnt(struct acpi_table_slit *slit)
| ^~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Link: https://lore.kernel.org/all/Z7bHPVUH4lAezk0E@kernel.org/
Signed-off-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Convert unreachable() to BUG()

When compiling on LoongArch, there exists the following objtool warning
in arch/loongarch/kernel/machine_kexec.o:

kexec_reboot() falls through to next function crash_shutdown_secondary()

Avoid using unreachable() as it can (and will in the absence of UBSAN)
generate fall-through code. Use BUG() so we get a "break BRK_BUG" trap
(with unreachable annotation).

Cc: stable@vger.kernel.org # 6.12+
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

Merge tag 's390-6.14-6' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

- Fix return address recovery of traced function in ftrace to ensure
   reliable stack unwinding

- Fix compiler warnings and runtime crashes of vDSO selftests on s390
   by introducing a dedicated GNU hash bucket pointer with correct
   32-bit entry size

- Fix test_monitor_call() inline asm, which misses CC clobber, by
   switching to an instruction that doesn't modify CC

* tag 's390-6.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/ftrace: Fix return address recovery of traced function
  selftests/vDSO: Fix GNU hash table entry size for s390x
  s390/traps: Fix test_monitor_call() inline assembly

x86/mm: Define PTRS_PER_PMD for assembly code too

Andy reported the following build warning from head_32.S:

  In file included from arch/x86/kernel/head_32.S:29:
  arch/x86/include/asm/pgtable_32.h:59:5: error: "PTRS_PER_PMD" is not defined, evaluates to 0 [-Werror=undef]
       59 | #if PTRS_PER_PMD > 1

The reason is that on 2-level i386 paging the folded in PMD's
PTRS_PER_PMD constant is not defined in assembly headers,
only in generic MM C headers.

Instead of trying to fish out the definition from the generic
headers, just define it - it even has a comment for it already...

Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/Z8oa8AUVyi2HWfo9@gmail.com

Merge tag 'slab-for-6.14-rc5' of git://git./linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:

- Stable fix for kmem_cache_destroy() called from a WQ_MEM_RECLAIM
   workqueue causing a warning due to the new kvfree_rcu_barrier()
   (Uladzislau Rezki)

* tag 'slab-for-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/slab/kvfree_rcu: Switch to WQ_MEM_RECLAIM wq

Merge tag 'acpi-6.14-rc6' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
"Restore the previous behavior of the ACPI platform_profile sysfs
  interface that has been changed recently in a way incompatible with
  the existing user space (Mario Limonciello)"

* tag 'acpi-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  platform/x86/amd: pmf: Add balanced-performance to hidden choices
  platform/x86/amd: pmf: Add 'quiet' to hidden choices
  ACPI: platform_profile: Add support for hidden choices

Merge tag 'execve-v6.14-rc6' of git://git./linux/kernel/git/kees/linux

Pull core dumping fix from Kees Cook:

- Only sort VMAs when core_sort_vma sysctl is set

* tag 'execve-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
coredump: Only sort VMAs when core_sort_vma sysctl is set

Merge tag 'for-6.14-rc5-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- fix leaked extent map after error when reading chunks

- replace use of deprecated strncpy

- in zoned mode, fixed range when ulocking extent range, causing a hang

* tag 'for-6.14-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix a leaked chunk map issue in read_one_chunk()
  btrfs: replace deprecated strncpy() with strscpy()
  btrfs: zoned: fix extent range end unlock in cow_file_range()

Merge tag 'block-6.14-20250306' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

- NVMe pull request via Keith:
      - TCP use after free fix on polling (Sagi)
      - Controller memory buffer cleanup fixes (Icenowy)
      - Free leaking requests on bad user passthrough commands (Keith)
      - TCP error message fix (Maurizio)
      - TCP corruption fix on partial PDU (Maurizio)
      - TCP memory ordering fix for weakly ordered archs (Meir)
      - Type coercion fix on message error for TCP (Dan)

- Name the RQF flags enum, fixing issues with anon enums and BPF import
   of it

- ublk parameter setting fix

- GPT partition 7-bit conversion fix

* tag 'block-6.14-20250306' of git://git.kernel.dk/linux:
  block: Name the RQF flags enum
  nvme-tcp: fix signedness bug in nvme_tcp_init_connection()
  block: fix conversion of GPT partition name to 7-bit
  ublk: set_params: properly check if parameters can be applied
  nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch
  nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu()
  nvme-tcp: Fix a C2HTermReq error message
  nvmet: remove old function prototype
  nvme-ioctl: fix leaked requests on mapping error
  nvme-pci: skip CMB blocks incompatible with PCI P2P DMA
  nvme-pci: clean up CMBMSC when registering CMB fails
  nvme-tcp: fix possible UAF in nvme_tcp_poll

Merge tag 'io_uring-6.14-20250306' of git://git.kernel.dk/linux

Pull io_uring fix from Jens Axboe:
"A single fix for a regression introduced in the 6.14 merge window,
causing stalls/hangs with IOPOLL reads or writes"

* tag 'io_uring-6.14-20250306' of git://git.kernel.dk/linux:
io_uring/rw: ensure reissue path is correctly handled for IOPOLL

Merge tag 'sched-urgent-2025-03-07' of git://git./linux/kernel/git/tip/tip

Pull misc scheduler fixes from Ingo Molnar:

- Fix deadline scheduler sysctl parameter setting bug

- Fix RT scheduler sysctl parameter setting bug

- Fix possible memory corruption in child_cfs_rq_on_list()

* tag 'sched-urgent-2025-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/rt: Update limit of sched_rt sysctl in documentation
  sched/deadline: Use online cpus for validating runtime
  sched/fair: Fix potential memory corruption in child_cfs_rq_on_list

Merge tag 'perf-urgent-2025-03-07' of git://git./linux/kernel/git/tip/tip

Pull perf event fixes from Ingo Molnar:
"Fix a race between PMU registration and event creation, and fix
  pmus_lock vs. pmus_srcu lock ordering"

* tag 'perf-urgent-2025-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix perf_pmu_register() vs. perf_init_event()
  perf/core: Fix pmus_lock vs. pmus_srcu ordering

Merge tag 'x86-urgent-2025-03-07' of git://git./linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

- Fix CPUID leaf 0x2 parsing bugs

- Sanitize very early boot parameters to avoid crash

- Fix size overflows in the SGX code

- Make CALL_NOSPEC use consistent

* tag 'x86-urgent-2025-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Sanitize boot params before parsing command line
  x86/sgx: Fix size overflows in sgx_encl_create()
  x86/cpu: Properly parse CPUID leaf 0x2 TLB descriptor 0x63
  x86/cpu: Validate CPUID leaf 0x2 EDX output
  x86/cacheinfo: Validate CPUID leaf 0x2 EDX output
  x86/speculation: Add a conditional CS prefix to CALL_NOSPEC
  x86/speculation: Simplify and make CALL_NOSPEC consistent

Merge tag 'hwmon-for-v6.14-rc6' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

- xgene-hwmon: Fix a NULL vs IS_ERR_OR_NULL() check

- ad7314: Return error if leading zero bits are non-zero

- ntc_thermistor: Update/fix the ncpXXxh103 sensor table

- pmbus: Initialise page count in pmbus_identify()

- peci/dimmtemp: Do not provide fake threshold data

* tag 'hwmon-for-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: fix a NULL vs IS_ERR_OR_NULL() check in xgene_hwmon_probe()
  hwmon: (ad7314) Validate leading zero bits and return error
  hwmon: (ntc_thermistor) Fix the ncpXXxh103 sensor table
  hwmon: (pmbus) Initialise page count in pmbus_identify()
  hwmon: (peci/dimmtemp) Do not provide fake thresholds data

Merge tag 'gpio-fixes-for-v6.14-rc6' of git://git./linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

- protect gpio-aggregator against module unload

- use raw spinlock in gpio-rcar to fix a lockdep splat

- fix OF node leak in gpio-rcar

* tag 'gpio-fixes-for-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: rcar: Fix missing of_node_put() call
  gpio: rcar: Use raw_spinlock to protect register access
  gpio: aggregator: protect driver attr handlers against module unload

Merge tag 'platform-drivers-x86-v6.14-4' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:

- amd/pmf:
     - Initialize 'cb_mutex'
     - Support for new version of PMF-TA

- intel-hid: Fix volume buttons on Microsoft Surface Go 4 tablet

- intel/vsec: Add Diamond Rapids support

- thinkpad_acpi: Add battery quirk for ThinkPad X131e

* tag 'platform-drivers-x86-v6.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86/amd/pmf: Update PMF Driver for Compatibility with new PMF-TA
  platform/x86/amd/pmf: Propagate PMF-TA return codes
  platform/x86/intel/vsec: Add Diamond Rapids support
  platform/x86: thinkpad_acpi: Add battery quirk for ThinkPad X131e
  platform/x86: intel-hid: fix volume buttons on Microsoft Surface Go 4 tablet
  platform/x86/amd/pmf: Initialize and clean up `cb_mutex`

Merge tag 'sound-6.14-rc6' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"There is a single change in ALSA core (for sequencer code for the
  module auto-loading in a wrong timing) while the all rest are various
  HD- and USB-audio fixes.

  Many of them are boring device-specific quirks, and should be safe to
  take"

* tag 'sound-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: Add support for ASUS Zenbook UM3406KA Laptops using CS35L41 HDA
  ALSA: hda/realtek: Add support for ASUS B5405 and B5605 Laptops using CS35L41 HDA
  ALSA: hda/realtek: Add support for ASUS B3405 and B3605 Laptops using CS35L41 HDA
  ALSA: hda/realtek: Add support for various ASUS Laptops using CS35L41 HDA
  ALSA: hda/realtek: Add support for ASUS ROG Strix G614 Laptops using CS35L41 HDA
  ALSA: hda/realtek: Add support for ASUS ROG Strix GA603 Laptops using CS35L41 HDA
  ALSA: hda/realtek: Add support for ASUS ROG Strix G814 Laptop using CS35L41 HDA
  ALSA: hda: intel: Add Dell ALC3271 to power_save denylist
  ALSA: hda/realtek: update ALC222 depop optimize
  ALSA: hda: realtek: fix incorrect IS_REACHABLE() usage
  ALSA: usx2y: validate nrpacks module parameter on probe
  ALSA: hda/realtek - add supported Mic Mute LED for Lenovo platform
  ALSA: seq: Avoid module auto-load handling at event delivery
  ALSA: hda: Fix speakers on ASUS EXPERTBOOK P5405CSA 1.0
  ALSA: hda/realtek: Fix Asus Z13 2025 audio
  ALSA: hda/realtek: Remove (revert) duplicate Ally X config

virt: sev-guest: Move SNP Guest Request data pages handling under snp_cmd_mutex

Compared to the SNP Guest Request, the "Extended" version adds data pages for
receiving certificates. If not enough pages provided, the HV can report to the
VM how much is needed so the VM can reallocate and repeat.

Commit

ae596615d93d ("virt: sev-guest: Reduce the scope of SNP command mutex")

moved handling of the allocated/desired pages number out of scope of said
mutex and create a possibility for a race (multiple instances trying to
trigger Extended request in a VM) as there is just one instance of
snp_msg_desc per /dev/sev-guest and no locking other than snp_cmd_mutex.

Fix the issue by moving the data blob/size and the GHCB input struct
(snp_req_data) into snp_guest_req which is allocated on stack now and accessed
by the GHCB caller under that mutex.

Stop allocating SEV_FW_BLOB_MAX_SIZE in snp_msg_alloc() as only one of four
callers needs it. Free the received blob in get_ext_report() right after it is
copied to the userspace. Possible future users of snp_send_guest_request() are
likely to have different ideas about the buffer size anyways.

Fixes: ae596615d93d ("virt: sev-guest: Reduce the scope of SNP command mutex")
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250307013700.437505-3-aik@amd.com

virt: sev-guest: Allocate request data dynamically

Commit

ae596615d93d ("virt: sev-guest: Reduce the scope of SNP command mutex")

narrowed the command mutex scope to snp_send_guest_request(). However,
GET_REPORT, GET_DERIVED_KEY, and GET_EXT_REPORT share the req structure in
snp_guest_dev. Without the mutex protection, concurrent requests can overwrite
each other's data. Fix it by dynamically allocating the request structure.

Fixes: ae596615d93d ("virt: sev-guest: Reduce the scope of SNP command mutex")
Closes: https://github.com/AMDESE/AMDSEV/issues/265
Reported-by: andreas.stuehrk@yaxi.tech
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250307013700.437505-2-aik@amd.com

x86/amd_nb: Use rdmsr_safe() in amd_get_mmconfig_range()

Xen doesn't offer MSR_FAM10H_MMIO_CONF_BASE to all guests.  This results
in the following warning:

  unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0)
  Call Trace:
   xen_read_msr+0x1e/0x30
   amd_get_mmconfig_range+0x2b/0x80
   quirk_amd_mmconfig_area+0x28/0x100
   pnp_fixup_device+0x39/0x50
   __pnp_add_device+0xf/0x150
   pnp_add_device+0x3d/0x100
   pnpacpi_add_device_handler+0x1f9/0x280
   acpi_ns_get_device_callback+0x104/0x1c0
   acpi_ns_walk_namespace+0x1d0/0x260
   acpi_get_devices+0x8a/0xb0
   pnpacpi_init+0x50/0x80
   do_one_initcall+0x46/0x2e0
   kernel_init_freeable+0x1da/0x2f0
   kernel_init+0x16/0x1b0
   ret_from_fork+0x30/0x50
   ret_from_fork_asm+0x1b/0x30

based on quirks for a "PNP0c01" device.  Treating MMCFG as disabled is the
right course of action, so no change is needed there.

This was most likely exposed by fixing the Xen MSR accessors to not be
silently-safe.

Fixes: 3fac3734c43a ("xen/pv: support selecting safe/unsafe msr accesses")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250307002846.3026685-1-andrew.cooper3@citrix.com

fs/pipe: add simpler helpers for common cases

The fix to atomically read the pipe head and tail state when not holding
the pipe mutex has caused a number of headaches due to the size change
of the involved types.

It turns out that we don't have _that_ many places that access these
fields directly and were affected, but we have more than we strictly
should have, because our low-level helper functions have been designed
to have intimate knowledge of how the pipes work.

And as a result, that random noise of direct 'pipe->head' and
'pipe->tail' accesses makes it harder to pinpoint any actual potential
problem spots remaining.

For example, we didn't have a "is the pipe full" helper function, but
instead had a "given these pipe buffer indexes and this pipe size, is
the pipe full".  That's because some low-level pipe code does actually
want that much more complicated interface.

But most other places literally just want a "is the pipe full" helper,
and not having it meant that those places ended up being unnecessarily
much too aware of this all.

It would have been much better if only the very core pipe code that
cared had been the one aware of this all.

So let's fix it - better late than never.  This just introduces the
trivial wrappers for "is this pipe full or empty" and to get how many
pipe buffers are used, so that instead of writing

        if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))

the places that literally just want to know if a pipe is full can just
say

        if (pipe_is_full(pipe))

instead.  The existing trivial cases were converted with a 'sed' script.

This cuts down on the places that access pipe->head and pipe->tail
directly outside of the pipe code (and core splice code) quite a lot.

The splice code in particular still revels in doing the direct low-level
accesses, and the fuse fuse_dev_splice_write() code also seems a bit
unnecessarily eager to go very low-level, but it's at least a bit better
than it used to be.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'drm-fixes-2025-03-07' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Fixes across the board, mostly xe and imagination with some amd and
  misc others.

  The xe fixes are mostly hmm related, though there are some others in
  there as well, nothing really stands out otherwise.

  The nouveau Kconfig to select FW_CACHE is in this, which we discussed
  a while back.

  nouveau:
   - rely on fw caching Kconfig fix

  imagination:
   - avoid deadlock on fence release
   - fix fence initialisation
   - fix timestamps firmware traces

  scheduler:
   - fix include guard

  bochs:
   - dpms fix

  i915:
   - bump max stream count to match pipes

  xe:
   - Remove double page flip on initial plane
   - Properly setup userptr pfn_flags_mask
   - Fix GT "for each engine" workarounds
   - Fix userptr races and missed validations
   - Userptr invalid page access fixes
   - Cleanup some style nits

  amdgpu:
   - Fix NULL check in DC code
   - SMU 14 fix

  amdkfd:
   - Fix NULL check in queue validation

  radeon:
   - RS400 HyperZ fix"

* tag 'drm-fixes-2025-03-07' of https://gitlab.freedesktop.org/drm/kernel: (22 commits)
  drm/bochs: Fix DPMS regression
  drm/xe/userptr: Unmap userptrs in the mmu notifier
  drm/xe/hmm: Don't dereference struct page pointers without notifier lock
  drm/xe/hmm: Style- and include fixes
  drm/xe: Add staging tree for VM binds
  drm/xe: Fix fault mode invalidation with unbind
  drm/xe/vm: Fix a misplaced #endif
  drm/xe/vm: Validate userptr during gpu vma prefetching
  drm/amd/pm: always allow ih interrupt from fw
  drm/radeon: Fix rs400_gpu_init for ATI mobility radeon Xpress 200M
  drm/amdkfd: Fix NULL Pointer Dereference in KFD queue
  drm/amd/display: Fix null check for pipe_ctx->plane_state in resource_build_scaling_params
  drm/xe: Fix GT "for each engine" workarounds
  drm/xe/userptr: properly setup pfn_flags_mask
  drm/i915/mst: update max stream count to match number of pipes
  drm/xe: Remove double pageflip
  drm/sched: Fix preprocessor guard
  drm/imagination: Fix timestamps in firmware traces
  drm/imagination: only init job done fences once
  drm/imagination: Hold drm_gem_gpuva lock for unmap
  ...

block: Name the RQF flags enum

Commit 5f89154e8e9e3445f9b59 ("block: Use enum to define RQF_x bit
indexes") converted the RQF flags to an anonymous enum, which was
a beneficial change. This patch goes one step further by naming the enum
as "rqf_flags".

This naming enables exporting these flags to BPF clients, eliminating
the need to duplicate these flags in BPF code. Instead, BPF clients can
now access the same kernel-side values through CO:RE (Compile Once, Run
Everywhere), as shown in this example:

rqf_stats = bpf_core_enum_value(enum rqf_flags, __RQF_STATS)

Suggested-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20250306-rqf_flags-v1-1-bbd64918b406@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'amd-drm-fixes-6.14-2025-03-06' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.14-2025-03-06:

amdgpu:
- Fix NULL check in DC code
- SMU 14 fix

amdkfd:
- Fix NULL check in queue validation

radeon:
- RS400 HyperZ fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306193424.27413-1-alexander.deucher@amd.com

Merge tag 'bcachefs-2025-03-06' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:

- Fix a compatibility issue: we shouldn't be setting incompat feature
   bits unless explicitly requested

- Fix another bug where the journal alloc/resize path could spuriously
   fail with -BCH_ERR_open_buckets_empty

- Copygc shouldn't run on read-only devices: fragmentation isn't an
   issue if we're not currently writing to a given device, and it may
   not have anywhere to move the data to

* tag 'bcachefs-2025-03-06' of git://evilpiepirate.org/bcachefs:
  bcachefs: copygc now skips non-rw devices
  bcachefs: Fix bch2_dev_journal_alloc() spuriously failing
  bcachefs: Don't set BCH_FEATURE_incompat_version_field unless requested

bcachefs: copygc now skips non-rw devices

There's no point in doing copygc on non-rw devices: the fragmentation
doesn't matter if we're not writing to them, and we may not have
anywhere to put the data on our other devices.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch2_dev_journal_alloc() spuriously failing

Previously, we fixed journal resize spuriousl failing with
-BCH_ERR_open_buckets_empty, but initial journal allocation was missed
because it didn't invoke the "block on allocator" loop at all.

Factor out the "loop on allocator" code to fix that.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Merge tag 'drm-xe-fixes-2025-03-06' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

- Remove double page flip on initial plane (Maarten)
- Properly setup userptr pfn_flags_mask (Auld)
- Fix GT "for each engine" workarounds (Tvrtko)
- Fix userptr races and missed validations (Thomas, Brost)
- Userptr invalid page access fixes (Thomas)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Z8ni6w3tskCFL11O@intel.com

Merge tag 'drm-intel-fixes-2025-03-06' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- DP MST fix (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Z8ng8NjmRGiVcb5t@intel.com

Merge tag 'drm-misc-fixes-2025-03-06' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

A Kconfig fix for nouveau, locking and timestamp fixes for imagination,
a header guard fix for sched and a DPMS regression fix for bochs.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306-antelope-of-imminent-anger-bca19e@houat

x86/boot: Sanitize boot params before parsing command line

The 5-level paging code parses the command line to look for the 'no5lvl'
string, and does so very early, before sanitize_boot_params() has been
called and has been given the opportunity to wipe bogus data from the
fields in boot_params that are not covered by struct setup_header, and
are therefore supposed to be initialized to zero by the bootloader.

This triggers an early boot crash when using syslinux-efi to boot a
recent kernel built with CONFIG_X86_5LEVEL=y and CONFIG_EFI_STUB=n, as
the 0xff padding that now fills the unused PE/COFF header is copied into
boot_params by the bootloader, and interpreted as the top half of the
command line pointer.

Fix this by sanitizing the boot_params before use. Note that there is no
harm in calling this more than once; subsequent invocations are able to
spot that the boot_params have already been cleaned up.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org> # v6.1+
Link: https://lore.kernel.org/r/20250306155915.342465-2-ardb+git@google.com
Closes: https://lore.kernel.org/all/202503041549.35913.ulrich.gemkow@ikr.uni-stuttgart.de

Merge tag 'net-6.14-rc6' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth and wireless.

  Current release - new code bugs:

   - wifi: nl80211: disable multi-link reconfiguration

  Previous releases - regressions:

   - gso: fix ownership in __udp_gso_segment

   - wifi: iwlwifi:
      - fix A-MSDU TSO preparation
      - free pages allocated when failing to build A-MSDU

   - ipv6: fix dst ref loop in ila lwtunnel

   - mptcp: fix 'scheduling while atomic' in
     mptcp_pm_nl_append_new_local_addr

   - bluetooth: add check for mgmt_alloc_skb() in
     mgmt_device_connected()

   - ethtool: allow NULL nlattrs when getting a phy_device

   - eth: be2net: fix sleeping while atomic bugs in
     be_ndo_bridge_getlink

  Previous releases - always broken:

   - core: support TCP GSO case for a few missing flags

   - wifi: mac80211:
      - fix vendor-specific inheritance
      - cleanup sta TXQs on flush

   - llc: do not use skb_get() before dev_queue_xmit()

   - eth: ipa: nable checksum for IPA_ENDPOINT_AP_MODEM_{RX,TX}
     for v4.7"

* tag 'net-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (41 commits)
  net: ipv6: fix missing dst ref drop in ila lwtunnel
  net: ipv6: fix dst ref loop in ila lwtunnel
  mctp i3c: handle NULL header address
  net: dsa: mt7530: Fix traffic flooding for MMIO devices
  net-timestamp: support TCP GSO case for a few missing flags
  vlan: enforce underlying device type
  mptcp: fix 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr
  net: ethtool: netlink: Allow NULL nlattrs when getting a phy_device
  ppp: Fix KMSAN uninit-value warning with bpf
  net: ipa: Enable checksum for IPA_ENDPOINT_AP_MODEM_{RX,TX} for v4.7
  net: ipa: Fix QSB data for v4.7
  net: ipa: Fix v4.7 resource group names
  net: hns3: make sure ptp clock is unregister and freed if hclge_ptp_get_cycle returns an error
  wifi: nl80211: disable multi-link reconfiguration
  net: dsa: rtl8366rb: don't prompt users for LED control
  be2net: fix sleeping while atomic bugs in be_ndo_bridge_getlink
  llc: do not use skb_get() before dev_queue_xmit()
  wifi: cfg80211: regulatory: improve invalid hints checking
  caif_virtio: fix wrong pointer check in cfv_probe()
  net: gso: fix ownership in __udp_gso_segment
  ...

Merge tag 'v6.14-rc5-smb3-fixes' of git://git.samba.org/ksmbd

Pull smb fixes from Steve French:
"Five SMB server fixes, two related client fixes, and minor MAINTAINERS
  update:

   - Two SMB3 lock fixes fixes (including use after free and bug on fix)

   - Fix to race condition that can happen in processing IPC responses

   - Four ACL related fixes: one related to endianness of num_aces, and
     two related fixes to the checks for num_aces (for both client and
     server), and one fixing missing check for num_subauths which can
     cause memory corruption

   - And minor update to email addresses in MAINTAINERS file"

* tag 'v6.14-rc5-smb3-fixes' of git://git.samba.org/ksmbd:
  cifs: fix incorrect validation for num_aces field of smb_acl
  ksmbd: fix incorrect validation for num_aces field of smb_acl
  smb: common: change the data type of num_aces to le16
  ksmbd: fix bug on trap in smb2_lock
  ksmbd: fix use-after-free in smb2_lock
  ksmbd: fix type confusion via race condition when using ipc_msg_send_request
  ksmbd: fix out-of-bounds in parse_sec_desc()
  MAINTAINERS: update email address in cifs and ksmbd entry

Merge tag 'exfat-for-6.14-rc6' of git://git./linux/kernel/git/linkinjeon/exfat

Pull exfat fixes from Namjae Jeon:

- Optimize new cluster allocation by correctly find empty entry slot

- Add a check to prevent excessive bitmap clearing due to invalid
   data size of file/dir entry

- Fix incorrect error return for zero-byte writes

* tag 'exfat-for-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: add a check for invalid data size
  exfat: short-circuit zero-byte writes in exfat_file_write_iter
  exfat: fix soft lockup in exfat_clear_bitmap
  exfat: fix just enough dentries but allocate a new cluster to dir