git.kernel.dk Git - linux-block.git/log

kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count

Expose a simple counter to userspace for monitoring tools.

Link: https://lkml.kernel.org/r/20250504180831.4190860-3-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Cc: Core Minyard <cminyard@mvista.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Song Liu <song@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count

Patch series "sysfs: add counters for lockups and stalls", v2.

Commits 9db89b411170 ("exit: Expose "oops_count" to sysfs") and
8b05aa263361 ("panic: Expose "warn_count" to sysfs") added counters for
oopses and warnings to sysfs, and these two patches do the same for
hard/soft lockups and RCU stalls.

All of these counters are useful for monitoring tools to detect whether
the machine is healthy.  If the kernel has experienced a lockup or a
stall, it's probably due to a kernel bug, and I'd like to detect that
quickly and easily.  There is currently no way to detect that, other than
parsing dmesg.  Or observing indirect effects: such as certain tasks not
responding, but then I need to observe all tasks, and it may take a while
until these effects become visible/measurable.  I'd rather be able to
detect the primary cause more quickly, possibly before everything falls
apart.

This patch (of 2):

There is /proc/sys/kernel/hung_task_detect_count, /sys/kernel/warn_count
and /sys/kernel/oops_count but there is no userspace-accessible counter
for hard/soft lockups.  Having this is useful for monitoring tools.

Link: https://lkml.kernel.org/r/20250504180831.4190860-1-max.kellermann@ionos.com
Link: https://lkml.kernel.org/r/20250504180831.4190860-2-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Cc:
Cc: Core Minyard <cminyard@mvista.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

x86/crash: make the page that stores the dm crypt keys inaccessible

This adds an addition layer of protection for the saved copy of dm crypt
key. Trying to access the saved copy will cause page fault.

Link: https://lkml.kernel.org/r/20250502011246.99238-9-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Suggested-by: Pingfan Liu <kernelfans@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

x86/crash: pass dm crypt keys to kdump kernel

1st kernel will build up the kernel command parameter dmcryptkeys as
similar to elfcorehdr to pass the memory address of the stored info of dm
crypt key to kdump kernel.

Link: https://lkml.kernel.org/r/20250502011246.99238-8-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Revert "x86/mm: Remove unused __set_memory_prot()"

This reverts commit 693bbf2a50447353c6a47961e6a7240a823ace02 as kdump LUKS
support (CONFIG_CRASH_DM_CRYPT) depends on __set_memory_prot.

[akpm@linux-foundation.org: x86 set_memory.h needs pgtable_types.h]
Link: https://lkml.kernel.org/r/20250502011246.99238-7-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

crash_dump: retrieve dm crypt keys in kdump kernel

Crash kernel will retrieve the dm crypt keys based on the dmcryptkeys
command line parameter. When user space writes the key description to
/sys/kernel/config/crash_dm_crypt_key/restore, the crash kernel will save
the encryption keys to the user keyring. Then user space e.g.
cryptsetup's --volume-key-keyring API can use it to unlock the encrypted
device.

Link: https://lkml.kernel.org/r/20250502011246.99238-6-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

crash_dump: reuse saved dm crypt keys for CPU/memory hot-plugging

When there are CPU and memory hot un/plugs, the dm crypt keys may need to
be reloaded again depending on the solution for crash hotplug support.
Currently, there are two solutions.  One is to utilizes udev to instruct
user space to reload the kdump kernel image and initrd, elfcorehdr and etc
again.  The other is to only update the elfcorehdr segment introduced in
commit 247262756121 ("crash: add generic infrastructure for crash hotplug
support").

For the 1st solution, the dm crypt keys need to be reloaded again.  The
user space can write true to /sys/kernel/config/crash_dm_crypt_key/reuse
so the stored keys can be re-used.

For the 2nd solution, the dm crypt keys don't need to be reloaded.
Currently, only x86 supports the 2nd solution.  If the 2nd solution gets
extended to all arches, this patch can be dropped.

Link: https://lkml.kernel.org/r/20250502011246.99238-5-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

crash_dump: store dm crypt keys in kdump reserved memory

When the kdump kernel image and initrd are loaded, the dm crypts keys will
be read from keyring and then stored in kdump reserved memory.

Assume a key won't exceed 256 bytes thus MAX_KEY_SIZE=256 according to
"cryptsetup benchmark".

Link: https://lkml.kernel.org/r/20250502011246.99238-4-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

crash_dump: make dm crypt keys persist for the kdump kernel

A configfs /sys/kernel/config/crash_dm_crypt_keys is provided for user
space to make the dm crypt keys persist for the kdump kernel.  Take the
case of dumping to a LUKS-encrypted target as an example, here is the life
cycle of the kdump copies of LUKS volume keys,

1. After the 1st kernel loads the initramfs during boot, systemd uses
    an user-input passphrase to de-crypt the LUKS volume keys or simply
    TPM-sealed volume keys and then save the volume keys to specified
    keyring (using the --link-vk-to-keyring API) and the keys will expire
    within specified time.

2. A user space tool (kdump initramfs loader like kdump-utils) create
    key items inside /sys/kernel/config/crash_dm_crypt_keys to inform
    the 1st kernel which keys are needed.

3. When the kdump initramfs is loaded by the kexec_file_load
    syscall, the 1st kernel will iterate created key items, save the
    keys to kdump reserved memory.

4. When the 1st kernel crashes and the kdump initramfs is booted, the
    kdump initramfs asks the kdump kernel to create a user key using the
    key stored in kdump reserved memory by writing yes to
    /sys/kernel/crash_dm_crypt_keys/restore. Then the LUKS encrypted
    device is unlocked with libcryptsetup's --volume-key-keyring API.

5. The system gets rebooted to the 1st kernel after dumping vmcore to
    the LUKS encrypted device is finished

Eventually the keys have to stay in the kdump reserved memory for the
kdump kernel to unlock encrypted volumes.  During this process, some
measures like letting the keys expire within specified time are desirable
to reduce security risk.

This patch assumes,
1) there are 128 LUKS devices at maximum to be unlocked thus
   MAX_KEY_NUM=128.

2) a key description won't exceed 128 bytes thus KEY_DESC_MAX_LEN=128.

And here is a demo on how to interact with
/sys/kernel/config/crash_dm_crypt_keys,

    # Add key #1
    mkdir /sys/kernel/config/crash_dm_crypt_keys/7d26b7b4-e342-4d2d-b660-7426b0996720
    # Add key #1's description
    echo cryptsetup:7d26b7b4-e342-4d2d-b660-7426b0996720 > /sys/kernel/config/crash_dm_crypt_keys/description

    # how many keys do we have now?
    cat /sys/kernel/config/crash_dm_crypt_keys/count
    1

    # Add key# 2 in the same way

    # how many keys do we have now?
    cat /sys/kernel/config/crash_dm_crypt_keys/count
    2

    # the tree structure of /crash_dm_crypt_keys configfs
    tree /sys/kernel/config/crash_dm_crypt_keys/
    /sys/kernel/config/crash_dm_crypt_keys/
    ├── 7d26b7b4-e342-4d2d-b660-7426b0996720
    │   └── description
    ├── count
    ├── fce2cd38-4d59-4317-8ce2-1fd24d52c46a
    │   └── description

Link: https://lkml.kernel.org/r/20250502011246.99238-3-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kexec_file: allow to place kexec_buf randomly

Patch series "Support kdump with LUKS encryption by reusing LUKS volume
keys", v9.

LUKS is the standard for Linux disk encryption, widely adopted by users,
and in some cases, such as Confidential VMs, it is a requirement.  With
kdump enabled, when the first kernel crashes, the system can boot into the
kdump/crash kernel to dump the memory image (i.e., /proc/vmcore) to a
specified target.  However, there are two challenges when dumping vmcore
to a LUKS-encrypted device:

- Kdump kernel may not be able to decrypt the LUKS partition. For some
   machines, a system administrator may not have a chance to enter the
   password to decrypt the device in kdump initramfs after the 1st kernel
   crashes; For cloud confidential VMs, depending on the policy the
   kdump kernel may not be able to unseal the keys with TPM and the
   console virtual keyboard is untrusted.

- LUKS2 by default use the memory-hard Argon2 key derivation function
   which is quite memory-consuming compared to the limited memory reserved
   for kdump. Take Fedora example, by default, only 256M is reserved for
   systems having memory between 4G-64G. With LUKS enabled, ~1300M needs
   to be reserved for kdump. Note if the memory reserved for kdump can't
   be used by 1st kernel i.e. an user sees ~1300M memory missing in the
   1st kernel.

Besides users (at least for Fedora) usually expect kdump to work out of
the box i.e.  no manual password input or custom crashkernel value is
needed.  And it doesn't make sense to derivate the keys again in kdump
kernel which seems to be redundant work.

This patchset addresses the above issues by making the LUKS volume keys
persistent for kdump kernel with the help of cryptsetup's new APIs
(--link-vk-to-keyring/--volume-key-keyring).  Here is the life cycle of
the kdump copies of LUKS volume keys,

1. After the 1st kernel loads the initramfs during boot, systemd
    use an user-input passphrase to de-crypt the LUKS volume keys
    or TPM-sealed key and then save the volume keys to specified keyring
    (using the --link-vk-to-keyring API) and the key will expire within
    specified time.

2. A user space tool (kdump initramfs loader like kdump-utils) create
    key items inside /sys/kernel/config/crash_dm_crypt_keys to inform
    the 1st kernel which keys are needed.

3. When the kdump initramfs is loaded by the kexec_file_load
    syscall, the 1st kernel will iterate created key items, save the
    keys to kdump reserved memory.

4. When the 1st kernel crashes and the kdump initramfs is booted, the
    kdump initramfs asks the kdump kernel to create a user key using the
    key stored in kdump reserved memory by writing yes to
    /sys/kernel/crash_dm_crypt_keys/restore. Then the LUKS encrypted
    device is unlocked with libcryptsetup's --volume-key-keyring API.

5. The system gets rebooted to the 1st kernel after dumping vmcore to
    the LUKS encrypted device is finished

After libcryptsetup saving the LUKS volume keys to specified keyring,
whoever takes this should be responsible for the safety of these copies of
keys.  The keys will be saved in the memory area exclusively reserved for
kdump where even the 1st kernel has no direct access.  And further more,
two additional protections are added,
- save the copy randomly in kdump reserved memory as suggested by Jan
- clear the _PAGE_PRESENT flag of the page that stores the copy as
   suggested by Pingfan

This patchset only supports x86.  There will be patches to support other
architectures once this patch set gets merged.

This patch (of 9):

Currently, kexec_buf is placed in order which means for the same machine,
the info in the kexec_buf is always located at the same position each time
the machine is booted.  This may cause a risk for sensitive information
like LUKS volume key.  Now struct kexec_buf has a new field random which
indicates it's supposed to be placed in a random position.

Note this feature is enabled only when CONFIG_CRASH_DUMP is enabled.  So
it only takes effect for kdump and won't impact kexec reboot.

Link: https://lkml.kernel.org/r/20250502011246.99238-1-coxu@redhat.com
Link: https://lkml.kernel.org/r/20250502011246.99238-2-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Suggested-by: Jan Pazdziora <jpazdziora@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

list: remove redundant 'extern' for function prototypes

The 'extern' keyword is redundant for function prototypes. list.h never
used them and new code in general is better without them.

Link: https://lkml.kernel.org/r/20250502121742.3997529-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

scripts/gdb: update documentation for lx_per_cpu

Commit db08c53fdd542bb7f83b ("scripts/gdb: fix parameter handling in
$lx_per_cpu") changed the parameter handling of lx_per_cpu to use GdbValue
instead of parsing the variable name. Update the documentation to reflect
the new lx_per_cpu usage. Update the hrtimer_bases example to use rb_tree
instead of the timerqueue_head.next pointer removed in commit
511885d7061eda3eb1fa ("lib/timerqueue: Rely on rbtree semantics for next
timer").

Link: https://lkml.kernel.org/r/20250503123234.2407184-3-illia@yshyn.com
Signed-off-by: Illia Ostapyshyn <illia@yshyn.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Dongliang Mu <dzm91@hust.edu.cn>
Cc: Florian Rommel <mail@florommel.de>
Cc: Hu Haowen <2023002089@link.tyut.edu.cn>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Yanteng Si <si.yanteng@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

scripts/gdb: fix kgdb probing on single-core systems

Patch series "scripts/gdb: Fixes related to lx_per_cpu()".

These patches (1) fix kgdb detection on systems featuring a single CPU and
(2) update the documentation to reflect the current usage of lx_per_cpu()
and update an outdated example of its usage.

This patch (of 2):

When requested the list of threads via qfThreadInfo, gdb_cmd_query in
kernel/debug/gdbstub.c first returns "shadow" threads for CPUs followed by
the actual tasks in the system.  Extended qThreadExtraInfo queries yield
"shadowCPU%d" as the name for the CPU core threads.

This behavior is used by get_gdbserver_type() to probe for KGDB by
matching the name for the thread 2 against "shadowCPU".  This breaks down
on single-core systems, where thread 2 is the first nonshadow thread.
Request the name for thread 1 instead.

As GDB assigns thread IDs in the order of their appearance, it is safe to
assume shadowCPU0 at ID 1 as long as CPU0 is not hotplugged.

Before:

(gdb) info threads
  Id   Target Id                      Frame
  1    Thread 4294967294 (shadowCPU0) kgdb_breakpoint ()
* 2    Thread 1 (swapper/0)           kgdb_breakpoint ()
  3    Thread 2 (kthreadd)            0x0000000000000000 in ?? ()
  ...
(gdb) p $lx_current().comm
Sorry, obtaining the current CPU is not yet supported with this gdb server.

After:

(gdb) info threads
  Id   Target Id                      Frame
  1    Thread 4294967294 (shadowCPU0) kgdb_breakpoint ()
* 2    Thread 1 (swapper/0)           kgdb_breakpoint ()
  3    Thread 2 (kthreadd)            0x0000000000000000 in ?? ()
  ...
(gdb) p $lx_current().comm
$1 = "swapper/0\000\000\000\000\000\000"

Link: https://lkml.kernel.org/r/20250503123234.2407184-1-illia@yshyn.com
Link: https://lkml.kernel.org/r/20250503123234.2407184-2-illia@yshyn.com
Signed-off-by: Illia Ostapyshyn <illia@yshyn.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Dongliang Mu <dzm91@hust.edu.cn>
Cc: Florian Rommel <mail@florommel.de>
Cc: Hu Haowen <2023002089@link.tyut.edu.cn>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Yanteng Si <si.yanteng@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests: fix some typos in tools/testing/selftests

Fix multiple spelling errors:

- "rougly" -> "roughly"
- "fielesystems" -> "filesystems"
- "Can'" -> "Can't"

Link: https://lkml.kernel.org/r/20250503211959.507815-1-chelsyratnawat2001@gmail.com
Signed-off-by: Chelsy Ratnawat <chelsyratnawat2001@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/oid_registry.c: remove unused sprint_OID

sprint_OID() was added as part of 2012's commit 4f73175d0375 ("X.509: Add
utility functions to render OIDs as strings") but it hasn't been used.
Remove it.

Note that there's also 'sprint_oid' (lower case) which is used in a lot of
places; that's left as is except for fixing its case in a comment.

Link: https://lkml.kernel.org/r/20250501010502.326472-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

nilfs2: do not propagate ENOENT error from nilfs_btree_propagate()

In preparation for writing logs, in nilfs_btree_propagate(), which makes
parent and ancestor node blocks dirty starting from a modified data block
or b-tree node block, if the starting block does not belong to the b-tree,
i.e. is isolated, nilfs_btree_do_lookup() called within the function
fails with -ENOENT.

In this case, even though -ENOENT is an internal code, it is propagated to
the log writer via nilfs_bmap_propagate() and may be erroneously returned
to system calls such as fsync().

Fix this issue by changing the error code to -EINVAL in this case, and
having the bmap layer detect metadata corruption and convert the error
code appropriately.

Link: https://lkml.kernel.org/r/20250428173808.6452-3-konishi.ryusuke@gmail.com
Fixes: 1f5abe7e7dbc ("nilfs2: replace BUG_ON and BUG calls triggerable from ioctl")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

nilfs2: add pointer check for nilfs_direct_propagate()

Patch series "nilfs2: improve sanity checks in dirty state propagation".

This fixes one missed check for block mapping anomalies and one improper
return of an error code during a preparation step for log writing, thereby
improving checking for filesystem corruption on writeback.

This patch (of 2):

In nilfs_direct_propagate(), the printer get from nilfs_direct_get_ptr()
need to be checked to ensure it is not an invalid pointer.

If the pointer value obtained by nilfs_direct_get_ptr() is
NILFS_BMAP_INVALID_PTR, means that the metadata (in this case, i_bmap in
the nilfs_inode_info struct) that should point to the data block at the
buffer head of the argument is corrupted and the data block is orphaned,
meaning that the file system has lost consistency.

Add a value check and return -EINVAL when it is an invalid pointer.

Link: https://lkml.kernel.org/r/20250428173808.6452-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20250428173808.6452-2-konishi.ryusuke@gmail.com
Fixes: 36a580eb489f ("nilfs2: direct block mapping")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kexec_file: use SHA-256 library API instead of crypto_shash API

This user of SHA-256 does not support any other algorithm, so the
crypto_shash abstraction provides no value. Just use the SHA-256 library
API instead, which is much simpler and easier to use.

Tested with '/sbin/kexec --kexec-file-syscall'.

Link: https://lkml.kernel.org/r/20250428185721.844686-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

util_macros.h: fix the reference in kernel-doc

In PTR_IF() description the text refers to the parameter as (ptr) while
the kernel-doc format asks for @ptr. Fix this accordingly.

Link: https://lkml.kernel.org/r/20250428072737.3265239-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexandru Ardelean <aardelean@baylibre.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

sort.h: hoist cmp_int() into generic header file

Deduplicate the same functionality implemented in several places by
moving the cmp_int() helper macro into linux/sort.h.

The macro performs a three-way comparison of the arguments mostly useful
in different sorting strategies and algorithms.

Link: https://lkml.kernel.org/r/20250427201451.900730-1-pchelkin@ispras.ru
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Kent Overstreet <kent.overstreet@linux.dev>
Acked-by: Coly Li <colyli@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Carlos Maiolino <cem@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Coly Li <colyli@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ocfs2: remove unnecessary NULL check before unregister_sysctl_table()

unregister_sysctl_table() checks for NULL pointers internally. Remove
unneeded NULL check here.

Link: https://lkml.kernel.org/r/20250422073051.1334310-1-nichen@iscas.ac.cn
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ocfs2: fix possible memory leak in ocfs2_finish_quota_recovery

If ocfs2_finish_quota_recovery() exits due to an error before passing all
rc_list elements to ocfs2_recover_local_quota_file() then it can lead to a
memory leak as rc_list may still contain elements that have to be freed.

Release all memory allocated by ocfs2_add_recovery_chunk() using
ocfs2_free_quota_recovery() instead of kfree().

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Link: https://lkml.kernel.org/r/20250402065628.706359-2-m.masimov@mt-integration.ru
Fixes: 2205363dce74 ("ocfs2: Implement quota recovery")
Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ipc: fix to protect IPCS lookups using RCU

syzbot reported that it discovered a use-after-free vulnerability, [0]

[0]: https://lore.kernel.org/all/67af13f8.050a0220.21dd3.0038.GAE@google.com/

idr_for_each() is protected by rwsem, but this is not enough. If it is
not protected by RCU read-critical region, when idr_for_each() calls
radix_tree_node_free() through call_rcu() to free the radix_tree_node
structure, the node will be freed immediately, and when reading the next
node in radix_tree_for_each_slot(), the already freed memory may be read.

Therefore, we need to add code to make sure that idr_for_each() is
protected within the RCU read-critical region when we call it in
shm_destroy_orphaned().

Link: https://lkml.kernel.org/r/20250424143322.18830-1-aha310510@gmail.com
Fixes: b34a6b1da371 ("ipc: introduce shm_rmid_forced sysctl")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Reported-by: syzbot+a2b84e569d06ca3a949c@syzkaller.appspotmail.com
Cc: Jeongjun Park <aha310510@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

compiler_types.h: fix "unused variable" in __compiletime_assert()

This refines commit c03567a8e8d5 ("include/linux/compiler.h: don't perform
compiletime_assert with -O0") and restores #ifdef __OPTIMIZE__ symmetry by
evaluating the 'condition' variable in both compile-time variants of
__compiletimeassert().

As __OPTIMIZE__ is always true by default, this commit does not change
anything by default.  But it fixes warnings with _non-default_ CFLAGS like
for instance this:

make  CFLAGS_tcp.o='-Og -U__OPTIMIZE__'

                 from net/ipv4/tcp.c:273:

include/net/sch_generic.h: In function `qdisc_cb_private_validate':
include/net/sch_generic.h:511:30:
            error: unused variable `qcb' [-Werror=unused-variable]

  {
     struct qdisc_skb_cb *qcb;

     BUILD_BUG_ON(sizeof(skb->cb) < sizeof(*qcb));
     ...
  }

[akpm@linux-foundation.org: regularize comment layout, reflow comment]
Link: https://lkml.kernel.org/r/20250424194048.652571-1-marc.herbert@linux.intel.com
Signed-off-by: Marc Herbert <Marc.Herbert@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Jan Hendrik Farr <kernel@jfarr.cc>
Cc: Macro Elver <elver@google.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Tony Ambardar <tony.ambardar@gmail.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

maccess: fix strncpy_from_user_nofault() empty string handling

strncpy_from_user_nofault() should return the length of the copied string
including the trailing NUL, but if the argument unsafe_addr points to an
empty string ({'\0'}), the return value is 0.

This happens as strncpy_from_user() copies terminal symbol into dst and
returns 0 (as expected), but strncpy_from_user_nofault does not modify ret
as it is not equal to count and not greater than 0, so 0 is returned,
which contradicts the contract.

Link: https://lkml.kernel.org/r/20250422131449.57177-1-mykyta.yatsenko5@gmail.com
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Andrii Nakryiko <andrii@kernel.org>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

watchdog: fix watchdog may detect false positive of softlockup

When updating `watchdog_thresh`, there is a race condition between writing
the new `watchdog_thresh` value and stopping the old watchdog timer.  If
the old timer triggers during this window, it may falsely detect a
softlockup due to the old interval and the new `watchdog_thresh` value
being used.  The problem can be described as follow:

# We asuume previous watchdog_thresh is 60, so the watchdog timer is
# coming every 24s.
echo 10 > /proc/sys/kernel/watchdog_thresh (User space)
|
+------>+ update watchdog_thresh (We are in kernel now)
|
|   # using old interval and new `watchdog_thresh`
+------>+ watchdog hrtimer (irq context: detect softlockup)
|
|
+-------+
|
|
+ softlockup_stop_all

To fix this problem, introduce a shadow variable for `watchdog_thresh`.
The update to the actual `watchdog_thresh` is delayed until after the old
timer is stopped, preventing false positives.

The following testcase may help to understand this problem.

---------------------------------------------
echo RT_RUNTIME_SHARE > /sys/kernel/debug/sched/features
echo -1 > /proc/sys/kernel/sched_rt_runtime_us
echo 0 > /sys/kernel/debug/sched/fair_server/cpu3/runtime
echo 60 > /proc/sys/kernel/watchdog_thresh
taskset -c 3 chrt -r 99 /bin/bash -c "while true;do true; done" &
echo 10 > /proc/sys/kernel/watchdog_thresh &
---------------------------------------------

The test case above first removes the throttling restrictions for
real-time tasks.  It then sets watchdog_thresh to 60 and executes a
real-time task ,a simple while(1) loop, on cpu3.  Consequently, the final
command gets blocked because the presence of this real-time thread
prevents kworker:3 from being selected by the scheduler.  This eventually
triggers a softlockup detection on cpu3 due to watchdog_timer_fn operating
with inconsistent variable - using both the old interval and the updated
watchdog_thresh simultaneously.

[nysal@linux.ibm.com: fix the SOFTLOCKUP_DETECTOR=n case]
Link: https://lkml.kernel.org/r/20250502111120.282690-1-nysal@linux.ibm.com
Link: https://lkml.kernel.org/r/20250421035021.3507649-1-luogengkun@huaweicloud.com
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Signed-off-by: Nysal Jan K.A. <nysal@linux.ibm.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: "Nysal Jan K.A." <nysal@linux.ibm.com>
Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

treewide: fix typo "previlege"

There are some spelling mistakes of 'previlege' in comments which
should be 'privilege'.

Fix them and add it to scripts/spelling.txt.

The typo in arch/loongarch/kvm/main.c was corrected by a different
patch [1] and is therefore not included in this submission.

[1]. https://lore.kernel.org/all/20250420142208.2252280-1-wheatfox17@icloud.com/

Link: https://lkml.kernel.org/r/46AD404E411A4BAC+20250421074910.66988-1-wangyuli@uniontech.com
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

crash: fix spelling mistake "crahskernel" -> "crashkernel"

There is a spelling mistake in a pr_warn message. Fix it.

Link: https://lkml.kernel.org/r/20250418120331.535086-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/test_kmod: do not hardcode/depend on any filesystem

Right now test_kmod has hardcoded dependencies on btrfs/xfs.  That is not
optimal since you end up needing to select/build them, but it is not
really required since other fs could be selected for the testing.  Also,
we can't change the default/driver module used for testing on
initialization.

Thus make it more generic: introduce two module parameters (start_driver
and start_test_fs), which allow to select which modules/fs to use for the
testing on test_kmod initialization.  Then it's up to the user to select
which modules/fs to use for testing based on his config.  However, keep
test_module as required default.

This way, config/modules becomes selectable as when the testing is done
from selftests (userspace).

While at it, also change trigger_config_run_type, since at module
initialization we already set the defaults at __kmod_config_init and
should not need to do it again in test_kmod_init(), thus we can avoid to
again set test_driver/test_fs.

Link: https://lkml.kernel.org/r/20250418165047.702487-1-herton@redhat.com
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Reviewed-by: Luis Chambelrain <mcgrof@kernel.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

relay: remove unused relay_late_setup_files

The last use of relay_late_setup_files() was removed in 2018 by commit
2b47733045aa ("drm/i915/guc: Merge log relay file and channel creation")

Remove it and the helper it used.

relay_late_setup_files() was used for eventually registering 'buffer only'
channels. With it gone, delete the docs that explain how to do that.
Which suggests it should be possible to lose the 'has_base_filename'
flags.

(Are there any other uses??)

Link: https://lkml.kernel.org/r/20250418234932.490863-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

rapidio: remove unused functions

rio_request_dma() and rio_dma_prep_slave_sg() were added in 2012 by commit
e42d98ebe7d7 ("rapidio: add DMA engine support for RIO data transfers")
but never used.

rio_find_mport() last use was removed in 2013 by commit 9edbc30b434f
("rapidio: update enumerator registration mechanism")

rio_unregister_scan() was added in 2013 by commit a11650e11093 ("rapidio:
make enumeration/discovery configurable") but never used.

Remove them.

Link: https://lkml.kernel.org/r/20250419203012.429787-3-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

rapidio: remove some dead defines

Patch series "rapidio deadcoding".

A couple of rapidio deadcoding patches. The first of these is a repost
and was originally posted almost a year ago
https://lore.kernel.org/all/20240528002515.211366-1-linux@treblig.org/ but
got no answer. Other than being rebased and a typo fixed, it's not
changed.

This patch (of 2):

'mport_dma_buf', 'rio_mport_dma_map' and 'MPORT_MAX_DMA_BUFS' were added
in the original commit e8de370188d0 ("rapidio: add mport char device
driver") but never used.

'rio_cm_work' was unused since the original commit b6e8d4aa1110 ("rapidio:
add RapidIO channelized messaging driver") but never used.

Remove them.

Link: https://lkml.kernel.org/r/20250419203012.429787-1-linux@treblig.org
Link: https://lkml.kernel.org/r/20250419203012.429787-2-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

scatterlist: inline sg_next()

sg_next() is a short function called frequently in I/O paths. Define it
in the header file so it can be inlined into its callers.

Link: https://lkml.kernel.org/r/20250416160615.3571958-1-csander@purestorage.com
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ocfs2: simplify return statement in ocfs2_filecheck_attr_store()

Don't negate 'ret' and simplify the return statement.

No functional changes intended.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

samples: extend hung_task detector test with semaphore support

Extend the existing hung_task detector test module to support multiple
lock types, including mutex and semaphore, with room for future additions
(e.g., spinlock, etc.).  This module creates dummy files under
<debugfs>/hung_task, such as 'mutex' and 'semaphore'.  The read process on
any of these files will sleep for enough long time (256 seconds) while
holding the respective lock.  As a result, the second process will wait on
the lock for a prolonged duration and be detected by the hung_task
detector.

This change unifies the previous mutex-only sample into a single,
extensible hung_task_tests module, reducing code duplication and improving
maintainability.

Usage is:

> cd /sys/kernel/debug/hung_task
> cat mutex & cat mutex          # Test mutex blocking
> cat semaphore & cat semaphore  # Test semaphore blocking

Update the Kconfig description to reflect multiple debugfs files support.

Link: https://lkml.kernel.org/r/20250414145945.84916-4-ioworker0@gmail.com
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Signed-off-by: Zi Li <amaindex@outlook.com>
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Mingzhe Yang <mingzhe.yang@ly.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yongliang Gao <leonylgao@tencent.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

hung_task: show the blocker task if the task is hung on semaphore

Inspired by mutex blocker tracking[1], this patch makes a trade-off to
balance the overhead and utility of the hung task detector.

Unlike mutexes, semaphores lack explicit ownership tracking, making it
challenging to identify the root cause of hangs.  To address this, we
introduce a last_holder field to the semaphore structure, which is updated
when a task successfully calls down() and cleared during up().

The assumption is that if a task is blocked on a semaphore, the holders
must not have released it.  While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.

With this change, the hung task detector can now show blocker task's info
like below:

[Tue Apr  8 12:19:07 2025] INFO: task cat:945 blocked for more than 120 seconds.
[Tue Apr  8 12:19:07 2025]       Tainted: G            E      6.14.0-rc6+ #1
[Tue Apr  8 12:19:07 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Apr  8 12:19:07 2025] task:cat             state:D stack:0     pid:945   tgid:945   ppid:828    task_flags:0x400000 flags:0x00000000
[Tue Apr  8 12:19:07 2025] Call Trace:
[Tue Apr  8 12:19:07 2025]  <TASK>
[Tue Apr  8 12:19:07 2025]  __schedule+0x491/0xbd0
[Tue Apr  8 12:19:07 2025]  schedule+0x27/0xf0
[Tue Apr  8 12:19:07 2025]  schedule_timeout+0xe3/0xf0
[Tue Apr  8 12:19:07 2025]  ? __folio_mod_stat+0x2a/0x80
[Tue Apr  8 12:19:07 2025]  ? set_ptes.constprop.0+0x27/0x90
[Tue Apr  8 12:19:07 2025]  __down_common+0x155/0x280
[Tue Apr  8 12:19:07 2025]  down+0x53/0x70
[Tue Apr  8 12:19:07 2025]  read_dummy_semaphore+0x23/0x60
[Tue Apr  8 12:19:07 2025]  full_proxy_read+0x5f/0xa0
[Tue Apr  8 12:19:07 2025]  vfs_read+0xbc/0x350
[Tue Apr  8 12:19:07 2025]  ? __count_memcg_events+0xa5/0x140
[Tue Apr  8 12:19:07 2025]  ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr  8 12:19:07 2025]  ? handle_mm_fault+0x180/0x260
[Tue Apr  8 12:19:07 2025]  ksys_read+0x66/0xe0
[Tue Apr  8 12:19:07 2025]  do_syscall_64+0x51/0x120
[Tue Apr  8 12:19:07 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr  8 12:19:07 2025] RIP: 0033:0x7f419478f46e
[Tue Apr  8 12:19:07 2025] RSP: 002b:00007fff1c4d2668 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr  8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f419478f46e
[Tue Apr  8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f4194683000 RDI: 0000000000000003
[Tue Apr  8 12:19:07 2025] RBP: 00007f4194683000 R08: 00007f4194682010 R09: 0000000000000000
[Tue Apr  8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr  8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr  8 12:19:07 2025]  </TASK>
[Tue Apr  8 12:19:07 2025] INFO: task cat:945 blocked on a semaphore likely last held by task cat:938
[Tue Apr  8 12:19:07 2025] task:cat             state:S stack:0     pid:938   tgid:938   ppid:584    task_flags:0x400000 flags:0x00000000
[Tue Apr  8 12:19:07 2025] Call Trace:
[Tue Apr  8 12:19:07 2025]  <TASK>
[Tue Apr  8 12:19:07 2025]  __schedule+0x491/0xbd0
[Tue Apr  8 12:19:07 2025]  ? _raw_spin_unlock_irqrestore+0xe/0x40
[Tue Apr  8 12:19:07 2025]  schedule+0x27/0xf0
[Tue Apr  8 12:19:07 2025]  schedule_timeout+0x77/0xf0
[Tue Apr  8 12:19:07 2025]  ? __pfx_process_timeout+0x10/0x10
[Tue Apr  8 12:19:07 2025]  msleep_interruptible+0x49/0x60
[Tue Apr  8 12:19:07 2025]  read_dummy_semaphore+0x2d/0x60
[Tue Apr  8 12:19:07 2025]  full_proxy_read+0x5f/0xa0
[Tue Apr  8 12:19:07 2025]  vfs_read+0xbc/0x350
[Tue Apr  8 12:19:07 2025]  ? __count_memcg_events+0xa5/0x140
[Tue Apr  8 12:19:07 2025]  ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr  8 12:19:07 2025]  ? handle_mm_fault+0x180/0x260
[Tue Apr  8 12:19:07 2025]  ksys_read+0x66/0xe0
[Tue Apr  8 12:19:07 2025]  do_syscall_64+0x51/0x120
[Tue Apr  8 12:19:07 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr  8 12:19:07 2025] RIP: 0033:0x7f7c584a646e
[Tue Apr  8 12:19:07 2025] RSP: 002b:00007ffdba8ce158 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr  8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f7c584a646e
[Tue Apr  8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f7c5839a000 RDI: 0000000000000003
[Tue Apr  8 12:19:07 2025] RBP: 00007f7c5839a000 R08: 00007f7c58399010 R09: 0000000000000000
[Tue Apr  8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr  8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr  8 12:19:07 2025]  </TASK>

[1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com

Link: https://lkml.kernel.org/r/20250414145945.84916-3-ioworker0@gmail.com
Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yongliang Gao <leonylgao@tencent.com>
Cc: Zi Li <amaindex@outlook.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

hung_task: replace blocker_mutex with encoded blocker

Patch series "hung_task: extend blocking task stacktrace dump to
semaphore", v5.

Inspired by mutex blocker tracking[1], this patch series extend the
feature to not only dump the blocker task holding a mutex but also to
support semaphores.  Unlike mutexes, semaphores lack explicit ownership
tracking, making it challenging to identify the root cause of hangs.  To
address this, we introduce a last_holder field to the semaphore structure,
which is updated when a task successfully calls down() and cleared during
up().

The assumption is that if a task is blocked on a semaphore, the holders
must not have released it.  While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.

With this change, the hung task detector can now show blocker task's info
like below:

[Tue Apr  8 12:19:07 2025] INFO: task cat:945 blocked for more than 120 seconds.
[Tue Apr  8 12:19:07 2025]       Tainted: G            E      6.14.0-rc6+ #1
[Tue Apr  8 12:19:07 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Apr  8 12:19:07 2025] task:cat             state:D stack:0     pid:945   tgid:945   ppid:828    task_flags:0x400000 flags:0x00000000
[Tue Apr  8 12:19:07 2025] Call Trace:
[Tue Apr  8 12:19:07 2025]  <TASK>
[Tue Apr  8 12:19:07 2025]  __schedule+0x491/0xbd0
[Tue Apr  8 12:19:07 2025]  schedule+0x27/0xf0
[Tue Apr  8 12:19:07 2025]  schedule_timeout+0xe3/0xf0
[Tue Apr  8 12:19:07 2025]  ? __folio_mod_stat+0x2a/0x80
[Tue Apr  8 12:19:07 2025]  ? set_ptes.constprop.0+0x27/0x90
[Tue Apr  8 12:19:07 2025]  __down_common+0x155/0x280
[Tue Apr  8 12:19:07 2025]  down+0x53/0x70
[Tue Apr  8 12:19:07 2025]  read_dummy_semaphore+0x23/0x60
[Tue Apr  8 12:19:07 2025]  full_proxy_read+0x5f/0xa0
[Tue Apr  8 12:19:07 2025]  vfs_read+0xbc/0x350
[Tue Apr  8 12:19:07 2025]  ? __count_memcg_events+0xa5/0x140
[Tue Apr  8 12:19:07 2025]  ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr  8 12:19:07 2025]  ? handle_mm_fault+0x180/0x260
[Tue Apr  8 12:19:07 2025]  ksys_read+0x66/0xe0
[Tue Apr  8 12:19:07 2025]  do_syscall_64+0x51/0x120
[Tue Apr  8 12:19:07 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr  8 12:19:07 2025] RIP: 0033:0x7f419478f46e
[Tue Apr  8 12:19:07 2025] RSP: 002b:00007fff1c4d2668 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr  8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f419478f46e
[Tue Apr  8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f4194683000 RDI: 0000000000000003
[Tue Apr  8 12:19:07 2025] RBP: 00007f4194683000 R08: 00007f4194682010 R09: 0000000000000000
[Tue Apr  8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr  8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr  8 12:19:07 2025]  </TASK>
[Tue Apr  8 12:19:07 2025] INFO: task cat:945 blocked on a semaphore likely last held by task cat:938
[Tue Apr  8 12:19:07 2025] task:cat             state:S stack:0     pid:938   tgid:938   ppid:584    task_flags:0x400000 flags:0x00000000
[Tue Apr  8 12:19:07 2025] Call Trace:
[Tue Apr  8 12:19:07 2025]  <TASK>
[Tue Apr  8 12:19:07 2025]  __schedule+0x491/0xbd0
[Tue Apr  8 12:19:07 2025]  ? _raw_spin_unlock_irqrestore+0xe/0x40
[Tue Apr  8 12:19:07 2025]  schedule+0x27/0xf0
[Tue Apr  8 12:19:07 2025]  schedule_timeout+0x77/0xf0
[Tue Apr  8 12:19:07 2025]  ? __pfx_process_timeout+0x10/0x10
[Tue Apr  8 12:19:07 2025]  msleep_interruptible+0x49/0x60
[Tue Apr  8 12:19:07 2025]  read_dummy_semaphore+0x2d/0x60
[Tue Apr  8 12:19:07 2025]  full_proxy_read+0x5f/0xa0
[Tue Apr  8 12:19:07 2025]  vfs_read+0xbc/0x350
[Tue Apr  8 12:19:07 2025]  ? __count_memcg_events+0xa5/0x140
[Tue Apr  8 12:19:07 2025]  ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr  8 12:19:07 2025]  ? handle_mm_fault+0x180/0x260
[Tue Apr  8 12:19:07 2025]  ksys_read+0x66/0xe0
[Tue Apr  8 12:19:07 2025]  do_syscall_64+0x51/0x120
[Tue Apr  8 12:19:07 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr  8 12:19:07 2025] RIP: 0033:0x7f7c584a646e
[Tue Apr  8 12:19:07 2025] RSP: 002b:00007ffdba8ce158 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr  8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f7c584a646e
[Tue Apr  8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f7c5839a000 RDI: 0000000000000003
[Tue Apr  8 12:19:07 2025] RBP: 00007f7c5839a000 R08: 00007f7c58399010 R09: 0000000000000000
[Tue Apr  8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr  8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr  8 12:19:07 2025]  </TASK>

This patch (of 3):

This patch replaces 'struct mutex *blocker_mutex' with 'unsigned long
blocker', as only one blocker is active at a time.

The blocker filed can store both the lock addrees and the lock type, with
LSB used to encode the type as Masami suggested, making it easier to
extend the feature to cover other types of locks.

Also, once the lock type is determined, we can directly extract the
address and cast it to a lock pointer ;)

Link: https://lkml.kernel.org/r/20250414145945.84916-1-ioworker0@gmail.com
Link: https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com
Link: https://lkml.kernel.org/r/20250414145945.84916-2-ioworker0@gmail.com
Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yongliang Gao <leonylgao@tencent.com>
Cc: Zi Li <amaindex@outlook.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ocfs2: o2net_idle_timer: Rename del_timer_sync in comment

Commit 8fa7292fee5c ("treewide: Switch/rename to timer_delete[_sync]()")
switched del_timer_sync to timer_delete_sync, but did not modify the
comment for o2net_idle_timer(). Now fix it.

Link: https://lkml.kernel.org/r/BDDB1E4E2876C36C+20250411102610.165946-1-wangyuli@uniontech.com
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Squashfs: check return result of sb_min_blocksize

Syzkaller reports an "UBSAN: shift-out-of-bounds in squashfs_bio_read" bug.

Syzkaller forks multiple processes which after mounting the Squashfs
filesystem, issues an ioctl("/dev/loop0", LOOP_SET_BLOCK_SIZE, 0x8000).
Now if this ioctl occurs at the same time another process is in the
process of mounting a Squashfs filesystem on /dev/loop0, the failure
occurs. When this happens the following code in squashfs_fill_super()
fails.

----
msblk->devblksize = sb_min_blocksize(sb, SQUASHFS_DEVBLK_SIZE);
msblk->devblksize_log2 = ffz(~msblk->devblksize);
----

sb_min_blocksize() returns 0, which means msblk->devblksize is set to 0.

As a result, ffz(~msblk->devblksize) returns 64, and msblk->devblksize_log2
is set to 64.

This subsequently causes the

UBSAN: shift-out-of-bounds in fs/squashfs/block.c:195:36
shift exponent 64 is too large for 64-bit type 'u64' (aka
'unsigned long long')

This commit adds a check for a 0 return by sb_min_blocksize().

Link: https://lkml.kernel.org/r/20250409024747.876480-1-phillip@squashfs.org.uk
Fixes: 0aa666190509 ("Squashfs: super block operations")
Reported-by: syzbot+65761fc25a137b9c8c6e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67f0dd7a.050a0220.0a13.0230.GAE@google.com/
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

exit: combine work under lock in synchronize_group_exit() and coredump_task_exit()

This reduces single-threaded overhead as it avoids one lock+irq trip on
exit.

It also improves scalability of spawning and killing threads within one
process (just shy of 5% when doing it on 24 cores on my test jig).

Both routines are moved below kcov and kmsan exit, which should be
harmless.

Link: https://lkml.kernel.org/r/20250319195436.1864415-1-mjguzik@gmail.com
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

errseq: eliminate special limitation for macro MAX_ERRNO

Current errseq implementation depends on a very special precondition that
macro MAX_ERRNO must be (2^n - 1).

Eliminate the limitation by

- redefining macro ERRSEQ_SHIFT
- defining a new macro ERRNO_MASK instead of MAX_ERRNO for errno mask.

There is no plan to change the value of MAX_ERRNO, but this makes the
implementation more generic and eliminates the BUILD_BUG_ON().

Link: https://lkml.kernel.org/r/20250407-improve_errseq-v1-1-7b27cbeb8298@quicinc.com
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kstrtox: add support for enabled and disabled in kstrtobool()

In some places in the kernel there is a design pattern for sysfs
attributes to use kstrtobool() in store() and str_enabled_disabled() in
show().

This is counterintuitive to interact with because kstrtobool() takes
on/off but str_enabled_disabled() shows enabled/disabled. Some of those
sysfs uses could switch to str_on_off() but for some attributes
enabled/disabled really makes more sense.

Add support for kstrtobool() to accept enabled/disabled.

Link: https://lkml.kernel.org/r/20250321022538.1532445-1-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel.h: move PTR_IF() and u64_to_user_ptr() to util_macros.h

While the natural choice of PTR_IF() is kconfig.h, the latter is too broad
to include C code and actually the macro was moved out from there in the
past. But kernel.h is neither a good choice for that. Move it to
util_macros.h. Do the same for u64_to_user_ptr().

While moving, add necessary documentation.

Link: https://lkml.kernel.org/r/20250324105228.775784-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexandru Ardelean <aardelean@baylibre.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel.h: move READ/WRITE definitions to <linux/types.h>

Patch series "kernel.h: Move out a couple of macros and constants".

kernel.h hosts a couple of macros and constants that may be better placed.
Do that. Also add missing documentation. No functional changes
intended.

This patch (of 2):

Headers shouldn't be forced to include <linux/kernel.h> just to
gain these simple constants.

Link: https://lkml.kernel.org/r/20250324105228.775784-1-andriy.shevchenko@linux.intel.com
Link: https://lkml.kernel.org/r/20250324105228.775784-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexandru Ardelean <aardelean@baylibre.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

powernow: use pr_info_once

This reduces log-msgs during boot from many pages to ~10 occurrences. I
didn't investigate why it wasn't just 1, maybe its a low-level service to
other modules, re-probed by each of them ?

Link: https://lkml.kernel.org/r/20250325235156.663269-4-jim.cromie@gmail.com
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Louis Chauvet <louis.chauvet@bootlin.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

checkpatch: qualify do-while-0 advice

Add a paragraph of advice qualifying the general do-while-0 advice, noting
3 possible misguidings. reduce one ERROR to WARN, for the case I actually
encountered.

And add 'static_assert' to named exceptions, along with some additional
comments about named exceptions vs (detection of) declarative construction
primitives (union, struct, [], etc).

Link: https://lkml.kernel.org/r/20250325235156.663269-3-jim.cromie@gmail.com
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Louis Chauvet <louis.chauvet@bootlin.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

checkpatch: dont warn about unused macro arg on empty body

Patch series "2 checkpatch fixes, one pr_info_once".

2 small tweaks to checkpatch,
1 reducing several pages of powernow "not-relevant-here" log-msgs to a few lines

This patch (of 3):

We currently get:
WARNING: Argument 'name' is not used in function-like macro
on:
#define DRM_CLASSMAP_USE(name) /* nothing here */

Following this advice is wrong here, and shouldn't be fixed by ignoring
args altogether; the macro should properly fail if invoked with 0 or 2+
args.

Link: https://lkml.kernel.org/r/20250325235156.663269-1-jim.cromie@gmail.com
Link: https://lkml.kernel.org/r/20250325235156.663269-2-jim.cromie@gmail.com
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Acked-by: Joe Perches <joe@perches.com>
Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc:"Rafael J. Wysocki" <rafael@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

proc: fix the issue of proc_mem_open returning NULL

proc_mem_open() can return an errno, NULL, or mm_struct*.  If it fails to
acquire mm, it returns NULL, but the caller does not check for the case
when the return value is NULL.

The following conditions lead to failure in acquiring mm:

  - The task is a kernel thread (PF_KTHREAD)
  - The task is exiting (PF_EXITING)

Changes:

  - Add documentation comments for the return value of proc_mem_open().
  - Add checks in the caller to return -ESRCH when proc_mem_open()
    returns NULL.

Link: https://lkml.kernel.org/r/20250404063357.78891-1-superman.xpt@gmail.com
Reported-by: syzbot+f9238a0a31f9b5603fef@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000f52642060d4e3750@google.com
Signed-off-by: Penglei Jiang <superman.xpt@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Adrian Ratiu <adrian.ratiu@collabora.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Felix Moessbauer <felix.moessbauer@siemens.com>
Cc: Jeff layton <jlayton@kernel.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/rbtree.c: fix the example typo

Replace `sr` with `Sr`. The condition `!tmp1 || rb_is_black(tmp1)`
ensures that `tmp1` (which is `sibling->rb_right`) is either NULL or a
black node. Therefore, the right child of the sibling must be black, and
the example should use `Sr` instead of `sr`.

Link: https://lkml.kernel.org/r/20250403112614.570140-1-johnny1001s000602@gmail.com
Signed-off-by: Chisheng Chen <johnny1001s000602@gmail.com>
Cc: Hsin Chang Yu <zxcvb600870024@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

task_stack.h: remove obsolete __HAVE_ARCH_KSTACK_END check

Remove __HAVE_ARCH_KSTACK_END as it has been obsolete since removal of
metag architecture in v4.17.

Link: https://lkml.kernel.org/r/20250403-kstack-end-v1-1-7798e71f34d1@linaro.org
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/20240311164638.2015063-2-pasha.tatashin@soleen.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

crash: export PAGE_UNACCEPTED_MAPCOUNT_VALUE to vmcoreinfo

On Intel TDX guest, unaccepted memory is unusable free memory which is not
managed by buddy, until it's accepted by guest.  Before that, it cannot be
accessed by the first kernel as well as the kexec'ed kernel.  The kexec'ed
kernel will skip these pages and fill in zero data for the reader of
vmcore.

The dump tool like makedumpfile creates a page descriptor (size 24 bytes)
for each non-free page, including zero data page, but it will not create
descriptor for free pages.  If it is not able to distinguish these
unaccepted pages with zero data pages, a certain amount of space will be
wasted in proportion (~1/170).  In fact, as a special kind of free page
the unaccepted pages should be excluded, like the real free pages.

Export the page type PAGE_UNACCEPTED_MAPCOUNT_VALUE to vmcoreinfo, so that
dump tool can identify whether a page is unaccepted.

[zhiquan1.li@intel.com: fix docs: "Title underline too short" warning]
Link: https://lore.kernel.org/all/20240809114854.3745464-5-kirill.shutemov@linux.intel.com/
Link: https://lkml.kernel.org/r/20250405060610.860465-1-zhiquan1.li@intel.com
Link: https://lore.kernel.org/all/20240809114854.3745464-5-kirill.shutemov@linux.intel.com/
Link: https://lkml.kernel.org/r/20250403030801.758687-1-zhiquan1.li@intel.com
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Zhiquan Li <zhiquan1.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

init/main.c: log initcall level when initcall_debug is used

When initcall_debug is specified on the command line, the start and return
point for each initcall is printed. However, no information on the
initcall level is reported.

Add to the initcall_debug infrastructure an additional print that informs
when a new initcall level is entered. This is particularly useful when
debugging dependency chains and/or working on boot time reduction.

Link: https://lkml.kernel.org/r/20250316205014.2830071-2-francesco@valla.it
Signed-off-by: Francesco Valla <francesco@valla.it>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Bird <tim.bird@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

exit: move and extend sched_process_exit() tracepoint

It is useful to be able to access current->mm at task exit to, say, record
a bunch of VMA information right before the task exits (e.g., for stack
symbolization reasons when dealing with short-lived processes that exit in
the middle of profiling session).  Currently, trace_sched_process_exit()
is triggered after exit_mm() which resets current->mm to NULL making this
tracepoint unsuitable for inspecting and recording task's
mm_struct-related data when tracing process lifetimes.

There is a particularly suitable place, though, right after
taskstats_exit() is called, but before we do exit_mm() and other exit_*()
resource teardowns.  taskstats performs a similar kind of accounting that
some applications do with BPF, and so co-locating them seems like a good
fit.  So that's where trace_sched_process_exit() is moved with this patch.

Also, existing trace_sched_process_exit() tracepoint is notoriously
missing `group_dead` flag that is certainly useful in practice and some of
our production applications have to work around this.  So plumb
`group_dead` through while at it, to have a richer and more complete
tracepoint.

Note that we can't use sched_process_template anymore, and so we use
TRACE_EVENT()-based tracepoint definition.  But all the field names and
order, as well as assign and output logic remain intact.  We just add one
extra field at the end in backwards-compatible way.

[andrii@kernel.org: document sched_process_exit and sched_process_template relation]
Link: https://lkml.kernel.org/r/20250403174120.4087794-1-andrii@kernel.org
Link: https://lkml.kernel.org/r/20250402180925.90914-1-andrii@kernel.org
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Linux 6.15-rc6

Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
"ARM:

   - Avoid use of uninitialized memcache pointer in user_mem_abort()

   - Always set HCR_EL2.xMO bits when running in VHE, allowing
     interrupts to be taken while TGE=0 and fixing an ugly bug on
     AmpereOne that occurs when taking an interrupt while clearing the
     xMO bits (AC03_CPU_36)

   - Prevent VMMs from hiding support for AArch64 at any EL virtualized
     by KVM

   - Save/restore the host value for HCRX_EL2 instead of restoring an
     incorrect fixed value

   - Make host_stage2_set_owner_locked() check that the entire requested
     range is memory rather than just the first page

  RISC-V:

   - Add missing reset of smstateen CSRs

  x86:

   - Forcibly leave SMM on SHUTDOWN interception on AMD CPUs to avoid
     causing problems due to KVM stuffing INIT on SHUTDOWN (KVM needs to
     sanitize the VMCB as its state is undefined after SHUTDOWN,
     emulating INIT is the least awful choice).

   - Track the valid sync/dirty fields in kvm_run as a u64 to ensure KVM
     KVM doesn't goof a sanity check in the future.

   - Free obsolete roots when (re)loading the MMU to fix a bug where
     pre-faulting memory can get stuck due to always encountering a
     stale root.

   - When dumping GHCB state, use KVM's snapshot instead of the raw GHCB
     page to print state, so that KVM doesn't print stale/wrong
     information.

   - When changing memory attributes (e.g. shared <=> private), add
     potential hugepage ranges to the mmu_invalidate_range_{start,end}
     set so that KVM doesn't create a shared/private hugepage when the
     the corresponding attributes will become mixed (the attributes are
     commited *after* KVM finishes the invalidation).

   - Rework the SRSO mitigation to enable BP_SPEC_REDUCE only when KVM
     has at least one active VM. Effectively BP_SPEC_REDUCE when KVM is
     loaded led to very measurable performance regressions for non-KVM
     workloads"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1 VM count transitions
  KVM: arm64: Fix memory check in host_stage2_set_owner_locked()
  KVM: arm64: Kill HCRX_HOST_FLAGS
  KVM: arm64: Properly save/restore HCRX_EL2
  KVM: arm64: selftest: Don't try to disable AArch64 support
  KVM: arm64: Prevent userspace from disabling AArch64 support at any virtualisable EL
  KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode
  KVM: arm64: Fix uninitialized memcache pointer in user_mem_abort()
  KVM: x86/mmu: Prevent installing hugepages when mem attributes are changing
  KVM: SVM: Update dump_ghcb() to use the GHCB snapshot fields
  KVM: RISC-V: reset smstateen CSRs
  KVM: x86/mmu: Check and free obsolete roots in kvm_mmu_reload()
  KVM: x86: Check that the high 32bits are clear in kvm_arch_vcpu_ioctl_run()
  KVM: SVM: Forcibly leave SMM mode on SHUTDOWN interception

Merge tag 'mips-fixes_6.15_1' of git://git./linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

- Fix delayed timers

- Fix NULL pointer deref

- Fix wrong range check

* tag 'mips-fixes_6.15_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: Fix MAX_REG_OFFSET
  MIPS: CPS: Fix potential NULL pointer dereferences in cps_prepare_cpus()
  MIPS: rename rollback_handler with skipover_handler
  MIPS: Move r4k_wait() to .cpuidle.text section
  MIPS: Fix idle VS timer enqueue

Merge tag 'x86-urgent-2025-05-11' of git://git./linux/kernel/git/tip/tip

Pull x86 fix from Ingo Molnar:
"Fix a boot regression on very old x86 CPUs without CPUID support"

* tag 'x86-urgent-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Consolidate the loader enablement checking

Merge tag 'timers-urgent-2025-05-11' of git://git./linux/kernel/git/tip/tip

Pull misc timers fixes from Ingo Molnar:

- Fix time keeping bugs in CLOCK_MONOTONIC_COARSE clocks

- Work around absolute relocations into vDSO code that GCC erroneously
   emits in certain arm64 build environments

- Fix a false positive lockdep warning in the i8253 clocksource driver

* tag 'timers-urgent-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/i8253: Use raw_spinlock_irqsave() in clockevent_i8253_disable()
  arm64: vdso: Work around invalid absolute relocations from GCC
  timekeeping: Prevent coarse clocks going backwards

Merge tag 'input-for-v6.15-rc5' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

- Synaptics touchpad on multiple laptops (Dynabook Portege X30L-G,
   Dynabook Portege X30-D, TUXEDO InfinityBook Pro 14 v5, Dell Precision
   M3800, HP Elitebook 850 G1) switched from PS/2 to SMBus mode

- a number of new controllers added to xpad driver: HORI Drum
   controller, PowerA Fusion Pro 4, PowerA MOGA XP-Ultra controller,
   8BitDo Ultimate 2 Wireless Controller, 8BitDo Ultimate 3-mode
   Controller, Hyperkin DuchesS Xbox One controller

- fixes to xpad driver to properly handle Mad Catz JOYTECH NEO SE
   Advanced and PDP Mirror's Edge Official controllers

- fixes to xpad driver to properly handle "Share" button on some
   controllers

- a fix for device initialization timing and for waking up the
   controller in cyttsp5 driver

- a fix for hisi_powerkey driver to properly wake up from s2idle state

- other assorted cleanups and fixes

* tag 'input-for-v6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: xpad - fix xpad_device sorting
  Input: xpad - add support for several more controllers
  Input: xpad - fix Share button on Xbox One controllers
  Input: xpad - fix two controller table values
  Input: hisi_powerkey - enable system-wakeup for s2idle
  Input: synaptics - enable InterTouch on Dell Precision M3800
  Input: synaptics - enable InterTouch on TUXEDO InfinityBook Pro 14 v5
  Input: synaptics - enable InterTouch on Dynabook Portege X30L-G
  Input: synaptics - enable InterTouch on Dynabook Portege X30-D
  Input: synaptics - enable SMBus for HP Elitebook 850 G1
  Input: mtk-pmic-keys - fix possible null pointer dereference
  Input: xpad - add support for 8BitDo Ultimate 2 Wireless Controller
  Input: cyttsp5 - fix power control issue on wakeup
  MAINTAINERS: .mailmap: update Mattijs Korpershoek's email address
  dt-bindings: mediatek,mt6779-keypad: Update Mattijs' email address
  Input: stmpe-ts - use module alias instead of device table
  Input: cyttsp5 - ensure minimum reset pulse width
  Input: sparcspkr - avoid unannotated fall-through
  input/joystick: magellan: Mark __nonstring look-up table

Merge tag 'fixes-2025-05-11' of git://git./linux/kernel/git/rppt/memblock

Pull memblock fixes from Mike Rapoport:

- Mark set_high_memory() as __init to fix section mismatch

- Accept memory allocated in memblock_double_array() to mitigate crash
   of SNP guests

* tag 'fixes-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  memblock: Accept allocated memory before use in memblock_double_array()
  mm,mm_init: Mark set_high_memory as __init

Input: xpad - fix xpad_device sorting

A recent commit put one entry in the wrong place. This just moves it to the
right place.

Signed-off-by: Vicki Pfau <vi@endrift.com>
Link: https://lore.kernel.org/r/20250328234345.989761-5-vi@endrift.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Input: xpad - add support for several more controllers

This adds support for several new controllers, all of which include
Share buttons:

- HORI Drum controller
- PowerA Fusion Pro 4
- 8BitDo Ultimate 3-mode Controller
- Hyperkin DuchesS Xbox One controller
- PowerA MOGA XP-Ultra controller

Signed-off-by: Vicki Pfau <vi@endrift.com>
Link: https://lore.kernel.org/r/20250328234345.989761-4-vi@endrift.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Input: xpad - fix Share button on Xbox One controllers

The Share button, if present, is always one of two offsets from the end of the
file, depending on the presence of a specific interface. As we lack parsing for
the identify packet we can't automatically determine the presence of that
interface, but we can hardcode which of these offsets is correct for a given
controller.

More controllers are probably fixable by adding the MAP_SHARE_BUTTON in the
future, but for now I only added the ones that I have the ability to test
directly.

Signed-off-by: Vicki Pfau <vi@endrift.com>
Link: https://lore.kernel.org/r/20250328234345.989761-2-vi@endrift.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Input: xpad - fix two controller table values

Two controllers -- Mad Catz JOYTECH NEO SE Advanced and PDP Mirror's
Edge Official -- were missing the value of the mapping field, and thus
wouldn't detect properly.

Signed-off-by: Vicki Pfau <vi@endrift.com>
Link: https://lore.kernel.org/r/20250328234345.989761-1-vi@endrift.com
Fixes: 540602a43ae5 ("Input: xpad - add a few new VID/PID combinations")
Fixes: 3492321e2e60 ("Input: xpad - add multiple supported devices")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Input: hisi_powerkey - enable system-wakeup for s2idle

To wake up the system from s2idle when pressing the power-button, let's
convert from using pm_wakeup_event() to pm_wakeup_dev_event(), as it allows
us to specify the "hard" in-parameter, which needs to be set for s2idle.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20250306115021.797426-1-ulf.hansson@linaro.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'mm-hotfixes-stable-2025-05-10-14-23' of git://git./linux/kernel/git/akpm/mm

Pull misc hotfixes from Andrew Morton:
"22 hotfixes. 13 are cc:stable and the remainder address post-6.14
  issues or aren't considered necessary for -stable kernels.

  About half are for MM. Five OCFS2 fixes and a few MAINTAINERS updates"

* tag 'mm-hotfixes-stable-2025-05-10-14-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
  mm: fix folio_pte_batch() on XEN PV
  nilfs2: fix deadlock warnings caused by lock dependency in init_nilfs()
  mm/hugetlb: copy the CMA flag when demoting
  mm, swap: fix false warning for large allocation with !THP_SWAP
  selftests/mm: fix a build failure on powerpc
  selftests/mm: fix build break when compiling pkey_util.c
  mm: vmalloc: support more granular vrealloc() sizing
  tools/testing/selftests: fix guard region test tmpfs assumption
  ocfs2: stop quota recovery before disabling quotas
  ocfs2: implement handshaking with ocfs2 recovery thread
  ocfs2: switch osb->disable_recovery to enum
  mailmap: map Uwe's BayLibre addresses to a single one
  MAINTAINERS: add mm THP section
  mm/userfaultfd: fix uninitialized output field for -EAGAIN race
  selftests/mm: compaction_test: support platform with huge mount of memory
  MAINTAINERS: add core mm section
  ocfs2: fix panic in failed foilio allocation
  mm/huge_memory: fix dereferencing invalid pmd migration entry
  MAINTAINERS: add reverse mapping section
  x86: disable image size check for test builds
  ...

Merge tag 'driver-core-6.15-rc6' of git://git./linux/kernel/git/driver-core/driver-core

Pull driver core fix from Greg KH:
"Here is a single driver core fix for a regression for platform devices
  that is a regression from a change that went into 6.15-rc1 that
  affected Pixel devices. It has been in linux-next for over a week with
  no reported problems"

* tag 'driver-core-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
  platform: Fix race condition during DMA configure at IOMMU probe time

Merge tag 'usb-6.15-rc6' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB driver fixes for 6.15-rc6. Included in here
  are:

   - typec driver fixes

   - usbtmc ioctl fixes

   - xhci driver fixes

   - cdnsp driver fixes

   - some gadget driver fixes

  Nothing really major, just all little stuff that people have reported
  being issues. All of these have been in linux-next this week with no
  reported issues"

* tag 'usb-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  xhci: dbc: Avoid event polling busyloop if pending rx transfers are inactive.
  usb: xhci: Don't trust the EP Context cycle bit when moving HW dequeue
  usb: usbtmc: Fix erroneous generic_read ioctl return
  usb: usbtmc: Fix erroneous wait_srq ioctl return
  usb: usbtmc: Fix erroneous get_stb ioctl error returns
  usb: typec: tcpm: delay SNK_TRY_WAIT_DEBOUNCE to SRC_TRYWAIT transition
  USB: usbtmc: use interruptible sleep in usbtmc_read
  usb: cdnsp: fix L1 resume issue for RTL_REVISION_NEW_LPM version
  usb: typec: ucsi: displayport: Fix NULL pointer access
  usb: typec: ucsi: displayport: Fix deadlock
  usb: misc: onboard_usb_dev: fix support for Cypress HX3 hubs
  usb: uhci-platform: Make the clock really optional
  usb: dwc3: gadget: Make gadget_wakeup asynchronous
  usb: gadget: Use get_status callback to set remote wakeup capability
  usb: gadget: f_ecm: Add get_status callback
  usb: host: tegra: Prevent host controller crash when OTG port is used
  usb: cdnsp: Fix issue with resuming from L1
  usb: gadget: tegra-xudc: ACK ST_RC after clearing CTRL_RUN

Merge tag 'staging-6.15-rc6' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are three small staging driver fixes for 6.15-rc6. These are:

   - bcm2835-camera driver fix

   - two axis-fifo driver fixes

  All of these have been in linux-next for a few weeks with no reported
  issues"

* tag 'staging-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: axis-fifo: Remove hardware resets for user errors
  staging: axis-fifo: Correct handling of tx_fifo_depth for size validation
  staging: bcm2835-camera: Initialise dev in v4l2_dev

Merge tag 'char-misc-6.15-rc6' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc/IIO driver fixes from Greg KH:
"Here are a bunch of small driver fixes (mostly all IIO) for 6.15-rc6.
  Included in here are:

   - loads of tiny IIO driver fixes for reported issues

   - hyperv driver fix for a much-reported and worked on sysfs ring
     buffer creation bug

  All of these have been in linux-next for over a week (the IIO ones for
  many weeks now), with no reported issues"

* tag 'char-misc-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (30 commits)
  Drivers: hv: Make the sysfs node size for the ring buffer dynamic
  uio_hv_generic: Fix sysfs creation path for ring buffer
  iio: adis16201: Correct inclinometer channel resolution
  iio: adc: ad7606: fix serial register access
  iio: pressure: mprls0025pa: use aligned_s64 for timestamp
  iio: imu: adis16550: align buffers for timestamp
  staging: iio: adc: ad7816: Correct conditional logic for store mode
  iio: adc: ad7266: Fix potential timestamp alignment issue.
  iio: adc: ad7768-1: Fix insufficient alignment of timestamp.
  iio: adc: dln2: Use aligned_s64 for timestamp
  iio: accel: adxl355: Make timestamp 64-bit aligned using aligned_s64
  iio: temp: maxim-thermocouple: Fix potential lack of DMA safe buffer.
  iio: chemical: pms7003: use aligned_s64 for timestamp
  iio: chemical: sps30: use aligned_s64 for timestamp
  iio: imu: inv_mpu6050: align buffer for timestamp
  iio: imu: st_lsm6dsx: Fix wakeup source leaks on device unbind
  iio: adc: qcom-spmi-iadc: Fix wakeup source leaks on device unbind
  iio: accel: fxls8962af: Fix wakeup source leaks on device unbind
  iio: adc: ad7380: fix event threshold shift
  iio: hid-sensor-prox: Fix incorrect OFFSET calculation
  ...

Merge tag 'i2c-for-6.15-rc6' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:

- omap: use correct function to read from device tree

- MAINTAINERS: remove Seth from ISMT maintainership

* tag 'i2c-for-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
MAINTAINERS: Remove entry for Seth Heasley
i2c: omap: fix deprecated of_property_read_bool() use

Merge tag 'for-linus-6.15a-rc6-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

- A fix for the xenbus driver allowing to use a PVH Dom0 with
   Xenstore running in another domain

- A fix for the xenbus driver addressing a rare race condition
   resulting in NULL dereferences and other problems

- A fix for the xen-swiotlb driver fixing a problem seen on Arm
   platforms

* tag 'for-linus-6.15a-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xenbus: Use kref to track req lifetime
  xenbus: Allow PVH dom0 a non-local xenstore
  xen: swiotlb: Use swiotlb bouncing if kmalloc allocation demands it

Merge tag 'pull-fixes' of git://git./linux/kernel/git/viro/vfs

Pull mount fixes from Al Viro:
"A couple of races around legalize_mnt vs umount (both fairly old and
  hard to hit) plus two bugs in move_mount(2) - both around 'move
  detached subtree in place' logics"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix IS_MNT_PROPAGATING uses
  do_move_mount(): don't leak MNTNS_PROPAGATING on failures
  do_umount(): add missing barrier before refcount checks in sync case
  __legitimize_mnt(): check for MNT_SYNC_UMOUNT should be under mount_lock

Merge tag 'kvm-x86-fixes-6.15-rcN' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fixes for 6.15-rcN

- Forcibly leave SMM on SHUTDOWN interception on AMD CPUs to avoid causing
   problems due to KVM stuffing INIT on SHUTDOWN (KVM needs to sanitize the
   VMCB as its state is undefined after SHUTDOWN, emulating INIT is the
   least awful choice).

- Track the valid sync/dirty fields in kvm_run as a u64 to ensure KVM
   KVM doesn't goof a sanity check in the future.

- Free obsolete roots when (re)loading the MMU to fix a bug where
   pre-faulting memory can get stuck due to always encountering a stale
   root.

- When dumping GHCB state, use KVM's snapshot instead of the raw GHCB page
   to print state, so that KVM doesn't print stale/wrong information.

- When changing memory attributes (e.g. shared <=> private), add potential
   hugepage ranges to the mmu_invalidate_range_{start,end} set so that KVM
   doesn't create a shared/private hugepage when the the corresponding
   attributes will become mixed (the attributes are commited *after* KVM
   finishes the invalidation).

- Rework the SRSO mitigation to enable BP_SPEC_REDUCE only when KVM has at
   least one active VM.  Effectively BP_SPEC_REDUCE when KVM is loaded led
   to very measurable performance regressions for non-KVM workloads.

Merge tag 'kvmarm-fixes-6.15-3' of https://git./linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.15, round #3

- Avoid use of uninitialized memcache pointer in user_mem_abort()

- Always set HCR_EL2.xMO bits when running in VHE, allowing interrupts
   to be taken while TGE=0 and fixing an ugly bug on AmpereOne that
   occurs when taking an interrupt while clearing the xMO bits
   (AC03_CPU_36)

- Prevent VMMs from hiding support for AArch64 at any EL virtualized by
   KVM

- Save/restore the host value for HCRX_EL2 instead of restoring an
   incorrect fixed value

- Make host_stage2_set_owner_locked() check that the entire requested
   range is memory rather than just the first page

Merge tag 'kvm-riscv-fixes-6.15-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv fixes for 6.15, take #1

- Add missing reset of smstateen CSRs

Merge tag 'i2c-host-fixes-6.15-rc6' of git://git./linux/kernel/git/andi.shyti/linux into i2c/for-current

i2c-host-fixes for v6.15-rc6

- omap: use correct function to read from device tree
- MAINTAINERS: remove Seth from ISMT maintainership

Merge tag '6.15-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Fix dentry leak which can cause umount crash

- Add warning for parse contexts error on compounded operation

* tag '6.15-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: Avoid race in open_cached_dir with lease breaks
smb3 client: warn when parse contexts returns error on compounded operation

fix IS_MNT_PROPAGATING uses

propagate_mnt() does not attach anything to mounts created during
propagate_mnt() itself.  What's more, anything on ->mnt_slave_list
of such new mount must also be new, so we don't need to even look
there.

When move_mount() had been introduced, we've got an additional
class of mounts to skip - if we are moving from anon namespace,
we do not want to propagate to mounts we are moving (i.e. all
mounts in that anon namespace).

Unfortunately, the part about "everything on their ->mnt_slave_list
will also be ignorable" is not true - if we have propagation graph
A -> B -> C
and do OPEN_TREE_CLONE open_tree() of B, we get
A -> [B <-> B'] -> C
as propagation graph, where B' is a clone of B in our detached tree.
Making B private will result in
A -> B' -> C
C still gets propagation from A, as it would after making B private
if we hadn't done that open_tree(), but now the propagation goes
through B'.  Trying to move_mount() our detached tree on subdirectory
in A should have
* moved B' on that subdirectory in A
* skipped the corresponding subdirectory in B' itself
* copied B' on the corresponding subdirectory in C.
As it is, the logics in propagation_next() and friends ends up
skipping propagation into C, since it doesn't consider anything
downstream of B'.

IOW, walking the propagation graph should only skip the ->mnt_slave_list
of new mounts; the only places where the check for "in that one
anon namespace" are applicable are propagate_one() (where we should
treat that as the same kind of thing as "mountpoint we are looking
at is not visible in the mount we are looking at") and
propagation_would_overmount().  The latter is better dealt with
in the caller (can_move_mount_beneath()); on the first call of
propagation_would_overmount() the test is always false, on the
second it is always true in "move from anon namespace" case and
always false in "move within our namespace" one, so it's easier
to just use check_mnt() before bothering with the second call and
be done with that.

Fixes: 064fe6e233e8 ("mount: handle mount propagation for detached mount trees")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

do_move_mount(): don't leak MNTNS_PROPAGATING on failures

as it is, a failed move_mount(2) from anon namespace breaks
all further propagation into that namespace, including normal
mounts in non-anon namespaces that would otherwise propagate
there.

Fixes: 064fe6e233e8 ("mount: handle mount propagation for detached mount trees")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

do_umount(): add missing barrier before refcount checks in sync case

do_umount() analogue of the race fixed in 119e1ef80ecf "fix
__legitimize_mnt()/mntput() race".  Here we want to make sure that
if __legitimize_mnt() doesn't notice our lock_mount_hash(), we will
notice their refcount increment.  Harder to hit than mntput_no_expire()
one, fortunately, and consequences are milder (sync umount acting
like umount -l on a rare race with RCU pathwalk hitting at just the
wrong time instead of use-after-free galore mntput_no_expire()
counterpart used to be hit).  Still a bug...

Fixes: 48a066e72d97 ("RCU'd vfsmounts")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

__legitimize_mnt(): check for MNT_SYNC_UMOUNT should be under mount_lock

... or we risk stealing final mntput from sync umount - raising mnt_count
after umount(2) has verified that victim is not busy, but before it
has set MNT_SYNC_UMOUNT; in that case __legitimize_mnt() doesn't see
that it's safe to quietly undo mnt_count increment and leaves dropping
the reference to caller, where it'll be a full-blown mntput().

Check under mount_lock is needed; leaving the current one done before
taking that makes no sense - it's nowhere near common enough to bother
with.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge tag 'rust-fixes-6.15-2' of git://git./linux/kernel/git/ojeda/linux

Pull rust fixes from Miguel Ojeda:

- Make CFI_AUTO_DEFAULT depend on !RUST or Rust >= 1.88.0

- Clean Rust (and Clippy) lints for the upcoming Rust 1.87.0 and 1.88.0
   releases

- Clean objtool warning for the upcoming Rust 1.87.0 release by adding
   one more noreturn function

* tag 'rust-fixes-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
  x86/Kconfig: make CFI_AUTO_DEFAULT depend on !RUST or Rust >= 1.88
  rust: clean Rust 1.88.0's `clippy::uninlined_format_args` lint
  rust: clean Rust 1.88.0's warning about `clippy::disallowed_macros` configuration
  rust: clean Rust 1.88.0's `unnecessary_transmutes` lint
  rust: allow Rust 1.87.0's `clippy::ptr_eq` lint
  objtool/rust: add one more `noreturn` Rust function for Rust 1.87.0

Merge tag 'drm-fixes-2025-05-10' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Weekly drm fixes, bit bigger than last week, but overall amdgpu/xe
  with some ivpu bits and a random few fixes, and dropping the
  ttm_backup struct which wrapped struct file and was recently
  frowned at.

  drm:
   - Fix overflow when generating wedged event

  ttm:
   - Fix documentation
   - Remove struct ttm_backup

  panel:
   - simple: Fix timings for AUO G101EVN010

  amdgpu:
   - DC FP fixes
   - Freesync fix
   - DMUB AUX fixes
   - VCN fix
   - Hibernation fixes
   - HDP fixes

  xe:
   - Prevent PF queue overflow
   - Hold all forcewake during mocs test
   - Remove GSC flush on reset path
   - Fix forcewake put on error path
   - Fix runtime warning when building without svm

  i915:
   - Fix oops on resume after disconnecting DP MST sinks during suspend
   - Fix SPLC num_waiters refcounting

  ivpu:
   - Increase timeouts
   - Fix deadlock in cmdq ioctl
   - Unlock mutices in correct order

  v3d:
   - Avoid memory leak in job handling"

* tag 'drm-fixes-2025-05-10' of https://gitlab.freedesktop.org/drm/kernel: (32 commits)
  drm/i915/dp: Fix determining SST/MST mode during MTP TU state computation
  drm/xe: Add config control for svm flush work
  drm/xe: Release force wake first then runtime power
  drm/xe/gsc: do not flush the GSC worker from the reset path
  drm/xe/tests/mocs: Hold XE_FORCEWAKE_ALL for LNCF regs
  drm/xe: Add page queue multiplier
  drm/amdgpu/hdp7: use memcfg register to post the write for HDP flush
  drm/amdgpu/hdp6: use memcfg register to post the write for HDP flush
  drm/amdgpu/hdp5.2: use memcfg register to post the write for HDP flush
  drm/amdgpu/hdp5: use memcfg register to post the write for HDP flush
  drm/amdgpu/hdp4: use memcfg register to post the write for HDP flush
  drm/amdgpu: fix pm notifier handling
  Revert "drm/amd: Stop evicting resources on APUs in suspend"
  drm/amdgpu/vcn: using separate VCN1_AON_SOC offset
  drm/amd/display: Fix wrong handling for AUX_DEFER case
  drm/amd/display: Copy AUX read reply data whenever length > 0
  drm/amd/display: Remove incorrect checking in dmub aux handler
  drm/amd/display: Fix the checking condition in dmub aux handling
  drm/amd/display: Shift DMUB AUX reply command if necessary
  drm/amd/display: Call FP Protect Before Mode Programming/Mode Support
  ...

Merge tag 'drm-intel-fixes-2025-05-09' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

drm/i915 fixes for v6.15-rc6:
- Fix oops on resume after disconnecting DP MST sinks during suspend
- Fix SPLC num_waiters refcounting

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/87tt5umeaw.fsf@intel.com

Merge tag 'drm-xe-fixes-2025-05-09' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Prevent PF queue overflow
- Hold all forcewake during mocs test
- Remove GSC flush on reset path
- Fix forcewake put on error path
- Fix runtime warning when building without svm

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/jffqa56f2zp4i5ztz677cdspgxhnw7qfop3dd3l2epykfpfvza@q2nw6wapsphz

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
"Move the arm64_use_ng_mappings variable from the .bss to the .data
  section as it is accessed very early during boot with the MMU off and
  before the .bss has been initialised.

  This could lead to incorrect idmap page table"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: cpufeature: Move arm64_use_ng_mappings to the .data section to prevent wrong idmap generation

Merge tag 'riscv-for-linus-6.15-rc6' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

- The compressed half-word misaligned access instructions (c.lhu, c.lh,
   and c.sh) from the Zcb extension are now properly emulated

- A series of fixes to properly emulate permissions while handling
   userspace misaligned accesses

- A pair of fixes for PR_GET_TAGGED_ADDR_CTRL to avoid accessing the
   envcfg CSR on systems that don't support that CSR, and to report
   those failures up to userspace

- The .rela.dyn section is no longer stripped from vmlinux, as it is
   necessary to relocate the kernel under some conditions (including
   kexec)

* tag 'riscv-for-linus-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Disallow PR_GET_TAGGED_ADDR_CTRL without Supm
  scripts: Do not strip .rela.dyn section
  riscv: Fix kernel crash due to PR_SET_TAGGED_ADDR_CTRL
  riscv: misaligned: use get_user() instead of __get_user()
  riscv: misaligned: enable IRQs while handling misaligned accesses
  riscv: misaligned: factorize trap handling
  riscv: misaligned: Add handling for ZCB instructions

Merge tag 'block-6.15-20250509' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

- Fix for a regression in this series for loop and read/write iterator
   handling

- zone append block update tweak

- remove a broken IO priority test

- NVMe pull request via Christoph:
      - unblock ctrl state transition for firmware update (Daniel
        Wagner)

* tag 'block-6.15-20250509' of git://git.kernel.dk/linux:
  block: remove test of incorrect io priority level
  nvme: unblock ctrl state transition for firmware update
  block: only update request sector if needed
  loop: Add sanity check for read/write_iter

Merge tag 'io_uring-6.15-20250509' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

- Fix for linked timeouts arming and firing wrt prep and issue of the
   request being managed by the linked timeout

- Fix for a CQE ordering issue between requests with multishot and
   using the same buffer group. This is a dumbed down version for this
   release and for stable, it'll get improved for v6.16

- Tweak the SQPOLL submit batch size. A previous commit made SQPOLL
   manage its own task_work and chose a tiny batch size, bump it from 8
   to 32 to fix a performance regression due to that

* tag 'io_uring-6.15-20250509' of git://git.kernel.dk/linux:
  io_uring/sqpoll: Increase task_work submission batch size
  io_uring: ensure deferred completions are flushed for multishot
  io_uring: always arm linked timeouts prior to issue

Merge tag 'modules-6.15-rc6' of git://git./linux/kernel/git/modules/linux

Pull modules fix from Petr Pavlu:
"A single fix to prevent use of an uninitialized completion pointer
  when releasing a module_kobject in specific situations.

  This addresses a latent bug exposed by commit f95bbfe18512 ("drivers:
  base: handle module_kobject creation")"

* tag 'modules-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
  module: ensure that kobject_put() is safe for module type kobjects

x86/mm: Eliminate window where TLB flushes may be inadvertently skipped

tl;dr: There is a window in the mm switching code where the new CR3 is
set and the CPU should be getting TLB flushes for the new mm.  But
should_flush_tlb() has a bug and suppresses the flush.  Fix it by
widening the window where should_flush_tlb() sends an IPI.

Long Version:

=== History ===

There were a few things leading up to this.

First, updating mm_cpumask() was observed to be too expensive, so it was
made lazier.  But being lazy caused too many unnecessary IPIs to CPUs
due to the now-lazy mm_cpumask().  So code was added to cull
mm_cpumask() periodically[2].  But that culling was a bit too aggressive
and skipped sending TLB flushes to CPUs that need them.  So here we are
again.

=== Problem ===

The too-aggressive code in should_flush_tlb() strikes in this window:

// Turn on IPIs for this CPU/mm combination, but only
// if should_flush_tlb() agrees:
cpumask_set_cpu(cpu, mm_cpumask(next));

next_tlb_gen = atomic64_read(&next->context.tlb_gen);
choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
load_new_mm_cr3(need_flush);
// ^ After 'need_flush' is set to false, IPIs *MUST*
// be sent to this CPU and not be ignored.

        this_cpu_write(cpu_tlbstate.loaded_mm, next);
// ^ Not until this point does should_flush_tlb()
// become true!

should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3()
and writing to 'loaded_mm', which is a window where they should not be
suppressed.  Whoops.

=== Solution ===

Thankfully, the fuzzy "just about to write CR3" window is already marked
with loaded_mm==LOADED_MM_SWITCHING.  Simply checking for that state in
should_flush_tlb() is sufficient to ensure that the CPU is targeted with
an IPI.

This will cause more TLB flush IPIs.  But the window is relatively small
and I do not expect this to cause any kind of measurable performance
impact.

Update the comment where LOADED_MM_SWITCHING is written since it grew
yet another user.

Peter Z also raised a concern that should_flush_tlb() might not observe
'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off()
writes them.  Add a barrier to ensure that they are observed in the
order they are written.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/
Fixes: 6db2526c1d69 ("x86/mm/tlb: Only trim the mm_cpumask once a second") [2]
Reported-by: Stephen Dolan <sdolan@janestreet.com>
Cc: stable@vger.kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

io_uring/sqpoll: Increase task_work submission batch size

Our QA team reported a 10%-23%, throughput reduction on an io_uring
sqpoll testcase doing IO to a null_blk, that I traced back to a
reduction of the device submission queue depth utilization. It turns out
that, after commit af5d68f8892f ("io_uring/sqpoll: manage task_work
privately"), we capped the number of task_work entries that can be
completed from a single spin of sqpoll to only 8 entries, before the
sqpoll goes around to (potentially) sleep.  While this cap doesn't drive
the submission side directly, it impacts the completion behavior, which
affects the number of IO queued by fio per sqpoll cycle on the
submission side, and io_uring ends up seeing less ios per sqpoll cycle.
As a result, block layer plugging is less effective, and we see more
time spent inside the block layer in profilings charts, and increased
submission latency measured by fio.

There are other places that have increased overhead once sqpoll sleeps
more often, such as the sqpoll utilization calculation.  But, in this
microbenchmark, those were not representative enough in perf charts, and
their removal didn't yield measurable changes in throughput.  The major
overhead comes from the fact we plug less, and less often, when submitting
to the block layer.

My benchmark is:

fio --ioengine=io_uring --direct=1 --iodepth=128 --runtime=300 --bs=4k \
    --invalidate=1 --time_based  --ramp_time=10 --group_reporting=1 \
    --filename=/dev/nullb0 --name=RandomReads-direct-nullb-sqpoll-4k-1 \
    --rw=randread --numjobs=1 --sqthread_poll

In one machine, tested on top of Linux 6.15-rc1, we have the following
baseline:
  READ: bw=4994MiB/s (5236MB/s), 4994MiB/s-4994MiB/s (5236MB/s-5236MB/s), io=439GiB (471GB), run=90001-90001msec

With this patch:
  READ: bw=5762MiB/s (6042MB/s), 5762MiB/s-5762MiB/s (6042MB/s-6042MB/s), io=506GiB (544GB), run=90001-90001msec

which is a 15% improvement in measured bandwidth.  The average
submission latency is noticeably lowered too.  As measured by
fio:

Baseline:
   lat (usec): min=20, max=241, avg=99.81, stdev=3.38
Patched:
   lat (usec): min=26, max=226, avg=86.48, stdev=4.82

If we look at blktrace, we can also see the plugging behavior is
improved. In the baseline, we end up limited to plugging 8 requests in
the block layer regardless of the device queue depth size, while after
patching we can drive more io, and we manage to utilize the full device
queue.

In the baseline, after a stabilization phase, an ordinary submission
looks like:
  254,0    1    49942     0.016028795  5977  U   N [iou-sqp-5976] 7

After patching, I see consistently more requests per unplug.
  254,0    1     4996     0.001432872  3145  U   N [iou-sqp-3144] 32

Ideally, the cap size would at least be the deep enough to fill the
device queue, but we can't predict that behavior, or assume all IO goes
to a single device, and thus can't guess the ideal batch size.  We also
don't want to let the tw run unbounded, though I'm not sure it would
really be a problem.  Instead, let's just give it a more sensible value
that will allow for more efficient batching.  I've tested with different
cap values, and initially proposed to increase the cap to 1024.  Jens
argued it is too big of a bump and I observed that, with 32, I'm no
longer able to observe this bottleneck in any of my machines.

Fixes: af5d68f8892f ("io_uring/sqpoll: manage task_work privately")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20250508181203.3785544-1-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

drm/i915/dp: Fix determining SST/MST mode during MTP TU state computation

Determining the SST/MST mode during state computation must be done based
on the output type stored in the CRTC state, which in turn is set once
based on the modeset connector's SST vs. MST type and will not change as
long as the connector is using the CRTC. OTOH the MST mode indicated by
the given connector's intel_dp::is_mst flag can change independently of
the above output type, based on what sink is at any moment plugged to
the connector.

Fix the state computation accordingly.

Cc: Jani Nikula <jani.nikula@intel.com>
Fixes: f6971d7427c2 ("drm/i915/mst: adapt intel_dp_mtp_tu_compute_config() for 128b/132b SST")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4607
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://lore.kernel.org/r/20250507151953.251846-1-imre.deak@intel.com
(cherry picked from commit 0f45696ddb2b901fbf15cb8d2e89767be481d59f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

memblock: Accept allocated memory before use in memblock_double_array()

When increasing the array size in memblock_double_array() and the slab
is not yet available, a call to memblock_find_in_range() is used to
reserve/allocate memory. However, the range returned may not have been
accepted, which can result in a crash when booting an SNP guest:

  RIP: 0010:memcpy_orig+0x68/0x130
  Code: ...
  RSP: 0000:ffffffff9cc03ce8 EFLAGS: 00010006
  RAX: ff11001ff83e5000 RBX: 0000000000000000 RCX: fffffffffffff000
  RDX: 0000000000000bc0 RSI: ffffffff9dba8860 RDI: ff11001ff83e5c00
  RBP: 0000000000002000 R08: 0000000000000000 R09: 0000000000002000
  R10: 000000207fffe000 R11: 0000040000000000 R12: ffffffff9d06ef78
  R13: ff11001ff83e5000 R14: ffffffff9dba7c60 R15: 0000000000000c00
  memblock_double_array+0xff/0x310
  memblock_add_range+0x1fb/0x2f0
  memblock_reserve+0x4f/0xa0
  memblock_alloc_range_nid+0xac/0x130
  memblock_alloc_internal+0x53/0xc0
  memblock_alloc_try_nid+0x3d/0xa0
  swiotlb_init_remap+0x149/0x2f0
  mem_init+0xb/0xb0
  mm_core_init+0x8f/0x350
  start_kernel+0x17e/0x5d0
  x86_64_start_reservations+0x14/0x30
  x86_64_start_kernel+0x92/0xa0
  secondary_startup_64_no_verify+0x194/0x19b

Mitigate this by calling accept_memory() on the memory range returned
before the slab is available.

Prior to v6.12, the accept_memory() interface used a 'start' and 'end'
parameter instead of 'start' and 'size', therefore the accept_memory()
call must be adjusted to specify 'start + size' for 'end' when applying
to kernels prior to v6.12.

Cc: stable@vger.kernel.org # see patch description, needs adjustments for <= 6.11
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/da1ac73bf4ded761e21b4e4bb5178382a580cd73.1746725050.git.thomas.lendacky@amd.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

Merge tag 'amd-drm-fixes-6.15-2025-05-08' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.15-2025-05-08:

amdgpu:
- DC FP fixes
- Freesync fix
- DMUB AUX fixes
- VCN fix
- Hibernation fixes
- HDP fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250508194102.3242372-1-alexander.deucher@amd.com

Merge tag 'drm-misc-fixes-2025-05-08' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

drm:
- Fix overflow when generating wedged event

ivpu:
- Increate timeouts
- Fix deadlock in cmdq ioctl
- Unlock mutices in correct order

panel:
- simple: Fix timings for AUO G101EVN010

ttm:
- Fix documentation
- Remove struct ttm_backup

v3d:
- Avoid memory leak in job handling

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250508104939.GA76697@2a02-2454-fd5e-fd00-c110-cbf2-6528-c5be.dyn6.pyur.net

Merge tag 'bcachefs-2025-05-08' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:

- Some fixes to help with filesystem analysis: ensure superblock
   error count gets written if we go ERO, don't discard the journal
   aggressively (so it's available for list_journal -a)

- Fix lost wakeup on arm causing us to get stuck when reading btree
   nodes

- Fix fsck failing to exit on ctrl-c

- An additional fix for filesystems with misaligned bucket sizes: we
   now ensure that allocations are properly aligned

- Setting background target but not promote target will now leave that
   data cached on the foreground target, as it used to

- Revert a change to when we allocate the VFS superblock, this was done
   for implementing blk_holder_ops but ended up not being needed, and
   allocating a superblock and not setting SB_BORN while we do recovery
   caused sync() calls and other things to hang

- Assorted fixes for harmless error messages that caused concern to
   users

* tag 'bcachefs-2025-05-08' of git://evilpiepirate.org/bcachefs:
  bcachefs: Don't aggressively discard the journal
  bcachefs: Ensure superblock gets written when we go ERO
  bcachefs: Filter out harmless EROFS error messages
  bcachefs: journal_shutdown is EROFS, not EIO
  bcachefs: Call bch2_fs_start before getting vfs superblock
  bcachefs: fix hung task timeout in journal read
  bcachefs: Add missing barriers before wake_up_bit()
  bcachefs: Ensure proper write alignment
  bcachefs: Improve want_cached_ptr()
  bcachefs: thread_with_stdio: fix spinning instead of exiting

drm/xe: Add config control for svm flush work

Without CONFIG_DRM_XE_GPUSVM set, GPU SVM is not initialized thus below
warning pops. Refine the flush work code to be controlled by the config
to avoid below warning:
"
[  453.132028] ------------[ cut here ]------------
[  453.132527] WARNING: CPU: 9 PID: 4491 at kernel/workqueue.c:4205 __flush_work+0x379/0x3a0
[  453.133355] Modules linked in: xe drm_ttm_helper ttm gpu_sched drm_buddy drm_suballoc_helper drm_gpuvm drm_exec
[  453.134352] CPU: 9 UID: 0 PID: 4491 Comm: xe_exec_mix_mod Tainted: G     U  W           6.15.0-rc3+ #7 PREEMPT(full)
[  453.135405] Tainted: [U]=USER, [W]=WARN
...
[  453.136921] RIP: 0010:__flush_work+0x379/0x3a0
[  453.137417] Code: 8b 45 00 48 8b 55 08 89 c7 48 c1 e8 04 83 e7 08 83 e0 0f 83 cf 02 89 c6 48 0f ba 6d 00 03 e9 d5 fe ff ff 0f 0b e9 db fd ff ff <0f> 0b 45 31 e4 e9 d1 fd ff ff 0f 0b e9 03 ff ff ff 0f 0b e9 d6 fe
[  453.139250] RSP: 0018:ffffc90000c67b18 EFLAGS: 00010246
[  453.139782] RAX: 0000000000000000 RBX: ffff888108a24000 RCX: 0000000000002000
[  453.140521] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8881016d61c8
[  453.141253] RBP: ffff8881016d61c8 R08: 0000000000000000 R09: 0000000000000000
[  453.141985] R10: 0000000000000000 R11: 0000000008a24000 R12: 0000000000000001
[  453.142709] R13: 0000000000000002 R14: 0000000000000000 R15: ffff888107db8c00
[  453.143450] FS:  00007f44853d4c80(0000) GS:ffff8882f469b000(0000) knlGS:0000000000000000
[  453.144276] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  453.144853] CR2: 00007f4487629228 CR3: 00000001016aa000 CR4: 00000000000406f0
[  453.145594] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  453.146320] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  453.147061] Call Trace:
[  453.147336]  <TASK>
[  453.147579]  ? tick_nohz_tick_stopped+0xd/0x30
[  453.148067]  ? xas_load+0x9/0xb0
[  453.148435]  ? xa_load+0x6f/0xb0
[  453.148781]  __xe_vm_bind_ioctl+0xbd5/0x1500 [xe]
[  453.149338]  ? dev_printk_emit+0x48/0x70
[  453.149762]  ? _dev_printk+0x57/0x80
[  453.150148]  ? drm_ioctl+0x17c/0x440
[  453.150544]  ? __drm_dev_vprintk+0x36/0x90
[  453.150983]  ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
[  453.151575]  ? drm_ioctl_kernel+0x9f/0xf0
[  453.151998]  ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
[  453.152560]  drm_ioctl_kernel+0x9f/0xf0
[  453.152968]  drm_ioctl+0x20f/0x440
[  453.153332]  ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
[  453.153893]  ? ioctl_has_perm.constprop.0.isra.0+0xae/0x100
[  453.154489]  ? memory_bm_test_bit+0x5/0x60
[  453.154935]  xe_drm_ioctl+0x47/0x70 [xe]
[  453.155419]  __x64_sys_ioctl+0x8d/0xc0
[  453.155824]  do_syscall_64+0x47/0x110
[  453.156228]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
"

v2 (Matt):
    refine commit message to have more details
    add Fixes tag
    move the code to xe_svm.h which already have the config
    remove a blank line per codestyle suggestion

Fixes: 63f6e480d115 ("drm/xe: Add SVM garbage collector")
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250502170052.1787973-1-shuicheng.lin@intel.com
(cherry picked from commit 9d80698bcd97a5ad1088bcbb055e73fd068895e2)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe: Release force wake first then runtime power

xe_force_wake_get() is dependent on xe_pm_runtime_get(), so for
the release path, xe_force_wake_put() should be called first then
xe_pm_runtime_put().
Combine the error path and normal path together with goto.

Fixes: 85d547608ef5 ("drm/xe/xe_gt_debugfs: Update handling of xe_force_wake_get return")
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250507022302.2187527-1-shuicheng.lin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 432cd94efdca06296cc5e76d673546f58aa90ee1)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>