Max Kellermann [Sun, 4 May 2025 18:08:31 +0000 (20:08 +0200)]
kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count
Expose a simple counter to userspace for monitoring tools.
Link: https://lkml.kernel.org/r/20250504180831.4190860-3-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Cc: Core Minyard <cminyard@mvista.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Song Liu <song@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Max Kellermann [Sun, 4 May 2025 18:08:30 +0000 (20:08 +0200)]
kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count
Patch series "sysfs: add counters for lockups and stalls", v2.
Commits
9db89b411170 ("exit: Expose "oops_count" to sysfs") and
8b05aa263361 ("panic: Expose "warn_count" to sysfs") added counters for
oopses and warnings to sysfs, and these two patches do the same for
hard/soft lockups and RCU stalls.
All of these counters are useful for monitoring tools to detect whether
the machine is healthy. If the kernel has experienced a lockup or a
stall, it's probably due to a kernel bug, and I'd like to detect that
quickly and easily. There is currently no way to detect that, other than
parsing dmesg. Or observing indirect effects: such as certain tasks not
responding, but then I need to observe all tasks, and it may take a while
until these effects become visible/measurable. I'd rather be able to
detect the primary cause more quickly, possibly before everything falls
apart.
This patch (of 2):
There is /proc/sys/kernel/hung_task_detect_count, /sys/kernel/warn_count
and /sys/kernel/oops_count but there is no userspace-accessible counter
for hard/soft lockups. Having this is useful for monitoring tools.
Link: https://lkml.kernel.org/r/20250504180831.4190860-1-max.kellermann@ionos.com
Link: https://lkml.kernel.org/r/20250504180831.4190860-2-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Cc:
Cc: Core Minyard <cminyard@mvista.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Coiby Xu [Fri, 2 May 2025 01:12:42 +0000 (09:12 +0800)]
x86/crash: make the page that stores the dm crypt keys inaccessible
This adds an addition layer of protection for the saved copy of dm crypt
key. Trying to access the saved copy will cause page fault.
Link: https://lkml.kernel.org/r/20250502011246.99238-9-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Suggested-by: Pingfan Liu <kernelfans@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Coiby Xu [Fri, 2 May 2025 01:12:41 +0000 (09:12 +0800)]
x86/crash: pass dm crypt keys to kdump kernel
1st kernel will build up the kernel command parameter dmcryptkeys as
similar to elfcorehdr to pass the memory address of the stored info of dm
crypt key to kdump kernel.
Link: https://lkml.kernel.org/r/20250502011246.99238-8-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Coiby Xu [Fri, 2 May 2025 01:12:40 +0000 (09:12 +0800)]
Revert "x86/mm: Remove unused __set_memory_prot()"
This reverts commit
693bbf2a50447353c6a47961e6a7240a823ace02 as kdump LUKS
support (CONFIG_CRASH_DM_CRYPT) depends on __set_memory_prot.
[akpm@linux-foundation.org: x86 set_memory.h needs pgtable_types.h]
Link: https://lkml.kernel.org/r/20250502011246.99238-7-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Coiby Xu [Fri, 2 May 2025 01:12:39 +0000 (09:12 +0800)]
crash_dump: retrieve dm crypt keys in kdump kernel
Crash kernel will retrieve the dm crypt keys based on the dmcryptkeys
command line parameter. When user space writes the key description to
/sys/kernel/config/crash_dm_crypt_key/restore, the crash kernel will save
the encryption keys to the user keyring. Then user space e.g.
cryptsetup's --volume-key-keyring API can use it to unlock the encrypted
device.
Link: https://lkml.kernel.org/r/20250502011246.99238-6-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Coiby Xu [Fri, 2 May 2025 01:12:38 +0000 (09:12 +0800)]
crash_dump: reuse saved dm crypt keys for CPU/memory hot-plugging
When there are CPU and memory hot un/plugs, the dm crypt keys may need to
be reloaded again depending on the solution for crash hotplug support.
Currently, there are two solutions. One is to utilizes udev to instruct
user space to reload the kdump kernel image and initrd, elfcorehdr and etc
again. The other is to only update the elfcorehdr segment introduced in
commit
247262756121 ("crash: add generic infrastructure for crash hotplug
support").
For the 1st solution, the dm crypt keys need to be reloaded again. The
user space can write true to /sys/kernel/config/crash_dm_crypt_key/reuse
so the stored keys can be re-used.
For the 2nd solution, the dm crypt keys don't need to be reloaded.
Currently, only x86 supports the 2nd solution. If the 2nd solution gets
extended to all arches, this patch can be dropped.
Link: https://lkml.kernel.org/r/20250502011246.99238-5-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Coiby Xu [Fri, 2 May 2025 01:12:37 +0000 (09:12 +0800)]
crash_dump: store dm crypt keys in kdump reserved memory
When the kdump kernel image and initrd are loaded, the dm crypts keys will
be read from keyring and then stored in kdump reserved memory.
Assume a key won't exceed 256 bytes thus MAX_KEY_SIZE=256 according to
"cryptsetup benchmark".
Link: https://lkml.kernel.org/r/20250502011246.99238-4-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Coiby Xu [Fri, 2 May 2025 01:12:36 +0000 (09:12 +0800)]
crash_dump: make dm crypt keys persist for the kdump kernel
A configfs /sys/kernel/config/crash_dm_crypt_keys is provided for user
space to make the dm crypt keys persist for the kdump kernel. Take the
case of dumping to a LUKS-encrypted target as an example, here is the life
cycle of the kdump copies of LUKS volume keys,
1. After the 1st kernel loads the initramfs during boot, systemd uses
an user-input passphrase to de-crypt the LUKS volume keys or simply
TPM-sealed volume keys and then save the volume keys to specified
keyring (using the --link-vk-to-keyring API) and the keys will expire
within specified time.
2. A user space tool (kdump initramfs loader like kdump-utils) create
key items inside /sys/kernel/config/crash_dm_crypt_keys to inform
the 1st kernel which keys are needed.
3. When the kdump initramfs is loaded by the kexec_file_load
syscall, the 1st kernel will iterate created key items, save the
keys to kdump reserved memory.
4. When the 1st kernel crashes and the kdump initramfs is booted, the
kdump initramfs asks the kdump kernel to create a user key using the
key stored in kdump reserved memory by writing yes to
/sys/kernel/crash_dm_crypt_keys/restore. Then the LUKS encrypted
device is unlocked with libcryptsetup's --volume-key-keyring API.
5. The system gets rebooted to the 1st kernel after dumping vmcore to
the LUKS encrypted device is finished
Eventually the keys have to stay in the kdump reserved memory for the
kdump kernel to unlock encrypted volumes. During this process, some
measures like letting the keys expire within specified time are desirable
to reduce security risk.
This patch assumes,
1) there are 128 LUKS devices at maximum to be unlocked thus
MAX_KEY_NUM=128.
2) a key description won't exceed 128 bytes thus KEY_DESC_MAX_LEN=128.
And here is a demo on how to interact with
/sys/kernel/config/crash_dm_crypt_keys,
# Add key #1
mkdir /sys/kernel/config/crash_dm_crypt_keys/
7d26b7b4-e342-4d2d-b660-
7426b0996720
# Add key #1's description
echo cryptsetup:
7d26b7b4-e342-4d2d-b660-
7426b0996720 > /sys/kernel/config/crash_dm_crypt_keys/description
# how many keys do we have now?
cat /sys/kernel/config/crash_dm_crypt_keys/count
1
# Add key# 2 in the same way
# how many keys do we have now?
cat /sys/kernel/config/crash_dm_crypt_keys/count
2
# the tree structure of /crash_dm_crypt_keys configfs
tree /sys/kernel/config/crash_dm_crypt_keys/
/sys/kernel/config/crash_dm_crypt_keys/
├──
7d26b7b4-e342-4d2d-b660-
7426b0996720
│ └── description
├── count
├──
fce2cd38-4d59-4317-8ce2-
1fd24d52c46a
│ └── description
Link: https://lkml.kernel.org/r/20250502011246.99238-3-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Coiby Xu [Fri, 2 May 2025 01:12:35 +0000 (09:12 +0800)]
kexec_file: allow to place kexec_buf randomly
Patch series "Support kdump with LUKS encryption by reusing LUKS volume
keys", v9.
LUKS is the standard for Linux disk encryption, widely adopted by users,
and in some cases, such as Confidential VMs, it is a requirement. With
kdump enabled, when the first kernel crashes, the system can boot into the
kdump/crash kernel to dump the memory image (i.e., /proc/vmcore) to a
specified target. However, there are two challenges when dumping vmcore
to a LUKS-encrypted device:
- Kdump kernel may not be able to decrypt the LUKS partition. For some
machines, a system administrator may not have a chance to enter the
password to decrypt the device in kdump initramfs after the 1st kernel
crashes; For cloud confidential VMs, depending on the policy the
kdump kernel may not be able to unseal the keys with TPM and the
console virtual keyboard is untrusted.
- LUKS2 by default use the memory-hard Argon2 key derivation function
which is quite memory-consuming compared to the limited memory reserved
for kdump. Take Fedora example, by default, only 256M is reserved for
systems having memory between 4G-64G. With LUKS enabled, ~1300M needs
to be reserved for kdump. Note if the memory reserved for kdump can't
be used by 1st kernel i.e. an user sees ~1300M memory missing in the
1st kernel.
Besides users (at least for Fedora) usually expect kdump to work out of
the box i.e. no manual password input or custom crashkernel value is
needed. And it doesn't make sense to derivate the keys again in kdump
kernel which seems to be redundant work.
This patchset addresses the above issues by making the LUKS volume keys
persistent for kdump kernel with the help of cryptsetup's new APIs
(--link-vk-to-keyring/--volume-key-keyring). Here is the life cycle of
the kdump copies of LUKS volume keys,
1. After the 1st kernel loads the initramfs during boot, systemd
use an user-input passphrase to de-crypt the LUKS volume keys
or TPM-sealed key and then save the volume keys to specified keyring
(using the --link-vk-to-keyring API) and the key will expire within
specified time.
2. A user space tool (kdump initramfs loader like kdump-utils) create
key items inside /sys/kernel/config/crash_dm_crypt_keys to inform
the 1st kernel which keys are needed.
3. When the kdump initramfs is loaded by the kexec_file_load
syscall, the 1st kernel will iterate created key items, save the
keys to kdump reserved memory.
4. When the 1st kernel crashes and the kdump initramfs is booted, the
kdump initramfs asks the kdump kernel to create a user key using the
key stored in kdump reserved memory by writing yes to
/sys/kernel/crash_dm_crypt_keys/restore. Then the LUKS encrypted
device is unlocked with libcryptsetup's --volume-key-keyring API.
5. The system gets rebooted to the 1st kernel after dumping vmcore to
the LUKS encrypted device is finished
After libcryptsetup saving the LUKS volume keys to specified keyring,
whoever takes this should be responsible for the safety of these copies of
keys. The keys will be saved in the memory area exclusively reserved for
kdump where even the 1st kernel has no direct access. And further more,
two additional protections are added,
- save the copy randomly in kdump reserved memory as suggested by Jan
- clear the _PAGE_PRESENT flag of the page that stores the copy as
suggested by Pingfan
This patchset only supports x86. There will be patches to support other
architectures once this patch set gets merged.
This patch (of 9):
Currently, kexec_buf is placed in order which means for the same machine,
the info in the kexec_buf is always located at the same position each time
the machine is booted. This may cause a risk for sensitive information
like LUKS volume key. Now struct kexec_buf has a new field random which
indicates it's supposed to be placed in a random position.
Note this feature is enabled only when CONFIG_CRASH_DUMP is enabled. So
it only takes effect for kdump and won't impact kexec reboot.
Link: https://lkml.kernel.org/r/20250502011246.99238-1-coxu@redhat.com
Link: https://lkml.kernel.org/r/20250502011246.99238-2-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Suggested-by: Jan Pazdziora <jpazdziora@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Shevchenko [Fri, 2 May 2025 12:17:42 +0000 (15:17 +0300)]
list: remove redundant 'extern' for function prototypes
The 'extern' keyword is redundant for function prototypes. list.h never
used them and new code in general is better without them.
Link: https://lkml.kernel.org/r/20250502121742.3997529-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Illia Ostapyshyn [Sat, 3 May 2025 12:32:32 +0000 (14:32 +0200)]
scripts/gdb: update documentation for lx_per_cpu
Commit
db08c53fdd542bb7f83b ("scripts/gdb: fix parameter handling in
$lx_per_cpu") changed the parameter handling of lx_per_cpu to use GdbValue
instead of parsing the variable name. Update the documentation to reflect
the new lx_per_cpu usage. Update the hrtimer_bases example to use rb_tree
instead of the timerqueue_head.next pointer removed in commit
511885d7061eda3eb1fa ("lib/timerqueue: Rely on rbtree semantics for next
timer").
Link: https://lkml.kernel.org/r/20250503123234.2407184-3-illia@yshyn.com
Signed-off-by: Illia Ostapyshyn <illia@yshyn.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Dongliang Mu <dzm91@hust.edu.cn>
Cc: Florian Rommel <mail@florommel.de>
Cc: Hu Haowen <2023002089@link.tyut.edu.cn>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Yanteng Si <si.yanteng@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Illia Ostapyshyn [Sat, 3 May 2025 12:32:31 +0000 (14:32 +0200)]
scripts/gdb: fix kgdb probing on single-core systems
Patch series "scripts/gdb: Fixes related to lx_per_cpu()".
These patches (1) fix kgdb detection on systems featuring a single CPU and
(2) update the documentation to reflect the current usage of lx_per_cpu()
and update an outdated example of its usage.
This patch (of 2):
When requested the list of threads via qfThreadInfo, gdb_cmd_query in
kernel/debug/gdbstub.c first returns "shadow" threads for CPUs followed by
the actual tasks in the system. Extended qThreadExtraInfo queries yield
"shadowCPU%d" as the name for the CPU core threads.
This behavior is used by get_gdbserver_type() to probe for KGDB by
matching the name for the thread 2 against "shadowCPU". This breaks down
on single-core systems, where thread 2 is the first nonshadow thread.
Request the name for thread 1 instead.
As GDB assigns thread IDs in the order of their appearance, it is safe to
assume shadowCPU0 at ID 1 as long as CPU0 is not hotplugged.
Before:
(gdb) info threads
Id Target Id Frame
1 Thread
4294967294 (shadowCPU0) kgdb_breakpoint ()
* 2 Thread 1 (swapper/0) kgdb_breakpoint ()
3 Thread 2 (kthreadd) 0x0000000000000000 in ?? ()
...
(gdb) p $lx_current().comm
Sorry, obtaining the current CPU is not yet supported with this gdb server.
After:
(gdb) info threads
Id Target Id Frame
1 Thread
4294967294 (shadowCPU0) kgdb_breakpoint ()
* 2 Thread 1 (swapper/0) kgdb_breakpoint ()
3 Thread 2 (kthreadd) 0x0000000000000000 in ?? ()
...
(gdb) p $lx_current().comm
$1 = "swapper/0\000\000\000\000\000\000"
Link: https://lkml.kernel.org/r/20250503123234.2407184-1-illia@yshyn.com
Link: https://lkml.kernel.org/r/20250503123234.2407184-2-illia@yshyn.com
Signed-off-by: Illia Ostapyshyn <illia@yshyn.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Dongliang Mu <dzm91@hust.edu.cn>
Cc: Florian Rommel <mail@florommel.de>
Cc: Hu Haowen <2023002089@link.tyut.edu.cn>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Yanteng Si <si.yanteng@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Chelsy Ratnawat [Sat, 3 May 2025 21:19:59 +0000 (14:19 -0700)]
selftests: fix some typos in tools/testing/selftests
Fix multiple spelling errors:
- "rougly" -> "roughly"
- "fielesystems" -> "filesystems"
- "Can'" -> "Can't"
Link: https://lkml.kernel.org/r/20250503211959.507815-1-chelsyratnawat2001@gmail.com
Signed-off-by: Chelsy Ratnawat <chelsyratnawat2001@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dr. David Alan Gilbert [Thu, 1 May 2025 01:05:02 +0000 (02:05 +0100)]
lib/oid_registry.c: remove unused sprint_OID
sprint_OID() was added as part of 2012's commit
4f73175d0375 ("X.509: Add
utility functions to render OIDs as strings") but it hasn't been used.
Remove it.
Note that there's also 'sprint_oid' (lower case) which is used in a lot of
places; that's left as is except for fixing its case in a comment.
Link: https://lkml.kernel.org/r/20250501010502.326472-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ryusuke Konishi [Mon, 28 Apr 2025 17:37:08 +0000 (02:37 +0900)]
nilfs2: do not propagate ENOENT error from nilfs_btree_propagate()
In preparation for writing logs, in nilfs_btree_propagate(), which makes
parent and ancestor node blocks dirty starting from a modified data block
or b-tree node block, if the starting block does not belong to the b-tree,
i.e. is isolated, nilfs_btree_do_lookup() called within the function
fails with -ENOENT.
In this case, even though -ENOENT is an internal code, it is propagated to
the log writer via nilfs_bmap_propagate() and may be erroneously returned
to system calls such as fsync().
Fix this issue by changing the error code to -EINVAL in this case, and
having the bmap layer detect metadata corruption and convert the error
code appropriately.
Link: https://lkml.kernel.org/r/20250428173808.6452-3-konishi.ryusuke@gmail.com
Fixes:
1f5abe7e7dbc ("nilfs2: replace BUG_ON and BUG calls triggerable from ioctl")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wentao Liang [Mon, 28 Apr 2025 17:37:07 +0000 (02:37 +0900)]
nilfs2: add pointer check for nilfs_direct_propagate()
Patch series "nilfs2: improve sanity checks in dirty state propagation".
This fixes one missed check for block mapping anomalies and one improper
return of an error code during a preparation step for log writing, thereby
improving checking for filesystem corruption on writeback.
This patch (of 2):
In nilfs_direct_propagate(), the printer get from nilfs_direct_get_ptr()
need to be checked to ensure it is not an invalid pointer.
If the pointer value obtained by nilfs_direct_get_ptr() is
NILFS_BMAP_INVALID_PTR, means that the metadata (in this case, i_bmap in
the nilfs_inode_info struct) that should point to the data block at the
buffer head of the argument is corrupted and the data block is orphaned,
meaning that the file system has lost consistency.
Add a value check and return -EINVAL when it is an invalid pointer.
Link: https://lkml.kernel.org/r/20250428173808.6452-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20250428173808.6452-2-konishi.ryusuke@gmail.com
Fixes:
36a580eb489f ("nilfs2: direct block mapping")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Eric Biggers [Mon, 28 Apr 2025 18:57:20 +0000 (11:57 -0700)]
kexec_file: use SHA-256 library API instead of crypto_shash API
This user of SHA-256 does not support any other algorithm, so the
crypto_shash abstraction provides no value. Just use the SHA-256 library
API instead, which is much simpler and easier to use.
Tested with '/sbin/kexec --kexec-file-syscall'.
Link: https://lkml.kernel.org/r/20250428185721.844686-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Shevchenko [Mon, 28 Apr 2025 07:27:37 +0000 (10:27 +0300)]
util_macros.h: fix the reference in kernel-doc
In PTR_IF() description the text refers to the parameter as (ptr) while
the kernel-doc format asks for @ptr. Fix this accordingly.
Link: https://lkml.kernel.org/r/20250428072737.3265239-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexandru Ardelean <aardelean@baylibre.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fedor Pchelkin [Sun, 27 Apr 2025 20:14:49 +0000 (23:14 +0300)]
sort.h: hoist cmp_int() into generic header file
Deduplicate the same functionality implemented in several places by
moving the cmp_int() helper macro into linux/sort.h.
The macro performs a three-way comparison of the arguments mostly useful
in different sorting strategies and algorithms.
Link: https://lkml.kernel.org/r/20250427201451.900730-1-pchelkin@ispras.ru
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Kent Overstreet <kent.overstreet@linux.dev>
Acked-by: Coly Li <colyli@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Carlos Maiolino <cem@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Coly Li <colyli@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Chen Ni [Tue, 22 Apr 2025 07:30:51 +0000 (15:30 +0800)]
ocfs2: remove unnecessary NULL check before unregister_sysctl_table()
unregister_sysctl_table() checks for NULL pointers internally. Remove
unneeded NULL check here.
Link: https://lkml.kernel.org/r/20250422073051.1334310-1-nichen@iscas.ac.cn
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Murad Masimov [Wed, 2 Apr 2025 06:56:27 +0000 (09:56 +0300)]
ocfs2: fix possible memory leak in ocfs2_finish_quota_recovery
If ocfs2_finish_quota_recovery() exits due to an error before passing all
rc_list elements to ocfs2_recover_local_quota_file() then it can lead to a
memory leak as rc_list may still contain elements that have to be freed.
Release all memory allocated by ocfs2_add_recovery_chunk() using
ocfs2_free_quota_recovery() instead of kfree().
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Link: https://lkml.kernel.org/r/20250402065628.706359-2-m.masimov@mt-integration.ru
Fixes:
2205363dce74 ("ocfs2: Implement quota recovery")
Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jeongjun Park [Thu, 24 Apr 2025 14:33:22 +0000 (23:33 +0900)]
ipc: fix to protect IPCS lookups using RCU
syzbot reported that it discovered a use-after-free vulnerability, [0]
[0]: https://lore.kernel.org/all/
67af13f8.
050a0220.21dd3.0038.GAE@google.com/
idr_for_each() is protected by rwsem, but this is not enough. If it is
not protected by RCU read-critical region, when idr_for_each() calls
radix_tree_node_free() through call_rcu() to free the radix_tree_node
structure, the node will be freed immediately, and when reading the next
node in radix_tree_for_each_slot(), the already freed memory may be read.
Therefore, we need to add code to make sure that idr_for_each() is
protected within the RCU read-critical region when we call it in
shm_destroy_orphaned().
Link: https://lkml.kernel.org/r/20250424143322.18830-1-aha310510@gmail.com
Fixes:
b34a6b1da371 ("ipc: introduce shm_rmid_forced sysctl")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Reported-by: syzbot+a2b84e569d06ca3a949c@syzkaller.appspotmail.com
Cc: Jeongjun Park <aha310510@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Marc Herbert [Thu, 24 Apr 2025 19:40:35 +0000 (19:40 +0000)]
compiler_types.h: fix "unused variable" in __compiletime_assert()
This refines commit
c03567a8e8d5 ("include/linux/compiler.h: don't perform
compiletime_assert with -O0") and restores #ifdef __OPTIMIZE__ symmetry by
evaluating the 'condition' variable in both compile-time variants of
__compiletimeassert().
As __OPTIMIZE__ is always true by default, this commit does not change
anything by default. But it fixes warnings with _non-default_ CFLAGS like
for instance this:
make CFLAGS_tcp.o='-Og -U__OPTIMIZE__'
from net/ipv4/tcp.c:273:
include/net/sch_generic.h: In function `qdisc_cb_private_validate':
include/net/sch_generic.h:511:30:
error: unused variable `qcb' [-Werror=unused-variable]
{
struct qdisc_skb_cb *qcb;
BUILD_BUG_ON(sizeof(skb->cb) < sizeof(*qcb));
...
}
[akpm@linux-foundation.org: regularize comment layout, reflow comment]
Link: https://lkml.kernel.org/r/20250424194048.652571-1-marc.herbert@linux.intel.com
Signed-off-by: Marc Herbert <Marc.Herbert@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Jan Hendrik Farr <kernel@jfarr.cc>
Cc: Macro Elver <elver@google.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Tony Ambardar <tony.ambardar@gmail.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mykyta Yatsenko [Tue, 22 Apr 2025 13:14:49 +0000 (14:14 +0100)]
maccess: fix strncpy_from_user_nofault() empty string handling
strncpy_from_user_nofault() should return the length of the copied string
including the trailing NUL, but if the argument unsafe_addr points to an
empty string ({'\0'}), the return value is 0.
This happens as strncpy_from_user() copies terminal symbol into dst and
returns 0 (as expected), but strncpy_from_user_nofault does not modify ret
as it is not equal to count and not greater than 0, so 0 is returned,
which contradicts the contract.
Link: https://lkml.kernel.org/r/20250422131449.57177-1-mykyta.yatsenko5@gmail.com
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Andrii Nakryiko <andrii@kernel.org>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Luo Gengkun [Mon, 21 Apr 2025 03:50:21 +0000 (03:50 +0000)]
watchdog: fix watchdog may detect false positive of softlockup
When updating `watchdog_thresh`, there is a race condition between writing
the new `watchdog_thresh` value and stopping the old watchdog timer. If
the old timer triggers during this window, it may falsely detect a
softlockup due to the old interval and the new `watchdog_thresh` value
being used. The problem can be described as follow:
# We asuume previous watchdog_thresh is 60, so the watchdog timer is
# coming every 24s.
echo 10 > /proc/sys/kernel/watchdog_thresh (User space)
|
+------>+ update watchdog_thresh (We are in kernel now)
|
| # using old interval and new `watchdog_thresh`
+------>+ watchdog hrtimer (irq context: detect softlockup)
|
|
+-------+
|
|
+ softlockup_stop_all
To fix this problem, introduce a shadow variable for `watchdog_thresh`.
The update to the actual `watchdog_thresh` is delayed until after the old
timer is stopped, preventing false positives.
The following testcase may help to understand this problem.
---------------------------------------------
echo RT_RUNTIME_SHARE > /sys/kernel/debug/sched/features
echo -1 > /proc/sys/kernel/sched_rt_runtime_us
echo 0 > /sys/kernel/debug/sched/fair_server/cpu3/runtime
echo 60 > /proc/sys/kernel/watchdog_thresh
taskset -c 3 chrt -r 99 /bin/bash -c "while true;do true; done" &
echo 10 > /proc/sys/kernel/watchdog_thresh &
---------------------------------------------
The test case above first removes the throttling restrictions for
real-time tasks. It then sets watchdog_thresh to 60 and executes a
real-time task ,a simple while(1) loop, on cpu3. Consequently, the final
command gets blocked because the presence of this real-time thread
prevents kworker:3 from being selected by the scheduler. This eventually
triggers a softlockup detection on cpu3 due to watchdog_timer_fn operating
with inconsistent variable - using both the old interval and the updated
watchdog_thresh simultaneously.
[nysal@linux.ibm.com: fix the SOFTLOCKUP_DETECTOR=n case]
Link: https://lkml.kernel.org/r/20250502111120.282690-1-nysal@linux.ibm.com
Link: https://lkml.kernel.org/r/20250421035021.3507649-1-luogengkun@huaweicloud.com
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Signed-off-by: Nysal Jan K.A. <nysal@linux.ibm.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: "Nysal Jan K.A." <nysal@linux.ibm.com>
Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
WangYuli [Mon, 21 Apr 2025 07:49:10 +0000 (15:49 +0800)]
treewide: fix typo "previlege"
There are some spelling mistakes of 'previlege' in comments which
should be 'privilege'.
Fix them and add it to scripts/spelling.txt.
The typo in arch/loongarch/kvm/main.c was corrected by a different
patch [1] and is therefore not included in this submission.
[1]. https://lore.kernel.org/all/
20250420142208.
2252280-1-wheatfox17@icloud.com/
Link: https://lkml.kernel.org/r/46AD404E411A4BAC+20250421074910.66988-1-wangyuli@uniontech.com
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Colin Ian King [Fri, 18 Apr 2025 12:03:31 +0000 (13:03 +0100)]
crash: fix spelling mistake "crahskernel" -> "crashkernel"
There is a spelling mistake in a pr_warn message. Fix it.
Link: https://lkml.kernel.org/r/20250418120331.535086-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Herton R. Krzesinski [Fri, 18 Apr 2025 16:50:47 +0000 (13:50 -0300)]
lib/test_kmod: do not hardcode/depend on any filesystem
Right now test_kmod has hardcoded dependencies on btrfs/xfs. That is not
optimal since you end up needing to select/build them, but it is not
really required since other fs could be selected for the testing. Also,
we can't change the default/driver module used for testing on
initialization.
Thus make it more generic: introduce two module parameters (start_driver
and start_test_fs), which allow to select which modules/fs to use for the
testing on test_kmod initialization. Then it's up to the user to select
which modules/fs to use for testing based on his config. However, keep
test_module as required default.
This way, config/modules becomes selectable as when the testing is done
from selftests (userspace).
While at it, also change trigger_config_run_type, since at module
initialization we already set the defaults at __kmod_config_init and
should not need to do it again in test_kmod_init(), thus we can avoid to
again set test_driver/test_fs.
Link: https://lkml.kernel.org/r/20250418165047.702487-1-herton@redhat.com
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Reviewed-by: Luis Chambelrain <mcgrof@kernel.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dr. David Alan Gilbert [Fri, 18 Apr 2025 23:49:32 +0000 (00:49 +0100)]
relay: remove unused relay_late_setup_files
The last use of relay_late_setup_files() was removed in 2018 by commit
2b47733045aa ("drm/i915/guc: Merge log relay file and channel creation")
Remove it and the helper it used.
relay_late_setup_files() was used for eventually registering 'buffer only'
channels. With it gone, delete the docs that explain how to do that.
Which suggests it should be possible to lose the 'has_base_filename'
flags.
(Are there any other uses??)
Link: https://lkml.kernel.org/r/20250418234932.490863-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dr. David Alan Gilbert [Sat, 19 Apr 2025 20:30:12 +0000 (21:30 +0100)]
rapidio: remove unused functions
rio_request_dma() and rio_dma_prep_slave_sg() were added in 2012 by commit
e42d98ebe7d7 ("rapidio: add DMA engine support for RIO data transfers")
but never used.
rio_find_mport() last use was removed in 2013 by commit
9edbc30b434f
("rapidio: update enumerator registration mechanism")
rio_unregister_scan() was added in 2013 by commit
a11650e11093 ("rapidio:
make enumeration/discovery configurable") but never used.
Remove them.
Link: https://lkml.kernel.org/r/20250419203012.429787-3-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dr. David Alan Gilbert [Sat, 19 Apr 2025 20:30:11 +0000 (21:30 +0100)]
rapidio: remove some dead defines
Patch series "rapidio deadcoding".
A couple of rapidio deadcoding patches. The first of these is a repost
and was originally posted almost a year ago
https://lore.kernel.org/all/
20240528002515.211366-1-linux@treblig.org/ but
got no answer. Other than being rebased and a typo fixed, it's not
changed.
This patch (of 2):
'mport_dma_buf', 'rio_mport_dma_map' and 'MPORT_MAX_DMA_BUFS' were added
in the original commit
e8de370188d0 ("rapidio: add mport char device
driver") but never used.
'rio_cm_work' was unused since the original commit
b6e8d4aa1110 ("rapidio:
add RapidIO channelized messaging driver") but never used.
Remove them.
Link: https://lkml.kernel.org/r/20250419203012.429787-1-linux@treblig.org
Link: https://lkml.kernel.org/r/20250419203012.429787-2-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Caleb Sander Mateos [Wed, 16 Apr 2025 16:06:13 +0000 (10:06 -0600)]
scatterlist: inline sg_next()
sg_next() is a short function called frequently in I/O paths. Define it
in the header file so it can be inlined into its callers.
Link: https://lkml.kernel.org/r/20250416160615.3571958-1-csander@purestorage.com
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Thorsten Blum [Tue, 8 Apr 2025 10:34:07 +0000 (12:34 +0200)]
ocfs2: simplify return statement in ocfs2_filecheck_attr_store()
Don't negate 'ret' and simplify the return statement.
No functional changes intended.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zi Li [Mon, 14 Apr 2025 14:59:45 +0000 (22:59 +0800)]
samples: extend hung_task detector test with semaphore support
Extend the existing hung_task detector test module to support multiple
lock types, including mutex and semaphore, with room for future additions
(e.g., spinlock, etc.). This module creates dummy files under
<debugfs>/hung_task, such as 'mutex' and 'semaphore'. The read process on
any of these files will sleep for enough long time (256 seconds) while
holding the respective lock. As a result, the second process will wait on
the lock for a prolonged duration and be detected by the hung_task
detector.
This change unifies the previous mutex-only sample into a single,
extensible hung_task_tests module, reducing code duplication and improving
maintainability.
Usage is:
> cd /sys/kernel/debug/hung_task
> cat mutex & cat mutex # Test mutex blocking
> cat semaphore & cat semaphore # Test semaphore blocking
Update the Kconfig description to reflect multiple debugfs files support.
Link: https://lkml.kernel.org/r/20250414145945.84916-4-ioworker0@gmail.com
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Signed-off-by: Zi Li <amaindex@outlook.com>
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Mingzhe Yang <mingzhe.yang@ly.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yongliang Gao <leonylgao@tencent.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lance Yang [Mon, 14 Apr 2025 14:59:44 +0000 (22:59 +0800)]
hung_task: show the blocker task if the task is hung on semaphore
Inspired by mutex blocker tracking[1], this patch makes a trade-off to
balance the overhead and utility of the hung task detector.
Unlike mutexes, semaphores lack explicit ownership tracking, making it
challenging to identify the root cause of hangs. To address this, we
introduce a last_holder field to the semaphore structure, which is updated
when a task successfully calls down() and cleared during up().
The assumption is that if a task is blocked on a semaphore, the holders
must not have released it. While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.
With this change, the hung task detector can now show blocker task's info
like below:
[Tue Apr 8 12:19:07 2025] INFO: task cat:945 blocked for more than 120 seconds.
[Tue Apr 8 12:19:07 2025] Tainted: G E 6.14.0-rc6+ #1
[Tue Apr 8 12:19:07 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Apr 8 12:19:07 2025] task:cat state:D stack:0 pid:945 tgid:945 ppid:828 task_flags:0x400000 flags:0x00000000
[Tue Apr 8 12:19:07 2025] Call Trace:
[Tue Apr 8 12:19:07 2025] <TASK>
[Tue Apr 8 12:19:07 2025] __schedule+0x491/0xbd0
[Tue Apr 8 12:19:07 2025] schedule+0x27/0xf0
[Tue Apr 8 12:19:07 2025] schedule_timeout+0xe3/0xf0
[Tue Apr 8 12:19:07 2025] ? __folio_mod_stat+0x2a/0x80
[Tue Apr 8 12:19:07 2025] ? set_ptes.constprop.0+0x27/0x90
[Tue Apr 8 12:19:07 2025] __down_common+0x155/0x280
[Tue Apr 8 12:19:07 2025] down+0x53/0x70
[Tue Apr 8 12:19:07 2025] read_dummy_semaphore+0x23/0x60
[Tue Apr 8 12:19:07 2025] full_proxy_read+0x5f/0xa0
[Tue Apr 8 12:19:07 2025] vfs_read+0xbc/0x350
[Tue Apr 8 12:19:07 2025] ? __count_memcg_events+0xa5/0x140
[Tue Apr 8 12:19:07 2025] ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr 8 12:19:07 2025] ? handle_mm_fault+0x180/0x260
[Tue Apr 8 12:19:07 2025] ksys_read+0x66/0xe0
[Tue Apr 8 12:19:07 2025] do_syscall_64+0x51/0x120
[Tue Apr 8 12:19:07 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr 8 12:19:07 2025] RIP: 0033:0x7f419478f46e
[Tue Apr 8 12:19:07 2025] RSP: 002b:
00007fff1c4d2668 EFLAGS:
00000246 ORIG_RAX:
0000000000000000
[Tue Apr 8 12:19:07 2025] RAX:
ffffffffffffffda RBX:
0000000000020000 RCX:
00007f419478f46e
[Tue Apr 8 12:19:07 2025] RDX:
0000000000020000 RSI:
00007f4194683000 RDI:
0000000000000003
[Tue Apr 8 12:19:07 2025] RBP:
00007f4194683000 R08:
00007f4194682010 R09:
0000000000000000
[Tue Apr 8 12:19:07 2025] R10:
fffffffffffffbc5 R11:
0000000000000246 R12:
0000000000000000
[Tue Apr 8 12:19:07 2025] R13:
0000000000000003 R14:
0000000000020000 R15:
0000000000020000
[Tue Apr 8 12:19:07 2025] </TASK>
[Tue Apr 8 12:19:07 2025] INFO: task cat:945 blocked on a semaphore likely last held by task cat:938
[Tue Apr 8 12:19:07 2025] task:cat state:S stack:0 pid:938 tgid:938 ppid:584 task_flags:0x400000 flags:0x00000000
[Tue Apr 8 12:19:07 2025] Call Trace:
[Tue Apr 8 12:19:07 2025] <TASK>
[Tue Apr 8 12:19:07 2025] __schedule+0x491/0xbd0
[Tue Apr 8 12:19:07 2025] ? _raw_spin_unlock_irqrestore+0xe/0x40
[Tue Apr 8 12:19:07 2025] schedule+0x27/0xf0
[Tue Apr 8 12:19:07 2025] schedule_timeout+0x77/0xf0
[Tue Apr 8 12:19:07 2025] ? __pfx_process_timeout+0x10/0x10
[Tue Apr 8 12:19:07 2025] msleep_interruptible+0x49/0x60
[Tue Apr 8 12:19:07 2025] read_dummy_semaphore+0x2d/0x60
[Tue Apr 8 12:19:07 2025] full_proxy_read+0x5f/0xa0
[Tue Apr 8 12:19:07 2025] vfs_read+0xbc/0x350
[Tue Apr 8 12:19:07 2025] ? __count_memcg_events+0xa5/0x140
[Tue Apr 8 12:19:07 2025] ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr 8 12:19:07 2025] ? handle_mm_fault+0x180/0x260
[Tue Apr 8 12:19:07 2025] ksys_read+0x66/0xe0
[Tue Apr 8 12:19:07 2025] do_syscall_64+0x51/0x120
[Tue Apr 8 12:19:07 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr 8 12:19:07 2025] RIP: 0033:0x7f7c584a646e
[Tue Apr 8 12:19:07 2025] RSP: 002b:
00007ffdba8ce158 EFLAGS:
00000246 ORIG_RAX:
0000000000000000
[Tue Apr 8 12:19:07 2025] RAX:
ffffffffffffffda RBX:
0000000000020000 RCX:
00007f7c584a646e
[Tue Apr 8 12:19:07 2025] RDX:
0000000000020000 RSI:
00007f7c5839a000 RDI:
0000000000000003
[Tue Apr 8 12:19:07 2025] RBP:
00007f7c5839a000 R08:
00007f7c58399010 R09:
0000000000000000
[Tue Apr 8 12:19:07 2025] R10:
fffffffffffffbc5 R11:
0000000000000246 R12:
0000000000000000
[Tue Apr 8 12:19:07 2025] R13:
0000000000000003 R14:
0000000000020000 R15:
0000000000020000
[Tue Apr 8 12:19:07 2025] </TASK>
[1] https://lore.kernel.org/all/
174046694331.
2194069.
15472952050240807469.stgit@mhiramat.tok.corp.google.com
Link: https://lkml.kernel.org/r/20250414145945.84916-3-ioworker0@gmail.com
Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yongliang Gao <leonylgao@tencent.com>
Cc: Zi Li <amaindex@outlook.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lance Yang [Mon, 14 Apr 2025 14:59:43 +0000 (22:59 +0800)]
hung_task: replace blocker_mutex with encoded blocker
Patch series "hung_task: extend blocking task stacktrace dump to
semaphore", v5.
Inspired by mutex blocker tracking[1], this patch series extend the
feature to not only dump the blocker task holding a mutex but also to
support semaphores. Unlike mutexes, semaphores lack explicit ownership
tracking, making it challenging to identify the root cause of hangs. To
address this, we introduce a last_holder field to the semaphore structure,
which is updated when a task successfully calls down() and cleared during
up().
The assumption is that if a task is blocked on a semaphore, the holders
must not have released it. While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.
With this change, the hung task detector can now show blocker task's info
like below:
[Tue Apr 8 12:19:07 2025] INFO: task cat:945 blocked for more than 120 seconds.
[Tue Apr 8 12:19:07 2025] Tainted: G E 6.14.0-rc6+ #1
[Tue Apr 8 12:19:07 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Apr 8 12:19:07 2025] task:cat state:D stack:0 pid:945 tgid:945 ppid:828 task_flags:0x400000 flags:0x00000000
[Tue Apr 8 12:19:07 2025] Call Trace:
[Tue Apr 8 12:19:07 2025] <TASK>
[Tue Apr 8 12:19:07 2025] __schedule+0x491/0xbd0
[Tue Apr 8 12:19:07 2025] schedule+0x27/0xf0
[Tue Apr 8 12:19:07 2025] schedule_timeout+0xe3/0xf0
[Tue Apr 8 12:19:07 2025] ? __folio_mod_stat+0x2a/0x80
[Tue Apr 8 12:19:07 2025] ? set_ptes.constprop.0+0x27/0x90
[Tue Apr 8 12:19:07 2025] __down_common+0x155/0x280
[Tue Apr 8 12:19:07 2025] down+0x53/0x70
[Tue Apr 8 12:19:07 2025] read_dummy_semaphore+0x23/0x60
[Tue Apr 8 12:19:07 2025] full_proxy_read+0x5f/0xa0
[Tue Apr 8 12:19:07 2025] vfs_read+0xbc/0x350
[Tue Apr 8 12:19:07 2025] ? __count_memcg_events+0xa5/0x140
[Tue Apr 8 12:19:07 2025] ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr 8 12:19:07 2025] ? handle_mm_fault+0x180/0x260
[Tue Apr 8 12:19:07 2025] ksys_read+0x66/0xe0
[Tue Apr 8 12:19:07 2025] do_syscall_64+0x51/0x120
[Tue Apr 8 12:19:07 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr 8 12:19:07 2025] RIP: 0033:0x7f419478f46e
[Tue Apr 8 12:19:07 2025] RSP: 002b:
00007fff1c4d2668 EFLAGS:
00000246 ORIG_RAX:
0000000000000000
[Tue Apr 8 12:19:07 2025] RAX:
ffffffffffffffda RBX:
0000000000020000 RCX:
00007f419478f46e
[Tue Apr 8 12:19:07 2025] RDX:
0000000000020000 RSI:
00007f4194683000 RDI:
0000000000000003
[Tue Apr 8 12:19:07 2025] RBP:
00007f4194683000 R08:
00007f4194682010 R09:
0000000000000000
[Tue Apr 8 12:19:07 2025] R10:
fffffffffffffbc5 R11:
0000000000000246 R12:
0000000000000000
[Tue Apr 8 12:19:07 2025] R13:
0000000000000003 R14:
0000000000020000 R15:
0000000000020000
[Tue Apr 8 12:19:07 2025] </TASK>
[Tue Apr 8 12:19:07 2025] INFO: task cat:945 blocked on a semaphore likely last held by task cat:938
[Tue Apr 8 12:19:07 2025] task:cat state:S stack:0 pid:938 tgid:938 ppid:584 task_flags:0x400000 flags:0x00000000
[Tue Apr 8 12:19:07 2025] Call Trace:
[Tue Apr 8 12:19:07 2025] <TASK>
[Tue Apr 8 12:19:07 2025] __schedule+0x491/0xbd0
[Tue Apr 8 12:19:07 2025] ? _raw_spin_unlock_irqrestore+0xe/0x40
[Tue Apr 8 12:19:07 2025] schedule+0x27/0xf0
[Tue Apr 8 12:19:07 2025] schedule_timeout+0x77/0xf0
[Tue Apr 8 12:19:07 2025] ? __pfx_process_timeout+0x10/0x10
[Tue Apr 8 12:19:07 2025] msleep_interruptible+0x49/0x60
[Tue Apr 8 12:19:07 2025] read_dummy_semaphore+0x2d/0x60
[Tue Apr 8 12:19:07 2025] full_proxy_read+0x5f/0xa0
[Tue Apr 8 12:19:07 2025] vfs_read+0xbc/0x350
[Tue Apr 8 12:19:07 2025] ? __count_memcg_events+0xa5/0x140
[Tue Apr 8 12:19:07 2025] ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr 8 12:19:07 2025] ? handle_mm_fault+0x180/0x260
[Tue Apr 8 12:19:07 2025] ksys_read+0x66/0xe0
[Tue Apr 8 12:19:07 2025] do_syscall_64+0x51/0x120
[Tue Apr 8 12:19:07 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr 8 12:19:07 2025] RIP: 0033:0x7f7c584a646e
[Tue Apr 8 12:19:07 2025] RSP: 002b:
00007ffdba8ce158 EFLAGS:
00000246 ORIG_RAX:
0000000000000000
[Tue Apr 8 12:19:07 2025] RAX:
ffffffffffffffda RBX:
0000000000020000 RCX:
00007f7c584a646e
[Tue Apr 8 12:19:07 2025] RDX:
0000000000020000 RSI:
00007f7c5839a000 RDI:
0000000000000003
[Tue Apr 8 12:19:07 2025] RBP:
00007f7c5839a000 R08:
00007f7c58399010 R09:
0000000000000000
[Tue Apr 8 12:19:07 2025] R10:
fffffffffffffbc5 R11:
0000000000000246 R12:
0000000000000000
[Tue Apr 8 12:19:07 2025] R13:
0000000000000003 R14:
0000000000020000 R15:
0000000000020000
[Tue Apr 8 12:19:07 2025] </TASK>
This patch (of 3):
This patch replaces 'struct mutex *blocker_mutex' with 'unsigned long
blocker', as only one blocker is active at a time.
The blocker filed can store both the lock addrees and the lock type, with
LSB used to encode the type as Masami suggested, making it easier to
extend the feature to cover other types of locks.
Also, once the lock type is determined, we can directly extract the
address and cast it to a lock pointer ;)
Link: https://lkml.kernel.org/r/20250414145945.84916-1-ioworker0@gmail.com
Link: https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com
Link: https://lkml.kernel.org/r/20250414145945.84916-2-ioworker0@gmail.com
Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yongliang Gao <leonylgao@tencent.com>
Cc: Zi Li <amaindex@outlook.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
WangYuli [Fri, 11 Apr 2025 10:26:10 +0000 (18:26 +0800)]
ocfs2: o2net_idle_timer: Rename del_timer_sync in comment
Commit
8fa7292fee5c ("treewide: Switch/rename to timer_delete[_sync]()")
switched del_timer_sync to timer_delete_sync, but did not modify the
comment for o2net_idle_timer(). Now fix it.
Link: https://lkml.kernel.org/r/BDDB1E4E2876C36C+20250411102610.165946-1-wangyuli@uniontech.com
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Phillip Lougher [Wed, 9 Apr 2025 02:47:47 +0000 (03:47 +0100)]
Squashfs: check return result of sb_min_blocksize
Syzkaller reports an "UBSAN: shift-out-of-bounds in squashfs_bio_read" bug.
Syzkaller forks multiple processes which after mounting the Squashfs
filesystem, issues an ioctl("/dev/loop0", LOOP_SET_BLOCK_SIZE, 0x8000).
Now if this ioctl occurs at the same time another process is in the
process of mounting a Squashfs filesystem on /dev/loop0, the failure
occurs. When this happens the following code in squashfs_fill_super()
fails.
----
msblk->devblksize = sb_min_blocksize(sb, SQUASHFS_DEVBLK_SIZE);
msblk->devblksize_log2 = ffz(~msblk->devblksize);
----
sb_min_blocksize() returns 0, which means msblk->devblksize is set to 0.
As a result, ffz(~msblk->devblksize) returns 64, and msblk->devblksize_log2
is set to 64.
This subsequently causes the
UBSAN: shift-out-of-bounds in fs/squashfs/block.c:195:36
shift exponent 64 is too large for 64-bit type 'u64' (aka
'unsigned long long')
This commit adds a check for a 0 return by sb_min_blocksize().
Link: https://lkml.kernel.org/r/20250409024747.876480-1-phillip@squashfs.org.uk
Fixes:
0aa666190509 ("Squashfs: super block operations")
Reported-by: syzbot+65761fc25a137b9c8c6e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/
67f0dd7a.
050a0220.0a13.0230.GAE@google.com/
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mateusz Guzik [Wed, 19 Mar 2025 19:54:36 +0000 (20:54 +0100)]
exit: combine work under lock in synchronize_group_exit() and coredump_task_exit()
This reduces single-threaded overhead as it avoids one lock+irq trip on
exit.
It also improves scalability of spawning and killing threads within one
process (just shy of 5% when doing it on 24 cores on my test jig).
Both routines are moved below kcov and kmsan exit, which should be
harmless.
Link: https://lkml.kernel.org/r/20250319195436.1864415-1-mjguzik@gmail.com
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zijun Hu [Mon, 7 Apr 2025 11:44:16 +0000 (19:44 +0800)]
errseq: eliminate special limitation for macro MAX_ERRNO
Current errseq implementation depends on a very special precondition that
macro MAX_ERRNO must be (2^n - 1).
Eliminate the limitation by
- redefining macro ERRSEQ_SHIFT
- defining a new macro ERRNO_MASK instead of MAX_ERRNO for errno mask.
There is no plan to change the value of MAX_ERRNO, but this makes the
implementation more generic and eliminates the BUILD_BUG_ON().
Link: https://lkml.kernel.org/r/20250407-improve_errseq-v1-1-7b27cbeb8298@quicinc.com
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mario Limonciello [Fri, 21 Mar 2025 02:25:01 +0000 (21:25 -0500)]
kstrtox: add support for enabled and disabled in kstrtobool()
In some places in the kernel there is a design pattern for sysfs
attributes to use kstrtobool() in store() and str_enabled_disabled() in
show().
This is counterintuitive to interact with because kstrtobool() takes
on/off but str_enabled_disabled() shows enabled/disabled. Some of those
sysfs uses could switch to str_on_off() but for some attributes
enabled/disabled really makes more sense.
Add support for kstrtobool() to accept enabled/disabled.
Link: https://lkml.kernel.org/r/20250321022538.1532445-1-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Shevchenko [Mon, 24 Mar 2025 10:50:25 +0000 (12:50 +0200)]
kernel.h: move PTR_IF() and u64_to_user_ptr() to util_macros.h
While the natural choice of PTR_IF() is kconfig.h, the latter is too broad
to include C code and actually the macro was moved out from there in the
past. But kernel.h is neither a good choice for that. Move it to
util_macros.h. Do the same for u64_to_user_ptr().
While moving, add necessary documentation.
Link: https://lkml.kernel.org/r/20250324105228.775784-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexandru Ardelean <aardelean@baylibre.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ingo Molnar [Mon, 24 Mar 2025 10:50:24 +0000 (12:50 +0200)]
kernel.h: move READ/WRITE definitions to <linux/types.h>
Patch series "kernel.h: Move out a couple of macros and constants".
kernel.h hosts a couple of macros and constants that may be better placed.
Do that. Also add missing documentation. No functional changes
intended.
This patch (of 2):
Headers shouldn't be forced to include <linux/kernel.h> just to
gain these simple constants.
Link: https://lkml.kernel.org/r/20250324105228.775784-1-andriy.shevchenko@linux.intel.com
Link: https://lkml.kernel.org/r/20250324105228.775784-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexandru Ardelean <aardelean@baylibre.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jim Cromie [Tue, 25 Mar 2025 23:51:56 +0000 (17:51 -0600)]
powernow: use pr_info_once
This reduces log-msgs during boot from many pages to ~10 occurrences. I
didn't investigate why it wasn't just 1, maybe its a low-level service to
other modules, re-probed by each of them ?
Link: https://lkml.kernel.org/r/20250325235156.663269-4-jim.cromie@gmail.com
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Louis Chauvet <louis.chauvet@bootlin.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jim Cromie [Tue, 25 Mar 2025 23:51:55 +0000 (17:51 -0600)]
checkpatch: qualify do-while-0 advice
Add a paragraph of advice qualifying the general do-while-0 advice, noting
3 possible misguidings. reduce one ERROR to WARN, for the case I actually
encountered.
And add 'static_assert' to named exceptions, along with some additional
comments about named exceptions vs (detection of) declarative construction
primitives (union, struct, [], etc).
Link: https://lkml.kernel.org/r/20250325235156.663269-3-jim.cromie@gmail.com
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Louis Chauvet <louis.chauvet@bootlin.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jim Cromie [Tue, 25 Mar 2025 23:51:54 +0000 (17:51 -0600)]
checkpatch: dont warn about unused macro arg on empty body
Patch series "2 checkpatch fixes, one pr_info_once".
2 small tweaks to checkpatch,
1 reducing several pages of powernow "not-relevant-here" log-msgs to a few lines
This patch (of 3):
We currently get:
WARNING: Argument 'name' is not used in function-like macro
on:
#define DRM_CLASSMAP_USE(name) /* nothing here */
Following this advice is wrong here, and shouldn't be fixed by ignoring
args altogether; the macro should properly fail if invoked with 0 or 2+
args.
Link: https://lkml.kernel.org/r/20250325235156.663269-1-jim.cromie@gmail.com
Link: https://lkml.kernel.org/r/20250325235156.663269-2-jim.cromie@gmail.com
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Acked-by: Joe Perches <joe@perches.com>
Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc:"Rafael J. Wysocki" <rafael@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Penglei Jiang [Fri, 4 Apr 2025 06:33:57 +0000 (23:33 -0700)]
proc: fix the issue of proc_mem_open returning NULL
proc_mem_open() can return an errno, NULL, or mm_struct*. If it fails to
acquire mm, it returns NULL, but the caller does not check for the case
when the return value is NULL.
The following conditions lead to failure in acquiring mm:
- The task is a kernel thread (PF_KTHREAD)
- The task is exiting (PF_EXITING)
Changes:
- Add documentation comments for the return value of proc_mem_open().
- Add checks in the caller to return -ESRCH when proc_mem_open()
returns NULL.
Link: https://lkml.kernel.org/r/20250404063357.78891-1-superman.xpt@gmail.com
Reported-by: syzbot+f9238a0a31f9b5603fef@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/
000000000000f52642060d4e3750@google.com
Signed-off-by: Penglei Jiang <superman.xpt@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Adrian Ratiu <adrian.ratiu@collabora.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Felix Moessbauer <felix.moessbauer@siemens.com>
Cc: Jeff layton <jlayton@kernel.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Chisheng Chen [Thu, 3 Apr 2025 11:26:14 +0000 (19:26 +0800)]
lib/rbtree.c: fix the example typo
Replace `sr` with `Sr`. The condition `!tmp1 || rb_is_black(tmp1)`
ensures that `tmp1` (which is `sibling->rb_right`) is either NULL or a
black node. Therefore, the right child of the sibling must be black, and
the example should use `Sr` instead of `sr`.
Link: https://lkml.kernel.org/r/20250403112614.570140-1-johnny1001s000602@gmail.com
Signed-off-by: Chisheng Chen <johnny1001s000602@gmail.com>
Cc: Hsin Chang Yu <zxcvb600870024@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pasha Tatashin [Thu, 3 Apr 2025 12:33:03 +0000 (14:33 +0200)]
task_stack.h: remove obsolete __HAVE_ARCH_KSTACK_END check
Remove __HAVE_ARCH_KSTACK_END as it has been obsolete since removal of
metag architecture in v4.17.
Link: https://lkml.kernel.org/r/20250403-kstack-end-v1-1-7798e71f34d1@linaro.org
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/20240311164638.2015063-2-pasha.tatashin@soleen.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhiquan Li [Thu, 3 Apr 2025 03:08:01 +0000 (11:08 +0800)]
crash: export PAGE_UNACCEPTED_MAPCOUNT_VALUE to vmcoreinfo
On Intel TDX guest, unaccepted memory is unusable free memory which is not
managed by buddy, until it's accepted by guest. Before that, it cannot be
accessed by the first kernel as well as the kexec'ed kernel. The kexec'ed
kernel will skip these pages and fill in zero data for the reader of
vmcore.
The dump tool like makedumpfile creates a page descriptor (size 24 bytes)
for each non-free page, including zero data page, but it will not create
descriptor for free pages. If it is not able to distinguish these
unaccepted pages with zero data pages, a certain amount of space will be
wasted in proportion (~1/170). In fact, as a special kind of free page
the unaccepted pages should be excluded, like the real free pages.
Export the page type PAGE_UNACCEPTED_MAPCOUNT_VALUE to vmcoreinfo, so that
dump tool can identify whether a page is unaccepted.
[zhiquan1.li@intel.com: fix docs: "Title underline too short" warning]
Link: https://lore.kernel.org/all/20240809114854.3745464-5-kirill.shutemov@linux.intel.com/
Link: https://lkml.kernel.org/r/20250405060610.860465-1-zhiquan1.li@intel.com
Link: https://lore.kernel.org/all/20240809114854.3745464-5-kirill.shutemov@linux.intel.com/
Link: https://lkml.kernel.org/r/20250403030801.758687-1-zhiquan1.li@intel.com
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Zhiquan Li <zhiquan1.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Francesco Valla [Sun, 16 Mar 2025 20:50:15 +0000 (21:50 +0100)]
init/main.c: log initcall level when initcall_debug is used
When initcall_debug is specified on the command line, the start and return
point for each initcall is printed. However, no information on the
initcall level is reported.
Add to the initcall_debug infrastructure an additional print that informs
when a new initcall level is entered. This is particularly useful when
debugging dependency chains and/or working on boot time reduction.
Link: https://lkml.kernel.org/r/20250316205014.2830071-2-francesco@valla.it
Signed-off-by: Francesco Valla <francesco@valla.it>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Bird <tim.bird@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrii Nakryiko [Wed, 2 Apr 2025 18:09:25 +0000 (11:09 -0700)]
exit: move and extend sched_process_exit() tracepoint
It is useful to be able to access current->mm at task exit to, say, record
a bunch of VMA information right before the task exits (e.g., for stack
symbolization reasons when dealing with short-lived processes that exit in
the middle of profiling session). Currently, trace_sched_process_exit()
is triggered after exit_mm() which resets current->mm to NULL making this
tracepoint unsuitable for inspecting and recording task's
mm_struct-related data when tracing process lifetimes.
There is a particularly suitable place, though, right after
taskstats_exit() is called, but before we do exit_mm() and other exit_*()
resource teardowns. taskstats performs a similar kind of accounting that
some applications do with BPF, and so co-locating them seems like a good
fit. So that's where trace_sched_process_exit() is moved with this patch.
Also, existing trace_sched_process_exit() tracepoint is notoriously
missing `group_dead` flag that is certainly useful in practice and some of
our production applications have to work around this. So plumb
`group_dead` through while at it, to have a richer and more complete
tracepoint.
Note that we can't use sched_process_template anymore, and so we use
TRACE_EVENT()-based tracepoint definition. But all the field names and
order, as well as assign and output logic remain intact. We just add one
extra field at the end in backwards-compatible way.
[andrii@kernel.org: document sched_process_exit and sched_process_template relation]
Link: https://lkml.kernel.org/r/20250403174120.4087794-1-andrii@kernel.org
Link: https://lkml.kernel.org/r/20250402180925.90914-1-andrii@kernel.org
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Sun, 11 May 2025 21:54:11 +0000 (14:54 -0700)]
Linux 6.15-rc6
Linus Torvalds [Sun, 11 May 2025 18:30:13 +0000 (11:30 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"ARM:
- Avoid use of uninitialized memcache pointer in user_mem_abort()
- Always set HCR_EL2.xMO bits when running in VHE, allowing
interrupts to be taken while TGE=0 and fixing an ugly bug on
AmpereOne that occurs when taking an interrupt while clearing the
xMO bits (AC03_CPU_36)
- Prevent VMMs from hiding support for AArch64 at any EL virtualized
by KVM
- Save/restore the host value for HCRX_EL2 instead of restoring an
incorrect fixed value
- Make host_stage2_set_owner_locked() check that the entire requested
range is memory rather than just the first page
RISC-V:
- Add missing reset of smstateen CSRs
x86:
- Forcibly leave SMM on SHUTDOWN interception on AMD CPUs to avoid
causing problems due to KVM stuffing INIT on SHUTDOWN (KVM needs to
sanitize the VMCB as its state is undefined after SHUTDOWN,
emulating INIT is the least awful choice).
- Track the valid sync/dirty fields in kvm_run as a u64 to ensure KVM
KVM doesn't goof a sanity check in the future.
- Free obsolete roots when (re)loading the MMU to fix a bug where
pre-faulting memory can get stuck due to always encountering a
stale root.
- When dumping GHCB state, use KVM's snapshot instead of the raw GHCB
page to print state, so that KVM doesn't print stale/wrong
information.
- When changing memory attributes (e.g. shared <=> private), add
potential hugepage ranges to the mmu_invalidate_range_{start,end}
set so that KVM doesn't create a shared/private hugepage when the
the corresponding attributes will become mixed (the attributes are
commited *after* KVM finishes the invalidation).
- Rework the SRSO mitigation to enable BP_SPEC_REDUCE only when KVM
has at least one active VM. Effectively BP_SPEC_REDUCE when KVM is
loaded led to very measurable performance regressions for non-KVM
workloads"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1 VM count transitions
KVM: arm64: Fix memory check in host_stage2_set_owner_locked()
KVM: arm64: Kill HCRX_HOST_FLAGS
KVM: arm64: Properly save/restore HCRX_EL2
KVM: arm64: selftest: Don't try to disable AArch64 support
KVM: arm64: Prevent userspace from disabling AArch64 support at any virtualisable EL
KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode
KVM: arm64: Fix uninitialized memcache pointer in user_mem_abort()
KVM: x86/mmu: Prevent installing hugepages when mem attributes are changing
KVM: SVM: Update dump_ghcb() to use the GHCB snapshot fields
KVM: RISC-V: reset smstateen CSRs
KVM: x86/mmu: Check and free obsolete roots in kvm_mmu_reload()
KVM: x86: Check that the high 32bits are clear in kvm_arch_vcpu_ioctl_run()
KVM: SVM: Forcibly leave SMM mode on SHUTDOWN interception
Linus Torvalds [Sun, 11 May 2025 18:19:52 +0000 (11:19 -0700)]
Merge tag 'mips-fixes_6.15_1' of git://git./linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- Fix delayed timers
- Fix NULL pointer deref
- Fix wrong range check
* tag 'mips-fixes_6.15_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Fix MAX_REG_OFFSET
MIPS: CPS: Fix potential NULL pointer dereferences in cps_prepare_cpus()
MIPS: rename rollback_handler with skipover_handler
MIPS: Move r4k_wait() to .cpuidle.text section
MIPS: Fix idle VS timer enqueue
Linus Torvalds [Sun, 11 May 2025 18:08:55 +0000 (11:08 -0700)]
Merge tag 'x86-urgent-2025-05-11' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
"Fix a boot regression on very old x86 CPUs without CPUID support"
* tag 'x86-urgent-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Consolidate the loader enablement checking
Linus Torvalds [Sun, 11 May 2025 17:33:25 +0000 (10:33 -0700)]
Merge tag 'timers-urgent-2025-05-11' of git://git./linux/kernel/git/tip/tip
Pull misc timers fixes from Ingo Molnar:
- Fix time keeping bugs in CLOCK_MONOTONIC_COARSE clocks
- Work around absolute relocations into vDSO code that GCC erroneously
emits in certain arm64 build environments
- Fix a false positive lockdep warning in the i8253 clocksource driver
* tag 'timers-urgent-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/i8253: Use raw_spinlock_irqsave() in clockevent_i8253_disable()
arm64: vdso: Work around invalid absolute relocations from GCC
timekeeping: Prevent coarse clocks going backwards
Linus Torvalds [Sun, 11 May 2025 17:29:29 +0000 (10:29 -0700)]
Merge tag 'input-for-v6.15-rc5' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- Synaptics touchpad on multiple laptops (Dynabook Portege X30L-G,
Dynabook Portege X30-D, TUXEDO InfinityBook Pro 14 v5, Dell Precision
M3800, HP Elitebook 850 G1) switched from PS/2 to SMBus mode
- a number of new controllers added to xpad driver: HORI Drum
controller, PowerA Fusion Pro 4, PowerA MOGA XP-Ultra controller,
8BitDo Ultimate 2 Wireless Controller, 8BitDo Ultimate 3-mode
Controller, Hyperkin DuchesS Xbox One controller
- fixes to xpad driver to properly handle Mad Catz JOYTECH NEO SE
Advanced and PDP Mirror's Edge Official controllers
- fixes to xpad driver to properly handle "Share" button on some
controllers
- a fix for device initialization timing and for waking up the
controller in cyttsp5 driver
- a fix for hisi_powerkey driver to properly wake up from s2idle state
- other assorted cleanups and fixes
* tag 'input-for-v6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: xpad - fix xpad_device sorting
Input: xpad - add support for several more controllers
Input: xpad - fix Share button on Xbox One controllers
Input: xpad - fix two controller table values
Input: hisi_powerkey - enable system-wakeup for s2idle
Input: synaptics - enable InterTouch on Dell Precision M3800
Input: synaptics - enable InterTouch on TUXEDO InfinityBook Pro 14 v5
Input: synaptics - enable InterTouch on Dynabook Portege X30L-G
Input: synaptics - enable InterTouch on Dynabook Portege X30-D
Input: synaptics - enable SMBus for HP Elitebook 850 G1
Input: mtk-pmic-keys - fix possible null pointer dereference
Input: xpad - add support for 8BitDo Ultimate 2 Wireless Controller
Input: cyttsp5 - fix power control issue on wakeup
MAINTAINERS: .mailmap: update Mattijs Korpershoek's email address
dt-bindings: mediatek,mt6779-keypad: Update Mattijs' email address
Input: stmpe-ts - use module alias instead of device table
Input: cyttsp5 - ensure minimum reset pulse width
Input: sparcspkr - avoid unannotated fall-through
input/joystick: magellan: Mark __nonstring look-up table
Linus Torvalds [Sun, 11 May 2025 17:23:53 +0000 (10:23 -0700)]
Merge tag 'fixes-2025-05-11' of git://git./linux/kernel/git/rppt/memblock
Pull memblock fixes from Mike Rapoport:
- Mark set_high_memory() as __init to fix section mismatch
- Accept memory allocated in memblock_double_array() to mitigate crash
of SNP guests
* tag 'fixes-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock: Accept allocated memory before use in memblock_double_array()
mm,mm_init: Mark set_high_memory as __init
Vicki Pfau [Sun, 11 May 2025 06:06:34 +0000 (23:06 -0700)]
Input: xpad - fix xpad_device sorting
A recent commit put one entry in the wrong place. This just moves it to the
right place.
Signed-off-by: Vicki Pfau <vi@endrift.com>
Link: https://lore.kernel.org/r/20250328234345.989761-5-vi@endrift.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Vicki Pfau [Sun, 11 May 2025 06:00:10 +0000 (23:00 -0700)]
Input: xpad - add support for several more controllers
This adds support for several new controllers, all of which include
Share buttons:
- HORI Drum controller
- PowerA Fusion Pro 4
- 8BitDo Ultimate 3-mode Controller
- Hyperkin DuchesS Xbox One controller
- PowerA MOGA XP-Ultra controller
Signed-off-by: Vicki Pfau <vi@endrift.com>
Link: https://lore.kernel.org/r/20250328234345.989761-4-vi@endrift.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Vicki Pfau [Sun, 11 May 2025 05:59:25 +0000 (22:59 -0700)]
Input: xpad - fix Share button on Xbox One controllers
The Share button, if present, is always one of two offsets from the end of the
file, depending on the presence of a specific interface. As we lack parsing for
the identify packet we can't automatically determine the presence of that
interface, but we can hardcode which of these offsets is correct for a given
controller.
More controllers are probably fixable by adding the MAP_SHARE_BUTTON in the
future, but for now I only added the ones that I have the ability to test
directly.
Signed-off-by: Vicki Pfau <vi@endrift.com>
Link: https://lore.kernel.org/r/20250328234345.989761-2-vi@endrift.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Vicki Pfau [Fri, 28 Mar 2025 23:43:36 +0000 (16:43 -0700)]
Input: xpad - fix two controller table values
Two controllers -- Mad Catz JOYTECH NEO SE Advanced and PDP Mirror's
Edge Official -- were missing the value of the mapping field, and thus
wouldn't detect properly.
Signed-off-by: Vicki Pfau <vi@endrift.com>
Link: https://lore.kernel.org/r/20250328234345.989761-1-vi@endrift.com
Fixes:
540602a43ae5 ("Input: xpad - add a few new VID/PID combinations")
Fixes:
3492321e2e60 ("Input: xpad - add multiple supported devices")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Ulf Hansson [Thu, 6 Mar 2025 11:50:21 +0000 (12:50 +0100)]
Input: hisi_powerkey - enable system-wakeup for s2idle
To wake up the system from s2idle when pressing the power-button, let's
convert from using pm_wakeup_event() to pm_wakeup_dev_event(), as it allows
us to specify the "hard" in-parameter, which needs to be set for s2idle.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20250306115021.797426-1-ulf.hansson@linaro.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Linus Torvalds [Sat, 10 May 2025 22:50:56 +0000 (15:50 -0700)]
Merge tag 'mm-hotfixes-stable-2025-05-10-14-23' of git://git./linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"22 hotfixes. 13 are cc:stable and the remainder address post-6.14
issues or aren't considered necessary for -stable kernels.
About half are for MM. Five OCFS2 fixes and a few MAINTAINERS updates"
* tag 'mm-hotfixes-stable-2025-05-10-14-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
mm: fix folio_pte_batch() on XEN PV
nilfs2: fix deadlock warnings caused by lock dependency in init_nilfs()
mm/hugetlb: copy the CMA flag when demoting
mm, swap: fix false warning for large allocation with !THP_SWAP
selftests/mm: fix a build failure on powerpc
selftests/mm: fix build break when compiling pkey_util.c
mm: vmalloc: support more granular vrealloc() sizing
tools/testing/selftests: fix guard region test tmpfs assumption
ocfs2: stop quota recovery before disabling quotas
ocfs2: implement handshaking with ocfs2 recovery thread
ocfs2: switch osb->disable_recovery to enum
mailmap: map Uwe's BayLibre addresses to a single one
MAINTAINERS: add mm THP section
mm/userfaultfd: fix uninitialized output field for -EAGAIN race
selftests/mm: compaction_test: support platform with huge mount of memory
MAINTAINERS: add core mm section
ocfs2: fix panic in failed foilio allocation
mm/huge_memory: fix dereferencing invalid pmd migration entry
MAINTAINERS: add reverse mapping section
x86: disable image size check for test builds
...
Linus Torvalds [Sat, 10 May 2025 16:53:11 +0000 (09:53 -0700)]
Merge tag 'driver-core-6.15-rc6' of git://git./linux/kernel/git/driver-core/driver-core
Pull driver core fix from Greg KH:
"Here is a single driver core fix for a regression for platform devices
that is a regression from a change that went into 6.15-rc1 that
affected Pixel devices. It has been in linux-next for over a week with
no reported problems"
* tag 'driver-core-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
platform: Fix race condition during DMA configure at IOMMU probe time
Linus Torvalds [Sat, 10 May 2025 16:18:05 +0000 (09:18 -0700)]
Merge tag 'usb-6.15-rc6' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB driver fixes for 6.15-rc6. Included in here
are:
- typec driver fixes
- usbtmc ioctl fixes
- xhci driver fixes
- cdnsp driver fixes
- some gadget driver fixes
Nothing really major, just all little stuff that people have reported
being issues. All of these have been in linux-next this week with no
reported issues"
* tag 'usb-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: dbc: Avoid event polling busyloop if pending rx transfers are inactive.
usb: xhci: Don't trust the EP Context cycle bit when moving HW dequeue
usb: usbtmc: Fix erroneous generic_read ioctl return
usb: usbtmc: Fix erroneous wait_srq ioctl return
usb: usbtmc: Fix erroneous get_stb ioctl error returns
usb: typec: tcpm: delay SNK_TRY_WAIT_DEBOUNCE to SRC_TRYWAIT transition
USB: usbtmc: use interruptible sleep in usbtmc_read
usb: cdnsp: fix L1 resume issue for RTL_REVISION_NEW_LPM version
usb: typec: ucsi: displayport: Fix NULL pointer access
usb: typec: ucsi: displayport: Fix deadlock
usb: misc: onboard_usb_dev: fix support for Cypress HX3 hubs
usb: uhci-platform: Make the clock really optional
usb: dwc3: gadget: Make gadget_wakeup asynchronous
usb: gadget: Use get_status callback to set remote wakeup capability
usb: gadget: f_ecm: Add get_status callback
usb: host: tegra: Prevent host controller crash when OTG port is used
usb: cdnsp: Fix issue with resuming from L1
usb: gadget: tegra-xudc: ACK ST_RC after clearing CTRL_RUN
Linus Torvalds [Sat, 10 May 2025 16:08:19 +0000 (09:08 -0700)]
Merge tag 'staging-6.15-rc6' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are three small staging driver fixes for 6.15-rc6. These are:
- bcm2835-camera driver fix
- two axis-fifo driver fixes
All of these have been in linux-next for a few weeks with no reported
issues"
* tag 'staging-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: axis-fifo: Remove hardware resets for user errors
staging: axis-fifo: Correct handling of tx_fifo_depth for size validation
staging: bcm2835-camera: Initialise dev in v4l2_dev
Linus Torvalds [Sat, 10 May 2025 15:55:15 +0000 (08:55 -0700)]
Merge tag 'char-misc-6.15-rc6' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc/IIO driver fixes from Greg KH:
"Here are a bunch of small driver fixes (mostly all IIO) for 6.15-rc6.
Included in here are:
- loads of tiny IIO driver fixes for reported issues
- hyperv driver fix for a much-reported and worked on sysfs ring
buffer creation bug
All of these have been in linux-next for over a week (the IIO ones for
many weeks now), with no reported issues"
* tag 'char-misc-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (30 commits)
Drivers: hv: Make the sysfs node size for the ring buffer dynamic
uio_hv_generic: Fix sysfs creation path for ring buffer
iio: adis16201: Correct inclinometer channel resolution
iio: adc: ad7606: fix serial register access
iio: pressure: mprls0025pa: use aligned_s64 for timestamp
iio: imu: adis16550: align buffers for timestamp
staging: iio: adc: ad7816: Correct conditional logic for store mode
iio: adc: ad7266: Fix potential timestamp alignment issue.
iio: adc: ad7768-1: Fix insufficient alignment of timestamp.
iio: adc: dln2: Use aligned_s64 for timestamp
iio: accel: adxl355: Make timestamp 64-bit aligned using aligned_s64
iio: temp: maxim-thermocouple: Fix potential lack of DMA safe buffer.
iio: chemical: pms7003: use aligned_s64 for timestamp
iio: chemical: sps30: use aligned_s64 for timestamp
iio: imu: inv_mpu6050: align buffer for timestamp
iio: imu: st_lsm6dsx: Fix wakeup source leaks on device unbind
iio: adc: qcom-spmi-iadc: Fix wakeup source leaks on device unbind
iio: accel: fxls8962af: Fix wakeup source leaks on device unbind
iio: adc: ad7380: fix event threshold shift
iio: hid-sensor-prox: Fix incorrect OFFSET calculation
...
Linus Torvalds [Sat, 10 May 2025 15:52:41 +0000 (08:52 -0700)]
Merge tag 'i2c-for-6.15-rc6' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- omap: use correct function to read from device tree
- MAINTAINERS: remove Seth from ISMT maintainership
* tag 'i2c-for-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
MAINTAINERS: Remove entry for Seth Heasley
i2c: omap: fix deprecated of_property_read_bool() use
Linus Torvalds [Sat, 10 May 2025 15:44:36 +0000 (08:44 -0700)]
Merge tag 'for-linus-6.15a-rc6-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- A fix for the xenbus driver allowing to use a PVH Dom0 with
Xenstore running in another domain
- A fix for the xenbus driver addressing a rare race condition
resulting in NULL dereferences and other problems
- A fix for the xen-swiotlb driver fixing a problem seen on Arm
platforms
* tag 'for-linus-6.15a-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xenbus: Use kref to track req lifetime
xenbus: Allow PVH dom0 a non-local xenstore
xen: swiotlb: Use swiotlb bouncing if kmalloc allocation demands it
Linus Torvalds [Sat, 10 May 2025 15:36:07 +0000 (08:36 -0700)]
Merge tag 'pull-fixes' of git://git./linux/kernel/git/viro/vfs
Pull mount fixes from Al Viro:
"A couple of races around legalize_mnt vs umount (both fairly old and
hard to hit) plus two bugs in move_mount(2) - both around 'move
detached subtree in place' logics"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix IS_MNT_PROPAGATING uses
do_move_mount(): don't leak MNTNS_PROPAGATING on failures
do_umount(): add missing barrier before refcount checks in sync case
__legitimize_mnt(): check for MNT_SYNC_UMOUNT should be under mount_lock
Paolo Bonzini [Sat, 10 May 2025 15:11:06 +0000 (11:11 -0400)]
Merge tag 'kvm-x86-fixes-6.15-rcN' of https://github.com/kvm-x86/linux into HEAD
KVM x86 fixes for 6.15-rcN
- Forcibly leave SMM on SHUTDOWN interception on AMD CPUs to avoid causing
problems due to KVM stuffing INIT on SHUTDOWN (KVM needs to sanitize the
VMCB as its state is undefined after SHUTDOWN, emulating INIT is the
least awful choice).
- Track the valid sync/dirty fields in kvm_run as a u64 to ensure KVM
KVM doesn't goof a sanity check in the future.
- Free obsolete roots when (re)loading the MMU to fix a bug where
pre-faulting memory can get stuck due to always encountering a stale
root.
- When dumping GHCB state, use KVM's snapshot instead of the raw GHCB page
to print state, so that KVM doesn't print stale/wrong information.
- When changing memory attributes (e.g. shared <=> private), add potential
hugepage ranges to the mmu_invalidate_range_{start,end} set so that KVM
doesn't create a shared/private hugepage when the the corresponding
attributes will become mixed (the attributes are commited *after* KVM
finishes the invalidation).
- Rework the SRSO mitigation to enable BP_SPEC_REDUCE only when KVM has at
least one active VM. Effectively BP_SPEC_REDUCE when KVM is loaded led
to very measurable performance regressions for non-KVM workloads.
Paolo Bonzini [Sat, 10 May 2025 15:10:02 +0000 (11:10 -0400)]
Merge tag 'kvmarm-fixes-6.15-3' of https://git./linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.15, round #3
- Avoid use of uninitialized memcache pointer in user_mem_abort()
- Always set HCR_EL2.xMO bits when running in VHE, allowing interrupts
to be taken while TGE=0 and fixing an ugly bug on AmpereOne that
occurs when taking an interrupt while clearing the xMO bits
(AC03_CPU_36)
- Prevent VMMs from hiding support for AArch64 at any EL virtualized by
KVM
- Save/restore the host value for HCRX_EL2 instead of restoring an
incorrect fixed value
- Make host_stage2_set_owner_locked() check that the entire requested
range is memory rather than just the first page
Paolo Bonzini [Sat, 10 May 2025 15:09:26 +0000 (11:09 -0400)]
Merge tag 'kvm-riscv-fixes-6.15-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv fixes for 6.15, take #1
- Add missing reset of smstateen CSRs
Wolfram Sang [Sat, 10 May 2025 09:41:13 +0000 (11:41 +0200)]
Merge tag 'i2c-host-fixes-6.15-rc6' of git://git./linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.15-rc6
- omap: use correct function to read from device tree
- MAINTAINERS: remove Seth from ISMT maintainership
Linus Torvalds [Fri, 9 May 2025 23:45:21 +0000 (16:45 -0700)]
Merge tag '6.15-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Fix dentry leak which can cause umount crash
- Add warning for parse contexts error on compounded operation
* tag '6.15-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: Avoid race in open_cached_dir with lease breaks
smb3 client: warn when parse contexts returns error on compounded operation
Al Viro [Thu, 8 May 2025 19:35:51 +0000 (15:35 -0400)]
fix IS_MNT_PROPAGATING uses
propagate_mnt() does not attach anything to mounts created during
propagate_mnt() itself. What's more, anything on ->mnt_slave_list
of such new mount must also be new, so we don't need to even look
there.
When move_mount() had been introduced, we've got an additional
class of mounts to skip - if we are moving from anon namespace,
we do not want to propagate to mounts we are moving (i.e. all
mounts in that anon namespace).
Unfortunately, the part about "everything on their ->mnt_slave_list
will also be ignorable" is not true - if we have propagation graph
A -> B -> C
and do OPEN_TREE_CLONE open_tree() of B, we get
A -> [B <-> B'] -> C
as propagation graph, where B' is a clone of B in our detached tree.
Making B private will result in
A -> B' -> C
C still gets propagation from A, as it would after making B private
if we hadn't done that open_tree(), but now the propagation goes
through B'. Trying to move_mount() our detached tree on subdirectory
in A should have
* moved B' on that subdirectory in A
* skipped the corresponding subdirectory in B' itself
* copied B' on the corresponding subdirectory in C.
As it is, the logics in propagation_next() and friends ends up
skipping propagation into C, since it doesn't consider anything
downstream of B'.
IOW, walking the propagation graph should only skip the ->mnt_slave_list
of new mounts; the only places where the check for "in that one
anon namespace" are applicable are propagate_one() (where we should
treat that as the same kind of thing as "mountpoint we are looking
at is not visible in the mount we are looking at") and
propagation_would_overmount(). The latter is better dealt with
in the caller (can_move_mount_beneath()); on the first call of
propagation_would_overmount() the test is always false, on the
second it is always true in "move from anon namespace" case and
always false in "move within our namespace" one, so it's easier
to just use check_mnt() before bothering with the second call and
be done with that.
Fixes:
064fe6e233e8 ("mount: handle mount propagation for detached mount trees")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 29 Apr 2025 01:43:23 +0000 (21:43 -0400)]
do_move_mount(): don't leak MNTNS_PROPAGATING on failures
as it is, a failed move_mount(2) from anon namespace breaks
all further propagation into that namespace, including normal
mounts in non-anon namespaces that would otherwise propagate
there.
Fixes:
064fe6e233e8 ("mount: handle mount propagation for detached mount trees")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 29 Apr 2025 03:56:14 +0000 (23:56 -0400)]
do_umount(): add missing barrier before refcount checks in sync case
do_umount() analogue of the race fixed in
119e1ef80ecf "fix
__legitimize_mnt()/mntput() race". Here we want to make sure that
if __legitimize_mnt() doesn't notice our lock_mount_hash(), we will
notice their refcount increment. Harder to hit than mntput_no_expire()
one, fortunately, and consequences are milder (sync umount acting
like umount -l on a rare race with RCU pathwalk hitting at just the
wrong time instead of use-after-free galore mntput_no_expire()
counterpart used to be hit). Still a bug...
Fixes:
48a066e72d97 ("RCU'd vfsmounts")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 27 Apr 2025 19:41:51 +0000 (15:41 -0400)]
__legitimize_mnt(): check for MNT_SYNC_UMOUNT should be under mount_lock
... or we risk stealing final mntput from sync umount - raising mnt_count
after umount(2) has verified that victim is not busy, but before it
has set MNT_SYNC_UMOUNT; in that case __legitimize_mnt() doesn't see
that it's safe to quietly undo mnt_count increment and leaves dropping
the reference to caller, where it'll be a full-blown mntput().
Check under mount_lock is needed; leaving the current one done before
taking that makes no sense - it's nowhere near common enough to bother
with.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Fri, 9 May 2025 21:06:34 +0000 (14:06 -0700)]
Merge tag 'rust-fixes-6.15-2' of git://git./linux/kernel/git/ojeda/linux
Pull rust fixes from Miguel Ojeda:
- Make CFI_AUTO_DEFAULT depend on !RUST or Rust >= 1.88.0
- Clean Rust (and Clippy) lints for the upcoming Rust 1.87.0 and 1.88.0
releases
- Clean objtool warning for the upcoming Rust 1.87.0 release by adding
one more noreturn function
* tag 'rust-fixes-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
x86/Kconfig: make CFI_AUTO_DEFAULT depend on !RUST or Rust >= 1.88
rust: clean Rust 1.88.0's `clippy::uninlined_format_args` lint
rust: clean Rust 1.88.0's warning about `clippy::disallowed_macros` configuration
rust: clean Rust 1.88.0's `unnecessary_transmutes` lint
rust: allow Rust 1.87.0's `clippy::ptr_eq` lint
objtool/rust: add one more `noreturn` Rust function for Rust 1.87.0
Linus Torvalds [Fri, 9 May 2025 19:41:34 +0000 (12:41 -0700)]
Merge tag 'drm-fixes-2025-05-10' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly drm fixes, bit bigger than last week, but overall amdgpu/xe
with some ivpu bits and a random few fixes, and dropping the
ttm_backup struct which wrapped struct file and was recently
frowned at.
drm:
- Fix overflow when generating wedged event
ttm:
- Fix documentation
- Remove struct ttm_backup
panel:
- simple: Fix timings for AUO G101EVN010
amdgpu:
- DC FP fixes
- Freesync fix
- DMUB AUX fixes
- VCN fix
- Hibernation fixes
- HDP fixes
xe:
- Prevent PF queue overflow
- Hold all forcewake during mocs test
- Remove GSC flush on reset path
- Fix forcewake put on error path
- Fix runtime warning when building without svm
i915:
- Fix oops on resume after disconnecting DP MST sinks during suspend
- Fix SPLC num_waiters refcounting
ivpu:
- Increase timeouts
- Fix deadlock in cmdq ioctl
- Unlock mutices in correct order
v3d:
- Avoid memory leak in job handling"
* tag 'drm-fixes-2025-05-10' of https://gitlab.freedesktop.org/drm/kernel: (32 commits)
drm/i915/dp: Fix determining SST/MST mode during MTP TU state computation
drm/xe: Add config control for svm flush work
drm/xe: Release force wake first then runtime power
drm/xe/gsc: do not flush the GSC worker from the reset path
drm/xe/tests/mocs: Hold XE_FORCEWAKE_ALL for LNCF regs
drm/xe: Add page queue multiplier
drm/amdgpu/hdp7: use memcfg register to post the write for HDP flush
drm/amdgpu/hdp6: use memcfg register to post the write for HDP flush
drm/amdgpu/hdp5.2: use memcfg register to post the write for HDP flush
drm/amdgpu/hdp5: use memcfg register to post the write for HDP flush
drm/amdgpu/hdp4: use memcfg register to post the write for HDP flush
drm/amdgpu: fix pm notifier handling
Revert "drm/amd: Stop evicting resources on APUs in suspend"
drm/amdgpu/vcn: using separate VCN1_AON_SOC offset
drm/amd/display: Fix wrong handling for AUX_DEFER case
drm/amd/display: Copy AUX read reply data whenever length > 0
drm/amd/display: Remove incorrect checking in dmub aux handler
drm/amd/display: Fix the checking condition in dmub aux handling
drm/amd/display: Shift DMUB AUX reply command if necessary
drm/amd/display: Call FP Protect Before Mode Programming/Mode Support
...
Dave Airlie [Fri, 9 May 2025 19:07:17 +0000 (05:07 +1000)]
Merge tag 'drm-intel-fixes-2025-05-09' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
drm/i915 fixes for v6.15-rc6:
- Fix oops on resume after disconnecting DP MST sinks during suspend
- Fix SPLC num_waiters refcounting
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/87tt5umeaw.fsf@intel.com
Dave Airlie [Fri, 9 May 2025 19:02:38 +0000 (05:02 +1000)]
Merge tag 'drm-xe-fixes-2025-05-09' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Prevent PF queue overflow
- Hold all forcewake during mocs test
- Remove GSC flush on reset path
- Fix forcewake put on error path
- Fix runtime warning when building without svm
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/jffqa56f2zp4i5ztz677cdspgxhnw7qfop3dd3l2epykfpfvza@q2nw6wapsphz
Linus Torvalds [Fri, 9 May 2025 18:30:26 +0000 (11:30 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Move the arm64_use_ng_mappings variable from the .bss to the .data
section as it is accessed very early during boot with the MMU off and
before the .bss has been initialised.
This could lead to incorrect idmap page table"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: cpufeature: Move arm64_use_ng_mappings to the .data section to prevent wrong idmap generation
Linus Torvalds [Fri, 9 May 2025 18:17:50 +0000 (11:17 -0700)]
Merge tag 'riscv-for-linus-6.15-rc6' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- The compressed half-word misaligned access instructions (c.lhu, c.lh,
and c.sh) from the Zcb extension are now properly emulated
- A series of fixes to properly emulate permissions while handling
userspace misaligned accesses
- A pair of fixes for PR_GET_TAGGED_ADDR_CTRL to avoid accessing the
envcfg CSR on systems that don't support that CSR, and to report
those failures up to userspace
- The .rela.dyn section is no longer stripped from vmlinux, as it is
necessary to relocate the kernel under some conditions (including
kexec)
* tag 'riscv-for-linus-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Disallow PR_GET_TAGGED_ADDR_CTRL without Supm
scripts: Do not strip .rela.dyn section
riscv: Fix kernel crash due to PR_SET_TAGGED_ADDR_CTRL
riscv: misaligned: use get_user() instead of __get_user()
riscv: misaligned: enable IRQs while handling misaligned accesses
riscv: misaligned: factorize trap handling
riscv: misaligned: Add handling for ZCB instructions
Linus Torvalds [Fri, 9 May 2025 17:34:50 +0000 (10:34 -0700)]
Merge tag 'block-6.15-
20250509' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Fix for a regression in this series for loop and read/write iterator
handling
- zone append block update tweak
- remove a broken IO priority test
- NVMe pull request via Christoph:
- unblock ctrl state transition for firmware update (Daniel
Wagner)
* tag 'block-6.15-
20250509' of git://git.kernel.dk/linux:
block: remove test of incorrect io priority level
nvme: unblock ctrl state transition for firmware update
block: only update request sector if needed
loop: Add sanity check for read/write_iter
Linus Torvalds [Fri, 9 May 2025 16:26:46 +0000 (09:26 -0700)]
Merge tag 'io_uring-6.15-
20250509' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Fix for linked timeouts arming and firing wrt prep and issue of the
request being managed by the linked timeout
- Fix for a CQE ordering issue between requests with multishot and
using the same buffer group. This is a dumbed down version for this
release and for stable, it'll get improved for v6.16
- Tweak the SQPOLL submit batch size. A previous commit made SQPOLL
manage its own task_work and chose a tiny batch size, bump it from 8
to 32 to fix a performance regression due to that
* tag 'io_uring-6.15-
20250509' of git://git.kernel.dk/linux:
io_uring/sqpoll: Increase task_work submission batch size
io_uring: ensure deferred completions are flushed for multishot
io_uring: always arm linked timeouts prior to issue
Linus Torvalds [Fri, 9 May 2025 16:09:49 +0000 (09:09 -0700)]
Merge tag 'modules-6.15-rc6' of git://git./linux/kernel/git/modules/linux
Pull modules fix from Petr Pavlu:
"A single fix to prevent use of an uninitialized completion pointer
when releasing a module_kobject in specific situations.
This addresses a latent bug exposed by commit
f95bbfe18512 ("drivers:
base: handle module_kobject creation")"
* tag 'modules-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
module: ensure that kobject_put() is safe for module type kobjects
Dave Hansen [Thu, 8 May 2025 22:41:32 +0000 (15:41 -0700)]
x86/mm: Eliminate window where TLB flushes may be inadvertently skipped
tl;dr: There is a window in the mm switching code where the new CR3 is
set and the CPU should be getting TLB flushes for the new mm. But
should_flush_tlb() has a bug and suppresses the flush. Fix it by
widening the window where should_flush_tlb() sends an IPI.
Long Version:
=== History ===
There were a few things leading up to this.
First, updating mm_cpumask() was observed to be too expensive, so it was
made lazier. But being lazy caused too many unnecessary IPIs to CPUs
due to the now-lazy mm_cpumask(). So code was added to cull
mm_cpumask() periodically[2]. But that culling was a bit too aggressive
and skipped sending TLB flushes to CPUs that need them. So here we are
again.
=== Problem ===
The too-aggressive code in should_flush_tlb() strikes in this window:
// Turn on IPIs for this CPU/mm combination, but only
// if should_flush_tlb() agrees:
cpumask_set_cpu(cpu, mm_cpumask(next));
next_tlb_gen = atomic64_read(&next->context.tlb_gen);
choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
load_new_mm_cr3(need_flush);
// ^ After 'need_flush' is set to false, IPIs *MUST*
// be sent to this CPU and not be ignored.
this_cpu_write(cpu_tlbstate.loaded_mm, next);
// ^ Not until this point does should_flush_tlb()
// become true!
should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3()
and writing to 'loaded_mm', which is a window where they should not be
suppressed. Whoops.
=== Solution ===
Thankfully, the fuzzy "just about to write CR3" window is already marked
with loaded_mm==LOADED_MM_SWITCHING. Simply checking for that state in
should_flush_tlb() is sufficient to ensure that the CPU is targeted with
an IPI.
This will cause more TLB flush IPIs. But the window is relatively small
and I do not expect this to cause any kind of measurable performance
impact.
Update the comment where LOADED_MM_SWITCHING is written since it grew
yet another user.
Peter Z also raised a concern that should_flush_tlb() might not observe
'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off()
writes them. Add a barrier to ensure that they are observed in the
order they are written.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/
Fixes:
6db2526c1d69 ("x86/mm/tlb: Only trim the mm_cpumask once a second") [2]
Reported-by: Stephen Dolan <sdolan@janestreet.com>
Cc: stable@vger.kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gabriel Krisman Bertazi [Thu, 8 May 2025 18:12:03 +0000 (14:12 -0400)]
io_uring/sqpoll: Increase task_work submission batch size
Our QA team reported a 10%-23%, throughput reduction on an io_uring
sqpoll testcase doing IO to a null_blk, that I traced back to a
reduction of the device submission queue depth utilization. It turns out
that, after commit
af5d68f8892f ("io_uring/sqpoll: manage task_work
privately"), we capped the number of task_work entries that can be
completed from a single spin of sqpoll to only 8 entries, before the
sqpoll goes around to (potentially) sleep. While this cap doesn't drive
the submission side directly, it impacts the completion behavior, which
affects the number of IO queued by fio per sqpoll cycle on the
submission side, and io_uring ends up seeing less ios per sqpoll cycle.
As a result, block layer plugging is less effective, and we see more
time spent inside the block layer in profilings charts, and increased
submission latency measured by fio.
There are other places that have increased overhead once sqpoll sleeps
more often, such as the sqpoll utilization calculation. But, in this
microbenchmark, those were not representative enough in perf charts, and
their removal didn't yield measurable changes in throughput. The major
overhead comes from the fact we plug less, and less often, when submitting
to the block layer.
My benchmark is:
fio --ioengine=io_uring --direct=1 --iodepth=128 --runtime=300 --bs=4k \
--invalidate=1 --time_based --ramp_time=10 --group_reporting=1 \
--filename=/dev/nullb0 --name=RandomReads-direct-nullb-sqpoll-4k-1 \
--rw=randread --numjobs=1 --sqthread_poll
In one machine, tested on top of Linux 6.15-rc1, we have the following
baseline:
READ: bw=4994MiB/s (5236MB/s), 4994MiB/s-4994MiB/s (5236MB/s-5236MB/s), io=439GiB (471GB), run=90001-90001msec
With this patch:
READ: bw=5762MiB/s (6042MB/s), 5762MiB/s-5762MiB/s (6042MB/s-6042MB/s), io=506GiB (544GB), run=90001-90001msec
which is a 15% improvement in measured bandwidth. The average
submission latency is noticeably lowered too. As measured by
fio:
Baseline:
lat (usec): min=20, max=241, avg=99.81, stdev=3.38
Patched:
lat (usec): min=26, max=226, avg=86.48, stdev=4.82
If we look at blktrace, we can also see the plugging behavior is
improved. In the baseline, we end up limited to plugging 8 requests in
the block layer regardless of the device queue depth size, while after
patching we can drive more io, and we manage to utilize the full device
queue.
In the baseline, after a stabilization phase, an ordinary submission
looks like:
254,0 1 49942 0.
016028795 5977 U N [iou-sqp-5976] 7
After patching, I see consistently more requests per unplug.
254,0 1 4996 0.
001432872 3145 U N [iou-sqp-3144] 32
Ideally, the cap size would at least be the deep enough to fill the
device queue, but we can't predict that behavior, or assume all IO goes
to a single device, and thus can't guess the ideal batch size. We also
don't want to let the tw run unbounded, though I'm not sure it would
really be a problem. Instead, let's just give it a more sensible value
that will allow for more efficient batching. I've tested with different
cap values, and initially proposed to increase the cap to 1024. Jens
argued it is too big of a bump and I observed that, with 32, I'm no
longer able to observe this bottleneck in any of my machines.
Fixes:
af5d68f8892f ("io_uring/sqpoll: manage task_work privately")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20250508181203.3785544-1-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Imre Deak [Wed, 7 May 2025 15:19:53 +0000 (18:19 +0300)]
drm/i915/dp: Fix determining SST/MST mode during MTP TU state computation
Determining the SST/MST mode during state computation must be done based
on the output type stored in the CRTC state, which in turn is set once
based on the modeset connector's SST vs. MST type and will not change as
long as the connector is using the CRTC. OTOH the MST mode indicated by
the given connector's intel_dp::is_mst flag can change independently of
the above output type, based on what sink is at any moment plugged to
the connector.
Fix the state computation accordingly.
Cc: Jani Nikula <jani.nikula@intel.com>
Fixes:
f6971d7427c2 ("drm/i915/mst: adapt intel_dp_mtp_tu_compute_config() for 128b/132b SST")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4607
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://lore.kernel.org/r/20250507151953.251846-1-imre.deak@intel.com
(cherry picked from commit
0f45696ddb2b901fbf15cb8d2e89767be481d59f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Tom Lendacky [Thu, 8 May 2025 17:24:10 +0000 (12:24 -0500)]
memblock: Accept allocated memory before use in memblock_double_array()
When increasing the array size in memblock_double_array() and the slab
is not yet available, a call to memblock_find_in_range() is used to
reserve/allocate memory. However, the range returned may not have been
accepted, which can result in a crash when booting an SNP guest:
RIP: 0010:memcpy_orig+0x68/0x130
Code: ...
RSP: 0000:
ffffffff9cc03ce8 EFLAGS:
00010006
RAX:
ff11001ff83e5000 RBX:
0000000000000000 RCX:
fffffffffffff000
RDX:
0000000000000bc0 RSI:
ffffffff9dba8860 RDI:
ff11001ff83e5c00
RBP:
0000000000002000 R08:
0000000000000000 R09:
0000000000002000
R10:
000000207fffe000 R11:
0000040000000000 R12:
ffffffff9d06ef78
R13:
ff11001ff83e5000 R14:
ffffffff9dba7c60 R15:
0000000000000c00
memblock_double_array+0xff/0x310
memblock_add_range+0x1fb/0x2f0
memblock_reserve+0x4f/0xa0
memblock_alloc_range_nid+0xac/0x130
memblock_alloc_internal+0x53/0xc0
memblock_alloc_try_nid+0x3d/0xa0
swiotlb_init_remap+0x149/0x2f0
mem_init+0xb/0xb0
mm_core_init+0x8f/0x350
start_kernel+0x17e/0x5d0
x86_64_start_reservations+0x14/0x30
x86_64_start_kernel+0x92/0xa0
secondary_startup_64_no_verify+0x194/0x19b
Mitigate this by calling accept_memory() on the memory range returned
before the slab is available.
Prior to v6.12, the accept_memory() interface used a 'start' and 'end'
parameter instead of 'start' and 'size', therefore the accept_memory()
call must be adjusted to specify 'start + size' for 'end' when applying
to kernels prior to v6.12.
Cc: stable@vger.kernel.org # see patch description, needs adjustments for <= 6.11
Fixes:
dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/da1ac73bf4ded761e21b4e4bb5178382a580cd73.1746725050.git.thomas.lendacky@amd.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Dave Airlie [Fri, 9 May 2025 01:10:41 +0000 (11:10 +1000)]
Merge tag 'amd-drm-fixes-6.15-2025-05-08' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.15-2025-05-08:
amdgpu:
- DC FP fixes
- Freesync fix
- DMUB AUX fixes
- VCN fix
- Hibernation fixes
- HDP fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250508194102.3242372-1-alexander.deucher@amd.com
Dave Airlie [Thu, 8 May 2025 22:51:57 +0000 (08:51 +1000)]
Merge tag 'drm-misc-fixes-2025-05-08' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:
drm:
- Fix overflow when generating wedged event
ivpu:
- Increate timeouts
- Fix deadlock in cmdq ioctl
- Unlock mutices in correct order
panel:
- simple: Fix timings for AUO G101EVN010
ttm:
- Fix documentation
- Remove struct ttm_backup
v3d:
- Avoid memory leak in job handling
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250508104939.GA76697@2a02-2454-fd5e-fd00-c110-cbf2-6528-c5be.dyn6.pyur.net
Linus Torvalds [Thu, 8 May 2025 21:28:49 +0000 (14:28 -0700)]
Merge tag 'bcachefs-2025-05-08' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
- Some fixes to help with filesystem analysis: ensure superblock
error count gets written if we go ERO, don't discard the journal
aggressively (so it's available for list_journal -a)
- Fix lost wakeup on arm causing us to get stuck when reading btree
nodes
- Fix fsck failing to exit on ctrl-c
- An additional fix for filesystems with misaligned bucket sizes: we
now ensure that allocations are properly aligned
- Setting background target but not promote target will now leave that
data cached on the foreground target, as it used to
- Revert a change to when we allocate the VFS superblock, this was done
for implementing blk_holder_ops but ended up not being needed, and
allocating a superblock and not setting SB_BORN while we do recovery
caused sync() calls and other things to hang
- Assorted fixes for harmless error messages that caused concern to
users
* tag 'bcachefs-2025-05-08' of git://evilpiepirate.org/bcachefs:
bcachefs: Don't aggressively discard the journal
bcachefs: Ensure superblock gets written when we go ERO
bcachefs: Filter out harmless EROFS error messages
bcachefs: journal_shutdown is EROFS, not EIO
bcachefs: Call bch2_fs_start before getting vfs superblock
bcachefs: fix hung task timeout in journal read
bcachefs: Add missing barriers before wake_up_bit()
bcachefs: Ensure proper write alignment
bcachefs: Improve want_cached_ptr()
bcachefs: thread_with_stdio: fix spinning instead of exiting
Shuicheng Lin [Fri, 2 May 2025 17:00:52 +0000 (17:00 +0000)]
drm/xe: Add config control for svm flush work
Without CONFIG_DRM_XE_GPUSVM set, GPU SVM is not initialized thus below
warning pops. Refine the flush work code to be controlled by the config
to avoid below warning:
"
[ 453.132028] ------------[ cut here ]------------
[ 453.132527] WARNING: CPU: 9 PID: 4491 at kernel/workqueue.c:4205 __flush_work+0x379/0x3a0
[ 453.133355] Modules linked in: xe drm_ttm_helper ttm gpu_sched drm_buddy drm_suballoc_helper drm_gpuvm drm_exec
[ 453.134352] CPU: 9 UID: 0 PID: 4491 Comm: xe_exec_mix_mod Tainted: G U W 6.15.0-rc3+ #7 PREEMPT(full)
[ 453.135405] Tainted: [U]=USER, [W]=WARN
...
[ 453.136921] RIP: 0010:__flush_work+0x379/0x3a0
[ 453.137417] Code: 8b 45 00 48 8b 55 08 89 c7 48 c1 e8 04 83 e7 08 83 e0 0f 83 cf 02 89 c6 48 0f ba 6d 00 03 e9 d5 fe ff ff 0f 0b e9 db fd ff ff <0f> 0b 45 31 e4 e9 d1 fd ff ff 0f 0b e9 03 ff ff ff 0f 0b e9 d6 fe
[ 453.139250] RSP: 0018:
ffffc90000c67b18 EFLAGS:
00010246
[ 453.139782] RAX:
0000000000000000 RBX:
ffff888108a24000 RCX:
0000000000002000
[ 453.140521] RDX:
0000000000000001 RSI:
0000000000000000 RDI:
ffff8881016d61c8
[ 453.141253] RBP:
ffff8881016d61c8 R08:
0000000000000000 R09:
0000000000000000
[ 453.141985] R10:
0000000000000000 R11:
0000000008a24000 R12:
0000000000000001
[ 453.142709] R13:
0000000000000002 R14:
0000000000000000 R15:
ffff888107db8c00
[ 453.143450] FS:
00007f44853d4c80(0000) GS:
ffff8882f469b000(0000) knlGS:
0000000000000000
[ 453.144276] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 453.144853] CR2:
00007f4487629228 CR3:
00000001016aa000 CR4:
00000000000406f0
[ 453.145594] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 453.146320] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 453.147061] Call Trace:
[ 453.147336] <TASK>
[ 453.147579] ? tick_nohz_tick_stopped+0xd/0x30
[ 453.148067] ? xas_load+0x9/0xb0
[ 453.148435] ? xa_load+0x6f/0xb0
[ 453.148781] __xe_vm_bind_ioctl+0xbd5/0x1500 [xe]
[ 453.149338] ? dev_printk_emit+0x48/0x70
[ 453.149762] ? _dev_printk+0x57/0x80
[ 453.150148] ? drm_ioctl+0x17c/0x440
[ 453.150544] ? __drm_dev_vprintk+0x36/0x90
[ 453.150983] ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
[ 453.151575] ? drm_ioctl_kernel+0x9f/0xf0
[ 453.151998] ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
[ 453.152560] drm_ioctl_kernel+0x9f/0xf0
[ 453.152968] drm_ioctl+0x20f/0x440
[ 453.153332] ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
[ 453.153893] ? ioctl_has_perm.constprop.0.isra.0+0xae/0x100
[ 453.154489] ? memory_bm_test_bit+0x5/0x60
[ 453.154935] xe_drm_ioctl+0x47/0x70 [xe]
[ 453.155419] __x64_sys_ioctl+0x8d/0xc0
[ 453.155824] do_syscall_64+0x47/0x110
[ 453.156228] entry_SYSCALL_64_after_hwframe+0x76/0x7e
"
v2 (Matt):
refine commit message to have more details
add Fixes tag
move the code to xe_svm.h which already have the config
remove a blank line per codestyle suggestion
Fixes:
63f6e480d115 ("drm/xe: Add SVM garbage collector")
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250502170052.1787973-1-shuicheng.lin@intel.com
(cherry picked from commit
9d80698bcd97a5ad1088bcbb055e73fd068895e2)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Shuicheng Lin [Wed, 7 May 2025 02:23:02 +0000 (02:23 +0000)]
drm/xe: Release force wake first then runtime power
xe_force_wake_get() is dependent on xe_pm_runtime_get(), so for
the release path, xe_force_wake_put() should be called first then
xe_pm_runtime_put().
Combine the error path and normal path together with goto.
Fixes:
85d547608ef5 ("drm/xe/xe_gt_debugfs: Update handling of xe_force_wake_get return")
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250507022302.2187527-1-shuicheng.lin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit
432cd94efdca06296cc5e76d673546f58aa90ee1)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>