Jens Axboe [Wed, 18 Jun 2025 12:44:03 +0000 (06:44 -0600)]
io_uring/fdinfo: add some defer tw ring debugging
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 18 Jun 2025 11:04:09 +0000 (05:04 -0600)]
io_uring: switch defer task_work to using a ring
Currently deferred task_work uses an llist for entries. Those lists are
LIFO, and hence are reversed when the work is run, to retain ordering.
Store the pending task_work in a multi-producer/single-consumer ring,
which preserves the ordering.
WIP - doesn't handle overflow of the defer tw ring.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 21 Nov 2024 15:27:10 +0000 (08:27 -0700)]
io_uring: make task_work pending check dependent on ring type
There's no need to check for generic task_work for DEFER_TASKRUN, if we
have local task_work pending. This avoids dipping into the huge
task_struct, if we have normal task_work pending.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 18 Jun 2025 11:12:31 +0000 (05:12 -0600)]
Merge branch 'for-6.17/io_uring' into io_uring-defer-tw.4
* for-6.17/io_uring:
io_uring/nop: add IORING_NOP_TW completion flag
io_uring/uring_cmd: implement ->sqe_copy() to avoid unnecessary copies
io_uring/uring_cmd: get rid of io_uring_cmd_prep_setup()
io_uring: add struct io_cold_def->sqe_copy() method
io_uring: add IO_URING_F_INLINE issue flag
Jens Axboe [Wed, 18 Jun 2025 11:12:28 +0000 (05:12 -0600)]
Merge branch 'io_uring-6.16' into io_uring-defer-tw.4
* io_uring-6.16:
io_uring: fix potential page leak in io_sqe_buffer_register()
io_uring/sqpoll: don't put task_struct on tctx setup failure
io_uring: remove duplicate io_uring_alloc_task_context() definition
io_uring: fix task leak issue in io_wq_create()
io_uring/rsrc: validate buffer count with offset for cloning
Penglei Jiang [Tue, 17 Jun 2025 16:56:44 +0000 (09:56 -0700)]
io_uring: fix potential page leak in io_sqe_buffer_register()
If allocation of the 'imu' fails, then the existing pages aren't
unpinned in the error path. This is mostly a theoretical issue,
requiring fault injection to hit.
Move unpin_user_pages() to unified error handling to fix the page leak
issue.
Fixes:
d8c2237d0aa9 ("io_uring: add io_pin_pages() helper")
Signed-off-by: Penglei Jiang <superman.xpt@gmail.com>
Link: https://lore.kernel.org/r/20250617165644.79165-1-superman.xpt@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Tue, 17 Jun 2025 21:58:52 +0000 (14:58 -0700)]
Merge tag 'libnvdimm-fixes-6.16-rc3' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fix from Ira Weiny:
"This converts the pmem-region device tree bindings to YAML to fix
errors and bring it up to date"
* tag 'libnvdimm-fixes-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dt-bindings: pmem: Convert binding to YAML
Linus Torvalds [Tue, 17 Jun 2025 18:31:53 +0000 (11:31 -0700)]
Merge tag 'platform-drivers-x86-v6.16-2' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
- amd/hsmp: Timeout handling fixes
- amd/pmc:
- Clear metrics table at start of cycle
- Add PCSpecialist Lafite Pro V 14M to 8042 quirks list
- amd/pmf: Fix error handling corner cases (nth attempt)
- alienware-wmi-wmax: Revert G-Mode support as it lowers performance
- dell_rbu:
- Fix sparse lock context warning
- Fix list head usage
- Don't overwrite data buffer past the size of the last packet
- ideapad-laptop: Ensure EC is not polled too frequently
- intel-uncore-freq:
- Fail module load when plat_info is NULL
- Avoid a non-literal format string as it triggers a compiler warning
- intel/pmc: Add Lunar Lake and Panther Lake support to SSRAM Telemetry
- intel/power-domains: Fix error code in tpmi_init()
- samsung-galaxybook: Add support for Notebook 9 Pro and others
(SAM0426)
* tag 'platform-drivers-x86-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
Revert "platform/x86: alienware-wmi-wmax: Add G-Mode support to Alienware m16 R1"
platform/x86/amd/pmc: Add PCSpecialist Lafite Pro V 14M to 8042 quirks list
platform/x86/intel-uncore-freq: avoid non-literal format string
platform/x86/intel/pmc: Add Panther Lake support to Intel PMC SSRAM Telemetry
platform/x86/intel/pmc: Add Lunar Lake support to Intel PMC SSRAM Telemetry
MAINTAINERS: .mailmap: Update Hans de Goede's email address
platform/x86: dell_rbu: Bump version
platform/x86: dell_rbu: Stop overwriting data buffer
platform/x86: dell_rbu: Fix list usage
platform/x86: dell_rbu: Fix lock context warning
platform/x86/amd: pmf: Simplify error flow in amd_pmf_init_smart_pc()
platform/x86/amd: pmf: Prevent amd_pmf_tee_deinit() from running twice
platform/x86/amd: pmf: Use device managed allocations
x86/platform/amd: replace down_timeout() with down_interruptible()
x86/platform/amd: move final timeout check to after final sleep
platform/x86/amd: pmc: Clear metrics table at start of cycle
platform/x86/intel: power-domains: Fix error code in tpmi_init()
platform/x86: samsung-galaxybook: Add SAM0426
platform/x86/intel-uncore-freq: Fail module load when plat_info is NULL
platform/x86: ideapad-laptop: use usleep_range() for EC polling
Jens Axboe [Tue, 17 Jun 2025 12:43:18 +0000 (06:43 -0600)]
io_uring/sqpoll: don't put task_struct on tctx setup failure
A recent commit moved the error handling of sqpoll thread and tctx
failures into the thread itself, as part of fixing an issue. However, it
missed that tctx allocation may also fail, and that
io_sq_offload_create() does its own error handling for the task_struct
in that case.
Remove the manual task putting in io_sq_offload_create(), as
io_sq_thread() will notice that the tctx did not get setup and hence it
should put itself and exit.
Reported-by: syzbot+763e12bbf004fb1062e4@syzkaller.appspotmail.com
Fixes:
ac0b8b327a56 ("io_uring: fix use-after-free of sq->thread in __io_uring_show_fdinfo()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 17 Jun 2025 12:41:48 +0000 (06:41 -0600)]
io_uring: remove duplicate io_uring_alloc_task_context() definition
This function exists in both tctx.h (where it belongs) and in io_uring.h
as a remnant of before the tctx handling code got split out. Remove the
io_uring.h definition and ensure that sqpoll.c includes the tctx.h
header to get the definition.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Mon, 16 Jun 2025 18:36:21 +0000 (11:36 -0700)]
Merge tag 'x86_urgent_for_6.16-rc3' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Dave Hansen:
"This is a pretty scattered set of fixes. The majority of them are
further fixups around the recent ITS mitigations.
The rest don't really have a coherent story:
- Some flavors of Xen PV guests don't support large pages, but the
set_memory.c code assumes all CPUs support them.
Avoid problems with a quick CPU feature check.
- The TDX code has some wrappers to help retry calls to the TDX
module. They use function pointers to assembly functions and the
compiler usually generates direct CALLs. But some new compilers,
plus -Os turned them in to indirect CALLs and the assembly code was
not annotated for indirect calls.
Force inlining of the helper to fix it up.
- Last, a FRED issue showed up when single-stepping. It's fine when
using an external debugger, but was getting stuck returning from a
SIGTRAP handler otherwise.
Clear the FRED 'swevent' bit to ensure that forward progress is
made"
* tag 'x86_urgent_for_6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "mm/execmem: Unify early execmem_cache behaviour"
x86/its: explicitly manage permissions for ITS pages
x86/its: move its_pages array to struct mod_arch_specific
x86/Kconfig: only enable ROX cache in execmem when STRICT_MODULE_RWX is set
x86/mm/pat: don't collapse pages without PSE set
x86/virt/tdx: Avoid indirect calls to TDX assembly functions
selftests/x86: Add a test to detect infinite SIGTRAP handler loop
x86/fred/signal: Prevent immediate repeat of single step trap on return from SIGTRAP handler
Linus Torvalds [Mon, 16 Jun 2025 15:49:58 +0000 (08:49 -0700)]
Merge tag 'powerpc-6.16-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Madhavan Srinivasan:
- Fix to handle VDSO32 with pcrel
- Couple of dts fixes in microwatt and mpc8315erdb
- Fix to handle PE bridge reconfiguration in VFIO EEH recovery path
- Fix ioctl macros related to struct termio
Thanks to Christophe Leroy, Ganesh Goudar, J. Neuschäfer, Justin M.
Forbes, Michael Ellerman, Narayana Murty N, Tulio Magno, and Vaibhav
Jain
* tag 'powerpc-6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Fix struct termio related ioctl macros
powerpc: dts: mpc8315erdb: Add GPIO controller node
powerpc/microwatt: Fix model property in device tree
powerpc/eeh: Fix missing PE bridge reconfiguration during VFIO EEH recovery
powerpc/vdso: Fix build of VDSO32 with pcrel
Linus Torvalds [Mon, 16 Jun 2025 15:18:43 +0000 (08:18 -0700)]
Merge tag 'vfs-6.16-rc3.fixes' of git://git./linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix a regression in overlayfs caused by reworking the lookup_one*()
set of helpers
- Make sure that the name of the dentry is printed in overlayfs'
mkdir() helper
- Add missing iocb values to TRACE_IOCB_STRINGS define
- Unlock the superblock during iterate_supers_type(). This was an
accidental internal api change
- Drop a misleading assert in file_seek_cur_needs_f_lock() helper
- Never refuse to return PIDFD_GET_INGO when parent pid is zero
That can trivially happen in container scenarios where the parent
process might be located in an ancestor pid namespace
- Don't revalidate in try_lookup_noperm() as that causes regression for
filesystems such as cifs
- Fix simple_xattr_list() and reset the err variable after
security_inode_listsecurity() got called so as not to confuse
userspace about the length of the xattr
* tag 'vfs-6.16-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: drop assert in file_seek_cur_needs_f_lock
fs: unlock the superblock during iterate_supers_type
ovl: fix debug print in case of mkdir error
VFS: change try_lookup_noperm() to skip revalidation
fs: add missing values to TRACE_IOCB_STRINGS
fs/xattr.c: fix simple_xattr_list()
ovl: fix regression caused by lookup helpers API changes
pidfs: never refuse ppid == 0 in PIDFD_GET_INFO
Luis Henriques [Fri, 13 Jun 2025 10:11:11 +0000 (11:11 +0100)]
fs: drop assert in file_seek_cur_needs_f_lock
The assert in function file_seek_cur_needs_f_lock() can be triggered very
easily because there are many users of vfs_llseek() (such as overlayfs)
that do their custom locking around llseek instead of relying on
fdget_pos(). Just drop the overzealous assertion.
Fixes:
da06e3c51794 ("fs: don't needlessly acquire f_lock")
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Luis Henriques <luis@igalia.com>
Link: https://lore.kernel.org/20250613101111.17716-1-luis@igalia.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Linus Torvalds [Sun, 15 Jun 2025 20:49:41 +0000 (13:49 -0700)]
Linux 6.16-rc2
Penglei Jiang [Sun, 15 Jun 2025 16:39:06 +0000 (09:39 -0700)]
io_uring: fix task leak issue in io_wq_create()
Add missing put_task_struct() in the error path
Cc: stable@vger.kernel.org
Fixes:
0f8baa3c9802 ("io-wq: fully initialize wqe before calling cpuhp_state_add_instance_nocalls()")
Signed-off-by: Penglei Jiang <superman.xpt@gmail.com>
Link: https://lore.kernel.org/r/20250615163906.2367-1-superman.xpt@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sun, 15 Jun 2025 16:14:27 +0000 (09:14 -0700)]
Merge tag 'kbuild-fixes-v6.16' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Move warnings about linux/export.h from W=1 to W=2
- Fix structure type overrides in gendwarfksyms
* tag 'kbuild-fixes-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
gendwarfksyms: Fix structure type overrides
kbuild: move warnings about linux/export.h from W=1 to W=2
Sami Tolvanen [Sat, 14 Jun 2025 00:55:33 +0000 (00:55 +0000)]
gendwarfksyms: Fix structure type overrides
As we always iterate through the entire die_map when expanding
type strings, recursively processing referenced types in
type_expand_child() is not actually necessary. Furthermore,
the type_string kABI rule added in commit
c9083467f7b9
("gendwarfksyms: Add a kABI rule to override type strings") can
fail to override type strings for structures due to a missing
kabi_get_type_string() check in this function.
Fix the issue by dropping the unnecessary recursion and moving
the override check to type_expand(). Note that symbol versions
are otherwise unchanged with this patch.
Fixes:
c9083467f7b9 ("gendwarfksyms: Add a kABI rule to override type strings")
Reported-by: Giuliano Procida <gprocida@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Masahiro Yamada [Thu, 12 Jun 2025 16:08:48 +0000 (01:08 +0900)]
kbuild: move warnings about linux/export.h from W=1 to W=2
This hides excessive warnings, as nobody builds with W=2.
Fixes:
a934a57a42f6 ("scripts/misc-check: check missing #include <linux/export.h> when W=1")
Fixes:
7d95680d64ac ("scripts/misc-check: check unnecessary #include <linux/export.h> when W=1")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Jens Axboe [Sun, 15 Jun 2025 14:09:14 +0000 (08:09 -0600)]
io_uring/rsrc: validate buffer count with offset for cloning
syzbot reports that it can trigger a WARN_ON() for kmalloc() attempt
that's too big:
WARNING: CPU: 0 PID: 6488 at mm/slub.c:5024 __kvmalloc_node_noprof+0x520/0x640 mm/slub.c:5024
Modules linked in:
CPU: 0 UID: 0 PID: 6488 Comm: syz-executor312 Not tainted
6.15.0-rc7-syzkaller-gd7fa1af5b33e #0 PREEMPT
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
pstate:
20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __kvmalloc_node_noprof+0x520/0x640 mm/slub.c:5024
lr : __do_kmalloc_node mm/slub.c:-1 [inline]
lr : __kvmalloc_node_noprof+0x3b4/0x640 mm/slub.c:5012
sp :
ffff80009cfd7a90
x29:
ffff80009cfd7ac0 x28:
ffff0000dd52a120 x27:
0000000000412dc0
x26:
0000000000000178 x25:
ffff7000139faf70 x24:
0000000000000000
x23:
ffff800082f4cea8 x22:
00000000ffffffff x21:
000000010cd004a8
x20:
ffff0000d75816c0 x19:
ffff0000dd52a000 x18:
00000000ffffffff
x17:
ffff800092f39000 x16:
ffff80008adbe9e4 x15:
0000000000000005
x14:
1ffff000139faf1c x13:
0000000000000000 x12:
0000000000000000
x11:
ffff7000139faf21 x10:
0000000000000003 x9 :
ffff80008f27b938
x8 :
0000000000000002 x7 :
0000000000000000 x6 :
0000000000000000
x5 :
00000000ffffffff x4 :
0000000000400dc0 x3 :
0000000200000000
x2 :
000000010cd004a8 x1 :
ffff80008b3ebc40 x0 :
0000000000000001
Call trace:
__kvmalloc_node_noprof+0x520/0x640 mm/slub.c:5024 (P)
kvmalloc_array_node_noprof include/linux/slab.h:1065 [inline]
io_rsrc_data_alloc io_uring/rsrc.c:206 [inline]
io_clone_buffers io_uring/rsrc.c:1178 [inline]
io_register_clone_buffers+0x484/0xa14 io_uring/rsrc.c:1287
__io_uring_register io_uring/register.c:815 [inline]
__do_sys_io_uring_register io_uring/register.c:926 [inline]
__se_sys_io_uring_register io_uring/register.c:903 [inline]
__arm64_sys_io_uring_register+0x42c/0xea8 io_uring/register.c:903
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x58/0x17c arch/arm64/kernel/entry-common.c:767
el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
which is due to offset + buffer_count being too large. The registration
code checks only the total count of buffers, but given that the indexing
is an array, it should also check offset + count. That can't exceed
IORING_MAX_REG_BUFFERS either, as there's no way to reach buffers beyond
that limit.
There's no issue with registrering a table this large, outside of the
fact that it's pointless to register buffers that cannot be reached, and
that it can trigger this kmalloc() warning for attempting an allocation
that is too large.
Cc: stable@vger.kernel.org
Fixes:
b16e920a1909 ("io_uring/rsrc: allow cloning at an offset")
Reported-by: syzbot+cb4bf3cb653be0d25de8@syzkaller.appspotmail.com
Link: https://lore.kernel.org/io-uring/684e77bd.a00a0220.279073.0029.GAE@google.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sat, 14 Jun 2025 17:13:32 +0000 (10:13 -0700)]
Merge tag 'v6.16-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- SMB3.1.1 POSIX extensions fix for char remapping
- Fix for repeated directory listings when directory leases enabled
- deferred close handle reuse fix
* tag 'v6.16-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: improve directory cache reuse for readdir operations
smb: client: fix perf regression with deferred closes
smb: client: disable path remapping with POSIX extensions
Linus Torvalds [Sat, 14 Jun 2025 17:01:47 +0000 (10:01 -0700)]
Merge tag 'iommu-fixes-v6.16-rc1' of git://git./linux/kernel/git/iommu/linux
Pull iommu fix from Joerg Roedel:
- Fix PTE size calculation for NVidia Tegra
* tag 'iommu-fixes-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu/tegra: Fix incorrect size calculation
Linus Torvalds [Sat, 14 Jun 2025 16:25:22 +0000 (09:25 -0700)]
Merge tag 'block-6.16-
20250614' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Fix for a deadlock on queue freeze with zoned writes
- Fix for zoned append emulation
- Two bio folio fixes, for sparsemem and for very large folios
- Fix for a performance regression introduced in 6.13 when plug
insertion was changed
- Fix for NVMe passthrough handling for polled IO
- Document the ublk auto registration feature
- loop lockdep warning fix
* tag 'block-6.16-
20250614' of git://git.kernel.dk/linux:
nvme: always punt polled uring_cmd end_io work to task_work
Documentation: ublk: Separate UBLK_F_AUTO_BUF_REG fallback behavior sublists
block: Fix bvec_set_folio() for very large folios
bio: Fix bio_first_folio() for SPARSEMEM without VMEMMAP
block: use plug request list tail for one-shot backmerge attempt
block: don't use submit_bio_noacct_nocheck in blk_zone_wplug_bio_work
block: Clear BIO_EMULATES_ZONE_APPEND flag on BIO completion
ublk: document auto buffer registration(UBLK_F_AUTO_BUF_REG)
loop: move lo_set_size() out of queue freeze
Linus Torvalds [Sat, 14 Jun 2025 15:44:54 +0000 (08:44 -0700)]
Merge tag 'io_uring-6.16-
20250614' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Fix for a race between SQPOLL exit and fdinfo reading.
It's slim and I was only able to reproduce this with an artificial
delay in the kernel. Followup sparse fix as well to unify the access
to ->thread.
- Fix for multiple buffer peeking, avoiding truncation if possible.
- Run local task_work for IOPOLL reaping when the ring is exiting.
This currently isn't done due to an assumption that polled IO will
never need task_work, but a fix on the block side is going to change
that.
* tag 'io_uring-6.16-
20250614' of git://git.kernel.dk/linux:
io_uring: run local task_work from ring exit IOPOLL reaping
io_uring/kbuf: don't truncate end buffer for multiple buffer peeks
io_uring: consistently use rcu semantics with sqpoll thread
io_uring: fix use-after-free of sq->thread in __io_uring_show_fdinfo()
Linus Torvalds [Sat, 14 Jun 2025 15:38:34 +0000 (08:38 -0700)]
Merge tag 'rust-fixes-6.16' of git://git./linux/kernel/git/ojeda/linux
Pull Rust fix from Miguel Ojeda:
- 'hrtimer': fix future compile error when the 'impl_has_hr_timer!'
macro starts to get called
* tag 'rust-fixes-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: time: Fix compile error in impl_has_hr_timer macro
Linus Torvalds [Sat, 14 Jun 2025 15:18:09 +0000 (08:18 -0700)]
Merge tag 'mm-hotfixes-stable-2025-06-13-21-56' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"9 hotfixes. 3 are cc:stable and the remainder address post-6.15 issues
or aren't considered necessary for -stable kernels. Only 4 are for MM"
* tag 'mm-hotfixes-stable-2025-06-13-21-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: add mmap_prepare() compatibility layer for nested file systems
init: fix build warnings about export.h
MAINTAINERS: add Barry as a THP reviewer
drivers/rapidio/rio_cm.c: prevent possible heap overwrite
mm: close theoretical race where stale TLB entries could linger
mm/vma: reset VMA iterator on commit_merge() OOM failure
docs: proc: update VmFlags documentation in smaps
scatterlist: fix extraneous '@'-sign kernel-doc notation
selftests/mm: skip failed memfd setups in gup_longterm
Linus Torvalds [Fri, 13 Jun 2025 23:49:39 +0000 (16:49 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"All fixes for drivers.
The core change in the error handler is simply to translate an ALUA
specific sense code into a retry the ALUA components can handle and
won't impact any other devices"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: error: alua: I/O errors for ALUA state transitions
scsi: storvsc: Increase the timeouts to storvsc_timeout
scsi: s390: zfcp: Ensure synchronous unit_add
scsi: iscsi: Fix incorrect error path labels for flashnode operations
scsi: mvsas: Fix typos in per-phy comments and SAS cmd port registers
scsi: core: ufs: Fix a hang in the error handler
Linus Torvalds [Fri, 13 Jun 2025 23:27:27 +0000 (16:27 -0700)]
Merge tag 'drm-fixes-2025-06-14' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Quiet week, only two pull requests came my way, xe has a couple of
fixes and then a bunch of fixes across the board, vc4 probably fixes
the biggest problem:
vc4:
- Fix infinite EPROBE_DEFER loop in vc4 probing
amdxdna:
- Fix amdxdna firmware size
meson:
- modesetting fixes
sitronix:
- Kconfig fix for st7171-i2c
dma-buf:
- Fix -EBUSY WARN_ON_ONCE in dma-buf
udmabuf:
- Use dma_sync_sgtable_for_cpu in udmabuf
xe:
- Fix regression disallowing 64K SVM migration
- Use a bounce buffer for WA BB"
* tag 'drm-fixes-2025-06-14' of https://gitlab.freedesktop.org/drm/kernel:
drm/xe/lrc: Use a temporary buffer for WA BB
udmabuf: use sgtable-based scatterlist wrappers
dma-buf: fix compare in WARN_ON_ONCE
drm/sitronix: st7571-i2c: Select VIDEOMODE_HELPERS
drm/meson: fix more rounding issues with 59.94Hz modes
drm/meson: use vclk_freq instead of pixel_freq in debug print
drm/meson: fix debug log statement when setting the HDMI clocks
drm/vc4: fix infinite EPROBE_DEFER loop
drm/xe/svm: Fix regression disallowing 64K SVM migration
accel/amdxdna: Fix incorrect PSP firmware size
Jens Axboe [Fri, 13 Jun 2025 23:05:19 +0000 (17:05 -0600)]
io_uring/nop: add IORING_NOP_TW completion flag
To test and profile the overhead of io_uring task_work and the various
types of it, add IORING_NOP_TW which tells nop to signal completions
through task_work rather than complete them inline.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 13 Jun 2025 21:24:53 +0000 (15:24 -0600)]
io_uring: run local task_work from ring exit IOPOLL reaping
In preparation for needing to shift NVMe passthrough to always use
task_work for polled IO completions, ensure that those are suitably
run at exit time. See commit:
9ce6c9875f3e ("nvme: always punt polled uring_cmd end_io work to task_work")
for details on why that is necessary.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 13 Jun 2025 19:37:41 +0000 (13:37 -0600)]
nvme: always punt polled uring_cmd end_io work to task_work
Currently NVMe uring_cmd completions will complete locally, if they are
polled. This is done because those completions are always invoked from
task context. And while that is true, there's no guarantee that it's
invoked under the right ring context, or even task. If someone does
NVMe passthrough via multiple threads and with a limited number of
poll queues, then ringA may find completions from ringB. For that case,
completing the request may not be sound.
Always just punt the passthrough completions via task_work, which will
redirect the completion, if needed.
Cc: stable@vger.kernel.org
Fixes:
585079b6e425 ("nvme: wire up async polling for io passthrough commands")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 13 Jun 2025 20:39:15 +0000 (13:39 -0700)]
Merge tag 'acpi-6.16-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix an ACPI APEI error injection driver failure that started to
occur after switching it over to using a faux device, address an EC
driver issue related to invalid ECDT tables, clean up the usage of
mwait_idle_with_hints() in the ACPI PAD driver, add a new IRQ override
quirk, and fix a NULL pointer dereference related to nosmp:
- Update the faux device handling code in the driver core and address
an ACPI APEI error injection driver failure that started to occur
after switching it over to using a faux device on top of that (Dan
Williams)
- Update data types of variables passed as arguments to
mwait_idle_with_hints() in the ACPI PAD (processor aggregator
device) driver to match the function definition after recent
changes (Uros Bizjak)
- Fix a NULL pointer dereference in the ACPI CPPC library that occurs
when nosmp is passed to the kernel in the command line (Yunhui Cui)
- Ignore ECDT tables with an invalid ID string to prevent using an
incorrect GPE for signaling events on some systems (Armin Wolf)
- Add a new IRQ override quirk for MACHENIKE 16P (Wentao Guan)"
* tag 'acpi-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: resource: Use IRQ override on MACHENIKE 16P
ACPI: EC: Ignore ECDT tables with an invalid ID string
ACPI: CPPC: Fix NULL pointer dereference when nosmp is used
ACPI: PAD: Update arguments of mwait_idle_with_hints()
ACPI: APEI: EINJ: Do not fail einj_init() on faux_device_create() failure
driver core: faux: Quiet probe failures
driver core: faux: Suppress bind attributes
Linus Torvalds [Fri, 13 Jun 2025 20:27:41 +0000 (13:27 -0700)]
Merge tag 'pm-6.16-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix the cpupower utility installation, fix up the recently added
Rust abstractions for cpufreq and OPP, restore the x86 update
eliminating mwait_play_dead_cpuid_hint() that has been reverted during
the 6.16 merge window along with preventing the failure caused by it
from happening, and clean up mwait_idle_with_hints() usage in
intel_idle:
- Implement CpuId Rust abstraction and use it to fix doctest failure
related to the recently introduced cpumask abstraction (Viresh
Kumar)
- Do minor cleanups in the `# Safety` sections for cpufreq
abstractions added recently (Viresh Kumar)
- Unbreak cpupower systemd service units installation on some systems
by adding a unitdir variable for specifying the location to install
them (Francesco Poli)
- Eliminate mwait_play_dead_cpuid_hint() again after reverting its
elimination during the 6.16 merge window due to a problem with
handling "dead" SMT siblings, but this time prevent leaving them in
C1 after initialization by taking them online and back offline when
a proper cpuidle driver for the platform has been registered
(Rafael Wysocki)
- Update data types of variables passed as arguments to
mwait_idle_with_hints() to match the function definition after
recent changes (Uros Bizjak)"
* tag 'pm-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
rust: cpu: Add CpuId::current() to retrieve current CPU ID
rust: Use CpuId in place of raw CPU numbers
rust: cpu: Introduce CpuId abstraction
intel_idle: Update arguments of mwait_idle_with_hints()
cpufreq: Convert `/// SAFETY` lines to `# Safety` sections
cpupower: split unitdir from libdir in Makefile
Reapply "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
ACPI: processor: Rescan "dead" SMT siblings during initialization
intel_idle: Rescan "dead" SMT siblings during initialization
x86/smp: PM/hibernate: Split arch_resume_nosmt()
intel_idle: Use subsys_initcall_sync() for initialization
Rafael J. Wysocki [Fri, 13 Jun 2025 19:55:30 +0000 (21:55 +0200)]
Merge branches 'acpi-pad', 'acpi-cppc', 'acpi-ec' and 'acpi-resource'
Merge assorted ACPI updates for 6.16-rc2:
- Update data types of variables passed as arguments to
mwait_idle_with_hints() in the ACPI PAD (processor aggregator device)
driver to match the function definition after recent changes (Uros
Bizjak).
- Fix a NULL pointer dereference in the ACPI CPPC library that occurs
when nosmp is passed to the kernel in the command line (Yunhui Cui).
- Ignore ECDT tables with an invalid ID string to prevent using an
incorrect GPE for signaling events on some systems (Armin Wolf).
- Add a new IRQ override quirk for MACHENIKE 16P (Wentao Guan).
* acpi-pad:
ACPI: PAD: Update arguments of mwait_idle_with_hints()
* acpi-cppc:
ACPI: CPPC: Fix NULL pointer dereference when nosmp is used
* acpi-ec:
ACPI: EC: Ignore ECDT tables with an invalid ID string
* acpi-resource:
ACPI: resource: Use IRQ override on MACHENIKE 16P
Rafael J. Wysocki [Fri, 13 Jun 2025 19:28:07 +0000 (21:28 +0200)]
Merge branch 'pm-cpuidle'
Merge cpuidle updates for 6.16-rc2:
- Update data types of variables passed as arguments to
mwait_idle_with_hints() to match the function definition
after recent changes (Uros Bizjak).
- Eliminate mwait_play_dead_cpuid_hint() again after reverting its
elimination during the merge window due to a problem with handling
"dead" SMT siblings, but this time prevent leaving them in C1 after
initialization by taking them online and back offline when a proper
cpuidle driver for the platform has been registered (Rafael Wysocki).
* pm-cpuidle:
intel_idle: Update arguments of mwait_idle_with_hints()
Reapply "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
ACPI: processor: Rescan "dead" SMT siblings during initialization
intel_idle: Rescan "dead" SMT siblings during initialization
x86/smp: PM/hibernate: Split arch_resume_nosmt()
intel_idle: Use subsys_initcall_sync() for initialization
Rafael J. Wysocki [Fri, 13 Jun 2025 19:25:38 +0000 (21:25 +0200)]
Merge branch 'pm-tools'
Merge a cpupower utility fix for 6.16-rc2 that unbreaks systemd service
units installation on some sysems (Francesco Poli).
* pm-tools:
cpupower: split unitdir from libdir in Makefile
Linus Torvalds [Fri, 13 Jun 2025 18:01:44 +0000 (11:01 -0700)]
Merge tag 'spi-fix-v6.16-rc1' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A collection of driver specific fixes, most minor apart from the OMAP
ones which disable some recent performance optimisations in some
non-standard cases where we could start driving the bus incorrectly.
The change to the stm32-ospi driver to use the newer reset APIs is a
fix for interactions with other IP sharing the same reset line in some
SoCs"
* tag 'spi-fix-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-pci1xxxx: Drop MSI-X usage as unsupported by DMA engine
spi: stm32-ospi: clean up on error in probe()
spi: stm32-ospi: Make usage of reset_control_acquire/release() API
spi: offload: check offload ops existence before disabling the trigger
spi: spi-pci1xxxx: Fix error code in probe
spi: loongson: Fix build warnings about export.h
spi: omap2-mcspi: Disable multi-mode when the previous message kept CS asserted
spi: omap2-mcspi: Disable multi mode when CS should be kept asserted after message
Linus Torvalds [Fri, 13 Jun 2025 18:00:19 +0000 (11:00 -0700)]
Merge tag 'regulator-fix-v6.16-rc1' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"One minor fix for a leak in the DT parsing code in the max20086 driver"
* tag 'regulator-fix-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: max20086: Fix refcount leak in max20086_parse_regulators_dt()
Oleg Nesterov [Fri, 13 Jun 2025 17:26:50 +0000 (19:26 +0200)]
posix-cpu-timers: fix race between handle_posix_cpu_timers() and posix_cpu_timer_del()
If an exiting non-autoreaping task has already passed exit_notify() and
calls handle_posix_cpu_timers() from IRQ, it can be reaped by its parent
or debugger right after unlock_task_sighand().
If a concurrent posix_cpu_timer_del() runs at that moment, it won't be
able to detect timer->it.cpu.firing != 0: cpu_timer_task_rcu() and/or
lock_task_sighand() will fail.
Add the tsk->exit_state check into run_posix_cpu_timers() to fix this.
This fix is not needed if CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y, because
exit_task_work() is called before exit_notify(). But the check still
makes sense, task_work_add(&tsk->posix_cputimers_work.work) will fail
anyway in this case.
Cc: stable@vger.kernel.org
Reported-by: Benoît Sevens <bsevens@google.com>
Fixes:
0bdd2ed4138e ("sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 13 Jun 2025 17:51:11 +0000 (10:51 -0700)]
Merge tag 'trace-v6.16-rc1' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
- Do not free "head" variable in filter_free_subsystem_filters()
The first error path jumps to "free_now" label but first frees the
newly allocated "head" variable. But the "free_now" code checks this
variable, and if it is not NULL, it will iterate the list. As this
list variable was already initialized, the "free_now" code will not
do anything as it is empty. But freeing it will cause a UAF bug.
The error path should simply jump to the "free_now" label and leave
the "head" variable alone.
* tag 'trace-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Do not free "head" on error path of filter_free_subsystem_filters()
Linus Torvalds [Fri, 13 Jun 2025 17:05:31 +0000 (10:05 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Rework of system register accessors for system registers that are
directly writen to memory, so that sanitisation of the in-memory
value happens at the correct time (after the read, or before the
write). For convenience, RMW-style accessors are also provided.
- Multiple fixes for the so-called "arch-timer-edge-cases' selftest,
which was always broken.
x86:
- Make KVM_PRE_FAULT_MEMORY stricter for TDX, allowing userspace to
pass only the "untouched" addresses and flipping the shared/private
bit in the implementation.
- Disable SEV-SNP support on initialization failure
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/mmu: Reject direct bits in gpa passed to KVM_PRE_FAULT_MEMORY
KVM: x86/mmu: Embed direct bits into gpa for KVM_PRE_FAULT_MEMORY
KVM: SEV: Disable SEV-SNP support on initialization failure
KVM: arm64: selftests: Determine effective counter width in arch_timer_edge_cases
KVM: arm64: selftests: Fix xVAL init in arch_timer_edge_cases
KVM: arm64: selftests: Fix thread migration in arch_timer_edge_cases
KVM: arm64: selftests: Fix help text for arch_timer_edge_cases
KVM: arm64: Make __vcpu_sys_reg() a pure rvalue operand
KVM: arm64: Don't use __vcpu_sys_reg() to get the address of a sysreg
KVM: arm64: Add RMW specific sysreg accessor
KVM: arm64: Add assignment-specific sysreg accessor
Jens Axboe [Fri, 13 Jun 2025 17:01:49 +0000 (11:01 -0600)]
io_uring/kbuf: don't truncate end buffer for multiple buffer peeks
If peeking a bunch of buffers, normally io_ring_buffers_peek() will
truncate the end buffer. This isn't optimal as presumably more data will
be arriving later, and hence it's better to stop with the last full
buffer rather than truncate the end buffer.
Cc: stable@vger.kernel.org
Fixes:
35c8711c8fc4 ("io_uring/kbuf: add helpers for getting/peeking multiple buffers")
Reported-by: Christian Mazakas <christian.mazakas@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 13 Jun 2025 16:59:29 +0000 (09:59 -0700)]
Merge tag 'v6.16-p4' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a broken self-test in hkdf (new regression)"
* tag 'v6.16-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: hkdf - move to late_initcall
Linus Torvalds [Fri, 13 Jun 2025 16:49:07 +0000 (09:49 -0700)]
Merge tag 'bcachefs-2025-06-12' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"As usual, highlighting the ones users have been noticing:
- Fix a small issue with has_case_insensitive not being propagated on
snapshot creation; this led to fsck errors, which we're harmless
because we're not using this flag yet (it's for overlayfs +
casefolding).
- Log the error being corrected in the journal when we're doing fsck
repair: this was one of the "lessons learned" from the i_nlink 0 ->
subvolume deletion bug, where reconstructing what had happened by
analyzing the journal was a bit more difficult than it needed to
be.
- Don't schedule btree node scan to run in the superblock: this fixes
a regression from the 6.16 recovery passes rework, and let to it
running unnecessarily.
The real issue here is that we don't have online, "self healing"
style topology repair yet: topology repair currently has to run
before we go RW, which means that we may schedule it unnecessarily
after a transient error. This will be fixed in the future.
- We now track, in btree node flags, the reason it was scheduled to
be rewritten. We discovered a deadlock in recovery when many btree
nodes need to be rewritten because they're degraded: fully fixing
this will take some work but it's now easier to see what's going
on.
For the bug report where this came up, a device had been kicked RO
due to transient errors: manually setting it back to RW was
sufficient to allow recovery to succeed.
- Mark a few more fsck errors as autofix: as a reminder to users,
please do keep reporting cases where something needs to be repaired
and is not repaired automatically (i.e. cases where -o fix_errors
or fsck -y is required).
- rcu_pending.c now works with PREEMPT_RT
- 'bcachefs device add', then umount, then remount wasn't working -
we now emit a uevent so that the new device's new superblock is
correctly picked up
- Assorted repair fixes: btree node scan will no longer incorrectly
update sb->version_min,
- Assorted syzbot fixes"
* tag 'bcachefs-2025-06-12' of git://evilpiepirate.org/bcachefs: (23 commits)
bcachefs: Don't trace should_be_locked unless changing
bcachefs: Ensure that snapshot creation propagates has_case_insensitive
bcachefs: Print devices we're mounting on multi device filesystems
bcachefs: Don't trust sb->nr_devices in members_to_text()
bcachefs: Fix version checks in validate_bset()
bcachefs: ioctl: avoid stack overflow warning
bcachefs: Don't pass trans to fsck_err() in gc_accounting_done
bcachefs: Fix leak in bch2_fs_recovery() error path
bcachefs: Fix rcu_pending for PREEMPT_RT
bcachefs: Fix downgrade_table_extra()
bcachefs: Don't put rhashtable on stack
bcachefs: Make sure opts.read_only gets propagated back to VFS
bcachefs: Fix possible console lock involved deadlock
bcachefs: mark more errors autofix
bcachefs: Don't persistently run scan_for_btree_nodes
bcachefs: Read error message now prints if self healing
bcachefs: Only run 'increase_depth' for keys from btree node csan
bcachefs: Mark need_discard_freespace_key_bad autofix
bcachefs: Update /dev/disk/by-uuid on device add
bcachefs: Add more flags to btree nodes for rewrite reason
...
Madhavan Srinivasan [Sat, 17 May 2025 14:22:37 +0000 (19:52 +0530)]
powerpc: Fix struct termio related ioctl macros
Since termio interface is now obsolete, include/uapi/asm/ioctls.h
has some constant macros referring to "struct termio", this caused
build failure at userspace.
In file included from /usr/include/asm/ioctl.h:12,
from /usr/include/asm/ioctls.h:5,
from tst-ioctls.c:3:
tst-ioctls.c: In function 'get_TCGETA':
tst-ioctls.c:12:10: error: invalid application of 'sizeof' to incomplete type 'struct termio'
12 | return TCGETA;
| ^~~~~~
Even though termios.h provides "struct termio", trying to juggle definitions around to
make it compile could introduce regressions. So better to open code it.
Reported-by: Tulio Magno <tuliom@ascii.art.br>
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Justin M. Forbes <jforbes@fedoraproject.org>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>
Closes: https://lore.kernel.org/linuxppc-dev/8734dji5wl.fsf@ascii.art.br/
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250517142237.156665-1-maddy@linux.ibm.com
Bagas Sanjaya [Fri, 13 Jun 2025 02:38:57 +0000 (09:38 +0700)]
Documentation: ublk: Separate UBLK_F_AUTO_BUF_REG fallback behavior sublists
Stephen Rothwell reports htmldocs warning on ublk docs:
Documentation/block/ublk.rst:414: ERROR: Unexpected indentation. [docutils]
Fix the warning by separating sublists of auto buffer registration
fallback behavior from their appropriate parent list item.
Fixes:
ff20c516485e ("ublk: document auto buffer registration(UBLK_F_AUTO_BUF_REG)")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/
20250612132638.
193de386@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20250613023857.15971-1-bagasdotme@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jason Gunthorpe [Tue, 3 Jun 2025 19:14:45 +0000 (16:14 -0300)]
iommu/tegra: Fix incorrect size calculation
This driver uses a mixture of ways to get the size of a PTE,
tegra_smmu_set_pde() did it as sizeof(*pd) which became wrong when pd
switched to a struct tegra_pd.
Switch pd back to a u32* in tegra_smmu_set_pde() so the sizeof(*pd)
returns 4.
Fixes:
50568f87d1e2 ("iommu/terga: Do not use struct page as the handle for as->pd memory")
Reported-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Closes: https://lore.kernel.org/all/
62e7f7fe-6200-4e4f-ad42-
d58ad272baa6@tecnico.ulisboa.pt/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Tested-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Link: https://lore.kernel.org/r/0-v1-da7b8b3d57eb+ce-iommu_terga_sizeof_jgg@nvidia.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Matthew Wilcox (Oracle) [Thu, 12 Jun 2025 14:42:53 +0000 (15:42 +0100)]
block: Fix bvec_set_folio() for very large folios
Similarly to
26064d3e2b4d ("block: fix adding folio to bio"), if
we attempt to add a folio that is larger than 4GB, we'll silently
truncate the offset and len. Widen the parameters to size_t, assert
that the length is less than 4GB and set the first page that contains
the interesting data rather than the first page of the folio.
Fixes:
26db5ee15851 (block: add a bvec_set_folio helper)
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20250612144255.2850278-1-willy@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Matthew Wilcox (Oracle) [Thu, 12 Jun 2025 14:41:25 +0000 (15:41 +0100)]
bio: Fix bio_first_folio() for SPARSEMEM without VMEMMAP
It is possible for physically contiguous folios to have discontiguous
struct pages if SPARSEMEM is enabled and SPARSEMEM_VMEMMAP is not.
This is correctly handled by folio_page_idx(), so remove this open-coded
implementation.
Fixes:
640d1930bef4 (block: Add bio_for_each_folio_all())
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20250612144126.2849931-1-willy@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Kurt Borja [Wed, 11 Jun 2025 21:30:40 +0000 (18:30 -0300)]
Revert "platform/x86: alienware-wmi-wmax: Add G-Mode support to Alienware m16 R1"
This reverts commit
5ff79cabb23a2f14d2ed29e9596aec908905a0e6.
Although the Alienware m16 R1 AMD model supports G-Mode, it actually has
a lower power ceiling than plain "performance" profile, which results in
lower performance.
Reported-by: Cihan Ozakca <cozakca@outlook.com>
Cc: stable@vger.kernel.org # 6.15.x
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Link: https://lore.kernel.org/r/20250611-m16-rev-v1-1-72d13bad03c9@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Mario Limonciello [Wed, 11 Jun 2025 20:33:40 +0000 (15:33 -0500)]
platform/x86/amd/pmc: Add PCSpecialist Lafite Pro V 14M to 8042 quirks list
Every other s2idle cycle fails to reach hardware sleep when keyboard
wakeup is enabled. This appears to be an EC bug, but the vendor
refuses to fix it.
It was confirmed that turning off i8042 wakeup avoids ths issue
(albeit keyboard wakeup is disabled). Take the lesser of two evils
and add it to the i8042 quirk list.
Reported-by: Raoul <ein4rth@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220116
Tested-by: Raoul <ein4rth@gmail.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20250611203341.3733478-1-superm1@kernel.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Thangaraj Samynathan [Thu, 12 Jun 2025 02:30:59 +0000 (08:00 +0530)]
spi: spi-pci1xxxx: Drop MSI-X usage as unsupported by DMA engine
Removes MSI-X from the interrupt request path, as the DMA engine used by
the SPI controller does not support MSI-X interrupts.
Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
Link: https://patch.msgid.link/20250612023059.71726-1-thangaraj.s@microchip.com
Signed-off-by: Mark Brown <broonie@kernel.org>
J. Neuschäfer [Wed, 11 Jun 2025 19:01:01 +0000 (21:01 +0200)]
powerpc: dts: mpc8315erdb: Add GPIO controller node
The MPC8315E SoC and variants have a GPIO controller at IMMR + 0xc00.
This node was previously missing from the device tree.
Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250611-mpc-gpio-v1-1-02d1f75336e2@posteo.net
J. Neuschäfer [Wed, 11 Jun 2025 21:07:49 +0000 (23:07 +0200)]
powerpc/microwatt: Fix model property in device tree
The standard property for the model name is called "model".
Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250611-microwatt-v2-1-80847bbc5f9c@posteo.net
Narayana Murty N [Thu, 8 May 2025 06:29:28 +0000 (02:29 -0400)]
powerpc/eeh: Fix missing PE bridge reconfiguration during VFIO EEH recovery
VFIO EEH recovery for PCI passthrough devices fails on PowerNV and pseries
platforms due to missing host-side PE bridge reconfiguration. In the
current implementation, eeh_pe_configure() only performs RTAS or OPAL-based
bridge reconfiguration for native host devices, but skips it entirely for
PEs managed through VFIO in guest passthrough scenarios.
This leads to incomplete EEH recovery when a PCI error affects a
passthrough device assigned to a QEMU/KVM guest. Although VFIO triggers the
EEH recovery flow through VFIO_EEH_PE_ENABLE ioctl, the platform-specific
bridge reconfiguration step is silently bypassed. As a result, the PE's
config space is not fully restored, causing subsequent config space access
failures or EEH freeze-on-access errors inside the guest.
This patch fixes the issue by ensuring that eeh_pe_configure() always
invokes the platform's configure_bridge() callback (e.g.,
pseries_eeh_phb_configure_bridge) even for VFIO-managed PEs. This ensures
that RTAS or OPAL calls to reconfigure the PE bridge are correctly issued
on the host side, restoring the PE's configuration space after an EEH
event.
This fix is essential for reliable EEH recovery in QEMU/KVM guests using
VFIO PCI passthrough on PowerNV and pseries systems.
Tested with:
- QEMU/KVM guest using VFIO passthrough (IBM Power9,(lpar)Power11 host)
- Injected EEH errors with pseries EEH errinjct tool on host, recovery
verified on qemu guest.
- Verified successful config space access and CAP_EXP DevCtl restoration
after recovery
Fixes:
212d16cdca2d ("powerpc/eeh: EEH support for VFIO PCI device")
Signed-off-by: Narayana Murty N <nnmlinux@linux.ibm.com>
Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250508062928.146043-1-nnmlinux@linux.ibm.com
Christophe Leroy [Mon, 12 May 2025 18:14:55 +0000 (20:14 +0200)]
powerpc/vdso: Fix build of VDSO32 with pcrel
Building vdso32 on power10 with pcrel leads to following errors:
VDSO32A arch/powerpc/kernel/vdso/gettimeofday-32.o
arch/powerpc/kernel/vdso/gettimeofday.S: Assembler messages:
arch/powerpc/kernel/vdso/gettimeofday.S:40: Error: syntax error; found `@', expected `,'
arch/powerpc/kernel/vdso/gettimeofday.S:71: Info: macro invoked from here
arch/powerpc/kernel/vdso/gettimeofday.S:40: Error: junk at end of line: `@notoc'
arch/powerpc/kernel/vdso/gettimeofday.S:71: Info: macro invoked from here
...
make[2]: *** [arch/powerpc/kernel/vdso/Makefile:85: arch/powerpc/kernel/vdso/gettimeofday-32.o] Error 1
make[1]: *** [arch/powerpc/Makefile:388: vdso_prepare] Error 2
Once the above is fixed, the following happens:
VDSO32C arch/powerpc/kernel/vdso/vgettimeofday-32.o
cc1: error: '-mpcrel' requires '-mcmodel=medium'
make[2]: *** [arch/powerpc/kernel/vdso/Makefile:89: arch/powerpc/kernel/vdso/vgettimeofday-32.o] Error 1
make[1]: *** [arch/powerpc/Makefile:388: vdso_prepare] Error 2
make: *** [Makefile:251: __sub-make] Error 2
Make sure pcrel version of CFUNC() macro is used only for powerpc64
builds and remove -mpcrel for powerpc32 builds.
Fixes:
7e3a68be42e1 ("powerpc/64: vmlinux support building with PCREL addresing")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/1fa3453f07d42a50a70114da9905bf7b73304fca.1747073669.git.christophe.leroy@csgroup.eu
Dave Airlie [Fri, 13 Jun 2025 04:57:09 +0000 (14:57 +1000)]
Merge tag 'drm-misc-fixes-2025-06-12' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.16-rc2:
- Fix infinite EPROBE_DEFER loop in vc4 probing.
- Fix amdxdna firmware size.
- mode fixes for meson.
- Kconfig fix for st7171-i2c.
- Fix -EBUSY WARN_ON_ONCE in dma-buf
- Use dma_sync_sgtable_for_cpu in udmabuf.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/62c06195-8bc1-4dae-8777-e86d94e4d9d9@linux.intel.com
Lorenzo Stoakes [Mon, 9 Jun 2025 16:57:49 +0000 (17:57 +0100)]
mm: add mmap_prepare() compatibility layer for nested file systems
Nested file systems, that is those which invoke call_mmap() within their
own f_op->mmap() handlers, may encounter underlying file systems which
provide the f_op->mmap_prepare() hook introduced by commit
c84bf6dd2b83
("mm: introduce new .mmap_prepare() file callback").
We have a chicken-and-egg scenario here - until all file systems are
converted to using .mmap_prepare(), we cannot convert these nested
handlers, as we can't call f_op->mmap from an .mmap_prepare() hook.
So we have to do it the other way round - invoke the .mmap_prepare() hook
from an .mmap() one.
in order to do so, we need to convert VMA state into a struct vm_area_desc
descriptor, invoking the underlying file system's f_op->mmap_prepare()
callback passing a pointer to this, and then setting VMA state accordingly
and safely.
This patch achieves this via the compat_vma_mmap_prepare() function, which
we invoke from call_mmap() if f_op->mmap_prepare() is specified in the
passed in file pointer.
We place the fundamental logic into mm/vma.h where VMA manipulation
belongs. We also update the VMA userland tests to accommodate the
changes.
The compat_vma_mmap_prepare() function and its associated machinery is
temporary, and will be removed once the conversion of file systems is
complete.
We carefully place this code so it can be used with CONFIG_MMU and also
with cutting edge nommu silicon.
[akpm@linux-foundation.org: export compat_vma_mmap_prepare tp fix build]
[lorenzo.stoakes@oracle.com: remove unused declarations]
Link: https://lkml.kernel.org/r/ac3ae324-4c65-432a-8c6d-2af988b18ac8@lucifer.local
Link: https://lkml.kernel.org/r/20250609165749.344976-1-lorenzo.stoakes@oracle.com
Fixes:
c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback").
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Closes: https://lore.kernel.org/linux-mm/CAG48ez04yOEVx1ekzOChARDDBZzAKwet8PEoPM4Ln3_rk91AzQ@mail.gmail.com/
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dave Airlie [Fri, 13 Jun 2025 01:05:23 +0000 (11:05 +1000)]
Merge tag 'drm-xe-fixes-2025-06-12' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix regression disallowing 64K SVM migration (Maarten)
- Use a bounce buffer for WA BB (Lucas)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/aEsBQoh5Si3ouPgE@fedora
Linus Torvalds [Thu, 12 Jun 2025 19:32:09 +0000 (12:32 -0700)]
Merge tag 'bitmap-for-6.16-rc2' of https://github.com/norov/linux
Pull bitmap fix from Yury Norov:
"Fix for __GENMASK() and __GENMASK_ULL() in UAPI"
* tag 'bitmap-for-6.16-rc2' of https://github.com/norov/linux:
uapi: bitops: use UAPI-safe variant of BITS_PER_LONG again
Bharath SM [Wed, 11 Jun 2025 11:29:02 +0000 (16:59 +0530)]
smb: improve directory cache reuse for readdir operations
Currently, cached directory contents were not reused across subsequent
'ls' operations because the cache validity check relied on comparing
the ctx pointer, which changes with each readdir invocation. As a
result, the cached dir entries was not marked as valid and the cache was
not utilized for subsequent 'ls' operations.
This change uses the file pointer, which remains consistent across all
readdir calls for a given directory instance, to associate and validate
the cache. As a result, cached directory contents can now be
correctly reused, improving performance for repeated directory listings.
Performance gains with local windows SMB server:
Without the patch and default actimeo=1:
1000 directory enumeration operations on dir with 10k files took 135.0s
With this patch and actimeo=0:
1000 directory enumeration operations on dir with 10k files took just 5.1s
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Paulo Alcantara [Thu, 12 Jun 2025 15:45:04 +0000 (12:45 -0300)]
smb: client: fix perf regression with deferred closes
Customer reported that one of their applications started failing to
open files with STATUS_INSUFFICIENT_RESOURCES due to NetApp server
hitting the maximum number of opens to same file that it would allow
for a single client connection.
It turned out the client was failing to reuse open handles with
deferred closes because matching ->f_flags directly without masking
off O_CREAT|O_EXCL|O_TRUNC bits first broke the comparision and then
client ended up with thousands of deferred closes to same file. Those
bits are already satisfied on the original open, so no need to check
them against existing open handles.
Reproducer:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <pthread.h>
#define NR_THREADS 4
#define NR_ITERATIONS 2500
#define TEST_FILE "/mnt/1/test/dir/foo"
static char buf[64];
static void *worker(void *arg)
{
int i, j;
int fd;
for (i = 0; i < NR_ITERATIONS; i++) {
fd = open(TEST_FILE, O_WRONLY|O_CREAT|O_APPEND, 0666);
for (j = 0; j < 16; j++)
write(fd, buf, sizeof(buf));
close(fd);
}
}
int main(int argc, char *argv[])
{
pthread_t t[NR_THREADS];
int fd;
int i;
fd = open(TEST_FILE, O_WRONLY|O_CREAT|O_TRUNC, 0666);
close(fd);
memset(buf, 'a', sizeof(buf));
for (i = 0; i < NR_THREADS; i++)
pthread_create(&t[i], NULL, worker, NULL);
for (i = 0; i < NR_THREADS; i++)
pthread_join(t[i], NULL);
return 0;
}
Before patch:
$ mount.cifs //srv/share /mnt/1 -o ...
$ mkdir -p /mnt/1/test/dir
$ gcc repro.c && ./a.out
...
number of opens: 1391
After patch:
$ mount.cifs //srv/share /mnt/1 -o ...
$ mkdir -p /mnt/1/test/dir
$ gcc repro.c && ./a.out
...
number of opens: 1
Cc: linux-cifs@vger.kernel.org
Cc: David Howells <dhowells@redhat.com>
Cc: Jay Shin <jaeshin@redhat.com>
Cc: Pierguido Lambri <plambri@redhat.com>
Fixes:
b8ea3b1ff544 ("smb: enable reuse of deferred file handles for write operations")
Acked-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Thu, 12 Jun 2025 16:50:36 +0000 (09:50 -0700)]
Merge tag 'net-6.16-rc2' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and wireless.
Current release - regressions:
- af_unix: allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD
Current release - new code bugs:
- eth: airoha: correct enable mask for RX queues 16-31
- veth: prevent NULL pointer dereference in veth_xdp_rcv when peer
disappears under traffic
- ipv6: move fib6_config_validate() to ip6_route_add(), prevent
invalid routes
Previous releases - regressions:
- phy: phy_caps: don't skip better duplex match on non-exact match
- dsa: b53: fix untagged traffic sent via cpu tagged with VID 0
- Revert "wifi: mwifiex: Fix HT40 bandwidth issue.", it caused
transient packet loss, exact reason not fully understood, yet
Previous releases - always broken:
- net: clear the dst when BPF is changing skb protocol (IPv4 <> IPv6)
- sched: sfq: fix a potential crash on gso_skb handling
- Bluetooth: intel: improve rx buffer posting to avoid causing issues
in the firmware
- eth: intel: i40e: make reset handling robust against multiple
requests
- eth: mlx5: ensure FW pages are always allocated on the local NUMA
node, even when device is configure to 'serve' another node
- wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850,
prevent kernel crashes
- wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request()
for 3 sec if fw_stats_done is not set"
* tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context
net: ethtool: Don't check if RSS context exists in case of context 0
af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD.
ipv6: Move fib6_config_validate() to ip6_route_add().
net: drv: netdevsim: don't napi_complete() from netpoll
net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get()
veth: prevent NULL pointer dereference in veth_xdp_rcv
net_sched: remove qdisc_tree_flush_backlog()
net_sched: ets: fix a race in ets_qdisc_change()
net_sched: tbf: fix a race in tbf_change()
net_sched: red: fix a race in __red_change()
net_sched: prio: fix a race in prio_tune()
net_sched: sch_sfq: reject invalid perturb period
net: phy: phy_caps: Don't skip better duplex macth on non-exact match
MAINTAINERS: Update Kuniyuki Iwashima's email address.
selftests: net: add test case for NAT46 looping back dst
net: clear the dst when changing skb protocol
net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper
net/mlx5e: Fix leak of Geneve TLV option object
net/mlx5: HWS, make sure the uplink is the last destination
...
Lucas De Marchi [Wed, 4 Jun 2025 15:03:05 +0000 (08:03 -0700)]
drm/xe/lrc: Use a temporary buffer for WA BB
In case the BO is in iomem, we can't simply take the vaddr and write to
it. Instead, prepare a separate buffer that is later copied into io
memory. Right now it's just a few words that could be using
xe_map_write32(), but the intention is to grow the WA BB for other
uses.
Fixes:
617d824c5323 ("drm/xe: Add WA BB to capture active context utilization")
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-1-0dfc5dafcef0@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit
ef48715b2d3df17c060e23b9aa636af3d95652f8)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Linus Torvalds [Thu, 12 Jun 2025 15:21:13 +0000 (08:21 -0700)]
Merge tag 'pinctrl-v6.16-2' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- Add some missing pins on the Qualcomm QCM2290, along with a managed
resources patch that make it clean and nice
- Drop an unused function in the ST Micro driver
- Drop bouncing MAINTAINER entry
- Drop of_match_ptr() macro to rid compile warnings in the TB10x
driver
- Fix up calculation of pin numbers from base in the Sunxi driver
* tag 'pinctrl-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: sunxi: dt: Consider pin base when calculating bank number from pin
pinctrl: tb10x: Drop of_match_ptr for ID table
pinctrl: MAINTAINERS: Drop bouncing Jianlong Huang
pinctrl: st: Drop unused st_gpio_bank() function
pinctrl: qcom: pinctrl-qcm2290: Add missing pins
pinctrl: qcom: switch to devm_gpiochip_add_data()
Linus Torvalds [Thu, 12 Jun 2025 15:17:56 +0000 (08:17 -0700)]
Merge tag 'arc-6.16-rc1' of git://git./linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- arch_atomic64_cmpxchg relaxed variant [Jason]
- use of inbuilt swap in stack unwinder [Yu-Chun Lin]
- use of __ASSEMBLER__ in kernel headers [Thomas Huth]
* tag 'arc-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: Replace __ASSEMBLY__ with __ASSEMBLER__ in the non-uapi headers
ARC: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers
ARC: unwind: Use built-in sort swap to reduce code size and improve performance
ARC: atomics: Implement arch_atomic64_cmpxchg using _relaxed
Jakub Kicinski [Thu, 12 Jun 2025 15:16:47 +0000 (08:16 -0700)]
Merge tag 'wireless-2025-06-12' of https://git./linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Another quick round of updates:
- revert mwifiex HT40 that was causing issues
- many ath10k/ath11k/ath12k fixes
- re-add some iwlwifi code I lost in a merge
- use kfree_sensitive() on an error path in cfg80211
* tag 'wireless-2025-06-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211: use kfree_sensitive() for connkeys cleanup
wifi: iwlwifi: fix merge damage related to iwl_pci_resume
Revert "wifi: mwifiex: Fix HT40 bandwidth issue."
wifi: ath12k: fix uaf in ath12k_core_init()
wifi: ath12k: Fix hal_reo_cmd_status kernel-doc
wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850
wifi: ath11k: validate ath11k_crypto_mode on top of ath11k_core_qmi_firmware_ready
wifi: ath11k: consistently use ath11k_mac_get_fw_stats()
wifi: ath11k: move locking outside of ath11k_mac_get_fw_stats()
wifi: ath11k: adjust unlock sequence in ath11k_update_stats_event()
wifi: ath11k: move some firmware stats related functions outside of debugfs
wifi: ath11k: don't wait when there is no vdev started
wifi: ath11k: don't use static variables in ath11k_debugfs_fw_stats_process()
wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request()
wil6210: fix support for sparrow chipsets
wifi: ath10k: Avoid vdev delete timeout when firmware is already down
ath10k: snoc: fix unbalanced IRQ enable in crash recovery
====================
Link: https://patch.msgid.link/20250612082519.11447-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 12 Jun 2025 15:15:37 +0000 (08:15 -0700)]
Merge branch 'fix-ntuple-rules-targeting-default-rss'
Gal Pressman says:
====================
Fix ntuple rules targeting default RSS
This series addresses a regression in ethtool flow steering where rules
targeting the default RSS context (context 0) were incorrectly rejected.
The default RSS context always exists but is not stored in the rss_ctx
xarray like additional contexts. The current validation logic was
checking for the existence of context 0 in this array, causing valid
flow steering rules to be rejected.
This prevented configurations such as:
- High priority rules directing specific traffic to the default context
- Low priority catch-all rules directing remaining traffic to additional
contexts
Patch 1 fixes the validation logic to skip the existence check for
context 0.
Patch 2 adds a selftest that verifies this behavior.
v3: https://lore.kernel.org/
20250609120250.
1630125-1-gal@nvidia.com
v2: https://lore.kernel.org/
20250225071348.509432-1-gal@nvidia.com
====================
Link: https://patch.msgid.link/20250612071958.1696361-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Thu, 12 Jun 2025 07:19:58 +0000 (10:19 +0300)]
selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context
Add test_rss_default_context_rule() to verify that ntuple rules can
correctly direct traffic to the default RSS context (context 0).
The test creates two ntuple rules with explicit location priorities:
- A high-priority rule (loc 0) directing specific port traffic to
context 0.
- A low-priority rule (loc 1) directing all other TCP traffic to context
1.
This validates that:
1. Rules targeting the default context function properly.
2. Traffic steering works as expected when mixing default and
additional RSS contexts.
The test was written by AI, and reviewed by humans.
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250612071958.1696361-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Thu, 12 Jun 2025 07:19:57 +0000 (10:19 +0300)]
net: ethtool: Don't check if RSS context exists in case of context 0
Context 0 (default context) always exists, there is no need to check
whether it exists or not when adding a flow steering rule.
The existing check fails when creating a flow steering rule for context
0 as it is not stored in the rss_ctx xarray.
For example:
$ ethtool --config-ntuple eth2 flow-type tcp4 dst-ip 194.237.147.23 dst-port 19983 context 0 loc 618
rmgr: Cannot insert RX class rule: Invalid argument
Cannot insert classification rule
An example usecase for this could be:
- A high-priority rule (loc 0) directing specific port traffic to
context 0.
- A low-priority rule (loc 1) directing all other TCP traffic to context
1.
This is a user-visible regression that was caught in our testing
environment, it was not reported by a user yet.
Fixes:
de7f7582dff2 ("net: ethtool: prevent flow steering to RSS contexts which don't exist")
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250612071958.1696361-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 12 Jun 2025 15:13:47 +0000 (08:13 -0700)]
Merge tag 'for-net-2025-06-11' of git://git./linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- eir: Fix NULL pointer deference on eir_get_service_data
- eir: Fix possible crashes on eir_create_adv_data
- hci_sync: Fix broadcast/PA when using an existing instance
- ISO: Fix using BT_SK_PA_SYNC to detect BIS sockets
- ISO: Fix not using bc_sid as advertisement SID
- MGMT: Fix sparse errors
* tag 'for-net-2025-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: MGMT: Fix sparse errors
Bluetooth: ISO: Fix not using bc_sid as advertisement SID
Bluetooth: ISO: Fix using BT_SK_PA_SYNC to detect BIS sockets
Bluetooth: eir: Fix possible crashes on eir_create_adv_data
Bluetooth: hci_sync: Fix broadcast/PA when using an existing instance
Bluetooth: Fix NULL pointer deference on eir_get_service_data
====================
Link: https://patch.msgid.link/20250611204944.1559356-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Wed, 11 Jun 2025 20:27:35 +0000 (13:27 -0700)]
af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD.
Before the cited commit, the kernel unconditionally embedded SCM
credentials to skb for embryo sockets even when both the sender
and listener disabled SO_PASSCRED and SO_PASSPIDFD.
Now, the credentials are added to skb only when configured by the
sender or the listener.
However, as reported in the link below, it caused a regression for
some programs that assume credentials are included in every skb,
but sometimes not now.
The only problematic scenario would be that a socket starts listening
before setting the option. Then, there will be 2 types of non-small
race window, where a client can send skb without credentials, which
the peer receives as an "invalid" message (and aborts the connection
it seems ?):
Client Server
------ ------
s1.listen() <-- No SO_PASS{CRED,PIDFD}
s2.connect()
s2.send() <-- w/o cred
s1.setsockopt(SO_PASS{CRED,PIDFD})
s2.send() <-- w/ cred
or
Client Server
------ ------
s1.listen() <-- No SO_PASS{CRED,PIDFD}
s2.connect()
s2.send() <-- w/o cred
s3, _ = s1.accept() <-- Inherit cred options
s2.send() <-- w/o cred but not set yet
s3.setsockopt(SO_PASS{CRED,PIDFD})
s2.send() <-- w/ cred
It's unfortunate that buggy programs depend on the behaviour,
but let's restore the previous behaviour.
Fixes:
3f84d577b79d ("af_unix: Inherit sk_flags at connect().")
Reported-by: Jacek Łuczak <difrost.kernel@gmail.com>
Closes: https://lore.kernel.org/all/
68d38b0b-1666-4974-85d4-
15575789c8d4@gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Tested-by: Christian Heusel <christian@heusel.eu>
Tested-by: André Almeida <andrealmeid@igalia.com>
Tested-by: Jacek Łuczak <difrost.kernel@gmail.com>
Link: https://patch.msgid.link/20250611202758.3075858-1-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Wed, 11 Jun 2025 19:35:02 +0000 (12:35 -0700)]
ipv6: Move fib6_config_validate() to ip6_route_add().
syzkaller created an IPv6 route from a malformed packet, which has
a prefix len > 128, triggering the splat below. [0]
This is a similar issue fixed by commit
586ceac9acb7 ("ipv6: Restore
fib6_config validation for SIOCADDRT.").
The cited commit removed fib6_config validation from some callers
of ip6_add_route().
Let's move the validation back to ip6_route_add() and
ip6_route_multipath_add().
[0]:
UBSAN: array-index-out-of-bounds in ./include/net/ipv6.h:616:34
index 20 is out of range for type '__u8 [16]'
CPU: 1 UID: 0 PID: 7444 Comm: syz.0.708 Not tainted
6.16.0-rc1-syzkaller-g19272b37aa4f #0 PREEMPT
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<
ffffffff80078a80>] dump_backtrace+0x2e/0x3c arch/riscv/kernel/stacktrace.c:132
[<
ffffffff8000327a>] show_stack+0x30/0x3c arch/riscv/kernel/stacktrace.c:138
[<
ffffffff80061012>] __dump_stack lib/dump_stack.c:94 [inline]
[<
ffffffff80061012>] dump_stack_lvl+0x12e/0x1a6 lib/dump_stack.c:120
[<
ffffffff800610a6>] dump_stack+0x1c/0x24 lib/dump_stack.c:129
[<
ffffffff8001c0ea>] ubsan_epilogue+0x14/0x46 lib/ubsan.c:233
[<
ffffffff819ba290>] __ubsan_handle_out_of_bounds+0xf6/0xf8 lib/ubsan.c:455
[<
ffffffff85b363a4>] ipv6_addr_prefix include/net/ipv6.h:616 [inline]
[<
ffffffff85b363a4>] ip6_route_info_create+0x8f8/0x96e net/ipv6/route.c:3793
[<
ffffffff85b635da>] ip6_route_add+0x2a/0x1aa net/ipv6/route.c:3889
[<
ffffffff85b02e08>] addrconf_prefix_route+0x2c4/0x4e8 net/ipv6/addrconf.c:2487
[<
ffffffff85b23bb2>] addrconf_prefix_rcv+0x1720/0x1e62 net/ipv6/addrconf.c:2878
[<
ffffffff85b92664>] ndisc_router_discovery+0x1a06/0x3504 net/ipv6/ndisc.c:1570
[<
ffffffff85b99038>] ndisc_rcv+0x500/0x600 net/ipv6/ndisc.c:1874
[<
ffffffff85bc2c18>] icmpv6_rcv+0x145e/0x1e0a net/ipv6/icmp.c:988
[<
ffffffff85af6798>] ip6_protocol_deliver_rcu+0x18a/0x1976 net/ipv6/ip6_input.c:436
[<
ffffffff85af8078>] ip6_input_finish+0xf4/0x174 net/ipv6/ip6_input.c:480
[<
ffffffff85af8262>] NF_HOOK include/linux/netfilter.h:317 [inline]
[<
ffffffff85af8262>] NF_HOOK include/linux/netfilter.h:311 [inline]
[<
ffffffff85af8262>] ip6_input+0x16a/0x70c net/ipv6/ip6_input.c:491
[<
ffffffff85af8dcc>] ip6_mc_input+0x5c8/0x1268 net/ipv6/ip6_input.c:588
[<
ffffffff85af6112>] dst_input include/net/dst.h:469 [inline]
[<
ffffffff85af6112>] ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
[<
ffffffff85af6112>] NF_HOOK include/linux/netfilter.h:317 [inline]
[<
ffffffff85af6112>] NF_HOOK include/linux/netfilter.h:311 [inline]
[<
ffffffff85af6112>] ipv6_rcv+0x5ae/0x6e0 net/ipv6/ip6_input.c:309
[<
ffffffff85087e84>] __netif_receive_skb_one_core+0x106/0x16e net/core/dev.c:5977
[<
ffffffff85088104>] __netif_receive_skb+0x2c/0x144 net/core/dev.c:6090
[<
ffffffff850883c6>] netif_receive_skb_internal net/core/dev.c:6176 [inline]
[<
ffffffff850883c6>] netif_receive_skb+0x1aa/0xbf2 net/core/dev.c:6235
[<
ffffffff8328656e>] tun_rx_batched.isra.0+0x430/0x686 drivers/net/tun.c:1485
[<
ffffffff8329ed3a>] tun_get_user+0x2952/0x3d6c drivers/net/tun.c:1938
[<
ffffffff832a21e0>] tun_chr_write_iter+0xc4/0x21c drivers/net/tun.c:1984
[<
ffffffff80b9b9ae>] new_sync_write fs/read_write.c:593 [inline]
[<
ffffffff80b9b9ae>] vfs_write+0x56c/0xa9a fs/read_write.c:686
[<
ffffffff80b9c2be>] ksys_write+0x126/0x228 fs/read_write.c:738
[<
ffffffff80b9c42e>] __do_sys_write fs/read_write.c:749 [inline]
[<
ffffffff80b9c42e>] __se_sys_write fs/read_write.c:746 [inline]
[<
ffffffff80b9c42e>] __riscv_sys_write+0x6e/0x94 fs/read_write.c:746
[<
ffffffff80076912>] syscall_handler+0x94/0x118 arch/riscv/include/asm/syscall.h:112
[<
ffffffff8637e31e>] do_trap_ecall_u+0x396/0x530 arch/riscv/kernel/traps.c:341
[<
ffffffff863a69e2>] handle_exception+0x146/0x152 arch/riscv/kernel/entry.S:197
Fixes:
fa76c1674f2e ("ipv6: Move some validation from ip6_route_info_create() to rtm_to_fib6_config().")
Reported-by: syzbot+4c2358694722d304c44e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
6849b8c3.
a00a0220.1eb5f5.00f0.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611193551.2999991-1-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 11 Jun 2025 17:46:43 +0000 (10:46 -0700)]
net: drv: netdevsim: don't napi_complete() from netpoll
netdevsim supports netpoll. Make sure we don't call napi_complete()
from it, since it may not be scheduled. Breno reports hitting a
warning in napi_complete_done():
WARNING: CPU: 14 PID: 104 at net/core/dev.c:6592 napi_complete_done+0x2cc/0x560
__napi_poll+0x2d8/0x3a0
handle_softirqs+0x1fe/0x710
This is presumably after netpoll stole the SCHED bit prematurely.
Reported-by: Breno Leitao <leitao@debian.org>
Fixes:
3762ec05a9fb ("netdevsim: add NAPI support")
Tested-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250611174643.2769263-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Wed, 11 Jun 2025 13:14:32 +0000 (16:14 +0300)]
net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get()
Check for if ida_alloc() or rhashtable_lookup_get_insert_fast() fails.
Fixes:
17e0accac577 ("net/mlx5: HWS, support complex matchers")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Link: https://patch.msgid.link/aEmBONjyiF6z5yCV@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jesper Dangaard Brouer [Wed, 11 Jun 2025 12:40:04 +0000 (14:40 +0200)]
veth: prevent NULL pointer dereference in veth_xdp_rcv
The veth peer device is RCU protected, but when the peer device gets
deleted (veth_dellink) then the pointer is assigned NULL (via
RCU_INIT_POINTER).
This patch adds a necessary NULL check in veth_xdp_rcv when accessing
the veth peer net_device.
This fixes a bug introduced in commit
dc82a33297fc ("veth: apply qdisc
backpressure on full ptr_ring to reduce TX drops"). The bug is a race
and only triggers when having inflight packets on a veth that is being
deleted.
Reported-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Closes: https://lore.kernel.org/all/
fecfcad0-7a16-42b8-bff2-
66ee83a6e5c4@linux.dev/
Reported-by: syzbot+c4c7bf27f6b0c4bd97fe@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/
683da55e.
a00a0220.d8eae.0052.GAE@google.com/
Fixes:
dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://patch.msgid.link/174964557873.519608.10855046105237280978.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 12 Jun 2025 15:05:53 +0000 (08:05 -0700)]
Merge branch 'net_sched-no-longer-use-qdisc_tree_flush_backlog'
Eric Dumazet says:
====================
net_sched: no longer use qdisc_tree_flush_backlog()
This series is based on a report from Gerrard Tai.
Essentially, all users of qdisc_tree_flush_backlog() are racy.
We must instead use qdisc_purge_queue().
====================
Link: https://patch.msgid.link/20250611111515.1983366-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 11 Jun 2025 11:15:15 +0000 (11:15 +0000)]
net_sched: remove qdisc_tree_flush_backlog()
This function is no longer used after the four prior fixes.
Given all prior uses were wrong, it seems better to remove it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 11 Jun 2025 11:15:14 +0000 (11:15 +0000)]
net_sched: ets: fix a race in ets_qdisc_change()
Gerrard Tai reported a race condition in ETS, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes:
b05972f01e7d ("net: sched: tbf: don't call qdisc_put() while holding tree lock")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 11 Jun 2025 11:15:13 +0000 (11:15 +0000)]
net_sched: tbf: fix a race in tbf_change()
Gerrard Tai reported a race condition in TBF, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes:
b05972f01e7d ("net: sched: tbf: don't call qdisc_put() while holding tree lock")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://patch.msgid.link/20250611111515.1983366-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 11 Jun 2025 11:15:12 +0000 (11:15 +0000)]
net_sched: red: fix a race in __red_change()
Gerrard Tai reported a race condition in RED, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes:
0c8d13ac9607 ("net: sched: red: delay destroying child qdisc on replace")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 11 Jun 2025 11:15:11 +0000 (11:15 +0000)]
net_sched: prio: fix a race in prio_tune()
Gerrard Tai reported a race condition in PRIO, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes:
7b8e0b6e6599 ("net: sched: prio: delay destroying child qdiscs on change")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 11 Jun 2025 08:35:01 +0000 (08:35 +0000)]
net_sched: sch_sfq: reject invalid perturb period
Gerrard Tai reported that SFQ perturb_period has no range check yet,
and this can be used to trigger a race condition fixed in a separate patch.
We want to make sure ctl->perturb_period * HZ will not overflow
and is positive.
Tested:
tc qd add dev lo root sfq perturb -10 # negative value : error
Error: sch_sfq: invalid perturb period.
tc qd add dev lo root sfq perturb
1000000000 # too big : error
Error: sch_sfq: invalid perturb period.
tc qd add dev lo root sfq perturb
2000000 # acceptable value
tc -s -d qd sh dev lo
qdisc sfq 8005: root refcnt 2 limit 127p quantum 64Kb depth 127 flows 128 divisor 1024 perturb 2000000sec
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250611083501.1810459-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxime Chevallier [Fri, 6 Jun 2025 09:43:20 +0000 (11:43 +0200)]
net: phy: phy_caps: Don't skip better duplex macth on non-exact match
When performing a non-exact phy_caps lookup, we are looking for a
supported mode that matches as closely as possible the passed speed/duplex.
Blamed patch broke that logic by returning a match too early in case
the caller asks for half-duplex, as a full-duplex linkmode may match
first, and returned as a non-exact match without even trying to mach on
half-duplex modes.
Reported-by: Jijie Shao <shaojijie@huawei.com>
Closes: https://lore.kernel.org/netdev/
20250603102500.
4ec743cf@fedora/T/#m22ed60ca635c67dc7d9cbb47e8995b2beb5c1576
Tested-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Fixes:
fc81e257d19f ("net: phy: phy_caps: Allow looking-up link caps based on speed and duplex")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250606094321.483602-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Keith Busch [Wed, 11 Jun 2025 20:53:43 +0000 (13:53 -0700)]
io_uring: consistently use rcu semantics with sqpoll thread
The sqpoll thread is dereferenced with rcu read protection in one place,
so it needs to be annotated as an __rcu type, and should consistently
use rcu helpers for access and assignment to make sparse happy.
Since most of the accesses occur under the sqd->lock, we can use
rcu_dereference_protected() without declaring an rcu read section.
Provide a simple helper to get the thread from a locked context.
Fixes:
ac0b8b327a5677d ("io_uring: fix use-after-free of sq->thread in __io_uring_show_fdinfo()")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20250611205343.1821117-1-kbusch@meta.com
[axboe: fold in fix for register.c]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Darrick J. Wong [Wed, 11 Jun 2025 16:40:44 +0000 (09:40 -0700)]
fs: unlock the superblock during iterate_supers_type
This function takes super_lock in shared mode, so it should release the
same lock.
Cc: stable@vger.kernel.org # v6.16-rc1
Fixes:
af7551cf13cf7f ("super: remove pointless s_root checks")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Link: https://lore.kernel.org/20250611164044.GF6138@frogsfrogsfrogs
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Amir Goldstein [Thu, 12 Jun 2025 07:22:45 +0000 (09:22 +0200)]
ovl: fix debug print in case of mkdir error
We want to print the name in case of mkdir failure and now we will
get a cryptic (efault) as name.
Fixes:
c54b386969a5 ("VFS: Change vfs_mkdir() to return the dentry.")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/20250612072245.2825938-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Rafael J. Wysocki [Thu, 12 Jun 2025 11:23:56 +0000 (13:23 +0200)]
Merge tag 'cpufreq-arm-fixes-6.16-rc' of git://git./linux/kernel/git/vireshk/pm
Merge CPUFreq fixes for 6.16-rc from Viresh Kumar:
"- Implement CpuId rust abstraction and use it to fix doctest failure
(Viresh Kumar).
- Minor cleanups in the `# Safety` sections for cpufreq abstractions
(Viresh Kumar)."
* tag 'cpufreq-arm-fixes-6.16-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
rust: cpu: Add CpuId::current() to retrieve current CPU ID
rust: Use CpuId in place of raw CPU numbers
rust: cpu: Introduce CpuId abstraction
cpufreq: Convert `/// SAFETY` lines to `# Safety` sections
Arnd Bergmann [Tue, 10 Jun 2025 09:34:55 +0000 (11:34 +0200)]
platform/x86/intel-uncore-freq: avoid non-literal format string
Using a string variable in place of a format string causes a W=1 build warning:
drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.c:61:40: error: format string is not a string literal (potentially insecure) [-Werror,-Wformat-security]
61 | length += sysfs_emit_at(buf, length, agent_name[agent]);
| ^~~~~~~~~~~~~~~~~
Use the safer "%s" format string to print it instead.
Fixes:
b98fa870fce2 ("platform/x86/intel-uncore-freq: Add attributes to show agent types")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20250610093459.2646337-1-arnd@kernel.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Xi Pardee [Tue, 10 Jun 2025 23:04:07 +0000 (16:04 -0700)]
platform/x86/intel/pmc: Add Panther Lake support to Intel PMC SSRAM Telemetry
Add Panther Lake support to Intel PMC SSRAM Telemetry driver.
Signed-off-by: Xi Pardee <xi.pardee@linux.intel.com>
Link: https://lore.kernel.org/r/20250610230416.622970-2-xi.pardee@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Xi Pardee [Tue, 10 Jun 2025 23:04:06 +0000 (16:04 -0700)]
platform/x86/intel/pmc: Add Lunar Lake support to Intel PMC SSRAM Telemetry
Add Lunar Lake support to Intel PMC SSRAM Telemetry driver.
Signed-off-by: Xi Pardee <xi.pardee@linux.intel.com>
Link: https://lore.kernel.org/r/20250610230416.622970-1-xi.pardee@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Huacai Chen [Sun, 8 Jun 2025 14:12:35 +0000 (22:12 +0800)]
init: fix build warnings about export.h
After commit
a934a57a42f64a4 ("scripts/misc-check: check missing #include
<linux/export.h> when W=1") and
7d95680d64ac8e836c ("scripts/misc-check:
check unnecessary #include <linux/export.h> when W=1"), we get some build
warnings with W=1:
init/main.c: warning: EXPORT_SYMBOL() is used, but #include <linux/export.h> is missing
init/initramfs.c: warning: EXPORT_SYMBOL() is used, but #include <linux/export.h> is missing
So fix these build warnings for the init code.
Link: https://lkml.kernel.org/r/20250608141235.155206-1-chenhuacai@loongson.cn
Fixes:
a934a57a42f6 ("scripts/misc-check: check missing #include <linux/export.h> when W=1")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Barry Song [Mon, 9 Jun 2025 00:24:42 +0000 (12:24 +1200)]
MAINTAINERS: add Barry as a THP reviewer
I have been actively contributing to mTHP and reviewing related patches
for an extended period, and I would like to continue supporting patch
reviews.
Link: https://lkml.kernel.org/r/20250609002442.1856-1-21cnbao@gmail.com
Signed-off-by: Barry Song <baohua@kernel.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Dev Jain <dev.jain@arm.com>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Sun, 8 Jun 2025 00:43:18 +0000 (17:43 -0700)]
drivers/rapidio/rio_cm.c: prevent possible heap overwrite
In
riocm_cdev_ioctl(RIO_CM_CHAN_SEND)
-> cm_chan_msg_send()
-> riocm_ch_send()
cm_chan_msg_send() checks that userspace didn't send too much data but
riocm_ch_send() failed to check that userspace sent sufficient data. The
result is that riocm_ch_send() can write to fields in the rio_ch_chan_hdr
which were outside the bounds of the space which cm_chan_msg_send()
allocated.
Address this by teaching riocm_ch_send() to check that the entire
rio_ch_chan_hdr was copied in from userspace.
Reported-by: maher azz <maherazz04@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ryan Roberts [Fri, 6 Jun 2025 09:28:07 +0000 (10:28 +0100)]
mm: close theoretical race where stale TLB entries could linger
Commit
3ea277194daa ("mm, mprotect: flush TLB if potentially racing with a
parallel reclaim leaving stale TLB entries") described a theoretical race
as such:
"""
Nadav Amit identified a theoretical race between page reclaim and mprotect
due to TLB flushes being batched outside of the PTL being held.
He described the race as follows:
CPU0 CPU1
---- ----
user accesses memory using RW PTE
[PTE now cached in TLB]
try_to_unmap_one()
==> ptep_get_and_clear()
==> set_tlb_ubc_flush_pending()
mprotect(addr, PROT_READ)
==> change_pte_range()
==> [ PTE non-present - no flush ]
user writes using cached RW PTE
...
try_to_unmap_flush()
The same type of race exists for reads when protecting for PROT_NONE and
also exists for operations that can leave an old TLB entry behind such as
munmap, mremap and madvise.
"""
The solution was to introduce flush_tlb_batched_pending() and call it
under the PTL from mprotect/madvise/munmap/mremap to complete any pending
tlb flushes.
However, while madvise_free_pte_range() and
madvise_cold_or_pageout_pte_range() were both retro-fitted to call
flush_tlb_batched_pending() immediately after initially acquiring the PTL,
they both temporarily release the PTL to split a large folio if they
stumble upon one. In this case, where re-acquiring the PTL
flush_tlb_batched_pending() must be called again, but it previously was
not. Let's fix that.
There are 2 Fixes: tags here: the first is the commit that fixed
madvise_free_pte_range(). The second is the commit that added
madvise_cold_or_pageout_pte_range(), which looks like it copy/pasted the
faulty pattern from madvise_free_pte_range().
This is a theoretical bug discovered during code review.
Link: https://lkml.kernel.org/r/20250606092809.4194056-1-ryan.roberts@arm.com
Fixes:
3ea277194daa ("mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries")
Fixes:
9c276cc65a58 ("mm: introduce MADV_COLD")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Fri, 6 Jun 2025 12:50:32 +0000 (13:50 +0100)]
mm/vma: reset VMA iterator on commit_merge() OOM failure
While an OOM failure in commit_merge() isn't really feasible due to the
allocation which might fail (a maple tree pre-allocation) being 'too small
to fail', we do need to handle this case correctly regardless.
In vma_merge_existing_range(), we can theoretically encounter failures
which result in an OOM error in two ways - firstly dup_anon_vma() might
fail with an OOM error, and secondly commit_merge() failing, ultimately,
to pre-allocate a maple tree node.
The abort logic for dup_anon_vma() resets the VMA iterator to the initial
range, ensuring that any logic looping on this iterator will correctly
proceed to the next VMA.
However the commit_merge() abort logic does not do the same thing. This
resulted in a syzbot report occurring because mlockall() iterates through
VMAs, is tolerant of errors, but ended up with an incorrect previous VMA
being specified due to incorrect iterator state.
While making this change, it became apparent we are duplicating logic -
the logic introduced in commit
41e6ddcaa0f1 ("mm/vma: add give_up_on_oom
option on modify/merge, use in uffd release") duplicates the
vmg->give_up_on_oom check in both abort branches.
Additionally, we observe that we can perform the anon_dup check safely on
dup_anon_vma() failure, as this will not be modified should this call
fail.
Finally, we need to reset the iterator in both cases, so now we can simply
use the exact same code to abort for both.
We remove the VM_WARN_ON(err != -ENOMEM) as it would be silly for this to
be otherwise and it allows us to implement the abort check more neatly.
Link: https://lkml.kernel.org/r/20250606125032.164249-1-lorenzo.stoakes@oracle.com
Fixes:
47b16d0462a4 ("mm: abort vma_modify() on merge out of memory failure")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: syzbot+d16409ea9ecc16ed261a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/
6842cc67.
a00a0220.29ac89.003b.GAE@google.com/
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
wangfushuai [Sat, 7 Jun 2025 15:36:14 +0000 (23:36 +0800)]
docs: proc: update VmFlags documentation in smaps
Remove outdated VM_DENYWRITE("dw") reference and add missing
VM_LOCKONFAULT("lf") and VM_UFFD_MINOR("ui") flags.
[akpm@linux-foundation.org: add "dp" (VM_DROPPABLE), per Tal]
Link: https://lkml.kernel.org/r/20250607153614.81914-1-wangfushuai@baidu.com
Signed-off-by: wangfushuai <wangfushuai@baidu.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mariano Pache <npache@redhat.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Tal Zussman <tz2294@columbia.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Randy Dunlap [Thu, 5 Jun 2025 00:23:37 +0000 (17:23 -0700)]
scatterlist: fix extraneous '@'-sign kernel-doc notation
Using "@argname@" in kernel-doc produces "argname****" (with "argname" in
bold) in the generated html output, so use the expected kernel-doc
notation of just "@argname" instead.
"Fixes:" lines are added in case Matthew's patch [1] is backported.
Link: https://lkml.kernel.org/r/20250605002337.2842659-1-rdunlap@infradead.org
Link: https://lore.kernel.org/linux-doc/3bc4e779-7a79-42c1-8867-024f643a22fc@infradead.org/T/#m5d2bd9d21fb34f297aa4e7db069f09bc27b89007
Fixes:
0db9299f48eb ("SG: Move functions to lib/scatterlist.c and add sg chaining allocator helpers")
Fixes:
8d1d4b538bb1 ("scatterlist: inline sg_next()")
Fixes:
18dabf473e15 ("Change table chaining layout")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mark Brown [Thu, 5 Jun 2025 21:34:31 +0000 (22:34 +0100)]
selftests/mm: skip failed memfd setups in gup_longterm
Unlike the other cases gup_longterm's memfd tests previously skipped the
test when failing to set up the file descriptor to test. Restore this
behavior to avoid hitting failures when hugetlb isn't configured.
Link: https://lkml.kernel.org/r/20250605-selftest-mm-gup-longterm-tweaks-v1-1-2fae34b05958@kernel.org
Fixes:
66bce7afbaca ("selftests/mm: fix test result reporting in gup_longterm")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reported-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Closes: https://lkml.kernel.org/r/
a76fc252-0fe3-4d4b-a9a1-
4a2895c2680d@lucifer.local
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Viresh Kumar [Mon, 9 Jun 2025 11:14:16 +0000 (16:44 +0530)]
rust: cpu: Add CpuId::current() to retrieve current CPU ID
Introduce `CpuId::current()`, a constructor that wraps the C function
`raw_smp_processor_id()` to retrieve the current CPU identifier without
guaranteeing stability.
This function should be used only when the caller can ensure that
the CPU ID won't change unexpectedly due to preemption or migration.
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>