Linus Torvalds [Mon, 26 May 2025 16:33:44 +0000 (09:33 -0700)]
Merge tag 'vfs-6.16-rc1.super' of git://git./linux/kernel/git/vfs/vfs
Pull vfs freezing updates from Christian Brauner:
"This contains various filesystem freezing related work for this cycle:
- Allow the power subsystem to support filesystem freeze for suspend
and hibernate.
Now all the pieces are in place to actually allow the power
subsystem to freeze/thaw filesystems during suspend/resume.
Filesystems are only frozen and thawed if the power subsystem does
actually own the freeze.
If the filesystem is already frozen by the time we've frozen all
userspace processes we don't care to freeze it again. That's
userspace's job once the process resumes. We only actually freeze
filesystems if we absolutely have to and we ignore other failures
to freeze.
We could bubble up errors and fail suspend/resume if the error
isn't EBUSY (aka it's already frozen) but I don't think that this
is worth it. Filesystem freezing during suspend/resume is
best-effort. If the user has 500 ext4 filesystems mounted and 4
fail to freeze for whatever reason then we simply skip them.
What we have now is already a big improvement and let's see how we
fare with it before making our lives even harder (and uglier) than
we have to.
- Allow efivars to support freeze and thaw
Allow efivarfs to partake to resync variable state during system
hibernation and suspend. Add freeze/thaw support.
This is a pretty straightforward implementation. We simply add
regular freeze/thaw support for both userspace and the kernel.
efivars is the first pseudofilesystem that adds support for
filesystem freezing and thawing.
The simplicity comes from the fact that we simply always resync
variable state after efivarfs has been frozen. It doesn't matter
whether that's because of suspend, userspace initiated freeze or
hibernation. Efivars is simple enough that it doesn't matter that
we walk all dentries. There are no directories and there aren't
insane amounts of entries and both freeze/thaw are already
heavy-handed operations. If userspace initiated a freeze/thaw cycle
they would need CAP_SYS_ADMIN in the initial user namespace (as
that's where efivarfs is mounted) so it can't be triggered by
random userspace. IOW, we really really don't care"
* tag 'vfs-6.16-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
f2fs: fix freezing filesystem during resize
kernfs: add warning about implementing freeze/thaw
efivarfs: support freeze/thaw
power: freeze filesystems during suspend/resume
libfs: export find_next_child()
super: add filesystem freezing helpers for suspend and hibernate
gfs2: pass through holder from the VFS for freeze/thaw
super: use common iterator (Part 2)
super: use a common iterator (Part 1)
super: skip dying superblocks early
super: simplify user_get_super()
super: remove pointless s_root checks
fs: allow all writers to be frozen
locking/percpu-rwsem: add freezable alternative to down_read
Linus Torvalds [Mon, 26 May 2025 16:02:39 +0000 (09:02 -0700)]
Merge tag 'vfs-6.16-rc1.misc' of git://git./linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual selections of misc updates for this cycle.
Features:
- Use folios for symlinks in the page cache
FUSE already uses folios for its symlinks. Mirror that conversion
in the generic code and the NFS code. That lets us get rid of a few
folio->page->folio conversions in this path, and some of the few
remaining users of read_cache_page() / read_mapping_page()
- Try and make a few filesystem operations killable on the VFS
inode->i_mutex level
- Add sysctl vfs_cache_pressure_denom for bulk file operations
Some workloads need to preserve more dentries than we currently
allow through out sysctl interface
A HDFS servers with 12 HDDs per server, on a HDFS datanode startup
involves scanning all files and caching their metadata (including
dentries and inodes) in memory. Each HDD contains approximately 2
million files, resulting in a total of ~20 million cached dentries
after initialization
To minimize dentry reclamation, they set vfs_cache_pressure to 1.
Despite this configuration, memory pressure conditions can still
trigger reclamation of up to 50% of cached dentries, reducing the
cache from 20 million to approximately 10 million entries. During
the subsequent cache rebuild period, any HDFS datanode restart
operation incurs substantial latency penalties until full cache
recovery completes
To maintain service stability, more dentries need to be preserved
during memory reclamation. The current minimum reclaim ratio (1/100
of total dentries) remains too aggressive for such workload. This
patch introduces vfs_cache_pressure_denom for more granular cache
pressure control
The configuration [vfs_cache_pressure=1,
vfs_cache_pressure_denom=10000] effectively maintains the full 20
million dentry cache under memory pressure, preventing datanode
restart performance degradation
- Avoid some jumps in inode_permission() using likely()/unlikely()
- Avid a memory access which is most likely a cache miss when
descending into devcgroup_inode_permission()
- Add fastpath predicts for stat() and fdput()
- Anonymous inodes currently don't come with a proper mode causing
issues in the kernel when we want to add useful VFS debug assert.
Fix that by giving them a proper mode and masking it off when we
report it to userspace which relies on them not having any mode
- Anonymous inodes currently allow to change inode attributes because
the VFS falls back to simple_setattr() if i_op->setattr isn't
implemented. This means the ownership and mode for every single
user of anon_inode_inode can be changed. Block that as it's either
useless or actively harmful. If specific ownership is needed the
respective subsystem should allocate anonymous inodes from their
own private superblock
- Raise SB_I_NODEV and SB_I_NOEXEC on the anonymous inode superblock
- Add proper tests for anonymous inode behavior
- Make it easy to detect proper anonymous inodes and to ensure that
we can detect them in codepaths such as readahead()
Cleanups:
- Port pidfs to the new anon_inode_{g,s}etattr() helpers
- Try to remove the uselib() system call
- Add unlikely branch hint return path for poll
- Add unlikely branch hint on return path for core_sys_select
- Don't allow signals to interrupt getdents copying for fuse
- Provide a size hint to dir_context for during readdir()
- Use writeback_iter directly in mpage_writepages
- Update compression and mtime descriptions in initramfs
documentation
- Update main netfs API document
- Remove useless plus one in super_cache_scan()
- Remove unnecessary NULL-check guards during setns()
- Add separate separate {get,put}_cgroup_ns no-op cases
Fixes:
- Fix typo in root= kernel parameter description
- Use KERN_INFO for infof()|info_plog()|infofc()
- Correct comments of fs_validate_description()
- Mark an unlikely if condition with unlikely() in
vfs_parse_monolithic_sep()
- Delete macro fsparam_u32hex()
- Remove unused and problematic validate_constant_table()
- Fix potential unsigned integer underflow in fs_name()
- Make file-nr output the total allocated file handles"
* tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (43 commits)
fs: Pass a folio to page_put_link()
nfs: Use a folio in nfs_get_link()
fs: Convert __page_get_link() to use a folio
fs/read_write: make default_llseek() killable
fs/open: make do_truncate() killable
fs/open: make chmod_common() and chown_common() killable
include/linux/fs.h: add inode_lock_killable()
readdir: supply dir_context.count as readdir buffer size hint
vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations
fuse: don't allow signals to interrupt getdents copying
Documentation: fix typo in root= kernel parameter description
include/cgroup: separate {get,put}_cgroup_ns no-op case
kernel/nsproxy: remove unnecessary guards
fs: use writeback_iter directly in mpage_writepages
fs: remove useless plus one in super_cache_scan()
fs: add S_ANON_INODE
fs: remove uselib() system call
device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission()
fs/fs_parse: Remove unused and problematic validate_constant_table()
fs: touch up predicts in inode_permission()
...
Linus Torvalds [Mon, 26 May 2025 15:46:59 +0000 (08:46 -0700)]
Merge tag 'vfs-6.16-rc1.mount.api' of git://git./linux/kernel/git/vfs/vfs
Pull vfs mount api conversions from Christian Brauner:
"This converts the bfs and omfs filesystems to the new mount api"
* tag 'vfs-6.16-rc1.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
omfs: convert to new mount API
bfs: convert bfs to use the new mount api
Linus Torvalds [Mon, 26 May 2025 15:23:09 +0000 (08:23 -0700)]
Merge tag 'vfs-6.16-rc1.writepage' of git://git./linux/kernel/git/vfs/vfs
Pull final writepage conversion from Christian Brauner:
"This converts vboxfs from ->writepage() to ->writepages().
This was the last user of the ->writepage() method. So remove
->writepage() completely and all references to it"
* tag 'vfs-6.16-rc1.writepage' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: Remove aops->writepage
mm: Remove swap_writepage() and shmem_writepage()
ttm: Call shmem_writeout() from ttm_backup_backup_page()
i915: Use writeback_iter()
shmem: Add shmem_writeout()
writeback: Remove writeback_use_writepage()
migrate: Remove call to ->writepage
vboxsf: Convert to writepages
9p: Add a migrate_folio method
Linus Torvalds [Mon, 26 May 2025 15:02:43 +0000 (08:02 -0700)]
Merge tag 'vfs-6.16-rc1.async.dir' of git://git./linux/kernel/git/vfs/vfs
Pull vfs directory lookup updates from Christian Brauner:
"This contains cleanups for the lookup_one*() family of helpers.
We expose a set of functions with names containing "lookup_one_len"
and others without the "_len". This difference has nothing to do with
"len". It's rater a historical accident that can be confusing.
The functions without "_len" take a "mnt_idmap" pointer. This is found
in the "vfsmount" and that is an important question when choosing
which to use: do you have a vfsmount, or are you "inside" the
filesystem. A related question is "is permission checking relevant
here?".
nfsd and cachefiles *do* have a vfsmount but *don't* use the non-_len
functions. They pass nop_mnt_idmap and refuse to work on filesystems
which have any other idmap.
This work changes nfsd and cachefile to use the lookup_one family of
functions and to explictily pass &nop_mnt_idmap which is consistent
with all other vfs interfaces used where &nop_mnt_idmap is explicitly
passed.
The remaining uses of the "_one" functions do not require permission
checks so these are renamed to be "_noperm" and the permission
checking is removed.
This series also changes these lookup function to take a qstr instead
of separate name and len. In many cases this simplifies the call"
* tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
VFS: change lookup_one_common and lookup_noperm_common to take a qstr
Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS
VFS: rename lookup_one_len family to lookup_noperm and remove permission check
cachefiles: Use lookup_one() rather than lookup_one_len()
nfsd: Use lookup_one() rather than lookup_one_len()
VFS: improve interface for lookup_one functions
Linus Torvalds [Sun, 25 May 2025 23:09:23 +0000 (16:09 -0700)]
Linux 6.15
Linus Torvalds [Sun, 25 May 2025 22:43:36 +0000 (15:43 -0700)]
Disable FOP_DONTCACHE for now due to bugs
This is kind of last-minute, but Al Viro reported that the new
FOP_DONTCACHE flag causes memory corruption due to use-after-free
issues.
This was triggered by commit
974c5e6139db ("xfs: flag as supporting
FOP_DONTCACHE"), but that is not the underlying bug - it is just the
first user of the flag.
Vlastimil Babka suspects the underlying problem stems from the
folio_end_writeback() logic introduced in commit
fb7d3bc414939
("mm/filemap: drop streaming/uncached pages when writeback completes").
The most straightforward fix would be to just revert the commit that
exposed this, but Matthew Wilcox points out that other filesystems are
also starting to enable the FOP_DONTCACHE logic, so this instead
disables that bit globally for now.
The fix will hopefully end up being trivial and we can just re-enable
this logic after more testing, but until such a time we'll have to
disable the new FOP_DONTCACHE flag.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/all/20250525083209.GS2023217@ZenIV/
Triggered-by: 974c5e6139db ("xfs: flag as supporting FOP_DONTCACHE")
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 25 May 2025 14:48:35 +0000 (07:48 -0700)]
Merge tag 'mm-hotfixes-stable-2025-05-25-00-58' of git://git./linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"22 hotfixes.
13 are cc:stable and the remainder address post-6.14 issues or aren't
considered necessary for -stable kernels. 19 are for MM"
* tag 'mm-hotfixes-stable-2025-05-25-00-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
mailmap: add Jarkko's employer email address
mm: fix copy_vma() error handling for hugetlb mappings
memcg: always call cond_resched() after fn()
mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
mm: vmalloc: only zero-init on vrealloc shrink
mm: vmalloc: actually use the in-place vrealloc region
alloc_tag: allocate percpu counters for module tags dynamically
module: release codetag section when module load fails
mm/cma: make detection of highmem_start more robust
MAINTAINERS: add mm memory policy section
MAINTAINERS: add mm ksm section
kasan: avoid sleepable page allocation from atomic context
highmem: add folio_test_partial_kmap()
MAINTAINERS: add hung-task detector section
taskstats: fix struct taskstats breaks backward compatibility since version 15
mm/truncate: fix out-of-bounds when doing a right-aligned split
MAINTAINERS: add mm reclaim section
MAINTAINERS: update page allocator section
mm: fix VM_UFFD_MINOR == VM_SHADOW_STACK on USERFAULTFD=y && ARM64_GCS=y
mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is enabled
...
Jarkko Sakkinen [Fri, 23 May 2025 12:11:04 +0000 (15:11 +0300)]
mailmap: add Jarkko's employer email address
Add the current employer email address to mailmap.
Link: https://lkml.kernel.org/r/20250523121105.15850-1-jarkko@kernel.org
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Cc: Antonio Quartulli <antonio@openvpn.net>
Cc: Carlos Bilbao <carlos.bilbao@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ricardo Cañuelo Navarro [Fri, 23 May 2025 12:19:10 +0000 (14:19 +0200)]
mm: fix copy_vma() error handling for hugetlb mappings
If, during a mremap() operation for a hugetlb-backed memory mapping,
copy_vma() fails after the source vma has been duplicated and opened (ie.
vma_link() fails), the error is handled by closing the new vma. This
updates the hugetlbfs reservation counter of the reservation map which at
this point is referenced by both the source vma and the new copy. As a
result, once the new vma has been freed and copy_vma() returns, the
reservation counter for the source vma will be incorrect.
This patch addresses this corner case by clearing the hugetlb private page
reservation reference for the new vma and decrementing the reference
before closing the vma, so that vma_close() won't update the reservation
counter. This is also what copy_vma_and_data() does with the source vma
if copy_vma() succeeds, so a helper function has been added to do the
fixup in both functions.
The issue was reported by a private syzbot instance and can be reproduced
using the C reproducer in [1]. It's also a possible duplicate of public
syzbot report [2]. The WARNING report is:
============================================================
page_counter underflow: -1024 nr_pages=1024
WARNING: CPU: 0 PID: 3287 at mm/page_counter.c:61 page_counter_cancel+0xf6/0x120
Modules linked in:
CPU: 0 UID: 0 PID: 3287 Comm: repro__WARNING_ Not tainted 6.15.0-rc7+ #54 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014
RIP: 0010:page_counter_cancel+0xf6/0x120
Code: ff 5b 41 5e 41 5f 5d c3 cc cc cc cc e8 f3 4f 8f ff c6 05 64 01 27 06 01 48 c7 c7 60 15 f8 85 48 89 de 4c 89 fa e8 2a a7 51 ff <0f> 0b e9 66 ff ff ff 44 89 f9 80 e1 07 38 c1 7c 9d 4c 81
RSP: 0018:
ffffc900025df6a0 EFLAGS:
00010246
RAX:
2edfc409ebb44e00 RBX:
fffffffffffffc00 RCX:
ffff8880155f0000
RDX:
0000000000000000 RSI:
0000000000000001 RDI:
0000000000000000
RBP:
dffffc0000000000 R08:
ffffffff81c4a23c R09:
1ffff1100330482a
R10:
dffffc0000000000 R11:
ffffed100330482b R12:
0000000000000000
R13:
ffff888058a882c0 R14:
ffff888058a882c0 R15:
0000000000000400
FS:
0000000000000000(0000) GS:
ffff88808fc53000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00000000004b33e0 CR3:
00000000076d6000 CR4:
00000000000006f0
Call Trace:
<TASK>
page_counter_uncharge+0x33/0x80
hugetlb_cgroup_uncharge_counter+0xcb/0x120
hugetlb_vm_op_close+0x579/0x960
? __pfx_hugetlb_vm_op_close+0x10/0x10
remove_vma+0x88/0x130
exit_mmap+0x71e/0xe00
? __pfx_exit_mmap+0x10/0x10
? __mutex_unlock_slowpath+0x22e/0x7f0
? __pfx_exit_aio+0x10/0x10
? __up_read+0x256/0x690
? uprobe_clear_state+0x274/0x290
? mm_update_next_owner+0xa9/0x810
__mmput+0xc9/0x370
exit_mm+0x203/0x2f0
? __pfx_exit_mm+0x10/0x10
? taskstats_exit+0x32b/0xa60
do_exit+0x921/0x2740
? do_raw_spin_lock+0x155/0x3b0
? __pfx_do_exit+0x10/0x10
? __pfx_do_raw_spin_lock+0x10/0x10
? _raw_spin_lock_irq+0xc5/0x100
do_group_exit+0x20c/0x2c0
get_signal+0x168c/0x1720
? __pfx_get_signal+0x10/0x10
? schedule+0x165/0x360
arch_do_signal_or_restart+0x8e/0x7d0
? __pfx_arch_do_signal_or_restart+0x10/0x10
? __pfx___se_sys_futex+0x10/0x10
syscall_exit_to_user_mode+0xb8/0x2c0
do_syscall_64+0x75/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x422dcd
Code: Unable to access opcode bytes at 0x422da3.
RSP: 002b:
00007ff266cdb208 EFLAGS:
00000246 ORIG_RAX:
00000000000000ca
RAX:
0000000000000001 RBX:
00007ff266cdbcdc RCX:
0000000000422dcd
RDX:
00000000000f4240 RSI:
0000000000000081 RDI:
00000000004c7bec
RBP:
00007ff266cdb220 R08:
203a6362696c6720 R09:
203a6362696c6720
R10:
0000200000c00000 R11:
0000000000000246 R12:
ffffffffffffffd0
R13:
0000000000000002 R14:
00007ffe1cb5f520 R15:
00007ff266cbb000
</TASK>
============================================================
Link: https://lkml.kernel.org/r/20250523-warning_in_page_counter_cancel-v2-1-b6df1a8cfefd@igalia.com
Link: https://people.igalia.com/rcn/kernel_logs/20250422__WARNING_in_page_counter_cancel__repro.c
Link: https://lore.kernel.org/all/67000a50.050a0220.49194.048d.GAE@google.com/
Signed-off-by: Ricardo Cañuelo Navarro <rcn@igalia.com>
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Florent Revest <revest@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Breno Leitao [Fri, 23 May 2025 17:21:06 +0000 (10:21 -0700)]
memcg: always call cond_resched() after fn()
I am seeing soft lockup on certain machine types when a cgroup OOMs. This
is happening because killing the process in certain machine might be very
slow, which causes the soft lockup and RCU stalls. This happens usually
when the cgroup has MANY processes and memory.oom.group is set.
Example I am seeing in real production:
[462012.244552] Memory cgroup out of memory: Killed process
3370438 (crosvm) ....
....
[462037.318059] Memory cgroup out of memory: Killed process
4171372 (adb) ....
[462037.348314] watchdog: BUG: soft lockup - CPU#64 stuck for 26s! [stat_manager-ag:
1618982]
....
Quick look at why this is so slow, it seems to be related to serial flush
for certain machine types. For all the crashes I saw, the target CPU was
at console_flush_all().
In the case above, there are thousands of processes in the cgroup, and it
is soft locking up before it reaches the 1024 limit in the code (which
would call the cond_resched()). So, cond_resched() in 1024 blocks is not
sufficient.
Remove the counter-based conditional rescheduling logic and call
cond_resched() unconditionally after each task iteration, after fn() is
called. This avoids the lockup independently of how slow fn() is.
Link: https://lkml.kernel.org/r/20250523-memcg_fix-v1-1-ad3eafb60477@debian.org
Fixes:
ade81479c7dd ("memcg: fix soft lockup in the OOM process")
Signed-off-by: Breno Leitao <leitao@debian.org>
Suggested-by: Rik van Riel <riel@surriel.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Michael van der Westhuizen <rmikey@meta.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ge Yang [Thu, 22 May 2025 03:22:17 +0000 (11:22 +0800)]
mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
A kernel crash was observed when replacing free hugetlb folios:
BUG: kernel NULL pointer dereference, address:
0000000000000028
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary)
RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0
RSP: 0018:
ffffc9000b30fa90 EFLAGS:
00010286
RAX:
0000000000000000 RBX:
0000000000342cca RCX:
ffffea0043000000
RDX:
ffffc9000b30fb08 RSI:
ffffea0043000000 RDI:
0000000000000000
RBP:
ffffc9000b30fb20 R08:
0000000000001000 R09:
0000000000000000
R10:
ffff88886f92eb00 R11:
0000000000000000 R12:
ffffea0043000000
R13:
0000000000000000 R14:
00000000010c0200 R15:
0000000000000004
FS:
00007fcda5f14740(0000) GS:
ffff8888ec1d8000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000028 CR3:
0000000391402000 CR4:
0000000000350ef0
Call Trace:
<TASK>
replace_free_hugepage_folios+0xb6/0x100
alloc_contig_range_noprof+0x18a/0x590
? srso_return_thunk+0x5/0x5f
? down_read+0x12/0xa0
? srso_return_thunk+0x5/0x5f
cma_range_alloc.constprop.0+0x131/0x290
__cma_alloc+0xcf/0x2c0
cma_alloc_write+0x43/0xb0
simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110
debugfs_attr_write+0x46/0x70
full_proxy_write+0x62/0xa0
vfs_write+0xf8/0x420
? srso_return_thunk+0x5/0x5f
? filp_flush+0x86/0xa0
? srso_return_thunk+0x5/0x5f
? filp_close+0x1f/0x30
? srso_return_thunk+0x5/0x5f
? do_dup2+0xaf/0x160
? srso_return_thunk+0x5/0x5f
ksys_write+0x65/0xe0
do_syscall_64+0x64/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
There is a potential race between __update_and_free_hugetlb_folio() and
replace_free_hugepage_folios():
CPU1 CPU2
__update_and_free_hugetlb_folio replace_free_hugepage_folios
folio_test_hugetlb(folio)
-- It's still hugetlb folio.
__folio_clear_hugetlb(folio)
hugetlb_free_folio(folio)
h = folio_hstate(folio)
-- Here, h is NULL pointer
When the above race condition occurs, folio_hstate(folio) returns NULL,
and subsequent access to this NULL pointer will cause the system to crash.
To resolve this issue, execute folio_hstate(folio) under the protection
of the hugetlb_lock lock, ensuring that folio_hstate(folio) does not
return NULL.
Link: https://lkml.kernel.org/r/1747884137-26685-1-git-send-email-yangge1116@126.com
Fixes:
04f13d241b8b ("mm: replace free hugepage folios after migration")
Signed-off-by: Ge Yang <yangge1116@126.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kees Cook [Thu, 15 May 2025 21:42:16 +0000 (14:42 -0700)]
mm: vmalloc: only zero-init on vrealloc shrink
The common case is to grow reallocations, and since init_on_alloc will
have already zeroed the whole allocation, we only need to zero when
shrinking the allocation.
Link: https://lkml.kernel.org/r/20250515214217.619685-2-kees@kernel.org
Fixes:
a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing")
Signed-off-by: Kees Cook <kees@kernel.org>
Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: "Erhard F." <erhard_f@mailbox.org>
Cc: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kees Cook [Thu, 15 May 2025 21:42:15 +0000 (14:42 -0700)]
mm: vmalloc: actually use the in-place vrealloc region
Patch series "mm: vmalloc: Actually use the in-place vrealloc region".
This fixes a performance regression[1] with vrealloc()[1].
The refactoring to not build a new vmalloc region only actually worked
when shrinking. Actually return the resized area when it grows. Ugh.
Link: https://lkml.kernel.org/r/20250515214217.619685-1-kees@kernel.org
Fixes:
a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing")
Signed-off-by: Kees Cook <kees@kernel.org>
Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Closes: https://lore.kernel.org/all/
20250515-bpf-verifier-slowdown-vwo2meju4cgp2su5ckj@6gi6ssxbnfqg [1]
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Tested-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Cc: "Erhard F." <erhard_f@mailbox.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Suren Baghdasaryan [Sat, 17 May 2025 00:07:39 +0000 (17:07 -0700)]
alloc_tag: allocate percpu counters for module tags dynamically
When a module gets unloaded it checks whether any of its tags are still in
use and if so, we keep the memory containing module's allocation tags
alive until all tags are unused. However percpu counters referenced by
the tags are freed by free_module(). This will lead to UAF if the memory
allocated by a module is accessed after module was unloaded.
To fix this we allocate percpu counters for module allocation tags
dynamically and we keep it alive for tags which are still in use after
module unloading. This also removes the requirement of a larger
PERCPU_MODULE_RESERVE when memory allocation profiling is enabled because
percpu memory for counters does not need to be reserved anymore.
Link: https://lkml.kernel.org/r/20250517000739.5930-1-surenb@google.com
Fixes:
0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/all/
20250516131246.6244-1-
00107082@163.com/
Tested-by: David Wang <00107082@163.com>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Wang [Mon, 19 May 2025 16:38:23 +0000 (00:38 +0800)]
module: release codetag section when module load fails
When module load fails after memory for codetag section is ready, codetag
section memory will not be properly released. This causes memory leak,
and if next module load happens to get the same module address, codetag
may pick the uninitialized section when manipulating tags during module
unload, and leads to "unable to handle page fault" BUG.
Link: https://lkml.kernel.org/r/20250519163823.7540-1-00107082@163.com
Fixes:
0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory")
Closes: https://lore.kernel.org/all/
20250516131246.6244-1-
00107082@163.com/
Signed-off-by: David Wang <00107082@163.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mike Rapoport (Microsoft) [Mon, 19 May 2025 17:18:05 +0000 (20:18 +0300)]
mm/cma: make detection of highmem_start more robust
Pratyush Yadav reports the following crash:
------------[ cut here ]------------
kernel BUG at arch/x86/mm/physaddr.c:23!
ception 0x06 IP 10:
ffffffff812ebbf8 error 0 cr2 0xffff88903ffff000
CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc6+ #231 PREEMPT(undef)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
RIP: 0010:__phys_addr+0x58/0x60
Code: 01 48 89 c2 48 d3 ea 48 85 d2 75 05 e9 91 52 cf 00 0f 0b 48 3d ff ff ff 1f 77 0f 48 8b 05 20 54 55 01 48 01 d0 e9 78 52 cf 00 <0f> 0b 90 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0000:
ffffffff82803dd8 EFLAGS:
00010006 ORIG_RAX:
0000000000000000
RAX:
000000007fffffff RBX:
00000000ffffffff RCX:
0000000000000000
RDX:
000000007fffffff RSI:
0000000280000000 RDI:
ffffffffffffffff
RBP:
ffffffff82803e68 R08:
0000000000000000 R09:
0000000000000000
R10:
ffffffff83153180 R11:
ffffffff82803e48 R12:
ffffffff83c9aed0
R13:
0000000000000000 R14:
0000001040000000 R15:
0000000000000000
FS:
0000000000000000(0000) GS:
0000000000000000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffff88903ffff000 CR3:
0000000002838000 CR4:
00000000000000b0
Call Trace:
<TASK>
? __cma_declare_contiguous_nid+0x6e/0x340
? cma_declare_contiguous_nid+0x33/0x70
? dma_contiguous_reserve_area+0x2f/0x70
? setup_arch+0x6f1/0x870
? start_kernel+0x52/0x4b0
? x86_64_start_reservations+0x29/0x30
? x86_64_start_kernel+0x7c/0x80
? common_startup_64+0x13e/0x141
The reason is that __cma_declare_contiguous_nid() does:
highmem_start = __pa(high_memory - 1) + 1;
If dma_contiguous_reserve_area() (or any other CMA declaration) is
called before free_area_init(), high_memory is uninitialized. Without
CONFIG_DEBUG_VIRTUAL, it will likely work but use the wrong value for
highmem_start.
The issue occurs because commit
e120d1bc12da ("arch, mm: set high_memory
in free_area_init()") moved initialization of high_memory after the call
to dma_contiguous_reserve() -> __cma_declare_contiguous_nid() on several
architectures.
In the case CONFIG_HIGHMEM is enabled, some architectures that actually
support HIGHMEM (arm, powerpc and x86) have initialization of high_memory
before a possible call to __cma_declare_contiguous_nid() and some
initialized high_memory late anyway (arc, csky, microblase, mips, sparc,
xtensa) even before the commit
e120d1bc12da so they are fine with using
uninitialized value of high_memory.
And in the case CONFIG_HIGHMEM is disabled high_memory essentially becomes
the first address after memory end, so instead of relying on high_memory
to calculate highmem_start use memblock_end_of_DRAM() and eliminate the
dependency of CMA area creation on high_memory in majority of
configurations.
Link: https://lkml.kernel.org/r/20250519171805.1288393-1-rppt@kernel.org
Fixes:
e120d1bc12da ("arch, mm: set high_memory in free_area_init()")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reported-by: Pratyush Yadav <ptyadav@amazon.de>
Tested-by: Pratyush Yadav <ptyadav@amazon.de>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Sun, 25 May 2025 01:54:18 +0000 (18:54 -0700)]
Merge tag 'input-for-v6.15-rc7' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- even more Xbox controllers added to xpad driver: Turtle Beach Recon
Wired Controller, Turtle Beach Stealth Ultra, and PowerA Wired
Controller
- a fix to Synaptics RMI driver to not crash if controller reports
unsupported version of F34 (firmware flash) function
* tag 'input-for-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: synaptics-rmi - fix crash with unsupported versions of F34
Input: xpad - add more controllers
Linus Torvalds [Sun, 25 May 2025 01:48:17 +0000 (18:48 -0700)]
Merge tag 'spi-fix-v6.15-rc7' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few final fixes for v6.15, some driver fixes for the Freescale DSPI
driver pulled over from their vendor code and another instance of the
fixes Greg has been sending throughout the kernel for constification
of the bus_type in driver core match() functions"
* tag 'spi-fix-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-fsl-dspi: Reset SR flags before sending a new message
spi: spi-fsl-dspi: Halt the module after a new message transfer
spi: spi-fsl-dspi: restrict register range for regmap access
spi: use container_of_cont() for to_spi_device()
Linus Torvalds [Sat, 24 May 2025 16:01:41 +0000 (09:01 -0700)]
Merge tag 'iommu-fixes-v6.15-rc7' of git://git./linux/kernel/git/iommu/linux
Pull iommu fix from Joerg Roedel:
- core: skip PASID validation for devices without PASID support
* tag 'iommu-fixes-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu: Skip PASID validation for devices without PASID capability
Linus Torvalds [Fri, 23 May 2025 22:17:55 +0000 (15:17 -0700)]
Merge tag 'drm-fixes-2025-05-24' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly drm fixes pull, on target to be quiet, just one amdgpu, one
edid and a few minor xe fixes.
edid:
- fix HDR metadata reset
amdgpu:
- Hibernate fix
xe:
- Make sure to check all forcewakes when dumping mocs
- Fix wrong use of read64 on 32b register
- Synchronize Panther Lake PCI IDs"
* tag 'drm-fixes-2025-05-24' of https://gitlab.freedesktop.org/drm/kernel:
drm/xe/ptl: Update the PTL pci id table
drm/xe: Use xe_mmio_read32() to read mtcfg register
drm/xe/mocs: Check if all domains awake
Revert "drm/amd: Keep display off while going into S4"
drm/edid: fixed the bug that hdr metadata was not reset
Dave Airlie [Fri, 23 May 2025 21:42:16 +0000 (07:42 +1000)]
Merge tag 'drm-xe-fixes-2025-05-23' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Make sure to check all forcewakes when dumping mocs
- Fix wrong use of read64 on 32b register
- Synchronize Panther Lake PCI IDs
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/uixp5cq7emz32lmwwvq4vbujppugfozhyj3cm2aqzx4lcg7ivn@m2khvf4kvz5p
Dave Airlie [Fri, 23 May 2025 21:41:55 +0000 (07:41 +1000)]
Merge tag 'amd-drm-fixes-6.15-2025-05-22' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.15-2025-05-22:
amdgpu:
- Hibernate fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250522183941.9606-1-alexander.deucher@amd.com
Dave Airlie [Fri, 23 May 2025 21:37:45 +0000 (07:37 +1000)]
Merge tag 'drm-misc-fixes-2025-05-22' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:
edid:
- fix HDR metadata reset
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250522113902.GA7000@localhost.localdomain
Linus Torvalds [Fri, 23 May 2025 16:47:43 +0000 (09:47 -0700)]
Merge tag 'thermal-6.15-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control fix from Rafael Wysocki:
"This fixes a coding mistake in the x86_pkg_temp_thermal Intel thermal
driver that was introduced by an incorrect conflict resolution during
a merge (Zhang Rui)"
* tag 'thermal-6.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: x86_pkg_temp_thermal: Fix bogus trip temperature
Linus Torvalds [Fri, 23 May 2025 15:42:29 +0000 (08:42 -0700)]
Merge tag 'v6.15-rc8-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix for rename regression due to the recent VFS lookup changes
- Fix write failure
- locking fix for oplock handling
* tag 'v6.15-rc8-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: use list_first_entry_or_null for opinfo_get_list()
ksmbd: fix rename failure
ksmbd: fix stream write failure
Linus Torvalds [Fri, 23 May 2025 15:04:13 +0000 (08:04 -0700)]
Merge tag 'soc-fixes-6.15-3' of git://git./linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"A few last minute fixes:
- two driver fixes for samsung/google platforms, both addressing
mistakes in changes from the 6.15 merge window
- a revert for an allwinner devicetree change that caused problems
- a fix for an older regression with the LEDs on Marvell Armada 3720
- a defconfig change to enable chacha20 again after a crypto
subsystem change in 6.15 inadventently turned it off"
* tag 'soc-fixes-6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: defconfig: Ensure CRYPTO_CHACHA20_NEON is selected
arm64: dts: marvell: uDPU: define pinctrl state for alarm LEDs
Revert "arm64: dts: allwinner: h6: Use RSB for AXP805 PMIC connection"
soc: samsung: usi: prevent wrong bits inversion during unconfiguring
firmware: exynos-acpm: check saved RX before bailing out on empty RX queue
Linus Torvalds [Fri, 23 May 2025 14:59:39 +0000 (07:59 -0700)]
Merge tag 'platform-drivers-x86-v6.15-6' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform drivers fixes from Ilpo Järvinen:
- dell-wmi-sysman: Avoid buffer overflow in current_password_store()
- fujitsu-laptop: Support Lifebook S2110 hotkeys
- intel/pmc: Fix Arrow Lake U/H NPU PCI ID
- think-lmi: Fix attribute name usage for non-compliant items
- thinkpad_acpi: Ignore battery threshold change event notification
* tag 'platform-drivers-x86-v6.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/intel/pmc: Fix Arrow Lake U/H NPU PCI ID
platform/x86: think-lmi: Fix attribute name usage for non-compliant items
platform/x86: thinkpad_acpi: Ignore battery threshold change event notification
platform/x86: dell-wmi-sysman: Avoid buffer overflow in current_password_store()
platform/x86: fujitsu-laptop: Support Lifebook S2110 hotkeys
Linus Torvalds [Fri, 23 May 2025 14:51:05 +0000 (07:51 -0700)]
Merge tag 'vfs-6.15-rc8.fixes' of git://git./linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains a small set of fixes for the blocking buffer lookup
conversion done earlier this cycle.
It adds a missing conversion in the getblk slowpath and a few minor
optimizations and cleanups"
* tag 'vfs-6.15-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs/buffer: optimize discard_buffer()
fs/buffer: remove superfluous statements
fs/buffer: avoid redundant lookup in getblk slowpath
fs/buffer: use sleeping lookup in __getblk_slowpath()
Todd Brandt [Tue, 20 May 2025 10:45:55 +0000 (03:45 -0700)]
platform/x86/intel/pmc: Fix Arrow Lake U/H NPU PCI ID
The ARL requires that the GMA and NPU devices both be in D3Hot in order
for PC10 and S0iX to be achieved in S2idle. The original ARL-H/U addition
to the intel_pmc_core driver attempted to do this by switching them to D3
in the init and resume calls of the intel_pmc_core driver.
The problem is the ARL-H/U have a different NPU device and thus are not
being properly set and thus S0iX does not work properly in ARL-H/U. This
patch creates a new ARL-H specific device id that is correct and also
adds the D3 fixup to the suspend callback. This way if the PCI devies
drop from D3 to D0 after resume they can be corrected for the next
suspend. Thus there is no dropout in S0iX.
Fixes:
bd820906ea9d ("platform/x86/intel/pmc: Add Arrow Lake U/H support to intel_pmc_core driver")
Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
Link: https://lore.kernel.org/r/a61f78be45c13f39e122dcc684b636f4b21e79a0.1747737446.git.todd.e.brandt@intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Matt Atwood [Tue, 20 May 2025 19:57:49 +0000 (12:57 -0700)]
drm/xe/ptl: Update the PTL pci id table
Update to current bspec table.
Bspec: 72574
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Link: https://lore.kernel.org/r/20250520195749.371748-1-matthew.s.atwood@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit
49c6dc74b5968885f421f9f1b45eb4890b955870)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Shuicheng Lin [Tue, 13 May 2025 15:30:10 +0000 (15:30 +0000)]
drm/xe: Use xe_mmio_read32() to read mtcfg register
The mtcfg register is a 32-bit register and should therefore be
accessed using xe_mmio_read32().
Other 3 changes per codestyle suggestion:
"
xe_mmio.c:83: CHECK: Alignment should match open parenthesis
xe_mmio.c:131: CHECK: Comparison to NULL could be written "!xe->mmio.regs"
xe_mmio.c:315: CHECK: line length of 103 exceeds 100 columns
"
Fixes:
dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250513153010.3464767-1-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit
d2662cf8f44a68deb6c76ad9f1d9f29dbf7ba601)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Tejas Upadhyay [Tue, 6 May 2025 14:23:00 +0000 (19:53 +0530)]
drm/xe/mocs: Check if all domains awake
Check if all domains are awake specially for
LNCF regs
Fixes:
298661cd9cea ("drm/xe: Fix MOCS debugfs LNCF readout")
Improvements-suggested-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250506142300.1865783-1-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
(cherry picked from commit
a383cf218ef8bb35d4c03958bd956573b65cf778)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Linus Torvalds [Fri, 23 May 2025 02:17:55 +0000 (19:17 -0700)]
Merge tag 'bcachefs-2025-05-22' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Small stuff, main ones users will be interested in:
- Couple more casefolding fixes; we can now detect and repair
casefolded dirents in non-casefolded dir and vice versa
- Fix for massive write inflation with mmapped io, which hit certain
databases"
* tag 'bcachefs-2025-05-22' of git://evilpiepirate.org/bcachefs:
bcachefs: Check for casefolded dirents in non casefolded dirs
bcachefs: Fix bch2_dirent_create_snapshot() for casefolding
bcachefs: Fix casefold opt via xattr interface
bcachefs: mkwrite() now only dirties one page
bcachefs: fix extent_has_stripe_ptr()
bcachefs: Fix bch2_btree_path_traverse_cached() when paths realloced
Linus Torvalds [Thu, 22 May 2025 23:25:23 +0000 (16:25 -0700)]
Merge tag 'pmdomain-v6.15-rc3' of git://git./linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
"Core:
- Fix error checking in genpd_dev_pm_attach_by_id()
Providers:
- renesas: Remove obsolete nullify checks for rcar domains"
* tag 'pmdomain-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: core: Fix error checking in genpd_dev_pm_attach_by_id()
pmdomain: renesas: rcar: Remove obsolete nullify checks
Linus Torvalds [Thu, 22 May 2025 23:15:59 +0000 (16:15 -0700)]
Merge tag 'mmc-v6.15-rc6' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:
- sdhci_am654: Fix MMC init failures on am62x boards
- sdhci-of-dwcmshc: Add PD workaround on RK3576 to avoid hang
* tag 'mmc-v6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci_am654: Add SDHCI_QUIRK2_SUPPRESS_V1P8_ENA quirk to am62 compatible
mmc: sdhci-of-dwcmshc: add PD workaround on RK3576
Linus Torvalds [Thu, 22 May 2025 20:08:21 +0000 (13:08 -0700)]
Merge tag 'block-6.15-
20250522' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Fix for a regression with setting up loop on a file system
without ->write_iter()
- Fix for an nvme sysfs regression
* tag 'block-6.15-
20250522' of git://git.kernel.dk/linux:
nvme: avoid creating multipath sysfs group under namespace path devices
loop: don't require ->write_iter for writable files in loop_configure
Linus Torvalds [Thu, 22 May 2025 20:05:00 +0000 (13:05 -0700)]
Merge tag 'io_uring-6.15-
20250522' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Kill a duplicate function definition, which can cause linking issues
in certain .config configurations. Introduced in this cycle.
- Fix for a potential overflow CQE reordering issue if a re-schedule is
done during posting. Heading to stable.
- Fix for an issue with recv bundles, where certain conditions can lead
to gaps in the buffers, where a contiguous buffer range was expected.
Heading to stable.
* tag 'io_uring-6.15-
20250522' of git://git.kernel.dk/linux:
io_uring/net: only retry recv bundle for a full transfer
io_uring: fix overflow resched cqe reordering
io_uring/cmd: axe duplicate io_uring_cmd_import_fixed_vec() declaration
Linus Torvalds [Thu, 22 May 2025 19:35:16 +0000 (12:35 -0700)]
Merge tag '6.15-rc8-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Two fixes for use after free in readdir code paths
* tag '6.15-rc8-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: Reset all search buffer pointers when releasing buffer
smb: client: Fix use-after-free in cifs_fill_dirent
Linus Torvalds [Thu, 22 May 2025 16:15:19 +0000 (09:15 -0700)]
Merge tag 'net-6.15-rc8' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"This is somewhat larger than what I hoped for, with a few PRs from
subsystems and follow-ups for the recent netdev locking changes,
anyhow there are no known pending regressions.
Including fixes from bluetooth, ipsec and CAN.
Current release - regressions:
- eth: team: grab team lock during team_change_rx_flags
- eth: bnxt_en: fix netdev locking in ULP IRQ functions
Current release - new code bugs:
- xfrm: ipcomp: fix truesize computation on receive
- eth: airoha: fix page recycling in airoha_qdma_rx_process()
Previous releases - regressions:
- sched: hfsc: fix qlen accounting bug when using peek in
hfsc_enqueue()
- mr: consolidate the ipmr_can_free_table() checks.
- bridge: netfilter: fix forwarding of fragmented packets
- xsk: bring back busy polling support in XDP_COPY
- can:
- add missing rcu read protection for procfs content
- kvaser_pciefd: force IRQ edge in case of nested IRQ
Previous releases - always broken:
- xfrm: espintcp: remove encap socket caching to avoid reference leak
- bluetooth: use skb_pull to avoid unsafe access in QCA dump handling
- eth: idpf:
- fix null-ptr-deref in idpf_features_check
- fix idpf_vport_splitq_napi_poll()
- eth: hibmcge: fix wrong ndo.open() after reset fail issue"
* tag 'net-6.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
octeontx2-af: Fix APR entry mapping based on APR_LMT_CFG
octeontx2-af: Set LMT_ENA bit for APR table entries
net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_done
octeontx2-pf: Avoid adding dcbnl_ops for LBK and SDP vf
selftests/tc-testing: Add an HFSC qlen accounting test
sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
idpf: fix idpf_vport_splitq_napi_poll()
net: hibmcge: fix wrong ndo.open() after reset fail issue.
net: hibmcge: fix incorrect statistics update issue
xsk: Bring back busy polling support in XDP_COPY
can: slcan: allow reception of short error messages
net: lan743x: Restore SGMII CTRL register on resume
bnxt_en: Fix netdev locking in ULP IRQ functions
MAINTAINERS: Drop myself to reviewer for ravb driver
net: dwmac-sun8i: Use parsed internal PHY address instead of 1
net: ethernet: ti: am65-cpsw: Lower random mac address error print to info
can: kvaser_pciefd: Continue parsing DMA buf after dropped RX
can: kvaser_pciefd: Fix echo_skb race
can: kvaser_pciefd: Force IRQ edge in case of nested IRQ
idpf: fix null-ptr-deref in idpf_features_check
...
Mario Limonciello [Thu, 22 May 2025 14:13:28 +0000 (09:13 -0500)]
Revert "drm/amd: Keep display off while going into S4"
commit
68bfdc8dc0a1a ("drm/amd: Keep display off while going into S4")
attempted to keep displays off during the S4 sequence by not resuming
display IP. This however leads to hangs because DRM clients such as the
console can try to access registers and cause a hang.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4155
Fixes:
68bfdc8dc0a1a ("drm/amd: Keep display off while going into S4")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250522141328.115095-1-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
e485502c37b097b0bd773baa7e2741bf7bd2909a)
Cc: stable@vger.kernel.org
Linus Torvalds [Thu, 22 May 2025 16:08:54 +0000 (09:08 -0700)]
Merge tag 'pinctrl-v6.15-4' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"This deals with a crash in the Qualcomm pin controller GPIO
parts when using hogs.
The first patch to gpiolib makes gpiochip_line_is_valid()
NULL-tolerant.
The second patch fixes the actual problem"
* tag 'pinctrl-v6.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: qcom: switch to devm_register_sys_off_handler()
gpiolib: don't crash on enabling GPIO HOG pins
Linus Torvalds [Thu, 22 May 2025 16:05:29 +0000 (09:05 -0700)]
Merge tag 'sound-6.15' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes for 6.15 final. It became slightly a
higher amount than expected, but all look easy and safe to apply:
- A fix for PCM core race spotted by fuzzing
- ASoC topology fix for single DAI link
- UAF fix for ASoC SOF Intel HD-audio at reloading
- ASoC SOF Intel and Mediatek fixes
- Trivial HD-audio quirks as usual"
* tag 'sound-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Add new HP ZBook laptop with micmute led fixup
ALSA: hda/realtek: Add support for HP Agusta using CS35L41 HDA
ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14ASP10
ALSA: hda/realtek - restore auto-mute mode for Dell Chrome platform
ALSA: pcm: Fix race of buffer access at PCM OSS layer
ASoC: SOF: Intel: hda: Fix UAF when reloading module
ASoc: SOF: topology: connect DAI to a single DAI link
ASoC: SOF: Intel: hda-bus: Use PIO mode on ACE2+ platforms
ASoC: SOF: ipc4-pcm: Delay reporting is only supported for playback direction
ASoC: SOF: ipc4-control: Use SOF_CTRL_CMD_BINARY as numid for bytes_ext
ASoC: mediatek: mt8188-mt6359: Depend on MT6359_ACCDET set or disabled
ASoC: mediatek: mt8188-mt6359: select CONFIG_SND_SOC_MT6359_ACCDET
Jens Axboe [Thu, 22 May 2025 15:25:47 +0000 (09:25 -0600)]
Merge tag 'nvme-6.15-2025-05-22' of git://git.infradead.org/nvme into block-6.15
Pull NVMe fix from Christoph:
"nvme fixes for Linux 6.15
- do not create the newly added multipath sysfs group for
non-multipath nodes (Nilay Shroff)"
* tag 'nvme-6.15-2025-05-22' of git://git.infradead.org/nvme:
nvme: avoid creating multipath sysfs group under namespace path devices
Larisa Grigore [Thu, 22 May 2025 14:51:32 +0000 (15:51 +0100)]
spi: spi-fsl-dspi: Reset SR flags before sending a new message
If, in a previous transfer, the controller sends more data than expected
by the DSPI target, SR.RFDF (RX FIFO is not empty) will remain asserted.
When flushing the FIFOs at the beginning of a new transfer (writing 1
into MCR.CLR_TXF and MCR.CLR_RXF), SR.RFDF should also be cleared.
Otherwise, when running in target mode with DMA, if SR.RFDF remains
asserted, the DMA callback will be fired before the controller sends any
data.
Take this opportunity to reset all Status Register fields.
Fixes:
5ce3cc567471 ("spi: spi-fsl-dspi: Provide support for DSPI slave mode operation (Vybryd vf610)")
Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://patch.msgid.link/20250522-james-nxp-spi-v2-3-bea884630cfb@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Bogdan-Gabriel Roman [Thu, 22 May 2025 14:51:31 +0000 (15:51 +0100)]
spi: spi-fsl-dspi: Halt the module after a new message transfer
The XSPI mode implementation in this driver still uses the EOQ flag to
signal the last word in a transmission and deassert the PCS signal.
However, at speeds lower than ~200kHZ, the PCS signal seems to remain
asserted even when SR[EOQF] = 1 indicates the end of a transmission.
This is a problem for target devices which require the deassertation of
the PCS signal between transfers.
Hence, this commit 'forces' the deassertation of the PCS by stopping the
module through MCR[HALT] after completing a new transfer. According to
the reference manual, the module stops or transitions from the Running
state to the Stopped state after the current frame, when any one of the
following conditions exist:
- The value of SR[EOQF] = 1.
- The chip is in Debug mode and the value of MCR[FRZ] = 1.
- The value of MCR[HALT] = 1.
This shouldn't be done if the last transfer in the message has cs_change
set.
Fixes:
ea93ed4c181b ("spi: spi-fsl-dspi: Use EOQ for last word in buffer even for XSPI mode")
Signed-off-by: Bogdan-Gabriel Roman <bogdan-gabriel.roman@nxp.com>
Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://patch.msgid.link/20250522-james-nxp-spi-v2-2-bea884630cfb@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Larisa Grigore [Thu, 22 May 2025 14:51:30 +0000 (15:51 +0100)]
spi: spi-fsl-dspi: restrict register range for regmap access
DSPI registers are NOT continuous, some registers are reserved and
accessing them from userspace will trigger external abort, add regmap
register access table to avoid below abort.
For example on S32G:
# cat /sys/kernel/debug/regmap/
401d8000.spi/registers
Internal error: synchronous external abort:
96000210 1 PREEMPT SMP
...
Call trace:
regmap_mmio_read32le+0x24/0x48
regmap_mmio_read+0x48/0x70
_regmap_bus_reg_read+0x38/0x48
_regmap_read+0x68/0x1b0
regmap_read+0x50/0x78
regmap_read_debugfs+0x120/0x338
Fixes:
1acbdeb92c87 ("spi/fsl-dspi: Convert to use regmap and add big-endian support")
Co-developed-by: Xulin Sun <xulin.sun@windriver.com>
Signed-off-by: Xulin Sun <xulin.sun@windriver.com>
Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://patch.msgid.link/20250522-james-nxp-spi-v2-1-bea884630cfb@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Greg Kroah-Hartman [Thu, 22 May 2025 10:47:31 +0000 (12:47 +0200)]
spi: use container_of_cont() for to_spi_device()
Some places in the spi core pass in a const pointer to a device and the
default container_of() casts that away, which is not a good idea.
Preserve the proper const attribute by using container_of_const() for
to_spi_device() instead, which is what it was designed for.
Note, this removes the NULL check for a device pointer in the call, but
no one was ever checking for that return value, and a device pointer
should never be NULL overall anyway, so this should be a safe change.
Cc: Mark Brown <broonie@kernel.org>
Fixes:
d69d80484598 ("driver core: have match() callback in struct bus_type take a const *")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2025052230-fidgeting-stooge-66f5@gregkh
Signed-off-by: Mark Brown <broonie@kernel.org>
Paolo Abeni [Thu, 22 May 2025 10:32:38 +0000 (12:32 +0200)]
Merge tag 'linux-can-fixes-for-6.15-
20250521' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2025-05-22
this is a pull request of 4 patches for net/main.
The first 3 patches are by Axel Forsman and fix a ISR race condition
in the kvaser_pciefd driver.
The last patch is by Carlos Sanchez and fixes the reception of short
error messages in the slcan driver.
* tag 'linux-can-fixes-for-6.15-
20250521' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: slcan: allow reception of short error messages
can: kvaser_pciefd: Continue parsing DMA buf after dropped RX
can: kvaser_pciefd: Fix echo_skb race
can: kvaser_pciefd: Force IRQ edge in case of nested IRQ
====================
Link: https://patch.msgid.link/20250522082344.490913-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 22 May 2025 10:17:44 +0000 (12:17 +0200)]
Merge branch 'octeontx2-af-apr-mapping-fixes'
Geetha sowjanya says:
====================
octeontx2-af: APR Mapping Fixes
This patch series includes fixes related to APR (LMT)
mapping and debugfs support.
Changes include:
Patch 1:Set LMT_ENA bit for APR table entries.
Enables the LMT line for each PF/VF by setting
the LMT_ENA bit in the APR_LMT_MAP_ENTRY_S
structure.
Patch-2:Fix APR entry in debugfs
The APR table was previously mapped using a fixed size,
which could lead to incorrect mappings when the number
of PFs and VFs differed from the assumed value.
This patch updates the logic to calculate the APR table
size dynamically, based on values from the APR_LMT_CFG
register, ensuring correct representation in debugfs.
====================
Link: https://patch.msgid.link/20250521060834.19780-1-gakula@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Geetha sowjanya [Wed, 21 May 2025 06:08:34 +0000 (11:38 +0530)]
octeontx2-af: Fix APR entry mapping based on APR_LMT_CFG
The current implementation maps the APR table using a fixed size,
which can lead to incorrect mapping when the number of PFs and VFs
varies.
This patch corrects the mapping by calculating the APR table
size dynamically based on the values configured in the
APR_LMT_CFG register, ensuring accurate representation
of APR entries in debugfs.
Fixes:
0daa55d033b0 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table").
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Link: https://patch.msgid.link/20250521060834.19780-3-gakula@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Subbaraya Sundeep [Wed, 21 May 2025 06:08:33 +0000 (11:38 +0530)]
octeontx2-af: Set LMT_ENA bit for APR table entries
This patch enables the LMT line for a PF/VF by setting the
LMT_ENA bit in the APR_LMT_MAP_ENTRY_S structure.
Additionally, it simplifies the logic for calculating the
LMTST table index by consistently using the maximum
number of hw supported VFs (i.e., 256).
Fixes:
873a1e3d207a ("octeontx2-af: cn10k: Setting up lmtst map table").
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250521060834.19780-2-gakula@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 22 May 2025 09:49:52 +0000 (11:49 +0200)]
Merge tag 'ipsec-2025-05-21' of git://git./linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2025-05-21
1) Fix some missing kfree_skb in the error paths of espintcp.
From Sabrina Dubroca.
2) Fix a reference leak in espintcp.
From Sabrina Dubroca.
3) Fix UDP GRO handling for ESPINUDP.
From Tobias Brunner.
4) Fix ipcomp truesize computation on the receive path.
From Sabrina Dubroca.
5) Sanitize marks before policy/state insertation.
From Paul Chaignon.
* tag 'ipsec-2025-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: Sanitize marks before insert
xfrm: ipcomp: fix truesize computation on receive
xfrm: Fix UDP GRO handling for some corner cases
espintcp: remove encap socket caching to avoid reference leak
espintcp: fix skb leaks
====================
Link: https://patch.msgid.link/20250521054348.4057269-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wang Liang [Tue, 20 May 2025 10:14:04 +0000 (18:14 +0800)]
net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_done
Syzbot reported a slab-use-after-free with the following call trace:
==================================================================
BUG: KASAN: slab-use-after-free in tipc_aead_encrypt_done+0x4bd/0x510 net/tipc/crypto.c:840
Read of size 8 at addr
ffff88807a733000 by task kworker/1:0/25
Call Trace:
kasan_report+0xd9/0x110 mm/kasan/report.c:601
tipc_aead_encrypt_done+0x4bd/0x510 net/tipc/crypto.c:840
crypto_request_complete include/crypto/algapi.h:266
aead_request_complete include/crypto/internal/aead.h:85
cryptd_aead_crypt+0x3b8/0x750 crypto/cryptd.c:772
crypto_request_complete include/crypto/algapi.h:266
cryptd_queue_worker+0x131/0x200 crypto/cryptd.c:181
process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231
Allocated by task 8355:
kzalloc_noprof include/linux/slab.h:778
tipc_crypto_start+0xcc/0x9e0 net/tipc/crypto.c:1466
tipc_init_net+0x2dd/0x430 net/tipc/core.c:72
ops_init+0xb9/0x650 net/core/net_namespace.c:139
setup_net+0x435/0xb40 net/core/net_namespace.c:343
copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508
create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
ksys_unshare+0x419/0x970 kernel/fork.c:3323
__do_sys_unshare kernel/fork.c:3394
Freed by task 63:
kfree+0x12a/0x3b0 mm/slub.c:4557
tipc_crypto_stop+0x23c/0x500 net/tipc/crypto.c:1539
tipc_exit_net+0x8c/0x110 net/tipc/core.c:119
ops_exit_list+0xb0/0x180 net/core/net_namespace.c:173
cleanup_net+0x5b7/0xbf0 net/core/net_namespace.c:640
process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231
After freed the tipc_crypto tx by delete namespace, tipc_aead_encrypt_done
may still visit it in cryptd_queue_worker workqueue.
I reproduce this issue by:
ip netns add ns1
ip link add veth1 type veth peer name veth2
ip link set veth1 netns ns1
ip netns exec ns1 tipc bearer enable media eth dev veth1
ip netns exec ns1 tipc node set key this_is_a_master_key master
ip netns exec ns1 tipc bearer disable media eth dev veth1
ip netns del ns1
The key of reproduction is that, simd_aead_encrypt is interrupted, leading
to crypto_simd_usable() return false. Thus, the cryptd_queue_worker is
triggered, and the tipc_crypto tx will be visited.
tipc_disc_timeout
tipc_bearer_xmit_skb
tipc_crypto_xmit
tipc_aead_encrypt
crypto_aead_encrypt
// encrypt()
simd_aead_encrypt
// crypto_simd_usable() is false
child = &ctx->cryptd_tfm->base;
simd_aead_encrypt
crypto_aead_encrypt
// encrypt()
cryptd_aead_encrypt_enqueue
cryptd_aead_enqueue
cryptd_enqueue_request
// trigger cryptd_queue_worker
queue_work_on(smp_processor_id(), cryptd_wq, &cpu_queue->work)
Fix this by holding net reference count before encrypt.
Reported-by: syzbot+55c12726619ff85ce1f6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
55c12726619ff85ce1f6
Fixes:
fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Link: https://patch.msgid.link/20250520101404.1341730-1-wangliang74@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suman Ghosh [Mon, 19 May 2025 07:26:58 +0000 (12:56 +0530)]
octeontx2-pf: Avoid adding dcbnl_ops for LBK and SDP vf
Priority flow control is not supported for LBK and SDP vf. This patch
adds support to not add dcbnl_ops for LBK and SDP vf.
Fixes:
8e67558177f8 ("octeontx2-pf: PFC config support with DCBx")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250519072658.2960851-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 22 May 2025 09:16:53 +0000 (11:16 +0200)]
Merge branch 'net_sched-fix-hfsc-qlen-backlog-accounting-bug-and-add-selftest'
Cong Wang says:
====================
net_sched: Fix HFSC qlen/backlog accounting bug and add selftest
This series addresses a long-standing bug in the HFSC qdisc where queue length
and backlog accounting could become inconsistent if a packet is dropped during
a peek-induced dequeue operation, and adds a corresponding selftest to tc-testing.
====================
Link: https://patch.msgid.link/20250518222038.58538-1-xiyou.wangcong@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Cong Wang [Sun, 18 May 2025 22:20:38 +0000 (15:20 -0700)]
selftests/tc-testing: Add an HFSC qlen accounting test
This test reproduces a scenario where HFSC queue length and backlog accounting
can become inconsistent when a peek operation triggers a dequeue and possible
drop before the parent qdisc updates its counters. The test sets up a DRR root
qdisc with an HFSC class, netem, and blackhole children, and uses Scapy to
inject a packet. It helps to verify that HFSC correctly tracks qlen and backlog
even when packets are dropped during peek-induced dequeue.
Cc: Mingi Cho <mincho@theori.io>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250518222038.58538-3-xiyou.wangcong@gmail.com
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Cong Wang [Sun, 18 May 2025 22:20:37 +0000 (15:20 -0700)]
sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
When enqueuing the first packet to an HFSC class, hfsc_enqueue() calls the
child qdisc's peek() operation before incrementing sch->q.qlen and
sch->qstats.backlog. If the child qdisc uses qdisc_peek_dequeued(), this may
trigger an immediate dequeue and potential packet drop. In such cases,
qdisc_tree_reduce_backlog() is called, but the HFSC qdisc's qlen and backlog
have not yet been updated, leading to inconsistent queue accounting. This
can leave an empty HFSC class in the active list, causing further
consequences like use-after-free.
This patch fixes the bug by moving the increment of sch->q.qlen and
sch->qstats.backlog before the call to the child qdisc's peek() operation.
This ensures that queue length and backlog are always accurate when packet
drops or dequeues are triggered during the peek.
Fixes:
12d0ad3be9c3 ("net/sched/sch_hfsc.c: handle corner cases where head may change invalidating calculated deadline")
Reported-by: Mingi Cho <mincho@theori.io>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250518222038.58538-2-xiyou.wangcong@gmail.com
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tushar Dave [Tue, 20 May 2025 01:19:37 +0000 (18:19 -0700)]
iommu: Skip PASID validation for devices without PASID capability
Generally PASID support requires ACS settings that usually create
single device groups, but there are some niche cases where we can get
multi-device groups and still have working PASID support. The primary
issue is that PCI switches are not required to treat PASID tagged TLPs
specially so appropriate ACS settings are required to route all TLPs to
the host bridge if PASID is going to work properly.
pci_enable_pasid() does check that each device that will use PASID has
the proper ACS settings to achieve this routing.
However, no-PASID devices can be combined with PASID capable devices
within the same topology using non-uniform ACS settings. In this case
the no-PASID devices may not have strict route to host ACS flags and
end up being grouped with the PASID devices.
This configuration fails to allow use of the PASID within the iommu
core code which wrongly checks if the no-PASID device supports PASID.
Fix this by ignoring no-PASID devices during the PASID validation. They
will never issue a PASID TLP anyhow so they can be ignored.
Fixes:
c404f55c26fc ("iommu: Validate the PASID in iommu_attach_device_pasid()")
Cc: stable@vger.kernel.org
Signed-off-by: Tushar Dave <tdave@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250520011937.3230557-1-tdave@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Eric Dumazet [Tue, 20 May 2025 12:40:30 +0000 (12:40 +0000)]
idpf: fix idpf_vport_splitq_napi_poll()
idpf_vport_splitq_napi_poll() can incorrectly return @budget
after napi_complete_done() has been called.
This violates NAPI rules, because after napi_complete_done(),
current thread lost napi ownership.
Move the test against POLL_MODE before the napi_complete_done().
Fixes:
c2d548cad150 ("idpf: add TX splitq napi poll support")
Reported-by: Peter Newman <peternewman@google.com>
Closes: https://lore.kernel.org/netdev/
20250520121908.
1805732-1-edumazet@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Joshua Hay <joshua.a.hay@intel.com>
Cc: Alan Brady <alan.brady@intel.com>
Cc: Madhu Chittim <madhu.chittim@intel.com>
Cc: Phani Burra <phani.r.burra@intel.com>
Cc: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Link: https://patch.msgid.link/20250520124030.1983936-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Namjae Jeon [Tue, 20 May 2025 00:25:03 +0000 (09:25 +0900)]
ksmbd: use list_first_entry_or_null for opinfo_get_list()
The list_first_entry() macro never returns NULL. If the list is
empty then it returns an invalid pointer. Use list_first_entry_or_null()
to check if the list is empty.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/
202505080231.7OXwq4Te-lkp@intel.com/
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Wed, 21 May 2025 08:07:36 +0000 (17:07 +0900)]
ksmbd: fix rename failure
I found that rename fails after cifs mount due to update of
lookup_one_qstr_excl().
mv a/c b/
mv: cannot move 'a/c' to 'b/c': No such file or directory
In order to rename to a new name regardless of whether the dentry is
negative, we need to get the dentry through lookup_one_qstr_excl().
So It will not return error if the name doesn't exist.
Fixes:
204a575e91f3 ("VFS: add common error checks to lookup_one_qstr_excl()")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Jens Axboe [Thu, 22 May 2025 00:51:49 +0000 (18:51 -0600)]
io_uring/net: only retry recv bundle for a full transfer
If a shorter than assumed transfer was seen, a partial buffer will have
been filled. For that case it isn't sane to attempt to fill more into
the bundle before posting a completion, as that will cause a gap in
the received data.
Check if the iterator has hit zero and only allow to continue a bundle
operation if that is the case.
Also ensure that for putting finished buffers, only the current transfer
is accounted. Otherwise too many buffers may be put for a short transfer.
Link: https://github.com/axboe/liburing/issues/1409
Cc: stable@vger.kernel.org
Fixes:
7c71a0af81ba ("io_uring/net: improve recv bundles")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Thu, 22 May 2025 00:24:18 +0000 (17:24 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Fixes for some SoC clk drivers:
- Define the gate clk for the OTG PHY on Rockchip RK3576 so the nvmem
driver actually works
- Initialize clk_hw_onecell_data::num before accessing the 'hws'
array to keep UBSAN happy
- Fix a perf degradation on the Allwinner D1 MMC clk that was making
things half bad
- Fix the Allwinner SNXI_CCU_MP_DATA_WITH_MUX_GATE_FEAT macro to have
proper order of arguments"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi-ng: d1: Add missing divider for MMC mod clocks
clk: s2mps11: initialise clk_hw_onecell_data::num before accessing ::hws[] in probe()
clk: sunxi-ng: fix order of arguments in clock macro
clk: rockchip: rk3576: define clk_otp_phy_g
Kent Overstreet [Wed, 21 May 2025 04:38:04 +0000 (00:38 -0400)]
bcachefs: Check for casefolded dirents in non casefolded dirs
Check for mismatches between casefold dirents and casefold directories.
A mismatch will cause lookups to fail, as we'll be doing the lookup with
the casefolded name, which won't match the non-casefolded dirent, and
vice versa.
Reported-by: Christopher Snowhill <chris@kode54.net>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 21 May 2025 04:41:07 +0000 (00:41 -0400)]
bcachefs: Fix bch2_dirent_create_snapshot() for casefolding
bch2_dirent_create_snapshot(), used in fsck, neglected to create a
casefolded dirent.
Just move this into dirent_create_key().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 21 May 2025 00:05:45 +0000 (20:05 -0400)]
bcachefs: Fix casefold opt via xattr interface
Changing the casefold option requires extra checks/work - factor out a
helper from bch2_fileattr_set() for the xattr code to use.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Jakub Kicinski [Wed, 21 May 2025 22:53:54 +0000 (15:53 -0700)]
Merge branch 'there-are-some-bugfix-for-hibmcge-driver'
Jijie Shao says:
====================
There are some bugfix for hibmcge driver
v1: https://lore.kernel.org/
20250430093127.
2400813-1-shaojijie@huawei.com
====================
Link: https://patch.msgid.link/20250517095828.1763126-1-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jijie Shao [Sat, 17 May 2025 09:58:28 +0000 (17:58 +0800)]
net: hibmcge: fix wrong ndo.open() after reset fail issue.
If the driver reset fails, it may not work properly.
Therefore, the ndo.open() operation should be rejected.
In this patch, the driver calls netif_device_detach()
before the reset and calls netif_device_attach()
after the reset succeeds. If the reset fails,
netif_device_attach() is not called. Therefore,
netdev does not present and cannot be opened.
If reset fails, only the PCI reset (via sysfs)
can be used to attempt recovery.
Fixes:
3f5a61f6d504 ("net: hibmcge: Add reset supported in this module")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250517095828.1763126-3-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jijie Shao [Sat, 17 May 2025 09:58:27 +0000 (17:58 +0800)]
net: hibmcge: fix incorrect statistics update issue
When the user dumps statistics, the hibmcge driver automatically
updates all statistics. If the driver is performing the reset operation,
the error data of 0xFFFFFFFF is updated.
Therefore, if the driver is resetting, the hbg_update_stats_by_info()
needs to return directly.
Fixes:
c0bf9bf31e79 ("net: hibmcge: Add support for dump statistics")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250517095828.1763126-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arnd Bergmann [Wed, 21 May 2025 21:26:53 +0000 (23:26 +0200)]
Merge tag 'mvebu-fixes-6.15-1' of https://git./linux/kernel/git/gclement/mvebu into arm/fixes
mvebu fixes for 6.15 (part 1)
Fix uDPU board LEDs by correcting pinctrl state
* tag 'mvebu-fixes-6.15-1' of https://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu:
arm64: dts: marvell: uDPU: define pinctrl state for alarm LEDs
Link: https://lore.kernel.org/r/87wmagpr0a.fsf@BLaptop.bootlin.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arnd Bergmann [Wed, 21 May 2025 21:26:38 +0000 (23:26 +0200)]
Merge tag 'sunxi-fixes-for-6.15' of https://git./linux/kernel/git/sunxi/linux into arm/fixes
Allwinner fixes for 6.15
Only one fix:
Switch back to I2C for PMICs on Allwinner H6 devices. Apparently using
Allwinner's proprietary bus ended up causing issues when the PMIC was
sharing the bus with other devices.
* tag 'sunxi-fixes-for-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
Revert "arm64: dts: allwinner: h6: Use RSB for AXP805 PMIC connection"
Link: https://lore.kernel.org/r/aCaeLgjZllV7bauX@wens.tw
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fabio Estevam [Thu, 15 May 2025 15:41:36 +0000 (12:41 -0300)]
arm64: defconfig: Ensure CRYPTO_CHACHA20_NEON is selected
Since commit
17ec3e71ba79 ("crypto: lib/Kconfig - Hide arch options from
user"), the CRYPTO_CHACHA20_NEON option is no longer selected by default
due to changes in how crypto library options are exposed and selected.
To restore the previous behavior and ensure CRYPTO_CHACHA20_NEON is
enabled, explicitly select CONFIG_CRYPTO_CHACHA20 in the defconfig. This
pulls in CRYPTO_LIB_CHACHA_INTERNAL and allows CRYPTO_CHACHA20_NEON to be
selected automatically as before.
Fixes:
17ec3e71ba79 ("crypto: lib/Kconfig - Hide arch options from user")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arnd Bergmann [Wed, 21 May 2025 17:02:31 +0000 (19:02 +0200)]
Merge tag 'samsung-fixes-6.15' of https://git./linux/kernel/git/krzk/linux into arm/fixes
Samsung SoC driver fixes for v6.15
1. Exynos ACPM driver (used on Google GS101): Fix timeout due to missing
responses from the firmware part.
2. Samsung USI (serial engines) driver: Correct ineffective
unconfiguring of the interface during probe removal.
* tag 'samsung-fixes-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
soc: samsung: usi: prevent wrong bits inversion during unconfiguring
firmware: exynos-acpm: check saved RX before bailing out on empty RX queue
Link: https://lore.kernel.org/r/20250513101023.21552-5-krzysztof.kozlowski@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pavel Begunkov [Sat, 17 May 2025 12:27:37 +0000 (13:27 +0100)]
io_uring: fix overflow resched cqe reordering
Leaving the CQ critical section in the middle of a overflow flushing
can cause cqe reordering since the cache cq pointers are reset and any
new cqe emitters that might get called in between are not going to be
forced into io_cqe_cache_refill().
Fixes:
eac2ca2d682f9 ("io_uring: check if we need to reschedule during overflow flush")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/90ba817f1a458f091f355f407de1c911d2b93bbf.1747483784.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Nilay Shroff [Wed, 21 May 2025 12:41:23 +0000 (18:11 +0530)]
nvme: avoid creating multipath sysfs group under namespace path devices
Commit
4dbd2b2ebe4c ("nvme-multipath: Add visibility for round-robin
io-policy") introduced the creation of the multipath sysfs group under
the NVMe head gendisk device node. However, it also inadvertently added
the same sysfs group under each namespace path device which head node
refers to and that is incorrect.
The multipath sysfs group should only be exposed through the namespace
head gendisk node. This is sufficient, as the head device already
provides symbolic links to the individual namespace paths it manages.
This patch fixes the issue by preventing the creation of the multipath
sysfs group under namespace path devices, ensuring it only appears under
the head disk node.
Fixes:
4dbd2b2ebe4c ("nvme-multipath: Add visibility for round-robin io-policy")
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Samiullah Khawaja [Fri, 16 May 2025 21:36:38 +0000 (21:36 +0000)]
xsk: Bring back busy polling support in XDP_COPY
Commit
5ef44b3cb43b ("xsk: Bring back busy polling support") fixed the
busy polling support in xsk for XDP_ZEROCOPY after it was broken in
commit
86e25f40aa1e ("net: napi: Add napi_config"). The busy polling
support with XDP_COPY remained broken since the napi_id setup in
xsk_rcv_check was removed.
Bring back the setup of napi_id for XDP_COPY so socket level SO_BUSYPOLL
can be used to poll the underlying napi.
Do the setup of napi_id for XDP_COPY in xsk_bind, as it is done
currently for XDP_ZEROCOPY. The setup of napi_id for XDP_COPY in
xsk_bind is safe because xsk_rcv_check checks that the rx queue at which
the packet arrives is equal to the queue_id that was supplied in bind.
This is done for both XDP_COPY and XDP_ZEROCOPY mode.
Tested using AF_XDP support in virtio-net by running the xsk_rr AF_XDP
benchmarking tool shared here:
https://lore.kernel.org/all/
20250320163523.
3501305-1-skhawaja@google.com/T/
Enabled socket busy polling using following commands in qemu,
```
sudo ethtool -L eth0 combined 1
echo 400 | sudo tee /proc/sys/net/core/busy_read
echo 100 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
```
Fixes:
5ef44b3cb43b ("xsk: Bring back busy polling support")
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Carlos Sanchez [Tue, 20 May 2025 10:23:05 +0000 (12:23 +0200)]
can: slcan: allow reception of short error messages
Allows slcan to receive short messages (typically errors) from the serial
interface.
When error support was added to slcan protocol in
b32ff4668544e1333b694fcc7812b2d7397b4d6a ("can: slcan: extend the protocol
with error info") the minimum valid message size changed from 5 (minimum
standard can frame tIII0) to 3 ("e1a" is a valid protocol message, it is
one of the examples given in the comments for slcan_bump_err() ), but the
check for minimum message length prodicating all decoding was not adjusted.
This makes short error messages discarded and error frames not being
generated.
This patch changes the minimum length to the new minimum (3 characters,
excluding terminator, is now a valid message).
Signed-off-by: Carlos Sanchez <carlossanchez@geotab.com>
Fixes:
b32ff4668544 ("can: slcan: extend the protocol with error info")
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://patch.msgid.link/20250520102305.1097494-1-carlossanchez@geotab.com
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Christian Brauner [Wed, 21 May 2025 07:34:31 +0000 (09:34 +0200)]
Merge patch series "fs/buffer: misc optimizations"
Davidlohr Bueso <dave@stgolabs.net> says:
Four small patches - the first could be sent to Linus for v6.15
considering it is a missing nonblocking lookup conversion in the getblk
slowpath I had missed. The other two patches are small optimizations
found while reading the code, and one rocket science cleanup patch.
* patches from https://lore.kernel.org/
20250515173925.147823-1-dave@stgolabs.net:
fs/buffer: optimize discard_buffer()
fs/buffer: remove superfluous statements
fs/buffer: avoid redundant lookup in getblk slowpath
fs/buffer: use sleeping lookup in __getblk_slowpath()
Link: https://lore.kernel.org/20250515173925.147823-1-dave@stgolabs.net
Signed-off-by: Christian Brauner <brauner@kernel.org>
Davidlohr Bueso [Thu, 15 May 2025 17:39:25 +0000 (10:39 -0700)]
fs/buffer: optimize discard_buffer()
While invalidating, the clearing of the bits in discard_buffer()
is done in one fully ordered CAS operation. In the past this was
done via individual clear_bit(), until
e7470ee89f0 (fs: buffer:
do not use unnecessary atomic operations when discarding buffers).
This implies that there were never strong ordering requirements
outside of being serialized by the buffer lock.
As such relax the ordering for archs that can benefit. Further,
the implied ordering in buffer_unlock() makes current cmpxchg
implied barrier redundant due to release semantics. And while in
theory the unlock could be part of the bulk clearing, it is
best to leave it explicit, but without the double barriers.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/20250515173925.147823-5-dave@stgolabs.net
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Davidlohr Bueso [Thu, 15 May 2025 17:39:24 +0000 (10:39 -0700)]
fs/buffer: remove superfluous statements
Get rid of those unnecessary return statements.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/20250515173925.147823-4-dave@stgolabs.net
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Davidlohr Bueso [Thu, 15 May 2025 17:39:23 +0000 (10:39 -0700)]
fs/buffer: avoid redundant lookup in getblk slowpath
__getblk_slow() already implies failing a first lookup
as the fastpath, so try to create the buffers immediately
and avoid the redundant lookup. This saves 5-10% of the
total cost/latency of the slowpath.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/20250515173925.147823-3-dave@stgolabs.net
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Davidlohr Bueso [Thu, 15 May 2025 17:39:22 +0000 (10:39 -0700)]
fs/buffer: use sleeping lookup in __getblk_slowpath()
Just as with the fast path, call the lookup variant depending
on the gfp flags.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/20250515173925.147823-2-dave@stgolabs.net
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Lorenzo Stoakes [Thu, 15 May 2025 19:13:58 +0000 (20:13 +0100)]
MAINTAINERS: add mm memory policy section
As part of the ongoing efforts to sub-divide memory management
maintainership and reviewership, establish a section for memory policy and
migration and add appropriate maintainers and reviewers.
[lorenzo.stoakes@oracle.com: add Ying as reviewer]
Link: https://lkml.kernel.org/r/ed6f0fc2-5608-4eea-b1be-07e3e19be263@lucifer.local
Link: https://lkml.kernel.org/r/20250515191358.205684-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Rakie Kim <rakie.kim@sk.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Huang Ying <ying.huang@linux.alibaba.com>
Acked-by: Byungchul Park <byungchul@sk.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gregory Price <gourry@gourry.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Thu, 15 May 2025 19:04:04 +0000 (20:04 +0100)]
MAINTAINERS: add mm ksm section
As part of the ongoing efforts to sub-divide memory management
maintainership and reviewership, establish a section for Kernel Samepage
Merging (KSM) and add appropriate maintainers and reviewers.
Link: https://lkml.kernel.org/r/20250515190404.203596-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: Xu Xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexander Gordeev [Thu, 15 May 2025 13:55:38 +0000 (15:55 +0200)]
kasan: avoid sleepable page allocation from atomic context
apply_to_pte_range() enters the lazy MMU mode and then invokes
kasan_populate_vmalloc_pte() callback on each page table walk iteration.
However, the callback can go into sleep when trying to allocate a single
page, e.g. if an architecutre disables preemption on lazy MMU mode enter.
On s390 if make arch_enter_lazy_mmu_mode() -> preempt_enable() and
arch_leave_lazy_mmu_mode() -> preempt_disable(), such crash occurs:
[ 0.663336] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321
[ 0.663348] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
[ 0.663358] preempt_count: 1, expected: 0
[ 0.663366] RCU nest depth: 0, expected: 0
[ 0.663375] no locks held by kthreadd/2.
[ 0.663383] Preemption disabled at:
[ 0.663386] [<
0002f3284cbb4eda>] apply_to_pte_range+0xfa/0x4a0
[ 0.663405] CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted
6.15.0-rc5-gcc-kasan-00043-gd76bb1ebb558-dirty #162 PREEMPT
[ 0.663408] Hardware name: IBM 3931 A01 701 (KVM/Linux)
[ 0.663409] Call Trace:
[ 0.663410] [<
0002f3284c385f58>] dump_stack_lvl+0xe8/0x140
[ 0.663413] [<
0002f3284c507b9e>] __might_resched+0x66e/0x700
[ 0.663415] [<
0002f3284cc4f6c0>] __alloc_frozen_pages_noprof+0x370/0x4b0
[ 0.663419] [<
0002f3284ccc73c0>] alloc_pages_mpol+0x1a0/0x4a0
[ 0.663421] [<
0002f3284ccc8518>] alloc_frozen_pages_noprof+0x88/0xc0
[ 0.663424] [<
0002f3284ccc8572>] alloc_pages_noprof+0x22/0x120
[ 0.663427] [<
0002f3284cc341ac>] get_free_pages_noprof+0x2c/0xc0
[ 0.663429] [<
0002f3284cceba70>] kasan_populate_vmalloc_pte+0x50/0x120
[ 0.663433] [<
0002f3284cbb4ef8>] apply_to_pte_range+0x118/0x4a0
[ 0.663435] [<
0002f3284cbc7c14>] apply_to_pmd_range+0x194/0x3e0
[ 0.663437] [<
0002f3284cbc99be>] __apply_to_page_range+0x2fe/0x7a0
[ 0.663440] [<
0002f3284cbc9e88>] apply_to_page_range+0x28/0x40
[ 0.663442] [<
0002f3284ccebf12>] kasan_populate_vmalloc+0x82/0xa0
[ 0.663445] [<
0002f3284cc1578c>] alloc_vmap_area+0x34c/0xc10
[ 0.663448] [<
0002f3284cc1c2a6>] __get_vm_area_node+0x186/0x2a0
[ 0.663451] [<
0002f3284cc1e696>] __vmalloc_node_range_noprof+0x116/0x310
[ 0.663454] [<
0002f3284cc1d950>] __vmalloc_node_noprof+0xd0/0x110
[ 0.663457] [<
0002f3284c454b88>] alloc_thread_stack_node+0xf8/0x330
[ 0.663460] [<
0002f3284c458d56>] dup_task_struct+0x66/0x4d0
[ 0.663463] [<
0002f3284c45be90>] copy_process+0x280/0x4b90
[ 0.663465] [<
0002f3284c460940>] kernel_clone+0xd0/0x4b0
[ 0.663467] [<
0002f3284c46115e>] kernel_thread+0xbe/0xe0
[ 0.663469] [<
0002f3284c4e440e>] kthreadd+0x50e/0x7f0
[ 0.663472] [<
0002f3284c38c04a>] __ret_from_fork+0x8a/0xf0
[ 0.663475] [<
0002f3284ed57ff2>] ret_from_fork+0xa/0x38
Instead of allocating single pages per-PTE, bulk-allocate the shadow
memory prior to applying kasan_populate_vmalloc_pte() callback on a page
range.
Link: https://lkml.kernel.org/r/c61d3560297c93ed044f0b1af085610353a06a58.1747316918.git.agordeev@linux.ibm.com
Fixes:
3c5c3cfb9ef4 ("kasan: support backing vmalloc space with real shadow memory")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 14 May 2025 17:06:02 +0000 (18:06 +0100)]
highmem: add folio_test_partial_kmap()
In commit
c749d9b7ebbc ("iov_iter: fix copy_page_from_iter_atomic() if
KMAP_LOCAL_FORCE_MAP"), Hugh correctly noted that if KMAP_LOCAL_FORCE_MAP
is enabled, we must limit ourselves to PAGE_SIZE bytes per call to
kmap_local(). The same problem exists in memcpy_from_folio(),
memcpy_to_folio(), folio_zero_tail(), folio_fill_tail() and
memcpy_from_file_folio(), so add folio_test_partial_kmap() to do this more
succinctly.
Link: https://lkml.kernel.org/r/20250514170607.3000994-2-willy@infradead.org
Fixes:
00cdf76012ab ("mm: add memcpy_from_file_folio()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lance Yang [Tue, 13 May 2025 05:22:34 +0000 (13:22 +0800)]
MAINTAINERS: add hung-task detector section
The hung-task detector is missing in MAINTAINERS. While it's been quiet
recently, I'm actively working on it and volunteering to review patches.
Adding this section will make it easier for contributors to know who to
contact.
Link: https://lkml.kernel.org/r/20250513052234.46463-1-lance.yang@linux.dev
Signed-off-by: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wang Yaxin [Sat, 10 May 2025 07:54:13 +0000 (15:54 +0800)]
taskstats: fix struct taskstats breaks backward compatibility since version 15
Problem
========
commit
658eb5ab916d ("delayacct: add delay max to record delay peak")
- adding more fields
commit
f65c64f311ee ("delayacct: add delay min to record delay peak")
- adding more fields
commit
b016d0873777 ("taskstats: modify taskstats version")
- version bump to 15
Since version 15 (TASKSTATS_VERSION=15) the new layout of the structure
adds fields in the middle of the structure, rendering all old software
incompatible with newer kernels and software compiled against the new
kernel headers incompatible with older kernels.
Solution
=========
move delay max and delay min to the end of taskstat, and bump
the version to 16 after the change
[wang.yaxin@zte.com.cn: adjust indentation]
Link: https://lkml.kernel.org/r/202505192131489882NSciXV4EGd8zzjLuwoOK@zte.com.cn
Link: https://lkml.kernel.org/r/20250510155413259V4JNRXxukdDgzsaL0Fo6a@zte.com.cn
Fixes:
f65c64f311ee ("delayacct: add delay min to record delay peak")
Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhang Yi [Mon, 12 May 2025 06:28:25 +0000 (14:28 +0800)]
mm/truncate: fix out-of-bounds when doing a right-aligned split
When performing a right split on a folio, the split_at2 may point to a
not-present page if the offset + length equals the original folio size,
which will trigger the following error:
BUG: unable to handle page fault for address:
ffffea0006000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD
143ffb9067 P4D
143ffb9067 PUD
143ffb8067 PMD 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 0 UID: 0 PID: 502640 Comm: fsx Not tainted
6.15.0-rc3-gc6156189fc6b #889 PR
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/4
RIP: 0010:truncate_inode_partial_folio+0x208/0x620
Code: ff 03 48 01 da e8 78 7e 13 00 48 83 05 10 b5 5a 0c 01 85 c0 0f 85 1c 02 001
RSP: 0018:
ffffc90005bafab0 EFLAGS:
00010286
RAX:
0000000000000000 RBX:
ffffea0005ffff00 RCX:
0000000000000002
RDX:
000000000000000c RSI:
0000000000013975 RDI:
ffffc90005bafa30
RBP:
ffffea0006000000 R08:
0000000000000000 R09:
00000000000009bf
R10:
00000000000007e0 R11:
0000000000000000 R12:
0000000000001633
R13:
0000000000000000 R14:
ffffea0005ffff00 R15:
fffffffffffffffe
FS:
00007f9f9a161740(0000) GS:
ffff8894971fd000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffffea0006000008 CR3:
000000017c2ae000 CR4:
00000000000006f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
<TASK>
truncate_inode_pages_range+0x226/0x720
truncate_pagecache+0x57/0x90
...
Fix this issue by skipping the split if truncation aligns with the folio
size, make sure the split page number lies within the folio.
Link: https://lkml.kernel.org/r/20250512062825.3533342-1-yi.zhang@huaweicloud.com
Fixes:
7460b470a131 ("mm/truncate: use folio_split() in truncate operation")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: ErKun Yang <yangerkun@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Mon, 12 May 2025 14:31:22 +0000 (15:31 +0100)]
MAINTAINERS: add mm reclaim section
In furtherance of ongoing efforts to ensure people are aware of who
de-facto maintains/has an interest in specific parts of mm, as well trying
to avoid get_maintainers.pl listing only Andrew and the mailing list for
mm files - establish a reclaim memory management section and add relevant
maintainers/reviewers.
This is a key part of memory management so sensibly deserves its own
section.
This encompasses both 'classical' reclaim and MGLRU and thus reflects this
in the reviewers from both, as well as those who have contributed
specifically on the memcg side of things.
Link: https://lkml.kernel.org/r/20250512143122.87740-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Mon, 12 May 2025 14:46:03 +0000 (15:46 +0100)]
MAINTAINERS: update page allocator section
Make Vlastimil maintainer of this section (with thanks to Vlastimil for
agreeing to this!) and add page isolation files for which this section
seem most appropriate.
We may wish to, in future, refactor/rename some of these files to more
logically fit what is actually being performed, but for the time being
this seems the most sensible place.
Additionally, fix the alphabetical ordering of files.
Link: https://lkml.kernel.org/r/20250512144603.90379-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Florent Revest [Wed, 7 May 2025 13:09:57 +0000 (15:09 +0200)]
mm: fix VM_UFFD_MINOR == VM_SHADOW_STACK on USERFAULTFD=y && ARM64_GCS=y
On configs with CONFIG_ARM64_GCS=y, VM_SHADOW_STACK is bit 38. On configs
with CONFIG_HAVE_ARCH_USERFAULTFD_MINOR=y (selected by CONFIG_ARM64 when
CONFIG_USERFAULTFD=y), VM_UFFD_MINOR is _also_ bit 38.
This bit being shared by two different VMA flags could lead to all sorts
of unintended behaviors. Presumably, a process could maybe call into
userfaultfd in a way that disables the shadow stack vma flag. I can't
think of any attack where this would help (presumably, if an attacker
tries to disable shadow stacks, they are trying to hijack control flow so
can't arbitrarily call into userfaultfd yet anyway) but this still feels
somewhat scary.
Link: https://lkml.kernel.org/r/20250507131000.1204175-2-revest@chromium.org
Fixes:
ae80e1629aea ("mm: Define VM_SHADOW_STACK for arm64 when we support GCS")
Signed-off-by: Florent Revest <revest@chromium.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Florent Revest <revest@chromium.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ignacio Moreno Gonzalez [Wed, 7 May 2025 13:28:06 +0000 (15:28 +0200)]
mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is enabled
commit
c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE") maps the
mmap option MAP_STACK to VM_NOHUGEPAGE. This is also done if
CONFIG_TRANSPARENT_HUGEPAGE is not defined. But in that case, the
VM_NOHUGEPAGE does not make sense.
I discovered this issue when trying to use the tool CRIU to checkpoint and
restore a container. Our running kernel is compiled without
CONFIG_TRANSPARENT_HUGEPAGE. CRIU parses the output of /proc/<pid>/smaps
and saves the "nh" flag. When trying to restore the container, CRIU fails
to restore the "nh" mappings, since madvise() MADV_NOHUGEPAGE always
returns an error because CONFIG_TRANSPARENT_HUGEPAGE is not defined.
Link: https://lkml.kernel.org/r/20250507-map-map_stack-to-vm_nohugepage-only-if-thp-is-enabled-v5-1-c6c38cfefd6e@kuka.com
Fixes:
c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE")
Signed-off-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez@kuka.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Yang Shi <yang@os.amperecomputing.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Uladzislau Rezki (Sony) [Wed, 7 May 2025 15:02:57 +0000 (17:02 +0200)]
MAINTAINERS: add myself as vmalloc co-maintainer
I have been working on the vmalloc code for several years, contributing to
improvements and fixes. Add myself as co-maintainer ("M") alongside
Andrew Morton.
Link: https://lkml.kernel.org/r/20250507150257.61485-1-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Christop Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tianyang Zhang [Wed, 16 Apr 2025 08:24:05 +0000 (16:24 +0800)]
mm/page_alloc.c: avoid infinite retries caused by cpuset race
__alloc_pages_slowpath has no change detection for ac->nodemask in the
part of retry path, while cpuset can modify it in parallel. For some
processes that set mempolicy as MPOL_BIND, this results ac->nodemask
changes, and then the should_reclaim_retry will judge based on the latest
nodemask and jump to retry, while the get_page_from_freelist only
traverses the zonelist from ac->preferred_zoneref, which selected by a
expired nodemask and may cause infinite retries in some cases
cpu 64:
__alloc_pages_slowpath {
/* ..... */
retry:
/* ac->nodemask = 0x1, ac->preferred->zone->nid = 1 */
if (alloc_flags & ALLOC_KSWAPD)
wake_all_kswapds(order, gfp_mask, ac);
/* cpu 1:
cpuset_write_resmask
update_nodemask
update_nodemasks_hier
update_tasks_nodemask
mpol_rebind_task
mpol_rebind_policy
mpol_rebind_nodemask
// mempolicy->nodes has been modified,
// which ac->nodemask point to
*/
/* ac->nodemask = 0x3, ac->preferred->zone->nid = 1 */
if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
did_some_progress > 0, &no_progress_loops))
goto retry;
}
Simultaneously starting multiple cpuset01 from LTP can quickly reproduce
this issue on a multi node server when the maximum memory pressure is
reached and the swap is enabled
Link: https://lkml.kernel.org/r/20250416082405.20988-1-zhangtianyang@loongson.cn
Fixes:
c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Wed, 21 May 2025 03:12:20 +0000 (20:12 -0700)]
Merge tag 'rproc-v6.15-fixes' of git://git./linux/kernel/git/remoteproc/linux
Pull remoteproc fix from Bjorn Andersson:
"Address a regression preventing the wireless subsystem remoteproc on
some Qualcomm platforms (e.g. SDM632) from working"
* tag 'rproc-v6.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
remoteproc: qcom_wcnss: Fix on platforms without fallback regulators
Linus Torvalds [Wed, 21 May 2025 03:10:01 +0000 (20:10 -0700)]
Merge tag 'v6.15-p7' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a regression in padata as well as an ancient double-free
bug in af_alg"
* tag 'v6.15-p7' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: algif_hash - fix double free in hash_accept
padata: do not leak refcount in reorder_work
Thangaraj Samynathan [Fri, 16 May 2025 03:57:19 +0000 (09:27 +0530)]
net: lan743x: Restore SGMII CTRL register on resume
SGMII_CTRL register, which specifies the active interface, was not
properly restored when resuming from suspend. This led to incorrect
interface selection after resume particularly in scenarios involving
the FPGA.
To fix this:
- Move the SGMII_CTRL setup out of the probe function.
- Initialize the register in the hardware initialization helper function,
which is called during both device initialization and resume.
This ensures the interface configuration is consistently restored after
suspend/resume cycles.
Fixes:
a46d9d37c4f4f ("net: lan743x: Add support for SGMII interface")
Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
Link: https://patch.msgid.link/20250516035719.117960-1-thangaraj.s@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 19 May 2025 20:41:28 +0000 (13:41 -0700)]
bnxt_en: Fix netdev locking in ULP IRQ functions
netdev_lock is already held when calling bnxt_ulp_irq_stop() and
bnxt_ulp_irq_restart(). When converting rtnl_lock to netdev_lock,
the original code was rtnl_dereference() to indicate that rtnl_lock
was already held. rcu_dereference_protected() is the correct
conversion after replacing rtnl_lock with netdev_lock.
Add a new helper netdev_lock_dereference() similar to
rtnl_dereference().
Fixes:
004b5008016a ("eth: bnxt: remove most dependencies on RTNL")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250519204130.3097027-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>