linux-block.git
5 days agotools/power turbostat: Allow probing RAPL with platform_features->rapl_msrs cleared
Zhang Rui [Sat, 17 May 2025 09:44:50 +0000 (17:44 +0800)]
tools/power turbostat: Allow probing RAPL with platform_features->rapl_msrs cleared

platform_features->rapl_msrs describes the RAPL MSRs supported. While
RAPL Perf counters can be exposed from different kernel backend drivers,
e.g. RAPL MSR I/F driver, or RAPL TPMI I/F driver.

Thus, turbostat should first blindly probe all the available RAPL Perf
counters, and falls back to the RAPL MSR counters if they are listed in
platform_features->rapl_msrs.

With this, platforms that don't have RAPL MSRs can clear the
platform_features->rapl_msrs bits and use RAPL Perf counters only.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
5 days agotools/power turbostat: Clean up add perf/msr counter logic
Zhang Rui [Sat, 17 May 2025 09:35:17 +0000 (17:35 +0800)]
tools/power turbostat: Clean up add perf/msr counter logic

Increase the code readability by moving the no_perf/no_msr flag and the
cai->perf_name/cai->msr sanity checks into the counter probe functions.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
5 days agotools/power turbostat: Introduce add_msr_counter()
Zhang Rui [Sat, 17 May 2025 07:58:51 +0000 (15:58 +0800)]
tools/power turbostat: Introduce add_msr_counter()

probe_rapl_msr() is reused for probing RAPL MSR counters, cstate MSR
counters and MPERF/APERF/SMI MSR counters, thus its name is misleading.

Similar to add_perf_counter(), introduce add_msr_counter() to probe a
counter via MSR. Introduce wrapper function add_rapl_msr_counter() at
the same time to add extra check for Zero return value for specified
RAPL counters.

No functional change intended.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
5 days agotools/power turbostat: Remove add_msr_perf_counter_()
Zhang Rui [Sat, 17 May 2025 09:40:08 +0000 (17:40 +0800)]
tools/power turbostat: Remove add_msr_perf_counter_()

As the only caller of add_msr_perf_counter_(), add_msr_perf_counter()
just gives extra debug output on top. There is no need to keep both
functions.

Remove add_msr_perf_counter_() and move all the logic to
add_msr_perf_counter().

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
5 days agotools/power turbostat: Remove add_cstate_perf_counter_()
Zhang Rui [Sat, 17 May 2025 07:43:59 +0000 (15:43 +0800)]
tools/power turbostat: Remove add_cstate_perf_counter_()

As the only caller of add_cstate_perf_counter_(),
add_cstate_perf_counter() just gives extra debug output on top. There is
no need to keep both functions.

Remove add_cstate_perf_counter_() and move all the logic to
add_cstate_perf_counter().

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
5 days agotools/power turbostat: Remove add_rapl_perf_counter_()
Zhang Rui [Sat, 17 May 2025 04:06:22 +0000 (12:06 +0800)]
tools/power turbostat: Remove add_rapl_perf_counter_()

As the only caller of add_rapl_perf_counter_(), add_rapl_perf_counter()
just gives extra debug output on top. There is no need to keep both
functions.

Remove add_rapl_perf_counter_() and move all the logic to
add_rapl_perf_counter().

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
5 days agotools/power turbostat: Quit early for unsupported RAPL counters
Zhang Rui [Sat, 17 May 2025 02:26:14 +0000 (10:26 +0800)]
tools/power turbostat: Quit early for unsupported RAPL counters

Quit early for unsupported RAPL counters.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
5 days agotools/power turbostat: Always check rapl_joules flag
Zhang Rui [Fri, 30 May 2025 06:00:33 +0000 (14:00 +0800)]
tools/power turbostat: Always check rapl_joules flag

rapl_joules bit should always be checked even if
platform_features->rapl_msrs is not set or no_msr flag is used.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
5 days agotools/power turbostat: Fix AMD package-energy reporting
Gautham R. Shenoy [Thu, 29 May 2025 11:48:25 +0000 (17:18 +0530)]
tools/power turbostat: Fix AMD package-energy reporting

commit 05a2f07db888 ("tools/power turbostat: read RAPL counters via
perf") that adds support to read RAPL counters via perf defines the
notion of a RAPL domain_id which is set to physical_core_id on
platforms which support per_core_rapl counters (Eg: AMD processors
Family 17h onwards) and is set to the physical_package_id on all the
other platforms.

However, the physical_core_id is only unique within a package and on
platforms with multiple packages more than one core can have the same
physical_core_id and thus the same domain_id. (For eg, the first cores
of each package have the physical_core_id = 0). This results in all
these cores with the same physical_core_id using the same entry in the
rapl_counter_info_perdomain[]. Since rapl_perf_init() skips the
perf-initialization for cores whose domain_ids have already been
visited, cores that have the same physical_core_id always read the
perf file corresponding to the physical_core_id of the first package
and thus the package-energy is incorrectly reported to be the same
value for different packages.

Note: This issue only arises when RAPL counters are read via perf and
not when they are read via MSRs since in the latter case the MSRs are
read separately on each core.

Fix this issue by associating each CPU with rapl_core_id which is
unique across all the packages in the system.

Fixes: 05a2f07db888 ("tools/power turbostat: read RAPL counters via perf")
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
5 days agotools/power turbostat: Fix RAPL_GFX_ALL typo
Kaushlendra Kumar [Fri, 23 May 2025 08:06:59 +0000 (13:36 +0530)]
tools/power turbostat: Fix RAPL_GFX_ALL typo

Fix typo in the currently unused RAPL_GFX_ALL macro definition.

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
5 days agotools/power turbostat: Add Android support for MSR device handling
Kaushlendra Kumar [Thu, 22 May 2025 08:49:46 +0000 (14:19 +0530)]
tools/power turbostat: Add Android support for MSR device handling

It uses /dev/msrN device paths on Android instead of /dev/cpu/N/msr,
updates error messages and permission checks to reflect the Android
device path, and wraps platform-specific code with #if defined(ANDROID)
to ensure correct behavior on both Android and non-Android systems.
These changes improve compatibility and usability of turbostat on
Android devices.

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
5 days agotools/power turbostat.8: pm_domain wording fix
Len Brown [Fri, 18 Apr 2025 21:54:39 +0000 (17:54 -0400)]
tools/power turbostat.8: pm_domain wording fix

turbostat.8: clarify that uncore "domains" are Power Management domains,
aka pm_domains.

Signed-off-by: Len Brown <len.brown@intel.com>
5 days agotools/power turbostat.8: fix typo: idle_pct should be pct_idle
Len Brown [Wed, 9 Apr 2025 04:06:24 +0000 (00:06 -0400)]
tools/power turbostat.8: fix typo: idle_pct should be pct_idle

idle_pct should be pct_idle

Signed-off-by: Len Brown <len.brown@intel.com>
2 weeks agoLinux 6.15 v6.15
Linus Torvalds [Sun, 25 May 2025 23:09:23 +0000 (16:09 -0700)]
Linux 6.15

2 weeks agoDisable FOP_DONTCACHE for now due to bugs
Linus Torvalds [Sun, 25 May 2025 22:43:36 +0000 (15:43 -0700)]
Disable FOP_DONTCACHE for now due to bugs

This is kind of last-minute, but Al Viro reported that the new
FOP_DONTCACHE flag causes memory corruption due to use-after-free
issues.

This was triggered by commit 974c5e6139db ("xfs: flag as supporting
FOP_DONTCACHE"), but that is not the underlying bug - it is just the
first user of the flag.

Vlastimil Babka suspects the underlying problem stems from the
folio_end_writeback() logic introduced in commit fb7d3bc414939
("mm/filemap: drop streaming/uncached pages when writeback completes").

The most straightforward fix would be to just revert the commit that
exposed this, but Matthew Wilcox points out that other filesystems are
also starting to enable the FOP_DONTCACHE logic, so this instead
disables that bit globally for now.

The fix will hopefully end up being trivial and we can just re-enable
this logic after more testing, but until such a time we'll have to
disable the new FOP_DONTCACHE flag.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/all/20250525083209.GS2023217@ZenIV/
Triggered-by: 974c5e6139db ("xfs: flag as supporting FOP_DONTCACHE")
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 weeks agoMerge tag 'mm-hotfixes-stable-2025-05-25-00-58' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sun, 25 May 2025 14:48:35 +0000 (07:48 -0700)]
Merge tag 'mm-hotfixes-stable-2025-05-25-00-58' of git://git./linux/kernel/git/akpm/mm

Pull hotfixes from Andrew Morton:
 "22 hotfixes.

  13 are cc:stable and the remainder address post-6.14 issues or aren't
  considered necessary for -stable kernels. 19 are for MM"

* tag 'mm-hotfixes-stable-2025-05-25-00-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
  mailmap: add Jarkko's employer email address
  mm: fix copy_vma() error handling for hugetlb mappings
  memcg: always call cond_resched() after fn()
  mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  mm: vmalloc: only zero-init on vrealloc shrink
  mm: vmalloc: actually use the in-place vrealloc region
  alloc_tag: allocate percpu counters for module tags dynamically
  module: release codetag section when module load fails
  mm/cma: make detection of highmem_start more robust
  MAINTAINERS: add mm memory policy section
  MAINTAINERS: add mm ksm section
  kasan: avoid sleepable page allocation from atomic context
  highmem: add folio_test_partial_kmap()
  MAINTAINERS: add hung-task detector section
  taskstats: fix struct taskstats breaks backward compatibility since version 15
  mm/truncate: fix out-of-bounds when doing a right-aligned split
  MAINTAINERS: add mm reclaim section
  MAINTAINERS: update page allocator section
  mm: fix VM_UFFD_MINOR == VM_SHADOW_STACK on USERFAULTFD=y && ARM64_GCS=y
  mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is enabled
  ...

2 weeks agomailmap: add Jarkko's employer email address
Jarkko Sakkinen [Fri, 23 May 2025 12:11:04 +0000 (15:11 +0300)]
mailmap: add Jarkko's employer email address

Add the current employer email address to mailmap.

Link: https://lkml.kernel.org/r/20250523121105.15850-1-jarkko@kernel.org
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Cc: Antonio Quartulli <antonio@openvpn.net>
Cc: Carlos Bilbao <carlos.bilbao@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 weeks agomm: fix copy_vma() error handling for hugetlb mappings
Ricardo Cañuelo Navarro [Fri, 23 May 2025 12:19:10 +0000 (14:19 +0200)]
mm: fix copy_vma() error handling for hugetlb mappings

If, during a mremap() operation for a hugetlb-backed memory mapping,
copy_vma() fails after the source vma has been duplicated and opened (ie.
vma_link() fails), the error is handled by closing the new vma.  This
updates the hugetlbfs reservation counter of the reservation map which at
this point is referenced by both the source vma and the new copy.  As a
result, once the new vma has been freed and copy_vma() returns, the
reservation counter for the source vma will be incorrect.

This patch addresses this corner case by clearing the hugetlb private page
reservation reference for the new vma and decrementing the reference
before closing the vma, so that vma_close() won't update the reservation
counter.  This is also what copy_vma_and_data() does with the source vma
if copy_vma() succeeds, so a helper function has been added to do the
fixup in both functions.

The issue was reported by a private syzbot instance and can be reproduced
using the C reproducer in [1].  It's also a possible duplicate of public
syzbot report [2].  The WARNING report is:

============================================================
page_counter underflow: -1024 nr_pages=1024
WARNING: CPU: 0 PID: 3287 at mm/page_counter.c:61 page_counter_cancel+0xf6/0x120
Modules linked in:
CPU: 0 UID: 0 PID: 3287 Comm: repro__WARNING_ Not tainted 6.15.0-rc7+ #54 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014
RIP: 0010:page_counter_cancel+0xf6/0x120
Code: ff 5b 41 5e 41 5f 5d c3 cc cc cc cc e8 f3 4f 8f ff c6 05 64 01 27 06 01 48 c7 c7 60 15 f8 85 48 89 de 4c 89 fa e8 2a a7 51 ff <0f> 0b e9 66 ff ff ff 44 89 f9 80 e1 07 38 c1 7c 9d 4c 81
RSP: 0018:ffffc900025df6a0 EFLAGS: 00010246
RAX: 2edfc409ebb44e00 RBX: fffffffffffffc00 RCX: ffff8880155f0000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: dffffc0000000000 R08: ffffffff81c4a23c R09: 1ffff1100330482a
R10: dffffc0000000000 R11: ffffed100330482b R12: 0000000000000000
R13: ffff888058a882c0 R14: ffff888058a882c0 R15: 0000000000000400
FS:  0000000000000000(0000) GS:ffff88808fc53000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004b33e0 CR3: 00000000076d6000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 page_counter_uncharge+0x33/0x80
 hugetlb_cgroup_uncharge_counter+0xcb/0x120
 hugetlb_vm_op_close+0x579/0x960
 ? __pfx_hugetlb_vm_op_close+0x10/0x10
 remove_vma+0x88/0x130
 exit_mmap+0x71e/0xe00
 ? __pfx_exit_mmap+0x10/0x10
 ? __mutex_unlock_slowpath+0x22e/0x7f0
 ? __pfx_exit_aio+0x10/0x10
 ? __up_read+0x256/0x690
 ? uprobe_clear_state+0x274/0x290
 ? mm_update_next_owner+0xa9/0x810
 __mmput+0xc9/0x370
 exit_mm+0x203/0x2f0
 ? __pfx_exit_mm+0x10/0x10
 ? taskstats_exit+0x32b/0xa60
 do_exit+0x921/0x2740
 ? do_raw_spin_lock+0x155/0x3b0
 ? __pfx_do_exit+0x10/0x10
 ? __pfx_do_raw_spin_lock+0x10/0x10
 ? _raw_spin_lock_irq+0xc5/0x100
 do_group_exit+0x20c/0x2c0
 get_signal+0x168c/0x1720
 ? __pfx_get_signal+0x10/0x10
 ? schedule+0x165/0x360
 arch_do_signal_or_restart+0x8e/0x7d0
 ? __pfx_arch_do_signal_or_restart+0x10/0x10
 ? __pfx___se_sys_futex+0x10/0x10
 syscall_exit_to_user_mode+0xb8/0x2c0
 do_syscall_64+0x75/0x120
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x422dcd
Code: Unable to access opcode bytes at 0x422da3.
RSP: 002b:00007ff266cdb208 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: 0000000000000001 RBX: 00007ff266cdbcdc RCX: 0000000000422dcd
RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00000000004c7bec
RBP: 00007ff266cdb220 R08: 203a6362696c6720 R09: 203a6362696c6720
R10: 0000200000c00000 R11: 0000000000000246 R12: ffffffffffffffd0
R13: 0000000000000002 R14: 00007ffe1cb5f520 R15: 00007ff266cbb000
 </TASK>
============================================================

Link: https://lkml.kernel.org/r/20250523-warning_in_page_counter_cancel-v2-1-b6df1a8cfefd@igalia.com
Link: https://people.igalia.com/rcn/kernel_logs/20250422__WARNING_in_page_counter_cancel__repro.c
Link: https://lore.kernel.org/all/67000a50.050a0220.49194.048d.GAE@google.com/
Signed-off-by: Ricardo Cañuelo Navarro <rcn@igalia.com>
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Florent Revest <revest@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 weeks agomemcg: always call cond_resched() after fn()
Breno Leitao [Fri, 23 May 2025 17:21:06 +0000 (10:21 -0700)]
memcg: always call cond_resched() after fn()

I am seeing soft lockup on certain machine types when a cgroup OOMs.  This
is happening because killing the process in certain machine might be very
slow, which causes the soft lockup and RCU stalls.  This happens usually
when the cgroup has MANY processes and memory.oom.group is set.

Example I am seeing in real production:

       [462012.244552] Memory cgroup out of memory: Killed process 3370438 (crosvm) ....
       ....
       [462037.318059] Memory cgroup out of memory: Killed process 4171372 (adb) ....
       [462037.348314] watchdog: BUG: soft lockup - CPU#64 stuck for 26s! [stat_manager-ag:1618982]
       ....

Quick look at why this is so slow, it seems to be related to serial flush
for certain machine types.  For all the crashes I saw, the target CPU was
at console_flush_all().

In the case above, there are thousands of processes in the cgroup, and it
is soft locking up before it reaches the 1024 limit in the code (which
would call the cond_resched()).  So, cond_resched() in 1024 blocks is not
sufficient.

Remove the counter-based conditional rescheduling logic and call
cond_resched() unconditionally after each task iteration, after fn() is
called.  This avoids the lockup independently of how slow fn() is.

Link: https://lkml.kernel.org/r/20250523-memcg_fix-v1-1-ad3eafb60477@debian.org
Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
Signed-off-by: Breno Leitao <leitao@debian.org>
Suggested-by: Rik van Riel <riel@surriel.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Michael van der Westhuizen <rmikey@meta.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 weeks agomm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
Ge Yang [Thu, 22 May 2025 03:22:17 +0000 (11:22 +0800)]
mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios

A kernel crash was observed when replacing free hugetlb folios:

BUG: kernel NULL pointer dereference, address: 0000000000000028
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary)
RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0
RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000
RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000
RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000
R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000
R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004
FS:  00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0
Call Trace:
<TASK>
 replace_free_hugepage_folios+0xb6/0x100
 alloc_contig_range_noprof+0x18a/0x590
 ? srso_return_thunk+0x5/0x5f
 ? down_read+0x12/0xa0
 ? srso_return_thunk+0x5/0x5f
 cma_range_alloc.constprop.0+0x131/0x290
 __cma_alloc+0xcf/0x2c0
 cma_alloc_write+0x43/0xb0
 simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110
 debugfs_attr_write+0x46/0x70
 full_proxy_write+0x62/0xa0
 vfs_write+0xf8/0x420
 ? srso_return_thunk+0x5/0x5f
 ? filp_flush+0x86/0xa0
 ? srso_return_thunk+0x5/0x5f
 ? filp_close+0x1f/0x30
 ? srso_return_thunk+0x5/0x5f
 ? do_dup2+0xaf/0x160
 ? srso_return_thunk+0x5/0x5f
 ksys_write+0x65/0xe0
 do_syscall_64+0x64/0x170
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

There is a potential race between __update_and_free_hugetlb_folio() and
replace_free_hugepage_folios():

CPU1                              CPU2
__update_and_free_hugetlb_folio   replace_free_hugepage_folios
                                    folio_test_hugetlb(folio)
                                    -- It's still hugetlb folio.

  __folio_clear_hugetlb(folio)
  hugetlb_free_folio(folio)
                                    h = folio_hstate(folio)
                                    -- Here, h is NULL pointer

When the above race condition occurs, folio_hstate(folio) returns NULL,
and subsequent access to this NULL pointer will cause the system to crash.
To resolve this issue, execute folio_hstate(folio) under the protection
of the hugetlb_lock lock, ensuring that folio_hstate(folio) does not
return NULL.

Link: https://lkml.kernel.org/r/1747884137-26685-1-git-send-email-yangge1116@126.com
Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration")
Signed-off-by: Ge Yang <yangge1116@126.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 weeks agomm: vmalloc: only zero-init on vrealloc shrink
Kees Cook [Thu, 15 May 2025 21:42:16 +0000 (14:42 -0700)]
mm: vmalloc: only zero-init on vrealloc shrink

The common case is to grow reallocations, and since init_on_alloc will
have already zeroed the whole allocation, we only need to zero when
shrinking the allocation.

Link: https://lkml.kernel.org/r/20250515214217.619685-2-kees@kernel.org
Fixes: a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing")
Signed-off-by: Kees Cook <kees@kernel.org>
Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: "Erhard F." <erhard_f@mailbox.org>
Cc: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 weeks agomm: vmalloc: actually use the in-place vrealloc region
Kees Cook [Thu, 15 May 2025 21:42:15 +0000 (14:42 -0700)]
mm: vmalloc: actually use the in-place vrealloc region

Patch series "mm: vmalloc: Actually use the in-place vrealloc region".

This fixes a performance regression[1] with vrealloc()[1].

The refactoring to not build a new vmalloc region only actually worked
when shrinking.  Actually return the resized area when it grows.  Ugh.

Link: https://lkml.kernel.org/r/20250515214217.619685-1-kees@kernel.org
Fixes: a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing")
Signed-off-by: Kees Cook <kees@kernel.org>
Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Closes: https://lore.kernel.org/all/20250515-bpf-verifier-slowdown-vwo2meju4cgp2su5ckj@6gi6ssxbnfqg [1]
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Tested-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Cc: "Erhard F." <erhard_f@mailbox.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 weeks agoalloc_tag: allocate percpu counters for module tags dynamically
Suren Baghdasaryan [Sat, 17 May 2025 00:07:39 +0000 (17:07 -0700)]
alloc_tag: allocate percpu counters for module tags dynamically

When a module gets unloaded it checks whether any of its tags are still in
use and if so, we keep the memory containing module's allocation tags
alive until all tags are unused.  However percpu counters referenced by
the tags are freed by free_module().  This will lead to UAF if the memory
allocated by a module is accessed after module was unloaded.

To fix this we allocate percpu counters for module allocation tags
dynamically and we keep it alive for tags which are still in use after
module unloading.  This also removes the requirement of a larger
PERCPU_MODULE_RESERVE when memory allocation profiling is enabled because
percpu memory for counters does not need to be reserved anymore.

Link: https://lkml.kernel.org/r/20250517000739.5930-1-surenb@google.com
Fixes: 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/all/20250516131246.6244-1-00107082@163.com/
Tested-by: David Wang <00107082@163.com>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 weeks agomodule: release codetag section when module load fails
David Wang [Mon, 19 May 2025 16:38:23 +0000 (00:38 +0800)]
module: release codetag section when module load fails

When module load fails after memory for codetag section is ready, codetag
section memory will not be properly released.  This causes memory leak,
and if next module load happens to get the same module address, codetag
may pick the uninitialized section when manipulating tags during module
unload, and leads to "unable to handle page fault" BUG.

Link: https://lkml.kernel.org/r/20250519163823.7540-1-00107082@163.com
Fixes: 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory")
Closes: https://lore.kernel.org/all/20250516131246.6244-1-00107082@163.com/
Signed-off-by: David Wang <00107082@163.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 weeks agomm/cma: make detection of highmem_start more robust
Mike Rapoport (Microsoft) [Mon, 19 May 2025 17:18:05 +0000 (20:18 +0300)]
mm/cma: make detection of highmem_start more robust

Pratyush Yadav reports the following crash:

    ------------[ cut here ]------------
    kernel BUG at arch/x86/mm/physaddr.c:23!
    ception 0x06 IP 10:ffffffff812ebbf8 error 0 cr2 0xffff88903ffff000
    CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc6+ #231 PREEMPT(undef)
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
    RIP: 0010:__phys_addr+0x58/0x60
    Code: 01 48 89 c2 48 d3 ea 48 85 d2 75 05 e9 91 52 cf 00 0f 0b 48 3d ff ff ff 1f 77 0f 48 8b 05 20 54 55 01 48 01 d0 e9 78 52 cf 00 <0f> 0b 90 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90
    RSP: 0000:ffffffff82803dd8 EFLAGS: 00010006 ORIG_RAX: 0000000000000000
    RAX: 000000007fffffff RBX: 00000000ffffffff RCX: 0000000000000000
    RDX: 000000007fffffff RSI: 0000000280000000 RDI: ffffffffffffffff
    RBP: ffffffff82803e68 R08: 0000000000000000 R09: 0000000000000000
    R10: ffffffff83153180 R11: ffffffff82803e48 R12: ffffffff83c9aed0
    R13: 0000000000000000 R14: 0000001040000000 R15: 0000000000000000
    FS:  0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff88903ffff000 CR3: 0000000002838000 CR4: 00000000000000b0
    Call Trace:
     <TASK>
     ? __cma_declare_contiguous_nid+0x6e/0x340
     ? cma_declare_contiguous_nid+0x33/0x70
     ? dma_contiguous_reserve_area+0x2f/0x70
     ? setup_arch+0x6f1/0x870
     ? start_kernel+0x52/0x4b0
     ? x86_64_start_reservations+0x29/0x30
     ? x86_64_start_kernel+0x7c/0x80
     ? common_startup_64+0x13e/0x141

  The reason is that __cma_declare_contiguous_nid() does:

          highmem_start = __pa(high_memory - 1) + 1;

  If dma_contiguous_reserve_area() (or any other CMA declaration) is
  called before free_area_init(), high_memory is uninitialized. Without
  CONFIG_DEBUG_VIRTUAL, it will likely work but use the wrong value for
  highmem_start.

The issue occurs because commit e120d1bc12da ("arch, mm: set high_memory
in free_area_init()") moved initialization of high_memory after the call
to dma_contiguous_reserve() -> __cma_declare_contiguous_nid() on several
architectures.

In the case CONFIG_HIGHMEM is enabled, some architectures that actually
support HIGHMEM (arm, powerpc and x86) have initialization of high_memory
before a possible call to __cma_declare_contiguous_nid() and some
initialized high_memory late anyway (arc, csky, microblase, mips, sparc,
xtensa) even before the commit e120d1bc12da so they are fine with using
uninitialized value of high_memory.

And in the case CONFIG_HIGHMEM is disabled high_memory essentially becomes
the first address after memory end, so instead of relying on high_memory
to calculate highmem_start use memblock_end_of_DRAM() and eliminate the
dependency of CMA area creation on high_memory in majority of
configurations.

Link: https://lkml.kernel.org/r/20250519171805.1288393-1-rppt@kernel.org
Fixes: e120d1bc12da ("arch, mm: set high_memory in free_area_init()")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reported-by: Pratyush Yadav <ptyadav@amazon.de>
Tested-by: Pratyush Yadav <ptyadav@amazon.de>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 weeks agoMerge tag 'input-for-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 25 May 2025 01:54:18 +0000 (18:54 -0700)]
Merge tag 'input-for-v6.15-rc7' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

 - even more Xbox controllers added to xpad driver: Turtle Beach Recon
   Wired Controller, Turtle Beach Stealth Ultra, and PowerA Wired
   Controller

 - a fix to Synaptics RMI driver to not crash if controller reports
   unsupported version of F34 (firmware flash) function

* tag 'input-for-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: synaptics-rmi - fix crash with unsupported versions of F34
  Input: xpad - add more controllers

2 weeks agoMerge tag 'spi-fix-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brooni...
Linus Torvalds [Sun, 25 May 2025 01:48:17 +0000 (18:48 -0700)]
Merge tag 'spi-fix-v6.15-rc7' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A few final fixes for v6.15, some driver fixes for the Freescale DSPI
  driver pulled over from their vendor code and another instance of the
  fixes Greg has been sending throughout the kernel for constification
  of the bus_type in driver core match() functions"

* tag 'spi-fix-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi-fsl-dspi: Reset SR flags before sending a new message
  spi: spi-fsl-dspi: Halt the module after a new message transfer
  spi: spi-fsl-dspi: restrict register range for regmap access
  spi: use container_of_cont() for to_spi_device()

2 weeks agoMerge tag 'iommu-fixes-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 24 May 2025 16:01:41 +0000 (09:01 -0700)]
Merge tag 'iommu-fixes-v6.15-rc7' of git://git./linux/kernel/git/iommu/linux

Pull iommu fix from Joerg Roedel:

 - core: skip PASID validation for devices without PASID support

* tag 'iommu-fixes-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu: Skip PASID validation for devices without PASID capability

3 weeks agoMerge tag 'drm-fixes-2025-05-24' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 23 May 2025 22:17:55 +0000 (15:17 -0700)]
Merge tag 'drm-fixes-2025-05-24' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Weekly drm fixes pull, on target to be quiet, just one amdgpu, one
  edid and a few minor xe fixes.

  edid:
   - fix HDR metadata reset

  amdgpu:
   - Hibernate fix

  xe:
   - Make sure to check all forcewakes when dumping mocs
   - Fix wrong use of read64 on 32b register
   - Synchronize Panther Lake PCI IDs"

* tag 'drm-fixes-2025-05-24' of https://gitlab.freedesktop.org/drm/kernel:
  drm/xe/ptl: Update the PTL pci id table
  drm/xe: Use xe_mmio_read32() to read mtcfg register
  drm/xe/mocs: Check if all domains awake
  Revert "drm/amd: Keep display off while going into S4"
  drm/edid: fixed the bug that hdr metadata was not reset

3 weeks agoMerge tag 'drm-xe-fixes-2025-05-23' of https://gitlab.freedesktop.org/drm/xe/kernel...
Dave Airlie [Fri, 23 May 2025 21:42:16 +0000 (07:42 +1000)]
Merge tag 'drm-xe-fixes-2025-05-23' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Make sure to check all forcewakes when dumping mocs
- Fix wrong use of read64 on 32b register
- Synchronize Panther Lake PCI IDs

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/uixp5cq7emz32lmwwvq4vbujppugfozhyj3cm2aqzx4lcg7ivn@m2khvf4kvz5p
3 weeks agoMerge tag 'amd-drm-fixes-6.15-2025-05-22' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Fri, 23 May 2025 21:41:55 +0000 (07:41 +1000)]
Merge tag 'amd-drm-fixes-6.15-2025-05-22' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.15-2025-05-22:

amdgpu:
- Hibernate fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250522183941.9606-1-alexander.deucher@amd.com
3 weeks agoMerge tag 'drm-misc-fixes-2025-05-22' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Fri, 23 May 2025 21:37:45 +0000 (07:37 +1000)]
Merge tag 'drm-misc-fixes-2025-05-22' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

edid:
- fix HDR metadata reset

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250522113902.GA7000@localhost.localdomain
3 weeks agoMerge tag 'thermal-6.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 23 May 2025 16:47:43 +0000 (09:47 -0700)]
Merge tag 'thermal-6.15-rc8' of git://git./linux/kernel/git/rafael/linux-pm

Pull thermal control fix from Rafael Wysocki:
 "This fixes a coding mistake in the x86_pkg_temp_thermal Intel thermal
  driver that was introduced by an incorrect conflict resolution during
  a merge (Zhang Rui)"

* tag 'thermal-6.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: intel: x86_pkg_temp_thermal: Fix bogus trip temperature

3 weeks agoMerge tag 'v6.15-rc8-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Fri, 23 May 2025 15:42:29 +0000 (08:42 -0700)]
Merge tag 'v6.15-rc8-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - Fix for rename regression due to the recent VFS lookup changes

 - Fix write failure

 - locking fix for oplock handling

* tag 'v6.15-rc8-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: use list_first_entry_or_null for opinfo_get_list()
  ksmbd: fix rename failure
  ksmbd: fix stream write failure

3 weeks agoMerge tag 'soc-fixes-6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Fri, 23 May 2025 15:04:13 +0000 (08:04 -0700)]
Merge tag 'soc-fixes-6.15-3' of git://git./linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
 "A few last minute fixes:

   - two driver fixes for samsung/google platforms, both addressing
     mistakes in changes from the 6.15 merge window

   - a revert for an allwinner devicetree change that caused problems

   - a fix for an older regression with the LEDs on Marvell Armada 3720

   - a defconfig change to enable chacha20 again after a crypto
     subsystem change in 6.15 inadventently turned it off"

* tag 'soc-fixes-6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  arm64: defconfig: Ensure CRYPTO_CHACHA20_NEON is selected
  arm64: dts: marvell: uDPU: define pinctrl state for alarm LEDs
  Revert "arm64: dts: allwinner: h6: Use RSB for AXP805 PMIC connection"
  soc: samsung: usi: prevent wrong bits inversion during unconfiguring
  firmware: exynos-acpm: check saved RX before bailing out on empty RX queue

3 weeks agoMerge tag 'platform-drivers-x86-v6.15-6' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 23 May 2025 14:59:39 +0000 (07:59 -0700)]
Merge tag 'platform-drivers-x86-v6.15-6' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform drivers fixes from Ilpo Järvinen:

 - dell-wmi-sysman: Avoid buffer overflow in current_password_store()

 - fujitsu-laptop: Support Lifebook S2110 hotkeys

 - intel/pmc: Fix Arrow Lake U/H NPU PCI ID

 - think-lmi: Fix attribute name usage for non-compliant items

 - thinkpad_acpi: Ignore battery threshold change event notification

* tag 'platform-drivers-x86-v6.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86/intel/pmc: Fix Arrow Lake U/H NPU PCI ID
  platform/x86: think-lmi: Fix attribute name usage for non-compliant items
  platform/x86: thinkpad_acpi: Ignore battery threshold change event notification
  platform/x86: dell-wmi-sysman: Avoid buffer overflow in current_password_store()
  platform/x86: fujitsu-laptop: Support Lifebook S2110 hotkeys

3 weeks agoMerge tag 'vfs-6.15-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Linus Torvalds [Fri, 23 May 2025 14:51:05 +0000 (07:51 -0700)]
Merge tag 'vfs-6.15-rc8.fixes' of git://git./linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
 "This contains a small set of fixes for the blocking buffer lookup
  conversion done earlier this cycle.

  It adds a missing conversion in the getblk slowpath and a few minor
  optimizations and cleanups"

* tag 'vfs-6.15-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs/buffer: optimize discard_buffer()
  fs/buffer: remove superfluous statements
  fs/buffer: avoid redundant lookup in getblk slowpath
  fs/buffer: use sleeping lookup in __getblk_slowpath()

3 weeks agoplatform/x86/intel/pmc: Fix Arrow Lake U/H NPU PCI ID
Todd Brandt [Tue, 20 May 2025 10:45:55 +0000 (03:45 -0700)]
platform/x86/intel/pmc: Fix Arrow Lake U/H NPU PCI ID

The ARL requires that the GMA and NPU devices both be in D3Hot in order
for PC10 and S0iX to be achieved in S2idle. The original ARL-H/U addition
to the intel_pmc_core driver attempted to do this by switching them to D3
in the init and resume calls of the intel_pmc_core driver.

The problem is the ARL-H/U have a different NPU device and thus are not
being properly set and thus S0iX does not work properly in ARL-H/U. This
patch creates a new ARL-H specific device id that is correct and also
adds the D3 fixup to the suspend callback. This way if the PCI devies
drop from D3 to D0 after resume they can be corrected for the next
suspend. Thus there is no dropout in S0iX.

Fixes: bd820906ea9d ("platform/x86/intel/pmc: Add Arrow Lake U/H support to intel_pmc_core driver")
Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
Link: https://lore.kernel.org/r/a61f78be45c13f39e122dcc684b636f4b21e79a0.1747737446.git.todd.e.brandt@intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 weeks agodrm/xe/ptl: Update the PTL pci id table
Matt Atwood [Tue, 20 May 2025 19:57:49 +0000 (12:57 -0700)]
drm/xe/ptl: Update the PTL pci id table

Update to current bspec table.

Bspec: 72574

Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Link: https://lore.kernel.org/r/20250520195749.371748-1-matthew.s.atwood@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 49c6dc74b5968885f421f9f1b45eb4890b955870)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 weeks agodrm/xe: Use xe_mmio_read32() to read mtcfg register
Shuicheng Lin [Tue, 13 May 2025 15:30:10 +0000 (15:30 +0000)]
drm/xe: Use xe_mmio_read32() to read mtcfg register

The mtcfg register is a 32-bit register and should therefore be
accessed using xe_mmio_read32().

Other 3 changes per codestyle suggestion:
"
xe_mmio.c:83: CHECK: Alignment should match open parenthesis
xe_mmio.c:131: CHECK: Comparison to NULL could be written "!xe->mmio.regs"
xe_mmio.c:315: CHECK: line length of 103 exceeds 100 columns
"

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250513153010.3464767-1-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit d2662cf8f44a68deb6c76ad9f1d9f29dbf7ba601)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 weeks agodrm/xe/mocs: Check if all domains awake
Tejas Upadhyay [Tue, 6 May 2025 14:23:00 +0000 (19:53 +0530)]
drm/xe/mocs: Check if all domains awake

Check if all domains are awake specially for
LNCF regs

Fixes: 298661cd9cea ("drm/xe: Fix MOCS debugfs LNCF readout")
Improvements-suggested-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250506142300.1865783-1-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
(cherry picked from commit a383cf218ef8bb35d4c03958bd956573b65cf778)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 weeks agoMerge tag 'bcachefs-2025-05-22' of git://evilpiepirate.org/bcachefs
Linus Torvalds [Fri, 23 May 2025 02:17:55 +0000 (19:17 -0700)]
Merge tag 'bcachefs-2025-05-22' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "Small stuff, main ones users will be interested in:

   - Couple more casefolding fixes; we can now detect and repair
     casefolded dirents in non-casefolded dir and vice versa

   - Fix for massive write inflation with mmapped io, which hit certain
     databases"

* tag 'bcachefs-2025-05-22' of git://evilpiepirate.org/bcachefs:
  bcachefs: Check for casefolded dirents in non casefolded dirs
  bcachefs: Fix bch2_dirent_create_snapshot() for casefolding
  bcachefs: Fix casefold opt via xattr interface
  bcachefs: mkwrite() now only dirties one page
  bcachefs: fix extent_has_stripe_ptr()
  bcachefs: Fix bch2_btree_path_traverse_cached() when paths realloced

3 weeks agoMerge tag 'pmdomain-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh...
Linus Torvalds [Thu, 22 May 2025 23:25:23 +0000 (16:25 -0700)]
Merge tag 'pmdomain-v6.15-rc3' of git://git./linux/kernel/git/ulfh/linux-pm

Pull pmdomain fixes from Ulf Hansson:
 "Core:

   - Fix error checking in genpd_dev_pm_attach_by_id()

  Providers:

   - renesas: Remove obsolete nullify checks for rcar domains"

* tag 'pmdomain-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
  pmdomain: core: Fix error checking in genpd_dev_pm_attach_by_id()
  pmdomain: renesas: rcar: Remove obsolete nullify checks

3 weeks agoMerge tag 'mmc-v6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Thu, 22 May 2025 23:15:59 +0000 (16:15 -0700)]
Merge tag 'mmc-v6.15-rc6' of git://git./linux/kernel/git/ulfh/mmc

Pull MMC host fixes from Ulf Hansson:

 - sdhci_am654: Fix MMC init failures on am62x boards

 - sdhci-of-dwcmshc: Add PD workaround on RK3576 to avoid hang

* tag 'mmc-v6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci_am654: Add SDHCI_QUIRK2_SUPPRESS_V1P8_ENA quirk to am62 compatible
  mmc: sdhci-of-dwcmshc: add PD workaround on RK3576

3 weeks agoMerge tag 'block-6.15-20250522' of git://git.kernel.dk/linux
Linus Torvalds [Thu, 22 May 2025 20:08:21 +0000 (13:08 -0700)]
Merge tag 'block-6.15-20250522' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - Fix for a regression with setting up loop on a file system
   without ->write_iter()

 - Fix for an nvme sysfs regression

* tag 'block-6.15-20250522' of git://git.kernel.dk/linux:
  nvme: avoid creating multipath sysfs group under namespace path devices
  loop: don't require ->write_iter for writable files in loop_configure

3 weeks agoMerge tag 'io_uring-6.15-20250522' of git://git.kernel.dk/linux
Linus Torvalds [Thu, 22 May 2025 20:05:00 +0000 (13:05 -0700)]
Merge tag 'io_uring-6.15-20250522' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

 - Kill a duplicate function definition, which can cause linking issues
   in certain .config configurations. Introduced in this cycle.

 - Fix for a potential overflow CQE reordering issue if a re-schedule is
   done during posting. Heading to stable.

 - Fix for an issue with recv bundles, where certain conditions can lead
   to gaps in the buffers, where a contiguous buffer range was expected.
   Heading to stable.

* tag 'io_uring-6.15-20250522' of git://git.kernel.dk/linux:
  io_uring/net: only retry recv bundle for a full transfer
  io_uring: fix overflow resched cqe reordering
  io_uring/cmd: axe duplicate io_uring_cmd_import_fixed_vec() declaration

3 weeks agoMerge tag '6.15-rc8-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Thu, 22 May 2025 19:35:16 +0000 (12:35 -0700)]
Merge tag '6.15-rc8-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - Two fixes for use after free in readdir code paths

* tag '6.15-rc8-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: Reset all search buffer pointers when releasing buffer
  smb: client: Fix use-after-free in cifs_fill_dirent

3 weeks agoMerge tag 'net-6.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 22 May 2025 16:15:19 +0000 (09:15 -0700)]
Merge tag 'net-6.15-rc8' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "This is somewhat larger than what I hoped for, with a few PRs from
  subsystems and follow-ups for the recent netdev locking changes,
  anyhow there are no known pending regressions.

  Including fixes from bluetooth, ipsec and CAN.

  Current release - regressions:

   - eth: team: grab team lock during team_change_rx_flags

   - eth: bnxt_en: fix netdev locking in ULP IRQ functions

  Current release - new code bugs:

   - xfrm: ipcomp: fix truesize computation on receive

   - eth: airoha: fix page recycling in airoha_qdma_rx_process()

  Previous releases - regressions:

   - sched: hfsc: fix qlen accounting bug when using peek in
     hfsc_enqueue()

   - mr: consolidate the ipmr_can_free_table() checks.

   - bridge: netfilter: fix forwarding of fragmented packets

   - xsk: bring back busy polling support in XDP_COPY

   - can:
       - add missing rcu read protection for procfs content
       - kvaser_pciefd: force IRQ edge in case of nested IRQ

  Previous releases - always broken:

   - xfrm: espintcp: remove encap socket caching to avoid reference leak

   - bluetooth: use skb_pull to avoid unsafe access in QCA dump handling

   - eth: idpf:
       - fix null-ptr-deref in idpf_features_check
       - fix idpf_vport_splitq_napi_poll()

   - eth: hibmcge: fix wrong ndo.open() after reset fail issue"

* tag 'net-6.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
  octeontx2-af: Fix APR entry mapping based on APR_LMT_CFG
  octeontx2-af: Set LMT_ENA bit for APR table entries
  net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_done
  octeontx2-pf: Avoid adding dcbnl_ops for LBK and SDP vf
  selftests/tc-testing: Add an HFSC qlen accounting test
  sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
  idpf: fix idpf_vport_splitq_napi_poll()
  net: hibmcge: fix wrong ndo.open() after reset fail issue.
  net: hibmcge: fix incorrect statistics update issue
  xsk: Bring back busy polling support in XDP_COPY
  can: slcan: allow reception of short error messages
  net: lan743x: Restore SGMII CTRL register on resume
  bnxt_en: Fix netdev locking in ULP IRQ functions
  MAINTAINERS: Drop myself to reviewer for ravb driver
  net: dwmac-sun8i: Use parsed internal PHY address instead of 1
  net: ethernet: ti: am65-cpsw: Lower random mac address error print to info
  can: kvaser_pciefd: Continue parsing DMA buf after dropped RX
  can: kvaser_pciefd: Fix echo_skb race
  can: kvaser_pciefd: Force IRQ edge in case of nested IRQ
  idpf: fix null-ptr-deref in idpf_features_check
  ...

3 weeks agoRevert "drm/amd: Keep display off while going into S4"
Mario Limonciello [Thu, 22 May 2025 14:13:28 +0000 (09:13 -0500)]
Revert "drm/amd: Keep display off while going into S4"

commit 68bfdc8dc0a1a ("drm/amd: Keep display off while going into S4")
attempted to keep displays off during the S4 sequence by not resuming
display IP.  This however leads to hangs because DRM clients such as the
console can try to access registers and cause a hang.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4155
Fixes: 68bfdc8dc0a1a ("drm/amd: Keep display off while going into S4")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250522141328.115095-1-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e485502c37b097b0bd773baa7e2741bf7bd2909a)
Cc: stable@vger.kernel.org
3 weeks agoMerge tag 'pinctrl-v6.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Thu, 22 May 2025 16:08:54 +0000 (09:08 -0700)]
Merge tag 'pinctrl-v6.15-4' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "This deals with a crash in the Qualcomm pin controller GPIO
  parts when using hogs.

  The first patch to gpiolib makes gpiochip_line_is_valid()
  NULL-tolerant.

  The second patch fixes the actual problem"

* tag 'pinctrl-v6.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: qcom: switch to devm_register_sys_off_handler()
  gpiolib: don't crash on enabling GPIO HOG pins

3 weeks agoMerge tag 'sound-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Linus Torvalds [Thu, 22 May 2025 16:05:29 +0000 (09:05 -0700)]
Merge tag 'sound-6.15' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes for 6.15 final. It became slightly a
  higher amount than expected, but all look easy and safe to apply:

   - A fix for PCM core race spotted by fuzzing

   - ASoC topology fix for single DAI link

   - UAF fix for ASoC SOF Intel HD-audio at reloading

   - ASoC SOF Intel and Mediatek fixes

   - Trivial HD-audio quirks as usual"

* tag 'sound-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek - Add new HP ZBook laptop with micmute led fixup
  ALSA: hda/realtek: Add support for HP Agusta using CS35L41 HDA
  ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14ASP10
  ALSA: hda/realtek - restore auto-mute mode for Dell Chrome platform
  ALSA: pcm: Fix race of buffer access at PCM OSS layer
  ASoC: SOF: Intel: hda: Fix UAF when reloading module
  ASoc: SOF: topology: connect DAI to a single DAI link
  ASoC: SOF: Intel: hda-bus: Use PIO mode on ACE2+ platforms
  ASoC: SOF: ipc4-pcm: Delay reporting is only supported for playback direction
  ASoC: SOF: ipc4-control: Use SOF_CTRL_CMD_BINARY as numid for bytes_ext
  ASoC: mediatek: mt8188-mt6359: Depend on MT6359_ACCDET set or disabled
  ASoC: mediatek: mt8188-mt6359: select CONFIG_SND_SOC_MT6359_ACCDET

3 weeks agoMerge tag 'nvme-6.15-2025-05-22' of git://git.infradead.org/nvme into block-6.15 block-6.15 block-6.15-20250522
Jens Axboe [Thu, 22 May 2025 15:25:47 +0000 (09:25 -0600)]
Merge tag 'nvme-6.15-2025-05-22' of git://git.infradead.org/nvme into block-6.15

Pull NVMe fix from Christoph:

"nvme fixes for Linux 6.15

 - do not create the newly added multipath sysfs group for
   non-multipath nodes (Nilay Shroff)"

* tag 'nvme-6.15-2025-05-22' of git://git.infradead.org/nvme:
  nvme: avoid creating multipath sysfs group under namespace path devices

3 weeks agospi: spi-fsl-dspi: Reset SR flags before sending a new message
Larisa Grigore [Thu, 22 May 2025 14:51:32 +0000 (15:51 +0100)]
spi: spi-fsl-dspi: Reset SR flags before sending a new message

If, in a previous transfer, the controller sends more data than expected
by the DSPI target, SR.RFDF (RX FIFO is not empty) will remain asserted.
When flushing the FIFOs at the beginning of a new transfer (writing 1
into MCR.CLR_TXF and MCR.CLR_RXF), SR.RFDF should also be cleared.
Otherwise, when running in target mode with DMA, if SR.RFDF remains
asserted, the DMA callback will be fired before the controller sends any
data.

Take this opportunity to reset all Status Register fields.

Fixes: 5ce3cc567471 ("spi: spi-fsl-dspi: Provide support for DSPI slave mode operation (Vybryd vf610)")
Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://patch.msgid.link/20250522-james-nxp-spi-v2-3-bea884630cfb@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks agospi: spi-fsl-dspi: Halt the module after a new message transfer
Bogdan-Gabriel Roman [Thu, 22 May 2025 14:51:31 +0000 (15:51 +0100)]
spi: spi-fsl-dspi: Halt the module after a new message transfer

The XSPI mode implementation in this driver still uses the EOQ flag to
signal the last word in a transmission and deassert the PCS signal.
However, at speeds lower than ~200kHZ, the PCS signal seems to remain
asserted even when SR[EOQF] = 1 indicates the end of a transmission.
This is a problem for target devices which require the deassertation of
the PCS signal between transfers.

Hence, this commit 'forces' the deassertation of the PCS by stopping the
module through MCR[HALT] after completing a new transfer. According to
the reference manual, the module stops or transitions from the Running
state to the Stopped state after the current frame, when any one of the
following conditions exist:
- The value of SR[EOQF] = 1.
- The chip is in Debug mode and the value of MCR[FRZ] = 1.
- The value of MCR[HALT] = 1.

This shouldn't be done if the last transfer in the message has cs_change
set.

Fixes: ea93ed4c181b ("spi: spi-fsl-dspi: Use EOQ for last word in buffer even for XSPI mode")
Signed-off-by: Bogdan-Gabriel Roman <bogdan-gabriel.roman@nxp.com>
Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://patch.msgid.link/20250522-james-nxp-spi-v2-2-bea884630cfb@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks agospi: spi-fsl-dspi: restrict register range for regmap access
Larisa Grigore [Thu, 22 May 2025 14:51:30 +0000 (15:51 +0100)]
spi: spi-fsl-dspi: restrict register range for regmap access

DSPI registers are NOT continuous, some registers are reserved and
accessing them from userspace will trigger external abort, add regmap
register access table to avoid below abort.

  For example on S32G:

  # cat /sys/kernel/debug/regmap/401d8000.spi/registers

  Internal error: synchronous external abort: 96000210 1 PREEMPT SMP
  ...
  Call trace:
  regmap_mmio_read32le+0x24/0x48
  regmap_mmio_read+0x48/0x70
  _regmap_bus_reg_read+0x38/0x48
  _regmap_read+0x68/0x1b0
  regmap_read+0x50/0x78
  regmap_read_debugfs+0x120/0x338

Fixes: 1acbdeb92c87 ("spi/fsl-dspi: Convert to use regmap and add big-endian support")
Co-developed-by: Xulin Sun <xulin.sun@windriver.com>
Signed-off-by: Xulin Sun <xulin.sun@windriver.com>
Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://patch.msgid.link/20250522-james-nxp-spi-v2-1-bea884630cfb@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks agospi: use container_of_cont() for to_spi_device()
Greg Kroah-Hartman [Thu, 22 May 2025 10:47:31 +0000 (12:47 +0200)]
spi: use container_of_cont() for to_spi_device()

Some places in the spi core pass in a const pointer to a device and the
default container_of() casts that away, which is not a good idea.
Preserve the proper const attribute by using container_of_const() for
to_spi_device() instead, which is what it was designed for.

Note, this removes the NULL check for a device pointer in the call, but
no one was ever checking for that return value, and a device pointer
should never be NULL overall anyway, so this should be a safe change.

Cc: Mark Brown <broonie@kernel.org>
Fixes: d69d80484598 ("driver core: have match() callback in struct bus_type take a const *")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2025052230-fidgeting-stooge-66f5@gregkh
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks agoMerge tag 'linux-can-fixes-for-6.15-20250521' of git://git.kernel.org/pub/scm/linux...
Paolo Abeni [Thu, 22 May 2025 10:32:38 +0000 (12:32 +0200)]
Merge tag 'linux-can-fixes-for-6.15-20250521' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2025-05-22

this is a pull request of 4 patches for net/main.

The first 3 patches are by Axel Forsman and fix a ISR race condition
in the kvaser_pciefd driver.

The last patch is by Carlos Sanchez and fixes the reception of short
error messages in the slcan driver.

* tag 'linux-can-fixes-for-6.15-20250521' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: slcan: allow reception of short error messages
  can: kvaser_pciefd: Continue parsing DMA buf after dropped RX
  can: kvaser_pciefd: Fix echo_skb race
  can: kvaser_pciefd: Force IRQ edge in case of nested IRQ
====================

Link: https://patch.msgid.link/20250522082344.490913-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agoMerge branch 'octeontx2-af-apr-mapping-fixes'
Paolo Abeni [Thu, 22 May 2025 10:17:44 +0000 (12:17 +0200)]
Merge branch 'octeontx2-af-apr-mapping-fixes'

Geetha sowjanya says:

====================
octeontx2-af: APR Mapping Fixes

This patch series includes fixes related to APR (LMT)
mapping and debugfs support.

Changes include:

Patch 1:Set LMT_ENA bit for APR table entries.
Enables the LMT line for each PF/VF by setting
the LMT_ENA bit in the APR_LMT_MAP_ENTRY_S
structure.

Patch-2:Fix APR entry in debugfs
The APR table was previously mapped using a fixed size,
which could lead to incorrect mappings when the number
of PFs and VFs differed from the assumed value.
This patch updates the logic to calculate the APR table
size dynamically, based on values from the APR_LMT_CFG
register, ensuring correct representation in debugfs.
====================

Link: https://patch.msgid.link/20250521060834.19780-1-gakula@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agoocteontx2-af: Fix APR entry mapping based on APR_LMT_CFG
Geetha sowjanya [Wed, 21 May 2025 06:08:34 +0000 (11:38 +0530)]
octeontx2-af: Fix APR entry mapping based on APR_LMT_CFG

The current implementation maps the APR table using a fixed size,
which can lead to incorrect mapping when the number of PFs and VFs
varies.
This patch corrects the mapping by calculating the APR table
size dynamically based on the values configured in the
APR_LMT_CFG register, ensuring accurate representation
of APR entries in debugfs.

Fixes: 0daa55d033b0 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table").
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Link: https://patch.msgid.link/20250521060834.19780-3-gakula@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agoocteontx2-af: Set LMT_ENA bit for APR table entries
Subbaraya Sundeep [Wed, 21 May 2025 06:08:33 +0000 (11:38 +0530)]
octeontx2-af: Set LMT_ENA bit for APR table entries

This patch enables the LMT line for a PF/VF by setting the
LMT_ENA bit in the APR_LMT_MAP_ENTRY_S structure.

Additionally, it simplifies the logic for calculating the
LMTST table index by consistently using the maximum
number of hw supported VFs (i.e., 256).

Fixes: 873a1e3d207a ("octeontx2-af: cn10k: Setting up lmtst map table").
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250521060834.19780-2-gakula@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agoMerge tag 'ipsec-2025-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/klasser...
Paolo Abeni [Thu, 22 May 2025 09:49:52 +0000 (11:49 +0200)]
Merge tag 'ipsec-2025-05-21' of git://git./linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2025-05-21

1) Fix some missing kfree_skb in the error paths of espintcp.
   From Sabrina Dubroca.

2) Fix a reference leak in espintcp.
   From Sabrina Dubroca.

3) Fix UDP GRO handling for ESPINUDP.
   From Tobias Brunner.

4) Fix ipcomp truesize computation on the receive path.
   From Sabrina Dubroca.

5) Sanitize marks before policy/state insertation.
   From Paul Chaignon.

* tag 'ipsec-2025-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: Sanitize marks before insert
  xfrm: ipcomp: fix truesize computation on receive
  xfrm: Fix UDP GRO handling for some corner cases
  espintcp: remove encap socket caching to avoid reference leak
  espintcp: fix skb leaks
====================

Link: https://patch.msgid.link/20250521054348.4057269-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agonet/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_done
Wang Liang [Tue, 20 May 2025 10:14:04 +0000 (18:14 +0800)]
net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_done

Syzbot reported a slab-use-after-free with the following call trace:

  ==================================================================
  BUG: KASAN: slab-use-after-free in tipc_aead_encrypt_done+0x4bd/0x510 net/tipc/crypto.c:840
  Read of size 8 at addr ffff88807a733000 by task kworker/1:0/25

  Call Trace:
   kasan_report+0xd9/0x110 mm/kasan/report.c:601
   tipc_aead_encrypt_done+0x4bd/0x510 net/tipc/crypto.c:840
   crypto_request_complete include/crypto/algapi.h:266
   aead_request_complete include/crypto/internal/aead.h:85
   cryptd_aead_crypt+0x3b8/0x750 crypto/cryptd.c:772
   crypto_request_complete include/crypto/algapi.h:266
   cryptd_queue_worker+0x131/0x200 crypto/cryptd.c:181
   process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231

  Allocated by task 8355:
   kzalloc_noprof include/linux/slab.h:778
   tipc_crypto_start+0xcc/0x9e0 net/tipc/crypto.c:1466
   tipc_init_net+0x2dd/0x430 net/tipc/core.c:72
   ops_init+0xb9/0x650 net/core/net_namespace.c:139
   setup_net+0x435/0xb40 net/core/net_namespace.c:343
   copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508
   create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
   unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
   ksys_unshare+0x419/0x970 kernel/fork.c:3323
   __do_sys_unshare kernel/fork.c:3394

  Freed by task 63:
   kfree+0x12a/0x3b0 mm/slub.c:4557
   tipc_crypto_stop+0x23c/0x500 net/tipc/crypto.c:1539
   tipc_exit_net+0x8c/0x110 net/tipc/core.c:119
   ops_exit_list+0xb0/0x180 net/core/net_namespace.c:173
   cleanup_net+0x5b7/0xbf0 net/core/net_namespace.c:640
   process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231

After freed the tipc_crypto tx by delete namespace, tipc_aead_encrypt_done
may still visit it in cryptd_queue_worker workqueue.

I reproduce this issue by:
  ip netns add ns1
  ip link add veth1 type veth peer name veth2
  ip link set veth1 netns ns1
  ip netns exec ns1 tipc bearer enable media eth dev veth1
  ip netns exec ns1 tipc node set key this_is_a_master_key master
  ip netns exec ns1 tipc bearer disable media eth dev veth1
  ip netns del ns1

The key of reproduction is that, simd_aead_encrypt is interrupted, leading
to crypto_simd_usable() return false. Thus, the cryptd_queue_worker is
triggered, and the tipc_crypto tx will be visited.

  tipc_disc_timeout
    tipc_bearer_xmit_skb
      tipc_crypto_xmit
        tipc_aead_encrypt
          crypto_aead_encrypt
            // encrypt()
            simd_aead_encrypt
              // crypto_simd_usable() is false
              child = &ctx->cryptd_tfm->base;

  simd_aead_encrypt
    crypto_aead_encrypt
      // encrypt()
      cryptd_aead_encrypt_enqueue
        cryptd_aead_enqueue
          cryptd_enqueue_request
            // trigger cryptd_queue_worker
            queue_work_on(smp_processor_id(), cryptd_wq, &cpu_queue->work)

Fix this by holding net reference count before encrypt.

Reported-by: syzbot+55c12726619ff85ce1f6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=55c12726619ff85ce1f6
Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Link: https://patch.msgid.link/20250520101404.1341730-1-wangliang74@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agoocteontx2-pf: Avoid adding dcbnl_ops for LBK and SDP vf
Suman Ghosh [Mon, 19 May 2025 07:26:58 +0000 (12:56 +0530)]
octeontx2-pf: Avoid adding dcbnl_ops for LBK and SDP vf

Priority flow control is not supported for LBK and SDP vf. This patch
adds support to not add dcbnl_ops for LBK and SDP vf.

Fixes: 8e67558177f8 ("octeontx2-pf: PFC config support with DCBx")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250519072658.2960851-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agoMerge branch 'net_sched-fix-hfsc-qlen-backlog-accounting-bug-and-add-selftest'
Paolo Abeni [Thu, 22 May 2025 09:16:53 +0000 (11:16 +0200)]
Merge branch 'net_sched-fix-hfsc-qlen-backlog-accounting-bug-and-add-selftest'

Cong Wang says:

====================
net_sched: Fix HFSC qlen/backlog accounting bug and add selftest

This series addresses a long-standing bug in the HFSC qdisc where queue length
and backlog accounting could become inconsistent if a packet is dropped during
a peek-induced dequeue operation, and adds a corresponding selftest to tc-testing.
====================

Link: https://patch.msgid.link/20250518222038.58538-1-xiyou.wangcong@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agoselftests/tc-testing: Add an HFSC qlen accounting test
Cong Wang [Sun, 18 May 2025 22:20:38 +0000 (15:20 -0700)]
selftests/tc-testing: Add an HFSC qlen accounting test

This test reproduces a scenario where HFSC queue length and backlog accounting
can become inconsistent when a peek operation triggers a dequeue and possible
drop before the parent qdisc updates its counters. The test sets up a DRR root
qdisc with an HFSC class, netem, and blackhole children, and uses Scapy to
inject a packet. It helps to verify that HFSC correctly tracks qlen and backlog
even when packets are dropped during peek-induced dequeue.

Cc: Mingi Cho <mincho@theori.io>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250518222038.58538-3-xiyou.wangcong@gmail.com
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agosch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
Cong Wang [Sun, 18 May 2025 22:20:37 +0000 (15:20 -0700)]
sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()

When enqueuing the first packet to an HFSC class, hfsc_enqueue() calls the
child qdisc's peek() operation before incrementing sch->q.qlen and
sch->qstats.backlog. If the child qdisc uses qdisc_peek_dequeued(), this may
trigger an immediate dequeue and potential packet drop. In such cases,
qdisc_tree_reduce_backlog() is called, but the HFSC qdisc's qlen and backlog
have not yet been updated, leading to inconsistent queue accounting. This
can leave an empty HFSC class in the active list, causing further
consequences like use-after-free.

This patch fixes the bug by moving the increment of sch->q.qlen and
sch->qstats.backlog before the call to the child qdisc's peek() operation.
This ensures that queue length and backlog are always accurate when packet
drops or dequeues are triggered during the peek.

Fixes: 12d0ad3be9c3 ("net/sched/sch_hfsc.c: handle corner cases where head may change invalidating calculated deadline")
Reported-by: Mingi Cho <mincho@theori.io>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250518222038.58538-2-xiyou.wangcong@gmail.com
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks agoiommu: Skip PASID validation for devices without PASID capability
Tushar Dave [Tue, 20 May 2025 01:19:37 +0000 (18:19 -0700)]
iommu: Skip PASID validation for devices without PASID capability

Generally PASID support requires ACS settings that usually create
single device groups, but there are some niche cases where we can get
multi-device groups and still have working PASID support. The primary
issue is that PCI switches are not required to treat PASID tagged TLPs
specially so appropriate ACS settings are required to route all TLPs to
the host bridge if PASID is going to work properly.

pci_enable_pasid() does check that each device that will use PASID has
the proper ACS settings to achieve this routing.

However, no-PASID devices can be combined with PASID capable devices
within the same topology using non-uniform ACS settings. In this case
the no-PASID devices may not have strict route to host ACS flags and
end up being grouped with the PASID devices.

This configuration fails to allow use of the PASID within the iommu
core code which wrongly checks if the no-PASID device supports PASID.

Fix this by ignoring no-PASID devices during the PASID validation. They
will never issue a PASID TLP anyhow so they can be ignored.

Fixes: c404f55c26fc ("iommu: Validate the PASID in iommu_attach_device_pasid()")
Cc: stable@vger.kernel.org
Signed-off-by: Tushar Dave <tdave@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250520011937.3230557-1-tdave@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
3 weeks agoidpf: fix idpf_vport_splitq_napi_poll()
Eric Dumazet [Tue, 20 May 2025 12:40:30 +0000 (12:40 +0000)]
idpf: fix idpf_vport_splitq_napi_poll()

idpf_vport_splitq_napi_poll() can incorrectly return @budget
after napi_complete_done() has been called.

This violates NAPI rules, because after napi_complete_done(),
current thread lost napi ownership.

Move the test against POLL_MODE before the napi_complete_done().

Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support")
Reported-by: Peter Newman <peternewman@google.com>
Closes: https://lore.kernel.org/netdev/20250520121908.1805732-1-edumazet@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Joshua Hay <joshua.a.hay@intel.com>
Cc: Alan Brady <alan.brady@intel.com>
Cc: Madhu Chittim <madhu.chittim@intel.com>
Cc: Phani Burra <phani.r.burra@intel.com>
Cc: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Link: https://patch.msgid.link/20250520124030.1983936-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoksmbd: use list_first_entry_or_null for opinfo_get_list()
Namjae Jeon [Tue, 20 May 2025 00:25:03 +0000 (09:25 +0900)]
ksmbd: use list_first_entry_or_null for opinfo_get_list()

The list_first_entry() macro never returns NULL.  If the list is
empty then it returns an invalid pointer.  Use list_first_entry_or_null()
to check if the list is empty.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202505080231.7OXwq4Te-lkp@intel.com/
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 weeks agoksmbd: fix rename failure
Namjae Jeon [Wed, 21 May 2025 08:07:36 +0000 (17:07 +0900)]
ksmbd: fix rename failure

I found that rename fails after cifs mount due to update of
lookup_one_qstr_excl().

 mv a/c b/
mv: cannot move 'a/c' to 'b/c': No such file or directory

In order to rename to a new name regardless of whether the dentry is
negative, we need to get the dentry through lookup_one_qstr_excl().
So It will not return error if the name doesn't exist.

Fixes: 204a575e91f3 ("VFS: add common error checks to lookup_one_qstr_excl()")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 weeks agoio_uring/net: only retry recv bundle for a full transfer io_uring-6.15 io_uring-6.15-20250522
Jens Axboe [Thu, 22 May 2025 00:51:49 +0000 (18:51 -0600)]
io_uring/net: only retry recv bundle for a full transfer

If a shorter than assumed transfer was seen, a partial buffer will have
been filled. For that case it isn't sane to attempt to fill more into
the bundle before posting a completion, as that will cause a gap in
the received data.

Check if the iterator has hit zero and only allow to continue a bundle
operation if that is the case.

Also ensure that for putting finished buffers, only the current transfer
is accounted. Otherwise too many buffers may be put for a short transfer.

Link: https://github.com/axboe/liburing/issues/1409
Cc: stable@vger.kernel.org
Fixes: 7c71a0af81ba ("io_uring/net: improve recv bundles")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agoMerge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 22 May 2025 00:24:18 +0000 (17:24 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "Fixes for some SoC clk drivers:

   - Define the gate clk for the OTG PHY on Rockchip RK3576 so the nvmem
     driver actually works

   - Initialize clk_hw_onecell_data::num before accessing the 'hws'
     array to keep UBSAN happy

   - Fix a perf degradation on the Allwinner D1 MMC clk that was making
     things half bad

   - Fix the Allwinner SNXI_CCU_MP_DATA_WITH_MUX_GATE_FEAT macro to have
     proper order of arguments"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sunxi-ng: d1: Add missing divider for MMC mod clocks
  clk: s2mps11: initialise clk_hw_onecell_data::num before accessing ::hws[] in probe()
  clk: sunxi-ng: fix order of arguments in clock macro
  clk: rockchip: rk3576: define clk_otp_phy_g

3 weeks agobcachefs: Check for casefolded dirents in non casefolded dirs
Kent Overstreet [Wed, 21 May 2025 04:38:04 +0000 (00:38 -0400)]
bcachefs: Check for casefolded dirents in non casefolded dirs

Check for mismatches between casefold dirents and casefold directories.

A mismatch will cause lookups to fail, as we'll be doing the lookup with
the casefolded name, which won't match the non-casefolded dirent, and
vice versa.

Reported-by: Christopher Snowhill <chris@kode54.net>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
3 weeks agobcachefs: Fix bch2_dirent_create_snapshot() for casefolding
Kent Overstreet [Wed, 21 May 2025 04:41:07 +0000 (00:41 -0400)]
bcachefs: Fix bch2_dirent_create_snapshot() for casefolding

bch2_dirent_create_snapshot(), used in fsck, neglected to create a
casefolded dirent.

Just move this into dirent_create_key().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
3 weeks agobcachefs: Fix casefold opt via xattr interface
Kent Overstreet [Wed, 21 May 2025 00:05:45 +0000 (20:05 -0400)]
bcachefs: Fix casefold opt via xattr interface

Changing the casefold option requires extra checks/work - factor out a
helper from bch2_fileattr_set() for the xattr code to use.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
3 weeks agoMerge branch 'there-are-some-bugfix-for-hibmcge-driver'
Jakub Kicinski [Wed, 21 May 2025 22:53:54 +0000 (15:53 -0700)]
Merge branch 'there-are-some-bugfix-for-hibmcge-driver'

Jijie Shao says:

====================
There are some bugfix for hibmcge driver

v1: https://lore.kernel.org/20250430093127.2400813-1-shaojijie@huawei.com
====================

Link: https://patch.msgid.link/20250517095828.1763126-1-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: hibmcge: fix wrong ndo.open() after reset fail issue.
Jijie Shao [Sat, 17 May 2025 09:58:28 +0000 (17:58 +0800)]
net: hibmcge: fix wrong ndo.open() after reset fail issue.

If the driver reset fails, it may not work properly.
Therefore, the ndo.open() operation should be rejected.

In this patch, the driver calls netif_device_detach()
before the reset and calls netif_device_attach()
after the reset succeeds. If the reset fails,
netif_device_attach() is not called. Therefore,
netdev does not present and cannot be opened.

If reset fails, only the PCI reset (via sysfs)
can be used to attempt recovery.

Fixes: 3f5a61f6d504 ("net: hibmcge: Add reset supported in this module")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250517095828.1763126-3-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: hibmcge: fix incorrect statistics update issue
Jijie Shao [Sat, 17 May 2025 09:58:27 +0000 (17:58 +0800)]
net: hibmcge: fix incorrect statistics update issue

When the user dumps statistics, the hibmcge driver automatically
updates all statistics. If the driver is performing the reset operation,
the error data of 0xFFFFFFFF is updated.

Therefore, if the driver is resetting, the hbg_update_stats_by_info()
needs to return directly.

Fixes: c0bf9bf31e79 ("net: hibmcge: Add support for dump statistics")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250517095828.1763126-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge tag 'mvebu-fixes-6.15-1' of https://git.kernel.org/pub/scm/linux/kernel/git...
Arnd Bergmann [Wed, 21 May 2025 21:26:53 +0000 (23:26 +0200)]
Merge tag 'mvebu-fixes-6.15-1' of https://git./linux/kernel/git/gclement/mvebu into arm/fixes

mvebu fixes for 6.15 (part 1)

Fix uDPU board LEDs by correcting pinctrl state

* tag 'mvebu-fixes-6.15-1' of https://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu:
  arm64: dts: marvell: uDPU: define pinctrl state for alarm LEDs

Link: https://lore.kernel.org/r/87wmagpr0a.fsf@BLaptop.bootlin.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
3 weeks agoMerge tag 'sunxi-fixes-for-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git...
Arnd Bergmann [Wed, 21 May 2025 21:26:38 +0000 (23:26 +0200)]
Merge tag 'sunxi-fixes-for-6.15' of https://git./linux/kernel/git/sunxi/linux into arm/fixes

Allwinner fixes for 6.15

Only one fix:

Switch back to I2C for PMICs on Allwinner H6 devices. Apparently using
Allwinner's proprietary bus ended up causing issues when the PMIC was
sharing the bus with other devices.

* tag 'sunxi-fixes-for-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  Revert "arm64: dts: allwinner: h6: Use RSB for AXP805 PMIC connection"

Link: https://lore.kernel.org/r/aCaeLgjZllV7bauX@wens.tw
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
3 weeks agoarm64: defconfig: Ensure CRYPTO_CHACHA20_NEON is selected
Fabio Estevam [Thu, 15 May 2025 15:41:36 +0000 (12:41 -0300)]
arm64: defconfig: Ensure CRYPTO_CHACHA20_NEON is selected

Since commit 17ec3e71ba79 ("crypto: lib/Kconfig - Hide arch options from
user"), the CRYPTO_CHACHA20_NEON option is no longer selected by default
due to changes in how crypto library options are exposed and selected.

To restore the previous behavior and ensure CRYPTO_CHACHA20_NEON is
enabled, explicitly select CONFIG_CRYPTO_CHACHA20 in the defconfig. This
pulls in CRYPTO_LIB_CHACHA_INTERNAL and allows CRYPTO_CHACHA20_NEON to be
selected automatically as before.

Fixes: 17ec3e71ba79 ("crypto: lib/Kconfig - Hide arch options from user")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
3 weeks agoMerge tag 'samsung-fixes-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git...
Arnd Bergmann [Wed, 21 May 2025 17:02:31 +0000 (19:02 +0200)]
Merge tag 'samsung-fixes-6.15' of https://git./linux/kernel/git/krzk/linux into arm/fixes

Samsung SoC driver fixes for v6.15

1. Exynos ACPM driver (used on Google GS101): Fix timeout due to missing
   responses from the firmware part.

2. Samsung USI (serial engines) driver: Correct ineffective
   unconfiguring of the interface during probe removal.

* tag 'samsung-fixes-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
  soc: samsung: usi: prevent wrong bits inversion during unconfiguring
  firmware: exynos-acpm: check saved RX before bailing out on empty RX queue

Link: https://lore.kernel.org/r/20250513101023.21552-5-krzysztof.kozlowski@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
3 weeks agoio_uring: fix overflow resched cqe reordering
Pavel Begunkov [Sat, 17 May 2025 12:27:37 +0000 (13:27 +0100)]
io_uring: fix overflow resched cqe reordering

Leaving the CQ critical section in the middle of a overflow flushing
can cause cqe reordering since the cache cq pointers are reset and any
new cqe emitters that might get called in between are not going to be
forced into io_cqe_cache_refill().

Fixes: eac2ca2d682f9 ("io_uring: check if we need to reschedule during overflow flush")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/90ba817f1a458f091f355f407de1c911d2b93bbf.1747483784.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 weeks agonvme: avoid creating multipath sysfs group under namespace path devices
Nilay Shroff [Wed, 21 May 2025 12:41:23 +0000 (18:11 +0530)]
nvme: avoid creating multipath sysfs group under namespace path devices

Commit 4dbd2b2ebe4c ("nvme-multipath: Add visibility for round-robin
io-policy") introduced the creation of the multipath sysfs group under
the NVMe head gendisk device node. However, it also inadvertently added
the same sysfs group under each namespace path device which head node
refers to and that is incorrect.

The multipath sysfs group should only be exposed through the namespace
head gendisk node. This is sufficient, as the head device already
provides symbolic links to the individual namespace paths it manages.

This patch fixes the issue by preventing the creation of the multipath
sysfs group under namespace path devices, ensuring it only appears under
the head disk node.

Fixes: 4dbd2b2ebe4c ("nvme-multipath: Add visibility for round-robin io-policy")
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 weeks agoxsk: Bring back busy polling support in XDP_COPY
Samiullah Khawaja [Fri, 16 May 2025 21:36:38 +0000 (21:36 +0000)]
xsk: Bring back busy polling support in XDP_COPY

Commit 5ef44b3cb43b ("xsk: Bring back busy polling support") fixed the
busy polling support in xsk for XDP_ZEROCOPY after it was broken in
commit 86e25f40aa1e ("net: napi: Add napi_config"). The busy polling
support with XDP_COPY remained broken since the napi_id setup in
xsk_rcv_check was removed.

Bring back the setup of napi_id for XDP_COPY so socket level SO_BUSYPOLL
can be used to poll the underlying napi.

Do the setup of napi_id for XDP_COPY in xsk_bind, as it is done
currently for XDP_ZEROCOPY. The setup of napi_id for XDP_COPY in
xsk_bind is safe because xsk_rcv_check checks that the rx queue at which
the packet arrives is equal to the queue_id that was supplied in bind.
This is done for both XDP_COPY and XDP_ZEROCOPY mode.

Tested using AF_XDP support in virtio-net by running the xsk_rr AF_XDP
benchmarking tool shared here:
https://lore.kernel.org/all/20250320163523.3501305-1-skhawaja@google.com/T/

Enabled socket busy polling using following commands in qemu,

```
sudo ethtool -L eth0 combined 1
echo 400 | sudo tee /proc/sys/net/core/busy_read
echo 100 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
echo 15000   | sudo tee /sys/class/net/eth0/gro_flush_timeout
```

Fixes: 5ef44b3cb43b ("xsk: Bring back busy polling support")
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 weeks agocan: slcan: allow reception of short error messages
Carlos Sanchez [Tue, 20 May 2025 10:23:05 +0000 (12:23 +0200)]
can: slcan: allow reception of short error messages

Allows slcan to receive short messages (typically errors) from the serial
interface.

When error support was added to slcan protocol in
b32ff4668544e1333b694fcc7812b2d7397b4d6a ("can: slcan: extend the protocol
with error info") the minimum valid message size changed from 5 (minimum
standard can frame tIII0) to 3 ("e1a" is a valid protocol message, it is
one of the examples given in the comments for slcan_bump_err() ), but the
check for minimum message length prodicating all decoding was not adjusted.
This makes short error messages discarded and error frames not being
generated.

This patch changes the minimum length to the new minimum (3 characters,
excluding terminator, is now a valid message).

Signed-off-by: Carlos Sanchez <carlossanchez@geotab.com>
Fixes: b32ff4668544 ("can: slcan: extend the protocol with error info")
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://patch.msgid.link/20250520102305.1097494-1-carlossanchez@geotab.com
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 weeks agoMerge patch series "fs/buffer: misc optimizations"
Christian Brauner [Wed, 21 May 2025 07:34:31 +0000 (09:34 +0200)]
Merge patch series "fs/buffer: misc optimizations"

Davidlohr Bueso <dave@stgolabs.net> says:

Four small patches - the first could be sent to Linus for v6.15
considering it is a missing nonblocking lookup conversion in the getblk
slowpath I had missed. The other two patches are small optimizations
found while reading the code, and one rocket science cleanup patch.

* patches from https://lore.kernel.org/20250515173925.147823-1-dave@stgolabs.net:
  fs/buffer: optimize discard_buffer()
  fs/buffer: remove superfluous statements
  fs/buffer: avoid redundant lookup in getblk slowpath
  fs/buffer: use sleeping lookup in __getblk_slowpath()

Link: https://lore.kernel.org/20250515173925.147823-1-dave@stgolabs.net
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 weeks agofs/buffer: optimize discard_buffer()
Davidlohr Bueso [Thu, 15 May 2025 17:39:25 +0000 (10:39 -0700)]
fs/buffer: optimize discard_buffer()

While invalidating, the clearing of the bits in discard_buffer()
is done in one fully ordered CAS operation. In the past this was
done via individual clear_bit(), until e7470ee89f0 (fs: buffer:
do not use unnecessary atomic operations when discarding buffers).
This implies that there were never strong ordering requirements
outside of being serialized by the buffer lock.

As such relax the ordering for archs that can benefit. Further,
the implied ordering in buffer_unlock() makes current cmpxchg
implied barrier redundant due to release semantics. And while in
theory the unlock could be part of the bulk clearing, it is
best to leave it explicit, but without the double barriers.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/20250515173925.147823-5-dave@stgolabs.net
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 weeks agofs/buffer: remove superfluous statements
Davidlohr Bueso [Thu, 15 May 2025 17:39:24 +0000 (10:39 -0700)]
fs/buffer: remove superfluous statements

Get rid of those unnecessary return statements.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/20250515173925.147823-4-dave@stgolabs.net
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 weeks agofs/buffer: avoid redundant lookup in getblk slowpath
Davidlohr Bueso [Thu, 15 May 2025 17:39:23 +0000 (10:39 -0700)]
fs/buffer: avoid redundant lookup in getblk slowpath

__getblk_slow() already implies failing a first lookup
as the fastpath, so try to create the buffers immediately
and avoid the redundant lookup. This saves 5-10% of the
total cost/latency of the slowpath.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/20250515173925.147823-3-dave@stgolabs.net
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 weeks agofs/buffer: use sleeping lookup in __getblk_slowpath()
Davidlohr Bueso [Thu, 15 May 2025 17:39:22 +0000 (10:39 -0700)]
fs/buffer: use sleeping lookup in __getblk_slowpath()

Just as with the fast path, call the lookup variant depending
on the gfp flags.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/20250515173925.147823-2-dave@stgolabs.net
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 weeks agoMAINTAINERS: add mm memory policy section
Lorenzo Stoakes [Thu, 15 May 2025 19:13:58 +0000 (20:13 +0100)]
MAINTAINERS: add mm memory policy section

As part of the ongoing efforts to sub-divide memory management
maintainership and reviewership, establish a section for memory policy and
migration and add appropriate maintainers and reviewers.

[lorenzo.stoakes@oracle.com: add Ying as reviewer]
Link: https://lkml.kernel.org/r/ed6f0fc2-5608-4eea-b1be-07e3e19be263@lucifer.local
Link: https://lkml.kernel.org/r/20250515191358.205684-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Rakie Kim <rakie.kim@sk.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Huang Ying <ying.huang@linux.alibaba.com>
Acked-by: Byungchul Park <byungchul@sk.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gregory Price <gourry@gourry.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoMAINTAINERS: add mm ksm section
Lorenzo Stoakes [Thu, 15 May 2025 19:04:04 +0000 (20:04 +0100)]
MAINTAINERS: add mm ksm section

As part of the ongoing efforts to sub-divide memory management
maintainership and reviewership, establish a section for Kernel Samepage
Merging (KSM) and add appropriate maintainers and reviewers.

Link: https://lkml.kernel.org/r/20250515190404.203596-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: Xu Xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agokasan: avoid sleepable page allocation from atomic context
Alexander Gordeev [Thu, 15 May 2025 13:55:38 +0000 (15:55 +0200)]
kasan: avoid sleepable page allocation from atomic context

apply_to_pte_range() enters the lazy MMU mode and then invokes
kasan_populate_vmalloc_pte() callback on each page table walk iteration.
However, the callback can go into sleep when trying to allocate a single
page, e.g.  if an architecutre disables preemption on lazy MMU mode enter.

On s390 if make arch_enter_lazy_mmu_mode() -> preempt_enable() and
arch_leave_lazy_mmu_mode() -> preempt_disable(), such crash occurs:

[    0.663336] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321
[    0.663348] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
[    0.663358] preempt_count: 1, expected: 0
[    0.663366] RCU nest depth: 0, expected: 0
[    0.663375] no locks held by kthreadd/2.
[    0.663383] Preemption disabled at:
[    0.663386] [<0002f3284cbb4eda>] apply_to_pte_range+0xfa/0x4a0
[    0.663405] CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted 6.15.0-rc5-gcc-kasan-00043-gd76bb1ebb558-dirty #162 PREEMPT
[    0.663408] Hardware name: IBM 3931 A01 701 (KVM/Linux)
[    0.663409] Call Trace:
[    0.663410]  [<0002f3284c385f58>] dump_stack_lvl+0xe8/0x140
[    0.663413]  [<0002f3284c507b9e>] __might_resched+0x66e/0x700
[    0.663415]  [<0002f3284cc4f6c0>] __alloc_frozen_pages_noprof+0x370/0x4b0
[    0.663419]  [<0002f3284ccc73c0>] alloc_pages_mpol+0x1a0/0x4a0
[    0.663421]  [<0002f3284ccc8518>] alloc_frozen_pages_noprof+0x88/0xc0
[    0.663424]  [<0002f3284ccc8572>] alloc_pages_noprof+0x22/0x120
[    0.663427]  [<0002f3284cc341ac>] get_free_pages_noprof+0x2c/0xc0
[    0.663429]  [<0002f3284cceba70>] kasan_populate_vmalloc_pte+0x50/0x120
[    0.663433]  [<0002f3284cbb4ef8>] apply_to_pte_range+0x118/0x4a0
[    0.663435]  [<0002f3284cbc7c14>] apply_to_pmd_range+0x194/0x3e0
[    0.663437]  [<0002f3284cbc99be>] __apply_to_page_range+0x2fe/0x7a0
[    0.663440]  [<0002f3284cbc9e88>] apply_to_page_range+0x28/0x40
[    0.663442]  [<0002f3284ccebf12>] kasan_populate_vmalloc+0x82/0xa0
[    0.663445]  [<0002f3284cc1578c>] alloc_vmap_area+0x34c/0xc10
[    0.663448]  [<0002f3284cc1c2a6>] __get_vm_area_node+0x186/0x2a0
[    0.663451]  [<0002f3284cc1e696>] __vmalloc_node_range_noprof+0x116/0x310
[    0.663454]  [<0002f3284cc1d950>] __vmalloc_node_noprof+0xd0/0x110
[    0.663457]  [<0002f3284c454b88>] alloc_thread_stack_node+0xf8/0x330
[    0.663460]  [<0002f3284c458d56>] dup_task_struct+0x66/0x4d0
[    0.663463]  [<0002f3284c45be90>] copy_process+0x280/0x4b90
[    0.663465]  [<0002f3284c460940>] kernel_clone+0xd0/0x4b0
[    0.663467]  [<0002f3284c46115e>] kernel_thread+0xbe/0xe0
[    0.663469]  [<0002f3284c4e440e>] kthreadd+0x50e/0x7f0
[    0.663472]  [<0002f3284c38c04a>] __ret_from_fork+0x8a/0xf0
[    0.663475]  [<0002f3284ed57ff2>] ret_from_fork+0xa/0x38

Instead of allocating single pages per-PTE, bulk-allocate the shadow
memory prior to applying kasan_populate_vmalloc_pte() callback on a page
range.

Link: https://lkml.kernel.org/r/c61d3560297c93ed044f0b1af085610353a06a58.1747316918.git.agordeev@linux.ibm.com
Fixes: 3c5c3cfb9ef4 ("kasan: support backing vmalloc space with real shadow memory")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agohighmem: add folio_test_partial_kmap()
Matthew Wilcox (Oracle) [Wed, 14 May 2025 17:06:02 +0000 (18:06 +0100)]
highmem: add folio_test_partial_kmap()

In commit c749d9b7ebbc ("iov_iter: fix copy_page_from_iter_atomic() if
KMAP_LOCAL_FORCE_MAP"), Hugh correctly noted that if KMAP_LOCAL_FORCE_MAP
is enabled, we must limit ourselves to PAGE_SIZE bytes per call to
kmap_local().  The same problem exists in memcpy_from_folio(),
memcpy_to_folio(), folio_zero_tail(), folio_fill_tail() and
memcpy_from_file_folio(), so add folio_test_partial_kmap() to do this more
succinctly.

Link: https://lkml.kernel.org/r/20250514170607.3000994-2-willy@infradead.org
Fixes: 00cdf76012ab ("mm: add memcpy_from_file_folio()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoMAINTAINERS: add hung-task detector section
Lance Yang [Tue, 13 May 2025 05:22:34 +0000 (13:22 +0800)]
MAINTAINERS: add hung-task detector section

The hung-task detector is missing in MAINTAINERS.  While it's been quiet
recently, I'm actively working on it and volunteering to review patches.
Adding this section will make it easier for contributors to know who to
contact.

Link: https://lkml.kernel.org/r/20250513052234.46463-1-lance.yang@linux.dev
Signed-off-by: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agotaskstats: fix struct taskstats breaks backward compatibility since version 15
Wang Yaxin [Sat, 10 May 2025 07:54:13 +0000 (15:54 +0800)]
taskstats: fix struct taskstats breaks backward compatibility since version 15

Problem
========
commit 658eb5ab916d ("delayacct: add delay max to record delay peak")
  - adding more fields
commit f65c64f311ee ("delayacct: add delay min to record delay peak")
  - adding more fields
commit b016d0873777 ("taskstats: modify taskstats version")
 - version bump to 15

Since version 15 (TASKSTATS_VERSION=15) the new layout of the structure
adds fields in the middle of the structure, rendering all old software
incompatible with newer kernels and software compiled against the new
kernel headers incompatible with older kernels.

Solution
=========
move delay max and delay min to the end of taskstat, and bump
the version to 16 after the change

[wang.yaxin@zte.com.cn: adjust indentation]
Link: https://lkml.kernel.org/r/202505192131489882NSciXV4EGd8zzjLuwoOK@zte.com.cn
Link: https://lkml.kernel.org/r/20250510155413259V4JNRXxukdDgzsaL0Fo6a@zte.com.cn
Fixes: f65c64f311ee ("delayacct: add delay min to record delay peak")
Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agomm/truncate: fix out-of-bounds when doing a right-aligned split
Zhang Yi [Mon, 12 May 2025 06:28:25 +0000 (14:28 +0800)]
mm/truncate: fix out-of-bounds when doing a right-aligned split

When performing a right split on a folio, the split_at2 may point to a
not-present page if the offset + length equals the original folio size,
which will trigger the following error:

 BUG: unable to handle page fault for address: ffffea0006000008
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 143ffb9067 P4D 143ffb9067 PUD 143ffb8067 PMD 0
 Oops: Oops: 0000 [#1] SMP PTI
 CPU: 0 UID: 0 PID: 502640 Comm: fsx Not tainted 6.15.0-rc3-gc6156189fc6b #889 PR
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/4
 RIP: 0010:truncate_inode_partial_folio+0x208/0x620
 Code: ff 03 48 01 da e8 78 7e 13 00 48 83 05 10 b5 5a 0c 01 85 c0 0f 85 1c 02 001
 RSP: 0018:ffffc90005bafab0 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffffea0005ffff00 RCX: 0000000000000002
 RDX: 000000000000000c RSI: 0000000000013975 RDI: ffffc90005bafa30
 RBP: ffffea0006000000 R08: 0000000000000000 R09: 00000000000009bf
 R10: 00000000000007e0 R11: 0000000000000000 R12: 0000000000001633
 R13: 0000000000000000 R14: ffffea0005ffff00 R15: fffffffffffffffe
 FS:  00007f9f9a161740(0000) GS:ffff8894971fd000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffea0006000008 CR3: 000000017c2ae000 CR4: 00000000000006f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <TASK>
  truncate_inode_pages_range+0x226/0x720
  truncate_pagecache+0x57/0x90
  ...

Fix this issue by skipping the split if truncation aligns with the folio
size, make sure the split page number lies within the folio.

Link: https://lkml.kernel.org/r/20250512062825.3533342-1-yi.zhang@huaweicloud.com
Fixes: 7460b470a131 ("mm/truncate: use folio_split() in truncate operation")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: ErKun Yang <yangerkun@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoMAINTAINERS: add mm reclaim section
Lorenzo Stoakes [Mon, 12 May 2025 14:31:22 +0000 (15:31 +0100)]
MAINTAINERS: add mm reclaim section

In furtherance of ongoing efforts to ensure people are aware of who
de-facto maintains/has an interest in specific parts of mm, as well trying
to avoid get_maintainers.pl listing only Andrew and the mailing list for
mm files - establish a reclaim memory management section and add relevant
maintainers/reviewers.

This is a key part of memory management so sensibly deserves its own
section.

This encompasses both 'classical' reclaim and MGLRU and thus reflects this
in the reviewers from both, as well as those who have contributed
specifically on the memcg side of things.

Link: https://lkml.kernel.org/r/20250512143122.87740-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 weeks agoMAINTAINERS: update page allocator section
Lorenzo Stoakes [Mon, 12 May 2025 14:46:03 +0000 (15:46 +0100)]
MAINTAINERS: update page allocator section

Make Vlastimil maintainer of this section (with thanks to Vlastimil for
agreeing to this!) and add page isolation files for which this section
seem most appropriate.

We may wish to, in future, refactor/rename some of these files to more
logically fit what is actually being performed, but for the time being
this seems the most sensible place.

Additionally, fix the alphabetical ordering of files.

Link: https://lkml.kernel.org/r/20250512144603.90379-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>