git.kernel.dk Git - linux-block.git/log

Merge branches 'rcu/misc-for-6.16', 'rcu/seq-counters-for-6.16' and 'rcu/torture-for-6.16' into rcu/for-next

rcutorture: Fix issue with re-using old images on ARM64

On ARM64, when running with --configs '36*SRCU-P', I noticed that only 1 instance
instead of 36 for starting.

Fix it by checking for Image files, instead of bzImage which ARM does
not seem to have. With this I see all 36 instances running at the same
time in the batch.

Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

rcutorture: Remove MAXSMP and CPUMASK_OFFSTACK from TREE01

Back in the day, rcutorture was about the only thing that tested off-stack
CPU masks, but now any arm64 system with more than 256 CPUs tests it
full time. In fact, it is necessary to hack the kernel to prevent such
a system from testing off-stack CPU masks. This means that there is
no longer much point in rcutorture going out of its way to test this.
And given the differences in how CPUMASK_OFFSTACK is enabled in x86 and
arm64, rcutorture would need to go out of its way.

This commit therefore removes CONFIG_CPUMASK_OFFSTACK=y (and the
CONFIG_MAXSMP=y required to enable it on x86) from TREE01.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

rcutorture: Reduce TREE01 CPU overcommit

The TREE01.boot nr_cpus kernel boot parameter has been set to 43 for
more than seven years, but it can cause RCU CPU stall warnings on arm64,
most of the time involving the stop-machine subsystem. This should
not be too surprising, given that this causes 43 vCPUs to spin with
interrupts disabled when there are only eight physical CPUs.

The point of this CPU overcommit is to test the ability of expedited RCU
grace period initialization to handle races with incoming CPUs that have
never previously been online. But limiting to 17 CPUs instead of 43
allows time for this code to be exercised, and eliminates (or at least
greatly reduces) the incidence of RCU CPU stall warnings on arm64.

So this commit therefore sets nr_cpus=17 in TREE01.boot.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

torture: Check for "Call trace:" as well as "Call Trace:"

Different architectures capitalize their splats differently. Who knew?

This commit therefore checks for both arm64 "Call trace:" and x86
"Call Trace:".

Reported-by: Joel Fernandes <joelagnelf@nvidia.com>
Closes: https://lore.kernel.org/all/553c33d8-2b51-4772-8aef-97b0163bc78e@nvidia.com/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

rcutorture: Perform more frequent testing of ->gpwrap

Currently, the ->gpwrap is not tested (at all per my testing) due to the
requirement of a large delta between a CPU's rdp->gp_seq and its node's
rnp->gpseq.

This results in no testing of ->gpwrap being set. This patch by default
adds 5 minutes of testing with ->gpwrap forced by lowering the delta
between rdp->gp_seq and rnp->gp_seq to just 8 GPs. All of this is
configurable, including the active time for the setting and a full
testing cycle.

By default, the first 25 minutes of a test will have the _default_
behavior there is right now (ULONG_MAX / 4) delta. Then for 5 minutes,
we switch to a smaller delta causing 1-2 wraps in 5 minutes. I believe
this is reasonable since we at least add a little bit of testing for
usecases where ->gpwrap is set.

[ Apply fix for Dan Carpenter's bug report on init path cleanup. ]
[ Apply kernel doc warning fix from Akira Yokosawa. ]

Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

torture: Add testing of RCU's Rust bindings to torture.sh

This commit adds a --do-rcu-rust parameter to torture.sh, which invokes
a rust_doctests_kernel kunit run. Note that kunit wants a clean source
tree, so this runs "make mrproper", which might come as a surprise to
some users. Should there be a --mrproper parameter to torture.sh to make
the user explicitly ask for it?

Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

torture: Add --do-{,no-}normal to torture.sh

Right now, torture.sh runs normal runs unconditionally, which can be slow
and thus annoying when you only want to test --kcsan or --kasan runs.
This commit therefore adds a --do-normal argument so that "--kcsan
--do-no-kasan --do-no-normal" runs only KCSAN runs. Note that specifying
"--do-no-kasan --do-no-kcsan --do-no-normal" gets normal runs, so you
should not try to use this as a synonym for --do-none.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

checkpatch: Deprecate srcu_read_lock_lite() and srcu_read_unlock_lite()

Uses of srcu_read_lock_lite() and srcu_read_unlock_lite() are better
served by the new srcu_read_lock_fast() and srcu_read_unlock_fast() APIs.
As in srcu_read_lock_lite() and srcu_read_unlock_lite() would never have
happened had I thought a bit harder a few months ago. Therefore, mark
them deprecated.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

rcutorture: Comment invocations of tick_dep_set_task()

The rcu_torture_reader() and rcu_torture_fwd_prog_cr() functions
run CPU-bound for extended periods of time (tens or even
hundreds of milliseconds), so they invoke tick_dep_set_task() and
tick_dep_clear_task() to ensure that the scheduling-clock tick helps
move grace periods forward.

So why doesn't rcu_torture_fwd_prog_nr() also invoke tick_dep_set_task()
and tick_dep_clear_task()? Because the point of this function is to test
RCU's ability to (eventually) force grace periods forward even when the
tick has been disabled during long CPU-bound kernel execution.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

rcu/nocb: Add Safe checks for access offloaded rdp

For built with CONFIG_PROVE_RCU=y and CONFIG_PREEMPT_RT=y kernels,
Disable BH does not change the SOFTIRQ corresponding bits in
preempt_count(), but change current->softirq_disable_cnt, this
resulted in the following splat:

WARNING: suspicious RCU usage
kernel/rcu/tree_plugin.h:36 Unsafe read of RCU_NOCB offloaded state!
stack backtrace:
CPU: 0 UID: 0 PID: 22 Comm: rcuc/0
Call Trace:
[    0.407907]  <TASK>
[    0.407910]  dump_stack_lvl+0xbb/0xd0
[    0.407917]  dump_stack+0x14/0x20
[    0.407920]  lockdep_rcu_suspicious+0x133/0x210
[    0.407932]  rcu_rdp_is_offloaded+0x1c3/0x270
[    0.407939]  rcu_core+0x471/0x900
[    0.407942]  ? lockdep_hardirqs_on+0xd5/0x160
[    0.407954]  rcu_cpu_kthread+0x25f/0x870
[    0.407959]  ? __pfx_rcu_cpu_kthread+0x10/0x10
[    0.407966]  smpboot_thread_fn+0x34c/0xa50
[    0.407970]  ? trace_preempt_on+0x54/0x120
[    0.407977]  ? __pfx_smpboot_thread_fn+0x10/0x10
[    0.407982]  kthread+0x40e/0x840
[    0.407990]  ? __pfx_kthread+0x10/0x10
[    0.407994]  ? rt_spin_unlock+0x4e/0xb0
[    0.407997]  ? rt_spin_unlock+0x4e/0xb0
[    0.408000]  ? __pfx_kthread+0x10/0x10
[    0.408006]  ? __pfx_kthread+0x10/0x10
[    0.408011]  ret_from_fork+0x40/0x70
[    0.408013]  ? __pfx_kthread+0x10/0x10
[    0.408018]  ret_from_fork_asm+0x1a/0x30
[    0.408042]  </TASK>

Currently, triggering an rdp offloaded state change need the
corresponding rdp's CPU goes offline, and at this time the rcuc
kthreads has already in parking state. this means the corresponding
rcuc kthreads can safely read offloaded state of rdp while it's
corresponding cpu is online.

This commit therefore add softirq_count() check for
Preempt-RT kernels.

Suggested-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

rcuscale: using kcalloc() to relpace kmalloc()

It's safer to using kcalloc() because it can prevent overflow
problem.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

doc/RCU/listRCU: refine example code for eliminating stale data

This patch adjust the example code with following two purpose:

* reduce the confusion on not releasing e->lock
* emphasize e is valid and not stale with e->lock held

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Alan Huang <mmpgouride@gmail.com>
Reviewed-by: Alan Huang <mmpgouride@gmail.com>
Link: https://lore.kernel.org/r/20250218005047.27258-1-richard.weiyang@gmail.com
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

doc: Update LWN RCU API links in whatisRCU.rst

This commit adds the 2024 LWN RCU API article set.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

Revert "rcu/nocb: Fix rcuog wake-up from offline softirq"

This reverts commit f7345ccc62a4b880cf76458db5f320725f28e400.

swake_up_one_online() has been removed because hrtimers can now assign
a proper online target to hrtimers queued from offline CPUs. Therefore
remove the related hackery.

Link: https://lore.kernel.org/all/20241231170712.149394-4-frederic@kernel.org/
Reviewed-by: Usama Arif <usamaarif642@gmail.com>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

rust: sync: rcu: Mark Guard methods as inline

Currently the implementation of "Guard" methods are basically wrappers
around rcu's function within kernel. Building the kernel with llvm
18.1.8 on x86_64 machine will generate the following symbols:

$ nm vmlinux | grep ' _R'.*Guard | rustfilt
ffffffff817b6c90 T <kernel::sync::rcu::Guard>::new
ffffffff817b6cb0 T <kernel::sync::rcu::Guard>::unlock
ffffffff817b6cd0 T <kernel::sync::rcu::Guard as core::ops::drop::Drop>::drop
ffffffff817b6c90 T <kernel::sync::rcu::Guard as core::default::Default>::default

These Rust symbols are basically wrappers around functions
"rcu_read_lock" and "rcu_read_unlock". Marking them as inline can
reduce the generation of these symbols, and saves the size of code
generation for 132 bytes.

$ ./scripts/bloat-o-meter vmlinux_old vmlinux_new
(Output is demangled for readability)

add/remove: 0/10 grow/shrink: 0/1 up/down: 0/-132 (-132)
Function                                     old     new   delta
rust_driver_pci::SampleDriver::probe        1041    1034      -7
kernel::sync::rcu::Guard::default             9       -      -9
kernel::sync::rcu::Guard::drop                9       -      -9
kernel::sync::rcu::read_lock                  9       -      -9
kernel::sync::rcu::Guard::unlock              9       -      -9
kernel::sync::rcu::Guard::new                 9       -      -9
__pfx__kernel::sync::rcu::Guard::default     16       -     -16
__pfx__kernel::sync::rcu::Guard::drop        16       -     -16
__pfx__kernel::sync::rcu::read_lock          16       -     -16
__pfx__kernel::sync::rcu::Guard::unlock      16       -     -16
__pfx__kernel::sync::rcu::Guard::new         16       -     -16
Total: Before=23365955, After=23365823, chg -0.00%

Link: https://github.com/Rust-for-Linux/linux/issues/1145
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Charalampos Mitrodimas <charmitro@posteo.net>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

rcu/cpu_stall_cputime: fix the hardirq count for x86 architecture

When counting the number of hardirqs in the x86 architecture,
it is essential to add arch_irq_stat_cpu to ensure accuracy.

For example, a CPU loop within the rcu_read_lock function.

Before:
[   70.910184] rcu: INFO: rcu_preempt self-detected stall on CPU
[   70.910436] rcu:     3-....: (4999 ticks this GP) idle=***
[   70.910711] rcu:              hardirqs   softirqs   csw/system
[   70.910870] rcu:      number:        0        657            0
[   70.911024] rcu:     cputime:        0          0         2498   ==> 2498(ms)
[   70.911278] rcu:     (t=5001 jiffies g=3677 q=29 ncpus=8)

After:
[   68.046132] rcu: INFO: rcu_preempt self-detected stall on CPU
[   68.046354] rcu:     2-....: (4999 ticks this GP) idle=***
[   68.046628] rcu:              hardirqs   softirqs   csw/system
[   68.046793] rcu:      number:     2498        663            0
[   68.046951] rcu:     cputime:        0          0         2496   ==> 2496(ms)
[   68.047244] rcu:     (t=5000 jiffies g=3825 q=4 ncpus=8)

Fixes: be42f00b73a0 ("rcu: Add RCU stall diagnosis information")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501090842.SfI6QPGS-lkp@intel.com/
Signed-off-by: Yongliang Gao <leonylgao@tencent.com>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Link: https://lore.kernel.org/r/20250216084109.3109837-1-leonylgao@gmail.com
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

rcu: Remove swake_up_one_online() bandaid

It's now ok to perform a wake-up from an offline CPU because the
resulting armed scheduler bandwidth hrtimers are now correctly targeted
by hrtimer infrastructure.

Remove the obsolete hackerry.

Link: https://lore.kernel.org/all/20241231170712.149394-3-frederic@kernel.org/
Reviewed-by: Usama Arif <usamaarif642@gmail.com>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

MAINTAINERS: Update Zqiang's email address

This patch updates Zqiang's email address to qiang.zhang@linux.dev.

Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Acked-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

rcutorture: Make torture.sh --do-rt use CONFIG_PREEMPT_RT

The torture.sh --do-rt command-line parameter is intended to mimic -rt
kernels. Now that CONFIG_PREEMPT_RT is upstream, this commit makes this
mimicking more precise.

Note that testing of RCU priority boosting is disabled in favor
of forward-progress testing of RCU callbacks. If it turns out to be
possible to make kernels built with CONFIG_PREEMPT_RT=y to tolerate
testing of both, both will be enabled.

[ paulmck: Apply Sebastian Siewior feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

srcu: Use rcu_seq_done_exact() for polling API

poll_state_synchronize_srcu() uses rcu_seq_done() unlike
poll_state_synchronize_rcu() which uses rcu_seq_done_exact().

The rcu_seq_done_exact() makes more sense for polling API, as with
this API, there is a higher chance that there is a significant delay
between the get_state..() and poll_state..() calls since a cookie
can be stored and reused at a later time. During such a delay, if
the gp_seq counter progresses more than ULONG_MAX/2 distance, then
poll_state..() may return false for a long time unwantedly.

Fix by using the more accurate rcu_seq_done_exact() API which is
exactly what straight RCU's polling does.

It may make sense, as future work, to add debug code here as well, where
we compare a physical timestamp between get_state..() and poll_state()
calls and yell if significant time has past but the grace period has
still not progressed.

Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

rcu: Comment on the extraneous delta test on rcu_seq_done_exact()

The numbers used in rcu_seq_done_exact() lack some explanation behind
their magic. Especially after the commit:

    85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")

which reported a subtle issue where a new GP sequence snapshot was taken
on the root node state while a grace period had already been started and
reflected on the global state sequence but not yet on the root node
sequence, making a polling user waiting on a wrong already started grace
period that would ignore freshly online CPUs.

The fix involved taking the snaphot on the global state sequence and
waiting on the root node sequence. And since a grace period is first
started on the global state and only afterward reflected on the root
node, a snapshot taken on the global state sequence might be two full
grace periods ahead of the root node as in the following example:

rnp->gp_seq = rcu_state.gp_seq = 0

    CPU 0                                           CPU 1
    -----                                           -----
    // rcu_state.gp_seq = 1
    rcu_seq_start(&rcu_state.gp_seq)
                                                    // snap = 8
                                                    snap = rcu_seq_snap(&rcu_state.gp_seq)
                                                    // Two full GP differences
                                                    rcu_seq_done_exact(&rnp->gp_seq, snap)
    // rnp->gp_seq = 1
    WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);

Add a comment about those expectations and to clarify the magic within
the relevant function.

Note that the issue arises mainly with the use of rcu_seq_done_exact()
which has a much tigher guardband (of 2 GPs) to ensure the false-negative
window of the API during wraparound is limited to just 2 GPs.
rcu_seq_done() does not have such strict requirements, however its large
false-negative window of ULONG_MAX/2 is not ideal for the polling API.
However, this also means care is needed to ensure the guardband is as
large as needed to avoid the example scenario describe above which a
warning added in an earlier patch does.

[ Comment wordsmithing by Joel ]

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

rcu: Add warning to ensure rcu_seq_done_exact() is working

The previous patch improved the rcu_seq_done_exact() function by adding
a meaningful constant for the guardband.

Ensure that this is working for the future by a quick check during
rcu_gp_init().

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

rcu: Replace magic number with meaningful constant in rcu_seq_done_exact()

The rcu_seq_done_exact() function checks if a grace period has completed by
comparing sequence numbers. It includes a guard band to handle sequence number
wraparound, which was previously expressed using the magic number calculation
'3 * RCU_SEQ_STATE_MASK + 1'.

This magic number is not immediately obvious in terms of what it represents.

Instead, the reason we need this tiny guardband is because of the lag between
the setting of rcu_state.gp_seq_polled and root rnp's gp_seq in rcu_gp_init().

This guardband needs to be at least 2 GPs worth of counts, to avoid recognizing
the newly started GP as completed immediately, due to the following sequence
which arises due to the delay between update of rcu_state.gp_seq_polled and
root rnp's gp_seq:

rnp->gp_seq = rcu_state.gp_seq = 0

    CPU 0                                           CPU 1
    -----                                           -----
    // rcu_state.gp_seq = 1
    rcu_seq_start(&rcu_state.gp_seq)
                                                    // snap = 8
                                                    snap = rcu_seq_snap(&rcu_state.gp_seq)
                                                    // Two full GP differences
                                                    rcu_seq_done_exact(&rnp->gp_seq, snap)
    // rnp->gp_seq = 1
    WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);

This can happen due to get_state_synchronize_rcu_full() sampling
rcu_state.gp_seq_polled, however the poll_state_synchronize_rcu_full()
sampling the root rnp's gp_seq. The delay between the update of the 2
counters occurs in rcu_gp_init() during which the counters briefly go
out of sync.

Make the guardband explictly 2 GPs. This improves code readability and
maintainability by making the intent clearer as well.

Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

rcutorture: Split out beginning and end from rcu_torture_one_read()

The rcu_torture_one_read() function is designed for RCU readers that are
confined to a task, such that a single thread of control extends from the
beginning of a given RCU read-side critical section to its end.  This does
not suffice for things like srcu_down_read() and srcu_up_read(), where
the critical section might start at task level and end in a timer handler.

This commit therefore creates separate init_rcu_torture_one_read_state(),
rcu_torture_one_read_start(), and rcu_torture_one_read_end() functions,
along with a rcu_torture_one_read_state structure to coordinate their
actions.  These will be used to create tests for srcu_down_read()
and friends.

One caution:  The caller to rcu_torture_one_read_start() must enter the
initial read-side critical section prior to the call.  This enables use
of non-standard primitives such as srcu_down_read() while still using
the same validation code.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

rcutorture: Make srcu_lockdep.sh check reader-conflict handling

Mixing different flavors of RCU readers is forbidden, for example, you
should not use srcu_read_lock() and srcu_read_lock_nmisafe() on the same
srcu_struct structure. There are checks for this, but these checks are
not tested on a regular basis. This commit therefore adds such tests
to srcu_lockdep.sh.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

rcutorture: Make srcu_lockdep.sh check kernel Kconfig

The srcu_lockdep.sh currently blindly trusts the rcutorture SRCU-P
scenario to build its kernel with lockdep enabled. Of course, this
dependency might not be obvious to someone rebalancing SRCU scenarios.
This commit therefore adds code to srcu_lockdep.sh that verifies that
the .config file has lockdep enabled.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

MAINTAINERS: Update Joel's email address

Update MAINTAINERS file to reflect changes to Joel's email address for
upstream work.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

Linux 6.15-rc1

tools/include: make uapi/linux/types.h usable from assembly

The "real" linux/types.h UAPI header gracefully degrades to a NOOP when
included from assembly code.

Mirror this behaviour in the tools/ variant.

Test for __ASSEMBLER__ over __ASSEMBLY__ as the former is provided by the
toolchain automatically.

Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/lkml/af553c62-ca2f-4956-932c-dd6e3a126f58@sirena.org.uk/
Fixes: c9fbaa879508 ("selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20250321-uapi-consistency-v1-1-439070118dc0@linutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'turbostat-2025.05.06' of git://git./linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown:

- support up to 8192 processors

- add cpuidle governor debug telemetry, disabled by default

- update default output to exclude cpuidle invocation counts

- bug fixes

* tag 'turbostat-2025.05.06' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: v2025.05.06
  tools/power turbostat: disable "cpuidle" invocation counters, by default
  tools/power turbostat: re-factor sysfs code
  tools/power turbostat: Restore GFX sysfs fflush() call
  tools/power turbostat: Document GNR UncMHz domain convention
  tools/power turbostat: report CoreThr per measurement interval
  tools/power turbostat: Increase CPU_SUBSET_MAXCPUS to 8192
  tools/power turbostat: Add idle governor statistics reporting
  tools/power turbostat: Fix names matching
  tools/power turbostat: Allow Zero return value for some RAPL registers
  tools/power turbostat: Clustered Uncore MHz counters should honor show/hide options

Merge tag 'soundwire-6.15-rc1-fixes' of git://git./linux/kernel/git/vkoul/soundwire

Pull soundwire fix from Vinod Koul:

- add missing config symbol CONFIG_SND_HDA_EXT_CORE required for asoc
driver CONFIG_SND_SOF_SOF_HDA_SDW_BPT

* tag 'soundwire-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
ASoC: SOF: Intel: Let SND_SOF_SOF_HDA_SDW_BPT select SND_HDA_EXT_CORE

tools/power turbostat: v2025.05.06

Support up to 8192 processors
Add cpuidle governor debug telemetry, disabled by default
Update default output to exclude cpuidle invocation counts
Bug fixes

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: disable "cpuidle" invocation counters, by default

Create "pct_idle" counter group, the sofware notion of residency
so it can now be singled out, independent of other counter groups.

Create "cpuidle" group, the cpuidle invocation counts.
Disable "cpuidle", by default.

Create "swidle" = "cpuidle" + "pct_idle".
Undocument "sysfs", the old name for "swidle", but keep it working
for backwards compatibilty.

Create "hwidle", all the HW idle counters

Modify "idle", enabled by default
"idle" = "hwidle" + "pct_idle" (and now excludes "cpuidle")

Signed-off-by: Len Brown <len.brown@intel.com>

Merge tag 'perf-urgent-2025-04-06' of git://git./linux/kernel/git/tip/tip

Pull perf event fix from Ingo Molnar:
"Fix a perf events time accounting bug"

* tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix child_total_time_enabled accounting bug at task exit

Merge tag 'sched-urgent-2025-04-06' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:

- Fix a nonsensical Kconfig combination

- Remove an unnecessary rseq-notification

* tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Eliminate useless task_work on execve
sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMP

Disable SLUB_TINY for build testing

... and don't error out so hard on missing module descriptions.

Before commit 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()")
we used to warn about missing module descriptions, but only when
building with extra warnigns (ie 'W=1').

After that commit the warning became an unconditional hard error.

And it turns out not all modules have been converted despite the claims
to the contrary.  As reported by Damian Tometzki, the slub KUnit test
didn't have a module description, and apparently nobody ever really
noticed.

The reason nobody noticed seems to be that the slub KUnit tests get
disabled by SLUB_TINY, which also ends up disabling a lot of other code,
both in tests and in slub itself.  And so anybody doing full build tests
didn't actually see this failre.

So let's disable SLUB_TINY for build-only tests, since it clearly ends
up limiting build coverage.  Also turn the missing module descriptions
error back into a warning, but let's keep it around for non-'W=1'
builds.

Reported-by: Damian Tometzki <damian@riscv-rocks.de>
Link: https://lore.kernel.org/all/01070196099fd059-e8463438-7b1b-4ec8-816d-173874be9966-000000@eu-central-1.amazonses.com/
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tools/power turbostat: re-factor sysfs code

Probe cpuidle "sysfs" residency and counts separately,
since soon we will make one disabled on, and the
other disabled off.

Clarify that some BIC (build-in-counters) are actually "groups".
since we're about to re-name some of those groups.

no functional change.

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: Restore GFX sysfs fflush() call

Do fflush() to discard the buffered data, before each read of the
graphics sysfs knobs.

Fixes: ba99a4fc8c24 ("tools/power turbostat: Remove unnecessary fflush() call")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: Document GNR UncMHz domain convention

Document that on Intel Granite Rapids Systems,
Uncore domains 0-2 are CPU domains, and
uncore domains 3-4 are IO domains.

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: report CoreThr per measurement interval

The CoreThr column displays total thermal throttling events
since boot time.

Change it to report events during the measurement interval.

This is more useful for showing a user the current conditions.
Total events since boot time are still available to the user via
/sys/devices/system/cpu/cpu*/thermal_throttle/*

Document CoreThr on turbostat.8

Fixes: eae97e053fe30 ("turbostat: Support thermal throttle count print")
Reported-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Chen Yu <yu.c.chen@intel.com>

tools/power turbostat: Increase CPU_SUBSET_MAXCPUS to 8192

On systems with >= 1024 cpus (in my case 1152), turbostat fails with the error output:
"turbostat: /sys/fs/cgroup/cpuset.cpus.effective: cpu str malformat 0-1151"

A similar error appears with the use of turbostat --cpu when the inputted cpu
range contains a cpu number >= 1024:
# turbostat -c 1100-1151
"--cpu 1100-1151" malformed
...

Both errors are caused by parse_cpu_str() reaching its limit of CPU_SUBSET_MAXCPUS.

It's a good idea to limit the maximum cpu number being parsed, but 1024 is too low.
For a small increase in compute and allocated memory, increasing CPU_SUBSET_MAXCPUS
brings support for parsing cpu numbers >= 1024.

Increase CPU_SUBSET_MAXCPUS to 8192, a common setting for CONFIG_NR_CPUS on x86_64.

Signed-off-by: Justin Ernst <justin.ernst@hpe.com>
Signed-off-by: Len Brown <len.brown@intel.com>

Merge tag 'timers-cleanups-2025-04-06' of git://git./linux/kernel/git/tip/tip

Pull timer cleanups from Thomas Gleixner:
"A set of final cleanups for the timer subsystem:

   - Convert all del_timer[_sync]() instances over to the new
     timer_delete[_sync]() API and remove the legacy wrappers.

     Conversion was done with coccinelle plus some manual fixups as
     coccinelle chokes on scoped_guard().

   - The final cleanup of the hrtimer_init() to hrtimer_setup()
     conversion.

     This has been delayed to the end of the merge window, so that all
     patches which have been merged through other trees are in mainline
     and all new users are catched.

  Doing this right before rc1 ensures that new code which is merged post
  rc1 is not introducing new instances of the original functionality"

* tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing/timers: Rename the hrtimer_init event to hrtimer_setup
  hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack()
  hrtimers: Rename debug_init() to debug_setup()
  hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper()
  hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns()
  hrtimers: Make callback function pointer private
  hrtimers: Merge __hrtimer_init() into __hrtimer_setup()
  hrtimers: Switch to use __htimer_setup()
  hrtimers: Delete hrtimer_init()
  treewide: Convert new and leftover hrtimer_init() users
  treewide: Switch/rename to timer_delete[_sync]()

Merge tag 'irq-urgent-2025-04-06' of git://git./linux/kernel/git/tip/tip

Pull more irq updates from Thomas Gleixner:
"A set of updates for the interrupt subsystem:

   - A treewide cleanup for the irq_domain code, which makes the naming
     consistent and gets rid of the original oddity of naming domains
     'host'.

     This is a trivial mechanical change and is done late to ensure that
     all instances have been catched and new code merged post rc1 wont
     reintroduce new instances.

   - A trivial consistency fix in the migration code

     The recent introduction of irq_force_complete_move() in the core
     code, causes a problem for the nostalgia crowd who maintains ia64
     out of tree.

     The code assumes that hierarchical interrupt domains are enabled
     and dereferences irq_data::parent_data unconditionally. That works
     in mainline because both architectures which enable that code have
     hierarchical domains enabled. Though it breaks the ia64 build,
     which enables the functionality, but does not have hierarchical
     domains.

     While it's not really a problem for mainline today, this
     unconditional dereference is inconsistent and trivially fixable by
     using the existing helper function irqd_get_parent_data(), which
     has the appropriate #ifdeffery in place"

* tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/migration: Use irqd_get_parent_data() in irq_force_complete_move()
  irqdomain: Stop using 'host' for domain
  irqdomain: Rename irq_get_default_host() to irq_get_default_domain()
  irqdomain: Rename irq_set_default_host() to irq_set_default_domain()

Merge tag 'timers-urgent-2025-04-06' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
"A revert to fix a adjtimex() regression:

  The recent change to prevent that time goes backwards for the coarse
  time getters due to immediate multiplier adjustments via adjtimex(),
  changed the way how the timekeeping core treats that.

  That change result in a regression on the adjtimex() side, which is
  user space visible:

   1) The forwarding of the base time moves the update out of the
      original period and establishes a new one. That's changing the
      behaviour of the [PF]LL control, which user space expects to be
      applied periodically.

   2) The clearing of the accumulated NTP error due to #1, changes the
      behaviour as well.

  An attempt to delay the multiplier/frequency update to the next tick
  did not solve the problem as userspace expects that the multiplier or
  frequency updates are in effect, when the syscall returns.

  There is a different solution for the coarse time problem available,
  so revert the offending commit to restore the existing adjtimex()
  behaviour"

* tag 'timers-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "timekeeping: Fix possible inconsistencies in _COARSE clockids"

Merge tag 'sh-for-v6.15-tag1' of git://git./linux/kernel/git/glaubitz/sh-linux

Pull sh updates from John Paul Adrian Glaubitz:
"One important fix and one small configuration update.

  The first patch by Artur Rojek fixes an issue with the J2 firmware
  loader not being able to find the location of the device tree blob due
  to insufficient alignment of the .bss section which rendered J2 boards
  unbootable.

  The second patch by Johan Korsnes updates the defconfigs on sh to drop
  the CONFIG_NET_CLS_TCINDEX configuration option which became obsolete
  after 8c710f75256b ("net/sched: Retire tcindex classifier").

  Summary:

   - sh: defconfig: Drop obsolete CONFIG_NET_CLS_TCINDEX

   - sh: Align .bss section padding to 8-byte boundary"

* tag 'sh-for-v6.15-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
  sh: defconfig: Drop obsolete CONFIG_NET_CLS_TCINDEX
  sh: Align .bss section padding to 8-byte boundary

Merge tag 'kbuild-v6.15' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

- Improve performance in gendwarfksyms

- Remove deprecated EXTRA_*FLAGS and KBUILD_ENABLE_EXTRA_GCC_CHECKS

- Support CONFIG_HEADERS_INSTALL for ARCH=um

- Use more relative paths to sources files for better reproducibility

- Support the loong64 Debian architecture

- Add Kbuild bash completion

- Introduce intermediate vmlinux.unstripped for architectures that need
   static relocations to be stripped from the final vmlinux

- Fix versioning in Debian packages for -rc releases

- Treat missing MODULE_DESCRIPTION() as an error

- Convert Nios2 Makefiles to use the generic rule for built-in DTB

- Add debuginfo support to the RPM package

* tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits)
  kbuild: rpm-pkg: build a debuginfo RPM
  kconfig: merge_config: use an empty file as initfile
  nios2: migrate to the generic rule for built-in DTB
  rust: kbuild: skip `--remap-path-prefix` for `rustdoc`
  kbuild: pacman-pkg: hardcode module installation path
  kbuild: deb-pkg: don't set KBUILD_BUILD_VERSION unconditionally
  modpost: require a MODULE_DESCRIPTION()
  kbuild: make all file references relative to source root
  x86: drop unnecessary prefix map configuration
  kbuild: deb-pkg: add comment about future removal of KDEB_COMPRESS
  kbuild: Add a help message for "headers"
  kbuild: deb-pkg: remove "version" variable in mkdebian
  kbuild: deb-pkg: fix versioning for -rc releases
  Documentation/kbuild: Fix indentation in modules.rst example
  x86: Get rid of Makefile.postlink
  kbuild: Create intermediate vmlinux build with relocations preserved
  kbuild: Introduce Kconfig symbol for linking vmlinux with relocations
  kbuild: link-vmlinux.sh: Make output file name configurable
  kbuild: do not generate .tmp_vmlinux*.map when CONFIG_VMLINUX_MAP=y
  Revert "kheaders: Ignore silly-rename files"
  ...

Merge tag 'drm-next-2025-04-05' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Weekly fixes, mostly from the end of last week, this week was very
  quiet, maybe you scared everyone away. It's mostly amdgpu, and xe,
  with some i915, adp and bridge bits, since I think this is overly
  quiet I'd expect rc2 to be a bit more lively.

  bridge:
   - tda998x: Select CONFIG_DRM_KMS_HELPER

  amdgpu:
   - Guard against potential division by 0 in fan code
   - Zero RPM support for SMU 14.0.2
   - Properly handle SI and CIK support being disabled
   - PSR fixes
   - DML2 fixes
   - DP Link training fix
   - Vblank fixes
   - RAS fixes
   - Partitioning fix
   - SDMA fix
   - SMU 13.0.x fixes
   - Rom fetching fix
   - MES fixes
   - Queue reset fix

  xe:
   - Fix NULL pointer dereference on error path
   - Add missing HW workaround for BMG
   - Fix survivability mode not triggering
   - Fix build warning when DRM_FBDEV_EMULATION is not set

  i915:
   - Bounds check for scalers in DSC prefill latency computation
   - Fix build by adding a missing include

  adp:
   - Fix error handling in plane setup"

  # -----BEGIN PGP SIGNATURE-----

* tag 'drm-next-2025-04-05' of https://gitlab.freedesktop.org/drm/kernel: (34 commits)
  drm/i2c: tda998x: select CONFIG_DRM_KMS_HELPER
  drm/amdgpu/gfx12: fix num_mec
  drm/amdgpu/gfx11: fix num_mec
  drm/amd/pm: Add gpu_metrics_v1_8
  drm/amdgpu: Prefer shadow rom when available
  drm/amd/pm: Update smu metrics table for smu_v13_0_6
  drm/amd/pm: Remove host limit metrics support
  Remove unnecessary firmware version check for gc v9_4_2
  drm/amdgpu: stop unmapping MQD for kernel queues v3
  Revert "drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA"
  drm/amdgpu: Parse all deferred errors with UMC aca handle
  drm/amdgpu: Update ta ras block
  drm/amdgpu: Add NPS2 to DPX compatible mode
  drm/amdgpu: Use correct gfx deferred error count
  drm/amd/display: Actually do immediate vblank disable
  drm/amd/display: prevent hang on link training fail
  Revert "drm/amd/display: dml2 soc dscclk use DPM table clk setting"
  drm/amd/display: Increase vblank offdelay for PSR panels
  drm/amd: Handle being compiled without SI or CIK support better
  drm/amd/pm: Add zero RPM enabled OD setting support for SMU14.0.2
  ...

kbuild: rpm-pkg: build a debuginfo RPM

The rpm-pkg make target currently suffers from a few issues related to
debuginfo:
1. debuginfo for things built into the kernel (vmlinux) is not available
   in any RPM produced by make rpm-pkg. This makes using tools like
   systemtap against a make rpm-pkg kernel impossible.
2. debug source for the kernel is not available. This means that
   commands like 'disas /s' in gdb, which display source intermixed with
   assembly, can only print file names/line numbers which then must be
   painstakingly resolved to actual source in a separate editor.
3. debuginfo for modules is available, but it remains bundled with the
   .ko files that contain module code, in the main kernel RPM. This is a
   waste of space for users who do not need to debug the kernel (i.e.
   most users).

Address all of these issues by additionally building a debuginfo RPM
when the kernel configuration allows for it, in line with standard
patterns followed by RPM distributors. With these changes:
1. systemtap now works (when these changes are backported to 6.11, since
   systemtap lags a bit behind in compatibility), as verified by the
   following simple test script:

   # stap -e 'probe kernel.function("do_sys_open").call { printf("%s\n", $$parms); }'
   dfd=0xffffffffffffff9c filename=0x7fe18800b160 flags=0x88800 mode=0x0
   ...

2. disas /s works correctly in gdb, with source and disassembly
   interspersed:

   # gdb vmlinux --batch -ex 'disas /s blk_op_str'
   Dump of assembler code for function blk_op_str:
   block/blk-core.c:
   125     {
      0xffffffff814c8740 <+0>:     endbr64

   127
   128             if (op < ARRAY_SIZE(blk_op_name) && blk_op_name[op])
      0xffffffff814c8744 <+4>:     mov    $0xffffffff824a7378,%rax
      0xffffffff814c874b <+11>:    cmp    $0x23,%edi
      0xffffffff814c874e <+14>:    ja     0xffffffff814c8768 <blk_op_str+40>
      0xffffffff814c8750 <+16>:    mov    %edi,%edi

   126             const char *op_str = "UNKNOWN";
      0xffffffff814c8752 <+18>:    mov    $0xffffffff824a7378,%rdx

   127
   128             if (op < ARRAY_SIZE(blk_op_name) && blk_op_name[op])
      0xffffffff814c8759 <+25>:    mov    -0x7dfa0160(,%rdi,8),%rax

   126             const char *op_str = "UNKNOWN";
      0xffffffff814c8761 <+33>:    test   %rax,%rax
      0xffffffff814c8764 <+36>:    cmove  %rdx,%rax

   129                     op_str = blk_op_name[op];
   130
   131             return op_str;
   132     }
      0xffffffff814c8768 <+40>:    jmp    0xffffffff81d01360 <__x86_return_thunk>
   End of assembler dump.

3. The size of the main kernel package goes down substantially,
   especially if many modules are built (quite typical). Here is a
   comparison of installed size of the kernel package (configured with
   allmodconfig, dwarf4 debuginfo, and module compression turned off)
   before and after this patch:

   # rpm -qi kernel-6.13* | grep -E '^(Version|Size)'
   Version     : 6.13.0postpatch+
   Size        : 1382874089
   Version     : 6.13.0prepatch+
   Size        : 17870795887

   This is a ~92% size reduction.

Note that a debuginfo package can only be produced if the following
configs are set:
- CONFIG_DEBUG_INFO=y
- CONFIG_MODULE_COMPRESS=n
- CONFIG_DEBUG_INFO_SPLIT=n

The first of these is obvious - we can't produce debuginfo if the build
does not generate it. The second two requirements can in principle be
removed, but doing so is difficult with the current approach, which uses
a generic rpmbuild script find-debuginfo.sh that processes all packaged
executables. If we want to remove those requirements the best path
forward is likely to add some debuginfo extraction/installation logic to
the modules_install target (controllable by flags). That way, it's
easier to operate on modules before they're compressed, and the logic
can be reused by all packaging targets.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

kconfig: merge_config: use an empty file as initfile

The scripts/kconfig/merge_config.sh script requires an existing
$INITFILE (or the $1 argument) as a base file for merging Kconfig
fragments. However, an empty $INITFILE can serve as an initial starting
point, later referenced by the KCONFIG_ALLCONFIG Makefile variable
if -m is not used. This variable can point to any configuration file
containing preset config symbols (the merged output) as stated in
Documentation/kbuild/kconfig.rst. When -m is used $INITFILE will
contain just the merge output requiring the user to run make (i.e.
KCONFIG_ALLCONFIG=<$INITFILE> make <allnoconfig/alldefconfig> or make
olddefconfig).

Instead of failing when `$INITFILE` is missing, create an empty file and
use it as the starting point for merges.

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

nios2: migrate to the generic rule for built-in DTB

Commit 654102df2ac2 ("kbuild: add generic support for built-in boot
DTBs") introduced generic support for built-in DTBs.

Select GENERIC_BUILTIN_DTB when built-in DTB support is enabled.

To keep consistency across architectures, this commit also renames
CONFIG_NIOS2_DTB_SOURCE_BOOL to CONFIG_BUILTIN_DTB, and
CONFIG_NIOS2_DTB_SOURCE to CONFIG_BUILTIN_DTB_NAME.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

sh: defconfig: Drop obsolete CONFIG_NET_CLS_TCINDEX

This option was removed from Kconfig in 8c710f75256b ("net/sched:
Retire tcindex classifier") but from the defconfigs.

Fixes: 8c710f75256b ("net/sched: Retire tcindex classifier")
Signed-off-by: Johan Korsnes <johan.korsnes@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>

sh: Align .bss section padding to 8-byte boundary

J2-based devices expect to find a device tree blob at the end of the
.bss section. As of a77725a9a3c5 ("scripts/dtc: Update to upstream
version v1.6.1-19-g0a3a9d3449c8"), libfdt enforces 8-byte alignment
for the DTB, causing J2 devices to fail early in sh_fdt_init().

As the J2 loader firmware calculates the DTB location based on the kernel
image .bss section size rather than the __bss_stop symbol offset, the
required alignment can't be enforced with BSS_SECTION(0, PAGE_SIZE, 8).

To fix this, inline a modified version of the above macro which grows
.bss by the required size. While this change affects all existing SH
boards, it should be benign on platforms which don't need this alignment.

Signed-off-by: Artur Rojek <contact@artur-rojek.eu>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Tested-by: Rob Landley <rob@landley.net>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>

Merge tag 'input-for-v6.15-rc0' of git://git./linux/kernel/git/dtor/input

Pull input updates from Dmitry Torokhov:

- a brand new driver for touchpads and touchbars in newer Apple devices

- support for Berlin-A series in goodix-berlin touchscreen driver

- improvements to matrix_keypad driver to better handle GPIOs toggling

- assorted small cleanups in other input drivers

* tag 'input-for-v6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: goodix_berlin - add support for Berlin-A series
  dt-bindings: input: goodix,gt9916: Document gt9897 compatible
  dt-bindings: input: matrix_keypad - add wakeup-source property
  dt-bindings: input: matrix_keypad - add missing property
  Input: pm8941-pwrkey - fix dev_dbg() output in pm8941_pwrkey_irq()
  Input: synaptics - hide unused smbus_pnp_ids[] array
  Input: apple_z2 - fix potential confusion in Kconfig
  Input: matrix_keypad - use fsleep for delays after activating columns
  Input: matrix_keypad - add settle time after enabling all columns
  dt-bindings: input: matrix_keypad: add settle time after enabling all columns
  dt-bindings: input: matrix_keypad: convert to YAML
  dt-bindings: input: Correct indentation and style in DTS example
  MAINTAINERS: Add entries for Apple Z2 touchscreen driver
  Input: apple_z2 - add a driver for Apple Z2 touchscreens
  dt-bindings: input: touchscreen: Add Z2 controller
  Input: Switch to use hrtimer_setup()
  Input: drop vb2_ops_wait_prepare/finish

tracing/timers: Rename the hrtimer_init event to hrtimer_setup

The function hrtimer_init() doesn't exist anymore. It was replaced by
hrtimer_setup().

Thus, rename the hrtimer_init trace event to hrtimer_setup to keep it
consistent.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/cba84c3d853c5258aa3a262363a6eac08e2c7afc.1738746927.git.namcao@linutronix.de

hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack()

All the hrtimer_init*() functions have been renamed to hrtimer_setup*().
Rename debug_init_on_stack() to debug_setup_on_stack() as well, to keep the
names consistent.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/073cf6162779a2f5b12624677d4c49ee7eccc1ed.1738746927.git.namcao@linutronix.de

hrtimers: Rename debug_init() to debug_setup()

All the hrtimer_init*() functions have been renamed to hrtimer_setup*().
Rename debug_init() to debug_setup() as well, to keep the names consistent.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/4b730c1f79648b16a1c5413f928fdc2e138dfc43.1738746927.git.namcao@linutronix.de

hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper()

All the hrtimer_init*() functions have been renamed to hrtimer_setup*().
Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper() as well, to
keep the names consistent.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/807694aedad9353421c4a7347629a30c5c31026f.1738746927.git.namcao@linutronix.de

hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns()

The struct hrtimer::function field can only be changed using
hrtimer_setup*() or hrtimer_update_function(), and both already null-check
'function'. Therefore, null-checking 'function' in hrtimer_start_range_ns()
is not necessary.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/4661c571ee87980c340ccc318fc1a473c0c8f6bc.1738746927.git.namcao@linutronix.de

hrtimers: Make callback function pointer private

Make the struct hrtimer::function field private, to prevent users from
changing this field in an unsafe way. hrtimer_update_function() should be
used if the callback function needs to be changed.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/7d0e6e0c5c59a64a9bea940051aac05d750bc0c2.1738746927.git.namcao@linutronix.de

hrtimers: Merge __hrtimer_init() into __hrtimer_setup()

__hrtimer_init() is only called by __hrtimer_setup(). Simplify by merging
__hrtimer_init() into __hrtimer_setup().

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/8a0a847a35f711f66b2d05b57255aa44e7e61279.1738746927.git.namcao@linutronix.de

hrtimers: Switch to use __htimer_setup()

__hrtimer_init_sleeper() calls __hrtimer_init() and also sets up the
callback function. But there is already __hrtimer_setup() which does both
actions.

Switch to use __hrtimer_setup() to simplify the code.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/d9a45a51b6a8aa0045310d63f73753bf6b33f385.1738746927.git.namcao@linutronix.de

hrtimers: Delete hrtimer_init()

hrtimer_init() is now unused. Delete it.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/003722f60c7a2a4f8d4ed24fb741aa313b7e5136.1738746927.git.namcao@linutronix.de

treewide: Convert new and leftover hrtimer_init() users

hrtimer_setup() takes the callback function pointer as argument and
initializes the timer completely.

Replace hrtimer_init() and the open coded initialization of
hrtimer::function with the new setup mechanism.

Coccinelle scripted cleanup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

treewide: Switch/rename to timer_delete[_sync]()

timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.

Conversion was done with coccinelle plus manual fixups where necessary.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'next' into for-linus

Prepare input updates for 6.15 merge window.

Merge tag 'v6.15-p3' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
"This fixes a race condition in the newly added eip93 driver"

* tag 'v6.15-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: inside-secure/eip93 - acquire lock on eip93_put_descriptor hash

Merge tag 's390-6.15-2' of git://git./linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:

- Fix machine check handler _CIF_MCCK_GUEST bit setting by adding the
   missing base register for relocated lowcore address

- Fix build failure on older linkers by conditionally adding the
   -no-pie linker option only when it is supported

- Fix inaccurate kernel messages in vfio-ap by providing descriptive
   error notifications for AP queue sharing violations

- Fix PCI isolation logic by ensuring non-VF devices correctly return
   false in zpci_bus_is_isolated_vf()

- Fix PCI DMA range map setup by using dma_direct_set_offset() to add a
   proper sentinel element, preventing potential overruns and
   translation errors

- Cleanup header dependency problems with asm-offsets.c

- Add fault info for unexpected low-address protection faults in user
   mode

- Add support for HOTPLUG_SMT, replacing the arch-specific "nosmt"
   handling with common code handling

- Use bitop functions to implement CPU flag helper functions to ensure
   that bits cannot get lost if modified in different contexts on a CPU

- Remove unused machine_flags for the lowcore

* tag 's390-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/vfio-ap: Fix no AP queue sharing allowed message written to kernel log
  s390/pci: Fix dev.dma_range_map missing sentinel element
  s390/mm: Dump fault info in case of low address protection fault
  s390/smp: Add support for HOTPLUG_SMT
  s390: Fix linker error when -no-pie option is unavailable
  s390/processor: Use bitop functions for cpu flag helper functions
  s390/asm-offsets: Remove ASM_OFFSETS_C
  s390/asm-offsets: Include ftrace_regs.h instead of ftrace.h
  s390/kvm: Split kvm_host header file
  s390/pci: Fix zpci_bus_is_isolated_vf() for non-VFs
  s390/lowcore: Remove unused machine_flags
  s390/entry: Fix setting _CIF_MCCK_GUEST with lowcore relocation

Merge tag '6.15-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull more smb client updates from Steve French:

- reconnect fixes: three for updating rsize/wsize and an SMB1 reconnect
   fix

- RFC1001 fixes: fixing connections to nonstandard ports, and negprot
   retries

- fix mfsymlinks to old servers

- make mapping of open flags for SMB1 more accurate

- permission fixes: adding retry on open for write, and one for stat to
   workaround unexpected access denied

- add two new xattrs, one for retrieving SACL and one for retrieving
   owner (without having to retrieve the whole ACL)

- fix mount parm validation for echo_interval

- minor cleanup (including removing now unneeded cifs_truncate_page)

* tag '6.15-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal version number
  cifs: Implement is_network_name_deleted for SMB1
  cifs: Remove cifs_truncate_page() as it should be superfluous
  cifs: Do not add FILE_READ_ATTRIBUTES when using GENERIC_READ/EXECUTE/ALL
  cifs: Improve SMB2+ stat() to work also without FILE_READ_ATTRIBUTES
  cifs: Add fallback for SMB2 CREATE without FILE_READ_ATTRIBUTES
  cifs: Fix querying and creating MF symlinks over SMB1
  cifs: Fix access_flags_to_smbopen_mode
  cifs: Fix negotiate retry functionality
  cifs: Improve handling of NetBIOS packets
  cifs: Allow to disable or force initialization of NetBIOS session
  cifs: Add a new xattr system.smb3_ntsd_owner for getting or setting owner
  cifs: Add a new xattr system.smb3_ntsd_sacl for getting or setting SACLs
  smb: client: Update IO sizes after reconnection
  smb: client: Store original IO parameters and prevent zero IO sizes
  smb:client: smb: client: Add reverse mapping from tcon to superblocks
  cifs: remove unreachable code in cifs_get_tcp_session()
  cifs: fix integer overflow in match_server()

Merge tag 'ntb-6.15' of https://github.com/jonmason/ntb

Pull ntb fixes from Jon Mason:
"Bug fixes for NTB Switchtec driver mw negative shift, Intel NTB link
  status db, ntb_perf double unmap (in error case), and MSI 64bit
  arithmetic.

  Also, add new AMD NTB PCI IDs, update AMD NTB maintainer, and pull in
  patch to reduce the stack usage in IDT driver"

* tag 'ntb-6.15' of https://github.com/jonmason/ntb:
  ntb_hw_amd: Add NTB PCI ID for new gen CPU
  ntb: reduce stack usage in idt_scan_mws
  ntb: use 64-bit arithmetic for the MSI doorbell mask
  MAINTAINERS: Update AMD NTB maintainers
  ntb_perf: Delete duplicate dmaengine_unmap_put() call in perf_copy_chunk()
  ntb: intel: Fix using link status DB's
  ntb_hw_switchtec: Fix shift-out-of-bounds in switchtec_ntb_mw_set_trans

Merge tag 'drm-misc-next-fixes-2025-04-04' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

Short summary of fixes pull:

bridge:
- tda998x: Select CONFIG_DRM_KMS_HELPER

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250404065105.GA27699@linux.fritz.box

Revert "timekeeping: Fix possible inconsistencies in _COARSE clockids"

This reverts commit 757b000f7b936edf79311ab0971fe465bbda75ea.

Miroslav reported that the changes for handling the inconsistencies in the
coarse time getters result in a regression on the adjtimex() side.

There are two issues:

  1) The forwarding of the base time moves the update out of the original
     period and establishes a new one.

  2) The clearing of the accumulated NTP error is changing the behaviour as
     well.

Userspace expects that multiplier/frequency updates are in effect, when the
syscall returns, so delaying the update to the next tick is not solving the
problem either.

Revert the change, so that the established expectations of user space
implementations (ntpd, chronyd) are restored. The re-introduced
inconsistency of the coarse time getters will be addressed in a subsequent
fix.

Fixes: 757b000f7b93 ("timekeeping: Fix possible inconsistencies in _COARSE clockids")
Reported-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/Z-qsg6iDGlcIJulJ@localhost

Merge tag 'riscv-for-linus-6.15-mw1' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

- The sub-architecture selection Kconfig system has been cleaned up,
   the documentation has been improved, and various detections have been
   fixed

- The vector-related extensions dependencies are now validated when
   parsing from device tree and in the DT bindings

- Misaligned access probing can be overridden via a kernel command-line
   parameter, along with various fixes to misalign access handling

- Support for relocatable !MMU kernels builds

- Support for hpge pfnmaps, which should improve TLB utilization

- Support for runtime constants, which improves the d_hash()
   performance

- Support for bfloat16, Zicbom, Zaamo, Zalrsc, Zicntr, Zihpm

- Various fixes, including:
      - We were missing a secondary mmu notifier call when flushing the
        tlb which is required for IOMMU
      - Fix ftrace panics by saving the registers as expected by ftrace
      - Fix a couple of stimecmp usage related to cpu hotplug
      - purgatory_start is now aligned as per the STVEC requirements
      - A fix for hugetlb when calculating the size of non-present PTEs

* tag 'riscv-for-linus-6.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (65 commits)
  riscv: Add norvc after .option arch in runtime const
  riscv: Make sure toolchain supports zba before using zba instructions
  riscv/purgatory: 4B align purgatory_start
  riscv/kexec_file: Handle R_RISCV_64 in purgatory relocator
  selftests: riscv: fix v_exec_initval_nolibc.c
  riscv: Fix hugetlb retrieval of number of ptes in case of !present pte
  riscv: print hartid on bringup
  riscv: Add norvc after .option arch in runtime const
  riscv: Remove CONFIG_PAGE_OFFSET
  riscv: Support CONFIG_RELOCATABLE on riscv32
  asm-generic: Always define Elf_Rel and Elf_Rela
  riscv: Support CONFIG_RELOCATABLE on NOMMU
  riscv: Allow NOMMU kernels to access all of RAM
  riscv: Remove duplicate CONFIG_PAGE_OFFSET definition
  RISC-V: errata: Use medany for relocatable builds
  dt-bindings: riscv: document vector crypto requirements
  dt-bindings: riscv: add vector sub-extension dependencies
  dt-bindings: riscv: d requires f
  RISC-V: add f & d extension validation checks
  RISC-V: add vector crypto extension validation checks
  ...

Merge tag 'net-6.15-rc1' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter.

  Current release - regressions:

   - four fixes for the netdev per-instance locking

  Current release - new code bugs:

   - consolidate more code between existing Rx zero-copy and uring so
     that the latter doesn't miss / have to duplicate the safety checks

  Previous releases - regressions:

   - ipv6: fix omitted Netlink attributes when using SKIP_STATS

  Previous releases - always broken:

   - net: fix geneve_opt length integer overflow

   - udp: fix multiple wrap arounds of sk->sk_rmem_alloc when it
     approaches INT_MAX

   - dsa: mvpp2: add a lock to avoid corruption of the shared TCAM

   - dsa: airoha: fix issues with traffic QoS configuration / offload,
     and flow table offload

  Misc:

   - touch up the Netlink YAML specs of old families to make them usable
     for user space C codegen"

* tag 'net-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits)
  selftests: net: amt: indicate progress in the stress test
  netlink: specs: rt_route: pull the ifa- prefix out of the names
  netlink: specs: rt_addr: pull the ifa- prefix out of the names
  netlink: specs: rt_addr: fix get multi command name
  netlink: specs: rt_addr: fix the spec format / schema failures
  net: avoid false positive warnings in __net_mp_close_rxq()
  net: move mp dev config validation to __net_mp_open_rxq()
  net: ibmveth: make veth_pool_store stop hanging
  arcnet: Add NULL check in com20020pci_probe()
  ipv6: Do not consider link down nexthops in path selection
  ipv6: Start path selection from the first nexthop
  usbnet:fix NPE during rx_complete
  net: octeontx2: Handle XDP_ABORTED and XDP invalid as XDP_DROP
  net: fix geneve_opt length integer overflow
  io_uring/zcrx: fix selftests w/ updated netdev Python helpers
  selftests: net: use netdevsim in netns test
  docs: net: document netdev notifier expectations
  net: dummy: request ops lock
  netdevsim: add dummy device notifiers
  net: rename rtnl_net_debug to lock_debug
  ...

Merge tag 'spi-fix-v6.15-merge-window' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A small collection of fixes that came in during the merge window,
  everything is driver specific with nothing standing out particularly"

* tag 'spi-fix-v6.15-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: bcm2835: Restore native CS probing when pinctrl-bcm2835 is absent
  spi: bcm2835: Do not call gpiod_put() on invalid descriptor
  spi: cadence-qspi: revert "Improve spi memory performance"
  spi: cadence: Fix out-of-bounds array access in cdns_mrvl_xspi_setup_clock()
  spi: fsl-qspi: use devm function instead of driver remove
  spi: SPI_QPIC_SNAND should be tristate and depend on MTD
  spi-rockchip: Fix register out of bounds access

Merge tag 'soc-drivers-6.15-2' of git://git./linux/kernel/git/soc/soc

Pull more SoC driver updates from Arnd Bergmann:
"This is the promised follow-up to the soc drivers branch, adding minor
  updates to omap and freescale drivers.

  Most notably, Ioana Ciornei takes over maintenance of the DPAA bus
  driver used in some NXP (originally Freescale) chips"

* tag 'soc-drivers-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  bus: fsl-mc: Remove deadcode
  MAINTAINERS: add the linuppc-dev list to the fsl-mc bus entry
  MAINTAINERS: fix nonexistent dtbinding file name
  MAINTAINERS: add myself as maintainer for the fsl-mc bus
  irqdomain: soc: Switch to irq_find_mapping()
  Input: tsc2007 - accept standard properties

Merge tag 'platform-drivers-x86-v6.15-2' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:

- thinkpad_acpi:
     - Fix NULL pointer dereferences while probing
     - Disable ACPI fan access for T495* and E560

- ISST: Correct command storage data length

* tag 'platform-drivers-x86-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  MAINTAINERS: consistently use my dedicated email address
  platform/x86: ISST: Correct command storage data length
  platform/x86: thinkpad_acpi: disable ACPI fan access for T495* and E560
  platform/x86: thinkpad_acpi: Fix NULL pointer dereferences while probing

genirq/migration: Use irqd_get_parent_data() in irq_force_complete_move()

Frank reported, that the common irq_force_complete_move() breaks the out of
tree build of ia64. The reason is that ia64 uses the migration code, but
does not have hierarchical interrupt domains enabled.

This went unnoticed in mainline as both x86 and RISC-V have hierarchical
domains enabled. Not that it matters for mainline, but it's still
inconsistent.

Use irqd_get_parent_data() instead of accessing the parent_data field
directly. The helper returns NULL when hierarchical domains are disabled
otherwise it accesses the parent_data field of the domain.

No functional change.

Fixes: 751dc837dabd ("genirq: Introduce common irq_force_complete_move() implementation")
Reported-by: Frank Scheiner <frank.scheiner@web.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Frank Scheiner <frank.scheiner@web.de>
Link: https://lore.kernel.org/all/87h634ugig.ffs@tglx

selftests: net: amt: indicate progress in the stress test

Our CI expects output from the test at least once every 10 minutes.
The AMT test when running on debug kernel is just on the edge
of that time for the stress test. Improve the output:
- print the name of the test first, before starting it,
- output a dot every 10% of the way.

Output after:

  TEST: amt discovery                                                 [ OK ]
  TEST: IPv4 amt multicast forwarding                                 [ OK ]
  TEST: IPv6 amt multicast forwarding                                 [ OK ]
  TEST: IPv4 amt traffic forwarding torture               ..........  [ OK ]
  TEST: IPv6 amt traffic forwarding torture               ..........  [ OK ]

Reviewed-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250403145636.2891166-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

irqdomain: Stop using 'host' for domain

It is confusing to see 'host' and 'domain' to be used as 'domain'. Given
this header is all about domains, switch the remaining 'host' uses to
'domain'.

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-5-jirislaby@kernel.org

irqdomain: Rename irq_get_default_host() to irq_get_default_domain()

Naming interrupt domains host is confusing at best and the irqdomain code
uses both domain and host inconsistently.

Therefore rename irq_get_default_host() to irq_get_default_domain().

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-4-jirislaby@kernel.org

irqdomain: Rename irq_set_default_host() to irq_set_default_domain()

Naming interrupt domains host is confusing at best and the irqdomain code
uses both domain and host inconsistently.

Therefore rename irq_set_default_host() to irq_set_default_domain().

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-3-jirislaby@kernel.org

Merge branch 'netlink-specs-rt_addr-fix-problems-revealed-by-c-codegen'

Jakub Kicinski says:

====================
netlink: specs: rt_addr: fix problems revealed by C codegen

I put together basic YNL C support for classic netlink. This revealed
a few problems in the rt_addr spec.

v1: https://lore.kernel.org/20250401012939.2116915-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250403013706.2828322-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: rt_route: pull the ifa- prefix out of the names

YAML specs don't normally include the C prefix name in the name
of the YAML attr. Remove the ifa- prefix from all attributes
in route-attrs and metrics and specify name-prefix instead.

This is a bit risky, hopefully there aren't many users out there.

Fixes: 023289b4f582 ("doc/netlink: Add spec for rt route messages")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250403013706.2828322-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: rt_addr: pull the ifa- prefix out of the names

YAML specs don't normally include the C prefix name in the name
of the YAML attr. Remove the ifa- prefix from all attributes
in addr-attrs and specify name-prefix instead.

This is a bit risky, hopefully there aren't many users out there.

Fixes: dfb0f7d9d979 ("doc/netlink: Add spec for rt addr messages")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250403013706.2828322-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: rt_addr: fix get multi command name

Command names should match C defines, codegens may depend on it.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Fixes: 4f280376e531 ("selftests/net: Add selftest for IPv4 RTM_GETMULTICAST support")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250403013706.2828322-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: rt_addr: fix the spec format / schema failures

The spec is mis-formatted, schema validation says:

  Failed validating 'type' in schema['properties']['operations']['properties']['list']['items']['properties']['dump']['properties']['request']['properties']['value']:
    {'minimum': 0, 'type': 'integer'}

  On instance['operations']['list'][3]['dump']['request']['value']:
    '58 - ifa-family'

The ifa-family clearly wants to be part of an attribute list.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Yuyang Huang <yuyanghuang@google.com>
Fixes: 4f280376e531 ("selftests/net: Add selftest for IPv4 RTM_GETMULTICAST support")
Link: https://patch.msgid.link/20250403013706.2828322-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-make-memory-provider-install-close-paths-more-common'

Jakub Kicinski says:

====================
net: make memory provider install / close paths more common

We seem to be fixing bugs in config path for devmem which also exist
in the io_uring ZC path. Let's try to make the two paths more common,
otherwise this is bound to keep happening.

Found by code inspection and compile tested only.

v1: https://lore.kernel.org/20250331194201.2026422-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250403013405.2827250-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: avoid false positive warnings in __net_mp_close_rxq()

Commit under Fixes solved the problem of spurious warnings when we
uninstall an MP from a device while its down. The __net_mp_close_rxq()
which is used by io_uring was not fixed. Move the fix over and reuse
__net_mp_close_rxq() in the devmem path.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Fixes: a70f891e0fa0 ("net: devmem: do not WARN conditionally after netdev_rx_queue_restart()")
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250403013405.2827250-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: move mp dev config validation to __net_mp_open_rxq()

devmem code performs a number of safety checks to avoid having
to reimplement all of them in the drivers. Move those to
__net_mp_open_rxq() and reuse that function for binding to make
sure that io_uring ZC also benefits from them.

While at it rename the queue ID variable to rxq_idx in
__net_mp_open_rxq(), we touch most of the relevant lines.

The XArray insertion is reordered after the netdev_rx_queue_restart()
call, otherwise we'd need to duplicate the queue index check
or risk inserting an invalid pointer. The XArray allocation
failures should be extremely rare.

Reviewed-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Fixes: 6e18ed929d3b ("net: add helpers for setting a memory provider on an rx queue")
Link: https://patch.msgid.link/20250403013405.2827250-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ibmveth: make veth_pool_store stop hanging

v2:
- Created a single error handling unlock and exit in veth_pool_store
- Greatly expanded commit message with previous explanatory-only text

Summary: Use rtnl_mutex to synchronize veth_pool_store with itself,
ibmveth_close and ibmveth_open, preventing multiple calls in a row to
napi_disable.

Background: Two (or more) threads could call veth_pool_store through
writing to /sys/devices/vio/30000002/pool*/*. You can do this easily
with a little shell script. This causes a hang.

I configured LOCKDEP, compiled ibmveth.c with DEBUG, and built a new
kernel. I ran this test again and saw:

    Setting pool0/active to 0
    Setting pool1/active to 1
    [   73.911067][ T4365] ibmveth 30000002 eth0: close starting
    Setting pool1/active to 1
    Setting pool1/active to 0
    [   73.911367][ T4366] ibmveth 30000002 eth0: close starting
    [   73.916056][ T4365] ibmveth 30000002 eth0: close complete
    [   73.916064][ T4365] ibmveth 30000002 eth0: open starting
    [  110.808564][  T712] systemd-journald[712]: Sent WATCHDOG=1 notification.
    [  230.808495][  T712] systemd-journald[712]: Sent WATCHDOG=1 notification.
    [  243.683786][  T123] INFO: task stress.sh:4365 blocked for more than 122 seconds.
    [  243.683827][  T123]       Not tainted 6.14.0-01103-g2df0c02dab82-dirty #8
    [  243.683833][  T123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [  243.683838][  T123] task:stress.sh       state:D stack:28096 pid:4365  tgid:4365  ppid:4364   task_flags:0x400040 flags:0x00042000
    [  243.683852][  T123] Call Trace:
    [  243.683857][  T123] [c00000000c38f690] [0000000000000001] 0x1 (unreliable)
    [  243.683868][  T123] [c00000000c38f840] [c00000000001f908] __switch_to+0x318/0x4e0
    [  243.683878][  T123] [c00000000c38f8a0] [c000000001549a70] __schedule+0x500/0x12a0
    [  243.683888][  T123] [c00000000c38f9a0] [c00000000154a878] schedule+0x68/0x210
    [  243.683896][  T123] [c00000000c38f9d0] [c00000000154ac80] schedule_preempt_disabled+0x30/0x50
    [  243.683904][  T123] [c00000000c38fa00] [c00000000154dbb0] __mutex_lock+0x730/0x10f0
    [  243.683913][  T123] [c00000000c38fb10] [c000000001154d40] napi_enable+0x30/0x60
    [  243.683921][  T123] [c00000000c38fb40] [c000000000f4ae94] ibmveth_open+0x68/0x5dc
    [  243.683928][  T123] [c00000000c38fbe0] [c000000000f4aa20] veth_pool_store+0x220/0x270
    [  243.683936][  T123] [c00000000c38fc70] [c000000000826278] sysfs_kf_write+0x68/0xb0
    [  243.683944][  T123] [c00000000c38fcb0] [c0000000008240b8] kernfs_fop_write_iter+0x198/0x2d0
    [  243.683951][  T123] [c00000000c38fd00] [c00000000071b9ac] vfs_write+0x34c/0x650
    [  243.683958][  T123] [c00000000c38fdc0] [c00000000071bea8] ksys_write+0x88/0x150
    [  243.683966][  T123] [c00000000c38fe10] [c0000000000317f4] system_call_exception+0x124/0x340
    [  243.683973][  T123] [c00000000c38fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
    ...
    [  243.684087][  T123] Showing all locks held in the system:
    [  243.684095][  T123] 1 lock held by khungtaskd/123:
    [  243.684099][  T123]  #0: c00000000278e370 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x50/0x248
    [  243.684114][  T123] 4 locks held by stress.sh/4365:
    [  243.684119][  T123]  #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150
    [  243.684132][  T123]  #1: c000000041aea888 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0
    [  243.684143][  T123]  #2: c0000000366fb9a8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0
    [  243.684155][  T123]  #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_enable+0x30/0x60
    [  243.684166][  T123] 5 locks held by stress.sh/4366:
    [  243.684170][  T123]  #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150
    [  243.684183][  T123]  #1: c00000000aee2288 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0
    [  243.684194][  T123]  #2: c0000000366f4ba8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0
    [  243.684205][  T123]  #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_disable+0x30/0x60
    [  243.684216][  T123]  #4: c0000003ff9bbf18 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0x138/0x12a0

From the ibmveth debug, two threads are calling veth_pool_store, which
calls ibmveth_close and ibmveth_open. Here's the sequence:

  T4365             T4366
  ----------------- ----------------- ---------
  veth_pool_store   veth_pool_store
                    ibmveth_close
  ibmveth_close
  napi_disable
                    napi_disable
  ibmveth_open
  napi_enable                         <- HANG

ibmveth_close calls napi_disable at the top and ibmveth_open calls
napi_enable at the top.

https://docs.kernel.org/networking/napi.html]] says

  The control APIs are not idempotent. Control API calls are safe
  against concurrent use of datapath APIs but an incorrect sequence of
  control API calls may result in crashes, deadlocks, or race
  conditions. For example, calling napi_disable() multiple times in a
  row will deadlock.

In the normal open and close paths, rtnl_mutex is acquired to prevent
other callers. This is missing from veth_pool_store. Use rtnl_mutex in
veth_pool_store fixes these hangs.

Signed-off-by: Dave Marquardt <davemarq@linux.ibm.com>
Fixes: 860f242eb534 ("[PATCH] ibmveth change buffer pools dynamically")
Reviewed-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250402154403.386744-1-davemarq@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

arcnet: Add NULL check in com20020pci_probe()

devm_kasprintf() returns NULL when memory allocation fails. Currently,
com20020pci_probe() does not check for this case, which results in a
NULL pointer dereference.

Add NULL check after devm_kasprintf() to prevent this issue and ensure
no resources are left allocated.

Fixes: 6b17a597fc2f ("arcnet: restoring support for multiple Sohard Arcnet cards")
Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com>
Link: https://patch.msgid.link/20250402135036.44697-1-bsdhenrymartin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ipv6-multipath-routing-fixes'

Ido Schimmel says:

====================
ipv6: Multipath routing fixes

This patchset contains two fixes for IPv6 multipath routing. See the
commit messages for more details.
====================

Link: https://patch.msgid.link/20250402114224.293392-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: Do not consider link down nexthops in path selection

Nexthops whose link is down are not supposed to be considered during
path selection when the "ignore_routes_with_linkdown" sysctl is set.
This is done by assigning them a negative region boundary.

However, when comparing the computed hash (unsigned) with the region
boundary (signed), the negative region boundary is treated as unsigned,
resulting in incorrect nexthop selection.

Fix by treating the computed hash as signed. Note that the computed hash
is always in range of [0, 2^31 - 1].

Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250402114224.293392-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: Start path selection from the first nexthop

Cited commit transitioned IPv6 path selection to use hash-threshold
instead of modulo-N. With hash-threshold, each nexthop is assigned a
region boundary in the multipath hash function's output space and a
nexthop is chosen if the calculated hash is smaller than the nexthop's
region boundary.

Hash-threshold does not work correctly if path selection does not start
with the first nexthop. For example, if fib6_select_path() is always
passed the last nexthop in the group, then it will always be chosen
because its region boundary covers the entire hash function's output
space.

Fix this by starting the selection process from the first nexthop and do
not consider nexthops for which rt6_score_route() provided a negative
score.

Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N")
Reported-by: Stanislav Fomichev <stfomichev@gmail.com>
Closes: https://lore.kernel.org/netdev/Z9RIyKZDNoka53EO@mini-arch/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250402114224.293392-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

usbnet:fix NPE during rx_complete

Missing usbnet_going_away Check in Critical Path.
The usb_submit_urb function lacks a usbnet_going_away
validation, whereas __usbnet_queue_skb includes this check.

This inconsistency creates a race condition where:
A URB request may succeed, but the corresponding SKB data
fails to be queued.

Subsequent processes:
(e.g., rx_complete → defer_bh → __skb_unlink(skb, list))
attempt to access skb->next, triggering a NULL pointer
dereference (Kernel Panic).

Fixes: 04e906839a05 ("usbnet: fix cyclical race on disconnect with work queue")
Cc: stable@vger.kernel.org
Signed-off-by: Ying Lu <luying1@xiaomi.com>
Link: https://patch.msgid.link/4c9ef2efaa07eb7f9a5042b74348a67e5a3a7aea.1743584159.git.luying1@xiaomi.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: octeontx2: Handle XDP_ABORTED and XDP invalid as XDP_DROP

In the current implementation octeontx2 manages XDP_ABORTED and XDP
invalid as XDP_PASS forwarding the skb to the networking stack.
Align the behaviour to other XDP drivers handling XDP_ABORTED and XDP
invalid as XDP_DROP.
Please note this patch has just compile tested.

Fixes: 06059a1a9a4a5 ("octeontx2-pf: Add XDP support to netdev PF")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250401-octeontx2-xdp-abort-fix-v1-1-f0587c35a0b9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'x86-urgent-2025-04-04' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

- Fix a performance regression on AMD iGPU and dGPU drivers, related to
   the unintended activation of DMA bounce buffers that regressed game
   performance if KASLR disturbed things just enough

- Fix a copy_user_generic() performance regression on certain older
   non-FSRM/ERMS CPUs

- Fix a Clang build warning due to a semantic merge conflict the Kunit
   tree generated with the x86 tree

- Fix FRED related system hang during S4 resume

- Remove an unused API

* tag 'x86-urgent-2025-04-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fred: Fix system hang during S4 resume with FRED enabled
  x86/platform/iosf_mbi: Remove unused iosf_mbi_unregister_pmic_bus_access_notifier()
  x86/mm/init: Handle the special case of device private pages in add_pages(), to not increase max_pfn and trigger dma_addressing_limited() bounce buffers
  x86/tools: Drop duplicate unlikely() definition in insn_decoder_test.c
  x86/uaccess: Improve performance by aligning writes to 8 bytes in copy_user_generic(), on non-FSRM/ERMS CPUs

Merge tag 'sound-fix-6.15-rc1' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of device-specific fixes that have been gathered since
  the previous pull:

   - A few more HD-audio quirks and fixups

   - A series of Qualcomm AudioReach fixes

   - Various small fixes for ASoC rt5665, WSA, SOF and Cirrus"

* tag 'sound-fix-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: Fix built-in mic on another ASUS VivoBook model
  ALSA: hda/realtek - Support mute led function for HP platform
  ASoC: imx-card: Add NULL check in imx_card_probe()
  ASoC: codecs: rt5665: Fix some error handling paths in rt5665_probe()
  ASoC: q6apm-dai: make use of q6apm_get_hw_pointer
  ASoC: qdsp6: q6apm-dai: fix capture pipeline overruns.
  ASoC: qdsp6: q6apm-dai: set 10 ms period and buffer alignment.
  ASoC: q6apm: add q6apm_get_hw_pointer helper
  ASoC: q6apm-dai: schedule all available frames to avoid dsp under-runs
  ASoC: SOF: hda/ptl: Move mic privacy change notification sending to a work
  ALSA/hda: intel-sdw-acpi: Remove (explicitly) unused header
  ALSA: hda/realtek: Enable Mute LED on HP OMEN 16 Laptop xd000xx
  ALSA: hda/tas2781: Upgrade calibratd-data writing code to support Alpha and Beta dsp firmware
  ASoC: qdsp6: q6asm-dai: fix q6asm_dai_compr_set_params error path
  ALSA: hda/realtek: Fix built-in mic breakage on ASUS VivoBook X515JA
  ASoC: sma1307: Fix error handling in sma1307_setting_loaded()
  ASoC: codecs: wsa884x: Correct VI sense channel mask
  ASoC: codecs: wsa883x: Correct VI sense channel mask
  firmware: cs_dsp: Ensure cs_dsp_load[_coeff]() returns 0 on success

Merge tag 'omap-for-v6.14/drivers-signed' of https://git./linux/kernel/git/khilman/linux-omap into soc/drivers-2

arm/omap: drivers: updates for v6.14

* tag 'omap-for-v6.14/drivers-signed' of https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap:
Input: tsc2007 - accept standard properties