Paul E. McKenney [Fri, 4 Feb 2022 20:45:18 +0000 (12:45 -0800)]
rcutorture: Suppress debugging grace period delays during flooding
Tree RCU supports grace-period delays using the rcutree.gp_cleanup_delay,
rcutree.gp_init_delay, and rcutree.gp_preinit_delay kernel boot
parameters. These delays are strictly for debugging purposes, and have
proven quite effective at exposing bugs involving race with CPU-hotplug
operations. However, these delays can result in false positives when
used in conjunction with callback flooding, for example, those generated
by the rcutorture.fwd_progress kernel boot parameter.
This commit therefore suppresses grace-period delays while callback
flooding is in progress.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Fri, 4 Feb 2022 01:53:22 +0000 (17:53 -0800)]
torture: Add rcu_normal and rcu_expedited runs to torture.sh
Currently, the rcupdate.rcu_normal and rcupdate.rcu_expedited kernel
boot parameters are not regularly tested. The potential addition of
polled expedited grace-period APIs increases the amount of code that is
affected by these kernel boot parameters. This commit therefore adds a
"--do-rt" argument to torture.sh to exercise these kernel-boot options.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 1 Feb 2022 15:01:20 +0000 (07:01 -0800)]
EXP rcutorture: Test polled expedited grace-period primitives
This commit adds tests of get_state_synchronize_rcu_expedited(),
start_poll_synchronize_rcu_expedited(), poll_state_synchronize_rcu_expedited(),
and cond_synchronize_rcu_expedited().
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 1 Feb 2022 00:55:52 +0000 (16:55 -0800)]
EXP rcu: Add polled expedited grace-period primitives
This is an experimental proof of concept of polled expedited grace-period
functions. These functions are get_state_synchronize_rcu_expedited(),
start_poll_synchronize_rcu_expedited(), poll_state_synchronize_rcu_expedited(),
and cond_synchronize_rcu_expedited(), which are similar to
get_state_synchronize_rcu(), start_poll_synchronize_rcu(),
poll_state_synchronize_rcu(), and cond_synchronize_rcu(), respectively.
One limitation is that start_poll_synchronize_rcu_expedited() cannot
be invoked before workqueues are initialized.
[ paulmck: Apply feedback from Neeraj Upadhyay. ]
Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/
Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
Cc: Brian Foster <bfoster@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ian Kent <raven@themaw.net>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Frederic Weisbecker [Wed, 2 Feb 2022 00:01:07 +0000 (01:01 +0100)]
EXP tick: Detect and fix jiffies update stall
On some rare cases, the timekeeper CPU may be delaying its jiffies
update duty for a while. Known causes include:
* The timekeeper is waiting on stop_machine in a MULTI_STOP_DISABLE_IRQ
or MULTI_STOP_RUN state. Disabled interrupts prevent from timekeeping
updates while waiting for the target CPU to complete its
stop_machine() callback.
* The timekeeper vcpu has VMEXIT'ed for a long while due to some overload
on the host.
Detect and fix these situations with emergency timekeeping catchups.
Original-patch-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 2 Feb 2022 17:10:04 +0000 (09:10 -0800)]
rcu: Clarify fill-the-gap comment in rcu_segcblist_advance()
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 7 Dec 2021 00:19:40 +0000 (16:19 -0800)]
EXP rcu-tasks: Check for abandoned callbacks
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 14 Feb 2022 18:42:18 +0000 (10:42 -0800)]
Merge branch 'lkmm-dev.2022.02.01b' into HEAD
lkmm-dev.2022.02.01b: LKMM development branch.
Paul E. McKenney [Mon, 14 Feb 2022 18:41:39 +0000 (10:41 -0800)]
Merge branch 'clocksource.2022.02.01b' into HEAD
clocksource.2022.02.01b: Clock-source watchdog updates.
Paul E. McKenney [Mon, 14 Feb 2022 18:40:07 +0000 (10:40 -0800)]
Merge branch 'lkmm.2022.02.01b' into HEAD
lkmm.2022.02.01b: Linux-kernel memory-model (LKMM) updates.
Paul E. McKenney [Mon, 14 Feb 2022 18:38:19 +0000 (10:38 -0800)]
Merge branches 'exp.2022.02.08a', 'fixes.2022.02.14a', 'rcu_barrier.2022.02.08a', 'rcu-tasks.2022.02.08a', 'rt.2022.02.01b', 'srcu.2022.02.08a', 'torture.2022.02.01b' and 'torturescript.2022.02.08a' into HEAD
exp.2022.02.08a: Expedited grace-period updates.
fixes.2022.02.14a: Miscellaneous fixes.
rcu_barrier.2022.02.08a: Make rcu_barrier() no longer exclude CPU hotplug.
rcu-tasks.2022.02.08a: RCU-tasks updates.
rt.2022.02.01b: Real-time-related updates.
srcu.2022.02.08a: Put SRCU on a memory diet.
torture.2022.02.01b: Torture-test updates.
torturescript.2022.02.08a: Torture-test scripting updates.
Yury Norov [Sun, 23 Jan 2022 18:38:53 +0000 (10:38 -0800)]
rcu: Replace cpumask_weight with cpumask_empty where appropriate
In some places, RCU code calls cpumask_weight() to check if any bit of a
given cpumask is set. We can do it more efficiently with cpumask_empty()
because cpumask_empty() stops traversing the cpumask as soon as it finds
first set bit, while cpumask_weight() counts all bits unconditionally.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Ingo Molnar [Sat, 15 Jan 2022 00:16:55 +0000 (16:16 -0800)]
rcu: Remove __read_mostly annotations from rcu_scheduler_active externs
Remove the __read_mostly attributes from the rcu_scheduler_active
extern declarations, because these attributes are ignored for
prototypes and we'd have to include the full <linux/cache.h> header
to gain this functionally pointless attribute defined.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Ingo Molnar [Sat, 15 Jan 2022 00:07:28 +0000 (16:07 -0800)]
rcu: Uninline multi-use function: finish_rcuwait()
This is a rarely used function, so uninlining its 3 instructions
is probably a win or a wash - but the main motivation is to
make <linux/rcuwait.h> independent of task_struct details.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 4 Jan 2022 18:34:34 +0000 (10:34 -0800)]
rcu: Mark writes to the rcu_segcblist structure's ->flags field
KCSAN reports data races between the rcu_segcblist_clear_flags() and
rcu_segcblist_set_flags() functions, though misreporting the latter
as a call to rcu_segcblist_is_enabled() from call_rcu(). This commit
converts the updates of this field to WRITE_ONCE(), relying on the
resulting unmarked reads to continue to detect buggy concurrent writes
to this field.
Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Zqiang [Sun, 26 Dec 2021 00:52:04 +0000 (08:52 +0800)]
kasan: Record work creation stack trace with interrupts enabled
Recording the work creation stack trace for KASAN reports in
call_rcu() is expensive, due to unwinding the stack, but also
due to acquiring depot_lock inside stackdepot (which may be contended).
Because calling kasan_record_aux_stack_noalloc() does not require
interrupts to already be disabled, this may unnecessarily extend
the time with interrupts disabled.
Therefore, move calling kasan_record_aux_stack() before the section
with interrupts disabled.
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Sat, 18 Dec 2021 17:30:33 +0000 (09:30 -0800)]
rcu: Inline __call_rcu() into call_rcu()
Because __call_rcu() is invoked only by call_rcu(), this commit inlines
the former into the latter.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
David Woodhouse [Wed, 8 Dec 2021 23:41:53 +0000 (23:41 +0000)]
rcu: Add mutex for rcu boost kthread spawning and affinity setting
As we handle parallel CPU bringup, we will need to take care to avoid
spawning multiple boost threads, or race conditions when setting their
affinity. Spotted by Paul McKenney.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Uladzislau Rezki (Sony) [Wed, 1 Dec 2021 09:20:53 +0000 (10:20 +0100)]
rcu: Fix description of kvfree_rcu()
The kvfree_rcu() header comment's description of the "ptr" parameter
is unclear, therefore rephrase it to make it clear that it is a pointer
to the memory to eventually be passed to kvfree().
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 1 Dec 2021 14:27:59 +0000 (06:27 -0800)]
MAINTAINERS: Add Frederic and Neeraj to their RCU files
Adding Frederic as an RCU maintainer for kernel/rcu/tree_nocb.h given his
work with offloading and de-offloading callbacks from CPUs. Also adding
Neeraj for kernel/rcu/tasks.h given his focused work on RCU Tasks Trace.
As in I am reasonably certain that each understands the full contents
of the corresponding file.
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Paul E. McKenney [Tue, 1 Feb 2022 16:23:46 +0000 (08:23 -0800)]
rcutorture: Provide non-power-of-two Tasks RCU scenarios
This commit adjusts RUDE01 to 3 CPUs and TRACE01 to 5 CPUs in order to
test Tasks RCU's ability to handle non-power-of-two numbers of CPUs.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 31 Jan 2022 23:03:36 +0000 (15:03 -0800)]
rcutorture: Test SRCU size transitions
Thie commit adds kernel boot parameters to the SRCU-N and SRCU-P
rcutorture scenarios to cause SRCU-N to test contention-based resizing
and SRCU-P to test init_srcu_struct()-time resizing. Note that this
also tests never-resizing because the contention-based resizing normally
takes some minutes to make the shift.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 27 Jan 2022 17:39:15 +0000 (09:39 -0800)]
torture: Make torture.sh help message match reality
This commit fixes a couple of typos: s/--doall/--do-all/ and
s/--doallmodconfig/--do-allmodconfig/.
[ paulmck: Add Fixes: supplied by Paul Menzel. ]
Fixes:
a115a775a8d5 ("torture: Add "make allmodconfig" to torture.sh")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 18 Jan 2022 21:54:19 +0000 (13:54 -0800)]
rcu: Allow expedited RCU grace periods on incoming CPUs
Although it is usually safe to invoke synchronize_rcu_expedited() from a
preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
invoke a workqueue handler will hang due to RCU waiting on a CPU that
the scheduler is not paying attention to. This commit therefore expands
use of the existing workqueue-independent synchronize_rcu_expedited()
from early boot to also include CPUs that are being hotplugged.
Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 31 Jan 2022 21:27:15 +0000 (13:27 -0800)]
srcu: Add contention check to call_srcu() srcu_data ->lock acquisition
This commit increases the sensitivity of contention detection by adding
checks to the acquisition of the srcu_data structure's lock on the
call_srcu() code path.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 31 Jan 2022 19:21:30 +0000 (11:21 -0800)]
srcu: Automatically determine size-transition strategy at boot
This commit adds a srcutree.convert_to_big option of zero that causes
SRCU to decide at boot whether to wait for contention (small systems) or
immediately expand to large (large systems). A new srcutree.big_cpu_lim
(defaulting to 128) defines how many CPUs constitute a large system.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Jiapeng Chong [Sat, 29 Jan 2022 03:45:02 +0000 (11:45 +0800)]
srcu: Make srcu_size_state_name static
This symbol is not used outside of srcutree.c, so this commit marks it static.
Doing so fixes the following sparse warning:
kernel/rcu/srcutree.c:1426:12: warning: symbol 'srcu_size_state_name'
was not declared. Should it be static?
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Fri, 28 Jan 2022 04:32:05 +0000 (20:32 -0800)]
srcu: Add contention-triggered addition of srcu_node tree
This commit instruments the acquisitions of the srcu_struct structure's
->lock, enabling the initiation of a transition from SRCU_SIZE_SMALL
to SRCU_SIZE_BIG when sufficient contention is experienced. The
instrumentation counts the number of trylock failures within the confines
of a single jiffy. If that number exceeds the value specified by the
srcutree.small_contention_lim kernel boot parameter (which defaults to
100), and if the value specified by the srcutree.convert_to_big kernel
boot parameter has the 0x10 bit set (defaults to 0), then a transition
will be automatically initiated.
By default, there will never be any transitions, so that none of the
srcu_struct structures ever gains an srcu_node array.
The useful values for srcutree.convert_to_big are:
0x00: Decide conversion approach at boot given system size.
0x01: Never convert.
0x02: Always convert at init_srcu_struct() time.
0x03: Convert when rcutorture prints its first round of statistics.
0x11: Convert if contention is encountered.
0x12: Convert if contention is encountered or when rcutorture prints
its first round of statistics, whichever comes first.
The value 0x12 acts the same as 0x02 because the conversion happens
before there is any chance of contention.
[ paulmck: Apply "static" feedback from kernel test robot. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 27 Jan 2022 22:56:39 +0000 (14:56 -0800)]
srcu: Create concurrency-safe helper for initiating size transition
Once there are contention-initiated size transitions, it will be
possible for rcutorture to initiate a transition at the same time
as a contention-initiated transition. This commit therefore creates
a concurrency-safe helper function named srcu_transition_to_big() to
safely initiate size transitions.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 27 Jan 2022 21:47:42 +0000 (13:47 -0800)]
srcu: Explain srcu_funnel_gp_start() call to list_add() is safe
This commit adds a comment explaining why an unprotected call to
list_add() from srcu_funnel_gp_start() can be safe. TL;DR: It is only
called during very early boot when we don't have no steeking concurrency!
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 27 Jan 2022 21:20:49 +0000 (13:20 -0800)]
srcu: Prevent cleanup_srcu_struct() from freeing non-dynamic ->sda
When an srcu_struct structure is created (but not in a kernel module)
by DEFINE_SRCU() and friends, the per-CPU srcu_data structure is
statically allocated. In all other cases, that structure is obtained
from alloc_percpu(), in which case cleanup_srcu_struct() must invoke
free_percpu() on the resulting ->sda pointer in the srcu_struct pointer.
Which it does.
Except that it also invokes free_percpu() on the ->sda pointer
referencing the statically allocated per-CPU srcu_data structures.
Which free_percpu() is surprisingly OK with.
This commit nevertheless stops cleanup_srcu_struct() from freeing
statically allocated per-CPU srcu_data structures.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 27 Jan 2022 19:43:11 +0000 (11:43 -0800)]
srcu: Avoid NULL dereference in srcu_torture_stats_print()
You really shouldn't invoke srcu_torture_stats_print() after invoking
cleanup_srcu_struct(), but there is really no reason to get a
compiler-obfuscated per-CPU-variable NULL pointer dereference as the
diagnostic. This commit therefore checks for NULL ->sda and makes a
more polite console-message complaint in that case.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 27 Jan 2022 01:03:06 +0000 (17:03 -0800)]
srcu: Use invalid initial value for srcu_node GP sequence numbers
Currently, tree SRCU relies on the srcu_node structures being initialized
at the same time that the srcu_struct itself is initialized, and thus
use the initial grace-period sequence number as the initial value for
the srcu_node structure's ->srcu_have_cbs[] and ->srcu_gp_seq_needed_exp
fields. Although this has a high probability of also working when the
srcu_node array is allocated and initialized at some random later time,
it would be better to avoid leaving such things to chance.
This commit therefore initializes these fields with 0x1, which is a
recognizable invalid value. It then adds the required checks for this
invalid value in order to avoid confusion on long-running kernels
(especially those on 32-bit systems) that allocate and initialize
srcu_node arrays late in life.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 27 Jan 2022 00:01:26 +0000 (16:01 -0800)]
srcu: Compute snp_seq earlier in srcu_funnel_gp_start()
Currently, srcu_funnel_gp_start() tests snp->srcu_have_cbs[idx] and then
separately assigns it to the snp_seq local variable. This commit does
the assignment earlier to simplify the code a bit. While in the area,
this commit also takes advantage of the 100-character line limit to put
the call to srcu_schedule_cbs_sdp() on a single line.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Alexander Aring [Wed, 26 Jan 2022 15:03:54 +0000 (10:03 -0500)]
srcu: Use export for srcu_struct defined by DEFINE_STATIC_SRCU()
If an srcu_struct structure defined by tree SRCU's DEFINE_STATIC_SRCU()
is used by a module, sparse will give the following diagnostic:
sparse: symbol '__srcu_struct_nodes_srcu' was not declared. Should it be static?
The problem is that a within-module DEFINE_STATIC_SRCU() must define
a non-static srcu_struct because it is exported by referencing it in a
special '__section("___srcu_struct_ptrs")'. This reference is needed
so that module load and unloading can invoke init_srcu_struct() and
cleanup_srcu_struct(), respectively. Unfortunately, sparse is unaware of
'__section("___srcu_struct_ptrs")', resulting in the above false-positive
diagnostic. To avoid this false positive, this commit therefore creates
a prototype of the srcu_struct with an "extern" keyword.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 25 Jan 2022 23:41:10 +0000 (15:41 -0800)]
srcu: Add boot-time control over srcu_node array allocation
This commit adds an srcu_tree.convert_to_big kernel parameter that either
refuses to convert at all (0), converts immediately at init_srcu_struct()
time (1), or lets rcutorture convert it (2). An addition contention-based
dynamic conversion choice will be added, along with documentation.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 25 Jan 2022 01:05:51 +0000 (17:05 -0800)]
srcu: Make rcutorture dump the SRCU size state
This commit adds the numeric and string version of ->srcu_size_state to
the Tree-SRCU-specific portion of the rcutorture output.
[ paulmck: Apply feedback from kernel test robot and Dan Carpenter. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 24 Jan 2022 23:41:32 +0000 (15:41 -0800)]
srcu: Add size-state transitioning code
This is just dead code at the moment, but it serves to prevent
spurious compiler warnings about init_srcu_struct_nodes() being unused.
This function will once again be used once the state-transition code
is activated.
Because srcu_barrier() must be aware of transition before call_srcu(), the
state machine waits for an SRCU grace period before callbacks are queued
to the non-CPU-0 queues. This requres that portions of srcu_barrier()
be enclosed in an SRCU read-side critical section.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 24 Jan 2022 17:46:57 +0000 (09:46 -0800)]
srcu: Make Tree SRCU able to operate without snp_node array
This commit makes Tree SRCU able to operate without an snp_node
array, that is, when the srcu_data structures' ->mynode pointers
are NULL. This can result in high contention on the srcu_struct
structure's ->lock, but only when there are lots of call_srcu(),
synchronize_srcu(), and synchronize_srcu_expedited() calls.
Note that when there is no snp_node array, all SRCU callbacks use
CPU 0's callback queue. This is optimal in the common case of low
update-side load because it removes the need to search each CPU
for the single callback that made the grace period happen.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 3 Feb 2022 00:34:40 +0000 (16:34 -0800)]
rcu-tasks: Set ->percpu_enqueue_shift to zero upon contention
Currently, call_rcu_tasks_generic() sets ->percpu_enqueue_shift to
order_base_2(nr_cpu_ids) upon encountering sufficient contention.
This does not shift to use of non-CPU-0 callback queues as intended, but
rather continues using only CPU 0's queue. Although this does provide
some decrease in contention due to spreading work over multiple locks,
it is not the dramatic decrease that was intended.
This commit therefore makes call_rcu_tasks_generic() set
->percpu_enqueue_shift to 0.
Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 2 Feb 2022 23:42:36 +0000 (15:42 -0800)]
rcu-tasks: Use order_base_2() instead of ilog2()
The ilog2() function can be used to generate a shift count, but it will
generate the same count for a power of two as for one greater than a power
of two. This results in shift counts that are larger than necessary for
systems with a power-of-two number of CPUs because the CPUs are numbered
from zero, so that the maximum CPU number is one less than that power
of two.
This commit therefore substitutes order_base_2(), which appears to have
been designed for exactly this use case.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Fri, 10 Dec 2021 21:44:17 +0000 (13:44 -0800)]
rcu: Create and use an rcu_rdp_cpu_online()
The pattern "rdp->grpmask & rcu_rnp_online_cpus(rnp)" occurs frequently
in RCU code in order to determine whether rdp->cpu is online from an
RCU perspective. This commit therefore creates an rcu_rdp_cpu_online()
function to replace it.
[ paulmck: Apply kernel test robot unused-variable feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 14 Dec 2021 21:35:17 +0000 (13:35 -0800)]
rcu: Make rcu_barrier() no longer block CPU-hotplug operations
This commit removes the cpus_read_lock() and cpus_read_unlock() calls
from rcu_barrier(), thus allowing CPUs to come and go during the course
of rcu_barrier() execution. Posting of the ->barrier_head callbacks does
synchronize with portions of RCU's CPU-hotplug notifiers, but these locks
are held for short time periods on both sides. Thus, full CPU-hotplug
operations could both start and finish during the execution of a given
rcu_barrier() invocation.
Additional synchronization is provided by a global ->barrier_lock.
Since the ->barrier_lock is only used during rcu_barrier() execution and
during onlining/offlining a CPU, the contention for this lock should
be low. It might be tempting to make use of a per-CPU lock just on
general principles, but straightforward attempts to do this have the
problems shown below.
Initial state: 3 CPUs present, CPU 0 and CPU1 do not have
any callback and CPU2 has callbacks.
1. CPU0 calls rcu_barrier().
2. CPU1 starts offlining for CPU2. CPU1 calls
rcutree_migrate_callbacks(). rcu_barrier_entrain() is called
from rcutree_migrate_callbacks(), with CPU2's rdp->barrier_lock.
It does not entrain ->barrier_head for CPU2, as rcu_barrier()
on CPU0 hasn't started the barrier sequence (by calling
rcu_seq_start(&rcu_state.barrier_sequence)) yet.
3. CPU0 starts new barrier sequence. It iterates over
CPU0 and CPU1, after acquiring their per-cpu ->barrier_lock
and finds 0 segcblist length. It updates ->barrier_seq_snap
for CPU0 and CPU1 and continues loop iteration to CPU2.
for_each_possible_cpu(cpu) {
raw_spin_lock_irqsave(&rdp->barrier_lock, flags);
if (!rcu_segcblist_n_cbs(&rdp->cblist)) {
WRITE_ONCE(rdp->barrier_seq_snap, gseq);
raw_spin_unlock_irqrestore(&rdp->barrier_lock, flags);
rcu_barrier_trace(TPS("NQ"), cpu, rcu_state.barrier_sequence);
continue;
}
4. rcutree_migrate_callbacks() completes execution on CPU1.
Segcblist len for CPU2 becomes 0.
5. The loop iteration on CPU0, checks rcu_segcblist_n_cbs(&rdp->cblist)
for CPU2 and completes the loop iteration after setting
->barrier_seq_snap.
6. As there isn't any ->barrier_head callback entrained; at
this point, rcu_barrier() in CPU0 returns.
7. The callbacks, which migrated from CPU2 to CPU1, execute.
Straightforward per-CPU locking is also subject to the following race
condition noted by Boqun Feng:
1. CPU0 calls rcu_barrier(), starting a new barrier sequence by invoking
rcu_seq_start() and init_completion(), but does not yet initialize
rcu_state.barrier_cpu_count.
2. CPU1 starts offlining for CPU2, calling rcutree_migrate_callbacks(),
which in turn calls rcu_barrier_entrain() holding CPU2's.
rdp->barrier_lock. It then entrains ->barrier_head for CPU2
and atomically increments rcu_state.barrier_cpu_count, which is
unfortunately not yet initialized to the value 2.
3. The just-entrained RCU callback is invoked. It atomically
decrements rcu_state.barrier_cpu_count and sees that it is
now zero. This callback therefore invokes complete().
4. CPU0 continues executing rcu_barrier(), but is not blocked
by its call to wait_for_completion(). This results in rcu_barrier()
returning before all pre-existing callbacks have been invoked,
which is a bug.
Therefore, synchronization is provided by rcu_state.barrier_lock,
which is also held across the initialization sequence, especially the
rcu_seq_start() and the atomic_set() that sets rcu_state.barrier_cpu_count
to the value 2. In addition, this lock is held when entraining the
rcu_barrier() callback, when deciding whether or not a CPU has callbacks
that rcu_barrier() must wait on, when setting the ->qsmaskinitnext for
incoming CPUs, and when migrating callbacks from a CPU that is going
offline.
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 14 Dec 2021 21:15:18 +0000 (13:15 -0800)]
rcu: Rework rcu_barrier() and callback-migration logic
This commit reworks rcu_barrier() and callback-migration logic to
permit allowing rcu_barrier() to run concurrently with CPU-hotplug
operations. The key trick is for callback migration to check to see if
an rcu_barrier() is in flight, and, if so, enqueue the ->barrier_head
callback on its behalf.
This commit adds synchronization with RCU's CPU-hotplug notifiers. Taken
together, this will permit a later commit to remove the cpus_read_lock()
and cpus_read_unlock() calls from rcu_barrier().
[ paulmck: Updated per kbuild test robot feedback. ]
[ paulmck: Updated per reviews session with Neeraj, Frederic, Uladzislau, and Boqun. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Sat, 11 Dec 2021 00:25:20 +0000 (16:25 -0800)]
rcu: Refactor rcu_barrier() empty-list handling
This commit saves a few lines by checking first for an empty callback
list. If the callback list is empty, then that CPU is taken care of,
regardless of its online or nocb state. Also simplify tracing accordingly
and fold a few lines together.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
David Woodhouse [Tue, 16 Feb 2021 15:04:34 +0000 (15:04 +0000)]
rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion
If we allow architectures to bring APs online in parallel, then we end
up requiring rcu_cpu_starting() to be reentrant. But currently, the
manipulation of rnp->ofl_seq is not thread-safe.
However, rnp->ofl_seq is also fairly much pointless anyway since both
rcu_cpu_starting() and rcu_report_dead() hold rcu_state.ofl_lock for
fairly much the whole time that rnp->ofl_seq is set to an odd number
to indicate that an operation is in progress.
So drop rnp->ofl_seq completely, and use only rcu_state.ofl_lock.
This has a couple of minor complexities: lockdep will complain when we
take rcu_state.ofl_lock, and currently accepts the 'excuse' of having
an odd value in rnp->ofl_seq. So switch it to an arch_spinlock_t to
avoid that false positive complaint. Since we're killing rnp->ofl_seq
of course that 'excuse' has to be changed too, so make it check for
arch_spin_is_locked(rcu_state.ofl_lock).
There's no arch_spin_lock_irqsave() so we have to manually save and
restore local interrupts around the locking.
At Paul's request based on Neeraj's analysis, make rcu_gp_init not just
wait but *exclude* any CPU online/offline activity, which was fairly
much true already by virtue of it holding rcu_state.ofl_lock.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Sat, 22 Jan 2022 00:13:52 +0000 (16:13 -0800)]
srcu: Dynamically allocate srcu_node array
This commit shrinks the srcu_struct structure by converting its ->node
field from a fixed-size compile-time array to a pointer to a dynamically
allocated array. In kernels built with large values of NR_CPUS that boot
on systems with smaller numbers of CPUs, this can save significant memory.
[ paulmck: Apply kernel test robot feedback. ]
Reported-by: A cast of thousands
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Waiman Long [Mon, 6 Dec 2021 03:38:15 +0000 (22:38 -0500)]
clocksource: Add a Kconfig option for WATCHDOG_MAX_SKEW
A watchdog maximum skew of 100us may still be too small for
some systems or archs. It may also be too small when some kernel
debug config options are enabled. So add a new Kconfig option
CLOCKSOURCE_WATCHDOG_MAX_SKEW_US to allow kernel builders to have more
control on the threshold for marking clocksource as unstable.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 25 Jun 2019 05:30:32 +0000 (22:30 -0700)]
tools/memory-model: Use "-unroll 0" to keep --hw runs finite
Litmus tests involving atomic operations produce LL/SC loops on a number
of architectures, and unrolling these loops can result in excessive
verification times or even stack overflows. This commit therefore uses
the "-unroll 0" herd7 argument to avoid unrolling, on the grounds that
additional passes through an LL/SC loop should not change the verification.
Note however, that certain bugs in the mapping of the LL/SC loop to
machine instructions may go undetected. On the other hand, herd7 might
not be the best vehicle for finding such bugs in any case. (You do
stress-test your architecture-specific code, don't you?)
Suggested-by: Luc Maranget <luc.maranget@inria.fr>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 6 Jun 2019 09:13:27 +0000 (02:13 -0700)]
tools/memory-model: Make judgelitmus.sh handle scripted Result: tag
The scripts that generate the litmus tests in the "auto" directory of
the https://github.com/paulmckrcu/litmus archive place the "Result:"
tag into a single-line ocaml comment, which judgelitmus.sh currently
does not recognize. This commit therefore makes judgelitmus.sh
recognize both the multiline comment format that it currently does
and the automatically generated single-line format.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Fri, 3 May 2019 14:34:20 +0000 (07:34 -0700)]
tools/memory-model: Add data-race capabilities to judgelitmus.sh
This commit adds functionality to judgelitmus.sh to allow it to handle
both the "DATARACE" markers in the "Result:" comments in litmus tests
and the "Flag data-race" markers in LKMM output. For C-language tests,
if either marker is present, the other must also be as well, at least for
litmus tests having a "Result:" comment. If the LKMM output indicates
a data race, then failures of the Always/Sometimes/Never portion of the
"Result:" prediction are forgiven.
The reason for forgiving "Result:" mispredictions is that data races can
result in "interesting" compiler optimizations, so that all bets are off
in the data-race case.
[ paulmck: Apply Akira Yokosawa feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 2 May 2019 17:05:14 +0000 (10:05 -0700)]
tools/memory-model: Add checktheselitmus.sh to run specified litmus tests
This commit adds a checktheselitmus.sh script that runs the litmus tests
specified on the command line. This is useful for verifying fixes to
specific litmus tests.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 2 May 2019 17:03:29 +0000 (10:03 -0700)]
tools/memory-model: Repair parseargs.sh header comment
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 2 May 2019 16:51:57 +0000 (09:51 -0700)]
tools/memory-model: Add "--" to parseargs.sh for additional arguments
Currently, parseargs.sh expects to consume all the command-line arguments,
which prevents the calling script from having any of its own arguments.
This commit therefore causes parseargs.sh to stop consuming arguments
when it encounters a "--" argument, leaving any remaining arguments for
the calling script.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 8 Apr 2019 17:02:23 +0000 (10:02 -0700)]
tools/memory-model: Make history-check scripts use mselect7
The history-check scripts currently use grep to ignore non-C-language
litmus tests, which is a bit fragile. This commit therefore enlists the
aid of "mselect7 -arch C", given Luc Maraget's recent modifications that
allow mselect7 to operate in filter mode.
This change requires herdtools
7.52-32-g1da3e0e50977 or later.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 8 Apr 2019 16:27:28 +0000 (09:27 -0700)]
tools/memory-model: Make checkghlitmus.sh use mselect7
The checkghlitmus.sh script currently uses grep to ignore non-C-language
litmus tests, which is a bit fragile. This commit therefore enlists the
aid of "mselect7 -arch C", given Luc Maraget's recent modifications that
allow mselect7 to operate in filter mode.
This change requires herdtools
7.52-32-g1da3e0e50977 or later.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 27 Mar 2019 18:47:14 +0000 (11:47 -0700)]
tools/memory-model: Fix scripting --jobs argument
The parseargs.sh regular expression for the --jobs argument incorrectly
requires that the number of jobs be at least 10, that is, have at least
two digits. This commit therefore adjusts this regular expression to
allow single-digit numbers of jobs to be specified.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Sat, 23 Mar 2019 00:18:43 +0000 (17:18 -0700)]
tools/memory-model: Implement --hw support for checkghlitmus.sh
This commits enables the "--hw" argument for the checkghlitmus.sh script,
causing it to convert any applicable C-language litmus tests to the
specified flavor of assembly language, to verify these assembly-language
litmus tests, and checking compatibility of the outcomes.
Note that the conversion does not yet handle locking, RCU, SRCU, plain
C-language memory accesses, or casts.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Fri, 5 Apr 2019 19:34:56 +0000 (12:34 -0700)]
tools/memory-model: Add -v flag to jingle7 runs
Adding the -v flag to jingle7 invocations gives much useful information
on why jingle7 didn't like a given litmus test. This commit therefore
adds this flag and saves off any such information into a .err file.
Suggested-by: Luc Maranget <luc.maranget@inria.fr>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 26 Mar 2019 00:20:51 +0000 (17:20 -0700)]
tools/memory-model: Make runlitmus.sh check for jingle errors
It turns out that the jingle7 tool is currently a bit picky about
the litmus tests it is willing to process. This commit therefore
ensures that jingle7 failures are reported.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Fri, 22 Mar 2019 15:57:20 +0000 (08:57 -0700)]
tools/memory-model: Allow herd to deduce CPU type
Currently, the scripts specify the CPU's .cat file to herd. But this is
pointless because herd will select a good and sufficient .cat file from
the assembly-language litmus test itself. This commit therefore removes
the -model argument to herd, allowing herd to figure the CPU family out
itself.
Note that the user can override herd's choice using the "--herdopts"
argument to the scripts.
Suggested-by: Luc Maranget <luc.maranget@inria.fr>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 21 Mar 2019 21:44:09 +0000 (14:44 -0700)]
tools/memory-model: Keep assembly-language litmus tests
This commit retains the assembly-language litmus tests generated from
the C-language litmus tests, appending the hardware tag to the original
C-language litmus test's filename. Thus, S+poonceonces.litmus.AArch64
contains the Armv8 assembly language corresponding to the C-language
S+poonceonces.litmus test.
This commit also updates the .gitignore to avoid committing these
automatically generated assembly-language litmus tests.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 21 Mar 2019 21:06:27 +0000 (14:06 -0700)]
tools/memory-model: Move from .AArch64.litmus.out to .litmus.AArch.out
When the github scripts see ".litmus.out", they assume that there must be
a corresponding C-language ".litmus" file. Won't they be disappointed
when they instead see nothing, or, worse yet, the corresponding
assembly-language litmus test? This commit therefore swaps the hardware
tag with the "litmus" to avoid this sort of disappointment.
This commit also adjusts the .gitignore file so as to avoid adding these
new ".out" files to git.
[ paulmck: Apply Akira Yokosawa feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 20 Mar 2019 23:41:41 +0000 (16:41 -0700)]
tools/memory-model: Make runlitmus.sh generate .litmus.out for --hw
In the absence of "Result:" comments, the runlitmus.sh script relies on
litmus.out files from prior LKMM runs. This can be a bit user-hostile,
so this commit makes runlitmus.sh generate any needed .litmus.out files
that don't already exist.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 20 Mar 2019 21:57:56 +0000 (14:57 -0700)]
tools/memory-model: Split runlitmus.sh out of checklitmus.sh
This commit prepares for adding --hw capability to github litmus-test
scripts by splitting runlitmus.sh (which simply runs the verification)
out of checklitmus.sh (which also judges the results).
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 20 Mar 2019 21:37:46 +0000 (14:37 -0700)]
tools/memory-model: Make judgelitmus.sh ransack .litmus.out files
The judgelitmus.sh script currently relies solely on the "Result:"
comment in the .litmus file. This is problematic when using the --hw
argument, because it is necessary to check the hardware model against
LKMM even in the absence of "Result:" comments.
This commit therefore modifies judgelitmus.sh to check the observation
in a .litmus.out file, in case one was generated by a previous LKMM run.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 20 Mar 2019 19:39:27 +0000 (12:39 -0700)]
tools/memory-model: Hardware checking for check{,all}litmus.sh
This commit makes checklitmus.sh and checkalllitmus.sh check to see
if a hardware verification was specified (via the --hw command-line
argument, which sets the LKMM_HW_MAP_FILE environment variable).
If so, the C-language litmus test is converted to the specified type
of assembly-language litmus test and herd is run on it. Hardware is
permitted to be stronger than LKMM requires, so "Always" and "Never"
verifications of "Sometimes" C-language litmus tests are forgiven.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 19 Mar 2019 23:37:01 +0000 (16:37 -0700)]
tools/memory-model: Fix checkalllitmus.sh comment
The checkalllitmus.sh runs litmus tests in the litmus-tests directory,
not those in the github archive, so this commit updates the comment to
reflect this reality.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 19 Mar 2019 23:21:09 +0000 (16:21 -0700)]
tools/memory-model: Add simpletest.sh to check locking, RCU, and SRCU
This commit abstracts out common function to check a given litmus test
for locking, RCU, and SRCU in order to avoid duplicating code.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 19 Mar 2019 21:39:10 +0000 (14:39 -0700)]
tools/memory-model: Make judgelitmus.sh handle hardware verifications
This commit makes the judgelitmus.sh script check the --hw argument
(AKA the LKMM_HW_MAP_FILE environment variable) and to adjust its
judgment for a run where a C-language litmus test has been translated to
assembly and the assembly version verified. In this case, the assembly
verification output is checked against the C-language script's "Result:"
comment. However, because hardware can be stronger than LKMM requires,
the judgelitmus.sh script forgives verification mismatches featuring
a "Sometimes" in the C-language script and an "Always" or "Never"
assembly-language verification.
Note that deadlock is not forgiven, however, this should not normally be
an issue given that C-language tests containing locking, RCU, or SRCU
cannot be translated to assembly. However, this issue can crop up in
litmus tests that mimic deadlock by using the "filter" clause to ignore
all executions. It can also crop up when certain herd arguments are
used to autofilter everything that does not match the "exists" clause
in cases where the "exists" clause cannot be satisfied.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 19 Mar 2019 22:59:26 +0000 (15:59 -0700)]
tools/memory-model: Update parseargs.sh for hardware verification
This commit adds a --hw argument to parseargs.sh to specify the CPU
family for a hardware verification. For example, "--hw AArch64" will
specify that a C-language litmus test is to be translated to ARMv8 and
the result verified. This will set the LKMM_HW_MAP_FILE environment
variable accordingly. If there is no --hw argument, this environment
variable will be set to the empty string.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 11 Apr 2019 14:33:18 +0000 (07:33 -0700)]
tools/memory-model: Fix paulmck email address on pre-existing scripts
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 19 Mar 2019 21:27:06 +0000 (14:27 -0700)]
tools/memory-model: Make judgelitmus.sh detect hard deadlocks
If a litmus test specifies "Result: Never" and if it contains an
unconditional ("hard") deadlock, then running checklitmus.sh on it will
not flag any errors, despite the fact that there are no executions.
This commit therefore updates judgelitmus.sh to complain about tests
with no executions that are marked, but not as "Result: DEADLOCK".
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 18 Mar 2019 20:40:57 +0000 (13:40 -0700)]
tools/memory-model: Make judgelitmus.sh identify bad macros
Currently, judgelitmus.sh treats use of unknown primitives (such as
srcu_read_lock() prior to SRCU support) as "!!! Verification error".
This can be misleading because it fails to call out typos and running
a version LKMM on a litmus test requiring a feature not provided by
that version. This commit therefore changes judgelitmus.sh to check
for unknown primitives and to report them, for example, with:
'!!! Current LKMM version does not know "rcu_write_lock"'.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 18 Mar 2019 20:07:46 +0000 (13:07 -0700)]
tools/memory-model: Make cmplitmushist.sh note timeouts
Currently, cmplitmushist.sh treats timeouts (as in the "--timeout"
argument) as "Missing Observation line". This can be misleading because
it is quite possible that running the test longer would have produced
a verification. This commit therefore changes cmplitmushist.sh to check
for timeouts and to report them with "Timed out".
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 18 Mar 2019 18:53:50 +0000 (11:53 -0700)]
tools/memory-model: Make judgelitmus.sh note timeouts
Currently, judgelitmus.sh treats timeouts (as in the "--timeout" argument)
as "!!! Verification error". This can be misleading because it is quite
possible that running the test longer would have produced a verification.
This commit therefore changes judgelitmus.sh to check for timeouts and
to report them with "!!! Timeout".
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Fri, 14 Aug 2020 23:14:34 +0000 (16:14 -0700)]
tools/memory-model: Document locking corner cases
Most Linux-kernel uses of locking are straightforward, but there are
corner-case uses that rely on less well-known aspects of the lock and
unlock primitives. This commit therefore adds a locking.txt and litmus
tests in Documentation/litmus-tests/locking to explain these corner-case
uses.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Alan Stern [Tue, 1 Feb 2022 19:00:08 +0000 (14:00 -0500)]
tools/memory-model: Explain syntactic and semantic dependencies
Paul Heidekrüger pointed out that the Linux Kernel Memory Model
documentation doesn't mention the distinction between syntactic and
semantic dependencies. This is an important difference, because the
compiler can easily break dependencies that are only syntactic, not
semantic.
This patch adds a few paragraphs to the LKMM documentation explaining
these issues and illustrating how they can matter.
Suggested-by: Paul Heidekrüger <paul.heidekrueger@in.tum.de>
Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 26 Jan 2022 05:08:55 +0000 (21:08 -0800)]
torture: Change KVM environment variable to RCUTORTURE
The torture-test scripting's long-standing use of KVM as the environment
variable tracking the pathname of the rcutorture directory now conflicts
with allmodconfig builds due to the virt/kvm/Makefile.kvm file's use
of this as a makefile variable. This commit therefore changes the
torture-test scripting from KVM to RCUTORTURE, avoiding the name conflict.
Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 18 Jan 2022 23:40:49 +0000 (15:40 -0800)]
torture: Make kvm-find-errors.sh notice missing vmlinux file
Currently, an obtuse compiler diagnostic can fool kvm-find-errors.sh
into believing that the build was successful. This commit therefore
adds a check for a missing vmlinux file. Note that in the case of
repeated torture-test scenarios ("--configs '2*TREE01'"), the vmlinux
file will only be present in the first directory, that is, in TREE01
but not TREE01.2.
Link: https://lore.kernel.org/lkml/36bd91e4-8eda-5677-7fde-40295932a640@molgen.mpg.de/
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 28 Dec 2021 05:21:35 +0000 (21:21 -0800)]
torture: Print only one summary line per run
The torture.sh scripts currently duplicates the summary lines, getting
one during the run phase and one during the summary phase of each run.
This commit therefore removes the run phase from consideration so as to
get only one summary line per run.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 21 Dec 2021 04:24:25 +0000 (20:24 -0800)]
torture: Make kvm-remote.sh try multiple times to download tarball
This commit ups the retries for downloading the build-product tarball
to a given remote system from once to five times, the better to handle
transient network failures.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Sat, 18 Dec 2021 00:14:31 +0000 (16:14 -0800)]
torture: Compress KCSAN as well as KASAN vmlinux files
Compressing KASAN vmlinux files reduces torture.sh res file size from
about 100G to about 50G, which is good, but the KCSAN vmlinux files
are also large. Compressing them reduces their size from about 700M to
about 100M (but of course your mileage may vary). This commit therefore
compresses both KASAN and KCSAN vmlinux files.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 6 Dec 2021 17:13:37 +0000 (09:13 -0800)]
torture: Indicate which torture.sh runs' bugs are all KCSAN reports
This commit further improves torture.sh run summaries by indicating
which runs' "Bugs:" counts are all KCSAN reports, and further printing
an additional end-of-run summary line when all errors reported in all
runs were KCSAN reports.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Sun, 5 Dec 2021 05:00:55 +0000 (21:00 -0800)]
torture: Make kvm.sh summaries note runs having only KCSAN reports
Runs having only KCSAN reports will normally print a summary line
containing only a "Bugs:" entry. However, these bugs might or might
not be KCSAN reports. This commit therefore flags runs in which all the
"Bugs:" entries are KCSAN reports.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Sat, 4 Dec 2021 21:53:24 +0000 (13:53 -0800)]
torture: Output per-failed-run summary lines from torture.sh
Currently, torture.sh lists the failed runs, but it is up to the user
to work out what failed. This is especially annoying for KCSAN runs,
where RCU's tighter definitions result in failures being reported for
other parts of the kernel. This commit therefore outputs "Summary:"
lines for each failed run, allowing the user to more quickly identify
which failed runs need focused attention.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 2 Dec 2021 19:24:05 +0000 (11:24 -0800)]
torture: Allow four-digit repetition numbers for --configs parameter
In a clear-cut case of "not thinking big enough", kvm.sh limits the
multipliers for torture-test scenarios to three digits. Although this is
large enough for any single system that I have ever run rcutorture on,
it does become a problem when you want to use kvm-remote.sh to run as
many instances of TREE09 as fit on a set of 20 systems with 80 CPUs each.
Yes, one could simply say "--configs '800*TREE09 800*TREE09'", but this
commit removes the need for that sort of hacky workaround by permitting
four-digit repetition numbers, thus allowing "--configs '1600*TREE09'".
Five-digit repetition numbers remain off the menu. Should they ever
really be needed, they can easily be added!
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 2 Dec 2021 03:19:13 +0000 (19:19 -0800)]
torture: Drop trailing ^M from console output
Console logs can sometimes have trailing control-M characters, which the
forward-progress evaluation code in kvm-recheck-rcu.sh passes through
to the user output. Which does not cause a technical problem, but which
can look ugly. This commit therefore strips the control-M characters.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Fri, 28 Jan 2022 04:29:10 +0000 (20:29 -0800)]
rcutorture: Enable limited callback-flooding tests of SRCU
This commit allows up to 50,000 callbacks worth of callback-flooding
tests of SRCU. The goal of this change is to exercise Tree SRCU's
ability to transition from SRCU_SIZE_SMALL to SRCU_SIZE_BIG triggered
by callback-queue-time lock contention.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 3 Jan 2022 14:07:09 +0000 (06:07 -0800)]
torture: Wake up kthreads after storing task_struct pointer
Currently, _torture_create_kthread() uses kthread_run() to create
torture-test kthreads, which means that the resulting task_struct
pointer is stored after the newly created kthread has been marked
runnable. This in turn can cause spurious failure of checks for
code being run by a particular kthread. This commit therefore changes
_torture_create_kthread() to use kthread_create(), then to do an explicit
wake_up_process() after the task_struct pointer has been stored.
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 28 Dec 2021 23:59:38 +0000 (15:59 -0800)]
rcutorture: Fix rcu_fwd_mutex deadlock
The rcu_torture_fwd_cb_hist() function acquires rcu_fwd_mutex, but is
invoked from rcutorture_oom_notify() function, which hold this same
mutex across this call. This commit fixes the resulting deadlock.
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Fri, 17 Dec 2021 23:05:05 +0000 (15:05 -0800)]
rcutorture: Add end-of-test check to rcu_torture_fwd_prog() loop
The second and subsequent forward-progress kthreads loop waiting for
the first forward-progress kthread to start the next test interval.
Unfortunately, if the test ends while one of those kthreads is waiting,
the test will hang. This hang occurs because that wait loop fails to
check for the end of the test. This commit therefore adds an end-of-test
check to that wait loop.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Fri, 17 Dec 2021 20:33:53 +0000 (12:33 -0800)]
rcutorture: Make rcu_fwd_cb_nodelay be a counter
Back when only one rcutorture kthread could do forward-progress testing,
it was just fine for rcu_fwd_cb_nodelay to be a non-atomic bool. It was
set at the start of forward-progress testing and cleared at the end.
But now that there are multiple threads, the value can be cleared while
one of the threads is still doing forward-progress testing. This commit
therefore makes rcu_fwd_cb_nodelay be an atomic counter, replacing the
WRITE_ONCE() operations with atomic_inc() and atomic_dec().
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 16 Dec 2021 23:36:02 +0000 (15:36 -0800)]
rcutorture: Increase visibility of forward-progress hangs
This commit adds a few pr_alert() calls to rcutorture's forward-progress
testing in order to better diagnose shutdown-time hangs.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 16 Dec 2021 20:23:31 +0000 (12:23 -0800)]
torture: Distinguish kthread stopping and being asked to stop
Right now, if a given kthread (call it "kthread") realizes that it needs
to stop, "Stopping kthread" is written to the console. When the cleanup
code decides that it is time to stop that kthread, "Stopping kthread
tasks" is written to the console. These two events might happen in
either order, especially in the case of time-based torture-test shutdown.
But it is hard to distinguish these, especially for those unfamiliar with
the torture tests. This commit therefore changes the first case from
"Stopping kthread" to "kthread is stopping" to make things more clear.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 6 Dec 2021 23:12:14 +0000 (15:12 -0800)]
rcutorture: Print message before invoking ->cb_barrier()
The various ->cb_barrier() functions, for example, rcu_barrier(),
sometimes cause rcutorture hangs. But currently, the last console
message is the unenlightening "Stopping rcu_torture_stats". This commit
therefore prints a message of the form "rcu_torture_cleanup: Invoking
rcu_barrier+0x0/0x1e0()" to help point people in the right direction.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 20 Jan 2022 21:39:33 +0000 (13:39 -0800)]
srcu: Make srcu_funnel_gp_start() cache ->mynode in snp_leaf
Currently, the srcu_funnel_gp_start() walks its local variable snp up the
tree and reloads sdp->mynode whenever it is necessary to check whether
it is still at the leaf srcu_node level. This works, but is a bit more
obtuse than absolutely necessary. In addition, upcoming commits will
dynamically size srcu_struct structures, in which case sdp->mynode will
no longer necessarily be a constant, and this commit helps prepare for
that dynamic sizing.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Thu, 20 Jan 2022 21:16:18 +0000 (13:16 -0800)]
srcu: Fix s/is/if/ typo in srcu_node comment
This commit fixed a typo in the srcu_node structure's ->srcu_have_cbs
comment. While in the area, redo a couple of comments to take advantage
of 100-character line lengths.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 12 Jan 2022 17:52:44 +0000 (09:52 -0800)]
srcu: Tighten cleanup_srcu_struct() GP checks
Currently, cleanup_srcu_struct() checks for a grace period in progress,
but it does not check for a grace period that has not yet started but
which might start at any time. Such a situation could result in a
use-after-free bug, so this commit adds a check for a grace period that
is needed but not yet started to cleanup_srcu_struct().
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Zqiang [Tue, 25 Jan 2022 02:47:44 +0000 (10:47 +0800)]
rcu: Add per-CPU rcuc task dumps to RCU CPU stall warnings
When the rcutree.use_softirq kernel boot parameter is set to zero, all
RCU_SOFTIRQ processing is carried out by the per-CPU rcuc kthreads.
If these kthreads are being starved, quiescent states will not be
reported, which in turn means that the grace period will not end, which
can in turn trigger RCU CPU stall warnings. This commit therefore dumps
stack traces of stalled CPUs' rcuc kthreads, which can help identify
what is preventing those kthreads from running.
Suggested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>