linux-2.6-block.git
Willy Tarreau [Mon, 7 Feb 2022 16:23:50 +0000 (17:23 +0100)]
tools/nolibc/unistd: add usleep()

This call is trivial to implement on top of select(), and it complements
sleep() and msleep(), so let's add it.
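A minimal sketch of the select()-based approach, written against a hosted libc for illustration; `tiny_usleep()` is a hypothetical name and this is not nolibc's actual code. With no file descriptors to watch, select() simply blocks until the timeout expires.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical usleep() replacement: sleep for `usecs` microseconds by
 * calling select() with empty fd sets, so only the timeout matters. */
int tiny_usleep(unsigned int usecs)
{
	struct timeval tv;

	tv.tv_sec  = usecs / 1000000;
	tv.tv_usec = usecs % 1000000;
	/* 0 on timeout, -1 with errno set on error (e.g. EINTR) */
	return select(0, NULL, NULL, NULL, &tv);
}
```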

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:49 +0000 (17:23 +0100)]
tools/nolibc/unistd: extract msleep(), sleep(), tcsetpgrp() to unistd.h

These functions are normally provided by unistd.h. For ease of porting,
let's create the file and move them there.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:48 +0000 (17:23 +0100)]
tools/nolibc/errno: extract errno.h from sys.h

This allows us to provide a minimal errno.h to ease porting applications
that use it.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:47 +0000 (17:23 +0100)]
tools/nolibc/string: export memset() and memmove()

"clang -Os" and "gcc -Ofast" without -ffreestanding may ignore the local
memset() and memmove() definitions, expecting their builtin equivalents
to be provided, and then fail to find them at link time. Thus we must
export these functions for these rare cases.
Note that as they're set in their own sections, they will be eliminated
by the linker if not used. In addition, they do not prevent gcc from
identifying them and replacing them with the shorter "rep movsb" or
"rep stosb" when relevant.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:46 +0000 (17:23 +0100)]
tools/nolibc/types: define PATH_MAX and MAXPATHLEN

These are often used, and applications commonly set them to fallback
values. Let's define them both and make them agree on PATH_MAX=4096 by
default, as is already the case in linux/limits.h.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:45 +0000 (17:23 +0100)]
tools/nolibc/arch: mark the _start symbol as weak

By doing so we can link together multiple C files that have been compiled
with nolibc and which each have a _start symbol.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:44 +0000 (17:23 +0100)]
tools/nolibc: move exported functions to their own section

Some functions like raise() and memcpy() are permanently exported because
they're needed by libgcc on certain platforms. However most of the time
they are not needed and needlessly take space.

Let's move them to their own sub-section, called .text.nolibc_<function>.
This allows ld to get rid of them if unused when passed --gc-sections.
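A sketch of the technique, assuming GCC or Clang on an ELF target; `tiny_memset()` and its section name are hypothetical and merely follow the `.text.nolibc_<function>` pattern described above.

```c
#include <stddef.h>

/* Placed in its own section instead of the generic .text, so that
 * "ld --gc-sections" (e.g. gcc -Wl,--gc-sections) can discard it
 * when nothing references it. Weak, so another definition may win. */
__attribute__((weak, section(".text.nolibc_tiny_memset")))
void *tiny_memset(void *dst, int b, size_t len)
{
	unsigned char *p = dst;

	while (len--)
		*p++ = (unsigned char)b;
	return dst;
}
```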

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:43 +0000 (17:23 +0100)]
tools/nolibc/string: add tiny versions of strncat() and strlcat()

While these functions are often dangerous, forcing the user to work
around their absence is often much worse. Let's provide small versions
of each of them. The respective sizes in bytes on a few architectures
are:

  strncat(): x86:0x33 mips:0x68 arm:0x3c
  strlcat(): x86:0x25 mips:0x4c arm:0x2c

The two are quite different, and strncat() even differs from strncpy()
in that it limits the amount of data it copies and always terminates
the output with a single zero, while strlcat() always limits the total
output to the specified size and adds a terminating zero if there is room.
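The difference between the two can be sketched as follows. The `tiny_` names are hypothetical, and the strlcat() return value (strlen(dst) + strlen(src), the usual BSD convention) is an assumption since the text above does not specify it.

```c
#include <stddef.h>

/* strncat(): appends at most n chars of src and always adds one
 * terminating zero; dst needs room for strlen(dst) + n + 1 bytes. */
char *tiny_strncat(char *dst, const char *src, size_t n)
{
	char *d = dst;

	while (*d)
		d++;
	while (n-- && *src)
		*d++ = *src++;
	*d = 0;
	return dst;
}

/* strlcat(): keeps the whole output (existing + appended) within size
 * bytes and writes a terminating zero only if there is room for it. */
size_t tiny_strlcat(char *dst, const char *src, size_t size)
{
	size_t dlen = 0, slen = 0;

	/* length of dst, never looking past size bytes */
	while (dlen < size && dst[dlen])
		dlen++;
	/* copy while one byte remains available for the zero */
	for (; src[slen]; slen++) {
		if (dlen + slen + 1 < size)
			dst[dlen + slen] = src[slen];
	}
	if (dlen < size)
		dst[dlen + slen < size ? dlen + slen : size - 1] = 0;
	return dlen + slen;
}
```

Callers can detect truncation with strlcat() by checking whether the return value is at least `size`.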

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:42 +0000 (17:23 +0100)]
tools/nolibc/string: add strncpy() and strlcpy()

These are minimal variants. strncpy() always writes exactly <size> chars
to the destination, while strlcpy() copies no more than <size> bytes
including the trailing zero and returns the source's length. The
respective sizes on various archs are:

  strncpy(): x86:0x1f mips:0x30 arm:0x20
  strlcpy(): x86:0x17 mips:0x34 arm:0x1a
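The semantics described above can be sketched like this; the `tiny_` names are hypothetical and the code is illustrative, not the nolibc source.

```c
#include <stddef.h>

/* strncpy(): writes exactly size bytes, padding with zeros once the end
 * of src is reached; dst is NOT terminated if src has size chars or more. */
char *tiny_strncpy(char *dst, const char *src, size_t size)
{
	size_t i;

	for (i = 0; i < size; i++) {
		dst[i] = *src;
		if (*src)        /* stay on the terminating zero once reached */
			src++;
	}
	return dst;
}

/* strlcpy(): copies at most size - 1 chars plus a terminating zero (when
 * size > 0) and returns strlen(src) so callers can detect truncation. */
size_t tiny_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len;

	for (len = 0; src[len]; len++) {
		if (len + 1 < size)
			dst[len] = src[len];
	}
	if (size)
		dst[len < size ? len : size - 1] = 0;
	return len;
}
```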

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:41 +0000 (17:23 +0100)]
tools/nolibc/string: slightly simplify memmove()

The direction test inside the loop was not always completely optimized,
resulting in a larger than necessary function. This change adds a
direction variable that is set out of the loop. Now the function is down
to 48 bytes on x86, 32 on ARM and 68 on mips. It's worth noting that other
approaches were attempted (including relying on the up and down functions)
but they were only slightly beneficial on x86 and cost more on others.
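A sketch of the "direction variable set out of the loop" idea, using a hypothetical `tiny_memmove()`; the step is stored as a `size_t` so that adding `(size_t)-1` moves the index down by one with well-defined unsigned wraparound.

```c
#include <stddef.h>

/* The copy direction is decided once, before the loop: copy from the top
 * down when dst lies above src (so overlapping source bytes are read
 * before being overwritten), and from the bottom up otherwise. */
void *tiny_memmove(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t pos = 0;
	size_t dir = 1;

	if (d > s) {
		pos = len - 1;      /* wraps when len == 0, but the loop never runs */
		dir = (size_t)-1;   /* adding SIZE_MAX steps pos down by one */
	}
	while (len--) {
		d[pos] = s[pos];
		pos += dir;
	}
	return dst;
}
```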

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:40 +0000 (17:23 +0100)]
tools/nolibc/string: use unidirectional variants for memcpy()

Until now memcpy() relied on memmove(), but since it's always included for
libgcc, this made it larger than needed. Let's implement two unidirectional
variants to copy from bottom to top and from top to bottom, and use the
former for memcpy(). The variants are optimized to be compact, and at the
same time the compiler is sometimes able to detect the loop and to replace
it with a "rep movsb". The new function is 24 bytes instead of 52 on x86_64.
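The two unidirectional variants might look like the sketch below (hypothetical names). memcpy() may assume that the buffers do not overlap, so the bottom-to-top variant alone is enough for it.

```c
#include <stddef.h>

/* Copies len bytes from the lowest address upwards ("bottom to top"). */
void *tiny_memcpy_up(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t pos;

	for (pos = 0; pos < len; pos++)
		d[pos] = s[pos];
	return dst;
}

/* Copies len bytes from the highest address downwards ("top to bottom"). */
void *tiny_memcpy_down(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len) {
		len--;
		d[len] = s[len];
	}
	return dst;
}

/* memcpy() does not have to handle overlap, so the upward copy suffices. */
void *sketch_memcpy(void *dst, const void *src, size_t len)
{
	return tiny_memcpy_up(dst, src, len);
}
```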

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:39 +0000 (17:23 +0100)]
tools/nolibc/sys: make getpgrp(), getpid(), gettid() not set errno

These syscalls never fail so there is no need to extract and set errno
for them.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:38 +0000 (17:23 +0100)]
tools/nolibc/stdlib: make raise() use the lower level syscalls only

raise() doesn't set errno, so there's no point in calling kill(); it's
better to call sys_kill() directly, which also reduces the function's size.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:37 +0000 (17:23 +0100)]
tools/nolibc/stdlib: avoid a 64-bit shift in u64toh_r()

The build of printf() on mips requires libgcc for functions __ashldi3 and
__lshrdi3 due to 64-bit shifts when scanning the input number. These are
not really needed in fact since we scan the number 4 bits at a time. Let's
arrange the loop to perform two 32-bit shifts instead on 32-bit platforms.
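A sketch of the idea with a hypothetical `tiny_u64toh_r()`: each nibble is taken from whichever 32-bit half holds it, so the variable shift is only ever applied to a 32-bit word. It assumes, as is typical, that the constant shift by 32 compiles to a plain selection of the upper word on 32-bit targets rather than a libgcc call.

```c
#include <stdint.h>

/* Writes the hex form of `in` into buffer (17 bytes are enough) and
 * returns the number of digits written, without leading zeros. */
int tiny_u64toh_r(uint64_t in, char *buffer)
{
	int pos = 60;       /* bit position of the current nibble */
	int digits = 0;
	unsigned int dig;

	do {
		/* pick the 32-bit half that contains this nibble */
		uint32_t word = (pos >= 32) ? (uint32_t)(in >> 32) : (uint32_t)in;

		dig = (word >> (pos & 31)) & 0xF;
		pos -= 4;
		/* skip leading zeros, but always emit the last digit */
		if (dig || digits || pos < 0)
			buffer[digits++] = "0123456789abcdef"[dig];
	} while (pos >= 0);

	buffer[digits] = 0;
	return digits;
}
```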

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:36 +0000 (17:23 +0100)]
tools/nolibc/sys: make open() take a vararg on the 3rd argument

Let's pass a vararg to open() so that it remains compatible with existing
code. The arg is only dereferenced when flags contain O_CREAT. The function
is generally not inlined anymore, causing an extra call (total 16 extra
bytes) but it's still optimized for constant propagation, limiting the
excess to no more than 16 bytes in practice when open() is called without
O_CREAT, and ~40 with O_CREAT, which remains reasonable.
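A sketch of a variadic open() wrapper, using the host libc's open() underneath for illustration; `tiny_open()` is a hypothetical name. The mode is read back as `int`, as common libc implementations do, and only when O_CREAT is set, since callers that do not create files usually pass just two arguments.

```c
#include <fcntl.h>
#include <stdarg.h>
#include <sys/types.h>

int tiny_open(const char *path, int flags, ...)
{
	mode_t mode = 0;

	/* only fetch the third argument when it must have been passed */
	if (flags & O_CREAT) {
		va_list args;

		va_start(args, flags);
		mode = (mode_t)va_arg(args, int);
		va_end(args);
	}
	return open(path, flags, mode);
}
```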

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:35 +0000 (17:23 +0100)]
tools/nolibc/stdio: add perror() to report the errno value

It doesn't contain the text for the error codes, but instead displays
"errno=" followed by the errno value. Just like the regular perror(), if
a non-empty message is passed, it is printed first, followed by ": ",
before the errno code. The whole message is written to stderr.
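A sketch of the output format described above, built on the host's snprintf() for illustration; `tiny_perror_fmt()` and `tiny_perror()` are hypothetical names. The formatting is split into its own function only so that it can be checked without capturing stderr.

```c
#include <errno.h>
#include <stdio.h>

/* Produces "<msg>: errno=<n>\n", or just "errno=<n>\n" when msg is NULL
 * or empty, and returns the formatted length. */
int tiny_perror_fmt(char *buf, size_t size, const char *msg, int err)
{
	int has_msg = msg && *msg;

	return snprintf(buf, size, "%s%serrno=%d\n",
			has_msg ? msg : "", has_msg ? ": " : "", err);
}

void tiny_perror(const char *msg)
{
	char buf[128];

	tiny_perror_fmt(buf, sizeof(buf), msg, errno);  /* errno read first */
	fputs(buf, stderr);
}
```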

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:34 +0000 (17:23 +0100)]
tools/nolibc/types: define EXIT_SUCCESS and EXIT_FAILURE

These are used in some examples found in man pages and ease
portability tests.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:33 +0000 (17:23 +0100)]
tools/nolibc/stdio: add a minimal [vf]printf() implementation

This adds a minimal vfprintf() implementation as well as the commonly
used fprintf() and printf() that rely on it.

For now the function supports:
  - formats: %s, %c, %u, %d, %x
  - modifiers: %l and %ll
  - unknown chars are considered as modifiers and are ignored

It is designed to remain minimalist; even so, printf() is 549 bytes
on x86_64. It would be wise not to add too many formats.
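The format handling listed above can be sketched as a buffer-based formatter. This is a hypothetical illustration, not the nolibc code: it writes into a caller-supplied buffer instead of a FILE, and it uses plain `/` and `%` for digit extraction for brevity. Any char after '%' other than s, c, u, d or x is skipped as a modifier, with 'l' counted to select long or long long arguments.

```c
#include <stdarg.h>
#include <stddef.h>

/* Stores c if there is room (keeping one byte for the final zero) but
 * always counts it, so the caller learns the untruncated length. */
static void tiny_put(char *buf, size_t size, size_t *len, char c)
{
	if (*len + 1 < size)
		buf[*len] = c;
	(*len)++;
}

size_t tiny_vsnprintf(char *buf, size_t size, const char *fmt, va_list args)
{
	size_t len = 0;

	while (*fmt) {
		char c = *fmt++;
		int lcount = 0;          /* number of 'l' modifiers seen */
		unsigned long long v;
		unsigned int base;
		char tmp[24];
		int n = 0;

		if (c != '%') {
			tiny_put(buf, size, &len, c);
			continue;
		}
		/* skip modifiers until a supported conversion is found */
		for (;;) {
			c = *fmt;
			if (!c)
				goto end;
			fmt++;
			if (c == 'l')
				lcount++;
			else if (c == 's' || c == 'c' || c == 'u' || c == 'd' || c == 'x')
				break;
		}
		if (c == 's') {
			const char *s = va_arg(args, const char *);

			for (s = s ? s : "(null)"; *s; s++)
				tiny_put(buf, size, &len, *s);
			continue;
		}
		if (c == 'c') {
			tiny_put(buf, size, &len, (char)va_arg(args, int));
			continue;
		}
		/* integers: fetch with the width selected by l / ll */
		if (c == 'd') {
			long long sv = lcount >= 2 ? va_arg(args, long long) :
				       lcount ? va_arg(args, long) : va_arg(args, int);

			v = (unsigned long long)sv;
			if (sv < 0) {
				tiny_put(buf, size, &len, '-');
				v = -v;          /* also correct for LLONG_MIN */
			}
		} else {
			v = lcount >= 2 ? va_arg(args, unsigned long long) :
			    lcount ? va_arg(args, unsigned long) : va_arg(args, unsigned int);
		}
		base = (c == 'x') ? 16 : 10;
		do {                             /* least significant digit first */
			tmp[n++] = "0123456789abcdef"[v % base];
			v /= base;
		} while (v);
		while (n)
			tiny_put(buf, size, &len, tmp[--n]);
	}
end:
	if (size)
		buf[len < size ? len : size - 1] = 0;
	return len;
}

size_t tiny_snprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	size_t ret;

	va_start(args, fmt);
	ret = tiny_vsnprintf(buf, size, fmt, args);
	va_end(args);
	return ret;
}
```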

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:32 +0000 (17:23 +0100)]
tools/nolibc/stdio: add fwrite() to stdio

We'll use it to write substrings. It relies on a simpler _fwrite() that
only takes one size. fputs() was also modified to rely on it.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:31 +0000 (17:23 +0100)]
tools/nolibc/stdio: add stdin/stdout/stderr and fget*/fput* functions

The standard puts() function always emits the trailing LF, which makes it
inconvenient for small string concatenation. fputs() ought to be used
instead but it requires a FILE*.

This adds 3 dummy FILE* values (stdin, stdout, stderr) which are in fact
pointers to struct FILE of one byte. We reserve 3 pointer values for them,
-3, -2 and -1, so that they are ordered, easing the tests and mapping to
integer.
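A sketch of how such reserved pointer values can map back to file descriptors; `tiny_FILE`, the `tiny_std*` macros and `tiny_fileno()` are hypothetical names. The dummy structure is never dereferenced; only the pointer value matters, and -3, -2, -1 map to fds 0, 1, 2 by adding 3.

```c
#include <stdint.h>

/* Dummy FILE type: only its address is ever used. */
typedef struct tiny_FILE { char dummy[1]; } tiny_FILE;

#define tiny_stdin  ((tiny_FILE *)(intptr_t)-3)
#define tiny_stdout ((tiny_FILE *)(intptr_t)-2)
#define tiny_stderr ((tiny_FILE *)(intptr_t)-1)

/* Returns the fd behind one of the three streams, or -1 otherwise. */
int tiny_fileno(tiny_FILE *stream)
{
	intptr_t i = (intptr_t)stream;

	if (i < -3 || i > -1)
		return -1;
	return (int)(i + 3);
}
```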

From this, fgetc(), fputc(), fgets() and fputs() were implemented, and
the previous putchar() and getchar() now remap to these. The standard
getc() and putc() macros were also implemented as pointing to these
ones.

There is absolutely no buffering, fgetc() and fgets() read one byte at
a time, fputc() writes one byte at a time, and only fputs() which knows
the string's length writes all of it at once.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:30 +0000 (17:23 +0100)]
tools/nolibc/stdio: add a minimal set of stdio functions

This only provides getchar(), putchar(), and puts().

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:29 +0000 (17:23 +0100)]
tools/nolibc/stdlib: add utoh() and u64toh()

This adds a pair of functions to emit hex values.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:28 +0000 (17:23 +0100)]
tools/nolibc/stdlib: add i64toa() and u64toa()

These are 64-bit variants of the itoa() and utoa() functions. Reentrant
variants are provided as well, and they use the same itoa_buffer. The functions are
a bit larger than the previous ones in 32-bit mode (86 and 98 bytes on
x86_64 and armv7 respectively), which is why we continue to provide them
as separate functions.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:27 +0000 (17:23 +0100)]
tools/nolibc/stdlib: replace the ltoa() function with more efficient ones

The original ltoa() function and the reentrant one ltoa_r() present a
number of drawbacks. The divide by 10 generates calls to external code
from libgcc_s, and the number does not necessarily start at the beginning
of the buffer.

Let's rewrite these functions so that they do not involve a divide and
only use loops on powers of 10, and implement both signed and unsigned
variants, always starting from the buffer's first character. Instead of
using a static buffer for each function, we're now using a common one.

In order to avoid confusion with the ltoa() name, the new functions are
called itoa_r() and utoa_r() to distinguish the signed and unsigned
versions, and for the convenience of their callers, these functions now
return the number of characters emitted. The ltoa_r() function is just
an inline mapping to the signed one that returns the buffer.

The functions are quite small (86 bytes on x86_64, 68 on armv7) and
do not depend anymore on external code.
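A sketch of the divide-free conversion, using hypothetical names and assuming `unsigned long` is either 32 or 64 bits wide. For each power of 10, from the largest one that fits down to 1, the digit is the number of times that power can be subtracted; the number always starts at `buffer[0]`.

```c
/* Writes the decimal form of `in` and returns the number of chars
 * emitted (buffer must hold 21 bytes). No division is involved. */
int tiny_utoa_r(unsigned long in, char *buffer)
{
	unsigned long lim;
	int digits = 0;
	int pos = (sizeof(unsigned long) >= 8) ? 19 : 9;  /* largest exponent */
	int dig;

	do {
		for (dig = 0, lim = 1; dig < pos; dig++)
			lim *= 10;                        /* lim = 10^pos */

		/* skip leading zeros, but always emit the units digit */
		if (digits || in >= lim || !pos) {
			for (dig = 0; in >= lim; dig++)
				in -= lim;
			buffer[digits++] = '0' + dig;
		}
	} while (pos--);

	buffer[digits] = 0;
	return digits;
}

/* Signed variant: emit '-' then reuse the unsigned conversion. Negating
 * in unsigned arithmetic also works for LONG_MIN. Needs 22 bytes. */
int tiny_itoa_r(long in, char *buffer)
{
	unsigned long u = (unsigned long)in;
	int len = 0;

	if (in < 0) {
		buffer[len++] = '-';
		u = -u;
	}
	return len + tiny_utoa_r(u, buffer + len);
}
```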

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:26 +0000 (17:23 +0100)]
tools/nolibc/stdlib: move ltoa() to stdlib.h

This function is not standard and performs the opposite of atol(). Let's
move it with atol(). It's been split between a reentrant function and one
using a static buffer.

There are no definitions left in nolibc.h now.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:25 +0000 (17:23 +0100)]
tools/nolibc/types: move makedev to types.h and make it a macro

The makedev() man page says it's supposed to be a macro and that some
OSes have it with the other ones in sys/types.h so it now makes sense
to move it to types.h as a macro. Let's also define major() and
minor() that perform the reverse operation.
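A sketch of such macros under hypothetical names. The bit layout (a 12-bit major in bits 8-19 and an 8-bit minor in bits 0-7) is an assumption based on the traditional compact Linux encoding, not the full encoding used by glibc's makedev().

```c
#include <sys/types.h>

#define tiny_makedev(major, minor) \
	((dev_t)((((major) & 0xfff) << 8) | ((minor) & 0xff)))
#define tiny_major(dev) ((unsigned int)(((dev) >> 8) & 0xfff))
#define tiny_minor(dev) ((unsigned int)((dev) & 0xff))
```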

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Sun, 13 Feb 2022 08:53:01 +0000 (09:53 +0100)]
tools/nolibc/types: make FD_SETSIZE configurable

The macro was hard-coded to 256 but it's common to see it redefined.
Let's support this and make sure we always allocate enough entries for
the cases where it isn't a multiple of 32.
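The rounding can be sketched as follows; `TINY_FD_SETSIZE`, `TINY_FD_WORDS` and `sized_fd_set` are hypothetical names standing in for the real ones. An application may define the size before this point, and the word count is rounded up so that a size that is not a multiple of 32 still gets room for its last bits.

```c
#include <stdint.h>

#ifndef TINY_FD_SETSIZE
#define TINY_FD_SETSIZE 256   /* default, overridable by the application */
#endif

/* e.g. 100 fds need (100 + 31) / 32 = 4 words, not 100 / 32 = 3 */
#define TINY_FD_WORDS(n) (((n) + 31) / 32)

typedef struct {
	uint32_t fd32[TINY_FD_WORDS(TINY_FD_SETSIZE)];
} sized_fd_set;
```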

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Sun, 13 Feb 2022 08:52:10 +0000 (09:52 +0100)]
tools/nolibc/types: move the FD_* functions to macros in types.h

FD_SET, FD_CLR, FD_ISSET, FD_ZERO are often expected to be macros and
not functions. In addition, we already have a file dedicated to such
macros and types used by syscalls, types.h, so let's move them
there and turn them into macros. FD_CLR() and FD_ISSET() were missing,
so they were added. FD_ZERO() now uses its own loop instead of relying
on memset(), which sets one byte at a time.
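A sketch of the macro approach with hypothetical `TINY_FD_*` names and an array of 32-bit words (an assumption about the layout): fd N lives in bit (N % 32) of word (N / 32), and FD_ZERO clears whole words in its own loop.

```c
#include <stdint.h>

#define TINY_FD_SETSIZE 256

typedef struct {
	uint32_t fd32[(TINY_FD_SETSIZE + 31) / 32];
} tiny_fd_set;

#define TINY_FD_SET(fd, set) \
	((set)->fd32[(fd) / 32] |= 1U << ((fd) & 31))
#define TINY_FD_CLR(fd, set) \
	((set)->fd32[(fd) / 32] &= ~(1U << ((fd) & 31)))
#define TINY_FD_ISSET(fd, set) \
	(!!((set)->fd32[(fd) / 32] & (1U << ((fd) & 31))))

/* clears one 32-bit word per iteration instead of one byte */
#define TINY_FD_ZERO(set) do {                                          \
	unsigned int fdz_i;                                             \
	for (fdz_i = 0;                                                 \
	     fdz_i < sizeof((set)->fd32) / sizeof((set)->fd32[0]);      \
	     fdz_i++)                                                   \
		(set)->fd32[fdz_i] = 0;                                 \
} while (0)
```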

Cc: David Laight <David.Laight@aculab.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:22 +0000 (17:23 +0100)]
tools/nolibc/ctype: add the missing is* functions

There was only isdigit, this commit adds the other ones.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:21 +0000 (17:23 +0100)]
tools/nolibc/ctype: split the is* functions to ctype.h

In fact there's only isdigit() for now. More should definitely be added.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:20 +0000 (17:23 +0100)]
tools/nolibc/string: split the string functions into string.h

The string manipulation functions (mem*, str*) are now found in
string.h. The file depends on almost nothing and will be
usable from other includes if needed. Maybe more functions could
be added.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:19 +0000 (17:23 +0100)]
tools/nolibc/stdlib: extract the stdlib-specific functions to their own file

The new file stdlib.h contains the definitions of functions that
are usually found in stdlib.h. Many more could certainly be added.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:18 +0000 (17:23 +0100)]
tools/nolibc/sys: split the syscall definitions into their own file

The syscall definitions were moved to sys.h. They were arranged
in a more easily maintainable order, whereby the sys_xxx() and xxx()
functions were grouped together, which also highlights the occasional
mappings such as wait() relying on wait4().

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:17 +0000 (17:23 +0100)]
tools/nolibc/arch: split arch-specific code into individual files

In order to ease maintenance, this splits the arch-specific code into
one file per architecture. A common file "arch.h" is used to include the
right file among arch-* based on the detected architecture. Projects
which are already split per architecture could simply rename these
files to $arch/arch.h and get rid of the common arch.h. For this
reason, include guards were placed into each arch-specific file.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:16 +0000 (17:23 +0100)]
tools/nolibc/types: split syscall-specific definitions into their own files

The macros and type definitions used by a number of syscalls were moved
to types.h where they will be easier to maintain. A few of them
are arch-specific and must not be moved there (e.g. O_*, sys_stat_struct).
A warning about them was placed at the top of the file.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:15 +0000 (17:23 +0100)]
tools/nolibc/std: move the standard type definitions to std.h

The ordering of includes and definitions is currently a bit of a mess;
for example, asm/signal.h is included after the int definitions, but plenty
of structures are defined later as they rely on other includes.

Let's move the standard type definitions to a dedicated file that is
included first. We also move NULL there. This way all other includes
are aware of it, and we can bring asm/signal.h back to the top of the
file.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:14 +0000 (17:23 +0100)]
tools/nolibc: guard the main file against multiple inclusion

Including nolibc.h multiple times results in build errors due to multiple
definitions. Let's add a guard against multiple inclusions.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:13 +0000 (17:23 +0100)]
tools/nolibc: use pselect6 on RISCV

This arch doesn't provide the old-style select() syscall, so we have to
use pselect6() instead.
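A sketch of select() emulated on top of the pselect6 syscall, written with the host libc's generic `syscall()` wrapper for illustration; `tiny_select()` is a hypothetical name. It assumes a 64-bit target where the libc `struct timespec` matches the kernel's layout. The microsecond timeval is converted to a nanosecond timespec, and a NULL sixth argument means no signal mask is applied.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/select.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

int tiny_select(int nfds, fd_set *rfds, fd_set *wfds, fd_set *efds,
		struct timeval *timeout)
{
	struct timespec t, *tp = NULL;

	if (timeout) {
		t.tv_sec  = timeout->tv_sec;
		t.tv_nsec = timeout->tv_usec * 1000;   /* usec -> nsec */
		tp = &t;
	}
	return (int)syscall(SYS_pselect6, nfds, rfds, wfds, efds, tp, NULL);
}
```

Note that the kernel updates the timespec copy with the remaining time, not the caller's timeval.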

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Fri, 4 Feb 2022 20:45:18 +0000 (12:45 -0800)]
rcutorture: Suppress debugging grace period delays during flooding

Tree RCU supports grace-period delays using the rcutree.gp_cleanup_delay,
rcutree.gp_init_delay, and rcutree.gp_preinit_delay kernel boot
parameters.  These delays are strictly for debugging purposes, and have
proven quite effective at exposing bugs involving races with CPU-hotplug
operations.  However, these delays can result in false positives when
used in conjunction with callback flooding, for example, those generated
by the rcutorture.fwd_progress kernel boot parameter.

This commit therefore suppresses grace-period delays while callback
flooding is in progress.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Fri, 4 Feb 2022 01:53:22 +0000 (17:53 -0800)]
torture: Add rcu_normal and rcu_expedited runs to torture.sh

Currently, the rcupdate.rcu_normal and rcupdate.rcu_expedited kernel
boot parameters are not regularly tested.  The potential addition of
polled expedited grace-period APIs increases the amount of code that is
affected by these kernel boot parameters.  This commit therefore adds a
"--do-rt" argument to torture.sh to exercise these kernel-boot options.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 1 Feb 2022 15:01:20 +0000 (07:01 -0800)]
EXP rcutorture: Test polled expedited grace-period primitives

This commit adds tests of get_state_synchronize_rcu_expedited(),
start_poll_synchronize_rcu_expedited(), poll_state_synchronize_rcu_expedited(),
and cond_synchronize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 1 Feb 2022 00:55:52 +0000 (16:55 -0800)]
EXP rcu: Add polled expedited grace-period primitives

This is an experimental proof of concept of polled expedited grace-period
functions.  These functions are get_state_synchronize_rcu_expedited(),
start_poll_synchronize_rcu_expedited(), poll_state_synchronize_rcu_expedited(),
and cond_synchronize_rcu_expedited(), which are similar to
get_state_synchronize_rcu(), start_poll_synchronize_rcu(),
poll_state_synchronize_rcu(), and cond_synchronize_rcu(), respectively.

One limitation is that start_poll_synchronize_rcu_expedited() cannot
be invoked before workqueues are initialized.

[ paulmck: Apply feedback from Neeraj Upadhyay. ]

Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/
Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
Cc: Brian Foster <bfoster@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ian Kent <raven@themaw.net>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Frederic Weisbecker [Wed, 2 Feb 2022 00:01:07 +0000 (01:01 +0100)]
EXP tick: Detect and fix jiffies update stall

In some rare cases, the timekeeper CPU may delay its jiffies
update duty for a while. Known causes include:

* The timekeeper is waiting on stop_machine in a MULTI_STOP_DISABLE_IRQ
  or MULTI_STOP_RUN state. Disabled interrupts prevent timekeeping
  updates while waiting for the target CPU to complete its
  stop_machine() callback.

* The timekeeper vcpu has VMEXIT'ed for a long while due to some overload
  on the host.

Detect and fix these situations with emergency timekeeping catchups.

Original-patch-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 2 Feb 2022 17:10:04 +0000 (09:10 -0800)]
rcu: Clarify fill-the-gap comment in rcu_segcblist_advance()

Reported-by: Frederic Weisbecker <frederic@kernel.org>
Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 7 Dec 2021 00:19:40 +0000 (16:19 -0800)]
EXP rcu-tasks: Check for abandoned callbacks

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Mon, 14 Feb 2022 18:42:18 +0000 (10:42 -0800)]
Merge branch 'lkmm-dev.2022.02.01b' into HEAD

lkmm-dev.2022.02.01b: LKMM development branch.

Paul E. McKenney [Mon, 14 Feb 2022 18:41:39 +0000 (10:41 -0800)]
Merge branch 'clocksource.2022.02.01b' into HEAD

clocksource.2022.02.01b: Clock-source watchdog updates.

Paul E. McKenney [Mon, 14 Feb 2022 18:40:07 +0000 (10:40 -0800)]
Merge branch 'lkmm.2022.02.01b' into HEAD

lkmm.2022.02.01b: Linux-kernel memory-model (LKMM) updates.

Paul E. McKenney [Mon, 14 Feb 2022 18:38:19 +0000 (10:38 -0800)]
Merge branches 'exp.2022.02.08a', 'fixes.2022.02.14a', 'rcu_barrier.2022.02.08a', 'rcu-tasks.2022.02.08a', 'rt.2022.02.01b', 'srcu.2022.02.08a', 'torture.2022.02.01b' and 'torturescript.2022.02.08a' into HEAD

exp.2022.02.08a: Expedited grace-period updates.
fixes.2022.02.14a: Miscellaneous fixes.
rcu_barrier.2022.02.08a: Make rcu_barrier() no longer exclude CPU hotplug.
rcu-tasks.2022.02.08a: RCU-tasks updates.
rt.2022.02.01b: Real-time-related updates.
srcu.2022.02.08a: Put SRCU on a memory diet.
torture.2022.02.01b: Torture-test updates.
torturescript.2022.02.08a: Torture-test scripting updates.

Yury Norov [Sun, 23 Jan 2022 18:38:53 +0000 (10:38 -0800)]
rcu: Replace cpumask_weight with cpumask_empty where appropriate

In some places, RCU code calls cpumask_weight() to check if any bit of a
given cpumask is set. We can do it more efficiently with cpumask_empty()
because cpumask_empty() stops traversing the cpumask as soon as it finds
the first set bit, while cpumask_weight() counts all bits unconditionally.
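The cost difference can be illustrated on a plain word array. These are hypothetical user-space helpers, not the kernel's cpumask or bitmap API, and `__builtin_popcountl()` assumes GCC or Clang. The emptiness check returns at the first non-zero word, while the weight always visits every word.

```c
#include <stdbool.h>
#include <stddef.h>

unsigned int bitmap_weight_sketch(const unsigned long *map, size_t words)
{
	unsigned int w = 0;
	size_t i;

	for (i = 0; i < words; i++)          /* always scans every word */
		w += (unsigned int)__builtin_popcountl(map[i]);
	return w;
}

bool bitmap_empty_sketch(const unsigned long *map, size_t words)
{
	size_t i;

	for (i = 0; i < words; i++)
		if (map[i])                  /* stop at the first set bit */
			return false;
	return true;
}
```

So a test such as `cpumask_weight(mask) == 0` can be replaced by `cpumask_empty(mask)` without changing its meaning.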

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Ingo Molnar [Sat, 15 Jan 2022 00:16:55 +0000 (16:16 -0800)]
rcu: Remove __read_mostly annotations from rcu_scheduler_active externs

Remove the __read_mostly attributes from the rcu_scheduler_active
extern declarations, because these attributes are ignored for
prototypes and we'd have to include the full <linux/cache.h> header
just to get this functionally pointless attribute defined.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Ingo Molnar [Sat, 15 Jan 2022 00:07:28 +0000 (16:07 -0800)]
rcu: Uninline multi-use function: finish_rcuwait()

This is a rarely used function, so uninlining its 3 instructions
is probably a win or a wash - but the main motivation is to
make <linux/rcuwait.h> independent of task_struct details.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 4 Jan 2022 18:34:34 +0000 (10:34 -0800)]
rcu: Mark writes to the rcu_segcblist structure's ->flags field

KCSAN reports data races between the rcu_segcblist_clear_flags() and
rcu_segcblist_set_flags() functions, though misreporting the latter
as a call to rcu_segcblist_is_enabled() from call_rcu().  This commit
converts the updates of this field to WRITE_ONCE(), relying on the
resulting unmarked reads to continue to detect buggy concurrent writes
to this field.

Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Zqiang [Sun, 26 Dec 2021 00:52:04 +0000 (08:52 +0800)]
kasan: Record work creation stack trace with interrupts enabled

Recording the work creation stack trace for KASAN reports in
call_rcu() is expensive, due to unwinding the stack, but also
due to acquiring depot_lock inside stackdepot (which may be contended).
Because calling kasan_record_aux_stack_noalloc() does not require
interrupts to already be disabled, this may unnecessarily extend
the time with interrupts disabled.

Therefore, move calling kasan_record_aux_stack() before the section
with interrupts disabled.

Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Sat, 18 Dec 2021 17:30:33 +0000 (09:30 -0800)]
rcu: Inline __call_rcu() into call_rcu()

Because __call_rcu() is invoked only by call_rcu(), this commit inlines
the former into the latter.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
David Woodhouse [Wed, 8 Dec 2021 23:41:53 +0000 (23:41 +0000)]
rcu: Add mutex for rcu boost kthread spawning and affinity setting

As we handle parallel CPU bringup, we will need to take care to avoid
spawning multiple boost threads, or race conditions when setting their
affinity. Spotted by Paul McKenney.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Uladzislau Rezki (Sony) [Wed, 1 Dec 2021 09:20:53 +0000 (10:20 +0100)]
rcu: Fix description of kvfree_rcu()

The kvfree_rcu() header comment's description of the "ptr" parameter
is unclear, therefore rephrase it to make it clear that it is a pointer
to the memory to eventually be passed to kvfree().

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Wed, 1 Dec 2021 14:27:59 +0000 (06:27 -0800)]
MAINTAINERS: Add Frederic and Neeraj to their RCU files

This commit adds Frederic as an RCU maintainer for kernel/rcu/tree_nocb.h,
given his work on offloading and de-offloading callbacks from CPUs.  It
also adds Neeraj for kernel/rcu/tasks.h, given his focused work on RCU
Tasks Trace.  In other words, I am reasonably certain that each understands
the full contents of the corresponding file.

Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
3 years agorcutorture: Provide non-power-of-two Tasks RCU scenarios
Paul E. McKenney [Tue, 1 Feb 2022 16:23:46 +0000 (08:23 -0800)]
rcutorture: Provide non-power-of-two Tasks RCU scenarios

This commit adjusts RUDE01 to 3 CPUs and TRACE01 to 5 CPUs in order to
test Tasks RCU's ability to handle non-power-of-two numbers of CPUs.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agorcutorture: Test SRCU size transitions
Paul E. McKenney [Mon, 31 Jan 2022 23:03:36 +0000 (15:03 -0800)]
rcutorture: Test SRCU size transitions

This commit adds kernel boot parameters to the SRCU-N and SRCU-P
rcutorture scenarios to cause SRCU-N to test contention-based resizing
and SRCU-P to test init_srcu_struct()-time resizing.  Note that this
also tests never-resizing because the contention-based resizing normally
takes some minutes to make the shift.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agotorture: Make torture.sh help message match reality
Paul E. McKenney [Thu, 27 Jan 2022 17:39:15 +0000 (09:39 -0800)]
torture: Make torture.sh help message match reality

This commit fixes a couple of typos: s/--doall/--do-all/ and
s/--doallmodconfig/--do-allmodconfig/.

[ paulmck: Add Fixes: supplied by Paul Menzel. ]

Fixes: a115a775a8d5 ("torture: Add "make allmodconfig" to torture.sh")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agorcu: Allow expedited RCU grace periods on incoming CPUs
Paul E. McKenney [Tue, 18 Jan 2022 21:54:19 +0000 (13:54 -0800)]
rcu: Allow expedited RCU grace periods on incoming CPUs

Although it is usually safe to invoke synchronize_rcu_expedited() from a
preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
invoke a workqueue handler will hang due to RCU waiting on a CPU that
the scheduler is not paying attention to.  This commit therefore expands
use of the existing workqueue-independent synchronize_rcu_expedited()
from early boot to also include CPUs that are being hotplugged.

Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agosrcu: Add contention check to call_srcu() srcu_data ->lock acquisition
Paul E. McKenney [Mon, 31 Jan 2022 21:27:15 +0000 (13:27 -0800)]
srcu: Add contention check to call_srcu() srcu_data ->lock acquisition

This commit increases the sensitivity of contention detection by adding
checks to the acquisition of the srcu_data structure's lock on the
call_srcu() code path.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agosrcu: Automatically determine size-transition strategy at boot
Paul E. McKenney [Mon, 31 Jan 2022 19:21:30 +0000 (11:21 -0800)]
srcu: Automatically determine size-transition strategy at boot

This commit adds a srcutree.convert_to_big option of zero that causes
SRCU to decide at boot whether to wait for contention (small systems) or
immediately expand to large (large systems).  A new srcutree.big_cpu_lim
(defaulting to 128) defines how many CPUs constitute a large system.
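A minimal sketch of the boot-time decision, assuming the comparison is "at least big_cpu_lim CPUs means large". The enum `sizing_choice` and the function `decide_sizing()` are hypothetical names used only for illustration; the real logic lives in kernel/rcu/srcutree.c and may differ in detail.

```c
/* Hypothetical labels for the two strategies described above. */
enum sizing_choice {
	SIZING_WAIT_FOR_CONTENTION,  /* small system: stay small unless contended */
	SIZING_EXPAND_AT_INIT,       /* large system: expand immediately */
};

static int big_cpu_lim = 128;   /* mirrors the srcutree.big_cpu_lim default */

/* Decide at boot, given the runtime number of CPUs. */
static enum sizing_choice decide_sizing(int nr_cpu_ids)
{
	if (nr_cpu_ids >= big_cpu_lim)
		return SIZING_EXPAND_AT_INIT;
	return SIZING_WAIT_FOR_CONTENTION;
}
```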

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agosrcu: Make srcu_size_state_name static
Jiapeng Chong [Sat, 29 Jan 2022 03:45:02 +0000 (11:45 +0800)]
srcu: Make srcu_size_state_name static

This symbol is not used outside of srcutree.c, so this commit marks it static.

Doing so fixes the following sparse warning:

kernel/rcu/srcutree.c:1426:12: warning: symbol 'srcu_size_state_name'
was not declared. Should it be static?

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agosrcu: Add contention-triggered addition of srcu_node tree
Paul E. McKenney [Fri, 28 Jan 2022 04:32:05 +0000 (20:32 -0800)]
srcu: Add contention-triggered addition of srcu_node tree

This commit instruments the acquisitions of the srcu_struct structure's
->lock, enabling the initiation of a transition from SRCU_SIZE_SMALL
to SRCU_SIZE_BIG when sufficient contention is experienced.  The
instrumentation counts the number of trylock failures within the confines
of a single jiffy.  If that number exceeds the value specified by the
srcutree.small_contention_lim kernel boot parameter (which defaults to
100), and if the value specified by the srcutree.convert_to_big kernel
boot parameter has the 0x10 bit set (defaults to 0), then a transition
will be automatically initiated.

By default, there will never be any transitions, so that none of the
srcu_struct structures ever gains an srcu_node array.
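The per-jiffy trylock-failure counting described above can be sketched in user space as follows. This assumes the caller passes in the current jiffies value after each failed trylock; `struct contention_state` and `note_trylock_failure()` are hypothetical names, and `want_big` merely records that the kernel would initiate the transition at that point.

```c
#include <stdbool.h>

#define CONVERT_IF_CONTENDED 0x10   /* the 0x10 bit of srcutree.convert_to_big */

static int convert_to_big;              /* defaults to 0, as described above */
static int small_contention_lim = 100;  /* srcutree.small_contention_lim default */

/* Hypothetical per-srcu_struct contention bookkeeping. */
struct contention_state {
	unsigned long window_jiffy;  /* jiffy in which failures are being counted */
	int nretries;                /* trylock failures seen in that jiffy */
	bool want_big;               /* transition to SRCU_SIZE_BIG requested */
};

/* Call after each failed trylock of ->lock, passing the current jiffies. */
static void note_trylock_failure(struct contention_state *cs, unsigned long now)
{
	if (!(convert_to_big & CONVERT_IF_CONTENDED) || cs->want_big)
		return;
	if (cs->window_jiffy != now) {  /* new jiffy: restart the count */
		cs->window_jiffy = now;
		cs->nretries = 0;
	}
	if (++cs->nretries > small_contention_lim)
		cs->want_big = true;    /* the kernel would initiate the transition */
}
```

Only failures that land in the same jiffy accumulate, so a steady trickle of contention never trips the limit.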

The useful values for srcutree.convert_to_big are:

0x00:  Never convert.
0x01:  Always convert at init_srcu_struct() time.
0x02:  Convert when rcutorture prints its first round of statistics.
0x03:  Decide conversion approach at boot given system size.
0x10:  Convert if contention is encountered.
0x12:  Convert if contention is encountered or when rcutorture prints
        its first round of statistics, whichever comes first.

The value 0x11 acts the same as 0x01 because the conversion happens
before there is any chance of contention.

[ paulmck: Apply "static" feedback from kernel test robot. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agosrcu: Create concurrency-safe helper for initiating size transition
Paul E. McKenney [Thu, 27 Jan 2022 22:56:39 +0000 (14:56 -0800)]
srcu: Create concurrency-safe helper for initiating size transition

Once there are contention-initiated size transitions, it will be
possible for rcutorture to initiate a transition at the same time
as a contention-initiated transition.  This commit therefore creates
a concurrency-safe helper function named srcu_transition_to_big() to
safely initiate size transitions.
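One common way to make such a helper concurrency-safe is double-checked locking: a lockless acquire-load fast path, then a recheck under the lock. The sketch below assumes that approach, with a pthread mutex standing in for the srcu_struct ->lock and C11 atomics for the state; the enum values and function names are hypothetical, loosely modeled on SRCU_SIZE_SMALL and friends.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical size states, loosely named after SRCU_SIZE_SMALL and friends. */
enum { SIZE_SMALL, SIZE_ALLOC, SIZE_BIG };

struct sized_struct {
	pthread_mutex_t lock;   /* stands in for the srcu_struct ->lock */
	atomic_int size_state;
	int ntransitions;       /* demo instrumentation */
};

/* Safe to call from rcutorture and from contention detection at once. */
static void transition_to_big(struct sized_struct *sp)
{
	/* Lockless fast path: someone already started the transition. */
	if (atomic_load_explicit(&sp->size_state, memory_order_acquire) != SIZE_SMALL)
		return;
	pthread_mutex_lock(&sp->lock);
	/* Recheck under the lock, because another caller may have won the race. */
	if (atomic_load_explicit(&sp->size_state, memory_order_acquire) == SIZE_SMALL) {
		sp->ntransitions++;
		atomic_store_explicit(&sp->size_state, SIZE_ALLOC, memory_order_release);
	}
	pthread_mutex_unlock(&sp->lock);
}

static void *caller(void *arg)
{
	transition_to_big(arg);
	return NULL;
}

/* Race several initiators; returns how many transitions actually started. */
static int race_transitions(int nthreads)
{
	struct sized_struct s = { .lock = PTHREAD_MUTEX_INITIALIZER };
	pthread_t tids[16];
	int n = 0;

	if (nthreads > 16)
		nthreads = 16;
	for (int i = 0; i < nthreads; i++)
		if (pthread_create(&tids[n], NULL, caller, &s) == 0)
			n++;
	for (int i = 0; i < n; i++)
		pthread_join(tids[i], NULL);
	return s.ntransitions;
}
```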

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agosrcu: Explain srcu_funnel_gp_start() call to list_add() is safe
Paul E. McKenney [Thu, 27 Jan 2022 21:47:42 +0000 (13:47 -0800)]
srcu: Explain srcu_funnel_gp_start() call to list_add() is safe

This commit adds a comment explaining why an unprotected call to
list_add() from srcu_funnel_gp_start() can be safe.  TL;DR: It is only
called during very early boot when we don't have no steeking concurrency!

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agosrcu: Prevent cleanup_srcu_struct() from freeing non-dynamic ->sda
Paul E. McKenney [Thu, 27 Jan 2022 21:20:49 +0000 (13:20 -0800)]
srcu: Prevent cleanup_srcu_struct() from freeing non-dynamic ->sda

When an srcu_struct structure is created (but not in a kernel module)
by DEFINE_SRCU() and friends, the per-CPU srcu_data structure is
statically allocated.  In all other cases, that structure is obtained
from alloc_percpu(), in which case cleanup_srcu_struct() must invoke
free_percpu() on the resulting ->sda pointer in the srcu_struct structure.

Which it does.

Except that it also invokes free_percpu() on the ->sda pointer
referencing the statically allocated per-CPU srcu_data structures.
Which free_percpu() is surprisingly OK with.

This commit nevertheless stops cleanup_srcu_struct() from freeing
statically allocated per-CPU srcu_data structures.
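A user-space sketch of the fix, assuming a flag recorded at initialization time says whether ->sda points at static storage. Here `calloc()`/`free()` stand in for `alloc_percpu()`/`free_percpu()`, and `struct fake_srcu`, `sda_is_static`, `init_static()`, `init_dynamic()` and `cleanup()` are hypothetical names. Unlike free_percpu(), glibc's free() would typically abort on a static pointer, so the guard matters here.

```c
#include <stdbool.h>
#include <stdlib.h>

struct per_cpu_data { long lock_count[2]; };

/* Hypothetical, simplified srcu_struct: just the ->sda pointer and a flag. */
struct fake_srcu {
	struct per_cpu_data *sda;
	bool sda_is_static;   /* true when sda points at static storage */
};

static struct per_cpu_data static_sda[4];   /* as if from DEFINE_SRCU() */

static void init_static(struct fake_srcu *sp)
{
	sp->sda = static_sda;
	sp->sda_is_static = true;
}

static bool init_dynamic(struct fake_srcu *sp, int ncpus)
{
	sp->sda = calloc(ncpus, sizeof(*sp->sda));  /* as if from alloc_percpu() */
	sp->sda_is_static = false;
	return sp->sda != NULL;
}

/* Free only what was dynamically allocated. */
static void cleanup(struct fake_srcu *sp)
{
	if (!sp->sda_is_static) {
		free(sp->sda);
		sp->sda = NULL;
	}
}
```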

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agosrcu: Avoid NULL dereference in srcu_torture_stats_print()
Paul E. McKenney [Thu, 27 Jan 2022 19:43:11 +0000 (11:43 -0800)]
srcu: Avoid NULL dereference in srcu_torture_stats_print()

You really shouldn't invoke srcu_torture_stats_print() after invoking
cleanup_srcu_struct(), but there is really no reason to get a
compiler-obfuscated per-CPU-variable NULL pointer dereference as the
diagnostic.  This commit therefore checks for NULL ->sda and makes a
more polite console-message complaint in that case.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agosrcu: Use invalid initial value for srcu_node GP sequence numbers
Paul E. McKenney [Thu, 27 Jan 2022 01:03:06 +0000 (17:03 -0800)]
srcu: Use invalid initial value for srcu_node GP sequence numbers

Currently, tree SRCU relies on the srcu_node structures being initialized
at the same time that the srcu_struct itself is initialized, and thus
uses the initial grace-period sequence number as the initial value for
the srcu_node structure's ->srcu_have_cbs[] and ->srcu_gp_seq_needed_exp
fields.  Although this has a high probability of also working when the
srcu_node array is allocated and initialized at some random later time,
it would be better to avoid leaving such things to chance.

This commit therefore initializes these fields with 0x1, which is a
recognizable invalid value.  It then adds the required checks for this
invalid value in order to avoid confusion on long-running kernels
(especially those on 32-bit systems) that allocate and initialize
srcu_node arrays late in life.
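To see why a late-initialized node benefits from a sentinel, consider wrap-safe sequence comparisons. The sketch below is illustrative only: it assumes a stale initial value of 0 and a grace-period counter that has run close to wrapping, and `seq_lt()`, `needs_gp_naive()` and `needs_gp()` are hypothetical helpers, not the checks actually added to srcutree.c. With a wrap-safe "less than", the stale 0 looks like it is already ahead of the current request, so the node would wrongly conclude that no grace period is needed.

```c
#include <limits.h>
#include <stdbool.h>

#define SEQ_STATE_MASK 0x3UL  /* low bits hold grace-period state, as in rcu_seq */
#define SNP_INIT_SEQ   0x1UL  /* the recognizable invalid value named above */

/* Wrap-safe a < b, in the style of the kernel's ULONG_CMP_LT(). */
static bool seq_lt(unsigned long a, unsigned long b)
{
	return ULONG_MAX / 2 < a - b;
}

/* Old approach: trust whatever value the field was initialized with. */
static bool needs_gp_naive(unsigned long have_cbs, unsigned long s)
{
	return seq_lt(have_cbs, s);
}

/* New approach: the invalid initial value always means "not yet requested". */
static bool needs_gp(unsigned long have_cbs, unsigned long s)
{
	return have_cbs == SNP_INIT_SEQ || seq_lt(have_cbs, s);
}
```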

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agosrcu: Compute snp_seq earlier in srcu_funnel_gp_start()
Paul E. McKenney [Thu, 27 Jan 2022 00:01:26 +0000 (16:01 -0800)]
srcu: Compute snp_seq earlier in srcu_funnel_gp_start()

Currently, srcu_funnel_gp_start() tests snp->srcu_have_cbs[idx] and then
separately assigns it to the snp_seq local variable.  This commit does
the assignment earlier to simplify the code a bit.  While in the area,
this commit also takes advantage of the 100-character line limit to put
the call to srcu_schedule_cbs_sdp() on a single line.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agosrcu: Use export for srcu_struct defined by DEFINE_STATIC_SRCU()
Alexander Aring [Wed, 26 Jan 2022 15:03:54 +0000 (10:03 -0500)]
srcu: Use export for srcu_struct defined by DEFINE_STATIC_SRCU()

If an srcu_struct structure defined by tree SRCU's DEFINE_STATIC_SRCU()
is used by a module, sparse will give the following diagnostic:

sparse: symbol '__srcu_struct_nodes_srcu' was not declared. Should it be static?

The problem is that a within-module DEFINE_STATIC_SRCU() must define
a non-static srcu_struct because it is exported by referencing it in a
special '__section("___srcu_struct_ptrs")'.  This reference is needed
so that module loading and unloading can invoke init_srcu_struct() and
cleanup_srcu_struct(), respectively.  Unfortunately, sparse is unaware of
'__section("___srcu_struct_ptrs")', resulting in the above false-positive
diagnostic.  To avoid this false positive, this commit therefore creates
a prototype of the srcu_struct with an "extern" keyword.
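The pattern can be sketched in user space with GCC's section attribute. Because the diagnostic names the section-resident pointer (`__srcu_struct_nodes_srcu`), this sketch places the "extern" declaration on the pointer. `DEFINE_FAKE_STATIC_SRCU()`, the section name `fake_srcu_ptrs`, and `init_all_registered()` are hypothetical. The loop relies on the `__start_`/`__stop_` bound symbols that GNU ld and lld provide for sections whose names are valid C identifiers, standing in for the module loader walking `___srcu_struct_ptrs`.

```c
#include <stddef.h>

struct fake_srcu_struct {
	int initialized;
};

/*
 * Hypothetical analogue of a within-module DEFINE_STATIC_SRCU(): the pointer
 * recorded in the dedicated section cannot be static, so an "extern"
 * declaration precedes its definition, giving checkers such as sparse the
 * prior declaration they expect for a non-static symbol.
 */
#define DEFINE_FAKE_STATIC_SRCU(name)                                   \
	static struct fake_srcu_struct name;                            \
	extern struct fake_srcu_struct * const __fake_srcu_ptr_##name;  \
	struct fake_srcu_struct * const __fake_srcu_ptr_##name          \
		__attribute__((section("fake_srcu_ptrs"), used)) = &name

DEFINE_FAKE_STATIC_SRCU(nodes_srcu);
DEFINE_FAKE_STATIC_SRCU(other_srcu);

/* Section bounds provided by the linker for identifier-named sections. */
extern struct fake_srcu_struct * const __start_fake_srcu_ptrs[];
extern struct fake_srcu_struct * const __stop_fake_srcu_ptrs[];

/* What a module loader might do: walk the section and initialize each one. */
static size_t init_all_registered(void)
{
	size_t n = 0;

	for (struct fake_srcu_struct * const *pp = __start_fake_srcu_ptrs;
	     pp < __stop_fake_srcu_ptrs; pp++, n++)
		(*pp)->initialized = 1;
	return n;
}
```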

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agosrcu: Add boot-time control over srcu_node array allocation
Paul E. McKenney [Tue, 25 Jan 2022 23:41:10 +0000 (15:41 -0800)]
srcu: Add boot-time control over srcu_node array allocation

This commit adds an srcutree.convert_to_big kernel parameter that either
refuses to convert at all (0), converts immediately at init_srcu_struct()
time (1), or lets rcutorture convert it (2).  An additional contention-based
dynamic conversion choice will be added, along with documentation.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agosrcu: Make rcutorture dump the SRCU size state
Paul E. McKenney [Tue, 25 Jan 2022 01:05:51 +0000 (17:05 -0800)]
srcu: Make rcutorture dump the SRCU size state

This commit adds the numeric and string versions of ->srcu_size_state to
the Tree-SRCU-specific portion of the rcutorture output.

[ paulmck: Apply feedback from kernel test robot and Dan Carpenter. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agosrcu: Add size-state transitioning code
Paul E. McKenney [Mon, 24 Jan 2022 23:41:32 +0000 (15:41 -0800)]
srcu: Add size-state transitioning code

This is just dead code at the moment, but it serves to prevent
spurious compiler warnings about init_srcu_struct_nodes() being unused.
This function will once again be used once the state-transition code
is activated.

Because srcu_barrier() must be aware of the transition before call_srcu(), the
state machine waits for an SRCU grace period before callbacks are queued
to the non-CPU-0 queues.  This requires that portions of srcu_barrier()
be enclosed in an SRCU read-side critical section.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agosrcu: Make Tree SRCU able to operate without snp_node array
Paul E. McKenney [Mon, 24 Jan 2022 17:46:57 +0000 (09:46 -0800)]
srcu: Make Tree SRCU able to operate without snp_node array

This commit makes Tree SRCU able to operate without an snp_node
array, that is, when the srcu_data structures' ->mynode pointers
are NULL.  This can result in high contention on the srcu_struct
structure's ->lock, but only when there are lots of call_srcu(),
synchronize_srcu(), and synchronize_srcu_expedited() calls.

Note that when there is no snp_node array, all SRCU callbacks use
CPU 0's callback queue.  This is optimal in the common case of low
update-side load because it removes the need to search each CPU
for the single callback that made the grace period happen.
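A minimal sketch of the queue choice, assuming (as the text states) that a NULL ->mynode means there is no node array. `struct fake_sdp`, `pick_queue()` and `fake_call_srcu()` are hypothetical; in the kernel the decision is tied to the srcu_struct's size state rather than to this exact test.

```c
#define NCPUS 4

struct fake_node { int id; };

/* Hypothetical per-CPU data: ->mynode is NULL when there is no node array. */
struct fake_sdp {
	struct fake_node *mynode;
	int ncbs;                 /* callbacks queued on this CPU's list */
};

static struct fake_sdp sda[NCPUS];

/* Pick the callback queue for a call_srcu() issued on this_cpu. */
static struct fake_sdp *pick_queue(int this_cpu)
{
	if (!sda[this_cpu].mynode)
		return &sda[0];       /* no node array: everyone uses CPU 0's queue */
	return &sda[this_cpu];
}

static void fake_call_srcu(int this_cpu)
{
	pick_queue(this_cpu)->ncbs++;
}
```

With every callback on one queue, the grace-period machinery only has to look at CPU 0's list, at the price of contention on that list's lock.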

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agorcu-tasks: Set ->percpu_enqueue_shift to zero upon contention
Paul E. McKenney [Thu, 3 Feb 2022 00:34:40 +0000 (16:34 -0800)]
rcu-tasks: Set ->percpu_enqueue_shift to zero upon contention

Currently, call_rcu_tasks_generic() sets ->percpu_enqueue_shift to
order_base_2(nr_cpu_ids) upon encountering sufficient contention.
This does not shift to use of non-CPU-0 callback queues as intended, but
rather continues using only CPU 0's queue.  Although this does provide
some decrease in contention due to spreading work over multiple locks,
it is not the dramatic decrease that was intended.

This commit therefore makes call_rcu_tasks_generic() set
->percpu_enqueue_shift to 0.
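The arithmetic behind this fix can be checked with a small sketch, assuming a CPU enqueues on the queue numbered `cpu >> percpu_enqueue_shift` (the kernel then picks the next possible CPU at or above that number). Every valid CPU number is below 2 raised to order_base_2(nr_cpu_ids), so that shift maps all CPUs to queue 0, while a shift of 0 gives each CPU its own queue. `enqueue_queue()` and `queues_used()` are hypothetical helpers.

```c
/* Hypothetical: map a CPU to the callback queue it enqueues on. */
static int enqueue_queue(int cpu, int percpu_enqueue_shift)
{
	return cpu >> percpu_enqueue_shift;
}

/* Count distinct queues used by CPUs 0..ncpus-1 (queue index is monotone in cpu). */
static int queues_used(int ncpus, int shift)
{
	int n = 0, last = -1;

	for (int cpu = 0; cpu < ncpus; cpu++) {
		int q = enqueue_queue(cpu, shift);

		if (q != last) {
			n++;
			last = q;
		}
	}
	return n;
}
```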

Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agorcu-tasks: Use order_base_2() instead of ilog2()
Paul E. McKenney [Wed, 2 Feb 2022 23:42:36 +0000 (15:42 -0800)]
rcu-tasks: Use order_base_2() instead of ilog2()

The ilog2() function can be used to generate a shift count, but it will
generate the same count for a power of two as for one greater than a power
of two.  This results in shift counts that are larger than necessary for
systems with a power-of-two number of CPUs because the CPUs are numbered
from zero, so that the maximum CPU number is one less than that power
of two.

This commit therefore substitutes order_base_2(), which appears to have
been designed for exactly this use case.
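A user-space sketch contrasting the two, with `my_ilog2()` (floor of log2) and `my_order_base_2()` (ceiling of log2) as hypothetical re-implementations of the kernel's ilog2() and order_base_2() for n >= 1. The expression `ilog2(n) + 1` used below is an assumption about how an ilog2()-based shift count could be formed; the text does not show the old expression. With four CPUs, numbered 0 through 3, two bits suffice, yet that expression yields three.

```c
/* Floor of log2(n) for n >= 1, like the kernel's ilog2(). */
static int my_ilog2(unsigned long n)
{
	int r = -1;

	while (n) {
		n >>= 1;
		r++;
	}
	return r;
}

/* Ceiling of log2(n) for n >= 1, like the kernel's order_base_2(). */
static int my_order_base_2(unsigned long n)
{
	return n > 1 ? my_ilog2(n - 1) + 1 : 0;
}
```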

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agorcu: Create and use an rcu_rdp_cpu_online()
Paul E. McKenney [Fri, 10 Dec 2021 21:44:17 +0000 (13:44 -0800)]
rcu: Create and use an rcu_rdp_cpu_online()

The pattern "rdp->grpmask & rcu_rnp_online_cpus(rnp)" occurs frequently
in RCU code in order to determine whether rdp->cpu is online from an
RCU perspective.  This commit therefore creates an rcu_rdp_cpu_online()
function to replace it.
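A simplified sketch of such a helper. The structures are hypothetical, heavily trimmed stand-ins for rcu_node and rcu_data, and the assumption that the node's online mask is its ->qsmaskinitnext field is labeled as such in the code.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for rcu_node and rcu_data. */
struct fake_rnp {
	unsigned long qsmaskinitnext;   /* assumed: CPUs this node considers online */
};

struct fake_rdp {
	int cpu;
	unsigned long grpmask;          /* this CPU's bit within its node */
	struct fake_rnp *mynode;
};

static unsigned long fake_rnp_online_cpus(struct fake_rnp *rnp)
{
	return rnp->qsmaskinitnext;
}

/* Is rdp->cpu online from RCU's perspective? */
static bool fake_rdp_cpu_online(struct fake_rdp *rdp)
{
	return rdp->grpmask & fake_rnp_online_cpus(rdp->mynode);
}
```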

[ paulmck: Apply kernel test robot unused-variable feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agorcu: Make rcu_barrier() no longer block CPU-hotplug operations
Paul E. McKenney [Tue, 14 Dec 2021 21:35:17 +0000 (13:35 -0800)]
rcu: Make rcu_barrier() no longer block CPU-hotplug operations

This commit removes the cpus_read_lock() and cpus_read_unlock() calls
from rcu_barrier(), thus allowing CPUs to come and go during the course
of rcu_barrier() execution.  Posting of the ->barrier_head callbacks does
synchronize with portions of RCU's CPU-hotplug notifiers, but these locks
are held for short time periods on both sides.  Thus, full CPU-hotplug
operations could both start and finish during the execution of a given
rcu_barrier() invocation.

Additional synchronization is provided by a global ->barrier_lock.
Since the ->barrier_lock is only used during rcu_barrier() execution and
while onlining or offlining a CPU, the contention for this lock should
be low.  It might be tempting to make use of a per-CPU lock just on
general principles, but straightforward attempts to do this have the
problems shown below.

Initial state: 3 CPUs present, CPU0 and CPU1 do not have
any callbacks and CPU2 has callbacks.

1. CPU0 calls rcu_barrier().

2. CPU1 starts offlining for CPU2. CPU1 calls
   rcutree_migrate_callbacks(). rcu_barrier_entrain() is called
   from rcutree_migrate_callbacks(), with CPU2's rdp->barrier_lock.
   It does not entrain ->barrier_head for CPU2, as rcu_barrier()
   on CPU0 hasn't started the barrier sequence (by calling
   rcu_seq_start(&rcu_state.barrier_sequence)) yet.

3. CPU0 starts new barrier sequence. It iterates over
   CPU0 and CPU1, after acquiring their per-cpu ->barrier_lock
   and finds 0 segcblist length. It updates ->barrier_seq_snap
   for CPU0 and CPU1 and continues loop iteration to CPU2.

    for_each_possible_cpu(cpu) {
        rdp = per_cpu_ptr(&rcu_data, cpu);
        raw_spin_lock_irqsave(&rdp->barrier_lock, flags);
        if (!rcu_segcblist_n_cbs(&rdp->cblist)) {
            WRITE_ONCE(rdp->barrier_seq_snap, gseq);
            raw_spin_unlock_irqrestore(&rdp->barrier_lock, flags);
            rcu_barrier_trace(TPS("NQ"), cpu, rcu_state.barrier_sequence);
            continue;
        }
        /* ... otherwise entrain ->barrier_head for this CPU ... */
    }

4. rcutree_migrate_callbacks() completes execution on CPU1.
   Segcblist len for CPU2 becomes 0.

5. The loop iteration on CPU0 checks rcu_segcblist_n_cbs(&rdp->cblist)
   for CPU2 and completes the loop iteration after setting
   ->barrier_seq_snap.

6. Because no ->barrier_head callback has been entrained,
   rcu_barrier() on CPU0 returns at this point.

7. The callbacks that migrated from CPU2 to CPU1 then execute, even
   though rcu_barrier() has already returned.

Straightforward per-CPU locking is also subject to the following race
condition noted by Boqun Feng:

1. CPU0 calls rcu_barrier(), starting a new barrier sequence by invoking
   rcu_seq_start() and init_completion(), but does not yet initialize
   rcu_state.barrier_cpu_count.

2. CPU1 starts offlining for CPU2, calling rcutree_migrate_callbacks(),
   which in turn calls rcu_barrier_entrain() holding CPU2's
   rdp->barrier_lock.  It then entrains ->barrier_head for CPU2
   and atomically increments rcu_state.barrier_cpu_count, which is
   unfortunately not yet initialized to the value 2.

3. The just-entrained RCU callback is invoked.  It atomically
   decrements rcu_state.barrier_cpu_count and sees that it is
   now zero.  This callback therefore invokes complete().

4. CPU0 continues executing rcu_barrier(), but is not blocked
   by its call to wait_for_completion().  This results in rcu_barrier()
   returning before all pre-existing callbacks have been invoked,
   which is a bug.

Therefore, synchronization is provided by rcu_state.barrier_lock,
which is also held across the initialization sequence, especially the
rcu_seq_start() and the atomic_set() that sets rcu_state.barrier_cpu_count
to the value 2.  In addition, this lock is held when entraining the
rcu_barrier() callback, when deciding whether or not a CPU has callbacks
that rcu_barrier() must wait on, when setting the ->qsmaskinitnext for
incoming CPUs, and when migrating callbacks from a CPU that is going
offline.
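The ordering that rcu_state.barrier_lock enforces can be modeled in user space. This is a simplified sketch, not the kernel code: a pthread mutex plays barrier_lock, C11 atomics play barrier_cpu_count and the completion, and `struct fake_barrier`, `barrier_start()`, `barrier_entrain()`, `barrier_cb()` and `barrier_finish_posting()` are hypothetical. Because the sequence start and the count initialization happen under the same lock that the migration path takes, an entrained callback can never see an uninitialized count, and the initial value of 2 (mirroring the text) keeps the count above zero until the initiator drops it.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the rcu_barrier() bookkeeping above. */
struct fake_barrier {
	pthread_mutex_t lock;      /* plays the role of rcu_state.barrier_lock */
	unsigned long seq;         /* odd while a barrier is in flight */
	atomic_int cpu_count;      /* plays the role of barrier_cpu_count */
	atomic_bool done;          /* plays the role of the completion */
};

/* The entrained barrier callback. */
static void barrier_cb(struct fake_barrier *b)
{
	if (atomic_fetch_sub(&b->cpu_count, 1) == 1)
		atomic_store(&b->done, true);
}

/* Start a barrier: sequence start and count initialization under the lock. */
static void barrier_start(struct fake_barrier *b)
{
	pthread_mutex_lock(&b->lock);
	b->seq++;                          /* like rcu_seq_start() */
	atomic_store(&b->done, false);
	atomic_store(&b->cpu_count, 2);    /* initiator's bias, as in the text */
	pthread_mutex_unlock(&b->lock);
}

/* Migration path: entrain on the barrier's behalf only if one is in flight. */
static bool barrier_entrain(struct fake_barrier *b)
{
	bool entrained = false;

	pthread_mutex_lock(&b->lock);
	if (b->seq & 1) {
		atomic_fetch_add(&b->cpu_count, 1);
		entrained = true;
	}
	pthread_mutex_unlock(&b->lock);
	return entrained;
}

/* Initiator has finished posting its callbacks: drop the bias. */
static void barrier_finish_posting(struct fake_barrier *b)
{
	if (atomic_fetch_sub(&b->cpu_count, 2) == 2)
		atomic_store(&b->done, true);
}
```

The model omits the wait itself and the rcu_seq_end() step; it only shows why completion cannot fire before the initiator has finished.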

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agorcu: Rework rcu_barrier() and callback-migration logic
Paul E. McKenney [Tue, 14 Dec 2021 21:15:18 +0000 (13:15 -0800)]
rcu: Rework rcu_barrier() and callback-migration logic

This commit reworks rcu_barrier() and callback-migration logic to
allow rcu_barrier() to run concurrently with CPU-hotplug
operations.  The key trick is for callback migration to check to see if
an rcu_barrier() is in flight, and, if so, enqueue the ->barrier_head
callback on its behalf.

This commit adds synchronization with RCU's CPU-hotplug notifiers.  Taken
together, this will permit a later commit to remove the cpus_read_lock()
and cpus_read_unlock() calls from rcu_barrier().

[ paulmck: Updated per kbuild test robot feedback. ]
[ paulmck: Updated per review session with Neeraj, Frederic, Uladzislau, and Boqun. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agorcu: Refactor rcu_barrier() empty-list handling
Paul E. McKenney [Sat, 11 Dec 2021 00:25:20 +0000 (16:25 -0800)]
rcu: Refactor rcu_barrier() empty-list handling

This commit saves a few lines by checking first for an empty callback
list.  If the callback list is empty, then that CPU is taken care of,
regardless of its online or nocb state.  Also simplify tracing accordingly
and fold a few lines together.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agorcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion
David Woodhouse [Tue, 16 Feb 2021 15:04:34 +0000 (15:04 +0000)]
rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion

If we allow architectures to bring APs online in parallel, then we end
up requiring rcu_cpu_starting() to be reentrant. But currently, the
manipulation of rnp->ofl_seq is not thread-safe.

However, rnp->ofl_seq is also fairly much pointless anyway since both
rcu_cpu_starting() and rcu_report_dead() hold rcu_state.ofl_lock for
fairly much the whole time that rnp->ofl_seq is set to an odd number
to indicate that an operation is in progress.

So drop rnp->ofl_seq completely, and use only rcu_state.ofl_lock.

This has a couple of minor complexities: lockdep will complain when we
take rcu_state.ofl_lock, and currently accepts the 'excuse' of having
an odd value in rnp->ofl_seq. So switch it to an arch_spinlock_t to
avoid that false positive complaint. Since we're killing rnp->ofl_seq
of course that 'excuse' has to be changed too, so make it check for
arch_spin_is_locked(rcu_state.ofl_lock).

There's no arch_spin_lock_irqsave() so we have to manually save and
restore local interrupts around the locking.

At Paul's request based on Neeraj's analysis, make rcu_gp_init not just
wait but *exclude* any CPU online/offline activity, which was fairly
much true already by virtue of it holding rcu_state.ofl_lock.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agosrcu: Dynamically allocate srcu_node array
Paul E. McKenney [Sat, 22 Jan 2022 00:13:52 +0000 (16:13 -0800)]
srcu: Dynamically allocate srcu_node array

This commit shrinks the srcu_struct structure by converting its ->node
field from a fixed-size compile-time array to a pointer to a dynamically
allocated array.  In kernels built with large values of NR_CPUS that boot
on systems with smaller numbers of CPUs, this can save significant memory.

[ paulmck: Apply kernel test robot feedback. ]

Reported-by: A cast of thousands
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agoclocksource: Add a Kconfig option for WATCHDOG_MAX_SKEW
Waiman Long [Mon, 6 Dec 2021 03:38:15 +0000 (22:38 -0500)]
clocksource: Add a Kconfig option for WATCHDOG_MAX_SKEW

A watchdog maximum skew of 100us may still be too small for
some systems or archs. It may also be too small when some kernel
debug config options are enabled.  So add a new Kconfig option
CLOCKSOURCE_WATCHDOG_MAX_SKEW_US to allow kernel builders to have more
control over the threshold for marking a clocksource as unstable.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agotools/memory-model: Use "-unroll 0" to keep --hw runs finite
Paul E. McKenney [Tue, 25 Jun 2019 05:30:32 +0000 (22:30 -0700)]
tools/memory-model: Use "-unroll 0" to keep --hw runs finite

Litmus tests involving atomic operations produce LL/SC loops on a number
of architectures, and unrolling these loops can result in excessive
verification times or even stack overflows.  This commit therefore uses
the "-unroll 0" herd7 argument to avoid unrolling, on the grounds that
additional passes through an LL/SC loop should not change the verification.

Note, however, that certain bugs in the mapping of the LL/SC loop to
machine instructions may go undetected.  On the other hand, herd7 might
not be the best vehicle for finding such bugs in any case.  (You do
stress-test your architecture-specific code, don't you?)

Suggested-by: Luc Maranget <luc.maranget@inria.fr>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agotools/memory-model: Make judgelitmus.sh handle scripted Result: tag
Paul E. McKenney [Thu, 6 Jun 2019 09:13:27 +0000 (02:13 -0700)]
tools/memory-model: Make judgelitmus.sh handle scripted Result: tag

The scripts that generate the litmus tests in the "auto" directory of
the https://github.com/paulmckrcu/litmus archive place the "Result:"
tag into a single-line ocaml comment, which judgelitmus.sh currently
does not recognize.  This commit therefore makes judgelitmus.sh
recognize both the multiline comment format that it currently does
and the automatically generated single-line format.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agotools/memory-model: Add data-race capabilities to judgelitmus.sh
Paul E. McKenney [Fri, 3 May 2019 14:34:20 +0000 (07:34 -0700)]
tools/memory-model: Add data-race capabilities to judgelitmus.sh

This commit adds functionality to judgelitmus.sh to allow it to handle
both the "DATARACE" markers in the "Result:" comments in litmus tests
and the "Flag data-race" markers in LKMM output.  For C-language tests,
if either marker is present, the other must be present as well, at least for
litmus tests having a "Result:" comment.  If the LKMM output indicates
a data race, then failures of the Always/Sometimes/Never portion of the
"Result:" prediction are forgiven.

The reason for forgiving "Result:" mispredictions is that data races can
result in "interesting" compiler optimizations, so that all bets are off
in the data-race case.

[ paulmck: Apply Akira Yokosawa feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agotools/memory-model: Add checktheselitmus.sh to run specified litmus tests
Paul E. McKenney [Thu, 2 May 2019 17:05:14 +0000 (10:05 -0700)]
tools/memory-model: Add checktheselitmus.sh to run specified litmus tests

This commit adds a checktheselitmus.sh script that runs the litmus tests
specified on the command line.  This is useful for verifying fixes to
specific litmus tests.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agotools/memory-model: Repair parseargs.sh header comment
Paul E. McKenney [Thu, 2 May 2019 17:03:29 +0000 (10:03 -0700)]
tools/memory-model: Repair parseargs.sh header comment

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agotools/memory-model: Add "--" to parseargs.sh for additional arguments
Paul E. McKenney [Thu, 2 May 2019 16:51:57 +0000 (09:51 -0700)]
tools/memory-model:  Add "--" to parseargs.sh for additional arguments

Currently, parseargs.sh expects to consume all the command-line arguments,
which prevents the calling script from having any of its own arguments.
This commit therefore causes parseargs.sh to stop consuming arguments
when it encounters a "--" argument, leaving any remaining arguments for
the calling script.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agotools/memory-model: Make history-check scripts use mselect7
Paul E. McKenney [Mon, 8 Apr 2019 17:02:23 +0000 (10:02 -0700)]
tools/memory-model: Make history-check scripts use mselect7

The history-check scripts currently use grep to ignore non-C-language
litmus tests, which is a bit fragile.  This commit therefore enlists the
aid of "mselect7 -arch C", given Luc Maranget's recent modifications that
allow mselect7 to operate in filter mode.

This change requires herdtools 7.52-32-g1da3e0e50977 or later.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agotools/memory-model: Make checkghlitmus.sh use mselect7
Paul E. McKenney [Mon, 8 Apr 2019 16:27:28 +0000 (09:27 -0700)]
tools/memory-model: Make checkghlitmus.sh use mselect7

The checkghlitmus.sh script currently uses grep to ignore non-C-language
litmus tests, which is a bit fragile.  This commit therefore enlists the
aid of "mselect7 -arch C", given Luc Maranget's recent modifications that
allow mselect7 to operate in filter mode.

This change requires herdtools 7.52-32-g1da3e0e50977 or later.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agotools/memory-model: Fix scripting --jobs argument
Paul E. McKenney [Wed, 27 Mar 2019 18:47:14 +0000 (11:47 -0700)]
tools/memory-model: Fix scripting --jobs argument

The parseargs.sh regular expression for the --jobs argument incorrectly
requires that the number of jobs be at least 10, that is, have at least
two digits.  This commit therefore adjusts this regular expression to
allow single-digit numbers of jobs to be specified.
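The shape of this bug can be illustrated with POSIX extended regular expressions from C. The two patterns below are hypothetical reconstructions, not the actual parseargs.sh expressions (which are shell, not C): the first demands at least two digits, reproducing the described bug, while the second accepts any positive integer.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical patterns: two-or-more digits (buggy) vs. any positive integer. */
static const char *buggy_jobs_re = "^[1-9][0-9][0-9]*$";
static const char *fixed_jobs_re = "^[1-9][0-9]*$";

/* Return true if arg matches the extended regular expression pattern. */
static bool matches(const char *pattern, const char *arg)
{
	regex_t re;
	bool ok;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	ok = regexec(&re, arg, 0, NULL, 0) == 0;
	regfree(&re);
	return ok;
}
```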

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agotools/memory-model: Implement --hw support for checkghlitmus.sh
Paul E. McKenney [Sat, 23 Mar 2019 00:18:43 +0000 (17:18 -0700)]
tools/memory-model: Implement --hw support for checkghlitmus.sh

This commit enables the "--hw" argument for the checkghlitmus.sh script,
causing it to convert any applicable C-language litmus tests to the
specified flavor of assembly language, to verify these assembly-language
litmus tests, and to check compatibility of the outcomes.

Note that the conversion does not yet handle locking, RCU, SRCU, plain
C-language memory accesses, or casts.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agotools/memory-model: Add -v flag to jingle7 runs
Paul E. McKenney [Fri, 5 Apr 2019 19:34:56 +0000 (12:34 -0700)]
tools/memory-model: Add -v flag to jingle7 runs

Adding the -v flag to jingle7 invocations gives much useful information
on why jingle7 didn't like a given litmus test.  This commit therefore
adds this flag and saves off any such information into a .err file.

Suggested-by: Luc Maranget <luc.maranget@inria.fr>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agotools/memory-model: Make runlitmus.sh check for jingle errors
Paul E. McKenney [Tue, 26 Mar 2019 00:20:51 +0000 (17:20 -0700)]
tools/memory-model: Make runlitmus.sh check for jingle errors

It turns out that the jingle7 tool is currently a bit picky about
the litmus tests it is willing to process.  This commit therefore
ensures that jingle7 failures are reported.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agotools/memory-model: Allow herd to deduce CPU type
Paul E. McKenney [Fri, 22 Mar 2019 15:57:20 +0000 (08:57 -0700)]
tools/memory-model: Allow herd to deduce CPU type

Currently, the scripts specify the CPU's .cat file to herd.  But this is
pointless because herd will select a good and sufficient .cat file from
the assembly-language litmus test itself.  This commit therefore removes
the -model argument to herd, allowing herd to figure the CPU family out
itself.

Note that the user can override herd's choice using the "--herdopts"
argument to the scripts.

Suggested-by: Luc Maranget <luc.maranget@inria.fr>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
3 years agotools/memory-model: Keep assembly-language litmus tests
Paul E. McKenney [Thu, 21 Mar 2019 21:44:09 +0000 (14:44 -0700)]
tools/memory-model: Keep assembly-language litmus tests

This commit retains the assembly-language litmus tests generated from
the C-language litmus tests, appending the hardware tag to the original
C-language litmus test's filename.  Thus, S+poonceonces.litmus.AArch64
contains the Armv8 assembly language corresponding to the C-language
S+poonceonces.litmus test.

This commit also updates the .gitignore to avoid committing these
automatically generated assembly-language litmus tests.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>