Herbert Xu [Sun, 16 Feb 2025 03:07:22 +0000 (11:07 +0800)]
crypto: ahash - Add virtual address support
This patch adds virtual address support to ahash. Virtual addresses
were previously only supported through shash. The user may choose
to use virtual addresses with ahash by calling ahash_request_set_virt
instead of ahash_request_set_crypt.
The API will take care of translating this to an SG list if necessary,
unless the algorithm declares that it supports chaining. Therefore
in order for an ahash algorithm to support chaining, it must also
support virtual addresses directly.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Sun, 16 Feb 2025 03:07:19 +0000 (11:07 +0800)]
crypto: tcrypt - Restore multibuffer ahash tests
This patch is a revert of commit
388ac25efc8ce3bf9768ce7bf24268d6fac285d5.
As multibuffer ahash is coming back in the form of request chaining,
restore the multibuffer ahash tests using the new interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Sun, 16 Feb 2025 03:07:17 +0000 (11:07 +0800)]
crypto: hash - Add request chaining API
This adds request chaining to the ahash interface. Request chaining
allows multiple requests to be submitted in one shot. An algorithm
can elect to receive chained requests by setting the flag
CRYPTO_ALG_REQ_CHAIN. If this bit is not set, the API will break
up chained requests and submit them one-by-one.
A new err field is added to struct crypto_async_request to record
the return value for each individual request.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Sun, 16 Feb 2025 03:07:15 +0000 (11:07 +0800)]
crypto: x86/ghash - Use proper helpers to clone request
Rather than copying a request by hand with memcpy, use the correct
API helpers to set up the new request. This will matter once the
API helpers start setting up chained requests, as a simple memcpy
will break chaining.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Sun, 16 Feb 2025 03:07:12 +0000 (11:07 +0800)]
crypto: ahash - Only save callback and data in ahash_save_req
As unaligned operations are supported by the underlying algorithm,
ahash_save_req and ahash_restore_req can be greatly simplified to
only preserve the callback and data.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Christian Marangi [Sat, 15 Feb 2025 23:36:18 +0000 (00:36 +0100)]
crypto: inside-secure/eip93 - Correctly handle return of sg_nents_for_len
Fix smatch warning for sg_nents_for_len return value in Inside Secure
EIP93 driver.
The return value of sg_nents_for_len was assigned to a u32, so a
negative error code was converted to a large positive integer and the
error was ignored.
Rework the code to correctly handle the error from sg_nents_for_len and
silence the smatch warning.
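The failure mode described above can be sketched in user space. This is
only an illustration, not the driver code: fake_nents_for_len() is a
hypothetical stand-in for sg_nents_for_len(), and the fixed entry size
of 16 bytes is an assumption made to keep the example small.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for sg_nents_for_len(): returns the number of
 * entries needed to cover len bytes, or -EINVAL if the list is too short. */
static int fake_nents_for_len(unsigned int total_len, unsigned int entry_len,
			      unsigned int len)
{
	assert(entry_len > 0);
	if (len > total_len)
		return -EINVAL;
	return (int)((len + entry_len - 1) / entry_len);
}

/* Buggy pattern: the negative error is stored in an unsigned variable,
 * so it turns into a huge positive count and is never noticed. */
static uint32_t count_buggy(unsigned int total_len, unsigned int len)
{
	uint32_t nents = fake_nents_for_len(total_len, 16, len);

	return nents;
}

/* Fixed pattern: keep the result in an int and propagate errors. */
static int count_fixed(unsigned int total_len, unsigned int len,
		       uint32_t *nents_out)
{
	int nents = fake_nents_for_len(total_len, 16, len);

	if (nents < 0)
		return nents;
	*nents_out = (uint32_t)nents;
	return 0;
}
```

With a 32-byte list and a 64-byte request, count_buggy() reports a
count in the billions, while count_fixed() returns -EINVAL.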
Fixes: 9739f5f93b78 ("crypto: eip93 - Add Inside Secure SafeXcel EIP-93 crypto engine support")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Sat, 15 Feb 2025 00:57:51 +0000 (08:57 +0800)]
crypto: skcipher - Zap type in crypto_alloc_sync_skcipher
The type needs to be zeroed as otherwise the user could use it to
allocate an asynchronous sync skcipher.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Małgorzata Mielnik [Fri, 14 Feb 2025 16:40:43 +0000 (16:40 +0000)]
crypto: qat - refactor service parsing logic
The service parsing logic is used to parse the configuration string
provided by the user using the attribute qat/cfg_services in sysfs.
The logic relies on hard-coded strings. For example, the service
"sym;asym" is also replicated as "asym;sym".
This makes the addition of new services or service combinations
complex as it requires the addition of new hard-coded strings for all
possible combinations.
This commit addresses this issue by:
* reducing the number of internal service strings to only the basic
service representations.
* modifying the service parsing logic to analyze the service string
token by token instead of comparing a whole string with patterns.
* introducing the concept of a service mask where each service is
represented by a single bit.
* dividing the parsing logic into several functions to allow for code
reuse (e.g. by sysfs-related functions).
* introducing a new, device generation-specific function to verify
whether the requested service combination is supported by the
currently used device.
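As a rough user-space sketch of the token-by-token approach with a
service mask, the following may help. The names parse_services(),
svc_table and the SVC_* bits are hypothetical and are not taken from
the driver; the "dc" entry is included only to show how a further
service would be added.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical service bits; the real driver's names and values may differ. */
enum {
	SVC_SYM  = 1u << 0,
	SVC_ASYM = 1u << 1,
	SVC_DC   = 1u << 2,
};

static const struct {
	const char *name;
	unsigned int bit;
} svc_table[] = {
	{ "sym", SVC_SYM },
	{ "asym", SVC_ASYM },
	{ "dc", SVC_DC },
};

/* Parse a ';'-separated service string into a mask.
 * Returns 0 on success, -1 on an unknown, empty, or repeated token. */
static int parse_services(const char *str, unsigned int *mask_out)
{
	unsigned int mask = 0;
	const char *p = str;

	assert(str && mask_out);
	for (;;) {
		size_t len = strcspn(p, ";");
		unsigned int bit = 0;
		size_t i;

		for (i = 0; i < sizeof(svc_table) / sizeof(svc_table[0]); i++) {
			if (strlen(svc_table[i].name) == len &&
			    strncmp(p, svc_table[i].name, len) == 0)
				bit = svc_table[i].bit;
		}
		if (!bit || (mask & bit))
			return -1;
		mask |= bit;
		if (p[len] == '\0')
			break;
		p += len + 1;
	}
	*mask_out = mask;
	return 0;
}
```

Because each service is a single bit, "sym;asym" and "asym;sym" yield
the same mask, so no per-ordering string has to be stored.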
Signed-off-by: Małgorzata Mielnik <malgorzata.mielnik@intel.com>
Co-developed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Giovanni Cabiddu [Fri, 14 Feb 2025 16:40:42 +0000 (16:40 +0000)]
crypto: qat - do not export adf_cfg_services
The symbol `adf_cfg_services` is only used in the intel_qat module.
There is no need to export it.
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Fri, 14 Feb 2025 06:02:08 +0000 (14:02 +0800)]
crypto: skcipher - Set tfm in SYNC_SKCIPHER_REQUEST_ON_STACK
Set the request tfm directly in SYNC_SKCIPHER_REQUEST_ON_STACK since
the tfm is already available.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Fri, 14 Feb 2025 02:31:25 +0000 (10:31 +0800)]
crypto: api - Fix larval relookup type and mask
When the lookup is retried after instance construction, it uses
the type and mask from the larval, which may not match the values
used by the caller. For example, if the caller requests a
!NEEDS_FALLBACK algorithm, it may end up getting an algorithm
that needs fallbacks.
Fix this by making the caller supply the type/mask and using that
for the lookup.
Reported-by: Coiby Xu <coxu@redhat.com>
Fixes: 96ad59552059 ("crypto: api - Remove instance larval fulfilment")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Abel Vesa [Thu, 13 Feb 2025 12:37:05 +0000 (14:37 +0200)]
dt-bindings: crypto: qcom-qce: Document the X1E80100 crypto engine
Document the crypto engine on the X1E80100 Platform.
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 12 Feb 2025 06:10:07 +0000 (14:10 +0800)]
crypto: null - Use spin lock instead of mutex
As the null algorithm may be freed in softirq context through
af_alg, use spin locks instead of mutexes to protect the default
null algorithm.
Reported-by: syzbot+b3e02953598f447d4d2a@syzkaller.appspotmail.com
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 12 Feb 2025 04:48:55 +0000 (12:48 +0800)]
crypto: lib/Kconfig - Fix lib built-in failure when arch is modular
The HAVE_ARCH Kconfig options in lib/crypto try to solve the
modular versus built-in problem, but this still fails when the
LIB option (e.g., CRYPTO_LIB_CURVE25519) is selected externally.
Fix this by introducing a level of indirection with ARCH_MAY_HAVE
Kconfig options. These then go on to select the ARCH_HAVE options
if the ARCH Kconfig option matches that of the LIB option.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501230223.ikroNDr1-lkp@intel.com/
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Giovanni Cabiddu [Tue, 11 Feb 2025 09:58:53 +0000 (09:58 +0000)]
crypto: qat - reorder objects in qat_common Makefile
The objects in the qat_common Makefile are currently listed in a random
order.
Reorder the objects alphabetically to make it easier to find where to
add a new object.
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Giovanni Cabiddu [Tue, 11 Feb 2025 09:58:52 +0000 (09:58 +0000)]
crypto: qat - fix object goals in Makefiles
Align with kbuild documentation by using <module_name>-y instead of
<module_name>-objs, following the kernel convention for building modules
from multiple object files.
Link: https://docs.kernel.org/kbuild/makefiles.html#loadable-module-goals-obj-m
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thorsten Blum [Tue, 11 Feb 2025 09:52:54 +0000 (10:52 +0100)]
crypto: aead - use str_yes_no() helper in crypto_aead_show()
Remove hard-coded strings by using the str_yes_no() helper function.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thorsten Blum [Mon, 10 Feb 2025 22:36:44 +0000 (23:36 +0100)]
crypto: bcm - set memory to zero only once
Use kmalloc_array() instead of kcalloc() because sg_init_table() already
sets the memory to zero. This avoids zeroing the memory twice.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Mon, 10 Feb 2025 17:17:40 +0000 (09:17 -0800)]
crypto: x86/aes-xts - change license to Apache-2.0 OR BSD-2-Clause
As with the other AES modes I've implemented, I've received interest in
my AES-XTS assembly code being reused in other projects. Therefore,
change the license to Apache-2.0 OR BSD-2-Clause like what I used for
AES-GCM. Apache-2.0 is the license of OpenSSL and BoringSSL.
Note that it is difficult to *directly* share code between the kernel,
OpenSSL, and BoringSSL for various reasons such as perlasm vs. plain
asm, Windows ABI support, different divisions of responsibility between
C and asm in each project, etc. So whether that will happen instead of
just doing ports is still TBD. But this dual license should at least
make it possible to port changes between the projects.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Mon, 10 Feb 2025 16:50:20 +0000 (08:50 -0800)]
crypto: x86/aes-ctr - rewrite AESNI+AVX optimized CTR and add VAES support
Delete aes_ctrby8_avx-x86_64.S and add a new assembly file
aes-ctr-avx-x86_64.S which follows a similar approach to
aes-xts-avx-x86_64.S in that it uses a "template" to provide AESNI+AVX,
VAES+AVX2, VAES+AVX10/256, and VAES+AVX10/512 code, instead of just
AESNI+AVX. Wire it up to the crypto API accordingly.
This greatly improves the performance of AES-CTR and AES-XCTR on
VAES-capable CPUs, with the best case being AMD Zen 5 where an over 230%
increase in throughput is seen on long messages. Performance on
non-VAES-capable CPUs remains about the same, and the non-AVX AES-CTR
code (aesni_ctr_enc) is also kept as-is for now. There are some slight
regressions (less than 10%) on some short message lengths on some CPUs;
these are difficult to avoid, given how the previous code was so heavily
unrolled by message length, and they are not particularly important.
Detailed performance results are given in the tables below.
Both CTR and XCTR support is retained. The main loop remains
8-vector-wide, which differs from the 4-vector-wide main loops that are
used in the XTS and GCM code. A wider loop is appropriate for CTR and
XCTR since they have fewer other instructions (such as vpclmulqdq) to
interleave with the AES instructions.
Similar to what was the case for AES-GCM, the new assembly code also has
a much smaller binary size, as it fixes the excessive unrolling by data
length and key length present in the old code. Specifically, the new
assembly file compiles to about 9 KB of text vs. 28 KB for the old file.
This is despite 4x as many implementations being included.
The tables below show the detailed performance results. The tables show
percentage improvement in single-threaded throughput for repeated
encryption of the given message length; an increase from 6000 MB/s to
12000 MB/s would be listed as 100%. They were collected by directly
measuring the Linux crypto API performance using a custom kernel module.
The tested CPUs were all server processors from Google Compute Engine
except for Zen 5 which was a Ryzen 9 9950X desktop processor.
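For reference, the percentage convention described above can be
written as a small helper. This is a sketch of the arithmetic only;
improvement_percent() is a hypothetical name and is not part of the
benchmark module.

```c
#include <assert.h>

/* Percentage improvement in throughput when going from old_mbps to
 * new_mbps. Doubling the throughput gives 100. */
static double improvement_percent(double old_mbps, double new_mbps)
{
	assert(old_mbps > 0.0);
	return (new_mbps / old_mbps - 1.0) * 100.0;
}
```

Under this convention an entry of 100% means twice the old throughput,
and a negative entry means the new code is slower for that length.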
Table 1: AES-256-CTR throughput improvement,
         CPU microarchitecture vs. message length in bytes:
                     | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
---------------------+-------+-------+-------+-------+-------+-------+
AMD Zen 5            |  232% |  203% |  212% |  143% |   71% |   95% |
Intel Emerald Rapids |  116% |  116% |  117% |   91% |   78% |   79% |
Intel Ice Lake       |  109% |  103% |  107% |   81% |   54% |   56% |
AMD Zen 4            |  109% |   91% |  100% |   70% |   43% |   59% |
AMD Zen 3            |   92% |   78% |   87% |   57% |   32% |   43% |
AMD Zen 2            |    9% |    8% |   14% |   12% |    8% |   21% |
Intel Skylake        |    7% |    7% |    8% |    5% |    3% |    8% |
                     |   300 |   200 |    64 |    63 |    16 |
---------------------+-------+-------+-------+-------+-------+
AMD Zen 5            |   57% |   39% |   -9% |    7% |   -7% |
Intel Emerald Rapids |   37% |   42% |   -0% |   13% |   -8% |
Intel Ice Lake       |   39% |   30% |   -1% |   14% |   -9% |
AMD Zen 4            |   42% |   38% |   -0% |   18% |   -3% |
AMD Zen 3            |   38% |   35% |    6% |   31% |    5% |
AMD Zen 2            |   24% |   23% |    5% |   30% |    3% |
Intel Skylake        |    9% |    1% |   -4% |   10% |   -7% |
Table 2: AES-256-XCTR throughput improvement,
         CPU microarchitecture vs. message length in bytes:
                     | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
---------------------+-------+-------+-------+-------+-------+-------+
AMD Zen 5            |  240% |  201% |  216% |  151% |   75% |  108% |
Intel Emerald Rapids |  100% |   99% |  102% |   91% |   94% |  104% |
Intel Ice Lake       |   93% |   89% |   92% |   74% |   50% |   64% |
AMD Zen 4            |   86% |   75% |   83% |   60% |   41% |   52% |
AMD Zen 3            |   73% |   63% |   69% |   45% |   21% |   33% |
AMD Zen 2            |   -2% |   -2% |    2% |    3% |   -1% |   11% |
Intel Skylake        |   -1% |   -1% |    1% |    2% |   -1% |    9% |
                     |   300 |   200 |    64 |    63 |    16 |
---------------------+-------+-------+-------+-------+-------+
AMD Zen 5            |   78% |   56% |   -4% |   38% |   -2% |
Intel Emerald Rapids |   61% |   55% |    4% |   32% |   -5% |
Intel Ice Lake       |   57% |   42% |    3% |   44% |   -4% |
AMD Zen 4            |   35% |   28% |   -1% |   17% |   -3% |
AMD Zen 3            |   26% |   23% |   -3% |   11% |   -6% |
AMD Zen 2            |   13% |   24% |   -1% |   14% |   -3% |
Intel Skylake        |   16% |    8% |   -4% |   35% |   -3% |
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thorsten Blum [Mon, 10 Feb 2025 10:04:48 +0000 (11:04 +0100)]
crypto: ahash - use str_yes_no() helper in crypto_ahash_show()
Remove hard-coded strings by using the str_yes_no() helper function.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Sun, 9 Feb 2025 10:17:30 +0000 (18:17 +0800)]
crypto: inside-secure - Eliminate duplication in top-level Makefile
Instead of having two entries for inside-secure in the top-level
Makefile, make it just a single one.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Devaraj Rangasamy [Thu, 6 Feb 2025 22:11:52 +0000 (03:41 +0530)]
crypto: ccp - Add support for PCI device 0x1134
PCI device 0x1134 shares the same register features as PCI device
0x17E0. Hence reuse the same data for the new PCI device ID 0x1134.
Signed-off-by: Devaraj Rangasamy <Devaraj.Rangasamy@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Wenkai Lin [Wed, 5 Feb 2025 03:56:28 +0000 (11:56 +0800)]
crypto: hisilicon/sec2 - fix for sec spec check
During encryption and decryption, user requests must be checked
first. If a request uses specifications that the hardware does not
support, software computation is used to process it instead.
Fixes: 2f072d75d1ab ("crypto: hisilicon - Add aead support on SEC2")
Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Wenkai Lin [Wed, 5 Feb 2025 03:56:27 +0000 (11:56 +0800)]
crypto: hisilicon/sec2 - fix for aead authsize alignment
The hardware only supports authentication sizes
that are 4-byte aligned. Therefore, the driver
switches to software computation when the
authentication size is not 4-byte aligned.
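The alignment rule can be expressed as a one-line check. The helper
name authsize_needs_fallback() is hypothetical and only illustrates
the condition; it is not the function used in the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when the request must be handled in software
 * because the authentication size is not a multiple of 4 bytes. */
static bool authsize_needs_fallback(unsigned int authsize)
{
	return (authsize & 3u) != 0;
}

/* Example use: 16 and 12 stay on hardware, 13 falls back. */
static void authsize_example(void)
{
	assert(!authsize_needs_fallback(16));
	assert(!authsize_needs_fallback(12));
	assert(authsize_needs_fallback(13));
}
```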
Fixes: 2f072d75d1ab ("crypto: hisilicon - Add aead support on SEC2")
Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Wenkai Lin [Wed, 5 Feb 2025 03:56:26 +0000 (11:56 +0800)]
crypto: hisilicon/sec2 - fix for aead auth key length
According to the HMAC RFC, the authentication key
can be 0 bytes, and the hardware can handle this
scenario. Therefore, remove the incorrect validation
for this case.
Fixes: 2f072d75d1ab ("crypto: hisilicon - Add aead support on SEC2")
Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nicolas Frattaroli [Tue, 4 Feb 2025 15:35:52 +0000 (16:35 +0100)]
MAINTAINERS: add Nicolas Frattaroli to rockchip-rng maintainers
I maintain the rockchip,rk3588-rng bindings, and I guess also the part
of the driver that implements support for it. Therefore, add me to the
MAINTAINERS for this driver and these bindings.
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nicolas Frattaroli [Tue, 4 Feb 2025 15:35:50 +0000 (16:35 +0100)]
hwrng: rockchip - add support for rk3588's standalone TRNG
The RK3588 SoC includes several TRNGs, one part of the Crypto IP block,
and the other one (referred to as "trngv1") as a standalone new IP.
Add support for this new standalone TRNG to the driver by both
generalising it to support multiple different rockchip RNGs and then
implementing the required functionality for the new hardware.
This work was partly based on the downstream vendor driver by Rockchip's
Lin Jinhan, which is why they are listed as a Co-author.
While the hardware does support notifying the CPU with an IRQ when the
random data is ready, I've discovered while implementing the code to use
this interrupt that this results in significantly slower throughput of
the TRNG even when under heavy CPU load. I assume this is because with
only 32 bytes of data per invocation, the overhead of reinitialising a
completion, enabling the interrupt, sleeping and then triggering the
completion in the IRQ handler is way more expensive than busylooping.
Speaking of busylooping, the poll interval for reading the ISTAT is an
atomic read with a delay of 0. In my testing, I've found that this gives
us the largest throughput, and it appears the random data is ready
pretty much the moment we begin polling, as increasing the poll delay
leads to a drop in throughput significant enough to not just be due to
the poll interval missing the ideal timing by a microsecond or two.
According to downstream, the IP should take 1024 clock cycles to
generate 56 bits of random data, which at 150MHz should work out to
6.8us. I did not test whether the data really does take 256/56*6.8us
to arrive, though changing the readl to a __raw_readl makes no
difference in throughput, and this data does pass the rngtest FIPS
checks, so I'm not entirely sure what's going on but I presume it's got
something to do with the AHB bus speed and the memory barriers that
mainline's readl/writel functions insert.
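The timing estimate above can be reproduced with a few lines of
arithmetic. This is a sketch only: the helper names are hypothetical,
and the model assumes fractional generation rounds are meaningful,
which the hardware may not honour.

```c
#include <assert.h>

/* Time in microseconds for a given number of clock cycles at freq_hz. */
static double cycles_to_us(double cycles, double freq_hz)
{
	assert(freq_hz > 0.0);
	return cycles / freq_hz * 1e6;
}

/* Expected time to collect want_bits when each round of
 * cycles_per_round cycles yields bits_per_round bits. */
static double expected_fill_us(double want_bits, double bits_per_round,
			       double cycles_per_round, double freq_hz)
{
	assert(bits_per_round > 0.0);
	return want_bits / bits_per_round *
	       cycles_to_us(cycles_per_round, freq_hz);
}
```

With the quoted figures, cycles_to_us(1024, 150e6) is about 6.8us, and
expected_fill_us(256, 56, 1024, 150e6) is about 31us for the 32 bytes
read per invocation.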
The only other current SoC that uses this new IP is the Rockchip RV1106,
but that SoC does not have mainline support as of the time of writing,
so we make no effort to declare it as supported for now.
Co-developed-by: Lin Jinhan <troy.lin@rock-chips.com>
Signed-off-by: Lin Jinhan <troy.lin@rock-chips.com>
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nicolas Frattaroli [Tue, 4 Feb 2025 15:35:49 +0000 (16:35 +0100)]
hwrng: rockchip - eliminate some unnecessary dereferences
Despite assigning a temporary variable the value of &pdev->dev early on
in the probe function, the probe function then continues to use this
construct when it could just use the local dev variable instead.
Simplify this by using the local dev variable directly.
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nicolas Frattaroli [Tue, 4 Feb 2025 15:35:48 +0000 (16:35 +0100)]
hwrng: rockchip - store dev pointer in driver struct
The rockchip rng driver does a dance to store the dev pointer in the
hwrng's unsigned long "priv" member. However, since the struct hwrng
member of rk_rng is not a pointer, we can use container_of to get the
struct rk_rng instance from just the struct hwrng*, which means we don't
have to subvert what little there is in C of a type system and can
instead store a pointer to the device struct in the rk_rng itself.
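A user-space sketch of the container_of pattern follows. The struct
layouts are simplified and hypothetical (the real struct hwrng and
struct rk_rng have more members), and the macro here is a plain
offsetof-based stand-in for the kernel's container_of().

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified versions of the structures involved. */
struct device { int id; };
struct hwrng { const char *name; };

struct rk_rng {
	struct hwrng rng;	/* embedded by value, not a pointer */
	struct device *dev;	/* stored directly, no "priv" cast needed */
};

/* Recover the enclosing rk_rng from the hwrng pointer a callback receives. */
static struct rk_rng *to_rk_rng(struct hwrng *rng)
{
	assert(rng);
	return container_of(rng, struct rk_rng, rng);
}
```

This only works because the hwrng is embedded in rk_rng rather than
referenced through a pointer, so its address has a fixed offset inside
the enclosing struct.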
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nicolas Frattaroli [Tue, 4 Feb 2025 15:35:47 +0000 (16:35 +0100)]
dt-bindings: rng: add binding for Rockchip RK3588 RNG
The Rockchip RK3588 SoC has two hardware RNGs accessible to the
non-secure world: an RNG in the Crypto IP, and a standalone RNG that is
new to this SoC.
Add a binding for this new standalone RNG. It is distinct hardware from
the existing rockchip,rk3568-rng, and therefore gets its own binding as
the two hardware IPs are unrelated other than both being made by the
same vendor.
The RNG is capable of firing an interrupt when entropy is ready.
The reset is optional, as the hardware does a power-on reset, and
functions without the software manually resetting it.
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nicolas Frattaroli [Tue, 4 Feb 2025 15:35:46 +0000 (16:35 +0100)]
dt-bindings: reset: Add SCMI reset IDs for RK3588
When TF-A is used to assert/deassert the resets through SCMI, the
IDs communicated to it are different than the ones mainline Linux uses.
Import the list of SCMI reset IDs from mainline TF-A so that devicetrees
can use these IDs more easily.
Co-developed-by: XiaoDong Huang <derrick.huang@rock-chips.com>
Signed-off-by: XiaoDong Huang <derrick.huang@rock-chips.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Lukas Wunner [Mon, 3 Feb 2025 13:37:05 +0000 (14:37 +0100)]
crypto: virtio - Drop superfluous [as]kcipher_req pointer
The request context virtio_crypto_{akcipher,sym}_request contains a
pointer to the [as]kcipher_request itself.
The pointer is superfluous as it can be calculated with container_of().
Drop the superfluous pointer.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Lukas Wunner [Mon, 3 Feb 2025 13:37:04 +0000 (14:37 +0100)]
crypto: virtio - Drop superfluous [as]kcipher_ctx pointer
The request context virtio_crypto_{akcipher,sym}_request contains a
pointer to the transform context virtio_crypto_[as]kcipher_ctx.
The pointer is superfluous as it can be calculated with the cheap
crypto_akcipher_reqtfm() + akcipher_tfm_ctx() and
crypto_skcipher_reqtfm() + crypto_skcipher_ctx() combos.
Drop the superfluous pointer.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Lukas Wunner [Mon, 3 Feb 2025 13:37:03 +0000 (14:37 +0100)]
crypto: virtio - Drop superfluous ctx->tfm backpointer
struct virtio_crypto_[as]kcipher_ctx contains a backpointer to struct
crypto_[as]kcipher which is superfluous in two ways:
First, it's not used anywhere. Second, the context is embedded into
struct crypto_tfm, so one could just use container_of() to get from the
context to crypto_tfm and from there to crypto_[as]kcipher.
Drop the superfluous backpointer.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Lukas Wunner [Mon, 3 Feb 2025 13:37:02 +0000 (14:37 +0100)]
crypto: virtio - Simplify RSA key size caching
When setting a public or private RSA key, the integer n is cached in the
transform context virtio_crypto_akcipher_ctx -- with the sole purpose of
calculating the key size from it in virtio_crypto_rsa_max_size().
It looks like this was copy-pasted from crypto/rsa.c.
Cache the key size directly instead of the integer n, thus simplifying
the code and reducing the memory footprint.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Lukas Wunner [Mon, 3 Feb 2025 13:37:01 +0000 (14:37 +0100)]
crypto: virtio - Fix kernel-doc of virtcrypto_dev_stop()
It seems the kernel-doc of virtcrypto_dev_start() was copied verbatim to
virtcrypto_dev_stop(). Fix it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Lukas Wunner [Sun, 2 Feb 2025 19:00:52 +0000 (20:00 +0100)]
crypto: ecdsa - Harden against integer overflows in DIV_ROUND_UP()
Herbert notes that DIV_ROUND_UP() may overflow unnecessarily if an ecdsa
implementation's ->key_size() callback returns an unusually large value.
Herbert instead suggests (for a division by 8):
X / 8 + !!(X & 7)
Based on this formula, introduce a generic DIV_ROUND_UP_POW2() macro and
use it in lieu of DIV_ROUND_UP() for ->key_size() return values.
Additionally, use the macro in ecc_digits_from_bytes(), whose "nbytes"
parameter is a ->key_size() return value in some instances, or a
user-specified ASN.1 length in the case of ecdsa_get_signature_rs().
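The difference between the two rounding forms can be shown in user
space. The macro bodies below are a sketch based on the formula quoted
above; the wrapper functions and the use of 32-bit values are
assumptions chosen to make the wrap-around easy to see.

```c
#include <assert.h>
#include <stdint.h>

/* Classic form: n + d - 1 can wrap around for large n. */
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Overflow-safe form for power-of-two divisors, following the
 * X / 8 + !!(X & 7) formula quoted above. */
#define DIV_ROUND_UP_POW2(n, d)	((n) / (d) + !!((n) & ((d) - 1)))

static uint32_t bytes_classic(uint32_t bits)
{
	return DIV_ROUND_UP(bits, 8u);
}

static uint32_t bytes_pow2(uint32_t bits)
{
	return DIV_ROUND_UP_POW2(bits, 8u);
}

/* Both agree for ordinary sizes; only the classic form wraps. */
static void div_round_up_example(void)
{
	assert(bytes_classic(521) == 66 && bytes_pow2(521) == 66);
	assert(bytes_classic(UINT32_MAX) == 0);		/* wrapped */
	assert(bytes_pow2(UINT32_MAX) == 536870912u);	/* correct */
}
```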
Link: https://lore.kernel.org/r/Z3iElsILmoSu6FuC@gondor.apana.org.au/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Lukas Wunner [Sun, 2 Feb 2025 19:00:51 +0000 (20:00 +0100)]
crypto: sig - Prepare for algorithms with variable signature size
The callers of crypto_sig_sign() assume that the signature size is
always equivalent to the key size.
This happens to be true for RSA, which is currently the only algorithm
implementing the ->sign() callback. But it is false e.g. for X9.62
encoded ECDSA signatures because they have variable length.
Prepare for addition of a ->sign() callback to such algorithms by
letting the callback return the signature size (or a negative integer
on error). When testing the ->sign() callback in test_sig_one(),
use crypto_sig_maxsize() instead of crypto_sig_keysize() to verify that
the test vector's signature does not exceed an algorithm's maximum
signature size.
There has been a relatively recent effort to upstream ECDSA signature
generation support which may benefit from this change:
https://lore.kernel.org/linux-crypto/20220908200036.2034-1-ignat@cloudflare.com/
However the main motivation for this commit is to reduce the number of
crypto_sig_keysize() callers: This function is about to be changed to
return the size in bits instead of bytes and that will require amending
most callers to divide the return value by 8.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Cc: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Martin Kaiser [Sat, 1 Feb 2025 18:39:07 +0000 (19:39 +0100)]
hwrng: imx-rngc - add runtime pm
Add runtime power management to the imx-rngc driver. Disable the
peripheral clock when the rngc is idle.
The callback functions from struct hwrng wake the rngc up when they're
called and set it to idle on exit. Helper functions which are invoked
from the callbacks assume that the rngc is active.
Device init and probe are done before runtime pm is enabled. The
peripheral clock will be handled manually during these steps. Do not use
devres any more to enable/disable the peripheral clock, this conflicts
with runtime pm.
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Suman Kumar Chakraborty [Fri, 31 Jan 2025 11:34:54 +0000 (11:34 +0000)]
crypto: qat - set command ids as reserved
The XP10 algorithm is not supported by any QAT device.
Remove the definition of bit 7 (ICP_QAT_FW_COMP_20_CMD_XP10_COMPRESS)
and bit 8 (ICP_QAT_FW_COMP_20_CMD_XP10_DECOMPRESS) in the firmware
command id enum and rename them as reserved.
Those bits shall not be used in future.
Signed-off-by: Suman Kumar Chakraborty <suman.kumar.chakraborty@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Kristen Carlson Accardi [Tue, 28 Jan 2025 23:57:43 +0000 (15:57 -0800)]
MAINTAINERS: Add Vinicius Gomes to MAINTAINERS for IAA Crypto
Add Vinicius Gomes to the MAINTAINERS list for the IAA Crypto driver.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Mon, 27 Jan 2025 21:16:09 +0000 (13:16 -0800)]
crypto: x86/aes-xts - make the fast path 64-bit specific
Remove 32-bit support from the fast path in xts_crypt(). Then optimize
it for 64-bit, and simplify the code, by switching to sg_virt() and
removing the now-unnecessary checks for crossing a page boundary.
The result is simpler code that is slightly smaller and faster in the
case that actually matters (64-bit).
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
lizhi [Sat, 18 Jan 2025 00:45:20 +0000 (08:45 +0800)]
crypto: hisilicon/hpre - adapt ECDH for high-performance cores
Only ECDH with NIST P-256 meets the requirements.
The algorithm will be scheduled first on high-performance cores.
The key step is to configure the resv1 field of the BD.
Signed-off-by: lizhi <lizhi206@huawei.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tom Lendacky [Fri, 17 Jan 2025 23:05:47 +0000 (17:05 -0600)]
crypto: ccp - Fix check for the primary ASP device
Currently, the ASP primary device check does not have support for PCI
domains, and, as a result, when the system is configured with PCI domains
(PCI segments) the wrong device can be selected as primary. This results
in commands submitted to the device timing out and failing. The device
check also relies on specific device and function assignments that may
not hold in the future.
Fix the primary ASP device check to include support for PCI domains and
to perform proper checking of the Bus/Device/Function positions.
Fixes: 2a6170dfe755 ("crypto: ccp: Add Platform Security Processor (PSP) device support")
Cc: stable@vger.kernel.org
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thorsten Blum [Fri, 17 Jan 2025 14:42:22 +0000 (15:42 +0100)]
crypto: skcipher - use str_yes_no() helper in crypto_skcipher_show()
Remove hard-coded strings by using the str_yes_no() helper function.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Dragan Simic [Wed, 15 Jan 2025 13:07:01 +0000 (14:07 +0100)]
hwrng: Kconfig - Move one "tristate" Kconfig description to the usual place
It's pretty usual to have "tristate" descriptions in Kconfig files placed
immediately after the actual configuration options, so correct the position
of one misplaced "tristate" spotted in the hw_random Kconfig file.
No intended functional changes are introduced by this trivial cleanup.
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Dragan Simic [Wed, 15 Jan 2025 13:07:00 +0000 (14:07 +0100)]
hwrng: Kconfig - Use tabs as leading whitespace consistently in Kconfig
Replace instances of leading size-eight groups of space characters with
the usual tab characters, as spotted in the hw_random Kconfig file.
No intended functional changes are introduced by this trivial cleanup.
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Krzysztof Kozlowski [Tue, 14 Jan 2025 19:05:01 +0000 (20:05 +0100)]
crypto: drivers - Use str_enable_disable-like helpers
Replace ternary (condition ? "enable" : "disable") syntax with helpers
from string_choices.h because:
1. Simple function call with one argument is easier to read. Ternary
operator has three arguments and with wrapping might lead to quite
long code.
2. Is slightly shorter thus also easier to read.
3. It brings uniformity in the text - same string.
4. Allows deduping by the linker, which results in a smaller binary
file.
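The readability argument in the list above can be sketched in user space. The kernel provides str_enable_disable() in string_choices.h; my_str_enable_disable() below is a stand-in written for this example, and format_status() is a hypothetical caller.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the kernel's str_enable_disable(); hypothetical user-space copy. */
static inline const char *my_str_enable_disable(bool v)
{
	return v ? "enable" : "disable";
}

/* Formats a status line with the helper instead of an open-coded ternary. */
static int format_status(char *buf, size_t len, const char *name, bool on)
{
	return snprintf(buf, len, "%s: %s", name, my_str_enable_disable(on));
}
```

Every caller then shares the same two string literals, which is what makes linker deduplication possible.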
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> # QAT
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tanya Agarwal [Tue, 14 Jan 2025 14:12:04 +0000 (19:42 +0530)]
lib: 842: Improve error handling in sw842_compress()
The static code analysis tool "Coverity Scan" pointed the following
implementation details out for further development considerations:
CID 1309755: Unused value
In sw842_compress: A value assigned to a variable is never used. (CWE-563)
returned_value: Assigning value from add_repeat_template(p, repeat_count)
to ret here, but that stored value is overwritten before it can be used.
Conclusion:
Add error handling for the return value from an add_repeat_template()
call.
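The pattern Coverity flags, and its fix, can be sketched as follows. emit_repeat() and emit_next() are hypothetical stand-ins, not the sw842 functions; the point is only that the first return value must be checked before 'ret' is assigned again.

```c
#include <errno.h>

/* Hypothetical stand-in for add_repeat_template(): fails when out of space. */
static int emit_repeat(int *space_left, int repeat_count)
{
	if (repeat_count > *space_left)
		return -ENOSPC;
	*space_left -= repeat_count;
	return 0;
}

/* Hypothetical stand-in for the following step, which reuses 'ret'. */
static int emit_next(int *space_left)
{
	if (*space_left < 1)
		return -ENOSPC;
	*space_left -= 1;
	return 0;
}

static int compress_step(int *space_left, int repeat_count)
{
	int ret;

	ret = emit_repeat(space_left, repeat_count);
	if (ret)	/* without this check, the next assignment hides the error */
		return ret;

	ret = emit_next(space_left);
	return ret;
}
```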
Fixes: 2da572c959dd ("lib: add software 842 compression/decompression")
Signed-off-by: Tanya Agarwal <tanyaagarwal25699@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Christian Marangi [Tue, 14 Jan 2025 12:36:36 +0000 (13:36 +0100)]
crypto: eip93 - Add Inside Secure SafeXcel EIP-93 crypto engine support
Add support for the Inside Secure SafeXcel EIP-93 Crypto Engine used on
Mediatek MT7621 SoC and new Airoha SoC.
EIP-93 IP supports AES/DES/3DES ciphers in ECB/CBC and CTR modes as well as
authenc(HMAC(x), cipher(y)) using HMAC MD5, SHA1, SHA224 and SHA256.
EIP-93 provides registers to signal support for specific ciphers, and the
driver dynamically registers only those supported by the chip.
Signed-off-by: Richard van Schagen <vschagen@icloud.com>
Co-developed-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Christian Marangi [Tue, 14 Jan 2025 12:36:35 +0000 (13:36 +0100)]
dt-bindings: crypto: Add Inside Secure SafeXcel EIP-93 crypto engine
Add bindings for the Inside Secure SafeXcel EIP-93 crypto engine.
The IP is present on Airoha SoC and on various Mediatek devices and
other SoC under different names like mtk-eip93 or PKTE.
All compatibles that currently have no user are defined but rejected,
pending an actual device that makes use of them.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Christian Marangi [Tue, 14 Jan 2025 12:36:34 +0000 (13:36 +0100)]
spinlock: extend guard with spinlock_bh variants
Extend guard APIs with missing raw/spinlock_bh variants.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Linus Torvalds [Sun, 2 Feb 2025 23:39:26 +0000 (15:39 -0800)]
Linux 6.14-rc1
Linus Torvalds [Sun, 2 Feb 2025 18:49:13 +0000 (10:49 -0800)]
Merge tag 'turbostat-2025.02.02' of git://git./linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
- Fix regression that affinitized forked child in one-shot mode.
- Harden one-shot mode against hotplug online/offline
- Enable RAPL SysWatt column by default
- Add initial PTL, CWF platform support
- Harden initial PMT code in response to early use
- Enable first built-in PMT counter: CWF c1e residency
- Refuse to run on unsupported platforms without --force, to encourage
updating to a version that supports the system, and to avoid
not-so-useful measurement results
* tag 'turbostat-2025.02.02' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (25 commits)
tools/power turbostat: version 2025.02.02
tools/power turbostat: Add CPU%c1e BIC for CWF
tools/power turbostat: Harden one-shot mode against cpu offline
tools/power turbostat: Fix forked child affinity regression
tools/power turbostat: Add tcore clock PMT type
tools/power turbostat: version 2025.01.14
tools/power turbostat: Allow adding PMT counters directly by sysfs path
tools/power turbostat: Allow mapping multiple PMT files with the same GUID
tools/power turbostat: Add PMT directory iterator helper
tools/power turbostat: Extend PMT identification with a sequence number
tools/power turbostat: Return default value for unmapped PMT domains
tools/power turbostat: Check for non-zero value when MSR probing
tools/power turbostat: Enhance turbostat self-performance visibility
tools/power turbostat: Add fixed RAPL PSYS divisor for SPR
tools/power turbostat: Fix PMT mmaped file size rounding
tools/power turbostat: Remove SysWatt from DISABLED_BY_DEFAULT
tools/power turbostat: Add an NMI column
tools/power turbostat: add Busy% to "show idle"
tools/power turbostat: Introduce --force parameter
tools/power turbostat: Improve --help output
...
Linus Torvalds [Sun, 2 Feb 2025 18:40:27 +0000 (10:40 -0800)]
Merge tag 'sh-for-v6.14-tag1' of git://git./linux/kernel/git/glaubitz/sh-linux
Pull sh updates from John Paul Adrian Glaubitz:
"Fixes and improvements for sh:
- replace seq_printf() with the more efficient
seq_put_decimal_ull_width() to increase performance when stress
reading /proc/interrupts (David Wang)
- migrate sh to the generic rule for built-in DTB to help avoid race
conditions during parallel builds which can occur because Kbuild
descends into arch/*/boot/dts twice (Masahiro Yamada)
- replace select with imply in the board Kconfig for enabling
hardware with complex dependencies. This addresses warnings which
were reported by the kernel test robot (Geert Uytterhoeven)"
* tag 'sh-for-v6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: boards: Use imply to enable hardware with complex dependencies
sh: Migrate to the generic rule for built-in DTB
sh: irq: Use seq_put_decimal_ull_width() for decimal values
Len Brown [Sun, 2 Feb 2025 16:43:02 +0000 (10:43 -0600)]
tools/power turbostat: version 2025.02.02
Summary of Changes since 2024.11.30:
Fix regression in 2023.11.07 that affinitized forked child
in one-shot mode.
Harden one-shot mode against hotplug online/offline
Enable RAPL SysWatt column by default.
Add initial PTL, CWF platform support.
Harden initial PMT code in response to early use.
Enable first built-in PMT counter: CWF c1e residency
Refuse to run on unsupported platforms without --force,
to encourage updating to a version that supports the system,
and to avoid not-so-useful measurement results.
Signed-off-by: Len Brown <len.brown@intel.com>
Linus Torvalds [Sat, 1 Feb 2025 23:07:56 +0000 (15:07 -0800)]
Merge tag 'pull-misc' of git://git./linux/kernel/git/viro/vfs
Pull misc vfs cleanups from Al Viro:
"Two unrelated patches - one is a removal of long-obsolete include in
overlayfs (it used to need fs/internal.h, but the extern it wanted has
been moved back to include/linux/namei.h) and another introduces
convenience helper constructing struct qstr by a NUL-terminated
string"
* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
add a string-to-qstr constructor
fs/overlayfs/namei.c: get rid of include ../internal.h
Linus Torvalds [Sat, 1 Feb 2025 22:54:33 +0000 (14:54 -0800)]
Merge tag 'mips_6.14_1' of git://git./linux/kernel/git/mips/linux
Pull MIPS fix from Thomas Bogendoerfer:
"Revert commit breaking sysv ipc for o32 ABI"
* tag 'mips_6.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
Revert "mips: fix shmctl/semctl/msgctl syscall for o32"
Linus Torvalds [Sat, 1 Feb 2025 19:30:41 +0000 (11:30 -0800)]
Merge tag 'v6.14-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
- various updates for special file handling: symlink handling,
support for creating sockets, cleanups, new mount options (e.g. to
allow disabling using reparse points for them, and to allow
overriding the way symlinks are saved), and fixes to error paths
- fix for kerberos mounts (allow IAKerb)
- SMB1 fix for stat and for setting SACL (auditing)
- fix an incorrect error code mapping
- cleanups
* tag 'v6.14-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: (21 commits)
cifs: Fix parsing native symlinks directory/file type
cifs: update internal version number
cifs: Add support for creating WSL-style symlinks
smb3: add support for IAKerb
cifs: Fix struct FILE_ALL_INFO
cifs: Add support for creating NFS-style symlinks
cifs: Add support for creating native Windows sockets
cifs: Add mount option -o reparse=none
cifs: Add mount option -o symlink= for choosing symlink create type
cifs: Fix creating and resolving absolute NT-style symlinks
cifs: Simplify reparse point check in cifs_query_path_info() function
cifs: Remove symlink member from cifs_open_info_data union
cifs: Update description about ACL permissions
cifs: Rename struct reparse_posix_data to reparse_nfs_data_buffer and move to common/smb2pdu.h
cifs: Remove struct reparse_posix_data from struct cifs_open_info_data
cifs: Remove unicode parameter from parse_reparse_point() function
cifs: Fix getting and setting SACLs over SMB1
cifs: Remove intermediate object of failed create SFU call
cifs: Validate EAs for WSL reparse points
cifs: Change translation of STATUS_PRIVILEGE_NOT_HELD to -EPERM
...
Linus Torvalds [Sat, 1 Feb 2025 18:04:29 +0000 (10:04 -0800)]
Merge tag 'driver-core-6.14-rc1-2' of git://git./linux/kernel/git/gregkh/driver-core
Pull debugfs fix from Greg KH:
"Here is a single debugfs fix from Al to resolve a reported regression
in the driver-core tree. It has been reported to fix the issue"
* tag 'driver-core-6.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
debugfs: Fix the missing initializations in __debugfs_file_get()
Linus Torvalds [Sat, 1 Feb 2025 17:49:20 +0000 (09:49 -0800)]
Merge tag 'mm-hotfixes-stable-2025-02-01-03-56' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"21 hotfixes. 8 are cc:stable and the remainder address post-6.13
issues. 13 are for MM and 8 are for non-MM.
All are singletons, please see the changelogs for details"
* tag 'mm-hotfixes-stable-2025-02-01-03-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
MAINTAINERS: include linux-mm for xarray maintenance
revert "xarray: port tests to kunit"
MAINTAINERS: add lib/test_xarray.c
mailmap, MAINTAINERS, docs: update Carlos's email address
mm/hugetlb: fix hugepage allocation for interleaved memory nodes
mm: gup: fix infinite loop within __get_longterm_locked
mm, swap: fix reclaim offset calculation error during allocation
.mailmap: update email address for Christopher Obbard
kfence: skip __GFP_THISNODE allocations on NUMA systems
nilfs2: fix possible int overflows in nilfs_fiemap()
mm: compaction: use the proper flag to determine watermarks
kernel: be more careful about dup_mmap() failures and uprobe registering
mm/fake-numa: handle cases with no SRAT info
mm: kmemleak: fix upper boundary check for physical address objects
mailmap: add an entry for Hamza Mahfooz
MAINTAINERS: mailmap: update Yosry Ahmed's email address
scripts/gdb: fix aarch64 userspace detection in get_current_task
mm/vmscan: accumulate nr_demoted for accurate demotion statistics
ocfs2: fix incorrect CPU endianness conversion causing mount failure
mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc()
...
Linus Torvalds [Sat, 1 Feb 2025 17:15:01 +0000 (09:15 -0800)]
Merge tag 'media/v6.14-2' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fix from Mauro Carvalho Chehab:
"A revert for a regression in the uvcvideo driver"
* tag 'media/v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
Revert "media: uvcvideo: Require entities to have a non-zero unique ID"
Andrew Morton [Fri, 31 Jan 2025 00:16:20 +0000 (16:16 -0800)]
MAINTAINERS: include linux-mm for xarray maintenance
MM developers have an interest in the xarray code.
Cc: David Gow <davidgow@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Tamir Duberstein <tamird@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Fri, 31 Jan 2025 00:09:20 +0000 (16:09 -0800)]
revert "xarray: port tests to kunit"
Revert c7bb5cf9fc4e ("xarray: port tests to kunit"). It broke the build
when compiling the xarray userspace test harness code.
Reported-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Closes: https://lkml.kernel.org/r/07cf896e-adf8-414f-a629-a808fc26014a@oracle.com
Cc: David Gow <davidgow@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Tamir Duberstein <tamird@gmail.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tamir Duberstein [Wed, 29 Jan 2025 21:13:49 +0000 (16:13 -0500)]
MAINTAINERS: add lib/test_xarray.c
Ensure test-only changes are sent to the relevant maintainer.
Link: https://lkml.kernel.org/r/20250129-xarray-test-maintainer-v1-1-482e31f30f47@gmail.com
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Carlos Bilbao [Thu, 30 Jan 2025 01:22:44 +0000 (19:22 -0600)]
mailmap, MAINTAINERS, docs: update Carlos's email address
Update .mailmap to reflect my new (and final) primary email address,
carlos.bilbao@kernel.org. Also update contact information in files
Documentation/translations/sp_SP/index.rst and MAINTAINERS.
Link: https://lkml.kernel.org/r/20250130012248.1196208-1-carlos.bilbao@kernel.org
Signed-off-by: Carlos Bilbao <carlos.bilbao@kernel.org>
Cc: Carlos Bilbao <bilbao@vt.edu>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ritesh Harjani (IBM) [Sat, 11 Jan 2025 11:06:55 +0000 (16:36 +0530)]
mm/hugetlb: fix hugepage allocation for interleaved memory nodes
gather_bootmem_prealloc() assumes that the start nid is 0 and that the size
is num_node_state(N_MEMORY). If the NUMA nodes with memory attached are
interleaved with memoryless nodes, gather_bootmem_prealloc_parallel()
therefore fails to scan some of them.
Since NUMA nodes with memory attached can be interleaved in any fashion,
ensure that the code checks all NUMA node ids (.size = nr_node_ids). Keep
max_threads as N_MEMORY, so that all nr_node_ids are distributed among
that many threads.
e.g. qemu cmdline
========================
numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20"
mem_cmd="-object memory-backend-ram,id=mem1,size=16G"
w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
==========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 0 kB
with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
===========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 2
HugePages_Free: 2
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 2097152 kB
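The failure mode in the qemu example can be modelled in a few lines of user-space C. The array and function names are hypothetical; the sketch assumes two node ids, with node 0 holding CPUs only and node 1 holding the memory, and compares scanning [0, number of memory nodes) against scanning [0, nr_node_ids).

```c
#include <stdbool.h>

#define NR_NODE_IDS 2	/* hypothetical: matches the two-node qemu example */

/* Node 0 has CPUs only; node 1 holds the memory. */
static const bool node_has_memory[NR_NODE_IDS] = { false, true };

/* Number of nodes with memory (the role num_node_state(N_MEMORY) plays). */
static int count_memory_nodes(void)
{
	int nid, n = 0;

	for (nid = 0; nid < NR_NODE_IDS; nid++)
		n += node_has_memory[nid];
	return n;
}

/* Counts the memory nodes visited when scanning node ids [0, size). */
static int memory_nodes_scanned(int size)
{
	int nid, seen = 0;

	for (nid = 0; nid < size && nid < NR_NODE_IDS; nid++)
		if (node_has_memory[nid])
			seen++;
	return seen;
}
```

Scanning only as many ids as there are memory nodes stops at node 0 and never reaches node 1, which is consistent with HugePages_Total staying at 0 in the output above.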
Link: https://lkml.kernel.org/r/f8d8dad3a5471d284f54185f65d575a6aaab692b.1736592534.git.ritesh.list@gmail.com
Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reported-by: Pavithra Prakash <pavrampu@linux.ibm.com>
Suggested-by: Muchun Song <muchun.song@linux.dev>
Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Luiz Capitulino <luizcap@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Gang Li <gang.li@linux.dev>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhaoyang Huang [Tue, 21 Jan 2025 02:01:59 +0000 (10:01 +0800)]
mm: gup: fix infinite loop within __get_longterm_locked
We can run into an infinite loop in __get_longterm_locked() when
collect_longterm_unpinnable_folios() finds only folios that are isolated
from the LRU or were never added to the LRU. This can happen when all
folios to be pinned are never added to the LRU, for example when
vm_ops->fault allocated pages using cma_alloc() and never added them to
the LRU.
Fix it by simply taking a look at the list in the single caller, to see if
anything was added.
[zhaoyang.huang@unisoc.com: move definition of local]
Link: https://lkml.kernel.org/r/20250122012604.3654667-1-zhaoyang.huang@unisoc.com
Link: https://lkml.kernel.org/r/20250121020159.3636477-1-zhaoyang.huang@unisoc.com
Fixes: 67e139b02d99 ("mm/gup.c: refactor check_and_migrate_movable_pages()")
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Aijun Sun <aijun.sun@unisoc.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kairui Song [Thu, 30 Jan 2025 11:51:31 +0000 (19:51 +0800)]
mm, swap: fix reclaim offset calculation error during allocation
There is a code error that will cause the swap entry allocator to reclaim
and check the whole cluster with an unexpected tail offset instead of the
part that needs to be reclaimed. This may cause corruption of the swap
map, so fix it.
Link: https://lkml.kernel.org/r/20250130115131.37777-1-ryncsn@gmail.com
Fixes: 3b644773eefd ("mm, swap: reduce contention on device lock")
Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Chris Li <chrisl@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Christopher Obbard [Wed, 22 Jan 2025 12:04:27 +0000 (12:04 +0000)]
.mailmap: update email address for Christopher Obbard
Update my email address.
Link: https://lkml.kernel.org/r/20250122-wip-obbardc-update-email-v2-1-12bde6b79ad0@linaro.org
Signed-off-by: Christopher Obbard <christopher.obbard@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Marco Elver [Fri, 24 Jan 2025 12:01:38 +0000 (13:01 +0100)]
kfence: skip __GFP_THISNODE allocations on NUMA systems
On NUMA systems, __GFP_THISNODE indicates that an allocation _must_ be on
a particular node, and failure to allocate on the desired node will result
in a failed allocation.
Skip __GFP_THISNODE allocations if we are running on a NUMA system, since
KFENCE can't guarantee which node its pool pages are allocated on.
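As a rough sketch of the decision described above, the predicate below refuses to serve a node-bound allocation from a shared pool when more than one node is online. The flag value and function name are hypothetical and do not come from the KFENCE source.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for __GFP_THISNODE. */
#define GFP_THISNODE_SKETCH 0x1u

/* Returns true if a pool with no node guarantee may serve the request. */
static bool pool_may_serve(unsigned int gfp_flags, int nr_online_nodes)
{
	if ((gfp_flags & GFP_THISNODE_SKETCH) && nr_online_nodes > 1)
		return false;	/* caller needs a specific node; skip the pool */
	return true;
}
```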
Link: https://lkml.kernel.org/r/20250124120145.410066-1-elver@google.com
Fixes: 236e9f153852 ("kfence: skip all GFP_ZONEMASK allocations")
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Chistoph Lameter <cl@linux.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Nikita Zhandarovich [Fri, 24 Jan 2025 22:20:53 +0000 (07:20 +0900)]
nilfs2: fix possible int overflows in nilfs_fiemap()
Since nilfs_bmap_lookup_contig() in nilfs_fiemap() calculates its result
by being prepared to go through potentially maxblocks == INT_MAX blocks,
the value in n may experience an overflow caused by left shift of blkbits.
While it is extremely unlikely to occur, play it safe and cast right hand
expression to wider type to mitigate the issue.
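The effect of widening before the shift can be shown with a small sketch. The kernel code shifts a signed int, where overflow is undefined; to keep the demonstration well-defined, this version uses unsigned 32-bit arithmetic, which wraps instead. The function names are hypothetical, and blkbits is assumed to be less than 32.

```c
#include <stdint.h>

/* Shift performed in 32 bits: the high bits are lost before widening. */
static uint64_t bytes_narrow(uint32_t nblocks, unsigned int blkbits)
{
	return (uint32_t)(nblocks << blkbits);
}

/* Operand widened first, so the shift is performed in 64 bits. */
static uint64_t bytes_wide(uint32_t nblocks, unsigned int blkbits)
{
	return (uint64_t)nblocks << blkbits;
}
```

For nblocks equal to INT_MAX and a 4 KiB block size (blkbits of 12), the narrow version keeps only the low 32 bits of the product, while the wide version preserves the full value.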
Found by Linux Verification Center (linuxtesting.org) with static analysis
tool SVACE.
Link: https://lkml.kernel.org/r/20250124222133.5323-1-konishi.ryusuke@gmail.com
Fixes: 622daaff0a89 ("nilfs2: fiemap support")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
yangge [Sat, 25 Jan 2025 06:53:57 +0000 (14:53 +0800)]
mm: compaction: use the proper flag to determine watermarks
There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of
memory. I have configured 16GB of CMA memory on each NUMA node, and
starting a 32GB virtual machine with device passthrough is extremely slow,
taking almost an hour.
Long-term GUP cannot allocate memory from the CMA area, so a maximum of 16GB
of non-CMA memory on a NUMA node can be used as virtual machine memory.
There is 16GB of free CMA memory on a NUMA node, which is sufficient to
pass the order-0 watermark check, causing the __compaction_suitable()
function to consistently return true.
For costly allocations, if the __compaction_suitable() function always
returns true, it causes the __alloc_pages_slowpath() function to fail to
exit at the appropriate point. This prevents timely fallback to
allocating memory on other nodes, ultimately resulting in excessively long
virtual machine startup times.
Call trace:
__alloc_pages_slowpath
if (compact_result == COMPACT_SKIPPED ||
compact_result == COMPACT_DEFERRED)
goto nopage; // should exit __alloc_pages_slowpath() from here
We could use the real unmovable allocation context to have
__zone_watermark_unusable_free() subtract CMA pages, and thus we won't
pass the order-0 check anymore once the non-CMA part is exhausted. There
is some risk that in some different scenario the compaction could in fact
migrate pages from the exhausted non-CMA part of the zone to the CMA part
and succeed, and we'll skip it instead. But only __GFP_NORETRY
allocations should be affected in the immediate "goto nopage" when
compaction is skipped, others will attempt with DEF_COMPACT_PRIORITY
anyway and won't fail without trying to compact-migrate the non-CMA
pageblocks into CMA pageblocks first, so it should be fine.
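The watermark reasoning above can be reduced to a simplified sketch. This is not the kernel's __zone_watermark_ok(); the function name, parameters and page counts are hypothetical. It only shows how subtracting free CMA pages for an allocation that cannot use CMA makes the order-0 check fail once the non-CMA part is exhausted.

```c
#include <stdbool.h>

/*
 * Simplified order-0 watermark check.
 * free_pages: all free pages in the zone, including free CMA pages.
 * free_cma:   the portion of free_pages that lies in CMA areas.
 */
static bool order0_watermark_ok(unsigned long free_pages,
				unsigned long free_cma,
				unsigned long watermark,
				bool can_use_cma)
{
	unsigned long usable = free_pages;

	if (!can_use_cma)
		usable -= (free_cma < usable) ? free_cma : usable;

	return usable > watermark;
}
```

With all remaining free memory in CMA, the check passes for a CMA-capable context and fails for an unmovable one, so compaction is reported as unsuitable and the allocator can fall back to another node.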
After this fix, it only takes a few tens of seconds to start a 32GB
virtual machine with device passthrough functionality.
Link: https://lore.kernel.org/lkml/1736335854-548-1-git-send-email-yangge1116@126.com/
Link: https://lkml.kernel.org/r/1737788037-8439-1-git-send-email-yangge1116@126.com
Signed-off-by: yangge <yangge1116@126.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam R. Howlett [Mon, 27 Jan 2025 17:02:21 +0000 (12:02 -0500)]
kernel: be more careful about dup_mmap() failures and uprobe registering
If a memory allocation fails during dup_mmap(), the maple tree can be left
in an unsafe state for other iterators besides the exit path. All the
locks are dropped before the exit_mmap() call (in mm/mmap.c), but the
incomplete mm_struct can be reached through (at least) the rmap finding
the vmas which have a pointer back to the mm_struct.
Up to this point, there have been no issues with being able to find an
mm_struct that was only partially initialised. Syzbot was able to make
the incomplete mm_struct fail with recent forking changes, so it has been
proven unsafe to use the mm_struct that hasn't been initialised, as
referenced in the link below.
Although 8ac662f5da19f ("fork: avoid inappropriate uprobe access to
invalid mm") fixed the uprobe access, it does not completely remove the
race.
This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on the
oom side (even though this is extremely unlikely to be selected as an oom
victim in the race window), and sets MMF_UNSTABLE to avoid other potential
users from using a partially initialised mm_struct.
When registering vmas for uprobe, skip the vmas in an mm that is marked
unstable. Modifying a vma in an unstable mm may cause issues if the mm
isn't fully initialised.
Link: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@google.com/
Link: https://lkml.kernel.org/r/20250127170221.1761366-1-Liam.Howlett@oracle.com
Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bruno Faccini [Mon, 27 Jan 2025 17:16:23 +0000 (09:16 -0800)]
mm/fake-numa: handle cases with no SRAT info
Handle cases where no SRAT information is available more gracefully, such
as VMs with no NUMA support, and allow fake-numa configuration to complete
successfully in these cases.
Link: https://lkml.kernel.org/r/20250127171623.1523171-1-bfaccini@nvidia.com
Fixes: 63db8170bf34 ("mm/fake-numa: allow later numa node hotplug")
Signed-off-by: Bruno Faccini <bfaccini@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hyeonggon Yoo <hyeonggon.yoo@sk.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Catalin Marinas [Mon, 27 Jan 2025 18:42:33 +0000 (18:42 +0000)]
mm: kmemleak: fix upper boundary check for physical address objects
Memblock allocations are registered by kmemleak separately, based on their
physical address. During the scanning stage, it checks whether an object
is within the min_low_pfn and max_low_pfn boundaries and ignores it
otherwise.
With the recent addition of __percpu pointer leak detection (commit
6c99d4eb7c5e ("kmemleak: enable tracking for percpu pointers")), kmemleak
started reporting leaks in setup_zone_pageset() and
setup_per_cpu_pageset(). These were caused by the node_data[0] object
(initialised in alloc_node_data()) ending on the PFN_PHYS(max_low_pfn)
boundary. The non-strict upper boundary check introduced by commit
84c326299191 ("mm: kmemleak: check physical address when scan") causes the
pg_data_t object to be ignored (not scanned) and the __percpu pointers it
contains to be reported as leaks.
Make the max_low_pfn upper boundary check strict when deciding whether to
ignore a physical address object and not scan it.
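The difference between the two boundary checks can be sketched as follows. The limits and function names are hypothetical and are expressed directly as physical addresses rather than PFNs; the sketch assumes an object occupies [phys, phys + size).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits, expressed directly as physical addresses. */
#define LOW_START 0x1000u
#define LOW_END   0x9000u	/* plays the role of PFN_PHYS(max_low_pfn) */

/* Non-strict check: an object ending exactly on LOW_END is ignored. */
static bool ignored_nonstrict(uint64_t phys, uint64_t size)
{
	return phys < LOW_START || phys + size >= LOW_END;
}

/* Strict check: only objects extending past LOW_END are ignored. */
static bool ignored_strict(uint64_t phys, uint64_t size)
{
	return phys < LOW_START || phys + size > LOW_END;
}
```

An object that ends exactly on the boundary lies entirely within the low range, so only the strict comparison keeps it eligible for scanning.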
Link: https://lkml.kernel.org/r/20250127184233.2974311-1-catalin.marinas@arm.com
Fixes: 84c326299191 ("mm: kmemleak: check physical address when scan")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cc: Patrick Wang <patrick.wang.shcn@gmail.com>
Cc: <stable@vger.kernel.org> [6.0.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Hamza Mahfooz [Mon, 20 Jan 2025 20:56:59 +0000 (15:56 -0500)]
mailmap: add an entry for Hamza Mahfooz
Map my previous work email to my current one.
Link: https://lkml.kernel.org/r/20250120205659.139027-1-hamzamahfooz@linux.microsoft.com
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Cc: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yosry Ahmed [Thu, 23 Jan 2025 23:13:44 +0000 (23:13 +0000)]
MAINTAINERS: mailmap: update Yosry Ahmed's email address
Moving to a linux.dev email address.
Link: https://lkml.kernel.org/r/20250123231344.817358-1-yosry.ahmed@linux.dev
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jan Kiszka [Fri, 10 Jan 2025 10:36:33 +0000 (11:36 +0100)]
scripts/gdb: fix aarch64 userspace detection in get_current_task
At least recent gdb releases (seen with 14.2) return SP_EL0 as signed long
which lets the right-shift always return 0.
Link: https://lkml.kernel.org/r/dcd2fabc-9131-4b48-8419-6444e2d67454@siemens.com
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Li Zhijian [Fri, 10 Jan 2025 12:21:32 +0000 (20:21 +0800)]
mm/vmscan: accumulate nr_demoted for accurate demotion statistics
In shrink_folio_list(), demote_folio_list() can be called twice.
Currently stat->nr_demoted only stores the result of the last call: the
later nr_demoted is always zero, so the earlier nr_demoted is lost. As a
result, the reported number of demoted pages is not accurate.
Accumulate the nr_demoted count across multiple calls to
demote_folio_list(), ensuring accurate reporting of demotion statistics.
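A minimal sketch of the two bookkeeping styles, using a hypothetical stat structure rather than the kernel's struct reclaim_stat:

```c
/* Hypothetical statistics holder; not the kernel's struct reclaim_stat. */
struct demote_stat_sketch {
	unsigned int nr_demoted;
};

/* Old behaviour: each call replaces the stored count. */
static void record_overwrite(struct demote_stat_sketch *st, unsigned int n)
{
	st->nr_demoted = n;
}

/* Fixed behaviour: counts from successive calls are summed. */
static void record_accumulate(struct demote_stat_sketch *st, unsigned int n)
{
	st->nr_demoted += n;
}
```

If a first call demotes five folios and a second demotes none, the overwriting version reports zero while the accumulating version reports five.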
[lizhijian@fujitsu.com: introduce local nr_demoted to fix nr_reclaimed double counting]
Link: https://lkml.kernel.org/r/20250111015253.425693-1-lizhijian@fujitsu.com
Link: https://lkml.kernel.org/r/20250110122133.423481-1-lizhijian@fujitsu.com
Fixes: f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Acked-by: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
Tested-by: Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Donet Tom <donettom@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Heming Zhao [Tue, 21 Jan 2025 11:22:03 +0000 (19:22 +0800)]
ocfs2: fix incorrect CPU endianness conversion causing mount failure
Commit 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()")
introduced a regression bug. The blksz_bits value is already converted to
CPU endian in the previous code; therefore, the code shouldn't use
le32_to_cpu() anymore.
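Why a second conversion is harmful can be simulated in user space. On a little-endian CPU le32_to_cpu() is a no-op, so the sketch below models a big-endian CPU, where the conversion amounts to a byte swap. All names are hypothetical.

```c
#include <stdint.h>

/* Byte swap; on a big-endian CPU this is what le32_to_cpu() amounts to. */
static uint32_t swab32_sketch(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Correct: the on-disk value is converted exactly once. */
static uint32_t convert_once(uint32_t raw)
{
	return swab32_sketch(raw);
}

/* Buggy: converting an already converted value swaps it back. */
static uint32_t convert_twice(uint32_t raw)
{
	return swab32_sketch(swab32_sketch(raw));
}
```

A little-endian on-disk value of 12 is read by a big-endian CPU as 0x0c000000; one conversion yields 12, while a second conversion returns the raw value again.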
Link: https://lkml.kernel.org/r/20250121112204.12834-1-heming.zhao@suse.com
Fixes: 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()")
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Hyeonggon Yoo [Mon, 27 Jan 2025 23:16:31 +0000 (08:16 +0900)]
mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc()
Commit c1b3bb73d55e ("mm/zsmalloc: use zpdesc in
trylock_zspage()/lock_zspage()") introduces is_first_zpdesc() function.
However, the function is only used when CONFIG_DEBUG_VM=y.
When building with LLVM=1 and W=1 option, the following warning is
generated:
$ make -j12 W=1 LLVM=1 mm/zsmalloc.o
mm/zsmalloc.c:455:20: error: function 'is_first_zpdesc' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
455 | static inline bool is_first_zpdesc(struct zpdesc *zpdesc)
| ^~~~~~~~~~~~~~~
1 error generated.
Fix the warning by adding __maybe_unused attribute to the function.
No functional change intended.
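A user-space approximation of the fix, assuming a GCC- or Clang-compatible compiler. The macro, structure and function names are hypothetical, and DEBUG_SKETCH stands in for CONFIG_DEBUG_VM.

```c
#include <stdbool.h>

/* User-space equivalent of the kernel's __maybe_unused annotation. */
#define maybe_unused_sketch __attribute__((__unused__))

/* Hypothetical descriptor; not the kernel's struct zpdesc. */
struct desc_sketch {
	int index;
};

/* Only referenced in debug builds; the attribute silences the warning otherwise. */
static inline maybe_unused_sketch bool is_first_sketch(const struct desc_sketch *d)
{
	return d->index == 0;
}

static int desc_index(const struct desc_sketch *d)
{
#ifdef DEBUG_SKETCH
	if (!is_first_sketch(d))
		return -1;
#endif
	return d->index;
}
```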
Link: https://lkml.kernel.org/r/20250127231631.4363-1-42.hyeyoo@gmail.com
Fixes: c1b3bb73d55e ("mm/zsmalloc: use zpdesc in trylock_zspage()/lock_zspage()")
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501240958.4ILzuBrH-lkp@intel.com/
Cc: Alex Shi <alexs@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
liuye [Tue, 19 Nov 2024 06:08:42 +0000 (14:08 +0800)]
mm/vmscan: fix hard LOCKUP in function isolate_lru_folios
This fixes the following hard lockup in isolate_lru_folios() during memory
reclaim. If the LRU mostly contains ineligible folios, this may trigger
the watchdog.
watchdog: Watchdog detected hard LOCKUP on cpu 173
RIP: 0010:native_queued_spin_lock_slowpath+0x255/0x2a0
Call Trace:
_raw_spin_lock_irqsave+0x31/0x40
folio_lruvec_lock_irqsave+0x5f/0x90
folio_batch_move_lru+0x91/0x150
lru_add_drain_per_cpu+0x1c/0x40
process_one_work+0x17d/0x350
worker_thread+0x27b/0x3a0
kthread+0xe8/0x120
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1b/0x30
lruvec->lru_lock owner:
PID: 2865 TASK: ffff888139214d40 CPU: 40 COMMAND: "kswapd0"
#0 [fffffe0000945e60] crash_nmi_callback at ffffffffa567a555
#1 [fffffe0000945e68] nmi_handle at ffffffffa563b171
#2 [fffffe0000945eb0] default_do_nmi at ffffffffa6575920
#3 [fffffe0000945ed0] exc_nmi at ffffffffa6575af4
#4 [fffffe0000945ef0] end_repeat_nmi at ffffffffa6601dde
[exception RIP: isolate_lru_folios+403]
RIP: ffffffffa597df53 RSP: ffffc90006fb7c28 RFLAGS: 00000002
RAX: 0000000000000001 RBX: ffffc90006fb7c60 RCX: ffffea04a2196f88
RDX: ffffc90006fb7c60 RSI: ffffc90006fb7c60 RDI: ffffea04a2197048
RBP: ffff88812cbd3010 R8: ffffea04a2197008 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: ffffea04a2197008
R13: ffffea04a2197048 R14: ffffc90006fb7de8 R15: 0000000003e3e937
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
<NMI exception stack>
#5 [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53
#6 [ffffc90006fb7cf8] shrink_active_list at ffffffffa597f788
#7 [ffffc90006fb7da8] balance_pgdat at ffffffffa5986db0
#8 [ffffc90006fb7ec0] kswapd at ffffffffa5987354
#9 [ffffc90006fb7ef8] kthread at ffffffffa5748238
crash>
Scenario:
User processes request a large amount of memory and keep the pages active.
Then a module continuously requests memory from the ZONE_DMA32 area.
Memory reclaim is triggered once the ZONE_DMA32 watermark is reached.
However, the pages in the LRU(active_anon) list are mostly from
the ZONE_NORMAL area.
Reproduce:
Terminal 1: Continuously increase the number of Active(anon) pages.
mkdir /tmp/memory
mount -t tmpfs -o size=1024000M tmpfs /tmp/memory
dd if=/dev/zero of=/tmp/memory/block bs=4M
tail /tmp/memory/block
Terminal 2:
vmstat -a 1
active will increase.
procs ---memory--- ---swap-- ---io---- -system-- ---cpu--- ...
r b swpd       free    inact    active si so bi bo
1 0    0 1445623076 45898836  83646008  0  0  0
1 0    0 1445623076 43450228  86094616  0  0  0
1 0    0 1445623076 41003480  88541364  0  0  0
1 0    0 1445623076 38557088  90987756  0  0  0
1 0    0 1445623076 36109688  93435156  0  0  0
1 0    0 1445619552 33663256  95881632  0  0  0
1 0    0 1445619804 31217140  98327792  0  0  0
1 0    0 1445619804 28769988 100774944  0  0  0
1 0    0 1445619804 26322348 103222584  0  0  0
1 0    0 1445619804 23875592 105669340  0  0  0
cat /proc/meminfo | head
Active(anon) increases.
MemTotal: 1579941036 kB
MemFree: 1445618500 kB
MemAvailable: 1453013224 kB
Buffers: 6516 kB
Cached: 128653956 kB
SwapCached: 0 kB
Active: 118110812 kB
Inactive: 11436620 kB
Active(anon): 115345744 kB
Inactive(anon): 945292 kB
When Active(anon) is 115345744 kB, loading the module with insmod triggers
the ZONE_DMA32 watermark.
perf record -e vmscan:mm_vmscan_lru_isolate -aR
perf script
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=2
nr_skipped=2 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=0
nr_skipped=0 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=28835844
nr_skipped=28835844 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=28835844
nr_skipped=28835844 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=29
nr_skipped=29 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=0
nr_skipped=0 nr_taken=0 lru=active_anon
See nr_scanned=28835844.
28835844 * 4 KB = 115343376 KB, approximately equal to the Active(anon)
value of 115345744 kB.
If Active(anon) is increased to 1000G and the module is then loaded with
insmod to trigger the ZONE_DMA32 watermark, a hard lockup occurs.
On my device, nr_scanned = 0000000003e3e937 at the time of the hard lockup.
Converted to memory size: 0x0000000003e3e937 * 4 KB = 261072092 KB.
[ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53
ffffc90006fb7c30: 0000000000000020 0000000000000000
ffffc90006fb7c40: ffffc90006fb7d40 ffff88812cbd3000
ffffc90006fb7c50: ffffc90006fb7d30 0000000106fb7de8
ffffc90006fb7c60: ffffea04a2197008 ffffea0006ed4a48
ffffc90006fb7c70: 0000000000000000 0000000000000000
ffffc90006fb7c80: 0000000000000000 0000000000000000
ffffc90006fb7c90: 0000000000000000 0000000000000000
ffffc90006fb7ca0: 0000000000000000 0000000003e3e937
ffffc90006fb7cb0: 0000000000000000 0000000000000000
ffffc90006fb7cc0: 8d7c0b56b7874b00 ffff88812cbd3000
About the Fixes:
Why did it take eight years to be discovered?
The problem requires the following conditions to occur:
1. The device memory should be large enough.
2. Pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area.
3. The memory in ZONE_DMA32 needs to reach the watermark.
If the memory is not large enough, or if ZONE_DMA32 memory is used
reasonably, this problem is difficult to detect.
Notes:
The problem is most likely to occur in ZONE_DMA32 and ZONE_NORMAL,
but other suitable scenarios may also trigger the problem.
Link: https://lkml.kernel.org/r/20241119060842.274072-1-liuye@kylinos.cn
Fixes: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis")
Signed-off-by: liuye <liuye@kylinos.cn>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Geert Uytterhoeven [Fri, 24 Jan 2025 08:39:19 +0000 (09:39 +0100)]
sh: boards: Use imply to enable hardware with complex dependencies
If CONFIG_I2C=n:
WARNING: unmet direct dependencies detected for SND_SOC_AK4642
Depends on [n]: SOUND [=y] && SND [=y] && SND_SOC [=y] && I2C [=n]
Selected by [y]:
- SH_7724_SOLUTION_ENGINE [=y] && CPU_SUBTYPE_SH7724 [=y] && SND_SIMPLE_CARD [=y]
WARNING: unmet direct dependencies detected for SND_SOC_DA7210
Depends on [n]: SOUND [=y] && SND [=y] && SND_SOC [=y] && SND_SOC_I2C_AND_SPI [=n]
Selected by [y]:
- SH_ECOVEC [=y] && CPU_SUBTYPE_SH7724 [=y] && SND_SIMPLE_CARD [=y]
Fix this by replacing select by imply, instead of adding a dependency on
I2C.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501240836.OvXqmANX-lkp@intel.com/
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Masahiro Yamada [Sun, 22 Dec 2024 00:32:07 +0000 (09:32 +0900)]
sh: Migrate to the generic rule for built-in DTB
Commit 654102df2ac2 ("kbuild: add generic support for built-in
boot DTBs") introduced generic support for built-in DTBs.
Select GENERIC_BUILTIN_DTB when built-in DTB support is enabled.
To keep consistency across architectures, this commit also renames
CONFIG_USE_BUILTIN_DTB to CONFIG_BUILTIN_DTB, and
CONFIG_BUILTIN_DTB_SOURCE to CONFIG_BUILTIN_DTB_NAME.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
David Wang [Sat, 30 Nov 2024 13:49:09 +0000 (21:49 +0800)]
sh: irq: Use seq_put_decimal_ull_width() for decimal values
On a system with n CPUs and m interrupts, n*m decimal values are emitted
via seq_printf(.."%10u "..), which incurs a significant cost for parsing
the format string and is less efficient than seq_put_decimal_ull_width().
Stress reading /proc/interrupts indicates a ~30% performance improvement
with this patch.
Signed-off-by: David Wang <00107082@163.com>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Linus Torvalds [Sat, 1 Feb 2025 04:11:24 +0000 (20:11 -0800)]
Merge tag 'for-linus-hexagon-6.14-rc1' of git://git./linux/kernel/git/bcain/linux
Pull hexagon updates from Brian Cain:
- Move kernel prototypes out of uapi header to internal header
- Fix to address an unbalanced spinlock
- Miscellaneous patches to fix static checks
- Update bcain@quicinc.com->brian.cain@oss.qualcomm.com
* tag 'for-linus-hexagon-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux:
MAINTAINERS: Update my email address
hexagon: Fix unbalanced spinlock in die()
hexagon: Fix warning comparing pointer to 0
hexagon: Move kernel prototypes out of uapi/asm/setup.h header
hexagon: time: Remove redundant null check for resource
hexagon: fix using plain integer as NULL pointer warning in cmpxchg
Linus Torvalds [Sat, 1 Feb 2025 03:49:17 +0000 (19:49 -0800)]
Remove stale generated 'genheaders' file
This bogus stale file was added in commit 101971298be2 ("riscv: add a
warning when physical memory address overflows"). It's the old location
for what is now 'security/selinux/genheaders'.
It looks like it got incorrectly committed back when that file was in
the old location, and then rebasing kept the bogus file alive.
Reported-by: Eric Biggers <ebiggers@kernel.org>
Link: https://lore.kernel.org/linux-riscv/20250201020003.GA77370@sol.localdomain/
Fixes: 101971298be2 ("riscv: add a warning when physical memory address overflows")
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 1 Feb 2025 01:12:31 +0000 (17:12 -0800)]
Merge tag 'AT_EXECVE_CHECK-v6.14-rc1-fix1' of git://git./linux/kernel/git/kees/linux
Pull AT_EXECVE_CHECK selftest fix from Kees Cook:
"Fixes the AT_EXECVE_CHECK selftests which didn't run on old versions
of glibc"
* tag 'AT_EXECVE_CHECK-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests: Handle old glibc without execveat(2)
Linus Torvalds [Sat, 1 Feb 2025 01:10:26 +0000 (17:10 -0800)]
Merge tag 'hardening-v6.14-rc1-fix1' of git://git./linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
"This is a fix for the soon to be released GCC 15 which has regressed
its initialization of unions when performing explicit initialization
(i.e. a general problem, not specifically a hardening problem; we're
just carrying the fix).
Details in the final patch, Acked by Masahiro, with updated selftests
to validate the fix"
* tag 'hardening-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kbuild: Use -fzero-init-padding-bits=all
stackinit: Add union initialization to selftests
stackinit: Add old-style zero-init syntax to struct tests
Linus Torvalds [Fri, 31 Jan 2025 23:45:41 +0000 (15:45 -0800)]
Merge tag 'drm-next-2025-02-01' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"This is only AMD fixes:
amdgpu:
- GC 12 fix
- Aldebaran fix
- DCN 3.5 fix
- Freesync fix
amdkfd:
- Per queue reset fix
- MES fix"
* tag 'drm-next-2025-02-01' of https://gitlab.freedesktop.org/drm/kernel:
drm/amd/display: restore invalid MSA timing check for freesync
drm/amdkfd: only flush the validate MES contex
drm/amd/display: Correct register address in dcn35
drm/amd/pm: Mark MM activity as unsupported
drm/amd/amdgpu: change the config of cgcg on gfx12
drm/amdkfd: Block per-queue reset when halt_if_hws_hang=1
Linus Torvalds [Fri, 31 Jan 2025 23:39:50 +0000 (15:39 -0800)]
Merge tag 'pci-v6.14-fixes-1' of git://git./linux/kernel/git/pci/pci
Pull pci fix from Bjorn Helgaas:
- Save the original INTX_DISABLE bit at the first pcim_intx() call and
restore that at devres cleanup instead of restoring the opposite of
the most recent enable/disable pcim_intx() argument, which was wrong
when a driver called pcim_intx() multiple times or with the already
enabled state (Takashi Iwai)
* tag 'pci-v6.14-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Restore original INTX_DISABLE bit by pcim_intx()
Linus Torvalds [Fri, 31 Jan 2025 23:13:25 +0000 (15:13 -0800)]
Merge tag 'riscv-for-linus-6.14-mw1' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- The TH1520 pinctrl and dwmac drivers are enabled in defconfig
- A redundant AQRL barrier has been removed from the futex cmpxchg
implementation
- Support for the T-Head vector extensions, which includes exposing
these extensions to userspace on systems that implement them
- Some more page table information is now printed on die() and on
systems that cause PA overflows
* tag 'riscv-for-linus-6.14-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: add a warning when physical memory address overflows
riscv/mm/fault: add show_pte() before die()
riscv: Add ghostwrite vulnerability
selftests: riscv: Support xtheadvector in vector tests
selftests: riscv: Fix vector tests
riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
riscv: hwprobe: Add thead vendor extension probing
riscv: vector: Support xtheadvector save/restore
riscv: Add xtheadvector instruction definitions
riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT
RISC-V: define the elements of the VCSR vector CSR
riscv: vector: Use vlenb from DT for thead
riscv: Add thead and xtheadvector as a vendor extension
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
dt-bindings: cpus: add a thead vlen register length property
dt-bindings: riscv: Add xtheadvector ISA extension description
RISC-V: Mark riscv_v_init() as __init
riscv: defconfig: drop RT_GROUP_SCHED=y
riscv/futex: Optimize atomic cmpxchg
riscv: defconfig: enable pinctrl and dwmac support for TH1520
Thadeu Lima de Souza Cascardo [Tue, 14 Jan 2025 20:00:45 +0000 (17:00 -0300)]
Revert "media: uvcvideo: Require entities to have a non-zero unique ID"
This reverts commit 3dd075fe8ebbc6fcbf998f81a75b8c4b159a6195.
Tomasz has reported that his device, Generalplus Technology Inc. 808 Camera,
with ID 1b3f:2002, stopped being detected:
$ ls -l /dev/video*
zsh: no matches found: /dev/video*
[ 7.230599] usb 3-2: Found multiple Units with ID 5
This particular device is non-compliant, having both the Output Terminal
and Processing Unit with ID 5. uvc_scan_fallback, though, is able to build
a chain. However, when media elements are added and uvc_mc_create_links
calls uvc_entity_by_id, it will get the incorrect entity,
media_create_pad_link will WARN, and it will fail to register the entities.
In order to reinstate support for such devices in a timely fashion,
reverting the fix for these warnings is appropriate. A proper fix that
considers the existence of such non-compliant devices will be submitted in
a later development cycle.
Reported-by: Tomasz Sikora <sikora.tomus@gmail.com>
Fixes: 3dd075fe8ebb ("media: uvcvideo: Require entities to have a non-zero unique ID")
Cc: stable@vger.kernel.org
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Link: https://lore.kernel.org/r/20250114200045.1401644-1-cascardo@igalia.com
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Linus Torvalds [Fri, 31 Jan 2025 20:07:07 +0000 (12:07 -0800)]
Merge tag 'kbuild-v6.14' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Support multiple hook locations for maint scripts of Debian package
- Remove 'cpio' from the build tool requirement
- Introduce gendwarfksyms tool, which computes CRCs for export symbols
based on the DWARF information
- Support CONFIG_MODVERSIONS for Rust
- Resolve all conflicts in the genksyms parser
- Fix several syntax errors in genksyms
* tag 'kbuild-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (64 commits)
kbuild: fix Clang LTO with CONFIG_OBJTOOL=n
kbuild: Strip runtime const RELA sections correctly
kconfig: fix memory leak in sym_warn_unmet_dep()
kconfig: fix file name in warnings when loading KCONFIG_DEFCONFIG_LIST
genksyms: fix syntax error for attribute before init-declarator
genksyms: fix syntax error for builtin (u)int*x*_t types
genksyms: fix syntax error for attribute after 'union'
genksyms: fix syntax error for attribute after 'struct'
genksyms: fix syntax error for attribute after abstact_declarator
genksyms: fix syntax error for attribute before nested_declarator
genksyms: fix syntax error for attribute before abstract_declarator
genksyms: decouple ATTRIBUTE_PHRASE from type-qualifier
genksyms: record attributes consistently for init-declarator
genksyms: restrict direct-declarator to take one parameter-type-list
genksyms: restrict direct-abstract-declarator to take one parameter-type-list
genksyms: remove Makefile hack
genksyms: fix last 3 shift/reduce conflicts
genksyms: fix 6 shift/reduce conflicts and 5 reduce/reduce conflicts
genksyms: reduce type_qualifier directly to decl_specifier
genksyms: rename cvar_qualifier to type_qualifier
...
Linus Torvalds [Fri, 31 Jan 2025 19:49:30 +0000 (11:49 -0800)]
Merge tag 'block-6.14-20250131' of git://git.kernel.dk/linux
Pull more block updates from Jens Axboe:
- MD pull request via Song:
- Fix a md-cluster regression introduced
- More sysfs race fixes
- Mark anything inside queue freezing as not being able to do IO for
memory allocations
- Fix for a regression introduced in loop in this merge window
- Fix for a regression in queue mapping setups introduced in this merge
window
- Fix for the block dio fops attempting an iov_iter revert upon
getting -EIOCBQUEUED on the read side. This one is going to stable as
well
* tag 'block-6.14-20250131' of git://git.kernel.dk/linux:
block: force noio scope in blk_mq_freeze_queue
block: fix nr_hw_queue update racing with disk addition/removal
block: get rid of request queue ->sysfs_dir_lock
loop: don't clear LO_FLAGS_PARTSCAN on LOOP_SET_STATUS{,64}
md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime
blk-mq: create correct map for fallback case
block: don't revert iter for -EIOCBQUEUED
Linus Torvalds [Fri, 31 Jan 2025 19:29:23 +0000 (11:29 -0800)]
Merge tag 'io_uring-6.14-20250131' of git://git.kernel.dk/linux
Pull more io_uring updates from Jens Axboe:
- Series cleaning up the alloc cache changes from this merge window,
and then another series on top making it better yet.
This also solves an issue with KASAN_EXTRA_INFO, by making io_uring
resilient to KASAN using parts of the freed struct for storage
- Cleanups and simplifications to buffer cloning and io resource node
management
- Fix an issue introduced in this merge window where READ/WRITE_ONCE
was used on an atomic_t, which made some archs complain
- Fix for an errant connect retry when the socket has been shut down
- Fix for multishot and provided buffers
* tag 'io_uring-6.14-20250131' of git://git.kernel.dk/linux:
io_uring/net: don't retry connect operation on EPOLLERR
io_uring/rw: simplify io_rw_recycle()
io_uring: remove !KASAN guards from cache free
io_uring/net: extract io_send_select_buffer()
io_uring/net: clean io_msg_copy_hdr()
io_uring/net: make io_net_vec_assign() return void
io_uring: add alloc_cache.c
io_uring: dont ifdef io_alloc_cache_kasan()
io_uring: include all deps for alloc_cache.h
io_uring: fix multishots with selected buffers
io_uring/register: use atomic_read/write for sq_flags migration
io_uring/alloc_cache: get rid of _nocache() helper
io_uring: get rid of alloc cache init_once handling
io_uring/uring_cmd: cleanup struct io_uring_cmd_data layout
io_uring/uring_cmd: use cached cmd_op in io_uring_cmd_sock()
io_uring/msg_ring: don't leave potentially dangling ->tctx pointer
io_uring/rsrc: Move lockdep assert from io_free_rsrc_node() to caller
io_uring/rsrc: remove unused parameter ctx for io_rsrc_node_alloc()
io_uring: clean up io_uring_register_get_file()
io_uring/rsrc: Simplify buffer cloning by locking both rings
Masahiro Yamada [Fri, 31 Jan 2025 14:04:01 +0000 (23:04 +0900)]
kbuild: fix Clang LTO with CONFIG_OBJTOOL=n
Since commit bede169618c6 ("kbuild: enable objtool for *.mod.o and
additional kernel objects"), Clang LTO builds do not perform any
optimizations when CONFIG_OBJTOOL is disabled (e.g., for ARCH=arm64).
This is because every LLVM bitcode file is immediately converted to
ELF format before the object files are linked together.
This commit fixes the breakage.
Fixes: bede169618c6 ("kbuild: enable objtool for *.mod.o and additional kernel objects")
Reported-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Yonghong Song <yonghong.song@linux.dev>
Ard Biesheuvel [Mon, 13 Jan 2025 15:53:07 +0000 (16:53 +0100)]
kbuild: Strip runtime const RELA sections correctly
Due to the fact that runtime const ELF sections are named without a
leading period or double underscore, the RSTRIP logic that removes the
static RELA sections from vmlinux fails to identify them. This results
in a situation like below, where some sections that were supposed to get
removed are left behind.
[Nr] Name                              Type     Address          Off     Size   ES Flg Lk Inf Al
[58] runtime_shift_d_hash_shift        PROGBITS ffffffff83500f50 2900f50 000014 00   A  0   0  1
[59] .relaruntime_shift_d_hash_shift   RELA     0000000000000000 55b6f00 000078 18   I 70  58  8
[60] runtime_ptr_dentry_hashtable      PROGBITS ffffffff83500f68 2900f68 000014 00   A  0   0  1
[61] .relaruntime_ptr_dentry_hashtable RELA     0000000000000000 55b6f78 000078 18   I 70  60  8
[62] runtime_ptr_USER_PTR_MAX          PROGBITS ffffffff83500f80 2900f80 000238 00   A  0   0  1
[63] .relaruntime_ptr_USER_PTR_MAX     RELA     0000000000000000 55b6ff0 000d50 18   I 70  62  8
So tweak the match expression to strip all sections starting with .rel.
While at it, consolidate the logic used by RISC-V, s390 and x86 into a
single shared Makefile library command.
Link: https://lore.kernel.org/all/CAHk-=wjk3ynjomNvFN8jf9A1k=qSc=JFF591W00uXj-qqNUxPQ@mail.gmail.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>