Sibi Sankar [Mon, 24 Jun 2024 09:22:12 +0000 (14:52 +0530)]
dt-bindings: interconnect: qcom,msm8998-bwmon: Add X1E80100 BWMON instances
Document X1E80100 BWMONs, which has multiple (one per cluster) BWMONv4
instances for the CPU->LLCC path and one BWMONv5 instance for LLCC->DDR
path.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
Acked-by: Georgi Djakov <djakov@kernel.org>
Link: https://lore.kernel.org/r/20240624092214.146935-3-quic_sibis@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Sibi Sankar [Mon, 24 Jun 2024 09:22:11 +0000 (14:52 +0530)]
dt-bindings: interconnect: qcom,msm8998-bwmon: Remove opp-table from the required list
Remove opp-table from the required list as the bindings shouldn't care
where the OPP tables (referenced by the operating-points-v2 property)
comes from.
Suggested-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Georgi Djakov <djakov@kernel.org>
Link: https://lore.kernel.org/r/20240624092214.146935-2-quic_sibis@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bartosz Golaszewski [Mon, 24 Jun 2024 19:06:14 +0000 (21:06 +0200)]
firmware: qcom: tzmem: export devm_qcom_tzmem_pool_new()
EXPORT_SYMBOL_GPL() is missing for devm_qcom_tzmem_pool_new() which
causes build failures with randconfig. Add it and fix the issue.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/
202406250127.8Pl2kqFp-lkp@intel.com
Fixes:
84f5a7b67b61 ("firmware: qcom: add a dedicated TrustZone buffer allocator")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Link: https://lore.kernel.org/r/20240624190615.36282-1-brgl@bgdev.pl
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bjorn Andersson [Mon, 24 Jun 2024 03:41:51 +0000 (22:41 -0500)]
Merge branch '
20240622-qcom-pd-mapper-v9-0-
a84ee3591c8e@linaro.org' into drivers-for-6.11
Merge the in-kernel protection-domain mapper implementation through a
feature branch, to allow sharing with the remoteproc tree.
Dmitry Baryshkov [Fri, 21 Jun 2024 22:03:43 +0000 (01:03 +0300)]
soc: qcom: add pd-mapper implementation
Existing userspace protection domain mapper implementation has several
issue. It doesn't play well with CONFIG_EXTRA_FIRMWARE, it doesn't
reread JSON files if firmware location is changed (or if firmware was
not available at the time pd-mapper was started but the corresponding
directory is mounted later), etc.
Provide in-kernel service implementing protection domain mapping
required to work with several services, which are provided by the DSP
firmware.
This module is loaded automatically by the remoteproc drivers when
necessary via the symbol dependency. It uses a root node to match a
protection domains map for a particular board. It is not possible to
implement it as a 'driver' as there is no corresponding device.
Tested-by: Steev Klimaszewski <steev@kali.org>
Tested-by: Alexey Minnekhanov <alexeymin@postmarketos.org>
Reviewed-by: Chris Lew <quic_clew@quicinc.com>
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240622-qcom-pd-mapper-v9-4-a84ee3591c8e@linaro.org
[bjorn: include linux/slab.h]
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Dmitry Baryshkov [Fri, 21 Jun 2024 22:03:42 +0000 (01:03 +0300)]
soc: qcom: pdr: extract PDR message marshalling data
The in-kernel PD mapper is going to use same message structures as the
QCOM_PDR_HELPERS module. Extract message marshalling data to separate
module that can be used by both PDR helpers and by PD mapper.
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Tested-by: Alexey Minnekhanov <alexeymin@postmarketos.org>
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240622-qcom-pd-mapper-v9-3-a84ee3591c8e@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Dmitry Baryshkov [Fri, 21 Jun 2024 22:03:41 +0000 (01:03 +0300)]
soc: qcom: pdr: fix parsing of domains lists
While parsing the domains list, start offsets from 0 rather than from
domains_read. The domains_read is equal to the total count of the
domains we have seen, while the domains list in the message starts from
offset 0.
Fixes:
fbe639b44a82 ("soc: qcom: Introduce Protection Domain Restart helpers")
Tested-by: Steev Klimaszewski <steev@kali.org>
Tested-by: Alexey Minnekhanov <alexeymin@postmarketos.org>
Reviewed-by: Chris Lew <quic_clew@quicinc.com>
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240622-qcom-pd-mapper-v9-2-a84ee3591c8e@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Dmitry Baryshkov [Fri, 21 Jun 2024 22:03:40 +0000 (01:03 +0300)]
soc: qcom: pdr: protect locator_addr with the main mutex
If the service locator server is restarted fast enough, the PDR can
rewrite locator_addr fields concurrently. Protect them by placing
modification of those fields under the main pdr->lock.
Fixes:
fbe639b44a82 ("soc: qcom: Introduce Protection Domain Restart helpers")
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
Tested-by: Steev Klimaszewski <steev@kali.org>
Tested-by: Alexey Minnekhanov <alexeymin@postmarketos.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240622-qcom-pd-mapper-v9-1-a84ee3591c8e@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bartosz Golaszewski [Mon, 27 May 2024 12:55:03 +0000 (14:55 +0200)]
firmware: qcom: scm: clarify the comment in qcom_scm_pas_init_image()
The "memory protection" mechanism mentioned in the comment is the SHM
Bridge. This is also the reason why we do not convert this call to using
the TZ memory allocator.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sc8280xp-lenovo-thinkpad-x13s
Tested-by: Deepti Jaggi <quic_djaggi@quicinc.com> #sa8775p-ride
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Link: https://lore.kernel.org/r/20240527-shm-bridge-v10-13-ce7afaa58d3a@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bartosz Golaszewski [Mon, 27 May 2024 12:55:02 +0000 (14:55 +0200)]
firmware: qcom: scm: add support for SHM bridge memory carveout
Parse the "memory-region" property and - if present - use it to assign
the dedicated reserved memory to the underlying DMA callbacks which will
then allocate memory for the SCM calls from it.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Link: https://lore.kernel.org/r/20240527-shm-bridge-v10-12-ce7afaa58d3a@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bartosz Golaszewski [Mon, 27 May 2024 12:55:01 +0000 (14:55 +0200)]
firmware: qcom: tzmem: enable SHM Bridge support
SHM Bridge is a safety mechanism allowing to limit the amount of memory
shared between the kernel and the TrustZone to regions explicitly marked
as such.
Add a variant of the tzmem allocator that configures the memory pools as
SHM bridges. It also enables the SHM bridge globally so non-SHM bridge
memory will no longer work with SCM calls.
If enabled at build-time, it will still be checked for availability at
run-time. If the architecture doesn't support SHM Bridge, the allocator
will fall back to the generic mode.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sc8280xp-lenovo-thinkpad-x13s
Tested-by: Deepti Jaggi <quic_djaggi@quicinc.com> #sa8775p-ride
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Link: https://lore.kernel.org/r/20240527-shm-bridge-v10-11-ce7afaa58d3a@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bartosz Golaszewski [Mon, 27 May 2024 12:55:00 +0000 (14:55 +0200)]
firmware: qcom: scm: add support for SHM bridge operations
SHM Bridge is a safety mechanism allowing to limit the amount of memory
shared between the kernel and the TrustZone to regions explicitly marked
as such.
Add low-level primitives for enabling SHM bridge support as well as
creating and destroying SHM bridges to qcom-scm.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Acked-by: Andrew Halaney <ahalaney@redhat.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sc8280xp-lenovo-thinkpad-x13s
Tested-by: Deepti Jaggi <quic_djaggi@quicinc.com> #sa8775p-ride
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Link: https://lore.kernel.org/r/20240527-shm-bridge-v10-10-ce7afaa58d3a@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bartosz Golaszewski [Mon, 27 May 2024 12:54:59 +0000 (14:54 +0200)]
firmware: qcom: qseecom: convert to using the TZ allocator
Drop the DMA mapping operations from qcom_scm_qseecom_app_send() and
convert all users of it in the qseecom module to using the TZ allocator
for creating SCM call buffers. As this is largely a module separate from
the SCM driver, let's use a separate memory pool. Set the initial size to
4K and - if we run out - add twice the current amount to the pool.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Reviewed-by: Amirreza Zarrabi <quic_azarrabi@quicinc.com>
Link: https://lore.kernel.org/r/20240527-shm-bridge-v10-9-ce7afaa58d3a@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bartosz Golaszewski [Mon, 27 May 2024 12:54:58 +0000 (14:54 +0200)]
firmware: qcom: scm: make qcom_scm_qseecom_app_get_id() use the TZ allocator
Let's use the new TZ memory allocator to obtain a buffer for this call
instead of manually kmalloc()ing it and then mapping to physical space.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Tested-by: Maximilian Luz <luzmaximilian@gmail.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sc8280xp-lenovo-thinkpad-x13s
Tested-by: Deepti Jaggi <quic_djaggi@quicinc.com> #sa8775p-ride
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Link: https://lore.kernel.org/r/20240527-shm-bridge-v10-8-ce7afaa58d3a@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bartosz Golaszewski [Mon, 27 May 2024 12:54:57 +0000 (14:54 +0200)]
firmware: qcom: scm: make qcom_scm_lmh_dcvsh() use the TZ allocator
Let's use the new TZ memory allocator to obtain a buffer for this call
instead of using dma_alloc_coherent().
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sc8280xp-lenovo-thinkpad-x13s
Tested-by: Deepti Jaggi <quic_djaggi@quicinc.com> #sa8775p-ride
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Link: https://lore.kernel.org/r/20240527-shm-bridge-v10-7-ce7afaa58d3a@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bartosz Golaszewski [Mon, 27 May 2024 12:54:56 +0000 (14:54 +0200)]
firmware: qcom: scm: make qcom_scm_ice_set_key() use the TZ allocator
Let's use the new TZ memory allocator to obtain a buffer for this call
instead of using dma_alloc_coherent().
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sc8280xp-lenovo-thinkpad-x13s
Tested-by: Deepti Jaggi <quic_djaggi@quicinc.com> #sa8775p-ride
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Link: https://lore.kernel.org/r/20240527-shm-bridge-v10-6-ce7afaa58d3a@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bartosz Golaszewski [Mon, 27 May 2024 12:54:55 +0000 (14:54 +0200)]
firmware: qcom: scm: make qcom_scm_assign_mem() use the TZ allocator
Let's use the new TZ memory allocator to obtain a buffer for this call
instead of using dma_alloc_coherent().
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sc8280xp-lenovo-thinkpad-x13s
Tested-by: Deepti Jaggi <quic_djaggi@quicinc.com> #sa8775p-ride
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Link: https://lore.kernel.org/r/20240527-shm-bridge-v10-5-ce7afaa58d3a@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bartosz Golaszewski [Mon, 27 May 2024 12:54:54 +0000 (14:54 +0200)]
firmware: qcom: scm: smc: switch to using the SCM allocator
We need to allocate, map and pass a buffer to the trustzone if we have
more than 4 arguments for a given SCM call. Let's use the new TrustZone
allocator for that memory and shrink the code in process.
As this code lives in a different compilation unit than the rest of the
SCM code, we need to provide a helper in the form of
qcom_scm_get_tzmem_pool() that allows the SMC low-level routines to
access the SCM memory pool.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sc8280xp-lenovo-thinkpad-x13s
Tested-by: Deepti Jaggi <quic_djaggi@quicinc.com> #sa8775p-ride
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Link: https://lore.kernel.org/r/20240527-shm-bridge-v10-4-ce7afaa58d3a@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bartosz Golaszewski [Mon, 27 May 2024 12:54:53 +0000 (14:54 +0200)]
firmware: qcom: scm: enable the TZ mem allocator
Select the TrustZone memory allocator in Kconfig and create a pool of
memory shareable with the TrustZone when probing the SCM driver.
This will allow a gradual conversion of all relevant SCM calls to using
the dedicated allocator.
The policy used for the pool is "on-demand" and the initial size is 0
as - depending on the config - it's possible that no SCM calls needing
to allocate memory will be called. The sizes of possible allocations also
vary substiantially further warranting the "on-demand" approach.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sc8280xp-lenovo-thinkpad-x13s
Tested-by: Deepti Jaggi <quic_djaggi@quicinc.com> #sa8775p-ride
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Link: https://lore.kernel.org/r/20240527-shm-bridge-v10-3-ce7afaa58d3a@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bartosz Golaszewski [Mon, 27 May 2024 12:54:52 +0000 (14:54 +0200)]
firmware: qcom: add a dedicated TrustZone buffer allocator
We have several SCM calls that require passing buffers to the TrustZone
on top of the SMC core which allocates memory for calls that require
more than 4 arguments.
Currently every user does their own thing which leads to code
duplication. Many users call dma_alloc_coherent() for every call which
is terribly unperformant (speed- and size-wise).
Provide a set of library functions for creating and managing pools of
memory which is suitable for sharing with the TrustZone, that is:
page-aligned, contiguous and non-cachable as well as provides a way of
mapping of kernel virtual addresses to physical space.
Make the allocator ready for extending with additional modes of operation
which will allow us to support the SHM bridge safety mechanism once all
users convert.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sc8280xp-lenovo-thinkpad-x13s
Tested-by: Deepti Jaggi <quic_djaggi@quicinc.com> #sa8775p-ride
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Link: https://lore.kernel.org/r/20240527-shm-bridge-v10-2-ce7afaa58d3a@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bartosz Golaszewski [Mon, 27 May 2024 12:54:51 +0000 (14:54 +0200)]
dt-bindings: firmware: qcom,scm: add memory-region for sa8775p
Document a new property (currently only for sa8775p) that describes the
memory region reserved for communicating with the TrustZone.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20240527-shm-bridge-v10-1-ce7afaa58d3a@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Sibi Sankar [Thu, 13 Jun 2024 16:45:06 +0000 (22:15 +0530)]
soc: qcom: icc-bwmon: Fix refcount imbalance seen during bwmon_remove
The following warning is seen during bwmon_remove due to refcount
imbalance, fix this by releasing the OPPs after use.
Logs:
WARNING: at drivers/opp/core.c:1640 _opp_table_kref_release+0x150/0x158
Hardware name: Qualcomm Technologies, Inc. X1E80100 CRD (DT)
...
Call trace:
_opp_table_kref_release+0x150/0x158
dev_pm_opp_remove_table+0x100/0x1b4
devm_pm_opp_of_table_release+0x10/0x1c
devm_action_release+0x14/0x20
devres_release_all+0xa4/0x104
device_unbind_cleanup+0x18/0x60
device_release_driver_internal+0x1ec/0x228
driver_detach+0x50/0x98
bus_remove_driver+0x6c/0xbc
driver_unregister+0x30/0x60
platform_driver_unregister+0x14/0x20
bwmon_driver_exit+0x18/0x524 [icc_bwmon]
__arm64_sys_delete_module+0x184/0x264
invoke_syscall+0x48/0x118
el0_svc_common.constprop.0+0xc8/0xe8
do_el0_svc+0x20/0x2c
el0_svc+0x34/0xdc
el0t_64_sync_handler+0x13c/0x158
el0t_64_sync+0x190/0x194
--[ end trace
0000000000000000 ]---
Fixes:
0276f69f13e2 ("soc: qcom: icc-bwmon: Set default thresholds dynamically")
Fixes:
b9c2ae6cac40 ("soc: qcom: icc-bwmon: Add bandwidth monitoring driver")
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Link: https://lore.kernel.org/r/20240613164506.982068-1-quic_sibis@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bjorn Andersson [Fri, 21 Jun 2024 03:57:26 +0000 (22:57 -0500)]
Merge branch '
20240430-a750-raytracing-v3-2-
7f57c5ac082d@gmail.com' into drivers-for-6.11
Merge SMEM and SCM patches related to GPU features through a topic
branch to make it possible to share these with the msm-next DRM tree.
Konrad Dybcio [Wed, 5 Jun 2024 20:10:15 +0000 (22:10 +0200)]
soc: qcom: smem: Add a feature code getter
Recent (SM8550+ ish) Qualcomm SoCs have a new mechanism for precisely
identifying the specific SKU and the precise speed bin (in the general
meaning of this word, anyway): a pair of values called Product Code
and Feature Code.
Based on this information, we can deduce the available frequencies for
things such as Adreno. In the case of Adreno specifically, Pcode is
useless for non-prototype SoCs.
Introduce a getter for the feature code and export it.
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Link: https://lore.kernel.org/r/20240605-topic-smem_speedbin-v2-2-8989d7e3d176@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Konrad Dybcio [Wed, 5 Jun 2024 20:10:14 +0000 (22:10 +0200)]
soc: qcom: Move some socinfo defines to the header
In preparation for parsing the chip "feature code" (FC) and "product
code" (PC) (essentially the parameters that let us conclusively
characterize the sillicon we're running on, including various speed
bins), move the socinfo version defines to the public header.
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Link: https://lore.kernel.org/r/20240605-topic-smem_speedbin-v2-1-8989d7e3d176@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Connor Abbott [Tue, 30 Apr 2024 10:43:16 +0000 (11:43 +0100)]
firmware: qcom: scm: Add gpu_init_regs call
This will used by drm/msm to initialize GPU registers that Qualcomm's
firmware doesn't make writeable to the kernel.
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Acked-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20240430-a750-raytracing-v3-2-7f57c5ac082d@gmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Luca Weiss [Thu, 6 Jun 2024 19:18:33 +0000 (21:18 +0200)]
soc: qcom: smsm: Support using mailbox interface
Add support for using the mbox interface instead of manually writing to
the syscon. With this change the driver will attempt to get the mailbox
first, and if that fails it will fall back to the existing way of using
qcom,ipc-* properties and converting to syscon.
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Luca Weiss <luca@z3ntu.xyz>
Link: https://lore.kernel.org/r/20240606-smsm-mbox-v2-2-8abe6b5f01da@z3ntu.xyz
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Luca Weiss [Thu, 6 Jun 2024 19:18:32 +0000 (21:18 +0200)]
dt-bindings: soc: qcom,smsm: Allow specifying mboxes instead of qcom,ipc
The qcom,ipc-N properties are essentially providing a reference to a
mailbox, so allow using the mboxes property to do the same in a more
structured way.
Since multiple SMSM hosts are supported, we need to be able to provide
the correct mailbox for each host. The old qcom,ipc-N properties map to
the mboxes property by index, starting at 0 since that's a valid SMSM
host also.
Mark the older qcom,ipc-N as deprecated and update the example with
mboxes.
Signed-off-by: Luca Weiss <luca@z3ntu.xyz>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240606-smsm-mbox-v2-1-8abe6b5f01da@z3ntu.xyz
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Jeff Johnson [Tue, 4 Jun 2024 00:09:34 +0000 (17:09 -0700)]
soc: qcom: spm: add missing MODULE_DESCRIPTION()
make allmodconfig && make W=1 C=1 warns:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/soc/qcom/spm.o
Add the missing MODULE_DESCRIPTION(), using the same description as
the underlying QCOM_SPM Kconfig item.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240603-md-soc-qcom-spm-v1-1-617730f08d22@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Unnathi Chalicheemala [Fri, 31 May 2024 16:45:25 +0000 (09:45 -0700)]
soc: qcom: llcc: Add regmap for Broadcast_AND region
Until SM8450, there was only one broadcast region (Broadcast_OR)
used to broadcast write and check for status bit 0.
>From SM8450 onwards another broadcast region (Broadcast_AND) has been
added which checks for status bit 1. This hasn't been updated and
Broadcast_OR region was wrongly being used to check for status bit 1 all
along.
Hence define new regmap structure for Broadcast_AND region and initialize
this regmap when HW block version is greater than 4.1, otherwise
initialize as a NULL pointer for backwards compatibility.
Switch from broadcast_OR to broadcast_AND region (when defined in DT)
for checking status bit 1 as Broadcast_OR region checks only for bit 0.
Signed-off-by: Unnathi Chalicheemala <quic_uchalich@quicinc.com>
Reviewed-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Link: https://lore.kernel.org/r/9cf19928a67eaa577ae0f02de5bf86276be34ea2.1717014052.git.quic_uchalich@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Unnathi Chalicheemala [Fri, 31 May 2024 16:45:24 +0000 (09:45 -0700)]
dt-bindings: arm: msm: Add llcc Broadcast_AND register
The LLCC block in SM8450, SM8550 and SM8650 have a new register
space for Broadcast_AND region. This is used to check that all
channels have bit set to "1", mainly in SCID activation/deactivation.
Previously we were mapping only the Broadcast_OR region assuming
there was only one broadcast register region. Now we also map
Broadcast_AND region.
Signed-off-by: Unnathi Chalicheemala <quic_uchalich@quicinc.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/3306bf3026f38b0486e00307d26827d71c99915d.1717014052.git.quic_uchalich@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Tengfei Fan [Wed, 29 May 2024 10:15:33 +0000 (18:15 +0800)]
soc: qcom: llcc: Add llcc configuration support for the SA8775p platform
Add llcc configuration support for the SA8775p platform.
Signed-off-by: Tengfei Fan <quic_tengfan@quicinc.com>
Link: https://lore.kernel.org/r/20240529101534.3166507-3-quic_tengfan@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Tengfei Fan [Wed, 29 May 2024 10:15:32 +0000 (18:15 +0800)]
dt-bindings: cache: qcom,llcc: Add SA8775p description
Add the cache controller compatible and register region descriptions for
SA8775p platform.
Signed-off-by: Tengfei Fan <quic_tengfan@quicinc.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240529101534.3166507-2-quic_tengfan@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Luca Weiss [Thu, 25 Apr 2024 19:14:31 +0000 (21:14 +0200)]
dt-bindings: soc: qcom,smp2p: Mark qcom,ipc as deprecated
Deprecate the qcom,ipc way of accessing the mailbox in favor of the
'mboxes' property.
Update the example to use mboxes.
Signed-off-by: Luca Weiss <luca@z3ntu.xyz>
Reviewed-by: "Rob Herring (Arm)" <robh@kernel.org>
Link: https://lore.kernel.org/r/20240425-qcom-ipc-deprecate-v1-2-a8d8034253ea@z3ntu.xyz
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Richard Acayan [Fri, 24 May 2024 01:20:26 +0000 (21:20 -0400)]
soc: qcom: socinfo: Add SDM670 SoC ID table entry
There is support for SDM670 already, but not recognized by the socinfo
driver. Add the table entry so SDM670 can be found in sysfs.
Signed-off-by: Richard Acayan <mailingradian@gmail.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Link: https://lore.kernel.org/r/20240524012023.318965-7-mailingradian@gmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Richard Acayan [Fri, 24 May 2024 01:20:25 +0000 (21:20 -0400)]
dt-bindings: arm: qcom,ids: Add SoC ID for SDM670
The socinfo driver uses SoC IDs from the device tree bindings.
Add the SoC ID for SDM670.
Signed-off-by: Richard Acayan <mailingradian@gmail.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240524012023.318965-6-mailingradian@gmail.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Kathiravan Thirumoorthy [Mon, 25 Mar 2024 15:49:50 +0000 (21:19 +0530)]
cpufreq: qcom-nvmem: add support for IPQ5321
Like all other SoCs in IPQ5332 family, cpufreq for IPQ5321 is also
determined by the eFuse, with the maximum limit of 1.1GHz. Add support
for the same.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Kathiravan Thirumoorthy <quic_kathirav@quicinc.com>
Link: https://lore.kernel.org/r/20240325-ipq5321-sku-support-v2-3-f30ce244732f@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Kathiravan Thirumoorthy [Mon, 25 Mar 2024 15:49:49 +0000 (21:19 +0530)]
soc: qcom: socinfo: Add SoC ID for IPQ5321
Add the SoC ID for IPQ5321, which belong to the family of IPQ5332 SoC.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Kathiravan Thirumoorthy <quic_kathirav@quicinc.com>
Link: https://lore.kernel.org/r/20240325-ipq5321-sku-support-v2-2-f30ce244732f@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Kathiravan Thirumoorthy [Mon, 25 Mar 2024 15:49:48 +0000 (21:19 +0530)]
dt-bindings: arm: qcom,ids: Add SoC ID for IPQ5321
Add the ID for the Qualcomm IPQ5321 SoC.
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Kathiravan Thirumoorthy <quic_kathirav@quicinc.com>
Link: https://lore.kernel.org/r/20240325-ipq5321-sku-support-v2-1-f30ce244732f@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Stephen Boyd [Thu, 9 May 2024 18:41:28 +0000 (11:41 -0700)]
soc: qcom: rpmh-rsc: Ensure irqs aren't disabled by rpmh_rsc_send_data() callers
Dan pointed out that Smatch is concerned about this code because it uses
spin_lock_irqsave() and then calls wait_event_lock_irq() which enables
irqs before going to sleep. The comment above the function says it
should be called with interrupts enabled, but we simply hope that's true
without really confirming that. Let's add a might_sleep() here to
confirm that interrupts and preemption aren't disabled. Once we do that,
we can change the lock to be non-saving, spin_lock_irq(), to clarify
that we don't expect irqs to be disabled. If irqs are disabled by
callers they're going to be enabled anyway in the wait_event_lock_irq()
call which would be bad.
This should make Smatch happier and find bad callers faster with the
might_sleep(). We can drop the WARN_ON() in the caller because we have
the might_sleep() now, simplifying the code.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/
911181ed-c430-4592-ad26-
4dc948834e08@moroto.mountain
Fixes:
2bc20f3c8487 ("soc: qcom: rpmh-rsc: Sleep waiting for tcs slots to be free")
Cc: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240509184129.3924422-1-swboyd@chromium.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Chen Ni [Fri, 10 May 2024 08:31:56 +0000 (16:31 +0800)]
soc: qcom: pmic_glink: Handle the return value of pmic_glink_init
As platform_driver_register() and register_rpmsg_driver() can return
error numbers, it should be better to check the return value and deal
with the exception.
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Fixes:
58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver")
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Link: https://lore.kernel.org/r/20240510083156.1996783-1-nichen@iscas.ac.cn
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Unnathi Chalicheemala [Tue, 14 May 2024 18:00:46 +0000 (11:00 -0700)]
firmware: qcom-scm: Remove QCOM_SMC_WAITQ_FLAG_WAKE_ALL
This flag was never supported by firmware, so remove it.
Signed-off-by: Unnathi Chalicheemala <quic_uchalich@quicinc.com>
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Link: https://lore.kernel.org/r/20240514180046.543763-1-quic_uchalich@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Abel Vesa [Mon, 27 May 2024 08:16:01 +0000 (11:16 +0300)]
soc: qcom: pmic_glink: Increase max ports to 3
Up until now, all Qualcomm platforms only had maximum 2 ports. The X Elite
(x1e80100) adds a third one. Increase the maximum allowed to 3.
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240527-x1e80100-soc-qcom-pmic-glink-v1-1-e5c4cda2f745@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Rohit Agarwal [Fri, 26 Apr 2024 05:53:23 +0000 (11:23 +0530)]
dt-bindings: soc: qcom,aoss-qmp: Document the SDX75 AOSS channel
Document the Always-On Subsystem side channel on the SDX75 Platform.
Signed-off-by: Rohit Agarwal <quic_rohiagar@quicinc.com>
Acked-by: "Rob Herring (Arm)" <robh@kernel.org>
Link: https://lore.kernel.org/r/20240426055326.3141727-4-quic_rohiagar@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Konrad Dybcio [Wed, 22 May 2024 11:33:43 +0000 (13:33 +0200)]
soc: qcom: socinfo: Update X1E PMICs
Assign the correct name to ID 82 and fix the ID of SMB2360.
Fixes:
e025171d1ab1 ("soc: qcom: socinfo: Add SMB2360 PMIC")
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20240522-topic-x1e_pmics_socinfo-v1-1-da8a097e5134@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Bjorn Andersson [Fri, 24 May 2024 03:28:41 +0000 (20:28 -0700)]
firmware: qcom: uefisecapp: Allow on X1E devices
As with previous platforms, qseecom and the uefisecapp provides access
to EFI variables. Add X1E CRD and QCP devices to the allowlist.
Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Tested-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Link: https://lore.kernel.org/r/20240523-x1e-efivarfs-v1-1-5d986265b8e4@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Linus Torvalds [Sun, 26 May 2024 22:20:12 +0000 (15:20 -0700)]
Linux 6.10-rc1
Kent Overstreet [Fri, 24 May 2024 15:42:09 +0000 (11:42 -0400)]
mm: percpu: Include smp.h in alloc_tag.h
percpu.h depends on smp.h, but doesn't include it directly because of
circular header dependency issues; percpu.h is needed in a bunch of low
level headers.
This fixes a randconfig build error on mips:
include/linux/alloc_tag.h: In function '__alloc_tag_ref_set':
include/asm-generic/percpu.h:31:40: error: implicit declaration of function 'raw_smp_processor_id' [-Werror=implicit-function-declaration]
Reported-by: kernel test robot <lkp@intel.com>
Fixes:
24e44cc22aa3 ("mm: percpu: enable per-cpu allocation tagging")
Closes: https://lore.kernel.org/oe-kbuild-all/
202405210052.DIrMXJNz-lkp@intel.com/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 26 May 2024 16:54:26 +0000 (09:54 -0700)]
Merge tag 'perf-tools-fixes-for-v6.10-1-2024-05-26' of git://git./linux/kernel/git/perf/perf-tools
Pull perf tool fix from Arnaldo Carvalho de Melo:
"Revert a patch causing a regression.
This made a simple 'perf record -e cycles:pp make -j199' stop working
on the Ampere ARM64 system Linus uses to test ARM64 kernels".
* tag 'perf-tools-fixes-for-v6.10-1-2024-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"
Arnaldo Carvalho de Melo [Sun, 26 May 2024 11:13:21 +0000 (08:13 -0300)]
Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"
This reverts commit
617824a7f0f73e4de325cf8add58e55b28c12493.
This made a simple 'perf record -e cycles:pp make -j199' stop working on
the Ampere ARM64 system Linus uses to test ARM64 kernels, as discussed
at length in the threads in the Link tags below.
The fix provided by Ian wasn't acceptable and work to fix this will take
time we don't have at this point, so lets revert this and work on it on
the next devel cycle.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Cc: Ethan Adams <j.ethan.adams@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tycho Andersen <tycho@tycho.pizza>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/lkml/CAHk-=wi5Ri=yR2jBVk-4HzTzpoAWOgstr1LEvg_-OXtJvXXJOA@mail.gmail.com
Link: https://lore.kernel.org/lkml/CAHk-=wiWvtFyedDNpoV7a8Fq_FpbB+F5KmWK2xPY3QoYseOf_A@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Sun, 26 May 2024 05:33:10 +0000 (22:33 -0700)]
Merge tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- two important netfs integration fixes - including for a data
corruption and also fixes for multiple xfstests
- reenable swap support over SMB3
* tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix missing set of remote_i_size
cifs: Fix smb3_insert_range() to move the zero_point
cifs: update internal version number
smb3: reenable swapfiles over SMB3 mounts
Linus Torvalds [Sat, 25 May 2024 22:10:33 +0000 (15:10 -0700)]
Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"16 hotfixes, 11 of which are cc:stable.
A few nilfs2 fixes, the remainder are for MM: a couple of selftests
fixes, various singletons fixing various issues in various parts"
* tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/ksm: fix possible UAF of stable_node
mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
nilfs2: fix potential hang in nilfs_detach_log_writer()
nilfs2: fix unexpected freezing of nilfs_segctor_sync()
nilfs2: fix use-after-free of timer for log writer thread
selftests/mm: fix build warnings on ppc64
arm64: patching: fix handling of execmem addresses
selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
selftests/mm: compaction_test: fix bogus test success on Aarch64
mailmap: update email address for Satya Priya
mm/huge_memory: don't unpoison huge_zero_folio
kasan, fortify: properly rename memintrinsics
lib: add version into /proc/allocinfo output
mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
Linus Torvalds [Sat, 25 May 2024 21:48:40 +0000 (14:48 -0700)]
Merge tag 'irq-urgent-2024-05-25' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
- Fix x86 IRQ vector leak caused by a CPU offlining race
- Fix build failure in the riscv-imsic irqchip driver
caused by an API-change semantic conflict
- Fix use-after-free in irq_find_at_or_after()
* tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline
irqchip/riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
Linus Torvalds [Sat, 25 May 2024 21:40:09 +0000 (14:40 -0700)]
Merge tag 'x86-urgent-2024-05-25' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix regressions of the new x86 CPU VFM (vendor/family/model)
enumeration/matching code
- Fix crash kernel detection on buggy firmware with
non-compliant ACPI MADT tables
- Address Kconfig warning
* tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL
crypto: x86/aes-xts - switch to new Intel CPU model defines
x86/topology: Handle bogus ACPI tables correctly
x86/kconfig: Select ARCH_WANT_FRAME_POINTERS again when UNWINDER_FRAME_POINTER=y
Linus Torvalds [Sat, 25 May 2024 21:32:29 +0000 (14:32 -0700)]
Merge tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi
Pull ipmi updates from Corey Minyard:
"Mostly updates for deprecated interfaces, platform.remove and
converting from a tasklet to a BH workqueue.
Also use HAS_IOPORT for disabling inb()/outb()"
* tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi:
ipmi: kcs_bmc_npcm7xx: Convert to platform remove callback returning void
ipmi: kcs_bmc_aspeed: Convert to platform remove callback returning void
ipmi: ipmi_ssif: Convert to platform remove callback returning void
ipmi: ipmi_si_platform: Convert to platform remove callback returning void
ipmi: ipmi_powernv: Convert to platform remove callback returning void
ipmi: bt-bmc: Convert to platform remove callback returning void
char: ipmi: handle HAS_IOPORT dependencies
ipmi: Convert from tasklet to BH workqueue
Linus Torvalds [Sat, 25 May 2024 21:23:58 +0000 (14:23 -0700)]
Merge tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"A series from Xiubo that adds support for additional access checks
based on MDS auth caps which were recently made available to clients.
This is needed to prevent scenarios where the MDS quietly discards
updates that a UID-restricted client previously (wrongfully) acked to
the user.
Other than that, just a documentation fixup"
* tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client:
doc: ceph: update userspace command to get CephFS metadata
ceph: add CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK feature bit
ceph: check the cephx mds auth access for async dirop
ceph: check the cephx mds auth access for open
ceph: check the cephx mds auth access for setattr
ceph: add ceph_mds_check_access() helper
ceph: save cap_auths in MDS client when session is opened
Linus Torvalds [Sat, 25 May 2024 21:19:01 +0000 (14:19 -0700)]
Merge tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:
"Fixes:
- reusing of the file index (could cause the file to be trimmed)
- infinite dir enumeration
- taking DOS names into account during link counting
- le32_to_cpu conversion, 32 bit overflow, NULL check
- some code was refactored
Changes:
- removed max link count info display during driver init
Remove:
- atomic_open has been removed for lack of use"
* tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3:
fs/ntfs3: Break dir enumeration if directory contents error
fs/ntfs3: Fix case when index is reused during tree transformation
fs/ntfs3: Mark volume as dirty if xattr is broken
fs/ntfs3: Always make file nonresident on fallocate call
fs/ntfs3: Redesign ntfs_create_inode to return error code instead of inode
fs/ntfs3: Use variable length array instead of fixed size
fs/ntfs3: Use 64 bit variable to avoid 32 bit overflow
fs/ntfs3: Check 'folio' pointer for NULL
fs/ntfs3: Missed le32_to_cpu conversion
fs/ntfs3: Remove max link count info display during driver init
fs/ntfs3: Taking DOS names into account during link counting
fs/ntfs3: remove atomic_open
fs/ntfs3: use kcalloc() instead of kzalloc()
Linus Torvalds [Sat, 25 May 2024 21:15:39 +0000 (14:15 -0700)]
Merge tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
"Two ksmbd server fixes, both for stable"
* tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: ignore trailing slashes in share paths
ksmbd: avoid to send duplicate oplock break notifications
Linus Torvalds [Sat, 25 May 2024 20:33:53 +0000 (13:33 -0700)]
Merge tag 'rtc-6.10' of git://git./linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"There is one new driver and then most of the changes are the device
tree bindings conversions to yaml.
New driver:
- Epson RX8111
Drivers:
- Many Device Tree bindings conversions to dtschema
- pcf8563: wakeup-source support"
* tag 'rtc-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
pcf8563: add wakeup-source support
rtc: rx8111: handle VLOW flag
rtc: rx8111: demote warnings to debug level
rtc: rx6110: Constify struct regmap_config
dt-bindings: rtc: convert trivial devices into dtschema
dt-bindings: rtc: stmp3xxx-rtc: convert to dtschema
dt-bindings: rtc: pxa-rtc: convert to dtschema
rtc: Add driver for Epson RX8111
dt-bindings: rtc: Add Epson RX8111
rtc: mcp795: drop unneeded MODULE_ALIAS
rtc: nuvoton: Modify part number value
rtc: test: Split rtc unit test into slow and normal speed test
dt-bindings: rtc: nxp,lpc1788-rtc: convert to dtschema
dt-bindings: rtc: digicolor-rtc: move to trivial-rtc
dt-bindings: rtc: alphascale,asm9260-rtc: convert to dtschema
dt-bindings: rtc: armada-380-rtc: convert to dtschema
rtc: cros-ec: provide ID table for avoiding fallback match
Linus Torvalds [Sat, 25 May 2024 20:28:29 +0000 (13:28 -0700)]
Merge tag 'i3c/for-6.10' of git://git./linux/kernel/git/i3c/linux
Pull i3c updates from Alexandre Belloni:
"Runtime PM (power management) is improved and hot-join support has
been added to the dw controller driver.
Core:
- Allow device driver to trigger controller runtime PM
Drivers:
- dw: hot-join support
- svc: better IBI handling"
* tag 'i3c/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: dw: Add hot-join support.
i3c: master: Enable runtime PM for master controller
i3c: master: svc: fix invalidate IBI type and miss call client IBI handler
i3c: master: svc: change ENXIO to EAGAIN when IBI occurs during start frame
i3c: Add comment for -EAGAIN in i3c_device_do_priv_xfers()
Linus Torvalds [Sat, 25 May 2024 20:23:42 +0000 (13:23 -0700)]
Merge tag 'jffs2-for-linus-6.10-rc1' of git://git./linux/kernel/git/rw/ubifs
Pull jffs2 updates from Richard Weinberger:
- Fix illegal memory access in jffs2_free_inode()
- Kernel-doc fixes
- print symbolic error names
* tag 'jffs2-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
jffs2: Fix potential illegal address access in jffs2_free_inode
jffs2: Simplify the allocation of slab caches
jffs2: nodemgmt: fix kernel-doc comments
jffs2: print symbolic error name instead of error code
Linus Torvalds [Sat, 25 May 2024 20:17:48 +0000 (13:17 -0700)]
Merge tag 'uml-for-linus-6.10-rc1' of git://git./linux/kernel/git/uml/linux
Pull UML updates from Richard Weinberger:
- Fixes for -Wmissing-prototypes warnings and further cleanup
- Remove callback returning void from rtc and virtio drivers
- Fix bash location
* tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (26 commits)
um: virtio_uml: Convert to platform remove callback returning void
um: rtc: Convert to platform remove callback returning void
um: Remove unused do_get_thread_area function
um: Fix -Wmissing-prototypes warnings for __vdso_*
um: Add an internal header shared among the user code
um: Fix the declaration of kasan_map_memory
um: Fix the -Wmissing-prototypes warning for get_thread_reg
um: Fix the -Wmissing-prototypes warning for __switch_mm
um: Fix -Wmissing-prototypes warnings for (rt_)sigreturn
um: Stop tracking host PID in cpu_tasks
um: process: remove unused 'n' variable
um: vector: remove unused len variable/calculation
um: vector: fix bpfflash parameter evaluation
um: slirp: remove set but unused variable 'pid'
um: signal: move pid variable where needed
um: Makefile: use bash from the environment
um: Add winch to winch_handlers before registering winch IRQ
um: Fix -Wmissing-prototypes warnings for __warp_* and foo
um: Fix -Wmissing-prototypes warnings for text_poke*
um: Move declarations to proper headers
...
Linus Torvalds [Sat, 25 May 2024 00:28:02 +0000 (17:28 -0700)]
Merge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Some fixes for the end of the merge window, mostly amdgpu and panthor,
with one nouveau uAPI change that fixes a bad decision we made a few
months back.
nouveau:
- fix bo metadata uAPI for vm bind
panthor:
- Fixes for panthor's heap logical block.
- Reset on unrecoverable fault
- Fix VM references.
- Reset fix.
xlnx:
- xlnx compile and doc fixes.
amdgpu:
- Handle vbios table integrated info v2.3
amdkfd:
- Handle duplicate BOs in reserve_bo_and_cond_vms
- Handle memory limitations on small APUs
dp/mst:
- MST null deref fix.
bridge:
- Don't let next bridge create connector in adv7511 to make probe
work"
* tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel:
drm/amdgpu/atomfirmware: add intergrated info v2.3 table
drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2
drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
drm/bridge: adv7511: Attach next bridge without creating connector
drm/buddy: Fix the warn on's during force merge
drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations
drm/panthor: Call panthor_sched_post_reset() even if the reset failed
drm/panthor: Reset the FW VM to NULL on unplug
drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level
drm/panthor: Force an immediate reset on unrecoverable faults
drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints
drm/panthor: Fix an off-by-one in the heap context retrieval logic
drm/panthor: Relax the constraints on the tiler chunk size
drm/panthor: Make sure the tiler initial/max chunks are consistent
drm/panthor: Fix tiler OOM handling to allow incremental rendering
drm: xlnx: zynqmp_dpsub: Fix compilation error
drm: xlnx: zynqmp_dpsub: Fix few function comments
David Howells [Fri, 24 May 2024 14:23:36 +0000 (15:23 +0100)]
cifs: Fix missing set of remote_i_size
Occasionally, the generic/001 xfstest will fail indicating corruption in
one of the copy chains when run on cifs against a server that supports
FSCTL_DUPLICATE_EXTENTS_TO_FILE (eg. Samba with a share on btrfs). The
problem is that the remote_i_size value isn't updated by cifs_setsize()
when called by smb2_duplicate_extents(), but i_size *is*.
This may cause cifs_remap_file_range() to then skip the bit after calling
->duplicate_extents() that sets sizes.
Fix this by calling netfs_resize_file() in smb2_duplicate_extents() before
calling cifs_setsize() to set i_size.
This means we don't then need to call netfs_resize_file() upon return from
->duplicate_extents(), but we also fix the test to compare against the pre-dup
inode size.
[Note that this goes back before the addition of remote_i_size with the
netfs_inode struct. It should probably have been setting cifsi->server_eof
previously.]
Fixes:
cfc63fc8126a ("smb3: fix cached file size problems in duplicate extents (reflink)")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
Signed-off-by: Steve French <stfrench@microsoft.com>
David Howells [Wed, 22 May 2024 08:38:48 +0000 (09:38 +0100)]
cifs: Fix smb3_insert_range() to move the zero_point
Fix smb3_insert_range() to move the zero_point over to the new EOF.
Without this, generic/147 fails as reads of data beyond the old EOF point
return zeroes.
Fixes:
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Fri, 24 May 2024 19:47:28 +0000 (12:47 -0700)]
Merge tag 'mm-stable-2024-05-24-11-49' of git://git./linux/kernel/git/akpm/mm
Pull more mm updates from Andrew Morton:
"Jeff Xu's implementation of the mseal() syscall"
* tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
selftest mm/mseal read-only elf memory segment
mseal: add documentation
selftest mm/mseal memory sealing
mseal: add mseal syscall
mseal: wire up mseal syscall
Chengming Zhou [Mon, 13 May 2024 03:07:56 +0000 (11:07 +0800)]
mm/ksm: fix possible UAF of stable_node
The commit
2c653d0ee2ae ("ksm: introduce ksm_max_page_sharing per page
deduplication limit") introduced a possible failure case in the
stable_tree_insert(), where we may free the new allocated stable_node_dup
if we fail to prepare the missing chain node.
Then that kfolio return and unlock with a freed stable_node set... And
any MM activities can come in to access kfolio->mapping, so UAF.
Fix it by moving folio_set_stable_node() to the end after stable_node
is inserted successfully.
Link: https://lkml.kernel.org/r/20240513-b4-ksm-stable-node-uaf-v1-1-f687de76f452@linux.dev
Fixes:
2c653d0ee2ae ("ksm: introduce ksm_max_page_sharing per page deduplication limit")
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Miaohe Lin [Thu, 23 May 2024 07:12:17 +0000 (15:12 +0800)]
mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
When I did memory failure tests recently, below panic occurs:
page: refcount:0 mapcount:0 mapping:
0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
raw:
06fffe0000000000 dead000000000100 dead000000000122 0000000000000000
raw:
0000000000000000 0000000000000009 00000000ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page))
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:1009!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:__del_page_from_free_list+0x151/0x180
RSP: 0018:
ffffa49c90437998 EFLAGS:
00000046
RAX:
0000000000000035 RBX:
0000000000000009 RCX:
ffff8dd8dfd1c9c8
RDX:
0000000000000000 RSI:
0000000000000027 RDI:
ffff8dd8dfd1c9c0
RBP:
ffffd901233b8000 R08:
ffffffffab5511f8 R09:
0000000000008c69
R10:
0000000000003c15 R11:
ffffffffab5511f8 R12:
ffff8dd8fffc0c80
R13:
0000000000000001 R14:
ffff8dd8fffc0c80 R15:
0000000000000009
FS:
00007ff916304740(0000) GS:
ffff8dd8dfd00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
000055eae50124c8 CR3:
00000008479e0000 CR4:
00000000000006f0
Call Trace:
<TASK>
__rmqueue_pcplist+0x23b/0x520
get_page_from_freelist+0x26b/0xe40
__alloc_pages_noprof+0x113/0x1120
__folio_alloc_noprof+0x11/0xb0
alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130
__alloc_fresh_hugetlb_folio+0xe7/0x140
alloc_pool_huge_folio+0x68/0x100
set_max_huge_pages+0x13d/0x340
hugetlb_sysctl_handler_common+0xe8/0x110
proc_sys_call_handler+0x194/0x280
vfs_write+0x387/0x550
ksys_write+0x64/0xe0
do_syscall_64+0xc2/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff916114887
RSP: 002b:
00007ffec8a2fd78 EFLAGS:
00000246 ORIG_RAX:
0000000000000001
RAX:
ffffffffffffffda RBX:
000055eae500e350 RCX:
00007ff916114887
RDX:
0000000000000004 RSI:
000055eae500e390 RDI:
0000000000000003
RBP:
000055eae50104c0 R08:
0000000000000000 R09:
000055eae50104c0
R10:
0000000000000077 R11:
0000000000000246 R12:
0000000000000004
R13:
0000000000000004 R14:
00007ff916216b80 R15:
00007ff916216a00
</TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace
0000000000000000 ]---
And before the panic, there had an warning about bad page state:
BUG: Bad page state in process page-types pfn:8cee00
page: refcount:0 mapcount:0 mapping:
0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
page_type: 0xffffff7f(buddy)
raw:
06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000
raw:
0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000
page dumped because: nonzero mapcount
Modules linked in: mce_inject hwpoison_inject
CPU: 8 PID: 154211 Comm: page-types Not tainted
6.9.0-rc4-00499-g5544ec3178e2-dirty #22
Call Trace:
<TASK>
dump_stack_lvl+0x83/0xa0
bad_page+0x63/0xf0
free_unref_page+0x36e/0x5c0
unpoison_memory+0x50b/0x630
simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
debugfs_attr_write+0x42/0x60
full_proxy_write+0x5b/0x80
vfs_write+0xcd/0x550
ksys_write+0x64/0xe0
do_syscall_64+0xc2/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f189a514887
RSP: 002b:
00007ffdcd899718 EFLAGS:
00000246 ORIG_RAX:
0000000000000001
RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007f189a514887
RDX:
0000000000000009 RSI:
00007ffdcd899730 RDI:
0000000000000003
RBP:
00007ffdcd8997a0 R08:
0000000000000000 R09:
00007ffdcd8994b2
R10:
0000000000000000 R11:
0000000000000246 R12:
00007ffdcda199a8
R13:
0000000000404af1 R14:
000000000040ad78 R15:
00007f189a7a5040
</TASK>
The root cause should be the below race:
memory_failure
try_memory_failure_hugetlb
me_huge_page
__page_handle_poison
dissolve_free_hugetlb_folio
drain_all_pages -- Buddy page can be isolated e.g. for compaction.
take_page_off_buddy -- Failed as page is not in the buddy list.
-- Page can be putback into buddy after compaction.
page_ref_inc -- Leads to buddy page with refcnt = 1.
Then unpoison_memory() can unpoison the page and send the buddy page back
into buddy list again leading to the above bad page state warning. And
bad_page() will call page_mapcount_reset() to remove PageBuddy from buddy
page leading to later VM_BUG_ON_PAGE(!PageBuddy(page)) when trying to
allocate this page.
Fix this issue by only treating __page_handle_poison() as successful when
it returns 1.
Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com
Fixes:
ceaf8fbea79a ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yuanyuan Zhong [Thu, 23 May 2024 18:35:31 +0000 (12:35 -0600)]
mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
After switching smaps_rollup to use VMA iterator, searching for next entry
is part of the condition expression of the do-while loop. So the current
VMA needs to be addressed before the continue statement.
Otherwise, with some VMAs skipped, userspace observed memory
consumption from /proc/pid/smaps_rollup will be smaller than the sum of
the corresponding fields from /proc/pid/smaps.
Link: https://lkml.kernel.org/r/20240523183531.2535436-1-yzhong@purestorage.com
Fixes:
c4c84f06285e ("fs/proc/task_mmu: stop using linked list and highest_vm_end")
Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com>
Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ryusuke Konishi [Mon, 20 May 2024 13:26:21 +0000 (22:26 +0900)]
nilfs2: fix potential hang in nilfs_detach_log_writer()
Syzbot has reported a potential hang in nilfs_detach_log_writer() called
during nilfs2 unmount.
Analysis revealed that this is because nilfs_segctor_sync(), which
synchronizes with the log writer thread, can be called after
nilfs_segctor_destroy() terminates that thread, as shown in the call trace
below:
nilfs_detach_log_writer
nilfs_segctor_destroy
nilfs_segctor_kill_thread --> Shut down log writer thread
flush_work
nilfs_iput_work_func
nilfs_dispose_list
iput
nilfs_evict_inode
nilfs_transaction_commit
nilfs_construct_segment (if inode needs sync)
nilfs_segctor_sync --> Attempt to synchronize with
log writer thread
*** DEADLOCK ***
Fix this issue by changing nilfs_segctor_sync() so that the log writer
thread returns normally without synchronizing after it terminates, and by
forcing tasks that are already waiting to complete once after the thread
terminates.
The skipped inode metadata flushout will then be processed together in the
subsequent cleanup work in nilfs_segctor_destroy().
Link: https://lkml.kernel.org/r/20240520132621.4054-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+e3973c409251e136fdd0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
e3973c409251e136fdd0
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: "Bai, Shuangpeng" <sjb7183@psu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ryusuke Konishi [Mon, 20 May 2024 13:26:20 +0000 (22:26 +0900)]
nilfs2: fix unexpected freezing of nilfs_segctor_sync()
A potential and reproducible race issue has been identified where
nilfs_segctor_sync() would block even after the log writer thread writes a
checkpoint, unless there is an interrupt or other trigger to resume log
writing.
This turned out to be because, depending on the execution timing of the
log writer thread running in parallel, the log writer thread may skip
responding to nilfs_segctor_sync(), which causes a call to schedule()
waiting for completion within nilfs_segctor_sync() to lose the opportunity
to wake up.
The reason why waking up the task waiting in nilfs_segctor_sync() may be
skipped is that updating the request generation issued using a shared
sequence counter and adding an wait queue entry to the request wait queue
to the log writer, are not done atomically. There is a possibility that
log writing and request completion notification by nilfs_segctor_wakeup()
may occur between the two operations, and in that case, the wait queue
entry is not yet visible to nilfs_segctor_wakeup() and the wake-up of
nilfs_segctor_sync() will be carried over until the next request occurs.
Fix this issue by performing these two operations simultaneously within
the lock section of sc_state_lock. Also, following the memory barrier
guidelines for event waiting loops, move the call to set_current_state()
in the same location into the event waiting loop to ensure that a memory
barrier is inserted just before the event condition determination.
Link: https://lkml.kernel.org/r/20240520132621.4054-3-konishi.ryusuke@gmail.com
Fixes:
9ff05123e3bf ("nilfs2: segment constructor")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: "Bai, Shuangpeng" <sjb7183@psu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ryusuke Konishi [Mon, 20 May 2024 13:26:19 +0000 (22:26 +0900)]
nilfs2: fix use-after-free of timer for log writer thread
Patch series "nilfs2: fix log writer related issues".
This bug fix series covers three nilfs2 log writer-related issues,
including a timer use-after-free issue and potential deadlock issue on
unmount, and a potential freeze issue in event synchronization found
during their analysis. Details are described in each commit log.
This patch (of 3):
A use-after-free issue has been reported regarding the timer sc_timer on
the nilfs_sc_info structure.
The problem is that even though it is used to wake up a sleeping log
writer thread, sc_timer is not shut down until the nilfs_sc_info structure
is about to be freed, and is used regardless of the thread's lifetime.
Fix this issue by limiting the use of sc_timer only while the log writer
thread is alive.
Link: https://lkml.kernel.org/r/20240520132621.4054-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240520132621.4054-2-konishi.ryusuke@gmail.com
Fixes:
fdce895ea5dd ("nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: "Bai, Shuangpeng" <sjb7183@psu.edu>
Closes: https://groups.google.com/g/syzkaller/c/MK_LYqtt8ko/m/8rgdWeseAwAJ
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Michael Ellerman [Tue, 21 May 2024 03:02:19 +0000 (13:02 +1000)]
selftests/mm: fix build warnings on ppc64
Fix warnings like:
In file included from uffd-unit-tests.c:8:
uffd-unit-tests.c: In function `uffd_poison_handle_fault':
uffd-common.h:45:33: warning: format `%llu' expects argument of type
`long long unsigned int', but argument 3 has type `__u64' {aka `long
unsigned int'} [-Wformat=]
By switching to unsigned long long for u64 for ppc64 builds.
Link: https://lkml.kernel.org/r/20240521030219.57439-1-mpe@ellerman.id.au
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Will Deacon [Tue, 21 May 2024 21:38:13 +0000 (00:38 +0300)]
arm64: patching: fix handling of execmem addresses
Klara Modin reported warnings for a kernel configured with BPF_JIT but
without MODULES:
[ 44.131296] Trying to vfree() bad address (
000000004a17c299)
[ 44.138024] WARNING: CPU: 1 PID: 193 at mm/vmalloc.c:3189 remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[ 44.146675] CPU: 1 PID: 193 Comm: kworker/1:2 Tainted: G D W
6.9.0-01786-g2c9e5d4a0082 #25
[ 44.158229] Hardware name: Raspberry Pi 3 Model B (DT)
[ 44.164433] Workqueue: events bpf_prog_free_deferred
[ 44.170492] pstate:
60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 44.178601] pc : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[ 44.183705] lr : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[ 44.188772] sp :
ffff800082a13c70
[ 44.193112] x29:
ffff800082a13c70 x28:
0000000000000000 x27:
0000000000000000
[ 44.201384] x26:
0000000000000000 x25:
ffff00003a44efa0 x24:
00000000d4202000
[ 44.209658] x23:
ffff800081223dd0 x22:
ffff00003a198a40 x21:
ffff8000814dd880
[ 44.217924] x20:
00000000d4202000 x19:
ffff8000814dd880 x18:
0000000000000006
[ 44.226206] x17:
0000000000000000 x16:
0000000000000020 x15:
0000000000000002
[ 44.234460] x14:
ffff8000811a6370 x13:
0000000020000000 x12:
0000000000000000
[ 44.242710] x11:
ffff8000811a6370 x10:
0000000000000144 x9 :
ffff8000811fe370
[ 44.250959] x8 :
0000000000017fe8 x7 :
00000000fffff000 x6 :
ffff8000811fe370
[ 44.259206] x5 :
0000000000000000 x4 :
0000000000000000 x3 :
0000000000000000
[ 44.267457] x2 :
0000000000000000 x1 :
0000000000000000 x0 :
ffff000002203240
[ 44.275703] Call trace:
[ 44.279158] remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[ 44.283858] vfree (mm/vmalloc.c:3322)
[ 44.287835] execmem_free (mm/execmem.c:70)
[ 44.292347] bpf_jit_free_exec+0x10/0x1c
[ 44.297283] bpf_prog_pack_free (kernel/bpf/core.c:1006)
[ 44.302457] bpf_jit_binary_pack_free (kernel/bpf/core.c:1195)
[ 44.307951] bpf_jit_free (include/linux/filter.h:1083 arch/arm64/net/bpf_jit_comp.c:2474)
[ 44.312342] bpf_prog_free_deferred (kernel/bpf/core.c:2785)
[ 44.317785] process_one_work (kernel/workqueue.c:3273)
[ 44.322684] worker_thread (kernel/workqueue.c:3342 (discriminator 2) kernel/workqueue.c:3429 (discriminator 2))
[ 44.327292] kthread (kernel/kthread.c:388)
[ 44.331342] ret_from_fork (arch/arm64/kernel/entry.S:861)
The problem is because bpf_arch_text_copy() silently fails to write to the
read-only area as a result of patch_map() faulting and the resulting
-EFAULT being chucked away.
Update patch_map() to use CONFIG_EXECMEM instead of
CONFIG_STRICT_MODULE_RWX to check for vmalloc addresses.
Link: https://lkml.kernel.org/r/20240521213813.703309-1-rppt@kernel.org
Fixes:
2c9e5d4a0082 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reported-by: Klara Modin <klarasmodin@gmail.com>
Closes: https://lore.kernel.org/all/
7983fbbf-0127-457c-9394-
8d6e4299c685@gmail.com
Tested-by: Klara Modin <klarasmodin@gmail.com>
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dev Jain [Tue, 21 May 2024 07:43:58 +0000 (13:13 +0530)]
selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
Reset nr_hugepages to zero before the start of the test.
If a non-zero number of hugepages is already set before the start of the
test, the following problems arise:
- The probability of the test getting OOM-killed increases. Proof:
The test wants to run on 80% of available memory to prevent OOM-killing
(see original code comments). Let the value of mem_free at the start
of the test, when nr_hugepages = 0, be x. In the other case, when
nr_hugepages > 0, let the memory consumed by hugepages be y. In the
former case, the test operates on 0.8 * x of memory. In the latter,
the test operates on 0.8 * (x - y) of memory, with y already filled,
hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 *
x. Q.E.D
- The probability of a bogus test success increases. Proof: Let the
memory consumed by hugepages be greater than 25% of x, with x and y
defined as above. The definition of compaction_index is c_index = (x -
y)/z where z is the memory consumed by hugepages after trying to
increase them again. In check_compaction(), we set the number of
hugepages to zero, and then increase them back; the probability that
they will be set back to consume at least y amount of memory again is
very high (since there is not much delay between the two attempts of
changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption).
Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3
hence, c_index can always be forced to be less than 3, thereby the test
succeeding always. Q.E.D
Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com
Fixes:
bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain@arm.com>
Cc: <stable@vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sri Jayaramappa <sjayaram@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dev Jain [Tue, 21 May 2024 07:43:57 +0000 (13:13 +0530)]
selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
Currently, the test tries to set nr_hugepages to zero, but that is not
actually done because the file offset is not reset after read(). Fix that
using lseek().
Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com
Fixes:
bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain@arm.com>
Cc: <stable@vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sri Jayaramappa <sjayaram@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dev Jain [Tue, 21 May 2024 07:43:56 +0000 (13:13 +0530)]
selftests/mm: compaction_test: fix bogus test success on Aarch64
Patch series "Fixes for compaction_test", v2.
The compaction_test memory selftest introduces fragmentation in memory
and then tries to allocate as many hugepages as possible. This series
addresses some problems.
On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since
compaction_index becomes 0, which is less than 3, due to no division by
zero exception being raised. We fix that by checking for division by
zero.
Secondly, correctly set the number of hugepages to zero before trying
to set a large number of them.
Now, consider a situation in which, at the start of the test, a non-zero
number of hugepages have been already set (while running the entire
selftests/mm suite, or manually by the admin). The test operates on 80%
of memory to avoid OOM-killer invocation, and because some memory is
already blocked by hugepages, it would increase the chance of OOM-killing.
Also, since mem_free used in check_compaction() is the value before we
set nr_hugepages to zero, the chance that the compaction_index will
be small is very high if the preset nr_hugepages was high, leading to a
bogus test success.
This patch (of 3):
Currently, if at runtime we are not able to allocate a huge page, the test
will trivially pass on Aarch64 due to no exception being raised on
division by zero while computing compaction_index. Fix that by checking
for nr_hugepages == 0. Anyways, in general, avoid a division by zero by
exiting the program beforehand. While at it, fix a typo, and handle the
case where the number of hugepages may overflow an integer.
Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com
Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com
Fixes:
bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sri Jayaramappa <sjayaram@akamai.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Satya Priya Kakitapalli [Wed, 15 May 2024 06:04:50 +0000 (11:34 +0530)]
mailmap: update email address for Satya Priya
Update mailmap with my latest email ID, quic_c_skakit@quicinc.com
is no longer active.
Link: https://lkml.kernel.org/r/20240515-mailmap-update-v1-1-df4853f757a3@quicinc.com
Signed-off-by: Satya Priya Kakitapalli <quic_skakitap@quicinc.com>
Cc: Ajit Pandey <quic_ajipan@quicinc.com>
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Imran Shaik <quic_imrashai@quicinc.com>
Cc: Jagadeesh Kona <quic_jkona@quicinc.com>
Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
Cc: Taniya Das <quic_tdas@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Miaohe Lin [Thu, 16 May 2024 12:26:08 +0000 (20:26 +0800)]
mm/huge_memory: don't unpoison huge_zero_folio
When I did memory failure tests recently, below panic occurs:
kernel BUG at include/linux/mm.h:1135!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 9 PID: 137 Comm: kswapd1 Not tainted
6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:
ffff9933c6c57bd0 EFLAGS:
00000246
RAX:
000000000000003e RBX:
0000000000000000 RCX:
ffff88f61fc5c9c8
RDX:
0000000000000000 RSI:
0000000000000027 RDI:
ffff88f61fc5c9c0
RBP:
ffffcd7c446b0000 R08:
ffffffff9a9405f0 R09:
0000000000005492
R10:
00000000000030ea R11:
ffffffff9a9405f0 R12:
0000000000000000
R13:
0000000000000000 R14:
0000000000000000 R15:
ffff88e703c4ac00
FS:
0000000000000000(0000) GS:
ffff88f61fc40000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
000055f4da6e9878 CR3:
0000000c71048000 CR4:
00000000000006f0
Call Trace:
<TASK>
do_shrink_slab+0x14f/0x6a0
shrink_slab+0xca/0x8c0
shrink_node+0x2d0/0x7d0
balance_pgdat+0x33a/0x720
kswapd+0x1f3/0x410
kthread+0xd5/0x100
ret_from_fork+0x2f/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace
0000000000000000 ]---
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:
ffff9933c6c57bd0 EFLAGS:
00000246
RAX:
000000000000003e RBX:
0000000000000000 RCX:
ffff88f61fc5c9c8
RDX:
0000000000000000 RSI:
0000000000000027 RDI:
ffff88f61fc5c9c0
RBP:
ffffcd7c446b0000 R08:
ffffffff9a9405f0 R09:
0000000000005492
R10:
00000000000030ea R11:
ffffffff9a9405f0 R12:
0000000000000000
R13:
0000000000000000 R14:
0000000000000000 R15:
ffff88e703c4ac00
FS:
0000000000000000(0000) GS:
ffff88f61fc40000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
000055f4da6e9878 CR3:
0000000c71048000 CR4:
00000000000006f0
The root cause is that HWPoison flag will be set for huge_zero_folio
without increasing the folio refcnt. But then unpoison_memory() will
decrease the folio refcnt unexpectedly as it appears like a successfully
hwpoisoned folio leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0) when
releasing huge_zero_folio.
Skip unpoisoning huge_zero_folio in unpoison_memory() to fix this issue.
We're not prepared to unpoison huge_zero_folio yet.
Link: https://lkml.kernel.org/r/20240516122608.22610-1-linmiaohe@huawei.com
Fixes:
478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Xu Yu <xuyu@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrey Konovalov [Fri, 17 May 2024 13:01:18 +0000 (15:01 +0200)]
kasan, fortify: properly rename memintrinsics
After commit
69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*()
functions") and the follow-up fixes, with CONFIG_FORTIFY_SOURCE enabled,
even though the compiler instruments meminstrinsics by generating calls to
__asan/__hwasan_ prefixed functions, FORTIFY_SOURCE still uses
uninstrumented memset/memmove/memcpy as the underlying functions.
As a result, KASAN cannot detect bad accesses in memset/memmove/memcpy.
This also makes KASAN tests corrupt kernel memory and cause crashes.
To fix this, use __asan_/__hwasan_memset/memmove/memcpy as the underlying
functions whenever appropriate. Do this only for the instrumented code
(as indicated by __SANITIZE_ADDRESS__).
Link: https://lkml.kernel.org/r/20240517130118.759301-1-andrey.konovalov@linux.dev
Fixes:
69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*() functions")
Fixes:
51287dcb00cc ("kasan: emit different calls for instrumentable memintrinsics")
Fixes:
36be5cba99f6 ("kasan: treat meminstrinsic as builtins in uninstrumented files")
Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Reported-by: Nico Pache <npache@redhat.com>
Closes: https://lore.kernel.org/all/
20240501144156.
17e65021@outsider.home/
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Nico Pache <npache@redhat.com>
Acked-by: Nico Pache <npache@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Suren Baghdasaryan [Tue, 14 May 2024 16:31:28 +0000 (09:31 -0700)]
lib: add version into /proc/allocinfo output
Add version string and a header at the beginning of /proc/allocinfo to
allow later format changes. Example output:
> head /proc/allocinfo
allocinfo - version: 1.0
# <size> <calls> <tag info>
0 0 init/main.c:1314 func:do_initcalls
0 0 init/do_mounts.c:353 func:mount_nodev_root
0 0 init/do_mounts.c:187 func:mount_root_generic
0 0 init/do_mounts.c:158 func:do_mount_root
0 0 init/initramfs.c:493 func:unpack_to_rootfs
0 0 init/initramfs.c:492 func:unpack_to_rootfs
0 0 init/initramfs.c:491 func:unpack_to_rootfs
512 1 arch/x86/events/rapl.c:681 func:init_rapl_pmus
128 1 arch/x86/events/rapl.c:571 func:rapl_cpu_online
[akpm@linux-foundation.org: remove stray newline from struct allocinfo_private]
Link: https://lkml.kernel.org/r/20240514163128.3662251-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Hailong.Liu [Fri, 10 May 2024 10:01:31 +0000 (18:01 +0800)]
mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
commit
a421ef303008 ("mm: allow !GFP_KERNEL allocations for kvmalloc")
includes support for __GFP_NOFAIL, but it presents a conflict with commit
dd544141b9eb ("vmalloc: back off when the current task is OOM-killed"). A
possible scenario is as follows:
process-a
__vmalloc_node_range(GFP_KERNEL | __GFP_NOFAIL)
__vmalloc_area_node()
vm_area_alloc_pages()
--> oom-killer send SIGKILL to process-a
if (fatal_signal_pending(current)) break;
--> return NULL;
To fix this, do not check fatal_signal_pending() in vm_area_alloc_pages()
if __GFP_NOFAIL set.
This issue occurred during OPLUS KASAN TEST. Below is part of the log
-> oom-killer sends signal to process
[65731.222840] [ T1308] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/apps/uid_10198,task=gs.intelligence,pid=32454,uid=10198
[65731.259685] [T32454] Call trace:
[65731.259698] [T32454] dump_backtrace+0xf4/0x118
[65731.259734] [T32454] show_stack+0x18/0x24
[65731.259756] [T32454] dump_stack_lvl+0x60/0x7c
[65731.259781] [T32454] dump_stack+0x18/0x38
[65731.259800] [T32454] mrdump_common_die+0x250/0x39c [mrdump]
[65731.259936] [T32454] ipanic_die+0x20/0x34 [mrdump]
[65731.260019] [T32454] atomic_notifier_call_chain+0xb4/0xfc
[65731.260047] [T32454] notify_die+0x114/0x198
[65731.260073] [T32454] die+0xf4/0x5b4
[65731.260098] [T32454] die_kernel_fault+0x80/0x98
[65731.260124] [T32454] __do_kernel_fault+0x160/0x2a8
[65731.260146] [T32454] do_bad_area+0x68/0x148
[65731.260174] [T32454] do_mem_abort+0x151c/0x1b34
[65731.260204] [T32454] el1_abort+0x3c/0x5c
[65731.260227] [T32454] el1h_64_sync_handler+0x54/0x90
[65731.260248] [T32454] el1h_64_sync+0x68/0x6c
[65731.260269] [T32454] z_erofs_decompress_queue+0x7f0/0x2258
--> be->decompressed_pages = kvcalloc(be->nr_pages, sizeof(struct page *), GFP_KERNEL | __GFP_NOFAIL);
kernel panic by NULL pointer dereference.
erofs assume kvmalloc with __GFP_NOFAIL never return NULL.
[65731.260293] [T32454] z_erofs_runqueue+0xf30/0x104c
[65731.260314] [T32454] z_erofs_readahead+0x4f0/0x968
[65731.260339] [T32454] read_pages+0x170/0xadc
[65731.260364] [T32454] page_cache_ra_unbounded+0x874/0xf30
[65731.260388] [T32454] page_cache_ra_order+0x24c/0x714
[65731.260411] [T32454] filemap_fault+0xbf0/0x1a74
[65731.260437] [T32454] __do_fault+0xd0/0x33c
[65731.260462] [T32454] handle_mm_fault+0xf74/0x3fe0
[65731.260486] [T32454] do_mem_abort+0x54c/0x1b34
[65731.260509] [T32454] el0_da+0x44/0x94
[65731.260531] [T32454] el0t_64_sync_handler+0x98/0xb4
[65731.260553] [T32454] el0t_64_sync+0x198/0x19c
Link: https://lkml.kernel.org/r/20240510100131.1865-1-hailong.liu@oppo.com
Fixes:
9376130c390a ("mm/vmalloc: add support for __GFP_NOFAIL")
Signed-off-by: Hailong.Liu <hailong.liu@oppo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Barry Song <21cnbao@gmail.com>
Reported-by: Oven <liyangouwen1@oppo.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Fri, 24 May 2024 17:46:35 +0000 (10:46 -0700)]
Merge tag 'riscv-for-linus-6.10-mw2' of git://git./linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- The compression format used for boot images is now configurable at
build time, and these formats are shown in `make help`
- access_ok() has been optimized
- A pair of performance bugs have been fixed in the uaccess handlers
- Various fixes and cleanups, including one for the IMSIC build failure
and one for the early-boot ftrace illegal NOPs bug
* tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix early ftrace nop patching
irqchip: riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
riscv: selftests: Add signal handling vector tests
riscv: mm: accelerate pagefault when badaccess
riscv: uaccess: Relax the threshold for fast path
riscv: uaccess: Allow the last potential unrolled copy
riscv: typo in comment for get_f64_reg
Use bool value in set_cpu_online()
riscv: selftests: Add hwprobe binaries to .gitignore
riscv: stacktrace: fixed walk_stackframe()
ftrace: riscv: move from REGS to ARGS
riscv: do not select MODULE_SECTIONS by default
riscv: show help string for riscv-specific targets
riscv: make image compression configurable
riscv: cpufeature: Fix extension subset checking
riscv: cpufeature: Fix thead vector hwcap removal
riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled
riscv: Define TASK_SIZE_MAX for __access_ok()
riscv: Remove PGDIR_SIZE_L3 and TASK_SIZE_MIN
Linus Torvalds [Fri, 24 May 2024 17:24:49 +0000 (10:24 -0700)]
Merge tag 'for-linus-6.10a-rc1-tag' of git://git./linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- a small cleanup in the drivers/xen/xenbus Makefile
- a fix of the Xen xenstore driver to improve connecting to a late
started Xenstore
- an enhancement for better support of ballooning in PVH guests
- a cleanup using try_cmpxchg() instead of open coding it
* tag 'for-linus-6.10a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
drivers/xen: Improve the late XenStore init protocol
xen/xenbus: Use *-y instead of *-objs in Makefile
xen/x86: add extra pages to unpopulated-alloc if available
locking/x86/xen: Use try_cmpxchg() in xen_alloc_p2m_entry()
Linus Torvalds [Fri, 24 May 2024 16:40:31 +0000 (09:40 -0700)]
Merge tag 'for-6.10-tag' of git://git./linux/kernel/git/kdave/linux
Pull more btrfs updates from David Sterba:
"A few more updates, mostly stability fixes or user visible changes:
- fix race in zoned mode during device replace that can lead to
use-after-free
- update return codes and lower message levels for quota rescan where
it's causing false alerts
- fix unexpected qgroup id reuse under some conditions
- fix condition when looking up extent refs
- add option norecovery (removed in 6.8), the intended replacements
haven't been used and some aplications still rely on the old one
- build warning fixes"
* tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: re-introduce 'norecovery' mount option
btrfs: fix end of tree detection when searching for data extent ref
btrfs: scrub: initialize ret in scrub_simple_mirror() to fix compilation warning
btrfs: zoned: fix use-after-free due to race with dev replace
btrfs: qgroup: fix qgroup id collision across mounts
btrfs: qgroup: update rescan message levels and error codes
Linus Torvalds [Fri, 24 May 2024 16:31:50 +0000 (09:31 -0700)]
Merge tag 'erofs-for-6.10-rc1-2' of git://git./linux/kernel/git/xiang/erofs
Pull more erofs updates from Gao Xiang:
"The main ones are metadata API conversion to byte offsets by Al Viro.
Another patch gets rid of unnecessary memory allocation out of DEFLATE
decompressor. The remaining one is a trivial cleanup.
- Convert metadata APIs to byte offsets
- Avoid allocating DEFLATE streams unnecessarily
- Some erofs_show_options() cleanup"
* tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: avoid allocating DEFLATE streams before mounting
z_erofs_pcluster_begin(): don't bother with rounding position down
erofs: don't round offset down for erofs_read_metabuf()
erofs: don't align offset for erofs_read_metabuf() (simple cases)
erofs: mechanically convert erofs_read_metabuf() to offsets
erofs: clean up erofs_show_options()
Linus Torvalds [Fri, 24 May 2024 16:07:22 +0000 (09:07 -0700)]
Merge tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Nothing exciting, just syzbot fixes (except for the one
FMODE_CAN_ODIRECT patch).
Looks like syzbot reports have slowed down; this is all catch up from
two weeks of conferences.
Next hardening project is using Thomas's error injection tooling to
torture test repair"
* tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs:
bcachefs: Fix race path in bch2_inode_insert()
bcachefs: Ensure we're RW before journalling
bcachefs: Fix shutdown ordering
bcachefs: Fix unsafety in bch2_dirent_name_bytes()
bcachefs: Fix stack oob in __bch2_encrypt_bio()
bcachefs: Fix btree_trans leak in bch2_readahead()
bcachefs: Fix bogus verify_replicas_entry() assert
bcachefs: Check for subvolues with bogus snapshot/inode fields
bcachefs: bch2_checksum() returns 0 for unknown checksum type
bcachefs: Fix bch2_alloc_ciphers()
bcachefs: Add missing guard in bch2_snapshot_has_children()
bcachefs: Fix missing parens in drop_locks_do()
bcachefs: Improve bch2_assert_pos_locked()
bcachefs: Fix shift overflows in replicas.c
bcachefs: Fix shift overflow in btree_lost_data()
bcachefs: Fix ref in trans_mark_dev_sbs() error path
bcachefs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
bcachefs: Fix rcu splat in check_fix_ptrs()
Linus Torvalds [Fri, 24 May 2024 16:01:21 +0000 (09:01 -0700)]
Merge tag 'input-for-v6.10-rc0' of git://git./linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- a change to input core to trim amount of keys data in modalias string
in case when a device declares too many keys and they do not fit in
uevent buffer instead of reporting an error which results in uevent
not being generated at all
- support for Machenike G5 Pro Controller added to xpad driver
- support for FocalTech FT5452 and FT8719 added to edt-ft5x06
- support for new SPMI vibrator added to pm8xxx-vibrator driver
- missing locking added to cyapa touchpad driver
- removal of unused fields in various driver structures
- explicit initialization of i2c_device_id::driver_data to 0 dropped
from input drivers
- other assorted fixes and cleanups.
* tag 'input-for-v6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (24 commits)
Input: edt-ft5x06 - add support for FocalTech FT5452 and FT8719
dt-bindings: input: touchscreen: edt-ft5x06: Document FT5452 and FT8719 support
Input: xpad - add support for Machenike G5 Pro Controller
Input: try trimming too long modalias strings
Input: drop explicit initialization of struct i2c_device_id::driver_data to 0
Input: zet6223 - remove an unused field in struct zet6223_ts
Input: chipone_icn8505 - remove an unused field in struct icn8505_data
Input: cros_ec_keyb - remove an unused field in struct cros_ec_keyb
Input: lpc32xx-keys - remove an unused field in struct lpc32xx_kscan_drv
Input: matrix_keypad - remove an unused field in struct matrix_keypad
Input: tca6416-keypad - remove unused struct tca6416_drv_data
Input: tca6416-keypad - remove an unused field in struct tca6416_keypad_chip
Input: da7280 - remove an unused field in struct da7280_haptic
Input: ff-core - prefer struct_size over open coded arithmetic
Input: cyapa - add missing input core locking to suspend/resume functions
input: pm8xxx-vibrator: add new SPMI vibrator support
dt-bindings: input: qcom,pm8xxx-vib: add new SPMI vibrator module
input: pm8xxx-vibrator: refactor to support new SPMI vibrator
Input: pm8xxx-vibrator - correct VIB_MAX_LEVELS calculation
Input: sur40 - convert le16 to cpu before use
...
Linus Torvalds [Fri, 24 May 2024 15:48:51 +0000 (08:48 -0700)]
Merge tag 'sound-fix-6.10-rc1' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes for 6.10-rc1. Most of changes are various
device-specific fixes and quirks, while there are a few small changes
in ALSA core timer and module / built-in fixes"
* tag 'sound-fix-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: fix mute/micmute LEDs don't work for ProBook 440/460 G11.
ALSA: core: Enable proc module when CONFIG_MODULES=y
ALSA: core: Fix NULL module pointer assignment at card init
ALSA: hda/realtek: Enable headset mic of JP-IK LEAP W502 with ALC897
ASoC: dt-bindings: stm32: Ensure compatible pattern matches whole string
ASoC: tas2781: Fix wrong loading calibrated data sequence
ASoC: tas2552: Add TX path for capturing AUDIO-OUT data
ALSA: usb-audio: Fix for sampling rates support for Mbox3
Documentation: sound: Fix trailing whitespaces
ALSA: timer: Set lower bound of start tick time
ASoC: codecs: ES8326: solve hp and button detect issue
ASoC: rt5645: mic-in detection threshold modification
ASoC: Intel: sof_sdw_rt_sdca_jack_common: Use name_prefix for `-sdca` detection
Linus Torvalds [Fri, 24 May 2024 15:43:25 +0000 (08:43 -0700)]
Merge tag 'char-misc-6.10-rc1-fix' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc fix from Greg KH:
"Here is one remaining bugfix for 6.10-rc1 that missed the 6.9-final
merge window, and has been sitting in my tree and linux-next for quite
a while now, but wasn't sent to you (my fault, travels...)
It is a bugfix to resolve an error in the speakup code that could
overflow a buffer.
It has been in linux-next for a while with no reported problems"
* tag 'char-misc-6.10-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
speakup: Fix sizeof() vs ARRAY_SIZE() bug
Linus Torvalds [Fri, 24 May 2024 15:38:28 +0000 (08:38 -0700)]
Merge tag 'tty-6.10-rc1-fixes' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small TTY and Serial driver fixes that missed the
6.9-final merge window, but have been in my tree for weeks (my fault,
travel caused me to miss this)
These fixes include:
- more n_gsm fixes for reported problems
- 8520_mtk driver fix
- 8250_bcm7271 driver fix
- sc16is7xx driver fix
All of these have been in linux-next for weeks without any reported
problems"
* tag 'tty-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: sc16is7xx: fix bug in sc16is7xx_set_baud() when using prescaler
serial: 8250_bcm7271: use default_mux_rate if possible
serial: 8520_mtk: Set RTS on shutdown for Rx in-band wakeup
tty: n_gsm: fix missing receive state reset after mode switch
tty: n_gsm: fix possible out-of-bounds in gsm0_receive()
Linus Torvalds [Fri, 24 May 2024 15:33:44 +0000 (08:33 -0700)]
Merge tag 'hardening-v6.10-rc1-fixes' of git://git./linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module
decompression (Stephen Boyd)
- ubsan: Restore dependency on ARCH_HAS_UBSAN
- kunit/fortify: Fix memcmp() test to be amplitude agnostic
* tag 'hardening-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kunit/fortify: Fix memcmp() test to be amplitude agnostic
ubsan: Restore dependency on ARCH_HAS_UBSAN
loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module decompression
Linus Torvalds [Fri, 24 May 2024 15:27:34 +0000 (08:27 -0700)]
Merge tag 'trace-tracefs-v6.10' of git://git./linux/kernel/git/trace/linux-trace
Pull tracefs/eventfs updates from Steven Rostedt:
"Bug fixes:
- The eventfs directories need to have unique inode numbers. Make
sure that they do not get the default file inode number.
- Update the inode uid and gid fields on remount.
When a remount happens where a uid and/or gid is specified, all the
tracefs files and directories should get the specified uid and/or
gid. But this can be sporadic when some uids were assigned already.
There's already a list of inodes that are allocated. Just update
their uid and gid fields at the time of remount.
- Update the eventfs_inodes on remount from the top level "events"
descriptor.
There was a bug where not all the eventfs files or directories
where getting updated on remount. One fix was to clear the
SAVED_UID/GID flags from the inode list during the iteration of the
inodes during the remount. But because the eventfs inodes can be
freed when the last referenced is released, not all the
eventfs_inodes were being updated. This lead to the ownership
selftest to fail if it was run a second time (the first time would
leave eventfs_inodes with no corresponding tracefs_inode).
Instead, for eventfs_inodes, only process the "events"
eventfs_inode from the list iteration, as it is guaranteed to have
a tracefs_inode (it's never freed while the "events" directory
exists). As it has a list of its children, and the children have a
list of their children, just iterate all the eventfs_inodes from
the "events" descriptor and it is guaranteed to get all of them.
- Clear the EVENT_INODE flag from the tracefs_drop_inode() callback.
Currently the EVENTFS_INODE FLAG is cleared in the tracefs_d_iput()
callback. But this is the wrong location. The iput() callback is
called when the last reference to the dentry inode is hit. There
could be a case where two dentry's have the same inode, and the
flag will be cleared prematurely. The flag needs to be cleared when
the last reference of the inode is dropped and that happens in the
inode's drop_inode() callback handler.
Cleanups:
- Consolidate the creation of a tracefs_inode for an eventfs_inode
A tracefs_inode is created for both files and directories of the
eventfs system. It is open coded. Instead, consolidate it into a
single eventfs_get_inode() function call.
- Remove the eventfs getattr and permission callbacks.
The permissions for the eventfs files and directories are updated
when the inodes are created, on remount, and when the user sets
them (via setattr). The inodes hold the current permissions so
there is no need to have custom getattr or permissions callbacks as
they will more likely cause them to be incorrect. The inode's
permissions are updated when they should be updated. Remove the
getattr and permissions inode callbacks.
- Do not update eventfs_inode attributes on creation of inodes.
The eventfs_inodes attribute field is used to store the permissions
of the directories and files for when their corresponding inodes
are freed and are created again. But when the creation of the
inodes happen, the eventfs_inode attributes are recalculated. The
recalculation should only happen when the permissions change for a
given file or directory. Currently, the attribute changes are just
being set to their current files so this is not a bug, but it's
unnecessary and error prone. Stop doing that.
- The events directory inode is created once when the events
directory is created and deleted when it is deleted. It is now
updated on remount and when the user changes the permissions.
There's no need to use the eventfs_inode of the events directory to
store the events directory permissions. But using it to store the
default permissions for the files within the directory that have
not been updated by the user can simplify the code"
* tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Do not use attributes for events directory
eventfs: Cleanup permissions in creation of inodes
eventfs: Remove getattr and permission callbacks
eventfs: Consolidate the eventfs_inode update in eventfs_get_inode()
tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()
eventfs: Update all the eventfs_inodes from the events descriptor
tracefs: Update inode permissions on remount
eventfs: Keep the directories from having the same inode number as files
dicken.ding [Fri, 24 May 2024 09:17:39 +0000 (17:17 +0800)]
genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
irq_find_at_or_after() dereferences the interrupt descriptor which is
returned by mt_find() while neither holding sparse_irq_lock nor RCU read
lock, which means the descriptor can be freed between mt_find() and the
dereference:
CPU0 CPU1
desc = mt_find()
delayed_free_desc(desc)
irq_desc_get_irq(desc)
The use-after-free is reported by KASAN:
Call trace:
irq_get_next_irq+0x58/0x84
show_stat+0x638/0x824
seq_read_iter+0x158/0x4ec
proc_reg_read_iter+0x94/0x12c
vfs_read+0x1e0/0x2c8
Freed by task 4471:
slab_free_freelist_hook+0x174/0x1e0
__kmem_cache_free+0xa4/0x1dc
kfree+0x64/0x128
irq_kobj_release+0x28/0x3c
kobject_put+0xcc/0x1e0
delayed_free_desc+0x14/0x2c
rcu_do_batch+0x214/0x720
Guard the access with a RCU read lock section.
Fixes:
721255b9826b ("genirq: Use a maple tree for interrupt descriptor management")
Signed-off-by: dicken.ding <dicken.ding@mediatek.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240524091739.31611-1-dicken.ding@mediatek.com
Konstantin Komarov [Tue, 23 Apr 2024 14:21:58 +0000 (17:21 +0300)]
fs/ntfs3: Break dir enumeration if directory contents error
If we somehow attempt to read beyond the directory size, an error
is supposed to be returned.
However, in some cases, read requests do not stop and instead enter
into a loop.
To avoid this, we set the position in the directory to the end.
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: stable@vger.kernel.org
Konstantin Komarov [Tue, 23 Apr 2024 12:31:56 +0000 (15:31 +0300)]
fs/ntfs3: Fix case when index is reused during tree transformation
In most cases when adding a cluster to the directory index,
they are placed at the end, and in the bitmap, this cluster corresponds
to the last bit. The new directory size is calculated as follows:
data_size = (u64)(bit + 1) << indx->index_bits;
In the case of reusing a non-final cluster from the index,
data_size is calculated incorrectly, resulting in the directory size
differing from the actual size.
A check for cluster reuse has been added, and the size update is skipped.
Fixes:
82cae269cfa95 ("fs/ntfs3: Add initialization of super block")
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: stable@vger.kernel.org
Jeff Xu [Mon, 15 Apr 2024 16:35:24 +0000 (16:35 +0000)]
selftest mm/mseal read-only elf memory segment
Sealing read-only of elf mapping so it can't be changed by mprotect.
[jeffxu@chromium.org: style change]
Link: https://lkml.kernel.org/r/20240416220944.2481203-2-jeffxu@chromium.org
[amer.shanawany@gmail.com: fix linker error for inline function]
Link: https://lkml.kernel.org/r/20240420202346.546444-1-amer.shanawany@gmail.com
[jeffxu@chromium.org: fix compile warning]
Link: https://lkml.kernel.org/r/20240420003515.345982-2-jeffxu@chromium.org
[jeffxu@chromium.org: fix arm build]
Link: https://lkml.kernel.org/r/20240502225331.3806279-2-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-6-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Signed-off-by: Amer Al Shanawany <amer.shanawany@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jeff Xu [Mon, 15 Apr 2024 16:35:23 +0000 (16:35 +0000)]
mseal: add documentation
Add documentation for mseal().
Link: https://lkml.kernel.org/r/20240415163527.626541-5-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jeff Xu [Mon, 15 Apr 2024 16:35:22 +0000 (16:35 +0000)]
selftest mm/mseal memory sealing
selftest for memory sealing change in mmap() and mseal().
Link: https://lkml.kernel.org/r/20240415163527.626541-4-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jeff Xu [Mon, 15 Apr 2024 16:35:21 +0000 (16:35 +0000)]
mseal: add mseal syscall
The new mseal() is an syscall on 64 bit CPU, and with following signature:
int mseal(void addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.
mseal() blocks following operations for the given memory range.
1> Unmapping, moving to another location, and shrinking the size,
via munmap() and mremap(), can leave an empty space, therefore can
be replaced with a VMA with a new set of attributes.
2> Moving or expanding a different VMA into the current location,
via mremap().
3> Modifying a VMA via mmap(MAP_FIXED).
4> Size expansion, via mremap(), does not appear to pose any specific
risks to sealed VMAs. It is included anyway because the use case is
unclear. In any case, users can rely on merging to expand a sealed VMA.
5> mprotect() and pkey_mprotect().
6> Some destructive madvice() behaviors (e.g. MADV_DONTNEED) for anonymous
memory, when users don't have write permission to the memory. Those
behaviors can alter region contents by discarding pages, effectively a
memset(0) for anonymous memory.
Following input during RFC are incooperated into this patch:
Jann Horn: raising awareness and providing valuable insights on the
destructive madvise operations.
Linus Torvalds: assisting in defining system call signature and scope.
Liam R. Howlett: perf optimization.
Theo de Raadt: sharing the experiences and insight gained from
implementing mimmutable() in OpenBSD.
Finally, the idea that inspired this patch comes from Stephen Röttger's
work in Chrome V8 CFI.
[jeffxu@chromium.org: add branch prediction hint, per Pedro]
Link: https://lkml.kernel.org/r/20240423192825.1273679-2-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-3-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>