linux-2.6-block.git
2 months agoMerge branch 'pdx86/platform-drivers-x86-lenovo-c630' into review-ilpo
Ilpo Järvinen [Mon, 24 Jun 2024 16:06:28 +0000 (19:06 +0300)]
Merge branch 'pdx86/platform-drivers-x86-lenovo-c630' into review-ilpo

2 months agoplatform/arm64: build drivers even on non-ARM64 platforms
Dmitry Baryshkov [Mon, 24 Jun 2024 15:08:12 +0000 (18:08 +0300)]
platform/arm64: build drivers even on non-ARM64 platforms

The Kconfig for platforms/arm64 has 'depends on ARM64 || COMPILE_TEST'.
However due to Makefile having just obj-$(CONFIG_ARM64) the subdir will
not be descended for !ARM64 platforms and thus the drivers won't get
built. This breaks modular builds of other driver drivers which depend
on arm64 platform drivers.

Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 363c8aea2572 ("platform: Add ARM64 platform directory")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240624-ucsi-yoga-ec-driver-v9-1-53af411a9bd6@linaro.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86/intel-uncore-freq: Use generic helpers for current frequency
Tero Kristo [Mon, 17 Jun 2024 06:04:40 +0000 (09:04 +0300)]
platform/x86/intel-uncore-freq: Use generic helpers for current frequency

Use the generic sysfs helpers for reading the current frequency also,
and remove the custom ones.

No functional change intended.

Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20240617060708.892981-8-tero.kristo@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86/intel-uncore-freq: Rename the sysfs helper macro names
Tero Kristo [Mon, 17 Jun 2024 06:04:39 +0000 (09:04 +0300)]
platform/x86/intel-uncore-freq: Rename the sysfs helper macro names

The macros to create sysfs entries are going to be used for other
attributes in addition to current min/max frequencies only, so rename
these to be more generic.

No functional change intended.

Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20240617060708.892981-7-tero.kristo@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86/intel-uncore-freq: Get rid of uncore_read_freq driver API
Tero Kristo [Mon, 17 Jun 2024 06:04:38 +0000 (09:04 +0300)]
platform/x86/intel-uncore-freq: Get rid of uncore_read_freq driver API

Get rid of uncore_read_freq driver API. Instead, add a new entry to the
enumerated read interface and use this.

No functional change intended.

Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20240617060708.892981-6-tero.kristo@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86/intel-uncore-freq: Use uncore_index with read_control_freq
Tero Kristo [Mon, 17 Jun 2024 06:04:37 +0000 (09:04 +0300)]
platform/x86/intel-uncore-freq: Use uncore_index with read_control_freq

Use the enumerated index for selecting the uncore driver parameter to
read, instead of reading everything. This is done in preparation to
expand the API to access more parameters later.

No functional change intended.

Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20240617060708.892981-5-tero.kristo@linux.intel.com
[ij: Removed underscores from variable names]
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86/intel-uncore-freq: Get rid of magic min_max argument
Tero Kristo [Mon, 17 Jun 2024 06:04:36 +0000 (09:04 +0300)]
platform/x86/intel-uncore-freq: Get rid of magic min_max argument

Get rid of the hardcoded / magic min_max argument from internal APIs.
Instead, use an enumerated index value for it.

No functional change intended.

Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20240617060708.892981-4-tero.kristo@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86/intel-uncore-freq: Get rid of magic values
Tero Kristo [Mon, 17 Jun 2024 06:04:35 +0000 (09:04 +0300)]
platform/x86/intel-uncore-freq: Get rid of magic values

Get rid of any magic bitmasks from the code. Define proper macros for
these, and use the bitfield operations to access them.

No functional change intended.

Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20240617060708.892981-3-tero.kristo@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86/intel-uncore-freq: Re-arrange bit masks
Tero Kristo [Mon, 17 Jun 2024 06:04:34 +0000 (09:04 +0300)]
platform/x86/intel-uncore-freq: Re-arrange bit masks

Rename the various bitmasks from the 'UNCORE_GENMASK_*' to
'UNCORE_*_MASK', and re-order them based on the register they reside in.

No functional change intended.

Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20240617060708.892981-2-tero.kristo@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoMerge branch 'pdx86/platform-drivers-x86-lenovo-c630' into review-ilpo
Ilpo Järvinen [Tue, 18 Jun 2024 08:20:08 +0000 (11:20 +0300)]
Merge branch 'pdx86/platform-drivers-x86-lenovo-c630' into review-ilpo

3 months agoplatform: arm64: add Lenovo Yoga C630 WOS EC driver
Dmitry Baryshkov [Thu, 13 Jun 2024 23:43:39 +0000 (02:43 +0300)]
platform: arm64: add Lenovo Yoga C630 WOS EC driver

Lenovo Yoga C630 WOS is a laptop using Snapdragon 850 SoC. Like many
laptops it uses an embedded controller (EC) to perform various platform
operations, including, but not limited, to Type-C port control or power
supply handlng.

Add the driver for the EC, that creates devices for UCSI and power
supply devices.

Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240614-yoga-ec-driver-v7-2-9f0b9b40ae76@linaro.org
[ij: added #include <linux/cleanup.h>]
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agodt-bindings: platform: Add Lenovo Yoga C630 EC
Bjorn Andersson [Thu, 13 Jun 2024 23:43:38 +0000 (02:43 +0300)]
dt-bindings: platform: Add Lenovo Yoga C630 EC

Add binding for the Embedded Controller found in the Qualcomm
Snapdragon-based Lenovo Yoga C630.

Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240614-yoga-ec-driver-v7-1-9f0b9b40ae76@linaro.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86/intel/ifs: Switch to new Intel CPU model defines
Tony Luck [Tue, 11 Jun 2024 17:14:55 +0000 (10:14 -0700)]
platform/x86/intel/ifs: Switch to new Intel CPU model defines

New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240611171455.352536-1-tony.luck@intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: dell-pc: avoid double free and invalid unregistration
Thomas Weißschuh [Tue, 4 Jun 2024 21:41:24 +0000 (23:41 +0200)]
platform/x86: dell-pc: avoid double free and invalid unregistration

If platform_profile_register() fails it does kfree(thermal_handler) and
leaves the pointer value around.
Any call to thermal_cleanup() will try to kfree(thermal_handler) again.
This will happen right away in dell_init().
In addition, platform_profile_remove() will be called although no
profile is registered.

NULL out the thermal_handler, so thermal_cleanup() avoids the double free.

Fixes: 996ad4129810 ("platform/x86: dell-pc: Implement platform_profile")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Lyndon Sanche <lsanche@lyndeno.ca>
Link: https://lore.kernel.org/r/20240604-dell-pc-double-free-v1-1-6d81255b2a44@weissschuh.net
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: thinkpad_acpi: Use 2-argument strscpy()
Andy Shevchenko [Sun, 2 Jun 2024 08:57:59 +0000 (11:57 +0300)]
platform/x86: thinkpad_acpi: Use 2-argument strscpy()

Use 2-argument strscpy(), which is not only shorter but also provides
an additional check that destination buffer is an array.

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20240602090244.1666360-7-andy.shevchenko@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: think-lmi: Use 2-argument strscpy()
Andy Shevchenko [Sun, 2 Jun 2024 08:57:58 +0000 (11:57 +0300)]
platform/x86: think-lmi: Use 2-argument strscpy()

Use 2-argument strscpy(), which is not only shorter but also provides
an additional check that destination buffer is an array.

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20240602090244.1666360-6-andy.shevchenko@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: serial-multi-instantiate: Use 2-argument strscpy()
Andy Shevchenko [Sun, 2 Jun 2024 08:57:57 +0000 (11:57 +0300)]
platform/x86: serial-multi-instantiate: Use 2-argument strscpy()

Use 2-argument strscpy(), which is not only shorter but also provides
an additional check that destination buffer is an array.

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20240602090244.1666360-5-andy.shevchenko@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: intel: chtwc_int33fe: Use 2-argument strscpy()
Andy Shevchenko [Sun, 2 Jun 2024 08:57:56 +0000 (11:57 +0300)]
platform/x86: intel: chtwc_int33fe: Use 2-argument strscpy()

Use 2-argument strscpy(), which is not only shorter but also provides
an additional check that destination buffer is an array.

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20240602090244.1666360-4-andy.shevchenko@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: hp: hp-bioscfg: Use 2-argument strscpy()
Andy Shevchenko [Sun, 2 Jun 2024 08:57:55 +0000 (11:57 +0300)]
platform/x86: hp: hp-bioscfg: Use 2-argument strscpy()

Use 2-argument strscpy(), which is not only shorter but also provides
an additional check that destination buffer is an array.

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20240602090244.1666360-3-andy.shevchenko@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: asus-tf103c-dock: Use 2-argument strscpy()
Andy Shevchenko [Sun, 2 Jun 2024 08:57:54 +0000 (11:57 +0300)]
platform/x86: asus-tf103c-dock: Use 2-argument strscpy()

Use 2-argument strscpy(), which is not only shorter but also provides
an additional check that destination buffer is an array.

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240602090244.1666360-2-andy.shevchenko@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: intel_scu_wdt: Switch to new Intel CPU model defines
Tony Luck [Fri, 31 May 2024 20:37:05 +0000 (13:37 -0700)]
platform/x86: intel_scu_wdt: Switch to new Intel CPU model defines

New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240531203706.233365-10-tony.luck@intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: intel_speed_select_if: Switch to new Intel CPU model defines
Tony Luck [Fri, 31 May 2024 20:37:04 +0000 (13:37 -0700)]
platform/x86: intel_speed_select_if: Switch to new Intel CPU model defines

New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240531203706.233365-9-tony.luck@intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: intel-uncore-freq: Switch to new Intel CPU model defines
Tony Luck [Fri, 31 May 2024 20:37:03 +0000 (13:37 -0700)]
platform/x86: intel-uncore-freq: Switch to new Intel CPU model defines

New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240531203706.233365-8-tony.luck@intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: intel_ips: Switch to new Intel CPU model defines
Tony Luck [Fri, 31 May 2024 20:37:02 +0000 (13:37 -0700)]
platform/x86: intel_ips: Switch to new Intel CPU model defines

New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240531203706.233365-7-tony.luck@intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: intel_turbo_max_3: Switch to new Intel CPU model defines
Tony Luck [Fri, 31 May 2024 20:37:01 +0000 (13:37 -0700)]
platform/x86: intel_turbo_max_3: Switch to new Intel CPU model defines

New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240531203706.233365-6-tony.luck@intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: p2sb: Switch to new Intel CPU model defines
Tony Luck [Fri, 31 May 2024 20:37:00 +0000 (13:37 -0700)]
platform/x86: p2sb: Switch to new Intel CPU model defines

New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240531203706.233365-5-tony.luck@intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agox86/platform/atom: Switch to new Intel CPU model defines
Tony Luck [Fri, 31 May 2024 20:36:59 +0000 (13:36 -0700)]
x86/platform/atom: Switch to new Intel CPU model defines

New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20240531203706.233365-4-tony.luck@intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: intel_telemetry: Switch to new Intel CPU model defines
Tony Luck [Fri, 31 May 2024 20:36:58 +0000 (13:36 -0700)]
platform/x86: intel_telemetry: Switch to new Intel CPU model defines

New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240531203706.233365-3-tony.luck@intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86/intel/pmc: Switch to new Intel CPU model defines
Tony Luck [Fri, 31 May 2024 20:36:57 +0000 (13:36 -0700)]
platform/x86/intel/pmc: Switch to new Intel CPU model defines

New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240531203706.233365-2-tony.luck@intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agodoc: TPMI: Add entry for Performance Limit Reasons
Tero Kristo [Mon, 27 May 2024 13:29:38 +0000 (16:29 +0300)]
doc: TPMI: Add entry for Performance Limit Reasons

Describe the new 'plr' (Performance Limit Reasons) directory contents under
the main TPMI debugfs folder.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Link: https://lore.kernel.org/r/20240527133400.483634-7-tero.kristo@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86/intel/tpmi/plr: Add support for the plr mailbox
Tero Kristo [Mon, 27 May 2024 13:29:37 +0000 (16:29 +0300)]
platform/x86/intel/tpmi/plr: Add support for the plr mailbox

Add support for reading fine grained power limit reasons via the PLR
mailbox.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Link: https://lore.kernel.org/r/20240527133400.483634-6-tero.kristo@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86/intel/tpmi: Add new auxiliary driver for performance limits
Tero Kristo [Mon, 27 May 2024 13:29:36 +0000 (16:29 +0300)]
platform/x86/intel/tpmi: Add new auxiliary driver for performance limits

Add new auxiliary driver that exposes the SoC performance limit reasons
via debugfs interface.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Link: https://lore.kernel.org/r/20240527133400.483634-5-tero.kristo@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86/intel: TPMI domain id and CPU mapping
Srinivas Pandruvada [Tue, 28 May 2024 07:34:57 +0000 (10:34 +0300)]
platform/x86/intel: TPMI domain id and CPU mapping

Each TPMI power domain includes a group of CPUs. Several power
management settings in this case applicable to a group of CPUs.
There can be several power domains in a CPU package. So, provide
interfaces for:
- Get power domain id for a Linux CPU
- Get mask of Linux CPUs in a power domain

Hardware Punit uses different CPU numbering, which is not based on
APIC (Advanced Programmable Interrupt Controller) CPU numbering.
The Linux CPU numbering is based on APIC CPU numbering. Some PM features
like Intel Speed Select, the CPU core mask provided by the hardware is
based on the Punit CPU numbering. To use the core mask, this mask
needs to be converted to a Linux CPUs mask. So, provide interfaces for:
- Convert to a Linux CPU number from a Punit CPU number
- Convert to a Punit CPU number from a Linux CPU number

On each CPU online, MSR 0x54 is used to read the mapping and stores in
a per cpu array. Create a hash for faster searching of a Linux CPU number
from a Punit CPU number.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[tero.kristo: minor updates]
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Link: https://lore.kernel.org/r/20240528073457.497816-1-tero.kristo@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86/intel/tpmi: Add API to get debugfs root
Tero Kristo [Mon, 27 May 2024 13:29:34 +0000 (16:29 +0300)]
platform/x86/intel/tpmi: Add API to get debugfs root

Add new API to get the debugfs root directory for TPMI. This allows any
TPMI devices to add their own debugfs items under the same directory
structure.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Link: https://lore.kernel.org/r/20240527133400.483634-3-tero.kristo@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86/intel/tpmi: Add support for performance limit reasons
Tero Kristo [Mon, 27 May 2024 13:29:33 +0000 (16:29 +0300)]
platform/x86/intel/tpmi: Add support for performance limit reasons

Add TPMI ID 0x0C (Perf Limit Reasons) to the list of supported TPMI IDs.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Link: https://lore.kernel.org/r/20240527133400.483634-2-tero.kristo@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: ISST: Use only TPMI interface when present
Srinivas Pandruvada [Fri, 31 May 2024 08:35:48 +0000 (01:35 -0700)]
platform/x86: ISST: Use only TPMI interface when present

When the TPMI interface is present, use this interface instead of legacy.
On some systems legacy IO device is also present. Using both interfaces
together is confusing and may set the hardware in inconsistent state.

When TPMI interface is present, don't load legacy drivers.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20240531083554.1313110-4-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: ISST: Avoid some SkyLake server models
Srinivas Pandruvada [Fri, 31 May 2024 08:35:47 +0000 (01:35 -0700)]
platform/x86: ISST: Avoid some SkyLake server models

Some SkyLake server variants don't support any SST functionality. No use
of providing any SST related interfaces on them.

All supported SkyLake servers provide mailbox interface via MSR. So check
for the presence of MSR 0xB0 and 0xB1. If not present don't load common
module.

Move defines for MSR_OS_MAILBOX_INTERFACE and MSR_OS_MAILBOX_DATA to
common header file to avoid duplicating them.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20240531083554.1313110-3-srinivas.pandruvada@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: ISST: Add model specific loading for common module
Srinivas Pandruvada [Fri, 31 May 2024 08:35:46 +0000 (01:35 -0700)]
platform/x86: ISST: Add model specific loading for common module

SST common module is loaded when model specific or TPMI SST driver
registers for services. There are model specific features used in SST
common modules which are checked with a CPU model list. So, this module
is model specific.

There are some use cases where loading the common module independently
only on the supported CPU models helps. The first use case is for
preventing SST TPMI module loading if the model specific features are
not implemented. The second use case for presenting information to
user space when SST is used in OOB (Out of Band) mode.

1.
With TPMI, SST interface is architectural. This means that no need to add
new PCI device IDs for new CPU models. This means that there can be lag
in adding CPU models for the model specific features in the common
module. For example, before adding CPU model to GRANITERAPIDS_D to
hpm_cpu_ids[], SST is still functional for some features and but will
get/set wrong data for features like SST-CP. This is because IOCTL
ISST_IF_GET_PHY_ID, will not give correct mapping for newer CPU models.
So adding explicit model check during load time will prevent such cases.
For unsupported CPU models, common driver will fail to load and hence
dependent modules will not be loaded.

2.
When the SST TPMI features are controlled by some OOB agent (not from OS
interface), even if the CPU model is supported, there will be no user
space interface available for tools as SST TPMI modules will not
be loaded. User space interface is registered when TPMI modules call
isst_if_cdev_register(). Even in this case user space orchestrator
software needs to get power domain information to schedule workload and
get/set turbo ratio limits. This information is exposed by the common
module using IOCTLs ISST_IF_GET_PHY_ID and ISST_IF_MSR_COMMAND
respectively. Since the user space MSR access can be locked, direct MSR
access from the user space is not an option using /dev/cpu/*/msr.

Converge all the existing model checks to one common place and
use driver data to differentiate. On successful model check
call isst_misc_reg().

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20240531083554.1313110-2-srinivas.pandruvada@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: dell-pc: Implement platform_profile
Lyndon Sanche [Wed, 29 May 2024 17:47:46 +0000 (11:47 -0600)]
platform/x86: dell-pc: Implement platform_profile

Some Dell laptops support configuration of preset fan modes through
smbios tables.

If the platform supports these fan modes, set up platform_profile to
change these modes. If not supported, skip enabling platform_profile.

Signed-off-by: Lyndon Sanche <lsanche@lyndeno.ca>
Link: https://lore.kernel.org/r/20240529174843.13226-4-lsanche@lyndeno.ca
[ij: Adjust #includes into alphabetical order]
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: dell-smbios: Move request functions for reuse
Lyndon Sanche [Wed, 29 May 2024 17:47:45 +0000 (11:47 -0600)]
platform/x86: dell-smbios: Move request functions for reuse

Move the dell_send_request and dell_fill_request symbols into smbios-base
so that they can be used in multiple modules.

Signed-off-by: Lyndon Sanche <lsanche@lyndeno.ca>
Link: https://lore.kernel.org/r/20240529174843.13226-3-lsanche@lyndeno.ca
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86: dell-smbios: Add helper for checking supported class
Lyndon Sanche [Wed, 29 May 2024 17:47:44 +0000 (11:47 -0600)]
platform/x86: dell-smbios: Add helper for checking supported class

Add helper that returns true if smbios command class is supported.

Signed-off-by: Lyndon Sanche <lsanche@lyndeno.ca>
Link: https://lore.kernel.org/r/20240529174843.13226-2-lsanche@lyndeno.ca
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoplatform/x86/amd/pmf: Use memdup_user()
Thorsten Blum [Mon, 27 May 2024 08:36:29 +0000 (10:36 +0200)]
platform/x86/amd/pmf: Use memdup_user()

Switch to memdup_user() to overwrite the allocated memory only once
instead of initializing the allocated memory to zero with kzalloc() and
then immediately overwriting it with copy_from_user().

Fixes the following Coccinelle/coccicheck warning reported by
memdup_user.cocci:

WARNING opportunity for memdup_user

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Link: https://lore.kernel.org/r/20240527083628.210491-2-thorsten.blum@toblux.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 months agoLinux 6.10-rc1 v6.10-rc1
Linus Torvalds [Sun, 26 May 2024 22:20:12 +0000 (15:20 -0700)]
Linux 6.10-rc1

3 months agomm: percpu: Include smp.h in alloc_tag.h
Kent Overstreet [Fri, 24 May 2024 15:42:09 +0000 (11:42 -0400)]
mm: percpu: Include smp.h in alloc_tag.h

percpu.h depends on smp.h, but doesn't include it directly because of
circular header dependency issues; percpu.h is needed in a bunch of low
level headers.

This fixes a randconfig build error on mips:

  include/linux/alloc_tag.h: In function '__alloc_tag_ref_set':
  include/asm-generic/percpu.h:31:40: error: implicit declaration of function 'raw_smp_processor_id' [-Werror=implicit-function-declaration]

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 24e44cc22aa3 ("mm: percpu: enable per-cpu allocation tagging")
Closes: https://lore.kernel.org/oe-kbuild-all/202405210052.DIrMXJNz-lkp@intel.com/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 months agoMerge tag 'perf-tools-fixes-for-v6.10-1-2024-05-26' of git://git.kernel.org/pub/scm...
Linus Torvalds [Sun, 26 May 2024 16:54:26 +0000 (09:54 -0700)]
Merge tag 'perf-tools-fixes-for-v6.10-1-2024-05-26' of git://git./linux/kernel/git/perf/perf-tools

Pull perf tool fix from Arnaldo Carvalho de Melo:
 "Revert a patch causing a regression.

  This made a simple 'perf record -e cycles:pp make -j199' stop working
  on the Ampere ARM64 system Linus uses to test ARM64 kernels".

* tag 'perf-tools-fixes-for-v6.10-1-2024-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"

3 months agoRevert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"
Arnaldo Carvalho de Melo [Sun, 26 May 2024 11:13:21 +0000 (08:13 -0300)]
Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"

This reverts commit 617824a7f0f73e4de325cf8add58e55b28c12493.

This made a simple 'perf record -e cycles:pp make -j199' stop working on
the Ampere ARM64 system Linus uses to test ARM64 kernels, as discussed
at length in the threads in the Link tags below.

The fix provided by Ian wasn't acceptable and work to fix this will take
time we don't have at this point, so lets revert this and work on it on
the next devel cycle.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Cc: Ethan Adams <j.ethan.adams@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tycho Andersen <tycho@tycho.pizza>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/lkml/CAHk-=wi5Ri=yR2jBVk-4HzTzpoAWOgstr1LEvg_-OXtJvXXJOA@mail.gmail.com
Link: https://lore.kernel.org/lkml/CAHk-=wiWvtFyedDNpoV7a8Fq_FpbB+F5KmWK2xPY3QoYseOf_A@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 months agoMerge tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 26 May 2024 05:33:10 +0000 (22:33 -0700)]
Merge tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - two important netfs integration fixes - including for a data
   corruption and also fixes for multiple xfstests

 - reenable swap support over SMB3

* tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix missing set of remote_i_size
  cifs: Fix smb3_insert_range() to move the zero_point
  cifs: update internal version number
  smb3: reenable swapfiles over SMB3 mounts

3 months agoMerge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 25 May 2024 22:10:33 +0000 (15:10 -0700)]
Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git./linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "16 hotfixes, 11 of which are cc:stable.

  A few nilfs2 fixes, the remainder are for MM: a couple of selftests
  fixes, various singletons fixing various issues in various parts"

* tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/ksm: fix possible UAF of stable_node
  mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
  mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
  nilfs2: fix potential hang in nilfs_detach_log_writer()
  nilfs2: fix unexpected freezing of nilfs_segctor_sync()
  nilfs2: fix use-after-free of timer for log writer thread
  selftests/mm: fix build warnings on ppc64
  arm64: patching: fix handling of execmem addresses
  selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
  selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
  selftests/mm: compaction_test: fix bogus test success on Aarch64
  mailmap: update email address for Satya Priya
  mm/huge_memory: don't unpoison huge_zero_folio
  kasan, fortify: properly rename memintrinsics
  lib: add version into /proc/allocinfo output
  mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL

3 months agoMerge tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 25 May 2024 21:48:40 +0000 (14:48 -0700)]
Merge tag 'irq-urgent-2024-05-25' of git://git./linux/kernel/git/tip/tip

Pull irq fixes from Ingo Molnar:

 - Fix x86 IRQ vector leak caused by a CPU offlining race

 - Fix build failure in the riscv-imsic irqchip driver
   caused by an API-change semantic conflict

 - Fix use-after-free in irq_find_at_or_after()

* tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
  genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline
  irqchip/riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict

3 months agoMerge tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 25 May 2024 21:40:09 +0000 (14:40 -0700)]
Merge tag 'x86-urgent-2024-05-25' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - Fix regressions of the new x86 CPU VFM (vendor/family/model)
   enumeration/matching code

 - Fix crash kernel detection on buggy firmware with
   non-compliant ACPI MADT tables

 - Address Kconfig warning

* tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL
  crypto: x86/aes-xts - switch to new Intel CPU model defines
  x86/topology: Handle bogus ACPI tables correctly
  x86/kconfig: Select ARCH_WANT_FRAME_POINTERS again when UNWINDER_FRAME_POINTER=y

3 months agoMerge tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi
Linus Torvalds [Sat, 25 May 2024 21:32:29 +0000 (14:32 -0700)]
Merge tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi

Pull ipmi updates from Corey Minyard:
 "Mostly updates for deprecated interfaces, platform.remove and
  converting from a tasklet to a BH workqueue.

  Also use HAS_IOPORT for disabling inb()/outb()"

* tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi:
  ipmi: kcs_bmc_npcm7xx: Convert to platform remove callback returning void
  ipmi: kcs_bmc_aspeed: Convert to platform remove callback returning void
  ipmi: ipmi_ssif: Convert to platform remove callback returning void
  ipmi: ipmi_si_platform: Convert to platform remove callback returning void
  ipmi: ipmi_powernv: Convert to platform remove callback returning void
  ipmi: bt-bmc: Convert to platform remove callback returning void
  char: ipmi: handle HAS_IOPORT dependencies
  ipmi: Convert from tasklet to BH workqueue

3 months agoMerge tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client
Linus Torvalds [Sat, 25 May 2024 21:23:58 +0000 (14:23 -0700)]
Merge tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "A series from Xiubo that adds support for additional access checks
  based on MDS auth caps which were recently made available to clients.

  This is needed to prevent scenarios where the MDS quietly discards
  updates that a UID-restricted client previously (wrongfully) acked to
  the user.

  Other than that, just a documentation fixup"

* tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client:
  doc: ceph: update userspace command to get CephFS metadata
  ceph: add CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK feature bit
  ceph: check the cephx mds auth access for async dirop
  ceph: check the cephx mds auth access for open
  ceph: check the cephx mds auth access for setattr
  ceph: add ceph_mds_check_access() helper
  ceph: save cap_auths in MDS client when session is opened

3 months agoMerge tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3
Linus Torvalds [Sat, 25 May 2024 21:19:01 +0000 (14:19 -0700)]
Merge tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3

Pull ntfs3 updates from Konstantin Komarov:
 "Fixes:
   - reusing of the file index (could cause the file to be trimmed)
   - infinite dir enumeration
   - taking DOS names into account during link counting
   - le32_to_cpu conversion, 32 bit overflow, NULL check
   - some code was refactored

  Changes:
   - removed max link count info display during driver init

  Remove:
   - atomic_open has been removed for lack of use"

* tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3:
  fs/ntfs3: Break dir enumeration if directory contents error
  fs/ntfs3: Fix case when index is reused during tree transformation
  fs/ntfs3: Mark volume as dirty if xattr is broken
  fs/ntfs3: Always make file nonresident on fallocate call
  fs/ntfs3: Redesign ntfs_create_inode to return error code instead of inode
  fs/ntfs3: Use variable length array instead of fixed size
  fs/ntfs3: Use 64 bit variable to avoid 32 bit overflow
  fs/ntfs3: Check 'folio' pointer for NULL
  fs/ntfs3: Missed le32_to_cpu conversion
  fs/ntfs3: Remove max link count info display during driver init
  fs/ntfs3: Taking DOS names into account during link counting
  fs/ntfs3: remove atomic_open
  fs/ntfs3: use kcalloc() instead of kzalloc()

3 months agoMerge tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Sat, 25 May 2024 21:15:39 +0000 (14:15 -0700)]
Merge tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
 "Two ksmbd server fixes, both for stable"

* tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: ignore trailing slashes in share paths
  ksmbd: avoid to send duplicate oplock break notifications

3 months agoMerge tag 'rtc-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Linus Torvalds [Sat, 25 May 2024 20:33:53 +0000 (13:33 -0700)]
Merge tag 'rtc-6.10' of git://git./linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "There is one new driver and then most of the changes are the device
  tree bindings conversions to yaml.

  New driver:
   - Epson RX8111

  Drivers:
   - Many Device Tree bindings conversions to dtschema
   - pcf8563: wakeup-source support"

* tag 'rtc-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  pcf8563: add wakeup-source support
  rtc: rx8111: handle VLOW flag
  rtc: rx8111: demote warnings to debug level
  rtc: rx6110: Constify struct regmap_config
  dt-bindings: rtc: convert trivial devices into dtschema
  dt-bindings: rtc: stmp3xxx-rtc: convert to dtschema
  dt-bindings: rtc: pxa-rtc: convert to dtschema
  rtc: Add driver for Epson RX8111
  dt-bindings: rtc: Add Epson RX8111
  rtc: mcp795: drop unneeded MODULE_ALIAS
  rtc: nuvoton: Modify part number value
  rtc: test: Split rtc unit test into slow and normal speed test
  dt-bindings: rtc: nxp,lpc1788-rtc: convert to dtschema
  dt-bindings: rtc: digicolor-rtc: move to trivial-rtc
  dt-bindings: rtc: alphascale,asm9260-rtc: convert to dtschema
  dt-bindings: rtc: armada-380-rtc: convert to dtschema
  rtc: cros-ec: provide ID table for avoiding fallback match

3 months agoMerge tag 'i3c/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Linus Torvalds [Sat, 25 May 2024 20:28:29 +0000 (13:28 -0700)]
Merge tag 'i3c/for-6.10' of git://git./linux/kernel/git/i3c/linux

Pull i3c updates from Alexandre Belloni:
 "Runtime PM (power management) is improved and hot-join support has
  been added to the dw controller driver.

  Core:
   - Allow device driver to trigger controller runtime PM

  Drivers:
   - dw: hot-join support
   - svc: better IBI handling"

* tag 'i3c/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
  i3c: dw: Add hot-join support.
  i3c: master: Enable runtime PM for master controller
  i3c: master: svc: fix invalidate IBI type and miss call client IBI handler
  i3c: master: svc: change ENXIO to EAGAIN when IBI occurs during start frame
  i3c: Add comment for -EAGAIN in i3c_device_do_priv_xfers()

3 months agoMerge tag 'jffs2-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 25 May 2024 20:23:42 +0000 (13:23 -0700)]
Merge tag 'jffs2-for-linus-6.10-rc1' of git://git./linux/kernel/git/rw/ubifs

Pull jffs2 updates from Richard Weinberger:

 - Fix illegal memory access in jffs2_free_inode()

 - Kernel-doc fixes

 - print symbolic error names

* tag 'jffs2-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  jffs2: Fix potential illegal address access in jffs2_free_inode
  jffs2: Simplify the allocation of slab caches
  jffs2: nodemgmt: fix kernel-doc comments
  jffs2: print symbolic error name instead of error code

3 months agoMerge tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 25 May 2024 20:17:48 +0000 (13:17 -0700)]
Merge tag 'uml-for-linus-6.10-rc1' of git://git./linux/kernel/git/uml/linux

Pull UML updates from Richard Weinberger:

 - Fixes for -Wmissing-prototypes warnings and further cleanup

 - Remove callback returning void from rtc and virtio drivers

 - Fix bash location

* tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (26 commits)
  um: virtio_uml: Convert to platform remove callback returning void
  um: rtc: Convert to platform remove callback returning void
  um: Remove unused do_get_thread_area function
  um: Fix -Wmissing-prototypes warnings for __vdso_*
  um: Add an internal header shared among the user code
  um: Fix the declaration of kasan_map_memory
  um: Fix the -Wmissing-prototypes warning for get_thread_reg
  um: Fix the -Wmissing-prototypes warning for __switch_mm
  um: Fix -Wmissing-prototypes warnings for (rt_)sigreturn
  um: Stop tracking host PID in cpu_tasks
  um: process: remove unused 'n' variable
  um: vector: remove unused len variable/calculation
  um: vector: fix bpfflash parameter evaluation
  um: slirp: remove set but unused variable 'pid'
  um: signal: move pid variable where needed
  um: Makefile: use bash from the environment
  um: Add winch to winch_handlers before registering winch IRQ
  um: Fix -Wmissing-prototypes warnings for __warp_* and foo
  um: Fix -Wmissing-prototypes warnings for text_poke*
  um: Move declarations to proper headers
  ...

3 months agoMerge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Sat, 25 May 2024 00:28:02 +0000 (17:28 -0700)]
Merge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Some fixes for the end of the merge window, mostly amdgpu and panthor,
  with one nouveau uAPI change that fixes a bad decision we made a few
  months back.

  nouveau:
   - fix bo metadata uAPI for vm bind

  panthor:
   - Fixes for panthor's heap logical block.
   - Reset on unrecoverable fault
   - Fix VM references.
   - Reset fix.

  xlnx:
   - xlnx compile and doc fixes.

  amdgpu:
   - Handle vbios table integrated info v2.3

  amdkfd:
   - Handle duplicate BOs in reserve_bo_and_cond_vms
   - Handle memory limitations on small APUs

  dp/mst:
   - MST null deref fix.

  bridge:
   - Don't let next bridge create connector in adv7511 to make probe
     work"

* tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel:
  drm/amdgpu/atomfirmware: add intergrated info v2.3 table
  drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2
  drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
  drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
  drm/bridge: adv7511: Attach next bridge without creating connector
  drm/buddy: Fix the warn on's during force merge
  drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations
  drm/panthor: Call panthor_sched_post_reset() even if the reset failed
  drm/panthor: Reset the FW VM to NULL on unplug
  drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level
  drm/panthor: Force an immediate reset on unrecoverable faults
  drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints
  drm/panthor: Fix an off-by-one in the heap context retrieval logic
  drm/panthor: Relax the constraints on the tiler chunk size
  drm/panthor: Make sure the tiler initial/max chunks are consistent
  drm/panthor: Fix tiler OOM handling to allow incremental rendering
  drm: xlnx: zynqmp_dpsub: Fix compilation error
  drm: xlnx: zynqmp_dpsub: Fix few function comments

3 months agocifs: Fix missing set of remote_i_size
David Howells [Fri, 24 May 2024 14:23:36 +0000 (15:23 +0100)]
cifs: Fix missing set of remote_i_size

Occasionally, the generic/001 xfstest will fail indicating corruption in
one of the copy chains when run on cifs against a server that supports
FSCTL_DUPLICATE_EXTENTS_TO_FILE (eg. Samba with a share on btrfs).  The
problem is that the remote_i_size value isn't updated by cifs_setsize()
when called by smb2_duplicate_extents(), but i_size *is*.

This may cause cifs_remap_file_range() to then skip the bit after calling
->duplicate_extents() that sets sizes.

Fix this by calling netfs_resize_file() in smb2_duplicate_extents() before
calling cifs_setsize() to set i_size.

This means we don't then need to call netfs_resize_file() upon return from
->duplicate_extents(), but we also fix the test to compare against the pre-dup
inode size.

[Note that this goes back before the addition of remote_i_size with the
netfs_inode struct.  It should probably have been setting cifsi->server_eof
previously.]

Fixes: cfc63fc8126a ("smb3: fix cached file size problems in duplicate extents (reflink)")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
Signed-off-by: Steve French <stfrench@microsoft.com>
3 months agocifs: Fix smb3_insert_range() to move the zero_point
David Howells [Wed, 22 May 2024 08:38:48 +0000 (09:38 +0100)]
cifs: Fix smb3_insert_range() to move the zero_point

Fix smb3_insert_range() to move the zero_point over to the new EOF.
Without this, generic/147 fails as reads of data beyond the old EOF point
return zeroes.

Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
Signed-off-by: Steve French <stfrench@microsoft.com>
3 months agoMerge tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 24 May 2024 19:47:28 +0000 (12:47 -0700)]
Merge tag 'mm-stable-2024-05-24-11-49' of git://git./linux/kernel/git/akpm/mm

Pull more mm updates from Andrew Morton:
 "Jeff Xu's implementation of the mseal() syscall"

* tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  selftest mm/mseal read-only elf memory segment
  mseal: add documentation
  selftest mm/mseal memory sealing
  mseal: add mseal syscall
  mseal: wire up mseal syscall

3 months agomm/ksm: fix possible UAF of stable_node
Chengming Zhou [Mon, 13 May 2024 03:07:56 +0000 (11:07 +0800)]
mm/ksm: fix possible UAF of stable_node

The commit 2c653d0ee2ae ("ksm: introduce ksm_max_page_sharing per page
deduplication limit") introduced a possible failure case in the
stable_tree_insert(), where we may free the new allocated stable_node_dup
if we fail to prepare the missing chain node.

Then that kfolio return and unlock with a freed stable_node set...  And
any MM activities can come in to access kfolio->mapping, so UAF.

Fix it by moving folio_set_stable_node() to the end after stable_node
is inserted successfully.

Link: https://lkml.kernel.org/r/20240513-b4-ksm-stable-node-uaf-v1-1-f687de76f452@linux.dev
Fixes: 2c653d0ee2ae ("ksm: introduce ksm_max_page_sharing per page deduplication limit")
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm/memory-failure: fix handling of dissolved but not taken off from buddy pages
Miaohe Lin [Thu, 23 May 2024 07:12:17 +0000 (15:12 +0800)]
mm/memory-failure: fix handling of dissolved but not taken off from buddy pages

When I did memory failure tests recently, below panic occurs:

page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
raw: 06fffe0000000000 dead000000000100 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000009 00000000ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page))
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:1009!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:__del_page_from_free_list+0x151/0x180
RSP: 0018:ffffa49c90437998 EFLAGS: 00000046
RAX: 0000000000000035 RBX: 0000000000000009 RCX: ffff8dd8dfd1c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff8dd8dfd1c9c0
RBP: ffffd901233b8000 R08: ffffffffab5511f8 R09: 0000000000008c69
R10: 0000000000003c15 R11: ffffffffab5511f8 R12: ffff8dd8fffc0c80
R13: 0000000000000001 R14: ffff8dd8fffc0c80 R15: 0000000000000009
FS:  00007ff916304740(0000) GS:ffff8dd8dfd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055eae50124c8 CR3: 00000008479e0000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 __rmqueue_pcplist+0x23b/0x520
 get_page_from_freelist+0x26b/0xe40
 __alloc_pages_noprof+0x113/0x1120
 __folio_alloc_noprof+0x11/0xb0
 alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130
 __alloc_fresh_hugetlb_folio+0xe7/0x140
 alloc_pool_huge_folio+0x68/0x100
 set_max_huge_pages+0x13d/0x340
 hugetlb_sysctl_handler_common+0xe8/0x110
 proc_sys_call_handler+0x194/0x280
 vfs_write+0x387/0x550
 ksys_write+0x64/0xe0
 do_syscall_64+0xc2/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff916114887
RSP: 002b:00007ffec8a2fd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000055eae500e350 RCX: 00007ff916114887
RDX: 0000000000000004 RSI: 000055eae500e390 RDI: 0000000000000003
RBP: 000055eae50104c0 R08: 0000000000000000 R09: 000055eae50104c0
R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000004
R13: 0000000000000004 R14: 00007ff916216b80 R15: 00007ff916216a00
 </TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]---

And before the panic, there had an warning about bad page state:

BUG: Bad page state in process page-types  pfn:8cee00
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
page_type: 0xffffff7f(buddy)
raw: 06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000
raw: 0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000
page dumped because: nonzero mapcount
Modules linked in: mce_inject hwpoison_inject
CPU: 8 PID: 154211 Comm: page-types Not tainted 6.9.0-rc4-00499-g5544ec3178e2-dirty #22
Call Trace:
 <TASK>
 dump_stack_lvl+0x83/0xa0
 bad_page+0x63/0xf0
 free_unref_page+0x36e/0x5c0
 unpoison_memory+0x50b/0x630
 simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
 debugfs_attr_write+0x42/0x60
 full_proxy_write+0x5b/0x80
 vfs_write+0xcd/0x550
 ksys_write+0x64/0xe0
 do_syscall_64+0xc2/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f189a514887
RSP: 002b:00007ffdcd899718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f189a514887
RDX: 0000000000000009 RSI: 00007ffdcd899730 RDI: 0000000000000003
RBP: 00007ffdcd8997a0 R08: 0000000000000000 R09: 00007ffdcd8994b2
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcda199a8
R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f189a7a5040
 </TASK>

The root cause should be the below race:

 memory_failure
  try_memory_failure_hugetlb
   me_huge_page
    __page_handle_poison
     dissolve_free_hugetlb_folio
     drain_all_pages -- Buddy page can be isolated e.g. for compaction.
     take_page_off_buddy -- Failed as page is not in the buddy list.
     -- Page can be putback into buddy after compaction.
    page_ref_inc -- Leads to buddy page with refcnt = 1.

Then unpoison_memory() can unpoison the page and send the buddy page back
into buddy list again leading to the above bad page state warning.  And
bad_page() will call page_mapcount_reset() to remove PageBuddy from buddy
page leading to later VM_BUG_ON_PAGE(!PageBuddy(page)) when trying to
allocate this page.

Fix this issue by only treating __page_handle_poison() as successful when
it returns 1.

Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com
Fixes: ceaf8fbea79a ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
Yuanyuan Zhong [Thu, 23 May 2024 18:35:31 +0000 (12:35 -0600)]
mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again

After switching smaps_rollup to use VMA iterator, searching for next entry
is part of the condition expression of the do-while loop.  So the current
VMA needs to be addressed before the continue statement.

Otherwise, with some VMAs skipped, userspace observed memory
consumption from /proc/pid/smaps_rollup will be smaller than the sum of
the corresponding fields from /proc/pid/smaps.

Link: https://lkml.kernel.org/r/20240523183531.2535436-1-yzhong@purestorage.com
Fixes: c4c84f06285e ("fs/proc/task_mmu: stop using linked list and highest_vm_end")
Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com>
Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agonilfs2: fix potential hang in nilfs_detach_log_writer()
Ryusuke Konishi [Mon, 20 May 2024 13:26:21 +0000 (22:26 +0900)]
nilfs2: fix potential hang in nilfs_detach_log_writer()

Syzbot has reported a potential hang in nilfs_detach_log_writer() called
during nilfs2 unmount.

Analysis revealed that this is because nilfs_segctor_sync(), which
synchronizes with the log writer thread, can be called after
nilfs_segctor_destroy() terminates that thread, as shown in the call trace
below:

nilfs_detach_log_writer
  nilfs_segctor_destroy
    nilfs_segctor_kill_thread  --> Shut down log writer thread
    flush_work
      nilfs_iput_work_func
        nilfs_dispose_list
          iput
            nilfs_evict_inode
              nilfs_transaction_commit
                nilfs_construct_segment (if inode needs sync)
                  nilfs_segctor_sync  --> Attempt to synchronize with
                                          log writer thread
                           *** DEADLOCK ***

Fix this issue by changing nilfs_segctor_sync() so that the log writer
thread returns normally without synchronizing after it terminates, and by
forcing tasks that are already waiting to complete once after the thread
terminates.

The skipped inode metadata flushout will then be processed together in the
subsequent cleanup work in nilfs_segctor_destroy().

Link: https://lkml.kernel.org/r/20240520132621.4054-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+e3973c409251e136fdd0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e3973c409251e136fdd0
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: "Bai, Shuangpeng" <sjb7183@psu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agonilfs2: fix unexpected freezing of nilfs_segctor_sync()
Ryusuke Konishi [Mon, 20 May 2024 13:26:20 +0000 (22:26 +0900)]
nilfs2: fix unexpected freezing of nilfs_segctor_sync()

A potential and reproducible race issue has been identified where
nilfs_segctor_sync() would block even after the log writer thread writes a
checkpoint, unless there is an interrupt or other trigger to resume log
writing.

This turned out to be because, depending on the execution timing of the
log writer thread running in parallel, the log writer thread may skip
responding to nilfs_segctor_sync(), which causes a call to schedule()
waiting for completion within nilfs_segctor_sync() to lose the opportunity
to wake up.

The reason why waking up the task waiting in nilfs_segctor_sync() may be
skipped is that updating the request generation issued using a shared
sequence counter and adding an wait queue entry to the request wait queue
to the log writer, are not done atomically.  There is a possibility that
log writing and request completion notification by nilfs_segctor_wakeup()
may occur between the two operations, and in that case, the wait queue
entry is not yet visible to nilfs_segctor_wakeup() and the wake-up of
nilfs_segctor_sync() will be carried over until the next request occurs.

Fix this issue by performing these two operations simultaneously within
the lock section of sc_state_lock.  Also, following the memory barrier
guidelines for event waiting loops, move the call to set_current_state()
in the same location into the event waiting loop to ensure that a memory
barrier is inserted just before the event condition determination.

Link: https://lkml.kernel.org/r/20240520132621.4054-3-konishi.ryusuke@gmail.com
Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: "Bai, Shuangpeng" <sjb7183@psu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agonilfs2: fix use-after-free of timer for log writer thread
Ryusuke Konishi [Mon, 20 May 2024 13:26:19 +0000 (22:26 +0900)]
nilfs2: fix use-after-free of timer for log writer thread

Patch series "nilfs2: fix log writer related issues".

This bug fix series covers three nilfs2 log writer-related issues,
including a timer use-after-free issue and potential deadlock issue on
unmount, and a potential freeze issue in event synchronization found
during their analysis.  Details are described in each commit log.

This patch (of 3):

A use-after-free issue has been reported regarding the timer sc_timer on
the nilfs_sc_info structure.

The problem is that even though it is used to wake up a sleeping log
writer thread, sc_timer is not shut down until the nilfs_sc_info structure
is about to be freed, and is used regardless of the thread's lifetime.

Fix this issue by limiting the use of sc_timer only while the log writer
thread is alive.

Link: https://lkml.kernel.org/r/20240520132621.4054-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240520132621.4054-2-konishi.ryusuke@gmail.com
Fixes: fdce895ea5dd ("nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: "Bai, Shuangpeng" <sjb7183@psu.edu>
Closes: https://groups.google.com/g/syzkaller/c/MK_LYqtt8ko/m/8rgdWeseAwAJ
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agoselftests/mm: fix build warnings on ppc64
Michael Ellerman [Tue, 21 May 2024 03:02:19 +0000 (13:02 +1000)]
selftests/mm: fix build warnings on ppc64

Fix warnings like:

  In file included from uffd-unit-tests.c:8:
  uffd-unit-tests.c: In function `uffd_poison_handle_fault':
  uffd-common.h:45:33: warning: format `%llu' expects argument of type
  `long long unsigned int', but argument 3 has type `__u64' {aka `long
  unsigned int'} [-Wformat=]

By switching to unsigned long long for u64 for ppc64 builds.

Link: https://lkml.kernel.org/r/20240521030219.57439-1-mpe@ellerman.id.au
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agoarm64: patching: fix handling of execmem addresses
Will Deacon [Tue, 21 May 2024 21:38:13 +0000 (00:38 +0300)]
arm64: patching: fix handling of execmem addresses

Klara Modin reported warnings for a kernel configured with BPF_JIT but
without MODULES:

[   44.131296] Trying to vfree() bad address (000000004a17c299)
[   44.138024] WARNING: CPU: 1 PID: 193 at mm/vmalloc.c:3189 remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[   44.146675] CPU: 1 PID: 193 Comm: kworker/1:2 Tainted: G      D W          6.9.0-01786-g2c9e5d4a0082 #25
[   44.158229] Hardware name: Raspberry Pi 3 Model B (DT)
[   44.164433] Workqueue: events bpf_prog_free_deferred
[   44.170492] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   44.178601] pc : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[   44.183705] lr : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[   44.188772] sp : ffff800082a13c70
[   44.193112] x29: ffff800082a13c70 x28: 0000000000000000 x27: 0000000000000000
[   44.201384] x26: 0000000000000000 x25: ffff00003a44efa0 x24: 00000000d4202000
[   44.209658] x23: ffff800081223dd0 x22: ffff00003a198a40 x21: ffff8000814dd880
[   44.217924] x20: 00000000d4202000 x19: ffff8000814dd880 x18: 0000000000000006
[   44.226206] x17: 0000000000000000 x16: 0000000000000020 x15: 0000000000000002
[   44.234460] x14: ffff8000811a6370 x13: 0000000020000000 x12: 0000000000000000
[   44.242710] x11: ffff8000811a6370 x10: 0000000000000144 x9 : ffff8000811fe370
[   44.250959] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000811fe370
[   44.259206] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[   44.267457] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000002203240
[   44.275703] Call trace:
[   44.279158] remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[   44.283858] vfree (mm/vmalloc.c:3322)
[   44.287835] execmem_free (mm/execmem.c:70)
[   44.292347] bpf_jit_free_exec+0x10/0x1c
[   44.297283] bpf_prog_pack_free (kernel/bpf/core.c:1006)
[   44.302457] bpf_jit_binary_pack_free (kernel/bpf/core.c:1195)
[   44.307951] bpf_jit_free (include/linux/filter.h:1083 arch/arm64/net/bpf_jit_comp.c:2474)
[   44.312342] bpf_prog_free_deferred (kernel/bpf/core.c:2785)
[   44.317785] process_one_work (kernel/workqueue.c:3273)
[   44.322684] worker_thread (kernel/workqueue.c:3342 (discriminator 2) kernel/workqueue.c:3429 (discriminator 2))
[   44.327292] kthread (kernel/kthread.c:388)
[   44.331342] ret_from_fork (arch/arm64/kernel/entry.S:861)

The problem is because bpf_arch_text_copy() silently fails to write to the
read-only area as a result of patch_map() faulting and the resulting
-EFAULT being chucked away.

Update patch_map() to use CONFIG_EXECMEM instead of
CONFIG_STRICT_MODULE_RWX to check for vmalloc addresses.

Link: https://lkml.kernel.org/r/20240521213813.703309-1-rppt@kernel.org
Fixes: 2c9e5d4a0082 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reported-by: Klara Modin <klarasmodin@gmail.com>
Closes: https://lore.kernel.org/all/7983fbbf-0127-457c-9394-8d6e4299c685@gmail.com
Tested-by: Klara Modin <klarasmodin@gmail.com>
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agoselftests/mm: compaction_test: fix bogus test success and reduce probability of OOM...
Dev Jain [Tue, 21 May 2024 07:43:58 +0000 (13:13 +0530)]
selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation

Reset nr_hugepages to zero before the start of the test.

If a non-zero number of hugepages is already set before the start of the
test, the following problems arise:

 - The probability of the test getting OOM-killed increases.  Proof:
   The test wants to run on 80% of available memory to prevent OOM-killing
   (see original code comments).  Let the value of mem_free at the start
   of the test, when nr_hugepages = 0, be x.  In the other case, when
   nr_hugepages > 0, let the memory consumed by hugepages be y.  In the
   former case, the test operates on 0.8 * x of memory.  In the latter,
   the test operates on 0.8 * (x - y) of memory, with y already filled,
   hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 *
   x.  Q.E.D

 - The probability of a bogus test success increases.  Proof: Let the
   memory consumed by hugepages be greater than 25% of x, with x and y
   defined as above.  The definition of compaction_index is c_index = (x -
   y)/z where z is the memory consumed by hugepages after trying to
   increase them again.  In check_compaction(), we set the number of
   hugepages to zero, and then increase them back; the probability that
   they will be set back to consume at least y amount of memory again is
   very high (since there is not much delay between the two attempts of
   changing nr_hugepages).  Hence, z >= y > (x/4) (by the 25% assumption).
   Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3
   hence, c_index can always be forced to be less than 3, thereby the test
   succeeding always.  Q.E.D

Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain@arm.com>
Cc: <stable@vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sri Jayaramappa <sjayaram@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agoselftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
Dev Jain [Tue, 21 May 2024 07:43:57 +0000 (13:13 +0530)]
selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages

Currently, the test tries to set nr_hugepages to zero, but that is not
actually done because the file offset is not reset after read().  Fix that
using lseek().

Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain@arm.com>
Cc: <stable@vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sri Jayaramappa <sjayaram@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agoselftests/mm: compaction_test: fix bogus test success on Aarch64
Dev Jain [Tue, 21 May 2024 07:43:56 +0000 (13:13 +0530)]
selftests/mm: compaction_test: fix bogus test success on Aarch64

Patch series "Fixes for compaction_test", v2.

The compaction_test memory selftest introduces fragmentation in memory
and then tries to allocate as many hugepages as possible. This series
addresses some problems.

On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since
compaction_index becomes 0, which is less than 3, due to no division by
zero exception being raised. We fix that by checking for division by
zero.

Secondly, correctly set the number of hugepages to zero before trying
to set a large number of them.

Now, consider a situation in which, at the start of the test, a non-zero
number of hugepages have been already set (while running the entire
selftests/mm suite, or manually by the admin). The test operates on 80%
of memory to avoid OOM-killer invocation, and because some memory is
already blocked by hugepages, it would increase the chance of OOM-killing.
Also, since mem_free used in check_compaction() is the value before we
set nr_hugepages to zero, the chance that the compaction_index will
be small is very high if the preset nr_hugepages was high, leading to a
bogus test success.

This patch (of 3):

Currently, if at runtime we are not able to allocate a huge page, the test
will trivially pass on Aarch64 due to no exception being raised on
division by zero while computing compaction_index.  Fix that by checking
for nr_hugepages == 0.  Anyways, in general, avoid a division by zero by
exiting the program beforehand.  While at it, fix a typo, and handle the
case where the number of hugepages may overflow an integer.

Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com
Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sri Jayaramappa <sjayaram@akamai.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomailmap: update email address for Satya Priya
Satya Priya Kakitapalli [Wed, 15 May 2024 06:04:50 +0000 (11:34 +0530)]
mailmap: update email address for Satya Priya

Update mailmap with my latest email ID, quic_c_skakit@quicinc.com
is no longer active.

Link: https://lkml.kernel.org/r/20240515-mailmap-update-v1-1-df4853f757a3@quicinc.com
Signed-off-by: Satya Priya Kakitapalli <quic_skakitap@quicinc.com>
Cc: Ajit Pandey <quic_ajipan@quicinc.com>
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Imran Shaik <quic_imrashai@quicinc.com>
Cc: Jagadeesh Kona <quic_jkona@quicinc.com>
Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
Cc: Taniya Das <quic_tdas@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm/huge_memory: don't unpoison huge_zero_folio
Miaohe Lin [Thu, 16 May 2024 12:26:08 +0000 (20:26 +0800)]
mm/huge_memory: don't unpoison huge_zero_folio

When I did memory failure tests recently, below panic occurs:

 kernel BUG at include/linux/mm.h:1135!
 invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
 CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
 RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
 RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
 RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
 RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
 R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
 FS:  0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
 Call Trace:
  <TASK>
  do_shrink_slab+0x14f/0x6a0
  shrink_slab+0xca/0x8c0
  shrink_node+0x2d0/0x7d0
  balance_pgdat+0x33a/0x720
  kswapd+0x1f3/0x410
  kthread+0xd5/0x100
  ret_from_fork+0x2f/0x50
  ret_from_fork_asm+0x1a/0x30
  </TASK>
 Modules linked in: mce_inject hwpoison_inject
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
 RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
 RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
 RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
 R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
 FS:  0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0

The root cause is that HWPoison flag will be set for huge_zero_folio
without increasing the folio refcnt.  But then unpoison_memory() will
decrease the folio refcnt unexpectedly as it appears like a successfully
hwpoisoned folio leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0) when
releasing huge_zero_folio.

Skip unpoisoning huge_zero_folio in unpoison_memory() to fix this issue.
We're not prepared to unpoison huge_zero_folio yet.

Link: https://lkml.kernel.org/r/20240516122608.22610-1-linmiaohe@huawei.com
Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Xu Yu <xuyu@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agokasan, fortify: properly rename memintrinsics
Andrey Konovalov [Fri, 17 May 2024 13:01:18 +0000 (15:01 +0200)]
kasan, fortify: properly rename memintrinsics

After commit 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*()
functions") and the follow-up fixes, with CONFIG_FORTIFY_SOURCE enabled,
even though the compiler instruments meminstrinsics by generating calls to
__asan/__hwasan_ prefixed functions, FORTIFY_SOURCE still uses
uninstrumented memset/memmove/memcpy as the underlying functions.

As a result, KASAN cannot detect bad accesses in memset/memmove/memcpy.
This also makes KASAN tests corrupt kernel memory and cause crashes.

To fix this, use __asan_/__hwasan_memset/memmove/memcpy as the underlying
functions whenever appropriate.  Do this only for the instrumented code
(as indicated by __SANITIZE_ADDRESS__).

Link: https://lkml.kernel.org/r/20240517130118.759301-1-andrey.konovalov@linux.dev
Fixes: 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*() functions")
Fixes: 51287dcb00cc ("kasan: emit different calls for instrumentable memintrinsics")
Fixes: 36be5cba99f6 ("kasan: treat meminstrinsic as builtins in uninstrumented files")
Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Reported-by: Nico Pache <npache@redhat.com>
Closes: https://lore.kernel.org/all/20240501144156.17e65021@outsider.home/
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Nico Pache <npache@redhat.com>
Acked-by: Nico Pache <npache@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agolib: add version into /proc/allocinfo output
Suren Baghdasaryan [Tue, 14 May 2024 16:31:28 +0000 (09:31 -0700)]
lib: add version into /proc/allocinfo output

Add version string and a header at the beginning of /proc/allocinfo to
allow later format changes.  Example output:

> head /proc/allocinfo
allocinfo - version: 1.0
#     <size>  <calls> <tag info>
           0        0 init/main.c:1314 func:do_initcalls
           0        0 init/do_mounts.c:353 func:mount_nodev_root
           0        0 init/do_mounts.c:187 func:mount_root_generic
           0        0 init/do_mounts.c:158 func:do_mount_root
           0        0 init/initramfs.c:493 func:unpack_to_rootfs
           0        0 init/initramfs.c:492 func:unpack_to_rootfs
           0        0 init/initramfs.c:491 func:unpack_to_rootfs
         512        1 arch/x86/events/rapl.c:681 func:init_rapl_pmus
         128        1 arch/x86/events/rapl.c:571 func:rapl_cpu_online

[akpm@linux-foundation.org: remove stray newline from struct allocinfo_private]
Link: https://lkml.kernel.org/r/20240514163128.3662251-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
Hailong.Liu [Fri, 10 May 2024 10:01:31 +0000 (18:01 +0800)]
mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL

commit a421ef303008 ("mm: allow !GFP_KERNEL allocations for kvmalloc")
includes support for __GFP_NOFAIL, but it presents a conflict with commit
dd544141b9eb ("vmalloc: back off when the current task is OOM-killed").  A
possible scenario is as follows:

process-a
__vmalloc_node_range(GFP_KERNEL | __GFP_NOFAIL)
    __vmalloc_area_node()
        vm_area_alloc_pages()
--> oom-killer send SIGKILL to process-a
        if (fatal_signal_pending(current)) break;
--> return NULL;

To fix this, do not check fatal_signal_pending() in vm_area_alloc_pages()
if __GFP_NOFAIL set.

This issue occurred during OPLUS KASAN TEST. Below is part of the log
-> oom-killer sends signal to process
[65731.222840] [ T1308] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/apps/uid_10198,task=gs.intelligence,pid=32454,uid=10198

[65731.259685] [T32454] Call trace:
[65731.259698] [T32454]  dump_backtrace+0xf4/0x118
[65731.259734] [T32454]  show_stack+0x18/0x24
[65731.259756] [T32454]  dump_stack_lvl+0x60/0x7c
[65731.259781] [T32454]  dump_stack+0x18/0x38
[65731.259800] [T32454]  mrdump_common_die+0x250/0x39c [mrdump]
[65731.259936] [T32454]  ipanic_die+0x20/0x34 [mrdump]
[65731.260019] [T32454]  atomic_notifier_call_chain+0xb4/0xfc
[65731.260047] [T32454]  notify_die+0x114/0x198
[65731.260073] [T32454]  die+0xf4/0x5b4
[65731.260098] [T32454]  die_kernel_fault+0x80/0x98
[65731.260124] [T32454]  __do_kernel_fault+0x160/0x2a8
[65731.260146] [T32454]  do_bad_area+0x68/0x148
[65731.260174] [T32454]  do_mem_abort+0x151c/0x1b34
[65731.260204] [T32454]  el1_abort+0x3c/0x5c
[65731.260227] [T32454]  el1h_64_sync_handler+0x54/0x90
[65731.260248] [T32454]  el1h_64_sync+0x68/0x6c

[65731.260269] [T32454]  z_erofs_decompress_queue+0x7f0/0x2258
--> be->decompressed_pages = kvcalloc(be->nr_pages, sizeof(struct page *), GFP_KERNEL | __GFP_NOFAIL);
kernel panic by NULL pointer dereference.
erofs assume kvmalloc with __GFP_NOFAIL never return NULL.
[65731.260293] [T32454]  z_erofs_runqueue+0xf30/0x104c
[65731.260314] [T32454]  z_erofs_readahead+0x4f0/0x968
[65731.260339] [T32454]  read_pages+0x170/0xadc
[65731.260364] [T32454]  page_cache_ra_unbounded+0x874/0xf30
[65731.260388] [T32454]  page_cache_ra_order+0x24c/0x714
[65731.260411] [T32454]  filemap_fault+0xbf0/0x1a74
[65731.260437] [T32454]  __do_fault+0xd0/0x33c
[65731.260462] [T32454]  handle_mm_fault+0xf74/0x3fe0
[65731.260486] [T32454]  do_mem_abort+0x54c/0x1b34
[65731.260509] [T32454]  el0_da+0x44/0x94
[65731.260531] [T32454]  el0t_64_sync_handler+0x98/0xb4
[65731.260553] [T32454]  el0t_64_sync+0x198/0x19c

Link: https://lkml.kernel.org/r/20240510100131.1865-1-hailong.liu@oppo.com
Fixes: 9376130c390a ("mm/vmalloc: add support for __GFP_NOFAIL")
Signed-off-by: Hailong.Liu <hailong.liu@oppo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Barry Song <21cnbao@gmail.com>
Reported-by: Oven <liyangouwen1@oppo.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agoMerge tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 24 May 2024 17:46:35 +0000 (10:46 -0700)]
Merge tag 'riscv-for-linus-6.10-mw2' of git://git./linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:

 - The compression format used for boot images is now configurable at
   build time, and these formats are shown in `make help`

 - access_ok() has been optimized

 - A pair of performance bugs have been fixed in the uaccess handlers

 - Various fixes and cleanups, including one for the IMSIC build failure
   and one for the early-boot ftrace illegal NOPs bug

* tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix early ftrace nop patching
  irqchip: riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
  riscv: selftests: Add signal handling vector tests
  riscv: mm: accelerate pagefault when badaccess
  riscv: uaccess: Relax the threshold for fast path
  riscv: uaccess: Allow the last potential unrolled copy
  riscv: typo in comment for get_f64_reg
  Use bool value in set_cpu_online()
  riscv: selftests: Add hwprobe binaries to .gitignore
  riscv: stacktrace: fixed walk_stackframe()
  ftrace: riscv: move from REGS to ARGS
  riscv: do not select MODULE_SECTIONS by default
  riscv: show help string for riscv-specific targets
  riscv: make image compression configurable
  riscv: cpufeature: Fix extension subset checking
  riscv: cpufeature: Fix thead vector hwcap removal
  riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
  riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled
  riscv: Define TASK_SIZE_MAX for __access_ok()
  riscv: Remove PGDIR_SIZE_L3 and TASK_SIZE_MIN

3 months agoMerge tag 'for-linus-6.10a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 24 May 2024 17:24:49 +0000 (10:24 -0700)]
Merge tag 'for-linus-6.10a-rc1-tag' of git://git./linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - a small cleanup in the drivers/xen/xenbus Makefile

 - a fix of the Xen xenstore driver to improve connecting to a late
   started Xenstore

 - an enhancement for better support of ballooning in PVH guests

 - a cleanup using try_cmpxchg() instead of open coding it

* tag 'for-linus-6.10a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  drivers/xen: Improve the late XenStore init protocol
  xen/xenbus: Use *-y instead of *-objs in Makefile
  xen/x86: add extra pages to unpopulated-alloc if available
  locking/x86/xen: Use try_cmpxchg() in xen_alloc_p2m_entry()

3 months agoMerge tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Linus Torvalds [Fri, 24 May 2024 16:40:31 +0000 (09:40 -0700)]
Merge tag 'for-6.10-tag' of git://git./linux/kernel/git/kdave/linux

Pull more btrfs updates from David Sterba:
 "A few more updates, mostly stability fixes or user visible changes:

   - fix race in zoned mode during device replace that can lead to
     use-after-free

   - update return codes and lower message levels for quota rescan where
     it's causing false alerts

   - fix unexpected qgroup id reuse under some conditions

   - fix condition when looking up extent refs

   - add option norecovery (removed in 6.8), the intended replacements
     haven't been used and some aplications still rely on the old one

   - build warning fixes"

* tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: re-introduce 'norecovery' mount option
  btrfs: fix end of tree detection when searching for data extent ref
  btrfs: scrub: initialize ret in scrub_simple_mirror() to fix compilation warning
  btrfs: zoned: fix use-after-free due to race with dev replace
  btrfs: qgroup: fix qgroup id collision across mounts
  btrfs: qgroup: update rescan message levels and error codes

3 months agoMerge tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 24 May 2024 16:31:50 +0000 (09:31 -0700)]
Merge tag 'erofs-for-6.10-rc1-2' of git://git./linux/kernel/git/xiang/erofs

Pull more erofs updates from Gao Xiang:
 "The main ones are metadata API conversion to byte offsets by Al Viro.

  Another patch gets rid of unnecessary memory allocation out of DEFLATE
  decompressor. The remaining one is a trivial cleanup.

   - Convert metadata APIs to byte offsets

   - Avoid allocating DEFLATE streams unnecessarily

   - Some erofs_show_options() cleanup"

* tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: avoid allocating DEFLATE streams before mounting
  z_erofs_pcluster_begin(): don't bother with rounding position down
  erofs: don't round offset down for erofs_read_metabuf()
  erofs: don't align offset for erofs_read_metabuf() (simple cases)
  erofs: mechanically convert erofs_read_metabuf() to offsets
  erofs: clean up erofs_show_options()

3 months agoMerge tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs
Linus Torvalds [Fri, 24 May 2024 16:07:22 +0000 (09:07 -0700)]
Merge tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "Nothing exciting, just syzbot fixes (except for the one
  FMODE_CAN_ODIRECT patch).

  Looks like syzbot reports have slowed down; this is all catch up from
  two weeks of conferences.

  Next hardening project is using Thomas's error injection tooling to
  torture test repair"

* tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Fix race path in bch2_inode_insert()
  bcachefs: Ensure we're RW before journalling
  bcachefs: Fix shutdown ordering
  bcachefs: Fix unsafety in bch2_dirent_name_bytes()
  bcachefs: Fix stack oob in __bch2_encrypt_bio()
  bcachefs: Fix btree_trans leak in bch2_readahead()
  bcachefs: Fix bogus verify_replicas_entry() assert
  bcachefs: Check for subvolues with bogus snapshot/inode fields
  bcachefs: bch2_checksum() returns 0 for unknown checksum type
  bcachefs: Fix bch2_alloc_ciphers()
  bcachefs: Add missing guard in bch2_snapshot_has_children()
  bcachefs: Fix missing parens in drop_locks_do()
  bcachefs: Improve bch2_assert_pos_locked()
  bcachefs: Fix shift overflows in replicas.c
  bcachefs: Fix shift overflow in btree_lost_data()
  bcachefs: Fix ref in trans_mark_dev_sbs() error path
  bcachefs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
  bcachefs: Fix rcu splat in check_fix_ptrs()

3 months agoMerge tag 'input-for-v6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 24 May 2024 16:01:21 +0000 (09:01 -0700)]
Merge tag 'input-for-v6.10-rc0' of git://git./linux/kernel/git/dtor/input

Pull input updates from Dmitry Torokhov:

 - a change to input core to trim amount of keys data in modalias string
   in case when a device declares too many keys and they do not fit in
   uevent buffer instead of reporting an error which results in uevent
   not being generated at all

 - support for Machenike G5 Pro Controller added to xpad driver

 - support for FocalTech FT5452 and FT8719 added to edt-ft5x06

 - support for new SPMI vibrator added to pm8xxx-vibrator driver

 - missing locking added to cyapa touchpad driver

 - removal of unused fields in various driver structures

 - explicit initialization of i2c_device_id::driver_data to 0 dropped
   from input drivers

 - other assorted fixes and cleanups.

* tag 'input-for-v6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (24 commits)
  Input: edt-ft5x06 - add support for FocalTech FT5452 and FT8719
  dt-bindings: input: touchscreen: edt-ft5x06: Document FT5452 and FT8719 support
  Input: xpad - add support for Machenike G5 Pro Controller
  Input: try trimming too long modalias strings
  Input: drop explicit initialization of struct i2c_device_id::driver_data to 0
  Input: zet6223 - remove an unused field in struct zet6223_ts
  Input: chipone_icn8505 - remove an unused field in struct icn8505_data
  Input: cros_ec_keyb - remove an unused field in struct cros_ec_keyb
  Input: lpc32xx-keys - remove an unused field in struct lpc32xx_kscan_drv
  Input: matrix_keypad - remove an unused field in struct matrix_keypad
  Input: tca6416-keypad - remove unused struct tca6416_drv_data
  Input: tca6416-keypad - remove an unused field in struct tca6416_keypad_chip
  Input: da7280 - remove an unused field in struct da7280_haptic
  Input: ff-core - prefer struct_size over open coded arithmetic
  Input: cyapa - add missing input core locking to suspend/resume functions
  input: pm8xxx-vibrator: add new SPMI vibrator support
  dt-bindings: input: qcom,pm8xxx-vib: add new SPMI vibrator module
  input: pm8xxx-vibrator: refactor to support new SPMI vibrator
  Input: pm8xxx-vibrator - correct VIB_MAX_LEVELS calculation
  Input: sur40 - convert le16 to cpu before use
  ...

3 months agoMerge tag 'sound-fix-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 24 May 2024 15:48:51 +0000 (08:48 -0700)]
Merge tag 'sound-fix-6.10-rc1' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes for 6.10-rc1. Most of changes are various
  device-specific fixes and quirks, while there are a few small changes
  in ALSA core timer and module / built-in fixes"

* tag 'sound-fix-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: fix mute/micmute LEDs don't work for ProBook 440/460 G11.
  ALSA: core: Enable proc module when CONFIG_MODULES=y
  ALSA: core: Fix NULL module pointer assignment at card init
  ALSA: hda/realtek: Enable headset mic of JP-IK LEAP W502 with ALC897
  ASoC: dt-bindings: stm32: Ensure compatible pattern matches whole string
  ASoC: tas2781: Fix wrong loading calibrated data sequence
  ASoC: tas2552: Add TX path for capturing AUDIO-OUT data
  ALSA: usb-audio: Fix for sampling rates support for Mbox3
  Documentation: sound: Fix trailing whitespaces
  ALSA: timer: Set lower bound of start tick time
  ASoC: codecs: ES8326: solve hp and button detect issue
  ASoC: rt5645: mic-in detection threshold modification
  ASoC: Intel: sof_sdw_rt_sdca_jack_common: Use name_prefix for `-sdca` detection

3 months agoMerge tag 'char-misc-6.10-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 24 May 2024 15:43:25 +0000 (08:43 -0700)]
Merge tag 'char-misc-6.10-rc1-fix' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc fix from Greg KH:
 "Here is one remaining bugfix for 6.10-rc1 that missed the 6.9-final
  merge window, and has been sitting in my tree and linux-next for quite
  a while now, but wasn't sent to you (my fault, travels...)

  It is a bugfix to resolve an error in the speakup code that could
  overflow a buffer.

  It has been in linux-next for a while with no reported problems"

* tag 'char-misc-6.10-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  speakup: Fix sizeof() vs ARRAY_SIZE() bug

3 months agoMerge tag 'tty-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Fri, 24 May 2024 15:38:28 +0000 (08:38 -0700)]
Merge tag 'tty-6.10-rc1-fixes' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are some small TTY and Serial driver fixes that missed the
  6.9-final merge window, but have been in my tree for weeks (my fault,
  travel caused me to miss this)

  These fixes include:

   - more n_gsm fixes for reported problems

   - 8520_mtk driver fix

   - 8250_bcm7271 driver fix

   - sc16is7xx driver fix

  All of these have been in linux-next for weeks without any reported
  problems"

* tag 'tty-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: sc16is7xx: fix bug in sc16is7xx_set_baud() when using prescaler
  serial: 8250_bcm7271: use default_mux_rate if possible
  serial: 8520_mtk: Set RTS on shutdown for Rx in-band wakeup
  tty: n_gsm: fix missing receive state reset after mode switch
  tty: n_gsm: fix possible out-of-bounds in gsm0_receive()

3 months agoMerge tag 'hardening-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 24 May 2024 15:33:44 +0000 (08:33 -0700)]
Merge tag 'hardening-v6.10-rc1-fixes' of git://git./linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:

 - loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module
   decompression (Stephen Boyd)

 - ubsan: Restore dependency on ARCH_HAS_UBSAN

 - kunit/fortify: Fix memcmp() test to be amplitude agnostic

* tag 'hardening-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  kunit/fortify: Fix memcmp() test to be amplitude agnostic
  ubsan: Restore dependency on ARCH_HAS_UBSAN
  loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module decompression

3 months agoMerge tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 24 May 2024 15:27:34 +0000 (08:27 -0700)]
Merge tag 'trace-tracefs-v6.10' of git://git./linux/kernel/git/trace/linux-trace

Pull tracefs/eventfs updates from Steven Rostedt:
 "Bug fixes:

   - The eventfs directories need to have unique inode numbers. Make
     sure that they do not get the default file inode number.

   - Update the inode uid and gid fields on remount.

     When a remount happens where a uid and/or gid is specified, all the
     tracefs files and directories should get the specified uid and/or
     gid. But this can be sporadic when some uids were assigned already.
     There's already a list of inodes that are allocated. Just update
     their uid and gid fields at the time of remount.

   - Update the eventfs_inodes on remount from the top level "events"
     descriptor.

     There was a bug where not all the eventfs files or directories
     where getting updated on remount. One fix was to clear the
     SAVED_UID/GID flags from the inode list during the iteration of the
     inodes during the remount. But because the eventfs inodes can be
     freed when the last referenced is released, not all the
     eventfs_inodes were being updated. This lead to the ownership
     selftest to fail if it was run a second time (the first time would
     leave eventfs_inodes with no corresponding tracefs_inode).

     Instead, for eventfs_inodes, only process the "events"
     eventfs_inode from the list iteration, as it is guaranteed to have
     a tracefs_inode (it's never freed while the "events" directory
     exists). As it has a list of its children, and the children have a
     list of their children, just iterate all the eventfs_inodes from
     the "events" descriptor and it is guaranteed to get all of them.

   - Clear the EVENT_INODE flag from the tracefs_drop_inode() callback.

     Currently the EVENTFS_INODE FLAG is cleared in the tracefs_d_iput()
     callback. But this is the wrong location. The iput() callback is
     called when the last reference to the dentry inode is hit. There
     could be a case where two dentry's have the same inode, and the
     flag will be cleared prematurely. The flag needs to be cleared when
     the last reference of the inode is dropped and that happens in the
     inode's drop_inode() callback handler.

  Cleanups:

   - Consolidate the creation of a tracefs_inode for an eventfs_inode

     A tracefs_inode is created for both files and directories of the
     eventfs system. It is open coded. Instead, consolidate it into a
     single eventfs_get_inode() function call.

   - Remove the eventfs getattr and permission callbacks.

     The permissions for the eventfs files and directories are updated
     when the inodes are created, on remount, and when the user sets
     them (via setattr). The inodes hold the current permissions so
     there is no need to have custom getattr or permissions callbacks as
     they will more likely cause them to be incorrect. The inode's
     permissions are updated when they should be updated. Remove the
     getattr and permissions inode callbacks.

   - Do not update eventfs_inode attributes on creation of inodes.

     The eventfs_inodes attribute field is used to store the permissions
     of the directories and files for when their corresponding inodes
     are freed and are created again. But when the creation of the
     inodes happen, the eventfs_inode attributes are recalculated. The
     recalculation should only happen when the permissions change for a
     given file or directory. Currently, the attribute changes are just
     being set to their current files so this is not a bug, but it's
     unnecessary and error prone. Stop doing that.

   - The events directory inode is created once when the events
     directory is created and deleted when it is deleted. It is now
     updated on remount and when the user changes the permissions.
     There's no need to use the eventfs_inode of the events directory to
     store the events directory permissions. But using it to store the
     default permissions for the files within the directory that have
     not been updated by the user can simplify the code"

* tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Do not use attributes for events directory
  eventfs: Cleanup permissions in creation of inodes
  eventfs: Remove getattr and permission callbacks
  eventfs: Consolidate the eventfs_inode update in eventfs_get_inode()
  tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()
  eventfs: Update all the eventfs_inodes from the events descriptor
  tracefs: Update inode permissions on remount
  eventfs: Keep the directories from having the same inode number as files

3 months agogenirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
dicken.ding [Fri, 24 May 2024 09:17:39 +0000 (17:17 +0800)]
genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()

irq_find_at_or_after() dereferences the interrupt descriptor which is
returned by mt_find() while neither holding sparse_irq_lock nor RCU read
lock, which means the descriptor can be freed between mt_find() and the
dereference:

    CPU0                            CPU1
    desc = mt_find()
                                    delayed_free_desc(desc)
    irq_desc_get_irq(desc)

The use-after-free is reported by KASAN:

    Call trace:
     irq_get_next_irq+0x58/0x84
     show_stat+0x638/0x824
     seq_read_iter+0x158/0x4ec
     proc_reg_read_iter+0x94/0x12c
     vfs_read+0x1e0/0x2c8

    Freed by task 4471:
     slab_free_freelist_hook+0x174/0x1e0
     __kmem_cache_free+0xa4/0x1dc
     kfree+0x64/0x128
     irq_kobj_release+0x28/0x3c
     kobject_put+0xcc/0x1e0
     delayed_free_desc+0x14/0x2c
     rcu_do_batch+0x214/0x720

Guard the access with a RCU read lock section.

Fixes: 721255b9826b ("genirq: Use a maple tree for interrupt descriptor management")
Signed-off-by: dicken.ding <dicken.ding@mediatek.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240524091739.31611-1-dicken.ding@mediatek.com
3 months agofs/ntfs3: Break dir enumeration if directory contents error
Konstantin Komarov [Tue, 23 Apr 2024 14:21:58 +0000 (17:21 +0300)]
fs/ntfs3: Break dir enumeration if directory contents error

If we somehow attempt to read beyond the directory size, an error
is supposed to be returned.

However, in some cases, read requests do not stop and instead enter
into a loop.

To avoid this, we set the position in the directory to the end.

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: stable@vger.kernel.org
3 months agofs/ntfs3: Fix case when index is reused during tree transformation
Konstantin Komarov [Tue, 23 Apr 2024 12:31:56 +0000 (15:31 +0300)]
fs/ntfs3: Fix case when index is reused during tree transformation

In most cases when adding a cluster to the directory index,
they are placed at the end, and in the bitmap, this cluster corresponds
to the last bit. The new directory size is calculated as follows:

data_size = (u64)(bit + 1) << indx->index_bits;

In the case of reusing a non-final cluster from the index,
data_size is calculated incorrectly, resulting in the directory size
differing from the actual size.

A check for cluster reuse has been added, and the size update is skipped.

Fixes: 82cae269cfa95 ("fs/ntfs3: Add initialization of super block")
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: stable@vger.kernel.org
3 months agoselftest mm/mseal read-only elf memory segment
Jeff Xu [Mon, 15 Apr 2024 16:35:24 +0000 (16:35 +0000)]
selftest mm/mseal read-only elf memory segment

Sealing read-only of elf mapping so it can't be changed by mprotect.

[jeffxu@chromium.org: style change]
Link: https://lkml.kernel.org/r/20240416220944.2481203-2-jeffxu@chromium.org
[amer.shanawany@gmail.com: fix linker error for inline function]
Link: https://lkml.kernel.org/r/20240420202346.546444-1-amer.shanawany@gmail.com
[jeffxu@chromium.org: fix compile warning]
Link: https://lkml.kernel.org/r/20240420003515.345982-2-jeffxu@chromium.org
[jeffxu@chromium.org: fix arm build]
Link: https://lkml.kernel.org/r/20240502225331.3806279-2-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-6-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Signed-off-by: Amer Al Shanawany <amer.shanawany@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomseal: add documentation
Jeff Xu [Mon, 15 Apr 2024 16:35:23 +0000 (16:35 +0000)]
mseal: add documentation

Add documentation for mseal().

Link: https://lkml.kernel.org/r/20240415163527.626541-5-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agoselftest mm/mseal memory sealing
Jeff Xu [Mon, 15 Apr 2024 16:35:22 +0000 (16:35 +0000)]
selftest mm/mseal memory sealing

selftest for memory sealing change in mmap() and mseal().

Link: https://lkml.kernel.org/r/20240415163527.626541-4-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomseal: add mseal syscall
Jeff Xu [Mon, 15 Apr 2024 16:35:21 +0000 (16:35 +0000)]
mseal: add mseal syscall

The new mseal() is an syscall on 64 bit CPU, and with following signature:

int mseal(void addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.

mseal() blocks following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size,
   via munmap() and mremap(), can leave an empty space, therefore can
   be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location,
   via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific
   risks to sealed VMAs. It is included anyway because the use case is
   unclear. In any case, users can rely on merging to expand a sealed VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvice() behaviors (e.g. MADV_DONTNEED) for anonymous
   memory, when users don't have write permission to the memory. Those
   behaviors can alter region contents by discarding pages, effectively a
   memset(0) for anonymous memory.

Following input during RFC are incooperated into this patch:

Jann Horn: raising awareness and providing valuable insights on the
destructive madvise operations.
Linus Torvalds: assisting in defining system call signature and scope.
Liam R. Howlett: perf optimization.
Theo de Raadt: sharing the experiences and insight gained from
  implementing mimmutable() in OpenBSD.

Finally, the idea that inspired this patch comes from Stephen Röttger's
work in Chrome V8 CFI.

[jeffxu@chromium.org: add branch prediction hint, per Pedro]
Link: https://lkml.kernel.org/r/20240423192825.1273679-2-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-3-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agomseal: wire up mseal syscall
Jeff Xu [Mon, 15 Apr 2024 16:35:20 +0000 (16:35 +0000)]
mseal: wire up mseal syscall

Patch series "Introduce mseal", v10.

This patchset proposes a new mseal() syscall for the Linux kernel.

In a nutshell, mseal() protects the VMAs of a given virtual memory range
against modifications, such as changes to their permission bits.

Modern CPUs support memory permissions, such as the read/write (RW) and
no-execute (NX) bits.  Linux has supported NX since the release of kernel
version 2.6.8 in August 2004 [1].  The memory permission feature improves
the security stance on memory corruption bugs, as an attacker cannot
simply write to arbitrary memory and point the code to it.  The memory
must be marked with the X bit, or else an exception will occur.
Internally, the kernel maintains the memory permissions in a data
structure called VMA (vm_area_struct).  mseal() additionally protects the
VMA itself against modifications of the selected seal type.

Memory sealing is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system.  For example,
such an attacker primitive can break control-flow integrity guarantees
since read-only memory that is supposed to be trusted can become writable
or .text pages can get remapped.  Memory sealing can automatically be
applied by the runtime loader to seal .text and .rodata pages and
applications can additionally seal security critical data at runtime.  A
similar feature already exists in the XNU kernel with the
VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall
[4].  Also, Chrome wants to adopt this feature for their CFI work [2] and
this patchset has been designed to be compatible with the Chrome use case.

Two system calls are involved in sealing the map:  mmap() and mseal().

The new mseal() is an syscall on 64 bit CPU, and with following signature:

int mseal(void addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.

mseal() blocks following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size,
   via munmap() and mremap(), can leave an empty space, therefore can
   be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location,
   via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific
   risks to sealed VMAs. It is included anyway because the use case is
   unclear. In any case, users can rely on merging to expand a sealed VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvice() behaviors (e.g. MADV_DONTNEED) for anonymous
   memory, when users don't have write permission to the memory. Those
   behaviors can alter region contents by discarding pages, effectively a
   memset(0) for anonymous memory.

The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5].  Chrome browser in ChromeOS will be the first user of this
API.

Indeed, the Chrome browser has very specific requirements for sealing,
which are distinct from those of most applications.  For example, in the
case of libc, sealing is only applied to read-only (RO) or read-execute
(RX) memory segments (such as .text and .RELRO) to prevent them from
becoming writable, the lifetime of those mappings are tied to the lifetime
of the process.

Chrome wants to seal two large address space reservations that are managed
by different allocators.  The memory is mapped RW- and RWX respectively
but write access to it is restricted using pkeys (or in the future ARM
permission overlay extensions).  The lifetime of those mappings are not
tied to the lifetime of the process, therefore, while the memory is
sealed, the allocators still need to free or discard the unused memory.
For example, with madvise(DONTNEED).

However, always allowing madvise(DONTNEED) on this range poses a security
risk.  For example if a jump instruction crosses a page boundary and the
second page gets discarded, it will overwrite the target bytes with zeros
and change the control flow.  Checking write-permission before the discard
operation allows us to control when the operation is valid.  In this case,
the madvise will only succeed if the executing thread has PKEY write
permissions and PKRU changes are protected in software by control-flow
integrity.

Although the initial version of this patch series is targeting the Chrome
browser as its first user, it became evident during upstream discussions
that we would also want to ensure that the patch set eventually is a
complete solution for memory sealing and compatible with other use cases.
The specific scenario currently in mind is glibc's use case of loading and
sealing ELF executables.  To this end, Stephen is working on a change to
glibc to add sealing support to the dynamic linker, which will seal all
non-writable segments at startup.  Once this work is completed, all
applications will be able to automatically benefit from these new
protections.

In closing, I would like to formally acknowledge the valuable
contributions received during the RFC process, which were instrumental in
shaping this patch:

Jann Horn: raising awareness and providing valuable insights on the
  destructive madvise operations.
Liam R. Howlett: perf optimization.
Linus Torvalds: assisting in defining system call signature and scope.
Theo de Raadt: sharing the experiences and insight gained from
  implementing mimmutable() in OpenBSD.

MM perf benchmarks
==================
This patch adds a loop in the mprotect/munmap/madvise(DONTNEED) to
check the VMAs’ sealing flag, so that no partial update can be made,
when any segment within the given memory range is sealed.

To measure the performance impact of this loop, two tests are developed.
[8]

The first is measuring the time taken for a particular system call,
by using clock_gettime(CLOCK_MONOTONIC). The second is using
PERF_COUNT_HW_REF_CPU_CYCLES (exclude user space). Both tests have
similar results.

The tests have roughly below sequence:
for (i = 0; i < 1000, i++)
    create 1000 mappings (1 page per VMA)
    start the sampling
    for (j = 0; j < 1000, j++)
        mprotect one mapping
    stop and save the sample
    delete 1000 mappings
calculates all samples.

Below tests are performed on Intel(R) Pentium(R) Gold 7505 @ 2.00GHz,
4G memory, Chromebook.

Based on the latest upstream code:
The first test (measuring time)
syscall__ vmas t t_mseal delta_ns per_vma %
munmap__   1 909 944 35 35 104%
munmap__   2 1398 1502 104 52 107%
munmap__   4 2444 2594 149 37 106%
munmap__   8 4029 4323 293 37 107%
munmap__   16 6647 6935 288 18 104%
munmap__   32 11811 12398 587 18 105%
mprotect 1 439 465 26 26 106%
mprotect 2 1659 1745 86 43 105%
mprotect 4 3747 3889 142 36 104%
mprotect 8 6755 6969 215 27 103%
mprotect 16 13748 14144 396 25 103%
mprotect 32 27827 28969 1142 36 104%
madvise_ 1 240 262 22 22 109%
madvise_ 2 366 442 76 38 121%
madvise_ 4 623 751 128 32 121%
madvise_ 8 1110 1324 215 27 119%
madvise_ 16 2127 2451 324 20 115%
madvise_ 32 4109 4642 534 17 113%

The second test (measuring cpu cycle)
syscall__ vmas cpu cmseal delta_cpu per_vma %
munmap__ 1 1790 1890 100 100 106%
munmap__ 2 2819 3033 214 107 108%
munmap__ 4 4959 5271 312 78 106%
munmap__ 8 8262 8745 483 60 106%
munmap__ 16 13099 14116 1017 64 108%
munmap__ 32 23221 24785 1565 49 107%
mprotect 1 906 967 62 62 107%
mprotect 2 3019 3203 184 92 106%
mprotect 4 6149 6569 420 105 107%
mprotect 8 9978 10524 545 68 105%
mprotect 16 20448 21427 979 61 105%
mprotect 32 40972 42935 1963 61 105%
madvise_ 1 434 497 63 63 115%
madvise_ 2 752 899 147 74 120%
madvise_ 4 1313 1513 200 50 115%
madvise_ 8 2271 2627 356 44 116%
madvise_ 16 4312 4883 571 36 113%
madvise_ 32 8376 9319 943 29 111%

Based on the result, for 6.8 kernel, sealing check adds
20-40 nano seconds, or around 50-100 CPU cycles, per VMA.

In addition, I applied the sealing to 5.10 kernel:
The first test (measuring time)
syscall__ vmas t tmseal delta_ns per_vma %
munmap__ 1 357 390 33 33 109%
munmap__ 2 442 463 21 11 105%
munmap__ 4 614 634 20 5 103%
munmap__ 8 1017 1137 120 15 112%
munmap__ 16 1889 2153 263 16 114%
munmap__ 32 4109 4088 -21 -1 99%
mprotect 1 235 227 -7 -7 97%
mprotect 2 495 464 -30 -15 94%
mprotect 4 741 764 24 6 103%
mprotect 8 1434 1437 2 0 100%
mprotect 16 2958 2991 33 2 101%
mprotect 32 6431 6608 177 6 103%
madvise_ 1 191 208 16 16 109%
madvise_ 2 300 324 24 12 108%
madvise_ 4 450 473 23 6 105%
madvise_ 8 753 806 53 7 107%
madvise_ 16 1467 1592 125 8 108%
madvise_ 32 2795 3405 610 19 122%

The second test (measuring cpu cycle)
syscall__ nbr_vma cpu cmseal delta_cpu per_vma %
munmap__ 1 684 715 31 31 105%
munmap__ 2 861 898 38 19 104%
munmap__ 4 1183 1235 51 13 104%
munmap__ 8 1999 2045 46 6 102%
munmap__ 16 3839 3816 -23 -1 99%
munmap__ 32 7672 7887 216 7 103%
mprotect 1 397 443 46 46 112%
mprotect 2 738 788 50 25 107%
mprotect 4 1221 1256 35 9 103%
mprotect 8 2356 2429 72 9 103%
mprotect 16 4961 4935 -26 -2 99%
mprotect 32 9882 10172 291 9 103%
madvise_ 1 351 380 29 29 108%
madvise_ 2 565 615 49 25 109%
madvise_ 4 872 933 61 15 107%
madvise_ 8 1508 1640 132 16 109%
madvise_ 16 3078 3323 245 15 108%
madvise_ 32 5893 6704 811 25 114%

For 5.10 kernel, sealing check adds 0-15 ns in time, or 10-30
CPU cycles, there is even decrease in some cases.

It might be interesting to compare 5.10 and 6.8 kernel
The first test (measuring time)
syscall__ vmas t_5_10 t_6_8 delta_ns per_vma %
munmap__ 1 357 909 552 552 254%
munmap__ 2 442 1398 956 478 316%
munmap__ 4 614 2444 1830 458 398%
munmap__ 8 1017 4029 3012 377 396%
munmap__ 16 1889 6647 4758 297 352%
munmap__ 32 4109 11811 7702 241 287%
mprotect 1 235 439 204 204 187%
mprotect 2 495 1659 1164 582 335%
mprotect 4 741 3747 3006 752 506%
mprotect 8 1434 6755 5320 665 471%
mprotect 16 2958 13748 10790 674 465%
mprotect 32 6431 27827 21397 669 433%
madvise_ 1 191 240 49 49 125%
madvise_ 2 300 366 67 33 122%
madvise_ 4 450 623 173 43 138%
madvise_ 8 753 1110 357 45 147%
madvise_ 16 1467 2127 660 41 145%
madvise_ 32 2795 4109 1314 41 147%

The second test (measuring cpu cycle)
syscall__ vmas cpu_5_10 c_6_8 delta_cpu per_vma %
munmap__ 1 684 1790 1106 1106 262%
munmap__ 2 861 2819 1958 979 327%
munmap__ 4 1183 4959 3776 944 419%
munmap__ 8 1999 8262 6263 783 413%
munmap__ 16 3839 13099 9260 579 341%
munmap__ 32 7672 23221 15549 486 303%
mprotect 1 397 906 509 509 228%
mprotect 2 738 3019 2281 1140 409%
mprotect 4 1221 6149 4929 1232 504%
mprotect 8 2356 9978 7622 953 423%
mprotect 16 4961 20448 15487 968 412%
mprotect 32 9882 40972 31091 972 415%
madvise_ 1 351 434 82 82 123%
madvise_ 2 565 752 186 93 133%
madvise_ 4 872 1313 442 110 151%
madvise_ 8 1508 2271 763 95 151%
madvise_ 16 3078 4312 1234 77 140%
madvise_ 32 5893 8376 2483 78 142%

From 5.10 to 6.8
munmap: added 250-550 ns in time, or 500-1100 in cpu cycle, per vma.
mprotect: added 200-750 ns in time, or 500-1200 in cpu cycle, per vma.
madvise: added 33-50 ns in time, or 70-110 in cpu cycle, per vma.

In comparison to mseal, which adds 20-40 ns or 50-100 CPU cycles, the
increase from 5.10 to 6.8 is significantly larger, approximately ten times
greater for munmap and mprotect.

When I discuss the mm performance with Brian Makin, an engineer who worked
on performance, it was brought to my attention that such performance
benchmarks, which measuring millions of mm syscall in a tight loop, may
not accurately reflect real-world scenarios, such as that of a database
service.  Also this is tested using a single HW and ChromeOS, the data
from another HW or distribution might be different.  It might be best to
take this data with a grain of salt.

This patch (of 5):

Wire up mseal syscall for all architectures.

Link: https://lkml.kernel.org/r/20240415163527.626541-1-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-2-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com> [Bug #2]
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 months agoMerge tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Thu, 23 May 2024 20:51:09 +0000 (13:51 -0700)]
Merge tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Stable fixes:
   - nfs: fix undefined behavior in nfs_block_bits()
   - NFSv4.2: Fix READ_PLUS when server doesn't support OP_READ_PLUS

  Bugfixes:
   - Fix mixing of the lock/nolock and local_lock mount options
   - NFSv4: Fixup smatch warning for ambiguous return
   - NFSv3: Fix remount when using the legacy binary mount api
   - SUNRPC: Fix the handling of expired RPCSEC_GSS contexts
   - SUNRPC: fix the NFSACL RPC retries when soft mounts are enabled
   - rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL

  Features and cleanups:
   - NFSv3: Use the atomic_open API to fix open(O_CREAT|O_TRUNC)
   - pNFS/filelayout: S layout segment range in LAYOUTGET
   - pNFS: rework pnfs_generic_pg_check_layout to check IO range
   - NFSv2: Turn off enabling of NFS v2 by default"

* tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: fix undefined behavior in nfs_block_bits()
  pNFS: rework pnfs_generic_pg_check_layout to check IO range
  pNFS/filelayout: check layout segment range
  pNFS/filelayout: fixup pNfs allocation modes
  rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
  NFS: Don't enable NFS v2 by default
  NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS
  sunrpc: fix NFSACL RPC retry on soft mount
  SUNRPC: fix handling expired GSS context
  nfs: keep server info for remounts
  NFSv4: Fixup smatch warning for ambiguous return
  NFS: make sure lock/nolock overriding local_lock mount option
  NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.
  pNFS/filelayout: Specify the layout segment range in LAYOUTGET
  pNFS/filelayout: Remove the whole file layout requirement

3 months agoMerge tag 'block-6.10-20240523' of git://git.kernel.dk/linux
Linus Torvalds [Thu, 23 May 2024 20:44:47 +0000 (13:44 -0700)]
Merge tag 'block-6.10-20240523' of git://git.kernel.dk/linux

Pull more block updates from Jens Axboe:
 "Followup block updates, mostly due to NVMe being a bit late to the
  party. But nothing major in there, so not a big deal.

  In detail, this contains:

   - NVMe pull request via Keith:
       - Fabrics connection retries (Daniel, Hannes)
       - Fabrics logging enhancements (Tokunori)
       - RDMA delete optimization (Sagi)

   - ublk DMA alignment fix (me)

   - null_blk sparse warning fixes (Bart)

   - Discard support for brd (Keith)

   - blk-cgroup list corruption fixes (Ming)

   - blk-cgroup stat propagation fix (Waiman)

   - Regression fix for plugging stall with md (Yu)

   - Misc fixes or cleanups (David, Jeff, Justin)"

* tag 'block-6.10-20240523' of git://git.kernel.dk/linux: (24 commits)
  null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues'
  blk-throttle: remove unused struct 'avg_latency_bucket'
  block: fix lost bio for plug enabled bio based device
  block: t10-pi: add MODULE_DESCRIPTION()
  blk-mq: add helper for checking if one CPU is mapped to specified hctx
  blk-cgroup: Properly propagate the iostat update up the hierarchy
  blk-cgroup: fix list corruption from reorder of WRITE ->lqueued
  blk-cgroup: fix list corruption from resetting io stat
  cdrom: rearrange last_media_change check to avoid unintentional overflow
  nbd: Fix signal handling
  nbd: Remove a local variable from nbd_send_cmd()
  nbd: Improve the documentation of the locking assumptions
  nbd: Remove superfluous casts
  nbd: Use NULL to represent a pointer
  brd: implement discard support
  null_blk: Fix two sparse warnings
  ublk_drv: set DMA alignment mask to 3
  nvme-rdma, nvme-tcp: include max reconnects for reconnect logging
  nvmet-rdma: Avoid o(n^2) loop in delete_ctrl
  nvme: do not retry authentication failures
  ...

3 months agoMerge tag 'io_uring-6.10-20240523' of git://git.kernel.dk/linux
Linus Torvalds [Thu, 23 May 2024 20:41:49 +0000 (13:41 -0700)]
Merge tag 'io_uring-6.10-20240523' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:
 "Single fix here for a regression in 6.9, and then a simple cleanup
  removing some dead code"

* tag 'io_uring-6.10-20240523' of git://git.kernel.dk/linux:
  io_uring: remove checks for NULL 'sq_offset'
  io_uring/sqpoll: ensure that normal task_work is also run timely