Dave Airlie [Thu, 19 Jun 2025 23:01:21 +0000 (09:01 +1000)]
Merge tag 'drm-xe-fixes-2025-06-19' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- A workaround update (Vinay)
- Fix memset on iomem (Lucas)
- Fix early wedge on GuC Load failure (Daniele)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/aFQ03kNzhbiNK7gW@fedora
Dave Airlie [Thu, 19 Jun 2025 22:57:10 +0000 (08:57 +1000)]
Merge tag 'drm-misc-fixes-2025-06-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.16-rc3:
- vivante scheduler fix.
- v3d null pointer crash fix.
- fix backlight, booting GSP-RM, and potential integer shift overflow in nouveau.
- fix compiler warnings about unused linux/export.h
- fix malidp unknown modifier spam.
- fix for ssd130x.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/d44bab7b-01f8-45a8-a7f4-5d3d563d2f9d@linux.intel.com
Daniele Ceraolo Spurio [Wed, 11 Jun 2025 21:44:54 +0000 (14:44 -0700)]
drm/xe: Fix early wedge on GuC load failure
When the GuC fails to load we declare the device wedged. However, the
very first GuC load attempt on GT0 (from xe_gt_init_hwconfig) is done
before the GT1 GuC objects are initialized, so things go bad when the
wedge code attempts to cleanup GT1. To fix this, check the initialization
status in the functions called during wedge.
Fixes:
7dbe8af13c18 ("drm/xe: Wedge the entire device")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: stable@vger.kernel.org # v6.12+: 1e1981b16bb1: drm/xe: Fix taking invalid lock on wedge
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250611214453.1159846-2-daniele.ceraolospurio@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit
0b93b7dcd9eb888a6ac7546560877705d4ad61bf)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Lucas De Marchi [Thu, 12 Jun 2025 22:14:12 +0000 (15:14 -0700)]
drm/xe: Fix memset on iomem
It should rather use xe_map_memset() as the BO is created with
XE_BO_FLAG_VRAM_IF_DGFX in xe_guc_pc_init().
Fixes:
dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250612-vmap-vaddr-v1-1-26238ed443eb@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit
21cf47d89fba353b2d5915ba4718040c4cb955d3)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Vinay Belgaumkar [Thu, 12 Jun 2025 07:09:01 +0000 (00:09 -0700)]
drm/xe/bmg: Update Wa_16023588340
This allows for additional L2 caching modes.
Fixes:
01570b446939 ("drm/xe/bmg: implement Wa_16023588340")
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250612-wa-14022085890-v4-2-94ba5dcc1e30@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit
6ab42fa03d4c88a0ddf5e56e62794853b198e7bf)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Dave Airlie [Thu, 19 Jun 2025 07:29:09 +0000 (17:29 +1000)]
Merge tag 'amd-drm-fixes-6.16-2025-06-18' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.16-2025-06-18:
amdgpu:
- DP tunneling fix
- LTTPR fix
- DSC fix
- DML2.x ABGR16161616 fix
- RMCM fix
- Backlight fixes
- GFX11 kicker support
- SDMA reset fixes
- VCN 5.0.1 fix
- Reset fix
- Misc small fixes
amdkfd:
- SDMA reset fix
- Fix race in GWS scheduling
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250618203115.1533451-1-alexander.deucher@amd.com
Dave Airlie [Thu, 19 Jun 2025 04:18:54 +0000 (14:18 +1000)]
Merge tag 'drm-intel-fixes-2025-06-18' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Fix MIPI vtotal programming off by one on Broxton
- Fix PMU code for GCOV and AutoFDO enabled build
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://lore.kernel.org/r/aFJfykDpUwtmpilE@jlahtine-mobl
Alex Deucher [Mon, 16 Jun 2025 19:47:13 +0000 (15:47 -0400)]
drm/amdgpu/sdma5.2: init engine reset mutex
Missing the mutex init.
Fixes:
47454f2dc0bf ("drm/amdgpu: Register the new sdma function pointers for sdma_v5_2")
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
ea685ff30a51a25dd9be90786933ada49a088f65)
Jay Cornwall [Wed, 11 Jun 2025 14:52:14 +0000 (09:52 -0500)]
drm/amdkfd: Fix race in GWS queue scheduling
q->gws is not updated atomically with qpd->mapped_gws_queue. If a
runlist is created between pqm_set_gws and update_queue it will
contain a queue which uses GWS in a process with no GWS allocated.
This will result in a scheduler hang.
Use q->properties.is_gws which is changed while holding the DQM lock.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
b98370220eb3110e82248e3354e16a489a492cfb)
Cc: stable@vger.kernel.org
Alex Deucher [Mon, 16 Jun 2025 19:43:55 +0000 (15:43 -0400)]
drm/amdgpu/sdma5: init engine reset mutex
Missing the mutex init.
Fixes:
e56d4bf57fab ("drm/amdgpu/: drm/amdgpu: Register the new sdma function pointers for sdma_v5_0")
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
3f4caf092f02f0de169c6122639af481c7edc8f9)
Alex Deucher [Mon, 2 Jun 2025 15:31:52 +0000 (11:31 -0400)]
drm/amdgpu: switch job hw_fence to amdgpu_fence
Use the amdgpu fence container so we can store additional
data in the fence. This also fixes the start_time handling
for MCBP since we were casting the fence to an amdgpu_fence
and it wasn't.
Fixes:
3f4c175d62d8 ("drm/amdgpu: MCBP based on DRM scheduler (v9)")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
bf1cd14f9e2e1fdf981eed273ddd595863f5288c)
Cc: stable@vger.kernel.org
Jesse Zhang [Mon, 16 Jun 2025 11:21:41 +0000 (19:21 +0800)]
drm/amdgpu: Fix SDMA UTC_L1 handling during start/stop sequences
This commit makes two key fixes to SDMA v4.4.2 handling:
1. disable UTC_L1 in sdma_cntl register when stopping SDMA engines
by reading the current value before modifying UTC_L1_ENABLE bit.
2. Ensure UTC_L1_ENABLE is consistently managed by:
- Adding the missing register write when enabling UTC_L1 during start
- Keeping UTC_L1 enabled by default as per hardware requirements
v2: Correct SDMA_CNTL setting (Philip)
Suggested-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
375bf564654e85a7b1b0657b191645b3edca1bda)
Cc: stable@vger.kernel.org
Lijo Lazar [Fri, 6 Jun 2025 04:59:28 +0000 (10:29 +0530)]
drm/amdgpu: Release reset locks during failures
Make sure to release reset domain lock in case of failures.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Fixes:
11bb33766f66 ("drm/amdgpu: refactor amdgpu_device_gpu_recover")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
1ab11a82681eb33a66f423216cb063e7f40c6f85)
Alex Hung [Wed, 4 Jun 2025 00:30:55 +0000 (18:30 -0600)]
drm/amd/display: Check dce_hwseq before dereferencing it
[WHAT]
hws was checked for null earlier in dce110_blank_stream, indicating hws
can be null, and should be checked whenever it is used.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
79db43611ff61280b6de58ce1305e0b2ecf675ad)
Cc: stable@vger.kernel.org
Sonny Jiang [Thu, 12 Jun 2025 15:01:08 +0000 (11:01 -0400)]
drm/amdgpu: VCN v5_0_1 to prevent FW checking RB during DPG pause
Add a protection to ensure programming are all complete prior VCPU
starting. This is a WA for an unintended VCPU running.
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
c29521b529fa5e225feaf709d863a636ca0cbbfa)
Cc: stable@vger.kernel.org
Jesse Zhang [Wed, 11 Jun 2025 07:07:11 +0000 (15:07 +0800)]
drm/amdgpu: Use logical instance ID for SDMA v4_4_2 queue operations
Simplify SDMA v4_4_2 queue reset and stop operations by:
1. Removing GET_INST(SDMA0) conversion for ring->me
2. Using the logical instance ID (ring->me) directly
3. Maintaining consistent behavior with other SDMA queue operations
This change aligns with the existing queue handling logic where
ring->me already represents the correct instance identifier.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
3bab282dfe25dff7a55add432f56898505a6cc6c)
Cc: stable@vger.kernel.org
Jesse Zhang [Wed, 11 Jun 2025 07:02:09 +0000 (15:02 +0800)]
drm/amdgpu: Fix SDMA engine reset with logical instance ID
This commit makes the following improvements to SDMA engine reset handling:
1. Clarifies in the function documentation that instance_id refers to a logical ID
2. Adds conversion from logical to physical instance ID before performing reset
using GET_INST(SDMA0, instance_id) macro
3. Improves error messaging to indicate when a logical instance reset fails
4. Adds better code organization with blank lines for readability
The change ensures proper SDMA engine reset by using the correct physical
instance ID while maintaining the logical ID interface for callers.
V2: Remove harvest_config check and convert directly to physical instance (Lijo)
Suggested-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
5efa6217c239ed1ceec0f0414f9b6f6927387dfc)
Cc: stable@vger.kernel.org
Frank Min [Wed, 4 Jun 2025 13:17:05 +0000 (21:17 +0800)]
drm/amdgpu: add kicker fws loading for gfx11/smu13/psp13
1. Add kicker firmwares loading for gfx11/smu13/psp13
2. Register additional MODULE_FIRMWARE entries for kicker fws
- gc_11_0_0_rlc_kicker.bin
- gc_11_0_0_imu_kicker.bin
- psp_13_0_0_sos_kicker.bin
- psp_13_0_0_ta_kicker.bin
- smu_13_0_0_kicker.bin
Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
fb5ec2174d70a8989bc207d257db90ffeca3b163)
Cc: stable@vger.kernel.org
Frank Min [Wed, 4 Jun 2025 13:00:44 +0000 (21:00 +0800)]
drm/amdgpu: Add kicker device detection
1. add kicker device list
2. add kicker device checking helper function
Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
09aa2b408f4ab689c3541d22b0968de0392ee406)
Cc: stable@vger.kernel.org
Mario Limonciello [Thu, 29 May 2025 14:46:32 +0000 (09:46 -0500)]
drm/amd/display: Export full brightness range to userspace
[WHY]
Userspace currently is offered a range from 0-0xFF but the PWM is
programmed from 0-0xFFFF. This can be limiting to some software
that wants to apply greater granularity.
[HOW]
Convert internally to firmware values only when mapping custom
brightness curves because these are in 0-0xFF range. Advertise full
PWM range to userspace.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
8dbd72cb790058ce52279af38a43c2b302fdd3e5)
Cc: stable@vger.kernel.org
Mario Limonciello [Thu, 29 May 2025 16:33:44 +0000 (11:33 -0500)]
drm/amd/display: Only read ACPI backlight caps once
[WHY]
Backlight caps are read already in amdgpu_dm_update_backlight_caps().
They may be updated by update_connector_ext_caps(). Reading again when
registering backlight device may cause wrong values to be used.
[HOW]
Use backlight caps already registered to the dm.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
148144f6d2f14b02eaaa39b86bbe023cbff350bd)
Cc: stable@vger.kernel.org
Yihan Zhu [Tue, 27 May 2025 20:47:40 +0000 (16:47 -0400)]
drm/amd/display: Fix RMCM programming seq errors
[WHY & HOW]
Fix RMCM programming sequence errors and mapping issues to pass the RMCM
test.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
11baa4975025033547f45f5894087a0dda6efbb8)
Cc: stable@vger.kernel.org
Alex Hung [Thu, 29 May 2025 16:59:19 +0000 (10:59 -0600)]
drm/amd/display: Fix mpv playback corruption on weston
[WHAT]
Severe video playback corruption is observed in the following setup:
weston 14.0.90 (built from source) + mpv v0.40.0 with command:
mpv bbb_sunflower_1080p_60fps_normal.mp4 --vo=gpu
[HOW]
ABGR16161616 needs to be included in dml2/2.1 translation.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Austin Zheng <austin.zheng@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
d023de809f85307ca819a9dbbceee6ae1f50e2ad)
Cc: stable@vger.kernel.org
Nicholas Kazlauskas [Wed, 21 May 2025 20:40:25 +0000 (16:40 -0400)]
drm/amd/display: Add more checks for DSC / HUBP ONO guarantees
[WHY]
For non-zero DSC instances it's possible that the HUBP domain required
to drive it for sequential ONO ASICs isn't met, potentially causing
the logic to the tile to enter an undefined state leading to a system
hang.
[HOW]
Add more checks to ensure that the HUBP domain matching the DSC instance
is appropriately powered.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Duncan Ma <duncan.ma@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
da63df07112e5a9857a8d2aaa04255c4206754ec)
Cc: stable@vger.kernel.org
Michael Strauss [Wed, 26 Feb 2025 15:03:48 +0000 (10:03 -0500)]
drm/amd/display: Get LTTPR IEEE OUI/Device ID From Closest LTTPR To Host
[WHY]
These fields are read for the explicit purpose of detecting embedded LTTPRs
(i.e. between host ASIC and the user-facing port), and thus need to
calculate the correct DPCD address offset based on LTTPR count to target
the appropriate LTTPR's DPCD register space with these queries.
[HOW]
Cascaded LTTPRs in a link each snoop and increment LTTPR count when queried
via DPCD read, so an LTTPR embedded in a source device (e.g. USB4 port on a
laptop) will always be addressible using the max LTTPR count seen by the
host. Therefore we simply need to use a recently added helper function to
calculate the correct DPCD address to target potentially embedded LTTPRs
based on the received LTTPR count.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
791897f5c77a2a65d0e500be4743af2ddf6eb061)
Cc: stable@vger.kernel.org
Peichen Huang [Mon, 26 May 2025 08:04:10 +0000 (16:04 +0800)]
drm/amd/display: Add dc cap for dp tunneling
[WHAT]
1. add dc cap for dp tunneling
2. add function to get index of host router
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Cruise Hung <cruise.hung@amd.com>
Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
29e178d13979cf6fdb42c5fe2dfec2da2306c4ad)
Cc: stable@vger.kernel.org
Jesse.Zhang [Thu, 29 May 2025 03:27:37 +0000 (11:27 +0800)]
drm/amdkfd: move SDMA queue reset capability check to node_show
Relocate the per-SDMA queue reset capability check from
kfd_topology_set_capabilities() to node_show() to ensure we read the
latest value of sdma.supported_reset after all IP blocks are initialized.
Fixes:
ceb7114c961b ("drm/amdkfd: flag per-sdma queue reset supported to user space")
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
e17df7b086cf908cedd919f448da9e00028419bb)
Dave Airlie [Wed, 18 Jun 2025 01:37:14 +0000 (11:37 +1000)]
Merge tag 'drm-msm-fixes-2025-06-16' of https://gitlab.freedesktop.org/drm/msm into drm-fixes
Fixes for v6.16-rc3
Display:
- Fixed DP output on SDM845
- Fixed 10nm DSI PLL init
GPU:
- SUBMIT ioctl error path leak fixes
- drm half of stall-on-fault fixes. Note there is a soft dependency,
to get correct mmu fault devcoredumps, on arm-smmu changes which
are not in this branch, but have already been merged by Linus. So
by the time Linus merges this, everything should be peachy.
- a7xx: Missing CP_RESET_CONTEXT_STATE
- Skip GPU component bind if GPU is not in the device table.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <rob.clark@oss.qualcomm.com>
Link: https://lore.kernel.org/r/CACSVV03=OH74ip8O1xqb8RJWGyM4HFuUnWuR=p3zJR+-ko_AJA@mail.gmail.com
Maíra Canal [Mon, 2 Jun 2025 13:22:16 +0000 (10:22 -0300)]
drm/etnaviv: Protect the scheduler's pending list with its lock
Commit
704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still
active") ensured that active jobs are returned to the pending list when
extending the timeout. However, it didn't use the pending list's lock to
manipulate the list, which causes a race condition as the scheduler's
workqueues are running.
Hold the lock while manipulating the scheduler's pending list to prevent
a race.
Cc: stable@vger.kernel.org
Fixes:
704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still active")
Reported-by: Philipp Stanner <phasta@kernel.org>
Closes: https://lore.kernel.org/dri-devel/
964e59ba1539083ef29b06d3c78f5e2e9b138ab8.camel@mailbox.org/
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250602132240.93314-1-mcanal@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Maíra Canal [Mon, 2 Jun 2025 15:14:02 +0000 (12:14 -0300)]
drm/v3d: Avoid NULL pointer dereference in `v3d_job_update_stats()`
The following kernel Oops was recently reported by Mesa CI:
[ 800.139824] Unable to handle kernel NULL pointer dereference at virtual address
0000000000000588
[ 800.148619] Mem abort info:
[ 800.151402] ESR = 0x0000000096000005
[ 800.155141] EC = 0x25: DABT (current EL), IL = 32 bits
[ 800.160444] SET = 0, FnV = 0
[ 800.163488] EA = 0, S1PTW = 0
[ 800.166619] FSC = 0x05: level 1 translation fault
[ 800.171487] Data abort info:
[ 800.174357] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[ 800.179832] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 800.184873] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 800.190176] user pgtable: 4k pages, 39-bit VAs, pgdp=
00000001014c2000
[ 800.196607] [
0000000000000588] pgd=
0000000000000000, p4d=
0000000000000000, pud=
0000000000000000
[ 800.205305] Internal error: Oops:
0000000096000005 [#1] PREEMPT SMP
[ 800.211564] Modules linked in: vc4 snd_soc_hdmi_codec drm_display_helper v3d cec gpu_sched drm_dma_helper drm_shmem_helper drm_kms_helper drm drm_panel_orientation_quirks snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm i2c_brcmstb snd_timer snd backlight
[ 800.234448] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.25+rpt-rpi-v8 #1 Debian 1:6.12.25-1+rpt1
[ 800.244182] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 800.250005] pstate:
600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 800.256959] pc : v3d_job_update_stats+0x60/0x130 [v3d]
[ 800.262112] lr : v3d_job_update_stats+0x48/0x130 [v3d]
[ 800.267251] sp :
ffffffc080003e60
[ 800.270555] x29:
ffffffc080003e60 x28:
ffffffd842784980 x27:
0224012000000000
[ 800.277687] x26:
ffffffd84277f630 x25:
ffffff81012fd800 x24:
0000000000000020
[ 800.284818] x23:
ffffff8040238b08 x22:
0000000000000570 x21:
0000000000000158
[ 800.291948] x20:
0000000000000000 x19:
ffffff8040238000 x18:
0000000000000000
[ 800.299078] x17:
ffffffa8c1bd2000 x16:
ffffffc080000000 x15:
0000000000000000
[ 800.306208] x14:
0000000000000000 x13:
0000000000000000 x12:
0000000000000000
[ 800.313338] x11:
0000000000000040 x10:
0000000000001a40 x9 :
ffffffd83b39757c
[ 800.320468] x8 :
ffffffd842786420 x7 :
7fffffffffffffff x6 :
0000000000ef32b0
[ 800.327598] x5 :
00ffffffffffffff x4 :
0000000000000015 x3 :
ffffffd842784980
[ 800.334728] x2 :
0000000000000004 x1 :
0000000000010002 x0 :
000000ba4c0ca382
[ 800.341859] Call trace:
[ 800.344294] v3d_job_update_stats+0x60/0x130 [v3d]
[ 800.349086] v3d_irq+0x124/0x2e0 [v3d]
[ 800.352835] __handle_irq_event_percpu+0x58/0x218
[ 800.357539] handle_irq_event+0x54/0xb8
[ 800.361369] handle_fasteoi_irq+0xac/0x240
[ 800.365458] handle_irq_desc+0x48/0x68
[ 800.369200] generic_handle_domain_irq+0x24/0x38
[ 800.373810] gic_handle_irq+0x48/0xd8
[ 800.377464] call_on_irq_stack+0x24/0x58
[ 800.381379] do_interrupt_handler+0x88/0x98
[ 800.385554] el1_interrupt+0x34/0x68
[ 800.389123] el1h_64_irq_handler+0x18/0x28
[ 800.393211] el1h_64_irq+0x64/0x68
[ 800.396603] default_idle_call+0x3c/0x168
[ 800.400606] do_idle+0x1fc/0x230
[ 800.403827] cpu_startup_entry+0x40/0x50
[ 800.407742] rest_init+0xe4/0xf0
[ 800.410962] start_kernel+0x5e8/0x790
[ 800.414616] __primary_switched+0x80/0x90
[ 800.418622] Code:
8b170277 8b160296 11000421 b9000861 (
b9401ac1)
[ 800.424707] ---[ end trace
0000000000000000 ]---
[ 800.457313] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
This issue happens when the file descriptor is closed before the jobs
submitted by it are completed. When the job completes, we update the
global GPU stats and the per-fd GPU stats, which are exposed through
fdinfo. If the file descriptor was closed, then the struct `v3d_file_priv`
and its stats were already freed and we can't update the per-fd stats.
Therefore, if the file descriptor was already closed, don't update the
per-fd GPU stats, only update the global ones.
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Link: https://lore.kernel.org/r/20250602151451.10161-1-mcanal@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Ville Syrjälä [Fri, 14 Mar 2025 15:01:34 +0000 (17:01 +0200)]
drm/i915/dsi: Fix off by one in BXT_MIPI_TRANS_VTOTAL
BXT_MIPI_TRANS_VTOTAL must be programmed with vtotal-1
instead of vtotal. Make it so.
Cc: stable@vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250314150136.22564-1-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit
7b3685c9b38c3097f465efec8b24dbed63258cf6)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Tzung-Bi Shih [Thu, 12 Jun 2025 08:30:23 +0000 (08:30 +0000)]
drm/i915/pmu: Fix build error with GCOV and AutoFDO enabled
i915_pmu.c may fail to build with GCOV and AutoFDO enabled.
../drivers/gpu/drm/i915/i915_pmu.c:116:3: error: call to '__compiletime_assert_487' declared with 'error' attribute: BUILD_BUG_ON failed: bit > BITS_PER_TYPE(typeof_member(struct i915_pmu, enable)) - 1
116 | BUILD_BUG_ON(bit >
| ^
Here is a way to reproduce the issue:
$ git checkout v6.15
$ mkdir build
$ ./scripts/kconfig/merge_config.sh -O build -n -m <(cat <<EOF
CONFIG_DRM=y
CONFIG_PCI=y
CONFIG_DRM_I915=y
CONFIG_PERF_EVENTS=y
CONFIG_DEBUG_FS=y
CONFIG_GCOV_KERNEL=y
CONFIG_GCOV_PROFILE_ALL=y
CONFIG_AUTOFDO_CLANG=y
EOF
)
$ PATH=${PATH}:${HOME}/llvm-20.1.5-x86_64/bin make LLVM=1 O=build \
olddefconfig
$ PATH=${PATH}:${HOME}/llvm-20.1.5-x86_64/bin make LLVM=1 O=build \
CLANG_AUTOFDO_PROFILE=...PATH_TO_SOME_AFDO_PROFILE... \
drivers/gpu/drm/i915/i915_pmu.o
Although not super sure what happened, by reviewing the code, it should
depend on `__builtin_constant_p(bit)` directly instead of assuming
`__builtin_constant_p(config)` makes `bit` a builtin constant.
Also fix a nit, to reuse the `bit` local variable.
Fixes:
a644fde77ff7 ("drm/i915/pmu: Change bitmask of enabled events to u32")
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://lore.kernel.org/r/20250612083023.562585-1-tzungbi@kernel.org
(cherry picked from commit
686d773186bf72b739bab7e12eb8665d914676ee)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Linus Torvalds [Sun, 15 Jun 2025 20:49:41 +0000 (13:49 -0700)]
Linux 6.16-rc2
Linus Torvalds [Sun, 15 Jun 2025 16:14:27 +0000 (09:14 -0700)]
Merge tag 'kbuild-fixes-v6.16' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Move warnings about linux/export.h from W=1 to W=2
- Fix structure type overrides in gendwarfksyms
* tag 'kbuild-fixes-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
gendwarfksyms: Fix structure type overrides
kbuild: move warnings about linux/export.h from W=1 to W=2
Sami Tolvanen [Sat, 14 Jun 2025 00:55:33 +0000 (00:55 +0000)]
gendwarfksyms: Fix structure type overrides
As we always iterate through the entire die_map when expanding
type strings, recursively processing referenced types in
type_expand_child() is not actually necessary. Furthermore,
the type_string kABI rule added in commit
c9083467f7b9
("gendwarfksyms: Add a kABI rule to override type strings") can
fail to override type strings for structures due to a missing
kabi_get_type_string() check in this function.
Fix the issue by dropping the unnecessary recursion and moving
the override check to type_expand(). Note that symbol versions
are otherwise unchanged with this patch.
Fixes:
c9083467f7b9 ("gendwarfksyms: Add a kABI rule to override type strings")
Reported-by: Giuliano Procida <gprocida@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Masahiro Yamada [Thu, 12 Jun 2025 16:08:48 +0000 (01:08 +0900)]
kbuild: move warnings about linux/export.h from W=1 to W=2
This hides excessive warnings, as nobody builds with W=2.
Fixes:
a934a57a42f6 ("scripts/misc-check: check missing #include <linux/export.h> when W=1")
Fixes:
7d95680d64ac ("scripts/misc-check: check unnecessary #include <linux/export.h> when W=1")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Linus Torvalds [Sat, 14 Jun 2025 17:13:32 +0000 (10:13 -0700)]
Merge tag 'v6.16-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- SMB3.1.1 POSIX extensions fix for char remapping
- Fix for repeated directory listings when directory leases enabled
- deferred close handle reuse fix
* tag 'v6.16-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: improve directory cache reuse for readdir operations
smb: client: fix perf regression with deferred closes
smb: client: disable path remapping with POSIX extensions
Linus Torvalds [Sat, 14 Jun 2025 17:01:47 +0000 (10:01 -0700)]
Merge tag 'iommu-fixes-v6.16-rc1' of git://git./linux/kernel/git/iommu/linux
Pull iommu fix from Joerg Roedel:
- Fix PTE size calculation for NVidia Tegra
* tag 'iommu-fixes-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu/tegra: Fix incorrect size calculation
Linus Torvalds [Sat, 14 Jun 2025 16:25:22 +0000 (09:25 -0700)]
Merge tag 'block-6.16-
20250614' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Fix for a deadlock on queue freeze with zoned writes
- Fix for zoned append emulation
- Two bio folio fixes, for sparsemem and for very large folios
- Fix for a performance regression introduced in 6.13 when plug
insertion was changed
- Fix for NVMe passthrough handling for polled IO
- Document the ublk auto registration feature
- loop lockdep warning fix
* tag 'block-6.16-
20250614' of git://git.kernel.dk/linux:
nvme: always punt polled uring_cmd end_io work to task_work
Documentation: ublk: Separate UBLK_F_AUTO_BUF_REG fallback behavior sublists
block: Fix bvec_set_folio() for very large folios
bio: Fix bio_first_folio() for SPARSEMEM without VMEMMAP
block: use plug request list tail for one-shot backmerge attempt
block: don't use submit_bio_noacct_nocheck in blk_zone_wplug_bio_work
block: Clear BIO_EMULATES_ZONE_APPEND flag on BIO completion
ublk: document auto buffer registration(UBLK_F_AUTO_BUF_REG)
loop: move lo_set_size() out of queue freeze
Linus Torvalds [Sat, 14 Jun 2025 15:44:54 +0000 (08:44 -0700)]
Merge tag 'io_uring-6.16-
20250614' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Fix for a race between SQPOLL exit and fdinfo reading.
It's slim and I was only able to reproduce this with an artificial
delay in the kernel. Followup sparse fix as well to unify the access
to ->thread.
- Fix for multiple buffer peeking, avoiding truncation if possible.
- Run local task_work for IOPOLL reaping when the ring is exiting.
This currently isn't done due to an assumption that polled IO will
never need task_work, but a fix on the block side is going to change
that.
* tag 'io_uring-6.16-
20250614' of git://git.kernel.dk/linux:
io_uring: run local task_work from ring exit IOPOLL reaping
io_uring/kbuf: don't truncate end buffer for multiple buffer peeks
io_uring: consistently use rcu semantics with sqpoll thread
io_uring: fix use-after-free of sq->thread in __io_uring_show_fdinfo()
Linus Torvalds [Sat, 14 Jun 2025 15:38:34 +0000 (08:38 -0700)]
Merge tag 'rust-fixes-6.16' of git://git./linux/kernel/git/ojeda/linux
Pull Rust fix from Miguel Ojeda:
- 'hrtimer': fix future compile error when the 'impl_has_hr_timer!'
macro starts to get called
* tag 'rust-fixes-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: time: Fix compile error in impl_has_hr_timer macro
Linus Torvalds [Sat, 14 Jun 2025 15:18:09 +0000 (08:18 -0700)]
Merge tag 'mm-hotfixes-stable-2025-06-13-21-56' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"9 hotfixes. 3 are cc:stable and the remainder address post-6.15 issues
or aren't considered necessary for -stable kernels. Only 4 are for MM"
* tag 'mm-hotfixes-stable-2025-06-13-21-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: add mmap_prepare() compatibility layer for nested file systems
init: fix build warnings about export.h
MAINTAINERS: add Barry as a THP reviewer
drivers/rapidio/rio_cm.c: prevent possible heap overwrite
mm: close theoretical race where stale TLB entries could linger
mm/vma: reset VMA iterator on commit_merge() OOM failure
docs: proc: update VmFlags documentation in smaps
scatterlist: fix extraneous '@'-sign kernel-doc notation
selftests/mm: skip failed memfd setups in gup_longterm
Rob Clark [Fri, 13 Jun 2025 14:41:44 +0000 (07:41 -0700)]
drm/msm: Fix inverted WARN_ON() logic
We want to WARN_ON() if info is NULL.
Suggested-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Fixes:
0838fc3e6718 ("drm/msm/adreno: Check for recognized GPU before bind")
Tested-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Reported-by: Alexey Klimov <alexey.klimov@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/658631/
Linus Torvalds [Fri, 13 Jun 2025 23:49:39 +0000 (16:49 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"All fixes for drivers.
The core change in the error handler is simply to translate an ALUA
specific sense code into a retry the ALUA components can handle and
won't impact any other devices"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: error: alua: I/O errors for ALUA state transitions
scsi: storvsc: Increase the timeouts to storvsc_timeout
scsi: s390: zfcp: Ensure synchronous unit_add
scsi: iscsi: Fix incorrect error path labels for flashnode operations
scsi: mvsas: Fix typos in per-phy comments and SAS cmd port registers
scsi: core: ufs: Fix a hang in the error handler
Linus Torvalds [Fri, 13 Jun 2025 23:27:27 +0000 (16:27 -0700)]
Merge tag 'drm-fixes-2025-06-14' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Quiet week, only two pull requests came my way, xe has a couple of
fixes and then a bunch of fixes across the board, vc4 probably fixes
the biggest problem:
vc4:
- Fix infinite EPROBE_DEFER loop in vc4 probing
amdxdna:
- Fix amdxdna firmware size
meson:
- modesetting fixes
sitronix:
- Kconfig fix for st7171-i2c
dma-buf:
- Fix -EBUSY WARN_ON_ONCE in dma-buf
udmabuf:
- Use dma_sync_sgtable_for_cpu in udmabuf
xe:
- Fix regression disallowing 64K SVM migration
- Use a bounce buffer for WA BB"
* tag 'drm-fixes-2025-06-14' of https://gitlab.freedesktop.org/drm/kernel:
drm/xe/lrc: Use a temporary buffer for WA BB
udmabuf: use sgtable-based scatterlist wrappers
dma-buf: fix compare in WARN_ON_ONCE
drm/sitronix: st7571-i2c: Select VIDEOMODE_HELPERS
drm/meson: fix more rounding issues with 59.94Hz modes
drm/meson: use vclk_freq instead of pixel_freq in debug print
drm/meson: fix debug log statement when setting the HDMI clocks
drm/vc4: fix infinite EPROBE_DEFER loop
drm/xe/svm: Fix regression disallowing 64K SVM migration
accel/amdxdna: Fix incorrect PSP firmware size
Jens Axboe [Fri, 13 Jun 2025 21:24:53 +0000 (15:24 -0600)]
io_uring: run local task_work from ring exit IOPOLL reaping
In preparation for needing to shift NVMe passthrough to always use
task_work for polled IO completions, ensure that those are suitably
run at exit time. See commit:
9ce6c9875f3e ("nvme: always punt polled uring_cmd end_io work to task_work")
for details on why that is necessary.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 13 Jun 2025 19:37:41 +0000 (13:37 -0600)]
nvme: always punt polled uring_cmd end_io work to task_work
Currently NVMe uring_cmd completions will complete locally, if they are
polled. This is done because those completions are always invoked from
task context. And while that is true, there's no guarantee that it's
invoked under the right ring context, or even task. If someone does
NVMe passthrough via multiple threads and with a limited number of
poll queues, then ringA may find completions from ringB. For that case,
completing the request may not be sound.
Always just punt the passthrough completions via task_work, which will
redirect the completion, if needed.
Cc: stable@vger.kernel.org
Fixes:
585079b6e425 ("nvme: wire up async polling for io passthrough commands")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 13 Jun 2025 20:39:15 +0000 (13:39 -0700)]
Merge tag 'acpi-6.16-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix an ACPI APEI error injection driver failure that started to
occur after switching it over to using a faux device, address an EC
driver issue related to invalid ECDT tables, clean up the usage of
mwait_idle_with_hints() in the ACPI PAD driver, add a new IRQ override
quirk, and fix a NULL pointer dereference related to nosmp:
- Update the faux device handling code in the driver core and address
an ACPI APEI error injection driver failure that started to occur
after switching it over to using a faux device on top of that (Dan
Williams)
- Update data types of variables passed as arguments to
mwait_idle_with_hints() in the ACPI PAD (processor aggregator
device) driver to match the function definition after recent
changes (Uros Bizjak)
- Fix a NULL pointer dereference in the ACPI CPPC library that occurs
when nosmp is passed to the kernel in the command line (Yunhui Cui)
- Ignore ECDT tables with an invalid ID string to prevent using an
incorrect GPE for signaling events on some systems (Armin Wolf)
- Add a new IRQ override quirk for MACHENIKE 16P (Wentao Guan)"
* tag 'acpi-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: resource: Use IRQ override on MACHENIKE 16P
ACPI: EC: Ignore ECDT tables with an invalid ID string
ACPI: CPPC: Fix NULL pointer dereference when nosmp is used
ACPI: PAD: Update arguments of mwait_idle_with_hints()
ACPI: APEI: EINJ: Do not fail einj_init() on faux_device_create() failure
driver core: faux: Quiet probe failures
driver core: faux: Suppress bind attributes
Linus Torvalds [Fri, 13 Jun 2025 20:27:41 +0000 (13:27 -0700)]
Merge tag 'pm-6.16-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix the cpupower utility installation, fix up the recently added
Rust abstractions for cpufreq and OPP, restore the x86 update
eliminating mwait_play_dead_cpuid_hint() that has been reverted during
the 6.16 merge window along with preventing the failure caused by it
from happening, and clean up mwait_idle_with_hints() usage in
intel_idle:
- Implement CpuId Rust abstraction and use it to fix doctest failure
related to the recently introduced cpumask abstraction (Viresh
Kumar)
- Do minor cleanups in the `# Safety` sections for cpufreq
abstractions added recently (Viresh Kumar)
- Unbreak cpupower systemd service units installation on some systems
by adding a unitdir variable for specifying the location to install
them (Francesco Poli)
- Eliminate mwait_play_dead_cpuid_hint() again after reverting its
elimination during the 6.16 merge window due to a problem with
handling "dead" SMT siblings, but this time prevent leaving them in
C1 after initialization by taking them online and back offline when
a proper cpuidle driver for the platform has been registered
(Rafael Wysocki)
- Update data types of variables passed as arguments to
mwait_idle_with_hints() to match the function definition after
recent changes (Uros Bizjak)"
* tag 'pm-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
rust: cpu: Add CpuId::current() to retrieve current CPU ID
rust: Use CpuId in place of raw CPU numbers
rust: cpu: Introduce CpuId abstraction
intel_idle: Update arguments of mwait_idle_with_hints()
cpufreq: Convert `/// SAFETY` lines to `# Safety` sections
cpupower: split unitdir from libdir in Makefile
Reapply "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
ACPI: processor: Rescan "dead" SMT siblings during initialization
intel_idle: Rescan "dead" SMT siblings during initialization
x86/smp: PM/hibernate: Split arch_resume_nosmt()
intel_idle: Use subsys_initcall_sync() for initialization
Rafael J. Wysocki [Fri, 13 Jun 2025 19:55:30 +0000 (21:55 +0200)]
Merge branches 'acpi-pad', 'acpi-cppc', 'acpi-ec' and 'acpi-resource'
Merge assorted ACPI updates for 6.16-rc2:
- Update data types of variables passed as arguments to
mwait_idle_with_hints() in the ACPI PAD (processor aggregator device)
driver to match the function definition after recent changes (Uros
Bizjak).
- Fix a NULL pointer dereference in the ACPI CPPC library that occurs
when nosmp is passed to the kernel in the command line (Yunhui Cui).
- Ignore ECDT tables with an invalid ID string to prevent using an
incorrect GPE for signaling events on some systems (Armin Wolf).
- Add a new IRQ override quirk for MACHENIKE 16P (Wentao Guan).
* acpi-pad:
ACPI: PAD: Update arguments of mwait_idle_with_hints()
* acpi-cppc:
ACPI: CPPC: Fix NULL pointer dereference when nosmp is used
* acpi-ec:
ACPI: EC: Ignore ECDT tables with an invalid ID string
* acpi-resource:
ACPI: resource: Use IRQ override on MACHENIKE 16P
Rafael J. Wysocki [Fri, 13 Jun 2025 19:28:07 +0000 (21:28 +0200)]
Merge branch 'pm-cpuidle'
Merge cpuidle updates for 6.16-rc2:
- Update data types of variables passed as arguments to
mwait_idle_with_hints() to match the function definition
after recent changes (Uros Bizjak).
- Eliminate mwait_play_dead_cpuid_hint() again after reverting its
elimination during the merge window due to a problem with handling
"dead" SMT siblings, but this time prevent leaving them in C1 after
initialization by taking them online and back offline when a proper
cpuidle driver for the platform has been registered (Rafael Wysocki).
* pm-cpuidle:
intel_idle: Update arguments of mwait_idle_with_hints()
Reapply "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
ACPI: processor: Rescan "dead" SMT siblings during initialization
intel_idle: Rescan "dead" SMT siblings during initialization
x86/smp: PM/hibernate: Split arch_resume_nosmt()
intel_idle: Use subsys_initcall_sync() for initialization
Rafael J. Wysocki [Fri, 13 Jun 2025 19:25:38 +0000 (21:25 +0200)]
Merge branch 'pm-tools'
Merge a cpupower utility fix for 6.16-rc2 that unbreaks systemd service
units installation on some sysems (Francesco Poli).
* pm-tools:
cpupower: split unitdir from libdir in Makefile
Linus Torvalds [Fri, 13 Jun 2025 18:01:44 +0000 (11:01 -0700)]
Merge tag 'spi-fix-v6.16-rc1' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A collection of driver specific fixes, most minor apart from the OMAP
ones which disable some recent performance optimisations in some
non-standard cases where we could start driving the bus incorrectly.
The change to the stm32-ospi driver to use the newer reset APIs is a
fix for interactions with other IP sharing the same reset line in some
SoCs"
* tag 'spi-fix-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-pci1xxxx: Drop MSI-X usage as unsupported by DMA engine
spi: stm32-ospi: clean up on error in probe()
spi: stm32-ospi: Make usage of reset_control_acquire/release() API
spi: offload: check offload ops existence before disabling the trigger
spi: spi-pci1xxxx: Fix error code in probe
spi: loongson: Fix build warnings about export.h
spi: omap2-mcspi: Disable multi-mode when the previous message kept CS asserted
spi: omap2-mcspi: Disable multi mode when CS should be kept asserted after message
Linus Torvalds [Fri, 13 Jun 2025 18:00:19 +0000 (11:00 -0700)]
Merge tag 'regulator-fix-v6.16-rc1' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"One minor fix for a leak in the DT parsing code in the max20086 driver"
* tag 'regulator-fix-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: max20086: Fix refcount leak in max20086_parse_regulators_dt()
Oleg Nesterov [Fri, 13 Jun 2025 17:26:50 +0000 (19:26 +0200)]
posix-cpu-timers: fix race between handle_posix_cpu_timers() and posix_cpu_timer_del()
If an exiting non-autoreaping task has already passed exit_notify() and
calls handle_posix_cpu_timers() from IRQ, it can be reaped by its parent
or debugger right after unlock_task_sighand().
If a concurrent posix_cpu_timer_del() runs at that moment, it won't be
able to detect timer->it.cpu.firing != 0: cpu_timer_task_rcu() and/or
lock_task_sighand() will fail.
Add the tsk->exit_state check into run_posix_cpu_timers() to fix this.
This fix is not needed if CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y, because
exit_task_work() is called before exit_notify(). But the check still
makes sense, task_work_add(&tsk->posix_cputimers_work.work) will fail
anyway in this case.
Cc: stable@vger.kernel.org
Reported-by: Benoît Sevens <bsevens@google.com>
Fixes:
0bdd2ed4138e ("sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 13 Jun 2025 17:51:11 +0000 (10:51 -0700)]
Merge tag 'trace-v6.16-rc1' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
- Do not free "head" variable in filter_free_subsystem_filters()
The first error path jumps to "free_now" label but first frees the
newly allocated "head" variable. But the "free_now" code checks this
variable, and if it is not NULL, it will iterate the list. As this
list variable was already initialized, the "free_now" code will not
do anything as it is empty. But freeing it will cause a UAF bug.
The error path should simply jump to the "free_now" label and leave
the "head" variable alone.
* tag 'trace-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Do not free "head" on error path of filter_free_subsystem_filters()
Linus Torvalds [Fri, 13 Jun 2025 17:05:31 +0000 (10:05 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Rework of system register accessors for system registers that are
directly writen to memory, so that sanitisation of the in-memory
value happens at the correct time (after the read, or before the
write). For convenience, RMW-style accessors are also provided.
- Multiple fixes for the so-called "arch-timer-edge-cases' selftest,
which was always broken.
x86:
- Make KVM_PRE_FAULT_MEMORY stricter for TDX, allowing userspace to
pass only the "untouched" addresses and flipping the shared/private
bit in the implementation.
- Disable SEV-SNP support on initialization failure
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/mmu: Reject direct bits in gpa passed to KVM_PRE_FAULT_MEMORY
KVM: x86/mmu: Embed direct bits into gpa for KVM_PRE_FAULT_MEMORY
KVM: SEV: Disable SEV-SNP support on initialization failure
KVM: arm64: selftests: Determine effective counter width in arch_timer_edge_cases
KVM: arm64: selftests: Fix xVAL init in arch_timer_edge_cases
KVM: arm64: selftests: Fix thread migration in arch_timer_edge_cases
KVM: arm64: selftests: Fix help text for arch_timer_edge_cases
KVM: arm64: Make __vcpu_sys_reg() a pure rvalue operand
KVM: arm64: Don't use __vcpu_sys_reg() to get the address of a sysreg
KVM: arm64: Add RMW specific sysreg accessor
KVM: arm64: Add assignment-specific sysreg accessor
Jens Axboe [Fri, 13 Jun 2025 17:01:49 +0000 (11:01 -0600)]
io_uring/kbuf: don't truncate end buffer for multiple buffer peeks
If peeking a bunch of buffers, normally io_ring_buffers_peek() will
truncate the end buffer. This isn't optimal as presumably more data will
be arriving later, and hence it's better to stop with the last full
buffer rather than truncate the end buffer.
Cc: stable@vger.kernel.org
Fixes:
35c8711c8fc4 ("io_uring/kbuf: add helpers for getting/peeking multiple buffers")
Reported-by: Christian Mazakas <christian.mazakas@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 13 Jun 2025 16:59:29 +0000 (09:59 -0700)]
Merge tag 'v6.16-p4' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a broken self-test in hkdf (new regression)"
* tag 'v6.16-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: hkdf - move to late_initcall
Linus Torvalds [Fri, 13 Jun 2025 16:49:07 +0000 (09:49 -0700)]
Merge tag 'bcachefs-2025-06-12' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"As usual, highlighting the ones users have been noticing:
- Fix a small issue with has_case_insensitive not being propagated on
snapshot creation; this led to fsck errors, which we're harmless
because we're not using this flag yet (it's for overlayfs +
casefolding).
- Log the error being corrected in the journal when we're doing fsck
repair: this was one of the "lessons learned" from the i_nlink 0 ->
subvolume deletion bug, where reconstructing what had happened by
analyzing the journal was a bit more difficult than it needed to
be.
- Don't schedule btree node scan to run in the superblock: this fixes
a regression from the 6.16 recovery passes rework, and let to it
running unnecessarily.
The real issue here is that we don't have online, "self healing"
style topology repair yet: topology repair currently has to run
before we go RW, which means that we may schedule it unnecessarily
after a transient error. This will be fixed in the future.
- We now track, in btree node flags, the reason it was scheduled to
be rewritten. We discovered a deadlock in recovery when many btree
nodes need to be rewritten because they're degraded: fully fixing
this will take some work but it's now easier to see what's going
on.
For the bug report where this came up, a device had been kicked RO
due to transient errors: manually setting it back to RW was
sufficient to allow recovery to succeed.
- Mark a few more fsck errors as autofix: as a reminder to users,
please do keep reporting cases where something needs to be repaired
and is not repaired automatically (i.e. cases where -o fix_errors
or fsck -y is required).
- rcu_pending.c now works with PREEMPT_RT
- 'bcachefs device add', then umount, then remount wasn't working -
we now emit a uevent so that the new device's new superblock is
correctly picked up
- Assorted repair fixes: btree node scan will no longer incorrectly
update sb->version_min,
- Assorted syzbot fixes"
* tag 'bcachefs-2025-06-12' of git://evilpiepirate.org/bcachefs: (23 commits)
bcachefs: Don't trace should_be_locked unless changing
bcachefs: Ensure that snapshot creation propagates has_case_insensitive
bcachefs: Print devices we're mounting on multi device filesystems
bcachefs: Don't trust sb->nr_devices in members_to_text()
bcachefs: Fix version checks in validate_bset()
bcachefs: ioctl: avoid stack overflow warning
bcachefs: Don't pass trans to fsck_err() in gc_accounting_done
bcachefs: Fix leak in bch2_fs_recovery() error path
bcachefs: Fix rcu_pending for PREEMPT_RT
bcachefs: Fix downgrade_table_extra()
bcachefs: Don't put rhashtable on stack
bcachefs: Make sure opts.read_only gets propagated back to VFS
bcachefs: Fix possible console lock involved deadlock
bcachefs: mark more errors autofix
bcachefs: Don't persistently run scan_for_btree_nodes
bcachefs: Read error message now prints if self healing
bcachefs: Only run 'increase_depth' for keys from btree node csan
bcachefs: Mark need_discard_freespace_key_bad autofix
bcachefs: Update /dev/disk/by-uuid on device add
bcachefs: Add more flags to btree nodes for rewrite reason
...
Bagas Sanjaya [Fri, 13 Jun 2025 02:38:57 +0000 (09:38 +0700)]
Documentation: ublk: Separate UBLK_F_AUTO_BUF_REG fallback behavior sublists
Stephen Rothwell reports htmldocs warning on ublk docs:
Documentation/block/ublk.rst:414: ERROR: Unexpected indentation. [docutils]
Fix the warning by separating sublists of auto buffer registration
fallback behavior from their appropriate parent list item.
Fixes:
ff20c516485e ("ublk: document auto buffer registration(UBLK_F_AUTO_BUF_REG)")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/
20250612132638.
193de386@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20250613023857.15971-1-bagasdotme@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jason Gunthorpe [Tue, 3 Jun 2025 19:14:45 +0000 (16:14 -0300)]
iommu/tegra: Fix incorrect size calculation
This driver uses a mixture of ways to get the size of a PTE,
tegra_smmu_set_pde() did it as sizeof(*pd) which became wrong when pd
switched to a struct tegra_pd.
Switch pd back to a u32* in tegra_smmu_set_pde() so the sizeof(*pd)
returns 4.
Fixes:
50568f87d1e2 ("iommu/terga: Do not use struct page as the handle for as->pd memory")
Reported-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Closes: https://lore.kernel.org/all/
62e7f7fe-6200-4e4f-ad42-
d58ad272baa6@tecnico.ulisboa.pt/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Tested-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Link: https://lore.kernel.org/r/0-v1-da7b8b3d57eb+ce-iommu_terga_sizeof_jgg@nvidia.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Bagas Sanjaya [Wed, 11 Jun 2025 02:08:06 +0000 (09:08 +0700)]
Documentation: nouveau: Update GSP message queue kernel-doc reference
GSP message queue docs has been moved following RPC handling split in
commit
8a8b1ec5261f20 ("drm/nouveau/gsp: split rpc handling out on its
own"), before GSP-RM implementation is versioned in commit
c472d828348caf
("drm/nouveau/gsp: move subdev/engine impls to subdev/gsp/rm/r535/").
However, the kernel-doc reference in nouveau docs is left behind, which
triggers htmldocs warnings:
ERROR: Cannot find file ./drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
WARNING: No kernel-doc for file ./drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
Update the reference.
Fixes:
c472d828348c ("drm/nouveau/gsp: move subdev/engine impls to subdev/gsp/rm/r535/")
Fixes:
8a8b1ec5261f ("drm/nouveau/gsp: split rpc handling out on its own")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20250611020805.22418-2-bagasdotme@gmail.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Jacob Keller [Tue, 10 Jun 2025 21:54:51 +0000 (14:54 -0700)]
drm/nouveau/bl: increase buffer size to avoid truncate warning
The nouveau_get_backlight_name() function generates a unique name for the
backlight interface, appending an id from 1 to 99 for all backlight devices
after the first.
GCC 15 (and likely other compilers) produce the following
-Wformat-truncation warning:
nouveau_backlight.c: In function ‘nouveau_backlight_init’:
nouveau_backlight.c:56:69: error: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size 3 [-Werror=format-truncation=]
56 | snprintf(backlight_name, BL_NAME_SIZE, "nv_backlight%d", nb);
| ^~
In function ‘nouveau_get_backlight_name’,
inlined from ‘nouveau_backlight_init’ at nouveau_backlight.c:351:7:
nouveau_backlight.c:56:56: note: directive argument in the range [1,
2147483647]
56 | snprintf(backlight_name, BL_NAME_SIZE, "nv_backlight%d", nb);
| ^~~~~~~~~~~~~~~~
nouveau_backlight.c:56:17: note: ‘snprintf’ output between 14 and 23 bytes into a destination of size 15
56 | snprintf(backlight_name, BL_NAME_SIZE, "nv_backlight%d", nb);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The warning started appearing after commit
ab244be47a8f ("drm/nouveau:
Fix a potential theorical leak in nouveau_get_backlight_name()") This fix
for the ida usage removed the explicit value check for ids larger than 99.
The compiler is unable to intuit that the ida_alloc_max() limits the
returned value range between 0 and 99.
Because the compiler can no longer infer that the number ranges from 0 to
99, it thinks that it could use as many as 11 digits (10 + the potential -
sign for negative numbers).
The warning has gone unfixed for some time, with at least one kernel test
robot report. The code breaks W=1 builds, which is especially frustrating
with the introduction of CONFIG_WERROR.
The string is stored temporarily on the stack and then copied into the
device name. Its not a big deal to use 11 more bytes of stack rounding out
to an even 24 bytes. Increase BL_NAME_SIZE to 24 to avoid the truncation
warning. This fixes the W=1 builds that include this driver.
Compile tested only.
Fixes:
ab244be47a8f ("drm/nouveau: Fix a potential theorical leak in nouveau_get_backlight_name()")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/
202312050324.0kv4PnfZ-lkp@intel.com/
Suggested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20250610-jk-nouveua-drm-bl-snprintf-fix-v2-1-7fdd4b84b48e@intel.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Zhi Wang [Tue, 27 May 2025 16:37:12 +0000 (16:37 +0000)]
drm/nouveau: fix a use-after-free in r535_gsp_rpc_push()
The RPC container is released after being passed to r535_gsp_rpc_send().
When sending the initial fragment of a large RPC and passing the
caller's RPC container, the container will be freed prematurely. Subsequent
attempts to send remaining fragments will therefore result in a
use-after-free.
Allocate a temporary RPC container for holding the initial fragment of a
large RPC when sending. Free the caller's container when all fragments
are successfully sent.
Fixes:
176fdcbddfd2 ("drm/nouveau/gsp/r535: add support for booting GSP-RM")
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Link: https://lore.kernel.org/r/20250527163712.3444-1-zhiw@nvidia.com
[ Rebase onto Blackwell changes. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Colin Ian King [Thu, 22 May 2025 13:15:12 +0000 (14:15 +0100)]
drm/nouveau/gsp: Fix potential integer overflow on integer shifts
The left shift int 32 bit integer constants 1 is evaluated using 32 bit
arithmetic and then assigned to a 64 bit unsigned integer. In the case
where the shift is 32 or more this can lead to an overflow. Avoid this
by shifting using the BIT_ULL macro instead.
Fixes:
6c3ac7bcfcff ("drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20250522131512.2768310-1-colin.i.king@gmail.com
Matthew Wilcox (Oracle) [Thu, 12 Jun 2025 14:42:53 +0000 (15:42 +0100)]
block: Fix bvec_set_folio() for very large folios
Similarly to
26064d3e2b4d ("block: fix adding folio to bio"), if
we attempt to add a folio that is larger than 4GB, we'll silently
truncate the offset and len. Widen the parameters to size_t, assert
that the length is less than 4GB and set the first page that contains
the interesting data rather than the first page of the folio.
Fixes:
26db5ee15851 (block: add a bvec_set_folio helper)
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20250612144255.2850278-1-willy@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Matthew Wilcox (Oracle) [Thu, 12 Jun 2025 14:41:25 +0000 (15:41 +0100)]
bio: Fix bio_first_folio() for SPARSEMEM without VMEMMAP
It is possible for physically contiguous folios to have discontiguous
struct pages if SPARSEMEM is enabled and SPARSEMEM_VMEMMAP is not.
This is correctly handled by folio_page_idx(), so remove this open-coded
implementation.
Fixes:
640d1930bef4 (block: Add bio_for_each_folio_all())
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20250612144126.2849931-1-willy@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Thangaraj Samynathan [Thu, 12 Jun 2025 02:30:59 +0000 (08:00 +0530)]
spi: spi-pci1xxxx: Drop MSI-X usage as unsupported by DMA engine
Removes MSI-X from the interrupt request path, as the DMA engine used by
the SPI controller does not support MSI-X interrupts.
Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
Link: https://patch.msgid.link/20250612023059.71726-1-thangaraj.s@microchip.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Thomas Zimmermann [Thu, 12 Jun 2025 08:53:02 +0000 (10:53 +0200)]
drm/mgag200: Do not include <linux/export.h>
Fix the compile-time warning
drivers/gpu/drm/mgag200/mgag200_ddc.c: warning: EXPORT_SYMBOL() is not used, but #include <linux/export.h> is present
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://lore.kernel.org/r/20250612085308.203861-1-tzimmermann@suse.de
Thomas Zimmermann [Thu, 12 Jun 2025 08:42:55 +0000 (10:42 +0200)]
drm/ast: Do not include <linux/export.h>
Fix the compile-time warning
drivers/gpu/drm/ast/ast_mode.c: warning: EXPORT_SYMBOL() is not used, but #include <linux/export.h> is present
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://lore.kernel.org/r/20250612084257.200907-1-tzimmermann@suse.de
Dave Airlie [Fri, 13 Jun 2025 04:57:09 +0000 (14:57 +1000)]
Merge tag 'drm-misc-fixes-2025-06-12' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.16-rc2:
- Fix infinite EPROBE_DEFER loop in vc4 probing.
- Fix amdxdna firmware size.
- mode fixes for meson.
- Kconfig fix for st7171-i2c.
- Fix -EBUSY WARN_ON_ONCE in dma-buf
- Use dma_sync_sgtable_for_cpu in udmabuf.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/62c06195-8bc1-4dae-8777-e86d94e4d9d9@linux.intel.com
Lorenzo Stoakes [Mon, 9 Jun 2025 16:57:49 +0000 (17:57 +0100)]
mm: add mmap_prepare() compatibility layer for nested file systems
Nested file systems, that is those which invoke call_mmap() within their
own f_op->mmap() handlers, may encounter underlying file systems which
provide the f_op->mmap_prepare() hook introduced by commit
c84bf6dd2b83
("mm: introduce new .mmap_prepare() file callback").
We have a chicken-and-egg scenario here - until all file systems are
converted to using .mmap_prepare(), we cannot convert these nested
handlers, as we can't call f_op->mmap from an .mmap_prepare() hook.
So we have to do it the other way round - invoke the .mmap_prepare() hook
from an .mmap() one.
in order to do so, we need to convert VMA state into a struct vm_area_desc
descriptor, invoking the underlying file system's f_op->mmap_prepare()
callback passing a pointer to this, and then setting VMA state accordingly
and safely.
This patch achieves this via the compat_vma_mmap_prepare() function, which
we invoke from call_mmap() if f_op->mmap_prepare() is specified in the
passed in file pointer.
We place the fundamental logic into mm/vma.h where VMA manipulation
belongs. We also update the VMA userland tests to accommodate the
changes.
The compat_vma_mmap_prepare() function and its associated machinery is
temporary, and will be removed once the conversion of file systems is
complete.
We carefully place this code so it can be used with CONFIG_MMU and also
with cutting edge nommu silicon.
[akpm@linux-foundation.org: export compat_vma_mmap_prepare tp fix build]
[lorenzo.stoakes@oracle.com: remove unused declarations]
Link: https://lkml.kernel.org/r/ac3ae324-4c65-432a-8c6d-2af988b18ac8@lucifer.local
Link: https://lkml.kernel.org/r/20250609165749.344976-1-lorenzo.stoakes@oracle.com
Fixes:
c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback").
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Closes: https://lore.kernel.org/linux-mm/CAG48ez04yOEVx1ekzOChARDDBZzAKwet8PEoPM4Ln3_rk91AzQ@mail.gmail.com/
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dave Airlie [Fri, 13 Jun 2025 01:05:23 +0000 (11:05 +1000)]
Merge tag 'drm-xe-fixes-2025-06-12' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix regression disallowing 64K SVM migration (Maarten)
- Use a bounce buffer for WA BB (Lucas)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/aEsBQoh5Si3ouPgE@fedora
Linus Torvalds [Thu, 12 Jun 2025 19:32:09 +0000 (12:32 -0700)]
Merge tag 'bitmap-for-6.16-rc2' of https://github.com/norov/linux
Pull bitmap fix from Yury Norov:
"Fix for __GENMASK() and __GENMASK_ULL() in UAPI"
* tag 'bitmap-for-6.16-rc2' of https://github.com/norov/linux:
uapi: bitops: use UAPI-safe variant of BITS_PER_LONG again
Bharath SM [Wed, 11 Jun 2025 11:29:02 +0000 (16:59 +0530)]
smb: improve directory cache reuse for readdir operations
Currently, cached directory contents were not reused across subsequent
'ls' operations because the cache validity check relied on comparing
the ctx pointer, which changes with each readdir invocation. As a
result, the cached dir entries was not marked as valid and the cache was
not utilized for subsequent 'ls' operations.
This change uses the file pointer, which remains consistent across all
readdir calls for a given directory instance, to associate and validate
the cache. As a result, cached directory contents can now be
correctly reused, improving performance for repeated directory listings.
Performance gains with local windows SMB server:
Without the patch and default actimeo=1:
1000 directory enumeration operations on dir with 10k files took 135.0s
With this patch and actimeo=0:
1000 directory enumeration operations on dir with 10k files took just 5.1s
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Paulo Alcantara [Thu, 12 Jun 2025 15:45:04 +0000 (12:45 -0300)]
smb: client: fix perf regression with deferred closes
Customer reported that one of their applications started failing to
open files with STATUS_INSUFFICIENT_RESOURCES due to NetApp server
hitting the maximum number of opens to same file that it would allow
for a single client connection.
It turned out the client was failing to reuse open handles with
deferred closes because matching ->f_flags directly without masking
off O_CREAT|O_EXCL|O_TRUNC bits first broke the comparision and then
client ended up with thousands of deferred closes to same file. Those
bits are already satisfied on the original open, so no need to check
them against existing open handles.
Reproducer:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <pthread.h>
#define NR_THREADS 4
#define NR_ITERATIONS 2500
#define TEST_FILE "/mnt/1/test/dir/foo"
static char buf[64];
static void *worker(void *arg)
{
int i, j;
int fd;
for (i = 0; i < NR_ITERATIONS; i++) {
fd = open(TEST_FILE, O_WRONLY|O_CREAT|O_APPEND, 0666);
for (j = 0; j < 16; j++)
write(fd, buf, sizeof(buf));
close(fd);
}
}
int main(int argc, char *argv[])
{
pthread_t t[NR_THREADS];
int fd;
int i;
fd = open(TEST_FILE, O_WRONLY|O_CREAT|O_TRUNC, 0666);
close(fd);
memset(buf, 'a', sizeof(buf));
for (i = 0; i < NR_THREADS; i++)
pthread_create(&t[i], NULL, worker, NULL);
for (i = 0; i < NR_THREADS; i++)
pthread_join(t[i], NULL);
return 0;
}
Before patch:
$ mount.cifs //srv/share /mnt/1 -o ...
$ mkdir -p /mnt/1/test/dir
$ gcc repro.c && ./a.out
...
number of opens: 1391
After patch:
$ mount.cifs //srv/share /mnt/1 -o ...
$ mkdir -p /mnt/1/test/dir
$ gcc repro.c && ./a.out
...
number of opens: 1
Cc: linux-cifs@vger.kernel.org
Cc: David Howells <dhowells@redhat.com>
Cc: Jay Shin <jaeshin@redhat.com>
Cc: Pierguido Lambri <plambri@redhat.com>
Fixes:
b8ea3b1ff544 ("smb: enable reuse of deferred file handles for write operations")
Acked-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Thu, 12 Jun 2025 16:50:36 +0000 (09:50 -0700)]
Merge tag 'net-6.16-rc2' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and wireless.
Current release - regressions:
- af_unix: allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD
Current release - new code bugs:
- eth: airoha: correct enable mask for RX queues 16-31
- veth: prevent NULL pointer dereference in veth_xdp_rcv when peer
disappears under traffic
- ipv6: move fib6_config_validate() to ip6_route_add(), prevent
invalid routes
Previous releases - regressions:
- phy: phy_caps: don't skip better duplex match on non-exact match
- dsa: b53: fix untagged traffic sent via cpu tagged with VID 0
- Revert "wifi: mwifiex: Fix HT40 bandwidth issue.", it caused
transient packet loss, exact reason not fully understood, yet
Previous releases - always broken:
- net: clear the dst when BPF is changing skb protocol (IPv4 <> IPv6)
- sched: sfq: fix a potential crash on gso_skb handling
- Bluetooth: intel: improve rx buffer posting to avoid causing issues
in the firmware
- eth: intel: i40e: make reset handling robust against multiple
requests
- eth: mlx5: ensure FW pages are always allocated on the local NUMA
node, even when device is configure to 'serve' another node
- wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850,
prevent kernel crashes
- wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request()
for 3 sec if fw_stats_done is not set"
* tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context
net: ethtool: Don't check if RSS context exists in case of context 0
af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD.
ipv6: Move fib6_config_validate() to ip6_route_add().
net: drv: netdevsim: don't napi_complete() from netpoll
net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get()
veth: prevent NULL pointer dereference in veth_xdp_rcv
net_sched: remove qdisc_tree_flush_backlog()
net_sched: ets: fix a race in ets_qdisc_change()
net_sched: tbf: fix a race in tbf_change()
net_sched: red: fix a race in __red_change()
net_sched: prio: fix a race in prio_tune()
net_sched: sch_sfq: reject invalid perturb period
net: phy: phy_caps: Don't skip better duplex macth on non-exact match
MAINTAINERS: Update Kuniyuki Iwashima's email address.
selftests: net: add test case for NAT46 looping back dst
net: clear the dst when changing skb protocol
net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper
net/mlx5e: Fix leak of Geneve TLV option object
net/mlx5: HWS, make sure the uplink is the last destination
...
Lucas De Marchi [Wed, 4 Jun 2025 15:03:05 +0000 (08:03 -0700)]
drm/xe/lrc: Use a temporary buffer for WA BB
In case the BO is in iomem, we can't simply take the vaddr and write to
it. Instead, prepare a separate buffer that is later copied into io
memory. Right now it's just a few words that could be using
xe_map_write32(), but the intention is to grow the WA BB for other
uses.
Fixes:
617d824c5323 ("drm/xe: Add WA BB to capture active context utilization")
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-1-0dfc5dafcef0@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit
ef48715b2d3df17c060e23b9aa636af3d95652f8)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Linus Torvalds [Thu, 12 Jun 2025 15:21:13 +0000 (08:21 -0700)]
Merge tag 'pinctrl-v6.16-2' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- Add some missing pins on the Qualcomm QCM2290, along with a managed
resources patch that make it clean and nice
- Drop an unused function in the ST Micro driver
- Drop bouncing MAINTAINER entry
- Drop of_match_ptr() macro to rid compile warnings in the TB10x
driver
- Fix up calculation of pin numbers from base in the Sunxi driver
* tag 'pinctrl-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: sunxi: dt: Consider pin base when calculating bank number from pin
pinctrl: tb10x: Drop of_match_ptr for ID table
pinctrl: MAINTAINERS: Drop bouncing Jianlong Huang
pinctrl: st: Drop unused st_gpio_bank() function
pinctrl: qcom: pinctrl-qcm2290: Add missing pins
pinctrl: qcom: switch to devm_gpiochip_add_data()
Linus Torvalds [Thu, 12 Jun 2025 15:17:56 +0000 (08:17 -0700)]
Merge tag 'arc-6.16-rc1' of git://git./linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- arch_atomic64_cmpxchg relaxed variant [Jason]
- use of inbuilt swap in stack unwinder [Yu-Chun Lin]
- use of __ASSEMBLER__ in kernel headers [Thomas Huth]
* tag 'arc-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: Replace __ASSEMBLY__ with __ASSEMBLER__ in the non-uapi headers
ARC: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers
ARC: unwind: Use built-in sort swap to reduce code size and improve performance
ARC: atomics: Implement arch_atomic64_cmpxchg using _relaxed
Jakub Kicinski [Thu, 12 Jun 2025 15:16:47 +0000 (08:16 -0700)]
Merge tag 'wireless-2025-06-12' of https://git./linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Another quick round of updates:
- revert mwifiex HT40 that was causing issues
- many ath10k/ath11k/ath12k fixes
- re-add some iwlwifi code I lost in a merge
- use kfree_sensitive() on an error path in cfg80211
* tag 'wireless-2025-06-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211: use kfree_sensitive() for connkeys cleanup
wifi: iwlwifi: fix merge damage related to iwl_pci_resume
Revert "wifi: mwifiex: Fix HT40 bandwidth issue."
wifi: ath12k: fix uaf in ath12k_core_init()
wifi: ath12k: Fix hal_reo_cmd_status kernel-doc
wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850
wifi: ath11k: validate ath11k_crypto_mode on top of ath11k_core_qmi_firmware_ready
wifi: ath11k: consistently use ath11k_mac_get_fw_stats()
wifi: ath11k: move locking outside of ath11k_mac_get_fw_stats()
wifi: ath11k: adjust unlock sequence in ath11k_update_stats_event()
wifi: ath11k: move some firmware stats related functions outside of debugfs
wifi: ath11k: don't wait when there is no vdev started
wifi: ath11k: don't use static variables in ath11k_debugfs_fw_stats_process()
wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request()
wil6210: fix support for sparrow chipsets
wifi: ath10k: Avoid vdev delete timeout when firmware is already down
ath10k: snoc: fix unbalanced IRQ enable in crash recovery
====================
Link: https://patch.msgid.link/20250612082519.11447-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 12 Jun 2025 15:15:37 +0000 (08:15 -0700)]
Merge branch 'fix-ntuple-rules-targeting-default-rss'
Gal Pressman says:
====================
Fix ntuple rules targeting default RSS
This series addresses a regression in ethtool flow steering where rules
targeting the default RSS context (context 0) were incorrectly rejected.
The default RSS context always exists but is not stored in the rss_ctx
xarray like additional contexts. The current validation logic was
checking for the existence of context 0 in this array, causing valid
flow steering rules to be rejected.
This prevented configurations such as:
- High priority rules directing specific traffic to the default context
- Low priority catch-all rules directing remaining traffic to additional
contexts
Patch 1 fixes the validation logic to skip the existence check for
context 0.
Patch 2 adds a selftest that verifies this behavior.
v3: https://lore.kernel.org/
20250609120250.
1630125-1-gal@nvidia.com
v2: https://lore.kernel.org/
20250225071348.509432-1-gal@nvidia.com
====================
Link: https://patch.msgid.link/20250612071958.1696361-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Thu, 12 Jun 2025 07:19:58 +0000 (10:19 +0300)]
selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context
Add test_rss_default_context_rule() to verify that ntuple rules can
correctly direct traffic to the default RSS context (context 0).
The test creates two ntuple rules with explicit location priorities:
- A high-priority rule (loc 0) directing specific port traffic to
context 0.
- A low-priority rule (loc 1) directing all other TCP traffic to context
1.
This validates that:
1. Rules targeting the default context function properly.
2. Traffic steering works as expected when mixing default and
additional RSS contexts.
The test was written by AI, and reviewed by humans.
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250612071958.1696361-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Thu, 12 Jun 2025 07:19:57 +0000 (10:19 +0300)]
net: ethtool: Don't check if RSS context exists in case of context 0
Context 0 (default context) always exists, there is no need to check
whether it exists or not when adding a flow steering rule.
The existing check fails when creating a flow steering rule for context
0 as it is not stored in the rss_ctx xarray.
For example:
$ ethtool --config-ntuple eth2 flow-type tcp4 dst-ip 194.237.147.23 dst-port 19983 context 0 loc 618
rmgr: Cannot insert RX class rule: Invalid argument
Cannot insert classification rule
An example usecase for this could be:
- A high-priority rule (loc 0) directing specific port traffic to
context 0.
- A low-priority rule (loc 1) directing all other TCP traffic to context
1.
This is a user-visible regression that was caught in our testing
environment, it was not reported by a user yet.
Fixes:
de7f7582dff2 ("net: ethtool: prevent flow steering to RSS contexts which don't exist")
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250612071958.1696361-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 12 Jun 2025 15:13:47 +0000 (08:13 -0700)]
Merge tag 'for-net-2025-06-11' of git://git./linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- eir: Fix NULL pointer deference on eir_get_service_data
- eir: Fix possible crashes on eir_create_adv_data
- hci_sync: Fix broadcast/PA when using an existing instance
- ISO: Fix using BT_SK_PA_SYNC to detect BIS sockets
- ISO: Fix not using bc_sid as advertisement SID
- MGMT: Fix sparse errors
* tag 'for-net-2025-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: MGMT: Fix sparse errors
Bluetooth: ISO: Fix not using bc_sid as advertisement SID
Bluetooth: ISO: Fix using BT_SK_PA_SYNC to detect BIS sockets
Bluetooth: eir: Fix possible crashes on eir_create_adv_data
Bluetooth: hci_sync: Fix broadcast/PA when using an existing instance
Bluetooth: Fix NULL pointer deference on eir_get_service_data
====================
Link: https://patch.msgid.link/20250611204944.1559356-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Wed, 11 Jun 2025 20:27:35 +0000 (13:27 -0700)]
af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD.
Before the cited commit, the kernel unconditionally embedded SCM
credentials to skb for embryo sockets even when both the sender
and listener disabled SO_PASSCRED and SO_PASSPIDFD.
Now, the credentials are added to skb only when configured by the
sender or the listener.
However, as reported in the link below, it caused a regression for
some programs that assume credentials are included in every skb,
but sometimes not now.
The only problematic scenario would be that a socket starts listening
before setting the option. Then, there will be 2 types of non-small
race window, where a client can send skb without credentials, which
the peer receives as an "invalid" message (and aborts the connection
it seems ?):
Client Server
------ ------
s1.listen() <-- No SO_PASS{CRED,PIDFD}
s2.connect()
s2.send() <-- w/o cred
s1.setsockopt(SO_PASS{CRED,PIDFD})
s2.send() <-- w/ cred
or
Client Server
------ ------
s1.listen() <-- No SO_PASS{CRED,PIDFD}
s2.connect()
s2.send() <-- w/o cred
s3, _ = s1.accept() <-- Inherit cred options
s2.send() <-- w/o cred but not set yet
s3.setsockopt(SO_PASS{CRED,PIDFD})
s2.send() <-- w/ cred
It's unfortunate that buggy programs depend on the behaviour,
but let's restore the previous behaviour.
Fixes:
3f84d577b79d ("af_unix: Inherit sk_flags at connect().")
Reported-by: Jacek Łuczak <difrost.kernel@gmail.com>
Closes: https://lore.kernel.org/all/
68d38b0b-1666-4974-85d4-
15575789c8d4@gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Tested-by: Christian Heusel <christian@heusel.eu>
Tested-by: André Almeida <andrealmeid@igalia.com>
Tested-by: Jacek Łuczak <difrost.kernel@gmail.com>
Link: https://patch.msgid.link/20250611202758.3075858-1-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Wed, 11 Jun 2025 19:35:02 +0000 (12:35 -0700)]
ipv6: Move fib6_config_validate() to ip6_route_add().
syzkaller created an IPv6 route from a malformed packet, which has
a prefix len > 128, triggering the splat below. [0]
This is a similar issue fixed by commit
586ceac9acb7 ("ipv6: Restore
fib6_config validation for SIOCADDRT.").
The cited commit removed fib6_config validation from some callers
of ip6_add_route().
Let's move the validation back to ip6_route_add() and
ip6_route_multipath_add().
[0]:
UBSAN: array-index-out-of-bounds in ./include/net/ipv6.h:616:34
index 20 is out of range for type '__u8 [16]'
CPU: 1 UID: 0 PID: 7444 Comm: syz.0.708 Not tainted
6.16.0-rc1-syzkaller-g19272b37aa4f #0 PREEMPT
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<
ffffffff80078a80>] dump_backtrace+0x2e/0x3c arch/riscv/kernel/stacktrace.c:132
[<
ffffffff8000327a>] show_stack+0x30/0x3c arch/riscv/kernel/stacktrace.c:138
[<
ffffffff80061012>] __dump_stack lib/dump_stack.c:94 [inline]
[<
ffffffff80061012>] dump_stack_lvl+0x12e/0x1a6 lib/dump_stack.c:120
[<
ffffffff800610a6>] dump_stack+0x1c/0x24 lib/dump_stack.c:129
[<
ffffffff8001c0ea>] ubsan_epilogue+0x14/0x46 lib/ubsan.c:233
[<
ffffffff819ba290>] __ubsan_handle_out_of_bounds+0xf6/0xf8 lib/ubsan.c:455
[<
ffffffff85b363a4>] ipv6_addr_prefix include/net/ipv6.h:616 [inline]
[<
ffffffff85b363a4>] ip6_route_info_create+0x8f8/0x96e net/ipv6/route.c:3793
[<
ffffffff85b635da>] ip6_route_add+0x2a/0x1aa net/ipv6/route.c:3889
[<
ffffffff85b02e08>] addrconf_prefix_route+0x2c4/0x4e8 net/ipv6/addrconf.c:2487
[<
ffffffff85b23bb2>] addrconf_prefix_rcv+0x1720/0x1e62 net/ipv6/addrconf.c:2878
[<
ffffffff85b92664>] ndisc_router_discovery+0x1a06/0x3504 net/ipv6/ndisc.c:1570
[<
ffffffff85b99038>] ndisc_rcv+0x500/0x600 net/ipv6/ndisc.c:1874
[<
ffffffff85bc2c18>] icmpv6_rcv+0x145e/0x1e0a net/ipv6/icmp.c:988
[<
ffffffff85af6798>] ip6_protocol_deliver_rcu+0x18a/0x1976 net/ipv6/ip6_input.c:436
[<
ffffffff85af8078>] ip6_input_finish+0xf4/0x174 net/ipv6/ip6_input.c:480
[<
ffffffff85af8262>] NF_HOOK include/linux/netfilter.h:317 [inline]
[<
ffffffff85af8262>] NF_HOOK include/linux/netfilter.h:311 [inline]
[<
ffffffff85af8262>] ip6_input+0x16a/0x70c net/ipv6/ip6_input.c:491
[<
ffffffff85af8dcc>] ip6_mc_input+0x5c8/0x1268 net/ipv6/ip6_input.c:588
[<
ffffffff85af6112>] dst_input include/net/dst.h:469 [inline]
[<
ffffffff85af6112>] ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
[<
ffffffff85af6112>] NF_HOOK include/linux/netfilter.h:317 [inline]
[<
ffffffff85af6112>] NF_HOOK include/linux/netfilter.h:311 [inline]
[<
ffffffff85af6112>] ipv6_rcv+0x5ae/0x6e0 net/ipv6/ip6_input.c:309
[<
ffffffff85087e84>] __netif_receive_skb_one_core+0x106/0x16e net/core/dev.c:5977
[<
ffffffff85088104>] __netif_receive_skb+0x2c/0x144 net/core/dev.c:6090
[<
ffffffff850883c6>] netif_receive_skb_internal net/core/dev.c:6176 [inline]
[<
ffffffff850883c6>] netif_receive_skb+0x1aa/0xbf2 net/core/dev.c:6235
[<
ffffffff8328656e>] tun_rx_batched.isra.0+0x430/0x686 drivers/net/tun.c:1485
[<
ffffffff8329ed3a>] tun_get_user+0x2952/0x3d6c drivers/net/tun.c:1938
[<
ffffffff832a21e0>] tun_chr_write_iter+0xc4/0x21c drivers/net/tun.c:1984
[<
ffffffff80b9b9ae>] new_sync_write fs/read_write.c:593 [inline]
[<
ffffffff80b9b9ae>] vfs_write+0x56c/0xa9a fs/read_write.c:686
[<
ffffffff80b9c2be>] ksys_write+0x126/0x228 fs/read_write.c:738
[<
ffffffff80b9c42e>] __do_sys_write fs/read_write.c:749 [inline]
[<
ffffffff80b9c42e>] __se_sys_write fs/read_write.c:746 [inline]
[<
ffffffff80b9c42e>] __riscv_sys_write+0x6e/0x94 fs/read_write.c:746
[<
ffffffff80076912>] syscall_handler+0x94/0x118 arch/riscv/include/asm/syscall.h:112
[<
ffffffff8637e31e>] do_trap_ecall_u+0x396/0x530 arch/riscv/kernel/traps.c:341
[<
ffffffff863a69e2>] handle_exception+0x146/0x152 arch/riscv/kernel/entry.S:197
Fixes:
fa76c1674f2e ("ipv6: Move some validation from ip6_route_info_create() to rtm_to_fib6_config().")
Reported-by: syzbot+4c2358694722d304c44e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
6849b8c3.
a00a0220.1eb5f5.00f0.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611193551.2999991-1-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 11 Jun 2025 17:46:43 +0000 (10:46 -0700)]
net: drv: netdevsim: don't napi_complete() from netpoll
netdevsim supports netpoll. Make sure we don't call napi_complete()
from it, since it may not be scheduled. Breno reports hitting a
warning in napi_complete_done():
WARNING: CPU: 14 PID: 104 at net/core/dev.c:6592 napi_complete_done+0x2cc/0x560
__napi_poll+0x2d8/0x3a0
handle_softirqs+0x1fe/0x710
This is presumably after netpoll stole the SCHED bit prematurely.
Reported-by: Breno Leitao <leitao@debian.org>
Fixes:
3762ec05a9fb ("netdevsim: add NAPI support")
Tested-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250611174643.2769263-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Wed, 11 Jun 2025 13:14:32 +0000 (16:14 +0300)]
net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get()
Check for if ida_alloc() or rhashtable_lookup_get_insert_fast() fails.
Fixes:
17e0accac577 ("net/mlx5: HWS, support complex matchers")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Link: https://patch.msgid.link/aEmBONjyiF6z5yCV@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jesper Dangaard Brouer [Wed, 11 Jun 2025 12:40:04 +0000 (14:40 +0200)]
veth: prevent NULL pointer dereference in veth_xdp_rcv
The veth peer device is RCU protected, but when the peer device gets
deleted (veth_dellink) then the pointer is assigned NULL (via
RCU_INIT_POINTER).
This patch adds a necessary NULL check in veth_xdp_rcv when accessing
the veth peer net_device.
This fixes a bug introduced in commit
dc82a33297fc ("veth: apply qdisc
backpressure on full ptr_ring to reduce TX drops"). The bug is a race
and only triggers when having inflight packets on a veth that is being
deleted.
Reported-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Closes: https://lore.kernel.org/all/
fecfcad0-7a16-42b8-bff2-
66ee83a6e5c4@linux.dev/
Reported-by: syzbot+c4c7bf27f6b0c4bd97fe@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/
683da55e.
a00a0220.d8eae.0052.GAE@google.com/
Fixes:
dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://patch.msgid.link/174964557873.519608.10855046105237280978.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 12 Jun 2025 15:05:53 +0000 (08:05 -0700)]
Merge branch 'net_sched-no-longer-use-qdisc_tree_flush_backlog'
Eric Dumazet says:
====================
net_sched: no longer use qdisc_tree_flush_backlog()
This series is based on a report from Gerrard Tai.
Essentially, all users of qdisc_tree_flush_backlog() are racy.
We must instead use qdisc_purge_queue().
====================
Link: https://patch.msgid.link/20250611111515.1983366-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 11 Jun 2025 11:15:15 +0000 (11:15 +0000)]
net_sched: remove qdisc_tree_flush_backlog()
This function is no longer used after the four prior fixes.
Given all prior uses were wrong, it seems better to remove it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 11 Jun 2025 11:15:14 +0000 (11:15 +0000)]
net_sched: ets: fix a race in ets_qdisc_change()
Gerrard Tai reported a race condition in ETS, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes:
b05972f01e7d ("net: sched: tbf: don't call qdisc_put() while holding tree lock")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 11 Jun 2025 11:15:13 +0000 (11:15 +0000)]
net_sched: tbf: fix a race in tbf_change()
Gerrard Tai reported a race condition in TBF, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes:
b05972f01e7d ("net: sched: tbf: don't call qdisc_put() while holding tree lock")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://patch.msgid.link/20250611111515.1983366-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 11 Jun 2025 11:15:12 +0000 (11:15 +0000)]
net_sched: red: fix a race in __red_change()
Gerrard Tai reported a race condition in RED, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes:
0c8d13ac9607 ("net: sched: red: delay destroying child qdisc on replace")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 11 Jun 2025 11:15:11 +0000 (11:15 +0000)]
net_sched: prio: fix a race in prio_tune()
Gerrard Tai reported a race condition in PRIO, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes:
7b8e0b6e6599 ("net: sched: prio: delay destroying child qdiscs on change")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 11 Jun 2025 08:35:01 +0000 (08:35 +0000)]
net_sched: sch_sfq: reject invalid perturb period
Gerrard Tai reported that SFQ perturb_period has no range check yet,
and this can be used to trigger a race condition fixed in a separate patch.
We want to make sure ctl->perturb_period * HZ will not overflow
and is positive.
Tested:
tc qd add dev lo root sfq perturb -10 # negative value : error
Error: sch_sfq: invalid perturb period.
tc qd add dev lo root sfq perturb
1000000000 # too big : error
Error: sch_sfq: invalid perturb period.
tc qd add dev lo root sfq perturb
2000000 # acceptable value
tc -s -d qd sh dev lo
qdisc sfq 8005: root refcnt 2 limit 127p quantum 64Kb depth 127 flows 128 divisor 1024 perturb 2000000sec
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250611083501.1810459-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxime Chevallier [Fri, 6 Jun 2025 09:43:20 +0000 (11:43 +0200)]
net: phy: phy_caps: Don't skip better duplex macth on non-exact match
When performing a non-exact phy_caps lookup, we are looking for a
supported mode that matches as closely as possible the passed speed/duplex.
Blamed patch broke that logic by returning a match too early in case
the caller asks for half-duplex, as a full-duplex linkmode may match
first, and returned as a non-exact match without even trying to mach on
half-duplex modes.
Reported-by: Jijie Shao <shaojijie@huawei.com>
Closes: https://lore.kernel.org/netdev/
20250603102500.
4ec743cf@fedora/T/#m22ed60ca635c67dc7d9cbb47e8995b2beb5c1576
Tested-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Fixes:
fc81e257d19f ("net: phy: phy_caps: Allow looking-up link caps based on speed and duplex")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250606094321.483602-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Keith Busch [Wed, 11 Jun 2025 20:53:43 +0000 (13:53 -0700)]
io_uring: consistently use rcu semantics with sqpoll thread
The sqpoll thread is dereferenced with rcu read protection in one place,
so it needs to be annotated as an __rcu type, and should consistently
use rcu helpers for access and assignment to make sparse happy.
Since most of the accesses occur under the sqd->lock, we can use
rcu_dereference_protected() without declaring an rcu read section.
Provide a simple helper to get the thread from a locked context.
Fixes:
ac0b8b327a5677d ("io_uring: fix use-after-free of sq->thread in __io_uring_show_fdinfo()")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20250611205343.1821117-1-kbusch@meta.com
[axboe: fold in fix for register.c]
Signed-off-by: Jens Axboe <axboe@kernel.dk>