Denis Arefev [Thu, 20 Mar 2025 09:35:02 +0000 (12:35 +0300)]
drm/amd/pm/smu11: Prevent division by zero
The user can set any speed value.
If speed is greater than UINT_MAX/8, division by zero is possible.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes:
1e866f1fe528 ("drm/amd/pm: Prevent divide by zero")
Signed-off-by: Denis Arefev <arefev@swemel.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
da7dc714a8f8e1c9fc33c57cd63583779a3bef71)
Cc: stable@vger.kernel.org
Alex Deucher [Sun, 6 Apr 2025 21:27:24 +0000 (17:27 -0400)]
drm/amdgpu: cancel gfx idle work in device suspend for s0ix
This is normally handled in the gfx IP suspend callbacks, but
for S0ix, those are skipped because we don't want to touch
gfx. So handle it in device suspend.
Fixes:
b9467983b774 ("drm/amdgpu: add dynamic workload profile switching for gfx10")
Fixes:
963537ca2325 ("drm/amdgpu: add dynamic workload profile switching for gfx11")
Fixes:
5f95a1549555 ("drm/amdgpu: add dynamic workload profile switching for gfx12")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
906ad451675155380c1dc1881a244ebde8e8df0a)
Cc: stable@vger.kernel.org
Kenneth Feng [Fri, 28 Mar 2025 02:34:57 +0000 (10:34 +0800)]
drm/amd/display: pause the workload setting in dm
Pause the workload setting in dm when doing idle optimization
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
b23f81c442ac33af0c808b4bb26333b881669bb7)
Alex Deucher [Wed, 26 Mar 2025 14:54:56 +0000 (10:54 -0400)]
drm/amdgpu/pm/swsmu: implement pause workload profile
Add the callback for implementation for swsmu.
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
92e511d1cecc6a8fa7bdfc8657f16ece9ab4d456)
Alex Deucher [Wed, 26 Mar 2025 14:26:25 +0000 (10:26 -0400)]
drm/amdgpu/pm: add workload profile pause helper
To be used for display idle optimizations when
we want to pause non-default profiles.
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit
6dafb5d4c7cdfc8f994e789d050e29e0d5ca6efd)
Christian König [Thu, 9 Jan 2025 16:57:56 +0000 (11:57 -0500)]
drm/amdgpu: allow pinning DMA-bufs into VRAM if all importers can do P2P
Try pinning into VRAM to allow P2P with RDMA NICs without ODP
support if all attachments can do P2P. If any attachment can't do
P2P just pin into GTT instead.
Acked-by: Simona Vetter <simona.vetter@ffwll.ch>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Tested-by: Pak Nin Lui <pak.lui@amd.com>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Maarten Lankhorst [Thu, 27 Mar 2025 19:51:28 +0000 (20:51 +0100)]
drm/amdgpu: Add cgroups implementation
Similar to xe, enable some simple management of VRAM only.
Reviewed-by: Christian König <christian.koenig@amd.com>
Co-developed-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jay Cornwall [Fri, 21 Mar 2025 18:19:05 +0000 (13:19 -0500)]
drm/amdgpu: Increase KIQ invalidate_tlbs timeout
KIQ invalidate_tlbs request has been seen to marginally exceed the
configured 100 ms timeout on systems under load.
All other KIQ requests in the driver use a 10 second timeout. Use a
similar timeout implementation on the invalidate_tlbs path.
v2: Poll once before msleep
v3: Fix return value
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Cc: Kent Russell <kent.russell@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Thu, 27 Mar 2025 15:50:42 +0000 (11:50 -0400)]
drm/amdkfd: limit sdma queue reset caps flagging for gfx9
ASICs post GFX 9 are being flagged as SDMA per queue reset supported
in the KGD but KFD and scheduler FW currently have no support.
Limit SDMA queue reset capabilities to GFX 9.
Fixes:
ceb7114c961b ("drm/amdkfd: flag per-sdma queue reset supported to user space")
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Mario Limonciello [Thu, 27 Mar 2025 19:07:55 +0000 (14:07 -0500)]
drm/amd/display: Add HP Elitebook 645 to the quirk list for eDP on DP1
[Why]
HP Elitebook 645 has DP0 and DP1 swapped.
[How]
Add HP Elitebook 645 to DP0/DP1 swap quirk list.
Cc: stable@vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3701
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 6 Mar 2025 17:29:20 +0000 (11:29 -0600)]
drm/amd/display: Add HP Probook 445 and 465 to the quirk list for eDP on DP1
[Why]
HP Probook 445 and 465 has DP0 and DP1 swapped.
[How]
Add HP Probook 445 and 465 to DP0/DP1 swap quirk list.
Cc: stable@vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3995
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Anson Tsao <anson.tsao@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Huacai Chen [Thu, 27 Mar 2025 09:53:34 +0000 (17:53 +0800)]
drm/amd/display: Protect FPU in dml2_validate()/dml21_validate()
Commit
7da55c27e76749b9 ("drm/amd/display: Remove incorrect FP context
start") removes the FP context protection of dml2_create(), and it said
"All the DC_FP_START/END should be used before call anything from DML2".
However, dml2_validate()/dml21_validate() are not protected from their
callers, causing such errors:
do_fpu invoked from kernel context![#1]:
CPU: 10 UID: 0 PID: 331 Comm: kworker/10:1H Not tainted 6.14.0-rc6+ #4
Workqueue: events_highpri dm_irq_work_func [amdgpu]
pc
ffff800003191eb0 ra
ffff800003191e60 tp
9000000107a94000 sp
9000000107a975b0
a0
9000000140ce4910 a1
0000000000000000 a2
9000000140ce49b0 a3
9000000140ce49a8
a4
9000000140ce49a8 a5
0000000100000000 a6
0000000000000001 a7
9000000107a97660
t0
ffff800003790000 t1
9000000140ce5000 t2
0000000000000001 t3
0000000000000000
t4
0000000000000004 t5
0000000000000000 t6
0000000000000000 t7
0000000000000000
t8
0000000100000000 u0
ffff8000031a3b9c s9
9000000130bc0000 s0
9000000132400000
s1
9000000140ec0000 s2
9000000132400000 s3
9000000140ce0000 s4
90000000057f8b88
s5
9000000140ec0000 s6
9000000140ce4910 s7
0000000000000001 s8
9000000130d45010
ra:
ffff800003191e60 dml21_map_dc_state_into_dml_display_cfg+0x40/0x1140 [amdgpu]
ERA:
ffff800003191eb0 dml21_map_dc_state_into_dml_display_cfg+0x90/0x1140 [amdgpu]
CRMD:
000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
PRMD:
00000004 (PPLV0 +PIE -PWE)
EUEN:
00000000 (-FPE -SXE -ASXE -BTE)
ECFG:
00071c1d (LIE=0,2-4,10-12 VS=7)
ESTAT:
000f0000 [FPD] (IS= ECode=15 EsubCode=0)
PRID:
0014d010 (Loongson-64bit, Loongson-3C6000/S)
Process kworker/10:1H (pid: 331, threadinfo=
000000007bf9ddb0, task=
00000000cc4ab9f3)
Stack :
0000000100000000 0000043800000780 0000000100000001 0000000100000001
0000000000000000 0000078000000000 0000000000000438 0000078000000000
0000000000000438 0000078000000000 0000000000000438 0000000100000000
0000000100000000 0000000100000000 0000000100000000 0000000100000000
0000000000000001 9000000140ec0000 9000000132400000 9000000132400000
ffff800003408000 ffff800003408000 9000000132400000 9000000140ce0000
9000000140ce0000 ffff800003193850 0000000000000001 9000000140ec0000
9000000132400000 9000000140ec0860 9000000140ec0738 0000000000000001
90000001405e8000 9000000130bc0000 9000000140ec02a8 ffff8000031b5db8
0000000000000000 0000043800000780 0000000000000003 ffff8000031b79cc
...
Call Trace:
[<
ffff800003191eb0>] dml21_map_dc_state_into_dml_display_cfg+0x90/0x1140 [amdgpu]
[<
ffff80000319384c>] dml21_validate+0xcc/0x520 [amdgpu]
[<
ffff8000031b8948>] dc_validate_global_state+0x2e8/0x460 [amdgpu]
[<
ffff800002e94034>] create_validate_stream_for_sink+0x3d4/0x420 [amdgpu]
[<
ffff800002e940e4>] amdgpu_dm_connector_mode_valid+0x64/0x240 [amdgpu]
[<
900000000441d6b8>] drm_connector_mode_valid+0x38/0x80
[<
900000000441d824>] __drm_helper_update_and_validate+0x124/0x3e0
[<
900000000441ddc0>] drm_helper_probe_single_connector_modes+0x2e0/0x620
[<
90000000044050dc>] drm_client_modeset_probe+0x23c/0x1780
[<
9000000004420384>] __drm_fb_helper_initial_config_and_unlock+0x44/0x5a0
[<
9000000004403acc>] drm_client_dev_hotplug+0xcc/0x140
[<
ffff800002e9ab50>] handle_hpd_irq_helper+0x1b0/0x1e0 [amdgpu]
[<
90000000038f5da0>] process_one_work+0x160/0x300
[<
90000000038f6718>] worker_thread+0x318/0x440
[<
9000000003901b8c>] kthread+0x12c/0x220
[<
90000000038b1484>] ret_from_kernel_thread+0x8/0xa4
Unfortunately, protecting dml2_validate()/dml21_validate() out of DML2
causes "sleeping function called from invalid context", so protect them
with DC_FP_START() and DC_FP_END() inside.
Fixes:
7da55c27e767 ("drm/amd/display: Remove incorrect FP context start")
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Tested-by: Dongyan Qian <qiandongyan@loongson.cn>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Huacai Chen [Thu, 27 Mar 2025 09:53:33 +0000 (17:53 +0800)]
drm/amd/display: Protect FPU in dml2_init()/dml21_init()
Commit
7da55c27e76749b9 ("drm/amd/display: Remove incorrect FP context
start") removes the FP context protection of dml2_create(), and it said
"All the DC_FP_START/END should be used before call anything from DML2".
However, dml2_init()/dml21_init() are not protected from their callers,
causing such errors:
do_fpu invoked from kernel context![#1]:
CPU: 0 UID: 0 PID: 239 Comm: kworker/0:5 Not tainted 6.14.0-rc6+ #2
Workqueue: events work_for_cpu_fn
pc
ffff80000319de80 ra
ffff80000319de5c tp
900000010575c000 sp
900000010575f840
a0
0000000000000000 a1
900000012f210130 a2
900000012f000000 a3
ffff80000357e268
a4
ffff80000357e260 a5
900000012ea52cf0 a6
0000000400000004 a7
0000012c00001388
t0
00001900000015e0 t1
ffff80000379d000 t2
0000000010624dd3 t3
0000006400000014
t4
00000000000003e8 t5
0000005000000018 t6
0000000000000020 t7
0000000f00000064
t8
000000000000002f u0
5f5e9200f8901912 s9
900000012d380010 s0
900000012ea51fd8
s1
900000012f000000 s2
9000000109296000 s3
0000000000000001 s4
0000000000001fd8
s5
0000000000000001 s6
ffff800003415000 s7
900000012d390000 s8
ffff800003211f80
ra:
ffff80000319de5c dml21_apply_soc_bb_overrides+0x3c/0x960 [amdgpu]
ERA:
ffff80000319de80 dml21_apply_soc_bb_overrides+0x60/0x960 [amdgpu]
CRMD:
000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
PRMD:
00000004 (PPLV0 +PIE -PWE)
EUEN:
00000000 (-FPE -SXE -ASXE -BTE)
ECFG:
00071c1d (LIE=0,2-4,10-12 VS=7)
ESTAT:
000f0000 [FPD] (IS= ECode=15 EsubCode=0)
PRID:
0014d010 (Loongson-64bit, Loongson-3C6000/S)
Process kworker/0:5 (pid: 239, threadinfo=
00000000927eadc6, task=
000000008fd31682)
Stack :
00040dc000003164 0000000000000001 900000012f210130 900000012eabeeb8
900000012f000000 ffff80000319fe48 900000012f210000 900000012f210130
900000012f000000 900000012eabeeb8 0000000000000001 ffff8000031a0064
900000010575f9f0 900000012f210130 900000012eac0000 900000012ea80000
900000012f000000 ffff8000031cefc4 900000010575f9f0 ffff8000035859c0
ffff800003414000 900000010575fa78 900000012f000000 ffff8000031b4c50
0000000000000000 9000000101c9d700 9000000109c40000 5f5e9200f8901912
900000012d3c4bd0 900000012d3c5000 ffff8000034aed18 900000012d380010
900000012d3c4bd0 ffff800003414000 900000012d380000 ffff800002ea49dc
0000000000000001 900000012d3c6000 00000000ffffe423 0000000000010000
...
Call Trace:
[<
ffff80000319de80>] dml21_apply_soc_bb_overrides+0x60/0x960 [amdgpu]
[<
ffff80000319fe44>] dml21_init+0xa4/0x280 [amdgpu]
[<
ffff8000031a0060>] dml21_create+0x40/0x80 [amdgpu]
[<
ffff8000031cefc0>] dc_state_create+0x100/0x160 [amdgpu]
[<
ffff8000031b4c4c>] dc_create+0x44c/0x640 [amdgpu]
[<
ffff800002ea49d8>] amdgpu_dm_init+0x3f8/0x2060 [amdgpu]
[<
ffff800002ea6658>] dm_hw_init+0x18/0x60 [amdgpu]
[<
ffff800002b16738>] amdgpu_device_init+0x1938/0x27e0 [amdgpu]
[<
ffff800002b18e80>] amdgpu_driver_load_kms+0x20/0xa0 [amdgpu]
[<
ffff800002b0c8f0>] amdgpu_pci_probe+0x1b0/0x580 [amdgpu]
[<
900000000448eae4>] local_pci_probe+0x44/0xc0
[<
9000000003b02b18>] work_for_cpu_fn+0x18/0x40
[<
9000000003b05da0>] process_one_work+0x160/0x300
[<
9000000003b06718>] worker_thread+0x318/0x440
[<
9000000003b11b8c>] kthread+0x12c/0x220
[<
9000000003ac1484>] ret_from_kernel_thread+0x8/0xa4
Unfortunately, protecting dml2_init()/dml21_init() out of DML2 causes
"sleeping function called from invalid context", so protect them with
DC_FP_START() and DC_FP_END() inside.
Fixes:
7da55c27e767 ("drm/amd/display: Remove incorrect FP context start")
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Huacai Chen [Thu, 27 Mar 2025 09:53:32 +0000 (17:53 +0800)]
drm/amd/display: Protect FPU in dml21_copy()
Commit
7da55c27e76749b9 ("drm/amd/display: Remove incorrect FP context
start") removes the FP context protection of dml2_create(), and it said
"All the DC_FP_START/END should be used before call anything from DML2".
However, dml21_copy() are not protected from their callers, causing such
errors:
do_fpu invoked from kernel context![#1]:
CPU: 0 UID: 0 PID: 240 Comm: kworker/0:5 Not tainted 6.14.0-rc6+ #1
Workqueue: events work_for_cpu_fn
pc
ffff80000318bd2c ra
ffff80000315750c tp
9000000105910000 sp
9000000105913810
a0
0000000000000000 a1
0000000000000002 a2
900000013140d728 a3
900000013140d720
a4
0000000000000000 a5
9000000131592d98 a6
0000000000017ae8 a7
00000000001312d0
t0
9000000130751ff0 t1
ffff800003790000 t2
ffff800003790000 t3
9000000131592e28
t4
000000000004c6a8 t5
00000000001b7740 t6
0000000000023e38 t7
0000000000249f00
t8
0000000000000002 u0
0000000000000000 s9
900000012b010000 s0
9000000131400000
s1
9000000130751fd8 s2
ffff800003408000 s3
9000000130752c78 s4
9000000131592da8
s5
9000000131592120 s6
9000000130751ff0 s7
9000000131592e28 s8
9000000131400008
ra:
ffff80000315750c dml2_top_soc15_initialize_instance+0x20c/0x300 [amdgpu]
ERA:
ffff80000318bd2c mcg_dcn4_build_min_clock_table+0x14c/0x600 [amdgpu]
CRMD:
000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
PRMD:
00000004 (PPLV0 +PIE -PWE)
EUEN:
00000000 (-FPE -SXE -ASXE -BTE)
ECFG:
00071c1d (LIE=0,2-4,10-12 VS=7)
ESTAT:
000f0000 [FPD] (IS= ECode=15 EsubCode=0)
PRID:
0014d010 (Loongson-64bit, Loongson-3C6000/S)
Process kworker/0:5 (pid: 240, threadinfo=
00000000f1700428, task=
0000000020d2e962)
Stack :
0000000000000000 0000000000000000 0000000000000000 9000000130751fd8
9000000131400000 ffff8000031574e0 9000000130751ff0 0000000000000000
9000000131592e28 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 f9175936df5d7fd2
900000012b00ff08 900000012b000000 ffff800003409000 ffff8000034a1780
90000001019634c0 900000012b000010 90000001307beeb8 90000001306b0000
0000000000000001 ffff8000031942b4 9000000130780000 90000001306c0000
9000000130780000 ffff8000031c276c 900000012b044bd0 ffff800003408000
...
Call Trace:
[<
ffff80000318bd2c>] mcg_dcn4_build_min_clock_table+0x14c/0x600 [amdgpu]
[<
ffff800003157508>] dml2_top_soc15_initialize_instance+0x208/0x300 [amdgpu]
[<
ffff8000031942b0>] dml21_create_copy+0x30/0x60 [amdgpu]
[<
ffff8000031c2768>] dc_state_create_copy+0x68/0xe0 [amdgpu]
[<
ffff800002e98ea0>] amdgpu_dm_init+0x8c0/0x2060 [amdgpu]
[<
ffff800002e9a658>] dm_hw_init+0x18/0x60 [amdgpu]
[<
ffff800002b0a738>] amdgpu_device_init+0x1938/0x27e0 [amdgpu]
[<
ffff800002b0ce80>] amdgpu_driver_load_kms+0x20/0xa0 [amdgpu]
[<
ffff800002b008f0>] amdgpu_pci_probe+0x1b0/0x580 [amdgpu]
[<
9000000003c7eae4>] local_pci_probe+0x44/0xc0
[<
90000000032f2b18>] work_for_cpu_fn+0x18/0x40
[<
90000000032f5da0>] process_one_work+0x160/0x300
[<
90000000032f6718>] worker_thread+0x318/0x440
[<
9000000003301b8c>] kthread+0x12c/0x220
[<
90000000032b1484>] ret_from_kernel_thread+0x8/0xa4
Unfortunately, protecting dml21_copy() out of DML2 causes "sleeping
function called from invalid context", so protect them with DC_FP_START()
and DC_FP_END() inside.
Fixes:
7da55c27e767 ("drm/amd/display: Remove incorrect FP context start")
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tom Chung [Wed, 19 Mar 2025 08:31:31 +0000 (16:31 +0800)]
drm/amd/display: Do not enable Replay and PSR while VRR is on in amdgpu_dm_commit_planes()
[Why]
Replay and PSR will cause some video corruption while VRR is enabled.
[How]
Do not enable the Replay and PSR while VRR is active in
amdgpu_dm_enable_self_refresh().
Fixes:
67edb81d6e9a ("drm/amd/display: Disable replay and psr while VRR is enabled")
Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Emily Deng [Fri, 28 Mar 2025 10:14:17 +0000 (18:14 +0800)]
drm/amdkfd: sriov doesn't support per queue reset
Disable per queue reset for sriov.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Flora Cui [Wed, 26 Mar 2025 12:06:13 +0000 (20:06 +0800)]
drm/amdgpu/ip_discovery: add missing ip_discovery fw
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Matthew Auld [Mon, 7 Apr 2025 14:18:25 +0000 (15:18 +0100)]
drm/amdgpu/dma_buf: fix page_link check
The page_link lower bits of the first sg could contain something like
SG_END, if we are mapping a single VRAM page or contiguous blob which
fits into one sg entry. Rather pull out the struct page, and use that in
our check to know if we mapped struct pages vs VRAM.
Fixes:
f44ffd677fb3 ("drm/amdgpu: add support for exporting VRAM using DMA-buf v3")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.8+
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 20 Mar 2025 13:46:18 +0000 (14:46 +0100)]
drm/amdgpu: immediately use GTT for new allocations
Only use GTT as a fallback if we already have a backing store. This
prevents evictions when an application constantly allocates and frees new
memory.
Partially fixes
https://gitlab.freedesktop.org/drm/amd/-/issues/3844#note_2833985.
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes:
216c1282dde3 ("drm/amdgpu: use GTT only as fallback for VRAM|GTT")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Alex Deucher [Thu, 27 Mar 2025 21:33:49 +0000 (17:33 -0400)]
drm/amdgpu/mes11: optimize MES pipe FW version fetching
Don't fetch it again if we already have it. It seems the
registers don't reliably have the value at resume in some
cases.
Fixes:
028c3fb37e70 ("drm/amdgpu/mes11: initiate mes v11 support")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4083
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Alex Deucher [Thu, 20 Mar 2025 16:09:11 +0000 (12:09 -0400)]
drm/amdgpu/gfx12: fix num_mec
GC12 only has 1 mec.
Fixes:
52cb80c12e8a ("drm/amdgpu: Add gfx v12_0 ip block support (v6)")
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 26 Mar 2025 13:35:02 +0000 (09:35 -0400)]
drm/amdgpu/gfx11: fix num_mec
GC11 only has 1 mec.
Fixes:
3d879e81f0f9 ("drm/amdgpu: add init support for GFX11 (v2)")
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Wed, 12 Mar 2025 06:07:49 +0000 (14:07 +0800)]
drm/amd/pm: Add gpu_metrics_v1_8
Add new gpu_metrics_v1_8 to acquire below host limit counters
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Tue, 25 Mar 2025 06:12:08 +0000 (11:42 +0530)]
drm/amdgpu: Prefer shadow rom when available
Fetch VBIOS from shadow ROM when available before trying other methods
like EFI method.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Fixes:
9c081c11c621 ("drm/amdgpu: Reorder to read EFI exported ROM first")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4066
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Asad Kamal [Mon, 17 Mar 2025 06:17:51 +0000 (14:17 +0800)]
drm/amd/pm: Update smu metrics table for smu_v13_0_6
Update smu metrics table to vesrion 0x10 for smu_v13_0_6
v2: Host metrics support removal moved to separate patch (Lijo)
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Mon, 17 Mar 2025 06:16:04 +0000 (14:16 +0800)]
drm/amd/pm: Remove host limit metrics support
Firmware algorithm changed and the values in this version
are not accurate thereby remove host limit metric support
for smu_v13_0_6, smu_v13_0_12 & smu_v13_0_14
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Candice Li [Wed, 26 Mar 2025 05:41:01 +0000 (13:41 +0800)]
Remove unnecessary firmware version check for gc v9_4_2
GC v9_4_2 uses a new versioning scheme for CP firmware, making
the warning ("CP firmware version too old, please update!") irrelevant.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Christian König [Tue, 18 Mar 2025 15:15:12 +0000 (16:15 +0100)]
drm/amdgpu: stop unmapping MQD for kernel queues v3
This looks unnecessary and actually extremely harmful since using kmap()
is not possible while inside the ring reset.
Remove all the extra mapping and unmapping of the MQDs.
v2: also fix debugfs
v3: fix coding style typo
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.zhang@amd.com [Tue, 25 Mar 2025 07:01:19 +0000 (15:01 +0800)]
Revert "drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA"
this temporarily reverts
commit
6ec04e38b2f6 ("drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA")
it cause a regression.
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Mon, 24 Mar 2025 09:19:54 +0000 (17:19 +0800)]
drm/amdgpu: Parse all deferred errors with UMC aca handle
We should only increase the deferred errors in UMC block.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stanley.Yang [Tue, 25 Mar 2025 03:10:43 +0000 (11:10 +0800)]
drm/amdgpu: Update ta ras block
Update ta ra block to keep sync with RAS TA.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Mon, 24 Mar 2025 07:56:26 +0000 (13:26 +0530)]
drm/amdgpu: Add NPS2 to DPX compatible mode
Compute partition DPX is possible in NPS2 mode. Update the compatible
modes for DPX.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Fri, 21 Mar 2025 12:47:23 +0000 (20:47 +0800)]
drm/amdgpu: Use correct gfx deferred error count
In the case of parsing GFX deferred error from SMU corrected error
channel, the error count should be set to 1 instead of parsing from
MISC0 register, which is 0.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leo Li [Tue, 11 Mar 2025 13:43:38 +0000 (09:43 -0400)]
drm/amd/display: Actually do immediate vblank disable
[Why]
The `vblank_config.offdelay` field follows the same semantics as the
`drm_vblank_offdelay` parameter. Setting it to 0 will never disable
vblank.
[How]
Set `offdelay` to a positive number.
Fixes:
e45b6716de4b ("drm/amd/display: use a more lax vblank enable policy for DCN35+")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Brendan Tam [Fri, 14 Mar 2025 17:09:13 +0000 (13:09 -0400)]
drm/amd/display: prevent hang on link training fail
[Why]
When link training fails, the phy clock will be disabled. However, in
enable_streams, it is assumed that link training succeeded and the
mux selects the phy clock, causing a hang when a register write is made.
[How]
When enable_stream is hit, check if link training failed. If it did, fall
back to the ref clock to avoid a hang and keep the system in a recoverable
state.
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Brendan Tam <Brendan.Tam@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Charlene Liu [Fri, 14 Mar 2025 00:07:13 +0000 (20:07 -0400)]
Revert "drm/amd/display: dml2 soc dscclk use DPM table clk setting"
[why]
this dscclk use DCN defined per DPM level will cause a DCFCLK increase.
needs to follow up.
This reverts commit
15b959534a39530a21d378190557cc8d1eab7b09
Reviewed-by: Yihan Zhu <yihan.zhu@amd.com>
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leo Li [Mon, 10 Mar 2025 16:20:39 +0000 (12:20 -0400)]
drm/amd/display: Increase vblank offdelay for PSR panels
[Why]
Depending on when the HW latching event (vupdate) of double-buffered
registers happen relative to the PSR SDP (signals panel psr enter/exit)
deadline, and how bad the Panel clock has drifted since the last ALPM
off event, there can be up to 3 frames of delay between sending the PSR
exit cmd to DMUB fw, and when the panel starts displaying live frames.
This can manifest as micro-stuttering when userspace commit patterns
cause rapid toggling of the DRM vblank counter, since PSR enter/exit is
hooked up to DRM vblank disable/enable respectively.
In the ideal world, the panel should present the live frame immediately
on PSR exit cmd. But due to HW design and PSR limitations, immediate
exit can only happen by chance, when:
1. PSR exit cmd is ack'd by FW before HW latching (vupdate) event, and
2. Panel's SDP deadline -- determined by it's PSR Start Delay in DPCD
71h -- is after the vupdate event. The PSR exit SDP can then be sent
immediately after HW latches. Otherwise, we have to wait 1 frame. And
3. There is negligible drift between the panel's clock and source clock.
Otherwise, there can be up to 1 frame of drift.
Note that this delay is not expected with Panel Replay.
[How]
Since PSR power savings can be quite substantial, and there are a lot of
systems in the wild with PSR panels, It'll be nice to have a middle
ground that balances user experience with power savings.
A simple way to achieve this is by extending the vblank offdelay, such
that additional PSR exit delays will be less perceivable.
We can set:
20/100 * offdelay_ms = 3_frames_ms
=> offdelay_ms = 5 * 3_frames_ms
This ensures that `3_frames_ms` will only be experienced as a 20% delay
on top how long the panel has been static, and thus make the delay
less perceivable.
If this ends up being too high of a percentage, it can be dropped
further in a future change.
Fixes:
537ef0f88897 ("drm/amd/display: use new vblank enable policy for DCN35+")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Mario Limonciello [Wed, 19 Mar 2025 16:58:31 +0000 (11:58 -0500)]
drm/amd: Handle being compiled without SI or CIK support better
If compiled without SI or CIK support but amdgpu tries to load it
will run into failures with uninitialized callbacks.
Show a nicer message in this case and fail probe instead.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4050
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Tomasz Pakuła [Tue, 25 Mar 2025 10:26:52 +0000 (11:26 +0100)]
drm/amd/pm: Add zero RPM enabled OD setting support for SMU14.0.2
Hook up zero RPM enable for 9070 and 9070 XT based on RDNA3
(smu 13.0.0 and 13.0.7) code.
Tested on 9070 XT Hellhound
Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.12.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Denis Arefev [Fri, 21 Mar 2025 11:08:16 +0000 (14:08 +0300)]
drm/amd/pm: Prevent division by zero
The user can set any speed value.
If speed is greater than UINT_MAX/8, division by zero is possible.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes:
c52dcf49195d ("drm/amd/pp: Avoid divide-by-zero in fan_ctrl_set_fan_speed_rpm")
Signed-off-by: Denis Arefev <arefev@swemel.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Denis Arefev [Fri, 21 Mar 2025 11:08:15 +0000 (14:08 +0300)]
drm/amd/pm: Prevent division by zero
The user can set any speed value.
If speed is greater than UINT_MAX/8, division by zero is possible.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes:
c52dcf49195d ("drm/amd/pp: Avoid divide-by-zero in fan_ctrl_set_fan_speed_rpm")
Signed-off-by: Denis Arefev <arefev@swemel.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Denis Arefev [Fri, 21 Mar 2025 10:52:33 +0000 (13:52 +0300)]
drm/amd/pm: Prevent division by zero
The user can set any speed value.
If speed is greater than UINT_MAX/8, division by zero is possible.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes:
031db09017da ("drm/amd/powerplay/vega20: enable fan RPM and pwm settings V2")
Signed-off-by: Denis Arefev <arefev@swemel.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Denis Arefev [Fri, 21 Mar 2025 10:52:32 +0000 (13:52 +0300)]
drm/amd/pm: Prevent division by zero
The user can set any speed value.
If speed is greater than UINT_MAX/8, division by zero is possible.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes:
b64625a303de ("drm/amd/pm: correct the address of Arcturus fan related registers")
Signed-off-by: Denis Arefev <arefev@swemel.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Denis Arefev [Fri, 21 Mar 2025 10:52:31 +0000 (13:52 +0300)]
drm/amd/pm: Prevent division by zero
The user can set any speed value.
If speed is greater than UINT_MAX/8, division by zero is possible.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes:
c05d1c401572 ("drm/amd/swsmu: add aldebaran smu13 ip support (v3)")
Signed-off-by: Denis Arefev <arefev@swemel.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Dave Airlie [Mon, 24 Mar 2025 22:21:06 +0000 (08:21 +1000)]
Merge tag 'drm-intel-gt-next-2025-03-12' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
UAPI Changes:
- Increase I915_PARAM_MMAP_GTT_VERSION version to indicate support for partial mmaps (José Roberto de Souza)
Driver Changes:
Fixes/improvements/new stuff:
- Implement vmap/vunmap GEM object functions (Asbjørn Sloth Tønnesen)
Miscellaneous:
- Various register definition cleanups (Ville Syrjälä)
- Fix typo in a comment [gt/uc] (Yuichiro Tsuji)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Z9IXs5CzHHKScuQn@linux
Dave Airlie [Mon, 24 Mar 2025 07:57:13 +0000 (17:57 +1000)]
Merge tag 'amd-drm-next-6.15-2025-03-21' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.15-2025-03-21:
amdgpu:
- Refine nomodeset handling
- RAS fixes
- DCN 3.x fixes
- DMUB fixes
- eDP fixes
- SMU 14.0.2 fixes
- SMU 13.0.6 fixes
- SMU 13.0.12 fixes
- SDMA engine reset fixes
- Enforce Isolation fixes
- Runtime workload profile ref count fixes
- Documentation fixes
- SR-IOV fixes
- MES fixes
- GC 11.5 cleaner shader support
- SDMA VM invalidation fixes
- IP discovery improvements for GC based chips
amdkfd:
- Dequeue wait count fixes
- Precise memops fixes
radeon:
- Code cleanup
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250321210909.2809595-1-alexander.deucher@amd.com
Dave Airlie [Mon, 24 Mar 2025 07:56:31 +0000 (17:56 +1000)]
Merge tag 'amd-drm-next-6.15-2025-03-14' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.15-2025-03-14:
amdgpu:
- GC 12.x DCC fixes
- VCN 2.5 fix
- Replay/PSR fixes
- HPD fixes
- DMUB fixes
- Backlight fixes
- DM suspend/resume cleanup
- Misc DC fixes
- HDCP UAF fix
- Misc code cleanups
- VCE 2.x fix
- Wedged event support
- GC 12.x PTE fixes
- Misc multimedia cap fixes
- Enable unique id support for GC 12.x
- XGMI code cleanup
- GC 11.x and 12.x MQD cleanups
- SMU 13.x updates
- SMU 14.x fan speed reporting
- Enable VCN activity reporting for additional chips
- SR-IOV fixes
- RAS fixes
- MES fixes
amdkfd:
- Dequeue wait count API cleanups
- Queue eviction cleanup fixes
- Retry fault fixes
- Dequeue retry timeout adjustments
- GC 12.x trap handler fixes
- GC 9.5.x updates
radeon:
- VCE command parser fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250314170618.3142042-1-alexander.deucher@amd.com
Dave Airlie [Mon, 24 Mar 2025 07:52:24 +0000 (17:52 +1000)]
Merge tag 'drm-misc-next-fixes-2025-03-13' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
Short summary of fixes pull:
appletbdrm:
- Fix device refcount
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20250313180135.GA276891@linux.fritz.box
Asad Kamal [Mon, 17 Mar 2025 12:07:06 +0000 (20:07 +0800)]
drm/amd/pm: Update feature list for smu_v13_0_6
Update feature list for smu_v13_0_6 to show vcn & smu deep
sleep feature enable status
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Fri, 21 Mar 2025 01:23:47 +0000 (06:53 +0530)]
drm/amdgpu: Add parameter documentation for amdgpu_sync_fence
The 'flags' parameter, which specifies memory allocation behavior while
creating a sync entry,
Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c:162: warning: Function parameter or struct member 'flags' not described in 'amdgpu_sync_fence'
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 11 Mar 2025 22:00:57 +0000 (18:00 -0400)]
drm/amdgpu/discovery: optionally use fw based ip discovery
On chips without native IP discovery support, use the fw binary
if available, otherwise we can continue without it.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Flora Cui [Tue, 11 Mar 2025 05:44:08 +0000 (13:44 +0800)]
drm/amdgpu/discovery: use specific ip_discovery.bin for legacy asics
vega10/vega12/vega20/raven/raven2/picasso/arcturus/aldebaran
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Flora Cui [Thu, 27 Feb 2025 02:39:27 +0000 (10:39 +0800)]
drm/amdgpu/discovery: check ip_discovery fw file available
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Thu, 20 Mar 2025 07:01:38 +0000 (15:01 +0800)]
drm/amd/pm: Remove unnecessay UQ10 to UINT conversion
Few of the metrics data for smu_v13_0_12 has not been reported
in Q10 format, remove UQ10 to UINT conversion for those
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Wed, 19 Mar 2025 11:18:43 +0000 (19:18 +0800)]
drm/amd/pm: Remove unnecessay UQ10 to UINT conversion
Few of the metrics data for smu_v13_0_6 has not been reported
in Q10 format, remove UQ10 to UINT conversion for those
v2: Move smu_v13_0_12 changes to separate patch(Kevin)
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.zhang@amd.com [Thu, 27 Feb 2025 11:33:19 +0000 (19:33 +0800)]
drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA
This commit updates the VM flush implementation for the SDMA engine.
- Added a new function `sdma_v4_4_2_get_invalidate_req` to construct the VM_INVALIDATE_ENG0_REQ
register value for the specified VMID and flush type. This function ensures that all relevant
page table cache levels (L1 PTEs, L2 PTEs, and L2 PDEs) are invalidated.
- Modified the `sdma_v4_4_2_ring_emit_vm_flush` function to use the new `sdma_v4_4_2_get_invalidate_req`
function. The updated function emits the necessary register writes and waits to perform a VM flush
for the specified VMID. It updates the PTB address registers and issues a VM invalidation request
using the specified VM invalidation engine.
- Included the necessary header file `gc/gc_9_0_sh_mask.h` to provide access to the required register
definitions.
v2: vm flush by the vm inalidation packet (Lijo)
v3: code stle and define thh macro for the vm invalidation packet (Christian)
v4: Format definition sdma vm invalidate packet (Lijo)
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.zhang@amd.com [Tue, 25 Feb 2025 07:25:00 +0000 (15:25 +0800)]
drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush
- Modify the VM invalidation engine allocation logic to handle SDMA page rings.
SDMA page rings now share the VM invalidation engine with SDMA gfx rings instead of
allocating a separate engine. This change ensures efficient resource management and
avoids the issue of insufficient VM invalidation engines.
- Add synchronization for GPU TLB flush operations in gmc_v9_0.c.
Use spin_lock and spin_unlock to ensure thread safety and prevent race conditions
during TLB flush operations. This improves the stability and reliability of the driver,
especially in multi-threaded environments.
v2: replace the sdma ring check with a function `amdgpu_sdma_is_page_queue`
to check if a ring is an SDMA page queue.(Lijo)
v3: Add GC version check, only enabled on GC9.4.3/9.4.4/9.5.0
v4: Fix code style and add more detailed description (Christian)
v5: Remove dependency on vm_inv_eng loop order, explicitly lookup shared inv_eng(Christian/Lijo)
v6: Added search shared ring function amdgpu_sdma_get_shared_ring (Lijo)
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.zhang@amd.com [Fri, 28 Feb 2025 08:18:32 +0000 (16:18 +0800)]
drm/amd/amdgpu: Increase max rings to enable SDMA page ring
Increase the maximum number of rings supported by the AMDGPU driver from 133 to 149.
This change is necessary to enable support for the SDMA page ring.
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Wed, 19 Mar 2025 09:02:49 +0000 (17:02 +0800)]
drm/amdgpu: Decode deferred error type in gfx aca bank parser
In the case of injecting uncorrected error with background workload,
the deferred error among uncorrected errors need to be specified
by checking the deferred and poison bits of status register.
v2: refine checking for deferred error
v2: log possiable DEs among CEs
v2: generate CPER records for DEs among UEs
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Thu, 20 Mar 2025 16:01:35 +0000 (21:31 +0530)]
drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.5 GPUs
Enable the cleaner shader for GFX11.5.0/11.5.1 GPUs to provide data
isolation between GPU workloads. The cleaner shader is responsible for
clearing the Local Data Store (LDS), Vector General Purpose Registers
(VGPRs), and Scalar General Purpose Registers (SGPRs), which helps
prevent data leakage and ensures accurate computation results.
This update extends cleaner shader support to GFX11.5.0/11.5.1 GPUs,
previously available for GFX11.0.3. It enhances security by clearing GPU
memory between processes and maintains a consistent GPU state across KGD
and KFD workloads.
Cc: Mario Sopena-Novales <mario.novales@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 19 Mar 2025 14:03:31 +0000 (10:03 -0400)]
drm/amdgpu/mes: clean up SDMA HQD loop
Follow the same logic as the other IP types.
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 19 Mar 2025 14:42:45 +0000 (10:42 -0400)]
drm/amdgpu/mes: enable compute pipes across all MEC
Enable pipes on both MECs for MES.
Fixes:
745f46b6a99f ("drm/amdgpu: enable mes v12 self test")
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 19 Mar 2025 13:57:51 +0000 (09:57 -0400)]
drm/amdgpu/mes: drop MES 10.x leftovers
Leftover from MES bring up. There is no production
MES support for MES 10.x. The rest of the MES 10.x
code has already been removed so drop this.
Acked-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 19 Mar 2025 13:56:00 +0000 (09:56 -0400)]
drm/amdgpu/mes: optimize compute loop handling
Break when we get to the end of the supported pipes
rather than continuing the loop.
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Mon, 17 Mar 2025 13:45:02 +0000 (09:45 -0400)]
drm/amdgpu/sdma: guilty tracking is per instance
The gfx and page queues are per instance, so track them
per instance.
v2: drop extra parameter (Lijo)
Fixes:
fdbfaaaae06b ("drm/amdgpu: Improve SDMA reset logic with guilty queue tracking")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 14 Mar 2025 23:23:46 +0000 (19:23 -0400)]
drm/amdgpu/sdma: fix engine reset handling
Move the kfd suspend/resume code into the caller. That
is where the KFD is likely to detect a reset so on the KFD
side there is no need to call them. Also add a mutex to
lock the actual reset sequence.
v2: make the locking per instance
Fixes:
bac38ca8c475 ("drm/amdkfd: implement per queue sdma reset for gfx 9.4+")
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Tue, 18 Mar 2025 14:43:41 +0000 (15:43 +0100)]
drm/amdgpu: remove invalid usage of sched.ready
I can't count how often I had to remove this nonsense.
Probably doesn't need an explanation any more.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 6 Feb 2025 13:26:30 +0000 (14:26 +0100)]
drm/amdgpu: add cleaner shader trace point
Note when the cleaner shader is executed.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 6 Feb 2025 13:16:06 +0000 (14:16 +0100)]
drm/amdgpu: add isolation trace point
Note when we switch from one isolation owner to another.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Mon, 27 Jan 2025 15:27:51 +0000 (16:27 +0100)]
drm/amdgpu: stop reserving VMIDs to enforce isolation
That was quite troublesome for gang submit. Completely drop this
approach and enforce the isolation separately.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Mon, 27 Jan 2025 14:09:45 +0000 (15:09 +0100)]
drm/amdgpu: rework how the cleaner shader is emitted v3
Instead of emitting the cleaner shader for every job which has the
enforce_isolation flag set only emit it for the first submission from
every client.
v2: add missing NULL check
v3: fix another NULL pointer deref
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Wed, 15 Jan 2025 12:44:26 +0000 (13:44 +0100)]
drm/amdgpu: rework how isolation is enforced v2
Limiting the number of available VMIDs to enforce isolation causes some
issues with gang submit and applying certain HW workarounds which
require multiple VMIDs to work correctly.
So instead start to track all submissions to the relevant engines in a
per partition data structure and use the dma_fences of the submissions
to enforce isolation similar to what a VMID limit does.
v2: use ~0l for jobs without isolation to distinct it from kernel
submissions which uses NULL for the owner. Add some warning when we
are OOM.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 23 Jan 2025 13:59:01 +0000 (14:59 +0100)]
drm/amdgpu: overwrite signaled fence in amdgpu_sync
This allows using amdgpu_sync even without peeking into the fences for a
long time.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Wed, 15 Jan 2025 14:10:13 +0000 (15:10 +0100)]
drm/amdgpu: use GFP_NOWAIT for memory allocations
In the critical submission path memory allocations can't wait for
reclaim since that can potentially wait for submissions to finish.
Finally clean that up and mark most memory allocations in the critical
path with GFP_NOWAIT. The only exception left is the dma_fence_array()
used when no VMID is available, but that will be cleaned up later on.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Kenneth Feng [Wed, 19 Mar 2025 06:08:14 +0000 (14:08 +0800)]
drm/amd/amdgpu: Revert "drm/amd/amdgpu: shorten the gfx idle worker timeout"
This reverts commit
55ff973fe1c053de143969cfc8b34baff084084a.
Reason for revert: this causes some tests fail with call trace.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ahmad Rehman [Tue, 18 Mar 2025 03:41:47 +0000 (23:41 -0400)]
drm/amdgpu/sdam: Skip SDMA queue reset for SRIOV
For SRIOV, skip the SDMA queue reset and return
error. The engine/queue reset failure will trigger
FLR in the sequence.
v2: do not add queue reset support mask for sriov
Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ahmad Rehman [Mon, 17 Mar 2025 04:20:06 +0000 (00:20 -0400)]
drm/amdgpu: Add support to load PSP TA v13.0.12 for SRIOV
Add case for 13.0.12.
Signed-off-by: Ahmad Rehman <ahrehman@amd.com>
Reviewed-by: Vignesh.Chander@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ellen Pan [Tue, 18 Mar 2025 13:45:18 +0000 (09:45 -0400)]
drm/amdgpu: Enable amdgpu_ras_resume for gfx 9.5.0
This enables ras to be resumed after gpu recovery on mi350 sriov.
Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Reviewed-by: Ahmad Rehman <Ahmad.Rehman@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Fri, 14 Mar 2025 15:08:21 +0000 (11:08 -0400)]
drm/amdkfd: set precise mem ops caps to disabled for gfx 11 and 12
Clause instructions with precise memory enabled currently hang the
shader so set capabilities flag to disabled since it's unsafe to use
for debugging.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: Lancelot Six <lancelot.six@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Victor Skvortsov [Mon, 17 Mar 2025 13:32:13 +0000 (09:32 -0400)]
drm/amdgpu: Skip pcie_replay_count sysfs creation for VF
VFs cannot read the NAK_COUNTER register. This information is only
available through PMFW metrics.
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Candice Li [Wed, 12 Mar 2025 06:21:38 +0000 (14:21 +0800)]
drm/amdgpu: Add active_umc_mask to ras init_flags
Add active_umc_mask to ras init_flags.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Fri, 7 Mar 2025 06:17:28 +0000 (11:47 +0530)]
Documentation/amdgpu: Add debug_mask documentation
Add description for debug_mask bit options.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Fri, 7 Mar 2025 05:41:23 +0000 (11:11 +0530)]
drm/amd/pm: Add debug bit for smu pool allocation
In certain cases, it's desirable to avoid PMFW log transactions to
system memory. Add a mask bit to decide whether to allocate smu pool in
device memory or system memory.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 14 Mar 2025 13:45:51 +0000 (09:45 -0400)]
drm/amdgpu/vcn: adjust workload profile handling
No need to make the workload profile setup dependent
on the results of cancelling the delayed work thread.
We have all of the necessary checking in place for the
workload profile reference counting, so separate the
two. As it is now, we can theoretically end up with
the call from begin_use happening while the worker
thread is executing which would result in the profile
not getting set for that submission. It should not
affect the reference counting.
v2: bail early if the the profile is already active (Lijo)
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 14 Mar 2025 13:29:59 +0000 (09:29 -0400)]
drm/amdgpu/gfx: adjust workload profile handling
No need to make the workload profile setup dependent
on the results of cancelling the delayed work thread.
We have all of the necessary checking in place for the
workload profile reference counting, so separate the
two. As it is now, we can theoretically end up with
the call from begin_use happening while the worker
thread is executing which would result in the profile
not getting set for that submission. It should not
affect the reference counting.
v2: bail early if the the profile is already active (Lijo)
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Candice Li [Thu, 13 Mar 2025 07:06:44 +0000 (15:06 +0800)]
drm/amdgpu: Add EEPROM I2C address support for smu v13_0_12
Add EEPROM I2C address support for smu v13_0_12.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 12 Mar 2025 13:48:47 +0000 (09:48 -0400)]
drm/amdgpu/vcn: fix ref counting for ring based profile handling
We need to make sure the workload profile ref counts are
balanced. This isn't currently the case because we can
increment the count on submissions, but the decrement may
be delayed as work comes in. Track when we enable the
workload profile so the references are balanced.
v2: switch to a mutex and active flag
v3: fix mutex init
Fixes:
1443dd3c67f6 ("drm/amd/pm: fix and simplify workload handling")
Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 12 Mar 2025 13:44:19 +0000 (09:44 -0400)]
drm/amdgpu/gfx: fix ref counting for ring based profile handling
We need to make sure the workload profile ref counts are
balanced. This isn't currently the case because we can
increment the count on submissions, but the decrement may
be delayed as work comes in. Track when we enable the
workload profile so the references are balanced.
v2: switch to a mutex and active flag
v3: fix mutex init
Fixes:
8fdb3958e396 ("drm/amdgpu/gfx: add ring helpers for setting workload profile")
Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Kenneth Feng <kenneth.feng@amd.com>
Tested-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harish Kasiviswanathan [Fri, 14 Mar 2025 16:03:53 +0000 (12:03 -0400)]
drm/amdkfd: Fix bug in config_dequeue_wait_counts
For certain ASICs where dequeue_wait_count don't need to be initialized,
pm_config_dequeue_wait_counts_v9 return without filling in the packet
information. However, the calling function interprets this as a success
and sends the uninitialized packet to firmware causing hang.
Fix the above bug by not calling pm_config_dequeue_wait_counts_v9 for
ASICs that don't need the value to be initialized.
v2: Removed redudant code.
Tidy up code based on review comments
v3: Don't call pm_config_dequeue_wait_counts_v9 for certain ASICs
Fixes:
ed962f8d0603 ("drm/amdkfd: Add pm_config_dequeue_wait_counts API")
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
FengWei [Mon, 17 Mar 2025 02:57:45 +0000 (10:57 +0800)]
drm/radeon/uvd: Replace nested max() with single max3()
Use max3() macro instead of nesting max() to simplify the return
statement.
Signed-off-by: FengWei <feng.wei8@zte.com.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Tue, 14 Jan 2025 12:51:39 +0000 (13:51 +0100)]
drm/amdgpu: grab an additional reference on the gang fence v2
We keep the gang submission fence around in adev, make sure that it
stays alive.
v2: fix memory leak on retry
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Tue, 25 Feb 2025 05:59:47 +0000 (11:29 +0530)]
drm/amdgpu: Use wafl version for xgmi
XGMI and WAFL share the same versions. Use WAFL version if XGMI version
is not present in discovery.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.zhang@amd.com [Mon, 17 Mar 2025 01:14:36 +0000 (09:14 +0800)]
drm/amdgpu: Fix SDMA engine reset logic
The scheduler should restart only if the reset operation
succeeds This ensures that new tasks are only submitted
to the queues after a successful reset.
Fixes:
4c02f7301657 ("drm/amdgpu: Introduce conditional user queue suspension for SDMA resets")
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse.Zhang <Jesse.zhang@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tomasz Pakuła [Tue, 11 Mar 2025 21:38:33 +0000 (22:38 +0100)]
drm/amdgpu/pm: Handle SCLK offset correctly in overdrive for smu 14.0.2
Currently, it seems like the code was carried over from RDNA3 because
it assumes two possible values to set. RDNA4, instead of having:
0: min SCLK
1: max SCLK
only has
0: SCLK offset
This change makes it so it only reports current offset value instead of
showing possible min/max values and their indices. Moreover, it now only
accepts the offset as a value, without the indice index.
Additionally, the lower bound was printed as %u by mistake.
Old:
OD_SCLK_OFFSET:
0: -500Mhz
1: 1000Mhz
OD_MCLK:
0: 97Mhz
1: 1259MHz
OD_VDDGFX_OFFSET:
0mV
OD_RANGE:
SCLK_OFFSET: -500Mhz 1000Mhz
MCLK: 97Mhz 1500Mhz
VDDGFX_OFFSET: -200mv 0mv
New:
OD_SCLK_OFFSET:
0Mhz
OD_MCLK:
0: 97Mhz
1: 1259MHz
OD_VDDGFX_OFFSET:
0mV
OD_RANGE:
SCLK_OFFSET: -500Mhz 1000Mhz
MCLK: 97Mhz 1500Mhz
VDDGFX_OFFSET: -200mv 0mv
Setting this offset:
Old: "s 1 <offset>"
New: "s <offset>"
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4036
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Flora Cui [Fri, 14 Mar 2025 02:27:55 +0000 (10:27 +0800)]
drm/amdgpu: release xcp_mgr on exit
Free on driver cleanup.
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Sun, 9 Mar 2025 08:57:46 +0000 (04:57 -0400)]
drm/amd/display: 3.2.325
This version brings along following fixes:
- Use DPM table clk setting for dml2 soc dscclk
- Update static soc table
- Fix incorrect fw_state address in dmub_srv
- Use HW lock mgr for PSR1 when only one eDP
- Revert "Support for reg inbox0 for host->DMUB CMDs"
- Change notification of link BW allocation
- Fix message for support_edp0_on_dp1
- Guard against setting dispclk low for dcn31x
- Prevent VStartup Overflow
- Check pipe->stream before passing it to a function
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Fri, 28 Feb 2025 13:02:20 +0000 (08:02 -0500)]
drm/amd/display: Use DPM table clk setting for dml2 soc dscclk
[WHY]
Not like dppclk/dispclk, dml2 will calculate the minimum required clocks.
For dscclk, it is used for pure comparision.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Fri, 7 Mar 2025 23:24:56 +0000 (18:24 -0500)]
drm/amd/display: Update static soc table
[WHY]
Update the static soc table dcn3_5_soc.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lo-an Chen [Mon, 10 Mar 2025 06:52:22 +0000 (14:52 +0800)]
drm/amd/display: Fix incorrect fw_state address in dmub_srv
[WHY]
The fw_state in dmub_srv was assigned with wrong address.
The address was pointed to the firmware region.
[HOW]
Fix the firmware state by using DMUB_DEBUG_FW_STATE_OFFSET
in dmub_cmd.h.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Lo-an Chen <lo-an.chen@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Fri, 7 Mar 2025 21:55:20 +0000 (15:55 -0600)]
drm/amd/display: Use HW lock mgr for PSR1 when only one eDP
[WHY]
DMUB locking is important to make sure that registers aren't accessed
while in PSR. Previously it was enabled but caused a deadlock in
situations with multiple eDP panels.
[HOW]
Detect if multiple eDP panels are in use to decide whether to use
lock. Refactor the function so that the first check is for PSR-SU
and then replay is in use to prevent having to look up number
of eDP panels for those configurations.
Fixes:
f245b400a223 ("Revert "drm/amd/display: Use HW lock mgr for PSR1"")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3965
Reviewed-by: ChiaHsuan Chung <chiahsuan.chung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>