André Almeida [Wed, 26 Feb 2025 13:11:18 +0000 (10:11 -0300)]
drm/amdgpu: Create a debug option to disable ring reset
Prior to the addition of ring reset, the debug option
`debug_disable_soft_recovery` could be used to force a full device
reset. Now that we have ring reset, create a debug option to disable
them in amdgpu, forcing the driver to go with the full device
reset path again when both options are combined.
This option is useful for testing and debugging purposes when one wants
to test the full reset from userspace.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ma Ke [Wed, 26 Feb 2025 08:37:31 +0000 (16:37 +0800)]
drm/amd/display: Fix null check for pipe_ctx->plane_state in resource_build_scaling_params
Null pointer dereference issue could occur when pipe_ctx->plane_state
is null. The fix adds a check to ensure 'pipe_ctx->plane_state' is not
null before accessing. This prevents a null pointer dereference.
Found by code review.
Fixes:
3be5262e353b ("drm/amd/display: Rename more dc_surface stuff to plane_state")
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 26 Feb 2025 14:25:49 +0000 (09:25 -0500)]
Documentation/gpu: remove duplicate entries in different glossaries
Some items were defined in both the general and DC glossaries.
Remove the duplicate entries.
Fixes:
2df30ae0ba0b ("Documentation/gpu: Add acronyms for some firmware components")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 20 Feb 2025 15:42:30 +0000 (10:42 -0500)]
drm/amdgpu/mes11: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls
They are noops on GFX11 for most firmware versions. KFD already
handles its own queues and they should already be unmapped at this
point so even if this runs, it's not doing anything.
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Colin Ian King [Wed, 26 Feb 2025 08:57:33 +0000 (08:57 +0000)]
drm/amdgpu: Fix spelling mistake "initiailize" -> "initialize" and grammar
There is a spelling mistake and a grammatical error in a dev_err
message. Fix it.
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Wed, 26 Feb 2025 03:36:55 +0000 (11:36 +0800)]
drm/amdgpu: Decode deferred error type in aca bank parser
In the case of poison inband log, the error type need to be specified
by checking the deferred or poison bit of status register.
v2: check both deferred and poison bit
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Le Ma [Tue, 11 Feb 2025 06:06:54 +0000 (14:06 +0800)]
drm/amdgpu: add sdma page queue irq processing for sdma442
Add the trap irq processing for page queue of sdma442
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by and Tested-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Kenneth Feng [Wed, 26 Feb 2025 08:05:57 +0000 (16:05 +0800)]
drm/amd/pm: disable gfxoff on the specific sku
disable gfxoff on the specific sku based on the requirement
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Wed, 26 Feb 2025 06:27:27 +0000 (14:27 +0800)]
drm/amdgpu: Report generic instead of unknown boot time errors
Change the DMESG reporting of unknown errors to "Boot Controller
Generic Error" to align with the RAS SPEC and provide more clarity
to customers.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Tue, 25 Feb 2025 10:51:51 +0000 (16:21 +0530)]
drm/amdgpu: Fix logic to fetch supported NPS modes
Correct the logic to find supported NPS modes from firmware.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reported-by: Ava Zhang <niandong.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Fixes:
30eb41f5d1a7 ("drm/amdgpu: Use firmware supported NPS modes")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Mon, 24 Feb 2025 07:13:40 +0000 (15:13 +0800)]
drm/amdgpu: Disable fru_id field in CPER section
The fru_id field is disabled cause of mis-matching defination
between CPER spec and driver.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Mon, 24 Feb 2025 08:16:32 +0000 (13:46 +0530)]
drm/amdkfd: Fix Circular Locking Dependency in 'svm_range_cpu_invalidate_pagetables'
This commit addresses a circular locking dependency in the
svm_range_cpu_invalidate_pagetables function. The function previously
held a lock while determining whether to perform an unmap or eviction
operation, which could lead to deadlocks.
Fixes the below:
[ 223.418794] ======================================================
[ 223.418820] WARNING: possible circular locking dependency detected
[ 223.418845] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE
[ 223.418869] ------------------------------------------------------
[ 223.418889] kfdtest/3939 is trying to acquire lock:
[ 223.418906]
ffff8957552eae38 (&dqm->lock_hidden){+.+.}-{3:3}, at: evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.419302]
but task is already holding lock:
[ 223.419303]
ffff8957556b83b0 (&prange->lock){+.+.}-{3:3}, at: svm_range_cpu_invalidate_pagetables+0x9d/0x850 [amdgpu]
[ 223.419447] Console: switching to colour dummy device 80x25
[ 223.419477] [IGT] amd_basic: executing
[ 223.419599]
which lock already depends on the new lock.
[ 223.419611]
the existing dependency chain (in reverse order) is:
[ 223.419621]
-> #2 (&prange->lock){+.+.}-{3:3}:
[ 223.419636] __mutex_lock+0x85/0xe20
[ 223.419647] mutex_lock_nested+0x1b/0x30
[ 223.419656] svm_range_validate_and_map+0x2f1/0x15b0 [amdgpu]
[ 223.419954] svm_range_set_attr+0xe8c/0x1710 [amdgpu]
[ 223.420236] svm_ioctl+0x46/0x50 [amdgpu]
[ 223.420503] kfd_ioctl_svm+0x50/0x90 [amdgpu]
[ 223.420763] kfd_ioctl+0x409/0x6d0 [amdgpu]
[ 223.421024] __x64_sys_ioctl+0x95/0xd0
[ 223.421036] x64_sys_call+0x1205/0x20d0
[ 223.421047] do_syscall_64+0x87/0x140
[ 223.421056] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 223.421068]
-> #1 (reservation_ww_class_mutex){+.+.}-{3:3}:
[ 223.421084] __ww_mutex_lock.constprop.0+0xab/0x1560
[ 223.421095] ww_mutex_lock+0x2b/0x90
[ 223.421103] amdgpu_amdkfd_alloc_gtt_mem+0xcc/0x2b0 [amdgpu]
[ 223.421361] add_queue_mes+0x3bc/0x440 [amdgpu]
[ 223.421623] unhalt_cpsch+0x1ae/0x240 [amdgpu]
[ 223.421888] kgd2kfd_start_sched+0x5e/0xd0 [amdgpu]
[ 223.422148] amdgpu_amdkfd_start_sched+0x3d/0x50 [amdgpu]
[ 223.422414] amdgpu_gfx_enforce_isolation_handler+0x132/0x270 [amdgpu]
[ 223.422662] process_one_work+0x21e/0x680
[ 223.422673] worker_thread+0x190/0x330
[ 223.422682] kthread+0xe7/0x120
[ 223.422690] ret_from_fork+0x3c/0x60
[ 223.422699] ret_from_fork_asm+0x1a/0x30
[ 223.422708]
-> #0 (&dqm->lock_hidden){+.+.}-{3:3}:
[ 223.422723] __lock_acquire+0x16f4/0x2810
[ 223.422734] lock_acquire+0xd1/0x300
[ 223.422742] __mutex_lock+0x85/0xe20
[ 223.422751] mutex_lock_nested+0x1b/0x30
[ 223.422760] evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.423025] kfd_process_evict_queues+0x8a/0x1d0 [amdgpu]
[ 223.423285] kgd2kfd_quiesce_mm+0x43/0x90 [amdgpu]
[ 223.423540] svm_range_cpu_invalidate_pagetables+0x4a7/0x850 [amdgpu]
[ 223.423807] __mmu_notifier_invalidate_range_start+0x1f5/0x250
[ 223.423819] copy_page_range+0x1e94/0x1ea0
[ 223.423829] copy_process+0x172f/0x2ad0
[ 223.423839] kernel_clone+0x9c/0x3f0
[ 223.423847] __do_sys_clone+0x66/0x90
[ 223.423856] __x64_sys_clone+0x25/0x30
[ 223.423864] x64_sys_call+0x1d7c/0x20d0
[ 223.423872] do_syscall_64+0x87/0x140
[ 223.423880] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 223.423891]
other info that might help us debug this:
[ 223.423903] Chain exists of:
&dqm->lock_hidden --> reservation_ww_class_mutex --> &prange->lock
[ 223.423926] Possible unsafe locking scenario:
[ 223.423935] CPU0 CPU1
[ 223.423942] ---- ----
[ 223.423949] lock(&prange->lock);
[ 223.423958] lock(reservation_ww_class_mutex);
[ 223.423970] lock(&prange->lock);
[ 223.423981] lock(&dqm->lock_hidden);
[ 223.423990]
*** DEADLOCK ***
[ 223.423999] 5 locks held by kfdtest/3939:
[ 223.424006] #0:
ffffffffb82b4fc0 (dup_mmap_sem){.+.+}-{0:0}, at: copy_process+0x1387/0x2ad0
[ 223.424026] #1:
ffff89575eda81b0 (&mm->mmap_lock){++++}-{3:3}, at: copy_process+0x13a8/0x2ad0
[ 223.424046] #2:
ffff89575edaf3b0 (&mm->mmap_lock/1){+.+.}-{3:3}, at: copy_process+0x13e4/0x2ad0
[ 223.424066] #3:
ffffffffb82e76e0 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: copy_page_range+0x1cea/0x1ea0
[ 223.424088] #4:
ffff8957556b83b0 (&prange->lock){+.+.}-{3:3}, at: svm_range_cpu_invalidate_pagetables+0x9d/0x850 [amdgpu]
[ 223.424365]
stack backtrace:
[ 223.424374] CPU: 0 UID: 0 PID: 3939 Comm: kfdtest Tainted: G U OE 6.12.0-amdstaging-drm-next-lol-050225 #14
[ 223.424392] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 223.424401] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO WIFI/X570 AORUS PRO WIFI, BIOS F36a 02/16/2022
[ 223.424416] Call Trace:
[ 223.424423] <TASK>
[ 223.424430] dump_stack_lvl+0x9b/0xf0
[ 223.424441] dump_stack+0x10/0x20
[ 223.424449] print_circular_bug+0x275/0x350
[ 223.424460] check_noncircular+0x157/0x170
[ 223.424469] ? __bfs+0xfd/0x2c0
[ 223.424481] __lock_acquire+0x16f4/0x2810
[ 223.424490] ? srso_return_thunk+0x5/0x5f
[ 223.424505] lock_acquire+0xd1/0x300
[ 223.424514] ? evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.424783] __mutex_lock+0x85/0xe20
[ 223.424792] ? evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.425058] ? srso_return_thunk+0x5/0x5f
[ 223.425067] ? mark_held_locks+0x54/0x90
[ 223.425076] ? evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.425339] ? srso_return_thunk+0x5/0x5f
[ 223.425350] mutex_lock_nested+0x1b/0x30
[ 223.425358] ? mutex_lock_nested+0x1b/0x30
[ 223.425367] evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.425631] kfd_process_evict_queues+0x8a/0x1d0 [amdgpu]
[ 223.425893] kgd2kfd_quiesce_mm+0x43/0x90 [amdgpu]
[ 223.426156] svm_range_cpu_invalidate_pagetables+0x4a7/0x850 [amdgpu]
[ 223.426423] ? srso_return_thunk+0x5/0x5f
[ 223.426436] __mmu_notifier_invalidate_range_start+0x1f5/0x250
[ 223.426450] copy_page_range+0x1e94/0x1ea0
[ 223.426461] ? srso_return_thunk+0x5/0x5f
[ 223.426474] ? srso_return_thunk+0x5/0x5f
[ 223.426484] ? lock_acquire+0xd1/0x300
[ 223.426494] ? copy_process+0x1718/0x2ad0
[ 223.426502] ? srso_return_thunk+0x5/0x5f
[ 223.426510] ? sched_clock_noinstr+0x9/0x10
[ 223.426519] ? local_clock_noinstr+0xe/0xc0
[ 223.426528] ? copy_process+0x1718/0x2ad0
[ 223.426537] ? srso_return_thunk+0x5/0x5f
[ 223.426550] copy_process+0x172f/0x2ad0
[ 223.426569] kernel_clone+0x9c/0x3f0
[ 223.426577] ? __schedule+0x4c9/0x1b00
[ 223.426586] ? srso_return_thunk+0x5/0x5f
[ 223.426594] ? sched_clock_noinstr+0x9/0x10
[ 223.426602] ? srso_return_thunk+0x5/0x5f
[ 223.426610] ? local_clock_noinstr+0xe/0xc0
[ 223.426619] ? schedule+0x107/0x1a0
[ 223.426629] __do_sys_clone+0x66/0x90
[ 223.426643] __x64_sys_clone+0x25/0x30
[ 223.426652] x64_sys_call+0x1d7c/0x20d0
[ 223.426661] do_syscall_64+0x87/0x140
[ 223.426671] ? srso_return_thunk+0x5/0x5f
[ 223.426679] ? common_nsleep+0x44/0x50
[ 223.426690] ? srso_return_thunk+0x5/0x5f
[ 223.426698] ? trace_hardirqs_off+0x52/0xd0
[ 223.426709] ? srso_return_thunk+0x5/0x5f
[ 223.426717] ? syscall_exit_to_user_mode+0xcc/0x200
[ 223.426727] ? srso_return_thunk+0x5/0x5f
[ 223.426736] ? do_syscall_64+0x93/0x140
[ 223.426748] ? srso_return_thunk+0x5/0x5f
[ 223.426756] ? up_write+0x1c/0x1e0
[ 223.426765] ? srso_return_thunk+0x5/0x5f
[ 223.426775] ? srso_return_thunk+0x5/0x5f
[ 223.426783] ? trace_hardirqs_off+0x52/0xd0
[ 223.426792] ? srso_return_thunk+0x5/0x5f
[ 223.426800] ? syscall_exit_to_user_mode+0xcc/0x200
[ 223.426810] ? srso_return_thunk+0x5/0x5f
[ 223.426818] ? do_syscall_64+0x93/0x140
[ 223.426826] ? syscall_exit_to_user_mode+0xcc/0x200
[ 223.426836] ? srso_return_thunk+0x5/0x5f
[ 223.426844] ? do_syscall_64+0x93/0x140
[ 223.426853] ? srso_return_thunk+0x5/0x5f
[ 223.426861] ? irqentry_exit+0x6b/0x90
[ 223.426869] ? srso_return_thunk+0x5/0x5f
[ 223.426877] ? exc_page_fault+0xa7/0x2c0
[ 223.426888] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 223.426898] RIP: 0033:0x7f46758eab57
[ 223.426906] Code: ba 04 00 f3 0f 1e fa 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 41 89 c0 85 c0 75 2c 64 48 8b 04 25 10 00
[ 223.426930] RSP: 002b:
00007fff5c3e5188 EFLAGS:
00000246 ORIG_RAX:
0000000000000038
[ 223.426943] RAX:
ffffffffffffffda RBX:
00007f4675f8c040 RCX:
00007f46758eab57
[ 223.426954] RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000001200011
[ 223.426965] RBP:
0000000000000000 R08:
0000000000000000 R09:
0000000000000000
[ 223.426975] R10:
00007f4675e81a50 R11:
0000000000000246 R12:
0000000000000001
[ 223.426986] R13:
00007fff5c3e5470 R14:
00007fff5c3e53e0 R15:
00007fff5c3e5410
[ 223.427004] </TASK>
v2: To resolve this issue, the allocation of the process context buffer
(`proc_ctx_bo`) has been moved from the `add_queue_mes` function to the
`pqm_create_queue` function. This change ensures that the buffer is
allocated only when the first queue for a process is created and only if
the Micro Engine Scheduler (MES) is enabled. (Felix)
v3: Fix typo s/Memory Execution Scheduler (MES)/Micro Engine Scheduler
in commit message. (Lijo)
Fixes:
438b39ac74e2 ("drm/amdkfd: pause autosuspend when creating pdd")
Cc: Jesse Zhang <jesse.zhang@amd.com>
Cc: Yunxiang Li <Yunxiang.Li@amd.com>
Cc: Philip Yang <Philip.Yang@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Benjamin Chan [Fri, 31 Jan 2025 19:03:46 +0000 (14:03 -0500)]
drm/amdgpu: Add amdisp pinctrl MFD resource
AMDISP GPIO control uses a dedicated pinctrl driver,
and requires MFD hotadd GPIO resources.
Co-developed-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Benjamin Chan <benjamin.chan@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 20 Feb 2025 14:58:25 +0000 (09:58 -0500)]
drm/amdgpu/mes12: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls
They are noops on GFX12. There is no suspend/resume all support
in firmware so the function doesn't do anything. KFD already
handles its own queues and they should already be unmapped at this
point so even if this runs, it's not doing anything.
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dr. David Alan Gilbert [Mon, 24 Feb 2025 01:49:42 +0000 (01:49 +0000)]
drm/amd/display: Remove unused optc3_fpu_set_vrr_m_const
The last use of optc3_fpu_set_vrr_m_const() was removed in 2022's
commit
64f991590ff4 ("drm/amd/display: Fix a compilation failure on PowerPC
caused by FPU code")
which removed the only caller (with a similar) name.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pratap Nirujogi [Wed, 19 Feb 2025 22:01:26 +0000 (17:01 -0500)]
drm/amdgpu: Replace DRM_ERROR() with drm_err()
DRM_ERROR() is no longer preferred. Replace DRM_ERROR() usage
with drm_err() in isp driver.
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Luan Arcanjo [Tue, 25 Feb 2025 01:55:29 +0000 (22:55 -0300)]
drm/amd/display/dc: Refactor remove duplications
All dce command_table_helper's shares a copy-pasted collection
of copy-pasted functions, which are: phy_id_to_atom,
clock_source_id_to_atom_phy_clk_src_id, and engine_bp_to_atom.
This patch removes the multiple copy-pasted by moving them to
the command_table_helper.c and make the command_table_helper's
calls the functions implemented by the command_table_helper.c
instead.
The changes were not tested on actual hardware. I am only able
to verify that the changes keep the code compileable and do my
best to to look repeatedly if I am not actually changing any code.
This is the version 4 of the PATCH, fixed comments about
licence in the new files and the matches From email to
Signed-off-by email. Fixed comments about using
command_table_helper instead of creating a dce_common
Signed-off-by: Luan Icaro Pinto Arcanjo <luanicaro@usp.br>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 7 Jan 2025 17:26:46 +0000 (12:26 -0500)]
drm/amdgpu/vcn: use dev_info() for firmware information
To properly handle multiple GPUs.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 7 Jan 2025 17:16:28 +0000 (12:16 -0500)]
drm/amdgpu/vcn: optimize firmware storage
If each instance uses the same fw image, only store one
copy in the driver.
Acked-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 10 Dec 2024 19:25:46 +0000 (14:25 -0500)]
drm/amdgpu/vcn5.0.1: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 10 Dec 2024 19:22:53 +0000 (14:22 -0500)]
drm/amdgpu/vcn5.0.0: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 10 Dec 2024 19:22:25 +0000 (14:22 -0500)]
drm/amdgpu/vcn4.0.5: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:41:45 +0000 (12:41 -0500)]
drm/amdgpu/vcn4.0.3: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:41:17 +0000 (12:41 -0500)]
drm/amdgpu/vcn4.0: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:40:47 +0000 (12:40 -0500)]
drm/amdgpu/vcn3.0: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:35:42 +0000 (12:35 -0500)]
drm/amdgpu/vcn2.5: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:33:58 +0000 (12:33 -0500)]
drm/amdgpu/vcn2.0: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:32:12 +0000 (12:32 -0500)]
drm/amdgpu/vcn1.0: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:30:30 +0000 (12:30 -0500)]
drm/amdgpu/vcn: add a generic helper for set_power_gating_state
It's common for all VCN variants.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:26:32 +0000 (12:26 -0500)]
drm/amdgpu/vcn: use per instance callbacks for idle work handler
Use the vcn instance power gating callbacks rather than
the IP powergating callback. This limits power gating to
only the instance in use rather than all of the instances.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 10 Dec 2024 19:15:34 +0000 (14:15 -0500)]
drm/amdgpu/vcn5.0.1: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:20:38 +0000 (12:20 -0500)]
drm/amdgpu/vcn5.0.0: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:20:19 +0000 (12:20 -0500)]
drm/amdgpu/vcn4.0.5: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:20:04 +0000 (12:20 -0500)]
drm/amdgpu/vcn4.0.3: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:19:49 +0000 (12:19 -0500)]
drm/amdgpu/vcn4.0: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:19:33 +0000 (12:19 -0500)]
drm/amdgpu/vcn3.0: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:19:15 +0000 (12:19 -0500)]
drm/amdgpu/vcn2.5: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 15:57:48 +0000 (10:57 -0500)]
drm/amdgpu/vcn2.0: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 15:52:15 +0000 (10:52 -0500)]
drm/amdgpu/vcn1.0: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 16:27:06 +0000 (11:27 -0500)]
drm/amdgpu/vcn: add new per instance callback for powergating
This is per instance so add a new function pointer for it.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 16:14:58 +0000 (11:14 -0500)]
drm/amdgpu/vcn: adjust pause_dpg_mode function signature
Change it to take a vcn instance rather than adev to align
with the vcn instance changes.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 10 Dec 2024 19:00:57 +0000 (14:00 -0500)]
drm/amdgpu/vcn5.0.1: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 22 Nov 2024 23:07:03 +0000 (18:07 -0500)]
drm/amdgpu/vcn5.0.0: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 22 Nov 2024 22:38:48 +0000 (17:38 -0500)]
drm/amdgpu/vcn4.0.5: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 22 Nov 2024 22:01:49 +0000 (17:01 -0500)]
drm/amdgpu/vcn4.0.3: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 22 Nov 2024 20:38:37 +0000 (15:38 -0500)]
drm/amdgpu/vcn4.0: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 22 Nov 2024 18:53:40 +0000 (13:53 -0500)]
drm/amdgpu/vcn2.5: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 22 Nov 2024 18:30:16 +0000 (13:30 -0500)]
drm/amdgpu/vcn2.0: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
v2: index instances directly on vcn1.0 and 2.0 to make
it clear that they only support a single instance (Lijo)
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 19 Nov 2024 21:51:36 +0000 (16:51 -0500)]
drm/amdgpu/vcn1.0: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 19 Nov 2024 21:10:46 +0000 (16:10 -0500)]
drm/amdgpu/vcn3.0: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 15 Nov 2024 22:44:01 +0000 (17:44 -0500)]
drm/amdgpu/vcn: switch vcn helpers to be instance based
Pass the instance to the helpers.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 15 Nov 2024 21:19:23 +0000 (16:19 -0500)]
drm/amdgpu/vcn: move more instanced data to vcn_instance
Move more per instance data into the per instance structure.
v2: index instances directly on vcn1.0 and 2.0 to make
it clear that they only support a single instance (Lijo)
v3: fix typo on vcn 2.5
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> (v2)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 13 Nov 2024 20:28:41 +0000 (15:28 -0500)]
drm/amdgpu/vcn: make powergating status per instance
Store it per instance so we can track it per instance.
v2: index instances directly on vcn1.0 and 2.0 to make
it clear that they only support a single instance (Lijo)
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 13 Nov 2024 19:43:15 +0000 (14:43 -0500)]
drm/amdgpu/vcn: switch work handler to be per instance
Have a separate work handler for each VCN instance. This
paves the way for per instance VCN power gating at runtime.
v2: index instances directly on vcn1.0 and 2.0 to make
it clear that they only support a single instance (Lijo)
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 10 Dec 2024 17:34:54 +0000 (12:34 -0500)]
drm/amdgpu/vcn5.0.1: split code along instances
Split the code on a per instance basis. This will allow
us to use the per instance functions in the future to
handle more things per instance.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 13 Nov 2024 17:27:45 +0000 (12:27 -0500)]
drm/amdgpu/vcn5.0.0: split code along instances
Split the code on a per instance basis. This will allow
us to use the per instance functions in the future to
handle more things per instance.
v2: squash in fix for stop() from Boyuan
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 13 Nov 2024 17:21:18 +0000 (12:21 -0500)]
drm/amdgpu/vcn4.0.5: split code along instances
Split the code on a per instance basis. This will allow
us to use the per instance functions in the future to
handle more things per instance.
v2: squash in fix for stop() from Boyuan
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 13 Nov 2024 17:13:15 +0000 (12:13 -0500)]
drm/amdgpu/vcn4.0.3: split code along instances
Split the code on a per instance basis. This will allow
us to use the per instance functions in the future to
handle more things per instance.
v2: squash in fix for stop() from Boyuan
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 13 Nov 2024 17:01:44 +0000 (12:01 -0500)]
drm/amdgpu/vcn4.0: split code along instances
Split the code on a per instance basis. This will allow
us to use the per instance functions in the future to
handle more things per instance.
v2: squash in fix for stop() from Boyuan
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 13 Nov 2024 16:47:49 +0000 (11:47 -0500)]
drm/amdgpu/vcn3.0: split code along instances
Split the code on a per instance basis. This will allow
us to use the per instance functions in the future to
handle more things per instance.
v2: squash in fix for stop() from Boyuan
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Mon, 24 Feb 2025 16:13:27 +0000 (11:13 -0500)]
drm/amdgpu/vcn2.5: fix VCN stop logic
Need to make sure we call amdgpu_dpm_enable_vcn()
in vcn_v2_5_stop() at the end if there are errors
or DPG is enabled.
Fixes:
ebc25499de12 ("drm/amdgpu/vcn2.5: split code along instances")
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Suggested-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Tue, 25 Feb 2025 11:18:12 +0000 (19:18 +0800)]
drm/amdgpu: increase AMDGPU_MAX_RINGS
Increase it since a cper ring is introduced.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Mon, 24 Feb 2025 11:56:22 +0000 (17:26 +0530)]
drm/amdgpu: Fix correct parameter desc for VCN idle check functions
Fixes the kdoc for the following VCN idle check functions by updating
the parameter description from 'handle' to 'ip_block':
- vcn_v4_0_is_idle
- vcn_v4_0_3_is_idle
- vcn_v4_0_5_is_idle
- vcn_v5_0_1_is_idle
Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c:935: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_1_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c:935: warning: Excess function parameter 'handle' description in 'vcn_v5_0_1_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:1972: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:1972: warning: Excess function parameter 'handle' description in 'vcn_v4_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1583: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_3_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1583: warning: Excess function parameter 'handle' description in 'vcn_v4_0_3_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1200: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1200: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1460: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_5_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1460: warning: Excess function parameter 'handle' description in 'vcn_v4_0_5_is_idle'
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pierre-Eric Pelloux-Prayer [Thu, 20 Feb 2025 13:41:59 +0000 (14:41 +0100)]
drm/amdgpu: init return value in amdgpu_ttm_clear_buffer
Otherwise an uninitialized value can be returned if
amdgpu_res_cleared returns true for all regions.
Possibly closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812
Fixes:
a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ganglxie [Mon, 24 Feb 2025 07:06:51 +0000 (15:06 +0800)]
drm/amdgpu: Change page/record number calculation based on nps
save only one record to save eeprom space,and
bad_page_num = pa_rec_num + mca_rec_num*16
Signed-off-by: ganglxie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ganglxie [Mon, 24 Feb 2025 07:03:05 +0000 (15:03 +0800)]
drm/amdgpu: Refine bad page adding
bad page adding can be simpler with nps info
Signed-off-by: ganglxie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Sat, 22 Feb 2025 10:11:35 +0000 (18:11 +0800)]
drm/amd/pm: Get metrics table version for smu_v13_0_12
Get metrics table version for smu_v13_0_12 and populate pm_metrics
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.zhang@amd.com [Fri, 21 Feb 2025 10:18:07 +0000 (18:18 +0800)]
drm/amdgpu: update SDMA sysfs reset mask in late_init
- Added `sdma_v4_4_2_update_reset_mask` function to update the reset mask.
- update the sysfs reset mask to the `late_init` stage to ensure that the SMU initialization
and capability setup are completed before checking the SDMA reset capability.
- For IP versions 9.4.3 and 9.4.4, enable per-queue reset if the MEC firmware version is at least 0xb0 and PMFW supports queue reset.
- Add a TODO comment for future support of per-queue reset for IP version 9.5.0.
This change ensures that per-queue reset is only enabled when the MEC and PMFW support it.
v2: fix ip version (9.5.4 -> 9.5.0)(Lijo)
Suggested-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Mon, 24 Feb 2025 15:01:06 +0000 (23:01 +0800)]
drm/amdgpu: Set CPER enabled flag after ring initiailized
Setting cper.enabled to be true only after cper ring is successfully
created.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ganglxie [Mon, 24 Feb 2025 03:17:33 +0000 (11:17 +0800)]
drm/amdgpu: Save nps to eeprom
nps info saved together with bad page makes bad page parsing more efficient
Signed-off-by: ganglxie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Mon, 24 Feb 2025 13:10:24 +0000 (21:10 +0800)]
drm/amdgpu: Check if CPER enabled when generating CPER
In the case of CPER disabled, generating CPER will cause kernel NULL
pointer dereference without checking.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mangesh Gadre [Fri, 21 Feb 2025 09:38:21 +0000 (17:38 +0800)]
drm/amd/pm: handling of set performance level
display performance level when set not supported
Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Mon, 10 Feb 2025 18:15:48 +0000 (13:15 -0500)]
drm/amdgpu: simplify xgmi peer info calls
Deprecate KFD XGMI peer info calls in favour of calling directly from
simplified XGMI peer info functions.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Fri, 21 Feb 2025 14:39:27 +0000 (09:39 -0500)]
drm/amdkfd: enable cooperative launch on gfx12
Even though GWS no longer exists, to maintain runtime usage for
cooperative launch, SW set legacy GWS size.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Acked-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Sun, 16 Feb 2025 21:27:26 +0000 (16:27 -0500)]
drm/amd/display: Promote DAL to 3.2.322
- Disable PSR-SU on eDP panels
- Fix HPD after GPU reset
- Fixes on dcn4x init, DML2 state policy on DCN36
- Various minor logic fixes
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Sun, 16 Feb 2025 20:06:06 +0000 (15:06 -0500)]
drm/amd/display: [FW Promotion] Release 0.0.255.0
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roman Li [Wed, 12 Feb 2025 19:49:36 +0000 (14:49 -0500)]
drm/amd/display: Fix HPD after gpu reset
[Why]
DC is not using amdgpu_irq_get/put to manage the HPD interrupt refcounts.
So when amdgpu_irq_gpu_reset_resume_helper() reprograms all of the IRQs,
HPD gets disabled.
[How]
Use amdgpu_irq_get/put() for HPD init/fini in DM in order to sync refcounts
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mike Katsnelson [Thu, 13 Feb 2025 16:52:32 +0000 (11:52 -0500)]
drm/amd/display: stop DML2 from removing pipes based on planes
[Why]
Transitioning from low to high resolutions at high refresh rates caused grey corruption.
During the transition state, there is a period where plane size is based on low resultion
state and ODM slices are based on high resoultion state, causing the entire plane to be
contained in one ODM slice. DML2 would turn off the pipe for the ODM slice with no plane,
causing an underflow since the pixel rate for the higher resolution cannot be supported on
one pipe. This change stops DML2 from turning off pipes that are mapped to an ODM slice
with no plane. This is possible to do without negative consequences because pipes can now
take the minimum viewport and draw with zero recout size, removing the need to have the
pipe turned off.
[How]
In map_pipes_from_plane(), remove "check" that skips ODM slices that are not covered by
the plane. This prevents the pipes for those ODM slices from being freed.
Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Mike Katsnelson <mike.katsnelson@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nicholas Kazlauskas [Thu, 13 Feb 2025 22:40:29 +0000 (17:40 -0500)]
drm/amd/display: Increase halt timeout for DMCUB to 1s
[Why]
If we soft reset before halt finishes and there are outstanding
memory transactions then the memory interface may produce unexpected
results, such as out of order transactions when the firmware next runs.
These can manifest as random or unexpected load/store violations.
[How]
Increase the timeout before soft reset to ensure the DMCUB has quiesced.
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Krunoslav Kovac [Fri, 14 Feb 2025 00:14:59 +0000 (19:14 -0500)]
drm/amd/display: Remove unused header
[Why]
Removes unused header
Reviewed-by: Samson Tam <samson.tam@amd.com>
Signed-off-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yihan Zhu [Wed, 12 Feb 2025 20:17:56 +0000 (15:17 -0500)]
drm/amd/display: handle max_downscale_src_width fail check
[WHY]
If max_downscale_src_width check fails, we exit early from TAP calculation and left a NULL
value to the scaling data structure to cause the zero divide in the DML validation.
[HOW]
Call set default TAP calculation before early exit in get_optimal_number_of_taps due to
max downscale limit exceed.
Reviewed-by: Samson Tam <samson.tam@amd.com>
Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michael Strauss [Fri, 24 Jan 2025 20:02:27 +0000 (15:02 -0500)]
drm/amd/display: Update FIXED_VS Link Rate Toggle Workaround Usage
[WHY]
Previously the 128b/132b LTTPR support DPCD field was used to decide if
FIXED_VS training sequence required a rate toggle before initiating LT.
When running DP2.1 4.9.x.x compliance tests, emulated LTTPRs can report
no-128b/132b support which is then forwarded by the FIXED_VS retimer.
As a result this test exposes the rate toggle again, erroneously causing
failures as certain compliance sinks don't expect this behaviour.
[HOW]
Add new DPCD register defines/reads to read LTTPR IEEE OUI and device ID.
Decide whether to perform the rate toggle based on the LTTPR's IEEE OUI
which guarantees that we only perform the toggle on affected retimers.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Thu, 13 Feb 2025 17:37:10 +0000 (12:37 -0500)]
drm/amd/display: fix dcn4x init failed
[why]
failed due to cmdtable not created.
switch atombios cmdtable as default.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aurabindo Pillai [Mon, 20 Jan 2025 20:27:23 +0000 (15:27 -0500)]
drm/amd/display: Temporarily disable hostvm on DCN31
With HostVM enabled, DCN31 fails to pass validation for 3x4k60. Some Linux
userspace does not downgrade one of the monitors to 4k30, and the result
is that the monitor does not light up. Disable it until the bandwidth
calculation failure is resolved.
Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rafal Ostrowski [Wed, 12 Feb 2025 07:08:07 +0000 (08:08 +0100)]
drm/amd/display: ACPI Re-timer Programming
[Why]
We must implement an ACPI re-timer programming interface and notify
ACPI driver whenever a PHY transition is about to take place.
Because some trace lengths on certain platforms are very long,
then a re-timer may need to be programmed whenever a PHY transition
takes place. The implementation of this re-timer programming interface
will notify ACPI driver that PHY transition is taking place and it
will trigger the re-timer as needed.
First we need to gather retimer information from ACPI interface.
Then, in the PRE case, the re-timer interface needs to be called before we call
transmitter ENABLE.
In the POST case, it has to be called after we call transmitter DISABLE.
[How]
Implemented ACPI retimer programming interface.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Rafal Ostrowski <rostrows@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Patel, Swapnil [Sun, 9 Feb 2025 16:42:23 +0000 (11:42 -0500)]
drm/amd/display: Refactor DCN4x and related code
[why & how]
Refactor existing code related to DCN4x for better code sharing with
other modules.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Swapnil Patel <Swapnil.Patel@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yilin Chen [Fri, 7 Feb 2025 20:26:19 +0000 (15:26 -0500)]
drm/amd/display: add a quirk to enable eDP0 on DP1
[why]
some board designs have eDP0 connected to DP1, need a way to enable
support_edp0_on_dp1 flag, otherwise edp related features cannot work
[how]
do a dmi check during dm initialization to identify systems that
require support_edp0_on_dp1. Optimize quirk table with callback
functions to set quirk entries, retrieve_dmi_info can set quirks
according to quirk entries
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Yilin Chen <Yilin.Chen@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Peichen Huang [Fri, 17 Jan 2025 02:48:11 +0000 (10:48 +0800)]
drm/amd/display: replace dio encoder access
[WHY]
replace dio encoder access to work with new dio encoder
assignment.
[HOW}
1. before validation, access dio encoder by get_temp_dio_link_enc()
2. after validation, access dio encoder through pipe_ctx->link_res
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Navid Assadian [Mon, 20 Jan 2025 17:35:07 +0000 (12:35 -0500)]
drm/amd/display: Add SPL namespace
[Why]
In order to avoid component conflicts, spl namespace is needed.
[How]
Adding SPL namespace to the public API os that each user of SPL can have
their own namespace.
Signed-off-by: Navid Assadian <Navid.Assadian@amd.com>
Reviewed-by: Samson Tam <Samson.Tam@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Samson Tam [Tue, 7 Jan 2025 19:17:15 +0000 (14:17 -0500)]
drm/amd/display: Fix unit test failure
[Why]
Some of unit tests use large scaling ratio such that when we
calculate optimal number of taps, max_taps is negative.
Then in recent change, we changed max_taps to uint instead
of int so now max_taps wraps and is positive. This change
changed the behaviour from returning back false to return
true and breaks unit test check
[How]
Add check to prevent max_taps from wrapping and set to 0
instead
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Samson Tam [Tue, 21 Jan 2025 16:01:47 +0000 (11:01 -0500)]
drm/amd/display: fix check for identity ratio
[Why]
IDENTITY_RATIO check uses 2 bits for integer, which only allows
checking downscale ratios up to 3. But we support up to 6x
downscale
[How]
Update IDENTITY_RATIO to check 3 bits for integer
Add ASSERT to catch if we downscale more than 6x
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Reviewed-by: Jun Lei <jun.lei@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Assadian, Navid [Thu, 19 Dec 2024 22:19:09 +0000 (17:19 -0500)]
drm/amd/display: Fix mismatch type comparison
The mismatch type comparison/assignment may cause data loss. Since the
values are always non-negative, it is safe to use unsigned variables to
resolve the mismatch.
Signed-off-by: Navid Assadian <navid.assadian@amd.com>
Reviewed-by: Joshua Aberback <joshua.aberback@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Navid Assadian [Mon, 20 Jan 2025 17:35:23 +0000 (12:35 -0500)]
drm/amd/display: Add opp recout adjustment
[Why]
For subsampled YUV output formats, more pixels can get fetched and be
used for scaling.
[How]
Add the adjustment to the calculated recout, so the viewport covers the
corresponding pixels on the source plane.
Signed-off-by: Navid Assadian <Navid.Assadian@amd.com>
Reviewed-by: Samson Tam <Samson.Tam@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Samson Tam [Tue, 7 Jan 2025 19:16:04 +0000 (14:16 -0500)]
drm/amd/display: Fix mismatch type comparison in custom_float
[Why & How]
Passing uint into uchar function param. Pass uint instead
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nicholas Kazlauskas [Fri, 24 Jan 2025 14:59:37 +0000 (09:59 -0500)]
drm/amd/display: Apply DCN35 DML2 state policy for DCN36 too
[Why]
DCN36 should inherit the same policy as DCN35 for DML2.
[How]
Add it to the list of checks in translation helper.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Tue, 11 Feb 2025 20:43:48 +0000 (13:43 -0700)]
drm/amd/display: update incorrect cursor buffer size
[WHAT & HOW]
Fix the incorrect value of the cursor_buffer_size.
Signed-off-by: Alex Hung <alex.hung@amd.com>
Reviewed-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tom Chung [Thu, 6 Feb 2025 03:31:23 +0000 (11:31 +0800)]
drm/amd/display: Disable PSR-SU on eDP panels
[Why]
PSR-SU may cause some glitching randomly on several panels.
[How]
Temporarily disable the PSR-SU and fallback to PSR1 for
all eDP panels.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3388
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tom Chung [Thu, 6 Feb 2025 03:30:17 +0000 (11:30 +0800)]
drm/amd/display: Revert "Disable PSR-SU on some OLED panel"
This reverts commit
c31b41f1cb32450d8ac176eef9bda979760040e7.
We planning to disable the PSR-SU and fallback to PSR1 for
all eDP panels not only for specific eDP panel temporarily.
Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Colin Ian King [Mon, 17 Feb 2025 09:53:25 +0000 (09:53 +0000)]
drm/amd/display: Fix spelling mistake "oustanding" -> "outstanding"
There is a spelling mistake in max_oustanding_when_urgent_expected,
fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aurabindo Pillai [Fri, 21 Feb 2025 19:19:12 +0000 (14:19 -0500)]
MAINTAINERS: Update AMDGPU DML maintainers info
Chaitanya is no longer with AMD, and the responsibility has been
taken over by Austin.
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>