Philip Yang [Mon, 10 Feb 2025 14:42:31 +0000 (09:42 -0500)]
drm/amdkfd: debugfs hang_hws skip GPU with MES
debugfs hang_hws is used by GPU reset test with HWS, for MES this crash
the kernel with NULL pointer access because dqm->packet_mgr is not setup
for MES path.
Skip GPU with MES for now, MES hang_hws debugfs interface will be
supported later.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 20 Feb 2025 21:02:13 +0000 (16:02 -0500)]
drm/amdkfd: Fix pqm_destroy_queue race with GPU reset
If GPU in reset, destroy_queue return -EIO, pqm_destroy_queue should
delete the queue from process_queue_list and free the resource.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Wed, 26 Feb 2025 01:40:38 +0000 (07:10 +0530)]
drm/amdgpu: Fix parameter annotations for VCN clock gating functions
The previous references to a non-existent `adev` parameter have been
removed & corrected to reflect the use of the `vinst` pointer, which
points to the VCN instance structure, in the below files:
- vcn_v1_0.c
- vcn_v2_0.c
- vcn_v3_0.c
Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:624: warning: Function parameter or struct member 'vinst' not described in 'vcn_v1_0_enable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:624: warning: Excess function parameter 'adev' description in 'vcn_v1_0_enable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:376: warning: Function parameter or struct member 'vinst' not described in 'vcn_v2_0_mc_resume'
drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:376: warning: Excess function parameter 'adev' description in 'vcn_v2_0_mc_resume'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Function parameter or struct member 'vinst' not described in 'vcn_v3_0_disable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Excess function parameter 'adev' description in 'vcn_v3_0_disable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Excess function parameter 'inst' description in 'vcn_v3_0_disable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Function parameter or struct member 'vinst' not described in 'vcn_v3_0_enable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Excess function parameter 'adev' description in 'vcn_v3_0_enable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Excess function parameter 'inst' description in 'vcn_v3_0_enable_clock_gating'
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 6 Feb 2025 22:50:13 +0000 (17:50 -0500)]
drm/amdkfd: Fix mode1 reset crash issue
If HW scheduler hangs and mode1 reset is used to recover GPU, KFD signal
user space to abort the processes. After process abort exit, user queues
still use the GPU to access system memory before h/w is reset while KFD
cleanup worker free system memory and free VRAM.
There is use-after-free race bug that KFD allocate and reuse the freed
system memory, and user queue write to the same system memory to corrupt
the data structure and cause driver crash.
To fix this race, KFD cleanup worker terminate user queues, then flush
reset_domain wq to wait for any GPU ongoing reset complete, and then
free outstanding BOs.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Tue, 18 Feb 2025 01:08:29 +0000 (20:08 -0500)]
drm/amdkfd: KFD release_work possible circular locking
If waiting for gpu reset done in KFD release_work, thers is WARNING:
possible circular locking dependency detected
#2 kfd_create_process
kfd_process_mutex
flush kfd release work
#1 kfd release work
wait for amdgpu reset work
#0 amdgpu_device_gpu_reset
kgd2kfd_pre_reset
kfd_process_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock((work_completion)(&p->release_work));
lock((wq_completion)kfd_process_wq);
lock((work_completion)(&p->release_work));
lock((wq_completion)amdgpu-reset-dev);
To fix this, KFD create process move flush release work outside
kfd_process_mutex.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Tue, 25 Feb 2025 15:04:06 +0000 (10:04 -0500)]
drm/amdkfd: Remove kfd_process_hw_exception worker
With GPU reset-domain worker implemented, KFD hw_exception worker is not
needed any more, just call amdgpu_amdkfd_gpu_reset directly from
kfd_hws_hang.
Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Wed, 26 Feb 2025 04:52:50 +0000 (12:52 +0800)]
drm/amd/amdgpu: Add support for xgmi_v6_4_1
Add support for xgmi_v6_4_1 and use it appropriate places
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Thu, 6 Feb 2025 11:54:51 +0000 (17:24 +0530)]
drm/amdgpu: Add xgmi speed/width related info
Add APIs to initialize XGMI speed, width details and get to max
bandwidth supported. It is assumed that a device only supports same
generation of XGMI links with uniform width.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Thu, 6 Feb 2025 10:46:43 +0000 (16:16 +0530)]
drm/amdgpu: Move xgmi definitions to xgmi header
Move definitions related to xgmi to amdgpu_xgmi header
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Kenneth Feng [Thu, 27 Feb 2025 02:13:53 +0000 (10:13 +0800)]
drm/amd/pm: add fan abnormal detection
add fan abnormal detection on smu v14.0.2&smu v14.0.3
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiaogang Chen [Mon, 24 Feb 2025 22:50:50 +0000 (16:50 -0600)]
drm/amdkfd: remove kfd_pasid.c from amdgpu driver build
Since kfd uses pasid values from graphic driver now do not need use kfd pasid
fucntions.
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
David Yat Sin [Tue, 25 Feb 2025 23:08:02 +0000 (18:08 -0500)]
drm/amdkfd: clamp queue size to minimum
If queue size is less than minimum, clamp it to minimum to prevent
underflow when writing queue mqd.
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
André Almeida [Wed, 26 Feb 2025 13:11:18 +0000 (10:11 -0300)]
drm/amdgpu: Create a debug option to disable ring reset
Prior to the addition of ring reset, the debug option
`debug_disable_soft_recovery` could be used to force a full device
reset. Now that we have ring reset, create a debug option to disable
them in amdgpu, forcing the driver to go with the full device
reset path again when both options are combined.
This option is useful for testing and debugging purposes when one wants
to test the full reset from userspace.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ma Ke [Wed, 26 Feb 2025 08:37:31 +0000 (16:37 +0800)]
drm/amd/display: Fix null check for pipe_ctx->plane_state in resource_build_scaling_params
Null pointer dereference issue could occur when pipe_ctx->plane_state
is null. The fix adds a check to ensure 'pipe_ctx->plane_state' is not
null before accessing. This prevents a null pointer dereference.
Found by code review.
Fixes:
3be5262e353b ("drm/amd/display: Rename more dc_surface stuff to plane_state")
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 26 Feb 2025 14:25:49 +0000 (09:25 -0500)]
Documentation/gpu: remove duplicate entries in different glossaries
Some items were defined in both the general and DC glossaries.
Remove the duplicate entries.
Fixes:
2df30ae0ba0b ("Documentation/gpu: Add acronyms for some firmware components")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 20 Feb 2025 15:42:30 +0000 (10:42 -0500)]
drm/amdgpu/mes11: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls
They are noops on GFX11 for most firmware versions. KFD already
handles its own queues and they should already be unmapped at this
point so even if this runs, it's not doing anything.
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Colin Ian King [Wed, 26 Feb 2025 08:57:33 +0000 (08:57 +0000)]
drm/amdgpu: Fix spelling mistake "initiailize" -> "initialize" and grammar
There is a spelling mistake and a grammatical error in a dev_err
message. Fix it.
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Wed, 26 Feb 2025 03:36:55 +0000 (11:36 +0800)]
drm/amdgpu: Decode deferred error type in aca bank parser
In the case of poison inband log, the error type need to be specified
by checking the deferred or poison bit of status register.
v2: check both deferred and poison bit
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Le Ma [Tue, 11 Feb 2025 06:06:54 +0000 (14:06 +0800)]
drm/amdgpu: add sdma page queue irq processing for sdma442
Add the trap irq processing for page queue of sdma442
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by and Tested-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Kenneth Feng [Wed, 26 Feb 2025 08:05:57 +0000 (16:05 +0800)]
drm/amd/pm: disable gfxoff on the specific sku
disable gfxoff on the specific sku based on the requirement
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Wed, 26 Feb 2025 06:27:27 +0000 (14:27 +0800)]
drm/amdgpu: Report generic instead of unknown boot time errors
Change the DMESG reporting of unknown errors to "Boot Controller
Generic Error" to align with the RAS SPEC and provide more clarity
to customers.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Tue, 25 Feb 2025 10:51:51 +0000 (16:21 +0530)]
drm/amdgpu: Fix logic to fetch supported NPS modes
Correct the logic to find supported NPS modes from firmware.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reported-by: Ava Zhang <niandong.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Fixes:
30eb41f5d1a7 ("drm/amdgpu: Use firmware supported NPS modes")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Mon, 24 Feb 2025 07:13:40 +0000 (15:13 +0800)]
drm/amdgpu: Disable fru_id field in CPER section
The fru_id field is disabled cause of mis-matching defination
between CPER spec and driver.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Mon, 24 Feb 2025 08:16:32 +0000 (13:46 +0530)]
drm/amdkfd: Fix Circular Locking Dependency in 'svm_range_cpu_invalidate_pagetables'
This commit addresses a circular locking dependency in the
svm_range_cpu_invalidate_pagetables function. The function previously
held a lock while determining whether to perform an unmap or eviction
operation, which could lead to deadlocks.
Fixes the below:
[ 223.418794] ======================================================
[ 223.418820] WARNING: possible circular locking dependency detected
[ 223.418845] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE
[ 223.418869] ------------------------------------------------------
[ 223.418889] kfdtest/3939 is trying to acquire lock:
[ 223.418906]
ffff8957552eae38 (&dqm->lock_hidden){+.+.}-{3:3}, at: evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.419302]
but task is already holding lock:
[ 223.419303]
ffff8957556b83b0 (&prange->lock){+.+.}-{3:3}, at: svm_range_cpu_invalidate_pagetables+0x9d/0x850 [amdgpu]
[ 223.419447] Console: switching to colour dummy device 80x25
[ 223.419477] [IGT] amd_basic: executing
[ 223.419599]
which lock already depends on the new lock.
[ 223.419611]
the existing dependency chain (in reverse order) is:
[ 223.419621]
-> #2 (&prange->lock){+.+.}-{3:3}:
[ 223.419636] __mutex_lock+0x85/0xe20
[ 223.419647] mutex_lock_nested+0x1b/0x30
[ 223.419656] svm_range_validate_and_map+0x2f1/0x15b0 [amdgpu]
[ 223.419954] svm_range_set_attr+0xe8c/0x1710 [amdgpu]
[ 223.420236] svm_ioctl+0x46/0x50 [amdgpu]
[ 223.420503] kfd_ioctl_svm+0x50/0x90 [amdgpu]
[ 223.420763] kfd_ioctl+0x409/0x6d0 [amdgpu]
[ 223.421024] __x64_sys_ioctl+0x95/0xd0
[ 223.421036] x64_sys_call+0x1205/0x20d0
[ 223.421047] do_syscall_64+0x87/0x140
[ 223.421056] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 223.421068]
-> #1 (reservation_ww_class_mutex){+.+.}-{3:3}:
[ 223.421084] __ww_mutex_lock.constprop.0+0xab/0x1560
[ 223.421095] ww_mutex_lock+0x2b/0x90
[ 223.421103] amdgpu_amdkfd_alloc_gtt_mem+0xcc/0x2b0 [amdgpu]
[ 223.421361] add_queue_mes+0x3bc/0x440 [amdgpu]
[ 223.421623] unhalt_cpsch+0x1ae/0x240 [amdgpu]
[ 223.421888] kgd2kfd_start_sched+0x5e/0xd0 [amdgpu]
[ 223.422148] amdgpu_amdkfd_start_sched+0x3d/0x50 [amdgpu]
[ 223.422414] amdgpu_gfx_enforce_isolation_handler+0x132/0x270 [amdgpu]
[ 223.422662] process_one_work+0x21e/0x680
[ 223.422673] worker_thread+0x190/0x330
[ 223.422682] kthread+0xe7/0x120
[ 223.422690] ret_from_fork+0x3c/0x60
[ 223.422699] ret_from_fork_asm+0x1a/0x30
[ 223.422708]
-> #0 (&dqm->lock_hidden){+.+.}-{3:3}:
[ 223.422723] __lock_acquire+0x16f4/0x2810
[ 223.422734] lock_acquire+0xd1/0x300
[ 223.422742] __mutex_lock+0x85/0xe20
[ 223.422751] mutex_lock_nested+0x1b/0x30
[ 223.422760] evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.423025] kfd_process_evict_queues+0x8a/0x1d0 [amdgpu]
[ 223.423285] kgd2kfd_quiesce_mm+0x43/0x90 [amdgpu]
[ 223.423540] svm_range_cpu_invalidate_pagetables+0x4a7/0x850 [amdgpu]
[ 223.423807] __mmu_notifier_invalidate_range_start+0x1f5/0x250
[ 223.423819] copy_page_range+0x1e94/0x1ea0
[ 223.423829] copy_process+0x172f/0x2ad0
[ 223.423839] kernel_clone+0x9c/0x3f0
[ 223.423847] __do_sys_clone+0x66/0x90
[ 223.423856] __x64_sys_clone+0x25/0x30
[ 223.423864] x64_sys_call+0x1d7c/0x20d0
[ 223.423872] do_syscall_64+0x87/0x140
[ 223.423880] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 223.423891]
other info that might help us debug this:
[ 223.423903] Chain exists of:
&dqm->lock_hidden --> reservation_ww_class_mutex --> &prange->lock
[ 223.423926] Possible unsafe locking scenario:
[ 223.423935] CPU0 CPU1
[ 223.423942] ---- ----
[ 223.423949] lock(&prange->lock);
[ 223.423958] lock(reservation_ww_class_mutex);
[ 223.423970] lock(&prange->lock);
[ 223.423981] lock(&dqm->lock_hidden);
[ 223.423990]
*** DEADLOCK ***
[ 223.423999] 5 locks held by kfdtest/3939:
[ 223.424006] #0:
ffffffffb82b4fc0 (dup_mmap_sem){.+.+}-{0:0}, at: copy_process+0x1387/0x2ad0
[ 223.424026] #1:
ffff89575eda81b0 (&mm->mmap_lock){++++}-{3:3}, at: copy_process+0x13a8/0x2ad0
[ 223.424046] #2:
ffff89575edaf3b0 (&mm->mmap_lock/1){+.+.}-{3:3}, at: copy_process+0x13e4/0x2ad0
[ 223.424066] #3:
ffffffffb82e76e0 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: copy_page_range+0x1cea/0x1ea0
[ 223.424088] #4:
ffff8957556b83b0 (&prange->lock){+.+.}-{3:3}, at: svm_range_cpu_invalidate_pagetables+0x9d/0x850 [amdgpu]
[ 223.424365]
stack backtrace:
[ 223.424374] CPU: 0 UID: 0 PID: 3939 Comm: kfdtest Tainted: G U OE 6.12.0-amdstaging-drm-next-lol-050225 #14
[ 223.424392] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 223.424401] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO WIFI/X570 AORUS PRO WIFI, BIOS F36a 02/16/2022
[ 223.424416] Call Trace:
[ 223.424423] <TASK>
[ 223.424430] dump_stack_lvl+0x9b/0xf0
[ 223.424441] dump_stack+0x10/0x20
[ 223.424449] print_circular_bug+0x275/0x350
[ 223.424460] check_noncircular+0x157/0x170
[ 223.424469] ? __bfs+0xfd/0x2c0
[ 223.424481] __lock_acquire+0x16f4/0x2810
[ 223.424490] ? srso_return_thunk+0x5/0x5f
[ 223.424505] lock_acquire+0xd1/0x300
[ 223.424514] ? evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.424783] __mutex_lock+0x85/0xe20
[ 223.424792] ? evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.425058] ? srso_return_thunk+0x5/0x5f
[ 223.425067] ? mark_held_locks+0x54/0x90
[ 223.425076] ? evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.425339] ? srso_return_thunk+0x5/0x5f
[ 223.425350] mutex_lock_nested+0x1b/0x30
[ 223.425358] ? mutex_lock_nested+0x1b/0x30
[ 223.425367] evict_process_queues_cpsch+0x43/0x210 [amdgpu]
[ 223.425631] kfd_process_evict_queues+0x8a/0x1d0 [amdgpu]
[ 223.425893] kgd2kfd_quiesce_mm+0x43/0x90 [amdgpu]
[ 223.426156] svm_range_cpu_invalidate_pagetables+0x4a7/0x850 [amdgpu]
[ 223.426423] ? srso_return_thunk+0x5/0x5f
[ 223.426436] __mmu_notifier_invalidate_range_start+0x1f5/0x250
[ 223.426450] copy_page_range+0x1e94/0x1ea0
[ 223.426461] ? srso_return_thunk+0x5/0x5f
[ 223.426474] ? srso_return_thunk+0x5/0x5f
[ 223.426484] ? lock_acquire+0xd1/0x300
[ 223.426494] ? copy_process+0x1718/0x2ad0
[ 223.426502] ? srso_return_thunk+0x5/0x5f
[ 223.426510] ? sched_clock_noinstr+0x9/0x10
[ 223.426519] ? local_clock_noinstr+0xe/0xc0
[ 223.426528] ? copy_process+0x1718/0x2ad0
[ 223.426537] ? srso_return_thunk+0x5/0x5f
[ 223.426550] copy_process+0x172f/0x2ad0
[ 223.426569] kernel_clone+0x9c/0x3f0
[ 223.426577] ? __schedule+0x4c9/0x1b00
[ 223.426586] ? srso_return_thunk+0x5/0x5f
[ 223.426594] ? sched_clock_noinstr+0x9/0x10
[ 223.426602] ? srso_return_thunk+0x5/0x5f
[ 223.426610] ? local_clock_noinstr+0xe/0xc0
[ 223.426619] ? schedule+0x107/0x1a0
[ 223.426629] __do_sys_clone+0x66/0x90
[ 223.426643] __x64_sys_clone+0x25/0x30
[ 223.426652] x64_sys_call+0x1d7c/0x20d0
[ 223.426661] do_syscall_64+0x87/0x140
[ 223.426671] ? srso_return_thunk+0x5/0x5f
[ 223.426679] ? common_nsleep+0x44/0x50
[ 223.426690] ? srso_return_thunk+0x5/0x5f
[ 223.426698] ? trace_hardirqs_off+0x52/0xd0
[ 223.426709] ? srso_return_thunk+0x5/0x5f
[ 223.426717] ? syscall_exit_to_user_mode+0xcc/0x200
[ 223.426727] ? srso_return_thunk+0x5/0x5f
[ 223.426736] ? do_syscall_64+0x93/0x140
[ 223.426748] ? srso_return_thunk+0x5/0x5f
[ 223.426756] ? up_write+0x1c/0x1e0
[ 223.426765] ? srso_return_thunk+0x5/0x5f
[ 223.426775] ? srso_return_thunk+0x5/0x5f
[ 223.426783] ? trace_hardirqs_off+0x52/0xd0
[ 223.426792] ? srso_return_thunk+0x5/0x5f
[ 223.426800] ? syscall_exit_to_user_mode+0xcc/0x200
[ 223.426810] ? srso_return_thunk+0x5/0x5f
[ 223.426818] ? do_syscall_64+0x93/0x140
[ 223.426826] ? syscall_exit_to_user_mode+0xcc/0x200
[ 223.426836] ? srso_return_thunk+0x5/0x5f
[ 223.426844] ? do_syscall_64+0x93/0x140
[ 223.426853] ? srso_return_thunk+0x5/0x5f
[ 223.426861] ? irqentry_exit+0x6b/0x90
[ 223.426869] ? srso_return_thunk+0x5/0x5f
[ 223.426877] ? exc_page_fault+0xa7/0x2c0
[ 223.426888] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 223.426898] RIP: 0033:0x7f46758eab57
[ 223.426906] Code: ba 04 00 f3 0f 1e fa 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 41 89 c0 85 c0 75 2c 64 48 8b 04 25 10 00
[ 223.426930] RSP: 002b:
00007fff5c3e5188 EFLAGS:
00000246 ORIG_RAX:
0000000000000038
[ 223.426943] RAX:
ffffffffffffffda RBX:
00007f4675f8c040 RCX:
00007f46758eab57
[ 223.426954] RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000001200011
[ 223.426965] RBP:
0000000000000000 R08:
0000000000000000 R09:
0000000000000000
[ 223.426975] R10:
00007f4675e81a50 R11:
0000000000000246 R12:
0000000000000001
[ 223.426986] R13:
00007fff5c3e5470 R14:
00007fff5c3e53e0 R15:
00007fff5c3e5410
[ 223.427004] </TASK>
v2: To resolve this issue, the allocation of the process context buffer
(`proc_ctx_bo`) has been moved from the `add_queue_mes` function to the
`pqm_create_queue` function. This change ensures that the buffer is
allocated only when the first queue for a process is created and only if
the Micro Engine Scheduler (MES) is enabled. (Felix)
v3: Fix typo s/Memory Execution Scheduler (MES)/Micro Engine Scheduler
in commit message. (Lijo)
Fixes:
438b39ac74e2 ("drm/amdkfd: pause autosuspend when creating pdd")
Cc: Jesse Zhang <jesse.zhang@amd.com>
Cc: Yunxiang Li <Yunxiang.Li@amd.com>
Cc: Philip Yang <Philip.Yang@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Benjamin Chan [Fri, 31 Jan 2025 19:03:46 +0000 (14:03 -0500)]
drm/amdgpu: Add amdisp pinctrl MFD resource
AMDISP GPIO control uses a dedicated pinctrl driver,
and requires MFD hotadd GPIO resources.
Co-developed-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Benjamin Chan <benjamin.chan@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 20 Feb 2025 14:58:25 +0000 (09:58 -0500)]
drm/amdgpu/mes12: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls
They are noops on GFX12. There is no suspend/resume all support
in firmware so the function doesn't do anything. KFD already
handles its own queues and they should already be unmapped at this
point so even if this runs, it's not doing anything.
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dr. David Alan Gilbert [Mon, 24 Feb 2025 01:49:42 +0000 (01:49 +0000)]
drm/amd/display: Remove unused optc3_fpu_set_vrr_m_const
The last use of optc3_fpu_set_vrr_m_const() was removed in 2022's
commit
64f991590ff4 ("drm/amd/display: Fix a compilation failure on PowerPC
caused by FPU code")
which removed the only caller (with a similar) name.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pratap Nirujogi [Wed, 19 Feb 2025 22:01:26 +0000 (17:01 -0500)]
drm/amdgpu: Replace DRM_ERROR() with drm_err()
DRM_ERROR() is no longer preferred. Replace DRM_ERROR() usage
with drm_err() in isp driver.
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Luan Arcanjo [Tue, 25 Feb 2025 01:55:29 +0000 (22:55 -0300)]
drm/amd/display/dc: Refactor remove duplications
All dce command_table_helper's shares a copy-pasted collection
of copy-pasted functions, which are: phy_id_to_atom,
clock_source_id_to_atom_phy_clk_src_id, and engine_bp_to_atom.
This patch removes the multiple copy-pasted by moving them to
the command_table_helper.c and make the command_table_helper's
calls the functions implemented by the command_table_helper.c
instead.
The changes were not tested on actual hardware. I am only able
to verify that the changes keep the code compileable and do my
best to to look repeatedly if I am not actually changing any code.
This is the version 4 of the PATCH, fixed comments about
licence in the new files and the matches From email to
Signed-off-by email. Fixed comments about using
command_table_helper instead of creating a dce_common
Signed-off-by: Luan Icaro Pinto Arcanjo <luanicaro@usp.br>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 7 Jan 2025 17:26:46 +0000 (12:26 -0500)]
drm/amdgpu/vcn: use dev_info() for firmware information
To properly handle multiple GPUs.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 7 Jan 2025 17:16:28 +0000 (12:16 -0500)]
drm/amdgpu/vcn: optimize firmware storage
If each instance uses the same fw image, only store one
copy in the driver.
Acked-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 10 Dec 2024 19:25:46 +0000 (14:25 -0500)]
drm/amdgpu/vcn5.0.1: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 10 Dec 2024 19:22:53 +0000 (14:22 -0500)]
drm/amdgpu/vcn5.0.0: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 10 Dec 2024 19:22:25 +0000 (14:22 -0500)]
drm/amdgpu/vcn4.0.5: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:41:45 +0000 (12:41 -0500)]
drm/amdgpu/vcn4.0.3: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:41:17 +0000 (12:41 -0500)]
drm/amdgpu/vcn4.0: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:40:47 +0000 (12:40 -0500)]
drm/amdgpu/vcn3.0: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:35:42 +0000 (12:35 -0500)]
drm/amdgpu/vcn2.5: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:33:58 +0000 (12:33 -0500)]
drm/amdgpu/vcn2.0: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:32:12 +0000 (12:32 -0500)]
drm/amdgpu/vcn1.0: use generic set_power_gating_state helper
No need for an IP specific version.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:30:30 +0000 (12:30 -0500)]
drm/amdgpu/vcn: add a generic helper for set_power_gating_state
It's common for all VCN variants.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:26:32 +0000 (12:26 -0500)]
drm/amdgpu/vcn: use per instance callbacks for idle work handler
Use the vcn instance power gating callbacks rather than
the IP powergating callback. This limits power gating to
only the instance in use rather than all of the instances.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 10 Dec 2024 19:15:34 +0000 (14:15 -0500)]
drm/amdgpu/vcn5.0.1: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:20:38 +0000 (12:20 -0500)]
drm/amdgpu/vcn5.0.0: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:20:19 +0000 (12:20 -0500)]
drm/amdgpu/vcn4.0.5: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:20:04 +0000 (12:20 -0500)]
drm/amdgpu/vcn4.0.3: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:19:49 +0000 (12:19 -0500)]
drm/amdgpu/vcn4.0: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:19:33 +0000 (12:19 -0500)]
drm/amdgpu/vcn3.0: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 17:19:15 +0000 (12:19 -0500)]
drm/amdgpu/vcn2.5: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 15:57:48 +0000 (10:57 -0500)]
drm/amdgpu/vcn2.0: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 15:52:15 +0000 (10:52 -0500)]
drm/amdgpu/vcn1.0: add set_pg_state callback
Rework the code as a vcn instance callback.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 16:27:06 +0000 (11:27 -0500)]
drm/amdgpu/vcn: add new per instance callback for powergating
This is per instance so add a new function pointer for it.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 26 Nov 2024 16:14:58 +0000 (11:14 -0500)]
drm/amdgpu/vcn: adjust pause_dpg_mode function signature
Change it to take a vcn instance rather than adev to align
with the vcn instance changes.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 10 Dec 2024 19:00:57 +0000 (14:00 -0500)]
drm/amdgpu/vcn5.0.1: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 22 Nov 2024 23:07:03 +0000 (18:07 -0500)]
drm/amdgpu/vcn5.0.0: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 22 Nov 2024 22:38:48 +0000 (17:38 -0500)]
drm/amdgpu/vcn4.0.5: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 22 Nov 2024 22:01:49 +0000 (17:01 -0500)]
drm/amdgpu/vcn4.0.3: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 22 Nov 2024 20:38:37 +0000 (15:38 -0500)]
drm/amdgpu/vcn4.0: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 22 Nov 2024 18:53:40 +0000 (13:53 -0500)]
drm/amdgpu/vcn2.5: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 22 Nov 2024 18:30:16 +0000 (13:30 -0500)]
drm/amdgpu/vcn2.0: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
v2: index instances directly on vcn1.0 and 2.0 to make
it clear that they only support a single instance (Lijo)
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 19 Nov 2024 21:51:36 +0000 (16:51 -0500)]
drm/amdgpu/vcn1.0: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 19 Nov 2024 21:10:46 +0000 (16:10 -0500)]
drm/amdgpu/vcn3.0: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.
TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 15 Nov 2024 22:44:01 +0000 (17:44 -0500)]
drm/amdgpu/vcn: switch vcn helpers to be instance based
Pass the instance to the helpers.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 15 Nov 2024 21:19:23 +0000 (16:19 -0500)]
drm/amdgpu/vcn: move more instanced data to vcn_instance
Move more per instance data into the per instance structure.
v2: index instances directly on vcn1.0 and 2.0 to make
it clear that they only support a single instance (Lijo)
v3: fix typo on vcn 2.5
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> (v2)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 13 Nov 2024 20:28:41 +0000 (15:28 -0500)]
drm/amdgpu/vcn: make powergating status per instance
Store it per instance so we can track it per instance.
v2: index instances directly on vcn1.0 and 2.0 to make
it clear that they only support a single instance (Lijo)
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 13 Nov 2024 19:43:15 +0000 (14:43 -0500)]
drm/amdgpu/vcn: switch work handler to be per instance
Have a separate work handler for each VCN instance. This
paves the way for per instance VCN power gating at runtime.
v2: index instances directly on vcn1.0 and 2.0 to make
it clear that they only support a single instance (Lijo)
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 10 Dec 2024 17:34:54 +0000 (12:34 -0500)]
drm/amdgpu/vcn5.0.1: split code along instances
Split the code on a per instance basis. This will allow
us to use the per instance functions in the future to
handle more things per instance.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 13 Nov 2024 17:27:45 +0000 (12:27 -0500)]
drm/amdgpu/vcn5.0.0: split code along instances
Split the code on a per instance basis. This will allow
us to use the per instance functions in the future to
handle more things per instance.
v2: squash in fix for stop() from Boyuan
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 13 Nov 2024 17:21:18 +0000 (12:21 -0500)]
drm/amdgpu/vcn4.0.5: split code along instances
Split the code on a per instance basis. This will allow
us to use the per instance functions in the future to
handle more things per instance.
v2: squash in fix for stop() from Boyuan
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 13 Nov 2024 17:13:15 +0000 (12:13 -0500)]
drm/amdgpu/vcn4.0.3: split code along instances
Split the code on a per instance basis. This will allow
us to use the per instance functions in the future to
handle more things per instance.
v2: squash in fix for stop() from Boyuan
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 13 Nov 2024 17:01:44 +0000 (12:01 -0500)]
drm/amdgpu/vcn4.0: split code along instances
Split the code on a per instance basis. This will allow
us to use the per instance functions in the future to
handle more things per instance.
v2: squash in fix for stop() from Boyuan
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 13 Nov 2024 16:47:49 +0000 (11:47 -0500)]
drm/amdgpu/vcn3.0: split code along instances
Split the code on a per instance basis. This will allow
us to use the per instance functions in the future to
handle more things per instance.
v2: squash in fix for stop() from Boyuan
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Mon, 24 Feb 2025 16:13:27 +0000 (11:13 -0500)]
drm/amdgpu/vcn2.5: fix VCN stop logic
Need to make sure we call amdgpu_dpm_enable_vcn()
in vcn_v2_5_stop() at the end if there are errors
or DPG is enabled.
Fixes:
ebc25499de12 ("drm/amdgpu/vcn2.5: split code along instances")
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Suggested-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Tue, 25 Feb 2025 11:18:12 +0000 (19:18 +0800)]
drm/amdgpu: increase AMDGPU_MAX_RINGS
Increase it since a cper ring is introduced.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Srinivasan Shanmugam [Mon, 24 Feb 2025 11:56:22 +0000 (17:26 +0530)]
drm/amdgpu: Fix correct parameter desc for VCN idle check functions
Fixes the kdoc for the following VCN idle check functions by updating
the parameter description from 'handle' to 'ip_block':
- vcn_v4_0_is_idle
- vcn_v4_0_3_is_idle
- vcn_v4_0_5_is_idle
- vcn_v5_0_1_is_idle
Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c:935: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_1_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c:935: warning: Excess function parameter 'handle' description in 'vcn_v5_0_1_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:1972: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:1972: warning: Excess function parameter 'handle' description in 'vcn_v4_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1583: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_3_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1583: warning: Excess function parameter 'handle' description in 'vcn_v4_0_3_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1200: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1200: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1460: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_5_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1460: warning: Excess function parameter 'handle' description in 'vcn_v4_0_5_is_idle'
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pierre-Eric Pelloux-Prayer [Thu, 20 Feb 2025 13:41:59 +0000 (14:41 +0100)]
drm/amdgpu: init return value in amdgpu_ttm_clear_buffer
Otherwise an uninitialized value can be returned if
amdgpu_res_cleared returns true for all regions.
Possibly closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812
Fixes:
a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ganglxie [Mon, 24 Feb 2025 07:06:51 +0000 (15:06 +0800)]
drm/amdgpu: Change page/record number calculation based on nps
save only one record to save eeprom space,and
bad_page_num = pa_rec_num + mca_rec_num*16
Signed-off-by: ganglxie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ganglxie [Mon, 24 Feb 2025 07:03:05 +0000 (15:03 +0800)]
drm/amdgpu: Refine bad page adding
bad page adding can be simpler with nps info
Signed-off-by: ganglxie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Sat, 22 Feb 2025 10:11:35 +0000 (18:11 +0800)]
drm/amd/pm: Get metrics table version for smu_v13_0_12
Get metrics table version for smu_v13_0_12 and populate pm_metrics
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.zhang@amd.com [Fri, 21 Feb 2025 10:18:07 +0000 (18:18 +0800)]
drm/amdgpu: update SDMA sysfs reset mask in late_init
- Added `sdma_v4_4_2_update_reset_mask` function to update the reset mask.
- update the sysfs reset mask to the `late_init` stage to ensure that the SMU initialization
and capability setup are completed before checking the SDMA reset capability.
- For IP versions 9.4.3 and 9.4.4, enable per-queue reset if the MEC firmware version is at least 0xb0 and PMFW supports queue reset.
- Add a TODO comment for future support of per-queue reset for IP version 9.5.0.
This change ensures that per-queue reset is only enabled when the MEC and PMFW support it.
v2: fix ip version (9.5.4 -> 9.5.0)(Lijo)
Suggested-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Mon, 24 Feb 2025 15:01:06 +0000 (23:01 +0800)]
drm/amdgpu: Set CPER enabled flag after ring initiailized
Setting cper.enabled to be true only after cper ring is successfully
created.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ganglxie [Mon, 24 Feb 2025 03:17:33 +0000 (11:17 +0800)]
drm/amdgpu: Save nps to eeprom
nps info saved together with bad page makes bad page parsing more efficient
Signed-off-by: ganglxie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Mon, 24 Feb 2025 13:10:24 +0000 (21:10 +0800)]
drm/amdgpu: Check if CPER enabled when generating CPER
In the case of CPER disabled, generating CPER will cause kernel NULL
pointer dereference without checking.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mangesh Gadre [Fri, 21 Feb 2025 09:38:21 +0000 (17:38 +0800)]
drm/amd/pm: handling of set performance level
display performance level when set not supported
Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Mon, 10 Feb 2025 18:15:48 +0000 (13:15 -0500)]
drm/amdgpu: simplify xgmi peer info calls
Deprecate KFD XGMI peer info calls in favour of calling directly from
simplified XGMI peer info functions.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Fri, 21 Feb 2025 14:39:27 +0000 (09:39 -0500)]
drm/amdkfd: enable cooperative launch on gfx12
Even though GWS no longer exists, to maintain runtime usage for
cooperative launch, SW set legacy GWS size.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Acked-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Sun, 16 Feb 2025 21:27:26 +0000 (16:27 -0500)]
drm/amd/display: Promote DAL to 3.2.322
- Disable PSR-SU on eDP panels
- Fix HPD after GPU reset
- Fixes on dcn4x init, DML2 state policy on DCN36
- Various minor logic fixes
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Sun, 16 Feb 2025 20:06:06 +0000 (15:06 -0500)]
drm/amd/display: [FW Promotion] Release 0.0.255.0
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roman Li [Wed, 12 Feb 2025 19:49:36 +0000 (14:49 -0500)]
drm/amd/display: Fix HPD after gpu reset
[Why]
DC is not using amdgpu_irq_get/put to manage the HPD interrupt refcounts.
So when amdgpu_irq_gpu_reset_resume_helper() reprograms all of the IRQs,
HPD gets disabled.
[How]
Use amdgpu_irq_get/put() for HPD init/fini in DM in order to sync refcounts
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mike Katsnelson [Thu, 13 Feb 2025 16:52:32 +0000 (11:52 -0500)]
drm/amd/display: stop DML2 from removing pipes based on planes
[Why]
Transitioning from low to high resolutions at high refresh rates caused grey corruption.
During the transition state, there is a period where plane size is based on low resultion
state and ODM slices are based on high resoultion state, causing the entire plane to be
contained in one ODM slice. DML2 would turn off the pipe for the ODM slice with no plane,
causing an underflow since the pixel rate for the higher resolution cannot be supported on
one pipe. This change stops DML2 from turning off pipes that are mapped to an ODM slice
with no plane. This is possible to do without negative consequences because pipes can now
take the minimum viewport and draw with zero recout size, removing the need to have the
pipe turned off.
[How]
In map_pipes_from_plane(), remove "check" that skips ODM slices that are not covered by
the plane. This prevents the pipes for those ODM slices from being freed.
Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Mike Katsnelson <mike.katsnelson@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nicholas Kazlauskas [Thu, 13 Feb 2025 22:40:29 +0000 (17:40 -0500)]
drm/amd/display: Increase halt timeout for DMCUB to 1s
[Why]
If we soft reset before halt finishes and there are outstanding
memory transactions then the memory interface may produce unexpected
results, such as out of order transactions when the firmware next runs.
These can manifest as random or unexpected load/store violations.
[How]
Increase the timeout before soft reset to ensure the DMCUB has quiesced.
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Krunoslav Kovac [Fri, 14 Feb 2025 00:14:59 +0000 (19:14 -0500)]
drm/amd/display: Remove unused header
[Why]
Removes unused header
Reviewed-by: Samson Tam <samson.tam@amd.com>
Signed-off-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yihan Zhu [Wed, 12 Feb 2025 20:17:56 +0000 (15:17 -0500)]
drm/amd/display: handle max_downscale_src_width fail check
[WHY]
If max_downscale_src_width check fails, we exit early from TAP calculation and left a NULL
value to the scaling data structure to cause the zero divide in the DML validation.
[HOW]
Call set default TAP calculation before early exit in get_optimal_number_of_taps due to
max downscale limit exceed.
Reviewed-by: Samson Tam <samson.tam@amd.com>
Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michael Strauss [Fri, 24 Jan 2025 20:02:27 +0000 (15:02 -0500)]
drm/amd/display: Update FIXED_VS Link Rate Toggle Workaround Usage
[WHY]
Previously the 128b/132b LTTPR support DPCD field was used to decide if
FIXED_VS training sequence required a rate toggle before initiating LT.
When running DP2.1 4.9.x.x compliance tests, emulated LTTPRs can report
no-128b/132b support which is then forwarded by the FIXED_VS retimer.
As a result this test exposes the rate toggle again, erroneously causing
failures as certain compliance sinks don't expect this behaviour.
[HOW]
Add new DPCD register defines/reads to read LTTPR IEEE OUI and device ID.
Decide whether to perform the rate toggle based on the LTTPR's IEEE OUI
which guarantees that we only perform the toggle on affected retimers.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Thu, 13 Feb 2025 17:37:10 +0000 (12:37 -0500)]
drm/amd/display: fix dcn4x init failed
[why]
failed due to cmdtable not created.
switch atombios cmdtable as default.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aurabindo Pillai [Mon, 20 Jan 2025 20:27:23 +0000 (15:27 -0500)]
drm/amd/display: Temporarily disable hostvm on DCN31
With HostVM enabled, DCN31 fails to pass validation for 3x4k60. Some Linux
userspace does not downgrade one of the monitors to 4k30, and the result
is that the monitor does not light up. Disable it until the bandwidth
calculation failure is resolved.
Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rafal Ostrowski [Wed, 12 Feb 2025 07:08:07 +0000 (08:08 +0100)]
drm/amd/display: ACPI Re-timer Programming
[Why]
We must implement an ACPI re-timer programming interface and notify
ACPI driver whenever a PHY transition is about to take place.
Because some trace lengths on certain platforms are very long,
then a re-timer may need to be programmed whenever a PHY transition
takes place. The implementation of this re-timer programming interface
will notify ACPI driver that PHY transition is taking place and it
will trigger the re-timer as needed.
First we need to gather retimer information from ACPI interface.
Then, in the PRE case, the re-timer interface needs to be called before we call
transmitter ENABLE.
In the POST case, it has to be called after we call transmitter DISABLE.
[How]
Implemented ACPI retimer programming interface.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Rafal Ostrowski <rostrows@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Patel, Swapnil [Sun, 9 Feb 2025 16:42:23 +0000 (11:42 -0500)]
drm/amd/display: Refactor DCN4x and related code
[why & how]
Refactor existing code related to DCN4x for better code sharing with
other modules.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Swapnil Patel <Swapnil.Patel@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yilin Chen [Fri, 7 Feb 2025 20:26:19 +0000 (15:26 -0500)]
drm/amd/display: add a quirk to enable eDP0 on DP1
[why]
some board designs have eDP0 connected to DP1, need a way to enable
support_edp0_on_dp1 flag, otherwise edp related features cannot work
[how]
do a dmi check during dm initialization to identify systems that
require support_edp0_on_dp1. Optimize quirk table with callback
functions to set quirk entries, retrieve_dmi_info can set quirks
according to quirk entries
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Yilin Chen <Yilin.Chen@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Peichen Huang [Fri, 17 Jan 2025 02:48:11 +0000 (10:48 +0800)]
drm/amd/display: replace dio encoder access
[WHY]
replace dio encoder access to work with new dio encoder
assignment.
[HOW}
1. before validation, access dio encoder by get_temp_dio_link_enc()
2. after validation, access dio encoder through pipe_ctx->link_res
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>