drm/amdkfd: Run restore_workers on freezable WQs
author     Felix Kuehling <Felix.Kuehling@amd.com>
           Fri, 27 Oct 2023 22:21:55 +0000 (18:21 -0400)
committer  Alex Deucher <alexander.deucher@amd.com>
           Wed, 29 Nov 2023 21:49:23 +0000 (16:49 -0500)
commit     9a1c1339abf972477aeef4ea037e650f49c5892d
tree       ad6e97f0185045197b24bab66c42b2746d6316da
parent     4fc26c2f912b5d9232dc4432fb1b7bfd6f016be6

Make restore workers freezable so we don't have to explicitly flush them
in suspend and GPU reset code paths, and we don't accidentally try to
restore BOs while the GPU is suspended. Not having to flush restore_work
also helps avoid lock/fence dependencies in the GPU reset case where we're
not allowed to wait for fences.
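A minimal userspace sketch of the idea, assuming a toy queue that stands in for a freezable workqueue. In the kernel, this behavior comes from allocating the workqueue with WQ_FREEZABLE or queuing work on system_freezable_wq. Work queued while the queue is frozen simply stays pending until thaw, so the suspend path has nothing to flush. All toy_* names are hypothetical, and the model omits the freezer's wait for in-flight work items.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace toy model, not kernel code: a work queue that
 * holds pending items while frozen and runs them only after a thaw. */
#define TOY_MAX_WORK 8

typedef void (*toy_work_fn)(void *arg);

struct toy_wq {
	bool frozen;
	int n;
	toy_work_fn fn[TOY_MAX_WORK];
	void *arg[TOY_MAX_WORK];
};

/* Queue a work item; it runs on the next toy_wq_run() while not frozen. */
static void toy_queue_work(struct toy_wq *wq, toy_work_fn fn, void *arg)
{
	assert(wq->n < TOY_MAX_WORK);
	wq->fn[wq->n] = fn;
	wq->arg[wq->n] = arg;
	wq->n++;
}

/* Run all pending items unless frozen. Returns how many items ran. */
static int toy_wq_run(struct toy_wq *wq)
{
	int ran;

	if (wq->frozen)
		return 0;	/* suspended: pending items stay queued */
	for (ran = 0; ran < wq->n; ran++)
		wq->fn[ran](wq->arg[ran]);
	wq->n = 0;
	return ran;
}

/* Freeze/thaw, as the system freezer would do around suspend/resume. */
static void toy_wq_freeze(struct toy_wq *wq) { wq->frozen = true; }
static void toy_wq_thaw(struct toy_wq *wq) { wq->frozen = false; }

/* Stand-in for a restore worker: counts how often BOs were "restored". */
static void toy_restore_work(void *arg)
{
	int *restores = arg;

	(*restores)++;
}
```

For example, a restore queued after toy_wq_freeze() does not run on toy_wq_run(); it runs exactly once after toy_wq_thaw(), without the caller ever flushing the queue.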

A side effect of this is that multiple concurrent threads can now try to
signal the same eviction fence. Rework eviction fence signaling and
replacement to account for that.
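A hypothetical userspace model of the race, assuming the fence is reduced to a single atomic flag (this is not the dma_fence API). Several evict workers race to signal the same fence, and a compare-and-swap ensures that only the thread that actually performs the transition continues with the eviction; the others treat the fence as already signaled. Fence replacement is not modeled, and the toy_* names are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an eviction fence: signaling is a one-way
 * transition, and only the caller that performs it gets "true". */
struct toy_fence {
	atomic_bool signaled;
};

static bool toy_fence_signal(struct toy_fence *f)
{
	bool expected = false;

	/* Exactly one thread can move the flag from false to true. */
	return atomic_compare_exchange_strong(&f->signaled, &expected, true);
}

struct toy_evict_ctx {
	struct toy_fence *fence;
	atomic_int evictions;	/* times the eviction work actually proceeded */
};

/* Each concurrent evict worker tries to signal the same fence; only the
 * winner proceeds, the others back off. */
static void *toy_evict_worker(void *arg)
{
	struct toy_evict_ctx *ctx = arg;

	if (toy_fence_signal(ctx->fence))
		atomic_fetch_add(&ctx->evictions, 1);
	return NULL;
}

/* Start nthreads workers racing on one fence; returns how many won. */
static int toy_race_evictions(int nthreads)
{
	struct toy_fence fence;
	struct toy_evict_ctx ctx;
	pthread_t tids[16];
	int ret;

	assert(nthreads > 0 && nthreads <= 16);
	atomic_init(&fence.signaled, false);
	ctx.fence = &fence;
	atomic_init(&ctx.evictions, 0);
	for (int i = 0; i < nthreads; i++) {
		ret = pthread_create(&tids[i], NULL, toy_evict_worker, &ctx);
		assert(ret == 0);
		(void)ret;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	return atomic_load(&ctx.evictions);
}
```

Calling toy_race_evictions(8) should report exactly one winner regardless of how the threads are scheduled.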

The GPU reset path can no longer rely on restore_process_worker to resume
queues, because evict/restore workers can run independently of it. Instead,
call a new restore_process_helper directly.
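A sketch of the refactoring pattern, assuming simplified signatures. restore_process_helper and restore_process_worker are named in the text above, but the real functions take driver structures (the worker, for instance, is invoked through a work item); struct toy_process and toy_gpu_reset_resume are hypothetical. The point is that the worker and the reset path share one helper, so the reset path does not depend on when, or whether, the freezable worker runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a KFD process; fields are illustrative only. */
struct toy_process {
	bool bos_valid;		/* BOs validated and fenced again */
	bool queues_running;	/* user queues resumed */
	int restore_calls;	/* how often the shared helper ran */
};

/* Shared helper (simplified signature): restore BOs, then resume queues.
 * Returns 0 on success. */
static int restore_process_helper(struct toy_process *p)
{
	p->restore_calls++;
	p->bos_valid = true;		/* stands in for restoring BOs */
	p->queues_running = true;	/* stands in for resuming queues */
	return 0;
}

/* Worker path (simplified signature): in the driver this runs from a
 * freezable workqueue. A real worker would reschedule itself on failure. */
static void restore_process_worker(struct toy_process *p)
{
	(void)restore_process_helper(p);
}

/* Hypothetical GPU reset path: calls the helper directly instead of
 * waiting for the worker, which may be frozen or racing with evictions. */
static int toy_gpu_reset_resume(struct toy_process *p)
{
	return restore_process_helper(p);
}
```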

This is an RFC and a request for testing.

v2:
- Reworked eviction fence signaling
- Introduced restore_process_helper

v3:
- Handle unsignaled eviction fences in restore_process_bos

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
drivers/gpu/drm/amd/amdkfd/kfd_process.c
drivers/gpu/drm/amd/amdkfd/kfd_svm.c