drm/amdgpu: Check fence emitted count to identify bad jobs
author Shikang Fan <shikang.fan@amd.com>
Thu, 21 Nov 2024 09:06:30 +0000 (17:06 +0800)
committer Alex Deucher <alexander.deucher@amd.com>
Tue, 10 Dec 2024 15:26:48 +0000 (10:26 -0500)
commit 0859eb540f1412cced6234922626c8b1e6072126
tree 690a494a01bbb985eaa6f6e4569fccdb0e0c3a36
parent 9aa879da796fde31533e72884276a440c8c1d886

In SRIOV, when the host driver performs a MODE 1 reset and notifies the
guest driver of the FLR, there is a small chance that no job is running on
the hardware but the driver has not yet updated the pending list, causing
the driver not to respond to the FLR request. Modify the has_job_running
function to check whether a job is actually still running.

v2: Use amdgpu_fence_count_emitted to determine job running status.
v3: Remove the timeout wait in has_job_running
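The difference between the two checks can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the patch itself: struct model_ring, model_fence_count_emitted(), has_job_running_pending_list(), has_job_running_fences(), check_race_window() and NUM_RINGS are hypothetical names. The sync_seq/last_seq pair is only loosely modeled on the amdgpu fence driver's "last emitted" and "last signaled" sequence numbers, and the "ready" flag stands in for whatever per-ring readiness test the real code uses. The model shows the race window from the message above: the hardware has already signaled the job's fence, but the job is still on the pending list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_RINGS 4 /* hypothetical ring count for the model */

/* Hypothetical, simplified stand-in for per-ring scheduler and fence state. */
struct model_ring {
	bool ready;            /* ring is in use by the scheduler */
	uint32_t sync_seq;     /* last fence sequence number emitted */
	uint32_t last_seq;     /* last fence sequence number signaled by hw */
	unsigned pending_jobs; /* jobs still on the scheduler pending list */
};

/* Fences emitted but not yet signaled; unsigned subtraction handles wrap. */
static uint32_t model_fence_count_emitted(const struct model_ring *ring)
{
	return (uint32_t)(ring->sync_seq - ring->last_seq);
}

/* Pending-list check: may report a job that the hw has already finished. */
static bool has_job_running_pending_list(const struct model_ring *rings, int n)
{
	for (int i = 0; i < n; i++)
		if (rings[i].ready && rings[i].pending_jobs)
			return true;
	return false;
}

/* Fence-count check: a job runs only if an emitted fence is unsignaled. */
static bool has_job_running_fences(const struct model_ring *rings, int n)
{
	for (int i = 0; i < n; i++)
		if (rings[i].ready && model_fence_count_emitted(&rings[i]))
			return true;
	return false;
}

/* Race window: fence 7 is signaled, but the job is still on the list. */
static void check_race_window(void)
{
	struct model_ring rings[NUM_RINGS] = {0};

	rings[0] = (struct model_ring){ .ready = true, .sync_seq = 7,
					.last_seq = 7, .pending_jobs = 1 };

	assert(has_job_running_pending_list(rings, NUM_RINGS));
	assert(!has_job_running_fences(rings, NUM_RINGS));
}
```

Calling check_race_window() exercises the case where the pending-list check still reports a running job while the fence count is already zero, which is the state in which the guest should go on to respond to the FLR.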

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Shikang Fan <shikang.fan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c