drm/amdkfd: Reset GPU on queue preemption failure
author Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Tue, 26 Mar 2024 19:32:46 +0000 (15:32 -0400)
committer Alex Deucher <alexander.deucher@amd.com>
Wed, 27 Mar 2024 05:44:53 +0000 (01:44 -0400)
Currently, with F32 HWS, a GPU reset is triggered only when unmap queue fails.

However, if a compute queue doesn't respond to a preemption request in
time, unmap returns without any error. In this case, only a preemption
error is logged and a reset is not triggered. Trigger a GPU reset in this
case as well.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c

index 151fabf84040020f9a925d0069b76adaa2e524d5..c08b6ee252898d96981c16a8d739e7779778c802 100644
@@ -2000,6 +2000,7 @@ static int unmap_queues_cpsch(struct device_queue_manager *dqm,
        if (mqd_mgr->check_preemption_failed(mqd_mgr, dqm->packet_mgr.priv_queue->queue->mqd)) {
                while (halt_if_hws_hang)
                        schedule();
+               kfd_hws_hang(dqm);
                return -ETIME;
        }
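
The control flow described in the commit message can be modeled in user space as a rough sketch. This is not driver code: `struct toy_dqm`, `toy_hws_hang()` and `toy_unmap_queues()` are hypothetical stand-ins for `struct device_queue_manager`, `kfd_hws_hang()` and `unmap_queues_cpsch()`. The `halt_if_hws_hang` loop is left out, and returning `-ETIME` on the unmap-failure path is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_queue_manager. */
struct toy_dqm {
	bool unmap_fails;        /* the unmap request itself reports an error */
	bool preemption_failed;  /* queue did not respond to preemption in time */
	int resets_requested;    /* number of GPU resets requested so far */
};

/* Stand-in for kfd_hws_hang(): only records that a reset was requested. */
static void toy_hws_hang(struct toy_dqm *dqm)
{
	dqm->resets_requested++;
}

/* Toy model of the two failure paths after this patch. */
static int toy_unmap_queues(struct toy_dqm *dqm)
{
	assert(dqm != NULL);

	/* Path that already requested a reset before the patch. */
	if (dqm->unmap_fails) {
		toy_hws_hang(dqm);
		return -ETIME; /* assumed error code for this sketch */
	}

	/*
	 * Unmap returned without error, but the queue was not preempted.
	 * Before the patch this path only returned -ETIME; the added
	 * call requests a reset here as well.
	 */
	if (dqm->preemption_failed) {
		toy_hws_hang(dqm);
		return -ETIME;
	}

	return 0;
}
```

In this model, a `toy_dqm` with only `preemption_failed` set makes `toy_unmap_queues()` return `-ETIME` and leaves `resets_requested` at 1. Without the added call it would stay at 0, which corresponds to the gap the patch closes.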