drm/amdgpu: Forward soft recovery errors to userspace
author Joshua Ashton <joshua@froggi.es>
Thu, 7 Mar 2024 19:04:31 +0000 (19:04 +0000)
committer Alex Deucher <alexander.deucher@amd.com>
Tue, 6 Aug 2024 15:11:01 +0000 (11:11 -0400)
As we discussed before[1], soft recovery errors should
be forwarded to userspace, or we can get into a really
bad state where apps keep submitting hanging command
buffers, cascading us to a hard reset.

1: https://lore.kernel.org/all/bf23d5ed-9a6b-43e7-84ee-8cbfd0d60f18@froggi.es/
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c

index e238f2832f65b4daa95fb06f9141c5a05bbe5b77..908e134551523366c4adc914cfa192232f5c0036 100644
@@ -264,9 +264,8 @@ amdgpu_job_prepare_job(struct drm_sched_job *sched_job,
        struct dma_fence *fence = NULL;
        int r;
 
-       /* Ignore soft recovered fences here */
        r = drm_sched_entity_error(s_entity);
-       if (r && r != -ENODATA)
+       if (r)
                goto error;
 
        if (!fence && job->gang_submit)
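
The effect of the hunk above can be illustrated with a small standalone sketch. This is not driver code: the two helper names are hypothetical and only model the condition guarding `goto error`, assuming (as the removed comment suggests) that an entity error of -ENODATA marks a soft-recovered fence. -ECANCELED in the usage note is an arbitrary stand-in for any other error.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper modelling the check before this patch:
 * any entity error aborts the job, except -ENODATA, which the
 * removed comment describes as a soft-recovered fence.
 */
static bool job_rejected_before(int entity_error)
{
	return entity_error && entity_error != -ENODATA;
}

/*
 * Hypothetical helper modelling the check after this patch:
 * every non-zero entity error aborts the job, so a soft
 * recovery is no longer silently ignored.
 */
static bool job_rejected_after(int entity_error)
{
	return entity_error != 0;
}
```

Under these assumptions, `job_rejected_before(-ENODATA)` is false while `job_rejected_after(-ENODATA)` is true; both helpers agree for 0 and for any other error such as -ECANCELED.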