nouveau/gsp: add a 50ms delay between fbsr and driver unload rpcs
author Dave Airlie <airlied@redhat.com>
Wed, 2 Jul 2025 23:27:07 +0000 (09:27 +1000)
committer Danilo Krummrich <dakr@kernel.org>
Thu, 3 Jul 2025 22:22:12 +0000 (00:22 +0200)
This fixes a bunch of command hangs after runtime suspend/resume.

This fixes a regression caused by code movement in the commit below.
That commit seems to change timings just enough to trigger the hangs,
and adding the sleep seems to avoid them.

I've spent some time trying to root cause it, to no great avail.
It looks like a bug on the firmware side, but it could also be a bug
in our RPC handling that I can't find.

Either way, we should land the workaround to fix the problem,
while we continue to work out the root cause.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: Ben Skeggs <bskeggs@nvidia.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Fixes: c21b039715ce ("drm/nouveau/gsp: add hals for fbsr.suspend/resume()")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20250702232707.175679-1-airlied@gmail.com
drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c

index baf42339f93ea78801db62c4afb83cb575d95b48..23f80e16770582febcaddbb8c7d54ad26a974c09 100644
@@ -1744,6 +1744,13 @@ r535_gsp_fini(struct nvkm_gsp *gsp, bool suspend)
                        nvkm_gsp_sg_free(gsp->subdev.device, &gsp->sr.sgt);
                        return ret;
                }
+
+               /*
+                * TODO: Debug the GSP firmware / RPC handling to find out why
+                * without this Turing (but none of the other architectures)
+                * ends up resetting all channels after resume.
+                */
+               msleep(50);
        }
 
        ret = r535_gsp_rpc_unloading_guest_driver(gsp, suspend);
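
The ordering this hunk enforces (fbsr suspend RPC, then a settle delay, then the driver unload RPC) can be sketched in user space as below. This is only an illustration under stated assumptions, not the nouveau code: fini_sequence(), sleep_ms(), rpc_fn, struct call_log and the demo_* stubs are hypothetical names, the RPCs are replaced by callbacks that record the order in which they ran, and nanosleep() stands in for the kernel's msleep().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for an RPC; not a nouveau function signature. */
typedef int (*rpc_fn)(void *ctx, bool suspend);

/* Records which stub ran, and in what order (demo only). */
struct call_log {
	int count;
	char order[4];
};

/* Sleep for at least ms milliseconds, restarting if a signal interrupts. */
static int sleep_ms(long ms)
{
	struct timespec req;

	assert(ms >= 0);
	req.tv_sec = ms / 1000;
	req.tv_nsec = (ms % 1000) * 1000000L;

	/* On EINTR, nanosleep() stores the remaining time back into req. */
	while (nanosleep(&req, &req) == -1) {
		if (errno != EINTR)
			return -errno;
	}
	return 0;
}

/*
 * On the suspend path: issue the fbsr RPC, wait settle_ms, then unload.
 * On the non-suspend path: only the unload RPC is issued, with no delay.
 */
static int fini_sequence(void *ctx, bool suspend, rpc_fn fbsr_suspend,
			 rpc_fn unload, long settle_ms)
{
	int ret;

	if (suspend) {
		ret = fbsr_suspend(ctx, suspend);
		if (ret)
			return ret; /* do not unload after a failed save */

		ret = sleep_ms(settle_ms);
		if (ret)
			return ret;
	}

	return unload(ctx, suspend);
}

/* Demo stubs: append a marker to the log instead of talking to firmware. */
static int demo_fbsr_suspend(void *ctx, bool suspend)
{
	struct call_log *log = ctx;

	(void)suspend;
	log->order[log->count++] = 'F';
	return 0;
}

static int demo_unload(void *ctx, bool suspend)
{
	struct call_log *log = ctx;

	(void)suspend;
	log->order[log->count++] = 'U';
	return 0;
}
```

Calling fini_sequence(&log, true, demo_fbsr_suspend, demo_unload, 50) with a zeroed struct call_log mirrors the suspend path in the hunk: the delay sits strictly between the two calls, and it is skipped when suspend is false. The sketch says nothing about why the delay helps; per the commit message, that is still unknown.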