drm/xe: Block exec and rebind worker while evicting for suspend / hibernate
authorThomas Hellström <thomas.hellstrom@linux.intel.com>
Thu, 4 Sep 2025 16:07:15 +0000 (18:07 +0200)
committerRodrigo Vivi <rodrigo.vivi@intel.com>
Tue, 9 Sep 2025 17:20:31 +0000 (13:20 -0400)
commiteb5723a75104605b7d2207a7d598e314166fbef4
tree40feb89b78ff22d45aa46ca1f731dd0411548042
parentd84820309ed34cc412ce76ecfa9471dae7d7d144
drm/xe: Block exec and rebind worker while evicting for suspend / hibernate

When the xe pm_notifier evicts for suspend / hibernate, there might be
racing tasks trying to re-validate again. This can lead to suspend taking
excessive time or get stuck in a live-lock. This behaviour becomes
much worse with the fix that actually makes re-validation bring back
bos to VRAM rather than letting them remain in TT.

Prevent that by having exec and the rebind worker waiting for a completion
that is set to block by the pm_notifier before suspend and is signaled
by the pm_notifier after resume / wakeup.

It's probably still possible to craft malicious applications that block
suspending. More work is pending to fix that.

v3:
- Avoid wait_for_completion() in the kernel worker since it could
  potentially cause work item flushes from freezable processes to
  wait forever. Instead terminate the rebind workers if needed and
  re-launch at resume. (Matt Auld)
v4:
- Fix some bad naming and leftover debug printouts.
- Fix kerneldoc.
- Use drmm_mutex_init() for the xe->rebind_resume_lock (Matt Auld).
- Rework the interface of xe_vm_rebind_resume_worker (Matt Auld).

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288
Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v6.16+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250904160715.2613-4-thomas.hellstrom@linux.intel.com
(cherry picked from commit 599334572a5a99111015fbbd5152ce4dedc2f8b7)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
drivers/gpu/drm/xe/xe_device_types.h
drivers/gpu/drm/xe/xe_exec.c
drivers/gpu/drm/xe/xe_pm.c
drivers/gpu/drm/xe/xe_vm.c
drivers/gpu/drm/xe/xe_vm.h
drivers/gpu/drm/xe/xe_vm_types.h