drm/i915/guc: Handle race condition where wakeref count drops below 0
authorJesus Narvaez <jesus.narvaez@intel.com>
Wed, 28 May 2025 23:05:51 +0000 (16:05 -0700)
committerJohn Harrison <John.C.Harrison@Intel.com>
Fri, 30 May 2025 00:48:34 +0000 (17:48 -0700)
commitf36a75aba1c3176d177964bca76f86a075d2943a
treea03c6bdf4e181dd840713d00c88c3dabb019fc1f
parenta6a26786f22a4ab0227bcf610510c4c9c2df0808
drm/i915/guc: Handle race condition where wakeref count drops below 0

There is a rare race condition when preparing for a reset where
guc_lrc_desc_unpin() could be in the process of deregistering a context
while a different thread is scrubbing outstanding contexts and it alters
the context state and does a wakeref put. Then, if there is a failure
with deregister_context(), a second wakeref put could occur. As a result
the wakeref count could drop below 0 and fail an INTEL_WAKEREF_BUG_ON()
check.

Therefore if there is a failure with deregister_context(), undo the
context state changes and do a wakeref put only if the context was set
to be destroyed earlier.

v2: Expand comment to better explain change. (Daniele)
v3: Removed addition to the original comment. (Daniele)

Fixes: 2f2cc53b5fe7 ("drm/i915/guc: Close deregister-context race against CT-loss")
Signed-off-by: Jesus Narvaez <jesus.narvaez@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Mousumi Jana <mousumi.jana@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250528230551.1855177-1-jesus.narvaez@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c