drm/i915/guc: Avoid circular locking issue on busyness flush
author John Harrison <John.C.Harrison@Intel.com>
Tue, 19 Dec 2023 19:59:57 +0000 (11:59 -0800)
committer John Harrison <John.C.Harrison@Intel.com>
Tue, 9 Jan 2024 18:14:09 +0000 (10:14 -0800)
commit 0e00a8814eec16057e783170456442adde80c0b4
tree 4343401c54b41dfbc6997713c11fea49fe3e22a0
parent 2f2cc53b5fe7022f3ae602eb24573d52f8740959

Avoid the following lockdep complaint:
<4> [298.856498] ======================================================
<4> [298.856500] WARNING: possible circular locking dependency detected
<4> [298.856503] 6.7.0-rc5-CI_DRM_14017-g58ac4ffc75b6+ #1 Tainted: G
    N
<4> [298.856505] ------------------------------------------------------
<4> [298.856507] kworker/4:1H/190 is trying to acquire lock:
<4> [298.856509] ffff8881103e9978 (&gt->reset.backoff_srcu){++++}-{0:0}, at:
_intel_gt_reset_lock+0x35/0x380 [i915]
<4> [298.856661]
but task is already holding lock:
<4> [298.856663] ffffc900013f7e58
((work_completion)(&(&guc->timestamp.work)->work)){+.+.}-{0:0}, at:
process_scheduled_works+0x264/0x530
<4> [298.856671]
which lock already depends on the new lock.

The complaint is not actually valid. The busyness worker thread does
indeed hold the worker lock and then attempt to acquire the reset lock
(which may have happened in reverse order elsewhere). However, it does
so with a trylock that exits if the reset lock is not available
(specifically to prevent this and other similar deadlocks).
Unfortunately, lockdep does not understand the trylock semantics here
(the lock is an i915-specific custom implementation for resets).
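
The trylock-and-back-off behaviour described above can be illustrated
with a minimal userspace sketch. This is not the driver code: the
names reset_lock, reset_trylock(), reset_unlock() and
busyness_worker_once() are hypothetical, and a C11 atomic_flag stands
in for the i915 reset lock purely for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the i915 reset lock (not the driver's type). */
static atomic_flag reset_lock = ATOMIC_FLAG_INIT;

/* Counts how many times the modelled worker managed to update its stats. */
static unsigned long busyness_updates;

/* Try to take the lock without waiting; true means the caller now owns it. */
static bool reset_trylock(void)
{
	return !atomic_flag_test_and_set(&reset_lock);
}

static void reset_unlock(void)
{
	atomic_flag_clear(&reset_lock);
}

/*
 * Model of one run of the busyness worker. The worker conceptually holds
 * the "worker lock" for the duration of this call, so it must never block
 * on the reset lock: a reset path may own that lock and be waiting for
 * the worker to finish.
 */
static bool busyness_worker_once(void)
{
	if (!reset_trylock())
		return false;	/* reset in progress: back off, retry later */

	busyness_updates++;
	reset_unlock();
	return true;
}
```

Because the worker never waits on the reset lock, the opposite
acquisition order on the reset path cannot deadlock against it. A
checker that only records acquisition order, and not the fact that the
acquisition is conditional, would still flag the pair.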

Not doing a synchronous flush of the worker thread when a reset is in
progress resolves the lockdep splat by never even attempting to grab
the lock in this particular scenario.

There are situations where a synchronous cancel is required, however.
So, always do the synchronous cancel if not in reset. Also add an extra
synchronous cancel at the end of the reset flow. This covers a reset
that occurs during driver shutdown: the cancel issued during the reset
is no longer synchronous, and a worker that is still running after
shutdown could access unallocated memory.
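
The resulting cancel policy can be sketched as a small userspace model.
This is an illustration under stated assumptions, not the patch itself:
struct busyness_worker and every function below are hypothetical, and
the "wait" in cancel_sync() is modelled by clearing a flag. In kernel
terms, the two variants would roughly correspond to the workqueue
helpers cancel_delayed_work() and cancel_delayed_work_sync().

```c
#include <stdbool.h>

/* Hypothetical model of the busyness worker's state. */
struct busyness_worker {
	bool queued;	/* work is pending and has not started yet */
	bool running;	/* the work callback is currently executing */
};

/*
 * Non-blocking cancel: removes pending work only. A callback that is
 * already executing keeps running. Returns true if work was pending.
 */
static bool cancel_async(struct busyness_worker *w)
{
	bool was_queued = w->queued;

	w->queued = false;
	return was_queued;
}

/*
 * Blocking cancel: removes pending work and also waits for a running
 * callback to finish. The wait is modelled by clearing the flag.
 */
static bool cancel_sync(struct busyness_worker *w)
{
	bool was_queued = cancel_async(w);

	w->running = false;	/* stands in for "wait until it returns" */
	return was_queued;
}

/* Only block on the worker when no reset is in progress. */
static void cancel_busyness_worker(struct busyness_worker *w, bool in_reset)
{
	if (in_reset)
		cancel_async(w);
	else
		cancel_sync(w);
}

/*
 * End of the reset flow: a blocking cancel is safe again, and it
 * guarantees the worker is stopped before any teardown frees its data.
 */
static void reset_finish(struct busyness_worker *w)
{
	cancel_sync(w);
}
```

In this model, a cancel issued during a reset can leave the callback
running, which is why the extra blocking cancel at the end of the reset
flow is needed before memory is released.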

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20231219195957.212600-1-John.C.Harrison@Intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
drivers/gpu/drm/i915/gt/uc/intel_uc.c