From: Janusz Krzysztofik
Date: Fri, 13 Dec 2024 18:59:48 +0000 (+0100)
Subject: drm/i915/selftests: Use preemption timeout on cleanup
X-Git-Tag: block-6.14-20240131~40^2~8^2~46
X-Git-Url: https://git.kernel.dk/?a=commitdiff_plain;h=5efc58e409d9e11fc43a029c4186cf6671dd3521;p=linux-block.git

drm/i915/selftests: Use preemption timeout on cleanup

Many selftests call igt_flush_test() on cleanup.  With the default
preemption timeout of compute engines raised to 7.5 seconds, the
hardcoded flush timeout of 3 seconds is too short.  That results in the
GPU being forcibly wedged and the kernel tainted, which then triggers an
IGT abort.  CI BAT runs lose part of their expected coverage.

Calculate the flush timeout based on the longest preemption timeout
currently configured for any engine.  That way, a selftest can still
report detected issues as non-critical, and the GPU gets a chance to
recover from preemptible hangs and prepare for smooth execution of
subsequent test cases.

Link: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12061
Signed-off-by: Janusz Krzysztofik
Reviewed-by: Andi Shyti
Signed-off-by: Andi Shyti
Link: https://patchwork.freedesktop.org/patch/msgid/20241213190122.513709-2-janusz.krzysztofik@linux.intel.com
---

diff --git a/drivers/gpu/drm/i915/selftests/igt_flush_test.c b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
index 29110abb4fe0..c383d31d46b0 100644
--- a/drivers/gpu/drm/i915/selftests/igt_flush_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
@@ -19,12 +19,22 @@ int igt_flush_test(struct drm_i915_private *i915)
 	int ret = 0;
 
 	for_each_gt(gt, i915, i) {
+		struct intel_engine_cs *engine;
+		unsigned long timeout_ms = 0;
+		unsigned int id;
+
 		if (intel_gt_is_wedged(gt))
 			ret = -EIO;
 
+		for_each_engine(engine, gt, id) {
+			if (engine->props.preempt_timeout_ms > timeout_ms)
+				timeout_ms = engine->props.preempt_timeout_ms;
+		}
+
 		cond_resched();
 
-		if (intel_gt_wait_for_idle(gt, HZ * 3) == -ETIME) {
+		/* 2x longest preempt timeout, experimentally determined */
+		if (intel_gt_wait_for_idle(gt, HZ * timeout_ms / 500) == -ETIME) {
 			pr_err("%pS timed out, cancelling all further testing.\n",
 			       __builtin_return_address(0));