drm/xe/vf: Retry sending MMIO request to GUC on timeout error
author Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Mon, 24 Feb 2025 10:28:07 +0000 (15:58 +0530)
committer Michal Wajdeczko <michal.wajdeczko@intel.com>
Fri, 28 Feb 2025 09:20:35 +0000 (10:20 +0100)
Allow the VF to retry sending an MMIO request to the GUC when
the request times out. During the suspend/resume process, VFs
begin resuming only after the PF has resumed. However, even after
the PF has resumed, the GUC reset and provisioning still happen
later, in a separate worker process.

When there are a large number of VFs, some may attempt to resume
before the PF has completed its provisioning. Therefore, if an
MMIO request from a VF times out during this period, send it
again, for up to GUC_RESET_VF_STATE_RETRY_MAX (10) attempts in
total. Any other error, or success, ends the loop immediately.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michał Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piorkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250224102807.11065-3-satyanarayana.k.v.p@intel.com
drivers/gpu/drm/xe/xe_gt_sriov_vf.c

index 4831549da319aa8c2b929ef7dfa5eaf576433458..a439261bf4d7294cf29df83a8e47392ddd92151c 100644
@@ -47,12 +47,19 @@ static int guc_action_vf_reset(struct xe_guc *guc)
        return ret > 0 ? -EPROTO : ret;
 }
 
+#define GUC_RESET_VF_STATE_RETRY_MAX   10
 static int vf_reset_guc_state(struct xe_gt *gt)
 {
+       unsigned int retry = GUC_RESET_VF_STATE_RETRY_MAX;
        struct xe_guc *guc = &gt->uc.guc;
        int err;
 
-       err = guc_action_vf_reset(guc);
+       do {
+               err = guc_action_vf_reset(guc);
+               if (!err || err != -ETIMEDOUT)
+                       break;
+       } while (--retry);
+
        if (unlikely(err))
                xe_gt_sriov_err(gt, "Failed to reset GuC state (%pe)\n", ERR_PTR(err));
        return err;
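
The bounded retry in the hunk above can be exercised outside the kernel. The following is a minimal userspace sketch, not the driver code: mock_vf_reset() and run_scenario() are hypothetical stand-ins for guc_action_vf_reset() and its caller, and the mock simply times out a configurable number of times before returning a final result. Because 0 is already different from -ETIMEDOUT, the sketch uses the single check `err != -ETIMEDOUT`, which behaves the same as the `!err || err != -ETIMEDOUT` test in the patch.

```c
#include <errno.h>

#define GUC_RESET_VF_STATE_RETRY_MAX 10

/* Hypothetical mock state: not part of the xe driver. */
static int mock_timeouts_left;  /* how many calls still time out */
static int mock_final_err;      /* result returned once timeouts run out */
static int mock_attempts;       /* number of calls made so far */

/* Stand-in for guc_action_vf_reset(): times out, then returns final result. */
static int mock_vf_reset(void)
{
	mock_attempts++;
	if (mock_timeouts_left > 0) {
		mock_timeouts_left--;
		return -ETIMEDOUT;
	}
	return mock_final_err;
}

/* Same loop shape as vf_reset_guc_state(): at most RETRY_MAX attempts,
 * repeating only while the action reports -ETIMEDOUT. */
static int reset_with_retry(int (*action)(void))
{
	unsigned int retry = GUC_RESET_VF_STATE_RETRY_MAX;
	int err;

	do {
		err = action();
		if (err != -ETIMEDOUT)  /* success or a non-timeout error */
			break;
	} while (--retry);

	return err;
}

/* Configure the mock, run the loop, and report how many attempts were made. */
static int run_scenario(int timeouts, int final_err, int *attempts)
{
	mock_timeouts_left = timeouts;
	mock_final_err = final_err;
	mock_attempts = 0;

	int err = reset_with_retry(mock_vf_reset);

	*attempts = mock_attempts;
	return err;
}
```

With three simulated timeouts, run_scenario(3, 0, &n) returns 0 after four attempts. If the mock never stops timing out, the loop gives up after ten attempts and returns -ETIMEDOUT, and a non-timeout error such as -EPROTO is returned after a single attempt.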