A remote nport can stop responding to PLOGI beyond the ELS I/O timeout
under some fault conditions. When this happens, the non-response triggers
a dev_loss_tmo event from the transport which causes the driver to abort
the PLOGI and stop any retries. This was due to a policy in the ELS
completion handler whenever an ELS was terminated due to driver request.
Revise the ELS completion path to detect PLOGIs that were aborted and
allow retries.
Link: https://lore.kernel.org/r/20211020211417.88754-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
retry = 1;
delay = 100;
break;
+ case IOERR_SLI_ABORTED:
+ /* Retry ELS PLOGI command?
+ * Possibly the rport just wasn't ready.
+ */
+ if (cmd == ELS_CMD_PLOGI) {
+ /* No retry if state change */
+ if (ndlp &&
+ ndlp->nlp_state != NLP_STE_PLOGI_ISSUE)
+ goto out_retry;
+ retry = 1;
+ maxretry = 2;
+ }
+ break;
}
break;