nvme: fix possible hang when removing a controller during error recovery
authorMing Lei <ming.lei@redhat.com>
Tue, 11 Jul 2023 09:40:39 +0000 (17:40 +0800)
committerKeith Busch <kbusch@kernel.org>
Fri, 21 Jul 2023 07:53:26 +0000 (00:53 -0700)
Error recovery can be interrupted by controller removal, then the
controller is left as quiesced, and IO hang can be caused.

Fix the issue by unquiescing controller unconditionally when removing
namespaces.

This way is reasonable and safe given forward progress can be made
when removing namespaces.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reported-by: Chunguang Xu <brookxu.cn@gmail.com>
Closes: https://lore.kernel.org/linux-nvme/cover.1685350577.git.chunguang.xu@shopee.com/
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
drivers/nvme/host/core.c

index 37b6fa7466620436b2ab23cd00eb0de91309bb30..f3a01b79148cb1e3246d759cf0bdab809270eb2a 100644 (file)
@@ -3933,6 +3933,12 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
         */
        nvme_mpath_clear_ctrl_paths(ctrl);
 
+       /*
+        * Unquiesce io queues so any pending IO won't hang, especially
+        * those submitted from scan work
+        */
+       nvme_unquiesce_io_queues(ctrl);
+
        /* prevent racing with ns scanning */
        flush_work(&ctrl->scan_work);
 
@@ -3942,10 +3948,8 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
         * removing the namespaces' disks; fail all the queues now to avoid
         * potentially having to clean up the failed sync later.
         */
-       if (ctrl->state == NVME_CTRL_DEAD) {
+       if (ctrl->state == NVME_CTRL_DEAD)
                nvme_mark_namespaces_dead(ctrl);
-               nvme_unquiesce_io_queues(ctrl);
-       }
 
        /* this is a no-op when called from the controller reset handler */
        nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING_NOIO);