nvme-multipath: don't inherit LBA-related fields for the multipath node
author Christoph Hellwig <hch@lst.de>
Thu, 21 Mar 2024 21:08:19 +0000 (07:08 +1000)
committer Keith Busch <kbusch@kernel.org>
Tue, 2 Apr 2024 15:06:55 +0000 (08:06 -0700)
Since Linux 6.9 the nvme multipath nodes do not properly pick up changes when
the LBA size gets smaller after an nvme format.  This is because we now
try to inherit the queue settings for the multipath node entirely from
the individual paths.  That is the right thing to do for I/O size
limitations, which make up most of the queue limits, but it is wrong for
changes to the namespace configuration, where we do want to pick up the
new format, which will eventually show up on all paths once they are
re-queried.

Fix this by not inheriting the block size and related fields and instead
always updating them.

Fixes: 8f03cfa117e0 ("nvme: don't use nvme_update_disk_info for the multipath disk")
Reported-by: Nilay Shroff <nilay@linux.ibm.com>
Tested-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
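
The failure mode described in the commit message can be sketched with a toy
model. This is not the kernel code: struct toy_limits and all toy_* helpers
are hypothetical names, and the stacking rule used here (keep the larger
block size, keep the smaller I/O size limit) is a simplified assumption about
how limit stacking behaves. It is meant only to show why stacking alone can
never lower the block size of the multipath node, and why copying the
configuration fields first fixes that.

```c
/* Toy stand-in for the kernel's struct queue_limits (hypothetical). */
struct toy_limits {
	unsigned int logical_block_size;	/* device configuration */
	unsigned int max_sectors;		/* I/O splitting limit */
};

/*
 * Simplified stacking rule (an assumption for illustration): the stacked
 * device keeps the largest block size and the smallest I/O size limit.
 */
static void toy_stack_limits(struct toy_limits *top,
			     const struct toy_limits *path)
{
	if (path->logical_block_size > top->logical_block_size)
		top->logical_block_size = path->logical_block_size;
	if (path->max_sectors < top->max_sectors)
		top->max_sectors = path->max_sectors;
}

/* Behaviour before the fix: everything is inherited by stacking. */
static struct toy_limits toy_update_head_stack_only(struct toy_limits head,
		const struct toy_limits *path)
{
	toy_stack_limits(&head, path);
	return head;
}

/*
 * Behaviour after the fix: take the configuration field from the path
 * that was just queried, then stack so the splitting limits still
 * respect the most restrictive path.
 */
static struct toy_limits toy_update_head_fixed(struct toy_limits head,
		const struct toy_limits *path)
{
	head.logical_block_size = path->logical_block_size;
	toy_stack_limits(&head, path);
	return head;
}
```

With a head that still carries a 4096-byte block size and a path that was
just reformatted to 512 bytes, the stack-only helper leaves the head at 4096,
while the fixed helper lowers it to 512. In both cases max_sectors ends up at
the smaller of the two values.
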
drivers/nvme/host/core.c

index 943d72bdd794ca5e6258cb02841447ca38898251..0cf46068f1d014fe6a4e5078ca0900d64268ec46 100644
@@ -2201,6 +2201,7 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
        }
 
        if (!ret && nvme_ns_head_multipath(ns->head)) {
+               struct queue_limits *ns_lim = &ns->disk->queue->limits;
                struct queue_limits lim;
 
                blk_mq_freeze_queue(ns->head->disk->queue);
@@ -2212,7 +2213,26 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
                set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
                nvme_mpath_revalidate_paths(ns);
 
+               /*
+                * queue_limits mixes values that are the hardware limitations
+                * for bio splitting with what is the device configuration.
+                *
+                * For NVMe the device configuration can change after e.g. a
+                * Format command, and we really want to pick up the new format
+                * value here.  But we must still stack the queue limits to the
+                * least common denominator for multipathing to split the bios
+                * properly.
+                *
+                * To work around this, we explicitly set the device
+                * configuration to those that we just queried, but only stack
+                * the splitting limits in to make sure we still obey possibly
+                * lower limitations of other controllers.
+                */
                lim = queue_limits_start_update(ns->head->disk->queue);
+               lim.logical_block_size = ns_lim->logical_block_size;
+               lim.physical_block_size = ns_lim->physical_block_size;
+               lim.io_min = ns_lim->io_min;
+               lim.io_opt = ns_lim->io_opt;
                queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
                                        ns->head->disk->disk_name);
                ret = queue_limits_commit_update(ns->head->disk->queue, &lim);