block/mq-deadline: use separate insertion lists
Reduce lock contention on dd->lock by calling dd_insert_request() from
inside the dispatch callback instead of from the insert callback. This
patch is inspired by a patch from Jens.
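The idea can be illustrated with a minimal userspace sketch. This is not the kernel code: struct sched, sched_insert(), sched_dispatch() and sorted_add() are hypothetical names, pthread mutexes stand in for spinlocks, and "lowest sector first" is a simplified stand-in for the real dispatch policy. The insert path only takes a small insertion-list lock, and the dispatch path moves the queued requests into the sorted queue while holding the main lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical request: a singly linked node carrying a sector number. */
struct req {
	unsigned long sector;
	struct req *next;
};

/* Hypothetical scheduler state; the names are illustrative only. */
struct sched {
	pthread_mutex_t insert_lock;	/* protects insert_head/insert_tail only */
	struct req *insert_head;	/* FIFO of requests not yet sorted */
	struct req *insert_tail;
	pthread_mutex_t lock;		/* stands in for dd->lock */
	struct req *sorted;		/* sector-sorted queue, protected by lock */
};

static void sched_init(struct sched *s)
{
	pthread_mutex_init(&s->insert_lock, NULL);
	pthread_mutex_init(&s->lock, NULL);
	s->insert_head = s->insert_tail = NULL;
	s->sorted = NULL;
}

/* Insert path: append to the insertion list without touching s->lock. */
static void sched_insert(struct sched *s, struct req *rq)
{
	assert(rq != NULL);
	rq->next = NULL;
	pthread_mutex_lock(&s->insert_lock);
	if (s->insert_tail)
		s->insert_tail->next = rq;
	else
		s->insert_head = rq;
	s->insert_tail = rq;
	pthread_mutex_unlock(&s->insert_lock);
}

/* Sorted insertion by ascending sector. The caller must hold s->lock. */
static void sorted_add(struct sched *s, struct req *rq)
{
	struct req **pp = &s->sorted;

	while (*pp && (*pp)->sector <= rq->sector)
		pp = &(*pp)->next;
	rq->next = *pp;
	*pp = rq;
}

/* Dispatch path: drain the insertion list, then pick the next request. */
static struct req *sched_dispatch(struct sched *s)
{
	struct req *batch, *rq;

	pthread_mutex_lock(&s->lock);

	/* Detach the whole insertion list in O(1) under the small lock. */
	pthread_mutex_lock(&s->insert_lock);
	batch = s->insert_head;
	s->insert_head = s->insert_tail = NULL;
	pthread_mutex_unlock(&s->insert_lock);

	/* The expensive sorted insertion happens here, not in the insert path. */
	while (batch) {
		rq = batch;
		batch = batch->next;
		sorted_add(s, rq);
	}

	/* Simplified policy: hand out the request with the lowest sector. */
	rq = s->sorted;
	if (rq) {
		s->sorted = rq->next;
		rq->next = NULL;
	}
	pthread_mutex_unlock(&s->lock);
	return rq;
}
```

In this sketch, submitters calling sched_insert() never wait on the main lock; they only contend on the short critical section that appends to the insertion list.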
With the previous dispatch and merge optimizations, this drastically
reduces contention for a sample case of 32 threads doing IO to devices.
The test case looks as follows:
fio --bs=512 --group_reporting=1 --gtod_reduce=1 --invalidate=1 \
--ioengine=io_uring --norandommap --runtime=60 --rw=randread \
--thread --time_based=1 --buffered=0 --fixedbufs=1 --numjobs=32 \
--iodepth=4 --iodepth_batch_submit=4 --iodepth_batch_complete=4 \
--name=scaletest --filename=/dev/$DEV
Before:
Device      IOPS     sys      contention   diff
====================================================
null_blk    879K     89%      93.6%
nvme0n1     901K     86%      94.5%
and after this and the previous two patches:
Device      IOPS     sys      contention   diff
====================================================
null_blk    2867K    11.1%    ~6.0%        +226%
nvme0n1     3162K    9.9%     ~5.0%        +250%
which basically eliminates all of the lock contention; it's down to
more normal levels. The throughput increases show that nicely, with more
than a 200% improvement for both cases.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
[axboe: expand commit message with more details and perf results]
Signed-off-by: Jens Axboe <axboe@kernel.dk>