block/bfq: use separate insertion lists

Based on the similar patch for mq-deadline, this uses separate
insertion lists so we can defer touching bfqd->lock until dispatch
time.
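
The idea can be sketched in userspace C as follows. This is only an
illustration under stated assumptions, not the code in this patch: the
names struct sched, struct req, sched_insert() and sched_dispatch() are
hypothetical, and pthread mutexes stand in for kernel spinlocks. The
insertion path takes only a small insert lock; the scheduler lock is
taken at dispatch time, where the pending list is spliced in one go.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical request type, singly linked for brevity. */
struct req {
	int id;
	struct req *next;
};

/* Hypothetical scheduler state; not the real bfq_data layout. */
struct sched {
	pthread_mutex_t insert_lock;	/* protects only the pending list */
	struct req *pending_head;
	struct req *pending_tail;

	pthread_mutex_t lock;		/* the contended scheduler lock */
	struct req *queue_head;
	struct req *queue_tail;
	int nr_queued;
};

static void sched_init(struct sched *s)
{
	pthread_mutex_init(&s->insert_lock, NULL);
	pthread_mutex_init(&s->lock, NULL);
	s->pending_head = s->pending_tail = NULL;
	s->queue_head = s->queue_tail = NULL;
	s->nr_queued = 0;
}

/* Insertion path: appends to the pending list, never takes s->lock. */
static void sched_insert(struct sched *s, struct req *rq)
{
	rq->next = NULL;
	pthread_mutex_lock(&s->insert_lock);
	if (s->pending_tail)
		s->pending_tail->next = rq;
	else
		s->pending_head = rq;
	s->pending_tail = rq;
	pthread_mutex_unlock(&s->insert_lock);
}

/* Dispatch path: the only place that takes s->lock. */
static struct req *sched_dispatch(struct sched *s)
{
	struct req *head, *tail, *rq;

	pthread_mutex_lock(&s->lock);

	/* Detach everything inserted so far with one short critical section. */
	pthread_mutex_lock(&s->insert_lock);
	head = s->pending_head;
	tail = s->pending_tail;
	s->pending_head = s->pending_tail = NULL;
	pthread_mutex_unlock(&s->insert_lock);

	/* Do the real queueing work now, under the scheduler lock. */
	if (head) {
		for (rq = head; rq; rq = rq->next)
			s->nr_queued++;
		if (s->queue_tail)
			s->queue_tail->next = head;
		else
			s->queue_head = head;
		s->queue_tail = tail;
	}

	/* Pop the oldest queued request, if any. */
	rq = s->queue_head;
	if (rq) {
		s->queue_head = rq->next;
		if (!s->queue_head)
			s->queue_tail = NULL;
		rq->next = NULL;
		s->nr_queued--;
	}

	pthread_mutex_unlock(&s->lock);
	return rq;
}
```

In this sketch, submitters contend only on insert_lock, which is held
for a few pointer updates, while the heavier scheduler lock is taken
once per dispatch rather than once per inserted request.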
This improves the following fio workload:
fio --bs=512 --group_reporting=1 --gtod_reduce=1 --invalidate=1 \
--ioengine=io_uring --norandommap --runtime=60 --rw=randread \
--thread --time_based=1 --buffered=0 --fixedbufs=1 --numjobs=32 \
--iodepth=4 --iodepth_batch_submit=4 --iodepth_batch_complete=4 \
--name=/dev/nvme0n1 --filename=/dev/nvme0n1
from:
/dev/nvme0n1: (groupid=0, jobs=32): err= 0: pid=1113: Fri Jan 19 20:59:26 2024
read: IOPS=567k, BW=277MiB/s (290MB/s)(1820MiB/6575msec)
bw ( KiB/s): min=274824, max=291156, per=100.00%, avg=283930.08, stdev=143.01, samples=416
iops : min=549648, max=582312, avg=567860.31, stdev=286.01, samples=416
cpu : usr=0.18%, sys=86.04%, ctx=866079, majf=0, minf=0
IO depths : 1=0.0%, 2=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=3728344,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=4
with 96% lock contention and 86% sys time, to:
/dev/nvme0n1: (groupid=0, jobs=32): err= 0: pid=8922: Sat Jan 20 11:16:20 2024
read: IOPS=1550k, BW=757MiB/s (794MB/s)(19.6GiB/26471msec)
bw ( KiB/s): min=754668, max=848896, per=100.00%, avg=775459.33, stdev=624.43, samples=1664
iops : min=1509336, max=1697793, avg=1550918.83, stdev=1248.87, samples=1664
cpu : usr=1.34%, sys=14.49%, ctx=9950560, majf=0, minf=0
IO depths : 1=0.0%, 2=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=41042924,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=4
with ~30% lock contention and 14.5% sys time, by applying the lessons
learnt with scaling mq-deadline.
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Jens Axboe <axboe@kernel.dk>