iomap: simplify logic for when a dio can get completed inline
Currently iomap gates this on !IOMAP_DIO_WRITE, but this isn't entirely
accurate. Some writes can complete just fine inline. One such example is
polled IO, where the completion always happens in task context.
Add IOMAP_DIO_INLINE_COMP which tells the completion side if we can
complete this dio inline, or if it needs punting to a workqueue. We set
this flag by default for any dio, and turn it off for unwritten extents
or blocks that require a sync at completion time.
Gate the inline completion on whether we're in a task or not as well.
This will always be true for polled IO, but for IRQ driven IO, the
completion context may not allow for inline completions.
Testing a basic QD 1..8 dio random write with polled IO with the
following fio job:
fio --name=polled-dio-write --filename=/data1/file --time_based=1 \
--runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \
--cpus_allowed=4 --ioengine=io_uring --iodepth=$depth --hipri=1
yields:
Stock Patched Diff
=======================================
QD1 180K 201K +11%
QD2 356K 394K +10%
QD4 608K 650K +7%
QD8 827K 831K +0.5%
which shows a nice win, particularly for lower queue depth writes.
This is expected, as higher queue depths will be busy polling
completions while the offloaded workqueue completions can happen in
parallel.
Signed-off-by: Jens Axboe <axboe@kernel.dk>