fio.git
Damien Le Moal [Thu, 21 Feb 2019 04:11:06 +0000 (13:11 +0900)]
zbd: Avoid async I/O multi-job workload deadlock

With zonemode=zbd, for a multi-job workload using asynchronous I/O
engines with a deep I/O queue depth setting, a job that is building a
batch of asynchronous I/Os to submit may end up waiting for an I/O
target zone lock held by another job that is also preparing a batch.
For small devices with few zones and/or a large number of jobs, such
prepare phase zone lock contention can be frequent enough to end up in a
situation where all jobs are waiting for zone locks held by other jobs,
with no I/O being executed (and so no zone ever being unlocked).

Avoid this situation by using pthread_mutex_trylock() instead of
pthread_mutex_lock() and by calling io_u_quiesce() to execute queued
I/O units if locking fails. pthread_mutex_lock() is then called to
lock the desired target zone. Executing io_u_quiesce() forces queued
I/Os to make progress, and thus zones to be unlocked, avoiding the job
deadlock.
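A minimal sketch of this locking pattern, assuming a per-zone pthread mutex and a hypothetical quiesce callback standing in for io_u_quiesce(). zone_lock() and demo_quiesce() are illustrative names, not fio functions:

```c
#include <pthread.h>

/*
 * Hypothetical stand-in for io_u_quiesce(): it should submit and
 * complete the I/O units this job has already prepared, which
 * releases the zone locks those units hold.
 */
typedef void (*quiesce_fn)(void *job);

/*
 * Lock a target zone without deadlocking against other jobs.
 * Returns 1 if the lock was taken immediately, 0 if the job first had
 * to quiesce its own queued I/Os before blocking on the lock.
 */
static int zone_lock(pthread_mutex_t *zone_mutex, quiesce_fn quiesce,
		     void *job)
{
	/* Fast path: the zone is free. */
	if (pthread_mutex_trylock(zone_mutex) == 0)
		return 1;

	/*
	 * The zone is held, possibly by a job waiting on a zone we hold
	 * through our own unsubmitted batch. Drain that batch first so our
	 * zones get unlocked, then block on the lock.
	 */
	quiesce(job);
	pthread_mutex_lock(zone_mutex);
	return 0;
}

/*
 * Demo quiesce for a single-threaded test: "completing" the queued
 * I/O releases the zone mutex passed in as the job argument.
 */
static void demo_quiesce(void *job)
{
	pthread_mutex_unlock((pthread_mutex_t *)job);
}
```

The trylock keeps the common uncontended case cheap; only a contended lock pays for the quiesce.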

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Thu, 21 Feb 2019 04:11:05 +0000 (13:11 +0900)]
zbd: Fix zone locking for async I/O engines

For a zoned block device with zonemode=zbd, the lock on the target zone
of an I/O is held from the time the I/O is prepared with
zbd_adjust_block() execution in fill_io_u() until the I/O is queued in
td_io_queue(). For sync I/O engines, this means that the target zone
of an I/O operation is locked throughout the lifetime of an I/O unit,
resulting in the serialization of write request preparation and
execution, as well as serialization of write operations and zone reset
operations for a zone, avoiding error-inducing reordering.

However, in the case of an async I/O engine, the engine ->commit()
method falls outside of the zone lock serialization for all I/O units
that will be issued by the method execution. This results in potential
reordering of write requests during issuing, as well as simultaneous
queueing of write requests and zone reset operations resulting in
unaligned write errors.

For example, using a 1GB null_blk zoned device, the command:

fio --name=nullb0 --filename=/dev/nullb0 --direct=1 --zonemode=zbd \
    --bs=4k --rw=randwrite --ioengine=libaio --group_reporting=1 \
    --iodepth=32 --numjobs=4

always fails due to unaligned write errors.

Fix this by refining the control over zone locking and unlocking.
Locking of an I/O target zone is unchanged and done in
zbd_adjust_block(), but the I/O callback function zbd_post_submit(),
which updates a zone write pointer and unlocks the zone, is split into
two different callbacks: zbd_queue_io() and zbd_put_io().
zbd_queue_io() updates the zone write pointer for write operations and
unlocks the target zone only if the I/O operation was not queued or if
the I/O operation completed during the execution of the engine
->queue() method (e.g. a sync I/O engine is being used). The execution
of this I/O callback is done right after executing the I/O engine
->queue() method. The zbd_put_io() callback is used to unlock an I/O
target zone after completion of an I/O from within the put_io_u()
function.

To simplify the code, the helper functions zbd_queue_io_u() and
zbd_put_io_u(), which respectively call the zbd_queue_io() and
zbd_put_io() callbacks of an io_u, are introduced. These helper
functions are conditionally defined only if CONFIG_LINUX_BLKZONED is set.
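The lock lifetime described above can be modelled as follows. This is a simplified, single-threaded sketch: the zone lock is a boolean, the Q_* status values are hypothetical stand-ins for the engine ->queue() return codes, and prep_io(), queue_io() and put_io() only mirror the roles of zbd_adjust_block(), zbd_queue_io() and zbd_put_io(). Advancing the write pointer for any write that was accepted (queued or completed) is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical queue outcomes, loosely modelled on fio's FIO_Q_* codes. */
enum queue_status { Q_COMPLETED, Q_QUEUED, Q_BUSY };

struct zone {
	bool locked;
	uint64_t wp;		/* write pointer, in bytes */
};

struct io_unit {
	struct zone *zone;	/* target zone, locked at prepare time */
	uint64_t len;
	bool is_write;
	bool holds_zone_lock;
};

/* Prepare: lock the target zone (role of zbd_adjust_block()). */
static void prep_io(struct io_unit *io)
{
	io->zone->locked = true;
	io->holds_zone_lock = true;
}

static void release_zone(struct io_unit *io)
{
	if (io->holds_zone_lock) {
		io->zone->locked = false;
		io->holds_zone_lock = false;
	}
}

/* Runs right after the engine ->queue() (role of zbd_queue_io()). */
static void queue_io(struct io_unit *io, enum queue_status st)
{
	/* Assumed: accepted writes advance the write pointer. */
	if (st != Q_BUSY && io->is_write)
		io->zone->wp += io->len;
	/* Keep the zone locked only while the I/O is still in flight. */
	if (st != Q_QUEUED)
		release_zone(io);
}

/* Runs when a completed I/O unit is put back (role of zbd_put_io()). */
static void put_io(struct io_unit *io)
{
	release_zone(io);
}
```

With a sync engine the zone is released in queue_io(); with an async engine it stays locked until put_io(), which is what keeps writes to that zone serialized across the engine ->commit() call.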

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shin'ichiro Kawasaki [Thu, 21 Feb 2019 04:11:04 +0000 (13:11 +0900)]
t/zbd: Default to using blkzone tool

The test-zbd-support script fails to execute for partition devices with
the error message "Open /dev/sdX1 failed (No such file or directory)"
when libzbc tools are used by the script to open the specified
partition device. This is due to libzbc also opening the partition's
holder block device file which, when closed, causes a partition table
revalidation by udev through the BLKRRPART ioctl, deleting and
recreating the partition device files.

To avoid the failure, default to using blkzone for zone report and
reset if supported by the system (util-linux v2.30 and higher), as this
tool does not open the holder device and so avoids the problem.
To obtain the device maximum number of open zones, which is not
advertised by blkzone, use sg_inq for SCSI devices and use the default
maximum of 128 for other device types (i.e. null_blk devices in zone
mode).

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Thu, 21 Feb 2019 04:11:03 +0000 (13:11 +0900)]
t/zbd: Fix test 2 and 3 result handling

Removal of the message "No I/O performed" when fio does not execute any
I/O broke zbd tests 2 and 3, as this message is looked for to test for
success. Fix this by looking for a "Run status" line starting with
"WRITE:" for test 2 and "READ:" for test 3. The run status lines are not
printed when no I/O is performed, so testing for the absence of these
strings makes it easy to check whether I/Os were executed.
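The test scripts themselves are shell; purely as an illustration of the detection logic, a C version of the check might look like this, assuming the run status lines may be indented (has_run_status_line() is a hypothetical helper):

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if any line of fio's text output, once leading blanks
 * are skipped, starts with prefix (e.g. "WRITE:" or "READ:").
 */
static bool has_run_status_line(const char *output, const char *prefix)
{
	size_t plen = strlen(prefix);
	const char *line = output;

	while (line && *line) {
		const char *p = line;

		/* Skip indentation before the status keyword. */
		while (*p == ' ' || *p == '\t')
			p++;
		if (strncmp(p, prefix, plen) == 0)
			return true;
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return false;
}
```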

Fixes: ff3aa922570c ("Kill "No I/O performed by ..." message")
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shin'ichiro Kawasaki [Thu, 21 Feb 2019 04:11:02 +0000 (13:11 +0900)]
t/zbd: Fix handling of partition devices

To allow the t/zbd/test-zbd-support test script to run correctly on
partitions of zoned block devices, fix access to the device properties
through sysfs by referencing the sysfs directory of the holder block
device. This way, the "zoned", "logical_block_size" and "mq" attributes
can be correctly accessed.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dmitry Fomichev [Thu, 21 Feb 2019 04:11:01 +0000 (13:11 +0900)]
sg: Clean up handling of big endian data fields

Getting and setting values in SCSI commands and descriptors,
which are big endian, in the SG driver can use a bit of cleanup.
This patch simplifies the SG driver code by introducing a set of
accessor functions for reading raw big endian values from SCSI
buffers and another set for properly storing local values
as big endian byte sequences.
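Accessors of this kind can be written with byte shifts so that they work regardless of host byte order. The names below are hypothetical and not necessarily the ones added to fio's SG engine:

```c
#include <stdint.h>

/* Read big endian values from a SCSI command or data buffer. */
static uint16_t get_be16(const uint8_t *buf)
{
	return (uint16_t)((buf[0] << 8) | buf[1]);
}

static uint32_t get_be32(const uint8_t *buf)
{
	return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
	       ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

static uint64_t get_be64(const uint8_t *buf)
{
	return ((uint64_t)get_be32(buf) << 32) | get_be32(buf + 4);
}

/* Store host values into a buffer as big endian byte sequences. */
static void put_be16(uint16_t val, uint8_t *buf)
{
	buf[0] = val >> 8;
	buf[1] = val & 0xff;
}

static void put_be32(uint32_t val, uint8_t *buf)
{
	put_be16(val >> 16, buf);
	put_be16(val & 0xffff, buf + 2);
}

static void put_be64(uint64_t val, uint8_t *buf)
{
	put_be32(val >> 32, buf);
	put_be32(val & 0xffffffff, buf + 4);
}
```

Byte-wise access also avoids unaligned loads, since SCSI fields often sit at odd offsets within a CDB or response buffer.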

The patch also adds some missing endianness conversion macros
in os.h.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dmitry Fomichev [Thu, 21 Feb 2019 04:11:00 +0000 (13:11 +0900)]
sg: Avoid READ CAPACITY failures

Some SCSI devices (very large disks or SMR zoned disks in particular)
do not support the READ CAPACITY(10) command and only reply
successfully to the READ CAPACITY(16) command. This patch forces the
execution of READ CAPACITY(16) if READ CAPACITY(10) fails with
CHECK CONDITION.
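A sketch of the fallback decision and of decoding the READ CAPACITY(16) response (an 8-byte last LBA followed by a 4-byte logical block length). The helper names are hypothetical. The second trigger, a READ CAPACITY(10) last LBA of 0xFFFFFFFF, is how the SCSI block command set reports a capacity too large for the 10-byte command; whether this fio patch also checks for it is not stated above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCSI_STATUS_CHECK_CONDITION	0x02

static uint32_t be32(const uint8_t *b)
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | b[3];
}

/*
 * Decide whether READ CAPACITY(16) must be issued after READ CAPACITY(10).
 * rc10_buf is the 8-byte RC(10) response, meaningful only on GOOD status.
 */
static bool need_read_capacity16(uint8_t rc10_status, const uint8_t *rc10_buf)
{
	/* Device rejected RC(10), e.g. a very large or SMR zoned disk. */
	if (rc10_status == SCSI_STATUS_CHECK_CONDITION)
		return true;
	/* RC(10) succeeded but the last LBA does not fit in 32 bits. */
	return be32(rc10_buf) == 0xffffffff;
}

/* Capacity in bytes from the RC(16) response. */
static uint64_t capacity_from_rc16(const uint8_t *rc16_buf)
{
	uint64_t last_lba = ((uint64_t)be32(rc16_buf) << 32) |
			    be32(rc16_buf + 4);
	uint32_t block_len = be32(rc16_buf + 8);

	return (last_lba + 1) * block_len;
}
```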

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shin'ichiro Kawasaki [Thu, 21 Feb 2019 04:10:59 +0000 (13:10 +0900)]
zbd: Fix partition block device handling

For fio to correctly handle the zonemode=zbd mode with partitions of
zoned block devices, the partition block device file must be identified
as a zoned disk. However, partition block device files do not have
a "zoned" sysfs file. This patch allows a correct identification of the
device file zone model by accessing the sysfs "zoned" file of the
holder disk for partition devices.

Change the get_zbd_model() function to resolve the symbolic link to the
sysfs path to obtain the canonical sysfs path. The canonical sysfs
path of a partition device includes both the holder device name and
the partition device name. If the given device is a partition device,
strip the partition device name from the canonical sysfs path to access
the "zoned" file in the holder device sysfs path.
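The path manipulation can be sketched as pure string handling. This assumes the canonical path has already been obtained with realpath() and that the caller knows whether the device is a partition (sysfs exposes a "partition" attribute for partition devices); zoned_attr_path() is a hypothetical name:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the path of the "zoned" attribute from a canonical sysfs block
 * device path. For a partition, the last path component is the
 * partition name and its parent directory is the holder disk.
 * Returns 0 on success, -1 if the path is malformed or too long.
 */
static int zoned_attr_path(const char *canon, bool is_partition,
			   char *out, size_t outlen)
{
	size_t len = strlen(canon);
	int n;

	if (is_partition) {
		const char *slash = strrchr(canon, '/');

		if (!slash || slash == canon)
			return -1;
		/* Cut "/<partition>" to get the holder disk directory. */
		len = (size_t)(slash - canon);
	}
	n = snprintf(out, outlen, "%.*s/queue/zoned", (int)len, canon);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```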

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vincent Fu [Tue, 19 Feb 2019 21:44:08 +0000 (16:44 -0500)]
options: catch division by zero in setting CPU affinity

Catch a division by zero and abort with a helpful message instead of a
signal 8 floating point error.
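As an illustration only, assuming the affinity code spreads jobs over the CPUs of the allowed mask with a modulo (pick_cpu_slot() and the message text are hypothetical, not fio's code):

```c
#include <stdio.h>

/*
 * Pick a CPU slot for a job by spreading job indices over the CPUs in
 * its allowed mask. Without the guard, an empty mask would make the
 * modulo divide by zero and raise SIGFPE (signal 8).
 * Returns the CPU slot, or -1 after printing an error.
 */
static int pick_cpu_slot(unsigned int job_index, unsigned int cpus_in_mask)
{
	if (cpus_in_mask == 0) {
		fprintf(stderr, "no CPUs available in cpus_allowed mask\n");
		return -1;
	}
	return (int)(job_index % cpus_in_mask);
}
```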

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vincent Fu [Tue, 19 Feb 2019 21:44:07 +0000 (16:44 -0500)]
stat: use long doubles to identify latency percentiles

In some cases, the 100th percentile latency is not correctly identified
because of problems with double precision floating point arithmetic.
Use long doubles instead in the while loop condition to reduce the
likelihood of encountering this problem.

Also, print an error message when latency percentiles are not
successfully identified.
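A simplified sketch of a cumulative histogram search that evaluates the percentile threshold in long double, as the commit describes. The bucket layout and find_percentile_bucket() are assumptions, not fio's calc_clat_percentiles():

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Return the index of the first histogram bucket at which the
 * cumulative sample count reaches pct percent of total, or -1 (with an
 * error message) if no bucket qualifies.
 */
static long find_percentile_bucket(const uint64_t *buckets, size_t nr,
				   uint64_t total, double pct)
{
	/* Threshold computed in long double to limit rounding error. */
	long double threshold = (long double)pct / 100.0L * (long double)total;
	uint64_t sum = 0;

	for (size_t i = 0; i < nr; i++) {
		sum += buckets[i];
		if ((long double)sum >= threshold)
			return (long)i;
	}
	fprintf(stderr, "percentile %.2f not found\n", pct);
	return -1;
}
```

long double only narrows the window for rounding errors (on some platforms it is no wider than double), which matches the commit's wording of reducing the likelihood of the problem.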

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 11 Feb 2019 20:30:52 +0000 (13:30 -0700)]
configure: enable -Wimplicit-fallthrough if we have it

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 11 Feb 2019 18:20:29 +0000 (11:20 -0700)]
Document switch fall-through cases

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sun, 10 Feb 2019 16:36:48 +0000 (09:36 -0700)]
io_uring: sync header with the kernel

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jeff Furlong [Fri, 8 Feb 2019 23:33:34 +0000 (16:33 -0700)]
client/server: inflate error handling

Occasionally fio client/server with zlib enabled may report:

fio: inflate error -5
fio: failed decompressing log
fio: failed converting IO log

The error -5 is a Z_BUF_ERROR, and references are available at
https://zlib.net/zlib_how.html and https://www.zlib.net/manual.html. It
seems that when decompressing the buffer, if the buffer chunk is the
same size as the remaining data in the buffer, the Z_BUF_ERROR can
safely be ignored. So one idea is to ignore the safe errors, noting the
zlib references:

"inflate() can also return Z_STREAM_ERROR, which should not be possible
here, but could be checked for as noted above for def(). Z_BUF_ERROR
does not need to be checked for here, for the same reasons noted for
def(). Z_STREAM_END will be checked for later.

        ret = inflate(&strm, Z_NO_FLUSH);
        assert(ret != Z_STREAM_ERROR);  /* state not clobbered */
        switch (ret) {
        case Z_NEED_DICT:
            ret = Z_DATA_ERROR;     /* and fall through */
        case Z_DATA_ERROR:
        case Z_MEM_ERROR:
            (void)inflateEnd(&strm);
            return ret;
        }

...

The way we tell that deflate() has no more output is by seeing that it
did not fill the output buffer, leaving avail_out greater than zero.
However suppose that deflate() has no more output, but just so happened
to exactly fill the output buffer! avail_out is zero, and we can't tell
that deflate() has done all it can. As far as we know, deflate() has
more output for us. So we call it again. But now deflate() produces no
output at all, and avail_out remains unchanged as CHUNK. That deflate()
call wasn't able to do anything, either consume input or produce output,
and so it returns Z_BUF_ERROR. (See, I told you I'd cover this later.)
However this is not a problem at all. Now we finally have the desired
indication that deflate() is really done, and so we drop out of the
inner loop to provide more input to deflate()."

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 8 Feb 2019 19:47:47 +0000 (12:47 -0700)]
Fio 3.13 (tag: fio-3.13)

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vincent Fu [Thu, 7 Feb 2019 15:51:07 +0000 (10:51 -0500)]
stat: put 'percentiles' object in appropriate 'clat_ns' or 'lat_ns' parent

In the JSON output, the 'percentiles' object currently always appears within the
'clat_ns' object. Put it inside the 'lat_ns' object when --lat_percentiles=1 is set.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vincent Fu [Thu, 7 Feb 2019 15:51:06 +0000 (10:51 -0500)]
stat: clean up calc_clat_percentiles

We already know the size of the buffer needed. So there
is no need to do anything fancy when allocating it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 4 Feb 2019 16:01:48 +0000 (09:01 -0700)]
Improve wording in REPORTING-BUGS

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 1 Feb 2019 05:55:52 +0000 (22:55 -0700)]
t/io_uring: verbose error for -95/-EOPNOTSUPP failure

If we fail with this error and polling is enabled, it's because the
file system hosting the file doesn't support polling. Let the user
know.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 29 Jan 2019 19:20:02 +0000 (12:20 -0700)]
io_uring: ensure we use the right argument syscall

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 29 Jan 2019 19:04:22 +0000 (12:04 -0700)]
t/io_uring: fix bad if

We need braces for that check, or it's always going to be true.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 29 Jan 2019 13:25:54 +0000 (06:25 -0700)]
io_uring: update to kernel struct io_uring_params

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 28 Jan 2019 18:42:20 +0000 (11:42 -0700)]
io_uring: sync with kernel

- Update to newer kernel API header
- Use IORING_ENTER_SQ_WAKEUP
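The flag logic can be sketched as below. The flag names are those of linux/io_uring.h; the fallback definitions (used only if that header is unavailable) and enter_flags() itself are assumptions of this sketch. In real code the SQ ring flags must be read with appropriate memory ordering, since the kernel thread updates them.

```c
#include <stdbool.h>

#if defined __has_include
#if __has_include(<linux/io_uring.h>)
#include <linux/io_uring.h>
#endif
#endif

/* Fallback values matching the kernel's linux/io_uring.h. */
#ifndef IORING_ENTER_GETEVENTS
#define IORING_ENTER_GETEVENTS	(1U << 0)
#endif
#ifndef IORING_ENTER_SQ_WAKEUP
#define IORING_ENTER_SQ_WAKEUP	(1U << 1)
#endif
#ifndef IORING_SQ_NEED_WAKEUP
#define IORING_SQ_NEED_WAKEUP	(1U << 0)
#endif

/*
 * Compute the flags for io_uring_enter(2). With SQ thread polling the
 * kernel thread consumes the SQ ring on its own; we only need to wake
 * it when it has gone idle and set IORING_SQ_NEED_WAKEUP. GETEVENTS is
 * only needed when we actually want to wait for completions.
 */
static unsigned int enter_flags(bool sq_thread_poll,
				unsigned int sq_ring_flags,
				unsigned int to_wait)
{
	unsigned int flags = 0;

	if (sq_thread_poll && (sq_ring_flags & IORING_SQ_NEED_WAKEUP))
		flags |= IORING_ENTER_SQ_WAKEUP;
	if (to_wait)
		flags |= IORING_ENTER_GETEVENTS;
	return flags;
}
```

With SQ polling and a zero result there is no need to call io_uring_enter() at all; without SQ polling, the call is still needed to submit new entries even when no flag is set.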

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 28 Jan 2019 17:43:09 +0000 (10:43 -0700)]
Merge branch 'hygon-support' of https://github.com/hygonsoc/fio

* 'hygon-support' of https://github.com/hygonsoc/fio:
  Add Hygon SoC support to enable tsc_reliable feature

hygonsoc [Mon, 28 Jan 2019 16:11:04 +0000 (00:11 +0800)]
Add Hygon SoC support to enable tsc_reliable feature

Vincent Fu [Thu, 24 Jan 2019 19:26:42 +0000 (14:26 -0500)]
rate-submit: call ioengine post_init when starting workers

I/O engines with post_init steps were not fully initialized by offload
worker threads, because the post_init function was never called.

Without this patch all libaio operations submitted in offload mode fail
because the ioengine was not fully initialized.

Fixes: 2041bd343da1 ("engines/libaio: add preliminary support for pre-mapped IO buffers")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 23 Jan 2019 15:04:28 +0000 (08:04 -0700)]
io_uring: system calls have been renumbered

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 16 Jan 2019 16:07:13 +0000 (09:07 -0700)]
engines/io_uring: cleanup setrlimit()

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 16 Jan 2019 15:45:43 +0000 (08:45 -0700)]
io_uring: sync with upstream API

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 16 Jan 2019 05:06:05 +0000 (22:06 -0700)]
engines/io_uring: ensure sqe stores are ordered SQ ring tail update
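The required ordering can be sketched with C11 atomics, publishing the new tail with a release store so the consumer never observes the tail before the entry contents. This is a swapped-in technique: fio itself uses its own read/write barrier macros, and the simplified struct layout below is hypothetical, not the kernel's:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Simplified submission entry and SQ ring (not the kernel layout). */
struct sqe {
	uint8_t opcode;
	uint64_t off;
	uint64_t addr;
	uint32_t len;
};

struct sq_ring {
	_Atomic uint32_t tail;	/* written by us, read by the consumer */
	uint32_t mask;		/* ring entries - 1 (power of two) */
	uint32_t *array;	/* ring slot -> index into sqes[] */
	struct sqe *sqes;
};

/*
 * Queue one entry. The plain stores to the sqe and the array slot must
 * be visible before the new tail is, hence the release store.
 */
static void sq_push(struct sq_ring *sq, const struct sqe *e)
{
	uint32_t tail = atomic_load_explicit(&sq->tail, memory_order_relaxed);
	uint32_t idx = tail & sq->mask;

	sq->sqes[idx] = *e;
	sq->array[idx] = idx;
	atomic_store_explicit(&sq->tail, tail + 1, memory_order_release);
}
```

The consumer side pairs this with an acquire load of the tail before reading the entries.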

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 16 Jan 2019 04:43:52 +0000 (21:43 -0700)]
t/io_uring: use fio provided memory barriers

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 16 Jan 2019 04:43:11 +0000 (21:43 -0700)]
x86-64: correct read/write barriers

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 15 Jan 2019 21:48:31 +0000 (14:48 -0700)]
t/io_uring: fixes

- Break out if we get a fatal error from reap_events()
- Ignore polled=1 if do_nop=1

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 15 Jan 2019 20:52:14 +0000 (13:52 -0700)]
t/io_uring: terminate buf[] file depth string

Prevents garbage print for !s->nr_files (do_nop = 1).

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 15 Jan 2019 17:58:17 +0000 (10:58 -0700)]
t/io_uring: wait if we're at queue limit

There was an off-by-one there: it's perfectly fine not to specify
events to wait for if the submission will take us to the queue
depth limit.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 15 Jan 2019 13:12:54 +0000 (06:12 -0700)]
t/io_uring: print file depths

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 15 Jan 2019 12:57:54 +0000 (05:57 -0700)]
t/io_uring: pick next file if we're over the limit

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 14 Jan 2019 05:49:48 +0000 (22:49 -0700)]
t/io_uring: use the right check for when to wait

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sun, 13 Jan 2019 21:22:03 +0000 (14:22 -0700)]
t/io_uring: only call setrlimit() for fixedbufs

It's root only.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sun, 13 Jan 2019 17:57:44 +0000 (10:57 -0700)]
io_uring: add 32-bit x86 support

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sun, 13 Jan 2019 17:56:39 +0000 (10:56 -0700)]
t/io_uring: add option for register_files

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sun, 13 Jan 2019 16:17:39 +0000 (09:17 -0700)]
io_uring: fix pointer cast warning on 32-bit

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sun, 13 Jan 2019 16:15:32 +0000 (09:15 -0700)]
io_uring: ensure that the io_uring_register() structs are 32-bit safe

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sun, 13 Jan 2019 15:56:11 +0000 (08:56 -0700)]
Move io_uring to os/linux/

It's not a generic OS header; reflect the fact that it's Linux only
by moving it to a linux/ directory.

Also update io_uring_sqe to match current API.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sun, 13 Jan 2019 05:14:54 +0000 (22:14 -0700)]
t/io_uring: add IORING_OP_NOP support

Doesn't do anything on the kernel side, just a round trip through
the SQ and CQ rings.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 11 Jan 2019 21:40:16 +0000 (14:40 -0700)]
t/io_uring: only set IORING_ENTER_GETEVENTS when actively reaping

Don't set it if we don't need to find an event (to_wait == 0).

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 11 Jan 2019 21:15:26 +0000 (14:15 -0700)]
engines/io_uring: remove unused ld->io_us array

Leftover from a previous API

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 11 Jan 2019 18:38:29 +0000 (11:38 -0700)]
t/io_uring: remember to set p->sq_thread_cpu

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 11 Jan 2019 17:33:28 +0000 (10:33 -0700)]
io_uring: update to newer API

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 11 Jan 2019 05:27:56 +0000 (22:27 -0700)]
t/io_uring: add support for registered files

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 11 Jan 2019 04:38:35 +0000 (21:38 -0700)]
t/io_uring: make submits/reaps per-second reflected with sq thread poll

If we use polling, the numbers currently read as 0. Make them -1 to
reflect that we're actually doing zero calls per IO.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 11 Jan 2019 04:37:15 +0000 (21:37 -0700)]
t/io_uring: enable SQ thread poll mode

With this, we can do IO without ever entering the kernel.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 11 Jan 2019 02:43:41 +0000 (19:43 -0700)]
t/io_uring: make more efficient for multiple files

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 11 Jan 2019 02:10:03 +0000 (19:10 -0700)]
t/io_uring: restore usage of IORING_SETUP_IOPOLL

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 10 Jan 2019 22:42:07 +0000 (15:42 -0700)]
io_uring: cleanup sq thread poll/cpu setup

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 10 Jan 2019 21:22:08 +0000 (14:22 -0700)]
Update io_uring API

- Fixed buffers are now available through io_uring_register()
- Various thread/wq options are now dead and automatic instead
- sqe->index is now sqe->buf_index
- Fixed buffers require flag, not separate opcode

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 10 Jan 2019 16:48:37 +0000 (09:48 -0700)]
io_uring: io_uring_setup(2) takes a 'nr_iovecs' field now

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 10 Jan 2019 16:45:58 +0000 (09:45 -0700)]
Makefile: make t/io_uring depend on os/io_uring.h

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 10 Jan 2019 16:39:14 +0000 (09:39 -0700)]
Update to newer io_uring API

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 9 Jan 2019 22:11:04 +0000 (15:11 -0700)]
engines/io_uring: always setup ld->iovecs[]

We need it now for the vectored commands, but only pass it in to
ring setup if we use fixedbufs.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 9 Jan 2019 21:53:56 +0000 (14:53 -0700)]
Update to newer io_uring API

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dan Williams [Tue, 8 Jan 2019 19:34:19 +0000 (11:34 -0800)]
engines/devdax: Make detection of device-dax instances more robust

In preparation for the kernel switching device-dax instances from the
"/sys/class/dax" subsystem to "/sys/bus/dax" [1], teach the device-dax
instance detection to be subsystem-type agnostic.

Note that the subsystem switch will require an administrator, or distro
opt-in. The opt-in will either be at kernel compile time, by disabling
the default compatibility driver in the kernel, or at runtime, with a
modprobe policy to override which kernel module services device-dax
devices. The daxctl utility [2] will ship a command to install the
modprobe policy and include a man page that lists the potential
regression risk to older FIO and other userspace tools that are hard
coded to "/sys/class/dax".

[1]: https://lwn.net/Articles/770128/
[2]: https://github.com/pmem/ndctl/tree/master/daxctl
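One subsystem-agnostic check, assuming the device's sysfs "subsystem" link has already been resolved (for example via /sys/dev/char/<major>:<minor>/subsystem), is to look only at the final path component. is_dax_subsystem() is a hypothetical helper, not the code added to fio:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Accept both the legacy "/sys/class/dax" and the newer "/sys/bus/dax"
 * layouts by comparing only the last component of the resolved
 * subsystem path.
 */
static bool is_dax_subsystem(const char *subsys_path)
{
	const char *base = strrchr(subsys_path, '/');

	base = base ? base + 1 : subsys_path;
	return strcmp(base, "dax") == 0;
}
```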

Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 8 Jan 2019 17:26:47 +0000 (10:26 -0700)]
t/io_uring: ensure to use the right opcode for fixed buffers

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 8 Jan 2019 17:26:19 +0000 (10:26 -0700)]
engines/io_uring: ensure to use the right opcode for fixed buffers

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 8 Jan 2019 17:14:00 +0000 (10:14 -0700)]
configure: add __kernel_rwf_t check

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 8 Jan 2019 13:20:58 +0000 (06:20 -0700)]
io_uring: use kernel header directly

The kernel header has been designed such that it doesn't require
a special userland version of it. Use it directly.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 8 Jan 2019 12:43:38 +0000 (05:43 -0700)]
io_uring.h should include <linux/fs.h>

This ensures we have the __kernel_rwf_t definition.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 8 Jan 2019 04:46:30 +0000 (21:46 -0700)]
Rename aioring engine to io_uring

The new API is completely decoupled from the aio/libaio
interface. Rename it while adopting the new API.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 8 Jan 2019 04:35:15 +0000 (21:35 -0700)]
Rename t/aio-ring to t/io_uring

The new API is completely decoupled from the aio/libaio
interface. Rename it while adopting the new API.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sat, 5 Jan 2019 14:42:30 +0000 (07:42 -0700)]
t/aio-ring: cleanup the code a bit

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sat, 5 Jan 2019 14:37:02 +0000 (07:37 -0700)]
aioring: make sq/cqring_offsets a bit more future proof

And include 'dropped' as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sat, 5 Jan 2019 05:22:54 +0000 (22:22 -0700)]
aioring: update to newer API

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 4 Jan 2019 21:02:25 +0000 (14:02 -0700)]
t/aio-ring: use syscall defines

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 4 Jan 2019 21:00:30 +0000 (14:00 -0700)]
engines/aioring: update for newer mmap based API

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 4 Jan 2019 20:27:46 +0000 (13:27 -0700)]
t/aio-ring: update to newer mmap() API

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 31 Dec 2018 00:19:40 +0000 (17:19 -0700)]
aioring: remove IOCB_FLAG_HIPRI

The new API doesn't require setting this flag at runtime; it's
implied from the io context.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sun, 30 Dec 2018 23:40:09 +0000 (16:40 -0700)]
aioring: update API

In both the engine and t/aio-ring, drop IORING_FLAG_SUBMIT, as it's
been dropped on the kernel side. Renumber IORING_FLAG_GETEVENTS.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 21 Dec 2018 22:37:16 +0000 (15:37 -0700)]
t/aio-ring: print head/tail as unsigneds

Since we're wrapping now and using the full range, we can get
logging like:

IOPS=1094880, IOS/call=32/32, inflight=32 (head=-1509517216 tail=-1509517216), Cachehit=0.00%

Ensure we print as unsigned, as that's the right type.
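Why unsigned is the right type: head and tail are free-running counters, and tail - head gives the number of in-flight entries even after wraparound. A minimal sketch with hypothetical helper names:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * head and tail are free-running 32-bit counters, masked only when
 * indexing the ring. Unsigned subtraction stays correct across
 * wraparound.
 */
static uint32_t ring_inflight(uint32_t head, uint32_t tail)
{
	return tail - head;
}

/* Print with %u so wrapped counters are not shown as negative. */
static void print_ring(uint32_t head, uint32_t tail)
{
	printf("inflight=%u (head=%u tail=%u)\n",
	       (unsigned)ring_inflight(head, tail),
	       (unsigned)head, (unsigned)tail);
}
```

For example, head=-1509517216 in the log above is 2785450080 when printed as unsigned.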

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 21 Dec 2018 22:09:45 +0000 (15:09 -0700)]
engines/aioring: fix harmless typo

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 21 Dec 2018 21:47:34 +0000 (14:47 -0700)]
t/aio-ring: update for continually rolling ring

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 21 Dec 2018 21:47:07 +0000 (14:47 -0700)]
engines/aioring: update for continually rolling ring

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 19 Dec 2018 19:55:10 +0000 (12:55 -0700)]
engines/aio-ring: initialization error handling

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 19 Dec 2018 19:51:51 +0000 (12:51 -0700)]
engines/aio-ring: cleanup read/write prep

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 14 Dec 2018 21:36:52 +0000 (14:36 -0700)]
Fix 'min' latency times being 0 with ramp_time

If the job includes a ramp_time setting, we end up with latencies
that look like this:

    slat (nsec): min=0, max=17585, avg=1896.34, stdev=733.35
    clat (nsec): min=0, max=1398.1k, avg=77851.76, stdev=25055.97
     lat (nsec): min=0, max=1406.1k, avg=79824.20, stdev=25066.57

with the 'min' being 0. This is because resetting the stats sets the
field to zero, and no new latency sample can be smaller than that.

Set the min value to the max value of the type when we reset stats.
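A minimal sketch of the fix, with a hypothetical lat_stat structure: resetting min to the largest uint64_t value means the first sample recorded after ramp_time always replaces it.

```c
#include <stdint.h>

struct lat_stat {
	uint64_t min, max, samples;
	double sum;
};

/* Reset so that the first new sample always becomes the min. */
static void lat_stat_reset(struct lat_stat *s)
{
	s->min = UINT64_MAX;	/* max value of the type, not 0 */
	s->max = 0;
	s->samples = 0;
	s->sum = 0.0;
}

static void lat_stat_add(struct lat_stat *s, uint64_t val)
{
	if (val < s->min)
		s->min = val;
	if (val > s->max)
		s->max = val;
	s->samples++;
	s->sum += (double)val;
}
```

Callers should still check samples before reporting, since min stays at UINT64_MAX when no I/O was recorded.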

Reported-by: Matthew Eaton <m.eaton82@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 14 Dec 2018 20:07:55 +0000 (13:07 -0700)]
engines/aioring: get rid of old error on sqwq and sqthread

They are not mutually exclusive for buffered aio.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 14 Dec 2018 17:54:01 +0000 (10:54 -0700)]
t/aio-ring: add cache hit statistics

Pretty nifty to run it on a drive that will eventually end up being
fully cached, and watch the hit rates climb:

sudo taskset -c 0 t/aio-ring /dev/sde3
polled=0, fixedbufs=1, buffered=1
  QD=32, sq_ring=33, cq_ring=66
submitter=4269
IOPS=477, IOS/call=1/0, inflight=32 (head=50 tail=50), Cachehit=0.00%
IOPS=447, IOS/call=1/1, inflight=32 (head=35 tail=35), Cachehit=0.00%
IOPS=419, IOS/call=1/1, inflight=32 (head=58 tail=58), Cachehit=0.00%
[...]
IOPS=483, IOS/call=1/1, inflight=32 (head=63 tail=63), Cachehit=5.80%
IOPS=452, IOS/call=1/1, inflight=32 (head=53 tail=53), Cachehit=4.65%
IOPS=459, IOS/call=1/1, inflight=32 (head=50 tail=50), Cachehit=5.45%
[...]
IOPS=484, IOS/call=1/1, inflight=32 (head=22 tail=22), Cachehit=11.16%
IOPS=494, IOS/call=1/1, inflight=32 (head=54 tail=54), Cachehit=11.34%
IOPS=508, IOS/call=1/1, inflight=32 (head=34 tail=34), Cachehit=12.99%
[...]
IOPS=606, IOS/call=1/1, inflight=32 (head=18 tail=18), Cachehit=26.07%
IOPS=573, IOS/call=1/1, inflight=32 (head=63 tail=63), Cachehit=26.70%
IOPS=561, IOS/call=1/1, inflight=32 (head=30 tail=30), Cachehit=23.53%
[...]
IOPS=916, IOS/call=1/1, inflight=32 (head=63 tail=63), Cachehit=59.06%
IOPS=882, IOS/call=1/1, inflight=32 (head=21 tail=32), Cachehit=61.79%
IOPS=984, IOS/call=1/1, inflight=32 (head=22 tail=22), Cachehit=63.87%
[...]
IOPS=1993, IOS/call=7/7, inflight=32 (head=58 tail=4), Cachehit=86.75%
IOPS=2260, IOS/call=5/5, inflight=32 (head=12 tail=16), Cachehit=87.15%
IOPS=1957, IOS/call=4/4, inflight=17 (head=7 tail=10), Cachehit=86.78%
[...]
IOPS=3606, IOS/call=7/7, inflight=32 (head=26 tail=35), Cachehit=93.47%
IOPS=3487, IOS/call=6/6, inflight=28 (head=23 tail=31), Cachehit=92.59%
IOPS=3379, IOS/call=7/7, inflight=26 (head=38 tail=40), Cachehit=92.66%
[...]
IOPS=4590, IOS/call=6/6, inflight=26 (head=38 tail=46), Cachehit=95.64%
IOPS=5464, IOS/call=7/7, inflight=28 (head=22 tail=24), Cachehit=95.94%
IOPS=4896, IOS/call=8/8, inflight=18 (head=44 tail=51), Cachehit=95.62%
[...]
IOPS=7736, IOS/call=8/8, inflight=24 (head=25 tail=29), Cachehit=97.35%
IOPS=6632, IOS/call=8/7, inflight=27 (head=54 tail=61), Cachehit=97.28%
IOPS=8488, IOS/call=8/8, inflight=22 (head=33 tail=39), Cachehit=97.33%
[...]
IOPS=10696, IOS/call=8/8, inflight=16 (head=63 tail=64), Cachehit=98.11%
IOPS=11874, IOS/call=7/7, inflight=17 (head=56 tail=56), Cachehit=98.31%
IOPS=11488, IOS/call=8/7, inflight=23 (head=54 tail=57), Cachehit=98.17%
[...]
IOPS=15472, IOS/call=8/8, inflight=17 (head=11 tail=17), Cachehit=98.58%
IOPS=18656, IOS/call=8/7, inflight=22 (head=50 tail=59), Cachehit=98.95%
IOPS=19408, IOS/call=8/8, inflight=18 (head=58 tail=63), Cachehit=99.01%
[...]
IOPS=54768, IOS/call=8/7, inflight=19 (head=63 tail=3), Cachehit=99.64%
IOPS=62888, IOS/call=8/7, inflight=21 (head=51 tail=53), Cachehit=99.73%
IOPS=71656, IOS/call=7/7, inflight=24 (head=28 tail=36), Cachehit=99.75%
[...]
IOPS=125320, IOS/call=8/8, inflight=22 (head=42 tail=46), Cachehit=99.85%
IOPS=201808, IOS/call=8/8, inflight=17 (head=27 tail=35), Cachehit=99.90%
IOPS=390325, IOS/call=7/7, inflight=22 (head=23 tail=27), Cachehit=99.94%
[...]
IOPS=834056, IOS/call=8/8, inflight=8 (head=23 tail=27), Cachehit=100.00%
IOPS=837520, IOS/call=8/8, inflight=8 (head=13 tail=17), Cachehit=100.00%
IOPS=833232, IOS/call=8/8, inflight=8 (head=51 tail=57), Cachehit=100.00%

It's also a nice visual into how high a cache hit rate has to be on a
rotational drive to make a substantial impact on performance.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
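A back-of-the-envelope model of why the hit rate has to be so high. It assumes the drive is the only bottleneck and can complete at most `miss_iops` reads per second, so with hit fraction `h` the total rate is bounded by `miss_iops / (1 - h)`, and it caps the result at the rate fully cached I/O reaches. The function name is hypothetical, and queueing and CPU effects are ignored, so this is an upper bound rather than a prediction:

```c
/*
 * Upper bound on total IOPS when a fraction hit_ratio of reads is served
 * from the page cache and the rest must go to a drive that completes at
 * most miss_iops reads per second. Cached reads are assumed to top out
 * at cache_iops. Queueing and CPU effects are ignored.
 */
static double iops_upper_bound(double hit_ratio, double miss_iops,
			       double cache_iops)
{
	double miss_ratio = 1.0 - hit_ratio;
	double bound;

	/* Fully cached: only the cache path limits throughput. */
	if (miss_ratio <= 0.0)
		return cache_iops;

	/* Misses can't exceed miss_iops, so total <= miss_iops / miss_ratio. */
	bound = miss_iops / miss_ratio;
	return bound < cache_iops ? bound : cache_iops;
}
```

Plugging in roughly 450 IOPS for the uncached drive and roughly 835k IOPS for the fully cached case (read off the first and last lines of the output above), a 90% hit rate bounds throughput at about 4,500 IOPS and a 99% hit rate at about 45,000; only very close to 100% does the cache-side limit take over.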
Jens Axboe [Fri, 14 Dec 2018 15:32:01 +0000 (08:32 -0700)]
Add cache hit stats

With the aioring engine, we can get notified if a buffered read was
a cache hit or if it hit media. Add that to the output stats for
normal and json output.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
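A minimal sketch of the bookkeeping such a statistic needs, assuming the engine hands back a per-I/O hit/miss flag at completion time (how aioring actually reports it is not shown here). The struct and function names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical per-job counters; the field names are illustrative only. */
struct cache_stats {
	uint64_t cachehit;	/* buffered reads satisfied from the page cache */
	uint64_t cachemiss;	/* buffered reads that had to go to media */
};

/* Record one completed buffered read, given the engine's hit/miss flag. */
static void cache_stats_account(struct cache_stats *cs, int was_hit)
{
	if (was_hit)
		cs->cachehit++;
	else
		cs->cachemiss++;
}

/* Hit rate as a percentage; 0 when nothing has been recorded yet. */
static double cache_hit_percent(const struct cache_stats *cs)
{
	uint64_t total = cs->cachehit + cs->cachemiss;

	if (!total)
		return 0.0;
	return 100.0 * (double)cs->cachehit / (double)total;
}
```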
Jens Axboe [Fri, 14 Dec 2018 15:29:14 +0000 (08:29 -0700)]
client/server: convert nr_zone_resets on the wire

Fixes: fd5d733fa34 ("Collect and show zone reset statistics")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
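Any multi-byte counter exchanged between client and server has to be put into a fixed byte order by the sender and converted back by the receiver; a field that skips this step arrives with the wrong value whenever the two hosts differ in endianness. A portable sketch for a 64-bit counter, assuming `nr_zone_resets` is 64 bits wide and using little-endian as the wire order. The helper names are hypothetical; fio has its own conversion helpers:

```c
#include <stdint.h>

/* Serialize v into 8 bytes, least significant byte first, on any host. */
static void put_le64(uint8_t out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}

/* Decode 8 little-endian bytes back into a host-order 64-bit value. */
static uint64_t get_le64(const uint8_t in[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)in[i] << (8 * i);
	return v;
}
```

Because the shifts operate on values rather than on memory layout, the same code produces identical bytes on big- and little-endian hosts.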
Jens Axboe [Thu, 13 Dec 2018 21:23:39 +0000 (14:23 -0700)]
engines/aioring: update to newer API

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 13 Dec 2018 20:52:35 +0000 (13:52 -0700)]
engines/aioring: enable IOCTX_FLAG_SQPOLL

With this flag set, we don't have to do any system calls for
polled IO.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 13 Dec 2018 16:09:42 +0000 (09:09 -0700)]
io_u: ensure buflen is capped at maxbs

If we use bsranges and the maxbs isn't a natural multiple of the minbs,
then we can generate sizes that are larger than maxbs. Ensure that we
cap the buffer length generated at maxbs.

Sample workload and problem report:

fio --name=App2 --size=10m --rw=read --blocksize_range=3k-10k

App2: (g=0): rw=read, bs=(R) 3072B-10.0KiB, (W) 3072B-10.0KiB, (T) 3072B-10.0KiB, ioengine=psync, iodepth=1
fio-3.12-17-g0fcbc0
Starting 1 process
*** Error in `fio': double free or corruption (!prev): 0x0000555f92a80a60 ***
fio: pid=1468, got signal=6

App2: (groupid=0, jobs=1): err= 0: pid=1468: Wed Dec 12 19:09:07 2018
read: IOPS=8365, BW=52.9MiB/s (55.5MB/s)(9.00MiB/189msec)
clat (nsec): min=874, max=74912k, avg=116222.51, stdev=2186743.16
lat (nsec): min=912, max=74912k, avg=116373.83, stdev=2186743.70
clat percentiles (nsec):
| 1.00th=[ 964], 5.00th=[ 1128], 10.00th=[ 1368],
| 20.00th=[ 1672], 30.00th=[ 2008], 40.00th=[ 2288],
| 50.00th=[ 2704], 60.00th=[ 3088], 70.00th=[ 3536],
| 80.00th=[ 4768], 90.00th=[ 6304], 95.00th=[ 8160],
| 99.00th=[ 544768], 99.50th=[ 2113536], 99.90th=[30539776],
| 99.95th=[74973184], 99.99th=[74973184]
lat (nsec) : 1000=1.52%
lat (usec) : 2=28.34%, 4=44.85%, 10=21.70%, 20=0.51%, 50=0.32%
lat (usec) : 250=0.25%, 500=1.20%, 750=0.76%
lat (msec) : 4=0.25%, 20=0.13%, 50=0.13%, 100=0.06%
cpu : usr=3.72%, sys=3.72%, ctx=43, majf=0, minf=14
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=1581,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
READ: bw=52.9MiB/s (55.5MB/s), 52.9MiB/s-52.9MiB/s (55.5MB/s-55.5MB/s), io=9.00MiB (10.5MB), run=189-189msec

Disk stats (read/write):
sda: ios=24/0, merge=0/0, ticks=188/0, in_queue=260, util=55.70%

Fixes: https://github.com/axboe/fio/issues/726
Signed-off-by: Jens Axboe <axboe@kernel.dk>
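A sketch of how a non-multiple maxbs gets overshot and how capping fixes it. This is not fio's own buffer-length code: the round-up-to-a-multiple-of-minbs step is an assumption chosen to illustrate the failure mode, and `pick_buflen()` is a hypothetical name. With 3k-10k, the multiples of 3072 are 3072, 6144, 9216 and 12288, and 12288 exceeds maxbs (10240), plausibly overrunning I/O buffers sized for maxbs, which would fit the heap-corruption abort in the report:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical block-size picker for a bsrange of [minbs, maxbs]:
 * a raw length (e.g. from a random draw in the range) is rounded up to
 * a multiple of minbs, then capped at maxbs.
 */
static uint32_t pick_buflen(uint32_t raw, uint32_t minbs, uint32_t maxbs)
{
	uint32_t buflen;

	assert(minbs > 0 && minbs <= maxbs);

	if (raw < minbs)
		raw = minbs;

	/* Round up to the next multiple of minbs; may exceed maxbs. */
	buflen = ((raw + minbs - 1) / minbs) * minbs;

	/* The fix: never hand out more than maxbs bytes. */
	if (buflen > maxbs)
		buflen = maxbs;
	return buflen;
}
```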
Jens Axboe [Thu, 13 Dec 2018 13:33:37 +0000 (06:33 -0700)]
engines/aioring: various updates and fixes

- Add support for SQWQ and SQTHREAD. Buffered is now async!
- Kill unnecessary ifdefs
- Cleanup/fix error handling
- Handle fsync like a queued command
- Queue depth handling fixups

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 13 Dec 2018 05:02:16 +0000 (22:02 -0700)]
engines/libaio: remove features deprecated from old interface

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 13 Dec 2018 04:10:25 +0000 (21:10 -0700)]
aioring: remove qd > 1 restriction

Just add the extra ring entry we need in ->init().

Signed-off-by: Jens Axboe <axboe@kernel.dk>
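The t/aio-ring output earlier in this log reports `sq_ring=33` for `QD=32`. One common reason a ring needs one more slot than its usable depth is that head and tail indices alone cannot distinguish a full ring from an empty one, so one slot is always left unused. Whether that is exactly why aioring sizes its ring this way is an assumption; the sketch below only shows the classic technique, with a hypothetical ring of depth + 1 slots:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical fixed-size ring (single-threaded as written). It allocates
 * depth + 1 slots so that head == tail always means "empty" and
 * (tail + 1) % nr_slots == head means "full".
 */
struct ring {
	uint32_t head;		/* next slot to consume */
	uint32_t tail;		/* next slot to fill */
	uint32_t nr_slots;	/* depth + 1 */
	uint64_t *slots;
};

static bool ring_init(struct ring *r, uint32_t depth)
{
	r->head = r->tail = 0;
	r->nr_slots = depth + 1;	/* the extra entry */
	r->slots = calloc(r->nr_slots, sizeof(*r->slots));
	return r->slots != NULL;
}

static void ring_exit(struct ring *r)
{
	free(r->slots);
	r->slots = NULL;
}

static bool ring_push(struct ring *r, uint64_t v)
{
	uint32_t next = (r->tail + 1) % r->nr_slots;

	if (next == r->head)
		return false;		/* full: depth entries in use */
	r->slots[r->tail] = v;
	r->tail = next;
	return true;
}

static bool ring_pop(struct ring *r, uint64_t *v)
{
	if (r->head == r->tail)
		return false;		/* empty */
	*v = r->slots[r->head];
	r->head = (r->head + 1) % r->nr_slots;
	return true;
}
```

Sharing such a ring between a submitter and another thread or the kernel would additionally need atomic head/tail accesses with acquire/release ordering.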
Jens Axboe [Thu, 13 Dec 2018 03:31:52 +0000 (20:31 -0700)]
aioring: check for arch support AFTER including the headers

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 13 Dec 2018 03:21:42 +0000 (20:21 -0700)]
aioring: hide it if archs don't define syscalls

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 13 Dec 2018 03:05:40 +0000 (20:05 -0700)]
t/aio-ring: update for new API

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 13 Dec 2018 02:48:15 +0000 (19:48 -0700)]
Add aioring engine

This is a new Linux aio engine, built on top of the new aio
interfaces. It supports polled aio, regular aio, and buffered
aio.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 13 Dec 2018 02:47:31 +0000 (19:47 -0700)]
ioengine: remove ancient alias for libaio

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 12 Dec 2018 16:49:40 +0000 (09:49 -0700)]
t/aio-ring: set nr_events after clear

Signed-off-by: Jens Axboe <axboe@kernel.dk>