Jens Axboe [Sat, 9 Oct 2021 18:56:11 +0000 (12:56 -0600)]
t/io_uring: fix latency stats for depth == 1
Two issues here:
- Stat increment accounting was off by one, so no stats were added
for depth == 1
- The stat batch count should be a minimum of 2, since it's really
a mask.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 7 Oct 2021 12:18:21 +0000 (06:18 -0600)]
Merge branch 'evelu-ocp' of https://github.com/ErwanAliasr1/fio
* 'evelu-ocp' of https://github.com/ErwanAliasr1/fio:
t/io_uring: Add -r option to control the runtime
t/one-core-peak: Reporting RETPOLINE & PAGE_TABLE_ISOLATION
t/one-core-peak: Reporting kernel cmdline
t/one-core-peak: Reporting BLK_WBT_MQ
t/one-core-peak: Reporting BLK_CGROUP
Erwan Velu [Wed, 6 Oct 2021 21:40:27 +0000 (23:40 +0200)]
t/io_uring: Add -r option to control the runtime
By default the test runs until someone presses Ctrl-C.
This commit adds an option to define the expected runtime.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Erwan Velu [Wed, 6 Oct 2021 21:42:29 +0000 (23:42 +0200)]
t/one-core-peak: Reporting RETPOLINE & PAGE_TABLE_ISOLATION
These settings can influence the max perf if enabled.
Let's report them.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Erwan Velu [Wed, 6 Oct 2021 21:25:49 +0000 (23:25 +0200)]
t/one-core-peak: Reporting kernel cmdline
The cmdline can contain many interesting options that were set and could
influence the final result.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Erwan Velu [Wed, 6 Oct 2021 21:02:15 +0000 (23:02 +0200)]
t/one-core-peak: Reporting BLK_WBT_MQ
If BLK_WBT_MQ is set, some ktime_get() calls can be seen in the io path.
Let's report the value of this setting and disable it if present.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Erwan Velu [Wed, 6 Oct 2021 20:19:30 +0000 (22:19 +0200)]
t/one-core-peak: Reporting BLK_CGROUP
When BLK_CGROUP is enabled, it induces some rdtsc calls which reduce the
overall performance.
Let's report if this option is enabled.
The tool was reporting BLK_CGROUP_IOCOST which wasn't the right one.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Jens Axboe [Tue, 5 Oct 2021 12:58:07 +0000 (06:58 -0600)]
t/io_uring: get rid of old debug printfs
We don't really care about the sq/cq ring pointers, that was something
I originally added as this test tool was the first one that I wrote to
bring up io_uring and help debug ring issues.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 5 Oct 2021 12:38:41 +0000 (06:38 -0600)]
t/io_uring: print submitter id with tid on startup
Makes it easier to match up multiple threads with the stats.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 4 Oct 2021 23:04:04 +0000 (17:04 -0600)]
t/io_uring: clean up aio wait loop
No functional changes, just makes it easier to read and gets rid of
an indentation.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 4 Oct 2021 22:35:15 +0000 (16:35 -0600)]
t/io_uring: check for valid clock_index and finish state for stats
If the clock_index is zero, it's not valid and we should disregard
the sample. Ditto if an exit signal has been sent, we're done at that
point and aren't interested in the last samples.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 4 Oct 2021 22:18:39 +0000 (16:18 -0600)]
t/io_uring: don't track IO latencies the first second of runtime
The most variation is usually seen at startup, so don't start tracking
latencies until we've done the first reporting run. Things should be
nice and stable at that point.
To make this cheaper on the fast path, clock_index is only valid if
it's non-zero. This makes checking for stats cheap in the reap path.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
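The "zero means invalid" convention described above can be sketched as follows. This is an illustrative Python model rather than the tool's C code; `Sample`, `should_record` and the `finish` argument are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # 0 means "no timestamp slot assigned", i.e. latency is not tracked.
    clock_index: int


def should_record(sample: Sample, finish: bool) -> bool:
    """Return True only for samples worth adding to the latency stats."""
    if finish:
        # An exit signal was received; late samples are discarded.
        return False
    # A single integer test keeps the check cheap in the reap path.
    return sample.clock_index != 0
```

Because index 0 is reserved as the "not tracked" marker, no separate flag is needed per request.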
Jens Axboe [Mon, 4 Oct 2021 22:16:01 +0000 (16:16 -0600)]
t/io_uring: don't print partial IOPS etc output if exit signal was received
The run always terminates with what looks like a much slower cycle than
the previous seconds. That's not really the case, it's just that the
sleep() got interrupted by the signal and we slept less than we thought
we did, yet we still account it as a full second.
Just make it cleaner and break if finish is set.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 4 Oct 2021 18:42:01 +0000 (12:42 -0600)]
t/io_uring: add support for legacy AIO
Just as a comparison point, not really interesting otherwise. It doesn't
support any of the advanced features, just basic IO.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 4 Oct 2021 18:33:40 +0000 (12:33 -0600)]
t/io_uring: remove extra add_stat() call
If we're batching the stat updates, it's incorrect to add the individual
stat. It would have skewed the percentiles and made -t1 run slower than it
otherwise would have.
Fixes: ab85494f8bf0 ("t/io_uring: batch stat updates")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 1 Oct 2021 19:55:52 +0000 (13:55 -0600)]
Merge branch 'evelu-fixes2' of https://github.com/ErwanAliasr1/fio
* 'evelu-fixes2' of https://github.com/ErwanAliasr1/fio:
t/one-core-peak: nvme-cli as optional tooling
t/one-core-peak: Report numa as off if missing
Erwan Velu [Fri, 1 Oct 2021 19:43:07 +0000 (21:43 +0200)]
t/one-core-peak: nvme-cli as optional tooling
Not all systems have nvme-cli installed.
If it is present, let's print additional low-level info.
If not, let's ignore it and continue.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Erwan Velu [Fri, 1 Oct 2021 19:37:29 +0000 (21:37 +0200)]
t/one-core-peak: Report numa as off if missing
Some systems don't have numa enabled.
If so, don't report an error but report numa as off.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Shin'ichiro Kawasaki [Fri, 1 Oct 2021 10:32:57 +0000 (19:32 +0900)]
Refer td->loops instead of td->o.loops to fix loop count issue
In the github issues #1093 and #1278, it was reported that the loops
option does not work as expected when do_verify=0 option is specified.
Per analysis by Sowmya Ravi, the cause was as follows:
1) keep_running() decrements td->o.loops at job repetition, then
td->o.loops has zero value when the last loop is executed.
2) clear_io_state() is called at the beginning of the thread_main loop
for each repetition for loops option.
3) clear_io_state() calls reset_io_counters() which resets
td->nr_done_files to zero when td->o.loops is non-zero.
4) For the last loop of loops option, clear_io_state() call does not
clear td->nr_done_files since td->o.loops is zero. This results in a
setup error in do_io().
To fix the issue, modify reset_io_counters() to refer to td->loops instead
of td->o.loops. td->o.loops is not a good reference since it is updated
in keep_running(). td->loops is not updated during the fio run, so it is
safe to refer to.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20211001103257.4130231-3-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
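The failure mode in steps 1) to 4) can be reproduced with a small simulation. This is a simplified Python model, not fio's C code: `ThreadData`, `run` and the `use_fixed_field` switch are hypothetical, and `loops` / `o_loops` only mirror the roles of td->loops and td->o.loops as described in the message.

```python
class ThreadData:
    def __init__(self, loops: int):
        self.loops = loops        # constant during the run (models td->loops)
        self.o_loops = loops      # decremented per repetition (models td->o.loops)
        self.nr_done_files = 0


def keep_running(td: ThreadData) -> bool:
    """Consume one loop; return False once no loops are left."""
    if td.o_loops > 0:
        td.o_loops -= 1
        return True
    return False


def reset_io_counters(td: ThreadData, use_fixed_field: bool) -> None:
    """Reset nr_done_files only when the consulted loop counter is non-zero."""
    loops = td.loops if use_fixed_field else td.o_loops
    if loops:
        td.nr_done_files = 0


def run(loops: int, use_fixed_field: bool) -> int:
    """Return how many repetitions started with nr_done_files == 0."""
    td = ThreadData(loops)
    clean_starts = 0
    while keep_running(td):
        reset_io_counters(td, use_fixed_field)
        if td.nr_done_files == 0:
            clean_starts += 1
        td.nr_done_files = 1  # pretend the single file was completed
    return clean_starts
```

With the decremented counter, the last of three repetitions starts with a stale nr_done_files; with the constant counter, every repetition starts clean.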
Shin'ichiro Kawasaki [Fri, 1 Oct 2021 10:32:56 +0000 (19:32 +0900)]
Revert "Fix for loop count issue when do_verify=0 (#1093)"
This reverts commit 499cded5f435a0a7c379b606eb3e903d7f43c360.
The commit enabled clear_io_state() call in the loop of thread_main()
after completion of IOs, regardless of verify option. This sets zero to
td->nr_done_files even when the IOs are sequential workload with holes.
Such IOs depend on td->nr_done_files to judge job completion in
__get_next_file(). With zero value in td->nr_done_files, the sequential
IOs do not complete as expected, which results in the failure of a test case.
Revert the commit to avoid the failure. Regarding the loop count issue
with do_verify=0 option, another fix patch follows.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20211001103257.4130231-2-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 1 Oct 2021 17:11:53 +0000 (11:11 -0600)]
t/io_uring: correct percentile ranking
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shin'ichiro Kawasaki [Thu, 30 Sep 2021 00:02:36 +0000 (09:02 +0900)]
zbd: Fix unexpected job termination by open zone search failure
Test case #46 in t/zbd/test-zbd-support fails when it is repeated
hundreds of times on null_blk zoned devices. The test case uses libaio
IO engine to run 8 random write jobs on 4 sequential write required
zones. When all of the 4 zones get almost full but still open for
in-flight writes, the helper function zbd_convert_to_open_zone() fails
to get an opened zone for next write. This results in unexpected job
termination.
To avoid the unexpected job termination, retry the steps in
zbd_convert_to_open_zone(). Before retry, call io_u_quiesce() to ensure
that the in-flight writes get completed.
To prevent an infinite loop, retry only when any IOs are
in-flight or in-flight IOs get completed. To check the in-flight IO count of
all jobs, add a new helper function any_io_in_flight().
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Link: https://lore.kernel.org/r/20210930000236.4116945-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 30 Sep 2021 02:15:45 +0000 (20:15 -0600)]
t/io_uring: store TSC rate in local file
Doesn't change on a single machine, so let's just cache the value instead
of requiring it to be specified every time. If we specify the rate, the
local data is updated. If we don't specify it, we check the file, and use
the rate in there if it exists.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
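The caching rule above (a specified rate updates the file, an unspecified rate falls back to the file) can be sketched like this. It is a minimal Python illustration; the function name `resolve_tsc_rate` and the idea of passing the cache path as an argument are assumptions, not the tool's actual interface.

```python
from pathlib import Path
from typing import Optional


def resolve_tsc_rate(cache: Path, specified: Optional[int]) -> Optional[int]:
    """Return the TSC rate in Hz, updating or consulting the cache file."""
    if specified:
        # A rate given on the command line refreshes the cached value.
        cache.write_text(f"{specified}\n")
        return specified
    if cache.exists():
        # No rate given: fall back to the stored one.
        return int(cache.read_text().strip())
    # Nothing specified and nothing cached.
    return None
```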
Jens Axboe [Wed, 29 Sep 2021 17:38:58 +0000 (11:38 -0600)]
Merge branch 'patch-1' of https://github.com/ravisowmya/fio
* 'patch-1' of https://github.com/ravisowmya/fio:
Fix for loop count issue when do_verify=0 (#1093)
ravisowmya [Tue, 28 Sep 2021 19:09:38 +0000 (12:09 -0700)]
Fix for loop count issue when do_verify=0 (#1093)
'clear_io_state' is called twice and resets nr_done_files.
'clear_io_state' resets nr_done_files if loop>=1.
This API is called twice within thread_main and the second call is
skipped if do_verify=0. We rely on the first call for setup management.
So, for the very last loop, we would have skipped resetting
'nr_done_files' because loops=0, resulting in an IO error
in do_io, and we exit without performing any IOs. The fix invokes
the second call to clear_io_state.
Signed-off-by: Sowmya Ravi sowmyaravi.92@gmail.com
Jens Axboe [Tue, 28 Sep 2021 19:28:18 +0000 (13:28 -0600)]
Merge branch 'sigbreak' of https://github.com/bjpaupor/fio
* 'sigbreak' of https://github.com/bjpaupor/fio:
add signal handlers for Windows SIGBREAK
Brandon Paupore [Tue, 28 Sep 2021 17:12:15 +0000 (12:12 -0500)]
add signal handlers for Windows SIGBREAK
Signed-off-by: Brandon Paupore <brandon.paupore@wdc.com>
Jens Axboe [Sun, 26 Sep 2021 22:32:32 +0000 (16:32 -0600)]
Merge branch 'onecore' of https://github.com/ByteHamster/fio
* 'onecore' of https://github.com/ByteHamster/fio:
Pick core for running t/one-core-peak.sh
Jens Axboe [Sun, 26 Sep 2021 22:32:05 +0000 (16:32 -0600)]
Merge branch 'evelu-fio' of https://github.com/ErwanAliasr1/fio
* 'evelu-fio' of https://github.com/ErwanAliasr1/fio:
one-core-peak: Reporting NVME features
t/one-core-peak: Reporting kernel config
one-core-peak.sh: Fixing bash
Erwan Velu [Sun, 26 Sep 2021 20:26:27 +0000 (22:26 +0200)]
one-core-peak: Reporting NVME features
This commit gets some low-level features of NVMe drives and reports them.
They include temperature, APSTE, power state, and submission & completion queues.
A typical output looks like:
nvme0n1: MODEL=Samsung SSD 970 EVO Plus 2TB FW=2B2QEXM7 serial=S59CNM0R417706B PCI=0000:01:00.0@8.0 GT/s PCIe IRQ=62 NUMA=0 CPUS=0-31
nvme0n1: Temp:34 C, Autonomous Power State Transition: Enabled, PowerState:4, Completion Queues:32, Submission Queues:32
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Erwan Velu [Sun, 26 Sep 2021 19:43:39 +0000 (21:43 +0200)]
t/one-core-peak: Reporting kernel config
This patch adds reporting of some items of the kernel config.
A typical output looks like:
system: KERNEL: 5.15.0-rc2+
system: KERNEL: CONFIG_BLK_CGROUP_IOCOST=y
system: KERNEL: CONFIG_HZ=1000
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
ByteHamster [Wed, 22 Sep 2021 14:30:35 +0000 (16:30 +0200)]
Pick core for running t/one-core-peak.sh
Erwan Velu [Sun, 26 Sep 2021 19:03:36 +0000 (21:03 +0200)]
one-core-peak.sh: Fixing bash
This commit fixes some warnings about the bash syntax.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Jens Axboe [Sun, 26 Sep 2021 15:58:05 +0000 (09:58 -0600)]
Merge branch 'tsc' of https://github.com/ErwanAliasr1/fio
* 'tsc' of https://github.com/ErwanAliasr1/fio:
one-core-peak: Adding option to reporting latencies
one-core-peak: Avoid reporting Unknown memory speed
Erwan Velu [Sat, 25 Sep 2021 21:51:24 +0000 (23:51 +0200)]
one-core-peak: Adding option to reporting latencies
Since commit 932131c944b10f2a03f4028318c454c98eca489f,
it is now possible to report the io_uring benchmark latencies.
This patch detects the current TSC value and enables the latency feature if requested.
Signed-off-by: Erwan Velu <e.velu@criteo.com>
Erwan Velu [Sat, 25 Sep 2021 21:49:12 +0000 (23:49 +0200)]
one-core-peak: Avoid reporting Unknown memory speed
Some BIOSes report the configured memory speed as Unknown, making the report useless.
Add a match on a real speed to avoid this.
Before: system: MEMORY: Unknown
After: system: MEMORY: 3466 MT/s
Signed-off-by: Erwan Velu <e.velu@criteo.com>
Jens Axboe [Sat, 25 Sep 2021 20:56:14 +0000 (14:56 -0600)]
Merge branch 'evelu-uring' of https://github.com/ErwanAliasr1/fio
* 'evelu-uring' of https://github.com/ErwanAliasr1/fio:
t/io_uring.c: Adding \n on help
Erwan Velu [Sat, 25 Sep 2021 20:45:51 +0000 (22:45 +0200)]
t/io_uring.c: Adding \n on help
Without these \n, the new options were badly printed.
Signed-off-by: Erwan Velu <e.velu@criteo.com>
Jens Axboe [Sat, 25 Sep 2021 20:38:10 +0000 (14:38 -0600)]
t/io_uring: batch stat updates
Track the last clock_index, and batch increments if at all possible.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sat, 25 Sep 2021 20:25:05 +0000 (14:25 -0600)]
t/io_uring: add support for latency tracking
This will display the latency percentiles for the run when done, per
submitter thread. It takes two arguments:
-t<x> Enable latency tracking if x is non-zero
-T<Y> Set TSC clock rate to Y Hz
The tsc rate can be programmatically deduced (fio does this), for now
pass it in. dmesg will generally tell you:
tsc: Refined TSC clocksource calibration: 3699.889 MHz
and you'd then do:
-t1 -T3699889000
for that. Here's an example, synchronous optane gen2 read:
[...]
IOPS=254118, BW=124MiB/s, IOS/call=0/0, inflight=(1)
IOPS=255024, BW=124MiB/s, IOS/call=0/0, inflight=(1)
IOPS=255100, BW=124MiB/s, IOS/call=0/0, inflight=(1)
IOPS=254791, BW=124MiB/s, IOS/call=0/0, inflight=(1)
^CExiting on signal 2
IOPS=100086, BW=48MiB/s, IOS/call=1/1, inflight=(1)
515102: Latency percentiles:
percentiles (nsec):
| 1.0000th=[ 3857], 5.0000th=[ 3857], 10.0000th=[ 3857],
| 20.0000th=[ 3857], 30.0000th=[ 3857], 40.0000th=[ 3892],
| 50.0000th=[ 3892], 60.0000th=[ 3892], 70.0000th=[ 3892],
| 80.0000th=[ 3892], 90.0000th=[ 3961], 95.0000th=[ 3961],
| 99.9000th=[ 8752], 99.5000th=[ 8752], 99.9000th=[ 8752],
| 99.9500th=[ 9064], 99.9900th=[ 9755]
Or a higher depth run:
IOPS=3549568, BW=1733MiB/s, IOS/call=32/32, inflight=(64)
IOPS=3547712, BW=1732MiB/s, IOS/call=32/31, inflight=(111)
IOPS=3549504, BW=1733MiB/s, IOS/call=32/31, inflight=(128)
^CExiting on signal 2
IOPS=1413600, BW=690MiB/s, IOS/call=32/32, inflight=(35)
515078: Latency percentiles:
percentiles (nsec):
| 1.0000th=[13630], 5.0000th=[14322], 10.0000th=[15291],
| 20.0000th=[16121], 30.0000th=[20065], 40.0000th=[21726],
| 50.0000th=[22279], 60.0000th=[26154], 70.0000th=[27814],
| 80.0000th=[28368], 90.0000th=[33903], 95.0000th=[34180],
| 99.9000th=[52862], 99.5000th=[52862], 99.9000th=[52862],
| 99.9500th=[56183], 99.9900th=[67807]
Note that latency tracking isn't cheap, even if we tried to do it in the
cheapest way possible. The peak workload shown here will run at ~3.7M
IOPS without tracking, and as shown about 3.55M with tracking enabled.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
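As an illustration of where the -T value comes from, the dmesg line quoted above can be turned into a rate in Hz, and a TSC cycle delta can then be converted to nanoseconds. This is a hedged Python sketch; `tsc_hz_from_dmesg` and `cycles_to_nsec` are hypothetical helpers, and the tool itself may do the conversion differently.

```python
import re
from decimal import Decimal


def tsc_hz_from_dmesg(line: str) -> int:
    """Extract the TSC rate in Hz from a 'Refined TSC clocksource' line."""
    match = re.search(r"calibration:\s*([0-9.]+)\s*MHz", line)
    if match is None:
        raise ValueError("no TSC calibration value found")
    # Decimal avoids float rounding when scaling MHz to Hz.
    return int(Decimal(match.group(1)) * 1_000_000)


def cycles_to_nsec(cycles: int, tsc_hz: int) -> int:
    """Convert a TSC cycle delta to nanoseconds using integer arithmetic."""
    return cycles * 1_000_000_000 // tsc_hz
```

For the dmesg line in the message, the first helper yields 3699889000, which is the value passed with -T.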
Jens Axboe [Fri, 24 Sep 2021 21:17:44 +0000 (15:17 -0600)]
t/io_uring: don't print BW numbers for do_nop
They don't mean anything for nops, we're just interested in IOPS here.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 23 Sep 2021 15:15:16 +0000 (09:15 -0600)]
t/io_uring: ensure batch counts are smaller or equal to depth
If you use a batch submit or complete count that's larger than the
depth, then t/io_uring will stall. Make sure to sanitize the counts
so that every batch value is always <= the total depth.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
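The sanitizing step amounts to clamping both counts to the queue depth. A minimal Python sketch, with `sanitize_batch` as a hypothetical name:

```python
from typing import Tuple


def sanitize_batch(depth: int, batch_submit: int, batch_complete: int) -> Tuple[int, int]:
    """Clamp both batch counts so neither exceeds the queue depth."""
    return min(batch_submit, depth), min(batch_complete, depth)
```

A request for batches of 32 at depth 16 is therefore reduced to batches of 16, while batches of 32 at depth 128 are left alone.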
Jens Axboe [Tue, 21 Sep 2021 00:29:40 +0000 (18:29 -0600)]
Merge branch 'one-core' of https://github.com/ErwanAliasr1/fio
* 'one-core' of https://github.com/ErwanAliasr1/fio:
t/one-core.sh: Adding script to run the one-core io benchmark
Erwan Velu [Thu, 16 Sep 2021 20:52:22 +0000 (22:52 +0200)]
t/one-core.sh: Adding script to run the one-core io benchmark
Associated with fio, the t/io_uring test is used to compute the max IOPS a
single core can get.
Jens has published the procedure he uses several times, but trying to
reproduce this setup is error-prone. It's easy to miss a configuration
and get a different result.
This script provides a common setup to reproduce these runs.
From the fio directory, execute it like the following:
[user@fio] t/one-core.sh /dev/nvme0n1 [other drives]
##################################################:
system: CPU: AMD EPYC 7502P 32-Core Processor
system: MEMORY: 2933 MT/s
system: KERNEL: 5.10.35-1.el8.x86_64
nvme0n1: MODEL=Samsung SSD 970 EVO Plus 2TB FW=2B2QEXM7 serial=S59CNM0R417706B PCI=0000:01:00.0@8.0 GT/s PCIe IRQ=64 NUMA=0 CPUS=0-23
nvme0n1: set none as io scheduler
nvme0n1: iostats set to 1.
nvme0n1: nomerge set to 0.
Warning: For better performance, you should enable nvme poll queues by setting nvme.poll_queues=32 on the kernel commande line
##################################################:
io_uring: Running taskset -c 0,12 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n4 /dev/nvme0n1
[...]
IOPS=731008, BWPS=356 MB IOS/call=32/31, inflight=(108 127 126 106)
This script will take care of the following items:
- nvme poll queues
- io scheduler
- iostats
- io_poll
- nomerge
- finding the logical cores running on the first physical core
- cpu frequency governor on performance
- cpu idle governor on menu
- calling t/io_uring with the proper parameters in 512 bytes fashion
- reporting the nvme & pci configuration
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Jens Axboe [Thu, 16 Sep 2021 17:41:06 +0000 (11:41 -0600)]
t/io_uring: fix bandwidth calculation
Fixes: 22fd35012cea ("t/io_uring: Reporting bandwidth")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 16 Sep 2021 17:29:57 +0000 (11:29 -0600)]
Merge branch 'bwps' of https://github.com/ErwanAliasr1/fio
* 'bwps' of https://github.com/ErwanAliasr1/fio:
t/io_uring: Reporting bandwidth
Erwan Velu [Thu, 16 Sep 2021 16:46:30 +0000 (18:46 +0200)]
t/io_uring: Reporting bandwidth
When performing tests at various block sizes, it's sometimes a bit
difficult to estimate if we reach the limit of the datapath.
This commit simply prints the resulting bandwidth, i.e. the IOPS
multiplied by the block size.
A typical output looks like:
[user@host] t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n4 /dev/nvme0n1
...
IOPS=729856, BW=356 MiB/s, IOS/call=32/32, inflight=(105 119 108 109)
[user@host] t/io_uring -b4096 -d128 -c32 -s32 -p1 -F1 -B1 -n4 /dev/nvme0n1
...
IOPS=746368, BW=2915 MiB/s, IOS/call=32/31, inflight=(121 115 122 122)
In the 4K case, as for a PCI Gen3 product, we are clearly limited by the
bandwidth while in the 512 case we hit latency issues.
BW is expressed in MiB/sec.
Signed-off-by: Erwan Velu <e.velu@criteo.com>
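The reported figure is IOPS multiplied by the block size, expressed in MiB/s. A small Python check against the two sample outputs above (`bandwidth_mib` is a hypothetical helper, and truncating integer division is an assumption that happens to match both samples):

```python
MIB = 1024 * 1024


def bandwidth_mib(iops: int, block_size: int) -> int:
    """Bandwidth in MiB/s for a given IOPS figure and block size in bytes."""
    return iops * block_size // MIB
```

With the numbers from the message, 729856 IOPS at 512 bytes gives 356 MiB/s and 746368 IOPS at 4096 bytes gives 2915 MiB/s.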
Jens Axboe [Wed, 15 Sep 2021 12:51:01 +0000 (06:51 -0600)]
t/io_uring: add switch -O for O_DIRECT vs buffered
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 13 Sep 2021 20:09:01 +0000 (14:09 -0600)]
zbd: remove dead zone retrieval call
A previous commit failed to realize that not only was the assignment
useless, the call to zbd_zone_nr() itself was useless as
well. Remove it.
Fixes: 000ecb5fe36d ("zbd: Removing useless variable assignment")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 13 Sep 2021 19:18:26 +0000 (13:18 -0600)]
t/io_uring: add -N option for do_nop
Makes it easier than asking people to edit and compile.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 13 Sep 2021 19:14:29 +0000 (13:14 -0600)]
t/io_uring: don't require a file for do_nop runs
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 8 Sep 2021 21:40:47 +0000 (15:40 -0600)]
Merge branch 'ft' of https://github.com/ErwanAliasr1/fio
* 'ft' of https://github.com/ErwanAliasr1/fio:
log: Removing useless assignment
zbd: Removing useless variable assignment
lib/fls.h: Remove unused variable assignment
engines/sg: Removing useless variable assignment
stat: Avoid freeing null pointer
filesetup: Removing unused variable usage
engines/sg: Return error if generic_close_file fails
Erwan Velu [Wed, 8 Sep 2021 21:10:50 +0000 (23:10 +0200)]
log: Removing useless assignment
The last len assignment is never read, which makes it useless.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Erwan Velu [Wed, 8 Sep 2021 21:00:45 +0000 (23:00 +0200)]
zbd: Removing useless variable assignment
zone_idx_b is set but never read again.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Erwan Velu [Wed, 8 Sep 2021 20:52:10 +0000 (22:52 +0200)]
lib/fls.h: Remove unused variable assignment
x is modified just before the last set of r but x is never used again.
Let's remove this useless assignment.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Erwan Velu [Wed, 8 Sep 2021 20:43:39 +0000 (22:43 +0200)]
engines/sg: Removing useless variable assignment
ret is set to -1 but the break statement will not use this value.
So let's remove this useless assignment which could be confusing.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Erwan Velu [Wed, 8 Sep 2021 20:35:59 +0000 (22:35 +0200)]
stat: Avoid freeing null pointer
If ovals is NULL, the jump to out will free(ovals) and will trigger an error.
As the out label was only used for this condition, let's remove it and return immediately.
As out was also used as a variable name, this makes the function easier
to read and more robust.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Jens Axboe [Wed, 8 Sep 2021 20:31:04 +0000 (14:31 -0600)]
README: add link to new lore archive
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Erwan Velu [Wed, 8 Sep 2021 20:22:56 +0000 (22:22 +0200)]
filesetup: Removing unused variable usage
done is set to true but this is useless as break will
stop the while loop.
So let's remove this useless assignment.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Erwan Velu [Wed, 8 Sep 2021 20:18:53 +0000 (22:18 +0200)]
engines/sg: Return error if generic_close_file fails
The current code was returning 1 if generic_close_file() fails.
The ret value was prepared with the real error, so let's return that instead,
as per the generic_open_file() error handling.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
Jens Axboe [Wed, 8 Sep 2021 20:12:16 +0000 (14:12 -0600)]
t/io_uring: ensure that nthreads is > 0
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Andrzej Jakowski [Wed, 8 Sep 2021 18:35:00 +0000 (11:35 -0700)]
t/io_uring: allow flexible IO threads assignment
This patch allows IO threads to be flexibly assigned to the fileset. When
you specify:
t/io_uring -n 5 /dev/dev1 /dev/dev2
the first file/device will get 3 IO threads and the second file/device
the remaining 2 IO threads. When there are more files than IO threads,
an IO thread may get assigned multiple files/devices.
Signed-off-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
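One distribution that reproduces the 3/2 split from the example is sketched below in Python. Both helpers are hypothetical: giving the remainder to the earlier files and dealing files out round-robin when files outnumber threads are assumptions consistent with the message, not a description of the actual code.

```python
from typing import List


def threads_per_file(nthreads: int, nfiles: int) -> List[int]:
    """Spread nthreads over nfiles; earlier files get the remainder."""
    base, extra = divmod(nthreads, nfiles)
    return [base + (1 if i < extra else 0) for i in range(nfiles)]


def files_per_thread(nthreads: int, nfiles: int) -> List[List[int]]:
    """When files outnumber threads, deal file indexes out round-robin."""
    buckets: List[List[int]] = [[] for _ in range(nthreads)]
    for f in range(nfiles):
        buckets[f % nthreads].append(f)
    return buckets
```

For -n 5 with two devices the first helper returns 3 and 2 threads; for two threads and three files, the second helper gives the first thread two files.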
Jens Axboe [Wed, 8 Sep 2021 14:59:48 +0000 (08:59 -0600)]
Fio 3.28
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 8 Sep 2021 14:07:57 +0000 (08:07 -0600)]
t/io_uring: don't make setrlimit() failing fatal
We don't even need this on newer kernels, so just ignore it if it
fails. The worst that can happen is that buffer registration will
fail.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Andrzej Jakowski [Wed, 8 Sep 2021 04:10:44 +0000 (21:10 -0700)]
t/io_uring: fixes in output
Provide description of available options in usage command
and fix alignment so they look pretty.
Also remove debug output.
Signed-off-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shin'ichiro Kawasaki [Mon, 6 Sep 2021 01:50:00 +0000 (10:50 +0900)]
options: Add thinktime_iotime option
The thinktime option allows stalling a job for a specified amount of
time. Using the thinktime_blocks option, periodic stalls can be added
every thinktime_blocks IOs. However, with this option, the periodic
stall may not be repeated at equal time intervals as the time to execute
thinktime_blocks IOs may vary.
To control the thinktime interval by time, introduce the option
thinktime_iotime. With this new option, the thinktime stall is repeated
after IOs are executed for thinktime_iotime. If this option is used
together with the thinktime_blocks option, the thinktime pause is
repeated after thinktime_iotime or after thinktime_blocks IOs, whichever
happens first.
To support the new option, add a new member thinktime_iotime in the
struct thread_options and the struct thread_options_pack. Avoid size
increase of the struct thread_options_pack by replacing a padding 'pad5'
with the new member. To keep thinktime related members close, move the
members near the position where the padding was placed. Make the same
changes to the struct thread_options as well, for consistency.
To track the time and IO block count at the last stall, add
last_thinktime variable and last_thinktime_blocks variable to struct
thread_data. Also, introduce the helper function init_thinktime()
to group thinktime related preparations.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
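The "whichever happens first" rule can be modelled as a simple predicate. This is a Python sketch under stated assumptions: `stall_due` is a hypothetical function, times are in arbitrary but consistent units, and a value of 0 is treated as "option not set", which simplifies fio's real defaults.

```python
def stall_due(now: int, blocks_done: int,
              last_thinktime: int, last_thinktime_blocks: int,
              thinktime_iotime: int = 0, thinktime_blocks: int = 0) -> bool:
    """Return True when a thinktime stall should be inserted."""
    # Time-based trigger: enough IO time has passed since the last stall.
    by_time = bool(thinktime_iotime) and (now - last_thinktime) >= thinktime_iotime
    # Count-based trigger: enough IOs were issued since the last stall.
    by_blocks = bool(thinktime_blocks) and (blocks_done - last_thinktime_blocks) >= thinktime_blocks
    return by_time or by_blocks
```

After each stall, the caller would record the current time and block count as the new last_thinktime and last_thinktime_blocks.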
Damien Le Moal [Fri, 3 Sep 2021 15:20:27 +0000 (15:20 +0000)]
examples: add examples for cmdprio_* IO priority options
Add the example scripts cmdprio-percentage.fio and cmdprio-bssplit.fio
to illustrate the use of the cmdprio_percentage, cmdprio_class,
cmdprio and cmdprio_bssplit options. Also add the fiograph output
images for these example scripts.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Fri, 3 Sep 2021 15:20:26 +0000 (15:20 +0000)]
fio: Introduce the log_prio option
Introduce the log_prio option to expand priority logging from just a
single bit of information (priority high vs low) to the full value of the
priority used to execute IOs. When this option is set, the
priority value is printed as a 16-bit hexadecimal value combining
the I/O priority class and priority level as defined by the
ioprio_value() helper.
Similarly to the log_offset option, this option does not result in
actual I/O priority logging when log_avg_msec is set.
This patch also fixes a problem with the IO_U_F_PRIORITY flag, namely
that this flag is used to indicate that the IO is being executed with a
high priority on the device while at the same time indicating how to
account for the IO completion latency (high_prio clat vs low_prio clat).
With the introduction of the cmdprio_class and cmdprio options, these
assumptions are not necessarily compatible anymore.
These problems are addressed as follows:
* The priority_bit field of struct iosample is replaced with the
16-bit priority field representing the full io_u->ioprio value. When
log_prio is set, the priority field value is logged as is. When
log_prio is not set, 1 is logged as the entry's priority field if the
sample priority class is IOPRIO_CLASS_RT, and 0 otherwise.
* IO_U_F_PRIORITY is renamed to IO_U_F_HIGH_PRIO to indicate that a job
IO has the highest priority within the job context and so must be
accounted as such using high_prio clat.
While fio final statistics only show accounting of high vs low IO
completion latency statistics, the log_prio option allows a user to
perform more detailed statistical analysis of a workload using
multiple different IO priorities.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
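The two logging modes can be illustrated with a short Python sketch. The class shift of 13 and IOPRIO_CLASS_RT = 1 follow the Linux ioprio encoding; `logged_priority` and the exact `0x%04x` text format are assumptions made for the example.

```python
IOPRIO_CLASS_SHIFT = 13  # the class occupies the top 3 bits of the 16-bit value
IOPRIO_CLASS_RT = 1


def ioprio_value(prio_class: int, level: int) -> int:
    """Combine priority class and level into one 16-bit value."""
    return (prio_class << IOPRIO_CLASS_SHIFT) | level


def logged_priority(ioprio: int, log_prio: bool) -> str:
    """Return the priority column of a log entry."""
    if log_prio:
        # Full value, printed as 16-bit hexadecimal.
        return f"0x{ioprio:04x}"
    # Legacy single-bit form: 1 for the RT class, 0 otherwise.
    return "1" if (ioprio >> IOPRIO_CLASS_SHIFT) == IOPRIO_CLASS_RT else "0"
```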
Damien Le Moal [Fri, 3 Sep 2021 15:20:26 +0000 (15:20 +0000)]
libaio,io_uring: relax cmdprio_percentage constraints
In fio, a job IO priority is controlled with the prioclass and prio
options and these options cannot be used together with the
cmdprio_percentage option.
Allow a user to have async IO priorities default to the job defined
IO priority by removing the mutual exclusion between the options
cmdprio_percentage and prioclass/prio.
With the introduction of the cmdprio_class option, an async IO priority
may be lower than the job default priority, resulting in reversed clat
statistics shown for high and low priority IOs when fio completes.
Solve this by setting an io_u IO_U_F_PRIORITY flag depending on a
comparison between the async IO priority and job default IO priority.
When an async IO is issued without a priority set, Linux kernel will
execute it using the IO priority of the issuing context set with
ioprio_set(). This works fine for libaio, where the context will be
the same as the context that submitted the IO.
However, io_uring can be used with a kernel thread that performs
block device IO submissions (sqthread_poll). Therefore, for io_uring,
an IO sqe ioprio field must be set to the job default priority unless
the IO priority is set according to the job cmdprio_percentage value.
Because of this, io_uring already set sqe->ioprio even when only
prio/prioclass was used. See commit b7ed2a862dda ("io_uring: set sqe
iopriority, if prio/prioclass is set"). In order to make the code easier
to maintain, handle all I/O priority preparations in the same function.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Fri, 3 Sep 2021 15:20:25 +0000 (15:20 +0000)]
libaio,io_uring: introduce cmdprio_bssplit
The cmdprio_percentage, cmdprio_class and cmdprio options allow
specifying different values for read and write operations. This enables
various IO priority issuing patterns even under a mixed read-write
workload but does not allow differentiation within read and write
I/O operation types with different sizes when the bssplit option is
used.
Introduce the cmdprio_bssplit option to complement the use of the
bssplit option. This new option has the same format as the bssplit
option, but the percentage values indicate the percentage of I/O
operations with a particular block size that must be issued with the
priority class and value specified by cmdprio_class and cmdprio.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
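To make the per-block-size percentage concrete, here is a hedged Python sketch that parses a single `size/percent:size/percent` list and decides whether one IO gets the command priority. `parse_bssplit` and `use_cmdprio` are hypothetical; the parser ignores the comma-separated read/write form and assumes `k` and `m` suffixes mean 1024 and 1024*1024 bytes.

```python
from typing import Dict


def parse_bssplit(spec: str) -> Dict[int, int]:
    """Parse 'size/percent:size/percent' into {size_in_bytes: percent}."""
    units = {"k": 1024, "m": 1024 * 1024}
    result: Dict[int, int] = {}
    for entry in spec.split(":"):
        size_text, percent_text = entry.split("/")
        suffix = size_text[-1].lower()
        if suffix in units:
            size = int(size_text[:-1]) * units[suffix]
        else:
            size = int(size_text)
        result[size] = int(percent_text)
    return result


def use_cmdprio(block_size: int, split: Dict[int, int], draw: int) -> bool:
    """draw is a number in [0, 100); unlisted sizes never get the priority."""
    return draw < split.get(block_size, 0)
```

In a real run, `draw` would come from a random number generator, so that roughly the configured percentage of IOs of each size is issued with the cmdprio_class and cmdprio settings.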
Damien Le Moal [Fri, 3 Sep 2021 15:20:24 +0000 (15:20 +0000)]
libaio,io_uring: introduce cmdprio_class and cmdprio options
When the cmdprio_percentage option is used, the specified percentage of
IO will be issued with the highest priority class IOPRIO_CLASS_RT. This
priority class maps to the ATA NCQ "high" priority level and allows
exercising a SATA device to measure its command latency characteristics
in the presence of low and high priority commands.
Besides ATA NCQ commands, Linux block IO schedulers also support IO
priorities and will behave differently in the presence of IOs with
different IO priority classes and values. However, cmdprio_percentage
does not allow specifying all possible priority classes and values.
To solve this, introduce libaio and io_uring engine specific options
cmdprio_class and cmdprio. These new options are the equivalent
of the prioclass and prio options and allow specifying the priority
class and priority value to use for asynchronous I/Os when the
cmdprio_percentage option is used. If not specified, the I/O priority
class defaults to IOPRIO_CLASS_RT and the I/O priority value to 0,
as before. Similarly to the cmdprio_percentage option, these options
can specify different values for read and write I/Os using a comma
separated list.
The manpage, HOWTO and fiograph configuration file are updated to
document these new options.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Fri, 3 Sep 2021 15:20:24 +0000 (15:20 +0000)]
libaio,io_uring: improve cmdprio_percentage option
The cmdprio_percentage option of the libaio and io_uring engines defines
a single percentage that applies to all IO operations, regardless of
their direction. This prevents defining different high priority IO
percentages for read and write operations. This differentiation can
however be useful in the case of a mixed read-write workload (rwmixread
and rwmixwrite options).
Change the option definition to allow specifying a comma separated list
of percentages, 2 at most, one for reads and one for writes. If only a
single percentage is defined, it applies to both reads and writes as
before. The cmdprio_percentage option becomes an array of DDIR_RWDIR_CNT
elements indexed with enum fio_ddir values. The last entry of the array
(for DDIR_TRIM) is always 0.
Also create a new cmdprio helper file, engines/cmdprio.h,
such that we can avoid code duplication between io_uring and libaio
io engines. This helper file will be extended in subsequent patches.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
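The option format described above (one percentage, or a read,write pair, stored in a per-direction array whose trim slot stays 0) could be parsed roughly as follows. This is only a sketch: the enum and function names are made up for illustration, and fio's real parser goes through its own option helpers.

```c
#include <stdio.h>

/* Data directions in read/write/trim order; names are hypothetical. */
enum sketch_ddir { SK_READ = 0, SK_WRITE = 1, SK_TRIM = 2, SK_RWDIR_CNT = 3 };

/*
 * Parse "P" or "P_read,P_write" into perc[]. A single value applies to
 * both reads and writes; the trim slot is always 0. Returns 0 on
 * success and -1 if the string has no leading number, has more than
 * two values, or a value exceeds 100. Validation is sketch-level only.
 */
static int parse_cmdprio_percentage(const char *str,
				    unsigned int perc[SK_RWDIR_CNT])
{
	unsigned int r, w;
	char extra;
	int n = sscanf(str, "%u,%u%c", &r, &w, &extra);

	if (n == 1)
		w = r;
	else if (n != 2)
		return -1;
	if (r > 100 || w > 100)
		return -1;
	perc[SK_READ] = r;
	perc[SK_WRITE] = w;
	perc[SK_TRIM] = 0;
	return 0;
}
```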
Damien Le Moal [Fri, 3 Sep 2021 15:20:23 +0000 (15:20 +0000)]
options: make parsing functions available to ioengines
Move the declaration of split_parse_ddir(), str_split_parse() and
the split_parse_fn typedef to thread_options.h so that IO engines
can use these functions to parse options. The definition of struct
split is also moved to thread_options.h from options.c.
The type of the split_parse_fn callback function is changed to add a
void * argument that can be used for an option parsing callback to pass
a private data pointer to the split_parse_fn function. This can be used
by an IO engine to pass a pointer to its engine specific option
structure as td->eo is not yet set when options are being parsed.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Fri, 3 Sep 2021 15:20:23 +0000 (15:20 +0000)]
os: introduce ioprio_value() helper
Introduce the ioprio_value() helper function to calculate a priority
value based on a priority class and priority level. For Linux and
Android, this is defined as an integer equal to the priority class
shifted left by 13 bits and or-ed with the priority level. For
Dragonfly, ioprio_value() simply returns the priority level as there
is no concept of priority class.
Use this new helper in the io_uring and libaio engines to set IO
priority when the cmdprio_percentage option is used.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
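For the Linux/Android case described above, the computation is small enough to sketch directly. The names below are prefixed to make clear they are illustrative and not fio's definitions; only the 13-bit shift and the or-ing come from the commit message.

```c
/* Shift from the description above: the class sits above the low 13 bits. */
#define SKETCH_IOPRIO_CLASS_SHIFT 13

/* Combine a priority class and a priority level into one value. */
static inline int sketch_ioprio_value(int ioprio_class, int ioprio)
{
	return (ioprio_class << SKETCH_IOPRIO_CLASS_SHIFT) | ioprio;
}
```

With the Linux numbering, where IOPRIO_CLASS_RT is 1, class RT at level 0 yields 8192, and class 2 at level 4 yields 16388.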
Damien Le Moal [Fri, 3 Sep 2021 15:20:22 +0000 (15:20 +0000)]
tools: fiograph: do not overwrite input script file
In fiograph.py, the setup_commandline() function mistakenly initializes
the output_file variable to the input fio script file, causing this file
to always be overwritten, even if an output file is specified using the
--output option. Fix this by properly initializing the output_file
variable using the --output option argument value. If an output file
name is not provided, the input script file name is used by default.
Also fix fiograph configuration file to remove the cmdprio_percentage
option repeated entry for io_uring and libaio.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Fri, 3 Sep 2021 15:20:21 +0000 (15:20 +0000)]
manpage: fix definition of prio and prioclass options
Remove the reference to the hipri_percentage option in the definition of
the prio and prioclass options as hipri_percentage controls the use of
the RWF_HIPRI flag, which triggers I/O completion polling and is unrelated
to I/O priority (polling and I/O priority can be used together). This
change is done in both fio man page and HOWTO document.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Fri, 3 Sep 2021 15:20:20 +0000 (15:20 +0000)]
manpage: fix formatting
For ioengine options supported by multiple ioengines, remove spaces
after commas in the ioengine list to have troff correctly format in bold
the entire ioengine list and option name. Also add "=int" indicators
missing for some options.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Wed, 1 Sep 2021 06:41:53 +0000 (15:41 +0900)]
oslib: Fix blkzoned_get_max_open_zones()
When the kernel does not have the sysfs attribute file
queue/max_open_zones, blkzoned_get_max_open_zones() returns success
without initializing the max_open_zones value to 0 to indicate to the
caller (zbd_get_max_open_zones() in zbd.c) that the device limit is
unknown. If the max_open_zones variable in zbd_get_max_open_zones() is
not already 0 (depending on the memory status), the missing
initialization in blkzoned_get_max_open_zones() can cause errors or
misbehavior as an incorrect, random, limit may be used.
Fix this by always initializing max_open_zones to 0 when the
max_open_zones sysfs attribute file does not exist.
Reported-by: Bao-Hua Li <baohua.li@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
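The pattern behind this fix, always writing the output parameter on the "attribute missing" success path, can be sketched as below. The function name and error handling are assumptions for illustration; the real helper reads the attribute through fio's own sysfs code.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Read a max_open_zones-style limit from 'path'. If the file does not
 * exist, report success with *max_open_zones = 0, meaning "unknown",
 * so the caller never sees an uninitialized value.
 */
static int read_max_open_zones(const char *path, unsigned int *max_open_zones)
{
	FILE *f = fopen(path, "r");
	int ret = 0;

	if (!f) {
		if (errno != ENOENT)
			return -errno;
		*max_open_zones = 0;	/* the initialization the fix adds */
		return 0;
	}
	if (fscanf(f, "%u", max_open_zones) != 1)
		ret = -EIO;
	fclose(f);
	return ret;
}
```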
Jens Axboe [Sat, 28 Aug 2021 21:37:25 +0000 (15:37 -0600)]
t/io_uring: further simplify inflight tracking
Don't dump the last submitter inflight as well, just use file_depths()
for all of them.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 27 Aug 2021 18:44:02 +0000 (12:44 -0600)]
t/io_uring: pretty up multi-file depths
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Niklas Cassel [Thu, 26 Aug 2021 16:45:05 +0000 (16:45 +0000)]
io_uring: don't clear recently set sqe->rw_flags
Commit 7c70f506e438 ("engines/io_uring: move sqe clear out of hot path")
removed the memset of sqe from fio_ioring_prep().
This commit did add a clear of the sqe->rw_flags, however, it did so
after both RWF_UNCACHED and RWF_NOWAIT flags might have been set,
effectively clearing these flags if they got set.
This doesn't make any sense. Make sure that we clear sqe->rw_flags
before, not after, setting the flags.
Fixes: 7c70f506e438 ("engines/io_uring: move sqe clear out of hot path")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
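The ordering problem described above can be reproduced with a stand-in structure. Everything here (struct, flag values, function names) is hypothetical and only mirrors the shape of the bug, not the actual fio_ioring_prep() code.

```c
#include <stdint.h>

/* Stand-in flag values and sqe layout, for this sketch only. */
#define SK_RWF_NOWAIT   0x1u
#define SK_RWF_UNCACHED 0x2u

struct sk_sqe {
	uint32_t rw_flags;
};

/* Buggy order: the clear runs last and wipes whatever was just set. */
static void prep_clear_after(struct sk_sqe *sqe, int nowait, int uncached)
{
	if (uncached)
		sqe->rw_flags |= SK_RWF_UNCACHED;
	if (nowait)
		sqe->rw_flags |= SK_RWF_NOWAIT;
	sqe->rw_flags = 0;
}

/* Fixed order: clear stale state first, then set the requested flags. */
static void prep_clear_before(struct sk_sqe *sqe, int nowait, int uncached)
{
	sqe->rw_flags = 0;
	if (uncached)
		sqe->rw_flags |= SK_RWF_UNCACHED;
	if (nowait)
		sqe->rw_flags |= SK_RWF_NOWAIT;
}
```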
Niklas Cassel [Thu, 26 Aug 2021 16:45:05 +0000 (16:45 +0000)]
io_uring: fix misbehaving cmdprio_percentage option
Commit 7c70f506e438 ("engines/io_uring: move sqe clear out of hot path")
removed the memset of sqe from fio_ioring_prep().
Before this commit, fio_ioring_prio_prep() behaved properly, because
sqe->ioprio was always cleared by the memset in fio_ioring_prep().
cmdprio_percentage=20 is supposed to set the highest priority class for
20% of the total I/Os, however, because sqes got reused without clearing
the ioprio field, this meant that the number of I/Os sent with the highest
priority became 95% already after 10 seconds. Quite far off from the
intended 20%.
Fix this by explicitly clearing the priority in fio_ioring_prio_prep().
Note that prio/prioclass cannot be used together with cmdprio_percentage,
so we do not need to do an additional clear in fio_ioring_prep().
engines/libaio.c doesn't explicitly clear the ioprio, nor does it memset
the descriptor entry, because io_prep_pread()/io_prep_pwrite() in
libaio itself perform a memset.
Fixes: 7c70f506e438 ("engines/io_uring: move sqe clear out of hot path")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
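Why reused entries inflate the high-priority share can be seen with a small deterministic model. This is an illustration under stated assumptions, not the engine code: the ring has 8 entries, every 5th I/O (20%) is marked high priority instead of drawing a random number, and the names are invented.

```c
#define SK_RING_ENTRIES 8

/*
 * Submit nr_ios I/Os over a ring of reused entries, marking every 5th
 * I/O as high priority. Returns how many I/Os actually went out with a
 * non-zero priority. With clear_first == 0, a stale priority left in a
 * reused entry is submitted again.
 */
static int count_high_prio(int nr_ios, int clear_first)
{
	unsigned short ioprio[SK_RING_ENTRIES] = { 0 };
	int high = 0;

	for (int i = 0; i < nr_ios; i++) {
		unsigned short *p = &ioprio[i % SK_RING_ENTRIES];

		if (clear_first)
			*p = 0;
		if (i % 5 == 0)
			*p = 1;	/* stand-in for a high-priority value */
		if (*p)
			high++;
	}
	return high;
}
```

For 80 I/Os, the clearing variant issues 16 at high priority (the intended 20%), while the non-clearing variant issues 66, and every I/O after the first 35 goes out at high priority.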
Niklas Cassel [Thu, 26 Aug 2021 16:45:04 +0000 (16:45 +0000)]
io_uring: always initialize sqe->flags
Commit 7c70f506e438 ("engines/io_uring: move sqe clear out of hot path")
removed the memset of sqe from fio_ioring_prep().
Later, force_async was added in commit 5a59a81d2923 ("engines/io_uring:
allow setting of IOSQE_ASYNC").
The force_async commit sets sqe->flags every N requests, however,
since we no longer do a memset, this commit should have made sure that
flags is always initialized, such that we don't have sqe->flags set on
reused sqes where we didn't intend to.
Fixes: 5a59a81d2923 ("engines/io_uring: allow setting of IOSQE_ASYNC")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 26 Aug 2021 14:46:36 +0000 (08:46 -0600)]
Merge branch 'wip-cxx' of https://github.com/tchaikov/fio
* 'wip-cxx' of https://github.com/tchaikov/fio:
arch,lib/seqlock: implement seqlock with C++ atomic if compiled with C++
Kefu Chai [Wed, 11 Aug 2021 04:29:39 +0000 (12:29 +0800)]
arch,lib/seqlock: implement seqlock with C++ atomic if compiled with C++
Some functions declared by <stdatomic.h> share the same names as those
declared by <atomic>. For instance, `kill_dependency()` is defined as a
macro by <stdatomic.h>, while it is defined as a template function in
<atomic>.
This makes it impossible to compile an ioengine written in C++ if
its source file includes both <atomic> and <fio.h>, because the latter
includes <stdatomic.h> via arch/arch.h. The compile error would look like:
In file included from ../src/test/fio/fio_ceph_objectstore.cc:26:
In file included from src/fio/fio.h:18:
In file included from src/fio/thread_options.h:8:
In file included from src/fio/gettime.h:7:
src/fio/lib/seqlock.h:21:9: error: expected ')'
seq = atomic_load_acquire(&s->sequence);
^
src/fio/arch/../lib/../arch/arch.h:47:32: note: expanded from macro 'atomic_load_acquire'
atomic_load_explicit((_Atomic typeof(*(p)) *)(p), \
^
src/fio/lib/seqlock.h:21:9: note: to match this '('
To address this issue, instead of using the functions in <stdatomic.h> to
implement seqlock, use the primitives offered by the C++ standard library
if the source code is compiled using a C++ compiler.
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
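One possible shape for such a header is sketched below. It is not the patch itself: the sk_ names are hypothetical, and the sketch gives the sequence counter an atomic type in both languages, which is an assumption made to keep the example short.

```c
#ifdef __cplusplus
#include <atomic>
typedef std::atomic<unsigned int> sk_seq_t;
#define sk_load_acquire(p) ((p)->load(std::memory_order_acquire))
#else
#include <stdatomic.h>
typedef atomic_uint sk_seq_t;
#define sk_load_acquire(p) atomic_load_explicit((p), memory_order_acquire)
#endif

struct sk_seqlock {
	sk_seq_t sequence;
};

/* Spin until no writer is active (even sequence), then return it. */
static inline unsigned int sk_read_seqlock_begin(struct sk_seqlock *s)
{
	unsigned int seq;

	do {
		seq = sk_load_acquire(&s->sequence);
	} while (seq & 1);
	return seq;
}
```

Because only one of the two headers is ever included, a C++ translation unit never sees the <stdatomic.h> macros that clash with <atomic>.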
Jens Axboe [Sat, 21 Aug 2021 02:58:42 +0000 (20:58 -0600)]
Merge branch 'master' of https://github.com/DamonPalovaara/fio
* 'master' of https://github.com/DamonPalovaara/fio:
fixed type boot->bool
Damon Palovaara [Fri, 20 Aug 2021 22:22:58 +0000 (15:22 -0700)]
fixed type boot->bool
Jens Axboe [Wed, 18 Aug 2021 16:47:55 +0000 (10:47 -0600)]
Merge branch 'patch-1' of https://github.com/antroseco/fio
* 'patch-1' of https://github.com/antroseco/fio:
server: reopen standard streams to /dev/null
Andreas Economides [Wed, 18 Aug 2021 12:19:51 +0000 (13:19 +0100)]
server: reopen standard streams to /dev/null
For some custom ioengines (see https://github.com/spdk/spdk/issues/1118),
it's not trivial to suppress output to stdout and stderr, so they would
write to some other file descriptor fio had opened - which is bad.
This change ensures that fd's 0, 1, and 2 (stdin, stdout, stderr) are
always valid and can be used without any unintended consequences.
Signed-off-by: Andreas Economides <andreas.economides@nutanix.com>
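A common way to keep a descriptor valid while discarding its traffic is to duplicate /dev/null onto it. The helper below is a sketch of that technique with a made-up name; it is not necessarily how the server code implements the change.

```c
#include <fcntl.h>
#include <unistd.h>

/* Point 'fd' at /dev/null so it stays valid but discards output. */
static int redirect_to_devnull(int fd)
{
	int nullfd = open("/dev/null", O_RDWR);

	if (nullfd < 0)
		return -1;
	if (dup2(nullfd, fd) < 0) {
		close(nullfd);
		return -1;
	}
	if (nullfd != fd)
		close(nullfd);
	return 0;
}
```

A daemonizing process would call this for STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO, so that a later open() can never be handed descriptor 0, 1 or 2.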
Jens Axboe [Fri, 13 Aug 2021 16:01:31 +0000 (10:01 -0600)]
Merge branch 'dfs_update_13_api' of https://github.com/johannlombardi/fio
* 'dfs_update_13_api' of https://github.com/johannlombardi/fio:
engines/dfs: add support for 1.3 DAOS API
Jens Axboe [Wed, 11 Aug 2021 22:54:23 +0000 (16:54 -0600)]
t/io_uring: allow multiple IO threads
If you do:
t/io_uring -n2 /dev/dev1 /dev/dev2
then t/io_uring will create two IO threads, and each one will get
a file/device assigned. In the above example, thread 1 will run
on dev1, thread 2 on dev2.
Note that for now, you'll need at least as many files as threads.
Adding support for adding the same file set over the specified
threads (if we have less files than threads) is left as an
exercise for the reader. You know where to send the patches.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johann Lombardi [Tue, 10 Aug 2021 13:03:21 +0000 (15:03 +0200)]
engines/dfs: add support for 1.3 DAOS API
A few changes were done to the pool connect and container open API
in DAOS 1.3+. UUID string or label are now passed via the API
instead of uuid_t structures. Change the dfs engine accordingly.
Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>
Shin'ichiro Kawasaki [Fri, 6 Aug 2021 01:07:11 +0000 (10:07 +0900)]
t/zbd: Add test #58 to test zone reset by trim workload
To exercise zone reset by trim workload, add the test case #58. As the
precondition, it fills several zones. After that, a trim job and a write
job run in parallel for 30 seconds. The ratio of trim commands and write
commands is controlled by --flow option.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shin'ichiro Kawasaki [Fri, 6 Aug 2021 01:07:10 +0000 (10:07 +0900)]
HOWTO/man: Describe trim support by zone reset for zoned devices
Previous commits added trim support for zoned devices. Update HOWTO and
man page to describe it. Also add missing description about libzbc I/O
engine to HOWTO.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shin'ichiro Kawasaki [Fri, 6 Aug 2021 01:07:09 +0000 (10:07 +0900)]
engines/libzbc: Enable trim for libzbc I/O engine
The trim workload to zoned block devices is supported as zone reset, and
this feature is available for I/O engines which support both zoned
devices and trim workload. Libzbc I/O engine supports zoned devices but
lacks trim workload support. To enable trim support with libzbc I/O
engine, remove the check which inhibited trim from requests to libzbc
I/O engine. Also set file open flags for trim same as write, and call
zbd_do_io_u_trim() for trim I/O.
Of note is that libzbc I/O engine now can support trim to sequential
write required zones only. The trim I/Os to conventional zones are
reported as an error.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shin'ichiro Kawasaki [Fri, 6 Aug 2021 01:07:08 +0000 (10:07 +0900)]
zbd: Support zone reset by trim
Enable trim workload for zonemode=zbd by modifying do_io_u_trim() to
call zoned block device unique function zbd_do_io_u_trim() which resets
the target zone. This allows fio to emulate workloads which mix data
read/write and zone resets with zonemode=zbd.
To trigger a zone reset, the trim I/O must have its offset aligned to the
zone start and its block size equal to the zone size. Zone reset is applied
only to sequential write required zones and sequential write preferred zones.
Conventional zones are handled in same manner as regular block devices
by calling os_trim() function.
When zones are reset with random trim workload, choose only non-empty
zones as trim target. This avoids meaningless trim to empty zones and
makes the workload more realistic. To find the non-empty zones, utilize
zbd_find_zone() helper function which is already used for read workload,
specifying 1 byte as the minimum valid data size.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
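The alignment and size requirement stated above amounts to a simple predicate. The sketch below assumes, for simplicity, that all zones share one size; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Return 1 if a trim of 'len' bytes at 'offset' covers exactly one
 * zone, i.e. it starts on a zone boundary and spans the whole zone.
 */
static int trim_is_zone_reset(uint64_t offset, uint64_t len,
			      uint64_t zone_size)
{
	if (!zone_size)
		return 0;
	return (offset % zone_size) == 0 && len == zone_size;
}
```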
Shin'ichiro Kawasaki [Fri, 6 Aug 2021 01:07:07 +0000 (10:07 +0900)]
zbd: Add min_bytes argument to zbd_find_zone()
The helper function zbd_find_zone() finds a zone with at least
min_bs[DDIR_READ] bytes of readable data before the zone write pointer.
This patch generalizes this function to allow finding a non-empty zone.
To do so, add the min_bytes argument to specify the minimum readable
data of a zone to filter the search. Specifying 1 for min_bytes then
becomes equivalent to finding a non-empty zone.
This change will allow reusing this function to find a suitable zone
for trim I/O.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
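The effect of a min_bytes filter can be illustrated with a simplified search. The struct and function are hypothetical, and the forward-only linear scan is an assumption of this sketch; the search order of the real zbd_find_zone() is not described in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

struct sk_zone {
	uint64_t start;	/* first byte of the zone */
	uint64_t wp;	/* write pointer; data lives in [start, wp) */
};

/*
 * Return the first zone at or after index 'from' holding at least
 * min_bytes of data, or NULL if none does. min_bytes == 1 selects any
 * non-empty zone.
 */
static const struct sk_zone *find_zone_with_data(const struct sk_zone *zones,
						 size_t nr, size_t from,
						 uint64_t min_bytes)
{
	for (size_t i = from; i < nr; i++) {
		if (zones[i].wp - zones[i].start >= min_bytes)
			return &zones[i];
	}
	return NULL;
}
```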
Vincent Fu [Wed, 4 Aug 2021 18:29:05 +0000 (18:29 +0000)]
ioengines: fix crash with --enghelp option
Since f6931a1dd35896433c8cc2e10de51372a2c496c4, commands like the
following segfault:
fio --enghelp=sg
fio --enghelp=sg,sg_write_mode
This is because free_ioengine() assumes that td->io_ops is not NULL.
Make this true when free_ioengine() is called by
fio_show_ioengine_help() to avoid the crash.
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vincent Fu [Wed, 4 Aug 2021 18:29:04 +0000 (18:29 +0000)]
backend: clarify io scheduler setting error message
If you know *how* fio tries to confirm that the IO scheduler was
successfully set, then the error message "io scheduler not found" makes
sense. However, if you don't know what fio does to confirm the io
scheduler setting, then the error message is confusing. This patch
modifies the error message to indicate that the selected IO scheduler
could not be set.
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ankit Kumar [Wed, 4 Aug 2021 11:23:08 +0000 (16:53 +0530)]
zbd: Improve random zone index generation logic
Existing random zone index generation logic is dependent on the file size.
For smaller I/O sizes the random zone index always returns a particular
section of open zones. As this index is used to return one of the open zones,
it was observed that after one of the max_open_zones / job_max_open_zones limits
is reached, all further I/Os are redirected to only a few open zones until they
are full.
This patch modifies the random zone index generation logic so that it is uniform
across all the open zones.
It reverts part of the commit 6463db6c1 ('fio: fix interaction between
offset/size limited threads and "max_open_zones"')
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>