|
This patch introduces a new fio engine to work with xNVMe >= 0.2.0.
xNVMe provides a user space library (libxnvme) to work with NVMe
devices. The NVMe driver being used by libxnvme is re-targetable and
can be any one of: the GNU/Linux kernel NVMe driver (via libaio,
IOCTLs, or io_uring), the SPDK NVMe driver, or your own custom NVMe driver.
For more info visit https://xnvme.io
https://github.com/OpenMPDK/xNVMe
Co-Authored-By: Ankit Kumar <ankit.kumar@samsung.com>
Co-Authored-By: Simon A. F. Lund <simon.lund@samsung.com>
Co-Authored-By: Mads Ynddal <m.ynddal@samsung.com>
Co-Authored-By: Michael Bang <mi.bang@samsung.com>
Co-Authored-By: Karl Bonde Torp <k.torp@samsung.com>
Co-Authored-By: Gurmeet Singh <gur.singh@samsung.com>
Co-Authored-By: Pierre Labat <plabat@micron.com>
Link: https://lore.kernel.org/r/20220511163019.5608-2-ankit.kumar@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
fallthrough is reserved in C++, so this causes issues with C++
programs pulling in the fio.h -> compiler.h header.
Rename it to something fio specific instead.
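A minimal sketch of the idea, assuming a project-prefixed macro (the name sketch_fallthrough is hypothetical and is not taken from compiler.h). A prefixed name leaves the bare identifier "fallthrough" free for C++ code that includes the header:

```c
#include <assert.h>

/* Hypothetical prefixed name; the bare identifier "fallthrough" stays untouched. */
#if defined(__has_attribute)
#  if __has_attribute(__fallthrough__)
#    define sketch_fallthrough __attribute__((__fallthrough__))
#  endif
#endif
#ifndef sketch_fallthrough
#  define sketch_fallthrough do { } while (0) /* fallthrough */
#endif

int weight_for_op(int op)
{
	int weight = 0;

	switch (op) {
	case 2:
		weight += 10;
		sketch_fallthrough;	/* deliberate: case 2 also does case 1's work */
	case 1:
		weight += 1;
		break;
	default:
		weight = -1;
		break;
	}
	return weight;
}

void fallthrough_selfcheck(void)
{
	assert(weight_for_op(2) == 11);
	assert(weight_for_op(1) == 1);
}
```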
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fixes: cef0a8357b3f ("engines/null: update external engine compilation")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Everything needs to include config-host.h, and make sure that the C++
side uses the right type for the queue op.
Fixes: https://github.com/axboe/fio/issues/1371
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Windows wants the file opened for write if we do a file sync, so
ensure we do that if we have syncs.
Fixes: https://github.com/axboe/fio/issues/1352
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix three occurrences of the following clang compiler warning:
warning: suggest braces around initialization of subobject [-Wmissing-braces]
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
|
|
The only real hot system call here is the io_uring_enter(2) call,
as that'll happen during the IO submission/completion parts. The rest
are just setup function calls; we don't really care about those.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Signed-off-by: Ville Skyttä <ville.skytta@upcloud.com>
|
|
The API of librpma has been changed between v0.10.0 and v0.12.0
and fio has to be updated.
Signed-off-by: Oksana Salyk <oksana.salyk@intel.com>
|
|
Convert the stat code to report clat stats on a per priority granularity,
rather than simply supporting high/low priority.
This is made possible by using the new clat_prio_stat array (per ddir),
together with the clat_prio_stat index which is saved in each io_u.
The per priority samples are only printed when there are samples for more
than one priority in the clat_prio_stat array. If there are only samples
for one priority, that means that all I/Os were submitted using the same
priority, so no need to print.
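The printing rule above can be sketched as follows. The struct and function names are illustrative assumptions, not the actual stat.c definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for one element of a per-ddir clat_prio_stat array. */
struct prio_stat_sketch {
	unsigned short ioprio;
	unsigned long nr_samples;
};

/* Returns 1 only when more than one priority actually collected samples. */
int should_print_per_prio(const struct prio_stat_sketch *stats, size_t nr)
{
	size_t i, with_samples = 0;

	assert(stats || nr == 0);
	for (i = 0; i < nr; i++)
		if (stats[i].nr_samples > 0)
			with_samples++;
	return with_samples > 1;
}

void per_prio_print_selfcheck(void)
{
	struct prio_stat_sketch single[] = { { 0x4007, 100 }, { 0x6000, 0 } };
	struct prio_stat_sketch several[] = { { 0x4007, 70 }, { 0x6000, 20 }, { 0x2004, 10 } };

	assert(!should_print_per_prio(single, 2));
	assert(should_print_per_prio(several, 3));
}
```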
For example, running the following fio command:
fio --name=test --filename=/dev/sdc --direct=1 --runtime=60 --rw=randread \
--ioengine=io_uring --ioscheduler=mq-deadline --iodepth=32 --bs=32k \
--prioclass=2 --prio=7 --cmdprio_bssplit=32k/20/3/0:32k/10/1/4
Now results in the following output:
test: (groupid=0, jobs=1): err= 0: pid=465655: Tue Feb 1 02:24:47 2022
read: IOPS=146, BW=4695KiB/s (4808kB/s)(276MiB/60239msec)
slat (usec): min=18, max=335, avg=62.87, stdev=22.59
clat (msec): min=2, max=2135, avg=217.97, stdev=287.26
lat (msec): min=2, max=2135, avg=218.03, stdev=287.26
clat prio 2/7 (msec): min=3, max=606, avg=106.57, stdev=86.64
clat prio 3/0 (msec): min=10, max=2135, avg=664.94, stdev=339.42
clat prio 1/4 (msec): min=2, max=300, avg=52.29, stdev=42.52
clat percentiles (msec):
| 1.00th=[ 8], 5.00th=[ 14], 10.00th=[ 19], 20.00th=[ 33],
| 30.00th=[ 52], 40.00th=[ 77], 50.00th=[ 108], 60.00th=[ 144],
| 70.00th=[ 192], 80.00th=[ 300], 90.00th=[ 684], 95.00th=[ 911],
| 99.00th=[ 1234], 99.50th=[ 1318], 99.90th=[ 1687], 99.95th=[ 1770],
| 99.99th=[ 2140]
clat prio 2/7 (69.25% of IOs) percentiles (msec):
| 1.00th=[ 7], 5.00th=[ 13], 10.00th=[ 17], 20.00th=[ 28],
| 30.00th=[ 44], 40.00th=[ 64], 50.00th=[ 85], 60.00th=[ 111],
| 70.00th=[ 140], 80.00th=[ 174], 90.00th=[ 226], 95.00th=[ 279],
| 99.00th=[ 368], 99.50th=[ 418], 99.90th=[ 502], 99.95th=[ 567],
| 99.99th=[ 609]
clat prio 3/0 (20.91% of IOs) percentiles (msec):
| 1.00th=[ 44], 5.00th=[ 138], 10.00th=[ 205], 20.00th=[ 347],
| 30.00th=[ 464], 40.00th=[ 558], 50.00th=[ 659], 60.00th=[ 760],
| 70.00th=[ 860], 80.00th=[ 961], 90.00th=[ 1099], 95.00th=[ 1217],
| 99.00th=[ 1485], 99.50th=[ 1687], 99.90th=[ 1871], 99.95th=[ 2140],
| 99.99th=[ 2140]
clat prio 1/4 (9.84% of IOs) percentiles (msec):
| 1.00th=[ 7], 5.00th=[ 10], 10.00th=[ 13], 20.00th=[ 18],
| 30.00th=[ 24], 40.00th=[ 30], 50.00th=[ 39], 60.00th=[ 51],
| 70.00th=[ 63], 80.00th=[ 84], 90.00th=[ 114], 95.00th=[ 136],
| 99.00th=[ 188], 99.50th=[ 197], 99.90th=[ 300], 99.95th=[ 300],
| 99.99th=[ 300]
bw ( KiB/s): min= 3456, max= 5888, per=100.00%, avg=4697.60, stdev=472.38, samples=120
iops : min= 108, max= 184, avg=146.80, stdev=14.76, samples=120
lat (msec) : 4=0.11%, 10=2.57%, 20=8.67%, 50=18.21%, 100=18.34%
lat (msec) : 250=28.87%, 500=9.41%, 750=5.22%, 1000=5.09%, 2000=3.50%
lat (msec) : >=2000=0.01%
cpu : usr=0.16%, sys=0.97%, ctx=17715, majf=0, minf=262
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=99.6%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=8839,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20220203192814.18552-15-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add support for a new cmdprio_bssplit format, while keeping support for the
old format, by migrating to the split_parse_prio_ddir() parsing function.
In this new format, a priority class and priority level are defined inside
each entry itself. In comparison with the old format, the new format does
not restrict all entries to share the same priority class and priority
level.
Therefore, this new format is very useful if you need to submit I/Os with
multiple IO priority class + IO priority level combinations, e.g. when
testing or verifying an IO scheduler.
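As a rough illustration of the new format, the sketch below parses a string such as 32k/20/3/0:32k/10/1/4 into blocksize/percentage/class/level entries. It is not split_parse_prio_ddir(); the names are hypothetical, only a single data direction is handled, and a "k" suffix is assumed to mean 1024 bytes:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct prio_split_sketch {
	unsigned long long bs;
	unsigned int perc;
	unsigned int prio_class;
	unsigned int prio_level;
};

/* Parse one "bs/percentage/class/level" entry. */
static int parse_one_entry(const char *s, struct prio_split_sketch *e)
{
	char *end;
	unsigned long long bs = strtoull(s, &end, 10);

	if (end == s)
		return -1;
	if (*end == 'k' || *end == 'K') {	/* assumption: k = 1024 */
		bs *= 1024ULL;
		end++;
	}
	if (sscanf(end, "/%u/%u/%u", &e->perc, &e->prio_class, &e->prio_level) != 3)
		return -1;
	e->bs = bs;
	return 0;
}

/* Split on ':' and parse each entry; returns the entry count or -1. */
int parse_prio_split(const char *str, struct prio_split_sketch *out, int max)
{
	char buf[256];
	char *tok;
	int n = 0;

	if (strlen(str) >= sizeof(buf))
		return -1;
	strcpy(buf, str);
	for (tok = strtok(buf, ":"); tok; tok = strtok(NULL, ":")) {
		if (n == max || parse_one_entry(tok, &out[n]) < 0)
			return -1;
		n++;
	}
	return n;
}

void prio_split_selfcheck(void)
{
	struct prio_split_sketch e[4];

	assert(parse_prio_split("32k/20/3/0:32k/10/1/4", e, 4) == 2);
	assert(e[0].bs == 32768 && e[0].perc == 20 && e[0].prio_class == 3);
	assert(e[1].perc == 10 && e[1].prio_class == 1 && e[1].prio_level == 4);
}
```

Each entry carries its own class and level, which is what allows several priority combinations within one job.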
cmdprio will allocate a clat_prio_stat array that holds all unique
priorities (including the default priority). Finally, it will set the
clat_prio pointer in the struct thread_stat (td->ts.clat_prio) to the
newly allocated array.
We also add a clat_prio_stat index to io_u.h, that will inform which array
element (which priority value) this specific I/O was submitted with.
The clat_prio_stat index will be used by the stat.c code, to avoid a costly
search operation to find the correct array element to use, for each and
every add_sample().
Note that while this patch will send down the correct I/O pattern to the
drive (potentially using multiple different priorities), it will not
display the cmdprio_{bssplit,percentage} stats correctly until a later
commit in the series (which changes stat.c to report clat stats on a per
priority granularity). This was done to ease reviewing.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20220203192814.18552-9-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
* 'master' of https://github.com/blah325/fio:
Added a new windows only IO engine option “no_completion_thread”.
Add Windows support for --server.
Avoid client calls to recv() without prior poll()
|
|
Without this option, Windows FIO creates a
completion polling thread for each worker thread. This also
requires an event queue for the completion thread to forward
completions to the worker thread. Polling directly improves
performance and better matches the linuxaio engine model.
Signed-off-by: james rizzo <james.rizzo@broadcom.com>
|
|
File System DAX is handled in a different way than Device DAX:
1) In case of File System DAX, each thread uses a separate file
from this file system and no offset is needed. In case of Device DAX,
each thread uses a separate offset within the same Device DAX.
2) File System DAX requires rpma_mr_advise(3)(ibv_advise_mr(3))
to be called for the registered memory to avoid page faults
and degraded performance.
Ref: https://github.com/axboe/fio/issues/1238
Signed-off-by: Wang, Long <long1.wang@intel.com>
|
|
If --stream_id=0 then fio will open a stream for WRITE STREAM(16) commands and
close the stream when the device file is closed.
Example:
./fio --name=test --filename=/dev/sdb --ioengine=sg --number_ios=1 --debug=file,io --sg_write_mode=write_stream --rw=randwrite
fio: set debug option file
fio: set debug option io
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=sg, iodepth=1
fio-3.27
Starting 1 process
file 1072297 setup files
file 1072297 get file size for 0x7f0306fa5110/0//dev/sdb
file 1072307 trying file /dev/sdb 290
file 1072307 fd open /dev/sdb
file 1072307 file not found in hash /dev/sdb
file 1072307 sgio_stream_control: opened stream 1
file 1072307 get file /dev/sdb, ref=0
io 1072307 drop page cache /dev/sdb
file 1072307 goodf=1, badf=2, ff=2b1
file 1072307 get_next_file_rr: 0x7f0306fa5110
file 1072307 get_next_file: 0x7f0306fa5110 [/dev/sdb]
file 1072307 get file /dev/sdb, ref=1
io 1072307 fill: io_u 0xb55700: off=0x35ef554000,len=0x1000,ddir=1,file=/dev/sdb
io 1072307 prep: io_u 0xb55700: off=0x35ef554000,len=0x1000,ddir=1,file=/dev/sdb
io 1072307 prep: io_u 0xb55700: ret=0
io 1072307 queue: io_u 0xb55700: off=0x35ef554000,len=0x1000,ddir=1,file=/dev/sdb
io 1072307 complete: io_u 0xb55700: off=0x35ef554000,len=0x1000,ddir=1,file=/dev/sdb
file 1072307 put file /dev/sdb, ref=2
file 1072307 close files
file 1072307 put file /dev/sdb, ref=1
file 1072307 sgio_stream_control: closed stream 1
file 1072307 fd close /dev/sdb
io 1072307 close ioengine sg
io 1072307 free ioengine sg
test: (groupid=0, jobs=1): err= 0: pid=1072307: Mon Aug 16 14:25:45 2021
write: IOPS=200, BW=800KiB/s (819kB/s)(4096B/5msec); 0 zone resets
clat (nsec): min=93339, max=93339, avg=93339.00, stdev= 0.00
lat (nsec): min=96201, max=96201, avg=96201.00, stdev= 0.00
clat percentiles (nsec):
| 1.00th=[93696], 5.00th=[93696], 10.00th=[93696], 20.00th=[93696],
| 30.00th=[93696], 40.00th=[93696], 50.00th=[93696], 60.00th=[93696],
| 70.00th=[93696], 80.00th=[93696], 90.00th=[93696], 95.00th=[93696],
| 99.00th=[93696], 99.50th=[93696], 99.90th=[93696], 99.95th=[93696],
| 99.99th=[93696]
lat (usec) : 100=100.00%
cpu : usr=100.00%, sys=0.00%, ctx=2, majf=0, minf=20
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,1,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=800KiB/s (819kB/s), 800KiB/s-800KiB/s (819kB/s-819kB/s), io=4096B (4096B), run=5-5msec
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20211115200807.117138-6-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add the "write_stream" option to sg_write_mode to send WRITE STREAM(16)
commands. Use the new stream_id option to set the stream identifier.
Example:
sg_stream_ctl -o /dev/sdb
Assigned stream id: 1
./fio --name=test --filename=/dev/sdb --ioengine=sg --sg_write_mode=write_stream --stream_id=1 --rw=randwrite --time_based --runtime=10s
...
sg_stream_ctl -c --id=1 /dev/sdb
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20211115200807.117138-5-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is a name collision for the sg_write_mode options for the WRITE AND
VERIFY and VERIFY commands. Deprecate the 'verify' option and use
'write_and_verify' instead. Do the same thing for 'same' and 'write_same' to
have a consistent naming scheme. The original option names are still supported
for backward compatibility but are listed as deprecated.
Here are the new sg_write_mode options:
Option               SCSI command
write                WRITE (default)
write_and_verify     WRITE AND VERIFY
verify (deprecated)  WRITE AND VERIFY
write_same           WRITE SAME
same (deprecated)    WRITE SAME
write_same_ndob      WRITE SAME with NDOB flag set
verify_bytchk_00     VERIFY with BYTCHK set to 00
verify_bytchk_01     VERIFY with BYTCHK set to 01
verify_bytchk_11     VERIFY with BYTCHK set to 11
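One way to picture the alias handling is a lookup table in which the deprecated names resolve to the same command as their replacements. This is a sketch built only from the table above; the struct and function names are hypothetical and do not reflect how the sg engine stores its options:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct write_mode_sketch {
	const char *name;
	const char *scsi_cmd;
	int deprecated;
};

static const struct write_mode_sketch write_modes[] = {
	{ "write",            "WRITE",                         0 },
	{ "write_and_verify", "WRITE AND VERIFY",              0 },
	{ "verify",           "WRITE AND VERIFY",              1 },
	{ "write_same",       "WRITE SAME",                    0 },
	{ "same",             "WRITE SAME",                    1 },
	{ "write_same_ndob",  "WRITE SAME with NDOB flag set", 0 },
	{ "verify_bytchk_00", "VERIFY with BYTCHK set to 00",  0 },
	{ "verify_bytchk_01", "VERIFY with BYTCHK set to 01",  0 },
	{ "verify_bytchk_11", "VERIFY with BYTCHK set to 11",  0 },
};

/* Returns the table entry for an option name, or NULL if it is unknown. */
const struct write_mode_sketch *lookup_write_mode(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(write_modes) / sizeof(write_modes[0]); i++)
		if (!strcmp(write_modes[i].name, name))
			return &write_modes[i];
	return NULL;
}

void write_mode_selfcheck(void)
{
	/* The deprecated alias and its replacement issue the same command. */
	assert(!strcmp(lookup_write_mode("verify")->scsi_cmd,
		       lookup_write_mode("write_and_verify")->scsi_cmd));
	assert(lookup_write_mode("verify")->deprecated);
	assert(!lookup_write_mode("write_and_verify")->deprecated);
}
```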
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20211115200807.117138-4-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add the sg_write_mode option write_same_ndob to issue WRITE SAME(16) commands
with the no data output buffer flag set. This flag is not supported for WRITE
SAME(10). So all commands with this option will be WRITE SAME(16).
Also include an example job file.
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20211115200807.117138-3-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
fio does not have an explicit verify data direction and creating a new data
direction just for SCSI VERIFY commands probably is not worthwhile. The format
of SCSI VERIFY commands matches that of write operations since VERIFY commands
can include data transfer to the device. So it seems reasonable to have VERIFY
commands be accounted for as write operations by fio.
Use the sg_write_mode option to support SCSI VERIFY commands with different
BYTCHK values.
BYTCHK  Description
00      No data is transferred to the device; device data is checked
01      Device data is compared with data transferred to device
11      Same as 01 except that only one sector of data is transferred to the
        device and each sector specified in the verification extent is
        compared against this transferred data.
Also update documentation and add a couple of example job files.
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20211115200807.117138-2-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
For older kernels without IORING_SETUP_CQSIZE, we'll get EINVAL if we
set it. Just retry the ring setup if that happens.
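The retry logic might look roughly like the sketch below. To keep it buildable and testable without a kernel, the real ring setup call is replaced by a function pointer and an "old kernel" is faked; all names carrying a sketch or fake marker are assumptions, and only the flag bit mirrors the kernel's IORING_SETUP_CQSIZE:

```c
#include <assert.h>
#include <errno.h>

/* Same bit as the kernel's IORING_SETUP_CQSIZE; redefined so the sketch builds anywhere. */
#define SKETCH_SETUP_CQSIZE (1U << 3)

struct ring_params_sketch {
	unsigned int flags;
	unsigned int cq_entries;
};

typedef int (*ring_setup_fn)(unsigned int depth, struct ring_params_sketch *p);

/* Ask for a CQ ring clamped to the queue depth; on EINVAL retry with kernel defaults. */
int setup_ring_with_fallback(ring_setup_fn setup, unsigned int depth,
			     struct ring_params_sketch *p)
{
	int ret;

	p->flags |= SKETCH_SETUP_CQSIZE;
	p->cq_entries = depth;
	ret = setup(depth, p);
	if (ret < 0 && errno == EINVAL) {
		p->flags &= ~SKETCH_SETUP_CQSIZE;
		p->cq_entries = 0;
		ret = setup(depth, p);
	}
	return ret;
}

/* Stand-in for a kernel that predates the CQ size flag. */
int fake_old_kernel_setup(unsigned int depth, struct ring_params_sketch *p)
{
	(void)depth;
	if (p->flags & SKETCH_SETUP_CQSIZE) {
		errno = EINVAL;
		return -1;
	}
	return 3;	/* pretend ring file descriptor */
}

void cqsize_retry_selfcheck(void)
{
	struct ring_params_sketch p = { 0, 0 };

	assert(setup_ring_with_fallback(fake_old_kernel_setup, 32, &p) == 3);
	assert(!(p.flags & SKETCH_SETUP_CQSIZE));
}
```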
Link: https://github.com/axboe/fio/issues/1324
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
libzbc includes 3 different internal backend drivers:
1) The block backend: this backend relies on the kernel SMR support and
uses regular system calls.
2) The SCSI backend: this is a SG passthrough driver for SAS drives and
for SATA drives accessible through an SMR compliant SAT (SCSI-to-ATA
translation layer).
3) The ATA backend: this is a SG passthrough driver for SATA drives not
handled by the system SAT (either kernel or HBA SAT)
libzbc automatically selects the internal backend driver, using the
first one that is detected as functional (tested in the same order shown
above).
When running on an SMR enabled system (SMR compliant HBA and kernel with
zoned block device support enabled), any fio job using the libzbc IO
engine will thus end up using the regular kernel IO path. This is silly:
for this IO path, the libaio or psync IO engines are far better (less
overhead and more functionalities). The libzbc IO engine should be
restricted to be a passthrough engine only, similarly to the sg engine.
Fix the libzbc engine to not allow the use of libzbc block backend
driver by removing the ZBC_O_DRV_BLOCK flag when opening the device.
Also adjust the test script t/zbd/run-tests-against-nullb to remove the
-l option to force the use of the libzbc IO engine as it will not work
anymore (since the nullb device is neither a SCSI nor an ATA device).
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211210012041.310670-1-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
By default, io_uring uses twice as big a CQ ring as the SQ ring. That's
to help with cases where completions can come in unexpectedly. This is not
the case for storage IO, so just clamp the CQ size to save a bit of memory
on the CQEs and CQ ring.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The way that fio currently handles engine options:
options_free() will call free() only for options that have the type
FIO_OPT_STR_STORE. This means that any option that has a pointer in
either td->o or td->eo, which is not of type FIO_OPT_STR_STORE will
leak memory. This is true even for numjobs == 1.
When running with numjobs > 1, fio_options_mem_dupe() will memcpy
td->eo into the new td. Since off1 of the pointers in the first td
has already been set, the pointers in the new td will point to the
same data. (Regardless, options_free() will never try to free the
memory for either td.) Neither can we manually free the memory in
cleanup(), since the other td will still point to the same memory,
so this would lead to a double free.
These memory leaks are reported by e.g. valgrind.
The most obvious way to solve this is to put dynamically allocated
memory in {ioring,libaio}_data instead of {ioring,libaio}_options.
This solves the problem since {ioring,libaio}_data is dynamically
allocated by each td during the ioengine init callback, and is freed
when the ioengine cleanup callback for that td is called.
The downside of this is that the parsing has to be done in
fio_cmdprio_init() instead of in the option .cb callback, since the
.cb callback is called before {ioring,libaio}_data is available.
This patch keeps the static cmdprio options in
{ioring,libaio}_options, but moves the dynamically allocated memory
needed by cmdprio to {ioring,libaio}_data.
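The ownership split can be illustrated with a small sketch: the options struct holds only plain values and is safe to copy, while each thread allocates and frees its own dynamic data in init/cleanup. The struct and function names are hypothetical and are not the real {ioring,libaio}_options or {ioring,libaio}_data layouts:

```c
#include <assert.h>
#include <stdlib.h>

/* Static options: plain values only, so a byte-wise copy shares no pointers. */
struct engine_options_sketch {
	unsigned int percentage;
};

/* Per-thread data: owns its dynamically allocated memory. */
struct engine_data_sketch {
	unsigned int *table;
	size_t nr;
};

struct engine_data_sketch *engine_init_sketch(const struct engine_options_sketch *o,
					      size_t nr)
{
	struct engine_data_sketch *d = calloc(1, sizeof(*d));
	size_t i;

	if (!d)
		return NULL;
	d->table = calloc(nr, sizeof(*d->table));
	if (!d->table) {
		free(d);
		return NULL;
	}
	for (i = 0; i < nr; i++)
		d->table[i] = o->percentage;
	d->nr = nr;
	return d;
}

void engine_cleanup_sketch(struct engine_data_sketch *d)
{
	if (!d)
		return;
	free(d->table);
	free(d);
}

void per_thread_data_selfcheck(void)
{
	struct engine_options_sketch first = { 20 };
	struct engine_options_sketch second = first;	/* plain copy of the options */
	struct engine_data_sketch *a = engine_init_sketch(&first, 4);
	struct engine_data_sketch *b = engine_init_sketch(&second, 4);

	/* Each thread has its own table, so both cleanups are safe. */
	assert(a && b && a->table != b->table);
	engine_cleanup_sketch(a);
	engine_cleanup_sketch(b);
}
```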
No cmdprio related memory leaks are reported after this patch.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-9-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a new field "mode", in order to know if we are determining IO
priorities according to cmdprio_percentage or to cmdprio_bssplit.
This makes the logic easier to reason about, and allows us to
remove the "use_cmdprio" variable from the ioengines themselves.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-8-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move common cmdprio_prep() code to cmdprio.c to avoid code duplication.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-7-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The default priority (which is either 0 or the value set by the "prio" and
"prioclass" options) will now be used regardless of whether prio_prep() is
called or not. This is true for both libaio and io_uring.
The way to think about it is that prio_prep() is only called if
cmdprio_percentage/cmdprio_bssplit is used.
prio_prep() might then override the default priority, if the random
value happens to say that this I/O should use the cmdprio_value,
rather than the default priority.
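The override rule can be written as a tiny pure function. This is a sketch with hypothetical names; the random draw is passed in as a parameter so the behaviour is deterministic:

```c
#include <assert.h>

/*
 * roll: a value in [0, 99] standing in for the engine's random draw.
 * Returns the cmdprio value for the configured percentage of draws,
 * otherwise the job's default priority.
 */
unsigned short pick_ioprio_sketch(unsigned int roll, unsigned int cmdprio_percentage,
				  unsigned short default_ioprio,
				  unsigned short cmdprio_ioprio)
{
	assert(roll < 100 && cmdprio_percentage <= 100);
	if (cmdprio_percentage && roll < cmdprio_percentage)
		return cmdprio_ioprio;
	return default_ioprio;
}

void pick_ioprio_selfcheck(void)
{
	assert(pick_ioprio_sketch(5, 20, 0x4007, 0x2000) == 0x2000);	/* overridden */
	assert(pick_ioprio_sketch(50, 20, 0x4007, 0x2000) == 0x4007);	/* default kept */
}
```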
Rename the prio_prep() functions to highlight that these functions
are now only called if cmdprio is used. (If only option
"prio"/"prioclass" is used, that is handled elsewhere.)
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-6-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The default priority (which is either 0 or the value set by "prio" and
"prioclass" options) is now saved in td->ioprio.
The simplest thing is therefore to unconditionally set the async IO
priority to td->ioprio in fio_ioring_prep(), and let fio_ioring_prio_prep()
only handle the case where cmdprio_percentage/cmdprio_bssplit is enabled.
Therefore, fio_ioring_prio_prep() doesn't need to care if prio/prioclass
was enabled or not, we can simply think that fio_ioring_prio_prep()
might "override" the default priority, whatever the default priority may
be.
Doing it this way also has the advantage that the prio_prep() function
in io_uring will now look identical to the prio_prep() function in
libaio.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-5-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
All cmdprio options only support data directions read and write.
However, each cmdprio option allocates memory for ddir trim as well,
even though nothing is ever written to this memory.
Change this so that we don't allocate memory for something which is
never used.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-4-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move cmdprio function definitions from the cmdprio.h header file to a new
cmdprio.c file, such that we can add new static functions to cmdprio.c.
A follow up patch will add new cmdprio functions which do not need to be
directly accessible by ioengines.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-3-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
To avoid the warning from clang "warning: unannotated fall-through
between switch labels [-Wimplicit-fallthrough]" swap the "fall through"
comment with the "fallthrough;" annotation from compiler.h.
Since the second "fall through" comment isn't really a new fall-through,
remove it.
Signed-off-by: Rebecca Cran <rebecca@bsdio.com>
Link: https://lore.kernel.org/r/20211016061738.76654-1-rebecca@bsdio.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
ret is set to -1 but the break statement will not use this value.
So let's remove this useless assignment which could be confusing.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
|
|
The current code returned 1 if generic_close_file() failed.
The ret value already holds the real error, so return that instead, as
per the generic_open_file() error handling.
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
|
|
Introduce the log_prio option to expand priority logging from just a
single bit information (priority high vs low) to the full value of the
priority value used to execute IOs. When this option is set, the
priority value is printed as a 16-bit hexadecimal value combining
the I/O priority class and priority level as defined by the
ioprio_value() helper.
Similarly to the log_offset option, this option does not result in
actual I/O priority logging when log_avg_msec is set.
This patch also fixes a problem with the IO_U_F_PRIORITY flag, namely
that this flag is used to indicate that the IO is being executed with a
high priority on the device while at the same time indicating how to
account for the IO completion latency (high_prio clat vs low_prio clat).
With the introduction of the cmdprio_class and cmdprio options, these
assumptions are not necessarily compatible anymore.
These problems are addressed as follows:
* The priority_bit field of struct iosample is replaced with the
16-bit priority field representing the full io_u->ioprio value. When
log_prio is set, the priority field value is logged as is. When
log_prio is not set, 1 is logged as the entry's priority field if the
sample priority class is IOPRIO_CLASS_RT, and 0 otherwise.
* IO_U_F_PRIORITY is renamed to IO_U_F_HIGH_PRIO to indicate that a job
IO has the highest priority within the job context and so must be
accounted as such using high_prio clat.
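The logging rule in the first bullet can be sketched as below, assuming the Linux layout in which the class sits above a 13-bit level and the real-time class is 1. The macro and function names are hypothetical:

```c
#include <assert.h>

#define SKETCH_IOPRIO_CLASS_SHIFT	13
#define SKETCH_IOPRIO_CLASS_RT		1

/*
 * With log_prio set, the full 16-bit priority is logged as is.
 * Without it, the entry is 1 for the RT class and 0 otherwise.
 */
unsigned int logged_priority_sketch(unsigned short ioprio, int log_prio)
{
	if (log_prio)
		return ioprio;
	return (ioprio >> SKETCH_IOPRIO_CLASS_SHIFT) == SKETCH_IOPRIO_CLASS_RT;
}

void logged_priority_selfcheck(void)
{
	unsigned short rt_4 = (1u << 13) | 4;	/* class 1 (RT), level 4 */
	unsigned short be_7 = (2u << 13) | 7;	/* class 2, level 7 */

	assert(logged_priority_sketch(rt_4, 0) == 1);
	assert(logged_priority_sketch(be_7, 0) == 0);
	assert(logged_priority_sketch(be_7, 1) == 0x4007);
}
```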
While fio final statistics only show accounting of high vs low IO
completion latency statistics, the log_prio option allows a user to
perform more detailed statistical analysis of a workload using
multiple different IO priorities.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In fio, a job IO priority is controlled with the prioclass and prio
options and these options cannot be used together with the
cmdprio_percentage option.
Allow a user to have async IO priorities default to the job defined
IO priority by removing the mutual exclusion between the options
cmdprio_percentage and prioclass/prio.
With the introduction of the cmdprio_class option, an async IO priority
may be lower than the job default priority, resulting in reversed clat
statistics shown for high and low priority IOs when fio completes.
Solve this by setting an io_u IO_U_F_PRIORITY flag depending on a
comparison between the async IO priority and job default IO priority.
When an async IO is issued without a priority set, Linux kernel will
execute it using the IO priority of the issuing context set with
ioprio_set(). This works fine for libaio, where the context will be
the same as the context that submitted the IO.
However, io_uring can be used with a kernel thread that performs
block device IO submissions (sqthread_poll). Therefore, for io_uring,
an IO sqe ioprio field must be set to the job default priority unless
the IO priority is set according to the job cmdprio_percentage value.
Because of this, io_uring already set sqe->ioprio even when only
prio/prioclass was used. See commit b7ed2a862dda ("io_uring: set sqe
iopriority, if prio/prioclass is set"). In order to make the code easier
to maintain, handle all I/O priority preparations in the same function.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The cmdprio_percentage, cmdprio_class and cmdprio options allow
specifying different values for read and write operations. This enables
various IO priority issuing patterns even under a mixed read-write
workload but does not allow differentiation within read and write
I/O operation types with different sizes when the bssplit option is
used.
Introduce the cmdprio_bssplit option to complement the use of the
bssplit option. This new option has the same format as the bssplit
option, but the percentage values indicate the percentage of I/O
operations with a particular block size that must be issued with the
priority class and value specified by cmdprio_class and cmdprio.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When the cmdprio_percentage option is used, the specified percentage of
IO will be issued with the highest priority class IOPRIO_CLASS_RT. This
priority class maps to the ATA NCQ "high" priority level and allows
exercising a SATA device to measure its command latency characteristics
in the presence of low and high priority commands.
Besides ATA NCQ commands, Linux block IO schedulers also support IO
priorities and will behave differently in the presence of IOs with
different IO priority classes and values. However, cmdprio_percentage
does not allow specifying all possible priority classes and values.
To solve this, introduce libaio and io_uring engine specific options
cmdprio_class and cmdprio. These new options are the equivalent
of the prioclass and prio options and allow specifying the priority
class and priority value to use for asynchronous I/Os when the
cmdprio_percentage option is used. If not specified, the I/O priority
class defaults to IOPRIO_CLASS_RT and the I/O priority value to 0,
as before. Similarly to the cmdprio_percentage option, these options
can specify different values for read and write I/Os using a comma
separated list.
The manpage, HOWTO and fiograph configuration file are updated to
document these new options.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The cmdprio_percentage option of the libaio and io_uring engines defines
a single percentage that applies to all IO operations, regardless of
their direction. This prevents defining different high priority IO
percentages for read and write operations. This differentiation can
however be useful in the case of a mixed read-write workload (rwmixread
and rwmixwrite options).
Change the option definition to allow specifying a comma separated list
of percentages, 2 at most, one for reads and one for writes. If only a
single percentage is defined, it applies to both reads and writes as
before. The cmdprio_percentage option becomes an array of DDIR_RWDIR_CNT
elements indexed with enum fio_ddir values. The last entry of the array
(for DDIR_TRIM) is always 0.
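A sketch of the list semantics described above: one value applies to both reads and writes, two values are taken as read then write, and the trim slot stays 0. The function name and the SKETCH_ constants are assumptions that mirror the DDIR ordering in the text; this is not fio's option parser:

```c
#include <assert.h>
#include <stdio.h>

enum { SKETCH_DDIR_READ = 0, SKETCH_DDIR_WRITE = 1, SKETCH_DDIR_TRIM = 2,
       SKETCH_DDIR_RWDIR_CNT = 3 };

/* Parse "R" or "R,W" into a per-direction array; returns 0 on success, -1 on error. */
int parse_cmdprio_percentage_sketch(const char *s,
				    unsigned int out[SKETCH_DDIR_RWDIR_CNT])
{
	unsigned int rd, wr;
	char extra;
	int n = sscanf(s, "%u,%u%c", &rd, &wr, &extra);

	if (n == 1)
		wr = rd;		/* single value: applies to reads and writes */
	else if (n != 2)
		return -1;		/* nothing parsed, or more than two values */
	if (rd > 100 || wr > 100)
		return -1;
	out[SKETCH_DDIR_READ] = rd;
	out[SKETCH_DDIR_WRITE] = wr;
	out[SKETCH_DDIR_TRIM] = 0;	/* trim never uses cmdprio */
	return 0;
}

void cmdprio_percentage_selfcheck(void)
{
	unsigned int p[SKETCH_DDIR_RWDIR_CNT];

	assert(parse_cmdprio_percentage_sketch("20", p) == 0);
	assert(p[SKETCH_DDIR_READ] == 20 && p[SKETCH_DDIR_WRITE] == 20);
	assert(parse_cmdprio_percentage_sketch("20,30", p) == 0);
	assert(p[SKETCH_DDIR_READ] == 20 && p[SKETCH_DDIR_WRITE] == 30);
	assert(p[SKETCH_DDIR_TRIM] == 0);
}
```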
Also create a new cmdprio helper file, engines/cmdprio.h,
such that we can avoid code duplication between io_uring and libaio
io engines. This helper file will be extended in subsequent patches.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Introduce the ioprio_value() helper function to calculate a priority
value based on a priority class and priority level. For Linux and
Android, this is defined as an integer equal to the priority class
shifted left by 13 bits and or-ed with the priority level. For
Dragonfly, ioprio_value() simply returns the priority level as there
is no concept of priority class.
Use this new helper in the io_uring and libaio engines to set IO
priority when the cmdprio_percentage option is used.
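For the Linux and Android definition described above, the arithmetic amounts to the following sketch. The function names are hypothetical (the real helper is ioprio_value()), and the two unpacking helpers are added only for illustration:

```c
#include <assert.h>

#define SKETCH_PRIO_CLASS_SHIFT 13

/* Priority class shifted left by 13 bits, or-ed with the priority level. */
unsigned short ioprio_value_sketch(unsigned int prio_class, unsigned int level)
{
	assert(prio_class < 8 && level < (1u << SKETCH_PRIO_CLASS_SHIFT));
	return (unsigned short)((prio_class << SKETCH_PRIO_CLASS_SHIFT) | level);
}

unsigned int ioprio_class_sketch(unsigned short value)
{
	return value >> SKETCH_PRIO_CLASS_SHIFT;
}

unsigned int ioprio_level_sketch(unsigned short value)
{
	return value & ((1u << SKETCH_PRIO_CLASS_SHIFT) - 1);
}

void ioprio_value_selfcheck(void)
{
	assert(ioprio_value_sketch(2, 7) == 0x4007);
	assert(ioprio_value_sketch(1, 4) == 0x2004);
	assert(ioprio_class_sketch(0x4007) == 2 && ioprio_level_sketch(0x4007) == 7);
}
```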
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit 7c70f506e438 ("engines/io_uring: move sqe clear out of hot path")
removed the memset of sqe from fio_ioring_prep().
This commit did add a clear of the sqe->rw_flags, however, it did so
after both RWF_UNCACHED and RWF_NOWAIT flags might have been set,
effectively clearing these flags if they got set.
This doesn't make any sense. Make sure that we clear sqe->rw_flags
before, not after, setting the flags.
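The ordering problem is easy to reproduce in isolation. In the sketch below the struct, the functions and the flag values are all made up for illustration; they only model "clear after set" versus "clear before set" and are not the kernel's RWF_* values:

```c
#include <assert.h>

#define SKETCH_FLAG_UNCACHED	0x1	/* illustrative values only */
#define SKETCH_FLAG_NOWAIT	0x2

struct sqe_sketch {
	int rw_flags;
};

/* Buggy order: the flags are set and then wiped by the late clear. */
void prep_rw_flags_buggy(struct sqe_sketch *sqe, int uncached, int nowait)
{
	if (uncached)
		sqe->rw_flags |= SKETCH_FLAG_UNCACHED;
	if (nowait)
		sqe->rw_flags |= SKETCH_FLAG_NOWAIT;
	sqe->rw_flags = 0;
}

/* Fixed order: clear stale state from a reused sqe first, then set the flags. */
void prep_rw_flags_fixed(struct sqe_sketch *sqe, int uncached, int nowait)
{
	sqe->rw_flags = 0;
	if (uncached)
		sqe->rw_flags |= SKETCH_FLAG_UNCACHED;
	if (nowait)
		sqe->rw_flags |= SKETCH_FLAG_NOWAIT;
}

void rw_flags_order_selfcheck(void)
{
	struct sqe_sketch sqe = { 0 };

	prep_rw_flags_buggy(&sqe, 1, 1);
	assert(sqe.rw_flags == 0);	/* both flags were lost */
	prep_rw_flags_fixed(&sqe, 1, 1);
	assert(sqe.rw_flags == (SKETCH_FLAG_UNCACHED | SKETCH_FLAG_NOWAIT));
}
```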
Fixes: 7c70f506e438 ("engines/io_uring: move sqe clear out of hot path")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit 7c70f506e438 ("engines/io_uring: move sqe clear out of hot path")
removed the memset of sqe from fio_ioring_prep().
Before this commit, fio_ioring_prio_prep() behaved properly, because
sqe->ioprio was always cleared by the memset in fio_ioring_prep().
cmdprio_percentage=20 is supposed to set the highest priority class for
20% of the total I/Os, however, because sqes got reused without clearing
the ioprio field, this meant that the number of I/Os sent with the highest
priority became 95% already after 10 seconds. Quite far off from the
intended 20%.
Fix this by explicitly clearing the priority in fio_ioring_prio_prep().
Note that prio/prioclass cannot be used together with cmdprio_percentage,
so we do not need to do an additional clear in fio_ioring_prep().
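Why the high-priority share drifts upward can be seen in a small simulation. This is a toy model under stated assumptions: 32 reused sqe slots, a deterministic stand-in for the random draw, and hypothetical names throughout. It does not reproduce the 95% figure above, only the mechanism:

```c
#include <assert.h>

#define SKETCH_NR_SQES	32
#define SKETCH_HIGH_PRIO	1

/*
 * Count how many of nr_ios submissions go out with high priority.
 * clear_first = 0 models the bug: a reused slot keeps its old ioprio.
 * clear_first = 1 models the fix: the slot is cleared before each use.
 */
unsigned int count_high_prio_sketch(unsigned int nr_ios, unsigned int percentage,
				    int clear_first)
{
	unsigned short ioprio[SKETCH_NR_SQES] = { 0 };
	unsigned int i, high = 0;

	for (i = 0; i < nr_ios; i++) {
		unsigned short *slot = &ioprio[i % SKETCH_NR_SQES];
		unsigned int roll = i % 100;	/* deterministic stand-in for a random draw */

		if (clear_first)
			*slot = 0;
		if (roll < percentage)
			*slot = SKETCH_HIGH_PRIO;
		if (*slot)
			high++;
	}
	return high;
}

void ioprio_reuse_selfcheck(void)
{
	/* With the clear, exactly 20% of 1000 I/Os are high priority. */
	assert(count_high_prio_sketch(1000, 20, 1) == 200);
	/* Without it, every slot ends up stuck at high priority. */
	assert(count_high_prio_sketch(1000, 20, 0) >= 680);
}
```

In this model every slot has been marked high priority at least once by I/O number 319, so without the clear all later I/Os are sent with high priority.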
engines/libaio.c doesn't explicitly clear the ioprio, nor does it memset
the descriptor entry; this is because io_prep_pread()/io_prep_pwrite() in
libaio itself perform a memset.
Fixes: 7c70f506e438 ("engines/io_uring: move sqe clear out of hot path")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit 7c70f506e438 ("engines/io_uring: move sqe clear out of hot path")
removed the memset of sqe from fio_ioring_prep().
Later, force_async was added in commit 5a59a81d2923 ("engines/io_uring:
allow setting of IOSQE_ASYNC").
The force_async commit sets sqe->flags every N requests. However,
since we no longer do a memset, that commit should have made sure that
flags is always initialized, so that sqe->flags is never left set on
reused sqes where we didn't intend it to be.
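A minimal sketch of the "always initialize, then set every N requests" pattern follows. The struct layouts, the DEMO_IOSQE_ASYNC value, and demo_prep_flags() are hypothetical stand-ins, not the engine's actual code.

```c
#include <stdint.h>

#define DEMO_IOSQE_ASYNC 0x10u	/* stand-in value for this sketch */

struct demo_sqe {
	uint8_t flags;
};

struct demo_ring {
	unsigned int force_async;	/* set the flag every N requests, 0 = never */
	unsigned int async_trigger;	/* requests seen since the flag was last set */
};

static void demo_prep_flags(struct demo_ring *ring, struct demo_sqe *sqe)
{
	sqe->flags = 0;	/* always initialize: a reused sqe may carry an old flag */
	if (ring->force_async && ++ring->async_trigger == ring->force_async) {
		ring->async_trigger = 0;
		sqe->flags |= DEMO_IOSQE_ASYNC;
	}
}
```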
Fixes: 5a59a81d2923 ("engines/io_uring: allow setting of IOSQE_ASYNC")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A few changes were done to the pool connect and container open API
in DAOS 1.3+. A UUID string or a label is now passed via the API
instead of uuid_t structures. Change the dfs engine accordingly.
Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>
|
|
The trim workload to zoned block devices is supported as zone reset, and
this feature is available for I/O engines which support both zoned
devices and trim workload. Libzbc I/O engine supports zoned devices but
lacks trim workload support. To enable trim support with libzbc I/O
engine, remove the check which inhibited trim from requests to libzbc
I/O engine. Also set file open flags for trim same as write, and call
zbd_do_io_u_trim() for trim I/O.
Note that the libzbc I/O engine supports trim only for sequential
write required zones. Trim I/Os to conventional zones are reported as
errors.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
As per the Coverity reports, there were some issues in my code :
- Some structures were not properly freed before returning.
- Some file descriptors were not properly closed
- Testing with 'if (!int)' isn't a good way to test if the value is negative
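The last point can be illustrated with a short sketch. The helper names are hypothetical, and the convention assumed here is the usual POSIX one where calls such as open(2) return -1 on failure and a value >= 0 on success.

```c
#include <stdbool.h>

/*
 * '!ret' is only true when ret is 0, so it treats -1 as success and a
 * valid return value of 0 as failure.
 */
static bool demo_failed_wrong(int ret)
{
	return !ret;
}

/* Checking the sign is what detects the error return. */
static bool demo_failed_right(int ret)
{
	return ret < 0;
}
```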
Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
|
|
No functional changes in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When performing benchmarks with fio, some need to execute
tasks in parallel to the job execution. A typical use-case would be
observing performance/power metrics.
Several implementations were possible :
- Adding an exec_run in addition to the existing exec_{pre|post}run
- Implementing performance/power metrics in fio
- Adding an exec engine
1°) Adding an exec_run
This was my first intention, but I quickly noticed that
exec_{pre|post}run are executed for each 'numjob'. In the case of
performance/power monitoring, it doesn't make sense to spawn an
instance for each thread.
2°) Implementing performance/power metrics
This is possible but would require a lot of work to maintain this part
of fio, while 3rd party tools already take care of that perfectly.
3°) Adding an engine
Adding an engine lets users define when the program runs and how many
instances of it they want.
In the provided example, a single monitoring job is spawned at the same
time as the worker job, which could be composed of several worker
threads.
A stonewall barrier is used to define which jobs must run together
(monitoring / benchmark).
The engine has four parameters :
- program: name of the program to run
- arguments: arguments to pass to the program
- grace_time: duration between SIGTERM and SIGKILL
- std_redirect: redirect std{err|out} to dedicated files
Arguments can have special variables to be expanded before the execution:
- %r will be replaced by the job duration in seconds
- %n will be replaced by the job name
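The variable expansion could look roughly like the sketch below. demo_expand() is a hypothetical helper written for illustration; it is not the engine's actual implementation.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Expand "%r" (job duration in seconds) and "%n" (job name) from 'in'
 * into 'out'. Returns 0 on success, -1 if 'out' is too small.
 */
static int demo_expand(const char *in, char *out, size_t outlen,
		       unsigned long runtime_sec, const char *jobname)
{
	size_t used = 0;

	if (!outlen)
		return -1;
	while (*in) {
		int n;

		if (in[0] == '%' && in[1] == 'r') {
			n = snprintf(out + used, outlen - used, "%lu", runtime_sec);
			in += 2;
		} else if (in[0] == '%' && in[1] == 'n') {
			n = snprintf(out + used, outlen - used, "%s", jobname);
			in += 2;
		} else {
			n = snprintf(out + used, outlen - used, "%c", *in);
			in++;
		}
		/* snprintf() reports the length it wanted; reject truncation */
		if (n < 0 || (size_t)n >= outlen - used)
			return -1;
		used += (size_t)n;
	}
	out[used] = '\0';
	return 0;
}
```

With a job duration of 30 seconds and a job named "mon", the input "--duration %r --tag %n" expands to "--duration 30 --tag mon".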
During the program execution, the std{out|err} are redirected to files if std_redirect option is set (default).
- stdout: <job_name>.stdout
- stderr: <job_name>.stderr
If the executed program has a nice stdout output, after the fio
execution, the stdout file can be parsed by other tools like CI jobs or graphing tools.
A sample job is provided here to show how this can be used.
It runs the CPU engine twice with two different CPU modes (noop vs qsort).
For each benchmark, the output of turbostat is saved for later analysis.
After the fio run, it is possible to compare the impact of the two modes
on the CPU frequency and power consumption.
This can be easily extended to any other usage that needs to analyze
the behavior of the host during some jobs.
About the implementation, the exec engine forks :
- the child doing an execvp() of the program.
- the parent, fio, will monitor the time passed into the job
Once the time is over, the program is sent SIGTERM, followed by SIGKILL,
to ensure it will not run _after_ the job is completed.
This mechanism is required as :
- not all programs can be controlled properly
- it is a last-resort protection if the program misbehaves
The delay is controlled by grace_time option, default is 1 sec.
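The fork / execvp() / SIGTERM / SIGKILL sequence can be sketched as follows. This is a simplified illustration under stated assumptions, not the engine's code: demo_run_timed() is a hypothetical helper, and it polls once per second instead of following fio's own job timing.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] for at most runtime_sec seconds, then send SIGTERM, wait
 * up to grace_sec seconds, and send SIGKILL if it is still alive.
 * Stores the wait status in *status. Returns 0 on success, -1 on error.
 */
static int demo_run_timed(char *const argv[], unsigned int runtime_sec,
			  unsigned int grace_sec, int *status)
{
	unsigned int waited;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* only reached if execvp() failed */
	}

	/* Parent: monitor the time passed in the job. */
	for (waited = 0; waited < runtime_sec; waited++) {
		if (waitpid(pid, status, WNOHANG) == pid)
			return 0;	/* program finished on its own */
		sleep(1);
	}

	kill(pid, SIGTERM);
	for (waited = 0; waited < grace_sec; waited++) {
		if (waitpid(pid, status, WNOHANG) == pid)
			return 0;
		sleep(1);
	}
	if (waitpid(pid, status, WNOHANG) == pid)
		return 0;

	kill(pid, SIGKILL);	/* last resort: the program ignored SIGTERM */
	return waitpid(pid, status, 0) == pid ? 0 : -1;
}
```

A program that honours SIGTERM is reaped during the grace period; only one that ignores it is killed with SIGKILL.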
If the program's duration can be limited, the %r variable can be used in
the arguments to ask the program to stop _before_ the job finishes,
like :
program=/usr/bin/mytool.sh
arguments=--duration %r
Signed-off-by: Erwan Velu <e.velu@criteo.com>
|
|
For a job with zonemode=zbd, we do not want any file to be ignored.
Each file's file type in that job should be supported by either zbd.c
or the ioengine. If not, we should return an error.
This way, ZBD_IGNORE becomes redundant and can be removed.
By removing ZBD_IGNORE, we know that every file belonging to a job that
has zonemode=zbd set will either be a zoned block device or emulate
a zoned block device.
This means that for jobs that have zonemode=zbd, f->zbd_info will always
be non-NULL. This will make the zbd code slightly easier to reason about
and to maintain.
When removing zbd_zoned_model ZBD_IGNORE, define the new first enum value
as 0x1, so that we avoid potential ABI problems with existing binaries.
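The effect of starting the enum at 0x1 can be sketched like this. The enumerator names are hypothetical, and the sketch assumes the removed value was the first one (value 0), so that the remaining values keep the numbers they had before.

```c
/* Hypothetical layout before the removal. */
enum demo_model_old {
	DEMO_OLD_IGNORE = 0,
	DEMO_OLD_NONE,		/* 1 */
	DEMO_OLD_HOST_AWARE,	/* 2 */
	DEMO_OLD_HOST_MANAGED,	/* 3 */
};

/*
 * After the removal: start at 0x1 so the remaining values keep their
 * numbers and code built against the old header still agrees on them.
 */
enum demo_model_new {
	DEMO_NEW_NONE = 0x1,
	DEMO_NEW_HOST_AWARE,	/* 2 */
	DEMO_NEW_HOST_MANAGED,	/* 3 */
};
```

Had the new enum started at 0 implicitly, every remaining value would have shifted down by one.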
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
check_engine_ops() already returns an error if io_submit_mode is
IO_MODE_OFFLOAD and the engine is marked FIO_NO_OFFLOAD.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
* 'fix-libpmem' of https://github.com/lukaszstolarczuk/fio:
engines/libpmem: do not call drain on close
engines/libpmem: cleanup a little code, comments and example
engines/libpmem: set file open/create mode always to RW
|
|
* 'taras/nfs-upstream' of https://github.com/tarasglek/fio-1:
clean up nfs example
skip skeleton comments
single line bodies
C-style comments
NFS configure fixes
NFS engine
|