fio.git
Eric Sandeen [Wed, 26 Jan 2022 14:49:45 +0000 (08:49 -0600)]
fio: use LDFLAGS when linking dynamic engines

Without this, locally defined LDFLAGS won't be applied when
linking the dynamically loaded IO engines.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Eric Sandeen [Tue, 25 Jan 2022 18:57:39 +0000 (12:57 -0600)]
t/io_uring: link with libaio when necessary

When CONFIG_LIBAIO is enabled, we need t/io_uring to link with it.
(libaio_LIBS only affects the aio engine, AFAICT.)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 26 Jan 2022 15:40:18 +0000 (08:40 -0700)]
Merge branch 'rpma-add-support-for-File-System-DAX' of https://github.com/ldorau/fio

* 'rpma-add-support-for-File-System-DAX' of https://github.com/ldorau/fio:
  rpma: add support for File System DAX
  rpma: RPMA engine requires librpma>=v0.10.0 with rpma_mr_advise()

Wang, Long [Tue, 25 Jan 2022 09:18:14 +0000 (10:18 +0100)]
rpma: add support for File System DAX

File System DAX is handled in a different way than Device DAX:

1) In case of File System DAX, each thread uses a separate file
from this file system and no offset is needed. In case of Device DAX,
each thread uses a separate offset within the same Device DAX.

2) File System DAX requires rpma_mr_advise(3) (ibv_advise_mr(3))
to be called for the registered memory to avoid page faults
and degraded performance.
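To make point 1 concrete, here is a minimal sketch of how a thread's starting offset could be chosen in each mode. The `dax_kind` enum, the `thread_mem_offset()` helper and the per-thread `size` parameter are hypothetical; this is not the rpma engine's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: which kind of DAX backs the memory a thread registers. */
enum dax_kind { DEVICE_DAX, FSDAX };

/*
 * Offset at which thread `thread_idx` starts using its mapping.
 * Device DAX: all threads share one device, so each thread gets its own
 * slice starting at thread_idx * size.
 * File System DAX: each thread maps its own file, so no offset is needed.
 */
static size_t thread_mem_offset(enum dax_kind kind, unsigned thread_idx,
                                size_t size)
{
    assert(size > 0);
    if (kind == FSDAX)
        return 0;
    return (size_t)thread_idx * size;
}
```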

Ref: https://github.com/axboe/fio/issues/1238

Signed-off-by: Wang, Long <long1.wang@intel.com>
Lukasz Dorau [Mon, 24 Jan 2022 22:56:47 +0000 (23:56 +0100)]
rpma: RPMA engine requires librpma>=v0.10.0 with rpma_mr_advise()

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Jens Axboe [Fri, 21 Jan 2022 17:46:26 +0000 (10:46 -0700)]
Merge branch 'master' of https://github.com/ben-ihelputech/fio

* 'master' of https://github.com/ben-ihelputech/fio:
  Update README to markdown format

ben-ihelputech [Fri, 21 Jan 2022 15:01:13 +0000 (09:01 -0600)]
Update README to markdown format

- Renamed README to README.md so that it renders nicely on GitHub.

Lukas Straub [Wed, 19 Jan 2022 21:14:40 +0000 (21:14 +0000)]
iolog.c: Fix memory leak for blkparse case

init_blkparse_read (load_blkparse previously) didn't free the
filename. Fix this by freeing it in the init_iolog function and
handling it for both init_iolog_read and init_blkparse_read.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/r/e4acf183ab789b7284bfa96089ebe1256e15f98d.1642626314.git.lukasstraub2@web.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lukas Straub [Wed, 19 Jan 2022 21:14:36 +0000 (21:14 +0000)]
blktrace.c: Make thread-safe by removing local static variables

Local static variables are not thread-safe. Make the functions in
blktrace.c thread-safe by replacing these static variables.
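A minimal illustration of the problem, using hypothetical helpers rather than the actual blktrace.c functions: a value kept in a function-local `static` is shared by every thread that calls the function, whereas state owned by the caller is private to each caller.

```c
#include <stddef.h>

/* Not thread-safe: every caller shares (and races on) `last`. */
static unsigned long long delta_shared(unsigned long long now)
{
    static unsigned long long last;
    unsigned long long d = now - last;

    last = now;
    return d;
}

/* Hypothetical per-replay state owned by the caller. */
struct replay_state {
    unsigned long long last;
};

/* Thread-safe as long as each thread passes its own replay_state. */
static unsigned long long delta_local(struct replay_state *st,
                                      unsigned long long now)
{
    unsigned long long d = now - st->last;

    st->last = now;
    return d;
}
```

Two independent replays using `delta_local()` never see each other's timestamps, while two threads calling `delta_shared()` would.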

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/r/b805bb3f6acf6c5b4d8811872c62af939aac62a7.1642626314.git.lukasstraub2@web.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lukas Straub [Wed, 19 Jan 2022 21:14:33 +0000 (21:14 +0000)]
blktrace.c: Don't sleep indefinitely if there is a wrong timestamp

Each of my traces has a single entry with a wrong timestamp
that causes an underflow followed by an infinite sleep.

Fix this by checking for underflow.
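The failure mode is ordinary unsigned wrap-around: if a timestamp is earlier than the previous one, `cur - prev` on unsigned 64-bit values yields a huge number, which then becomes a huge sleep. A minimal sketch of the guard, using a hypothetical helper rather than fio's actual code:

```c
#include <stdint.h>

/*
 * Hypothetical helper: nanoseconds to wait before replaying an event
 * stamped `cur`, given that the previous event was stamped `prev`.
 * Without the check, cur < prev wraps around to nearly UINT64_MAX.
 */
static uint64_t replay_delay_ns(uint64_t prev, uint64_t cur)
{
    if (cur < prev)     /* bogus (out-of-order) timestamp: don't wait */
        return 0;
    return cur - prev;
}
```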

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/r/a19b7ea899093c4c0ed98d2d9a310f2f0f01fddd.1642626314.git.lukasstraub2@web.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lukas Straub [Wed, 19 Jan 2022 21:14:30 +0000 (21:14 +0000)]
blktrace.c: Don't hardcode direct-io

Hardcoding direct I/O is unexpected if one wants to test the performance
of a standard filesystem with buffered I/O (by pointing replay_redirect
to a regular file).

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/r/239cc0c47c346408607772fb423aa5745a3779dd.1642626314.git.lukasstraub2@web.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lukas Straub [Wed, 19 Jan 2022 21:14:26 +0000 (21:14 +0000)]
linux-dev-lookup.c: Put the check for replay_redirect in the beginning

The machine may not have any block device nodes (like my dev container),
which makes this function fail despite replay_redirect being set.

Move the check to the beginning to fix this.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/r/0dd4b6407f7b7f5f15f1fcad409554ff339ffca1.1642626314.git.lukasstraub2@web.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lukas Straub [Wed, 19 Jan 2022 21:14:23 +0000 (21:14 +0000)]
blktrace.c: Add support for read_iolog_chunked

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/r/d43a8a2d5fd23d9756cdcf280cd2f3572585f264.1642626314.git.lukasstraub2@web.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lukas Straub [Wed, 19 Jan 2022 21:14:20 +0000 (21:14 +0000)]
iolog.c: Make iolog_items_to_fetch public

This function will be needed in the next patch.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/r/81c9fbb31bbf0c487dc0ebff5eb85ca764fb14ef.1642626314.git.lukasstraub2@web.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lukas Straub [Wed, 19 Jan 2022 21:14:16 +0000 (21:14 +0000)]
blktrace.c: Use file stream interface instead of fifo

As in iolog.c, use the file stream interface for accessing
the iolog file.
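As a rough sketch of the stream-based approach: reading fixed-size binary records with `fread()` from a `FILE *` instead of issuing raw `read()` calls on a file descriptor. The record layout and `read_trace_recs()` are hypothetical and far simpler than a real blktrace record.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical fixed-size trace record (not the real blk_io_trace). */
struct trace_rec {
    uint64_t time;
    uint64_t sector;
    uint32_t bytes;
    uint32_t action;
};

/*
 * Read up to `max` records from `f` into `out`.
 * Returns the number of complete records read; stops at EOF or error.
 */
static size_t read_trace_recs(FILE *f, struct trace_rec *out, size_t max)
{
    size_t n = 0;

    while (n < max && fread(&out[n], sizeof(out[n]), 1, f) == 1)
        n++;
    return n;
}
```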

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Link: https://lore.kernel.org/r/5f52a20f95ebead7fa9ae8bce0acf8f0570219ca.1642626314.git.lukasstraub2@web.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vincent Fu [Mon, 15 Nov 2021 20:07:17 +0000 (20:07 +0000)]
docs: documentation for sg WRITE STREAM(16)

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20211115200807.117138-7-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vincent Fu [Mon, 15 Nov 2021 20:07:17 +0000 (20:07 +0000)]
sg: allow fio to open and close streams for WRITE STREAM(16) commands

If --stream_id=0 then fio will open a stream for WRITE STREAM(16) commands and
close the stream when the device file is closed.

Example:
./fio --name=test --filename=/dev/sdb --ioengine=sg --number_ios=1 --debug=file,io --sg_write_mode=write_stream --rw=randwrite
fio: set debug option file
fio: set debug option io
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=sg, iodepth=1
fio-3.27
Starting 1 process
file     1072297 setup files
file     1072297 get file size for 0x7f0306fa5110/0//dev/sdb
file     1072307 trying file /dev/sdb 290
file     1072307 fd open /dev/sdb
file     1072307 file not found in hash /dev/sdb
file     1072307 sgio_stream_control: opened stream 1
file     1072307 get file /dev/sdb, ref=0
io       1072307 drop page cache /dev/sdb
file     1072307 goodf=1, badf=2, ff=2b1
file     1072307 get_next_file_rr: 0x7f0306fa5110
file     1072307 get_next_file: 0x7f0306fa5110 [/dev/sdb]
file     1072307 get file /dev/sdb, ref=1
io       1072307 fill: io_u 0xb55700: off=0x35ef554000,len=0x1000,ddir=1,file=/dev/sdb
io       1072307 prep: io_u 0xb55700: off=0x35ef554000,len=0x1000,ddir=1,file=/dev/sdb
io       1072307 prep: io_u 0xb55700: ret=0
io       1072307 queue: io_u 0xb55700: off=0x35ef554000,len=0x1000,ddir=1,file=/dev/sdb
io       1072307 complete: io_u 0xb55700: off=0x35ef554000,len=0x1000,ddir=1,file=/dev/sdb
file     1072307 put file /dev/sdb, ref=2
file     1072307 close files
file     1072307 put file /dev/sdb, ref=1
file     1072307 sgio_stream_control: closed stream 1
file     1072307 fd close /dev/sdb
io       1072307 close ioengine sg
io       1072307 free ioengine sg

test: (groupid=0, jobs=1): err= 0: pid=1072307: Mon Aug 16 14:25:45 2021
  write: IOPS=200, BW=800KiB/s (819kB/s)(4096B/5msec); 0 zone resets
    clat (nsec): min=93339, max=93339, avg=93339.00, stdev= 0.00
     lat (nsec): min=96201, max=96201, avg=96201.00, stdev= 0.00
    clat percentiles (nsec):
     |  1.00th=[93696],  5.00th=[93696], 10.00th=[93696], 20.00th=[93696],
     | 30.00th=[93696], 40.00th=[93696], 50.00th=[93696], 60.00th=[93696],
     | 70.00th=[93696], 80.00th=[93696], 90.00th=[93696], 95.00th=[93696],
     | 99.00th=[93696], 99.50th=[93696], 99.90th=[93696], 99.95th=[93696],
     | 99.99th=[93696]
  lat (usec)   : 100=100.00%
  cpu          : usr=100.00%, sys=0.00%, ctx=2, majf=0, minf=20
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=800KiB/s (819kB/s), 800KiB/s-800KiB/s (819kB/s-819kB/s), io=4096B (4096B), run=5-5msec

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20211115200807.117138-6-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vincent Fu [Mon, 15 Nov 2021 20:07:17 +0000 (20:07 +0000)]
sg: add support for WRITE STREAM(16) commands

Add the "write_stream" option to sg_write_mode to send WRITE STREAM(16)
commands. Use the new stream_id option to set the stream identifier.

Example:

sg_stream_ctl -o /dev/sdb
Assigned stream id: 1
./fio --name=test --filename=/dev/sdb --ioengine=sg --sg_write_mode=write_stream --stream_id=1 --rw=randwrite --time_based --runtime=10s
...
sg_stream_ctl -c --id=1 /dev/sdb

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20211115200807.117138-5-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vincent Fu [Mon, 15 Nov 2021 20:07:17 +0000 (20:07 +0000)]
sg: improve sg_write_mode option names

There is a name collision for the sg_write_mode options for the WRITE AND
VERIFY and VERIFY commands. Deprecate the 'verify' option and use
'write_and_verify' instead. Do the same for 'same' and 'write_same' to keep
the naming scheme consistent. The original option names are still supported
for backward compatibility but are listed as deprecated.

Here are the new sg_write_mode options:

Option               SCSI command
write                WRITE (default)
write_and_verify     WRITE AND VERIFY
verify (deprecated)  WRITE AND VERIFY
write_same           WRITE SAME
same (deprecated)    WRITE SAME
write_same_ndob      WRITE SAME with NDOB flag set
verify_bytchk_00     VERIFY with BYTCHK set to 00
verify_bytchk_01     VERIFY with BYTCHK set to 01
verify_bytchk_11     VERIFY with BYTCHK set to 11
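One way to picture the renaming is as a lookup table in which the deprecated names are aliases mapping to the same command as their replacements. This is a hypothetical sketch (`sg_mode_entry`, `lookup_sg_write_mode()`), not fio's option-parsing code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table mirroring the sg_write_mode list above. */
struct sg_mode_entry {
    const char *name;
    const char *scsi_cmd;
    bool deprecated;
};

static const struct sg_mode_entry sg_modes[] = {
    { "write",            "WRITE",                         false },
    { "write_and_verify", "WRITE AND VERIFY",              false },
    { "verify",           "WRITE AND VERIFY",              true  },
    { "write_same",       "WRITE SAME",                    false },
    { "same",             "WRITE SAME",                    true  },
    { "write_same_ndob",  "WRITE SAME with NDOB flag set", false },
    { "verify_bytchk_00", "VERIFY with BYTCHK set to 00",  false },
    { "verify_bytchk_01", "VERIFY with BYTCHK set to 01",  false },
    { "verify_bytchk_11", "VERIFY with BYTCHK set to 11",  false },
};

/* Return the entry for `name`, or NULL if the option is unknown. */
static const struct sg_mode_entry *lookup_sg_write_mode(const char *name)
{
    for (size_t i = 0; i < sizeof(sg_modes) / sizeof(sg_modes[0]); i++)
        if (strcmp(sg_modes[i].name, name) == 0)
            return &sg_modes[i];
    return NULL;
}
```

A parser built this way can still accept 'verify' and 'same' while flagging them as deprecated.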

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20211115200807.117138-4-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vincent Fu [Mon, 15 Nov 2021 20:07:17 +0000 (20:07 +0000)]
sg: add support for WRITE SAME(16) commands with NDOB flag set

Add the sg_write_mode option write_same_ndob to issue WRITE SAME(16) commands
with the no data-out buffer (NDOB) flag set. This flag is not supported for
WRITE SAME(10), so all commands issued with this option are WRITE SAME(16).
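For reference, here is a sketch of how a WRITE SAME(16) CDB with NDOB set might be laid out. It is based on my reading of the SBC specification (operation code 93h, NDOB in bit 0 of byte 1, big-endian LBA in bytes 2-9, block count in bytes 10-13). The helper names are hypothetical and this is not fio's sg engine code; check the SBC standard before relying on the layout.

```c
#include <stdint.h>
#include <string.h>

/* Store `v` big-endian in `n` bytes starting at `p` (SCSI byte order). */
static void put_be(uint8_t *p, uint64_t v, int n)
{
    for (int i = n - 1; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Hypothetical: build a 16-byte WRITE SAME(16) CDB. */
static void build_write_same16(uint8_t cdb[16], uint64_t lba,
                               uint32_t nblocks, int ndob)
{
    memset(cdb, 0, 16);
    cdb[0] = 0x93;                  /* WRITE SAME(16) operation code */
    if (ndob)
        cdb[1] |= 0x01;             /* NDOB: no data-out buffer is sent */
    put_be(&cdb[2], lba, 8);        /* LOGICAL BLOCK ADDRESS */
    put_be(&cdb[10], nblocks, 4);   /* NUMBER OF LOGICAL BLOCKS */
}
```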

Also include an example job file.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20211115200807.117138-3-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vincent Fu [Mon, 15 Nov 2021 20:07:17 +0000 (20:07 +0000)]
sg: add support for VERIFY command using write modes

fio does not have an explicit verify data direction and creating a new data
direction just for SCSI VERIFY commands probably is not worthwhile. The format
of SCSI VERIFY commands matches that of write operations since VERIFY commands
can include data transfer to the device. So it seems reasonable to have VERIFY
commands be accounted for as write operations by fio.

Use the sg_write_mode option to support SCSI VERIFY commands with different
BYTCHK values.

BYTCHK  Description
00      No data is transferred to the device; device data is checked
01      Device data is compared with data transferred to the device
11      Same as 01, except that only one sector of data is transferred to
        the device and each sector specified in the verification extent
        is compared against this transferred data
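The BYTCHK field can be sketched in code as follows. Based on my reading of the SBC standard, VERIFY(16) uses operation code 8Fh and carries the two BYTCHK bits in bits 2:1 of CDB byte 1. Only BYTCHK 01 and 11 send a data-out buffer, which is why accounting VERIFY as a write is reasonable. The helper names are hypothetical, and the LBA and verification-length fields are left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* BYTCHK values from the table above (a 2-bit CDB field). */
enum bytchk { BYTCHK_00 = 0x0, BYTCHK_01 = 0x1, BYTCHK_11 = 0x3 };

/* Hypothetical: start a VERIFY(16) CDB; LBA and length are not filled in. */
static void init_verify16(uint8_t cdb[16], enum bytchk b)
{
    memset(cdb, 0, 16);
    cdb[0] = 0x8f;                       /* VERIFY(16) operation code */
    cdb[1] = (uint8_t)((b & 0x3) << 1);  /* BYTCHK occupies bits 2:1 */
}

/* Only BYTCHK 01 and 11 transfer a data-out buffer to the device. */
static bool verify_sends_data(enum bytchk b)
{
    return b != BYTCHK_00;
}
```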

Also update documentation and add a couple of example job files.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20211115200807.117138-2-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Niklas Cassel [Mon, 17 Jan 2022 15:50:54 +0000 (15:50 +0000)]
stat: move unified=both mixed allocation and calculation to new helper

When using unified_rw_reporting=both, we need to print both the
per ddir stats, as well as the mixed stats.

In order to print both, the regular printing functions are responsible
for printing the per ddir stats from the unmodified struct thread_stat,
and show_mixed_ddir_status(), show_mixed_ddir_status_terse()
or add_mixed_ddir_status_json() is responsible for calculating and
printing the mixed stats.

In order to keep the original struct thread_stat intact, these three
functions have to allocate a new local thread_stat, where the mixed ddir
result can be stored before printing.

Move the allocation and calculation of this new struct thread_stat to a
new helper function, so that the code is easier to follow.
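A minimal sketch of the pattern described above, using a drastically simplified, hypothetical stats structure (fio's `struct thread_stat` is much larger). The helper `alloc_mixed_stats()` allocates a fresh local copy, folds every data direction into one slot of it, and leaves the original untouched so that the per-direction output can still be printed from the original.

```c
#include <stdlib.h>

enum { DIR_READ, DIR_WRITE, DIR_TRIM, NR_DIRS };

/* Hypothetical, heavily simplified stand-in for struct thread_stat. */
struct ts_sketch {
    unsigned long long io_bytes[NR_DIRS];
    unsigned long long total_ios[NR_DIRS];
};

/*
 * Allocate a new ts whose DIR_READ slot holds the sum over all
 * directions. The caller prints it like any other ts and must free() it.
 * Returns NULL on allocation failure.
 */
static struct ts_sketch *alloc_mixed_stats(const struct ts_sketch *ts)
{
    struct ts_sketch *mixed = calloc(1, sizeof(*mixed));

    if (!mixed)
        return NULL;
    for (int d = 0; d < NR_DIRS; d++) {
        mixed->io_bytes[DIR_READ] += ts->io_bytes[d];
        mixed->total_ios[DIR_READ] += ts->total_ios[d];
    }
    return mixed;
}
```

Keeping allocation and summation in one helper means each printer (normal, terse, JSON) only has to call it, print, and free.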

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20220117155045.311453-3-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Niklas Cassel [Mon, 17 Jan 2022 15:50:53 +0000 (15:50 +0000)]
stat: remove duplicated code in show_mixed_ddir_status()

When using unified_rw_reporting=mixed, show_ddir_status() is called,
and is solely responsible for printing the mixed stats.

When using unified_rw_reporting=both, show_ddir_status() is called
and prints the regular output; after that, show_mixed_ddir_status()
is called to print the mixed stats.

show_mixed_ddir_status_terse() and add_mixed_ddir_status_json() are
implemented by allocating a new local ts that holds the mixed result,
and then simply calling the regular non-mixed print function
show_ddir_status_terse()/add_ddir_status_json() with this local ts.

show_mixed_ddir_status() also allocates a new local ts, but fails to
initialize the lat percentiles and the percentile_list in the new local ts.
Therefore, show_mixed_ddir_status() has duplicated all the code from
show_ddir_status(), except that it uses the lat percentiles and the
percentile_list from the original ts.

Simplify show_mixed_ddir_status(), to behave in the same way as
show_mixed_ddir_status_terse() and add_mixed_ddir_status_json().

In other words, initialize the lat percentiles and the percentile_list in
the new local ts, and replace all the duplicated code with a simple call to
the regular non-mixed print function (show_ddir_status()).

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20220117155045.311453-2-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Mon, 17 Jan 2022 02:11:27 +0000 (11:11 +0900)]
init: do not create lat logs when not needed

When any of the options disable_lat, disable_slat and disable_clat are
used, there is no need to create the lat log associated with the
disabled latency. In addition, when write_lat_log is also specified,
this change avoids the creation of empty latency log files.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20220117021127.9259-1-damien.lemoal@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Niklas Cassel [Mon, 10 Jan 2022 09:01:39 +0000 (09:01 +0000)]
stat: remove unnecessary bool parameter to sum_thread_stats()

We can deduce whether it is the first struct io_stat src being added to
the struct io_stat dst by checking whether the current number of samples
in dst is zero.

Therefore, remove the bool parameter "first" from sum_stat().
Since sum_stat() was the only user of the bool parameter "first" of
the sum_thread_stats() function, we can remove it from sum_thread_stats()
as well.
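A sketch of the idea, assuming a simplified stats record that tracks min, max, sample count, mean and the sum of squared deviations (the names are hypothetical, not fio's `struct io_stat`). Merging uses the standard pairwise formula for combining means and variances. A destination with zero samples is simply overwritten, which is exactly the case the removed `first` flag used to signal; without it, the zero-initialized `min` would wrongly survive the first merge.

```c
#include <stdint.h>

/* Hypothetical running-statistics record. */
struct stat_sketch {
    uint64_t min, max, samples;
    double mean, S;     /* S = sum of squared deviations from the mean */
};

static void sum_stat_sketch(struct stat_sketch *dst,
                            const struct stat_sketch *src)
{
    if (src->samples == 0)
        return;
    if (dst->samples == 0) {    /* replaces the old `first` parameter */
        *dst = *src;
        return;
    }

    double n1 = (double)dst->samples, n2 = (double)src->samples;
    double n = n1 + n2;
    double delta = src->mean - dst->mean;

    if (src->min < dst->min)
        dst->min = src->min;
    if (src->max > dst->max)
        dst->max = src->max;
    dst->mean += delta * n2 / n;
    dst->S += src->S + delta * delta * n1 * n2 / n;
    dst->samples += src->samples;
}
```

For example, merging samples {2, 4} and then {6} gives mean 4 and S = 8, the same as computing them over {2, 4, 6} directly.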

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20220110090133.69955-1-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 10 Jan 2022 02:34:27 +0000 (19:34 -0700)]
engines/io_uring: don't set CQSIZE clamp unconditionally

For older kernels without IORING_SETUP_CQSIZE, we'll get EINVAL if we
set it. Just retry the ring setup without the flag if that happens.
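The retry logic can be sketched as follows. To keep the example self-contained and testable, the ring-setup call is passed in as a function pointer and the flag is a local stand-in; real code would call `io_uring_setup(2)` with `IORING_SETUP_CQSIZE` from `<linux/io_uring.h>`. Everything named `*_sketch` or `fake_*` is hypothetical, and this is not fio's engine code.

```c
#include <errno.h>

#define SETUP_CQSIZE_SKETCH (1U << 3)   /* stand-in for IORING_SETUP_CQSIZE */

struct ring_params_sketch {
    unsigned flags;
    unsigned cq_entries;
};

/* Returns an fd (>= 0) on success or -errno on failure. */
typedef int (*ring_setup_fn)(unsigned entries, struct ring_params_sketch *p);

static int setup_ring_with_retry(ring_setup_fn setup, unsigned depth)
{
    struct ring_params_sketch p = {
        .flags = SETUP_CQSIZE_SKETCH,
        .cq_entries = depth,
    };
    int ret = setup(depth, &p);

    if (ret == -EINVAL && (p.flags & SETUP_CQSIZE_SKETCH)) {
        /* Older kernel: drop the CQ size request and try again. */
        p.flags &= ~SETUP_CQSIZE_SKETCH;
        p.cq_entries = 0;
        ret = setup(depth, &p);
    }
    return ret;
}

/* Fake "old kernel" that rejects the CQSIZE flag, for demonstration. */
static int fake_old_kernel_setup(unsigned entries, struct ring_params_sketch *p)
{
    (void)entries;
    return (p->flags & SETUP_CQSIZE_SKETCH) ? -EINVAL : 3;
}
```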

Link: https://github.com/axboe/fio/issues/1324
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 23 Dec 2021 23:27:33 +0000 (16:27 -0700)]
Merge branch 'github-actions-i686' of https://github.com/vincentkfu/fio

* 'github-actions-i686' of https://github.com/vincentkfu/fio:
  t/io_uring: fix help defaults for aio and random_io
  t/io_uring: fix 32-bit build warnings
  Revert "ci: temporarily remove linux-i686-gcc build"
  ci: workaround for problem with i686 builds

Vincent Fu [Thu, 23 Dec 2021 21:08:09 +0000 (16:08 -0500)]
t/io_uring: fix help defaults for aio and random_io

The positions of the default values for aio and random_io were swapped
in the help message. Put the default values in their proper positions.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Vincent Fu [Tue, 21 Dec 2021 16:07:58 +0000 (11:07 -0500)]
t/io_uring: fix 32-bit build warnings

Also change the type for offset to long long since that's what is
expected by io_prep_pread.

Latency measurement for t/io_uring is actually broken on 32-bit builds
because iocb->data and io_event->data are 32-bit void pointers. So it is
not possible to fit both fileno and clock_index in there.

On a 32-bit build with 4-byte longs, the following warnings appear:

t/io_uring.c: In function ‘prep_more_ios_aio’:
t/io_uring.c:666:44: error: left shift count >= width of type [-Werror=shift-count-overflow]
  666 |    data |= ((unsigned long) s->clock_index << 32);
      |                                            ^~
t/io_uring.c: In function ‘reap_events_aio’:
t/io_uring.c:688:27: error: right shift count >= width of type [-Werror=shift-count-overflow]
  688 |    int clock_index = data >> 32;
      |                           ^~

Explicitly specify 64-bit types to resolve these warnings.
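The packing that the 64-bit types make well defined looks roughly like this: the file number occupies the low 32 bits and the clock index the high 32 bits of a `uint64_t`. With an explicitly 64-bit type, the `<< 32` and `>> 32` shifts are valid on any build, although (as noted above) the value still cannot round-trip through a 32-bit pointer: truncating it to 32 bits keeps only the file number. The helper names are hypothetical.

```c
#include <stdint.h>

/* Low 32 bits: file number; high 32 bits: clock index. */
static uint64_t pack_data(uint32_t fileno, uint32_t clock_index)
{
    return (uint64_t)fileno | ((uint64_t)clock_index << 32);
}

static uint32_t unpack_fileno(uint64_t data)
{
    return (uint32_t)(data & 0xffffffffu);
}

static uint32_t unpack_clock_index(uint64_t data)
{
    return (uint32_t)(data >> 32);
}
```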

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Vincent Fu [Sun, 19 Dec 2021 00:51:49 +0000 (19:51 -0500)]
Revert "ci: temporarily remove linux-i686-gcc build"

This reverts commit cea3243fb3bb44d541c2b3fb82ee45eb669b6fe6.

Commit 1420399f6620b417d9da4b801d3c049cf66e58f0 provides a workaround for
the problem with i686 builds, so we can now re-enable them.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Vincent Fu [Sun, 19 Dec 2021 00:46:51 +0000 (19:46 -0500)]
ci: workaround for problem with i686 builds

GitHub Actions currently has package dependency problems with some i386
packages. The work-around suggested in

https://github.com/actions/virtual-environments/issues/4620

appears to resolve the issue for our builds.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Jens Axboe [Sat, 18 Dec 2021 14:09:32 +0000 (07:09 -0700)]
Fio 3.29 (tag: fio-3.29)

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sat, 18 Dec 2021 14:06:12 +0000 (07:06 -0700)]
stat: code cleanup and leak free

This file is somewhat of a mess. The only functional change is freeing
ts_lcl in show_mixed_ddir_status(); the rest is just a vague attempt at
bringing some sanity back into this file.

Fixes: https://github.com/axboe/fio/issues/1319
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Niklas Cassel [Wed, 15 Dec 2021 12:26:04 +0000 (12:26 +0000)]
stat: sum sync_stat before reassigning bool first

Currently, sum_stat(&dst->sync_stat, &src->sync_stat, first, false)
is called after summing the stats on a per-ddir level.

The for-loop that sums the stats on a per-ddir level will reassign
bool first to false when unified_rw_rep is used.

This means that sum_stat() for sync_stat is called with
first == false, even when it is the first sync_stat being summed,
leading to incorrect sync_stat calculations when unified_rw_rep is used.

In order to ensure that sync_stat is not incorrectly affected by the
reassignment of first, move the sync_stat summing before the for-loop.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211215122557.95600-1-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shin'ichiro Kawasaki [Tue, 14 Dec 2021 01:24:13 +0000 (10:24 +0900)]
t/zbd: Avoid inappropriate blkzone command call in zone_cap_bs

When the script test-zbd-support is run for regular block devices or
SG nodes, the blkzone command should not be called. However, the
zone_cap_bs() helper function calls the command regardless of the zone
model or device type, which results in error messages such as "unable to
determine zone size" or "not a block device". Avoid the command call by
returning the zone size argument passed to this function when the test
device is a regular block device or an SG node.

Fixes: 1ae82d673cf5 ("t/zbd: Align block size to zone capacity")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211214012413.464798-13-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Tue, 14 Dec 2021 01:24:12 +0000 (10:24 +0900)]
zbd: introduce zbd_offset_to_zone() helper

Introduce the helper function zbd_offset_to_zone() to get a zone
structure using a file offset. In many functions, this replaces the
two line code pattern:

zone_idx = zbd_offset_to_zone_idx(f, offset);
z = zbd_get_zone(f, zone_idx);

with a single line of code:

z = zbd_offset_to_zone(f, offset);
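For readers unfamiliar with these helpers, here is a hypothetical sketch of what an offset-to-zone lookup does, assuming all zones have the same size: divide the offset by the zone size (a shift when the size is a power of two) and clamp the result to the number of zones. The structures are simplified stand-ins, not fio's zoned block device structures, and the NULL return for out-of-range offsets is a choice made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

struct zone_sketch {
    uint64_t start;
};

struct zbd_sketch {
    uint64_t zone_size;         /* bytes, > 0 */
    unsigned zone_size_log2;    /* 0 if zone_size is not a power of 2 */
    unsigned nr_zones;
    struct zone_sketch *zones;
};

/* Zone index for `offset`; nr_zones means "past the last zone". */
static unsigned offset_to_zone_idx(const struct zbd_sketch *zbd,
                                   uint64_t offset)
{
    uint64_t idx;

    if (zbd->zone_size_log2 > 0)
        idx = offset >> zbd->zone_size_log2;
    else
        idx = offset / zbd->zone_size;
    return idx < zbd->nr_zones ? (unsigned)idx : zbd->nr_zones;
}

/* The two-step pattern folded into one call, as the commit describes. */
static struct zone_sketch *offset_to_zone(const struct zbd_sketch *zbd,
                                          uint64_t offset)
{
    unsigned idx = offset_to_zone_idx(zbd, offset);

    return idx < zbd->nr_zones ? &zbd->zones[idx] : NULL;
}
```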

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211214012413.464798-12-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Tue, 14 Dec 2021 01:24:11 +0000 (10:24 +0900)]
zbd: rename get_zone()

Rename get_zone() to zbd_get_zone() to be consistent with the naming
pattern of most zbd functions.

No functional changes.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211214012413.464798-11-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Tue, 14 Dec 2021 01:24:10 +0000 (10:24 +0900)]
zbd: rename zbd_zone_idx() and zbd_zone_nr()

Rename zbd_zone_idx() to zbd_offset_to_zone_idx() to make it clear that
the argument determining the zone is a file offset. To be consistent,
rename zbd_zone_nr() to zbd_zone_idx() to avoid confusion with a number
of zones. While at it, make both functions return the same type,
unsigned int.

No functional changes.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211214012413.464798-10-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Tue, 14 Dec 2021 01:24:09 +0000 (10:24 +0900)]
zbd: simplify zbd_open_zone()

Similarly to zbd_close_zone(), directly pass a pointer to a zone
information structure to zbd_open_zone() instead of a zone number.

No functional changes.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211214012413.464798-9-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Tue, 14 Dec 2021 01:24:08 +0000 (10:24 +0900)]
zbd: simplify zbd_close_zone()

Change the interface of zbd_close_zone() to directly use a pointer to a
zone information structure as all callers already have this information.
Also do nothing for zones that are not marked as open instead of
figuring out this fact by searching the array of open zones.
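The simplification can be illustrated with a hypothetical, heavily reduced data layout: each zone carries an `open` flag, so closing a zone that is not open is an early return, and only genuinely open zones require a search of the open-zone array. This is a sketch, not fio's zbd.c; in particular, storing zone pointers in the array is an assumption made for brevity.

```c
#include <stdbool.h>

#define MAX_OPEN_SKETCH 8

struct zone_s {
    bool open;
};

struct open_set {
    unsigned nr_open;
    struct zone_s *open_zones[MAX_OPEN_SKETCH];
};

static void close_zone_sketch(struct open_set *s, struct zone_s *z)
{
    if (!z->open)
        return;                 /* not in the open array: nothing to do */

    for (unsigned i = 0; i < s->nr_open; i++) {
        if (s->open_zones[i] != z)
            continue;
        /* Remove entry i by shifting the tail down by one slot. */
        for (unsigned j = i + 1; j < s->nr_open; j++)
            s->open_zones[j - 1] = s->open_zones[j];
        s->nr_open--;
        break;
    }
    z->open = false;
}
```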

No functional changes.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211214012413.464798-8-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Tue, 14 Dec 2021 01:24:07 +0000 (10:24 +0900)]
zbd: fix code style issues

Avoid overly long lines, remove unnecessary curly brackets and add blank
lines to make the code more readable.

No functional changes.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211214012413.464798-7-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Tue, 14 Dec 2021 01:24:06 +0000 (10:24 +0900)]
zbd: introduce zbd_zone_align_file_sizes() helper

Move the code for the innermost loop of the function zbd_verify_sizes()
to the new helper function zbd_zone_align_file_sizes(). This helper
avoids large indentation of the code in zbd_verify_sizes() and makes
the code easier to read.

No functional changes.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211214012413.464798-6-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Tue, 14 Dec 2021 01:24:05 +0000 (10:24 +0900)]
zbd: remove is_zone_open() helper

The helper function is_zone_open() is useless, as each zone has an open
flag indicating whether it is part of the array of open zones. Remove this
function and use the zone open flag in zbd_open_zone().

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211214012413.464798-5-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Tue, 14 Dec 2021 01:24:04 +0000 (10:24 +0900)]
zbd: move and cleanup code

Move zone manipulation helper functions to the beginning of the zbd.c
file to avoid forward declarations and to group these functions
together, apart from the IO manipulation functions.
Also fix function comments.

No functional changes.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211214012413.464798-4-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Tue, 14 Dec 2021 01:24:03 +0000 (10:24 +0900)]
zbd: define local functions as static

Define zbd_get_zoned_model(), zbd_report_zones(), zbd_reset_wp() and
zbd_get_max_open_zones() as static since these functions are used
locally only.

No functional changes.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211214012413.464798-3-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Tue, 14 Dec 2021 01:24:02 +0000 (10:24 +0900)]
fio: Improve documentation of ignore_zone_limits option

In the manual pages, change the description of the option
ignore_zone_limits to describe its action when set, instead of the
confusing text describing what happens when it is not set. Also add a
description of this option to the HOWTO file, where it is missing.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211214012413.464798-2-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Niklas Cassel [Tue, 14 Dec 2021 11:18:04 +0000 (11:18 +0000)]
ci: use macos 11 in virtual environment

GitHub Actions was recently enabled in commit ce1b5612ce99 ("ci: add CI
via GitHub Actions").

The commit has an AuthorDate of 2020.

The commit mentions that it uses macOS 10.15 rather than 11.0 because
11.0 was only in private preview.

This was true in 2020, but looking at:
https://docs.github.com/en/actions/reference/specifications-for-github-hosted-runners

macos-11 is no longer marked as private preview, so let's use it in the
virtual environment.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211214111756.52968-2-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Niklas Cassel [Tue, 14 Dec 2021 11:18:03 +0000 (11:18 +0000)]
ci: temporarily remove linux-i686-gcc build

GitHub Actions was recently enabled in commit ce1b5612ce99 ("ci: add CI
via GitHub Actions").

The new CI configuration was not properly tested before being merged,
as the linux-i686-gcc build currently fails for the master branch:
https://github.com/axboe/fio/actions

The problem appears to be related to ci/actions-install.sh wanting to
install broken packages on linux-i686-gcc.

The new CI configuration will also cause fio forks on GitHub to trigger
a GitHub Action (inside the forked repo) for every push.

Since the linux-i686-gcc build currently fails, this will cause error
emails to be sent out for every push to a forked repo.

In order to avoid spamming everyone who has forked fio on GitHub, let's
temporarily remove the linux-i686-gcc build until ci/actions-install.sh
specifies a working list of packages. Once that is done, this commit can
simply be reverted.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211214111756.52968-1-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago Merge branch 'github-actions' of https://github.com/sitsofe/fio
Jens Axboe [Fri, 10 Dec 2021 18:08:26 +0000 (11:08 -0700)]
Merge branch 'github-actions' of https://github.com/sitsofe/fio

* 'github-actions' of https://github.com/sitsofe/fio:
  ci: retire travis configuration
  ci: add CI via GitHub Actions

2 years ago ioengines: libzbc: disable libzbc block backend driver
Damien Le Moal [Fri, 10 Dec 2021 01:20:41 +0000 (10:20 +0900)]
ioengines: libzbc: disable libzbc block backend driver

libzbc includes 3 different internal backend drivers:
1) The block backend: this backend relies on the kernel SMR support and
   uses regular system calls.
2) The SCSI backend: this is a SG passthrough driver for SAS drives and
   for SATA drives accessible through an SMR compliant SAT (SCSI-to-ATA
   translation layer).
3) The ATA backend: this is a SG passthrough driver for SATA drives not
   handled by the system SAT (either kernel or HBA SAT).

libzbc automatically selects the internal backend driver, using the
first one that is detected as functional (tested in the same order shown
above).

When running on an SMR enabled system (SMR compliant HBA and kernel with
zoned block device support enabled), any fio job using the libzbc IO
engine will thus end up using the regular kernel IO path. This is silly:
for this IO path, the libaio or psync IO engines are far better (less
overhead and more functionality). The libzbc IO engine should be
restricted to being a passthrough-only engine, similar to the sg engine.

Fix the libzbc engine to not allow the use of libzbc block backend
driver by removing the ZBC_O_DRV_BLOCK flag when opening the device.
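The probing order and the effect of dropping the block backend can be
sketched as a small C model. This is not libzbc code: the DRV_* flags,
struct device_caps, select_backend() and engine_open_flags() are
hypothetical stand-ins for libzbc's ZBC_O_DRV_* open flags and its
internal backend probing.

```c
#include <stdbool.h>

/* Hypothetical backend flags, modelled on libzbc's ZBC_O_DRV_* flags. */
enum {
	DRV_BLOCK = 1 << 0,
	DRV_SCSI  = 1 << 1,
	DRV_ATA   = 1 << 2,
	DRV_MASK  = DRV_BLOCK | DRV_SCSI | DRV_ATA,
};

/* Which backends would work for a given device (stand-in for probing). */
struct device_caps {
	bool block_ok;	/* kernel zoned block device support */
	bool scsi_ok;	/* SG passthrough through a SCSI or SAT path */
	bool ata_ok;	/* SG passthrough with raw ATA commands */
};

/* Pick the first allowed, functional backend: block, then SCSI, then ATA. */
static int select_backend(const struct device_caps *caps, int flags)
{
	if ((flags & DRV_BLOCK) && caps->block_ok)
		return DRV_BLOCK;
	if ((flags & DRV_SCSI) && caps->scsi_ok)
		return DRV_SCSI;
	if ((flags & DRV_ATA) && caps->ata_ok)
		return DRV_ATA;
	return 0;	/* no usable backend: the open fails */
}

/* The fix, in model form: allow every backend except the block one. */
static int engine_open_flags(void)
{
	return DRV_MASK & ~DRV_BLOCK;
}
```

On an SMR-enabled system, select_backend() with all flags set would pick
the block backend, whereas the masked flags steer the same device to the
SCSI backend, and a null_blk-like device ends up with no backend at all.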

Also adjust the test script t/zbd/run-tests-against-nullb to remove the
-l option, which forces the use of the libzbc IO engine, as it no longer
works (the nullb device is neither a SCSI nor an ATA device).

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211210012041.310670-1-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago Merge branch 'arm-detect-pmull' of https://github.com/sitsofe/fio
Jens Axboe [Mon, 6 Dec 2021 20:26:52 +0000 (13:26 -0700)]
Merge branch 'arm-detect-pmull' of https://github.com/sitsofe/fio

* 'arm-detect-pmull' of https://github.com/sitsofe/fio:
  os: detect PMULL support before enabling accelerated crc32c on ARM

2 years ago os: detect PMULL support before enabling accelerated crc32c on ARM
Sitsofe Wheeler [Mon, 6 Dec 2021 20:02:53 +0000 (20:02 +0000)]
os: detect PMULL support before enabling accelerated crc32c on ARM

Issue #1239 shows a crash on a FUJITSU/A64FX ARM platform at the
following line:

crc/crc32c-arm64.c:
 64                 t1 = (uint64_t)vmull_p64(crc1, k2);

On ARMv8, PMULL crypto instructions such as vmull_p64 are defined as
optional (see
https://github.com/google/crc32c/pull/6#issuecomment-328713398 and
https://github.com/dotnet/runtime/issues/35143#issuecomment-617263508 ).

Avoid the crash by gating use of the hardware accelerated ARM crc32c
path behind runtime detection of PMULL.
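A minimal sketch of such runtime gating on Linux/arm64, assuming the
kernel's HWCAP_PMULL and HWCAP_CRC32 bits as reported by
getauxval(AT_HWCAP); the accelerated crc32c code also relies on the CRC32
instructions, so the sketch checks both. The function names are
hypothetical and this is not fio's detection code; on any other platform
the sketch simply reports no support.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#include <asm/hwcap.h>
#endif

/*
 * PMULL (and CRC32) are optional on ARMv8, so ask the kernel through the
 * auxiliary vector instead of assuming the instructions exist.
 */
static bool arm64_crc32c_usable(void)
{
#if defined(__linux__) && defined(__aarch64__)
	unsigned long hwcap = getauxval(AT_HWCAP);

	return (hwcap & HWCAP_PMULL) && (hwcap & HWCAP_CRC32);
#else
	return false;	/* not Linux/arm64: never take the ARM path */
#endif
}

/* Choose the implementation based on the runtime check. */
static const char *crc32c_impl_name(void)
{
	return arm64_crc32c_usable() ? "arm64-pmull" : "software";
}
```

On a CPU such as the A64FX from the report, the check would fail if PMULL
is not advertised, and the software path would be used instead of
crashing on vmull_p64.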

Fixes: https://github.com/axboe/fio/issues/1239

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
2 years ago libfio: drop unneeded reset of rwmix_issues
Vincent Fu [Fri, 3 Dec 2021 21:10:10 +0000 (21:10 +0000)]
libfio: drop unneeded reset of rwmix_issues

We don't need to repeatedly reset rwmix_issues inside a loop. We
actually don't need to reset it at all inside reset_all_stats() because
this is already done in reset_io_counters().

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20211203211050.51241-3-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago io_ddir: return appropriate string for DDIR_INVAL
Vincent Fu [Fri, 3 Dec 2021 21:10:10 +0000 (21:10 +0000)]
io_ddir: return appropriate string for DDIR_INVAL

When looking up the appropriate name for a data direction, make sure the
array index is non-negative. This ensures that we return an appropriate
string for DDIR_INVAL.
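A simplified sketch of the bounds check. The enum is a reduced,
hypothetical version of fio's data direction enum; what it shares with
the description above is a negative DDIR_INVAL, which must be rejected
before it is used to index the name table.

```c
/* Reduced, hypothetical data direction enum with a negative DDIR_INVAL. */
enum ddir {
	DDIR_INVAL = -1,
	DDIR_READ = 0,
	DDIR_WRITE,
	DDIR_TRIM,
	DDIR_LAST,
};

static const char *ddir_name(enum ddir d)
{
	static const char *const names[] = { "read", "write", "trim" };

	/* Reject negative values too, or DDIR_INVAL would index names[-1]. */
	if ((int)d >= 0 && (int)d < DDIR_LAST)
		return names[d];
	return "invalid";
}
```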

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20211203211050.51241-2-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago filesetup: create zbd_info before jumping to done label
Niklas Cassel [Thu, 2 Dec 2021 09:42:06 +0000 (09:42 +0000)]
filesetup: create zbd_info before jumping to done label

For a thread that has zonemode == ZONE_MODE_ZBD set, the zbd code requires
that each file (for that thread) has a valid f->zbd_info pointer.

This intent was further clarified by commit 5ddf46d0b2df ("zbd: change some
f->zbd_info conditionals to asserts").

The zbd info pointer is set by zbd_init_files(), either by creating a new
zbd_info struct, or by increasing the refcount of an existing zbd_info.

A zbd_info struct contains the in-memory state of the zones, including e.g.
each zone's wp and zone capacity.

Normally, zbd_init_files() is always called, even for read only workloads.
However, in the case where a read iolog was supplied, setup_files()
currently jumps to the done label before zbd_init_files() has been called.

Even for a read only workload, zbd_adjust_block() will do things such as
checking if the read I/O is below the wp (unless td->o.read_beyond_wp is
enabled). In order to be able to do this comparison, we need a valid
zbd_info.

There is no reason why the zbd code should treat a read only workload
differently from a read iolog workload. (E.g. the wp for the zones might
have changed since the read iolog was recorded.)

If the user for some reason wants to disregard the wp check during a read
iolog workload, the td->o.read_beyond_wp option can be used, just like in
the regular read only workload case.

Move the read iolog check and the matching "goto done" after the call to
zbd_init_files(). This way, we treat a read iolog workload similarly to a
regular read only workload, while avoiding an assertion failure in
zbd_setup_files() (which is called after the done label).
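The reordering can be modelled as below. Only the function names
setup_files(), zbd_init_files() and zbd_setup_files() come from the
commit message; the structs, fields and bodies are hypothetical
simplifications that show why the assertion in zbd_setup_files() now
holds for a read iolog workload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for fio's structures. */
struct zbd_info {
	unsigned int refcount;
};

struct file {
	struct zbd_info *zbd_info;
};

struct job {
	bool zonemode_zbd;	/* zonemode=zbd */
	bool read_iolog;	/* a read iolog was supplied */
	struct file f;		/* one file per job, for brevity */
};

static int zbd_init_files(struct job *td)
{
	td->f.zbd_info = calloc(1, sizeof(*td->f.zbd_info));
	if (!td->f.zbd_info)
		return 1;
	td->f.zbd_info->refcount = 1;
	return 0;
}

/* Runs after the done label and relies on zbd_info being set. */
static int zbd_setup_files(struct job *td)
{
	assert(td->f.zbd_info != NULL);
	return 0;
}

static int setup_files(struct job *td)
{
	/* After the fix: initialise zone state before any early exit ... */
	if (td->zonemode_zbd && zbd_init_files(td))
		return 1;

	/* ... so the read iolog shortcut no longer skips it. */
	if (td->read_iolog)
		goto done;

	/* (regular file setup would happen here) */

done:
	if (td->zonemode_zbd)
		return zbd_setup_files(td);
	return 0;
}
```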

Reported-by: Shane Moore <shane.moore@wdc.com>
Suggested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Tested-by: Shane Moore <shane.moore@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211202094153.8381-1-Niklas.Cassel@wdc.com
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago stat: make add lat percentile functions inline
Niklas Cassel [Thu, 25 Nov 2021 13:20:32 +0000 (13:20 +0000)]
stat: make add lat percentile functions inline

Now that add_lat_percentile_prio_sample() has been simplified,
make both add lat percentile functions inline, just like add_stat_sample().

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211125132020.109955-7-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago stat: simplify add_lat_percentile_prio_sample()
Niklas Cassel [Thu, 25 Nov 2021 13:20:32 +0000 (13:20 +0000)]
stat: simplify add_lat_percentile_prio_sample()

add_lat_percentile_prio_sample() currently adds both a per priority sample
and a regular sample.

Since these two samples are completely unrelated, it is very confusing that
add_lat_percentile_prio_sample() also adds a regular sample.

Remove the add_lat_percentile_sample() function call from
add_lat_percentile_prio_sample(), and let functions calling
add_lat_percentile_prio_sample() call add_lat_percentile_sample()
explicitly. This makes the flow in e.g. add_clat_sample() much easier to
follow.
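A sketch of the resulting call structure, using counters in place of
fio's latency histograms. The two add_lat_percentile_*() names come from
the commit; struct stats, the bool priority argument and the function
bodies are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, simplified stats: real fio keeps histograms per ddir. */
struct stats {
	unsigned long regular_samples;
	unsigned long high_prio_samples;
	unsigned long low_prio_samples;
};

/* Adds a regular latency percentile sample only. */
static inline void add_lat_percentile_sample(struct stats *ts)
{
	ts->regular_samples++;
}

/* After the change: adds only the per priority sample. */
static inline void add_lat_percentile_prio_sample(struct stats *ts,
						  bool high_prio)
{
	if (high_prio)
		ts->high_prio_samples++;
	else
		ts->low_prio_samples++;
}

/* The caller now makes both calls explicitly. */
static void add_clat_sample(struct stats *ts, bool high_prio)
{
	add_lat_percentile_sample(ts);
	add_lat_percentile_prio_sample(ts, high_prio);
}
```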

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211125132020.109955-6-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago stat: rename add_lat_percentile_sample_noprio()
Niklas Cassel [Thu, 25 Nov 2021 13:20:31 +0000 (13:20 +0000)]
stat: rename add_lat_percentile_sample_noprio()

add_lat_percentile_sample_noprio() is the regular function to add a latency
percentile sample. It adds a regular sample (it doesn't add any per
priority sample). Therefore, it makes sense that this function has no
suffix, neither _noprio nor _prio.

Drop the _noprio suffix from add_lat_percentile_sample_noprio(), to make it
more obvious that this function should be used if you want to add a regular
percentile sample.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211125132020.109955-5-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago stat: rename add_lat_percentile_sample()
Niklas Cassel [Thu, 25 Nov 2021 13:20:31 +0000 (13:20 +0000)]
stat: rename add_lat_percentile_sample()

The name for add_lat_percentile_sample() is confusing, since the function
actually adds a per priority percentile sample (it also adds a regular
sample), yet it doesn't have prio as part of the function name.

Rename the function so that it is more obvious that this function should
be used if you want to add a prio percentile sample.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211125132020.109955-4-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago stat: add comments describing the quirky behavior of clat prio samples
Niklas Cassel [Thu, 25 Nov 2021 13:20:30 +0000 (13:20 +0000)]
stat: add comments describing the quirky behavior of clat prio samples

Commit 56440e63ac17 ("fio: report percentiles for slat, clat, lat")
together with commit 38ec5c514104 ("stat: make priority summary statistics
consistent with percentiles") made the per prio stats track either
completion latency (clat) or total latency (lat), depending on the option
lat_percentiles.

It is not obvious why add_clat_sample() shouldn't add a high/low clat prio
sample when option lat_percentiles is set, especially considering that
option lat_percentiles is usually used for controlling if total latency
percentiles should be displayed or not.

Add comments to describe why add_clat_sample() has to care about option
lat_percentiles.
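The quirk can be summarised with a small model: the per priority
statistics receive either completion latency or total latency samples,
never both, depending on lat_percentiles. struct prio_model and its
fields are hypothetical; only the option and function names come from the
commit message.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-job state. */
struct prio_model {
	bool lat_percentiles;		/* the lat_percentiles option */
	unsigned long prio_samples;	/* per prio samples (cf. clat_prio_stat) */
	unsigned long long prio_sum_nsec;
};

/* Completion latency: feeds per prio stats only when lat_percentiles=0. */
static void add_clat_sample(struct prio_model *m, unsigned long long clat_nsec)
{
	/* (regular clat accounting, always done, omitted) */
	if (!m->lat_percentiles) {
		m->prio_samples++;
		m->prio_sum_nsec += clat_nsec;
	}
}

/* Total latency: feeds per prio stats only when lat_percentiles=1. */
static void add_lat_sample(struct prio_model *m, unsigned long long lat_nsec)
{
	/* (regular lat accounting, always done, omitted) */
	if (m->lat_percentiles) {
		m->prio_samples++;
		m->prio_sum_nsec += lat_nsec;
	}
}
```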

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211125132020.109955-3-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago docs: document quirky implementation of per priority stats reporting
Niklas Cassel [Thu, 25 Nov 2021 13:20:30 +0000 (13:20 +0000)]
docs: document quirky implementation of per priority stats reporting

Commit 56440e63ac17 ("fio: report percentiles for slat, clat, lat") changed
many things. One of the changes, from the commit message:
"- for the new cmdprio_percentage latencies, if lat_percentiles=1,
*total* latency percentiles will be tracked. Otherwise, *completion*
latency percentiles will be tracked."

In other words, the commit changed the per prio stats from always tracking
(and reporting) clat latency, to instead either track (and report) clat or
lat latency.

Considering that a certain latency type reports two things:
1) min/max/avg latency for the specific latency type
2) latency percentiles for the specific latency type

If disable_clat/disable_lat is used, neither 1) nor 2) will be reported.
If clat_percentiles/lat_percentiles is false, 2) will not be reported.

Therefore it is unintuitive that setting lat_percentiles=1, an option
usually used to enable/disable percentile reporting, also affects which
type of latency is tracked (and reported) for per prio stats.

The fact that the variables are named e.g. clat_prio_stat, regardless of
the type of latency being tracked, does not help.

Anyway, let's document the way that the current implementation works,
so that a user can know how per priority stats are handled, without having
to read the source, since the commit that introduced this behavior forgot
to update the documentation.

Fixes: 56440e63ac17 ("fio: report percentiles for slat, clat, lat")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211125132020.109955-2-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago Merge branch 'fix-parse-sync-file-range' of https://github.com/oleglatin/fio
Jens Axboe [Wed, 24 Nov 2021 17:27:20 +0000 (10:27 -0700)]
Merge branch 'fix-parse-sync-file-range' of https://github.com/oleglatin/fio

* 'fix-parse-sync-file-range' of https://github.com/oleglatin/fio:
  parse: handle comma-separated options

2 years ago parse: handle comma-separated options
Oleg Latin [Wed, 24 Nov 2021 17:17:04 +0000 (20:17 +0300)]
parse: handle comma-separated options

The option parser does not properly handle the 'sync_file_range' option
with multiple flags. This was because opt_len() only used ':' as a
delimiter, so only the last flag in a comma-separated list had an effect.

This patch adds ',' as a delimiter. All flags are now correctly ORed.
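A minimal sketch of the delimiter handling, using strcspn() as a
stand-in for the patched opt_len() (fio's own implementation may be
written differently). The flag names wait_before, write and wait_after
are the values fio's sync_file_range option accepts, with an optional
":interval" suffix; the SFR_* bit values and parse_sfr_flags() are
hypothetical placeholders.

```c
#include <string.h>

/* Length of the leading token, stopping at ',' or ':'. */
static size_t opt_len(const char *str)
{
	return strcspn(str, ",:");
}

/* Hypothetical bits standing in for the real sync_file_range flags. */
enum {
	SFR_WAIT_BEFORE = 1 << 0,
	SFR_WRITE       = 1 << 1,
	SFR_WAIT_AFTER  = 1 << 2,
};

/* OR together every flag in "flag[,flag...][:interval]". */
static unsigned int parse_sfr_flags(const char *list)
{
	unsigned int flags = 0;

	for (;;) {
		size_t len = opt_len(list);

		if (len == strlen("wait_before") && !strncmp(list, "wait_before", len))
			flags |= SFR_WAIT_BEFORE;
		else if (len == strlen("write") && !strncmp(list, "write", len))
			flags |= SFR_WRITE;
		else if (len == strlen("wait_after") && !strncmp(list, "wait_after", len))
			flags |= SFR_WAIT_AFTER;

		list += len;
		if (*list != ',')	/* end of string, or ':' before the interval */
			break;
		list++;			/* skip the comma */
	}
	return flags;
}
```

With ':' as the only delimiter, "wait_before,write" would be measured as
a single token, which is the kind of mis-split the patch addresses.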

Fixes: https://github.com/axboe/fio/issues/1234
Signed-off-by: Oleg Latin <oleglatin@yandex-team.ru>
2 years ago t/dedupe: style fixups
Jens Axboe [Sun, 21 Nov 2021 13:51:11 +0000 (06:51 -0700)]
t/dedupe: style fixups

Some introduced by a recent patch, some old. Be more consistent with the
fio coding style.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago t/io_uring: fix 32-bit compile warnings
Jens Axboe [Sun, 21 Nov 2021 13:50:22 +0000 (06:50 -0700)]
t/io_uring: fix 32-bit compile warnings

We need to use a 64-bit cast for the shift into the user_data, and
fix the init of minv in the clat percentile calculation.
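A sketch of the kind of packing such a cast protects, using hypothetical
helper names and a hypothetical layout (t/io_uring's exact user_data
layout is not shown here). On 32-bit platforms unsigned long is 32 bits
wide, so shifting it left by 32 without first widening it to 64 bits is
undefined behaviour.

```c
#include <stdint.h>

/*
 * Hypothetical layout: an index in the upper 32 bits of the 64-bit
 * user_data, a per-request value in the lower 32 bits.
 */
static uint64_t pack_user_data(unsigned long index, uint32_t low)
{
	/* Widen first: on 32-bit targets, index << 32 alone is undefined. */
	return ((uint64_t)index << 32) | low;
}

static unsigned long user_data_index(uint64_t data)
{
	return (unsigned long)(data >> 32);
}

static uint32_t user_data_low(uint64_t data)
{
	return (uint32_t)data;
}
```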

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago Merge branch 'dedupe_and_compression' of https://github.com/bardavid/fio
Jens Axboe [Sun, 21 Nov 2021 13:43:05 +0000 (06:43 -0700)]
Merge branch 'dedupe_and_compression' of https://github.com/bardavid/fio

* 'dedupe_and_compression' of https://github.com/bardavid/fio:
  fio-dedup: adjusted the binary to support compression
  Mixed dedup and compression

2 years ago fio-dedup: adjusted the binary to support compression
Bar David [Wed, 10 Nov 2021 06:56:47 +0000 (08:56 +0200)]
fio-dedup: adjusted the binary to support compression

When given the -C 1 option, fio-dedup will first look for
dedupable data and then calculate compressible (unique)
data opportunity. The rationale is to measure the total
data reduction potential.

Signed-off-by: Bar David <bardavvid@gmail.com>
2 years ago Mixed dedup and compression
Bar David [Sun, 24 Oct 2021 10:59:50 +0000 (13:59 +0300)]
Mixed dedup and compression

Introduce support for using dedupe and compression
in the same job. When used together, compression is
calculated from the unique capacity. E.g. when using
dedupe_percentage=50 and buffer_compress_percentage=50,
the total reduction should be 75%: 50% would be deduped,
while 50% of the remaining buffers would be compressed.
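The example arithmetic generalises, assuming compression applies
linearly to whatever dedupe leaves unique:
total = dedupe + (100 - dedupe) * compress / 100, which gives
50 + 50 * 50 / 100 = 75 for the case above. A minimal helper expressing
that assumption (hypothetical, not part of fio):

```c
/*
 * Expected total data reduction (percent) when dedupe and compression
 * are combined and compression applies only to the unique data.
 */
static double total_reduction_pct(double dedupe_pct, double compress_pct)
{
	double unique_pct = 100.0 - dedupe_pct;	/* data left after dedupe */

	return dedupe_pct + unique_pct * compress_pct / 100.0;
}
```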

Signed-off-by: Bar David <bardavvid@gmail.com>
2 years ago Sync io_uring header with the kernel
Jens Axboe [Sat, 20 Nov 2021 14:31:20 +0000 (07:31 -0700)]
Sync io_uring header with the kernel

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago io_uring: clamp CQ size to SQ size
Jens Axboe [Sat, 20 Nov 2021 14:27:57 +0000 (07:27 -0700)]
io_uring: clamp CQ size to SQ size

By default, io_uring uses twice as big a CQ ring as the SQ ring. That's
to help with cases where completions can come in unexpectedly. This is not
the case for storage IO, so just clamp the CQ size to save a bit of memory
on the CQEs and CQ ring.
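With the raw io_uring_setup(2) interface, this kind of clamp is expressed
through the IORING_SETUP_CQSIZE flag and the cq_entries field of struct
io_uring_params. The sketch below only prepares the parameters (the
helper name is hypothetical); the kernel may round cq_entries up to a
power of two.

```c
#include <string.h>
#include <linux/io_uring.h>

/*
 * Prepare io_uring_setup(2) parameters so that the CQ ring has the same
 * number of entries as the SQ ring instead of the default 2x.
 */
static void prep_ring_params(struct io_uring_params *p, unsigned int depth)
{
	memset(p, 0, sizeof(*p));
	p->flags |= IORING_SETUP_CQSIZE;	/* honour cq_entries below */
	p->cq_entries = depth;			/* clamp CQ size to SQ size */
}
```

The prepared struct would then be passed along with the SQ depth, e.g.
syscall(__NR_io_uring_setup, depth, &p).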

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago t/io_uring: add -R option for random/sequential IO
Jens Axboe [Fri, 19 Nov 2021 17:44:15 +0000 (10:44 -0700)]
t/io_uring: add -R option for random/sequential IO

If -R1 is used, which is the default, then a random IO pattern is used.
If -R0 is used, then the IO will be sequential.
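A sketch of how such a switch might choose block-aligned offsets. The
struct, its fields and next_offset() are hypothetical, and xorshift64
stands in for whatever generator t/io_uring actually uses; sequential
mode simply wraps around at the end of the file.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-submitter state for picking the next I/O offset. */
struct offset_gen {
	bool random_io;		/* -R1 (default) vs -R0 */
	uint64_t file_size;	/* bytes */
	uint64_t bs;		/* block size, bytes */
	uint64_t next_seq;	/* next sequential offset */
	uint64_t rng;		/* xorshift64 state; must be non-zero */
};

/* xorshift64: a stand-in PRNG, not fio's generator. */
static uint64_t xorshift64(uint64_t *s)
{
	uint64_t x = *s;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return *s = x;
}

/* Return a block-aligned offset, random or sequential (wrapping at EOF). */
static uint64_t next_offset(struct offset_gen *g)
{
	uint64_t nr_blocks = g->file_size / g->bs;
	uint64_t off;

	if (g->random_io)
		return (xorshift64(&g->rng) % nr_blocks) * g->bs;

	off = g->next_seq;
	g->next_seq += g->bs;
	if (g->next_seq >= nr_blocks * g->bs)
		g->next_seq = 0;
	return off;
}
```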

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago t/io_uring: use internal random generator
Jens Axboe [Fri, 19 Nov 2021 17:40:20 +0000 (10:40 -0700)]
t/io_uring: use internal random generator

Instead of using lrand48_r, use the internal fio random number generator.
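fio's internal generator (lib/rand.c) is based on combined Tausworthe
generators. For illustration, a standalone taus88 (L'Ecuyer's
three-component variant) is sketched below; it is not fio's code, and
the struct and function names are hypothetical. As with any such
generator, the state is a small plain struct that each submitter can own.

```c
#include <stdint.h>

/* taus88 state; seeds must satisfy s1 > 1, s2 > 7, s3 > 15. */
struct taus88_state {
	uint32_t s1, s2, s3;
};

#define TAUSWORTHE(s, a, b, c, d) \
	((((s) & (c)) << (d)) ^ ((((s) << (a)) ^ (s)) >> (b)))

/* Advance all three components and combine them with XOR. */
static uint32_t taus88_next(struct taus88_state *st)
{
	st->s1 = TAUSWORTHE(st->s1, 13, 19, 4294967294U, 12);
	st->s2 = TAUSWORTHE(st->s2, 2, 25, 4294967288U, 4);
	st->s3 = TAUSWORTHE(st->s3, 3, 11, 4294967280U, 17);
	return st->s1 ^ st->s2 ^ st->s3;
}
```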

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago fio: Introduce the log_entries option
Damien Le Moal [Thu, 18 Nov 2021 05:27:29 +0000 (14:27 +0900)]
fio: Introduce the log_entries option

When iops, latency, or bw logging options are used, fio will by default
log information for any I/O that completes. The initial number of I/O
log entries is 1024, as defined by DEF_LOG_ENTRIES. When all log
entries are used, new log entries are dynamically allocated by
get_new_log(). This dynamic log entry allocation can negatively impact
time-related statistics such as the I/O tail latencies (e.g. 99.9
percentile completion latency) as growing the logs causes a temporary
I/O stall (IO quiesce), which disturbs the workload steady state. The
effect of this is especially noticeable with workloads using IO
priorities: the tail latencies of high priority I/Os increase if the
IO log needs to be grown.

For example, running the following fio command on a SATA disk
supporting NCQ priority:

fio --name=prio-randread --filename=/dev/sdg \
    --random_generator=tausworthe64 --ioscheduler=none \
    --write_lat_log=randread.log --log_prio=1 --rw=randread --bs=128k \
    --ioengine=libaio --iodepth=32 --direct=1 --cmdprio_class=1 \
    --cmdprio_percentage=30 --runtime=900

(128KB random read workload at QD=32 and 30% of commands issued with a
high priority), with an initial number of log entries equal to the
default of 1024, the completion latency statistics may, depending on the
machine memory state, show imprecise percentiles such as those below.

high prio (30.75%) clat percentiles (msec):
 |  1.00th=[   14],  5.00th=[   17], 10.00th=[   20], 20.00th=[   23],
 | 30.00th=[   27], 40.00th=[   32], 50.00th=[   40], 60.00th=[   53],
 | 70.00th=[   71], 80.00th=[  104], 90.00th=[  169], 95.00th=[  243],
 | 99.00th=[  514], 99.50th=[  676], 99.90th=[ 1485], 99.95th=[ 1502],
 | 99.99th=[ 1552]
low prio (69.25%) clat percentiles (msec):
 |  1.00th=[   16],  5.00th=[   24], 10.00th=[   37], 20.00th=[   68],
 | 30.00th=[  105], 40.00th=[  146], 50.00th=[  199], 60.00th=[  255],
 | 70.00th=[  330], 80.00th=[  439], 90.00th=[  592], 95.00th=[  718],
 | 99.00th=[  885], 99.50th=[  986], 99.90th=[ 1469], 99.95th=[ 1536],
 | 99.99th=[ 1586]

All completion latency percentiles above the 99.90th percentile are
similar for the high and low priority commands, which is not consistent
with the drive's expected execution of prioritized read commands.

To solve this issue and get more precise latency statistics, this patch
introduces the new "log_entries" option to allow specifying a larger
initial number of IO log entries to avoid run-time allocation.
This option value defaults to DEF_LOG_ENTRIES and its maximum value is
MAX_LOG_ENTRIES to be consistent with get_new_log() allocation. Also
simplify get_new_log() by using calloc() instead of malloc(), thus
removing the need for the local variable new_size.
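A sketch of the allocation side. DEF_LOG_ENTRIES mirrors the 1024 quoted
above; the MAX_LOG_ENTRIES value, the structs, the helpers and the choice
that 0 means "use the default" are hypothetical.

```c
#include <stdlib.h>

#define DEF_LOG_ENTRIES 1024				/* default quoted above */
#define MAX_LOG_ENTRIES (1024 * DEF_LOG_ENTRIES)	/* hypothetical cap */

/* Hypothetical log sample and log container. */
struct log_sample {
	unsigned long long time;
	unsigned long long val;
};

struct log_sketch {
	size_t max_samples;
	size_t nr_samples;
	struct log_sample *samples;
};

/* Map the log_entries option to a usable size (0 means default here). */
static size_t clamp_log_entries(size_t req)
{
	if (req == 0)
		return DEF_LOG_ENTRIES;
	if (req > MAX_LOG_ENTRIES)
		return MAX_LOG_ENTRIES;
	return req;
}

/* Preallocate the log so that, if the run fits, it never grows mid-run. */
static int log_init(struct log_sketch *l, size_t log_entries)
{
	l->max_samples = clamp_log_entries(log_entries);
	l->nr_samples = 0;
	/* calloc() allocates and zeroes count * size in one call. */
	l->samples = calloc(l->max_samples, sizeof(*l->samples));
	return l->samples ? 0 : -1;
}
```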

Adding the "--log_entries=65536" option to the previous command line
example, the completion latency results obtained are more stable:

high prio (30.72%) clat percentiles (msec):
 |  1.00th=[   15],  5.00th=[   17], 10.00th=[   19], 20.00th=[   22],
 | 30.00th=[   24], 40.00th=[   27], 50.00th=[   32], 60.00th=[   36],
 | 70.00th=[   46], 80.00th=[   57], 90.00th=[   81], 95.00th=[  105],
 | 99.00th=[  161], 99.50th=[  188], 99.90th=[  271], 99.95th=[  275],
 | 99.99th=[  363]
low prio (69.28%) clat percentiles (msec):
 |  1.00th=[   16],  5.00th=[   27], 10.00th=[   43], 20.00th=[   80],
 | 30.00th=[  123], 40.00th=[  176], 50.00th=[  236], 60.00th=[  313],
 | 70.00th=[  401], 80.00th=[  506], 90.00th=[  634], 95.00th=[  718],
 | 99.00th=[  844], 99.50th=[  885], 99.90th=[  953], 99.95th=[  995],
 | 99.99th=[ 1053]

All completion percentiles clearly now show shorter latencies for high
priority commands, as expected. The 99.99th percentile for low priority
commands is also improved compared to the previous case, as the
measurements are no longer impacted by dynamic log allocation.

Suggested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211118052729.132423-1-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago Makefile: Fix android compilation
Gwendal Grignou [Wed, 17 Nov 2021 22:19:18 +0000 (14:19 -0800)]
Makefile: Fix android compilation

Include cmdprio for Android as well.
Without the patch, make for Android fails at link time:

engines/io_uring.c:824: error: undefined reference to 'fio_cmdprio_init'
engines/io_uring.c:456: error: undefined reference to 'fio_cmdprio_set_ioprio'
...

Fixes: e27b9ff0e ("cmdprio: move cmdprio function definitions to a new cmdprio.c file")

Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
Link: https://lore.kernel.org/r/20211117221918.3050439-1-gwendal@chromium.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago Merge branch 'jf_readme_typo' of https://github.com/jfpanisset/fio
Jens Axboe [Fri, 12 Nov 2021 16:22:21 +0000 (09:22 -0700)]
Merge branch 'jf_readme_typo' of https://github.com/jfpanisset/fio

* 'jf_readme_typo' of https://github.com/jfpanisset/fio:
  Small typo fix

2 years ago libaio,io_uring: make it possible to cleanup cmdprio malloced data
Niklas Cassel [Fri, 12 Nov 2021 09:54:44 +0000 (09:54 +0000)]
libaio,io_uring: make it possible to cleanup cmdprio malloced data

The way that fio currently handles engine options:
options_free() will call free() only for options that have the type
FIO_OPT_STR_STORE. This means that any option that has a pointer in
either td->o or td->eo, which is not of type FIO_OPT_STR_STORE, will
leak memory. This is true even for numjobs == 1.

When running with numjobs > 1, fio_options_mem_dupe() will memcpy
td->eo into the new td. Since off1 of the pointers in the first td
has already been set, the pointers in the new td will point to the
same data. (Regardless, options_free() will never try to free the
memory, for either td.) Neither can we manually free the memory in
cleanup(), since the other td will still point to the same memory,
so this would lead to a double free.

These memory leaks are reported by e.g. valgrind.

The most obvious way to solve this is to put dynamically allocated
memory in {ioring,libaio}_data instead of {ioring,libaio}_options.

This solves the problem since {ioring,libaio}_data is dynamically
allocated by each td during the ioengine init callback, and is freed
when the ioengine cleanup callback for that td is called.

The downside of this is that the parsing has to be done in
fio_cmdprio_init() instead of in the option .cb callback, since the
.cb callback is called before {ioring,libaio}_data is available.

This patch keeps the static cmdprio options in
{ioring,libaio}_options, but moves the dynamically allocated memory
needed by cmdprio to {ioring,libaio}_data.
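A sketch of the ownership rule. A struct copy of the options (as when
td->eo is duplicated for numjobs > 1) would copy any pointer inside it,
so both jobs would share one allocation; allocating in the engine init
callback and freeing in cleanup gives each job its own memory. All names
here are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical engine options: copied as-is into every job. */
struct engine_options {
	unsigned int cmdprio_percentage;	/* plain value, safe to copy */
};

/* Hypothetical per-job engine data: created in init, freed in cleanup. */
struct engine_data {
	unsigned int *perc;			/* per ddir: read, write */
};

static struct engine_data *engine_init(const struct engine_options *o)
{
	struct engine_data *ld = calloc(1, sizeof(*ld));

	if (!ld)
		return NULL;
	/* Allocate here, where per-job data exists, not in the option callback. */
	ld->perc = calloc(2, sizeof(*ld->perc));
	if (!ld->perc) {
		free(ld);
		return NULL;
	}
	ld->perc[0] = ld->perc[1] = o->cmdprio_percentage;
	return ld;
}

static void engine_cleanup(struct engine_data *ld)
{
	if (!ld)
		return;
	free(ld->perc);		/* no other job points at this memory */
	free(ld);
}
```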

No cmdprio related memory leaks are reported after this patch.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-9-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago cmdprio: add mode to make the logic easier to reason about
Niklas Cassel [Fri, 12 Nov 2021 09:54:43 +0000 (09:54 +0000)]
cmdprio: add mode to make the logic easier to reason about

Add a new field "mode", in order to know if we are determining IO
priorities according to cmdprio_percentage or to cmdprio_bssplit.

This makes the logic easier to reason about, and allows us to
remove the "use_cmdprio" variable from the ioengines themselves.
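A sketch of deriving the mode once from the parsed options, so engines
can check a single value. The enum constant names, the struct and
cmdprio_get_mode() are illustrative; the arrays only have read and write
slots, since the cmdprio options only support those data directions.

```c
#include <stdbool.h>

/* Illustrative mode names. */
enum cmdprio_mode {
	CMDPRIO_MODE_NONE,
	CMDPRIO_MODE_PERC,
	CMDPRIO_MODE_BSSPLIT,
};

#define CMDPRIO_RWDIR_CNT 2	/* read, write */

struct cmdprio_opts {
	unsigned int percentage[CMDPRIO_RWDIR_CNT];	/* cmdprio_percentage */
	unsigned int bssplit_nr[CMDPRIO_RWDIR_CNT];	/* cmdprio_bssplit entries */
};

/* Decide once which option drives priorities. */
static enum cmdprio_mode cmdprio_get_mode(const struct cmdprio_opts *o)
{
	bool has_perc = false, has_bssplit = false;
	int ddir;

	for (ddir = 0; ddir < CMDPRIO_RWDIR_CNT; ddir++) {
		if (o->percentage[ddir])
			has_perc = true;
		if (o->bssplit_nr[ddir])
			has_bssplit = true;
	}

	/* This sketch prefers percentage if both happen to be set. */
	if (has_perc)
		return CMDPRIO_MODE_PERC;
	if (has_bssplit)
		return CMDPRIO_MODE_BSSPLIT;
	return CMDPRIO_MODE_NONE;
}
```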

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-8-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago libaio,io_uring: move common cmdprio_prep() code to cmdprio
Niklas Cassel [Fri, 12 Nov 2021 09:54:42 +0000 (09:54 +0000)]
libaio,io_uring: move common cmdprio_prep() code to cmdprio

Move common cmdprio_prep() code to cmdprio.c to avoid code duplication.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-7-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago libaio,io_uring: rename prio_prep() to include cmdprio in the name
Niklas Cassel [Fri, 12 Nov 2021 09:54:41 +0000 (09:54 +0000)]
libaio,io_uring: rename prio_prep() to include cmdprio in the name

The default priority (which is either 0 or the value set by the "prio"
and "prioclass" options) will now be used regardless of whether
prio_prep() is called or not. This is true for both libaio and io_uring.

The way to think about it is that prio_prep() is only called if
cmdprio_percentage/cmdprio_bssplit is used.

prio_prep() might then override the default priority, if the random
value happens to say that this I/O should use the cmdprio_value,
rather than the default priority.

Rename the prio_prep() functions to highlight that these functions
are now only called if cmdprio is used. (If only option
"prio"/"prioclass" is used, that is handled elsewhere.)

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-6-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago io_uring: set async IO priority to td->ioprio in fio_ioring_prep()
Niklas Cassel [Fri, 12 Nov 2021 09:54:41 +0000 (09:54 +0000)]
io_uring: set async IO priority to td->ioprio in fio_ioring_prep()

The default priority (which is either 0 or the value set by "prio" and
"prioclass" options) is now saved in td->ioprio.

The simplest thing is therefore to unconditionally set the async IO
priority to td->ioprio in fio_ioring_prep(), and let fio_ioring_prio_prep()
only handle the case where cmdprio_percentage/cmdprio_bssplit is enabled.

Therefore, fio_ioring_prio_prep() doesn't need to care whether
prio/prioclass was enabled or not; we can simply think of
fio_ioring_prio_prep() as something that might "override" the default
priority, whatever the default priority may be.

Doing it this way also has the advantage that the prio_prep() function
in io_uring will now look identical to the prio_prep() function in
libaio.
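The resulting flow can be sketched as follows: the default priority is
always applied first, and the cmdprio step only runs, and possibly
overrides it, when cmdprio is in use. The structs and function names are
hypothetical, and rand_pct (in [0, 100)) is passed in rather than drawn
so the behaviour is deterministic.

```c
/* Hypothetical, simplified request and job state. */
struct req {
	unsigned short ioprio;
};

struct job_prio {
	unsigned short ioprio;			/* default: 0 or prio/prioclass */
	unsigned short cmdprio_value;		/* priority chosen by cmdprio */
	unsigned int cmdprio_percentage;	/* 0 means cmdprio is unused */
};

/* Only called when cmdprio is in use; may override the default. */
static void cmdprio_prio_prep(const struct job_prio *td, struct req *r,
			      unsigned int rand_pct)
{
	if (rand_pct < td->cmdprio_percentage)
		r->ioprio = td->cmdprio_value;
}

/* Always start from the default, then let cmdprio override it. */
static void prep(const struct job_prio *td, struct req *r,
		 unsigned int rand_pct)
{
	r->ioprio = td->ioprio;
	if (td->cmdprio_percentage)
		cmdprio_prio_prep(td, r, rand_pct);
}
```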

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-5-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago cmdprio: do not allocate memory for unused data direction
Niklas Cassel [Fri, 12 Nov 2021 09:54:40 +0000 (09:54 +0000)]
cmdprio: do not allocate memory for unused data direction

All cmdprio options only support data directions read and write.
However, each cmdprio option allocates memory for ddir trim as well,
even though nothing is ever written to this memory.

Change this so that we don't allocate memory for something which is
never used.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-4-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago cmdprio: move cmdprio function definitions to a new cmdprio.c file
Niklas Cassel [Fri, 12 Nov 2021 09:54:40 +0000 (09:54 +0000)]
cmdprio: move cmdprio function definitions to a new cmdprio.c file

Move cmdprio function definitions from the cmdprio.h header file to a new
cmdprio.c file, such that we can add new static functions to cmdprio.c.

A follow up patch will add new cmdprio functions which do not need to be
directly accessible by ioengines.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-3-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago docs: update cmdprio_percentage documentation
Niklas Cassel [Fri, 12 Nov 2021 09:54:39 +0000 (09:54 +0000)]
docs: update cmdprio_percentage documentation

Commit 1437d6357429 ("libaio,io_uring: relax cmdprio_percentage
constraints") relaxed the cmdprio_percentage constraints such that
cmdprio_percentage and prioclass/prio could be used together.

However, it forgot to remove the mention of this constraint from
the docs. Update the docs to reflect the new behavior.

Fixes: 1437d6357429 ("libaio,io_uring: relax cmdprio_percentage constraints")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-2-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago Small typo fix
Jean-Francois Panisset [Fri, 12 Nov 2021 03:56:30 +0000 (19:56 -0800)]
Small typo fix

Signed-off-by: Jean-Francois Panisset <panisset@gmail.com>
2 years ago stat: create a init_thread_stat_min_vals() helper
Niklas Cassel [Mon, 8 Nov 2021 13:12:09 +0000 (13:12 +0000)]
stat: create a init_thread_stat_min_vals() helper

Create an init_thread_stat_min_vals() helper so that we can remove
duplicated code.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211108131143.80158-1-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago Merge branch 'evelu-peak' of https://github.com/ErwanAliasr1/fio
Jens Axboe [Mon, 25 Oct 2021 18:38:35 +0000 (12:38 -0600)]
Merge branch 'evelu-peak' of https://github.com/ErwanAliasr1/fio

* 'evelu-peak' of https://github.com/ErwanAliasr1/fio:
  t/one-core-peak: Don't report errors if missing NVME features
  t/io_uring: Fixing typo in help message
  t/one-core-peak: Reporting SElinux status

2 years ago t/one-core-peak: Don't report errors if missing NVME features
Erwan Velu [Sun, 17 Oct 2021 20:00:02 +0000 (22:00 +0200)]
t/one-core-peak: Don't report errors if missing NVME features

Some NVMe drives don't support some features, in which case an error
message is reported, as in the following example:
NVMe status: INVALID_FIELD: A reserved coded value or an unsupported value in a defined field(0x4002)
nvme2n1: Temp:26 C, Autonomous Power State Transition:, PowerState:0, Completion Queues:135, Submission Queues:135

With this commit, features are only reported if available:
nvme2n1: Completion Queues:135, Submission Queues:135, PowerState:0, Temp:27 C

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years ago t/io_uring: Fixing typo in help message
Erwan Velu [Sun, 17 Oct 2021 19:44:53 +0000 (21:44 +0200)]
t/io_uring: Fixing typo in help message

Commit a71ad043a3f4a introduced DMA pre-mapping support but made a
typo in the help message.

This option is enabled via -D, not -R.

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years ago t/one-core-peak: Reporting SElinux status
Erwan Velu [Sun, 17 Oct 2021 19:18:40 +0000 (21:18 +0200)]
t/one-core-peak: Reporting SElinux status

SELinux can influence overall performance, so report its state.

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years agoMerge branch 'master' of https://github.com/bvanassche/fio
Jens Axboe [Fri, 22 Oct 2021 16:19:04 +0000 (10:19 -0600)]
Merge branch 'master' of https://github.com/bvanassche/fio

* 'master' of https://github.com/bvanassche/fio:
  Android: Add io_uring support

2 years agoAndroid: Add io_uring support
Bart Van Assche [Thu, 21 Oct 2021 21:41:40 +0000 (14:41 -0700)]
Android: Add io_uring support

This patch has been tested on a recent Android phone. Compilation of this
patch has been verified as follows:

    NDK=/usr/lib/android-ndk
    export LIBS="-landroid"
    export UNAME=Android
    for ((i=23;i<=30;i++)); do
        echo "==== i = $i ===="
        export CC=$NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android${i}-clang
        [ -e "$CC" ] || continue
        ./configure && make -j$(nproc) fio || break
    done

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
2 years agoMerge branch 'patch-1' of https://github.com/sweettea/fio
Jens Axboe [Tue, 19 Oct 2021 22:09:21 +0000 (16:09 -0600)]
Merge branch 'patch-1' of https://github.com/sweettea/fio

* 'patch-1' of https://github.com/sweettea/fio:
  t/fuzz: Clean up generated dependency makefiles

2 years agot/fuzz: Clean up generated dependency makefiles
Sweet Tea Dorminy [Tue, 19 Oct 2021 20:31:27 +0000 (16:31 -0400)]
t/fuzz: Clean up generated dependency makefiles

Currently, the 'clean' target cleans up the t/ directory but not its
subdirectories. Because t/fuzz contains C files, dependency makefiles are
generated there as well and should be cleaned up too.

Signed-off-by: Sweet Tea Dorminy <sweettea@dorminy.me>
2 years agoMerge branch 'fixes_1290' of https://github.com/rthardin/fio
Jens Axboe [Tue, 19 Oct 2021 01:29:46 +0000 (19:29 -0600)]
Merge branch 'fixes_1290' of https://github.com/rthardin/fio

* 'fixes_1290' of https://github.com/rthardin/fio:
  Use min_bs in rate_process=poisson

2 years agoUse min_bs in rate_process=poisson
Ryan Hardin [Mon, 18 Oct 2021 20:43:22 +0000 (16:43 -0400)]
Use min_bs in rate_process=poisson

This fixes an issue where IOPS targets were not met
when the `bs` parameter was not given explicitly, such
as when using `bssplit`.

Fixes #1290

Signed-off-by: Ryan Hardin <ryan.hardin@nutanix.com>
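With rate_process=poisson, fio spaces I/Os so that the gaps between them are exponentially distributed around a mean derived from the target rate. A minimal sketch of that calculation, using the smallest block size (min_bs) to turn a byte rate into an IOPS target, is shown below. poisson_gap_usec() is a hypothetical helper, not fio's code; to keep it free of libm, the caller supplies a unit-mean exponential variate (for example -log(u) for u uniform in (0, 1]). It assumes, as the commit implies, that min_bs is set even when block sizes come from bssplit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: microseconds to wait before the next I/O when
 * rate_bps bytes/s are issued as a Poisson process. The IOPS target is
 * derived from min_bs rather than a single bs value.
 *
 * exp_sample: unit-mean exponential variate chosen by the caller,
 * e.g. -log(u) with u uniform in (0, 1].
 */
uint64_t poisson_gap_usec(uint64_t rate_bps, uint64_t min_bs,
			  double exp_sample)
{
	assert(min_bs > 0 && exp_sample >= 0.0);
	uint64_t iops = rate_bps / min_bs;	/* mean I/Os per second */
	assert(iops > 0);
	double mean_usec = 1000000.0 / (double)iops;
	return (uint64_t)(mean_usec * exp_sample);
}
```

For a 4096000 B/s target, a min_bs of 4096 gives a mean gap of 1000 usec, while a min_bs of 512 gives 125 usec. The block size used in this division therefore directly determines whether the IOPS target is met.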
2 years agorun-fio-tests: make test runs more resilient
Vincent Fu [Tue, 21 Sep 2021 21:27:11 +0000 (21:27 +0000)]
run-fio-tests: make test runs more resilient

Catch exceptions that occur during test setup/running/evaluation. This
makes it more likely that the entire test suite can run to completion
even if some tests fail in an unexpected fashion.

In particular, I have seen failures in FioJobTest_t0014() when the test
is run on a bare-metal machine. Without this patch, these failures halt
the entire script.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20210921212639.61319-1-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/zbd: Add -w option to ensure no open zone before write tests
Shin'ichiro Kawasaki [Wed, 13 Oct 2021 06:09:03 +0000 (15:09 +0900)]
t/zbd: Add -w option to ensure no open zone before write tests

Commit b34eb155e4a6 ("t/zbd: Reset all zones before test when max
open zones is specified") introduced the -o max_open_zones option to the
script t/zbd/test-zbd-support. The script passes the max_open_zones value
to fio and resets all zones of the test target device before each test
case that performs write operations. This reset ensures that no zone
outside the IO range is left open, so write operations do not exceed the
max_open_zones limit.

On the other hand, since commit d2f442bc0bd5 ("ioengines: add
get_max_open_zones zoned block device operation"), fio fetches the
max_open_zones value automatically, so it is no longer necessary to pass
it from the script to fio. To simplify script usage, introduce the -w
option, which does not require a max_open_zones value. This option just
resets zones before test cases that perform write operations.

Note that since commit 954217b90191 ("zbd: Initialize open zones list
referring zone status at fio start"), fio itself resets zones that exceed
the max_open_zones limit, but only zones within the fio IO range. The
test script therefore still needs to reset zones outside the IO range.
fio does not reset zones outside the IO range because doing so may cause
unexpected data erasure.

Suggested-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211013060903.166543-6-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/zbd: Align block size to zone capacity
Shin'ichiro Kawasaki [Wed, 13 Oct 2021 06:09:02 +0000 (15:09 +0900)]
t/zbd: Align block size to zone capacity

Test cases #5, #6, #15 and #37 write data and read it back (or write
with the verify option to read it back). When the test target zones have
a zone capacity that is not aligned to the block size, read requests
cannot cover all of the written data, and the test cases fail.

To avoid these failures, check the zone capacities and determine a block
size that is aligned to all of them. Then use that block size for these
test cases.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211013060903.166543-5-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
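One way to pick such a block size is to start from a preferred size and halve it until it divides every zone capacity, stopping at the logical block size. The C sketch below illustrates that approach. aligned_block_size() is hypothetical, the halving strategy is an assumption rather than the test script's actual logic, and both sizes are assumed to be powers of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool is_pow2(uint64_t x)
{
	return x && !(x & (x - 1));
}

/*
 * Hypothetical: largest power-of-two block size <= preferred_bs that
 * divides every zone capacity, falling back to logical_bs.
 */
uint64_t aligned_block_size(uint64_t preferred_bs, uint64_t logical_bs,
			    const uint64_t *zone_cap, size_t nr_zones)
{
	assert(is_pow2(preferred_bs) && is_pow2(logical_bs));
	assert(preferred_bs >= logical_bs);
	uint64_t bs = preferred_bs;

	while (bs > logical_bs) {
		size_t i;

		for (i = 0; i < nr_zones; i++)
			if (zone_cap[i] % bs)
				break;		/* capacity not a multiple of bs */
		if (i == nr_zones)
			return bs;		/* bs aligns with every zone */
		bs /= 2;
	}
	return logical_bs;
}
```

For example, with capacities of 256 KiB and 208 KiB, a preferred size of 64 KiB is reduced to 16 KiB, the largest power of two dividing both.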
2 years agot/zbd: Do not use too large block size in test case #4
Shin'ichiro Kawasaki [Wed, 13 Oct 2021 06:09:01 +0000 (15:09 +0900)]
t/zbd: Do not use too large block size in test case #4

Test case #4 uses the zone size as the block size to read a zone. For
some devices the zone size is very large, on the order of gigabytes, so a
single pread64 system call cannot complete the request and the test case
fails.

To avoid the failure, keep the block size reasonable: if the zone size is
too large, use logical_block_size * 256 as the block size.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211013060903.166543-4-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
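The selection can be sketched as follows. The threshold for "too large" is an assumption here: the sketch uses 0x7ffff000 bytes, the maximum that a single read()/pread() transfers on Linux, since a larger request is returned short. zone_read_bs() and SINGLE_IO_LIMIT are hypothetical names; only the logical_block_size * 256 fallback comes from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: largest transfer a single Linux read()/pread() performs. */
#define SINGLE_IO_LIMIT 0x7ffff000ULL

/* Hypothetical: block size used to read a whole zone in test case #4. */
uint64_t zone_read_bs(uint64_t zone_size, uint64_t logical_bs)
{
	assert(zone_size > 0 && logical_bs > 0);
	if (zone_size > SINGLE_IO_LIMIT)
		return logical_bs * 256;	/* fallback from the commit */
	return zone_size;			/* small zones: one read per zone */
}
```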
2 years agozbd: Fix type of local variable min_bs
Shin'ichiro Kawasaki [Wed, 13 Oct 2021 06:09:00 +0000 (15:09 +0900)]
zbd: Fix type of local variable min_bs

In zbd.c, the thread option min_bs[] is read and stored in the local
variable min_bs. Elements of min_bs[] have type unsigned long long, but
the local variable min_bs has type uint32_t. When an element of min_bs[]
has a value larger than UINT32_MAX, it is truncated on assignment to
min_bs.

To avoid the truncation, change the type of the local variable min_bs
from uint32_t to uint64_t. Use uint64_t rather than unsigned long long to
be explicit about the data size and for consistency within zbd.c. Since
the variable is passed to the helper function zbd_find_zone(), change the
type of that function's argument as well.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211013060903.166543-3-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
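The truncation is easy to reproduce in isolation. In the sketch below, load_min_bs_old() and load_min_bs_new() are hypothetical functions that mimic only the type of the local variable before and after the fix. Converting an out-of-range value to an unsigned 32-bit type is well defined in C (it is reduced modulo 2^32), which is why the bug is silent rather than a crash.

```c
#include <assert.h>
#include <stdint.h>

/* Before the fix: the option value lands in a 32-bit local. */
uint32_t load_min_bs_old(unsigned long long opt_min_bs)
{
	uint32_t min_bs = opt_min_bs;	/* silently reduced modulo 2^32 */
	return min_bs;
}

/* After the fix: a 64-bit local keeps the full value. */
uint64_t load_min_bs_new(unsigned long long opt_min_bs)
{
	uint64_t min_bs = opt_min_bs;
	return min_bs;
}
```

A min_bs of 4 GiB + 4 KiB becomes 4 KiB in the old version, and exactly 4 GiB becomes 0.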