git.kernel.dk Git - fio.git/log

don't access dlclose'd dynamic ioengine object after close

Alexey reported this bug when using dynamically loaded IO engines;
a segfault on the line where we set the dlhandle to NULL after
the dlclose.

I think this is because ops points to the thing we obtained from dlsym:

ops = dlsym(dlhandle, engine_lib);

and after the final dlclose, the object no longer exists and efforts
to set the handle within it will fail for obvious reasons.
I'm not sure why I hadn't seen this before.

Fixes-RH-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1956963
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Fixes: f6931a1 ("fio: move dynamic library handle to io_ops structure")
Tested-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

configure: fix check_min_lib_version() eval

The following shell statement:

if eval "echo \$$_feature" = "yes" ; then

executes:

echo $... = "yes"

It does not actually compare the variable named by $_feature to the
string "yes".

Add the missing "test" call so the comparison happens as intended and
wrap the eval so it doesn't include the = "yes".

Fixes: 3e48f7c9de61 ("configure: fix syntax error with NetBSD")
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ioengines: don't call zbd_put_io_u() for engines not implementing commit

Commit d9ed3e63e528 ("zbd: Fix zone locking for async I/O engines") added
a call to zbd_put_io_u() in the case where td->io_ops->commit callback
is not implemented on an ioengine.

The commit in question fails to mention why this zbd_put_io_u() call was
added for ioengines not implementing the commit callback.

The code in td_io_queue() looks like this:

ret = td->io_ops->queue(td, io_u);
zbd_queue_io_u(td, io_u, ret);

if (!td->io_ops->commit) {
io_u_mark_submit(td, 1);
io_u_mark_complete(td, 1);
zbd_put_io_u(td, io_u);
}

SYNC I/O engine case (e.g. psync):
The zone will be locked by zbd_adjust_block(), td->io_ops->queue(td, io_u),
which for a sync I/O engine will return FIO_Q_COMPLETED.

This return value will be send in to zbd_queue_io_u(), which at the end
of the function, unlocks the zone if the return value from ->queue()
differs from FIO_Q_QUEUED. For a sync I/O engine, the zone will be
unlocked here, and io_u->zbd_put_io function pointer will be set to NULL.

psync does not implement the ->commit() callback, so it will call
zbd_put_io_u(), which will do nothing, because the io_u->zbd_put_io
pointer is NULL.

ASYNC I/O engine case (e.g. io_uring):
The zone will be locked by zbd_adjust_block(), td->io_ops->queue(td, io_u),
which for an async I/O engine will return FIO_Q_QUEUED.

This return value will be send in to zbd_queue_io_u(), which at the end
of the function, unlocks the zone if the return value from ->queue()
differs from FIO_Q_QUEUED. For an async I/O engine, the zone will not be
unlocked here, so the io_u->zbd_put_io function pointer will still be set.

io_uring does implement the ->commit() callback, so it will not call
zbd_put_io_u() here at all.

Instead zbd_put_io_u() will be called by do_io() -> wait_for_completions()
-> io_u_queued_complete() -> ios_completed() -> put_io_u() -> zbd_put_io_u(),
which will unlock the zone and will set the io_u->zbd_put_io function pointer
to NULL.

In conclusion, the zbd_put_io_u() should never had been added in the case
where the ->commit() callback wasn't implemented in the first place,
and removing it shouldn't affect ioengines psync or io_uring.

Commit d9ed3e63e528 ("zbd: Fix zone locking for async I/O engines")
probably made the assumption that an async I/O engine == the ->commit()
callback is implemented, however, this is not true, there are async
I/O engines in tree (and out of tree), that does not implement the
->commit() callback. Instead, an async I/O engine is recognized by
the ->queue() callback returning FIO_Q_QUEUED.

Removing the invalid zbd_put_io_u() call will ensure that a zone is not
prematurely unlocked for async I/O engines that do not implement the
->commit() callback. Unlocking a zone prematurely leads to I/O errors.

Fixes: d9ed3e63e528 ("zbd: Fix zone locking for async I/O engines")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

The GPL isn't a EULA: remove it and introduce WixUI_Minimal_NoEULA

The GPL shouldn't be used as a EULA in an installer.
Remove it, and since the WixUI_Minimal dialog set requires a EULA
create a custom WixUI_Minimal_NoEULA set.

Signed-off-by: Rebecca Cran <rebecca@bsdio.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'pthread_getaffinity_1' of https://github.com/kusumi/fio

* 'pthread_getaffinity_1' of https://github.com/kusumi/fio:
gettime: Fix compilation on non-Linux with pthread_getaffinity_np()

gettime: Fix compilation on non-Linux with pthread_getaffinity_np()

874d55e50c("os/os-linux: add pthread CPU affinity helper") and a few
commits after that broke compilation on non-Linux platforms which support
pthread_getaffinity_np().

Define fio_get_thread_affinity() on non-Linux platforms, and make gettime
test FIO_HAVE_GET_THREAD_AFFINITY which may or may not depend on pthread.
FIO_HAVE_GET_THREAD_AFFINITY is currently not defined on Windows.

Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>

Merge branch 'gpspm-add-optional-use-rpma_conn_completion_wait-function' of https://github.com/ldorau/fio

* 'gpspm-add-optional-use-rpma_conn_completion_wait-function' of https://github.com/ldorau/fio:
rpma: gpspm: introduce the busy_wait_polling toggle

rpma: gpspm: introduce the busy_wait_polling toggle

The performance of the librpma_gpspm engine depends heavily
on how much CPU power it can use to its work.
One can want either to take all available CPU power
and see what the maximum possible performance is
or configure it less aggressively and collect the results
when the CPU is not solely dedicated to doing this one task.

The librpma_gpspm engine allows toggling between one and another
by either waiting for incoming requests in the kernel
using rpma_conn_completion_wait() (busy_wait_polling=0)
or trying to collect the completion as soon as it appears
by polling all the time using rpma_conn_completion_get()
(busy_wait_polling=1).

Signed-off-by: Oksana Salyk <oksana.salyk@intel.com>

Merge branch 'zbd-no-parallel-init' of https://github.com/floatious/fio

* 'zbd-no-parallel-init' of https://github.com/floatious/fio:
init: zonemode=zbd does not work with create_serialize=0

init: zonemode=zbd does not work with create_serialize=0

zbd_init_zone_info() has a comment that it only works correctly if it
called before the first fio fork() call.
However, right now, there is nothing that ensures this.

If the user specifies --create_serialize=0 and --numjobs=2, each thread
will get their own version of zbd_info.

zbd_info contains one mutex per zone, so if the threads get different
zbd_info, two threads can manage to lock the same zone at the same time,
which will lead to I/O errors.

Explicitly disallow --zonemode=zbd together with --create_serialize=0,
so that we know that all threads will use the same zbd_info, instead of
silently misbehaving.

Analysis:
setup_files() calls zbd_init_files() which calls zbd_init_zone_info().
zbd_init_zone_info() does a for_each_td(), where it checks if zbd_info
(for the same filename) has already been allocated by another thread.
This only works if create_serialize=1 (default).
If create_serialize=0, zbd_init_zone_info() will get called in parallel,
and in this case when the second thread checks if any other thread has
allocated zbd_info, the check will fail, since the first thread has not
yet been running long enough to allocate zbd_info.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>

backend: fix switch_ioscheduler()

The backend.c function switch_ioscheduler() suffers from several
problems:
1) This function only considers the first file of a job. For jobs using
   multiple files, the ioscheduler switch will done only for that file.
2) If the job file is a character device, a pipe or a regular file for
   which the hosting block device file cannot be determined (e.g. a
   remote file), thring to switch the IO scheduler causes a crash as the
   file disk_util field is NULL.
Fix both problems by introducing the helper function set_ioscheduler()
and changing switch_ioscheduler() to repeatdly call this helper for all
files of the job, ignoring character device files, pipe files and files
without a hosting device information.

Also update the man page to better explain when the ioscheduler option
applies.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

engines: add engine for file delete

This engine is to measure the performance of deleting files.

In practice, it is an important benchmark for a file system that
how quick it can release the disk space of deleted files.

Signed-off-by: friendy-su <friendy.su@sony.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'parse-signedness-warn' of https://github.com/floatious/fio

* 'parse-signedness-warn' of https://github.com/floatious/fio:
parse: fix parse_is_percent() warning

parse: fix parse_is_percent() warning

When compiling an out of tree ioengine, such as the SPDK fio plugin,
which has -Wextra in CFLAGS, the compiler gives the following warning:

parse.h: In function ‘parse_is_percent’:
parse.h:134:13: warning: comparison of integer expressions of different signedness: ‘long long unsigned int’ and ‘int’ [-Wsign-compare]

Since this warning was introduced recently by fio
commit b75c0fae6612 ("parse: simplify parse_is_percent()"),
and since this is the only warning seen when compiling the SPDK fio
plugin, readd the ULL prefix to parse_is_percent().

Fixes: b75c0fae6612 ("parse: simplify parse_is_percent()")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>

t/zbd: test repeated async write with block size unaligned to zone size

A recently fixed bug was caused by zone reset during asynchronous IOs
in-flight. The bug symptom was unaligned command error which was
observed using random write workload with libaio engine and block size
not a divisor of zone size. To confirm the bug fix and to prevent future
regression, add a test case which runs the workload.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: avoid zone reset during asynchronous IOs in-flight

When fio repeats same workload on zoned block devices, zbd_file_reset()
is called for each repetition. This function resets target zones when
one of two conditions are met: 1) the write pointer of the zone has
offset from the device start unaligned to block size, or 2) the workload
is verify and verification is not in process. When the workload runs
with block size not a divisor of the zone size, the offsets of write
pointers from device start (not from zone start) become unaligned to
block size, then zbd_file_reset() resets target zones. This zone reset
happens even when the asynchronous IOs are in-flight and causes
unexpected IO results. Especially if write requests are in-flight, they
fail with unaligned write command error. A single thread may do both the
zone reset and the write request submit, recursive zone locks can not
prevent the zone reset during the writes.

The write pointer check for block size alignment is not working as
intended. It should have checked offset not from device start but from
zone start. Having said that, regardless of this write pointer check
correctness, the zone reset is not required since the zones are reset in
zbd_adjust_block() anyway when remainder of the zone between write
pointer and zone end is smaller than block size.

To avoid the zone reset during asynchronous IOs, do not reset zones in
zbd_file_reset() when the write pointer offset from the device start is
unaligned to block size. Modify zbd_file_reset() to call the helper
function zbd_reset_zones() only when the workload is verify and
verification is not in process. The function zbd_reset_zones() had an
argument 'all_zones' to inform that the zones should be reset when its
write pointer is unaligned to block size. This argument is not required.
Remove it and simplify the function accordingly.

The zone reset for verify workloads is still required. It does not
conflict with asynchronous IOs, since fio waits for IO completion at
verification end, then IOs are not in-flight when zbd_file_reset() is
called for repetition after verification.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'wip-rados-dont-zerowrite' of https://github.com/aclamk/fio

* 'wip-rados-dont-zerowrite' of https://github.com/aclamk/fio:
engine/rados: Add option to skip object creation

gettime: cleanup ifdef mess

Let's just set mask to all CPUs for the non-affinity case, and we
can use the same mask test case throughout.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

gettime: check affinity for thread, if we have it

If we have fio_get_thread_affinity(), we can support a smaller
(and sparse) CPU mask for the clock test.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

os/os-linux: add pthread CPU affinity helper

Signed-off-by: Jens Axboe <axboe@kernel.dk>

configure: add test case for pthread_getaffinity_np()

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'unified-merge' of https://github.com/jeffreyalien/fio

* 'unified-merge' of https://github.com/jeffreyalien/fio:
Add functionality to the unified_rw_reporting parameter to output separate and mixed stats when set to 'both' or 2.

Add functionality to the unified_rw_reporting parameter to output
separate and mixed stats when set to 'both' or 2.

To achieve this, the stats are gathered on a per/data direction
(read/write/etc) basis and then summed into the mixed stats when
generating output.

This functionality also allows setting unified_rw_reporting to
'none' or 'mixed' instead of 0 or 1 respectively.

Signed-off-by: Brandon Paupore <brandon.paupore@wdc.com>
Co-authored-by: Jeff Lien <jeff.lien@wdc.com>

Merge branch 'add-librpma-engines' of https://github.com/janekmi/fio

* 'add-librpma-engines' of https://github.com/janekmi/fio:
rpma: add librpma_apm_* and librpma_gpspm_* engines

Merge branch 'free-dump-options' of https://github.com/floatious/fio

* 'free-dump-options' of https://github.com/floatious/fio:
options: free dump options list on exit

Merge branch 'patch-1' of https://github.com/ihsinme/fio

* 'patch-1' of https://github.com/ihsinme/fio:
fix loop with unreachable exit condition

Merge branch 'dfs_engine' of https://github.com/johannlombardi/fio

* 'dfs_engine' of https://github.com/johannlombardi/fio:
engines/dfs: add DAOS File System (dfs) engine
Disable pthread_condattr_setclock on cygwin

engines/dfs: add DAOS File System (dfs) engine

DAOS is a new scale-out open-source object store.
See https://github.com/daos-stack/daos for more information.

This patch adds a new fio engine to support the filesystem layer built
on top of DAOS called DFS (DAOS File System). It supports asynchronous
read/write operations.

Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>
Co-authored-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>
Co-authored-by: Vishwanath Venkatesan <vishwanath.venkatesan@intel.com>
Co-authored-by: Ioannis Galanis <ioannis.galanis@intel.com>

Disable pthread_condattr_setclock on cygwin

Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>

rpma: add librpma_apm_* and librpma_gpspm_* engines

The Remote Persistent Memory Access (RPMA) Library is a C library
created to simplify accessing persistent memory on remote hosts over
Remote Direct Memory Access (RDMA).

The librpma_apm_client and librpma_apm_server is a pair of engines
which allows benchmarking persistent writes achieved via
the Appliance Persistency Method (APM; natively supported by
the librpma library) and regular reads (a part of the RDMA standard).

The librpma_gpspm_client and librpma_gpspm_server is a pair of
engines which allows benchmarking persistent writes achieved via
the General Purpose Persistency Method (GPSPM; build on top of
the librpma API).

The librpma library is available here: https://github.com/pmem/rpma
along with the set of scripts using the newly introduced engines
to construct miscellaneous benchmarking scenarios:
https://github.com/pmem/rpma/tree/master/tools/perf

The full history of the development of the librpma fio engines
is available at: https://github.com/pmem/fio/tree/rpma

Co-Authored-By: Lukasz Dorau <lukasz.dorau@intel.com>
Co-Authored-By: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Co-Authored-By: Jan Michalski <jan.m.michalski@intel.com>
Co-Authored-By: Oksana Salyk <oksana.salyk@intel.com>

Merge branch 'dev_luye_github' of https://github.com/louisluSCU/fio

* 'dev_luye_github' of https://github.com/louisluSCU/fio:
Fix reading multiple blktrace replay files

Fix reading multiple blktrace replay files

During fio replay, the input file would be checked first by is_blktrace()
if it is a blktrace output file. If so, however, this would treat the
read_iolog option as one single file and ignore multiple files seperated by ":"

This commit contains the fixation that read input file name by index
before doing the check

Signed-off-by: luye <luye.yelu@bytedance.com>

Merge branch 'fallock-blkdev' of https://github.com/dmonakhov/fio

* 'fallock-blkdev' of https://github.com/dmonakhov/fio:
engines/falloc: add blockdevice as a target

engines/falloc: add blockdevice as a target

fallocate supoport for blockdevices was added five years ago.
It is time to stress-test it via fio.

Merge branch 'master' of https://github.com/venkatrag1/fio

* 'master' of https://github.com/venkatrag1/fio:
options: allow separate values for max_latency

options: allow separate values for max_latency

Adding the ability to specify max_latency limits separately for read, write, trim.
Uses comma-separated values similar to blocksize, rate parameters etc.

Also modifies the behavior for the one option only case for max_latency and latency_target
by excluding syncs. This makes the latency limits similar to the rate checks, in that they
only apply to reads, writes and trims.

Extends the output printed on latency exceeded event, to display io_unit information using format
similar to that iolog write.

Signed-off-by: Venkat Ramesh <venkatraghavan@fb.com>

Merge branch 'master' of https://github.com/DevriesL/fio

* 'master' of https://github.com/DevriesL/fio:
engines/io_uring: fix compilation conflict with Android NDK

engines/io_uring: fix compilation conflict with Android NDK

Android NDK already uses __errno in errno.h, rename variable __errno
to fix this conflict.

Signed-off-by: DevriesL <therkduan@gmail.com>

Fio 3.26

Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/io_uring: SQPOLL fixes

Use the proper primitives to load/store the ring head/tail. And as
importantly, account any submit as a full submit, even if we don't
need to call io_uring_enter().

This fixes hangs with using SQPOLL.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'clock_monotonic_unused' of https://github.com/foxeng/fio

* 'clock_monotonic_unused' of https://github.com/foxeng/fio:
configure: remove unused CLOCK_MONOTONIC_* symbols

configure: remove unused CLOCK_MONOTONIC_* symbols

The symbols are not used in the code and the respective probes were
removed in commits 187f39063e1b0a7baeda20a9f4f2406327ec0d41 and
69212fc41c0420f8caf272a0cc270194edbddfe7.

Signed-off-by: Fotis Xenakis <foxen@windowslive.com>

engines/filecreate: remove improper message print

'fallocate' should not be printed in filecreate engine

Signed-off-by: friendy-su <friendy.su@sony.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fix loop with unreachable exit condition

the essence of the error lies in a possible hit in the eternal cycle while, when exiting the higher cycle by means of break.

Signed-off-by: Sergey Valentey <ihsinme@gmail.com>

zbd: support 'z' suffix for zone granularity

Allow users to pass some options with zone granularity which is natural
for ZBD workloads.

This is nifty for writing quick tests and when firmware guys change
zone sizes.

Converted options are

io_size=
offset=
offset_increment=
size=
zoneskip=

Example:

rw=write
numjobs=2
offset=1z
offset_increment=10z
size=5z
io_size=6z

Thread 1 will write zones 1, 2, 3, 4, 5, 1.
Thread 2 will write zones 11, 12, 13, 14, 15, 11.

Note:
zonemode=strided doesn't create ZBD zone structure but requires
value recalculation. This is why 2 functions are split.

Signed-off-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: fix check against 32-bit zone size

Zone size can be bigger than 4GB.

Signed-off-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: simplify zoneskip= validness check

Simply check the remainder:

(zoneskip % zone_size) > 0

It will do the right thing for all zoneskip= values, and
zone size being positive is checked earlier.

Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

parse: simplify parse_is_percent()

Check "unsigned long long val <= -1ULL" is tautologically true
because of how value conversions work.

Signed-off-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Add a new file to gitignore

Add t/fuzz/fuzz_parseini to the ignore list.

Signed-off-by: Hongwei Qin <glqinhongwei@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

options: free dump options list on exit

Fix the following LeakSanitizer warnings:

Indirect leak of 224 byte(s) in 7 object(s) allocated from:
    #0 0x7f7377b21bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
    #1 0x563951e5e09d in add_to_dump_list /home/nks/src/fio/parse.c:1135
    #2 0x563951e5e09d in add_to_dump_list /home/nks/src/fio/parse.c:1127
    #3 0x563951e5e09d in parse_cmd_option /home/nks/src/fio/parse.c:1162

Indirect leak of 43 byte(s) in 7 object(s) allocated from:
    #0 0x7f7377aaa3dd in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x963dd)
    #1 0x563951e5e0a8 in add_to_dump_list /home/nks/src/fio/parse.c:1136
    #2 0x563951e5e0a8 in add_to_dump_list /home/nks/src/fio/parse.c:1127
    #3 0x563951e5e0a8 in parse_cmd_option /home/nks/src/fio/parse.c:1162

Indirect leak of 36 byte(s) in 7 object(s) allocated from:
    #0 0x7f7377aaa3dd in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x963dd)
    #1 0x563951e5e0b9 in add_to_dump_list /home/nks/src/fio/parse.c:1138
    #2 0x563951e5e0b9 in add_to_dump_list /home/nks/src/fio/parse.c:1127
    #3 0x563951e5e0b9 in parse_cmd_option /home/nks/src/fio/parse.c:1162

by moving fio_dump_options_free() to options.h,
so that we can call it during exit.

Reproducer:
LD_PRELOAD=libasan.so.5 fio --name=test --filename=/dev/nullb0 --runtime=2

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>

engines/io_uring: add verbose error for ENOSYS

If we get ENOSYS for setting up the rings, then the kernel is too old
to support io_uring. Mention that explicitly.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

engine/rados: Add option to skip object creation

When `touch_objects` is false, connection procedure no longer attempts to create objects.
In such case ceph caches are untouched and do not affect quality of performance measurement.
But if `touch_object` is false, tests that read objects without creating them previously
will now fail.

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>

Merge branch 'per-engine-pre-write-function' of https://github.com/lukaszstolarczuk/fio

* 'per-engine-pre-write-function' of https://github.com/lukaszstolarczuk/fio:
filesetup: add engine's io_ops to prepopulate file with data

Merge branch 'taras/clientuid' of https://github.com/tarasglek/fio-1

* 'taras/clientuid' of https://github.com/tarasglek/fio-1:
$clientuid keyword to differentiate clients in client/server mode.

$clientuid keyword to differentiate clients in client/server mode.

Prior to this change getting fio to include IP as part of filename was a struggle. One had to use directory=/ to trigger IP-inclusion code and there was no way to customize that.

Signed-off-by: Taras Glek <taras@glek.net>

filesetup: add engine's io_ops to prepopulate file with data

In some cases (e.g. engine marked as diskless) files are not laid out.
If the first job is a read job, results are higher than expected
(because reading zero page). Each engine should deliver func to
prepopulate file with data to avoid this situation.

Signed-off-by: Łukasz Stolarczuk <lukasz.stolarczuk@intel.com>

zbd: relocate Coverity annotation

The Coverity annotation added earlier to suppress a false positive
about missing unlock in zbd_adjust_block() didn't work because it
was placed not before the return statement, but earlier in the code.

Move the annotation to the right place to avoid the warning.

Reported-by: Bart Van Assche <bvanassche@acm.org>
Fixes: 8e4b2e55512f("zbd: don't unlock zone mutex after verify replay")
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: fix 32-bit compile warnings for logging

Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/zbd: avoid looping on invalid command line options

t/zbd/test-zbd-support loops indefinitely if an unrecognized option
is specified in the command line. Add a switch case to display usage
and exit the script.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/zbd: increase timeout in test #48

Test #48 runs some i/o to the test device for 30 seconds and then waits
45 seconds for fio to finish. If this wait times out, the test assumes
that fio is hung because of a zone locking issue and fails. It is
observed that 45s may not be enough for some HDDs, especially the ones
running specialized firmware.

Increase the timeout to 180 seconds to avoid any false positives.
There is no change in test duration for the most common devices.
The test will wait for the full 180 seconds only if it fails, otherwise
it will finish very soon after the 30 second i/o period ends.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/zbd: show elapsed time in test-zbd-support

This script may take quite a lot of time to run against large
zoned HDDs. At the end of every run, show exactly how much time
it took.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/zbd: add test #54 to exercise ZBD verification

Add a new test case to perform 75/25 read/write workload with varying
i/o size and verification on. It is very important to use a good random
generator for this test. Setting experimental_verify=1 is required for
this test to operate correctly.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/zbd: add checks for offline zone condition

Some tests, e.g. #39 an #40, try to read the first zone of the drive.
It is assumed that the first zone is readable. However, if the first
zone is offline, the read fails along with the entire test.

This commit adds two functions to perform zone report and find the
first and the last zones that are not offline. Several test cases
now call these functions to avoid test failures described above.

Fixes for two more test failures are included in this commit -

Test #14 tries to write to conventional zones if they are found at
the beginning of the LBA range of the drive, but it assumes that
these zones are online. This may not always be the case. Add "offset"
to avoid the i/o to be attempted to run against any preceding offline
zones.

Similarly, in test #17, the script tries to find the last zone.
Check for the case when the last zone is offline. The test doesn't
set the i/o file size, but it works OK in most of the cases because
typically this test operates on the last physical zone. With the
online lookup in place, this may not always be the case and if there
are any offline zones that trail the last non-offline zone,
then the i/o will try to access that zone and fail. Add the "size"
to avoid the i/o to be attempted to run against any trailing offline
zones.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/zbd: prevent test #31 from looping

The test 31 starts i/o to 128 zones in parallel.
There are two corner cases that are not properly handled in the
existing implementation -
1) If the total number of zones on the device is < 128, the test
will loop indefinitely because the loop increment is calculated as
zero by the script.
2) If the number of max_open_zones of the device is < 128, the
test will fail due to exceeding max_open_zones limit as the code
expects it to be >= 128.

Limit the number of open zones to the reported maximum
and skip the test if there is not enough zones on the device.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/zbd: add an option to bail on a failed test

Sometimes, it can be useful to inspect the state of the zones of the
test device, usually right after a test failure. Currently,
test-zbd-support script just keeps running and proper examination of
device zones can be difficult.

Add the -q option to test/zbd/support to quit immediately upon
encountering any test failure. Additionally, define the same option
in run-tests-against-nullb to propagate it to test/zbd/support.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/zbd: fix wrong units in test case #37

The second argument of the function total_zone_capacity is expected to
be in bytes. However, the call in test case #37 provides this argument
in sectors and this results in a wrong capacity calculation. Make sure
that the value that is passed to this function is converted to bytes.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/zbd: test random I/O direction in all-conventional case

The number of 'sectors with data' is counted and used to determine the
direction of the first I/O of a random read/write ZBD workload. To
initialize the number, min_zone and max_zone fields in struct fio_file
are referred. There was a code bug that was recently fixed where
min_zone and max_zone fields were not initialized when all zones in I/O
region were conventional zones. This led to an uninitialized number of
sectors with data, and the write direction was always set for random
workloads.

Add a test case to perform random read/write workload on an I/O region
with both sequential and conventional zones. Check that both read and
write I/Os are executed.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/zbd: test that zone_reset_threshold calculation is correct

The option "zone_reset_threshold" specifies the ratio of logical blocks
with data to trigger zone resets. When the I/O range includes
conventional zones, only blocks in sequential zones must be used to
track this value. A recently fixed bug has uncovered that the number of
blocks in conventional zones were erroneously counted as the blocks
with data.

To prevent future regressions, add a test case to confirm that the
logical blocks accounting does not include conventional zones.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/zbd: test that conventional zones are not locked during random i/o

A recently fixed bug was caused by an unexpected conventional zone lock
during random I/O adjustment. Only sequential zones are supposed to be
locked, but the conventional zone lock was observed with a random
workload against an I/O region with mixed conventional and sequential
zones.

Add two test cases with the same workload to ensure that no similar
regression happens in the future. One case tests reads and the other
is for writes. As a related change, add the helper function
require_conv_zones() to check that the test target device has enough
conventional zones available.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/zbd: skip tests that need too many sequential zones

Test cases #3, #4, #28, #29 and #48 require rather large numbers of
sequential zones to run properly and they fail if the test target
device has not enough of such zones in its zone configuration.

Check how many sequential zones are present on the test device and
skip any test cases for which this number is not enough.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/zbd: skip tests when test prerequisites are not met

Some of the test cases in t/zbd/test-zbd-support require test target
devices to have certain features. When these prerequisites are not met,
they skip the actual test and report the test result to be "PASS".
This does not help users to understand the true test outcome.
As the tests expand to cover a wider variety of zoned devices and
layouts, reporting skipped tests becomes more and more beneficial.

Modify test-zbd-support script to report skipped test cases.
Introduce helper functions require_*() to check test target
prerequisites. If they are not met, set the variable SKIP_REASON and
return the constant SKIP_TESTCASE from the test function. In the main
loo, print "SKIP" status and SKIP_REASON if the test case is skipped.
Also, output the total number of skipped cases at the end of the test
script run.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/zbd: add -t option to run-tests-against-nullb

For debugging, it can be useful to run a single ZBD test case in all
zoned configurations defined in run-tests-against-nullb. Add -t option
to run-tests-against-nullb so that the single ZBD test case specified
in run-tests-against-nullb command line is executed in all sections
of the script.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/zbd: add run-tests-against-nullb script

This script combines the t/zbd/run-tests-against-zoned-nullb script
functionality with t/zbd/run-tests-against-regular-nullb and adds
more zoned device configurations to test. This considerably improves
ZBD test coverage.

The added script makes the two old scripts named above obsolete,
remove them. Modify t/run-fio-tests.py and Makefile to refer to the
new script instead of the old one. Since the full test now runs
significantly longer than the two old ones combined due to many more
zoned configurations, only execute a few individual sections as a
part of testing n "make fulltest" and run-fio-tests.py. One extra test
section with 10% conventional zones is executed from the Makefile.
The Python tests only exercise all-conventional and all-sequential
configurations, exactly as before.

The script returns a non-zero return code if at least one of the
executed sections had a failed test.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/zbd: check for error in test #2

With the preceding commit in place, fio gives an error if user attempts
to run write I/O size that is larger than the zone size. Grep for that
message instead of checking that no write has happened.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: set thread errors in zbd_adjust_block()

Several error conditions that are encountered during zone processing
in zbd_adjust_block() function cause it to return io_u_eof value.
This stops the i/o to the given file, but there is no error raised or
reported if this code is returned. For a few particular conditions,
just stopping the i/o is reasonable, but others are serious errors
that should be reported.

Add td_verror() calls to raise thread errors for a few abnormal
conditions during adjusting the i/o. The only test that needs to be
modified because of this changes is test #2.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: avoid failing assertion in zbd_convert_to_open_zone()

The test run against null_blk with the following command line -

t/zbd/run-tests-against-nullb -l -q -s 12 -t 51 -n 100

stops with a failure and the message below can be seen in the test log:

fio: zbd.c:1110: zbd_convert_to_open_zone: Assertion `open_zone_idx < f->zbd_info->num_open_zones' failed.

This assertion fails because pick_random_zone_idx() function returns
index 0 if no zones are currently open. In this case, open_zone_idx and
f->zbd_info->num_open_zones are both zero. Since this situation is
normal, simply modify the assert statement to avoid failing.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

engines/libzbc: enable block backend

When opening a device, the current version of libzbc ioengine instructs
libzbc to only try SCSI and ATA backends for scanning the drive. This
prevents opening null_blk devices that fail to be accepted by the both
above mentioned backends and require the block backend to be enabled.

Set the appropriate flag to enable the block backend in zbc_open()
libzbc call.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: improve replay range validation

The function zbd_replay_write_order() is called when a read is
issued by fio verification code to compare the read data with the
previously written data. Any data mismatch is marked as a verification
failure.

Since zbd_adjust_block() may change the i/o offset and length
to comply with i/o constrains that zoned model has set,
zbd_replay_write_order() needs to replicate the same adjustment during
verify. The general flow in this function matches the write processing
done in zbd_adjust_block(), but there are some differences. For
example, z->verify_block acts as the pseudo-write pointer during replay
and it needs to be advanced by buflen every time the function called,
but it is advanced by min_bs in the existing code (the value of this
variable is measured in min_bs units).

Fix the issue with verify_block and add more error logging to simplify
troubleshooting of this tricky part of ZBD code.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: handle conventional start zone in zbd_convert_to_open_zone()

At the beginning of zbd_convert_to_open_zone() function, a zone
is picked in semi-random manner to become a candidate zone for
redirecting the incoming write. In some circumstances, such as
unlimited MaxOpen or i/o range that spans the boundary between
conventional and sequential zones, a conventional zone may be
selected.

This may create problems in the subsequent for (;;) loop in the
same function. Failed assertions were observed during the execution
of newly introduced test #51 that showed that the code in that loop
was trying to lock and unlock conventional zones.

Check if the zone which has been initially picked is conventional.
If yes, force the zone selection to be re-tried until a sequential
zone is selected for further processing.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: don't log "zone nnnn is not open" message

This log message has been added recently (it could have been my idea
to add it during internal review) and it turns out that the message
tends to flood the log when any decent workload is run with
--zonemode=zbd. Remove logging of this debug message.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: disable crossing from conventional to sequential zones

Write I/Os to conventional zones may have the range that spans across
zone boundaries. Such writes may cause I/O errors when its next zone
is a sequential zone.

To avoid such I/O errors, check for the cross over from a conventional
to a sequential zone. When the write crosses the boundary, shrink the
I/O length to fit within the first zone. If the offset is too close to
the end of the zone, wrap it around to the beginning of the same zone.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: use zone_lock() in zbd_process_swd()

Most of ZBD code in fio uses zone_lock() to lock write pointer zones.
This wrapper, besides doing the obvious pthread mutex lock, quiesce
the outstanding i/o when running via asynchronous ioengines. This is
necessary to avoid deadlocks. The function zbd_process_swd(), however,
still uses the naked pthread mutex to lock zones and this leads to a
deadlock when running ZBD test #48 against regular nullb devices.

The fix added in the same patch series that introduced test #48 was to
NOT initialize SWD at all, but this solution is found to create
problems with verify. As the proper fix, modify zbd_process_swd()
to use zone_lock(). This makes the test #48 pass even when SWD counter
is initialized.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: initialize sectors with data at start time

Based on the flag enable_check_swd, which is false by default, fio
does not initialize the swd value at startup, initializing the swd
value to be zero, even if some zones have sectors with data. This can
result in fio reflecting less than actual swd after a few writes are
completed. In workloads where verify is enabled, fio resets all the
zones and while resetting, it decrements the swd counter with the
actual number of swds in that zone(swd-count - swd-in-zone),
since swd-count is initialized to 0, it results in overflow of the
variable causing unpredictable issues.

So, initialize the swd to the correct value.

Fixes: 409a4f291e7f ("zbd: avoid initializing swd when unnecessary")
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: initialize min_zone and max_zone for all zone types

The function zbd_verify_sizes() checks if the given I/O range includes
write pointer zones. When all zones in the I/O range are conventional,
it skips checks for size options and leaves min_zone and max_zone in
struct fio_file with zero values. These uninitialized min_zone and
max_zone fields trigger unexpected behaviors such as unset
sectors_with_data.

Fix this by moving min_zone and max_zone set up from zbd_verify_sizes()
to zbd_setup_files(). This allows for setting up the values regardless
of zone types in I/O range.

Bypass the assertion to ensure that max_zone is larger than min_zone if
all zones in the I/O range are conventional. In this case, io_size
can be smaller than zone size and, consequently, min_zone may become
the same as max_zone.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: count sectors with data for write pointer zones

ZBD fio code tracks 'sectors with data' for two different purposes.
The first one is to process zone_reset_threshold. When the ratio of
sectors with data in zones with write pointer goes beyond the specified
number, zone reset is triggered. The second purpose is to control the
direction of the first I/O of random mixed read/write workloads. If all
write pointer zones in the I/O range are reset at the beginning of such
a workload, fio has no data to read and will immediately end the run of
the test section. To avoid this, fio checks 'sectors with data' and if
it is zero (i.e. it is the very first I/O), it modifies the direction
of that I/O from read to write.

Currently, when the workload range includes both conventional and
sequential zones, all sectors in conventional zones are counted as
'sectors with data' along with sectors in sequential zones.
This leads to incorrect handling of 'zone_reset_threshold' option -
zone reset timing of sequential zones is affected by the amount of
data read from or written to conventional zones. To avoid this,
conventional zones should be excluded from 'sectors with data'
calculation.

On the other hand, if the sectors of conventional zones were excluded
from the sectors with data, it could result in the wrong initial I/O
direction for random workloads. When the zones in I/O region are all
conventional, 'sectors with data' would always be zero. Because of
this, read operations are always changed to writes and reads are never
performed.

To avoid this contradiction, introduce another counter,
'wp_sector_with_data'. It works similar to the existing
'sectors_with_data', but it counts data sectors only in write pointer
zones. Use this newly introduced count for zone_reset_threshold checks
and keep on using the original count for the initial random I/O
direction determination.

When counting sectors with data, lock only write pointer zones, no need
to lock conventional zones.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: do not set zbd handlers for conventional zones

When zbd_adjust_block() modifies io_u to satisfy write pointer
restrictions, it may change the zone for the io_u. The function sets
pointers to zbd_queue_io() and zbd_put_io() handlers to io_u to
further process write pointer zones. However, when the I/O is
redirected to a conventional zone, these handlers should not
be set in io_u.

Skip setting the handlers when this function returns a conventional
zone. When zbd_adjust_block() can not find a zone to fit the I/O,
the existing code unlocks the zone pointer 'zb' used in the function.
This unlock should not be performed if 'zb' points to a conventional
zone upon return, skip it in this case.

These changes make the assert for 'zb' pointer near 'accept' label in
zbd_adjust_block() unnecessary. Replace it with assert for zb->has_wp,
since the zone at the step shall have write pointer.

Since zone locking functions (zone_lock(), zbd_queue_io() and
zbd_put_io()) are supposed to be called only for write pointer zones,
add assertions to zone_lock() and zone_unlock() to make sure this is
the case. This allows us to convert a few existing conditional checks
to assertions to make zone type validation more strict.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: do not lock conventional zones on I/O adjustment

When a random workload runs against write pointer zones, I/Os are
adjusted to meet write pointer restrictions. During read, I/O offsets
are adjusted to point to zones with data to read and during write, I/O
offsets are adjusted to be at write pointers of open zones.

However, when a random workload runs in a range that contains both
write pointer zones and conventional zones, I/Os to write pointer
zones can potentially be adjusted to conventional zones. The functions
zbd_find_zone() and zbd_convert_to_open_zone() search for zones
regardless of their type, and therefore they may return conventional
zones. These functions lock the found zone to guard its open status
and write pointer position, but this lock is meaningless for
conventional zones. This unwanted lock of conventional zones has been
observed to cause a deadlock.

Furthermore, zbd_convert_to_open_zone() may add the found conventional
zone to the array of open zones. However, conventional zones should
never be added to the array of open zones as conventional zones never
take the "implicit open" condition and not counted as part of the
device open zone management.

To avoid the deadlock, modify zbd_find_zone() not to lock zone when it
checks conventional zone without write pointer. To avoid the deadlock
and the conventional zone open, modify zbd_convert_to_open_zone() to
ignore conventional zones.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: don't unlock zone mutex after verify replay

zbd_adjust_block() always returns with the zone locked if the i/o is
accepted. The corresponding unlock happens in zbd_put_io(). The
function description says -

* Locking strategy: returns with z->mutex locked if and only if z refers
* to a sequential zone and if io_u_accept is returned. z is the zone that
* corresponds to io_u->offset at the end of this function.

Remove the recently added unlock after zbd_replay_write_order() call.
Add a Coverity annotation to mark the absence of unlock as intentional.

Fixes: b2726d53bb5d ("zbd: Add a missing pthread_mutex_unlock() call")
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: avoid zone buffer overrun

If the total number of zones on a drive is calculated to a value that
is less than the number of zones it can actually report, zone info
buffer can be overrun. This may happen not only due to drive firmware
problems, but also because of underlying software incorrectly
reporting zoned device capacity.

Fix this by more carefully setting zone report size.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: skip offline zones in zbd_convert_to_open_zone()

Since all I/Os to an offline zone will fail, add a check in
zbd_convert_to_open_zone() to ignore zones that have this condition.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: remove dependency on zone type during i/o

Two different type of zones have a write pointer: Sequential Write
Required (SWR) and Sequential Write Preferred (SWP). Introduce the
zone flag "has_wp" in struct zbd_zone_info and set it to 1 for these
zone types upon initialization, thus avoiding the necessity to check
multiple zone types in core zbd code. This flag replaces zbd_zone_swr()
function and lays the groundwork for supporting additional write
pointer zone types in the future.

The overall functionality stays the same after this commit.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: engines/libzbc: don't fail on assert for offline zones

If fio is run against a zoned device that has any zones in OFFLINE
condition, the following assertion is raised -

fio: zbd.c:473: parse_zone_info: Assertion `z->wp <= z->start + zone_size' failed.

This happens because offline zones have no valid write pointer and
it is reported by libzbc and blkzoned as (uint64_t)(-1). To avoid
violating this assertion, set the write pointer in all offline zones
to point at the start of the zone.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: introduce zone_unlock()

ZBD code already defines a helper function to lock a device zone,
zone_lock(). There is no zone_unlock() function though.

Wrap zone mutex unlock to zone_unlock() helper along with an assert
to make sure that the unlock is successful, i.e. that the function
is being called with the pointer to a locked zone.

Suggested-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: add get_zone() helper function

The following pattern is used very widely in zbd.c -

zone = &f->zbd_info->zone_info[zone_idx] .

For the sake of code clarity, wrap this construct into an inline
helper. No change in functionality.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: use zbd_zone_nr() more actively in the code

The function zbd_zone_nr() is always called with the first argument
being f->zbd_info. If "f" is made the first argument instead, calls
of this function become more compact end easier to read.

Besides this change, convert several places in the code where the same
zone number calculation is open coded to zbd_zone_nr() calls.
This is a refactoring patch, no change in functionality.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zbd: return ENOMEM if zone buffer allocation fails

parse_zone_info() function tries to allocate a buffer of
ZBD_REPORT_MAX_ZONES zone descriptors and exits if this allocation
fails. The problem is that it returns 0 error code in this case and
the caller may interpret this as the success.

Just return ENOMEM if we can't allocate that buffer.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'fio-fix-detecting-libpmem' of https://github.com/ldorau/fio

* 'fio-fix-detecting-libpmem' of https://github.com/ldorau/fio:
fio: fix detecting libpmem

fio: fix detecting libpmem

The current test for libpmem in 'configure' fails
in the following way:

$ gcc test.c -lpmem
test.c: In function ‘main’:
test.c:6:27: warning: passing argument 2 of ‘pmem_is_pmem’ \
             makes integer from pointer \
             without a cast [-Wint-conversion]
    6 |   rc = pmem_is_pmem(NULL, NULL);
      |                           ^~~~
      |                           |
      |                           void *
In file included from test.c:1:
/usr/include/libpmem.h:92:43: note: expected ‘size_t’ \
             {aka ‘long unsigned int’} but argument \
             is of type ‘void *’
   92 | int pmem_is_pmem(const void *addr, size_t len);
      |                                    ~~~~~~~^~~

Fix it.

Calculate min_rate with the consideration of thinktime

This patch updates the compare time if handle_thinktime
sleeps or spin.

Signed-off-by: Hongwei Qin <glqinhongwei@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Add thinktime_blocks_type parameter

This patch adds a new parameter thinktime_blocks_type to control the
behavior of thinktime_blocks. It can be either `complete` or `issue`. If
it is `complete` (default), fio triggers thinktime when thinktime_blocks
number of blocks are **completed**. If it is `issue`, fio triggers
thinktime when thinktime_blocks number of blocks are **issued**

Tests:
jobfile1:
```
[global]
thread
kb_base=1000
direct=1
size=1GiB
group_reporting
io_size=96KiB
ioengine=libaio
iodepth=8
bs=4096
filename=/dev/qblkdev
rw=randwrite

[fio_randwrite]
thinktime=2s
thinktime_blocks=4
```

jobfile2:
```
[global]
thread
kb_base=1000
direct=1
size=1GiB
group_reporting
io_size=96KiB
ioengine=libaio
iodepth=8
bs=4096
filename=/dev/qblkdev
rw=randwrite

[fio_randwrite]
thinktime=2s
thinktime_blocks=4
thinktime_blocks_type=issue
```

Results:
Current HEAD:
fio jobfile1:
write: IOPS=5, BW=24.6kB/s (24.0KiB/s)(98.3kB/4002msec); 0 zone resets
blktrace:
11 reqs -- 2s -- 8 reqs -- 2s -- 5 reqs -- end

This patch:
fio jobfile1:
write: IOPS=5, BW=24.6kB/s (24.0KiB/s)(98.3kB/4001msec); 0 zone resets
blktrace:
11 reqs -- 2s -- 8 reqs -- 2s -- 5 reqs -- end

fio jobfile2:
write: IOPS=1, BW=8190B/s (8190B/s)(98.3kB/12002msec); 0 zone resets
blktrace:
4 reqs -- 2s -- 4 reqs ... -- 4 reqs -- 2s -- end

Server:
fio --server=192.168.1.172,8765
Client (On the same machine):
fio --client=192.168.1.172,8765 jobfile1
write: IOPS=5, BW=24.6kB/s (24.0KiB/s)(98.3kB/4001msec); 0 zone resets
blktrace:
11 reqs -- 2s -- 8 reqs -- 2s -- 5 reqs -- end

fio --client=192.168.1.172,8765 jobfile2
write: IOPS=1, BW=8191B/s (8191B/s)(98.3kB/12001msec); 0 zone resets
blktrace:
4 reqs -- 2s -- 4 reqs ... -- 4 reqs -- 2s -- end

Signed-off-by: Hongwei Qin <glqinhongwei@gmail.com>
[axboe: fold patch 3 into this one]
Signed-off-by: Jens Axboe <axboe@kernel.dk>