git.kernel.dk Git - fio.git/log

examples: Add example for atomic write verify

Add an example for verifying atomic writes.

Currently, atomic writes are supported only on Linux and only for block
devices, so only give instructions for that case.

Support for XFS and EXT4 is being worked on, and the instructions can be
updated in due course.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240916165347.2226763-10-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fio: Support verify_write_sequence

Add an option to disable verifying the write sequence number. Verification
is enabled by default, but it is disabled for verify_only mode.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240916165347.2226763-9-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

doc: Document atomic command

Now that the atomic command is formally supported, document it.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240916165347.2226763-8-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

tools/fiograph: Update for atomic support

Add atomic support for the specific engines which support this option.

This just means that "atomic" will show up as a special ioengine config
option (if the fio 'atomic' option is specified).

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240916165347.2226763-7-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: Support RWF_ATOMIC

Set RWF_ATOMIC for writes when oatomic==1.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240916165347.2226763-6-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

libaio: Support RWF_ATOMIC

Set RWF_ATOMIC for writes when oatomic==1.

Guard setting RWF_ATOMIC by FIO_HAVE_RWF_ATOMIC, as only Linux supports
RWF_ATOMIC.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240916165347.2226763-5-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

pvsync2: Support RWF_ATOMIC

Set RWF_ATOMIC for writes when atomic==1.

Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
jpg: Set FIO_ATOMICWRITES for pvsync2
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240916165347.2226763-4-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

os: Reintroduce atomic write support

Previously O_ATOMIC support was added in commit d01612f3ae25 ("Add support
for O_ATOMIC"). But support was removed in commit a25ba6c64fe1 ("Get rid of
O_ATOMIC"), as support was never added in the Linux kernel.

Linux kernel 6.11 will add support for RWF_ATOMIC, which can be supported
for various ioengines. See latest man pages for details.

The plumbing was left in place for thread option oatomic, so that will be
reused.

Add a flag to say whether an engine supports atomic writes, and reject
when oatomic is set for an engine which does not support atomic writes.

This is a change in behaviour: since commit a25ba6c64fe1 ("Get rid of
O_ATOMIC"), the oatomic option has been ignored. However, it is better to
tell the user that their ioengine of choice does not support atomic writes.

Today RWF_ATOMIC is only supported for direct-IO. In future it may be
supported for buffered IO. As such, do not auto-set odirect=1 when
oatomic==1.
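
The rejection rule described above can be sketched in Python as a plain
flag check. This is an illustration only: the bit value given to
FIO_ATOMICWRITES and the helper name check_atomic() are assumptions, not
fio's actual code.

```python
# Hypothetical bit value; only the flag name comes from the commit series.
FIO_ATOMICWRITES = 1 << 0


def check_atomic(engine_flags, oatomic):
    """Return an error string if oatomic is set but the engine lacks the flag.

    Returns None when the combination is acceptable. Note that nothing here
    touches odirect: direct IO is deliberately not auto-enabled.
    """
    if oatomic and not (engine_flags & FIO_ATOMICWRITES):
        return "ioengine does not support atomic writes"
    return None


if __name__ == "__main__":
    print(check_atomic(0, True))                 # engine without the flag
    print(check_atomic(FIO_ATOMICWRITES, True))  # engine with the flag
```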

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240916165347.2226763-3-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

os-linux: Define RWF_ATOMIC

Add a definition of RWF_ATOMIC when not available from uapi headers.

RWF_ATOMIC is going to be part of Linux v6.11.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240916165347.2226763-2-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Update mailing list details in README.rst

Majordomo commands are no longer used, so update README.rst with the new
method of subscribing to the mailing list.

Signed-off-by: Rebecca Cran <rebecca@bsdio.com>
Link: https://lore.kernel.org/r/20240906211941.850156-1-rebecca@bsdio.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

t/jobs/t0035: add test for the file operations IO engine

The previous commit fixed the NULL pointer dereference which happened
when the write_lat_log option is specified for the file operations IO
engine. Add a new test case to confirm the fix. This test case also
covers the basic use cases of the file operations IO engine.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240906023717.1464031-3-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

stat: fix the null io_u dereference in add_clat_sample()

As recorded in the Link below, a NULL pointer dereference happens when
the write_lat_log option is specified for the file operations IO engine.
This failure was caused by commit 14d3134a5fc0 ("introduce the
log_issue_time option"), which added the new field 'issue_time' to
struct log_sample. To calculate the issue time, add_clat_sample() was
modified to refer to io_u->issue_time. However, the file operations IO
engine passes NULL as the io_u pointer, hence the failure.

Fix this by skipping the io_u->issue_time reference when io_u is NULL.
Instead, set 0 as the issue time.

Link: https://lore.kernel.org/fio/0e2c84c9-f9e4-4073-a075-016393ca7bde@gmail.com/
Fixes: 14d3134a5fc0 ("introduce the log_issue_time option")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240906023717.1464031-2-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

iolog: add va_end on error

Call va_end when we encounter an error trying to print sample fields.

This was reported by Coverity:

** CID 509197:  API usage errors  (VARARGS)
/iolog.c: 1025 in print_sample_fields()
________________________________________________________________________________________________________
*** CID 509197:  API usage errors  (VARARGS)
/iolog.c: 1025 in print_sample_fields()
1019      int ret;
1020
1021      va_start(ap, fmt);
1022      ret = vsnprintf(*p, *left, fmt, ap);
1023      if (ret < 0 || ret >= *left) {
1024          log_err("sample file write failed: %d\n", ret);
>>>     CID 509197:  API usage errors  (VARARGS)
>>>     "va_end" was not called for "ap".
1025          return -1;
1026      }
1027      va_end(ap);
1028
1029      *p += ret;
1030      *left -= ret;

Fixes: 3ec6b6da ("iolog: refactor flush_samples()")
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

t/jobs/t0034: add test for the log_issue_time option

Add a test to check the newly added option 'log_issue_time'. Generate
log files using the option and check that lines in the log files have
the format described in the "Log File Format" section in HOWTO.rst.

This test case shares its logic with t0033, differing only in the log
file names and matching patterns. Factor out the common logic into the
new class FioJobFileTest_LogFileFormat.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240829085826.999859-10-shinichiro.kawasaki@wdc.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

t/jobs/t0033: add test for the log file format

Add a test to check the log file format which is described in the "Log
File Format" section in HOWTO.rst. Generate log files using combination
of options relevant to log files, and check that lines in log files have
the format expected. This test helps to confirm that the changes in the
log file related functions do not cause regressions.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240829085826.999859-9-shinichiro.kawasaki@wdc.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

doc: describe the log_issue_time option

A recent commit introduced the new option log_issue_time. Describe its
behavior and restrictions.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240829085826.999859-8-shinichiro.kawasaki@wdc.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

doc: fix the descriptions of the log_prio option

Commit 03ec570f6e57 ("fio: Introduce the log_prio option") added the
description of the log_prio option to fio.1. However, the description
was wrong. It mentioned that the log_prio option would change the number
of fields in the Log File Format, but actually it does not do so. Fix
the description.

Also, the commit did not update HOWTO.rst for the log_prio option. To
keep HOWTO.rst consistent with fio.1, add the missing descriptions to
HOWTO.rst.

Fixes: 03ec570f6e57 ("fio: Introduce the log_prio option")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240829085826.999859-7-shinichiro.kawasaki@wdc.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

introduce the log_issue_time option

When write_lat_log option is set, fio outputs the 'clat' completion
latency log file. This clat log can be used to analyze IO latency. This
file can also be considered as an IO trace file as each IO entry can
fully describe an IO when the --log_offset and --log_prio options are
also used.

However, using the clat log file as an IO trace is inaccurate for two
reasons. Firstly, the time field of each entry uses millisecond units,
which is too coarse for fast IOs (e.g. when using SSDs). Secondly, the
time field value is recorded not at command completion, but at log
sample recording. The time field value therefore differs slightly from
the IO completion time and can be used only as an approximate completion
time.

To analyze IO issue time and IO completion time accurately using the
clat log, introduce the new option 'log_issue_time'. When this option is
set, add another field to the log file entries and store the IO issue
time in nanoseconds in that field. The IO completion time can be
calculated by adding the completion latency to the IO issue time.

The IO issue time field is also added to the 'slat' submit latency log
file. This helps to calculate the IO start time by subtracting the
submission latency from the IO issue time.

The log_issue_time option can be used for IO tracing when the
write_lat_log option and the log_offset option are set together. When
the log_issue_time option is set but the write_lat_log option or the
log_offset option is not set, fio errors out. When the log_issue_time
option and the write_lat_log option are set together with other
write_X_log options, the IO issue time field is added to all log files.
For log files other than the clat and slat logs, the IO issue time has
no meaning, so '0' is set in the field. When the log_avg_msec option is
set, the average of the log values over the specified duration is
logged. The IO issue time has no meaning in this case either, and '0' is
set in the field.
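
The two calculations described above (completion time from the clat log,
start time from the slat log) can be sketched in Python as follows. The
field positions are assumptions made for illustration: the latency is
taken to be the second field, in nanoseconds, and the issue time the last
field. The authoritative layout is the "Log File Format" section of
HOWTO.rst, and the sample entries are made up.

```python
# Sketch only: field positions are assumed, not taken from fio's sources.

def parse_entry(line):
    """Split one comma-separated log entry into integer fields."""
    return [int(field) for field in line.split(",")]


def completion_time_ns(clat_line):
    """Completion time = IO issue time + completion latency (clat log)."""
    fields = parse_entry(clat_line)
    clat_ns = fields[1]         # assumed: second field is the latency in nsec
    issue_time_ns = fields[-1]  # assumed: issue time is the last field
    return issue_time_ns + clat_ns


def start_time_ns(slat_line):
    """Start time = IO issue time - submission latency (slat log)."""
    fields = parse_entry(slat_line)
    slat_ns = fields[1]
    issue_time_ns = fields[-1]
    return issue_time_ns - slat_ns


if __name__ == "__main__":
    # Made-up entries, not real fio output.
    print(completion_time_ns("12, 85000, 1, 4096, 8192, 12000000"))
    print(start_time_ns("12, 3000, 1, 4096, 8192, 12000000"))
```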

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240829085826.999859-6-shinichiro.kawasaki@wdc.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

iolog: drop struct io_sample_offset

Fio uses struct io_sample to log the attributes of each IO. When the
log_offset option is set, fio instead uses struct io_sample_offset,
which has an additional field dedicated to the offset data. Fio chooses
between these two structs based on the log_offset option to minimize
memory usage for IO sampling. However, the dedicated struct
io_sample_offset is not flexible and makes it difficult to add new
sampling items.

To allow adding new sampling items, drop the struct io_sample_offset.
Instead, introduce the variable length array "uint64_t aux[]" at the
end of the struct io_sample which holds any auxiliary sampling data.
At this moment, it holds only one item "offset" as the first array
element. The following patch will add a new item.
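
The idea of a fixed record followed by a variable number of 64-bit
auxiliary items can be sketched with Python's struct module. The header
layout below is hypothetical and does not mirror the real struct
io_sample; only the trailing "aux" array of uint64_t items reflects the
description above.

```python
import struct

# Hypothetical fixed part: time, value, ddir, bs (little-endian, unpadded).
HEADER = struct.Struct("<QQIQ")


def pack_sample(time, value, ddir, bs, aux=()):
    """Pack the fixed header followed by zero or more uint64 aux items."""
    tail = b"".join(struct.pack("<Q", item) for item in aux)
    return HEADER.pack(time, value, ddir, bs) + tail


def unpack_aux(blob):
    """Recover the aux items that follow the fixed header."""
    count = (len(blob) - HEADER.size) // 8
    return [struct.unpack_from("<Q", blob, HEADER.size + 8 * i)[0]
            for i in range(count)]


if __name__ == "__main__":
    plain = pack_sample(1, 2, 0, 4096)
    with_offset = pack_sample(1, 2, 0, 4096, aux=(8192,))
    print(len(plain), len(with_offset), unpack_aux(with_offset))
```

Only samples that carry auxiliary data pay for the extra eight bytes per
item, which is the memory trade-off the message describes.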

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240829085826.999859-5-shinichiro.kawasaki@wdc.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

iolog: refactor flush_samples()

flush_samples() controls the log file format depending on the options
log_avg_msec, log_window_value, log_offset and log_prio. It has deeply
nested branches to check the options. These nested branches make it
difficult to add more fields to the log file format. To make log file
format improvements easier, refactor the function. Instead of checking
all conditions at once, check each condition one by one, generate a
small string for each field and append it to the final output string.
For this purpose, introduce the new function print_sample_fields(),
which generates each field string.
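
The "one condition, one small field string" approach can be sketched in
Python as below. This is not fio's code: the function name
format_sample(), the dictionary keys and the exact field formatting are
assumptions chosen for illustration.

```python
def format_sample(sample, log_offset=False, log_prio=False):
    """Build one log line by appending a small string per field."""
    parts = ["%d, %d, %d, %d" % (sample["time"], sample["value"],
                                 sample["ddir"], sample["bs"])]

    # Each option is checked on its own instead of in nested branches.
    if log_offset:
        parts.append(", %d" % sample["offset"])

    # The priority field is always present; only its format changes
    # (assumed here: hex value with log_prio, otherwise a 0/1 flag).
    if log_prio:
        parts.append(", 0x%04x" % sample["prio"])
    else:
        parts.append(", %d" % (1 if sample["prio"] else 0))

    return "".join(parts)


if __name__ == "__main__":
    sample = {"time": 5, "value": 100, "ddir": 1, "bs": 4096,
              "offset": 8192, "prio": 0}
    print(format_sample(sample, log_offset=True))
```

Adding a further field then means adding one more independent check
rather than another level of nesting.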

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240829085826.999859-4-shinichiro.kawasaki@wdc.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

stat: reduce arguments of add_log_sample()

The number of arguments of add_log_sample() has increased as fields were
added to the log file format. Five of them (data, ddir, bs, offset and
priority) are passed on to __add_log_sample(). This makes the function
look more complicated and makes adding log fields harder. To simplify
the function, pack the five arguments into the new struct log_sample.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240829085826.999859-3-shinichiro.kawasaki@wdc.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

stat: reduce arguments of add_*lat_sample() functions

The functions add_clat_sample(), add_lat_sample() and add_slat_sample()
have a rather large number of arguments, and some of the arguments are
members of the struct io_u. Pass io_u instead of those arguments to
reduce the number of arguments.

Some add_clat_sample() callers in engines/fileoperations.c have no io_u
reference, so they pass NULL instead of an io_u. A NULL io_u indicates
that 0 should be used in place of the io_u fields.

While add_slat_sample() takes only struct thread_data * and struct
io_u *, add_clat_sample() and add_lat_sample() still require three more
arguments: 1) nsec is required because struct io_u does not have the
completion time, 2) ddir is required to support the fileoperations IO
engine, and 3) bs is required to record partial IO completion.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240829085826.999859-2-shinichiro.kawasaki@wdc.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

Merge branch 'no-librpma' of https://github.com/grom72/fio

* 'no-librpma' of https://github.com/grom72/fio:
  Remove obsolete library ref.
  Fix parameter type
  rpma: remove librpma support
  Revert "rpma: add librpma_apm_* and librpma_gpspm_* engines"
  Revert "rpma: RPMA engine requires librpma>=v0.10.0 with rpma_mr_advise()"
  Revert "rpma: RPMA engines require librpma>=v0.11.0 with rpma_cq_get_wc()"
  Revert "rpma: simplify server_cmpl_process()"
  Revert "ci: build the librpma fio engine"

Remove obsolete library ref.

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>

Fix parameter type

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>

Remove 'uncached' support

Experimental kernel patches existed for this, usable via setting
RWF_UNCACHED for the IO. But they never made it into mainline, so
let's mark the option as deprecated and kill off the support for
now.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

rpma: remove librpma support

Remove librpma support as the rpma project has been archived:
https://github.com/pmem/rpma

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>

Revert "rpma: add librpma_apm_* and librpma_gpspm_* engines"

This reverts commit e4c4625ff8368f7667b2fe81cd2040186d440c94.

Revert "rpma: RPMA engine requires librpma>=v0.10.0 with rpma_mr_advise()"

This reverts commit e662bc9815de906e3498f4261ec5a28481872a18.

Revert "rpma: RPMA engines require librpma>=v0.11.0 with rpma_cq_get_wc()"

This reverts commit d479658a965ac17ff213d7ba506116f822cb3219.

Revert "rpma: simplify server_cmpl_process()"

This reverts commit d3061c18e84c91a417f8832b1a7cc09b1d26d1ee.

Revert "ci: build the librpma fio engine"

This reverts commit 4e2bd713356cfc89ea6c898985c492af93b34a5d.

ci: add containers for Alma, Oracle, and Rocky Linux

Expand our testing platforms with these distributions. They mostly use
the same package names as Fedora with a handful of exceptions.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

ci: install updated bash on macOS platforms via workflow

Our shell script for installing dependencies uses a feature that is only
available starting with bash 4. macOS ships with bash 3, so install bash
from homebrew in the GitHub Actions workflow when running on macOS.
Previously we could install bash from within the dependency install
script itself, but that script now needs the bash 4 feature.

The feature in question is for bash to be able to match multiple
patterns in a case statement.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

ci: install isal packages for testing

This brings in faster checksum calculation routines that are used for
protection information support.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

ci: run tests in containers

This patch adds GitHub Actions tests that run in containers to support
running tests on Debian and Fedora. Ubuntu tests are also run in
containers.

The io_uring (t0018) and command priority (in latency_percentiles.py)
tests are not supported in containers, so the Ubuntu container tests
cannot replace the tests running directly on GitHub Actions Ubuntu
runners.

This is a single uncomfortably large patch because all of these changes
are required for the tests to pass.

Here is a list of changes:

ci.yml:
  add GitHub Actions jobs for the different containers
actions-build.sh:
  use bash found in PATH to pick up bash 4 installed on macOS because
    bash 4 is required to match multiple patterns in a case statement
  only enable cuda when running on Ubuntu because Debian does not have
    the cuda package by default
actions-full-test.sh:
  skip io_uring and cmdprio tests when running on containers because
    these features are not supported
actions-install.sh:
  install nvidia-cuda-dev only on Ubuntu because this is not available
    for Debian by default
  install additional packages when running on Debian and Ubuntu
    containers. These are already installed in the GHA image.
  install packages for Fedora
  install bash via homebrew on macOS to get bash v4

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

ci: remove arm64 case for tests

We no longer run tests on arm64 platforms, so remove related part of the
shell script.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

Merge branch 'errdetails' of https://github.com/minwooim/fio

* 'errdetails' of https://github.com/minwooim/fio:
io_uring: Add .errdetails to parse CQ status
ioengines: Add thread_data to .errdetails

io_uring: Add .errdetails to parse CQ status

Background
- fio normally prints the strerror string and errno value when it
  encounters an error.  With the io_uring_cmd ioengine and
  --cmd_type=nvme, io_u->error holds the CQ entry status code type and
  status code, which should be parsed as an NVMe error value rather
  than as an errno.

On an io_u error, print the parsed CQ entry status values as SCT (Status
Code Type) and SC (Status Code).  The output looks like the following
example:

  fio: io_uring_cmd: /dev/ng0n1: cq entry status (sct=0x00; sc=0x04)

If --cmd_type!=nvme, print a generic status code instead, like below:

  fio: io_uring_cmd: /dev/<devnode>: status=0x4
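
How the two message forms above could be produced is sketched below in
Python. The bit layout (SC in the low 8 bits, SCT in the 3 bits above
them) is an assumption for illustration, as is the helper name
describe_status(). Only the two output formats are taken from the
examples above.

```python
def describe_status(devnode, status, cmd_type="nvme"):
    """Format an io_uring_cmd error status the way the examples show."""
    if cmd_type == "nvme":
        sct = (status >> 8) & 0x7  # assumed position of Status Code Type
        sc = status & 0xFF         # assumed position of Status Code
        return ("fio: io_uring_cmd: %s: cq entry status "
                "(sct=0x%02x; sc=0x%02x)" % (devnode, sct, sc))
    # Any other command type: print the raw status value.
    return "fio: io_uring_cmd: %s: status=0x%x" % (devnode, status)


if __name__ == "__main__":
    print(describe_status("/dev/ng0n1", 0x004))
    print(describe_status("/dev/ng0n1", 0x4, cmd_type="other"))
```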

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>

ioengines: Add thread_data to .errdetails

No functional changes here, but added a 'struct thread_data *td' to the
errdetails callback. This is a prep patch for the following commits to
access 'td->eo' instance from .errdetails callback.

Bump up FIO_IOOPS_VERSION to 36 since the previous commits updated
.errdetails callback for ioengines by adding 'thread_data' argument.

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>

Merge branch 'master' of https://github.com/scaleoutsean/fio

* 'master' of https://github.com/scaleoutsean/fio:
Improve http_host, filename in docs and example/http-s3.fio

Improve http_host, filename in docs and example/http-s3.fio

In fio.1 and HOWTO.rst:
  * http_host: add details on vHost- vs Path-style hostname
    differences
  * filename: add details on bucket prefix for Path-style hostname
In examples/http-s3.fio:
  * add comments at the top to clarify vHost vs Path S3 and change
    the hostname in existing example to disambiguate two scenarios

Signed-off-by: Sean Lee <sean.lee@netapp.com>

nvme/streams: avoid allocating too large a buffer

We only need to allocate as many data structures as there were stream
IDs provided.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

Merge branch 'qnx-phys-mem' of https://github.com/mvf/fio

* 'qnx-phys-mem' of https://github.com/mvf/fio:
QNX: Fix physical memory detection

QNX: Fix physical memory detection

On QNX 7.1, this was returning an uninitialized value due to failing
sysctl. Use the accumulated size of all "ram" areas from the asinfo
array in the syspage instead. Also fixes build on QNX 8.0.

Signed-off-by: Matthias von Faber <mvf@gmx.eu>

Merge branch 'fdp/pid_limit_fix' of https://github.com/ankit-sam/fio

* 'fdp/pid_limit_fix' of https://github.com/ankit-sam/fio:
  ioengines: bump up FIO_IOOPS_VERSION
  dataplacement: change log_info to log_err for error messages
  dataplacement: remove FDP_MAX_RUHS

ioengines: bump up FIO_IOOPS_VERSION

For the fdp backend, the way ruhs are fetched has changed. Previously
fdp_fetch_ruhs was called once with a buffer that could store up to
FDP_MAX_RUHS ruhs. Now fdp_fetch_ruhs is called twice. The first call
has no buffer for ruhs and only gets the number of ruhs reported by the
device. The second call has a buffer that can store all the ruhs.

This impacts any external ioengines, so bump up FIO_IOOPS_VERSION.

Fixes: commit 56d12245 (dataplacement: update ruh info initialization)

Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>

dataplacement: change log_info to log_err for error messages

Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>

dataplacement: remove FDP_MAX_RUHS

Earlier fio used to have different limits on the max number of data
placement ID indices which user can pass, and the max number of ruhs
which can be stored. As during initialization we now fetch all the ruhs
from the device and then allocate buffer for the requested ones, there
is no need for FDP_MAX_RUHS.

Add missing error message if incorrect placement ID index is specified.

Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>

t/nvmept_fdp: accommodate devices with many RUHS

Fio can only accept 128 placement IDs to write to. It is possible for
namespaces to have thousands of placement IDs. Adjust the standard tests
to accommodate this situation. Instead of just assuming that the device
has fewer than 128 placement IDs, change the expected RUAMW calculations
to also work for the case where the namespace has more than 128
placement IDs.

Also adjust the scheme test to work where there are more RUHS than the
max scheme entries allowed.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

engines/xnvme: allocate fdp ruhs buffer as per actual

Remove the restriction on maximum number of ruhs, fetch and fill the
ruhs buffer as requested by fdp backend.

Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

engines/io_uring: fdp allocate ruhs buffer as per actual

Use calloc instead of scalloc as ruhs buffer allocation is temporary.
Remove the restriction on maximum number of ruhs, fetch and fill the
ruhs buffer as requested by fdp backend.

Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

dataplacement: update ruh info initialization

The current way of initialization limits ruhs to 128. This commit
updates the way we fetch ruhs. We now fetch the ruhs info in two steps.
The first step only gets the number of ruhs from the ioengine. The fdp
backend uses this number to allocate the correct buffer size for the
second step, where we fetch the actual ruhs info. Fio no longer limits
the maximum number of ruhs for a device.
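
The two-step pattern (query the count first, then fetch into a buffer of
exactly that size) can be sketched in Python as follows. FakeDevice and
fetch_all_ruhs() are hypothetical stand-ins and do not correspond to
fio's ioengine interface.

```python
class FakeDevice:
    """Hypothetical stand-in for an ioengine that reports ruhs."""

    def __init__(self, ruhs):
        self._ruhs = list(ruhs)

    def fetch_ruhs(self, buf):
        """Fill buf if one is given; always return the total ruhs count."""
        if buf is not None:
            n = min(len(buf), len(self._ruhs))
            buf[:n] = self._ruhs[:n]
        return len(self._ruhs)


def fetch_all_ruhs(dev):
    """Fetch every ruh without assuming a fixed upper limit."""
    count = dev.fetch_ruhs(None)  # step 1: no buffer, only get the count
    buf = [0] * count             # allocate exactly what the device needs
    dev.fetch_ruhs(buf)           # step 2: fetch the actual ruhs info
    return buf


if __name__ == "__main__":
    dev = FakeDevice(range(300))  # more than the former limit of 128
    print(len(fetch_all_ruhs(dev)))
```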

Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
[Vincent: edited commit message]

os/os-qnx: fix overwrite of 'errno'

You can't call perror() and expect errno to retain the same value,
it'll be zero after that. On top of that, there's no reason to
print the error here, the higher up layers will do that when an error
is returned.

Fixes: 946733c6f19c ("Support QNX OS platform")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'master' of https://github.com/huangweiliang/fio_for_qnx

* 'master' of https://github.com/huangweiliang/fio_for_qnx:
Support QNX OS platform

Support QNX OS platform

Compile command: QNX SDP7.1 example
./configure --cc=aarch64-unknown-nto-qnx7.1.0-gcc --disable-shm
make

Functionality was verified with read and write tests on a UFS device.

Signed-off-by: Huang weiliang <weller.huang@us.bosch.com>

Merge branch 'io-uring-cmd/nvme/add-flush' of https://github.com/minwooim/fio

* 'io-uring-cmd/nvme/add-flush' of https://github.com/minwooim/fio:
  t/nvmept.py: Add test cases for FLUSH
  io_u: Support fsync for --rw=trimwrite
  io_u: Ensure fsync only after write(s)
  td: Replace last_was_sync with last_ddir_issued
  td: Rename last_ddir to last_ddir_completed
  io_uring: Add support FLUSH command

t/nvmept.py: Add test cases for FLUSH

This test script checks the number of FLUSH commands triggered by the
--fsync=<N> option, to make sure that FLUSH commands follow the WRITE
commands issued by the various --rw I/O workloads.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>

io_u: Support fsync for --rw=trimwrite

Even if ddir is determined in get_rw_ddir(), it might be updated in
set_rw_ddir(). If td represents trimwrite, ddir is updated to either
DDIR_TRIM or DDIR_WRITE even when it already holds DDIR_SYNC.

To support DDIR_SYNC (fsync) for trimwrite, check ddir_sync() in the
trimwrite case so that the pre-determined ddir is not updated.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>

io_u: Ensure fsync only after write(s)

When using `--rw=write --fsync=N`, the FLUSH command is correctly
issued after N WRITE commands. However, if READ commands are mixed
in with --rw, fsync occurs after READ commands as well. This patch
ensures that fsync is only triggered after the specified number of
WRITE commands, regardless of READ commands.
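
The intended behaviour can be sketched in Python with a counter that
only advances on writes. This illustrates the rule, not fio's
implementation; the function name and the string op labels are made up.

```python
def issue_sequence(ops, fsync_every):
    """Return ops with a 'sync' inserted after every N-th write.

    Reads are passed through and never advance the write counter.
    """
    out = []
    writes_since_sync = 0
    for op in ops:
        out.append(op)
        if op == "write":
            writes_since_sync += 1
            if fsync_every and writes_since_sync == fsync_every:
                out.append("sync")
                writes_since_sync = 0
    return out


if __name__ == "__main__":
    mixed = ["write", "read", "read", "write", "read", "write"]
    print(issue_sequence(mixed, 2))
```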

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>

td: Replace last_was_sync with last_ddir_issued

`last_was_sync` indicated whether the last command was a DDIR_SYNC.
This can be replaced with `ddir_sync(last_ddir_issued)`, which is much
more flexible because it records the last issued command's data
direction.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>

td: Rename last_ddir to last_ddir_completed

`last_ddir` represents the data direction of the latest completed
command. To avoid confusion, rename `last_ddir` to
`last_ddir_completed` to make this clearer.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>

io_uring: Add support FLUSH command

Add support for --fsync and --fdatasync in io_uring_cmd ioengine to
enable FLUSH commands just like libaio or io_uring ioengines.

If --fsync or --fdatasync is set to N, a FLUSH command is issued after
every N write commands.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>

windows: don't define strtoll for 32-bit builds

Our 32-bit Windows Cygwin builds started failing because one of the
libraries they require now defines strtoll when fio already defines
strtoll. To avoid this, don't define strtoll for 32-bit Windows builds.

Failed build: https://github.com/axboe/fio/actions/runs/9718276970/job/26825784199

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

Merge branch 'nfs' of https://github.com/panxiao2014/fio

* 'nfs' of https://github.com/panxiao2014/fio:
Fix issue when start randwrite by using nfs engine

Fix issue when start randwrite by using nfs engine

Use td_write(td) instead of td->o.td_ddir == TD_DDIR_WRITE, so that fio
treats write and randwrite the same way when starting jobs with the NFS
engine.
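
Why the equality test misses randwrite can be sketched in Python with
bit flags. The numeric values below are assumptions for illustration.
The point is only that randwrite carries an extra "random" bit, so it
never compares equal to the plain write value, while a bitwise test
matches both.

```python
# Assumed flag values, for illustration only.
TD_DDIR_READ = 1 << 0
TD_DDIR_WRITE = 1 << 1
TD_DDIR_RAND = 1 << 2
TD_DDIR_RANDWRITE = TD_DDIR_WRITE | TD_DDIR_RAND


def is_write_by_equality(td_ddir):
    """The old check: true only for plain sequential write."""
    return td_ddir == TD_DDIR_WRITE


def td_write(td_ddir):
    """The new check: true whenever the write bit is set."""
    return bool(td_ddir & TD_DDIR_WRITE)


if __name__ == "__main__":
    print(is_write_by_equality(TD_DDIR_RANDWRITE), td_write(TD_DDIR_RANDWRITE))
```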

Pan Xiao <xiaopan@outlook.com>

smalloc: add a comment explaining why scalloc does not zero memory

scalloc does not zero out the buffer because this is already done
elsewhere. Explain this in a comment because this could be confusing.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

helper_thread: check for null scalloc return value

scalloc can return NULL if it fails to allocate memory. Check for this
condition and fail the helper thread setup if it occurs.

This issue was reported by Coverity:

** CID 496644:  Null pointer dereferences  (NULL_RETURNS)
/helper_thread.c: 425 in helper_thread_create()
419
420      hd = scalloc(1, sizeof(*hd));
421
422      setup_disk_util();
423      steadystate_setup();
424
>>>     CID 496644:  Null pointer dereferences  (NULL_RETURNS)
>>>     Dereferencing "hd", which is known to be "NULL".
425      hd->sk_out = sk_out;

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

t/stest: remove useless error assignment

Because the break statement breaks out of the while loop, setting error
to true has no effect. So remove this useless assignment.

This issue was reported by Coverity:

** CID 496645:  Code maintainability issues  (UNUSED_VALUE)
/t/stest.c: 83 in do_rand_allocs()
77      flist_del(&e->list);
78      sfree(e);
79
80      if (!error) {
81          e = scalloc(1, LARGESMALLOC);
82          if (!e) {
>>>     CID 496645:  Code maintainability issues  (UNUSED_VALUE)
>>>     Assigning value "true" to "error" here, but that stored value is overwritten before it can be used.
83              error = true;
84              ret++;
85              printf("failure allocating %u bytes at %lu allocated during sfree phase\n",
86                  LARGESMALLOC, total);
87              break;
88          }

Fixes: c6783fc3 ("t/stest: confirm that scalloc clears the buffer")
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

iolog: check scalloc return value

It is possible for scalloc to return NULL. setup_log() does not return a
value to indicate failure but we can use an assert here to check for a
NULL scalloc return value. This will trigger an exception similar to the
segfault that would happen if scalloc returns null, but this should
silence Coverity.

This was reported by Coverity:

** CID 496646:  Null pointer dereferences  (NULL_RETURNS)
/iolog.c: 843 in setup_log()

*** CID 496646:  Null pointer dereferences  (NULL_RETURNS)
/iolog.c: 843 in setup_log()
837      struct io_log *l;
838      int i;
839      struct io_u_plat_entry *entry;
840      struct flist_head *list;
841
842      l = scalloc(1, sizeof(*l));
>>>     CID 496646:  Null pointer dereferences  (NULL_RETURNS)
>>>     Dereferencing "l", which is known to be "NULL".

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

t/stest: confirm that scalloc clears the buffer

Change smalloc calls to scalloc and make sure buffers are zeroed out.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

Reapply "smalloc: smalloc() already clears memory, scalloc() need not do it again"

This reverts commit eb7fe4550ff2a569d0d8c71de16a1ea1e1aaf0a5.

It turns out that each buffer is in fact cleared in smalloc_pool() when
it is called by smalloc(). So there is no need to clear the buffer a
second time in scalloc.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

engines/io_uring: eliminate FDP memory corruption risk

We only allocate FDP_MAX_RUHS reclaim unit handle status descriptors. It
is possible that the device will have more than this many descriptors.
Make sure we do not run over the end of the buffer we have allocated
when this happens.
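
The bounds check described above could look roughly like the sketch below. It is an illustration only: DEMO_MAX_RUHS, demo_ruhs and demo_copy_ruhs are hypothetical names, and the capacity of 4 is chosen just to keep the example small.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_RUHS 4 /* hypothetical capacity, standing in for FDP_MAX_RUHS */

struct demo_ruhs {
    uint16_t nr;
    uint16_t pid[DEMO_MAX_RUHS];
};

/*
 * Copy the device's descriptor list into a fixed-size buffer. If the
 * device reports more entries than the buffer holds, only the first
 * DEMO_MAX_RUHS are copied. Returns the number of entries copied.
 */
uint16_t demo_copy_ruhs(struct demo_ruhs *dst, const uint16_t *dev_pids,
                        uint16_t dev_nr)
{
    uint16_t n = dev_nr > DEMO_MAX_RUHS ? DEMO_MAX_RUHS : dev_nr;

    memcpy(dst->pid, dev_pids, n * sizeof(dst->pid[0]));
    dst->nr = n;
    return n;
}
```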

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

Revert "smalloc: smalloc() already clears memory, scalloc() need not do it again"

Originally:
smalloc cleared the buffer
scalloc did not need to clear the buffer

Later: 9c3e13e3314da394698ca32f21cc46d46b7cfe47
smalloc was changed to not clear the buffer
scalloc was not updated to clear the buffer when the above smalloc
change was made

Originally smalloc always cleared the buffer. So it wasn't necessary for
scalloc to clear it again. But later on smalloc was changed to no longer
clear the buffer but scalloc was not changed back to clear the buffer.

Reverting this commit makes scalloc clear the buffer again.
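
To illustrate the relationship described in this message, here is a minimal sketch of a calloc-style wrapper on top of an allocator that does not clear memory. The demo_ names are hypothetical, and plain malloc() stands in for the non-clearing allocator; this is not fio's smalloc code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for an allocator that does NOT clear memory. */
static void *demo_smalloc(size_t size)
{
    return malloc(size);
}

/*
 * calloc-style wrapper. Because the underlying allocator leaves the
 * buffer uninitialized, the wrapper has to zero it itself.
 */
void *demo_scalloc(size_t nmemb, size_t size)
{
    void *p;

    if (size && nmemb > SIZE_MAX / size)
        return NULL; /* nmemb * size would overflow */

    p = demo_smalloc(nmemb * size);
    if (p)
        memset(p, 0, nmemb * size);
    return p;
}
```

If the underlying allocator already guaranteed zeroed memory, the memset() would be redundant, which is the trade-off the revert and reapply commits go back and forth on.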

This reverts commit a640ed36829f3be6d9dd8c7974dba41b9b59e6a5.

Fixes: https://github.com/axboe/fio/issues/1772
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

dataplacement: increase max data placement IDs to 128

Some users have requested the ability to test a larger number of
placement IDs in a single job. Bump the max placement IDs to 128.
Change the type to 16 bits to reduce the amount of space these
additional IDs will consume.

Also bump the server version for this change.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

docs: update and clarify plids option

Make it clearer that for FDP the values specified by the plids option
are indices referencing the list of placement identifiers available to
the namespace.

Also note that it now accepts ranges.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

t/nvmept_fdp: add tests for plid ranges

Add a few tests to make sure that parsing of ranges for placement ID
indices works.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

options: support ranges for FDP plids option

Instead of forcing users to list every single placement ID, allow users
to specify a list of ranges (1-3, 4-6, 7, 8) for placement IDs.
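
A rough sketch of how such a range list can be expanded is shown below. It is not fio's option parser; demo_parse_id_ranges is a hypothetical helper that accepts comma-separated values and lo-hi ranges, with optional spaces before each item.

```c
#include <stdlib.h>

/*
 * Expand a list such as "1-3,7,8" into out[]. Returns the number of
 * IDs written, or -1 on malformed input or when more than max IDs
 * would result.
 */
int demo_parse_id_ranges(const char *s, unsigned int *out, int max)
{
    int n = 0;

    while (*s) {
        char *end;
        unsigned long lo, hi, v;

        while (*s == ' ')
            s++;
        if (*s < '0' || *s > '9')
            return -1;
        lo = strtoul(s, &end, 10);
        hi = lo;
        if (*end == '-') {
            s = end + 1;
            if (*s < '0' || *s > '9')
                return -1;
            hi = strtoul(s, &end, 10);
        }
        if (hi < lo)
            return -1;
        for (v = lo; v <= hi; v++) {
            if (n >= max)
                return -1;
            out[n++] = (unsigned int)v;
        }
        if (*end == ',') {
            end++;
            if (*end == '\0')
                return -1; /* trailing comma */
        } else if (*end != '\0') {
            return -1;
        }
        s = end;
    }
    return n;
}
```

With this helper, "1-3,4-6,7,8" expands to the eight IDs 1 through 8, while inputs such as "3-1" or "1-" are rejected.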

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

docs: fix operations misspellings

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

Merge branch 'io_uring_cmd/support-write-family' of https://github.com/samsungds/fio

* 'io_uring_cmd/support-write-family' of https://github.com/samsungds/fio:
io_uring: Add 'write_mode' option for optional cmds

io_uring: Add 'write_mode' option for optional cmds

Add a new option 'write_mode' to support additional optional commands in
the Write command family, such as Write Uncorrectable and Write Zeroes in
NVMe. Since the io_uring_cmd ioengine lets us define the actual opcode of
the command, this option can be used to test NVMe devices with various
combinations of Write command types.

The 'write_mode' option accepts 'write', 'uncor', 'zeroes' or 'verify'.
'write' is the normal Write command and is the default, 'uncor' is Write
Uncorrectable, 'zeroes' is Write Zeroes and 'verify' is the Verify
command. This should be used with DDIR_WRITE ddir.

This patch sets the command's opcode in fio_ioring_init() and passes it
to fio_nvme_uring_cmd_prep() as an argument, which avoids branches in
the hottest I/O path.
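
The idea of resolving the opcode once at init time can be sketched as follows. The opcode values are the NVM command set opcodes for Write (0x01), Write Uncorrectable (0x04), Write Zeroes (0x08) and Verify (0x0c); the enum names and demo_write_mode_to_opcode() are hypothetical and not taken from fio.

```c
#include <stdint.h>
#include <string.h>

/* NVMe NVM command set opcodes for the write family. */
enum {
    DEMO_NVME_WRITE        = 0x01,
    DEMO_NVME_WRITE_UNCOR  = 0x04,
    DEMO_NVME_WRITE_ZEROES = 0x08,
    DEMO_NVME_VERIFY       = 0x0c,
};

/*
 * Map a write_mode string to an opcode. Intended to run once during
 * setup so the per-I/O path only uses the stored opcode.
 * Returns 0 on success, -1 for an unknown mode.
 */
int demo_write_mode_to_opcode(const char *mode, uint8_t *opcode)
{
    if (!strcmp(mode, "write"))
        *opcode = DEMO_NVME_WRITE;
    else if (!strcmp(mode, "uncor"))
        *opcode = DEMO_NVME_WRITE_UNCOR;
    else if (!strcmp(mode, "zeroes"))
        *opcode = DEMO_NVME_WRITE_ZEROES;
    else if (!strcmp(mode, "verify"))
        *opcode = DEMO_NVME_VERIFY;
    else
        return -1;
    return 0;
}
```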

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>

Merge branch 'fix-coverity-scan-defect' of https://github.com/parkvibes/fio

* 'fix-coverity-scan-defect' of https://github.com/parkvibes/fio:
iolog: fix Error handling issues (NEGATIVE_RETURNS)
iolog: fix Null pointer dereferences (FORWARD_NULL)

iolog: fix Error handling issues (NEGATIVE_RETURNS)

CID 494151: Error handling issues (NEGATIVE_RETURNS) @ io_u.c:1877 in get_io_u()
This patch removes negative returns from dp_init() to ensure
its value can be properly consumed by td_verror()

Signed-off-by: Hyunwoo Park <dshw.park@samsung.com>

iolog: fix Null pointer dereferences (FORWARD_NULL)

CID 494150: Null pointer dereferences (FORWARD_NULL) @ iolog.c:148 in ipo_special()
This patch removes the possibility of a null pointer dereference (io_u->file)
throughout the call stack of get_io_u() → read_iolog_get() → dp_fill_dspec_data()

Signed-off-by: Hyunwoo Park <dshw.park@samsung.com>

Merge branch 'enable-dataplacement-while-replaying-io' of https://github.com/parkvibes/fio

* 'enable-dataplacement-while-replaying-io' of https://github.com/parkvibes/fio:
t/nvmept_fdp: add a test(402)
fio: enable dataplacement(fdp) while replaying I/Os

Merge branch 'io_uring/fix-negative-cqe-status' of https://github.com/minwooim/fio

* 'io_uring/fix-negative-cqe-status' of https://github.com/minwooim/fio:
options: Add support hex value to ignore_error
io_uring: Fix the flip to negative of CQE status

t/nvmept_fdp: add a test(402)

A test (402) checks whether dataplacement (fdp) works while replaying iologs.

Signed-off-by: Hyunwoo Park <dshw.park@samsung.com>

fio: enable dataplacement(fdp) while replaying I/Os

Add initialization and dataplacement logic to enable
dataplacement(fdp) while fio replays I/Os with read_iolog.

Signed-off-by: Hyunwoo Park <dshw.park@samsung.com>

Merge branch 'nvme/support-sync-fua-for-iouring-v2' of https://github.com/minwooim/fio

* 'nvme/support-sync-fua-for-iouring-v2' of https://github.com/minwooim/fio:
io_uring: Add 'readfua' and 'writefua' options

io_uring: Add 'readfua' and 'writefua' options

Provide options to set the FUA flag in CDW12 in the NVMe command. FUA
affects the internal operation of the NVMe controller and is used for
testing. This patch extends the readfua and writefua options to
directly control the FUA flag in the io_uring_cmd engine.
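
For reference, in NVMe Read and Write commands the FUA flag is bit 30 of CDW12, and bits 15:0 hold the zero-based number of logical blocks. A minimal sketch of building that dword follows; demo_build_cdw12() and the macro name are hypothetical, not fio's code.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NVME_RW_FUA (1u << 30) /* Force Unit Access, CDW12 bit 30 */

/*
 * Build CDW12 for a Read or Write command from the zero-based block
 * count (bits 15:0) and the requested FUA setting.
 */
uint32_t demo_build_cdw12(uint16_t nlb, bool fua)
{
    uint32_t cdw12 = nlb;

    if (fua)
        cdw12 |= DEMO_NVME_RW_FUA;
    return cdw12;
}
```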

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>

Merge branch 'enable-dataplacement-scheme' of https://github.com/parkvibes/fio

* 'enable-dataplacement-scheme' of https://github.com/parkvibes/fio:
t/nvmept_fdp: add tests(302,303,400,401) for fdp scheme
fdp: support scheme placement id (index) selection

t/nvmept_fdp: add tests(302,303,400,401) for fdp scheme

- 302/303: invalid options tests
- 400/401: check whether fdp scheme works properly

Signed-off-by: Hyunwoo Park <dshw.park@samsung.com>

fdp: support scheme placement id (index) selection

Add a new placement id selection method called scheme. It allows
users to assign a placement ID (index) depending on the offset range.
The scheme is defined by the user in a file, which is passed to fio
with the dp_scheme option.
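
Selecting a placement ID index by offset range can be pictured with the sketch below. The table layout, the half-open [start, end) convention and the demo_ names are assumptions made for illustration; they are not taken from fio's scheme file format or code.

```c
#include <stddef.h>
#include <stdint.h>

/* One scheme entry: offsets in [start, end) map to placement index pli. */
struct demo_scheme_entry {
    uint64_t start;
    uint64_t end;
    uint16_t pli;
};

/*
 * Find the placement ID index for an offset. Returns 0 and sets *pli
 * when an entry covers the offset, or -1 when no entry matches.
 */
int demo_scheme_lookup(const struct demo_scheme_entry *tbl, size_t n,
                       uint64_t offset, uint16_t *pli)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (offset >= tbl[i].start && offset < tbl[i].end) {
            *pli = tbl[i].pli;
            return 0;
        }
    }
    return -1;
}
```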

Signed-off-by: Hyunwoo Park <dshw.park@samsung.com>

options: Add support hex value to ignore_error

The 'ignore_error=str' option expects either the name or the numeric
value of the errno in string format. With recent additions like
io_uring_cmd ioengine, it's been possible to check not only errno values
but also the actual status values provided by the storage device.

Given that most status codes in NVMe specs are represented in
hexadecimal, specifying error values in hexadecimal is also useful to
ignore some errors.

For example, DULBE (Deallocated or Unwritten Logical Block Error) status
code can be ignored:

ignore_error=0x287
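
Accepting both decimal and hexadecimal values is straightforward with strtol() and base 0, as in the hedged sketch below. demo_parse_error_code() is a hypothetical helper, not fio's option parser, and it does not handle symbolic errno names.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a numeric error value. With base 0, strtol() treats a "0x"
 * prefix as hexadecimal and a leading "0" as octal; anything else is
 * decimal. Returns 0 and sets *code on success, -1 on invalid input.
 */
int demo_parse_error_code(const char *s, int *code)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 0);
    if (errno || end == s || *end != '\0' || v < 0 || v > INT_MAX)
        return -1;
    *code = (int)v;
    return 0;
}
```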

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>

io_uring: Fix the flip to negative of CQE status

Since cqe->res is expected to be a negative value of errno, it's been
flipped to a positive value before passing it to io_u.c. However, in
case of io_uring_cmd with cmd_type=nvme, cqe->res might represent an NVMe
completion status type and code along with control fields such as DNR
since nvme_uring_cmd_end_io() in the NVMe driver sets the completion
status value and passes it up to io_uring.

For example, if a DULBE (Deallocated or Unwritten Logical Block Error)
comes up from the device, cqe->res here would be 0x4287, which is the
DULBE error code of the media error type.

This patch unifies the error code to a positive value regardless of the
error type.
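
The sketch below illustrates the two ideas in this message: normalizing a completion result to a positive value, and splitting an NVMe status such as 0x4287 into its fields. The bit positions used (status code in bits 7:0, status code type in bits 10:8, DNR in bit 14) are an assumption based on the NVMe status field layout with the phase bit dropped; the demo_ names are hypothetical.

```c
#include <stdint.h>

struct demo_nvme_status {
    uint8_t sc;  /* status code */
    uint8_t sct; /* status code type */
    int dnr;     /* do not retry */
};

/* Return a positive error value for either -errno or a positive status. */
unsigned int demo_cqe_error(int res)
{
    return res < 0 ? 0u - (unsigned int)res : (unsigned int)res;
}

/* Split a positive NVMe status value into its fields. */
struct demo_nvme_status demo_decode_status(unsigned int status)
{
    struct demo_nvme_status st;

    st.sc = status & 0xff;
    st.sct = (status >> 8) & 0x7;
    st.dnr = (status >> 14) & 0x1;
    return st;
}
```

Under these assumptions, 0x4287 decodes to status code 0x87 with status code type 2 and DNR set.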

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>

t/zbd: avoid test case 31 failure with small devices

The test case assumed that the test target devices have 128 or more
sequential write required zones and uses 128 as the minimum number of
zones to write. This caused failure when the devices had a smaller
number of sequential write required zones. To avoid the failure, count
the actual number of sequential write required zones and use it if it is
smaller than 128.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240430103022.4136039-4-shinichiro.kawasaki@wdc.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

t/zbd: add test case to confirm verify_backlog=1 options

The previous commit fixed the verify failure due to the zone reset with
the verify_backlog option. Add a test to confirm the fix.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240430103022.4136039-3-shinichiro.kawasaki@wdc.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

zbd: remove unnecessary verify_backlog check in zbd_file_reset()

The commit c5c8b92be5a2 ("zbd: fix zone reset condition for verify")
improved zbd_file_reset() to not reset zones when data to verify is
left. To check for remaining verify data, it tried to do the same as
check_get_verify(), including the check for the modulo operation
"td->io_hist_len % td->o.verify_backlog". This check is required in
check_get_verify() to know when to do the verify backlog operation.
However, this check is not required in zbd_file_reset() since zone reset
is not related to the verify backlog timing. The unnecessary check for
"td->io_hist_len % td->o.verify_backlog" allows zones to be reset even
when td->io_hist_len is non-zero and data to verify is left. This erases
the data to verify and causes verify errors. Fix this by removing the
unnecessary check.
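
The effect of the modulo check can be seen in the simplified sketch below. Both functions are hypothetical reductions of the condition described in this message; the real zbd_file_reset() considers more state than the two values used here.

```c
#include <stdbool.h>

/*
 * Before the fix (simplified): when a backlog is configured, the
 * modulo result replaces the "history is non-empty" answer.
 */
bool demo_verify_data_left_old(unsigned long io_hist_len,
                               unsigned long verify_backlog)
{
    bool left = io_hist_len != 0;

    if (io_hist_len && verify_backlog)
        left = (io_hist_len % verify_backlog) != 0;
    return left;
}

/* After the fix (simplified): any pending history counts as data left. */
bool demo_verify_data_left_new(unsigned long io_hist_len)
{
    return io_hist_len != 0;
}
```

With io_hist_len = 4 and verify_backlog = 2, the old condition reports no data left, so zones could be reset while four writes still await verification. The new condition reports data left in that case.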

Fixes: c5c8b92be5a2 ("zbd: fix zone reset condition for verify")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240430103022.4136039-2-shinichiro.kawasaki@wdc.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

t/nvmept_trim: increase transfer size for some tests

The final sequence of tests uses a block size of 4096 bytes. This can be
slow enough on some platforms to trigger a 10-minute timeout. Increase
the block size to 256K to reduce the run time.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

docs: update for new data placement options

Update the HOWTO and man page for the unified data placement options
that cover both FDP and Streams.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>