git.kernel.dk Git - fio.git/log

t/io_uring: Vectored fixed buffer test support for nvme passthrough path

The current kernel NVMe passthrough path already supports vectored IO
when using fixed buffers, but fio has not yet adapted to it. This patch
aims to add a corresponding test interface in fio.

Test results:

taskset -c 1 t/io_uring -b512 -d64 -c2 -s2 -p1 -F1 -B1 -O0 -n1 -V1 -u1 -r4 /dev/ng1n1
submitter=0, tid=6179, file=/dev/ng1n1, nfiles=1, node=-1
polled=1, fixedbufs=1, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=289.78K, BW=141MiB/s, IOS/call=1/1
IOPS=294.68K, BW=143MiB/s, IOS/call=1/1
IOPS=295.26K, BW=144MiB/s, IOS/call=1/1
Exiting on timeout
Maximum IOPS=295.26K

taskset -c 1 t/io_uring -b512 -d64 -c2 -s2 -p1 -F1 -B1 -O0 -n1 -V0 -u1 -r4 /dev/ng1n1
submitter=0, tid=6183, file=/dev/ng1n1, nfiles=1, node=-1
polled=1, fixedbufs=1, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=292.31K, BW=142MiB/s, IOS/call=1/1
IOPS=295.79K, BW=144MiB/s, IOS/call=1/1
IOPS=290.78K, BW=141MiB/s, IOS/call=1/1
Exiting on timeout
Maximum IOPS=295.79K

Signed-off-by: Xiaobing Li <xiaobing.li@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

stat: use shared sem for stats lock

Like commit:

21628ec537c7 ("fio_sem, diskutil: introduce fio_shared_sem and use it for diskutil lock")

the stats sem is also potentially shared between processes, and hence
should be allocated and freed as a shared sem.

See the referenced commit, which has more details. Switch the stats sem
to be allocated in such a way that it's propagated properly between
processes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'master' of https://github.com/Meiye-lj/fio

* 'master' of https://github.com/Meiye-lj/fio:
Makefile: fix missing test tool and unit test dependencies

Makefile: fix missing test tool and unit test dependencies

This PR fixes an issue in the Makefile. Specifically, previously, any
modifications of files like lib/types.h would not trigger a rebuild of
t/verify-state.o. The PR fixes this by including them as additional
dependencies. Mainly, T_OBJS and UT_OBJS do not use .d files to record
dependencies correctly, unlike OBJS.

Signed-off-by: Jun Lyu <lvjun_dnt@outlook.com>

Merge branch 'improve_flushing_darwin' of https://github.com/Developer-Ecosystem-Engineering/fio

* 'improve_flushing_darwin' of https://github.com/Developer-Ecosystem-Engineering/fio:
mac: add readahead control to the posix_fadvise() shim
mac: implement (file) cache invalidation

Fio 3.41

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'fix_verify-state' of https://github.com/sitsofe/fio

* 'fix_verify-state' of https://github.com/sitsofe/fio:
t/verify-state: improve verify state inflight output
t/verify-state: synchronise verify state version

t/verify-state: improve verify state inflight output

- Move INVALID_NUMBERIO to the verify-state.h to make it accessible to
t/verify-state.c
- Make unused inflight I/O slots more obvious

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>

t/verify-state: synchronise verify state version

Stop hard coding the supported state version number in t/verify-state.c
and just use VSTATE_HDR_VERSION so we stay in sync with the rest of fio.

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>

sprandom: wait to free validity_dist

This array is actually used to calculate invalid_capacity. So wait to
free it until the very end.

Fixes: 8c8e7050ccd9 ("sprandom: drop validity_dist after use")
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

Merge branch 'sprandom-fixes' of https://github.com/tomas-winkler-sndk/fio

* 'sprandom-fixes' of https://github.com/tomas-winkler-sndk/fio:
  sprandom: drop validity_dist after use
  sprandom: free invalid_pct buffer
  sprandom: fix debug printout for offset
  sprandom: setup SPRandom before total_io_size is computed

mac: add readahead control to the posix_fadvise() shim

- Add support for POSIX_FADV_NORMAL in the posix_fadvise() shim by just
  ignoring it
- Add support for POSIX_FADV_SEQUENTIAL/POSIX_FADV_RANDOM by mapping
  them to enable/disable of readahead via fcntl(..., F_RDAHEAD, ...).
  Because macOS only lets you control readahead at the descriptor level
  the offset and len values passed will be ignored and range control is
  not done.
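
A minimal sketch of the mapping described above, not the shim itself.
The enum values and function names are hypothetical and exist only for
this illustration; the only real interface used is fcntl(..., F_RDAHEAD,
...), which is guarded so the sketch still builds where it is missing.

```c
#include <fcntl.h>

/* Hypothetical advice values for this sketch; real code uses POSIX_FADV_*. */
enum sketch_advice {
	SKETCH_FADV_NORMAL,
	SKETCH_FADV_SEQUENTIAL,
	SKETCH_FADV_RANDOM,
};

/*
 * Map an advice value to a readahead setting.
 * Returns 1 to enable readahead, 0 to disable it, and -1 when the
 * advice is accepted but changes nothing.
 */
static int advice_to_readahead(enum sketch_advice advice)
{
	switch (advice) {
	case SKETCH_FADV_SEQUENTIAL:
		return 1;
	case SKETCH_FADV_RANDOM:
		return 0;
	default:
		return -1;
	}
}

/*
 * Apply the advice to a whole descriptor. offset and len are accepted
 * but ignored because readahead is a per-descriptor setting on macOS.
 */
static int sketch_fadvise(int fd, long long offset, long long len,
			  enum sketch_advice advice)
{
	int on = advice_to_readahead(advice);

	(void) offset;
	(void) len;
	if (on < 0)
		return 0;
#ifdef F_RDAHEAD
	return fcntl(fd, F_RDAHEAD, on) < 0 ? -1 : 0;
#else
	(void) fd;
	return 0;	/* no F_RDAHEAD on this platform: treat as a no-op */
#endif
}
```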

The impact of being able to tune readahead is demonstrated by the
bandwidths achieved by the following jobs running on an SSD of an
otherwise idle Intel Mac laptop with 16GBytes of RAM:

./fio --stonewall --size=128M --filename=fio.tmp --bs=4k --rw=read \
  --name=sequential-readahead --fadvise=sequential \
  --name=sequential-no-readahead --fadvise=random

[...]
sequential-readahead: (groupid=0, jobs=1): err= 0: pid=6250: Tue Sep  2 22:10:45 2025
  read: IOPS=331k, BW=1293MiB/s (1356MB/s)(128MiB/99msec)
[...]
sequential-no-readahead: (groupid=1, jobs=1): err= 0: pid=6251: Tue Sep  2 22:10:45 2025
  read: IOPS=25.9k, BW=101MiB/s (106MB/s)(128MiB/1263msec)

rm -f fio-huge.tmp
truncate -s 1T fio-huge.tmp
./fio --stonewall --filename=fio-huge.tmp --bs=32k --runtime=10s --rw=randread:3 \
  --name=partial-random-no-readahead --fadvise=random \
  --name=absorb-cache-invalidation --number_ios=1 --bs=4k \
  --name=partial-random-readahead --fadvise=sequential

[...]
partial-random-no-readahead: (groupid=0, jobs=1): err= 0: pid=6259: Tue Sep  2 22:12:35 2025
  read: IOPS=92.4k, BW=2888MiB/s (3029MB/s)(28.2GiB/10001msec)
[...]
partial-random-readahead: (groupid=2, jobs=1): err= 0: pid=6261: Tue Sep  2 22:12:35 2025
  read: IOPS=61.8k, BW=1931MiB/s (2024MB/s)(18.9GiB/10001msec)

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>

mac: implement (file) cache invalidation

This (finally) provides macOS cache invalidation and is heavily based on
code originally provided by DeveloperEcosystemEngineering@apple.

Because posix_fadvise() isn't implemented on macOS,
DeveloperEcosystemEngineering demonstrated that creating a shared
mapping of a file and using msync([...], MS_INVALIDATE) on it can
be used to discard covered page cache pages instead - ingenious! This
commit uses that technique to create a macOS posix_fadvise([...],
POSIX_FADV_DONTNEED) shim.
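
The technique can be sketched roughly as follows. This is an
illustration under assumptions, not the code in this commit: the
function name is hypothetical, the caller is assumed to pass a
page-aligned offset and a non-zero length, and chunking of large files
is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to discard cached pages backing
 * [offset, offset + len) of fd by mapping the range shared and calling
 * msync(MS_INVALIDATE) on the mapping.
 * Returns 0 on success or a positive errno value.
 */
static int discard_cached_range(int fd, off_t offset, size_t len)
{
	void *addr;
	int ret = 0;

	addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, offset);
	if (addr == MAP_FAILED)
		return errno;

	if (msync(addr, len, MS_INVALIDATE) < 0)
		ret = errno;

	/* Unmap even when msync() failed so the mapping is not leaked. */
	if (munmap(addr, len) < 0 && !ret)
		ret = errno;

	return ret;
}
```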

To paraphrase commit 8300eba5 ("windowsaio: add best effort cache
invalidation") that was done for similar reasons:

This change may make default bandwidth speeds on macOS look lower
compared to older versions of fio but this matches the behaviour of fio
on other platforms with invalidation (such as Linux) because we are
trying to avoid measuring cache reuse (unless invalidate=0 is set).

The impact of invalidation is demonstrated by the bandwidths achieved by
the following jobs running on an SSD of an otherwise idle Intel Mac
laptop with 16GBytes of RAM:

./fio --stonewall --size=128M --ioengine=posixaio --filename=fio.tmp \
  --iodepth=64 --bs=4k --direct=0 \
  --name=create --rw=write \
  --name=cached --rw=randread --loops=2 --invalidate=0 \
  --name=invalidated --rw=randread --loops=2 --invalidate=1

[...]
cached: (groupid=1, jobs=1): err= 0: pid=7795: Tue Sep  2 22:34:12 2025
  read: IOPS=228k, BW=889MiB/s (932MB/s)(256MiB/288msec)
[...]
invalidated: (groupid=2, jobs=1): err= 0: pid=7796: Tue Sep  2 22:34:12 2025
  read: IOPS=46.8k, BW=183MiB/s (192MB/s)(256MiB/1399msec)

v2:
- Move platform specific code into its own file under os/mac/
- Don't do prior fsync() because msync([...], MS_INVALIDATE) doesn't
  imply the dropping of dirty pages and will have the same effect

v3:
- Up the mmap chunk size to 16 GBytes to reduce the number of times we
  mmap()/msync()/munmap() on large files
- Align offset and len to the system page size to prevent errors on jobs
  like ./fio --name=n --offset=2k --size=30k
- Try and munmap() if msync() fails
- Make Rosetta comment clearer
- Drop some variables and rename some others
- Don't bother trying to restore errno after displaying an error message
  because posix_fadvise() isn't defined as setting errno
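
The page alignment mentioned in v3 can be illustrated with the job
above (offset=2k, size=30k). The helper below is a hypothetical sketch:
it assumes the range is widened outward to page boundaries, which this
message does not state explicitly.

```c
#include <stdint.h>

/*
 * Widen [*offset, *offset + *len) so that both ends fall on page_size
 * boundaries. page_size must be a power of two.
 */
static void align_range(uint64_t *offset, uint64_t *len, uint64_t page_size)
{
	uint64_t mask = page_size - 1;
	uint64_t start = *offset & ~mask;			/* round down */
	uint64_t end = (*offset + *len + mask) & ~mask;	/* round up */

	*offset = start;
	*len = end - start;
}
```

With a 4k page size, offset=2k and len=30k become offset=0 and len=32k.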

Fixes: https://github.com/axboe/fio/issues/48
Suggested-by: DeveloperEcosystemEngineering <DeveloperEcosystemEngineering@apple.com>
Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>

sprandom: drop validity_dist after use

The validity_dist buffer is only needed to compute the invalid_pct
array. Once that is done, we can free it instead of keeping it
around unnecessarily.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

sprandom: free invalid_pct buffer

Free invalid_pct, fix a memory leak.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

sprandom: fix debug printout for offset

Fix a leftover bug after code changes. Use 'offset' instead of the *b pointer.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

sprandom: setup SPRandom before total_io_size is computed

Move the sprandom_init() call to occur before total_io_size is computed,
to ensure that statistics are computed correctly.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

Merge branch 'patch-1' of https://github.com/neheb/fio

* 'patch-1' of https://github.com/neheb/fio:
fio: fix formats under MIPS64/PPC

fio: fix formats under MIPS64/PPC

__SANE_USERSPACE_TYPES__ needs to be defined to get consistent formats on all platforms.

It mostly affects 64-bit architectures (no op on 32 bit) with long long
vs long.

Signed-off-by: Rosen Penev <rosenp@gmail.com>

Merge branch 'fix_mandoc_warnings' of https://github.com/sitsofe/fio

* 'fix_mandoc_warnings' of https://github.com/sitsofe/fio:
  man: update date
  man: fix mandoc "PP empty" warnings
  man: fix mandoc lint errors

man: update date

Update the man page's date to match the date of the last fio release
(fio-3.40) so things look less crufty.

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>

man: fix mandoc "PP empty" warnings

Silence mandoc "WARNING: skipping paragraph macro: PP empty" complaints.

This also fixes the over-indentation of the final paragraph in "Trace file
format v3".

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>

man: fix mandoc lint errors

Fix the most egregious warnings produced by running
mandoc -W warning,stop ./fio.1 > /dev/null

- Fix usage of .RE when .RS block isn't open
- Stop escaping = as it's not needed
- Fix incorrect usage of .TP before .RE as the line following .TP isn't a
leading tag and it has no effect
- Fix broken macro sequence and incorrect usage of \fR...\fP which
should have been \fB...\fP because it's outside of .BI

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>

verify: use new buffer for threads with %o format

When thread=1 multiple jobs share the same buffer used to expand the %o
format specifier. This can cause verify failures when one thread tries
to verify a buffer using the %o expansion from another thread. This
patch makes sure threads use different buffers.

This is a stop-gap measure to resolve verify failures for now. A better
solution would be to at init time allocate a set of buffers for each job
thread (and verify_async thread since they are vulnerable to the same
issue) and use those buffers instead of doing a malloc/free for each
verify operation.

Fixes: https://github.com/axboe/fio/issues/1845
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

sprandom: abort when invalid options specified

Since fio is often used in scripts, refuse to continue and give the user
an opportunity to correct invalid settings when they appear.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

Merge branch 'sprandom' of https://github.com/tomas-winkler-sndk/fio

* 'sprandom' of https://github.com/tomas-winkler-sndk/fio:
  sprandom: integrate sprandom_get_next_offset() into io_u path
  sprandom: initialize sprandom for file
  sprandom: implement sprandom_get_next_offset()
  sprandom: initialize random state
  unittests: add pcbuf simple unit test
  sprandom: pcbuf.h add two-phase circular buffer header-only library
  unittests: add bytes2str_simple()
  num2str: add bytes2str_simple()
  sprandom: set up LFSR random generator and disable randommap
  sprandom: implement region computation and invalidation percentage
  sprandom: examples: add sprandom example file
  sprandom: add debug facility
  sprandom: add command line options

Kill off IO engine cancelation support

This was more of a thought experiment back in the day, but even for
an old interface like libaio on Linux, it does not support canceling
IOs at all. Neither does posixaio. And while cancel support could
get plumbed up to io_uring, since Linux doesn't support canceling
normal IO, then it will never do anything.

Hence it's utterly pointless to have a cancel op in the IO engine,
and the backend's attempt at first reaping done IO and then canceling
the rest is also pointless.

Just replace the at-exit cancelation with waiting on pending IO.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

backend: call IO engine post-init after file creation

A bug report says that using the io_uring engine with the
registerfiles=1 option fails if create_serialize=0 is also used. The
reason is that if create_serialize=0 is used, then file setup is
deferred until later. In fact so late, that it happens after the IO
engine post_init() handler is called, which is what registers the files
on the io_uring side.

Move the post_init call a bit later, after setup_files() has been run.

Link: https://github.com/axboe/fio/issues/1971
Signed-off-by: Jens Axboe <axboe@kernel.dk>

engines/libaio: enable libaio fsync

libaio has supported async fsync for a long time, enable it rather
than drain the queue depth to zero and do a sync fsync. This may be
slower than the previous approach, but the libaio engine should be
using the libaio fsync support.

Link: https://github.com/axboe/fio/discussions/1967
Signed-off-by: Jens Axboe <axboe@kernel.dk>

engines: make engines static-correct

Mark internal function static.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>
Link: https://lore.kernel.org/r/20250820130321.73376-4-tomas.winkler@sandisk.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

engines/http: make fio_http_getevents static

The fio_http_getevents handler is not used outside of http.c,
so it should be declared static.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>
Link: https://lore.kernel.org/r/20250820130321.73376-3-tomas.winkler@sandisk.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

stat: make stat.c static and const-correct

Mark internal function static and input only parameters const.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>
Link: https://lore.kernel.org/r/20250820130321.73376-2-tomas.winkler@sandisk.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

sprandom: integrate sprandom_get_next_offset() into io_u path

Invoke sprandom_get_next_offset() to generate offsets for sprandom
random-write operations.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

sprandom: initialize sprandom for file

Plug sprandom into file initialization and make sure that sprandom
operates on one file.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

sprandom: implement sprandom_get_next_offset()

Implement sprandom_get_next_offset(), which generates offsets for each
region using an LFSR. The function enforces an invalidation percentage
by randomly recycling a defined fraction of offsets back into the pool.

A two-phase cyclic buffer is used to manage this process:
one phase collects new offsets while the other serves recycled offsets.
When transitioning between regions, all stored offsets are exhausted
first, ensuring the target invalidation level is achieved.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

sprandom: initialize random state

sprandom requires a random number generator
to randomly choose which offsets will be
rewritten.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

unittests: add pcbuf simple unit test

Add tests for basic functionality and wraparound behavior.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

sprandom: pcbuf.h add two-phase circular buffer header-only library

Implements a circular buffer with staged (write-ahead) and committed
(read-visible) regions using dual head pointers. Data is written to a
staging area and becomes visible only upon explicit commit.
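
A rough sketch of the staged/committed idea, for illustration only. It
is not pcbuf.h: the struct, the function names and the fixed capacity
are all hypothetical, and the counters simply grow and are reduced
modulo the capacity when indexing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CAP 8	/* fixed power-of-two capacity, for this sketch only */

struct sketch_pcbuf {
	uint64_t data[SKETCH_CAP];
	size_t read;	/* next committed entry to pop */
	size_t commit;	/* end of the committed (read-visible) region */
	size_t stage;	/* end of the staged (write-ahead) region */
};

/* Write a value into the staging area; it is not yet visible to readers. */
static bool sketch_push_staged(struct sketch_pcbuf *cb, uint64_t v)
{
	if (cb->stage - cb->read == SKETCH_CAP)
		return false;	/* buffer is full */
	cb->data[cb->stage % SKETCH_CAP] = v;
	cb->stage++;
	return true;
}

/* Make everything staged so far visible to readers. */
static void sketch_commit(struct sketch_pcbuf *cb)
{
	cb->commit = cb->stage;
}

/* Pop the oldest committed value; staged values are never returned. */
static bool sketch_pop(struct sketch_pcbuf *cb, uint64_t *v)
{
	if (cb->read == cb->commit)
		return false;	/* nothing committed */
	*v = cb->data[cb->read % SKETCH_CAP];
	cb->read++;
	return true;
}
```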

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

unittests: add bytes2str_simple()

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

num2str: add bytes2str_simple()

The function only converts bytes to a human-readable string in a
provided buffer. It doesn't allocate memory and can be used directly
in printf("%s", bytes2str_simple())
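
A sketch of what a non-allocating converter of this kind might look
like. The name, signature, unit labels and output format below are
assumptions made for illustration and are not taken from
bytes2str_simple() itself.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a byte count into the caller's buffer
 * using binary units and return the buffer so the call can be nested
 * inside printf().
 */
static const char *sketch_bytes2str(char *buf, size_t bufsz, uint64_t bytes)
{
	static const char *const units[] = {
		"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"
	};
	unsigned int i = 0;
	double val = (double) bytes;

	while (val >= 1024.0 && i < 6) {
		val /= 1024.0;
		i++;
	}
	snprintf(buf, bufsz, "%.2f %s", val, units[i]);
	return buf;
}
```

Because the result lives in the caller's buffer, no free() is needed
after printing.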

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

sprandom: set up LFSR random generator and disable randommap

SPrandom targets large storage devices, hence use an LFSR generator
to ensure full storage coverage without repetition and without
extra memory.
Disable randommap since it prevents repeated writes to the same offset,
which is required for invalidation.
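
To illustrate why an LFSR gives full coverage without a map, here is a
textbook 16-bit Galois LFSR. It is not fio's LFSR implementation; the
tap mask 0xB400 is a commonly cited maximal-length choice for 16 bits
and the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Advance a 16-bit Galois LFSR by one step. Starting from any non-zero
 * state, the sequence visits all 65535 non-zero values exactly once
 * before repeating, and the only memory needed is the state itself.
 */
static uint16_t lfsr16_next(uint16_t state)
{
	uint16_t lsb = state & 1u;

	state >>= 1;
	if (lsb)
		state ^= 0xB400u;
	return state;
}
```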

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

sprandom: implement region computation and invalidation percentage

Divide storage into equally sized regions and compute desired
invalidation percentage per region.

This model estimates the distribution of valid data across a flash drive
in a steady state. It is based on the key insight from Desnoyers'
research, which establishes a relationship between data validity
and the physical space it occupies.

This is based on:
P. Desnoyers, "Analytic Models of SSD Write Performance" paper
and SanDisk internal research using Markov chain analysis to
model write amplification as a function of over-provisioning.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

sprandom: examples: add sprandom example file

An example demonstrating sprandom preconditioning:
Includes sample commands for basic execution, enabling debug output,
setting over-provisioning, and tuning region count for large devices.

Default job section:
[preconditioning]
sprandom=1
spr_op=0.15
spr_num_regions=100

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

sprandom: add debug facility

Add FD_SPRANDOM debug facility

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

sprandom: add command line options

Add sprandom command line options:
1. boolean sprandom: enables the sprandom flow.
2. spr_num_regions: granularity of sprandom, defaults to 100
3. spr_op: over provisioning factor, defaults to 0.15

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>

Merge branch 'fix-install-paths' of https://github.com/kubo326/fio

* 'fix-install-paths' of https://github.com/kubo326/fio:
Makefile: fix man and share install paths on macOS

Merge branch 'http-range-header' of https://github.com/sfc-gh-rnarubin/fio

* 'http-range-header' of https://github.com/sfc-gh-rnarubin/fio:
engines/http: Add support for range reads

engines/http: Add support for range reads

The existing HTTP implementation treated blocks as individual objects
when reading and writing. This is useful in as far as it enables read
and write symmetry -- object stores generally don't allow writes to
individual ranges of objects like file systems do with files, so
treating blocks as objects is a practical way to replicate block reads
and writes.

Reading of object ranges, on the other hand, is widely supported by
object stores with the "Range" HTTP header. This change adds a parameter
which alters fio's object conventions to issue reads using the block
size and offset parameters more like file IO. When enabled, both reads
and writes will use the plain filename as the object path to issue IO.
Reads will add a "Range: bytes=<start>-<end>" header to the requests,
where the start and end positions are determined by the blocksize and
offset of the benchmark. Aside from the object path, writes are
unchanged for simplicity: the object size is determined by blocksize as
before.
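
As an illustration of the header described above, the sketch below
builds the Range value from an offset and block size. The function name
is hypothetical and this is not the engine's code; it assumes the end
position is inclusive, as HTTP byte ranges are, and a non-zero block
size.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format "Range: bytes=<start>-<end>" for one block read.
 * HTTP byte ranges are inclusive at both ends, so the last byte is
 * offset + blocksize - 1. Returns the snprintf() result.
 */
static int format_range_header(char *buf, size_t bufsz,
			       uint64_t offset, uint64_t blocksize)
{
	return snprintf(buf, bufsz, "Range: bytes=%llu-%llu",
			(unsigned long long) offset,
			(unsigned long long) (offset + blocksize - 1));
}
```

For example, a 4096 byte read at offset 8192 yields
"Range: bytes=8192-12287".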

Signed-off-by: Renar Narubin <renar.narubin@snowflake.com>

configure: skip isal64 check when isal check fails

If the isal check fails then we can skip the isal64 check because there
is no environment under which we would be able to detect isal64 support
without 16b isal support.

Suggested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

engines/io_uring: fix memory leaks during init

Make sure fio cleans up properly if it is unable to allocate memory while
initializing the io_uring ioengine.

This fixes an issue reported by Coverity.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

engines/io_uring: fix error value

io_u->error expects a (positive) errno. fio_nvme_pi_verify returns a
negative errno. Negate the return value when we store it in io_u->error.
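
The sign convention can be sketched as below. The struct and both
functions are hypothetical stand-ins for illustration; only the
convention (helper returns -errno, the error field holds a positive
errno) comes from this message.

```c
#include <errno.h>

/* Hypothetical stand-in for the io_u error field. */
struct sketch_io_u {
	int error;
};

/* Hypothetical stand-in for a helper that reports failure as -errno. */
static int sketch_pi_verify(int ok)
{
	return ok ? 0 : -EIO;
}

static void sketch_complete(struct sketch_io_u *io_u, int ok)
{
	int ret = sketch_pi_verify(ok);

	if (ret)
		io_u->error = -ret;	/* negate: store a positive errno */
}
```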

This fixes an issue reported by Coverity.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

Makefile: fix man and share install paths on macOS

Unlike when bcdf7c5 ("Install man page in /usr/share/man by default on OSX")
was made, recent macOS has /usr/local/share/man in the man path by default.
Further, since macOS 10.11 /usr/share is a read-only directory due to SIP which
means a default macOS fio install will fail.

Fix the above by changing the macOS install to use $(prefix) for man pages and
shared files the same way other OSes do.

Fixes: https://github.com/axboe/fio/issues/1957

Signed-off-by: Kubo Saburo <kubosaburo@protonmail.com>

verify: bump up verify state file version

Commit 935297d18d58 ("verify: plumb inflight write information through
verify state") modified the data structure of the verify state file.
However, the version number of the file was not incremented. As a result,
users may encounter compatibility issues when working with the verify state
file across different fio versions. To avoid the issue, bump up the
version number.

Fixes: 935297d18d58 ("verify: plumb inflight write information through verify state")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20250809011555.948564-3-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

verify: fix write tracking in error cases

Commit a9ba7cef70a7 ("verify: rework write tracking for use with
verify_save_state()") reworked write tracking for verify workloads.
As part of the rework, two new functions were added: log_inflight() and
invalidate_inflight(). log_inflight() is called when a write io_u is
prepared for the verify workload. The call is ensured to occur only once
per write io_u by using the IO_U_F_PATTERN_DONE flag.
invalidate_inflight() is called when the write io_u completes. Under
non-failure conditions, the number of calls to these two functions
is expected to be equal upon the completion of all writes.

However, when a write io_u fails with an error, the balance between the
calls to log_inflight() and invalidate_inflight() breaks for two
reasons. Firstly, invalidate_inflight() is not called for failed sync
writes. Secondly, the IO_U_F_PATTERN_DONE flag is not cleared when
the failed write io_u is reused. As a result the subsequent attempt to
prepare the io_u does not trigger the call to log_inflight(). The
unbalanced function calls cause a mismatch between io_u->numberio and
td->inflight_issued, which results in an abort in log_inflight(). This
failure symptom is observed by running t/zbd/run-tests-against-nullb for
the test case 72 in t/zbd/test-zbd-support.

To fix the unbalanced function calls, add calls to log_inflight() in
the sync write failure path in io_queue_event(). This ensures that
failed sync writes are appropriately logged. To fix the stale
IO_U_F_PATTERN_DONE flag, clear the flag in the io_queue_event() failure
path. Additionally, clear the IO_U_F_BUSY_OK flag, which is to be cleared
when inflight I/Os complete. For code simplicity, introduce a new
helper function io_u_clear_inflight_flags() to avoid duplication of
the code to clear the flags.

Fixes: a9ba7cef70a7 ("verify: rework write tracking for use with verify_save_state()")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20250809011555.948564-2-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

crc: use ISA-L for crc64 NVMe if available

ISA-L implements the crc64 calculation needed for NVMe 64b Guard
Protection Information. Use the ISA-L library function if it's available
at build time since ISA-L is faster than what fio provides.

The CRC64 function for NVMe 64b Guard PI was added to ISA-L in May 2023.
Some platforms may have an older version of ISA-L. So we must explicitly
detect whether the installed version of ISA-L is recent enough to have
this feature.

Link: https://github.com/axboe/fio/discussions/1952
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

ioengines: bump version number

Recently the fio file and io_u unit were changed, let's play it safe
and bump the ioengine version number too.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'verify_inflight' of https://github.com/noclip-code/fio

* 'verify_inflight' of https://github.com/noclip-code/fio:
  verify: add versioning to verify_header
  verify: clear inflight log in between loops
  verify: plumb inflight write information through verify state
  verify: rework write tracking for use with verify_save_state()
  verify: make numberio uint64_t

t/io_uring_pi: test script for io_uring PI

Test io_uring metadata support on NVMe devices.
Limited to direct=1 and metadata in a separate buffer.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20250725175808.2632-8-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

engines/io_uring: support r/w with metadata

Use the new IOCTL to query block devices for metadata support. Then use
the new io_uring capability to send read and write commands with
metadata.

This reuses much of the existing infrastructure for io_uring_cmd
protection information support.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20250725175808.2632-7-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

engines/io_uring: fill in guard generation options at init time

The protection information check flags, apptag, and apptag mask are
fixed for every single operation in a job. So we should just set these
values at init time instead of populating this structure anew for every
single IO.

Any uses of this structure are guarded by the device's protection
information type, so it is not a problem to fill in this structure even
when the device is formatted with Type 0 PI (no protection).

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20250725175808.2632-6-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

engines/io_uring: simplify io_u_free

Just free the buffer unconditionally. If the pointer is null nothing
will happen, so no harm is done.

This lets us also use this function for the io_uring ioengine when it
gains the ability to handle metadata which will happen in a subsequent
patch.

No functional change intended.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20250725175808.2632-5-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

engines/nvme: refactor filling protection information

Factor out of fio_nvme_pi_fill() the code that generates and fills in
the Guard Protection Information field. This is so that later patches
can use this code without filling in the fields in the NVMe command.

No functional change intended.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20250725175808.2632-4-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

engines/nvme: move inline functions from .c to .h file

Move two inline helper functions from C source file to header file. In
later patches we will need to use these helper functions outside of
nvme.c.

No functional change intended.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20250725175808.2632-3-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

engines/io_uring: store ioengine id in ioengine data

To reduce pointer chasing in the hot path just store whether we are
using the io_uring or io_uring_cmd ioengine in the ioengine data.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20250725175808.2632-2-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

eta: convert skip_eta() to ANSI C declaration

Update skip_eta() to use an ANSI C function declaration
by explicitly specifying the void parameter list.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>
Link: https://lore.kernel.org/r/20250731122011.539660-2-tomas.winkler@sandisk.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

filesetup: make longest_existing_path() static and const-correct

Mark longest_existing_path() as static since it is only used within
filesetup.c. Also, declare the 'path' parameter as const char *
because it is not modified within the function.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>
Link: https://lore.kernel.org/r/20250731122011.539660-1-tomas.winkler@sandisk.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

verify: add versioning to verify_header

Add a version field to verify_header and print a helpful message when
it does not match expectations. This can happen if a user tries to
verify data written by a different version of fio.

The version field is split from the verify_type field in order to avoid
messing with the layout of the struct. This also makes distinguishing
the new versioned header from older unversioned headers easier, as the
old header format has a very limited set of valid values at this offset,
regardless of endian-ness. Set the MSB in the new version value to
distinguish it from the old header format.
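
The MSB trick can be sketched as follows. The field width, constant
names and version value are assumptions made for this illustration and
are not taken from verify_header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 8-bit version field: the top bit marks a versioned header. */
#define SKETCH_VERSION_FLAG	0x80u
#define SKETCH_VERSION		(SKETCH_VERSION_FLAG | 0x01u)

/* Old headers only hold small values here, so the MSB is never set. */
static bool sketch_is_versioned(uint8_t version_field)
{
	return (version_field & SKETCH_VERSION_FLAG) != 0;
}

/* A versioned header is accepted only if it matches what we write. */
static bool sketch_version_ok(uint8_t version_field)
{
	return version_field == SKETCH_VERSION;
}
```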

Signed-off-by: Riley Thomasson <riley.thomasson@gmail.com>

verify: clear inflight log in between loops

When loops are used, the sequence number invariants in the inflight log
are broken. In particular, experimental verify can issue writes in
between loops, which ends up incrementing numberio without logging the
writes to the inflight log.

The intended interaction between loops and verify state save/load is a
bit murky to me, but it seems reasonable to clear the inflight log in
between each loop.

Signed-off-by: Riley Thomasson <riley.thomasson@gmail.com>

verify: plumb inflight write information through verify state

Plumb the new inflight write information from thread_data through
thread_io_list and the saved/loaded verify state.

Use this information in verify_state_should_stop() to halt verify as
soon as the first inflight sequence number is reached.

Fixes: Issue #1950

Signed-off-by: Riley Thomasson <riley.thomasson@gmail.com>

verify: rework write tracking for use with verify_save_state()

Currently, fio keeps track of a finite number of recent write
completions for each file in fio_file->last_write_comp. This information
is saved/loaded as part of the "verify state." The verify code
(verify_state_should_stop(), specifically) assumes that any write issued
before these recorded writes must have been successfully completed. This
is not generally true for iodepth > 1 and if writes are completed
out-of-order.

Consider this example: a single write stalls while all other writes
complete normally. This condition can persist for an arbitrarily long
time, and the stalled write will fall out of the range "covered" by
last_write_comp. Saving state at this point (e.g. via a trigger) and
halting the workload (e.g. via power-cycling the machine) will result in
that stalled write being verified when the state is loaded despite the
fact that the write may have never completed.

Instead of tracking the last N write completions, we can instead track
(1) the maximum issued sequence number, and (2) the sequence numbers for
all in-flight writes. The "sequence number" here is a monotonically
increasing value assigned to each write and verifying read (this is
already implemented as io_u->numberio). An "in-flight" write is a write
which has been issued but not yet completed. Furthermore, the number of
in-flight writes is bounded by the iodepth.

We can accomplish this by using a simple array of sequence numbers,
which are initialized to an invalid value. Before issuing a write, its
sequence number is written to a "free" slot, and then the maximum issued
sequence number is incremented. After completing a write, its slot is
changed back to the invalid value. On the verify side, we are allowed to
verify as long as the current sequence number is <= the maximum issued
sequence number AND it is not present in the inflight list.
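
The scheme described above can be sketched roughly as follows. This is
an illustration only, not the code from this patch: the struct, the
function names, INFLIGHT_DEPTH and INVALID_SEQ are all hypothetical, and
the sketch assumes sequence numbers start at 1 and that the maximum
issued value is simply raised to the sequence number being issued.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INFLIGHT_DEPTH 4          /* hypothetical: stands in for iodepth */
#define INVALID_SEQ UINT64_MAX    /* hypothetical: marks a free slot */

struct inflight_log {
	uint64_t slots[INFLIGHT_DEPTH]; /* sequence numbers of in-flight writes */
	uint64_t max_issued;            /* highest sequence number issued so far */
};

static void inflight_init(struct inflight_log *log)
{
	for (size_t i = 0; i < INFLIGHT_DEPTH; i++)
		log->slots[i] = INVALID_SEQ;
	log->max_issued = 0;            /* assumes real sequence numbers start at 1 */
}

/* Before issuing a write: claim a free slot, then raise max_issued. */
static bool inflight_issue(struct inflight_log *log, uint64_t seq)
{
	for (size_t i = 0; i < INFLIGHT_DEPTH; i++) {
		if (log->slots[i] == INVALID_SEQ) {
			log->slots[i] = seq;
			if (seq > log->max_issued)
				log->max_issued = seq;
			return true;
		}
	}
	return false;                   /* no free slot: depth exceeded */
}

/* After a write completes: return its slot to the invalid value. */
static void inflight_complete(struct inflight_log *log, uint64_t seq)
{
	for (size_t i = 0; i < INFLIGHT_DEPTH; i++) {
		if (log->slots[i] == seq) {
			log->slots[i] = INVALID_SEQ;
			return;
		}
	}
}

/* Verify side: allowed if seq was issued and is not still in flight. */
static bool inflight_may_verify(const struct inflight_log *log, uint64_t seq)
{
	if (seq > log->max_issued)
		return false;
	for (size_t i = 0; i < INFLIGHT_DEPTH; i++) {
		if (log->slots[i] == seq)
			return false;
	}
	return true;
}
```

With writes 1, 2 and 3 issued and only 1 and 3 completed, this sketch
allows verifying 1 and 3 but not the stalled write 2, and not 4, which
was never issued.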

Saving/loading this information in the verify state and using it in
verify_state_should_stop() is covered in a subsequent patch.

Fixes: Issue #1950

Signed-off-by: Riley Thomasson <riley.thomasson@gmail.com>

verify: make numberio uint64_t

io_u->numberio is used to keep track of the sequence number of writes
and verify reads. It is entirely feasible to issue millions or even
billions of IOs in a single load, so let's use enough bits to handle
that.

numberio is copied into io_piece and verify_header, so update those
structs accordingly.

Signed-off-by: Riley Thomasson <riley.thomasson@gmail.com>

engines/io_uring: don't duplicate open/close file code

Don't repeat the code for open/close file, just have the cmd variants
call the normal helper for the actual open or close part.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

engines/io_uring: code cleanup

Don't use an overly long line if it can be avoided.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

engines/io_uring: cleanup fio_ioring_cmd_open_file()

For the love of deity, let's use functions where they make sense.
It nicely encapsulates code that is specific to one thing, AND it
avoids having a ton of indented levels making the code utterly
unreadable.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

engines/io_uring: io_uring engine type cleanups

Replace the memory compare with ioengine_uring_cmd with checking the
prep pointer, as that should always be sane.

Outside of that, about half the comparisons are either redundant (eg
it's ONLY run in a uring_cmd specific handler), or should be factored
out into separate code.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

engines/io_uring: get rid of silly strcmp() calls for io_ops->name

For some reason folks thought this was a good idea, but sprinkling
strcmp() calls in a hot path is pretty crazy. Particularly when
you can just check the io_ops address for the right IO engine,
trading a string compare for a simple address compare.
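
As a rough illustration of the difference, assuming a simplified
stand-in for the engine ops structure (struct engine_ops, the two
instances and both helper names below are hypothetical, not fio's
actual definitions):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an IO engine ops structure. */
struct engine_ops {
	const char *name;
};

static struct engine_ops engine_uring = { .name = "io_uring" };
static struct engine_ops engine_uring_cmd = { .name = "io_uring_cmd" };

/* Slow variant: walks both strings on every call. */
static bool is_uring_cmd_by_name(const struct engine_ops *ops)
{
	return strcmp(ops->name, "io_uring_cmd") == 0;
}

/* Fast variant: a single pointer comparison. */
static bool is_uring_cmd_by_addr(const struct engine_ops *ops)
{
	return ops == &engine_uring_cmd;
}
```

Both helpers give the same answer as long as each engine has exactly one
ops instance, which is the assumption the address check relies on.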

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'filetype-option' of https://github.com/struschev/fio

* 'filetype-option' of https://github.com/struschev/fio:
fio: add filetype option

fio: add filetype option

The filetype option enables the skipping of the 'stat' syscall for each file
defined in jobs at initialization stage, thus optimizing the huge-set-of-files
fio usage scenario.

Signed-off-by: Sergei Truschev <s.truschev@yadro.com>

Merge branch 'fix/io_uring-cq-reap' of https://github.com/calebsander/fio

* 'fix/io_uring-cq-reap' of https://github.com/calebsander/fio:
  engines/io_uring: relax CQ head atomic store ordering
  arch: add atomic_store_relaxed()
  engines/io_uring: simplify getevents control flow
  engines/io_uring: return unsigned from fio_ioring_cqring_reap()
  engines/io_uring: remove loop over CQEs in fio_ioring_cqring_reap()
  engines/io_uring: consolidate fio_ioring_cqring_reap() arguments
  Revert "engines/io_uring: update getevents max to reflect previously seen events"

engines/io_uring: relax CQ head atomic store ordering

fio_ioring_getevents() advances the io_uring CQ head index in
fio_ioring_cqring_reap() before fio_ioring_event() is called to read the
CQEs. In general this would allow the kernel to reuse the CQE slot
prematurely, but the CQ is sized large enough for the maximum iodepth
and a new io_uring operation isn't submitted until the CQE is processed.
Add a comment to explain why it's safe to advance the CQ head index
early. Use relaxed ordering for the store, as there aren't any accesses
to the CQEs that need to be ordered before the store.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>

arch: add atomic_store_relaxed()

Add a relaxed-ordering atomic store helper, analogous to
atomic_store_release() and atomic_load_relaxed().
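
For readers unfamiliar with the distinction, a minimal sketch using C11
<stdatomic.h> is shown below. The function names are hypothetical and
this is not necessarily how fio defines its helpers; it only shows the
memory orderings involved.

```c
#include <stdatomic.h>

/* Hypothetical helper: the store is atomic but imposes no ordering on
 * surrounding memory accesses. */
static inline void store_relaxed_u32(_Atomic unsigned *p, unsigned v)
{
	atomic_store_explicit(p, v, memory_order_relaxed);
}

/* Hypothetical helper: earlier reads and writes in this thread are
 * ordered before the store becomes visible to an acquiring reader. */
static inline void store_release_u32(_Atomic unsigned *p, unsigned v)
{
	atomic_store_explicit(p, v, memory_order_release);
}
```

A relaxed store is enough when nothing written or read beforehand needs
to be ordered ahead of the stored value.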

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>

engines/io_uring: simplify getevents control flow

There is no point in comparing events to min again after calling
io_uring_enter() to wait for events, as it doesn't change either events
or min. So remove the loop condition and only compare events to min
after updating events. Don't bother repeating fio_ioring_cqring_reap()
before calling io_uring_enter() if less than the min requested events
were available, as it's highly unlikely the CQ tail will have changed.
Avoid breaking and then branching on the return value by just returning
the value from inside the loop.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>

engines/io_uring: return unsigned from fio_ioring_cqring_reap()

fio_ioring_cqring_reap() can't fail and returns an unsigned variable. So
change its return type from int to unsigned.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>

engines/io_uring: remove loop over CQEs in fio_ioring_cqring_reap()

Currently fio_ioring_cqring_reap() loops over each available CQE,
re-loading the tail index, incrementing local variables, and checking
whether the max requested CQEs have been seen.
Avoid the loop by computing the number of available CQEs as tail - head
and capping it to the requested max.
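
The computation amounts to something like the sketch below. The function
name and signature are hypothetical (the real function works on the
engine's ring state); the sketch assumes 32-bit ring indices that wrap
around naturally.

```c
#include <stdint.h>

/* Hypothetical helper: how many CQEs to reap for given ring indices. */
static unsigned cq_reap_count(uint32_t head, uint32_t tail, unsigned max)
{
	/* Unsigned subtraction stays correct when the indices wrap. */
	uint32_t available = tail - head;

	return available < max ? available : max;
}
```

For example, head=10 and tail=14 with max=8 gives 4, while a tail that
is 20 entries ahead of head is capped to 8.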

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>

engines/io_uring: consolidate fio_ioring_cqring_reap() arguments

fio_ioring_cqring_reap() takes both an events and a max argument and
will return up to max - events CQEs. Only one of the two callers passes
an existing events count. So remove the events argument and have
fio_ioring_getevents() pass max - events instead. This simplifies the
function signature and avoids an addition inside the loop over CQEs.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>

Revert "engines/io_uring: update getevents max to reflect previously seen events"

This reverts commit ae8646a1e5848994c4f4511aa190178958309c92.

fio_ioring_cqring_reap() returns up to max - events CQEs. However, the
return value of fio_ioring_cqring_reap() is used to both add to events
and subtract from max. This means that if less than min CQEs are
available and the CQ needs to be polled again, max is effectively
lowered by the number of CQEs that were available. Adding to events is
sufficient to ensure the next call to fio_ioring_cqring_reap() will only
return the remaining CQEs. Commit ae8646a1e584 ("engines/io_uring:
update getevents max to reflect previously seen events") added an
incorrect subtraction from max as well, so revert it.
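
The effect of the extra subtraction can be seen with plain arithmetic.
The two functions below are hypothetical and only model the bookkeeping
described above; they are not the fio code. Each returns the number of
CQEs the next reap call would be allowed to return after `reaped` CQEs
were just consumed.

```c
/* Correct bookkeeping: only events is advanced. */
static unsigned next_budget_events_only(unsigned max, unsigned events,
					unsigned reaped)
{
	events += reaped;
	return max - events;
}

/* Bookkeeping with the reverted change: max is lowered as well, so the
 * reaped CQEs are effectively counted twice. */
static unsigned next_budget_double_count(unsigned max, unsigned events,
					 unsigned reaped)
{
	events += reaped;
	max -= reaped;
	return max - events;
}
```

With max=8, events=0 and 3 CQEs reaped, the first variant leaves a
budget of 5 for the next call, while the second leaves only 2.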

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Fixes: ae8646a1e584 ("engines/io_uring: update getevents max to reflect previously seen events")

Merge branch 'fsync-get-io-u-from-freelist' of https://github.com/jeongjonghwi/fio

* 'fsync-get-io-u-from-freelist' of https://github.com/jeongjonghwi/fio:
io_u: get io_u from io_u_freelist when TD_FSYNCING

io_u: get io_u from io_u_freelist when TD_FSYNCING

Since commit 813445e71292 ('backend: clean up requeued io_u's') was
applied, the backend cleans up any io_u's remaining in
td->io_u_requeues. However, with end_fsync=1, __get_io_u() pops and
returns an io_u from td->io_u_requeues if one exists. As a result, the
io_u used for the sync never puts the file it holds a reference to, and
the file ultimately cannot be closed.

This patch returns an io_u from td->io_u_freelist when td->runstate is
TD_FSYNCING, so that the io_u's in td->io_u_requeues are cleaned up and
the file is closed properly.

Signed-off-by: Jonghwi Jeong <jongh2.jeong@samsung.com>

Merge branch 'security-token' of https://github.com/sfc-gh-rnarubin/fio

* 'security-token' of https://github.com/sfc-gh-rnarubin/fio:
engines/http: Add S3 security token support

engines/http: Add S3 security token support

Security tokens are an element of S3 authorization in some environments. This
change adds a parameter to allow users to specify a security token, and pass
this to S3 requests with the appropriate header.

Signed-off-by: Renar Narubin <renar.narubin@snowflake.com>

Merge branch 'http-filename-fix' of https://github.com/sfc-gh-rnarubin/fio

* 'http-filename-fix' of https://github.com/sfc-gh-rnarubin/fio:
engines/http: fix file name

engines/http: fix file name

Previously when using the HTTP engine and nrfiles > 1, the engine would
upload a single object N times, instead of N files once. This was due to
a file name reference using the first item in the files list, instead of
the file name passed in the IO information.

Signed-off-by: Renar Narubin <renar.narubin@snowflake.com>

Merge branch 'fix-randtrimwrite' of https://github.com/minwooim/fio

* 'fix-randtrimwrite' of https://github.com/minwooim/fio:
io_u: fix offset calculation in randtrimwrite

io_u: fix offset calculation in randtrimwrite

For randtrimwrite, we should issue a trim + write pair, and both should
use the same offset.

This works well for cases without the `offset=` option, but not for
cases with it. With `offset=`, it's necessary to subtract `file_offset`,
which is the value of the `offset=` option, when calculating the offset
of the write.

This is a bit confusing because `last_start` is an actual offset that
has already been issued through trim.  However, `last_start` is the
value to which `file_offset` is added.  Since we add back `file_offset`
later on after calling `get_next_block` in `get_next_offset`,
`last_start` should be adjusted.
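
A small worked example of the arithmetic, using hypothetical helper
names rather than fio's actual functions:

```c
#include <stdint.h>

/* Hypothetical: the block offset chosen for the write, relative to the
 * start of the job's region. last_start is absolute, so file_offset has
 * to be removed first. */
static uint64_t write_relative_offset(uint64_t last_start,
				      uint64_t file_offset)
{
	return last_start - file_offset;
}

/* Hypothetical: the later step that adds file_offset back to produce the
 * offset actually issued. */
static uint64_t issued_offset(uint64_t relative, uint64_t file_offset)
{
	return relative + file_offset;
}
```

With `offset=1048576` and a trim issued at absolute offset 1052672, the
write ends up at 1052672 as well. Skipping the subtraction would place
it at 2101248 instead.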

Signed-off-by: Jungwon Lee <jjung1.lee@samsung.com>
Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
[+ updated commit title]

windows: drop nanosleep and clock_gettime

Cygwin and msys2 now provide nanosleep and clock_gettime, so fio no
longer needs to implement them. The presence of our implementations was
triggering build failures:

https://github.com/axboe/fio/actions/runs/15828051168

Since fio no longer provides clock_gettime, stop unconditionally setting
clock_gettime and clock_monotonic to yes on Windows and start detecting
these features at build time. These two features are successfully
detected by our configure script:

https://github.com/vincentkfu/fio/actions/runs/15832278184

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>

Merge branch 'fix-random-distribution-parsing-failure' of https://github.com/leonid-kozlov/fio

* 'fix-random-distribution-parsing-failure' of https://github.com/leonid-kozlov/fio:
parse: use minimum delimiter distance

Merge branch 'fix_real_file_size_when_pi_is_enabled' of https://github.com/SuhoSon/fio

* 'fix_real_file_size_when_pi_is_enabled' of https://github.com/SuhoSon/fio:
io_uring: ensure accurate real_file_size setup for full device access with PI enabled

io_uring: ensure accurate real_file_size setup for full device access with PI enabled

Fix real_file_size calculation when PI is enabled

When PI is enabled, the extended LBA (lba_ext) should be used to calculate
real_file_size instead of lba_size. This ensures FIO can access the entire
device area correctly.

Signed-off-by: Suho Son <suho.son@samsung.com>

parse: use minimum delimiter distance

Use the minimal distance to a delimiter to determine the option length.

The current implementation of opt_len() makes it impossible to
locate the option name in a random_distribution zones list
that combines ':' and ',' chars.
opt_len() should try to locate the option name
using all possible delimiters and return the minimal length
instead of returning the first one found.
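
The idea can be sketched as below. The function name and signature are
hypothetical and this is not the actual opt_len() code; it only shows
"nearest delimiter wins" as opposed to "first delimiter tried wins".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: length of the leading token in s, ending at the
 * nearest of the given delimiters, or strlen(s) if none is present. */
static size_t min_delim_len(const char *s, const char *delims)
{
	size_t best = strlen(s);

	for (const char *d = delims; *d; d++) {
		const char *hit = strchr(s, *d);

		if (hit && (size_t)(hit - s) < best)
			best = (size_t)(hit - s);
	}
	return best;
}
```

For the input "name,x:y" with delimiters ":,", this returns 4 regardless
of the order in which the delimiters are tried. The standard strcspn()
computes the same value.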

Fixes: https://github.com/axboe/fio/issues/1923

Signed-off-by: Leonid Kozlov <leonid.e.kozlov@gmail.com>

backend: clean up requeued io_u's

When an attempt to queue an io_u returns FIO_Q_BUSY, the io_u is added
to td->io_u_requeues. If the runtime timeout expires with
td->io_u_requeues not empty, the job will not close the relevant
file because its file->references will be non-zero since the requeued
io_u still holds a reference to the file.

This patch discards the contents of td->io_u_requeues during io_u
cleanup which leads to file closure when its last reference is
destroyed. This is relevant for resource-constrained environments.

Suggested-by: Jonghwi Jeong <jongh2.jeong@samsung.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>