.TP
.BI fdatasync \fR=\fPint
Like \fBfsync\fR but uses \fBfdatasync\fR\|(2) to only sync data and
-not metadata blocks. In Windows, FreeBSD, DragonFlyBSD or OSX there is no
+not metadata blocks. In Windows, DragonFlyBSD or OSX there is no
\fBfdatasync\fR\|(2) so this falls back to using \fBfsync\fR\|(2).
Defaults to 0, which means fio does not periodically issue and wait for a
data-only sync to complete.
range of possible random values.
Defaults are: random for \fBpareto\fR and \fBzipf\fR, and 0.5 for \fBnormal\fR.
If you wanted to use \fBzipf\fR with a `theta` of 1.2 centered on 1/4 of allowed value range,
-you would use `random_distibution=zipf:1.2:0.25`.
+you would use `random_distribution=zipf:1.2:0.25`.
.P
For a \fBzoned\fR distribution, fio supports specifying percentages of I/O
access that should fall within what range of the file or device. For
per job is supported
.RE
.TP
+.BI dedupe_global \fR=\fPbool
+This controls whether the deduplication buffers will be shared amongst
+all jobs that have this option set. The buffers are spread evenly between
+participating jobs.
+.P
+.RS
+Note that \fBdedupe_mode\fR must be set to \fBworking_set\fR for this to work.
+Can be used in combination with compression
+.TP
.BI invalidate \fR=\fPbool
Invalidate the buffer/page cache parts of the files to be used prior to
starting I/O if the platform and file type support it. Defaults to true.
\fBmmaphuge\fR to work, the system must have free huge pages allocated. This
can normally be checked and set by reading/writing
`/proc/sys/vm/nr_hugepages' on a Linux system. Fio assumes a huge page
-is 4MiB in size. So to calculate the number of huge pages you need for a
-given job file, add up the I/O depth of all jobs (normally one unless
-\fBiodepth\fR is used) and multiply by the maximum bs set. Then divide
-that number by the huge page size. You can see the size of the huge pages in
-`/proc/meminfo'. If no huge pages are allocated by having a non-zero
+is 2 or 4MiB in size depending on the platform. So to calculate the number of
+huge pages you need for a given job file, add up the I/O depth of all jobs
+(normally one unless \fBiodepth\fR is used) and multiply by the maximum bs set.
+Then divide that number by the huge page size. You can see the size of the huge
+pages in `/proc/meminfo'. If no huge pages are allocated by having a non-zero
number in `nr_hugepages', using \fBmmaphuge\fR or \fBshmhuge\fR will fail. Also
see \fBhugepage\-size\fR.
.P
\fBbs\fR used.
.TP
.BI hugepage\-size \fR=\fPint
-Defines the size of a huge page. Must at least be equal to the system
-setting, see `/proc/meminfo'. Defaults to 4MiB. Should probably
-always be a multiple of megabytes, so using `hugepage\-size=Xm' is the
-preferred way to set this to avoid setting a non-pow-2 bad value.
+Defines the size of a huge page. Must at least be equal to the system setting,
+see `/proc/meminfo' and `/sys/kernel/mm/hugepages/'. Defaults to 2 or 4MiB
+depending on the platform. Should probably always be a multiple of megabytes,
+so using `hugepage\-size=Xm' is the preferred way to set this to avoid setting
+a non-pow-2 bad value.
.TP
.BI lockmem \fR=\fPint
Pin the specified amount of memory with \fBmlock\fR\|(2). Can be used to
.TP
.B exec
Execute 3rd party tools. Could be used to perform monitoring during jobs runtime.
+.TP
+.B xnvme
+I/O engine using the xNVMe C API, for NVMe devices. The xnvme engine provides
+flexibility to access GNU/Linux Kernel NVMe driver via libaio, IOCTLs, io_uring,
+the SPDK NVMe driver, or your own custom NVMe driver. The xnvme engine includes
+engine specific options. (See \fIhttps://xnvme.io/\fR).
.SS "I/O engine specific parameters"
In addition, there are some parameters which are only valid when a specific
\fBioengine\fR is in use. These are used identically to normal parameters,
before IO is started. This eliminates the need to map and release for each IO.
This is more efficient, and reduces the IO latency as well.
.TP
-.BI (io_uring)hipri
+.BI (io_uring,xnvme)hipri
If this option is set, fio will attempt to use polled IO completions. Normal IO
completions generate interrupts to signal the completion of IO, polled
completions do not. Hence they are require active reaping by the application.
submission and completion part more lightweight. Required for the below
sqthread_poll option.
.TP
-.BI (io_uring)sqthread_poll
+.BI (io_uring,xnvme)sqthread_poll
Normally fio will submit IO by issuing a system call to notify the kernel of
available items in the SQ ring. If this option is set, the act of submitting IO
will be done by a polling thread in the kernel. This frees up cycles for fio, at
.BI (cpuio)cpuchunks \fR=\fPint
Split the load into cycles of the given time. In microseconds.
.TP
+.BI (cpuio)cpumode \fR=\fPstr
+Specify how to stress the CPU. It can take these two values:
+.RS
+.RS
+.TP
+.B noop
+This is the default and directs the CPU to execute noop instructions.
+.TP
+.B qsort
+Replace the default noop instructions with a qsort algorithm to consume more energy.
+.RE
+.RE
+.TP
.BI (cpuio)exit_on_io_done \fR=\fPbool
Detect when I/O threads are done, then exit.
.TP
the full *type.id* string. If no type. prefix is given, fio will add 'client.'
by default.
.TP
+.BI (rados)conf \fR=\fPstr
+Specifies the configuration path of ceph cluster, so conf file does not
+have to be /etc/ceph/ceph.conf.
+.TP
.BI (rbd,rados)busy_poll \fR=\fPbool
Poll store instead of waiting for completion. Usually this provides better
throughput at cost of higher(up to 100%) CPU utilization.
.TP
.BI (exec)std_redirect\fR=\fbool
If set, stdout and stderr streams are redirected to files named from the job name. Default is true.
+.TP
+.BI (xnvme)xnvme_async\fR=\fPstr
+Select the xnvme async command interface. This can take these values.
+.RS
+.RS
+.TP
+.B emu
+This is default and used to emulate asynchronous I/O
+.TP
+.BI thrpool
+Use thread pool for Asynchronous I/O
+.TP
+.BI io_uring
+Use Linux io_uring/liburing for Asynchronous I/O
+.TP
+.BI libaio
+Use Linux aio for Asynchronous I/O
+.TP
+.BI posix
+Use POSIX aio for Asynchronous I/O
+.TP
+.BI nil
+Use nil-io; For introspective perf. evaluation
+.RE
+.RE
+.TP
+.BI (xnvme)xnvme_sync\fR=\fPstr
+Select the xnvme synchronous command interface. This can take these values.
+.RS
+.RS
+.TP
+.B nvme
+This is default and uses Linux NVMe Driver ioctl() for synchronous I/O
+.TP
+.BI psync
+Use pread()/write() for synchronous I/O
+.RE
+.RE
+.TP
+.BI (xnvme)xnvme_admin\fR=\fPstr
+Select the xnvme admin command interface. This can take these values.
+.RS
+.RS
+.TP
+.B nvme
+This is default and uses Linux NVMe Driver ioctl() for admin commands
+.TP
+.BI block
+Use Linux Block Layer ioctl() and sysfs for admin commands
+.TP
+.BI file_as_ns
+Use file-stat as to construct NVMe idfy responses
+.RE
+.RE
+.TP
+.BI (xnvme)xnvme_dev_nsid\fR=\fPint
+xnvme namespace identifier, for userspace NVMe driver.
+.TP
+.BI (xnvme)xnvme_iovec
+If this option is set, xnvme will use vectored read/write commands.
.SS "I/O depth"
.TP
.BI iodepth \fR=\fPint
To avoid false verification errors, do not use the norandommap option when
verifying data with async I/O engines and I/O depths > 1. Or use the
norandommap and the lfsr random generator together to avoid writing to the
-same offset with muliple outstanding I/Os.
+same offset with multiple outstanding I/Os.
.RE
.TP
.BI verify_offset \fR=\fPint
write_type_log for each log type, instead of the default zero-based
timestamps.
.TP
+.BI log_alternate_epoch \fR=\fPbool
+If set, fio will log timestamps based on the epoch used by the clock specified
+in the \fBlog_alternate_epoch_clock_id\fR option, to the log files produced by
+enabling write_type_log for each log type, instead of the default zero-based
+timestamps.
+.TP
+.BI log_alternate_epoch_clock_id \fR=\fPint
+Specifies the clock_id to be used by clock_gettime to obtain the alternate epoch
+if either \fBBlog_unix_epoch\fR or \fBlog_alternate_epoch\fR are true. Otherwise has no
+effect. Default value is 0, or CLOCK_REALTIME.
+.TP
.BI block_error_percentiles \fR=\fPbool
If set, record errors in trim block-sized units from writes and trims and
output a histogram of how many trims it took to get to errors, and what kind
.TP
.B wait
Wait for `offset' microseconds. Everything below 100 is discarded.
-The time is relative to the previous `wait' statement.
+The time is relative to the previous `wait' statement. Note that action `wait`
+is not allowed as of version 3, as the same behavior can be achieved using
+timestamps.
.TP
.B read
Read `length' bytes beginning from `offset'.
Trim the given file from the given `offset' for `length' bytes.
.RE
.RE
+.RE
+.TP
+.B Trace file format v3
+The third version of the trace file format was added in fio version 3.31. It
+forces each action to have a timestamp associated with it.
+.RS
+.P
+The first line of the trace file has to be:
+.RS
+.P
+"fio version 3 iolog"
+.RE
+.P
+Following this can be lines in two different formats, which are described below.
+.P
+.B
+The file management format:
+.RS
+timestamp filename action
+.P
+.RE
+.B
+The file I/O action format:
+.RS
+timestamp filename action offset length
+.P
+The `timestamp` is relative to the beginning of the run (ie starts at 0). The
+`filename`, `action`, `offset` and `length` are identical to version 2, except
+that version 3 does not allow the `wait` action.
+.RE
+.RE
.SH I/O REPLAY \- MERGING TRACES
Colocation is a common practice used to get the most out of a machine.
Knowing which workloads play nicely with each other and which ones don't is