thread/process.
.. option:: ignore_zone_limits=bool
+
If this option is used, fio will ignore the maximum number of open
zones limit of the zoned block device in use, thus allowing the
option :option:`max_open_zones` value to be larger than the device
reported limit.
range of possible random values.
Defaults are: random for **pareto** and **zipf**, and 0.5 for **normal**.
If you wanted to use **zipf** with a `theta` of 1.2 centered on 1/4 of allowed value range,
- you would use ``random_distibution=zipf:1.2:0.25``.
+ you would use ``random_distribution=zipf:1.2:0.25``.
For a **zoned** distribution, fio supports specifying percentages of I/O
access that should fall within what range of the file or device. For
Note that :option:`size` needs to be explicitly provided and only one file per
job is supported.
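As an illustrative sketch of the zoned syntax (the exact split below is
arbitrary), a distribution that puts 60% of accesses in the first 10% of the
file, 30% in the next 20%, 8% in the next 30%, and 2% in the last 40% would
use::

    random_distribution=zoned:60/10:30/20:8/30:2/40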
+.. option:: dedupe_global=bool
+
+ This controls whether the deduplication buffers will be shared amongst
+ all jobs that have this option set. The buffers are spread evenly between
+ participating jobs.
+
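As a minimal sketch (the percentage and filenames below are only examples),
two jobs sharing one global pool of dedupe buffers might look like::

    [global]
    rw=write
    dedupe_percentage=50
    dedupe_global=1

    [job1]
    filename=file1

    [job2]
    filename=file2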
.. option:: invalidate=bool
Invalidate the buffer/page cache parts of the files to be used prior to
**mmaphuge** to work, the system must have free huge pages allocated. This
can normally be checked and set by reading/writing
:file:`/proc/sys/vm/nr_hugepages` on a Linux system. Fio assumes a huge page
- is 4MiB in size. So to calculate the number of huge pages you need for a
- given job file, add up the I/O depth of all jobs (normally one unless
- :option:`iodepth` is used) and multiply by the maximum bs set. Then divide
- that number by the huge page size. You can see the size of the huge pages in
- :file:`/proc/meminfo`. If no huge pages are allocated by having a non-zero
- number in `nr_hugepages`, using **mmaphuge** or **shmhuge** will fail. Also
- see :option:`hugepage-size`.
+ is 2 or 4MiB in size depending on the platform. So to calculate the
+ number of huge pages you need for a given job file, add up the I/O
+ depth of all jobs (normally one unless :option:`iodepth` is used) and
+ multiply by the maximum bs set. Then divide that number by the huge
+ page size. You can see the size of the huge pages in
+ :file:`/proc/meminfo`. If no huge pages are allocated by having a
+ non-zero number in `nr_hugepages`, using **mmaphuge** or **shmhuge**
+ will fail. Also see :option:`hugepage-size`.
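As a worked example of the calculation above, two jobs with an I/O depth of 16
and a maximum bs of 256k need 2 * 16 * 256KiB = 8MiB of huge page memory, i.e.
four 2MiB huge pages. The huge page size and allocated count can be inspected
and set (the count here is only an example) with::

    $ grep Huge /proc/meminfo
    $ echo 4 > /proc/sys/vm/nr_hugepages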
**mmaphuge** also needs to have hugetlbfs mounted and the file location
should point there. So if it's mounted in :file:`/huge`, you would use
.. option:: hugepage-size=int
- Defines the size of a huge page. Must at least be equal to the system
- setting, see :file:`/proc/meminfo`. Defaults to 4MiB. Should probably
- always be a multiple of megabytes, so using ``hugepage-size=Xm`` is the
- preferred way to set this to avoid setting a non-pow-2 bad value.
+ Defines the size of a huge page. Must at least be equal to the system
+ setting, see :file:`/proc/meminfo` and
+ :file:`/sys/kernel/mm/hugepages/`. Defaults to 2 or 4MiB depending on
+ the platform. Should probably always be a multiple of megabytes, so
+ using ``hugepage-size=Xm`` is the preferred way to set this to avoid
+ setting a non-pow-2 bad value.
.. option:: lockmem=int
**exec**
Execute third-party tools. Can be used to perform monitoring during job runtime.
+ **xnvme**
+ I/O engine using the xNVMe C API, for NVMe devices. The xnvme engine
+ provides the flexibility to access the GNU/Linux kernel NVMe driver via
+ libaio, IOCTLs, io_uring, the SPDK NVMe driver, or your own custom NVMe
+ driver. The xnvme engine includes engine-specific options. (See
+ https://xnvme.io).
+
I/O engine specific parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
making the submission and completion part more lightweight. Required
for the below :option:`sqthread_poll` option.
-.. option:: sqthread_poll : [io_uring]
+.. option:: sqthread_poll : [io_uring] [xnvme]
Normally fio will submit IO by issuing a system call to notify the
kernel of available items in the SQ ring. If this option is set, the
.. option:: hipri
- [io_uring]
+ [io_uring], [xnvme]
If this option is set, fio will attempt to use polled IO completions.
Normal IO completions generate interrupts to signal the completion of
the full *type.id* string. If no type. prefix is given, fio will add
'client.' by default.
+.. option:: conf=str : [rados]
+
+ Specifies the configuration path of the Ceph cluster, so the conf file
+ does not have to be :file:`/etc/ceph/ceph.conf`.
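For example (the path, pool, and client name here are illustrative), a rados
job using a non-default configuration file might specify::

    ioengine=rados
    clientname=client.admin
    pool=rbd
    conf=/etc/ceph/cluster-a.conf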
+
.. option:: busy_poll=bool : [rbd,rados]
Poll store instead of waiting for completion. Usually this provides better
If set, stdout and stderr streams are redirected to files named from the job name. Default is true.
+.. option:: xnvme_async=str : [xnvme]
+
+ Select the xnvme async command interface. It can take one of the
+ following values.
+
+ **emu**
+ This is the default, used to emulate asynchronous I/O.
+ **thrpool**
+ Use a thread pool for asynchronous I/O.
+ **io_uring**
+ Use Linux io_uring/liburing for asynchronous I/O.
+ **libaio**
+ Use Linux aio for asynchronous I/O.
+ **posix**
+ Use POSIX aio for asynchronous I/O.
+ **nil**
+ Use nil-io, for introspective performance evaluation.
+
+.. option:: xnvme_sync=str : [xnvme]
+
+ Select the xnvme synchronous command interface. It can take one of the
+ following values.
+
+ **nvme**
+ This is the default and uses the Linux NVMe driver ioctl() for
+ synchronous I/O.
+ **psync**
+ Use pread()/pwrite() for synchronous I/O.
+
+.. option:: xnvme_admin=str : [xnvme]
+
+ Select the xnvme admin command interface. It can take one of the
+ following values.
+
+ **nvme**
+ This is the default and uses the Linux NVMe driver ioctl() for admin
+ commands.
+ **block**
+ Use the Linux block layer ioctl() and sysfs for admin commands.
+ **file_as_ns**
+ Use file-stat information to construct NVMe identify (idfy) responses.
+
+.. option:: xnvme_dev_nsid=int : [xnvme]
+
+ The NVMe namespace identifier, for the userspace NVMe driver.
+
+.. option:: xnvme_iovec=int : [xnvme]
+
+ If this option is set, xnvme will use vectored read/write commands.
+
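Putting the engine-specific options together, a minimal sketch of an xnvme job
(the device path, pattern, and depth here are illustrative) could be::

    [xnvme-test]
    ioengine=xnvme
    filename=/dev/nvme0n1
    xnvme_async=io_uring
    rw=randread
    iodepth=16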
I/O depth
~~~~~~~~~
To avoid false verification errors, do not use the norandommap option when
verifying data with async I/O engines and I/O depths > 1. Or use the
norandommap and the lfsr random generator together to avoid writing to the
- same offset with muliple outstanding I/Os.
+ same offset with multiple outstanding I/Os.
.. option:: verify_offset=int
**wait**
Wait for `offset` microseconds. Everything below 100 is discarded.
- The time is relative to the previous `wait` statement.
+ The time is relative to the previous `wait` statement. Note that
+ the `wait` action is not allowed as of version 3, as the same behavior
+ can be achieved using timestamps.
**read**
Read `length` bytes beginning from `offset`.
**write**
Write `length` bytes beginning from `offset`.
**trim**
Trim the given file from the given `offset` for `length` bytes.
+Trace file format v3
+~~~~~~~~~~~~~~~~~~~~
+
+The third version of the trace file format was added in fio version 3.31. It
+forces each action to have a timestamp associated with it.
+
+The first line of the trace file has to be::
+
+ fio version 3 iolog
+
+Following this can be lines in two different formats, which are described below.
+
+The file management format::
+
+ timestamp filename action
+
+The file I/O action format::
+
+ timestamp filename action offset length
+
+The `timestamp` is relative to the beginning of the run (i.e. it starts at 0). The
+`filename`, `action`, `offset` and `length` are identical to version 2, except
+that version 3 does not allow the `wait` action.
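Putting the two line formats together, an illustrative v3 trace (the device
name, timestamps, and offsets are made up) might read::

    fio version 3 iolog
    0 /dev/sdX add
    0 /dev/sdX open
    1000 /dev/sdX read 0 4096
    2000 /dev/sdX write 4096 4096
    3000 /dev/sdX close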
+
+
I/O Replay - Merging Traces
---------------------------