configure: refer to zlib1g-dev package for zlib support

diff --git a/fio.1 b/fio.1

index b87d2309ac16704256f2deb1f08f96b0d8069ca2..ded7bbfc993bd431a10a034a1c63e14d765d44e1 100644
--- a/fio.1
+++ b/fio.1
@@ -1122,7 +1122,7 @@ see \fBend_fsync\fR and \fBfsync_on_close\fR.
  .TP
  .BI fdatasync \fR=\fPint
  Like \fBfsync\fR but uses \fBfdatasync\fR\|(2) to only sync data and
-not metadata blocks. In Windows, FreeBSD, DragonFlyBSD or OSX there is no
+not metadata blocks. In Windows, DragonFlyBSD or OSX there is no
  \fBfdatasync\fR\|(2) so this falls back to using \fBfsync\fR\|(2).
  Defaults to 0, which means fio does not periodically issue and wait for a
  data-only sync to complete.
@@ -1221,7 +1221,7 @@ more control over most probable outcome. This value is in range [0-1] which maps
  range of possible random values.
  Defaults are: random for \fBpareto\fR and \fBzipf\fR, and 0.5 for \fBnormal\fR.
  If you wanted to use \fBzipf\fR with a `theta` of 1.2 centered on 1/4 of allowed value range,
-you would use `random_distibution=zipf:1.2:0.25`.
+you would use `random_distribution=zipf:1.2:0.25`.
  .P
  For a \fBzoned\fR distribution, fio supports specifying percentages of I/O
  access that should fall within what range of the file or device. For
@@ -1553,6 +1553,15 @@ Note that \fBsize\fR needs to be explicitly provided and only 1 file
  per job is supported
  .RE
  .TP
+.BI dedupe_global \fR=\fPbool
+This controls whether the deduplication buffers will be shared amongst
+all jobs that have this option set. The buffers are spread evenly between
+participating jobs.
+.P
+.RS
+Note that \fBdedupe_mode\fR must be set to \fBworking_set\fR for this to work.
+Can be used in combination with compression.
+.TP
  .BI invalidate \fR=\fPbool
  Invalidate the buffer/page cache parts of the files to be used prior to
  starting I/O if the platform and file type support it. Defaults to true.
@@ -1956,6 +1965,12 @@ via kernel NFS.
  .TP
  .B exec
  Execute 3rd party tools. Could be used to perform monitoring during jobs runtime.
+.TP
+.B xnvme
+I/O engine using the xNVMe C API, for NVMe devices. The xnvme engine provides
+flexible access to the GNU/Linux kernel NVMe driver via libaio, IOCTLs, io_uring,
+the SPDK NVMe driver, or your own custom NVMe driver. The xnvme engine includes
+engine-specific options (see \fIhttps://xnvme.io/\fR).
  .SS "I/O engine specific parameters"
  In addition, there are some parameters which are only valid when a specific
  \fBioengine\fR is in use. These are used identically to normal parameters,
@@ -1995,10 +2010,34 @@ To get a finer control over I/O priority, this option allows specifying
  the percentage of IOs that must have a priority set depending on the block
  size of the IO. This option is useful only when used together with the option
  \fBbssplit\fR, that is, multiple different block sizes are used for reads and
-writes. The format for this option is the same as the format of the
-\fBbssplit\fR option, with the exception that values for trim IOs are
-ignored. This option is mutually exclusive with the \fBcmdprio_percentage\fR
-option.
+writes.
+.RS
+.P
+The first accepted format for this option is the same as the format of the
+\fBbssplit\fR option:
+.RS
+.P
+cmdprio_bssplit=blocksize/percentage:blocksize/percentage
+.RE
+.P
+In this case, each entry will use the priority class and priority level defined
+by the options \fBcmdprio_class\fR and \fBcmdprio\fR respectively.
+.P
+The second accepted format for this option is:
+.RS
+.P
+cmdprio_bssplit=blocksize/percentage/class/level:blocksize/percentage/class/level
+.RE
+.P
+In this case, the priority class and priority level are defined inside each
+entry. In comparison with the first accepted format, the second accepted format
+does not restrict all entries to have the same priority class and priority
+level.
+.P
+For both formats, only the read and write data directions are supported; values
+for trim IOs are ignored. This option is mutually exclusive with the
+\fBcmdprio_percentage\fR option.
+.RE
  .TP
  .BI (io_uring)fixedbufs
  If fio is asked to do direct IO, then Linux will map pages for each IO call, and
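
The two accepted `cmdprio_bssplit` formats described in the hunk above can be told apart by the number of `/`-separated fields in each entry. The following is a minimal Python sketch of that idea, not fio's own parser: the names `PrioEntry` and `parse_cmdprio_bssplit` are hypothetical, the block size is kept as written rather than converted to bytes, and only a single data direction is handled.

```python
from typing import List, NamedTuple, Optional


class PrioEntry(NamedTuple):
    """One entry of a cmdprio_bssplit value (hypothetical helper type)."""
    blocksize: str              # kept as written, e.g. "4k"; no unit conversion
    percentage: int
    prio_class: Optional[int]   # None: fall back to the cmdprio_class option
    prio_level: Optional[int]   # None: fall back to the cmdprio option


def parse_cmdprio_bssplit(value: str) -> List[PrioEntry]:
    """Parse entries separated by ':' in either accepted format.

    Assumption: the string covers one data direction only.
    """
    entries = []
    for item in value.split(":"):
        fields = item.split("/")
        if len(fields) == 2:
            # First format: blocksize/percentage
            blocksize, percentage = fields
            entries.append(PrioEntry(blocksize, int(percentage), None, None))
        elif len(fields) == 4:
            # Second format: blocksize/percentage/class/level
            blocksize, percentage, prio_class, prio_level = fields
            entries.append(PrioEntry(blocksize, int(percentage),
                                     int(prio_class), int(prio_level)))
        else:
            raise ValueError("unrecognized cmdprio_bssplit entry: %r" % item)
    return entries
```

Entries parsed from the first format carry `None` for class and level, which mirrors the statement that they use the values of `cmdprio_class` and `cmdprio` instead.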
@@ -2006,7 +2045,7 @@ release them when IO is done. If this option is set, the pages are pre-mapped
  before IO is started. This eliminates the need to map and release for each IO.
  This is more efficient, and reduces the IO latency as well.
  .TP
-.BI (io_uring)hipri
+.BI (io_uring,xnvme)hipri
  If this option is set, fio will attempt to use polled IO completions. Normal IO
  completions generate interrupts to signal the completion of IO, polled
  completions do not. Hence they require active reaping by the application.
@@ -2019,7 +2058,7 @@ This avoids the overhead of managing file counts in the kernel, making the
  submission and completion part more lightweight. Required for the below
  sqthread_poll option.
  .TP
-.BI (io_uring)sqthread_poll
+.BI (io_uring,xnvme)sqthread_poll
  Normally fio will submit IO by issuing a system call to notify the kernel of
  available items in the SQ ring. If this option is set, the act of submitting IO
  will be done by a polling thread in the kernel. This frees up cycles for fio, at
@@ -2067,6 +2106,19 @@ option when using cpuio I/O engine.
  .BI (cpuio)cpuchunks \fR=\fPint
  Split the load into cycles of the given time. In microseconds.
  .TP
+.BI (cpuio)cpumode \fR=\fPstr
+Specify how to stress the CPU. It can take these two values:
+.RS
+.RS
+.TP
+.B noop
+This is the default and directs the CPU to execute noop instructions.
+.TP
+.B qsort
+Replace the default noop instructions with a qsort algorithm to consume more energy.
+.RE
+.RE
+.TP
  .BI (cpuio)exit_on_io_done \fR=\fPbool
  Detect when I/O threads are done, then exit.
  .TP
@@ -2434,6 +2486,66 @@ Defines the time between the SIGTERM and SIGKILL signals. Default is 1 second.
  .TP
  .BI (exec)std_redirect\fR=\fPbool
  If set, stdout and stderr streams are redirected to files named from the job name. Default is true.
+.TP
+.BI (xnvme)xnvme_async\fR=\fPstr
+Select the xnvme async command interface. This can take these values.
+.RS
+.RS
+.TP
+.B emu
+This is the default and is used to emulate asynchronous I/O.
+.TP
+.BI thrpool
+Use a thread pool for asynchronous I/O.
+.TP
+.BI io_uring
+Use Linux io_uring/liburing for asynchronous I/O.
+.TP
+.BI libaio
+Use Linux aio for asynchronous I/O.
+.TP
+.BI posix
+Use POSIX aio for asynchronous I/O.
+.TP
+.BI nil
+Use nil-io, for introspective performance evaluation.
+.RE
+.RE
+.TP
+.BI (xnvme)xnvme_sync\fR=\fPstr
+Select the xnvme synchronous command interface. This can take these values.
+.RS
+.RS
+.TP
+.B nvme
+This is the default and uses the Linux NVMe driver ioctl() for synchronous I/O.
+.TP
+.BI psync
+Use pread()/write() for synchronous I/O.
+.RE
+.RE
+.TP
+.BI (xnvme)xnvme_admin\fR=\fPstr
+Select the xnvme admin command interface. This can take these values.
+.RS
+.RS
+.TP
+.B nvme
+This is the default and uses the Linux NVMe driver ioctl() for admin commands.
+.TP
+.BI block
+Use Linux Block Layer ioctl() and sysfs for admin commands.
+.TP
+.BI file_as_ns
+Use file-stat to construct NVMe idfy responses.
+.RE
+.RE
+.TP
+.BI (xnvme)xnvme_dev_nsid\fR=\fPint
+xnvme namespace identifier, for userspace NVMe driver.
+.TP
+.BI (xnvme)xnvme_iovec
+If this option is set, xnvme will use vectored read/write commands.
  .SS "I/O depth"
  .TP
  .BI iodepth \fR=\fPint
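
The value lists for `xnvme_async`, `xnvme_sync` and `xnvme_admin` added in the hunk above lend themselves to a simple lookup table. The following Python sketch checks option values against those lists and fills in the documented defaults. It is only an illustration of the documented choices: `XNVME_CHOICES`, `XNVME_DEFAULTS` and `check_xnvme_options` are hypothetical names, and fio performs its own option validation.

```python
from typing import Dict

# Accepted values, copied from the option descriptions in the hunk above.
XNVME_CHOICES = {
    "xnvme_async": {"emu", "thrpool", "io_uring", "libaio", "posix", "nil"},
    "xnvme_sync": {"nvme", "psync"},
    "xnvme_admin": {"nvme", "block", "file_as_ns"},
}

# Defaults as stated in the same descriptions.
XNVME_DEFAULTS = {
    "xnvme_async": "emu",
    "xnvme_sync": "nvme",
    "xnvme_admin": "nvme",
}


def check_xnvme_options(options: Dict[str, str]) -> Dict[str, str]:
    """Return the options with defaults filled in, or raise on a bad value."""
    resolved = dict(XNVME_DEFAULTS)
    for name, value in options.items():
        if name not in XNVME_CHOICES:
            raise KeyError("not an xnvme interface option: %s" % name)
        if value not in XNVME_CHOICES[name]:
            raise ValueError("%s does not accept %r" % (name, value))
        resolved[name] = value
    return resolved
```

For instance, `check_xnvme_options({"xnvme_async": "io_uring"})` keeps `nvme` for both the synchronous and admin interfaces.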
@@ -3045,7 +3157,7 @@ the verify will be of the newly written data.
  To avoid false verification errors, do not use the norandommap option when
  verifying data with async I/O engines and I/O depths > 1.  Or use the
  norandommap and the lfsr random generator together to avoid writing to the
-same offset with muliple outstanding I/Os.
+same offset with multiple outstanding I/Os.
  .RE
  .TP
  .BI verify_offset \fR=\fPint
@@ -3360,6 +3472,17 @@ If set, fio will log Unix timestamps to the log files produced by enabling
  write_type_log for each log type, instead of the default zero-based
  timestamps.
  .TP
+.BI log_alternate_epoch \fR=\fPbool
+If set, fio will log timestamps based on the epoch used by the clock specified
+in the \fBlog_alternate_epoch_clock_id\fR option, to the log files produced by
+enabling write_type_log for each log type, instead of the default zero-based
+timestamps.
+.TP
+.BI log_alternate_epoch_clock_id \fR=\fPint
+Specifies the clock_id to be used by clock_gettime to obtain the alternate epoch
+if either \fBlog_unix_epoch\fR or \fBlog_alternate_epoch\fR is true. Otherwise it has no
+effect. The default value is 0, or CLOCK_REALTIME.
+.TP
  .BI block_error_percentiles \fR=\fPbool
  If set, record errors in trim block-sized units from writes and trims and
  output a histogram of how many trims it took to get to errors, and what kind
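
To illustrate the difference between zero-based and epoch-based log timestamps described in the hunk above, here is a small Python sketch. It assumes that an epoch-based timestamp is the clock reading taken at job start plus the zero-based offset; that combination is an assumption about the behavior, not something the man page text spells out. The helper names are hypothetical, and `time.clock_gettime` is available on Unix-like systems only.

```python
import time


def alternate_epoch_ms(clock_id: int = time.CLOCK_REALTIME) -> int:
    """Read the given clock once and return its value in milliseconds."""
    return int(time.clock_gettime(clock_id) * 1000)


def to_epoch_timestamp(relative_ms: int, epoch_ms: int) -> int:
    """Shift a zero-based log timestamp onto the chosen clock's epoch.

    Assumption: the logged value is simply epoch-at-start plus offset.
    """
    return epoch_ms + relative_ms


# Example: sample the clock once at "job start", then convert log offsets.
start_ms = alternate_epoch_ms()          # CLOCK_REALTIME by default
converted = [to_epoch_timestamp(t, start_ms) for t in (0, 250, 500)]
```

Passing a different clock id, for example `time.CLOCK_MONOTONIC`, changes only the epoch that the offsets are added to.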
@@ -4069,7 +4192,9 @@ given in bytes. The `action' can be one of these:
  .TP
  .B wait
  Wait for `offset' microseconds. Everything below 100 is discarded.
-The time is relative to the previous `wait' statement.
+The time is relative to the previous `wait' statement. Note that action `wait`
+is not allowed as of version 3, as the same behavior can be achieved using
+timestamps.
  .TP
  .B read
  Read `length' bytes beginning from `offset'.
@@ -4087,6 +4212,37 @@ Write `length' bytes beginning from `offset'.
  Trim the given file from the given `offset' for `length' bytes.
  .RE
  .RE
+.RE
+.TP
+.B Trace file format v3
+The third version of the trace file format was added in fio version 3.31. It
+forces each action to have a timestamp associated with it.
+.RS
+.P
+The first line of the trace file has to be:
+.RS
+.P
+"fio version 3 iolog"
+.RE
+.P
+Following this can be lines in two different formats, which are described below.
+.P
+.B
+The file management format:
+.RS
+timestamp filename action
+.P
+.RE
+.B
+The file I/O action format:
+.RS
+timestamp filename action offset length
+.P
+The `timestamp` is relative to the beginning of the run (i.e. it starts at 0). The
+`filename`, `action`, `offset` and `length` are identical to version 2, except
+that version 3 does not allow the `wait` action.
+.RE
+.RE
  .SH I/O REPLAY \- MERGING TRACES
  Colocation is a common practice used to get the most out of a machine.
  Knowing which workloads play nicely with each other and which ones don't is
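
The version 3 trace layout added in the hunk above (a fixed first line, then three-field file management lines or five-field I/O lines, with `wait` disallowed) can be checked with a few lines of Python. This is a sketch under stated assumptions rather than fio's reader: `TraceLine` and `parse_iolog_v3` are hypothetical names, fields are assumed to be whitespace-separated with no spaces inside filenames, and the timestamp is treated as a plain integer without assuming a unit.

```python
from typing import List, NamedTuple, Optional

HEADER = "fio version 3 iolog"


class TraceLine(NamedTuple):
    timestamp: int
    filename: str
    action: str
    offset: Optional[int]   # None for file management lines
    length: Optional[int]   # None for file management lines


def parse_iolog_v3(text: str) -> List[TraceLine]:
    """Parse a version 3 trace; raise ValueError on malformed input."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0].strip() != HEADER:
        raise ValueError("first line must be %r" % HEADER)
    records = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) not in (3, 5):
            raise ValueError("expected 3 or 5 fields: %r" % ln)
        timestamp, filename, action = int(fields[0]), fields[1], fields[2]
        if action == "wait":
            raise ValueError("the wait action is not allowed in version 3")
        if len(fields) == 5:
            # File I/O action format: timestamp filename action offset length
            records.append(TraceLine(timestamp, filename, action,
                                     int(fields[3]), int(fields[4])))
        else:
            # File management format: timestamp filename action
            records.append(TraceLine(timestamp, filename, action, None, None))
    return records
```

The sketch does not check the action names themselves, since the full list of file management actions is outside the lines shown here.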