Merge branch 'update-fio-ioops-version' of https://github.com/diameter/fio

diff --git a/fio.1 b/fio.1

index 3a7a359b1a6cc4bbe7858795995bfff2462ae1ff..481193254832befdd285394d6424c209b8805989 100644
--- a/fio.1
+++ b/fio.1
@@ -738,12 +738,13 @@ Accepted values are:
  .RS
  .TP
  .B none
-The \fBzonerange\fR, \fBzonesize\fR and \fBzoneskip\fR parameters are ignored.
+The \fBzonerange\fR, \fBzonesize\fR, \fBzonecapacity\fR and \fBzoneskip\fR
+parameters are ignored.
  .TP
  .B strided
  I/O happens in a single zone until \fBzonesize\fR bytes have been transferred.
  After that number of bytes has been transferred processing of the next zone
-starts.
+starts. The \fBzonecapacity\fR parameter is ignored.
  .TP
  .B zbd
  Zoned block device mode. I/O happens sequentially in each zone, even if random
@@ -771,6 +772,14 @@ zoned block device, the specified \fBzonesize\fR must be 0 or equal to the
  device zone size. For a regular block device or file, the specified
  \fBzonesize\fR must be at least 512B.
  .TP
+.BI zonecapacity \fR=\fPint
+For \fBzonemode\fR=zbd, this defines the capacity of a single zone, which is
+the accessible area starting from the zone start address. This parameter only
+applies when using \fBzonemode\fR=zbd in combination with regular block devices.
+If not specified it defaults to the zone size. If the target device is a zoned
+block device, the zone capacity is obtained from the device information and this
+option is ignored.
+.TP
  .BI zoneskip \fR=\fPint
  For \fBzonemode\fR=strided, the number of bytes to skip after \fBzonesize\fR
  bytes of data have been transferred.
@@ -804,7 +813,11 @@ so. Default: false.
  When running a random write test across an entire drive many more zones will be
  open than in a typical application workload. Hence this command line option
  that allows to limit the number of open zones. The number of open zones is
-defined as the number of zones to which write commands are issued.
+defined as the number of zones to which write commands are issued by all
+threads/processes.
+.TP
+.BI job_max_open_zones \fR=\fPint
+Limit on the number of simultaneously opened zones per single thread/process.
  .TP
  .BI zone_reset_threshold \fR=\fPfloat
  A number between zero and one that indicates the ratio of logical blocks with
@@ -1449,9 +1462,31 @@ starting I/O if the platform and file type support it. Defaults to true.
  This will be ignored if \fBpre_read\fR is also specified for the
  same job.
  .TP
-.BI sync \fR=\fPbool
-Use synchronous I/O for buffered writes. For the majority of I/O engines,
-this means using O_SYNC. Default: false.
+.BI sync \fR=\fPstr
+Whether to use synchronous I/O for writes and, if so, which type. The allowed
+values are:
+.RS
+.RS
+.TP
+.B none
+Do not use synchronous I/O. This is the default.
+.TP
+.B 0
+Same as \fBnone\fR.
+.TP
+.B sync
+Use synchronous file IO. For the majority of I/O engines,
+this means using O_SYNC.
+.TP
+.B 1
+Same as \fBsync\fR.
+.TP
+.B dsync
+Use synchronous data IO. For the majority of I/O engines,
+this means using O_DSYNC.
+.PD
+.RE
+.RE
  .TP
  .BI iomem \fR=\fPstr "\fR,\fP mem" \fR=\fPstr
  Fio can use various types of memory as the I/O unit buffer. The allowed
@@ -1548,7 +1583,8 @@ if \fBsize\fR is set to 20GiB and \fBio_size\fR is set to 5GiB, fio
  will perform I/O within the first 20GiB but exit when 5GiB have been
  done. The opposite is also possible \-\- if \fBsize\fR is set to 20GiB,
  and \fBio_size\fR is set to 40GiB, then fio will do 40GiB of I/O within
-the 0..20GiB region.
+the 0..20GiB region. The value can also be given as a percentage,
+\fBio_size\fR=N%, in which case it is taken as that percentage of \fBsize\fR.
  .TP
  .BI filesize \fR=\fPirange(int)
  Individual file sizes. May be a range, in which case fio will select sizes
@@ -1661,11 +1697,6 @@ to get desired CPU usage, as the cpuload only loads a
  single CPU at the desired rate. A job never finishes unless there is
  at least one non-cpuio job.
  .TP
-.B guasi
-The GUASI I/O engine is the Generic Userspace Asynchronous Syscall
-Interface approach to async I/O. See \fIhttp://www.xmailserver.org/guasi-lib.html\fR
-for more info on GUASI.
-.TP
  .B rdma
  The RDMA I/O engine supports both RDMA memory semantics
  (RDMA_WRITE/RDMA_READ) and channel semantics (Send/Recv) for the
@@ -1806,7 +1837,8 @@ Set the percentage of I/O that will be issued with higher priority by setting
  the priority bit. Non-read I/O is likely unaffected by ``cmdprio_percentage``.
  This option cannot be used with the `prio` or `prioclass` options. For this
  option to set the priority bit properly, NCQ priority must be supported and
-enabled and `direct=1' option must be used.
+enabled and `direct=1' option must be used. fio must also be run as the root
+user.
  .TP
  .BI (io_uring)fixedbufs
  If fio is asked to do direct IO, then Linux will map pages for each IO call, and
@@ -1852,6 +1884,22 @@ than normal.
  When hipri is set this determines the probability of a pvsync2 I/O being high
  priority. The default is 100%.
  .TP
+.BI (pvsync2,libaio,io_uring)nowait
+By default if a request cannot be executed immediately (e.g. resource starvation,
+waiting on locks) it is queued and the initiating process will be blocked until
+the required resource becomes free.
+This option sets the RWF_NOWAIT flag (supported from the 4.14 Linux kernel) and
+the call will return instantly with EAGAIN or a partial result rather than waiting.
+
+It is useful to also use \fBignore_error\fR=EAGAIN when using this option.
+Note: glibc 2.27 and 2.28 have a bug in the preadv2 and pwritev2 syscall wrappers:
+they return EOPNOTSUPP instead of EAGAIN.
+
+For cached I/O, using this option usually means a request operates only with
+cached data. Currently the RWF_NOWAIT flag is not supported for cached writes.
+For direct I/O, requests will only succeed if cache invalidation isn't required,
+file blocks are fully allocated and the disk request could be issued immediately.
+.TP
  .BI (cpuio)cpuload \fR=\fPint
  Attempt to use the specified percentage of CPU cycles. This is a mandatory
  option when using cpuio I/O engine.
@@ -2189,7 +2237,7 @@ has a bit of extra overhead, especially for lower queue depth I/O where it
  can increase latencies. The benefit is that fio can manage submission rates
  independently of the device completion rates. This avoids skewed latency
  reporting if I/O gets backed up on the device side (the coordinated omission
-problem).
+problem). Note that this option cannot reliably be used with async IO engines.
  .SS "I/O rate"
  .TP
  .BI thinktime \fR=\fPtime
@@ -2306,7 +2354,9 @@ replay, the file needs to be turned into a blkparse binary data file first
  You can specify a number of files by separating the names with a ':' character.
  See the \fBfilename\fR option for information on how to escape ':'
  characters within the file names. These files will be sequentially assigned to
-job clones created by \fBnumjobs\fR.
+job clones created by \fBnumjobs\fR. '-' is a reserved name meaning read from
+stdin. Because \fBfilename\fR set to '-' also means stdin, this option cannot
+be set to '-' when \fBfilename\fR is '-'.
  .TP
  .BI read_iolog_chunked \fR=\fPbool
  Determines how iolog is read. If false (default) entire \fBread_iolog\fR will
@@ -2517,27 +2567,25 @@ The ID of the flow. If not specified, it defaults to being a global
  flow. See \fBflow\fR.
  .TP
  .BI flow \fR=\fPint
-Weight in token-based flow control. If this value is used, then there is
-a 'flow counter' which is used to regulate the proportion of activity between
-two or more jobs. Fio attempts to keep this flow counter near zero. The
-\fBflow\fR parameter stands for how much should be added or subtracted to the
-flow counter on each iteration of the main I/O loop. That is, if one job has
-`flow=8' and another job has `flow=\-1', then there will be a roughly 1:8
-ratio in how much one runs vs the other.
-.TP
-.BI flow_watermark \fR=\fPint
-The maximum value that the absolute value of the flow counter is allowed to
-reach before the job must wait for a lower value of the counter.
+Weight in token-based flow control. If this value is used,
+then fio regulates the activity between two or more jobs
+sharing the same flow_id.
+Fio attempts to keep each job's activity proportional to that of the other jobs
+in the same flow_id group, according to the weight requested for each job.
+That is, if one job has `flow=3', another has `flow=2',
+and a third has `flow=1', then there will be a roughly 3:2:1 ratio
+in how much one runs vs the others.
  .TP
  .BI flow_sleep \fR=\fPint
-The period of time, in microseconds, to wait after the flow watermark has
-been exceeded before retrying operations.
+The period of time, in microseconds, to wait after the flow counter
+has exceeded its proportion before retrying operations.
  .TP
  .BI stonewall "\fR,\fB wait_for_previous"
  Wait for preceding jobs in the job file to exit, before starting this
  one. Can be used to insert serialization points in the job file. A stone
  wall also implies starting a new reporting group, see
-\fBgroup_reporting\fR.
+\fBgroup_reporting\fR. Optionally you can use `stonewall=0' to disable or
+`stonewall=1' to enable it.
  .TP
  .BI exitall
  By default, fio will continue running all other jobs when one job finishes.
@@ -2545,15 +2593,27 @@ Sometimes this is not the desired action. Setting \fBexitall\fR will instead
  make fio terminate all jobs in the same group, as soon as one job of that
  group finishes.
  .TP
-.BI exit_what
+.BI exit_what \fR=\fPstr
  By default, fio will continue running all other jobs when one job finishes.
-Sometimes this is not the desired action. Setting \fBexit_all\fR will instead
+Sometimes this is not the desired action. Setting \fBexitall\fR will instead
  make fio terminate all jobs in the same group. The option \fBexit_what\fR
-allows to control which jobs get terminated when \fBexitall\fR is enabled. The
-default is \fBgroup\fR and does not change the behaviour of \fBexitall\fR. The
-setting \fBall\fR terminates all jobs. The setting \fBstonewall\fR terminates
-all currently running jobs across all groups and continues execution with the
-next stonewalled group.
+allows you to control which jobs get terminated when \fBexitall\fR is enabled.
+The default value is \fBgroup\fR.
+The allowed values are:
+.RS
+.RS
+.TP
+.B all
+Terminates all jobs.
+.TP
+.B group
+The default. Does not change the behaviour of \fBexitall\fR.
+.TP
+.B stonewall
+Terminates all currently running jobs across all groups and continues
+execution with the next stonewalled group.
+.RE
+.RE
  .TP
  .BI exec_prerun \fR=\fPstr
  Before running this job, issue the command specified through
@@ -3842,7 +3902,8 @@ Fio supports a variety of log file formats, for logging latencies, bandwidth,
  and IOPS. The logs share a common format, which looks like this:
  .RS
  .P
-time (msec), value, data direction, block size (bytes), offset (bytes)
+time (msec), value, data direction, block size (bytes), offset (bytes),
+command priority
  .RE
  .P
  `Time' for the log entry is always in milliseconds. The `value' logged depends
@@ -3876,6 +3937,9 @@ The entry's `block size' is always in bytes. The `offset' is the position in byt
  from the start of the file for that particular I/O. The logging of the offset can be
  toggled with \fBlog_offset\fR.
  .P
+`Command priority' is 0 for normal priority and 1 for high priority. This is controlled
+by the ioengine-specific \fBcmdprio_percentage\fR option.
+.P
  Fio defaults to logging every individual I/O but when windowed logging is set
  through \fBlog_avg_msec\fR, either the average (by default) or the maximum
  (\fBlog_max_value\fR is set) `value' seen over the specified period of time