Merge branch 'evelu-peak' of https://github.com/ErwanAliasr1/fio

[fio.git] / HOWTO
diff --git a/HOWTO b/HOWTO

index e1399b4a101925d94527542e4c5f8f696668912b..297a04851a2bbb747d9c3b8e6df0143be45203a2 100644 (file)
--- a/HOWTO
+++ b/HOWTO
@@ -93,6 +93,12 @@ Command line options
                         Dump info related to I/O rate switching.
         *compress*
                         Dump info related to log compress/decompress.
+       *steadystate*
+                       Dump info related to steadystate detection.
+       *helperthread*
+                       Dump info related to the helper thread.
+       *zbd*
+                       Dump info related to support for zoned block devices.
         *?* or *help*
                         Show available debug options.
  
@@ -100,6 +106,10 @@ Command line options
  
         Parse options only, don't start any I/O.
  
+.. option:: --merge-blktrace-only
+
+       Merge blktraces only, don't start any I/O.
+
  .. option:: --output=filename
  
         Write output to file `filename`.
@@ -194,7 +204,10 @@ Command line options
         Force a full status dump of cumulative (from job start) values at `time`
         intervals. This option does *not* provide per-period measurements. So
         values such as bandwidth are running averages. When the time unit is omitted,
-       `time` is interpreted in seconds.
+       `time` is interpreted in seconds. Note that using this option with
+       ``--output-format=json`` will yield output that technically isn't valid
+       json, since the output will be collated sets of valid json. It will need
+       to be split into valid sets of json after the run.
  
  .. option:: --section=name
  
@@ -209,8 +222,8 @@ Command line options
  
  .. option:: --alloc-size=kb
  
-       Set the internal smalloc pool size to `kb` in KiB.  The
-       ``--alloc-size`` switch allows one to use a larger pool size for smalloc.
+       Allocate additional internal smalloc pools of size `kb` in KiB.  The
+       ``--alloc-size`` option increases shared memory set aside for use by fio.
         If running large jobs with randommap enabled, fio can run out of memory.
         Smalloc is an internal allocator for shared structures from a fixed size
         memory pool and can grow to 16 pools. The pool size defaults to 16MiB.
@@ -283,7 +296,8 @@ Command line options
  
  .. option:: --aux-path=path
  
-       Use this `path` for fio state generated files.
+       Use the directory specified by `path` for generated state files instead
+       of the current working directory.
  
  Any parameters following the options will be assumed to be job files, unless
  they match a job file parameter. Multiple job files can be listed and each job
@@ -530,6 +544,9 @@ Parameter types
                 * *Ti* -- means tebi (Ti) or 1024**4
                 * *Pi* -- means pebi (Pi) or 1024**5
  
+       For Zone Block Device Mode:
+               * *z*  -- means Zone
+
         With :option:`kb_base`\=1024 (the default), the unit prefixes are opposite
         from those specified in the SI and IEC 80000-13 standards to provide
         compatibility with old scripts.  For example, 4k means 4096.
@@ -748,11 +765,14 @@ Target file/device
         assigned equally distributed to job clones created by :option:`numjobs` as
         long as they are using generated filenames. If specific `filename(s)` are
         set fio will use the first listed directory, and thereby matching the
-       `filename` semantic which generates a file each clone if not specified, but
-       let all clones use the same if set.
+       `filename` semantic (which generates a file for each clone if not
+       specified, but lets all clones use the same file if set).
  
-       See the :option:`filename` option for information on how to escape "``:``" and
-       "``\``" characters within the directory path itself.
+       See the :option:`filename` option for information on how to escape "``:``"
+       characters within the directory path itself.
+
+       Note: To control the directory fio will use for internal state files
+       use :option:`--aux-path`.
  
  .. option:: filename=str
  
@@ -768,10 +788,10 @@ Target file/device
         by this option will be :option:`size` divided by number of files unless an
         explicit size is specified by :option:`filesize`.
  
-       Each colon and backslash in the wanted path must be escaped with a ``\``
+       Each colon in the wanted path must be escaped with a ``\``
         character.  For instance, if the path is :file:`/dev/dsk/foo@3,0:c` then you
         would use ``filename=/dev/dsk/foo@3,0\:c`` and if the path is
-       :file:`F:\\filename` then you would use ``filename=F\:\\filename``.
+       :file:`F:\\filename` then you would use ``filename=F\:\filename``.
  
         On Windows, disk devices are accessed as :file:`\\\\.\\PhysicalDrive0` for
         the first device, :file:`\\\\.\\PhysicalDrive1` for the second etc.
@@ -792,6 +812,8 @@ Target file/device
  
                 **$jobname**
                                 The name of the worker thread or process.
+               **$clientuid**
+                               IP of the fio process when using client/server mode.
                 **$jobnum**
                                 The incremental number of the worker thread or process.
                 **$filenum**
@@ -948,18 +970,112 @@ Target file/device
  
         Unlink job files after each iteration or loop.  Default: false.
  
-.. option:: zonesize=int
+.. option:: zonemode=str
  
-       Divide a file into zones of the specified size. See :option:`zoneskip`.
+       Accepted values are:
+
+               **none**
+                               The :option:`zonerange`, :option:`zonesize`,
+                               :option `zonecapacity` and option:`zoneskip`
+                               parameters are ignored.
+               **strided**
+                               I/O happens in a single zone until
+                               :option:`zonesize` bytes have been transferred.
+                               After that number of bytes has been
+                               transferred processing of the next zone
+                               starts. :option `zonecapacity` is ignored.
+               **zbd**
+                               Zoned block device mode. I/O happens
+                               sequentially in each zone, even if random I/O
+                               has been selected. Random I/O happens across
+                               all zones instead of being restricted to a
+                               single zone. The :option:`zoneskip` parameter
+                               is ignored. :option:`zonerange` and
+                               :option:`zonesize` must be identical.
+                               Trim is handled using a zone reset operation.
+                               Trim only considers non-empty sequential write
+                               required and sequential write preferred zones.
  
  .. option:: zonerange=int
  
-       Give size of an I/O zone.  See :option:`zoneskip`.
+       Size of a single zone. See also :option:`zonesize` and
+       :option:`zoneskip`.
+
+.. option:: zonesize=int
+
+       For :option:`zonemode` =strided, this is the number of bytes to
+       transfer before skipping :option:`zoneskip` bytes. If this parameter
+       is smaller than :option:`zonerange` then only a fraction of each zone
+       with :option:`zonerange` bytes will be accessed.  If this parameter is
+       larger than :option:`zonerange` then each zone will be accessed
+       multiple times before skipping to the next zone.
+
+       For :option:`zonemode` =zbd, this is the size of a single zone. The
+       :option:`zonerange` parameter is ignored in this mode.
+
+
+.. option:: zonecapacity=int
+
+       For :option:`zonemode` =zbd, this defines the capacity of a single zone,
+       which is the accessible area starting from the zone start address.
+       This parameter only applies when using :option:`zonemode` =zbd in
+       combination with regular block devices. If not specified it defaults to
+       the zone size. If the target device is a zoned block device, the zone
+       capacity is obtained from the device information and this option is
+       ignored.
  
  .. option:: zoneskip=int
  
-       Skip the specified number of bytes when :option:`zonesize` data has been
-       read. The two zone options can be used to only do I/O on zones of a file.
+       For :option:`zonemode` =strided, the number of bytes to skip after
+       :option:`zonesize` bytes of data have been transferred. This parameter
+       must be zero for :option:`zonemode` =zbd.
+
+.. option:: read_beyond_wp=bool
+
+       This parameter applies to :option:`zonemode` =zbd only.
+
+       Zoned block devices are block devices that consist of multiple zones.
+       Each zone has a type, e.g. conventional or sequential. A conventional
+       zone can be written at any offset that is a multiple of the block
+       size. Sequential zones must be written sequentially. The position at
+       which a write must occur is called the write pointer. A zoned block
+       device can be either drive managed, host managed or host aware. For
+       host managed devices the host must ensure that writes happen
+       sequentially. Fio recognizes host managed devices and serializes
+       writes to sequential zones for these devices.
+
+       If a read occurs in a sequential zone beyond the write pointer then
+       the zoned block device will complete the read without reading any data
+       from the storage medium. Since such reads lead to unrealistically high
+       bandwidth and IOPS numbers fio only reads beyond the write pointer if
+       explicitly told to do so. Default: false.
+
+.. option:: max_open_zones=int
+
+       When running a random write test across an entire drive many more
+       zones will be open than in a typical application workload. Hence this
+       command line option that allows to limit the number of open zones. The
+       number of open zones is defined as the number of zones to which write
+       commands are issued.
+
+.. option:: job_max_open_zones=int
+
+       Limit on the number of simultaneously opened zones per single
+       thread/process.
+
+.. option:: zone_reset_threshold=float
+
+       A number between zero and one that indicates the ratio of logical
+       blocks with data to the total number of logical blocks in the test
+       above which zones should be reset periodically.
+
+.. option:: zone_reset_frequency=float
+
+       A number between zero and one that indicates how often a zone reset
+       should be issued if the zone reset threshold has been exceeded. A zone
+       reset is submitted after each (1 / zone_reset_frequency) write
+       requests. This and the previous parameter can be used to simulate
+       garbage collection activity.
  
  
  I/O type
@@ -991,13 +1107,15 @@ I/O type
                 **write**
                                 Sequential writes.
                 **trim**
-                               Sequential trims (Linux block devices only).
+                               Sequential trims (Linux block devices and SCSI
+                               character devices only).
                 **randread**
                                 Random reads.
                 **randwrite**
                                 Random writes.
                 **randtrim**
-                               Random trims (Linux block devices only).
+                               Random trims (Linux block devices and SCSI
+                               character devices only).
                 **rw,readwrite**
                                 Sequential mixed reads and writes.
                 **randrw**
@@ -1039,11 +1157,31 @@ I/O type
         behaves in a similar fashion, except it sends the same offset 8 number of
         times before generating a new offset.
  
-.. option:: unified_rw_reporting=bool
+.. option:: unified_rw_reporting=str
  
         Fio normally reports statistics on a per data direction basis, meaning that
-       reads, writes, and trims are accounted and reported separately. If this
-       option is set fio sums the results and report them as "mixed" instead.
+       reads, writes, and trims are accounted and reported separately. This option
+       determines whether fio reports the results normally, summed together, or as
+       both options.
+       Accepted values are:
+
+               **none**
+                       Normal statistics reporting.
+
+               **mixed**
+                       Statistics are summed per data direction and reported together.
+
+               **both**
+                       Statistics are reported normally, followed by the mixed statistics.
+
+               **0**
+                       Backward-compatible alias for **none**.
+
+               **1**
+                       Backward-compatible alias for **mixed**.
+
+               **2**
+                       Alias for **both**.
  
  .. option:: randrepeat=bool
  
@@ -1080,6 +1218,10 @@ I/O type
                         Pre-allocate via :manpage:`fallocate(2)` with
                         FALLOC_FL_KEEP_SIZE set.
  
+               **truncate**
+                       Extend file to final size via :manpage:`ftruncate(2)`
+                       instead of allocating.
+
                 **0**
                         Backward-compatible alias for **none**.
  
@@ -1089,7 +1231,15 @@ I/O type
         May not be available on all supported platforms. **keep** is only available
         on Linux. If using ZFS on Solaris this cannot be set to **posix**
         because ZFS doesn't support pre-allocation. Default: **native** if any
-       pre-allocation methods are available, **none** if not.
+       pre-allocation methods except **truncate** are available, **none** if not.
+
+       Note that using **truncate** on Windows will interact surprisingly
+       with non-sequential write patterns. When writing to a file that has
+       been extended by setting the end-of-file information, Windows will
+       backfill the unwritten portion of the file up to that offset with
+       zeroes before issuing the new write. This means that a single small
+       write to the end of an extended file will stall until the entire
+       file has been filled with zeroes.
  
  .. option:: fadvise_hint=str
  
@@ -1138,13 +1288,14 @@ I/O type
  .. option:: offset=int
  
         Start I/O at the provided offset in the file, given as either a fixed size in
-       bytes or a percentage. If a percentage is given, the generated offset will be
+       bytes, zones or a percentage. If a percentage is given, the generated offset will be
         aligned to the minimum ``blocksize`` or to the value of ``offset_align`` if
         provided. Data before the given offset will not be touched. This
         effectively caps the file size at `real_size - offset`. Can be combined with
         :option:`size` to constrain the start and end range of the I/O workload.
         A percentage can be specified by a number between 1 and 100 followed by '%',
-       for example, ``offset=20%`` to specify 20%.
+       for example, ``offset=20%`` to specify 20%. In ZBD mode, value can be set as 
+        number of zones using 'z'.
  
  .. option:: offset_align=int
  
@@ -1159,7 +1310,10 @@ I/O type
         is incremented for each sub-job (i.e. when :option:`numjobs` option is
         specified). This option is useful if there are several jobs which are
         intended to operate on a file in parallel disjoint segments, with even
-       spacing between the starting points.
+       spacing between the starting points. Percentages can be used for this option.
+       If a percentage is given, the generated offset will be aligned to the minimum
+       ``blocksize`` or to the value of ``offset_align`` if provided. In ZBD mode, value can
+        also be set as number of zones using 'z'.
  
  .. option:: number_ios=int
  
@@ -1184,7 +1338,7 @@ I/O type
  .. option:: fdatasync=int
  
         Like :option:`fsync` but uses :manpage:`fdatasync(2)` to only sync data and
-       not metadata blocks.  In Windows, FreeBSD, and DragonFlyBSD there is no
+       not metadata blocks. In Windows, FreeBSD, DragonFlyBSD or OSX there is no
         :manpage:`fdatasync(2)` so this falls back to using :manpage:`fsync(2)`.
         Defaults to 0, which means fio does not periodically issue and wait for a
         data-only sync to complete.
@@ -1242,7 +1396,7 @@ I/O type
         limit reads or writes to a certain rate.  If that is the case, then the
         distribution may be skewed. Default: 50.
  
-.. option:: random_distribution=str:float[,str:float][,str:float]
+.. option:: random_distribution=str:float[:float][,str:float][,str:float]
  
         By default, fio will use a completely uniform random distribution when asked
         to perform random I/O. Sometimes it is useful to skew the distribution in
@@ -1277,6 +1431,14 @@ I/O type
         map. For the **normal** distribution, a normal (Gaussian) deviation is
         supplied as a value between 0 and 100.
  
+       The second, optional float is allowed for **pareto**, **zipf** and **normal** distributions.
+       It allows to set base of distribution in non-default place, giving more control
+       over most probable outcome. This value is in range [0-1] which maps linearly to
+       range of possible random values.
+       Defaults are: random for **pareto** and **zipf**, and 0.5 for **normal**.
+       If you wanted to use **zipf** with a `theta` of 1.2 centered on 1/4 of allowed value range,
+       you would use ``random_distibution=zipf:1.2:0.25``.
+
         For a **zoned** distribution, fio supports specifying percentages of I/O
         access that should fall within what range of the file or device. For
         example, given a criteria of:
@@ -1329,7 +1491,9 @@ I/O type
         and that some blocks may be read/written more than once. If this option is
         used with :option:`verify` and multiple blocksizes (via :option:`bsrange`),
         only intact blocks are verified, i.e., partially-overwritten blocks are
-       ignored.
+       ignored.  With an async I/O engine and an I/O depth > 1, it is possible for
+       the same block to be overwritten, which can cause verification errors.  Either
+       do not use norandommap in this case, or also use the lfsr random generator.
  
  .. option:: softrandommap=bool
  
@@ -1549,6 +1713,36 @@ Buffers and memory
         this option will also enable :option:`refill_buffers` to prevent every buffer
         being identical.
  
+.. option:: dedupe_mode=str
+
+       If ``dedupe_percentage=<int>`` is given, then this option controls how fio
+       generates the dedupe buffers.
+
+               **repeat**
+                       Generate dedupe buffers by repeating previous writes
+               **working_set**
+                       Generate dedupe buffers from working set
+
+       ``repeat`` is the default option for fio. Dedupe buffers are generated
+       by repeating previous unique write.
+
+       ``working_set`` is a more realistic workload.
+       With ``working_set``, ``dedupe_working_set_percentage=<int>`` should be provided.
+       Given that, fio will use the initial unique write buffers as its working set.
+       Upon deciding to dedupe, fio will randomly choose a buffer from the working set.
+       Note that by using ``working_set`` the dedupe percentage will converge
+       to the desired over time while ``repeat`` maintains the desired percentage
+       throughout the job.
+
+.. option:: dedupe_working_set_percentage=int
+
+       If ``dedupe_mode=<str>`` is set to ``working_set``, then this controls
+       the percentage of size of the file or device used as the buffers
+       fio will choose to generate the dedupe buffers from
+
+       Note that size needs to be explicitly provided and only 1 file per
+       job is supported
+
  .. option:: invalidate=bool
  
         Invalidate the buffer/page cache parts of the files to be used prior to
@@ -1556,10 +1750,28 @@ Buffers and memory
         This will be ignored if :option:`pre_read` is also specified for the
         same job.
  
-.. option:: sync=bool
+.. option:: sync=str
+
+       Whether, and what type, of synchronous I/O to use for writes.  The allowed
+       values are:
+
+               **none**
+                       Do not use synchronous IO, the default.
+
+               **0**
+                       Same as **none**.
+
+               **sync**
+                       Use synchronous file IO. For the majority of I/O engines,
+                       this means using O_SYNC.
+
+               **1**
+                       Same as **sync**.
+
+               **dsync**
+                       Use synchronous data IO. For the majority of I/O engines,
+                       this means using O_DSYNC.
  
-       Use synchronous I/O for buffered writes. For the majority of I/O engines,
-       this means using O_SYNC. Default: false.
  
  .. option:: iomem=str, mem=str
  
@@ -1649,7 +1861,8 @@ I/O size
         If this option is not specified, fio will use the full size of the given
         files or devices.  If the files do not exist, size must be given. It is also
         possible to give size as a percentage between 1 and 100. If ``size=20%`` is
-       given, fio will use 20% of the full size of the given files or devices.
+       given, fio will use 20% of the full size of the given files or devices. 
+       In ZBD mode, value can also be set as number of zones using 'z'.
         Can be combined with :option:`offset` to constrain the start and end range
         that I/O will be done within.
  
@@ -1683,7 +1896,8 @@ I/O size
  .. option:: fill_device=bool, fill_fs=bool
  
         Sets size to something really large and waits for ENOSPC (no space left on
-       device) as the terminating condition. Only makes sense with sequential
+       device) or EDQUOT (disk quota exceeded)
+       as the terminating condition. Only makes sense with sequential
         write. For a read workload, the mount point will be filled first then I/O
         started on the result. This option doesn't make sense if operating on a raw
         device node, since the size of that is already known by the file system.
@@ -1716,6 +1930,11 @@ I/O engine
                 **pvsync2**
                         Basic :manpage:`preadv2(2)` or :manpage:`pwritev2(2)` I/O.
  
+               **io_uring**
+                       Fast Linux native asynchronous I/O. Supports async IO
+                       for both direct and buffered IO.
+                       This engine defines engine specific options.
+
                 **libaio**
                         Linux native asynchronous I/O. Note that Linux may only support
                         queued behavior with non-buffered I/O (set ``direct=1`` or
@@ -1746,9 +1965,14 @@ I/O engine
                         ioctl, or if the target is an sg character device we use
                         :manpage:`read(2)` and :manpage:`write(2)` for asynchronous
                         I/O. Requires :option:`filename` option to specify either block or
-                       character devices.
+                       character devices. This engine supports trim operations.
                         The sg engine includes engine specific options.
  
+               **libzbc**
+                       Read, write, trim and ZBC/ZAC operations to a zoned
+                       block device using libzbc library. The target can be
+                       either an SG character device or a block device file.
+
                 **null**
                         Doesn't transfer any data, just pretends to.  This is mainly used to
                         exercise fio itself and for debugging/testing purposes.
@@ -1768,20 +1992,14 @@ I/O engine
  
                 **cpuio**
                         Doesn't transfer any data, but burns CPU cycles according to the
-                       :option:`cpuload` and :option:`cpuchunks` options. Setting
-                       :option:`cpuload`\=85 will cause that job to do nothing but burn 85%
+                       :option:`cpuload`, :option:`cpuchunks` and :option:`cpumode` options.
+                       Setting :option:`cpuload`\=85 will cause that job to do nothing but burn 85%
                         of the CPU. In case of SMP machines, use :option:`numjobs`\=<nr_of_cpu>
                         to get desired CPU usage, as the cpuload only loads a
                         single CPU at the desired rate. A job never finishes unless there is
                         at least one non-cpuio job.
-
-               **guasi**
-                       The GUASI I/O engine is the Generic Userspace Asynchronous Syscall
-                       Interface approach to async I/O. See
-
-                       http://www.xmailserver.org/guasi-lib.html
-
-                       for more info on GUASI.
+                       Setting :option:`cpumode`\=qsort replace the default noop instructions loop
+                       by a qsort algorithm to consume more energy.
  
                 **rdma**
                         The RDMA I/O engine supports both RDMA memory semantics
@@ -1821,6 +2039,15 @@ I/O engine
                         (RBD) via librbd without the need to use the kernel rbd driver. This
                         ioengine defines engine specific options.
  
+               **http**
+                       I/O engine supporting GET/PUT requests over HTTP(S) with libcurl to
+                       a WebDAV or S3 endpoint.  This ioengine defines engine specific options.
+
+                       This engine only supports direct IO of iodepth=1; you need to scale this
+                       via numjobs. blocksize defines the size of the objects to be created.
+
+                       TRIM is translated to object deletion.
+
                 **gfapi**
                         Using GlusterFS libgfapi sync interface to direct access to
                         GlusterFS volumes without having to go through FUSE.  This ioengine
@@ -1873,11 +2100,61 @@ I/O engine
                         set  `filesize` so that all the accounting still occurs, but no
                         actual I/O will be done other than creating the file.
  
+               **filestat**
+                       Simply do stat() and do no I/O to the file. You need to set 'filesize'
+                       and 'nrfiles', so that files will be created.
+                       This engine is to measure file lookup and meta data access.
+
+               **filedelete**
+                       Simply delete the files by unlink() and do no I/O to them. You need to set 'filesize'
+                       and 'nrfiles', so that the files will be created.
+                       This engine is to measure file delete.
+
                 **libpmem**
                         Read and write using mmap I/O to a file on a filesystem
                         mounted with DAX on a persistent memory device through the PMDK
                         libpmem library.
  
+               **ime_psync**
+                       Synchronous read and write using DDN's Infinite Memory Engine (IME).
+                       This engine is very basic and issues calls to IME whenever an IO is
+                       queued.
+
+               **ime_psyncv**
+                       Synchronous read and write using DDN's Infinite Memory Engine (IME).
+                       This engine uses iovecs and will try to stack as much IOs as possible
+                       (if the IOs are "contiguous" and the IO depth is not exceeded)
+                       before issuing a call to IME.
+
+               **ime_aio**
+                       Asynchronous read and write using DDN's Infinite Memory Engine (IME).
+                       This engine will try to stack as much IOs as possible by creating
+                       requests for IME. FIO will then decide when to commit these requests.
+               **libiscsi**
+                       Read and write iscsi lun with libiscsi.
+               **nbd**
+                       Read and write a Network Block Device (NBD).
+
+               **libcufile**
+                       I/O engine supporting libcufile synchronous access to nvidia-fs and a
+                       GPUDirect Storage-supported filesystem. This engine performs
+                       I/O without transferring buffers between user-space and the kernel,
+                       unless :option:`verify` is set or :option:`cuda_io` is `posix`.
+                       :option:`iomem` must not be `cudamalloc`. This ioengine defines
+                       engine specific options.
+               **dfs**
+                       I/O engine supporting asynchronous read and write operations to the
+                       DAOS File System (DFS) via libdfs.
+
+               **nfs**
+                       I/O engine supporting asynchronous read and write operations to
+                       NFS filesystems from userspace via libnfs. This is useful for
+                       achieving higher concurrency and thus throughput than is possible
+                       via kernel NFS.
+
+               **exec**
+                       Execute 3rd party tools. Could be used to perform monitoring during jobs runtime.
+
  I/O engine specific parameters
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
@@ -1886,6 +2163,86 @@ In addition, there are some parameters which are only valid when a specific
  with the caveat that when used on the command line, they must come after the
  :option:`ioengine` that defines them is selected.
  
+.. option:: cmdprio_percentage=int[,int] : [io_uring] [libaio]
+
+    Set the percentage of I/O that will be issued with the highest priority.
+    Default: 0. A single value applies to reads and writes. Comma-separated
+    values may be specified for reads and writes. This option cannot be used
+    with the :option:`prio` or :option:`prioclass` options. For this option
+    to be effective, NCQ priority must be supported and enabled, and `direct=1'
+    option must be used. fio must also be run as the root user.
+
+.. option:: cmdprio_class=int[,int] : [io_uring] [libaio]
+
+       Set the I/O priority class to use for I/Os that must be issued with
+       a priority when :option:`cmdprio_percentage` or
+       :option:`cmdprio_bssplit` is set. If not specified when
+       :option:`cmdprio_percentage` or :option:`cmdprio_bssplit` is set,
+       this defaults to the highest priority class. A single value applies
+       to reads and writes. Comma-separated values may be specified for
+       reads and writes. See :manpage:`ionice(1)`. See also the
+       :option:`prioclass` option.
+
+.. option:: cmdprio=int[,int] : [io_uring] [libaio]
+
+       Set the I/O priority value to use for I/Os that must be issued with
+       a priority when :option:`cmdprio_percentage` or
+       :option:`cmdprio_bssplit` is set. If not specified when
+       :option:`cmdprio_percentage` or :option:`cmdprio_bssplit` is set,
+       this defaults to 0.
+       Linux limits us to a positive value between 0 and 7, with 0 being the
+       highest. A single value applies to reads and writes. Comma-separated
+       values may be specified for reads and writes. See :manpage:`ionice(1)`.
+       Refer to an appropriate manpage for other operating systems since
+       meaning of priority may differ. See also the :option:`prio` option.
+
+.. option:: cmdprio_bssplit=str[,str] : [io_uring] [libaio]
+       To get a finer control over I/O priority, this option allows
+       specifying the percentage of IOs that must have a priority set
+       depending on the block size of the IO. This option is useful only
+       when used together with the :option:`bssplit` option, that is,
+       multiple different block sizes are used for reads and writes.
+       The format for this option is the same as the format of the
+       :option:`bssplit` option, with the exception that values for
+       trim IOs are ignored. This option is mutually exclusive with the
+       :option:`cmdprio_percentage` option.
+
+.. option:: fixedbufs : [io_uring]
+
+    If fio is asked to do direct IO, then Linux will map pages for each
+    IO call, and release them when IO is done. If this option is set, the
+    pages are pre-mapped before IO is started. This eliminates the need to
+    map and release for each IO. This is more efficient, and reduces the
+    IO latency as well.
+
+.. option:: hipri : [io_uring]
+
+    If this option is set, fio will attempt to use polled IO completions.
+    Normal IO completions generate interrupts to signal the completion of
+    IO, polled completions do not. Hence they are require active reaping
+    by the application. The benefits are more efficient IO for high IOPS
+    scenarios, and lower latencies for low queue depth IO.
+
+.. option:: registerfiles : [io_uring]
+
+       With this option, fio registers the set of files being used with the
+       kernel. This avoids the overhead of managing file counts in the kernel,
+       making the submission and completion part more lightweight. Required
+       for the below :option:`sqthread_poll` option.
+
+.. option:: sqthread_poll : [io_uring]
+
+       Normally fio will submit IO by issuing a system call to notify the
+       kernel of available items in the SQ ring. If this option is set, the
+       act of submitting IO will be done by a polling thread in the kernel.
+       This frees up cycles for fio, at the cost of using more CPU in the
+       system.
+
+.. option:: sqthread_poll_cpu : [io_uring]
+
+       When :option:`sqthread_poll` is set, this option provides a way to
+       define which CPU should be used for the polling thread.
+
  .. option:: userspace_reap : [libaio]
  
         Normally, with the libaio engine in use, fio will use the
@@ -1904,6 +2261,26 @@ with the caveat that when used on the command line, they must come after the
         When hipri is set this determines the probability of a pvsync2 I/O being high
         priority. The default is 100%.
  
+.. option:: nowait : [pvsync2] [libaio] [io_uring]
+
+       By default if a request cannot be executed immediately (e.g. resource starvation,
+       waiting on locks) it is queued and the initiating process will be blocked until
+       the required resource becomes free.
+
+       This option sets the RWF_NOWAIT flag (supported from the 4.14 Linux kernel) and
+       the call will return instantly with EAGAIN or a partial result rather than waiting.
+
+       It is useful to also use ignore_error=EAGAIN when using this option.
+
+       Note: glibc 2.27, 2.28 have a bug in syscall wrappers preadv2, pwritev2.
+       They return EOPNOTSUP instead of EAGAIN.
+
+       For cached I/O, using this option usually means a request operates only with
+       cached data. Currently the RWF_NOWAIT flag does not supported for cached write.
+
+       For direct I/O, requests will only succeed if cache invalidation isn't required,
+       file blocks are fully allocated and the disk request could be issued immediately.
+
  .. option:: cpuload=int : [cpuio]
  
         Attempt to use the specified percentage of CPU cycles. This is a mandatory
@@ -1934,7 +2311,7 @@ with the caveat that when used on the command line, they must come after the
                 this will be the starting port number since fio will use a range of
                 ports.
  
-   [rdma]
+   [rdma], [librpma_*]
  
                 The port to use for RDMA-CM communication. This should be the same value
                 on the client and the server side.
@@ -1945,6 +2322,20 @@ with the caveat that when used on the command line, they must come after the
         is a TCP listener or UDP reader, the hostname is not used and must be omitted
         unless it is a valid UDP multicast address.
  
+.. option:: serverip=str : [librpma_*]
+
+       The IP address to be used for RDMA-CM based I/O.
+
+.. option:: direct_write_to_pmem=bool : [librpma_*]
+
+       Set to 1 only when Direct Write to PMem from the remote host is possible.
+       Otherwise, set to 0.
+
+.. option:: busy_wait_polling=bool : [librpma_*_server]
+
+       Set to 0 to wait for completion instead of busy-wait polling completion.
+       Default: 1.
+
  .. option:: interface=str : [netsplice] [net]
  
         The IP address of the network interface used to send or receive UDP
@@ -2041,6 +2432,12 @@ with the caveat that when used on the command line, they must come after the
          Poll store instead of waiting for completion. Usually this provides better
          throughput at cost of higher(up to 100%) CPU utilization.
  
+.. option:: touch_objects=bool : [rados]
+
+        During initialization, touch (create if do not exist) all objects (files).
+        Touching all objects affects ceph caches and likely impacts test results.
+        Enabled by default.
+
  .. option:: skip_bad=bool : [mtd]
  
         Skip operations against known bad blocks.
@@ -2069,6 +2466,11 @@ with the caveat that when used on the command line, they must come after the
         multiple paths exist between the client and the server or in certain loopback
         configurations.
  
+.. option:: stat_type=str : [filestat]
+
+       Specify stat system call type to measure lookup/getattr performance.
+       Default is **stat** for :manpage:`stat(2)`.
+
  .. option:: readfua=bool : [sg]
  
         With readfua option set to 1, read operations include
@@ -2080,6 +2482,7 @@ with the caveat that when used on the command line, they must come after the
         the force unit access (fua) flag. Default is 0.
  
  .. option:: sg_write_mode=str : [sg]
+
         Specify the type of write commands to issue. This option can take three values:
  
         **write**
@@ -2100,6 +2503,142 @@ with the caveat that when used on the command line, they must come after the
                 transferred to the device. The writefua option is ignored with this
                 selection.
  
+.. option:: hipri : [sg]
+
+       If this option is set, fio will attempt to use polled IO completions.
+       This will have a similar effect as (io_uring)hipri. Only SCSI READ and
+       WRITE commands will have the SGV4_FLAG_HIPRI set (not UNMAP (trim) nor
+       VERIFY). Older versions of the Linux sg driver that do not support
+       hipri will simply ignore this flag and do normal IO. The Linux SCSI
+       Low Level Driver (LLD) that "owns" the device also needs to support
+       hipri (also known as iopoll and mq_poll). The MegaRAID driver is an
+       example of a SCSI LLD. Default: clear (0) which does normal
+       (interrupted based) IO.
+
+.. option:: http_host=str : [http]
+
+       Hostname to connect to. For S3, this could be the bucket hostname.
+       Default is **localhost**
+
+.. option:: http_user=str : [http]
+
+       Username for HTTP authentication.
+
+.. option:: http_pass=str : [http]
+
+       Password for HTTP authentication.
+
+.. option:: https=str : [http]
+
+       Enable HTTPS instead of http. *on* enables HTTPS; *insecure*
+       will enable HTTPS, but disable SSL peer verification (use with
+       caution!). Default is **off**
+
+.. option:: http_mode=str : [http]
+
+       Which HTTP access mode to use: *webdav*, *swift*, or *s3*.
+       Default is **webdav**
+
+.. option:: http_s3_region=str : [http]
+
+       The S3 region/zone string.
+       Default is **us-east-1**
+
+.. option:: http_s3_key=str : [http]
+
+       The S3 secret key.
+
+.. option:: http_s3_keyid=str : [http]
+
+       The S3 key/access id.
+
+.. option:: http_swift_auth_token=str : [http]
+
+       The Swift auth token. See the example configuration file on how
+       to retrieve this.
+
+.. option:: http_verbose=int : [http]
+
+       Enable verbose requests from libcurl. Useful for debugging. 1
+       turns on verbose logging from libcurl, 2 additionally enables
+       HTTP IO tracing. Default is **0**
+
+.. option:: uri=str : [nbd]
+
+       Specify the NBD URI of the server to test.  The string
+       is a standard NBD URI
+       (see https://github.com/NetworkBlockDevice/nbd/tree/master/doc).
+       Example URIs: nbd://localhost:10809
+       nbd+unix:///?socket=/tmp/socket
+       nbds://tlshost/exportname
+
+.. option:: gpu_dev_ids=str : [libcufile]
+
+       Specify the GPU IDs to use with CUDA. This is a colon-separated list of
+       int. GPUs are assigned to workers roundrobin. Default is 0.
+
+.. option:: cuda_io=str : [libcufile]
+
+       Specify the type of I/O to use with CUDA. Default is **cufile**.
+
+       **cufile**
+               Use libcufile and nvidia-fs. This option performs I/O directly
+               between a GPUDirect Storage filesystem and GPU buffers,
+               avoiding use of a bounce buffer. If :option:`verify` is set,
+               cudaMemcpy is used to copy verificaton data between RAM and GPU.
+               Verification data is copied from RAM to GPU before a write
+               and from GPU to RAM after a read. :option:`direct` must be 1.
+       **posix**
+               Use POSIX to perform I/O with a RAM buffer, and use cudaMemcpy
+               to transfer data between RAM and the GPUs. Data is copied from
+               GPU to RAM before a write and copied from RAM to GPU after a
+               read. :option:`verify` does not affect use of cudaMemcpy.
+
+.. option:: pool=str : [dfs]
+
+       Specify the label or UUID of the DAOS pool to connect to.
+
+.. option:: cont=str : [dfs]
+
+       Specify the label or UUID of the DAOS container to open.
+
+.. option:: chunk_size=int : [dfs]
+
+       Specificy a different chunk size (in bytes) for the dfs file.
+       Use DAOS container's chunk size by default.
+
+.. option:: object_class=str : [dfs]
+
+       Specificy a different object class for the dfs file.
+       Use DAOS container's object class by default.
+
+.. option:: nfs_url=str : [nfs]
+
+       URL in libnfs format, eg nfs://<server|ipv4|ipv6>/path[?arg=val[&arg=val]*]
+       Refer to the libnfs README for more details.
+
+.. option:: program=str : [exec]
+
+       Specify the program to execute.
+
+.. option:: arguments=str : [exec]
+
+       Specify arguments to pass to program.
+       Some special variables can be expanded to pass fio's job details to the program.
+
+       **%r**
+               Replaced by the duration of the job in seconds.
+       **%n**
+               Replaced by the name of the job.
+
+.. option:: grace_time=int : [exec]
+
+       Specify the time between the SIGTERM and SIGKILL signals. Default is 1 second.
+
+.. option:: std_redirect=bool : [exec]
+
+       If set, stdout and stderr streams are redirected to files named from the job name. Default is true.
+
  I/O depth
  ~~~~~~~~~
  
@@ -2176,8 +2715,13 @@ I/O depth
         ``serialize_overlap`` tells fio to avoid provoking this behavior by explicitly
         serializing in-flight I/Os that have a non-zero overlap. Note that setting
         this option can reduce both performance and the :option:`iodepth` achieved.
-       Additionally this option does not work when :option:`io_submit_mode` is set to
-       offload. Default: false.
+
+       This option only applies to I/Os issued for a single job except when it is
+       enabled along with :option:`io_submit_mode`\=offload. In offload mode, fio
+       will check for overlap among all I/Os submitted by offload jobs with :option:`serialize_overlap`
+       enabled.
+
+       Default: false.
  
  .. option:: io_submit_mode=str
  
@@ -2189,7 +2733,8 @@ I/O depth
         can increase latencies. The benefit is that fio can manage submission rates
         independently of the device completion rates. This avoids skewed latency
         reporting if I/O gets backed up on the device side (the coordinated omission
-       problem).
+       problem). Note that this option cannot reliably be used with async IO
+       engines.
  
  
  I/O rate
@@ -2200,7 +2745,7 @@ I/O rate
         Stall the job for the specified period of time after an I/O has completed before issuing the
         next. May be used to simulate processing being done by an application.
         When the unit is omitted, the value is interpreted in microseconds.  See
-       :option:`thinktime_blocks` and :option:`thinktime_spin`.
+       :option:`thinktime_blocks`, :option:`thinktime_iotime` and :option:`thinktime_spin`.
  
  .. option:: thinktime_spin=time
  
@@ -2218,6 +2763,25 @@ I/O rate
         before we have to complete it and do our :option:`thinktime`. In other words, this
         setting effectively caps the queue depth if the latter is larger.
  
+.. option:: thinktime_blocks_type=str
+
+       Only valid if :option:`thinktime` is set - control how :option:`thinktime_blocks`
+       triggers. The default is `complete`, which triggers thinktime when fio completes
+       :option:`thinktime_blocks` blocks. If this is set to `issue`, then the trigger happens
+       at the issue side.
+
+.. option:: thinktime_iotime=time
+
+       Only valid if :option:`thinktime` is set - control :option:`thinktime`
+       interval by time. The :option:`thinktime` stall is repeated after IOs
+       are executed for :option:`thinktime_iotime`. For example,
+       ``--thinktime_iotime=9s --thinktime=1s`` repeat 10-second cycle with IOs
+       for 9 seconds and stall for 1 second. When the unit is omitted,
+       :option:`thinktime_iotime` is interpreted as a number of seconds. If
+       this option is used together with :option:`thinktime_blocks`, the
+       :option:`thinktime` stall is repeated after :option:`thinktime_iotime`
+       or after :option:`thinktime_blocks` IOs, whichever happens first.
+
  .. option:: rate=int[,int][,int]
  
         Cap the bandwidth used by this job. The number is in bytes/sec, the normal
@@ -2291,11 +2855,19 @@ I/O latency
         defaults to 100.0, meaning that all I/Os must be equal or below to the value
         set by :option:`latency_target`.
  
-.. option:: max_latency=time
+.. option:: latency_run=bool
+
+       Used with :option:`latency_target`. If false (default), fio will find
+       the highest queue depth that meets :option:`latency_target` and exit. If
+       true, fio will continue running and try to meet :option:`latency_target`
+       by adjusting queue depth.
+
+.. option:: max_latency=time[,time][,time]
  
         If set, fio will exit the job with an ETIMEDOUT error if it exceeds this
         maximum latency. When the unit is omitted, the value is interpreted in
-       microseconds.
+       microseconds. Comma-separated values may be specified for reads, writes,
+       and trims as described in :option:`blocksize`.
  
  .. option:: rate_cycle=int
  
@@ -2321,6 +2893,46 @@ I/O replay
         :manpage:`blktrace(8)` for how to capture such logging data. For blktrace
         replay, the file needs to be turned into a blkparse binary data file first
         (``blkparse <device> -o /dev/null -d file_for_fio.bin``).
+       You can specify a number of files by separating the names with a ':'
+       character. See the :option:`filename` option for information on how to
+       escape ':' characters within the file names. These files will
+       be sequentially assigned to job clones created by :option:`numjobs`.
+       '-' is a reserved name, meaning read from stdin, notably if
+       :option:`filename` is set to '-' which means stdin as well, then
+       this flag can't be set to '-'.
+
+.. option:: read_iolog_chunked=bool
+
+       Determines how iolog is read. If false(default) entire :option:`read_iolog`
+       will be read at once. If selected true, input from iolog will be read
+       gradually. Useful when iolog is very large, or it is generated.
+
+.. option:: merge_blktrace_file=str
+
+       When specified, rather than replaying the logs passed to :option:`read_iolog`,
+       the logs go through a merge phase which aggregates them into a single
+       blktrace. The resulting file is then passed on as the :option:`read_iolog`
+       parameter. The intention here is to make the order of events consistent.
+       This limits the influence of the scheduler compared to replaying multiple
+       blktraces via concurrent jobs.
+
+.. option:: merge_blktrace_scalars=float_list
+
+       This is a percentage based option that is index paired with the list of
+       files passed to :option:`read_iolog`. When merging is performed, scale
+       the time of each event by the corresponding amount. For example,
+       ``--merge_blktrace_scalars="50:100"`` runs the first trace in halftime
+       and the second trace in realtime. This knob is separately tunable from
+       :option:`replay_time_scale` which scales the trace during runtime and
+       does not change the output of the merge unlike this option.
+
+.. option:: merge_blktrace_iters=float_list
+
+       This is a whole number option that is index paired with the list of files
+       passed to :option:`read_iolog`. When merging is performed, run each trace
+       for the specified number of iterations. For example,
+       ``--merge_blktrace_iters="2:1"`` runs the first trace for two iterations
+       and the second trace for one iteration.
  
  .. option:: replay_no_stall=bool
  
@@ -2359,12 +2971,13 @@ I/O replay
  
  .. option:: replay_align=int
  
-       Force alignment of I/O offsets and lengths in a trace to this power of 2
-       value.
+       Force alignment of the byte offsets in a trace to this value. The value
+       must be a power of 2.
  
  .. option:: replay_scale=int
  
-       Scale sector offsets down by this factor when replaying traces.
+       Scale byte offsets down by this factor when replaying traces. Should most
+       likely use :option:`replay_align` as well.
  
  .. option:: replay_skip=str
  
@@ -2407,11 +3020,15 @@ Threads, processes and job synchronization
         Set the I/O priority value of this job. Linux limits us to a positive value
         between 0 and 7, with 0 being the highest.  See man
         :manpage:`ionice(1)`. Refer to an appropriate manpage for other operating
-       systems since meaning of priority may differ.
+       systems since meaning of priority may differ. For per-command priority
+       setting, see I/O engine specific :option:`cmdprio_percentage` and
+       :option:`cmdprio` options.
  
  .. option:: prioclass=int
  
-       Set the I/O priority class. See man :manpage:`ionice(1)`.
+       Set the I/O priority class. See man :manpage:`ionice(1)`. For per-command
+       priority setting, see I/O engine specific :option:`cmdprio_percentage`
+       and :option:`cmdprio_class` options.
  
  .. option:: cpus_allowed=str
  
@@ -2446,7 +3063,7 @@ Threads, processes and job synchronization
                         Each job will get a unique CPU from the CPU set.
  
         **shared** is the default behavior, if the option isn't specified. If
-       **split** is specified, then fio will will assign one cpu per job. If not
+       **split** is specified, then fio will assign one cpu per job. If not
         enough CPUs are given for the jobs listed, then fio will roundrobin the CPUs
         in the set.
  
@@ -2516,15 +3133,10 @@ Threads, processes and job synchronization
         ``flow=8`` and another job has ``flow=-1``, then there will be a roughly 1:8
         ratio in how much one runs vs the other.
  
-.. option:: flow_watermark=int
-
-       The maximum value that the absolute value of the flow counter is allowed to
-       reach before the job must wait for a lower value of the counter.
-
  .. option:: flow_sleep=int
  
-       The period of time, in microseconds, to wait after the flow watermark has
-       been exceeded before retrying operations.
+       The period of time, in microseconds, to wait after the flow counter
+       has exceeded its proportion before retrying operations.
  
  .. option:: stonewall, wait_for_previous
  
@@ -2535,9 +3147,21 @@ Threads, processes and job synchronization
  
  .. option:: exitall
  
-       By default, fio will continue running all other jobs when one job finishes
-       but sometimes this is not the desired action.  Setting ``exitall`` will
-       instead make fio terminate all other jobs when one job finishes.
+       By default, fio will continue running all other jobs when one job finishes.
+       Sometimes this is not the desired action.  Setting ``exitall`` will instead
+       make fio terminate all jobs in the same group, as soon as one job of that
+       group finishes.
+
+.. option:: exit_what
+
+       By default, fio will continue running all other jobs when one job finishes.
+       Sometimes this is not the desired action. Setting ``exit_all`` will
+       instead make fio terminate all jobs in the same group. The option
+        ``exit_what`` allows to control which jobs get terminated when ``exitall`` is
+        enabled. The default is ``group`` and does not change the behaviour of
+        ``exitall``. The setting ``all`` terminates all jobs. The setting ``stonewall``
+        terminates all currently running jobs across all groups and continues execution
+        with the next stonewalled group.
  
  .. option:: exec_prerun=str
  
@@ -2663,6 +3287,11 @@ Verification
         previously written file. If the data direction includes any form of write,
         the verify will be of the newly written data.
  
+       To avoid false verification errors, do not use the norandommap option when
+       verifying data with async I/O engines and I/O depths > 1.  Or use the
+       norandommap and the lfsr random generator together to avoid writing to the
+       same offset with muliple outstanding I/Os.
+
  .. option:: verify_offset=int
  
         Swap the verification header with data somewhere else in the block before
@@ -2795,6 +3424,10 @@ Steady state
         data from the rolling collection window. Threshold limits can be expressed
         as a fixed value or as a percentage of the mean in the collection window.
  
+       When using this feature, most jobs should include the :option:`time_based`
+       and :option:`runtime` options or the :option:`loops` option so that fio does not
+       stop running after it has covered the full size of the specified file(s) or device(s).
+
                 **iops**
                         Collect IOPS data. Stop the job if all individual IOPS measurements
                         are within the specified limit of the mean IOPS (e.g., ``iops:2``
@@ -2899,9 +3532,11 @@ Measurements and reporting
  .. option:: write_iops_log=str
  
         Same as :option:`write_bw_log`, but writes an IOPS file (e.g.
-       :file:`name_iops.x.log`) instead. See :option:`write_bw_log` for
-       details about the filename format and `Log File Formats`_ for how data
-       is structured within the file.
+       :file:`name_iops.x.log`) instead. Because fio defaults to individual
+       I/O logging, the value entry in the IOPS log will be 1 unless windowed
+       logging (see :option:`log_avg_msec`) has been enabled. See
+       :option:`write_bw_log` for details about the filename format and `Log
+       File Formats`_ for how data is structured within the file.
  
  .. option:: log_avg_msec=int
  
@@ -3022,27 +3657,28 @@ Measurements and reporting
         Disable measurements of throughput/bandwidth numbers. See
         :option:`disable_lat`.
  
+.. option:: slat_percentiles=bool
+
+       Report submission latency percentiles. Submission latency is not recorded
+       for synchronous ioengines.
+
  .. option:: clat_percentiles=bool
  
-       Enable the reporting of percentiles of completion latencies.  This
-       option is mutually exclusive with :option:`lat_percentiles`.
+       Report completion latency percentiles.
  
  .. option:: lat_percentiles=bool
  
-       Enable the reporting of percentiles of I/O latencies. This is similar
-       to :option:`clat_percentiles`, except that this includes the
-       submission latency. This option is mutually exclusive with
-       :option:`clat_percentiles`.
+       Report total latency percentiles. Total latency is the sum of submission
+       latency and completion latency.
  
  .. option:: percentile_list=float_list
  
-       Overwrite the default list of percentiles for completion latencies and
-       the block error histogram.  Each number is a floating number in the
-       range (0,100], and the maximum length of the list is 20. Use ``:`` to
-       separate the numbers, and list the numbers in ascending order. For
+       Overwrite the default list of percentiles for latencies and the block error
+       histogram.  Each number is a floating point number in the range (0,100], and
+       the maximum length of the list is 20. Use ``:`` to separate the numbers. For
         example, ``--percentile_list=99.5:99.9`` will cause fio to report the
-       values of completion latency below which 99.5% and 99.9% of the observed
-       latencies fell, respectively.
+       latency durations below which 99.5% and 99.9% of the observed latencies fell,
+       respectively.
  
  .. option:: significant_figures=int
  
@@ -3474,7 +4110,8 @@ is one long line of values, such as::
      2;card0;0;0;7139336;121836;60004;1;10109;27.932460;116.933948;220;126861;3495.446807;1085.368601;226;126864;3523.635629;1089.012448;24063;99944;50.275485%;59818.274627;5540.657370;7155060;122104;60004;1;8338;29.086342;117.839068;388;128077;5032.488518;1234.785715;391;128085;5061.839412;1236.909129;23436;100928;50.287926%;59964.832030;5644.844189;14.595833%;19.394167%;123706;0;7313;0.1%;0.1%;0.1%;0.1%;0.1%;0.1%;100.0%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.01%;0.02%;0.05%;0.16%;6.04%;40.40%;52.68%;0.64%;0.01%;0.00%;0.01%;0.00%;0.00%;0.00%;0.00%;0.00%
      A description of this job goes here.
  
-The job description (if provided) follows on a second line.
+The job description (if provided) follows on a second line for terse v2.
+It appears on the same line for other terse versions.
  
  To enable terse output, use the :option:`--minimal` or
  :option:`--output-format`\=terse command line options. The
@@ -3557,7 +4194,12 @@ will be a disk utilization section.
  Below is a single line containing short names for each of the fields in the
  minimal output v3, separated by semicolons::
  
-        terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
+        terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth_kb;read_iops;read_runtime_ms;read_slat_min_us;read_slat_max_us;read_slat_mean_us;read_slat_dev_us;read_clat_min_us;read_clat_max_us;read_clat_mean_us;read_clat_dev_us;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min_us;read_lat_max_us;read_lat_mean_us;read_lat_dev_us;read_bw_min_kb;read_bw_max_kb;read_bw_agg_pct;read_bw_mean_kb;read_bw_dev_kb;write_kb;write_bandwidth_kb;write_iops;write_runtime_ms;write_slat_min_us;write_slat_max_us;write_slat_mean_us;write_slat_dev_us;write_clat_min_us;write_clat_max_us;write_clat_mean_us;write_clat_dev_us;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min_us;write_lat_max_us;write_lat_mean_us;write_lat_dev_us;write_bw_min_kb;write_bw_max_kb;write_bw_agg_pct;write_bw_mean_kb;write_bw_dev_kb;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
+
+In client/server mode terse output differs from what appears when jobs are run
+locally. Disk utilization data is omitted from the standard terse output and
+for v3 and later appears on its own separate line at the end of each terse
+reporting cycle.
  
  
  JSON output
@@ -3663,6 +4305,46 @@ given in bytes. The `action` can be one of these:
  **trim**
            Trim the given file from the given `offset` for `length` bytes.
  
+
+I/O Replay - Merging Traces
+---------------------------
+
+Colocation is a common practice used to get the most out of a machine.
+Knowing which workloads play nicely with each other and which ones don't is
+a much harder task. While fio can replay workloads concurrently via multiple
+jobs, it leaves some variability up to the scheduler making results harder to
+reproduce. Merging is a way to make the order of events consistent.
+
+Merging is integrated into I/O replay and done when a
+:option:`merge_blktrace_file` is specified. The list of files passed to
+:option:`read_iolog` go through the merge process and output a single file
+stored to the specified file. The output file is passed on as if it were the
+only file passed to :option:`read_iolog`. An example would look like::
+
+       $ fio --read_iolog="<file1>:<file2>" --merge_blktrace_file="<output_file>"
+
+Creating only the merged file can be done by passing the command line argument
+:option:`--merge-blktrace-only`.
+
+Scaling traces can be done to see the relative impact of any particular trace
+being slowed down or sped up. :option:`merge_blktrace_scalars` takes in a colon
+separated list of percentage scalars. It is index paired with the files passed
+to :option:`read_iolog`.
+
+With scaling, it may be desirable to match the running time of all traces.
+This can be done with :option:`merge_blktrace_iters`. It is index paired with
+:option:`read_iolog` just like :option:`merge_blktrace_scalars`.
+
+In an example, given two traces, A and B, each 60s long. If we want to see
+the impact of trace A issuing IOs twice as fast and repeat trace A over the
+runtime of trace B, the following can be done::
+
+       $ fio --read_iolog="<trace_a>:"<trace_b>" --merge_blktrace_file"<output_file>" --merge_blktrace_scalars="50:100" --merge_blktrace_iters="2:1"
+
+This runs trace A at 2x the speed twice for approximately the same runtime as
+a single run of trace B.
+
+
  CPU idleness profiling
  ----------------------
  
@@ -3765,7 +4447,7 @@ Fio supports a variety of log file formats, for logging latencies, bandwidth,
  and IOPS. The logs share a common format, which looks like this:
  
      *time* (`msec`), *value*, *data direction*, *block size* (`bytes`),
-    *offset* (`bytes`)
+    *offset* (`bytes`), *command priority*
  
  *Time* for the log entry is always in milliseconds. The *value* logged depends
  on the type of log, it will be one of the following:
@@ -3786,17 +4468,20 @@ on the type of log, it will be one of the following:
         **2**
                 I/O is a TRIM
  
-The entry's *block size* is always in bytes. The *offset* is the offset, in bytes,
-from the start of the file, for that particular I/O. The logging of the offset can be
+The entry's *block size* is always in bytes. The *offset* is the position in bytes
+from the start of the file for that particular I/O. The logging of the offset can be
  toggled with :option:`log_offset`.
  
-Fio defaults to logging every individual I/O.  When IOPS are logged for individual
-I/Os the *value* entry will always be 1. If windowed logging is enabled through
-:option:`log_avg_msec`, fio logs the average values over the specified period of time.
-If windowed logging is enabled and :option:`log_max_value` is set, then fio logs
-maximum values in that window instead of averages. Since *data direction*, *block
-size* and *offset* are per-I/O values, if windowed logging is enabled they
-aren't applicable and will be 0.
+*Command priority* is 0 for normal priority and 1 for high priority. This is controlled
+by the ioengine specific :option:`cmdprio_percentage`.
+
+Fio defaults to logging every individual I/O but when windowed logging is set
+through :option:`log_avg_msec`, either the average (by default) or the maximum
+(:option:`log_max_value` is set) *value* seen over the specified period of time
+is recorded. Each *data direction* seen within the window period will aggregate
+its values in a separate row. Further, when using windowed logging the *block
+size* and *offset* entries will always contain 0.
+
  
  Client/Server
  -------------
@@ -3885,3 +4570,6 @@ containing two hostnames ``h1`` and ``h2`` with IP addresses 192.168.10.120 and
  
         /mnt/nfs/fio/192.168.10.120.fileio.tmp
         /mnt/nfs/fio/192.168.10.121.fileio.tmp
+
+Terse output in client/server mode will differ slightly from what is produced
+when fio is run in stand-alone mode. See the terse output section for details.