Make log_unix_epoch an official alias of log_alternate_epoch
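
As a rough illustration of the aliasing described in the title, a job file
along these lines should behave the same whether it sets log_unix_epoch or
log_alternate_epoch. This is only a sketch: the job name, log prefix, file
path and sizes are made up for the example, and only the option names come
from the documentation changed below.

```ini
; Hypothetical job file; names, paths and sizes are illustrative only.
[global]
ioengine=psync
rw=randread
bs=4k
size=64m
runtime=10
time_based
; Produce latency logs so that the timestamp behavior is visible.
write_lat_log=epoch_demo
; With this change, log_unix_epoch=1 is documented as an alias for this line.
log_alternate_epoch=1
; 0 is the documented default, CLOCK_REALTIME.
log_alternate_epoch_clock_id=0

[job1]
filename=/tmp/fio-epoch-demo
```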

diff --git a/HOWTO.rst b/HOWTO.rst

index aba6c9b3b9d78ede27e1bd3b9cf89d82b1441775..7f26978a74efad628ed00264a56922bb755e621f 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -686,10 +686,12 @@ Time related parameters
  
  .. option:: runtime=time
  
-       Tell fio to terminate processing after the specified period of time.  It
-       can be quite hard to determine for how long a specified job will run, so
-       this parameter is handy to cap the total runtime to a given time.  When
-       the unit is omitted, the value is interpreted in seconds.
+       Limit runtime. The test will run until it completes the configured I/O
+       workload or until it has run for this specified amount of time, whichever
+       occurs first. It can be quite hard to determine for how long a specified
+       job will run, so this parameter is handy to cap the total runtime to a
+       given time.  When the unit is omitted, the value is interpreted in
+       seconds.
  
  .. option:: time_based
  
@@ -753,6 +755,10 @@ Time related parameters
         calls will be excluded from other uses. Fio will manually clear it from the
         CPU mask of other jobs.
  
+.. option:: job_start_clock_id=int
+   The clock_id passed to the call to `clock_gettime` used to record job_start
+   in the `json` output format. Default is 0, or CLOCK_REALTIME.
+
  
  Target file/device
  ~~~~~~~~~~~~~~~~~~
@@ -841,7 +847,9 @@ Target file/device
  
  .. option:: opendir=str
  
-       Recursively open any files below directory `str`.
+        Recursively open any files below directory `str`. This accepts only a
+        single directory and unlike related options, colons appearing in the
+        path must not be escaped.
  
  .. option:: lockfile=str
  
@@ -1052,22 +1060,34 @@ Target file/device
  
  .. option:: max_open_zones=int
  
-       A zone of a zoned block device is in the open state when it is partially
-       written (i.e. not all sectors of the zone have been written). Zoned
-       block devices may have a limit on the total number of zones that can
-       be simultaneously in the open state, that is, the number of zones that
-       can be written to simultaneously. The :option:`max_open_zones` parameter
-       limits the number of zones to which write commands are issued by all fio
-       jobs, that is, limits the number of zones that will be in the open
-       state. This parameter is relevant only if the :option:`zonemode` =zbd is
-       used. The default value is always equal to maximum number of open zones
-       of the target zoned block device and a value higher than this limit
-       cannot be specified by users unless the option
-       :option:`ignore_zone_limits` is specified. When
-       :option:`ignore_zone_limits` is specified or the target device has no
-       limit on the number of zones that can be in an open state,
-       :option:`max_open_zones` can specify 0 to disable any limit on the
-       number of zones that can be simultaneously written to by all jobs.
+       When a zone of a zoned block device is partially written (i.e. not all
+       sectors of the zone have been written), the zone is in one of three
+       conditions: 'implicit open', 'explicit open' or 'closed'. Zoned block
+       devices may have a limit called 'max_open_zones' (same name as the
+       parameter) on the total number of zones that can simultaneously be in
+       the 'implicit open' or 'explicit open' conditions. Zoned block devices
+       may have another limit called 'max_active_zones', on the total number of
+       zones that can simultaneously be in the three conditions. The
+       :option:`max_open_zones` parameter limits the number of zones to which
+       write commands are issued by all fio jobs, that is, limits the number of
+       zones that will be in the conditions. When the device has the
+       max_open_zones limit and does not have the max_active_zones limit, the
+       :option:`max_open_zones` parameter limits the number of zones in the two
+       open conditions up to the limit. In this case, fio includes zones in the
+       two open conditions to the write target zones at fio start. When the
+       device has both the max_open_zones and the max_active_zones limits, the
+       :option:`max_open_zones` parameter limits the number of zones in the
+       three conditions up to the limit. In this case, fio includes zones in
+       the three conditions to the write target zones at fio start.
+
+       This parameter is relevant only if the :option:`zonemode` =zbd is used.
+       The default value is always equal to the max_open_zones limit of the
+       target zoned block device and a value higher than this limit cannot be
+       specified by users unless the option :option:`ignore_zone_limits` is
+       specified. When :option:`ignore_zone_limits` is specified or the target
+       device does not have the max_open_zones limit, :option:`max_open_zones`
+       can specify 0 to disable any limit on the number of zones that can be
+       simultaneously written to by all jobs.
  
  .. option:: job_max_open_zones=int
  
@@ -1085,9 +1105,12 @@ Target file/device
  
  .. option:: zone_reset_threshold=float
  
-       A number between zero and one that indicates the ratio of logical
-       blocks with data to the total number of logical blocks in the test
-       above which zones should be reset periodically.
+       A number between zero and one that indicates the ratio of written bytes
+       in the zones with write pointers in the IO range to the size of the IO
+       range. When the current ratio is above this ratio, zones are reset
+       periodically as :option:`zone_reset_frequency` specifies. If there are
+       multiple jobs when using this option, the IO range for all write jobs
+       has to be the same.
  
  .. option:: zone_reset_frequency=float
  
@@ -1107,12 +1130,6 @@ I/O type
         OpenBSD and ZFS on Solaris don't support direct I/O.  On Windows the synchronous
         ioengines don't support direct I/O.  Default: false.
  
-.. option:: atomic=bool
-
-       If value is true, attempt to use atomic direct I/O. Atomic writes are
-       guaranteed to be stable once acknowledged by the operating system. Only
-       Linux supports O_ATOMIC right now.
-
  .. option:: buffered=bool
  
         If value is true, use buffered I/O. This is the opposite of the
@@ -1176,13 +1193,34 @@ I/O type
                         Generate the same offset.
  
         ``sequential`` is only useful for random I/O, where fio would normally
-       generate a new random offset for every I/O. If you append e.g. 8 to randread,
-       you would get a new random offset for every 8 I/Os. The result would be a
-       seek for only every 8 I/Os, instead of for every I/O. Use ``rw=randread:8``
-       to specify that. As sequential I/O is already sequential, setting
-       ``sequential`` for that would not result in any differences.  ``identical``
-       behaves in a similar fashion, except it sends the same offset 8 number of
-       times before generating a new offset.
+       generate a new random offset for every I/O. If you append e.g. 8 to
+       randread, i.e. ``rw=randread:8``, you would get a new random offset for
+       every 8 I/Os. The result would be a sequence of 8 sequential offsets
+       with a random starting point. However, this behavior may change if a
+       sequential I/O reaches the end of the file. As sequential I/O is already
+       sequential, setting ``sequential`` for that would not result in any
+       difference. ``identical`` behaves in a similar fashion, except it sends
+       the same offset 8 times before generating a new offset.
+
+       Example #1::
+
+               rw=randread:8
+               rw_sequencer=sequential
+               bs=4k
+
+       The generated sequence of offsets will look like this:
+       4k, 8k, 12k, 16k, 20k, 24k, 28k, 32k, 92k, 96k, 100k, 104k, 108k,
+       112k, 116k, 120k, 48k, 52k ...
+
+       Example #2::
+
+               rw=randread:8
+               rw_sequencer=identical
+               bs=4k
+
+       The generated sequence of offsets will look like this:
+       4k, 4k, 4k, 4k, 4k, 4k, 4k, 4k, 92k, 92k, 92k, 92k, 92k, 92k, 92k, 92k,
+       48k, 48k, 48k ...
  
  .. option:: unified_rw_reporting=str
  
@@ -1212,13 +1250,12 @@ I/O type
  
  .. option:: randrepeat=bool
  
-       Seed the random number generator used for random I/O patterns in a
-       predictable way so the pattern is repeatable across runs. Default: true.
+        Seed all random number generators in a predictable way so the pattern
+        is repeatable across runs. Default: true.
  
  .. option:: allrandrepeat=bool
  
-       Seed all random number generators in a predictable way so results are
-       repeatable across runs.  Default: false.
+       Alias for :option:`randrepeat`. Default: true.
  
  .. option:: randseed=int
  
@@ -1288,6 +1325,11 @@ I/O type
                 **random**
                         Advise using **FADV_RANDOM**.
  
+               **noreuse**
+                       Advise using **FADV_NOREUSE**. This may be a no-op on older Linux
+                       kernels. Since Linux 6.3, it provides a hint to the LRU algorithm.
+                       See the :manpage:`posix_fadvise(2)` man page.
+
  .. option:: write_hint=str
  
         Use :manpage:`fcntl(2)` to advise the kernel what life time to expect
@@ -2123,11 +2165,6 @@ I/O engine
                         before overwriting. The `trimwrite` mode works well for this
                         constraint.
  
-               **pmemblk**
-                       Read and write using filesystem DAX to a file on a filesystem
-                       mounted with DAX on a persistent memory device through the PMDK
-                       libpmemblk library.
-
                 **dev-dax**
                         Read and write using device DAX to a persistent memory device (e.g.,
                         /dev/dax0.0) through the PMDK libpmem library.
@@ -2254,6 +2291,16 @@ with the caveat that when used on the command line, they must come after the
         reads and writes. See :manpage:`ionice(1)`. See also the
         :option:`prioclass` option.
  
+.. option:: cmdprio_hint=int[,int] : [io_uring] [libaio]
+
+       Set the I/O priority hint to use for I/Os that must be issued with
+       a priority when :option:`cmdprio_percentage` or
+       :option:`cmdprio_bssplit` is set. If not specified when
+       :option:`cmdprio_percentage` or :option:`cmdprio_bssplit` is set,
+       this defaults to 0 (no hint). A single value applies to reads and
+       writes. Comma-separated values may be specified for reads and writes.
+       See also the :option:`priohint` option.
+
  .. option:: cmdprio=int[,int] : [io_uring] [libaio]
  
         Set the I/O priority value to use for I/Os that must be issued with
@@ -2280,9 +2327,9 @@ with the caveat that when used on the command line, they must come after the
  
                 cmdprio_bssplit=blocksize/percentage:blocksize/percentage
  
-       In this case, each entry will use the priority class and priority
-       level defined by the options :option:`cmdprio_class` and
-       :option:`cmdprio` respectively.
+       In this case, each entry will use the priority class, priority hint
+       and priority level defined by the options :option:`cmdprio_class`,
+        :option:`cmdprio` and :option:`cmdprio_hint` respectively.
  
         The second accepted format for this option is:
  
@@ -2293,7 +2340,14 @@ with the caveat that when used on the command line, they must come after the
         accepted format does not restrict all entries to have the same priority
         class and priority level.
  
-       For both formats, only the read and write data directions are supported,
+       The third accepted format for this option is:
+
+               cmdprio_bssplit=blocksize/percentage/class/level/hint:...
+
+       This is an extension of the second accepted format that also allows
+       specifying a priority hint.
+
+       For all formats, only the read and write data directions are supported,
         values for trim IOs are ignored. This option is mutually exclusive with
         the :option:`cmdprio_percentage` option.
  
@@ -2410,6 +2464,72 @@ with the caveat that when used on the command line, they must come after the
         For direct I/O, requests will only succeed if cache invalidation isn't required,
         file blocks are fully allocated and the disk request could be issued immediately.
  
+.. option:: fdp=bool : [io_uring_cmd] [xnvme]
+
+       Enable Flexible Data Placement mode for write commands.
+
+.. option:: fdp_pli_select=str : [io_uring_cmd] [xnvme]
+
+       Defines how fio decides which placement ID to use next. The following
+       types are defined:
+
+               **random**
+                       Choose a placement ID at random (uniform).
+
+               **roundrobin**
+                       Round robin over available placement IDs. This is the
+                       default.
+
+       The available placement ID index/indices is defined by the option
+       :option:`fdp_pli`.
+
+.. option:: fdp_pli=str : [io_uring_cmd] [xnvme]
+
+       Select which Placement ID Index/Indices this job is allowed to use for
+       writes. By default, the job will cycle through all available Placement
+        IDs, so use this to isolate these identifiers to specific jobs. If you
+        want fio to use placement identifier only at indices 0, 2 and 5 specify
+        ``fdp_pli=0,2,5``.
+
+.. option:: md_per_io_size=int : [io_uring_cmd]
+
+       Size in bytes for separate metadata buffer per IO. Default: 0.
+
+.. option:: pi_act=int : [io_uring_cmd]
+
+       Action to take when nvme namespace is formatted with protection
+       information. If this is set to 1 and namespace is formatted with
+       metadata size equal to protection information size, fio won't use
+       separate metadata buffer or extended logical block. If this is set to
+       1 and namespace is formatted with metadata size greater than protection
+       information size, fio will not generate or verify the protection
+       information portion of metadata for write or read case respectively.
+       If this is set to 0, fio generates protection information for
+       write case and verifies for read case. Default: 1.
+
+.. option:: pi_chk=str[,str][,str] : [io_uring_cmd]
+
+       Controls the protection information check. This can take one or more
+       of these values. Default: none.
+
+       **GUARD**
+               Enables protection information checking of guard field.
+       **REFTAG**
+               Enables protection information checking of logical block
+               reference tag field.
+       **APPTAG**
+               Enables protection information checking of application tag field.
+
+.. option:: apptag=int : [io_uring_cmd]
+
+       Specifies logical block application tag value, if namespace is
+       formatted to use end to end protection information. Default: 0x1234.
+
+.. option:: apptag_mask=int : [io_uring_cmd]
+
+       Specifies logical block application tag mask value, if namespace is
+       formatted to use end to end protection information. Default: 0xffff.
+
  .. option:: cpuload=int : [cpuio]
  
         Attempt to use the specified percentage of CPU cycles. This is a mandatory
@@ -2880,6 +3000,28 @@ with the caveat that when used on the command line, they must come after the
  
         xnvme namespace identifier for userspace NVMe driver, SPDK or vfio.
  
+.. option:: xnvme_dev_subnqn=str : [xnvme]
+
+       Sets the subsystem NQN for fabrics. This is for xNVMe to utilize a
+       fabrics target with multiple systems.
+
+.. option:: xnvme_mem=str : [xnvme]
+
+       Select the xnvme memory backend. This can take these values.
+
+       **posix**
+               This is the default posix memory backend for linux NVMe driver.
+       **hugepage**
+               Use hugepages, instead of existing posix memory backend. The
+               memory backend uses hugetlbfs. This requires users to allocate
+               hugepages, mount hugetlbfs and set an environment variable for
+               XNVME_HUGETLB_PATH.
+       **spdk**
+               Uses SPDK's memory allocator.
+       **vfio**
+               Uses libvfn's memory allocator. This also specifies the use
+               of libvfn backend instead of SPDK.
+
  .. option:: xnvme_iovec=int : [xnvme]
  
         If this option is set. xnvme will use vectored read/write commands.
@@ -2958,6 +3100,10 @@ with the caveat that when used on the command line, they must come after the
         performance. The default is to enable it only if
         :option:`libblkio_wait_mode=eventfd <libblkio_wait_mode>`.
  
+.. option:: no_completion_thread : [windowsaio]
+
+       Avoid using a separate thread for completion polling.
+
  I/O depth
  ~~~~~~~~~
  
@@ -3150,6 +3296,11 @@ I/O rate
         fio will ignore the thinktime and continue doing IO at the specified
         rate, instead of entering a catch-up mode after thinktime is done.
  
+.. option:: rate_cycle=int
+
+       Average bandwidth for :option:`rate` and :option:`rate_min` over this number
+       of milliseconds. Defaults to 1000.
+
  
  I/O latency
  ~~~~~~~~~~~
@@ -3188,11 +3339,6 @@ I/O latency
         microseconds. Comma-separated values may be specified for reads, writes,
         and trims as described in :option:`blocksize`.
  
-.. option:: rate_cycle=int
-
-       Average bandwidth for :option:`rate` and :option:`rate_min` over this number
-       of milliseconds. Defaults to 1000.
-
  
  I/O replay
  ~~~~~~~~~~
@@ -3350,6 +3496,18 @@ Threads, processes and job synchronization
         priority setting, see I/O engine specific :option:`cmdprio_percentage`
         and :option:`cmdprio_class` options.
  
+.. option:: priohint=int
+
+       Set the I/O priority hint. This is only applicable to platforms that
+       support I/O priority classes and to devices with features controlled
+       through priority hints, e.g. block devices supporting command duration
+       limits, or CDL. CDL is a way to indicate the desired maximum latency
+       of I/Os so that the device can optimize its internal command scheduling
+       according to the latency limits indicated by the user.
+
+       For per-I/O priority hint setting, see the I/O engine specific
+       :option:`cmdprio_hint` option.
+
  .. option:: cpus_allowed=str
  
         Controls the same options as :option:`cpumask`, but accepts a textual
@@ -3708,6 +3866,13 @@ Verification
         verification pass, according to the settings in the job file used.  Default
         false.
  
+.. option:: experimental_verify=bool
+
+        Enable experimental verification. Standard verify records I/O metadata
+        for later use during the verification phase. Experimental verify
+        instead resets the file after the write phase and then replays I/Os for
+        the verification phase.
+
  .. option:: trim_percentage=int
  
         Number of verify blocks to discard/trim.
@@ -3724,13 +3889,6 @@ Verification
  
         Trim this number of I/O blocks.
  
-.. option:: experimental_verify=bool
-
-        Enable experimental verification. Standard verify records I/O metadata
-        for later use during the verification phase. Experimental verify
-        instead resets the file after the write phase and then replays I/Os for
-        the verification phase.
-
  Steady state
  ~~~~~~~~~~~~
  
@@ -3772,10 +3930,11 @@ Steady state
  
  .. option:: steadystate_duration=time, ss_dur=time
  
-       A rolling window of this duration will be used to judge whether steady state
-       has been reached. Data will be collected once per second. The default is 0
-       which disables steady state detection.  When the unit is omitted, the
-       value is interpreted in seconds.
+        A rolling window of this duration will be used to judge whether steady
+        state has been reached. Data will be collected every
+        :option:`ss_interval`.  The default is 0 which disables steady state
+        detection.  When the unit is omitted, the value is interpreted in
+        seconds.
  
  .. option:: steadystate_ramp_time=time, ss_ramp=time
  
@@ -3783,6 +3942,14 @@ Steady state
         collection for checking the steady state job termination criterion. The
         default is 0.  When the unit is omitted, the value is interpreted in seconds.
  
+.. option:: steadystate_check_interval=time, ss_interval=time
+
+        The values during the rolling window will be collected with a period of
+        this value. If :option:`ss_interval` is 30s and :option:`ss_dur` is
+        300s, 10 measurements will be taken. Default is 1s but that might not
+        converge, especially for slower devices, so set this accordingly. When
+        the unit is omitted, the value is interpreted in seconds.
+
  
  Measurements and reporting
  ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -3803,6 +3970,13 @@ Measurements and reporting
         same reporting group, unless if separated by a :option:`stonewall`, or by
         using :option:`new_group`.
  
+    NOTE: When :option:`group_reporting` is used along with `json` output,
+    there are certain per-job properties which can be different between jobs
+    but do not have a natural group-level equivalent. Examples include
+    `kb_base`, `unit_base`, `sig_figs`, `thread_number`, `pid`, and
+    `job_start`. For these properties, the values for the first job are
+    recorded for the group.
+
  .. option:: new_group
  
         Start a new reporting group. See: :option:`group_reporting`.  If not given,
@@ -3940,9 +4114,7 @@ Measurements and reporting
  
  .. option:: log_unix_epoch=bool
  
-       If set, fio will log Unix timestamps to the log files produced by enabling
-       write_type_log for each log type, instead of the default zero-based
-       timestamps.
+       Backwards compatible alias for log_alternate_epoch.
  
  .. option:: log_alternate_epoch=bool
  
@@ -3953,9 +4125,9 @@ Measurements and reporting
  
  .. option:: log_alternate_epoch_clock_id=int
  
-       Specifies the clock_id to be used by clock_gettime to obtain the alternate epoch
-       if either log_unix_epoch or log_alternate_epoch are true. Otherwise has no
-       effect. Default value is 0, or CLOCK_REALTIME.
+    Specifies the clock_id to be used by clock_gettime to obtain the alternate
+    epoch if log_alternate_epoch is true. Otherwise has no effect. Default
+    value is 0, or CLOCK_REALTIME.
  
  .. option:: block_error_percentiles=bool
  
@@ -4355,15 +4527,23 @@ writes in the example above).  In the order listed, they denote:
                  It is the sum of submission and completion latency.
  
  **bw**
-               Bandwidth statistics based on samples. Same names as the xlat stats,
-               but also includes the number of samples taken (**samples**) and an
-               approximate percentage of total aggregate bandwidth this thread
-               received in its group (**per**). This last value is only really
-               useful if the threads in this group are on the same disk, since they
-               are then competing for disk access.
+               Bandwidth statistics based on measurements from discrete
+               intervals. Fio continuously monitors bytes transferred and I/O
+               operations completed. By default fio calculates bandwidth in
+               each half-second interval (see :option:`bwavgtime`) and reports
+               descriptive statistics for the measurements here. Same names as
+               the xlat stats, but also includes the number of samples taken
+               (**samples**) and an approximate percentage of total aggregate
+               bandwidth this thread received in its group (**per**). This
+               last value is only really useful if the threads in this group
+               are on the same disk, since they are then competing for disk
+               access.
  
  **iops**
-               IOPS statistics based on samples. Same names as bw.
+               IOPS statistics based on measurements from discrete intervals.
+               For details see the description for bw above. See
+               :option:`iopsavgtime` to control the duration of the intervals.
+               Same values reported here as for bw except for percentage.
  
  **lat (nsec/usec/msec)**
                 The distribution of I/O completion latencies. This is the time from when
@@ -4437,13 +4617,15 @@ For each data direction it prints:
  And finally, the disk statistics are printed. This is Linux specific. They will look like this::
  
    Disk stats (read/write):
-    sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%
+    sda: ios=16398/16511, sectors=32321/65472, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%
  
  Each value is printed for both reads and writes, with reads first. The
  numbers denote:
  
  **ios**
                 Number of I/Os performed by all groups.
+**sectors**
+               Amount of data transferred in units of 512 bytes for all groups.
  **merge**
                 Number of merges performed by the I/O scheduler.
  **ticks**