t/zbd: Combine write and read fio commands for test case #16

diff --git a/HOWTO b/HOWTO

index 95289040dd3fd58a0bb70dfb0a94fadf1475e175..8cf8d6506b219e0f9933813616a7d682218adb24 100644
--- a/HOWTO
+++ b/HOWTO
@@ -93,6 +93,12 @@ Command line options
                         Dump info related to I/O rate switching.
         *compress*
                         Dump info related to log compress/decompress.
+       *steadystate*
+                       Dump info related to steadystate detection.
+       *helperthread*
+                       Dump info related to the helper thread.
+       *zbd*
+                       Dump info related to support for zoned block devices.
         *?* or *help*
                         Show available debug options.
  
@@ -216,8 +222,8 @@ Command line options
  
  .. option:: --alloc-size=kb
  
-       Set the internal smalloc pool size to `kb` in KiB.  The
-       ``--alloc-size`` switch allows one to use a larger pool size for smalloc.
+       Allocate additional internal smalloc pools of size `kb` in KiB.  The
+       ``--alloc-size`` option increases shared memory set aside for use by fio.
         If running large jobs with randommap enabled, fio can run out of memory.
         Smalloc is an internal allocator for shared structures from a fixed size
         memory pool and can grow to 16 pools. The pool size defaults to 16MiB.
@@ -759,8 +765,8 @@ Target file/device
         `filename` semantic (which generates a file for each clone if not
         specified, but lets all clones use the same file if set).
  
-       See the :option:`filename` option for information on how to escape "``:``" and
-       "``\``" characters within the directory path itself.
+       See the :option:`filename` option for information on how to escape "``:``"
+       characters within the directory path itself.
  
         Note: To control the directory fio will use for internal state files
         use :option:`--aux-path`.
@@ -779,10 +785,10 @@ Target file/device
         by this option will be :option:`size` divided by number of files unless an
         explicit size is specified by :option:`filesize`.
  
-       Each colon and backslash in the wanted path must be escaped with a ``\``
+       Each colon in the wanted path must be escaped with a ``\``
         character.  For instance, if the path is :file:`/dev/dsk/foo@3,0:c` then you
         would use ``filename=/dev/dsk/foo@3,0\:c`` and if the path is
-       :file:`F:\\filename` then you would use ``filename=F\:\\filename``.
+       :file:`F:\\filename` then you would use ``filename=F\:\filename``.
  
         On Windows, disk devices are accessed as :file:`\\\\.\\PhysicalDrive0` for
         the first device, :file:`\\\\.\\PhysicalDrive1` for the second etc.
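The revised escaping rule in the hunk above (escape each colon, leave backslashes alone) can be sketched in a few lines of Python. This is only an illustration for scripts that generate job files; the helper name `escape_fio_filename` is hypothetical and is not part of fio.

```python
def escape_fio_filename(path: str) -> str:
    """Escape each colon with a backslash, per the revised rule.

    Hypothetical helper: backslashes are intentionally left untouched,
    because only colons need escaping after this change.
    """
    return path.replace(":", "\\:")


if __name__ == "__main__":
    # The two paths used as examples in the HOWTO text.
    print("filename=" + escape_fio_filename("/dev/dsk/foo@3,0:c"))
    print("filename=" + escape_fio_filename("F:\\filename"))
```

Both calls reproduce the ``filename=`` values quoted in the hunk, which makes the helper a quick sanity check when building job files programmatically.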
@@ -1167,6 +1173,10 @@ I/O type
                         Pre-allocate via :manpage:`fallocate(2)` with
                         FALLOC_FL_KEEP_SIZE set.
  
+               **truncate**
+                       Extend file to final size via :manpage:`ftruncate(2)`
+                       instead of allocating.
+
                 **0**
                         Backward-compatible alias for **none**.
  
@@ -1176,7 +1186,15 @@ I/O type
         May not be available on all supported platforms. **keep** is only available
         on Linux. If using ZFS on Solaris this cannot be set to **posix**
         because ZFS doesn't support pre-allocation. Default: **native** if any
-       pre-allocation methods are available, **none** if not.
+       pre-allocation methods except **truncate** are available, **none** if not.
+
+       Note that using **truncate** on Windows will interact surprisingly
+       with non-sequential write patterns. When writing to a file that has
+       been extended by setting the end-of-file information, Windows will
+       backfill the unwritten portion of the file up to that offset with
+       zeroes before issuing the new write. This means that a single small
+       write to the end of an extended file will stall until the entire
+       file has been filled with zeroes.
  
  .. option:: fadvise_hint=str
  
@@ -1246,7 +1264,9 @@ I/O type
         is incremented for each sub-job (i.e. when :option:`numjobs` option is
         specified). This option is useful if there are several jobs which are
         intended to operate on a file in parallel disjoint segments, with even
-       spacing between the starting points.
+       spacing between the starting points. Percentages can be used for this option.
+       If a percentage is given, the generated offset will be aligned to the minimum
+       ``blocksize`` or to the value of ``offset_align`` if provided.
  
  .. option:: number_ios=int
  
@@ -1271,7 +1291,7 @@ I/O type
  .. option:: fdatasync=int
  
         Like :option:`fsync` but uses :manpage:`fdatasync(2)` to only sync data and
-       not metadata blocks.  In Windows, FreeBSD, and DragonFlyBSD there is no
+       not metadata blocks. In Windows, FreeBSD, DragonFlyBSD or OSX there is no
         :manpage:`fdatasync(2)` so this falls back to using :manpage:`fsync(2)`.
         Defaults to 0, which means fio does not periodically issue and wait for a
         data-only sync to complete.
@@ -1805,6 +1825,11 @@ I/O engine
                 **pvsync2**
                         Basic :manpage:`preadv2(2)` or :manpage:`pwritev2(2)` I/O.
  
+               **io_uring**
+                       Fast Linux native asynchronous I/O. Supports async IO
+                       for both direct and buffered IO.
+                       This engine defines engine specific options.
+
                 **libaio**
                         Linux native asynchronous I/O. Note that Linux may only support
                         queued behavior with non-buffered I/O (set ``direct=1`` or
@@ -1971,6 +1996,11 @@ I/O engine
                         set  `filesize` so that all the accounting still occurs, but no
                         actual I/O will be done other than creating the file.
  
+               **filestat**
+                       Only call stat() on the file; no I/O is done. You need to set
+                       `filesize` and `nrfiles` so that files will be created.
+                       This engine measures file lookup and metadata access performance.
+
                 **libpmem**
                         Read and write using mmap I/O to a file on a filesystem
                         mounted with DAX on a persistent memory device through the PMDK
@@ -1991,6 +2021,10 @@ I/O engine
                         Asynchronous read and write using DDN's Infinite Memory Engine (IME).
                         This engine will try to stack as much IOs as possible by creating
                         requests for IME. FIO will then decide when to commit these requests.
+               **libiscsi**
+                       Read and write an iSCSI LUN using libiscsi.
+               **nbd**
+                       Read and write a Network Block Device (NBD).
  
  I/O engine specific parameters
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -2000,6 +2034,51 @@ In addition, there are some parameters which are only valid when a specific
  with the caveat that when used on the command line, they must come after the
  :option:`ioengine` that defines them is selected.
  
+.. option:: cmdprio_percentage=int : [io_uring] [libaio]
+
+    Set the percentage of I/O that will be issued with higher priority by setting
+    the priority bit. Non-read I/O is likely unaffected by ``cmdprio_percentage``.
+    This option cannot be used with the `prio` or `prioclass` options. For this
+    option to set the priority bit properly, NCQ priority must be supported and
+    enabled, and the :option:`direct`\=1 option must be used. fio must also be run as
+    the root user.
+
+.. option:: fixedbufs : [io_uring]
+
+    If fio is asked to do direct IO, then Linux will map pages for each
+    IO call, and release them when IO is done. If this option is set, the
+    pages are pre-mapped before IO is started. This eliminates the need to
+    map and release for each IO. This is more efficient, and reduces the
+    IO latency as well.
+
+.. option:: hipri : [io_uring]
+
+    If this option is set, fio will attempt to use polled IO completions.
+    Normal IO completions generate interrupts to signal the completion of
+    IO; polled completions do not. Hence they require active reaping
+    by the application. The benefits are more efficient IO for high IOPS
+    scenarios, and lower latencies for low queue depth IO.
+
+.. option:: registerfiles : [io_uring]
+
+       With this option, fio registers the set of files being used with the
+       kernel. This avoids the overhead of managing file counts in the kernel,
+       making the submission and completion part more lightweight. Required
+       for the below :option:`sqthread_poll` option.
+
+.. option:: sqthread_poll : [io_uring]
+
+       Normally fio will submit IO by issuing a system call to notify the
+       kernel of available items in the SQ ring. If this option is set, the
+       act of submitting IO will be done by a polling thread in the kernel.
+       This frees up cycles for fio, at the cost of using more CPU in the
+       system.
+
+.. option:: sqthread_poll_cpu : [io_uring]
+
+       When :option:`sqthread_poll` is set, this option provides a way to
+       define which CPU should be used for the polling thread.
+
  .. option:: userspace_reap : [libaio]
  
         Normally, with the libaio engine in use, fio will use the
@@ -2018,6 +2097,26 @@ with the caveat that when used on the command line, they must come after the
         When hipri is set this determines the probability of a pvsync2 I/O being high
         priority. The default is 100%.
  
+.. option:: nowait : [pvsync2] [libaio] [io_uring]
+
+       By default if a request cannot be executed immediately (e.g. resource starvation,
+       waiting on locks) it is queued and the initiating process will be blocked until
+       the required resource becomes free.
+
+       This option sets the RWF_NOWAIT flag (supported from the 4.14 Linux kernel) and
+       the call will return instantly with EAGAIN or a partial result rather than waiting.
+
+       It is useful to also use ignore_error=EAGAIN when using this option.
+
+       Note: glibc 2.27 and 2.28 have a bug in the preadv2 and pwritev2 syscall
+       wrappers. They return EOPNOTSUPP instead of EAGAIN.
+
+       For cached I/O, using this option usually means a request operates only with
+       cached data. Currently the RWF_NOWAIT flag is not supported for cached writes.
+
+       For direct I/O, requests will only succeed if cache invalidation isn't required,
+       file blocks are fully allocated and the disk request could be issued immediately.
+
  .. option:: cpuload=int : [cpuio]
  
         Attempt to use the specified percentage of CPU cycles. This is a mandatory
@@ -2183,6 +2282,11 @@ with the caveat that when used on the command line, they must come after the
         multiple paths exist between the client and the server or in certain loopback
         configurations.
  
+.. option:: stat_type=str : [filestat]
+
+       Specify the stat system call type to measure lookup/getattr performance.
+       Default is **stat** for :manpage:`stat(2)`.
+
  .. option:: readfua=bool : [sg]
  
         With readfua option set to 1, read operations include
@@ -2263,6 +2367,15 @@ with the caveat that when used on the command line, they must come after the
         turns on verbose logging from libcurl, 2 additionally enables
         HTTP IO tracing. Default is **0**
  
+.. option:: uri=str : [nbd]
+
+       Specify the NBD URI of the server to test.  The string
+       is a standard NBD URI
+       (see https://github.com/NetworkBlockDevice/nbd/tree/master/doc).
+       Example URIs: nbd://localhost:10809
+       nbd+unix:///?socket=/tmp/socket
+       nbds://tlshost/exportname
+
  I/O depth
  ~~~~~~~~~
  
@@ -2339,8 +2452,13 @@ I/O depth
         ``serialize_overlap`` tells fio to avoid provoking this behavior by explicitly
         serializing in-flight I/Os that have a non-zero overlap. Note that setting
         this option can reduce both performance and the :option:`iodepth` achieved.
-       Additionally this option does not work when :option:`io_submit_mode` is set to
-       offload. Default: false.
+
+       This option only applies to I/Os issued for a single job except when it is
+       enabled along with :option:`io_submit_mode`\=offload. In offload mode, fio
+       will check for overlap among all I/Os submitted by offload jobs with
+       :option:`serialize_overlap` enabled.
+
+       Default: false.
  
  .. option:: io_submit_mode=str
  
@@ -2454,6 +2572,13 @@ I/O latency
         defaults to 100.0, meaning that all I/Os must be equal or below to the value
         set by :option:`latency_target`.
  
+.. option:: latency_run=bool
+
+       Used with :option:`latency_target`. If false (default), fio will find
+       the highest queue depth that meets :option:`latency_target` and exit. If
+       true, fio will continue running and try to meet :option:`latency_target`
+       by adjusting queue depth.
+
  .. option:: max_latency=time
  
         If set, fio will exit the job with an ETIMEDOUT error if it exceeds this
@@ -2486,7 +2611,7 @@ I/O replay
         (``blkparse <device> -o /dev/null -d file_for_fio.bin``).
         You can specify a number of files by separating the names with a ':'
         character. See the :option:`filename` option for information on how to
-       escape ':' and '\' characters within the file names. These files will
+       escape ':' characters within the file names. These files will
         be sequentially assigned to job clones created by :option:`numjobs`.
  
  .. option:: read_iolog_chunked=bool
@@ -2559,12 +2684,13 @@ I/O replay
  
  .. option:: replay_align=int
  
-       Force alignment of I/O offsets and lengths in a trace to this power of 2
-       value.
+       Force alignment of the byte offsets in a trace to this value. The value
+       must be a power of 2.
  
  .. option:: replay_scale=int
  
-       Scale sector offsets down by this factor when replaying traces.
+       Scale byte offsets down by this factor when replaying traces. You will most
+       likely want to use :option:`replay_align` as well.
  
  .. option:: replay_skip=str
  
@@ -2607,11 +2733,15 @@ Threads, processes and job synchronization
         Set the I/O priority value of this job. Linux limits us to a positive value
         between 0 and 7, with 0 being the highest.  See man
         :manpage:`ionice(1)`. Refer to an appropriate manpage for other operating
-       systems since meaning of priority may differ.
+       systems since meaning of priority may differ. For per-command priority
+       setting, see I/O engine specific `cmdprio_percentage` and `hipri_percentage`
+       options.
  
  .. option:: prioclass=int
  
-       Set the I/O priority class. See man :manpage:`ionice(1)`.
+       Set the I/O priority class. See man :manpage:`ionice(1)`. For per-command
+       priority setting, see I/O engine specific `cmdprio_percentage` and
+       `hipri_percentage` options.
  
  .. option:: cpus_allowed=str
  
@@ -2646,7 +2776,7 @@ Threads, processes and job synchronization
                         Each job will get a unique CPU from the CPU set.
  
         **shared** is the default behavior, if the option isn't specified. If
-       **split** is specified, then fio will will assign one cpu per job. If not
+       **split** is specified, then fio will assign one cpu per job. If not
         enough CPUs are given for the jobs listed, then fio will roundrobin the CPUs
         in the set.
  
@@ -2735,9 +2865,21 @@ Threads, processes and job synchronization
  
  .. option:: exitall
  
-       By default, fio will continue running all other jobs when one job finishes
-       but sometimes this is not the desired action.  Setting ``exitall`` will
-       instead make fio terminate all other jobs when one job finishes.
+       By default, fio will continue running all other jobs when one job finishes.
+       Sometimes this is not the desired action.  Setting ``exitall`` will instead
+       make fio terminate all jobs in the same group, as soon as one job of that
+       group finishes.
+
+.. option:: exit_what
+
+       By default, fio will continue running all other jobs when one job finishes.
+       Sometimes this is not the desired action. Setting ``exitall`` will
+       instead make fio terminate all jobs in the same group. The option
+       ``exit_what`` lets you control which jobs get terminated when ``exitall``
+       is enabled. The default is ``group``, which does not change the behaviour of
+       ``exitall``. The setting ``all`` terminates all jobs. The setting ``stonewall``
+       terminates all currently running jobs across all groups and continues execution
+       with the next stonewalled group.
  
  .. option:: exec_prerun=str
  
@@ -3233,27 +3375,28 @@ Measurements and reporting
         Disable measurements of throughput/bandwidth numbers. See
         :option:`disable_lat`.
  
+.. option:: slat_percentiles=bool
+
+       Report submission latency percentiles. Submission latency is not recorded
+       for synchronous ioengines.
+
  .. option:: clat_percentiles=bool
  
-       Enable the reporting of percentiles of completion latencies.  This
-       option is mutually exclusive with :option:`lat_percentiles`.
+       Report completion latency percentiles.
  
  .. option:: lat_percentiles=bool
  
-       Enable the reporting of percentiles of I/O latencies. This is similar
-       to :option:`clat_percentiles`, except that this includes the
-       submission latency. This option is mutually exclusive with
-       :option:`clat_percentiles`.
+       Report total latency percentiles. Total latency is the sum of submission
+       latency and completion latency.
  
  .. option:: percentile_list=float_list
  
-       Overwrite the default list of percentiles for completion latencies and
-       the block error histogram.  Each number is a floating number in the
-       range (0,100], and the maximum length of the list is 20. Use ``:`` to
-       separate the numbers, and list the numbers in ascending order. For
+       Overwrite the default list of percentiles for latencies and the block error
+       histogram.  Each number is a floating point number in the range (0,100], and
+       the maximum length of the list is 20. Use ``:`` to separate the numbers. For
         example, ``--percentile_list=99.5:99.9`` will cause fio to report the
-       values of completion latency below which 99.5% and 99.9% of the observed
-       latencies fell, respectively.
+       latency durations below which 99.5% and 99.9% of the observed latencies fell,
+       respectively.
  
  .. option:: significant_figures=int
  
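The constraints that the hunk above states for ``percentile_list`` (values in the range (0,100], at most 20 entries, ``:`` as separator) can be checked before a job is launched. The sketch below is a hypothetical helper for wrapper scripts, not fio code; the function name and the choice to raise ``ValueError`` are assumptions made for illustration.

```python
from typing import Iterable


def build_percentile_list(percentiles: Iterable[float]) -> str:
    """Validate percentiles and join them into a percentile_list value.

    Hypothetical helper. It enforces only the limits stated in the
    HOWTO text: each value in (0, 100] and at most 20 values.
    """
    values = [float(p) for p in percentiles]
    if len(values) > 20:
        raise ValueError("percentile_list accepts at most 20 values")
    for v in values:
        if not (0.0 < v <= 100.0):
            raise ValueError(f"percentile {v} is outside the range (0,100]")
    # fio expects the numbers separated by colons.
    return ":".join(str(v) for v in values)


if __name__ == "__main__":
    print("--percentile_list=" + build_percentile_list([99.5, 99.9]))
```

Run as a script, this prints the same ``--percentile_list=99.5:99.9`` argument used as the example in the option text.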
@@ -3685,7 +3828,8 @@ is one long line of values, such as::
      2;card0;0;0;7139336;121836;60004;1;10109;27.932460;116.933948;220;126861;3495.446807;1085.368601;226;126864;3523.635629;1089.012448;24063;99944;50.275485%;59818.274627;5540.657370;7155060;122104;60004;1;8338;29.086342;117.839068;388;128077;5032.488518;1234.785715;391;128085;5061.839412;1236.909129;23436;100928;50.287926%;59964.832030;5644.844189;14.595833%;19.394167%;123706;0;7313;0.1%;0.1%;0.1%;0.1%;0.1%;0.1%;100.0%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.01%;0.02%;0.05%;0.16%;6.04%;40.40%;52.68%;0.64%;0.01%;0.00%;0.01%;0.00%;0.00%;0.00%;0.00%;0.00%
      A description of this job goes here.
  
-The job description (if provided) follows on a second line.
+The job description (if provided) follows on a second line for terse v2.
+It appears on the same line for other terse versions.
  
  To enable terse output, use the :option:`--minimal` or
  :option:`--output-format`\=terse command line options. The
@@ -3770,6 +3914,11 @@ minimal output v3, separated by semicolons::
  
          terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
  
+In client/server mode terse output differs from what appears when jobs are run
+locally. Disk utilization data is omitted from the standard terse output and
+for v3 and later appears on its own separate line at the end of each terse
+reporting cycle.
+
  
  JSON output
  ------------
@@ -3893,7 +4042,7 @@ only file passed to :option:`read_iolog`. An example would look like::
         $ fio --read_iolog="<file1>:<file2>" --merge_blktrace_file="<output_file>"
  
  Creating only the merged file can be done by passing the command line argument
-:option:`merge-blktrace-only`.
+:option:`--merge-blktrace-only`.
  
  Scaling traces can be done to see the relative impact of any particular trace
  being slowed down or sped up. :option:`merge_blktrace_scalars` takes in a colon
@@ -4016,7 +4165,7 @@ Fio supports a variety of log file formats, for logging latencies, bandwidth,
  and IOPS. The logs share a common format, which looks like this:
  
      *time* (`msec`), *value*, *data direction*, *block size* (`bytes`),
-    *offset* (`bytes`)
+    *offset* (`bytes`), *command priority*
  
  *Time* for the log entry is always in milliseconds. The *value* logged depends
  on the type of log, it will be one of the following:
@@ -4041,6 +4190,9 @@ The entry's *block size* is always in bytes. The *offset* is the position in byt
  from the start of the file for that particular I/O. The logging of the offset can be
  toggled with :option:`log_offset`.
  
+*Command priority* is 0 for normal priority and 1 for high priority. This is controlled
+by the ioengine specific :option:`cmdprio_percentage`.
+
  Fio defaults to logging every individual I/O but when windowed logging is set
  through :option:`log_avg_msec`, either the average (by default) or the maximum
  (:option:`log_max_value` is set) *value* seen over the specified period of time
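With the *command priority* column added by the two hunks above, a log line carries six comma-separated fields. A minimal Python sketch of a parser follows. It assumes that all six fields are present (that is, offset logging is enabled) and that every field is an integer; the names `LogEntry` and `parse_log_line` and the sample line are hypothetical.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    time_msec: int   # time of the entry, in milliseconds
    value: int       # meaning depends on the log type
    direction: int   # data direction code
    block_size: int  # bytes
    offset: int      # bytes from the start of the file
    priority: int    # 0 = normal priority, 1 = high priority


def parse_log_line(line: str) -> LogEntry:
    """Parse one six-field log line (assumes offset logging is on)."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    return LogEntry(*(int(f) for f in fields))


if __name__ == "__main__":
    # Made-up sample line, for illustration only.
    entry = parse_log_line("1000, 250, 0, 4096, 8192, 1")
    print(entry.priority == 1)  # True when the I/O was high priority
```

Scripts that consumed the older five-column layout would need a similar adjustment, since the new column is appended at the end of each line.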
@@ -4048,6 +4200,7 @@ is recorded. Each *data direction* seen within the window period will aggregate
  its values in a separate row. Further, when using windowed logging the *block
  size* and *offset* entries will always contain 0.
  
+
  Client/Server
  -------------
  
@@ -4135,3 +4288,6 @@ containing two hostnames ``h1`` and ``h2`` with IP addresses 192.168.10.120 and
  
         /mnt/nfs/fio/192.168.10.120.fileio.tmp
         /mnt/nfs/fio/192.168.10.121.fileio.tmp
+
+Terse output in client/server mode will differ slightly from what is produced
+when fio is run in stand-alone mode. See the terse output section for details.