engines/libzbc: enable block backend

[fio.git] / HOWTO
diff --git a/HOWTO b/HOWTO

index f19f9226a93de504d04aad6c60c27856bf1b19cb..b6d1b58a3af46147091fb1a3828362fff10b0e0a 100644
--- a/HOWTO
+++ b/HOWTO
@@ -970,14 +970,15 @@ Target file/device
         Accepted values are:
  
                 **none**
-                               The :option:`zonerange`, :option:`zonesize` and
-                               :option:`zoneskip` parameters are ignored.
+                               The :option:`zonerange`, :option:`zonesize`,
+                               :option:`zonecapacity` and :option:`zoneskip`
+                               parameters are ignored.
                 **strided**
                                 I/O happens in a single zone until
                                 :option:`zonesize` bytes have been transferred.
                                 After that number of bytes has been
                                 transferred processing of the next zone
-                               starts.
+                               starts. :option:`zonecapacity` is ignored.
                 **zbd**
                                 Zoned block device mode. I/O happens
                                 sequentially in each zone, even if random I/O
@@ -1004,6 +1005,17 @@ Target file/device
         For :option:`zonemode` =zbd, this is the size of a single zone. The
         :option:`zonerange` parameter is ignored in this mode.
  
+
+.. option:: zonecapacity=int
+
+       For :option:`zonemode` =zbd, this defines the capacity of a single zone,
+       which is the accessible area starting from the zone start address.
+       This parameter only applies when using :option:`zonemode` =zbd in
+       combination with regular block devices. If not specified it defaults to
+       the zone size. If the target device is a zoned block device, the zone
+       capacity is obtained from the device information and this option is
+       ignored.
+
  .. option:: zoneskip=int
  
         For :option:`zonemode` =strided, the number of bytes to skip after
@@ -1349,7 +1361,7 @@ I/O type
         limit reads or writes to a certain rate.  If that is the case, then the
         distribution may be skewed. Default: 50.
  
-.. option:: random_distribution=str:float[,str:float][,str:float]
+.. option:: random_distribution=str:float[:float][,str:float][,str:float]
  
         By default, fio will use a completely uniform random distribution when asked
         to perform random I/O. Sometimes it is useful to skew the distribution in
@@ -1384,6 +1396,14 @@ I/O type
         map. For the **normal** distribution, a normal (Gaussian) deviation is
         supplied as a value between 0 and 100.
  
+       The second, optional float is allowed for **pareto**, **zipf** and **normal** distributions.
+       It sets the base of the distribution at a non-default position, giving more control
+       over the most probable outcome. This value is in the range [0-1], which maps linearly to
+       the range of possible random values.
+       Defaults are: random for **pareto** and **zipf**, and 0.5 for **normal**.
+       If you wanted to use **zipf** with a `theta` of 1.2 centered on 1/4 of the allowed value range,
+       you would use ``random_distribution=zipf:1.2:0.25``.
+
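As an illustration of the ``str:float[:float]`` form described above, here is a minimal Python sketch that splits a single entry into its parts. The function name ``parse_random_distribution`` and the ``Distribution`` tuple are hypothetical helpers, not part of fio, and the sketch only covers the three distributions that accept the optional second float.

```python
from typing import NamedTuple, Optional


class Distribution(NamedTuple):
    name: str
    shape: float             # first float, e.g. theta for zipf
    center: Optional[float]  # second float in [0, 1]; None means "left to fio"


# Distributions for which the text above allows the optional second float.
CENTERED = {"zipf", "pareto", "normal"}


def parse_random_distribution(entry: str) -> Distribution:
    """Parse one 'name:shape[:center]' entry (hypothetical helper, not fio code)."""
    parts = entry.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected name:float[:float], got {entry!r}")
    name = parts[0]
    if name not in CENTERED:
        raise ValueError(f"unsupported distribution in this sketch: {name!r}")
    shape = float(parts[1])
    center: Optional[float] = None
    if len(parts) == 3:
        center = float(parts[2])
        if not 0.0 <= center <= 1.0:
            raise ValueError("center must lie in the range [0, 1]")
    elif name == "normal":
        # Documented default for normal; zipf and pareto default to a random base.
        center = 0.5
    return Distribution(name, shape, center)


print(parse_random_distribution("zipf:1.2:0.25"))
```

A ``center`` of ``None`` here simply records that no base was given, in which case the text above says the base is chosen randomly for **zipf** and **pareto**.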
         For a **zoned** distribution, fio supports specifying percentages of I/O
         access that should fall within what range of the file or device. For
         example, given a criteria of:
@@ -1665,10 +1685,28 @@ Buffers and memory
         This will be ignored if :option:`pre_read` is also specified for the
         same job.
  
-.. option:: sync=bool
+.. option:: sync=str
+
+       Whether, and what type, of synchronous I/O to use for writes.  The allowed
+       values are:
+
+               **none**
+                       Do not use synchronous IO, the default.
+
+               **0**
+                       Same as **none**.
+
+               **sync**
+                       Use synchronous file IO. For the majority of I/O engines,
+                       this means using O_SYNC.
+
+               **1**
+                       Same as **sync**.
+
+               **dsync**
+                       Use synchronous data IO. For the majority of I/O engines,
+                       this means using O_DSYNC.
  
-       Use synchronous I/O for buffered writes. For the majority of I/O engines,
-       this means using O_SYNC. Default: false.
  
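The value-to-flag mapping in the new ``sync`` text can be summarised with a short Python sketch. It assumes a POSIX platform where ``os.O_SYNC`` and ``os.O_DSYNC`` are defined, and ``sync_open_flags`` is a hypothetical name used only for illustration; individual I/O engines may implement synchronous I/O differently.

```python
import os

# Mapping taken from the option text: "0" aliases "none", "1" aliases "sync".
# Assumes a POSIX platform that defines O_SYNC and O_DSYNC.
_SYNC_FLAGS = {
    "none": 0,
    "0": 0,
    "sync": os.O_SYNC,
    "1": os.O_SYNC,
    "dsync": os.O_DSYNC,
}


def sync_open_flags(value: str) -> int:
    """Return the extra open(2) flag implied by a sync= value (illustrative only)."""
    try:
        return _SYNC_FLAGS[value]
    except KeyError:
        raise ValueError(f"unknown sync value: {value!r}") from None


print(sync_open_flags("dsync") == os.O_DSYNC)
```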
  .. option:: iomem=str, mem=str
  
@@ -1882,20 +1920,14 @@ I/O engine
  
                 **cpuio**
                         Doesn't transfer any data, but burns CPU cycles according to the
-                       :option:`cpuload` and :option:`cpuchunks` options. Setting
-                       :option:`cpuload`\=85 will cause that job to do nothing but burn 85%
+                       :option:`cpuload`, :option:`cpuchunks` and :option:`cpumode` options.
+                       Setting :option:`cpuload`\=85 will cause that job to do nothing but burn 85%
                         of the CPU. In case of SMP machines, use :option:`numjobs`\=<nr_of_cpu>
                         to get desired CPU usage, as the cpuload only loads a
                         single CPU at the desired rate. A job never finishes unless there is
                         at least one non-cpuio job.
-
-               **guasi**
-                       The GUASI I/O engine is the Generic Userspace Asynchronous Syscall
-                       Interface approach to async I/O. See
-
-                       http://www.xmailserver.org/guasi-lib.html
-
-                       for more info on GUASI.
+                       Setting :option:`cpumode`\=qsort replaces the default noop instruction loop
+                       with a qsort algorithm, which consumes more energy.
  
                 **rdma**
                         The RDMA I/O engine supports both RDMA memory semantics
@@ -2026,6 +2058,14 @@ I/O engine
                 **nbd**
                         Read and write a Network Block Device (NBD).
  
+               **libcufile**
+                       I/O engine supporting libcufile synchronous access to nvidia-fs and a
+                       GPUDirect Storage-supported filesystem. This engine performs
+                       I/O without transferring buffers between user-space and the kernel,
+                       unless :option:`verify` is set or :option:`cuda_io` is `posix`.
+                       :option:`iomem` must not be `cudamalloc`. This ioengine defines
+                       engine specific options.
+
  I/O engine specific parameters
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
@@ -2040,7 +2080,8 @@ with the caveat that when used on the command line, they must come after the
      the priority bit. Non-read I/O is likely unaffected by ``cmdprio_percentage``.
      This option cannot be used with the `prio` or `prioclass` options. For this
      option to set the priority bit properly, NCQ priority must be supported and
-    enabled and :option:`direct`\=1 option must be used.
+    enabled and :option:`direct`\=1 option must be used. fio must also be run as
+    the root user.
  
  .. option:: fixedbufs : [io_uring]
  
@@ -2096,6 +2137,26 @@ with the caveat that when used on the command line, they must come after the
         When hipri is set this determines the probability of a pvsync2 I/O being high
         priority. The default is 100%.
  
+.. option:: nowait : [pvsync2] [libaio] [io_uring]
+
+       By default if a request cannot be executed immediately (e.g. resource starvation,
+       waiting on locks) it is queued and the initiating process will be blocked until
+       the required resource becomes free.
+
+       This option sets the RWF_NOWAIT flag (supported from the 4.14 Linux kernel) and
+       the call will return instantly with EAGAIN or a partial result rather than waiting.
+
+       It is useful to also use ignore_error=EAGAIN when using this option.
+
+       Note: glibc 2.27 and 2.28 have a bug in the preadv2 and pwritev2 syscall
+       wrappers. They return EOPNOTSUPP instead of EAGAIN.
+
+       For cached I/O, using this option usually means a request operates only with
+       cached data. Currently the RWF_NOWAIT flag is not supported for cached writes.
+
+       For direct I/O, requests will only succeed if cache invalidation isn't required,
+       file blocks are fully allocated and the disk request could be issued immediately.
+
  .. option:: cpuload=int : [cpuio]
  
         Attempt to use the specified percentage of CPU cycles. This is a mandatory
@@ -2261,9 +2322,10 @@ with the caveat that when used on the command line, they must come after the
         multiple paths exist between the client and the server or in certain loopback
         configurations.
  
-.. option:: lstat=bool : [filestat]
+.. option:: stat_type=str : [filestat]
  
-       Use lstat(2) to measure lookup/getattr performance. Default is 0.
+       Specify stat system call type to measure lookup/getattr performance.
+       Default is **stat** for :manpage:`stat(2)`.
  
  .. option:: readfua=bool : [sg]
  
@@ -2297,6 +2359,18 @@ with the caveat that when used on the command line, they must come after the
                 transferred to the device. The writefua option is ignored with this
                 selection.
  
+.. option:: hipri : [sg]
+
+       If this option is set, fio will attempt to use polled IO completions.
+       This will have a similar effect as (io_uring)hipri. Only SCSI READ and
+       WRITE commands will have the SGV4_FLAG_HIPRI set (not UNMAP (trim) nor
+       VERIFY). Older versions of the Linux sg driver that do not support
+       hipri will simply ignore this flag and do normal IO. The Linux SCSI
+       Low Level Driver (LLD) that "owns" the device also needs to support
+       hipri (also known as iopoll and mq_poll). The MegaRAID driver is an
+       example of a SCSI LLD. Default: clear (0), which does normal
+       (interrupt-based) IO.
+
  .. option:: http_host=str : [http]
  
         Hostname to connect to. For S3, this could be the bucket hostname.
@@ -2354,6 +2428,28 @@ with the caveat that when used on the command line, they must come after the
         nbd+unix:///?socket=/tmp/socket
         nbds://tlshost/exportname
  
+.. option:: gpu_dev_ids=str : [libcufile]
+
+       Specify the GPU IDs to use with CUDA. This is a colon-separated list of
+       integers. GPUs are assigned to workers round-robin. Default is 0.
+
+.. option:: cuda_io=str : [libcufile]
+
+       Specify the type of I/O to use with CUDA. Default is **cufile**.
+
+       **cufile**
+               Use libcufile and nvidia-fs. This option performs I/O directly
+               between a GPUDirect Storage filesystem and GPU buffers,
+               avoiding use of a bounce buffer. If :option:`verify` is set,
+               cudaMemcpy is used to copy verification data between RAM and GPU.
+               Verification data is copied from RAM to GPU before a write
+               and from GPU to RAM after a read. :option:`direct` must be 1.
+       **posix**
+               Use POSIX to perform I/O with a RAM buffer, and use cudaMemcpy
+               to transfer data between RAM and the GPUs. Data is copied from
+               GPU to RAM before a write and copied from RAM to GPU after a
+               read. :option:`verify` does not affect use of cudaMemcpy.
+
  I/O depth
  ~~~~~~~~~
  
@@ -2448,7 +2544,8 @@ I/O depth
         can increase latencies. The benefit is that fio can manage submission rates
         independently of the device completion rates. This avoids skewed latency
         reporting if I/O gets backed up on the device side (the coordinated omission
-       problem).
+       problem). Note that this option cannot reliably be used with async IO
+       engines.
  
  
  I/O rate
@@ -2477,6 +2574,13 @@ I/O rate
         before we have to complete it and do our :option:`thinktime`. In other words, this
         setting effectively caps the queue depth if the latter is larger.
  
+.. option:: thinktime_blocks_type=str
+
+       Only valid if :option:`thinktime` is set. Controls how :option:`thinktime_blocks`
+       triggers. The default is `complete`, which triggers thinktime when fio completes
+       :option:`thinktime_blocks` blocks. If this is set to `issue`, then the trigger happens
+       at the issue side.
+
  .. option:: rate=int[,int][,int]
  
         Cap the bandwidth used by this job. The number is in bytes/sec, the normal
@@ -2550,6 +2654,13 @@ I/O latency
         defaults to 100.0, meaning that all I/Os must be equal or below to the value
         set by :option:`latency_target`.
  
+.. option:: latency_run=bool
+
+       Used with :option:`latency_target`. If false (default), fio will find
+       the highest queue depth that meets :option:`latency_target` and exit. If
+       true, fio will continue running and try to meet :option:`latency_target`
+       by adjusting queue depth.
+
  .. option:: max_latency=time
  
         If set, fio will exit the job with an ETIMEDOUT error if it exceeds this
@@ -2584,6 +2695,9 @@ I/O replay
         character. See the :option:`filename` option for information on how to
         escape ':' characters within the file names. These files will
         be sequentially assigned to job clones created by :option:`numjobs`.
+       '-' is a reserved name, meaning read from stdin. Note that if
+       :option:`filename` is set to '-', which also means stdin, then
+       this option can't be set to '-'.
  
  .. option:: read_iolog_chunked=bool
  
@@ -2817,15 +2931,10 @@ Threads, processes and job synchronization
         ``flow=8`` and another job has ``flow=-1``, then there will be a roughly 1:8
         ratio in how much one runs vs the other.
  
-.. option:: flow_watermark=int
-
-       The maximum value that the absolute value of the flow counter is allowed to
-       reach before the job must wait for a lower value of the counter.
-
  .. option:: flow_sleep=int
  
-       The period of time, in microseconds, to wait after the flow watermark has
-       been exceeded before retrying operations.
+       The period of time, in microseconds, to wait after the flow counter
+       has exceeded its proportion before retrying operations.
  
  .. option:: stonewall, wait_for_previous
  
@@ -3346,27 +3455,28 @@ Measurements and reporting
         Disable measurements of throughput/bandwidth numbers. See
         :option:`disable_lat`.
  
+.. option:: slat_percentiles=bool
+
+       Report submission latency percentiles. Submission latency is not recorded
+       for synchronous ioengines.
+
  .. option:: clat_percentiles=bool
  
-       Enable the reporting of percentiles of completion latencies.  This
-       option is mutually exclusive with :option:`lat_percentiles`.
+       Report completion latency percentiles.
  
  .. option:: lat_percentiles=bool
  
-       Enable the reporting of percentiles of I/O latencies. This is similar
-       to :option:`clat_percentiles`, except that this includes the
-       submission latency. This option is mutually exclusive with
-       :option:`clat_percentiles`.
+       Report total latency percentiles. Total latency is the sum of submission
+       latency and completion latency.
  
  .. option:: percentile_list=float_list
  
-       Overwrite the default list of percentiles for completion latencies and
-       the block error histogram.  Each number is a floating number in the
-       range (0,100], and the maximum length of the list is 20. Use ``:`` to
-       separate the numbers, and list the numbers in ascending order. For
+       Overwrite the default list of percentiles for latencies and the block error
+       histogram.  Each number is a floating point number in the range (0,100], and
+       the maximum length of the list is 20. Use ``:`` to separate the numbers. For
         example, ``--percentile_list=99.5:99.9`` will cause fio to report the
-       values of completion latency below which 99.5% and 99.9% of the observed
-       latencies fell, respectively.
+       latency durations below which 99.5% and 99.9% of the observed latencies fell,
+       respectively.
  
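The constraints stated for ``percentile_list`` (values in (0,100], at most 20 entries, ``:`` as separator) can be checked before a job is launched. The following Python sketch is one way to do that; ``format_percentile_list`` is a hypothetical helper, and rejecting an empty list is an assumption of this sketch rather than something the text states.

```python
from typing import Iterable

MAX_PERCENTILES = 20  # maximum list length given in the option text


def format_percentile_list(percentiles: Iterable[float]) -> str:
    """Build a percentile_list=... argument (hypothetical helper, not part of fio)."""
    values = list(percentiles)
    if not values:
        # Assumption of this sketch: an empty list is treated as an error.
        raise ValueError("at least one percentile is required")
    if len(values) > MAX_PERCENTILES:
        raise ValueError(f"at most {MAX_PERCENTILES} percentiles are allowed")
    for p in values:
        # Range is (0, 100]: zero is excluded, 100 is included.
        if not 0 < p <= 100:
            raise ValueError(f"percentile out of range (0,100]: {p}")
    return "percentile_list=" + ":".join(str(p) for p in values)


print("--" + format_percentile_list([99.5, 99.9]))
```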
  .. option:: significant_figures=int
  
@@ -3882,7 +3992,7 @@ will be a disk utilization section.
  Below is a single line containing short names for each of the fields in the
  minimal output v3, separated by semicolons::
  
-        terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
+        terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth_kb;read_iops;read_runtime_ms;read_slat_min_us;read_slat_max_us;read_slat_mean_us;read_slat_dev_us;read_clat_min_us;read_clat_max_us;read_clat_mean_us;read_clat_dev_us;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min_us;read_lat_max_us;read_lat_mean_us;read_lat_dev_us;read_bw_min_kb;read_bw_max_kb;read_bw_agg_pct;read_bw_mean_kb;read_bw_dev_kb;write_kb;write_bandwidth_kb;write_iops;write_runtime_ms;write_slat_min_us;write_slat_max_us;write_slat_mean_us;write_slat_dev_us;write_clat_min_us;write_clat_max_us;write_clat_mean_us;write_clat_dev_us;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min_us;write_lat_max_us;write_lat_mean_us;write_lat_dev_us;write_bw_min_kb;write_bw_max_kb;write_bw_agg_pct;write_bw_mean_kb;write_bw_dev_kb;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
  
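Because the header line above lists the field names in order, it can be paired with a terse output line to look values up by name. The Python sketch below shows the idea; ``terse_to_dict`` is a hypothetical helper, and the header and data line in the usage part are an abbreviated, made-up sample rather than real fio output.

```python
from typing import Dict, List, Tuple


def terse_to_dict(header: str, line: str) -> Tuple[Dict[str, str], List[str]]:
    """Pair semicolon-separated field names with values (illustrative helper).

    Values beyond the named fields are returned separately instead of being
    dropped, since real output may carry more fields than the header names.
    """
    names = header.strip().split(";")
    values = line.strip().split(";")
    if len(values) < len(names):
        raise ValueError(
            f"line has {len(values)} fields, header names {len(names)}"
        )
    record = dict(zip(names, values))
    extras = values[len(names):]
    return record, extras


# Abbreviated header and invented values, for illustration only.
sample_header = "terse_version_3;fio_version;jobname;groupid;error;read_kb"
sample_line = "3;fio-x.y;job1;0;0;1024"
record, extras = terse_to_dict(sample_header, sample_line)
print(record["jobname"], record["read_kb"], extras)
```

All values are kept as strings; converting them (for example the ``_us`` and ``_kb`` fields to integers) is left to the caller.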
  In client/server mode terse output differs from what appears when jobs are run
  locally. Disk utilization data is omitted from the standard terse output and
@@ -4135,7 +4245,7 @@ Fio supports a variety of log file formats, for logging latencies, bandwidth,
  and IOPS. The logs share a common format, which looks like this:
  
      *time* (`msec`), *value*, *data direction*, *block size* (`bytes`),
-    *offset* (`bytes`)
+    *offset* (`bytes`), *command priority*
  
  *Time* for the log entry is always in milliseconds. The *value* logged depends
  on the type of log, it will be one of the following:
@@ -4160,6 +4270,9 @@ The entry's *block size* is always in bytes. The *offset* is the position in byt
  from the start of the file for that particular I/O. The logging of the offset can be
  toggled with :option:`log_offset`.
  
+*Command priority* is 0 for normal priority and 1 for high priority. This is controlled
+by the ioengine specific :option:`cmdprio_percentage`.
+
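A log line in the extended format can be split into its six fields as sketched below in Python. The sketch assumes the offset column is present (that is, :option:`log_offset` is enabled); ``LogEntry`` and ``parse_log_line`` are hypothetical names, and the sample line is invented for illustration.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    time_ms: int         # time of the entry, in milliseconds
    value: int           # meaning depends on the log type
    direction: int       # data direction code
    block_size: int      # bytes
    offset: int          # bytes from the start of the file
    high_priority: bool  # command priority: 0 = normal, 1 = high


def parse_log_line(line: str) -> LogEntry:
    """Parse 'time, value, direction, block size, offset, priority' (sketch)."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 6:
        # Assumption: all six columns are present, including the offset.
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    time_ms, value, direction, block_size, offset, prio = (int(f) for f in fields)
    if prio not in (0, 1):
        raise ValueError(f"command priority must be 0 or 1, got {prio}")
    return LogEntry(time_ms, value, direction, block_size, offset, bool(prio))


# Invented sample line, for illustration only.
print(parse_log_line("1000, 250, 0, 4096, 8192, 1"))
```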
  Fio defaults to logging every individual I/O but when windowed logging is set
  through :option:`log_avg_msec`, either the average (by default) or the maximum
  (:option:`log_max_value` is set) *value* seen over the specified period of time