Document io_uring feature

[fio.git] / fio.1
diff --git a/fio.1 b/fio.1

index e28a1fa7b23699696c465399cf2f899d96ac92e1..2966d9d5f0e35acb45e4f1b099bd6a7c0bd381e1 100644 (file)
--- a/fio.1
+++ b/fio.1
@@ -201,6 +201,8 @@ argument, \fB\-\-cmdhelp\fR will detail the given \fIcommand\fR.
  See the `examples/' directory for inspiration on how to write job files. Note
  the copyright and license requirements currently apply to
  `examples/' files.
+
+Note that the maximum length of a line in the job file is 8192 bytes.
  .SH "JOB FILE PARAMETERS"
  Some parameters take an option of a given type, such as an integer or a
  string. Anywhere a numeric value is required, an arithmetic expression may be
@@ -1751,12 +1753,38 @@ are "contiguous" and the IO depth is not exceeded) before issuing a call to IME.
  Asynchronous read and write using DDN's Infinite Memory Engine (IME). This
  engine will try to stack as much IOs as possible by creating requests for IME.
  FIO will then decide when to commit these requests.
+.TP
+.B libiscsi
+Read and write iscsi lun with libiscsi.
  .SS "I/O engine specific parameters"
  In addition, there are some parameters which are only valid when a specific
  \fBioengine\fR is in use. These are used identically to normal parameters,
  with the caveat that when used on the command line, they must come after the
  \fBioengine\fR that defines them is selected.
  .TP
+.BI (io_uring)hipri
+If this option is set, fio will attempt to use polled IO completions. Normal IO
+completions generate interrupts to signal the completion of IO, polled
+completions do not. Hence they are require active reaping by the application.
+The benefits are more efficient IO for high IOPS scenarios, and lower latencies
+for low queue depth IO.
+.TP
+.BI (io_uring)fixedbufs
+If fio is asked to do direct IO, then Linux will map pages for each IO call, and
+release them when IO is done. If this option is set, the pages are pre-mapped
+before IO is started. This eliminates the need to map and release for each IO.
+This is more efficient, and reduces the IO latency as well.
+.TP
+.BI (io_uring)sqthread_poll
+Normally fio will submit IO by issuing a system call to notify the kernel of
+available items in the SQ ring. If this option is set, the act of submitting IO
+will be done by a polling thread in the kernel. This frees up cycles for fio, at
+the cost of using more CPU in the system.
+.TP
+.BI (io_uring)sqthread_poll_cpu
+When `sqthread_poll` is set, this option provides a way to define which CPU
+should be used for the polling thread.
+.TP
  .BI (libaio)userspace_reap
  Normally, with the libaio engine in use, fio will use the
  \fBio_getevents\fR\|(3) system call to reap newly returned events. With
@@ -2070,8 +2098,15 @@ changing data and the overlapping region has a non-zero size. Setting
  \fBserialize_overlap\fR tells fio to avoid provoking this behavior by explicitly
  serializing in-flight I/Os that have a non-zero overlap. Note that setting
  this option can reduce both performance and the \fBiodepth\fR achieved.
-Additionally this option does not work when \fBio_submit_mode\fR is set to
-offload. Default: false.
+.RS
+.P
+This option only applies to I/Os issued for a single job except when it is
+enabled along with \fBio_submit_mode\fR=offload. In offload mode, fio
+will check for overlap among all I/Os submitted by offload jobs with \fBserialize_overlap\fR
+enabled.
+.P
+Default: false.
+.RE
  .TP
  .BI io_submit_mode \fR=\fPstr
  This option controls how fio submits the I/O to the I/O engine. The default
@@ -2209,6 +2244,22 @@ intention here is to make the order of events consistent. This limits the
  influence of the scheduler compared to replaying multiple blktraces via
  concurrent jobs.
  .TP
+.BI merge_blktrace_scalars \fR=\fPfloat_list
+This is a percentage based option that is index paired with the list of files
+passed to \fBread_iolog\fR. When merging is performed, scale the time of each
+event by the corresponding amount. For example,
+`\-\-merge_blktrace_scalars="50:100"' runs the first trace in halftime and the
+second trace in realtime. This knob is separately tunable from
+\fBreplay_time_scale\fR which scales the trace during runtime and will not
+change the output of the merge unlike this option.
+.TP
+.BI merge_blktrace_iters \fR=\fPfloat_list
+This is a whole number option that is index paired with the list of files
+passed to \fBread_iolog\fR. When merging is performed, run each trace for
+the specified number of iterations. For example,
+`\-\-merge_blktrace_iters="2:1"' runs the first trace for two iterations
+and the second trace for one iteration.
+.TP
  .BI replay_no_stall \fR=\fPbool
  When replaying I/O with \fBread_iolog\fR the default behavior is to
  attempt to respect the timestamps within the log and replay them with the
@@ -2241,11 +2292,12 @@ Unfortunately this also breaks the strict time ordering between multiple
  device accesses.
  .TP
  .BI replay_align \fR=\fPint
-Force alignment of I/O offsets and lengths in a trace to this power of 2
-value.
+Force alignment of the byte offsets in a trace to this value. The value
+must be a power of 2.
  .TP
  .BI replay_scale \fR=\fPint
-Scale sector offsets down by this factor when replaying traces.
+Scale bye offsets down by this factor when replaying traces. Should most
+likely use \fBreplay_align\fR as well.
  .SS "Threads, processes and job synchronization"
  .TP
  .BI replay_skip \fR=\fPstr
@@ -2655,6 +2707,12 @@ steady state assessment criteria. All assessments are carried out using only
  data from the rolling collection window. Threshold limits can be expressed
  as a fixed value or as a percentage of the mean in the collection window.
  .RS
+.P
+When using this feature, most jobs should include the \fBtime_based\fR
+and \fBruntime\fR options or the \fBloops\fR option so that fio does not
+stop running after it has covered the full size of the specified file(s)
+or device(s).
+.RS
  .RS
  .TP
  .B iops
@@ -3297,7 +3355,8 @@ is one long line of values, such as:
                 A description of this job goes here.
  .fi
  .P
-The job description (if provided) follows on a second line.
+The job description (if provided) follows on a second line for terse v2.
+It appears on the same line for other terse versions.
  .P
  To enable terse output, use the \fB\-\-minimal\fR or
  `\-\-output\-format=terse' command line options. The
@@ -3432,6 +3491,11 @@ minimal output v3, separated by semicolons:
  .nf
                 terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
  .fi
+.P
+In client/server mode terse output differs from what appears when jobs are run
+locally. Disk utilization data is omitted from the standard terse output and
+for v3 and later appears on its own separate line at the end of each terse
+reporting cycle.
  .SH JSON OUTPUT
  The \fBjson\fR output format is intended to be both human readable and convenient
  for automated parsing. For the most part its sections mirror those of the
@@ -3561,6 +3625,26 @@ $ fio \-\-read_iolog="<file1>:<file2>" \-\-merge_blktrace_file="<output_file>"
  .P
  Creating only the merged file can be done by passing the command line argument
  \fBmerge-blktrace-only\fR.
+.P
+Scaling traces can be done to see the relative impact of any particular trace
+being slowed down or sped up. \fBmerge_blktrace_scalars\fR takes in a colon
+separated list of percentage scalars. It is index paired with the files passed
+to \fBread_iolog\fR.
+.P
+With scaling, it may be desirable to match the running time of all traces.
+This can be done with \fBmerge_blktrace_iters\fR. It is index paired with
+\fBread_iolog\fR just like \fBmerge_blktrace_scalars\fR.
+.P
+In an example, given two traces, A and B, each 60s long. If we want to see
+the impact of trace A issuing IOs twice as fast and repeat trace A over the
+runtime of trace B, the following can be done:
+.RS
+.P
+$ fio \-\-read_iolog="<trace_a>:"<trace_b>" \-\-merge_blktrace_file"<output_file>" \-\-merge_blktrace_scalars="50:100" \-\-merge_blktrace_iters="2:1"
+.RE
+.P
+This runs trace A at 2x the speed twice for approximately the same runtime as
+a single run of trace B.
  .SH CPU IDLENESS PROFILING
  In some cases, we want to understand CPU overhead in a test. For example, we
  test patches for the specific goodness of whether they reduce CPU usage.
@@ -3806,6 +3890,9 @@ containing two hostnames `h1' and `h2' with IP addresses 192.168.10.120 and
  /mnt/nfs/fio/192.168.10.121.fileio.tmp
  .PD
  .RE
+.P
+Terse output in client/server mode will differ slightly from what is produced
+when fio is run in stand-alone mode. See the terse output section for details.
  .SH AUTHORS
  .B fio
  was written by Jens Axboe <axboe@kernel.dk>.