io_u: Fix bad interaction with --openfiles and non-sequential file selection policy

diff --git a/fio.1 b/fio.1
index f15194ff78c464913c6cf7cc1e110f5817b0d2de..45ec8d43dcbf8172318f91d12c33005037386bd9 100644
--- a/fio.1
+++ b/fio.1
@@ -1462,9 +1462,31 @@ starting I/O if the platform and file type support it. Defaults to true.
  This will be ignored if \fBpre_read\fR is also specified for the
  same job.
  .TP
-.BI sync \fR=\fPbool
-Use synchronous I/O for buffered writes. For the majority of I/O engines,
-this means using O_SYNC. Default: false.
+.BI sync \fR=\fPstr
+Whether to use synchronous I/O for writes, and if so which type. The allowed
+values are:
+.RS
+.RS
+.TP
+.B none
+Do not use synchronous IO, the default.
+.TP
+.B 0
+Same as \fBnone\fR.
+.TP
+.B sync
+Use synchronous file IO. For the majority of I/O engines,
+this means using O_SYNC.
+.TP
+.B 1
+Same as \fBsync\fR.
+.TP
+.B dsync
+Use synchronous data IO. For the majority of I/O engines,
+this means using O_DSYNC.
+.PD
+.RE
+.RE
  .TP
  .BI iomem \fR=\fPstr "\fR,\fP mem" \fR=\fPstr
  Fio can use various types of memory as the I/O unit buffer. The allowed
@@ -1561,7 +1583,8 @@ if \fBsize\fR is set to 20GiB and \fBio_size\fR is set to 5GiB, fio
  will perform I/O within the first 20GiB but exit when 5GiB have been
  done. The opposite is also possible \-\- if \fBsize\fR is set to 20GiB,
  and \fBio_size\fR is set to 40GiB, then fio will do 40GiB of I/O within
-the 0..20GiB region.
+the 0..20GiB region. The value can also be given as a percentage: \fBio_size\fR=N%.
+In this case the amount of I/O is N% of the \fBsize\fR value.
  .TP
  .BI filesize \fR=\fPirange(int)
  Individual file sizes. May be a range, in which case fio will select sizes
@@ -1674,11 +1697,6 @@ to get desired CPU usage, as the cpuload only loads a
  single CPU at the desired rate. A job never finishes unless there is
  at least one non-cpuio job.
  .TP
-.B guasi
-The GUASI I/O engine is the Generic Userspace Asynchronous Syscall
-Interface approach to async I/O. See \fIhttp://www.xmailserver.org/guasi-lib.html\fR
-for more info on GUASI.
-.TP
  .B rdma
  The RDMA I/O engine supports both RDMA memory semantics
  (RDMA_WRITE/RDMA_READ) and channel semantics (Send/Recv) for the
@@ -1808,6 +1826,13 @@ Read and write iscsi lun with libiscsi.
  .TP
  .B nbd
  Synchronous read and write a Network Block Device (NBD).
+.TP
+.B libcufile
+I/O engine supporting libcufile synchronous access to nvidia-fs and a
+GPUDirect Storage-supported filesystem. This engine performs
+I/O without transferring buffers between user-space and the kernel,
+unless \fBverify\fR is set or \fBcuda_io\fR is \fBposix\fR. \fBiomem\fR must
+not be \fBcudamalloc\fR. This ioengine defines engine specific options.
  .SS "I/O engine specific parameters"
  In addition, there are some parameters which are only valid when a specific
  \fBioengine\fR is in use. These are used identically to normal parameters,
@@ -2121,7 +2146,36 @@ Example URIs:
  \fInbd+unix:///?socket=/tmp/socket\fR
  .TP
  \fInbds://tlshost/exportname\fR
-
+.RE
+.RE
+.TP
+.BI (libcufile)gpu_dev_ids\fR=\fPstr
+Specify the GPU IDs to use with CUDA. This is a colon-separated list of integers.
+GPUs are assigned to workers round-robin. Default is 0.
+.TP
+.BI (libcufile)cuda_io\fR=\fPstr
+Specify the type of I/O to use with CUDA. This option
+takes the following values:
+.RS
+.RS
+.TP
+.B cufile (default)
+Use libcufile and nvidia-fs. This option performs I/O directly
+between a GPUDirect Storage filesystem and GPU buffers,
+avoiding use of a bounce buffer. If \fBverify\fR is set,
+cudaMemcpy is used to copy verification data between RAM and GPU(s).
+Verification data is copied from RAM to GPU before a write
+and from GPU to RAM after a read.
+\fBdirect\fR must be 1.
+.TP
+.BI posix
+Use POSIX to perform I/O with a RAM buffer, and use
+cudaMemcpy to transfer data between RAM and the GPU(s).
+Data is copied from GPU to RAM before a write and copied
+from RAM to GPU after a read. \fBverify\fR does not affect
+the use of cudaMemcpy.
+.RE
+.RE
  .SS "I/O depth"
  .TP
  .BI iodepth \fR=\fPint
@@ -2219,7 +2273,7 @@ has a bit of extra overhead, especially for lower queue depth I/O where it
  can increase latencies. The benefit is that fio can manage submission rates
  independently of the device completion rates. This avoids skewed latency
  reporting if I/O gets backed up on the device side (the coordinated omission
-problem).
+problem). Note that this option cannot reliably be used with async IO engines.
  .SS "I/O rate"
  .TP
  .BI thinktime \fR=\fPtime
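
For illustration, a job file exercising the options documented in the hunks above might look like the sketch below. The job names, the /mnt/test and /mnt/gds paths, the sizes and the block sizes are assumptions chosen for the example and do not come from the patch. The second job additionally assumes a fio build with libcufile support and a GPUDirect Storage-capable filesystem.

```ini
; Hypothetical job file: names, paths and sizes are illustrative only.
[global]
size=20g
bs=4k

; Writes opened with O_DSYNC on most engines (sync=dsync).
; io_size=50% is taken relative to size, so this job should stop
; after about 10GiB of I/O within the 20GiB region.
[dsync-write]
ioengine=psync
rw=write
filename=/mnt/test/fio.dsync
sync=dsync
io_size=50%

; GPU reads through libcufile. cuda_io=cufile requires direct=1,
; and iomem must not be cudamalloc (it is left unset here).
; The two workers are assigned GPUs 0 and 1 round-robin.
[gpu-read]
stonewall
ioengine=libcufile
rw=read
bs=1m
filename=/mnt/gds/fio.gpu
direct=1
cuda_io=cufile
gpu_dev_ids=0:1
numjobs=2
```

Setting cuda_io=posix in the second job would instead route I/O through a RAM buffer and cudaMemcpy, as described in the option text above.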