X-Git-Url: https://git.kernel.dk/?a=blobdiff_plain;f=fio.1;h=45ec8d43dcbf8172318f91d12c33005037386bd9;hb=4ef1562a013513fd0a0048cca4048f28d308a90f;hp=f15194ff78c464913c6cf7cc1e110f5817b0d2de;hpb=d4e74fda98b60dc175b4114492fcc7c21c617ddd;p=fio.git

diff --git a/fio.1 b/fio.1
index f15194ff..45ec8d43 100644
--- a/fio.1
+++ b/fio.1
@@ -1462,9 +1462,31 @@ starting I/O if the platform and file type support it. Defaults to true.
 This will be ignored if \fBpre_read\fR is also specified for the same job.
 .TP
-.BI sync \fR=\fPbool
-Use synchronous I/O for buffered writes. For the majority of I/O engines,
-this means using O_SYNC. Default: false.
+.BI sync \fR=\fPstr
+Whether to use synchronous I/O for writes, and if so, of what type. The
+allowed values are:
+.RS
+.RS
+.TP
+.B none
+Do not use synchronous I/O. This is the default.
+.TP
+.B 0
+Same as \fBnone\fR.
+.TP
+.B sync
+Use synchronous file I/O. For the majority of I/O engines,
+this means using O_SYNC.
+.TP
+.B 1
+Same as \fBsync\fR.
+.TP
+.B dsync
+Use synchronous data I/O. For the majority of I/O engines,
+this means using O_DSYNC.
+.PD
+.RE
+.RE
 .TP
 .BI iomem \fR=\fPstr "\fR,\fP mem" \fR=\fPstr
 Fio can use various types of memory as the I/O unit buffer. The allowed
@@ -1561,7 +1583,8 @@ if \fBsize\fR is set to 20GiB and \fBio_size\fR is set to 5GiB, fio
 will perform I/O within the first 20GiB but exit when 5GiB have been
 done. The opposite is also possible \-\- if \fBsize\fR is set to 20GiB,
 and \fBio_size\fR is set to 40GiB, then fio will do 40GiB of I/O within
-the 0..20GiB region.
+the 0..20GiB region. The value can also be given as a percentage:
+\fBio_size\fR=N%. In this case the amount of I/O performed is N% of \fBsize\fR.
 .TP
 .BI filesize \fR=\fPirange(int)
 Individual file sizes. May be a range, in which case fio will select sizes
@@ -1674,11 +1697,6 @@ to get desired CPU usage, as the cpuload only loads a single CPU at the
 desired rate. A job never finishes unless there is at least one
 non-cpuio job.
 .TP
-.B guasi
-The GUASI I/O engine is the Generic Userspace Asynchronous Syscall
-Interface approach to async I/O. See \fIhttp://www.xmailserver.org/guasi-lib.html\fR
-for more info on GUASI.
-.TP
 .B rdma
 The RDMA I/O engine supports both RDMA memory semantics
 (RDMA_WRITE/RDMA_READ) and channel semantics (Send/Recv) for the
@@ -1808,6 +1826,13 @@ Read and write iscsi lun with libiscsi.
 .TP
 .B nbd
 Synchronous read and write a Network Block Device (NBD).
+.TP
+.B libcufile
+I/O engine supporting libcufile synchronous access to nvidia-fs and a
+GPUDirect Storage-supported filesystem. This engine performs
+I/O without transferring buffers between user-space and the kernel,
+unless \fBverify\fR is set or \fBcuda_io\fR is \fBposix\fR. \fBiomem\fR must
+not be \fBcudamalloc\fR. This ioengine defines engine-specific options.
 .SS "I/O engine specific parameters"
 In addition, there are some parameters which are only valid when a specific
 \fBioengine\fR is in use. These are used identically to normal parameters,
@@ -2121,7 +2146,36 @@ Example URIs:
 \fInbd+unix:///?socket=/tmp/socket\fR
 .TP
 \fInbds://tlshost/exportname\fR
-
+.RE
+.RE
+.TP
+.BI (libcufile)gpu_dev_ids\fR=\fPstr
+Specify the GPU IDs to use with CUDA. This is a colon-separated list of
+integers. GPUs are assigned to workers round-robin. Default is 0.
+.TP
+.BI (libcufile)cuda_io\fR=\fPstr
+Specify the type of I/O to use with CUDA. This option
+takes the following values:
+.RS
+.RS
+.TP
+.B cufile (default)
+Use libcufile and nvidia-fs. This option performs I/O directly
+between a GPUDirect Storage filesystem and GPU buffers,
+avoiding use of a bounce buffer. If \fBverify\fR is set,
+cudaMemcpy is used to copy verification data between RAM and GPU(s).
+Verification data is copied from RAM to GPU before a write
+and from GPU to RAM after a read.
+\fBdirect\fR must be 1.
+.TP
+.B posix
+Use POSIX to perform I/O with a RAM buffer, and use
+cudaMemcpy to transfer data between RAM and the GPU(s).
+Data is copied from GPU to RAM before a write and copied
+from RAM to GPU after a read. \fBverify\fR does not affect
+the use of cudaMemcpy.
+.RE
+.RE
 .SS "I/O depth"
 .TP
 .BI iodepth \fR=\fPint
@@ -2219,7 +2273,7 @@ has a bit of extra overhead, especially for lower queue depth I/O where it can
 increase latencies. The benefit is that fio can manage submission rates
 independently of the device completion rates. This avoids skewed latency
 reporting if I/O gets backed up on the device side (the coordinated omission
-problem). Note that this option cannot reliably be used with async I/O engines.
+problem). Note that this option cannot reliably be used with async I/O engines.
 .SS "I/O rate"
 .TP
 .BI thinktime \fR=\fPtime
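
The option changes documented in this diff (the \fBsync\fR string values, percentage \fBio_size\fR, and the libcufile engine parameters) can be exercised from an ordinary fio job file. The sketch below is illustrative only: the filenames, sizes, and GPU IDs are placeholders chosen for the example, not values taken from this change.

```ini
; Hypothetical job file exercising the options documented above.
; Filenames, sizes, and GPU IDs are placeholders.

; sync=dsync opens the file with O_DSYNC on most I/O engines.
; io_size=25% caps the amount transferred at 25% of size (here 5 GiB),
; while I/O offsets stay within the first 20 GiB.
[dsync-writes]
rw=write
bs=4k
filename=/tmp/fio-test.dat
sync=dsync
size=20g
io_size=25%

; libcufile with cuda_io=cufile requires direct=1 and a file on a
; GPUDirect Storage-supported filesystem; gpu_dev_ids is a
; colon-separated list of GPU IDs assigned to workers round-robin.
[gds-read]
ioengine=libcufile
filename=/mnt/gds/testfile
rw=read
bs=1m
direct=1
cuda_io=cufile
gpu_dev_ids=0:1
```

Note that per the text above, setting \fBverify\fR on the libcufile job would reintroduce cudaMemcpy copies between RAM and the GPU, and \fBiomem\fR=\fBcudamalloc\fR is not permitted with this engine.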