X-Git-Url: https://git.kernel.dk/?a=blobdiff_plain;f=HOWTO;h=7e46cee0eceac9f8d85c5b6191b3335dd72c2e14;hb=dee9b29bef5bc344815d7a53dda6bb21426f2bfa;hp=e0403b0803f04cb04ef7a14832dd39b3803c34d8;hpb=f0ed01ed095cf1ca7c1945a5a0267e8f73b7b4a9;p=fio.git

diff --git a/HOWTO b/HOWTO
index e0403b08..7e46cee0 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1677,10 +1677,28 @@ Buffers and memory
 	This will be ignored if :option:`pre_read` is also specified for the same
 	job.
 
-.. option:: sync=bool
+.. option:: sync=str
+
+	Whether, and what type of, synchronous I/O to use for writes.  The allowed
+	values are:
+
+		**none**
+			Do not use synchronous IO, the default.
+
+		**0**
+			Same as **none**.
+
+		**sync**
+			Use synchronous file IO.  For the majority of I/O engines,
+			this means using O_SYNC.
+
+		**1**
+			Same as **sync**.
+
+		**dsync**
+			Use synchronous data IO.  For the majority of I/O engines,
+			this means using O_DSYNC.
 
-	Use synchronous I/O for buffered writes. For the majority of I/O engines,
-	this means using O_SYNC. Default: false.
 
 .. option:: iomem=str, mem=str
 
@@ -1901,14 +1919,6 @@ I/O engine
 		single CPU at the desired rate. A job never finishes unless there is
 		at least one non-cpuio job.
 
-	**guasi**
-		The GUASI I/O engine is the Generic Userspace Asynchronous Syscall
-		Interface approach to async I/O. See
-
-		http://www.xmailserver.org/guasi-lib.html
-
-		for more info on GUASI.
-
 	**rdma**
 		The RDMA I/O engine supports both RDMA memory semantics
 		(RDMA_WRITE/RDMA_READ) and channel semantics (Send/Recv) for the
@@ -2038,6 +2048,14 @@ I/O engine
 	**nbd**
 		Read and write a Network Block Device (NBD).
 
+	**libcufile**
+		I/O engine supporting libcufile synchronous access to nvidia-fs and a
+		GPUDirect Storage-supported filesystem. This engine performs
+		I/O without transferring buffers between user-space and the kernel,
+		unless :option:`verify` is set or :option:`cuda_io` is `posix`.
+		:option:`iomem` must not be `cudamalloc`. This ioengine defines
+		engine-specific options.
+
 I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -2388,6 +2406,28 @@ with the caveat that when used on the command line, they must come after the
 		nbd+unix:///?socket=/tmp/socket
 		nbds://tlshost/exportname
 
+.. option:: gpu_dev_ids=str : [libcufile]
+
+	Specify the GPU IDs to use with CUDA. This is a colon-separated list of
+	ints. GPUs are assigned to workers round-robin. Default is 0.
+
+.. option:: cuda_io=str : [libcufile]
+
+	Specify the type of I/O to use with CUDA. Default is **cufile**.
+
+	**cufile**
+		Use libcufile and nvidia-fs. This option performs I/O directly
+		between a GPUDirect Storage filesystem and GPU buffers,
+		avoiding use of a bounce buffer. If :option:`verify` is set,
+		cudaMemcpy is used to copy verification data between RAM and GPU.
+		Verification data is copied from RAM to GPU before a write
+		and from GPU to RAM after a read. :option:`direct` must be 1.
+	**posix**
+		Use POSIX to perform I/O with a RAM buffer, and use cudaMemcpy
+		to transfer data between RAM and the GPUs. Data is copied from
+		GPU to RAM before a write and copied from RAM to GPU after a
+		read. :option:`verify` does not affect use of cudaMemcpy.
+
 I/O depth
 ~~~~~~~~~
 
@@ -2482,7 +2522,8 @@ I/O depth
 	can increase latencies. The benefit is that fio can manage submission rates
 	independently of the device completion rates. This avoids skewed latency
 	reporting if I/O gets backed up on the device side (the coordinated omission
-	problem).
+	problem). Note that this option cannot reliably be used with async IO
+	engines.
 
 
 I/O rate
@@ -2861,15 +2902,10 @@ Threads, processes and job synchronization
 	``flow=8`` and another job has ``flow=-1``, then there will be a roughly
 	1:8 ratio in how much one runs vs the other.
 
-.. option:: flow_watermark=int
-
-	The maximum value that the absolute value of the flow counter is allowed to
-	reach before the job must wait for a lower value of the counter.
-
 .. option:: flow_sleep=int
 
-	The period of time, in microseconds, to wait after the flow watermark has
-	been exceeded before retrying operations.
+	The period of time, in microseconds, to wait after the flow counter
+	has exceeded its proportion before retrying operations.
 
 .. option:: stonewall, wait_for_previous
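The new ``sync=str`` values in this patch can be exercised with a small job file. A sketch for illustration only (the job name, file path, and sizes are arbitrary assumptions, not taken from the patch):

```ini
; dsync-writes: hypothetical job exercising the new sync=dsync value.
; On most I/O engines this opens the file with O_DSYNC, so each write
; completes only once the data (though not all metadata) is stable.
[dsync-writes]
ioengine=psync
rw=write
bs=4k
size=16m
filename=/tmp/fio-sync-test
sync=dsync
```

On the command line the equivalent is ``--sync=dsync``; per the patch, ``sync=1`` remains an alias for ``sync=sync``, so existing boolean-style job files keep working.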
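The libcufile engine options documented in this patch combine into a job file along these lines. A hedged sketch: the mount point, GPU IDs, and sizes are assumptions for illustration, and a GPUDirect Storage-capable filesystem is required:

```ini
; gds-read: hypothetical libcufile job.
; cuda_io=cufile moves data directly between storage and GPU buffers,
; so direct=1 is required, as the cuda_io documentation states.
; gpu_dev_ids is a colon-separated list; workers get GPUs round-robin.
[gds-read]
ioengine=libcufile
cuda_io=cufile
gpu_dev_ids=0:1
directory=/mnt/gds
direct=1
rw=read
bs=1m
size=1g
```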