X-Git-Url: https://git.kernel.dk/?p=fio.git;a=blobdiff_plain;f=HOWTO;h=0c5b7109df30c55c35bbc2d8441be0ee97d3ecb8;hp=acb9e97fc35bf6ca790d45bb93b7be0260509a8f;hb=f88817479014b87bfed3a20f9fe7dae0efb33dee;hpb=804c08390492023293f7febb71172e5fe0aefbf5 diff --git a/HOWTO b/HOWTO index acb9e97f..0c5b7109 100644 --- a/HOWTO +++ b/HOWTO @@ -163,12 +163,12 @@ Command line options .. option:: --readonly - Turn on safety read-only checks, preventing writes. The ``--readonly`` - option is an extra safety guard to prevent users from accidentally starting - a write workload when that is not desired. Fio will only write if - `rw=write/randwrite/rw/randrw` is given. This extra safety net can be used - as an extra precaution as ``--readonly`` will also enable a write check in - the I/O engine core to prevent writes due to unknown user space bug(s). + Turn on safety read-only checks, preventing writes and trims. The + ``--readonly`` option is an extra safety guard to prevent users from + accidentally starting a write or trim workload when that is not desired. + Fio will only modify the device under test if + `rw=write/randwrite/rw/randrw/trim/randtrim/trimwrite` is given. This + safety net can be used as an extra precaution. .. option:: --eta=when @@ -194,7 +194,10 @@ Command line options Force a full status dump of cumulative (from job start) values at `time` intervals. This option does *not* provide per-period measurements. So values such as bandwidth are running averages. When the time unit is omitted, - `time` is interpreted in seconds. + `time` is interpreted in seconds. Note that using this option with + ``--output-format=json`` will yield output that technically isn't valid + json, since the output will be collated sets of valid json. It will need + to be split into valid sets of json after the run. .. option:: --section=name @@ -283,7 +286,8 @@ Command line options .. option:: --aux-path=path - Use this `path` for fio state generated files. + Use the directory specified by `path` for generated state files instead + of the current working directory. Any parameters following the options will be assumed to be job files, unless they match a job file parameter. Multiple job files can be listed and each job @@ -748,12 +752,15 @@ Target file/device assigned equally distributed to job clones created by :option:`numjobs` as long as they are using generated filenames. If specific `filename(s)` are set fio will use the first listed directory, and thereby matching the - `filename` semantic which generates a file each clone if not specified, but - let all clones use the same if set. + `filename` semantic (which generates a file for each clone if not + specified, but lets all clones use the same file if set). See the :option:`filename` option for information on how to escape "``:``" and "``\``" characters within the directory path itself. + Note: To control the directory fio will use for internal state files + use :option:`--aux-path`. + .. option:: filename=str Fio normally makes up a `filename` based on the job name, thread number, and @@ -948,18 +955,92 @@ Target file/device Unlink job files after each iteration or loop. Default: false. -.. option:: zonesize=int +.. option:: zonemode=str + + Accepted values are: - Divide a file into zones of the specified size. See :option:`zoneskip`. + **none** + The :option:`zonerange`, :option:`zonesize` and + :option:`zoneskip` parameters are ignored. + **strided** + I/O happens in a single zone until + :option:`zonesize` bytes have been transferred. 
+			After that number of bytes has been
+			transferred, processing of the next zone
+			starts.
+		**zbd**
+			Zoned block device mode. I/O happens
+			sequentially in each zone, even if random I/O
+			has been selected. Random I/O happens across
+			all zones instead of being restricted to a
+			single zone. The :option:`zoneskip` parameter
+			is ignored. :option:`zonerange` and
+			:option:`zonesize` must be identical.
 
 .. option:: zonerange=int
 
-	Give size of an I/O zone. See :option:`zoneskip`.
+	Size of a single zone. See also :option:`zonesize` and
+	:option:`zoneskip`.
+
+.. option:: zonesize=int
+
+	For :option:`zonemode` =strided, this is the number of bytes to
+	transfer before skipping :option:`zoneskip` bytes. If this parameter
+	is smaller than :option:`zonerange` then only a fraction of each zone
+	with :option:`zonerange` bytes will be accessed. If this parameter is
+	larger than :option:`zonerange` then each zone will be accessed
+	multiple times before skipping to the next zone.
+
+	For :option:`zonemode` =zbd, this is the size of a single zone. The
+	:option:`zonerange` parameter is ignored in this mode.
 
 .. option:: zoneskip=int
 
-	Skip the specified number of bytes when :option:`zonesize` data has been
-	read. The two zone options can be used to only do I/O on zones of a file.
+	For :option:`zonemode` =strided, the number of bytes to skip after
+	:option:`zonesize` bytes of data have been transferred. This parameter
+	must be zero for :option:`zonemode` =zbd.
+
+.. option:: read_beyond_wp=bool
+
+	This parameter applies to :option:`zonemode` =zbd only.
+
+	Zoned block devices are block devices that consist of multiple zones.
+	Each zone has a type, e.g. conventional or sequential. A conventional
+	zone can be written at any offset that is a multiple of the block
+	size. Sequential zones must be written sequentially. The position at
+	which a write must occur is called the write pointer. A zoned block
+	device can be either drive managed, host managed or host aware. For
+	host managed devices the host must ensure that writes happen
+	sequentially. Fio recognizes host managed devices and serializes
+	writes to sequential zones for these devices.
+
+	If a read occurs in a sequential zone beyond the write pointer then
+	the zoned block device will complete the read without reading any data
+	from the storage medium. Since such reads lead to unrealistically high
+	bandwidth and IOPS numbers, fio only reads beyond the write pointer if
+	explicitly told to do so. Default: false.
+
+.. option:: max_open_zones=int
+
+	When running a random write test across an entire drive many more
+	zones will be open than in a typical application workload. Hence this
+	command line option allows one to limit the number of open zones. The
+	number of open zones is defined as the number of zones to which write
+	commands are issued.
+
+.. option:: zone_reset_threshold=float
+
+	A number between zero and one that indicates the ratio of logical
+	blocks with data to the total number of logical blocks in the test
+	above which zones should be reset periodically.
+
+.. option:: zone_reset_frequency=float
+
+	A number between zero and one that indicates how often a zone reset
+	should be issued if the zone reset threshold has been exceeded. A zone
+	reset is submitted after each (1 / zone_reset_frequency) write
+	requests. This and the previous parameter can be used to simulate
+	garbage collection activity.
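A minimal sketch of how the zoned options above might be combined, assuming a
zoned block device at the placeholder path /dev/sdz whose zone size is 256m;
all values below are illustrative placeholders, not recommendations::

	; hypothetical zbd job -- device path and sizes are placeholders
	[zbd-randwrite]
	filename=/dev/sdz
	direct=1
	rw=randwrite
	bs=256k
	zonemode=zbd
	zonesize=256m
	zonerange=256m
	max_open_zones=16
	zone_reset_threshold=0.5
	zone_reset_frequency=0.01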
 
 
 I/O type
 --------
 
@@ -991,13 +1072,15 @@ I/O type
 	**write**
 		Sequential writes.
 	**trim**
-		Sequential trims (Linux block devices only).
+		Sequential trims (Linux block devices and SCSI
+		character devices only).
 	**randread**
 		Random reads.
 	**randwrite**
 		Random writes.
 	**randtrim**
-		Random trims (Linux block devices only).
+		Random trims (Linux block devices and SCSI
+		character devices only).
 	**rw,readwrite**
 		Sequential mixed reads and writes.
 	**randrw**
@@ -1329,7 +1412,9 @@ I/O type
 	and that some blocks may be read/written more than once. If this option is
 	used with :option:`verify` and multiple blocksizes (via :option:`bsrange`),
 	only intact blocks are verified, i.e., partially-overwritten blocks are
-	ignored.
+	ignored. With an async I/O engine and an I/O depth > 1, it is possible for
+	the same block to be overwritten, which can cause verification errors. Either
+	do not use norandommap in this case, or also use the lfsr random generator.
 
 .. option:: softrandommap=bool
 
@@ -1429,7 +1514,7 @@ Block size
 	If you want a workload that has 50% 2k reads and 50% 4k reads, while having
 	90% 4k writes and 10% 8k writes, you would specify::
 
-		bssplit=2k/50:4k/50,4k/90,8k/10
+		bssplit=2k/50:4k/50,4k/90:8k/10
 
 	Fio supports defining up to 64 different weights for each data direction.
 
@@ -1746,7 +1831,8 @@ I/O engine
 		ioctl, or if the target is an sg character device we use
 		:manpage:`read(2)` and :manpage:`write(2)` for asynchronous I/O.
 		Requires :option:`filename` option to specify either block or
-		character devices.
+		character devices. This engine supports trim operations.
+		The sg engine includes engine specific options.
 
 	**null**
 		Doesn't transfer any data, just pretends to. This is mainly used to
@@ -1820,6 +1906,15 @@ I/O engine
 		(RBD) via librbd without the need to use the kernel rbd driver. This
 		ioengine defines engine specific options.
 
+	**http**
+		I/O engine supporting GET/PUT requests over HTTP(S) with libcurl to
+		a WebDAV or S3 endpoint. This ioengine defines engine specific options.
+
+		This engine only supports direct I/O with iodepth=1; you need to scale
+		this via numjobs. blocksize defines the size of the objects to be created.
+
+		TRIM is translated to object deletion.
+
 	**gfapi**
 		Using GlusterFS libgfapi sync interface to direct access to GlusterFS
 		volumes without having to go through FUSE. This ioengine
@@ -1853,12 +1948,12 @@ I/O engine
 
 	**pmemblk**
 		Read and write using filesystem DAX to a file on a filesystem
-		mounted with DAX on a persistent memory device through the NVML
+		mounted with DAX on a persistent memory device through the PMDK
 		libpmemblk library.
 
 	**dev-dax**
 		Read and write using device DAX to a persistent memory device (e.g.,
-		/dev/dax0.0) through the NVML libpmem library.
+		/dev/dax0.0) through the PMDK libpmem library.
 
 	**external**
 		Prefix to specify loading an external I/O engine object file. Append
@@ -1874,9 +1969,25 @@ I/O engine
 
 	**libpmem**
 		Read and write using mmap I/O to a file on a filesystem
-		mounted with DAX on a persistent memory device through the NVML
+		mounted with DAX on a persistent memory device through the PMDK
 		libpmem library.
 
+	**ime_psync**
+		Synchronous read and write using DDN's Infinite Memory Engine (IME).
+		This engine is very basic and issues calls to IME whenever an IO is
+		queued.
+
+	**ime_psyncv**
+		Synchronous read and write using DDN's Infinite Memory Engine (IME).
+		This engine uses iovecs and will try to stack as many I/Os as possible
+		(if the I/Os are "contiguous" and the I/O depth is not exceeded)
+		before issuing a call to IME.
+
+	**ime_aio**
+		Asynchronous read and write using DDN's Infinite Memory Engine (IME).
+		This engine will try to stack as many I/Os as possible by creating
+		requests for IME. Fio will then decide when to commit these requests.
+
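A minimal sketch of how the sg engine's trim support noted above might be
exercised, assuming a placeholder SCSI generic device at /dev/sg1::

	; hypothetical sg trim job -- the device path is a placeholder
	[sg-trim]
	ioengine=sg
	filename=/dev/sg1
	rw=trim
	bs=64k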
 
 I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -2068,6 +2179,86 @@ with the caveat that when used on the command line, they must come after the
 	multiple paths exist between the client and the server or in certain loopback
 	configurations.
 
+.. option:: readfua=bool : [sg]
+
+	With the readfua option set to 1, read operations include
+	the force unit access (fua) flag. Default is 0.
+
+.. option:: writefua=bool : [sg]
+
+	With the writefua option set to 1, write operations include
+	the force unit access (fua) flag. Default is 0.
+
+.. option:: sg_write_mode=str : [sg]
+
+	Specify the type of write commands to issue. This option can take three values:
+
+	**write**
+		This is the default where write opcodes are issued as usual.
+	**verify**
+		Issue WRITE AND VERIFY commands. The BYTCHK bit is set to 0. This
+		directs the device to carry out a medium verification with no data
+		comparison. The writefua option is ignored with this selection.
+	**same**
+		Issue WRITE SAME commands. This transfers a single block to the device
+		and writes this same block of data to a contiguous sequence of LBAs
+		beginning at the specified offset. fio's block size parameter specifies
+		the amount of data written with each command. However, the amount of data
+		actually transferred to the device is equal to the device's block
+		(sector) size. For a device with 512 byte sectors, blocksize=8k will
+		write 16 sectors with each command. fio will still generate 8k of data
+		for each command but only the first 512 bytes will be used and
+		transferred to the device. The writefua option is ignored with this
+		selection.
+
+.. option:: http_host=str : [http]
+
+	Hostname to connect to. For S3, this could be the bucket hostname.
+	Default is **localhost**.
+
+.. option:: http_user=str : [http]
+
+	Username for HTTP authentication.
+
+.. option:: http_pass=str : [http]
+
+	Password for HTTP authentication.
+
+.. option:: https=str : [http]
+
+	Enable HTTPS instead of http. *on* enables HTTPS; *insecure*
+	will enable HTTPS, but disable SSL peer verification (use with
+	caution!). Default is **off**.
+
+.. option:: http_mode=str : [http]
+
+	Which HTTP access mode to use: *webdav*, *swift*, or *s3*.
+	Default is **webdav**.
+
+.. option:: http_s3_region=str : [http]
+
+	The S3 region/zone string.
+	Default is **us-east-1**.
+
+.. option:: http_s3_key=str : [http]
+
+	The S3 secret key.
+
+.. option:: http_s3_keyid=str : [http]
+
+	The S3 key/access id.
+
+.. option:: http_swift_auth_token=str : [http]
+
+	The Swift auth token. See the example configuration file on how
+	to retrieve this.
+
+.. option:: http_verbose=int : [http]
+
+	Enable verbose requests from libcurl. Useful for debugging. 1
+	turns on verbose logging from libcurl, 2 additionally enables
+	HTTP IO tracing. Default is **0**.
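A minimal sketch combining the http engine options above for an S3 target;
the hostname, region, credentials, and sizes are placeholders and must be
replaced with real values::

	; hypothetical S3 job -- host, region and keys are placeholders
	[s3-put]
	ioengine=http
	rw=write
	bs=4m
	size=64m
	iodepth=1
	numjobs=4
	https=on
	http_mode=s3
	http_host=mybucket.s3.amazonaws.com
	http_s3_region=us-east-1
	http_s3_keyid=AKIAEXAMPLE
	http_s3_key=mysecretkey

As documented above, the engine itself is limited to iodepth=1, so any
parallelism has to come from numjobs, and the block size determines the size
of the objects that are created.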
 
 I/O depth
 ~~~~~~~~~
 
@@ -2289,6 +2480,16 @@ I/O replay
 	:manpage:`blktrace(8)` for how to capture such logging data. For blktrace
 	replay, the file needs to be turned into a blkparse binary data file first
 	(``blkparse -o /dev/null -d file_for_fio.bin``).
+	You can specify a number of files by separating the names with a ':'
+	character. See the :option:`filename` option for information on how to
+	escape ':' and '\' characters within the file names. These files will
+	be sequentially assigned to job clones created by :option:`numjobs`.
+
+.. option:: read_iolog_chunked=bool
+
+	Determines how the iolog is read. If false (the default), the entire
+	:option:`read_iolog` will be read at once. If true, the iolog will be read
+	gradually. Useful when the iolog is very large, or is being generated.
 
 .. option:: replay_no_stall=bool
 
@@ -2299,6 +2500,14 @@ I/O replay
 	still respecting ordering. The result is the same I/O pattern to a given
 	device, but different timings.
 
+.. option:: replay_time_scale=int
+
+	When replaying I/O with :option:`read_iolog`, fio will honor the
+	original timing in the trace. With this option, it's possible to scale
+	the time. It's a percentage option; if set to 50 it means run at 50% of
+	the original I/O rate in the trace. If set to 200, run at twice the
+	original I/O rate. Defaults to 100.
+
 .. option:: replay_redirect=str
 
 	While replaying I/O patterns using :option:`read_iolog` the default behavior
@@ -2326,6 +2535,14 @@ I/O replay
 
 	Scale sector offsets down by this factor when replaying traces.
 
+.. option:: replay_skip=str
+
+	Sometimes it's useful to skip certain I/O types in a replay trace.
+	This could be, for instance, eliminating the writes in the trace, or
+	not replaying the trims/discards if you are redirecting to a device
+	that doesn't support them. This option takes a comma separated list
+	of read, write, trim, and sync.
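A minimal sketch of a trace replay combining the options above, assuming a
placeholder blkparse binary trace file and a placeholder target device::

	; hypothetical replay job -- trace.bin and /dev/sdz are placeholders
	[replay]
	read_iolog=trace.bin
	replay_redirect=/dev/sdz
	replay_time_scale=50
	replay_skip=trim,sync

Here the trace is replayed at half of its original I/O rate, and any trims
and syncs in the trace are skipped.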
 
 
 Threads, processes and job synchronization
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -2365,24 +2582,27 @@ Threads, processes and job synchronization
 
 	Set the I/O priority class. See man :manpage:`ionice(1)`.
 
-.. option:: cpumask=int
-
-	Set the CPU affinity of this job. The parameter given is a bit mask of
-	allowed CPUs the job may run on. So if you want the allowed CPUs to be 1
-	and 5, you would pass the decimal value of (1 << 1 | 1 << 5), or 34. See man
-	:manpage:`sched_setaffinity(2)`. This may not work on all supported
-	operating systems or kernel versions. This option doesn't work well for a
-	higher CPU count than what you can store in an integer mask, so it can only
-	control cpus 1-32. For boxes with larger CPU counts, use
-	:option:`cpus_allowed`.
-
 .. option:: cpus_allowed=str
 
 	Controls the same options as :option:`cpumask`, but accepts a textual
-	specification of the permitted CPUs instead. So to use CPUs 1 and 5 you
-	would specify ``cpus_allowed=1,5``. This option also allows a range of CPUs
-	to be specified -- say you wanted a binding to CPUs 1, 5, and 8 to 15, you
-	would set ``cpus_allowed=1,5,8-15``.
+	specification of the permitted CPUs instead and CPUs are indexed from 0. So
+	to use CPUs 0 and 5 you would specify ``cpus_allowed=0,5``. This option also
+	allows a range of CPUs to be specified -- say you wanted a binding to CPUs
+	0, 5, and 8 to 15, you would set ``cpus_allowed=0,5,8-15``.
+
+	On Windows, when ``cpus_allowed`` is unset only CPUs from fio's current
+	processor group will be used and affinity settings are inherited from the
+	system. An fio build configured to target Windows 7 makes options that set
+	CPUs processor group aware and values will set both the processor group
+	and a CPU from within that group. For example, on a system where processor
+	group 0 has 40 CPUs and processor group 1 has 32 CPUs, ``cpus_allowed``
+	values between 0 and 39 will bind CPUs from processor group 0 and
+	``cpus_allowed`` values between 40 and 71 will bind CPUs from processor
+	group 1. When using ``cpus_allowed_policy=shared`` all CPUs specified by a
+	single ``cpus_allowed`` option must be from the same processor group. For
+	Windows fio builds not built for Windows 7, CPUs will only be selected from
+	(and be relative to) whatever processor group fio happens to be running in
+	and CPUs from other processor groups cannot be used.
 
 .. option:: cpus_allowed_policy=str
 
@@ -2399,6 +2619,17 @@ Threads, processes and job synchronization
 	enough CPUs are given for the jobs listed, then fio will roundrobin the CPUs in
 	the set.
 
+.. option:: cpumask=int
+
+	Set the CPU affinity of this job. The parameter given is a bit mask of
+	allowed CPUs the job may run on. So if you want the allowed CPUs to be 1
+	and 5, you would pass the decimal value of (1 << 1 | 1 << 5), or 34. See man
+	:manpage:`sched_setaffinity(2)`. This may not work on all supported
+	operating systems or kernel versions. This option doesn't work well for a
+	higher CPU count than what you can store in an integer mask, so it can only
+	control cpus 1-32. For boxes with larger CPU counts, use
+	:option:`cpus_allowed`.
+
 .. option:: numa_cpu_nodes=str
 
 	Set this job running on specified NUMA nodes' CPUs. The arguments allow
@@ -2601,17 +2832,10 @@ Verification
 	previously written file. If the data direction includes any form of write,
 	the verify will be of the newly written data.
 
-.. option:: verifysort=bool
-
-	If true, fio will sort written verify blocks when it deems it faster to read
-	them back in a sorted manner. This is often the case when overwriting an
-	existing file, since the blocks are already laid out in the file system. You
-	can ignore this option unless doing huge amounts of really fast I/O where
-	the red-black tree sorting CPU time becomes significant. Default: true.
-
-.. option:: verifysort_nr=int
-
-	Pre-load and sort verify blocks for a read workload.
+	To avoid false verification errors, do not use the norandommap option when
+	verifying data with async I/O engines and I/O depths > 1. Or use the
+	norandommap and the lfsr random generator together to avoid writing to the
+	same offset with multiple outstanding I/Os.
 
 .. option:: verify_offset=int
 
@@ -2849,9 +3073,11 @@ Measurements and reporting
 .. option:: write_iops_log=str
 
 	Same as :option:`write_bw_log`, but writes an IOPS file (e.g.
-	:file:`name_iops.x.log`) instead. See :option:`write_bw_log` for
-	details about the filename format and `Log File Formats`_ for how data
-	is structured within the file.
+	:file:`name_iops.x.log`) instead. Because fio defaults to individual
+	I/O logging, the value entry in the IOPS log will be 1 unless windowed
+	logging (see :option:`log_avg_msec`) has been enabled. See
+	:option:`write_bw_log` for details about the filename format and `Log
+	File Formats`_ for how data is structured within the file.
 
 .. option:: log_avg_msec=int
 
@@ -2909,7 +3135,8 @@ Measurements and reporting
 
 	Define the set of CPUs that are allowed to handle online log compression
 	for the I/O jobs. This can provide better isolation between performance
-	sensitive jobs, and background compression work.
+	sensitive jobs, and background compression work. See
+	:option:`cpus_allowed` for the format used.
 
 .. option:: log_store_compressed=bool
 
@@ -3735,17 +3962,16 @@ on the type of log, it will be one of the following:
 	**2**
 		I/O is a TRIM
 
-The entry's *block size* is always in bytes. The *offset* is the offset, in bytes,
-from the start of the file, for that particular I/O. The logging of the offset can be
+The entry's *block size* is always in bytes. 
The *offset* is the position in bytes +from the start of the file for that particular I/O. The logging of the offset can be toggled with :option:`log_offset`. -Fio defaults to logging every individual I/O. When IOPS are logged for individual -I/Os the *value* entry will always be 1. If windowed logging is enabled through -:option:`log_avg_msec`, fio logs the average values over the specified period of time. -If windowed logging is enabled and :option:`log_max_value` is set, then fio logs -maximum values in that window instead of averages. Since *data direction*, *block -size* and *offset* are per-I/O values, if windowed logging is enabled they -aren't applicable and will be 0. +Fio defaults to logging every individual I/O but when windowed logging is set +through :option:`log_avg_msec`, either the average (by default) or the maximum +(:option:`log_max_value` is set) *value* seen over the specified period of time +is recorded. Each *data direction* seen within the window period will aggregate +its values in a separate row. Further, when using windowed logging the *block +size* and *offset* entries will always contain 0. Client/Server -------------