-Table of contents
------------------
+How fio works
+-------------
+
+The first step in getting fio to simulate a desired I/O workload is writing a
+job file describing that specific setup. A job file may contain any number of
+threads and/or files -- the typical contents of the job file are a *global*
+section defining shared parameters, and one or more job sections describing the
+jobs involved. When run, fio parses this file and sets everything up as
+described. If we break down a job from top to bottom, it contains the following
+basic parameters:
+
+`I/O type`_
+
+ Defines the I/O pattern issued to the file(s). We may only be reading
+ sequentially from the file(s), or we may be writing randomly, or even
+ mixing reads and writes, sequentially or randomly. Should we be doing
+ buffered I/O, or direct/raw I/O?
+
+`Block size`_
+
+ How large are the chunks in which we issue I/O? This may be a single
+ value, or it may describe a range of block sizes.
+
+`I/O size`_
+
+ How much data are we going to be reading/writing?
+
+`I/O engine`_
+
+ How do we issue I/O? We could be memory mapping the file, we could be
+ using regular read/write, we could be using splice, async I/O, or even
+ SG (SCSI generic sg).
+
+`I/O depth`_
+
+ If the I/O engine is async, how large a queuing depth do we want to
+ maintain?
+
+
+`Target file/device`_
+
+ How many files are we spreading the workload over?
+
+`Threads, processes and job synchronization`_
+
+ How many threads or processes should we spread this workload over?
+
+The above are the basic parameters defined for a workload. In addition, there
+is a multitude of parameters that modify other aspects of how this job behaves.
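+
+As a rough illustration, the job below touches each of the basic parameters
+listed above. It is a sketch, not a recommended configuration: the job name and
+all values are arbitrary, and it assumes a Linux system where the `libaio`
+engine is available and the target filesystem supports direct I/O.
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [basic-params]
+ ; I/O type: random reads, bypassing the page cache
+ rw=randread
+ direct=1
+ ; Block size: issue 4KiB chunks
+ bs=4k
+ ; I/O size: 256MiB of data per job
+ size=256m
+ ; I/O engine and I/O depth: async I/O with up to 16 requests in flight
+ ioengine=libaio
+ iodepth=16
+ ; Target file/device: spread the I/O over 2 files
+ nrfiles=2
+ ; Threads/processes: run 2 identical clones of this job
+ numjobs=2
+ ; -- end job file --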
+
+
+Command line options
+--------------------
+
+.. option:: --debug=type
+
+ Enable verbose tracing of various fio actions. May be ``all`` for all types
+ or individual types separated by a comma (e.g. ``--debug=file,mem`` will
+ enable file and memory debugging). Currently, additional logging is
+ available for:
+
+ *process*
+ Dump info related to processes.
+ *file*
+ Dump info related to file actions.
+ *io*
+ Dump info related to I/O queuing.
+ *mem*
+ Dump info related to memory allocations.
+ *blktrace*
+ Dump info related to blktrace setup.
+ *verify*
+ Dump info related to I/O verification.
+ *all*
+ Enable all debug options.
+ *random*
+ Dump info related to random offset generation.
+ *parse*
+ Dump info related to option matching and parsing.
+ *diskutil*
+ Dump info related to disk utilization updates.
+ *job:x*
+ Dump info only related to job number x.
+ *mutex*
+ Dump info only related to mutex up/down ops.
+ *profile*
+ Dump info related to profile extensions.
+ *time*
+ Dump info related to internal time keeping.
+ *net*
+ Dump info related to networking connections.
+ *rate*
+ Dump info related to I/O rate switching.
+ *compress*
+ Dump info related to log compress/decompress.
+ *?* or *help*
+ Show available debug options.
+
+.. option:: --parse-only
+
+ Parse options only, don't start any I/O.
+
+.. option:: --output=filename
+
+ Write output to file `filename`.
+
+.. option:: --bandwidth-log
+
+ Generate aggregate bandwidth logs.
+
+.. option:: --minimal
+
+ Print statistics in a terse, semicolon-delimited format.
+
+.. option:: --append-terse
+
+ Print statistics in selected mode AND terse, semicolon-delimited format.
+ **deprecated**, use :option:`--output-format` instead to select multiple
+ formats.
+
+.. option:: --output-format=type
+
+ Set the reporting format to `normal`, `terse`, `json`, or `json+`. Multiple
+ formats can be selected, separated by commas. `terse` is a CSV based
+ format. `json+` is like `json`, except it adds a full dump of the latency
+ buckets.
+
+.. option:: --terse-version=type
+
+ Set the terse version output format to 2, 3, or 4 (default: 3).
+
+.. option:: --version
+
+ Print version info and exit.
+
+.. option:: --help
+
+ Print a summary of the command line options and exit.
+
+.. option:: --cpuclock-test
+
+ Perform test and validation of internal CPU clock.
+
+.. option:: --crctest=test
+
+ Test the speed of the built-in checksumming functions. If no argument is
+ given, all of them are tested. Alternatively, a comma-separated list can be
+ passed, in which case only the given ones are tested.
+
+.. option:: --cmdhelp=command
+
+ Print help information for `command`. May be ``all`` for all commands.
+
+.. option:: --enghelp=[ioengine[,command]]
+
+ List all commands defined by :option:`ioengine`, or print help for `command`
+ defined by :option:`ioengine`. If no :option:`ioengine` is given, list all
+ available ioengines.
+
+.. option:: --showcmd=jobfile
+
+ Turn a job file into command line options.
+
+.. option:: --readonly
+
+ Turn on safety read-only checks, preventing writes. The ``--readonly``
+ option is an extra safety guard to prevent users from accidentally starting
+ a write workload when that is not desired. Normally fio only writes if
+ `rw=write/randwrite/rw/randrw` is given; ``--readonly`` additionally
+ enables a write check in the I/O engine core, so that writes caused by
+ unknown user space bug(s) are also prevented.
+
+.. option:: --eta=when
+
+ Specifies when the real-time ETA estimate should be printed. May be
+ `always`, `never` or `auto`.
+
+.. option:: --eta-newline=time
+
+ Force a new line for every `time` period passed.
+
+.. option:: --status-interval=time
+
+ Force full status dump every `time` period passed.
+
+.. option:: --section=name
+
+ Only run the specified section in the job file. Multiple sections can be
+ specified. The ``--section`` option allows one to combine related jobs into
+ one file. E.g. one job file could define light, moderate, and heavy
+ sections. Tell fio to run only the "heavy" section by giving the
+ ``--section=heavy`` command line option. One can also specify the "write"
+ operations in one section and the "verify" operations in another section.
+ The ``--section`` option only applies to job sections. The reserved
+ *global* section is always parsed and used.
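+
+A minimal sketch of such a job file follows; the section names and sizes are
+illustrative only. Running ``fio --section=heavy jobfile.fio`` (where
+:file:`jobfile.fio` is a hypothetical name for this file) should run only the
+*heavy* job, with the *global* settings still applied:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [global]
+ rw=randread
+ size=128m
+
+ [light]
+ numjobs=1
+
+ [moderate]
+ numjobs=4
+
+ [heavy]
+ numjobs=16
+ ; -- end job file --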
+
+.. option:: --alloc-size=kb
+
+ Set the internal smalloc pool to this size in kb (def 1024). The
+ ``--alloc-size`` switch allows one to use a larger pool size for smalloc.
+ If running large jobs with randommap enabled, fio can run out of memory.
+ Smalloc is an internal allocator for shared structures from a fixed size
+ memory pool. The pool size defaults to 16M and can grow to 8 pools.
+
+ NOTE: While fio is running, the :file:`.fio_smalloc.*` backing store files
+ are visible in :file:`/tmp`.
+
+.. option:: --warnings-fatal
+
+ All fio parser warnings are fatal, causing fio to exit with an
+ error.
+
+.. option:: --max-jobs=nr
+
+ Maximum number of threads/processes to support.
+
+.. option:: --server=args
+
+ Start a backend server, with `args` specifying what to listen to.
+ See `Client/Server`_ section.
+
+.. option:: --daemonize=pidfile
+
+ Background a fio server, writing the pid to the given `pidfile` file.
+
+.. option:: --client=hostname
+
+ Instead of running the jobs locally, send and run them on the given host or
+ set of hosts. See `Client/Server`_ section.
+
+.. option:: --remote-config=file
+
+ Tell the fio server to load this local file.
+
+.. option:: --idle-prof=option
+
+ Report CPU idleness on a system (``--idle-prof=system``) or per-CPU
+ (``--idle-prof=percpu``) basis, or run unit work calibration only
+ (``--idle-prof=calibrate``).
+
+.. option:: --inflate-log=log
+
+ Inflate and output compressed log.
+
+.. option:: --trigger-file=file
+
+ Execute the trigger command when `file` exists.
+
+.. option:: --trigger-timeout=t
+
+ Execute the trigger at this time.
+
+.. option:: --trigger=cmd
+
+ Set this command as local trigger.
+
+.. option:: --trigger-remote=cmd
+
+ Set this command as remote trigger.
+
+.. option:: --aux-path=path
+
+ Use this path for fio state generated files.
+
+Any parameters following the options will be assumed to be job files, unless
+they match a job file parameter. Multiple job files can be listed and each job
+file will be regarded as a separate group. Fio will :option:`stonewall`
+execution between each group.
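+
+A similar serialization can be expressed inside a single job file with the
+:option:`stonewall` option, which makes a job wait for the jobs above it to
+finish before starting. The sketch below is illustrative; the job names and the
+:file:`/tmp/fio.test` path are arbitrary:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [global]
+ filename=/tmp/fio.test
+ size=64m
+
+ [write-phase]
+ rw=write
+
+ [read-phase]
+ ; wait for write-phase to finish before starting
+ stonewall
+ rw=read
+ ; -- end job file --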
+
+
+Job file format
+---------------
+
+As previously described, fio accepts one or more job files describing what it is
+supposed to do. The job file format is the classic INI format, where the names
+enclosed in [] brackets define the job name. You are free to use any ASCII name
+you want, except *global* which has special meaning. Following the job name is
+a sequence of zero or more parameters, one per line, that define the behavior of
+the job. If the first character in a line is a ';' or a '#', the entire line is
+discarded as a comment.
+
+A *global* section sets defaults for the jobs described in that file. A job may
+override a *global* section parameter, and a job file may even have several
+*global* sections if so desired. A job is only affected by a *global* section
+residing above it.
+
+The :option:`--cmdhelp` option also lists all options. If used with an `option`
+argument, :option:`--cmdhelp` will detail the given `option`.
+
+See the `examples/` directory for inspiration on how to write job files. Note
+the copyright and license requirements currently apply to `examples/` files.
+
+So let's look at a really simple job file that defines two processes, each
+randomly reading from a 128MiB file:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [global]
+ rw=randread
+ size=128m
+
+ [job1]
+
+ [job2]
+
+ ; -- end job file --
+
+As you can see, the job file sections themselves are empty as all the described
+parameters are shared. As no :option:`filename` option is given, fio makes up a
+`filename` for each of the jobs as it sees fit. On the command line, this job
+would look as follows::
+
+ $ fio --name=global --rw=randread --size=128m --name=job1 --name=job2
+
+
+Let's look at an example that has a number of processes writing randomly to
+files:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [random-writers]
+ ioengine=libaio
+ iodepth=4
+ rw=randwrite
+ bs=32k
+ direct=0
+ size=64m
+ numjobs=4
+ ; -- end job file --
+
+Here we have no *global* section, as we only have one job defined anyway. We
+want to use async I/O here, with a depth of 4 for each file. We also increased
+the block size used to 32KiB and set numjobs to 4 to fork 4 identical
+jobs. The result is 4 processes each randomly writing to their own 64MiB
+file. Instead of using the above job file, you could have given the parameters
+on the command line. For this case, you would specify::
+
+ $ fio --name=random-writers --ioengine=libaio --iodepth=4 --rw=randwrite --bs=32k --direct=0 --size=64m --numjobs=4
+
+When fio is utilized as a basis of any reasonably large test suite, it might be
+desirable to share a set of standardized settings across multiple job files.
+Instead of copy/pasting such settings, any section may pull in an external
+:file:`filename.fio` file with an *include filename* directive, as in the following
+example::
+
+ ; -- start job file including.fio --
+ [global]
+ filename=/tmp/test
+ filesize=1m
+ include glob-include.fio
+
+ [test]
+ rw=randread
+ bs=4k
+ time_based=1
+ runtime=10
+ include test-include.fio
+ ; -- end job file including.fio --
+
+.. code-block:: ini
+
+ ; -- start job file glob-include.fio --
+ thread=1
+ group_reporting=1
+ ; -- end job file glob-include.fio --
+
+.. code-block:: ini
+
+ ; -- start job file test-include.fio --
+ ioengine=libaio
+ iodepth=4
+ ; -- end job file test-include.fio --
+
+Settings pulled into a section apply to that section only (except for the
+*global* section). Include directives may be nested, in that any included file
+may contain further include directives. Include files may not contain []
+sections.
+
+
+Environment variables
+~~~~~~~~~~~~~~~~~~~~~
+
+Fio also supports environment variable expansion in job files. Any sub-string of
+the form ``${VARNAME}`` as part of an option value (in other words, on the right
+of the '='), will be expanded to the value of the environment variable called
+`VARNAME`. If no such environment variable is defined, or `VARNAME` is the
+empty string, the empty string will be substituted.
+
+As an example, let's look at a sample fio invocation and job file::
+
+ $ SIZE=64m NUMJOBS=4 fio jobfile.fio
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [random-writers]
+ rw=randwrite
+ size=${SIZE}
+ numjobs=${NUMJOBS}
+ ; -- end job file --
+
+This will expand to the following equivalent job file at runtime:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [random-writers]
+ rw=randwrite
+ size=64m
+ numjobs=4
+ ; -- end job file --
+
+Fio ships with a few example job files; you can also look there for inspiration.
+
+Reserved keywords
+~~~~~~~~~~~~~~~~~
+
+Additionally, fio has a set of reserved keywords that will be replaced
+internally with the appropriate value. Those keywords are:
+
+**$pagesize**
+
+ The architecture page size of the running system.
+
+**$mb_memory**
+
+ Megabytes of total memory in the system.
+
+**$ncpus**
+
+ Number of online available CPUs.
+
+These can be used on the command line or in the job file, and will be
+automatically substituted with the current system values when the job is
+run. Simple math is also supported on these keywords, so you can perform actions
+like::
+
+ size=8*$mb_memory
+
+and get that properly expanded to 8 times the size of memory in the machine.
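+
+For instance, a job that starts one clone per online CPU and issues page-sized
+blocks might look like the sketch below; the job name and size are arbitrary:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [per-cpu-readers]
+ rw=randread
+ ; one clone per online CPU
+ numjobs=$ncpus
+ ; issue I/O in blocks of the system page size
+ bs=$pagesize
+ size=64m
+ ; -- end job file --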
+
+
+Job file parameters
+-------------------
+
+This section describes in detail each parameter associated with a job. Some
+parameters take an option of a given type, such as an integer or a
+string. Anywhere a numeric value is required, an arithmetic expression may be
+used, provided it is surrounded by parentheses. Supported operators are:
+
+ - addition (+)
+ - subtraction (-)
+ - multiplication (*)
+ - division (/)
+ - modulus (%)
+ - exponentiation (^)
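+
+As a small illustration with arbitrary values, the parenthesized expressions
+below evaluate to a 4096-byte block size and a 64MiB (2^26 byte) size:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [expr-example]
+ rw=read
+ ; (4*1024) evaluates to 4096 bytes
+ bs=(4*1024)
+ ; (2^26) evaluates to 67108864 bytes, i.e. 64MiB
+ size=(2^26)
+ ; -- end job file --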
+
+For time values in expressions, units are microseconds by default. This is
+different from time values not in expressions (not enclosed in parentheses),
+which default to seconds. The following types are used:
+
+
+Parameter types
+~~~~~~~~~~~~~~~
+
+**str**
+ String. This is a sequence of alpha characters.
+
+**time**
+ Integer with possible time suffix. In seconds unless otherwise
+ specified, use e.g. 10m for 10 minutes. Accepts s/m/h for seconds, minutes,
+ and hours, and accepts 'ms' (or 'msec') for milliseconds, and 'us' (or
+ 'usec') for microseconds.
+
+.. _int:
+
+**int**
+ Integer. A whole number value, which may contain an integer prefix
+ and an integer suffix:
+
+ [*integer prefix*] **number** [*integer suffix*]
+
+ The optional *integer prefix* specifies the number's base. The default
+ is decimal. *0x* specifies hexadecimal.
+
+ The optional *integer suffix* specifies the number's units, and includes an
+ optional unit prefix and an optional unit. For quantities of data, the
+ default unit is bytes. For quantities of time, the default unit is seconds.
+
+ With :option:`kb_base` =1000, fio follows international standards for unit
+ prefixes. To specify power-of-10 decimal values defined in the
+ International System of Units (SI):
+
+ * *K* -- means kilo (K) or 1000
+ * *M* -- means mega (M) or 1000**2
+ * *G* -- means giga (G) or 1000**3
+ * *T* -- means tera (T) or 1000**4
+ * *P* -- means peta (P) or 1000**5
+
+ To specify power-of-2 binary values defined in IEC 80000-13:
+
+ * *Ki* -- means kibi (Ki) or 1024
+ * *Mi* -- means mebi (Mi) or 1024**2
+ * *Gi* -- means gibi (Gi) or 1024**3
+ * *Ti* -- means tebi (Ti) or 1024**4
+ * *Pi* -- means pebi (Pi) or 1024**5
+
+ With :option:`kb_base` =1024 (the default), the unit prefixes are opposite
+ from those specified in the SI and IEC 80000-13 standards to provide
+ compatibility with old scripts. For example, 4k means 4096.
+
+ For quantities of data, an optional unit of 'B' may be included
+ (e.g., 'kB' is the same as 'k').
+
+ The *integer suffix* is not case sensitive (e.g., m/mi mean mebi/mega,
+ not milli). 'b' and 'B' both mean byte, not bit.
+
+ Examples with :option:`kb_base` =1000:
+
+ * *4 KiB*: 4096, 4096b, 4096B, 4ki, 4kib, 4kiB, 4Ki, 4KiB
+ * *1 MiB*: 1048576, 1mi, 1024ki
+ * *1 MB*: 1000000, 1m, 1000k
+ * *1 TiB*: 1099511627776, 1ti, 1024gi, 1048576mi
+ * *1 TB*: 1000000000000, 1t, 1000m, 1000000k
+
+ Examples with :option:`kb_base` =1024 (default):
+
+ * *4 KiB*: 4096, 4096b, 4096B, 4k, 4kb, 4kB, 4K, 4KB
+ * *1 MiB*: 1048576, 1m, 1024k
+ * *1 MB*: 1000000, 1mi, 1000ki
+ * *1 TiB*: 1099511627776, 1t, 1024g, 1048576m
+ * *1 TB*: 1000000000000, 1ti, 1000mi, 1000000ki
+
+ To specify times (units are not case sensitive):
+
+ * *D* -- means days
+ * *H* -- means hours
+ * *M* -- means minutes
+ * *s* -- or sec means seconds (default)
+ * *ms* -- or *msec* means milliseconds
+ * *us* -- or *usec* means microseconds
+
+ If the option accepts an upper and lower range, use a colon ':' or
+ minus '-' to separate such values. See :ref:`irange <irange>`.
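+
+The sketch below, with arbitrary values and the default :option:`kb_base` of
+1024, shows a binary unit prefix, a hexadecimal prefix, and a time suffix in a
+single job:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [int-suffix-example]
+ rw=read
+ ; unit prefix: 4k = 4096 bytes
+ bs=4k
+ ; hexadecimal prefix, no suffix: 0x40000000 = 1073741824 bytes (1GiB)
+ size=0x40000000
+ ; time suffix: 2 minutes
+ runtime=2m
+ ; -- end job file --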
+
+.. _bool:
+
+**bool**
+ Boolean. Usually parsed as an integer; however, it is only defined for
+ true and false (1 and 0).
+
+.. _irange:
+
+**irange**
+ Integer range with suffix. Allows value range to be given, such as
+ 1024-4096. A colon may also be used as the separator, e.g. 1k:4k. If the
+ option allows two sets of ranges, they can be specified with a ',' or '/'
+ delimiter: 1k-4k/8k-32k. Also see :ref:`int <int>`.
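+
+A sketch combining both forms follows. It assumes the ``bsrange`` option (a
+block size range, not otherwise described in this section) takes one range for
+reads and one for writes, and that a :option:`startdelay` range is interpreted
+in seconds like a single value; the other values are arbitrary:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [range-example]
+ rw=randrw
+ size=64m
+ numjobs=4
+ ; reads use 1k to 4k blocks, writes use 8k to 32k blocks
+ bsrange=1k-4k/8k-32k
+ ; each clone picks a random start delay between 1 and 5 seconds
+ startdelay=1-5
+ ; -- end job file --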
+
+**float_list**
+ A list of floating point numbers, separated by a ':' character.
+
+
+Units
+~~~~~
+
+.. option:: kb_base=int
+
+ Select the interpretation of unit prefixes in input parameters.
+
+ **1000**
+ Inputs comply with IEC 80000-13 and the International
+ System of Units (SI). Use:
+
+ - power-of-2 values with IEC prefixes (e.g., KiB)
+ - power-of-10 values with SI prefixes (e.g., kB)
+
+ **1024**
+ Compatibility mode (default). To avoid breaking old scripts:
+
+ - power-of-2 values with SI prefixes
+ - power-of-10 values with IEC prefixes
+
+ See :option:`bs` for more details on input parameters.
+
+ Outputs always use correct prefixes. Most outputs include both
+ side-by-side, like::
+
+ bw=2383.3kB/s (2327.4KiB/s)
+
+ If only one value is reported, then kb_base selects the one to use:
+
+ **1000** -- SI prefixes
+
+ **1024** -- IEC prefixes
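+
+With :option:`kb_base` =1000, IEC and SI prefixes resolve to different byte
+counts. In the illustrative job below, :option:`bs` is 4096 bytes and
+:option:`size` is 1000000000 bytes; the job name is arbitrary:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [si-units]
+ kb_base=1000
+ rw=read
+ ; IEC binary prefix: 4KiB = 4096 bytes
+ bs=4KiB
+ ; SI decimal prefix: 1GB = 1000000000 bytes
+ size=1GB
+ ; -- end job file --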
+
+.. option:: unit_base=int
+
+ Base unit for reporting. Allowed values are:
+
+ **0**
+ Use auto-detection (default).
+ **8**
+ Byte based.
+ **1**
+ Bit based.
+
+
+With the above in mind, here follows the complete list of fio job parameters.
+
+
+Job description
+~~~~~~~~~~~~~~~
+
+.. option:: name=str
+
+ ASCII name of the job. This may be used to override the name printed by fio
+ for this job. Otherwise the job name is used. On the command line this
+ parameter has the special purpose of also signaling the start of a new job.
+
+.. option:: description=str
+
+ Text description of the job. Doesn't do anything except dump this text
+ description when this job is run. It's not parsed.
+
+.. option:: loops=int
+
+ Run the specified number of iterations of this job. Used to repeat the same
+ workload a given number of times. Defaults to 1.
+
+.. option:: numjobs=int
+
+ Create the specified number of clones of this job. Each clone of the job
+ is spawned as an independent thread or process. May be used to set up a
+ larger number of threads/processes doing the same thing. Each thread is
+ reported separately; to see statistics for all clones as a whole, use
+ :option:`group_reporting` in conjunction with :option:`new_group`.
+ See :option:`--max-jobs`.
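+
+Following the advice above, a sketch that runs four identical clones and
+reports them as a single aggregate might look like this (the job name and
+workload are arbitrary):
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [global]
+ group_reporting
+
+ [clones]
+ new_group
+ rw=randread
+ size=64m
+ numjobs=4
+ ; -- end job file --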
+
+
+Time related parameters
+~~~~~~~~~~~~~~~~~~~~~~~
+
+.. option:: runtime=time
+
+ Tell fio to terminate processing after the specified period of time. It
+ can be quite hard to determine for how long a specified job will run, so
+ this parameter is handy to cap the total runtime to a given time. When
+ the unit is omitted, the value is given in seconds.
+
+.. option:: time_based
+
+ If set, fio will run for the duration of the :option:`runtime` specified
+ even if the file(s) are completely read or written. It will simply loop over
+ the same workload as many times as the :option:`runtime` allows.
+
+.. option:: startdelay=irange(time)
+
+ Delay start of job for the specified number of seconds. Supports all time
+ suffixes to allow specification of hours, minutes, seconds and milliseconds
+ -- seconds are the default if a unit is omitted. Can be given as a range
+ which causes each thread to choose randomly out of the range.
+
+.. option:: ramp_time=time
+
+ If set, fio will run the specified workload for this amount of time before
+ logging any performance numbers. Useful for letting performance settle
+ before logging results, thus minimizing the runtime required for stable
+ results. Note that the ``ramp_time`` is considered lead-in time for a job,
+ thus it will increase the total runtime if a special timeout or
+ :option:`runtime` is specified. When the unit is omitted, the value is
+ given in seconds.
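+
+These options are often combined. In the sketch below (values arbitrary), fio
+keeps looping over a 256MiB region for two minutes of measured I/O after a ten
+second ramp, so the total wall time is roughly :option:`runtime` plus
+:option:`ramp_time`:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [timed-randread]
+ rw=randread
+ size=256m
+ ; keep looping over the region until runtime expires
+ time_based
+ ; measure for 2 minutes ...
+ runtime=2m
+ ; ... after a 10 second warm-up whose results are not logged
+ ramp_time=10s
+ ; -- end job file --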
+
+.. option:: clocksource=str
+
+ Use the given clocksource as the base of timing. The supported options are:
+
+ **gettimeofday**
+ :manpage:`gettimeofday(2)`
+
+ **clock_gettime**
+ :manpage:`clock_gettime(2)`
+
+ **cpu**
+ Internal CPU clock source
+
+ ``cpu`` is the preferred clocksource if it is reliable, as it is very fast (and
+ fio is heavy on time calls). Fio will automatically use this clocksource if
+ it's supported and considered reliable on the system it is running on,
+ unless another clocksource is specifically set. For x86/x86-64 CPUs, this
+ means supporting TSC Invariant.
+
+.. option:: gtod_reduce=bool
+
+ Enable all of the :manpage:`gettimeofday(2)` reducing options
+ (:option:`disable_clat`, :option:`disable_slat`, :option:`disable_bw_measurement`) plus
+ reduce precision of the timeout somewhat to really shrink the
+ :manpage:`gettimeofday(2)` call count. With this option enabled, we only do
+ about 0.4% of the :manpage:`gettimeofday(2)` calls we would have done if all
+ time keeping was enabled.
+
+.. option:: gtod_cpu=int
+
+ Sometimes it's cheaper to dedicate a single thread of execution to just
+ getting the current time. Fio (and databases, for instance) are very
+ intensive on :manpage:`gettimeofday(2)` calls. With this option, you can set
+ one CPU aside for doing nothing but logging current time to a shared memory
+ location. Then the other threads/processes that run I/O workloads need only
+ copy that segment, instead of entering the kernel with a
+ :manpage:`gettimeofday(2)` call. The CPU set aside for doing these time
+ calls will be excluded from other uses. Fio will manually clear it from the
+ CPU mask of other jobs.
+
+
+Target file/device
+~~~~~~~~~~~~~~~~~~
+
+.. option:: directory=str
+
+ Prefix filenames with this directory. Used to place files in a different
+ location than :file:`./`. You can specify a number of directories by
+ separating the names with a ':' character. These directories will be
+ distributed equally among the job clones created with :option:`numjobs`,
+ as long as they are using generated filenames. If specific `filename(s)`
+ are set, fio will use the first listed directory. This matches the
+ `filename` semantics, which generate a file for each clone if no filename
+ is specified, but let all clones use the same file(s) if one is set.
+
+ See the :option:`filename` option for escaping certain characters.
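+
+For example, in the sketch below the four clones use generated filenames, so
+their files should be distributed equally between the two directories;
+:file:`/mnt/disk1` and :file:`/mnt/disk2` are hypothetical mount points:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [spread-clones]
+ ; hypothetical mount points, separated by ':'
+ directory=/mnt/disk1:/mnt/disk2
+ rw=write
+ size=64m
+ numjobs=4
+ ; -- end job file --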
+
+.. option:: filename=str
+
+ Fio normally makes up a `filename` based on the job name, thread number, and
+ file number. If you want to share files between threads in a job or several
+ jobs with fixed file paths, specify a `filename` for each of them to override
+ the default. If the ioengine is file based, you can specify a number of files
+ by separating the names with a ':' colon. So if you wanted a job to open
+ :file:`/dev/sda` and :file:`/dev/sdb` as the two working files, you would use
+ ``filename=/dev/sda:/dev/sdb``. This also means that whenever this option is
+ specified, :option:`nrfiles` is ignored. The size of regular files specified
+ by this option will be :option:`size` divided by number of files unless
+ explicit size is specified by :option:`filesize`.
+
+ On Windows, disk devices are accessed as :file:`\\\\.\\PhysicalDrive0` for
+ the first device, :file:`\\\\.\\PhysicalDrive1` for the second etc.
+ Note: Windows and FreeBSD prevent write access to areas
+ of the disk containing in-use data (e.g. filesystems). If the wanted
+ `filename` does need to include a colon, then escape that with a ``\``
+ character. For instance, if the `filename` is :file:`/dev/dsk/foo@3,0:c`,
+ then you would use ``filename="/dev/dsk/foo@3,0\:c"``. The
+ :file:`-` is a reserved name, meaning stdin or stdout. Which of the two
+ depends on the read/write direction set.
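+
+As an illustration, the job below does random reads from two devices at once.
+:file:`/dev/sdb` and :file:`/dev/sdc` are hypothetical device names; a read-only
+pattern is used so the sketch does not overwrite data, and accessing raw devices
+typically requires elevated privileges:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [two-devices]
+ ; hypothetical block devices, separated by ':'
+ filename=/dev/sdb:/dev/sdc
+ rw=randread
+ bs=4k
+ ; cap the run at 30 seconds instead of reading both devices fully
+ runtime=30s
+ ; -- end job file --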
+
+.. option:: filename_format=str
+
+ If sharing multiple files between jobs, it is usually necessary to have fio
+ generate the exact names that you want. By default, fio will name a file
+ based on the default file format specification of
+ :file:`jobname.jobnumber.filenumber`. With this option, that can be
+ customized. Fio will recognize and replace the following keywords in this
+ string:
+
+ **$jobname**
+ The name of the worker thread or process.
+ **$jobnum**
+ The incremental number of the worker thread or process.
+ **$filenum**
+ The incremental number of the file for that worker thread or
+ process.
+
+ To have dependent jobs share a set of files, this option can be set to have
+ fio generate filenames that are shared between the two. For instance, if
+ :file:`testfiles.$filenum` is specified, file number 4 for any job will be
+ named :file:`testfiles.4`. The default of :file:`$jobname.$jobnum.$filenum`
+ will be used if no other format specifier is given.
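+
+A sketch of two jobs sharing the same set of files: because the format contains
+no ``$jobname`` or ``$jobnum``, both jobs resolve to the same
+:file:`testfiles.N` names, and :option:`stonewall` makes the reader wait until
+the writer is done. Sizes are arbitrary:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [global]
+ filename_format=testfiles.$filenum
+ nrfiles=4
+ size=64m
+
+ [writer]
+ rw=write
+
+ [reader]
+ ; start only after the writer has finished
+ stonewall
+ rw=read
+ ; -- end job file --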
+
+.. option:: unique_filename=bool
+
+ To avoid collisions between networked clients, fio defaults to prefixing any
+ generated filenames (with a directory specified) with the source of the
+ client connecting. To disable this behavior, set this option to 0.
+
+.. option:: opendir=str
+
+ Recursively open any files below directory `str`.
+
+.. option:: lockfile=str
+
+ Fio defaults to not locking any files before it does I/O to them. If a file
+ or file descriptor is shared, fio can serialize I/O to that file to make the
+ end result consistent. This is typical when emulating real workloads that share
+ files. The lock modes are:
+
+ **none**
+ No locking. The default.
+ **exclusive**
+ Only one thread or process may do I/O at a time, excluding all
+ others.
+ **readwrite**
+ Read-write locking on the file. Many readers may
+ access the file at the same time, but writes get exclusive access.
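+
+For instance, several clones doing mixed I/O to one shared file could use
+``readwrite`` locking as in this sketch; the path and sizes are arbitrary:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [shared-file]
+ ; arbitrary path shared by all clones
+ filename=/tmp/fio.shared
+ size=64m
+ rw=randrw
+ numjobs=4
+ ; many concurrent readers, but each write gets exclusive access
+ lockfile=readwrite
+ ; -- end job file --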
+
+.. option:: nrfiles=int
+
+ Number of files to use for this job. Defaults to 1. The size of files
+ will be :option:`size` divided by this unless explicit size is specified by
+ :option:`filesize`. Files are created for each thread separately, and each
+ file will have a file number within its name by default, as explained in
+ the :option:`filename` section.
+
+
+.. option:: openfiles=int
+
+ Number of files to keep open at the same time. Defaults to the same as
+ :option:`nrfiles`, but can be set smaller to limit the number of
+ simultaneous opens.
+
+.. option:: file_service_type=str
+
+ Defines how fio decides which file from a job to service next. The following
+ types are defined:
+
+ **random**
+ Choose a file at random.
+
+ **roundrobin**
+ Round robin over opened files. This is the default.
+
+ **sequential**
+ Finish one file before moving on to the next. Multiple files can
+ still be open depending on :option:`openfiles`.
+
+ **zipf**
+ Use a *Zipf* distribution to decide what file to access.
+
+ **pareto**
+ Use a *Pareto* distribution to decide what file to access.
+
+ **gauss**
+ Use a *Gaussian* (normal) distribution to decide what file to
+ access.
+
+ For *random*, *roundrobin*, and *sequential*, a postfix can be appended to
+ tell fio how many I/Os to issue before switching to a new file. For example,
+ specifying ``file_service_type=random:8`` would cause fio to issue
+ 8 I/Os before selecting a new file at random. For the non-uniform
+ distributions, a floating point postfix can be given to influence how the
+ distribution is skewed. See :option:`random_distribution` for a description
+ of how that would work.
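+
+As an example with arbitrary sizes, the job below spreads random reads over
+eight files and stays on each randomly chosen file for eight I/Os:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [many-files]
+ rw=randread
+ nrfiles=8
+ size=256m
+ ; issue 8 I/Os to a randomly chosen file before picking another one
+ file_service_type=random:8
+ ; -- end job file --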
+
+.. option:: ioscheduler=str
+
+ Attempt to switch the device hosting the file to the specified I/O scheduler
+ before running.
+
+.. option:: create_serialize=bool
+
+ If true, serialize the file creation for the jobs. This may be handy to
+ avoid interleaving of data files, which may greatly depend on the filesystem
+ used and even the number of processors in the system.
+
+.. option:: create_fsync=bool
+
+ fsync the data file after creation. This is the default.
+
+.. option:: create_on_open=bool
+
+ Don't pre-create the files before starting I/O; instead, create and open()
+ each file only when it's time to do I/O to it.
+
+.. option:: create_only=bool
+
+ If true, fio will only run the setup phase of the job. If files need to be
+ laid out or updated on disk, only that will be done. The actual job contents
+ are not executed.
+
+.. option:: allow_file_create=bool
+
+ If true, fio is permitted to create files as part of its workload. This is
+ the default behavior. If this option is false, then fio will error out if
+ the files it needs to use don't already exist. Default: true.
+
+.. option:: allow_mounted_write=bool
+
+ If this isn't set, fio will abort jobs that are destructive (e.g. that write)
+ to what appears to be a mounted device or partition. This should help catch
+ inadvertently destructive tests, where the user does not realize that the
+ test will destroy data on the mounted file system. Default: false.
+
+.. option:: pre_read=bool
+
+ If this is given, files will be pre-read into memory before starting the
+ given I/O operation. This will also clear the :option:`invalidate` flag,
+ since it is pointless to pre-read and then drop the cache. This will only
+ work for I/O engines that are seekable, since they allow you to read the
+ same data multiple times. Thus it will not work on e.g. network or splice I/O.
+
+.. option:: unlink=bool
+
+ Unlink the job files when done. Not the default, as repeated runs of that
+ job would then waste time recreating the file set again and again.
+
+.. option:: unlink_each_loop=bool
+
+ Unlink job files after each iteration or loop.
+
+.. option:: zonesize=int
+
+ Divide a file into zones of the specified size. See :option:`zoneskip`.
+
+.. option:: zonerange=int
+
+ Give size of an I/O zone. See :option:`zoneskip`.
+
+.. option:: zoneskip=int
+
+ Skip the specified number of bytes when :option:`zonesize` data has been
+ read. The two zone options can be used to only do I/O on zones of a file.
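+
+A sketch with arbitrary sizes: this sequential read job reads 64MiB, skips the
+following 192MiB, and repeats, so that, under the semantics described above,
+roughly one 64MiB zone out of every 256MiB is read:
+
+.. code-block:: ini
+
+ ; -- start job file --
+ [zoned-read]
+ rw=read
+ size=1g
+ ; read 64MiB ...
+ zonesize=64m
+ ; ... then skip the next 192MiB
+ zoneskip=192m
+ ; -- end job file --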
+
+
+I/O type
+~~~~~~~~
+
+.. option:: direct=bool
+
+ If value is true, use non-buffered I/O. This is usually O_DIRECT. Note that
+ ZFS on Solaris doesn't support direct I/O. On Windows the synchronous
+ ioengines don't support direct I/O. Default: false.
+
+.. option:: atomic=bool
+
+ If value is true, attempt to use atomic direct I/O. Atomic writes are
+ guaranteed to be stable once acknowledged by the operating system. Only
+ Linux supports O_ATOMIC right now.
+
+.. option:: buffered=bool
+
+ If value is true, use buffered I/O. This is the opposite of the
+ :option:`direct` option. Defaults to true.
+
+.. option:: readwrite=str, rw=str
+
+ Type of I/O pattern. Accepted values are:
+
+ **read**
+ Sequential reads.
+ **write**
+ Sequential writes.
+ **trim**
+ Sequential trims (Linux block devices only).
+ **randwrite**
+ Random writes.
+ **randread**
+ Random reads.
+ **randtrim**
+ Random trims (Linux block devices only).
+ **rw,readwrite**
+ Sequential mixed reads and writes.
+ **randrw**
+ Random mixed reads and writes.
+ **trimwrite**
+ Sequential trim+write sequences. Blocks will be trimmed first,
+ then the same blocks will be written to.
+
+ Fio defaults to read if the option is not specified. For the mixed I/O
+ types, the default is to split them 50/50. For certain types of I/O the
+ result may still be skewed a bit, since the speed may be different. It is
+ possible to specify the number of I/Os to do before getting a new offset;
+ this is done by appending a ``:<nr>`` to the end of the string given. For a
+ random read, it would look like ``rw=randread:8`` for passing in an offset
+ modifier with a value of 8. If the suffix is used with a sequential I/O
+ pattern, then the value specified will be added to the generated offset for
+ each I/O. For instance, using ``rw=write:4k`` will skip 4k for every
+ write. It turns sequential I/O into sequential I/O with holes. See the
+ :option:`rw_sequencer` option.
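+
+ The sketch below shows the offset arithmetic for a sequential pattern with
+ a suffix. It assumes a fixed block size. It also assumes that each I/O
+ starts the given number of bytes past the end of the previous one.
+ ``seq_offsets`` is a hypothetical helper and is not part of fio.

```python
def seq_offsets(count, bs, hole=0, start=0):
    """Offsets for a sequential job with an ``rw=write:<hole>`` style suffix.

    Each I/O begins `hole` bytes past the end of the previous one, giving
    sequential I/O with gaps (hole=0 is plain sequential I/O).
    """
    offsets = []
    pos = start
    for _ in range(count):
        offsets.append(pos)
        pos += bs + hole  # end of this I/O, plus the skipped gap
    return offsets


# bs=4k with rw=write:4k: every other 4 KiB block is written.
print(seq_offsets(4, 4096, hole=4096))  # [0, 8192, 16384, 24576]
```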
+
+.. option:: rw_sequencer=str
+
+ If an offset modifier is given by appending a number to the ``rw=<str>``
+ line, then this option controls how that number modifies the I/O offset
+ being generated. Accepted values are:
+
+ **sequential**
+ Generate sequential offsets.
+ **identical**
+ Generate the same offset.
+
+ ``sequential`` is only useful for random I/O, where fio would normally
+ generate a new random offset for every I/O. If you append e.g. 8 to randread,
+ you would get a new random offset for every 8 I/Os. The result would be a
+ seek for only every 8 I/Os, instead of for every I/O. Use ``rw=randread:8``
+ to specify that. As sequential I/O is already sequential, setting
+ ``sequential`` for that would not result in any differences. ``identical``
+ behaves in a similar fashion, except it sends the same offset 8 times
+ before generating a new offset.
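+
+ The small simulation of ``rw=randread:8`` below illustrates the difference
+ between the two sequencers. It is a hypothetical model, not fio's code. The
+ uniform block choice and the wrap-around at the end of the file are
+ assumptions made for the sketch.

```python
import random


def sequenced_offsets(mode, nr, count, nblocks, bs, seed=1):
    """Model ``rw=randread:<nr>`` under rw_sequencer=sequential|identical.

    A fresh random block is picked every `nr` I/Os. In between, 'sequential'
    advances one block per I/O (wrapping at the end of the file), while
    'identical' reissues the same block.
    """
    if mode not in ("sequential", "identical"):
        raise ValueError(f"unknown rw_sequencer: {mode}")
    rng = random.Random(seed)
    block = 0
    out = []
    for i in range(count):
        if i % nr == 0:
            block = rng.randrange(nblocks)  # seek to a new random block
        elif mode == "sequential":
            block = (block + 1) % nblocks  # continue right after the last I/O
        # 'identical' leaves `block` unchanged.
        out.append(block * bs)
    return out


print(sequenced_offsets("sequential", 8, 16, 1024, 4096))
print(sequenced_offsets("identical", 8, 16, 1024, 4096))
```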
+
+.. option:: unified_rw_reporting=bool
+
+ Fio normally reports statistics on a per data direction basis, meaning that
+ reads, writes, and trims are accounted and reported separately. If this
+ option is set, fio sums the results and reports them as "mixed" instead.
+
+.. option:: randrepeat=bool
+
+ Seed the random number generator used for random I/O patterns in a
+ predictable way so the pattern is repeatable across runs. Default: true.
+
+.. option:: allrandrepeat=bool
+
+ Seed all random number generators in a predictable way so results are
+ repeatable across runs. Default: false.
+
+.. option:: randseed=int
+
+ Seed the random number generators based on this seed value, to be able to
+ control what sequence of output is being generated. If not set, the random
+ sequence depends on the :option:`randrepeat` setting.
+
+.. option:: fallocate=str
+
+ Whether pre-allocation is performed when laying down files.
+ Accepted values are:
+
+ **none**
+ Do not pre-allocate space.
+
+ **posix**
+ Pre-allocate via :manpage:`posix_fallocate(3)`.
+
+ **keep**
+ Pre-allocate via :manpage:`fallocate(2)` with
+ FALLOC_FL_KEEP_SIZE set.
+
+ **0**
+ Backward-compatible alias for **none**.
+
+ **1**
+ Backward-compatible alias for **posix**.
+
+ May not be available on all supported platforms. **keep** is only available
+ on Linux. If using ZFS on Solaris this must be set to **none** because ZFS
+ doesn't support it. Default: **posix**.
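+
+ The sketch below covers the **none** and **posix** modes using Python's
+ ``os.posix_fallocate()`` (Unix only). The ``preallocate`` helper is
+ hypothetical. **keep** is left out because it needs :manpage:`fallocate(2)`
+ with FALLOC_FL_KEEP_SIZE rather than ``posix_fallocate()``.

```python
import os
import tempfile


def preallocate(path, size, mode="posix"):
    """Lay down a file, optionally preallocating `size` bytes.

    Hypothetical helper covering the 'none'/'0' and 'posix'/'1' modes only.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if mode in ("posix", "1"):
            # posix_fallocate(3) reserves the blocks and extends the file size.
            os.posix_fallocate(fd, 0, size)
        elif mode not in ("none", "0"):
            raise ValueError(f"mode not covered by this sketch: {mode}")
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "testfile")
    preallocate(path, 1 << 20)
    print(os.path.getsize(path))  # 1048576
```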
+
+.. option:: fadvise_hint=str
+
+ Use :manpage:`posix_fadvise(2)` to advise the kernel on what I/O patterns
+ are likely to be issued. Accepted values are:
+
+ **0**
+ Backwards-compatible hint for "no hint".
+
+ **1**
+ Backwards-compatible hint for "advise with fio workload type". This
+ uses **FADV_RANDOM** for a random workload, and **FADV_SEQUENTIAL**
+ for a sequential workload.
+
+ **sequential**
+ Advise using **FADV_SEQUENTIAL**.
+
+ **random**
+ Advise using **FADV_RANDOM**.
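+
+ The sketch below maps the hint values onto :manpage:`posix_fadvise(2)`
+ advice using Python's ``os.posix_fadvise()`` (Unix only). The helper names
+ are hypothetical. Applying the advice to the whole file (offset 0, length
+ 0) is an assumption of the sketch.

```python
import os
import tempfile


def fadvise_advice(hint, random_workload):
    """Translate an fadvise_hint value into posix_fadvise(2) advice.

    Returns None for '0' (no hint). For '1' the advice follows the
    workload type, as described above.
    """
    if hint == "0":
        return None
    if hint == "1":
        return os.POSIX_FADV_RANDOM if random_workload else os.POSIX_FADV_SEQUENTIAL
    if hint == "sequential":
        return os.POSIX_FADV_SEQUENTIAL
    if hint == "random":
        return os.POSIX_FADV_RANDOM
    raise ValueError(f"unknown fadvise_hint: {hint}")


def apply_hint(fd, hint, random_workload):
    """Apply the hint to the whole file (offset 0, length 0 means to EOF)."""
    advice = fadvise_advice(hint, random_workload)
    if advice is not None:
        os.posix_fadvise(fd, 0, 0, advice)
    return advice


with tempfile.TemporaryFile() as f:
    apply_hint(f.fileno(), "1", random_workload=True)
```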
+
+.. option:: fadvise_stream=int
+
+ Use :manpage:`posix_fadvise(2)` to advise the kernel what stream ID the
+ writes issued belong to. Only supported on Linux. Note, this option may
+ change going forward.
+
+.. option:: offset=int
+
+ Start I/O at the given offset in the file. The data before the given offset
+ will not be touched. This effectively caps the file size at `real_size -
+ offset`. Can be combined with :option:`size` to constrain the start and
+ end range that I/O will be done within.
+
+.. option:: offset_increment=int
+
+ If this is provided, then the real offset becomes `offset + offset_increment
+ * thread_number`, where the thread number is a counter that starts at 0 and
+ is incremented for each sub-job (i.e. when :option:`numjobs` option is
+ specified). This option is useful if there are several jobs which are
+ intended to operate on a file in parallel disjoint segments, with even
+ spacing between the starting points.
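+
+ The sketch below writes out the offset formula above in Python. The
+ ``job_offsets`` helper is hypothetical. With ``numjobs=4``,
+ ``offset_increment=10g`` and ``size=10g``, each sub-job would start on its
+ own 10 GiB boundary.

```python
GiB = 1 << 30


def job_offsets(offset, offset_increment, numjobs):
    """Starting offset of each sub-job: offset + offset_increment * thread_number."""
    # thread_number counts sub-jobs starting from 0, as described above.
    return [offset + offset_increment * n for n in range(numjobs)]


print([o // GiB for o in job_offsets(0, 10 * GiB, 4)])  # [0, 10, 20, 30]
```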
+
+.. option:: number_ios=int
+
+ Fio will normally perform I/Os until it has exhausted the size of the region
+ set by :option:`size`, or until it exhausts the allocated time (or hits an error
+ condition). With this setting, the range/size can be set independently of
+ the number of I/Os to perform. When fio reaches this number, it will exit
+ normally and report status. Note that this does not extend the amount of I/O
+ that will be done, it will only stop fio if this condition is met before
+ other end-of-job criteria.
+
+.. option:: fsync=int
+
+ If writing to a file, issue a sync of the dirty data for every number of
+ blocks given. For example, if you give 32 as a parameter, fio will sync the
+ file for every 32 writes issued. If fio is using non-buffered I/O, we may
+ not sync the file. The exception is the sg I/O engine, which synchronizes
+ the disk cache anyway.
+
+.. option:: fdatasync=int
+
+ Like :option:`fsync` but uses :manpage:`fdatasync(2)` to only sync data and
+ not metadata blocks. On Windows, FreeBSD, and DragonFlyBSD there is no
+ :manpage:`fdatasync(2)`, so this falls back to using :manpage:`fsync(2)`.
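+
+ The Python sketch below shows the sync cadence of :option:`fsync` and the
+ :option:`fdatasync` fallback. ``write_blocks`` is a hypothetical helper
+ that mimics a buffered sequential writer. It is not how fio issues I/O.

```python
import os
import tempfile

# Where fdatasync(2) is missing, fall back to fsync(2), as described above.
datasync = getattr(os, "fdatasync", os.fsync)


def write_blocks(path, nblocks, bs=4096, fsync_every=32, sync=os.fsync):
    """Write `nblocks` blocks sequentially, syncing after every
    `fsync_every` writes (0 disables syncing). Returns the sync count.
    """
    buf = b"\0" * bs
    syncs = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for i in range(1, nblocks + 1):
            os.write(fd, buf)
            if fsync_every and i % fsync_every == 0:
                sync(fd)
                syncs += 1
    finally:
        os.close(fd)
    return syncs


with tempfile.TemporaryDirectory() as d:
    # fsync=32 with 100 writes: syncs after writes 32, 64 and 96.
    print(write_blocks(os.path.join(d, "a"), 100))  # 3
    print(write_blocks(os.path.join(d, "b"), 100, sync=datasync))  # 3
```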
+
+.. option:: write_barrier=int
+
+ Make every `N-th` write a barrier write.
+
+.. option:: sync_file_range=str:val
+
+ Use :manpage:`sync_file_range(2)` for every `val` number of write
+ operations. Fio will track the range of writes that have happened since the last
+ :manpage:`sync_file_range(2)` call. `str` can currently be one or more of:
+
+ **wait_before**
+ SYNC_FILE_RANGE_WAIT_BEFORE
+ **write**
+ SYNC_FILE_RANGE_WRITE
+ **wait_after**
+ SYNC_FILE_RANGE_WAIT_AFTER
+
+ So if you do ``sync_file_range=wait_before,write:8``, fio would use
+ ``SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE`` for every 8
+ writes. Also see the :manpage:`sync_file_range(2)` man page. This option is
+ Linux specific.
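+
+ The sketch below parses the ``str:val`` form into a flag mask. The numeric
+ flag values are the ones defined in the Linux kernel headers. The parser
+ itself is a hypothetical illustration, not fio's option parser.

```python
# Flag values as defined by Linux for sync_file_range(2).
SYNC_FILE_RANGE_WAIT_BEFORE = 1
SYNC_FILE_RANGE_WRITE = 2
SYNC_FILE_RANGE_WAIT_AFTER = 4

FLAGS = {
    "wait_before": SYNC_FILE_RANGE_WAIT_BEFORE,
    "write": SYNC_FILE_RANGE_WRITE,
    "wait_after": SYNC_FILE_RANGE_WAIT_AFTER,
}


def parse_sync_file_range(spec):
    """Parse 'name[,name...]:val' into (flag mask, sync every `val` writes)."""
    names, _, val = spec.partition(":")
    flags = 0
    for name in names.split(","):
        flags |= FLAGS[name]  # unknown names raise KeyError
    return flags, int(val)


print(parse_sync_file_range("wait_before,write:8"))  # (3, 8)
```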
+
+.. option:: overwrite=bool
+
+ If true, writes to a file will always overwrite existing data. If the file
+ doesn't already exist, it will be created before the write phase begins. If
+ the file exists and is large enough for the specified write phase, nothing
+ will be done.
+
+.. option:: end_fsync=bool
+
+ If true, fsync file contents when a write stage has completed.
+
+.. option:: fsync_on_close=bool
+
+ If true, fio will :manpage:`fsync(2)` a dirty file on close. This differs
+ from :option:`end_fsync` in that it will happen on every file close, not
+ just at the end of the job.
+
+.. option:: rwmixread=int
+
+ Percentage of a mixed workload that should be reads. Default: 50.
+
+.. option:: rwmixwrite=int
+
+ Percentage of a mixed workload that should be writes. If both
+ :option:`rwmixread` and :option:`rwmixwrite` are given and the values do not
+ add up to 100%, the latter of the two will be used to override the
+ first. This may interfere with a given rate setting, if fio is asked to
+ limit reads or writes to a certain rate. If that is the case, then the
+ distribution may be skewed. Default: 50.
+
+.. option:: random_distribution=str:float[,str:float][,str:float]
+
+ By default, fio will use a completely uniform random distribution when asked
+ to perform random I/O. Sometimes it is useful to skew the distribution in
+ specific ways, ensuring that some parts of the data are hotter than others.
+ Fio includes the following distribution models:
+
+ **random**
+ Uniform random distribution
+
+ **zipf**
+ Zipf distribution
+
+ **pareto**
+ Pareto distribution
+
+ **gauss**
+ Normal (Gaussian) distribution
+
+ **zoned**
+ Zoned random distribution
+
+ When using a **zipf** or **pareto** distribution, an input value is also
+ needed to define the access pattern. For **zipf**, this is the `zipf
+ theta`. For **pareto**, it's the `Pareto power`. Fio includes a test
+ program, :command:`genzipf`, that can be used to visualize what the given input
+ values will yield in terms of hit rates. If you wanted to use **zipf** with
+ a `theta` of 1.2, you would use ``random_distribution=zipf:1.2`` as the
+ option. If a non-uniform model is used, fio will disable use of the random
+ map. For the **gauss** distribution, a normal deviation is supplied as a
+ value between 0 and 100.
+
+ For a **zoned** distribution, fio supports specifying percentages of I/O
+ access that should fall within what range of the file or device. For
+ example, given these criteria:
+
+ * 60% of accesses should be to the first 10%
+ * 30% of accesses should be to the next 20%
+ * 8% of accesses should be to the next 30%
+ * 2% of accesses should be to the next 40%
+
+ we can define that through zoning of the random accesses. For the above
+ example, the user would do::
+
+ random_distribution=zoned:60/10:30/20:8/30:2/40
+
+ similarly to how :option:`bssplit` works for setting ranges and percentages
+ of block sizes. Like :option:`bssplit`, it's possible to specify separate
+ zones for reads, writes, and trims. If just one set is given, it'll apply to
+ all of them.
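+
+ One way to sample offsets from a **zoned** specification is to pick a zone
+ weighted by its access percentage, then pick a uniform block inside that
+ zone's range. The sketch below is a hypothetical model of that idea, not
+ fio's implementation. It assumes a single set of zones and whole-block
+ sizes, and it assumes both percentage columns must sum to 100.

```python
import random


def parse_zoned(spec):
    """Parse 'zoned:60/10:30/20:8/30:2/40' into (access %, range %) pairs."""
    kind, *zones = spec.split(":")
    if kind != "zoned":
        raise ValueError("expected a zoned distribution")
    pairs = [tuple(int(x) for x in z.split("/")) for z in zones]
    # Assumption of this sketch: both columns must total exactly 100.
    if sum(a for a, _ in pairs) != 100 or sum(r for _, r in pairs) != 100:
        raise ValueError("access and range percentages must each sum to 100")
    return pairs


def zoned_block(pairs, nblocks, rng):
    """Pick a zone weighted by access %, then a uniform block inside it."""
    zone = rng.choices(range(len(pairs)), weights=[a for a, _ in pairs])[0]
    # Zone boundaries follow from the cumulative range percentages.
    start_pct = sum(r for _, r in pairs[:zone])
    end_pct = start_pct + pairs[zone][1]
    return rng.randrange(nblocks * start_pct // 100, nblocks * end_pct // 100)


pairs = parse_zoned("zoned:60/10:30/20:8/30:2/40")
rng = random.Random(42)
hits = [zoned_block(pairs, 1000, rng) for _ in range(100_000)]
# Share of accesses landing in the first 10% of the blocks; close to 0.6.
print(sum(b < 100 for b in hits) / len(hits))
```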
+
+.. option:: percentage_random=int[,int][,int]
+
+ For a random workload, set how big a percentage should be random. This
+ defaults to 100%, in which case the workload is fully random. It can be set
+ anywhere from 0 to 100. Setting it to 0 would make the workload fully
+ sequential. Any setting in between will result in a random mix of sequential
+ and random I/O, at the given percentages. Comma-separated values may be
+ specified for reads, writes, and trims as described in :option:`blocksize`.
+
+.. option:: norandommap
+
+ Normally fio will cover every block of the file when doing random I/O. If
+ this option is given, fio will just get a new random offset without looking
+ at past I/O history. This means that some blocks may not be read or written,
+ and that some blocks may be read/written more than once. If this option is
+ used with :option:`verify` and multiple blocksizes (via :option:`bsrange`),
+ only intact blocks are verified, i.e., partially-overwritten blocks are
+ ignored.
+
+.. option:: softrandommap=bool
+
+ See :option:`norandommap`. If fio runs with the random block map enabled and
+ it fails to allocate the map, if this option is set it will continue without
+ a random block map. As coverage will not be as complete as with random maps,
+ this option is disabled by default.
+
+.. option:: random_generator=str
+
+ Fio supports the following engines for generating
+ I/O offsets for random I/O:
+
+ **tausworthe**
+ Strong 2^88 cycle random number generator
+ **lfsr**
+ Linear feedback shift register generator
+ **tausworthe64**
+ Strong 64-bit 2^258 cycle random number generator
+
+ **tausworthe** is a strong random number generator, but it requires tracking
+ on the side if we want to ensure that blocks are only read or written
+ once. **LFSR** guarantees that we never generate the same offset twice, and
+ it's also less computationally expensive. It's not a true random generator,
+ but for I/O purposes it's typically good enough. **LFSR** only
+ works with single block sizes, not with workloads that use multiple block
+ sizes. If used with such a workload, fio may read or write some blocks
+ multiple times. The default value is **tausworthe**, unless the required
+ space exceeds 2^32 blocks. If it does, then **tausworthe64** is
+ selected automatically.
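+
+ The "never the same offset twice" property comes from a maximal-length
+ LFSR, which visits every non-zero state exactly once per period. A minimal
+ Galois LFSR sketch follows. The 4-bit taps (for x^4 + x^3 + 1) are chosen
+ for illustration. Fio's own LFSR code is not shown here and differs in
+ detail.

```python
def galois_lfsr(seed, taps):
    """Yield Galois LFSR states, stopping once the sequence returns to `seed`."""
    state = seed
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps  # feedback: XOR the tap mask in when a 1 shifts out
        if state == seed:
            return


# x^4 + x^3 + 1 is primitive, so the 4-bit register visits all 15 non-zero
# states once before repeating. Map states 1..15 onto blocks 0..14.
blocks = [s - 1 for s in galois_lfsr(1, 0xC)]
print(blocks)  # [0, 11, 5, 2, 12, 9, 4, 13, 6, 14, 10, 8, 7, 3, 1]
print(sorted(blocks) == list(range(15)))  # True
```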
+
+
+Block size
+~~~~~~~~~~
+
+.. option:: blocksize=int[,int][,int], bs=int[,int][,int]
+
+ The block size in bytes used for I/O units. Default: 4096. A single value
+ applies to reads, writes, and trims. Comma-separated values may be
+ specified for reads, writes, and trims. A value not terminated in a comma
+ applies to subsequent types.
+
+ Examples:
+
+ **bs=256k**
+ means 256k for reads, writes and trims.
+
+ **bs=8k,32k**
+ means 8k for reads, 32k for writes and trims.
+
+ **bs=8k,32k,**
+ means 8k for reads, 32k for writes, and default for trims.
+
+ **bs=,8k**
+ means default for reads, 8k for writes and trims.
+
+ **bs=,8k,**
+ means default for reads, 8k for writes, and default for trims.
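+
+ A small parser can capture the comma rules in these examples. This sketch
+ only aims to reproduce the examples above. ``parse_bs`` is hypothetical,
+ and sizes are kept as strings rather than converted to bytes.

```python
DEFAULT_BS = "4k"  # documented default block size (4096 bytes)


def parse_bs(spec, default=DEFAULT_BS):
    """Split a bs= value into read/write/trim sizes.

    Empty fields fall back to the default; missing trailing fields inherit
    the last value given (the "not terminated in a comma" rule).
    """
    tokens = spec.split(",")[:3]
    values = []
    for i in range(3):
        if i < len(tokens):
            values.append(tokens[i] or default)
        else:
            values.append(values[-1])
    return dict(zip(("read", "write", "trim"), values))


for spec in ("256k", "8k,32k", "8k,32k,", ",8k", ",8k,"):
    print(spec, parse_bs(spec))
```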
+
+.. option:: blocksize_range=irange[,irange][,irange], bsrange=irange[,irange][,irange]
+
+ A range of block sizes in bytes for I/O units. The issued I/O unit will
+ always be a multiple of the minimum size, unless
+ :option:`blocksize_unaligned` is set.
+
+ Comma-separated ranges may be specified for reads, writes, and trims as
+ described in :option:`blocksize`.
+
+ Example: ``bsrange=1k-4k,2k-8k``.
+
+.. option:: bssplit=str[,str][,str]
+
+ Sometimes you want even finer grained control of the block sizes issued, not
+ just an even split between them. This option allows you to weight various
+ block sizes, so that you are able to define a specific amount of block sizes
+ issued. The format for this option is::
+
+ bssplit=blocksize/percentage:blocksize/percentage
+
+ for as many block sizes as needed. So if you want to define a workload that
+ has 50% 64k blocks, 10% 4k blocks, and 40% 32k blocks, you would write::
+
+ bssplit=4k/10:64k/50:32k/40
+
+ Ordering does not matter. If the percentage is left blank, fio will fill in
+ the remaining values evenly. So a bssplit option like this one::
+
+ bssplit=4k/50:1k/:32k/
+
+ would have 50% 4k I/Os, and 25% each of 1k and 32k I/Os. The percentages
+ always add up to 100; if bssplit is given a range that adds up to more, it
+ will error out.
+
+ Comma-separated values may be specified for reads, writes, and trims as
+ described in :option:`blocksize`.
+
+ If you want a workload that has 50% 2k reads and 50% 4k reads, while having
+ 90% 4k writes and 10% 8k writes, you would specify::
+
+ bssplit=2k/50:4k/50,4k/90:8k/10
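+
+ The sketch below applies the blank-percentage rule when parsing a
+ :option:`bssplit` value. The helpers are hypothetical and model only the
+ rules stated above. Sizes are left as strings.

```python
def parse_bssplit(spec):
    """Parse one direction of a bssplit value, e.g. '4k/50:1k/:32k/'.

    Returns (size, percentage) pairs. Blank percentages share whatever is
    left of 100% evenly; a total above 100 is an error.
    """
    entries = []
    for item in spec.split(":"):
        size, _, pct = item.partition("/")
        entries.append((size, int(pct) if pct else None))
    given = sum(p for _, p in entries if p is not None)
    if given > 100:
        raise ValueError("bssplit percentages add up to more than 100")
    blanks = sum(1 for _, p in entries if p is None)
    share = (100 - given) / blanks if blanks else 0
    return [(size, share if p is None else p) for size, p in entries]


def parse_bssplit_all(value):
    """Commas separate the read, write and trim sets."""
    return [parse_bssplit(part) for part in value.split(",")]


print(parse_bssplit("4k/50:1k/:32k/"))
# [('4k', 50), ('1k', 25.0), ('32k', 25.0)]
print(parse_bssplit_all("2k/50:4k/50,4k/90:8k/10"))
```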
+
+.. option:: blocksize_unaligned, bs_unaligned
+
+ If set, fio will issue I/O units with any size within
+ :option:`blocksize_range`, not just multiples of the minimum size. This
+ typically won't work with direct I/O, as that normally requires sector
+ alignment.
+
+.. option:: bs_is_seq_rand
+
+ If this option is set, fio will use the normal read and write blocksize
+ settings as sequential and random blocksize settings instead. Any random
+ read or write will use the WRITE blocksize settings, and any sequential read
+ or write will use the READ blocksize settings.
+
+.. option:: blockalign=int[,int][,int], ba=int[,int][,int]
+
+ Boundary to which fio will align random I/O units. Default:
+ :option:`blocksize`. Minimum alignment is typically 512b for using direct
+ I/O, though it usually depends on the hardware block size. This option is
+ mutually exclusive with using a random map for files, so it will turn off
+ that option. Comma-separated values may be specified for reads, writes, and
+ trims as described in :option:`blocksize`.
+
+
+Buffers and memory
+~~~~~~~~~~~~~~~~~~
+
+.. option:: zero_buffers
+
+ Initialize buffers with all zeros. Default: fill buffers with random data.
+
+.. option:: refill_buffers
+
+ If this option is given, fio will refill the I/O buffers on every
+ submit. The default is to only fill them at init time and reuse that
+ data. Only makes sense if :option:`zero_buffers` isn't specified, naturally.
+ If data verification is enabled, :option:`refill_buffers` is also
+ automatically enabled.
+
+.. option:: scramble_buffers=bool
+
+ If :option:`refill_buffers` is too costly and the target is using data
+ deduplication, then setting this option will slightly modify the I/O buffer
+ contents to defeat normal dedupe attempts. This is not enough to defeat
+ more clever block compression attempts, but it will stop naive dedupe of
+ blocks. Default: true.
+
+.. option:: buffer_compress_percentage=int
+
+ If this is set, then fio will attempt to provide I/O buffer content (on
+ WRITEs) that compresses to the specified level. Fio does this by providing a
+ mix of random data and a fixed pattern. The fixed pattern is either zeroes,
+ or the pattern specified by :option:`buffer_pattern`. If the pattern option
+ is used, it might skew the compression ratio slightly. Note that this is per
+ block size unit; for a file/disk wide compression level that matches this
+ setting, you'll also want to set :option:`refill_buffers`.
+
+.. option:: buffer_compress_chunk=int
+
+ See :option:`buffer_compress_percentage`. This setting allows fio to manage
+ how big the ranges of random data and zeroed data are. Without this set, fio
+ will provide :option:`buffer_compress_percentage` of blocksize random data,
+ followed by the remaining zeroed. With this set to some chunk size smaller
+ than the block size, fio can alternate random and zeroed data throughout the
+ I/O buffer.
+
+.. option:: buffer_pattern=str
+
+ If set, fio will fill the I/O buffers with this pattern. If not set, the
+ contents of I/O buffers are defined by the other options related to buffer
+ contents. The setting can be any pattern of bytes, and can be prefixed with
+ 0x for hex values. It may also be a string, where the string must then be
+ wrapped with ``""``, e.g.::
+
+ buffer_pattern="abcd"
+
+ or::
+
+ buffer_pattern=-12
+
+ or::
+
+ buffer_pattern=0xdeadface
+
+ Also you can combine everything together in any order::
+
+ buffer_pattern=0xdeadface"abcd"-12
+
+.. option:: dedupe_percentage=int
+
+ If set, fio will generate this percentage of identical buffers when
+ writing. These buffers will be naturally dedupable. The contents of the
+ buffers depend on what other buffer compression settings have been set. It's
+ possible to have the individual buffers either fully compressible, or not at
+ all. This option only controls the distribution of unique buffers.
+
+.. option:: invalidate=bool
+
+ Invalidate the buffer/page cache parts for this file prior to starting
+ I/O if the platform and file type support it. Defaults to true.
+ This will be ignored if :option:`pre_read` is also specified for the
+ same job.
+
+.. option:: sync=bool
+
+ Use synchronous I/O for buffered writes. For the majority of I/O engines,
+ this means using O_SYNC. Default: false.
+
+.. option:: iomem=str, mem=str
+
+ Fio can use various types of memory as the I/O unit buffer. The allowed
+ values are:
+
+ **malloc**
+ Use memory from :manpage:`malloc(3)` as the buffers. Default memory
+ type.
+
+ **shm**
+ Use shared memory as the buffers. Allocated through
+ :manpage:`shmget(2)`.
+
+ **shmhuge**
+ Same as shm, but use huge pages as backing.
+
+ **mmap**
+ Use mmap to allocate buffers. May either be anonymous memory, or can
+ be file backed if a filename is given after the option. The format
+ is `mem=mmap:/path/to/file`.
+
+ **mmaphuge**
+ Use a memory mapped huge file as the buffer backing. Append filename
+ after mmaphuge, e.g. `mem=mmaphuge:/hugetlbfs/file`.
+
+ **mmapshared**
+ Same as mmap, but use a MAP_SHARED mapping.
+
+ The area allocated is a function of the maximum allowed bs size for the job,
+ multiplied by the I/O depth given. Note that for **shmhuge** and
+ **mmaphuge** to work, the system must have free huge pages allocated. This
+ can normally be checked and set by reading/writing
+ :file:`/proc/sys/vm/nr_hugepages` on a Linux system. Fio assumes a huge page
+ is 4MiB in size. So to calculate the number of huge pages you need for a
+ given job file, add up the I/O depth of all jobs (normally one unless
+ :option:`iodepth` is used) and multiply by the maximum bs set. Then divide
+ that number by the huge page size. You can see the size of the huge pages in
+ :file:`/proc/meminfo`. If no huge pages are allocated (that is, if
+ `nr_hugepages` is zero), using **mmaphuge** or **shmhuge** will fail. Also
+ see :option:`hugepage-size`.
+
+ **mmaphuge** also needs to have hugetlbfs mounted and the file location
+ should point there. So if it's mounted in :file:`/huge`, you would use
+ `mem=mmaphuge:/huge/somefile`.
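+
+ The sketch below expresses the huge page arithmetic above in Python. The
+ 4MiB default mirrors the page size fio assumes, as stated above. Rounding
+ up to whole pages is an assumption of the sketch. Check
+ :file:`/proc/meminfo` for the real page size.

```python
KiB = 1 << 10
MiB = 1 << 20


def hugepages_needed(iodepths, max_bs, hugepage_size=4 * MiB):
    """Huge pages for I/O buffers: sum of iodepths * max bs / huge page size.

    Rounds up to whole pages (an assumption of this sketch).
    """
    total = sum(iodepths) * max_bs
    return -(-total // hugepage_size)  # ceiling division on integers


# Four jobs with iodepth=32 and bs=256k need 32 MiB of buffers.
print(hugepages_needed([32, 32, 32, 32], 256 * KiB))  # 8
```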
+
+.. option:: iomem_align=int
+
+ This indicates the memory alignment of the I/O memory buffers. Note that
+ the given alignment is applied to the first I/O unit buffer; if using
+ :option:`iodepth`, the alignment of the following buffers is given by the
+ :option:`bs` used. In other words, if using a :option:`bs` that is a
+ multiple of the page size of the system, all buffers will be aligned to
+ this value. If using a :option:`bs` that is not page aligned, the alignment
+ of subsequent I/O memory buffers is the sum of the :option:`iomem_align` and
+ :option:`bs` used.
+
+.. option:: hugepage-size=int
+
+ Defines the size of a huge page. Must be at least equal to the system
+ setting, see :file:`/proc/meminfo`. Defaults to 4MiB. Should probably
+ always be a multiple of megabytes, so using ``hugepage-size=Xm`` is the
+ preferred way to set this to avoid setting a non-pow-2 bad value.
+
+.. option:: lockmem=int
+
+ Pin the specified amount of memory with :manpage:`mlock(2)`. Can be used to
+ simulate a smaller amount of memory. The amount specified is per worker.
+
+
+I/O size
+~~~~~~~~
+
+.. option:: size=int
+
+ The total size of file I/O for each thread of this job. Fio will run until
+ this many bytes have been transferred, unless runtime is limited by other
+ options (such as :option:`runtime`) or the amount of I/O is increased or
+ decreased by :option:`io_size`. Fio will divide this size between the
+ available files determined by options such as :option:`nrfiles` and
+ :option:`filename`, unless :option:`filesize` is
+ specified by the job. If the result of division happens to be 0, the size is
+ set to the physical size of the given files or devices if they exist.
+ If this option is not specified, fio will use the full size of the given
+ files or devices. If the files do not exist, size must be given. It is also
+ possible to give size as a percentage between 1 and 100. If ``size=20%`` is
+ given, fio will use 20% of the full size of the given files or devices.
+ Can be combined with :option:`offset` to constrain the start and end range
+ that I/O will be done within.
+
+.. option:: io_size=int, io_limit=int
+
+ Normally fio operates within the region set by :option:`size`, which means
+ that the :option:`size` option sets both the region and size of I/O to be
+ performed. Sometimes that is not what you want. With this option, it is
+ possible to define just the amount of I/O that fio should do. For instance,
+ if :option:`size` is set to 20GiB and :option:`io_size` is set to 5GiB, fio
+ will perform I/O within the first 20GiB but exit when 5GiB have been
+ done. The opposite is also possible -- if :option:`size` is set to 20GiB,
+ and :option:`io_size` is set to 40GiB, then fio will do 40GiB of I/O within
+ the 0..20GiB region.
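+
+ The sketch below models how :option:`size` and :option:`io_size` interact
+ for a sequential job, with sizes scaled down from GiB to small integers. It
+ assumes that offsets wrap back to the start of the region once the end is
+ reached. ``io_offsets`` is hypothetical.

```python
def io_offsets(size, io_size, bs):
    """Sequential offsets when io_size bytes are done inside a size-byte region.

    Offsets wrap back to 0 at the end of the region, so io_size > size
    revisits blocks and io_size < size stops early.
    """
    offsets = []
    done = 0
    pos = 0
    while done < io_size:
        offsets.append(pos)
        done += bs
        pos = (pos + bs) % size  # wrap at the end of the region
    return offsets


# size=20, io_size=40, bs=5: every block in the region is done twice.
print(io_offsets(20, 40, 5))  # [0, 5, 10, 15, 0, 5, 10, 15]
# size=20, io_size=5: fio stops after the first block.
print(io_offsets(20, 5, 5))  # [0]
```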
+
+.. option:: filesize=int
+
+ Individual file sizes. May be a range, in which case fio will select sizes
+ for files at random within the given range and limited to :option:`size` in
+ total (if that is given). If not given, each created file is the same size.
+ This option overrides :option:`size` in terms of file size, which means
+ this value is used as a fixed size or possible range of each file.
+
+.. option:: file_append=bool
+
+ Perform I/O after the end of the file. Normally fio will operate within the
+ size of a file. If this option is set, then fio will append to the file
+ instead. This has identical behavior to setting :option:`offset` to the size
+ of a file. This option is ignored on non-regular files.
+
+.. option:: fill_device=bool, fill_fs=bool
+
+ Sets size to something really large and waits for ENOSPC (no space left on
+ device) as the terminating condition. Only makes sense with sequential
+ write. For a read workload, the mount point will be filled first then I/O
+ started on the result. This option doesn't make sense if operating on a raw
+ device node, since the size of that is already known by the file system.
+ Additionally, writing beyond end-of-device will not return ENOSPC there.
+
+
+I/O engine
+~~~~~~~~~~
+
+.. option:: ioengine=str
+
+ Defines how the job issues I/O to the file. The following types are defined:
+
+ **sync**
+ Basic :manpage:`read(2)` or :manpage:`write(2)`
+ I/O. :manpage:`lseek(2)` is used to position the I/O location.
+
+ **psync**
+ Basic :manpage:`pread(2)` or :manpage:`pwrite(2)` I/O. Default on
+ all supported operating systems except for Windows.
+
+ **vsync**
+ Basic :manpage:`readv(2)` or :manpage:`writev(2)` I/O. Will emulate
+ queuing by coalescing adjacent I/Os into a single submission.
+
+ **pvsync**
+ Basic :manpage:`preadv(2)` or :manpage:`pwritev(2)` I/O.
+
+ **pvsync2**
+ Basic :manpage:`preadv2(2)` or :manpage:`pwritev2(2)` I/O.
+
+ **libaio**
+ Linux native asynchronous I/O. Note that Linux may only support
+ queued behaviour with non-buffered I/O (set ``direct=1`` or
+ ``buffered=0``).
+ This engine defines engine specific options.
+
+ **posixaio**
+ POSIX asynchronous I/O using :manpage:`aio_read(3)` and
+ :manpage:`aio_write(3)`.
+
+ **solarisaio**
+ Solaris native asynchronous I/O.
+
+ **windowsaio**
+ Windows native asynchronous I/O. Default on Windows.
+
+ **mmap**
+ File is memory mapped with :manpage:`mmap(2)` and data copied
+ to/from using :manpage:`memcpy(3)`.
+
+ **splice**
+ :manpage:`splice(2)` is used to transfer the data and
+ :manpage:`vmsplice(2)` to transfer data from user space to the
+ kernel.
+
+ **sg**
+ SCSI generic sg v3 I/O. May either be synchronous using the SG_IO
+ ioctl, or if the target is an sg character device we use
+ :manpage:`read(2)` and :manpage:`write(2)` for asynchronous
+ I/O. Requires filename option to specify either block or character
+ devices.
+
+ **null**
+ Doesn't transfer any data, just pretends to. This is mainly used to
+ exercise fio itself and for debugging/testing purposes.
+
+ **net**
+ Transfer over the network to given ``host:port``. Depending on the
+ :option:`protocol` used, the :option:`hostname`, :option:`port`,
+ :option:`listen` and :option:`filename` options are used to specify
+ what sort of connection to make, while the :option:`protocol` option
+ determines which protocol will be used. This engine defines engine
+ specific options.
+
+ **netsplice**
+ Like **net**, but uses :manpage:`splice(2)` and
+ :manpage:`vmsplice(2)` to map data and send/receive.
+ This engine defines engine specific options.
+
+ **cpuio**
+ Doesn't transfer any data, but burns CPU cycles according to the
+ :option:`cpuload` and :option:`cpuchunks` options. Setting
+ ``cpuload=85`` will cause that job to do nothing but burn 85% of
+ the CPU. In case of SMP machines, use ``numjobs=<no_of_cpu>`` to get
+ the desired CPU usage, as :option:`cpuload` only loads a single CPU at
+ the desired rate. A job never finishes unless there is at least one
+ non-cpuio job.
+
+ **guasi**
+ The GUASI I/O engine is the Generic Userspace Asynchronous Syscall
+ Interface approach to async I/O. See
+
+ http://www.xmailserver.org/guasi-lib.html
+
+ for more info on GUASI.
+
+ **rdma**
+ The RDMA I/O engine supports both RDMA memory semantics
+ (RDMA_WRITE/RDMA_READ) and channel semantics (Send/Recv) for the
+ InfiniBand, RoCE and iWARP protocols.
+
+ **falloc**
+ I/O engine that does regular fallocate to simulate data transfer as
+ fio ioengine.
+
+ DDIR_READ
+ does fallocate(,mode = FALLOC_FL_KEEP_SIZE,).
+
+ DDIR_WRITE
+ does fallocate(,mode = 0).
+
+ DDIR_TRIM
+ does fallocate(,mode = FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE).
+
+ **e4defrag**
+ I/O engine that does regular EXT4_IOC_MOVE_EXT ioctls to simulate
+ defragment activity in response to DDIR_WRITE events.
+
+ **rbd**
+ I/O engine supporting direct access to Ceph Rados Block Devices
+ (RBD) via librbd without the need to use the kernel rbd driver. This
+ ioengine defines engine specific options.
+
+ **gfapi**
+ Uses the Glusterfs libgfapi sync interface for direct access to
+ Glusterfs volumes without having to go through FUSE. This ioengine
+ defines engine specific options.
+
+ **gfapi_async**
+ Uses the Glusterfs libgfapi async interface for direct access to
+ Glusterfs volumes without having to go through FUSE. This ioengine
+ defines engine specific options.
+
+ **libhdfs**
+ Read and write through Hadoop (HDFS). The :file:`filename` option
+ is used to specify host,port of the hdfs name-node to connect. This
+ engine interprets offsets a little differently. In HDFS, files once
+ created cannot be modified. So random writes are not possible. To
+ imitate this, the libhdfs engine expects a bunch of small files to be
+ created over HDFS, and the engine will randomly pick a file out of those
+ files based on the offset generated by the fio backend (see the example
+ job file to create such files, using the ``rw=write`` option). Please
+ note, you might want to set the necessary environment variables to work
+ with hdfs/libhdfs properly. Each job uses its own connection to
+ HDFS.
+
+ **mtd**
+ Read, write and erase an MTD character device (e.g.,
+ :file:`/dev/mtd0`). Discards are treated as erases. Depending on the
+ underlying device type, the I/O may have to go in a certain pattern,
+ e.g., on NAND, writing sequentially to erase blocks and discarding
+ before overwriting. The ``trimwrite`` mode works well for this
+ constraint.
+
+ **pmemblk**
+ Read and write using filesystem DAX to a file on a filesystem
+ mounted with DAX on a persistent memory device through the NVML
+ libpmemblk library.
+
+ **dev-dax**
+ Read and write using device DAX to a persistent memory device (e.g.,
+ :file:`/dev/dax0.0`) through the NVML libpmem library.
+
+ **external**
+ Prefix to specify loading an external I/O engine object file. Append
+ the engine filename, e.g. ``ioengine=external:/tmp/foo.o`` to load
+ ioengine :file:`foo.o` in :file:`/tmp`.
+
+
+I/O engine specific parameters
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In addition, there are some parameters which are only valid when a specific
+ioengine is in use. These are used identically to normal parameters, with the
+caveat that when used on the command line, they must come after the
+:option:`ioengine` that defines them is selected.
+
+.. option:: userspace_reap : [libaio]
+
+ Normally, with the libaio engine in use, fio will use the
+ :manpage:`io_getevents(2)` system call to reap newly returned events. With
+ this flag turned on, the AIO ring will be read directly from user-space to
+ reap events. The reaping mode is only enabled when polling for a minimum of
+ 0 events (e.g. when :option:`iodepth_batch_complete` is set to 0).
+
+.. option:: hipri : [pvsync2]
+
+ Set RWF_HIPRI on I/O, indicating to the kernel that it's of higher priority
+ than normal.
+
+.. option:: cpuload=int : [cpuio]
+
+ Attempt to use the specified percentage of CPU cycles.
+
+.. option:: cpuchunks=int : [cpuio]
+
+ Split the load into cycles of the given time. In microseconds.
+
+.. option:: exit_on_io_done=bool : [cpuio]
+
+ Detect when I/O threads are done, then exit.
+
+.. option:: hostname=str : [netsplice] [net]
+
+ The host name or IP address to use for TCP or UDP based I/O. If the job is
+ a TCP listener or UDP reader, the host name is not used and must be omitted
+ unless it is a valid UDP multicast address.
+
+.. option:: namenode=str : [libhdfs]
+
+ The host name or IP address of an HDFS cluster namenode to contact.
+
+.. option:: port=int
+
+ [netsplice], [net]
+
+ The TCP or UDP port to bind to or connect to. If this is used with
+ :option:`numjobs` to spawn multiple instances of the same job type, then
+ this will be the starting port number since fio will use a range of
+ ports.
+
+ [libhdfs]
+
+ The listening port of the HDFS cluster namenode.
+
+.. option:: interface=str : [netsplice] [net]
+
+ The IP address of the network interface used to send or receive UDP
+ multicast.
+
+.. option:: ttl=int : [netsplice] [net]
+
+ Time-to-live value for outgoing UDP multicast packets. Default: 1.
+
+.. option:: nodelay=bool : [netsplice] [net]
+
+ Set TCP_NODELAY on TCP connections.
+
+.. option:: protocol=str, proto=str : [netsplice] [net]
+
+ The network protocol to use. Accepted values are:
+
+ **tcp**
+ Transmission control protocol.
+ **tcpv6**
+ Transmission control protocol V6.
+ **udp**
+ User datagram protocol.
+ **udpv6**
+ User datagram protocol V6.
+ **unix**
+ UNIX domain socket.
+
+ When the protocol is TCP or UDP, the port must also be given, as well as the
+ hostname unless the job is a TCP listener or UDP reader. For UNIX domain
+ sockets, the normal :option:`filename` option should be used and the port is
+ invalid.
+
+.. option:: listen : [net]
+
+ For TCP network connections, tell fio to listen for incoming connections
+ rather than initiating an outgoing connection. The :option:`hostname` must
+ be omitted if this option is used.
+
+.. option:: pingpong : [net]
+
+ Normally a network writer will just continue writing data, and a network
+ reader will just consume packets. If ``pingpong=1`` is set, a writer will
+ send its normal payload to the reader, then wait for the reader to send the
+ same payload back. This allows fio to measure network latencies. The
+ submission latency then measures local time spent sending or receiving, and
+ the completion latency measures how long it took for the other end to
+ receive and send back. For UDP multicast traffic,
+ ``pingpong=1`` should only be set for a single reader when multiple readers
+ are listening to the same address.
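+
+ As a sketch, assuming both ends run on the same host and TCP port 8888 is
+ free (host, port and sizes are illustrative), a latency test can pair a
+ listening reader with a writer. ``pingpong=1`` is placed in the global
+ section so that both ends use it. :option:`hostname` is given only to the
+ writer, since a listener must omit it::
+
+ ; illustrative host, port and sizes
+ [global]
+ ioengine=net
+ protocol=tcp
+ port=8888
+ bs=4k
+ size=16m
+ pingpong=1
+
+ [receiver]
+ listen
+ rw=read
+
+ [sender]
+ hostname=localhost
+ ; startdelay gives the receiver time to start listening
+ startdelay=1
+ rw=write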
+
+.. option:: window_size : [net]
+
+ Set the desired socket buffer size for the connection.
+
+.. option:: mss : [net]
+
+ Set the TCP maximum segment size (TCP_MAXSEG).
+
+.. option:: donorname=str : [e4defrag]
+
+ File will be used as a block donor (swap extents between files).
+
+.. option:: inplace=int : [e4defrag]
+
+ Configure donor file blocks allocation strategy:
+
+ **0**
+ Default. Preallocate donor's file on init.
+ **1**
+ Allocate space immediately inside the defragment event, and free it
+ right after the event.
+
+.. option:: clustername=str : [rbd]
+
+ Specifies the name of the Ceph cluster.
+
+.. option:: rbdname=str : [rbd]
+
+ Specifies the name of the RBD.
+
+.. option:: pool=str : [rbd]
+
+ Specifies the name of the Ceph pool containing RBD.
+
+.. option:: clientname=str : [rbd]
+
+ Specifies the username (without the 'client.' prefix) used to access the
+ Ceph cluster. If the *clustername* is specified, the *clientname* shall be
+ the full *type.id* string. If no ``type.`` prefix is given, fio will add
+ ``client.`` by default.
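+
+ A minimal sketch of an RBD job. It assumes an existing pool named ``rbd``
+ that holds an image named ``fio_test``, and that the ``client.admin`` user
+ can access it. All three names are hypothetical examples::
+
+ ; pool, image and client names are hypothetical
+ [rbd-randwrite]
+ ioengine=rbd
+ clientname=admin
+ pool=rbd
+ rbdname=fio_test
+ rw=randwrite
+ bs=4k
+ iodepth=32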
+
+.. option:: skip_bad=bool : [mtd]
+
+ Skip operations against known bad blocks.
+
+.. option:: hdfsdirectory : [libhdfs]
+
+ libhdfs will create chunks in this HDFS directory.
+
+.. option:: chunk_size : [libhdfs]
+
+ The size of the chunk to use for each file.
+
+
+I/O depth
+~~~~~~~~~
+
+.. option:: iodepth=int
+
+ Number of I/O units to keep in flight against the file. Note that
+ increasing *iodepth* beyond 1 will not affect synchronous ioengines (except
+ for small degrees when :option:`verify_async` is in use). Even async
+ engines may impose OS restrictions causing the desired depth not to be
+ achieved. This may happen on Linux when using libaio and not setting
+ :option:`direct` to 1, since buffered I/O is not async on that OS. Keep an
+ eye on the I/O depth distribution in the fio output to verify that the
+ achieved depth is as expected. Default: 1.
+
+.. option:: iodepth_batch_submit=int, iodepth_batch=int
+
+ This defines how many pieces of I/O to submit at once. It defaults to 1
+ which means that we submit each I/O as soon as it is available, but can be
+ raised to submit bigger batches of I/O at a time. If it is set to 0, the
+ :option:`iodepth` value will be used.
+
+.. option:: iodepth_batch_complete_min=int, iodepth_batch_complete=int
+
+ This defines how many pieces of I/O to retrieve at once. It defaults to 1
+ which means that we'll ask for a minimum of 1 I/O in the retrieval process
+ from the kernel. The I/O retrieval will go on until we hit the limit set by
+ :option:`iodepth_low`. If this variable is set to 0, then fio will always
+ check for completed events before queuing more I/O. This helps reduce I/O
+ latency, at the cost of more retrieval system calls.
+
+.. option:: iodepth_batch_complete_max=int
+
+ This defines the maximum number of pieces of I/O to retrieve at once. This
+ variable should be used along with :option:`iodepth_batch_complete_min`,
+ specifying the range of min and max amount of I/O which should be
+ retrieved. By default it is equal to the :option:`iodepth_batch_complete_min`
+ value.
+
+ Example #1::
+
+ iodepth_batch_complete_min=1
+ iodepth_batch_complete_max=<iodepth>
+
+ which means that we will retrieve at least 1 I/O and up to the whole
+ submitted queue depth. If no I/O has completed yet, we will wait.
+
+ Example #2::
+
+ iodepth_batch_complete_min=0
+ iodepth_batch_complete_max=<iodepth>
+
+ which means that we can retrieve up to the whole submitted queue depth, but
+ if no I/O has completed yet, we will NOT wait and will immediately exit the
+ system call. In this example we simply do polling.
+
+.. option:: iodepth_low=int
+
+ The low water mark indicating when to start filling the queue
+ again. Defaults to the same as :option:`iodepth`, meaning that fio will
+ attempt to keep the queue full at all times. If :option:`iodepth` is set to
+ e.g. 16 and *iodepth_low* is set to 4, then after fio has filled the queue of
+ 16 requests, it will let the depth drain down to 4 before starting to fill
+ it again.
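+
+ For example, here is a sketch with illustrative values, assuming an
+ asynchronous engine such as libaio with ``direct=1``. fio fills the queue to
+ 16 requests, submits in batches of 4, and lets the depth drain to 4 before
+ refilling::
+
+ ; illustrative queue settings
+ [queue-demo]
+ ioengine=libaio
+ direct=1
+ rw=randread
+ bs=4k
+ size=256m
+ iodepth=16
+ iodepth_batch_submit=4
+ iodepth_low=4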
+
+.. option:: io_submit_mode=str
+
+ This option controls how fio submits the I/O to the I/O engine. The default
+ is ``inline``, which means that the fio job threads submit and reap I/O
+ directly. If set to ``offload``, the job threads will offload I/O submission
+ to a dedicated pool of I/O threads. This requires some coordination and thus
+ has a bit of extra overhead, especially for lower queue depth I/O where it
+ can increase latencies. The benefit is that fio can manage submission rates
+ independently of the device completion rates. This avoids skewed latency
+ reporting if I/O backs up on the device side (the coordinated omission
+ problem).
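+
+ A sketch pairing offloaded submission with a fixed IOPS target set through
+ :option:`rate_iops`. Independent submission pacing is presumably most useful
+ in this kind of job. The values are illustrative::
+
+ ; illustrative values
+ [offload-demo]
+ ioengine=libaio
+ direct=1
+ rw=randread
+ bs=4k
+ size=256m
+ iodepth=32
+ rate_iops=5000
+ io_submit_mode=offload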
+
+
+I/O rate
+~~~~~~~~
+
+.. option:: thinktime=time
+
+ Stall the job for the specified period of time after an I/O has completed
+ before issuing the next. May be used to simulate processing being done by an
+ application. When the unit is omitted, the value is given in microseconds.
+ See :option:`thinktime_blocks` and :option:`thinktime_spin`.
+
+.. option:: thinktime_spin=time
+
+ Only valid if :option:`thinktime` is set - pretend to spend CPU time doing
+ something with the data received, before falling back to sleeping for the
+ rest of the period specified by :option:`thinktime`. When the unit is
+ omitted, the value is given in microseconds.
+
+.. option:: thinktime_blocks=int
+
+ Only valid if :option:`thinktime` is set - control how many blocks to issue
+ before waiting for the :option:`thinktime` period. If not set, defaults to 1,
+ which makes fio wait after every block. This effectively makes any queue
+ depth setting redundant, since no more than 1 I/O will be queued before we
+ have to complete it and do our thinktime. In other words, this setting
+ effectively caps the queue depth if the latter is larger.
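+
+ As an illustration with arbitrary values, the following job issues 8 blocks.
+ It then spends 100 microseconds spinning on the CPU and sleeps for the
+ remaining 400 microseconds of the 500 microsecond think time. After that it
+ issues the next 8 blocks::
+
+ ; units omitted, so thinktime and thinktime_spin are in microseconds
+ [think-demo]
+ rw=randread
+ bs=4k
+ size=64m
+ thinktime=500
+ thinktime_spin=100
+ thinktime_blocks=8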
+
+.. option:: rate=int[,int][,int]
+
+ Cap the bandwidth used by this job. The number is in bytes/sec, the normal
+ suffix rules apply. Comma-separated values may be specified for reads,
+ writes, and trims as described in :option:`blocksize`.
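+
+ For instance, a mixed random read/write job can be capped at ``10m`` for
+ reads and ``5m`` for writes. Assuming the default base 2 interpretation of
+ the ``m`` suffix, that is 10 MiB/s and 5 MiB/s. The values are
+ illustrative::
+
+ ; illustrative: reads capped at 10m per second, writes at 5m per second
+ [rate-demo]
+ rw=randrw
+ bs=4k
+ size=256m
+ rate=10m,5m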
+
+.. option:: rate_min=int[,int][,int]
+
+ Tell fio to do whatever it can to maintain at least this bandwidth. Failing
+ to meet this requirement will cause the job to exit. Comma-separated values
+ may be specified for reads, writes, and trims as described in
+ :option:`blocksize`.
+
+.. option:: rate_iops=int[,int][,int]
+
+ Cap the bandwidth to this number of IOPS. Basically the same as
+ :option:`rate`, just specified independently of bandwidth. If the job is
+ given a block size range instead of a fixed value, the smallest block size
+ is used as the metric. Comma-separated values may be specified for reads,
+ writes, and trims as described in :option:`blocksize`.
+
+.. option:: rate_iops_min=int[,int][,int]
+
+ If fio doesn't meet this rate of I/O, it will cause the job to exit.
+ Comma-separated values may be specified for reads, writes, and trims as
+ described in :option:`blocksize`.
+
+.. option:: rate_process=str
+
+ This option controls how fio manages rated I/O submissions. The default is
+ ``linear``, which submits I/O in a linear fashion with fixed delays between
+ I/Os that get adjusted based on I/O completion rates. If this is set to
+ ``poisson``, fio will submit I/O based on a more real-world random request
+ flow, known as the Poisson process
+ (https://en.wikipedia.org/wiki/Poisson_point_process). The lambda will be
+ 10^6 / IOPS for the given workload.
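+
+ As a worked example with illustrative values, a job targeting 1000 IOPS with
+ Poisson-distributed submissions gets a lambda of 10^6 / 1000 = 1000::
+
+ ; illustrative: 1000 IOPS with Poisson arrivals
+ [poisson-demo]
+ rw=randread
+ bs=4k
+ size=256m
+ rate_iops=1000
+ rate_process=poisson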
+
+
+I/O latency
+~~~~~~~~~~~
+
+.. option:: latency_target=time
+
+ If set, fio will attempt to find the max performance point that the given
+ workload will run at while maintaining a latency below this target. When
+ the unit is omitted, the value is given in microseconds. See
+ :option:`latency_window` and :option:`latency_percentile`.
+
+.. option:: latency_window=time
+
+ Used with :option:`latency_target` to specify the sample window that the job
+ is run at varying queue depths to test the performance. When the unit is
+ omitted, the value is given in microseconds.
+
+.. option:: latency_percentile=float
+
+ The percentage of I/Os that must fall within the criteria specified by
+ :option:`latency_target` and :option:`latency_window`. If not set, this
+ defaults to 100.0, meaning that all I/Os must be at or below the value set
+ by :option:`latency_target`.
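+
+ A sketch of a latency-target search, assuming :option:`iodepth` acts as the
+ upper bound on the queue depths fio tries. The values are illustrative. fio
+ looks for the highest performance point at which 99.9% of I/Os complete
+ within 10 ms, sampling in 5 second windows::
+
+ ; illustrative values; iodepth assumed to bound the search
+ [latency-probe]
+ ioengine=libaio
+ direct=1
+ rw=randread
+ bs=4k
+ iodepth=64
+ latency_target=10ms
+ latency_window=5s
+ latency_percentile=99.9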
+
+.. option:: max_latency=time
+
+ If set, fio will exit the job with an ETIMEDOUT error if it exceeds this
+ maximum latency. When the unit is omitted, the value is given in
+ microseconds.
+
+.. option:: rate_cycle=int
+
+ Average bandwidth for :option:`rate` and :option:`rate_min` over this number
+ of milliseconds.
+
+
+I/O replay
+~~~~~~~~~~
+
+.. option:: write_iolog=str
+
+ Write the issued I/O patterns to the specified file. See
+ :option:`read_iolog`. Specify a separate file for each job, otherwise the
+ iologs will be interspersed and the file may be corrupt.
+
+.. option:: read_iolog=str
+
+ Open an iolog with the specified file name and replay the I/O patterns it
+ contains. This can be used to store a workload and replay it sometime
+ later. The iolog given may also be a blktrace binary file, which allows fio
+ to replay a workload captured by :command:`blktrace`. See
+ :manpage:`blktrace(8)` for how to capture such logging data. For blktrace
+ replay, the file needs to be turned into a blkparse binary data file first
+ (``blkparse <device> -o /dev/null -d file_for_fio.bin``).
+
+.. option:: replay_no_stall=int
+
+ When replaying I/O with :option:`read_iolog` the default behavior is to
+ attempt to respect the timestamps within the log and replay them with the
+ appropriate delay between I/Os. By setting this variable fio will not
+ respect the timestamps and attempt to replay them as fast as possible while
+ still respecting ordering. The result is the same I/O pattern to a given
+ device, but different timings.
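+
+ A sketch of a record-and-replay round trip, assuming two separate job files
+ run one after the other (``fio record.fio``, then ``fio replay.fio``). It
+ also assumes the replay job can take its file names from the log. The file
+ names and sizes are illustrative::
+
+ ; record.fio: write a workload and log the issued I/O
+ [record]
+ filename=/tmp/fio-iolog.dat
+ size=64m
+ rw=randwrite
+ bs=4k
+ write_iolog=/tmp/fio-record.iolog
+
+ ; replay.fio: replay the logged I/O as fast as possible
+ [replay]
+ read_iolog=/tmp/fio-record.iolog
+ replay_no_stall=1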
+
+.. option:: replay_redirect=str
+
+ While replaying I/O patterns using :option:`read_iolog` the default behavior
+ is to replay each I/O onto the major/minor device it was recorded
+ from. This is sometimes undesirable because on a different machine those
+ major/minor numbers can map to a different device. Changing hardware on the
+ same system can also result in a different major/minor mapping.
+ ``replay_redirect`` causes all I/Os to be replayed onto the single specified
+ device regardless of the device they were recorded from. For example,
+ ``replay_redirect=/dev/sdc`` would cause all I/O in the blktrace or iolog to
+ be replayed onto :file:`/dev/sdc`. This means
+ multiple devices will be replayed onto a single device, if the trace
+ contains multiple devices. If you want multiple devices to be replayed
+ concurrently to multiple redirected devices you must blkparse your trace
+ into separate traces and replay them with independent fio invocations.
+ Unfortunately this also breaks the strict time ordering between multiple
+ device accesses.
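+
+ For example, assume :file:`file_for_fio.bin` was produced by the
+ :command:`blkparse` command shown under :option:`read_iolog`. The following
+ sketch replays it onto :file:`/dev/sdc`. Any writes in the trace will be
+ issued to that device, so point it only at a device whose contents may be
+ destroyed::
+
+ ; illustrative; writes in the trace will overwrite /dev/sdc
+ [blktrace-replay]
+ read_iolog=file_for_fio.bin
+ replay_redirect=/dev/sdc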
+
+.. option:: replay_align=int
+
+ Force alignment of I/O offsets and lengths in a trace to this power of 2
+ value.
+
+.. option:: replay_scale=int
+
+ Scale sector offsets down by this factor when replaying traces.
+
+
+Threads, processes and job synchronization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. option:: thread
+
+ Fio defaults to forking jobs, however if this option is given, fio will use
+ POSIX Threads function :manpage:`pthread_create(3)` to create threads instead
+ of forking processes.
+
+.. option:: wait_for=str
+
+ Specifies the name of the already defined job to wait for. Only a single
+ waitee name may be specified. If set, the job won't be started until all
+ workers of the waitee job are done.
+
+ ``wait_for`` operates on the job name basis, so there are a few
+ limitations. First, the waitee must be defined prior to the waiter job
+ (meaning no forward references). Second, if a job is being referenced as a
+ waitee, it must have a unique name (no duplicate waitees).
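+
+ A minimal sketch with illustrative names and sizes. ``read-back`` is defined
+ after ``prep-write`` and names it in :option:`wait_for`, so it starts only
+ once every ``prep-write`` worker has finished::
+
+ ; illustrative job names, file and size
+ [prep-write]
+ filename=/tmp/fio-wait.dat
+ size=64m
+ rw=write
+
+ [read-back]
+ filename=/tmp/fio-wait.dat
+ size=64m
+ rw=read
+ wait_for=prep-write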
+
+.. option:: nice=int
+
+ Run the job with the given nice value. See man :manpage:`nice(2)`.
+
+ On Windows, values less than -15 set the process class to "High"; -1 through
+ -15 set "Above Normal"; 1 through 15 "Below Normal"; and above 15 "Idle"
+ priority class.
+
+.. option:: prio=int
+
+ Set the I/O priority value of this job. Linux limits us to a positive value
+ between 0 and 7, with 0 being the highest. See man
+ :manpage:`ionice(1)`. Refer to an appropriate manpage for other operating
+ systems since the meaning of priority may differ.
+
+.. option:: prioclass=int
+
+ Set the I/O priority class. See man :manpage:`ionice(1)`.
+
+.. option:: cpumask=int
+
+ Set the CPU affinity of this job. The parameter given is a bitmask of
+ allowed CPUs the job may run on. So if you want the allowed CPUs to be 1
+ and 5, you would pass the decimal value of (1 << 1 | 1 << 5), or 34. See man
+ :manpage:`sched_setaffinity(2)`. This may not work on all supported
+ operating systems or kernel versions. This option doesn't work well for a
+ higher CPU count than what you can store in an integer mask, so it can only
+ control the first 32 CPUs. For boxes with larger CPU counts, use
+ :option:`cpus_allowed`.
+
+.. option:: cpus_allowed=str
+
+ Controls the same options as :option:`cpumask`, but it allows a text setting
+ of the permitted CPUs instead. So to use CPUs 1 and 5, you would specify
+ ``cpus_allowed=1,5``. This option also allows a range of CPUs. Say you
+ wanted a binding to CPUs 1, 5, and 8-15, you would set
+ ``cpus_allowed=1,5,8-15``.
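+
+ As a worked example, CPUs 1 and 5 correspond to the mask
+ ``(1 << 1) | (1 << 5)``, that is 2 + 32 = 34. The two illustrative jobs
+ below should therefore both be restricted to those CPUs::
+
+ ; illustrative: both jobs may only run on CPUs 1 and 5
+ [mask-demo]
+ cpumask=34
+ rw=read
+ size=64m
+
+ [list-demo]
+ cpus_allowed=1,5
+ rw=read
+ size=64m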
+
+.. option:: cpus_allowed_policy=str
+
+ Set the policy of how fio distributes the CPUs specified by
+ :option:`cpus_allowed` or :option:`cpumask`. Two policies are supported:
-1. Overview
-2. How fio works
-3. Running fio
-4. Job file format
-5. Detailed list of parameters
-6. Normal output
-7. Terse output
-8. Trace file format
-9. CPU idleness profiling
-
-1.0 Overview and history
-------------------------
-fio was originally written to save me the hassle of writing special test
-case programs when I wanted to test a specific workload, either for
-performance reasons or to find/reproduce a bug. The process of writing
-such a test app can be tiresome, especially if you have to do it often.
-Hence I needed a tool that would be able to simulate a given io workload
-without resorting to writing a tailored test case again and again.
-
-A test work load is difficult to define, though. There can be any number
-of processes or threads involved, and they can each be using their own
-way of generating io. You could have someone dirtying large amounts of
-memory in an memory mapped file, or maybe several threads issuing
-reads using asynchronous io. fio needed to be flexible enough to
-simulate both of these cases, and many more.
-
-2.0 How fio works
------------------
-The first step in getting fio to simulate a desired io workload, is
-writing a job file describing that specific setup. A job file may contain
-any number of threads and/or files - the typical contents of the job file
-is a global section defining shared parameters, and one or more job
-sections describing the jobs involved. When run, fio parses this file
-and sets everything up as described. If we break down a job from top to
-bottom, it contains the following basic parameters:
+ **shared**
+ All jobs will share the CPU set specified.
+ **split**
+ Each job will get a unique CPU from the CPU set.
- IO type Defines the io pattern issued to the file(s).
- We may only be reading sequentially from this
- file(s), or we may be writing randomly. Or even
- mixing reads and writes, sequentially or randomly.
+ **shared** is the default behaviour if the option isn't specified. If
+ **split** is specified, then fio will assign one CPU per job. If not enough
+ CPUs are given for the jobs listed, then fio will round-robin the CPUs in
+ the set.
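+
+ For example, with illustrative values, four jobs sharing CPUs 0-3 under the
+ **split** policy should each be pinned to a different CPU from that set::
+
+ ; illustrative: one CPU per job from 0-3
+ [split-demo]
+ rw=randread
+ bs=4k
+ size=64m
+ numjobs=4
+ cpus_allowed=0-3
+ cpus_allowed_policy=split
+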
- Block size In how large chunks are we issuing io? This may be
- a single value, or it may describe a range of
- block sizes.
+.. option:: numa_cpu_nodes=str
- IO size How much data are we going to be reading/writing.
+ Set this job running on specified NUMA nodes' CPUs. The argument is a
+ comma-delimited list of node numbers, A-B ranges, or ``all``. Note: to enable
+ NUMA options support, fio must be built on a system with libnuma-dev(el)
+ installed.
- IO engine How do we issue io? We could be memory mapping the
- file, we could be using regular read/write, we
- could be using splice, async io, syslet, or even
- SG (SCSI generic sg).
+.. option:: numa_mem_policy=str
- IO depth If the io engine is async, how large a queuing
- depth do we want to maintain?
+ Set this job's memory policy and corresponding NUMA nodes. Format of the
+ arguments::
- IO type Should we be doing buffered io, or direct/raw io?
+ <mode>[:<nodelist>]
- Num files How many files are we spreading the workload over.
+ ``mode`` is one of the following memory policies: ``default``, ``prefer``,
+ ``bind``, ``interleave`` or ``local``. For the ``default`` and ``local``
+ memory policies, no node needs to be specified. For ``prefer``, only one
+ node is allowed. For ``bind`` and ``interleave``, a comma-delimited list of
+ numbers, A-B ranges, or ``all`` is allowed.
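+
+ A sketch assuming a NUMA-capable host with at least node 0 and fio built
+ with libnuma support. The values are illustrative. The job runs on node 0's
+ CPUs and binds its memory allocations to node 0::
+
+ ; illustrative: CPUs and memory both from NUMA node 0
+ [numa-demo]
+ rw=randread
+ bs=4k
+ size=64m
+ numa_cpu_nodes=0
+ numa_mem_policy=bind:0
+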
- Num threads How many threads or processes should we spread
- this workload over.
+.. option:: cgroup=str
-The above are the basic parameters defined for a workload, in addition
-there's a multitude of parameters that modify other aspects of how this
-job behaves.
+ Add job to this control group. If it doesn't exist, it will be created. The
+ system must have a mounted cgroup blkio mount point for this to work. If
+ your system doesn't have it mounted, you can do so with::
+ # mount -t cgroup -o blkio none /cgroup
-3.0 Running fio
----------------
-See the README file for command line parameters, there are only a few
-of them.
+.. option:: cgroup_weight=int
-Running fio is normally the easiest part - you just give it the job file
-(or job files) as parameters:
+ Set the weight of the cgroup to this value. See the documentation that comes
+ with the kernel; allowed values are in the range of 100..1000.
-$ fio job_file
+.. option:: cgroup_nodelete=bool
-and it will start doing what the job_file tells it to do. You can give
-more than one job file on the command line, fio will serialize the running
-of those files. Internally that is the same as using the 'stonewall'
-parameter described in the parameter section.
+ Normally fio will delete the cgroups it has created after the job
+ completion. To override this behavior and to leave cgroups around after the
+ job completion, set ``cgroup_nodelete=1``. This can be useful if one wants
+ to inspect various cgroup files after job completion. Default: false.
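+
+ A sketch assuming a blkio cgroup hierarchy mounted as described under
+ :option:`cgroup`. The group names and weights are illustrative. The two jobs
+ run in separate cgroups, and ``heavy`` should receive the larger share of the
+ device, subject to how the kernel enforces blkio weights::
+
+ ; illustrative cgroup names and weights (allowed range 100..1000)
+ [heavy]
+ rw=randread
+ size=64m
+ cgroup=fio-heavy
+ cgroup_weight=800
+
+ [light]
+ rw=randread
+ size=64m
+ cgroup=fio-light
+ cgroup_weight=200
+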
-If the job file contains only one job, you may as well just give the
-parameters on the command line. The command line parameters are identical
-to the job parameters, with a few extra that control global parameters
-(see README). For example, for the job file parameter iodepth=2, the
-mirror command line option would be --iodepth 2 or --iodepth=2. You can
-also use the command line for giving more than one job entry. For each
---name option that fio sees, it will start a new job with that name.
-Command line entries following a --name entry will apply to that job,
-until there are no more entries or a new --name entry is seen. This is
-similar to the job file options, where each option applies to the current
-job until a new [] job entry is seen.
+.. option:: flow_id=int
-fio does not need to run as root, except if the files or devices specified
-in the job section requires that. Some other options may also be restricted,
-such as memory locking, io scheduler switching, and decreasing the nice value.
+ The ID of the flow. If not specified, it defaults to being a global
+ flow. See :option:`flow`.
+.. option:: flow=int
-4.0 Job file format
--------------------
-As previously described, fio accepts one or more job files describing
-what it is supposed to do. The job file format is the classic ini file,
-where the names enclosed in [] brackets define the job name. You are free
-to use any ascii name you want, except 'global' which has special meaning.
-A global section sets defaults for the jobs described in that file. A job
-may override a global section parameter, and a job file may even have
-several global sections if so desired. A job is only affected by a global
-section residing above it. If the first character in a line is a ';' or a
-'#', the entire line is discarded as a comment.
+ Weight in token-based flow control. If this value is used, then there is a
+ 'flow counter' which is used to regulate the proportion of activity between
+ two or more jobs. Fio attempts to keep this flow counter near zero. The
+ ``flow`` parameter stands for how much should be added to or subtracted from
+ the flow counter on each iteration of the main I/O loop. That is, if one job
+ has ``flow=8`` and another job has ``flow=-1``, then there will be a roughly
+ 1:8 ratio in how much one runs vs the other.
-So let's look at a really simple job file that defines two processes, each
-randomly reading from a 128MB file.
+.. option:: flow_watermark=int
-; -- start job file --
-[global]
-rw=randread
-size=128m
+ The maximum value that the absolute value of the flow counter is allowed to
+ reach before the job must wait for a lower value of the counter.
-[job1]
+.. option:: flow_sleep=int
-[job2]
+ The period of time, in microseconds, to wait after the flow watermark has
+ been exceeded before retrying operations.
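+
+ For example, with illustrative values, the two jobs below share the default
+ global flow. ``slow`` adds 8 to the counter per iteration while ``fast``
+ subtracts 1. To keep the counter near zero, ``fast`` should run roughly 8
+ iterations for each one that ``slow`` runs::
+
+ ; illustrative flow settings
+ [global]
+ rw=randread
+ bs=4k
+ size=64m
+ flow_watermark=100
+ flow_sleep=1000
+
+ [fast]
+ flow=-1
+
+ [slow]
+ flow=8
+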
-; -- end job file --
+.. option:: stonewall, wait_for_previous
-As you can see, the job file sections themselves are empty as all the
-described parameters are shared. As no filename= option is given, fio
-makes up a filename for each of the jobs as it sees fit. On the command
-line, this job would look as follows:
+ Wait for preceding jobs in the job file to exit, before starting this
+ one. Can be used to insert serialization points in the job file. A stone
+ wall also implies starting a new reporting group, see
+ :option:`group_reporting`.
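+
+ A minimal sketch with an illustrative file name and size. ``read-back`` waits
+ for ``layout`` to exit before starting, and is reported in a new group::
+
+ ; illustrative: lay out the file, then read it back
+ [layout]
+ filename=/tmp/fio-stonewall.dat
+ size=64m
+ rw=write
+
+ [read-back]
+ stonewall
+ filename=/tmp/fio-stonewall.dat
+ size=64m
+ rw=randread
+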
-$ fio --name=global --rw=randread --size=128m --name=job1 --name=job2
+.. option:: exitall
+ When one job finishes, terminate the rest. The default is to wait for each
+ job to finish; sometimes that is not the desired action.
-Let's look at an example that has a number of processes writing randomly
-to files.
+.. option:: exec_prerun=str
-; -- start job file --
-[random-writers]
-ioengine=libaio
-iodepth=4
-rw=randwrite
-bs=32k
-direct=0
-size=64m
-numjobs=4
+ Before running this job, issue the command specified through
+ :manpage:`system(3)`. Output is redirected to a file called
+ :file:`jobname.prerun.txt`.
-; -- end job file --
+.. option:: exec_postrun=str
-Here we have no global section, as we only have one job defined anyway.
-We want to use async io here, with a depth of 4 for each file. We also
-increased the buffer size used to 32KB and define numjobs to 4 to
-fork 4 identical jobs. The result is 4 processes each randomly writing
-to their own 64MB file. Instead of using the above job file, you could
-have given the parameters on the command line. For this case, you would
-specify:
+ After the job completes, issue the command specified through
+ :manpage:`system(3)`. Output is redirected to a file called
+ :file:`jobname.postrun.txt`.
-$ fio --name=random-writers --ioengine=libaio --iodepth=4 --rw=randwrite --bs=32k --direct=0 --size=64m --numjobs=4
+.. option:: uid=int
-When fio is utilized as a basis of any reasonably large test suite, it might be
-desirable to share a set of standardized settings across multiple job files.
-Instead of copy/pasting such settings, any section may pull in an external
-.fio file with 'include filename' directive, as in the following example:
-
-; -- start job file including.fio --
-[global]
-filename=/tmp/test
-filesize=1m
-include glob-include.fio
-
-[test]
-rw=randread
-bs=4k
-time_based=1
-runtime=10
-include test-include.fio
-; -- end job file including.fio --
-
-; -- start job file glob-include.fio --
-thread=1
-group_reporting=1
-; -- end job file glob-include.fio --
-
-; -- start job file test-include.fio --
-ioengine=libaio
-iodepth=4
-; -- end job file test-include.fio --
-
-Settings pulled into a section apply to that section only (except global
-section). Include directives may be nested in that any included file may
-contain further include directive(s). Include files may not contain []
-sections.
-
-
-4.1 Environment variables
--------------------------
+ Instead of running as the invoking user, set the user ID to this value
+ before the thread/process does any work.
-fio also supports environment variable expansion in job files. Any
-substring of the form "${VARNAME}" as part of an option value (in other
-words, on the right of the `='), will be expanded to the value of the
-environment variable called VARNAME. If no such environment variable
-is defined, or VARNAME is the empty string, the empty string will be
-substituted.
+.. option:: gid=int
-As an example, let's look at a sample fio invocation and job file:
+ Set group ID, see :option:`uid`.
-$ SIZE=64m NUMJOBS=4 fio jobfile.fio
-; -- start job file --
-[random-writers]
-rw=randwrite
-size=${SIZE}
-numjobs=${NUMJOBS}
-; -- end job file --
+Verification
+~~~~~~~~~~~~
-This will expand to the following equivalent job file at runtime:
+.. option:: verify_only
-; -- start job file --
-[random-writers]
-rw=randwrite
-size=64m
-numjobs=4
-; -- end job file --
+ Do not perform specified workload, only verify data still matches previous
+ invocation of this workload. This option allows one to check data multiple
+ times at a later date without overwriting it. This option makes sense only
+ for workloads that write data, and does not support workloads with the
+ :option:`time_based` option set.
-fio ships with a few example job files, you can also look there for
-inspiration.
+.. option:: do_verify=bool
-4.2 Reserved keywords
----------------------
+ Run the verify phase after a write phase. Only valid if :option:`verify` is
+ set. Default: true.
-Additionally, fio has a set of reserved keywords that will be replaced
-internally with the appropriate value. Those keywords are:
+.. option:: verify=str
-$pagesize The architecture page size of the running system
-$mb_memory Megabytes of total memory in the system
-$ncpus Number of online available CPUs
+ If writing to a file, fio can verify the file contents after each iteration
+ of the job. Each verification method also implies verification of a special
+ header, which is written to the beginning of each block. This header also
+ includes meta information, like offset of the block, block number, timestamp
+ when block was written, etc. :option:`verify` can be combined with
+ :option:`verify_pattern` option. The allowed values are:
-These can be used on the command line or in the job file, and will be
-automatically substituted with the current system values when the job
-is run. Simple math is also supported on these keywords, so you can
-perform actions like:
+ **md5**
+ Use an md5 sum of the data area and store it in the header of
+ each block.
-size=8*$mb_memory
+ **crc64**
+ Use an experimental crc64 sum of the data area and store it in the
+ header of each block.
-and get that properly expanded to 8 times the size of memory in the
-machine.
+ **crc32c**
+ Use a crc32c sum of the data area and store it in the header of each
+ block.
+ **crc32c-intel**
+ Use hardware assisted crc32c calculation provided on SSE4.2 enabled
+ processors. Falls back to regular software crc32c, if not supported
+ by the system.
-5.0 Detailed list of parameters
--------------------------------
+ **crc32**
+ Use a crc32 sum of the data area and store it in the header of each
+ block.
-This section describes in details each parameter associated with a job.
-Some parameters take an option of a given type, such as an integer or
-a string. Anywhere a numeric value is required, an arithmetic expression
-may be used, provided it is surrounded by parentheses. Supported operators
-are:
+ **crc16**
+ Use a crc16 sum of the data area and store it in the header of each
+ block.
- addition (+)
- subtraction (-)
- multiplication (*)
- division (/)
- modulus (%)
- exponentiation (^)
+ **crc7**
+ Use a crc7 sum of the data area and store it in the header of each
+ block.
-For time values in expressions, units are microseconds by default. This is
-different than for time values not in expressions (not enclosed in
-parentheses). The following types are used:
+ **xxhash**
+ Use xxhash as the checksum function. Generally the fastest software
+ checksum that fio supports.
-str String. This is a sequence of alpha characters.
-time Integer with possible time suffix. In seconds unless otherwise
- specified, use eg 10m for 10 minutes. Accepts s/m/h for seconds,
- minutes, and hours, and accepts 'ms' (or 'msec') for milliseconds,
- and 'us' (or 'usec') for microseconds.
-int SI integer. A whole number value, which may contain a suffix
- describing the base of the number. Accepted suffixes are k/m/g/t/p,
- meaning kilo, mega, giga, tera, and peta. The suffix is not case
- sensitive, and you may also include trailing 'b' (eg 'kb' is the same
- as 'k'). So if you want to specify 4096, you could either write
- out '4096' or just give 4k. The suffixes signify base 2 values, so
- 1024 is 1k and 1024k is 1m and so on, unless the suffix is explicitly
- set to a base 10 value using 'kib', 'mib', 'gib', etc. If that is the
- case, then 1000 is used as the multiplier. This can be handy for
- disks, since manufacturers generally use base 10 values when listing
- the capacity of a drive. If the option accepts an upper and lower
- range, use a colon ':' or minus '-' to separate such values. May also
- include a prefix to indicate numbers base. If 0x is used, the number
- is assumed to be hexadecimal. See irange.
-bool Boolean. Usually parsed as an integer, however only defined for
- true and false (1 and 0).
-irange Integer range with suffix. Allows value range to be given, such
- as 1024-4096. A colon may also be used as the separator, eg
- 1k:4k. If the option allows two sets of ranges, they can be
- specified with a ',' or '/' delimiter: 1k-4k/8k-32k. Also see
- int.
-float_list A list of floating numbers, separated by a ':' character.
-
-With the above in mind, here follows the complete list of fio job
-parameters.
-
-name=str ASCII name of the job. This may be used to override the
- name printed by fio for this job. Otherwise the job
- name is used. On the command line this parameter has the
- special purpose of also signaling the start of a new
- job.
-
-description=str Text description of the job. Doesn't do anything except
- dump this text description when this job is run. It's
- not parsed.
-
-directory=str Prefix filenames with this directory. Used to place files
- in a different location than "./". See the 'filename' option
- for escaping certain characters.
-
-filename=str Fio normally makes up a filename based on the job name,
- thread number, and file number. If you want to share
- files between threads in a job or several jobs, specify
- a filename for each of them to override the default. If
- the ioengine used is 'net', the filename is the host, port,
- and protocol to use in the format of =host,port,protocol.
- See ioengine=net for more. If the ioengine is file based, you
- can specify a number of files by separating the names with a
- ':' colon. So if you wanted a job to open /dev/sda and /dev/sdb
- as the two working files, you would use
- filename=/dev/sda:/dev/sdb. On Windows, disk devices are
- accessed as \\.\PhysicalDrive0 for the first device,
- \\.\PhysicalDrive1 for the second etc. Note: Windows and
- FreeBSD prevent write access to areas of the disk containing
- in-use data (e.g. filesystems).
- If the wanted filename does need to include a colon, then
- escape that with a '\' character. For instance, if the filename
- is "/dev/dsk/foo@3,0:c", then you would use
- filename="/dev/dsk/foo@3,0\:c". '-' is a reserved name, meaning
- stdin or stdout. Which of the two depends on the read/write
- direction set.
-
-filename_format=str
- If sharing multiple files between jobs, it is usually necessary
- to have fio generate the exact names that you want. By default,
- fio will name a file based on the default file format
- specification of jobname.jobnumber.filenumber. With this
- option, that can be customized. Fio will recognize and replace
- the following keywords in this string:
-
- $jobname
- The name of the worker thread or process.
-
- $jobnum
- The incremental number of the worker thread or
- process.
-
- $filenum
- The incremental number of the file for that worker
- thread or process.
-
- To have dependent jobs share a set of files, this option can
- be set to have fio generate filenames that are shared between
- the two. For instance, if testfiles.$filenum is specified,
- file number 4 for any job will be named testfiles.4. The
- default of $jobname.$jobnum.$filenum will be used if
- no other format specifier is given.
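-
- For example, the job file below (a sketch; the job names and sizes
- are arbitrary) has a reader job consume the two files laid out by
- a writer job. Both jobs resolve the format to testfiles.0 and
- testfiles.1, and 'stonewall' makes the reader wait for the writer
- to finish first:
-
- ; -- start job file --
- [global]
- filename_format=testfiles.$filenum
- nrfiles=2
- size=64m
-
- [writer]
- rw=write
-
- [reader]
- stonewall
- rw=read
- ; -- end job file --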
-
-opendir=str Tell fio to recursively add any file it can find in this
- directory and down the file system tree.
-
-lockfile=str Fio defaults to not locking any files before it does
- IO to them. If a file or file descriptor is shared, fio
- can serialize IO to that file to make the end result
- consistent. This is usual for emulating real workloads that
- share files. The lock modes are:
-
- none No locking. The default.
- exclusive Only one thread/process may do IO,
- excluding all others.
- readwrite Read-write locking on the file. Many
- readers may access the file at the
- same time, but writes get exclusive
- access.
-
-readwrite=str
-rw=str Type of io pattern. Accepted values are:
-
- read Sequential reads
- write Sequential writes
- randwrite Random writes
- randread Random reads
- rw,readwrite Sequential mixed reads and writes
- randrw Random mixed reads and writes
- trimwrite Mixed trims and writes. Blocks will be
- trimmed first, then written to.
-
- For the mixed io types, the default is to split them 50/50.
- For certain types of io the result may still be skewed a bit,
- since the speed may be different. It is possible to specify
- a number of IOs to do before getting a new offset. This is
- done by appending a ':<nr>' to the end of the string given.
- For a random read, it would look like 'rw=randread:8' for
- passing in an offset modifier with a value of 8. If the
- suffix is used with a sequential IO pattern, then the value
- specified will be added to the generated offset for each IO.
- For instance, using rw=write:4k will skip 4k for every
- write. It turns sequential IO into sequential IO with holes.
- See the 'rw_sequencer' option.
-
-rw_sequencer=str If an offset modifier is given by appending a number to
- the rw=<str> line, then this option controls how that
- number modifies the IO offset being generated. Accepted
- values are:
-
- sequential Generate sequential offset
- identical Generate the same offset
-
- 'sequential' is only useful for random IO, where fio would
- normally generate a new random offset for every IO. If you
- append e.g. 8 to randread, you would get a new random offset for
- every 8 IOs. The result would be a seek for only every 8
- IOs, instead of for every IO. Use rw=randread:8 to specify
- that. As sequential IO is already sequential, setting
- 'sequential' for that would not result in any differences.
- 'identical' behaves in a similar fashion, except it sends
- the same offset 8 times before generating a new
- offset.
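-
- As a sketch of the two modes, the job below issues 4k random reads
- but only seeks to a new random offset every 8 IOs, reading
- sequentially in between. Changing rw_sequencer to 'identical'
- would instead make fio read the same 4k block 8 times before
- picking a new offset (the job name and size are arbitrary):
-
- ; -- start job file --
- [randread-8]
- rw=randread:8
- rw_sequencer=sequential
- bs=4k
- size=1g
- ; -- end job file --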
-
-kb_base=int The base unit for a kilobyte. The de facto base is 2^10, 1024.
- Storage manufacturers like to use 10^3 or 1000 as a base
- ten unit instead, for obvious reasons. Allowed values are
- 1024 or 1000, with 1024 being the default.
-
-unified_rw_reporting=bool Fio normally reports statistics on a per
- data direction basis, meaning that read, write, and trim are
- accounted and reported separately. If this option is set,
- fio will sum the results and report them as "mixed"
- instead.
-
-randrepeat=bool For random IO workloads, seed the generator in a predictable
- way so that results are repeatable across repetitions.
-
-randseed=int Seed the random number generators based on this seed value, to
- be able to control what sequence of output is being generated.
- If not set, the random sequence depends on the randrepeat
- setting.
-
-fallocate=str Whether pre-allocation is performed when laying down files.
- Accepted values are:
-
- none Do not pre-allocate space
- posix Pre-allocate via posix_fallocate()
- keep Pre-allocate via fallocate() with
- FALLOC_FL_KEEP_SIZE set
- 0 Backward-compatible alias for 'none'
- 1 Backward-compatible alias for 'posix'
-
- May not be available on all supported platforms. 'keep' is only
- available on Linux. If using ZFS on Solaris this must be set to
- 'none' because ZFS doesn't support it. Default: 'posix'.
-
-fadvise_hint=bool By default, fio will use fadvise() to advise the kernel
- on what IO patterns it is likely to issue. Sometimes you
- want to test specific IO patterns without telling the
- kernel about it, in which case you can disable this option.
- If set, fio will use POSIX_FADV_SEQUENTIAL for sequential
- IO and POSIX_FADV_RANDOM for random IO.
-
-fadvise_stream=int Notify the kernel what write stream ID to place these
- writes under. Only supported on Linux. Note, this option
- may change going forward.
-
-size=int The total size of file io for this job. Fio will run until
- this many bytes have been transferred, unless runtime is
- limited by other options (such as 'runtime', for instance,
- or increased/decreased by 'io_size'). Unless specific nrfiles
- and filesize options are given, fio will divide this size
- between the available files specified by the job. If not set,
- fio will use the full size of the given files or devices.
- If the files do not exist, size must be given. It is also
- possible to give size as a percentage between 1 and 100. If
- size=20% is given, fio will use 20% of the full size of the
- given files or devices.
-
-io_size=int
-io_limit=int Normally fio operates within the region set by 'size', which
- means that the 'size' option sets both the region and size of
- IO to be performed. Sometimes that is not what you want. With
- this option, it is possible to define just the amount of IO
- that fio should do. For instance, if 'size' is set to 20G and
- 'io_size' is set to 5G, fio will perform IO within the first
- 20G but exit when 5G have been done. The opposite is also
- possible - if 'size' is set to 20G, and 'io_size' is set to
- 40G, then fio will do 40G of IO within the 0..20G region.
-
-filesize=int Individual file sizes. May be a range, in which case fio
- will select sizes for files at random within the given range
- and limited to 'size' in total (if that is given). If not
- given, each created file is the same size.
-
-file_append=bool Perform IO after the end of the file. Normally fio will
- operate within the size of a file. If this option is set, then
- fio will append to the file instead. This has identical
- behavior to setting offset to the size of a file. This option
- is ignored on non-regular files.
-
-fill_device=bool
-fill_fs=bool Sets size to something really large and waits for ENOSPC (no
- space left on device) as the terminating condition. Only makes
- sense with sequential write. For a read workload, the mount
- point will be filled first then IO started on the result. This
- option doesn't make sense if operating on a raw device node,
- since the size of that is already known by the file system.
- Additionally, writing beyond end-of-device will not return
- ENOSPC there.
-
-blocksize=int
-bs=int The block size used for the io units. Defaults to 4k. Values
- can be given for both reads and writes. If a single int is
- given, it will apply to both. A second int after a comma
- applies to writes, and an optional third one to trims. In other
- words, the format is either bs=read_and_write or bs=read,write,trim.
- If the trim value is omitted, it follows the write value, so
- bs=4k,8k will use 4k blocks for reads, 8k blocks for
- writes, and 8k for trims. You can terminate the list with
- a trailing comma. bs=4k,8k, would use the default value for
- trims. If you only wish to set the write size, you
- can do so by passing an empty read size - bs=,8k will set
- 8k for writes and leave the read default value.
-
-blockalign=int
-ba=int At what boundary to align random IO offsets. Defaults to
- the same as 'blocksize', i.e. the minimum blocksize given.
- Minimum alignment is typically 512b for using direct IO,
- though it usually depends on the hardware block size. This
- option is mutually exclusive with using a random map for
- files, so it will turn off that option.
-
-blocksize_range=irange
-bsrange=irange Instead of giving a single block size, specify a range
- and fio will mix the issued io block sizes. The issued
- io unit will always be a multiple of the minimum value
- given (also see bs_unaligned). Applies to both reads and
- writes, however a second range can be given after a comma.
- See bs=.
-
-bssplit=str Sometimes you want even finer grained control of the
- block sizes issued, not just an even split between them.
- This option allows you to weight various block sizes,
- so that you are able to define a specific amount of
- block sizes issued. The format for this option is:
-
- bssplit=blocksize/percentage:blocksize/percentage
-
- for as many block sizes as needed. So if you want to define
- a workload that has 50% 64k blocks, 10% 4k blocks, and
- 40% 32k blocks, you would write:
-
- bssplit=4k/10:64k/50:32k/40
-
- Ordering does not matter. If the percentage is left blank,
- fio will fill in the remaining values evenly. So a bssplit
- option like this one:
-
- bssplit=4k/50:1k/:32k/
-
- would have 50% 4k ios, and 25% each of 1k and 32k ios. The
- percentages always add up to 100; if bssplit is given a range
- that adds up to more, it will error out.
-
- bssplit also supports giving separate splits to reads and
- writes. The format is identical to what bs= accepts. You
- have to separate the read and write parts with a comma. So
- if you want a workload that has 50% 2k reads and 50% 4k reads,
- while having 90% 4k writes and 10% 8k writes, you would
- specify:
-
- bssplit=2k/50:4k/50,4k/90:8k/10
-
-blocksize_unaligned
-bs_unaligned If this option is given, any byte size value within bsrange
- may be used as a block size. This typically won't work with
- direct IO, as that normally requires sector alignment.
-
-bs_is_seq_rand If this option is set, fio will use the normal read,write
- blocksize settings as sequential,random instead. Any random
- read or write will use the WRITE blocksize settings, and any
- sequential read or write will use the READ blocksize setting.
-
-zero_buffers If this option is given, fio will init the IO buffers to
- all zeroes. The default is to fill them with random data.
-
-refill_buffers If this option is given, fio will refill the IO buffers
- on every submit. The default is to only fill it at init
- time and reuse that data. Only makes sense if zero_buffers
- isn't specified, naturally. If data verification is enabled,
- refill_buffers is also automatically enabled.
-
-scramble_buffers=bool If refill_buffers is too costly and the target is
- using data deduplication, then setting this option will
- slightly modify the IO buffer contents to defeat normal
- de-dupe attempts. This is not enough to defeat more clever
- block compression attempts, but it will stop naive dedupe of
- blocks. Default: true.
-
-buffer_compress_percentage=int If this is set, then fio will attempt to
- provide IO buffer content (on WRITEs) that compresses to
- the specified level. Fio does this by providing a mix of
- random data and a fixed pattern. The fixed pattern is either
- zeroes, or the pattern specified by buffer_pattern. If the
- pattern option is used, it might skew the compression ratio
- slightly. Note that this is per block size unit; for a file/disk
- wide compression level that matches this setting, you'll also
- want to set refill_buffers.
-
-buffer_compress_chunk=int See buffer_compress_percentage. This
- setting allows fio to manage how big the ranges of random
- data and zeroed data are. Without this set, fio will
- provide buffer_compress_percentage of blocksize random
- data, followed by the remaining zeroed. With this set
- to some chunk size smaller than the block size, fio can
- alternate random and zeroed data throughout the IO
- buffer.
-
-buffer_pattern=str If set, fio will fill the io buffers with this
- pattern. If not set, the contents of io buffers are defined by
- the other options related to buffer contents. The setting can
- be any pattern of bytes, and can be prefixed with 0x for hex
- values. It may also be a string, where the string must then
- be wrapped with "".
-
-dedupe_percentage=int If set, fio will generate this percentage of
- identical buffers when writing. These buffers will be
- naturally dedupable. The contents of the buffers depend on
- what other buffer compression settings have been set. It's
- possible to have the individual buffers either fully
- compressible, or not at all. This option only controls the
- distribution of unique buffers.
-
-nrfiles=int Number of files to use for this job. Defaults to 1.
-
-openfiles=int Number of files to keep open at the same time. Defaults to
- the same as nrfiles; can be set smaller to limit the number
- of simultaneous opens.
-
-file_service_type=str Defines how fio decides which file from a job to
- service next. The following types are defined:
-
- random Just choose a file at random.
-
- roundrobin Round robin over open files. This
- is the default.
-
- sequential Finish one file before moving on to
- the next. Multiple files can still be
- open depending on 'openfiles'.
-
- The string can have a number appended, indicating how
- often to switch to a new file. So if option random:4 is
- given, fio will switch to a new random file after 4 ios
- have been issued.
-
-ioengine=str Defines how the job issues io to the file. The following
- types are defined:
-
- sync Basic read(2) or write(2) io. lseek(2) is
- used to position the io location.
-
- psync Basic pread(2) or pwrite(2) io.
-
- vsync Basic readv(2) or writev(2) IO.
-
- psyncv Basic preadv(2) or pwritev(2) IO.
-
- libaio Linux native asynchronous io. Note that Linux
- may only support queued behaviour with
- non-buffered IO (set direct=1 or buffered=0).
- This engine defines engine specific options.
-
- posixaio glibc posix asynchronous io.
-
- solarisaio Solaris native asynchronous io.
-
- windowsaio Windows native asynchronous io.
-
- mmap File is memory mapped and data copied
- to/from using memcpy(3).
-
- splice splice(2) is used to transfer the data and
- vmsplice(2) to transfer data from user
- space to the kernel.
-
- syslet-rw Use the syslet system calls to make
- regular read/write async.
-
- sg SCSI generic sg v3 io. May either be
- synchronous using the SG_IO ioctl, or if
- the target is an sg character device
- we use read(2) and write(2) for asynchronous
- io.
-
- null Doesn't transfer any data, just pretends
- to. This is mainly used to exercise fio
- itself and for debugging/testing purposes.
-
- net Transfer over the network to given host:port.
- Depending on the protocol used, the hostname,
- port, listen and filename options are used to
- specify what sort of connection to make, while
- the protocol option determines which protocol
- will be used.
- This engine defines engine specific options.
-
- netsplice Like net, but uses splice/vmsplice to
- map data and send/receive.
- This engine defines engine specific options.
-
- cpuio Doesn't transfer any data, but burns CPU
- cycles according to the cpuload= and
- cpucycle= options. Setting cpuload=85
- will cause that job to do nothing but burn
- 85% of the CPU. In case of SMP machines,
- use numjobs=<no_of_cpu> to get desired CPU
- usage, as the cpuload only loads a single
- CPU at the desired rate.
-
- guasi The GUASI IO engine is the Generic Userspace
- Asynchronous Syscall Interface approach
- to async IO. See
-
- http://www.xmailserver.org/guasi-lib.html
-
- for more info on GUASI.
-
- rdma The RDMA I/O engine supports both RDMA
- memory semantics (RDMA_WRITE/RDMA_READ) and
- channel semantics (Send/Recv) for the
- InfiniBand, RoCE and iWARP protocols.
-
- falloc IO engine that does regular fallocate to
- simulate data transfer as fio ioengine.
- DDIR_READ does fallocate(,mode = keep_size,)
- DDIR_WRITE does fallocate(,mode = 0)
- DDIR_TRIM does fallocate(,mode = punch_hole)
-
- e4defrag IO engine that does regular EXT4_IOC_MOVE_EXT
- ioctls to simulate defragment activity in
- request to DDIR_WRITE event
-
- rbd IO engine supporting direct access to Ceph
- Rados Block Devices (RBD) via librbd without
- the need to use the kernel rbd driver. This
- ioengine defines engine specific options.
-
- gfapi Using Glusterfs libgfapi sync interface to
- direct access to Glusterfs volumes without
- having to go through FUSE. This ioengine
- defines engine specific options.
-
- gfapi_async Using Glusterfs libgfapi async interface
- to direct access to Glusterfs volumes without
- having to go through FUSE. This ioengine
- defines engine specific options.
-
- libhdfs Read and write through Hadoop (HDFS).
- The 'filename' option is used to specify host,
- port of the hdfs name-node to connect. This
- engine interprets offsets a little
- differently. In HDFS, files once created
- cannot be modified. So random writes are not
- possible. To imitate this, libhdfs engine
- expects a bunch of small files to be created
- over HDFS, and engine will randomly pick a
- file out of those files based on the offset
- generated by fio backend. (see the example
- job file to create such files, use rw=write
- option). Please note, you might want to set
- necessary environment variables to work with
- hdfs/libhdfs properly.
-
- mtd Read, write and erase an MTD character device
- (e.g., /dev/mtd0). Discards are treated as
- erases. Depending on the underlying device
- type, the I/O may have to go in a certain
- pattern, e.g., on NAND, writing sequentially
- to erase blocks and discarding before
- overwriting. The trimwrite mode works well
- for this constraint.
-
- external Prefix to specify loading an external
- IO engine object file. Append the engine
- filename, eg ioengine=external:/tmp/foo.o
- to load ioengine foo.o in /tmp.
-
-iodepth=int This defines how many io units to keep in flight against
- the file. The default is 1 for each file defined in this
- job, can be overridden with a larger value for higher
- concurrency. Note that increasing iodepth beyond 1 will not
- affect synchronous ioengines (except for small degrees when
- verify_async is in use). Even async engines may impose OS
- restrictions causing the desired depth not to be achieved.
- This may happen on Linux when using libaio and not setting
- direct=1, since buffered IO is not async on that OS. Keep an
- eye on the IO depth distribution in the fio output to verify
- that the achieved depth is as expected. Default: 1.
-
-iodepth_batch_submit=int
-iodepth_batch=int This defines how many pieces of IO to submit at once.
- It defaults to 1 which means that we submit each IO
- as soon as it is available, but can be raised to submit
- bigger batches of IO at a time.
-
-iodepth_batch_complete=int This defines how many pieces of IO to retrieve
- at once. It defaults to 1 which means that we'll ask
- for a minimum of 1 IO in the retrieval process from
- the kernel. The IO retrieval will go on until we
- hit the limit set by iodepth_low. If this variable is
- set to 0, then fio will always check for completed
- events before queuing more IO. This helps reduce
- IO latency, at the cost of more retrieval system calls.
-
-iodepth_low=int The low water mark indicating when to start filling
- the queue again. Defaults to the same as iodepth, meaning
- that fio will attempt to keep the queue full at all times.
- If iodepth is set to eg 16 and iodepth_low is set to 4, then
- after fio has filled the queue of 16 requests, it will let
- the depth drain down to 4 before starting to fill it again.
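-
- A minimal sketch combining the queue depth options, assuming a
- Linux host where the libaio engine is available. direct=1 is set
- because, as noted under iodepth, buffered IO is not async with
- libaio on Linux. Fio keeps up to 16 IOs in flight and, once the
- queue is full, lets it drain to 4 before refilling it (the job
- name and size are arbitrary):
-
- ; -- start job file --
- [qd16]
- ioengine=libaio
- direct=1
- rw=randread
- bs=4k
- size=1g
- iodepth=16
- iodepth_low=4
- ; -- end job file --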
-
-direct=bool If value is true, use non-buffered io. This is usually
- O_DIRECT. Note that ZFS on Solaris doesn't support direct io.
- On Windows the synchronous ioengines don't support direct io.
-
-atomic=bool If value is true, attempt to use atomic direct IO. Atomic
- writes are guaranteed to be stable once acknowledged by
- the operating system. Only Linux supports O_ATOMIC right
- now.
-
-buffered=bool If value is true, use buffered io. This is the opposite
- of the 'direct' option. Defaults to true.
-
-offset=int Start io at the given offset in the file. The data before
- the given offset will not be touched. This effectively
- caps the file size at real_size - offset.
-
-offset_increment=int If this is provided, then the real offset becomes
- offset + offset_increment * thread_number, where the thread
- number is a counter that starts at 0 and is incremented for
- each sub-job (i.e. when numjobs option is specified). This
- option is useful if there are several jobs which are intended
- to operate on a file in parallel disjoint segments, with
- even spacing between the starting points.
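-
- For instance, the sketch below splits one file into four disjoint
- 256m segments. It assumes a hypothetical file named shared.dat that
- already exists and is at least 1g in size. With offset left at 0,
- clone N starts at N * 256m, so the four clones cover 0-256m,
- 256m-512m, 512m-768m and 768m-1g:
-
- ; -- start job file --
- [segments]
- filename=shared.dat
- rw=write
- bs=1m
- size=256m
- offset_increment=256m
- numjobs=4
- ; -- end job file --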
-
-number_ios=int Fio will normally perform IOs until it has exhausted the size
- of the region set by size=, or until it exhausts the allocated
- time (or hits an error condition). With this setting, the
- range/size can be set independently of the number of IOs to
- perform. When fio reaches this number, it will exit normally
- and report status. Note that this does not extend the amount
- of IO that will be done, it will only stop fio if this
- condition is met before other end-of-job criteria.
-
-fsync=int If writing to a file, issue a sync of the dirty data
- for every number of blocks given. For example, if you give
- 32 as a parameter, fio will sync the file for every 32
- writes issued. If fio is using non-buffered io, we may
- not sync the file. The exception is the sg io engine, which
- synchronizes the disk cache anyway.
-
-fdatasync=int Like fsync= but uses fdatasync() to only sync data and not
- metadata blocks.
- In FreeBSD and Windows there is no fdatasync(); this falls back to
- using fsync().
-
-sync_file_range=str:val Use sync_file_range() for every 'val' number of
- write operations. Fio will track range of writes that
- have happened since the last sync_file_range() call. 'str'
- can currently be one or more of:
-
- wait_before SYNC_FILE_RANGE_WAIT_BEFORE
- write SYNC_FILE_RANGE_WRITE
- wait_after SYNC_FILE_RANGE_WAIT_AFTER
-
- So if you do sync_file_range=wait_before,write:8, fio would
- use SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE for
- every 8 writes. Also see the sync_file_range(2) man page.
- This option is Linux specific.
-
-overwrite=bool If true, writes to a file will always overwrite existing
- data. If the file doesn't already exist, it will be
- created before the write phase begins. If the file exists
- and is large enough for the specified write phase, nothing
- will be done.
-
-end_fsync=bool If true, fsync file contents when a write stage has completed.
-
-fsync_on_close=bool If true, fio will fsync() a dirty file on close.
- This differs from end_fsync in that it will happen on every
- file close, not just at the end of the job.
-
-rwmixread=int How large a percentage of the mix should be reads.
-
-rwmixwrite=int How large a percentage of the mix should be writes. If both
- rwmixread and rwmixwrite are given and the values do not add
- up to 100%, the latter of the two will be used to override
- the first. This may interfere with a given rate setting,
- if fio is asked to limit reads or writes to a certain rate.
- If that is the case, then the distribution may be skewed.
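-
- As an example, a random workload that is roughly 70% reads and 30%
- writes could be described as below. Only rwmixread is set, so the
- write share is the remaining 30% (the job name and size are
- arbitrary):
-
- ; -- start job file --
- [mixed-70-30]
- rw=randrw
- rwmixread=70
- bs=4k
- size=1g
- ; -- end job file --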
-
-random_distribution=str:float By default, fio will use a completely uniform
- random distribution when asked to perform random IO. Sometimes
- it is useful to skew the distribution in specific ways,
- ensuring that some parts of the data are hotter than others.
- fio includes the following distribution models:
-
- random Uniform random distribution
- zipf Zipf distribution
- pareto Pareto distribution
-
- When using a zipf or pareto distribution, an input value
- is also needed to define the access pattern. For zipf, this
- is the zipf theta. For pareto, it's the pareto power. Fio
- includes a test program, genzipf, that can be used to visualize
- what the given input values will yield in terms of hit rates.
- If you wanted to use zipf with a theta of 1.2, you would use
- random_distribution=zipf:1.2 as the option. If a non-uniform
- model is used, fio will disable use of the random map.
-
-percentage_random=int For a random workload, set how big a percentage should
- be random. This defaults to 100%, in which case the workload
- is fully random. It can be set anywhere from 0 to 100.
- Setting it to 0 would make the workload fully sequential. Any
- setting in between will result in a random mix of sequential
- and random IO, at the given percentages. It is possible to
- set different values for reads, writes, and trim. To do so,
- simply use a comma separated list. See blocksize.
-
-norandommap Normally fio will cover every block of the file when doing
- random IO. If this option is given, fio will just get a
- new random offset without looking at past io history. This
- means that some blocks may not be read or written, and that
- some blocks may be read/written more than once. If this option
- is used with verify= and multiple blocksizes (via bsrange=),
- only intact blocks are verified, i.e., partially-overwritten
- blocks are ignored.
-
-softrandommap=bool See norandommap. If fio runs with the random block map
- enabled and it fails to allocate the map, if this option is
- set it will continue without a random block map. As coverage
- will not be as complete as with random maps, this option is
- disabled by default.
-
-random_generator=str Fio supports the following engines for generating
- IO offsets for random IO:
-
- tausworthe Strong 2^88 cycle random number generator
- lfsr Linear feedback shift register generator
-
- Tausworthe is a strong random number generator, but it
- requires tracking on the side if we want to ensure that
- blocks are only read or written once. LFSR guarantees
- that we never generate the same offset twice, and it's
- also less computationally expensive. It's not a true
- random generator, though for IO purposes it's
- typically good enough. LFSR only works with single
- block sizes, not with workloads that use multiple block
- sizes. If used with such a workload, fio may read or write
- some blocks multiple times.
-
-nice=int Run the job with the given nice value. See man nice(2).
-
-prio=int Set the io priority value of this job. Linux limits us to
- a positive value between 0 and 7, with 0 being the highest.
- See man ionice(1).
-
-prioclass=int Set the io priority class. See man ionice(1).
-
-thinktime=int Stall the job x microseconds after an io has completed before
- issuing the next. May be used to simulate processing being
- done by an application. See thinktime_blocks and
- thinktime_spin.
-
-thinktime_spin=int
- Only valid if thinktime is set - pretend to spend CPU time
- doing something with the data received, before falling back
- to sleeping for the rest of the period specified by
- thinktime.
-
-thinktime_blocks=int
- Only valid if thinktime is set - control how many blocks
- to issue, before waiting 'thinktime' usecs. If not set,
- defaults to 1 which will make fio wait 'thinktime' usecs
- after every block. This effectively makes any queue depth
- setting redundant, since no more than 1 IO will be queued
- before we have to complete it and do our thinktime. In
- other words, this setting effectively caps the queue depth
- if the latter is larger.
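-
- To illustrate the interaction with queue depth, the sketch below
- uses iodepth=8 and lets fio issue 8 blocks before each 1000 usec
- stall, so the thinktime no longer caps the queue depth at 1 as it
- would with the default thinktime_blocks=1. Of each stall, the first
- 200 usecs are spent spinning on the CPU and the rest sleeping. It
- assumes a Linux host with libaio; the job name and size are
- arbitrary:
-
- ; -- start job file --
- [think]
- ioengine=libaio
- direct=1
- iodepth=8
- rw=randread
- bs=4k
- size=1g
- thinktime=1000
- thinktime_spin=200
- thinktime_blocks=8
- ; -- end job file --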
-
-rate=int Cap the bandwidth used by this job. The number is in bytes/sec,
- the normal suffix rules apply. You can use rate=500k to limit
- reads and writes to 500k each, or you can specify read and
- writes separately. Using rate=1m,500k would limit reads to
- 1MB/sec and writes to 500KB/sec. Capping only reads or
- writes can be done with rate=,500k or rate=500k,. The former
- will only limit writes (to 500KB/sec), the latter will only
- limit reads.
-
-ratemin=int Tell fio to do whatever it can to maintain at least this
- bandwidth. Failing to meet this requirement will cause
- the job to exit. The same format as rate is used for
- read vs write separation.
-
-rate_iops=int Cap the bandwidth to this number of IOPS. Basically the same
- as rate, just specified independently of bandwidth. If the
- job is given a block size range instead of a fixed value,
- the smallest block size is used as the metric. The same format
- as rate is used for read vs write separation.
-
-rate_iops_min=int If fio doesn't meet this rate of IO, it will cause
- the job to exit. The same format as rate is used for read vs
- write separation.
-
-latency_target=int If set, fio will attempt to find the max performance
- point that the given workload will run at while maintaining a
- latency below this target. The value is given in microseconds.
- See latency_window and latency_percentile.
-
-latency_window=int Used with latency_target to specify the sample window
- that the job is run at varying queue depths to test the
- performance. The value is given in microseconds.
-
-latency_percentile=float The percentage of IOs that must fall within the
- criteria specified by latency_target and latency_window. If not
- set, this defaults to 100.0, meaning that all IOs must be equal
- to or below the value set by latency_target.
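-
- A sketch of a latency-driven job: fio searches for the highest
- performance point at which at least 99% of IOs complete within
- 10 msec (10000 usecs), sampling each queue depth over a 5 second
- window (5000000 usecs, as both values are given in microseconds).
- It assumes a Linux host with libaio, and that iodepth=64 acts as
- the upper bound of the depths tried; the job name and size are
- arbitrary:
-
- ; -- start job file --
- [latency-probe]
- ioengine=libaio
- direct=1
- rw=randread
- bs=4k
- size=1g
- iodepth=64
- latency_target=10000
- latency_window=5000000
- latency_percentile=99.0
- ; -- end job file --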
-
-max_latency=int If set, fio will exit the job if it exceeds this maximum
- latency. It will exit with an ETIME error.
-
-ratecycle=int Average bandwidth for 'rate' and 'ratemin' over this number
- of milliseconds.
-
-cpumask=int Set the CPU affinity of this job. The parameter given is a
- bitmask of allowed CPUs the job may run on. So if you want
- the allowed CPUs to be 1 and 5, you would pass the decimal
- value of (1 << 1 | 1 << 5), or 34. See man
- sched_setaffinity(2). This may not work on all supported
- operating systems or kernel versions. This option doesn't
- work well for a higher CPU count than what you can store in
- an integer mask, so it can only control cpus 1-32. For
- boxes with larger CPU counts, use cpus_allowed.
-
-cpus_allowed=str Controls the same options as cpumask, but it allows a text
- setting of the permitted CPUs instead. So to use CPUs 1 and
- 5, you would specify cpus_allowed=1,5. This options also
- allows a range of CPUs. Say you wanted a binding to CPUs
- 1, 5, and 8-15, you would set cpus_allowed=1,5,8-15.
-
-cpus_allowed_policy=str Set the policy of how fio distributes the CPUs
- specified by cpus_allowed or cpumask. Two policies are
- supported:
-
- shared All jobs will share the CPU set specified.
- split Each job will get a unique CPU from the CPU set.
-
- 'shared' is the default behaviour, if the option isn't
- specified. If split is specified, then fio will will assign
- one cpu per job. If not enough CPUs are given for the jobs
- listed, then fio will roundrobin the CPUs in the set.
-
-numa_cpu_nodes=str Set this job running on spcified NUMA nodes' CPUs. The
- arguments allow comma delimited list of cpu numbers,
- A-B ranges, or 'all'. Note, to enable numa options support,
- fio must be built on a system with libnuma-dev(el) installed.
-
-numa_mem_policy=str Set this job's memory policy and corresponding NUMA
- nodes. Format of the argements:
- <mode>[:<nodelist>]
- `mode' is one of the following memory policy:
- default, prefer, bind, interleave, local
- For `default' and `local' memory policy, no node is
- needed to be specified.
- For `prefer', only one node is allowed.
- For `bind' and `interleave', it allow comma delimited
- list of numbers, A-B ranges, or 'all'.
-
-startdelay=time Start this job the specified number of seconds after fio
- has started. Only useful if the job file contains several
- jobs, and you want to delay starting some jobs to a certain
- time.
-
-runtime=time Tell fio to terminate processing after the specified number
- of seconds. It can be quite hard to determine for how long
- a specified job will run, so this parameter is handy to
- cap the total runtime to a given time.
-
-time_based If set, fio will run for the duration of the runtime
- specified even if the file(s) are completely read or
- written. It will simply loop over the same workload
- as many times as the runtime allows.
-
-ramp_time=time If set, fio will run the specified workload for this amount
- of time before logging any performance numbers. Useful for
- letting performance settle before logging results, thus
- minimizing the runtime required for stable results. Note
- that the ramp_time is considered lead in time for a job,
- thus it will increase the total runtime if a special timeout
- or runtime is specified.
-
-invalidate=bool Invalidate the buffer/page cache parts for this file prior
- to starting io. Defaults to true.
-
-sync=bool Use sync io for buffered writes. For the majority of the
- io engines, this means using O_SYNC.
-
-iomem=str
-mem=str Fio can use various types of memory as the io unit buffer.
- The allowed values are:
-
- malloc Use memory from malloc(3) as the buffers.
-
- shm Use shared memory as the buffers. Allocated
- through shmget(2).
-
- shmhuge Same as shm, but use huge pages as backing.
-
- mmap Use mmap to allocate buffers. May either be
- anonymous memory, or can be file backed if
- a filename is given after the option. The
- format is mem=mmap:/path/to/file.
-
- mmaphuge Use a memory mapped huge file as the buffer
- backing. Append filename after mmaphuge, ala
- mem=mmaphuge:/hugetlbfs/file
-
- The area allocated is a function of the maximum allowed
- bs size for the job, multiplied by the io depth given. Note
- that for shmhuge and mmaphuge to work, the system must have
- free huge pages allocated. This can normally be checked
- and set by reading/writing /proc/sys/vm/nr_hugepages on a
- Linux system. Fio assumes a huge page is 4MB in size. So
- to calculate the number of huge pages you need for a given
- job file, add up the io depth of all jobs (normally one unless
- iodepth= is used) and multiply by the maximum bs set. Then
- divide that number by the huge page size. You can see the
- size of the huge pages in /proc/meminfo. If no huge pages
- are allocated by having a non-zero number in nr_hugepages,
- using mmaphuge or shmhuge will fail. Also see hugepage-size.
-
- mmaphuge also needs to have hugetlbfs mounted and the file
- location should point there. So if it's mounted in /huge,
- you would use mem=mmaphuge:/huge/somefile.
-
-iomem_align=int This indiciates the memory alignment of the IO memory buffers.
- Note that the given alignment is applied to the first IO unit
- buffer, if using iodepth the alignment of the following buffers
- are given by the bs used. In other words, if using a bs that is
- a multiple of the page sized in the system, all buffers will
- be aligned to this value. If using a bs that is not page
- aligned, the alignment of subsequent IO memory buffers is the
- sum of the iomem_align and bs used.
-
-hugepage-size=int
- Defines the size of a huge page. Must at least be equal
- to the system setting, see /proc/meminfo. Defaults to 4MB.
- Should probably always be a multiple of megabytes, so using
- hugepage-size=Xm is the preferred way to set this to avoid
- setting a non-pow-2 bad value.
-
-exitall When one job finishes, terminate the rest. The default is
- to wait for each job to finish, sometimes that is not the
- desired action.
-
-bwavgtime=int Average the calculated bandwidth over the given time. Value
- is specified in milliseconds.
-
-iopsavgtime=int Average the calculated IOPS over the given time. Value
- is specified in milliseconds.
-
-create_serialize=bool If true, serialize the file creating for the jobs.
- This may be handy to avoid interleaving of data
- files, which may greatly depend on the filesystem
- used and even the number of processors in the system.
-
-create_fsync=bool fsync the data file after creation. This is the
- default.
-
-create_on_open=bool Don't pre-setup the files for IO, just create open()
- when it's time to do IO to that file.
-
-create_only=bool If true, fio will only run the setup phase of the job.
- If files need to be laid out or updated on disk, only
- that will be done. The actual job contents are not
- executed.
-
-pre_read=bool If this is given, files will be pre-read into memory before
- starting the given IO operation. This will also clear
- the 'invalidate' flag, since it is pointless to pre-read
- and then drop the cache. This will only work for IO engines
- that are seekable, since they allow you to read the same data
- multiple times. Thus it will not work on eg network or splice
- IO.
-
-unlink=bool Unlink the job files when done. Not the default, as repeated
- runs of that job would then waste time recreating the file
- set again and again.
-
-loops=int Run the specified number of iterations of this job. Used
- to repeat the same workload a given number of times. Defaults
- to 1.
-
-verify_only Do not perform specified workload---only verify data still
- matches previous invocation of this workload. This option
- allows one to check data multiple times at a later date
- without overwriting it. This option makes sense only for
- workloads that write data, and does not support workloads
- with the time_based option set.
-
-do_verify=bool Run the verify phase after a write phase. Only makes sense if
- verify is set. Defaults to 1.
-
-verify=str If writing to a file, fio can verify the file contents
- after each iteration of the job. The allowed values are:
-
- md5 Use an md5 sum of the data area and store
- it in the header of each block.
-
- crc64 Use an experimental crc64 sum of the data
- area and store it in the header of each
- block.
-
- crc32c Use a crc32c sum of the data area and store
- it in the header of each block.
-
- crc32c-intel Use hardware assisted crc32c calcuation
- provided on SSE4.2 enabled processors. Falls
- back to regular software crc32c, if not
- supported by the system.
-
- crc32 Use a crc32 sum of the data area and store
- it in the header of each block.
-
- crc16 Use a crc16 sum of the data area and store
- it in the header of each block.
-
- crc7 Use a crc7 sum of the data area and store
- it in the header of each block.
-
- xxhash Use xxhash as the checksum function. Generally
- the fastest software checksum that fio
- supports.
-
- sha512 Use sha512 as the checksum function.
-
- sha256 Use sha256 as the checksum function.
-
- sha1 Use optimized sha1 as the checksum function.
-
- meta Write extra information about each io
- (timestamp, block number etc.). The block
- number is verified. The io sequence number is
- verified for workloads that write data.
- See also verify_pattern.
-
- null Only pretend to verify. Useful for testing
- internals with ioengine=null, not for much
- else.
-
- This option can be used for repeated burn-in tests of a
- system to make sure that the written data is also
- correctly read back. If the data direction given is
- a read or random read, fio will assume that it should
- verify a previously written file. If the data direction
- includes any form of write, the verify will be of the
- newly written data.
-
-verifysort=bool If set, fio will sort written verify blocks when it deems
- it faster to read them back in a sorted manner. This is
- often the case when overwriting an existing file, since
- the blocks are already laid out in the file system. You
- can ignore this option unless doing huge amounts of really
- fast IO where the red-black tree sorting CPU time becomes
- significant.
-
-verify_offset=int Swap the verification header with data somewhere else
- in the block before writing. Its swapped back before
- verifying.
-
-verify_interval=int Write the verification header at a finer granularity
- than the blocksize. It will be written for chunks the
- size of header_interval. blocksize should divide this
- evenly.
-
-verify_pattern=str If set, fio will fill the io buffers with this
- pattern. Fio defaults to filling with totally random
- bytes, but sometimes it's interesting to fill with a known
- pattern for io verification purposes. Depending on the
- width of the pattern, fio will fill 1/2/3/4 bytes of the
- buffer at the time(it can be either a decimal or a hex number).
- The verify_pattern if larger than a 32-bit quantity has to
- be a hex number that starts with either "0x" or "0X". Use
- with verify=meta.
-
-verify_fatal=bool Normally fio will keep checking the entire contents
- before quitting on a block verification failure. If this
- option is set, fio will exit the job on the first observed
- failure.
-
-verify_dump=bool If set, dump the contents of both the original data
- block and the data block we read off disk to files. This
- allows later analysis to inspect just what kind of data
- corruption occurred. Off by default.
-
-verify_async=int Fio will normally verify IO inline from the submitting
- thread. This option takes an integer describing how many
- async offload threads to create for IO verification instead,
- causing fio to offload the duty of verifying IO contents
- to one or more separate threads. If using this offload
- option, even sync IO engines can benefit from using an
- iodepth setting higher than 1, as it allows them to have
- IO in flight while verifies are running.
-
-verify_async_cpus=str Tell fio to set the given CPU affinity on the
- async IO verification threads. See cpus_allowed for the
- format used.
-
-verify_backlog=int Fio will normally verify the written contents of a
- job that utilizes verify once that job has completed. In
- other words, everything is written then everything is read
- back and verified. You may want to verify continually
- instead for a variety of reasons. Fio stores the meta data
- associated with an IO block in memory, so for large
- verify workloads, quite a bit of memory would be used up
- holding this meta data. If this option is enabled, fio
- will write only N blocks before verifying these blocks.
-
-verify_backlog_batch=int Control how many blocks fio will verify
- if verify_backlog is set. If not set, will default to
- the value of verify_backlog (meaning the entire queue
- is read back and verified). If verify_backlog_batch is
- less than verify_backlog then not all blocks will be verified,
- if verify_backlog_batch is larger than verify_backlog, some
- blocks will be verified more than once.
-
-verify_state_save=bool When a job exits during the write phase of a verify
- workload, save its current state. This allows fio to replay
- up until that point, if the verify state is loaded for the
- verify read phase. The format of the filename is, roughly,
- <type>-<jobname>-<jobindex>-verify.state. <type> is "local"
- for a local run, "sock" for a client/server socket connection,
- and "ip" (192.168.0.1, for instance) for a networked
- client/server connection.
-
-verify_state_load=bool If a verify termination trigger was used, fio stores
- the current write state of each thread. This can be used at
- verification time so that fio knows how far it should verify.
- Without this information, fio will run a full verification
- pass, according to the settings in the job file used.
-
-stonewall
-wait_for_previous Wait for preceding jobs in the job file to exit, before
- starting this one. Can be used to insert serialization
- points in the job file. A stone wall also implies starting
- a new reporting group.
-
-new_group Start a new reporting group. See: group_reporting.
-
-numjobs=int Create the specified number of clones of this job. May be
- used to setup a larger number of threads/processes doing
- the same thing. Each thread is reported separately; to see
- statistics for all clones as a whole, use group_reporting in
- conjunction with new_group.
-
-group_reporting It may sometimes be interesting to display statistics for
- groups of jobs as a whole instead of for each individual job.
- This is especially true if 'numjobs' is used; looking at
- individual thread/process output quickly becomes unwieldy.
- To see the final report per-group instead of per-job, use
- 'group_reporting'. Jobs in a file will be part of the same
- reporting group, unless if separated by a stonewall, or by
- using 'new_group'.
-
-thread fio defaults to forking jobs, however if this option is
- given, fio will use pthread_create(3) to create threads
- instead.
-
-zonesize=int Divide a file into zones of the specified size. See zoneskip.
-
-zoneskip=int Skip the specified number of bytes when zonesize data has
- been read. The two zone options can be used to only do
- io on zones of a file.
-
-write_iolog=str Write the issued io patterns to the specified file. See
- read_iolog. Specify a separate file for each job, otherwise
- the iologs will be interspersed and the file may be corrupt.
-
-read_iolog=str Open an iolog with the specified file name and replay the
- io patterns it contains. This can be used to store a
- workload and replay it sometime later. The iolog given
- may also be a blktrace binary file, which allows fio
- to replay a workload captured by blktrace. See blktrace
- for how to capture such logging data. For blktrace replay,
- the file needs to be turned into a blkparse binary data
- file first (blkparse <device> -o /dev/null -d file_for_fio.bin).
-
-replay_no_stall=int When replaying I/O with read_iolog the default behavior
- is to attempt to respect the time stamps within the log and
- replay them with the appropriate delay between IOPS. By
- setting this variable fio will not respect the timestamps and
- attempt to replay them as fast as possible while still
- respecting ordering. The result is the same I/O pattern to a
- given device, but different timings.
-
-replay_redirect=str While replaying I/O patterns using read_iolog the
- default behavior is to replay the IOPS onto the major/minor
- device that each IOP was recorded from. This is sometimes
- undesirable because on a different machine those major/minor
- numbers can map to a different device. Changing hardware on
- the same system can also result in a different major/minor
- mapping. Replay_redirect causes all IOPS to be replayed onto
- the single specified device regardless of the device it was
- recorded from. i.e. replay_redirect=/dev/sdc would cause all
- IO in the blktrace to be replayed onto /dev/sdc. This means
- multiple devices will be replayed onto a single, if the trace
- contains multiple devices. If you want multiple devices to be
- replayed concurrently to multiple redirected devices you must
- blkparse your trace into separate traces and replay them with
- independent fio invocations. Unfortuantely this also breaks
- the strict time ordering between multiple device accesses.
-
-write_bw_log=str If given, write a bandwidth log of the jobs in this job
- file. Can be used to store data of the bandwidth of the
- jobs in their lifetime. The included fio_generate_plots
- script uses gnuplot to turn these text files into nice
- graphs. See write_lat_log for behaviour of given
- filename. For this option, the suffix is _bw.x.log, where
- x is the index of the job (1..N, where N is the number of
- jobs).
-
-write_lat_log=str Same as write_bw_log, except that this option stores io
- submission, completion, and total latencies instead. If no
- filename is given with this option, the default filename of
- "jobname_type.log" is used. Even if the filename is given,
- fio will still append the type of log. So if one specifies
+ **sha512**
+ Use sha512 as the checksum function.
+
+ **sha256**
+ Use sha256 as the checksum function.
+
+ **sha1**
+ Use optimized sha1 as the checksum function.
+
+ **sha3-224**
+ Use optimized sha3-224 as the checksum function.
+
+ **sha3-256**
+ Use optimized sha3-256 as the checksum function.
+
+ **sha3-384**
+ Use optimized sha3-384 as the checksum function.
+
+ **sha3-512**
+ Use optimized sha3-512 as the checksum function.
+
+ **meta**
+ This option is deprecated, since now meta information is included in
+ generic verification header and meta verification happens by
+ default. For detailed information see the description of the
+			:option:`verify` setting. This option is kept for compatibility with
+			old configurations. Do not use it.
+
+ **pattern**
+ Verify a strict pattern. Normally fio includes a header with some
+ basic information and checksumming, but if this option is set, only
+ the specific pattern set with :option:`verify_pattern` is verified.
+
+ **null**
+ Only pretend to verify. Useful for testing internals with
+			:option:`ioengine`\ ``=null``, not for much else.
+
+ This option can be used for repeated burn-in tests of a system to make sure
+ that the written data is also correctly read back. If the data direction
+ given is a read or random read, fio will assume that it should verify a
+ previously written file. If the data direction includes any form of write,
+ the verify will be of the newly written data.
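+
+	As a rough sketch, a burn-in job that writes random blocks and then reads
+	them back for checking could look like the following. The job name, block
+	size, file size and choice of ``crc32c`` are illustrative assumptions only.
+	Because the data direction is a write, the verify pass checks the newly
+	written data, and :option:`verify_fatal` stops the job at the first
+	mismatch::
+
+		; hypothetical burn-in job, values chosen for illustration only
+		[burn-in]
+		rw=randwrite
+		bs=4k
+		size=256m
+		verify=crc32c
+		verify_fatal=1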
+
+.. option:: verifysort=bool
+
+ If true, fio will sort written verify blocks when it deems it faster to read
+ them back in a sorted manner. This is often the case when overwriting an
+ existing file, since the blocks are already laid out in the file system. You
+ can ignore this option unless doing huge amounts of really fast I/O where
+ the red-black tree sorting CPU time becomes significant. Default: true.
+
+.. option:: verifysort_nr=int
+
+ Pre-load and sort verify blocks for a read workload.
+
+.. option:: verify_offset=int
+
+ Swap the verification header with data somewhere else in the block before
+ writing. It is swapped back before verifying.
+
+.. option:: verify_interval=int
+
+	Write the verification header at a finer granularity than the
+	:option:`blocksize`. It will be written for chunks the size of
+	``verify_interval``, so ``verify_interval`` should divide
+	:option:`blocksize` evenly.
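+
+	A minimal sketch, assuming an arbitrary job name and sizes, in which the
+	4 KiB interval divides the 64 KiB block size evenly, so each block should
+	carry 16 verification headers::
+
+		; hypothetical job, values chosen for illustration only
+		[interval-verify]
+		rw=write
+		bs=64k
+		size=128m
+		verify=md5
+		verify_interval=4k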
+
+.. option:: verify_pattern=str
+
+	If set, fio will fill the I/O buffers with this pattern. Fio defaults to
+	filling with totally random bytes, but sometimes it's interesting to fill
+	with a known pattern for I/O verification purposes. Depending on the width
+	of the pattern, fio will fill 1/2/3/4 bytes of the buffer at a time (the
+	pattern can be either a decimal or a hex number). If ``verify_pattern`` is
+	larger than a 32-bit quantity, it has to be a hex number that starts with
+	either "0x" or "0X". Use with :option:`verify`. Also, ``verify_pattern``
+	supports the %o format, which means that the offset of each block will be
+	written and then verified back, e.g.::
+
+ verify_pattern=%o
+
+	Or use a combination of everything::
+
+ verify_pattern=0xff%o"abcd"-12
+
+.. option:: verify_fatal=bool
+
+ Normally fio will keep checking the entire contents before quitting on a
+ block verification failure. If this option is set, fio will exit the job on
+ the first observed failure. Default: false.
+
+.. option:: verify_dump=bool
+
+ If set, dump the contents of both the original data block and the data block
+ we read off disk to files. This allows later analysis to inspect just what
+ kind of data corruption occurred. Off by default.
+
+.. option:: verify_async=int
+
+ Fio will normally verify I/O inline from the submitting thread. This option
+ takes an integer describing how many async offload threads to create for I/O
+ verification instead, causing fio to offload the duty of verifying I/O
+ contents to one or more separate threads. If using this offload option, even
+ sync I/O engines can benefit from using an :option:`iodepth` setting higher
+ than 1, as it allows them to have I/O in flight while verifies are running.
+
+.. option:: verify_async_cpus=str
+
+ Tell fio to set the given CPU affinity on the async I/O verification
+ threads. See :option:`cpus_allowed` for the format used.
+
+.. option:: verify_backlog=int
+
+ Fio will normally verify the written contents of a job that utilizes verify
+ once that job has completed. In other words, everything is written then
+ everything is read back and verified. You may want to verify continually
+	instead for a variety of reasons. Fio stores the metadata associated with
+	an I/O block in memory, so for large verify workloads, quite a bit of memory
+	would be used up holding this metadata. If this option is enabled, fio will
+	write only N blocks (where N is the value given) before verifying these
+	blocks.
+
+.. option:: verify_backlog_batch=int
+
+	Control how many blocks fio will verify if :option:`verify_backlog` is
+	set. If not set, it defaults to the value of :option:`verify_backlog`
+	(meaning the entire queue is read back and verified). If
+	``verify_backlog_batch`` is less than :option:`verify_backlog`, not all
+	blocks will be verified; if ``verify_backlog_batch`` is larger than
+	:option:`verify_backlog`, some blocks will be verified more than once.
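+
+	A minimal sketch, assuming an arbitrary job name and sizes: with the
+	values below, fio would verify after every 1024 written blocks, but only
+	512 of them would be read back each time, so not every block is
+	verified::
+
+		; hypothetical job, values chosen for illustration only
+		[backlog-verify]
+		rw=randwrite
+		bs=4k
+		size=256m
+		verify=crc32c
+		verify_backlog=1024
+		verify_backlog_batch=512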
+
+.. option:: verify_state_save=bool
+
+ When a job exits during the write phase of a verify workload, save its
+ current state. This allows fio to replay up until that point, if the verify
+ state is loaded for the verify read phase. The format of the filename is,
+ roughly::
+
+		<type>-<jobname>-<jobindex>-verify.state
+
+ <type> is "local" for a local run, "sock" for a client/server socket
+ connection, and "ip" (192.168.0.1, for instance) for a networked
+ client/server connection.
+
+.. option:: verify_state_load=bool
+
+ If a verify termination trigger was used, fio stores the current write state
+ of each thread. This can be used at verification time so that fio knows how
+ far it should verify. Without this information, fio will run a full
+ verification pass, according to the settings in the job file used.
+
+.. option:: trim_percentage=int
+
+ Number of verify blocks to discard/trim.
+
+.. option:: trim_verify_zero=bool
+
+ Verify that trim/discarded blocks are returned as zeroes.
+
+.. option:: trim_backlog=int
+
+	Trim after this number of blocks are written.
+
+.. option:: trim_backlog_batch=int
+
+ Trim this number of I/O blocks.
+
+.. option:: experimental_verify=bool
+
+ Enable experimental verification.
+
+
+Steady state
+~~~~~~~~~~~~
+
+.. option:: steadystate=str:float, ss=str:float
+
+ Define the criterion and limit for assessing steady state performance. The
+ first parameter designates the criterion whereas the second parameter sets
+ the threshold. When the criterion falls below the threshold for the
+	specified duration, the job will stop. For example, ``iops_slope:0.1%`` will
+	direct fio to terminate the job when the least squares regression slope
+	falls below 0.1% of the mean IOPS. If :option:`group_reporting` is enabled,
+	this will apply to all jobs in the group. Below is the list of available
+ steady state assessment criteria. All assessments are carried out using only
+ data from the rolling collection window. Threshold limits can be expressed
+ as a fixed value or as a percentage of the mean in the collection window.
+
+ **iops**
+ Collect IOPS data. Stop the job if all individual IOPS measurements
+ are within the specified limit of the mean IOPS (e.g., ``iops:2``
+ means that all individual IOPS values must be within 2 of the mean,
+ whereas ``iops:0.2%`` means that all individual IOPS values must be
+ within 0.2% of the mean IOPS to terminate the job).
+
+ **iops_slope**
+ Collect IOPS data and calculate the least squares regression
+ slope. Stop the job if the slope falls below the specified limit.
+
+ **bw**
+ Collect bandwidth data. Stop the job if all individual bandwidth
+ measurements are within the specified limit of the mean bandwidth.
+
+ **bw_slope**
+ Collect bandwidth data and calculate the least squares regression
+ slope. Stop the job if the slope falls below the specified limit.
+
+.. option:: steadystate_duration=time, ss_dur=time
+
+ A rolling window of this duration will be used to judge whether steady state
+ has been reached. Data will be collected once per second. The default is 0
+ which disables steady state detection. When the unit is omitted, the
+ value is given in seconds.
+
+.. option:: steadystate_ramp_time=time, ss_ramp=time
+
+ Allow the job to run for the specified duration before beginning data
+ collection for checking the steady state job termination criterion. The
+ default is 0. When the unit is omitted, the value is given in seconds.
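+
+	For illustration, a job that runs for at most ten minutes but stops early
+	once the IOPS slope over a 60 second window falls below 0.1% of the mean,
+	ignoring the first 30 seconds, might look like this sketch (the job name
+	and sizes are assumptions)::
+
+		; hypothetical job, values chosen for illustration only
+		[steady-randread]
+		rw=randread
+		bs=4k
+		size=1g
+		time_based
+		runtime=600
+		steadystate=iops_slope:0.1%
+		steadystate_duration=60
+		steadystate_ramp_time=30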
+
+
+Measurements and reporting
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. option:: per_job_logs=bool
+
+	If set, this generates bw/clat/iops logs with per-job private filenames. If
+	not set, jobs with identical names will share the log filename. Default:
+	true.
+
+.. option:: group_reporting
+
+ It may sometimes be interesting to display statistics for groups of jobs as
+ a whole instead of for each individual job. This is especially true if
+ :option:`numjobs` is used; looking at individual thread/process output
+ quickly becomes unwieldy. To see the final report per-group instead of
+ per-job, use :option:`group_reporting`. Jobs in a file will be part of the
+	same reporting group, unless separated by a :option:`stonewall` or by
+ using :option:`new_group`.
+
+.. option:: new_group
+
+ Start a new reporting group. See: :option:`group_reporting`. If not given,
+ all jobs in a file will be part of the same reporting group, unless
+ separated by a :option:`stonewall`.
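+
+	As an illustrative sketch (job names and sizes are assumptions), the job
+	file below should produce two reporting groups: one summarizing the four
+	``readers`` clones and one, started by :option:`new_group`, summarizing
+	the two ``writers`` clones::
+
+		; hypothetical jobs, values chosen for illustration only
+		[global]
+		bs=4k
+		size=256m
+		group_reporting
+
+		[readers]
+		rw=randread
+		numjobs=4
+
+		[writers]
+		new_group
+		rw=randwrite
+		numjobs=2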
+
+.. option:: write_bw_log=str
+
+	If given, write a bandwidth log for this job. Can be used to store the
+	bandwidth of the job over its lifetime. The included
+ :command:`fio_generate_plots` script uses :command:`gnuplot` to turn these
+ text files into nice graphs. See :option:`write_lat_log` for behaviour of
+ given filename. For this option, the postfix is :file:`_bw.x.log`, where `x`
+ is the index of the job (`1..N`, where `N` is the number of jobs). If
+ :option:`per_job_logs` is false, then the filename will not include the job
+ index. See `Log File Formats`_.
+
+.. option:: write_lat_log=str
+
+ Same as :option:`write_bw_log`, except that this option stores I/O
+ submission, completion, and total latencies instead. If no filename is given
+ with this option, the default filename of :file:`jobname_type.log` is
+ used. Even if the filename is given, fio will still append the type of
+ log. So if one specifies::
+
write_lat_log=foo
- The actual log names will be foo_slat.x.log, foo_clat.x.log,
- and foo_lat.x.log, where x is the index of the job (1..N,
- where N is the number of jobs). This helps fio_generate_plot
- fine the logs automatically.
-
-write_iops_log=str Same as write_bw_log, but writes IOPS. If no filename is
- given with this option, the default filename of
- "jobname_type.x.log" is used,where x is the index of the job
- (1..N, where N is the number of jobs). Even if the filename
- is given, fio will still append the type of log.
-
-log_avg_msec=int By default, fio will log an entry in the iops, latency,
- or bw log for every IO that completes. When writing to the
- disk log, that can quickly grow to a very large size. Setting
- this option makes fio average the each log entry over the
- specified period of time, reducing the resolution of the log.
- Defaults to 0.
-
-log_offset=int If this is set, the iolog options will include the byte
- offset for the IO entry as well as the other data values.
-
-log_compression=int If this is set, fio will compress the IO logs as
- it goes, to keep the memory footprint lower. When a log
- reaches the specified size, that chunk is removed and
- compressed in the background. Given that IO logs are
- fairly highly compressible, this yields a nice memory
- savings for longer runs. The downside is that the
- compression will consume some background CPU cycles, so
- it may impact the run. This, however, is also true if
- the logging ends up consuming most of the system memory.
- So pick your poison. The IO logs are saved normally at the
- end of a run, by decompressing the chunks and storing them
- in the specified log file. This feature depends on the
- availability of zlib.
-
-log_store_compressed=bool If set, and log_compression is also set,
- fio will store the log files in a compressed format. They
- can be decompressed with fio, using the --inflate-log
- command line parameter. The files will be stored with a
- .fz suffix.
-
-block_error_percentiles=bool If set, record errors in trim block-sized
- units from writes and trims and output a histogram of
- how many trims it took to get to errors, and what kind
- of error was encountered.
-
-lockmem=int Pin down the specified amount of memory with mlock(2). Can
- potentially be used instead of removing memory or booting
- with less memory to simulate a smaller amount of memory.
- The amount specified is per worker.
-
-exec_prerun=str Before running this job, issue the command specified
- through system(3). Output is redirected in a file called
- jobname.prerun.txt.
-
-exec_postrun=str After the job completes, issue the command specified
- though system(3). Output is redirected in a file called
- jobname.postrun.txt.
-
-ioscheduler=str Attempt to switch the device hosting the file to the specified
- io scheduler before running.
-
-disk_util=bool Generate disk utilization statistics, if the platform
- supports it. Defaults to on.
-
-disable_lat=bool Disable measurements of total latency numbers. Useful
- only for cutting back the number of calls to gettimeofday,
- as that does impact performance at really high IOPS rates.
- Note that to really get rid of a large amount of these
- calls, this option must be used with disable_slat and
- disable_bw as well.
-
-disable_clat=bool Disable measurements of completion latency numbers. See
- disable_lat.
-
-disable_slat=bool Disable measurements of submission latency numbers. See
- disable_slat.
-
-disable_bw=bool Disable measurements of throughput/bandwidth numbers. See
- disable_lat.
-
-clat_percentiles=bool Enable the reporting of percentiles of
- completion latencies.
-
-percentile_list=float_list Overwrite the default list of percentiles
- for completion latencies and the block error histogram.
- Each number is a floating number in the range (0,100],
- and the maximum length of the list is 20. Use ':'
- to separate the numbers, and list the numbers in ascending
- order. For example, --percentile_list=99.5:99.9 will cause
- fio to report the values of completion latency below which
- 99.5% and 99.9% of the observed latencies fell, respectively.
-
-clocksource=str Use the given clocksource as the base of timing. The
- supported options are:
-
- gettimeofday gettimeofday(2)
-
- clock_gettime clock_gettime(2)
-
- cpu Internal CPU clock source
-
- cpu is the preferred clocksource if it is reliable, as it
- is very fast (and fio is heavy on time calls). Fio will
- automatically use this clocksource if it's supported and
- considered reliable on the system it is running on, unless
- another clocksource is specifically set. For x86/x86-64 CPUs,
- this means supporting TSC Invariant.
-
-gtod_reduce=bool Enable all of the gettimeofday() reducing options
- (disable_clat, disable_slat, disable_bw) plus reduce
- precision of the timeout somewhat to really shrink
- the gettimeofday() call count. With this option enabled,
- we only do about 0.4% of the gtod() calls we would have
- done if all time keeping was enabled.
-
-gtod_cpu=int Sometimes it's cheaper to dedicate a single thread of
- execution to just getting the current time. Fio (and
- databases, for instance) are very intensive on gettimeofday()
- calls. With this option, you can set one CPU aside for
- doing nothing but logging current time to a shared memory
- location. Then the other threads/processes that run IO
- workloads need only copy that segment, instead of entering
- the kernel with a gettimeofday() call. The CPU set aside
- for doing these time calls will be excluded from other
- uses. Fio will manually clear it from the CPU mask of other
- jobs.
-
-continue_on_error=str Normally fio will exit the job on the first observed
- failure. If this option is set, fio will continue the job when
- there is a 'non-fatal error' (EIO or EILSEQ) until the runtime
- is exceeded or the I/O size specified is completed. If this
- option is used, there are two more stats that are appended,
- the total error count and the first error. The error field
- given in the stats is the first error that was hit during the
- run.
-
- The allowed values are:
+ The actual log names will be :file:`foo_slat.x.log`, :file:`foo_clat.x.log`,
+ and :file:`foo_lat.x.log`, where `x` is the index of the job (1..N, where N
+ is the number of jobs). This helps :command:`fio_generate_plot` find the
+ logs automatically. If :option:`per_job_logs` is false, then the filename
+ will not include the job index. See `Log File Formats`_.
- none Exit on any IO or verify errors.
+.. option:: write_hist_log=str
- read Continue on read errors, exit on all others.
+ Same as :option:`write_lat_log`, but writes I/O completion latency
+ histograms. If no filename is given with this option, the default filename
+ of :file:`jobname_clat_hist.x.log` is used, where `x` is the index of the
+ job (1..N, where `N` is the number of jobs). Even if the filename is given,
+ fio will still append the type of log. If :option:`per_job_logs` is false,
+ then the filename will not include the job index. See `Log File Formats`_.
- write Continue on write errors, exit on all others.
+.. option:: write_iops_log=str
- io Continue on any IO error, exit on all others.
+ Same as :option:`write_bw_log`, but writes IOPS. If no filename is given
+ with this option, the default filename of :file:`jobname_type.x.log` is
+ used, where `x` is the index of the job (1..N, where `N` is the number of
+ jobs). Even if the filename is given, fio will still append the type of
+ log. If :option:`per_job_logs` is false, then the filename will not include
+ the job index. See `Log File Formats`_.
- verify Continue on verify errors, exit on all others.
+.. option:: log_avg_msec=int
- all Continue on all errors.
+ By default, fio will log an entry in the iops, latency, or bw log for every
+ I/O that completes. When writing to the disk log, that can quickly grow to a
+ very large size. Setting this option makes fio average each log entry
+ over the specified period of time, reducing the resolution of the log. See
+ :option:`log_max_value` as well. Defaults to 0, logging all entries.
- 0 Backward-compatible alias for 'none'.
+.. option:: log_hist_msec=int
- 1 Backward-compatible alias for 'all'.
+ Same as :option:`log_avg_msec`, but logs entries for completion latency
+ histograms. Computing latency percentiles from averages of intervals using
+ :option:`log_avg_msec` is inaccurate. Setting this option makes fio log
+ histogram entries over the specified period of time, reducing log sizes for
+ high IOPS devices while retaining percentile accuracy. See
+ :option:`log_hist_coarseness` as well. Defaults to 0, meaning histogram
+ logging is disabled.
-ignore_error=str Sometimes you want to ignore some errors during test
- in that case you can specify error list for each error type.
- ignore_error=READ_ERR_LIST,WRITE_ERR_LIST,VERIFY_ERR_LIST
- errors for given error type is separated with ':'. Error
- may be symbol ('ENOSPC', 'ENOMEM') or integer.
- Example:
- ignore_error=EAGAIN,ENOSPC:122
- This option will ignore EAGAIN from READ, and ENOSPC and
- 122(EDQUOT) from WRITE.
+.. option:: log_hist_coarseness=int
-error_dump=bool If set dump every error even if it is non fatal, true
- by default. If disabled only fatal error will be dumped
+ Integer ranging from 0 to 6, defining the coarseness of the resolution of
+ the histogram logs enabled with :option:`log_hist_msec`. For each increment
+ in coarseness, fio outputs half as many bins. Defaults to 0, for which
+ histogram logs contain 1216 latency bins. See `Log File Formats`_.
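+
+As a minimal sketch of how the histogram options combine, the job below logs
+one completion latency histogram per second. The job name, the scratch file
+:file:`/tmp/fio-hist.test` and the log prefix are hypothetical placeholders.
+With :option:`log_hist_coarseness` set to 2, the bin count should drop from
+1216 to 304 (two halvings)::
+
+ [randread-hist]
+ rw=randread
+ bs=4k
+ size=1g
+ # hypothetical scratch file, created by fio if it does not exist
+ filename=/tmp/fio-hist.test
+ # log prefix; fio still appends the log type to this name
+ write_hist_log=randread
+ # one histogram entry per 1000 ms (0 would disable histogram logging)
+ log_hist_msec=1000
+ # each step halves the bin count: 1216 -> 608 -> 304
+ log_hist_coarseness=2
+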
-cgroup=str Add job to this control group. If it doesn't exist, it will
- be created. The system must have a mounted cgroup blkio
- mount point for this to work. If your system doesn't have it
- mounted, you can do so with:
+.. option:: log_max_value=bool
- # mount -t cgroup -o blkio none /cgroup
+ If :option:`log_avg_msec` is set, fio logs the average over that window. If
+ you instead want to log the maximum value, set this option to 1. Defaults to
+ 0, meaning that averaged values are logged.
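+
+For illustration, a hypothetical sequential write job that logs bandwidth and
+IOPS once per second, recording the peak value of each 1000 ms window rather
+than the average. The job name, file path and log prefix are placeholders::
+
+ [seqwrite-bw]
+ rw=write
+ bs=128k
+ size=1g
+ filename=/tmp/fio-bw.test
+ # bandwidth and IOPS logs share this (hypothetical) prefix
+ write_bw_log=seqwrite
+ write_iops_log=seqwrite
+ # one log entry per 1000 ms window instead of one per I/O
+ log_avg_msec=1000
+ # log the maximum value in each window instead of the average
+ log_max_value=1
+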
-cgroup_weight=int Set the weight of the cgroup to this value. See
- the documentation that comes with the kernel, allowed values
- are in the range of 100..1000.
+.. option:: log_offset=int
-cgroup_nodelete=bool Normally fio will delete the cgroups it has created after
- the job completion. To override this behavior and to leave
- cgroups around after the job completion, set cgroup_nodelete=1.
- This can be useful if one wants to inspect various cgroup
- files after job completion. Default: false
+ If this is set, the entries in the I/O logs will include the byte offset of
+ the I/O as well as the other data values.
-uid=int Instead of running as the invoking user, set the user ID to
- this value before the thread/process does any work.
+.. option:: log_compression=int
-gid=int Set group ID, see uid.
+ If this is set, fio will compress the I/O logs as it goes, to keep the
+ memory footprint lower. When a log reaches the specified size, that chunk is
+ removed and compressed in the background. Given that I/O logs are fairly
+ highly compressible, this yields considerable memory savings for longer runs. The
+ downside is that the compression will consume some background CPU cycles, so
+ it may impact the run. This, however, is also true if the logging ends up
+ consuming most of the system memory. So pick your poison. The I/O logs are
+ saved normally at the end of a run, by decompressing the chunks and storing
+ them in the specified log file. This feature depends on the availability of
+ zlib.
-flow_id=int The ID of the flow. If not specified, it defaults to being a
- global flow. See flow.
+.. option:: log_compression_cpus=str
-flow=int Weight in token-based flow control. If this value is used, then
- there is a 'flow counter' which is used to regulate the
- proportion of activity between two or more jobs. fio attempts
- to keep this flow counter near zero. The 'flow' parameter
- stands for how much should be added or subtracted to the flow
- counter on each iteration of the main I/O loop. That is, if
- one job has flow=8 and another job has flow=-1, then there
- will be a roughly 1:8 ratio in how much one runs vs the other.
+ Define the set of CPUs that are allowed to handle online log compression for
+ the I/O jobs. This can provide better isolation between
+ performance-sensitive jobs and background compression work.
-flow_watermark=int The maximum value that the absolute value of the flow
- counter is allowed to reach before the job must wait for a
- lower value of the counter.
+.. option:: log_store_compressed=bool
-flow_sleep=int The period of time, in microseconds, to wait after the flow
- watermark has been exceeded before retrying operations
+ If set, fio will store the log files in a compressed format. They can be
+ decompressed with fio, using the :option:`--inflate-log` command line
+ parameter. The files will be stored with a :file:`.fz` suffix.
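+
+A sketch of a long-running job that compresses its latency logs in memory and
+stores them compressed, assuming fio was built with zlib and that the
+:option:`log_compression` value is a size in bytes. The 8 MiB chunk size, the
+CPU number and all names are illustrative assumptions, not recommended values::
+
+ [long-run]
+ rw=randrw
+ bs=4k
+ size=4g
+ filename=/tmp/fio-long.test
+ time_based=1
+ # run for one hour (value in seconds)
+ runtime=3600
+ write_lat_log=long-run
+ # compress each 8388608-byte (8 MiB) chunk of log data in the background
+ log_compression=8388608
+ # restrict the background compression work to CPU 0
+ log_compression_cpus=0
+ # keep the logs compressed on disk (.fz suffix)
+ log_store_compressed=1
+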
-In addition, there are some parameters which are only valid when a specific
-ioengine is in use. These are used identically to normal parameters, with the
-caveat that when used on the command line, they must come after the ioengine
-that defines them is selected.
-
-[libaio] userspace_reap Normally, with the libaio engine in use, fio will use
- the io_getevents system call to reap newly returned events.
- With this flag turned on, the AIO ring will be read directly
- from user-space to reap events. The reaping mode is only
- enabled when polling for a minimum of 0 events (eg when
- iodepth_batch_complete=0).
-
-[cpu] cpuload=int Attempt to use the specified percentage of CPU cycles.
-
-[cpu] cpuchunks=int Split the load into cycles of the given time. In
- microseconds.
-
-[cpu] exit_on_io_done=bool Detect when IO threads are done, then exit.
-
-[netsplice] hostname=str
-[net] hostname=str The host name or IP address to use for TCP or UDP based IO.
- If the job is a TCP listener or UDP reader, the hostname is not
- used and must be omitted unless it is a valid UDP multicast
- address.
-
-[netsplice] port=int
-[net] port=int The TCP or UDP port to bind to or connect to. If this is used
-with numjobs to spawn multiple instances of the same job type, then this will
-be the starting port number since fio will use a range of ports.
-
-[netsplice] interface=str
-[net] interface=str The IP address of the network interface used to send or
- receive UDP multicast
-
-[netsplice] ttl=int
-[net] ttl=int Time-to-live value for outgoing UDP multicast packets.
- Default: 1
-
-[netsplice] nodelay=bool
-[net] nodelay=bool Set TCP_NODELAY on TCP connections.
-
-[netsplice] protocol=str
-[netsplice] proto=str
-[net] protocol=str
-[net] proto=str The network protocol to use. Accepted values are:
-
- tcp Transmission control protocol
- tcpv6 Transmission control protocol V6
- udp User datagram protocol
- udpv6 User datagram protocol V6
- unix UNIX domain socket
-
- When the protocol is TCP or UDP, the port must also be given,
- as well as the hostname if the job is a TCP listener or UDP
- reader. For unix sockets, the normal filename option should be
- used and the port is invalid.
-
-[net] listen For TCP network connections, tell fio to listen for incoming
- connections rather than initiating an outgoing connection. The
- hostname must be omitted if this option is used.
-
-[net] pingpong Normaly a network writer will just continue writing data, and
- a network reader will just consume packages. If pingpong=1
- is set, a writer will send its normal payload to the reader,
- then wait for the reader to send the same payload back. This
- allows fio to measure network latencies. The submission
- and completion latencies then measure local time spent
- sending or receiving, and the completion latency measures
- how long it took for the other end to receive and send back.
- For UDP multicast traffic pingpong=1 should only be set for a
- single reader when multiple readers are listening to the same
- address.
-
-[net] window_size Set the desired socket buffer size for the connection.
-
-[net] mss Set the TCP maximum segment size (TCP_MAXSEG).
-
-[e4defrag] donorname=str
- File will be used as a block donor(swap extents between files)
-[e4defrag] inplace=int
- Configure donor file blocks allocation strategy
- 0(default): Preallocate donor's file on init
- 1 : allocate space immidietly inside defragment event,
- and free right after event
-
-[mtd] skip_bad=bool Skip operations against known bad blocks.
-
-
-6.0 Interpreting the output
----------------------------
-
-fio spits out a lot of output. While running, fio will display the
-status of the jobs created. An example of that would be:
-
-Threads: 1: [_r] [24.8% done] [ 13509/ 8334 kb/s] [eta 00h:01m:31s]
-
-The characters inside the square brackets denote the current status of
-each thread. The possible values (in typical life cycle order) are:
-
-Idle Run
----- ---
-P Thread setup, but not started.
-C Thread created.
-I Thread initialized, waiting or generating necessary data.
- p Thread running pre-reading file(s).
- R Running, doing sequential reads.
- r Running, doing random reads.
- W Running, doing sequential writes.
- w Running, doing random writes.
- M Running, doing mixed sequential reads/writes.
- m Running, doing mixed random reads/writes.
- F Running, currently waiting for fsync()
- f Running, finishing up (writing IO logs, etc)
- V Running, doing verification of written data.
-E Thread exited, not reaped by main thread yet.
-_ Thread reaped, or
-X Thread reaped, exited with an error.
-K Thread reaped, exited due to signal.
-
-Fio will condense the thread string as not to take up more space on the
-command line as is needed. For instance, if you have 10 readers and 10
-writers running, the output would look like this:
-
-Jobs: 20 (f=20): [R(10),W(10)] [4.0% done] [2103MB/0KB/0KB /s] [538K/0/0 iops] [eta 57m:36s]
-
-Fio will still maintain the ordering, though. So the above means that jobs
-1..10 are readers, and 11..20 are writers.
-
-The other values are fairly self explanatory - number of threads
-currently running and doing io, rate of io since last check (read speed
-listed first, then write speed), and the estimated completion percentage
-and time for the running group. It's impossible to estimate runtime of
-the following groups (if any). Note that the string is displayed in order,
-so it's possible to tell which of the jobs are currently doing what. The
-first character is the first job defined in the job file, and so forth.
-
-When fio is done (or interrupted by ctrl-c), it will show the data for
-each thread, group of threads, and disks in that order. For each data
-direction, the output looks like:
-
-Client1 (g=0): err= 0:
- write: io= 32MB, bw= 666KB/s, iops=89 , runt= 50320msec
- slat (msec): min= 0, max= 136, avg= 0.03, stdev= 1.92
- clat (msec): min= 0, max= 631, avg=48.50, stdev=86.82
- bw (KB/s) : min= 0, max= 1196, per=51.00%, avg=664.02, stdev=681.68
- cpu : usr=1.49%, sys=0.25%, ctx=7969, majf=0, minf=17
- IO depths : 1=0.1%, 2=0.3%, 4=0.5%, 8=99.0%, 16=0.0%, 32=0.0%, >32=0.0%
- submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
- complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
- issued r/w: total=0/32768, short=0/0
- lat (msec): 2=1.6%, 4=0.0%, 10=3.2%, 20=12.8%, 50=38.4%, 100=24.8%,
- lat (msec): 250=15.2%, 500=0.0%, 750=0.0%, 1000=0.0%, >=2048=0.0%
+.. option:: log_unix_epoch=bool
+
+ If set, fio will log Unix timestamps to the log files produced by enabling
+ ``write_type_log`` for each log type, instead of the default zero-based
+ timestamps.
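+
+A sketch of a latency-logging job whose entries carry Unix timestamps and the
+byte offset of each I/O, which can make it easier to line the log up with
+other system traces. The job name, file path and log prefix are hypothetical::
+
+ [lat-trace]
+ rw=randwrite
+ bs=4k
+ size=1g
+ filename=/tmp/fio-trace.test
+ write_lat_log=lat-trace
+ # add the byte offset of each I/O to every log entry
+ log_offset=1
+ # use Unix timestamps instead of zero-based ones
+ log_unix_epoch=1
+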
+
+.. option:: block_error_percentiles=bool
+
+ If set, record errors in trim block-sized units from writes and trims and
+ output a histogram of how many trims it took to get to errors, and what kind
+ of error was encountered.
+
+.. option:: bwavgtime=int
+
+ Average the calculated bandwidth over the given time. Value is specified in
+ milliseconds. If the job also does bandwidth logging through
+ :option:`write_bw_log`, then the minimum of this option and
+ :option:`log_avg_msec` will be used. Default: 500ms.
+
+.. option:: iopsavgtime=int
+
+ Average the calculated IOPS over the given time. Value is specified in
+ milliseconds. If the job also does IOPS logging through
+ :option:`write_iops_log`, then the minimum of this option and
+ :option:`log_avg_msec` will be used. Default: 500ms.
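+
+For example, to smooth the bandwidth and IOPS figures over one-second windows
+instead of the 500 ms default, a job could add the lines below (the values are
+illustrative)::
+
+ # average bandwidth over 1000 ms windows
+ bwavgtime=1000
+ # average IOPS over 1000 ms windows
+ iopsavgtime=1000
+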
+
+.. option:: disk_util=bool
+
+ Generate disk utilization statistics, if the platform supports it.
+ Default: true.
+
+.. option:: disable_lat=bool
+
+ Disable measurements of total latency numbers. Useful only for cutting back
+ the number of calls to :manpage:`gettimeofday(2)`, as that does impact
+ performance at really high IOPS rates. Note that to really get rid of a
+ large amount of these calls, this option must be used with
+ :option:`disable_slat` and :option:`disable_bw_measurement` as well.
+
+.. option:: disable_clat=bool
+
+ Disable measurements of completion latency numbers. See
+ :option:`disable_lat`.
+
+.. option:: disable_slat=bool
+
+ Disable measurements of submission latency numbers. See
+ :option:`disable_lat`.
+
+.. option:: disable_bw_measurement=bool, disable_bw=bool
+
+ Disable measurements of throughput/bandwidth numbers. See
+ :option:`disable_lat`.
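+
+A sketch of a high-IOPS job that turns off the latency and bandwidth
+measurements described above to cut down on time calls. The device path
+:file:`/dev/nvme0n1`, the engine and the queue depth are assumptions for
+illustration, and reading a raw device requires appropriate permissions::
+
+ [max-iops]
+ rw=randread
+ bs=4k
+ ioengine=libaio
+ iodepth=64
+ direct=1
+ time_based=1
+ runtime=60
+ # hypothetical device; this job only reads from it
+ filename=/dev/nvme0n1
+ # skip total, completion and submission latency tracking
+ disable_lat=1
+ disable_clat=1
+ disable_slat=1
+ # skip throughput/bandwidth tracking as well
+ disable_bw_measurement=1
+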
+
+.. option:: clat_percentiles=bool
+
+ Enable the reporting of percentiles of completion latencies.
+
+.. option:: percentile_list=float_list
+
+ Overwrite the default list of percentiles for completion latencies and the
+ block error histogram. Each number is a floating-point number in the range
+ (0,100], and the maximum length of the list is 20. Use ``:`` to separate the
+ numbers, and list the numbers in ascending order. For example,
+ ``--percentile_list=99.5:99.9`` will cause fio to report the values of
+ completion latency below which 99.5% and 99.9% of the observed latencies
+ fell, respectively.
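+
+The same list can be given in a job file. In this sketch the job name and
+scratch file are hypothetical; fio should then report the median plus the
+99th, 99.9th and 99.99th percentile completion latencies::
+
+ [lat-pct]
+ rw=randread
+ bs=4k
+ size=1g
+ filename=/tmp/fio-pct.test
+ clat_percentiles=1
+ # ascending, ':'-separated, at most 20 entries, each in (0,100]
+ percentile_list=50:99:99.9:99.99
+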
+
+
+Error handling
+~~~~~~~~~~~~~~
+
+.. option:: exitall_on_error
+
+ When one job finishes in error, terminate the rest. The default is to wait
+ for each job to finish.
+
+.. option:: continue_on_error=str
+
+ Normally fio will exit the job on the first observed failure. If this option
+ is set, fio will continue the job when there is a 'non-fatal error' (EIO or
+ EILSEQ) until the runtime is exceeded or the I/O size specified is
+ completed. If this option is used, two additional stats are appended: the
+ total error count and the first error. The error field given in the stats
+ is the first error that was hit during the run.
+
+ The allowed values are:
+
+ **none**
+ Exit on any I/O or verify errors.
+
+ **read**
+ Continue on read errors, exit on all others.
+
+ **write**
+ Continue on write errors, exit on all others.
+
+ **io**
+ Continue on any I/O error, exit on all others.
+
+ **verify**
+ Continue on verify errors, exit on all others.
+
+ **all**
+ Continue on all errors.
+
+ **0**
+ Backward-compatible alias for 'none'.
+
+ **1**
+ Backward-compatible alias for 'all'.
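+
+As an illustration, a read-only job against a possibly failing disk could keep
+going past read errors and report the error count at the end. The device path
+:file:`/dev/sdX` is a placeholder::
+
+ [scan-bad-disk]
+ rw=read
+ bs=1m
+ # placeholder device; this job only reads from it
+ filename=/dev/sdX
+ # keep going on read errors, still exit on any other error
+ continue_on_error=read
+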
+
+.. option:: ignore_error=str
+
+ Sometimes you want to ignore some errors during a test. In that case you
+ can specify an error list for each error type, in the form
+ ``ignore_error=READ_ERR_LIST,WRITE_ERR_LIST,VERIFY_ERR_LIST``. Errors for
+ a given error type are separated with ':'. An error may be a symbol
+ ('ENOSPC', 'ENOMEM') or an integer. Example::
+
+ ignore_error=EAGAIN,ENOSPC:122
+
+ This option will ignore EAGAIN from READ, and ENOSPC and 122(EDQUOT) from
+ WRITE.
+
+.. option:: error_dump=bool
+
+ If set, dump every error, even if it is non-fatal. True by default. If
+ disabled, only fatal errors will be dumped.
+
+Running predefined workloads
+----------------------------
+
+Fio includes predefined profiles that mimic the I/O workloads generated by
+other tools.
+
+.. option:: profile=str
+
+ The predefined workload to run. Current profiles are:
+
+ **tiobench**
+ Threaded I/O bench (tiotest/tiobench) like workload.
+
+ **act**
+ Aerospike Certification Tool (ACT) like workload.
+
+To view a profile's additional options use :option:`--cmdhelp` after specifying
+the profile. For example::
+
+ $ fio --profile=act --cmdhelp
+
+Act profile options
+~~~~~~~~~~~~~~~~~~~
+
+.. option:: device-names=str
+ :noindex:
+
+ Devices to use.
+
+.. option:: load=int
+ :noindex:
+
+ ACT load multiplier. Default: 1.
+
+.. option:: test-duration=time
+ :noindex:
+
+ How long the entire test takes to run. Default: 24h.
+
+.. option:: threads-per-queue=int
+ :noindex:
+
+ Number of read IO threads per device. Default: 8.
+
+.. option:: read-req-num-512-blocks=int
+ :noindex:
+
+ Number of 512B blocks to read at a time. Default: 3.
+
+.. option:: large-block-op-kbytes=int
+ :noindex:
+
+ Size of large block ops in KiB (writes). Default: 131072.
+
+.. option:: prep
+ :noindex:
+
+ Set to run ACT prep phase.
+
+Tiobench profile options
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. option:: size=str
+ :noindex:
+
+ Size in MiB.
+
+.. option:: block=int
+ :noindex:
+
+ Block size in bytes. Default: 4096.
+
+.. option:: numruns=int
+ :noindex:
+
+ Number of runs.
+
+.. option:: dir=str
+ :noindex:
+
+ Test directory.
+
+.. option:: threads=int
+ :noindex:
+
+ Number of threads.
+
+Interpreting the output
+-----------------------
+
+Fio spits out a lot of output. While running, fio will display the status of the
+jobs created. An example of that would be::
+
+ Jobs: 1 (f=1): [_(1),M(1)][24.8%][r=20.5MiB/s,w=23.5MiB/s][r=82,w=94 IOPS][eta 01m:31s]
+
+The characters inside the square brackets denote the current status of each
+thread. The possible values (in typical life cycle order) are:
+
++------+-----+-----------------------------------------------------------+
+| Idle | Run | |
++======+=====+===========================================================+
+| P | | Thread setup, but not started. |
++------+-----+-----------------------------------------------------------+
+| C | | Thread created. |
++------+-----+-----------------------------------------------------------+
+| I | | Thread initialized, waiting or generating necessary data. |
++------+-----+-----------------------------------------------------------+
+| | p | Thread running pre-reading file(s). |
++------+-----+-----------------------------------------------------------+
+| | R | Running, doing sequential reads. |
++------+-----+-----------------------------------------------------------+
+| | r | Running, doing random reads. |
++------+-----+-----------------------------------------------------------+
+| | W | Running, doing sequential writes. |
++------+-----+-----------------------------------------------------------+
+| | w | Running, doing random writes. |
++------+-----+-----------------------------------------------------------+
+| | M | Running, doing mixed sequential reads/writes. |
++------+-----+-----------------------------------------------------------+
+| | m | Running, doing mixed random reads/writes. |
++------+-----+-----------------------------------------------------------+
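+| | F | Running, currently waiting for :manpage:`fsync(2)`. |
++------+-----+-----------------------------------------------------------+
+| | f | Running, finishing up (writing I/O logs, etc.). |
++------+-----+-----------------------------------------------------------+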
+| | V | Running, doing verification of written data. |
++------+-----+-----------------------------------------------------------+
+| E | | Thread exited, not reaped by main thread yet. |
++------+-----+-----------------------------------------------------------+
+| _ | | Thread reaped. |
++------+-----+-----------------------------------------------------------+
+| X | | Thread reaped, exited with an error. |
++------+-----+-----------------------------------------------------------+
+| K | | Thread reaped, exited due to signal. |
++------+-----+-----------------------------------------------------------+
+
+Fio will condense the thread string so as not to take up more space on the
+command line than needed. For instance, if you have 10 readers and 10 writers
+running, the output would look like this::
+
+ Jobs: 20 (f=20): [R(10),W(10)][4.0%][r=20.5MiB/s,w=23.5MiB/s][r=82,w=94 IOPS][eta 57m:36s]
+
+Fio will still maintain the ordering, though. So the above means that jobs 1..10
+are readers, and 11..20 are writers.
+
+The other values are fairly self-explanatory -- number of threads currently
+running and doing I/O, the number of currently open files (f=), the rate of I/O
+since the last check (read speed listed first, then write speed and optionally trim
+speed), and the estimated completion percentage and time for the current
+running group. It's impossible to estimate runtime of the following groups (if
+any). Note that the string is displayed in order, so it's possible to tell which
+of the jobs are currently doing what. The first character is the first job
+defined in the job file, and so forth.
+
+When fio is done (or interrupted by :kbd:`ctrl-c`), it will show the data for
+each thread, group of threads, and disks in that order. For each data direction,
+the output looks like::
+
+ Client1 (g=0): err= 0:
+ write: io= 32MiB, bw= 666KiB/s, iops=89 , runt= 50320msec
+ slat (msec): min= 0, max= 136, avg= 0.03, stdev= 1.92
+ clat (msec): min= 0, max= 631, avg=48.50, stdev=86.82
+ bw (KiB/s) : min= 0, max= 1196, per=51.00%, avg=664.02, stdev=681.68
+ cpu : usr=1.49%, sys=0.25%, ctx=7969, majf=0, minf=17
+ IO depths : 1=0.1%, 2=0.3%, 4=0.5%, 8=99.0%, 16=0.0%, 32=0.0%, >32=0.0%
+ submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
+ complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
+ issued r/w: total=0/32768, short=0/0
+ lat (msec): 2=1.6%, 4=0.0%, 10=3.2%, 20=12.8%, 50=38.4%, 100=24.8%,
+ lat (msec): 250=15.2%, 500=0.0%, 750=0.0%, 1000=0.0%, >=2048=0.0%
The client number is printed, along with the group id and error of that
-thread. Below is the io statistics, here for writes. In the order listed,
-they denote:
-
-io= Number of megabytes io performed
-bw= Average bandwidth rate
-iops= Average IOs performed per second
-runt= The runtime of that thread
- slat= Submission latency (avg being the average, stdev being the
- standard deviation). This is the time it took to submit
- the io. For sync io, the slat is really the completion
- latency, since queue/complete is one operation there. This
- value can be in milliseconds or microseconds, fio will choose
- the most appropriate base and print that. In the example
- above, milliseconds is the best scale. Note: in --minimal mode
+thread. Below are the I/O statistics, here for writes. In the order listed, they
+denote:
+
+**io**
+ Number of megabytes of I/O performed.
+
+**bw**
+ Average bandwidth rate.
+
+**iops**
+ Average I/Os performed per second.
+
+**runt**
+ The runtime of that thread.
+
+**slat**
+ Submission latency (avg being the average, stdev being the standard
+ deviation). This is the time it took to submit the I/O. For sync I/O,
+ the slat is really the completion latency, since queue/complete is one
+ operation there. This value can be in milliseconds or microseconds; fio
+ will choose the most appropriate base and print that. In the example
+ above, milliseconds is the best scale. Note: in :option:`--minimal` mode
latencies are always expressed in microseconds.
- clat= Completion latency. Same names as slat, this denotes the
- time from submission to completion of the io pieces. For
- sync io, clat will usually be equal (or very close) to 0,
- as the time from submit to complete is basically just
- CPU time (io has already been done, see slat explanation).
- bw= Bandwidth. Same names as the xlat stats, but also includes
- an approximate percentage of total aggregate bandwidth
- this thread received in this group. This last value is
- only really useful if the threads in this group are on the
- same disk, since they are then competing for disk access.
-cpu= CPU usage. User and system time, along with the number
- of context switches this thread went through, usage of
- system and user time, and finally the number of major
- and minor page faults.
-IO depths= The distribution of io depths over the job life time. The
- numbers are divided into powers of 2, so for example the
- 16= entries includes depths up to that value but higher
- than the previous entry. In other words, it covers the
- range from 16 to 31.
-IO submit= How many pieces of IO were submitting in a single submit
- call. Each entry denotes that amount and below, until
- the previous entry - eg, 8=100% mean that we submitted
- anywhere in between 5-8 ios per submit call.
-IO complete= Like the above submit number, but for completions instead.
-IO issued= The number of read/write requests issued, and how many
- of them were short.
-IO latencies= The distribution of IO completion latencies. This is the
- time from when IO leaves fio and when it gets completed.
- The numbers follow the same pattern as the IO depths,
- meaning that 2=1.6% means that 1.6% of the IO completed
- within 2 msecs, 20=12.8% means that 12.8% of the IO
- took more than 10 msecs, but less than (or equal to) 20 msecs.
+
+**clat**
+ Completion latency. Same names as slat, this denotes the time from
+ submission to completion of the I/O pieces. For sync I/O, clat will
+ usually be equal (or very close) to 0, as the time from submit to
+ complete is basically just CPU time (I/O has already been done, see slat
+ explanation).
+
+**bw**
+ Bandwidth. Same names as the xlat stats, but also includes an
+ approximate percentage of total aggregate bandwidth this thread received
+ in this group. This last value is only really useful if the threads in
+ this group are on the same disk, since they are then competing for disk
+ access.
+
+**cpu**
+ CPU usage. User and system time, along with the number of context
+ switches this thread went through, and finally the number of major and
+ minor page faults. The CPU utilization numbers are averages for the jobs
+ in that reporting group, while the context and fault counters are summed.
+
+**IO depths**
+ The distribution of I/O depths over the job lifetime. The numbers are
+ divided into powers of 2, and each entry covers depths from that value up
+ to, but not including, the next entry. For example, the 16= entry covers
+ depths from 16 to 31.
+
+**IO submit**
+ How many pieces of I/O were submitted in a single submit call. Each
+ entry denotes that amount and below, until the previous entry -- e.g.,
+ 8=100% means that we submitted anywhere between 5 and 8 I/Os per submit
+ call.
+
+**IO complete**
+ Like the above submit number, but for completions instead.
+
+**IO issued**
+ The number of read/write requests issued, and how many of them were
+ short.
+
+**IO latencies**
+ The distribution of I/O completion latencies. This is the time from when
+ I/O leaves fio to when it gets completed. Each entry counts the I/Os that
+ completed above the previous entry's bound, up to its own: 2=1.6% means
+ that 1.6% of the I/O completed within 2 msecs, and 20=12.8% means that
+ 12.8% of the I/O took more than 10 msecs, but less than (or equal to) 20
+ msecs.
After each client has been listed, the group statistics are printed. They
-will look like this:
+will look like this::
-Run status group 0 (all jobs):
- READ: io=64MB, aggrb=22178, minb=11355, maxb=11814, mint=2840msec, maxt=2955msec
- WRITE: io=64MB, aggrb=1302, minb=666, maxb=669, mint=50093msec, maxt=50320msec
+ Run status group 0 (all jobs):
+ READ: io=64MB, aggrb=22178, minb=11355, maxb=11814, mint=2840msec, maxt=2955msec
+ WRITE: io=64MB, aggrb=1302, minb=666, maxb=669, mint=50093msec, maxt=50320msec
For each data direction, it prints:
-io= Number of megabytes io performed.
-aggrb= Aggregate bandwidth of threads in this group.
-minb= The minimum average bandwidth a thread saw.
-maxb= The maximum average bandwidth a thread saw.
-mint= The smallest runtime of the threads in that group.
-maxt= The longest runtime of the threads in that group.
+**io**
+ Number of megabytes of I/O performed.
+**aggrb**
+ Aggregate bandwidth of threads in this group.
+**minb**
+ The minimum average bandwidth a thread saw.
+**maxb**
+ The maximum average bandwidth a thread saw.
+**mint**
+ The smallest runtime of the threads in that group.
+**maxt**
+ The longest runtime of the threads in that group.
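
For scripts that scrape the human-readable output, a group status line can be split into its data direction and its key=value fields. This is a minimal sketch that assumes exactly the layout of the sample above (a direction name, a colon, then comma-separated key=value pairs); values are kept as strings because some carry their units (``MB``, ``msec``). `parse_group_line` is a hypothetical name.

```python
from typing import Dict, Tuple


def parse_group_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Split e.g. 'READ: io=64MB, aggrb=22178, ...' into ('READ', {...})."""
    direction, _, rest = line.strip().partition(":")
    fields = {}
    for item in rest.split(","):
        key, _, value = item.strip().partition("=")
        fields[key] = value
    return direction, fields


sample = ("READ: io=64MB, aggrb=22178, minb=11355, maxb=11814, "
          "mint=2840msec, maxt=2955msec")
direction, fields = parse_group_line(sample)
print(direction, fields["aggrb"], fields["maxt"])  # READ 22178 2955msec
```
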
-And finally, the disk statistics are printed. They will look like this:
+And finally, the disk statistics are printed. They will look like this::
-Disk stats (read/write):
- sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%
+ Disk stats (read/write):
+ sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%
Each value is printed for both reads and writes, with reads first. The
numbers denote:
-ios= Number of ios performed by all groups.
-merge= Number of merges io the io scheduler.
-ticks= Number of ticks we kept the disk busy.
-io_queue= Total time spent in the disk queue.
-util= The disk utilization. A value of 100% means we kept the disk
+**ios**
+ Number of I/Os performed by all groups.
+**merge**
+ Number of merges performed by the I/O scheduler.
+**ticks**
+ Number of ticks we kept the disk busy.
+**in_queue**
+ Total time spent in the disk queue.
+**util**
+ The disk utilization. A value of 100% means we kept the disk
busy constantly, 50% would be a disk idling half of the time.
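
Because paired values are printed as read/write with reads first, a disk stats line can be turned into per-direction tuples. A minimal sketch, assuming the exact layout of the sample above; `parse_disk_line` is a hypothetical name, and unpaired values (``in_queue``, ``util``) are left as strings.

```python
from typing import Dict, Tuple, Union

Value = Union[Tuple[int, int], str]


def parse_disk_line(line: str) -> Tuple[str, Dict[str, Value]]:
    """Split e.g. 'sda: ios=16398/16511, ...' into ('sda', {...})."""
    name, _, rest = line.strip().partition(":")
    stats: Dict[str, Value] = {}
    for item in rest.split(","):
        key, _, value = item.strip().partition("=")
        if "/" in value:
            # Paired values are printed as read/write, reads first.
            read, write = value.split("/")
            stats[key] = (int(read), int(write))
        else:
            stats[key] = value
    return name, stats


sample = ("sda: ios=16398/16511, merge=30/162, ticks=6853/819634, "
          "in_queue=826487, util=100.00%")
name, stats = parse_disk_line(sample)
print(name, stats["ios"], stats["util"])  # sda (16398, 16511) 100.00%
```
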
-It is also possible to get fio to dump the current output while it is
-running, without terminating the job. To do that, send fio the USR1 signal.
-You can also get regularly timed dumps by using the --status-interval
-parameter, or by creating a file in /tmp named fio-dump-status. If fio
-sees this file, it will unlink it and dump the current output status.
+It is also possible to get fio to dump the current output while it is running,
+without terminating the job. To do that, send fio the **USR1** signal. You can
+also get regularly timed dumps by using the :option:`--status-interval`
+parameter, or by creating a file in :file:`/tmp` named
+:file:`fio-dump-status`. If fio sees this file, it will unlink it and dump the
+current output status.
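
Both ways of requesting a status dump can be scripted. The sketch below assumes you already know fio's process ID (how you obtain it is up to you) or that fio is watching the default ``/tmp`` location; `request_status_dump` is a hypothetical helper, and `status_dir` exists only so the file-based path can be tried outside ``/tmp``.

```python
import os
import signal
from pathlib import Path
from typing import Optional


def request_status_dump(fio_pid: Optional[int] = None,
                        status_dir: str = "/tmp") -> None:
    """Ask a running fio to print its current status without stopping it."""
    if fio_pid is not None:
        # Send SIGUSR1 to the fio process (POSIX only).
        os.kill(fio_pid, signal.SIGUSR1)
    else:
        # Create fio-dump-status; fio unlinks it once it has dumped the status.
        Path(status_dir, "fio-dump-status").touch()
```
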
-7.0 Terse output
-----------------
+Terse output
+------------
-For scripted usage where you typically want to generate tables or graphs
-of the results, fio can output the results in a semicolon separated format.
-The format is one long line of values, such as:
+For scripted usage where you typically want to generate tables or graphs of the
+results, fio can output the results in a semicolon-separated format. The format
+is one long line of values, such as::
-2;card0;0;0;7139336;121836;60004;1;10109;27.932460;116.933948;220;126861;3495.446807;1085.368601;226;126864;3523.635629;1089.012448;24063;99944;50.275485%;59818.274627;5540.657370;7155060;122104;60004;1;8338;29.086342;117.839068;388;128077;5032.488518;1234.785715;391;128085;5061.839412;1236.909129;23436;100928;50.287926%;59964.832030;5644.844189;14.595833%;19.394167%;123706;0;7313;0.1%;0.1%;0.1%;0.1%;0.1%;0.1%;100.0%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.01%;0.02%;0.05%;0.16%;6.04%;40.40%;52.68%;0.64%;0.01%;0.00%;0.01%;0.00%;0.00%;0.00%;0.00%;0.00%
-A description of this job goes here.
+ 2;card0;0;0;7139336;121836;60004;1;10109;27.932460;116.933948;220;126861;3495.446807;1085.368601;226;126864;3523.635629;1089.012448;24063;99944;50.275485%;59818.274627;5540.657370;7155060;122104;60004;1;8338;29.086342;117.839068;388;128077;5032.488518;1234.785715;391;128085;5061.839412;1236.909129;23436;100928;50.287926%;59964.832030;5644.844189;14.595833%;19.394167%;123706;0;7313;0.1%;0.1%;0.1%;0.1%;0.1%;0.1%;100.0%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.01%;0.02%;0.05%;0.16%;6.04%;40.40%;52.68%;0.64%;0.01%;0.00%;0.01%;0.00%;0.00%;0.00%;0.00%;0.00%
+ A description of this job goes here.
The job description (if provided) follows on a second line.
-To enable terse output, use the --minimal command line option. The first
-value is the version of the terse output format. If the output has to
-be changed for some reason, this number will be incremented by 1 to
-signify that change.
+To enable terse output, use the :option:`--minimal` command line option. The
+first value is the version of the terse output format. If the output has to be
+changed for some reason, this number will be incremented by 1 to signify that
+change.
Split up, the format is as follows:
- terse version, fio version, jobname, groupid, error
- READ status:
- Total IO (KB), bandwidth (KB/sec), IOPS, runtime (msec)
- Submission latency: min, max, mean, deviation (usec)
- Completion latency: min, max, mean, deviation (usec)
- Completion latency percentiles: 20 fields (see below)
- Total latency: min, max, mean, deviation (usec)
- Bw (KB/s): min, max, aggregate percentage of total, mean, deviation
- WRITE status:
- Total IO (KB), bandwidth (KB/sec), IOPS, runtime (msec)
- Submission latency: min, max, mean, deviation (usec)
- Completion latency: min, max, mean, deviation (usec)
- Completion latency percentiles: 20 fields (see below)
- Total latency: min, max, mean, deviation (usec)
- Bw (KB/s): min, max, aggregate percentage of total, mean, deviation
- CPU usage: user, system, context switches, major faults, minor faults
- IO depths: <=1, 2, 4, 8, 16, 32, >=64
- IO latencies microseconds: <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000
- IO latencies milliseconds: <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000, 2000, >=2000
- Disk utilization: Disk name, Read ios, write ios,
- Read merges, write merges,
- Read ticks, write ticks,
- Time spent in queue, disk utilization percentage
- Additional Info (dependent on continue_on_error, default off): total # errors, first error code
-
- Additional Info (dependent on description being set): Text description
-
-Completion latency percentiles can be a grouping of up to 20 sets, so
-for the terse output fio writes all of them. Each field will look like this:
+ ::
+
+ terse version, fio version, jobname, groupid, error
+
+ READ status::
+
+ Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
+ Submission latency: min, max, mean, stdev (usec)
+ Completion latency: min, max, mean, stdev (usec)
+ Completion latency percentiles: 20 fields (see below)
+ Total latency: min, max, mean, stdev (usec)
+ Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev
+
+ WRITE status::
+
+ Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
+ Submission latency: min, max, mean, stdev (usec)
+ Completion latency: min, max, mean, stdev (usec)
+ Completion latency percentiles: 20 fields (see below)
+ Total latency: min, max, mean, stdev (usec)
+ Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev
+
+ CPU usage::
+
+ user, system, context switches, major faults, minor faults
+
+ I/O depths::
+
+ <=1, 2, 4, 8, 16, 32, >=64
+
+ I/O latencies microseconds::
+
+ <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000
+
+ I/O latencies milliseconds::
+
+ <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000, 2000, >=2000
+
+ Disk utilization::
+
+ Disk name, Read ios, write ios,
+ Read merges, write merges,
+ Read ticks, write ticks,
+ Time spent in queue, disk utilization percentage
+
+ Additional Info (dependent on continue_on_error, default off)::
+
+ total # errors, first error code
+
+ Additional Info (dependent on description being set)::
+
+ Text description
+
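A minimal sketch of reading the leading fields of a terse line, assuming the field order listed above (terse version, fio version, jobname, groupid, error, then the READ block starting with total I/O, bandwidth, IOPS and runtime) and assuming that this layout is terse version 3. The sample line shown earlier starts with version 2 and does not appear to carry the fio version field, which is why the sketch checks the first value before trusting any position. `TERSE_HEADER`, `parse_terse_header` and the input values are hypothetical.

```python
from typing import Dict

# Assumed names for the first fields, in the order listed in the
# "Split up" block above (terse version, fio version, jobname, ...).
TERSE_HEADER = [
    "terse_version", "fio_version", "jobname", "groupid", "error",
    "read_io_kib", "read_bw_kibps", "read_iops", "read_runtime_msec",
]


def parse_terse_header(line: str, expected_version: str = "3") -> Dict[str, str]:
    """Map the leading terse fields to names; the rest of the line is ignored."""
    fields = line.strip().split(";")
    # Assumption: the layout above is terse version 3. Refuse anything else,
    # because a different version may order the fields differently.
    if fields[0] != expected_version:
        raise ValueError(f"unexpected terse version {fields[0]!r}")
    return dict(zip(TERSE_HEADER, fields))


# Made-up values, only to show the call:
header = parse_terse_header("3;fio-x.y;job1;0;0;1024;512;128;2000")
print(header["jobname"], header["read_iops"])  # job1 128
```

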
+Completion latency percentiles can be a grouping of up to 20 sets, so for the
+terse output fio writes all of them. Each field will look like this::
1.00%=6112
-which is the Xth percentile, and the usec latency associated with it.
+which is the Xth percentile, and the `usec` latency associated with it.
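
Each percentile field splits cleanly into the percentile and its latency. A small sketch, assuming the field is exactly of the form shown (a percentage, ``%``, ``=``, then integer microseconds); `parse_percentile_field` is a hypothetical name.

```python
from typing import Tuple


def parse_percentile_field(field: str) -> Tuple[float, int]:
    """Split a terse percentile field such as '1.00%=6112'."""
    pct, _, usec = field.partition("=")
    return float(pct.rstrip("%")), int(usec)


print(parse_percentile_field("1.00%=6112"))  # (1.0, 6112)
```
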
+
+For disk utilization, all disks used by fio are shown, so there will be one disk
+utilization section for each disk.
-For disk utilization, all disks used by fio are shown. So for each disk
-there will be a disk utilization section.
+Trace file format
+-----------------
-8.0 Trace file format
----------------------
-There are two trace file format that you can encounter. The older (v1) format
-is unsupported since version 1.20-rc3 (March 2008). It will still be described
+There are two trace file formats that you can encounter. The older (v1) format
+has been unsupported since version 1.20-rc3 (March 2008). It will still be described
below in case that you get an old trace and want to understand it.
In any case the trace is a simple text file with a single action per line.
-8.1 Trace file format v1
-------------------------
-Each line represents a single io action in the following format:
+Trace file format v1
+~~~~~~~~~~~~~~~~~~~~
+
+Each line represents a single I/O action in the following format::
-rw, offset, length
+ rw, offset, length
-where rw=0/1 for read/write, and the offset and length entries being in bytes.
+where `rw` is 0 for read or 1 for write, and the offset and length entries are
+in bytes.
-This format is not supported in Fio versions => 1.20-rc3.
+This format is not supported in fio versions >= 1.20-rc3.
-8.2 Trace file format v2
-------------------------
-The second version of the trace file format was added in Fio version 1.17.
-It allows to access more then one file per trace and has a bigger set of
-possible file actions.
+Trace file format v2
+~~~~~~~~~~~~~~~~~~~~
-The first line of the trace file has to be:
+The second version of the trace file format was added in fio version 1.17. It
+allows access to more than one file per trace and has a bigger set of possible
+file actions.
-fio version 2 iolog
+The first line of the trace file has to be::
+
+ fio version 2 iolog
Following this can be lines in two different formats, which are described below.
-The file management format:
+The file management format::
-filename action
+ filename action
The filename is given as an absolute path. The action can be one of these:
-add Add the given filename to the trace
-open Open the file with the given filename. The filename has to have
- been added with the add action before.
-close Close the file with the given filename. The file has to have been
- opened before.
+**add**
+ Add the given filename to the trace.
+**open**
+ Open the file with the given filename. The filename has to have
+ been added with the **add** action before.
+**close**
+ Close the file with the given filename. The file has to have been
+ opened before.
+
+
+The file I/O action format::
+
+ filename action offset length
+
+The `filename` is given as an absolute path, and has to have been added and
+opened before it can be used with this format. The `offset` and `length` are
+given in bytes. The `action` can be one of these:
+
+**wait**
+ Wait for `offset` microseconds. Everything below 100 is discarded.
+ The time is relative to the previous `wait` statement.
+**read**
+ Read `length` bytes beginning from `offset`.
+**write**
+ Write `length` bytes beginning from `offset`.
+**sync**
+ :manpage:`fsync(2)` the file.
+**datasync**
+ :manpage:`fdatasync(2)` the file.
+**trim**
+ Trim the given file from the given `offset` for `length` bytes.
+
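Because the v2 format is line-oriented, a checker for hand-written traces is short. The sketch below assumes whitespace-separated fields and enforces only the rules stated above: the header line, **add** before **open**, **open** before **close**, and I/O actions only on opened files. Skipping blank lines and rejecting I/O after **close** are assumptions, and `parse_iolog_v2` is a hypothetical name, not part of fio.

```python
from typing import List, Tuple

FILE_ACTIONS = {"add", "open", "close"}
IO_ACTIONS = {"wait", "read", "write", "sync", "datasync", "trim"}


def parse_iolog_v2(text: str) -> List[Tuple]:
    """Return (file, action) and (file, action, offset, length) tuples."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "fio version 2 iolog":
        raise ValueError("missing 'fio version 2 iolog' header")
    added, opened = set(), set()
    events: List[Tuple] = []
    for lineno, raw in enumerate(lines[1:], start=2):
        parts = raw.split()
        if not parts:
            continue  # assumption: blank lines are tolerated
        if len(parts) == 2 and parts[1] in FILE_ACTIONS:
            name, action = parts
            if action == "add":
                added.add(name)
            elif action == "open":
                if name not in added:
                    raise ValueError(f"line {lineno}: open before add: {name}")
                opened.add(name)
            else:  # close
                if name not in opened:
                    raise ValueError(f"line {lineno}: close before open: {name}")
                opened.discard(name)
            events.append((name, action))
        elif len(parts) == 4 and parts[1] in IO_ACTIONS:
            name, action = parts[0], parts[1]
            if name not in opened:
                raise ValueError(f"line {lineno}: {action} on unopened {name}")
            # For "wait", the offset field holds microseconds.
            events.append((name, action, int(parts[2]), int(parts[3])))
        else:
            raise ValueError(f"line {lineno}: unrecognised entry {raw!r}")
    return events
```

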
+CPU idleness profiling
+----------------------
+
+In some cases, we want to understand CPU overhead in a test. For example, we
+may test patches specifically to see whether they reduce CPU usage. Fio
+implements a balloon approach: it creates a thread per CPU that runs at idle
+priority, meaning that it only runs when nobody else needs the CPU. By
+measuring the amount of work completed by the thread, idleness of each CPU can
+be derived accordingly.
+
+A unit of work is defined as touching a full page of unsigned characters. The
+mean and standard deviation of the time to complete a unit of work are reported
+in the "unit work" section. Options can be chosen to report detailed per-CPU
+idleness or overall system idleness by aggregating per-CPU stats.
+
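The arithmetic behind the balloon approach can be sketched as follows. This is not fio's implementation: it assumes a 4096-byte page, uses a Python loop as a stand-in for the unit of work, and assumes idleness is estimated as the time the idle-priority thread spent doing units of work divided by the elapsed time. Python timing is far noisier than fio's C code, so only the shape of the calculation carries over. All names are hypothetical.

```python
import statistics
import time
from typing import Tuple

PAGE_SIZE = 4096  # assumption: a 4 KiB page


def unit_work(page: bytearray) -> None:
    """Touch every byte of one page (a stand-in for fio's unit of work)."""
    for i in range(len(page)):
        page[i] = i & 0xFF


def calibrate(samples: int = 50) -> Tuple[float, float]:
    """Return mean and standard deviation (seconds) of one unit of work."""
    page = bytearray(PAGE_SIZE)
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        unit_work(page)
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)


def idleness(units_done: int, mean_unit_time: float, elapsed: float) -> float:
    """Estimated fraction of `elapsed` during which the CPU was otherwise idle."""
    # Time the idle-priority thread spent working, capped at 100%.
    return min(1.0, units_done * mean_unit_time / elapsed)


# E.g. 500 units of 2 us each during a 2000 us interval gives 0.5 (50% idle).
print(idleness(500, 2.0, 2000.0))
```

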
+
+Verification and triggers
+-------------------------
+When data verification is done, fio is usually run in one of two ways. The first
+is a normal write job of some sort with verify enabled. When the write phase has
+completed, fio switches to reads and verifies everything it wrote. The second
+model is running just the write phase, and then later on running the same job
+(but with reads instead of writes) to repeat the same I/O patterns and verify
+the contents. Both of these methods depend on the write phase being completed,
+as fio otherwise has no idea how much data was written.
-The file io action format:
+With verification triggers, fio supports dumping the current write state to
+local files. Then a subsequent read verify workload can load this state and know
+exactly where to stop. This is useful for testing cases where power is cut to a
+server in a managed fashion, for instance.
-filename action offset length
+A verification trigger consists of two things:
-The filename is given as an absolute path, and has to have been added and opened
-before it can be used with this format. The offset and length are given in
-bytes. The action can be one of these:
+1) Storing the write state of each job.
+2) Executing a trigger command.
-wait Wait for 'offset' microseconds. Everything below 100 is discarded.
-read Read 'length' bytes beginning from 'offset'
-write Write 'length' bytes beginning from 'offset'
-sync fsync() the file
-datasync fdatasync() the file
-trim trim the given file from the given 'offset' for 'length' bytes
+The write state is relatively small, on the order of hundreds of bytes to single
+kilobytes. It contains information on the number of completions done, the last X
+completions, etc.
+A trigger is invoked either through creation ('touch') of a specified file in
+the system, or through a timeout setting. If fio is run with
+:option:`--trigger-file` set to :file:`/tmp/trigger-file`, then it will
+continually check for the existence of :file:`/tmp/trigger-file`. When it sees
+this file, it will fire off the trigger (thus saving state, and executing the
+trigger command).
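
The file-based trigger amounts to a polling loop: wait until the file exists, then fire. The sketch below models that described behavior only and is not fio's code; `wait_for_trigger`, its callback and the poll interval are hypothetical. From another shell, firing the trigger is then just ``touch /tmp/trigger-file``.

```python
import os
import time
from typing import Callable, Optional


def wait_for_trigger(path: str, fire: Callable[[], None],
                     poll_interval: float = 0.1,
                     timeout: Optional[float] = None) -> bool:
    """Poll for `path`; call `fire` once it exists. Return False on timeout."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while not os.path.exists(path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    # In fio, firing means saving the write state and running the command.
    fire()
    return True
```
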
-9.0 CPU idleness profiling
---------------------------
-In some cases, we want to understand CPU overhead in a test. For example,
-we test patches for the specific goodness of whether they reduce CPU usage.
-fio implements a balloon approach to create a thread per CPU that runs at
-idle priority, meaning that it only runs when nobody else needs the cpu.
-By measuring the amount of work completed by the thread, idleness of each
-CPU can be derived accordingly.
+For client/server runs, there's both a local and remote trigger. If fio is
+running as a server backend, it will send the job states back to the client for
+safe storage, then execute the remote trigger, if specified. If a local trigger
+is specified, the server will still send back the write state, but the client
+will then execute the trigger.
-An unit work is defined as touching a full page of unsigned characters. Mean
-and standard deviation of time to complete an unit work is reported in "unit
-work" section. Options can be chosen to report detailed percpu idleness or
-overall system idleness by aggregating percpu stats.
+Verification trigger example
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Let's say we want to run a powercut test on the remote machine 'server'. Our
+write workload is in :file:`write-test.fio`. We want to cut power to 'server' at
+some point during the run, and we'll run this test from the safety of our local
+machine, 'localbox'. On the server, we'll start the fio backend normally::
-10.0 Verification and triggers
-------------------------------
-Fio is usually run in one of two ways, when data verification is done. The
-first is a normal write job of some sort with verify enabled. When the
-write phase has completed, fio switches to reads and verifies everything
-it wrote. The second model is running just the write phase, and then later
-on running the same job (but with reads instead of writes) to repeat the
-same IO patterns and verify the contents. Both of these methods depend
-on the write phase being completed, as fio otherwise has no idea how much
-data was written.
+ server# fio --server
-With verification triggers, fio supports dumping the current write state
-to local files. Then a subsequent read verify workload can load this state
-and know exactly where to stop. This is useful for testing cases where
-power is cut to a server in a managed fashion, for instance.
+and on the client, we'll fire off the workload::
-A verification trigger consists of two things:
+ localbox$ fio --client=server --trigger-file=/tmp/my-trigger --trigger-remote="bash -c \"echo b > /proc/sysrq-trigger\""
-1) Storing the write state of each job
-2) Executing a trigger command
+We set :file:`/tmp/my-trigger` as the trigger file, and we tell fio to execute::
-The write state is relatively small, on the order of hundreds of bytes
-to single kilobytes. It contains information on the number of completions
-done, the last X completions, etc.
+ echo b > /proc/sysrq-trigger
-A trigger is invoked either through creation ('touch') of a specified
-file in the system, or through a timeout setting. If fio is run with
---trigger-file=/tmp/trigger-file, then it will continually check for
-the existence of /tmp/trigger-file. When it sees this file, it will
-fire off the trigger (thus saving state, and executing the trigger
-command).
+on the server once it has received the trigger and sent us the write state. This
+will work, but it's not **really** cutting power to the server, it's merely
+abruptly rebooting it. If we have a remote way of cutting power to the server
+through IPMI or similar, we could do that through a local trigger command
+instead. Let's assume we have a script, ``ipmi-reboot``, that performs an IPMI
+reboot of a given hostname. On localbox, we could then have run fio with a
+local trigger instead::
-For client/server runs, there's both a local and remote trigger. If
-fio is running as a server backend, it will send the job states back
-to the client for safe storage, then execute the remote trigger, if
-specified. If a local trigger is specified, the server will still send
-back the write state, but the client will then execute the trigger.
+ localbox$ fio --client=server --trigger-file=/tmp/my-trigger --trigger="ipmi-reboot server"
-10.1 Verification trigger example
----------------------------------
-Lets say we want to run a powercut test on the remote machine 'server'.
-Our write workload is in write-test.fio. We want to cut power to 'server'
-at some point during the run, and we'll run this test from the safety
-or our local machine, 'localbox'. On the server, we'll start the fio
-backend normally:
+For this case, fio would wait for the server to send us the write state, then
+execute ``ipmi-reboot server`` when that happened.
-server# fio --server
+Loading verify state
+~~~~~~~~~~~~~~~~~~~~
-and on the client, we'll fire off the workload:
+To load a stored write state, a read verification job file must contain the
+:option:`verify_state_load` option. If that is set, fio will load the previously
+stored state. For a local fio run this is done by loading the files directly,
+and on a client/server run, the server backend will ask the client to send the
+files over and load them from there.
-localbox$ fio --client=server --trigger-file=/tmp/my-trigger --trigger-remote="bash -c \"echo b > /proc/sysrq-triger\""
-We set /tmp/my-trigger as the trigger file, and we tell fio to execute
+Log File Formats
+----------------
-echo b > /proc/sysrq-trigger
+Fio supports a variety of log file formats, for logging latencies, bandwidth,
+and IOPS. The logs share a common format, which looks like this:
-on the server once it has received the trigger and sent us the write
-state. This will work, but it's not _really_ cutting power to the server,
-it's merely abruptly rebooting it. If we have a remote way of cutting
-power to the server through IPMI or similar, we could do that through
-a local trigger command instead. Lets assume we have a script that does
-IPMI reboot of a given hostname, ipmi-reboot. On localbox, we could
-then have run fio with a local trigger instead:
+ *time* (`msec`), *value*, *data direction*, *offset*
-localbox$ fio --client=server --trigger-file=/tmp/my-trigger --trigger="ipmi-reboot server"
+Time for the log entry is always in milliseconds. The *value* logged depends
+on the type of log; it will be one of the following:
-For this case, fio would wait for the server to send us the write state,
-then execute 'ipmi-reboot server' when that happened.
+ **Latency log**
+ Value is latency in usecs
+ **Bandwidth log**
+ Value is in KiB/sec
+ **IOPS log**
+ Value is IOPS
-10.1 Loading verify state
--------------------------
-To load store write state, read verification job file must contain
-the verify_state_load option. If that is set, fio will load the previously
-stored state. For a local fio run this is done by loading the files directly,
-and on a client/server run, the server backend will ask the client to send
-the files over and load them from there.
+*Data direction* is one of the following:
+
+ **0**
+ I/O is a READ
+ **1**
+ I/O is a WRITE
+ **2**
+ I/O is a TRIM
+
+The *offset* is the offset, in bytes, from the start of the file, for that
+particular I/O. The logging of the offset can be toggled with
+:option:`log_offset`.
+
+If windowed logging is enabled through :option:`log_avg_msec` then fio doesn't
+log individual I/Os. Instead, it logs the average values over the specified
+period of time. Since *data direction* and *offset* are per-I/O values, they
+aren't applicable if windowed logging is enabled. If windowed logging is enabled
+and :option:`log_max_value` is set, then fio logs maximum values in that window
+instead of averages.
+
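A minimal parser for the common log format above, assuming exactly the column order documented here (*time*, *value*, *data direction*, *offset*) and that the last two columns may be missing, as with windowed logging. The example averages the logged value per data direction; the helper names and sample lines are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

DDIR_NAMES = {0: "read", 1: "write", 2: "trim"}


def parse_log_line(line: str) -> Tuple[int, float, Optional[int], Optional[int]]:
    """Parse 'time, value, data direction, offset' (the last two may be absent)."""
    fields = [f.strip() for f in line.split(",")]
    time_ms, value = int(fields[0]), float(fields[1])
    ddir = int(fields[2]) if len(fields) > 2 else None
    offset = int(fields[3]) if len(fields) > 3 else None
    return time_ms, value, ddir, offset


def mean_by_direction(lines: Iterable[str]) -> Dict[str, float]:
    """Average the logged value separately for reads, writes and trims."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue
        _, value, ddir, _ = parse_log_line(line)
        name = DDIR_NAMES.get(ddir, "unknown")  # no direction column -> unknown
        sums[name] += value
        counts[name] += 1
    return {name: sums[name] / counts[name] for name in sums}


print(mean_by_direction(["10, 250, 0, 0", "20, 350, 0, 4096", "30, 1000, 1, 8192"]))
```

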
+
+Client/server
+-------------
+
+Normally fio is invoked as a stand-alone application on the machine where the
+I/O workload should be generated. However, the frontend and backend of fio can
+be run separately. That is, the fio server can generate an I/O workload on the
+"Device Under Test" while being controlled from another machine.
+
+Start the server on the machine which has access to the storage DUT::
+
+ fio --server=args
+
+where *args* defines what fio listens on. The arguments are of the form
+``type:hostname,port``, where parts may be omitted as the examples below show.
+*type* is either ``ip`` (or ``ip4``) for TCP/IP v4, ``ip6`` for TCP/IP v6, or
+``sock`` for a local unix domain socket. *hostname* is either a hostname or IP
+address, and *port* is the port to listen on (only valid for TCP/IP, not a
+local socket). Some examples:
+
+1) ``fio --server``
+
+ Start a fio server, listening on all interfaces on the default port (8765).
+
+2) ``fio --server=ip:hostname,4444``
+
+ Start a fio server, listening on the IP address of hostname and on port 4444.
+
+3) ``fio --server=ip6:::1,4444``
+
+ Start a fio server, listening on IPv6 localhost ::1 and on port 4444.
+
+4) ``fio --server=,4444``
+
+ Start a fio server, listening on all interfaces on port 4444.
+
+5) ``fio --server=1.2.3.4``
+
+ Start a fio server, listening on IP 1.2.3.4 on the default port.
+
+6) ``fio --server=sock:/tmp/fio.sock``
+
+ Start a fio server, listening on the local socket /tmp/fio.sock.
+
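To make the accepted forms concrete, here is a sketch of a parser that accepts the six examples above, with an empty string standing for the bare ``fio --server`` form. It is not fio's parser: reading the argument as ``type:hostname,port``, treating ``ip4`` as a spelling of ``ip``, and mapping an empty hostname to "all interfaces" are inferences from the examples, and `parse_server_arg` and `ServerSpec` are hypothetical names.

```python
from typing import NamedTuple, Optional

DEFAULT_PORT = 8765  # the default port mentioned in example 1


class ServerSpec(NamedTuple):
    kind: str             # "ip", "ip6" or "sock"
    host: Optional[str]   # None means "all interfaces"
    port: Optional[int]   # None for a unix domain socket


def parse_server_arg(arg: str) -> ServerSpec:
    """Parse a --server value of the assumed form type:hostname,port."""
    kind = "ip"
    for prefix in ("ip4:", "ip6:", "ip:", "sock:"):
        if arg.startswith(prefix):
            kind, arg = prefix[:-1], arg[len(prefix):]
            break
    if kind == "ip4":
        kind = "ip"  # ip4 is documented as an alternative spelling of ip
    if kind == "sock":
        return ServerSpec("sock", arg, None)  # the "hostname" is a socket path
    # Split on the last comma so the colons of an IPv6 address are left alone.
    host, sep, port = arg.rpartition(",")
    if not sep:
        host, port = arg, ""  # no ",port" part: everything is the host
    return ServerSpec(kind, host or None, int(port) if port else DEFAULT_PORT)
```

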
+Once a server is running, a "client" can connect to the fio server with::
+
+ fio <local-args> --client=<server> <remote-args> <job file(s)>
+
+where `local-args` are arguments for the client where it is running, `server`
+is the connect string, and `remote-args` and `job file(s)` are sent to the
+server. The `server` string follows the same format as it does on the server
+side, to allow IP/hostname/socket and port strings.
+
+Fio can connect to multiple servers this way::
+
+ fio --client=<server1> <job file(s)> --client=<server2> <job file(s)>
+
+If the job file is located on the fio server, then you can tell the server to
+load a local file as well. This is done by using :option:`--remote-config` ::
+
+ fio --client=server --remote-config /path/to/file.fio
+
+Then fio will open this local (to the server) job file instead of being passed
+one from the client.
+
+If you have many servers (e.g., 100 VMs/containers), you can input a pathname
+of a file containing host IPs/names as the parameter value for the
+:option:`--client` option. For example, here is a :file:`host.list` file
+containing 2 hostnames::
+
+ host1.your.dns.domain
+ host2.your.dns.domain
+
+The fio command would then be::
+
+ fio --client=host.list <job file(s)>
+
+In this mode, you cannot input server-specific parameters or job files -- all
+servers receive the same job file.
+
+In order to let ``fio --client`` runs use a shared filesystem from multiple
+hosts, ``fio --client`` now prepends the IP address of the server to the
+filename. For example, if fio is using directory :file:`/mnt/nfs/fio` and is
+writing filename :file:`fileio.tmp`, with a :option:`--client` `hostfile`
+containing two hostnames ``h1`` and ``h2`` with IP addresses 192.168.10.120 and
+192.168.10.121, then fio will create two files::
+
+ /mnt/nfs/fio/192.168.10.120.fileio.tmp
+ /mnt/nfs/fio/192.168.10.121.fileio.tmp
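
The naming rule above can be reproduced to predict which files will appear on the shared filesystem. A minimal sketch: it assumes one host per line in the host list (skipping blank lines is an assumption) and takes the servers' IP addresses as input rather than resolving hostnames. `read_host_list` and `per_server_filenames` are hypothetical helpers.

```python
import posixpath
from typing import List


def read_host_list(text: str) -> List[str]:
    """One host per line, as in host.list above; blank lines are skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def per_server_filenames(directory: str, filename: str,
                         server_ips: List[str]) -> List[str]:
    """Prefix the job's filename with each server's IP address."""
    return [posixpath.join(directory, f"{ip}.{filename}") for ip in server_ips]


print(per_server_filenames("/mnt/nfs/fio", "fileio.tmp",
                           ["192.168.10.120", "192.168.10.121"]))
```
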