How fio works
-------------

The first step in getting fio to simulate a desired I/O workload is writing a
job file describing that specific setup. A job file may contain any number of
threads and/or files. Typically it consists of a *global* section defining
shared parameters, and one or more job sections describing the jobs involved.
When run, fio parses this file and sets everything up as described. If we
break down a job from top to bottom, it contains the following basic
parameters:

`I/O type`_

    Defines the I/O pattern issued to the file(s). We may only be reading
    sequentially from the file(s), or we may be writing randomly. Or even
    mixing reads and writes, sequentially or randomly.
    Should we be doing buffered I/O, or direct/raw I/O?

`Block size`_

    In how large chunks are we issuing I/O? This may be a single value,
    or it may describe a range of block sizes.

`I/O size`_

    How much data are we going to be reading/writing?

`I/O engine`_

    How do we issue I/O? We could be memory mapping the file, we could be
    using regular read/write, we could be using splice, async I/O, or even
    SG (SCSI generic sg).

`I/O depth`_

    If the I/O engine is async, how large a queuing depth do we want to
    maintain?

`Target file/device`_

    How many files are we spreading the workload over?

`Threads, processes and job synchronization`_

    How many threads or processes should we spread this workload over?

The above are the basic parameters defined for a workload. In addition, there is
a multitude of parameters that modify other aspects of how this job behaves.


Command line options
--------------------

.. option:: --debug=type

    Enable verbose tracing of various fio actions. May be ``all`` for all types
    or individual types separated by a comma (e.g. ``--debug=file,mem`` will
    enable file and memory debugging). Currently, additional logging is
    available for:

    *process*
        Dump info related to processes.
    *file*
        Dump info related to file actions.
    *io*
        Dump info related to I/O queuing.
    *mem*
        Dump info related to memory allocations.
    *blktrace*
        Dump info related to blktrace setup.
    *verify*
        Dump info related to I/O verification.
    *all*
        Enable all debug options.
    *random*
        Dump info related to random offset generation.
    *parse*
        Dump info related to option matching and parsing.
    *diskutil*
        Dump info related to disk utilization updates.
    *job:x*
        Dump info only related to job number x.
    *mutex*
        Dump info only related to mutex up/down ops.
    *profile*
        Dump info related to profile extensions.
    *time*
        Dump info related to internal time keeping.
    *net*
        Dump info related to networking connections.
    *rate*
        Dump info related to I/O rate switching.
    *compress*
        Dump info related to log compress/decompress.
    *?* or *help*
        Show available debug options.

.. option:: --parse-only

    Parse options only, don't start any I/O.

.. option:: --output=filename

    Write output to file `filename`.

.. option:: --bandwidth-log

    Generate aggregate bandwidth logs.

.. option:: --minimal

    Print statistics in a terse, semicolon-delimited format.

.. option:: --append-terse

    Print statistics in selected mode AND terse, semicolon-delimited format.
    **deprecated**, use :option:`--output-format` instead to select multiple
    formats.

.. option:: --output-format=type

    Set the reporting format to `normal`, `terse`, `json`, or `json+`. Multiple
    formats can be selected, separated by a comma. `terse` is a CSV based
    format. `json+` is like `json`, except it adds a full dump of the latency
    buckets.

.. option:: --terse-version=type

    Set terse version output format (default 3, or 2 or 4 or 5).

.. option:: --version

    Print version info and exit.

.. option:: --help

    Print a summary of the command line options and exit.

.. option:: --cpuclock-test

    Perform test and validation of internal CPU clock.

.. option:: --crctest=[test]

    Test the speed of the built-in checksumming functions. If no argument is
    given, all of them are tested. Alternatively, a comma separated list can be
    passed, in which case the given ones are tested.

.. option:: --cmdhelp=command

    Print help information for `command`. May be ``all`` for all commands.

.. option:: --enghelp=[ioengine[,command]]

    List all commands defined by :option:`ioengine`, or print help for `command`
    defined by :option:`ioengine`. If no :option:`ioengine` is given, list all
    available ioengines.

.. option:: --showcmd=jobfile

    Turn a job file into command line options.

.. option:: --readonly

    Turn on safety read-only checks, preventing writes. The ``--readonly``
    option is an extra safety guard to prevent users from accidentally starting
    a write workload when that is not desired. Fio will only write if
    `rw=write/randwrite/rw/randrw` is given. This extra safety net can be used
    as an extra precaution as ``--readonly`` will also enable a write check in
    the I/O engine core to prevent writes due to unknown user space bug(s).

.. option:: --eta=when

    Specifies when the real-time ETA estimate should be printed. May be
    `always`, `never` or `auto`.

.. option:: --eta-newline=time

    Force a new line for every `time` period passed. When the unit is omitted,
    the value is interpreted in seconds.

.. option:: --status-interval=time

    Force full status dump every `time` period passed. When the unit is
    omitted, the value is interpreted in seconds.

.. option:: --section=name

    Only run specified section in job file. Multiple sections can be specified.
    The ``--section`` option allows one to combine related jobs into one file.
    E.g. one job file could define light, moderate, and heavy sections. Tell
    fio to run only the "heavy" section by giving the ``--section=heavy``
    command line option. One can also specify the "write" operations in one
    section and the "verify" operation in another section. The ``--section``
    option only applies to job sections. The reserved *global* section is
    always parsed and used.

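    As a minimal sketch of the light/moderate/heavy idea, a job file could look
    like the one below. The file name :file:`sections.fio`, the section names
    and all values are hypothetical and only serve as an illustration:

    .. code-block:: ini

        ; -- start job file sections.fio --
        [global]
        rw=randread
        size=128m

        [light]
        numjobs=1

        [moderate]
        numjobs=4

        [heavy]
        numjobs=16
        ; -- end job file sections.fio --

    Running ``fio --section=heavy sections.fio`` would then start only the
    *heavy* job, while the settings from the *global* section still apply.
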
.. option:: --alloc-size=kb

    Set the internal smalloc pool to this size in KiB. The
    ``--alloc-size`` switch allows one to use a larger pool size for smalloc.
    If running large jobs with randommap enabled, fio can run out of memory.
    Smalloc is an internal allocator for shared structures from a fixed size
    memory pool and can grow to 16 pools. The pool size defaults to 16MiB.

    NOTE: While running, :file:`.fio_smalloc.*` backing store files are visible
    in :file:`/tmp`.

.. option:: --warnings-fatal

    All fio parser warnings are fatal, causing fio to exit with an
    error.

.. option:: --max-jobs=nr

    Maximum number of threads/processes to support.

.. option:: --server=args

    Start a backend server, with `args` specifying what to listen to.
    See `Client/Server`_ section.

.. option:: --daemonize=pidfile

    Background a fio server, writing the pid to the given `pidfile` file.

.. option:: --client=hostname

    Instead of running the jobs locally, send and run them on the given host or
    set of hosts. See `Client/Server`_ section.

.. option:: --remote-config=file

    Tell fio server to load this local file.

.. option:: --idle-prof=option

    Report CPU idleness. *option* is one of the following:

        **calibrate**
            Run unit work calibration only and exit.

        **system**
            Show aggregate system idleness and unit work.

        **percpu**
            As **system** but also show per CPU idleness.

.. option:: --inflate-log=log

    Inflate and output compressed log.

.. option:: --trigger-file=file

    Execute trigger cmd when file exists.

.. option:: --trigger-timeout=t

    Execute trigger at this time.

.. option:: --trigger=cmd

    Set this command as local trigger.

.. option:: --trigger-remote=cmd

    Set this command as remote trigger.

.. option:: --aux-path=path

    Use this path for fio state generated files.

Any parameters following the options will be assumed to be job files, unless
they match a job file parameter. Multiple job files can be listed and each job
file will be regarded as a separate group. Fio will :option:`stonewall`
execution between each group.


Job file format
---------------

As previously described, fio accepts one or more job files describing what it is
supposed to do. The job file format is the classic ini file, where the names
enclosed in [] brackets define the job name. You are free to use any ASCII name
you want, except *global* which has special meaning. Following the job name is
a sequence of zero or more parameters, one per line, that define the behavior of
the job. If the first character in a line is a ';' or a '#', the entire line is
discarded as a comment.

A *global* section sets defaults for the jobs described in that file. A job may
override a *global* section parameter, and a job file may even have several
*global* sections if so desired. A job is only affected by a *global* section
residing above it.

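As a minimal sketch of how several *global* sections interact, consider the
hypothetical job file below. The job names and values are illustrative only, and
the comments assume that settings from an earlier *global* section stay in
effect unless a later section or the job itself overrides them:

.. code-block:: ini

    ; -- start job file --
    [global]
    rw=randread
    size=128m

    ; job1 sits below the first global section only:
    ; it runs with rw=randread and size=128m
    [job1]

    [global]
    rw=randwrite

    ; job2 sits below both global sections, so it would run with
    ; rw=randwrite, and it overrides size for itself
    [job2]
    size=64m
    ; -- end job file --

Comments are kept on their own lines here, since only lines that start with
';' or '#' are described as comments.
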
The :option:`--cmdhelp` option also lists all options. If used with an `option`
argument, :option:`--cmdhelp` will detail the given `option`.

See the `examples/` directory for inspiration on how to write job files. Note
the copyright and license requirements currently apply to `examples/` files.

So let's look at a really simple job file that defines two processes, each
randomly reading from a 128MiB file:

.. code-block:: ini

    ; -- start job file --
    [global]
    rw=randread
    size=128m

    [job1]

    [job2]

    ; -- end job file --

As you can see, the job file sections themselves are empty as all the described
parameters are shared. As no :option:`filename` option is given, fio makes up a
`filename` for each of the jobs as it sees fit. On the command line, this job
would look as follows::

    $ fio --name=global --rw=randread --size=128m --name=job1 --name=job2


Let's look at an example that has a number of processes writing randomly to
files:

.. code-block:: ini

    ; -- start job file --
    [random-writers]
    ioengine=libaio
    iodepth=4
    rw=randwrite
    bs=32k
    direct=0
    size=64m
    numjobs=4
    ; -- end job file --

Here we have no *global* section, as we only have one job defined anyway. We
want to use async I/O here, with a depth of 4 for each file. We also increase
the buffer size used to 32KiB and set numjobs to 4 to fork 4 identical
jobs. The result is 4 processes each randomly writing to their own 64MiB
file. Instead of using the above job file, you could have given the parameters
on the command line. For this case, you would specify::

    $ fio --name=random-writers --ioengine=libaio --iodepth=4 --rw=randwrite --bs=32k --direct=0 --size=64m --numjobs=4

When fio is utilized as a basis of any reasonably large test suite, it might be
desirable to share a set of standardized settings across multiple job files.
Instead of copy/pasting such settings, any section may pull in an external
:file:`filename.fio` file with an *include filename* directive, as in the
following example::

    ; -- start job file including.fio --
    [global]
    filename=/tmp/test
    filesize=1m
    include glob-include.fio

    [test]
    rw=randread
    bs=4k
    time_based=1
    runtime=10
    include test-include.fio
    ; -- end job file including.fio --

.. code-block:: ini

    ; -- start job file glob-include.fio --
    thread=1
    group_reporting=1
    ; -- end job file glob-include.fio --

.. code-block:: ini

    ; -- start job file test-include.fio --
    ioengine=libaio
    iodepth=4
    ; -- end job file test-include.fio --

Settings pulled into a section apply to that section only (except the *global*
section). Include directives may be nested in that any included file may contain
further include directive(s). Include files may not contain [] sections.


Environment variables
~~~~~~~~~~~~~~~~~~~~~

Fio also supports environment variable expansion in job files. Any sub-string of
the form ``${VARNAME}`` as part of an option value (in other words, on the right
of the '='), will be expanded to the value of the environment variable called
`VARNAME`. If no such environment variable is defined, or `VARNAME` is the
empty string, the empty string will be substituted.

As an example, let's look at a sample fio invocation and job file::

    $ SIZE=64m NUMJOBS=4 fio jobfile.fio

.. code-block:: ini

    ; -- start job file --
    [random-writers]
    rw=randwrite
    size=${SIZE}
    numjobs=${NUMJOBS}
    ; -- end job file --

This will expand to the following equivalent job file at runtime:

.. code-block:: ini

    ; -- start job file --
    [random-writers]
    rw=randwrite
    size=64m
    numjobs=4
    ; -- end job file --

Fio ships with a few example job files; you can also look there for inspiration.

Reserved keywords
~~~~~~~~~~~~~~~~~

Additionally, fio has a set of reserved keywords that will be replaced
internally with the appropriate value. Those keywords are:

**$pagesize**

    The architecture page size of the running system.

**$mb_memory**

    Megabytes of total memory in the system.

**$ncpus**

    Number of online available CPUs.

These can be used on the command line or in the job file, and will be
automatically substituted with the current system values when the job is
run. Simple math is also supported on these keywords, so you can perform actions
like::

    size=8*$mb_memory

and get that properly expanded to 8 times the size of memory in the machine.


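The keywords can also be used on their own. The following sketch, with a
hypothetical job name and sizes, assumes that a keyword is accepted wherever a
plain number would be, and scales a read workload to the machine it runs on:

.. code-block:: ini

    ; -- start job file --
    [per-cpu-readers]
    rw=randread
    size=128m
    ; one block per architecture page
    bs=$pagesize
    ; one clone of the job per online CPU
    numjobs=$ncpus
    ; -- end job file --
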
Job file parameters
-------------------

This section describes in detail each parameter associated with a job. Some
parameters take an option of a given type, such as an integer or a
string. Anywhere a numeric value is required, an arithmetic expression may be
used, provided it is surrounded by parentheses. Supported operators are:

    - addition (+)
    - subtraction (-)
    - multiplication (*)
    - division (/)
    - modulus (%)
    - exponentiation (^)

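A short sketch of expressions in a job file follows. The job name and the chosen
values are hypothetical; the comments spell out what each expression should
evaluate to under the rules given in this section:

.. code-block:: ini

    ; -- start job file --
    [expr-demo]
    rw=read
    ; 4 * 1024 = 4096 bytes
    bs=(4*1024)
    ; 2^20 = 1048576 bytes, i.e. 1 MiB
    size=(2^20)
    ; time values inside an expression are taken as microseconds,
    ; so this asks for 10 seconds
    runtime=(10*1000000)
    ; -- end job file --
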
For time values in expressions, units are microseconds by default. This is
different from time values not in expressions (not enclosed in
parentheses). The following types are used:


Parameter types
~~~~~~~~~~~~~~~

**str**
    String. This is a sequence of alpha characters.

**time**
    Integer with possible time suffix. Without a unit, the value is interpreted
    as seconds unless otherwise specified. Accepts a suffix of 'd' for days,
    'h' for hours, 'm' for minutes, 's' for seconds, 'ms' (or 'msec') for
    milliseconds and 'us' (or 'usec') for microseconds. For example, use 10m
    for 10 minutes.

.. _int:

**int**
    Integer. A whole number value, which may contain an integer prefix
    and an integer suffix:

    [*integer prefix*] **number** [*integer suffix*]

    The optional *integer prefix* specifies the number's base. The default
    is decimal. *0x* specifies hexadecimal.

    The optional *integer suffix* specifies the number's units, and includes an
    optional unit prefix and an optional unit. For quantities of data, the
    default unit is bytes. For quantities of time, the default unit is seconds
    unless otherwise specified.

    With :option:`kb_base`\=1000, fio follows international standards for unit
    prefixes. To specify power-of-10 decimal values defined in the
    International System of Units (SI):

        * *K* -- means kilo (K) or 1000
        * *M* -- means mega (M) or 1000**2
        * *G* -- means giga (G) or 1000**3
        * *T* -- means tera (T) or 1000**4
        * *P* -- means peta (P) or 1000**5

    To specify power-of-2 binary values defined in IEC 80000-13:

        * *Ki* -- means kibi (Ki) or 1024
        * *Mi* -- means mebi (Mi) or 1024**2
        * *Gi* -- means gibi (Gi) or 1024**3
        * *Ti* -- means tebi (Ti) or 1024**4
        * *Pi* -- means pebi (Pi) or 1024**5

    With :option:`kb_base`\=1024 (the default), the unit prefixes are opposite
    from those specified in the SI and IEC 80000-13 standards to provide
    compatibility with old scripts. For example, 4k means 4096.

    For quantities of data, an optional unit of 'B' may be included
    (e.g., 'kB' is the same as 'k').

    The *integer suffix* is not case sensitive (e.g., m/mi mean mebi/mega,
    not milli). 'b' and 'B' both mean byte, not bit.

    Examples with :option:`kb_base`\=1000:

        * *4 KiB*: 4096, 4096b, 4096B, 4ki, 4kib, 4kiB, 4Ki, 4KiB
        * *1 MiB*: 1048576, 1mi, 1024ki
        * *1 MB*: 1000000, 1m, 1000k
        * *1 TiB*: 1099511627776, 1ti, 1024gi, 1048576mi
        * *1 TB*: 1000000000000, 1t, 1000g, 1000000m

    Examples with :option:`kb_base`\=1024 (default):

        * *4 KiB*: 4096, 4096b, 4096B, 4k, 4kb, 4kB, 4K, 4KB
        * *1 MiB*: 1048576, 1m, 1024k
        * *1 MB*: 1000000, 1mi, 1000ki
        * *1 TiB*: 1099511627776, 1t, 1024g, 1048576m
        * *1 TB*: 1000000000000, 1ti, 1000gi, 1000000mi

    To specify times (units are not case sensitive):

        * *D* -- means days
        * *H* -- means hours
        * *M* -- means minutes
        * *s* -- or sec means seconds (default)
        * *ms* -- or *msec* means milliseconds
        * *us* -- or *usec* means microseconds

    If the option accepts an upper and lower range, use a colon ':' or
    minus '-' to separate such values. See :ref:`irange <irange>`.
    If the lower value specified happens to be larger than the upper value
    the two values are swapped.

.. _bool:

**bool**
    Boolean. Usually parsed as an integer, however only defined for
    true and false (1 and 0).

.. _irange:

**irange**
    Integer range with suffix. Allows value range to be given, such as
    1024-4096. A colon may also be used as the separator, e.g. 1k:4k. If the
    option allows two sets of ranges, they can be specified with a ',' or '/'
    delimiter: 1k-4k/8k-32k. Also see :ref:`int <int>`.

**float_list**
    A list of floating point numbers, separated by a ':' character.


Units
~~~~~

.. option:: kb_base=int

    Select the interpretation of unit prefixes in input parameters.

    **1000**
        Inputs comply with IEC 80000-13 and the International
        System of Units (SI). Use:

        - power-of-2 values with IEC prefixes (e.g., KiB)
        - power-of-10 values with SI prefixes (e.g., kB)

    **1024**
        Compatibility mode (default). To avoid breaking old scripts:

        - power-of-2 values with SI prefixes
        - power-of-10 values with IEC prefixes

    See :option:`bs` for more details on input parameters.

    Outputs always use correct prefixes. Most outputs include both
    side-by-side, like::

        bw=2383.3kB/s (2327.4KiB/s)

    If only one value is reported, then kb_base selects the one to use:

        **1000** -- SI prefixes

        **1024** -- IEC prefixes

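    As an illustration, the hypothetical job below switches input parsing to
    standards-compliant prefixes. The byte counts in the comments follow the
    ``kb_base=1000`` examples given for the :ref:`int <int>` type; the sketch
    assumes that the setting in the *global* section applies to the values that
    come after it:

    .. code-block:: ini

        ; -- start job file --
        [global]
        kb_base=1000

        [si-units]
        rw=read
        ; IEC prefix: 4 * 1024 = 4096 bytes
        bs=4ki
        ; SI prefix: 100 * 1000**2 = 100000000 bytes
        size=100m
        ; -- end job file --

    With the default ``kb_base=1024`` the same suffixes would be read the other
    way around, so ``size=100m`` would mean 100 * 1024**2 bytes.
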
.. option:: unit_base=int

    Base unit for reporting. Allowed values are:

    **0**
        Use auto-detection (default).
    **8**
        Byte based.
    **1**
        Bit based.


With the above in mind, here follows the complete list of fio job parameters.


Job description
~~~~~~~~~~~~~~~

.. option:: name=str

    ASCII name of the job. This may be used to override the name printed by fio
    for this job. Otherwise the job name is used. On the command line this
    parameter has the special purpose of also signaling the start of a new job.

.. option:: description=str

    Text description of the job. Doesn't do anything except dump this text
    description when this job is run. It's not parsed.

.. option:: loops=int

    Run the specified number of iterations of this job. Used to repeat the same
    workload a given number of times. Defaults to 1.

.. option:: numjobs=int

    Create the specified number of clones of this job. Each clone of the job
    is spawned as an independent thread or process. May be used to set up a
    larger number of threads/processes doing the same thing. Each thread is
    reported separately; to see statistics for all clones as a whole, use
    :option:`group_reporting` in conjunction with :option:`new_group`.
    See :option:`--max-jobs`. Default: 1.


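    A small sketch, with a hypothetical job name and sizes, of a job that is
    cloned four times and asks for one combined report instead of four
    separate ones:

    .. code-block:: ini

        ; -- start job file --
        [clones]
        rw=randread
        size=64m
        ; four identical threads/processes, each with its own 64MiB file
        numjobs=4
        ; report statistics for the clones as a whole
        group_reporting=1
        ; -- end job file --
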
Time related parameters
~~~~~~~~~~~~~~~~~~~~~~~

.. option:: runtime=time

    Tell fio to terminate processing after the specified period of time. It
    can be quite hard to determine for how long a specified job will run, so
    this parameter is handy to cap the total runtime to a given time. When
    the unit is omitted, the value is interpreted in seconds.

.. option:: time_based

    If set, fio will run for the duration of the :option:`runtime` specified
    even if the file(s) are completely read or written. It will simply loop over
    the same workload as many times as the :option:`runtime` allows.

.. option:: startdelay=irange(time)

    Delay the start of job for the specified amount of time. Can be a single
    value or a range. When given as a range, each thread will choose a value
    randomly from within the range. Value is in seconds if a unit is omitted.

.. option:: ramp_time=time

    If set, fio will run the specified workload for this amount of time before
    logging any performance numbers. Useful for letting performance settle
    before logging results, thus minimizing the runtime required for stable
    results. Note that the ``ramp_time`` is considered lead in time for a job,
    thus it will increase the total runtime if a special timeout or
    :option:`runtime` is specified. When the unit is omitted, the value is
    given in seconds.

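The timing options above can be combined. The following is only a sketch with a
hypothetical job name and values, showing a delayed start, a warm-up period and
a fixed measurement window:

.. code-block:: ini

    ; -- start job file --
    [steady-randread]
    rw=randread
    size=128m
    ; each thread waits between 5 and 15 seconds before starting
    startdelay=5-15
    ; run for 10 seconds before any performance numbers are logged
    ramp_time=10
    ; keep looping over the file until the runtime has passed
    time_based=1
    ; one minute, using the 'm' suffix for minutes
    runtime=1m
    ; -- end job file --
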
.. option:: clocksource=str

    Use the given clocksource as the base of timing. The supported options are:

        **gettimeofday**
            :manpage:`gettimeofday(2)`

        **clock_gettime**
            :manpage:`clock_gettime(2)`

        **cpu**
            Internal CPU clock source

    cpu is the preferred clocksource if it is reliable, as it is very fast (and
    fio is heavy on time calls). Fio will automatically use this clocksource if
    it's supported and considered reliable on the system it is running on,
    unless another clocksource is specifically set. For x86/x86-64 CPUs, this
    means supporting TSC Invariant.

.. option:: gtod_reduce=bool

    Enable all of the :manpage:`gettimeofday(2)` reducing options
    (:option:`disable_clat`, :option:`disable_slat`,
    :option:`disable_bw_measurement`) plus reduce precision of the timeout
    somewhat to really shrink the :manpage:`gettimeofday(2)` call count. With
    this option enabled, we only do about 0.4% of the
    :manpage:`gettimeofday(2)` calls we would have done if all time keeping was
    enabled.

.. option:: gtod_cpu=int

    Sometimes it's cheaper to dedicate a single thread of execution to just
    getting the current time. Fio (and databases, for instance) are very
    intensive on :manpage:`gettimeofday(2)` calls. With this option, you can set
    one CPU aside for doing nothing but logging current time to a shared memory
    location. Then the other threads/processes that run I/O workloads need only
    copy that segment, instead of entering the kernel with a
    :manpage:`gettimeofday(2)` call. The CPU set aside for doing these time
    calls will be excluded from other uses. Fio will manually clear it from the
    CPU mask of other jobs.


Target file/device
~~~~~~~~~~~~~~~~~~

.. option:: directory=str

    Prefix filenames with this directory. Used to place files in a different
    location than :file:`./`. You can specify a number of directories by
    separating the names with a ':' character. These directories will be
    distributed equally among the job clones created by :option:`numjobs`, as
    long as they are using generated filenames. If specific `filename(s)` are
    set, fio will use the first listed directory. This matches the `filename`
    semantics: a file is generated for each clone if no `filename` is
    specified, but all clones use the same file(s) if one is set.

    See the :option:`filename` option for information on how to escape "``:``" and
    "``\``" characters within the directory path itself.

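    As a sketch, the hypothetical job below spreads four clones over two
    directories. The mount points :file:`/mnt/disk1` and :file:`/mnt/disk2` are
    made up for the example, and no :option:`filename` is given so that fio
    generates the file names itself:

    .. code-block:: ini

        ; -- start job file --
        [spread]
        rw=write
        size=64m
        numjobs=4
        ; the clones are distributed over the two listed directories
        directory=/mnt/disk1:/mnt/disk2
        ; -- end job file --
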
.. option:: filename=str

    Fio normally makes up a `filename` based on the job name, thread number, and
    file number (see :option:`filename_format`). If you want to share files
    between threads in a job or several
    jobs with fixed file paths, specify a `filename` for each of them to override
    the default. If the ioengine is file based, you can specify a number of files
    by separating the names with a ':' colon. So if you wanted a job to open
    :file:`/dev/sda` and :file:`/dev/sdb` as the two working files, you would use
    ``filename=/dev/sda:/dev/sdb``. This also means that whenever this option is
    specified, :option:`nrfiles` is ignored. The size of regular files specified
    by this option will be :option:`size` divided by number of files unless an
    explicit size is specified by :option:`filesize`.

    Each colon and backslash in the wanted path must be escaped with a ``\``
    character. For instance, if the path is :file:`/dev/dsk/foo@3,0:c` then you
    would use ``filename=/dev/dsk/foo@3,0\:c`` and if the path is
    :file:`F:\\filename` then you would use ``filename=F\:\\filename``.

    On Windows, disk devices are accessed as :file:`\\\\.\\PhysicalDrive0` for
    the first device, :file:`\\\\.\\PhysicalDrive1` for the second etc.
    Note: Windows and FreeBSD prevent write access to areas
    of the disk containing in-use data (e.g. filesystems).

    The filename "`-`" is a reserved name, meaning *stdin* or *stdout*. Which
    of the two depends on the read/write direction set.

.. option:: filename_format=str

    If sharing multiple files between jobs, it is usually necessary to have fio
    generate the exact names that you want. By default, fio will name a file
    based on the default file format specification of
    :file:`jobname.jobnumber.filenumber`. With this option, that can be
    customized. Fio will recognize and replace the following keywords in this
    string:

    **$jobname**
        The name of the worker thread or process.
    **$jobnum**
        The incremental number of the worker thread or process.
    **$filenum**
        The incremental number of the file for that worker thread or
        process.

    To have dependent jobs share a set of files, this option can be set to have
    fio generate filenames that are shared between the two. For instance, if
    :file:`testfiles.$filenum` is specified, file number 4 for any job will be
    named :file:`testfiles.4`. The default of :file:`$jobname.$jobnum.$filenum`
    will be used if no other format specifier is given.

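    A sketch of two dependent jobs sharing one set of files is shown below. The
    job names and sizes are hypothetical. It relies on the :option:`stonewall`
    option, which is not described in this section, to make the second job wait
    until the first one has finished:

    .. code-block:: ini

        ; -- start job file --
        [global]
        ; no $jobname or $jobnum in the format, so every job
        ; generates the same file names
        filename_format=testfiles.$filenum
        nrfiles=4
        size=64m

        [writer]
        rw=write

        [reader]
        ; wait for the writer to exit before reading the files back
        stonewall
        rw=read
        ; -- end job file --
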
.. option:: unique_filename=bool

    To avoid collisions between networked clients, fio defaults to prefixing any
    generated filenames (with a directory specified) with the source of the
    client connecting. To disable this behavior, set this option to 0.

.. option:: opendir=str

    Recursively open any files below directory `str`.

.. option:: lockfile=str

    Fio defaults to not locking any files before it does I/O to them. If a file
    or file descriptor is shared, fio can serialize I/O to that file to make the
    end result consistent. This is useful for emulating real workloads that
    share files. The lock modes are:

        **none**
            No locking. The default.
        **exclusive**
            Only one thread or process may do I/O at a time, excluding all
            others.
        **readwrite**
            Read-write locking on the file. Many readers may
            access the file at the same time, but writes get exclusive access.

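    The following sketch shows readers and a writer sharing one file under
    read-write locking. The path :file:`/tmp/shared.dat`, the job names and the
    sizes are hypothetical:

    .. code-block:: ini

        ; -- start job file --
        [global]
        ; a fixed filename, so all jobs work on the same file
        filename=/tmp/shared.dat
        size=64m
        ; many readers at once, writes get exclusive access
        lockfile=readwrite

        [readers]
        rw=randread
        numjobs=3

        [writer]
        rw=randwrite
        ; -- end job file --
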
.. option:: nrfiles=int

    Number of files to use for this job. Defaults to 1. The size of files
    will be :option:`size` divided by this unless an explicit size is specified
    by :option:`filesize`. Files are created for each thread separately, and
    each file will have a file number within its name by default, as explained
    in the :option:`filename` section.


.. option:: openfiles=int

    Number of files to keep open at the same time. Defaults to the same as
    :option:`nrfiles`, can be set smaller to limit the number of simultaneous
    opens.

.. option:: file_service_type=str

    Defines how fio decides which file from a job to service next. The following
    types are defined:

        **random**
            Choose a file at random.

        **roundrobin**
            Round robin over opened files. This is the default.

        **sequential**
            Finish one file before moving on to the next. Multiple files can
            still be open depending on 'openfiles'.

        **zipf**
            Use a *Zipf* distribution to decide what file to access.

        **pareto**
            Use a *Pareto* distribution to decide what file to access.

        **gauss**
            Use a *Gaussian* (normal) distribution to decide what file to
            access.

    For *random*, *roundrobin*, and *sequential*, a postfix can be appended to
    tell fio how many I/Os to issue before switching to a new file. For example,
    specifying ``file_service_type=random:8`` would cause fio to issue
    8 I/Os before selecting a new file at random. For the non-uniform
    distributions, a floating point postfix can be given to influence how the
    distribution is skewed. See :option:`random_distribution` for a description
    of how that would work.

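    As an illustration of how the file-related options fit together, here is a
    hypothetical job that works on eight files but keeps at most four of them
    open, switching to another randomly chosen file after every 16 I/Os:

    .. code-block:: ini

        ; -- start job file --
        [many-files]
        rw=randread
        ; 256MiB split over 8 files, i.e. 32MiB per file
        size=256m
        nrfiles=8
        openfiles=4
        file_service_type=random:16
        ; -- end job file --
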
868.. option:: ioscheduler=str
869
870 Attempt to switch the device hosting the file to the specified I/O scheduler
871 before running.
872
873.. option:: create_serialize=bool
874
875 If true, serialize the file creation for the jobs. This may be handy to
876 avoid interleaving of data files, which may greatly depend on the filesystem
a47b697c 877 used and even the number of processors in the system. Default: true.
f80dba8d
MT
878
879.. option:: create_fsync=bool
880
881 fsync the data file after creation. This is the default.
882
883.. option:: create_on_open=bool
884
885 Don't pre-setup the files for I/O, just create open() when it's time to do
a47b697c 886 I/O to that file. Default: false.
f80dba8d
MT
887
888.. option:: create_only=bool
889
890 If true, fio will only run the setup phase of the job. If files need to be
891 laid out or updated on disk, only that will be done -- the actual job contents
892 are not executed. Default: false.
893
894.. option:: allow_file_create=bool
895
896 If true, fio is permitted to create files as part of its workload. If this
897 option is false, fio will error out if the files it needs to use don't
898 already exist. Default: true.
899
900.. option:: allow_mounted_write=bool
901
902 If this isn't set, fio will abort jobs that are destructive (e.g. that write)
903 to what appears to be a mounted device or partition. This helps catch
904 inadvertently destructive tests, where it was not realized that the test will
905 destroy data on the mounted file system. Note that some platforms don't allow
906 writing against a mounted device regardless of this option. Default: false.
907
908.. option:: pre_read=bool
909
910 If this is given, files will be pre-read into memory before starting the
911 given I/O operation. This will also clear the :option:`invalidate` flag,
912 since it is pointless to pre-read and then drop the cache. This will only
913 work for I/O engines that are seekable, since they allow you to read the
914 same data multiple times. Thus it will not work on non-seekable I/O engines
915 (e.g. network, splice). Default: false.
916
917.. option:: unlink=bool
918
919 Unlink the job files when done. Not the default, as repeated runs of that
920 job would then waste time recreating the file set again and again. Default:
921 false.
922
923.. option:: unlink_each_loop=bool
924
925 Unlink job files after each iteration or loop. Default: false.
926
927.. option:: zonesize=int
928
929 Divide a file into zones of the specified size. See :option:`zoneskip`.
930
931.. option:: zonerange=int
932
933 Give size of an I/O zone. See :option:`zoneskip`.
934
935.. option:: zoneskip=int
936
937 Skip the specified number of bytes when :option:`zonesize` data has been
938 read. The two zone options can be used to only do I/O on zones of a file.
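
    As an illustration of the zone options (the job name and all sizes are
    arbitrary assumptions), the following sketch is intended to read a 64MiB
    zone, skip the next 192MiB, and then continue with the following zone::

        ; hypothetical example job
        [zoned-read]
        rw=read
        bs=64k
        size=1g
        zonesize=64m
        zoneskip=192m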
939
940
941I/O type
942~~~~~~~~
943
944.. option:: direct=bool
945
946 If value is true, use non-buffered I/O. This is usually O_DIRECT. Note that
947 ZFS on Solaris doesn't support direct I/O. On Windows the synchronous
948 ioengines don't support direct I/O. Default: false.
949
950.. option:: atomic=bool
951
952 If value is true, attempt to use atomic direct I/O. Atomic writes are
953 guaranteed to be stable once acknowledged by the operating system. Only
954 Linux supports O_ATOMIC right now.
955
956.. option:: buffered=bool
957
958 If value is true, use buffered I/O. This is the opposite of the
959 :option:`direct` option. Defaults to true.
960
961.. option:: readwrite=str, rw=str
962
963 Type of I/O pattern. Accepted values are:
964
965 **read**
966 Sequential reads.
967 **write**
968 Sequential writes.
969 **trim**
970 Sequential trims (Linux block devices only).
971 **randwrite**
972 Random writes.
973 **randread**
974 Random reads.
975 **randtrim**
976 Random trims (Linux block devices only).
977 **rw,readwrite**
978 Sequential mixed reads and writes.
979 **randrw**
980 Random mixed reads and writes.
981 **trimwrite**
982 Sequential trim+write sequences. Blocks will be trimmed first,
983 then the same blocks will be written to.
984
985 Fio defaults to read if the option is not specified. For the mixed I/O
986 types, the default is to split them 50/50. For certain types of I/O the
987 result may still be skewed a bit, since the speed may be different. It is
988 possible to specify the number of I/Os to do before getting a new offset
989 by appending ``:<nr>`` to the end of the string given. For a
990 random read, it would look like ``rw=randread:8`` for passing in an offset
991 modifier with a value of 8. If the suffix is used with a sequential I/O
992 pattern, then the value specified will be added to the generated offset for
993 each I/O. For instance, using ``rw=write:4k`` will skip 4k for every
994 write, turning sequential I/O into sequential I/O with holes. See the
995 :option:`rw_sequencer` option.
996
997.. option:: rw_sequencer=str
998
999 If an offset modifier is given by appending a number to the ``rw=<str>``
1000 line, then this option controls how that number modifies the I/O offset
1001 being generated. Accepted values are:
1002
1003 **sequential**
1004 Generate sequential offset.
1005 **identical**
1006 Generate the same offset.
1007
1008 ``sequential`` is only useful for random I/O, where fio would normally
1009 generate a new random offset for every I/O. If you append e.g. 8 to randread,
1010 you would get a new random offset for every 8 I/Os. The result would be a
1011 seek for only every 8 I/Os, instead of for every I/O. Use ``rw=randread:8``
1012 to specify that. As sequential I/O is already sequential, setting
1013 ``sequential`` for that would not result in any differences. ``identical``
1014 behaves in a similar fashion, except it sends the same offset 8
1015 times before generating a new offset.
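
    A short sketch of the two options used together (the job name, block
    size, and size are assumptions chosen for illustration): fio picks a new
    random offset only once per 8 I/Os and issues the I/Os in between
    sequentially::

        ; hypothetical example job
        [random-bursts]
        rw=randread:8
        rw_sequencer=sequential
        bs=4k
        size=256m

    Changing the sequencer to ``rw_sequencer=identical`` in the same job
    would instead repeat each random offset 8 times.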
1016
1017.. option:: unified_rw_reporting=bool
1018
1019 Fio normally reports statistics on a per data direction basis, meaning that
1020 reads, writes, and trims are accounted and reported separately. If this
1021 option is set fio sums the results and reports them as "mixed" instead.
1022
1023.. option:: randrepeat=bool
1024
1025 Seed the random number generator used for random I/O patterns in a
1026 predictable way so the pattern is repeatable across runs. Default: true.
1027
1028.. option:: allrandrepeat=bool
1029
1030 Seed all random number generators in a predictable way so results are
1031 repeatable across runs. Default: false.
1032
1033.. option:: randseed=int
1034
1035 Seed the random number generators based on this seed value, to be able to
1036 control what sequence of output is being generated. If not set, the random
1037 sequence depends on the :option:`randrepeat` setting.
1038
1039.. option:: fallocate=str
1040
1041 Whether pre-allocation is performed when laying down files.
1042 Accepted values are:
1043
1044 **none**
1045 Do not pre-allocate space.
1046
1047 **posix**
1048 Pre-allocate via :manpage:`posix_fallocate(3)`.
1049
1050 **keep**
1051 Pre-allocate via :manpage:`fallocate(2)` with
1052 FALLOC_FL_KEEP_SIZE set.
1053
1054 **0**
1055 Backward-compatible alias for **none**.
1056
1057 **1**
1058 Backward-compatible alias for **posix**.
1059
1060 May not be available on all supported platforms. **keep** is only available
1061 on Linux. If using ZFS on Solaris this must be set to **none** because ZFS
1062 doesn't support it. Default: **posix**.
1063
1064.. option:: fadvise_hint=str
1065
1066 Use :manpage:`posix_fadvise(2)` to advise the kernel on what I/O patterns
1067 are likely to be issued. Accepted values are:
1068
1069 **0**
1070 Backwards-compatible hint for "no hint".
1071
1072 **1**
1073 Backwards-compatible hint for "advise with fio workload type". This
1074 uses **FADV_RANDOM** for a random workload, and **FADV_SEQUENTIAL**
1075 for a sequential workload.
1076
1077 **sequential**
1078 Advise using **FADV_SEQUENTIAL**.
1079
1080 **random**
1081 Advise using **FADV_RANDOM**.
1082
1083.. option:: fadvise_stream=int
1084
1085 Use :manpage:`posix_fadvise(2)` to advise the kernel what stream ID the
1086 writes issued belong to. Only supported on Linux. Note, this option may
1087 change going forward.
1088
1089.. option:: offset=int
1090
1091 Start I/O at the provided offset in the file, given as either a fixed size or
1092 a percentage. If a percentage is given, the next ``blockalign``-ed offset
1093 will be used. Data before the given offset will not be touched. This
1094 effectively caps the file size at `real_size - offset`. Can be combined with
1095 :option:`size` to constrain the start and end range of the I/O workload.
1096
1097.. option:: offset_increment=int
1098
1099 If this is provided, then the real offset becomes `offset + offset_increment
1100 * thread_number`, where the thread number is a counter that starts at 0 and
1101 is incremented for each sub-job (i.e. when :option:`numjobs` option is
1102 specified). This option is useful if there are several jobs which are
1103 intended to operate on a file in parallel disjoint segments, with even
1104 spacing between the starting points.
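
    To make the formula concrete, here is a sketch with assumed values (the
    file path is hypothetical and is assumed to refer to an existing file of
    at least 4GiB). With four sub-jobs and an increment of 1GiB, the
    starting offsets work out to 0, 1GiB, 2GiB, and 3GiB, so each sub-job
    reads its own 1GiB segment::

        ; hypothetical example job
        [parallel-segments]
        filename=/path/to/testfile
        rw=read
        bs=1m
        size=1g
        numjobs=4
        offset=0
        offset_increment=1g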
1105
1106.. option:: number_ios=int
1107
1108 Fio will normally perform I/Os until it has exhausted the size of the region
1109 set by :option:`size`, or until it exhausts the allocated time (or hits an error
1110 condition). With this setting, the range/size can be set independently of
1111 the number of I/Os to perform. When fio reaches this number, it will exit
1112 normally and report status. Note that this does not extend the amount of I/O
1113 that will be done; it will only stop fio if this condition is met before
1114 other end-of-job criteria.
1115
1116.. option:: fsync=int
1117
1118 If writing to a file, issue a sync of the dirty data for every number of
1119 blocks given. For example, if you give 32 as a parameter, fio will sync the
1120 file for every 32 writes issued. If fio is using non-buffered I/O, we may
1121 not sync the file. The exception is the sg I/O engine, which synchronizes
1122 the disk cache anyway. Defaults to 0, which means fio does not issue a
1123 periodic sync.
1124
1125.. option:: fdatasync=int
1126
1127 Like :option:`fsync` but uses :manpage:`fdatasync(2)` to only sync data and
1128 not metadata blocks. On Windows, FreeBSD, and DragonFlyBSD there is no
1129 :manpage:`fdatasync(2)`, so this falls back to using :manpage:`fsync(2)`.
1130 Defaults to 0, which means fio does not issue a periodic data sync.
1131
1132.. option:: write_barrier=int
1133
1134 Make every `N-th` write a barrier write.
1135
1136.. option:: sync_file_range=str:val
1137
1138 Use :manpage:`sync_file_range(2)` for every `val` number of write
1139 operations. Fio will track range of writes that have happened since the last
1140 :manpage:`sync_file_range(2)` call. `str` can currently be one or more of:
1141
1142 **wait_before**
1143 SYNC_FILE_RANGE_WAIT_BEFORE
1144 **write**
1145 SYNC_FILE_RANGE_WRITE
1146 **wait_after**
1147 SYNC_FILE_RANGE_WAIT_AFTER
1148
1149 So if you do ``sync_file_range=wait_before,write:8``, fio would use
1150 ``SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE`` for every 8
1151 writes. Also see the :manpage:`sync_file_range(2)` man page. This option is
1152 Linux specific.
1153
1154.. option:: overwrite=bool
1155
1156 If true, writes to a file will always overwrite existing data. If the file
1157 doesn't already exist, it will be created before the write phase begins. If
1158 the file exists and is large enough for the specified write phase, nothing
1159 will be done. Default: false.
1160
1161.. option:: end_fsync=bool
1162
1163 If true, :manpage:`fsync(2)` file contents when a write stage has completed.
1164 Default: false.
1165
1166.. option:: fsync_on_close=bool
1167
1168 If true, fio will :manpage:`fsync(2)` a dirty file on close. This differs
1169 from :option:`end_fsync` in that it will happen on every file close, not
1170 just at the end of the job. Default: false.
1171
1172.. option:: rwmixread=int
1173
1174 Percentage of a mixed workload that should be reads. Default: 50.
1175
1176.. option:: rwmixwrite=int
1177
1178 Percentage of a mixed workload that should be writes. If both
1179 :option:`rwmixread` and :option:`rwmixwrite` are given and the values do not
1180 add up to 100%, the latter of the two will be used to override the
1181 first. This may interfere with a given rate setting, if fio is asked to
1182 limit reads or writes to a certain rate. If that is the case, then the
1183 distribution may be skewed. Default: 50.
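
    For example, a sketch of a random mixed workload that aims for roughly
    70% reads and 30% writes (the job name, block size, and size are
    illustrative assumptions)::

        ; hypothetical example job
        [mixed-70-30]
        rw=randrw
        rwmixread=70
        bs=4k
        size=256m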
1184
1185.. option:: random_distribution=str:float[,str:float][,str:float]
1186
1187 By default, fio will use a completely uniform random distribution when asked
1188 to perform random I/O. Sometimes it is useful to skew the distribution in
1189 specific ways, ensuring that some parts of the data are hotter than others.
1190 Fio includes the following distribution models:
1191
1192 **random**
1193 Uniform random distribution
1194
1195 **zipf**
1196 Zipf distribution
1197
1198 **pareto**
1199 Pareto distribution
1200
1201 **gauss**
1202 Normal (Gaussian) distribution
1203
1204 **zoned**
1205 Zoned random distribution
1206
1207 When using a **zipf** or **pareto** distribution, an input value is also
1208 needed to define the access pattern. For **zipf**, this is the `zipf
1209 theta`. For **pareto**, it's the `Pareto power`. Fio includes a test
1210 program, :command:`genzipf`, that can be used to visualize what the given input
1211 values will yield in terms of hit rates. If you wanted to use **zipf** with
1212 a `theta` of 1.2, you would use ``random_distribution=zipf:1.2`` as the
1213 option. If a non-uniform model is used, fio will disable use of the random
1214 map. For the **gauss** distribution, a normal deviation is supplied as a
1215 value between 0 and 100.
1216
1217 For a **zoned** distribution, fio supports specifying percentages of I/O
1218 access that should fall within what range of the file or device. For
1219 example, given these criteria:
1220
1221 * 60% of accesses should be to the first 10%
1222 * 30% of accesses should be to the next 20%
1223 * 8% of accesses should be to the next 30%
1224 * 2% of accesses should be to the next 40%
1225
1226 we can define that through zoning of the random accesses. For the above
1227 example, the user would do::
1228
1229 random_distribution=zoned:60/10:30/20:8/30:2/40
1230
1231 similarly to how :option:`bssplit` works for setting ranges and percentages
1232 of block sizes. Like :option:`bssplit`, it's possible to specify separate
1233 zones for reads, writes, and trims. If just one set is given, it'll apply to
1234 all of them.
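
    As a sketch of how a skewed distribution fits into a complete job (the
    job name, block size, and size are assumptions for illustration), a
    random read workload using the **zipf** model with a `theta` of 1.2
    could be written as::

        ; hypothetical example job
        [hot-cold-read]
        rw=randread
        bs=4k
        size=1g
        random_distribution=zipf:1.2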
1235
1236.. option:: percentage_random=int[,int][,int]
1237
1238 For a random workload, set how big a percentage should be random. This
1239 defaults to 100%, in which case the workload is fully random. It can be set
1240 anywhere from 0 to 100. Setting it to 0 would make the workload fully
1241 sequential. Any setting in between will result in a random mix of sequential
1242 and random I/O, at the given percentages. Comma-separated values may be
1243 specified for reads, writes, and trims as described in :option:`blocksize`.
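
    A minimal sketch with per-direction values (the job name and sizes are
    assumptions): here 80% of reads and 20% of writes are issued at random
    offsets, and the remainder of each is sequential::

        ; hypothetical example job
        [partly-random]
        rw=randrw
        bs=4k
        size=256m
        percentage_random=80,20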
1244
1245.. option:: norandommap
1246
1247 Normally fio will cover every block of the file when doing random I/O. If
1248 this option is given, fio will just get a new random offset without looking
1249 at past I/O history. This means that some blocks may not be read or written,
1250 and that some blocks may be read/written more than once. If this option is
1251 used with :option:`verify` and multiple blocksizes (via :option:`bsrange`),
1252 only intact blocks are verified, i.e., partially-overwritten blocks are
1253 ignored.
1254
1255.. option:: softrandommap=bool
1256
1257 See :option:`norandommap`. If fio runs with the random block map enabled and
1258 fails to allocate the map, setting this option lets it continue without
1259 a random block map. As coverage will not be as complete as with random maps,
1260 this option is disabled by default.
1261
1262.. option:: random_generator=str
1263
1264 Fio supports the following engines for generating
1265 I/O offsets for random I/O:
1266
1267 **tausworthe**
1268 Strong 2^88 cycle random number generator
1269 **lfsr**
1270 Linear feedback shift register generator
1271 **tausworthe64**
1272 Strong 64-bit 2^258 cycle random number generator
1273
1274 **tausworthe** is a strong random number generator, but it requires tracking
1275 on the side if we want to ensure that blocks are only read or written
1276 once. **LFSR** guarantees that we never generate the same offset twice, and
1277 it's also less computationally expensive. It's not a true random generator,
1278 however, though for I/O purposes it's typically good enough. **LFSR** only
1279 works with single block sizes, not with workloads that use multiple block
1280 sizes. If used with such a workload, fio may read or write some blocks
1281 multiple times. The default value is **tausworthe**, unless the required
1282 space exceeds 2^32 blocks. If it does, then **tausworthe64** is
1283 selected automatically.
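
    For instance, a sketch of a job that selects the **lfsr** generator
    (the job name and sizes are assumptions). A single block size is used,
    in line with the restriction described above::

        ; hypothetical example job
        [lfsr-write]
        rw=randwrite
        bs=4k
        size=1g
        random_generator=lfsr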
1284
1285
1286Block size
1287~~~~~~~~~~
1288
1289.. option:: blocksize=int[,int][,int], bs=int[,int][,int]
1290
1291 The block size in bytes used for I/O units. Default: 4096. A single value
1292 applies to reads, writes, and trims. Comma-separated values may be
1293 specified for reads, writes, and trims. A value not terminated in a comma
1294 applies to subsequent types.
1295
1296 Examples:
1297
1298 **bs=256k**
1299 means 256k for reads, writes and trims.
1300
1301 **bs=8k,32k**
1302 means 8k for reads, 32k for writes and trims.
1303
1304 **bs=8k,32k,**
1305 means 8k for reads, 32k for writes, and default for trims.
1306
1307 **bs=,8k**
1308 means default for reads, 8k for writes and trims.
1309
1310 **bs=,8k,**
1311 means default for reads, 8k for writes, and default for trims.
1312
1313.. option:: blocksize_range=irange[,irange][,irange], bsrange=irange[,irange][,irange]
1314
1315 A range of block sizes in bytes for I/O units. The issued I/O unit will
1316 always be a multiple of the minimum size, unless
1317 :option:`blocksize_unaligned` is set.
1318
1319 Comma-separated ranges may be specified for reads, writes, and trims as
1320 described in :option:`blocksize`.
1321
1322 Example: ``bsrange=1k-4k,2k-8k``.
1323
1324.. option:: bssplit=str[,str][,str]
1325
1326 Sometimes you want even finer grained control of the block sizes issued, not
1327 just an even split between them. This option allows you to weight various
1328 block sizes, so that you are able to define a specific amount of block sizes
1329 issued. The format for this option is::
1330
1331 bssplit=blocksize/percentage:blocksize/percentage
1332
1333 for as many block sizes as needed. So if you want to define a workload that
1334 has 50% 64k blocks, 10% 4k blocks, and 40% 32k blocks, you would write::
1335
1336 bssplit=4k/10:64k/50:32k/40
1337
1338 Ordering does not matter. If the percentage is left blank, fio will fill in
1339 the remaining values evenly. So a bssplit option like this one::
1340
1341 bssplit=4k/50:1k/:32k/
1342
1343 would have 50% 4k I/Os, and 25% each of 1k and 32k I/Os. The percentages always add up
1344 to 100; if bssplit is given a range that adds up to more, it will error out.
1345
1346 Comma-separated values may be specified for reads, writes, and trims as
1347 described in :option:`blocksize`.
1348
1349 If you want a workload that has 50% 2k reads and 50% 4k reads, while having
1350 90% 4k writes and 10% 8k writes, you would specify::
1351
1352 bssplit=2k/50:4k/50,4k/90:8k/10
1353
1354.. option:: blocksize_unaligned, bs_unaligned
1355
1356 If set, fio will issue I/O units with any size within
1357 :option:`blocksize_range`, not just multiples of the minimum size. This
1358 typically won't work with direct I/O, as that normally requires sector
1359 alignment.
1360
1361.. option:: bs_is_seq_rand
1362
1363 If this option is set, fio will use the normal read,write blocksize settings
1364 as sequential,random blocksize settings instead. Any random read or write
1365 will use the WRITE blocksize settings, and any sequential read or write will
1366 use the READ blocksize settings.
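
    A sketch of how this reinterpretation plays out (the job name and the
    values are assumptions; the option is written with an explicit ``=1``
    here). Following the description above, sequential I/O would use the
    first (READ) value of 64k and random I/O would use the second (WRITE)
    value of 4k::

        ; hypothetical example job
        [seq-vs-rand-bs]
        rw=randrw
        percentage_random=50
        bs=64k,4k
        bs_is_seq_rand=1
        size=256m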
1367
1368.. option:: blockalign=int[,int][,int], ba=int[,int][,int]
1369
1370 Boundary to which fio will align random I/O units. Default:
1371 :option:`blocksize`. Minimum alignment is typically 512b for using direct
1372 I/O, though it usually depends on the hardware block size. This option is
1373 mutually exclusive with using a random map for files, so it will turn off
1374 that option. Comma-separated values may be specified for reads, writes, and
1375 trims as described in :option:`blocksize`.
1376
1377
1378Buffers and memory
1379~~~~~~~~~~~~~~~~~~
1380
1381.. option:: zero_buffers
1382
1383 Initialize buffers with all zeros. Default: fill buffers with random data.
1384
1385.. option:: refill_buffers
1386
1387 If this option is given, fio will refill the I/O buffers on every
1388 submit. The default is to only fill it at init time and reuse that
1389 data. Only makes sense if zero_buffers isn't specified, naturally. If data
1390 verification is enabled, `refill_buffers` is also automatically enabled.
1391
1392.. option:: scramble_buffers=bool
1393
1394 If :option:`refill_buffers` is too costly and the target is using data
1395 deduplication, then setting this option will slightly modify the I/O buffer
1396 contents to defeat normal de-dupe attempts. This is not enough to defeat
1397 more clever block compression attempts, but it will stop naive dedupe of
1398 blocks. Default: true.
1399
1400.. option:: buffer_compress_percentage=int
1401
1402 If this is set, then fio will attempt to provide I/O buffer content (on
1403 WRITEs) that compress to the specified level. Fio does this by providing a
1404 mix of random data and a fixed pattern. The fixed pattern is either zeroes,
1405 or the pattern specified by :option:`buffer_pattern`. If the pattern option
1406 is used, it might skew the compression ratio slightly. Note that this is per
1407 block size unit; for a file/disk-wide compression level that matches this
1408 setting, you'll also want to set :option:`refill_buffers`.
1409
1410.. option:: buffer_compress_chunk=int
1411
1412 See :option:`buffer_compress_percentage`. This setting allows fio to manage
1413 how big the ranges of random data and zeroed data are. Without this set, fio
1414 will provide :option:`buffer_compress_percentage` of blocksize random data,
1415 followed by the remaining zeroed. With this set to some chunk size smaller
1416 than the block size, fio can alternate random and zeroed data throughout the
1417 I/O buffer.
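
    Putting the compression options together, a sketch of a write job that
    targets buffers compressing to about 50%, alternates random and zeroed
    data in 4k chunks, and refills buffers on every submit (the job name
    and all values are illustrative assumptions)::

        ; hypothetical example job
        [compressible-write]
        rw=write
        bs=128k
        size=256m
        buffer_compress_percentage=50
        buffer_compress_chunk=4k
        refill_buffers=1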
1418
1419.. option:: buffer_pattern=str
1420
1421 If set, fio will fill the I/O buffers with this pattern or with the contents
1422 of a file. If not set, the contents of I/O buffers are defined by the other
1423 options related to buffer contents. The setting can be any pattern of bytes,
1424 and can be prefixed with 0x for hex values. It may also be a string, where
1425 the string must then be wrapped with ``""``. Or it may also be a filename,
1426 where the filename must be wrapped with ``''`` in which case the file is
1427 opened and read. Note that not all the file contents will be read if that
1428 would cause the buffers to overflow. So, for example::
1429
1430 buffer_pattern='filename'
1431
1432 or::
1433
1434 buffer_pattern="abcd"
1435
1436 or::
1437
1438 buffer_pattern=-12
1439
1440 or::
1441
1442 buffer_pattern=0xdeadface
1443
1444 Also you can combine everything together in any order::
1445
1446 buffer_pattern=0xdeadface"abcd"-12'filename'
1447
1448.. option:: dedupe_percentage=int
1449
1450 If set, fio will generate this percentage of identical buffers when
1451 writing. These buffers will be naturally dedupable. The contents of the
1452 buffers depend on what other buffer compression settings have been set. It's
1453 possible to have the individual buffers either fully compressible, or not at
1454 all. This option only controls the distribution of unique buffers.
1455
1456.. option:: invalidate=bool
1457
1458 Invalidate the buffer/page cache parts for this file prior to starting
1459 I/O if the platform and file type support it. Defaults to true.
1460 This will be ignored if :option:`pre_read` is also specified for the
1461 same job.
1462
1463.. option:: sync=bool
1464
1465 Use synchronous I/O for buffered writes. For the majority of I/O engines,
1466 this means using O_SYNC. Default: false.
1467
1468.. option:: iomem=str, mem=str
1469
1470 Fio can use various types of memory as the I/O unit buffer. The allowed
1471 values are:
1472
1473 **malloc**
1474 Use memory from :manpage:`malloc(3)` as the buffers. Default memory
1475 type.
1476
1477 **shm**
1478 Use shared memory as the buffers. Allocated through
1479 :manpage:`shmget(2)`.
1480
1481 **shmhuge**
1482 Same as shm, but use huge pages as backing.
1483
1484 **mmap**
1485 Use mmap to allocate buffers. May either be anonymous memory, or can
1486 be file backed if a filename is given after the option. The format
1487 is `mem=mmap:/path/to/file`.
1488
1489 **mmaphuge**
1490 Use a memory mapped huge file as the buffer backing. Append filename
1491 after mmaphuge, e.g. `mem=mmaphuge:/hugetlbfs/file`.
1492
1493 **mmapshared**
1494 Same as mmap, but use a MAP_SHARED mapping.
1495
1496 **cudamalloc**
1497 Use GPU memory as the buffers for GPUDirect RDMA benchmark.
1498
1499 The area allocated is a function of the maximum allowed bs size for the job,
1500 multiplied by the I/O depth given. Note that for **shmhuge** and
1501 **mmaphuge** to work, the system must have free huge pages allocated. This
1502 can normally be checked and set by reading/writing
1503 :file:`/proc/sys/vm/nr_hugepages` on a Linux system. Fio assumes a huge page
1504 is 4MiB in size. So to calculate the number of huge pages you need for a
1505 given job file, add up the I/O depth of all jobs (normally one unless
1506 :option:`iodepth` is used) and multiply by the maximum bs set. Then divide
1507 that number by the huge page size. You can see the size of the huge pages in
1508 :file:`/proc/meminfo`. If no huge pages are allocated by having a non-zero
1509 number in `nr_hugepages`, using **mmaphuge** or **shmhuge** will fail. Also
1510 see :option:`hugepage-size`.
1511
1512 **mmaphuge** also needs to have hugetlbfs mounted and the file location
1513 should point there. So if it's mounted in :file:`/huge`, you would use
1514 `mem=mmaphuge:/huge/somefile`.
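
    As a worked example of the huge page calculation (the job name and all
    values are assumptions for illustration): a single job with an I/O depth
    of 16 and a maximum block size of 1MiB needs 16 x 1MiB = 16MiB of buffer
    space, which is 4 huge pages at the 4MiB page size fio assumes::

        ; hypothetical example job
        ; buffer space: iodepth 16 x bs 1MiB = 16MiB
        ; huge pages:   16MiB / 4MiB per page = 4
        [hugepage-buffers]
        ioengine=libaio
        direct=1
        iodepth=16
        rw=read
        bs=1m
        size=1g
        mem=shmhuge
        hugepage-size=4m

    At least that many free huge pages must be available in
    `nr_hugepages` before the job is started.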
1515
1516.. option:: iomem_align=int
1517
1518 This indicates the memory alignment of the I/O memory buffers. Note that
1519 the given alignment is applied to the first I/O unit buffer; if using
1520 :option:`iodepth`, the alignment of the following buffers is given by the
1521 :option:`bs` used. In other words, if using a :option:`bs` that is a
1522 multiple of the page size of the system, all buffers will be aligned to
1523 this value. If using a :option:`bs` that is not page aligned, the alignment
1524 of subsequent I/O memory buffers is the sum of the :option:`iomem_align` and
1525 :option:`bs` used.
1526
1527.. option:: hugepage-size=int
1528
1529 Defines the size of a huge page. Must at least be equal to the system
1530 setting, see :file:`/proc/meminfo`. Defaults to 4MiB. Should probably
1531 always be a multiple of megabytes, so using ``hugepage-size=Xm`` is the
1532 preferred way to set this to avoid setting a non-pow-2 bad value.
1533
1534.. option:: lockmem=int
1535
1536 Pin the specified amount of memory with :manpage:`mlock(2)`. Can be used to
1537 simulate a smaller amount of memory. The amount specified is per worker.
1538
1539
1540I/O size
1541~~~~~~~~
1542
1543.. option:: size=int
1544
1545 The total size of file I/O for each thread of this job. Fio will run until
1546 this many bytes have been transferred, unless runtime is limited by other options
1547 (such as :option:`runtime`, for instance, or increased/decreased by :option:`io_size`).
1548 Fio will divide this size between the available files determined by options
1549 such as :option:`nrfiles`, :option:`filename`, unless :option:`filesize` is
1550 specified by the job. If the result of division happens to be 0, the size is
1551 set to the physical size of the given files or devices if they exist.
1552 If this option is not specified, fio will use the full size of the given
1553 files or devices. If the files do not exist, size must be given. It is also
1554 possible to give size as a percentage between 1 and 100. If ``size=20%`` is
1555 given, fio will use 20% of the full size of the given files or devices.
1556 Can be combined with :option:`offset` to constrain the start and end range
1557 that I/O will be done within.
1558
1559.. option:: io_size=int, io_limit=int
1560
1561 Normally fio operates within the region set by :option:`size`, which means
1562 that the :option:`size` option sets both the region and size of I/O to be
1563 performed. Sometimes that is not what you want. With this option, it is
1564 possible to define just the amount of I/O that fio should do. For instance,
1565 if :option:`size` is set to 20GiB and :option:`io_size` is set to 5GiB, fio
1566 will perform I/O within the first 20GiB but exit when 5GiB have been
1567 done. The opposite is also possible -- if :option:`size` is set to 20GiB,
1568 and :option:`io_size` is set to 40GiB, then fio will do 40GiB of I/O within
1569 the 0..20GiB region.
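
    The first case above can be sketched as a job like this (the job name,
    I/O pattern, and block size are assumptions): I/O stays within the first
    20GiB, and the job exits once 5GiB have been transferred::

        ; hypothetical example job
        [region-vs-amount]
        rw=randread
        bs=4k
        size=20g
        io_size=5g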
1570
1571.. option:: filesize=int
1572
1573 Individual file sizes. May be a range, in which case fio will select sizes
1574 for files at random within the given range and limited to :option:`size` in
1575 total (if that is given). If not given, each created file is the same size.
1576 This option overrides :option:`size` in terms of file size, which means
1577 this value is used as a fixed size or possible range of each file.
1578
1579.. option:: file_append=bool
1580
1581 Perform I/O after the end of the file. Normally fio will operate within the
1582 size of a file. If this option is set, then fio will append to the file
1583 instead. This has identical behavior to setting :option:`offset` to the size
1584 of a file. This option is ignored on non-regular files.
1585
1586.. option:: fill_device=bool, fill_fs=bool
1587
1588 Sets size to something really large and waits for ENOSPC (no space left on
1589 device) as the terminating condition. Only makes sense with sequential
1590 write. For a read workload, the mount point will be filled first then I/O
1591 started on the result. This option doesn't make sense if operating on a raw
1592 device node, since the size of that is already known by the file system.
1593 Additionally, writing beyond end-of-device will not return ENOSPC there.
1594
1595
1596I/O engine
1597~~~~~~~~~~
1598
1599.. option:: ioengine=str
1600
1601 Defines how the job issues I/O to the file. The following types are defined:
1602
1603 **sync**
1604 Basic :manpage:`read(2)` or :manpage:`write(2)`
1605 I/O. :manpage:`lseek(2)` is used to position the I/O location.
1606 See :option:`fsync` and :option:`fdatasync` for syncing write I/Os.
1607
1608 **psync**
1609 Basic :manpage:`pread(2)` or :manpage:`pwrite(2)` I/O. Default on
1610 all supported operating systems except for Windows.
1611
1612 **vsync**
1613 Basic :manpage:`readv(2)` or :manpage:`writev(2)` I/O. Will emulate
1614 queuing by coalescing adjacent I/Os into a single submission.
1615
1616 **pvsync**
1617 Basic :manpage:`preadv(2)` or :manpage:`pwritev(2)` I/O.
1618
1619 **pvsync2**
1620 Basic :manpage:`preadv2(2)` or :manpage:`pwritev2(2)` I/O.
1621
1622 **libaio**
1623 Linux native asynchronous I/O. Note that Linux may only support
1624 queued behaviour with non-buffered I/O (set ``direct=1`` or
1625 ``buffered=0``).
1626 This engine defines engine specific options.
1627
1628 **posixaio**
1629 POSIX asynchronous I/O using :manpage:`aio_read(3)` and
1630 :manpage:`aio_write(3)`.
1631
1632 **solarisaio**
1633 Solaris native asynchronous I/O.
1634
1635 **windowsaio**
1636 Windows native asynchronous I/O. Default on Windows.
1637
1638 **mmap**
1639 File is memory mapped with :manpage:`mmap(2)` and data copied
1640 to/from using :manpage:`memcpy(3)`.
1641
1642 **splice**
1643 :manpage:`splice(2)` is used to transfer the data and
1644 :manpage:`vmsplice(2)` to transfer data from user space to the
1645 kernel.
1646
1647 **sg**
1648 SCSI generic sg v3 I/O. May either be synchronous using the SG_IO
1649 ioctl, or if the target is an sg character device we use
1650 :manpage:`read(2)` and :manpage:`write(2)` for asynchronous
1651 I/O. Requires filename option to specify either block or character
1652 devices.
1653
1654 **null**
1655 Doesn't transfer any data, just pretends to. This is mainly used to
1656 exercise fio itself and for debugging/testing purposes.
1657
1658 **net**
1659 Transfer over the network to given ``host:port``. Depending on the
1660 :option:`protocol` used, the :option:`hostname`, :option:`port`,
1661 :option:`listen` and :option:`filename` options are used to specify
1662 what sort of connection to make, while the :option:`protocol` option
1663 determines which protocol will be used. This engine defines engine
1664 specific options.
1665
1666 **netsplice**
1667 Like **net**, but uses :manpage:`splice(2)` and
1668 :manpage:`vmsplice(2)` to map data and send/receive.
1669 This engine defines engine specific options.
1670
1671 **cpuio**
 Doesn't transfer any data, but burns CPU cycles according to the
 :option:`cpuload` and :option:`cpuchunks` options. Setting
 :option:`cpuload`\=85 will cause that job to do nothing but burn 85%
 of the CPU. On SMP machines, use :option:`numjobs`\=<no_of_cpu> to
 get the desired CPU usage, as cpuload only loads a single CPU at the
 desired rate. A job never finishes unless there is at least one
 non-cpuio job.
1679
1680 **guasi**
 The GUASI I/O engine is the Generic Userspace Asynchronous Syscall
1682 Interface approach to async I/O. See
1683
1684 http://www.xmailserver.org/guasi-lib.html
1685
1686 for more info on GUASI.
1687
1688 **rdma**
1689 The RDMA I/O engine supports both RDMA memory semantics
1690 (RDMA_WRITE/RDMA_READ) and channel semantics (Send/Recv) for the
1691 InfiniBand, RoCE and iWARP protocols.
1692
1693 **falloc**
1694 I/O engine that does regular fallocate to simulate data transfer as
1695 fio ioengine.
1696
1697 DDIR_READ
1698 does fallocate(,mode = FALLOC_FL_KEEP_SIZE,).
1699
1700 DDIR_WRITE
1701 does fallocate(,mode = 0).
1702
1703 DDIR_TRIM
1704 does fallocate(,mode = FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE).
1705
1706 **ftruncate**
1707 I/O engine that sends :manpage:`ftruncate(2)` operations in response
1708 to write (DDIR_WRITE) events. Each ftruncate issued sets the file's
1709 size to the current block offset. Block size is ignored.
1710
1711 **e4defrag**
 I/O engine that does regular EXT4_IOC_MOVE_EXT ioctls to simulate
 defragment activity in response to DDIR_WRITE events.
1714
1715 **rbd**
1716 I/O engine supporting direct access to Ceph Rados Block Devices
1717 (RBD) via librbd without the need to use the kernel rbd driver. This
1718 ioengine defines engine specific options.
1719
1720 **gfapi**
 Uses the Glusterfs libgfapi sync interface for direct access to
 Glusterfs volumes without having to go through FUSE. This ioengine
 defines engine specific options.
1724
1725 **gfapi_async**
 Uses the Glusterfs libgfapi async interface for direct access to
 Glusterfs volumes without having to go through FUSE. This ioengine
 defines engine specific options.
1729
1730 **libhdfs**
 Read and write through Hadoop (HDFS). The :file:`filename` option
 is used to specify the host,port of the HDFS name-node to connect
 to. This engine interprets offsets a little differently. In HDFS,
 files once created cannot be modified, so random writes are not
 possible. To imitate this, the libhdfs engine expects a bunch of
 small files to be created over HDFS, and the engine will randomly
 pick a file out of those files based on the offset generated by the
 fio backend (see the example job file to create such files, use the
 ``rw=write`` option). Note that you might want to set the necessary
 environment variables to work with hdfs/libhdfs properly. Each job
 uses its own connection to HDFS.
1742
1743 **mtd**
1744 Read, write and erase an MTD character device (e.g.,
1745 :file:`/dev/mtd0`). Discards are treated as erases. Depending on the
1746 underlying device type, the I/O may have to go in a certain pattern,
1747 e.g., on NAND, writing sequentially to erase blocks and discarding
1748 before overwriting. The writetrim mode works well for this
1749 constraint.
1750
1751 **pmemblk**
1752 Read and write using filesystem DAX to a file on a filesystem
1753 mounted with DAX on a persistent memory device through the NVML
1754 libpmemblk library.
1755
1756 **dev-dax**
1757 Read and write using device DAX to a persistent memory device (e.g.,
1758 /dev/dax0.0) through the NVML libpmem library.
1759
1760 **external**
1761 Prefix to specify loading an external I/O engine object file. Append
c60ebc45 1762 the engine filename, e.g. ``ioengine=external:/tmp/foo.o`` to load
1763 ioengine :file:`foo.o` in :file:`/tmp`.
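
 As an illustration of engine selection, a minimal job-file sketch might look
 like the following. The job name, file path and sizes are placeholders, not
 values taken from this document::

     ; hypothetical job: name, path and sizes are placeholders
     [randread-libaio]
     ioengine=libaio
     ; libaio may only queue asynchronously with non-buffered I/O
     direct=1
     iodepth=16
     rw=randread
     bs=4k
     size=1g
     filename=/data/fio-testfile

 Swapping ``ioengine=libaio`` for ``ioengine=psync`` would issue the same
 pattern synchronously, in which case an ``iodepth`` above 1 has no effect.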
1764
1765
1766I/O engine specific parameters
1767~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1768
1769In addition, there are some parameters which are only valid when a specific
1770ioengine is in use. These are used identically to normal parameters, with the
1771caveat that when used on the command line, they must come after the
1772:option:`ioengine` that defines them is selected.
1773
1774.. option:: userspace_reap : [libaio]
1775
1776 Normally, with the libaio engine in use, fio will use the
1777 :manpage:`io_getevents(2)` system call to reap newly returned events. With
1778 this flag turned on, the AIO ring will be read directly from user-space to
1779 reap events. The reaping mode is only enabled when polling for a minimum of
c60ebc45 1780 0 events (e.g. when :option:`iodepth_batch_complete` `=0`).
f80dba8d 1781
9d25d068 1782.. option:: hipri : [pvsync2]
1783
1784 Set RWF_HIPRI on I/O, indicating to the kernel that it's of higher priority
1785 than normal.
1786
1787.. option:: cpuload=int : [cpuio]
1788
1789 Attempt to use the specified percentage of CPU cycles. This is a mandatory
1790 option when using cpuio I/O engine.
1791
1792.. option:: cpuchunks=int : [cpuio]
1793
1794 Split the load into cycles of the given time. In microseconds.
1795
1796.. option:: exit_on_io_done=bool : [cpuio]
1797
1798 Detect when I/O threads are done, then exit.
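
 The three cpuio options above could be combined roughly as follows. This is
 only a sketch: the job names, the file path and the chunk length are
 assumptions chosen for illustration::

     ; hypothetical CPU burner that stops once the I/O job is done
     [burn-cpu]
     ioengine=cpuio
     ; engine options must follow the ioengine line
     cpuload=85
     ; cycle length in microseconds (assumed value)
     cpuchunks=50000
     exit_on_io_done=1

     ; hypothetical regular I/O job running alongside
     [reader]
     ioengine=psync
     rw=read
     bs=64k
     size=1g
     filename=/data/fio-testfile

 The second job is there because a cpuio job never finishes unless at least
 one non-cpuio job exists.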
1799
1800.. option:: hostname=str : [netsplice] [net]
1801
1802 The host name or IP address to use for TCP or UDP based I/O. If the job is
1803 a TCP listener or UDP reader, the host name is not used and must be omitted
1804 unless it is a valid UDP multicast address.
1805
1806.. option:: namenode=str : [libhdfs]
1807
1808 The host name or IP address of a HDFS cluster namenode to contact.
1809
1810.. option:: port=int
1811
1812 [netsplice], [net]
1813
1814 The TCP or UDP port to bind to or connect to. If this is used with
1815 :option:`numjobs` to spawn multiple instances of the same job type, then
1816 this will be the starting port number since fio will use a range of
1817 ports.
1818
1819 [libhdfs]
1820
 The listening port of the HDFS cluster namenode.
1822
1823.. option:: interface=str : [netsplice] [net]
1824
1825 The IP address of the network interface used to send or receive UDP
1826 multicast.
1827
1828.. option:: ttl=int : [netsplice] [net]
1829
1830 Time-to-live value for outgoing UDP multicast packets. Default: 1.
1831
1832.. option:: nodelay=bool : [netsplice] [net]
1833
1834 Set TCP_NODELAY on TCP connections.
1835
1836.. option:: protocol=str : [netsplice] [net]
1837
1838.. option:: proto=str : [netsplice] [net]
1839
1840 The network protocol to use. Accepted values are:
1841
1842 **tcp**
1843 Transmission control protocol.
1844 **tcpv6**
1845 Transmission control protocol V6.
1846 **udp**
1847 User datagram protocol.
1848 **udpv6**
1849 User datagram protocol V6.
1850 **unix**
1851 UNIX domain socket.
1852
1853 When the protocol is TCP or UDP, the port must also be given, as well as the
1854 hostname if the job is a TCP listener or UDP reader. For unix sockets, the
1855 normal filename option should be used and the port is invalid.
1856
1857.. option:: listen : [net]
1858
1859 For TCP network connections, tell fio to listen for incoming connections
1860 rather than initiating an outgoing connection. The :option:`hostname` must
1861 be omitted if this option is used.
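
 A sender/receiver pair over TCP could be sketched as below. The port number,
 sizes and job names are illustrative assumptions; note that the listening job
 omits ``hostname``, as required above::

     ; hypothetical TCP loopback test
     [global]
     ioengine=net
     port=8888
     protocol=tcp
     bs=4k
     size=1g

     [receiver]
     listen
     rw=read

     [sender]
     hostname=localhost
     ; give the receiver a moment to start listening
     startdelay=1
     rw=write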
1862
1863.. option:: pingpong : [net]
1864
1865 Normally a network writer will just continue writing data, and a network
 reader will just consume packets. If ``pingpong=1`` is set, a writer will
1867 send its normal payload to the reader, then wait for the reader to send the
1868 same payload back. This allows fio to measure network latencies. The
1869 submission and completion latencies then measure local time spent sending or
1870 receiving, and the completion latency measures how long it took for the
1871 other end to receive and send back. For UDP multicast traffic
1872 ``pingpong=1`` should only be set for a single reader when multiple readers
1873 are listening to the same address.
1874
1875.. option:: window_size : [net]
1876
1877 Set the desired socket buffer size for the connection.
1878
1879.. option:: mss : [net]
1880
1881 Set the TCP maximum segment size (TCP_MAXSEG).
1882
1883.. option:: donorname=str : [e4defrag]
1884
 File will be used as a block donor (swap extents between files).
1886
1887.. option:: inplace=int : [e4defrag]
1888
1889 Configure donor file blocks allocation strategy:
1890
1891 **0**
1892 Default. Preallocate donor's file on init.
1893 **1**
1894 Allocate space immediately inside defragment event, and free right
1895 after event.
1896
1897.. option:: clustername=str : [rbd]
1898
1899 Specifies the name of the Ceph cluster.
1900
1901.. option:: rbdname=str : [rbd]
1902
1903 Specifies the name of the RBD.
1904
1905.. option:: pool=str : [rbd]
1906
1907 Specifies the name of the Ceph pool containing RBD.
1908
1909.. option:: clientname=str : [rbd]
1910
1911 Specifies the username (without the 'client.' prefix) used to access the
1912 Ceph cluster. If the *clustername* is specified, the *clientname* shall be
1913 the full *type.id* string. If no type. prefix is given, fio will add
1914 'client.' by default.
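
 Put together, the rbd options might be used as in this sketch. The pool,
 image and client names are placeholders and must match an existing Ceph
 setup::

     ; hypothetical names: adjust to the local Ceph cluster
     [global]
     ioengine=rbd
     ; username without the 'client.' prefix
     clientname=admin
     pool=rbd
     rbdname=fio_test
     rw=randwrite
     bs=4k

     [rbd-iodepth32]
     iodepth=32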
1915
1916.. option:: skip_bad=bool : [mtd]
1917
1918 Skip operations against known bad blocks.
1919
1920.. option:: hdfsdirectory : [libhdfs]
1921
 libhdfs will create chunks in this HDFS directory.
1923
1924.. option:: chunk_size : [libhdfs]
1925
 The size of the chunk to use for each file.
1927
1928
1929I/O depth
1930~~~~~~~~~
1931
1932.. option:: iodepth=int
1933
1934 Number of I/O units to keep in flight against the file. Note that
1935 increasing *iodepth* beyond 1 will not affect synchronous ioengines (except
c60ebc45 1936 for small degrees when :option:`verify_async` is in use). Even async
1937 engines may impose OS restrictions causing the desired depth not to be
1938 achieved. This may happen on Linux when using libaio and not setting
9207a0cb 1939 :option:`direct`\=1, since buffered I/O is not async on that OS. Keep an
1940 eye on the I/O depth distribution in the fio output to verify that the
1941 achieved depth is as expected. Default: 1.
1942
1943.. option:: iodepth_batch_submit=int, iodepth_batch=int
1944
1945 This defines how many pieces of I/O to submit at once. It defaults to 1
1946 which means that we submit each I/O as soon as it is available, but can be
 raised to submit bigger batches of I/O at a time. If it is set to 0 the
1948 :option:`iodepth` value will be used.
1949
1950.. option:: iodepth_batch_complete_min=int, iodepth_batch_complete=int
1951
1952 This defines how many pieces of I/O to retrieve at once. It defaults to 1
1953 which means that we'll ask for a minimum of 1 I/O in the retrieval process
1954 from the kernel. The I/O retrieval will go on until we hit the limit set by
1955 :option:`iodepth_low`. If this variable is set to 0, then fio will always
1956 check for completed events before queuing more I/O. This helps reduce I/O
1957 latency, at the cost of more retrieval system calls.
1958
1959.. option:: iodepth_batch_complete_max=int
1960
 This defines the maximum number of pieces of I/O to retrieve at once. This
 variable should be used along with the
 :option:`iodepth_batch_complete_min`\=int variable, specifying the range
 of min and max amount of I/O which should be retrieved. By default it is
 equal to the :option:`iodepth_batch_complete_min` value.
1966
1967 Example #1::
1968
1969 iodepth_batch_complete_min=1
1970 iodepth_batch_complete_max=<iodepth>
1971
1972 which means that we will retrieve at least 1 I/O and up to the whole
 submitted queue depth. If no I/O has completed yet, we will wait.
1974
1975 Example #2::
1976
1977 iodepth_batch_complete_min=0
1978 iodepth_batch_complete_max=<iodepth>
1979
1980 which means that we can retrieve up to the whole submitted queue depth, but
 if no I/O has completed yet, we will NOT wait and immediately exit
1982 the system call. In this example we simply do polling.
1983
1984.. option:: iodepth_low=int
1985
1986 The low water mark indicating when to start filling the queue
1987 again. Defaults to the same as :option:`iodepth`, meaning that fio will
1988 attempt to keep the queue full at all times. If :option:`iodepth` is set to
c60ebc45 1989 e.g. 16 and *iodepth_low* is set to 4, then after fio has filled the queue of
1990 16 requests, it will let the depth drain down to 4 before starting to fill
1991 it again.
1992
1993.. option:: io_submit_mode=str
1994
1995 This option controls how fio submits the I/O to the I/O engine. The default
1996 is `inline`, which means that the fio job threads submit and reap I/O
1997 directly. If set to `offload`, the job threads will offload I/O submission
1998 to a dedicated pool of I/O threads. This requires some coordination and thus
1999 has a bit of extra overhead, especially for lower queue depth I/O where it
2000 can increase latencies. The benefit is that fio can manage submission rates
2001 independently of the device completion rates. This avoids skewed latency
 reporting if I/O gets backed up on the device side (the coordinated omission
2003 problem).
2004
2005
2006I/O rate
2007~~~~~~~~
2008
a881438b 2009.. option:: thinktime=time
f80dba8d 2010
2011 Stall the job for the specified period of time after an I/O has completed before issuing the
2012 next. May be used to simulate processing being done by an application.
947e0fe0 2013 When the unit is omitted, the value is interpreted in microseconds. See
2014 :option:`thinktime_blocks` and :option:`thinktime_spin`.
2015
a881438b 2016.. option:: thinktime_spin=time
2017
2018 Only valid if :option:`thinktime` is set - pretend to spend CPU time doing
2019 something with the data received, before falling back to sleeping for the
f75ede1d 2020 rest of the period specified by :option:`thinktime`. When the unit is
947e0fe0 2021 omitted, the value is interpreted in microseconds.
2022
2023.. option:: thinktime_blocks=int
2024
2025 Only valid if :option:`thinktime` is set - control how many blocks to issue,
2026 before waiting `thinktime` usecs. If not set, defaults to 1 which will make
2027 fio wait `thinktime` usecs after every block. This effectively makes any
2028 queue depth setting redundant, since no more than 1 I/O will be queued
2029 before we have to complete it and do our thinktime. In other words, this
2030 setting effectively caps the queue depth if the latter is larger.
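
 The three thinktime options could interact as in this sketch. All values,
 the job name and the file path are assumptions::

     ; hypothetical job imitating an application that processes its reads
     [app-like]
     rw=randread
     bs=4k
     size=1g
     filename=/data/fio-testfile
     ; stall 1000 usecs after every batch of 4 blocks
     thinktime=1000
     thinktime_blocks=4
     ; spend the first 200 usecs of each stall burning CPU
     thinktime_spin=200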
71bfa161 2031
f80dba8d 2032.. option:: rate=int[,int][,int]
71bfa161 2033
2034 Cap the bandwidth used by this job. The number is in bytes/sec, the normal
2035 suffix rules apply. Comma-separated values may be specified for reads,
2036 writes, and trims as described in :option:`blocksize`.
71bfa161 2037
f80dba8d 2038.. option:: rate_min=int[,int][,int]
71bfa161 2039
2040 Tell fio to do whatever it can to maintain at least this bandwidth. Failing
2041 to meet this requirement will cause the job to exit. Comma-separated values
2042 may be specified for reads, writes, and trims as described in
2043 :option:`blocksize`.
71bfa161 2044
f80dba8d 2045.. option:: rate_iops=int[,int][,int]
71bfa161 2046
2047 Cap the bandwidth to this number of IOPS. Basically the same as
2048 :option:`rate`, just specified independently of bandwidth. If the job is
2049 given a block size range instead of a fixed value, the smallest block size
2050 is used as the metric. Comma-separated values may be specified for reads,
2051 writes, and trims as described in :option:`blocksize`.
71bfa161 2052
f80dba8d 2053.. option:: rate_iops_min=int[,int][,int]
71bfa161 2054
2055 If fio doesn't meet this rate of I/O, it will cause the job to exit.
2056 Comma-separated values may be specified for reads, writes, and trims as
2057 described in :option:`blocksize`.
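
 As a sketch of the comma-separated form shared by the rate options, a mixed
 job could cap and floor reads and writes separately. The figures, job name
 and file path are assumptions::

     ; hypothetical mixed workload with per-direction limits
     [capped-mixed]
     rw=randrw
     bs=4k
     size=1g
     filename=/data/fio-testfile
     ; first value applies to reads, second to writes
     rate=1m,500k
     ; the job exits if it cannot sustain these minimums
     rate_min=256k,128k

 The same two-value layout applies to ``rate_iops`` and ``rate_iops_min``,
 with the numbers read as IOPS instead of bytes/sec.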
71bfa161 2058
f80dba8d 2059.. option:: rate_process=str
66c098b8 2060
 This option controls how fio manages rated I/O submissions. The default is
 `linear`, which submits I/O in a linear fashion with fixed delays between
 I/Os that get adjusted based on I/O completion rates. If this is set to
 `poisson`, fio will submit I/O based on a more real world random request
 flow, known as the Poisson process
 (https://en.wikipedia.org/wiki/Poisson_point_process). The lambda will be
 10^6 / IOPS for the given workload.
2068
2069
2070I/O latency
2071~~~~~~~~~~~
71bfa161 2072
a881438b 2073.. option:: latency_target=time
71bfa161 2074
f80dba8d 2075 If set, fio will attempt to find the max performance point that the given
f75ede1d 2076 workload will run at while maintaining a latency below this target. When
947e0fe0 2077 the unit is omitted, the value is interpreted in microseconds. See
f75ede1d 2078 :option:`latency_window` and :option:`latency_percentile`.
71bfa161 2079
a881438b 2080.. option:: latency_window=time
71bfa161 2081
f80dba8d 2082 Used with :option:`latency_target` to specify the sample window that the job
f75ede1d 2083 is run at varying queue depths to test the performance. When the unit is
947e0fe0 2084 omitted, the value is interpreted in microseconds.
b4692828 2085
f80dba8d 2086.. option:: latency_percentile=float
71bfa161 2087
c60ebc45 2088 The percentage of I/Os that must fall within the criteria specified by
f80dba8d 2089 :option:`latency_target` and :option:`latency_window`. If not set, this
 defaults to 100.0, meaning that all I/Os must be equal to or below the value
f80dba8d 2091 set by :option:`latency_target`.
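
 The three latency options are typically set together. The sketch below is
 only illustrative: the device path, queue depth and thresholds are
 assumptions::

     ; hypothetical latency profiling job
     [global]
     ioengine=libaio
     direct=1
     rw=randread
     bs=4k
     iodepth=128
     ; target latency of 500 msec, given in microseconds
     latency_target=500000
     ; evaluate over a 5 second window
     latency_window=5000000
     ; 99.9% of I/Os must meet the target
     latency_percentile=99.9

     [device]
     filename=/dev/sda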
71bfa161 2092
a881438b 2093.. option:: max_latency=time
71bfa161 2094
f75ede1d 2095 If set, fio will exit the job with an ETIMEDOUT error if it exceeds this
947e0fe0 2096 maximum latency. When the unit is omitted, the value is interpreted in
f75ede1d 2097 microseconds.
71bfa161 2098
f80dba8d 2099.. option:: rate_cycle=int
71bfa161 2100
f80dba8d 2101 Average bandwidth for :option:`rate` and :option:`rate_min` over this number
a47b697c 2102 of milliseconds. Defaults to 1000.
71bfa161 2103
71bfa161 2104
2105I/O replay
2106~~~~~~~~~~
71bfa161 2107
f80dba8d 2108.. option:: write_iolog=str
c2b1e753 2109
2110 Write the issued I/O patterns to the specified file. See
2111 :option:`read_iolog`. Specify a separate file for each job, otherwise the
2112 iologs will be interspersed and the file may be corrupt.
c2b1e753 2113
f80dba8d 2114.. option:: read_iolog=str
71bfa161 2115
2116 Open an iolog with the specified file name and replay the I/O patterns it
2117 contains. This can be used to store a workload and replay it sometime
2118 later. The iolog given may also be a blktrace binary file, which allows fio
2119 to replay a workload captured by :command:`blktrace`. See
2120 :manpage:`blktrace(8)` for how to capture such logging data. For blktrace
2121 replay, the file needs to be turned into a blkparse binary data file first
2122 (``blkparse <device> -o /dev/null -d file_for_fio.bin``).
71bfa161 2123
f80dba8d 2124.. option:: replay_no_stall=int
71bfa161 2125
2126 When replaying I/O with :option:`read_iolog` the default behavior is to
2127 attempt to respect the time stamps within the log and replay them with the
2128 appropriate delay between IOPS. By setting this variable fio will not
2129 respect the timestamps and attempt to replay them as fast as possible while
2130 still respecting ordering. The result is the same I/O pattern to a given
2131 device, but different timings.
71bfa161 2132
f80dba8d 2133.. option:: replay_redirect=str
b4692828 2134
2135 While replaying I/O patterns using :option:`read_iolog` the default behavior
2136 is to replay the IOPS onto the major/minor device that each IOP was recorded
2137 from. This is sometimes undesirable because on a different machine those
2138 major/minor numbers can map to a different device. Changing hardware on the
2139 same system can also result in a different major/minor mapping.
2140 ``replay_redirect`` causes all IOPS to be replayed onto the single specified
 device regardless of the device it was recorded from. For example,
 :option:`replay_redirect`\= :file:`/dev/sdc` would cause all I/O
 in the blktrace or iolog to be replayed onto :file:`/dev/sdc`. This means
2144 multiple devices will be replayed onto a single device, if the trace
2145 contains multiple devices. If you want multiple devices to be replayed
2146 concurrently to multiple redirected devices you must blkparse your trace
2147 into separate traces and replay them with independent fio invocations.
2148 Unfortunately this also breaks the strict time ordering between multiple
2149 device accesses.
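
 A replay job combining these options might look like the sketch below. The
 job name and queue depth are assumptions, and the trace file name is the one
 from the blkparse command shown for ``read_iolog``::

     ; hypothetical replay of a converted blktrace capture
     [replay]
     ioengine=libaio
     direct=1
     iodepth=16
     read_iolog=file_for_fio.bin
     ; ignore recorded timestamps, keep ordering
     replay_no_stall=1
     ; send every recorded I/O to one device (writes in the trace will
     ; overwrite data there)
     replay_redirect=/dev/sdc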
71bfa161 2150
f80dba8d 2151.. option:: replay_align=int
74929ac2 2152
2153 Force alignment of I/O offsets and lengths in a trace to this power of 2
2154 value.
3c54bc46 2155
f80dba8d 2156.. option:: replay_scale=int
3c54bc46 2157
f80dba8d 2158 Scale sector offsets down by this factor when replaying traces.
3c54bc46 2159
3c54bc46 2160
2161Threads, processes and job synchronization
2162~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3c54bc46 2163
f80dba8d 2164.. option:: thread
3c54bc46 2165
f80dba8d 2166 Fio defaults to forking jobs, however if this option is given, fio will use
2167 POSIX Threads function :manpage:`pthread_create(3)` to create threads instead
2168 of forking processes.
71bfa161 2169
f80dba8d 2170.. option:: wait_for=str
74929ac2 2171
 Specifies the name of the already defined job to wait for. Only a single
 waitee name may be specified. If set, the job won't be started until all
 workers of the waitee job are done.
74929ac2 2175
2176 ``wait_for`` operates on the job name basis, so there are a few
2177 limitations. First, the waitee must be defined prior to the waiter job
2178 (meaning no forward references). Second, if a job is being referenced as a
2179 waitee, it must have a unique name (no duplicate waitees).
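
 A two-job dependency could be sketched as follows. The job names and file
 path are assumptions; the waitee is defined first and has a unique name, per
 the limitations above::

     ; hypothetical layout: write the file, then measure reads
     [global]
     size=1g
     filename=/data/fio-testfile

     [prepare]
     rw=write
     bs=1m

     [measure]
     ; starts only after all workers of "prepare" are done
     wait_for=prepare
     rw=randread
     bs=4k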
74929ac2 2180
f80dba8d 2181.. option:: nice=int
892a6ffc 2182
f80dba8d 2183 Run the job with the given nice value. See man :manpage:`nice(2)`.
892a6ffc 2184
2185 On Windows, values less than -15 set the process class to "High"; -1 through
2186 -15 set "Above Normal"; 1 through 15 "Below Normal"; and above 15 "Idle"
2187 priority class.
74929ac2 2188
f80dba8d 2189.. option:: prio=int
71bfa161 2190
2191 Set the I/O priority value of this job. Linux limits us to a positive value
2192 between 0 and 7, with 0 being the highest. See man
2193 :manpage:`ionice(1)`. Refer to an appropriate manpage for other operating
2194 systems since meaning of priority may differ.
71bfa161 2195
f80dba8d 2196.. option:: prioclass=int
d59aa780 2197
f80dba8d 2198 Set the I/O priority class. See man :manpage:`ionice(1)`.
d59aa780 2199
f80dba8d 2200.. option:: cpumask=int
71bfa161 2201
 Set the CPU affinity of this job. The parameter given is a bitmask of
 allowed CPUs the job may run on. So if you want the allowed CPUs to be 1
 and 5, you would pass the decimal value of (1 << 1 | 1 << 5), or 34. See man
 :manpage:`sched_setaffinity(2)`. This may not work on all supported
 operating systems or kernel versions. This option doesn't work well for a
 higher CPU count than what you can store in an integer mask, so it can only
 control CPUs 1-32. For boxes with larger CPU counts, use
 :option:`cpus_allowed`.
6d500c2e 2210
f80dba8d 2211.. option:: cpus_allowed=str
6d500c2e 2212
 Controls the same options as :option:`cpumask`, but it allows a text setting
 of the permitted CPUs instead. So to use CPUs 1 and 5, you would specify
 ``cpus_allowed=1,5``. This option also allows a range of CPUs. Say you
 wanted a binding to CPUs 1, 5, and 8-15, you would set
 ``cpus_allowed=1,5,8-15``.
6d500c2e 2218
f80dba8d 2219.. option:: cpus_allowed_policy=str
6d500c2e 2220
2221 Set the policy of how fio distributes the CPUs specified by
2222 :option:`cpus_allowed` or cpumask. Two policies are supported:
6d500c2e 2223
2224 **shared**
2225 All jobs will share the CPU set specified.
2226 **split**
2227 Each job will get a unique CPU from the CPU set.
6d500c2e 2228
 **shared** is the default behaviour, if the option isn't specified. If
 **split** is specified, then fio will assign one CPU per job. If not
 enough CPUs are given for the jobs listed, then fio will round-robin the CPUs
 in the set.
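
 The affinity options could be combined as in this sketch. The job name, file
 path and job count are assumptions::

     ; hypothetical job: four workers pinned within CPUs 1, 5 and 8-15
     [global]
     ; for CPUs 1 and 5 only, cpumask=34 would be the bitmask equivalent
     cpus_allowed=1,5,8-15
     ; give each worker its own CPU from the set instead of sharing it
     cpus_allowed_policy=split
     rw=randread
     bs=4k
     size=1g
     filename=/data/fio-testfile

     [workers]
     numjobs=4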
6d500c2e 2233
f80dba8d 2234.. option:: numa_cpu_nodes=str
6d500c2e 2235
 Set this job running on specified NUMA nodes' CPUs. The arguments allow a
 comma-delimited list of CPU numbers, A-B ranges, or `all`. Note, to enable
 NUMA options support, fio must be built on a system with libnuma-dev(el)
 installed.
61b9861d 2240
f80dba8d 2241.. option:: numa_mem_policy=str
61b9861d 2242
2243 Set this job's memory policy and corresponding NUMA nodes. Format of the
2244 arguments::
5c94b008 2245
f80dba8d 2246 <mode>[:<nodelist>]
ce35b1ec 2247
 ``mode`` is one of the following memory policies: ``default``, ``prefer``,
 ``bind``, ``interleave``, ``local``. For the ``default`` and ``local``
 memory policies, no node needs to be specified. For ``prefer``, only one
 node is allowed. For ``bind`` and ``interleave``, a comma-delimited list
 of numbers, A-B ranges, or `all` is allowed.
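
 The two NUMA options are usually paired. In the sketch below the node
 numbers, device path and job names are assumptions, and fio must have been
 built with NUMA support::

     ; hypothetical two-node machine
     [global]
     ioengine=libaio
     direct=1
     rw=randread
     bs=512k
     iodepth=16
     size=512m
     filename=/dev/sdb1

     ; run on node 0 CPUs, allocate memory only from node 0
     [bound]
     numa_cpu_nodes=0
     numa_mem_policy=bind:0

     ; run on both nodes, interleave memory across them
     [interleaved]
     numa_cpu_nodes=0-1
     numa_mem_policy=interleave:0-1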
71bfa161 2253
f80dba8d 2254.. option:: cgroup=str
390b1537 2255
2256 Add job to this control group. If it doesn't exist, it will be created. The
2257 system must have a mounted cgroup blkio mount point for this to work. If
2258 your system doesn't have it mounted, you can do so with::
5af1c6f3 2259
f80dba8d 2260 # mount -t cgroup -o blkio none /cgroup
5af1c6f3 2261
f80dba8d 2262.. option:: cgroup_weight=int
5af1c6f3 2263
2264 Set the weight of the cgroup to this value. See the documentation that comes
2265 with the kernel, allowed values are in the range of 100..1000.
a086c257 2266
f80dba8d 2267.. option:: cgroup_nodelete=bool
8c07860d 2268
2269 Normally fio will delete the cgroups it has created after the job
2270 completion. To override this behavior and to leave cgroups around after the
2271 job completion, set ``cgroup_nodelete=1``. This can be useful if one wants
2272 to inspect various cgroup files after job completion. Default: false.
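
 The cgroup options might be used together as in this sketch, which assumes a
 mounted blkio cgroup hierarchy. The group names, weights and file paths are
 placeholders::

     ; hypothetical pair of jobs competing with different weights
     [global]
     ioengine=libaio
     direct=1
     iodepth=32
     rw=randread
     bs=4k
     size=1g
     ; keep the groups around for inspection afterwards
     cgroup_nodelete=1

     [low-weight]
     cgroup=low
     cgroup_weight=100
     filename=/data/fio-low

     [high-weight]
     cgroup=high
     cgroup_weight=1000
     filename=/data/fio-high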
8c07860d 2273
f80dba8d 2274.. option:: flow_id=int
8c07860d 2275
2276 The ID of the flow. If not specified, it defaults to being a global
2277 flow. See :option:`flow`.
1907dbc6 2278
f80dba8d 2279.. option:: flow=int
71bfa161 2280
2281 Weight in token-based flow control. If this value is used, then there is a
2282 'flow counter' which is used to regulate the proportion of activity between
2283 two or more jobs. Fio attempts to keep this flow counter near zero. The
2284 ``flow`` parameter stands for how much should be added or subtracted to the
2285 flow counter on each iteration of the main I/O loop. That is, if one job has
2286 ``flow=8`` and another job has ``flow=-1``, then there will be a roughly 1:8
2287 ratio in how much one runs vs the other.
71bfa161 2288
f80dba8d 2289.. option:: flow_watermark=int
a31041ea 2290
2291 The maximum value that the absolute value of the flow counter is allowed to
2292 reach before the job must wait for a lower value of the counter.
82407585 2293
f80dba8d 2294.. option:: flow_sleep=int
82407585 2295
2296 The period of time, in microseconds, to wait after the flow watermark has
2297 been exceeded before retrying operations.
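
 The flow options could be combined as in the following sketch, which reuses
 the 8 and -1 weights from the :option:`flow` description. The watermark,
 sleep time, job names and file path are assumptions::

     ; hypothetical pair of jobs sharing the global flow counter
     [global]
     thread
     time_based
     runtime=30
     direct=1
     ioengine=libaio
     iodepth=32
     bs=8k
     size=1g
     filename=/data/fio-testfile
     flow_watermark=100
     ; wait 1000 usecs once the watermark has been exceeded
     flow_sleep=1000

     [writer]
     rw=write
     flow=-1

     [reader]
     rw=randread
     flow=8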
82407585 2298
f80dba8d 2299.. option:: stonewall, wait_for_previous
82407585 2300
2301 Wait for preceding jobs in the job file to exit, before starting this
2302 one. Can be used to insert serialization points in the job file. A stone
2303 wall also implies starting a new reporting group, see
2304 :option:`group_reporting`.
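
 A serialization point could be sketched like this. The job names and file
 path are assumptions::

     ; hypothetical two-phase run
     [global]
     size=1g
     filename=/data/fio-testfile

     [write-phase]
     rw=write
     bs=1m

     [read-phase]
     ; do not start until every preceding job has exited
     stonewall
     rw=read
     bs=1m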
2305
2306.. option:: exitall
2307
2308 When one job finishes, terminate the rest. The default is to wait for each
 job to finish; sometimes that is not the desired action.
2310
2311.. option:: exec_prerun=str
2312
2313 Before running this job, issue the command specified through
 :manpage:`system(3)`. Output is redirected to a file called
2315 :file:`jobname.prerun.txt`.
2316
2317.. option:: exec_postrun=str
2318
 After the job completes, issue the command specified through
 :manpage:`system(3)`. Output is redirected to a file called
2321 :file:`jobname.postrun.txt`.
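
 The pre- and post-run hooks could be used as in this sketch. The commands,
 job name and file path are assumptions chosen for illustration::

     ; hypothetical job with hooks around it
     [hooked-read]
     ; flush dirty pages before the job starts
     exec_prerun=sync
     ; record the date once the job has completed
     exec_postrun=date
     rw=read
     bs=1m
     size=1g
     filename=/data/fio-testfile

 With this job name, the command output would be expected in
 :file:`hooked-read.prerun.txt` and :file:`hooked-read.postrun.txt`.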
2322
2323.. option:: uid=int
2324
2325 Instead of running as the invoking user, set the user ID to this value
2326 before the thread/process does any work.
2327
2328.. option:: gid=int
2329
2330 Set group ID, see :option:`uid`.
2331
2332
2333Verification
2334~~~~~~~~~~~~
2335
2336.. option:: verify_only
2337
2338 Do not perform specified workload, only verify data still matches previous
2339 invocation of this workload. This option allows one to check data multiple
2340 times at a later date without overwriting it. This option makes sense only
2341 for workloads that write data, and does not support workloads with the
2342 :option:`time_based` option set.
2343
2344.. option:: do_verify=bool
2345
2346 Run the verify phase after a write phase. Only valid if :option:`verify` is
2347 set. Default: true.
2348
2349.. option:: verify=str
2350
2351 If writing to a file, fio can verify the file contents after each iteration
 of the job. Each verification method also implies verification of a special
 header, which is written to the beginning of each block. This header also
2354 includes meta information, like offset of the block, block number, timestamp
2355 when block was written, etc. :option:`verify` can be combined with
2356 :option:`verify_pattern` option. The allowed values are:
2357
2358 **md5**
2359 Use an md5 sum of the data area and store it in the header of
2360 each block.
2361
2362 **crc64**
2363 Use an experimental crc64 sum of the data area and store it in the
2364 header of each block.
2365
2366 **crc32c**
2367 Use a crc32c sum of the data area and store it in the header of each
2368 block.
2369
2370 **crc32c-intel**
2371 Use hardware assisted crc32c calculation provided on SSE4.2 enabled
2372 processors. Falls back to regular software crc32c, if not supported
2373 by the system.
2374
2375 **crc32**
2376 Use a crc32 sum of the data area and store it in the header of each
2377 block.
2378
2379 **crc16**
2380 Use a crc16 sum of the data area and store it in the header of each
2381 block.
2382
2383 **crc7**
2384 Use a crc7 sum of the data area and store it in the header of each
2385 block.
2386
2387 **xxhash**
2388 Use xxhash as the checksum function. Generally the fastest software
2389 checksum that fio supports.
2390
2391 **sha512**
2392 Use sha512 as the checksum function.
2393
2394 **sha256**
2395 Use sha256 as the checksum function.
2396
2397 **sha1**
2398 Use optimized sha1 as the checksum function.
2399
2400 **sha3-224**
2401 Use optimized sha3-224 as the checksum function.
2402
2403 **sha3-256**
2404 Use optimized sha3-256 as the checksum function.
2405
2406 **sha3-384**
2407 Use optimized sha3-384 as the checksum function.
2408
2409 **sha3-512**
2410 Use optimized sha3-512 as the checksum function.
2411
2412 **meta**
2413 This option is deprecated, since meta information is now included in
2414 the generic verification header and meta verification happens by
2415 default. For detailed information see the description of the
2416 :option:`verify` setting. This option is kept for compatibility
2417 with old configurations. Do not use it.
2418
2419 **pattern**
2420 Verify a strict pattern. Normally fio includes a header with some
2421 basic information and checksumming, but if this option is set, only
2422 the specific pattern set with :option:`verify_pattern` is verified.
2423
2424 **null**
2425 Only pretend to verify. Useful for testing internals with
2426 :option:`ioengine`\=null, not for much else.
2427
2428 This option can be used for repeated burn-in tests of a system to make sure
2429 that the written data is also correctly read back. If the data direction
2430 given is a read or random read, fio will assume that it should verify a
2431 previously written file. If the data direction includes any form of write,
2432 the verify will be of the newly written data.
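
As a rough illustration of the write-then-verify flow described above, the
following job file sketch writes a small file and has fio read it back with
crc32c checksums. The job name, the ``/tmp/fio-verify.dat`` path and the sizes
are assumptions chosen for the example, not values mandated by fio.

```ini
; Hypothetical job file: write 64 MiB randomly, then verify it.
[global]
; Assumed scratch file; point this at a path that is safe to overwrite.
filename=/tmp/fio-verify.dat
size=64m
bs=4k

[write-and-verify]
rw=randwrite
; Store a crc32c checksum in the header of each block.
verify=crc32c
; Run the verify phase after the write phase (this is the default).
do_verify=1
; Stop the job on the first verification failure.
verify_fatal=1
```

Because the data direction here is a write, the verify pass checks the newly
written data. Running a read job with the same ``verify`` setting against the
same file would instead check the previously written contents.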
2433
2434.. option:: verifysort=bool
2435
2436 If true, fio will sort written verify blocks when it deems it faster to read
2437 them back in a sorted manner. This is often the case when overwriting an
2438 existing file, since the blocks are already laid out in the file system. You
2439 can ignore this option unless doing huge amounts of really fast I/O where
2440 the red-black tree sorting CPU time becomes significant. Default: true.
2441
2442.. option:: verifysort_nr=int
2443
2444 Pre-load and sort verify blocks for a read workload.
2445
2446.. option:: verify_offset=int
2447
2448 Swap the verification header with data somewhere else in the block before
2449 writing. It is swapped back before verifying.
2450
2451.. option:: verify_interval=int
2452
2453 Write the verification header at a finer granularity than the
2454 :option:`blocksize`. It will be written for chunks the size of
2455 ``verify_interval``. ``verify_interval`` should divide :option:`blocksize` evenly.
2456
2457.. option:: verify_pattern=str
2458
2459 If set, fio will fill the I/O buffers with this pattern. Fio defaults to
2460 filling with totally random bytes, but sometimes it's interesting to fill
2461 with a known pattern for I/O verification purposes. Depending on the width
2462 of the pattern, fio will fill 1/2/3/4 bytes of the buffer at a time (it can
2463 be either a decimal or a hex number). If ``verify_pattern`` is larger than
2464 a 32-bit quantity, it has to be a hex number that starts with either "0x" or
2465 "0X". Use with :option:`verify`. Also, ``verify_pattern`` supports the %o
2466 format, which means that for each block the offset will be written and then
2467 verified back, e.g.::
2468
2469 verify_pattern=%o
2470
2471 Or use a combination of everything::
2472
2473 verify_pattern=0xff%o"abcd"-12
2474
2475.. option:: verify_fatal=bool
2476
2477 Normally fio will keep checking the entire contents before quitting on a
2478 block verification failure. If this option is set, fio will exit the job on
2479 the first observed failure. Default: false.
2480
2481.. option:: verify_dump=bool
2482
2483 If set, dump the contents of both the original data block and the data block
2484 we read off disk to files. This allows later analysis to inspect just what
2485 kind of data corruption occurred. Off by default.
2486
2487.. option:: verify_async=int
2488
2489 Fio will normally verify I/O inline from the submitting thread. This option
2490 takes an integer describing how many async offload threads to create for I/O
2491 verification instead, causing fio to offload the duty of verifying I/O
2492 contents to one or more separate threads. If using this offload option, even
2493 sync I/O engines can benefit from using an :option:`iodepth` setting higher
2494 than 1, as it allows them to have I/O in flight while verifies are running.
2495 Defaults to 0 async threads, i.e. verification is not asynchronous.
2496
2497.. option:: verify_async_cpus=str
2498
2499 Tell fio to set the given CPU affinity on the async I/O verification
2500 threads. See :option:`cpus_allowed` for the format used.
2501
2502.. option:: verify_backlog=int
2503
2504 Fio will normally verify the written contents of a job that utilizes verify
2505 once that job has completed. In other words, everything is written then
2506 everything is read back and verified. You may want to verify continually
2507 instead for a variety of reasons. Fio stores the meta data associated with
2508 an I/O block in memory, so for large verify workloads, quite a bit of memory
2509 would be used up holding this meta data. If this option is enabled, fio will
2510 write only N blocks before verifying these blocks.
2511
2512.. option:: verify_backlog_batch=int
2513
2514 Control how many blocks fio will verify if :option:`verify_backlog` is
2515 set. If not set, will default to the value of :option:`verify_backlog`
2516 (meaning the entire queue is read back and verified). If
2517 ``verify_backlog_batch`` is less than :option:`verify_backlog` then not all
2518 blocks will be verified; if ``verify_backlog_batch`` is larger than
2519 :option:`verify_backlog`, some blocks will be verified more than once.
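
The continual-verify behaviour described for :option:`verify_backlog` can be
sketched with a job file like the one below. The job name, file path, size and
the backlog value of 1024 are illustrative assumptions only.

```ini
; Hypothetical job: verify in batches instead of after the whole write phase.
[backlog-verify]
; Assumed scratch file.
filename=/tmp/fio-verify.dat
size=1g
bs=4k
rw=write
verify=md5
; Write 1024 blocks, then verify them, and repeat.
verify_backlog=1024
; verify_backlog_batch is left unset, so it defaults to verify_backlog
; and the entire queue is read back and verified each time.
```

Keeping the backlog small bounds the amount of per-block meta data fio has to
hold in memory, at the cost of interleaving reads with the write stream.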
2520
2521.. option:: verify_state_save=bool
2522
2523 When a job exits during the write phase of a verify workload, save its
2524 current state. This allows fio to replay up until that point, if the verify
2525 state is loaded for the verify read phase. The format of the filename is,
2526 roughly::
2527
2528 <type>-<jobname>-<jobindex>-verify.state.
2529
2530 <type> is "local" for a local run, "sock" for a client/server socket
2531 connection, and "ip" (192.168.0.1, for instance) for a networked
2532 client/server connection. Defaults to true.
2533
2534.. option:: verify_state_load=bool
2535
2536 If a verify termination trigger was used, fio stores the current write state
2537 of each thread. This can be used at verification time so that fio knows how
2538 far it should verify. Without this information, fio will run a full
2539 verification pass, according to the settings in the job file used. Default:
2540 false.
2541
2542.. option:: trim_percentage=int
2543
2544 Number of verify blocks to discard/trim.
2545
2546.. option:: trim_verify_zero=bool
2547
2548 Verify that trim/discarded blocks are returned as zeroes.
2549
2550.. option:: trim_backlog=int
2551
2552 Trim after this number of blocks are written.
2553
2554.. option:: trim_backlog_batch=int
2555
2556 Trim this number of I/O blocks.
2557
2558.. option:: experimental_verify=bool
2559
2560 Enable experimental verification.
2561
2562
2563Steady state
2564~~~~~~~~~~~~
2565
2566.. option:: steadystate=str:float, ss=str:float
2567
2568 Define the criterion and limit for assessing steady state performance. The
2569 first parameter designates the criterion whereas the second parameter sets
2570 the threshold. When the criterion falls below the threshold for the
2571 specified duration, the job will stop. For example, `iops_slope:0.1%` will
2572 direct fio to terminate the job when the least squares regression slope
2573 falls below 0.1% of the mean IOPS. If :option:`group_reporting` is enabled
2574 this will apply to all jobs in the group. Below is the list of available
2575 steady state assessment criteria. All assessments are carried out using only
2576 data from the rolling collection window. Threshold limits can be expressed
2577 as a fixed value or as a percentage of the mean in the collection window.
2578
2579 **iops**
2580 Collect IOPS data. Stop the job if all individual IOPS measurements
2581 are within the specified limit of the mean IOPS (e.g., ``iops:2``
2582 means that all individual IOPS values must be within 2 of the mean,
2583 whereas ``iops:0.2%`` means that all individual IOPS values must be
2584 within 0.2% of the mean IOPS to terminate the job).
2585
2586 **iops_slope**
2587 Collect IOPS data and calculate the least squares regression
2588 slope. Stop the job if the slope falls below the specified limit.
2589
2590 **bw**
2591 Collect bandwidth data. Stop the job if all individual bandwidth
2592 measurements are within the specified limit of the mean bandwidth.
2593
2594 **bw_slope**
2595 Collect bandwidth data and calculate the least squares regression
2596 slope. Stop the job if the slope falls below the specified limit.
2597
2598.. option:: steadystate_duration=time, ss_dur=time
2599
2600 A rolling window of this duration will be used to judge whether steady state
2601 has been reached. Data will be collected once per second. The default is 0
2602 which disables steady state detection. When the unit is omitted, the
2603 value is interpreted in seconds.
2604
2605.. option:: steadystate_ramp_time=time, ss_ramp=time
2606
2607 Allow the job to run for the specified duration before beginning data
2608 collection for checking the steady state job termination criterion. The
2609 default is 0. When the unit is omitted, the value is interpreted in seconds.
2610
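
To show how the three steady state options fit together, here is a minimal job
file sketch. The job name, file path, runtime and window lengths are assumed
values for illustration; only the option names come from the descriptions
above.

```ini
; Hypothetical job: stop early once random read IOPS level off.
[steady-randread]
; Assumed scratch file.
filename=/tmp/fio-ss.dat
size=1g
rw=randread
bs=4k
; Upper bound on the run if steady state is never reached.
runtime=600
time_based
; Criterion and threshold: least squares IOPS slope below 0.1% of the mean.
steadystate=iops_slope:0.1%
; Judge the criterion over a rolling 30 second window.
steadystate_duration=30
; Ignore the first 10 seconds before collecting data.
steadystate_ramp_time=10
```

With these settings the job would end either when the slope criterion holds
over the 30 second window or when the 600 second runtime expires, whichever
comes first.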
2611
2612Measurements and reporting
2613~~~~~~~~~~~~~~~~~~~~~~~~~~
2614
2615.. option:: per_job_logs=bool
2616
2617 If set, this generates bw/clat/iops logs with private, per-job filenames. If
2618 not set, jobs with identical names will share the log filename. Default:
2619 true.
2620
2621.. option:: group_reporting
2622
2623 It may sometimes be interesting to display statistics for groups of jobs as
2624 a whole instead of for each individual job. This is especially true if
2625 :option:`numjobs` is used; looking at individual thread/process output
2626 quickly becomes unwieldy. To see the final report per-group instead of
2627 per-job, use :option:`group_reporting`. Jobs in a file will be part of the
2628 same reporting group, unless separated by a :option:`stonewall`, or by
2629 using :option:`new_group`.
2630
2631.. option:: new_group
2632
2633 Start a new reporting group. See: :option:`group_reporting`. If not given,
2634 all jobs in a file will be part of the same reporting group, unless
2635 separated by a :option:`stonewall`.
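
A small sketch of how :option:`group_reporting` and :option:`new_group` might
be combined is shown below. The section names, directory and sizes are
assumptions made up for the example.

```ini
; Hypothetical job file: two reporting groups of four jobs each.
[global]
; Assumed scratch directory.
directory=/tmp
size=64m
bs=4k

[readers]
rw=randread
numjobs=4
; Report the four reader clones as one group.
group_reporting

[writers]
; Start a second reporting group for the writers.
new_group
rw=randwrite
numjobs=4
group_reporting
```

Without ``new_group`` (or a ``stonewall``), the readers and writers would be
part of the same reporting group.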
2636
2637.. option:: stats
2638
2639 By default, fio collects and shows final output results for all jobs
2640 that run. If this option is set to 0, then fio will ignore this job in
2641 the final stat output.
2642
2643.. option:: write_bw_log=str
2644
2645 If given, write a bandwidth log for this job. Can be used to store data of
2646 the bandwidth of the jobs in their lifetime. The included
2647 :command:`fio_generate_plots` script uses :command:`gnuplot` to turn these
2648 text files into nice graphs. See :option:`write_lat_log` for behaviour of
2649 given filename. For this option, the postfix is :file:`_bw.x.log`, where `x`
2650 is the index of the job (`1..N`, where `N` is the number of jobs). If
2651 :option:`per_job_logs` is false, then the filename will not include the job
2652 index. See `Log File Formats`_.
2653
2654.. option:: write_lat_log=str
2655
2656 Same as :option:`write_bw_log`, except that this option stores I/O
2657 submission, completion, and total latencies instead. If no filename is given
2658 with this option, the default filename of :file:`jobname_type.log` is
2659 used. Even if the filename is given, fio will still append the type of
2660 log. So if one specifies::
2661
2662 write_lat_log=foo
2663
2664 The actual log names will be :file:`foo_slat.x.log`, :file:`foo_clat.x.log`,
2665 and :file:`foo_lat.x.log`, where `x` is the index of the job (1..N, where N
2666 is the number of jobs). This helps :command:`fio_generate_plots` find the
2667 logs automatically. If :option:`per_job_logs` is false, then the filename
2668 will not include the job index. See `Log File Formats`_.
2669
2670.. option:: write_hist_log=str
2671
2672 Same as :option:`write_lat_log`, but writes I/O completion latency
2673 histograms. If no filename is given with this option, the default filename
2674 of :file:`jobname_clat_hist.x.log` is used, where `x` is the index of the
2675 job (1..N, where `N` is the number of jobs). Even if the filename is given,
2676 fio will still append the type of log. If :option:`per_job_logs` is false,
2677 then the filename will not include the job index. See `Log File Formats`_.
2678
2679.. option:: write_iops_log=str
2680
2681 Same as :option:`write_bw_log`, but writes IOPS. If no filename is given
2682 with this option, the default filename of :file:`jobname_type.x.log` is
2683 used, where `x` is the index of the job (1..N, where `N` is the number of
2684 jobs). Even if the filename is given, fio will still append the type of
2685 log. If :option:`per_job_logs` is false, then the filename will not include
2686 the job index. See `Log File Formats`_.
2687
2688.. option:: log_avg_msec=int
2689
2690 By default, fio will log an entry in the iops, latency, or bw log for every
2691 I/O that completes. When writing to the disk log, that can quickly grow to a
2692 very large size. Setting this option makes fio average each log entry
2693 over the specified period of time, reducing the resolution of the log. See
2694 :option:`log_max_value` as well. Defaults to 0, logging all entries.
2695
2696.. option:: log_hist_msec=int
2697
2698 Same as :option:`log_avg_msec`, but logs entries for completion latency
2699 histograms. Computing latency percentiles from averages of intervals using
2700 :option:`log_avg_msec` is inaccurate. Setting this option makes fio log
2701 histogram entries over the specified period of time, reducing log sizes for
2702 high IOPS devices while retaining percentile accuracy. See
2703 :option:`log_hist_coarseness` as well. Defaults to 0, meaning histogram
2704 logging is disabled.
2705
2706.. option:: log_hist_coarseness=int
2707
2708 Integer ranging from 0 to 6, defining the coarseness of the resolution of
2709 the histogram logs enabled with :option:`log_hist_msec`. For each increment
2710 in coarseness, fio outputs half as many bins. Defaults to 0, for which
2711 histogram logs contain 1216 latency bins. See `Log File Formats`_.
2712
2713.. option:: log_max_value=bool
2714
2715 If :option:`log_avg_msec` is set, fio logs the average over that window. If
2716 you instead want to log the maximum value, set this option to 1. Defaults to
2717 0, meaning that averaged values are logged.
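
The logging options above are typically used together. The sketch below
assumes a log prefix of ``run1``, a scratch file path and a one second
averaging window; none of these values come from fio's defaults.

```ini
; Hypothetical job: bandwidth, latency and IOPS logs averaged per second.
[logged-job]
; Assumed scratch file.
filename=/tmp/fio-log.dat
size=256m
rw=randwrite
bs=4k
; "run1" is an assumed prefix; fio appends the log type and job index.
write_bw_log=run1
write_lat_log=run1
write_iops_log=run1
; Average each log entry over 1000 ms instead of logging every I/O.
log_avg_msec=1000
; Log the average over each window rather than the maximum.
log_max_value=0
```

Following the naming rules described for :option:`write_bw_log` and
:option:`write_lat_log`, this would be expected to produce files along the
lines of ``run1_bw.1.log`` and ``run1_clat.1.log``.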
2718
2719.. option:: log_offset=int
2720
2721 If this is set, the iolog options will include the byte offset for the I/O
2722 entry as well as the other data values.
2723
2724.. option:: log_compression=int
2725
2726 If this is set, fio will compress the I/O logs as it goes, to keep the
2727 memory footprint lower. When a log reaches the specified size, that chunk is
2728 removed and compressed in the background. Given that I/O logs are fairly
2729 highly compressible, this yields a nice memory savings for longer runs. The
2730 downside is that the compression will consume some background CPU cycles, so
2731 it may impact the run. This, however, is also true if the logging ends up
2732 consuming most of the system memory. So pick your poison. The I/O logs are
2733 saved normally at the end of a run, by decompressing the chunks and storing
2734 them in the specified log file. This feature depends on the availability of
2735 zlib.
2736
2737.. option:: log_compression_cpus=str
2738
2739 Define the set of CPUs that are allowed to handle online log compression for
2740 the I/O jobs. This can provide better isolation between performance
2741 sensitive jobs, and background compression work.
2742
2743.. option:: log_store_compressed=bool
2744
2745 If set, fio will store the log files in a compressed format. They can be
2746 decompressed with fio, using the :option:`--inflate-log` command line
2747 parameter. The files will be stored with a :file:`.fz` suffix.
2748
2749.. option:: log_unix_epoch=bool
2750
2751 If set, fio will log Unix timestamps to the log files produced by enabling
2752 write_type_log for each log type, instead of the default zero-based
2753 timestamps.
2754
2755.. option:: block_error_percentiles=bool
2756
2757 If set, record errors in trim block-sized units from writes and trims and
2758 output a histogram of how many trims it took to get to errors, and what kind
2759 of error was encountered.
2760
2761.. option:: bwavgtime=int
2762
2763 Average the calculated bandwidth over the given time. Value is specified in
2764 milliseconds. If the job also does bandwidth logging through
2765 :option:`write_bw_log`, then the minimum of this option and
2766 :option:`log_avg_msec` will be used. Default: 500ms.
2767
2768.. option:: iopsavgtime=int
2769
2770 Average the calculated IOPS over the given time. Value is specified in
2771 milliseconds. If the job also does IOPS logging through
2772 :option:`write_iops_log`, then the minimum of this option and
2773 :option:`log_avg_msec` will be used. Default: 500ms.
2774
2775.. option:: disk_util=bool
2776
2777 Generate disk utilization statistics, if the platform supports it.
2778 Default: true.
2779
2780.. option:: disable_lat=bool
2781
2782 Disable measurements of total latency numbers. Useful only for cutting back
2783 the number of calls to :manpage:`gettimeofday(2)`, as that does impact
2784 performance at really high IOPS rates. Note that to really get rid of a
2785 large amount of these calls, this option must be used with
2786 :option:`disable_slat` and :option:`disable_bw_measurement` as well.
2787
2788.. option:: disable_clat=bool
2789
2790 Disable measurements of completion latency numbers. See
2791 :option:`disable_lat`.
2792
2793.. option:: disable_slat=bool
2794
2795 Disable measurements of submission latency numbers. See
2796 :option:`disable_lat`.
2797
2798.. option:: disable_bw_measurement=bool, disable_bw=bool
2799
2800 Disable measurements of throughput/bandwidth numbers. See
2801 :option:`disable_lat`.
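
As noted under :option:`disable_lat`, the ``disable_*`` options only pay off
when used together. A minimal sketch, with an assumed job name, file path and
sizes:

```ini
; Hypothetical job: trim timing overhead for a very high IOPS run.
[max-iops]
; Assumed scratch file.
filename=/tmp/fio-iops.dat
size=1g
rw=randread
bs=4k
; Turn off total, completion and submission latency measurements.
disable_lat=1
disable_clat=1
disable_slat=1
; Turn off throughput/bandwidth measurements as well.
disable_bw_measurement=1
```

The trade-off is that the final report will lack the corresponding latency
and bandwidth statistics for this job.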
2802
2803.. option:: clat_percentiles=bool
2804
2805 Enable the reporting of percentiles of completion latencies.
2806
2807.. option:: percentile_list=float_list
2808
2809 Overwrite the default list of percentiles for completion latencies and the
2810 block error histogram. Each number is a floating point number in the range
2811 (0,100], and the maximum length of the list is 20. Use ``:`` to separate the
2812 numbers, and list the numbers in ascending order. For example,
2813 ``--percentile_list=99.5:99.9`` will cause fio to report the values of
2814 completion latency below which 99.5% and 99.9% of the observed latencies
2815 fell, respectively.
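
In a job file the same list is written without the leading dashes. The job
name, file path, sizes and the particular percentiles below are assumptions
for illustration.

```ini
; Hypothetical job: report a custom set of completion latency percentiles.
[latency-percentiles]
; Assumed scratch file.
filename=/tmp/fio-lat.dat
size=256m
rw=randread
bs=4k
clat_percentiles=1
; Ascending order, ':' separated, each value in (0,100], at most 20 entries.
percentile_list=50:90:99:99.9
```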
2816
2817
2818Error handling
2819~~~~~~~~~~~~~~
2820
2821.. option:: exitall_on_error
2822
2823 When one job finishes in error, terminate the rest. The default is to wait
2824 for each job to finish.
2825
2826.. option:: continue_on_error=str
2827
2828 Normally fio will exit the job on the first observed failure. If this option
2829 is set, fio will continue the job when there is a 'non-fatal error' (EIO or
2830 EILSEQ) until the runtime is exceeded or the I/O size specified is
2831 completed. If this option is used, there are two more stats that are
2832 appended, the total error count and the first error. The error field given
2833 in the stats is the first error that was hit during the run.
2834
2835 The allowed values are:
2836
2837 **none**
2838 Exit on any I/O or verify errors.
2839
2840 **read**
2841 Continue on read errors, exit on all others.
2842
2843 **write**
2844 Continue on write errors, exit on all others.
2845
2846 **io**
2847 Continue on any I/O error, exit on all others.
2848
2849 **verify**
2850 Continue on verify errors, exit on all others.
2851
2852 **all**
2853 Continue on all errors.
2854
2855 **0**
2856 Backward-compatible alias for 'none'.
2857
2858 **1**
2859 Backward-compatible alias for 'all'.
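
For example, a job that should keep running past verification mismatches but
still stop on I/O errors might look like the sketch below. The job name, file
path and sizes are assumed.

```ini
; Hypothetical job: tolerate verify errors, stop on everything else.
[tolerant-verify]
; Assumed scratch file.
filename=/tmp/fio-err.dat
size=256m
rw=randwrite
bs=4k
verify=crc32c
; Continue on verify errors, exit on all others.
continue_on_error=verify
```

With this setting the final stats would also carry the total error count and
the first error that was hit, as described above.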
2860
2861.. option:: ignore_error=str
2862
2863 Sometimes you want to ignore some errors during a test. In that case you can
2864 specify an error list for each error type, instead of only being able to
2865 ignore the default 'non-fatal error' using :option:`continue_on_error`.
2866 The format is ``ignore_error=READ_ERR_LIST,WRITE_ERR_LIST,VERIFY_ERR_LIST``;
2867 errors within a given error type are separated with ':'. An error may be a
2868 symbol ('ENOSPC', 'ENOMEM') or an integer. Example::
2869
2870 ignore_error=EAGAIN,ENOSPC:122
2871
2872 This option will ignore EAGAIN from READ, and ENOSPC and 122 (EDQUOT) from
2873 WRITE. This option works by overriding :option:`continue_on_error` with
2874 the list of errors for each error type if any.
2875
2876.. option:: error_dump=bool
2877
2878 If set, dump every error even if it is non-fatal. True by default. If
2879 disabled, only fatal errors will be dumped.
2880
2881Running predefined workloads
2882----------------------------
2883
2884Fio includes predefined profiles that mimic the I/O workloads generated by
2885other tools.
2886
2887.. option:: profile=str
2888
2889 The predefined workload to run. Current profiles are:
2890
2891 **tiobench**
2892 Threaded I/O bench (tiotest/tiobench) like workload.
2893
2894 **act**
2895 Aerospike Certification Tool (ACT) like workload.
2896
2897To view a profile's additional options use :option:`--cmdhelp` after specifying
2898the profile. For example::
2899
2900 $ fio --profile=act --cmdhelp
2901
2902Act profile options
2903~~~~~~~~~~~~~~~~~~~
2904
2905.. option:: device-names=str
2906 :noindex:
2907
2908 Devices to use.
2909
2910.. option:: load=int
2911 :noindex:
2912
2913 ACT load multiplier. Default: 1.
2914
2915.. option:: test-duration=time
2916 :noindex:
2917
2918 How long the entire test takes to run. When the unit is omitted, the value
2919 is given in seconds. Default: 24h.
2920
2921.. option:: threads-per-queue=int
2922 :noindex:
2923
2924 Number of read IO threads per device. Default: 8.
2925
2926.. option:: read-req-num-512-blocks=int
2927 :noindex:
2928
2929 Number of 512B blocks to read at a time. Default: 3.
2930
2931.. option:: large-block-op-kbytes=int
2932 :noindex:
2933
2934 Size of large block ops in KiB (writes). Default: 131072.
2935
2936.. option:: prep
2937 :noindex:
2938
2939 Set to run ACT prep phase.
2940
2941Tiobench profile options
2942~~~~~~~~~~~~~~~~~~~~~~~~
2943
2944.. option:: size=str
2945 :noindex:
2946
2947 Size in MiB
2948
2949.. option:: block=int
2950 :noindex:
2951
2952 Block size in bytes. Default: 4096.
2953
2954.. option:: numruns=int
2955 :noindex:
2956
2957 Number of runs.
2958
2959.. option:: dir=str
2960 :noindex:
2961
2962 Test directory.
2963
2964.. option:: threads=int
2965 :noindex:
2966
2967 Number of threads.
2968
2969Interpreting the output
2970-----------------------
2971
2972Fio spits out a lot of output. While running, fio will display the status of the
2973jobs created. An example of that would be::
2974
2975 Jobs: 1 (f=1): [_(1),M(1)][24.8%][r=20.5MiB/s,w=23.5MiB/s][r=82,w=94 IOPS][eta 01m:31s]
2976
2977The characters inside the square brackets denote the current status of each
2978thread. The possible values (in typical life cycle order) are:
2979
2980+------+-----+-----------------------------------------------------------+
2981| Idle | Run |                                                           |
2982+======+=====+===========================================================+
2983| P    |     | Thread setup, but not started.                            |
2984+------+-----+-----------------------------------------------------------+
2985| C    |     | Thread created.                                           |
2986+------+-----+-----------------------------------------------------------+
2987| I    |     | Thread initialized, waiting or generating necessary data. |
2988+------+-----+-----------------------------------------------------------+
2989|      | p   | Thread running pre-reading file(s).                       |
2990+------+-----+-----------------------------------------------------------+
2991|      | R   | Running, doing sequential reads.                          |
2992+------+-----+-----------------------------------------------------------+
2993|      | r   | Running, doing random reads.                              |
2994+------+-----+-----------------------------------------------------------+
2995|      | W   | Running, doing sequential writes.                         |
2996+------+-----+-----------------------------------------------------------+
2997|      | w   | Running, doing random writes.                             |
2998+------+-----+-----------------------------------------------------------+
2999|      | M   | Running, doing mixed sequential reads/writes.             |
3000+------+-----+-----------------------------------------------------------+
3001|      | m   | Running, doing mixed random reads/writes.                 |
3002+------+-----+-----------------------------------------------------------+
3003|      | F   | Running, currently waiting for :manpage:`fsync(2)`        |
3004+------+-----+-----------------------------------------------------------+
3005|      | V   | Running, doing verification of written data.              |
3006+------+-----+-----------------------------------------------------------+
3007| E    |     | Thread exited, not reaped by main thread yet.             |
3008+------+-----+-----------------------------------------------------------+
3009| _    |     | Thread reaped, or                                         |
3010+------+-----+-----------------------------------------------------------+
3011| X    |     | Thread reaped, exited with an error.                      |
3012+------+-----+-----------------------------------------------------------+
3013| K    |     | Thread reaped, exited due to signal.                      |
3014+------+-----+-----------------------------------------------------------+
3015
3016Fio will condense the thread string so as not to take up more space on the
3017command line than is needed. For instance, if you have 10 readers and 10 writers
3018running, the output would look like this::
3019
3020 Jobs: 20 (f=20): [R(10),W(10)][4.0%][r=20.5MiB/s,w=23.5MiB/s][r=82,w=94 IOPS][eta 57m:36s]
3021
3022Fio will still maintain the ordering, though. So the above means that jobs 1..10
3023are readers, and 11..20 are writers.
3024
3025The other values are fairly self explanatory -- number of threads currently
3026running and doing I/O, the number of currently open files (f=), the rate of I/O
3027since last check (read speed listed first, then write speed and optionally trim
3028speed), and the estimated completion percentage and time for the current
3029running group. It's impossible to estimate runtime of the following groups (if
3030any). Note that the string is displayed in order, so it's possible to tell which
3031of the jobs are currently doing what. The first character is the first job
3032defined in the job file, and so forth.
3033
3034When fio is done (or interrupted by :kbd:`ctrl-c`), it will show the data for
3035each thread, group of threads, and disks in that order. For each data direction,
3036the output looks like::
3037
3038 Client1 (g=0): err= 0:
3039 write: io= 32MiB, bw= 666KiB/s, iops=89 , runt= 50320msec
3040 slat (msec): min= 0, max= 136, avg= 0.03, stdev= 1.92
3041 clat (msec): min= 0, max= 631, avg=48.50, stdev=86.82
3042 bw (KiB/s) : min= 0, max= 1196, per=51.00%, avg=664.02, stdev=681.68
3043 cpu : usr=1.49%, sys=0.25%, ctx=7969, majf=0, minf=17
3044 IO depths : 1=0.1%, 2=0.3%, 4=0.5%, 8=99.0%, 16=0.0%, 32=0.0%, >32=0.0%
3045 submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
3046 complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
3047 issued r/w: total=0/32768, short=0/0
3048 lat (msec): 2=1.6%, 4=0.0%, 10=3.2%, 20=12.8%, 50=38.4%, 100=24.8%,
3049 lat (msec): 250=15.2%, 500=0.0%, 750=0.0%, 1000=0.0%, >=2048=0.0%
3050
3051The client number is printed, along with the group id and error of that
3052thread. Below are the I/O statistics, here for writes. In the order listed, they
3053denote:
3054
3055**io**
3056 Number of megabytes of I/O performed.
3057
3058**bw**
3059 Average bandwidth rate.
3060
3061**iops**
3062 Average I/Os performed per second.
3063
3064**runt**
3065 The runtime of that thread.
3066
3067**slat**
3068 Submission latency (avg being the average, stdev being the standard
3069 deviation). This is the time it took to submit the I/O. For sync I/O,
3070 the slat is really the completion latency, since queue/complete is one
3071 operation there. This value can be in milliseconds or microseconds, fio
3072 will choose the most appropriate base and print that. In the example
3073 above, milliseconds is the best scale. Note: in :option:`--minimal` mode
3074 latencies are always expressed in microseconds.
3075
3076**clat**
3077 Completion latency. Same names as slat; this denotes the time from
3078 submission to completion of the I/O pieces. For sync I/O, clat will
3079 usually be equal (or very close) to 0, as the time from submit to
3080 complete is basically just CPU time (I/O has already been done, see slat
3081 explanation).
3082
3083**bw**
3084 Bandwidth. Same names as the xlat stats, but also includes an
3085 approximate percentage of total aggregate bandwidth this thread received
3086 in this group. This last value is only really useful if the threads in
3087 this group are on the same disk, since they are then competing for disk
3088 access.
3089
3090**cpu**
3091 CPU usage. User and system time, along with the number of context
3092 switches this thread went through, and finally the number of major and
3093 minor page faults. The CPU utilization numbers are averages for the jobs
3094 in that reporting group, while the context and fault counters are
3095 summed.
3096
3097**IO depths**
3098 The distribution of I/O depths over the job life time. The numbers are
3099 divided into powers of 2, so for example the 16= entries includes depths
3100 up to that value but higher than the previous entry. In other words, it
3101 covers the range from 16 to 31.

**IO submit**
    How many pieces of I/O were submitted in a single submit call. Each
    entry denotes that amount and below, until the previous entry -- e.g.,
    8=100% means that we submitted anywhere between 5 and 8 I/Os per submit
    call.

**IO complete**
    Like the above submit number, but for completions instead.

**IO issued**
    The number of read/write requests issued, and how many of them were
    short.

**IO latencies**
    The distribution of I/O completion latencies. This is the time from when
    I/O leaves fio to when it gets completed. The numbers follow the same
    pattern as the I/O depths, meaning that 2=1.6% means that 1.6% of the
    I/O completed within 2 msecs, 20=12.8% means that 12.8% of the I/O took
    more than 10 msecs, but less than (or equal to) 20 msecs.
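
The latency entries can be read as upper bounds. The sketch below, which is only an illustration and not fio code, finds the millisecond entry that would count a given completion latency. The name ``latency_bucket`` is hypothetical, and the list of bounds is an assumption taken from the millisecond field list of the terse output; sub-millisecond latencies, which fio reports in separate microsecond entries, are not handled.

```python
import bisect

# Upper bounds (in msec) of the millisecond latency entries.
# Assumption: taken from the terse "I/O latencies milliseconds" field list.
MSEC_BOUNDS = [2, 4, 10, 20, 50, 100, 250, 500, 750, 1000, 2000]


def latency_bucket(msec):
    """Return the label of the entry that counts a latency of `msec`."""
    # bisect_left finds the first bound that is >= msec, i.e. the entry
    # whose range is (previous bound, this bound].
    idx = bisect.bisect_left(MSEC_BOUNDS, msec)
    if idx == len(MSEC_BOUNDS):
        return ">=2000"
    return f"{MSEC_BOUNDS[idx]}="
```

A latency of 15 msecs is more than 10 but not more than 20, so it is counted under 20=, matching the example in the text.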

After each client has been listed, the group statistics are printed. They
will look like this::

    Run status group 0 (all jobs):
       READ: io=64MB, aggrb=22178, minb=11355, maxb=11814, mint=2840msec, maxt=2955msec
      WRITE: io=64MB, aggrb=1302, minb=666, maxb=669, mint=50093msec, maxt=50320msec

For each data direction, it prints:

**io**
    Number of megabytes of I/O performed.
**aggrb**
    Aggregate bandwidth of threads in this group.
**minb**
    The minimum average bandwidth a thread saw.
**maxb**
    The maximum average bandwidth a thread saw.
**mint**
    The smallest runtime of the threads in that group.
**maxt**
    The longest runtime of the threads in that group.
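
For scripts that scrape the normal (non-terse) output, a group statistics line can be split into its named fields. This is a minimal sketch under the assumption that the line has exactly the ``key=value`` layout shown in the sample above; ``parse_group_line`` is a hypothetical helper, not something fio provides.

```python
def parse_group_line(line):
    """Split e.g. 'READ: io=64MB, aggrb=22178' into ('READ', {...})."""
    direction, _, rest = line.partition(":")
    fields = {}
    for item in rest.split(","):
        key, _, value = item.strip().partition("=")
        # Values stay strings so that units such as MB or msec are kept.
        fields[key] = value
    return direction.strip(), fields
```

For machine consumption the terse output is usually the more robust choice, since the human-readable layout may differ between fio versions.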

And finally, the disk statistics are printed. They will look like this::

    Disk stats (read/write):
      sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%

Each value is printed for both reads and writes, with reads first. The
numbers denote:

**ios**
    Number of I/Os performed by all groups.
**merge**
    Number of merges performed by the I/O scheduler.
**ticks**
    Number of ticks we kept the disk busy.
**in_queue**
    Total time spent in the disk queue.
**util**
    The disk utilization. A value of 100% means we kept the disk
    busy constantly, 50% would be a disk idling half of the time.
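
The read/write pairs in a disk statistics line can be separated mechanically. The following sketch assumes the exact layout of the sample line above; ``parse_disk_stats`` is a hypothetical name chosen for this example.

```python
def parse_disk_stats(line):
    """Parse 'sda: ios=R/W, merge=R/W, ...' into (name, stats dict)."""
    name, _, rest = line.strip().partition(":")
    stats = {}
    for item in rest.split(","):
        key, _, value = item.strip().partition("=")
        if "/" in value:
            # Paired values list reads first, then writes.
            reads, writes = value.split("/")
            stats[key] = (int(reads), int(writes))
        elif value.endswith("%"):
            stats[key] = float(value[:-1])
        else:
            stats[key] = int(value)
    return name.strip(), stats
```

Paired values come back as ``(reads, writes)`` tuples, single values as numbers.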

It is also possible to get fio to dump the current output while it is running,
without terminating the job. To do that, send fio the **USR1** signal. You can
also get regularly timed dumps by using the :option:`--status-interval`
parameter, or by creating a file in :file:`/tmp` named
:file:`fio-dump-status`. If fio sees this file, it will unlink it and dump the
current output status.


Terse output
------------

For scripted usage where you typically want to generate tables or graphs of the
results, fio can output the results in a semicolon separated format. The format
is one long line of values, such as::

    2;card0;0;0;7139336;121836;60004;1;10109;27.932460;116.933948;220;126861;3495.446807;1085.368601;226;126864;3523.635629;1089.012448;24063;99944;50.275485%;59818.274627;5540.657370;7155060;122104;60004;1;8338;29.086342;117.839068;388;128077;5032.488518;1234.785715;391;128085;5061.839412;1236.909129;23436;100928;50.287926%;59964.832030;5644.844189;14.595833%;19.394167%;123706;0;7313;0.1%;0.1%;0.1%;0.1%;0.1%;0.1%;100.0%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.01%;0.02%;0.05%;0.16%;6.04%;40.40%;52.68%;0.64%;0.01%;0.00%;0.01%;0.00%;0.00%;0.00%;0.00%;0.00%
    A description of this job goes here.

The job description (if provided) follows on a second line.

To enable terse output, use the :option:`--minimal` command line option. The
first value is the version of the terse output format. If the output has to be
changed for some reason, this number will be incremented by 1 to signify that
change.

Split up, the format is as follows (comments in brackets denote when a
field was introduced or whether it's specific to some terse version):

    ::

        terse version, fio version [v3], jobname, groupid, error

    READ status::

        Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
        Submission latency: min, max, mean, stdev (usec)
        Completion latency: min, max, mean, stdev (usec)
        Completion latency percentiles: 20 fields (see below)
        Total latency: min, max, mean, stdev (usec)
        Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev, number of samples [v5]
        IOPS [v5]: min, max, mean, stdev, number of samples

    WRITE status::

        Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
        Submission latency: min, max, mean, stdev (usec)
        Completion latency: min, max, mean, stdev (usec)
        Completion latency percentiles: 20 fields (see below)
        Total latency: min, max, mean, stdev (usec)
        Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev, number of samples [v5]
        IOPS [v5]: min, max, mean, stdev, number of samples

    TRIM status [all but version 3]:

        Fields are similar to READ/WRITE status.

    CPU usage::

        user, system, context switches, major faults, minor faults

    I/O depths::

        <=1, 2, 4, 8, 16, 32, >=64

    I/O latencies microseconds::

        <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000

    I/O latencies milliseconds::

        <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000, 2000, >=2000

    Disk utilization [v3]::

        Disk name, Read ios, write ios,
        Read merges, write merges,
        Read ticks, write ticks,
        Time spent in queue, disk utilization percentage

    Additional Info (dependent on continue_on_error, default off)::

        total # errors, first error code

    Additional Info (dependent on description being set)::

        Text description

Completion latency percentiles can be a grouping of up to 20 sets, so for the
terse output fio writes all of them. Each field will look like this::

    1.00%=6112

which is the Xth percentile, and the `usec` latency associated with it.
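
A percentile field of this shape splits cleanly on the ``%=`` separator. The helper below is a small sketch with a hypothetical name (``parse_percentile``); it assumes the field always has the ``X%=latency`` form shown above.

```python
def parse_percentile(field):
    """Turn a terse field such as '1.00%=6112' into (percentile, usec)."""
    percentile, sep, usec = field.partition("%=")
    if not sep:
        raise ValueError(f"not a percentile field: {field!r}")
    return float(percentile), int(usec)
```

Applied to the sample field, this yields the 1.0th percentile with a latency of 6112 usec.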

For disk utilization, all disks used by fio are shown. So for each disk there
will be a disk utilization section.

Below is a single line containing short names for each of the fields in the
minimal output v3, separated by semicolons::

    terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;disk_write_ticks;disk_queue_time;disk_util
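
A line of short names like the one above can serve as a header when post-processing terse results. The sketch below pairs names with values; ``terse_to_dict`` is a hypothetical helper, and it assumes that no value (for instance a job name) contains a semicolon itself.

```python
def terse_to_dict(header, line):
    """Pair each short field name in `header` with the value from `line`."""
    names = header.strip().split(";")
    values = line.strip().split(";")
    record = dict(zip(names, values))
    # Values beyond the named fields (for example further disk sections)
    # are kept as-is rather than dropped.
    record["_extra"] = values[len(names):]
    return record
```

Because each additional disk adds another utilization section, a result line may be longer than the header; the ``_extra`` key holds whatever is left over.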


Trace file format
-----------------

There are two trace file formats that you can encounter. The older (v1) format is
unsupported since version 1.20-rc3 (March 2008). It will still be described
below in case you get an old trace and want to understand it.

In any case the trace is a simple text file with a single action per line.


Trace file format v1
~~~~~~~~~~~~~~~~~~~~

Each line represents a single I/O action in the following format::

    rw, offset, length

where `rw=0/1` for read/write, and the `offset` and `length` entries are in bytes.

This format is not supported in fio versions >= 1.20-rc3.


Trace file format v2
~~~~~~~~~~~~~~~~~~~~

The second version of the trace file format was added in fio version 1.17. It
allows access to more than one file per trace and has a bigger set of possible
file actions.

The first line of the trace file has to be::

    fio version 2 iolog

Following this can be lines in two different formats, which are described below.

The file management format::

    filename action

The filename is given as an absolute path. The action can be one of these:

**add**
    Add the given filename to the trace.
**open**
    Open the file with the given filename. The filename has to have
    been added with the **add** action before.
**close**
    Close the file with the given filename. The file has to have been
    opened before.


The file I/O action format::

    filename action offset length

The `filename` is given as an absolute path, and has to have been added and
opened before it can be used with this format. The `offset` and `length` are
given in bytes. The `action` can be one of these:

**wait**
    Wait for `offset` microseconds. Everything below 100 is discarded.
    The time is relative to the previous `wait` statement.
**read**
    Read `length` bytes beginning from `offset`.
**write**
    Write `length` bytes beginning from `offset`.
**sync**
    :manpage:`fsync(2)` the file.
**datasync**
    :manpage:`fdatasync(2)` the file.
**trim**
    Trim the given file from the given `offset` for `length` bytes.
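
To make the two line formats concrete, here is a minimal Python sketch of a reader for v2 traces. It is not fio's own parser: ``parse_iolog_v2`` is a hypothetical name, fields are assumed to be separated by whitespace, and filenames are assumed to contain no spaces.

```python
FILE_ACTIONS = {"add", "open", "close"}
IO_ACTIONS = {"wait", "read", "write", "sync", "datasync", "trim"}


def parse_iolog_v2(text):
    """Parse a v2 trace into a list of tuples, one per action line."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "fio version 2 iolog":
        raise ValueError("missing 'fio version 2 iolog' header line")
    entries = []
    for raw in lines[1:]:
        parts = raw.split()
        if not parts:
            continue  # skip empty lines
        if len(parts) == 2 and parts[1] in FILE_ACTIONS:
            # File management format: filename action
            entries.append((parts[0], parts[1]))
        elif len(parts) == 4 and parts[1] in IO_ACTIONS:
            # File I/O format: filename action offset length
            entries.append((parts[0], parts[1], int(parts[2]), int(parts[3])))
        else:
            raise ValueError(f"unrecognized trace line: {raw!r}")
    return entries
```

The sketch only checks the shape of each line; it does not verify that a file was added and opened before it is used.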

CPU idleness profiling
----------------------

In some cases, we want to understand CPU overhead in a test. For example, we
test patches specifically to see whether they reduce CPU usage.
Fio implements a balloon approach to create a thread per CPU that runs at idle
priority, meaning that it only runs when nobody else needs the CPU.
By measuring the amount of work completed by the thread, idleness of each CPU
can be derived accordingly.

A unit of work is defined as touching a full page of unsigned characters. The
mean and standard deviation of the time to complete a unit of work is reported
in the "unit work" section. Options can be chosen to report detailed per-CPU
idleness or overall system idleness by aggregating per-CPU stats.


Verification and triggers
-------------------------

Fio is usually run in one of two ways when data verification is done. The first
is a normal write job of some sort with verify enabled. When the write phase has
completed, fio switches to reads and verifies everything it wrote. The second
model is running just the write phase, and then later on running the same job
(but with reads instead of writes) to repeat the same I/O patterns and verify
the contents. Both of these methods depend on the write phase being completed,
as fio otherwise has no idea how much data was written.

With verification triggers, fio supports dumping the current write state to
local files. Then a subsequent read verify workload can load this state and know
exactly where to stop. This is useful for testing cases where power is cut to a
server in a managed fashion, for instance.

A verification trigger consists of two things:

1) Storing the write state of each job.
2) Executing a trigger command.

The write state is relatively small, on the order of hundreds of bytes to single
kilobytes. It contains information on the number of completions done, the last X
completions, etc.

A trigger is invoked either through creation ('touch') of a specified file in
the system, or through a timeout setting. If fio is run with
:option:`--trigger-file`\= :file:`/tmp/trigger-file`, then it will continually
check for the existence of :file:`/tmp/trigger-file`. When it sees this file, it
will fire off the trigger (thus saving state, and executing the trigger
command).

For client/server runs, there's both a local and remote trigger. If fio is
running as a server backend, it will send the job states back to the client for
safe storage, then execute the remote trigger, if specified. If a local trigger
is specified, the server will still send back the write state, but the client
will then execute the trigger.

Verification trigger example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's say we want to run a powercut test on the remote machine 'server'. Our
write workload is in :file:`write-test.fio`. We want to cut power to 'server' at
some point during the run, and we'll run this test from the safety of our local
machine, 'localbox'. On the server, we'll start the fio backend normally::

    server# fio --server

and on the client, we'll fire off the workload::

    localbox$ fio --client=server --trigger-file=/tmp/my-trigger --trigger-remote="bash -c \"echo b > /proc/sysrq-trigger\""

We set :file:`/tmp/my-trigger` as the trigger file, and we tell fio to execute::

    echo b > /proc/sysrq-trigger

on the server once it has received the trigger and sent us the write state. This
will work, but it's not **really** cutting power to the server, it's merely
abruptly rebooting it. If we have a remote way of cutting power to the server
through IPMI or similar, we could do that through a local trigger command
instead. Let's assume we have a script that does IPMI reboot of a given hostname,
ipmi-reboot. On localbox, we could then run fio with a local trigger
instead::

    localbox$ fio --client=server --trigger-file=/tmp/my-trigger --trigger="ipmi-reboot server"

For this case, fio would wait for the server to send us the write state, then
execute ``ipmi-reboot server`` when that happened.

Loading verify state
~~~~~~~~~~~~~~~~~~~~

To load stored write state, a read verification job file must contain the
:option:`verify_state_load` option. If that is set, fio will load the previously
stored state. For a local fio run this is done by loading the files directly,
and on a client/server run, the server backend will ask the client to send the
files over and load them from there.


Log File Formats
----------------

Fio supports a variety of log file formats, for logging latencies, bandwidth,
and IOPS. The logs share a common format, which looks like this:

    *time* (`msec`), *value*, *data direction*, *offset*

Time for the log entry is always in milliseconds. The *value* logged depends
on the type of log, it will be one of the following:

    **Latency log**
        Value is latency in usecs
    **Bandwidth log**
        Value is in KiB/sec
    **IOPS log**
        Value is IOPS

*Data direction* is one of the following:

    **0**
        I/O is a READ
    **1**
        I/O is a WRITE
    **2**
        I/O is a TRIM

The *offset* is the offset, in bytes, from the start of the file, for that
particular I/O. The logging of the offset can be toggled with
:option:`log_offset`.
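
A log line in the common format described above can be decoded as follows. This is a sketch only: ``LogEntry`` and ``parse_log_line`` are hypothetical names, the fields are assumed to be comma-separated integers, and the offset column is treated as optional because its logging can be toggled.

```python
from typing import NamedTuple, Optional

# Data direction codes as listed in the text above.
DIRECTIONS = {0: "READ", 1: "WRITE", 2: "TRIM"}


class LogEntry(NamedTuple):
    time_msec: int
    value: int
    direction: str
    offset: Optional[int]


def parse_log_line(line):
    """Decode 'time, value, data direction[, offset]' into a LogEntry."""
    parts = [p.strip() for p in line.split(",")]
    # The offset column may be absent when offset logging is off.
    offset = int(parts[3]) if len(parts) > 3 else None
    return LogEntry(int(parts[0]), int(parts[1]), DIRECTIONS[int(parts[2])], offset)
```

How to interpret ``value`` still depends on which kind of log the line came from (latency, bandwidth, or IOPS).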

If windowed logging is enabled through :option:`log_avg_msec` then fio doesn't
log individual I/Os. Instead it logs the average values over the specified period
of time. Since 'data direction' and 'offset' are per-I/O values, they aren't
applicable if windowed logging is enabled. If windowed logging is enabled and
:option:`log_max_value` is set, then fio logs maximum values in that window
instead of averages.
3476
3477Client/server
3478-------------
3479
3480Normally fio is invoked as a stand-alone application on the machine where the
3481I/O workload should be generated. However, the frontend and backend of fio can
3482be run separately. Ie the fio server can generate an I/O workload on the "Device
3483Under Test" while being controlled from another machine.
3484
3485Start the server on the machine which has access to the storage DUT::
3486
3487 fio --server=args
3488
3489where args defines what fio listens to. The arguments are of the form
3490``type,hostname`` or ``IP,port``. *type* is either ``ip`` (or ip4) for TCP/IP
3491v4, ``ip6`` for TCP/IP v6, or ``sock`` for a local unix domain socket.
3492*hostname* is either a hostname or IP address, and *port* is the port to listen
3493to (only valid for TCP/IP, not a local socket). Some examples:
3494
34951) ``fio --server``
3496
3497 Start a fio server, listening on all interfaces on the default port (8765).
3498
34992) ``fio --server=ip:hostname,4444``
3500
3501 Start a fio server, listening on IP belonging to hostname and on port 4444.
3502
35033) ``fio --server=ip6:::1,4444``
3504
3505 Start a fio server, listening on IPv6 localhost ::1 and on port 4444.
3506
35074) ``fio --server=,4444``
3508
3509 Start a fio server, listening on all interfaces on port 4444.
3510
35115) ``fio --server=1.2.3.4``
3512
3513 Start a fio server, listening on IP 1.2.3.4 on the default port.
3514
35156) ``fio --server=sock:/tmp/fio.sock``
3516
3517 Start a fio server, listening on the local socket /tmp/fio.sock.
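
The six examples above suggest how such an argument breaks down into type, address, and port. The following Python sketch reproduces that breakdown purely as an illustration; it is inferred from the examples, it is not fio's actual parsing code, and ``parse_server_arg`` is a hypothetical name.

```python
DEFAULT_PORT = 8765  # default port named in the first example


def parse_server_arg(arg):
    """Split a --server argument into (type, address, port).

    Assumption: the grammar is inferred from the documented examples only.
    An address of None stands for "all interfaces".
    """
    kind = "ip"
    for prefix in ("ip6", "ip4", "ip", "sock"):
        if arg.startswith(prefix + ":"):
            kind, arg = prefix, arg[len(prefix) + 1:]
            break
    if kind == "sock":
        return kind, arg, None  # a local socket has no port
    # The port, if present, follows the last comma.
    host, sep, port = arg.rpartition(",")
    if not sep:
        host, port = arg, ""
    return kind, host or None, int(port) if port else DEFAULT_PORT
```

Splitting on the last comma rather than on colons keeps IPv6 addresses such as ``::1`` intact.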

Once a server is running, a "client" can connect to the fio server with::

    fio <local-args> --client=<server> <remote-args> <job file(s)>

where `local-args` are arguments for the client where it is running, `server`
is the connect string, and `remote-args` and `job file(s)` are sent to the
server. The `server` string follows the same format as it does on the server
side, to allow IP/hostname/socket and port strings.

Fio can connect to multiple servers this way::

    fio --client=<server1> <job file(s)> --client=<server2> <job file(s)>

If the job file is located on the fio server, then you can tell the server to
load a local file as well. This is done by using :option:`--remote-config` ::

    fio --client=server --remote-config /path/to/file.fio

Then fio will open this local (to the server) job file instead of being passed
one from the client.
3540If you have many servers (example: 100 VMs/containers), you can input a pathname
3541of a file containing host IPs/names as the parameter value for the
3542:option:`--client` option. For example, here is an example :file:`host.list`
3543file containing 2 hostnames::
3544
3545 host1.your.dns.domain
3546 host2.your.dns.domain
3547
3548The fio command would then be::
a3ae5b05 3549
f80dba8d 3550 fio --client=host.list <job file(s)>
a3ae5b05 3551
f80dba8d
MT
3552In this mode, you cannot input server-specific parameters or job files -- all
3553servers receive the same job file.
a3ae5b05 3554
f80dba8d
MT
3555In order to let ``fio --client`` runs use a shared filesystem from multiple
3556hosts, ``fio --client`` now prepends the IP address of the server to the
4502cb42 3557filename. For example, if fio is using the directory :file:`/mnt/nfs/fio` and is
f80dba8d
MT
3558writing filename :file:`fileio.tmp`, with a :option:`--client` `hostfile`
3559containing two hostnames ``h1`` and ``h2`` with IP addresses 192.168.10.120 and
3560192.168.10.121, then fio will create two files::
a3ae5b05 3561
f80dba8d
MT
3562 /mnt/nfs/fio/192.168.10.120.fileio.tmp
3563 /mnt/nfs/fio/192.168.10.121.fileio.tmp
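
When collecting results from a shared filesystem afterwards, a script may need to predict these per-server names. The sketch below mirrors the naming shown in the example; ``client_filename`` is a hypothetical helper, and the ``<ip>.<filename>`` pattern is taken from the two paths above rather than from fio's source.

```python
import posixpath


def client_filename(directory, server_ip, filename):
    """Build the per-server file path: <directory>/<server_ip>.<filename>."""
    return posixpath.join(directory, f"{server_ip}.{filename}")
```

Calling it once per server IP reproduces the two paths listed in the example.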