Table of contents
-----------------

1. Overview
2. How fio works
3. Running fio
4. Job file format
5. Detailed list of parameters
6. Normal output
7. Terse output
8. Trace file format
9. CPU idleness profiling
10. Verification and triggers
11. Log File Formats

1.0 Overview and history
------------------------
fio was originally written to save me the hassle of writing special test
case programs when I wanted to test a specific workload, either for
performance reasons or to find/reproduce a bug. The process of writing
such a test app can be tiresome, especially if you have to do it often.
Hence I needed a tool that would be able to simulate a given io workload
without resorting to writing a tailored test case again and again.

A test workload is difficult to define, though. There can be any number
of processes or threads involved, and they can each be using their own
way of generating io. You could have someone dirtying large amounts of
memory in a memory mapped file, or maybe several threads issuing
reads using asynchronous io. fio needed to be flexible enough to
simulate both of these cases, and many more.

2.0 How fio works
-----------------
The first step in getting fio to simulate a desired io workload is
writing a job file describing that specific setup. A job file may contain
any number of threads and/or files - the typical contents of the job file
are a global section defining shared parameters, and one or more job
sections describing the jobs involved. When run, fio parses this file
and sets everything up as described. If we break down a job from top to
bottom, it contains the following basic parameters:

        IO type         Defines the io pattern issued to the file(s).
                        We may only be reading sequentially from the
                        file(s), or we may be writing randomly. Or even
                        mixing reads and writes, sequentially or randomly.

        Block size      In what size chunks are we issuing io? This may be
                        a single value, or it may describe a range of
                        block sizes.

        IO size         How much data are we going to be reading/writing?

        IO engine       How do we issue io? We could be memory mapping the
                        file, we could be using regular read/write, we
                        could be using splice, async io, or even SG
                        (SCSI generic sg).

        IO depth        If the io engine is async, how large a queuing
                        depth do we want to maintain?

        IO type         Should we be doing buffered io, or direct/raw io?

        Num files       How many files are we spreading the workload over?

        Num threads     How many threads or processes should we spread
                        this workload over?

The above are the basic parameters defined for a workload. In addition,
there's a multitude of parameters that modify other aspects of how this
job behaves.

3.0 Running fio
---------------
See the README file for command line parameters; there are only a few
of them.

Running fio is normally the easiest part - you just give it the job file
(or job files) as parameters:

$ fio job_file

and it will start doing what the job_file tells it to do. You can give
more than one job file on the command line; fio will serialize the running
of those files. Internally that is the same as using the 'stonewall'
parameter described in the parameter section.

If the job file contains only one job, you may as well just give the
parameters on the command line. The command line parameters are identical
to the job parameters, with a few extra that control global parameters
(see README). For example, for the job file parameter iodepth=2, the
mirror command line option would be --iodepth 2 or --iodepth=2. You can
also use the command line for giving more than one job entry. For each
--name option that fio sees, it will start a new job with that name.
Command line entries following a --name entry will apply to that job,
until there are no more entries or a new --name entry is seen. This is
similar to the job file options, where each option applies to the current
job until a new [] job entry is seen.

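To make the --name rule concrete, here is a minimal Python sketch (not part of fio) that turns a list of command line job options into the equivalent job file text. The function name args_to_jobfile is hypothetical. It assumes every option uses the --key=value or bare --key form, so the space-separated "--iodepth 2" form is not handled. It also rejects options that appear before the first --name, which is a simplification made for the sketch.

```python
def args_to_jobfile(args: list[str]) -> str:
    """Hypothetical helper: map fio-style command line job options to job file text.

    Each --name=X starts a new [X] section; every following option is written
    into that section until the next --name. Only --key=value and --key forms
    are handled.
    """
    lines: list[str] = []
    for arg in args:
        if not arg.startswith("--"):
            raise ValueError(f"unexpected argument: {arg!r}")
        key, sep, value = arg[2:].partition("=")
        if key == "name":
            if lines:
                lines.append("")          # blank line between sections
            lines.append(f"[{value}]")
        elif not lines:
            raise ValueError(f"--{key} given before the first --name")
        else:
            lines.append(f"{key}={value}" if sep else key)
    return "\n".join(lines)


print(args_to_jobfile(["--name=global", "--rw=randread", "--size=128m",
                       "--name=job1", "--name=job2"]))
```

For the two-job example in the job file section, this prints the same [global], [job1] and [job2] layout as that job file.
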
fio does not need to run as root, except if the files or devices specified
in the job section require that. Some other options may also be restricted,
such as memory locking, io scheduler switching, and decreasing the nice value.

4.0 Job file format
-------------------
As previously described, fio accepts one or more job files describing
what it is supposed to do. The job file format is the classic ini file,
where the names enclosed in [] brackets define the job name. You are free
to use any ascii name you want, except 'global' which has special meaning.
A global section sets defaults for the jobs described in that file. A job
may override a global section parameter, and a job file may even have
several global sections if so desired. A job is only affected by a global
section residing above it. If the first character in a line is a ';' or a
'#', the entire line is discarded as a comment.

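The section and comment rules above can be modelled with a short Python sketch. parse_jobfile is a hypothetical name, and the reader is deliberately simplified. It assumes one key=value (or bare key) per line, ignores include directives, and does not validate option names. When a job's section starts, the job copies the global settings seen so far. That copy is how the sketch models "a job is only affected by a global section residing above it".

```python
def parse_jobfile(text: str) -> list[dict[str, str]]:
    """Hypothetical, simplified fio job file reader.

    Returns one dict of options per job. A job starts from a copy of the
    global options defined above it, then its own options override them.
    """
    global_opts: dict[str, str] = {}
    jobs: list[dict[str, str]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in ";#":     # blank line or comment line
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            if name == "global":
                current = global_opts       # later globals only affect later jobs
            else:
                current = {"name": name, **global_opts}
                jobs.append(current)
            continue
        if current is None:
            raise ValueError(f"option outside of any section: {line!r}")
        key, _, value = line.partition("=")
        current[key.strip()] = value.strip()
    return jobs
```
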
So let's look at a really simple job file that defines two processes, each
randomly reading from a 128MiB file.

; -- start job file --
[global]
rw=randread
size=128m

[job1]

[job2]

; -- end job file --

As you can see, the job file sections themselves are empty as all the
described parameters are shared. As no filename= option is given, fio
makes up a filename for each of the jobs as it sees fit. On the command
line, this job would look as follows:

$ fio --name=global --rw=randread --size=128m --name=job1 --name=job2

Let's look at an example that has a number of processes writing randomly
to files.

; -- start job file --
[random-writers]
ioengine=libaio
iodepth=4
rw=randwrite
bs=32k
direct=0
size=64m
numjobs=4

; -- end job file --

Here we have no global section, as we only have one job defined anyway.
We want to use async io here, with a depth of 4 for each file. We also
increased the block size to 32KiB and set numjobs to 4 to
fork 4 identical jobs. The result is 4 processes each randomly writing
to their own 64MiB file. Instead of using the above job file, you could
have given the parameters on the command line. For this case, you would
specify:

$ fio --name=random-writers --ioengine=libaio --iodepth=4 --rw=randwrite --bs=32k --direct=0 --size=64m --numjobs=4

When fio is utilized as a basis of any reasonably large test suite, it might be
desirable to share a set of standardized settings across multiple job files.
Instead of copy/pasting such settings, any section may pull in an external
.fio file with an 'include filename' directive, as in the following example:

; -- start job file including.fio --
[global]
filename=/tmp/test
filesize=1m
include glob-include.fio

[test]
rw=randread
bs=4k
time_based=1
runtime=10
include test-include.fio
; -- end job file including.fio --

; -- start job file glob-include.fio --
thread=1
group_reporting=1
; -- end job file glob-include.fio --

; -- start job file test-include.fio --
ioengine=libaio
iodepth=4
; -- end job file test-include.fio --

Settings pulled into a section apply to that section only (except the global
section). Include directives may be nested, in that any included file may
contain further include directive(s). Include files may not contain []
sections.

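Here is a minimal Python sketch of how such include directives could be expanded before parsing. expand_includes and its read callback are hypothetical. The sketch resolves names exactly as written, with no directory search, and rejects [] sections inside included files as described above. It also stops include cycles with an arbitrary depth limit, which is an assumption of the sketch.

```python
from pathlib import Path
from typing import Callable

MAX_INCLUDE_DEPTH = 16  # arbitrary guard against include cycles (assumption)


def expand_includes(name: str,
                    read: Callable[[str], str] = lambda p: Path(p).read_text(),
                    depth: int = 0) -> list[str]:
    """Hypothetical helper: return the lines of `name` with every
    'include <file>' line replaced by that file's (recursively expanded) lines."""
    if depth > MAX_INCLUDE_DEPTH:
        raise ValueError("include nesting too deep (cycle?)")
    out: list[str] = []
    for line in read(name).splitlines():
        stripped = line.strip()
        if stripped.startswith("include "):
            included = stripped[len("include "):].strip()
            for inc_line in expand_includes(included, read, depth + 1):
                if inc_line.strip().startswith("["):
                    raise ValueError(f"{included}: include files may not contain [] sections")
                out.append(inc_line)
        else:
            out.append(line)
    return out
```

If you pass a dictionary's __getitem__ as read, you can try the sketch without touching the filesystem.
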
4.1 Environment variables
-------------------------

fio also supports environment variable expansion in job files. Any
sub-string of the form "${VARNAME}" as part of an option value (in other
words, on the right of the `='), will be expanded to the value of the
environment variable called VARNAME. If no such environment variable
is defined, or VARNAME is the empty string, the empty string will be
substituted.

As an example, let's look at a sample fio invocation and job file:

$ SIZE=64m NUMJOBS=4 fio jobfile.fio

; -- start job file --
[random-writers]
rw=randwrite
size=${SIZE}
numjobs=${NUMJOBS}
; -- end job file --

This will expand to the following equivalent job file at runtime:

; -- start job file --
[random-writers]
rw=randwrite
size=64m
numjobs=4
; -- end job file --

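The substitution rule can be sketched in Python as follows. expand_env is a hypothetical name, and the sketch only models the behaviour described above. It expands ${VARNAME} in the value part (right of the first '=') and substitutes the empty string for undefined or empty names. Which characters fio accepts inside VARNAME is an assumption here.

```python
import os
import re
from typing import Mapping

_VAR = re.compile(r"\$\{([^}]*)\}")   # ${VARNAME}; VARNAME may be empty


def expand_env(line: str, env: Mapping[str, str] = os.environ) -> str:
    """Expand ${VARNAME} in the option value (right of the first '=')."""
    key, sep, value = line.partition("=")
    if not sep:
        return line                    # no value part, nothing to expand
    # Undefined variables and an empty VARNAME both become the empty string.
    value = _VAR.sub(lambda m: env.get(m.group(1), "") if m.group(1) else "", value)
    return key + sep + value
```
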
fio ships with a few example job files; you can also look there for
inspiration.

4.2 Reserved keywords
---------------------

Additionally, fio has a set of reserved keywords that will be replaced
internally with the appropriate value. Those keywords are:

        $pagesize       The architecture page size of the running system
        $mb_memory      Megabytes of total memory in the system
        $ncpus          Number of online available CPUs

These can be used on the command line or in the job file, and will be
automatically substituted with the current system values when the job
is run. Simple math is also supported on these keywords, so you can
perform actions like:

size=8*$mb_memory

and get that properly expanded to 8 times the size of memory in the
machine.

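Here is a Python sketch of the substitution step. It assumes POSIX sysconf names for the system values, and expand_keywords, system_values and eval_product are hypothetical helpers. $mb_memory is computed here as total physical memory in MiB, which is an assumption. The sketch evaluates only a plain product of factors, as a stand-in for fio's simple-math support.

```python
import math
import os
import re

_KEYWORD = re.compile(r"\$(pagesize|mb_memory|ncpus)")


def system_values() -> dict[str, int]:
    """Current system values (POSIX only; uses os.sysconf)."""
    page = os.sysconf("SC_PAGE_SIZE")
    return {
        "pagesize": page,
        "mb_memory": page * os.sysconf("SC_PHYS_PAGES") // (1024 * 1024),
        "ncpus": os.sysconf("SC_NPROCESSORS_ONLN"),
    }


def expand_keywords(value: str, values: dict[str, int]) -> str:
    """Replace $pagesize, $mb_memory and $ncpus with concrete numbers."""
    return _KEYWORD.sub(lambda m: str(values[m.group(1)]), value)


def eval_product(expr: str) -> int:
    """Evaluate a plain product such as '8*16384' (illustration only)."""
    return math.prod(int(factor) for factor in expr.split("*"))
```
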
5.0 Detailed list of parameters
-------------------------------

This section describes in detail each parameter associated with a job.
Some parameters take an option of a given type, such as an integer or
a string. Anywhere a numeric value is required, an arithmetic expression
may be used, provided it is surrounded by parentheses. Supported operators
are:

        addition (+)
        subtraction (-)
        multiplication (*)
        division (/)
        modulus (%)
        exponentiation (^)

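The following Python sketch evaluates such a parenthesized expression with the six operators listed. It is not fio's parser: eval_expr is a hypothetical name, and the sketch reuses Python's ast module. It rewrites '^' to '**' so that exponentiation binds tighter than the other operators. It also assumes integer (floor) division.

```python
import ast
import operator

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.floordiv,   # assumption: integer results
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,        # written as '^' in the job file
}


def eval_expr(text: str) -> int:
    """Evaluate a parenthesized integer expression such as '((1+2)*3)'."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("expressions must be enclosed in parentheses")
    tree = ast.parse(text.replace("^", "**"), mode="eval")
    return _eval(tree.body)


def _eval(node: ast.AST) -> int:
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError(f"unsupported element: {ast.dump(node)}")
```
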
For time values in expressions, units are microseconds by default. This
differs from time values that are not in expressions (not enclosed in
parentheses). The following types are used:

str         String. This is a sequence of alpha characters.

time        Integer with possible time suffix. In seconds unless otherwise
            specified; use e.g. 10m for 10 minutes. Accepts s/m/h for
            seconds, minutes, and hours, and accepts 'ms' (or 'msec') for
            milliseconds, and 'us' (or 'usec') for microseconds.

int         Integer. A whole number value, which may contain an integer
            prefix and an integer suffix.
                [integer prefix]number[integer suffix]

            The optional integer prefix specifies the number's base. The
            default is decimal. 0x specifies hexadecimal.

            The optional integer suffix specifies the number's units, and
            includes an optional unit prefix and an optional unit. For
            quantities of data, the default unit is bytes. For quantities
            of time, the default unit is seconds.

            With kb_base=1000, fio follows international standards for
            unit prefixes. To specify power-of-10 decimal values defined
            in the International System of Units (SI):
                K means kilo or 1000
                M means mega or 1000**2
                G means giga or 1000**3
                T means tera or 1000**4
                P means peta or 1000**5

            To specify power-of-2 binary values defined in IEC 80000-13:
                Ki means kibi or 1024
                Mi means mebi or 1024**2
                Gi means gibi or 1024**3
                Ti means tebi or 1024**4
                Pi means pebi or 1024**5

            With kb_base=1024 (the default), the unit prefixes are opposite
            from those specified in the SI and IEC 80000-13 standards to
            provide compatibility with old scripts. For example, 4k means
            4096.

            For quantities of data, an optional unit of 'B' may be included
            (e.g., 'kB' is the same as 'k').

            The integer suffix is not case sensitive (e.g., m/mi mean
            mebi/mega, not milli). 'b' and 'B' both mean byte, not bit.

            Examples with kb_base=1000:
                4 KiB: 4096, 4096b, 4096B, 4ki, 4kib, 4kiB, 4Ki, 4KiB
                1 MiB: 1048576, 1mi, 1024ki
                1 MB: 1000000, 1m, 1000k
                1 TiB: 1099511627776, 1ti, 1024gi, 1048576mi
                1 TB: 1000000000000, 1t, 1000g, 1000000m

            Examples with kb_base=1024 (default):
                4 KiB: 4096, 4096b, 4096B, 4k, 4kb, 4kB, 4K, 4KB
                1 MiB: 1048576, 1m, 1024k
                1 MB: 1000000, 1mi, 1000ki
                1 TiB: 1099511627776, 1t, 1024g, 1048576m
                1 TB: 1000000000000, 1ti, 1000gi, 1000000mi

            To specify times (units are not case sensitive):
                D means days
                H means hours
                M means minutes
                s or sec means seconds (default)
                ms or msec means milliseconds
                us or usec means microseconds

            If the option accepts an upper and lower range, use a colon ':'
            or minus '-' to separate such values. See irange.

bool        Boolean. Usually parsed as an integer, however only defined for
            true and false (1 and 0).

irange      Integer range with suffix. Allows value range to be given, such
            as 1024-4096. A colon may also be used as the separator, e.g.
            1k:4k. If the option allows two sets of ranges, they can be
            specified with a ',' or '/' delimiter: 1k-4k/8k-32k. Also see
            int.

float_list  A list of floating point numbers, separated by a ':' character.

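The data-size rules above can be captured in a short Python sketch. parse_size is a hypothetical helper, not fio's parser, and it handles only data quantities (no time suffixes or ranges). It accepts an optional 0x prefix, a unit prefix k/m/g/t/p with an optional 'i', and an optional trailing 'b'. It then applies the kb_base rule that the 'i' switches between the binary and decimal meanings.

```python
import re

_SIZE = re.compile(r"(0x[0-9a-f]+|\d+)(?:([kmgtp])(i?))?(b?)")


def parse_size(text: str, kb_base: int = 1024) -> int:
    """Convert a data quantity such as '4k', '1mi' or '0x10' to bytes."""
    m = _SIZE.fullmatch(text.strip().lower())   # suffixes are case insensitive
    if m is None:
        raise ValueError(f"not a data quantity: {text!r}")
    digits, prefix, iec, _byte = m.groups()
    number = int(digits, 16) if digits.startswith("0x") else int(digits)
    if prefix is None:
        return number                           # plain bytes
    power = "kmgtp".index(prefix) + 1
    if kb_base == 1000:
        binary = iec == "i"                     # Ki = 1024, K = 1000
    else:
        binary = iec != "i"                     # compatibility: k = 1024, ki = 1000
    return number * (1024 if binary else 1000) ** power
```

The asserts below reproduce several of the kb_base=1000 and kb_base=1024 example rows listed above.
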
With the above in mind, here follows the complete list of fio job
parameters.

name=str        ASCII name of the job. This may be used to override the
                name printed by fio for this job. Otherwise the job
                name is used. On the command line this parameter has the
                special purpose of also signaling the start of a new
                job.

wait_for=str    Specifies the name of an already defined job to wait
                for. Only a single waitee name may be specified. If set,
                the job won't be started until all workers of the waitee
                job are done.

                wait_for operates on a job name basis, so there are a few
                limitations. First, the waitee must be defined prior to the
                waiter job (meaning no forward references). Second, if a job
                is being referenced as a waitee, it must have a unique name
                (no duplicate waitees).

description=str Text description of the job. Doesn't do anything except
                dump this text description when this job is run. It's
                not parsed.

directory=str   Prefix filenames with this directory. Used to place files
                in a different location than "./". See the 'filename' option
                for escaping certain characters.

filename=str    Fio normally makes up a filename based on the job name,
                thread number, and file number. If you want to share
                files between threads in a job or several jobs, specify
                a filename for each of them to override the default.
                If the ioengine is file based, you can specify a number of
                files by separating the names with a ':' colon. So if you
                wanted a job to open /dev/sda and /dev/sdb as the two working
                files, you would use filename=/dev/sda:/dev/sdb. On Windows,
                disk devices are accessed as \\.\PhysicalDrive0 for the first
                device, \\.\PhysicalDrive1 for the second etc. Note: Windows
                and FreeBSD prevent write access to areas of the disk
                containing in-use data (e.g. filesystems).
                If the wanted filename does need to include a colon, then
                escape that with a '\' character. For instance, if the filename
                is "/dev/dsk/foo@3,0:c", then you would use
                filename="/dev/dsk/foo@3,0\:c". '-' is a reserved name, meaning
                stdin or stdout. Which of the two depends on the read/write
                direction set.

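Here is a minimal Python sketch of the colon-separated list and the '\:' escape described for filename=. split_filenames is a hypothetical helper. It treats a backslash specially only when a colon follows it, so Windows names such as \\.\PhysicalDrive0 pass through unchanged. It does not handle drive-letter paths or the reserved '-' name.

```python
def split_filenames(value: str) -> list[str]:
    """Split a filename= value on ':' while honouring '\\:' escapes."""
    names: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and value[i + 1:i + 2] == ":":
            current.append(":")          # escaped colon is part of the name
            i += 2
            continue
        if ch == ":":
            names.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    names.append("".join(current))
    return names
```
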
filename_format=str
                If sharing multiple files between jobs, it is usually necessary
                to have fio generate the exact names that you want. By default,
                fio will name a file based on the default file format
                specification of jobname.jobnumber.filenumber. With this
                option, that can be customized. Fio will recognize and replace
                the following keywords in this string:

                $jobname
                        The name of the worker thread or process.

                $jobnum
                        The incremental number of the worker thread or
                        process.

                $filenum
                        The incremental number of the file for that worker
                        thread or process.

                To have dependent jobs share a set of files, this option can
                be set to have fio generate filenames that are shared between
                the two. For instance, if testfiles.$filenum is specified,
                file number 4 for any job will be named testfiles.4. The
                default of $jobname.$jobnum.$filenum will be used if
                no other format specifier is given.

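Here is a hypothetical Python helper that performs the keyword replacement. It shows why a shared format makes two jobs open the same files. The caller passes the job and file numbers in explicitly; this is an assumption of the sketch, which says nothing about where fio starts counting.

```python
import re

_FORMAT_KEYWORD = re.compile(r"\$(jobname|jobnum|filenum)")


def format_filename(fmt: str, jobname: str, jobnum: int, filenum: int) -> str:
    """Substitute $jobname, $jobnum and $filenum in a filename_format string."""
    values = {"jobname": jobname, "jobnum": str(jobnum), "filenum": str(filenum)}
    # Single pass, so text produced by one substitution is never re-scanned.
    return _FORMAT_KEYWORD.sub(lambda m: values[m.group(1)], fmt)
```

With "testfiles.$filenum", the job name and job number drop out of the result, so file number 4 of any job maps to the same name.
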
unique_filename=bool
                To avoid collisions between networked clients, fio
                defaults to prefixing any generated filenames (with a directory
                specified) with the source of the client connecting. To disable
                this behavior, set this option to 0.

opendir=str     Tell fio to recursively add any file it can find in this
                directory and down the file system tree.

lockfile=str    Fio defaults to not locking any files before it does
                IO to them. If a file or file descriptor is shared, fio
                can serialize IO to that file to make the end result
                consistent. This is typically used to emulate real workloads
                that share files. The lock modes are:

                        none            No locking. The default.
                        exclusive       Only one thread/process may do IO,
                                        excluding all others.
                        readwrite       Read-write locking on the file. Many
                                        readers may access the file at the
                                        same time, but writes get exclusive
                                        access.

readwrite=str
rw=str          Type of io pattern. Accepted values are:

                        read            Sequential reads
                        write           Sequential writes
                        trim            Sequential trims
                        randwrite       Random writes
                        randread        Random reads
                        randtrim        Random trims
                        rw,readwrite    Sequential mixed reads and writes
                        randrw          Random mixed reads and writes
                        trimwrite       Sequential trim+write sequences

                Fio defaults to read if the option is not specified.
                For the mixed io types, the default is to split them 50/50.
                For certain types of io the result may still be skewed a bit,
                since the speed may be different. It is possible to specify
                the number of IOs to do before getting a new offset; this is
                done by appending a ':<nr>' to the end of the string given.
                For a random read, it would look like 'rw=randread:8' for
                passing in an offset modifier with a value of 8. If the
                suffix is used with a sequential IO pattern, then the value
                specified will be added to the generated offset for each IO.
                For instance, using rw=write:4k will skip 4k for every
                write. It turns sequential IO into sequential IO with holes.
                See the 'rw_sequencer' option.

rw_sequencer=str
                If an offset modifier is given by appending a number to
                the rw=<str> line, then this option controls how that
                number modifies the IO offset being generated. Accepted
                values are:

                        sequential      Generate sequential offset
                        identical       Generate the same offset

                'sequential' is only useful for random IO, where fio would
                normally generate a new random offset for every IO. If you
                append e.g. 8 to randread, you would get a new random offset
                for every 8 IOs. The result would be a seek for only every 8
                IOs, instead of for every IO. Use rw=randread:8 to specify
                that. As sequential IO is already sequential, setting
                'sequential' for that would not result in any differences.
                'identical' behaves in a similar fashion, except it sends
                the same offset 8 times before generating a new offset.

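The effect of ':<nr>' and rw_sequencer can be illustrated with a Python sketch that only generates offsets and does no IO. random_offsets and seq_with_holes are hypothetical helpers. Several choices are assumptions made to keep the example small and deterministic, not a description of fio's generator:

- random offsets are uniform and block-aligned;
- offsets wrap around at the end of the file;
- the seed is fixed.

```python
import random


def random_offsets(n_ios: int, bs: int, file_size: int, nr: int = 1,
                   sequencer: str = "sequential", seed: int = 0) -> list[int]:
    """Offsets for rw=randread:<nr> under rw_sequencer=sequential|identical."""
    rng = random.Random(seed)
    span = (file_size // bs) * bs          # usable, block-aligned region
    offsets: list[int] = []
    offset = 0
    for i in range(n_ios):
        if i % nr == 0:
            offset = rng.randrange(file_size // bs) * bs   # new random offset
        elif sequencer == "sequential":
            offset = (offset + bs) % span  # continue sequentially, wrap at end
        # 'identical': keep the same offset
        offsets.append(offset)
    return offsets


def seq_with_holes(n_ios: int, bs: int, skip: int) -> list[int]:
    """Offsets for e.g. rw=write:4k: each IO skips `skip` bytes past the last."""
    return [i * (bs + skip) for i in range(n_ios)]
```

With nr=8 and 'sequential', a new random position is chosen only once every 8 IOs, which matches the "seek for only every 8 IOs" description above.
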
kb_base=int     Select the interpretation of unit prefixes in input parameters.
                1000 = Inputs comply with IEC 80000-13 and the International
                       System of Units (SI). Use:
                       - power-of-2 values with IEC prefixes (e.g., KiB)
                       - power-of-10 values with SI prefixes (e.g., kB)
                1024 = Compatibility mode (default). To avoid breaking
                       old scripts:
                       - power-of-2 values with SI prefixes
                       - power-of-10 values with IEC prefixes
                See bs= for more details on input parameters.

                Outputs always use correct prefixes. Most outputs include
                both side-by-side, like:
                        bw=2383.3kB/s (2327.4KiB/s)
                If only one value is reported, then kb_base selects the
                one to use:
                        1000 = SI prefixes
                        1024 = IEC prefixes

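Here is a small Python sketch of the two output styles. format_bw_both and format_bw_single are hypothetical names. The one-decimal formatting mirrors the bw= example above, and the order within the side-by-side form simply copies that example.

```python
def format_bw_both(bytes_per_sec: float) -> str:
    """Side-by-side form: SI (kB/s) first, IEC (KiB/s) in parentheses."""
    return f"{bytes_per_sec / 1000:.1f}kB/s ({bytes_per_sec / 1024:.1f}KiB/s)"


def format_bw_single(bytes_per_sec: float, kb_base: int = 1024) -> str:
    """Single-value form: kb_base=1000 selects SI prefixes, 1024 selects IEC."""
    if kb_base == 1000:
        return f"{bytes_per_sec / 1000:.1f}kB/s"
    return f"{bytes_per_sec / 1024:.1f}KiB/s"
```

A rate of 2383300 bytes/s reproduces the example: 2383300 / 1000 = 2383.3 and 2383300 / 1024 is approximately 2327.4.
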
unified_rw_reporting=bool
                Fio normally reports statistics on a per data direction
                basis, meaning that reads, writes, and trims are accounted
                and reported separately. If this option is set, fio will sum
                the results and report them as "mixed" instead.

randrepeat=bool For random IO workloads, seed the generator in a predictable
                way so that results are repeatable across repetitions.
                Defaults to true.

randseed=int    Seed the random number generators based on this seed value, to
                be able to control what sequence of output is being generated.
                If not set, the random sequence depends on the randrepeat
                setting.

fallocate=str   Whether pre-allocation is performed when laying down files.
                Accepted values are:

                        none    Do not pre-allocate space
                        posix   Pre-allocate via posix_fallocate()
                        keep    Pre-allocate via fallocate() with
                                FALLOC_FL_KEEP_SIZE set
                        0       Backward-compatible alias for 'none'
                        1       Backward-compatible alias for 'posix'

                May not be available on all supported platforms. 'keep' is only
                available on Linux. If using ZFS on Solaris this must be set to
                'none' because ZFS doesn't support it. Default: 'posix'.

fadvise_hint=bool
                By default, fio will use fadvise() to advise the kernel
                on what IO patterns it is likely to issue. Sometimes you
                want to test specific IO patterns without telling the
                kernel about it, in which case you can disable this option.
                The following options are supported:

                        sequential      Use FADV_SEQUENTIAL
                        random          Use FADV_RANDOM
                        1               Backwards-compatible hint for basing
                                        the hint on the fio workload. Will use
                                        FADV_SEQUENTIAL for a sequential
                                        workload, and FADV_RANDOM for a random
                                        workload.
                        0               Backwards-compatible setting for not
                                        issuing a fadvise hint.

fadvise_stream=int
                Notify the kernel what write stream ID to place these
                writes under. Only supported on Linux. Note, this option
                may change going forward.

size=int        The total size of file io for this job. Fio will run until
                this many bytes have been transferred, unless the run is
                limited by other options (such as 'runtime'), or the amount
                of IO is increased/decreased by 'io_size'. Unless specific
                nrfiles and filesize options are given, fio will divide this
                size between the available files specified by the job. If not
                set, fio will use the full size of the given files or devices.
                If the files do not exist, size must be given. It is also
                possible to give size as a percentage between 1 and 100. If
                size=20% is given, fio will use 20% of the full size of the
                given files or devices.

io_size=int
io_limit=int    Normally fio operates within the region set by 'size', which
                means that the 'size' option sets both the region and size of
                IO to be performed. Sometimes that is not what you want. With
                this option, it is possible to define just the amount of IO
                that fio should do. For instance, if 'size' is set to 20GiB and
                'io_size' is set to 5GiB, fio will perform IO within the first
                20GiB but exit when 5GiB have been done. The opposite is also
                possible - if 'size' is set to 20GiB, and 'io_size' is set to
                40GiB, then fio will do 40GiB of IO within the 0..20GiB region.

filesize=int    Individual file sizes. May be a range, in which case fio
                will select sizes for files at random within the given range
                and limited to 'size' in total (if that is given). If not
                given, each created file is the same size.

file_append=bool
                Perform IO after the end of the file. Normally fio will
                operate within the size of a file. If this option is set, then
                fio will append to the file instead. This has identical
                behavior to setting offset to the size of a file. This option
                is ignored on non-regular files.

fill_device=bool
fill_fs=bool    Sets size to something really large and waits for ENOSPC (no
                space left on device) as the terminating condition. Only makes
                sense with sequential write. For a read workload, the mount
                point will be filled first, then IO started on the result. This
                option doesn't make sense if operating on a raw device node,
                since the size of that is already known by the file system.
                Additionally, writing beyond end-of-device will not return
                ENOSPC there.

blocksize=int[,int][,int]
bs=int[,int][,int]
                The block size in bytes used for I/O units. Default: 4096.
                A single value applies to reads, writes, and trims.
                Comma-separated values may be specified for reads, writes,
                and trims. A value not terminated in a comma applies to
                subsequent types.

                Examples:
                        bs=256k    means 256k for reads, writes and trims
                        bs=8k,32k  means 8k for reads, 32k for writes and trims
                        bs=8k,32k, means 8k for reads, 32k for writes, and
                                   default for trims
                        bs=,8k     means default for reads, 8k for writes and
                                   trims
                        bs=,8k,    means default for reads, 8k for writes, and
                                   default for trims

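You can check the comma rules with a short Python sketch. split_per_direction is a hypothetical helper that returns the (read, write, trim) values as strings and leaves out unit parsing. Its default argument stands for whatever default the option has, which is 4096 for bs.

```python
def split_per_direction(value: str, default: str = "4096") -> tuple[str, str, str]:
    """Map a bs=-style value to (read, write, trim) following the comma rules."""
    fields = value.split(",")
    # If the value does not end with a comma, its last entry carries over
    # to the remaining data directions; otherwise they get the default.
    carry = fields[-1] if fields[-1] else default
    result = []
    for i in range(3):
        if i < len(fields):
            result.append(fields[i] or default)   # empty field -> default
        else:
            result.append(carry)
    return tuple(result)
```

Each example row in the bs= description maps to one assert below.
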
blocksize_range=irange[,irange][,irange]
bsrange=irange[,irange][,irange]
                A range of block sizes in bytes for I/O units.
                The issued I/O unit will always be a multiple of the minimum
                size, unless blocksize_unaligned is set.

                Comma-separated ranges may be specified for reads, writes,
                and trims as described in 'blocksize'.

                Example: bsrange=1k-4k,2k-8k

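Here is a Python sketch of the "multiple of the minimum size" rule and its unaligned alternative. pick_block_size is a hypothetical helper. It assumes a uniform choice among the allowed sizes, and fio's actual distribution may differ.

```python
import random
from typing import Optional


def pick_block_size(min_bs: int, max_bs: int, unaligned: bool = False,
                    rng: Optional[random.Random] = None) -> int:
    """Pick one I/O size from an inclusive bsrange=min_bs-max_bs."""
    rng = rng or random.Random()
    if unaligned:
        return rng.randint(min_bs, max_bs)        # any size in the range
    # Only multiples of the minimum size that do not exceed max_bs are eligible.
    return min_bs * rng.randint(1, max_bs // min_bs)
```

For bsrange=1k-4k, the aligned case can only yield 1k, 2k, 3k or 4k.
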
632 | bssplit=str[,str][,str] | |
633 | Sometimes you want even finer grained control of the | |
564ca972 JA |
634 | block sizes issued, not just an even split between them. |
635 | This option allows you to weight various block sizes, | |
636 | so that you are able to define a specific amount of | |
637 | block sizes issued. The format for this option is: | |
638 | ||
639 | bssplit=blocksize/percentage:blocksize/percentage | |
640 | ||
641 | for as many block sizes as needed. So if you want to define | |
642 | a workload that has 50% 64k blocks, 10% 4k blocks, and | |
643 | 40% 32k blocks, you would write: | |
644 | ||
645 | bssplit=4k/10:64k/50:32k/40 | |
646 | ||
647 | Ordering does not matter. If the percentage is left blank, | |
648 | fio will fill in the remaining values evenly. So a bssplit | |
649 | option like this one: | |
650 | ||
651 | bssplit=4k/50:1k/:32k/ | |
652 | ||
653 | would have 50% 4k ios, and 25% 1k and 32k ios. The percentages | |
654 | always add up to 100, if bssplit is given a range that adds | |
655 | up to more, it will error out. | |
656 | ||
6d500c2e RE |
657 | Comma-separated values may be specified for reads, writes, |
658 | and trims as described in 'blocksize'. | |
659 | ||
660 | If you want a workload that has 50% 2k reads and 50% 4k reads, | |
720e84ad JA |
661 | while having 90% 4k writes and 10% 8k writes, you would |
662 | specify: | |
663 | ||
892ea9bd | 664 | bssplit=2k/50:4k/50,4k/90:8k/10 |
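
The bssplit rules above can be sketched in Python. Blank percentages share the remainder evenly, a total above 100 is rejected, and commas separate the per-direction splits. The function names are hypothetical and block sizes stay as strings. The sketch does not model what fio does when the given percentages add up to less than 100 with no blanks. It also does not model how a missing direction inherits a value.

```python
def parse_bssplit(spec: str):
    """Parse one direction of a bssplit value into (size, percent) pairs."""
    entries = []
    for item in spec.split(":"):
        size, _, pct = item.partition("/")
        entries.append([size, float(pct) if pct else None])
    given = sum(p for _, p in entries if p is not None)
    if given > 100:
        raise ValueError(f"bssplit percentages add up to {given:g}, over 100")
    blanks = [e for e in entries if e[1] is None]
    for e in blanks:
        # Blank percentages share whatever is left over, evenly.
        e[1] = (100 - given) / len(blanks)
    return [(size, pct) for size, pct in entries]


def parse_bssplit_all(value: str):
    """Comma-separated parts give separate splits for reads, writes, trims."""
    return [parse_bssplit(part) for part in value.split(",")]


print(parse_bssplit("4k/50:1k/:32k/"))  # [('4k', 50.0), ('1k', 25.0), ('32k', 25.0)]
print(parse_bssplit_all("2k/50:4k/50,4k/90:8k/10"))
```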

blocksize_unaligned
bs_unaligned	If set, fio will issue I/O units with any size within
		blocksize_range, not just multiples of the minimum size.
		This typically won't work with direct I/O, as that normally
		requires sector alignment.

bs_is_seq_rand	If this option is set, fio will use the normal read,write
		blocksize settings as sequential,random blocksize settings
		instead. Any random read or write will use the WRITE blocksize
		settings, and any sequential read or write will use the READ
		blocksize settings.

blockalign=int[,int][,int]
ba=int[,int][,int]
		Boundary to which fio will align random I/O units.
		Default: 'blocksize'.
		Minimum alignment is typically 512b for using direct IO,
		though it usually depends on the hardware block size. This
		option is mutually exclusive with using a random map for
		files, so it will turn off that option.
		Comma-separated values may be specified for reads, writes,
		and trims as described in 'blocksize'.

zero_buffers	If this option is given, fio will initialize the IO buffers
		to all zeroes. The default is to fill them with random data.

refill_buffers	If this option is given, fio will refill the IO buffers
		on every submit. The default is to only fill them at init
		time and reuse that data. Only makes sense if zero_buffers
		isn't specified, naturally. If data verification is enabled,
		refill_buffers is also automatically enabled.

scramble_buffers=bool	If refill_buffers is too costly and the target is
		using data deduplication, then setting this option will
		slightly modify the IO buffer contents to defeat normal
		dedupe attempts. This is not enough to defeat more clever
		block compression attempts, but it will stop naive dedupe of
		blocks. Default: true.

buffer_compress_percentage=int	If this is set, then fio will attempt to
		provide IO buffer content (on WRITEs) that compresses to
		the specified level. Fio does this by providing a mix of
		random data and a fixed pattern. The fixed pattern is either
		zeroes, or the pattern specified by buffer_pattern. If the
		pattern option is used, it might skew the compression ratio
		slightly. Note that this is per block size unit; for a
		file/disk-wide compression level that matches this setting,
		you'll also want to set refill_buffers.

buffer_compress_chunk=int	See buffer_compress_percentage. This
		setting allows fio to manage how big the ranges of random
		data and zeroed data are. Without this set, fio will
		provide buffer_compress_percentage of blocksize random
		data, followed by the remaining zeroed. With this set
		to some chunk size smaller than the block size, fio can
		alternate random and zeroed data throughout the IO
		buffer.
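
The following Python sketch shows the buffer layout described for buffer_compress_chunk, with zeroes as the fixed pattern. It takes the random share of each chunk directly as random_pct, a hypothetical parameter. How fio derives that share from buffer_compress_percentage is not spelled out here, so treat that mapping as an assumption. This is not fio's fill routine.

```python
import os
import zlib


def make_block(bs: int, random_pct: int, chunk: int = 0) -> bytes:
    """Build one block: each chunk starts with random bytes, rest zeroed."""
    chunk = chunk or bs  # no chunk size: one random run, then zeroes
    out = bytearray()
    while len(out) < bs:
        n = min(chunk, bs - len(out))
        random_len = n * random_pct // 100
        out += os.urandom(random_len) + bytes(n - random_len)
    return bytes(out)


block = make_block(64 * 1024, random_pct=50, chunk=4096)
# Compressed size tracks the random share, plus some zlib overhead.
print(len(block), len(zlib.compress(block)))
```

A smaller chunk spreads the random runs across the whole block instead of leaving one long zeroed tail. This matters to compressors that work on windows smaller than the block.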

buffer_pattern=str	If set, fio will fill the io buffers with this
		pattern. If not set, the contents of io buffers are defined by
		the other options related to buffer contents. The setting can
		be any pattern of bytes, and can be prefixed with 0x for hex
		values. It may also be a string, where the string must then
		be wrapped with "", e.g.:

			buffer_pattern="abcd"
		or
			buffer_pattern=-12
		or
			buffer_pattern=0xdeadface

		You can also combine everything together in any order:

			buffer_pattern=0xdeadface"abcd"-12

dedupe_percentage=int	If set, fio will generate this percentage of
		identical buffers when writing. These buffers will be
		naturally dedupable. The contents of the buffers depend on
		what other buffer compression settings have been set. It's
		possible to have the individual buffers either fully
		compressible, or not at all. This option only controls the
		distribution of unique buffers.

nrfiles=int	Number of files to use for this job. Defaults to 1.

openfiles=int	Number of files to keep open at the same time. Defaults to
		the same as nrfiles, can be set smaller to limit the number
		of simultaneous opens.

file_service_type=str	Defines how fio decides which file from a job to
		service next. The following types are defined:

			random	Just choose a file at random.

			roundrobin  Round robin over open files. This
				is the default.

			sequential  Finish one file before moving on to
				the next. Multiple files can still be
				open depending on 'openfiles'.

			zipf	Use a zipfian distribution to decide what file
				to access.

			pareto	Use a pareto distribution to decide what file
				to access.

			gauss	Use a gaussian (normal) distribution to decide
				what file to access.

		For random, roundrobin, and sequential, a postfix can be
		appended to tell fio how many I/Os to issue before switching
		to a new file. For example, specifying
		'file_service_type=random:8' would cause fio to issue 8 I/Os
		before selecting a new file at random. For the non-uniform
		distributions, a floating point postfix can be given to
		influence how the distribution is skewed. See
		'random_distribution' for a description of how that would work.
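
The Python generator below illustrates the ':N' postfix for random and roundrobin. It is a sketch of the switching rule only. file_schedule is a hypothetical name, and sequential and the non-uniform distributions are not modelled. Because it is a generator, the policy check only runs on the first next() call.

```python
import itertools
import random


def file_schedule(files, policy="roundrobin", per_file=1, seed=None):
    """Yield the file each I/O goes to, switching after per_file I/Os."""
    if policy not in ("roundrobin", "random"):
        raise ValueError(f"unsupported policy {policy!r}")
    rng = random.Random(seed)
    idx = 0
    while True:
        if policy == "random":
            idx = rng.randrange(len(files))
        for _ in range(per_file):
            yield files[idx]
        if policy == "roundrobin":
            idx = (idx + 1) % len(files)


# Roughly what file_service_type=roundrobin:2 looks like over three files.
print(list(itertools.islice(file_schedule(["f0", "f1", "f2"], "roundrobin", 2), 8)))
```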

ioengine=str	Defines how the job issues io to the file. The following
		types are defined:

			sync	Basic read(2) or write(2) io. lseek(2) is
				used to position the io location.

			psync	Basic pread(2) or pwrite(2) io. Default on all
				supported operating systems except for Windows.

			vsync	Basic readv(2) or writev(2) IO.

			pvsync	Basic preadv(2) or pwritev(2) IO.

			pvsync2	Basic preadv2(2) or pwritev2(2) IO.

			libaio	Linux native asynchronous io. Note that Linux
				may only support queued behaviour with
				non-buffered IO (set direct=1 or buffered=0).
				This engine defines engine specific options.

			posixaio glibc posix asynchronous io.

			solarisaio Solaris native asynchronous io.

			windowsaio Windows native asynchronous io.
				Default on Windows.

			mmap	File is memory mapped and data copied
				to/from using memcpy(3).

			splice	splice(2) is used to transfer the data and
				vmsplice(2) to transfer data from user
				space to the kernel.

			sg	SCSI generic sg v3 io. May either be
				synchronous using the SG_IO ioctl, or if
				the target is an sg character device
				we use read(2) and write(2) for asynchronous
				io.

			null	Doesn't transfer any data, just pretends
				to. This is mainly used to exercise fio
				itself and for debugging/testing purposes.

			net	Transfer over the network to given host:port.
				Depending on the protocol used, the hostname,
				port, listen and filename options are used to
				specify what sort of connection to make, while
				the protocol option determines which protocol
				will be used.
				This engine defines engine specific options.

			netsplice Like net, but uses splice/vmsplice to
				map data and send/receive.
				This engine defines engine specific options.

			cpuio	Doesn't transfer any data, but burns CPU
				cycles according to the cpuload= and
				cpuchunks= options. Setting cpuload=85
				will cause that job to do nothing but burn
				85% of the CPU. In case of SMP machines,
				use numjobs=<no_of_cpu> to get desired CPU
				usage, as the cpuload only loads a single
				CPU at the desired rate. A job never finishes
				unless there is at least one non-cpuio job.

			guasi	The GUASI IO engine is the Generic Userspace
				Asynchronous Syscall Interface approach
				to async IO. See

				http://www.xmailserver.org/guasi-lib.html

				for more info on GUASI.
856 | for more info on GUASI. | |
857 | ||
21b8aee8 | 858 | rdma The RDMA I/O engine supports both RDMA |
eb52fa3f BVA |
859 | memory semantics (RDMA_WRITE/RDMA_READ) and |
860 | channel semantics (Send/Recv) for the | |
861 | InfiniBand, RoCE and iWARP protocols. | |
21b8aee8 | 862 | |
b861be9f JA |
863 | falloc IO engine that does regular fallocate to |
864 | simulate data transfer as fio ioengine. | |
865 | DDIR_READ does fallocate(,mode = keep_size,) | |
866 | DDIR_WRITE does fallocate(,mode = 0) | |
867 | DDIR_TRIM does fallocate(,mode = punch_hole) | |
d54fce84 DM |
868 | |
869 | e4defrag IO engine that does regular EXT4_IOC_MOVE_EXT | |
b861be9f JA |
870 | ioctls to simulate defragment activity in |
871 | request to DDIR_WRITE event | |
872 | ||
873 | rbd IO engine supporting direct access to Ceph | |
874 | Rados Block Devices (RBD) via librbd without | |
875 | the need to use the kernel rbd driver. This | |
876 | ioengine defines engine specific options. | |
877 | ||
878 | gfapi Using Glusterfs libgfapi sync interface to | |
879 | direct access to Glusterfs volumes without | |
880 | options. | |
881 | ||
882 | gfapi_async Using Glusterfs libgfapi async interface | |
883 | to direct access to Glusterfs volumes without | |
884 | having to go through FUSE. This ioengine | |
885 | defines engine specific options. | |
0981fd71 | 886 | |
b74e419e | 887 | libhdfs Read and write through Hadoop (HDFS). |
a3f001f5 | 888 | This engine interprets offsets a little |
b74e419e MM |
889 | differently. In HDFS, files once created |
890 | cannot be modified. So random writes are not | |
891 | possible. To imitate this, libhdfs engine | |
a3f001f5 | 892 | creates bunch of small files, and engine will |
42d97b5c SZ |
893 | pick a file out of those files based on the |
894 | offset generated by fio backend. Each jobs uses | |
a3f001f5 | 895 | it's own connection to HDFS. |
1b10477b | 896 | |
65fa28ca DE |
897 | mtd Read, write and erase an MTD character device |
898 | (e.g., /dev/mtd0). Discards are treated as | |
899 | erases. Depending on the underlying device | |
900 | type, the I/O may have to go in a certain | |
901 | pattern, e.g., on NAND, writing sequentially | |
902 | to erase blocks and discarding before | |
903 | overwriting. The writetrim mode works well | |
904 | for this constraint. | |
905 | ||
5c4ef02e JA |
906 | pmemblk Read and write through the NVML libpmemblk |
907 | interface. | |
908 | ||
104ee4de DJ |
909 | dev-dax Read and write through a DAX device exposed |
910 | from persistent memory. | |
911 | ||
8a7bd877 JA |
912 | external Prefix to specify loading an external |
913 | IO engine object file. Append the engine | |
914 | filename, eg ioengine=external:/tmp/foo.o | |
915 | to load ioengine foo.o in /tmp. | |
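
The sync, psync and vsync engines differ mainly in which system call carries the file offset. The Python sketch below reads the same range three ways. It only runs on Unix, since it uses os.pread and os.readv. It illustrates the syscall patterns named above and is not how fio itself drives them. read_three_ways is a hypothetical helper.

```python
import os
import tempfile


def read_three_ways(path, offset, length):
    """Read the same range the sync, psync and vsync ways."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # sync: position with lseek(2), then read(2).
        os.lseek(fd, offset, os.SEEK_SET)
        via_read = os.read(fd, length)
        # psync: pread(2) takes the offset directly; no separate seek.
        via_pread = os.pread(fd, length, offset)
        # vsync: readv(2) scatters one read into several buffers.
        os.lseek(fd, offset, os.SEEK_SET)
        half = length // 2
        bufs = [bytearray(half), bytearray(length - half)]
        os.readv(fd, bufs)
        via_readv = bytes(bufs[0] + bufs[1])
    finally:
        os.close(fd)
    return via_read, via_pread, via_readv


data = os.urandom(8192)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
try:
    results = read_three_ways(tmp.name, 4096, 512)
finally:
    os.unlink(tmp.name)
print(all(r == data[4096:4608] for r in results))  # True
```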

iodepth=int	This defines how many I/O units to keep in flight against
		the file. The default is 1 for each file defined in this
		job, can be overridden with a larger value for higher
		concurrency. Note that increasing iodepth beyond 1 will not
		affect synchronous ioengines (except to a small degree when
		verify_async is in use). Even async engines may impose OS
		restrictions causing the desired depth not to be achieved.
		This may happen on Linux when using libaio and not setting
		direct=1, since buffered IO is not async on that OS. Keep an
		eye on the IO depth distribution in the fio output to verify
		that the achieved depth is as expected. Default: 1.

iodepth_batch_submit=int
iodepth_batch=int This defines how many pieces of IO to submit at once.
		It defaults to 1 which means that we submit each IO
		as soon as it is available, but can be raised to submit
		bigger batches of IO at a time. If it is set to 0 the iodepth
		value will be used.

iodepth_batch_complete_min=int
iodepth_batch_complete=int This defines how many pieces of IO to retrieve
		at once. It defaults to 1 which means that we'll ask
		for a minimum of 1 IO in the retrieval process from
		the kernel. The IO retrieval will go on until we
		hit the limit set by iodepth_low. If this variable is
		set to 0, then fio will always check for completed
		events before queuing more IO. This helps reduce
		IO latency, at the cost of more retrieval system calls.

iodepth_batch_complete_max=int This defines the maximum number of IOs to
		retrieve at once. This variable should be used along with
		the iodepth_batch_complete_min=int variable, specifying the
		range of min and max amount of IO which should be retrieved.
		By default it is equal to the iodepth_batch_complete_min value.

		Example #1:

			iodepth_batch_complete_min=1
			iodepth_batch_complete_max=<iodepth>

		which means that we will retrieve at least 1 IO and up to the
		whole submitted queue depth. If no IO has completed yet, we
		will wait.

		Example #2:

			iodepth_batch_complete_min=0
			iodepth_batch_complete_max=<iodepth>

		which means that we can retrieve up to the whole submitted
		queue depth, but if no IO has completed yet, we will NOT
		wait and will immediately exit the system call. In this
		example we simply do polling.

iodepth_low=int	The low water mark indicating when to start filling
		the queue again. Defaults to the same as iodepth, meaning
		that fio will attempt to keep the queue full at all times.
		If iodepth is set to e.g. 16 and iodepth_low is set to 4, then
		after fio has filled the queue of 16 requests, it will let
		the depth drain down to 4 before starting to fill it again.
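
The following toy Python model shows how iodepth and iodepth_low interact. Completions arrive one at a time, and the queue is refilled to iodepth only once the in-flight count has drained to iodepth_low. It ignores batching (iodepth_batch*) and real completion timing, and depth_trace is a hypothetical name. Each recorded value is the depth after any refill, so the low mark shows up as the jump back to iodepth.

```python
def depth_trace(iodepth, iodepth_low, completions):
    """In-flight count after each completion under a simple refill rule."""
    in_flight = iodepth            # the queue starts full
    trace = [in_flight]
    for _ in range(completions):
        in_flight -= 1             # one I/O completes
        if in_flight <= iodepth_low:
            in_flight = iodepth    # top the queue back up in one go
        trace.append(in_flight)
    return trace


print(depth_trace(16, 4, 14))   # drains 16 -> 5, refills on hitting 4
print(depth_trace(16, 16, 5))   # iodepth_low == iodepth keeps it full
```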

io_submit_mode=str This option controls how fio submits the IO to
		the IO engine. The default is 'inline', which means that the
		fio job threads submit and reap IO directly. If set to
		'offload', the job threads will offload IO submission to a
		dedicated pool of IO threads. This requires some coordination
		and thus has a bit of extra overhead, especially for lower
		queue depth IO where it can increase latencies. The benefit
		is that fio can manage submission rates independently of
		the device completion rates. This avoids skewed latency
		reporting if IO backs up on the device side (the
		coordinated omission problem).

direct=bool	If value is true, use non-buffered io. This is usually
		O_DIRECT. Note that ZFS on Solaris doesn't support direct io.
		On Windows the synchronous ioengines don't support direct io.

atomic=bool	If value is true, attempt to use atomic direct IO. Atomic
		writes are guaranteed to be stable once acknowledged by
		the operating system. Only Linux supports O_ATOMIC right
		now.

buffered=bool	If value is true, use buffered io. This is the opposite
		of the 'direct' option. Defaults to true.

offset=int	Start io at the given offset in the file. The data before
		the given offset will not be touched. This effectively
		caps the file size at real_size - offset.

offset_increment=int If this is provided, then the real offset becomes
		offset + offset_increment * thread_number, where the thread
		number is a counter that starts at 0 and is incremented for
		each sub-job (i.e. when the numjobs option is specified). This
		option is useful if there are several jobs which are intended
		to operate on a file in parallel disjoint segments, with
		even spacing between the starting points.
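
As a worked example of the formula, four sub-jobs with offset=0 and offset_increment=10g start at 0, 10, 20 and 30 GiB. The small Python helper below computes the same thing, with thread_number counting from 0 as described. job_offsets is a hypothetical name, used for illustration only.

```python
def job_offsets(offset, offset_increment, numjobs):
    """Start offset of each sub-job: offset + offset_increment * k."""
    return [offset + offset_increment * k for k in range(numjobs)]


GiB = 1024 ** 3
# e.g. offset=0, offset_increment=10g, numjobs=4
print([o // GiB for o in job_offsets(0, 10 * GiB, 4)])  # [0, 10, 20, 30]
```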

number_ios=int	Fio will normally perform IOs until it has exhausted the size
		of the region set by size=, or until it exhausts the allocated
		time (or hits an error condition). With this setting, the
		range/size can be set independently of the number of IOs to
		perform. When fio reaches this number, it will exit normally
		and report status. Note that this does not extend the amount
		of IO that will be done, it will only stop fio if this
		condition is met before other end-of-job criteria.

fsync=int	If writing to a file, issue a sync of the dirty data
		for every number of blocks given. For example, if you give
		32 as a parameter, fio will sync the file for every 32
		writes issued. If fio is using non-buffered io, we may
		not sync the file. The exception is the sg io engine, which
		synchronizes the disk cache anyway.

fdatasync=int	Like fsync= but uses fdatasync() to only sync data and not
		metadata blocks.
		On FreeBSD and Windows, which have no fdatasync(), this falls
		back to using fsync().

sync_file_range=str:val Use sync_file_range() for every 'val' number of
		write operations. Fio will track the range of writes that
		have happened since the last sync_file_range() call. 'str'
		can currently be one or more of:

			wait_before	SYNC_FILE_RANGE_WAIT_BEFORE
			write		SYNC_FILE_RANGE_WRITE
			wait_after	SYNC_FILE_RANGE_WAIT_AFTER

		So if you do sync_file_range=wait_before,write:8, fio would
		use SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE for
		every 8 writes. Also see the sync_file_range(2) man page.
		This option is Linux specific.
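
The option value maps onto a flags bitmask plus a write count. The Python sketch below parses it that way. The flag values are the ones defined in Linux's <fcntl.h> (WAIT_BEFORE=1, WRITE=2, WAIT_AFTER=4). parse_sync_file_range itself is a hypothetical helper, not fio's parser.

```python
# Flag values as defined in Linux <fcntl.h>.
SYNC_FLAGS = {
    "wait_before": 1,  # SYNC_FILE_RANGE_WAIT_BEFORE
    "write": 2,        # SYNC_FILE_RANGE_WRITE
    "wait_after": 4,   # SYNC_FILE_RANGE_WAIT_AFTER
}


def parse_sync_file_range(value):
    """'wait_before,write:8' -> (flag bitmask, sync every N writes)."""
    names, _, every = value.rpartition(":")
    if not names:
        raise ValueError("expected str:val, e.g. wait_before,write:8")
    flags = 0
    for name in names.split(","):
        flags |= SYNC_FLAGS[name]  # unknown names raise KeyError
    return flags, int(every)


print(parse_sync_file_range("wait_before,write:8"))  # (3, 8)
```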

overwrite=bool	If true, writes to a file will always overwrite existing
		data. If the file doesn't already exist, it will be
		created before the write phase begins. If the file exists
		and is large enough for the specified write phase, nothing
		will be done.

end_fsync=bool	If true, fsync file contents when a write stage has completed.

fsync_on_close=bool If true, fio will fsync() a dirty file on close.
		This differs from end_fsync in that it will happen on every
		file close, not just at the end of the job.

rwmixread=int	How large a percentage of the mix should be reads.

rwmixwrite=int	How large a percentage of the mix should be writes. If both
		rwmixread and rwmixwrite are given and the values do not add
		up to 100%, the latter of the two will be used to override
		the first. This may interfere with a given rate setting,
		if fio is asked to limit reads or writes to a certain rate.
		If that is the case, then the distribution may be skewed.

random_distribution=str:float[,str:float][,str:float]
		By default, fio will use a completely uniform
		random distribution when asked to perform random IO. Sometimes
		it is useful to skew the distribution in specific ways,
		ensuring that some parts of the data are hotter than others.
		fio includes the following distribution models:

			random		Uniform random distribution
			zipf		Zipf distribution
			pareto		Pareto distribution
			gauss		Normal (gaussian) distribution
			zoned		Zoned random distribution

		When using a zipf or pareto distribution, an input value
		is also needed to define the access pattern. For zipf, this
		is the zipf theta. For pareto, it's the pareto power. Fio
		includes a test program, genzipf, that can be used to visualize
		what the given input values will yield in terms of hit rates.
		If you wanted to use zipf with a theta of 1.2, you would use
		random_distribution=zipf:1.2 as the option. If a non-uniform
		model is used, fio will disable use of the random map. For
		the gauss distribution, a normal deviation is supplied as
		a value between 0 and 100.

		For a zoned distribution, fio supports specifying percentages
		of IO access that should fall within what range of the file or
		device. For example, given the following criteria:

			60% of accesses should be to the first 10%
			30% of accesses should be to the next 20%
			8% of accesses should be to the next 30%
			2% of accesses should be to the next 40%

		we can define that through zoning of the random accesses. For
		the above example, the user would do:

			random_distribution=zoned:60/10:30/20:8/30:2/40

		similarly to how bssplit works for setting ranges and
		percentages of block sizes. Like bssplit, it's possible to
		specify separate zones for reads, writes, and trims. If just
		one set is given, it'll apply to all of them.
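
The zoned example can be reproduced with the Python sketch below. It picks a zone weighted by its access percentage, then draws a uniform offset inside that zone's slice of the file. parse_zoned takes the part after "zoned:", and it requires both columns to add up to 100, as they do in the example. Both helpers are hypothetical, and fio's own sampling may differ in detail.

```python
import random


def parse_zoned(zones_spec):
    """'60/10:30/20:8/30:2/40' -> [(access%, size%), ...]."""
    zones = [tuple(int(x) for x in z.split("/")) for z in zones_spec.split(":")]
    if sum(a for a, _ in zones) != 100 or sum(s for _, s in zones) != 100:
        raise ValueError("access and size percentages must each add up to 100")
    return zones


def zoned_offset(zones, file_size, rng):
    """Pick a zone weighted by access%, then a uniform offset inside it."""
    bounds, start = [], 0
    for _, size_pct in zones:
        end = start + file_size * size_pct // 100
        bounds.append((start, end))
        start = end
    lo, hi = rng.choices(bounds, weights=[a for a, _ in zones])[0]
    return rng.randrange(lo, hi)


zones = parse_zoned("60/10:30/20:8/30:2/40")
rng = random.Random(42)
hits = [zoned_offset(zones, 1_000_000, rng) for _ in range(10_000)]
print(sum(h < 100_000 for h in hits) / len(hits))  # close to 0.60
```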

percentage_random=int[,int][,int]
		For a random workload, set how big a percentage should
		be random. This defaults to 100%, in which case the workload
		is fully random. It can be set anywhere from 0 to 100.
		Setting it to 0 would make the workload fully sequential. Any
		setting in between will result in a random mix of sequential
		and random IO, at the given percentages.
		Comma-separated values may be specified for reads, writes,
		and trims as described in 'blocksize'.

norandommap	Normally fio will cover every block of the file when doing
		random IO. If this option is given, fio will just get a
		new random offset without looking at past io history. This
		means that some blocks may not be read or written, and that
		some blocks may be read/written more than once. If this option
		is used with verify= and multiple blocksizes (via bsrange=),
		only intact blocks are verified, i.e., partially-overwritten
		blocks are ignored.

softrandommap=bool See norandommap. If fio runs with the random block map
		enabled and it fails to allocate the map, if this option is
		set it will continue without a random block map. As coverage
		will not be as complete as with random maps, this option is
		disabled by default.

random_generator=str Fio supports the following engines for generating
		IO offsets for random IO:

			tausworthe	Strong 2^88 cycle random number generator
			lfsr		Linear feedback shift register generator
			tausworthe64	Strong 64-bit 2^258 cycle random number
					generator

		Tausworthe is a strong random number generator, but it
		requires tracking on the side if we want to ensure that
		blocks are only read or written once. LFSR guarantees
		that we never generate the same offset twice, and it's
		also less computationally expensive. It's not a true
		random generator, however, though for IO purposes it's
		typically good enough. LFSR only works with single
		block sizes, not with workloads that use multiple block
		sizes. If used with such a workload, fio may read or write
		some blocks multiple times. The default value is tausworthe,
		unless the required space exceeds 2^32 blocks. If it does,
		then tausworthe64 is selected automatically.
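
The Python sketch below shows why an LFSR needs no side tracking. It steps a 16-bit Galois LFSR with the well-known maximal-length taps 0xB400, which cycles through every non-zero 16-bit state exactly once. States that fall outside the file's block count are skipped. This illustrates the idea only. fio's own LFSR uses different widths and stepping, and lfsr_offsets is a hypothetical name.

```python
def lfsr_offsets(num_blocks, seed=1):
    """Yield block indices 0..num_blocks-1, each exactly once."""
    if not 0 < num_blocks <= 0xFFFF:
        raise ValueError("this 16-bit sketch handles 1..65535 blocks")
    if not 0 < seed <= 0xFFFF:
        raise ValueError("seed must be a non-zero 16-bit value")
    state = seed
    for _ in range(0xFFFF):          # one full period visits every state once
        if state - 1 < num_blocks:   # map states 1..65535 to indices 0..65534
            yield state - 1
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400          # taps for x^16 + x^14 + x^13 + x^11 + 1


offsets = list(lfsr_offsets(1000))
print(len(offsets), len(set(offsets)))  # 1000 1000
```

Skipping out-of-range states is the usual way to fit a power-of-two period onto an arbitrary block count. The cost is some wasted steps when the count is far below the period.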

nice=int	Run the job with the given nice value. See man nice(2).

		On Windows, values less than -15 set the process class to "High";
		-1 through -15 set "Above Normal"; 1 through 15 "Below Normal";
		and above 15 "Idle" priority class.

prio=int	Set the io priority value of this job. Linux limits us to
		a value between 0 and 7, with 0 being the highest.
		See man ionice(1). Refer to an appropriate manpage for
		other operating systems since the meaning of priority may
		differ.

prioclass=int	Set the io priority class. See man ionice(1).

thinktime=int	Stall the job x microseconds after an io has completed before
		issuing the next. May be used to simulate processing being
		done by an application. See thinktime_blocks and
		thinktime_spin.

thinktime_spin=int
		Only valid if thinktime is set - pretend to spend CPU time
		doing something with the data received, before falling back
		to sleeping for the rest of the period specified by
		thinktime.

thinktime_blocks=int
		Only valid if thinktime is set - control how many blocks
		to issue, before waiting 'thinktime' usecs. If not set,
		defaults to 1 which will make fio wait 'thinktime' usecs
		after every block. This effectively makes any queue depth
		setting redundant, since no more than 1 IO will be queued
		before we have to complete it and do our thinktime. In
		other words, this setting effectively caps the queue depth
		if the latter is larger.

rate=int[,int][,int]
		Cap the bandwidth used by this job. The number is in bytes/sec,
		the normal suffix rules apply.
		Comma-separated values may be specified for reads, writes,
		and trims as described in 'blocksize'.

rate_min=int[,int][,int]
		Tell fio to do whatever it can to maintain at least this
		bandwidth. Failing to meet this requirement will cause
		the job to exit.
		Comma-separated values may be specified for reads, writes,
		and trims as described in 'blocksize'.

rate_iops=int[,int][,int]
		Cap the bandwidth to this number of IOPS. Basically the same
		as rate, just specified independently of bandwidth. If the
		job is given a block size range instead of a fixed value,
		the smallest block size is used as the metric.
		Comma-separated values may be specified for reads, writes,
		and trims as described in 'blocksize'.

rate_iops_min=int[,int][,int]
		If fio doesn't meet this rate of IO, it will cause
		the job to exit.
		Comma-separated values may be specified for reads, writes,
		and trims as described in 'blocksize'.

rate_process=str This option controls how fio manages rated IO
		submissions. The default is 'linear', which submits IO in a
		linear fashion with fixed delays between IOs that get
		adjusted based on IO completion rates. If this is set to
		'poisson', fio will submit IO based on a more real world
		random request flow, known as the Poisson process
		(https://en.wikipedia.org/wiki/Poisson_process). The lambda
		will be 10^6 / IOPS for the given workload.
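
The two processes differ only in how the gaps between submissions are drawn. In the Python sketch below, 'linear' uses a fixed gap and 'poisson' draws exponentially distributed gaps. The sketch reads the 10^6 / IOPS figure above as the mean gap in microseconds, which is an interpretation rather than fio's code. It also leaves out the completion-based adjustment fio applies in linear mode. submit_gaps_usec is a hypothetical name.

```python
import random
import statistics


def submit_gaps_usec(iops, count, process="linear", seed=None):
    """Gaps between submissions, in microseconds, for a target IOPS."""
    mean_gap = 1e6 / iops
    if process == "linear":
        return [mean_gap] * count
    if process == "poisson":
        rng = random.Random(seed)
        # Exponential gaps with this mean form a Poisson arrival process.
        return [rng.expovariate(1.0 / mean_gap) for _ in range(count)]
    raise ValueError(f"unknown rate_process {process!r}")


gaps = submit_gaps_usec(1000, 50_000, "poisson", seed=7)
print(round(statistics.mean(gaps)))  # near 1000
```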

latency_target=int If set, fio will attempt to find the max performance
		point that the given workload will run at while maintaining a
		latency below this target. The value is given in microseconds.
		See latency_window and latency_percentile.

latency_window=int Used with latency_target to specify the sample window
		that the job is run at varying queue depths to test the
		performance. The value is given in microseconds.

latency_percentile=float The percentage of IOs that must fall within the
		criteria specified by latency_target and latency_window. If not
		set, this defaults to 100.0, meaning that all IOs must be at
		or below the value set by latency_target.
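
The Python sketch below shows one way to read the pass/fail rule for a single latency_window sample. The window passes when at least latency_percentile percent of its completion latencies are at or below latency_target. window_meets_target is a hypothetical helper, and treating an empty window as passing is an assumption of the sketch.

```python
def window_meets_target(latencies_usec, target_usec, percentile=100.0):
    """True if at least `percentile`% of latencies are <= target_usec."""
    if not latencies_usec:
        return True
    within = sum(1 for lat in latencies_usec if lat <= target_usec)
    return within * 100.0 >= percentile * len(latencies_usec)


window = [120, 250, 300, 480]
print(window_meets_target(window, 300, 75.0))  # True: 3 of 4 at or below
print(window_meets_target(window, 300))        # False: default needs all
```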

max_latency=int	If set, fio will exit the job if it exceeds this maximum
		latency. It will exit with an ETIME error.

rate_cycle=int	Average bandwidth for 'rate' and 'rate_min' over this number
		of milliseconds.

cpumask=int	Set the CPU affinity of this job. The parameter given is a
		bitmask of allowed CPUs the job may run on. So if you want
		the allowed CPUs to be 1 and 5, you would pass the decimal
		value of (1 << 1 | 1 << 5), or 34. See man
		sched_setaffinity(2). This may not work on all supported
		operating systems or kernel versions. This option doesn't
		work well for a higher CPU count than what you can store in
		an integer mask, so it can only control cpus 1-32. For
		boxes with larger CPU counts, use cpus_allowed.

cpus_allowed=str Controls the same options as cpumask, but it allows a text
		setting of the permitted CPUs instead. So to use CPUs 1 and
		5, you would specify cpus_allowed=1,5. This option also
		allows a range of CPUs. Say you wanted a binding to CPUs
		1, 5, and 8-15, you would set cpus_allowed=1,5,8-15.
d2e268b0 | 1264 | |
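Here is a minimal sketch of how a cpus_allowed-style list such as 1,5,8-15 can be expanded into a set of CPUs. parse_cpu_list is a hypothetical helper written for illustration, not fio's own parser. It handles only comma-separated numbers and A-B ranges.

```c
#include <stdlib.h>

/*
 * Hypothetical parser (not fio's): expand a list such as "1,5,8-15".
 * The caller passes a zeroed array cpus[max_cpus]; every listed CPU is
 * set to 1. Returns the number of distinct CPUs set, or -1 for a
 * malformed list or a CPU number outside 0..max_cpus-1.
 */
static int parse_cpu_list(const char *s, unsigned char *cpus, int max_cpus)
{
	int count = 0;

	while (*s) {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi;

		if (end == s)
			return -1;		/* expected a CPU number */
		hi = lo;
		if (*end == '-') {		/* A-B range */
			const char *p = end + 1;

			hi = strtol(p, &end, 10);
			if (end == p || hi < lo)
				return -1;
		}
		if (lo < 0 || hi >= max_cpus)
			return -1;
		for (long c = lo; c <= hi; c++) {
			if (!cpus[c]) {
				cpus[c] = 1;
				count++;
			}
		}
		if (*end == ',')
			end++;			/* next list element */
		else if (*end != '\0')
			return -1;
		s = end;
	}
	return count;
}
```

When -1 is returned, some entries may already have been set. A stricter parser would validate the whole string first.
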
cpus_allowed_policy=str	Set the policy of how fio distributes the CPUs
		specified by cpus_allowed or cpumask. Two policies are
		supported:

		shared	All jobs will share the CPU set specified.
		split	Each job will get a unique CPU from the CPU set.

		'shared' is the default behaviour, if the option isn't
		specified. If split is specified, then fio will assign
		one cpu per job. If not enough CPUs are given for the jobs
		listed, then fio will round-robin the CPUs in the set.

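Suppose the round-robin behaviour described above means that job i takes entry (i mod n) of the allowed CPU set. Under that assumption, the split policy reduces to an index calculation. split_cpu_for_job is a hypothetical name, and fio's actual assignment code may differ.

```c
/*
 * Hypothetical sketch of the 'split' policy: job_index (0-based) is
 * pinned to entry job_index mod n of the allowed CPU set, so CPUs are
 * reused round-robin once every CPU already has a job.
 */
static int split_cpu_for_job(const int *cpu_set, int n, int job_index)
{
	return cpu_set[job_index % n];
}
```
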
numa_cpu_nodes=str	Set this job running on the CPUs of the specified NUMA
		nodes. The argument is a comma-delimited list of node
		numbers, A-B ranges, or 'all'. Note: to enable numa options
		support, fio must be built on a system with libnuma-dev(el)
		installed.

numa_mem_policy=str	Set this job's memory policy and corresponding NUMA
		nodes. Format of the argument:
			<mode>[:<nodelist>]
		`mode' is one of the following memory policies:
			default, prefer, bind, interleave, local
		For `default' and `local' memory policies, no node needs
		to be specified.
		For `prefer', only one node is allowed.
		For `bind' and `interleave', the nodelist may be a
		comma-delimited list of numbers, A-B ranges, or 'all'.

startdelay=time	Start this job the specified number of seconds after fio
		has started. Only useful if the job file contains several
		jobs, and you want to delay starting some jobs to a certain
		time.

runtime=time	Tell fio to terminate processing after the specified number
		of seconds. It can be quite hard to determine for how long
		a specified job will run, so this parameter is handy to
		cap the total runtime to a given time.

time_based	If set, fio will run for the duration of the runtime
		specified even if the file(s) are completely read or
		written. It will simply loop over the same workload
		as many times as the runtime allows.

ramp_time=time	If set, fio will run the specified workload for this amount
		of time before logging any performance numbers. Useful for
		letting performance settle before logging results, thus
		minimizing the runtime required for stable results. Note
		that the ramp_time is considered lead-in time for a job,
		thus it will increase the total runtime if a special timeout
		or runtime is specified.

steadystate=str:float
ss=str:float	Define the criterion and limit for assessing steady state
		performance. The first parameter designates the criterion
		whereas the second parameter sets the threshold. When the
		criterion falls below the threshold for the specified duration,
		the job will stop. For example, iops_slope:0.1% will direct fio
		to terminate the job when the least squares regression slope
		falls below 0.1% of the mean IOPS. If group_reporting is
		enabled, this will apply to all jobs in the group. Below is the
		list of available steady state assessment criteria. All
		assessments are carried out using only data from the rolling
		collection window. Threshold limits can be expressed as a fixed
		value or as a percentage of the mean in the collection window.

			iops	Collect IOPS data. Stop the job if all
				individual IOPS measurements are within the
				specified limit of the mean IOPS (e.g., iops:2
				means that all individual IOPS values must be
				within 2 of the mean, whereas iops:0.2% means
				that all individual IOPS values must be within
				0.2% of the mean IOPS to terminate the job).

			iops_slope
				Collect IOPS data and calculate the least
				squares regression slope. Stop the job if the
				slope falls below the specified limit.

			bw	Collect bandwidth data. Stop the job if all
				individual bandwidth measurements are within
				the specified limit of the mean bandwidth.

			bw_slope
				Collect bandwidth data and calculate the least
				squares regression slope. Stop the job if the
				slope falls below the specified limit.

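To make the steady state criteria concrete, here is a sketch of the checks applied to one collection window. It assumes one sample per second (x = 0, 1, ..., n-1) and treats a percentage limit as that percentage of the window mean. All function names are hypothetical. Comparing the absolute value of the slope against the limit is an assumption of this sketch, not a documented detail.

```c
#include <stddef.h>

/* Mean of n samples (n > 0). */
static double ss_mean(const double *y, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += y[i];
	return sum / (double)n;
}

/* Least squares slope of y against x = 0, 1, ..., n-1. */
static double ss_slope(const double *y, size_t n)
{
	double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;

	for (size_t i = 0; i < n; i++) {
		double x = (double)i;

		sx += x;
		sy += y[i];
		sxx += x * x;
		sxy += x * y[i];
	}
	double denom = (double)n * sxx - sx * sx;
	return denom == 0.0 ? 0.0 : ((double)n * sxy - sx * sy) / denom;
}

/* Convert a limit to absolute units; percent limits are relative to the mean. */
static double ss_limit(double limit, int is_percent, double mean)
{
	return is_percent ? limit / 100.0 * mean : limit;
}

/* 'iops' / 'bw': every sample lies within the limit of the window mean. */
static int ss_range_met(const double *y, size_t n, double limit, int is_percent)
{
	double m = ss_mean(y, n);
	double lim = ss_limit(limit, is_percent, m);

	for (size_t i = 0; i < n; i++) {
		double dev = y[i] > m ? y[i] - m : m - y[i];

		if (dev > lim)
			return 0;
	}
	return 1;
}

/* 'iops_slope' / 'bw_slope': magnitude of the slope is below the limit. */
static int ss_slope_met(const double *y, size_t n, double limit, int is_percent)
{
	double s = ss_slope(y, n);
	double lim = ss_limit(limit, is_percent, ss_mean(y, n));

	return (s < 0 ? -s : s) < lim;
}
```

In this sketch, iops_slope:0.1% corresponds to ss_slope_met(samples, n, 0.1, 1).
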
steadystate_duration=time
ss_dur=time	A rolling window of this duration will be used to judge whether
		steady state has been reached. Data will be collected once per
		second. The default is 0, which disables steady state detection.

steadystate_ramp_time=time
ss_ramp=time	Allow the job to run for the specified duration before
		beginning data collection for checking the steady state job
		termination criterion. The default is 0.

invalidate=bool	Invalidate the buffer/page cache parts for this file prior
		to starting io. Defaults to true.

sync=bool	Use sync io for buffered writes. For the majority of the
		io engines, this means using O_SYNC.

iomem=str
mem=str		Fio can use various types of memory as the I/O unit buffer.
		The allowed values are:

			malloc	Use memory from malloc(3) as the buffers.
				Default memory type.

			shm	Use shared memory as the buffers. Allocated
				through shmget(2).

			shmhuge	Same as shm, but use huge pages as backing.

			mmap	Use mmap to allocate buffers. May either be
				anonymous memory, or can be file backed if
				a filename is given after the option. The
				format is mem=mmap:/path/to/file.

			mmaphuge Use a memory mapped huge file as the buffer
				backing. Append the filename after mmaphuge, as
				in mem=mmaphuge:/hugetlbfs/file

			mmapshared Same as mmap, but use a MAP_SHARED
				mapping.

		The area allocated is a function of the maximum allowed
		bs size for the job, multiplied by the io depth given. Note
		that for shmhuge and mmaphuge to work, the system must have
		free huge pages allocated. This can normally be checked
		and set by reading/writing /proc/sys/vm/nr_hugepages on a
		Linux system. Fio assumes a huge page is 4MiB in size. So
		to calculate the number of huge pages you need for a given
		job file, add up the io depth of all jobs (normally one unless
		iodepth= is used) and multiply by the maximum bs set. Then
		divide that number by the huge page size. You can see the
		size of the huge pages in /proc/meminfo. If no huge pages
		are allocated (that is, nr_hugepages is zero), using mmaphuge
		or shmhuge will fail. Also see hugepage-size.

		mmaphuge also needs to have hugetlbfs mounted and the file
		location should point there. So if it's mounted in /huge,
		you would use mem=mmaphuge:/huge/somefile.

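The huge page arithmetic above can be sketched as follows. hugepages_needed is a hypothetical helper. Rounding up to a whole page is this sketch's choice. The page size passed in should match the value in /proc/meminfo.

```c
#include <stdint.h>

/*
 * Hypothetical helper: huge pages needed for the I/O buffers of a job
 * file, computed as (sum of iodepths) * max bs / huge page size,
 * rounded up to a whole page.
 */
static uint64_t hugepages_needed(const unsigned *iodepths, int njobs,
				 uint64_t max_bs, uint64_t hugepage_size)
{
	uint64_t total_depth = 0;

	for (int i = 0; i < njobs; i++)
		total_depth += iodepths[i];

	uint64_t bytes = total_depth * max_bs;
	return (bytes + hugepage_size - 1) / hugepage_size;
}
```

For example, two jobs with iodepth 32 and 1, a maximum bs of 1MiB and 4MiB huge pages need 33MiB of buffers. That comes to 9 huge pages.
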
iomem_align=int	This indicates the memory alignment of the IO memory buffers.
		Note that the given alignment is applied to the first I/O unit
		buffer; if using iodepth, the alignment of the following
		buffers is given by the bs used. In other words, if using a bs
		that is a multiple of the page size in the system, all buffers
		will be aligned to this value. If using a bs that is not page
		aligned, the alignment of subsequent IO memory buffers is the
		sum of the iomem_align and bs used.

hugepage-size=int
		Defines the size of a huge page. Must be at least equal
		to the system setting, see /proc/meminfo. Defaults to 4MiB.
		Should probably always be a multiple of megabytes, so using
		hugepage-size=Xm is the preferred way to set this to avoid
		setting a bad value that is not a power of 2.

exitall		When one job finishes, terminate the rest. The default is
		to wait for each job to finish; sometimes that is not the
		desired action.

exitall_on_error	When one job finishes in error, terminate the rest. The
		default is to wait for each job to finish.

bwavgtime=int	Average the calculated bandwidth over the given time. Value
		is specified in milliseconds. If the job also does bandwidth
		logging through 'write_bw_log', then the minimum of this option
		and 'log_avg_msec' will be used. Default: 500ms.

iopsavgtime=int	Average the calculated IOPS over the given time. Value
		is specified in milliseconds. If the job also does IOPS logging
		through 'write_iops_log', then the minimum of this option and
		'log_avg_msec' will be used. Default: 500ms.

create_serialize=bool	If true, serialize the file creation for the jobs.
		This may be handy to avoid interleaving of data
		files, which may greatly depend on the filesystem
		used and even the number of processors in the system.

create_fsync=bool	fsync the data file after creation. This is the
		default.

create_on_open=bool	Don't pre-set up the files for IO; instead, create
		and open() each file when it's time to do IO to it.

create_only=bool	If true, fio will only run the setup phase of the job.
		If files need to be laid out or updated on disk, only
		that will be done. The actual job contents are not
		executed.

allow_file_create=bool	If true, fio is permitted to create files as part
		of its workload. This is the default behavior. If this
		option is false, then fio will error out if the files it
		needs to use don't already exist. Default: true.

allow_mounted_write=bool	If this isn't set, fio will abort jobs that
		are destructive (e.g. that write) to what appears to be a
		mounted device or partition. This should help catch
		inadvertently destructive tests, where the user did not realize
		that the test would destroy data on the mounted file system.
		Default: false.

pre_read=bool	If this is given, files will be pre-read into memory before
		starting the given IO operation. This will also clear
		the 'invalidate' flag, since it is pointless to pre-read
		and then drop the cache. This will only work for IO engines
		that are seekable, since they allow you to read the same data
		multiple times. Thus it will not work on e.g. network or splice
		IO.

unlink=bool	Unlink the job files when done. Not the default, as repeated
		runs of that job would then waste time recreating the file
		set again and again.

unlink_each_loop=bool	Unlink job files after each iteration or loop.

loops=int	Run the specified number of iterations of this job. Used
		to repeat the same workload a given number of times. Defaults
		to 1.

verify_only	Do not perform the specified workload; only verify that data
		still matches a previous invocation of this workload. This
		option allows one to check data multiple times at a later date
		without overwriting it. This option makes sense only for
		workloads that write data, and does not support workloads
		with the time_based option set.

do_verify=bool	Run the verify phase after a write phase. Only makes sense if
		verify is set. Defaults to 1.

verify=str	If writing to a file, fio can verify the file contents
		after each iteration of the job. Each verification method also
		implies verification of a special header, which is written to
		the beginning of each block. This header also includes meta
		information, like the offset of the block, the block number,
		the timestamp when the block was written, etc. verify=str can
		be combined with the verify_pattern=str option.
		The allowed values are:

			md5	Use an md5 sum of the data area and store
				it in the header of each block.

			crc64	Use an experimental crc64 sum of the data
				area and store it in the header of each
				block.

			crc32c	Use a crc32c sum of the data area and store
				it in the header of each block.

			crc32c-intel Use hardware assisted crc32c calculation
				provided on SSE4.2 enabled processors. Falls
				back to regular software crc32c, if not
				supported by the system.

			crc32	Use a crc32 sum of the data area and store
				it in the header of each block.

			crc16	Use a crc16 sum of the data area and store
				it in the header of each block.

			crc7	Use a crc7 sum of the data area and store
				it in the header of each block.

			xxhash	Use xxhash as the checksum function. Generally
				the fastest software checksum that fio
				supports.

			sha512	Use sha512 as the checksum function.

			sha256	Use sha256 as the checksum function.

			sha1	Use optimized sha1 as the checksum function.

			meta	This option is deprecated, since now meta
				information is included in the generic
				verification header and meta verification
				happens by default. For detailed information
				see the description of the verify=str setting.
				This option is kept only for compatibility with
				old configurations. Do not use it.

			pattern	Verify a strict pattern. Normally fio includes
				a header with some basic information and
				checksumming, but if this option is set, only
				the specific pattern set with 'verify_pattern'
				is verified.

			null	Only pretend to verify. Useful for testing
				internals with ioengine=null, not for much
				else.

		This option can be used for repeated burn-in tests of a
		system to make sure that the written data is also
		correctly read back. If the data direction given is
		a read or random read, fio will assume that it should
		verify a previously written file. If the data direction
		includes any form of write, the verify will be of the
		newly written data.

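The sketch below shows what storing a checksum in the block header means. It computes a bitwise CRC-32C (the Castagnoli polynomial that crc32c refers to) and pairs it with a made-up header. The struct demo_hdr layout and both demo_hdr_* helpers are hypothetical. fio's real verify header carries more fields and is laid out differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical per-block header; not fio's real layout. */
struct demo_hdr {
	uint64_t offset;	/* where the block was written */
	uint32_t csum;		/* checksum of the data area */
};

/* Write phase: record the block offset and the data checksum. */
static void demo_hdr_fill(struct demo_hdr *h, uint64_t off,
			  const void *data, size_t len)
{
	h->offset = off;
	h->csum = crc32c(data, len);
}

/* Verify phase: 1 if the data read back matches its header. */
static int demo_hdr_check(const struct demo_hdr *h, uint64_t off,
			  const void *data, size_t len)
{
	return h->offset == off && h->csum == crc32c(data, len);
}
```

A table-driven or SSE4.2-based CRC, as with crc32c-intel, is much faster. The bitwise loop is used here only because it is short.
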
verifysort=bool	If set, fio will sort written verify blocks when it deems
		it faster to read them back in a sorted manner. This is
		often the case when overwriting an existing file, since
		the blocks are already laid out in the file system. You
		can ignore this option unless doing huge amounts of really
		fast IO where the red-black tree sorting CPU time becomes
		significant.

verify_offset=int	Swap the verification header with data somewhere else
		in the block before writing. It is swapped back before
		verifying.

verify_interval=int	Write the verification header at a finer granularity
		than the blocksize. It will be written for chunks the
		size of verify_interval. verify_interval should divide
		the blocksize evenly.

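When verify_interval is set, a header starts at every multiple of the interval inside a block. The hypothetical helper below lists those offsets and rejects intervals that do not divide the block size evenly. It is an illustration, not fio's code.

```c
/*
 * Hypothetical helper: store in out[] the offsets within one block at
 * which a verification header starts. Returns the number of headers,
 * or -1 if the interval does not divide bs evenly or more than max_out
 * offsets would be needed.
 */
static int verify_header_offsets(unsigned bs, unsigned interval,
				 unsigned *out, int max_out)
{
	if (interval == 0 || interval > bs || bs % interval != 0)
		return -1;

	int n = (int)(bs / interval);
	if (n > max_out)
		return -1;
	for (int i = 0; i < n; i++)
		out[i] = (unsigned)i * interval;
	return n;
}
```
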
verify_pattern=str	If set, fio will fill the io buffers with this
		pattern. Fio defaults to filling with totally random
		bytes, but sometimes it's interesting to fill with a known
		pattern for io verification purposes. Depending on the
		width of the pattern, fio will fill 1/2/3/4 bytes of the
		buffer at a time (it can be either a decimal or a hex number).
		If verify_pattern is larger than a 32-bit quantity, it has to
		be a hex number that starts with either "0x" or "0X". Use
		with verify=str. Also, verify_pattern supports the %o format,
		which means that for each block its offset will be written
		and then verified back, e.g.:

			verify_pattern=%o

		Or use a combination of everything:

			verify_pattern=0xff%o"abcd"-12

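Here is a sketch of filling and checking a buffer with a repeating pattern, in the spirit of verify_pattern. The pattern is passed as raw bytes. How fio turns a decimal or hex pattern into bytes (including byte order) is not modeled, and neither is the %o offset format. fill_pattern and check_pattern are hypothetical names.

```c
#include <stddef.h>

/* Repeat a pattern of plen bytes (plen > 0) across the whole buffer. */
static void fill_pattern(unsigned char *buf, size_t len,
			 const unsigned char *pat, size_t plen)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = pat[i % plen];
}

/* Index of the first byte that breaks the pattern, or len if none does. */
static size_t check_pattern(const unsigned char *buf, size_t len,
			    const unsigned char *pat, size_t plen)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != pat[i % plen])
			return i;
	return len;
}
```
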
verify_fatal=bool	Normally fio will keep checking the entire contents
		before quitting on a block verification failure. If this
		option is set, fio will exit the job on the first observed
		failure.

verify_dump=bool	If set, dump the contents of both the original data
		block and the data block we read off disk to files. This
		allows later analysis to inspect just what kind of data
		corruption occurred. Off by default.

verify_async=int	Fio will normally verify IO inline from the submitting
		thread. This option takes an integer describing how many
		async offload threads to create for IO verification instead,
		causing fio to offload the duty of verifying IO contents
		to one or more separate threads. If using this offload
		option, even sync IO engines can benefit from using an
		iodepth setting higher than 1, as it allows them to have
		IO in flight while verifies are running.

verify_async_cpus=str	Tell fio to set the given CPU affinity on the
		async IO verification threads. See cpus_allowed for the
		format used.

verify_backlog=int	Fio will normally verify the written contents of a
		job that utilizes verify once that job has completed. In
		other words, everything is written then everything is read
		back and verified. You may want to verify continually
		instead for a variety of reasons. Fio stores the meta data
		associated with an IO block in memory, so for large
		verify workloads, quite a bit of memory would be used up
		holding this meta data. If this option is enabled, fio
		will write only N blocks before verifying these blocks.

verify_backlog_batch=int	Control how many blocks fio will verify
		if verify_backlog is set. If not set, will default to
		the value of verify_backlog (meaning the entire queue
		is read back and verified). If verify_backlog_batch is
		less than verify_backlog, then not all blocks will be
		verified; if verify_backlog_batch is larger than
		verify_backlog, some blocks will be verified more than once.

verify_state_save=bool	When a job exits during the write phase of a verify
		workload, save its current state. This allows fio to replay
		up until that point, if the verify state is loaded for the
		verify read phase. The format of the filename is, roughly,
		<type>-<jobname>-<jobindex>-verify.state. <type> is "local"
		for a local run, "sock" for a client/server socket connection,
		and "ip" (192.168.0.1, for instance) for a networked
		client/server connection.

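Here is a sketch of the state file name format given above. The text describes the format only roughly, so treat verify_state_filename and is_verify_state_file as hypothetical helpers. Real file names may differ in detail.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: build "<type>-<jobname>-<jobindex>-verify.state". */
static int verify_state_filename(char *buf, size_t len, const char *type,
				 const char *jobname, int jobindex)
{
	return snprintf(buf, len, "%s-%s-%d-verify.state",
			type, jobname, jobindex);
}

/* Hypothetical: does a file name end in "-verify.state"? */
static int is_verify_state_file(const char *name)
{
	static const char suffix[] = "-verify.state";
	size_t n = strlen(name), s = sizeof(suffix) - 1;

	return n >= s && strcmp(name + n - s, suffix) == 0;
}
```
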
verify_state_load=bool	If a verify termination trigger was used, fio stores
		the current write state of each thread. This can be used at
		verification time so that fio knows how far it should verify.
		Without this information, fio will run a full verification
		pass, according to the settings in the job file used.

stonewall
wait_for_previous Wait for preceding jobs in the job file to exit, before
		starting this one. Can be used to insert serialization
		points in the job file. A stone wall also implies starting
		a new reporting group.

new_group	Start a new reporting group. See: group_reporting.

numjobs=int	Create the specified number of clones of this job. May be
		used to set up a larger number of threads/processes doing
		the same thing. Each thread is reported separately; to see
		statistics for all clones as a whole, use group_reporting in
		conjunction with new_group.

group_reporting	It may sometimes be interesting to display statistics for
		groups of jobs as a whole instead of for each individual job.
		This is especially true if 'numjobs' is used; looking at
		individual thread/process output quickly becomes unwieldy.
		To see the final report per-group instead of per-job, use
		'group_reporting'. Jobs in a file will be part of the same
		reporting group, unless separated by a stonewall or by
		using 'new_group'.

thread		fio defaults to forking jobs, however if this option is
		given, fio will use pthread_create(3) to create threads
		instead.

zonesize=int	Divide a file into zones of the specified size. See zoneskip.

zoneskip=int	Skip the specified number of bytes when zonesize data has
		been read. The two zone options can be used to only do
		io on zones of a file.

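Assume the first zone starts at offset 0 of the file. Then zonesize and zoneskip divide the file into a repeating period of zonesize + zoneskip bytes, and only the first zonesize bytes of each period receive IO. in_io_zone and io_zone_start are hypothetical helpers that illustrate this arithmetic.

```c
#include <stdint.h>

/*
 * Hypothetical: 1 if byte offset off lies in a zone that receives IO,
 * assuming zones start at offset 0 and zonesize + zoneskip > 0.
 */
static int in_io_zone(uint64_t off, uint64_t zonesize, uint64_t zoneskip)
{
	return off % (zonesize + zoneskip) < zonesize;
}

/* Hypothetical: start offset of the n-th (0-based) IO zone. */
static uint64_t io_zone_start(uint64_t n, uint64_t zonesize, uint64_t zoneskip)
{
	return n * (zonesize + zoneskip);
}
```

With zonesize=1m and zoneskip=3m, IO would touch bytes 0-1MiB, 4-5MiB, 8-9MiB, and so on.
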
write_iolog=str	Write the issued io patterns to the specified file. See
		read_iolog. Specify a separate file for each job, otherwise
		the iologs will be interspersed and the file may be corrupt.

read_iolog=str	Open an iolog with the specified file name and replay the
		io patterns it contains. This can be used to store a
		workload and replay it sometime later. The iolog given
		may also be a blktrace binary file, which allows fio
		to replay a workload captured by blktrace. See blktrace
		for how to capture such logging data. For blktrace replay,
		the file needs to be turned into a blkparse binary data
		file first (blkparse <device> -o /dev/null -d file_for_fio.bin).

replay_no_stall=int	When replaying I/O with read_iolog, the default behavior
		is to attempt to respect the time stamps within the log and
		replay them with the appropriate delay between IOs. When
		this option is set, fio will not respect the timestamps and
		will attempt to replay them as fast as possible while still
		respecting ordering. The result is the same I/O pattern to a
		given device, but different timings.

replay_redirect=str	While replaying I/O patterns using read_iolog, the
		default behavior is to replay each IO onto the major/minor
		device it was recorded from. This is sometimes
		undesirable because on a different machine those major/minor
		numbers can map to a different device. Changing hardware on
		the same system can also result in a different major/minor
		mapping. replay_redirect causes all IOs to be replayed onto
		the single specified device regardless of the device they
		were recorded from. For example, replay_redirect=/dev/sdc
		would cause all IO in the blktrace or iolog to be replayed
		onto /dev/sdc. This means multiple devices will be replayed
		onto a single device, if the trace contains multiple devices.
		If you want multiple devices to be replayed concurrently to
		multiple redirected devices, you must blkparse your trace into
		separate traces and replay them with independent fio
		invocations. Unfortunately this also breaks the strict time
		ordering between multiple device accesses.

replay_align=int	Force alignment of IO offsets and lengths in a trace
		to this power of 2 value.

replay_scale=int	Scale sector offsets down by this factor when
		replaying traces.

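Here is a sketch of how replay_scale and replay_align could be applied to a trace offset. Applying the scale before the alignment, and rounding down to the alignment, are both assumptions. Lengths are not handled. replay_offset and is_pow2 are hypothetical names.

```c
#include <stdint.h>

/* 1 if v is a non-zero power of 2. */
static int is_pow2(uint64_t v)
{
	return v && !(v & (v - 1));
}

/*
 * Hypothetical: divide an offset by the scale factor (if > 1), then
 * round it down to a multiple of align (if non-zero). The mask trick
 * is only valid when align is a power of 2.
 */
static uint64_t replay_offset(uint64_t off, uint64_t scale, uint64_t align)
{
	if (scale > 1)
		off /= scale;
	if (align)
		off &= ~(align - 1);
	return off;
}
```
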
per_job_logs=bool	If set, this generates bw/clat/iops logs with per-file
		private filenames. If not set, jobs with identical names
		will share the log filename. Default: true.

write_bw_log=str	If given, write a bandwidth log of the jobs in this job
		file. Can be used to store data of the bandwidth of the
		jobs over their lifetime. The included fio_generate_plots
		script uses gnuplot to turn these text files into nice
		graphs. See write_lat_log for behaviour of given
		filename. For this option, the suffix is _bw.x.log, where
		x is the index of the job (1..N, where N is the number of
		jobs). If 'per_job_logs' is false, then the filename will not
		include the job index. See 'Log File Formats'.

write_lat_log=str	Same as write_bw_log, except that this option stores io
		submission, completion, and total latencies instead. If no
		filename is given with this option, the default filename of
		"jobname_type.log" is used. Even if the filename is given,
		fio will still append the type of log. So if one specifies

		write_lat_log=foo

		the actual log names will be foo_slat.x.log, foo_clat.x.log,
		and foo_lat.x.log, where x is the index of the job (1..N,
		where N is the number of jobs). This helps fio_generate_plots
		find the logs automatically. If 'per_job_logs' is false, then
		the filename will not include the job index. See 'Log File
		Formats'.

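The naming rules above can be summarised in a small sketch. log_filename and log_type_known are hypothetical helpers. The list of type suffixes is taken from this section (slat, clat, lat, bw, iops, clat_hist).

```c
#include <stdio.h>
#include <string.h>

/* Log type suffixes named in this section. */
static int log_type_known(const char *type)
{
	static const char *const types[] = {
		"slat", "clat", "lat", "bw", "iops", "clat_hist",
	};

	for (size_t i = 0; i < sizeof(types) / sizeof(types[0]); i++)
		if (strcmp(type, types[i]) == 0)
			return 1;
	return 0;
}

/*
 * Hypothetical: build "<prefix>_<type>.<x>.log" (x = 1-based job index)
 * or, when per_job_logs is 0, "<prefix>_<type>.log". Returns the
 * snprintf result, or -1 for an unknown log type.
 */
static int log_filename(char *buf, size_t len, const char *prefix,
			const char *type, int job_index, int per_job_logs)
{
	if (!log_type_known(type))
		return -1;
	if (per_job_logs)
		return snprintf(buf, len, "%s_%s.%d.log", prefix, type, job_index);
	return snprintf(buf, len, "%s_%s.log", prefix, type);
}
```
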
write_hist_log=str	Same as write_lat_log, but writes I/O completion
		latency histograms. If no filename is given with this option, the
		default filename of "jobname_clat_hist.x.log" is used, where x is
		the index of the job (1..N, where N is the number of jobs). Even
		if the filename is given, fio will still append the type of log.
		If per_job_logs is false, then the filename will not include the
		job index. See 'Log File Formats'.

write_iops_log=str	Same as write_bw_log, but writes IOPS. If no filename is
		given with this option, the default filename of
		"jobname_type.x.log" is used, where x is the index of the job
		(1..N, where N is the number of jobs). Even if the filename
		is given, fio will still append the type of log. If
		'per_job_logs' is false, then the filename will not include
		the job index. See 'Log File Formats'.

log_avg_msec=int	By default, fio will log an entry in the iops, latency,
		or bw log for every IO that completes. When writing to the
		disk log, that can quickly grow to a very large size. Setting
		this option makes fio average each log entry over the
		specified period of time, reducing the resolution of the log.
		See log_max_value as well. Defaults to 0, logging all entries.

log_hist_msec=int	Same as log_avg_msec, but logs entries for completion
		latency histograms. Computing latency percentiles from averages of
		intervals using log_avg_msec is inaccurate. Setting this option makes
		fio log histogram entries over the specified period of time, reducing
		log sizes for high IOPS devices while retaining percentile accuracy.
		See log_hist_coarseness as well. Defaults to 0, meaning histogram
		logging is disabled.

log_hist_coarseness=int	Integer ranging from 0 to 6, defining the coarseness
		of the resolution of the histogram logs enabled with log_hist_msec. For
		each increment in coarseness, fio outputs half as many bins. Defaults to
		0, for which histogram logs contain 1216 latency bins. See
		'Log File Formats'.

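The bin count rule for log_hist_coarseness is simple enough to write down directly. hist_log_bins is a hypothetical helper. It relies only on the 1216-bin default and the halving rule stated above. Since 1216 = 19 * 2^6, every coarseness from 0 to 6 halves evenly.

```c
/*
 * Hypothetical helper: latency bins per histogram log entry for a given
 * log_hist_coarseness (0-6). Returns 0 for out-of-range input.
 */
static unsigned hist_log_bins(unsigned coarseness)
{
	return coarseness > 6 ? 0 : 1216u >> coarseness;
}
```
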
log_max_value=bool	If log_avg_msec is set, fio logs the average over that
		window. If you instead want to log the maximum value, set this
		option to 1. Defaults to 0, meaning that averaged values are
		logged.

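Here is a sketch of the windowing that log_avg_msec and log_max_value describe. Per-IO samples are grouped into windows of log_avg_msec milliseconds, and each window is reduced to its mean or its maximum. struct log_sample and window_log are hypothetical. Stamping each entry at the end of its window and using integer averages are choices made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log sample: completion time in ms and a value. */
struct log_sample {
	uint64_t t_ms;
	uint64_t val;
};

/*
 * Collapse samples (sorted by t_ms) into one entry per window of
 * avg_msec milliseconds (avg_msec > 0). Each entry holds the window's
 * integer mean, or its maximum when log_max is non-zero, stamped at the
 * end of the window. out must have room for n entries. Returns the
 * number of entries written.
 */
static size_t window_log(const struct log_sample *in, size_t n,
			 uint64_t avg_msec, int log_max,
			 struct log_sample *out)
{
	size_t nout = 0, i = 0;

	while (i < n) {
		uint64_t win = in[i].t_ms / avg_msec;
		uint64_t sum = 0, max = 0, cnt = 0;

		while (i < n && in[i].t_ms / avg_msec == win) {
			sum += in[i].val;
			if (in[i].val > max)
				max = in[i].val;
			cnt++;
			i++;
		}
		out[nout].t_ms = (win + 1) * avg_msec;
		out[nout].val = log_max ? max : sum / cnt;
		nout++;
	}
	return nout;
}
```
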
log_offset=int	If this is set, the iolog options will include the byte
		offset for the IO entry as well as the other data values.

log_compression=int	If this is set, fio will compress the IO logs as
		it goes, to keep the memory footprint lower. When a log
		reaches the specified size, that chunk is removed and
		compressed in the background. Given that IO logs are
		fairly highly compressible, this yields nice memory
		savings for longer runs. The downside is that the
		compression will consume some background CPU cycles, so
		it may impact the run. This, however, is also true if
		the logging ends up consuming most of the system memory.
		So pick your poison. The IO logs are saved normally at the
		end of a run, by decompressing the chunks and storing them
		in the specified log file. This feature depends on the
		availability of zlib.

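The chunk-then-compress scheme described above can be sketched in a few lines of Python with zlib. This is an illustration under stated assumptions, not fio's code: `ChunkedLog` is a hypothetical name, compression happens synchronously instead of in a background thread, chunk_size is treated as a byte threshold, and the log lines are made up.

```python
import zlib


class ChunkedLog:
    """Sketch of chunked log compression: entries accumulate in memory, and
    each time the pending data reaches chunk_size bytes it is compressed."""

    def __init__(self, chunk_size: int):
        self.chunk_size = chunk_size
        self.pending: list[bytes] = []
        self.pending_bytes = 0
        self.chunks: list[bytes] = []

    def append(self, entry: str) -> None:
        data = entry.encode()
        self.pending.append(data)
        self.pending_bytes += len(data)
        if self.pending_bytes >= self.chunk_size:
            self._compress_pending()

    def _compress_pending(self) -> None:
        if self.pending:
            self.chunks.append(zlib.compress(b"".join(self.pending)))
            self.pending, self.pending_bytes = [], 0

    def dump(self) -> str:
        """End of run: compress what is left, then decompress every chunk
        in order to rebuild the full log text."""
        self._compress_pending()
        return b"".join(zlib.decompress(c) for c in self.chunks).decode()


entries = [f"{t}, {t % 512}, 0\n" for t in range(10_000)]  # made-up log lines
log = ChunkedLog(chunk_size=4096)
for line in entries:
    log.append(line)
text = log.dump()
raw_size = sum(len(e) for e in entries)
packed_size = sum(len(c) for c in log.chunks)
print(len(log.chunks), "chunks,", raw_size, "bytes raw,", packed_size, "bytes compressed")
```

Only the compressed chunks stay resident during the run; the full text exists again only when the log is written out at the end.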
log_compression_cpus=str	Define the set of CPUs that are allowed to
		handle online log compression for the IO jobs. This can
		provide better isolation between performance-sensitive jobs
		and background compression work.

log_store_compressed=bool	If set, fio will store the log files in a
		compressed format. They can be decompressed with fio, using
		the --inflate-log command line parameter. The files will be
		stored with a .fz suffix.

log_unix_epoch=bool	If set, fio will log Unix timestamps to the log
		files produced by enabling write_type_log for each log type, instead
		of the default zero-based timestamps.

block_error_percentiles=bool	If set, record errors in trim block-sized
		units from writes and trims and output a histogram of
		how many trims it took to get to errors, and what kind
		of error was encountered.

lockmem=int	Pin down the specified amount of memory with mlock(2). Can
		potentially be used instead of removing memory or booting
		with less memory to simulate a smaller amount of memory.
		The amount specified is per worker.

exec_prerun=str	Before running this job, issue the command specified
		through system(3). Output is redirected to a file called
		jobname.prerun.txt.

exec_postrun=str	After the job completes, issue the command specified
		through system(3). Output is redirected to a file called
		jobname.postrun.txt.

ioscheduler=str	Attempt to switch the device hosting the file to the specified
		io scheduler before running.

disk_util=bool	Generate disk utilization statistics, if the platform
		supports it. Defaults to on.

disable_lat=bool	Disable measurements of total latency numbers. Useful
		only for cutting back the number of calls to gettimeofday,
		as that does impact performance at really high IOPS rates.
		Note that to really get rid of a large amount of these
		calls, this option must be used with disable_slat and
		disable_bw as well.

disable_clat=bool	Disable measurements of completion latency numbers. See
		disable_lat.

disable_slat=bool	Disable measurements of submission latency numbers. See
		disable_lat.

disable_bw=bool	Disable measurements of throughput/bandwidth numbers. See
		disable_lat.

clat_percentiles=bool	Enable the reporting of percentiles of
		completion latencies.

percentile_list=float_list	Override the default list of percentiles
		for completion latencies and the block error histogram.
		Each number is a floating-point number in the range (0,100],
		and the maximum length of the list is 20. Use ':'
		to separate the numbers, and list the numbers in ascending
		order. For example, --percentile_list=99.5:99.9 will cause
		fio to report the values of completion latency below which
		99.5% and 99.9% of the observed latencies fell, respectively.

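The rules for the list (range (0,100], at most 20 entries, ascending order) and the meaning of each reported value can be illustrated together. This is a sketch under assumptions: the helper names are hypothetical, and it uses the simple nearest-rank method on raw samples, whereas fio derives its reported values from its own latency histogram. Treat it as an illustration of what the numbers mean, not of how fio computes them.

```python
import math


def parse_percentile_list(spec: str) -> list[float]:
    """Validate a ':'-separated percentile list as described above."""
    values = [float(v) for v in spec.split(":")]
    if len(values) > 20:
        raise ValueError("at most 20 percentiles are allowed")
    if any(not 0.0 < v <= 100.0 for v in values):
        raise ValueError("percentiles must be in the range (0, 100]")
    if values != sorted(values):
        raise ValueError("percentiles must be listed in ascending order")
    return values


def nearest_rank(samples: list[int], pct: float) -> int:
    """Smallest sample such that at least pct% of all samples are <= it."""
    ordered = sorted(samples)
    # The 1e-9 slack absorbs float noise in pct * n / 100.
    rank = max(1, math.ceil(pct * len(ordered) / 100 - 1e-9))
    return ordered[rank - 1]


clat_usec = list(range(1, 1001))  # made-up latencies: 1..1000 usec
for p in parse_percentile_list("99.5:99.9"):
    print(f"{p}% of completions took <= {nearest_rank(clat_usec, p)} usec")
```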
clocksource=str	Use the given clocksource as the base of timing. The
		supported options are:

			gettimeofday	gettimeofday(2)

			clock_gettime	clock_gettime(2)

			cpu		Internal CPU clock source

		cpu is the preferred clocksource if it is reliable, as it
		is very fast (and fio is heavy on time calls). Fio will
		automatically use this clocksource if it's supported and
		considered reliable on the system it is running on, unless
		another clocksource is specifically set. For x86/x86-64 CPUs,
		this means supporting an invariant TSC.

gtod_reduce=bool	Enable all of the gettimeofday() reducing options
		(disable_clat, disable_slat, disable_bw) plus reduce
		precision of the timeout somewhat to really shrink
		the gettimeofday() call count. With this option enabled,
		we only do about 0.4% of the gtod() calls we would have
		done if all time keeping was enabled.

gtod_cpu=int	Sometimes it's cheaper to dedicate a single thread of
		execution to just getting the current time. Fio (and
		databases, for instance) are very intensive on gettimeofday()
		calls. With this option, you can set one CPU aside for
		doing nothing but logging current time to a shared memory
		location. Then the other threads/processes that run IO
		workloads need only copy that segment, instead of entering
		the kernel with a gettimeofday() call. The CPU set aside
		for doing these time calls will be excluded from other
		uses. Fio will manually clear it from the CPU mask of other
		jobs.

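The single-writer time-keeping pattern can be sketched with Python threads. This only shows the structure, under assumptions: `SharedClock` is a hypothetical name, Python threads share one interpreter lock so there is no speed benefit here, and the CPU pinning and mask exclusion that fio performs are not shown.

```python
import threading
import time


class SharedClock:
    """One dedicated thread keeps refreshing a shared timestamp; readers
    copy the stored value instead of calling the clock themselves."""

    def __init__(self, refresh_s: float = 0.0005):
        self._now = time.monotonic()
        self._refresh_s = refresh_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        while not self._stop.is_set():
            self._now = time.monotonic()  # the only place the clock is read
            time.sleep(self._refresh_s)

    def start(self) -> "SharedClock":
        self._thread.start()
        return self

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def now(self) -> float:
        # Workers just read the last published value; no clock call here.
        return self._now


clock = SharedClock().start()
t0 = clock.now()
time.sleep(0.05)  # stand-in for a worker doing IO
t1 = clock.now()
clock.stop()
print(f"elapsed according to the shared clock: {t1 - t0:.3f} s")
```

The trade-off is precision: a reader can see a value up to one refresh interval old.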
continue_on_error=str	Normally fio will exit the job on the first observed
		failure. If this option is set, fio will continue the job when
		there is a 'non-fatal error' (EIO or EILSEQ) until the runtime
		is exceeded or the I/O size specified is completed. If this
		option is used, two additional stats are reported: the total
		error count and the first error. The error field given in the
		stats is the first error that was hit during the run.

		The allowed values are:

			none	Exit on any IO or verify errors.

			read	Continue on read errors, exit on all others.

			write	Continue on write errors, exit on all others.

			io	Continue on any IO error, exit on all others.

			verify	Continue on verify errors, exit on all others.

			all	Continue on all errors.

			0	Backward-compatible alias for 'none'.

			1	Backward-compatible alias for 'all'.

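The value table maps naturally onto a bitmask of error classes. The sketch below is a model under assumptions: the flag names are hypothetical, only the three classes named in the table (read, write, verify) are modelled, and fio's own handling of 'all' and of non-fatal errors may cover more cases.

```python
# Hypothetical error-class flags; fio's internal representation may differ.
READ_ERR, WRITE_ERR, VERIFY_ERR = 1, 2, 4
ALL_ERRS = READ_ERR | WRITE_ERR | VERIFY_ERR

CONTINUE_ON_ERROR = {
    "none": 0,
    "read": READ_ERR,
    "write": WRITE_ERR,
    "io": READ_ERR | WRITE_ERR,
    "verify": VERIFY_ERR,
    "all": ALL_ERRS,
    "0": 0,         # backward-compatible alias for 'none'
    "1": ALL_ERRS,  # backward-compatible alias for 'all'
}


def keeps_running(setting: str, error_class: int) -> bool:
    """True if a job with this continue_on_error value survives the error."""
    return bool(CONTINUE_ON_ERROR[setting] & error_class)


for setting in ("none", "io", "verify", "1"):
    print(setting, [keeps_running(setting, e) for e in (READ_ERR, WRITE_ERR, VERIFY_ERR)])
```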
ignore_error=str	Sometimes you want to ignore some errors during a test.
		In that case, you can specify an error list for each error type:

		ignore_error=READ_ERR_LIST,WRITE_ERR_LIST,VERIFY_ERR_LIST

		Errors for a given error type are separated with ':'. An error
		may be a symbol ('ENOSPC', 'ENOMEM') or an integer.
		Example:

		ignore_error=EAGAIN,ENOSPC:122

		This option will ignore EAGAIN from READ, and ENOSPC and
		122 (EDQUOT) from WRITE.

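The list syntax (commas between error types, colons between errors) can be illustrated with a small parser. This is a sketch, not fio's parser: `parse_ignore_error` is a hypothetical helper, and symbolic names are resolved through Python's `errno` module, so the numeric values are those of the host system.

```python
import errno

ERROR_TYPES = ("read", "write", "verify")


def parse_ignore_error(spec: str) -> dict[str, list[int]]:
    """Split 'READ_LIST,WRITE_LIST,VERIFY_LIST' into per-type errno lists."""
    groups = spec.split(",")
    if len(groups) > len(ERROR_TYPES):
        raise ValueError("at most three comma-separated lists are allowed")
    result: dict[str, list[int]] = {etype: [] for etype in ERROR_TYPES}
    for etype, group in zip(ERROR_TYPES, groups):
        for token in filter(None, group.split(":")):
            if token.isdigit():
                code = int(token)  # numeric error, e.g. 122
            else:
                code = getattr(errno, token, None)  # symbolic, e.g. ENOSPC
                if not isinstance(code, int):
                    raise ValueError(f"unknown error symbol {token!r}")
            result[etype].append(code)
    return result


print(parse_ignore_error("EAGAIN,ENOSPC:122"))
```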
error_dump=bool	If set, dump every error, even if it is non-fatal. True
		by default. If disabled, only fatal errors will be dumped.

cgroup=str	Add job to this control group. If it doesn't exist, it will
		be created. The system must have a mounted cgroup blkio
		mount point for this to work. If your system doesn't have it
		mounted, you can do so with:

		# mount -t cgroup -o blkio none /cgroup

cgroup_weight=int	Set the weight of the cgroup to this value. See
		the documentation that comes with the kernel; allowed values
		are in the range of 100..1000.

cgroup_nodelete=bool	Normally fio will delete the cgroups it has created after
		the job completion. To override this behavior and to leave
		cgroups around after the job completion, set cgroup_nodelete=1.
		This can be useful if one wants to inspect various cgroup
		files after job completion. Default: false

uid=int		Instead of running as the invoking user, set the user ID to
		this value before the thread/process does any work.

gid=int		Set group ID, see uid.

flow_id=int	The ID of the flow. If not specified, it defaults to being a
		global flow. See flow.

flow=int	Weight in token-based flow control. If this value is used, then
		there is a 'flow counter' which is used to regulate the
		proportion of activity between two or more jobs. fio attempts
		to keep this flow counter near zero. The 'flow' parameter
		stands for how much should be added or subtracted to the flow
		counter on each iteration of the main I/O loop. That is, if
		one job has flow=8 and another job has flow=-1, then there
		will be a roughly 1:8 ratio in how much one runs vs the other.

flow_watermark=int	The maximum value that the absolute value of the flow
		counter is allowed to reach before the job must wait for a
		lower value of the counter.

flow_sleep=int	The period of time, in microseconds, to wait after the flow
		watermark has been exceeded before retrying operations.

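Why flow=8 and flow=-1 give a 1:8 ratio can be seen in a toy simulation. It is a simplified model, not fio's scheduler: `simulate_flow` is hypothetical, it assumes one job has a positive flow and the other a negative one, and it ignores flow_watermark and flow_sleep. At each step, the job that pushes the counter back toward zero takes a turn.

```python
def simulate_flow(flow_a: int, flow_b: int, iterations: int) -> dict[str, int]:
    """Count how often two jobs run when the shared flow counter is kept
    near zero. Assumes flow_a > 0 and flow_b < 0."""
    counter = 0
    runs = {"a": 0, "b": 0}
    for _ in range(iterations):
        # Let whichever job moves the counter back toward zero take a turn.
        if counter <= 0:
            counter += flow_a
            runs["a"] += 1
        else:
            counter += flow_b
            runs["b"] += 1
    return runs


print(simulate_flow(8, -1, 900))  # {'a': 100, 'b': 800}
```

Each run of the flow=8 job must be balanced by eight runs of the flow=-1 job before the counter returns to zero.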
In addition, there are some parameters which are only valid when a specific
ioengine is in use. These are used identically to normal parameters, with the
caveat that when used on the command line, they must come after the ioengine
that defines them is selected.

[libaio] userspace_reap	Normally, with the libaio engine in use, fio will use
		the io_getevents system call to reap newly returned events.
		With this flag turned on, the AIO ring will be read directly
		from user-space to reap events. The reaping mode is only
		enabled when polling for a minimum of 0 events (e.g., when
		iodepth_batch_complete=0).

[psyncv2] hipri	Set RWF_HIPRI on IO, indicating to the kernel that
		it's of higher priority than normal.

[cpuio] cpuload=int	Attempt to use the specified percentage of CPU cycles.

[cpuio] cpuchunks=int	Split the load into cycles of the given time, in
		microseconds.

[cpuio] exit_on_io_done=bool	Detect when IO threads are done, then exit.

[netsplice] hostname=str
[net] hostname=str	The host name or IP address to use for TCP or UDP based IO.
		If the job is a TCP listener or UDP reader, the hostname is not
		used and must be omitted unless it is a valid UDP multicast
		address.
[libhdfs] namenode=str	The host name or IP address of an HDFS cluster namenode
		to contact.

[netsplice] port=int
[net] port=int	The TCP or UDP port to bind to or connect to. If this is used
		with numjobs to spawn multiple instances of the same job type, then
		this will be the starting port number since fio will use a range of
		ports.
[libhdfs] port=int	The listening port of the HDFS cluster namenode.

[netsplice] interface=str
[net] interface=str	The IP address of the network interface used to send or
		receive UDP multicast traffic.

[netsplice] ttl=int
[net] ttl=int	Time-to-live value for outgoing UDP multicast packets.
		Default: 1

[netsplice] nodelay=bool
[net] nodelay=bool	Set TCP_NODELAY on TCP connections.

[netsplice] protocol=str
[netsplice] proto=str
[net] protocol=str
[net] proto=str	The network protocol to use. Accepted values are:

			tcp	Transmission control protocol
			tcpv6	Transmission control protocol V6
			udp	User datagram protocol
			udpv6	User datagram protocol V6
			unix	UNIX domain socket

		When the protocol is TCP or UDP, the port must also be given,
		as well as the hostname unless the job is a TCP listener or UDP
		reader. For unix sockets, the normal filename option should be
		used and the port is invalid.

[net] listen	For TCP network connections, tell fio to listen for incoming
		connections rather than initiating an outgoing connection. The
		hostname must be omitted if this option is used.

[net] pingpong	Normally a network writer will just continue writing data, and
		a network reader will just consume packets. If pingpong=1
		is set, a writer will send its normal payload to the reader,
		then wait for the reader to send the same payload back. This
		allows fio to measure network latencies. The submission
		latency then measures local time spent sending or receiving,
		and the completion latency measures how long it took for the
		other end to receive and send back. For UDP multicast traffic,
		pingpong=1 should only be set for a single reader when multiple
		readers are listening to the same address.

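The send-then-wait-for-echo pattern behind pingpong=1 can be sketched with Python sockets. Assumptions to note: the helpers are hypothetical, the sketch uses a local `socket.socketpair()` rather than a TCP or UDP connection between hosts, and it records the whole round trip instead of splitting it into submission and completion latency as fio does.

```python
import socket
import threading
import time


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly size bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def echo_reader(sock: socket.socket, size: int, rounds: int) -> None:
    """Reader side: send every payload straight back to the writer."""
    for _ in range(rounds):
        sock.sendall(recv_exact(sock, size))


def pingpong(size: int = 512, rounds: int = 100) -> list[float]:
    """Writer side: send a payload, wait for the echo, record round trips."""
    writer, reader = socket.socketpair()
    peer = threading.Thread(target=echo_reader, args=(reader, size, rounds))
    peer.start()
    payload = b"x" * size
    round_trips = []
    try:
        for _ in range(rounds):
            start = time.perf_counter()
            writer.sendall(payload)
            recv_exact(writer, size)
            round_trips.append(time.perf_counter() - start)
    finally:
        writer.close()  # unblocks the reader if we stopped early
        peer.join()
        reader.close()
    return round_trips


rtts = pingpong(size=64, rounds=10)
print(f"median round trip: {sorted(rtts)[len(rtts) // 2] * 1e6:.1f} usec")
```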
[net] window_size	Set the desired socket buffer size for the connection.

[net] mss	Set the TCP maximum segment size (TCP_MAXSEG).

[e4defrag] donorname=str
		File that will be used as a block donor (extents are
		swapped between files).
[e4defrag] inplace=int
		Configure the donor file block allocation strategy:
		0 (default): Preallocate the donor file on init.
		1: Allocate space immediately inside the defragment event,
		   and free it right after the event.

[rbd] clustername=str	Specifies the name of the Ceph cluster.
[rbd] rbdname=str	Specifies the name of the RBD.
[rbd] pool=str	Specifies the name of the Ceph pool containing RBD.
[rbd] clientname=str	Specifies the username (without the 'client.' prefix)
		used to access the Ceph cluster. If the clustername is
		specified, the clientname shall be the full type.id
		string. If no type. prefix is given, fio will add
		'client.' by default.

[mtd] skip_bad=bool	Skip operations against known bad blocks.

[libhdfs] hdfsdirectory	The HDFS directory in which libhdfs will create chunks.
[libhdfs] chunk_size	The size of the chunk to use for each file.


6.0 Interpreting the output
---------------------------

fio spits out a lot of output. While running, fio will display the
status of the jobs created. An example of that would be:

Jobs: 1: [_r] [24.8% done] [r=20992KiB/s,w=24064KiB/s,t=0KiB/s] [r=82,w=94,t=0 iops] [eta 00h:01m:31s]

The characters inside the square brackets denote the current status of
each thread. The possible values (in typical life cycle order) are:

Idle	Run
----	---
P		Thread setup, but not started.
C		Thread created.
I		Thread initialized, waiting or generating necessary data.
	p	Thread running pre-reading file(s).
	R	Running, doing sequential reads.
	r	Running, doing random reads.
	W	Running, doing sequential writes.
	w	Running, doing random writes.
	M	Running, doing mixed sequential reads/writes.
	m	Running, doing mixed random reads/writes.
	F	Running, currently waiting for fsync()
	f	Running, finishing up (writing IO logs, etc)
	V	Running, doing verification of written data.
E		Thread exited, not reaped by main thread yet.
_		Thread reaped.
X		Thread reaped, exited with an error.
K		Thread reaped, exited due to signal.

Fio will condense the thread string so as not to take up more space on the
command line than is needed. For instance, if you have 10 readers and 10
writers running, the output would look like this:

Jobs: 20 (f=20): [R(10),W(10)] [4.0% done] [r=20992KiB/s,w=24064KiB/s,t=0KiB/s] [r=82,w=94,t=0 iops] [eta 57m:36s]

Fio will still maintain the ordering, though. So the above means that jobs
1..10 are readers, and 11..20 are writers.

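The condensed form is a run-length encoding of the per-job status characters that keeps job order. A minimal sketch that reproduces the "R(10),W(10)" form shown above; `condense` is a hypothetical helper, and it assumes a count is always printed, which the first example ("[_r]") does not show, so fio's exact rule for short strings may differ.

```python
from itertools import groupby


def condense(states: str) -> str:
    """Run-length encode per-job status characters, keeping job order."""
    return ",".join(f"{state}({len(list(run))})" for state, run in groupby(states))


print(condense("R" * 10 + "W" * 10))  # R(10),W(10)
```

Because only adjacent jobs are merged, the position of each run still tells you which jobs it covers.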
The other values are fairly self-explanatory: number of threads
currently running and doing io, rate of io since last check (read speed
listed first, then write speed), and the estimated completion percentage
and time for the running group. It's impossible to estimate runtime of
the following groups (if any). Note that the string is displayed in order,
so it's possible to tell which of the jobs are currently doing what. The
first character is the first job defined in the job file, and so forth.

When fio is done (or interrupted by ctrl-c), it will show the data for
each thread, group of threads, and disks in that order. For each data
direction, the output looks like:

Client1 (g=0): err= 0:
  write: io= 32MiB, bw= 666KiB/s, iops=89 , runt= 50320msec
    slat (msec): min= 0, max= 136, avg= 0.03, stdev= 1.92
    clat (msec): min= 0, max= 631, avg=48.50, stdev=86.82
    bw (KiB/s) : min= 0, max= 1196, per=51.00%, avg=664.02, stdev=681.68
  cpu : usr=1.49%, sys=0.25%, ctx=7969, majf=0, minf=17
  IO depths : 1=0.1%, 2=0.3%, 4=0.5%, 8=99.0%, 16=0.0%, 32=0.0%, >32=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=0/32768, short=0/0
     lat (msec): 2=1.6%, 4=0.0%, 10=3.2%, 20=12.8%, 50=38.4%, 100=24.8%,
     lat (msec): 250=15.2%, 500=0.0%, 750=0.0%, 1000=0.0%, >=2048=0.0%

The client number is printed, along with the group id and error of that
thread. Below are the io statistics, here for writes. In the order listed,
they denote:

io=		Number of megabytes io performed
bw=		Average bandwidth rate
iops=		Average IOs performed per second
runt=		The runtime of that thread
	slat=	Submission latency (avg being the average, stdev being the
		standard deviation). This is the time it took to submit
		the io. For sync io, the slat is really the completion
		latency, since queue/complete is one operation there. This
		value can be in milliseconds or microseconds; fio will choose
		the most appropriate base and print that. In the example
		above, milliseconds is the best scale. Note: in --minimal mode
		latencies are always expressed in microseconds.
	clat=	Completion latency. Same names as slat, this denotes the
		time from submission to completion of the io pieces. For
		sync io, clat will usually be equal (or very close) to 0,
		as the time from submit to complete is basically just
		CPU time (io has already been done, see slat explanation).
	bw=	Bandwidth. Same names as the xlat stats, but also includes
		an approximate percentage of total aggregate bandwidth
		this thread received in this group. This last value is
		only really useful if the threads in this group are on the
		same disk, since they are then competing for disk access.
cpu=		CPU usage. User and system time, along with the number
		of context switches this thread went through, and finally
		the number of major and minor page faults. The CPU
		utilization numbers are averages for the jobs in that
		reporting group, while the context and fault counters are
		summed.
IO depths=	The distribution of io depths over the job life time. The
		numbers are divided into powers of 2, and each entry covers
		depths from its own value up to, but not including, the next
		entry. For example, the 16= entry covers depths from 16
		to 31.
IO submit=	How many pieces of IO were submitted in a single submit
		call. Each entry denotes that amount and below, until
		the previous entry; e.g., 8=100% means that we submitted
		anywhere in between 5-8 ios per submit call.
IO complete=	Like the above submit number, but for completions instead.
IO issued=	The number of read/write requests issued, and how many
		of them were short.
IO latencies=	The distribution of IO completion latencies. This is the
		time from when IO leaves fio to when it gets completed.
		Each entry covers latencies above the previous entry, up
		to and including its own value: 2=1.6% means that 1.6% of
		the IO completed within 2 msecs, and 20=12.8% means that
		12.8% of the IO took more than 10 msecs, but less than (or
		equal to) 20 msecs.

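Note that the IO depths entries and the IO submit/complete entries count in opposite directions. The sketch below maps a value to the entry that would count it. The bounds are taken from the example output and the terse layout ("<=1, 2, 4, 8, 16, 32, >=64"); the helper names are hypothetical.

```python
from typing import Optional

DEPTH_BOUNDS = (1, 2, 4, 8, 16, 32, 64)  # 'IO depths' entries; 64 means >=64
SUBMIT_BOUNDS = (0, 4, 8, 16, 32, 64)    # 'submit'/'complete' entries


def depth_entry(depth: int) -> int:
    """IO depths: an entry covers its own value up to the next entry."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return max(b for b in DEPTH_BOUNDS if b <= depth)


def submit_entry(count: int) -> Optional[int]:
    """IO submit: an entry covers values above the previous entry, up to
    and including its own value. Returns None past the last bound."""
    return next((b for b in SUBMIT_BOUNDS if count <= b), None)


print(depth_entry(20), submit_entry(5))  # 16 8
```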
After each client has been listed, the group statistics are printed. They
will look like this:

Run status group 0 (all jobs):
   READ: io=64MB, aggrb=22178, minb=11355, maxb=11814, mint=2840msec, maxt=2955msec
  WRITE: io=64MB, aggrb=1302, minb=666, maxb=669, mint=50093msec, maxt=50320msec

For each data direction, it prints:

io=		Number of megabytes io performed.
aggrb=		Aggregate bandwidth of threads in this group.
minb=		The minimum average bandwidth a thread saw.
maxb=		The maximum average bandwidth a thread saw.
mint=		The smallest runtime of the threads in that group.
maxt=		The longest runtime of the threads in that group.

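As a sanity check on the example group output, dividing the group's io by maxt reproduces both aggrb values, if 'MB' is read as 1024 KiB and aggrb as KiB/s. This relation is inferred from the example numbers, not stated in the text; `aggregate_bw_kib_s` is a hypothetical helper.

```python
def aggregate_bw_kib_s(io_kib: float, max_runtime_ms: float) -> float:
    """Assumed relation: group IO divided by the longest thread runtime."""
    return io_kib / (max_runtime_ms / 1000)


read_aggrb = aggregate_bw_kib_s(64 * 1024, 2955)    # READ: io=64MB, maxt=2955msec
write_aggrb = aggregate_bw_kib_s(64 * 1024, 50320)  # WRITE: io=64MB, maxt=50320msec
print(round(read_aggrb), round(write_aggrb))  # 22178 1302
```

Using the longest runtime means the aggregate reflects how long the whole group took, not how fast its quickest thread was.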
And finally, the disk statistics are printed. They will look like this:

Disk stats (read/write):
  sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%

Each value is printed for both reads and writes, with reads first. The
numbers denote:

ios=		Number of ios performed by all groups.
merge=		Number of merges performed by the io scheduler.
ticks=		Number of ticks we kept the disk busy.
in_queue=	Total time spent in the disk queue.
util=		The disk utilization. A value of 100% means we kept the disk
		busy constantly, 50% would be a disk idling half of the time.

It is also possible to get fio to dump the current output while it is
running, without terminating the job. To do that, send fio the USR1 signal.
You can also get regularly timed dumps by using the --status-interval
parameter, or by creating a file in /tmp named fio-dump-status. If fio
sees this file, it will unlink it and dump the current output status.


7.0 Terse output
----------------

For scripted usage where you typically want to generate tables or graphs
of the results, fio can output the results in a semicolon-separated format.
The format is one long line of values, such as:

2;card0;0;0;7139336;121836;60004;1;10109;27.932460;116.933948;220;126861;3495.446807;1085.368601;226;126864;3523.635629;1089.012448;24063;99944;50.275485%;59818.274627;5540.657370;7155060;122104;60004;1;8338;29.086342;117.839068;388;128077;5032.488518;1234.785715;391;128085;5061.839412;1236.909129;23436;100928;50.287926%;59964.832030;5644.844189;14.595833%;19.394167%;123706;0;7313;0.1%;0.1%;0.1%;0.1%;0.1%;0.1%;100.0%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.01%;0.02%;0.05%;0.16%;6.04%;40.40%;52.68%;0.64%;0.01%;0.00%;0.01%;0.00%;0.00%;0.00%;0.00%;0.00%
A description of this job goes here.

The job description (if provided) follows on a second line.

To enable terse output, use the --minimal command line option. The first
value is the version of the terse output format. If the output has to
be changed for some reason, this number will be incremented by 1 to
signify that change.

Split up, the format is as follows:

	terse version, fio version, jobname, groupid, error
	READ status:
		Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
		Submission latency: min, max, mean, stdev (usec)
		Completion latency: min, max, mean, stdev (usec)
		Completion latency percentiles: 20 fields (see below)
		Total latency: min, max, mean, stdev (usec)
		Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev
	WRITE status:
		Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
		Submission latency: min, max, mean, stdev (usec)
		Completion latency: min, max, mean, stdev (usec)
		Completion latency percentiles: 20 fields (see below)
		Total latency: min, max, mean, stdev (usec)
		Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev
	CPU usage: user, system, context switches, major faults, minor faults
	IO depths: <=1, 2, 4, 8, 16, 32, >=64
	IO latencies microseconds: <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000
	IO latencies milliseconds: <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000, 2000, >=2000
	Disk utilization: Disk name, Read ios, write ios,
			  Read merges, write merges,
			  Read ticks, write ticks,
			  Time spent in queue, disk utilization percentage
	Additional Info (dependent on continue_on_error, default off): total # errors, first error code

	Additional Info (dependent on description being set): Text description

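A script can name fields by position following the layout above. The sketch below reads only the header and the first four READ fields; `parse_terse_prefix` and the sample line are hypothetical. It follows the listed layout (which includes the fio version and IOPS fields), so it would not line up with the sample line at the start of this section, which starts with version 2 and has no fio-version field.

```python
HEADER = ("terse_version", "fio_version", "jobname", "groupid", "error")
READ_STATUS = ("read_kib", "read_bw_kib_s", "read_iops", "read_runtime_ms")
INT_FIELDS = {"terse_version", "groupid", "error", *READ_STATUS}


def parse_terse_prefix(line: str) -> dict:
    """Name the leading fields of a terse line, following the layout above."""
    names = HEADER + READ_STATUS
    values = line.rstrip("\n").split(";")
    if len(values) < len(names):
        raise ValueError(f"expected at least {len(names)} fields, got {len(values)}")
    record = dict(zip(names, values))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    return record


sample = "3;fio-x.y;job1;0;0;1024;512;128;2000"  # hypothetical, truncated line
print(parse_terse_prefix(sample))
```

Checking terse_version before trusting any offsets is the safest way to cope with the format changes the version field signals.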
Completion latency percentiles can be a grouping of up to 20 sets, so
for the terse output fio writes all of them. Each field will look like this:

	1.00%=6112

which is the Xth percentile, and the usec latency associated with it.

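Splitting such a field into its two parts takes one string split. A minimal sketch; `parse_percentile_field` is a hypothetical helper.

```python
def parse_percentile_field(field: str) -> tuple[float, int]:
    """Split a terse percentile field such as '1.00%=6112'."""
    pct, usec = field.split("=", 1)
    return float(pct.rstrip("%")), int(usec)


print(parse_percentile_field("1.00%=6112"))  # (1.0, 6112)
```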
For disk utilization, all disks used by fio are shown, so for each disk
there will be a disk utilization section.


8.0 Trace file format
---------------------
There are two trace file formats that you can encounter. The older (v1) format
is unsupported since version 1.20-rc3 (March 2008). It is still described
below in case you encounter an old trace and want to understand it.

In either case, the trace is a simple text file with a single action per line.


8.1 Trace file format v1
------------------------
Each line represents a single io action in the following format:

rw, offset, length

where rw=0/1 for read/write, and the offset and length entries are in bytes.

This format is not supported in Fio versions >= 1.20-rc3.


8.2 Trace file format v2
------------------------
The second version of the trace file format was added in Fio version 1.17.
It allows access to more than one file per trace and has a bigger set of
possible file actions.

The first line of the trace file has to be:

fio version 2 iolog

Following this can be lines in two different formats, which are described below.

The file management format:

filename action

The filename is given as an absolute path. The action can be one of these:

add		Add the given filename to the trace
open		Open the file with the given filename. The filename has to have
		been added with the add action before.
close		Close the file with the given filename. The file has to have been
		opened before.


The file io action format:

filename action offset length

The filename is given as an absolute path, and has to have been added and opened
before it can be used with this format. The offset and length are given in
bytes. The action can be one of these:

wait		Wait for 'offset' microseconds. Everything below 100 is discarded.
		The time is relative to the previous wait statement.
read		Read 'length' bytes beginning from 'offset'
write		Write 'length' bytes beginning from 'offset'
sync		fsync() the file
datasync	fdatasync() the file
trim		trim the given file from the given 'offset' for 'length' bytes

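Since v1 traces are no longer accepted, an old trace has to be rewritten in the v2 format. The sketch below does that for a single file and then checks the v2 rules described above (header line, add before open, open before any io action). `convert_v1`, `parse_v2` and the path `/tmp/example.dat` are hypothetical. This is not fio's own parser, and it does not replay anything.

```python
FILE_ACTIONS = {"add", "open", "close"}
IO_ACTIONS = {"wait", "read", "write", "sync", "datasync", "trim"}
HEADER = "fio version 2 iolog"


def convert_v1(v1_text: str, filename: str) -> str:
    """Rewrite 'rw, offset, length' lines (rw=0 read, 1 write) as a v2 trace
    against a single file."""
    out = [HEADER, f"{filename} add", f"{filename} open"]
    for raw in v1_text.splitlines():
        if not raw.strip():
            continue
        rw, offset, length = (int(field) for field in raw.split(","))
        if rw not in (0, 1):
            raise ValueError(f"bad rw value in {raw!r}")
        out.append(f"{filename} {'read' if rw == 0 else 'write'} {offset} {length}")
    out.append(f"{filename} close")
    return "\n".join(out) + "\n"


def parse_v2(text: str) -> list[tuple[str, str, int, int]]:
    """Check the v2 rules described above and return the file io actions."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != HEADER:
        raise ValueError(f"first line must be {HEADER!r}")
    added, opened, actions = set(), set(), []
    for lineno, raw in enumerate(lines[1:], start=2):
        parts = raw.split()
        if not parts:
            continue
        if len(parts) == 2 and parts[1] in FILE_ACTIONS:
            name, action = parts
            if action == "add":
                added.add(name)
            elif action == "open":
                if name not in added:
                    raise ValueError(f"line {lineno}: {name} opened before add")
                opened.add(name)
            elif name in opened:
                opened.remove(name)
            else:
                raise ValueError(f"line {lineno}: {name} closed but not open")
        elif len(parts) == 4 and parts[1] in IO_ACTIONS:
            name, action = parts[0], parts[1]
            if name not in opened:
                raise ValueError(f"line {lineno}: {name} is not open")
            actions.append((name, action, int(parts[2]), int(parts[3])))
        else:
            raise ValueError(f"line {lineno}: unrecognized entry {raw!r}")
    return actions


trace = convert_v1("0, 0, 4096\n1, 4096, 4096\n", "/tmp/example.dat")
print(trace)
print(parse_v2(trace))
```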
2375 | ||
2376 | 9.0 CPU idleness profiling | |
06464907 | 2377 | -------------------------- |
f2a2ce0e HL |
2378 | In some cases, we want to understand CPU overhead in a test. For example, |
2379 | we test patches for the specific goodness of whether they reduce CPU usage. | |
2380 | fio implements a balloon approach to create a thread per CPU that runs at | |
2381 | idle priority, meaning that it only runs when nobody else needs the cpu. | |
2382 | By measuring the amount of work completed by the thread, idleness of each | |
2383 | CPU can be derived accordingly. | |
2384 | ||
2385 | An unit work is defined as touching a full page of unsigned characters. Mean | |
2386 | and standard deviation of time to complete an unit work is reported in "unit | |
2387 | work" section. Options can be chosen to report detailed percpu idleness or | |
2388 | overall system idleness by aggregating percpu stats. | |
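The following rough Python model makes the balloon arithmetic concrete. It has three parts:

- A unit of work touches every byte of a page-sized buffer.
- A calibration loop reports the mean and standard deviation of how long one unit takes.
- Idleness is estimated as the time the idle-priority thread spent working, divided by the elapsed time.

Several choices here are assumptions of the sketch: the function names, the use of the sample standard deviation, and a plain mean for the system-wide figure. The sketch also leaves out the idle-priority scheduling itself.

```python
import mmap
import statistics
import time


def unit_work(page):
    """One unit of work: touch every byte of a page-sized buffer."""
    for i in range(len(page)):
        page[i] = (page[i] + 1) & 0xFF


def calibrate(samples=20):
    """Time `samples` units of work; return (mean, stdev) in microseconds."""
    page = bytearray(mmap.PAGESIZE)
    times_us = []
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        unit_work(page)
        times_us.append((time.perf_counter_ns() - t0) / 1000)
    return statistics.mean(times_us), statistics.stdev(times_us)


def cpu_idleness(units_done, unit_time_us, elapsed_us):
    """Idle fraction: time the idle-priority thread spent doing units of work.

    Because that thread only runs when the CPU is otherwise idle, its busy
    time approximates the CPU's idle time. Clamped to 1.0.
    """
    return min(1.0, units_done * unit_time_us / elapsed_us)


def system_idleness(per_cpu):
    """Aggregate per-CPU idleness into one system-wide figure (plain mean)."""
    return sum(per_cpu) / len(per_cpu)
```

For example, if calibration says a unit takes 10 usecs and a CPU's balloon thread completed 500 units in a 10 ms interval, that CPU was idle about half the time.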


10.0 Verification and triggers
------------------------------
When data verification is done, fio is usually run in one of two ways. The
first is a normal write job of some sort with verify enabled. When the
write phase has completed, fio switches to reads and verifies everything
it wrote. The second model is running just the write phase, and then later
running the same job (but with reads instead of writes) to repeat the
same IO patterns and verify the contents. Both of these methods depend
on the write phase being completed, as fio otherwise has no idea how much
data was written.
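A toy Python model shows why the verifier depends on knowing how much was written. Each block carries a small header with its block number and a CRC32 of its payload. This is loosely in the spirit of fio's per-block verify headers, but it is not their actual layout: the block size, header fields and data pattern are all made up. Verifying past the last written block reports failures, so the verify pass needs an accurate count of written blocks.

```python
import struct
import zlib

BLOCK_SIZE = 512             # hypothetical block size for the sketch
HDR = struct.Struct("<QI")   # hypothetical header: block number, CRC32 of payload


def make_block(blkno):
    """Build one block: header followed by a deterministic payload."""
    payload = bytes((blkno * 31 + i) & 0xFF for i in range(BLOCK_SIZE - HDR.size))
    return HDR.pack(blkno, zlib.crc32(payload)) + payload


def write_phase(dev, count):
    """Write the first `count` blocks of the in-memory device `dev`."""
    for blkno in range(count):
        dev[blkno * BLOCK_SIZE:(blkno + 1) * BLOCK_SIZE] = make_block(blkno)


def verify_phase(dev, count):
    """Check the first `count` blocks; return the numbers of bad blocks."""
    bad = []
    for blkno in range(count):
        blk = bytes(dev[blkno * BLOCK_SIZE:(blkno + 1) * BLOCK_SIZE])
        hdr_blkno, crc = HDR.unpack_from(blk)
        if hdr_blkno != blkno or zlib.crc32(blk[HDR.size:]) != crc:
            bad.append(blkno)
    return bad


dev = bytearray(BLOCK_SIZE * 8)
write_phase(dev, 5)          # the write phase only got through 5 blocks
print(verify_phase(dev, 5))  # verifying what was written: no errors
print(verify_phase(dev, 8))  # guessing too far flags blocks 5, 6 and 7
```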

With verification triggers, fio can dump the current write state to local
files. A subsequent read verify workload can then load this state and know
exactly where to stop. This is useful, for instance, for testing cases where
power is cut to a server in a managed fashion.

A verification trigger consists of two things:

1) Storing the write state of each job
2) Executing a trigger command

The write state is relatively small, on the order of hundreds of bytes
to single kilobytes. It contains information on the number of completions
done, the last X completions, etc.
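The following Python sketch gives a feel for the size of such a state. It packs a hypothetical write state, consisting of a completion count plus the offsets of the last N completions, using the struct module. This layout is invented for illustration and is not fio's real state file format. With 64 recorded offsets it comes to 524 bytes, which is in line with the "hundreds of bytes" estimate above.

```python
import struct

# Hypothetical layout, NOT fio's real state file format:
#   u64 total completions, u32 number of recorded offsets,
#   followed by that many u64 offsets of the most recent completions.
HEADER = struct.Struct("<QI")


def pack_state(completions, last_offsets):
    """Serialize the completion count and recent completion offsets."""
    n = len(last_offsets)
    return HEADER.pack(completions, n) + struct.pack(f"<{n}Q", *last_offsets)


def unpack_state(blob):
    """Inverse of pack_state: return (completions, list_of_offsets)."""
    completions, n = HEADER.unpack_from(blob)
    offsets = list(struct.unpack_from(f"<{n}Q", blob, HEADER.size))
    return completions, offsets


state = pack_state(1000, [i * 4096 for i in range(64)])
print(len(state))  # 12-byte header + 64 * 8 bytes of offsets
```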

A trigger is invoked either through the creation ('touch') of a specified
file on the system, or through a timeout setting (--trigger-timeout). If fio
is run with --trigger-file=/tmp/trigger-file, then it will continually check
for the existence of /tmp/trigger-file. When it sees this file, it will
fire off the trigger (thus saving state and executing the trigger
command).
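The trigger-file mechanism amounts to a polling loop. The following minimal Python sketch implements that loop, with a timeout path standing in for the timeout setting. Several parts are assumptions: the function name, the polling interval, and the save_state callback. fio's own implementation may differ. The firing order follows the text above: save the state first, then run the command.

```python
import os
import subprocess
import time


def watch_trigger(trigger_file, save_state, command=None,
                  poll_interval=0.1, timeout=None):
    """Poll for `trigger_file`; fire when it appears or `timeout` expires.

    Firing calls save_state() and then runs `command` through the shell,
    if one was given. Returns True if the file fired the trigger, False if
    the timeout did. Hypothetical helper, not fio's implementation.
    """
    start = time.monotonic()
    while True:
        if os.path.exists(trigger_file):
            by_file = True
            break
        if timeout is not None and time.monotonic() - start >= timeout:
            by_file = False
            break
        time.sleep(poll_interval)
    save_state()                   # 1) store the write state
    if command:
        subprocess.run(command, shell=True, check=False)  # 2) run the trigger
    return by_file
```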

For client/server runs, there are both local and remote triggers. If
fio is running as a server backend, it will send the job states back
to the client for safe storage, then execute the remote trigger, if
specified. If a local trigger is specified, the server will still send
back the write state, but the client will then execute the trigger.
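The division of labor in client/server mode can be summarized as an ordered list of steps. The following Python sketch encodes it using a hypothetical trigger_plan helper. The relative order when both a local and a remote trigger are given is an assumption, since the text above does not specify it.

```python
def trigger_plan(local_cmd=None, remote_cmd=None):
    """Return the ordered (actor, action) steps of a client/server trigger.

    Hypothetical helper. The server always ships the write state to the
    client first; the remote trigger runs on the server and the local
    trigger runs on the client. Their order when both are set is assumed.
    """
    steps = [("server", "send write state to client"),
             ("client", "store write state")]
    if remote_cmd:
        steps.append(("server", f"run remote trigger: {remote_cmd}"))
    if local_cmd:
        steps.append(("client", f"run local trigger: {local_cmd}"))
    return steps


for actor, action in trigger_plan(local_cmd="ipmi-reboot server"):
    print(f"{actor:6} {action}")
```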

10.1 Verification trigger example
---------------------------------
Let's say we want to run a powercut test on the remote machine 'server'.
Our write workload is in write-test.fio. We want to cut power to 'server'
at some point during the run, and we'll run this test from the safety
of our local machine, 'localbox'. On the server, we'll start the fio
backend normally:

server# fio --server

and on the client, we'll fire off the workload:

localbox$ fio --client=server --trigger-file=/tmp/my-trigger --trigger-remote="bash -c \"echo b > /proc/sysrq-trigger\"" write-test.fio

We set /tmp/my-trigger as the trigger file, and we tell fio to execute

echo b > /proc/sysrq-trigger

on the server once it has received the trigger and sent us the write
state. This will work, but it's not _really_ cutting power to the server;
it's merely abruptly rebooting it. If we have a remote way of cutting
power to the server through IPMI or similar, we could do that through
a local trigger command instead. Let's assume we have a script, ipmi-reboot,
that does an IPMI reboot of a given hostname. On localbox, we could
then run fio with a local trigger instead:

localbox$ fio --client=server --trigger-file=/tmp/my-trigger --trigger="ipmi-reboot server" write-test.fio

In this case, fio would wait for the server to send us the write state,
then execute 'ipmi-reboot server' when that happened.
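The nested quoting in the --trigger-remote example is easy to get wrong by hand. One alternative is to build the command line programmatically. The following Python sketch does this with shlex.join (Python 3.8+) and a hypothetical fio_client_cmd helper. It assumes the trigger string is later handed to a shell on the machine that runs it.

```python
import shlex


def fio_client_cmd(server, trigger_file, job_file,
                   remote_trigger=None, local_trigger=None):
    """Build a correctly quoted fio client command line as one shell string.

    Trigger commands are given as argument lists and quoted first, because
    each one is itself a command line embedded in a single fio option.
    Hypothetical helper for illustration.
    """
    argv = ["fio", f"--client={server}", f"--trigger-file={trigger_file}"]
    if remote_trigger is not None:
        argv.append("--trigger-remote=" + shlex.join(remote_trigger))
    if local_trigger is not None:
        argv.append("--trigger=" + shlex.join(local_trigger))
    argv.append(job_file)
    return shlex.join(argv)


print(fio_client_cmd("server", "/tmp/my-trigger", "write-test.fio",
                     remote_trigger=["bash", "-c", "echo b > /proc/sysrq-trigger"]))
```

The output uses single-quote escaping rather than the backslashes shown above, but a shell splits both forms into the same arguments.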

10.2 Loading verify state
-------------------------
To load a stored write state, the read verification job file must contain
the verify_state_load option. If that is set, fio will load the previously
stored state. For a local fio run this is done by loading the files directly,
and on a client/server run, the server backend will ask the client to send
the files over and load them from there.


11.0 Log File Formats
---------------------

Fio supports a variety of log file formats, for logging latencies, bandwidth,
and IOPS. The logs share a common format, which looks like this:

time (msec), value, data direction, offset

Time for the log entry is always in milliseconds. The value logged depends
on the type of log; it will be one of the following:

Latency log     Value is latency in usecs
Bandwidth log   Value is in KiB/sec
IOPS log        Value is IOPS

Data direction is one of the following:

0   IO is a READ
1   IO is a WRITE
2   IO is a TRIM

The offset is the offset, in bytes, from the start of the file, for that
particular IO. The logging of the offset can be toggled with 'log_offset'.
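Parsing these logs is straightforward. The following Python sketch uses a hypothetical parse_log_line helper to turn one line into a dict and map the data direction codes to names. The column set has varied between fio versions, so the sketch accepts three or four columns and treats the fourth as the offset, as described above. Real logs from a given fio version may differ.

```python
DIRECTIONS = {0: "READ", 1: "WRITE", 2: "TRIM"}


def parse_log_line(line):
    """Parse 'time (msec), value, data direction[, offset]' into a dict.

    Hypothetical helper. The meaning and unit of 'value' depend on the log
    type: usecs for latency logs, KiB/sec for bandwidth logs, IOPS for
    IOPS logs.
    """
    fields = [f.strip() for f in line.split(",")]
    if len(fields) not in (3, 4):
        raise ValueError(f"unexpected number of fields: {line!r}")
    entry = {
        "time_ms": int(fields[0]),
        "value": int(fields[1]),
        "direction": DIRECTIONS[int(fields[2])],
    }
    if len(fields) == 4:  # offset column, assumed present with log_offset
        entry["offset"] = int(fields[3])
    return entry


print(parse_log_line("1000, 250, 1, 4096"))
```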

If windowed logging is enabled through 'log_avg_msec', then fio doesn't log
individual IOs. Instead, it logs the average values over the specified
period of time. Since 'data direction' and 'offset' are per-IO values,
they aren't applicable if windowed logging is enabled. If windowed logging
is enabled and 'log_max_value' is set, then fio logs the maximum values in
that window instead of averages.
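Windowed logging can be modeled as bucketing per-IO samples by time. In the following Python sketch, each window covers [k * log_avg_msec, (k + 1) * log_avg_msec) and is stamped with its end time, and empty windows are skipped. These details are assumptions, not a description of fio's exact timestamps. The use_max flag plays the role of 'log_max_value'.

```python
from collections import defaultdict


def window_samples(samples, log_avg_msec, use_max=False):
    """Collapse per-IO (time_ms, value) samples into one value per window.

    Window k covers [k * log_avg_msec, (k + 1) * log_avg_msec) and is
    reported at its end time. Returns a time-sorted list of (time_ms, value)
    holding the mean, or the maximum when use_max is set. Hypothetical
    helper; empty windows are omitted.
    """
    buckets = defaultdict(list)
    for t, v in samples:
        buckets[t // log_avg_msec].append(v)
    out = []
    for k in sorted(buckets):
        vals = buckets[k]
        agg = max(vals) if use_max else sum(vals) / len(vals)
        out.append(((k + 1) * log_avg_msec, agg))
    return out


samples = [(10, 100), (20, 300), (510, 50), (990, 150)]
print(window_samples(samples, 500))
print(window_samples(samples, 500, use_max=True))
```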