Commit | Line | Data |
---|---|---|
ebac4655 JA |
1 | fio |
2 | --- | |
3 | ||
79809113 JA |
4 | fio is a tool that will spawn a number of threads or processes doing a |
5 | particular type of io action as specified by the user. fio takes a | |
6 | number of global parameters, each inherited by every thread unless | |
7 | a parameter given to that thread overrides the setting. | |
8 | The typical use of fio is to write a job file matching the io load | |
9 | one wants to simulate. | |
ebac4655 | 10 | |
2b02b546 JA |
11 | |
12 | Source | |
13 | ------ | |
14 | ||
15 | fio resides in a git repo, the canonical place is: | |
16 | ||
17 | git://brick.kernel.dk/data/git/fio.git | |
18 | ||
79809113 JA |
19 | Snapshots are frequently generated and they include the git meta data as |
20 | well. You can download them here: | |
2b02b546 JA |
21 | |
22 | http://brick.kernel.dk/snaps/ | |
23 | ||
1053a106 JA |
24 | Pascal Bleser <guru@unixtech.be> has fio RPMs in his repository, you |
25 | can find them here: | |
26 | ||
27 | http://linux01.gwdg.de/~pbleser/rpm-navigation.php?cat=System/fio | |
28 | ||
2b02b546 | 29 | |
bbfd6b00 JA |
30 | Building |
31 | -------- | |
32 | ||
33 | Just type 'make' and 'make install'. If on FreeBSD, for now you have to | |
34 | specify the FreeBSD Makefile with -f, e.g.: | |
35 | ||
36 | $ make -f Makefile.FreeBSD && make -f Makefile.FreeBSD install | |
37 | ||
edffcb96 | 38 | Likewise with OpenSolaris, use the Makefile.solaris to compile there. |
bbfd6b00 JA |
39 | This might change in the future if I opt for an autoconf type setup. |
40 | ||
41 | ||
972cfd25 JA |
42 | Command line |
43 | ------------ | |
ebac4655 JA |
44 | |
45 | $ fio | |
ebac4655 | 46 | -t <sec> Runtime in seconds |
ebac4655 JA |
47 | -l Generate per-job latency logs |
48 | -w Generate per-job bandwidth logs | |
9ebc27e1 | 49 | -o <file> Log output to file |
c6ae0a5b | 50 | -m Minimal (terse) output |
4785f995 | 51 | -h Print help info |
ebac4655 JA |
52 | -v Print version information and exit |
53 | ||
972cfd25 JA |
54 | Any parameters following the options will be assumed to be job files. |
55 | You can add as many as you want, each job file will be regarded as a | |
56 | separate group and fio will stonewall its execution. | |
57 | ||
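As a rough illustration of the command line layout above (options first, then job files), here is a minimal Python sketch that assembles an argument list. The helper name build_fio_command and the job file names are hypothetical; only the -t, -o and -m options listed above are used.

```python
def build_fio_command(job_files, runtime_sec=None, output_file=None, terse=False):
    """Return an argv list for fio: options first, then the job files."""
    argv = ["fio"]
    if runtime_sec is not None:
        argv += ["-t", str(runtime_sec)]  # -t <sec>: runtime in seconds
    if output_file is not None:
        argv += ["-o", output_file]       # -o <file>: log output to file
    if terse:
        argv.append("-m")                 # -m: minimal (terse) output
    # Everything after the options is treated as a job file.
    argv += list(job_files)
    return argv


# Hypothetical job file names, purely for illustration.
print(build_fio_command(["readers.fio", "writers.fio"], runtime_sec=60, terse=True))
```

The resulting list can be handed to subprocess.run() on a machine where fio is installed.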
79809113 JA |
58 | |
59 | Job file | |
60 | -------- | |
61 | ||
62 | Only a few options can be controlled with command line parameters; | |
63 | generally it's a lot easier to just write a simple job file to describe | |
64 | the workload. The job file format is in the ini style format, as it's | |
65 | easy to read and write for the user. | |
66 | ||
67 | The job file parameters are: | |
ebac4655 | 68 | |
01452055 | 69 | name=x Use 'x' as the identifier for this job. |
ebac4655 | 70 | directory=x Use 'x' as the top level directory for storing files |
3d60d1ed JA |
71 | rw=x 'x' may be: read, randread, write, randwrite, |
72 | rw (read-write mix), randrw (read-write random mix) | |
a6ccc7be JA |
73 | rwmixcycle=x Base cycle for switching between read and write |
74 | in msecs. | |
75 | rwmixread=x 'x' percentage of rw mix ios will be reads. If | |
76 | rwmixwrite is also given, the last of the two will | |
77 | be used if they don't add up to 100%. | |
78 | rwmixwrite=x 'x' percentage of rw mix ios will be writes. See | |
79 | rwmixread. | |
9ebc27e1 JA |
80 | rand_repeatable=x The sequence of random io blocks can be repeatable |
81 | across runs, if 'x' is 1. | |
ebac4655 JA |
82 | size=x Set file size to x bytes (x string can include k/m/g) |
83 | ioengine=x 'x' may be: aio/libaio/linuxaio for Linux aio, | |
84 | posixaio for POSIX aio, sync for regular read/write io, | |
8756e4d4 JA |
85 | mmap for mmap'ed io, splice for using splice/vmsplice, |
86 | or sgio for direct SG_IO io. The latter only works on | |
87 | Linux on SCSI (or SCSI-like devices, such as | |
88 | usb-storage or sata/libata driven) devices. | |
ebac4655 JA |
89 | iodepth=x For async io, allow 'x' ios in flight |
90 | overwrite=x If 'x', layout a write file first. | |
91 | prio=x Run io at prio X, 0-7 is the kernel allowed range | |
92 | prioclass=x Run io at prio class X | |
93 | bs=x Use 'x' for thread blocksize. May include k/m postfix. | |
94 | bsrange=x-y Mix thread block sizes randomly between x and y. May | |
95 | also include k/m postfix. | |
96 | direct=x 1 for direct IO, 0 for buffered IO | |
97 | thinktime=x "Think" x usec after each io | |
98 | rate=x Throttle rate to x KiB/sec | |
99 | ratemin=x Quit if rate of x KiB/sec can't be met | |
100 | ratecycle=x ratemin averaged over x msecs | |
101 | cpumask=x Only allow job to run on CPUs defined by mask. | |
102 | fsync=x If writing, fsync after every x blocks have been written | |
103 | startdelay=x Start this thread x seconds after startup | |
906c8d75 JA |
104 | timeout=x Terminate x seconds after startup. Can include a |
105 | normal time suffix if not given in seconds, such as | |
106 | 'm' for minutes, 'h' for hours, and 'd' for days. | |
ebac4655 JA |
107 | offset=x Start io at offset x (x string can include k/m/g) |
108 | invalidate=x Invalidate page cache for file prior to doing io | |
109 | sync=x Use sync writes if x and writing | |
110 | mem=x If x == malloc, use malloc for buffers. If x == shm, | |
111 | use shm for buffers. If x == mmap, use anon mmap. | |
112 | exitall When one thread quits, terminate the others | |
113 | bwavgtime=x Average bandwidth stats over an x msec window. | |
114 | create_serialize=x If 'x', serialize file creation. | |
115 | create_fsync=x If 'x', run fsync() after file creation. | |
fc1a4713 | 116 | end_fsync=x If 'x', run fsync() after end-of-job. |
ebac4655 JA |
117 | loops=x Run the job 'x' number of times. |
118 | verify=x If 'x' == md5, use md5 for verifies. If 'x' == crc32, | |
119 | use crc32 for verifies. md5 is 'safer', but crc32 is | |
120 | a lot faster. Only makes sense for writing to a file. | |
121 | stonewall Wait for preceding jobs to end before running. | |
122 | numjobs=x Create 'x' similar entries for this job | |
123 | thread Use pthreads instead of forked jobs | |
20dc95c4 JA |
124 | zonesize=x |
125 | zoneskip=y Zone options must be paired. If given, the job | |
126 | will skip y bytes for every x read/written. This | |
127 | can be used to gauge hard drive speed over the entire | |
128 | platter, without reading everything. Both x/y can | |
129 | include k/m/g suffix. | |
aea47d44 JA |
130 | iolog=x Open and read io pattern from file 'x'. The file must |
131 | contain one io action per line in the following format: | |
132 | rw, offset, length | |
133 | with rw=0/1 for read/write, and the offset | |
134 | and length entries given in bytes. | |
843a7413 JA |
135 | write_iolog=x Write an iolog to file 'x' in the same format as iolog. |
136 | The iolog options are mutually exclusive; if both are | |
137 | given, the read iolog will be performed. | |
c04f7ec3 JA |
138 | lockmem=x Lock down x amount of memory on the machine, to |
139 | simulate a machine with less memory available. x can | |
140 | include k/m/g suffix. | |
b6f4d880 | 141 | nice=x Run job at given nice value. |
4e0ba8af JA |
142 | exec_prerun=x Run 'x' before job io is begun. |
143 | exec_postrun=x Run 'x' after job io has finished. | |
da86774e | 144 | ioscheduler=x Use ioscheduler 'x' for this job. |
ebac4655 | 145 | |
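Several of the parameters above (size, offset, bs, lockmem, the zone options) accept a k/m/g suffix. The following Python sketch shows one way such a value could be turned into bytes. It assumes binary multiples (k = 1024), which matches the examples in this document where size=128m is described as a 128MiB file; the function name parse_size is hypothetical and this is not fio's own parser.

```python
def parse_size(text):
    """Convert a value such as '4k', '128m' or '1g' into a byte count."""
    # Assumption: suffixes are binary multiples (k = 1024 bytes).
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)  # no suffix: plain bytes


print(parse_size("4k"))    # 4096
print(parse_size("128m"))  # 134217728
```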
79809113 | 146 | |
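The iolog file format described above (one "rw, offset, length" action per line) can be read with a few lines of Python. This is a sketch under two assumptions: that 0 means read and 1 means write, following the order in which they are listed, and that fields are separated by commas with optional whitespace. The names IoAction and parse_iolog are hypothetical.

```python
from typing import NamedTuple


class IoAction(NamedTuple):
    is_write: bool  # assumption: rw=0 is a read, rw=1 is a write
    offset: int     # bytes
    length: int     # bytes


def parse_iolog(lines):
    """Parse iolog lines of the form 'rw, offset, length'."""
    actions = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        rw, offset, length = (int(field) for field in line.split(","))
        if rw not in (0, 1):
            raise ValueError(f"unexpected rw value: {rw}")
        actions.append(IoAction(rw == 1, offset, length))
    return actions


for action in parse_iolog(["0, 0, 4096", "1, 8192, 4096"]):
    print(action)
```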
ebac4655 JA |
147 | Examples using a job file |
148 | ------------------------- | |
149 | ||
79809113 | 150 | Example 1) Two random readers |
ebac4655 | 151 | |
79809113 JA |
152 | Let's say we want to simulate two threads reading randomly from a file |
153 | each. They will be doing IO in 4KiB chunks, using raw (O_DIRECT) IO. | |
154 | Since they share most parameters, we'll put those in the [global] | |
155 | section. Job 1 will use a 128MiB file, job 2 will use a 256MiB file. | |
ebac4655 | 156 | |
79809113 | 157 | ; ---snip--- |
ebac4655 | 158 | |
79809113 JA |
159 | [global] |
160 | ioengine=sync ; regular read/write(2), the default | |
161 | rw=randread | |
162 | bs=4k | |
163 | direct=1 | |
ebac4655 | 164 | |
79809113 JA |
165 | [file1] |
166 | size=128m | |
ebac4655 | 167 | |
79809113 JA |
168 | [file2] |
169 | size=256m | |
ebac4655 | 170 | |
79809113 | 171 | ; ---snip--- |
ebac4655 | 172 | |
79809113 JA |
173 | Generally the [] bracketed name specifies a file name, but the "global" |
174 | keyword is reserved for setting options that are inherited by each | |
175 | subsequent job description. It's possible to have several [global] | |
176 | sections in the job file, each one adds options that are inherited by | |
177 | jobs defined below it. The name can also point to a block device, such | |
178 | as /dev/sda. To run the above job file, simply do: | |
ebac4655 | 179 | |
79809113 JA |
180 | $ fio jobfile |
181 | ||
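When many similar job files are needed, they can be generated rather than written by hand. The Python sketch below reproduces the job file from Example 1; the helper name render_job_file is hypothetical, and it simply writes key=value lines under bracketed section names as shown above.

```python
def render_job_file(global_options, jobs):
    """Render a [global] section followed by one section per job."""
    lines = ["[global]"]
    lines += [f"{key}={value}" for key, value in global_options.items()]
    for name, options in jobs.items():
        lines.append("")
        lines.append(f"[{name}]")
        lines += [f"{key}={value}" for key, value in options.items()]
    return "\n".join(lines) + "\n"


job_text = render_job_file(
    {"ioengine": "sync", "rw": "randread", "bs": "4k", "direct": 1},
    {"file1": {"size": "128m"}, "file2": {"size": "256m"}},
)
print(job_text)
```

Writing job_text to a file and passing that file name to fio would be equivalent to the hand-written example.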
182 | Example 2) Many random writers | |
183 | ||
184 | Say we want to exercise the IO subsystem some more. We'll define 64 | |
185 | threads doing random buffered writes. We'll let each thread use async io | |
186 | with a depth of 4 ios in flight. A job file would then look like this: | |
ebac4655 | 187 | |
79809113 | 188 | ; ---snip--- |
ebac4655 | 189 | |
79809113 JA |
190 | [global] |
191 | ioengine=libaio | |
192 | iodepth=4 | |
193 | rw=randwrite | |
194 | bs=32k | |
195 | direct=0 | |
196 | size=64m | |
ebac4655 | 197 | |
79809113 JA |
198 | [files] |
199 | numjobs=64 | |
ebac4655 | 200 | |
79809113 JA |
201 | ; ---snip--- |
202 | ||
203 | This will create files.[0-63] and perform the random writes to them. | |
204 | ||
205 | There are endless ways to define jobs; the examples/ directory contains | |
206 | a few more examples. | |
ebac4655 JA |
207 | |
208 | ||
209 | Interpreting the output | |
210 | ----------------------- | |
211 | ||
212 | fio spits out a lot of output. While running, fio will display the | |
213 | status of the jobs created. An example of that would be: | |
214 | ||
972cfd25 | 215 | Threads running: 1: [_r] [24.79% done] [eta 00h:01m:31s] |
ebac4655 JA |
216 | |
217 | The characters inside the square brackets denote the current status of | |
218 | each thread. The possible values (in typical life cycle order) are: | |
219 | ||
220 | Idle Run | |
221 | ---- --- | |
222 | P Thread setup, but not started. | |
79809113 JA |
223 | C Thread created. |
224 | I Thread initialized, waiting. | |
ebac4655 JA |
225 | R Running, doing sequential reads. |
226 | r Running, doing random reads. | |
227 | W Running, doing sequential writes. | |
228 | w Running, doing random writes. | |
79809113 JA |
229 | M Running, doing mixed sequential reads/writes. |
230 | m Running, doing mixed random reads/writes. | |
231 | F Running, currently waiting for fsync() | |
ebac4655 JA |
232 | V Running, doing verification of written data. |
233 | E Thread exited, not reaped by main thread yet. | |
234 | _ Thread reaped. | |
235 | ||
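For scripts that watch the running status line, the characters inside the brackets can be mapped back to the descriptions listed above. A minimal Python sketch follows; the names STATUS and decode_status are hypothetical.

```python
# One entry per status character, taken from the list above.
STATUS = {
    "P": "Thread setup, but not started.",
    "C": "Thread created.",
    "I": "Thread initialized, waiting.",
    "R": "Running, doing sequential reads.",
    "r": "Running, doing random reads.",
    "W": "Running, doing sequential writes.",
    "w": "Running, doing random writes.",
    "M": "Running, doing mixed sequential reads/writes.",
    "m": "Running, doing mixed random reads/writes.",
    "F": "Running, currently waiting for fsync()",
    "V": "Running, doing verification of written data.",
    "E": "Thread exited, not reaped by main thread yet.",
    "_": "Thread reaped.",
}


def decode_status(chars):
    """Return one description per thread for a status string such as '_r'."""
    return [STATUS[c] for c in chars]


for description in decode_status("_r"):
    print(description)
```

For the sample line above, "_r" decodes to one reaped thread and one thread doing random reads, which agrees with "Threads running: 1".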
79809113 JA |
236 | The other values are fairly self explanatory - number of threads |
237 | currently running and doing io, and the estimated completion percentage | |
972cfd25 JA |
238 | and time for the running group. It's impossible to estimate runtime |
239 | of the following groups (if any). | |
ebac4655 JA |
240 | |
241 | When fio is done (or interrupted by ctrl-c), it will show the data for | |
242 | each thread, group of threads, and disks in that order. For each data | |
243 | direction, the output looks like: | |
244 | ||
245 | Client1 (g=0): err= 0: | |
246 | write: io= 32MiB, bw= 666KiB/s, runt= 50320msec | |
247 | slat (msec): min= 0, max= 136, avg= 0.03, dev= 1.92 | |
248 | clat (msec): min= 0, max= 631, avg=48.50, dev=86.82 | |
249 | bw (KiB/s) : min= 0, max= 1196, per=51.00%, avg=664.02, dev=681.68 | |
250 | cpu : usr=1.49%, sys=0.25%, ctx=7969 | |
251 | ||
252 | The client number is printed, along with the group id and error of that | |
253 | thread. Below are the io statistics, here for writes. In the order listed, | |
254 | they denote: | |
255 | ||
256 | io= Number of megabytes io performed | |
257 | bw= Average bandwidth rate | |
258 | runt= The runtime of that thread | |
259 | slat= Submission latency (avg being the average, dev being the | |
260 | standard deviation). This is the time it took to submit | |
261 | the io. For sync io, the slat is really the completion | |
262 | latency, since queue/complete is one operation there. | |
263 | clat= Completion latency. Same names as slat, this denotes the | |
264 | time from submission to completion of the io pieces. For | |
265 | sync io, clat will usually be equal (or very close) to 0, | |
266 | as the time from submit to complete is basically just | |
267 | CPU time (io has already been done, see slat explanation). | |
268 | bw= Bandwidth. Same names as the xlat stats, but also includes | |
269 | an approximate percentage of total aggregate bandwidth | |
270 | this thread received in this group. This last value is | |
271 | only really useful if the threads in this group are on the | |
272 | same disk, since they are then competing for disk access. | |
273 | cpu= CPU usage. User and system time, along with the number | |
274 | of context switches this thread went through. | |
275 | ||
276 | After each client has been listed, the group statistics are printed. They | |
277 | will look like this: | |
278 | ||
279 | Run status group 0 (all jobs): | |
280 | READ: io=64MiB, aggrb=22178, minb=11355, maxb=11814, mint=2840msec, maxt=2955msec | |
281 | WRITE: io=64MiB, aggrb=1302, minb=666, maxb=669, mint=50093msec, maxt=50320msec | |
282 | ||
283 | For each data direction, it prints: | |
284 | ||
285 | io= Number of megabytes io performed. | |
286 | aggrb= Aggregate bandwidth of threads in this group. | |
287 | minb= The minimum average bandwidth a thread saw. | |
288 | maxb= The maximum average bandwidth a thread saw. | |
79809113 JA |
289 | mint= The smallest runtime of the threads in that group. |
290 | maxt= The longest runtime of the threads in that group. | |
ebac4655 JA |
291 | |
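The sample group statistics above are consistent with aggrb being the total io divided by the longest runtime (maxt), expressed in KiB/s. This is an observation about the sample numbers, not a documented formula, and the function name below is hypothetical. The arithmetic can be checked like this:

```python
def aggregate_bandwidth_kib(io_mib, longest_runtime_msec):
    """Total io in KiB divided by the longest runtime in seconds (truncated)."""
    # Assumption: aggrb is reported in KiB/s and derived from maxt.
    return io_mib * 1024 * 1000 // longest_runtime_msec


print(aggregate_bandwidth_kib(64, 2955))   # READ line:  22178
print(aggregate_bandwidth_kib(64, 50320))  # WRITE line: 1302
```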
292 | And finally, the disk statistics are printed. They will look like this: | |
293 | ||
294 | Disk stats (read/write): | |
295 | sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00% | |
296 | ||
297 | Values with a '/' are printed for both reads and writes, with reads first. The | |
298 | numbers denote: | |
299 | ||
300 | ios= Number of ios performed by all groups. | |
301 | merge= Number of merges in the io scheduler. | |
302 | ticks= Number of ticks we kept the disk busy. | |
303 | in_queue= Total time spent in the disk queue. | |
304 | util= The disk utilization. A value of 100% means we kept the disk | |
305 | busy constantly, 50% would be a disk idling half of the time. | |
79809113 JA |
306 | |
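A disk statistics line like the sample above can be split into its fields with a small Python sketch. The function name parse_disk_stats is hypothetical, and the parsing is based only on the sample line shown here: values containing a '/' are treated as read/write pairs, and a trailing '%' as a percentage.

```python
import re


def parse_disk_stats(line):
    """Parse 'sda: ios=R/W, merge=R/W, ...' into a dictionary."""
    device, _, rest = line.strip().partition(":")
    stats = {"device": device.strip()}
    for key, value in re.findall(r"(\w+)=([^,\s]+)", rest):
        if "/" in value:
            read, write = value.split("/")
            stats[key] = (int(read), int(write))  # (reads, writes)
        elif value.endswith("%"):
            stats[key] = float(value[:-1])
        else:
            stats[key] = int(value)
    return stats


sample = ("sda: ios=16398/16511, merge=30/162, ticks=6853/819634, "
          "in_queue=826487, util=100.00%")
stats = parse_disk_stats(sample)
print(stats["ios"], stats["util"])
```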
307 | ||
c6ae0a5b JA |
308 | Terse output |
309 | ------------ | |
310 | ||
311 | For scripted usage where you typically want to generate tables or graphs | |
312 | of the results, fio can output the results in a comma separated format. | |
313 | The format is one long line of values, such as: | |
314 | ||
315 | client1,0,0,936,331,2894,0,0,0.000000,0.000000,1,170,22.115385,34.290410,16,714,84.252874%,366.500000,566.417819,3496,1237,2894,0,0,0.000000,0.000000,0,246,6.671625,21.436952,0,2534,55.465300%,1406.600000,2008.044216,0.000000%,0.431928%,1109 | |
316 | ||
317 | Split up, the format is as follows: | |
318 | ||
319 | jobname, groupid, error | |
320 | READ status: | |
321 | KiB IO, bandwidth (KiB/sec), runtime (msec) | |
322 | Submission latency: min, max, mean, deviation | |
323 | Completion latency: min, max, mean, deviation | |
324 | Bw: min, max, aggregate percentage of total, mean, deviation | |
325 | WRITE status: | |
326 | KiB IO, bandwidth (KiB/sec), runtime (msec) | |
327 | Submission latency: min, max, mean, deviation | |
328 | Completion latency: min, max, mean, deviation | |
329 | Bw: min, max, aggregate percentage of total, mean, deviation | |
330 | CPU usage: user, system, context switches | |
331 | ||
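The field layout above adds up to 38 values per line, which matches the sample. A minimal Python sketch for turning one terse line into named fields follows. The field names (read_kib, write_clat_mean and so on) are made up for this illustration; only their order comes from the format description. Values are kept as strings.

```python
def _direction_fields(prefix):
    """Field names for one data direction, in the order listed above."""
    names = [f"{prefix}_kib", f"{prefix}_bw_kib_s", f"{prefix}_runtime_msec"]
    for stat in ("slat", "clat"):
        names += [f"{prefix}_{stat}_{part}" for part in ("min", "max", "mean", "dev")]
    names += [f"{prefix}_bw_{part}" for part in ("min", "max", "agg_pct", "mean", "dev")]
    return names


FIELDS = (
    ["jobname", "groupid", "error"]
    + _direction_fields("read")
    + _direction_fields("write")
    + ["cpu_user", "cpu_system", "cpu_ctx"]
)


def parse_terse(line):
    """Split one terse output line into a dict keyed by field name."""
    values = line.strip().split(",")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


SAMPLE = (
    "client1,0,0,936,331,2894,0,0,0.000000,0.000000,1,170,22.115385,"
    "34.290410,16,714,84.252874%,366.500000,566.417819,3496,1237,2894,0,0,"
    "0.000000,0.000000,0,246,6.671625,21.436952,0,2534,55.465300%,"
    "1406.600000,2008.044216,0.000000%,0.431928%,1109"
)
result = parse_terse(SAMPLE)
print(result["jobname"], result["read_kib"], result["write_kib"], result["cpu_ctx"])
```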
332 | ||
79809113 JA |
333 | Author |
334 | ------ | |
335 | ||
aae22ca7 | 336 | Fio was written by Jens Axboe <axboe@kernel.dk> to enable flexible testing |
79809113 JA |
337 | of the Linux IO subsystem and schedulers. He got tired of writing |
338 | specific test applications to simulate a given workload, and found that | |
339 | the existing io benchmark/test tools out there weren't flexible enough | |
340 | to do what he wanted. | |
341 | ||
aae22ca7 | 342 | Jens Axboe <axboe@kernel.dk> 20060905 |
79809113 | 343 |