fio
---

fio is a tool that spawns a number of threads, each doing a particular
type of io action as specified by the user. fio takes a number of
global parameters, each inherited by a thread unless that thread is
given parameters of its own that override the setting.


Source
------

fio resides in a git repo, the canonical place is:

git://brick.kernel.dk/data/git/fio.git

Snapshots are frequently generated and they include the git meta data
as well. You can download them here:

http://brick.kernel.dk/snaps/

Pascal Bleser <guru@unixtech.be> has fio RPMs in his repository, you
can find them here:

http://linux01.gwdg.de/~pbleser/rpm-navigation.php?cat=System/fio


Building
--------

Just type 'make' and 'make install'. If on FreeBSD, for now you have to
specify the FreeBSD Makefile with -f, eg:

$ make -f Makefile.FreeBSD && make -f Makefile.FreeBSD install

Likewise with OpenSolaris, use the Makefile.solaris to compile there.
This might change in the future if I opt for an autoconf type setup.


Options
-------

$ fio
        -s              IO is sequential
        -b              Block size in KiB for each io
        -t <sec>        Runtime in seconds
        -r              For random io, sequence must be repeatable
        -R <on>         If one thread fails to meet rate, quit all
        -o <on>         Use direct IO if 1, buffered if 0
        -l              Generate per-job latency logs
        -w              Generate per-job bandwidth logs
        -f <file>       Read <file> for job descriptions
        -O <file>       Log output to file
        -h              Print help info
        -v              Print version information and exit

The job description format is as follows:

        name=x              Use 'x' as the identifier for this job.
        directory=x         Use 'x' as the top level directory for storing files
        rw=x                'x' may be: read, randread, write, randwrite,
                            rw (read-write mix), randrw (read-write random mix)
        rwmixcycle=x        Base cycle for switching between read and write
                            in msecs.
        rwmixread=x         'x' percentage of rw mix ios will be reads. If
                            rwmixwrite is also given, the last of the two will
                            be used if they don't add up to 100%.
        rwmixwrite=x        'x' percentage of rw mix ios will be writes. See
                            rwmixread.
        size=x              Set file size to x bytes (x string can include k/m/g)
        ioengine=x          'x' may be: aio/libaio/linuxaio for Linux aio,
                            posixaio for POSIX aio, sync for regular read/write io,
                            mmap for mmap'ed io, splice for using splice/vmsplice,
                            or sgio for direct SG_IO io. The latter only works on
                            Linux on SCSI devices (or SCSI-like devices, such as
                            usb-storage or sata/libata driven ones).
        iodepth=x           For async io, allow 'x' ios in flight
        overwrite=x         If 'x', layout a write file first.
        prio=x              Run io at prio X, 0-7 is the kernel allowed range
        prioclass=x         Run io at prio class X
        bs=x                Use 'x' for thread blocksize. May include k/m postfix.
        bsrange=x-y         Mix thread block sizes randomly between x and y. May
                            also include k/m postfix.
        direct=x            1 for direct IO, 0 for buffered IO
        thinktime=x         "Think" x usec after each io
        rate=x              Throttle rate to x KiB/sec
        ratemin=x           Quit if rate of x KiB/sec can't be met
        ratecycle=x         ratemin averaged over x msecs
        cpumask=x           Only allow job to run on CPUs defined by mask.
        fsync=x             If writing, fsync after every x blocks have been written
        startdelay=x        Start this thread x seconds after startup
        timeout=x           Terminate x seconds after startup
        offset=x            Start io at offset x (x string can include k/m/g)
        invalidate=x        Invalidate page cache for file prior to doing io
        sync=x              Use sync writes if x and writing
        mem=x               If x == malloc, use malloc for buffers. If x == shm,
                            use shm for buffers. If x == mmap, use anon mmap.
        exitall             When one thread quits, terminate the others
        bwavgtime=x         Average bandwidth stats over an x msec window.
        create_serialize=x  If 'x', serialize file creation.
        create_fsync=x      If 'x', run fsync() after file creation.
        end_fsync=x         If 'x', run fsync() after end-of-job.
        loops=x             Run the job 'x' number of times.
        verify=x            If 'x' == md5, use md5 for verifies. If 'x' == crc32,
                            use crc32 for verifies. md5 is 'safer', but crc32 is
                            a lot faster. Only makes sense for writing to a file.
        stonewall           Wait for preceding jobs to end before running.
        numjobs=x           Create 'x' similar entries for this job
        thread              Use pthreads instead of forked jobs
        zonesize=x
        zoneskip=y          Zone options must be paired. If given, the job
                            will skip y bytes for every x read/written. This
                            can be used to gauge hard drive speed over the entire
                            platter, without reading everything. Both x/y can
                            include k/m/g suffix.
        iolog=x             Open and read io pattern from file 'x'. The file must
                            contain one io action per line in the following format:
                            rw, offset, length
                            with rw=0/1 for read/write, and the offset
                            and length entries being in bytes.
        write_iolog=x       Write an iolog to file 'x' in the same format as iolog.
                            The iolog options are exclusive, if both are given the
                            read iolog will be performed.
        lockmem=x           Lock down x amount of memory on the machine, to
                            simulate a machine with less memory available. x can
                            include k/m/g suffix.
        nice=x              Run job at given nice value.
        exec_prerun=x       Run 'x' before job io is begun.
        exec_postrun=x      Run 'x' after job io has finished.
        ioscheduler=x       Use ioscheduler 'x' for this job.

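To make the zonesize/zoneskip pairing concrete, here is a minimal Python
sketch of which byte ranges a job would touch. It is an illustration of
the description above, not fio's code. The helper names parse_size and
zone_ranges are made up for this example, and treating k/m/g as powers
of 1024 is an assumption.

```python
def parse_size(text):
    """Convert a size string such as '16m' or '1g' to bytes.

    Assumption: k/m/g are multiples of 1024, 1024^2 and 1024^3.
    """
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)


def zone_ranges(file_size, zonesize, zoneskip):
    """Yield (start, end) byte ranges: do zonesize bytes, then skip zoneskip."""
    offset = 0
    while offset < file_size:
        end = min(offset + zonesize, file_size)
        yield offset, end
        offset = end + zoneskip


# zonesize=16m and zoneskip=240m over a 1g file
ranges = list(zone_ranges(parse_size("1g"), parse_size("16m"), parse_size("240m")))
touched = sum(end - start for start, end in ranges)
```

With these values the job visits four 16 MiB zones spread evenly over
the file, so only 64 MiB of the 1 GiB is actually read or written.
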
Examples using a job file
-------------------------

A sample job file with one read job and one write job, each using its
own block size, would look like this:

[read_file]
rw=read
bs=4096

[write_file]
rw=write
bs=16384

And fio would be invoked as:

$ fio -o1 -s -f file_with_above

The second example would look like this:

[rf1]
rw=read
prio=6

[rf2]
rw=read
prio=3

[rf3]
rw=read
prio=0
direct=1

And fio would be invoked as:

$ fio -o0 -s -b4096 -f file_with_above

'global' is a reserved keyword. When used as the section name (where a
job would give its filename), it sets the default options for the
threads following that section. It is possible to have more than one
global section in the file, as it only affects subsequent jobs.

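The inheritance rule for 'global' sections can be sketched in a few
lines of Python. This is only a reading of the paragraph above, not
fio's own parser. The function name parse_jobs and the sample job names
are hypothetical, and comment handling is left out.

```python
def parse_jobs(text):
    """Return a list of (job_name, options) from job-file text.

    A [global] section updates the defaults used by every job that
    follows it; jobs defined earlier are not affected.
    """
    defaults = {}
    jobs = []
    current = None  # dict being filled: the defaults or one job's options
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            if name == "global":
                current = defaults
            else:
                current = dict(defaults)  # snapshot of the defaults so far
                jobs.append((name, current))
            continue
        if current is None:
            raise ValueError("option outside of a section: " + line)
        key, _, value = line.partition("=")
        # Options without '=', such as 'exitall', are stored as flags.
        current[key.strip()] = value.strip() if value else True
    return jobs


SAMPLE = """
[global]
rw=read
bs=4k

[rf1]
prio=6

[global]
direct=1

[rf2]
prio=3
bs=16k
"""

jobs = parse_jobs(SAMPLE)
```

Here rf1 inherits rw and bs from the first global section only, while
rf2 also picks up direct=1 from the second one and overrides bs with
its own value.
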
Also see the examples/ dir for sample job files.


Interpreting the output
-----------------------

fio spits out a lot of output. While running, fio will display the
status of the jobs created. An example of that would be:

Threads now running: 2 : [ww] [5.73% done]

The characters inside the square brackets denote the current status of
each thread. The possible values (in typical life cycle order) are:

Idle    Run
----    ---
P               Thread setup, but not started.
C               Thread created and running, but not doing anything yet.
        R       Running, doing sequential reads.
        r       Running, doing random reads.
        W       Running, doing sequential writes.
        w       Running, doing random writes.
        V       Running, doing verification of written data.
E               Thread exited, not reaped by main thread yet.
_               Thread reaped.

The other values are fairly self explanatory - number of threads currently
running and doing io, and the estimated completion percentage.

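For scripts that watch a run, the status line can be decoded with a
small Python sketch like the one below. The regular expression is
written against the single sample line shown above, so it is an
assumption that other lines follow exactly the same shape. The names
STATES and parse_status are made up for this example.

```python
import re

# One-character thread states, in the life cycle order listed above.
STATES = {
    "P": "setup, but not started",
    "C": "created, not doing anything yet",
    "R": "sequential reads",
    "r": "random reads",
    "W": "sequential writes",
    "w": "random writes",
    "V": "verifying written data",
    "E": "exited, not reaped yet",
    "_": "reaped",
}

STATUS_RE = re.compile(
    r"Threads now running:\s*(\d+)\s*:\s*\[([^\]]*)\]\s*\[([\d.]+)% done\]"
)


def parse_status(line):
    """Return the running count, per-thread states and percent done."""
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError("not a status line: " + line)
    running, chars, percent = match.groups()
    return {
        "running": int(running),
        "threads": [STATES.get(c, "unknown") for c in chars],
        "percent_done": float(percent),
    }


status = parse_status("Threads now running: 2 : [ww] [5.73% done]")
```

For the sample line this reports two running threads, both doing random
writes, at 5.73% done.
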
When fio is done (or interrupted by ctrl-c), it will show the data for
each thread, group of threads, and disks in that order. For each data
direction, the output looks like:

Client1 (g=0): err= 0:
  write: io= 32MiB, bw= 666KiB/s, runt= 50320msec
    slat (msec): min= 0, max= 136, avg= 0.03, dev= 1.92
    clat (msec): min= 0, max= 631, avg=48.50, dev=86.82
    bw (KiB/s) : min= 0, max= 1196, per=51.00%, avg=664.02, dev=681.68
  cpu : usr=1.49%, sys=0.25%, ctx=7969

The client number is printed, along with the group id and error of that
thread. Below are the io statistics, here for writes. In the order listed,
they denote:

io=     Number of megabytes of io performed
bw=     Average bandwidth rate
runt=   The runtime of that thread
slat=   Submission latency (avg being the average, dev being the
        standard deviation). This is the time it took to submit
        the io. For sync io, the slat is really the completion
        latency, since queue/complete is one operation there.
clat=   Completion latency. Same names as slat, this denotes the
        time from submission to completion of the io pieces. For
        sync io, clat will usually be equal (or very close) to 0,
        as the time from submit to complete is basically just
        CPU time (io has already been done, see slat explanation).
bw=     Bandwidth. Same names as the slat/clat stats, but also includes
        an approximate percentage of total aggregate bandwidth
        this thread received in this group. This last value is
        only really useful if the threads in this group are on the
        same disk, since they are then competing for disk access.
cpu=    CPU usage. User and system time, along with the number
        of context switches this thread went through.

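The per-thread lines all use a 'key= value' layout, so they can be
picked apart with one regular expression. The Python sketch below is
based only on the sample output shown above. The names PAIR_RE and
parse_pairs are made up for this example, and values are kept as
strings with their units.

```python
import re

# A key, an '=', optional padding, then a value that runs up to the
# next comma or whitespace.
PAIR_RE = re.compile(r"(\w+)=\s*([^,\s]+)")


def parse_pairs(line):
    """Return the 'key= value' pairs on one output line as a dict."""
    return dict(PAIR_RE.findall(line))


write = parse_pairs("write: io= 32MiB, bw= 666KiB/s, runt= 50320msec")
clat = parse_pairs("clat (msec): min= 0, max= 631, avg=48.50, dev=86.82")
cpu = parse_pairs("cpu : usr=1.49%, sys=0.25%, ctx=7969")
```

Units stay attached to the values (for example '666KiB/s'), so a caller
has to strip them before doing arithmetic.
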
After each client has been listed, the group statistics are printed. They
will look like this:

Run status group 0 (all jobs):
  READ: io=64MiB, aggrb=22178, minb=11355, maxb=11814, mint=2840msec, maxt=2955msec
  WRITE: io=64MiB, aggrb=1302, minb=666, maxb=669, mint=50093msec, maxt=50320msec

For each data direction, it prints:

io=     Number of megabytes of io performed.
aggrb=  Aggregate bandwidth of threads in this group.
minb=   The minimum average bandwidth a thread saw.
maxb=   The maximum average bandwidth a thread saw.
mint=   The minimum runtime of a thread.
maxt=   The maximum runtime of a thread.

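As a sanity check on the sample group output, the aggrb figures match
the group's total io divided by its longest runtime, expressed in whole
KiB/s. That relationship is inferred from the sample numbers only and
is an assumption about how fio computes the value. The helper name
kib_per_sec is made up for this Python sketch.

```python
def kib_per_sec(io_mib, runtime_msec):
    """Bandwidth in whole KiB/s for io_mib MiB moved in runtime_msec."""
    return io_mib * 1024 * 1000 // runtime_msec


# io= and maxt= taken from the READ and WRITE lines of the sample
read_aggrb = kib_per_sec(64, 2955)
write_aggrb = kib_per_sec(64, 50320)
```

Both results agree with the aggrb values printed in the sample (22178
for reads and 1302 for writes), which also suggests the bandwidth
figures on those lines are in KiB/s.
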
And finally, the disk statistics are printed. They will look like this:

Disk stats (read/write):
  sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%

Values written as x/y are printed for both reads and writes, with reads
first; in_queue and util are single totals. The numbers denote:

ios=        Number of ios performed by all groups.
merge=      Number of merges in the io scheduler.
ticks=      Number of ticks we kept the disk busy.
in_queue=   Total time spent in the disk queue.
util=       The disk utilization. A value of 100% means we kept the disk
            busy constantly, 50% would be a disk idling half of the time.
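
A disk stats line can be split into its read and write halves with a
short Python sketch such as this one. It assumes every line has the
same 'device: key=value, ...' shape as the sample above. The function
name parse_disk_stats is made up for this example.

```python
def parse_disk_stats(line):
    """Split one disk stats line into a device name and its values.

    Values written as 'reads/writes' become (read, write) integer
    tuples; single values are kept as strings.
    """
    device, _, rest = line.partition(":")
    stats = {}
    for field in rest.split(","):
        key, _, value = field.strip().partition("=")
        if "/" in value:
            read, write = value.split("/")
            stats[key] = (int(read), int(write))
        else:
            stats[key] = value
    return device.strip(), stats


device, stats = parse_disk_stats(
    "sda: ios=16398/16511, merge=30/162, ticks=6853/819634, "
    "in_queue=826487, util=100.00%"
)
```

With the sample line, stats["ios"] holds the read and write counts as a
pair, while in_queue and util come back as plain strings.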