fio
---

fio is a tool that will spawn a number of threads or processes doing a
particular type of io action as specified by the user. fio takes a
number of global parameters, each inherited by a thread unless other
parameters overriding that setting are given to it. The typical use of
fio is to write a job file matching the io load one wants to simulate.


Source
------

fio resides in a git repo; the canonical place is:

git://brick.kernel.dk/data/git/fio.git

Snapshots are frequently generated, and they include the git meta data
as well. You can download them here:

http://brick.kernel.dk/snaps/

Pascal Bleser <guru@unixtech.be> has fio RPMs in his repository, you
can find them here:

http://linux01.gwdg.de/~pbleser/rpm-navigation.php?cat=System/fio


Building
--------

Just type 'make' and 'make install'. On FreeBSD, for now you have to
specify the FreeBSD Makefile with -f, eg:

$ make -f Makefile.FreeBSD && make -f Makefile.FreeBSD install

Likewise with OpenSolaris, use Makefile.solaris to compile there.
This might change in the future if I opt for an autoconf type setup.


Options
-------

$ fio
	-s		IO is sequential
	-b		Block size in KiB for each io
	-t <sec>	Runtime in seconds
	-r		For random io, sequence must be repeatable
	-R <on>		If one thread fails to meet rate, quit all
	-o <on>		Use direct IO if 1, buffered if 0
	-l		Generate per-job latency logs
	-w		Generate per-job bandwidth logs
	-f <file>	Read <file> for job descriptions
	-O <file>	Log output to file
	-h		Print help info
	-v		Print version information and exit

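As a quick illustration (the values here are arbitrary, and most
workloads are better expressed in a job file), assuming the switches
behave as listed above, a 60-second repeatable random direct-IO run
with 4 KiB blocks and latency logging might look like:

$ fio -r -b 4 -t 60 -o 1 -l
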
Job file
--------

Only a few options can be controlled with command line parameters;
generally it's a lot easier to just write a simple job file describing
the workload. The job file is in ini format, as that's easy for the
user to read and write.

The job file parameters are:

	name=x		Use 'x' as the identifier for this job.
	directory=x	Use 'x' as the top level directory for storing files
	rw=x		'x' may be: read, randread, write, randwrite,
			rw (read-write mix), randrw (read-write random mix)
	rwmixcycle=x	Base cycle for switching between read and write
			in msecs.
	rwmixread=x	'x' percentage of rw mix ios will be reads. If
			rwmixwrite is also given, the last of the two will
			be used if they don't add up to 100%.
	rwmixwrite=x	'x' percentage of rw mix ios will be writes. See
			rwmixread.
	size=x		Set file size to x bytes (x string can include k/m/g)
	ioengine=x	'x' may be: aio/libaio/linuxaio for Linux aio,
			posixaio for POSIX aio, sync for regular read/write io,
			mmap for mmap'ed io, splice for using splice/vmsplice,
			or sgio for direct SG_IO io. The latter only works on
			Linux on SCSI (or SCSI-like devices, such as
			usb-storage or sata/libata driven) devices.
	iodepth=x	For async io, allow 'x' ios in flight
	overwrite=x	If 'x', lay out a write file first.
	prio=x		Run io at prio X, 0-7 is the kernel allowed range
	prioclass=x	Run io at prio class X
	bs=x		Use 'x' for thread blocksize. May include k/m postfix.
	bsrange=x-y	Mix thread block sizes randomly between x and y. May
			also include k/m postfix.
	direct=x	1 for direct IO, 0 for buffered IO
	thinktime=x	"Think" x usec after each io
	rate=x		Throttle rate to x KiB/sec
	ratemin=x	Quit if rate of x KiB/sec can't be met
	ratecycle=x	ratemin averaged over x msecs
	cpumask=x	Only allow job to run on CPUs defined by mask.
	fsync=x		If writing, fsync after every x blocks have been written
	startdelay=x	Start this thread x seconds after startup
	timeout=x	Terminate x seconds after startup
	offset=x	Start io at offset x (x string can include k/m/g)
	invalidate=x	Invalidate page cache for file prior to doing io
	sync=x		Use sync writes if x and writing
	mem=x		If x == malloc, use malloc for buffers. If x == shm,
			use shm for buffers. If x == mmap, use anon mmap.
	exitall		When one thread quits, terminate the others
	bwavgtime=x	Average bandwidth stats over an x msec window.
	create_serialize=x	If 'x', serialize file creation.
	create_fsync=x	If 'x', run fsync() after file creation.
	end_fsync=x	If 'x', run fsync() after end-of-job.
	loops=x		Run the job 'x' number of times.
	verify=x	If 'x' == md5, use md5 for verifies. If 'x' == crc32,
			use crc32 for verifies. md5 is 'safer', but crc32 is
			a lot faster. Only makes sense for writing to a file.
	stonewall	Wait for preceding jobs to end before running.
	numjobs=x	Create 'x' similar entries for this job
	thread		Use pthreads instead of forked jobs
	zonesize=x
	zoneskip=y	Zone options must be paired. If given, the job
			will skip y bytes for every x read/written. This
			can be used to gauge hard drive speed over the entire
			platter, without reading everything. Both x/y can
			include k/m/g suffix.
	iolog=x		Open and read io pattern from file 'x'. The file must
			contain one io action per line in the following format:
			rw, offset, length
			where rw=0/1 for read/write, and the offset
			and length entries are given in bytes.
	write_iolog=x	Write an iolog to file 'x' in the same format as iolog.
			The iolog options are mutually exclusive; if both are
			given, the read iolog will be performed.
	lockmem=x	Lock down x amount of memory on the machine, to
			simulate a machine with less memory available. x can
			include k/m/g suffix.
	nice=x		Run job at given nice value.
	exec_prerun=x	Run 'x' before job io is begun.
	exec_postrun=x	Run 'x' after job io has finished.
	ioscheduler=x	Use ioscheduler 'x' for this job.

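To make the interplay of some of these options concrete, here is a
sketch of a job file combining the zone, rate, verify, and stonewall
options described above. All sizes, rates, and job names are arbitrary
illustration values:

; ---snip---

[global]
bs=64k
size=1g

[zone-read]
rw=read
zonesize=256m	; read 256m of each zone...
zoneskip=768m	; ...then skip ahead 768m to the next one

[paced-write]
stonewall	; wait for zone-read to finish first
rw=write
rate=8000	; throttle writes to 8000 KiB/sec
verify=crc32	; write verifiable data, checksummed with crc32
end_fsync=1	; fsync the file when the job completes

; ---snip---
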
Examples using a job file
-------------------------

Example 1) Two random readers

Let's say we want to simulate two threads reading randomly from a file
each. They will be doing IO in 4KiB chunks, using raw (O_DIRECT) IO.
Since they share most parameters, we'll put those in the [global]
section. Job 1 will use a 128MiB file, job 2 will use a 256MiB file.

; ---snip---

[global]
ioengine=sync	; regular read/write(2), the default
rw=randread
bs=4k
direct=1

[file1]
size=128m

[file2]
size=256m

; ---snip---

Generally the [] bracketed name specifies a file name, but the "global"
keyword is reserved for setting options that are inherited by each
subsequent job description. It's possible to have several [global]
sections in the job file; each one adds options that are inherited by
the jobs defined below it. The name can also point to a block device,
such as /dev/sda. To run the above job file, simply do:

$ fio jobfile

Example 2) Many random writers

Say we want to exercise the IO subsystem some more. We'll define 64
threads doing random buffered writes. We'll let each thread use async io
with a depth of 4 ios in flight. A job file would then look like this:

; ---snip---

[global]
ioengine=libaio
iodepth=4
rw=randwrite
bs=32k
direct=0
size=64m

[files]
numjobs=64

; ---snip---

This will create files.[0-63] and perform the random writes to them.

There are endless ways to define jobs; the examples/ directory contains
a few more.

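For instance, mixed workloads are defined the same way. Using arbitrary
illustration values, a pair of threads each doing a 75/25 random
read/write mix over a 128MiB file could be described as:

; ---snip---

[global]
ioengine=sync
rw=randrw
rwmixread=75	; 75% of the mixed ios will be reads, the rest writes
bs=4k
size=128m

[mix1]
[mix2]

; ---snip---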

Interpreting the output
-----------------------

fio spits out a lot of output. While running, fio will display the
status of the jobs created. An example of that would be:

Threads now running: 2 : [ww] [5.73% done]

The characters inside the square brackets denote the current status of
each thread. The possible values (in typical life cycle order) are:

Idle	Run
----	---
P		Thread setup, but not started.
C		Thread created.
I		Thread initialized, waiting.
	R	Running, doing sequential reads.
	r	Running, doing random reads.
	W	Running, doing sequential writes.
	w	Running, doing random writes.
	M	Running, doing mixed sequential reads/writes.
	m	Running, doing mixed random reads/writes.
	F	Running, currently waiting for fsync().
V		Running, doing verification of written data.
E		Thread exited, not reaped by main thread yet.
_		Thread reaped.

The other values are fairly self explanatory - the number of threads
currently running and doing io, and the estimated completion percentage
and time.

When fio is done (or interrupted by ctrl-c), it will show the data for
each thread, group of threads, and disks in that order. For each data
direction, the output looks like:

Client1 (g=0): err= 0:
  write: io=    32MiB, bw=   666KiB/s, runt= 50320msec
    slat (msec): min=    0, max=  136, avg= 0.03, dev= 1.92
    clat (msec): min=    0, max=  631, avg=48.50, dev=86.82
    bw (KiB/s) : min=    0, max= 1196, per=51.00%, avg=664.02, dev=681.68
  cpu        : usr=1.49%, sys=0.25%, ctx=7969

The client number is printed, along with the group id and error of that
thread. Below are the io statistics, here for writes. In the order listed,
they denote:

io=	Number of megabytes of io performed
bw=	Average bandwidth rate
runt=	The runtime of that thread
	slat=	Submission latency (avg being the average, dev being the
		standard deviation). This is the time it took to submit
		the io. For sync io, the slat is really the completion
		latency, since queue/complete is one operation there.
	clat=	Completion latency. Same names as slat; this denotes the
		time from submission to completion of the io pieces. For
		sync io, clat will usually be equal (or very close) to 0,
		as the time from submit to complete is basically just
		CPU time (the io has already been done, see the slat
		explanation).
	bw=	Bandwidth. Same names as the xlat stats, but also includes
		an approximate percentage of the total aggregate bandwidth
		this thread received in this group. In the example above,
		per=51.00% is roughly this thread's average bandwidth
		(664.02 KiB/s) out of the group WRITE aggregate (1302
		KiB/s). This last value is only really useful if the
		threads in this group are on the same disk, since they are
		then competing for disk access.
cpu=	CPU usage. User and system time, along with the number
	of context switches this thread went through.

After each client has been listed, the group statistics are printed. They
will look like this:

Run status group 0 (all jobs):
   READ: io=64MiB, aggrb=22178, minb=11355, maxb=11814, mint=2840msec, maxt=2955msec
  WRITE: io=64MiB, aggrb=1302, minb=666, maxb=669, mint=50093msec, maxt=50320msec

For each data direction, it prints:

io=	Number of megabytes of io performed.
aggrb=	Aggregate bandwidth of the threads in this group.
minb=	The minimum average bandwidth a thread saw.
maxb=	The maximum average bandwidth a thread saw.
mint=	The smallest runtime of the threads in that group.
maxt=	The longest runtime of the threads in that group.

And finally, the disk statistics are printed. They will look like this:

Disk stats (read/write):
  sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%

Each value is printed for both reads and writes, with reads first. The
numbers denote:

ios=	Number of ios performed by all groups.
merge=	Number of merges performed by the io scheduler.
ticks=	Number of ticks we kept the disk busy.
in_queue=	Total time spent in the disk queue.
util=	The disk utilization. A value of 100% means we kept the disk
	busy constantly, 50% would be a disk idling half of the time.


Author
------

Fio was written by Jens Axboe <axboe@suse.de> to enable flexible testing
of the Linux IO subsystem and schedulers. He got tired of writing
specific test applications to simulate a given workload, and found that
the existing io benchmark/test tools out there weren't flexible enough
to do what he wanted.

Jens Axboe <axboe@suse.de> 20060609