perf-bench(1)
=============

NAME
----
perf-bench - General framework for benchmark suites

SYNOPSIS
--------
[verse]
'perf bench' [<common options>] <subsystem> <suite> [<options>]

DESCRIPTION
-----------
This 'perf bench' command is a general framework for benchmark suites.

COMMON OPTIONS
--------------
-r::
--repeat=::
Specify number of times to repeat the run (default 10).

-f::
--format=::
Specify format style.
Currently available format styles are:

'default'::
Default style. This is mainly for human reading.
---------------------
% perf bench sched pipe                      # with no style specified
(executing 1000000 pipe operations between two tasks)
        Total time:5.855 sec
                5.855061 usecs/op
                170792 ops/sec
---------------------

'simple'::
This simple style is friendly for automated
processing by scripts.
---------------------
% perf bench --format=simple sched pipe      # specified simple
5.988
---------------------
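
As a rough illustration of script-friendly processing, the sketch below checks a measured total time against a time budget. The function name `within_budget` and the 6.0 second budget are hypothetical, and the sketch assumes the 'simple' style prints only the total time in seconds, as in the sample above.

```shell
# Succeed if the measured total time (seconds) does not exceed the budget.
# awk performs the floating-point comparison that sh arithmetic cannot.
within_budget() {
    awk -v measured="$1" -v budget="$2" \
        'BEGIN { exit !(measured + 0 <= budget + 0) }'
}

# Possible use (requires perf, so it is left commented out here):
#   total=$(perf bench --format=simple sched pipe)
#   within_budget "$total" 6.0 || echo "sched pipe slower than budget: $total" >&2
```

The comparison is delegated to awk because POSIX shell arithmetic is integer-only.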

SUBSYSTEM
---------

'sched'::
	Scheduler and IPC mechanisms.

'syscall'::
	System call performance (throughput).

'mem'::
	Memory access performance.

'numa'::
	NUMA scheduling and MM benchmarks.

'futex'::
	Futex stressing benchmarks.

'epoll'::
	Eventpoll (epoll) stressing benchmarks.

'internals'::
	Benchmark internal perf functionality.

'uprobe'::
	Benchmark overhead of uprobe + BPF.

'all'::
	All benchmark subsystems.

SUITES FOR 'sched'
~~~~~~~~~~~~~~~~~~
*messaging*::
Suite for evaluating performance of scheduler and IPC mechanisms.
Based on hackbench by Rusty Russell.

Options of *messaging*
^^^^^^^^^^^^^^^^^^^^^^
-p::
--pipe::
Use pipe() instead of socketpair().

-t::
--thread::
Use multiple threads instead of multiple processes.

-g::
--group=::
Specify number of groups.

-l::
--nr_loops=::
Specify number of loops.

Example of *messaging*
^^^^^^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched messaging                 # run with default options
(20 sender and receiver processes per group)
(10 groups == 400 processes run)

      Total time:0.308 sec

% perf bench sched messaging -t -g 20        # be multi-thread, with 20 groups
(20 sender and receiver threads per group)
(20 groups == 800 threads run)

      Total time:0.582 sec
---------------------
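
The task counts in the output above follow from the group count: each group has 20 senders and 20 receivers, so 10 groups give 400 tasks and 20 groups give 800. A minimal sketch of that arithmetic, where the function name `messaging_tasks` is hypothetical and the per-group figure of 20 is taken from the sample output:

```shell
# Number of tasks started for a given number of groups,
# assuming 20 senders plus 20 receivers per group as in the sample output.
messaging_tasks() {
    nr_groups=$1
    per_side=20
    echo $((nr_groups * per_side * 2))
}

messaging_tasks 10   # prints 400
messaging_tasks 20   # prints 800
```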

*pipe*::
Suite for pipe() system call.
Based on pipe-test-1m.c by Ingo Molnar.

Options of *pipe*
^^^^^^^^^^^^^^^^^
-l::
--loop=::
Specify number of loops.

-G::
--cgroups=::
Names of cgroups for sender and receiver, separated by a comma.
This is useful to check cgroup context switching overhead.
Note that perf neither creates nor deletes the cgroups, so users should
make sure that the cgroups exist and are accessible before use.
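
Because the cgroups must already exist, a script may want to verify them before starting the benchmark. The sketch below is one possible pre-flight check. The function name `cgroups_exist` is hypothetical, and the default root `/sys/fs/cgroup` assumes a cgroup v2 unified hierarchy mounted at its usual location, which may not match every system. It checks existence only, not access permissions.

```shell
# Check that every cgroup named in a comma-separated list is a directory
# under the given root (default: /sys/fs/cgroup, an assumption).
# The body runs in a subshell so the IFS change does not leak to the caller.
cgroups_exist() (
    root=${2:-/sys/fs/cgroup}
    IFS=,
    for name in $1; do
        [ -d "$root/$name" ] || { echo "missing cgroup: $name" >&2; exit 1; }
    done
)

# Possible use (requires perf and existing cgroups, so it is commented out):
#   cgroups_exist AAA,BBB && perf bench sched pipe -G AAA,BBB
```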


Example of *pipe*
^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched pipe
(executing 1000000 pipe operations between two tasks)

        Total time:8.091 sec
                8.091833 usecs/op
                123581 ops/sec

% perf bench sched pipe -l 1000              # loop 1000
(executing 1000 pipe operations between two tasks)

        Total time:0.016 sec
                16.948000 usecs/op
                59004 ops/sec

% perf bench sched pipe -G AAA,BBB
(executing 1000000 pipe operations between cgroups)
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes

     Total time: 6.886 [sec]

       6.886208 usecs/op
         145217 ops/sec

---------------------
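
The three figures in each report are related: usecs/op is the total time divided by the number of operations, and ops/sec is its inverse. The sketch below reproduces the first sample from its total time and loop count. The function name `pipe_metrics` is hypothetical, and truncating ops/sec to an integer is an assumption chosen to match the sample output.

```shell
# Derive per-operation metrics from a total time (seconds) and an
# operation count. ops/sec is truncated, which matches the samples above;
# whether perf itself truncates is an assumption.
pipe_metrics() {
    awk -v total="$1" -v ops="$2" 'BEGIN {
        printf "%.6f usecs/op\n", total * 1000000 / ops
        printf "%d ops/sec\n", int(ops / total)
    }'
}

pipe_metrics 8.091833 1000000
# prints:
#   8.091833 usecs/op
#   123581 ops/sec
```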

SUITES FOR 'syscall'
~~~~~~~~~~~~~~~~~~~~
*basic*::
Suite for evaluating the throughput of core system calls (reported as both
usecs/op and ops/sec).
It uses a single thread that repeatedly calls getppid(2), a simple system call
whose result is not cached by glibc.


SUITES FOR 'mem'
~~~~~~~~~~~~~~~~
*memcpy*::
Suite for evaluating performance of simple memory copy in various ways.

Options of *memcpy*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to copy (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to copy (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-movsq and x86-64-movsb are supported.

-l::
--nr_loops::
Repeat memcpy invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

*memset*::
Suite for evaluating performance of simple memory set in various ways.

Options of *memset*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to set (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-stosq and x86-64-stosb are supported.

-l::
--nr_loops::
Repeat memset invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.
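
To make the size units accepted by `--size` concrete, the sketch below converts a size string to bytes. The function name `size_to_bytes` is hypothetical, and treating the units as binary multiples (1 KB = 1024 B) is an assumption, not something this page states.

```shell
# Convert a size string such as "1MB" or "512kb" to a byte count.
# Assumption: units are binary multiples (1 KB = 1024 B).
size_to_bytes() {
    input=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    number=${input%%[A-Z]*}        # leading digits
    unit=${input#"$number"}        # remaining unit suffix
    case "$number" in
        ''|*[!0-9]*) echo "invalid size: $1" >&2; return 1 ;;
    esac
    case "$unit" in
        B)  factor=1 ;;
        KB) factor=1024 ;;
        MB) factor=$((1024 * 1024)) ;;
        GB) factor=$((1024 * 1024 * 1024)) ;;
        TB) factor=$((1024 * 1024 * 1024 * 1024)) ;;
        *)  echo "unknown unit: $unit" >&2; return 1 ;;
    esac
    echo $((number * factor))
}

size_to_bytes 1MB     # prints 1048576
size_to_bytes 512kb   # prints 524288
```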

SUITES FOR 'numa'
~~~~~~~~~~~~~~~~~
*mem*::
Suite for evaluating NUMA workloads.

SUITES FOR 'futex'
~~~~~~~~~~~~~~~~~~
*hash*::
Suite for evaluating hash tables.

*wake*::
Suite for evaluating wake calls.

*wake-parallel*::
Suite for evaluating parallel wake calls.

*requeue*::
Suite for evaluating requeue calls.

*lock-pi*::
Suite for evaluating futex lock_pi calls.

SUITES FOR 'epoll'
~~~~~~~~~~~~~~~~~~
*wait*::
Suite for evaluating concurrent epoll_wait calls.

*ctl*::
Suite for evaluating multiple epoll_ctl calls.

SUITES FOR 'internals'
~~~~~~~~~~~~~~~~~~~~~~
*synthesize*::
Suite for evaluating perf's event synthesis performance.

SEE ALSO
--------
linkperf:perf[1]