==========================
Kprobe-based Event Tracing
==========================

:Author: Masami Hiramatsu

Overview
--------
These events are similar to tracepoint-based events, but they are based on
kprobes (kprobe and kretprobe) instead of tracepoints. So they can probe
wherever kprobes can probe (that is, all functions except those with the
__kprobes/nokprobe_inline annotation and those marked NOKPROBE_SYMBOL).
Unlike tracepoint-based events, they can be added and removed dynamically,
on the fly.

To enable this feature, build your kernel with CONFIG_KPROBE_EVENTS=y.

Similar to the events tracer, this doesn't need to be activated via
current_tracer. Instead, add probe points via
/sys/kernel/debug/tracing/kprobe_events, and enable them via
/sys/kernel/debug/tracing/events/kprobes/<EVENT>/enable.

You can also use /sys/kernel/debug/tracing/dynamic_events instead of
kprobe_events. That interface provides unified access to other
dynamic events too.

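The paths above assume that debugfs is mounted at /sys/kernel/debug. As a
minimal sketch, the helper below returns the first directory that contains a
kprobe_events file. The helper name find_kprobe_events_dir is hypothetical,
and listing /sys/kernel/tracing as a second candidate is an assumption about
where tracefs may be mounted on your system.

```shell
# Hypothetical helper: print the first directory (from the arguments)
# that contains a kprobe_events file; return 1 if none does.
find_kprobe_events_dir() {
    for dir in "$@"; do
        if [ -e "$dir/kprobe_events" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Possible usage (the second candidate path is an assumption):
#   find_kprobe_events_dir /sys/kernel/debug/tracing /sys/kernel/tracing
```

If the helper prints nothing, the kernel may have been built without
CONFIG_KPROBE_EVENTS=y, or the filesystem may not be mounted.
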
Synopsis of kprobe_events
-------------------------
::

  p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS]  : Set a probe
  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]  : Set a return probe
  -:[GRP/]EVENT                                         : Clear a probe

 GRP            : Group name. If omitted, use "kprobes" for it.
 EVENT          : Event name. If omitted, the event name is generated
                  based on SYM+offs or MEMADDR.
 MOD            : Module name which has given SYM.
 SYM[+offs]     : Symbol+offset where the probe is inserted.
 MEMADDR        : Address where the probe is inserted.
 MAXACTIVE      : Maximum number of instances of the specified function that
                  can be probed simultaneously, or 0 for the default value
                  as defined in Documentation/kprobes.txt section 1.3.1.

 FETCHARGS      : Arguments. Each probe can have up to 128 args.
  %REG          : Fetch register REG
  @ADDR         : Fetch memory at ADDR (ADDR should be in kernel)
  @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
  $stackN       : Fetch Nth entry of stack (N >= 0)
  $stack        : Fetch stack address.
  $argN         : Fetch the Nth function argument. (N >= 1) (\*1)
  $retval       : Fetch return value.(\*2)
  $comm         : Fetch current task comm.
  +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
  \IMM          : Store an immediate value to the argument.
  NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
  FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
                  (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
                  (x8/x16/x32/x64), "string", "ustring" and bitfield
                  are supported.

  (\*1) only for the probe on function entry (offs == 0).
  (\*2) only for return probe.
  (\*3) this is useful for fetching a field of data structures.
  (\*4) "u" means user-space dereference. See :ref:`user_mem_access`.

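To make the synopsis concrete, here is a minimal sketch that assembles a
definition line from its parts. The helper name make_probe_def is
hypothetical; it only builds the string and does not check it against the
grammar above or write it anywhere.

```shell
# Hypothetical helper: build a kprobe_events definition line.
# $1 = probe kind (p or r), $2 = [GRP/]EVENT, $3 = probe location,
# remaining arguments = FETCHARGS.
make_probe_def() {
    kind=$1
    event=$2
    location=$3
    shift 3
    def="$kind:$event $location"
    # Append each fetch argument, separated by a single space.
    for arg in "$@"; do
        def="$def $arg"
    done
    printf '%s\n' "$def"
}

# Possible usage:
#   make_probe_def r myretprobe do_sys_open '$retval'
```

Fetch arguments such as $retval or +4($stack) need single quotes so that the
shell does not expand them.
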
Types
-----
Several types are supported for fetch-args. Kprobe tracer will access memory
by given type. Prefix 's' and 'u' means those types are signed and unsigned
respectively. 'x' prefix implies it is unsigned. Traced arguments are shown
in decimal ('s' and 'u') or hexadecimal ('x'). Without type casting, 'x32'
or 'x64' is used depending on the architecture (e.g. x86-32 uses x32, and
x86-64 uses x64).

These value types can be an array. To record array data, you can add '[N]'
(where N is a fixed number, less than 64) to the base type.
E.g. 'x16[4]' means an array of x16 (2-byte hex) with 4 elements.
Note that the array can be applied only to memory type fetchargs; you cannot
apply it to registers/stack-entries etc. (for example, '$stack1:x8[8]' is
wrong, but '+8($stack):x8[8]' is OK.)

String type is a special type, which fetches a "null-terminated" string from
kernel space. This means it will fail and store NULL if the string container
has been paged out. "ustring" type is an alternative of string for user-space.
See :ref:`user_mem_access` for more info.

The string array type is a bit different from other types. For other base
types, <base-type>[1] is equal to <base-type> (e.g. +0(%di):x32[1] is the same
as +0(%di):x32.) But string[1] is not equal to string. The string type itself
represents "char array", but string array type represents "char * array".
So, for example, +0(%di):string[1] is equal to +0(+0(%di)):string.

Bitfield is another special type, which takes 3 parameters: bit-width,
bit-offset, and container-size (usually 32). The syntax is::

 b<bit-width>@<bit-offset>/<container-size>

Symbol type ('symbol') is an alias of u32 or u64 type (depending on
BITS_PER_LONG) which shows the given pointer in "symbol+offset" style.

For $comm, the default type is "string"; any other type is invalid.

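The array and bitfield notations can be generated mechanically. The sketch
below uses two hypothetical helpers, array_type and bitfield_type. The upper
bound follows the "less than 64" rule stated above; the lower bound of 1 is
an assumption made for this example.

```shell
# Hypothetical helper: format <base-type>[N], rejecting lengths outside
# 1..63 (the lower bound is an assumption, the upper bound is from the text).
array_type() {
    base=$1
    count=$2
    if [ "$count" -ge 1 ] && [ "$count" -lt 64 ]; then
        printf '%s[%s]\n' "$base" "$count"
    else
        echo "array length must be between 1 and 63" >&2
        return 1
    fi
}

# Hypothetical helper: format b<bit-width>@<bit-offset>/<container-size>.
bitfield_type() {
    printf 'b%s@%s/%s\n' "$1" "$2" "$3"
}
```

For example, array_type x16 4 prints x16[4], and bitfield_type 4 8 32 prints
b4@8/32, either of which can follow the ':' of a memory fetcharg.
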
.. _user_mem_access:

User Memory Access
------------------
Kprobe events support user-space memory access. For that purpose, you can use
either user-space dereference syntax or 'ustring' type.

The user-space dereference syntax allows you to access a field of a data
structure in user-space. This is done by adding the "u" prefix to the
dereference syntax. For example, +u4(%si) means it will read memory from the
address in the register %si offset by 4, and the memory is expected to be in
user-space. You can use this for strings too, e.g. +u0(%si):string will read
a string from the address in the register %si that is expected to be in
user-space. 'ustring' is a shortcut way of performing the same task. That is,
+0(%si):ustring is equivalent to +u0(%si):string.

Note that kprobe-event provides the user-memory access syntax but it doesn't
use it transparently. This means if you use normal dereference or string type
for user memory, it might fail, and may always fail on some archs. The user
has to carefully check if the target data is in kernel or user space.

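The equivalence between 'ustring' and the "u" dereference prefix can be
illustrated with a small text rewrite. This is only a sketch: the helper name
ustring_to_uderef is hypothetical, and it handles only a bare fetcharg of the
form +OFFS(...):ustring or -OFFS(...):ustring with a decimal offset.

```shell
# Hypothetical helper: rewrite "+OFFS(FETCHARG):ustring" into the
# equivalent "+uOFFS(FETCHARG):string" form. Any other input is printed
# unchanged.
ustring_to_uderef() {
    printf '%s\n' "$1" |
        sed 's/^\([+-]\)\([0-9][0-9]*(.*)\):ustring$/\1u\2:string/'
}

# Possible usage:
#   ustring_to_uderef '+0(%si):ustring'
```

Both spellings ask for the same user-space string read, so which one to use
is a matter of readability.
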
Per-Probe Event Filtering
-------------------------
The per-probe event filtering feature allows you to set a different filter on
each probe and lets you see which arguments will be shown in the trace buffer.
If an event name is specified right after 'p:' or 'r:' in kprobe_events, it
adds an event under tracing/events/kprobes/<EVENT>. In that directory you can
see 'id', 'enable', 'format', 'filter' and 'trigger'.

enable:
  You can enable/disable the probe by writing 1 or 0 to it.

format:
  This shows the format of this probe event.

filter:
  You can write filtering rules for this event.

id:
  This shows the id of this probe event.

trigger:
  This allows you to install trigger commands which are executed when the
  event is hit (for details, see Documentation/trace/events.rst, section 6).

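As a minimal sketch of using the 'filter' file, the helper below writes a
filter expression for one event. The helper name set_kprobe_filter is
hypothetical, and the expression in the usage comment assumes that the event
was defined with an argument named dfd; see Documentation/trace/events.rst
for the filter syntax itself.

```shell
# Hypothetical helper: write a filter expression for one kprobe event.
# $1 = tracing directory, $2 = event name, $3 = filter expression.
set_kprobe_filter() {
    printf '%s\n' "$3" > "$1/events/kprobes/$2/filter"
}

# Possible usage (assumes an event "myprobe" with an argument named dfd):
#   set_kprobe_filter /sys/kernel/debug/tracing myprobe 'dfd == 3'
```

Passing the tracing directory as a parameter keeps the helper usable whether
the files live under /sys/kernel/debug/tracing or elsewhere.
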
Event Profiling
---------------
You can check the total number of probe hits and probe miss-hits via
/sys/kernel/debug/tracing/kprobe_profile.
The first column is the event name, the second is the number of probe hits,
and the third is the number of probe miss-hits.

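Given that three-column layout, a short awk filter can list the events that
have missed hits. This is a sketch: the helper name events_with_misses is
hypothetical, and it assumes the columns are separated by whitespace.

```shell
# Hypothetical helper: read kprobe_profile-style lines on stdin and print
# the names of events whose third column (miss-hits) is greater than zero.
events_with_misses() {
    awk '$3 > 0 { print $1 }'
}

# Possible usage:
#   events_with_misses < /sys/kernel/debug/tracing/kprobe_profile
```
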
Kernel Boot Parameter
---------------------
You can add and enable new kprobe events when booting up the kernel with the
"kprobe_event=" parameter. The parameter accepts semicolon-delimited
kprobe events, whose format is similar to that of kprobe_events.
The difference is that the probe definition parameters are comma-delimited
instead of space-delimited. For example, adding the myprobe event on
do_sys_open like below::

  p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)

should be written as below for the kernel boot parameter (just replace spaces
with commas)::

  p:myprobe,do_sys_open,dfd=%ax,filename=%dx,flags=%cx,mode=+4($stack)

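The conversion described above is mechanical, so it can be sketched as a small
shell function. The helper name to_boot_param is hypothetical; it takes one
kprobe_events definition per argument, replaces spaces with commas, and joins
multiple definitions with semicolons.

```shell
# Hypothetical helper: turn kprobe_events definitions (one per argument)
# into a "kprobe_event=" boot parameter.
to_boot_param() {
    param=
    for def in "$@"; do
        # Replace the spaces inside each definition with commas.
        def=$(printf '%s' "$def" | tr ' ' ',')
        # Join multiple definitions with semicolons.
        param="${param:+$param;}$def"
    done
    printf 'kprobe_event=%s\n' "$param"
}

# Possible usage:
#   to_boot_param 'p:myprobe do_sys_open dfd=%ax' 'r:myretprobe do_sys_open $retval'
```

Depending on your boot loader, characters such as ';' and '$' may need
quoting when the result is placed on the kernel command line.
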
Usage examples
--------------
To add a probe as a new event, write a new definition to kprobe_events
as below::

  echo 'p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)' > /sys/kernel/debug/tracing/kprobe_events

This sets a kprobe at the entry of the do_sys_open() function and records its
1st to 4th arguments as the "myprobe" event. Note that which register or stack
entry is assigned to each function argument depends on the arch-specific ABI.
If you are unsure of the ABI, try the probe subcommand of perf-tools (you can
find it under tools/perf/).
As this example shows, users can choose more familiar names for each argument.
::

  echo 'r:myretprobe do_sys_open $retval' >> /sys/kernel/debug/tracing/kprobe_events

This sets a kretprobe on the return point of the do_sys_open() function and
records the return value as the "myretprobe" event.
You can see the format of these events via
/sys/kernel/debug/tracing/events/kprobes/<EVENT>/format.
::

  cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format
  name: myprobe
  ID: 780
  format:
          field:unsigned short common_type; offset:0; size:2; signed:0;
          field:unsigned char common_flags; offset:2; size:1; signed:0;
          field:unsigned char common_preempt_count; offset:3; size:1;signed:0;
          field:int common_pid; offset:4; size:4; signed:1;

          field:unsigned long __probe_ip; offset:12; size:4; signed:0;
          field:int __probe_nargs; offset:16; size:4; signed:1;
          field:unsigned long dfd; offset:20; size:4; signed:0;
          field:unsigned long filename; offset:24; size:4; signed:0;
          field:unsigned long flags; offset:28; size:4; signed:0;
          field:unsigned long mode; offset:32; size:4; signed:0;

  print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->__probe_ip,
  REC->dfd, REC->filename, REC->flags, REC->mode

You can see that the event has 4 arguments as in the expressions you specified.
::

  echo > /sys/kernel/debug/tracing/kprobe_events

This clears all probe points.

Or,
::

  echo -:myprobe >> kprobe_events

This clears probe points selectively.

Right after definition, each event is disabled by default. To trace these
events, you need to enable them.
::

  echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
  echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable

Use the following commands to trace only during an interval.
::

    # echo 1 > tracing_on
    Open something...
    # echo 0 > tracing_on

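The on/off pair above can be wrapped around a single command so that tracing
is switched off again even if the command fails. This is a minimal sketch; the
helper name trace_during is hypothetical, and the tracing directory is passed
in rather than hard-coded.

```shell
# Hypothetical helper: turn tracing on, run a command, turn tracing off.
# $1 = tracing directory, remaining arguments = command to run.
trace_during() {
    tracing_dir=$1
    shift
    echo 1 > "$tracing_dir/tracing_on"
    "$@"
    status=$?
    echo 0 > "$tracing_dir/tracing_on"
    # Report the exit status of the traced command, not of the echo.
    return "$status"
}

# Possible usage:
#   trace_during /sys/kernel/debug/tracing cat /etc/hostname
```
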
And you can see the traced information via /sys/kernel/debug/tracing/trace.
::

  cat /sys/kernel/debug/tracing/trace
  # tracer: nop
  #
  #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
  #              | |       |          |         |
             <...>-1447  [001] 1038282.286875: myprobe: (do_sys_open+0x0/0xd6) dfd=3 filename=7fffd1ec4440 flags=8000 mode=0
             <...>-1447  [001] 1038282.286878: myretprobe: (sys_openat+0xc/0xe <- do_sys_open) $retval=fffffffffffffffe
             <...>-1447  [001] 1038282.286885: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=40413c flags=8000 mode=1b6
             <...>-1447  [001] 1038282.286915: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3
             <...>-1447  [001] 1038282.286969: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=4041c6 flags=98800 mode=10
             <...>-1447  [001] 1038282.286976: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3

Each line shows when the kernel hits an event, and <- SYMBOL means the kernel
returns from SYMBOL (e.g. "sys_open+0x1b/0x1d <- do_sys_open" means the kernel
returns from do_sys_open to sys_open+0x1b).

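Trace lines in this layout are easy to post-process. As a sketch, the filter
below extracts the return value from every "myretprobe" line. The helper name
myretprobe_retvals is hypothetical, and it assumes the argument keeps its
default name $retval and is printed in hexadecimal, as in the output above.

```shell
# Hypothetical helper: read trace output on stdin and print the $retval
# value of every "myretprobe" line; all other lines are ignored.
myretprobe_retvals() {
    sed -n 's/.*myretprobe:.*\$retval=\([0-9a-f]*\).*/\1/p'
}

# Possible usage:
#   myretprobe_retvals < /sys/kernel/debug/tracing/trace
```
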