Commit | Line | Data |
---|---|---|
fcdeddc9 CD |
1 | ====================== |
2 | Function Tracer Design | |
3 | ====================== | |
4 | ||
5 | :Author: Mike Frysinger | |
6 | ||
7 | .. caution:: | |
8 | This document is out of date. Some of the descriptions below no longer | |
9 | match the current implementation. | |
555f386c MF |
10 | |
11 | Introduction | |
12 | ------------ | |
13 | ||
14 | Here we will cover the architecture pieces that the common function tracing | |
15 | code relies on for proper functioning. Things are ordered by increasing | |
16 | complexity so that you can start simple and at least get basic functionality. | |
17 | ||
18 | Note that this focuses on architecture implementation details only. If you | |
19 | want more explanation of a feature in terms of common code, review the common | |
20 | ftrace.txt file. | |
21 | ||
9849ed4d MF |
22 | Ideally, everyone who wishes to retain performance while supporting tracing in |
23 | their kernel should make it all the way to dynamic ftrace support. | |
24 | ||
555f386c MF |
25 | |
26 | Prerequisites | |
27 | ------------- | |
28 | ||
29 | Ftrace relies on these features being implemented: | |
fcdeddc9 CD |
30 | - STACKTRACE_SUPPORT - implement save_stack_trace() |
31 | - TRACE_IRQFLAGS_SUPPORT - implement include/asm/irqflags.h | |
555f386c MF |
32 | |
33 | ||
34 | HAVE_FUNCTION_TRACER | |
35 | -------------------- | |
36 | ||
37 | You will need to implement the mcount and the ftrace_stub functions. | |
38 | ||
39 | The exact mcount symbol name will depend on your toolchain. Some call it | |
40 | "mcount", "_mcount", or even "__mcount". You can probably figure it out by | |
fcdeddc9 CD |
41 | running something like:: |
42 | ||
555f386c MF |
43 | $ echo 'main(){}' | gcc -x c -S -o - - -pg | grep mcount |
44 | call mcount | |
fcdeddc9 | 45 | |
555f386c MF |
46 | We'll make the assumption below that the symbol is "mcount" just to keep things |
47 | nice and simple in the examples. | |
48 | ||
49 | Keep in mind that the ABI that is in effect inside of the mcount function is | |
50 | *highly* architecture/toolchain specific. We cannot help you in this regard, | |
51 | sorry. Dig up some old documentation and/or find someone more familiar than | |
52 | you to bounce ideas off. Typically, register usage (argument/scratch/etc.) | |
53 | is a major issue at this point, especially in relation to the location of the | |
54 | mcount call (before/after function prologue). You might also want to look at | |
55 | how glibc has implemented the mcount function for your architecture. It might | |
56 | be (semi-)relevant. | |
57 | ||
58 | The mcount function should check the function pointer ftrace_trace_function | |
59 | to see if it is set to ftrace_stub. If it is, there is nothing for you to do, | |
60 | so return immediately. If it isn't, then call that function in the same way | |
61 | the mcount function normally calls __mcount_internal -- the first argument is | |
62 | the "frompc" while the second argument is the "selfpc" (adjusted to remove the | |
63 | size of the mcount call that is embedded in the function). | |
64 | ||
65 | For example, if the function foo() calls bar(), when the bar() function calls | |
66 | mcount(), the arguments mcount() will pass to the tracer are: | |
fcdeddc9 CD |
67 | |
68 | - "frompc" - the address bar() will use to return to foo() | |
69 | - "selfpc" - the address of bar() (with mcount() size adjustment) | |
555f386c MF |
70 | |
71 | Also keep in mind that this mcount function will be called *a lot*, so | |
72 | optimizing for the default case of no tracer will help the smooth running of | |
73 | your system when tracing is disabled. So the start of the mcount function is | |
7e25f44c RD |
74 | typically the bare minimum of checks before returning. That also |
75 | means the code flow should usually be kept linear (i.e. no branching in the nop | |
76 | case). This is of course an optimization and not a hard requirement. | |
555f386c MF |
77 | |
78 | Here is some pseudo code that should help (these functions should actually be | |
fcdeddc9 | 79 | implemented in assembly):: |
555f386c | 80 | |
fcdeddc9 CD |
81 | void ftrace_stub(void) |
82 | { | |
83 | return; | |
84 | } | |
555f386c | 85 | |
fcdeddc9 CD |
86 | void mcount(void) |
87 | { | |
88 | /* save any bare state needed in order to do initial checking */ | |
555f386c | 89 | |
fcdeddc9 CD |
90 | extern void (*ftrace_trace_function)(unsigned long, unsigned long); |
91 | if (ftrace_trace_function != ftrace_stub) | |
92 | goto do_trace; | |
555f386c | 93 | |
fcdeddc9 | 94 | /* restore any bare state */ |
555f386c | 95 | |
fcdeddc9 | 96 | return; |
555f386c | 97 | |
fcdeddc9 | 98 | do_trace: |
555f386c | 99 | |
fcdeddc9 | 100 | /* save all state needed by the ABI (see paragraph above) */ |
555f386c | 101 | |
fcdeddc9 CD |
102 | unsigned long frompc = ...; |
103 | unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE; | |
104 | ftrace_trace_function(frompc, selfpc); | |
555f386c | 105 | |
fcdeddc9 CD |
106 | /* restore all state needed by the ABI */ |
107 | } | |
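The check-then-dispatch logic above can be modelled in plain user-space C. This is only a sketch under stated assumptions: real implementations are written in assembly and recover both addresses from registers or the stack, whereas here they are passed in as ordinary arguments. The names mcount_model and demo_tracer and the MCOUNT_INSN_SIZE value of 4 are made up for illustration.

```c
#include <assert.h>

/* Hypothetical user-space model; the real hook lives in assembly. */
typedef void (*trace_fn_t)(unsigned long frompc, unsigned long selfpc);

/* Assumed size of the call instruction; the real value is arch specific. */
#define MCOUNT_INSN_SIZE 4UL

static unsigned long last_frompc, last_selfpc;
static int trace_calls;

/* Default target: does nothing, which marks "no tracer installed". */
static void ftrace_stub(unsigned long frompc, unsigned long selfpc)
{
	(void)frompc;
	(void)selfpc;
}

static trace_fn_t ftrace_trace_function = ftrace_stub;

/* Illustrative tracer that just records its arguments. */
static void demo_tracer(unsigned long frompc, unsigned long selfpc)
{
	last_frompc = frompc;
	last_selfpc = selfpc;
	trace_calls++;
}

/*
 * Model of mcount(): return at once when no tracer is set, otherwise
 * pass frompc and the size-adjusted selfpc to the tracer.
 */
static void mcount_model(unsigned long frompc, unsigned long return_address)
{
	if (ftrace_trace_function == ftrace_stub)
		return;

	ftrace_trace_function(frompc, return_address - MCOUNT_INSN_SIZE);
}
```

With ftrace_trace_function left at ftrace_stub, mcount_model() does nothing. Once it points at demo_tracer, each call records the caller's address and the adjusted address of the mcount call site.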
555f386c MF |
108 | |
109 | Don't forget to export mcount for modules! | |
fcdeddc9 CD |
110 | :: |
111 | ||
112 | extern void mcount(void); | |
113 | EXPORT_SYMBOL(mcount); | |
555f386c MF |
114 | |
115 | ||
555f386c MF |
116 | HAVE_FUNCTION_GRAPH_TRACER |
117 | -------------------------- | |
118 | ||
119 | Deep breath ... time to do some real work. Here you will need to update the | |
120 | mcount function to check ftrace graph function pointers, as well as implement | |
121 | some functions to save (hijack) and restore the return address. | |
122 | ||
123 | The mcount function should check the function pointers ftrace_graph_return | |
124 | (compare to ftrace_stub) and ftrace_graph_entry (compare to | |
7e25f44c | 125 | ftrace_graph_entry_stub). If either of those is not set to the relevant stub |
555f386c MF |
126 | function, call the arch-specific function ftrace_graph_caller which in turn |
127 | calls the arch-specific function prepare_ftrace_return. Neither of these | |
7e25f44c | 128 | function names is strictly required, but you should use them anyway to stay |
555f386c MF |
129 | consistent across the architecture ports -- easier to compare & contrast |
130 | things. | |
131 | ||
132 | The arguments to prepare_ftrace_return differ slightly from those | |
133 | passed to ftrace_trace_function. The second argument "selfpc" is the same, | |
134 | but the first argument should be a pointer to the "frompc". Typically this is | |
135 | located on the stack. This allows the function to hijack the return address | |
136 | temporarily to have it point to the arch-specific function return_to_handler. | |
137 | That function simply calls the common ftrace_return_to_handler function, | |
7e25f44c | 138 | which returns the original return address so that you can return to the |
555f386c MF |
139 | original call site. |
140 | ||
fcdeddc9 CD |
141 | Here is the updated mcount pseudo code:: |
142 | ||
143 | void mcount(void) | |
144 | { | |
145 | ... | |
146 | if (ftrace_trace_function != ftrace_stub) | |
147 | goto do_trace; | |
148 | ||
149 | +#ifdef CONFIG_FUNCTION_GRAPH_TRACER | |
150 | + extern void (*ftrace_graph_return)(...); | |
151 | + extern void (*ftrace_graph_entry)(...); | |
152 | + if (ftrace_graph_return != ftrace_stub || | |
153 | + ftrace_graph_entry != ftrace_graph_entry_stub) | |
154 | + ftrace_graph_caller(); | |
155 | +#endif | |
156 | ||
157 | /* restore any bare state */ | |
158 | ... | |
159 | ||
160 | Here is the pseudo code for the new ftrace_graph_caller assembly function:: | |
161 | ||
162 | #ifdef CONFIG_FUNCTION_GRAPH_TRACER | |
163 | void ftrace_graph_caller(void) | |
164 | { | |
165 | /* save all state needed by the ABI */ | |
166 | ||
167 | unsigned long *frompc = &...; | |
168 | unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE; | |
169 | /* passing frame pointer up is optional -- see below */ | |
170 | prepare_ftrace_return(frompc, selfpc, frame_pointer); | |
171 | ||
172 | /* restore all state needed by the ABI */ | |
173 | } | |
174 | #endif | |
555f386c | 175 | |
03688970 MF |
176 | For information on how to implement prepare_ftrace_return(), simply look at the |
177 | x86 version (the frame pointer passing is optional; see the next section for | |
178 | more information). The only architecture-specific piece in it is the setup of | |
555f386c MF |
179 | the fault recovery table (the asm(...) code). The rest should be the same |
180 | across architectures. | |
181 | ||
182 | Here is the pseudo code for the new return_to_handler assembly function. Note | |
183 | that the ABI that applies here is different from what applies to the mcount | |
184 | code. Since you are returning from a function (after the epilogue), you might | |
185 | be able to skimp on things saved/restored (usually just registers used to pass | |
186 | return values). | |
fcdeddc9 | 187 | :: |
555f386c | 188 | |
fcdeddc9 CD |
189 | #ifdef CONFIG_FUNCTION_GRAPH_TRACER |
190 | void return_to_handler(void) | |
191 | { | |
192 | /* save all state needed by the ABI (see paragraph above) */ | |
555f386c | 193 | |
fcdeddc9 | 194 | void (*original_return_point)(void) = ftrace_return_to_handler(); |
555f386c | 195 | |
fcdeddc9 | 196 | /* restore all state needed by the ABI */ |
555f386c | 197 | |
fcdeddc9 CD |
198 | /* this is usually either a return or a jump */ |
199 | original_return_point(); | |
200 | } | |
201 | #endif | |
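The save (hijack) and restore of the return address described in this section can be sketched in user-space C. This is a model, not the kernel code: the fixed-depth array stands in for the per-task return stack, DEMO_RETURN_TO_HANDLER is a made-up constant standing in for the address of return_to_handler, and the demo_* function names are hypothetical.

```c
#include <assert.h>

#define DEMO_RET_STACK_DEPTH 8	/* hypothetical depth for this sketch */

/* Stand-in for the address of the arch's return_to_handler trampoline. */
#define DEMO_RETURN_TO_HANDLER 0xdead0000UL

static unsigned long demo_ret_stack[DEMO_RET_STACK_DEPTH];
static int demo_ret_top;	/* number of saved return addresses */

/*
 * Model of prepare_ftrace_return(): save the original return address
 * found at *parent, then redirect it to the trampoline.
 * Returns 0 on success, -1 if the stack is full (slot left untouched).
 */
static int demo_prepare_return(unsigned long *parent, unsigned long selfpc)
{
	(void)selfpc;	/* the real code reports this to the tracer */

	if (demo_ret_top == DEMO_RET_STACK_DEPTH)
		return -1;

	demo_ret_stack[demo_ret_top++] = *parent;
	*parent = DEMO_RETURN_TO_HANDLER;
	return 0;
}

/*
 * Model of ftrace_return_to_handler(): hand back the most recently
 * saved return address so the trampoline can jump to it.
 */
static unsigned long demo_return_to_handler(void)
{
	assert(demo_ret_top > 0);
	return demo_ret_stack[--demo_ret_top];
}
```

Because calls nest, the saved addresses come back in last-in, first-out order: the innermost function returns first and gets the most recently saved address.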
555f386c MF |
202 | |
203 | ||
03688970 MF |
204 | HAVE_FUNCTION_GRAPH_FP_TEST |
205 | --------------------------- | |
206 | ||
207 | An arch may pass in a unique value (frame pointer) on both function entry and | |
208 | function exit. On exit, the value is compared, and if it does not | |
209 | match, the kernel panics. This is largely a sanity check for bad | |
210 | code generation with gcc. If gcc for your port sanely updates the frame | |
9849ed4d | 211 | pointer under different optimization levels, then ignore this option. |
03688970 MF |
212 | |
213 | However, adding support for it isn't terribly difficult. In your assembly code | |
214 | that calls prepare_ftrace_return(), pass the frame pointer as the 3rd argument. | |
215 | Then in the C version of that function, do what the x86 port does and pass it | |
216 | along to ftrace_push_return_trace() instead of a stub value of 0. | |
217 | ||
218 | Similarly, when you call ftrace_return_to_handler(), pass it the frame pointer. | |
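The entry/exit comparison can be pictured with a small user-space sketch. It assumes a simplified return stack whose entries hold the frame pointer next to the return address. The struct and the fp_demo_* functions are hypothetical, and where the kernel would panic on a mismatch, this model only reports it through a flag.

```c
#include <assert.h>
#include <stdbool.h>

#define FP_DEMO_DEPTH 8		/* hypothetical depth for this sketch */

/* One saved entry: the original return address plus the entry-time fp. */
struct fp_demo_entry {
	unsigned long ret;
	unsigned long fp;
};

static struct fp_demo_entry fp_demo_stack[FP_DEMO_DEPTH];
static int fp_demo_top;

/* Entry side: remember the frame pointer next to the return address. */
static bool fp_demo_push(unsigned long ret, unsigned long fp)
{
	if (fp_demo_top == FP_DEMO_DEPTH)
		return false;

	fp_demo_stack[fp_demo_top].ret = ret;
	fp_demo_stack[fp_demo_top].fp = fp;
	fp_demo_top++;
	return true;
}

/*
 * Exit side: pop the newest entry and compare frame pointers.
 * *fp_ok is set to false on a mismatch (the kernel would panic here).
 */
static unsigned long fp_demo_pop(unsigned long fp, bool *fp_ok)
{
	struct fp_demo_entry entry;

	assert(fp_demo_top > 0);
	entry = fp_demo_stack[--fp_demo_top];
	*fp_ok = (entry.fp == fp);
	return entry.ret;
}
```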
219 | ||
9a7c348b JP |
220 | HAVE_FUNCTION_GRAPH_RET_ADDR_PTR |
221 | -------------------------------- | |
222 | ||
223 | An arch may pass in a pointer to the return address on the stack. This | |
224 | prevents potential stack unwinding issues where the unwinder gets out of | |
225 | sync with ret_stack and the wrong addresses are reported by | |
226 | ftrace_graph_ret_addr(). | |
227 | ||
228 | Adding support for it is easy: just define the macro in asm/ftrace.h and | |
229 | pass the return address pointer as the 'retp' argument to | |
230 | ftrace_push_return_trace(). | |
03688970 | 231 | |
459c6d15 | 232 | HAVE_SYSCALL_TRACEPOINTS |
9849ed4d | 233 | ------------------------ |
555f386c | 234 | |
459c6d15 FW |
235 | You need very few things to get syscall tracing working on an arch. |
236 | ||
fcdeddc9 CD |
237 | - Support HAVE_ARCH_TRACEHOOK (see arch/Kconfig). |
238 | - Have a NR_syscalls variable in <asm/unistd.h> that provides the number | |
239 | of syscalls supported by the arch. | |
240 | - Support the TIF_SYSCALL_TRACEPOINT thread flag. | |
241 | - Call the trace_sys_enter() and trace_sys_exit() tracepoints from the | |
242 | ptrace syscall tracing path. | |
243 | - If the system call table on this arch is more complicated than a simple array | |
244 | of addresses of the system calls, implement arch_syscall_addr to return | |
245 | the address of a given system call. | |
246 | - If the symbol names of the system calls do not match the function names on | |
247 | this arch, define ARCH_HAS_SYSCALL_MATCH_SYM_NAME in asm/ftrace.h and | |
248 | implement arch_syscall_match_sym_name with the appropriate logic to return | |
249 | true if the function name corresponds with the symbol name. | |
250 | - Tag this arch as HAVE_SYSCALL_TRACEPOINTS. | |
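As an illustration of the symbol-name matching item in the list above, here is a sketch for an imaginary arch whose symbol table prefixes function symbols with a '.'. Both the prefix convention and the name demo_syscall_match_sym_name are assumptions for this example. A real port would implement arch_syscall_match_sym_name with whatever rule its toolchain actually needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical matcher: on this imaginary arch, symbols may carry a
 * leading '.', so skip it before comparing against the syscall name.
 */
static bool demo_syscall_match_sym_name(const char *sym, const char *name)
{
	if (sym[0] == '.')
		sym++;

	return strcmp(sym, name) == 0;
}
```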
555f386c MF |
251 | |
252 | ||
253 | HAVE_FTRACE_MCOUNT_RECORD | |
254 | ------------------------- | |
255 | ||
9849ed4d MF |
256 | See scripts/recordmcount.pl for more info. Just fill in the arch-specific |
257 | details for how to locate the addresses of mcount call sites via objdump. | |
258 | This option doesn't make much sense without also implementing dynamic ftrace. | |
555f386c | 259 | |
9849ed4d MF |
260 | |
261 | HAVE_DYNAMIC_FTRACE | |
262 | ------------------- | |
263 | ||
264 | You will first need HAVE_FTRACE_MCOUNT_RECORD and HAVE_FUNCTION_TRACER, so | |
265 | scroll your reader back up if you got overeager. | |
266 | ||
267 | Once those are out of the way, you will need to implement: | |
268 | - asm/ftrace.h: | |
269 | - MCOUNT_ADDR | |
270 | - ftrace_call_adjust() | |
271 | - struct dyn_arch_ftrace{} | |
272 | - asm code: | |
273 | - mcount() (new stub) | |
274 | - ftrace_caller() | |
275 | - ftrace_call() | |
276 | - ftrace_stub() | |
277 | - C code: | |
278 | - ftrace_dyn_arch_init() | |
279 | - ftrace_make_nop() | |
280 | - ftrace_make_call() | |
281 | - ftrace_update_ftrace_func() | |
282 | ||
283 | First you will need to fill out some arch details in your asm/ftrace.h. | |
284 | ||
fcdeddc9 CD |
285 | Define MCOUNT_ADDR as the address of your mcount symbol similar to:: |
286 | ||
9849ed4d | 287 | #define MCOUNT_ADDR ((unsigned long)mcount) |
fcdeddc9 CD |
288 | |
289 | Since no one else will have a decl for that function, you will need to:: | |
290 | ||
9849ed4d MF |
291 | extern void mcount(void); |
292 | ||
293 | You will also need the helper function ftrace_call_adjust(). Most people | |
fcdeddc9 CD |
294 | will be able to stub it out like so:: |
295 | ||
9849ed4d MF |
296 | static inline unsigned long ftrace_call_adjust(unsigned long addr) |
297 | { | |
298 | return addr; | |
299 | } | |
fcdeddc9 | 300 | |
555f386c MF |
301 | <details to be filled> |
302 | ||
9849ed4d MF |
303 | Lastly you will need the custom dyn_arch_ftrace structure. If you need |
304 | some extra state when runtime patching arbitrary call sites, this is the | |
fcdeddc9 CD |
305 | place. For now though, create an empty struct:: |
306 | ||
9849ed4d MF |
307 | struct dyn_arch_ftrace { |
308 | /* No extra data needed */ | |
309 | }; | |
310 | ||
311 | With the header out of the way, we can fill out the assembly code. While we | |
312 | did already create a mcount() function earlier, dynamic ftrace only wants a | |
313 | stub function. This is because mcount() will only be used during boot, | |
314 | after which all references to it will be patched out, never to return. Instead, | |
315 | the guts of the old mcount() will be used to create a new ftrace_caller() | |
316 | function. Because the two are hard to merge, it will most likely be a lot | |
317 | easier to have two separate definitions split up by #ifdefs. The same goes for | |
318 | ftrace_stub(), as that will now be inlined in ftrace_caller(). | |
319 | ||
320 | Before we get any more confused, let's check out some pseudo code so you can | |
fcdeddc9 | 321 | implement your own stuff in assembly:: |
555f386c | 322 | |
fcdeddc9 CD |
323 | void mcount(void) |
324 | { | |
325 | return; | |
326 | } | |
9849ed4d | 327 | |
fcdeddc9 CD |
328 | void ftrace_caller(void) |
329 | { | |
330 | /* save all state needed by the ABI (see paragraph above) */ | |
9849ed4d | 331 | |
fcdeddc9 CD |
332 | unsigned long frompc = ...; |
333 | unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE; | |
9849ed4d | 334 | |
fcdeddc9 CD |
335 | ftrace_call: |
336 | ftrace_stub(frompc, selfpc); | |
9849ed4d | 337 | |
fcdeddc9 | 338 | /* restore all state needed by the ABI */ |
9849ed4d | 339 | |
fcdeddc9 CD |
340 | ftrace_stub: |
341 | return; | |
342 | } | |
9849ed4d MF |
343 | |
344 | This might look a little odd at first, but keep in mind that we will be runtime | |
345 | patching multiple things. First, only functions that we actually want to trace | |
346 | will be patched to call ftrace_caller(). Second, since we only have one tracer | |
347 | active at a time, we will patch the ftrace_caller() function itself to call the | |
348 | specific tracer in question. That is the point of the ftrace_call label. | |
349 | ||
350 | With that in mind, let's move on to the C code that will actually be doing the | |
351 | runtime patching. You'll need a little knowledge of your arch's opcodes in | |
352 | order to make it through the next section. | |
353 | ||
354 | Every arch has an init callback function. If you need to do something early on | |
355 | to initialize some state, this is the time to do that. Otherwise, this simple | |
fcdeddc9 | 356 | function below should be sufficient for most people:: |
9849ed4d | 357 | |
fcdeddc9 CD |
358 | int __init ftrace_dyn_arch_init(void) |
359 | { | |
360 | return 0; | |
361 | } | |
9849ed4d MF |
362 | |
363 | There are two functions that are used to do runtime patching of arbitrary | |
364 | functions. The first is used to turn the mcount call site into a nop (which | |
365 | is what helps us retain runtime performance when not tracing). The second is | |
366 | used to turn the mcount call site into a call to an arbitrary location (but | |
367 | typically that is ftrace_caller()). See the general function definition in | |
fcdeddc9 CD |
368 | linux/ftrace.h for the functions:: |
369 | ||
9849ed4d MF |
370 | ftrace_make_nop() |
371 | ftrace_make_call() | |
fcdeddc9 | 372 | |
9849ed4d MF |
373 | The rec->ip value is the address of the mcount call site that was collected |
374 | by scripts/recordmcount.pl at build time. | |
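To make the call-site patching more concrete, here is a user-space sketch that patches a byte buffer instead of kernel text. Everything about the encoding is invented: a 4-byte instruction with an opcode byte followed by a 24-bit little-endian relative offset. The demo_* functions are hypothetical and do not follow the real prototypes in linux/ftrace.h. Checking that the site holds the expected old instruction before writing is a design choice of this sketch. A real port also has to write kernel text safely and keep the instruction cache coherent, which is not modelled here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Invented encoding: opcode byte + 24-bit little-endian relative offset. */
#define DEMO_INSN_SIZE	4
#define DEMO_OP_NOP	0x00
#define DEMO_OP_CALL	0xE8

/* Encode "call target" as it would appear at address ip. */
static void demo_encode_call(uint8_t insn[DEMO_INSN_SIZE], uint32_t ip,
			     uint32_t target)
{
	uint32_t off = target - (ip + DEMO_INSN_SIZE);

	insn[0] = DEMO_OP_CALL;
	insn[1] = (uint8_t)(off & 0xff);
	insn[2] = (uint8_t)((off >> 8) & 0xff);
	insn[3] = (uint8_t)((off >> 16) & 0xff);
}

/* Swap old_insn for new_insn at text[ip]; refuse if the site differs. */
static int demo_modify_code(uint8_t *text, uint32_t ip,
			    const uint8_t *old_insn, const uint8_t *new_insn)
{
	if (memcmp(text + ip, old_insn, DEMO_INSN_SIZE) != 0)
		return -1;

	memcpy(text + ip, new_insn, DEMO_INSN_SIZE);
	return 0;
}

/* Model of ftrace_make_nop(): turn "call addr" at ip into a nop. */
static int demo_make_nop(uint8_t *text, uint32_t ip, uint32_t addr)
{
	const uint8_t nop[DEMO_INSN_SIZE] = { DEMO_OP_NOP, 0, 0, 0 };
	uint8_t call[DEMO_INSN_SIZE];

	demo_encode_call(call, ip, addr);
	return demo_modify_code(text, ip, call, nop);
}

/* Model of ftrace_make_call(): turn the nop at ip into "call addr". */
static int demo_make_call(uint8_t *text, uint32_t ip, uint32_t addr)
{
	const uint8_t nop[DEMO_INSN_SIZE] = { DEMO_OP_NOP, 0, 0, 0 };
	uint8_t call[DEMO_INSN_SIZE];

	demo_encode_call(call, ip, addr);
	return demo_modify_code(text, ip, nop, call);
}
```

In this model, the ip argument plays the role of rec->ip, and addr is first the address of mcount (when making the nop) and later the address of ftrace_caller() (when making the call).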
375 | ||
376 | The last function is used to do runtime patching of the active tracer. This | |
377 | will be modifying the assembly code at the location of the ftrace_call symbol | |
378 | inside of the ftrace_caller() function. So you should have sufficient padding | |
379 | at that location to support the new function calls you'll be inserting. Some | |
380 | people will be using a "call" type instruction while others will be using a | |
fcdeddc9 CD |
381 | "branch" type instruction. Specifically, the function is:: |
382 | ||
9849ed4d MF |
383 | ftrace_update_ftrace_func() |
384 | ||
385 | ||
386 | HAVE_DYNAMIC_FTRACE + HAVE_FUNCTION_GRAPH_TRACER | |
387 | ------------------------------------------------ | |
388 | ||
389 | The function graph tracer needs a few tweaks in order to work with dynamic ftrace. | |
390 | Basically, you will need to: | |
fcdeddc9 | 391 | |
9849ed4d MF |
392 | - update: |
393 | - ftrace_caller() | |
394 | - ftrace_graph_call() | |
395 | - ftrace_graph_caller() | |
396 | - implement: | |
397 | - ftrace_enable_ftrace_graph_caller() | |
398 | - ftrace_disable_ftrace_graph_caller() | |
555f386c MF |
399 | |
400 | <details to be filled> | |
fcdeddc9 | 401 | |
9849ed4d | 402 | Quick notes: |
fcdeddc9 | 403 | |
9849ed4d MF |
404 | - add a nop stub after the ftrace_call location named ftrace_graph_call; |
405 | stub needs to be large enough to support a call to ftrace_graph_caller() | |
406 | - update ftrace_graph_caller() to work with being called by the new | |
407 | ftrace_caller() since some semantics may have changed | |
408 | - ftrace_enable_ftrace_graph_caller() will runtime patch the | |
409 | ftrace_graph_call location with a call to ftrace_graph_caller() | |
410 | - ftrace_disable_ftrace_graph_caller() will runtime patch the | |
411 | ftrace_graph_call location with nops |