Commit | Line | Data |
---|---|---|
b4d94210 SR |
1 | ================================= |
2 | Using ftrace to hook to functions | |
3 | ================================= | |
4 | ||
5 | .. Copyright 2017 VMware Inc. | |
6 | .. Author: Steven Rostedt <srostedt@goodmis.org> | |
7 | .. License: The GNU Free Documentation License, Version 1.2 | |
8 | .. (dual licensed under the GPL v2) | |
9 | ||
10 | Written for: 4.14 | |
11 | ||
12 | Introduction | |
13 | ============ | |
14 | ||
e37274fa | 15 | The ftrace infrastructure was originally created to attach callbacks to the |
b4d94210 SR |
16 | beginning of functions in order to record and trace the flow of the kernel. |
17 | But callbacks to the start of a function can have other use cases, such as | |
18 | live kernel patching or security monitoring. This document describes | |
19 | how to use ftrace to implement your own function callbacks. | |
20 | ||
21 | ||
22 | The ftrace context | |
23 | ================== | |
b3fdd1f9 | 24 | .. warning:: |
b4d94210 | 25 | |
b3fdd1f9 CD |
26 | The ability to add a callback to almost any function within the |
27 | kernel comes with risks. A callback can be called from any context | |
28 | (normal, softirq, irq, and NMI). Callbacks can also be called just before | |
29 | going to idle, during CPU bring up and takedown, or going to user space. | |
30 | This requires extra care about what can be done inside a callback. A callback | |
31 | can be called outside the protective scope of RCU. | |
b4d94210 | 32 | |
a25d036d SRV |
33 | There are helper functions to protect against recursion and to make sure |
34 | RCU is watching. These are explained below. | |
b4d94210 SR |
35 | |
36 | ||
37 | The ftrace_ops structure | |
38 | ======================== | |
39 | ||
40 | To register a function callback, a ftrace_ops is required. This structure | |
41 | is used to tell ftrace what function should be called as the callback | |
42 | as well as what protections the callback will perform and not require | |
43 | ftrace to handle. | |
44 | ||
45 | Only one field needs to be set when registering | |
2cd6ff4a | 46 | an ftrace_ops with ftrace: |
b4d94210 | 47 | |
2cd6ff4a | 48 | .. code-block:: c |
b4d94210 SR |
49 | |
50 | struct ftrace_ops ops = { | |
51 | .func = my_callback_func, | |
52 | .flags = MY_FTRACE_FLAGS, | |
53 | .private = any_private_data_structure, | |
54 | }; | |
55 | ||
56 | Both .flags and .private are optional. Only .func is required. | |
57 | ||
d7faad15 | 58 | To enable tracing call:: |
b4d94210 | 59 | |
d7faad15 | 60 | register_ftrace_function(&ops); |
b4d94210 | 61 | |
d7faad15 | 62 | To disable tracing call:: |
b4d94210 | 63 | |
d7faad15 | 64 | unregister_ftrace_function(&ops); |
b4d94210 | 65 | |
d7faad15 | 66 | The above is defined by including the header:: |
b4d94210 | 67 | |
d7faad15 | 68 | #include <linux/ftrace.h> |
b4d94210 SR |
69 | |
70 | The registered callback will start being called some time after the | |
71 | register_ftrace_function() is called and before it returns. The exact time | |
72 | that callbacks start being called is dependent upon architecture and scheduling | |
73 | of services. The callback itself will have to handle any synchronization if it | |
74 | must begin at an exact moment. | |
75 | ||
76 | The unregister_ftrace_function() will guarantee that the callback is | |
77 | no longer being called by functions after the unregister_ftrace_function() | |
78 | returns. Note that to perform this guarantee, the unregister_ftrace_function() | |
79 | may take some time to finish. | |
80 | ||
81 | ||
82 | The callback function | |
83 | ===================== | |
84 | ||
2cd6ff4a | 85 | The prototype of the callback function is as follows (as of v4.14): |
b4d94210 | 86 | |
2cd6ff4a | 87 | .. code-block:: c |
b4d94210 | 88 | |
2cd6ff4a MH |
89 | void callback_func(unsigned long ip, unsigned long parent_ip, |
90 | struct ftrace_ops *op, struct pt_regs *regs); | |
b4d94210 SR |
91 | |
92 | @ip | |
93 | This is the instruction pointer of the function that is being traced. | |
94 | (where the fentry or mcount is within the function) | |
95 | ||
96 | @parent_ip | |
97 | This is the instruction pointer of the function that called the | |
98 | function being traced (where the call of the function occurred). | |
99 | ||
100 | @op | |
101 | This is a pointer to ftrace_ops that was used to register the callback. | |
102 | This can be used to pass data to the callback via the private pointer. | |
103 | ||
104 | @regs | |
105 | If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED | |
106 | flags are set in the ftrace_ops structure, then this will be pointing | |
107 | to the pt_regs structure like it would be if a breakpoint was placed | |
108 | at the start of the function where ftrace was tracing. Otherwise it | |
109 | either contains garbage, or NULL. | |
110 | ||
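The way @op carries data into the callback can be illustrated with a small user-space sketch. This is not kernel code: the `struct ftrace_ops` below is a stand-in that models only the three fields shown earlier in this document, and `struct hit_counter` and `count_hits()` are hypothetical names invented for the illustration.

```c
#include <stddef.h>

/* User-space stand-ins for the kernel types; only the fields discussed
 * in this document are modelled. These are assumptions for illustration,
 * not the definitions from <linux/ftrace.h>. */
struct pt_regs;                 /* left opaque in this sketch */
struct ftrace_ops;

typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
                              struct ftrace_ops *op, struct pt_regs *regs);

struct ftrace_ops {
    ftrace_func_t func;
    unsigned long flags;
    void *private;
};

/* Hypothetical per-ops state, reached through op->private. */
struct hit_counter {
    unsigned long hits;
    unsigned long last_ip;
};

/* Callback with the prototype described above. It does not touch regs,
 * because regs is only meaningful when one of the SAVE_REGS flags is set. */
static void count_hits(unsigned long ip, unsigned long parent_ip,
                       struct ftrace_ops *op, struct pt_regs *regs)
{
    struct hit_counter *hc = op->private;

    (void)parent_ip;
    (void)regs;
    hc->hits++;
    hc->last_ip = ip;
}
```

In a real module the ops would be handed to register_ftrace_function() instead of being invoked by hand, and the counter update would have to be safe in every context the callback can run in.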
a25d036d SRV |
111 | Protect your callback |
112 | ===================== | |
113 | ||
114 | As functions can be called from anywhere, and it is possible that a function | |
115 | called by a callback may also be traced, and call that same callback, | |
116 | recursion protection must be used. There are two helper functions that | |
117 | can help in this regard. If you start your code with: | |
118 | ||
3a37b918 SRV |
119 | .. code-block:: c |
120 | ||
a25d036d SRV |
121 | int bit; |
122 | ||
773c1670 | 123 | bit = ftrace_test_recursion_trylock(ip, parent_ip); |
a25d036d SRV |
124 | if (bit < 0) |
125 | return; | |
126 | ||
127 | and end it with: | |
128 | ||
3a37b918 SRV |
129 | .. code-block:: c |
130 | ||
a25d036d SRV |
131 | ftrace_test_recursion_unlock(bit); |
132 | ||
133 | The code in between will be safe to use, even if it ends up calling a | |
134 | function that the callback is tracing. Note, on success, | |
135 | ftrace_test_recursion_trylock() will disable preemption, and the | |
136 | ftrace_test_recursion_unlock() will enable it again (if it was previously | |
773c1670 SRV |
137 | enabled). The instruction pointer (ip) and its parent (parent_ip) are passed to |
138 | ftrace_test_recursion_trylock() to record where the recursion happened | |
139 | (if CONFIG_FTRACE_RECORD_RECURSION is set). | |
a25d036d SRV |
140 | |
141 | Alternatively, if the FTRACE_OPS_FL_RECURSION flag is set on the ftrace_ops | |
142 | (as explained below), then a helper trampoline will be used to test | |
143 | for recursion for the callback and no recursion test needs to be done. | |
144 | But this is at the expense of slightly more overhead from an extra | |
145 | function call. | |
146 | ||
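The trylock/unlock pattern described above can be modelled in user space to show why the guarded region is safe. This is a deliberately simplified sketch: `sketch_trylock()` and `sketch_unlock()` are hypothetical stand-ins that use a single flag, whereas the real ftrace_test_recursion_trylock() and ftrace_test_recursion_unlock() helpers do more (for example, they also manage preemption, as noted above).

```c
/* Single-flag model of a recursion guard. All names here are
 * hypothetical; this is not the kernel implementation. */
static int guard_held;   /* non-zero while the callback body is running */
static int body_runs;    /* how often the protected body executed */
static int rejected;     /* how often a recursive entry was refused */

/* Returns 0 on success, or a negative value if the guard is already held. */
static int sketch_trylock(void)
{
    if (guard_held)
        return -1;
    guard_held = 1;
    return 0;
}

static void sketch_unlock(int bit)
{
    if (bit >= 0)
        guard_held = 0;
}

static void traced_helper(void);

/* Follows the same shape as the callback skeleton shown above. */
static void sketch_callback(void)
{
    int bit;

    bit = sketch_trylock();
    if (bit < 0) {
        rejected++;
        return;
    }

    body_runs++;
    traced_helper();     /* a function that is itself being traced */

    sketch_unlock(bit);
}

/* Stands in for a traced function: entering it invokes the callback again. */
static void traced_helper(void)
{
    sketch_callback();
}
```

Calling `sketch_callback()` once runs the body once; the nested entry through `traced_helper()` is refused instead of recursing without bound.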
147 | If your callback accesses any data or critical section that requires RCU | |
148 | protection, it is best to make sure that RCU is "watching", otherwise | |
149 | that data or critical section will not be protected as expected. In this | |
150 | case add: | |
151 | ||
3a37b918 SRV |
152 | .. code-block:: c |
153 | ||
a25d036d SRV |
154 | if (!rcu_is_watching()) |
155 | return; | |
156 | ||
157 | Alternatively, if the FTRACE_OPS_FL_RCU flag is set on the ftrace_ops | |
158 | (as explained below), then a helper trampoline will be used to test | |
159 | for rcu_is_watching for the callback and no other test needs to be done. | |
160 | But this is at the expense of slightly more overhead from an extra | |
161 | function call. | |
162 | ||
b4d94210 SR |
163 | |
164 | The ftrace FLAGS | |
165 | ================ | |
166 | ||
167 | The ftrace_ops flags are all defined and documented in include/linux/ftrace.h. | |
168 | Some of the flags are used for internal infrastructure of ftrace, but the | |
169 | ones that users should be aware of are the following: | |
170 | ||
171 | FTRACE_OPS_FL_SAVE_REGS | |
172 | If the callback requires reading or modifying the pt_regs | |
173 | passed to the callback, then it must set this flag. Registering | |
174 | a ftrace_ops with this flag set on an architecture that does not | |
175 | support passing of pt_regs to the callback will fail. | |
176 | ||
177 | FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED | |
178 | Similar to SAVE_REGS but the registering of a | |
179 | ftrace_ops on an architecture that does not support passing of regs | |
180 | will not fail with this flag set. But the callback must check if | |
181 | regs is NULL or not to determine if the architecture supports it. | |
182 | ||
a25d036d SRV |
183 | FTRACE_OPS_FL_RECURSION |
184 | By default, it is expected that the callback can handle recursion. | |
d3f79db9 | 185 | But if the callback is not that worried about overhead, then |
a25d036d SRV |
186 | setting this bit will add the recursion protection around the |
187 | callback by calling a helper function that will do the recursion | |
188 | protection and only call the callback if it did not recurse. | |
189 | ||
190 | Note, if this flag is not set, and recursion does occur, it could | |
191 | cause the system to crash, and possibly reboot via a triple fault. | |
192 | ||
d3f79db9 | 193 | Note, if this flag is set, then the callback will always be called |
a25d036d SRV |
194 | with preemption disabled. If it is not set, then it is possible |
195 | (but not guaranteed) that the callback will be called in | |
196 | preemptable context. | |
b4d94210 SR |
197 | |
198 | FTRACE_OPS_FL_IPMODIFY | |
199 | Requires FTRACE_OPS_FL_SAVE_REGS set. If the callback is to "hijack" | |
200 | the traced function (have another function called instead of the | |
201 | traced function), it requires setting this flag. This is what live | |
202 | kernel patches use. Without this flag the pt_regs->ip cannot be | |
203 | modified. | |
204 | ||
205 | Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be | |
206 | registered to any given function at a time. | |
207 | ||
208 | FTRACE_OPS_FL_RCU | |
209 | If this is set, then the callback will only be called by functions | |
210 | where RCU is "watching". This is required if the callback function | |
211 | performs any rcu_read_lock() operation. | |
212 | ||
213 | RCU stops watching when the system goes idle, the time when a CPU | |
214 | is taken down and comes back online, and when entering from kernel | |
215 | to user space and back to kernel space. During these transitions, | |
216 | a callback may be executed and RCU synchronization will not protect | |
217 | it. | |
218 | ||
7162431d MB |
219 | FTRACE_OPS_FL_PERMANENT |
220 | If this is set on any ftrace ops, then the tracing cannot be disabled by | |
221 | writing 0 to the proc sysctl ftrace_enabled. Equally, a callback with | |
222 | the flag set cannot be registered if ftrace_enabled is 0. | |
223 | ||
224 | Livepatch uses it to avoid losing the function redirection, so the system | |
225 | stays protected. | |
226 | ||
b4d94210 SR |
227 | |
228 | Filtering which functions to trace | |
229 | ================================== | |
230 | ||
231 | If a callback is only to be called from specific functions, a filter must be | |
232 | set up. The filters are added by name, or ip if it is known. | |
233 | ||
2cd6ff4a | 234 | .. code-block:: c |
b4d94210 | 235 | |
2cd6ff4a MH |
236 | int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf, |
237 | int len, int reset); | |
b4d94210 SR |
238 | |
239 | @ops | |
240 | The ops to set the filter with | |
241 | ||
242 | @buf | |
243 | The string that holds the function filter text. | |
244 | @len | |
245 | The length of the string. | |
246 | ||
247 | @reset | |
248 | Non-zero to reset all filters before applying this filter. | |
249 | ||
250 | Filters denote which functions should be enabled when tracing is enabled. | |
251 | If @buf is NULL and @reset is non-zero, all functions will be enabled for tracing. | |
252 | ||
253 | The @buf can also be a glob expression to enable all functions that | |
254 | match a specific pattern. | |
255 | ||
5fb94e9c | 256 | See Filter Commands in :file:`Documentation/trace/ftrace.rst`. |
b4d94210 | 257 | |
b3fdd1f9 | 258 | To just trace the schedule function: |
b4d94210 | 259 | |
2cd6ff4a | 260 | .. code-block:: c |
b4d94210 | 261 | |
2cd6ff4a | 262 | ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0); |
b4d94210 SR |
263 | |
264 | To add more functions, call the ftrace_set_filter() more than once with the | |
265 | @reset parameter set to zero. To remove the current filter set and replace it | |
266 | with new functions defined by @buf, have @reset be non-zero. | |
267 | ||
b3fdd1f9 | 268 | To remove all the filtered functions and trace all functions: |
b4d94210 | 269 | |
2cd6ff4a | 270 | .. code-block:: c |
b4d94210 | 271 | |
2cd6ff4a | 272 | ret = ftrace_set_filter(&ops, NULL, 0, 1); |
b4d94210 SR |
273 | |
274 | ||
275 | Sometimes more than one function has the same name. To trace just a specific | |
276 | function in this case, ftrace_set_filter_ip() can be used. | |
277 | ||
2cd6ff4a | 278 | .. code-block:: c |
b4d94210 | 279 | |
2cd6ff4a | 280 | ret = ftrace_set_filter_ip(&ops, ip, 0, 0); |
b4d94210 SR |
281 | |
282 | Note that the ip must be the address where the call to fentry or mcount is | |
283 | located in the function. This function is used by perf and kprobes, which | |
284 | get the ip address from the user (usually using debug info from the kernel). | |
285 | ||
286 | If a glob is used to set the filter, functions can be added to a "notrace" | |
287 | list that will prevent those functions from calling the callback. | |
288 | The "notrace" list takes precedence over the "filter" list. If the | |
289 | two lists are non-empty and contain the same functions, the callback will not | |
290 | be called by any function. | |
291 | ||
292 | An empty "notrace" list means to allow all functions defined by the filter | |
293 | to be traced. | |
294 | ||
2cd6ff4a | 295 | .. code-block:: c |
b4d94210 | 296 | |
2cd6ff4a MH |
297 | int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf, |
298 | int len, int reset); | |
b4d94210 SR |
299 | |
300 | This takes the same parameters as ftrace_set_filter() but will add the | |
301 | functions it finds to not be traced. This is a separate list from the | |
302 | filter list, and this function does not modify the filter list. | |
303 | ||
304 | A non-zero @reset will clear the "notrace" list before adding functions | |
305 | that match @buf to it. | |
306 | ||
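The interaction between the "filter" and "notrace" lists described above can be summarised as a small decision function. The sketch below is a user-space model only: `in_list()` and `would_call_callback()` are hypothetical names, matching is by exact name (no glob support), and the lists are plain string arrays instead of ftrace's internal data structures.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if name appears in list (exact match), 0 otherwise. */
static int in_list(const char *name, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, list[i]) == 0)
            return 1;
    return 0;
}

/*
 * Model of the rules in the text:
 *   - "notrace" takes precedence over "filter";
 *   - an empty filter list means every function is eligible;
 *   - an empty notrace list excludes nothing.
 */
static int would_call_callback(const char *name,
                               const char *const *filter, size_t nfilter,
                               const char *const *notrace, size_t nnotrace)
{
    if (in_list(name, notrace, nnotrace))
        return 0;
    if (nfilter == 0)
        return 1;
    return in_list(name, filter, nfilter);
}
```

With a filter of {"schedule", "try_to_wake_up"} and a notrace list of {"schedule"}, this model lets only try_to_wake_up call the callback. If both lists held exactly the same names, no function would.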
307 | Clearing the "notrace" list is the same as clearing the filter list: | |
308 | ||
2cd6ff4a | 309 | .. code-block:: c |
b4d94210 SR |
310 | |
311 | ret = ftrace_set_notrace(&ops, NULL, 0, 1); | |
312 | ||
313 | The filter and notrace lists may be changed at any time. If only a set of | |
314 | functions should call the callback, it is best to set the filters before | |
315 | registering the callback. But the changes may also happen after the callback | |
316 | has been registered. | |
317 | ||
318 | If a filter is in place, @reset is non-zero, and @buf contains a glob | |
319 | that matches functions, the switch will happen during the time of | |
320 | the ftrace_set_filter() call. At no time will all functions call the callback. | |
321 | ||
2cd6ff4a | 322 | .. code-block:: c |
b4d94210 | 323 | |
2cd6ff4a | 324 | ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1); |
b4d94210 | 325 | |
2cd6ff4a | 326 | register_ftrace_function(&ops); |
b4d94210 | 327 | |
2cd6ff4a | 328 | msleep(10); |
b4d94210 | 329 | |
2cd6ff4a | 330 | ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1); |
b4d94210 SR |
331 | |
332 | is not the same as: | |
333 | ||
2cd6ff4a | 334 | .. code-block:: c |
b4d94210 | 335 | |
2cd6ff4a | 336 | ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1); |
b4d94210 | 337 | |
2cd6ff4a | 338 | register_ftrace_function(&ops); |
b4d94210 | 339 | |
2cd6ff4a | 340 | msleep(10); |
b4d94210 | 341 | |
2cd6ff4a | 342 | ftrace_set_filter(&ops, NULL, 0, 1); |
b4d94210 | 343 | |
2cd6ff4a | 344 | ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0); |
b4d94210 SR |
345 | |
346 | As the latter will have a short time where all functions will call | |
347 | the callback, between the time of the reset, and the time of the | |
348 | new setting of the filter. |