=================================
Using ftrace to hook to functions
=================================

.. Copyright 2017 VMware Inc.
..   Author:   Steven Rostedt <srostedt@goodmis.org>
..  License:   The GNU Free Documentation License, Version 1.2
..               (dual licensed under the GPL v2)

Written for: 4.14

Introduction
============

The ftrace infrastructure was originally created to attach callbacks to the
beginning of functions in order to record and trace the flow of the kernel.
But callbacks to the start of a function can have other use cases, such as
live kernel patching or security monitoring. This document describes
how to use ftrace to implement your own function callbacks.


The ftrace context
==================

WARNING: The ability to add a callback to almost any function within the
kernel comes with risks. A callback can be called from any context
(normal, softirq, irq, and NMI). Callbacks can also be called just before
going to idle, during CPU bring up and takedown, or going to user space.
This requires extra care in what can be done inside a callback. A callback
can be called outside the protective scope of RCU.

The ftrace infrastructure has some protections against recursion and RCU,
but one must still be very careful about how the callbacks are used.

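To picture the default recursion protection mentioned above, here is a
user-space model in C. A thread-local flag stands in for the
recursion-tracking state that ftrace keeps in the kernel; every name in it
(toy_recursion_wrapper(), traced_helper(), user_callback() and so on) is
hypothetical and not part of the ftrace API:

.. code-block:: c

    #include <stdbool.h>

    /*
     * Hypothetical user-space model: a thread-local flag stands in for
     * the recursion-tracking state ftrace keeps in the kernel.
     */
    static _Thread_local bool toy_in_callback;
    static int toy_callback_runs;

    static void traced_helper(void);

    /* The user's callback; it calls a function that is itself traced. */
    static void user_callback(void)
    {
        toy_callback_runs++;
        traced_helper();
    }

    /*
     * The default wrapper: if this context is already inside the
     * callback, drop the nested invocation instead of recursing.
     */
    static void toy_recursion_wrapper(void)
    {
        if (toy_in_callback)
            return;
        toy_in_callback = true;
        user_callback();
        toy_in_callback = false;
    }

    /*
     * A traced function: its entry hook runs the wrapped callback.
     * Without the guard, user_callback() -> traced_helper() ->
     * user_callback() would recurse until the stack overflows.
     */
    static void traced_helper(void)
    {
        toy_recursion_wrapper();
    }

Each call to traced_helper() runs the callback exactly once, even though
the callback itself calls traced_helper() again.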

The ftrace_ops structure
========================

To register a function callback, a ftrace_ops is required. This structure
is used to tell ftrace what function should be called as the callback
as well as what protections the callback will perform and not require
ftrace to handle.

There is only one field that needs to be set when registering
an ftrace_ops with ftrace:

.. code-block:: c

    struct ftrace_ops ops = {
        .func       = my_callback_func,
        .flags      = MY_FTRACE_FLAGS,
        .private    = any_private_data_structure,
    };

Both .flags and .private are optional. Only .func is required.

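Because .func is the only required member, a designated initializer can
name just that field and C zero-initializes the rest. The sketch below
shows this with a hypothetical user-space stand-in, struct toy_ftrace_ops,
which mirrors only the three members shown above (the real struct
ftrace_ops in include/linux/ftrace.h has more fields), with regs kept as an
opaque pointer:

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical stand-in for struct ftrace_ops, with only the three
     * members discussed above; regs is an opaque pointer here. */
    struct toy_ftrace_ops;

    typedef void (*toy_func_t)(unsigned long ip, unsigned long parent_ip,
                               struct toy_ftrace_ops *op, void *regs);

    struct toy_ftrace_ops {
        toy_func_t func;
        unsigned long flags;
        void *private;
    };

    static void my_callback_func(unsigned long ip, unsigned long parent_ip,
                                 struct toy_ftrace_ops *op, void *regs)
    {
        (void)ip;
        (void)parent_ip;
        (void)op;
        (void)regs;
    }

    /* Only .func is named: .flags becomes 0 and .private becomes NULL. */
    static struct toy_ftrace_ops minimal_ops = {
        .func = my_callback_func,
    };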
To enable tracing call::

    register_ftrace_function(&ops);

To disable tracing call::

    unregister_ftrace_function(&ops);

The above is defined by including the header::

    #include <linux/ftrace.h>

The registered callback will start being called some time after
register_ftrace_function() is called and before it returns. The exact time
that callbacks start being called depends upon the architecture and the
scheduling of services. The callback itself will have to handle any
synchronization if it must begin at an exact moment.

unregister_ftrace_function() guarantees that the callback is no longer
being called by functions after unregister_ftrace_function() returns.
Note that, to provide this guarantee, unregister_ftrace_function() may
take some time to finish.

80 | ||
81 | The callback function | |
82 | ===================== | |
83 | ||
2cd6ff4a | 84 | The prototype of the callback function is as follows (as of v4.14): |
b4d94210 | 85 | |
2cd6ff4a | 86 | .. code-block:: c |
b4d94210 | 87 | |
2cd6ff4a MH |
88 | void callback_func(unsigned long ip, unsigned long parent_ip, |
89 | struct ftrace_ops *op, struct pt_regs *regs); | |
b4d94210 SR |
90 | |
@ip
     This is the instruction pointer of the function that is being traced
     (where the fentry or mcount call is within the function).

@parent_ip
     This is the instruction pointer of the function that called the
     function being traced (where the call of the function occurred).

@op
     This is a pointer to the ftrace_ops that was used to register the
     callback. This can be used to pass data to the callback via the
     private pointer.

@regs
     If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
     flags are set in the ftrace_ops structure, then this will point
     to the pt_regs structure as it would be if a breakpoint were placed
     at the start of the function where ftrace was tracing. Otherwise it
     either contains garbage or is NULL.

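As a sketch of how @op and its private pointer can be used, the model below
keeps per-callback statistics in a structure reached through op->private.
struct toy_ftrace_ops, struct hit_stats and toy_fire() are hypothetical
user-space stand-ins, and @regs is passed as NULL, as it may be when no
SAVE_REGS flag is set:

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical stand-ins; pt_regs is left opaque. */
    struct toy_pt_regs;
    struct toy_ftrace_ops;

    typedef void (*toy_func_t)(unsigned long ip, unsigned long parent_ip,
                               struct toy_ftrace_ops *op,
                               struct toy_pt_regs *regs);

    struct toy_ftrace_ops {
        toy_func_t func;
        unsigned long flags;
        void *private;
    };

    /* Per-ops state reached through op->private. */
    struct hit_stats {
        unsigned long last_ip;
        unsigned long last_parent_ip;
        unsigned long hits;
    };

    static void my_callback_func(unsigned long ip, unsigned long parent_ip,
                                 struct toy_ftrace_ops *op,
                                 struct toy_pt_regs *regs)
    {
        struct hit_stats *stats = op->private;

        (void)regs; /* not valid unless a SAVE_REGS flag was set */
        stats->last_ip = ip;
        stats->last_parent_ip = parent_ip;
        stats->hits++;
    }

    static struct hit_stats my_stats;
    static struct toy_ftrace_ops my_ops = {
        .func    = my_callback_func,
        .private = &my_stats,
    };

    /* Stand-in for what ftrace does on entry to a traced function. */
    static void toy_fire(struct toy_ftrace_ops *op,
                         unsigned long ip, unsigned long parent_ip)
    {
        op->func(ip, parent_ip, op, NULL);
    }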

The ftrace FLAGS
================

The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
Some of the flags are used for internal infrastructure of ftrace, but the
ones that users should be aware of are the following:

FTRACE_OPS_FL_SAVE_REGS
        If the callback requires reading or modifying the pt_regs
        passed to the callback, then it must set this flag. Registering
        a ftrace_ops with this flag set on an architecture that does not
        support passing of pt_regs to the callback will fail.

FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
        Similar to SAVE_REGS, but registering a ftrace_ops with this flag
        set will not fail on an architecture that does not support passing
        of regs. Instead, the callback must check whether regs is NULL to
        determine if the architecture supports it.

FTRACE_OPS_FL_RECURSION_SAFE
        By default, a wrapper is added around the callback to
        make sure that recursion of the function does not occur. That is,
        if a function that is called as a result of the callback's execution
        is also traced, ftrace will prevent the callback from being called
        again. But this wrapper adds some overhead, and if the callback is
        safe from recursion, it can set this flag to disable the ftrace
        protection.

        Note, if this flag is set, and recursion does occur, it could cause
        the system to crash, and possibly reboot via a triple fault.

        It is OK if another callback traces a function that is called by a
        callback that is marked recursion safe. Recursion safe callbacks
        must never trace any functions that are called by the callback
        itself or any nested functions that those functions call.

        If this flag is set, it is possible that the callback will also
        be called with preemption enabled (when CONFIG_PREEMPT is set),
        but this is not guaranteed.

FTRACE_OPS_FL_IPMODIFY
        Requires FTRACE_OPS_FL_SAVE_REGS set. If the callback is to "hijack"
        the traced function (have another function called instead of the
        traced function), it requires setting this flag. This is what live
        kernel patching uses. Without this flag the pt_regs->ip cannot be
        modified.

        Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be
        registered to any given function at a time.

FTRACE_OPS_FL_RCU
        If this is set, then the callback will only be called by functions
        where RCU is "watching". This is required if the callback function
        performs any rcu_read_lock() operation.

        RCU stops watching when the system goes idle, when a CPU is taken
        down and comes back online, and when entering from kernel to user
        space and back to kernel space. During these transitions, a
        callback may be executed and RCU synchronization will not protect
        it.

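The registration rules for SAVE_REGS, SAVE_REGS_IF_SUPPORTED and IPMODIFY
can be summarized as a small decision function. This is a user-space model
only: the TOY_FL_* values, toy_register() and toy_regs_valid() are
hypothetical, and the errno values are illustrative choices, not a
statement of what the kernel returns:

.. code-block:: c

    #include <errno.h>
    #include <stdbool.h>

    /* Hypothetical values for this model only; the real FTRACE_OPS_FL_*
     * constants are defined in include/linux/ftrace.h. */
    enum {
        TOY_FL_SAVE_REGS              = 1 << 0,
        TOY_FL_SAVE_REGS_IF_SUPPORTED = 1 << 1,
        TOY_FL_IPMODIFY               = 1 << 2,
    };

    /* Does the modelled architecture support passing pt_regs? */
    static bool toy_arch_has_regs;

    /* IPMODIFY users attached to the one modelled function. */
    static int toy_ipmodify_users;

    /* Returns 0 on success or a negative errno (illustrative). */
    static int toy_register(unsigned long flags)
    {
        if ((flags & TOY_FL_SAVE_REGS) && !toy_arch_has_regs)
            return -EINVAL;     /* SAVE_REGS is mandatory */
        if (flags & TOY_FL_IPMODIFY) {
            if (!(flags & TOY_FL_SAVE_REGS))
                return -EINVAL; /* IPMODIFY requires SAVE_REGS */
            if (toy_ipmodify_users > 0)
                return -EBUSY;  /* one IPMODIFY ops per function */
            toy_ipmodify_users++;
        }
        return 0;
    }

    /* Will @regs be valid in the callback? With IF_SUPPORTED on an
     * architecture without support, the callback must expect NULL. */
    static bool toy_regs_valid(unsigned long flags)
    {
        return toy_arch_has_regs &&
               (flags & (TOY_FL_SAVE_REGS |
                         TOY_FL_SAVE_REGS_IF_SUPPORTED)) != 0;
    }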

Filtering which functions to trace
==================================

If a callback is only to be called from specific functions, a filter must be
set up. The filters are added by name, or by ip if it is known.

.. code-block:: c

    int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
                          int len, int reset);

@ops
     The ops to set the filter with

@buf
     The string that holds the function filter text.

@len
     The length of the string.

@reset
     Non-zero to reset all filters before applying this filter.

Filters denote which functions should be enabled when tracing is enabled.
If @buf is NULL and @reset is set, all functions will be enabled for tracing.

The @buf can also be a glob expression to enable all functions that
match a specific pattern.

See Filter Commands in :file:`Documentation/trace/ftrace.txt`.

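To get a feel for which names a glob selects, the sketch below approximates
the matching in user space with POSIX fnmatch(3). ftrace uses its own
matcher in the kernel, so corner cases may differ; toy_filter_matches() and
the sample patterns are hypothetical:

.. code-block:: c

    #include <fnmatch.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical model: does @func match any of the @n patterns?
     * POSIX fnmatch() stands in for ftrace's own glob matching. */
    static bool toy_filter_matches(const char *const *patterns, size_t n,
                                   const char *func)
    {
        for (size_t i = 0; i < n; i++)
            if (fnmatch(patterns[i], func, 0) == 0)
                return true;
        return false;
    }

    /* Prefix glob and suffix glob. */
    static const char *const toy_patterns[] = { "sched*", "*spin_lock" };

Here "schedule" and "_raw_spin_lock" match, while "try_to_wake_up" and
"spin_lock_irqsave" do not.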
To just trace the schedule function:

.. code-block:: c

    ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0);

To add more functions, call ftrace_set_filter() more than once with the
@reset parameter set to zero. To remove the current filter set and replace it
with new functions defined by @buf, have @reset be non-zero.

To remove all the filtered functions and trace all functions:

.. code-block:: c

    ret = ftrace_set_filter(&ops, NULL, 0, 1);

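The @reset rules above amount to a small state machine on a list. The
user-space model below (struct toy_filter, toy_set_filter() and
toy_would_trace() are hypothetical) uses exact names instead of globs, so
that only the add, replace and clear semantics are shown:

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    #define TOY_MAX_FILTERS 8

    /* Hypothetical model of one ops' filter list. It stores the caller's
     * pointers, so the strings must outlive the list. */
    struct toy_filter {
        const char *names[TOY_MAX_FILTERS];
        size_t count;
    };

    /* Non-zero @reset clears the list first; a NULL @buf adds nothing.
     * Returns -1 if the list is full, 0 otherwise. */
    static int toy_set_filter(struct toy_filter *f, const char *buf,
                              int reset)
    {
        if (reset)
            f->count = 0;
        if (!buf)
            return 0;
        if (f->count == TOY_MAX_FILTERS)
            return -1;
        f->names[f->count++] = buf;
        return 0;
    }

    /* An empty filter list means "trace every function". */
    static bool toy_would_trace(const struct toy_filter *f,
                                const char *func)
    {
        if (f->count == 0)
            return true;
        for (size_t i = 0; i < f->count; i++)
            if (strcmp(f->names[i], func) == 0)
                return true;
        return false;
    }

    static struct toy_filter toy_ops_filter;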
Sometimes more than one function has the same name. To trace just a specific
function in this case, ftrace_set_filter_ip() can be used.

.. code-block:: c

    ret = ftrace_set_filter_ip(&ops, ip, 0, 0);

Note, however, that the ip must be the address where the call to fentry or
mcount is located in the function. This function is used by perf and
kprobes, which get the ip address from the user (usually using debug info
from the kernel).

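Why an ip is more precise than a name can be seen with a toy symbol table
in which two static functions share the name "show". The table, the
addresses and the helpers below are hypothetical and made up for
illustration; in the kernel the ip would come from debug info as described
above:

.. code-block:: c

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical symbol table: two static functions share a name. */
    struct toy_sym {
        const char *name;
        unsigned long ip;   /* address of the fentry/mcount call site */
    };

    static const struct toy_sym toy_syms[] = {
        { "show",     0x81001000UL },   /* static in one file */
        { "show",     0x81002000UL },   /* same name, another file */
        { "schedule", 0x81003000UL },
    };
    #define TOY_NSYMS (sizeof(toy_syms) / sizeof(toy_syms[0]))

    /* Filtering by name selects every symbol with that name. */
    static size_t toy_count_by_name(const char *name)
    {
        size_t n = 0;
        for (size_t i = 0; i < TOY_NSYMS; i++)
            if (strcmp(toy_syms[i].name, name) == 0)
                n++;
        return n;
    }

    /* Filtering by ip selects a symbol only when @ip is exactly its
     * recorded call-site address. */
    static size_t toy_count_by_ip(unsigned long ip)
    {
        size_t n = 0;
        for (size_t i = 0; i < TOY_NSYMS; i++)
            if (toy_syms[i].ip == ip)
                n++;
        return n;
    }

An address elsewhere inside the function, such as 0x81002004, selects
nothing in this model.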
If a glob is used to set the filter, functions can be added to a "notrace"
list that will prevent those functions from calling the callback.
The "notrace" list takes precedence over the "filter" list. If the
two lists are non-empty and contain the same functions, the callback will not
be called by any function.

An empty "notrace" list means to allow all functions defined by the filter
to be traced.

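The precedence rule can be written as a single predicate: a function calls
the callback only if it does not match the notrace list and either the
filter list is empty or the function matches it. The sketch below models
this in user space, approximating the kernel's glob matching with POSIX
fnmatch(3); struct toy_lists, toy_calls_callback() and the sample patterns
are hypothetical:

.. code-block:: c

    #include <fnmatch.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical model of the two lists of one ops. */
    struct toy_lists {
        const char *const *filter;
        size_t nfilter;
        const char *const *notrace;
        size_t nnotrace;
    };

    static bool toy_any_match(const char *const *pats, size_t n,
                              const char *func)
    {
        for (size_t i = 0; i < n; i++)
            if (fnmatch(pats[i], func, 0) == 0)
                return true;
        return false;
    }

    /* notrace wins over filter; an empty filter list allows everything
     * that notrace does not exclude. */
    static bool toy_calls_callback(const struct toy_lists *l,
                                   const char *func)
    {
        if (toy_any_match(l->notrace, l->nnotrace, func))
            return false;
        return l->nfilter == 0 || toy_any_match(l->filter, l->nfilter, func);
    }

    static const char *const toy_filter_pats[]  = { "sched*" };
    static const char *const toy_notrace_pats[] = { "schedule_timeout*" };

    static const struct toy_lists toy_ops_lists = {
        .filter   = toy_filter_pats,
        .nfilter  = 1,
        .notrace  = toy_notrace_pats,
        .nnotrace = 1,
    };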
.. code-block:: c

    int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
                           int len, int reset);

This takes the same parameters as ftrace_set_filter() but will add the
functions it finds to not be traced. This is a separate list from the
filter list, and this function does not modify the filter list.

A non-zero @reset will clear the "notrace" list before adding functions
that match @buf to it.

Clearing the "notrace" list is the same as clearing the filter list:

.. code-block:: c

    ret = ftrace_set_notrace(&ops, NULL, 0, 1);

The filter and notrace lists may be changed at any time. If only a set of
functions should call the callback, it is best to set the filters before
registering the callback. But the changes may also happen after the callback
has been registered.

If a filter is in place, @reset is non-zero, and @buf contains a glob that
matches functions, the switch will happen during the
ftrace_set_filter() call. At no time will all functions call the callback.

.. code-block:: c

    ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

    register_ftrace_function(&ops);

    msleep(10);

    ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1);

is not the same as:

.. code-block:: c

    ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

    register_ftrace_function(&ops);

    msleep(10);

    ftrace_set_filter(&ops, NULL, 0, 1);

    ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0);

The latter has a short window, between the reset and the setting of the
new filter, during which all functions will call the callback.
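The difference between the two sequences can be modelled by treating each
ftrace_set_filter() call as one update that goes live as a whole, and
recording whether an empty (trace-all) filter was ever live while the
callback was registered. This is a hypothetical user-space sketch:
struct toy_live_filter, toy_set_filter(), toy_register() and the two
sequence functions are made-up names, @len is dropped, and msleep() is
omitted because the model has no notion of time:

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    #define TOY_MAX 4

    /* Hypothetical model of the live filter of one registered ops. */
    struct toy_live_filter {
        const char *names[TOY_MAX];
        size_t count;           /* 0 means "trace every function" */
        bool registered;
        bool saw_trace_all;     /* did a trace-all state ever go live? */
    };

    /* One call = one update; a reset and an add go live together. */
    static void toy_set_filter(struct toy_live_filter *f, const char *buf,
                               int reset)
    {
        if (reset)
            f->count = 0;
        if (buf && f->count < TOY_MAX)
            f->names[f->count++] = buf;
        if (f->registered && f->count == 0)
            f->saw_trace_all = true;
    }

    static void toy_register(struct toy_live_filter *f)
    {
        f->registered = true;
        if (f->count == 0)
            f->saw_trace_all = true;
    }

    /* First sequence: replace the filter in a single call. */
    static bool toy_replace_in_one_call(void)
    {
        struct toy_live_filter f = { 0 };

        toy_set_filter(&f, "schedule", 1);
        toy_register(&f);
        toy_set_filter(&f, "try_to_wake_up", 1);
        return f.saw_trace_all;
    }

    /* Second sequence: reset first, then add. */
    static bool toy_reset_then_add(void)
    {
        struct toy_live_filter f = { 0 };

        toy_set_filter(&f, "schedule", 1);
        toy_register(&f);
        toy_set_filter(&f, NULL, 1);    /* empty list goes live here */
        toy_set_filter(&f, "try_to_wake_up", 0);
        return f.saw_trace_all;
    }

The same model also shows why setting the filter before registering is
preferable: toy_register() on an empty list immediately records a
trace-all state.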