Commit | Line | Data |
---|---|---|
b4d94210 SR |
1 | ================================= |
2 | Using ftrace to hook to functions | |
3 | ================================= | |
4 | ||
5 | .. Copyright 2017 VMware Inc. | |
6 | .. Author: Steven Rostedt <srostedt@goodmis.org> | |
7 | .. License: The GNU Free Documentation License, Version 1.2 | |
8 | .. (dual licensed under the GPL v2) | |
9 | ||
10 | Written for: 4.14 | |
11 | ||
12 | Introduction | |
13 | ============ | |
14 | ||
e37274fa | 15 | The ftrace infrastructure was originally created to attach callbacks to the |
b4d94210 SR |
16 | beginning of functions in order to record and trace the flow of the kernel. |
17 | But callbacks to the start of a function can have other use cases, such as | |
18 | live kernel patching or security monitoring. This document describes | |
19 | how to use ftrace to implement your own function callbacks. | |
20 | ||
21 | ||
22 | The ftrace context | |
23 | ================== | |
b3fdd1f9 | 24 | .. warning:: |
b4d94210 | 25 | |
b3fdd1f9 CD |
26 | The ability to add a callback to almost any function within the |
27 | kernel comes with risks. A callback can be called from any context | |
28 | (normal, softirq, irq, and NMI). Callbacks can also be called just before | |
29 | going to idle, during CPU bring up and takedown, or going to user space. | |
30 | This requires extra care about what can be done inside a callback. A callback | |
31 | can be called outside the protective scope of RCU. | |
b4d94210 | 32 | |
e37274fa | 33 | The ftrace infrastructure has some protections against recursions and RCU |
b4d94210 SR |
34 | but one must still be very careful how they use the callbacks. |
35 | ||
36 | ||
37 | The ftrace_ops structure | |
38 | ======================== | |
39 | ||
40 | To register a function callback, an ftrace_ops is required. This structure | |
41 | is used to tell ftrace what function should be called as the callback | |
42 | as well as what protections the callback will perform and not require | |
43 | ftrace to handle. | |
44 | ||
45 | Only one field needs to be set when registering | |
2cd6ff4a | 46 | an ftrace_ops with ftrace: |
b4d94210 | 47 | |
2cd6ff4a | 48 | .. code-block:: c |
b4d94210 SR |
49 | |
50 | struct ftrace_ops ops = { | |
51 | .func = my_callback_func, | |
52 | .flags = MY_FTRACE_FLAGS, | |
53 | .private = any_private_data_structure, | |
54 | }; | |
55 | ||
56 | Both .flags and .private are optional. Only .func is required. | |
57 | ||
b3fdd1f9 | 58 | To enable tracing call: |
b4d94210 SR |
59 | |
60 | .. c:function:: register_ftrace_function(&ops); | |
61 | ||
b3fdd1f9 | 62 | To disable tracing call: |
b4d94210 SR |
63 | |
64 | .. c:function:: unregister_ftrace_function(&ops); | |
65 | ||
b3fdd1f9 | 66 | The above is defined by including the header: |
b4d94210 SR |
67 | |
68 | .. c:function:: #include <linux/ftrace.h> | |
69 | ||
70 | The registered callback will start being called some time after the | |
71 | register_ftrace_function() is called and before it returns. The exact time | |
72 | that callbacks start being called is dependent upon architecture and scheduling | |
73 | of services. The callback itself will have to handle any synchronization if it | |
74 | must begin at an exact moment. | |
75 | ||
76 | The unregister_ftrace_function() will guarantee that the callback is | |
77 | no longer being called by functions after the unregister_ftrace_function() | |
78 | returns. Note that to perform this guarantee, the unregister_ftrace_function() | |
79 | may take some time to finish. | |
80 | ||
81 | ||
82 | The callback function | |
83 | ===================== | |
84 | ||
2cd6ff4a | 85 | The prototype of the callback function is as follows (as of v4.14): |
b4d94210 | 86 | |
2cd6ff4a | 87 | .. code-block:: c |
b4d94210 | 88 | |
2cd6ff4a MH |
89 | void callback_func(unsigned long ip, unsigned long parent_ip, |
90 | struct ftrace_ops *op, struct pt_regs *regs); | |
b4d94210 SR |
91 | |
92 | @ip | |
93 | This is the instruction pointer of the function that is being traced. | |
94 | (where the fentry or mcount is within the function) | |
95 | ||
96 | @parent_ip | |
97 | This is the instruction pointer of the function that called the | |
98 | function being traced (where the call of the function occurred). | |
99 | ||
100 | @op | |
101 | This is a pointer to ftrace_ops that was used to register the callback. | |
102 | This can be used to pass data to the callback via the private pointer. | |
103 | ||
104 | @regs | |
105 | If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED | |
106 | flags are set in the ftrace_ops structure, then this will be pointing | |
107 | to the pt_regs structure like it would be if a breakpoint was placed | |
108 | at the start of the function where ftrace was tracing. Otherwise it | |
109 | either contains garbage, or NULL. | |
110 | ||
111 | ||
112 | The ftrace FLAGS | |
113 | ================ | |
114 | ||
115 | The ftrace_ops flags are all defined and documented in include/linux/ftrace.h. | |
116 | Some of the flags are used for internal infrastructure of ftrace, but the | |
117 | ones that users should be aware of are the following: | |
118 | ||
119 | FTRACE_OPS_FL_SAVE_REGS | |
120 | If the callback requires reading or modifying the pt_regs | |
121 | passed to the callback, then it must set this flag. Registering | |
122 | an ftrace_ops with this flag set on an architecture that does not | |
123 | support passing of pt_regs to the callback will fail. | |
124 | ||
125 | FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED | |
126 | Similar to SAVE_REGS but the registering of an | |
127 | ftrace_ops on an architecture that does not support passing of regs | |
128 | will not fail with this flag set. But the callback must check if | |
129 | regs is NULL or not to determine if the architecture supports it. | |
130 | ||
131 | FTRACE_OPS_FL_RECURSION_SAFE | |
132 | By default, a wrapper is added around the callback to | |
133 | make sure that recursion of the function does not occur. That is, | |
134 | if a function that is called as a result of the callback's execution | |
135 | is also traced, ftrace will prevent the callback from being called | |
136 | again. But this wrapper adds some overhead, and if the callback is | |
137 | safe from recursion, it can set this flag to disable the ftrace | |
138 | protection. | |
139 | ||
140 | Note, if this flag is set, and recursion does occur, it could cause | |
141 | the system to crash, and possibly reboot via a triple fault. | |
142 | ||
143 | It is OK if another callback traces a function that is called by a | |
144 | callback that is marked recursion safe. Recursion safe callbacks | |
145 | must never trace any functions that are called by the callback | |
146 | itself or any nested functions that those functions call. | |
147 | ||
148 | If this flag is set, it is possible that the callback will also | |
149 | be called with preemption enabled (when CONFIG_PREEMPT is set), | |
150 | but this is not guaranteed. | |
151 | ||
152 | FTRACE_OPS_FL_IPMODIFY | |
153 | Requires FTRACE_OPS_FL_SAVE_REGS set. If the callback is to "hijack" | |
154 | the traced function (have another function called instead of the | |
155 | traced function), it requires setting this flag. This is what live | |
156 | kernel patching uses. Without this flag the pt_regs->ip can not be | |
157 | modified. | |
158 | ||
159 | Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be | |
160 | registered to any given function at a time. | |
161 | ||
162 | FTRACE_OPS_FL_RCU | |
163 | If this is set, then the callback will only be called by functions | |
164 | where RCU is "watching". This is required if the callback function | |
165 | performs any rcu_read_lock() operation. | |
166 | ||
167 | RCU stops watching when the system goes idle, the time when a CPU | |
168 | is taken down and comes back online, and when entering from kernel | |
169 | to user space and back to kernel space. During these transitions, | |
170 | a callback may be executed and RCU synchronization will not protect | |
171 | it. | |
172 | ||
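The recursion protection described under FTRACE_OPS_FL_RECURSION_SAFE can be
illustrated with a small userspace model. This is only a sketch of the idea:
every name in it (``in_callback``, ``guarded_callback()``, ``traced_helper()``)
is hypothetical and does not exist in the kernel, and the kernel's actual
bookkeeping is more involved and is not shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
static _Thread_local bool in_callback;	/* per-thread "already inside" marker */
static int calls;			/* how many times the callback body ran */

static void traced_helper(void);

/* The callback body calls a helper that is itself "traced". */
static void my_callback(void)
{
	calls++;
	traced_helper();
}

/* Model of the protective wrapper: nested invocations are dropped. */
static void guarded_callback(void)
{
	if (in_callback)
		return;
	in_callback = true;
	my_callback();
	in_callback = false;
}

/* A "traced" function: entering it invokes the guarded callback. */
static void traced_helper(void)
{
	guarded_callback();
}
```

In this model, calling ``traced_helper()`` runs the callback body once; the
nested entry made from inside the callback is ignored. Calling ``my_callback()``
directly from ``traced_helper()`` instead would recurse without bound, which
corresponds to the crash the flag description warns about.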
173 | ||
174 | Filtering which functions to trace | |
175 | ================================== | |
176 | ||
177 | If a callback is only to be called from specific functions, a filter must be | |
178 | set up. The filters are added by name, or ip if it is known. | |
179 | ||
2cd6ff4a | 180 | .. code-block:: c |
b4d94210 | 181 | |
2cd6ff4a MH |
182 | int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf, |
183 | int len, int reset); | |
b4d94210 SR |
184 | |
185 | @ops | |
186 | The ops to set the filter with | |
187 | ||
188 | @buf | |
189 | The string that holds the function filter text. | |
190 | @len | |
191 | The length of the string. | |
192 | ||
193 | @reset | |
194 | Non-zero to reset all filters before applying this filter. | |
195 | ||
196 | Filters denote which functions should be enabled when tracing is enabled. | |
197 | If @buf is NULL and @reset is non-zero, all functions will be enabled for tracing. | |
198 | ||
199 | The @buf can also be a glob expression to enable all functions that | |
200 | match a specific pattern. | |
201 | ||
5fb94e9c | 202 | See Filter Commands in :file:`Documentation/trace/ftrace.rst`. |
b4d94210 | 203 | |
b3fdd1f9 | 204 | To just trace the schedule function: |
b4d94210 | 205 | |
2cd6ff4a | 206 | .. code-block:: c |
b4d94210 | 207 | |
2cd6ff4a | 208 | ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0); |
b4d94210 SR |
209 | |
210 | To add more functions, call the ftrace_set_filter() more than once with the | |
211 | @reset parameter set to zero. To remove the current filter set and replace it | |
212 | with new functions defined by @buf, have @reset be non-zero. | |
213 | ||
b3fdd1f9 | 214 | To remove all the filtered functions and trace all functions: |
b4d94210 | 215 | |
2cd6ff4a | 216 | .. code-block:: c |
b4d94210 | 217 | |
2cd6ff4a | 218 | ret = ftrace_set_filter(&ops, NULL, 0, 1); |
b4d94210 SR |
219 | |
220 | ||
221 | Sometimes more than one function has the same name. To trace just a specific | |
222 | function in this case, ftrace_set_filter_ip() can be used. | |
223 | ||
2cd6ff4a | 224 | .. code-block:: c |
b4d94210 | 225 | |
2cd6ff4a | 226 | ret = ftrace_set_filter_ip(&ops, ip, 0, 0); |
b4d94210 SR |
227 | |
228 | Note that the ip must be the address where the call to fentry or mcount is | |
229 | located in the function. This function is used by perf and kprobes, which | |
230 | get the ip address from the user (usually using debug info from the kernel). | |
231 | ||
232 | If a glob is used to set the filter, functions can be added to a "notrace" | |
233 | list that will prevent those functions from calling the callback. | |
234 | The "notrace" list takes precedence over the "filter" list. If the | |
235 | two lists are non-empty and contain the same functions, the callback will not | |
236 | be called by any function. | |
237 | ||
238 | An empty "notrace" list means to allow all functions defined by the filter | |
239 | to be traced. | |
240 | ||
2cd6ff4a | 241 | .. code-block:: c |
b4d94210 | 242 | |
2cd6ff4a MH |
243 | int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf, |
244 | int len, int reset); | |
b4d94210 SR |
245 | |
246 | This takes the same parameters as ftrace_set_filter() but will add the | |
247 | functions it finds to not be traced. This is a separate list from the | |
248 | filter list, and this function does not modify the filter list. | |
249 | ||
250 | A non-zero @reset will clear the "notrace" list before adding functions | |
251 | that match @buf to it. | |
252 | ||
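The interaction between the "filter" and "notrace" lists described above can
be summarized as a small decision function. The following is a userspace
sketch, not kernel code: ``struct model_ops`` and ``model_would_trace()`` are
hypothetical names, and POSIX ``fnmatch()`` stands in for the kernel's own glob
matching, whose exact syntax may differ.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one ops' two lists (NULL-terminated). */
struct model_ops {
	const char *const *filter;	/* empty list: every function passes */
	const char *const *notrace;	/* empty list: nothing is excluded */
};

/* True if @func matches any glob in @list. */
static bool list_matches(const char *const *list, const char *func)
{
	for (; list && *list; list++)
		if (fnmatch(*list, func, 0) == 0)
			return true;
	return false;
}

static bool list_empty(const char *const *list)
{
	return !list || !*list;
}

/* Would the callback be called from @func under the documented rules? */
static bool model_would_trace(const struct model_ops *ops, const char *func)
{
	if (list_matches(ops->notrace, func))
		return false;		/* "notrace" takes precedence */
	return list_empty(ops->filter) || list_matches(ops->filter, func);
}
```

With a filter of ``sched*`` and a notrace entry of ``schedule_timeout``, the
model reports that ``schedule`` is traced while ``schedule_timeout`` is not.
With both lists empty, every function is traced.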
253 | Clearing the "notrace" list is the same as clearing the filter list: | |
254 | ||
2cd6ff4a | 255 | .. code-block:: c |
b4d94210 SR |
256 | |
257 | ret = ftrace_set_notrace(&ops, NULL, 0, 1); | |
258 | ||
259 | The filter and notrace lists may be changed at any time. If only a set of | |
260 | functions should call the callback, it is best to set the filters before | |
261 | registering the callback. But the changes may also happen after the callback | |
262 | has been registered. | |
263 | ||
264 | If a filter is in place, and the @reset is non-zero, and @buf contains a | |
265 | matching glob to functions, the switch will happen during the time of | |
266 | the ftrace_set_filter() call. At no time will all functions call the callback. | |
267 | ||
2cd6ff4a | 268 | .. code-block:: c |
b4d94210 | 269 | |
2cd6ff4a | 270 | ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1); |
b4d94210 | 271 | |
2cd6ff4a | 272 | register_ftrace_function(&ops); |
b4d94210 | 273 | |
2cd6ff4a | 274 | msleep(10); |
b4d94210 | 275 | |
2cd6ff4a | 276 | ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1); |
b4d94210 SR |
277 | |
278 | is not the same as: | |
279 | ||
2cd6ff4a | 280 | .. code-block:: c |
b4d94210 | 281 | |
2cd6ff4a | 282 | ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1); |
b4d94210 | 283 | |
2cd6ff4a | 284 | register_ftrace_function(&ops); |
b4d94210 | 285 | |
2cd6ff4a | 286 | msleep(10); |
b4d94210 | 287 | |
2cd6ff4a | 288 | ftrace_set_filter(&ops, NULL, 0, 1); |
b4d94210 | 289 | |
2cd6ff4a | 290 | ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0); |
b4d94210 SR |
291 | |
292 | As the latter will have a short time where all functions will call | |
293 | the callback, between the time of the reset, and the time of the | |
294 | new setting of the filter. |
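
The difference between the two sequences can be made concrete with a userspace
model that records whether the filter list was ever left empty (meaning "trace
all functions") once the ops was registered. This is a sketch under stated
assumptions: ``struct model_filter``, ``model_set_filter()`` and
``model_register()`` are hypothetical names, and each call is treated as a
single step whose result becomes visible only when it completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX 8

/* Hypothetical model: a filter list plus a record of whether it was ever
 * observed empty ("all functions traced") while registered. */
struct model_filter {
	const char *names[MODEL_MAX];
	size_t count;
	bool registered;
	bool saw_trace_all;
};

/* Mirrors the documented (buf, reset) semantics; each call is one step. */
static void model_set_filter(struct model_filter *f, const char *buf, int reset)
{
	if (reset)
		f->count = 0;
	if (buf && f->count < MODEL_MAX)
		f->names[f->count++] = buf;
	/* State that is visible once the call completes. */
	if (f->registered && f->count == 0)
		f->saw_trace_all = true;
}

static void model_register(struct model_filter *f)
{
	f->registered = true;
	if (f->count == 0)
		f->saw_trace_all = true;
}
```

Replaying the first sequence (a single call with a non-zero reset and a new
name) never leaves the registered model with an empty list. Replaying the
second sequence (a reset with a NULL buffer followed by a separate add) does,
which is the short window described above.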