=============================
BPF Kernel Functions (kfuncs)
=============================

1. Introduction
===============

BPF Kernel Functions, more commonly known as kfuncs, are functions in the Linux
kernel that are exposed for use by BPF programs. Unlike normal BPF helpers,
kfuncs do not have a stable interface and can change from one kernel release to
another. Hence, BPF programs that use them may need to be updated in response
to changes in the kernel.

2. Defining a kfunc
===================

There are two ways to expose a kernel function to BPF programs: either make an
existing function in the kernel visible, or add a new wrapper for BPF. In both
cases, care must be taken that a BPF program can only call such a function in a
valid context. To enforce this, visibility of a kfunc can be restricted per
program type.

If you are not creating a BPF wrapper for an existing kernel function, skip
ahead to :ref:`BPF_kfunc_nodef`.

2.1 Creating a wrapper kfunc
----------------------------

When defining a wrapper kfunc, the wrapper function should have extern linkage.
This prevents the compiler from optimizing away dead code, as this wrapper kfunc
is not invoked anywhere in the kernel itself. It is not necessary to provide a
prototype in a header for the wrapper kfunc.

An example is given below::

        /* Disables missing prototype warnings */
        __diag_push();
        __diag_ignore_all("-Wmissing-prototypes",
                          "Global kfuncs as their definitions will be in BTF");

        struct task_struct *bpf_find_get_task_by_vpid(pid_t nr)
        {
                return find_get_task_by_vpid(nr);
        }

        __diag_pop();

A wrapper kfunc is often needed when we need to annotate parameters of the
kfunc. Otherwise one may directly make the kfunc visible to the BPF program by
registering it with the BPF subsystem. See :ref:`BPF_kfunc_nodef`.

2.2 Annotating kfunc parameters
-------------------------------

Similar to BPF helpers, the verifier sometimes needs additional context to make
the usage of kernel functions safer and more useful. Hence, we can annotate a
parameter by suffixing the name of the argument of the kfunc with a __tag,
where tag may be one of the supported annotations.

2.2.1 __sz Annotation
---------------------

This annotation is used to indicate a memory and size pair in the argument list.
An example is given below::

        void bpf_memzero(void *mem, int mem__sz)
        {
        ...
        }

Here, the verifier will treat the first argument as a PTR_TO_MEM, and the second
argument as its size. By default, without the __sz annotation, the size of the
type of the pointer is used. Without the __sz annotation, a kfunc cannot accept
a void pointer.

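The body of ``bpf_memzero()`` is elided above. As a rough user-space sketch of
what such a function might do, assuming a plain ``memset()`` from ``<string.h>``
is acceptable and that NULL pointers and non-positive sizes are simply ignored
(both are assumptions made for illustration, not taken from the kernel sources):

.. code-block:: c

        #include <stddef.h>
        #include <string.h>

        /*
         * Hypothetical body for the bpf_memzero() prototype shown above.
         * In the kernel, the verifier has already checked that 'mem' points
         * to at least 'mem__sz' bytes; the checks below are only defensive.
         */
        void bpf_memzero(void *mem, int mem__sz)
        {
                if (!mem || mem__sz <= 0)
                        return;

                memset(mem, 0, (size_t)mem__sz);
        }

The point of the pairing is that the function never has to guess how large the
buffer is: the size travels with the pointer, and the ``__sz`` suffix tells the
verifier which argument carries it.
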
2.2.2 __k Annotation
--------------------

This annotation is only understood for scalar arguments, where it indicates that
the verifier must check that the scalar argument is a known constant, that it
does not indicate a size parameter, and that the value of the constant is
relevant to the safety of the program.

An example is given below::

        void *bpf_obj_new(u32 local_type_id__k, ...)
        {
        ...
        }

Here, bpf_obj_new uses the local_type_id argument to find out the size of that
type ID in the program's BTF and return a sized pointer to it. Each type ID will
have a distinct size, hence it is crucial to treat each such call as distinct
when values don't match during verifier state pruning checks.

Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
size parameter, and the value of the constant matters for program safety, the
__k suffix should be used.

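To see why the value of such a constant matters, consider the following toy
user-space model. It is not how bpf_obj_new is implemented: ``toy_type_size``
is a hypothetical stand-in for the program's BTF, and ``toy_obj_new()`` and its
``size_out`` parameter are invented for illustration only.

.. code-block:: c

        #include <stddef.h>
        #include <stdlib.h>

        /* Hypothetical stand-in for BTF: maps a type ID to an object size. */
        static const size_t toy_type_size[] = { 0, 16, 64 };
        #define TOY_NR_TYPES (sizeof(toy_type_size) / sizeof(toy_type_size[0]))

        /*
         * Toy model of a kfunc taking a __k constant. The size of the
         * returned object depends entirely on the value of the ID, so two
         * calls with different IDs cannot be treated as equivalent.
         */
        void *toy_obj_new(unsigned int local_type_id__k, size_t *size_out)
        {
                if (local_type_id__k == 0 || local_type_id__k >= TOY_NR_TYPES)
                        return NULL;

                *size_out = toy_type_size[local_type_id__k];
                return calloc(1, *size_out);
        }

In this model, a caller that received a 16-byte object and one that received a
64-byte object may safely touch different ranges of memory, which is the reason
the verifier must not merge states that differ only in the constant.
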
.. _BPF_kfunc_nodef:

2.3 Using an existing kernel function
-------------------------------------

When an existing function in the kernel is fit for consumption by BPF programs,
it can be directly registered with the BPF subsystem. However, care must still
be taken to review the context in which it will be invoked by the BPF program
and whether it is safe to do so.

2.4 Annotating kfuncs
---------------------

In addition to kfuncs' arguments, the verifier may need more information about
the type of kfunc(s) being registered with the BPF subsystem. To do so, we
define flags on a set of kfuncs as follows::

        BTF_SET8_START(bpf_task_set)
        BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
        BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
        BTF_SET8_END(bpf_task_set)

This set encodes the BTF ID of each kfunc listed above, and encodes the flags
along with it. Of course, it is also allowed to specify no flags.

2.4.1 KF_ACQUIRE flag
---------------------

The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a
refcounted object. The verifier will then ensure that the pointer to the object
is eventually released using a release kfunc, or transferred to a map using a
referenced kptr (by invoking bpf_kptr_xchg). If not, the verifier rejects the
BPF program: loading succeeds only when no lingering references remain in any
explored state of the program.

2.4.2 KF_RET_NULL flag
----------------------

The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc
may be NULL. Hence, it forces the user to do a NULL check on the pointer
returned from the kfunc before making use of it (dereferencing or passing to
another helper). This flag is often paired with the KF_ACQUIRE flag, but the
two are orthogonal to each other.

2.4.3 KF_RELEASE flag
---------------------

The KF_RELEASE flag is used to indicate that the kfunc releases the pointer
passed in to it. Only one referenced pointer can be passed in. All copies of
the pointer being released are invalidated as a result of invoking a kfunc with
this flag.

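The contract expressed by KF_ACQUIRE, KF_RET_NULL and KF_RELEASE can be
modelled in plain user-space C. The sketch below is only an illustration:
``toy_obj``, ``toy_acquire()``, ``toy_release()`` and ``toy_use()`` are
hypothetical names rather than kernel APIs, and the checks that the verifier
performs statically at load time are replaced here by ordinary runtime code.

.. code-block:: c

        #include <assert.h>
        #include <stddef.h>

        /* Hypothetical refcounted object standing in for a kernel object. */
        struct toy_obj {
                int refcnt;
        };

        /* Models a KF_ACQUIRE | KF_RET_NULL kfunc: may fail and return NULL. */
        struct toy_obj *toy_acquire(struct toy_obj *obj)
        {
                if (!obj || obj->refcnt == 0)
                        return NULL;

                obj->refcnt++;
                return obj;
        }

        /* Models a KF_RELEASE kfunc: drops the reference taken above. */
        void toy_release(struct toy_obj *ref)
        {
                assert(ref && ref->refcnt > 0);
                ref->refcnt--;
        }

        /* Caller-side pattern: NULL check, use, then release on every path. */
        int toy_use(struct toy_obj *obj)
        {
                struct toy_obj *ref = toy_acquire(obj);

                if (!ref)
                        return -1;      /* nothing acquired, nothing to release */

                /* ... use ref here ... */

                toy_release(ref);       /* ref must not be used after this */
                return 0;
        }

A BPF program that skipped the NULL check, or returned without calling the
release function on the success path, would be rejected at load time instead
of misbehaving at runtime.
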
2.4.4 KF_KPTR_GET flag
----------------------

The KF_KPTR_GET flag is used to indicate that the kfunc takes the first argument
as a pointer to kptr, safely increments the refcount of the object it points to,
and returns a reference to the user. The rest of the arguments may be normal
arguments of a kfunc. The KF_KPTR_GET flag should be used in conjunction with
KF_ACQUIRE and KF_RET_NULL flags.

2.4.5 KF_TRUSTED_ARGS flag
--------------------------

The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
indicates that all pointer arguments are valid, and that all pointers to
BTF objects have been passed in their unmodified form (that is, at a zero
offset, and without having been obtained from walking another pointer).

There are two types of pointers to kernel objects which are considered "valid":

1. Pointers which are passed as tracepoint or struct_ops callback arguments.
2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.

Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.

The definition of "valid" pointers is subject to change at any time, and has
absolutely no ABI stability guarantees.

2.4.6 KF_SLEEPABLE flag
-----------------------

The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
be called by sleepable BPF programs (BPF_F_SLEEPABLE).

2.4.7 KF_DESTRUCTIVE flag
--------------------------

The KF_DESTRUCTIVE flag is used to indicate functions whose invocation is
destructive to the system. For example, such a call can result in the system
rebooting or panicking. Because of this, additional restrictions apply to these
calls. At the moment they only require the CAP_SYS_BOOT capability, but more
can be added later.

2.4.8 KF_RCU flag
-----------------

The KF_RCU flag is used for kfuncs which take an RCU pointer as an argument.
When used together with KF_ACQUIRE, it indicates the kfunc should have a
single argument which must be a trusted argument or a MEM_RCU pointer.
The argument may have a reference count of 0 and the kfunc must take this
into consideration.

2.5 Registering the kfuncs
--------------------------

Once the kfunc is prepared for use, the final step to making it visible is
registering it with the BPF subsystem. Registration is done per BPF program
type. An example is shown below::

        BTF_SET8_START(bpf_task_set)
        BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
        BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
        BTF_SET8_END(bpf_task_set)

        static const struct btf_kfunc_id_set bpf_task_kfunc_set = {
                .owner = THIS_MODULE,
                .set   = &bpf_task_set,
        };

        static int init_subsystem(void)
        {
                return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set);
        }
        late_initcall(init_subsystem);

3. Core kfuncs
==============

The BPF subsystem provides a number of "core" kfuncs that are potentially
applicable to a wide variety of different possible use cases and programs.
Those kfuncs are documented here.

3.1 struct task_struct * kfuncs
-------------------------------

There are a number of kfuncs that allow ``struct task_struct *`` objects to be
used as kptrs:

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_task_acquire bpf_task_release

These kfuncs are useful when you want to acquire or release a reference to a
``struct task_struct *`` that was passed as e.g. a tracepoint arg, or a
struct_ops callback arg. For example:

.. code-block:: c

        /**
         * A trivial example tracepoint program that shows how to
         * acquire and release a struct task_struct * pointer.
         */
        SEC("tp_btf/task_newtask")
        int BPF_PROG(task_acquire_release_example, struct task_struct *task, u64 clone_flags)
        {
                struct task_struct *acquired;

                acquired = bpf_task_acquire(task);

                /*
                 * In a typical program you'd do something like store
                 * the task in a map, and the map will automatically
                 * release it later. Here, we release it manually.
                 */
                bpf_task_release(acquired);
                return 0;
        }

----

A BPF program can also look up a task from a pid. This can be useful if the
caller doesn't have a trusted pointer to a ``struct task_struct *`` object that
it can acquire a reference on with bpf_task_acquire().

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_task_from_pid

Here is an example of it being used:

.. code-block:: c

        SEC("tp_btf/task_newtask")
        int BPF_PROG(task_get_pid_example, struct task_struct *task, u64 clone_flags)
        {
                struct task_struct *lookup;

                lookup = bpf_task_from_pid(task->pid);
                if (!lookup)
                        /* A task should always be found, as %task is a tracepoint arg. */
                        return -ENOENT;

                if (lookup->pid != task->pid) {
                        /* bpf_task_from_pid() looks up the task via its
                         * globally-unique pid from the init_pid_ns. Thus,
                         * the pid of the lookup task should always be the
                         * same as the input task.
                         */
                        bpf_task_release(lookup);
                        return -EINVAL;
                }

                /* bpf_task_from_pid() returns an acquired reference,
                 * so it must be dropped before returning from the
                 * tracepoint handler.
                 */
                bpf_task_release(lookup);
                return 0;
        }