Commit | Line | Data |
---|---|---|
ffcf7ce9 YS |
1 | ===================== |
2 | BPF Type Format (BTF) | |
3 | ===================== | |
4 | ||
5 | 1. Introduction | |
6 | *************** | |
7 | ||
9ab5305d AN |
8 | BTF (BPF Type Format) is the metadata format which encodes the debug info |
9 | related to BPF program/map. The name BTF was used initially to describe data | |
10 | types. The BTF was later extended to include function info for defined | |
11 | subroutines, and line info for source/line information. | |
12 | ||
13 | The debug info is used for map pretty print, function signature, etc. The | |
14 | function signature enables better kernel symbols for bpf programs/functions. The | |
15 | line info helps generate source-annotated translated byte code, jited code and | |
16 | verifier log. | |
ffcf7ce9 YS |
17 | |
18 | The BTF specification contains two parts: | |
19 | * BTF kernel API | |
20 | * BTF ELF file format | |
21 | ||
9ab5305d AN |
22 | The kernel API is the contract between user space and kernel. The kernel |
23 | verifies the BTF info before using it. The ELF file format is a user space | |
24 | contract between ELF file and libbpf loader. | |
ffcf7ce9 | 25 | |
9ab5305d AN |
26 | The type and string sections are part of the BTF kernel API, describing the |
27 | debug info (mostly types related) referenced by the bpf program. These two | |
28 | sections are discussed in detail in :ref:`BTF_Type_String`. | |
ffcf7ce9 YS |
29 | |
30 | .. _BTF_Type_String: | |
31 | ||
32 | 2. BTF Type and String Encoding | |
33 | ******************************* | |
34 | ||
9ab5305d AN |
35 | The file ``include/uapi/linux/btf.h`` provides a high-level definition of how |
36 | types/strings are encoded. | |
ffcf7ce9 YS |
37 | |
38 | The beginning of data blob must be:: | |
39 | ||
40 | struct btf_header { | |
41 | __u16 magic; | |
42 | __u8 version; | |
43 | __u8 flags; | |
44 | __u32 hdr_len; | |
45 | ||
46 | /* All offsets are in bytes relative to the end of this header */ | |
47 | __u32 type_off; /* offset of type section */ | |
48 | __u32 type_len; /* length of type section */ | |
49 | __u32 str_off; /* offset of string section */ | |
50 | __u32 str_len; /* length of string section */ | |
51 | }; | |
52 | ||
53 | The magic is ``0xeB9F``, which has different encoding for big and little | |
9ab5305d AN |
54 | endian systems, and can be used to test whether BTF is generated for big- or |
55 | little-endian target. The ``btf_header`` is designed to be extensible with | |
56 | ``hdr_len`` equal to ``sizeof(struct btf_header)`` when a data blob is | |
57 | generated. | |
ffcf7ce9 YS |
58 | |
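As a rough illustration of the header rules above, the following sketch checks
the magic and locates the type and string sections inside a blob. The names
``btf_locate_sections``, ``struct btf_sections`` and ``struct
btf_header_sketch`` are hypothetical; the header is re-declared locally with
fixed-width types instead of including ``linux/btf.h``, and a byte-swapped
magic is only reported, not converted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BTF_MAGIC         0xeB9F
#define SKETCH_BTF_MAGIC_SWAPPED 0x9FeB

/* Local mirror of struct btf_header, for illustration only. */
struct btf_header_sketch {
    uint16_t magic;
    uint8_t  version;
    uint8_t  flags;
    uint32_t hdr_len;
    uint32_t type_off;
    uint32_t type_len;
    uint32_t str_off;
    uint32_t str_len;
};

/* Hypothetical result type: where the two sections live inside the blob. */
struct btf_sections {
    const uint8_t *types;
    uint32_t type_len;
    const char *strs;
    uint32_t str_len;
};

/* Returns 0 on success, -1 for a malformed blob, and -2 for a blob whose
 * magic is byte-swapped (i.e. generated for the other endianness).
 */
static int btf_locate_sections(const void *blob, size_t blob_len,
                               struct btf_sections *out)
{
    struct btf_header_sketch hdr;
    size_t avail;

    assert(blob != NULL && out != NULL);
    if (blob_len < sizeof(hdr))
        return -1;
    memcpy(&hdr, blob, sizeof(hdr));

    if (hdr.magic == SKETCH_BTF_MAGIC_SWAPPED)
        return -2;
    if (hdr.magic != SKETCH_BTF_MAGIC)
        return -1;
    /* hdr_len may exceed sizeof(hdr) if the header was extended. */
    if (hdr.hdr_len < sizeof(hdr) || hdr.hdr_len > blob_len)
        return -1;

    /* Section offsets are relative to the end of the header. */
    avail = blob_len - hdr.hdr_len;
    if (hdr.type_off > avail || hdr.type_len > avail - hdr.type_off)
        return -1;
    if (hdr.str_off > avail || hdr.str_len > avail - hdr.str_off)
        return -1;

    out->types = (const uint8_t *)blob + hdr.hdr_len + hdr.type_off;
    out->type_len = hdr.type_len;
    out->strs = (const char *)blob + hdr.hdr_len + hdr.str_off;
    out->str_len = hdr.str_len;
    return 0;
}
```

Using ``hdr_len`` rather than ``sizeof`` as the base for the offsets is what
lets a reader built against an older header skip fields it does not know.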
59 | 2.1 String Encoding | |
60 | =================== | |
61 | ||
9ab5305d AN |
62 | The first string in the string section must be a null string. The rest of the |
63 | string table is a concatenation of other null-terminated strings. | |
ffcf7ce9 YS |
64 | |
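A lookup into such a string table can be sketched as below. The helper name
``btf_str_at`` is hypothetical; it assumes the caller already has a pointer to
the string section and its length, and it refuses offsets that are out of range
or that point at a string with no terminator inside the section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the string starting at byte offset `off`, or NULL if the offset is
 * outside the section or no terminating NUL is found before the section ends.
 */
static const char *btf_str_at(const char *strs, uint32_t str_len, uint32_t off)
{
    assert(strs != NULL);
    if (off >= str_len)
        return NULL;
    if (memchr(strs + off, '\0', str_len - off) == NULL)
        return NULL;
    return strs + off;
}
```

With a table holding ``"\0int\0char"``, offset ``0`` yields the empty string,
offset ``1`` yields ``int`` and offset ``5`` yields ``char``.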
65 | 2.2 Type Encoding | |
66 | ================= | |
67 | ||
9ab5305d AN |
68 | The type id ``0`` is reserved for ``void`` type. The type section is parsed |
69 | sequentially and type id is assigned to each recognized type starting from id | |
70 | ``1``. Currently, the following types are supported:: | |
ffcf7ce9 YS |
71 | |
72 | #define BTF_KIND_INT 1 /* Integer */ | |
73 | #define BTF_KIND_PTR 2 /* Pointer */ | |
74 | #define BTF_KIND_ARRAY 3 /* Array */ | |
75 | #define BTF_KIND_STRUCT 4 /* Struct */ | |
76 | #define BTF_KIND_UNION 5 /* Union */ | |
77 | #define BTF_KIND_ENUM 6 /* Enumeration */ | |
78 | #define BTF_KIND_FWD 7 /* Forward */ | |
79 | #define BTF_KIND_TYPEDEF 8 /* Typedef */ | |
80 | #define BTF_KIND_VOLATILE 9 /* Volatile */ | |
81 | #define BTF_KIND_CONST 10 /* Const */ | |
82 | #define BTF_KIND_RESTRICT 11 /* Restrict */ | |
83 | #define BTF_KIND_FUNC 12 /* Function */ | |
84 | #define BTF_KIND_FUNC_PROTO 13 /* Function Proto */ | |
f063c889 DB |
85 | #define BTF_KIND_VAR 14 /* Variable */ |
86 | #define BTF_KIND_DATASEC 15 /* Section */ | |
ffcf7ce9 YS |
87 | |
88 | Note that the type section encodes debug info, not just pure types. | |
89 | ``BTF_KIND_FUNC`` is not a type, and it represents a defined subprogram. | |
90 | ||
91 | Each type contains the following common data:: | |
92 | ||
93 | struct btf_type { | |
94 | __u32 name_off; | |
95 | /* "info" bits arrangement | |
96 | * bits 0-15: vlen (e.g. # of struct's members) | |
97 | * bits 16-23: unused | |
98 | * bits 24-27: kind (e.g. int, ptr, array...etc) | |
99 | * bits 28-30: unused | |
100 | * bit 31: kind_flag, currently used by | |
101 | * struct, union and fwd | |
102 | */ | |
103 | __u32 info; | |
104 | /* "size" is used by INT, ENUM, STRUCT, UNION and DATASEC. | |
105 | * "size" tells the size of the type it is describing. | |
106 | * | |
107 | * "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT, | |
108 | * FUNC, FUNC_PROTO and VAR. | |
109 | * "type" is a type_id referring to another type. | |
110 | */ | |
111 | union { | |
112 | __u32 size; | |
113 | __u32 type; | |
114 | }; | |
115 | }; | |
116 | ||
9ab5305d AN |
117 | For certain kinds, the common data are followed by kind-specific data. The |
118 | ``name_off`` in ``struct btf_type`` specifies the offset in the string table. | |
119 | The following sections detail encoding of each kind. | |
ffcf7ce9 YS |
120 | |
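The ``info`` bit layout documented in ``struct btf_type`` above can be
unpacked as sketched below. The helper names are hypothetical and follow only
the bit positions given in the comment (vlen in bits 0-15, kind in bits 24-27,
kind_flag in bit 31).

```c
#include <assert.h>
#include <stdint.h>

/* bits 0-15: number of trailing members/params/values */
static uint32_t btf_info_vlen(uint32_t info)  { return info & 0xffff; }
/* bits 24-27: one of the BTF_KIND_* values */
static uint32_t btf_info_kind(uint32_t info)  { return (info >> 24) & 0x0f; }
/* bit 31: kind_flag */
static uint32_t btf_info_kflag(uint32_t info) { return info >> 31; }

/* Inverse of the three accessors above; unused bits are left as zero. */
static uint32_t btf_info_pack(uint32_t kind, uint32_t vlen, uint32_t kflag)
{
    assert(kind <= 0x0f && vlen <= 0xffff && kflag <= 1);
    return (kflag << 31) | (kind << 24) | vlen;
}
```

For example, a struct (kind 4) with three members and ``kind_flag`` set packs
to ``0x84000003``.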
121 | 2.2.1 BTF_KIND_INT | |
122 | ~~~~~~~~~~~~~~~~~~ | |
123 | ||
124 | ``struct btf_type`` encoding requirement: | |
125 | * ``name_off``: any valid offset | |
126 | * ``info.kind_flag``: 0 | |
127 | * ``info.kind``: BTF_KIND_INT | |
128 | * ``info.vlen``: 0 | |
129 | * ``size``: the size of the int type in bytes. | |
130 | ||
5efc529f | 131 | ``btf_type`` is followed by a ``u32`` with the following bits arrangement:: |
ffcf7ce9 YS |
132 | |
133 | #define BTF_INT_ENCODING(VAL) (((VAL) & 0x0f000000) >> 24) | |
948dc8c9 | 134 | #define BTF_INT_OFFSET(VAL) (((VAL) & 0x00ff0000) >> 16) |
ffcf7ce9 YS |
135 | #define BTF_INT_BITS(VAL) ((VAL) & 0x000000ff) |
136 | ||
137 | The ``BTF_INT_ENCODING`` has the following attributes:: | |
138 | ||
139 | #define BTF_INT_SIGNED (1 << 0) | |
140 | #define BTF_INT_CHAR (1 << 1) | |
141 | #define BTF_INT_BOOL (1 << 2) | |
142 | ||
9ab5305d AN |
143 | The ``BTF_INT_ENCODING()`` provides extra information: signedness, char, or |
144 | bool, for the int type. The char and bool encoding are mostly useful for | |
145 | pretty print. At most one encoding can be specified for the int type. | |
146 | ||
147 | The ``BTF_INT_BITS()`` specifies the number of actual bits held by this int | |
148 | type. For example, a 4-bit bitfield encodes ``BTF_INT_BITS()`` equal to 4. | |
149 | The ``btf_type.size * 8`` must be equal to or greater than ``BTF_INT_BITS()`` | |
150 | for the type. The maximum value of ``BTF_INT_BITS()`` is 128. | |
151 | ||
152 | The ``BTF_INT_OFFSET()`` specifies the starting bit offset to calculate values | |
f52c97d9 | 153 | for this int. For example, a bitfield struct member has: |
d857a3ff | 154 | |
f52c97d9 JDB |
155 | * btf member bit offset 100 from the start of the structure, |
156 | * btf member pointing to an int type, | |
157 | * the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4`` | |
ffcf7ce9 | 158 | |
9ab5305d AN |
159 | Then in the struct memory layout, this member will occupy ``4`` bits starting |
160 | from bits ``100 + 2 = 102``. | |
ffcf7ce9 | 161 | |
9ab5305d AN |
162 | Alternatively, the bitfield struct member can be the following to access the |
163 | same bits as the above: | |
d857a3ff | 164 | |
ffcf7ce9 YS |
165 | * btf member bit offset 102, |
166 | * btf member pointing to an int type, | |
167 | * the int type has ``BTF_INT_OFFSET() = 0`` and ``BTF_INT_BITS() = 4`` | |
168 | ||
9ab5305d AN |
169 | The original intention of ``BTF_INT_OFFSET()`` is to provide flexibility of |
170 | bitfield encoding. Currently, both llvm and pahole generate | |
171 | ``BTF_INT_OFFSET() = 0`` for all int types. | |
ffcf7ce9 YS |
172 | |
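The int encoding rules in this section can be exercised with a small sketch.
The three ``BTF_INT_*()`` macros are copied from above; the helpers
``btf_int_pack``, ``btf_int_bits_ok`` and ``member_first_bit`` are
hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BTF_INT_ENCODING(VAL) (((VAL) & 0x0f000000) >> 24)
#define BTF_INT_OFFSET(VAL)   (((VAL) & 0x00ff0000) >> 16)
#define BTF_INT_BITS(VAL)     ((VAL) & 0x000000ff)
#define BTF_INT_SIGNED        (1 << 0)

/* Build the u32 that follows btf_type for an int. */
static uint32_t btf_int_pack(uint32_t encoding, uint32_t offset, uint32_t bits)
{
    assert(encoding <= 0x0f && offset <= 0xff && bits <= 0xff);
    return (encoding << 24) | (offset << 16) | bits;
}

/* Checks the two size rules stated above: at most 128 bits, and the bits
 * must fit into btf_type.size bytes.
 */
static int btf_int_bits_ok(uint32_t size_bytes, uint32_t int_data)
{
    uint64_t bits = BTF_INT_BITS(int_data);

    return bits <= 128 && bits <= (uint64_t)size_bytes * 8;
}

/* First bit occupied by a bitfield member when kind_flag is not set. */
static uint32_t member_first_bit(uint32_t member_bit_offset, uint32_t int_data)
{
    return member_bit_offset + BTF_INT_OFFSET(int_data);
}
```

Both encodings from the example above resolve to the same position:
``member_first_bit(100, ...)`` with ``BTF_INT_OFFSET() = 2`` and
``member_first_bit(102, ...)`` with ``BTF_INT_OFFSET() = 0`` both give bit
``102``.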
173 | 2.2.2 BTF_KIND_PTR | |
174 | ~~~~~~~~~~~~~~~~~~ | |
175 | ||
176 | ``struct btf_type`` encoding requirement: | |
177 | * ``name_off``: 0 | |
178 | * ``info.kind_flag``: 0 | |
179 | * ``info.kind``: BTF_KIND_PTR | |
180 | * ``info.vlen``: 0 | |
181 | * ``type``: the pointee type of the pointer | |
182 | ||
183 | No additional type data follow ``btf_type``. | |
184 | ||
185 | 2.2.3 BTF_KIND_ARRAY | |
186 | ~~~~~~~~~~~~~~~~~~~~ | |
187 | ||
188 | ``struct btf_type`` encoding requirement: | |
189 | * ``name_off``: 0 | |
190 | * ``info.kind_flag``: 0 | |
191 | * ``info.kind``: BTF_KIND_ARRAY | |
192 | * ``info.vlen``: 0 | |
193 | * ``size/type``: 0, not used | |
194 | ||
5efc529f | 195 | ``btf_type`` is followed by one ``struct btf_array``:: |
ffcf7ce9 YS |
196 | |
197 | struct btf_array { | |
198 | __u32 type; | |
199 | __u32 index_type; | |
200 | __u32 nelems; | |
201 | }; | |
202 | ||
203 | The ``struct btf_array`` encoding: | |
204 | * ``type``: the element type | |
205 | * ``index_type``: the index type | |
206 | * ``nelems``: the number of elements for this array (``0`` is also allowed). | |
207 | ||
9ab5305d AN |
208 | The ``index_type`` can be any regular int type (``u8``, ``u16``, ``u32``, |
209 | ``u64``, ``unsigned __int128``). The original design of including | |
210 | ``index_type`` follows DWARF, which has an ``index_type`` for its array type. | |
ffcf7ce9 YS |
211 | Currently in BTF, beyond type verification, the ``index_type`` is not used. |
212 | ||
213 | The ``struct btf_array`` allows chaining through element type to represent | |
9ab5305d AN |
214 | multidimensional arrays. For example, for ``int a[5][6]``, the following type |
215 | information illustrates the chaining: | |
ffcf7ce9 YS |
216 | |
217 | * [1]: int | |
218 | * [2]: array, ``btf_array.type = [1]``, ``btf_array.nelems = 6`` | |
219 | * [3]: array, ``btf_array.type = [2]``, ``btf_array.nelems = 5`` | |
220 | ||
9ab5305d AN |
221 | Currently, both pahole and llvm collapse a multidimensional array into a |
222 | one-dimensional array, e.g., for ``a[5][6]``, the ``btf_array.nelems`` is | |
223 | equal to ``30``. This is because the original use case is map pretty print | |
224 | where the whole array is dumped out, so a one-dimensional array is enough. As | |
225 | more BTF usage is explored, pahole and llvm can be changed to generate proper | |
226 | chained representation for multidimensional arrays. | |
ffcf7ce9 YS |
227 | |
228 | 2.2.4 BTF_KIND_STRUCT | |
229 | ~~~~~~~~~~~~~~~~~~~~~ | |
230 | 2.2.5 BTF_KIND_UNION | |
231 | ~~~~~~~~~~~~~~~~~~~~ | |
232 | ||
233 | ``struct btf_type`` encoding requirement: | |
234 | * ``name_off``: 0 or offset to a valid C identifier | |
235 | * ``info.kind_flag``: 0 or 1 | |
236 | * ``info.kind``: BTF_KIND_STRUCT or BTF_KIND_UNION | |
237 | * ``info.vlen``: the number of struct/union members | |
238 | * ``size``: the size of the struct/union in bytes | |
239 | ||
240 | ``btf_type`` is followed by ``info.vlen`` number of ``struct btf_member``.:: | |
241 | ||
242 | struct btf_member { | |
243 | __u32 name_off; | |
244 | __u32 type; | |
245 | __u32 offset; | |
246 | }; | |
247 | ||
248 | ``struct btf_member`` encoding: | |
249 | * ``name_off``: offset to a valid C identifier | |
250 | * ``type``: the member type | |
251 | * ``offset``: <see below> | |
252 | ||
9ab5305d AN |
253 | If the type info ``kind_flag`` is not set, the offset contains only bit offset |
254 | of the member. Note that the base type of the bitfield can only be int or enum | |
255 | type. If the bitfield size is 32, the base type can be either int or enum | |
256 | type. If the bitfield size is not 32, the base type must be int, and int type | |
257 | ``BTF_INT_BITS()`` encodes the bitfield size. | |
ffcf7ce9 | 258 | |
9ab5305d AN |
259 | If the ``kind_flag`` is set, the ``btf_member.offset`` contains both member |
260 | bitfield size and bit offset. The bitfield size and bit offset are calculated | |
261 | as below.:: | |
ffcf7ce9 YS |
262 | |
263 | #define BTF_MEMBER_BITFIELD_SIZE(val) ((val) >> 24) | |
264 | #define BTF_MEMBER_BIT_OFFSET(val) ((val) & 0xffffff) | |
265 | ||
9ab5305d | 266 | In this case, if the base type is an int type, it must be a regular int type: |
ffcf7ce9 YS |
267 | |
268 | * ``BTF_INT_OFFSET()`` must be 0. | |
269 | * ``BTF_INT_BITS()`` must be equal to ``{1,2,4,8,16} * 8``. | |
270 | ||
9ab5305d AN |
271 | The following kernel patch introduced ``kind_flag`` and explained why both |
272 | modes exist: | |
ffcf7ce9 YS |
273 | |
274 | https://github.com/torvalds/linux/commit/9d5f9f701b1891466fb3dbb1806ad97716f95cc3#diff-fa650a64fdd3968396883d2fe8215ff3 | |
275 | ||
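The two interpretations of ``btf_member.offset`` can be sketched as follows.
The two macros are copied from above; ``struct member_pos`` and
``decode_member_offset`` are hypothetical names. When ``kind_flag`` is not
set, the sketch reports a bitfield size of 0 because, as described above, the
size then comes from the member's int type rather than from the offset word.

```c
#include <assert.h>
#include <stdint.h>

#define BTF_MEMBER_BITFIELD_SIZE(val) ((val) >> 24)
#define BTF_MEMBER_BIT_OFFSET(val)    ((val) & 0xffffff)

/* Hypothetical decoded form of btf_member.offset. */
struct member_pos {
    uint32_t bit_offset;     /* offset from the start of the struct, in bits */
    uint32_t bitfield_size;  /* only meaningful when kind_flag is set */
};

static struct member_pos decode_member_offset(uint32_t offset, int kind_flag)
{
    struct member_pos pos;

    assert(kind_flag == 0 || kind_flag == 1);
    if (kind_flag) {
        pos.bitfield_size = BTF_MEMBER_BITFIELD_SIZE(offset);
        pos.bit_offset = BTF_MEMBER_BIT_OFFSET(offset);
    } else {
        /* Plain bit offset; size is taken from BTF_INT_BITS() of the type. */
        pos.bitfield_size = 0;
        pos.bit_offset = offset;
    }
    return pos;
}
```

For instance, an offset word of ``(3 << 24) | 2`` with ``kind_flag`` set
describes a 3-bit field starting at bit 2.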
276 | 2.2.6 BTF_KIND_ENUM | |
277 | ~~~~~~~~~~~~~~~~~~~ | |
278 | ||
279 | ``struct btf_type`` encoding requirement: | |
280 | * ``name_off``: 0 or offset to a valid C identifier | |
281 | * ``info.kind_flag``: 0 | |
282 | * ``info.kind``: BTF_KIND_ENUM | |
283 | * ``info.vlen``: number of enum values | |
284 | * ``size``: 4 | |
285 | ||
286 | ``btf_type`` is followed by ``info.vlen`` number of ``struct btf_enum``.:: | |
287 | ||
288 | struct btf_enum { | |
289 | __u32 name_off; | |
290 | __s32 val; | |
291 | }; | |
292 | ||
293 | The ``btf_enum`` encoding: | |
294 | * ``name_off``: offset to a valid C identifier | |
295 | * ``val``: any value | |
296 | ||
297 | 2.2.7 BTF_KIND_FWD | |
298 | ~~~~~~~~~~~~~~~~~~ | |
299 | ||
300 | ``struct btf_type`` encoding requirement: | |
301 | * ``name_off``: offset to a valid C identifier | |
302 | * ``info.kind_flag``: 0 for struct, 1 for union | |
303 | * ``info.kind``: BTF_KIND_FWD | |
304 | * ``info.vlen``: 0 | |
305 | * ``type``: 0 | |
306 | ||
307 | No additional type data follow ``btf_type``. | |
308 | ||
309 | 2.2.8 BTF_KIND_TYPEDEF | |
310 | ~~~~~~~~~~~~~~~~~~~~~~ | |
311 | ||
312 | ``struct btf_type`` encoding requirement: | |
313 | * ``name_off``: offset to a valid C identifier | |
314 | * ``info.kind_flag``: 0 | |
315 | * ``info.kind``: BTF_KIND_TYPEDEF | |
316 | * ``info.vlen``: 0 | |
317 | * ``type``: the type which can be referred to by the name at ``name_off`` | |
318 | ||
319 | No additional type data follow ``btf_type``. | |
320 | ||
321 | 2.2.9 BTF_KIND_VOLATILE | |
322 | ~~~~~~~~~~~~~~~~~~~~~~~ | |
323 | ||
324 | ``struct btf_type`` encoding requirement: | |
325 | * ``name_off``: 0 | |
326 | * ``info.kind_flag``: 0 | |
327 | * ``info.kind``: BTF_KIND_VOLATILE | |
328 | * ``info.vlen``: 0 | |
329 | * ``type``: the type with ``volatile`` qualifier | |
330 | ||
331 | No additional type data follow ``btf_type``. | |
332 | ||
333 | 2.2.10 BTF_KIND_CONST | |
334 | ~~~~~~~~~~~~~~~~~~~~~ | |
335 | ||
336 | ``struct btf_type`` encoding requirement: | |
337 | * ``name_off``: 0 | |
338 | * ``info.kind_flag``: 0 | |
339 | * ``info.kind``: BTF_KIND_CONST | |
340 | * ``info.vlen``: 0 | |
341 | * ``type``: the type with ``const`` qualifier | |
342 | ||
343 | No additional type data follow ``btf_type``. | |
344 | ||
345 | 2.2.11 BTF_KIND_RESTRICT | |
346 | ~~~~~~~~~~~~~~~~~~~~~~~~ | |
347 | ||
348 | ``struct btf_type`` encoding requirement: | |
349 | * ``name_off``: 0 | |
350 | * ``info.kind_flag``: 0 | |
351 | * ``info.kind``: BTF_KIND_RESTRICT | |
352 | * ``info.vlen``: 0 | |
353 | * ``type``: the type with ``restrict`` qualifier | |
354 | ||
355 | No additional type data follow ``btf_type``. | |
356 | ||
357 | 2.2.12 BTF_KIND_FUNC | |
358 | ~~~~~~~~~~~~~~~~~~~~ | |
359 | ||
360 | ``struct btf_type`` encoding requirement: | |
361 | * ``name_off``: offset to a valid C identifier | |
362 | * ``info.kind_flag``: 0 | |
363 | * ``info.kind``: BTF_KIND_FUNC | |
364 | * ``info.vlen``: 0 | |
365 | * ``type``: a BTF_KIND_FUNC_PROTO type | |
366 | ||
367 | No additional type data follow ``btf_type``. | |
368 | ||
5efc529f | 369 | A BTF_KIND_FUNC defines not a type, but a subprogram (function) whose |
9ab5305d AN |
370 | signature is defined by ``type``. The subprogram is thus an instance of that |
371 | type. The BTF_KIND_FUNC may in turn be referenced by a func_info in the | |
372 | :ref:`BTF_Ext_Section` (ELF) or in the arguments to :ref:`BPF_Prog_Load` | |
373 | (ABI). | |
ffcf7ce9 YS |
374 | |
375 | 2.2.13 BTF_KIND_FUNC_PROTO | |
376 | ~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
377 | ||
378 | ``struct btf_type`` encoding requirement: | |
379 | * ``name_off``: 0 | |
380 | * ``info.kind_flag``: 0 | |
381 | * ``info.kind``: BTF_KIND_FUNC_PROTO | |
382 | * ``info.vlen``: # of parameters | |
383 | * ``type``: the return type | |
384 | ||
385 | ``btf_type`` is followed by ``info.vlen`` number of ``struct btf_param``.:: | |
386 | ||
387 | struct btf_param { | |
388 | __u32 name_off; | |
389 | __u32 type; | |
390 | }; | |
391 | ||
9ab5305d AN |
392 | If a BTF_KIND_FUNC_PROTO type is referred to by a BTF_KIND_FUNC type, then |
393 | ``btf_param.name_off`` must point to a valid C identifier except for the | |
394 | possible last argument representing the variable argument. The | |
395 | ``btf_param.type`` refers to the parameter type. | |
ffcf7ce9 | 396 | |
9ab5305d AN |
397 | If the function has variable arguments, the last parameter is encoded with |
398 | ``name_off = 0`` and ``type = 0``. | |
ffcf7ce9 | 399 | |
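The variable-argument rule above amounts to a check on the last parameter. In
this sketch, ``struct btf_param_sketch`` is a local mirror of ``struct
btf_param`` and ``func_proto_is_variadic`` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct btf_param, for illustration only. */
struct btf_param_sketch {
    uint32_t name_off;
    uint32_t type;
};

/* A prototype is variadic when its last parameter has name_off = 0 and
 * type = 0. `vlen` is info.vlen of the BTF_KIND_FUNC_PROTO type.
 */
static int func_proto_is_variadic(const struct btf_param_sketch *params,
                                  uint32_t vlen)
{
    if (vlen == 0)
        return 0;
    assert(params != NULL);
    return params[vlen - 1].name_off == 0 && params[vlen - 1].type == 0;
}
```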
f063c889 DB |
400 | 2.2.14 BTF_KIND_VAR |
401 | ~~~~~~~~~~~~~~~~~~~ | |
402 | ||
403 | ``struct btf_type`` encoding requirement: | |
404 | * ``name_off``: offset to a valid C identifier | |
405 | * ``info.kind_flag``: 0 | |
406 | * ``info.kind``: BTF_KIND_VAR | |
407 | * ``info.vlen``: 0 | |
408 | * ``type``: the type of the variable | |
409 | ||
410 | ``btf_type`` is followed by a single ``struct btf_var`` with the | |
411 | following data:: | |
412 | ||
413 | struct btf_var { | |
414 | __u32 linkage; | |
415 | }; | |
416 | ||
417 | ``struct btf_var`` encoding: | |
418 | * ``linkage``: currently only ``0`` (static variable) or ``1`` (globally | |
419 | allocated variable in ELF sections) | |
420 | ||
421 | Not all types of global variables are supported by LLVM at this point. | |
422 | The following is currently available: | |
423 | ||
424 | * static variables with or without section attributes | |
425 | * global variables with section attributes | |
426 | ||
427 | The latter is for future extraction of map key/value type id's from a | |
428 | map definition. | |
429 | ||
430 | 2.2.15 BTF_KIND_DATASEC | |
431 | ~~~~~~~~~~~~~~~~~~~~~~~ | |
432 | ||
433 | ``struct btf_type`` encoding requirement: | |
434 | * ``name_off``: offset to a valid name associated with a variable or | |
435 | one of .data/.bss/.rodata | |
436 | * ``info.kind_flag``: 0 | |
437 | * ``info.kind``: BTF_KIND_DATASEC | |
438 | * ``info.vlen``: # of variables | |
439 | * ``size``: total section size in bytes (0 at compilation time, patched | |
440 | to actual size by BPF loaders such as libbpf) | |
441 | ||
442 | ``btf_type`` is followed by ``info.vlen`` number of ``struct btf_var_secinfo``.:: | |
443 | ||
444 | struct btf_var_secinfo { | |
445 | __u32 type; | |
446 | __u32 offset; | |
447 | __u32 size; | |
448 | }; | |
449 | ||
450 | ``struct btf_var_secinfo`` encoding: | |
451 | * ``type``: the type of the BTF_KIND_VAR variable | |
452 | * ``offset``: the in-section offset of the variable | |
453 | * ``size``: the size of the variable in bytes | |
454 | ||
ffcf7ce9 YS |
455 | 3. BTF Kernel API |
456 | ***************** | |
457 | ||
458 | The following bpf syscall commands involve BTF: | |
459 | * BPF_BTF_LOAD: load a blob of BTF data into kernel | |
460 | * BPF_MAP_CREATE: map creation with btf key and value type info. | |
461 | * BPF_PROG_LOAD: prog load with btf function and line info. | |
462 | * BPF_BTF_GET_FD_BY_ID: get a btf fd | |
463 | * BPF_OBJ_GET_INFO_BY_FD: btf, func_info, line_info | |
464 | and other btf related info are returned. | |
465 | ||
466 | The workflow typically looks like: | |
467 | :: | |
468 | ||
469 | Application: | |
470 | BPF_BTF_LOAD | |
471 | | | |
472 | v | |
473 | BPF_MAP_CREATE and BPF_PROG_LOAD | |
474 | | | |
475 | V | |
476 | ...... | |
477 | ||
478 | Introspection tool: | |
479 | ...... | |
480 | BPF_{PROG,MAP}_GET_NEXT_ID (get prog/map id's) | |
481 | | | |
482 | V | |
483 | BPF_{PROG,MAP}_GET_FD_BY_ID (get a prog/map fd) | |
484 | | | |
485 | V | |
486 | BPF_OBJ_GET_INFO_BY_FD (get bpf_prog_info/bpf_map_info with btf_id) | |
487 | | | | |
488 | V | | |
489 | BPF_BTF_GET_FD_BY_ID (get btf_fd) | | |
490 | | | | |
491 | V | | |
492 | BPF_OBJ_GET_INFO_BY_FD (get btf) | | |
493 | | | | |
494 | V V | |
495 | pretty print types, dump func signatures and line info, etc. | |
496 | ||
497 | ||
498 | 3.1 BPF_BTF_LOAD | |
499 | ================ | |
500 | ||
9ab5305d AN |
501 | Load a blob of BTF data into kernel. A blob of data, described in |
502 | :ref:`BTF_Type_String`, can be directly loaded into the kernel. A ``btf_fd`` | |
503 | is returned to userspace. | |
ffcf7ce9 YS |
504 | |
505 | 3.2 BPF_MAP_CREATE | |
506 | ================== | |
507 | ||
508 | A map can be created with ``btf_fd`` and specified key/value type id.:: | |
509 | ||
510 | __u32 btf_fd; /* fd pointing to a BTF type data */ | |
511 | __u32 btf_key_type_id; /* BTF type_id of the key */ | |
512 | __u32 btf_value_type_id; /* BTF type_id of the value */ | |
513 | ||
514 | In libbpf, the map can be defined with extra annotation like below: | |
515 | :: | |
516 | ||
517 | struct bpf_map_def SEC("maps") btf_map = { | |
518 | .type = BPF_MAP_TYPE_ARRAY, | |
519 | .key_size = sizeof(int), | |
520 | .value_size = sizeof(struct ipv_counts), | |
521 | .max_entries = 4, | |
522 | }; | |
523 | BPF_ANNOTATE_KV_PAIR(btf_map, int, struct ipv_counts); | |
524 | ||
9ab5305d AN |
525 | Here, the parameters for macro BPF_ANNOTATE_KV_PAIR are map name, key and |
526 | value types for the map. During ELF parsing, libbpf is able to extract | |
527 | key/value type_id's and assign them to BPF_MAP_CREATE attributes | |
528 | automatically. | |
ffcf7ce9 YS |
529 | |
530 | .. _BPF_Prog_Load: | |
531 | ||
532 | 3.3 BPF_PROG_LOAD | |
533 | ================= | |
534 | ||
9ab5305d AN |
535 | During prog_load, func_info and line_info can be passed to kernel with proper |
536 | values for the following attributes: | |
ffcf7ce9 YS |
537 | :: |
538 | ||
539 | __u32 insn_cnt; | |
540 | __aligned_u64 insns; | |
541 | ...... | |
542 | __u32 prog_btf_fd; /* fd pointing to BTF type data */ | |
543 | __u32 func_info_rec_size; /* userspace bpf_func_info size */ | |
544 | __aligned_u64 func_info; /* func info */ | |
545 | __u32 func_info_cnt; /* number of bpf_func_info records */ | |
546 | __u32 line_info_rec_size; /* userspace bpf_line_info size */ | |
547 | __aligned_u64 line_info; /* line info */ | |
548 | __u32 line_info_cnt; /* number of bpf_line_info records */ | |
549 | ||
550 | The func_info and line_info are arrays of the following records, respectively.:: | |
551 | ||
552 | struct bpf_func_info { | |
553 | __u32 insn_off; /* [0, insn_cnt - 1] */ | |
554 | __u32 type_id; /* pointing to a BTF_KIND_FUNC type */ | |
555 | }; | |
556 | struct bpf_line_info { | |
557 | __u32 insn_off; /* [0, insn_cnt - 1] */ | |
558 | __u32 file_name_off; /* offset to string table for the filename */ | |
559 | __u32 line_off; /* offset to string table for the source line */ | |
560 | __u32 line_col; /* line number and column number */ | |
561 | }; | |
562 | ||
9ab5305d AN |
563 | func_info_rec_size is the size of each func_info record, and |
564 | line_info_rec_size is the size of each line_info record. Passing the record | |
565 | size to the kernel makes it possible to extend the record itself in the future. | |
ffcf7ce9 YS |
566 | |
567 | Below are requirements for func_info: | |
568 | * func_info[0].insn_off must be 0. | |
569 | * the func_info insn_off is in strictly increasing order and matches | |
570 | bpf func boundaries. | |
571 | ||
572 | Below are requirements for line_info: | |
5efc529f | 573 | * the first insn in each func must have a line_info record pointing to it. |
ffcf7ce9 YS |
574 | * the line_info insn_off is in strictly increasing order. |
575 | ||
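The ordering requirements for func_info listed above can be pre-checked in
user space along these lines. ``struct func_info_sketch`` mirrors ``struct
bpf_func_info`` and ``func_info_offsets_ok`` is a hypothetical helper. The
sketch covers only the structural rules (first offset is 0, offsets strictly
increase and stay within ``[0, insn_cnt - 1]``); whether each offset really
falls on a bpf function boundary is not checked here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct bpf_func_info, for illustration only. */
struct func_info_sketch {
    uint32_t insn_off;
    uint32_t type_id;
};

/* Returns 1 if the records satisfy the structural rules, 0 otherwise. */
static int func_info_offsets_ok(const struct func_info_sketch *fi,
                                uint32_t cnt, uint32_t insn_cnt)
{
    uint32_t i;

    assert(fi != NULL || cnt == 0);
    if (cnt == 0 || fi[0].insn_off != 0)
        return 0;
    for (i = 0; i < cnt; i++) {
        if (fi[i].insn_off >= insn_cnt)
            return 0;
        if (i > 0 && fi[i].insn_off <= fi[i - 1].insn_off)
            return 0;
    }
    return 1;
}
```

The same strictly-increasing loop applies to the ``insn_off`` values of
line_info records.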
576 | For line_info, the line number and column number are defined as below: | |
577 | :: | |
578 | ||
579 | #define BPF_LINE_INFO_LINE_NUM(line_col) ((line_col) >> 10) | |
580 | #define BPF_LINE_INFO_LINE_COL(line_col) ((line_col) & 0x3ff) | |
581 | ||
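As a quick check of the ``line_col`` layout, the sketch below packs a line and
column number and reads them back with the two macros above. ``line_col_pack``
is a hypothetical helper; the 22-bit and 10-bit limits follow from the shift
and mask in the macros.

```c
#include <assert.h>
#include <stdint.h>

#define BPF_LINE_INFO_LINE_NUM(line_col) ((line_col) >> 10)
#define BPF_LINE_INFO_LINE_COL(line_col) ((line_col) & 0x3ff)

/* Upper 22 bits hold the line number, lower 10 bits the column number. */
static uint32_t line_col_pack(uint32_t line, uint32_t col)
{
    assert(line < (1u << 22) && col < (1u << 10));
    return (line << 10) | col;
}
```

Line 42, column 7 packs to ``(42 << 10) | 7``, i.e. ``43015``.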
582 | 3.4 BPF_{PROG,MAP}_GET_NEXT_ID | |
3ef4641f | 583 | ============================== |
ffcf7ce9 | 584 | |
9ab5305d AN |
585 | In kernel, every loaded program, map or btf has a unique id. The id won't |
586 | change during the lifetime of a program, map, or btf. | |
ffcf7ce9 | 587 | |
9ab5305d AN |
588 | The bpf syscall command BPF_{PROG,MAP}_GET_NEXT_ID returns all id's, one for |
589 | each command, to user space, for bpf program or maps, respectively, so an | |
590 | inspection tool can inspect all programs and maps. | |
ffcf7ce9 YS |
591 | |
592 | 3.5 BPF_{PROG,MAP}_GET_FD_BY_ID | |
3ef4641f | 593 | =============================== |
ffcf7ce9 | 594 | |
5efc529f AN |
595 | An introspection tool cannot use an id to get details about a program or map. |
596 | A file descriptor needs to be obtained first for reference-counting purpose. | |
ffcf7ce9 YS |
597 | |
598 | 3.6 BPF_OBJ_GET_INFO_BY_FD | |
599 | ========================== | |
600 | ||
9ab5305d AN |
601 | Once a program/map fd is acquired, an introspection tool can get the detailed |
602 | information from kernel about this fd, some of which are BTF-related. For | |
603 | example, ``bpf_map_info`` returns ``btf_id`` and key/value type ids. | |
604 | ``bpf_prog_info`` returns ``btf_id``, func_info, and line info for translated | |
605 | bpf byte codes, and jited_line_info. | |
ffcf7ce9 YS |
606 | |
607 | 3.7 BPF_BTF_GET_FD_BY_ID | |
608 | ======================== | |
609 | ||
9ab5305d AN |
610 | With ``btf_id`` obtained in ``bpf_map_info`` and ``bpf_prog_info``, bpf |
611 | syscall command BPF_BTF_GET_FD_BY_ID can retrieve a btf fd. Then, with | |
612 | command BPF_OBJ_GET_INFO_BY_FD, the btf blob, originally loaded into the | |
613 | kernel with BPF_BTF_LOAD, can be retrieved. | |
ffcf7ce9 | 614 | |
5efc529f | 615 | With the btf blob, ``bpf_map_info``, and ``bpf_prog_info``, an introspection |
9ab5305d AN |
616 | tool has full btf knowledge and is able to pretty print map key/values, dump |
617 | func signatures and line info, along with byte/jit codes. | |
ffcf7ce9 YS |
618 | |
619 | 4. ELF File Format Interface | |
620 | **************************** | |
621 | ||
622 | 4.1 .BTF section | |
623 | ================ | |
624 | ||
9ab5305d AN |
625 | The .BTF section contains type and string data. The format of this section is |
626 | the same as the one described in :ref:`BTF_Type_String`. | |
ffcf7ce9 YS |
627 | |
628 | .. _BTF_Ext_Section: | |
629 | ||
630 | 4.2 .BTF.ext section | |
631 | ==================== | |
632 | ||
9ab5305d AN |
633 | The .BTF.ext section encodes func_info and line_info which need loader |
634 | manipulation before loading into the kernel. | |
ffcf7ce9 | 635 | |
9ab5305d AN |
636 | The specification for .BTF.ext section is defined at ``tools/lib/bpf/btf.h`` |
637 | and ``tools/lib/bpf/btf.c``. | |
ffcf7ce9 YS |
638 | |
639 | The current header of .BTF.ext section:: | |
640 | ||
641 | struct btf_ext_header { | |
642 | __u16 magic; | |
643 | __u8 version; | |
644 | __u8 flags; | |
645 | __u32 hdr_len; | |
646 | ||
647 | /* All offsets are in bytes relative to the end of this header */ | |
648 | __u32 func_info_off; | |
649 | __u32 func_info_len; | |
650 | __u32 line_info_off; | |
651 | __u32 line_info_len; | |
652 | }; | |
653 | ||
9ab5305d AN |
654 | It is very similar to .BTF section. Instead of type/string section, it |
655 | contains func_info and line_info section. See :ref:`BPF_Prog_Load` for details | |
656 | about func_info and line_info record format. | |
ffcf7ce9 YS |
657 | |
658 | The func_info is organized as below.:: | |
659 | ||
660 | func_info_rec_size | |
661 | btf_ext_info_sec for section #1 /* func_info for section #1 */ | |
662 | btf_ext_info_sec for section #2 /* func_info for section #2 */ | |
663 | ... | |
664 | ||
9ab5305d AN |
665 | ``func_info_rec_size`` specifies the size of ``bpf_func_info`` structure when |
666 | .BTF.ext is generated. ``btf_ext_info_sec``, defined below, is a collection of | |
667 | func_info for each specific ELF section.:: | |
ffcf7ce9 YS |
668 | |
669 | struct btf_ext_info_sec { | |
670 | __u32 sec_name_off; /* offset to section name */ | |
671 | __u32 num_info; | |
672 | /* Followed by num_info * record_size number of bytes */ | |
673 | __u8 data[0]; | |
674 | }; | |
675 | ||
676 | Here, num_info must be greater than 0. | |
677 | ||
678 | The line_info is organized as below.:: | |
679 | ||
680 | line_info_rec_size | |
681 | btf_ext_info_sec for section #1 /* line_info for section #1 */ | |
682 | btf_ext_info_sec for section #2 /* line_info for section #2 */ | |
683 | ... | |
684 | ||
9ab5305d AN |
685 | ``line_info_rec_size`` specifies the size of ``bpf_line_info`` structure when |
686 | .BTF.ext is generated. | |
ffcf7ce9 YS |
687 | |
688 | The interpretation of ``bpf_func_info->insn_off`` and | |
9ab5305d AN |
689 | ``bpf_line_info->insn_off`` is different between kernel API and ELF API. For |
690 | kernel API, the ``insn_off`` is the instruction offset in the unit of ``struct | |
691 | bpf_insn``. For ELF API, the ``insn_off`` is the byte offset from the | |
692 | beginning of section (``btf_ext_info_sec->sec_name_off``). | |
ffcf7ce9 YS |
693 | |
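The unit difference described above can be sketched as a conversion from an
ELF byte offset to an instruction index. ``elf_off_to_insn_idx`` and
``SKETCH_BPF_INSN_SIZE`` are hypothetical names; the constant reflects the
8-byte size of ``struct bpf_insn``. Any further adjustment a loader performs
when it combines sections is outside the scope of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BPF_INSN_SIZE 8 /* sizeof(struct bpf_insn) */

/* Convert a byte offset within an ELF section into an instruction index.
 * Returns 0 on success, -1 if the offset is not instruction-aligned.
 */
static int elf_off_to_insn_idx(uint32_t byte_off, uint32_t *insn_idx)
{
    assert(insn_idx != NULL);
    if (byte_off % SKETCH_BPF_INSN_SIZE != 0)
        return -1;
    *insn_idx = byte_off / SKETCH_BPF_INSN_SIZE;
    return 0;
}
```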
694 | 5. Using BTF | |
695 | ************ | |
696 | ||
697 | 5.1 bpftool map pretty print | |
698 | ============================ | |
699 | ||
9ab5305d AN |
700 | With BTF, the map key/value can be printed based on fields rather than simply |
701 | raw bytes. This is especially valuable for large structures or if your data | |
702 | structure has bitfields. For example, for the following map,:: | |
ffcf7ce9 YS |
703 | |
704 | enum A { A1, A2, A3, A4, A5 }; | |
705 | typedef enum A ___A; | |
706 | struct tmp_t { | |
707 | char a1:4; | |
708 | int a2:4; | |
709 | int :4; | |
710 | __u32 a3:4; | |
711 | int b; | |
712 | ___A b1:4; | |
713 | enum A b2:4; | |
714 | }; | |
715 | struct bpf_map_def SEC("maps") tmpmap = { | |
716 | .type = BPF_MAP_TYPE_ARRAY, | |
717 | .key_size = sizeof(__u32), | |
718 | .value_size = sizeof(struct tmp_t), | |
719 | .max_entries = 1, | |
720 | }; | |
721 | BPF_ANNOTATE_KV_PAIR(tmpmap, int, struct tmp_t); | |
722 | ||
723 | bpftool is able to pretty print like below: | |
724 | :: | |
725 | ||
726 | [{ | |
727 | "key": 0, | |
728 | "value": { | |
729 | "a1": 0x2, | |
730 | "a2": 0x4, | |
731 | "a3": 0x6, | |
732 | "b": 7, | |
733 | "b1": 0x8, | |
734 | "b2": 0xa | |
735 | } | |
736 | } | |
737 | ] | |
738 | ||
739 | 5.2 bpftool prog dump | |
740 | ===================== | |
741 | ||
9ab5305d AN |
742 | The following is an example showing how func_info and line_info can help prog |
743 | dump with better kernel symbol names, function prototypes and line | |
744 | information.:: | |
ffcf7ce9 YS |
745 | |
746 | $ bpftool prog dump jited pinned /sys/fs/bpf/test_btf_haskv | |
747 | [...] | |
748 | int test_long_fname_2(struct dummy_tracepoint_args * arg): | |
749 | bpf_prog_44a040bf25481309_test_long_fname_2: | |
750 | ; static int test_long_fname_2(struct dummy_tracepoint_args *arg) | |
751 | 0: push %rbp | |
752 | 1: mov %rsp,%rbp | |
753 | 4: sub $0x30,%rsp | |
754 | b: sub $0x28,%rbp | |
755 | f: mov %rbx,0x0(%rbp) | |
756 | 13: mov %r13,0x8(%rbp) | |
757 | 17: mov %r14,0x10(%rbp) | |
758 | 1b: mov %r15,0x18(%rbp) | |
759 | 1f: xor %eax,%eax | |
760 | 21: mov %rax,0x20(%rbp) | |
761 | 25: xor %esi,%esi | |
762 | ; int key = 0; | |
763 | 27: mov %esi,-0x4(%rbp) | |
764 | ; if (!arg->sock) | |
765 | 2a: mov 0x8(%rdi),%rdi | |
766 | ; if (!arg->sock) | |
767 | 2e: cmp $0x0,%rdi | |
768 | 32: je 0x0000000000000070 | |
769 | 34: mov %rbp,%rsi | |
770 | ; counts = bpf_map_lookup_elem(&btf_map, &key); | |
771 | [...] | |
772 | ||
5efc529f | 773 | 5.3 Verifier Log |
ffcf7ce9 YS |
774 | ================ |
775 | ||
9ab5305d AN |
776 | The following is an example of how line_info can help debug a verification |
777 | failure.:: | |
ffcf7ce9 YS |
778 | |
779 | /* The code at tools/testing/selftests/bpf/test_xdp_noinline.c | |
780 | * is modified as below. | |
781 | */ | |
782 | data = (void *)(long)xdp->data; | |
783 | data_end = (void *)(long)xdp->data_end; | |
784 | /* | |
785 | if (data + 4 > data_end) | |
786 | return XDP_DROP; | |
787 | */ | |
788 | *(u32 *)data = dst->dst; | |
789 | ||
790 | $ bpftool prog load ./test_xdp_noinline.o /sys/fs/bpf/test_xdp_noinline type xdp | |
791 | ; data = (void *)(long)xdp->data; | |
792 | 224: (79) r2 = *(u64 *)(r10 -112) | |
793 | 225: (61) r2 = *(u32 *)(r2 +0) | |
794 | ; *(u32 *)data = dst->dst; | |
795 | 226: (63) *(u32 *)(r2 +0) = r1 | |
796 | invalid access to packet, off=0 size=4, R2(id=0,off=0,r=0) | |
797 | R2 offset is outside of the packet | |
798 | ||
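As a minimal sketch (not taken from the selftest), the rejected store should pass verification once a bounds check like the commented-out one is restored, because the verifier then knows that 4 bytes starting at ``data`` lie inside the packet. The helper name ``store_u32_checked`` and its return values are hypothetical, and the code is written as plain user-space C so it can be compiled on its own:

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store a 32-bit value at data only if the whole
 * store fits before data_end. Returns 0 on success and -1 when the
 * store would run past data_end (where the XDP program above would
 * return XDP_DROP instead).
 */
static inline int store_u32_checked(void *data, void *data_end, uint32_t val)
{
	/* Same shape as the check that was commented out above. */
	if ((char *)data + sizeof(val) > (char *)data_end)
		return -1;

	memcpy(data, &val, sizeof(val));
	return 0;
}
```

With an 8-byte buffer, a store at offset 4 succeeds while a store at offset 6 is refused, which mirrors the ``off=0 size=4 ... r=0`` complaint in the log: the verifier had no proven range for R2.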
799 | 6. BTF Generation | |
800 | ***************** | |
801 | ||
802 | You need the latest pahole | |
803 | ||
804 | https://git.kernel.org/pub/scm/devel/pahole/pahole.git/ | |
805 | ||
9ab5305d AN |
806 | or llvm (8.0 or later). pahole acts as a dwarf2btf converter. It does not yet |
807 | support .BTF.ext or the BTF_KIND_FUNC type. For example:: | |
ffcf7ce9 YS |
808 | |
809 | -bash-4.4$ cat t.c | |
810 | struct t { | |
811 | int a:2; | |
812 | int b:3; | |
813 | int c:2; | |
814 | } g; | |
815 | -bash-4.4$ gcc -c -O2 -g t.c | |
816 | -bash-4.4$ pahole -JV t.o | |
817 | File t.o: | |
818 | [1] STRUCT t kind_flag=1 size=4 vlen=3 | |
819 | a type_id=2 bitfield_size=2 bits_offset=0 | |
820 | b type_id=2 bitfield_size=3 bits_offset=2 | |
821 | c type_id=2 bitfield_size=2 bits_offset=5 | |
822 | [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED | |
823 | ||
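When kind_flag=1, the 32-bit offset field of each struct member carries both values that pahole prints above: the bitfield size in the upper 8 bits and the bit offset in the lower 24 bits. The split can be sketched as below; the function names are illustrative only and are not libbpf API:

```c
#include <stdint.h>

/* Upper 8 bits of a member offset: bitfield size (kind_flag=1 only). */
static inline uint32_t member_bitfield_size(uint32_t offset)
{
	return offset >> 24;
}

/* Lower 24 bits of a member offset: bit offset from the struct start. */
static inline uint32_t member_bit_offset(uint32_t offset)
{
	return offset & 0xffffff;
}

/* Illustrative helper: pack the two values the way they are stored. */
static inline uint32_t member_pack(uint32_t bitfield_size, uint32_t bit_offset)
{
	return (bitfield_size << 24) | (bit_offset & 0xffffff);
}
```

For member ``b`` above, ``member_pack(3, 2)`` gives ``0x03000002``, which decodes back to bitfield_size=3 and bits_offset=2.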
9ab5305d AN |
824 | llvm can generate .BTF and .BTF.ext directly with -g, for the bpf target |
825 | only. The assembly output (-S) shows the BTF encoding in assembly | |
826 | format:: | |
ffcf7ce9 YS |
827 | |
828 | -bash-4.4$ cat t2.c | |
829 | typedef int __int32; | |
830 | struct t2 { | |
831 | int a2; | |
832 | int (*f2)(char q1, __int32 q2, ...); | |
833 | int (*f3)(); | |
834 | } g2; | |
835 | int main() { return 0; } | |
836 | int test() { return 0; } | |
837 | -bash-4.4$ clang -c -g -O2 -target bpf t2.c | |
838 | -bash-4.4$ readelf -S t2.o | |
839 | ...... | |
840 | [ 8] .BTF PROGBITS 0000000000000000 00000247 | |
841 | 000000000000016e 0000000000000000 0 0 1 | |
842 | [ 9] .BTF.ext PROGBITS 0000000000000000 000003b5 | |
843 | 0000000000000060 0000000000000000 0 0 1 | |
844 | [10] .rel.BTF.ext REL 0000000000000000 000007e0 | |
845 | 0000000000000040 0000000000000010 16 9 8 | |
846 | ...... | |
847 | -bash-4.4$ clang -S -g -O2 -target bpf t2.c | |
848 | -bash-4.4$ cat t2.s | |
849 | ...... | |
850 | .section .BTF,"",@progbits | |
851 | .short 60319 # 0xeb9f | |
852 | .byte 1 | |
853 | .byte 0 | |
854 | .long 24 | |
855 | .long 0 | |
856 | .long 220 | |
857 | .long 220 | |
858 | .long 122 | |
859 | .long 0 # BTF_KIND_FUNC_PROTO(id = 1) | |
860 | .long 218103808 # 0xd000000 | |
861 | .long 2 | |
862 | .long 83 # BTF_KIND_INT(id = 2) | |
863 | .long 16777216 # 0x1000000 | |
864 | .long 4 | |
865 | .long 16777248 # 0x1000020 | |
866 | ...... | |
867 | .byte 0 # string offset=0 | |
868 | .ascii ".text" # string offset=1 | |
869 | .byte 0 | |
870 | .ascii "/home/yhs/tmp-pahole/t2.c" # string offset=7 | |
871 | .byte 0 | |
872 | .ascii "int main() { return 0; }" # string offset=33 | |
873 | .byte 0 | |
874 | .ascii "int test() { return 0; }" # string offset=58 | |
875 | .byte 0 | |
876 | .ascii "int" # string offset=83 | |
877 | ...... | |
878 | .section .BTF.ext,"",@progbits | |
879 | .short 60319 # 0xeb9f | |
880 | .byte 1 | |
881 | .byte 0 | |
882 | .long 24 | |
883 | .long 0 | |
884 | .long 28 | |
885 | .long 28 | |
886 | .long 44 | |
887 | .long 8 # FuncInfo | |
888 | .long 1 # FuncInfo section string offset=1 | |
889 | .long 2 | |
890 | .long .Lfunc_begin0 | |
891 | .long 3 | |
892 | .long .Lfunc_begin1 | |
893 | .long 5 | |
894 | .long 16 # LineInfo | |
895 | .long 1 # LineInfo section string offset=1 | |
896 | .long 2 | |
897 | .long .Ltmp0 | |
898 | .long 7 | |
899 | .long 33 | |
900 | .long 7182 # Line 7 Col 14 | |
901 | .long .Ltmp3 | |
902 | .long 7 | |
903 | .long 58 | |
904 | .long 8206 # Line 8 Col 14 | |
905 | ||
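The raw ``.long`` values in the listing above can be decoded by hand. The sketch below does so with small helpers whose names are made up for illustration. The masks follow the layout described for the type ``info`` word, the BTF_KIND_INT word and ``line_col``; the 5-bit kind mask matches current headers as far as we know, and a 4-bit mask gives the same result for the values shown here:

```c
#include <stdint.h>

/* Type info word: kind in bits 24-28, vlen in bits 0-15. */
static inline uint32_t info_kind(uint32_t info)
{
	return (info >> 24) & 0x1f;
}

static inline uint32_t info_vlen(uint32_t info)
{
	return info & 0xffff;
}

/* Extra word after BTF_KIND_INT: encoding, bit offset and bit count. */
static inline uint32_t int_encoding(uint32_t v)
{
	return (v & 0x0f000000) >> 24;
}

static inline uint32_t int_offset(uint32_t v)
{
	return (v & 0x00ff0000) >> 16;
}

static inline uint32_t int_bits(uint32_t v)
{
	return v & 0x000000ff;
}

/* line_col in a line_info record: line in the upper 22 bits, column in the lower 10. */
static inline uint32_t line_num(uint32_t line_col)
{
	return line_col >> 10;
}

static inline uint32_t line_column(uint32_t line_col)
{
	return line_col & 0x3ff;
}
```

Applied to the listing, ``info_kind(218103808)`` is 13 (BTF_KIND_FUNC_PROTO) with vlen 0, ``16777248`` decodes to a signed 32-bit integer at bit offset 0, and ``7182`` and ``8206`` decode to line 7 and line 8, both at column 14, matching the comments in the assembly.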
906 | 7. Testing | |
907 | ********** | |
908 | ||
5efc529f | 909 | The kernel bpf selftest ``test_btf.c`` provides an extensive set of BTF-related tests. |