Commit | Line | Data |
---|---|---|
2e39748a AS |
1 | BPF extensibility and applicability to networking, tracing, security |
2 | in the Linux kernel and several user space implementations of the BPF | |
3 | virtual machine led to a number of misunderstandings about what BPF actually is. | |
4 | This short Q&A is an attempt to address that and outline a direction | |
5 | of where BPF is heading long term. | |
6 | ||
7 | Q: Is BPF a generic instruction set similar to x64 and arm64? | |
8 | A: NO. | |
9 | ||
10 | Q: Is BPF a generic virtual machine? | |
11 | A: NO. | |
12 | ||
13 | BPF is a generic instruction set _with_ C calling convention. | |
14 | ||
15 | Q: Why was the C calling convention chosen? | |
16 | A: Because BPF programs are designed to run in the Linux kernel, | |
17 | which is written in C. Hence BPF defines an instruction set compatible | |
18 | with the two most used architectures, x64 and arm64 (and takes into | |
19 | consideration important quirks of other architectures), and | |
20 | defines a calling convention that is compatible with the C calling | |
21 | convention of the Linux kernel on those architectures. | |
22 | ||
23 | Q: Can multiple return values be supported in the future? | |
24 | A: NO. BPF allows only register R0 to be used as the return value. | |
25 | ||
26 | Q: Can more than 5 function arguments be supported in the future? | |
27 | A: NO. The BPF calling convention only allows registers R1-R5 to be used | |
28 | as arguments. BPF is not a standalone instruction set | |
29 | (unlike the x64 ISA, which allows msft, cdecl and other conventions). | |
30 | ||
31 | Q: Can BPF programs access the instruction pointer or return address? | |
32 | A: NO. | |
33 | ||
34 | Q: Can BPF programs access the stack pointer? | |
35 | A: NO. Only the frame pointer (register R10) is accessible. | |
36 | From the compiler's point of view it's necessary to have a stack pointer. | |
37 | For example, LLVM defines register R11 as the stack pointer in its | |
38 | BPF backend, but it makes sure that generated code never uses it. | |
39 | ||
40 | Q: Does the C calling convention diminish possible use cases? | |
41 | A: YES. The BPF design forces the addition of major functionality in the form | |
42 | of kernel helper functions and kernel objects like BPF maps, with | |
43 | seamless interoperability between them. It lets the kernel call into | |
44 | BPF programs and programs call kernel helpers with zero overhead, | |
45 | as if all of them were native C code. That is particularly the case | |
46 | for JITed BPF programs, which are indistinguishable from | |
47 | native kernel C code. | |
48 | ||
49 | Q: Does it mean that 'innovative' extensions to BPF code are disallowed? | |
50 | A: Soft yes. At least for now, until the BPF core has support for | |
51 | bpf-to-bpf calls, indirect calls, loops, global variables, | |
52 | jump tables, read-only sections and all other normal constructs | |
53 | that C code can produce. | |
54 | ||
55 | Q: Can loops be supported in a safe way? | |
56 | A: It's not clear yet. BPF developers are trying to find a way to | |
57 | support bounded loops where the verifier can guarantee that | |
58 | the program terminates in fewer than 4096 instructions. | |
59 | ||
60 | Q: How come the LD_ABS and LD_IND instructions are present in BPF whereas | |
61 | C code cannot express them and has to use builtin intrinsics? | |
62 | A: This is an artifact of compatibility with classic BPF. Modern | |
63 | networking code in BPF performs better without them. | |
64 | See 'direct packet access'. | |
65 | ||
66 | Q: It seems not all BPF instructions map one-to-one to native CPU instructions. | |
67 | For example, why are BPF_JNE and other compare-and-jump instructions not CPU-like? | |
68 | A: This was necessary to avoid introducing flags into the ISA, which are | |
69 | impossible to make generic and efficient across CPU architectures. | |
70 | ||
71 | Q: Why doesn't the BPF_DIV instruction map to x64 div? | |
72 | A: Because if we picked a one-to-one relationship to x64 it would have made | |
73 | it more complicated to support on arm64 and other archs. Also, it | |
74 | needs a div-by-zero runtime check. | |
75 | ||
76 | Q: Why is there no BPF_SDIV for the signed divide operation? | |
77 | A: Because it would be rarely used. LLVM errors in such a case and | |
78 | prints a suggestion to use unsigned divide instead. | |
79 | ||
80 | Q: Why does BPF have an implicit prologue and epilogue? | |
81 | A: Because architectures like sparc have register windows and in general | |
82 | there are enough subtle differences between architectures that naively | |
83 | storing the return address on the stack won't work. Another reason is that BPF has | |
84 | to be safe from division by zero (and the legacy exception path | |
85 | of the LD_ABS insn). Those instructions need to invoke the epilogue and | |
86 | return implicitly. | |
87 | ||
88 | Q: Why were the BPF_JLT and BPF_JLE instructions not introduced in the beginning? | |
89 | A: Because classic BPF didn't have them and BPF authors felt that a compiler | |
90 | workaround would be acceptable. It turned out that programs lose performance | |
91 | due to the lack of these compare instructions, and they were added. | |
92 | These two instructions are a perfect example of what kind of new BPF | |
93 | instructions are acceptable and can be added in the future. | |
94 | These two already had equivalent instructions in native CPUs. | |
95 | New instructions that don't have a one-to-one mapping to HW instructions | |
96 | will not be accepted. | |
97 | ||
98 | Q: BPF 32-bit subregisters have a requirement to zero the upper 32 bits of BPF | |
99 | registers, which makes BPF an inefficient virtual machine for 32-bit | |
100 | CPU architectures and 32-bit HW accelerators. Can true 32-bit registers | |
101 | be added to BPF in the future? | |
102 | A: NO. The first step to improve performance on 32-bit archs is to teach | |
103 | LLVM to generate code that uses 32-bit subregisters. The second step | |
104 | is to teach the verifier to mark operations where zeroing the upper bits | |
105 | is unnecessary. Then JITs can take advantage of those markings and | |
106 | drastically reduce the size of generated code and improve performance. | |
107 | ||
108 | Q: Does BPF have a stable ABI? | |
109 | A: YES. BPF instructions, arguments to BPF programs, the set of helper | |
110 | functions and their arguments, and recognized return codes are all part | |
111 | of the ABI. However, when tracing programs use the bpf_probe_read() helper | |
112 | to walk kernel internal data structures and are compiled with kernel | |
113 | internal headers, these accesses can and will break with newer | |
114 | kernels. The union bpf_attr -> kern_version is checked at load time | |
115 | to prevent accidentally loading kprobe-based bpf programs written | |
116 | for a different kernel. Networking programs don't do the kern_version check. | |
117 | ||
118 | Q: How much stack space does a BPF program use? | |
119 | A: Currently all program types are limited to 512 bytes of stack | |
120 | space, but the verifier computes the actual amount of stack used, | |
121 | and both the interpreter and most JITed code consume only the necessary amount. | |
122 | ||
123 | Q: Can BPF be offloaded to HW? | |
124 | A: YES. BPF HW offload is supported by the NFP driver. | |
125 | ||
126 | Q: Does the classic BPF interpreter still exist? | |
127 | A: NO. Classic BPF programs are converted into extended BPF instructions. | |
128 | ||
129 | Q: Can BPF call arbitrary kernel functions? | |
130 | A: NO. BPF programs can only call a set of helper functions which | |
131 | is defined for every program type. | |
132 | ||
133 | Q: Can BPF overwrite arbitrary kernel memory? | |
134 | A: NO. Tracing bpf programs can _read_ arbitrary memory with bpf_probe_read() | |
135 | and bpf_probe_read_str() helpers. Networking programs cannot read | |
136 | arbitrary memory, since they don't have access to these helpers. | |
137 | Programs can never read or write arbitrary memory directly. | |
138 | ||
139 | Q: Can BPF overwrite arbitrary user memory? | |
140 | A: Sort-of. Tracing BPF programs can overwrite the user memory | |
141 | of the current task with bpf_probe_write_user(). Every time such a | |
142 | program is loaded the kernel will print a warning message, so | |
143 | this helper is only useful for experiments and prototypes. | |
144 | Tracing BPF programs are root only. | |
145 | ||
146 | Q: When the bpf_trace_printk() helper is used the kernel prints a nasty | |
147 | warning message. Why is that? | |
148 | A: This is done to nudge program authors towards better interfaces when | |
149 | programs need to pass data to user space. For example, bpf_perf_event_output() | |
150 | can be used to efficiently stream data via the perf ring buffer. | |
151 | BPF maps can be used for asynchronous data sharing between kernel | |
152 | and user space. bpf_trace_printk() should only be used for debugging. | |
153 | ||
154 | Q: Can BPF functionality such as new program or map types, new | |
155 | helpers, etc. be added from kernel module code? | |
156 | A: NO. |
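
The 32-bit subregister rule mentioned in the Q&A above (a 32-bit operation must zero the upper 32 bits of the 64-bit BPF register) can be illustrated with a small user-space model. This is only a sketch: the type `bpf_reg_t` and the functions `model_add64()` and `model_add32()` are hypothetical names invented for this illustration and are not part of the kernel, the verifier, or any JIT.

```c
#include <stdint.h>

/* Hypothetical model of one 64-bit BPF register; not kernel code. */
typedef uint64_t bpf_reg_t;

/* Model of a 64-bit add: the full register width takes part. */
static bpf_reg_t model_add64(bpf_reg_t dst, bpf_reg_t src)
{
	return dst + src;
}

/*
 * Model of a 32-bit add on the subregister: only the low 32 bits of
 * both operands are used, the sum wraps at 32 bits, and the result is
 * zero-extended, so the upper 32 bits of the register become zero.
 */
static bpf_reg_t model_add32(bpf_reg_t dst, bpf_reg_t src)
{
	uint32_t lo = (uint32_t)dst + (uint32_t)src;

	return (bpf_reg_t)lo;
}
```

In this model, stale data in the upper half of `dst` never survives a 32-bit operation. That is the extra zeroing work a 32-bit JIT has to emit unless the verifier can mark it as unnecessary, as the answer above describes.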