.. contents::
.. sectnum::

========================================
eBPF Instruction Set Specification, v1.0
========================================

This document specifies version 1.0 of the eBPF instruction set.

Documentation conventions
=========================

For brevity, this document uses the type notation "u64", "u32", etc.
to mean an unsigned integer whose width is the specified number of bits.

Registers and calling convention
================================

eBPF has 10 general purpose registers and a read-only frame pointer register,
all of which are 64 bits wide.

The eBPF calling convention is defined as:

* R0: return value from function calls, and exit value for eBPF programs
* R1 - R5: arguments for function calls
* R6 - R9: callee saved registers that function calls will preserve
* R10: read-only frame pointer to access stack

R0 - R5 are scratch registers and eBPF programs need to spill/fill them if
necessary across calls.

Instruction encoding
====================

eBPF has two instruction encodings:

* the basic instruction encoding, which uses 64 bits to encode an instruction
* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
  constant) value after the basic instruction for a total of 128 bits.

The basic instruction encoding is as follows, where MSB and LSB mean the most significant
bits and least significant bits, respectively:

=============  =======  =======  =======  ============
32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
=============  =======  =======  =======  ============
imm            offset   src_reg  dst_reg  opcode
=============  =======  =======  =======  ============

**imm**
  signed integer immediate value

**offset**
  signed integer offset used with pointer arithmetic

**src_reg**
  the source register number (0-10), except where otherwise specified
  (`64-bit immediate instructions`_ reuse this field for other purposes)

**dst_reg**
  destination register number (0-10)

**opcode**
  operation to perform

Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.

As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
instruction uses a 64-bit immediate value that is constructed as follows.
The 64 bits following the basic instruction contain a pseudo instruction
using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
and imm containing the high 32 bits of the immediate value.

=================  ==================
64 bits (MSB)      64 bits (LSB)
=================  ==================
basic instruction  pseudo instruction
=================  ==================

Thus the 64-bit immediate value is constructed as follows:

  imm64 = (next_imm << 32) | imm

where 'next_imm' refers to the imm value of the pseudo instruction
following the basic instruction.

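The field layout and the imm64 construction above can be illustrated with a
minimal Python sketch. It assumes the instruction is already available as a
64-bit integer laid out as in the table (imm in the most significant bits);
how that integer is stored in memory is outside the sketch. The helper names
``sign_extend``, ``decode_basic`` and ``make_imm64`` are hypothetical and not
part of this specification. Both imm fields are treated as unsigned 32-bit
quantities when they are combined, which is an assumption of this sketch.

.. code-block:: python

  def sign_extend(value, bits):
      """Interpret the low `bits` bits of value as a two's-complement integer."""
      value &= (1 << bits) - 1
      if value & (1 << (bits - 1)):
          return value - (1 << bits)
      return value


  def decode_basic(insn):
      """Split a 64-bit basic instruction (as an integer) into its fields."""
      return {
          "opcode": insn & 0xFF,                    # 8 bits (LSB)
          "dst_reg": (insn >> 8) & 0xF,             # 4 bits
          "src_reg": (insn >> 12) & 0xF,            # 4 bits
          "offset": sign_extend(insn >> 16, 16),    # signed 16 bits
          "imm": sign_extend(insn >> 32, 32),       # signed 32 bits (MSB)
      }


  def make_imm64(imm, next_imm):
      """Combine the imm fields of the basic and pseudo instructions."""
      return ((next_imm & 0xFFFFFFFF) << 32) | (imm & 0xFFFFFFFF)

For example, ``make_imm64(0x55667788, 0x11223344)`` yields
``0x1122334455667788``.
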
Instruction classes
-------------------

The three least significant bits of the 'opcode' field store the instruction class:

=========  =====  ===============================  ===================================
class      value  description                      reference
=========  =====  ===============================  ===================================
BPF_LD     0x00   non-standard load operations     `Load and store instructions`_
BPF_LDX    0x01   load into register operations    `Load and store instructions`_
BPF_ST     0x02   store from immediate operations  `Load and store instructions`_
BPF_STX    0x03   store from register operations   `Load and store instructions`_
BPF_ALU    0x04   32-bit arithmetic operations     `Arithmetic and jump instructions`_
BPF_JMP    0x05   64-bit jump operations           `Arithmetic and jump instructions`_
BPF_JMP32  0x06   32-bit jump operations           `Arithmetic and jump instructions`_
BPF_ALU64  0x07   64-bit arithmetic operations     `Arithmetic and jump instructions`_
=========  =====  ===============================  ===================================

Arithmetic and jump instructions
================================

For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` and
``BPF_JMP32``), the 8-bit 'opcode' field is divided into three parts:

==============  ======  =================
4 bits (MSB)    1 bit   3 bits (LSB)
==============  ======  =================
code            source  instruction class
==============  ======  =================

**code**
  the operation code, whose meaning varies by instruction class

**source**
  the source operand location, which unless otherwise specified is one of:

  ======  =====  ==============================================
  source  value  description
  ======  =====  ==============================================
  BPF_K   0x00   use 32-bit 'imm' value as source operand
  BPF_X   0x08   use 'src_reg' register value as source operand
  ======  =====  ==============================================

**instruction class**
  the instruction class (see `Instruction classes`_)

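As an illustration of the three-part split, the following Python sketch masks
an arithmetic or jump opcode into its code, source and instruction class. The
function name ``split_alu_jmp_opcode`` and the ``BPF_CLASSES`` list are
hypothetical helpers; the numeric values come from the tables above.

.. code-block:: python

  # Instruction class names, indexed by the three least significant opcode bits.
  BPF_CLASSES = ["BPF_LD", "BPF_LDX", "BPF_ST", "BPF_STX",
                 "BPF_ALU", "BPF_JMP", "BPF_JMP32", "BPF_ALU64"]


  def split_alu_jmp_opcode(opcode):
      """Return (code, source name, class name) for an ALU or jump opcode."""
      code = opcode & 0xF0          # 4 bits (MSB), kept in place as in the tables
      source = opcode & 0x08        # 1 bit: BPF_K (0x00) or BPF_X (0x08)
      insn_class = opcode & 0x07    # 3 bits (LSB)
      return code, "BPF_X" if source else "BPF_K", BPF_CLASSES[insn_class]

For example, the opcode ``0x0f`` splits into code ``0x00``, source ``BPF_X``
and class ``BPF_ALU64``.
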
Arithmetic instructions
-----------------------

``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
otherwise identical operations.
The 'code' field encodes the operation as below, where 'src' and 'dst' refer
to the values of the source and destination registers, respectively.

========  =====  ==========================================================
code      value  description
========  =====  ==========================================================
BPF_ADD   0x00   dst += src
BPF_SUB   0x10   dst -= src
BPF_MUL   0x20   dst \*= src
BPF_DIV   0x30   dst = (src != 0) ? (dst / src) : 0
BPF_OR    0x40   dst \|= src
BPF_AND   0x50   dst &= src
BPF_LSH   0x60   dst <<= src
BPF_RSH   0x70   dst >>= src
BPF_NEG   0x80   dst = -dst
BPF_MOD   0x90   dst = (src != 0) ? (dst % src) : dst
BPF_XOR   0xa0   dst ^= src
BPF_MOV   0xb0   dst = src
BPF_ARSH  0xc0   sign extending shift right
BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
========  =====  ==========================================================

Underflow and overflow are allowed during arithmetic operations, meaning
the 64-bit or 32-bit value will wrap. If eBPF program execution would
result in division by zero, the destination register is instead set to zero.
If execution would result in modulo by zero, for ``BPF_ALU64`` the value of
the destination register is unchanged whereas for ``BPF_ALU`` the upper
32 bits of the destination register are zeroed.

``BPF_ADD | BPF_X | BPF_ALU`` means::

  dst = (u32) ((u32) dst + (u32) src)

where '(u32)' indicates that the upper 32 bits are zeroed.

``BPF_ADD | BPF_X | BPF_ALU64`` means::

  dst = dst + src

``BPF_XOR | BPF_K | BPF_ALU`` means::

  dst = (u32) dst ^ (u32) imm32

``BPF_XOR | BPF_K | BPF_ALU64`` means::

  dst = dst ^ imm32

Also note that the division and modulo operations are unsigned. Thus, for
``BPF_ALU``, 'imm' is first interpreted as an unsigned 32-bit value, whereas
for ``BPF_ALU64``, 'imm' is first sign extended to 64 bits and the result
interpreted as an unsigned 64-bit value. There are no instructions for
signed division or modulo.

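The wrapping, division by zero and modulo by zero rules described above can be
sketched in Python as follows. This is an illustrative model of three
operations only, not a complete interpreter; the function names ``alu`` and
``imm_operand`` are hypothetical, and register values are modeled as
non-negative Python integers.

.. code-block:: python

  U32 = 0xFFFFFFFF
  U64 = 0xFFFFFFFFFFFFFFFF
  BPF_ADD, BPF_DIV, BPF_MOD = 0x00, 0x30, 0x90


  def imm_operand(imm, is64):
      """Turn a signed 32-bit 'imm' into the unsigned source operand.

      Masking a negative Python int with U64 gives the same bits as sign
      extending to 64 bits; masking with U32 reinterprets it as unsigned.
      """
      return imm & (U64 if is64 else U32)


  def alu(code, dst, src, is64):
      """Evaluate BPF_ADD, BPF_DIV or BPF_MOD and return the new dst value."""
      mask = U64 if is64 else U32
      dst &= mask                   # BPF_ALU only sees the low 32 bits
      src &= mask
      if code == BPF_ADD:
          result = dst + src
      elif code == BPF_DIV:
          result = dst // src if src != 0 else 0
      elif code == BPF_MOD:
          result = dst % src if src != 0 else dst
      else:
          raise ValueError("operation not modeled in this sketch")
      return result & mask          # wrap; upper 32 bits are zero for BPF_ALU

For example, ``alu(BPF_ADD, 0xFFFFFFFF, 1, False)`` wraps to ``0``, while the
same call with ``is64=True`` returns ``0x100000000``.
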
Byte swap instructions
~~~~~~~~~~~~~~~~~~~~~~

The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
'code' field of ``BPF_END``.

The byte swap instructions operate on the destination register
only and do not use a separate source register or immediate value.

The 1-bit source operand field in the opcode selects the byte order that
the operation converts from or to:

=========  =====  =================================================
source     value  description
=========  =====  =================================================
BPF_TO_LE  0x00   convert between host byte order and little endian
BPF_TO_BE  0x08   convert between host byte order and big endian
=========  =====  =================================================

The 'imm' field encodes the width of the swap operations. The following widths
are supported: 16, 32 and 64.

Examples:

``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::

  dst = htole16(dst)

``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::

  dst = htobe64(dst)

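The effect of the conversions can be modeled in Python by writing the value out
in host byte order and reading it back in the target order. This is a sketch
under two assumptions that the text above does not state explicitly: the value
is truncated to the requested width, and the result is zero-extended, as the
return types of the C functions ``htole16()`` and ``htobe64()`` suggest. The
function name ``to_endian`` and its ``host`` parameter are hypothetical.

.. code-block:: python

  import sys


  def to_endian(dst, width, target, host=sys.byteorder):
      """Model htole<width>() (target="little") or htobe<width>() (target="big")."""
      if width not in (16, 32, 64):
          raise ValueError("unsupported width")
      value = dst & ((1 << width) - 1)              # keep the low `width` bits
      raw = value.to_bytes(width // 8, host)        # bytes as the host stores them
      return int.from_bytes(raw, target)            # reread in the target order

On a little-endian host, ``to_endian(0x1234, 16, "big", host="little")``
returns ``0x3412``, whereas converting to little endian leaves the value
unchanged.
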
Jump instructions
-----------------

``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
otherwise identical operations.
The 'code' field encodes the operation as below:

========  =====  =========================  ============
code      value  description                notes
========  =====  =========================  ============
BPF_JA    0x00   PC += off                  BPF_JMP only
BPF_JEQ   0x10   PC += off if dst == src
BPF_JGT   0x20   PC += off if dst > src     unsigned
BPF_JGE   0x30   PC += off if dst >= src    unsigned
BPF_JSET  0x40   PC += off if dst & src
BPF_JNE   0x50   PC += off if dst != src
BPF_JSGT  0x60   PC += off if dst > src     signed
BPF_JSGE  0x70   PC += off if dst >= src    signed
BPF_CALL  0x80   function call
BPF_EXIT  0x90   function / program return  BPF_JMP only
BPF_JLT   0xa0   PC += off if dst < src     unsigned
BPF_JLE   0xb0   PC += off if dst <= src    unsigned
BPF_JSLT  0xc0   PC += off if dst < src     signed
BPF_JSLE  0xd0   PC += off if dst <= src    signed
========  =====  =========================  ============

In the table above, 'off' refers to the 'offset' field of the instruction.

The eBPF program needs to store the return value into register R0 before doing a
``BPF_EXIT``.

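The difference between the signed and unsigned comparisons, and between the
32-bit and 64-bit operand widths, can be illustrated with a small Python
sketch. It only evaluates whether a conditional jump would be taken; the names
``to_signed`` and ``jump_taken`` are hypothetical, and register values are
modeled as non-negative integers.

.. code-block:: python

  def to_signed(value, bits):
      """Reinterpret the low `bits` bits of value as a signed integer."""
      value &= (1 << bits) - 1
      return value - (1 << bits) if value >> (bits - 1) else value


  def jump_taken(code, dst, src, is64=True):
      """Return True if the conditional jump with this 'code' is taken."""
      bits = 64 if is64 else 32     # BPF_JMP compares 64 bits, BPF_JMP32 32 bits
      mask = (1 << bits) - 1
      u_dst, u_src = dst & mask, src & mask
      s_dst, s_src = to_signed(dst, bits), to_signed(src, bits)
      conditions = {
          0x10: u_dst == u_src,           # BPF_JEQ
          0x20: u_dst > u_src,            # BPF_JGT  (unsigned)
          0x30: u_dst >= u_src,           # BPF_JGE  (unsigned)
          0x40: (u_dst & u_src) != 0,     # BPF_JSET
          0x50: u_dst != u_src,           # BPF_JNE
          0x60: s_dst > s_src,            # BPF_JSGT (signed)
          0x70: s_dst >= s_src,           # BPF_JSGE (signed)
          0xA0: u_dst < u_src,            # BPF_JLT  (unsigned)
          0xB0: u_dst <= u_src,           # BPF_JLE  (unsigned)
          0xC0: s_dst < s_src,            # BPF_JSLT (signed)
          0xD0: s_dst <= s_src,           # BPF_JSLE (signed)
      }
      return conditions[code]

With ``dst`` holding all ones (64 bits) and ``src`` holding 1, ``BPF_JGT`` is
taken because the unsigned value is large, while ``BPF_JSGT`` is not taken
because the same bits read as -1 when signed.
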
Load and store instructions
===========================

For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_STX``), the
8-bit 'opcode' field is divided as:

============  ======  =================
3 bits (MSB)  2 bits  3 bits (LSB)
============  ======  =================
mode          size    instruction class
============  ======  =================

The mode modifier is one of:

=============  =====  ====================================  ========================================
mode modifier  value  description                           reference
=============  =====  ====================================  ========================================
BPF_IMM        0x00   64-bit immediate instructions         `64-bit immediate instructions`_
BPF_ABS        0x20   legacy BPF packet access (absolute)   `Legacy BPF Packet access instructions`_
BPF_IND        0x40   legacy BPF packet access (indirect)   `Legacy BPF Packet access instructions`_
BPF_MEM        0x60   regular load and store operations     `Regular load and store operations`_
BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
=============  =====  ====================================  ========================================

The size modifier is one of:

=============  =====  =====================
size modifier  value  description
=============  =====  =====================
BPF_W          0x00   word (4 bytes)
BPF_H          0x08   half word (2 bytes)
BPF_B          0x10   byte
BPF_DW         0x18   double word (8 bytes)
=============  =====  =====================

Regular load and store operations
---------------------------------

The ``BPF_MEM`` mode modifier is used to encode regular load and store
instructions that transfer data between a register and memory.

``BPF_MEM | <size> | BPF_STX`` means::

  *(size *) (dst + offset) = src

``BPF_MEM | <size> | BPF_ST`` means::

  *(size *) (dst + offset) = imm32

``BPF_MEM | <size> | BPF_LDX`` means::

  dst = *(size *) (src + offset)

Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.

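A rough Python model of the regular load and store operations is shown below.
It rests on several assumptions made purely for illustration: memory is a flat
``bytearray``, the host is little endian, stored values are truncated to the
access size, and loaded values are zero-extended. The helper names ``store``
and ``load`` and the ``SIZE_BYTES`` table are hypothetical.

.. code-block:: python

  BPF_LDX, BPF_ST, BPF_STX = 0x01, 0x02, 0x03
  BPF_MEM = 0x60
  BPF_W, BPF_H, BPF_B, BPF_DW = 0x00, 0x08, 0x10, 0x18

  # Access width in bytes for each size modifier.
  SIZE_BYTES = {BPF_W: 4, BPF_H: 2, BPF_B: 1, BPF_DW: 8}


  def store(memory, addr, value, size, byteorder="little"):
      """Model *(size *) addr = value on a bytearray."""
      n = SIZE_BYTES[size]
      truncated = value & ((1 << (8 * n)) - 1)
      memory[addr:addr + n] = truncated.to_bytes(n, byteorder)


  def load(memory, addr, size, byteorder="little"):
      """Model value = *(size *) addr on a bytearray (zero-extended)."""
      n = SIZE_BYTES[size]
      return int.from_bytes(memory[addr:addr + n], byteorder)

The opcode byte is formed by OR-ing the three parts together, so
``BPF_MEM | BPF_DW | BPF_LDX`` evaluates to ``0x79`` and
``BPF_MEM | BPF_W | BPF_STX`` to ``0x63``.
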
Atomic operations
-----------------

Atomic operations are operations that operate on memory and cannot be
interrupted or corrupted by other access to the same memory region
by other eBPF programs or means outside of this specification.

All atomic operations supported by eBPF are encoded as store operations
that use the ``BPF_ATOMIC`` mode modifier as follows:

* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
* 8-bit and 16-bit wide atomic operations are not supported.

The 'imm' field is used to encode the actual atomic operation.
Simple atomic operations use a subset of the values defined to encode
arithmetic operations in the 'imm' field to encode the atomic operation:

========  =====  ===========
imm       value  description
========  =====  ===========
BPF_ADD   0x00   atomic add
BPF_OR    0x40   atomic or
BPF_AND   0x50   atomic and
BPF_XOR   0xa0   atomic xor
========  =====  ===========

``BPF_ATOMIC | BPF_W | BPF_STX`` with 'imm' = BPF_ADD means::

  *(u32 *)(dst + offset) += src

``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF_ADD means::

  *(u64 *)(dst + offset) += src

In addition to the simple atomic operations, there is also a modifier and
two complex atomic operations:

===========  ================  ===========================
imm          value             description
===========  ================  ===========================
BPF_FETCH    0x01              modifier: return old value
BPF_XCHG     0xe0 | BPF_FETCH  atomic exchange
BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
===========  ================  ===========================

The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
always set for the complex atomic operations. If the ``BPF_FETCH`` flag
is set, then the operation also overwrites ``src`` with the value that
was in memory before it was modified.

The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
addressed by ``dst + offset``.

The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
``dst + offset`` with ``R0``. If they match, the value addressed by
``dst + offset`` is replaced with ``src``. In either case, the
value that was at ``dst + offset`` before the operation is zero-extended
and loaded back to ``R0``.

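The value semantics of the atomic operations can be summarized with the
following Python sketch. It models only what ends up in memory, in ``src`` and
in ``R0`` after a single instruction; it does not model atomicity itself, since
it runs in a single thread. The function name ``atomic_op`` and its tuple
return value are hypothetical conventions chosen for this illustration.

.. code-block:: python

  BPF_ADD, BPF_OR, BPF_AND, BPF_XOR = 0x00, 0x40, 0x50, 0xA0
  BPF_FETCH = 0x01
  BPF_XCHG = 0xE0 | BPF_FETCH
  BPF_CMPXCHG = 0xF0 | BPF_FETCH


  def atomic_op(imm, old, src, r0, bits):
      """Return (new memory value, new src, new R0) for one atomic instruction.

      `old` is the value at dst + offset before the operation and `bits`
      is 32 for BPF_W or 64 for BPF_DW.
      """
      mask = (1 << bits) - 1
      old &= mask
      operand = src & mask
      if imm == BPF_CMPXCHG:
          new = operand if old == (r0 & mask) else old
          return new, src, old              # R0 receives the old value either way
      if imm == BPF_XCHG:
          return operand, old, r0           # memory and src trade values
      simple = {
          BPF_ADD: (old + operand) & mask,
          BPF_OR: old | operand,
          BPF_AND: old & operand,
          BPF_XOR: old ^ operand,
      }
      new = simple[imm & ~BPF_FETCH]
      # With BPF_FETCH set, src is overwritten with the old memory value.
      return new, (old if imm & BPF_FETCH else src), r0

For example, ``atomic_op(BPF_ADD | BPF_FETCH, 5, 3, 0, 64)`` returns
``(8, 5, 0)``: memory becomes 8 and ``src`` receives the old value 5.
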
64-bit immediate instructions
-----------------------------

Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
encoding for an extra imm64 value.

There is currently only one such instruction.

``BPF_LD | BPF_DW | BPF_IMM`` means::

  dst = imm64

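To make the wide encoding concrete, the following Python sketch builds the two
64-bit words for ``dst = imm64``. It assumes each word is represented as an
integer with 'imm' in the most significant 32 bits and 'opcode' in the least
significant 8 bits, and that 'src_reg' and 'offset' are zero. The function name
``encode_ld_imm64`` is hypothetical.

.. code-block:: python

  BPF_LD, BPF_DW, BPF_IMM = 0x00, 0x18, 0x00


  def encode_ld_imm64(dst_reg, imm64):
      """Return the (basic, pseudo) 64-bit words that encode dst = imm64."""
      opcode = BPF_LD | BPF_DW | BPF_IMM            # 0x18
      low = imm64 & 0xFFFFFFFF                      # goes into the basic 'imm'
      high = (imm64 >> 32) & 0xFFFFFFFF             # goes into the pseudo 'imm'
      basic = (low << 32) | (dst_reg << 8) | opcode
      pseudo = high << 32                           # all other fields are zero
      return basic, pseudo

For example, ``encode_ld_imm64(1, 0x1122334455667788)`` returns the pair
``(0x5566778800000118, 0x1122334400000000)``.
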
Legacy BPF Packet access instructions
-------------------------------------

eBPF previously introduced special instructions for access to packet data that were
carried over from classic BPF. However, these instructions are
deprecated and should no longer be used.