.. contents::
.. sectnum::

=======================================
BPF Instruction Set Specification, v1.0
=======================================

This document specifies version 1.0 of the BPF instruction set.

Documentation conventions
=========================

For brevity and consistency, this document refers to families
of types using a shorthand syntax and refers to several expository,
mnemonic functions when describing the semantics of instructions.
The range of valid values for those types and the semantics of those
functions are defined in the following subsections.

Types
-----
This document refers to integer types with the notation `SN` to specify
a type's signedness (`S`) and bit width (`N`), respectively.

.. table:: Meaning of signedness notation.

  ==== =========
  `S`  Meaning
  ==== =========
  `u`  unsigned
  `s`  signed
  ==== =========

.. table:: Meaning of bit-width notation.

  ===== =========
  `N`   Bit width
  ===== =========
  `8`   8 bits
  `16`  16 bits
  `32`  32 bits
  `64`  64 bits
  `128` 128 bits
  ===== =========

For example, `u32` is a type whose valid values are all the 32-bit unsigned
numbers and `s16` is a type whose valid values are all the 16-bit signed
numbers.

Functions
---------
* `htobe16`: Takes an unsigned 16-bit number in host-endian format and
  returns the equivalent number as an unsigned 16-bit number in big-endian
  format.
* `htobe32`: Takes an unsigned 32-bit number in host-endian format and
  returns the equivalent number as an unsigned 32-bit number in big-endian
  format.
* `htobe64`: Takes an unsigned 64-bit number in host-endian format and
  returns the equivalent number as an unsigned 64-bit number in big-endian
  format.
* `htole16`: Takes an unsigned 16-bit number in host-endian format and
  returns the equivalent number as an unsigned 16-bit number in little-endian
  format.
* `htole32`: Takes an unsigned 32-bit number in host-endian format and
  returns the equivalent number as an unsigned 32-bit number in little-endian
  format.
* `htole64`: Takes an unsigned 64-bit number in host-endian format and
  returns the equivalent number as an unsigned 64-bit number in little-endian
  format.
* `bswap16`: Takes an unsigned 16-bit number in either big- or little-endian
  format and returns the equivalent number with the same bit width but
  opposite endianness.
* `bswap32`: Takes an unsigned 32-bit number in either big- or little-endian
  format and returns the equivalent number with the same bit width but
  opposite endianness.
* `bswap64`: Takes an unsigned 64-bit number in either big- or little-endian
  format and returns the equivalent number with the same bit width but
  opposite endianness.
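
The functions listed above can be sketched in C roughly as follows. This is an
illustration only, not a normative definition: the ``ex_`` prefix and the
runtime endianness probe are assumptions made for this example, and only the
32-bit conversion helpers are shown (the 16-bit and 64-bit ones follow the same
pattern).

```c
#include <stdint.h>

/* Reverse the byte order of a value (same width in, same width out). */
static uint16_t ex_bswap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

static uint32_t ex_bswap32(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
           ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
}

static uint64_t ex_bswap64(uint64_t x)
{
    return ((uint64_t)ex_bswap32((uint32_t)x) << 32) |
           ex_bswap32((uint32_t)(x >> 32));
}

/* Hypothetical probe: nonzero when the host stores the low byte first. */
static int ex_host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Host to big-endian: swap only on a little-endian host. */
static uint32_t ex_htobe32(uint32_t x)
{
    return ex_host_is_little_endian() ? ex_bswap32(x) : x;
}

/* Host to little-endian: swap only on a big-endian host. */
static uint32_t ex_htole32(uint32_t x)
{
    return ex_host_is_little_endian() ? x : ex_bswap32(x);
}
```

On any host exactly one of ``ex_htobe32`` and ``ex_htole32`` is the identity,
and applying either conversion twice returns the original value.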

Definitions
-----------

.. glossary::

  Sign Extend
    To *sign extend* an ``X``-bit number, `A`, to a ``Y``-bit number, `B`, means to

    #. Copy all ``X`` bits from `A` to the lower ``X`` bits of `B`.
    #. Set the value of the remaining ``Y`` - ``X`` bits of `B` to the value of
       the most-significant bit of `A`.

.. admonition:: Example

  Sign extend an 8-bit number ``A`` to a 16-bit number ``B`` on a big-endian platform:
  ::

    A:          10000110
    B: 11111111 10000110
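
The two steps of the definition can be sketched in C as below. The function
name ``ex_sign_extend`` and the choice to always widen to 64 bits are
assumptions made for this illustration.

```c
#include <stdint.h>

/*
 * Sign extend the low `bits` bits of `value` to 64 bits.
 * `bits` is expected to be in the range 1..64.
 */
static uint64_t ex_sign_extend(uint64_t value, unsigned bits)
{
    if (bits >= 64)
        return value;

    uint64_t sign = 1ull << (bits - 1);   /* most-significant bit of A */
    uint64_t mask = (1ull << bits) - 1;   /* the low X bits of A */

    value &= mask;                        /* step 1: keep the X bits */
    return (value ^ sign) - sign;         /* step 2: replicate the sign bit */
}
```

For the 8-bit value ``10000110`` (0x86) from the example, the low 16 bits of
the result are ``11111111 10000110`` (0xff86).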

Conformance groups
------------------

An implementation does not need to support all instructions specified in this
document (e.g., deprecated instructions). Instead, a number of conformance
groups are specified. An implementation must support the base32 conformance
group and may support additional conformance groups, where supporting a
conformance group means it must support all instructions in that conformance
group.

The use of named conformance groups enables interoperability between a runtime
that executes instructions, and tools such as compilers that generate
instructions for the runtime. Thus, capability discovery in terms of
conformance groups might be done manually by users or automatically by tools.

Each conformance group has a short ASCII label (e.g., "base32") that
corresponds to a set of instructions that are mandatory. That is, each
instruction has one or more conformance groups of which it is a member.

This document defines the following conformance groups:

* base32: includes all instructions defined in this
  specification unless otherwise noted.
* base64: includes base32, plus instructions explicitly noted
  as being in the base64 conformance group.
* atomic32: includes 32-bit atomic operation instructions (see `Atomic operations`_).
* atomic64: includes atomic32, plus 64-bit atomic operation instructions.
* divmul32: includes 32-bit division, multiplication, and modulo instructions.
* divmul64: includes divmul32, plus 64-bit division, multiplication,
  and modulo instructions.
* legacy: deprecated packet access instructions.

Instruction encoding
====================

BPF has two instruction encodings:

* the basic instruction encoding, which uses 64 bits to encode an instruction
* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
  constant) value after the basic instruction for a total of 128 bits.

The fields that make up an encoded basic instruction are stored in the
following order::

  opcode:8 src_reg:4 dst_reg:4 offset:16 imm:32 // In little-endian BPF.
  opcode:8 dst_reg:4 src_reg:4 offset:16 imm:32 // In big-endian BPF.

**imm**
  signed integer immediate value

**offset**
  signed integer offset used with pointer arithmetic

**src_reg**
  the source register number (0-10), except where otherwise specified
  (`64-bit immediate instructions`_ reuse this field for other purposes)

**dst_reg**
  destination register number (0-10)

**opcode**
  operation to perform

Note that the contents of multi-byte fields ('imm' and 'offset') are
stored using big-endian byte ordering in big-endian BPF and
little-endian byte ordering in little-endian BPF.

For example::

  opcode                  offset imm          assembly
         src_reg dst_reg
  07     0       1        00 00  44 33 22 11  r1 += 0x11223344 // little
         dst_reg src_reg
  07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big

Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.
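
As an illustration of the two layouts, the following C sketch serializes one
basic instruction into its 8 encoded bytes. The ``struct ex_insn`` type and the
``ex_encode`` function are hypothetical names introduced for this example; the
placement of the two register nibbles within the second byte follows the
``r1 += 0x11223344`` example above.

```c
#include <stdint.h>

/* Hypothetical in-memory view of one basic instruction. */
struct ex_insn {
    uint8_t opcode;
    uint8_t dst_reg;   /* 0-10 */
    uint8_t src_reg;   /* 0-10 */
    int16_t offset;
    int32_t imm;
};

/* Serialize `in` into 8 bytes using the little- or big-endian BPF layout. */
static void ex_encode(const struct ex_insn *in, int big_endian, uint8_t out[8])
{
    uint16_t off = (uint16_t)in->offset;
    uint32_t imm = (uint32_t)in->imm;
    int i;

    out[0] = in->opcode;
    if (big_endian) {
        /* dst_reg comes first, multi-byte fields are most-significant first */
        out[1] = (uint8_t)(((in->dst_reg & 0xf) << 4) | (in->src_reg & 0xf));
        out[2] = (uint8_t)(off >> 8);
        out[3] = (uint8_t)off;
        for (i = 0; i < 4; i++)
            out[4 + i] = (uint8_t)(imm >> (24 - 8 * i));
    } else {
        /* src_reg comes first, multi-byte fields are least-significant first */
        out[1] = (uint8_t)(((in->src_reg & 0xf) << 4) | (in->dst_reg & 0xf));
        out[2] = (uint8_t)off;
        out[3] = (uint8_t)(off >> 8);
        for (i = 0; i < 4; i++)
            out[4 + i] = (uint8_t)(imm >> (8 * i));
    }
}
```

Encoding opcode 0x07 with dst_reg 1, src_reg 0, offset 0, and imm 0x11223344
reproduces the two byte sequences shown in the example.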

As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
instruction uses two 32-bit immediate values that are constructed as follows.
The 64 bits following the basic instruction contain a pseudo instruction
using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
and imm containing the high 32 bits of the immediate value.

This is depicted in the following figure::

          basic_instruction
  .------------------------------.
  |                              |
  opcode:8 regs:8 offset:16 imm:32 unused:32 imm:32
                                   |              |
                                   '--------------'
                                  pseudo instruction

Here, the imm value of the pseudo instruction is called 'next_imm'. The unused
bytes in the pseudo instruction are reserved and shall be cleared to zero.
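
A minimal C sketch of how the two halves combine into one 64-bit value is
shown below. The helper name ``ex_imm64`` is hypothetical. The sketch assumes
that 'imm' contributes its 32 bits unchanged as the low half (that is, without
sign extension), which is what placing 'next_imm' in the high 32 bits implies.

```c
#include <stdint.h>

/* Combine 'imm' (low 32 bits) and 'next_imm' (high 32 bits). */
static uint64_t ex_imm64(int32_t imm, int32_t next_imm)
{
    /* Go through uint32_t so a negative 'imm' does not smear into the top half. */
    return ((uint64_t)(uint32_t)next_imm << 32) | (uint32_t)imm;
}
```

For instance, ``ex_imm64(0x55667788, 0x11223344)`` yields 0x1122334455667788.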

Instruction classes
-------------------

The three LSB bits of the 'opcode' field store the instruction class:

=========  =====  ===============================  ===================================
class      value  description                      reference
=========  =====  ===============================  ===================================
BPF_LD     0x00   non-standard load operations     `Load and store instructions`_
BPF_LDX    0x01   load into register operations    `Load and store instructions`_
BPF_ST     0x02   store from immediate operations  `Load and store instructions`_
BPF_STX    0x03   store from register operations   `Load and store instructions`_
BPF_ALU    0x04   32-bit arithmetic operations     `Arithmetic and jump instructions`_
BPF_JMP    0x05   64-bit jump operations           `Arithmetic and jump instructions`_
BPF_JMP32  0x06   32-bit jump operations           `Arithmetic and jump instructions`_
BPF_ALU64  0x07   64-bit arithmetic operations     `Arithmetic and jump instructions`_
=========  =====  ===============================  ===================================

Arithmetic and jump instructions
================================

For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` and
``BPF_JMP32``), the 8-bit 'opcode' field is divided into three parts:

==============  ======  =================
4 bits (MSB)    1 bit   3 bits (LSB)
==============  ======  =================
code            source  instruction class
==============  ======  =================

**code**
  the operation code, whose meaning varies by instruction class

**source**
  the source operand location, which unless otherwise specified is one of:

  ======  =====  ==============================================
  source  value  description
  ======  =====  ==============================================
  BPF_K   0x00   use 32-bit 'imm' value as source operand
  BPF_X   0x08   use 'src_reg' register value as source operand
  ======  =====  ==============================================

**instruction class**
  the instruction class (see `Instruction classes`_)
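
The three-way split of the opcode can be sketched in C with simple bit masks.
The function names are hypothetical. Each part is returned in place (not
shifted down), so the results can be compared directly with the values listed
in the tables of this document.

```c
#include <stdint.h>

/* 3 least-significant bits: instruction class (e.g. 0x04 for BPF_ALU). */
static uint8_t ex_opcode_class(uint8_t opcode)
{
    return opcode & 0x07;
}

/* 1 bit: source operand location (0x00 for BPF_K, 0x08 for BPF_X). */
static uint8_t ex_opcode_source(uint8_t opcode)
{
    return opcode & 0x08;
}

/* 4 most-significant bits: operation code (e.g. 0x10 for BPF_SUB). */
static uint8_t ex_opcode_code(uint8_t opcode)
{
    return opcode & 0xf0;
}
```

For example, opcode 0x07 decodes to class 0x07 (``BPF_ALU64``), source 0x00
(``BPF_K``), and code 0x00.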

Arithmetic instructions
-----------------------

``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
otherwise identical operations. ``BPF_ALU64`` instructions belong to the
base64 conformance group unless noted otherwise.
The 'code' field encodes the operation as below, where 'src' and 'dst' refer
to the values of the source and destination registers, respectively.

=========  =====  =======  ==========================================================
code       value  offset   description
=========  =====  =======  ==========================================================
BPF_ADD    0x00   0        dst += src
BPF_SUB    0x10   0        dst -= src
BPF_MUL    0x20   0        dst \*= src
BPF_DIV    0x30   0        dst = (src != 0) ? (dst / src) : 0
BPF_SDIV   0x30   1        dst = (src != 0) ? (dst s/ src) : 0
BPF_OR     0x40   0        dst \|= src
BPF_AND    0x50   0        dst &= src
BPF_LSH    0x60   0        dst <<= (src & mask)
BPF_RSH    0x70   0        dst >>= (src & mask)
BPF_NEG    0x80   0        dst = -dst
BPF_MOD    0x90   0        dst = (src != 0) ? (dst % src) : dst
BPF_SMOD   0x90   1        dst = (src != 0) ? (dst s% src) : dst
BPF_XOR    0xa0   0        dst ^= src
BPF_MOV    0xb0   0        dst = src
BPF_MOVSX  0xb0   8/16/32  dst = (s8,s16,s32)src
BPF_ARSH   0xc0   0        :term:`sign extending<Sign Extend>` dst >>= (src & mask)
BPF_END    0xd0   0        byte swap operations (see `Byte swap instructions`_ below)
=========  =====  =======  ==========================================================

Underflow and overflow are allowed during arithmetic operations, meaning
the 64-bit or 32-bit value will wrap. If BPF program execution would
result in division by zero, the destination register is instead set to zero.
If execution would result in modulo by zero, for ``BPF_ALU64`` the value of
the destination register is unchanged whereas for ``BPF_ALU`` the upper
32 bits of the destination register are zeroed.
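
The divide-by-zero and modulo-by-zero rules for the unsigned operations can be
sketched in C as follows. The function names are hypothetical, and registers
are modeled as plain ``uint64_t`` values; each function returns the new value
of the destination register.

```c
#include <stdint.h>

/* BPF_DIV, BPF_ALU64: division by zero yields zero. */
static uint64_t ex_div64(uint64_t dst, uint64_t src)
{
    return src != 0 ? dst / src : 0;
}

/* BPF_MOD, BPF_ALU64: modulo by zero leaves dst unchanged. */
static uint64_t ex_mod64(uint64_t dst, uint64_t src)
{
    return src != 0 ? dst % src : dst;
}

/* BPF_DIV, BPF_ALU: operate on the low 32 bits; upper 32 bits end up zero. */
static uint64_t ex_div32(uint64_t dst, uint64_t src)
{
    uint32_t d = (uint32_t)dst, s = (uint32_t)src;

    return s != 0 ? d / s : 0;
}

/* BPF_MOD, BPF_ALU: modulo by zero keeps the low 32 bits, zeroes the rest. */
static uint64_t ex_mod32(uint64_t dst, uint64_t src)
{
    uint32_t d = (uint32_t)dst, s = (uint32_t)src;

    return s != 0 ? d % s : d;
}
```

Note the asymmetry for modulo by zero: with dst = 0x100000007, ``ex_mod64``
returns the value unchanged while ``ex_mod32`` returns 7.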

``BPF_ADD | BPF_X | BPF_ALU`` means::

  dst = (u32) ((u32) dst + (u32) src)

where '(u32)' indicates that the upper 32 bits are zeroed.

``BPF_ADD | BPF_X | BPF_ALU64`` means::

  dst = dst + src

``BPF_XOR | BPF_K | BPF_ALU`` means::

  dst = (u32) dst ^ (u32) imm32

``BPF_XOR | BPF_K | BPF_ALU64`` means::

  dst = dst ^ imm32

Note that most instructions have an instruction offset of 0. Only three instructions
(``BPF_SDIV``, ``BPF_SMOD``, ``BPF_MOVSX``) have a non-zero offset.

Division, multiplication, and modulo operations for ``BPF_ALU`` are part
of the "divmul32" conformance group, and division, multiplication, and
modulo operations for ``BPF_ALU64`` are part of the "divmul64" conformance
group.
The division and modulo operations support both unsigned and signed flavors.

For unsigned operations (``BPF_DIV`` and ``BPF_MOD``), for ``BPF_ALU``,
'imm' is interpreted as a 32-bit unsigned value. For ``BPF_ALU64``,
'imm' is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
interpreted as a 64-bit unsigned value.

For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``,
'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm'
is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
interpreted as a 64-bit signed value.

Note that there are varying definitions of the signed modulo operation
when the dividend or divisor are negative, where implementations often
vary by language such that Python, Ruby, etc. differ from C, Go, Java,
etc. This specification requires that signed modulo use truncated division
(where -13 % 3 == -1) as implemented in C, Go, etc.:

   a % n = a - n * trunc(a / n)
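
Because C's ``%`` operator already uses truncated division, a 64-bit signed
modulo can be sketched as below. The function name is hypothetical. The guard
for a divisor of -1 is an assumption added only to avoid undefined behavior in
C when the dividend is the most negative value; the result it returns (zero)
is what the formula above gives for any dividend.

```c
#include <stdint.h>

/* BPF_SMOD on 64-bit operands, using truncated division. */
static int64_t ex_smod64(int64_t dst, int64_t src)
{
    if (src == 0)
        return dst;     /* modulo by zero leaves dst unchanged */
    if (src == -1)
        return 0;       /* avoids INT64_MIN % -1, which is undefined in C */
    return dst % src;   /* sign of the result follows the dividend */
}
```

With this sketch, ``ex_smod64(-13, 3)`` is -1 and ``ex_smod64(13, -3)`` is 1.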

The ``BPF_MOVSX`` instruction does a move operation with sign extension.
``BPF_ALU | BPF_MOVSX`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into
32-bit operands, and zeroes the remaining upper 32 bits.
``BPF_ALU64 | BPF_MOVSX`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit
operands into 64-bit operands. Unlike other arithmetic instructions,
``BPF_MOVSX`` is only defined for register source operands (``BPF_X``).

The ``BPF_NEG`` instruction is only defined when the source bit is clear
(``BPF_K``).

Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31)
for 32-bit operations.
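
The effect of the shift masks can be sketched in C as follows. The function
names are hypothetical. The arithmetic right shift is written without relying
on C's implementation-defined behavior for right shifts of negative signed
values.

```c
#include <stdint.h>

/* BPF_LSH, BPF_ALU64: only the low 6 bits of the shift amount are used. */
static uint64_t ex_lsh64(uint64_t dst, uint64_t src)
{
    return dst << (src & 0x3f);
}

/* BPF_LSH, BPF_ALU: low 5 bits of the shift amount, 32-bit result. */
static uint64_t ex_lsh32(uint64_t dst, uint64_t src)
{
    return (uint32_t)((uint32_t)dst << (src & 0x1f));
}

/* BPF_ARSH, BPF_ALU64: shift right and fill vacated bits with the sign bit. */
static uint64_t ex_arsh64(uint64_t dst, uint64_t src)
{
    unsigned n = (unsigned)(src & 0x3f);
    uint64_t shifted = dst >> n;

    if (n != 0 && (dst >> 63))
        shifted |= ~0ull << (64 - n);   /* replicate the sign bit */
    return shifted;
}
```

A shift amount of 64 therefore behaves like 0 for 64-bit operations, and a
shift amount of 33 behaves like 1 for 32-bit operations.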

Byte swap instructions
----------------------

The byte swap instructions use instruction classes of ``BPF_ALU`` and ``BPF_ALU64``
and a 4-bit 'code' field of ``BPF_END``.

The byte swap instructions operate on the destination register
only and do not use a separate source register or immediate value.

For ``BPF_ALU``, the 1-bit source operand field in the opcode is used to
select what byte order the operation converts from or to. For
``BPF_ALU64``, the 1-bit source operand field in the opcode is reserved
and must be set to 0.

=========  =========  =====  =================================================
class      source     value  description
=========  =========  =====  =================================================
BPF_ALU    BPF_TO_LE  0x00   convert between host byte order and little endian
BPF_ALU    BPF_TO_BE  0x08   convert between host byte order and big endian
BPF_ALU64  Reserved   0x00   do byte swap unconditionally
=========  =========  =====  =================================================

The 'imm' field encodes the width of the swap operations. The following widths
are supported: 16, 32 and 64. Width 64 operations belong to the base64
conformance group and other swap operations belong to the base32
conformance group.

Examples:

``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means::

  dst = htole16(dst)
  dst = htole32(dst)
  dst = htole64(dst)

``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 16/32/64 means::

  dst = htobe16(dst)
  dst = htobe32(dst)
  dst = htobe64(dst)

``BPF_ALU64 | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means::

  dst = bswap16(dst)
  dst = bswap32(dst)
  dst = bswap64(dst)

Jump instructions
-----------------

``BPF_JMP32`` uses 32-bit wide operands and indicates the base32
conformance group, while ``BPF_JMP`` uses 64-bit wide operands for
otherwise identical operations, and indicates the base64 conformance
group unless otherwise specified.
The 'code' field encodes the operation as below:

========  =====  ===  ===============================  =============================================
code      value  src  description                      notes
========  =====  ===  ===============================  =============================================
BPF_JA    0x0    0x0  PC += offset                     BPF_JMP | BPF_K only
BPF_JA    0x0    0x0  PC += imm                        BPF_JMP32 | BPF_K only
BPF_JEQ   0x1    any  PC += offset if dst == src
BPF_JGT   0x2    any  PC += offset if dst > src        unsigned
BPF_JGE   0x3    any  PC += offset if dst >= src       unsigned
BPF_JSET  0x4    any  PC += offset if dst & src
BPF_JNE   0x5    any  PC += offset if dst != src
BPF_JSGT  0x6    any  PC += offset if dst > src        signed
BPF_JSGE  0x7    any  PC += offset if dst >= src       signed
BPF_CALL  0x8    0x0  call helper function by address  BPF_JMP | BPF_K only, see `Helper functions`_
BPF_CALL  0x8    0x1  call PC += imm                   BPF_JMP | BPF_K only, see `Program-local functions`_
BPF_CALL  0x8    0x2  call helper function by BTF ID   BPF_JMP | BPF_K only, see `Helper functions`_
BPF_EXIT  0x9    0x0  return                           BPF_JMP | BPF_K only
BPF_JLT   0xa    any  PC += offset if dst < src        unsigned
BPF_JLE   0xb    any  PC += offset if dst <= src       unsigned
BPF_JSLT  0xc    any  PC += offset if dst < src        signed
BPF_JSLE  0xd    any  PC += offset if dst <= src       signed
========  =====  ===  ===============================  =============================================

The BPF program needs to store the return value into register R0 before doing a
``BPF_EXIT``.

Example:

``BPF_JSGE | BPF_X | BPF_JMP32`` (0x7e) means::

  if (s32)dst s>= (s32)src goto +offset

where 's>=' indicates a signed '>=' comparison.
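
The difference between signed and unsigned comparisons, and between 32-bit and
64-bit operand widths, can be sketched in C as below. The function names are
hypothetical; each returns nonzero when the branch would be taken. The casts
from ``uint32_t`` to ``int32_t`` assume the usual two's complement conversion.

```c
#include <stdint.h>

/* BPF_JSGE | BPF_X | BPF_JMP32: signed compare of the low 32 bits. */
static int ex_jsge32_taken(uint64_t dst, uint64_t src)
{
    return (int32_t)(uint32_t)dst >= (int32_t)(uint32_t)src;
}

/* BPF_JSGE | BPF_X | BPF_JMP: signed compare of all 64 bits. */
static int ex_jsge64_taken(uint64_t dst, uint64_t src)
{
    return (int64_t)dst >= (int64_t)src;
}

/* BPF_JGE | BPF_X | BPF_JMP32: unsigned compare of the low 32 bits. */
static int ex_jge32_taken(uint64_t dst, uint64_t src)
{
    return (uint32_t)dst >= (uint32_t)src;
}
```

With dst = 0xffffffff and src = 0, the 32-bit signed comparison is not taken
(dst is read as -1), while the 64-bit signed and the 32-bit unsigned
comparisons are both taken.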

``BPF_JA | BPF_K | BPF_JMP32`` (0x06) means::

  gotol +imm

where 'imm' means the branch offset comes from insn 'imm' field.

Note that there are two flavors of ``BPF_JA`` instructions. The
``BPF_JMP`` class permits a 16-bit jump offset specified by the 'offset'
field, whereas the ``BPF_JMP32`` class permits a 32-bit jump offset
specified by the 'imm' field. A > 16-bit conditional jump may be
converted to a < 16-bit conditional jump plus a 32-bit unconditional
jump.

All ``BPF_CALL`` and ``BPF_JA`` instructions belong to the
base32 conformance group.

Helper functions
~~~~~~~~~~~~~~~~

Helper functions are a concept whereby BPF programs can call into a
set of function calls exposed by the underlying platform.

Historically, each helper function was identified by an address
encoded in the imm field. The available helper functions may differ
for each program type, but address values are unique across all program types.

Platforms that support the BPF Type Format (BTF) support identifying
a helper function by a BTF ID encoded in the imm field, where the BTF ID
identifies the helper name and type.

Program-local functions
~~~~~~~~~~~~~~~~~~~~~~~
Program-local functions are functions exposed by the same BPF program as the
caller, and are referenced by offset from the call instruction, similar to
``BPF_JA``. The offset is encoded in the imm field of the call instruction.
A ``BPF_EXIT`` within the program-local function will return to the caller.

Load and store instructions
===========================

For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_STX``), the
8-bit 'opcode' field is divided as:

============  ======  =================
3 bits (MSB)  2 bits  3 bits (LSB)
============  ======  =================
mode          size    instruction class
============  ======  =================

The mode modifier is one of:

  =============  =====  ====================================  =============
  mode modifier  value  description                           reference
  =============  =====  ====================================  =============
  BPF_IMM        0x00   64-bit immediate instructions         `64-bit immediate instructions`_
  BPF_ABS        0x20   legacy BPF packet access (absolute)   `Legacy BPF Packet access instructions`_
  BPF_IND        0x40   legacy BPF packet access (indirect)   `Legacy BPF Packet access instructions`_
  BPF_MEM        0x60   regular load and store operations     `Regular load and store operations`_
  BPF_MEMSX      0x80   sign-extension load operations        `Sign-extension load operations`_
  BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
  =============  =====  ====================================  =============

The size modifier is one of:

  =============  =====  =====================
  size modifier  value  description
  =============  =====  =====================
  BPF_W          0x00   word        (4 bytes)
  BPF_H          0x08   half word   (2 bytes)
  BPF_B          0x10   byte
  BPF_DW         0x18   double word (8 bytes)
  =============  =====  =====================

Instructions using ``BPF_DW`` belong to the base64 conformance group.

Regular load and store operations
---------------------------------

The ``BPF_MEM`` mode modifier is used to encode regular load and store
instructions that transfer data between a register and memory.

``BPF_MEM | <size> | BPF_STX`` means::

  *(size *) (dst + offset) = src

``BPF_MEM | <size> | BPF_ST`` means::

  *(size *) (dst + offset) = imm32

``BPF_MEM | <size> | BPF_LDX`` means::

  dst = *(unsigned size *) (src + offset)

Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW`` and
'unsigned size' is one of u8, u16, u32 or u64.

Sign-extension load operations
------------------------------

The ``BPF_MEMSX`` mode modifier is used to encode :term:`sign-extension<Sign Extend>` load
instructions that transfer data between a register and memory.

``BPF_MEMSX | <size> | BPF_LDX`` means::

  dst = *(signed size *) (src + offset)

Where size is one of: ``BPF_B``, ``BPF_H`` or ``BPF_W``, and
'signed size' is one of s8, s16 or s32.
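
The difference between ``BPF_MEM`` and ``BPF_MEMSX`` loads can be sketched in C
as below. The names ``ex_size_in_bytes`` and ``ex_ldx`` are hypothetical, the
address computation ``src + offset`` is assumed to have been done by the
caller, and memory is read in host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Map the 2-bit size modifier of a load/store opcode to a byte count. */
static unsigned ex_size_in_bytes(uint8_t opcode)
{
    switch (opcode & 0x18) {
    case 0x00: return 4;    /* BPF_W */
    case 0x08: return 2;    /* BPF_H */
    case 0x10: return 1;    /* BPF_B */
    default:   return 8;    /* BPF_DW */
    }
}

/* Value loaded into dst by a BPF_LDX instruction reading from `addr`. */
static uint64_t ex_ldx(uint8_t opcode, const void *addr)
{
    unsigned n = ex_size_in_bytes(opcode);
    uint64_t value = 0;
    uint8_t b;
    uint16_t h;
    uint32_t w;

    switch (n) {
    case 1:  memcpy(&b, addr, 1); value = b; break;
    case 2:  memcpy(&h, addr, 2); value = h; break;
    case 4:  memcpy(&w, addr, 4); value = w; break;
    default: memcpy(&value, addr, 8); break;
    }

    /* BPF_MEMSX (mode 0x80): replicate the sign bit of the loaded value. */
    if ((opcode & 0xe0) == 0x80 && n < 8) {
        uint64_t sign = 1ull << (8 * n - 1);

        value = (value ^ sign) - sign;
    }
    return value;
}
```

Loading the byte 0x86 with ``BPF_MEM | BPF_B | BPF_LDX`` (0x71) gives 0x86,
whereas ``BPF_MEMSX | BPF_B | BPF_LDX`` (0x91) gives 0xffffffffffffff86.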

Atomic operations
-----------------

Atomic operations are operations that operate on memory and cannot be
interrupted or corrupted by other access to the same memory region
by other BPF programs or means outside of this specification.

All atomic operations supported by BPF are encoded as store operations
that use the ``BPF_ATOMIC`` mode modifier as follows:

* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations, which are
  part of the "atomic32" conformance group.
* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations, which are
  part of the "atomic64" conformance group.
* 8-bit and 16-bit wide atomic operations are not supported.

The 'imm' field is used to encode the actual atomic operation.
Simple atomic operations use a subset of the values defined to encode
arithmetic operations in the 'imm' field to encode the atomic operation:

========  =====  ===========
imm       value  description
========  =====  ===========
BPF_ADD   0x00   atomic add
BPF_OR    0x40   atomic or
BPF_AND   0x50   atomic and
BPF_XOR   0xa0   atomic xor
========  =====  ===========


``BPF_ATOMIC | BPF_W | BPF_STX`` with 'imm' = BPF_ADD means::

  *(u32 *)(dst + offset) += src

``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF_ADD means::

  *(u64 *)(dst + offset) += src

In addition to the simple atomic operations, there is also a modifier and
two complex atomic operations:

===========  ================  ===========================
imm          value             description
===========  ================  ===========================
BPF_FETCH    0x01              modifier: return old value
BPF_XCHG     0xe0 | BPF_FETCH  atomic exchange
BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
===========  ================  ===========================

The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
always set for the complex atomic operations. If the ``BPF_FETCH`` flag
is set, then the operation also overwrites ``src`` with the value that
was in memory before it was modified.

The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
addressed by ``dst + offset``.

The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
``dst + offset`` with ``R0``. If they match, the value addressed by
``dst + offset`` is replaced with ``src``. In either case, the
value that was at ``dst + offset`` before the operation is zero-extended
and loaded back to ``R0``.
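
The 64-bit fetch, exchange, and compare-and-exchange semantics can be
approximated with C11 atomics, as in the sketch below. The function names are
hypothetical, and the use of the default sequentially consistent memory
ordering is an assumption of this example, not something this document
specifies.

```c
#include <stdatomic.h>
#include <stdint.h>

/* BPF_ADD | BPF_FETCH: add src to memory, return the old value (new src). */
static uint64_t ex_fetch_add64(_Atomic uint64_t *addr, uint64_t src)
{
    return atomic_fetch_add(addr, src);
}

/* BPF_XCHG: store src to memory, return the old value (new src). */
static uint64_t ex_xchg64(_Atomic uint64_t *addr, uint64_t src)
{
    return atomic_exchange(addr, src);
}

/*
 * BPF_CMPXCHG: if memory equals r0, replace it with src.
 * Returns the value that was in memory before the operation (new R0).
 */
static uint64_t ex_cmpxchg64(_Atomic uint64_t *addr, uint64_t r0, uint64_t src)
{
    /* On failure, r0 is overwritten with the current memory value;
     * on success, r0 already equals the old memory value. */
    atomic_compare_exchange_strong(addr, &r0, src);
    return r0;
}
```

In this sketch a failed compare-and-exchange leaves memory untouched but still
returns the value found there, matching the "in either case" wording above.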

64-bit immediate instructions
-----------------------------

Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
encoding defined in `Instruction encoding`_, and use the 'src' field of the
basic instruction to hold an opcode subtype.

The following table defines a set of ``BPF_IMM | BPF_DW | BPF_LD`` instructions
with opcode subtypes in the 'src' field, using new terms such as "map"
defined further below:

=========================  ======  ===  =========================================  ===========  ==============
opcode construction        opcode  src  pseudocode                                 imm type     dst type
=========================  ======  ===  =========================================  ===========  ==============
BPF_IMM | BPF_DW | BPF_LD  0x18    0x0  dst = (next_imm << 32) | imm               integer      integer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x1  dst = map_by_fd(imm)                       map fd       map
BPF_IMM | BPF_DW | BPF_LD  0x18    0x2  dst = map_val(map_by_fd(imm)) + next_imm   map fd       data pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x3  dst = var_addr(imm)                        variable id  data pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x4  dst = code_addr(imm)                       integer      code pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x5  dst = map_by_idx(imm)                      map index    map
BPF_IMM | BPF_DW | BPF_LD  0x18    0x6  dst = map_val(map_by_idx(imm)) + next_imm  map index    data pointer
=========================  ======  ===  =========================================  ===========  ==============

where

* map_by_fd(imm) means to convert a 32-bit file descriptor into an address of a map (see `Maps`_)
* map_by_idx(imm) means to convert a 32-bit index into an address of a map
* map_val(map) gets the address of the first value in a given map
* var_addr(imm) gets the address of a platform variable (see `Platform Variables`_) with a given id
* code_addr(imm) gets the address of the instruction at a specified relative offset in number of (64-bit) instructions
* the 'imm type' can be used by disassemblers for display
* the 'dst type' can be used for verification and JIT compilation purposes
631
632Maps
633~~~~
634
7d35eb1a 635Maps are shared memory regions accessible by BPF programs on some platforms.
16b7c970
DT
636A map can have various semantics as defined in a separate document, and may or
637may not have a single contiguous memory region, but the 'map_val(map)' is
638currently only defined for maps that do have a single contiguous memory region.
639
640Each map can have a file descriptor (fd) if supported by the platform, where
641'map_by_fd(imm)' means to get the map with the specified file descriptor. Each
642BPF program can also be defined to use a set of maps associated with the
643program at load time, and 'map_by_idx(imm)' means to get the map with the given
644index in the set associated with the BPF program containing the instruction.
645
646Platform Variables
647~~~~~~~~~~~~~~~~~~
648
649Platform variables are memory regions, identified by integer ids, exposed by
650the runtime and accessible by BPF programs on some platforms. The
651'var_addr(imm)' operation means to get the address of the memory region
652identified by the given id.
63d000c3 653
15175336
CH
654Legacy BPF Packet access instructions
655-------------------------------------
63d000c3 656
7d35eb1a 657BPF previously introduced special instructions for access to packet data that were
088a464e
DT
658carried over from classic BPF. These instructions used an instruction
659class of BPF_LD, a size modifier of BPF_W, BPF_H, or BPF_B, and a
660mode modifier of BPF_ABS or BPF_IND. However, these instructions are
81777efb 661deprecated and should no longer be used. All legacy packet access
2d9a925d 662instructions belong to the "legacy" conformance group.