.. contents::
.. sectnum::

======================================
BPF Instruction Set Architecture (ISA)
======================================

This document specifies the BPF instruction set architecture (ISA).

Documentation conventions
=========================

For brevity and consistency, this document refers to families
of types using a shorthand syntax and refers to several expository,
mnemonic functions when describing the semantics of instructions.
The range of valid values for those types and the semantics of those
functions are defined in the following subsections.

Types
-----
This document refers to integer types with the notation `SN` to specify
a type's signedness (`S`) and bit width (`N`), respectively.

.. table:: Meaning of signedness notation.

  ==== =========
  `S`  Meaning
  ==== =========
  `u`  unsigned
  `s`  signed
  ==== =========

.. table:: Meaning of bit-width notation.

  ===== =========
  `N`   Bit width
  ===== =========
  `8`   8 bits
  `16`  16 bits
  `32`  32 bits
  `64`  64 bits
  `128` 128 bits
  ===== =========

For example, `u32` is a type whose valid values are all the 32-bit unsigned
numbers and `s16` is a type whose valid values are all the 16-bit signed
numbers.

Functions
---------
* `htobe16`: Takes an unsigned 16-bit number in host-endian format and
  returns the equivalent number as an unsigned 16-bit number in big-endian
  format.
* `htobe32`: Takes an unsigned 32-bit number in host-endian format and
  returns the equivalent number as an unsigned 32-bit number in big-endian
  format.
* `htobe64`: Takes an unsigned 64-bit number in host-endian format and
  returns the equivalent number as an unsigned 64-bit number in big-endian
  format.
* `htole16`: Takes an unsigned 16-bit number in host-endian format and
  returns the equivalent number as an unsigned 16-bit number in little-endian
  format.
* `htole32`: Takes an unsigned 32-bit number in host-endian format and
  returns the equivalent number as an unsigned 32-bit number in little-endian
  format.
* `htole64`: Takes an unsigned 64-bit number in host-endian format and
  returns the equivalent number as an unsigned 64-bit number in little-endian
  format.
* `bswap16`: Takes an unsigned 16-bit number in either big- or little-endian
  format and returns the equivalent number with the same bit width but
  opposite endianness.
* `bswap32`: Takes an unsigned 32-bit number in either big- or little-endian
  format and returns the equivalent number with the same bit width but
  opposite endianness.
* `bswap64`: Takes an unsigned 64-bit number in either big- or little-endian
  format and returns the equivalent number with the same bit width but
  opposite endianness.

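
The byte-order functions listed above can be illustrated with a minimal
Python sketch. It is not part of the specification: the helper names
``bswap``, ``htobe`` and ``htole`` and their ``bits`` parameter are
hypothetical, and a value is modelled as a non-negative Python integer.

.. code-block:: python

  import sys

  def bswap(value, bits):
      """Reverse the byte order of an unsigned integer of the given bit width."""
      nbytes = bits // 8
      return int.from_bytes(value.to_bytes(nbytes, "little"), "big")

  def htobe(value, bits):
      """Host order to big-endian: a byte swap only on little-endian hosts."""
      return bswap(value, bits) if sys.byteorder == "little" else value

  def htole(value, bits):
      """Host order to little-endian: a byte swap only on big-endian hosts."""
      return bswap(value, bits) if sys.byteorder == "big" else value

On any given host exactly one of ``htobe`` and ``htole`` is the identity,
while ``bswap`` always reverses the bytes.
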
Definitions
-----------

.. glossary::

  Sign Extend
    To *sign extend* an ``X``-bit number, ``A``, to a ``Y``-bit number, ``B``, means to:

    #. Copy all ``X`` bits from ``A`` to the lower ``X`` bits of ``B``.
    #. Set the value of the remaining ``Y`` - ``X`` bits of ``B`` to the value of
       the most-significant bit of ``A``.

.. admonition:: Example

  Sign extend an 8-bit number ``A`` to a 16-bit number ``B`` on a big-endian platform:
  ::

    A:          10000110
    B: 11111111 10000110

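
The two steps of the sign extension definition can be written as a short
Python sketch. The function name ``sign_extend`` and its parameters are
hypothetical; the result is returned as an unsigned bit pattern of
``to_bits`` bits.

.. code-block:: python

  def sign_extend(value, from_bits, to_bits):
      """Sign extend the low from_bits of value to a to_bits-wide bit pattern."""
      low_mask = (1 << from_bits) - 1
      value &= low_mask                        # step 1: copy the X source bits
      if value & (1 << (from_bits - 1)):       # most-significant bit of A is set
          # step 2: fill the remaining Y - X upper bits with ones
          value |= ((1 << to_bits) - 1) & ~low_mask
      return value

With the operands of the example above, ``sign_extend(0b10000110, 8, 16)``
yields ``0b1111111110000110``.
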
Conformance groups
------------------

An implementation does not need to support all instructions specified in this
document (e.g., deprecated instructions).  Instead, a number of conformance
groups are specified.  An implementation must support the base32 conformance
group and may support additional conformance groups, where supporting a
conformance group means it must support all instructions in that conformance
group.

The use of named conformance groups enables interoperability between a runtime
that executes instructions, and tools such as compilers that generate
instructions for the runtime.  Thus, capability discovery in terms of
conformance groups might be done manually by users or automatically by tools.

Each conformance group has a short ASCII label (e.g., "base32") that
corresponds to a set of instructions that are mandatory.  That is, each
instruction has one or more conformance groups of which it is a member.

This document defines the following conformance groups:

* base32: includes all instructions defined in this
  specification unless otherwise noted.
* base64: includes base32, plus instructions explicitly noted
  as being in the base64 conformance group.
* atomic32: includes 32-bit atomic operation instructions (see `Atomic operations`_).
* atomic64: includes atomic32, plus 64-bit atomic operation instructions.
* divmul32: includes 32-bit division, multiplication, and modulo instructions.
* divmul64: includes divmul32, plus 64-bit division, multiplication,
  and modulo instructions.
* legacy: deprecated packet access instructions.

Instruction encoding
====================

BPF has two instruction encodings:

* the basic instruction encoding, which uses 64 bits to encode an instruction
* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
  constant) value after the basic instruction for a total of 128 bits.

The fields that make up an encoded basic instruction are stored in the
following order::

  opcode:8 src_reg:4 dst_reg:4 offset:16 imm:32 // In little-endian BPF.
  opcode:8 dst_reg:4 src_reg:4 offset:16 imm:32 // In big-endian BPF.

**imm**
  signed integer immediate value

**offset**
  signed integer offset used with pointer arithmetic

**src_reg**
  the source register number (0-10), except where otherwise specified
  (`64-bit immediate instructions`_ reuse this field for other purposes)

**dst_reg**
  destination register number (0-10)

**opcode**
  operation to perform

Note that the contents of multi-byte fields ('imm' and 'offset') are
stored using big-endian byte ordering in big-endian BPF and
little-endian byte ordering in little-endian BPF.

For example::

  opcode                  offset imm          assembly
         src_reg dst_reg
  07     0       1        00 00  44 33 22 11  r1 += 0x11223344 // little
         dst_reg src_reg
  07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big

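
The field layout can be reproduced with a small Python sketch built on the
standard ``struct`` module. The function name ``encode_basic`` and its
``big_endian`` flag are hypothetical, and the sketch assumes that the first
register listed in the layout above occupies the high nibble of the register
byte, which is how the example bytes are written.

.. code-block:: python

  import struct

  def encode_basic(opcode, dst_reg, src_reg, offset, imm, big_endian=False):
      """Pack one 64-bit basic instruction into 8 bytes."""
      if big_endian:
          regs = (dst_reg << 4) | src_reg      # dst_reg in the high nibble
          fmt = ">BBhi"
      else:
          regs = (src_reg << 4) | dst_reg      # src_reg in the high nibble
          fmt = "<BBhi"
      # B: opcode, B: registers, h: signed 16-bit offset, i: signed 32-bit imm
      return struct.pack(fmt, opcode, regs, offset, imm)

Calling ``encode_basic(0x07, 1, 0, 0, 0x11223344)`` gives the little-endian
byte sequence of the example, and passing ``big_endian=True`` gives the
big-endian one.
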
Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.

As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
instruction uses two 32-bit immediate values that are constructed as follows.
The 64 bits following the basic instruction contain a pseudo instruction
using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
and imm containing the high 32 bits of the immediate value.

This is depicted in the following figure::

        basic_instruction
  .------------------------------.
  |                              |
  opcode:8 regs:8 offset:16 imm:32 unused:32 imm:32
                                   |              |
                                   '--------------'
                                  pseudo instruction

Here, the imm value of the pseudo instruction is called 'next_imm'. The unused
bytes in the pseudo instruction are reserved and shall be cleared to zero.

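
As an illustration of the wide encoding, the following Python sketch splits a
64-bit constant into 'imm' and 'next_imm' and recombines them. It covers
little-endian BPF only, and the names ``encode_wide_le`` and
``decode_imm64_le`` are hypothetical.

.. code-block:: python

  import struct

  def encode_wide_le(opcode, dst_reg, src_reg, imm64):
      """Pack a 128-bit wide instruction for little-endian BPF."""
      imm = imm64 & 0xFFFFFFFF                 # low 32 bits: basic instruction
      next_imm = (imm64 >> 32) & 0xFFFFFFFF    # high 32 bits: pseudo instruction
      basic = struct.pack("<BBhI", opcode, (src_reg << 4) | dst_reg, 0, imm)
      pseudo = struct.pack("<BBhI", 0, 0, 0, next_imm)  # other fields are zero
      return basic + pseudo

  def decode_imm64_le(raw):
      """Recombine imm and next_imm from a 16-byte wide instruction."""
      (imm,) = struct.unpack_from("<I", raw, 4)
      (next_imm,) = struct.unpack_from("<I", raw, 12)
      return (next_imm << 32) | imm

Both halves are treated as unsigned 32-bit patterns here so that the
recombined value equals the original 64-bit constant.
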
Instruction classes
-------------------

The three least significant bits of the 'opcode' field store the instruction class:

=========  =====  ===============================  ===================================
class      value  description                      reference
=========  =====  ===============================  ===================================
BPF_LD     0x00   non-standard load operations     `Load and store instructions`_
BPF_LDX    0x01   load into register operations    `Load and store instructions`_
BPF_ST     0x02   store from immediate operations  `Load and store instructions`_
BPF_STX    0x03   store from register operations   `Load and store instructions`_
BPF_ALU    0x04   32-bit arithmetic operations     `Arithmetic and jump instructions`_
BPF_JMP    0x05   64-bit jump operations           `Arithmetic and jump instructions`_
BPF_JMP32  0x06   32-bit jump operations           `Arithmetic and jump instructions`_
BPF_ALU64  0x07   64-bit arithmetic operations     `Arithmetic and jump instructions`_
=========  =====  ===============================  ===================================

Arithmetic and jump instructions
================================

For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` and
``BPF_JMP32``), the 8-bit 'opcode' field is divided into three parts:

==============  ======  =================
4 bits (MSB)    1 bit   3 bits (LSB)
==============  ======  =================
code            source  instruction class
==============  ======  =================

**code**
  the operation code, whose meaning varies by instruction class

**source**
  the source operand location, which unless otherwise specified is one of:

  ======  =====  ==============================================
  source  value  description
  ======  =====  ==============================================
  BPF_K   0x00   use 32-bit 'imm' value as source operand
  BPF_X   0x08   use 'src_reg' register value as source operand
  ======  =====  ==============================================

**instruction class**
  the instruction class (see `Instruction classes`_)

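
A minimal Python sketch of this three-way split follows. The function name
``split_alu_jmp_opcode`` and the ``BPF_CLASS_NAMES`` list are hypothetical;
the 'code' part is returned in place (as ``opcode & 0xF0``), not shifted
down.

.. code-block:: python

  BPF_CLASS_NAMES = ["BPF_LD", "BPF_LDX", "BPF_ST", "BPF_STX",
                     "BPF_ALU", "BPF_JMP", "BPF_JMP32", "BPF_ALU64"]

  def split_alu_jmp_opcode(opcode):
      """Split an 8-bit opcode into (code, source, instruction class name)."""
      code = opcode & 0xF0                         # 4 most significant bits
      source = opcode & 0x08                       # BPF_K (0x00) or BPF_X (0x08)
      insn_class = BPF_CLASS_NAMES[opcode & 0x07]  # 3 least significant bits
      return code, source, insn_class

For instance, opcode ``0x07`` splits into code ``0x00``, source ``BPF_K``
(``0x00``) and class ``BPF_ALU64``.
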
Arithmetic instructions
-----------------------

``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
otherwise identical operations. ``BPF_ALU64`` instructions belong to the
base64 conformance group unless noted otherwise.
The 'code' field encodes the operation as below, where 'src' and 'dst' refer
to the values of the source and destination registers, respectively.

=========  =====  =======  ==========================================================
code       value  offset   description
=========  =====  =======  ==========================================================
BPF_ADD    0x00   0        dst += src
BPF_SUB    0x10   0        dst -= src
BPF_MUL    0x20   0        dst \*= src
BPF_DIV    0x30   0        dst = (src != 0) ? (dst / src) : 0
BPF_SDIV   0x30   1        dst = (src != 0) ? (dst s/ src) : 0
BPF_OR     0x40   0        dst \|= src
BPF_AND    0x50   0        dst &= src
BPF_LSH    0x60   0        dst <<= (src & mask)
BPF_RSH    0x70   0        dst >>= (src & mask)
BPF_NEG    0x80   0        dst = -dst
BPF_MOD    0x90   0        dst = (src != 0) ? (dst % src) : dst
BPF_SMOD   0x90   1        dst = (src != 0) ? (dst s% src) : dst
BPF_XOR    0xa0   0        dst ^= src
BPF_MOV    0xb0   0        dst = src
BPF_MOVSX  0xb0   8/16/32  dst = (s8,s16,s32)src
BPF_ARSH   0xc0   0        :term:`sign extending<Sign Extend>` dst >>= (src & mask)
BPF_END    0xd0   0        byte swap operations (see `Byte swap instructions`_ below)
=========  =====  =======  ==========================================================

Underflow and overflow are allowed during arithmetic operations, meaning
the 64-bit or 32-bit value will wrap. If BPF program execution would
result in division by zero, the destination register is instead set to zero.
If execution would result in modulo by zero, for ``BPF_ALU64`` the value of
the destination register is unchanged whereas for ``BPF_ALU`` the upper
32 bits of the destination register are zeroed.

``BPF_ADD | BPF_X | BPF_ALU`` means::

  dst = (u32) ((u32) dst + (u32) src)

where '(u32)' indicates that the upper 32 bits are zeroed.

``BPF_ADD | BPF_X | BPF_ALU64`` means::

  dst = dst + src

``BPF_XOR | BPF_K | BPF_ALU`` means::

  dst = (u32) dst ^ (u32) imm

``BPF_XOR | BPF_K | BPF_ALU64`` means::

  dst = dst ^ imm

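
The wrapping and division-by-zero rules can be modelled in Python, whose
integers are unbounded and therefore need explicit masking. This is only a
sketch of the unsigned cases; the names ``alu_add``, ``alu_div`` and
``alu_mod`` and the ``is64`` flag are hypothetical.

.. code-block:: python

  U32 = 0xFFFFFFFF
  U64 = 0xFFFFFFFFFFFFFFFF

  def alu_add(dst, src, is64):
      """BPF_ADD: the result wraps; BPF_ALU zeroes the upper 32 bits."""
      mask = U64 if is64 else U32
      return ((dst & mask) + (src & mask)) & mask

  def alu_div(dst, src, is64):
      """BPF_DIV (unsigned): division by zero yields zero."""
      mask = U64 if is64 else U32
      dst, src = dst & mask, src & mask
      return dst // src if src != 0 else 0

  def alu_mod(dst, src, is64):
      """BPF_MOD (unsigned): modulo by zero keeps dst (truncated for BPF_ALU)."""
      mask = U64 if is64 else U32
      dst, src = dst & mask, src & mask
      return dst % src if src != 0 else dst

Masking both operands first is what makes the ``BPF_ALU`` result of a modulo
by zero equal to the lower 32 bits of the destination register.
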
Note that most instructions have an instruction offset of 0. Only three instructions
(``BPF_SDIV``, ``BPF_SMOD``, ``BPF_MOVSX``) have a non-zero offset.

Division, multiplication, and modulo operations for ``BPF_ALU`` are part
of the "divmul32" conformance group, and division, multiplication, and
modulo operations for ``BPF_ALU64`` are part of the "divmul64" conformance
group.
The division and modulo operations support both unsigned and signed flavors.

For unsigned operations (``BPF_DIV`` and ``BPF_MOD``), for ``BPF_ALU``,
'imm' is interpreted as a 32-bit unsigned value. For ``BPF_ALU64``,
'imm' is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
interpreted as a 64-bit unsigned value.

For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``,
'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm'
is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
interpreted as a 64-bit signed value.

Note that there are varying definitions of the signed modulo operation
when the dividend or divisor are negative, where implementations often
vary by language such that Python, Ruby, etc. differ from C, Go, Java,
etc. This specification requires that signed modulo use truncated division
(where -13 % 3 == -1) as implemented in C, Go, etc.:

   a % n = a - n * trunc(a / n)

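
Because Python's own ``%`` operator uses floored division, it makes the
difference easy to see. The sketch below implements the truncated definition
on ordinary signed Python integers; the names ``trunc_div`` and ``smod`` are
hypothetical and a non-zero divisor is assumed.

.. code-block:: python

  def trunc_div(a, n):
      """Integer division that rounds toward zero, as in C."""
      q = abs(a) // abs(n)
      return q if (a < 0) == (n < 0) else -q

  def smod(a, n):
      """Signed modulo defined as a - n * trunc(a / n)."""
      return a - n * trunc_div(a, n)

Here ``smod(-13, 3)`` is ``-1`` as this specification requires, whereas the
built-in expression ``-13 % 3`` evaluates to ``2`` in Python.
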
The ``BPF_MOVSX`` instruction does a move operation with sign extension.
``BPF_ALU | BPF_MOVSX`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into
32-bit operands, and zeroes the remaining upper 32 bits.
``BPF_ALU64 | BPF_MOVSX`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit
operands into 64-bit operands.  Unlike other arithmetic instructions,
``BPF_MOVSX`` is only defined for register source operands (``BPF_X``).

The ``BPF_NEG`` instruction is only defined when the source bit is clear
(``BPF_K``).

Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31)
for 32-bit operations.

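
The effect of the shift mask can be sketched in Python as follows. The
function names and the ``is64`` flag are hypothetical, and register values
are modelled as unsigned bit patterns.

.. code-block:: python

  def _width(is64):
      return 64 if is64 else 32

  def lsh(dst, src, is64):
      """BPF_LSH: shift left by (src & mask), keeping only the operand width."""
      width = _width(is64)
      return (dst << (src & (width - 1))) & ((1 << width) - 1)

  def rsh(dst, src, is64):
      """BPF_RSH: logical shift right by (src & mask)."""
      width = _width(is64)
      return (dst & ((1 << width) - 1)) >> (src & (width - 1))

  def arsh(dst, src, is64):
      """BPF_ARSH: arithmetic shift right, replicating the sign bit."""
      width = _width(is64)
      dst &= (1 << width) - 1
      if dst & (1 << (width - 1)):      # reinterpret as a negative value
          dst -= 1 << width
      return (dst >> (src & (width - 1))) & ((1 << width) - 1)

A shift count of 33 therefore behaves like a count of 1 for a 32-bit
operation, and a count of 64 behaves like a count of 0 for a 64-bit one.
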
Byte swap instructions
----------------------

The byte swap instructions use instruction classes of ``BPF_ALU`` and ``BPF_ALU64``
and a 4-bit 'code' field of ``BPF_END``.

The byte swap instructions operate on the destination register
only and do not use a separate source register or immediate value.

For ``BPF_ALU``, the 1-bit source operand field in the opcode is used to
select what byte order the operation converts from or to. For
``BPF_ALU64``, the 1-bit source operand field in the opcode is reserved
and must be set to 0.

=========  =========  =====  =================================================
class      source     value  description
=========  =========  =====  =================================================
BPF_ALU    BPF_TO_LE  0x00   convert between host byte order and little endian
BPF_ALU    BPF_TO_BE  0x08   convert between host byte order and big endian
BPF_ALU64  Reserved   0x00   do byte swap unconditionally
=========  =========  =====  =================================================

The 'imm' field encodes the width of the swap operations.  The following widths
are supported: 16, 32 and 64.  Width 64 operations belong to the base64
conformance group and other swap operations belong to the base32
conformance group.

Examples:

``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means::

  dst = htole16(dst)
  dst = htole32(dst)
  dst = htole64(dst)

``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 16/32/64 means::

  dst = htobe16(dst)
  dst = htobe32(dst)
  dst = htobe64(dst)

``BPF_ALU64 | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means::

  dst = bswap16(dst)
  dst = bswap32(dst)
  dst = bswap64(dst)

Jump instructions
-----------------

``BPF_JMP32`` uses 32-bit wide operands and indicates the base32
conformance group, while ``BPF_JMP`` uses 64-bit wide operands for
otherwise identical operations, and indicates the base64 conformance
group unless otherwise specified.
The 'code' field encodes the operation as below:

========  =====  ===  ===============================  ====================================================
code      value  src  description                      notes
========  =====  ===  ===============================  ====================================================
BPF_JA    0x0    0x0  PC += offset                     BPF_JMP | BPF_K only
BPF_JA    0x0    0x0  PC += imm                        BPF_JMP32 | BPF_K only
BPF_JEQ   0x1    any  PC += offset if dst == src
BPF_JGT   0x2    any  PC += offset if dst > src        unsigned
BPF_JGE   0x3    any  PC += offset if dst >= src       unsigned
BPF_JSET  0x4    any  PC += offset if dst & src
BPF_JNE   0x5    any  PC += offset if dst != src
BPF_JSGT  0x6    any  PC += offset if dst > src        signed
BPF_JSGE  0x7    any  PC += offset if dst >= src       signed
BPF_CALL  0x8    0x0  call helper function by address  BPF_JMP | BPF_K only, see `Helper functions`_
BPF_CALL  0x8    0x1  call PC += imm                   BPF_JMP | BPF_K only, see `Program-local functions`_
BPF_CALL  0x8    0x2  call helper function by BTF ID   BPF_JMP | BPF_K only, see `Helper functions`_
BPF_EXIT  0x9    0x0  return                           BPF_JMP | BPF_K only
BPF_JLT   0xa    any  PC += offset if dst < src        unsigned
BPF_JLE   0xb    any  PC += offset if dst <= src       unsigned
BPF_JSLT  0xc    any  PC += offset if dst < src        signed
BPF_JSLE  0xd    any  PC += offset if dst <= src       signed
========  =====  ===  ===============================  ====================================================

The BPF program needs to store the return value into register R0 before doing a
``BPF_EXIT``.

Example:

``BPF_JSGE | BPF_X | BPF_JMP32`` (0x7e) means::

  if (s32)dst s>= (s32)src goto +offset

where 's>=' indicates a signed '>=' comparison.

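
The difference between the signed and unsigned comparisons, and between the
32-bit and 64-bit classes, can be sketched in Python. The helper names
``as_signed``, ``jsge_taken`` and ``jge_taken`` are hypothetical; each
predicate returns whether the branch would be taken.

.. code-block:: python

  def as_signed(value, bits):
      """Reinterpret the low bits of value as a two's-complement number."""
      value &= (1 << bits) - 1
      return value - (1 << bits) if value & (1 << (bits - 1)) else value

  def jsge_taken(dst, src, is_jmp32):
      """BPF_JSGE: signed 'dst >= src' on 32-bit or 64-bit operands."""
      bits = 32 if is_jmp32 else 64
      return as_signed(dst, bits) >= as_signed(src, bits)

  def jge_taken(dst, src, is_jmp32):
      """BPF_JGE: unsigned 'dst >= src' on 32-bit or 64-bit operands."""
      mask = (1 << (32 if is_jmp32 else 64)) - 1
      return (dst & mask) >= (src & mask)

With ``dst = 0xFFFFFFFF`` and ``src = 1``, the signed 32-bit comparison is
not taken because ``dst`` reads as -1, while the 64-bit signed and the
unsigned comparisons are taken.
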
``BPF_JA | BPF_K | BPF_JMP32`` (0x06) means::

  gotol +imm

where 'imm' means the branch offset comes from the instruction's 'imm' field.

Note that there are two flavors of ``BPF_JA`` instructions. The
``BPF_JMP`` class permits a 16-bit jump offset specified by the 'offset'
field, whereas the ``BPF_JMP32`` class permits a 32-bit jump offset
specified by the 'imm' field. A conditional jump whose offset does not fit
in 16 bits may be converted to a conditional jump with a 16-bit offset plus
a 32-bit unconditional jump.

All ``BPF_CALL`` and ``BPF_JA`` instructions belong to the
base32 conformance group.

Helper functions
~~~~~~~~~~~~~~~~

Helper functions are a concept whereby BPF programs can call into a
set of function calls exposed by the underlying platform.

Historically, each helper function was identified by an address
encoded in the imm field.  The available helper functions may differ
for each program type, but address values are unique across all program types.

Platforms that support the BPF Type Format (BTF) support identifying
a helper function by a BTF ID encoded in the imm field, where the BTF ID
identifies the helper name and type.

Program-local functions
~~~~~~~~~~~~~~~~~~~~~~~
Program-local functions are functions exposed by the same BPF program as the
caller, and are referenced by offset from the call instruction, similar to
``BPF_JA``.  The offset is encoded in the imm field of the call instruction.
A ``BPF_EXIT`` within the program-local function will return to the caller.

Load and store instructions
===========================

For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_STX``), the
8-bit 'opcode' field is divided as:

============  ======  =================
3 bits (MSB)  2 bits  3 bits (LSB)
============  ======  =================
mode          size    instruction class
============  ======  =================

The mode modifier is one of:

  =============  =====  ====================================  ========================================
  mode modifier  value  description                           reference
  =============  =====  ====================================  ========================================
  BPF_IMM        0x00   64-bit immediate instructions         `64-bit immediate instructions`_
  BPF_ABS        0x20   legacy BPF packet access (absolute)   `Legacy BPF Packet access instructions`_
  BPF_IND        0x40   legacy BPF packet access (indirect)   `Legacy BPF Packet access instructions`_
  BPF_MEM        0x60   regular load and store operations     `Regular load and store operations`_
  BPF_MEMSX      0x80   sign-extension load operations        `Sign-extension load operations`_
  BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
  =============  =====  ====================================  ========================================

The size modifier is one of:

  =============  =====  =====================
  size modifier  value  description
  =============  =====  =====================
  BPF_W          0x00   word        (4 bytes)
  BPF_H          0x08   half word   (2 bytes)
  BPF_B          0x10   byte
  BPF_DW         0x18   double word (8 bytes)
  =============  =====  =====================

Instructions using ``BPF_DW`` belong to the base64 conformance group.

Regular load and store operations
---------------------------------

The ``BPF_MEM`` mode modifier is used to encode regular load and store
instructions that transfer data between a register and memory.

``BPF_MEM | <size> | BPF_STX`` means::

  *(size *) (dst + offset) = src

``BPF_MEM | <size> | BPF_ST`` means::

  *(size *) (dst + offset) = imm

``BPF_MEM | <size> | BPF_LDX`` means::

  dst = *(unsigned size *) (src + offset)

Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW`` and
'unsigned size' is one of u8, u16, u32 or u64.

Sign-extension load operations
------------------------------

The ``BPF_MEMSX`` mode modifier is used to encode :term:`sign-extension<Sign Extend>` load
instructions that transfer data between a register and memory.

``BPF_MEMSX | <size> | BPF_LDX`` means::

  dst = *(signed size *) (src + offset)

Where size is one of: ``BPF_B``, ``BPF_H`` or ``BPF_W``, and
'signed size' is one of s8, s16 or s32.

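
The difference between a ``BPF_MEM`` load and a ``BPF_MEMSX`` load can be
sketched in Python over a byte buffer. The function ``load`` and its
parameters are hypothetical; memory is modelled as a ``bytes``-like object
and the result as a 64-bit register bit pattern.

.. code-block:: python

  def load(memory, addr, nbytes, signed, byteorder="little"):
      """Read nbytes at addr and return the resulting 64-bit register value."""
      raw = bytes(memory[addr:addr + nbytes])
      value = int.from_bytes(raw, byteorder, signed=signed)
      # A negative value becomes its sign-extended 64-bit bit pattern.
      return value & 0xFFFFFFFFFFFFFFFF

Loading the single byte ``0x86`` with ``signed=False`` gives ``0x86``, while
``signed=True`` gives ``0xFFFFFFFFFFFFFF86``.
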
Atomic operations
-----------------

Atomic operations are operations that operate on memory and cannot be
interrupted or corrupted by other access to the same memory region
by other BPF programs or means outside of this specification.

All atomic operations supported by BPF are encoded as store operations
that use the ``BPF_ATOMIC`` mode modifier as follows:

* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations, which are
  part of the "atomic32" conformance group.
* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations, which are
  part of the "atomic64" conformance group.
* 8-bit and 16-bit wide atomic operations are not supported.

The 'imm' field is used to encode the actual atomic operation.
Simple atomic operations reuse a subset of the values defined for
arithmetic operations, placed in the 'imm' field, to select the atomic operation:

========  =====  ===========
imm       value  description
========  =====  ===========
BPF_ADD   0x00   atomic add
BPF_OR    0x40   atomic or
BPF_AND   0x50   atomic and
BPF_XOR   0xa0   atomic xor
========  =====  ===========


``BPF_ATOMIC | BPF_W | BPF_STX`` with 'imm' = BPF_ADD means::

  *(u32 *)(dst + offset) += src

``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF_ADD means::

  *(u64 *)(dst + offset) += src

In addition to the simple atomic operations, there is also a modifier and
two complex atomic operations:

===========  ================  ===========================
imm          value             description
===========  ================  ===========================
BPF_FETCH    0x01              modifier: return old value
BPF_XCHG     0xe0 | BPF_FETCH  atomic exchange
BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
===========  ================  ===========================

The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
always set for the complex atomic operations.  If the ``BPF_FETCH`` flag
is set, then the operation also overwrites ``src`` with the value that
was in memory before it was modified.

The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
addressed by ``dst + offset``.

The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
``dst + offset`` with ``R0``.  If they match, the value addressed by
``dst + offset`` is replaced with ``src``.  In either case, the
value that was at ``dst + offset`` before the operation is zero-extended
and loaded back to ``R0``.

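
The data flow of the ``BPF_FETCH`` modifier and of ``BPF_CMPXCHG`` can be
sketched in Python. The sketch is single-threaded and so does not
demonstrate atomicity itself; memory is modelled as a dictionary from
address to 64-bit value, and the names ``atomic_add`` and ``atomic_cmpxchg``
are hypothetical.

.. code-block:: python

  U64 = 0xFFFFFFFFFFFFFFFF

  def atomic_add(mem, addr, src, fetch=False):
      """64-bit add to memory; returns the value left in src afterwards."""
      old = mem[addr]
      mem[addr] = (old + src) & U64
      return old if fetch else src     # BPF_FETCH overwrites src with old

  def atomic_cmpxchg(mem, addr, r0, src):
      """64-bit compare and exchange; returns the value left in R0."""
      old = mem[addr]
      if old == r0:
          mem[addr] = src              # replaced only when the values match
      return old                       # R0 receives the old value either way

A caller can detect whether the exchange happened by comparing the returned
value with the ``r0`` it passed in.
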
64-bit immediate instructions
-----------------------------

Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
encoding defined in `Instruction encoding`_, and use the 'src_reg' field of the
basic instruction to hold an opcode subtype.

The following table defines a set of ``BPF_IMM | BPF_DW | BPF_LD`` instructions
with opcode subtypes in the 'src_reg' field (column 'src'), using new terms
such as "map" defined further below:

=========================  ======  ===  =========================================  ===========  ==============
opcode construction        opcode  src  pseudocode                                 imm type     dst type
=========================  ======  ===  =========================================  ===========  ==============
BPF_IMM | BPF_DW | BPF_LD  0x18    0x0  dst = (next_imm << 32) | imm               integer      integer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x1  dst = map_by_fd(imm)                       map fd       map
BPF_IMM | BPF_DW | BPF_LD  0x18    0x2  dst = map_val(map_by_fd(imm)) + next_imm   map fd       data pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x3  dst = var_addr(imm)                        variable id  data pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x4  dst = code_addr(imm)                       integer      code pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x5  dst = map_by_idx(imm)                      map index    map
BPF_IMM | BPF_DW | BPF_LD  0x18    0x6  dst = map_val(map_by_idx(imm)) + next_imm  map index    data pointer
=========================  ======  ===  =========================================  ===========  ==============

where

* map_by_fd(imm) means to convert a 32-bit file descriptor into an address of a map (see `Maps`_)
* map_by_idx(imm) means to convert a 32-bit index into an address of a map
* map_val(map) gets the address of the first value in a given map
* var_addr(imm) gets the address of a platform variable (see `Platform Variables`_) with a given id
* code_addr(imm) gets the address of the instruction at a specified relative offset in number of (64-bit) instructions
* the 'imm type' can be used by disassemblers for display
* the 'dst type' can be used for verification and JIT compilation purposes

Maps
~~~~

Maps are shared memory regions accessible by BPF programs on some platforms.
A map can have various semantics as defined in a separate document, and may or
may not have a single contiguous memory region, but the 'map_val(map)' is
currently only defined for maps that do have a single contiguous memory region.

Each map can have a file descriptor (fd) if supported by the platform, where
'map_by_fd(imm)' means to get the map with the specified file descriptor. Each
BPF program can also be defined to use a set of maps associated with the
program at load time, and 'map_by_idx(imm)' means to get the map with the given
index in the set associated with the BPF program containing the instruction.

Platform Variables
~~~~~~~~~~~~~~~~~~

Platform variables are memory regions, identified by integer ids, exposed by
the runtime and accessible by BPF programs on some platforms.  The
'var_addr(imm)' operation means to get the address of the memory region
identified by the given id.

15175336
CH
655Legacy BPF Packet access instructions
656-------------------------------------
63d000c3 657
7d35eb1a 658BPF previously introduced special instructions for access to packet data that were
088a464e
DT
659carried over from classic BPF. These instructions used an instruction
660class of BPF_LD, a size modifier of BPF_W, BPF_H, or BPF_B, and a
661mode modifier of BPF_ABS or BPF_IND. However, these instructions are
81777efb 662deprecated and should no longer be used. All legacy packet access
2d9a925d 663instructions belong to the "legacy" conformance group.