====================
eBPF Instruction Set
====================

Registers and calling convention
================================

eBPF has 10 general-purpose registers and a read-only frame pointer register,
all of which are 64 bits wide.

The eBPF calling convention is defined as:

 * R0: return value from function calls, and exit value for eBPF programs
 * R1 - R5: arguments for function calls
 * R6 - R9: callee saved registers that function calls will preserve
 * R10: read-only frame pointer to access stack

R0 - R5 are scratch registers, and eBPF programs need to spill/fill them if
necessary across calls.

Instruction encoding
====================

eBPF uses 64-bit instructions with the following encoding:

 =============  =======  ===============  ====================  ============
 32 bits (MSB)  16 bits  4 bits           4 bits                8 bits (LSB)
 =============  =======  ===============  ====================  ============
 immediate      offset   source register  destination register  opcode
 =============  =======  ===============  ====================  ============

Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.
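
The following minimal C sketch illustrates the layout in the table above. It packs the five fields into a single 64-bit value, with the opcode in the least significant byte, and extracts them again. It uses explicit shifts instead of C bit-fields, because bit-field layout is compiler-dependent. It also assumes the usual two's-complement conversion of ``off`` and ``imm``, as done by GCC and Clang. The helper names are hypothetical, not a kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: pack/unpack the fields listed in the encoding table. */
static uint64_t insn_pack(uint8_t opcode, uint8_t dst_reg, uint8_t src_reg,
                          int16_t off, int32_t imm)
{
	assert(dst_reg < 16 && src_reg < 16);    /* registers are 4-bit fields */
	return (uint64_t)opcode                  /* bits 0-7   */
	     | (uint64_t)dst_reg << 8            /* bits 8-11  */
	     | (uint64_t)src_reg << 12           /* bits 12-15 */
	     | (uint64_t)(uint16_t)off << 16     /* bits 16-31 */
	     | (uint64_t)(uint32_t)imm << 32;    /* bits 32-63 */
}

static uint8_t insn_opcode(uint64_t insn)  { return insn & 0xff; }
static uint8_t insn_dst_reg(uint64_t insn) { return (insn >> 8) & 0x0f; }
static uint8_t insn_src_reg(uint64_t insn) { return (insn >> 12) & 0x0f; }

/* Signed fields: reinterpret the raw bits as two's complement. */
static int16_t insn_off(uint64_t insn) { return (int16_t)(uint16_t)(insn >> 16); }
static int32_t insn_imm(uint64_t insn) { return (int32_t)(uint32_t)(insn >> 32); }
```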

Instruction classes
-------------------

The three least significant bits of the 'opcode' field store the instruction class:

  =========  =====  ===============================
  class      value  description
  =========  =====  ===============================
  BPF_LD     0x00   non-standard load operations
  BPF_LDX    0x01   load into register operations
  BPF_ST     0x02   store from immediate operations
  BPF_STX    0x03   store from register operations
  BPF_ALU    0x04   32-bit arithmetic operations
  BPF_JMP    0x05   64-bit jump operations
  BPF_JMP32  0x06   32-bit jump operations
  BPF_ALU64  0x07   64-bit arithmetic operations
  =========  =====  ===============================
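
The following minimal C sketch decodes the class using only the values in the table above. The class is the opcode masked with 0x07. The helper names ``insn_class_name()`` and ``insn_class_from_name()`` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Class names indexed by the value of the three least significant bits. */
static const char *const class_names[8] = {
	"BPF_LD", "BPF_LDX", "BPF_ST", "BPF_STX",
	"BPF_ALU", "BPF_JMP", "BPF_JMP32", "BPF_ALU64",
};

static const char *insn_class_name(uint8_t opcode)
{
	return class_names[opcode & 0x07];
}

/* Reverse lookup; returns -1 for a name not in the table. */
static int insn_class_from_name(const char *name)
{
	for (int i = 0; i < 8; i++)
		if (strcmp(class_names[i], name) == 0)
			return i;
	return -1;
}
```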

Arithmetic and jump instructions
================================

For arithmetic and jump instructions (BPF_ALU, BPF_ALU64, BPF_JMP and
BPF_JMP32), the 8-bit 'opcode' field is divided into three parts:

  ==============  ======  =================
  4 bits (MSB)    1 bit   3 bits (LSB)
  ==============  ======  =================
  operation code  source  instruction class
  ==============  ======  =================

The fourth bit (value 0x08) encodes the source operand:

  ======  =====  ========================================
  source  value  description
  ======  =====  ========================================
  BPF_K   0x00   use 32-bit immediate as source operand
  BPF_X   0x08   use 'src_reg' register as source operand
  ======  =====  ========================================

The four most significant bits store the operation code.
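
The three parts can be separated with simple masks. The sketch below only restates the bit layout described above. The struct and function names are hypothetical and do not come from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded form of an arithmetic/jump opcode. */
struct alu_jmp_opcode {
	uint8_t op;    /* four MSB bits, e.g. 0x00 BPF_ADD, 0xa0 BPF_XOR */
	uint8_t src;   /* 0x00 BPF_K or 0x08 BPF_X */
	uint8_t cls;   /* three LSB bits: 0x04-0x07 for ALU/JMP classes */
};

static struct alu_jmp_opcode decode_alu_jmp(uint8_t opcode)
{
	struct alu_jmp_opcode d = {
		.op  = opcode & 0xf0,
		.src = opcode & 0x08,
		.cls = opcode & 0x07,
	};

	/* this layout applies only to BPF_ALU, BPF_JMP, BPF_JMP32, BPF_ALU64 */
	assert(d.cls >= 0x04);
	return d;
}
```

In the other direction, OR-ing the parts together builds an opcode. For example, BPF_XOR | BPF_K | BPF_ALU is 0xa0 | 0x00 | 0x04 = 0xa4.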


Arithmetic instructions
-----------------------

BPF_ALU uses 32-bit wide operands while BPF_ALU64 uses 64-bit wide operands for
otherwise identical operations.
The code field encodes the operation as below:

  ========  =====  ==========================
  code      value  description
  ========  =====  ==========================
  BPF_ADD   0x00   dst += src
  BPF_SUB   0x10   dst -= src
  BPF_MUL   0x20   dst \*= src
  BPF_DIV   0x30   dst /= src
  BPF_OR    0x40   dst \|= src
  BPF_AND   0x50   dst &= src
  BPF_LSH   0x60   dst <<= src
  BPF_RSH   0x70   dst >>= src
  BPF_NEG   0x80   dst = -dst
  BPF_MOD   0x90   dst %= src
  BPF_XOR   0xa0   dst ^= src
  BPF_MOV   0xb0   dst = src
  BPF_ARSH  0xc0   sign-extending shift right
  BPF_END   0xd0   endianness conversion
  ========  =====  ==========================

BPF_ADD | BPF_X | BPF_ALU means::

  dst_reg = (u32) dst_reg + (u32) src_reg;

BPF_ADD | BPF_X | BPF_ALU64 means::

  dst_reg = dst_reg + src_reg

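BPF_XOR | BPF_K | BPF_ALU means::

  dst_reg = (u32) dst_reg ^ (u32) imm32

BPF_XOR | BPF_K | BPF_ALU64 means::

  dst_reg = dst_reg ^ imm32

Here ``(u32)`` means that only the lower 32 bits are used, and the upper 32
bits of the destination register are zeroed. For BPF_ALU64 with BPF_K, imm32
is sign-extended to 64 bits.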
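
The following minimal C sketch of a few arithmetic operations shows how the 32-bit and 64-bit variants differ. It assumes, as the Linux interpreter does, that BPF_ALU results are zero-extended into the 64-bit register and that BPF_K immediates are sign-extended to 64 bits. It covers only BPF_ADD, BPF_SUB, BPF_NEG, BPF_XOR and BPF_MOV. ``alu_apply()`` is a hypothetical name and is not the kernel's interpreter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: result written to dst_reg for a subset of ALU ops. */
static uint64_t alu_apply(uint8_t opcode, uint64_t dst, uint64_t src_reg, int32_t imm)
{
	uint8_t cls = opcode & 0x07;   /* 0x04 BPF_ALU, 0x07 BPF_ALU64 */
	uint8_t op = opcode & 0xf0;
	/* BPF_X (0x08) selects src_reg; BPF_K sign-extends imm32 to 64 bits */
	uint64_t src = (opcode & 0x08) ? src_reg : (uint64_t)(int64_t)imm;
	uint64_t res;

	assert(cls == 0x04 || cls == 0x07);
	if (cls == 0x04) {             /* BPF_ALU: operate on the low 32 bits */
		dst = (uint32_t)dst;
		src = (uint32_t)src;
	}
	switch (op) {
	case 0x00: res = dst + src; break;   /* BPF_ADD */
	case 0x10: res = dst - src; break;   /* BPF_SUB */
	case 0x80: res = -dst;      break;   /* BPF_NEG: source is ignored */
	case 0xa0: res = dst ^ src; break;   /* BPF_XOR */
	case 0xb0: res = src;       break;   /* BPF_MOV */
	default:
		assert(!"operation not covered by this sketch");
		return dst;
	}
	/* BPF_ALU writes back a zero-extended 32-bit result */
	return cls == 0x04 ? (uint32_t)res : res;
}
```

For example, ``alu_apply(0x0c, 0xffffffff, 1, 0)`` (BPF_ADD | BPF_X | BPF_ALU) wraps to 0. The same operands with opcode 0x0f (BPF_ALU64) yield 0x100000000.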


Jump instructions
-----------------

BPF_JMP32 uses 32-bit wide operands while BPF_JMP uses 64-bit wide operands for
otherwise identical operations.
The code field encodes the operation as below:

  ========  =====  =========================  ============
  code      value  description                notes
  ========  =====  =========================  ============
  BPF_JA    0x00   PC += off                  BPF_JMP only
  BPF_JEQ   0x10   PC += off if dst == src
  BPF_JGT   0x20   PC += off if dst > src     unsigned
  BPF_JGE   0x30   PC += off if dst >= src    unsigned
  BPF_JSET  0x40   PC += off if dst & src
  BPF_JNE   0x50   PC += off if dst != src
  BPF_JSGT  0x60   PC += off if dst > src     signed
  BPF_JSGE  0x70   PC += off if dst >= src    signed
  BPF_CALL  0x80   function call
  BPF_EXIT  0x90   function / program return  BPF_JMP only
  BPF_JLT   0xa0   PC += off if dst < src     unsigned
  BPF_JLE   0xb0   PC += off if dst <= src    unsigned
  BPF_JSLT  0xc0   PC += off if dst < src     signed
  BPF_JSLE  0xd0   PC += off if dst <= src    signed
  ========  =====  =========================  ============

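The ``off`` field of a jump is a signed 16-bit count of 8-byte instruction
slots, relative to the instruction that follows the jump.

The eBPF program must store the return value in register R0 before executing
BPF_EXIT.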
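
The following minimal C sketch models a few of the jump conditions above. It assumes that ``off`` counts 8-byte instruction slots starting from the instruction after the jump. It also assumes that ``src`` has already been resolved to either src_reg or the sign-extended immediate. Signed comparisons flip the sign bit, which maps two's-complement order onto unsigned order. ``jmp_next_pc()`` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the next instruction after a jump at pc. */
static uint64_t jmp_next_pc(uint8_t opcode, uint64_t pc, uint64_t dst,
                            uint64_t src, int16_t off)
{
	uint8_t cls = opcode & 0x07;   /* 0x05 BPF_JMP, 0x06 BPF_JMP32 */
	uint8_t op = opcode & 0xf0;
	uint64_t sign;                 /* sign bit for the operand width */
	int taken;

	assert(cls == 0x05 || cls == 0x06);
	if (cls == 0x06) {             /* BPF_JMP32 compares the low 32 bits */
		dst = (uint32_t)dst;
		src = (uint32_t)src;
		sign = UINT64_C(1) << 31;
	} else {
		sign = UINT64_C(1) << 63;
	}
	switch (op) {
	case 0x00: taken = 1; break;                  /* BPF_JA */
	case 0x10: taken = dst == src; break;         /* BPF_JEQ */
	case 0x20: taken = dst > src; break;          /* BPF_JGT, unsigned */
	case 0x40: taken = (dst & src) != 0; break;   /* BPF_JSET */
	case 0x50: taken = dst != src; break;         /* BPF_JNE */
	case 0x60:                                    /* BPF_JSGT, signed */
		taken = (dst ^ sign) > (src ^ sign);
		break;
	default:
		assert(!"operation not covered by this sketch");
		taken = 0;
	}
	/* fall through to pc + 1, or skip off slots beyond it */
	return pc + 1 + (taken ? (uint64_t)(int64_t)off : 0);
}
```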


Load and store instructions
===========================

For load and store instructions (BPF_LD, BPF_LDX, BPF_ST and BPF_STX), the
8-bit 'opcode' field is divided as:

  ============  ======  =================
  3 bits (MSB)  2 bits  3 bits (LSB)
  ============  ======  =================
  mode          size    instruction class
  ============  ======  =================

The size modifier is one of:

  =============  =====  =====================
  size modifier  value  description
  =============  =====  =====================
  BPF_W          0x00   word        (4 bytes)
  BPF_H          0x08   half word   (2 bytes)
  BPF_B          0x10   byte        (1 byte)
  BPF_DW         0x18   double word (8 bytes)
  =============  =====  =====================

The mode modifier is one of:

  =============  =====  ====================================
  mode modifier  value  description
  =============  =====  ====================================
  BPF_IMM        0x00   used for 64-bit mov
  BPF_ABS        0x20   legacy BPF packet access
  BPF_IND        0x40   legacy BPF packet access
  BPF_MEM        0x60   all normal load and store operations
  BPF_ATOMIC     0xc0   atomic operations
  =============  =====  ====================================

BPF_MEM | <size> | BPF_STX means::

  *(size *) (dst_reg + off) = src_reg

BPF_MEM | <size> | BPF_ST means::

  *(size *) (dst_reg + off) = imm32

BPF_MEM | <size> | BPF_LDX means::

  dst_reg = *(size *) (src_reg + off)

Here ``size`` is the unsigned type selected by the size modifier: u8 for BPF_B,
u16 for BPF_H, u32 for BPF_W and u64 for BPF_DW.
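
The following minimal C sketch models BPF_MEM loads and stores. It assumes a flat little-endian byte array as memory and uses plain indices in place of pointers; eBPF itself uses the host's byte order and real addresses. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Access width in bytes selected by the size modifier (mask 0x18). */
static size_t size_bytes(uint8_t opcode)
{
	switch (opcode & 0x18) {
	case 0x00: return 4;   /* BPF_W */
	case 0x08: return 2;   /* BPF_H */
	case 0x10: return 1;   /* BPF_B */
	default:   return 8;   /* BPF_DW (0x18) */
	}
}

/* BPF_MEM | <size> | BPF_LDX: dst_reg = *(size *)(src_reg + off), zero-extended. */
static uint64_t ldx_mem(uint8_t opcode, const uint8_t *mem, size_t src_reg, int16_t off)
{
	size_t addr = src_reg + (size_t)(ptrdiff_t)off;
	uint64_t val = 0;

	assert((opcode & 0xe7) == 0x61);   /* mode BPF_MEM, class BPF_LDX */
	for (size_t i = 0; i < size_bytes(opcode); i++)
		val |= (uint64_t)mem[addr + i] << (8 * i);
	return val;
}

/* BPF_MEM | <size> | BPF_STX: *(size *)(dst_reg + off) = src_reg (low bytes). */
static void stx_mem(uint8_t opcode, uint8_t *mem, size_t dst_reg, int16_t off,
                    uint64_t src_reg)
{
	size_t addr = dst_reg + (size_t)(ptrdiff_t)off;

	assert((opcode & 0xe7) == 0x63);   /* mode BPF_MEM, class BPF_STX */
	for (size_t i = 0; i < size_bytes(opcode); i++)
		mem[addr + i] = (uint8_t)(src_reg >> (8 * i));
}
```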

Atomic operations
-----------------

eBPF includes atomic operations, which use the immediate field for extra
encoding::

   .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_W  | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
   .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg

The basic atomic operations supported are::

    BPF_ADD
    BPF_AND
    BPF_OR
    BPF_XOR

Each has semantics equivalent to the ``BPF_ADD`` example: the memory location
addressed by ``dst_reg + off`` is atomically modified, with ``src_reg`` as the
other operand. If the ``BPF_FETCH`` flag is set in the immediate, these
operations also overwrite ``src_reg`` with the value that was in memory before
it was modified.
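
The following minimal C11 sketch implements the basic 64-bit atomic operations with ``<stdatomic.h>``. It assumes the immediate holds the operation code from the arithmetic table (0x00 for BPF_ADD, 0x40 for BPF_OR, 0x50 for BPF_AND, 0xa0 for BPF_XOR), optionally OR-ed with ``BPF_FETCH``. The value 0x01 for ``BPF_FETCH`` is taken from the Linux UAPI headers. ``atomic_dw_op()`` and ``SKETCH_BPF_FETCH`` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_BPF_FETCH 0x01   /* assumed value of BPF_FETCH (Linux UAPI) */

/* BPF_ATOMIC | BPF_DW | BPF_STX: addr stands for dst_reg + off. */
static void atomic_dw_op(int32_t imm, _Atomic uint64_t *addr, uint64_t *src_reg)
{
	uint64_t old;

	switch (imm & ~SKETCH_BPF_FETCH) {
	case 0x00: old = atomic_fetch_add(addr, *src_reg); break;  /* BPF_ADD */
	case 0x40: old = atomic_fetch_or(addr, *src_reg);  break;  /* BPF_OR */
	case 0x50: old = atomic_fetch_and(addr, *src_reg); break;  /* BPF_AND */
	case 0xa0: old = atomic_fetch_xor(addr, *src_reg); break;  /* BPF_XOR */
	default:
		assert(!"operation not covered by this sketch");
		return;
	}
	if (imm & SKETCH_BPF_FETCH)
		*src_reg = old;   /* fetch variant: src_reg receives the old value */
}
```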

The more specialized operations are::

    BPF_XCHG

This atomically exchanges ``src_reg`` with the value addressed by ``dst_reg +
off``. ::

    BPF_CMPXCHG

This atomically compares the value addressed by ``dst_reg + off`` with
``R0``. If they match, it is replaced with ``src_reg``. In either case, the
value that was there before is zero-extended and loaded back into ``R0``.
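
The following minimal C11 sketch implements BPF_XCHG and BPF_CMPXCHG with ``<stdatomic.h>``. For the 32-bit BPF_CMPXCHG, it assumes that only the lower 32 bits of R0 take part in the comparison. It also shows the old value being zero-extended into R0. The function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* BPF_XCHG (BPF_DW): swap src_reg with *(u64 *)(dst_reg + off). */
static void xchg_dw(_Atomic uint64_t *addr, uint64_t *src_reg)
{
	*src_reg = atomic_exchange(addr, *src_reg);
}

/* BPF_CMPXCHG (BPF_W): compare *(u32 *)(dst_reg + off) with the low half of R0,
   store src_reg on a match, and leave the old value zero-extended in R0. */
static void cmpxchg_w(_Atomic uint32_t *addr, uint64_t *r0, uint64_t src_reg)
{
	uint32_t expected = (uint32_t)*r0;

	/* on failure, expected is overwritten with the current memory value;
	   on success it already equals the old value */
	atomic_compare_exchange_strong(addr, &expected, (uint32_t)src_reg);
	*r0 = expected;   /* uint32_t -> uint64_t conversion zero-extends */
}
```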

Note that 1- and 2-byte atomic operations are not supported.

Clang can generate atomic instructions by default when ``-mcpu=v3`` is
enabled. If a lower version of ``-mcpu`` is set, the only atomic instruction
Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
the atomics features while keeping a lower ``-mcpu`` version, you can use
``-Xclang -target-feature -Xclang +alu32``.
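
As a hedged example, the C functions below use the GCC/Clang ``__sync`` builtins. When compiled with a command such as ``clang -O2 -target bpf -mcpu=v3 -c``, LLVM is expected to lower them to the BPF_ATOMIC instructions noted in the comments. ``BPF_FETCH`` should be used only when the returned value is used. The exact lowering depends on the LLVM version, so check the object file with ``llvm-objdump -d``. On a host target, these builtins compile to ordinary atomic operations. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Expected BPF lowering under -mcpu=v3 is noted per function (not guaranteed). */
static void counter_add(uint64_t *counter, uint64_t delta)
{
	(void)__sync_fetch_and_add(counter, delta);   /* BPF_ADD, result unused */
}

static uint64_t counter_fetch_add(uint64_t *counter, uint64_t delta)
{
	return __sync_fetch_and_add(counter, delta);  /* BPF_ADD | BPF_FETCH */
}

static uint64_t slot_cmpxchg(uint64_t *slot, uint64_t old, uint64_t new_val)
{
	/* BPF_CMPXCHG: returns the value that was in *slot before */
	return __sync_val_compare_and_swap(slot, old, new_val);
}
```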

You may encounter ``BPF_XADD`` - this is a legacy name for ``BPF_ATOMIC``,
referring to the exclusive-add operation encoded when the immediate field is
zero.

16-byte instructions
--------------------

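eBPF has one 16-byte instruction, ``BPF_LD | BPF_DW | BPF_IMM``, which consists
of two consecutive ``struct bpf_insn`` 8-byte blocks and is interpreted as a
single instruction that loads a 64-bit immediate value into dst_reg. The imm
field of the first block holds the lower 32 bits of the value, and the imm
field of the second block holds the upper 32 bits.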
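
The following minimal C sketch shows how the 64-bit immediate is split across the two blocks. Following the Linux ``BPF_LD_IMM64`` helpers, it assumes that the first block's imm holds the lower 32 bits and the second block's imm holds the upper 32 bits. All other fields of the second block are zero. ``struct insn_sketch`` is a hypothetical stand-in for ``struct bpf_insn``. It keeps both register numbers in one byte to avoid compiler-dependent bit-fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-byte instruction; dst_reg is kept in the low nibble of regs. */
struct insn_sketch {
	uint8_t code;
	uint8_t regs;
	int16_t off;
	int32_t imm;
};

/* Emit BPF_LD | BPF_DW | BPF_IMM (0x18) as two consecutive instructions. */
static void emit_ld_imm64(struct insn_sketch insn[2], uint8_t dst_reg, uint64_t value)
{
	insn[0] = (struct insn_sketch){ .code = 0x18, .regs = dst_reg & 0x0f,
	                                .imm = (int32_t)(uint32_t)value };
	/* second block: only imm is used, holding the upper 32 bits */
	insn[1] = (struct insn_sketch){ .imm = (int32_t)(uint32_t)(value >> 32) };
}

static uint64_t decode_ld_imm64(const struct insn_sketch insn[2])
{
	return (uint64_t)(uint32_t)insn[0].imm | (uint64_t)(uint32_t)insn[1].imm << 32;
}
```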

Packet access instructions
--------------------------

eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
(BPF_IND | <size> | BPF_LD) which are used to access packet data.

They had to be carried over from classic BPF to preserve the performance of
socket filters running in the eBPF interpreter. These instructions can only
be used when the interpreter context is a pointer to ``struct sk_buff``, and
they have seven implicit operands. Register R6 is an implicit input that must
contain a pointer to the sk_buff. Register R0 is an implicit output which
contains the data fetched from the packet. Registers R1-R5 are scratch
registers and must not be used to store data across BPF_ABS | BPF_LD or
BPF_IND | BPF_LD instructions.

These instructions also have an implicit program exit condition. When an
eBPF program tries to access data beyond the packet boundary, the
interpreter aborts execution of the program, so JIT compilers must
preserve this property as well. The src_reg and imm32 fields are
explicit inputs to these instructions.

For example, BPF_IND | BPF_W | BPF_LD means::

  R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))

and R1 - R5 are clobbered.
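
The following minimal C sketch implements the BPF_IND | BPF_W | BPF_LD example above. It assumes the packet is a plain byte buffer with a known length rather than a ``struct sk_buff``. An out-of-bounds access returns ``false``, which stands in for the point where the interpreter aborts the program. The sketch does not model how the program's exit value is set in that case. ``ld_ind_w()`` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of R0 = ntohl(*(u32 *)(data + src_reg + imm32)). */
static bool ld_ind_w(const uint8_t *data, size_t len, uint64_t src_reg,
                     int32_t imm, uint64_t *r0)
{
	/* a negative sum wraps to a huge offset and fails the bounds check */
	uint64_t offset = src_reg + (uint64_t)(int64_t)imm;

	if (offset > len || len - offset < 4)
		return false;   /* beyond the packet: the program would be aborted */

	const uint8_t *p = data + offset;
	/* ntohl(): the packet stores the word in network (big-endian) order */
	*r0 = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	      (uint32_t)p[2] << 8 | (uint32_t)p[3];
	return true;
}
```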