.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2022 Red Hat, Inc.

===================
BPF_MAP_TYPE_CPUMAP
===================

.. note::
   - ``BPF_MAP_TYPE_CPUMAP`` was introduced in kernel version 4.15

.. kernel-doc:: kernel/bpf/cpumap.c
   :doc: cpu map

An example use case for this map type is software-based Receive Side Scaling (RSS).

The CPUMAP represents the CPUs in the system: the map key is the CPU index, and
the map value is the configuration for that CPUMAP entry. Each CPUMAP entry has a
dedicated kernel thread bound to the given CPU, which acts as the remote CPU
execution unit.

Starting with Linux kernel version 5.9, the CPUMAP can run a second XDP program
on the remote CPU. This allows an XDP program to split its processing across
multiple CPUs. For example, the initial CPU (which receives the packets) can do
minimal packet processing, while the remote CPU (to which the packet is
directed) can afford to spend more cycles processing the frame. The initial CPU
is where the XDP redirect program is executed. The remote CPU receives raw
``xdp_frame`` objects.

Usage
=====

Kernel BPF
----------
bpf_redirect_map()
^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)

Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
For ``BPF_MAP_TYPE_CPUMAP`` this map contains references to CPUs.

The lower two bits of ``flags`` are used as the return code if the map lookup
fails. This is so that the return value can be one of the XDP program return
codes up to ``XDP_TX``, as chosen by the caller.
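
The fallback behaviour described above can be modelled in plain C. This is only
a sketch for illustration, not the kernel implementation: the ``SK_`` constants
are local copies of the XDP action codes, and ``redirect_map_result_sketch()``
and its ``lookup_ok`` parameter are hypothetical names introduced here.

```c
#include <stdint.h>

/* Local copies of the XDP action codes (enum xdp_action in the kernel UAPI). */
enum xdp_action_sketch {
	SK_XDP_ABORTED  = 0,
	SK_XDP_DROP     = 1,
	SK_XDP_PASS     = 2,
	SK_XDP_TX       = 3,
	SK_XDP_REDIRECT = 4,
};

/* The lower two bits of flags can hold any action up to XDP_TX (0..3). */
#define SK_ACTION_MASK 0x3ULL

/*
 * Hypothetical model of the return value: XDP_REDIRECT when the map lookup
 * succeeds, otherwise the action stored in the lower two bits of flags.
 */
static int redirect_map_result_sketch(int lookup_ok, uint64_t flags)
{
	if (lookup_ok)
		return SK_XDP_REDIRECT;
	return (int)(flags & SK_ACTION_MASK);
}
```

Under this model, a caller that passes ``XDP_PASS`` as ``flags`` gets
``XDP_PASS`` back when the selected CPU has no entry in the map, while passing
``0`` yields ``XDP_ABORTED``.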

User space
----------
.. note::
    CPUMAP entries can only be updated, looked up, or deleted from user space,
    not from an eBPF program. Calling these functions from a kernel eBPF
    program causes the program to fail to load with a verifier warning.

bpf_map_update_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags);

CPU entries can be added or updated using the ``bpf_map_update_elem()``
helper. This helper replaces existing elements atomically. The ``value`` parameter
can be ``struct bpf_cpumap_val``.

.. code-block:: c

    struct bpf_cpumap_val {
        __u32 qsize;  /* queue size to remote target CPU */
        union {
            int   fd; /* prog fd on map write */
            __u32 id; /* prog id on map read */
        } bpf_prog;
    };

The ``flags`` argument can be one of the following:

- ``BPF_ANY``: Create a new element or update an existing element.
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
- ``BPF_EXIST``: Update an existing element.
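
As a rough illustration of preparing the ``value`` argument, the sketch below
fills in a local mirror of ``struct bpf_cpumap_val``. The mirror struct and the
``make_cpumap_val_sketch()`` helper are hypothetical names used so that the
snippet builds without kernel or libbpf headers; real code would use
``struct bpf_cpumap_val`` from the UAPI headers.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of struct bpf_cpumap_val (assumption: same field layout). */
struct cpumap_val_sketch {
	uint32_t qsize;          /* queue size to remote target CPU */
	union {
		int      fd;     /* prog fd on map write */
		uint32_t id;     /* prog id on map read */
	} bpf_prog;
};

/* Hypothetical helper: build the value to hand to bpf_map_update_elem(). */
static struct cpumap_val_sketch make_cpumap_val_sketch(uint32_t qsize, int prog_fd)
{
	struct cpumap_val_sketch val;

	memset(&val, 0, sizeof(val));   /* start from a fully zeroed value */
	val.qsize = qsize;
	val.bpf_prog.fd = prog_fd;      /* the fd member is the one used on write */
	return val;
}
```

With libbpf, a value built this way would then be passed together with the CPU
index as the key, for example ``bpf_map_update_elem(fd, &cpu, &val, BPF_ANY)``,
where ``fd`` and ``cpu`` are placeholders for the map file descriptor and the
target CPU.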

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    int bpf_map_lookup_elem(int fd, const void *key, void *value);

CPU entries can be retrieved using the ``bpf_map_lookup_elem()``
helper.

bpf_map_delete_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    int bpf_map_delete_elem(int fd, const void *key);

CPU entries can be deleted using the ``bpf_map_delete_elem()``
helper. This helper returns 0 on success, or a negative error code on
failure.

Examples
========
Kernel
------

The following code snippet shows how to declare a ``BPF_MAP_TYPE_CPUMAP`` called
``cpu_map`` and how to redirect packets to a remote CPU using a round-robin scheme.

.. code-block:: c

    struct {
        __uint(type, BPF_MAP_TYPE_CPUMAP);
        __type(key, __u32);
        __type(value, struct bpf_cpumap_val);
        __uint(max_entries, 12);
    } cpu_map SEC(".maps");

    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __type(key, __u32);
        __type(value, __u32);
        __uint(max_entries, 12);
    } cpus_available SEC(".maps");

    struct {
        __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
        __type(key, __u32);
        __type(value, __u32);
        __uint(max_entries, 1);
    } cpus_iterator SEC(".maps");

    SEC("xdp")
    int xdp_redir_cpu_round_robin(struct xdp_md *ctx)
    {
        __u32 key = 0;
        __u32 cpu_dest = 0;
        __u32 *cpu_selected, *cpu_iterator;
        __u32 cpu_idx;

        /* Per-CPU counter holding this CPU's round-robin position. */
        cpu_iterator = bpf_map_lookup_elem(&cpus_iterator, &key);
        if (!cpu_iterator)
            return XDP_ABORTED;
        cpu_idx = *cpu_iterator;

        /* Advance the position and wrap around at the end. */
        *cpu_iterator += 1;
        if (*cpu_iterator == bpf_num_possible_cpus())
            *cpu_iterator = 0;

        /* Translate the position into a destination CPU number. */
        cpu_selected = bpf_map_lookup_elem(&cpus_available, &cpu_idx);
        if (!cpu_selected)
            return XDP_ABORTED;
        cpu_dest = *cpu_selected;

        /* Reject CPU numbers outside the possible range. */
        if (cpu_dest >= bpf_num_possible_cpus())
            return XDP_ABORTED;

        return bpf_redirect_map(&cpu_map, cpu_dest, 0);
    }
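
The iterator step of the program above can be looked at in isolation. The
following is a minimal user-space model of that step only, assuming a plain
counter instead of the ``cpus_iterator`` map; ``rr_next_sketch()`` and its
parameters are hypothetical names that do not appear in the kernel sources.

```c
#include <stdint.h>

/*
 * Hypothetical model of the round-robin step: return the current position,
 * then advance it and wrap back to 0 once it reaches nr_cpus.
 */
static uint32_t rr_next_sketch(uint32_t *iterator, uint32_t nr_cpus)
{
	uint32_t idx = *iterator;

	*iterator += 1;
	if (*iterator == nr_cpus)
		*iterator = 0;
	return idx;
}
```

Starting from 0 with three CPUs, successive calls return 0, 1, 2 and then 0
again. In the BPF program the counter lives in a ``BPF_MAP_TYPE_PERCPU_ARRAY``,
so each receiving CPU keeps its own position.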

User space
----------

The following code snippet shows how to dynamically set the ``max_entries`` for a
CPUMAP to the maximum number of CPUs available on the system.

.. code-block:: c

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <bpf/libbpf.h>

    int set_max_cpu_entries(struct bpf_map *cpu_map)
    {
        if (bpf_map__set_max_entries(cpu_map, libbpf_num_possible_cpus()) < 0) {
            fprintf(stderr, "Failed to set max entries for cpu_map map: %s\n",
                    strerror(errno));
            return -1;
        }
        return 0;
    }

References
==========

- https://developers.redhat.com/blog/2021/05/13/receive-side-scaling-rss-with-ebpf-and-cpumap#redirecting_into_a_cpumap