.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2022 Red Hat, Inc.

=================================================
BPF_MAP_TYPE_DEVMAP and BPF_MAP_TYPE_DEVMAP_HASH
=================================================

.. note::
    - ``BPF_MAP_TYPE_DEVMAP`` was introduced in kernel version 4.14
    - ``BPF_MAP_TYPE_DEVMAP_HASH`` was introduced in kernel version 5.4

``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` are BPF maps primarily
used as backend maps for the XDP BPF helper call ``bpf_redirect_map()``.
``BPF_MAP_TYPE_DEVMAP`` is backed by an array that uses the key as
the index to look up a reference to a net device, while ``BPF_MAP_TYPE_DEVMAP_HASH``
is backed by a hash table that uses a key to look up a reference to a net device.
The user provides either <``key``/ ``ifindex``> or <``key``/ ``struct bpf_devmap_val``>
pairs to update the maps with new net devices.

.. note::
    - The key to a hash map doesn't have to be an ``ifindex``.
    - While ``BPF_MAP_TYPE_DEVMAP_HASH`` allows for densely packing the net devices,
      it comes at the cost of a hash of the key when performing a lookup.
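
The difference between the two lookup styles can be illustrated with a small
user-space model. This is only a sketch and not the kernel implementation:
``MAX_ENTRIES``, ``array_lookup()``, ``hash_insert()`` and ``hash_lookup()``
are hypothetical names, the hash is a toy modulo with linear probing, and an
``ifindex`` of 0 is assumed to mean "empty slot".

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 4   /* hypothetical map size for this sketch */

/* Array flavour: the key is the slot index, so it must be < MAX_ENTRIES. */
static uint32_t array_slots[MAX_ENTRIES];   /* 0 means "slot empty" */

static const uint32_t *array_lookup(uint32_t key)
{
    if (key >= MAX_ENTRIES || array_slots[key] == 0)
        return NULL;
    return &array_slots[key];
}

/* Hash flavour: any 32-bit key is accepted and hashed to pick a slot. */
struct hash_entry {
    int used;
    uint32_t key;
    uint32_t ifindex;
};

static struct hash_entry hash_slots[MAX_ENTRIES];

/* Insert or replace; returns 0 on success, -1 when the table is full. */
static int hash_insert(uint32_t key, uint32_t ifindex)
{
    struct hash_entry *free_slot = NULL;

    for (uint32_t i = 0; i < MAX_ENTRIES; i++) {
        struct hash_entry *e = &hash_slots[(key % MAX_ENTRIES + i) % MAX_ENTRIES];

        if (e->used && e->key == key) {
            e->ifindex = ifindex;   /* existing key: replace the value */
            return 0;
        }
        if (!e->used && !free_slot)
            free_slot = e;          /* remember the first free slot */
    }
    if (!free_slot)
        return -1;
    free_slot->used = 1;
    free_slot->key = key;
    free_slot->ifindex = ifindex;
    return 0;
}

static const uint32_t *hash_lookup(uint32_t key)
{
    for (uint32_t i = 0; i < MAX_ENTRIES; i++) {
        const struct hash_entry *e = &hash_slots[(key % MAX_ENTRIES + i) % MAX_ENTRIES];

        if (e->used && e->key == key)
            return &e->ifindex;
    }
    return NULL;
}
```

In this model a key such as 1000 is out of range for the array flavour but is
a valid key for the hash flavour, which is what allows a small hash devmap to
hold devices with large or sparse keys.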

The setup and packet enqueue/send code is shared between the two types of
devmap; only the lookup and insertion differ.

Usage
=====
Kernel BPF
----------
bpf_redirect_map()
^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)

Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
For ``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` this map contains
references to net devices (for forwarding packets through other ports).

The lower two bits of ``flags`` are used as the return code if the map lookup
fails. This is so that the return value can be one of the XDP program return
codes up to ``XDP_TX``, as chosen by the caller. The higher bits of ``flags``
can be set to ``BPF_F_BROADCAST`` or ``BPF_F_EXCLUDE_INGRESS`` as defined
below.
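
The fallback rule can be sketched as a plain C model. This is not the helper
itself: ``model_redirect_map()`` and ``ACTION_MASK`` are hypothetical names,
and the ``XDP_*`` values are redefined locally (mirroring ``enum xdp_action``
in ``<linux/bpf.h>``) only so that the sketch builds without kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the XDP return codes, in the same order as enum xdp_action. */
enum {
    XDP_ABORTED = 0,
    XDP_DROP,
    XDP_PASS,
    XDP_TX,
    XDP_REDIRECT,
};

/* The lower two bits of flags can hold any code from XDP_ABORTED to XDP_TX. */
#define ACTION_MASK 3ULL

/* Model of the return value only: no packet is touched here. */
static long model_redirect_map(bool lookup_ok, uint64_t flags)
{
    if (!lookup_ok)
        return (long)(flags & ACTION_MASK);   /* caller-chosen fallback */
    return XDP_REDIRECT;
}
```

Passing ``XDP_PASS`` as ``flags`` therefore means "hand the packet to the
normal stack if the key is not in the map", while passing 0 yields
``XDP_ABORTED`` on a failed lookup.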

With ``BPF_F_BROADCAST`` the packet will be broadcast to all the interfaces
in the map; with ``BPF_F_EXCLUDE_INGRESS`` the ingress interface will be excluded
from the broadcast.

.. note::
    - The key is ignored if ``BPF_F_BROADCAST`` is set.
    - The broadcast feature can also be used to implement multicast forwarding:
      simply create multiple DEVMAPs, each one corresponding to a single multicast group.
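
The selection of broadcast targets can be modelled in user space as follows.
This is a sketch under stated assumptions rather than kernel code:
``broadcast_targets()`` is a hypothetical name, the map is modelled as a plain
array of ifindexes in which 0 marks an empty slot, and the two flag values are
redefined locally to match ``<linux/bpf.h>``.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the flag values from <linux/bpf.h>. */
#define BPF_F_BROADCAST        (1ULL << 3)
#define BPF_F_EXCLUDE_INGRESS  (1ULL << 4)

/*
 * Write the ifindexes a broadcast would target into out[] and return how
 * many were written. out[] must have room for max_entries elements.
 */
static size_t broadcast_targets(const uint32_t *map, size_t max_entries,
                                uint32_t ingress_ifindex, uint64_t flags,
                                uint32_t *out)
{
    size_t n = 0;

    if (!(flags & BPF_F_BROADCAST))
        return 0;   /* not a broadcast: a single keyed lookup applies instead */

    for (size_t i = 0; i < max_entries; i++) {
        if (map[i] == 0)
            continue;   /* empty slot (assumption of this model) */
        if ((flags & BPF_F_EXCLUDE_INGRESS) && map[i] == ingress_ifindex)
            continue;   /* do not send the packet back where it came from */
        out[n++] = map[i];
    }
    return n;
}
```

Note that no key appears in the function signature, which reflects the rule
that the key is ignored when ``BPF_F_BROADCAST`` is set.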

This helper will return ``XDP_REDIRECT`` on success, or the value of the two
lower bits of the ``flags`` argument if the map lookup fails.

More information about redirection can be found in :doc:`redirect`.

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)

Net device entries can be retrieved using the ``bpf_map_lookup_elem()``
helper.

User space
----------
.. note::
    DEVMAP entries can only be updated/deleted from user space and not
    from an eBPF program. Trying to call these functions from a kernel eBPF
    program will result in the program failing to load and a verifier warning.

bpf_map_update_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags);

Net device entries can be added or updated using the ``bpf_map_update_elem()``
helper. This helper replaces existing elements atomically. The ``value`` parameter
can be ``struct bpf_devmap_val`` or a simple ``int ifindex`` for backwards
compatibility.

.. code-block:: c

    struct bpf_devmap_val {
        __u32 ifindex;   /* device index */
        union {
            int   fd;  /* prog fd on map write */
            __u32 id;  /* prog id on map read */
        } bpf_prog;
    };
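
The two accepted value forms differ only in size and layout, which the
following sketch makes explicit. ``struct devmap_val_mirror`` and
``make_val()`` are hypothetical names: the struct is a local copy of
``struct bpf_devmap_val`` with ``__u32`` spelled ``uint32_t`` so that it builds
without kernel headers. Treating a ``prog_fd`` of 0 as "no program" is an
assumption that matches the zero-initialised value used in this document's
user-space example.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct bpf_devmap_val (see the definition above). */
struct devmap_val_mirror {
    uint32_t ifindex;       /* device index */
    union {
        int      fd;        /* prog fd on map write */
        uint32_t id;        /* prog id on map read */
    } bpf_prog;
};

/* Size of the legacy value form: a bare 32-bit ifindex. */
static size_t legacy_value_size(void)
{
    return sizeof(uint32_t);
}

/* Size of the full value form: ifindex followed by the program union. */
static size_t full_value_size(void)
{
    return sizeof(struct devmap_val_mirror);
}

/* Build a value for a map write; prog_fd == 0 attaches no program (assumed). */
static struct devmap_val_mirror make_val(uint32_t ifindex, int prog_fd)
{
    struct devmap_val_mirror val = { .ifindex = ifindex };

    val.bpf_prog.fd = prog_fd;
    return val;
}
```

Because ``ifindex`` is the first member, a program that only reads the first
four bytes of the full form sees the same data as the legacy form.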

The ``flags`` argument can be one of the following:

- ``BPF_ANY``: Create a new element or update an existing element.
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
- ``BPF_EXIST``: Update an existing element.

DEVMAPs can associate a program with a device entry by adding a ``bpf_prog.fd``
to ``struct bpf_devmap_val``. Programs are run after ``XDP_REDIRECT`` and have
access to both the Rx device and the Tx device. The program associated with the
``fd`` must have type XDP with expected attach type ``xdp_devmap``.
When a program is associated with a device index, the program is run on an
``XDP_REDIRECT`` and before the buffer is added to the per-cpu queue. Examples
of how to attach/use xdp_devmap progs can be found in the kernel selftests:

- ``tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c``
- ``tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c``

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    int bpf_map_lookup_elem(int fd, const void *key, void *value);

Net device entries can be retrieved using the ``bpf_map_lookup_elem()``
helper.

bpf_map_delete_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    int bpf_map_delete_elem(int fd, const void *key);

Net device entries can be deleted using the ``bpf_map_delete_elem()``
helper. This helper will return 0 on success, or a negative error in case of
failure.

Examples
========

Kernel BPF
----------

The following code snippet shows how to declare a ``BPF_MAP_TYPE_DEVMAP``
called ``tx_port``.

.. code-block:: c

    struct {
        __uint(type, BPF_MAP_TYPE_DEVMAP);
        __type(key, __u32);
        __type(value, __u32);
        __uint(max_entries, 256);
    } tx_port SEC(".maps");

The following code snippet shows how to declare a ``BPF_MAP_TYPE_DEVMAP_HASH``
called ``forward_map``.

.. code-block:: c

    struct {
        __uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
        __type(key, __u32);
        __type(value, struct bpf_devmap_val);
        __uint(max_entries, 32);
    } forward_map SEC(".maps");

.. note::

    The value type in the DEVMAP above is a ``struct bpf_devmap_val``.

The following code snippet shows a simple xdp_redirect_map program. This program
would work with a user space program that populates the devmap ``forward_map`` based
on ingress ifindexes. The BPF program (below) is redirecting packets using the
ingress ``ifindex`` as the ``key``.

.. code-block:: c

    SEC("xdp")
    int xdp_redirect_map_func(struct xdp_md *ctx)
    {
        int index = ctx->ingress_ifindex;

        return bpf_redirect_map(&forward_map, index, 0);
    }

The following code snippet shows a BPF program that is broadcasting packets to
all the interfaces in the ``tx_port`` devmap.

.. code-block:: c

    SEC("xdp")
    int xdp_redirect_map_func(struct xdp_md *ctx)
    {
        return bpf_redirect_map(&tx_port, 0, BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS);
    }

User space
----------

The following code snippet shows how to update a devmap called ``tx_port``.

.. code-block:: c

    int update_devmap(int ifindex, int redirect_ifindex)
    {
        int ret;

        /* flags == 0 is BPF_ANY: create the entry or replace an existing one */
        ret = bpf_map_update_elem(bpf_map__fd(tx_port), &ifindex, &redirect_ifindex, 0);
        if (ret < 0) {
            fprintf(stderr, "Failed to update devmap value: %s\n",
                strerror(errno));
        }

        return ret;
    }

The following code snippet shows how to update a hash devmap called ``forward_map``.

.. code-block:: c

    int update_devmap(int ifindex, int redirect_ifindex)
    {
        /* Only the device is set; bpf_prog.fd stays zero-initialised */
        struct bpf_devmap_val devmap_val = { .ifindex = redirect_ifindex };
        int ret;

        ret = bpf_map_update_elem(bpf_map__fd(forward_map), &ifindex, &devmap_val, 0);
        if (ret < 0) {
            fprintf(stderr, "Failed to update devmap value: %s\n",
                strerror(errno));
        }

        return ret;
    }

References
===========

- https://lwn.net/Articles/728146/
- https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=6f9d451ab1a33728adb72d7ff66a7b374d665176
- https://elixir.bootlin.com/linux/latest/source/net/core/filter.c#L4106