.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2020 Google LLC.

===========================
BPF_MAP_TYPE_CGROUP_STORAGE
===========================

The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a fixed-size local
storage. It is only available with ``CONFIG_CGROUP_BPF``, and only to programs
that attach to cgroups; those programs are made available by the same Kconfig
option. The storage is identified by the cgroup the program is attached to.

The map provides local storage at the cgroup that the BPF program is attached
to. It provides faster and simpler access than the general-purpose hash table,
which performs hash table lookups and requires the user to track live cgroups
on their own.

This document describes the usage and semantics of the
``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors were changed in
Linux 5.9, and this document describes the differences.

Usage
=====

The map uses a key of either type ``__u64 cgroup_inode_id`` or
``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``::

    struct bpf_cgroup_storage_key {
        __u64 cgroup_inode_id;
        __u32 attach_type;
    };

``cgroup_inode_id`` is the inode id of the cgroup directory.
``attach_type`` is the program's attach type.

Linux 5.9 added support for type ``__u64 cgroup_inode_id`` as the key type.
When this key type is used, all attach types of the particular cgroup and map
share the same storage. Otherwise, if the type is
``struct bpf_cgroup_storage_key``, programs of different attach types are
isolated and see different storages.

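Whether storages are isolated per attach type is fixed by the key type chosen
when the map is created. As far as we understand the kernel implementation,
the two cases are told apart by the map's key size alone. The plain userspace
C below is a minimal sketch of that assumed rule, not kernel code;
``struct cgroup_storage_key_model`` is a hypothetical local mirror of
``struct bpf_cgroup_storage_key`` so the example builds without kernel
headers, and ``is_isolated_by_key_size()`` is an illustrative helper::

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical mirror of struct bpf_cgroup_storage_key (linux/bpf.h). */
    struct cgroup_storage_key_model {
        uint64_t cgroup_inode_id;
        uint32_t attach_type;
    };

    /*
     * Assumed rule: a key exactly as large as a __u64 (the bare cgroup inode
     * id) means all attach types share one storage; any other key size means
     * the attach type is part of the key, so storages are isolated.
     */
    static int is_isolated_by_key_size(size_t key_size)
    {
        return key_size != sizeof(uint64_t);
    }

Because the struct is padded (to 16 bytes on common 64-bit ABIs), a map keyed
by ``struct bpf_cgroup_storage_key`` never has the 8-byte key size that
selects sharing.
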
To access the storage in a program, use ``bpf_get_local_storage``::

    void *bpf_get_local_storage(void *map, u64 flags)

``flags`` is reserved for future use and must be 0.

There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE``
can be accessed by multiple programs across different CPUs, and the user must
take care of synchronization themselves. The BPF infrastructure provides
``struct bpf_spin_lock`` to synchronize the storage. See
``tools/testing/selftests/bpf/progs/test_spin_lock.c``.

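The examples in this document update the storage with ``__sync_fetch_and_add``
rather than a plain ``*ptr += 1``. The deterministic userspace simulation
below (not BPF code; both helper names are hypothetical) sketches why: with
one unlucky interleaving of two CPUs, a plain read-modify-write loses an
update, whereas an atomic fetch-and-add makes each read and write one
indivisible step::

    #include <stdint.h>

    /*
     * Two CPUs do a plain increment of the same counter, interleaved as:
     * CPU0 load, CPU1 load, CPU0 store, CPU1 store. Both loaded the same old
     * value, so CPU1's store overwrites CPU0's and one increment is lost.
     */
    static uint32_t racy_two_cpu_increment(uint32_t start)
    {
        uint32_t counter = start;
        uint32_t cpu0_reg = counter;    /* CPU0 loads */
        uint32_t cpu1_reg = counter;    /* CPU1 loads the same value */

        counter = cpu0_reg + 1;         /* CPU0 stores */
        counter = cpu1_reg + 1;         /* CPU1 overwrites CPU0's result */
        return counter;
    }

    /*
     * With __sync_fetch_and_add each CPU's load and store form one atomic
     * step, so no interleaving can drop an increment. This runs sequentially;
     * it illustrates the operation, not real concurrency.
     */
    static uint32_t atomic_two_cpu_increment(uint32_t start)
    {
        uint32_t counter = start;

        __sync_fetch_and_add(&counter, 1);  /* CPU0 */
        __sync_fetch_and_add(&counter, 1);  /* CPU1 */
        return counter;
    }

An atomic add only protects a single word; updates that span several fields of
the storage still need ``struct bpf_spin_lock``.
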
Examples
========

Usage with key type as ``struct bpf_cgroup_storage_key``::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
        __type(key, struct bpf_cgroup_storage_key);
        __type(value, __u32);
    } cgroup_storage SEC(".maps");

    /* cgroup_skb chosen for illustration; its context is struct __sk_buff */
    SEC("cgroup_skb/egress")
    int program(struct __sk_buff *skb)
    {
        /* Never NULL, so no check is needed before the update */
        __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
        __sync_fetch_and_add(ptr, 1);

        return 1; /* 1 lets the packet through; 0 would drop it */
    }

Userspace accessing map declared above::

    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
    {
        struct bpf_cgroup_storage_key key = {
            .cgroup_inode_id = cgrp,
            .attach_type = type,
        };
        __u32 value;
        bpf_map_lookup_elem(bpf_map__fd(map), &key, &value);
        // error checking omitted
        return value;
    }

Alternatively, using just ``__u64 cgroup_inode_id`` as key type::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
        __type(key, __u64);
        __type(value, __u32);
    } cgroup_storage SEC(".maps");

    /* cgroup_skb chosen for illustration; its context is struct __sk_buff */
    SEC("cgroup_skb/egress")
    int program(struct __sk_buff *skb)
    {
        /* Shared by all attach types of this cgroup that use this map */
        __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
        __sync_fetch_and_add(ptr, 1);

        return 1; /* 1 lets the packet through; 0 would drop it */
    }

And userspace::

    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
    {
        /* type is unused: all attach types share the storage of cgrp */
        __u32 value;
        bpf_map_lookup_elem(bpf_map__fd(map), &cgrp, &value);
        // error checking omitted
        return value;
    }

Semantics
=========

``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. The
per-CPU variant has a separate memory region for each CPU for each storage,
while the non-per-CPU variant has a single memory region for each storage.

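For the per-CPU variant, a userspace lookup returns one value per possible CPU
rather than a single value. The sketch below sums such a buffer for a
``__u32`` counter. It assumes, as we understand the kernel's copy-out for
per-CPU maps, that each CPU's slot is the value size rounded up to 8 bytes. In
real code the buffer would be filled by ``bpf_map_lookup_elem()`` and sized
using ``libbpf_num_possible_cpus()``; here it is taken as a parameter, and
both helpers are hypothetical::

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Per-CPU slot size: the value size rounded up to a multiple of 8. */
    static size_t percpu_slot_size(size_t value_size)
    {
        return (value_size + 7) & ~(size_t)7;
    }

    /* Sum a __u32 counter over ncpus slots laid out back to back. */
    static uint64_t sum_percpu_u32(const unsigned char *buf, int ncpus)
    {
        size_t stride = percpu_slot_size(sizeof(uint32_t));
        uint64_t total = 0;

        for (int cpu = 0; cpu < ncpus; cpu++) {
            uint32_t v;

            /* memcpy avoids unaligned, type-punned reads */
            memcpy(&v, buf + (size_t)cpu * stride, sizeof(v));
            total += v;
        }
        return total;
    }

Under this assumption, a ``__u32`` value occupies an 8-byte slot, so a buffer
of only ``ncpus * sizeof(__u32)`` bytes would be too small.
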
Prior to Linux 5.9, the lifetime of a storage was precisely per-attachment, and
for a single ``CGROUP_STORAGE`` map, there could be at most one program loaded
that uses the map. A program may be attached to multiple cgroups or have
multiple attach types, and each attach creates a fresh zeroed storage. The
storage is freed upon detach.

There is a one-to-one association between the map of each type (per-CPU and
non-per-CPU) and the BPF program during load verification time. As a result,
each map can only be used by one BPF program, and each BPF program can only use
one storage map of each type. Because a map can only be used by one BPF
program, sharing this cgroup's storage with other BPF programs was impossible.

Since Linux 5.9, storage can be shared by multiple programs. When a program is
attached to a cgroup, the kernel creates a new storage only if the map does not
already contain an entry for the cgroup and attach type pair; otherwise, the
old storage is reused for the new attachment. If the map is attach type
shared, the attach type is simply ignored during comparison. Storage is freed
only when either the map or the cgroup it is attached to is freed. Detaching
does not directly free the storage, but it may cause the reference count of
the map to reach zero and thereby indirectly free all storage in the map.

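The Linux 5.9 attach-time rule amounts to a lookup-or-create on the map. The
sketch below is a simplified userspace model, not kernel code: a fixed-size
array stands in for the map, all names are hypothetical, and freeing storage
together with the map or cgroup is not modelled. When ``shared`` is set, the
attach type is ignored in the comparison, as for attach type shared maps::

    #include <stddef.h>
    #include <stdint.h>

    #define MODEL_MAX_ENTRIES 16

    struct model_entry {
        uint64_t cgroup_inode_id;
        uint32_t attach_type;
        uint32_t value;     /* the storage itself, zeroed on creation */
        int in_use;
    };

    struct model_map {
        int shared;         /* 1: __u64 key, attach type ignored */
        struct model_entry entries[MODEL_MAX_ENTRIES];
    };

    static int model_key_match(const struct model_map *map,
                               const struct model_entry *e,
                               uint64_t cgrp, uint32_t type)
    {
        if (e->cgroup_inode_id != cgrp)
            return 0;
        return map->shared || e->attach_type == type;
    }

    /*
     * Return the storage an attachment of (cgrp, type) uses: an existing
     * matching entry if present, else a newly created zeroed entry.
     * Returns NULL if the model's table is full.
     */
    static uint32_t *model_attach(struct model_map *map, uint64_t cgrp,
                                  uint32_t type)
    {
        struct model_entry *free_slot = NULL;

        for (size_t i = 0; i < MODEL_MAX_ENTRIES; i++) {
            struct model_entry *e = &map->entries[i];

            if (e->in_use && model_key_match(map, e, cgrp, type))
                return &e->value;       /* reuse the old storage */
            if (!e->in_use && !free_slot)
                free_slot = e;
        }
        if (!free_slot)
            return NULL;
        free_slot->in_use = 1;
        free_slot->cgroup_inode_id = cgrp;
        free_slot->attach_type = type;
        free_slot->value = 0;
        return &free_slot->value;
    }

In the isolated case, two attachments with the same cgroup and attach type get
the same storage, while a different attach type gets a fresh zeroed one; in the
shared case, only the cgroup inode id decides.
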
The map is not associated with any BPF program, which makes sharing possible.
However, a BPF program can still only associate with one map of each type
(per-CPU and non-per-CPU): a BPF program cannot use more than one
``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one
``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``.

In all versions, userspace may use the attach parameters of cgroup and attach
type pair in ``struct bpf_cgroup_storage_key`` as the key to the BPF map APIs
to read or update the storage for a given attachment. For Linux 5.9 attach type
shared storages, only the first value in the struct, the cgroup inode id, is
used during comparison, so userspace may just specify a ``__u64`` directly.

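Passing a bare ``__u64`` works for attach type shared storages because
``cgroup_inode_id`` is the first member of ``struct bpf_cgroup_storage_key``,
so the first eight bytes of a full key are exactly the inode id. The sketch
below checks that layout on a hypothetical local mirror of the struct (so it
builds without kernel headers); ``key_prefix_inode_id()`` is an illustrative
helper, not a libbpf function::

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical mirror of struct bpf_cgroup_storage_key (linux/bpf.h). */
    struct cgroup_storage_key_model {
        uint64_t cgroup_inode_id;
        uint32_t attach_type;
    };

    /* The inode id must sit at offset 0 for an 8-byte key to match it. */
    _Static_assert(offsetof(struct cgroup_storage_key_model,
                            cgroup_inode_id) == 0,
                   "cgroup_inode_id must be the first member");

    /* Read the first sizeof(__u64) bytes of a full key, as an 8-byte key. */
    static uint64_t
    key_prefix_inode_id(const struct cgroup_storage_key_model *key)
    {
        uint64_t id;

        memcpy(&id, key, sizeof(id));
        return id;
    }
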
The storage is bound at attach time. Even if the program is attached to a
parent cgroup and triggers in a child cgroup, the storage still belongs to the
parent.

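The parent/child rule can be illustrated with a small userspace model. It is a
sketch under simplifying assumptions: every name is hypothetical, each cgroup
holds at most one attachment, and program inheritance is reduced to "the
nearest ancestor with an attachment runs", ignoring attach flags such as
``BPF_F_ALLOW_MULTI``::

    #include <stddef.h>
    #include <stdint.h>

    /* Storage lives with the attachment, i.e. with the attached-to cgroup. */
    struct model_cgroup {
        struct model_cgroup *parent;
        int has_attachment;
        uint32_t storage;   /* bound at attach time to this cgroup */
    };

    /*
     * An event in cg runs the program attached at the nearest ancestor
     * (including cg itself), which updates the storage of the cgroup it was
     * attached to. Returns that cgroup, or NULL if no program is in effect.
     */
    static struct model_cgroup *model_run_event(struct model_cgroup *cg)
    {
        for (; cg != NULL; cg = cg->parent) {
            if (cg->has_attachment) {
                cg->storage++;
                return cg;
            }
        }
        return NULL;
    }

An event in the child therefore increments the parent's storage, and the
child's own ``storage`` field stays untouched.
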
Userspace cannot create a new entry in the map or delete an existing entry.
Program test runs always use a temporary storage.