.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2020 Google LLC.

===========================
BPF_MAP_TYPE_CGROUP_STORAGE
===========================

The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fixed-size
storage. It is only available with ``CONFIG_CGROUP_BPF``, and only to programs
that attach to cgroups; those programs are made available by the same Kconfig
option. The storage is identified by the cgroup the program is attached to.

The map provides local storage at the cgroup that the BPF program is attached
to. It offers faster and simpler access than a general-purpose hash table,
which performs hash table lookups and requires users to track live cgroups on
their own.

This document describes the usage and semantics of the
``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behavior changed in
Linux 5.9, and this document describes the differences.

Usage
=====

The map uses a key of type either ``__u64 cgroup_inode_id`` or
``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``::

    struct bpf_cgroup_storage_key {
            __u64 cgroup_inode_id;
            __u32 attach_type;
    };

``cgroup_inode_id`` is the inode id of the cgroup directory.
``attach_type`` is the program's attach type.
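
Since ``cgroup_inode_id`` is the inode id of the cgroup directory, userspace
can obtain it with an ordinary ``stat()`` call. The following is a minimal
sketch of that step; ``cgroup_inode_id_from_path`` is a hypothetical helper
name, not part of libbpf or the kernel, and the path passed in is assumed to be
a directory on the mounted cgroup filesystem.

```c
#include <stdint.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of libbpf or the kernel): read the inode
 * number of a cgroup directory so it can be used as cgroup_inode_id.
 * Returns 0 on success and -1 if stat() fails (errno is set by stat()).
 */
static int cgroup_inode_id_from_path(const char *cgroup_dir, uint64_t *id)
{
        struct stat st;

        if (stat(cgroup_dir, &st) != 0)
                return -1;

        *id = (uint64_t)st.st_ino;
        return 0;
}
```

The resulting value would then be placed in the ``cgroup_inode_id`` field of
the key, or used directly when the map is declared with a ``__u64`` key.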

Linux 5.9 added support for type ``__u64 cgroup_inode_id`` as the key type.
When this key type is used, all attach types of the particular cgroup and
map share the same storage. Otherwise, if the type is
``struct bpf_cgroup_storage_key``, programs of different attach types
are isolated and see different storages.

To access the storage in a program, use ``bpf_get_local_storage``::

    void *bpf_get_local_storage(void *map, u64 flags)

``flags`` is reserved for future use and must be 0.

There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE``
can be accessed by multiple programs across different CPUs, and users must
take care of synchronization themselves. The BPF infrastructure provides
``struct bpf_spin_lock`` to synchronize the storage. See
``tools/testing/selftests/bpf/progs/test_spin_lock.c``.

Examples
========

Usage with ``struct bpf_cgroup_storage_key`` as the key type::

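    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
            __type(key, struct bpf_cgroup_storage_key);
            __type(value, __u32);
    } cgroup_storage SEC(".maps");

    int program(struct __sk_buff *skb)
    {
            /* Storage for the cgroup and attach type this program runs for. */
            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);

            /* Atomic add: the storage may be accessed from several CPUs. */
            __sync_fetch_and_add(ptr, 1);

            return 0;
    }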

Userspace accessing map declared above::

    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
    {
            struct bpf_cgroup_storage_key key = {
                    .cgroup_inode_id = cgrp,
                    .attach_type = type,
            };
            __u32 value;
            bpf_map_lookup_elem(bpf_map__fd(map), &key, &value);
            // error checking omitted
            return value;
    }

Alternatively, using just ``__u64 cgroup_inode_id`` as the key type::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
            __type(key, __u64);
            __type(value, __u32);
    } cgroup_storage SEC(".maps");

    int program(struct __sk_buff *skb)
    {
            /* Storage shared by all attach types on this cgroup. */
            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);

            /* Atomic add: the storage may be accessed from several CPUs. */
            __sync_fetch_and_add(ptr, 1);

            return 0;
    }

And userspace::

    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
    {
            __u32 value;
            bpf_map_lookup_elem(bpf_map__fd(map), &cgrp, &value);
            // error checking omitted
            return value;
    }

Semantics
=========

``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. The
per-CPU variant has a separate memory region on each CPU for each storage. The
non-per-CPU variant has a single memory region per storage, shared by all CPUs.
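
One practical consequence for userspace is that a lookup on the per-CPU
variant yields one value per CPU rather than a single value. The sketch below
aggregates such a buffer. It assumes the usual layout for per-CPU BPF maps: one
slot per possible CPU, each slot padded to a multiple of 8 bytes. The helper
names ``percpu_slot_size`` and ``sum_percpu_u32`` are hypothetical, and the
buffer is assumed to have been filled by ``bpf_map_lookup_elem()`` with room
for ``libbpf_num_possible_cpus()`` slots.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: each per-CPU slot is padded to a multiple of 8 bytes. */
static size_t percpu_slot_size(size_t value_size)
{
        return (value_size + 7) & ~(size_t)7;
}

/*
 * Hypothetical helper: sum the 32-bit counters stored in a buffer made of
 * ncpus consecutive slots, one slot per possible CPU.
 */
static uint64_t sum_percpu_u32(const void *buf, size_t ncpus)
{
        const unsigned char *p = buf;
        size_t slot = percpu_slot_size(sizeof(uint32_t));
        uint64_t total = 0;

        for (size_t cpu = 0; cpu < ncpus; cpu++) {
                uint32_t v;

                /* Copy out of the slot to avoid unaligned access. */
                memcpy(&v, p + cpu * slot, sizeof(v));
                total += v;
        }
        return total;
}
```

With the non-per-CPU variant no such aggregation is needed, because every CPU
updates the same region.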

Prior to Linux 5.9, the lifetime of a storage was precisely per-attachment, and
for a single ``CGROUP_STORAGE`` map, at most one loaded program could use the
map. A program could be attached to multiple cgroups or have multiple attach
types, and each attach created a fresh zeroed storage. The storage was freed
upon detach.

There was a one-to-one association between the map of each type (per-CPU and
non-per-CPU) and the BPF program during load verification time. As a result,
each map could only be used by one BPF program and each BPF program could only
use one storage map of each type. Because a map could only be used by one BPF
program, sharing a cgroup's storage with other BPF programs was impossible.

Since Linux 5.9, storage can be shared by multiple programs. When a program is
attached to a cgroup, the kernel creates a new storage only if the map does not
already contain an entry for the cgroup and attach type pair; otherwise the
existing storage is reused for the new attachment. If the map is attach type
shared, the attach type is simply ignored during comparison. Storage is freed
only when either the map or the cgroup it is attached to is freed. Detaching
does not directly free the storage, but it may cause the reference count of the
map to reach zero and thereby indirectly free all storage in the map.
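
The reuse rule can be illustrated with a small model of the comparison. This is
a sketch of the behavior described above, not the kernel implementation:
``struct storage_key_model`` is a local stand-in for
``struct bpf_cgroup_storage_key``, and ``storage_key_matches`` and its
``attach_type_shared`` parameter are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for struct bpf_cgroup_storage_key from linux/bpf.h. */
struct storage_key_model {
        uint64_t cgroup_inode_id;
        uint32_t attach_type;
};

/*
 * Illustrative model, not kernel code: returns true when an existing map
 * entry would be reused for a new attachment.
 */
static bool storage_key_matches(const struct storage_key_model *entry,
                                const struct storage_key_model *attach,
                                bool attach_type_shared)
{
        if (entry->cgroup_inode_id != attach->cgroup_inode_id)
                return false;

        /* With a __u64 key, the attach type is ignored. */
        if (attach_type_shared)
                return true;

        return entry->attach_type == attach->attach_type;
}
```

In this model, two attachments to the same cgroup with different attach types
match only when the map is attach type shared.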

The map is not associated with any BPF program, thus making sharing possible.
However, the BPF program can still only associate with one map of each type
(per-CPU and non-per-CPU). A BPF program cannot use more than one
``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one
``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``.

In all versions, userspace may use the attach parameters, the cgroup and
attach type pair in ``struct bpf_cgroup_storage_key``, as the key to the BPF map
APIs to read or update the storage for a given attachment. For attach type
shared storages, available since Linux 5.9, only the first field of the struct,
the cgroup inode id, is used during comparison, so userspace may just specify a
``__u64`` directly.

The storage is bound at attach time. Even if the program is attached to a
parent cgroup and triggers in a child, the storage still belongs to the parent.

Userspace cannot create a new entry in the map or delete an existing entry.
Program test runs always use a temporary storage.