.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_CGROUP_SOCKOPT
============================

The ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` program type can be attached to two
cgroup hooks:

* ``BPF_CGROUP_GETSOCKOPT`` - called every time a process executes the
  ``getsockopt`` system call.
* ``BPF_CGROUP_SETSOCKOPT`` - called every time a process executes the
  ``setsockopt`` system call.

The context (``struct bpf_sockopt``) has the associated socket (``sk``) and
all input arguments: ``level``, ``optname``, ``optval`` and ``optlen``.

BPF_CGROUP_SETSOCKOPT
=====================

``BPF_CGROUP_SETSOCKOPT`` is triggered *before* the kernel handles the
socket option, and it has a writable context: it can modify the supplied
arguments before passing them down to the kernel. This hook has access to
the cgroup and socket local storage.

If the BPF program sets ``optlen`` to -1, control returns to userspace
after all other BPF programs in the cgroup chain finish (i.e. the kernel's
``setsockopt`` handling is *not* executed).

Note that ``optlen`` cannot be increased beyond the user-supplied
value. It can only be decreased or set to -1. Any other value will
trigger ``EFAULT``.
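
A minimal sketch of a ``BPF_CGROUP_SETSOCKOPT`` program that consumes an
option entirely in BPF by setting ``optlen`` to -1. The level
``SOL_CUSTOM`` and the function name are hypothetical; ``cgroup/setsockopt``
is the section name libbpf uses for this attach type. The verifier only
accepts reads of ``optval`` after a bounds check against ``optval_end``.
The empty ``SEC()`` fallback exists only so the sketch also builds as
ordinary C; a real build would include libbpf's ``<bpf/bpf_helpers.h>``.

.. code-block:: c

    #include <linux/bpf.h>

    /* SEC() normally comes from libbpf's <bpf/bpf_helpers.h>; this empty
     * fallback only lets the sketch compile as ordinary C for testing. */
    #ifndef SEC
    #define SEC(name)
    #endif

    /* Hypothetical option level that no kernel protocol implements. */
    #define SOL_CUSTOM 0xdead

    SEC("cgroup/setsockopt")
    int custom_setsockopt(struct bpf_sockopt *ctx)
    {
        __u8 *optval = ctx->optval;
        __u8 *optval_end = ctx->optval_end;

        /* Options at other levels continue down the chain unchanged. */
        if (ctx->level != SOL_CUSTOM)
            return 1;

        /* The verifier only allows optval access after this bounds check. */
        if (optval + 1 > optval_end)
            return 0; /* reject: userspace gets EPERM */

        /* A real program could store optval[0] in socket local storage here. */

        /* Handled in BPF: skip the kernel's setsockopt handler. */
        ctx->optlen = -1;
        return 1;
    }

    char _license[] SEC("license") = "GPL";

Options at any other level are left untouched, so the next program in the
chain, and eventually the kernel, still sees them.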

Return Type
-----------

* ``0`` - reject the syscall, ``EPERM`` will be returned to userspace.
* ``1`` - success, continue with the next BPF program in the cgroup chain.
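
As a sketch of both return values, the program below rejects
``setsockopt(SOL_SOCKET, SO_SNDBUF)`` requests above a cap and lets
everything else continue. The 1 MiB cap (``SNDBUF_CAP``) and the function
name are hypothetical; ``SOL_SOCKET`` and ``SO_SNDBUF`` come from
``<sys/socket.h>``, and the ``SEC()`` fallback is only there so the code
also builds as plain C.

.. code-block:: c

    #include <linux/bpf.h>
    #include <sys/socket.h>

    /* SEC() normally comes from libbpf's <bpf/bpf_helpers.h>; this empty
     * fallback only lets the sketch compile as ordinary C for testing. */
    #ifndef SEC
    #define SEC(name)
    #endif

    /* Hypothetical policy: refuse send buffers larger than 1 MiB. */
    #define SNDBUF_CAP (1 << 20)

    SEC("cgroup/setsockopt")
    int limit_sndbuf(struct bpf_sockopt *ctx)
    {
        __u8 *optval = ctx->optval;
        __u8 *optval_end = ctx->optval_end;

        if (ctx->level != SOL_SOCKET || ctx->optname != SO_SNDBUF)
            return 1; /* not ours: continue down the chain */

        /* Shorter than an int: let the kernel report its own error. */
        if (optval + sizeof(int) > optval_end)
            return 1;

        if (*(int *)optval > SNDBUF_CAP)
            return 0; /* reject: setsockopt() fails with EPERM */

        return 1;
    }

    char _license[] SEC("license") = "GPL";

Because the context is writable, the same program could instead overwrite
the value in ``optval`` with the cap and return 1; rejecting simply makes
the policy visible to the application.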

BPF_CGROUP_GETSOCKOPT
=====================

``BPF_CGROUP_GETSOCKOPT`` is triggered *after* the kernel handles the
socket option. The BPF hook can observe ``optval``, ``optlen`` and
``retval`` if it is interested in what the kernel returned. It can
override these values, adjust ``optlen`` and reset ``retval`` to 0. If
``optlen`` has been increased above the initial ``getsockopt`` value
(i.e. the userspace buffer is too small), ``EFAULT`` is returned.
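
The following sketch shows the observe-and-override pattern for an option
the kernel already implements: when the kernel's
``getsockopt(SOL_SOCKET, SO_SNDBUF)`` succeeds, the program replaces the
reported value. ``REPORTED_SNDBUF`` and the function name are
hypothetical, and the empty ``SEC()`` fallback only exists so the code
also builds as plain C.

.. code-block:: c

    #include <linux/bpf.h>
    #include <sys/socket.h>

    /* SEC() normally comes from libbpf's <bpf/bpf_helpers.h>; this empty
     * fallback only lets the sketch compile as ordinary C for testing. */
    #ifndef SEC
    #define SEC(name)
    #endif

    /* Hypothetical value reported instead of the real send buffer size. */
    #define REPORTED_SNDBUF (64 * 1024)

    SEC("cgroup/getsockopt")
    int report_sndbuf(struct bpf_sockopt *ctx)
    {
        __u8 *optval = ctx->optval;
        __u8 *optval_end = ctx->optval_end;

        if (ctx->level != SOL_SOCKET || ctx->optname != SO_SNDBUF)
            return 1; /* keep the kernel's answer */

        /* Only rewrite a successful answer whose buffer holds a full int. */
        if (ctx->retval != 0 || optval + sizeof(int) > optval_end)
            return 1;

        *(int *)optval = REPORTED_SNDBUF;
        ctx->optlen = sizeof(int);
        return 1;
    }

    char _license[] SEC("license") = "GPL";

``optlen`` is set to ``sizeof(int)`` only after checking that the visible
buffer holds a full ``int``, so the sketch never grows ``optlen`` past the
caller's buffer and cannot trigger the ``EFAULT`` case.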

This hook has access to the cgroup and socket local storage.

Note that the only acceptable values for ``retval`` are 0 and the
original value that the kernel returned. Any other value will trigger
``EFAULT``.
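
A sketch of a program that answers an option level the kernel does not
implement. The kernel's own call fails, so the program writes a value,
shrinks ``optlen`` and resets ``retval`` to 0, one of the two values the
rule above permits. ``SOL_CUSTOM``, ``CUSTOM_OPT_VERSION``,
``CUSTOM_VERSION_VALUE`` and the function name are hypothetical.

.. code-block:: c

    #include <linux/bpf.h>

    /* SEC() normally comes from libbpf's <bpf/bpf_helpers.h>; this empty
     * fallback only lets the sketch compile as ordinary C for testing. */
    #ifndef SEC
    #define SEC(name)
    #endif

    /* Hypothetical level, option and value served entirely by BPF. */
    #define SOL_CUSTOM 0xdead
    #define CUSTOM_OPT_VERSION 1
    #define CUSTOM_VERSION_VALUE 2

    SEC("cgroup/getsockopt")
    int custom_getsockopt(struct bpf_sockopt *ctx)
    {
        __u8 *optval = ctx->optval;
        __u8 *optval_end = ctx->optval_end;

        if (ctx->level != SOL_CUSTOM || ctx->optname != CUSTOM_OPT_VERSION)
            return 1; /* keep whatever the kernel returned */

        /* No room for even one byte: leave the kernel's error in place. */
        if (optval + 1 > optval_end)
            return 1;

        optval[0] = CUSTOM_VERSION_VALUE;
        ctx->optlen = 1; /* shrinking optlen is allowed; growing it is not */
        ctx->retval = 0; /* replace the kernel's error with success */
        return 1;
    }

    char _license[] SEC("license") = "GPL";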

Return Type
-----------

* ``0`` - reject the syscall, ``EPERM`` will be returned to userspace.
* ``1`` - success: copy ``optval`` and ``optlen`` to userspace and return
  ``retval`` from the syscall (note that this can be overwritten by
  the BPF program from the parent cgroup).

Cgroup Inheritance
==================

Suppose there is the following cgroup hierarchy, where a
``BPF_CGROUP_GETSOCKOPT`` program is attached at each level with the
``BPF_F_ALLOW_MULTI`` flag::

  A (root, parent)
   \
    B (child)

When the application calls the ``getsockopt`` syscall from cgroup B,
the programs are executed from the bottom up: B, A. The first program
(B) sees the result of the kernel's ``getsockopt``. It can optionally
adjust ``optval`` and ``optlen`` and reset ``retval`` to 0. After that,
control is passed to the second program (A), which sees the same
context as B, including any potential modifications.

The same applies to ``BPF_CGROUP_SETSOCKOPT``: if programs are attached
to A and B, the trigger order is B, then A. If B changes any of the
input arguments (``level``, ``optname``, ``optval``, ``optlen``), then
the next program in the chain (A) will see those changes, *not* the
original ``setsockopt`` input arguments. The potentially modified
values are then passed down to the kernel.
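
Attaching a program at each level with ``BPF_F_ALLOW_MULTI`` is what
builds the chain described above. As a sketch, assuming the program is
already loaded and the cgroup directory is already open, the raw
``bpf(2)`` ``BPF_PROG_ATTACH`` command looks like this (the helper name
``attach_getsockopt_multi`` is hypothetical):

.. code-block:: c

    #define _GNU_SOURCE
    #include <linux/bpf.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Hypothetical helper: attach an already-loaded sockopt program to one
     * cgroup so it runs together with programs attached at other levels.
     * Returns 0 on success, -1 with errno set on failure. */
    static int attach_getsockopt_multi(int cgroup_fd, int prog_fd)
    {
        union bpf_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.target_fd = cgroup_fd;   /* open fd of the cgroup directory */
        attr.attach_bpf_fd = prog_fd; /* fd returned by BPF_PROG_LOAD */
        attr.attach_type = BPF_CGROUP_GETSOCKOPT;
        attr.attach_flags = BPF_F_ALLOW_MULTI;

        return (int)syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
    }

Calling it once for cgroup A and once for cgroup B, each time with that
level's program fd, yields the B-then-A order for sockets in B. libbpf
users would normally call
``bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_GETSOCKOPT, BPF_F_ALLOW_MULTI)``
instead of issuing the syscall directly.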

Large optval
============

When ``optval`` is larger than ``PAGE_SIZE``, the BPF program can
access only the first ``PAGE_SIZE`` bytes of that data, so it has two
options:

* Set ``optlen`` to zero, which indicates that the kernel should
  use the original buffer from userspace. Any modifications
  made by the BPF program to ``optval`` are ignored.
* Set ``optlen`` to a value less than ``PAGE_SIZE``, which
  indicates that the kernel should use BPF's trimmed ``optval``.

When the BPF program returns with ``optlen`` greater than
``PAGE_SIZE``, userspace receives an ``EFAULT`` error.
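
A minimal sketch of the first option for both hooks, assuming 4 KiB
pages. ``SKETCH_PAGE_SIZE`` is a hypothetical constant standing in for
the kernel's ``PAGE_SIZE``, and the function names are illustrative.

.. code-block:: c

    #include <linux/bpf.h>

    /* SEC() normally comes from libbpf's <bpf/bpf_helpers.h>; this empty
     * fallback only lets the sketch compile as ordinary C for testing. */
    #ifndef SEC
    #define SEC(name)
    #endif

    /* Assumption: 4 KiB pages. Adjust for kernels built with larger pages. */
    #define SKETCH_PAGE_SIZE 4096

    SEC("cgroup/getsockopt")
    int large_optval_getsockopt(struct bpf_sockopt *ctx)
    {
        /* Only the first page is visible to BPF: ask the kernel to use the
         * original userspace buffer instead of the truncated copy. */
        if (ctx->optlen > SKETCH_PAGE_SIZE)
            ctx->optlen = 0;
        return 1;
    }

    SEC("cgroup/setsockopt")
    int large_optval_setsockopt(struct bpf_sockopt *ctx)
    {
        /* Same rule on the setsockopt side. */
        if (ctx->optlen > SKETCH_PAGE_SIZE)
            ctx->optlen = 0;
        return 1;
    }

    char _license[] SEC("license") = "GPL";

A program that does want to change a large value would take the second
option instead: set ``optlen`` below the page size and let the kernel use
the trimmed ``optval``.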

Example
=======

See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
of a BPF program that handles socket options.