.. SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)

=====================
BPF sk_lookup program
=====================

The BPF sk_lookup program type (``BPF_PROG_TYPE_SK_LOOKUP``) introduces
programmability into the socket lookup performed by the transport layer when a
packet is to be delivered locally.

When invoked, a BPF sk_lookup program can select a socket that will receive the
incoming packet by calling the ``bpf_sk_assign()`` BPF helper function.

Hooks for a common attach point (``BPF_SK_LOOKUP``) exist for both TCP and UDP.

Motivation
==========

The BPF sk_lookup program type was introduced to address setups where binding
sockets to an address with the ``bind()`` socket call is impractical, such as:

1. receiving connections on a range of IP addresses, e.g. 192.0.2.0/24, when
   binding to the wildcard address ``INADDR_ANY`` is not possible due to a port
   conflict,
2. receiving connections on all or a wide range of ports, i.e. an L7 proxy use
   case.

Such setups would require creating and binding one socket for each IP
address/port pair in the range, leading to resource consumption and potential
latency spikes during socket lookup.

Attachment
==========

A BPF sk_lookup program can be attached to a network namespace with the
``bpf(BPF_LINK_CREATE, ...)`` syscall using the ``BPF_SK_LOOKUP`` attach type
and a netns FD as the attachment ``target_fd``.

Multiple programs can be attached to one network namespace. Programs will be
invoked in the same order as they were attached.
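
The attachment step described above can be sketched in user space with the raw
``bpf()`` syscall. This is a minimal sketch, not a reference implementation: the
helper name ``sk_lookup_attach()`` is hypothetical, and it assumes kernel UAPI
headers recent enough to declare ``BPF_LINK_CREATE`` and ``BPF_SK_LOOKUP``, and
that ``prog_fd`` refers to an already loaded sk_lookup program.

```c
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/*
 * Hypothetical helper (name is an assumption): attach an already loaded
 * sk_lookup program to the network namespace referred to by netns_fd.
 * Returns a BPF link FD on success, or -1 with errno set on failure.
 */
static int sk_lookup_attach(int prog_fd, int netns_fd)
{
	union bpf_attr attr;

	/* The kernel expects unused bytes of the attribute to be zero. */
	memset(&attr, 0, sizeof(attr));
	attr.link_create.prog_fd = prog_fd;
	attr.link_create.target_fd = netns_fd;	/* netns FD as target_fd */
	attr.link_create.attach_type = BPF_SK_LOOKUP;

	return (int)syscall(SYS_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
}
```

A netns FD for the current namespace can typically be obtained by opening
``/proc/self/ns/net``. Calling the helper once per program attaches several
programs to the same namespace, in the order of the calls.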

Hooks
=====

The attached BPF sk_lookup programs run whenever the transport layer needs to
find a listening (TCP) or an unconnected (UDP) socket for an incoming packet.

Incoming traffic to established (TCP) and connected (UDP) sockets is delivered
as usual without triggering the BPF sk_lookup hook.

The attached BPF programs must return either an ``SK_PASS`` or an ``SK_DROP``
verdict code. As for other BPF program types that are network filters,
``SK_PASS`` signifies that the socket lookup should continue on to the regular
hashtable-based lookup, while ``SK_DROP`` causes the transport layer to drop the
packet.

A BPF sk_lookup program can also select a socket to receive the packet by
calling the ``bpf_sk_assign()`` BPF helper. Typically, the program looks up a
socket in a map holding sockets, such as ``SOCKMAP`` or ``SOCKHASH``, and passes
a ``struct bpf_sock *`` to the ``bpf_sk_assign()`` helper to record the
selection. Selecting a socket only takes effect if the program has terminated
with the ``SK_PASS`` code.

When multiple programs are attached, the end result is determined from the
return codes of all the programs according to the following rules:

1. If any program returned ``SK_PASS`` and selected a valid socket, the socket
   is used as the result of the socket lookup.
2. If more than one program returned ``SK_PASS`` and selected a socket, the last
   selection takes effect.
3. If any program returned ``SK_DROP``, and no program returned ``SK_PASS`` and
   selected a socket, socket lookup fails.
4. If all programs returned ``SK_PASS`` and none of them selected a socket,
   socket lookup continues on.
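
The four rules above can be modelled in plain user-space C. The sketch below is
only an illustration of the rules as stated here, not the kernel's code. All
names in it (``struct prog_result``, ``combine_results()``, ``NO_SOCKET`` and the
enums) are hypothetical, and sockets are represented by integer IDs.

```c
#include <stddef.h>

/* Model-only constants; real verdicts are SK_DROP/SK_PASS from linux/bpf.h. */
enum verdict { VERDICT_DROP, VERDICT_PASS };

#define NO_SOCKET (-1)

/* Hypothetical record of what one attached program did. */
struct prog_result {
	enum verdict verdict;
	int selected;	/* socket ID picked via bpf_sk_assign(), or NO_SOCKET */
};

enum lookup_outcome {
	LOOKUP_USE_SELECTED,	/* rules 1 and 2 */
	LOOKUP_FAIL,		/* rule 3 */
	LOOKUP_CONTINUE,	/* rule 4 */
};

/* Combine per-program results, given in attach order, into one outcome. */
static enum lookup_outcome combine_results(const struct prog_result *res,
					   size_t n, int *selected)
{
	int all_pass = 1;
	size_t i;

	*selected = NO_SOCKET;
	for (i = 0; i < n; i++) {
		if (res[i].verdict == VERDICT_PASS &&
		    res[i].selected != NO_SOCKET)
			*selected = res[i].selected;	/* last selection wins */
		else if (res[i].verdict == VERDICT_DROP)
			all_pass = 0;	/* a selection made here is ignored */
	}

	if (*selected != NO_SOCKET)
		return LOOKUP_USE_SELECTED;
	return all_pass ? LOOKUP_CONTINUE : LOOKUP_FAIL;
}
```

Note that in this model a selection made by a program that returns the drop
verdict is discarded, matching the statement that a selection only takes effect
when the program terminates with ``SK_PASS``.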

API
===

In its context, an instance of ``struct bpf_sk_lookup``, a BPF sk_lookup program
receives information about the packet that triggered the socket lookup. Namely:

* IP version (``AF_INET`` or ``AF_INET6``),
* L4 protocol identifier (``IPPROTO_TCP`` or ``IPPROTO_UDP``),
* source and destination IP address,
* source and destination L4 port,
* the socket that has been selected with ``bpf_sk_assign()``.

Refer to the ``struct bpf_sk_lookup`` declaration in the ``linux/bpf.h`` user
API header, and the `bpf-helpers(7)
<https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man-page section
for ``bpf_sk_assign()`` for details.
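
To illustrate how the context fields might be used, the sketch below shows a
matching predicate for the first scenario from the Motivation section: TCP over
IPv4 addressed to 192.0.2.0/24, on any port. It is written as ordinary
user-space C so it can be compiled and exercised outside the kernel. The
function name ``wants_packet()`` is hypothetical, and the sketch assumes, as the
comments in ``linux/bpf.h`` indicate, that ``local_ip4`` is stored in network
byte order.

```c
#include <stdbool.h>
#include <arpa/inet.h>		/* htonl() */
#include <netinet/in.h>		/* IPPROTO_TCP */
#include <sys/socket.h>		/* AF_INET */
#include <linux/bpf.h>		/* struct bpf_sk_lookup */

/*
 * Hypothetical predicate (name is an assumption): returns true when the
 * lookup was triggered by a TCP/IPv4 packet destined to 192.0.2.0/24.
 * The destination port is deliberately not checked.
 */
static bool wants_packet(const struct bpf_sk_lookup *ctx)
{
	const __u32 net = htonl(0xc0000200);	/* 192.0.2.0 */
	const __u32 mask = htonl(0xffffff00);	/* /24 prefix */

	if (ctx->family != AF_INET || ctx->protocol != IPPROTO_TCP)
		return false;

	/* local_ip4 is the destination address, in network byte order. */
	return (ctx->local_ip4 & mask) == net;
}
```

Inside an actual BPF program the same checks would use ``bpf_htonl()`` from
libbpf's ``bpf_endian.h`` in place of ``htonl()``. A match would then be followed
by a socket map lookup and a call to ``bpf_sk_assign()``.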

Example
=======

See ``tools/testing/selftests/bpf/prog_tests/sk_lookup.c`` for the reference
implementation.