.. SPDX-License-Identifier: GPL-2.0

.. _kfuncs-header-label:

=============================
BPF Kernel Functions (kfuncs)
=============================

1. Introduction
===============

BPF Kernel Functions, more commonly known as kfuncs, are functions in the Linux
kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
kfuncs do not have a stable interface and can change from one kernel release to
another. Hence, BPF programs need to be updated in response to changes in the
kernel.

2. Defining a kfunc
===================

There are two ways to expose a kernel function to BPF programs: either make an
existing function in the kernel visible, or add a new wrapper for BPF. In both
cases, care must be taken that BPF programs can only call such functions in a
valid context. To enforce this, the visibility of a kfunc can be per program
type.

If you are not creating a BPF wrapper for an existing kernel function, skip
ahead to :ref:`BPF_kfunc_nodef`.

2.1 Creating a wrapper kfunc
----------------------------

When defining a wrapper kfunc, the wrapper function should have extern linkage.
This prevents the compiler from optimizing away dead code, as this wrapper kfunc
is not invoked anywhere in the kernel itself. It is not necessary to provide a
prototype in a header for the wrapper kfunc.

An example is given below::

        /* Disables missing prototype warnings */
        __diag_push();
        __diag_ignore_all("-Wmissing-prototypes",
                          "Global kfuncs as their definitions will be in BTF");

        __bpf_kfunc struct task_struct *bpf_find_get_task_by_vpid(pid_t nr)
        {
                return find_get_task_by_vpid(nr);
        }

        __diag_pop();

A wrapper kfunc is often needed when we need to annotate parameters of the
kfunc. Otherwise one may directly make the kfunc visible to the BPF program by
registering it with the BPF subsystem. See :ref:`BPF_kfunc_nodef`.

2.2 Annotating kfunc parameters
-------------------------------

Similar to BPF helpers, there is sometimes a need for additional context
required by the verifier to make the usage of kernel functions safer and more
useful. Hence, we can annotate a parameter by suffixing the name of the
argument of the kfunc with a __tag, where tag may be one of the supported
annotations.

2.2.1 __sz Annotation
---------------------

This annotation is used to indicate a memory and size pair in the argument list.
An example is given below::

        __bpf_kfunc void bpf_memzero(void *mem, int mem__sz)
        {
        ...
        }

Here, the verifier will treat the first argument as a PTR_TO_MEM, and the
second argument as its size. By default, without the __sz annotation, the size
of the type of the pointer is used. Without the __sz annotation, a kfunc cannot
accept a void pointer.

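From the BPF program side, the verifier then requires that the size argument
bound the memory passed in the paired pointer argument. A minimal sketch of a
caller, assuming the illustrative bpf_memzero() kfunc above has been registered
for tracing programs:

.. code-block:: c

        SEC("tp_btf/task_newtask")
        int BPF_PROG(memzero_example, struct task_struct *task, u64 clone_flags)
        {
                char buf[16];

                /* The verifier checks that buf is readable/writable for at
                 * least the number of bytes passed as mem__sz.
                 */
                bpf_memzero(buf, sizeof(buf));
                return 0;
        }
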
2.2.2 __k Annotation
--------------------

This annotation is only understood for scalar arguments. It indicates that the
verifier must check that the scalar argument is a known constant which is not a
size parameter, and whose value is relevant to the safety of the program.

An example is given below::

        __bpf_kfunc void *bpf_obj_new(u32 local_type_id__k, ...)
        {
        ...
        }

Here, bpf_obj_new uses the local_type_id argument to look up the size of that
type ID in the program's BTF and returns a sized pointer to it. Each type ID
will have a distinct size, hence it is crucial to treat each such call as
distinct when values don't match during verifier state pruning checks.

Hence, whenever a constant scalar argument is accepted by a kfunc which is not
a size parameter, and the value of the constant matters for program safety, the
__k suffix should be used.

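From the calling side, this means the argument must be something the verifier
can prove to be a constant at verification time. A hedged sketch against the
bpf_obj_new() definition above (trailing arguments elided;
``bpf_core_type_id_local()`` is the libbpf helper that yields a constant local
BTF type ID):

.. code-block:: c

        struct foo {
                int data;
        };

        /* Accepted: the type ID is a known constant during verification. */
        struct foo *f = bpf_obj_new(bpf_core_type_id_local(struct foo));

        /* By contrast, a type ID read from a map value or computed at
         * runtime would be rejected, since the verifier could not
         * determine the size of the object being allocated.
         */
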
.. _BPF_kfunc_nodef:

2.3 Using an existing kernel function
-------------------------------------

When an existing function in the kernel is fit for consumption by BPF programs,
it can be directly registered with the BPF subsystem. However, care must still
be taken to review the context in which it will be invoked by the BPF program
and whether it is safe to do so.

2.4 Annotating kfuncs
---------------------

In addition to kfuncs' arguments, the verifier may need more information about
the type of kfunc(s) being registered with the BPF subsystem. To do so, we
define flags on a set of kfuncs as follows::

        BTF_SET8_START(bpf_task_set)
        BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
        BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
        BTF_SET8_END(bpf_task_set)

This set encodes the BTF ID of each kfunc listed above, and encodes the flags
along with it. Of course, it is also allowed to specify no flags.

kfunc definitions should also always be annotated with the ``__bpf_kfunc``
macro. This prevents issues such as the compiler inlining the kfunc if it's a
static kernel function, or the function being elided in an LTO build as it's
not used in the rest of the kernel. Developers should not manually add
annotations to their kfunc to prevent these issues. If an annotation is
required to prevent such an issue with your kfunc, it is a bug and should be
added to the definition of the macro so that other kfuncs are similarly
protected. An example is given below::

        __bpf_kfunc struct task_struct *bpf_get_task_pid(s32 pid)
        {
        ...
        }

2.4.1 KF_ACQUIRE flag
---------------------

The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a
refcounted object. The verifier will then ensure that the pointer to the object
is eventually released using a release kfunc, or transferred to a map using a
referenced kptr (by invoking bpf_kptr_xchg). If not, the verifier fails the
loading of the BPF program until no lingering references remain in all possible
explored states of the program.

2.4.2 KF_RET_NULL flag
----------------------

The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc
may be NULL. Hence, it forces the user to do a NULL check on the pointer
returned from the kfunc before making use of it (dereferencing or passing it to
another helper). This flag is often paired with the KF_ACQUIRE flag, but the
two are orthogonal to each other.

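For example, a program calling an acquire kfunc registered with KF_RET_NULL,
such as bpf_task_from_pid() (covered later in this document), must check the
return value before using it; a sketch:

.. code-block:: c

        struct task_struct *task;

        task = bpf_task_from_pid(pid);
        if (!task)
                /* Dereferencing task without this NULL check would be
                 * rejected by the verifier.
                 */
                return -ENOENT;

        bpf_task_release(task);
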
2.4.3 KF_RELEASE flag
---------------------

The KF_RELEASE flag is used to indicate that the kfunc releases the pointer
passed in to it. Only one referenced pointer can be passed in. All copies of
the pointer being released are invalidated as a result of invoking a kfunc
with this flag.

2.4.4 KF_KPTR_GET flag
----------------------

The KF_KPTR_GET flag is used to indicate that the kfunc takes its first
argument as a pointer to a kptr, safely increments the refcount of the object
it points to, and returns a reference to the user. The rest of the arguments
may be normal arguments of a kfunc. The KF_KPTR_GET flag should be used in
conjunction with the KF_ACQUIRE and KF_RET_NULL flags.

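For example, a kptr-get kfunc such as bpf_cgroup_kptr_get() (described later in
this document) would typically be registered with all three flags combined:

.. code-block:: c

        BTF_ID_FLAGS(func, bpf_cgroup_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)

KF_RET_NULL is needed because the kptr may have been removed from the map by
another CPU between the lookup and the get, in which case the kfunc returns
NULL.
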
2.4.5 KF_TRUSTED_ARGS flag
--------------------------

The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
indicates that all pointer arguments are valid, and that all pointers to
BTF objects have been passed in their unmodified form (that is, at a zero
offset, and without having been obtained from walking another pointer, with one
exception described below).

There are two types of pointers to kernel objects which are considered "valid":

1. Pointers which are passed as tracepoint or struct_ops callback arguments.
2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.

Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.

The definition of "valid" pointers is subject to change at any time, and has
absolutely no ABI stability guarantees.

As mentioned above, a nested pointer obtained from walking a trusted pointer is
no longer trusted, with one exception. If a struct type has a field that is
guaranteed to be valid as long as its parent pointer is trusted, the
``BTF_TYPE_SAFE_NESTED`` macro can be used to express that to the verifier as
follows:

.. code-block:: c

        BTF_TYPE_SAFE_NESTED(struct task_struct) {
                const cpumask_t *cpus_ptr;
        };

In other words, you must:

1. Wrap the trusted pointer type in the ``BTF_TYPE_SAFE_NESTED`` macro.

2. Specify the type and name of the trusted nested field. This field must match
   the field in the original type definition exactly.

2.4.6 KF_SLEEPABLE flag
-----------------------

The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
be called by sleepable BPF programs (BPF_F_SLEEPABLE).

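For example, a kfunc that can block while copying from user memory would be
registered with this flag, and could then only be called from a sleepable
program (note the ``.s`` suffix in the section name; ``bpf_fetch_user_data``
is a hypothetical kfunc used purely for illustration):

.. code-block:: c

        /* Kernel side: mark the kfunc as sleepable when registering it. */
        BTF_ID_FLAGS(func, bpf_fetch_user_data, KF_SLEEPABLE)

        /* BPF side: only a sleepable program may call it. */
        SEC("lsm.s/file_open")
        int BPF_PROG(sleepable_example, struct file *file)
        {
        ...
        }
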
2.4.7 KF_DESTRUCTIVE flag
-------------------------

The KF_DESTRUCTIVE flag is used to indicate kfuncs whose invocation is
destructive to the system. For example, such a call can result in the system
rebooting or panicking. Due to this, additional restrictions apply to these
calls. At the moment they only require the CAP_SYS_BOOT capability, but more
can be added later.

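For instance, a kfunc wrapping crash_kexec() reboots into a crash kernel when
invoked, so it would be registered with this flag (a sketch):

.. code-block:: c

        BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
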
2.4.8 KF_RCU flag
-----------------

The KF_RCU flag is used for kfuncs that take an RCU pointer as an argument.
When used together with KF_ACQUIRE, it indicates that the kfunc should have a
single argument which must be a trusted argument or a MEM_RCU pointer. The
argument may have a reference count of 0, and the kfunc must take this into
consideration.

2.5 Registering the kfuncs
--------------------------

Once the kfunc is prepared for use, the final step to making it visible is
registering it with the BPF subsystem. Registration is done per BPF program
type. An example is shown below::

        BTF_SET8_START(bpf_task_set)
        BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
        BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
        BTF_SET8_END(bpf_task_set)

        static const struct btf_kfunc_id_set bpf_task_kfunc_set = {
                .owner = THIS_MODULE,
                .set   = &bpf_task_set,
        };

        static int init_subsystem(void)
        {
                return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set);
        }
        late_initcall(init_subsystem);

2.6 Specifying no-cast aliases with ___init
-------------------------------------------

The verifier will always enforce that the BTF type of a pointer passed to a
kfunc by a BPF program matches the type of pointer specified in the kfunc
definition. The verifier does, however, allow types that are equivalent
according to the C standard to be passed to the same kfunc arg, even if their
BTF_IDs differ.

For example, for the following type definition:

.. code-block:: c

        struct bpf_cpumask {
                cpumask_t cpumask;
                refcount_t usage;
        };

The verifier would allow a ``struct bpf_cpumask *`` to be passed to a kfunc
taking a ``cpumask_t *`` (which is a typedef of ``struct cpumask``). For
instance, both ``struct cpumask *`` and ``struct bpf_cpumask *`` can be passed
to bpf_cpumask_test_cpu().

In some cases, this type-aliasing behavior is not desired. ``struct
nf_conn___init`` is one such example:

.. code-block:: c

        struct nf_conn___init {
                struct nf_conn ct;
        };

The C standard would consider these types to be equivalent, but it would not
always be safe to pass either type to a trusted kfunc. ``struct
nf_conn___init`` represents an allocated ``struct nf_conn`` object that has
*not yet been initialized*, so it would therefore be unsafe to pass a ``struct
nf_conn___init *`` to a kfunc that's expecting a fully initialized ``struct
nf_conn *`` (e.g. ``bpf_ct_change_timeout()``).

In order to accommodate such requirements, the verifier will enforce strict
PTR_TO_BTF_ID type matching if two types have the exact same name, with one
being suffixed with ``___init``.

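The conntrack kfuncs illustrate this convention: an allocation kfunc returns
the ``___init`` type, and only the insert kfunc, which performs initialization,
converts it into the plain type (signatures abridged here for illustration):

.. code-block:: c

        /* Returns an allocated but not yet initialized entry. */
        struct nf_conn___init *bpf_skb_ct_alloc(...);

        /* Consumes the uninitialized entry and returns an initialized one. */
        struct nf_conn *bpf_ct_insert_entry(struct nf_conn___init *nfc);
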
3. Core kfuncs
==============

The BPF subsystem provides a number of "core" kfuncs that are potentially
applicable to a wide variety of different possible use cases and programs.
Those kfuncs are documented here.

3.1 struct task_struct * kfuncs
-------------------------------

There are a number of kfuncs that allow ``struct task_struct *`` objects to be
used as kptrs:

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_task_acquire bpf_task_release

These kfuncs are useful when you want to acquire or release a reference to a
``struct task_struct *`` that was passed as e.g. a tracepoint arg, or a
struct_ops callback arg. For example:

.. code-block:: c

        /**
         * A trivial example tracepoint program that shows how to
         * acquire and release a struct task_struct * pointer.
         */
        SEC("tp_btf/task_newtask")
        int BPF_PROG(task_acquire_release_example, struct task_struct *task, u64 clone_flags)
        {
                struct task_struct *acquired;

                acquired = bpf_task_acquire(task);

                /*
                 * In a typical program you'd do something like store
                 * the task in a map, and the map will automatically
                 * release it later. Here, we release it manually.
                 */
                bpf_task_release(acquired);
                return 0;
        }

----

A BPF program can also look up a task from a pid. This can be useful if the
caller doesn't have a trusted pointer to a ``struct task_struct *`` object that
it can acquire a reference on with bpf_task_acquire().

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_task_from_pid

Here is an example of it being used:

.. code-block:: c

        SEC("tp_btf/task_newtask")
        int BPF_PROG(task_get_pid_example, struct task_struct *task, u64 clone_flags)
        {
                struct task_struct *lookup;

                lookup = bpf_task_from_pid(task->pid);
                if (!lookup)
                        /* A task should always be found, as %task is a tracepoint arg. */
                        return -ENOENT;

                if (lookup->pid != task->pid) {
                        /* bpf_task_from_pid() looks up the task via its
                         * globally-unique pid from the init_pid_ns. Thus,
                         * the pid of the lookup task should always be the
                         * same as the input task.
                         */
                        bpf_task_release(lookup);
                        return -EINVAL;
                }

                /* bpf_task_from_pid() returns an acquired reference,
                 * so it must be dropped before returning from the
                 * tracepoint handler.
                 */
                bpf_task_release(lookup);
                return 0;
        }

3.2 struct cgroup * kfuncs
--------------------------

``struct cgroup *`` objects also have acquire and release functions:

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_acquire bpf_cgroup_release

These kfuncs are used in exactly the same manner as bpf_task_acquire() and
bpf_task_release() respectively, so we won't provide examples for them.

----

You may also acquire a reference to a ``struct cgroup`` kptr that's already
stored in a map using bpf_cgroup_kptr_get():

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_kptr_get

Here's an example of how it can be used:

.. code-block:: c

        /* struct containing the struct cgroup kptr which is actually stored in the map. */
        struct __cgroups_kfunc_map_value {
                struct cgroup __kptr_ref *cgroup;
        };

        /* The map containing struct __cgroups_kfunc_map_value entries. */
        struct {
                __uint(type, BPF_MAP_TYPE_HASH);
                __type(key, int);
                __type(value, struct __cgroups_kfunc_map_value);
                __uint(max_entries, 1);
        } __cgroups_kfunc_map SEC(".maps");

        /* ... */

        /**
         * A simple example tracepoint program showing how a
         * struct cgroup kptr that is stored in a map can
         * be acquired using the bpf_cgroup_kptr_get() kfunc.
         */
        SEC("tp_btf/cgroup_mkdir")
        int BPF_PROG(cgroup_kptr_get_example, struct cgroup *cgrp, const char *path)
        {
                struct cgroup *kptr;
                struct __cgroups_kfunc_map_value *v;
                s32 id = cgrp->self.id;

                /* Assume a cgroup kptr was previously stored in the map. */
                v = bpf_map_lookup_elem(&__cgroups_kfunc_map, &id);
                if (!v)
                        return -ENOENT;

                /* Acquire a reference to the cgroup kptr that's already stored in the map. */
                kptr = bpf_cgroup_kptr_get(&v->cgroup);
                if (!kptr)
                        /* If no cgroup was present in the map, it's because
                         * we're racing with another CPU that removed it with
                         * bpf_kptr_xchg() between the bpf_map_lookup_elem()
                         * above, and our call to bpf_cgroup_kptr_get().
                         * bpf_cgroup_kptr_get() internally safely handles this
                         * race, and will return NULL if the cgroup is no
                         * longer present in the map by the time we invoke the
                         * kfunc.
                         */
                        return -EBUSY;

                /* Free the reference we just took above. Note that the
                 * original struct cgroup kptr is still in the map. It will
                 * be freed either at a later time if another context deletes
                 * it from the map, or automatically by the BPF subsystem if
                 * it's still present when the map is destroyed.
                 */
                bpf_cgroup_release(kptr);

                return 0;
        }

----

Another kfunc available for interacting with ``struct cgroup *`` objects is
bpf_cgroup_ancestor(). This allows callers to access the ancestor of a cgroup,
returning it as a cgroup kptr.

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_ancestor

Eventually, BPF should be updated to allow this to happen with a normal memory
load in the program itself. This is currently not possible without more work in
the verifier. bpf_cgroup_ancestor() can be used as follows:

.. code-block:: c

        /**
         * Simple tracepoint example that illustrates how a cgroup's
         * ancestor can be accessed using bpf_cgroup_ancestor().
         */
        SEC("tp_btf/cgroup_mkdir")
        int BPF_PROG(cgrp_ancestor_example, struct cgroup *cgrp, const char *path)
        {
                struct cgroup *parent;

                /* The parent cgroup resides at the level before the current cgroup's level. */
                parent = bpf_cgroup_ancestor(cgrp, cgrp->level - 1);
                if (!parent)
                        return -ENOENT;

                bpf_printk("Parent id is %d", parent->self.id);

                /* Return the parent cgroup that was acquired above. */
                bpf_cgroup_release(parent);
                return 0;
        }

3.3 struct cpumask * kfuncs
---------------------------

BPF provides a set of kfuncs that can be used to query, allocate, mutate, and
destroy ``struct cpumask *`` objects. Please refer to
:ref:`cpumasks-header-label` for more details.