.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2022, Google LLC.

===================================
The Kernel Memory Sanitizer (KMSAN)
===================================

KMSAN is a dynamic error detector aimed at finding uses of uninitialized
values. It is based on compiler instrumentation, and is quite similar to the
userspace `MemorySanitizer tool`_.

KMSAN is not intended for production use, because it drastically increases the
kernel memory footprint and slows the whole system down.

Usage
=====

Building the kernel
-------------------

To build a kernel with KMSAN you need a recent Clang (14.0.6 or newer).
Please refer to `LLVM documentation`_ for the instructions on how to build Clang.

Then configure and build the kernel with ``CONFIG_KMSAN`` enabled.

Example report
--------------

Here is an example of a KMSAN report::

  =====================================================
  BUG: KMSAN: uninit-value in test_uninit_kmsan_check_memory+0x1be/0x380 [kmsan_test]
   test_uninit_kmsan_check_memory+0x1be/0x380 mm/kmsan/kmsan_test.c:273
   kunit_run_case_internal lib/kunit/test.c:333
   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
   kthread+0x721/0x850 kernel/kthread.c:327
   ret_from_fork+0x1f/0x30 ??:?

  Uninit was stored to memory at:
   do_uninit_local_array+0xfa/0x110 mm/kmsan/kmsan_test.c:260
   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
   kunit_run_case_internal lib/kunit/test.c:333
   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
   kthread+0x721/0x850 kernel/kthread.c:327
   ret_from_fork+0x1f/0x30 ??:?

  Local variable uninit created at:
   do_uninit_local_array+0x4a/0x110 mm/kmsan/kmsan_test.c:256
   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271

  Bytes 4-7 of 8 are uninitialized
  Memory access of size 8 starts at ffff888083fe3da0

  CPU: 0 PID: 6731 Comm: kunit_try_catch Tainted: G    B       E     5.16.0-rc3+ #104
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
  =====================================================

The report says that the local variable ``uninit`` was created uninitialized in
``do_uninit_local_array()``. The third stack trace corresponds to the place
where this variable was created.

The first stack trace shows where the uninit value was used (in
``test_uninit_kmsan_check_memory()``). The tool shows the bytes which were left
uninitialized in the local variable, as well as the stack where the value was
copied to another memory location before use.

A use of an uninitialized value ``v`` is reported by KMSAN in the following
cases:

 - in a condition, e.g. ``if (v) { ... }``;
 - in indexing or pointer dereferencing, e.g. ``array[v]`` or ``*v``;
 - when it is copied to userspace or hardware, e.g. ``copy_to_user(..., &v, ...)``;
 - when it is passed as an argument to a function, and
   ``CONFIG_KMSAN_CHECK_PARAM_RETVAL`` is enabled (see below).

The mentioned cases (apart from copying data to userspace or hardware, which is
a security issue) are considered undefined behavior from the C11 Standard point
of view.

Disabling the instrumentation
-----------------------------

A function can be marked with ``__no_kmsan_checks``. Doing so makes KMSAN
ignore uninitialized values in that function and mark its output as initialized.
As a result, the user will not get KMSAN reports related to that function.

Another function attribute supported by KMSAN is ``__no_sanitize_memory``.
Applying this attribute to a function will result in KMSAN not instrumenting
it, which can be helpful if we do not want the compiler to interfere with some
low-level code (e.g. that marked with ``noinstr`` which implicitly adds
``__no_sanitize_memory``).

This however comes at a cost: stack allocations from such functions will have
incorrect shadow/origin values, likely leading to false positives. Functions
called from non-instrumented code may also receive incorrect metadata for their
parameters.

As a rule of thumb, avoid using ``__no_sanitize_memory`` explicitly.

It is also possible to disable KMSAN for a single file (e.g. main.o)::

  KMSAN_SANITIZE_main.o := n

or for the whole directory::

  KMSAN_SANITIZE := n

in the Makefile. Think of this as applying ``__no_sanitize_memory`` to every
function in the file or directory. Most users won't need ``KMSAN_SANITIZE``,
unless their code gets broken by KMSAN (e.g. it runs at early boot time).
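
As an illustration of the function attribute described above, the sketch below
marks a helper with ``__no_kmsan_checks``. The helper ``toy_checksum()`` is
hypothetical, and the fallback ``#define`` exists only so that the snippet also
compiles outside the kernel tree, where the attribute is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Outside the kernel the attribute does not exist, so define it away. */
#ifndef __no_kmsan_checks
#define __no_kmsan_checks
#endif

/*
 * Hypothetical helper that sums raw bytes, possibly including padding.
 * With __no_kmsan_checks, KMSAN would not report uninitialized inputs
 * here and would treat the returned value as initialized.
 */
__no_kmsan_checks
uint32_t toy_checksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	assert(buf || len == 0);
	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}
```

Because the output is marked as initialized, a bug in the caller that passes
garbage into such a function is hidden from KMSAN, so the attribute is best
kept to functions that are known to read uninitialized bytes on purpose.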

Support
=======

In order for KMSAN to work the kernel must be built with Clang, which so far is
the only compiler that has KMSAN support. The kernel instrumentation pass is
based on the userspace `MemorySanitizer tool`_.

The runtime library only supports x86_64 at the moment.

How KMSAN works
===============

KMSAN shadow memory
-------------------

KMSAN associates a metadata byte (also called shadow byte) with every byte of
kernel memory. A bit in the shadow byte is set iff the corresponding bit of the
kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
setting its shadow bytes to ``0xff``) is called poisoning; marking it
initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.

When a new variable is allocated on the stack, it is poisoned by default by
instrumentation code inserted by the compiler (unless it is a stack variable
that is immediately initialized). Any new heap allocation done without
``__GFP_ZERO`` is also poisoned.

Compiler instrumentation also tracks the shadow values as they are used along
the code. When needed, instrumentation code invokes the runtime library in
``mm/kmsan/`` to persist shadow values.

The shadow value of a basic or compound type is an array of bytes of the same
length. When a constant value is written into memory, that memory is unpoisoned.
When a value is read from memory, its shadow memory is also obtained and
propagated into all the operations which use that value. For every instruction
that takes one or more values the compiler generates code that calculates the
shadow of the result depending on those values and their shadows.

Example::

  int a = 0xff;  // i.e. 0x000000ff
  int b;
  int c = a | b;

In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
``c`` are uninitialized, while the lower byte is initialized.
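
The shadow arithmetic of this example can be reproduced in a small userspace
model. This is only a sketch: ``toy_shadow_or()`` and ``toy_shadow_of_c()`` are
hypothetical helpers, and the formula is the usual bit-precise rule for bitwise
OR (a result bit is initialized if both input bits are initialized, or if one
input has an initialized ``1`` in that position), not code taken from the
compiler.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of shadow propagation for "c = a | b".
 * sa and sb are the shadows of a and b: a set bit means "uninitialized".
 */
uint32_t toy_shadow_or(uint32_t a, uint32_t sa, uint32_t b, uint32_t sb)
{
	return (sa & sb) |   /* poisoned in both inputs */
	       (~a & sb) |   /* b poisoned, a does not force a 1 here */
	       (sa & ~b);    /* a poisoned, b does not force a 1 here */
}

/* Shadow of c in the example above, for an arbitrary runtime value of b. */
uint32_t toy_shadow_of_c(uint32_t b_value)
{
	uint32_t sc = toy_shadow_or(0xff, 0x0, b_value, 0xffffffffu);

	/* The low byte of a is all ones, so the low byte of c is initialized. */
	assert((sc & 0xff) == 0);
	return sc;
}
```

Whatever garbage ``b`` happens to hold, ``toy_shadow_of_c()`` yields
``0xffffff00``, matching the text above.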

Origin tracking
---------------

Every four bytes of kernel memory also have a so-called origin mapped to them.
This origin describes the point in program execution at which the uninitialized
value was created. Every origin is associated with either the full allocation
stack (for heap-allocated memory), or the function containing the uninitialized
variable (for locals).

When an uninitialized variable is allocated on stack or heap, a new origin
value is created, and that variable's origin is filled with that value. When a
value is read from memory, its origin is also read and kept together with the
shadow. For every instruction that takes one or more values, the origin of the
result is one of the origins corresponding to any of the uninitialized inputs.
If a poisoned value is written into memory, its origin is written to the
corresponding storage as well.

Example 1::

  int a = 42;
  int b;
  int c = a + b;

In this case the origin of ``b`` is generated upon function entry, and is
stored to the origin of ``c`` right before the addition result is written into
memory.
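
A rough userspace model of how an origin travels with a value through a binary
operation is sketched below. ``struct toy_val`` and ``toy_add()`` are
hypothetical names, the shadow of the sum is approximated as the OR of the
input shadows, and preferring the second operand's origin when both inputs are
poisoned is an assumption of this model.

```c
#include <assert.h>
#include <stdint.h>

/* A value together with its shadow and origin, as tracked by this toy model. */
struct toy_val {
	uint32_t value;
	uint32_t shadow;   /* set bits are uninitialized */
	uint32_t origin;   /* meaningful only when shadow != 0 */
};

struct toy_val toy_add(struct toy_val a, struct toy_val b)
{
	struct toy_val c;

	/* This model expects every poisoned input to carry an origin. */
	assert(!a.shadow || a.origin);
	assert(!b.shadow || b.origin);

	c.value = a.value + b.value;
	/* Coarse approximation: OR the input shadows together. */
	c.shadow = a.shadow | b.shadow;
	/* Take the origin of one of the poisoned inputs, if there is any. */
	c.origin = b.shadow ? b.origin : (a.shadow ? a.origin : 0);
	return c;
}
```

With ``a`` initialized and ``b`` poisoned, the result carries the shadow and
the origin of ``b``, as in Example 1.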

Several variables may share the same origin address, if they are stored in the
same four-byte chunk. In this case every write to either variable updates the
origin for all of them. We have to sacrifice precision in this case, because
storing origins for individual bits (and even bytes) would be too costly.

Example 2::

  int combine(short a, short b) {
    union ret_t {
      int i;
      short s[2];
    } ret;
    ret.s[0] = a;
    ret.s[1] = b;
    return ret.i;
  }

If ``a`` is initialized and ``b`` is not, the shadow of the result would be
``0xffff0000``, and the origin of the result would be the origin of ``b``.
``ret.s[0]`` would have the same origin, but it will never be used, because
that variable is initialized.

If both function arguments are uninitialized, only the origin of the second
argument is preserved.
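
The behaviour of Example 2 can be mimicked with a toy model of a single
four-byte chunk that has per-byte shadow but only one origin slot. All names
below are hypothetical. The model assumes a little-endian layout, and it
assumes that only poisoned stores update the origin slot.

```c
#include <assert.h>
#include <stdint.h>

/* One 4-byte chunk: per-byte shadow plus a single shared origin slot. */
struct toy_chunk {
	uint8_t shadow[4];
	uint32_t origin;
};

/* Model a 2-byte store at byte offset off (0 or 2) within the chunk. */
void toy_store_short(struct toy_chunk *m, unsigned off, uint16_t shadow,
		     uint32_t origin)
{
	assert(off == 0 || off == 2);
	m->shadow[off] = shadow & 0xff;
	m->shadow[off + 1] = shadow >> 8;
	/* Assumption: only a poisoned store overwrites the shared origin. */
	if (shadow)
		m->origin = origin;
}

/* Shadow of the whole chunk, read as a little-endian 32-bit value. */
uint32_t toy_chunk_shadow(const struct toy_chunk *m)
{
	return (uint32_t)m->shadow[0] | (uint32_t)m->shadow[1] << 8 |
	       (uint32_t)m->shadow[2] << 16 | (uint32_t)m->shadow[3] << 24;
}
```

Storing a clean ``a`` and then a poisoned ``b`` gives shadow ``0xffff0000``
with the origin of ``b``. Storing two poisoned values leaves only the origin
of the second store in the slot.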

Origin chaining
~~~~~~~~~~~~~~~

To ease debugging, KMSAN creates a new origin for every store of an
uninitialized value to memory. The new origin references both its creation stack
and the previous origin the value had. This may cause increased memory
consumption, so we limit the length of origin chains in the runtime.
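
Origin chaining with a length limit can be pictured with the following sketch.
``struct toy_origin``, ``toy_chain_origin()`` and the limit
``TOY_MAX_CHAIN_DEPTH`` are hypothetical. The real runtime stores origins
differently, and its limit is not the value used here.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CHAIN_DEPTH 3   /* hypothetical limit, not the kernel's value */

struct toy_origin {
	const char *stack;               /* stands in for a saved stack trace */
	const struct toy_origin *prev;   /* origin the value had before this store */
	unsigned depth;                  /* number of links back to the first origin */
};

/*
 * Create the origin for a new store of an uninitialized value. Once the
 * chain has reached TOY_MAX_CHAIN_DEPTH, the previous origin is reused
 * instead of growing the chain further.
 */
const struct toy_origin *toy_chain_origin(struct toy_origin *slot,
					  const char *stack,
					  const struct toy_origin *prev)
{
	assert(slot && prev);
	if (prev->depth >= TOY_MAX_CHAIN_DEPTH)
		return prev;
	slot->stack = stack;
	slot->prev = prev;
	slot->depth = prev->depth + 1;
	return slot;
}
```

Walking ``prev`` from the final origin back to the first one recovers the
sequence of stores, up to the configured depth.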

Clang instrumentation API
-------------------------

The Clang instrumentation pass inserts calls to functions defined in
``mm/kmsan/instrumentation.c`` into the kernel code.

Shadow manipulation
~~~~~~~~~~~~~~~~~~~

For every memory access the compiler emits a call to a function that returns a
pair of pointers to the shadow and origin addresses of the given memory::

  typedef struct {
    void *shadow, *origin;
  } shadow_origin_ptr_t;

  shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr)
  shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr)
  shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, uintptr_t size)
  shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, uintptr_t size)

The function name depends on the memory access size.

The compiler makes sure that for every loaded value its shadow and origin
values are read from memory. When a value is stored to memory, its shadow and
origin are also stored using the metadata pointers.
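
To make the load and store sequence more concrete, here is a userspace sketch
of roughly what an instrumented 4-byte access does. Everything prefixed with
``toy_`` is hypothetical: the arena stands in for kernel memory, and
``toy_metadata_ptr_4()`` stands in for the
``__msan_metadata_ptr_for_{load,store}_4()`` hooks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_ARENA_SIZE 64

/* Toy "kernel memory" with parallel shadow and origin storage. */
uint8_t toy_arena[TOY_ARENA_SIZE];
uint8_t toy_arena_shadow[TOY_ARENA_SIZE];
uint32_t toy_arena_origin[TOY_ARENA_SIZE / 4];

typedef struct {
	void *shadow, *origin;
} toy_meta_ptr_t;

/* Return the metadata pointers for an aligned 4-byte access in the arena. */
toy_meta_ptr_t toy_metadata_ptr_4(void *addr)
{
	size_t off = (size_t)((uint8_t *)addr - toy_arena);
	toy_meta_ptr_t ret;

	assert(off % 4 == 0 && off + 4 <= TOY_ARENA_SIZE);
	ret.shadow = &toy_arena_shadow[off];
	ret.origin = &toy_arena_origin[off / 4];
	return ret;
}

/* Store the value, its shadow and, if poisoned, its origin. */
void toy_store_4(void *addr, uint32_t val, uint32_t shadow, uint32_t origin)
{
	toy_meta_ptr_t p = toy_metadata_ptr_4(addr);

	memcpy(p.shadow, &shadow, 4);
	if (shadow)
		memcpy(p.origin, &origin, 4);
	memcpy(addr, &val, 4);
}

/* Load the value and read its shadow and origin alongside it. */
uint32_t toy_load_4(void *addr, uint32_t *shadow, uint32_t *origin)
{
	toy_meta_ptr_t p = toy_metadata_ptr_4(addr);
	uint32_t val;

	memcpy(shadow, p.shadow, 4);
	memcpy(origin, p.origin, 4);
	memcpy(&val, addr, 4);
	return val;
}
```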

Handling locals
~~~~~~~~~~~~~~~

A special function is used to create a new origin value for a local variable and
set the origin of that variable to that value::

  void __msan_poison_alloca(void *addr, uintptr_t size, char *descr)

Access to per-task data
~~~~~~~~~~~~~~~~~~~~~~~

At the beginning of every instrumented function KMSAN inserts a call to
``__msan_get_context_state()``::

  struct kmsan_context_state *__msan_get_context_state(void)

``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::

  struct kmsan_context_state {
    char param_tls[KMSAN_PARAM_SIZE];
    char retval_tls[KMSAN_RETVAL_SIZE];
    char va_arg_tls[KMSAN_PARAM_SIZE];
    char va_arg_origin_tls[KMSAN_PARAM_SIZE];
    u64 va_arg_overflow_size_tls;
    char param_origin_tls[KMSAN_PARAM_SIZE];
    depot_stack_handle_t retval_origin_tls;
  };

This structure is used by KMSAN to pass parameter shadows and origins between
instrumented functions (unless the parameters are checked immediately by
``CONFIG_KMSAN_CHECK_PARAM_RETVAL``).
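
The hand-off of parameter and return value shadows through this structure can
be sketched in userspace as follows. The names ``toy_context_state``,
``toy_state``, ``toy_caller()`` and ``toy_callee()`` are hypothetical, origins
are left out, and the callee simply forwards the shadow of its argument to its
return value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PARAM_SIZE 64    /* hypothetical, stands in for KMSAN_PARAM_SIZE */
#define TOY_RETVAL_SIZE 64   /* hypothetical, stands in for KMSAN_RETVAL_SIZE */

struct toy_context_state {
	char param_tls[TOY_PARAM_SIZE];
	char retval_tls[TOY_RETVAL_SIZE];
};

struct toy_context_state toy_state;

/* Callee side: pick up the shadow of x, publish the shadow of the result. */
uint32_t toy_callee(uint32_t x)
{
	uint32_t sx;

	memcpy(&sx, toy_state.param_tls, sizeof(sx));
	/* The result depends only on x, so forward its shadow unchanged. */
	memcpy(toy_state.retval_tls, &sx, sizeof(sx));
	return x + 1;
}

/* Caller side: publish the argument shadow, then collect the return shadow. */
uint32_t toy_caller(uint32_t x, uint32_t shadow_x, uint32_t *shadow_ret)
{
	uint32_t r;

	assert(shadow_ret);
	memcpy(toy_state.param_tls, &shadow_x, sizeof(shadow_x));
	r = toy_callee(x);
	memcpy(shadow_ret, toy_state.retval_tls, sizeof(*shadow_ret));
	return r;
}
```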

Passing uninitialized values to functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Clang's MemorySanitizer instrumentation has an option,
``-fsanitize-memory-param-retval``, which makes the compiler check function
parameters passed by value, as well as function return values.

The option is controlled by ``CONFIG_KMSAN_CHECK_PARAM_RETVAL``, which is
enabled by default to let KMSAN report uninitialized values earlier.
Please refer to the `LKML discussion`_ for more details.

Because of the way the checks are implemented in LLVM (they are only applied to
parameters marked as ``noundef``), not all parameters are guaranteed to be
checked, so we cannot give up the metadata storage in ``kmsan_context_state``.

String functions
~~~~~~~~~~~~~~~~

The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
following functions. These functions are also called when data structures are
initialized or copied, making sure shadow and origin values are copied
alongside the data::

  void *__msan_memcpy(void *dst, void *src, uintptr_t n)
  void *__msan_memmove(void *dst, void *src, uintptr_t n)
  void *__msan_memset(void *dst, int c, uintptr_t n)
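
The idea behind these replacements is that metadata follows the data. A
minimal userspace sketch is shown below, assuming a hypothetical pool with a
parallel shadow array. ``toy_msan_memcpy()`` is not the kernel implementation,
and origins are omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_POOL_SIZE 32

/* Toy memory pool with one shadow byte per data byte. */
uint8_t toy_pool[TOY_POOL_SIZE];
uint8_t toy_pool_shadow[TOY_POOL_SIZE];

/* Copy n bytes inside the pool together with their shadow. */
void *toy_msan_memcpy(void *dst, const void *src, size_t n)
{
	size_t d = (size_t)((uint8_t *)dst - toy_pool);
	size_t s = (size_t)((const uint8_t *)src - toy_pool);

	assert(d + n <= TOY_POOL_SIZE && s + n <= TOY_POOL_SIZE);
	/* Shadow first: the destination inherits the source's poisoning. */
	memcpy(&toy_pool_shadow[d], &toy_pool_shadow[s], n);
	return memcpy(dst, src, n);
}
```

After such a copy, a destination byte is poisoned exactly when its source byte
was, regardless of what the destination held before.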

Error reporting
~~~~~~~~~~~~~~~

For each use of a value the compiler emits a shadow check that calls
``__msan_warning()`` if that value is poisoned::

  void __msan_warning(u32 origin)

``__msan_warning()`` causes KMSAN runtime to print an error report.

Inline assembly instrumentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

KMSAN instruments every inline assembly output with a call to the following
function, which unpoisons the memory region::

  void __msan_instrument_asm_store(void *addr, uintptr_t size)

This approach may mask certain errors, but it also helps to avoid a lot of
false positives in bitwise operations, atomics etc.

Sometimes the pointers passed into inline assembly do not point to valid memory.
In such cases they are ignored at runtime.


Runtime library
---------------

The code is located in ``mm/kmsan/``.

Per-task KMSAN state
~~~~~~~~~~~~~~~~~~~~

Every ``task_struct`` has an associated KMSAN task state that holds the KMSAN
context (see above) and a per-task flag disallowing KMSAN reports::

  struct kmsan_ctx {
    ...
    bool allow_reporting;
    struct kmsan_context_state cstate;
    ...
  };

  struct task_struct {
    ...
    struct kmsan_ctx kmsan;
    ...
  };

KMSAN contexts
~~~~~~~~~~~~~~

When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to
hold the metadata for function parameters and return values.

When the kernel is running in interrupt, softirq or NMI context, where
``current`` is unavailable, KMSAN switches to the per-CPU interrupt state::

  DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
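
The choice between the two kinds of state can be sketched as below. The types
and ``toy_get_context()`` are hypothetical, and the ``in_task`` flag stands in
for whatever test the runtime uses to tell task context from interrupt
context.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 2

struct toy_ctx {
	int dummy;   /* placeholder for the parameter and return value metadata */
};

struct toy_task {
	struct toy_ctx kmsan;
};

/* One spare context per CPU for interrupt, softirq and NMI code. */
struct toy_ctx toy_percpu_ctx[TOY_NR_CPUS];

/* Pick the context that should hold metadata for the running code. */
struct toy_ctx *toy_get_context(struct toy_task *task, int in_task,
				unsigned cpu)
{
	assert(cpu < TOY_NR_CPUS);
	if (in_task) {
		assert(task);
		return &task->kmsan;
	}
	return &toy_percpu_ctx[cpu];
}
```

Keeping a separate context for interrupts means that an interrupt handler does
not overwrite the parameter metadata of the task it interrupted.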

Metadata allocation
~~~~~~~~~~~~~~~~~~~

Metadata is stored separately for several kinds of kernel memory.

1. Each ``struct page`` instance contains two pointers to its shadow and
origin pages::

  struct page {
    ...
    struct page *shadow, *origin;
    ...
  };

At boot time, the kernel allocates shadow and origin pages for every available
kernel page. This is done quite late, when the kernel address space is already
fragmented, so normal data pages may arbitrarily interleave with the metadata
pages.

This means that in general for two contiguous memory pages their shadow/origin
pages may not be contiguous. Consequently, if a memory access crosses the
boundary of a memory block, accesses to shadow/origin memory may potentially
corrupt other pages or read incorrect values from them.

In practice, contiguous memory pages returned by the same ``alloc_pages()``
call will have contiguous metadata, whereas if these pages belong to two
different allocations their metadata pages can be fragmented.

For the kernel data (``.data``, ``.bss`` etc.) and percpu memory regions
there also are no guarantees on metadata contiguity.

If ``__msan_metadata_ptr_for_XXX_YYY()`` hits the boundary between two pages
with non-contiguous metadata, it returns pointers to fake shadow/origin regions::

  char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
  char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));

``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
All stores to ``dummy_store_page`` are ignored.
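
The fallback to dummy pages can be illustrated with a toy lookup function. The
page size, the slot table and ``toy_shadow_ptr()`` are hypothetical. Two
adjacent data pages are deliberately given shadow pages that are not adjacent,
so that an access straddling them has no usable contiguous shadow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 16
#define TOY_NR_PAGES 2

/* Pool of shadow pages; data pages 0 and 1 use slots 0 and 2. */
uint8_t toy_shadow_pool[4][TOY_PAGE_SIZE];
size_t toy_shadow_slot[TOY_NR_PAGES] = { 0, 2 };

uint8_t toy_dummy_load_page[TOY_PAGE_SIZE];    /* stays zero: reads look clean */
uint8_t toy_dummy_store_page[TOY_PAGE_SIZE];   /* scratch: writes are never read */

/* Return the shadow address for an access at data offset off. */
uint8_t *toy_shadow_ptr(size_t off, size_t size, int is_store)
{
	size_t first = off / TOY_PAGE_SIZE;
	size_t last = (off + size - 1) / TOY_PAGE_SIZE;

	assert(size > 0 && size <= TOY_PAGE_SIZE && last < TOY_NR_PAGES);
	/* The access spans two pages whose shadow pages are not adjacent. */
	if (first != last &&
	    toy_shadow_slot[first] + 1 != toy_shadow_slot[last])
		return is_store ? toy_dummy_store_page : toy_dummy_load_page;
	return toy_shadow_pool[toy_shadow_slot[first]] + off % TOY_PAGE_SIZE;
}
```

The price of this scheme is that an access straddling such a boundary is not
checked and its stored shadow is lost, in exchange for never touching
unrelated metadata.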

2. For vmalloc memory and modules, there is a direct mapping between the memory
range, its shadow and origin. KMSAN reduces the vmalloc area by 3/4, making only
the first quarter available to ``vmalloc()``. The second quarter of the vmalloc
area contains shadow memory for the first quarter, the third one holds the
origins. A small part of the fourth quarter contains shadow and origins for the
kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for
more details.

When an array of pages is mapped into a contiguous virtual memory space, their
shadow and origin pages are similarly mapped into contiguous regions.
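
The direct mapping for vmalloc memory amounts to adding a fixed offset. The
sketch below uses made-up constants (``TOY_VMALLOC_START``,
``TOY_VMALLOC_SIZE``), not the real x86_64 layout, and assumes that origin
addresses are rounded down to the four-byte granularity described earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a 4 MiB vmalloc area split into four quarters. */
#define TOY_VMALLOC_START 0x1000000ULL
#define TOY_VMALLOC_SIZE  0x400000ULL
#define TOY_QUARTER       (TOY_VMALLOC_SIZE / 4)

/* Only the first quarter is handed out to vmalloc() users. */
static int toy_is_vmalloc_addr(uint64_t addr)
{
	return addr >= TOY_VMALLOC_START &&
	       addr < TOY_VMALLOC_START + TOY_QUARTER;
}

/* Shadow lives one quarter above the data. */
uint64_t toy_vmalloc_shadow(uint64_t addr)
{
	assert(toy_is_vmalloc_addr(addr));
	return addr + TOY_QUARTER;
}

/* Origins live two quarters above the data, at 4-byte granularity. */
uint64_t toy_vmalloc_origin(uint64_t addr)
{
	assert(toy_is_vmalloc_addr(addr));
	return (addr & ~3ULL) + 2 * TOY_QUARTER;
}
```

Because the mapping is pure arithmetic, contiguous vmalloc addresses always
have contiguous metadata, unlike the per-page pointers used for the linear
mapping.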

References
==========

E. Stepanov, K. Serebryany. `MemorySanitizer: fast detector of uninitialized
memory use in C++
<https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf>`_.
In Proceedings of CGO 2015.

.. _MemorySanitizer tool: https://clang.llvm.org/docs/MemorySanitizer.html
.. _LLVM documentation: https://llvm.org/docs/GettingStarted.html
.. _LKML discussion: https://lore.kernel.org/all/20220614144853.3693273-1-glider@google.com/