.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2022, Google LLC.

===================================
The Kernel Memory Sanitizer (KMSAN)
===================================

KMSAN is a dynamic error detector aimed at finding uses of uninitialized
values. It is based on compiler instrumentation, and is quite similar to the
userspace `MemorySanitizer tool`_.

KMSAN is not intended for production use, because it drastically increases
the kernel memory footprint and slows the whole system down.

Usage
=====

Building the kernel
-------------------

To build a kernel with KMSAN you need a recent Clang (14.0.6+).
Please refer to `LLVM documentation`_ for instructions on how to build Clang.

Then configure and build the kernel with ``CONFIG_KMSAN`` enabled.

Example report
--------------

Here is an example of a KMSAN report::

  =====================================================
  BUG: KMSAN: uninit-value in test_uninit_kmsan_check_memory+0x1be/0x380 [kmsan_test]
   test_uninit_kmsan_check_memory+0x1be/0x380 mm/kmsan/kmsan_test.c:273
   kunit_run_case_internal lib/kunit/test.c:333
   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
   kthread+0x721/0x850 kernel/kthread.c:327
   ret_from_fork+0x1f/0x30 ??:?

  Uninit was stored to memory at:
   do_uninit_local_array+0xfa/0x110 mm/kmsan/kmsan_test.c:260
   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
   kunit_run_case_internal lib/kunit/test.c:333
   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
   kthread+0x721/0x850 kernel/kthread.c:327
   ret_from_fork+0x1f/0x30 ??:?

  Local variable uninit created at:
   do_uninit_local_array+0x4a/0x110 mm/kmsan/kmsan_test.c:256
   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271

  Bytes 4-7 of 8 are uninitialized
  Memory access of size 8 starts at ffff888083fe3da0

  CPU: 0 PID: 6731 Comm: kunit_try_catch Tainted: G B E 5.16.0-rc3+ #104
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
  =====================================================

The report says that the local variable ``uninit`` was created uninitialized in
``do_uninit_local_array()``. The third stack trace corresponds to the place
where this variable was created.

The first stack trace shows where the uninit value was used (in
``test_uninit_kmsan_check_memory()``). The tool shows the bytes which were left
uninitialized in the local variable, as well as the stack where the value was
copied to another memory location before use.

A use of an uninitialized value ``v`` is reported by KMSAN in the following cases:

- in a condition, e.g. ``if (v) { ... }``;
- in indexing or pointer dereferencing, e.g. ``array[v]`` or ``*v``;
- when it is copied to userspace or hardware, e.g. ``copy_to_user(..., &v, ...)``;
- when it is passed as an argument to a function, and
  ``CONFIG_KMSAN_CHECK_PARAM_RETVAL`` is enabled (see below).

The mentioned cases (apart from copying data to userspace or hardware, which is
a security issue) are considered undefined behavior from the C11 Standard point
of view.

Disabling the instrumentation
-----------------------------

A function can be marked with ``__no_kmsan_checks``. Doing so makes KMSAN
ignore uninitialized values in that function and mark its output as initialized.
As a result, the user will not get KMSAN reports related to that function.

Another function attribute supported by KMSAN is ``__no_sanitize_memory``.
Applying this attribute to a function will result in KMSAN not instrumenting
it, which can be helpful if we do not want the compiler to interfere with some
low-level code (e.g. code marked with ``noinstr``, which implicitly adds
``__no_sanitize_memory``).

This however comes at a cost: stack allocations from such functions will have
incorrect shadow/origin values, likely leading to false positives. Functions
called from non-instrumented code may also receive incorrect metadata for their
parameters.

As a rule of thumb, avoid using ``__no_sanitize_memory`` explicitly.

It is also possible to disable KMSAN for a single file (e.g. main.o)::

  KMSAN_SANITIZE_main.o := n

or for the whole directory::

  KMSAN_SANITIZE := n

in the Makefile. Think of this as applying ``__no_sanitize_memory`` to every
function in the file or directory. Most users won't need ``KMSAN_SANITIZE``,
unless their code gets broken by KMSAN (e.g. runs at early boot time).

Support
=======

In order for KMSAN to work the kernel must be built with Clang, which so far is
the only compiler that has KMSAN support. The kernel instrumentation pass is
based on the userspace `MemorySanitizer tool`_.

The runtime library only supports x86_64 at the moment.

How KMSAN works
===============

KMSAN shadow memory
-------------------

KMSAN associates a metadata byte (also called shadow byte) with every byte of
kernel memory. A bit in the shadow byte is set iff the corresponding bit of the
kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
setting its shadow bytes to ``0xff``) is called poisoning, marking it
initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.

When a new variable is allocated on the stack, it is poisoned by default by
instrumentation code inserted by the compiler (unless it is a stack variable
that is immediately initialized). Any new heap allocation done without
``__GFP_ZERO`` is also poisoned.

Compiler instrumentation also tracks the shadow values as they are used along
the code. When needed, instrumentation code invokes the runtime library in
``mm/kmsan/`` to persist shadow values.

The shadow value of a basic or compound type is an array of bytes of the same
length. When a constant value is written into memory, that memory is unpoisoned.
When a value is read from memory, its shadow memory is also obtained and
propagated into all the operations which use that value. For every instruction
that takes one or more values the compiler generates code that calculates the
shadow of the result depending on those values and their shadows.

Example::

  int a = 0xff;  // i.e. 0x000000ff
  int b;
  int c = a | b;

In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
``c`` are uninitialized, while the lower byte is initialized.
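
The ``0xffffff00`` result can be reproduced with a small userspace model of
shadow propagation for bitwise OR. This is only a sketch: the type
``shadowed_u32`` and the helpers ``shadowed_or()`` and ``shadowed_or_demo()``
are hypothetical names, and the propagation rule is an assumption modelled on
how MemorySanitizer-style tools usually treat OR (an initialized ``1`` bit
fixes the result bit regardless of the other operand), not a copy of the code
the compiler emits.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit value plus its shadow: a set shadow bit marks the matching
 * value bit as uninitialized. Hypothetical type, for illustration only. */
struct shadowed_u32 {
	uint32_t value;
	uint32_t shadow;
};

/*
 * Assumed propagation rule for "a | b". A result bit is poisoned if both
 * input bits are poisoned, or if one input bit is poisoned while the other
 * one is 0. An initialized 1 bit forces the result bit to 1, so the result
 * bit is initialized no matter what the other operand holds.
 */
static struct shadowed_u32 shadowed_or(struct shadowed_u32 a,
				       struct shadowed_u32 b)
{
	struct shadowed_u32 r;

	r.value = a.value | b.value;
	r.shadow = (a.shadow & b.shadow) |
		   (~a.value & b.shadow) |
		   (a.shadow & ~b.value);
	return r;
}

/* The example from the text: a = 0xff is initialized, b is not. */
static void shadowed_or_demo(void)
{
	struct shadowed_u32 a = { .value = 0xff, .shadow = 0 };
	/* The value stored in b is arbitrary: every bit of it is poisoned. */
	struct shadowed_u32 b = { .value = 0, .shadow = 0xffffffffu };
	struct shadowed_u32 c = shadowed_or(a, b);

	/* Only the low byte of c is known to be initialized. */
	assert(c.shadow == 0xffffff00u);
}
```

The low byte comes out initialized because all eight low bits of ``a`` are
initialized ones; whatever ``b`` contains there cannot change the result.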

Origin tracking
---------------

Every four bytes of kernel memory also have a so-called origin mapped to them.
This origin describes the point in program execution at which the uninitialized
value was created. Every origin is associated with either the full allocation
stack (for heap-allocated memory), or the function containing the uninitialized
variable (for locals).

When an uninitialized variable is allocated on stack or heap, a new origin
value is created, and that variable's origin is filled with that value. When a
value is read from memory, its origin is also read and kept together with the
shadow. For every instruction that takes one or more values, the origin of the
result is one of the origins corresponding to any of the uninitialized inputs.
If a poisoned value is written into memory, its origin is written to the
corresponding storage as well.

Example 1::

  int a = 42;
  int b;
  int c = a + b;

In this case the origin of ``b`` is generated upon function entry, and is
stored to the origin of ``c`` right before the addition result is written into
memory.

Several variables may share the same origin address, if they are stored in the
same four-byte chunk. In this case every write to either variable updates the
origin for all of them. We have to sacrifice precision in this case, because
storing origins for individual bits (and even bytes) would be too costly.

Example 2::

  int combine(short a, short b) {
    union ret_t {
      int i;
      short s[2];
    } ret;
    ret.s[0] = a;
    ret.s[1] = b;
    return ret.i;
  }

If ``a`` is initialized and ``b`` is not, the shadow of the result would be
``0xffff0000``, and the origin of the result would be the origin of ``b``.
``ret.s[0]`` would have the same origin, but it will never be used, because
that variable is initialized.

If both function arguments are uninitialized, only the origin of the second
argument is preserved.
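
The loss of precision in Example 2 can be modelled in userspace with one
origin slot shared by a four-byte chunk. The sketch below is an illustration
under stated assumptions, not the kernel implementation: ``chunk_meta``,
``store_short()`` and ``combine_origin()`` are hypothetical names, origins are
plain integers, and the model assumes an origin is written only when the
stored value is poisoned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Metadata for one aligned four-byte chunk: a shadow byte per data byte,
 * but only a single origin slot for the whole chunk. */
struct chunk_meta {
	uint8_t shadow[4];
	uint32_t origin;
};

/* Model a two-byte store at byte offset off (0 or 2) within the chunk. */
static void store_short(struct chunk_meta *m, unsigned int off,
			uint16_t val_shadow, uint32_t val_origin)
{
	assert(off == 0 || off == 2);
	memcpy(&m->shadow[off], &val_shadow, sizeof(val_shadow));
	/* Assumption: only a poisoned store updates the shared origin. */
	if (val_shadow)
		m->origin = val_origin;
}

/* Mirrors combine(): returns the origin left in the chunk after both stores. */
static uint32_t combine_origin(uint16_t a_shadow, uint32_t a_origin,
			       uint16_t b_shadow, uint32_t b_origin)
{
	/* The local starts out fully poisoned; 0 is a placeholder origin. */
	struct chunk_meta ret = {
		.shadow = { 0xff, 0xff, 0xff, 0xff },
		.origin = 0,
	};

	store_short(&ret, 0, a_shadow, a_origin);	/* ret.s[0] = a; */
	store_short(&ret, 2, b_shadow, b_origin);	/* ret.s[1] = b; */
	return ret.origin;
}
```

With ``a`` poisoned (origin 1) and ``b`` poisoned (origin 2),
``combine_origin(0xffff, 1, 0xffff, 2)`` yields 2: the second store overwrites
the shared slot, which is the behaviour the text describes.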

Origin chaining
~~~~~~~~~~~~~~~

To ease debugging, KMSAN creates a new origin for every store of an
uninitialized value to memory. The new origin references both its creation stack
and the previous origin the value had. This may cause increased memory
consumption, so we limit the length of origin chains in the runtime.

Clang instrumentation API
-------------------------

The Clang instrumentation pass inserts calls to functions defined in
``mm/kmsan/instrumentation.c`` into the kernel code.

Shadow manipulation
~~~~~~~~~~~~~~~~~~~

For every memory access the compiler emits a call to a function that returns a
pair of pointers to the shadow and origin addresses of the given memory::

  typedef struct {
    void *shadow, *origin;
  } shadow_origin_ptr_t;

  shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr)
  shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr)
  shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, uintptr_t size)
  shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, uintptr_t size)

The function name depends on the memory access size.

The compiler makes sure that for every loaded value its shadow and origin
values are read from memory. When a value is stored to memory, its shadow and
origin are also stored using the metadata pointers.

Handling locals
~~~~~~~~~~~~~~~

A special function is used to create a new origin value for a local variable and
set the origin of that variable to that value::

  void __msan_poison_alloca(void *addr, uintptr_t size, char *descr)

Access to per-task data
~~~~~~~~~~~~~~~~~~~~~~~

At the beginning of every instrumented function KMSAN inserts a call to
``__msan_get_context_state()``::

  kmsan_context_state *__msan_get_context_state(void)

``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::

  struct kmsan_context_state {
    char param_tls[KMSAN_PARAM_SIZE];
    char retval_tls[KMSAN_RETVAL_SIZE];
    char va_arg_tls[KMSAN_PARAM_SIZE];
    char va_arg_origin_tls[KMSAN_PARAM_SIZE];
    u64 va_arg_overflow_size_tls;
    char param_origin_tls[KMSAN_PARAM_SIZE];
    depot_stack_handle_t retval_origin_tls;
  };

This structure is used by KMSAN to pass parameter shadows and origins between
instrumented functions (unless the parameters are checked immediately by
``CONFIG_KMSAN_CHECK_PARAM_RETVAL``).

Passing uninitialized values to functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Clang's MemorySanitizer instrumentation has an option,
``-fsanitize-memory-param-retval``, which makes the compiler check function
parameters passed by value, as well as function return values.

The option is controlled by ``CONFIG_KMSAN_CHECK_PARAM_RETVAL``, which is
enabled by default to let KMSAN report uninitialized values earlier.
Please refer to the `LKML discussion`_ for more details.

Because of the way the checks are implemented in LLVM (they are only applied to
parameters marked as ``noundef``), not all parameters are guaranteed to be
checked, so we cannot give up the metadata storage in ``kmsan_context_state``.

String functions
~~~~~~~~~~~~~~~~

The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
following functions. These functions are also called when data structures are
initialized or copied, making sure shadow and origin values are copied along
with the data::

  void *__msan_memcpy(void *dst, void *src, uintptr_t n)
  void *__msan_memmove(void *dst, void *src, uintptr_t n)
  void *__msan_memset(void *dst, int c, uintptr_t n)

Error reporting
~~~~~~~~~~~~~~~

For each use of a value the compiler emits a shadow check that calls
``__msan_warning()`` if that value is poisoned::

  void __msan_warning(u32 origin)

``__msan_warning()`` causes the KMSAN runtime to print an error report.
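
Conceptually, a check of this kind sits in front of each use listed in the
"Example report" section, such as a branch on the value. The following
userspace sketch shows the shape of such a check; it is an assumption about
what the instrumented code amounts to, not actual compiler output.
``sketch_msan_warning()``, ``sketch_checked_branch()`` and the two counters
are hypothetical stand-ins for the runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the runtime: counts reports and remembers the last origin. */
static unsigned int sketch_report_count;
static uint32_t sketch_last_origin;

static void sketch_msan_warning(uint32_t origin)
{
	sketch_report_count++;
	sketch_last_origin = origin;
}

/* Roughly what "return v ? 1 : 0;" becomes once a shadow check is added:
 * the shadow and origin of v travel alongside the value itself. */
static int sketch_checked_branch(int v, uint32_t v_shadow, uint32_t v_origin)
{
	if (v_shadow)		/* is any bit of v poisoned? */
		sketch_msan_warning(v_origin);
	return v ? 1 : 0;
}

static void sketch_checked_branch_demo(void)
{
	sketch_checked_branch(1, 0, 0);			/* initialized: silent */
	assert(sketch_report_count == 0);
	sketch_checked_branch(1, 0x0000ff00u, 42);	/* one poisoned byte */
	assert(sketch_report_count == 1 && sketch_last_origin == 42);
}
```

The branch itself still executes after the report; the check only decides
whether the runtime is told about the poisoned value and its origin.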

Inline assembly instrumentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

KMSAN instruments every inline assembly output with a call to::

  void __msan_instrument_asm_store(void *addr, uintptr_t size)

This call unpoisons the memory region.

This approach may mask certain errors, but it also helps to avoid a lot of
false positives in bitwise operations, atomics etc.

Sometimes the pointers passed into inline assembly do not point to valid memory.
In such cases they are ignored at runtime.


Runtime library
---------------

The code is located in ``mm/kmsan/``.

Per-task KMSAN state
~~~~~~~~~~~~~~~~~~~~

Every ``task_struct`` has an associated KMSAN task state that holds the KMSAN
context (see above) and a per-task flag disallowing KMSAN reports::

  struct kmsan_context {
    ...
    bool allow_reporting;
    struct kmsan_context_state cstate;
    ...
  };

  struct task_struct {
    ...
    struct kmsan_context kmsan;
    ...
  };

KMSAN contexts
~~~~~~~~~~~~~~

When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to
hold the metadata for function parameters and return values.

When the kernel is running in interrupt, softirq or NMI context, where
``current`` is unavailable, KMSAN switches to the per-CPU interrupt state::

  DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);

Metadata allocation
~~~~~~~~~~~~~~~~~~~

There are several places in the kernel for which the metadata is stored.

1. Each ``struct page`` instance contains two pointers to its shadow and
   origin pages::

     struct page {
       ...
       struct page *shadow, *origin;
       ...
     };

   At boot-time, the kernel allocates shadow and origin pages for every
   available kernel page. This is done quite late, when the kernel address
   space is already fragmented, so normal data pages may arbitrarily
   interleave with the metadata pages.

   This means that in general for two contiguous memory pages their
   shadow/origin pages may not be contiguous. Consequently, if a memory access
   crosses the boundary of a memory block, accesses to shadow/origin memory
   may potentially corrupt other pages or read incorrect values from them.

   In practice, contiguous memory pages returned by the same ``alloc_pages()``
   call will have contiguous metadata, whereas if these pages belong to two
   different allocations their metadata pages can be fragmented.

   For the kernel data (``.data``, ``.bss`` etc.) and percpu memory regions
   there also are no guarantees on metadata contiguity.

   If ``__msan_metadata_ptr_for_XXX_YYY()`` hits the border between two
   pages with non-contiguous metadata, it returns pointers to fake
   shadow/origin regions::

     char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
     char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));

   ``dummy_load_page`` is zero-initialized, so reads from it always yield
   zeroes. All stores to ``dummy_store_page`` are ignored.

2. For vmalloc memory and modules, there is a direct mapping between the memory
   range, its shadow and origin. KMSAN reduces the vmalloc area by 3/4, making
   only the first quarter available to ``vmalloc()``. The second quarter of the
   vmalloc area contains shadow memory for the first quarter, the third one
   holds the origins. A small part of the fourth quarter contains shadow and
   origins for the kernel modules. Please refer to
   ``arch/x86/include/asm/pgtable_64_types.h`` for more details.
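
The direct mapping for vmalloc memory in item 2 boils down to adding a fixed
offset to the data address. The sketch below illustrates this arithmetic under
stated assumptions: the ``SKETCH_*`` constants are made-up values (the real
layout is defined in ``arch/x86/include/asm/pgtable_64_types.h``), the helper
names are hypothetical, and rounding the origin address down to four bytes is
an assumption that follows from origins covering four bytes of data each.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up layout for the example: a 64 MiB area split into four quarters. */
#define SKETCH_VMALLOC_START	UINT64_C(0x10000000)
#define SKETCH_VMALLOC_SIZE	UINT64_C(0x04000000)
#define SKETCH_QUARTER		(SKETCH_VMALLOC_SIZE / 4)

/* Only the first quarter is handed out to vmalloc() users. */
static int sketch_is_vmalloc_data(uint64_t addr)
{
	return addr >= SKETCH_VMALLOC_START &&
	       addr < SKETCH_VMALLOC_START + SKETCH_QUARTER;
}

/* Second quarter: one shadow byte per data byte, at a fixed offset. */
static uint64_t sketch_vmalloc_shadow(uint64_t addr)
{
	assert(sketch_is_vmalloc_data(addr));
	return addr + SKETCH_QUARTER;
}

/* Third quarter: one origin slot per aligned four bytes of data. */
static uint64_t sketch_vmalloc_origin(uint64_t addr)
{
	assert(sketch_is_vmalloc_data(addr));
	return (addr & ~UINT64_C(3)) + 2 * SKETCH_QUARTER;
}
```

Because the offsets are constant, contiguous data addresses in this range
always have contiguous shadow and origin addresses, unlike the page-backed
case in item 1.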

When an array of pages is mapped into a contiguous virtual memory space, their
shadow and origin pages are similarly mapped into contiguous regions.
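
For page-backed memory without such a contiguous mapping, the dummy-page
fallback described in item 1 can be sketched in userspace as follows. This is
a simplified model under stated assumptions, not the kernel code: data pages
are identified by offset, ``sketch_shadow_frame[]`` is a made-up table saying
which shadow frame backs each data page, and all ``sketch_*`` names are
hypothetical. Only the shadow pointer is modelled; origins would follow the
same pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096u
#define SKETCH_NPAGES		3u

/* Made-up placement: data pages 0 and 1 have adjacent shadow frames (4, 5),
 * while data page 2 is backed by an unrelated frame (2). */
static const size_t sketch_shadow_frame[SKETCH_NPAGES] = { 4, 5, 2 };
static uint8_t sketch_shadow_pool[8][SKETCH_PAGE_SIZE];

static uint8_t sketch_dummy_load_page[SKETCH_PAGE_SIZE];  /* stays zero */
static uint8_t sketch_dummy_store_page[SKETCH_PAGE_SIZE]; /* never read */

/* Return the shadow pointer for an access of size bytes at data offset off. */
static uint8_t *sketch_shadow_ptr(size_t off, size_t size, int is_store)
{
	size_t first = off / SKETCH_PAGE_SIZE;
	size_t last = (off + size - 1) / SKETCH_PAGE_SIZE;

	assert(size > 0 && size <= SKETCH_PAGE_SIZE && last < SKETCH_NPAGES);

	/* The access crosses into a page whose shadow is not adjacent:
	 * hand out a dummy page instead of touching unrelated memory. */
	if (first != last &&
	    sketch_shadow_frame[last] != sketch_shadow_frame[first] + 1)
		return is_store ? sketch_dummy_store_page
				: sketch_dummy_load_page;

	return &sketch_shadow_pool[sketch_shadow_frame[first]]
				  [off % SKETCH_PAGE_SIZE];
}
```

An 8-byte access at offset ``SKETCH_PAGE_SIZE - 4`` crosses from page 0 into
page 1 and gets a real shadow pointer, because frames 4 and 5 are adjacent.
The same access at ``2 * SKETCH_PAGE_SIZE - 4`` crosses into page 2 and is
redirected to a dummy page, so a load sees the bytes as initialized and a
store is dropped.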

References
==========

E. Stepanov, K. Serebryany. `MemorySanitizer: fast detector of uninitialized
memory use in C++
<https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf>`_.
In Proceedings of CGO 2015.

.. _MemorySanitizer tool: https://clang.llvm.org/docs/MemorySanitizer.html
.. _LLVM documentation: https://llvm.org/docs/GettingStarted.html
.. _LKML discussion: https://lore.kernel.org/all/20220614144853.3693273-1-glider@google.com/