.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2020, Google LLC.

Kernel Electric-Fence (KFENCE)
==============================

Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
invalid-free errors.

KFENCE is designed to be enabled in production kernels, and has near zero
performance overhead. Compared to KASAN, KFENCE sacrifices precision in
exchange for performance. The main motivation behind KFENCE's design is that,
with enough total uptime, KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly achieve a large
enough total uptime is to deploy the tool across a large fleet of machines.

Usage
-----

To enable KFENCE, configure the kernel with::

    CONFIG_KFENCE=y

To build a kernel with KFENCE support, but disabled by default (to enable, set
``kfence.sample_interval`` to a non-zero value), configure the kernel with::

    CONFIG_KFENCE=y
    CONFIG_KFENCE_SAMPLE_INTERVAL=0

KFENCE provides several other configuration options to customize behaviour (see
the respective help text in ``lib/Kconfig.kfence`` for more info).

Tuning performance
~~~~~~~~~~~~~~~~~~

The most important parameter is KFENCE's sample interval, which can be set via
the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
sample interval determines the frequency with which heap allocations will be
guarded by KFENCE. The default is configurable via the Kconfig option
``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
disables KFENCE.

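As a rough illustration of what the interval implies: since at most one
allocation is guarded per expired interval (see Implementation Details), uptime
divided by the sample interval gives an upper bound on the number of guarded
allocations. The helper below is a hypothetical sketch, not kernel code; the
real count is lower whenever allocations are infrequent or the pool is
exhausted.

.. code-block:: c

    #include <stdint.h>

    /*
     * Hypothetical helper: upper bound on the number of allocations KFENCE
     * can guard during uptime_ms, assuming at most one guarded allocation
     * per expired sample interval. An interval of 0 disables KFENCE.
     */
    static uint64_t max_guarded_allocs(uint64_t uptime_ms,
                                       uint64_t sample_interval_ms)
    {
        if (sample_interval_ms == 0)
            return 0;
        return uptime_ms / sample_interval_ms;
    }

For example, with an interval of 100 ms (a value chosen for illustration), one
day of uptime (86,400,000 ms) allows at most 864,000 guarded allocations per
machine; across a fleet, these per-machine budgets add up.
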
The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
further KFENCE allocations occur. The number of available guarded objects can
be controlled with ``CONFIG_KFENCE_NUM_OBJECTS`` (default 255). Each object
requires 2 pages, one for the object itself and the other one used as a guard
page; object pages are interleaved with guard pages, and every object page is
therefore surrounded by two guard pages.

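Interleaving n object pages so that each one has a guard page on both sides
needs n + 1 guard pages, i.e. 2n + 1 pages in total; the pool-size formula
reserves 2(n + 1) pages, leaving one spare page. The sketch below numbers the
pool's pages from 0 under an assumption made purely for illustration (not taken
from the kernel sources): the spare page and the first guard page come first, so
object i sits on page 2i + 2 and every odd-numbered page is a guard page. The
function names are hypothetical.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    /*
     * Hypothetical layout model: page 0 is the spare page, odd-numbered
     * pages are guard pages, and object i occupies page 2 * i + 2.
     */
    static size_t object_page(size_t obj_index)
    {
        return 2 * obj_index + 2;
    }

    static bool is_guard_page(size_t page)
    {
        return page % 2 == 1;
    }

Under this model, with the default 255 objects the last object lands on page
510 and is followed by guard page 511, the last of the pool's 512 pages.
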
The total memory dedicated to the KFENCE memory pool can be computed as::

    ( #objects + 1 ) * 2 * PAGE_SIZE

With the default configuration and a page size of 4 KiB, the pool therefore
occupies (255 + 1) * 2 * 4 KiB = 2 MiB.

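The same formula as a small C helper can be handy for estimating the memory
cost of a different ``CONFIG_KFENCE_NUM_OBJECTS`` value. The function name is
hypothetical and this is not a kernel API.

.. code-block:: c

    #include <stddef.h>

    /*
     * Hypothetical helper mirroring the formula above:
     * (#objects + 1) * 2 * PAGE_SIZE bytes.
     */
    static size_t estimate_pool_bytes(size_t num_objects, size_t page_size)
    {
        return (num_objects + 1) * 2 * page_size;
    }

For example, ``estimate_pool_bytes(255, 4096)`` yields 2097152 bytes (2 MiB),
and raising the object count to 1023 yields 8388608 bytes (8 MiB).
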
Note: On architectures that support huge pages, KFENCE will ensure that the
pool is using pages of size ``PAGE_SIZE``. This will result in additional page
tables being allocated.

Error reports
~~~~~~~~~~~~~

A typical out-of-bounds access looks like this::

    ==================================================================
    BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa3/0x22b

    Out-of-bounds read at 0xffffffffb672efff (1B left of kfence-#17):
     test_out_of_bounds_read+0xa3/0x22b
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, size=32, cache=kmalloc-32] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_out_of_bounds_read+0x98/0x22b
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

The header of the report provides a short summary of the function involved in
the access. It is followed by more detailed information about the access and
its origin. Note that real kernel addresses are only shown for
``CONFIG_DEBUG_KERNEL=y`` builds.

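The location in the report is obtained by comparing the faulting address with
the object's range: in the example above, 0xffffffffb672efff lies one byte
before 0xffffffffb672f000, the first byte of kfence-#17. The hypothetical helper
below reproduces that wording for addresses left of or inside an object. For
addresses right of an object it counts bytes past the object's last byte; that
convention is an assumption of this sketch, and the kernel's own reports may
measure the distance differently.

.. code-block:: c

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Hypothetical helper: describe addr relative to a KFENCE object
     * occupying [start, end] (both inclusive), in the style of the report.
     * Returns the snprintf() result.
     */
    static int describe_access(char *buf, size_t len, uint64_t addr,
                               uint64_t start, uint64_t end, int obj_index)
    {
        if (addr < start)
            return snprintf(buf, len, "%lluB left of kfence-#%d",
                            (unsigned long long)(start - addr), obj_index);
        if (addr > end)
            /* Assumption: distance measured from the object's last byte. */
            return snprintf(buf, len, "%lluB right of kfence-#%d",
                            (unsigned long long)(addr - end), obj_index);
        return snprintf(buf, len, "in kfence-#%d", obj_index);
    }

For the use-after-free report, the faulting address 0xffffffffb673dfe0 is the
first byte of kfence-#24, so the helper yields "in kfence-#24".
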
Use-after-free accesses are reported as::

    ==================================================================
    BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143

    Use-after-free read at 0xffffffffb673dfe0 (in kfence-#24):
     test_use_after_free_read+0xb3/0x143
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#24 [0xffffffffb673dfe0-0xffffffffb673dfff, size=32, cache=kmalloc-32] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_use_after_free_read+0x76/0x143
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    freed by task 507:
     test_use_after_free_read+0xa8/0x143
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 109 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

KFENCE also reports on invalid frees, such as double-frees::

    ==================================================================
    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171

    Invalid free of 0xffffffffb6741000:
     test_double_free+0xdc/0x171
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#26 [0xffffffffb6741000-0xffffffffb674101f, size=32, cache=kmalloc-32] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_double_free+0x76/0x171
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    freed by task 507:
     test_double_free+0xa8/0x171
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 111 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

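Conceptually, detecting an invalid free only needs per-object metadata that
records the object's state and start address. The following is a minimal model
under that assumption; the type, enumerator and function names are
hypothetical and are not meant to mirror the kernel's internal definitions.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical object states for this sketch. */
    enum obj_state { OBJ_UNUSED, OBJ_ALLOCATED, OBJ_FREED };

    struct obj_meta {
        enum obj_state state;
        uintptr_t addr;     /* start address handed out to the caller */
    };

    static void model_alloc(struct obj_meta *meta, uintptr_t addr)
    {
        meta->state = OBJ_ALLOCATED;
        meta->addr = addr;
    }

    /*
     * Returns true for a valid free. Freeing an object that is not currently
     * allocated (e.g. a double-free), or passing a pointer other than the
     * object's start address, is an invalid free that would be reported.
     */
    static bool model_free(struct obj_meta *meta, uintptr_t ptr)
    {
        if (meta->state != OBJ_ALLOCATED || ptr != meta->addr)
            return false;
        meta->state = OBJ_FREED;
        return true;
    }

Because a freed object keeps its ``OBJ_FREED`` state until it is reused, a
second free of the same pointer is caught, as in the double-free report above.
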
KFENCE also uses pattern-based redzones on the other side of an object's guard
page, to detect out-of-bounds writes on the unprotected side of the object.
These are reported on frees::

    ==================================================================
    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184

    Corrupted memory at 0xffffffffb6797ff9 [ 0xac . . . . . . ] (in kfence-#69):
     test_kmalloc_aligned_oob_write+0xef/0x184
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#69 [0xffffffffb6797fb0-0xffffffffb6797ff8, size=73, cache=kmalloc-96] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_kmalloc_aligned_oob_write+0x57/0x184
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 120 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

For such errors, the address where the corruption occurred as well as the
invalidly written bytes (offset from the address) are shown; in this
representation, '.' denotes an untouched byte. In the example above, ``0xac``
is the value written to the invalid address at offset 0, and the remaining '.'
entries denote that the following bytes were not touched. Note that real values
are only shown for ``CONFIG_DEBUG_KERNEL=y`` builds; to avoid information
disclosure for non-debug builds, '!' is used instead to denote invalidly
written bytes.

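The rendering described above can be sketched as a scan over a redzone that
was filled with a known pattern. This sketch assumes a constant fill byte of
0xaa and caps the output at 16 bytes; both values and the function name are
hypothetical choices for illustration, not the kernel's actual parameters. Fed
the seven bytes that follow kfence-#69 in the example (0xac followed by six
untouched bytes), it produces the string shown in that report.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    #define REDZONE_FILL 0xaa   /* hypothetical fill pattern */
    #define MAX_SHOWN    16     /* hypothetical cap on rendered bytes */

    /*
     * Scans a redzone of len bytes for the first byte that no longer holds
     * the fill pattern and renders the bytes from there on: the written
     * value when show_values is set (as in debug builds), '!' otherwise, and
     * '.' for bytes still holding the pattern. out_len must be non-zero.
     * Returns the offset of the first corrupted byte, or -1 if none.
     */
    static long render_redzone(const unsigned char *rz, size_t len,
                               bool show_values, char *out, size_t out_len)
    {
        size_t first = 0, end, pos;

        while (first < len && rz[first] == REDZONE_FILL)
            first++;
        out[0] = '\0';
        if (first == len)
            return -1;

        end = (len - first > MAX_SHOWN) ? first + MAX_SHOWN : len;
        pos = (size_t)snprintf(out, out_len, "[");
        for (size_t i = first; i < end && pos < out_len; i++) {
            if (rz[i] == REDZONE_FILL)
                pos += (size_t)snprintf(out + pos, out_len - pos, " .");
            else if (show_values)
                pos += (size_t)snprintf(out + pos, out_len - pos, " 0x%02x",
                                        (unsigned int)rz[i]);
            else
                pos += (size_t)snprintf(out + pos, out_len - pos, " !");
        }
        if (pos < out_len)
            snprintf(out + pos, out_len - pos, " ]");
        return (long)first;
    }

With ``show_values`` cleared, the same input renders as "[ ! . . . . . . ]".
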
Finally, KFENCE may also report invalid accesses to any protected page where it
was not possible to determine an associated object, e.g. if adjacent object
pages had not yet been allocated::

    ==================================================================
    BUG: KFENCE: invalid read in test_invalid_access+0x26/0xe0

    Invalid read at 0xffffffffb670b00a:
     test_invalid_access+0x26/0xe0
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

DebugFS interface
~~~~~~~~~~~~~~~~~

Some debugging information is exposed via debugfs:

* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.

* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
  allocated via KFENCE, including those already freed but protected.

Implementation Details
----------------------

Guarded allocations are set up based on the sample interval. After expiration
of the sample interval, the next allocation through the main allocator (SLAB or
SLUB) returns a guarded allocation from the KFENCE object pool (allocation
sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and
the next allocation is set up after the expiration of the interval. To "gate" a
KFENCE allocation through the main allocator's fast-path without overhead,
KFENCE relies on static branches via the static keys infrastructure. The static
branch is toggled to redirect the allocation to KFENCE.

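The gating logic can be modelled in user space as a one-shot flag: expiry of
the interval arms it, and the first allocation to observe it disarms it
atomically, so exactly one allocation per arming is redirected even when
several threads allocate concurrently. This is a simplified sketch with
hypothetical names; it uses a C11 atomic flag in place of the static branch
described above, so unlike the kernel's fast path it costs an atomic load on
every allocation.

.. code-block:: c

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Hypothetical one-shot gate; zero-initialized, i.e. starts closed. */
    static atomic_bool sample_gate;

    /* Called when the sample interval expires. */
    static void sample_interval_expired(void)
    {
        atomic_store(&sample_gate, true);
    }

    /*
     * Called on every allocation. Returns true for exactly one caller per
     * arming; that allocation would be served from the KFENCE pool. The
     * relaxed load keeps the common (gate closed) case to a plain read.
     */
    static bool should_sample_allocation(void)
    {
        if (!atomic_load_explicit(&sample_gate, memory_order_relaxed))
            return false;
        return atomic_exchange(&sample_gate, false);
    }

Re-arming only after another full interval, as described above, is left to
whichever timer calls ``sample_interval_expired()`` in this model.
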
KFENCE objects each reside on a dedicated page, at either the left or right
page boundary, selected at random. The pages to the left and right of the
object page are "guard pages", whose attributes are changed to a protected
state, and which cause page faults on any attempted access. Such page faults
are then intercepted by KFENCE, which handles the fault gracefully by reporting
an out-of-bounds access, and marking the page as accessible so that the
faulting code can (wrongly) continue executing (set ``panic_on_warn`` to panic
instead).

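Placing an object at the right page boundary has to respect the cache's
alignment, so the start address is rounded down and a few bytes may remain
between the object and the next guard page. The hypothetical helper below
computes the start address for either side, assuming the object fits in one
page and the alignment is a power of two. With an 8-byte alignment (an
assumption about ``kmalloc-96`` on the machine that produced the reports
above), it reproduces the start address 0xffffffffb6797fb0 of the 73-byte
object kfence-#69 from the memory-corruption report.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Hypothetical helper: start address of a size-byte object on the page
     * at page_start. Assumes size <= page_size and align is a power of two.
     */
    static uint64_t object_start(uint64_t page_start, uint64_t page_size,
                                 uint64_t size, uint64_t align, bool right_side)
    {
        uint64_t addr;

        if (!right_side)
            return page_start;          /* object begins at the left boundary */

        addr = page_start + page_size - size;
        return addr & ~(align - 1);     /* round down to the required alignment */
    }

The same computation is consistent with the use-after-free example, where the
32-byte object kfence-#24 ends exactly at its page boundary.
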
To detect out-of-bounds writes to memory within the object's page itself,
KFENCE also uses pattern-based redzones. For each object page, a redzone is set
up for all non-object memory. For typical alignments, the redzone is only
required on the unguarded side of an object. Because KFENCE must honor the
cache's requested alignment, special alignments may result in unprotected gaps
on either side of an object, all of which are redzoned.

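The redzoned ranges follow directly from the object's position on its page:
everything before the object and everything after it. The hypothetical helper
below computes both ranges. For the 73-byte object kfence-#69 from the
memory-corruption report, it yields a 7-byte gap starting at 0xffffffffb6797ff9
on the guarded side, which is exactly where the reported corruption occurred.

.. code-block:: c

    #include <stdint.h>

    /* A byte range [start, start + len). */
    struct byte_range {
        uint64_t start;
        uint64_t len;
    };

    /*
     * Hypothetical helper: the parts of a page not covered by the object at
     * [obj, obj + size), i.e. the memory that would hold the redzone
     * pattern. Either range may be empty (len == 0).
     */
    static void redzone_ranges(uint64_t page_start, uint64_t page_size,
                               uint64_t obj, uint64_t size,
                               struct byte_range *left,
                               struct byte_range *right)
    {
        left->start = page_start;
        left->len = obj - page_start;
        right->start = obj + size;
        right->len = page_start + page_size - (obj + size);
    }

The left range of that object (4016 bytes) is the large redzone on its
unguarded side.
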
The following figure illustrates the page layout::

    ---+-----------+-----------+-----------+-----------+-----------+---
       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
    ---+-----------+-----------+-----------+-----------+-----------+---

Upon deallocation of a KFENCE object, the object's page is again protected and
the object is marked as freed. Any further access to the object causes a fault
and KFENCE reports a use-after-free access. Freed objects are inserted at the
tail of KFENCE's freelist, so that the least recently freed objects are reused
first, and the chances of detecting use-after-frees of recently freed objects
are increased.

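This reuse policy amounts to a FIFO queue: freed objects join at the tail and
allocations take from the head, so a just-freed object is reused only after
every object freed before it. The ring buffer below is a minimal sketch of that
policy over a hypothetical four-object pool; ``freelist_get()`` also fails once
the pool is exhausted, matching the behaviour described under Tuning
performance. All names are illustrative, not the kernel's.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    #define POOL_OBJECTS 4  /* hypothetical tiny pool */

    /* FIFO freelist of object indices, stored as a ring buffer. */
    struct freelist {
        size_t slots[POOL_OBJECTS];
        size_t head;    /* index of the next object to hand out */
        size_t count;   /* number of free objects */
    };

    static void freelist_init(struct freelist *fl)
    {
        for (size_t i = 0; i < POOL_OBJECTS; i++)
            fl->slots[i] = i;
        fl->head = 0;
        fl->count = POOL_OBJECTS;
    }

    /* Take the least recently freed object; false if the pool is exhausted. */
    static bool freelist_get(struct freelist *fl, size_t *obj)
    {
        if (fl->count == 0)
            return false;
        *obj = fl->slots[fl->head];
        fl->head = (fl->head + 1) % POOL_OBJECTS;
        fl->count--;
        return true;
    }

    /* Append a freed object at the tail so it is reused as late as possible. */
    static void freelist_put(struct freelist *fl, size_t obj)
    {
        fl->slots[(fl->head + fl->count) % POOL_OBJECTS] = obj;
        fl->count++;
    }

If objects 0 and 1 are allocated and then freed as 1, 0, the next allocations
return the never-used objects 2 and 3 first, and only then 1 and 0, keeping each
freed object protected for as long as possible.
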
Interface
---------

The following describes the functions which are used by allocators as well as
page handling code to set up and deal with KFENCE allocations.

.. kernel-doc:: include/linux/kfence.h
   :functions: is_kfence_address
               kfence_shutdown_cache
               kfence_alloc kfence_free __kfence_free
               kfence_ksize kfence_object_start
               kfence_handle_page_fault

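Because the pool is a single fixed-size region, a membership test in the spirit
of ``is_kfence_address()`` can be made cheap: with unsigned arithmetic, an
address below the pool start wraps around to a huge offset, so one comparison
checks both bounds. The helper below is a user-space sketch of that idea with a
hypothetical name and explicit pool parameters; it is not the kernel's
definition.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Hypothetical helper: true if addr lies in [pool_start,
     * pool_start + pool_size). If addr < pool_start, the unsigned
     * subtraction wraps to a value of at least pool_size, provided the pool
     * does not wrap around the end of the address space.
     */
    static bool addr_in_pool(uintptr_t addr, uintptr_t pool_start,
                             uintptr_t pool_size)
    {
        return addr - pool_start < pool_size;
    }
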
Related Tools
-------------

In userspace, a similar approach is taken by `GWP-ASan
<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another
similar but non-sampling approach, which also inspired the name "KFENCE", can be
found in the userspace `Electric Fence Malloc Debugger
<https://linux.die.net/man/3/efence>`_.

In the kernel, several tools exist to debug memory access errors, and in
particular KASAN can detect all bug classes that KFENCE can detect. KASAN is
more precise because it relies on compiler instrumentation, but this comes at a
performance cost.

It is worth highlighting that KASAN and KFENCE are complementary, with
different target environments. For instance, KASAN is the better debugging aid
where test cases or reproducers exist: because KFENCE has a lower chance of
detecting any given error, debugging with it would take more effort.
Deployments at scale that cannot afford to enable KASAN, however, would benefit
from using KFENCE to discover bugs in code paths not exercised by test cases or
fuzzers.