.. SPDX-License-Identifier: GPL-2.0

================
Page Table Check
================

Introduction
============

Page table check hardens the kernel by ensuring that some types of memory
corruption are prevented.

Page table check performs extra verification at the time new pages become
accessible from userspace, that is, when their page table entries (PTEs, PMDs,
etc.) are added to the page table.

For most detected corruptions, the kernel crashes. There is a small
performance and memory overhead associated with the page table check. Therefore,
it is disabled by default, but can optionally be enabled on systems where the
extra hardening outweighs the performance costs. Also, because page table check
is synchronous, it can help with debugging double-map memory corruption issues
by crashing the kernel at the moment the wrong mapping occurs, instead of later,
which is often the case with memory corruption bugs.

It can also be used to check page table entries for illegal combinations of
flags and to dump warnings when such combinations are detected. Currently,
userfaultfd is the only user of such checks: it sanity-checks the write-protect
(wr-protect) bit against any writable flags. An illegal flag combination does
not directly cause data corruption in this case, but it makes read-only data
writable, which leads to corruption when the page content is later modified.
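The flag check can be pictured with a short Python sketch. It is a model, not kernel code: the bit positions PTE_PRESENT, PTE_WRITE and PTE_UFFD_WP and the function check_pte_flags() are hypothetical, since real PTE layouts are architecture specific, and only present entries are modelled. The sketch warns when a present entry carries the userfaultfd write-protect marker while also being writable.

```python
import warnings

# Hypothetical bit positions used only by this model; real PTE layouts are
# architecture specific.
PTE_PRESENT = 1 << 0
PTE_WRITE = 1 << 1
PTE_UFFD_WP = 1 << 2


def check_pte_flags(pte: int) -> bool:
    """Return True if the flags are a legal combination, else warn and return False.

    Models the userfaultfd sanity check: a present entry that carries the
    write-protect (uffd-wp) marker must not also be writable.
    """
    present = bool(pte & PTE_PRESENT)
    uffd_wp = bool(pte & PTE_UFFD_WP)
    writable = bool(pte & PTE_WRITE)
    if present and uffd_wp and writable:
        # As described in the text, this is reported as a warning, not a crash.
        warnings.warn(f"illegal PTE flags {pte:#x}: uffd-wp set on a writable entry")
        return False
    return True
```

For example, an entry with PTE_PRESENT | PTE_UFFD_WP passes, while adding PTE_WRITE to it triggers the warning.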

Double mapping detection logic
==============================

+-------------------+-------------------+-------------------+------------------+
| Current Mapping   | New mapping       | Permissions       | Rule             |
+===================+===================+===================+==================+
| Anonymous         | Anonymous         | Read              | Allow            |
+-------------------+-------------------+-------------------+------------------+
| Anonymous         | Anonymous         | Read / Write      | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Anonymous         | Named             | Any               | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Named             | Anonymous         | Any               | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Named             | Named             | Any               | Allow            |
+-------------------+-------------------+-------------------+------------------+
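The rules in the table can be modelled with per-page counters. The following Python sketch illustrates the table under stated assumptions and is not the kernel implementation. The class and method names and the anon_map_count/file_map_count counters are hypothetical. The Permissions column is read as the permissions of the new mapping. State is keyed only by page frame number, with no VMA information. One entry may cover several pages: 1 for a PTE, or 512 for a 2 MiB PMD with 4 KiB pages.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


class DoubleMapError(Exception):
    """Raised where the model detects a prohibited double mapping."""


@dataclass
class PageCounters:
    # Hypothetical per-page state: how many anonymous and named entries map it.
    anon_map_count: int = 0
    file_map_count: int = 0


class PageTableCheckModel:
    """Tracks mappings per page frame number (pfn); no VMA information is used."""

    def __init__(self) -> None:
        self.pages = defaultdict(PageCounters)

    def check_entry(self, pfn: int, nr_pages: int, anon: bool,
                    writable: bool) -> Optional[str]:
        """Return an error message if the new entry breaks the table, else None."""
        for p in range(pfn, pfn + nr_pages):
            c = self.pages[p]
            if anon and c.file_map_count:
                return f"pfn {p:#x}: anonymous mapping of a named page"
            if anon and c.anon_map_count and writable:
                return f"pfn {p:#x}: writable mapping of an already-mapped anonymous page"
            if not anon and c.anon_map_count:
                return f"pfn {p:#x}: named mapping of an anonymous page"
        return None

    def set_entry(self, pfn: int, nr_pages: int, anon: bool,
                  writable: bool) -> None:
        """Account a new entry; nr_pages is 1 for a PTE, more for a PMD."""
        error = self.check_entry(pfn, nr_pages, anon, writable)
        if error:
            raise DoubleMapError(error)
        # Only update counters once every covered page has passed the check.
        for p in range(pfn, pfn + nr_pages):
            if anon:
                self.pages[p].anon_map_count += 1
            else:
                self.pages[p].file_map_count += 1

    def clear_entry(self, pfn: int, nr_pages: int, anon: bool) -> None:
        """Drop the accounting for an entry that is being removed."""
        field = "anon_map_count" if anon else "file_map_count"
        pfns = range(pfn, pfn + nr_pages)
        if any(getattr(self.pages[p], field) == 0 for p in pfns):
            raise DoubleMapError("clearing an entry that was never accounted")
        for p in pfns:
            setattr(self.pages[p], field, getattr(self.pages[p], field) - 1)
```

Checking every covered page before touching any counter keeps the model consistent when a multi-page entry is rejected part way through.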

Enabling Page Table Check
=========================

Build the kernel with:

- PAGE_TABLE_CHECK=y
  Note that it can only be enabled on platforms where
  ARCH_SUPPORTS_PAGE_TABLE_CHECK is available.

- Boot with the 'page_table_check=on' kernel parameter.

Optionally, build the kernel with PAGE_TABLE_CHECK_ENFORCED to have page table
check enabled without the extra kernel parameter.
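The enabling conditions above can be summarised as a small predicate. This Python sketch is hypothetical. The function name is invented for illustration, and so is the representation of the configuration as a set of Kconfig symbol names without the CONFIG_ prefix. It models only the two paths described in this section, and it does not model how the kernel treats other values of page_table_check=.

```python
def page_table_check_active(config: set, cmdline: str) -> bool:
    """Return True if page table check would be active in this model.

    config is the set of enabled Kconfig symbols, without the CONFIG_ prefix;
    cmdline is the kernel command line.
    """
    # PAGE_TABLE_CHECK can only be built where the architecture supports it.
    if not {"ARCH_SUPPORTS_PAGE_TABLE_CHECK", "PAGE_TABLE_CHECK"} <= config:
        return False
    # PAGE_TABLE_CHECK_ENFORCED enables the check without a kernel parameter.
    if "PAGE_TABLE_CHECK_ENFORCED" in config:
        return True
    # Otherwise the check must be requested at boot.
    return "page_table_check=on" in cmdline.split()
```

For example, a kernel built with PAGE_TABLE_CHECK=y alone and booted without the parameter keeps the check disabled.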

Implementation notes
====================

We specifically decided not to use VMA information in order to avoid relying on
MM states (except for limited "struct page" info). The page table check is a
state machine, separate from that of Linux-MM, that verifies that
user-accessible pages are not falsely shared.

PAGE_TABLE_CHECK depends on EXCLUSIVE_SYSTEM_RAM. The reason is that without
EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary physical memory
regions into userspace via /dev/mem. At the same time, pages may change
their properties (e.g., from anonymous pages to named pages) while they are
still mapped in userspace, leading to "corruption" detected by the page table
check.

Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may still be allowed to be mapped via
/dev/mem. However, these pages are always considered named pages, so they
won't break the logic used in the page table check.
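The /dev/mem reasoning above can be traced with a toy model. This Python sketch is an illustration rather than kernel code. The counters and function names are hypothetical, and writability is ignored. It only shows why a page whose classification changes while it is mapped produces a false report, whereas an I/O page, which is always treated as named, does not.

```python
from dataclasses import dataclass


@dataclass
class PageCounters:
    # Hypothetical per-page counters, as in a simple model of the checker.
    anon_map_count: int = 0
    file_map_count: int = 0


def may_add_mapping(c: PageCounters, anon: bool) -> bool:
    """Anonymous and named mappings of one page must not coexist."""
    return c.file_map_count == 0 if anon else c.anon_map_count == 0


def named_mapping_accepted_after_devmem(io_page: bool) -> bool:
    """Return True if a legitimate named mapping is accepted after /dev/mem.

    1. /dev/mem maps the page, and the model classifies it by the page's
       properties at that moment: an I/O page is always named, while a RAM
       page that happens to be anonymous is accounted as anonymous.
    2. A RAM page then changes its properties (anonymous -> named) while the
       /dev/mem mapping still exists; an I/O page never changes.
    3. The page is legitimately mapped as a named page.
    """
    c = PageCounters()
    if io_page:
        c.file_map_count += 1
    else:
        c.anon_map_count += 1
    # Step 2 does not touch the counters, so they still describe step 1.
    return may_add_mapping(c, anon=False)
```

Without EXCLUSIVE_SYSTEM_RAM, the RAM-page case is reachable and is reported as "corruption". With it, only the I/O-page case remains, and that case passes.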