Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature
found on Intel's Skylake (and later) "Scalable Processor" server CPUs.

Memory Protection Keys provide a mechanism for enforcing page-based
protections, but without requiring modification of the page tables
when an application changes protection domains. The feature works by
dedicating 4 previously ignored bits in each page table entry to a
"protection key", giving 16 possible keys.

There is also a new user-accessible register (PKRU) with two separate
bits (Access Disable and Write Disable) for each key. Being a CPU
register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.

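The two-bits-per-key layout described above can be sketched in C. This
is only an illustration: it assumes the architectural PKRU layout in
which key N's Access Disable bit is bit 2N and its Write Disable bit is
bit 2N+1, and the helper names (pkru_rights() and friends) are
hypothetical, not part of any kernel or libc interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKRU_BITS_PER_PKEY	2
#define PKRU_AD_BIT		0x1u	/* Access Disable: no reads or writes */
#define PKRU_WD_BIT		0x2u	/* Write Disable: reads only */
#define NR_PKEYS		16	/* 4 PTE bits give 16 keys */

/* Extract the two permission bits that a PKRU value holds for one key. */
static inline uint32_t pkru_rights(uint32_t pkru, int pkey)
{
	assert(pkey >= 0 && pkey < NR_PKEYS);
	return (pkru >> (pkey * PKRU_BITS_PER_PKEY)) &
	       (PKRU_AD_BIT | PKRU_WD_BIT);
}

/* Reads are allowed unless Access Disable is set. */
static inline bool pkru_allows_read(uint32_t pkru, int pkey)
{
	return !(pkru_rights(pkru, pkey) & PKRU_AD_BIT);
}

/* Writes are allowed only when neither bit is set. */
static inline bool pkru_allows_write(uint32_t pkru, int pkey)
{
	return !(pkru_rights(pkru, pkey) & (PKRU_AD_BIT | PKRU_WD_BIT));
}
```

For example, with a PKRU value of 0x8 only key 1's Write Disable bit is
set, so pkru_allows_read(0x8, 1) is true while pkru_allows_write(0x8, 1)
is false. These helpers only interpret a PKRU value; they do not read
the register.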
There are two new instructions (RDPKRU/WRPKRU) for reading and writing
to the new register. The feature is only available in 64-bit mode,
even though there is theoretically space in the PAE PTEs. These
permissions are enforced on data access only and have no effect on
instruction fetches.

=========================== Syscalls ===========================

There are 3 system calls which directly interact with pkeys:

	int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
	int pkey_free(int pkey);
	int pkey_mprotect(unsigned long start, size_t len,
			  unsigned long prot, int pkey);

Before a pkey can be used, it must first be allocated with
pkey_alloc(). An application calls the WRPKRU instruction
directly in order to change access permissions to memory covered
with a key. In this example WRPKRU is wrapped by a C function
called pkey_set().

	int real_prot = PROT_READ|PROT_WRITE;
	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
	... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access:

	pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
	*ptr = foo; // assign something
	pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

Now when it frees the memory, it will also free the pkey since it
is no longer in use:

	munmap(ptr, PAGE_SIZE);
	pkey_free(pkey);

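The fragments above can be assembled into one function with error
handling. This is a sketch, not a reference implementation: it assumes
the glibc wrappers pkey_alloc(), pkey_mprotect(), pkey_set() and
pkey_free() from <sys/mman.h> (glibc 2.27 or later with _GNU_SOURCE),
and pkey_demo() is a hypothetical name. On CPUs or kernels without
protection key support, pkey_alloc() fails and the function gives up.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 on success, 1 if no pkey could be allocated, -1 on error. */
int pkey_demo(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	int pkey, ret = -1;
	int *ptr;

	assert(page_size > 0);

	/* The new key starts out with writes disabled for this thread. */
	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
	if (pkey < 0) {
		perror("pkey_alloc");	/* e.g. no hardware or kernel support */
		return 1;
	}

	ptr = mmap(NULL, page_size, PROT_NONE,
		   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (ptr == MAP_FAILED) {
		perror("mmap");
		goto out_free;
	}

	/* Page-table permissions allow read/write; the pkey restricts them. */
	if (pkey_mprotect(ptr, page_size, PROT_READ | PROT_WRITE, pkey) < 0) {
		perror("pkey_mprotect");
		goto out_unmap;
	}

	if (pkey_set(pkey, 0) < 0)			/* open a write window */
		goto out_unmap;
	*ptr = 42;
	if (pkey_set(pkey, PKEY_DISABLE_WRITE) < 0)	/* close it again */
		goto out_unmap;

	ret = (*ptr == 42) ? 0 : -1;	/* reads are still permitted */

out_unmap:
	munmap(ptr, page_size);
out_free:
	pkey_free(pkey);
	return ret;
}
```

Note that pkey_set() only changes the calling thread's PKRU, so other
threads keep whatever rights they already had for the key.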
(Note: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
An example implementation can be found in
tools/testing/selftests/x86/protection_keys.c)

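To give a rough idea of what such a wrapper does, here is a minimal
sketch. It is not the selftest's code: the names pkru_with_rights(),
rdpkru(), wrpkru() and sketch_pkey_set() are hypothetical, and the
instructions are emitted as raw opcode bytes on the assumption that the
assembler may not know the mnemonics. Executing RDPKRU or WRPKRU on a
CPU without protection keys enabled raises an invalid-opcode fault, so
real code has to check for support first.

```c
#include <assert.h>
#include <stdint.h>

#ifndef PKEY_DISABLE_ACCESS
#define PKEY_DISABLE_ACCESS	0x1
#endif
#ifndef PKEY_DISABLE_WRITE
#define PKEY_DISABLE_WRITE	0x2
#endif

/* Pure helper: return pkru with the two bits for pkey replaced by rights. */
static inline uint32_t pkru_with_rights(uint32_t pkru, int pkey,
					unsigned int rights)
{
	unsigned int shift;

	assert(pkey >= 0 && pkey < 16);
	shift = (unsigned int)pkey * 2;
	pkru &= ~(0x3u << shift);
	pkru |= (rights & 0x3u) << shift;
	return pkru;
}

#if defined(__x86_64__)
/* RDPKRU: requires ECX = 0, returns PKRU in EAX and clears EDX. */
static inline uint32_t rdpkru(void)
{
	uint32_t eax, edx;

	__asm__ volatile(".byte 0x0f,0x01,0xee"
			 : "=a" (eax), "=d" (edx)
			 : "c" (0));
	return eax;
}

/* WRPKRU: writes EAX to PKRU, requires ECX = 0 and EDX = 0. */
static inline void wrpkru(uint32_t pkru)
{
	__asm__ volatile(".byte 0x0f,0x01,0xef"
			 : : "a" (pkru), "c" (0), "d" (0)
			 : "memory");
}

/* Read-modify-write of the calling thread's PKRU for a single key. */
static inline void sketch_pkey_set(int pkey, unsigned int rights)
{
	wrpkru(pkru_with_rights(rdpkru(), pkey, rights));
}
#endif
```

Keeping the bit manipulation in a separate pure function makes it easy
to check without touching the register: for instance,
pkru_with_rights(0, 1, PKEY_DISABLE_WRITE) yields 0x8.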
=========================== Behavior ===========================

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect(). For instance if you do this:

	mprotect(ptr, size, PROT_NONE);
	something(ptr);

you can expect the same effects with protection keys when doing this:

	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_ACCESS);
	pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
	something(ptr);

That should be true whether something() is a direct access to 'ptr'
like:

	*ptr = foo;

or when the kernel does the access on the application's behalf like
with a read():

	read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
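
A signal handler can tell the two cases apart by inspecting si_code.
The following is a sketch under stated assumptions: segv_cause() and
probe_write() are hypothetical helpers, the fallback value 4 for
SEGV_PKERR is only used when the libc headers do not define it, and
jumping out of a SIGSEGV handler with siglongjmp() is done here purely
to keep the demonstration short.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>

#ifndef SEGV_PKERR
#define SEGV_PKERR 4	/* assumed fallback for older libc headers */
#endif

static sigjmp_buf segv_env;
static volatile sig_atomic_t last_si_code;

/* Record why the fault happened, then resume in probe_write(). */
static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	last_si_code = info->si_code;
	siglongjmp(segv_env, 1);
}

/* Map a SIGSEGV si_code to a human-readable cause. */
const char *segv_cause(int si_code)
{
	switch (si_code) {
	case SEGV_PKERR:
		return "protection key";
	case SEGV_ACCERR:
		return "page permissions";
	case SEGV_MAPERR:
		return "unmapped address";
	default:
		return "other";
	}
}

/* Write one byte to addr; return 0 on success or the fault's si_code. */
int probe_write(volatile char *addr)
{
	struct sigaction sa, old;
	int code = 0;

	assert(addr != NULL);
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old) < 0)
		return -1;

	if (sigsetjmp(segv_env, 1) == 0)
		*addr = 1;		/* may fault */
	else
		code = last_si_code;	/* we got here from the handler */

	sigaction(SIGSEGV, &old, NULL);
	return code;
}
```

Calling probe_write() on a page that was write-protected with plain
mprotect() returns SEGV_ACCERR, while on a page whose pkey denies the
write the expected result is SEGV_PKERR.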