Commit | Line | Data |
---|---|---|
bf15d79c AP |
1 | .. SPDX-License-Identifier: GPL-2.0 |
2 | ||
3 | ============== | |
4 | Nitro Enclaves | |
5 | ============== | |
6 | ||
7 | Overview | |
8 | ======== | |
9 | ||
10 | Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability | |
11 | that allows customers to carve out isolated compute environments within EC2 | |
12 | instances [1]. | |
13 | ||
14 | For example, an application that processes sensitive data and runs in a VM | |
15 | can be separated from other applications running in the same VM. This | |
16 | application then runs in a VM separate from the primary VM, namely an enclave. | |
cfa3c18c AP |
17 | It runs alongside the VM that spawned it. This setup suits the needs of |
18 | low-latency applications. | |
bf15d79c | 19 | |
cfa3c18c AP |
20 | The architectures currently supported by the NE kernel driver, available in the |
21 | upstream Linux kernel, are x86 and ARM64. | |
22 | ||
23 | The resources that are allocated for the enclave, such as memory and CPUs, are | |
24 | carved out of the primary VM. Each enclave is mapped to a process running in the | |
25 | primary VM, which communicates with the NE kernel driver via an ioctl interface. | |
bf15d79c AP |
26 | |
27 | In this sense, there are two components: | |
28 | ||
29 | 1. An enclave abstraction process - a user space process running in the primary | |
30 | VM guest that uses the provided ioctl interface of the NE driver to spawn an | |
31 | enclave VM (that's 2 below). | |
32 | ||
33 | An NE emulated PCI device is exposed to the primary VM. The driver for this | |
34 | new PCI device is included in the NE driver. | |
35 | ||
36 | The ioctl logic is mapped to PCI device commands, e.g. the NE_START_ENCLAVE ioctl | |
37 | maps to an enclave start PCI command. The PCI device commands are then | |
38 | translated into actions taken on the hypervisor side; that's the Nitro | |
39 | hypervisor running on the host where the primary VM is running. The Nitro | |
40 | hypervisor is based on core KVM technology. | |
41 | ||
42 | 2. The enclave itself - a VM running on the same host as the primary VM that | |
43 | spawned it. Memory and CPUs are carved out of the primary VM and are dedicated | |
44 | to the enclave VM. An enclave does not have persistent storage attached. | |
45 | ||
46 | The memory regions carved out of the primary VM and given to an enclave need to | |
47 | be aligned, physically contiguous 2 MiB / 1 GiB memory regions (or a multiple of | |
48 | this size, e.g. 8 MiB). The memory can be allocated e.g. by using hugetlbfs from | |
cfa3c18c AP |
49 | user space [2][3][7]. The memory size for an enclave needs to be at least |
50 | 64 MiB. The enclave memory and CPUs need to be from the same NUMA node. | |
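The size and alignment constraints above can be sketched as a small user space pre-check. This is only an illustration, not the NE driver's validation logic: the function name and the (address, size) input format are hypothetical, alignment is checked against 2 MiB only (a 1 GiB-aligned region is also 2 MiB-aligned), and physical contiguity and NUMA placement are not checked here.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

# Values taken from the text above.
REGION_GRANULE = 2 * MIB        # smallest region size / alignment
MIN_ENCLAVE_MEMORY = 64 * MIB   # minimum total enclave memory


def check_memory_regions(regions):
    """Return a list of problems for (address, size) pairs given in bytes.

    Hypothetical helper: an empty list means the basic size and
    alignment constraints described in the text are met.
    """
    problems = []
    for address, size in regions:
        if address % REGION_GRANULE != 0:
            problems.append(f"region at {address:#x} is not 2 MiB aligned")
        if size == 0 or size % REGION_GRANULE != 0:
            problems.append(f"region at {address:#x} has size {size}, "
                            "not a non-zero multiple of 2 MiB")
    total = sum(size for _address, size in regions)
    if total < MIN_ENCLAVE_MEMORY:
        problems.append(f"total memory {total} is below 64 MiB")
    return problems
```

For example, a single 8 MiB region passes the per-region checks but is reported as too small overall, since the total is below 64 MiB.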
bf15d79c AP |
51 | |
52 | An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain | |
53 | available for the primary VM. A CPU pool has to be set for NE purposes by a | |
54 | user with admin capability. See the cpu list section from the kernel | |
55 | documentation [4] for the CPU pool format. | |
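As a rough illustration of the CPU pool rule, the sketch below parses a simplified cpu list and reports any CPUs in it that must stay with the primary VM. It is an assumption-laden example: only the plain "N" and "N-M" forms of the cpu list format are handled, both function names are hypothetical, and the siblings of CPU 0 are passed in by the caller rather than read from the system.

```python
def parse_cpu_list(cpu_list):
    """Parse a simplified cpu list such as "1-3,5" into a set of CPU ids.

    Only single CPUs and inclusive ranges are supported in this sketch.
    """
    cpus = set()
    for part in cpu_list.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(value) for value in part.split("-", 1))
            if first > last:
                raise ValueError(f"invalid CPU range: {part}")
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return cpus


def reserved_cpus_in_pool(cpu_list, cpu0_siblings=(0,)):
    """Return the CPUs in the pool that must remain with the primary VM.

    cpu0_siblings is supplied by the caller and should contain CPU 0
    and its siblings; an empty result means the pool respects the rule.
    """
    return sorted(parse_cpu_list(cpu_list) & set(cpu0_siblings))
```

With CPU 0 and CPU 4 as siblings, a pool of "2-3,6-7" yields an empty list, while "1,4" yields [4].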
56 | ||
57 | An enclave communicates with the primary VM via a local communication channel, | |
58 | using virtio-vsock [5]. The primary VM has a virtio-pci vsock emulated device, | |
59 | while the enclave VM has a virtio-mmio vsock emulated device. The vsock device | |
60 | uses eventfd for signaling. The enclave VM sees the usual interfaces - local | |
61 | APIC and IOAPIC - to get interrupts from the virtio-vsock device. The virtio-mmio | |
62 | device is placed in memory below the typical 4 GiB. | |
63 | ||
64 | The application that runs in the enclave needs to be packaged in an enclave | |
65 | image together with the OS (e.g. kernel, ramdisk, init) that will run in the | |
66 | enclave VM. The enclave VM has its own kernel and follows the standard Linux | |
cfa3c18c | 67 | boot protocol [6][8]. |
bf15d79c AP |
68 | |
69 | The kernel bzImage, the kernel command line and the ramdisk(s) are part of the | |
70 | Enclave Image Format (EIF), together with an EIF header that includes metadata | |
71 | such as magic number, EIF version, image size and CRC. | |
72 | ||
73 | Hash values are computed for the entire enclave image (EIF), the kernel and | |
74 | ramdisk(s). These are used, for example, to check that the enclave image that is | |
75 | loaded in the enclave VM is the one that was intended to be run. | |
76 | ||
77 | These crypto measurements are included in a signed attestation document | |
78 | generated by the Nitro Hypervisor and further used to prove the identity of the | |
79 | enclave; KMS is an example of a service that NE is integrated with and that checks | |
80 | the attestation document. | |
81 | ||
82 | The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The | |
83 | init process in the enclave connects to the vsock CID of the primary VM and a | |
84 | predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is | |
85 | used to check in the primary VM that the enclave has booted. The CID of the | |
86 | primary VM is 3. | |
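The primary VM side of this boot check could look roughly like the sketch below. It assumes the heartbeat arrives as a single byte and that a timeout is acceptable; neither detail is stated in the text above, and both function names are hypothetical. The port and heartbeat value are the ones given in the text.

```python
import socket

HEARTBEAT_PORT = 9000    # predefined port from the text
HEARTBEAT_VALUE = 0xB7   # heartbeat value from the text


def read_heartbeat(conn):
    """Read one byte from a connected socket and compare it with 0xb7.

    Assumption: the heartbeat is transferred as a single byte.
    """
    data = conn.recv(1)
    return len(data) == 1 and data[0] == HEARTBEAT_VALUE


def wait_for_heartbeat(timeout_s=30.0):
    """Listen on the vsock heartbeat port and check the first connection.

    Runs in the primary VM; requires AF_VSOCK support (Linux only).
    Raises socket.timeout if no enclave connects in time.
    """
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as server:
        server.settimeout(timeout_s)
        server.bind((socket.VMADDR_CID_ANY, HEARTBEAT_PORT))
        server.listen(1)
        conn, _peer = server.accept()
        with conn:
            conn.settimeout(timeout_s)
            return read_heartbeat(conn)
```

Keeping the byte check in read_heartbeat() separate from the vsock setup makes it possible to exercise it with any connected stream socket, without a running enclave.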
87 | ||
88 | If the enclave VM crashes or gracefully exits, an interrupt event is received by | |
89 | the NE driver. This event is sent further to the user space enclave process | |
90 | running in the primary VM via a poll notification mechanism. Then the user space | |
91 | enclave process can exit. | |
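A user space process could wait for this notification along the lines of the sketch below. The text only says that a poll mechanism is used, so treating the event as a hang-up (POLLHUP) on the enclave file descriptor is an assumption of this example, and the function name is hypothetical.

```python
import select


def wait_for_enclave_exit(enclave_fd, timeout_ms=None):
    """Poll a file descriptor and report whether a hang-up was signalled.

    Assumption: the enclave exit event is reported as POLLHUP.
    Returns False if the timeout expires without a hang-up;
    timeout_ms=None blocks until an event arrives.
    """
    poller = select.poll()
    poller.register(enclave_fd, select.POLLHUP)
    events = poller.poll(timeout_ms)
    return any(revents & select.POLLHUP for _fd, revents in events)
```

The same call can be tried on the read end of a pipe, which reports a hang-up once the write end is closed.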
92 | ||
93 | [1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/ | |
94 | [2] https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html | |
95 | [3] https://lwn.net/Articles/807108/ | |
96 | [4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html | |
97 | [5] https://man7.org/linux/man-pages/man7/vsock.7.html | |
98 | [6] https://www.kernel.org/doc/html/latest/x86/boot.html | |
cfa3c18c AP |
99 | [7] https://www.kernel.org/doc/html/latest/arm64/hugetlbpage.html |
100 | [8] https://www.kernel.org/doc/html/latest/arm64/booting.html |