.. SPDX-License-Identifier: GPL-2.0

==============
Nitro Enclaves
==============

Overview
========

Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
that allows customers to carve out isolated compute environments within EC2
instances [1].

For example, an application that processes sensitive data and runs in a VM can
be separated from other applications running in the same VM. This application
then runs in a VM separate from the primary VM, namely an enclave. It runs
alongside the VM that spawned it. This setup matches the needs of low-latency
applications.

The architectures currently supported by the NE kernel driver, available in the
upstream Linux kernel, are x86 and ARM64.

The resources that are allocated for the enclave, such as memory and CPUs, are
carved out of the primary VM. Each enclave is mapped to a process running in the
primary VM that communicates with the NE kernel driver via an ioctl interface.

In this sense, there are two components:

1. An enclave abstraction process - a user space process running in the primary
VM guest that uses the provided ioctl interface of the NE driver to spawn an
enclave VM (that's 2 below).

An NE emulated PCI device is exposed to the primary VM. The driver for this
new PCI device is included in the NE driver.

The ioctl logic is mapped to PCI device commands, e.g. the NE_START_ENCLAVE
ioctl maps to an enclave start PCI command. The PCI device commands are then
translated into actions taken on the hypervisor side, that is, by the Nitro
hypervisor running on the host where the primary VM is running. The Nitro
hypervisor is based on core KVM technology.

2. The enclave itself - a VM running on the same host as the primary VM that
spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
to the enclave VM. An enclave does not have persistent storage attached.

The memory regions carved out of the primary VM and given to an enclave need to
be physically contiguous, 2 MiB or 1 GiB aligned regions (or a multiple of this
size, e.g. 8 MiB). The memory can be allocated, for example, by using hugetlbfs
from user space [2][3][7]. The memory size for an enclave needs to be at least
64 MiB. The enclave memory and CPUs need to be from the same NUMA node.
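
The size and alignment rules above can be sketched as a small user space check.
This is only an illustration under stated assumptions: the function names and
the ``(address, size)`` tuple representation are hypothetical, and neither
physical contiguity nor NUMA placement can be verified this way.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

MIN_ENCLAVE_MEMORY = 64 * MIB  # minimum total enclave memory, per the text
REGION_ALIGN = 2 * MIB         # 1 GiB is a multiple of 2 MiB, so one check covers both


def region_is_valid(addr, size):
    """Return True if the region starts on a 2 MiB boundary and its size
    is a non-zero multiple of 2 MiB (hypothetical helper)."""
    return size > 0 and addr % REGION_ALIGN == 0 and size % REGION_ALIGN == 0


def enclave_memory_is_valid(regions):
    """regions: iterable of (address, size) tuples, an assumed representation.

    Checks every region individually and the 64 MiB minimum on the total.
    """
    regions = list(regions)
    total = sum(size for _addr, size in regions)
    return total >= MIN_ENCLAVE_MEMORY and all(
        region_is_valid(addr, size) for addr, size in regions
    )
```

For instance, two 32 MiB regions at 2 MiB aligned addresses pass the check,
while a single 32 MiB region fails the 64 MiB minimum.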

An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
available for the primary VM. A CPU pool has to be set for NE purposes by a
user with admin capability. See the cpu list section from the kernel
documentation [4] for the format of a CPU pool.
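
As an illustration of the CPU pool constraint, the sketch below parses a
simplified cpu list (comma-separated CPU numbers and ``first-last`` ranges
only, not the full syntax described in [4]) and rejects pools that include
CPU 0 or one of its siblings. The function names are hypothetical, and the
sibling set is passed in by the caller because it depends on the CPU topology
of the instance.

```python
def parse_cpu_list(text):
    """Parse a simplified cpu list such as "1,3-5" into a set of CPU numbers.

    Only plain numbers and inclusive ranges are handled here; the kernel
    format allows more (e.g. strides), which this sketch does not cover.
    """
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError("invalid CPU range: " + part)
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return cpus


def pool_is_allowed(text, cpu0_siblings):
    """Return True if the pool is non-empty and leaves CPU 0 and its
    siblings to the primary VM. cpu0_siblings is supplied by the caller,
    e.g. read from the CPU topology information of the system."""
    pool = parse_cpu_list(text)
    reserved = set(cpu0_siblings) | {0}
    return bool(pool) and not (pool & reserved)
```

With CPU 4 as the sibling of CPU 0, a pool of ``2-3,6-7`` is accepted, while
``0-1`` and ``4,5`` are rejected.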

An enclave communicates with the primary VM via a local communication channel,
using virtio-vsock [5]. The primary VM has a virtio-pci vsock emulated device,
while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
uses eventfd for signaling. The enclave VM sees the usual interfaces - local
APIC and IOAPIC - to get interrupts from the virtio-vsock device. The
virtio-mmio device is placed in memory below the typical 4 GiB.

The application that runs in the enclave needs to be packaged in an enclave
image together with the OS (e.g. kernel, ramdisk, init) that will run in the
enclave VM. The enclave VM has its own kernel and follows the standard Linux
boot protocol [6][8].

The kernel bzImage, the kernel command line and the ramdisk(s) are part of the
Enclave Image Format (EIF), together with an EIF header that includes metadata
such as magic number, EIF version, image size and CRC.

Hash values are computed for the entire enclave image (EIF), the kernel and
ramdisk(s). These are used, for example, to check that the enclave image that is
loaded in the enclave VM is the one that was intended to be run.
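
The idea of comparing measurements against expected values can be sketched as
follows. This is a minimal illustration, not the actual measurement procedure:
the choice of SHA-384 as the default, the function names and the dictionary
layout are all assumptions, since the text above does not specify them.

```python
import hashlib
import hmac


def measure(data, algorithm="sha384"):
    """Return the hex digest of a byte string.

    The default algorithm is an assumption made for this sketch.
    """
    return hashlib.new(algorithm, data).hexdigest()


def measure_image(image, kernel, ramdisks, algorithm="sha384"):
    """Compute one digest for the whole image, one for the kernel and one
    per ramdisk (hypothetical layout of the result)."""
    return {
        "image": measure(image, algorithm),
        "kernel": measure(kernel, algorithm),
        "ramdisks": [measure(r, algorithm) for r in ramdisks],
    }


def matches_expected(measured, expected):
    """Compare two hex digests in constant time."""
    return hmac.compare_digest(measured.lower(), expected.lower())
```

A caller would compute the measurements for the image it intends to run and
compare them with the values reported for the running enclave.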

These cryptographic measurements are included in a signed attestation document
generated by the Nitro Hypervisor and further used to prove the identity of the
enclave; KMS is an example of a service that NE is integrated with and that
checks the attestation document.

The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
init process in the enclave connects to the vsock CID of the primary VM and a
predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
used to check in the primary VM that the enclave has booted. The CID of the
primary VM is 3.
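
The primary VM side of the heartbeat check described above could look roughly
like the sketch below. It assumes that the process in the primary VM listens
on the predefined port and treats a single received byte equal to 0xb7 as
proof that the enclave has booted; the function names and the timeout handling
are hypothetical.

```python
import socket

HEARTBEAT_PORT = 9000    # predefined port, per the text
HEARTBEAT_VALUE = 0xB7   # heartbeat value, per the text


def read_heartbeat(conn):
    """Read one byte from a connected socket and check the heartbeat value."""
    data = conn.recv(1)
    return len(data) == 1 and data[0] == HEARTBEAT_VALUE


def wait_for_enclave_boot(timeout_s=30.0):
    """Listen on the vsock heartbeat port in the primary VM (assumed flow).

    Returns True if a peer connects and sends the heartbeat value within
    the timeout, False otherwise. Requires AF_VSOCK support.
    """
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as srv:
        srv.settimeout(timeout_s)
        srv.bind((socket.VMADDR_CID_ANY, HEARTBEAT_PORT))
        srv.listen(1)
        try:
            conn, _peer = srv.accept()
        except socket.timeout:
            return False
        with conn:
            conn.settimeout(timeout_s)
            try:
                return read_heartbeat(conn)
            except socket.timeout:
                return False
```

``read_heartbeat()`` is kept separate from the vsock setup so that the check
itself can be exercised with any connected stream socket.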

If the enclave VM crashes or gracefully exits, an interrupt event is received by
the NE driver. This event is forwarded to the user space enclave process
running in the primary VM via a poll notification mechanism. The user space
enclave process can then exit.
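
A user space process can wait for such a notification with poll(). The sketch
below is an illustration only: it assumes the process already holds a file
descriptor for the enclave and that the exit event is reported as a hang-up
(POLLHUP) condition on it, which the text above does not state explicitly. The
function name is hypothetical.

```python
import select


def enclave_exited(fd, timeout_ms):
    """Poll a file descriptor and report whether a hang-up or error
    condition was signalled within the timeout (assumed exit signal)."""
    poller = select.poll()
    poller.register(fd, select.POLLIN | select.POLLHUP | select.POLLERR)
    for _fd, mask in poller.poll(timeout_ms):
        if mask & (select.POLLHUP | select.POLLERR):
            return True
    return False
```

A monitoring loop would call this repeatedly, or with a longer timeout, and
clean up and exit once it returns True.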

[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
[2] https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html
[3] https://lwn.net/Articles/807108/
[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
[5] https://man7.org/linux/man-pages/man7/vsock.7.html
[6] https://www.kernel.org/doc/html/latest/x86/boot.html
[7] https://www.kernel.org/doc/html/latest/arm64/hugetlbpage.html
[8] https://www.kernel.org/doc/html/latest/arm64/booting.html