
Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature
which will be found on future Intel CPUs.

Memory Protection Keys provides a mechanism for enforcing page-based
protections, but without requiring modification of the page tables
when an application changes protection domains.  It works by
dedicating 4 previously ignored bits in each page table entry to a
"protection key", giving 16 possible keys.

There is also a new user-accessible register (PKRU) with two separate
bits (Access Disable and Write Disable) for each key.  Being a CPU
register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.
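
As a rough sketch of the layout described above: the two bits for key N sit
at bit positions 2N (Access Disable) and 2N+1 (Write Disable) of the 32-bit
PKRU value.  The helper and macro names below are hypothetical and exist
only for illustration; the two bit values mirror PKEY_DISABLE_ACCESS and
PKEY_DISABLE_WRITE, redefined locally so the snippet is self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names; values mirror PKEY_DISABLE_ACCESS / PKEY_DISABLE_WRITE. */
#define PKRU_AD			0x1u	/* Access Disable */
#define PKRU_WD			0x2u	/* Write Disable */
#define PKRU_BITS_PER_PKEY	2
#define NR_PKEYS		16

/* Return 'pkru' with the two bits belonging to 'pkey' replaced by 'rights'. */
static inline uint32_t pkru_set_rights(uint32_t pkru, int pkey, uint32_t rights)
{
	assert(pkey >= 0 && pkey < NR_PKEYS);
	assert((rights & ~(PKRU_AD | PKRU_WD)) == 0);

	unsigned int shift = (unsigned int)pkey * PKRU_BITS_PER_PKEY;

	pkru &= ~((PKRU_AD | PKRU_WD) << shift);	/* clear the old bits */
	pkru |= rights << shift;			/* install the new bits */
	return pkru;
}

/* Extract the two rights bits belonging to 'pkey' from 'pkru'. */
static inline uint32_t pkru_get_rights(uint32_t pkru, int pkey)
{
	assert(pkey >= 0 && pkey < NR_PKEYS);

	return (pkru >> ((unsigned int)pkey * PKRU_BITS_PER_PKEY)) &
	       (PKRU_AD | PKRU_WD);
}
```

For example, pkru_set_rights(0, 1, PKRU_WD) yields 0x8: only the Write
Disable bit of key 1 is set.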

There are two new instructions (RDPKRU/WRPKRU) for reading and writing
to the new register.  The feature is only available in 64-bit mode,
even though there is theoretically space in the PAE PTEs.  These
permissions are enforced on data access only and have no effect on
instruction fetches.
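
A minimal sketch of how the two instructions might be wrapped in C follows.
It assumes GCC or Clang on x86-64 and emits the opcodes as raw bytes so
that older assemblers can build it.  The function names are illustrative,
and the CPUID check (leaf 7, subleaf 0, ECX bit 4, "OSPKE") is included
because executing RDPKRU or WRPKRU without OS support raises an invalid
opcode exception.  On other architectures the stubs simply report that the
feature is absent.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <cpuid.h>

/* Returns 1 if the OS has enabled protection keys (OSPKE), else 0. */
static inline int pku_os_enabled(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ecx >> 4) & 1;
}

/* RDPKRU: requires ECX = 0, returns PKRU in EAX and zeroes EDX. */
static inline uint32_t rdpkru(void)
{
	uint32_t eax, edx;

	__asm__ volatile(".byte 0x0f,0x01,0xee"
			 : "=a" (eax), "=d" (edx)
			 : "c" (0));
	(void)edx;
	return eax;
}

/* WRPKRU: requires ECX = 0 and EDX = 0, loads PKRU from EAX. */
static inline void wrpkru(uint32_t pkru)
{
	__asm__ volatile(".byte 0x0f,0x01,0xef"
			 : /* no outputs */
			 : "a" (pkru), "c" (0), "d" (0)
			 : "memory");
}
#else
/* Stubs for non-x86-64 builds: the feature is reported as unavailable. */
static inline int pku_os_enabled(void) { return 0; }
static inline uint32_t rdpkru(void) { return 0; }
static inline void wrpkru(uint32_t pkru) { (void)pkru; }
#endif

/* Writing back the value just read must leave PKRU unchanged. */
static inline void pkru_roundtrip_check(void)
{
	if (!pku_os_enabled())
		return;

	uint32_t pkru = rdpkru();

	wrpkru(pkru);
	assert(rdpkru() == pkru);
}
```

The "memory" clobber on WRPKRU keeps the compiler from moving loads and
stores across the permission change.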

=========================== Syscalls ===========================

There are 3 system calls which directly interact with pkeys:

	int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
	int pkey_free(int pkey);
	int pkey_mprotect(unsigned long start, size_t len,
			  unsigned long prot, int pkey);

Before a pkey can be used, it must first be allocated with
pkey_alloc().  An application calls the WRPKRU instruction
directly in order to change access permissions to memory covered
with a key.  In this example WRPKRU is wrapped by a C function
called pkey_set().

	int real_prot = PROT_READ|PROT_WRITE;
	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
	... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access:

	pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
	*ptr = foo; // assign something
	pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

Now when it frees the memory, it will also free the pkey since it
is no longer in use:

	munmap(ptr, PAGE_SIZE);
	pkey_free(pkey);

(Note: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
 An example implementation can be found in
 tools/testing/selftests/x86/protection_keys.c)
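
The whole sequence above can be sketched as one function.  This version
assumes glibc 2.27 or later, which ships its own pkey_alloc(),
pkey_mprotect(), pkey_set() and pkey_free() wrappers in <sys/mman.h>; that
is an assumption of this sketch, not something this document relies on.
The function name pkey_roundtrip() is hypothetical.  Since pkey_alloc()
fails on kernels or CPUs without protection keys, the sketch treats that
as "unavailable" rather than as a fatal error.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Store 'value' in a pkey-protected page and read it back.
 * Returns the value read, or -1 if protection keys are unavailable
 * or the mapping could not be set up.
 */
static int pkey_roundtrip(int value)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
	int pkey, ret;
	int *ptr;

	assert(value >= 0);	/* -1 is reserved for the error return */

	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
	if (pkey < 0)
		return -1;

	ptr = mmap(NULL, page_size, PROT_NONE,
		   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (ptr == MAP_FAILED) {
		pkey_free(pkey);
		return -1;
	}
	if (pkey_mprotect(ptr, page_size, PROT_READ | PROT_WRITE, pkey) != 0) {
		munmap(ptr, page_size);
		pkey_free(pkey);
		return -1;
	}

	pkey_set(pkey, 0);			/* clear PKEY_DISABLE_WRITE */
	*ptr = value;
	pkey_set(pkey, PKEY_DISABLE_WRITE);	/* deny writes again */

	ret = *ptr;				/* reads are still permitted */

	munmap(ptr, page_size);
	pkey_free(pkey);
	return ret;
}
```

Unmapping before pkey_free() follows the order used above: the key is
released only once no mapping refers to it.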

=========================== Behavior ===========================

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect().  For instance if you do this:

	mprotect(ptr, size, PROT_NONE);
	something(ptr);

you can expect the same effects with protection keys when doing this:

	pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);
	pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
	something(ptr);

That should be true whether something() is a direct access to 'ptr'
like:

	*ptr = foo;

or when the kernel does the access on the application's behalf like
with a read():

	read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
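
A SIGSEGV handler installed with SA_SIGINFO can tell the two cases apart
by looking at si_code.  The sketch below is illustrative only: the
function names are hypothetical, SEGV_PKERR is defined as 4 (its Linux
value) when the C library headers predate it, and the handler recovers
with siglongjmp() purely so that the probe can report what happened.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>

#ifndef SEGV_PKERR
#define SEGV_PKERR 4	/* Linux value; fallback for older libc headers */
#endif

static sigjmp_buf segv_jmp;
static volatile sig_atomic_t segv_si_code;

/* Record why the fault happened, then jump back to the probe. */
static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	segv_si_code = info->si_code;
	siglongjmp(segv_jmp, 1);
}

/* Map an si_code to a human-readable cause. */
static const char *segv_cause(int si_code)
{
	switch (si_code) {
	case SEGV_PKERR:
		return "protection key";
	case SEGV_ACCERR:
		return "page permissions";
	default:
		return "other";
	}
}

/*
 * Try to write one byte to 'addr'.  Returns 0 if the write succeeded,
 * otherwise the si_code of the resulting SIGSEGV.
 */
static int probe_write(volatile char *addr)
{
	struct sigaction sa, old;
	int rc, code = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	rc = sigaction(SIGSEGV, &sa, &old);
	assert(rc == 0);

	if (sigsetjmp(segv_jmp, 1) == 0)
		*addr = 1;		/* may fault */
	else
		code = segv_si_code;	/* reached via siglongjmp() */

	rc = sigaction(SIGSEGV, &old, NULL);
	assert(rc == 0);
	return code;
}
```

Probing a page mapped with PROT_NONE should report SEGV_ACCERR; probing a
page whose pkey denies the write would be expected to report SEGV_PKERR.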