======================
Kernel Self-Protection
======================

Kernel self-protection is the design and implementation of systems and
structures within the Linux kernel to protect against security flaws in
the kernel itself. This covers a wide range of issues, including removing
entire classes of bugs, blocking security flaw exploitation methods,
and actively detecting attack attempts. Not all topics are explored in
this document, but it should serve as a reasonable starting point and
answer any frequently asked questions. (Patches welcome, of course!)

In the worst-case scenario, we assume an unprivileged local attacker
has arbitrary read and write access to the kernel's memory. In many
cases, bugs being exploited will not provide this level of access,
but with systems in place that defend against the worst case we'll
cover the more limited cases as well. A higher bar, and one that should
still be kept in mind, is protecting the kernel against a *privileged*
local attacker, since the root user has access to a vastly increased
attack surface. (Especially when they have the ability to load arbitrary
kernel modules.)

The goals for successful self-protection systems would be that they
are effective, on by default, require no opt-in by developers, have no
performance impact, do not impede kernel debugging, and have tests. It
is uncommon that all these goals can be met, but it is worth explicitly
mentioning them, since these aspects need to be explored, dealt with,
and/or accepted.


Attack Surface Reduction
========================

The most fundamental defense against security exploits is to reduce the
areas of the kernel that can be used to redirect execution. This ranges
from limiting the exposed APIs available to userspace, to making in-kernel
APIs hard to use incorrectly, to minimizing the areas of writable kernel
memory.

Strict kernel memory permissions
--------------------------------

When all of kernel memory is writable, it becomes trivial for attacks
to redirect execution flow. To reduce the availability of these targets
the kernel needs to protect its memory with a tight set of permissions.

Executable code and read-only data must not be writable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Any areas of the kernel with executable memory must not be writable.
While this obviously includes the kernel text itself, we must consider
all additional places too: kernel modules, JIT memory, etc. (There are
temporary exceptions to this rule to support things like instruction
alternatives, breakpoints, kprobes, etc. If these must exist in a
kernel, they are implemented in a way where the memory is temporarily
made writable during the update, and then returned to the original
permissions.)

In support of this are ``CONFIG_STRICT_KERNEL_RWX`` and
``CONFIG_STRICT_MODULE_RWX``, which seek to make sure that code is not
writable, data is not executable, and read-only data is neither writable
nor executable.

Most architectures enable these options by default and do not make them
user selectable. Architectures that want them to be selectable, such as
arm, can select ``ARCH_OPTIONAL_KERNEL_RWX`` in the architecture Kconfig
to enable a Kconfig prompt. ``CONFIG_ARCH_OPTIONAL_KERNEL_RWX_DEFAULT``
determines the default setting when ``ARCH_OPTIONAL_KERNEL_RWX`` is enabled.

Function pointers and sensitive variables must not be writable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Vast areas of kernel memory contain function pointers that are looked
up by the kernel and used to continue execution (e.g. descriptor/vector
tables, file/network/etc operation structures, etc). The number of these
variables must be reduced to an absolute minimum.

Many such variables can be made read-only by setting them "const"
so that they live in the .rodata section instead of the .data section
of the kernel, gaining the protection of the kernel's strict memory
permissions as described above.

Variables that are initialized once at ``__init`` time can be marked
with the ``__ro_after_init`` attribute.
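
As a minimal sketch of the "const" approach, the following plain C
fragment declares an operations table whose function pointers cannot be
reassigned through normal code paths. The names ``demo_ops``,
``demo_file_ops`` and ``demo_get_ops`` are hypothetical and exist only
for illustration; the fragment is ordinary user-space C, not kernel code.

```c
/* Hypothetical operations structure, loosely modeled on the kernel's
 * file/network operation tables. All names here are illustrative. */
struct demo_ops {
	int (*open)(int id);
	int (*close)(int id);
};

static int demo_open(int id)
{
	return id + 1;
}

static int demo_close(int id)
{
	return id - 1;
}

/* Declaring the table "const" lets the toolchain place it in read-only
 * data, so the function pointers are covered by strict permissions. */
static const struct demo_ops demo_file_ops = {
	.open  = demo_open,
	.close = demo_close,
};

/* Callers only ever receive a pointer-to-const, so an accidental
 * assignment such as "ops->open = ..." is rejected at compile time. */
static const struct demo_ops *demo_get_ops(void)
{
	return &demo_file_ops;
}
```

In a kernel build the same table would additionally be protected at run
time by the strict memory permissions, so even a stray write primitive
could not replace ``open`` or ``close``.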

What remains are variables that are updated rarely (e.g. GDT). These
will need another infrastructure (similar to the temporary exceptions
made to kernel code mentioned above) that allows them to spend the rest
of their lifetime read-only. (For example, when being updated, only the
CPU thread performing the update would be given uninterruptible write
access to the memory.)
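
The idea of a short write window can be illustrated with a user-space
analogy built on ``mmap()`` and ``mprotect()``. This is only a sketch
under stated assumptions: the helpers ``ro_page_create`` and
``ro_page_update`` are hypothetical, and unlike the per-CPU scheme
described above, ``mprotect()`` opens the window for every thread in the
process.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: allocate one page, fill it, then drop write access. */
static void *ro_page_create(const void *init, size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *p;

	if (len > page)
		return NULL;
	p = mmap(NULL, page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	memcpy(p, init, len);
	if (mprotect(p, page, PROT_READ) != 0) {
		munmap(p, page);
		return NULL;
	}
	return p;
}

/* Hypothetical helper: make the page writable for one update, then
 * return it to read-only. Returns 0 on success, -1 on failure. */
static int ro_page_update(void *p, size_t off, const void *src, size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	if (off > page || len > page - off)
		return -1;
	if (mprotect(p, page, PROT_READ | PROT_WRITE) != 0)
		return -1;
	memcpy((char *)p + off, src, len);
	return mprotect(p, page, PROT_READ);
}
```

Outside of ``ro_page_update()`` any write to the page faults, which is
the property the rarely-updated kernel variables are meant to gain.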

Segregation of kernel memory from userspace memory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The kernel must never execute userspace memory. The kernel must also never
access userspace memory without explicit expectation to do so. These
rules can be enforced either by support of hardware-based restrictions
(x86's SMEP/SMAP, ARM's PXN/PAN) or via emulation (ARM's Memory Domains).
By blocking userspace memory in this way, execution and data parsing
cannot be passed to trivially-controlled userspace memory, forcing
attacks to operate entirely in kernel memory.

Reduced access to syscalls
--------------------------

One trivial way to eliminate many syscalls for 64-bit systems is building
without ``CONFIG_COMPAT``. However, this is rarely a feasible scenario.

The "seccomp" system provides an opt-in feature made available to
userspace, which provides a way to reduce the number of kernel entry
points available to a running process. This limits the breadth of kernel
code that can be reached, possibly reducing the availability of a given
bug to an attack.

An area of improvement would be creating viable ways to keep access to
things like compat, user namespaces, BPF creation, and perf limited only
to trusted processes. This would keep the scope of kernel entry points
restricted to the more regular set of entry points normally available to
unprivileged userspace.

Restricting access to kernel modules
------------------------------------

The kernel should never allow an unprivileged user the ability to
load specific kernel modules, since that would provide a facility to
unexpectedly extend the available attack surface. (The on-demand loading
of modules via their predefined subsystems, e.g. MODULE_ALIAS_*, is
considered "expected" here, though additional consideration should be
given even to these.) For example, loading a filesystem module via an
unprivileged socket API is nonsense: only the root or physically local
user should trigger filesystem module loading. (And even this can be up
for debate in some scenarios.)

To protect against even privileged users, systems may need to either
disable module loading entirely (e.g. monolithic kernel builds or
modules_disabled sysctl), or provide signed modules (e.g.
``CONFIG_MODULE_SIG_FORCE``, or dm-crypt with LoadPin), to keep from having
root load arbitrary kernel code via the module loader interface.


Memory integrity
================

There are many memory structures in the kernel that are regularly abused
to gain execution control during an attack. By far the most commonly
understood is that of the stack buffer overflow in which the return
address stored on the stack is overwritten. Many other examples of this
kind of attack exist, and protections exist to defend against them.

Stack buffer overflow
---------------------

The classic stack buffer overflow involves writing past the expected end
of a variable stored on the stack, ultimately writing a controlled value
to the stack frame's stored return address. The most widely used defense
is the presence of a stack canary between the stack variables and the
return address (``CONFIG_STACKPROTECTOR``), which is verified just before
the function returns. Other defenses include things like shadow stacks.
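
The canary check can be modeled in plain C. The sketch below is not how
the compiler implements ``CONFIG_STACKPROTECTOR``; it uses a hypothetical
``struct fake_frame`` to stand in for a real stack frame so that the
"overflow" stays well-defined, and all function names are invented for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a stack frame: locals first, then the canary,
 * then the saved return address, in increasing address order. */
struct fake_frame {
	char buf[16];
	uint64_t canary;
	uint64_t ret_addr;
};

/* Function prologue: place the secret between locals and return address. */
static void frame_enter(struct fake_frame *f, uint64_t secret, uint64_t ret)
{
	memset(f->buf, 0, sizeof(f->buf));
	f->canary = secret;
	f->ret_addr = ret;
}

/* Models an unchecked copy into the local buffer. Writing through a
 * pointer to the whole frame keeps the overflow defined behavior;
 * len is clamped to the size of the frame. */
static void frame_unchecked_copy(struct fake_frame *f, const void *src,
				 size_t len)
{
	if (len > sizeof(*f))
		len = sizeof(*f);
	memcpy((unsigned char *)f, src, len);
}

/* Function epilogue: verify the canary just before "returning". */
static bool frame_canary_intact(const struct fake_frame *f, uint64_t secret)
{
	return f->canary == secret;
}
```

A linear overwrite cannot reach ``ret_addr`` without first passing over
``canary``, so the epilogue check notices the corruption before the
overwritten return address is used.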

Stack depth overflow
--------------------

A less well understood attack is using a bug that triggers the
kernel to consume stack memory with deep function calls or large stack
allocations. With this attack it is possible to write beyond the end of
the kernel's preallocated stack space and into sensitive structures. Two
important changes need to be made for better protections: moving the
sensitive thread_info structure elsewhere, and adding a faulting memory
hole at the bottom of the stack to catch these overflows.

Heap memory integrity
---------------------

The structures used to track heap free lists can be sanity-checked during
allocation and freeing to make sure they aren't being used to manipulate
other memory areas.

Counter integrity
-----------------

Many places in the kernel use atomic counters to track object references
or perform similar lifetime management. When these counters can be made
to wrap (over or under) this traditionally exposes a use-after-free
flaw. By trapping atomic wrapping, this class of bug vanishes.
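
One way to trap wrapping is a counter that saturates instead of rolling
over. The following is a minimal sketch using C11 atomics; ``demo_ref``
and its helpers are hypothetical names, and this is not presented as the
kernel's own implementation.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical saturating reference counter. */
struct demo_ref {
	atomic_uint count;
};

#define DEMO_REF_SATURATED UINT_MAX

static void demo_ref_init(struct demo_ref *r, unsigned int n)
{
	atomic_init(&r->count, n);
}

/* Take a reference. Fails if the object is already dead (count == 0).
 * Once the counter reaches the maximum it stays pinned there. */
static bool demo_ref_get(struct demo_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0)
			return false;	/* would resurrect a freed object */
		if (old == DEMO_REF_SATURATED)
			return true;	/* pinned: leak rather than wrap */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));
	return true;
}

/* Drop a reference. Returns true only when the caller released the
 * last reference and should free the object. */
static bool demo_ref_put(struct demo_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0 || old == DEMO_REF_SATURATED)
			return false;	/* underflow refused, or pinned */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old - 1));
	return old == 1;
}
```

The design choice is to prefer a memory leak (a pinned counter never
reaches zero) over a use-after-free caused by a wrapped counter.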

Size calculation overflow detection
-----------------------------------

Similar to counter overflow, integer overflows (usually size calculations)
need to be detected at runtime to kill this class of bug, which
traditionally leads to being able to write past the end of kernel buffers.
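
As an illustration, a size calculation can be routed through the
overflow-checking builtins provided by GCC and Clang. The helpers
``demo_alloc_size`` and ``demo_alloc_array`` below are hypothetical
user-space sketches of the pattern, assuming a compiler that supports
``__builtin_mul_overflow`` and ``__builtin_add_overflow``.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: compute count * elem_size + header and report
 * failure instead of silently wrapping. */
static bool demo_alloc_size(size_t count, size_t elem_size, size_t header,
			    size_t *out)
{
	size_t body;

	if (__builtin_mul_overflow(count, elem_size, &body))
		return false;
	if (__builtin_add_overflow(body, header, out))
		return false;
	return true;
}

/* Allocate only when the requested size is representable. */
static void *demo_alloc_array(size_t count, size_t elem_size, size_t header)
{
	size_t bytes;

	if (!demo_alloc_size(count, elem_size, header, &bytes))
		return NULL;
	return malloc(bytes);
}
```

Without the check, a wrapped product would yield a small allocation that
later code treats as large, which is exactly the write-past-the-end
condition described above.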


Probabilistic defenses
======================

While many protections can be considered deterministic (e.g. read-only
memory cannot be written to), some protections provide only statistical
defense, in that an attack must gather enough information about a
running system to overcome the defense. While not perfect, these do
provide meaningful defenses.

Canaries, blinding, and other secrets
-------------------------------------

It should be noted that things like the stack canary discussed earlier
are technically statistical defenses, since they rely on a secret value,
and such values may become discoverable through an information exposure
flaw.

Blinding literal values for things like JITs, where the executable
contents may be partially under the control of userspace, needs a similar
secret value.

It is critical that the secret values used be separate (e.g. a different
canary per stack) and high entropy (e.g. is the RNG actually working?)
in order to maximize their success.

Kernel Address Space Layout Randomization (KASLR)
-------------------------------------------------

Since the location of kernel memory is almost always instrumental in
mounting a successful attack, making the location non-deterministic
raises the difficulty of an exploit. (Note that this in turn makes
the value of information exposures higher, since they may be used to
discover desired memory locations.)

Text and module base
~~~~~~~~~~~~~~~~~~~~

By relocating the physical and virtual base address of the kernel at
boot-time (``CONFIG_RANDOMIZE_BASE``), attacks needing kernel code will be
frustrated. Additionally, offsetting the module loading base address
means that even systems that load the same set of modules in the same
order every boot will not share a common base address with the rest of
the kernel text.

Stack base
~~~~~~~~~~

If the base address of the kernel stack is not the same between processes,
or even not the same between syscalls, targets on or beyond the stack
become more difficult to locate.

Dynamic memory base
~~~~~~~~~~~~~~~~~~~

Much of the kernel's dynamic memory (e.g. kmalloc, vmalloc, etc) ends up
being relatively deterministic in layout due to the order of early-boot
initializations. If the base address of these areas is not the same
between boots, targeting them is frustrated, requiring an information
exposure specific to the region.

Structure layout
~~~~~~~~~~~~~~~~

By performing a per-build randomization of the layout of sensitive
structures, attacks must either be tuned to known kernel builds or expose
enough kernel memory to determine structure layouts before manipulating
them.


Preventing Information Exposures
================================

Since the locations of sensitive structures are the primary target for
attacks, it is important to defend against exposure of both kernel memory
addresses and kernel memory contents (since they may contain kernel
addresses or other sensitive things like canary values).

Kernel addresses
----------------

Printing kernel addresses to userspace leaks sensitive information about
the kernel memory layout. Care should be exercised when using any printk
specifier that prints the raw address, currently ``%px``, ``%p[ad]`` (and
``%p[sSb]`` in certain circumstances [*]). Any file written to using one
of these specifiers should be readable only by privileged processes.

Kernels 4.14 and older printed the raw address using ``%p``. As of
4.15-rc1, addresses printed with the specifier ``%p`` are hashed before
printing.

[*] If KALLSYMS is enabled and symbol lookup fails, the raw address is
printed. If KALLSYMS is not enabled, the raw address is printed.

Unique identifiers
------------------

Kernel memory addresses must never be used as identifiers exposed to
userspace. Instead, use an atomic counter, an idr, or similar unique
identifier.
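
A minimal sketch of the atomic counter approach, using C11 atomics in
user space. The structure ``demo_object`` and the counter
``demo_next_id`` are hypothetical names chosen for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical object that needs an identifier visible to userspace. */
struct demo_object {
	uint64_t id;	/* opaque handle, reveals nothing about memory layout */
	int payload;
};

/* Monotonic source of identifiers, shared by all objects. */
static atomic_uint_fast64_t demo_next_id = 1;

static void demo_object_init(struct demo_object *obj, int payload)
{
	/* The identifier comes from the counter, never from "obj" itself. */
	obj->id = atomic_fetch_add(&demo_next_id, 1);
	obj->payload = payload;
}
```

Because the identifier is unrelated to where the object lives, exposing
it does not help an attacker locate the object in kernel memory.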

Memory initialization
---------------------

Memory copied to userspace must always be fully initialized. If the
memory is not explicitly cleared with memset(), this will require changes
to the compiler to make sure structure holes are cleared.
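
The explicit memset() pattern looks like the following sketch. The
structure ``demo_reply`` is hypothetical; on common ABIs it contains
three padding bytes between its two members, which would otherwise carry
whatever was previously stored at that location.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reply structure. On common ABIs the compiler inserts
 * padding between "tag" and "value" to align the 32-bit member. */
struct demo_reply {
	char tag;
	uint32_t value;
};

static void demo_fill_reply(struct demo_reply *out, char tag, uint32_t value)
{
	/* Clear the whole object first, padding included, so that no stale
	 * bytes survive when the structure is later copied out wholesale. */
	memset(out, 0, sizeof(*out));
	out->tag = tag;
	out->value = value;
}
```

Initializing the members one by one, or with a designated initializer,
does not by itself guarantee that the padding bytes are cleared, which
is why the whole-object memset() comes first.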

Memory poisoning
----------------

When releasing memory, it is best to poison the contents, to avoid reuse
attacks that rely on the old contents of memory. For example, clear the
stack on syscall return (``CONFIG_GCC_PLUGIN_STACKLEAK``) and wipe heap
memory on free. This frustrates many uninitialized variable attacks,
stack content exposures, heap content exposures, and use-after-free
attacks.
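
Poisoning on free can be sketched with a tiny fixed-slot allocator. All
names here (``demo_pool_alloc``, ``demo_pool_free``, ``DEMO_POISON_FREE``)
are hypothetical, and the poison byte is an arbitrary marker chosen for
this illustration.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_SLOT_SIZE   32
#define DEMO_SLOT_COUNT  4
#define DEMO_POISON_FREE 0x6b	/* arbitrary marker for this sketch */

static unsigned char demo_slots[DEMO_SLOT_COUNT][DEMO_SLOT_SIZE];
static unsigned char demo_used[DEMO_SLOT_COUNT];

/* Hand out the first free slot, or NULL when the pool is exhausted. */
static void *demo_pool_alloc(void)
{
	for (size_t i = 0; i < DEMO_SLOT_COUNT; i++) {
		if (!demo_used[i]) {
			demo_used[i] = 1;
			return demo_slots[i];
		}
	}
	return NULL;
}

/* Release a slot, overwriting its old contents before it can be reused.
 * Pointers that are not live slots (including double frees) are ignored. */
static void demo_pool_free(void *p)
{
	for (size_t i = 0; i < DEMO_SLOT_COUNT; i++) {
		if (p == demo_slots[i] && demo_used[i]) {
			memset(demo_slots[i], DEMO_POISON_FREE, DEMO_SLOT_SIZE);
			demo_used[i] = 0;
			return;
		}
	}
}
```

The next user of the slot sees only the poison pattern, so neither a
stale pointer nor an uninitialized read can recover the previous
owner's data.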

Destination tracking
--------------------

To help kill classes of bugs that result in kernel addresses being
written to userspace, the destination of writes needs to be tracked. If
the buffer is destined for userspace (e.g. seq_file backed ``/proc`` files),
it should automatically censor sensitive values.