x86/Documentation: Remove STACKFAULT_STACK bulletpoint
Kernel stacks on x86-64 bit
---------------------------

Most of the text from Keith Owens, hacked by AK

x86_64 page size (PAGE_SIZE) is 4K.

Like all other architectures, x86_64 has a kernel stack for every
active thread. These thread stacks are THREAD_SIZE (2*PAGE_SIZE) big.
These stacks contain useful data as long as a thread is alive or a
zombie. While the thread is in user space the kernel stack is empty
except for the thread_info structure at the bottom.
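
As a rough illustration of the sizes above, the following C sketch
finds the bottom of a thread stack from any address inside it. It
assumes the stack is aligned to THREAD_SIZE, which this text does not
state, and the helper name stack_base() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE   4096UL            /* x86_64 page size, 4K */
#define THREAD_SIZE (2 * PAGE_SIZE)   /* per-thread kernel stack size */

/* The masking trick below only works for a power-of-two size. */
static_assert((THREAD_SIZE & (THREAD_SIZE - 1)) == 0,
              "THREAD_SIZE must be a power of two");

/*
 * Hypothetical helper: return the lowest address of the stack that
 * contains sp. ASSUMPTION: the stack is THREAD_SIZE-aligned.
 */
static inline uintptr_t stack_base(uintptr_t sp)
{
    return sp & ~(uintptr_t)(THREAD_SIZE - 1);
}
```

Under that alignment assumption, the thread_info structure mentioned
above would sit at the address returned by stack_base().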

In addition to the per thread stacks, there are specialized stacks
associated with each CPU. These stacks are only used while the kernel
is in control on that CPU; when a CPU returns to user space the
specialized stacks contain no useful data. The main CPU stacks are:

* Interrupt stack. IRQSTACKSIZE

  Used for external hardware interrupts. If this is the first external
  hardware interrupt (i.e. not a nested hardware interrupt) then the
  kernel switches from the current task to the interrupt stack. Like
  the split thread and interrupt stacks on i386, this gives more room
  for kernel interrupt processing without having to increase the size
  of every per thread stack.

  The interrupt stack is also used when processing a softirq.

Switching to the kernel interrupt stack is done by software based on a
per CPU interrupt nest counter. This is needed because x86-64 "IST"
hardware stacks cannot nest without races.
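
The nest counter can be pictured with the C sketch below. It only
models the decision of when a stack switch is needed, not the switch
itself. The structure and function names are hypothetical and are not
taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state: how many hardware interrupts are active. */
struct cpu_irq_state {
    int irq_nest;   /* 0 means no hardware interrupt is in progress */
};

/*
 * Called on interrupt entry. Returns true only for the first
 * (non-nested) interrupt, i.e. when the caller should switch from the
 * current task stack to the interrupt stack.
 */
static bool irq_enter_needs_switch(struct cpu_irq_state *cpu)
{
    return cpu->irq_nest++ == 0;
}

/*
 * Called on interrupt exit. Returns true when the outermost interrupt
 * is finishing, i.e. when the caller should switch back.
 */
static bool irq_exit_needs_switch(struct cpu_irq_state *cpu)
{
    return --cpu->irq_nest == 0;
}
```

A nested interrupt sees a non-zero counter on entry and therefore stays
on the interrupt stack it is already running on.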

x86_64 also has a feature which is not available on i386, the ability
to automatically switch to a new stack for designated events such as
double fault or NMI, which makes it easier to handle these unusual
events on x86_64. This feature is called the Interrupt Stack Table
(IST). There can be up to 7 IST entries per CPU. The IST code is an
index into the Task State Segment (TSS). The IST entries in the TSS
point to dedicated stacks; each stack can be a different size.
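
For orientation, the 64-bit TSS layout from the architecture manuals
can be written as a C structure. This is a sketch only: the type name
tss64 and the helper tss_set_ist() are hypothetical, and the packed
attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IST_ENTRIES 7   /* up to 7 IST entries per CPU */

/* 64-bit Task State Segment as laid out by the hardware. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];            /* stack pointers for rings 0-2 */
    uint64_t reserved1;
    uint64_t ist[IST_ENTRIES];  /* IST1..IST7 stack pointers */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;
} __attribute__((packed));

static_assert(sizeof(struct tss64) == 104, "TSS must be 104 bytes");
static_assert(offsetof(struct tss64, ist) == 36, "IST1 is at offset 36");

/*
 * Hypothetical helper: point IST entry 'index' (1..7) at a dedicated
 * stack. The stack grows down, so the entry holds the top address.
 * Returns 0 on success, -1 for an invalid index.
 */
static int tss_set_ist(struct tss64 *tss, unsigned int index,
                       uint64_t base, uint64_t size)
{
    if (index < 1 || index > IST_ENTRIES)
        return -1;
    tss->ist[index - 1] = base + size;
    return 0;
}
```

Because each entry is just a pointer, nothing in this layout forces the
stacks to share a size.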

An IST is selected by a non-zero value in the IST field of an
interrupt-gate descriptor. When an interrupt occurs and the hardware
loads such a descriptor, the hardware automatically sets the new stack
pointer based on the IST value, then invokes the interrupt handler. If
the interrupt came from user mode, then the interrupt handler prologue
will switch back to the per-thread stack. If software wants to allow
nested IST interrupts then the handler must adjust the IST values on
entry to and exit from the interrupt handler. (This is occasionally
done, e.g. for debug exceptions.)
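
The IST field described above occupies the low three bits of one byte
in the 16-byte gate descriptor. The C sketch below shows how it could
be read back. The structure and function names are hypothetical; only
the field layout follows the architecture manuals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 64-bit interrupt-gate descriptor (one IDT entry). */
struct idt_gate64 {
    uint16_t offset_low;    /* handler address bits 0-15 */
    uint16_t selector;      /* code segment selector */
    uint8_t  ist;           /* bits 0-2: IST index, bits 3-7 reserved */
    uint8_t  type_attr;     /* gate type, DPL, present bit */
    uint16_t offset_mid;    /* handler address bits 16-31 */
    uint32_t offset_high;   /* handler address bits 32-63 */
    uint32_t reserved;
};

static_assert(sizeof(struct idt_gate64) == 16, "gate must be 16 bytes");

/* Return the IST index (0..7) encoded in the descriptor. */
static unsigned int gate_ist_index(const struct idt_gate64 *g)
{
    return g->ist & 0x7;
}

/* A zero index means no IST stack switch for this gate. */
static bool gate_uses_ist(const struct idt_gate64 *g)
{
    return gate_ist_index(g) != 0;
}
```

A three-bit field with zero reserved for "no IST" is what limits the
table to 7 usable entries.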

Events with different IST codes (i.e. with different stacks) can be
nested. For example, a debug interrupt can safely be interrupted by an
NMI. arch/x86_64/kernel/entry.S::paranoidentry adjusts the stack
pointers on entry to and exit from all IST events, in theory allowing
IST events with the same code to be nested. However in most cases, the
stack size allocated to an IST assumes no nesting for the same code.
If that assumption is ever broken then the stacks will become corrupt.
60The currently assigned IST stacks are :-
61
352f7bae
AK
62* DOUBLEFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
63
64 Used for interrupt 8 - Double Fault Exception (#DF).
65
57d30772
RD
66 Invoked when handling one exception causes another exception. Happens
67 when the kernel is very confused (e.g. kernel stack pointer corrupt).
68 Using a separate stack allows the kernel to recover from it well enough
69 in many cases to still output an oops.

* NMI_STACK. EXCEPTION_STKSZ (PAGE_SIZE).

  Used for non-maskable interrupts (NMI).

  NMI can be delivered at any time, including when the kernel is in the
  middle of switching stacks. Using IST for NMI events avoids making
  assumptions about the previous state of the kernel stack.

* DEBUG_STACK. DEBUG_STKSZ

  Used for hardware debug interrupts (interrupt 1) and for software
  debug interrupts (INT3).

  When debugging a kernel, debug interrupts (both hardware and
  software) can occur at any time. Using IST for these interrupts
  avoids making assumptions about the previous state of the kernel
  stack.

* MCE_STACK. EXCEPTION_STKSZ (PAGE_SIZE).

  Used for interrupt 18 - Machine Check Exception (#MC).

  MCE can be delivered at any time, including when the kernel is in the
  middle of switching stacks. Using IST for MCE events avoids making
  assumptions about the previous state of the kernel stack.

For more details see the Intel IA32 or AMD AMD64 architecture manuals.