Commit | Line | Data |
---|---|---|
6a9e5292 TG |
1 | Microarchitectural Data Sampling (MDS) mitigation |
2 | ================================================= | |
3 | ||
4 | .. _mds: | |
5 | ||
6 | Overview | |
7 | -------- | |
8 | ||
9 | Microarchitectural Data Sampling (MDS) is a family of side channel attacks | |
10 | on internal buffers in Intel CPUs. The variants are: | |
11 | ||
12 | - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126) | |
13 | - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130) | |
14 | - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127) | |
e672f8bf | 15 | - Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091) |
6a9e5292 TG |
16 | |
17 | MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a | |
18 | dependent load (store-to-load forwarding) as an optimization. The forward | |
19 | can also happen to a faulting or assisting load operation for a different | |
20 | memory address, which can be exploited under certain conditions. Store | |
21 | buffers are partitioned between Hyper-Threads so cross thread forwarding is | |
22 | not possible. But if a thread enters or exits a sleep state the store | |
23 | buffer is repartitioned which can expose data from one thread to the other. | |
24 | ||
25 | MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage | |
26 | L1 miss situations and to hold data which is returned or sent in response | |
27 | to a memory or I/O operation. Fill buffers can forward data to a load | |
28 | operation and also write data to the cache. When the fill buffer is | |
29 | deallocated it can retain the stale data of the preceding operations which | |
30 | can then be forwarded to a faulting or assisting load operation, which can | |
31 | be exploited under certain conditions. Fill buffers are shared between | |
32 | Hyper-Threads so cross thread leakage is possible. | |
33 | ||
34 | MLPDS leaks Load Port Data. Load ports are used to perform load operations | |
35 | from memory or I/O. The received data is then forwarded to the register | |
36 | file or a subsequent operation. In some implementations the Load Port can | |
37 | contain stale data from a previous operation which can be forwarded to | |
38 | faulting or assisting loads under certain conditions, which again can | |
39 | eventually be exploited. Load ports are shared between Hyper-Threads so cross | |
40 | thread leakage is possible. | |
41 | ||
e672f8bf PG |
42 | MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from |
43 | memory that takes a fault or assist can leave data in a microarchitectural | |
44 | structure that may later be observed using one of the same methods used by | |
45 | MSBDS, MFBDS or MLPDS. | |
6a9e5292 TG |
46 | |
47 | Exposure assumptions | |
48 | -------------------- | |
49 | ||
50 | It is assumed that attack code resides in user space or in a guest, with one | |
51 | exception. The rationale behind this assumption is that the code construct | |
52 | needed for exploiting MDS requires: | |
53 | ||
54 | - to control the load to trigger a fault or assist | |
55 | ||
56 | - to have a disclosure gadget which exposes the speculatively accessed | |
57 | data for consumption through a side channel. | |
58 | ||
59 | - to control the pointer through which the disclosure gadget exposes the | |
60 | data | |
61 | ||
62 | The existence of such a construct in the kernel cannot be excluded with | |
63 | 100% certainty, but the complexity involved makes it extremely unlikely. | |
64 | ||
65 | There is one exception, which is untrusted BPF. The functionality of | |
66 | untrusted BPF is limited, but it needs to be thoroughly investigated | |
67 | whether it can be used to create such a construct. | |
68 | ||
69 | ||
70 | Mitigation strategy | |
71 | ------------------- | |
72 | ||
73 | All variants have the same mitigation strategy at least for the single CPU | |
74 | thread case (SMT off): Force the CPU to clear the affected buffers. | |
75 | ||
76 | This is achieved by using the otherwise unused and obsolete VERW | |
77 | instruction in combination with a microcode update. The microcode clears | |
78 | the affected CPU buffers when the VERW instruction is executed. | |
79 | ||
80 | For virtualization there are two ways to achieve CPU buffer | |
81 | clearing: either via the modified VERW instruction or via the L1D Flush | |
82 | command. The latter is issued when L1TF mitigation is enabled so the extra | |
83 | VERW can be avoided. If the CPU is not affected by L1TF then VERW needs to | |
84 | be issued. | |
85 | ||
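The choice between the two clearing methods on guest entry can be sketched as
a small decision function. This is only an illustration of the rule described
above; the enum, the function name and both parameters are hypothetical and
are not kernel identifiers.

```c
#include <assert.h>   /* only needed for the usage checks */
#include <stdbool.h>

/* Hypothetical names for this sketch; none of them exist in the kernel. */
enum guest_entry_clear {
	GUEST_CLEAR_NONE,      /* MDS mitigation is not active */
	GUEST_CLEAR_L1D_FLUSH, /* L1D flush for L1TF already clears the buffers */
	GUEST_CLEAR_VERW,      /* no L1D flush is issued, so VERW is needed */
};

/*
 * Decide how the CPU buffers get cleared before entering a guest.
 * mds_mitigation_on:  MDS mitigation is active on this CPU (assumed input)
 * l1d_flush_on_entry: L1TF mitigation issues an L1D flush (assumed input)
 */
static enum guest_entry_clear
pick_guest_entry_clear(bool mds_mitigation_on, bool l1d_flush_on_entry)
{
	if (!mds_mitigation_on)
		return GUEST_CLEAR_NONE;
	if (l1d_flush_on_entry)
		return GUEST_CLEAR_L1D_FLUSH; /* the extra VERW can be avoided */
	return GUEST_CLEAR_VERW;
}
```

With both mitigations active the sketch picks the L1D flush; with only the MDS
mitigation active it falls back to VERW.
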
86 | If the VERW instruction with the supplied segment selector argument is | |
87 | executed on a CPU without the microcode update there is no side effect | |
88 | other than a small number of pointlessly wasted CPU cycles. | |
89 | ||
90 | This does not protect against cross Hyper-Thread attacks except for MSBDS | |
91 | which is only exploitable cross Hyper-Thread when one of the Hyper-Threads | |
92 | enters a C-state. | |
93 | ||
94 | The kernel provides a function to invoke the buffer clearing: | |
95 | ||
96 | mds_clear_cpu_buffers() | |
97 | ||
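To illustrate what such a buffer clearing helper boils down to, the following
is a minimal user-space sketch, not the kernel implementation. It assumes a
GCC-compatible compiler, uses the memory-operand form of VERW, and passes a
null selector on the assumption that the selector value does not matter for
the clearing side effect. The function name is hypothetical. On a CPU without
the microcode update, or on a non-x86 build, it has no clearing effect.

```c
#include <assert.h>   /* only needed for the usage checks */
#include <stdint.h>

/*
 * Hypothetical user-space sketch of a VERW based buffer clear.
 * Returns 1 if a VERW instruction was executed, 0 if it was compiled
 * out because the target is not x86 or the compiler is not GCC-like.
 */
static inline int clear_cpu_buffers_sketch(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
	/* Assumption: a null selector is sufficient as the VERW operand. */
	static const uint16_t sel = 0;

	/* VERW modifies ZF, hence the "cc" clobber. */
	__asm__ volatile("verw %[sel]" : : [sel] "m" (sel) : "cc");
	return 1;
#else
	return 0;
#endif
}
```

VERW is not a privileged instruction, so the sketch can be called from an
ordinary program; whether any buffer is actually cleared depends entirely on
the microcode.
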
98 | The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state | |
99 | (idle) transitions. | |
100 | ||
22dd8365 TG |
101 | As a special quirk to address virtualization scenarios where the host has |
102 | the microcode updated, but the hypervisor does not (yet) expose the | |
103 | MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the | |
104 | hope that it might actually clear the buffers. The state is reflected | |
105 | accordingly. | |
106 | ||
6a9e5292 TG |
107 | According to current knowledge additional mitigations inside the kernel |
108 | itself are not required because the necessary gadgets to expose the leaked | |
109 | data cannot be controlled in a way which allows exploitation from malicious | |
110 | user space or VM guests. | |
04dcbdb8 | 111 | |
22dd8365 TG |
112 | Kernel internal mitigation modes |
113 | -------------------------------- | |
114 | ||
115 | ======= ============================================================ | |
116 | off Mitigation is disabled. Either the CPU is not affected or | |
117 | mds=off is supplied on the kernel command line | |
118 | ||
95310e34 | 119 | full Mitigation is enabled. CPU is affected and MD_CLEAR is |
22dd8365 TG |
120 | advertised in CPUID. |
121 | ||
122 | vmwerv Mitigation is enabled. CPU is affected and MD_CLEAR is not | |
123 | advertised in CPUID. That is mainly for virtualization | |
124 | scenarios where the host has the updated microcode but the | |
125 | hypervisor does not expose MD_CLEAR in CPUID. It's a best | |
126 | effort approach without guarantee. | |
127 | ======= ============================================================ | |
128 | ||
129 | If the CPU is affected and mds=off is not supplied on the kernel command | |
130 | line then the kernel selects the appropriate mitigation mode depending on | |
131 | the availability of the MD_CLEAR CPUID bit. | |
132 | ||
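The mode selection described by the table above can be written down as a short
function. This is a sketch of the documented rule only; the enum and function
names and the three boolean inputs are hypothetical stand-ins for the kernel's
internal state.

```c
#include <assert.h>   /* only needed for the usage checks */
#include <stdbool.h>

/* Hypothetical names mirroring the three documented modes. */
enum mds_mode_sketch {
	MDS_SKETCH_OFF,
	MDS_SKETCH_FULL,
	MDS_SKETCH_VMWERV,
};

/*
 * cpu_affected: the CPU is affected by MDS
 * cmdline_off:  mds=off was supplied on the kernel command line
 * md_clear:     MD_CLEAR is advertised in CPUID
 */
static enum mds_mode_sketch
select_mds_mode(bool cpu_affected, bool cmdline_off, bool md_clear)
{
	if (!cpu_affected || cmdline_off)
		return MDS_SKETCH_OFF;

	/* Without MD_CLEAR the kernel still issues VERW as a best effort. */
	return md_clear ? MDS_SKETCH_FULL : MDS_SKETCH_VMWERV;
}
```
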
04dcbdb8 TG |
133 | Mitigation points |
134 | ----------------- | |
135 | ||
136 | 1. Return to user space | |
137 | ^^^^^^^^^^^^^^^^^^^^^^^ | |
138 | ||
139 | When transitioning from kernel to user space the CPU buffers are flushed | |
140 | on affected CPUs when the mitigation is not disabled on the kernel | |
141 | command line. The mitigation is enabled through the static key | |
142 | mds_user_clear. | |
143 | ||
144 | The mitigation is invoked in prepare_exit_to_usermode() which covers | |
9d8d0294 AL |
145 | all but one of the kernel to user space transitions. The exception |
146 | is when we return from a Non Maskable Interrupt (NMI), which is | |
147 | handled directly in do_nmi(). | |
148 | ||
149 | (The reason that NMI is special is that prepare_exit_to_usermode() can | |
150 | enable IRQs. In NMI context, NMIs are blocked, and we don't want to | |
151 | enable IRQs with NMIs blocked.) | |
07f07f55 TG |
152 | |
153 | ||
154 | 2. C-State transition | |
155 | ^^^^^^^^^^^^^^^^^^^^^ | |
156 | ||
157 | When a CPU goes idle and enters a C-State the CPU buffers need to be | |
158 | cleared on affected CPUs when SMT is active. This addresses the | |
159 | repartitioning of the store buffer when one of the Hyper-Threads enters | |
160 | a C-State. | |
161 | ||
162 | When SMT is inactive, i.e. either the CPU does not support it or all | |
163 | sibling threads are offline, CPU buffer clearing is not required. | |
164 | ||
165 | The idle clearing is enabled on CPUs which are only affected by MSBDS | |
166 | and not by any other MDS variant. The other MDS variants cannot be | |
167 | protected against cross Hyper-Thread attacks because the Fill Buffer and | |
168 | the Load Ports are shared. So on CPUs affected by other variants, the | |
169 | idle clearing would be a window dressing exercise and is therefore not | |
170 | activated. | |
171 | ||
172 | The invocation is controlled by the static key mds_idle_clear which is | |
173 | switched depending on the chosen mitigation mode and the SMT state of | |
174 | the system. | |
175 | ||
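The conditions under which the idle clearing is active can be summarized in a
predicate. This is a sketch of the rule as stated in this section, not the
kernel code that switches mds_idle_clear; the function name and its
parameters are hypothetical.

```c
#include <assert.h>   /* only needed for the usage checks */
#include <stdbool.h>

/*
 * mitigation_on: the chosen mitigation mode is not "off"
 * msbds_only:    the CPU is affected by MSBDS and by no other MDS variant
 * smt_active:    SMT is supported and a sibling thread is online
 */
static bool idle_clear_wanted(bool mitigation_on, bool msbds_only,
			      bool smt_active)
{
	/*
	 * Clearing before idle only helps against the store buffer
	 * repartitioning, which requires an active sibling and a CPU
	 * whose other buffers are not shared-and-leaky anyway.
	 */
	return mitigation_on && msbds_only && smt_active;
}
```
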
176 | The buffer clear is only invoked before entering the C-State to prevent | |
177 | stale data from the idling CPU from spilling to the Hyper-Thread | |
178 | sibling after the store buffer got repartitioned and all entries are | |
179 | available to the non idle sibling. | |
180 | ||
181 | When coming out of idle the store buffer is partitioned again so each | |
182 | sibling has half of it available. The CPU coming back from idle could then | |
183 | be speculatively exposed to contents of the sibling. The buffers are | |
184 | flushed either on exit to user space or on VMENTER so malicious code | |
185 | in user space or the guest cannot speculatively access them. | |
186 | ||
187 | The mitigation is hooked into all variants of halt()/mwait(), but does | |
188 | not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver | |
189 | was superseded by the intel_idle driver around 2010, and intel_idle is | |
190 | preferred on all affected CPUs which are expected to gain the MD_CLEAR | |
191 | functionality in microcode. Aside from that, the IO-Port mechanism is a | |
192 | legacy interface which is only used on older systems which are either | |
193 | not affected or do not receive microcode updates anymore. |