Error Detection And Correction (EDAC) Devices
=============================================

Main Concepts used in the EDAC subsystem
----------------------------------------

There are several things to be aware of that aren't at all obvious, like
*sockets*, *socket sets*, *banks*, *rows*, *chip-select rows*, *channels*,
etc...

These are some of the many terms that are thrown about that don't always
mean what people think they mean (Inconceivable!). In the interest of
creating a common ground for discussion, the terms and their definitions
are established below.

* Memory devices

The individual DRAM chips on a memory stick. These devices commonly
output 4 or 8 bits each (x4, x8). Grouping several of these in parallel
provides the number of bits that the memory controller expects:
typically 72 bits, in order to provide 64 bits of data plus 8 bits of
ECC. For example, eighteen x4 devices or nine x8 devices yield a 72-bit
wide rank.

* Memory Stick

A printed circuit board that aggregates multiple memory devices in
parallel. In general, this is the Field Replaceable Unit (FRU) that
gets replaced in the case of excessive errors. It is most often called
a DIMM (Dual Inline Memory Module).

* Memory Socket

A physical connector on the motherboard that accepts a single memory
stick. Also called a "slot" in several datasheets.

* Channel

A memory controller channel, responsible for communicating with a group
of DIMMs. Each channel has its own independent control (command) and
data bus, and can be used independently or grouped with other channels.

* Branch

Typically the top of the hierarchy on a Fully-Buffered DIMM memory
controller. A branch typically contains two channels. Two channels on
the same branch can be used in single mode or in lockstep mode. When
lockstep is enabled, the cache line size is doubled, but it generally
brings some performance penalty. Also, it is generally not possible to
point to just one memory stick when an error occurs, as the error
correction code is calculated using two DIMMs instead of one. Because
of that, it is capable of correcting more errors than single mode.

* Single-channel

The data accessed by the memory controller is contained in one DIMM
only. E.g. if the data is 64 bits wide, the data flows to the CPU using
one 64-bit parallel access. Typically used with SDR, DDR, DDR2 and DDR3
memories. FB-DIMM and RAMBUS use a different concept of channel, so
this concept doesn't apply there.

* Double-channel

The data accessed by the memory controller is interleaved across two
DIMMs, which are accessed at the same time. E.g. if the DIMM is 64 bits
wide (72 bits with ECC), the data flows to the CPU using a 128-bit
parallel access.

* Chip-select row

This is the name of the DRAM signal used to select the DRAM ranks to be
accessed. Common chip-select rows are 64 bits wide for single channel
and 128 bits wide for dual channel. It may not be visible to the memory
controller, as some DIMM types have a memory buffer that can hide
direct access to it from the memory controller.

* Single-Ranked stick

A single-ranked stick has 1 chip-select row of memory. Motherboards
commonly drive two chip-select pins to a memory stick. A single-ranked
stick will occupy only one of those rows; the other will be unused.

.. _doubleranked:

* Double-Ranked stick

A double-ranked stick has two chip-select rows which access different
sets of memory devices. The two rows cannot be accessed concurrently.

* Double-sided stick

**DEPRECATED TERM**, see :ref:`Double-Ranked stick <doubleranked>`.

A double-sided stick has two chip-select rows which access different
sets of memory devices. The two rows cannot be accessed concurrently.
"Double-sided" applies irrespective of whether the memory devices are
mounted on both sides of the memory stick.

* Socket set

All of the memory sticks that are required for a single memory access,
or all of the memory sticks spanned by a chip-select row. A single
socket set has two chip-select rows, and if double-sided sticks are
used these will occupy those chip-select rows.

* Bank

This term is avoided because it is ambiguous when one needs to
distinguish between chip-select rows and socket sets.

* High Bandwidth Memory (HBM)

HBM is a new memory type with low power consumption and ultra-wide
communication lanes. It uses vertically stacked memory chips (DRAM
dies) interconnected by microscopic wires called "through-silicon
vias", or TSVs.

Several stacks of HBM chips connect to the CPU or GPU through an
ultra-fast interconnect called the "interposer". Therefore, HBM's
characteristics are nearly indistinguishable from on-chip integrated
RAM.

Memory Controllers
------------------

Most of the EDAC core is focused on doing Memory Controller error
detection. Drivers allocate a memory controller descriptor via
:c:func:`edac_mc_alloc`, which internally uses struct ``mem_ctl_info``
to describe the memory controller. This struct is opaque to EDAC
drivers; only the EDAC core is allowed to touch it.

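As a minimal sketch (not taken from any in-tree driver), a driver for a
controller with four chip-select rows and two channels could allocate
and register its controller descriptor as below. The driver name,
``struct my_edac_priv`` and the probe function are hypothetical; the
EDAC calls themselves (:c:func:`edac_mc_alloc`, :c:func:`edac_mc_add_mc`
and :c:func:`edac_mc_free`) are the real API::

  /* Sketch only: in-tree drivers live in drivers/edac/ and get the
   * core API from the private header there. */
  #include <linux/platform_device.h>
  #include "edac_module.h"

  struct my_edac_priv {                   /* hypothetical private state */
          void __iomem *base;
  };

  static int my_edac_probe(struct platform_device *pdev)
  {
          struct edac_mc_layer layers[2];
          struct mem_ctl_info *mci;

          /* Describe the controller as a csrow x channel grid. */
          layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
          layers[0].size = 4;             /* 4 chip-select rows */
          layers[0].is_virt_csrow = true;
          layers[1].type = EDAC_MC_LAYER_CHANNEL;
          layers[1].size = 2;             /* 2 channels */
          layers[1].is_virt_csrow = false;

          mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers,
                              sizeof(struct my_edac_priv));
          if (!mci)
                  return -ENOMEM;

          mci->pdev = &pdev->dev;
          mci->mod_name = "my_edac";
          mci->ctl_name = "my_memory_controller";

          /* Publish the controller; the core creates the sysfs nodes. */
          if (edac_mc_add_mc(mci)) {
                  edac_mc_free(mci);
                  return -ENXIO;
          }
          return 0;
  }

Once registered, the driver reports detected errors to the core with
:c:func:`edac_mc_handle_error`, which updates the error counters
exposed in sysfs.
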
.. kernel-doc:: include/linux/edac.h

.. kernel-doc:: drivers/edac/edac_mc.h

PCI Controllers
---------------

The EDAC subsystem provides a mechanism to handle PCI controllers by
calling :c:func:`edac_pci_alloc_ctl_info`. It uses struct
:c:type:`edac_pci_ctl_info` to describe the PCI controllers.

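A minimal sketch of how a driver could use it; the device name and the
setup function are made up for illustration, while
:c:func:`edac_pci_alloc_ctl_info`, :c:func:`edac_pci_add_device` and
:c:func:`edac_pci_free_ctl_info` are the real API::

  #include "edac_module.h"

  static struct edac_pci_ctl_info *pci_ctl;

  static int my_pci_err_setup(struct device *dev)
  {
          /* No private data; the name appears under
           * /sys/devices/system/edac/pci/. */
          pci_ctl = edac_pci_alloc_ctl_info(0, "my_pci_err");
          if (!pci_ctl)
                  return -ENOMEM;

          pci_ctl->dev = dev;
          pci_ctl->mod_name = "my_edac";
          pci_ctl->ctl_name = "my_pci_err";

          if (edac_pci_add_device(pci_ctl, 0)) {
                  edac_pci_free_ctl_info(pci_ctl);
                  return -ENODEV;
          }
          return 0;
  }

In practice, many memory controller drivers use the
:c:func:`edac_pci_create_generic_ctl` helper instead, which registers a
generic control that polls the standard PCI status registers for parity
errors.
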
.. kernel-doc:: drivers/edac/edac_pci.h

EDAC Blocks
-----------

The EDAC subsystem also provides a generic mechanism to report errors
on other parts of the hardware via the
:c:func:`edac_device_alloc_ctl_info` function.

The structures :c:type:`edac_dev_sysfs_block_attribute`,
:c:type:`edac_device_block`, :c:type:`edac_device_instance` and
:c:type:`edac_device_ctl_info` provide a generic or abstract
'edac_device' representation at sysfs.

This set of structures, and the code that implements its APIs, provides
for registering EDAC-type devices which are NOT standard memory or PCI,
like:

- CPU caches (L1 and L2)
- DMA engines
- Core CPU switches
- Fabric switch units
- PCIe interface controllers
- other EDAC/ECC-type devices that can be monitored for errors

It allows for a two-level hierarchy of instances and blocks.

For example, a cache could be composed of L1, L2 and L3 levels of
cache. Each CPU core would have its own L1 cache, while sharing L2 and
maybe L3 caches. In such a case, those can be represented via the
following sysfs nodes::

  /sys/devices/system/edac/..

  pci/            <existing pci directory (if available)>
  mc/             <existing memory device directory>
  cpu/cpu0/..     <L1 and L2 block directory>
          /L1-cache/ce_count
                   /ue_count
          /L2-cache/ce_count
                   /ue_count
  cpu/cpu1/..     <L1 and L2 block directory>
          /L1-cache/ce_count
                   /ue_count
          /L2-cache/ce_count
                   /ue_count
  ...

The L1 and L2 directories would be "edac_device_block"s.

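A rough sketch of registering such a device, loosely modeled on the
in-tree Calxeda Highbank L2 driver
(``drivers/edac/highbank_l2_edac.c``). The probe function and names are
hypothetical, and the exact parameter list of
:c:func:`edac_device_alloc_ctl_info` has varied across kernel versions,
so check ``drivers/edac/edac_device.h`` for the kernel you build
against::

  #include <linux/platform_device.h>
  #include "edac_module.h"

  static int my_cache_edac_probe(struct platform_device *pdev)
  {
          struct edac_device_ctl_info *dci;

          /*
           * One instance ("cpu0") with two blocks named "L1" and "L2"
           * (block name "L", 2 blocks, block numbering starting at 1).
           */
          dci = edac_device_alloc_ctl_info(0, "cpu", 1, "L", 2, 1,
                                           NULL, 0,
                                           edac_device_alloc_index());
          if (!dci)
                  return -ENOMEM;

          dci->dev = &pdev->dev;
          dci->mod_name = "my_cache_edac";
          dci->ctl_name = "cpu_cache";
          dci->dev_name = dev_name(&pdev->dev);

          if (edac_device_add_device(dci)) {
                  edac_device_free_ctl_info(dci);
                  return -ENODEV;
          }
          return 0;
  }

Errors are then reported against a given instance and block with
:c:func:`edac_device_handle_ce` and :c:func:`edac_device_handle_ue`.
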
.. kernel-doc:: drivers/edac/edac_device.h

Heterogeneous system support
----------------------------

An AMD heterogeneous system is built by connecting the data fabrics of
both CPUs and GPUs via custom xGMI links. Thus, the data fabric on the
GPU nodes can be accessed the same way as the data fabric on CPU nodes.

The MI200 accelerators are data center GPUs. They have 2 data fabrics,
and each GPU data fabric contains four Unified Memory Controllers
(UMC). Each UMC contains eight channels, and each UMC channel controls
one 128-bit HBM2e (2 GB) channel (equivalent to 8 x 2 GB ranks). Per
data fabric, this creates a total of 4 UMCs x 8 channels x 128 bits =
4096 bits of DRAM data bus.

While each UMC interfaces a 16 GB HBM stack (8-high x 2 GB DRAM), each
UMC channel interfaces 2 GB of DRAM (represented as a rank).

Memory controllers on AMD GPU nodes can be represented in EDAC as
follows::

  GPU DF / GPU Node -> EDAC MC
  GPU UMC           -> EDAC CSROW
  GPU UMC channel   -> EDAC CHANNEL

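Under that mapping, registering one GPU node looks like an ordinary
memory controller allocation whose csrow layer has size 4 (one per UMC)
and whose channel layer has size 8 (one per UMC channel). The snippet
below is only an illustrative sketch with a hypothetical
``gpu_node_id`` variable; the in-tree implementation lives in the
``amd64_edac`` driver::

  struct edac_mc_layer layers[2];
  struct mem_ctl_info *mci;

  layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
  layers[0].size = 4;             /* 4 UMCs -> 4 EDAC CSROWs */
  layers[0].is_virt_csrow = true;
  layers[1].type = EDAC_MC_LAYER_CHANNEL;
  layers[1].size = 8;             /* 8 channels per UMC */
  layers[1].is_virt_csrow = false;

  mci = edac_mc_alloc(gpu_node_id, ARRAY_SIZE(layers), layers, 0);
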
For example: a heterogeneous system where 1 AMD CPU is connected to
4 MI200 (Aldebaran) GPUs using xGMI.

Some more heterogeneous hardware details:

- The CPU UMC (Unified Memory Controller) is mostly the same as the GPU
  UMC. They have chip selects (csrows) and channels. However, the
  layouts are different for performance, physical layout, or other
  reasons.
- CPU UMCs use 1 channel, so UMC = EDAC channel. This follows the
  marketing speak: a CPU has X memory channels, etc.
- CPU UMCs use up to 4 chip selects, so UMC chip select = EDAC CSROW.
- GPU UMCs use 1 chip select, so UMC = EDAC CSROW.
- GPU UMCs use 8 channels, so UMC channel = EDAC channel.

The EDAC subsystem provides a mechanism to handle AMD heterogeneous
systems by calling system-specific ops for both CPUs and GPUs.

AMD GPU nodes are enumerated in sequential order based on the PCI
hierarchy, and the first GPU node is assumed to have a Node ID value
following those of the CPU nodes after the latter are fully
populated::

  $ ls /sys/devices/system/edac/mc/
         mc0 - CPU MC node 0
  mc1  |
  mc2  |- GPU card[0] => node 0(mc1), node 1(mc2)
  mc3  |
  mc4  |- GPU card[1] => node 0(mc3), node 1(mc4)
  mc5  |
  mc6  |- GPU card[2] => node 0(mc5), node 1(mc6)
  mc7  |
  mc8  |- GPU card[3] => node 0(mc7), node 1(mc8)

For example, a heterogeneous system with one AMD CPU is connected to
four MI200 (Aldebaran) GPUs using xGMI. This topology can be
represented via the following sysfs entries::

  /sys/devices/system/edac/mc/..

  CPU                       # CPU node
  ├── mc 0

  GPU Nodes are enumerated sequentially after CPU nodes have been populated
  GPU card 1                # Each MI200 GPU has 2 nodes/mcs
  ├── mc 1                  # GPU node 0 == mc1, each MC node has 4 UMCs/CSROWs
  │   ├── csrow 0           # UMC 0
  │   │   ├── channel 0     # Each UMC has 8 channels
  │   │   ├── channel 1     # Size of each channel is 2 GB, so each UMC has 16 GB
  │   │   ├── channel 2
  │   │   ├── channel 3
  │   │   ├── channel 4
  │   │   ├── channel 5
  │   │   ├── channel 6
  │   │   ├── channel 7
  │   ├── csrow 1           # UMC 1
  │   │   ├── channel 0
  │   │   ├── ..
  │   │   ├── channel 7
  │   ├── ..                ..
  │   ├── csrow 3           # UMC 3
  │   │   ├── channel 0
  │   │   ├── ..
  │   │   ├── channel 7
  │   ├── rank 0
  │   ├── ..                ..
  │   ├── rank 31           # Total 32 ranks/dimms from 4 UMCs
  │
  ├── mc 2                  # GPU node 1 == mc2
  │   ├── ..                # Each GPU node has total 64 GB

  GPU card 2
  ├── mc 3
  │   ├── ..
  ├── mc 4
  │   ├── ..

  GPU card 3
  ├── mc 5
  │   ├── ..
  ├── mc 6
  │   ├── ..

  GPU card 4
  ├── mc 7
  │   ├── ..
  ├── mc 8
  │   ├── ..