Commit | Line | Data |
---|---|---|
b693d0b3 MCC |
1 | ===================== |
2 | Booting AArch64 Linux | |
3 | ===================== | |
9703d9d7 CM |
4 | |
5 | Author: Will Deacon <will.deacon@arm.com> | |
b693d0b3 | 6 | |
9703d9d7 CM |
7 | Date : 07 September 2012 |
8 | ||
9 | This document is based on the ARM booting document by Russell King and | |
10 | is relevant to all public releases of the AArch64 Linux kernel. | |
11 | ||
12 | The AArch64 exception model is made up of a number of exception levels | |
b8ac4ee0 AP |
13 | (EL0 - EL3), with EL0, EL1 and EL2 having a secure and a non-secure |
14 | counterpart. EL2 is the hypervisor level, EL3 is the highest priority | |
15 | level and exists only in secure mode. Both are architecturally optional. | |
9703d9d7 | 16 | |
b693d0b3 | 17 | For the purposes of this document, we will use the term `boot loader` |
9703d9d7 CM |
18 | simply to define all software that executes on the CPU(s) before control |
19 | is passed to the Linux kernel. This may include secure monitor and | |
20 | hypervisor code, or it may just be a handful of instructions for | |
21 | preparing a minimal boot environment. | |
22 | ||
23 | Essentially, the boot loader should provide (as a minimum) the | |
24 | following: | |
25 | ||
26 | 1. Setup and initialise the RAM | |
27 | 2. Setup the device tree | |
28 | 3. Decompress the kernel image | |
29 | 4. Call the kernel image | |
30 | ||
31 | ||
32 | 1. Setup and initialise RAM | |
33 | --------------------------- | |
34 | ||
35 | Requirement: MANDATORY | |
36 | ||
37 | The boot loader is expected to find and initialise all RAM that the | |
38 | kernel will use for volatile data storage in the system. It performs | |
39 | this in a machine dependent manner. (It may use internal algorithms | |
40 | to automatically locate and size all RAM, or it may use knowledge of | |
41 | the RAM in the machine, or any other method the boot loader designer | |
42 | sees fit.) | |
43 | ||
44 | ||
45 | 2. Setup the device tree | |
46 | ------------------------- | |
47 | ||
48 | Requirement: MANDATORY | |
49 | ||
61bd93ce AB |
50 | The device tree blob (dtb) must be placed on an 8-byte boundary and must |
51 | not exceed 2 megabytes in size. Since the dtb will be mapped cacheable | |
52 | using blocks of up to 2 megabytes in size, it must not be placed within | |
53 | any 2M region which must be mapped with any specific attributes. | |
9703d9d7 | 54 | |
61bd93ce AB |
55 | NOTE: versions prior to v4.2 also require that the DTB be placed within |
56 | the 512 MB region starting at text_offset bytes below the kernel Image. | |
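The alignment and size limits stated in this section can be checked mechanically. The following is a minimal sketch, not kernel or boot loader code: the function name `check_dtb_placement` and its arguments are hypothetical, and it does not attempt to check the rule about 2M regions that need specific mapping attributes, since that depends on platform knowledge.

```python
DTB_ALIGN = 8                      # bytes, from the requirement above
DTB_MAX_SIZE = 2 * 1024 * 1024     # 2 megabytes, from the requirement above


def check_dtb_placement(dtb_addr, dtb_size):
    """Return a list of violated constraints (empty if none are violated).

    dtb_addr and dtb_size are hypothetical inputs: the physical address
    chosen for the blob and its size in bytes.
    """
    problems = []
    if dtb_addr % DTB_ALIGN != 0:
        problems.append("dtb is not placed on an 8-byte boundary")
    if dtb_size > DTB_MAX_SIZE:
        problems.append("dtb exceeds 2 megabytes in size")
    return problems
```

An empty list means both limits are respected for the chosen address and size.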
9703d9d7 CM |
57 | |
58 | 3. Decompress the kernel image | |
59 | ------------------------------ | |
60 | ||
61 | Requirement: OPTIONAL | |
62 | ||
63 | The AArch64 kernel does not currently provide a decompressor and | |
64 | therefore requires decompression (gzip etc.) to be performed by the boot | |
65 | loader if a compressed Image target (e.g. Image.gz) is used. For | |
66 | bootloaders that do not implement this requirement, the uncompressed | |
67 | Image target is available instead. | |
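As an illustration of the decompression step, the sketch below gunzips an Image.gz in host-side tooling and then looks for the header magic described in section 4. It is an assumption-laden example rather than boot loader code: the name `load_image` is hypothetical, only gzip is handled, and the magic offset of 56 is derived by summing the field sizes in the header listing.

```python
import gzip
import struct

ARM64_MAGIC = 0x644D5241    # "ARM\x64", stored little endian
MAGIC_OFFSET = 56           # derived from the field sizes in the header listing
GZIP_MAGIC = b"\x1f\x8b"    # first two bytes of any gzip stream


def load_image(data):
    """Return an uncompressed Image, gunzipping the input if necessary."""
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    if len(data) < 64:
        raise ValueError("too short to hold the 64-byte Image header")
    (magic,) = struct.unpack_from("<I", data, MAGIC_OFFSET)
    if magic != ARM64_MAGIC:
        raise ValueError("arm64 Image magic not found")
    return data
```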
68 | ||
69 | ||
70 | 4. Call the kernel image | |
71 | ------------------------ | |
72 | ||
73 | Requirement: MANDATORY | |
74 | ||
b693d0b3 | 75 | The decompressed kernel image contains a 64-byte header as follows:: |
9703d9d7 | 76 | |
4370eec0 RF |
77 | u32 code0; /* Executable code */ |
78 | u32 code1; /* Executable code */ | |
a2c1d73b MR |
79 | u64 text_offset; /* Image load offset, little endian */ |
80 | u64 image_size; /* Effective Image size, little endian */ | |
81 | u64 flags; /* kernel flags, little endian */ | |
9703d9d7 | 82 | u64 res2 = 0; /* reserved */ |
4370eec0 RF |
83 | u64 res3 = 0; /* reserved */ |
84 | u64 res4 = 0; /* reserved */ | |
85 | u32 magic = 0x644d5241; /* Magic number, little endian, "ARM\x64" */ | |
6c020ea8 | 86 | u32 res5; /* reserved (used for PE COFF offset) */ |
4370eec0 RF |
87 | |
88 | ||
89 | Header notes: | |
90 | ||
a2c1d73b MR |
91 | - As of v3.17, all fields are little endian unless stated otherwise. |
92 | ||
4370eec0 | 93 | - code0/code1 are responsible for branching to stext. |
a2c1d73b | 94 | |
cdd78578 MS |
95 | - when booting through EFI, code0/code1 are initially skipped. |
96 | res5 is an offset to the PE header and the PE header has the EFI | |
a2c1d73b | 97 | entry point (efi_stub_entry). When the stub has done its work, it |
cdd78578 | 98 | jumps to code0 to resume the normal boot process. |
9703d9d7 | 99 | |
a2c1d73b MR |
100 | - Prior to v3.17, the endianness of text_offset was not specified. In |
101 | these cases image_size is zero and text_offset is 0x80000 in the | |
102 | endianness of the kernel. Where image_size is non-zero, image_size is | |
103 | little-endian and must be respected. Where image_size is zero, | |
104 | text_offset can be assumed to be 0x80000. | |
105 | ||
106 | - The flags field (introduced in v3.17) is a little-endian 64-bit field | |
107 | composed as follows: | |
b693d0b3 MCC |
108 | |
109 | ============= =============================================================== | |
110 | Bit 0 Kernel endianness. 1 if BE, 0 if LE. | |
111 | Bit 1-2 Kernel Page size. | |
112 | ||
113 | * 0 - Unspecified. | |
114 | * 1 - 4K | |
115 | * 2 - 16K | |
116 | * 3 - 64K | |
117 | Bit 3 Kernel physical placement | |
118 | ||
119 | 0 | |
120 | 2MB aligned base should be as close as possible | |
121 | to the base of DRAM, since memory below it is not | |
122 | accessible via the linear mapping | |
123 | 1 | |
453dfcee AB |
124 | 2MB aligned base such that all image_size bytes |
125 | counted from the start of the image are within | |
126 | the 48-bit addressable range of physical memory | |
b693d0b3 MCC |
127 | Bits 4-63 Reserved. |
128 | ============= =============================================================== | |
a2c1d73b MR |
129 | |
130 | - When image_size is zero, a bootloader should attempt to keep as much | |
131 | memory as possible free for use by the kernel immediately after the | |
132 | end of the kernel image. The amount of space required will vary | |
133 | depending on selected features, and is effectively unbounded. | |
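The header layout and the notes above can be exercised with a short parser. This is a sketch under stated assumptions: it treats every field as little endian (true as of v3.17 per the notes), and the names `parse_header`, `effective_text_offset` and `decode_flags` are hypothetical helpers, not kernel interfaces.

```python
import struct
from collections import namedtuple

# Two u32, six u64, two u32, little endian, no padding: 64 bytes in total.
HEADER_FORMAT = "<IIQQQQQQII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
ARM64_MAGIC = 0x644D5241

ImageHeader = namedtuple(
    "ImageHeader",
    "code0 code1 text_offset image_size flags res2 res3 res4 magic res5",
)

PAGE_SIZES = {0: "unspecified", 1: "4K", 2: "16K", 3: "64K"}


def parse_header(image):
    """Unpack the first 64 bytes of an uncompressed Image."""
    hdr = ImageHeader(*struct.unpack_from(HEADER_FORMAT, image, 0))
    if hdr.magic != ARM64_MAGIC:
        raise ValueError("bad magic")
    return hdr


def effective_text_offset(hdr):
    """Where image_size is zero, text_offset can be assumed to be 0x80000."""
    return hdr.text_offset if hdr.image_size != 0 else 0x80000


def decode_flags(flags):
    """Split the flags field into the documented bit fields."""
    return {
        "big_endian": bool(flags & 0x1),                  # bit 0
        "page_size": PAGE_SIZES[(flags >> 1) & 0x3],      # bits 1-2
        "placement_anywhere": bool((flags >> 3) & 0x1),   # bit 3
    }
```

Bits 4-63 of the flags field are reserved and are deliberately ignored here.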
134 | ||
135 | The Image must be placed text_offset bytes from a 2MB aligned base | |
a7f8de16 AB |
136 | address anywhere in usable system RAM and called there. The region |
137 | between the 2 MB aligned base address and the start of the image has no | |
138 | special significance to the kernel, and may be used for other purposes. | |
a2c1d73b MR |
139 | At least image_size bytes from the start of the image must be free for |
140 | use by the kernel. | |
a7f8de16 AB |
141 | NOTE: versions prior to v4.6 cannot make use of memory below the |
142 | physical offset of the Image so it is recommended that the Image be | |
143 | placed as close as possible to the start of system RAM. | |
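The placement rule can be expressed as a small calculation. The sketch below assumes a single contiguous RAM region and picks the lowest 2MB aligned base inside it, which also suits kernels prior to v4.6. The function name `image_load_address` and its parameters are hypothetical.

```python
SZ_2M = 2 * 1024 * 1024


def image_load_address(ram_base, ram_size, text_offset, image_size):
    """Pick the lowest valid load address inside one RAM region.

    text_offset and image_size are the values read from the Image header.
    """
    # Round the start of RAM up to the next 2MB boundary.
    base = (ram_base + SZ_2M - 1) & ~(SZ_2M - 1)
    load = base + text_offset
    # At least image_size bytes from the start of the image must be free.
    if load + image_size > ram_base + ram_size:
        raise ValueError("RAM region too small for the Image")
    return load
```

A real boot loader would also have to avoid regions already occupied by the dtb, the initrd and its own data, which this sketch does not model.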
a2c1d73b | 144 | |
177e15f0 AB |
145 | If an initrd/initramfs is passed to the kernel at boot, it must reside |
146 | entirely within a 1 GB aligned physical memory window of up to 32 GB in | |
147 | size that fully covers the kernel Image as well. | |
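One way to read the initrd rule is that a window starting on a 1 GB boundary and spanning at most 32 GB must contain both the Image and the initrd. The sketch below checks exactly that interpretation. The name `initrd_window_ok` is hypothetical, and all addresses are physical addresses chosen by the boot loader.

```python
SZ_1G = 1 << 30
WINDOW_MAX = 32 * SZ_1G


def initrd_window_ok(image_start, image_size, initrd_start, initrd_size):
    """Return True if one 1 GB aligned window of at most 32 GB covers both."""
    lo = min(image_start, initrd_start)
    hi = max(image_start + image_size, initrd_start + initrd_size)
    # The highest 1 GB aligned start that still covers 'lo' gives the window
    # its best chance of reaching 'hi'.
    window_start = lo & ~(SZ_1G - 1)
    return hi - window_start <= WINDOW_MAX
```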
148 | ||
6c020ea8 AB |
149 | Any memory described to the kernel (even that below the start of the |
150 | image) which is not marked as reserved from the kernel (e.g., with a | |
a2c1d73b MR |
151 | memreserve region in the device tree) will be considered as available to |
152 | the kernel. | |
9703d9d7 CM |
153 | |
154 | Before jumping into the kernel, the following conditions must be met: | |
155 | ||
156 | - Quiesce all DMA capable devices so that memory does not get | |
157 | corrupted by bogus network packets or disk data. This will save | |
158 | you many hours of debug. | |
159 | ||
b693d0b3 MCC |
160 | - Primary CPU general-purpose register settings: |
161 | ||
162 | - x0 = physical address of device tree blob (dtb) in system RAM. | |
163 | - x1 = 0 (reserved for future use) | |
164 | - x2 = 0 (reserved for future use) | |
165 | - x3 = 0 (reserved for future use) | |
9703d9d7 CM |
166 | |
167 | - CPU mode | |
b693d0b3 | 168 | |
9703d9d7 CM |
169 | All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError, |
170 | IRQ and FIQ). | |
b8ac4ee0 AP |
171 | The CPU must be in non-secure state, either in EL2 (RECOMMENDED in order |
172 | to have access to the virtualisation extensions), or in EL1. | |
9703d9d7 CM |
173 | |
174 | - Caches, MMUs | |
b693d0b3 | 175 | |
9703d9d7 | 176 | The MMU must be off. |
877a37d3 | 177 | |
e24e03aa WD |
178 | The instruction cache may be on or off, and must not hold any stale |
179 | entries corresponding to the loaded kernel image. | |
877a37d3 | 180 | |
c218bca7 CM |
181 | The address range corresponding to the loaded kernel image must be |
182 | cleaned to the PoC. In the presence of a system cache or other | |
183 | coherent masters with caches enabled, this will typically require | |
184 | cache maintenance by VA rather than set/way operations. | |
185 | System caches which respect the architected cache maintenance by VA | |
186 | operations must be configured and may be enabled. | |
187 | System caches which do not respect architected cache maintenance by VA | |
188 | operations (not recommended) must be configured and disabled. | |
9703d9d7 CM |
189 | |
190 | - Architected timers | |
b693d0b3 | 191 | |
4fcd6e14 MR |
192 | CNTFRQ must be programmed with the timer frequency and CNTVOFF must |
193 | be programmed with a consistent value on all CPUs. If entering the | |
194 | kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0) set where | |
195 | available. | |
9703d9d7 CM |
196 | |
197 | - Coherency | |
b693d0b3 | 198 | |
9703d9d7 CM |
199 | All CPUs to be booted by the kernel must be part of the same coherency |
200 | domain on entry to the kernel. This may require IMPLEMENTATION DEFINED | |
201 | initialisation to enable the receiving of maintenance operations on | |
202 | each CPU. | |
203 | ||
204 | - System registers | |
b693d0b3 | 205 | |
230800cd MB |
206 | All writable architected system registers at or below the exception |
207 | level where the kernel image will be entered must be initialised by | |
208 | software at a higher exception level to prevent execution in an UNKNOWN | |
209 | state. | |
9703d9d7 | 210 | |
e3849765 MZ |
211 | For all systems: |
212 | - If EL3 is present: | |
213 | ||
214 | - SCR_EL3.FIQ must have the same value across all CPUs the kernel is | |
215 | executing on. | |
216 | - The value of SCR_EL3.FIQ must be the same as the one present at boot | |
217 | time whenever the kernel is executing. | |
218 | ||
219 | - If EL3 is present and the kernel is entered at EL2: | |
220 | ||
221 | - SCR_EL3.HCE (bit 8) must be initialised to 0b1. | |
d98d0a99 | 222 | |
6d32ab2d | 223 | For systems with a GICv3 interrupt controller to be used in v3 mode: |
63f8344c | 224 | - If EL3 is present: |
b693d0b3 MCC |
225 | |
226 | - ICC_SRE_EL3.Enable (bit 3) must be initialised to 0b1. | |
227 | - ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1. | |
7e3a57fa MZ |
228 | - ICC_CTLR_EL3.PMHE (bit 6) must be set to the same value across |
229 | all CPUs the kernel is executing on, and must stay constant | |
230 | for the lifetime of the kernel. | |
b693d0b3 | 231 | |
63f8344c | 232 | - If the kernel is entered at EL1: |
b693d0b3 MCC |
233 | |
234 | - ICC_SRE_EL2.Enable (bit 3) must be initialised to 0b1. | |
235 | - ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1. | |
236 | ||
6d32ab2d MZ |
237 | - The DT or ACPI tables must describe a GICv3 interrupt controller. |
238 | ||
239 | For systems with a GICv3 interrupt controller to be used in | |
240 | compatibility (v2) mode: | |
b693d0b3 | 241 | |
6d32ab2d | 242 | - If EL3 is present: |
b693d0b3 MCC |
243 | |
244 | ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b0. | |
245 | ||
6d32ab2d | 246 | - If the kernel is entered at EL1: |
b693d0b3 MCC |
247 | |
248 | ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b0. | |
249 | ||
6d32ab2d | 250 | - The DT or ACPI tables must describe a GICv2 interrupt controller. |
63f8344c | 251 | |
fbedc599 | 252 | For CPUs with pointer authentication functionality: |
877a37d3 | 253 | |
fbedc599 | 254 | - If EL3 is present: |
b693d0b3 MCC |
255 | |
256 | - SCR_EL3.APK (bit 16) must be initialised to 0b1 | |
257 | - SCR_EL3.API (bit 17) must be initialised to 0b1 | |
258 | ||
fbedc599 | 259 | - If the kernel is entered at EL1: |
b693d0b3 MCC |
260 | |
261 | - HCR_EL2.APK (bit 40) must be initialised to 0b1 | |
262 | - HCR_EL2.API (bit 41) must be initialised to 0b1 | |
fbedc599 | 263 | |
6abde908 | 264 | For CPUs with Activity Monitors Unit v1 (AMUv1) extension present: |
877a37d3 | 265 | |
6abde908 | 266 | - If EL3 is present: |
877a37d3 MCC |
267 | |
268 | - CPTR_EL3.TAM (bit 30) must be initialised to 0b0 | |
269 | - CPTR_EL2.TAM (bit 30) must be initialised to 0b0 | |
270 | - AMCNTENSET0_EL0 must be initialised to 0b1111 | |
271 | - AMCNTENSET1_EL0 must be initialised to a platform specific value | |
272 | having 0b1 set for the corresponding bit for each of the auxiliary | |
273 | counters present. | |
274 | ||
6abde908 | 275 | - If the kernel is entered at EL1: |
877a37d3 MCC |
276 | |
277 | - AMCNTENSET0_EL0 must be initialised to 0b1111 | |
278 | - AMCNTENSET1_EL0 must be initialised to a platform specific value | |
279 | having 0b1 set for the corresponding bit for each of the auxiliary | |
280 | counters present. | |
6abde908 | 281 | |
3e237387 MB |
282 | For CPUs with the Fine Grained Traps (FEAT_FGT) extension present: |
283 | ||
284 | - If EL3 is present and the kernel is entered at EL2: | |
285 | ||
286 | - SCR_EL3.FGTEn (bit 27) must be initialised to 0b1. | |
287 | ||
ca940790 MB |
288 | For CPUs with support for HCRX_EL2 (FEAT_HCX) present: |
289 | ||
290 | - If EL3 is present and the kernel is entered at EL2: | |
291 | ||
292 | - SCR_EL3.HXEn (bit 38) must be initialised to 0b1. | |
293 | ||
b30dbf4d MB |
294 | For CPUs with Advanced SIMD and floating point support: |
295 | ||
296 | - If EL3 is present: | |
297 | ||
298 | - CPTR_EL3.TFP (bit 10) must be initialised to 0b0. | |
299 | ||
300 | - If EL2 is present and the kernel is entered at EL1: | |
301 | ||
302 | - CPTR_EL2.TFP (bit 10) must be initialised to 0b0. | |
303 | ||
ff1c42cd MB |
304 | For CPUs with the Scalable Vector Extension (FEAT_SVE) present: |
305 | ||
306 | - If EL3 is present: | |
307 | ||
308 | - CPTR_EL3.EZ (bit 8) must be initialised to 0b1. | |
309 | ||
310 | - ZCR_EL3.LEN must be initialised to the same value for all CPUs the | |
311 | kernel is executed on. | |
312 | ||
313 | - If the kernel is entered at EL1 and EL2 is present: | |
314 | ||
315 | - CPTR_EL2.TZ (bit 8) must be initialised to 0b0. | |
316 | ||
317 | - CPTR_EL2.ZEN (bits 17:16) must be initialised to 0b11. | |
318 | ||
319 | - ZCR_EL2.LEN must be initialised to the same value for all CPUs the | |
320 | kernel will execute on. | |
321 | ||
a8caaa23 MB |
322 | For CPUs with the Scalable Matrix Extension (FEAT_SME): |
323 | ||
324 | - If EL3 is present: | |
325 | ||
326 | - CPTR_EL3.ESM (bit 12) must be initialised to 0b1. | |
327 | ||
328 | - SCR_EL3.EnTP2 (bit 41) must be initialised to 0b1. | |
329 | ||
330 | - SMCR_EL3.LEN must be initialised to the same value for all CPUs the | |
331 | kernel will execute on. | |
332 | ||
333 | - If the kernel is entered at EL1 and EL2 is present: | |
334 | ||
335 | - CPTR_EL2.TSM (bit 12) must be initialised to 0b0. | |
336 | ||
337 | - CPTR_EL2.SMEN (bits 25:24) must be initialised to 0b11. | |
338 | ||
339 | - SCTLR_EL2.EnTP2 (bit 60) must be initialised to 0b1. | |
340 | ||
341 | - SMCR_EL2.LEN must be initialised to the same value for all CPUs the | |
342 | kernel will execute on. | |
343 | ||
be0ddf52 MB |
344 | - HFGRTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b1. |
345 | ||
346 | - HFGWTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b1. | |
347 | ||
348 | - HFGRTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b1. | |
349 | ||
350 | - HFGWTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b1. | |
351 | ||
8a8112d8 | 352 | For CPUs with the Scalable Matrix Extension FA64 feature (FEAT_SME_FA64): |
d198c77b MB |
353 | |
354 | - If EL3 is present: | |
355 | ||
356 | - SMCR_EL3.FA64 (bit 31) must be initialised to 0b1. | |
357 | ||
358 | - If the kernel is entered at EL1 and EL2 is present: | |
359 | ||
360 | - SMCR_EL2.FA64 (bit 31) must be initialised to 0b1. | |
361 | ||
b6ba1a89 PC |
362 | For CPUs with the Memory Tagging Extension feature (FEAT_MTE2): |
363 | ||
364 | - If EL3 is present: | |
365 | ||
366 | - SCR_EL3.ATA (bit 26) must be initialised to 0b1. | |
367 | ||
368 | - If the kernel is entered at EL1 and EL2 is present: | |
369 | ||
370 | - HCR_EL2.ATA (bit 56) must be initialised to 0b1. | |
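Several of the requirements above name SCR_EL3 bits that must be set to 0b1. The sketch below collects just those bits into a mask, as a cross-check aid for firmware authors. It assumes EL3 is present, covers only the SCR_EL3 bits listed in this document, and uses hypothetical feature labels ("pauth", "mte2", "fgt", "hcx", "sme") that are not architectural names.

```python
# (feature label or None for "all systems", bit number, only when entered at EL2)
SCR_EL3_REQUIREMENTS = [
    (None, 8, True),       # HCE
    ("pauth", 16, False),  # APK
    ("pauth", 17, False),  # API
    ("mte2", 26, False),   # ATA
    ("fgt", 27, True),     # FGTEn
    ("hcx", 38, True),     # HXEn
    ("sme", 41, False),    # EnTP2
]


def required_scr_el3_bits(features, entered_at_el2):
    """Return a mask of SCR_EL3 bits that must be 0b1, assuming EL3 is present."""
    mask = 0
    for feature, bit, el2_only in SCR_EL3_REQUIREMENTS:
        if el2_only and not entered_at_el2:
            continue
        if feature is not None and feature not in features:
            continue
        mask |= 1 << bit
    return mask
```

The SCR_EL3.FIQ consistency rule and the requirements on other registers are not bit-set rules of this kind and are left out.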
371 | ||
4fcd6e14 MR |
372 | The requirements described above for CPU mode, caches, MMUs, architected |
373 | timers, coherency and system registers apply to all CPUs. All CPUs must | |
ee61f36d MB |
374 | enter the kernel in the same exception level. Where the values documented |
375 | disable traps it is permissible for these traps to be enabled so long as | |
376 | those traps are handled transparently by higher exception levels as though | |
377 | the values documented were set. | |
4fcd6e14 | 378 | |
9703d9d7 CM |
379 | The boot loader is expected to enter the kernel on each CPU in the |
380 | following manner: | |
381 | ||
382 | - The primary CPU must jump directly to the first instruction of the | |
383 | kernel image. The device tree blob passed by this CPU must contain | |
4fcd6e14 MR |
384 | an 'enable-method' property for each cpu node. The supported |
385 | enable-methods are described below. | |
9703d9d7 CM |
386 | |
387 | It is expected that the bootloader will generate these device tree | |
388 | properties and insert them into the blob prior to kernel entry. | |
389 | ||
4fcd6e14 MR |
390 | - CPUs with a "spin-table" enable-method must have a 'cpu-release-addr' |
391 | property in their cpu node. This property identifies a | |
392 | naturally-aligned 64-bit zero-initialised memory location. | |
393 | ||
394 | These CPUs should spin outside of the kernel in a reserved area of | |
395 | memory (communicated to the kernel by a /memreserve/ region in the | |
9703d9d7 CM |
396 | device tree) polling their cpu-release-addr location, which must be |
397 | contained in the reserved region. A wfe instruction may be inserted | |
398 | to reduce the overhead of the busy-loop and a sev will be issued by | |
399 | the primary CPU. When a read of the location pointed to by the | |
4fcd6e14 MR |
400 | cpu-release-addr returns a non-zero value, the CPU must jump to this |
401 | value. The value will be written as a single 64-bit little-endian | |
402 | value, so CPUs must convert the read value to their native endianness | |
403 | before jumping to it. | |
404 | ||
405 | - CPUs with a "psci" enable-method should remain outside of | |
406 | the kernel (i.e. outside of the regions of memory described to the | |
407 | kernel in the memory node, or in a reserved area of memory described | |
408 | to the kernel by a /memreserve/ region in the device tree). The | |
409 | kernel will issue CPU_ON calls as described in ARM document number ARM | |
410 | DEN 0022A ("Power State Coordination Interface System Software on ARM | |
411 | processors") to bring CPUs into the kernel. | |
412 | ||
413 | The device tree should contain a 'psci' node, as described in | |
5025ef8b | 414 | Documentation/devicetree/bindings/arm/psci.yaml. |
9703d9d7 CM |
415 | |
416 | - Secondary CPU general-purpose register settings | |
877a37d3 MCC |
417 | |
418 | - x0 = 0 (reserved for future use) | |
419 | - x1 = 0 (reserved for future use) | |
420 | - x2 = 0 (reserved for future use) | |
421 | - x3 = 0 (reserved for future use) |
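For the "spin-table" enable-method described above, the release value is written as a single 64-bit little-endian value, so a secondary CPU running big endian has to convert it before jumping. The sketch below models only that decoding step on a raw 8-byte read. The name `release_target` is hypothetical, and the wfe/sev handshake and the jump itself are outside what Python can show.

```python
def release_target(raw):
    """Decode the 8 bytes read from the cpu-release-addr location.

    Returns None while the location still holds zero (keep spinning),
    otherwise the address to jump to.
    """
    if len(raw) != 8:
        raise ValueError("cpu-release-addr is a 64-bit location")
    # The value is always little endian, whatever the CPU's own endianness.
    value = int.from_bytes(raw, "little")
    return value or None
```

Interpreting the same bytes as big endian would yield a different address, which is why the conversion is required before the jump.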