Commit | Line | Data |
---|---|---|
b693d0b3 MCC |
1 | =================================================== |
2 | Scalable Vector Extension support for AArch64 Linux | |
3 | =================================================== | |
ce699081 DM |
4 | |
5 | Author: Dave Martin <Dave.Martin@arm.com> | |
b693d0b3 | 6 | |
ce699081 DM |
7 | Date: 4 August 2017 |
8 | ||
9 | This document outlines briefly the interface provided to userspace by Linux in | |
96d32e63 MB |
10 | order to support use of the ARM Scalable Vector Extension (SVE), including |
11 | interactions with Streaming SVE mode added by the Scalable Matrix Extension | |
12 | (SME). | |
ce699081 DM |
13 | |
14 | This is an outline of the most important features and issues only and is not | |
15 | intended to be exhaustive. | |
16 | ||
17 | This document does not aim to describe the SVE architecture or programmer's | |
18 | model. To aid understanding, a minimal description of relevant programmer's | |
19 | model features for SVE is included in Appendix A. | |
20 | ||
21 | ||
22 | 1. General | |
23 | ----------- | |
24 | ||
25 | * SVE registers Z0..Z31, P0..P15 and FFR, and the current vector length VL, are | |
26 | tracked per-thread. | |
27 | ||
96d32e63 MB |
28 | * In streaming mode FFR is not accessible unless HWCAP2_SME_FA64 is present |
29 | in the system. When it is not supported and these interfaces are used to | |
30 | access streaming mode, FFR is read and written as zero. | |
31 | ||
ce699081 DM |
32 | * The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector |
33 | AT_HWCAP entry. Presence of this flag implies the presence of the SVE | |
34 | instructions and registers, and the Linux-specific system interfaces | |
35 | described in this document. SVE is reported in /proc/cpuinfo as "sve". | |
36 | ||
37 | * Support for the execution of SVE instructions in userspace can also be | |
38 | detected by reading the CPU ID register ID_AA64PFR0_EL1 using an MRS | |
39 | instruction, and checking that the value of the SVE field is nonzero. [3] | |
40 | ||
41 | It does not guarantee the presence of the system interfaces described in the | |
42 | following sections: software that needs to verify that those interfaces are | |
43 | present must check for HWCAP_SVE instead. | |
44 | ||
06a916fe DM |
45 | * On hardware that supports the SVE2 extensions, HWCAP2_SVE2 will also |
46 | be reported in the AT_HWCAP2 aux vector entry. In addition to this, | |
47 | optional extensions to SVE2 may be reported by the presence of: | |
48 | ||
49 | HWCAP2_SVE2 | |
50 | HWCAP2_SVEAES | |
51 | HWCAP2_SVEPMULL | |
52 | HWCAP2_SVEBITPERM | |
53 | HWCAP2_SVESHA3 | |
54 | HWCAP2_SVESM4 | |
d12aada8 | 55 | HWCAP2_SVE2P1 |
06a916fe DM |
56 | |
57 | This list may be extended over time as the SVE architecture evolves. | |
58 | ||
59 | These extensions are also reported via the CPU ID register ID_AA64ZFR0_EL1, | |
60 | which userspace can read using an MRS instruction. See elf_hwcaps.rst and | |
61 | cpu-feature-registers.rst for details. | |
62 | ||
96d32e63 MB |
63 | * On hardware that supports the SME extensions, HWCAP2_SME will also be |
64 | reported in the AT_HWCAP2 aux vector entry. Among other things SME adds | |
65 | streaming mode which provides a subset of the SVE feature set using a | |
66 | separate SME vector length and the same Z/V registers. See sme.rst | |
67 | for more details. | |
68 | ||
ce699081 DM |
69 | * Debuggers should restrict themselves to interacting with the target via the |
70 | NT_ARM_SVE regset. The recommended way of detecting support for this regset | |
71 | is to connect to a target process first and then attempt a | |
96d32e63 MB |
72 | ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov). Note that when SME is |
73 | present and streaming SVE mode is in use the FPSIMD subset of registers | |
74 | will be read via NT_ARM_SVE and NT_ARM_SVE writes will exit streaming mode | |
75 | in the target. | |
ce699081 | 76 | |
41040cf7 DM |
77 | * Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory |
78 | between userspace and the kernel, the register value is encoded in memory in | |
79 | an endianness-invariant layout, with bits [(8 * i + 7) : (8 * i)] encoded at | |
80 | byte offset i from the start of the memory representation. This affects for | |
81 | example the signal frame (struct sve_context) and ptrace interface | |
82 | (struct user_sve_header) and associated data. | |
83 | ||
84 | Beware that on big-endian systems this results in a different byte order than | |
85 | for the FPSIMD V-registers, which are stored as single host-endian 128-bit | |
86 | values, with bits [(127 - 8 * i) : (120 - 8 * i)] of the register encoded at | |
87 | byte offset i. (struct fpsimd_context, struct user_fpsimd_state). | |
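As an illustration of the detection rule above, the following is a minimal sketch of a userspace check for HWCAP_SVE in the AT_HWCAP aux vector entry. The helper name have_sve() is hypothetical, and the fallback definition of HWCAP_SVE is an assumption that only exists so the sketch compiles where the arm64 headers are not installed.

```c
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */

/* HWCAP_SVE normally comes from the arm64 hwcap headers. The fallback
 * value below is an assumption used only when those headers are absent. */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)
#endif

/* Hypothetical helper: returns 1 if the kernel reports SVE together with
 * the Linux interfaces described in this document, 0 otherwise. */
int have_sve(void)
{
#if defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return 0;   /* the HWCAP_SVE bit is only meaningful on AArch64 */
#endif
}
```

Software that only needs to know whether SVE instructions can execute may read ID_AA64PFR0_EL1 instead, but a check like the one above is what guarantees the presence of the prctl, ptrace and signal interfaces.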
88 | ||
ce699081 DM |
89 | |
90 | 2. Vector length terminology | |
91 | ----------------------------- | |
92 | ||
93 | The size of an SVE vector (Z) register is referred to as the "vector length". | |
94 | ||
95 | To avoid confusion about the units used to express vector length, the kernel | |
96 | adopts the following conventions: | |
97 | ||
98 | * Vector length (VL) = size of a Z-register in bytes | |
99 | ||
100 | * Vector quadwords (VQ) = size of a Z-register in units of 128 bits | |
101 | ||
102 | (So, VL = 16 * VQ.) | |
103 | ||
104 | The VQ convention is used where the underlying granularity is important, such | |
105 | as in data structure definitions. In most other situations, the VL convention | |
106 | is used. This is consistent with the meaning of the "VL" pseudo-register in | |
107 | the SVE instruction set architecture. | |
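The relationship VL = 16 * VQ can be sketched as follows. The helper names are hypothetical and local to this example; the kernel UAPI headers provide their own macros (such as sve_vq_from_vl() and sve_vl_valid(), mentioned elsewhere in this document), which should be preferred in real code. The validity check here only tests the quadword granularity and does not model any upper limit.

```c
#include <stdbool.h>

#define VQ_BYTES 16u   /* one quadword = 128 bits = 16 bytes */

/* Vector length in bytes -> number of 128-bit quadwords. */
static inline unsigned int vq_from_vl(unsigned int vl)
{
    return vl / VQ_BYTES;
}

/* Number of 128-bit quadwords -> vector length in bytes. */
static inline unsigned int vl_from_vq(unsigned int vq)
{
    return vq * VQ_BYTES;
}

/* True if vl is a nonzero whole number of quadwords. */
static inline bool vl_is_quadword_multiple(unsigned int vl)
{
    return vl >= VQ_BYTES && vl % VQ_BYTES == 0;
}
```

For example, a 256-bit implementation has VL = 32 and VQ = 2.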
108 | ||
109 | ||
110 | 3. System call behaviour | |
111 | ------------------------- | |
112 | ||
113 | * On syscall, V0..V31 are preserved (as without SVE). Thus, bits [127:0] of | |
114 | Z0..Z31 are preserved. All other bits of Z0..Z31, and all of P0..P15 and FFR | |
d09ee410 | 115 | become zero on return from a syscall. |
ce699081 DM |
116 | |
117 | * The SVE registers are not used to pass arguments to or receive results from | |
118 | any syscall. | |
119 | ||
120 | * In practice the affected registers/bits will be preserved or will be replaced | |
121 | with zeros on return from a syscall, but userspace should not make | |
122 | assumptions about this. The kernel behaviour may vary on a case-by-case | |
123 | basis. | |
124 | ||
125 | * All other SVE state of a thread, including the currently configured vector | |
126 | length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector | |
127 | length (if any), is preserved across all syscalls, subject to the specific | |
128 | exceptions for execve() described in section 6. | |
129 | ||
130 | In particular, on return from a fork() or clone(), the parent and new child | |
131 | process or thread share identical SVE configuration, matching that of the | |
132 | parent before the call. | |
133 | ||
134 | ||
135 | 4. Signal handling | |
136 | ------------------- | |
137 | ||
138 | * A new signal frame record sve_context encodes the SVE registers on signal | |
139 | delivery. [1] | |
140 | ||
141 | * This record is supplementary to fpsimd_context. The FPSR and FPCR registers | |
142 | are only present in fpsimd_context. For convenience, the content of V0..V31 | |
143 | is duplicated between sve_context and fpsimd_context. | |
144 | ||
96d32e63 MB |
145 | * The record contains a flag field which includes a flag SVE_SIG_FLAG_SM which |
146 | if set indicates that the thread is in streaming mode and the vector length | |
147 | and register data (if present) describe the streaming SVE data and vector | |
148 | length. | |
149 | ||
ce699081 DM |
150 | * The signal frame record for SVE always contains basic metadata, in particular |
151 | the thread's vector length (in sve_context.vl). | |
152 | ||
153 | * The SVE registers may or may not be included in the record, depending on | |
154 | whether the registers are live for the thread. The registers are present if | |
155 | and only if: | |
156 | sve_context.head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)). | |
157 | ||
158 | * If the registers are present, the remainder of the record has a vl-dependent | |
159 | size and layout. Macros SVE_SIG_* are defined [1] to facilitate access to | |
160 | the members. | |
161 | ||
41040cf7 DM |
162 | * Each scalable register (Zn, Pn, FFR) is stored in an endianness-invariant |
163 | layout, with bits [(8 * i + 7) : (8 * i)] stored at byte offset i from the | |
164 | start of the register's representation in memory. | |
165 | ||
ce699081 DM |
166 | * If the SVE context is too big to fit in sigcontext.__reserved[], then extra |
167 | space is allocated on the stack and an extra_context record is written in | |
168 | __reserved[] referencing this space. sve_context is then written in the | |
169 | extra space. Refer to [1] for further details about this mechanism. | |
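The endianness-invariant layout used for Zn, Pn and FFR in the signal frame can be illustrated with a short sketch. The function names and the representation of a register value as an array of 64-bit limbs (limb 0 least significant) are assumptions made for this example only. The second helper shows where the same byte of a V-register lands in the host-endian 128-bit FPSIMD representation.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode nbytes of a register value into the endianness-invariant layout:
 * byte offset i receives bits [(8 * i + 7) : (8 * i)] of the value.
 * limbs[0] holds bits [63:0], limbs[1] holds bits [127:64], and so on. */
void sve_encode_bytes(uint8_t *dst, const uint64_t *limbs, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        dst[i] = (uint8_t)(limbs[i / 8] >> (8 * (i % 8)));
}

/* Byte offset at which bits [(8 * k + 7) : (8 * k)] of a 128-bit
 * V-register are stored in the host-endian FPSIMD representation. */
size_t fpsimd_byte_offset(size_t k, int host_is_big_endian)
{
    return host_is_big_endian ? 15 - k : k;
}
```

On a little-endian host the two layouts coincide for bits [127:0]; on a big-endian host they are byte-reversed relative to each other, which is why data must not be copied blindly between sve_context and fpsimd_context.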
170 | ||
171 | ||
172 | 5. Signal return | |
173 | ----------------- | |
174 | ||
175 | When returning from a signal handler: | |
176 | ||
177 | * If there is no sve_context record in the signal frame, or if the record is | |
a70f00e7 | 178 | present but contains no register data as described in the previous section, |
ce699081 DM |
179 | then the SVE registers/bits become non-live and take unspecified values. |
180 | ||
181 | * If sve_context is present in the signal frame and contains full register | |
182 | data, the SVE registers become live and are populated with the specified | |
183 | data. However, for backward compatibility reasons, bits [127:0] of Z0..Z31 | |
184 | are always restored from the corresponding members of fpsimd_context.vregs[] | |
185 | and not from sve_context. The remaining bits are restored from sve_context. | |
186 | ||
187 | * Inclusion of fpsimd_context in the signal frame remains mandatory, | |
188 | irrespective of whether sve_context is present or not. | |
189 | ||
190 | * The vector length cannot be changed via signal return. If sve_context.vl in | |
191 | the signal frame does not match the current vector length, the signal return | |
192 | attempt is treated as illegal, resulting in a forced SIGSEGV. | |
193 | ||
96d32e63 MB |
194 | * It is permitted to enter or leave streaming mode by setting or clearing |
195 | the SVE_SIG_FLAG_SM flag but applications should take care to ensure that | |
196 | when doing so sve_context.vl and any register data are appropriate for the | |
197 | vector length in the new mode. | |
198 | ||
ce699081 DM |
199 | |
200 | 6. prctl extensions | |
201 | -------------------- | |
202 | ||
203 | Some new prctl() calls are added to allow programs to manage the SVE vector | |
204 | length: | |
205 | ||
206 | prctl(PR_SVE_SET_VL, unsigned long arg) | |
207 | ||
208 | Sets the vector length of the calling thread and related flags, where | |
209 | arg == vl | flags. Other threads of the calling process are unaffected. | |
210 | ||
211 | vl is the desired vector length, where sve_vl_valid(vl) must be true. | |
212 | ||
213 | flags: | |
214 | ||
9ba6a9ef | 215 | PR_SVE_VL_INHERIT |
ce699081 DM |
216 | |
217 | Inherit the current vector length across execve(). Otherwise, the | |
218 | vector length is reset to the system default at execve(). (See | |
219 | Section 9.) | |
220 | ||
221 | PR_SVE_SET_VL_ONEXEC | |
222 | ||
223 | Defer the requested vector length change until the next execve() | |
224 | performed by this thread. | |
225 | ||
a70f00e7 | 226 | The effect is equivalent to implicit execution of the following |
ce699081 DM |
227 | call immediately after the next execve() (if any) by the thread: |
228 | ||
229 | prctl(PR_SVE_SET_VL, arg & ~PR_SVE_SET_VL_ONEXEC) | |
230 | ||
231 | This allows launching of a new program with a different vector | |
232 | length, while avoiding runtime side effects in the caller. | |
233 | ||
234 | ||
235 | Without PR_SVE_SET_VL_ONEXEC, the requested change takes effect | |
236 | immediately. | |
237 | ||
238 | ||
239 | Return value: a nonnegative value on success, or a negative value on error: | |
240 | EINVAL: SVE not supported, invalid vector length requested, or | |
241 | invalid flags. | |
242 | ||
243 | ||
244 | On success: | |
245 | ||
246 | * Either the calling thread's vector length or the deferred vector length | |
247 | to be applied at the next execve() by the thread (dependent on whether | |
248 | PR_SVE_SET_VL_ONEXEC is present in arg), is set to the largest value | |
249 | supported by the system that is less than or equal to vl. If vl == | |
250 | SVE_VL_MAX, the value set will be the largest value supported by the | |
251 | system. | |
252 | ||
253 | * Any previously outstanding deferred vector length change in the calling | |
254 | thread is cancelled. | |
255 | ||
256 | * The returned value describes the resulting configuration, encoded as for | |
257 | PR_SVE_GET_VL. The vector length reported in this value is the new | |
258 | current vector length for this thread if PR_SVE_SET_VL_ONEXEC was not | |
259 | present in arg; otherwise, the reported vector length is the deferred | |
260 | vector length that will be applied at the next execve() by the calling | |
261 | thread. | |
262 | ||
263 | * Changing the vector length causes all of P0..P15, FFR and all bits of | |
afce0cc9 | 264 | Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become |
ce699081 DM |
265 | unspecified. Calling PR_SVE_SET_VL with vl equal to the thread's current |
266 | vector length, or calling PR_SVE_SET_VL with the PR_SVE_SET_VL_ONEXEC | |
267 | flag, does not constitute a change to the vector length for this purpose. | |
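A minimal sketch of how the PR_SVE_SET_VL argument might be composed and the result decoded is shown below. The helper names sve_set_vl_arg() and sve_request_vl() are hypothetical; the PR_SVE_* constants are assumed to be provided by <sys/prctl.h> via the kernel headers.

```c
#include <sys/prctl.h>   /* prctl(), PR_SVE_* constants */

/* Compose the PR_SVE_SET_VL argument: arg == vl | flags. */
unsigned long sve_set_vl_arg(unsigned long vl, int inherit, int onexec)
{
    unsigned long arg = vl;

    if (inherit)
        arg |= PR_SVE_VL_INHERIT;      /* keep the VL across execve() */
    if (onexec)
        arg |= PR_SVE_SET_VL_ONEXEC;   /* defer until the next execve() */
    return arg;
}

/* Request a vector length. Returns the vector length the kernel actually
 * selected (current or deferred, depending on onexec), or -1 on error.
 * The selected value may be smaller than the one requested. */
long sve_request_vl(unsigned long vl, int inherit, int onexec)
{
    int ret = prctl(PR_SVE_SET_VL, sve_set_vl_arg(vl, inherit, onexec));

    if (ret < 0)
        return -1;                     /* errno is EINVAL */
    return ret & PR_SVE_VL_LEN_MASK;
}
```

A launcher that wants a child program to run with a 32-byte vector length, without disturbing its own SVE state, could call sve_request_vl(32, 0, 1) between fork() and execve().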
268 | ||
269 | ||
270 | prctl(PR_SVE_GET_VL) | |
271 | ||
272 | Gets the vector length of the calling thread. | |
273 | ||
274 | The following flag may be OR-ed into the result: | |
275 | ||
9ba6a9ef | 276 | PR_SVE_VL_INHERIT |
ce699081 DM |
277 | |
278 | Vector length will be inherited across execve(). | |
279 | ||
280 | There is no way to determine whether there is an outstanding deferred | |
281 | vector length change (which would only normally be the case between a | |
282 | fork() or vfork() and the corresponding execve() in typical use). | |
283 | ||
aed34d9e | 284 | To extract the vector length from the result, bitwise-AND it with |
ce699081 DM |
285 | PR_SVE_VL_LEN_MASK. |
286 | ||
287 | Return value: a nonnegative value on success, or a negative value on error: | |
288 | EINVAL: SVE not supported. | |
289 | ||
290 | ||
291 | 7. ptrace extensions | |
292 | --------------------- | |
293 | ||
96d32e63 MB |
294 | * New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with |
295 | PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the | |
296 | streaming mode SVE registers and NT_ARM_SVE describes the | |
297 | non-streaming mode SVE registers. | |
298 | ||
299 | In this description a register set is referred to as being "live" when | |
300 | the target is in the appropriate streaming or non-streaming mode and is | |
301 | using data beyond the subset shared with the FPSIMD Vn registers. | |
ce699081 DM |
302 | |
303 | Refer to [2] for definitions. | |
304 | ||
305 | The regset data starts with struct user_sve_header, containing: | |
306 | ||
307 | size | |
308 | ||
309 | Size of the complete regset, in bytes. | |
310 | This depends on vl and possibly on other things in the future. | |
311 | ||
312 | If a call to PTRACE_GETREGSET requests less data than the value of | |
313 | size, the caller can allocate a larger buffer and retry in order to | |
314 | read the complete regset. | |
315 | ||
316 | max_size | |
317 | ||
318 | Maximum size in bytes that the regset can grow to for the target | |
319 | thread. The regset won't grow bigger than this even if the target | |
320 | thread changes its vector length etc. | |
321 | ||
322 | vl | |
323 | ||
324 | Target thread's current vector length, in bytes. | |
325 | ||
326 | max_vl | |
327 | ||
328 | Maximum possible vector length for the target thread. | |
329 | ||
330 | flags | |
331 | ||
96d32e63 | 332 | at most one of |
ce699081 DM |
333 | |
334 | SVE_PT_REGS_FPSIMD | |
335 | ||
336 | SVE registers are not live (GETREGSET) or are to be made | |
337 | non-live (SETREGSET). | |
338 | ||
339 | The payload is of type struct user_fpsimd_state, with the same | |
340 | meaning as for NT_PRFPREG, starting at offset | |
341 | SVE_PT_FPSIMD_OFFSET from the start of user_sve_header. | |
342 | ||
343 | Extra data might be appended in the future: the size of the | |
344 | payload should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags). | |
345 | ||
346 | vq should be obtained using sve_vq_from_vl(vl). | |
347 | ||
348 | or | |
349 | ||
350 | SVE_PT_REGS_SVE | |
351 | ||
352 | SVE registers are live (GETREGSET) or are to be made live | |
353 | (SETREGSET). | |
354 | ||
355 | The payload contains the SVE register data, starting at offset | |
356 | SVE_PT_SVE_OFFSET from the start of user_sve_header, and with | |
357 | size SVE_PT_SVE_SIZE(vq, flags); | |
358 | ||
359 | ... OR-ed with zero or more of the following flags, which have the same | |
360 | meaning and behaviour as the corresponding PR_SVE_* flags: | |
361 | ||
362 | SVE_PT_VL_INHERIT | |
363 | ||
364 | SVE_PT_VL_ONEXEC (SETREGSET only). | |
365 | ||
96d32e63 MB |
366 | If neither FPSIMD nor SVE flags are provided then no register |
367 | payload is available; this is only possible when SME is implemented. | |
368 | ||
369 | ||
ce699081 DM |
370 | * The effects of changing the vector length and/or flags are equivalent to |
371 | those documented for PR_SVE_SET_VL. | |
372 | ||
373 | The caller must make a further GETREGSET call if it needs to know what VL is | |
374 | actually set by SETREGSET, unless it is known in advance that the requested | |
375 | VL is supported. | |
376 | ||
377 | * In the SVE_PT_REGS_SVE case, the size and layout of the payload depend on | |
378 | the header fields. The SVE_PT_SVE_*() macros are provided to facilitate | |
379 | access to the members. | |
380 | ||
381 | * In either case, for SETREGSET it is permissible to omit the payload, in which | |
382 | case only the vector length and flags are changed (along with any | |
383 | consequences of those changes). | |
384 | ||
96d32e63 MB |
385 | * In systems supporting SME, when in streaming mode a GETREGSET for |
386 | NT_ARM_SVE will return only the user_sve_header with no register data. | |
387 | Similarly, a GETREGSET for NT_ARM_SSVE will not return any register data | |
388 | when not in streaming mode. | |
389 | ||
390 | * A GETREGSET for NT_ARM_SSVE will never return SVE_PT_REGS_FPSIMD. | |
391 | ||
ce699081 DM |
392 | * For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the |
393 | requested VL is not supported, the effect will be the same as if the | |
394 | payload were omitted, except that an EIO error is reported. No | |
395 | attempt is made to translate the payload data to the correct layout | |
396 | for the vector length actually set. The thread's FPSIMD state is | |
397 | preserved, but the remaining bits of the SVE registers become | |
398 | unspecified. It is up to the caller to translate the payload layout | |
399 | for the actual VL and retry. | |
400 | ||
96d32e63 MB |
401 | * Where SME is implemented it is not possible to GETREGSET the register |
402 | state for normal SVE when in streaming mode, nor the streaming mode | |
403 | register state when in normal mode, regardless of the implementation defined | |
404 | behaviour of the hardware for sharing data between the two modes. | |
405 | ||
406 | * Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in | |
407 | streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode | |
408 | if the target was not in streaming mode. | |
409 | ||
ce699081 DM |
410 | * The effect of writing a partial, incomplete payload is unspecified. |
411 | ||
412 | ||
413 | 8. ELF coredump extensions | |
414 | --------------------------- | |
415 | ||
96d32e63 MB |
416 | * NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for |
417 | each thread of the dumped process. The contents will be equivalent to the | |
418 | data that would have been read if a PTRACE_GETREGSET of the corresponding | |
419 | type were executed for each thread when the coredump was generated. | |
ce699081 DM |
420 | |
421 | 9. System runtime configuration | |
422 | -------------------------------- | |
423 | ||
424 | * To mitigate the ABI impact of expansion of the signal frame, a policy | |
425 | mechanism is provided for administrators, distro maintainers and developers | |
426 | to set the default vector length for userspace processes: | |
427 | ||
428 | /proc/sys/abi/sve_default_vector_length | |
429 | ||
430 | Writing the text representation of an integer to this file sets the system | |
431 | default vector length to the specified value, unless the value is greater | |
432 | than the maximum vector length supported by the system in which case the | |
433 | default vector length is set to that maximum. | |
434 | ||
435 | The result can be determined by reopening the file and reading its | |
436 | contents. | |
437 | ||
438 | At boot, the default vector length is initially set to 64 or the maximum | |
439 | supported vector length, whichever is smaller. This determines the initial | |
440 | vector length of the init process (PID 1). | |
441 | ||
442 | Reading this file returns the current system default vector length. | |
443 | ||
444 | * At every execve() call, the new vector length of the new process is set to | |
445 | the system default vector length, unless | |
446 | ||
9ba6a9ef | 447 | * PR_SVE_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the |
ce699081 DM |
448 | calling thread, or |
449 | ||
450 | * a deferred vector length change is pending, established via the | |
451 | PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_ONEXEC). | |
452 | ||
453 | * Modifying the system default vector length does not affect the vector length | |
454 | of any existing process or thread that does not make an execve() call. | |
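Reading the system default and reproducing the boot-time rule can be sketched as follows. The helper names are hypothetical; the path is the one given above, passed as a parameter so the function can be exercised against any file.

```c
#include <stdio.h>

#define SVE_DEFAULT_VL_PATH "/proc/sys/abi/sve_default_vector_length"

/* Read a vector length, as a decimal integer, from the given file.
 * Returns -1 if the file cannot be opened or parsed, for example on a
 * system without SVE support. */
long read_vl_file(const char *path)
{
    FILE *f = fopen(path, "r");
    long vl = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &vl) != 1)
        vl = -1;
    fclose(f);
    return vl;
}

/* Boot-time default: 64 or the maximum supported vector length,
 * whichever is smaller. */
unsigned int boot_default_vl(unsigned int max_vl)
{
    return max_vl < 64 ? max_vl : 64;
}
```

A caller would typically use read_vl_file(SVE_DEFAULT_VL_PATH). After writing a new value to the file, reading it back in this way shows the value the kernel actually applied.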
455 | ||
1f2906d1 JC |
456 | 10. Perf extensions |
457 | -------------------------------- | |
458 | ||
459 | * The arm64 specific DWARF standard [5] added the VG (Vector Granule) register | |
460 | at index 46. This register is used for DWARF unwinding when variable length | |
461 | SVE registers are pushed onto the stack. | |
462 | ||
463 | * Its value is equivalent to the current SVE vector length (VL) in bits divided | |
464 | by 64. | |
465 | ||
466 | * The value is included in Perf samples in the regs[46] field if | |
467 | PERF_SAMPLE_REGS_USER is set and the sample_regs_user mask has bit 46 set. | |
468 | ||
469 | * The value is the current value at the time the sample was taken, and it can | |
470 | change over time. | |
471 | ||
472 | * If the system doesn't support SVE when perf_event_open is called with these | |
473 | settings, the event will fail to open. | |
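The relationship between VG and VL stated above is simple arithmetic, sketched here with hypothetical helper names. VL is in bytes, as elsewhere in this document.

```c
#define VG_DWARF_REG_INDEX 46   /* DWARF register number of VG, from [5] */

/* VG = (VL in bits) / 64 = (VL in bytes * 8) / 64. */
static inline unsigned int vg_from_vl(unsigned int vl_bytes)
{
    return (vl_bytes * 8u) / 64u;
}

/* Inverse conversion: each granule is 64 bits = 8 bytes. */
static inline unsigned int vl_from_vg(unsigned int vg)
{
    return vg * 8u;
}

/* Bit to set in the sample_regs_user mask to request VG in samples. */
static inline unsigned long long vg_sample_mask(void)
{
    return 1ULL << VG_DWARF_REG_INDEX;
}
```

For example, a thread running with VL = 32 (256-bit vectors) reports VG = 4.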
ce699081 DM |
474 | |
475 | Appendix A. SVE programmer's model (informative) | |
476 | ================================================= | |
477 | ||
478 | This section provides a minimal description of the additions made by SVE to the | |
479 | ARMv8-A programmer's model that are relevant to this document. | |
480 | ||
481 | Note: This section is for information only and is not intended to be complete or | |
482 | to replace any architectural specification. | |
483 | ||
484 | A.1. Registers | |
485 | --------------- | |
486 | ||
487 | In A64 state, SVE adds the following: | |
488 | ||
489 | * 32 8VL-bit vector registers Z0..Z31 | |
490 | For each Zn, Zn bits [127:0] alias the ARMv8-A vector register Vn. | |
491 | ||
492 | A register write using a Vn register name zeros all bits of the corresponding | |
493 | Zn except for bits [127:0]. | |
494 | ||
495 | * 16 VL-bit predicate registers P0..P15 | |
496 | ||
497 | * 1 VL-bit special-purpose predicate register FFR (the "first-fault register") | |
498 | ||
499 | * a VL "pseudo-register" that determines the size of each vector register | |
500 | ||
501 | The SVE instruction set architecture provides no way to write VL directly. | |
502 | Instead, it can be modified only by EL1 and above, by writing appropriate | |
503 | system registers. | |
504 | ||
505 | * The value of VL can be configured at runtime by EL1 and above: | |
506 | 16 <= VL <= VLmax, where VL must be a multiple of 16. | |
507 | ||
508 | * The maximum vector length is determined by the hardware: | |
509 | 16 <= VLmax <= 256. | |
510 | ||
511 | (The SVE architecture specifies 256, but permits future architecture | |
512 | revisions to raise this limit.) | |
513 | ||
514 | * FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point | |
515 | operations in a similar way to the way in which they interact with ARMv8 | |
b693d0b3 | 516 | floating-point operations:: |
ce699081 DM |
517 | |
518 | 8VL-1 128 0 bit index | |
519 | +---- //// -----------------+ | |
520 | Z0 | : V0 | | |
521 | : : | |
522 | Z7 | : V7 | | |
523 | Z8 | : * V8 | | |
524 | : : : | |
525 | Z15 | : *V15 | | |
526 | Z16 | : V16 | | |
527 | : : | |
528 | Z31 | : V31 | | |
529 | +---- //// -----------------+ | |
530 | 31 0 | |
531 | VL-1 0 +-------+ | |
532 | +---- //// --+ FPSR | | | |
533 | P0 | | +-------+ | |
534 | : | | *FPCR | | | |
535 | P15 | | +-------+ | |
536 | +---- //// --+ | |
537 | FFR | | +-----+ | |
538 | +---- //// --+ VL | | | |
539 | +-----+ | |
540 | ||
541 | (*) callee-save: | |
542 | This only applies to bits [63:0] of Z-/V-registers. | |
543 | FPCR contains callee-save and caller-save bits. See [4] for details. | |
544 | ||
545 | ||
546 | A.2. Procedure call standard | |
547 | ----------------------------- | |
548 | ||
549 | The ARMv8-A base procedure call standard is extended as follows with respect to | |
550 | the additional SVE register state: | |
551 | ||
552 | * All SVE register bits that are not shared with FP/SIMD are caller-save. | |
553 | ||
554 | * Z8 bits [63:0] .. Z15 bits [63:0] are callee-save. | |
555 | ||
556 | This follows from the way these bits are mapped to V8..V15, which are | |
557 | callee-save in the base procedure call standard. | |
558 | ||
559 | ||
560 | Appendix B. ARMv8-A FP/SIMD programmer's model | |
561 | =============================================== | |
562 | ||
563 | Note: This section is for information only and is not intended to be complete or | |
564 | to replace any architectural specification. | |
565 | ||
8c046cdd | 566 | Refer to [4] for more information. |
ce699081 DM |
567 | |
568 | ARMv8-A defines the following floating-point / SIMD register state: | |
569 | ||
570 | * 32 128-bit vector registers V0..V31 | |
571 | * 2 32-bit status/control registers FPSR, FPCR | |
572 | ||
b693d0b3 MCC |
573 | :: |
574 | ||
ce699081 DM |
575 | 127 0 bit index |
576 | +---------------+ | |
577 | V0 | | | |
578 | : : : | |
579 | V7 | | | |
580 | * V8 | | | |
581 | : : : : | |
582 | *V15 | | | |
583 | V16 | | | |
584 | : : : | |
585 | V31 | | | |
586 | +---------------+ | |
587 | ||
588 | 31 0 | |
589 | +-------+ | |
590 | FPSR | | | |
591 | +-------+ | |
592 | *FPCR | | | |
593 | +-------+ | |
594 | ||
595 | (*) callee-save: | |
596 | This only applies to bits [63:0] of V-registers. | |
597 | FPCR contains a mixture of callee-save and caller-save bits. | |
598 | ||
599 | ||
600 | References | |
601 | ========== | |
602 | ||
603 | [1] arch/arm64/include/uapi/asm/sigcontext.h | |
604 | AArch64 Linux signal ABI definitions | |
605 | ||
606 | [2] arch/arm64/include/uapi/asm/ptrace.h | |
607 | AArch64 Linux ptrace ABI definitions | |
608 | ||
b693d0b3 | 609 | [3] Documentation/arm64/cpu-feature-registers.rst |
ce699081 DM |
610 | |
611 | [4] ARM IHI0055C | |
612 | http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf | |
613 | http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html | |
614 | Procedure Call Standard for the ARM 64-bit Architecture (AArch64) | |
1f2906d1 JC |
615 | |
616 | [5] https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst |