Commit | Line | Data |
---|---|---|
dc7a12bd | 1 | ================ |
2afd0a05 AB |
2 | Kernel mode NEON |
3 | ================ | |
4 | ||
5 | TL;DR summary | |
6 | ------------- | |
7 | * Use only NEON instructions, or VFP instructions that don't rely on support | |
8 | code | |
9 | * Isolate your NEON code in a separate compilation unit, and compile it with | |
de9c0d49 | 10 | '-march=armv7-a -mfpu=neon -mfloat-abi=softfp' |
2afd0a05 AB |
11 | * Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your |
12 | NEON code | |
13 | * Don't sleep in your NEON code, and be aware that it will be executed with | |
14 | preemption disabled | |
15 | ||
16 | ||
17 | Introduction | |
18 | ------------ | |
19 | It is possible to use NEON instructions (and in some cases, VFP instructions) in | |
20 | code that runs in kernel mode. However, for performance reasons, the NEON/VFP | |
21 | register file is not preserved and restored at every context switch or taken | |
22 | exception like the normal register file is, so some manual intervention is | |
23 | required. Furthermore, special care is required for code that may sleep [i.e., | |
24 | may call schedule()], as NEON or VFP instructions will be executed in a | |
25 | non-preemptible section for reasons outlined below. | |
26 | ||
27 | ||
28 | Lazy preserve and restore | |
29 | ------------------------- | |
30 | The NEON/VFP register file is managed using lazy preserve (on UP systems) and | |
31 | lazy restore (on both SMP and UP systems). This means that the register file is | |
32 | kept 'live', and is only preserved and restored when multiple tasks are | |
33 | contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to | |
34 | another core). Lazy restore is implemented by disabling the NEON/VFP unit after | |
35 | every context switch, resulting in a trap when subsequently a NEON/VFP | |
36 | instruction is issued, allowing the kernel to step in and perform the restore if | |
37 | necessary. | |
38 | ||
39 | Any use of the NEON/VFP unit in kernel mode should not interfere with this, so | |
40 | it is required to do an 'eager' preserve of the NEON/VFP register file, and | |
41 | enable the NEON/VFP unit explicitly so no exceptions are generated on first | |
42 | subsequent use. This is handled by the function kernel_neon_begin(), which | |
43 | should be called before any kernel mode NEON or VFP instructions are issued. | |
44 | Likewise, the NEON/VFP unit should be disabled again after use to make sure user | |
45 | mode will hit the lazy restore trap upon next use. This is handled by the | |
46 | function kernel_neon_end(). | |
47 | ||
48 | ||
49 | Interruptions in kernel mode | |
50 | ---------------------------- | |
51 | For reasons of performance and simplicity, it was decided that there shall be no | |
52 | preserve/restore mechanism for the kernel mode NEON/VFP register contents. This | |
53 | implies that interruptions of a kernel mode NEON section can only be allowed if | |
54 | they are guaranteed not to touch the NEON/VFP registers. For this reason, the | |
55 | following rules and restrictions apply in the kernel: | |
56 | * NEON/VFP code is not allowed in interrupt context; | |
57 | * NEON/VFP code is not allowed to sleep; | |
58 | * NEON/VFP code is executed with preemption disabled. | |
59 | ||
60 | If latency is a concern, it is possible to put back to back calls to | |
61 | kernel_neon_end() and kernel_neon_begin() in places in your code where none of | |
62 | the NEON registers are live. (Additional calls to kernel_neon_begin() should be | |
63 | reasonably cheap if no context switch occurred in the meantime) | |
64 | ||
65 | ||
66 | VFP and support code | |
67 | -------------------- | |
68 | Earlier versions of VFP (prior to version 3) rely on software support for things | |
69 | like IEEE-754 compliant underflow handling etc. When the VFP unit needs such | |
70 | software assistance, it signals the kernel by raising an undefined instruction | |
71 | exception. The kernel responds by inspecting the VFP control registers and the | |
72 | current instruction and arguments, and emulates the instruction in software. | |
73 | ||
74 | Such software assistance is currently not implemented for VFP instructions | |
75 | executed in kernel mode. If such a condition is encountered, the kernel will | |
76 | fail and generate an OOPS. | |
77 | ||
78 | ||
79 | Separating NEON code from ordinary code | |
80 | --------------------------------------- | |
81 | The compiler is not aware of the special significance of kernel_neon_begin() and | |
82 | kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions | |
83 | between calls to these respective functions. Furthermore, GCC may generate NEON | |
84 | instructions of its own at -O3 level if -mfpu=neon is selected, and even if the | |
85 | kernel is currently compiled at -O2, future changes may result in NEON/VFP | |
86 | instructions appearing in unexpected places if no special care is taken. | |
87 | ||
88 | Therefore, the recommended and only supported way of using NEON/VFP in the | |
89 | kernel is by adhering to the following rules: | |
dc7a12bd | 90 | |
2afd0a05 | 91 | * isolate the NEON code in a separate compilation unit and compile it with |
de9c0d49 | 92 | '-march=armv7-a -mfpu=neon -mfloat-abi=softfp'; |
2afd0a05 AB |
93 | * issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls |
94 | into the unit containing the NEON code from a compilation unit which is *not* | |
95 | built with the GCC flag '-mfpu=neon' set. | |
96 | ||
97 | As the kernel is compiled with '-msoft-float', the above will guarantee that | |
98 | both NEON and VFP instructions will only ever appear in designated compilation | |
99 | units at any optimization level. | |
100 | ||
101 | ||
102 | NEON assembler | |
103 | -------------- | |
104 | NEON assembler is supported with no additional caveats as long as the rules | |
105 | above are followed. | |
106 | ||
107 | ||
108 | NEON code generated by GCC | |
109 | -------------------------- | |
110 | The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit | |
111 | parallelism, and generates NEON code from ordinary C source code. This is fully | |
112 | supported as long as the rules above are followed. | |
113 | ||
114 | ||
115 | NEON intrinsics | |
116 | --------------- | |
117 | NEON intrinsics are also supported. However, as code using NEON intrinsics | |
118 | relies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should | |
119 | observe the following in addition to the rules above: | |
dc7a12bd | 120 | |
2afd0a05 AB |
121 | * Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC |
122 | uses its builtin version of <stdint.h> (this is a C99 header which the kernel | |
123 | does not supply); | |
124 | * Include <arm_neon.h> last, or at least after <linux/types.h> |