Assembler Annotations
=====================

Copyright (c) 2017-2019 Jiri Slaby

This document describes the new macros for annotation of data and code in
assembly. In particular, it contains information about ``SYM_FUNC_START``,
``SYM_FUNC_END``, ``SYM_CODE_START``, and similar.

Rationale
---------
Some code, such as entry code, trampolines, or boot code, needs to be written
in assembly. As in C, such code is grouped into functions and accompanied by
data. Standard assemblers do not force users to mark these pieces precisely
as code or data, or even to specify their length. Nevertheless, assemblers
provide developers with such annotations to aid debuggers working with
assembly. On top of that, developers also want to mark some functions as
*global* so that they are visible outside of their translation units.

Over time, the Linux kernel has adopted macros from various projects (like
``binutils``) to facilitate such annotations. So for historic reasons,
developers have been using ``ENTRY``, ``END``, ``ENDPROC``, and other
annotations in assembly. Because these macros were never documented, they are
used in the wrong contexts in some places. Clearly, ``ENTRY`` was intended to
denote the beginning of global symbols (be it data or code). ``END`` was used
to mark the end of data or the end of special functions with a *non-standard*
calling convention. In contrast, ``ENDPROC`` should annotate only the ends of
*standard* functions.

When these macros are used correctly, they help assemblers generate a nice
object with both sizes and types set correctly. For example, the result of
``arch/x86/lib/putuser.S``::

   Num:    Value          Size Type    Bind   Vis      Ndx Name
    25: 0000000000000000    33 FUNC    GLOBAL DEFAULT    1 __put_user_1
    29: 0000000000000030    37 FUNC    GLOBAL DEFAULT    1 __put_user_2
    32: 0000000000000060    36 FUNC    GLOBAL DEFAULT    1 __put_user_4
    35: 0000000000000090    37 FUNC    GLOBAL DEFAULT    1 __put_user_8

This is not only important for debugging purposes. When objects are properly
annotated like this, tools can be run on them to generate more useful
information. In particular, ``objtool`` can be run on properly annotated
objects to check and fix them if needed. Currently, ``objtool`` can report
missing frame pointer setup/destruction in functions. It can also
automatically generate annotations for the :doc:`ORC unwinder <x86/orc-unwinder>`
for most code. Both of these are especially important to support reliable
stack traces, which are in turn necessary for :doc:`Kernel live patching
<livepatch/livepatch>`.

Caveat and Discussion
---------------------
As one might realize, there were only three macros previously. That is indeed
insufficient to cover all the combinations of cases:

* standard/non-standard function
* code/data
* global/local symbol

There was a discussion_ and, instead of extending the current ``ENTRY/END*``
macros, it was decided that brand new macros should be introduced::

    So how about using macro names that actually show the purpose, instead
    of importing all the crappy, historic, essentially randomly chosen
    debug symbol macro names from the binutils and older kernels?

.. _discussion: https://lore.kernel.org/r/20170217104757.28588-1-jslaby@suse.cz

Macros Description
------------------

The new macros are prefixed with the ``SYM_`` prefix and can be divided into
three main groups:

1. ``SYM_FUNC_*`` -- to annotate C-like functions. This means functions with
   standard C calling conventions. For example, on x86, this means that the
   stack contains a return address at the predefined place and a return from
   the function can happen in a standard way. When frame pointers are enabled,
   the frame pointer shall also be saved at the start of a function and
   restored at its end.

   Checking tools like ``objtool`` should ensure such marked functions conform
   to these rules. The tools can also easily annotate these functions with
   debugging information (like *ORC data*) automatically.

2. ``SYM_CODE_*`` -- special functions called with a special stack, be it
   interrupt handlers with special stack content, trampolines, or startup
   functions.

   Checking tools mostly skip checking these functions, but some debug
   information can still be generated automatically. For correct debug data,
   this code needs hints like ``UNWIND_HINT_REGS`` provided by developers.

3. ``SYM_DATA*`` -- data belonging to ``.data`` sections and not to
   ``.text``. Data do not contain instructions, so they have to be treated
   specially by the tools: they should not treat the bytes as instructions,
   nor assign any debug information to them.
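
The three groups can be put side by side in a short sketch. The symbol names
(``my_leaf_func``, ``my_irq_entry``, ``my_table``) and the instructions are
hypothetical and x86-64 flavoured; only the macros and ``UNWIND_HINT_REGS``
are taken from the text above, and the ``asm/unwind_hints.h`` include is an
assumption about where x86 provides that hint::

    #include <linux/linkage.h>
    #include <asm/unwind_hints.h>   /* assumed location of UNWIND_HINT_REGS */

        .text
    /* Group 1: C-callable, return address on the stack, plain return. */
    SYM_FUNC_START(my_leaf_func)
        xorl    %eax, %eax
        ret
    SYM_FUNC_END(my_leaf_func)

    /* Group 2: not C-callable, entered with a special stack layout. */
    SYM_CODE_START(my_irq_entry)
        UNWIND_HINT_REGS
        iretq
    SYM_CODE_END(my_irq_entry)

        .data
    /* Group 3: plain data, never interpreted as instructions. */
    SYM_DATA_START(my_table)
        .long   1, 2, 3
    SYM_DATA_END(my_table)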

Instruction Macros
~~~~~~~~~~~~~~~~~~
This section covers ``SYM_FUNC_*`` and ``SYM_CODE_*`` enumerated above.

``objtool`` requires that all code be contained in an ELF symbol. Symbol
names that have a ``.L`` prefix do not emit symbol table entries. ``.L``
prefixed symbols can be used within a code region, but should be avoided for
denoting a range of code via ``SYM_*_START/END`` annotations.
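
A minimal sketch of that rule, assuming x86-64 syntax and the hypothetical
names ``my_count_down`` and ``.Lloop``: the ``.L`` label is fine as a branch
target inside the function, while the function itself gets a regular name so
that it ends up in the symbol table::

    #include <linux/linkage.h>

        .text
    SYM_FUNC_START(my_count_down)       /* emits a symbol table entry */
        movl    $16, %ecx
    .Lloop:                             /* .L label: no symbol table entry */
        decl    %ecx
        jnz     .Lloop
        ret
    SYM_FUNC_END(my_count_down)

    /*
     * To be avoided: SYM_FUNC_START_LOCAL(.Lmy_helper) ... SYM_FUNC_END(...)
     * would leave the enclosed code without an ELF symbol.
     */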

* ``SYM_FUNC_START`` and ``SYM_FUNC_START_LOCAL`` are supposed to be **the
  most frequent markings**. They are used for functions with standard calling
  conventions -- global and local. Like in C, they both align the functions to
  architecture-specific ``__ALIGN`` bytes. There are also ``_NOALIGN`` variants
  for special cases where developers do not want this implicit alignment.

  ``SYM_FUNC_START_WEAK`` and ``SYM_FUNC_START_WEAK_NOALIGN`` markings are
  also offered as an assembler counterpart to the *weak* attribute known from
  C.

  All of these **shall** be coupled with ``SYM_FUNC_END``. First, it marks
  the sequence of instructions as a function and records its size in the
  generated object file. Second, it also eases checking and processing such
  object files as the tools can trivially find exact function boundaries.

  So in most cases, developers should write something like in the following
  example, having some asm instructions in between the macros, of course::

    SYM_FUNC_START(memset)
        ... asm insns ...
    SYM_FUNC_END(memset)

  In fact, this kind of annotation corresponds to the now deprecated ``ENTRY``
  and ``ENDPROC`` macros.

* ``SYM_FUNC_ALIAS``, ``SYM_FUNC_ALIAS_LOCAL``, and ``SYM_FUNC_ALIAS_WEAK`` can
  be used to define multiple names for a function. The typical use is::

    SYM_FUNC_START(__memset)
        ... asm insns ...
    SYM_FUNC_END(__memset)
    SYM_FUNC_ALIAS(memset, __memset)

  In this example, one can call ``__memset`` or ``memset`` with the same
  result, except the debug information for the instructions is emitted into
  the object file only once -- for the non-``ALIAS`` case.

* ``SYM_CODE_START`` and ``SYM_CODE_START_LOCAL`` should be used only in
  special cases -- if you know what you are doing. They are used exclusively
  for interrupt handlers and similar code where the calling convention is not
  the C one. ``_NOALIGN`` variants exist too. The use is the same as for the
  ``FUNC`` category above::

    SYM_CODE_START_LOCAL(bad_put_user)
        ... asm insns ...
    SYM_CODE_END(bad_put_user)

  Again, every ``SYM_CODE_START*`` **shall** be coupled with ``SYM_CODE_END``.

  To some extent, this category corresponds to the deprecated ``ENTRY`` and
  ``END``, except that ``END`` had several other meanings too.

* ``SYM_INNER_LABEL*`` is used to denote a label inside some
  ``SYM_{CODE,FUNC}_START`` and ``SYM_{CODE,FUNC}_END`` pair. Such labels are
  very similar to C labels, except they can be made global. An example of
  use::

    SYM_CODE_START(ftrace_caller)
        /* save_mcount_regs fills in first two parameters */
        ...

    SYM_INNER_LABEL(ftrace_caller_op_ptr, SYM_L_GLOBAL)
        /* Load the ftrace_ops into the 3rd parameter */
        ...

    SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
        call ftrace_stub
        ...
        retq
    SYM_CODE_END(ftrace_caller)
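
The local and weak function variants listed above are not covered by the
examples so far. The following sketch shows how they might be used; the names
``my_helper`` and ``my_hook`` are hypothetical and the bodies are reduced to
an x86-64 ``ret`` for illustration::

    #include <linux/linkage.h>

        .text
    /* Visible only within this file, like a static function in C. */
    SYM_FUNC_START_LOCAL(my_helper)
        ret
    SYM_FUNC_END(my_helper)

    /* Default implementation; a non-weak my_hook defined elsewhere wins. */
    SYM_FUNC_START_WEAK(my_hook)
        ret
    SYM_FUNC_END(my_hook)

Both variants are still closed with ``SYM_FUNC_END``; only the linkage of the
start marker differs.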

Data Macros
~~~~~~~~~~~
Similar to instructions, there are a couple of macros to describe data in the
assembly.

* ``SYM_DATA_START`` and ``SYM_DATA_START_LOCAL`` mark the start of some data
  and shall be used in conjunction with either ``SYM_DATA_END``, or
  ``SYM_DATA_END_LABEL``. The latter also adds a label to the end, so that
  people can use ``lstack`` and (local) ``lstack_end`` in the following
  example::

    SYM_DATA_START_LOCAL(lstack)
        .skip 4096
    SYM_DATA_END_LABEL(lstack, SYM_L_LOCAL, lstack_end)

* ``SYM_DATA`` and ``SYM_DATA_LOCAL`` are variants for simple, mostly one-line
  data::

    SYM_DATA(HEAP,     .long rm_heap)
    SYM_DATA(heap_end, .long rm_stack)

  In the end, they expand to ``SYM_DATA_START`` with ``SYM_DATA_END``
  internally.
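
One possible use of the end label is to derive a length at assembly time. The
sketch below is only an illustration: ``my_msg``, ``my_msg_end``, and
``my_msg_len`` are hypothetical names, and the global linkage of the end label
is an arbitrary choice::

    #include <linux/linkage.h>

        .data
    SYM_DATA_START(my_msg)
        .ascii  "hello"
    SYM_DATA_END_LABEL(my_msg, SYM_L_GLOBAL, my_msg_end)

    /* One-line variant; the length is the distance between the two labels. */
    SYM_DATA_LOCAL(my_msg_len, .long my_msg_end - my_msg)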

Support Macros
~~~~~~~~~~~~~~
All of the above eventually reduce to some invocation of ``SYM_START``,
``SYM_END``, or ``SYM_ENTRY``. Normally, developers should avoid using
these directly.

Further, in the above examples, one could see ``SYM_L_LOCAL``. There are also
``SYM_L_GLOBAL`` and ``SYM_L_WEAK``. All are intended to denote the linkage of
a symbol marked by them. They are used either in ``_LABEL`` variants of the
earlier macros, or in ``SYM_START``.
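
To give a rough idea of what this reduction produces, a
``SYM_FUNC_START(my_func)`` / ``SYM_FUNC_END(my_func)`` pair comes out as
approximately the following plain assembler directives. This is a hand-written
approximation and not actual preprocessor output: ``my_func`` is a
hypothetical name, the alignment value is an assumption, ``@function`` is the
x86 spelling of the symbol type, and architectures may override any of
this::

        .globl  my_func                 /* linkage: SYM_L_GLOBAL */
        .balign 16                      /* alignment value is an assumption */
    my_func:
        ret
        .type   my_func, @function      /* type: SYM_T_FUNC */
        .size   my_func, . - my_func    /* size, as shown in the symbol table */

With ``SYM_L_WEAK`` the first directive would be ``.weak my_func`` instead,
and with ``SYM_L_LOCAL`` no linkage directive would be emitted at all.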


Overriding Macros
~~~~~~~~~~~~~~~~~
Architectures can also override any of the macros in their own
``asm/linkage.h``, including macros specifying the type of a symbol
(``SYM_T_FUNC``, ``SYM_T_OBJECT``, and ``SYM_T_NONE``). As every macro
described in this file is surrounded by ``#ifdef`` + ``#endif``, it is enough
to define the macros differently in the aforementioned architecture-dependent
header.
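
As a sketch of such an override, an architecture that wants an extra
instruction at every function entry could redefine ``SYM_FUNC_START`` in its
header. ``MY_ARCH_LANDING_PAD`` and the header path are hypothetical; the
example also assumes that the generic header includes ``asm/linkage.h`` before
its own guarded definitions, so that the architecture's version takes
precedence::

    /* arch/<hypothetical>/include/asm/linkage.h */

    /* Hypothetical placeholder for an arch-specific entry instruction. */
    #define MY_ARCH_LANDING_PAD     nop

    #define SYM_FUNC_START(name)                        \
        SYM_START(name, SYM_L_GLOBAL, SYM_A_ALIGN)      \
        MY_ARCH_LANDING_PAD

Code using ``SYM_FUNC_START`` does not need to change; it picks up the
architecture's definition instead of the generic one.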