Assembler Annotations
=====================

Copyright (c) 2017-2019 Jiri Slaby

This document describes the new macros for annotation of data and code in
assembly. In particular, it contains information about ``SYM_FUNC_START``,
``SYM_FUNC_END``, ``SYM_CODE_START``, and similar.

Rationale
---------
Some code like entries, trampolines, or boot code needs to be written in
assembly. As in C, such code is grouped into functions and accompanied by
data. Standard assemblers do not force users to mark these pieces precisely
as code or data, or even to specify their length. Nevertheless, assemblers
provide developers with such annotations to aid debuggers working with
assembly. On top of that, developers also want to mark some functions as
*global* so that they are visible outside of their translation units.

Over time, the Linux kernel has adopted macros from various projects (like
``binutils``) to facilitate such annotations. So, for historical reasons,
developers have been using ``ENTRY``, ``END``, ``ENDPROC``, and other
annotations in assembly. Because these macros were never documented, they are
used in the wrong context in some places. Clearly, ``ENTRY`` was intended to
denote the beginning of global symbols (be it data or code). ``END`` was used
to mark the end of data or the end of special functions with a *non-standard*
calling convention. In contrast, ``ENDPROC`` should annotate only ends of
*standard* functions.

When these macros are used correctly, they help assemblers generate a nice
object with both sizes and types set correctly. For example, the symbol table
of the object built from ``arch/x86/lib/putuser.S``::

   Num:    Value          Size Type    Bind   Vis      Ndx Name
    25: 0000000000000000    33 FUNC    GLOBAL DEFAULT    1 __put_user_1
    29: 0000000000000030    37 FUNC    GLOBAL DEFAULT    1 __put_user_2
    32: 0000000000000060    36 FUNC    GLOBAL DEFAULT    1 __put_user_4
    35: 0000000000000090    37 FUNC    GLOBAL DEFAULT    1 __put_user_8

This is not only important for debugging purposes. When there are properly
annotated objects like this, tools can be run on them to generate more useful
information. In particular, on properly annotated objects, ``objtool`` can be
run to check and fix the object if needed. Currently, ``objtool`` can report
missing frame pointer setup/destruction in functions. It can also
automatically generate annotations for :doc:`ORC unwinder <x86/orc-unwinder>`
for most code. Both of these are especially important to support reliable
stack traces, which are in turn necessary for :doc:`Kernel live patching
<livepatch/livepatch>`.

Caveat and Discussion
---------------------
As one might realize, there were only three macros previously. That is indeed
insufficient to cover all the combinations of cases:

* standard/non-standard function
* code/data
* global/local symbol

There was a discussion_ and, rather than extending the current ``ENTRY/END*``
macros, it was decided that brand new macros should be introduced::

    So how about using macro names that actually show the purpose, instead
    of importing all the crappy, historic, essentially randomly chosen
    debug symbol macro names from the binutils and older kernels?

.. _discussion: https://lkml.kernel.org/r/20170217104757.28588-1-jslaby@suse.cz

Macros Description
------------------

The new macros are prefixed with the ``SYM_`` prefix and can be divided into
three main groups:

1. ``SYM_FUNC_*`` -- to annotate C-like functions. This means functions with
   standard C calling conventions. For example, on x86, this means that the
   stack contains a return address at the predefined place and a return from
   the function can happen in a standard way. When frame pointers are enabled,
   the frame pointer shall also be saved at the start of a function and
   restored at its end.

   Checking tools like ``objtool`` should ensure such marked functions conform
   to these rules. The tools can also easily annotate these functions with
   debugging information (like *ORC data*) automatically.

2. ``SYM_CODE_*`` -- special functions called with a special stack, be it
   interrupt handlers with special stack content, trampolines, or startup
   functions.

   Checking tools mostly skip these functions, but some debug information can
   still be generated automatically. For correct debug data, this code needs
   hints like ``UNWIND_HINT_REGS`` provided by developers.

3. ``SYM_DATA*`` -- obviously data belonging to ``.data`` sections and not to
   ``.text``. Data do not contain instructions, so they have to be treated
   specially by the tools: the tools should neither treat the bytes as
   instructions nor assign any debug information to them.

Instruction Macros
~~~~~~~~~~~~~~~~~~
This section covers ``SYM_FUNC_*`` and ``SYM_CODE_*`` enumerated above.

* ``SYM_FUNC_START`` and ``SYM_FUNC_START_LOCAL`` are supposed to be **the
  most frequent markings**. They are used for functions with standard calling
  conventions -- global and local. As in C, they both align the functions to
  architecture-specific ``__ALIGN`` bytes. There are also ``_NOALIGN`` variants
  for special cases where developers do not want this implicit alignment.

  ``SYM_FUNC_START_WEAK`` and ``SYM_FUNC_START_WEAK_NOALIGN`` markings are
  also offered as an assembler counterpart to the *weak* attribute known from
  C.

  All of these **shall** be coupled with ``SYM_FUNC_END``. First, it marks
  the sequence of instructions as a function and records its size in the
  generated object file. Second, it also eases checking and processing such
  object files as the tools can trivially find exact function boundaries.

  So in most cases, developers should write something like the following
  example, with some asm instructions in between the macros, of course::

    SYM_FUNC_START(memset)
        ... asm insns ...
    SYM_FUNC_END(memset)

  In fact, this kind of annotation corresponds to the now deprecated ``ENTRY``
  and ``ENDPROC`` macros.

* ``SYM_FUNC_START_ALIAS`` and ``SYM_FUNC_START_LOCAL_ALIAS`` are for those
  who want two or more names for one function. The typical use is::

    SYM_FUNC_START_ALIAS(__memset)
    SYM_FUNC_START(memset)
        ... asm insns ...
    SYM_FUNC_END(memset)
    SYM_FUNC_END_ALIAS(__memset)

  In this example, one can call ``__memset`` or ``memset`` with the same
  result, except the debug information for the instructions is emitted into
  the object file only once -- for the non-``ALIAS`` case.

* ``SYM_CODE_START`` and ``SYM_CODE_START_LOCAL`` should be used only in
  special cases -- if you know what you are doing. They are used exclusively
  for interrupt handlers and similar where the calling convention is not the C
  one. ``_NOALIGN`` variants exist too. The use is the same as for the ``FUNC``
  category above::

    SYM_CODE_START_LOCAL(bad_put_user)
        ... asm insns ...
    SYM_CODE_END(bad_put_user)

  Again, every ``SYM_CODE_START*`` **shall** be coupled with ``SYM_CODE_END``.

  To some extent, this category corresponds to the deprecated ``ENTRY`` and
  ``END``, except that ``END`` had several other meanings too.

* ``SYM_INNER_LABEL*`` denotes a label between some ``SYM_{CODE,FUNC}_START``
  and ``SYM_{CODE,FUNC}_END`` pair. Such labels are very similar to C labels,
  except they can be made global. An example of use::

    SYM_CODE_START(ftrace_caller)
        /* save_mcount_regs fills in first two parameters */
        ...

    SYM_INNER_LABEL(ftrace_caller_op_ptr, SYM_L_GLOBAL)
        /* Load the ftrace_ops into the 3rd parameter */
        ...

    SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
        call ftrace_stub
        ...
        retq
    SYM_CODE_END(ftrace_caller)

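For readers less familiar with the *weak* attribute that the
``SYM_FUNC_START_WEAK`` variants mirror, the following is a minimal sketch of
the C side only. It assumes a GCC- or Clang-compatible compiler, and
``arch_setup_hook`` is a hypothetical name chosen purely for illustration; it
is not a kernel symbol.

.. code-block:: c

   #include <stdio.h>

   /*
    * Hypothetical hook (not a kernel symbol). The weak attribute makes this
    * definition a fallback that a strong definition elsewhere may replace.
    */
   __attribute__((weak)) int arch_setup_hook(void)
   {
       return 0;   /* generic default behaviour */
   }

   int main(void)
   {
       /* No strong definition exists in this program, so the fallback runs. */
       printf("arch_setup_hook() returned %d\n", arch_setup_hook());
       return 0;
   }

A strong (non-weak) definition of the same symbol in another object file would
take precedence at link time. The ``_WEAK`` assembly markings are meant to give
a function written in assembly the same overridable linkage.
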
Data Macros
~~~~~~~~~~~
Similar to instructions, there are a couple of macros to describe data in the
assembly.

* ``SYM_DATA_START`` and ``SYM_DATA_START_LOCAL`` mark the start of some data
  and shall be used in conjunction with either ``SYM_DATA_END`` or
  ``SYM_DATA_END_LABEL``. The latter also adds a label to the end, so that
  people can use ``lstack`` and (local) ``lstack_end`` in the following
  example::

    SYM_DATA_START_LOCAL(lstack)
        .skip 4096
    SYM_DATA_END_LABEL(lstack, SYM_L_LOCAL, lstack_end)

* ``SYM_DATA`` and ``SYM_DATA_LOCAL`` are variants for simple, mostly one-line
  data::

    SYM_DATA(HEAP,     .long rm_heap)
    SYM_DATA(heap_end, .long rm_stack)

  In the end, they expand to ``SYM_DATA_START`` with ``SYM_DATA_END``
  internally.

Support Macros
~~~~~~~~~~~~~~
All of the above eventually reduce to some invocation of ``SYM_START``,
``SYM_END``, or ``SYM_ENTRY``. Normally, developers should avoid using these
directly.

The examples above also use ``SYM_L_LOCAL``. There are also ``SYM_L_GLOBAL``
and ``SYM_L_WEAK``. All of them denote the linkage of the symbol they mark.
They are used either in ``_LABEL`` variants of the earlier macros, or in
``SYM_START``.


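The layering described above can be pictured with a small, self-contained C
program that lets the preprocessor expand a user-facing macro and prints the
resulting assembler text. This is only a sketch under stated assumptions: every
``DEMO_`` name is hypothetical, the bodies are simplified stand-ins rather than
the kernel's definitions, and the alignment value is arbitrary.

.. code-block:: c

   #include <stdio.h>

   /* Hypothetical linkage helpers: one assembler directive per linkage kind. */
   #define DEMO_L_GLOBAL(name)    .globl name
   #define DEMO_L_LOCAL(name)     /* nothing: a local symbol needs no directive */
   #define DEMO_A_ALIGN           .balign 16   /* arbitrary value for the demo */

   /* Lowest layer: linkage directive, alignment, then the label itself. */
   #define DEMO_ENTRY(name, linkage, align)   linkage(name) ; align ; name:
   #define DEMO_START(name, linkage, align)   DEMO_ENTRY(name, linkage, align)

   /* User-facing layer: only the symbol name is left to the developer. */
   #define DEMO_FUNC_START(name) \
       DEMO_START(name, DEMO_L_GLOBAL, DEMO_A_ALIGN)
   #define DEMO_FUNC_START_LOCAL(name) \
       DEMO_START(name, DEMO_L_LOCAL, DEMO_A_ALIGN)

   /* Two-step stringification: expand the argument first, then quote it. */
   #define DEMO_STR(...)      #__VA_ARGS__
   #define DEMO_EXPAND(...)   DEMO_STR(__VA_ARGS__)

   static const char *global_expansion(void)
   {
       return DEMO_EXPAND(DEMO_FUNC_START(my_func));
   }

   static const char *local_expansion(void)
   {
       return DEMO_EXPAND(DEMO_FUNC_START_LOCAL(my_local));
   }

   int main(void)
   {
       printf("global: %s\n", global_expansion());
       printf("local:  %s\n", local_expansion());
       return 0;
   }

Only the linkage argument differs between the two user-facing macros, which is
why the low-level helpers rarely need to be called directly.
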
Overriding Macros
~~~~~~~~~~~~~~~~~
An architecture can also override any of the macros in its own
``asm/linkage.h``, including macros specifying the type of a symbol
(``SYM_T_FUNC``, ``SYM_T_OBJECT``, and ``SYM_T_NONE``). As every macro
described in this file is surrounded by ``#ifndef`` + ``#endif``, it is enough
to define the macros differently in the aforementioned architecture-dependent
header.
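
The override mechanism can be sketched in a single C file. The example assumes
the generic defaults are guarded so that they apply only when a name is not yet
defined; the ``DEMO_`` names and the string values are hypothetical and stand
in for the real macros.

.. code-block:: c

   #include <stdio.h>

   /* Stand-in for an architecture header: overrides one macro only. */
   #define DEMO_SYM_T_FUNC "arch-specific function type"

   /* Stand-in for the generic header: defaults apply if nothing is defined. */
   #ifndef DEMO_SYM_T_FUNC
   #define DEMO_SYM_T_FUNC "generic function type"
   #endif

   #ifndef DEMO_SYM_T_OBJECT
   #define DEMO_SYM_T_OBJECT "generic object type"
   #endif

   static const char *func_type(void)
   {
       return DEMO_SYM_T_FUNC;     /* the architecture's definition wins */
   }

   static const char *object_type(void)
   {
       return DEMO_SYM_T_OBJECT;   /* not overridden, so the default is used */
   }

   int main(void)
   {
       printf("function type: %s\n", func_type());
       printf("object type:   %s\n", object_type());
       return 0;
   }

Because the architecture's definition is seen first, the guarded default is
skipped for that one macro while every other default stays in effect.
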