.. _memory_allocation:

=======================
Memory Allocation Guide
=======================

Linux provides a variety of APIs for memory allocation. You can
allocate small chunks using the `kmalloc` or `kmem_cache_alloc` families,
large virtually contiguous areas using `vmalloc` and its derivatives,
or you can directly request pages from the page allocator with
`alloc_pages`. It is also possible to use more specialized allocators,
for instance `cma_alloc` or `zs_malloc`.

Most of the memory allocation APIs use GFP flags to express how that
memory should be allocated. The GFP acronym stands for "get free
pages", the underlying memory allocation function.

The diversity of the allocation APIs, combined with the numerous GFP
flags, makes the question "How should I allocate memory?" not that easy
to answer, although very likely you should use

::

  kzalloc(<size>, GFP_KERNEL);

Of course there are cases when other allocation APIs and different GFP
flags must be used.

Get Free Page flags
===================

The GFP flags control the allocator's behavior. They tell which memory
zones can be used, how hard the allocator should try to find free
memory, whether the memory can be accessed by userspace, etc. The
:ref:`Documentation/core-api/mm-api.rst <mm-api-gfp-flags>` provides
reference documentation for the GFP flags and their combinations, and
here we briefly outline their recommended usage:

* Most of the time ``GFP_KERNEL`` is what you need. Memory for the
  kernel data structures, DMAable memory, inode cache, all these and
  many other allocation types can use ``GFP_KERNEL``. Note that
  using ``GFP_KERNEL`` implies ``__GFP_RECLAIM``, which means that
  direct reclaim may be triggered under memory pressure; the calling
  context must be allowed to sleep.
* If the allocation is performed from an atomic context, e.g. an interrupt
  handler, use ``GFP_NOWAIT``. This flag prevents direct reclaim and
  IO or filesystem operations. Consequently, under memory pressure a
  ``GFP_NOWAIT`` allocation is likely to fail. Allocations which
  have a reasonable fallback should also pass ``__GFP_NOWARN`` to
  suppress the allocation failure warning.
* If you think that accessing memory reserves is justified and the kernel
  will be stressed unless the allocation succeeds, you may use ``GFP_ATOMIC``.
* Untrusted allocations triggered from userspace should be subject
  to kmem accounting and must have the ``__GFP_ACCOUNT`` bit set. There
  is the handy ``GFP_KERNEL_ACCOUNT`` shortcut for ``GFP_KERNEL``
  allocations that should be accounted.
* Userspace allocations should use one of the ``GFP_USER``,
  ``GFP_HIGHUSER`` or ``GFP_HIGHUSER_MOVABLE`` flags. The longer
  the flag name, the less restrictive it is.

  ``GFP_HIGHUSER_MOVABLE`` does not require that the allocated memory
  be directly accessible by the kernel and implies that the
  data is movable.

  ``GFP_HIGHUSER`` means that the allocated memory is not movable,
  but it is not required to be directly accessible by the kernel. An
  example may be a hardware allocation that maps data directly into
  userspace but has no addressing limitations.

  ``GFP_USER`` means that the allocated memory is not movable and it
  must be directly accessible by the kernel.

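The statement that a longer flag name is less restrictive can be pictured
as bitmask inclusion: each longer name adds bits to the shorter one. The
sketch below is a userspace model, not kernel code. The ``M_GFP_*`` names
and bit values are hypothetical; the compositions are assumed to follow the
kernel's gfp headers (``GFP_USER`` is ``GFP_KERNEL`` plus
``__GFP_HARDWALL``, ``GFP_HIGHUSER`` adds ``__GFP_HIGHMEM``,
``GFP_HIGHUSER_MOVABLE`` adds ``__GFP_MOVABLE``, and ``GFP_KERNEL_ACCOUNT``
is ``GFP_KERNEL`` plus ``__GFP_ACCOUNT``), leaving out any extra bits that
real kernels may define.

::

  #include <stdbool.h>

  /* Hypothetical bit values for this model; the real __GFP_* bits differ. */
  #define M_GFP_DIRECT_RECLAIM   (1u << 0)
  #define M_GFP_KSWAPD_RECLAIM   (1u << 1)
  #define M_GFP_RECLAIM          (M_GFP_DIRECT_RECLAIM | M_GFP_KSWAPD_RECLAIM)
  #define M_GFP_IO               (1u << 2)
  #define M_GFP_FS               (1u << 3)
  #define M_GFP_HARDWALL         (1u << 4)
  #define M_GFP_HIGHMEM          (1u << 5)
  #define M_GFP_MOVABLE          (1u << 6)
  #define M_GFP_ACCOUNT          (1u << 7)

  /* Composite masks, each built on top of a shorter one. */
  #define M_GFP_KERNEL           (M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)
  #define M_GFP_KERNEL_ACCOUNT   (M_GFP_KERNEL | M_GFP_ACCOUNT)
  #define M_GFP_USER             (M_GFP_KERNEL | M_GFP_HARDWALL)
  #define M_GFP_HIGHUSER         (M_GFP_USER | M_GFP_HIGHMEM)
  #define M_GFP_HIGHUSER_MOVABLE (M_GFP_HIGHUSER | M_GFP_MOVABLE)

  /* True if every bit of @inner is also set in @outer. */
  static bool model_gfp_includes(unsigned int outer, unsigned int inner)
  {
      return (outer & inner) == inner;
  }

Each added bit relaxes a constraint (highmem may be used, the page may be
migrated), so ``M_GFP_HIGHUSER_MOVABLE`` includes ``M_GFP_HIGHUSER``, which
includes ``M_GFP_USER``, but not the other way round.
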
You may notice that quite a few allocations in the existing code
specify ``GFP_NOIO`` or ``GFP_NOFS``. Historically, they were used to
prevent recursion deadlocks caused by direct memory reclaim calling
back into the FS or IO paths and blocking on already held
resources. Since Linux 4.12 the preferred way to address this issue is
to use the scope APIs described in
:ref:`Documentation/core-api/gfp_mask-from-fs-io.rst <gfp_mask_from_fs_io>`.

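The scope APIs mark a whole code region instead of each individual
allocation: memalloc_nofs_save() sets a per-task flag,
memalloc_nofs_restore() puts back the previous state, and, as we
understand it, the allocator masks ``__GFP_FS`` out of requests made
inside the scope through current_gfp_context(). The sketch below models
that idea in userspace. Everything prefixed with ``model_`` or ``M_`` is
hypothetical, a single global variable stands in for the per-task flags,
and the bit values are made up; the save/restore logic is our reading of
the kernel helpers, not a copy of them.

::

  /* Hypothetical bit values for this model only. */
  #define M_GFP_RECLAIM  (1u << 0)
  #define M_GFP_IO       (1u << 1)
  #define M_GFP_FS       (1u << 2)
  #define M_GFP_KERNEL   (M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)

  #define M_PF_MEMALLOC_NOFS  (1u << 0)
  #define M_PF_MEMALLOC_NOIO  (1u << 1)

  /* A single global stands in for the flags of the current task. */
  static unsigned int model_task_flags;

  /* Enter a NOFS scope; returns the previous state of the NOFS flag. */
  static unsigned int model_memalloc_nofs_save(void)
  {
      unsigned int old = model_task_flags & M_PF_MEMALLOC_NOFS;

      model_task_flags |= M_PF_MEMALLOC_NOFS;
      return old;
  }

  /* Leave the scope by restoring exactly the saved state. */
  static void model_memalloc_nofs_restore(unsigned int old)
  {
      model_task_flags = (model_task_flags & ~M_PF_MEMALLOC_NOFS) | old;
  }

  /* Mask applied to every allocation request made by the current task. */
  static unsigned int model_current_gfp_context(unsigned int gfp)
  {
      if (model_task_flags & M_PF_MEMALLOC_NOIO)
          gfp &= ~(M_GFP_IO | M_GFP_FS);
      else if (model_task_flags & M_PF_MEMALLOC_NOFS)
          gfp &= ~M_GFP_FS;
      return gfp;
  }

Because the saved value is restored rather than the flag simply being
cleared, nested scopes compose: leaving an inner scope does not end an
outer one, and plain ``GFP_KERNEL`` callers inside the scope need no
changes.
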
Other legacy GFP flags are ``GFP_DMA`` and ``GFP_DMA32``. They are
used to ensure that the allocated memory is accessible by hardware
with limited addressing capabilities. So unless you are writing a
driver for a device with such restrictions, avoid using these flags.
Even for hardware with such restrictions, it is preferable to use the
`dma_alloc*` APIs.

GFP flags and reclaim behavior
------------------------------

Memory allocations may trigger direct or background reclaim, and it is
useful to understand how hard the page allocator will try to satisfy a
given request.

* ``GFP_KERNEL & ~__GFP_RECLAIM`` - optimistic allocation without *any*
  attempt to free memory at all. This is the most lightweight mode; it
  does not even kick background reclaim. It should be used carefully
  because it might deplete the memory, and the next user might hit more
  aggressive reclaim.

* ``GFP_KERNEL & ~__GFP_DIRECT_RECLAIM`` (or ``GFP_NOWAIT``) - optimistic
  allocation without any attempt to free memory from the current
  context, but it can wake kswapd to reclaim memory if the zone is below
  the low watermark. It can be used either from atomic contexts or when
  the request is a performance optimization and there is another
  fallback for a slow path.

* ``(GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM`` (aka ``GFP_ATOMIC``) -
  non-sleeping allocation whose failure would force an expensive fallback,
  which justifies giving it access to a portion of the memory reserves.
  It is usually used from interrupt/bottom-half context.

* ``GFP_KERNEL`` - both background and direct reclaim are allowed and the
  **default** page allocator behavior is used. This means that non-costly
  allocation requests are basically no-fail, but there is no guarantee of
  that behavior, so callers have to check for failures properly
  (e.g. an OOM killer victim is currently allowed to fail).

* ``GFP_KERNEL | __GFP_NORETRY`` - overrides the default allocator behavior
  and all allocation requests fail early rather than cause disruptive
  reclaim (one round of reclaim in this implementation). The OOM killer
  is not invoked.

* ``GFP_KERNEL | __GFP_RETRY_MAYFAIL`` - overrides the default allocator
  behavior and all allocation requests try really hard. The request
  will fail if the reclaim cannot make any progress. The OOM killer
  won't be triggered.

* ``GFP_KERNEL | __GFP_NOFAIL`` - overrides the default allocator behavior
  and all allocation requests will loop endlessly until they succeed.
  This can be really dangerous, especially for larger orders.

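The masks in the list above are plain bit arithmetic, so their effect on
sleeping, background reclaim and reserve access can be checked
mechanically. The sketch below is a userspace model: the ``M_GFP_*`` bit
values and the ``model_*`` predicates are hypothetical, and only the
expressions themselves are taken from the list. In the kernel,
gfpflags_allow_blocking() tests the same ``__GFP_DIRECT_RECLAIM`` bit as
model_may_sleep() does here.

::

  #include <stdbool.h>

  /* Hypothetical bit values for this model; the real __GFP_* bits differ. */
  #define M_GFP_DIRECT_RECLAIM  (1u << 0)
  #define M_GFP_KSWAPD_RECLAIM  (1u << 1)
  #define M_GFP_RECLAIM         (M_GFP_DIRECT_RECLAIM | M_GFP_KSWAPD_RECLAIM)
  #define M_GFP_IO              (1u << 2)
  #define M_GFP_FS              (1u << 3)
  #define M_GFP_HIGH            (1u << 4)
  #define M_GFP_KERNEL          (M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)

  /*
   * The expressions from the list, written as in the text. The real
   * GFP_NOWAIT and GFP_ATOMIC macros may contain a different set of bits.
   */
  #define M_NO_RECLAIM         (M_GFP_KERNEL & ~M_GFP_RECLAIM)
  #define M_NO_DIRECT_RECLAIM  (M_GFP_KERNEL & ~M_GFP_DIRECT_RECLAIM)
  #define M_ATOMIC_LIKE        ((M_GFP_KERNEL | M_GFP_HIGH) & ~M_GFP_DIRECT_RECLAIM)

  /* Direct reclaim runs in the caller's context and may sleep. */
  static bool model_may_sleep(unsigned int gfp)
  {
      return (gfp & M_GFP_DIRECT_RECLAIM) != 0;
  }

  /* Background reclaim: kswapd may be woken. */
  static bool model_wakes_kswapd(unsigned int gfp)
  {
      return (gfp & M_GFP_KSWAPD_RECLAIM) != 0;
  }

  /* Access to a portion of the memory reserves. */
  static bool model_uses_reserves(unsigned int gfp)
  {
      return (gfp & M_GFP_HIGH) != 0;
  }

Only ``M_GFP_KERNEL`` keeps the direct reclaim bit, which is why it is the
only one of these combinations that must not be used from atomic context.
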
Selecting a memory allocator
============================

The most straightforward way to allocate memory is to use a function
from the kmalloc() family. To be on the safe side, it is best to use
routines that set memory to zero, like kzalloc(). If you need to
allocate memory for an array, there are kmalloc_array() and kcalloc()
helpers. The helpers struct_size(), array_size() and array3_size() can
be used to safely calculate object sizes without overflowing.

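The size helpers guard against a classic bug: ``n * sizeof(elem)``
wrapping around to a small number, so that a too-small buffer is
allocated. Our understanding is that the kernel helpers saturate to
SIZE_MAX on overflow, which makes the subsequent allocation fail. The
sketch below models that behavior in userspace with the GCC/Clang
``__builtin_mul_overflow()`` and ``__builtin_add_overflow()`` builtins.
model_array_size() and model_struct_size() are hypothetical names, and
unlike the real struct_size() macro, model_struct_size() takes the header
size explicitly instead of a pointer and a member name.

::

  #include <stddef.h>
  #include <stdint.h>

  /* n * elem, or SIZE_MAX if the product does not fit in size_t. */
  static size_t model_array_size(size_t n, size_t elem)
  {
      size_t bytes;

      if (__builtin_mul_overflow(n, elem, &bytes))
          return SIZE_MAX;
      return bytes;
  }

  /* Size of a header followed by n trailing elements, saturating. */
  static size_t model_struct_size(size_t header, size_t n, size_t elem)
  {
      size_t arr = model_array_size(n, elem);
      size_t bytes;

      if (arr == SIZE_MAX || __builtin_add_overflow(header, arr, &bytes))
          return SIZE_MAX;
      return bytes;
  }

Since no allocator can satisfy a SIZE_MAX request, an overflowing size
computed this way turns into an allocation failure that the caller already
checks for, instead of a silent buffer overrun later.
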
The maximal size of a chunk that can be allocated with `kmalloc` is
limited. The actual limit depends on the hardware and the kernel
configuration, but it is good practice to use `kmalloc` for objects
smaller than page size.

The address of a chunk allocated with `kmalloc` is aligned to at least
``ARCH_KMALLOC_MINALIGN`` bytes. For sizes which are a power of two, the
alignment is also guaranteed to be at least the respective size.

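The alignment rule above can be restated as a small function returning
the minimum alignment a caller may rely on for a given request size. This
is a hypothetical userspace model: ``MODEL_KMALLOC_MINALIGN`` is set to 8
purely for illustration, while the real ``ARCH_KMALLOC_MINALIGN`` is
architecture specific, and actual returned addresses may well be aligned
more strictly than this lower bound.

::

  #include <stdbool.h>
  #include <stddef.h>

  /* Hypothetical value for illustration; the real one is per-architecture. */
  #define MODEL_KMALLOC_MINALIGN ((size_t)8)

  static bool model_is_power_of_two(size_t n)
  {
      return n != 0 && (n & (n - 1)) == 0;
  }

  /*
   * Lower bound on the alignment of a kmalloc(size) result per the rule
   * above: the minimum alignment always, and the size itself when the
   * size is a power of two larger than that minimum.
   */
  static size_t model_guaranteed_align(size_t size)
  {
      if (model_is_power_of_two(size) && size > MODEL_KMALLOC_MINALIGN)
          return size;
      return MODEL_KMALLOC_MINALIGN;
  }

For example, a 256-byte request may be relied on to be 256-byte aligned,
while a 96-byte request only carries the architecture minimum.
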
Chunks allocated with kmalloc() can be resized with krealloc(). Similarly
to kmalloc_array(), a helper for resizing arrays is provided in the form of
krealloc_array().

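As far as we know, krealloc_array() checks the element-count
multiplication for overflow and returns NULL instead of calling krealloc()
with a wrapped size. The sketch below applies the same idea to userspace
realloc(); model_realloc_array() is a hypothetical name (glibc offers a
similar reallocarray(), avoided here for portability). As with krealloc(),
a NULL return leaves the original buffer valid, so keep the old pointer
until the call succeeds.

::

  #include <stdint.h>
  #include <stdlib.h>

  /*
   * Resize @p to hold @new_n elements of @new_size bytes each.
   * Returns NULL without touching @p if the byte count would overflow;
   * otherwise behaves like realloc(), which also keeps @p on failure.
   */
  static void *model_realloc_array(void *p, size_t new_n, size_t new_size)
  {
      size_t bytes;

      if (__builtin_mul_overflow(new_n, new_size, &bytes))
          return NULL;
      return realloc(p, bytes);
  }

A typical caller assigns the result to a temporary, checks it for NULL,
and only then replaces its own pointer, so a failed resize never leaks or
loses the existing elements.
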
For large allocations you can use vmalloc() and vzalloc(), or directly
request pages from the page allocator. The memory allocated by `vmalloc`
and related functions is not physically contiguous.

If you are not sure whether the allocation size is too large for
`kmalloc`, it is possible to use kvmalloc() and its derivatives. They
try to allocate memory with `kmalloc`, and if that allocation fails it
is retried with `vmalloc`. There are restrictions on which GFP
flags can be used with `kvmalloc`; please see the kvmalloc_node()
reference documentation. Note that `kvmalloc` may return memory that
is not physically contiguous.

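The fallback order described above can be modeled in userspace. In the
sketch below, model_kmalloc() and model_vmalloc() are hypothetical
stand-ins built on malloc(), and ``MODEL_KMALLOC_MAX`` is an arbitrary
threshold that makes the first attempt fail for large sizes. The real
kvmalloc_node() also adjusts the GFP flags of the first attempt so that
it gives up quickly; this model ignores flags entirely.

::

  #include <stddef.h>
  #include <stdlib.h>

  /* Hypothetical threshold standing in for "too big for kmalloc". */
  #define MODEL_KMALLOC_MAX ((size_t)4096)

  enum model_backend { MODEL_NONE, MODEL_KMALLOC, MODEL_VMALLOC };

  /* Hypothetical kmalloc() stand-in: fails above the threshold. */
  static void *model_kmalloc(size_t size)
  {
      return size <= MODEL_KMALLOC_MAX ? malloc(size) : NULL;
  }

  /* Hypothetical vmalloc() stand-in: plain malloc() in userspace. */
  static void *model_vmalloc(size_t size)
  {
      return malloc(size);
  }

  /* Try the kmalloc path first, then fall back; report which one worked. */
  static void *model_kvmalloc(size_t size, enum model_backend *used)
  {
      void *p = model_kmalloc(size);

      if (p) {
          *used = MODEL_KMALLOC;
          return p;
      }
      p = model_vmalloc(size);
      *used = p ? MODEL_VMALLOC : MODEL_NONE;
      return p;
  }

Because either path may have produced the buffer, the caller cannot know
in advance which free function matches; in the kernel this is why such
memory is released with kvfree(), which tells the two cases apart.
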
If you need to allocate many identical objects you can use the slab
cache allocator. The cache should be set up with kmem_cache_create() or
kmem_cache_create_usercopy() before it can be used. The second function
should be used if a part of the cache might be copied to userspace.
After the cache is created, kmem_cache_alloc() and its convenience
wrappers can allocate memory from that cache.

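To illustrate why a dedicated cache suits many identical objects, the
sketch below implements a toy fixed-size object pool in userspace: objects
are carved from one block and recycled through a free list threaded
through the free objects themselves. It is a hypothetical teaching model
with made-up ``model_cache_*`` names; it is not how the kernel's slab
allocator is implemented and has none of its per-CPU, NUMA or growth
logic.

::

  #include <stddef.h>
  #include <stdlib.h>

  /* Toy cache of fixed-size objects backed by one preallocated block. */
  struct model_cache {
      size_t obj_size;
      size_t nr_objs;
      unsigned char *mem;
      void *free_list;    /* linked through the free objects themselves */
  };

  static int model_cache_create(struct model_cache *c, size_t obj_size,
                                size_t nr_objs)
  {
      size_t i;

      /* Each free object must be able to hold an aligned next pointer. */
      if (obj_size < sizeof(void *))
          obj_size = sizeof(void *);
      obj_size = (obj_size + sizeof(void *) - 1) / sizeof(void *) * sizeof(void *);

      c->mem = calloc(nr_objs, obj_size);
      if (!c->mem)
          return -1;
      c->obj_size = obj_size;
      c->nr_objs = nr_objs;
      c->free_list = NULL;

      /* Push objects in reverse so the first allocation returns object 0. */
      for (i = nr_objs; i-- > 0; ) {
          void *obj = c->mem + i * obj_size;

          *(void **)obj = c->free_list;
          c->free_list = obj;
      }
      return 0;
  }

  /* Pop one object, or NULL when the cache is exhausted. */
  static void *model_cache_alloc(struct model_cache *c)
  {
      void *obj = c->free_list;

      if (obj)
          c->free_list = *(void **)obj;
      return obj;
  }

  /* Push the object back so the next allocation can reuse it. */
  static void model_cache_free(struct model_cache *c, void *obj)
  {
      *(void **)obj = c->free_list;
      c->free_list = obj;
  }

  /* Release the backing block; all objects must already be freed. */
  static void model_cache_destroy(struct model_cache *c)
  {
      free(c->mem);
      c->mem = NULL;
      c->free_list = NULL;
  }

Allocation and freeing are each a couple of pointer operations with no
size lookup, which is the property that makes a per-type cache attractive
for many objects of one size.
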
When the allocated memory is no longer needed it must be freed.

Objects allocated by `kmalloc` can be freed by `kfree` or `kvfree`. Objects
allocated by `kmem_cache_alloc` can be freed with `kmem_cache_free`, `kfree`
or `kvfree`, where the latter two might be more convenient thanks to not
needing the kmem_cache pointer.

The same rules apply to the ``_bulk`` and ``_rcu`` flavors of freeing functions.

Memory allocated by `vmalloc` can be freed with `vfree` or `kvfree`.
Memory allocated by `kvmalloc` can be freed with `kvfree`.
Caches created by `kmem_cache_create` should be destroyed with
`kmem_cache_destroy` only after all the allocated objects have been freed.
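
The pairing rules above form a small lookup table. The sketch below
restates them as a C function purely as a compact summary; the enum and
function names are hypothetical and do not correspond to any kernel
interface.

::

  #include <stdbool.h>

  enum model_alloc { M_KMALLOC, M_KMEM_CACHE_ALLOC, M_VMALLOC, M_KVMALLOC };
  enum model_free  { M_KFREE, M_KVFREE, M_KMEM_CACHE_FREE, M_VFREE };

  /* True if freeing function @f may release memory obtained from @a. */
  static bool model_free_allowed(enum model_alloc a, enum model_free f)
  {
      switch (a) {
      case M_KMALLOC:
          return f == M_KFREE || f == M_KVFREE;
      case M_KMEM_CACHE_ALLOC:
          return f == M_KMEM_CACHE_FREE || f == M_KFREE || f == M_KVFREE;
      case M_VMALLOC:
          return f == M_VFREE || f == M_KVFREE;
      case M_KVMALLOC:
          /* Either backend may have been used, so only kvfree is safe. */
          return f == M_KVFREE;
      }
      return false;
  }

Reading the table column-wise shows that `kvfree` is the one function
accepted for every allocator listed, at the cost of a runtime check of
which backend owns the address.
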