.. _memory_allocation:

=======================
Memory Allocation Guide
=======================

Linux provides a variety of APIs for memory allocation. You can
allocate small chunks using the `kmalloc` or `kmem_cache_alloc` families,
large virtually contiguous areas using `vmalloc` and its derivatives,
or you can directly request pages from the page allocator with
`alloc_pages`. It is also possible to use more specialized allocators,
for instance `cma_alloc` or `zs_malloc`.

Most of the memory allocation APIs use GFP flags to express how that
memory should be allocated. The GFP acronym stands for "get free
pages", the underlying memory allocation function.

The diversity of the allocation APIs, combined with the numerous GFP
flags, makes the question "How should I allocate memory?" not that easy
to answer, although very likely you should use

::

  kzalloc(<size>, GFP_KERNEL);

Of course there are cases when other allocation APIs and different GFP
flags must be used.

Get Free Page flags
===================

The GFP flags control the allocator's behavior. They tell what memory
zones can be used, how hard the allocator should try to find free
memory, whether the memory can be accessed by userspace, etc. The
:ref:`Documentation/core-api/mm-api.rst <mm-api-gfp-flags>` provides
reference documentation for the GFP flags and their combinations and
here we briefly outline their recommended usage:

  * Most of the time ``GFP_KERNEL`` is what you need. Memory for the
    kernel data structures, DMAable memory, inode cache, all these and
    many other allocation types can use ``GFP_KERNEL``. Note that
    using ``GFP_KERNEL`` implies ``__GFP_RECLAIM``, which means that
    direct reclaim may be triggered under memory pressure; the calling
    context must be allowed to sleep.
  * If the allocation is performed from an atomic context, e.g. an
    interrupt handler, use ``GFP_NOWAIT``. This flag prevents direct
    reclaim and IO or filesystem operations. Consequently, under memory
    pressure ``GFP_NOWAIT`` allocation is likely to fail. Allocations
    which have a reasonable fallback should also set ``__GFP_NOWARN``
    to suppress the allocation failure warning.
  * If you think that accessing memory reserves is justified and the kernel
    will be stressed unless allocation succeeds, you may use ``GFP_ATOMIC``.
  * Untrusted allocations triggered from userspace should be subject
    to kmem accounting and must have the ``__GFP_ACCOUNT`` bit set. There
    is the handy ``GFP_KERNEL_ACCOUNT`` shortcut for ``GFP_KERNEL``
    allocations that should be accounted.
  * Userspace allocations should use one of the ``GFP_USER``,
    ``GFP_HIGHUSER`` or ``GFP_HIGHUSER_MOVABLE`` flags. The longer
    the flag name, the less restrictive it is.

    ``GFP_HIGHUSER_MOVABLE`` does not require that allocated memory
    will be directly accessible by the kernel and implies that the
    data is movable.

    ``GFP_HIGHUSER`` means that the allocated memory is not movable,
    but it is not required to be directly accessible by the kernel. An
    example may be a hardware allocation that maps data directly into
    userspace but has no addressing limitations.

    ``GFP_USER`` means that the allocated memory is not movable and it
    must be directly accessible by the kernel.

You may notice that quite a few allocations in the existing code
specify ``GFP_NOIO`` or ``GFP_NOFS``. Historically, they were used to
prevent recursion deadlocks caused by direct memory reclaim calling
back into the FS or IO paths and blocking on already held
resources. Since 4.12, the preferred way to address this issue is to
use the scope APIs described in
:ref:`Documentation/core-api/gfp_mask-from-fs-io.rst <gfp_mask_from_fs_io>`.
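
The effect of the scope API can be sketched with a small userspace model.
It assumes only what the referenced document describes: inside a NOFS scope
(entered in the kernel with memalloc_nofs_save() and left with
memalloc_nofs_restore()), allocations behave as if ``__GFP_FS`` were cleared,
and saving returns the previous state so that scopes can nest. The ``M_GFP_*``
bit values and the ``model_*`` functions are hypothetical and exist only for
this illustration.

.. code-block:: c

    #include <stdbool.h>

    /* Hypothetical bit values; only the way they are combined is meant
     * to mirror the kernel's GFP definitions. */
    #define M_GFP_IO             0x01u
    #define M_GFP_FS             0x02u
    #define M_GFP_DIRECT_RECLAIM 0x04u
    #define M_GFP_KSWAPD_RECLAIM 0x08u
    #define M_GFP_RECLAIM (M_GFP_DIRECT_RECLAIM | M_GFP_KSWAPD_RECLAIM)
    #define M_GFP_KERNEL  (M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)
    #define M_GFP_NOFS    (M_GFP_RECLAIM | M_GFP_IO)

    /* Per-task state in the kernel; one global is enough for this
     * single-threaded model. */
    static bool in_nofs_scope;

    /* Enter a NOFS scope and return the previous state so scopes nest. */
    static bool model_nofs_save(void)
    {
        bool old = in_nofs_scope;

        in_nofs_scope = true;
        return old;
    }

    /* Leave the scope by restoring whatever state was active before it. */
    static void model_nofs_restore(bool old)
    {
        in_nofs_scope = old;
    }

    /* Mask an allocation would effectively use: FS reclaim is dropped
     * inside the scope even if the caller passed GFP_KERNEL. */
    static unsigned int model_effective_gfp(unsigned int gfp)
    {
        return in_nofs_scope ? (gfp & ~M_GFP_FS) : gfp;
    }

With this design, a plain ``GFP_KERNEL`` allocation made deep inside the
scope is restricted automatically, so callees do not need to be told to
pass ``GFP_NOFS``, and an inner save/restore pair leaves the outer scope
in effect.
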

Other legacy GFP flags are ``GFP_DMA`` and ``GFP_DMA32``. They are
used to ensure that the allocated memory is accessible by hardware
with limited addressing capabilities. So unless you are writing a
driver for a device with such restrictions, avoid using these flags.
And even for hardware with such restrictions, it is preferable to use
the `dma_alloc*` APIs.

GFP flags and reclaim behavior
------------------------------

Memory allocations may trigger direct or background reclaim, and it is
useful to understand how hard the page allocator will try to satisfy a
given request.

  * ``GFP_KERNEL & ~__GFP_RECLAIM`` - optimistic allocation without *any*
    attempt to free memory at all. The most lightweight mode, which does
    not even kick the background reclaim. Should be used carefully because
    it might deplete the memory and the next user might hit the more
    aggressive reclaim.

  * ``GFP_KERNEL & ~__GFP_DIRECT_RECLAIM`` (or ``GFP_NOWAIT``) - optimistic
    allocation without any attempt to free memory from the current
    context but can wake kswapd to reclaim memory if the zone is below
    the low watermark. Can be used from either atomic contexts or when
    the request is a performance optimization and there is another
    fallback for a slow path.

  * ``(GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM`` (aka ``GFP_ATOMIC``) -
    non-sleeping allocation with an expensive fallback so it can access
    some portion of memory reserves. Usually used from interrupt/bottom-half
    context with an expensive slow path fallback.

  * ``GFP_KERNEL`` - both background and direct reclaim are allowed and the
    **default** page allocator behavior is used. That means that non-costly
    (low-order) allocation requests are basically no-fail, but there is no
    guarantee of that behavior, so failures have to be checked properly by
    callers (e.g. an OOM killer victim is currently allowed to fail).

  * ``GFP_KERNEL | __GFP_NORETRY`` - overrides the default allocator behavior
    and all allocation requests fail early rather than cause disruptive
    reclaim (one round of reclaim in this implementation). The OOM killer
    is not invoked.

  * ``GFP_KERNEL | __GFP_RETRY_MAYFAIL`` - overrides the default allocator
    behavior and all allocation requests try really hard. The request
    will fail if the reclaim cannot make any progress. The OOM killer
    won't be triggered.

  * ``GFP_KERNEL | __GFP_NOFAIL`` - overrides the default allocator behavior
    and all allocation requests will loop endlessly until they succeed.
    This might be really dangerous, especially for larger orders.

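
The mask expressions in the list above are plain bit operations. The sketch
below models only the bits involved, to show which of the first cases may
enter direct reclaim and which may wake kswapd. The bit values are
hypothetical; only the composition (``__GFP_RECLAIM`` as the union of
``__GFP_DIRECT_RECLAIM`` and ``__GFP_KSWAPD_RECLAIM``, and ``GFP_KERNEL`` as
``__GFP_RECLAIM | __GFP_IO | __GFP_FS``) is assumed to mirror the kernel
headers. ``model_may_direct_reclaim()`` plays the role of the kernel's
gfpflags_allow_blocking().

.. code-block:: c

    #include <stdbool.h>

    /* Hypothetical bit values; only the way they are combined is meant
     * to mirror the kernel's GFP definitions. */
    #define M_GFP_IO             0x01u
    #define M_GFP_FS             0x02u
    #define M_GFP_DIRECT_RECLAIM 0x04u
    #define M_GFP_KSWAPD_RECLAIM 0x08u
    #define M_GFP_RECLAIM (M_GFP_DIRECT_RECLAIM | M_GFP_KSWAPD_RECLAIM)
    #define M_GFP_KERNEL  (M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)

    /* May the allocating context sleep in direct reclaim? */
    static bool model_may_direct_reclaim(unsigned int gfp)
    {
        return (gfp & M_GFP_DIRECT_RECLAIM) != 0;
    }

    /* May the allocation wake kswapd to reclaim in the background? */
    static bool model_may_wake_kswapd(unsigned int gfp)
    {
        return (gfp & M_GFP_KSWAPD_RECLAIM) != 0;
    }

For example, ``M_GFP_KERNEL & ~M_GFP_DIRECT_RECLAIM`` fails the direct
reclaim check but still passes the kswapd check, matching the second case
above, while ``M_GFP_KERNEL & ~M_GFP_RECLAIM`` fails both, matching the
first.
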
Selecting memory allocator
==========================

The most straightforward way to allocate memory is to use a function
from the kmalloc() family. To be on the safe side, it's best to use
routines that set memory to zero, like kzalloc(). If you need to
allocate memory for an array, there are kmalloc_array() and kcalloc()
helpers. The helpers struct_size(), array_size() and array3_size() can
be used to safely calculate object sizes without overflowing.
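
The point of these helpers is that an overflowing size computation saturates
to ``SIZE_MAX`` instead of wrapping around, so the following allocation fails
rather than returning a buffer that is too small. The userspace sketch below
mimics that behavior with the GCC/Clang ``__builtin_mul_overflow()`` and
``__builtin_add_overflow()`` builtins. ``model_array_size()`` and
``model_struct_size()`` are hypothetical names; the real helpers in
``include/linux/overflow.h`` take different arguments (struct_size(), for
instance, takes a pointer, the flexible array member and a count).

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* n * elem, saturating to SIZE_MAX on overflow. */
    static size_t model_array_size(size_t n, size_t elem)
    {
        size_t bytes;

        if (__builtin_mul_overflow(n, elem, &bytes))
            return SIZE_MAX;
        return bytes;
    }

    /* A fixed header followed by n trailing elements, saturating the same
     * way; a saturated array part forces the sum to overflow as well. */
    static size_t model_struct_size(size_t header, size_t elem, size_t n)
    {
        size_t bytes;

        if (__builtin_add_overflow(header, model_array_size(n, elem), &bytes))
            return SIZE_MAX;
        return bytes;
    }

In kernel code the corresponding call would look like
``p = kzalloc(struct_size(p, items, count), GFP_KERNEL);``, where ``items`` is
a hypothetical flexible array member of ``*p``.
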

The maximum size of a chunk that can be allocated with `kmalloc` is
limited. The actual limit depends on the hardware and the kernel
configuration, but it is good practice to use `kmalloc` for objects
smaller than page size.

The address of a chunk allocated with `kmalloc` is aligned to at least
ARCH_KMALLOC_MINALIGN bytes. For sizes which are a power of two, the
alignment is also guaranteed to be at least the respective size.
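
The two guarantees combine: a power-of-two size larger than the minimum
alignment raises the guaranteed alignment to the size itself, while any other
size is only guaranteed ARCH_KMALLOC_MINALIGN. The hypothetical helper below
restates that rule, with the architecture's minimum passed in as a parameter
rather than taken from a real header.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    static bool is_power_of_two(size_t x)
    {
        return x != 0 && (x & (x - 1)) == 0;
    }

    /* Minimum alignment the rules above guarantee for a chunk of 'size'
     * bytes, where 'minalign' stands in for ARCH_KMALLOC_MINALIGN. */
    static size_t model_min_kmalloc_align(size_t size, size_t minalign)
    {
        if (is_power_of_two(size) && size > minalign)
            return size;
        return minalign;
    }

For instance, with an assumed minimum of 8 bytes, a 64-byte request is
guaranteed 64-byte alignment, whereas a 96-byte request is only guaranteed
8 bytes.
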

Chunks allocated with kmalloc() can be resized with krealloc(). Similarly
to kmalloc_array(), a helper for resizing arrays is provided in the form of
krealloc_array().
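
krealloc_array() fails if the element count times the element size
overflows, and, like krealloc(), it leaves the original buffer intact when it
fails. That is why the result is usually assigned to a temporary before the
old pointer is overwritten. The sketch below models this in userspace with
realloc(); ``model_realloc_array()`` and ``grow_ints()`` are hypothetical,
and in the kernel the call would be ``krealloc_array(p, new_n, size, gfp)``.

.. code-block:: c

    #include <stddef.h>
    #include <stdlib.h>

    /* Userspace model: refuse on overflow, otherwise resize. */
    static void *model_realloc_array(void *p, size_t n, size_t size)
    {
        size_t bytes;

        if (__builtin_mul_overflow(n, size, &bytes))
            return NULL;    /* the original buffer is left untouched */
        return realloc(p, bytes);
    }

    /* Grow an int array to new_n elements without leaking it on failure. */
    static int grow_ints(int **arr, size_t new_n)
    {
        int *tmp = model_realloc_array(*arr, new_n, sizeof(**arr));

        if (!tmp)
            return -1;      /* *arr is still valid and still owned by us */
        *arr = tmp;
        return 0;
    }

Writing ``*arr = model_realloc_array(*arr, ...)`` directly would lose the
only reference to the old buffer whenever the resize fails.
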

For large allocations you can use vmalloc() and vzalloc(), or directly
request pages from the page allocator. The memory allocated by `vmalloc`
and related functions is not physically contiguous.

If you are not sure whether the allocation size is too large for
`kmalloc`, it is possible to use kvmalloc() and its derivatives. It will
try to allocate memory with `kmalloc`, and if the allocation fails, it
will be retried with `vmalloc`. There are restrictions on which GFP
flags can be used with `kvmalloc`; please see the kvmalloc_node()
reference documentation. Note that `kvmalloc` may return memory that is
not physically contiguous.

If you need to allocate many identical objects, you can use the slab
cache allocator. The cache should be set up with kmem_cache_create() or
kmem_cache_create_usercopy() before it can be used. The second function
should be used if a part of the cache might be copied to userspace.
After the cache is created, kmem_cache_alloc() and its convenience
wrappers can allocate memory from that cache.

When the allocated memory is no longer needed, it must be freed.

Objects allocated by `kmalloc` can be freed by `kfree` or `kvfree`. Objects
allocated by `kmem_cache_alloc` can be freed with `kmem_cache_free`, `kfree`
or `kvfree`, where the latter two might be more convenient thanks to not
needing the kmem_cache pointer.

The same rules apply to the _bulk and _rcu flavors of the freeing functions.

Memory allocated by `vmalloc` can be freed with `vfree` or `kvfree`.
Memory allocated by `kvmalloc` can be freed with `kvfree`.
Caches created by `kmem_cache_create` should be freed with
`kmem_cache_destroy` only after all the allocated objects have been freed.
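
The lifecycle of a slab cache and the ordering rule for destroying it can be
illustrated with a toy fixed-size object pool. This is not how the slab
allocator is implemented: ``struct toy_cache`` and the ``toy_cache_*``
functions are hypothetical stand-ins for kmem_cache_create(),
kmem_cache_alloc(), kmem_cache_free() and kmem_cache_destroy(), and the model
simply refuses to destroy a cache that still has live objects.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* Toy cache: identical fixed-size objects carved from one pool. */
    struct toy_cache {
        size_t obj_size;
        size_t live;            /* objects handed out and not yet freed */
        size_t nfree;           /* number of entries on free_list */
        void **free_list;       /* stack of free object slots */
        unsigned char *pool;
    };

    static struct toy_cache *toy_cache_create(size_t obj_size, size_t capacity)
    {
        struct toy_cache *c = calloc(1, sizeof(*c));

        if (!c)
            return NULL;
        c->pool = calloc(capacity, obj_size);
        c->free_list = calloc(capacity, sizeof(void *));
        if (!c->pool || !c->free_list) {
            free(c->pool);
            free(c->free_list);
            free(c);
            return NULL;
        }
        c->obj_size = obj_size;
        for (size_t i = 0; i < capacity; i++)
            c->free_list[c->nfree++] = c->pool + i * obj_size;
        return c;
    }

    /* Hand out one object, or NULL if the pool is exhausted. */
    static void *toy_cache_alloc(struct toy_cache *c)
    {
        if (c->nfree == 0)
            return NULL;
        c->live++;
        return c->free_list[--c->nfree];
    }

    /* Return an object to the cache it came from. */
    static void toy_cache_free(struct toy_cache *c, void *obj)
    {
        c->free_list[c->nfree++] = obj;
        c->live--;
    }

    /* Tear the cache down only once every object has been freed. */
    static bool toy_cache_destroy(struct toy_cache *c)
    {
        if (c->live)
            return false;
        free(c->pool);
        free(c->free_list);
        free(c);
        return true;
    }

A caller allocates objects with toy_cache_alloc(), returns each of them with
toy_cache_free(), and only then calls toy_cache_destroy(), which returns
false while any object is still outstanding.
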