Commit | Line | Data |
---|---|---|
2a05c58b | 1 | ======== |
d02be50d | 2 | zsmalloc |
2a05c58b | 3 | ======== |
d02be50d MK |
4 | |
5 | This allocator is designed for use with zram. Thus, the allocator is | |
6 | supposed to work well under low memory conditions. In particular, it | |
7 | never attempts higher order page allocation which is very likely to | |
8 | fail under memory pressure. On the other hand, if we just use single | |
9 | (0-order) pages, it would suffer from very high fragmentation -- | |
10 | any object of size PAGE_SIZE/2 or larger would occupy an entire page. | |
11 | This was one of the major issues with its predecessor (xvmalloc). | |
12 | ||
13 | To overcome these issues, zsmalloc allocates a bunch of 0-order pages | |
14 | and links them together using various 'struct page' fields. These linked | |
15 | pages act as a single higher-order page i.e. an object can span 0-order | |
16 | page boundaries. The code refers to these linked pages as a single entity | |
17 | called zspage. | |
18 | ||
19 | For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE | |
20 | since this satisfies the requirements of all its current users (in the | |
21 | worst case, page is incompressible and is thus stored "as-is" i.e. in | |
22 | uncompressed form). For allocation requests larger than this size, failure | |
23 | is returned (see zs_malloc). | |
24 | ||
25 | Additionally, zs_malloc() does not return a dereferenceable pointer. | |
26 | Instead, it returns an opaque handle (unsigned long) which encodes actual | |
27 | location of the allocated object. The reason for this indirection is that | |
28 | zsmalloc does not keep zspages permanently mapped since that would cause | |
29 | issues on 32-bit systems where the VA region for kernel space mappings | |
30 | is very small. So, before using the allocated memory, the object has to | |
31 | be mapped using zs_map_object() to get a usable pointer and subsequently | |
32 | unmapped using zs_unmap_object(). | |
33 | ||
34 | stat | |
2a05c58b | 35 | ==== |
d02be50d MK |
36 | |
37 | With CONFIG_ZSMALLOC_STAT, we can see zsmalloc internal information via | |
2a05c58b | 38 | ``/sys/kernel/debug/zsmalloc/<user name>``. Here is a sample of stat output:: |
d02be50d | 39 | |
2a05c58b | 40 | # cat /sys/kernel/debug/zsmalloc/zram0/classes |
d02be50d MK |
41 | |
42 | class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage | |
2a05c58b MR |
43 | ... |
44 | ... | |
d02be50d MK |
45 | 9 176 0 1 186 129 8 4 |
46 | 10 192 1 0 2880 2872 135 3 | |
47 | 11 208 0 1 819 795 42 2 | |
48 | 12 224 0 1 219 159 12 4 | |
2a05c58b MR |
49 | ... |
50 | ... | |
51 | ||
d02be50d | 52 | |
2a05c58b MR |
53 | class |
54 | index | |
55 | size | |
56 | object size zspage stores | |
57 | almost_empty | |
58 | the number of ZS_ALMOST_EMPTY zspages (see below) | |
59 | almost_full | |
60 | the number of ZS_ALMOST_FULL zspages (see below) | |
61 | obj_allocated | |
62 | the number of objects allocated | |
63 | obj_used | |
64 | the number of objects allocated to the user | |
65 | pages_used | |
66 | the number of pages allocated for the class | |
67 | pages_per_zspage | |
68 | the number of 0-order pages to make a zspage | |
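As a rough cross-check of how these columns relate, the sketch below parses the four sample rows shown above and derives the number of zspages and the objects per zspage for each class. The helper name `parse_classes` is hypothetical, the 4 KiB page size is an assumption, and the column order follows the 8-column sample header.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, not stated in the sample output

COLUMNS = ("class", "size", "almost_full", "almost_empty",
           "obj_allocated", "obj_used", "pages_used", "pages_per_zspage")

# The four data rows from the sample output above.
SAMPLE = """\
9 176 0 1 186 129 8 4
10 192 1 0 2880 2872 135 3
11 208 0 1 819 795 42 2
12 224 0 1 219 159 12 4
"""

def parse_classes(text):
    """Turn whitespace-separated stat rows into a list of dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue  # skip the header, "..." and anything that is not a data row
        rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows

rows = parse_classes(SAMPLE)
for row in rows:
    # pages_used counts 0-order pages, so dividing by the chain length gives zspages.
    row["zspages"] = row["pages_used"] // row["pages_per_zspage"]
    # How many objects of this class size fit into one zspage.
    row["objs_per_zspage"] = row["pages_per_zspage"] * PAGE_SIZE // row["size"]
    free_slots = row["obj_allocated"] - row["obj_used"]
    print(row["class"], row["zspages"], row["objs_per_zspage"], free_slots)
```

In this sample, obj_allocated equals the number of zspages times the objects per zspage for every row, and obj_allocated minus obj_used is the number of free object slots.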
d02be50d | 69 | |
2a05c58b | 70 | We assign a zspage to ZS_ALMOST_EMPTY fullness group when n <= N / f, where |
d02be50d | 71 | |
2a05c58b MR |
72 | * n = number of allocated objects |
73 | * N = total number of objects zspage can store | |
74 | * f = fullness_threshold_frac (i.e., 4 at the moment) | |
d02be50d MK |
75 | |
76 | Similarly, we assign zspage to: | |
2a05c58b MR |
77 | |
78 | * ZS_ALMOST_FULL when n > N / f | |
79 | * ZS_EMPTY when n == 0 | |
80 | * ZS_FULL when n == N | |
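The rules above can be sketched as a small classifier. This is an illustration, not the kernel code. The function name is hypothetical, and checking the empty and full cases first is an assumption, made because n == 0 would otherwise also satisfy n <= N / f.

```python
FULLNESS_THRESHOLD_FRAC = 4  # f in the text above

def fullness_group(n, N, f=FULLNESS_THRESHOLD_FRAC):
    """Classify a zspage holding n of N possible objects, following the rules above."""
    if not 0 <= n <= N:
        raise ValueError("n must be between 0 and N")
    if n == 0:
        return "ZS_EMPTY"
    if n == N:
        return "ZS_FULL"
    # n <= N / f, written without division to stay in integer arithmetic
    if n * f <= N:
        return "ZS_ALMOST_EMPTY"
    return "ZS_ALMOST_FULL"

# A zspage that can store 64 objects, at several fill levels.
for n in (0, 10, 16, 17, 63, 64):
    print(n, fullness_group(n, 64))
```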
4ff93b29 SS |
81 | |
82 | ||
83 | Internals | |
84 | ========= | |
85 | ||
86 | zsmalloc has 255 size classes, each of which can hold a number of zspages. | |
87 | Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages. | |
88 | The optimal zspage chain size for each size class is calculated during the | |
89 | creation of the zsmalloc pool (see calculate_zspage_chain_size()). | |
90 | ||
91 | As an optimization, zsmalloc merges size classes that have similar | |
92 | characteristics in terms of the number of pages per zspage and the number | |
93 | of objects that each zspage can store. | |
94 | ||
95 | For instance, consider the following size classes::: | |
96 | ||
97 | class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable | |
98 | ... | |
99 | 94 1536 0 0 0 0 0 3 0 | |
100 | 100 1632 0 0 0 0 0 2 0 | |
101 | ... | |
102 | ||
103 | ||
104 | Size classes #95-99 are merged with size class #100. This means that when we | |
105 | need to store an object of size, say, 1568 bytes, we end up using size class | |
106 | #100 instead of size class #96. Size class #100 is meant for objects of size | |
107 | 1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes. | |
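The sample tables suggest that, before merging, class i stores objects of 32 + 16 * i bytes (class 94 is 1536 bytes, class 254 is 4096 bytes). That relationship is inferred from the stat output, not stated in the text. Assuming it holds, the sketch below reproduces the lookup described above. The helper names are hypothetical and the list of surviving classes is hard-coded from the excerpt.

```python
import bisect

MIN_SIZE = 32  # assumption inferred from the tables: class 0 would be 32 bytes
DELTA = 16     # assumption inferred from the tables: 16 bytes between classes

def class_size(index):
    """Object size of class `index` under the inferred 32 + 16 * index rule."""
    return MIN_SIZE + DELTA * index

def unmerged_class(size):
    """Index of the smallest class whose size is >= size, ignoring merging."""
    return max(0, -(-(size - MIN_SIZE) // DELTA))  # ceiling division

# Classes that survive merging in the excerpt above: index -> size.
SURVIVING = {94: 1536, 100: 1632}

def merged_class(size):
    """Smallest class listed in the excerpt that can hold an object of this size."""
    indexes = sorted(SURVIVING)
    sizes = [SURVIVING[i] for i in indexes]
    pos = bisect.bisect_left(sizes, size)
    if pos == len(sizes):
        raise ValueError("size is beyond the classes listed in the excerpt")
    return indexes[pos]

obj = 1568
own = unmerged_class(obj)      # class the object would use without merging
used = merged_class(obj)       # class it actually lands in
waste = SURVIVING[used] - obj  # bytes lost per object because of the merge
print(own, class_size(own), used, waste)
```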
108 | ||
109 | Size class #100 consists of zspages with 2 physical pages each, which can | |
110 | hold a total of 5 objects. If we need to store 13 objects of size 1568, we | |
111 | end up allocating three zspages, or 6 physical pages. | |
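The arithmetic behind this example can be checked with a few lines. The function name is hypothetical and a 4 KiB page size is assumed.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

def pages_needed(num_objects, class_size, pages_per_zspage):
    """Return (objects per zspage, zspages, physical pages) for num_objects."""
    objs_per_zspage = pages_per_zspage * PAGE_SIZE // class_size
    zspages = -(-num_objects // objs_per_zspage)  # ceiling division
    return objs_per_zspage, zspages, zspages * pages_per_zspage

# 13 objects stored in size class #100 (1632 bytes, 2 pages per zspage).
print(pages_needed(13, 1632, 2))  # (5, 3, 6)
```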
112 | ||
113 | However, if we take a closer look at size class #96 (which is meant for | |
114 | objects of size 1568 bytes) and trace `calculate_zspage_chain_size()`, we | |
115 | find that the optimal zspage configuration for this class is a chain | |
116 | of 5 physical pages::: | |
117 | ||
118 | pages per zspage wasted bytes used% | |
119 | 1 960 76 | |
120 | 2 352 95 | |
121 | 3 1312 89 | |
122 | 4 704 95 | |
123 | 5 96 99 | |
124 | ||
125 | This means that a class #96 configuration with 5 physical pages can store 13 | |
126 | objects of size 1568 in a single zspage, using a total of 5 physical pages. | |
127 | This is more efficient than the class #100 configuration, which would use 6 | |
128 | physical pages to store the same number of objects. | |
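The table above can be reproduced with a simplified model in which a chain of k pages holds as many whole objects as fit and the remainder is wasted. This is a sketch of that arithmetic, not a copy of `calculate_zspage_chain_size()`. The helper names are hypothetical and a 4 KiB page size is assumed.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

def chain_table(class_size, max_chain):
    """Yield (pages, objects, wasted bytes, used %) for chains of 1..max_chain pages."""
    for pages in range(1, max_chain + 1):
        total = pages * PAGE_SIZE
        objects = total // class_size
        wasted = total % class_size
        used_pct = 100 * (total - wasted) // total  # truncated, as in the table
        yield pages, objects, wasted, used_pct

def best_chain(class_size, max_chain):
    """Chain length with the fewest wasted bytes; ties go to the shorter chain."""
    return min(chain_table(class_size, max_chain),
               key=lambda row: (row[2], row[0]))[0]

for row in chain_table(1568, 5):
    print(row)
print(best_chain(1568, 5))
```

Under the same model, limiting the chain to 4 pages selects a 2-page chain holding 5 objects for 1568-byte objects. Those are the same characteristics as class #100, which is consistent with the merge described earlier.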
129 | ||
130 | As the zspage chain size for class #96 increases, its key characteristics | |
131 | such as pages per-zspage and objects per-zspage also change. This leads to | |
132 | fewer class mergers, resulting in a more compact grouping of classes, which | |
133 | reduces memory wastage. | |
134 | ||
135 | Let's take a closer look at the bottom of `/sys/kernel/debug/zsmalloc/zramX/classes`::: | |
136 | ||
137 | class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable | |
138 | ... | |
139 | 202 3264 0 0 0 0 0 4 0 | |
140 | 254 4096 0 0 0 0 0 1 0 | |
141 | ... | |
142 | ||
143 | Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages | |
144 | per zspage. Any object larger than 3264 bytes is considered huge and belongs | |
145 | to size class #254, which stores each object in its own physical page (objects | |
146 | in huge classes do not share pages). | |
147 | ||
148 | Increasing the zspage chain size also results in a higher watermark | |
149 | for the huge size class and fewer huge classes overall. This allows for more | |
150 | efficient storage of large objects. | |
151 | ||
152 | For zspage chain size of 8, huge class watermark becomes 3632 bytes::: | |
153 | ||
154 | class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable | |
155 | ... | |
156 | 202 3264 0 0 0 0 0 4 0 | |
157 | 211 3408 0 0 0 0 0 5 0 | |
158 | 217 3504 0 0 0 0 0 6 0 | |
159 | 222 3584 0 0 0 0 0 7 0 | |
160 | 225 3632 0 0 0 0 0 8 0 | |
161 | 254 4096 0 0 0 0 0 1 0 | |
162 | ... | |
163 | ||
164 | For zspage chain size of 16, huge class watermark becomes 3840 bytes::: | |
165 | ||
166 | class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable | |
167 | ... | |
168 | 202 3264 0 0 0 0 0 4 0 | |
169 | 206 3328 0 0 0 0 0 13 0 | |
170 | 207 3344 0 0 0 0 0 9 0 | |
171 | 208 3360 0 0 0 0 0 14 0 | |
172 | 211 3408 0 0 0 0 0 5 0 | |
173 | 212 3424 0 0 0 0 0 16 0 | |
174 | 214 3456 0 0 0 0 0 11 0 | |
175 | 217 3504 0 0 0 0 0 6 0 | |
176 | 219 3536 0 0 0 0 0 13 0 | |
177 | 222 3584 0 0 0 0 0 7 0 | |
178 | 223 3600 0 0 0 0 0 15 0 | |
179 | 225 3632 0 0 0 0 0 8 0 | |
180 | 228 3680 0 0 0 0 0 9 0 | |
181 | 230 3712 0 0 0 0 0 10 0 | |
182 | 232 3744 0 0 0 0 0 11 0 | |
183 | 234 3776 0 0 0 0 0 12 0 | |
184 | 235 3792 0 0 0 0 0 13 0 | |
185 | 236 3808 0 0 0 0 0 14 0 | |
186 | 238 3840 0 0 0 0 0 15 0 | |
187 | 254 4096 0 0 0 0 0 1 0 | |
188 | ... | |
189 | ||
190 | Overall the combined zspage chain size effect on zsmalloc pool configuration::: | |
191 | ||
192 | pages per zspage number of size classes (clusters) huge size class watermark | |
193 | 4 69 3264 | |
194 | 5 86 3408 | |
195 | 6 93 3504 | |
196 | 7 112 3584 | |
197 | 8 123 3632 | |
198 | 9 140 3680 | |
199 | 10 143 3712 | |
200 | 11 159 3744 | |
201 | 12 164 3776 | |
202 | 13 180 3792 | |
203 | 14 183 3808 | |
204 | 15 188 3840 | |
205 | 16 191 3840 | |
206 | ||
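The watermark column above can be reproduced with a simplified model. It rests on three assumptions that the text does not state outright: class sizes run from 32 to 4096 bytes in 16-byte steps (inferred from the class tables), pages are 4 KiB, and each class picks the chain length that wastes the fewest bytes. A class counts as huge when its zspage holds a single object. The function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
MIN_SIZE = 32     # assumption inferred from the class tables
DELTA = 16        # assumption inferred from the class tables

def chain_size(class_size, max_chain):
    """Pick the chain length that wastes the fewest bytes (simplified model)."""
    best, min_waste = 1, None
    for pages in range(1, max_chain + 1):
        waste = (pages * PAGE_SIZE) % class_size
        if min_waste is None or waste < min_waste:  # ties keep the shorter chain
            best, min_waste = pages, waste
    return best

def huge_watermark(max_chain):
    """Largest class size whose zspage still holds more than one object."""
    watermark = 0
    for size in range(MIN_SIZE, PAGE_SIZE + 1, DELTA):
        pages = chain_size(size, max_chain)
        if pages * PAGE_SIZE // size > 1:
            watermark = size
    return watermark

for limit in (4, 8, 16):
    print(limit, huge_watermark(limit))
```

The model does not attempt to reproduce the number of size classes column, which depends on how classes are merged.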
207 | ||
208 | A synthetic test | |
209 | ---------------- | |
210 | ||
211 | zram used as build artifact storage (Linux kernel compilation). | |
212 | ||
213 | * `CONFIG_ZSMALLOC_CHAIN_SIZE=4` | |
214 | ||
215 | zsmalloc classes stats::: | |
216 | ||
217 | class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable | |
218 | ... | |
219 | Total 13 51 413836 412973 159955 3 | |
220 | ||
221 | zram mm_stat::: | |
222 | ||
223 | 1691783168 628083717 655175680 0 655175680 60 0 34048 34049 | |
224 | ||
225 | ||
226 | * `CONFIG_ZSMALLOC_CHAIN_SIZE=8` | |
227 | ||
228 | zsmalloc classes stats::: | |
229 | ||
230 | class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable | |
231 | ... | |
232 | Total 18 87 414852 412978 156666 0 | |
233 | ||
234 | zram mm_stat::: | |
235 | ||
236 | 1691803648 627793930 641703936 0 641703936 60 0 33591 33591 | |
237 | ||
238 | Using larger zspage chains may result in using fewer physical pages, as seen | |
239 | in the example where the number of physical pages used decreased from 159955 | |
240 | to 156666. At the same time, maximum zsmalloc pool memory usage went down from | |
241 | 655175680 to 641703936 bytes. | |
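The two sets of figures are consistent with each other, as a quick calculation shows. The variable names are illustrative and a 4 KiB page size is assumed.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

pages_chain4, pages_chain8 = 159955, 156666              # pages_used totals
max_used_chain4, max_used_chain8 = 655175680, 641703936  # bytes, from mm_stat

saved_pages = pages_chain4 - pages_chain8
saved_bytes = max_used_chain4 - max_used_chain8
saved_pct = 100 * saved_pages / pages_chain4

print(saved_pages, saved_bytes, round(saved_pct, 2))
```

In this test the saving is 3289 physical pages, roughly 2% of the pool, and the byte difference equals exactly that many 4 KiB pages.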
242 | ||
243 | However, this advantage may be offset by the potential for increased system | |
244 | memory pressure (as some zspages have larger chain sizes) in cases where there | |
245 | is heavy internal fragmentation and zspool compaction is unable to relocate | |
246 | objects and release zspages. In these cases, it is recommended to decrease | |
247 | the limit on the size of the zspage chains (as specified by the | |
248 | CONFIG_ZSMALLOC_CHAIN_SIZE option). |