.. _hmm:

=====================================
Heterogeneous Memory Management (HMM)
=====================================

Provide infrastructure and helpers to integrate non-conventional memory (device
memory like GPU on board memory) into the regular kernel path, with the
cornerstone of this being specialized struct page for such memory (see sections
5 to 7 of this document).

HMM also provides optional helpers for SVM (Shared Virtual Memory), i.e.,
allowing a device to transparently access program addresses coherently with
the CPU, meaning that any valid pointer on the CPU is also a valid pointer
for the device. This is becoming mandatory to simplify the use of advanced
heterogeneous computing where GPU, DSP, or FPGA are used to perform various
computations on behalf of a process.

This document is divided as follows: in the first section I expose the problems
related to using device specific memory allocators. In the second section, I
expose the hardware limitations that are inherent to many platforms. The third
section gives an overview of the HMM design. The fourth section explains how
CPU page-table mirroring works and the purpose of HMM in this context. The
fifth section deals with how device memory is represented inside the kernel.
Finally, the last section presents a new migration helper that allows
leveraging the device DMA engine.

.. contents:: :local:


Problems of using a device specific memory allocator
====================================================

Devices with a large amount of on board memory (several gigabytes) like GPUs
have historically managed their memory through dedicated driver specific APIs.
This creates a disconnect between memory allocated and managed by a device
driver and regular application memory (private anonymous, shared memory, or
regular file backed memory). From here on I will refer to this aspect as split
address space. I use shared address space to refer to the opposite situation:
i.e., one in which any application memory region can be used by a device
transparently.

Split address space happens because devices can only access memory allocated
through a device specific API. This implies that not all memory objects in a
program are equal from the device point of view, which complicates large
programs that rely on a wide set of libraries.

Concretely, this means that code that wants to leverage devices like GPUs needs
to copy objects between generically allocated memory (malloc, mmap private,
mmap shared) and memory allocated through the device driver API (this still
ends up with an mmap, but of the device file).

For flat data sets (array, grid, image, ...) this isn't too hard to achieve but
for complex data sets (list, tree, ...) it's hard to get right. Duplicating a
complex data set needs to re-map all the pointer relations between each of its
elements. This is error prone and programs get harder to debug because of the
duplicate data set and addresses.

Split address space also means that libraries cannot transparently use data
they are getting from the core program or another library and thus each library
might have to duplicate its input data set using the device specific memory
allocator. Large projects suffer from this and waste resources because of the
various memory copies.

Duplicating each library API to accept as input or output memory allocated by
each device specific allocator is not a viable option. It would lead to a
combinatorial explosion in the library entry points.

Finally, with the advance of high level language constructs (in C++ but in
other languages too) it is now possible for the compiler to leverage GPUs and
other devices without programmer knowledge. Some compiler identified patterns
are only achievable with a shared address space. It is also more reasonable to
use a shared address space for all other patterns.


I/O bus, device memory characteristics
======================================

I/O buses cripple shared address spaces due to a few limitations. Most I/O
buses only allow basic memory access from device to main memory; even cache
coherency is often optional. Access to device memory from a CPU is even more
limited. More often than not, it is not cache coherent.

If we only consider the PCIE bus, then a device can access main memory (often
through an IOMMU) and be cache coherent with the CPUs. However, it only allows
a limited set of atomic operations from the device on main memory. This is
worse in the other direction: the CPU can only access a limited range of the
device memory and cannot perform atomic operations on it. Thus device memory
cannot be considered the same as regular memory from the kernel point of view.

Another crippling factor is the limited bandwidth (~32GBytes/s with PCIE 4.0
and 16 lanes). This is 33 times less than the fastest GPU memory (1 TBytes/s).
The final limitation is latency. Access to main memory from the device has an
order of magnitude higher latency than when the device accesses its own memory.

Some platforms are developing new I/O buses or additions/modifications to PCIE
to address some of these limitations (OpenCAPI, CCIX). They mainly allow
two-way cache coherency between CPU and device and allow all atomic operations
the architecture supports. Sadly, not all platforms are following this trend
and some major architectures are left without hardware solutions to these
problems.

So for shared address space to make sense, not only must we allow devices to
access any memory but we must also permit any memory to be migrated to device
memory while the device is using it (blocking CPU access while it happens).


Shared address space and migration
==================================

HMM intends to provide two main features. The first one is to share the address
space by duplicating the CPU page table in the device page table so the same
address points to the same physical memory for any valid main memory address in
the process address space.

To achieve this, HMM offers a set of helpers to populate the device page table
while keeping track of CPU page table updates. Device page table updates are
not as easy as CPU page table updates. To update the device page table, you
must allocate a buffer (or use a pool of pre-allocated buffers) and write GPU
specific commands in it to perform the update (unmap, cache invalidations,
flush, ...). This cannot be done through common code for all devices. This is
why HMM provides helpers to factor out everything that can be, while leaving
the hardware specific details to the device driver.

The second mechanism HMM provides is a new kind of ZONE_DEVICE memory that
allows allocating a struct page for each page of device memory. Those pages
are special because the CPU cannot map them. However, they allow migrating
main memory to device memory using existing migration mechanisms; from the CPU
point of view everything looks like a page that is swapped out to disk. Using
a struct page gives the easiest and cleanest integration with existing mm
mechanisms. Here again, HMM only provides helpers, first to hotplug new
ZONE_DEVICE memory for the device memory and second to perform migration.
Policy decisions of what and when to migrate are left to the device driver.

Note that any CPU access to a device page triggers a page fault and a migration
back to main memory. For example, when a page backing a given CPU address A is
migrated from a main memory page to a device page, then any CPU access to
address A triggers a page fault and initiates a migration back to main memory.

With these two features, HMM not only allows a device to mirror process address
space and keep both CPU and device page tables synchronized, but also
leverages device memory by migrating the part of the data set that is actively
being used by the device.


Address space mirroring implementation and API
==============================================

Address space mirroring's main objective is to allow duplication of a range of
CPU page table into a device page table; HMM helps keep both synchronized. A
device driver that wants to mirror a process address space must start with the
registration of a mmu_interval_notifier::

 int mmu_interval_notifier_insert(struct mmu_interval_notifier *interval_sub,
                                  struct mm_struct *mm, unsigned long start,
                                  unsigned long length,
                                  const struct mmu_interval_notifier_ops *ops);

During the ops->invalidate() callback the device driver must perform the
update action to the range (mark range read only, or fully unmap, etc.). The
device must complete the update before the driver callback returns.
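
For illustration, a driver would typically embed the notifier in its own
per-mirror structure and register it together with its invalidate callback; a
sketch, where ``struct driver_data`` and ``driver_interval_ops`` are
hypothetical names::

 struct driver_data {
      struct mmu_interval_notifier notifier;
      /* device page table state, the driver->update lock, ... */
 };

 static const struct mmu_interval_notifier_ops driver_interval_ops = {
      .invalidate = driver_interval_invalidate,  /* sketched further below */
 };

 int driver_mirror_register(struct driver_data *drv, struct mm_struct *mm,
                            unsigned long start, unsigned long length)
 {
      return mmu_interval_notifier_insert(&drv->notifier, mm, start,
                                          length, &driver_interval_ops);
 }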

When the device driver wants to populate a range of virtual addresses, it can
use::

 int hmm_range_fault(struct hmm_range *range);

It will trigger a page fault on missing or read-only entries if write access is
requested (see below). Page faults use the generic mm page fault code path just
like a CPU page fault.

hmm_range_fault() copies CPU page table entries into its hmm_pfns array
argument. Each entry in that array corresponds to an address in the virtual
range. HMM provides a set of flags to help the driver identify special CPU
page table entries.

Locking within the ops->invalidate() callback is the most important aspect the
driver must respect in order to keep things properly synchronized. The usage
pattern is::

 int driver_populate_range(...)
 {
      struct hmm_range range;
      ...

      range.notifier = &interval_sub;
      range.start = ...;
      range.end = ...;
      range.hmm_pfns = ...;

      if (!mmget_not_zero(interval_sub.mm))
          return -EFAULT;

 again:
      range.notifier_seq = mmu_interval_read_begin(&interval_sub);
      mmap_read_lock(mm);
      ret = hmm_range_fault(&range);
      if (ret) {
          mmap_read_unlock(mm);
          if (ret == -EBUSY)
              goto again;
          return ret;
      }
      mmap_read_unlock(mm);

      take_lock(driver->update);
      if (mmu_interval_read_retry(&interval_sub, range.notifier_seq)) {
          release_lock(driver->update);
          goto again;
      }

      /* Use the hmm_pfns array content to update the device page table,
       * under the update lock. */

      release_lock(driver->update);
      return 0;
 }
216 | ||
76ea470c | 217 | The driver->update lock is the same lock that the driver takes inside its |
a22dd506 JG |
218 | invalidate() callback. That lock must be held before calling |
219 | mmu_interval_read_retry() to avoid any race with a concurrent CPU page table | |
220 | update. | |
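
The invalidate() callback itself is driver specific. A minimal sketch that
pairs with the pattern above (illustrative only; it reuses the
take_lock()/release_lock() pseudo-helpers and the hypothetical ``struct
driver_data`` from earlier)::

 static bool driver_interval_invalidate(struct mmu_interval_notifier *interval_sub,
                                        const struct mmu_notifier_range *range,
                                        unsigned long cur_seq)
 {
      struct driver_data *drv =
           container_of(interval_sub, struct driver_data, notifier);

      /* Only a non-blockable context may refuse; blockable ones must proceed. */
      if (!mmu_notifier_range_blockable(range))
           return false;

      take_lock(driver->update);
      /* Publish the new sequence so mmu_interval_read_retry() forces a redo. */
      mmu_interval_set_seq(interval_sub, cur_seq);
      /* Unmap or write-protect drv's device mappings covering
       * range->start..range->end before returning. */
      release_lock(driver->update);
      return true;
 }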

Leverage default_flags and pfn_flags_mask
=========================================

The hmm_range struct has two fields, default_flags and pfn_flags_mask, that
specify fault or snapshot policy for the whole range instead of having to set
them for each entry in the hmm_pfns array.
228 | ||
2733ea14 JG |
229 | For instance if the device driver wants pages for a range with at least read |
230 | permission, it sets:: | |
023a019a | 231 | |
2733ea14 | 232 | range->default_flags = HMM_PFN_REQ_FAULT; |
023a019a JG |
233 | range->pfn_flags_mask = 0; |
234 | ||
2076e5c0 | 235 | and calls hmm_range_fault() as described above. This will fill fault all pages |
023a019a JG |
236 | in the range with at least read permission. |

Now let's say the driver wants to do the same except for one page in the range
for which it wants write permission. The driver sets::

 range->default_flags = HMM_PFN_REQ_FAULT;
 range->pfn_flags_mask = HMM_PFN_REQ_WRITE;
 range->hmm_pfns[index_of_write] = HMM_PFN_REQ_WRITE;

With this, HMM will fault in all pages with at least read permission (i.e.,
valid), and for the address == range->start + (index_of_write << PAGE_SHIFT)
it will fault with write permission, i.e., if the CPU pte does not have write
permission set then HMM will call handle_mm_fault().

After hmm_range_fault completes, the flag bits are set to the current state of
the page tables, i.e., HMM_PFN_VALID | HMM_PFN_WRITE will be set if the page
is writable.
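
As a sketch of how a driver might then consume the result (illustrative only;
hmm_pfn_to_page() is a real HMM helper, while driver_map_page() stands in for
the device specific page table update)::

 unsigned long i, npages = (range->end - range->start) >> PAGE_SHIFT;

 for (i = 0; i < npages; i++) {
      unsigned long entry = range->hmm_pfns[i];

      if (!(entry & HMM_PFN_VALID))
           continue;  /* not present in the CPU page table */
      /* Map the page into the device, writable only if the CPU pte is. */
      driver_map_page(hmm_pfn_to_page(entry), entry & HMM_PFN_WRITE);
 }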


Represent and manage device memory from core kernel point of view
==================================================================

Several different designs were tried to support device memory. The first one
used a device specific data structure to keep information about migrated memory
and HMM hooked itself in various places of mm code to handle any access to
addresses that were backed by device memory. It turns out that this ended up
replicating most of the fields of struct page and also needed many kernel code
paths to be updated to understand this new kind of memory.

Most kernel code paths never try to access the memory behind a page
but only care about struct page contents. Because of this, HMM switched to
directly using struct page for device memory which left most kernel code paths
unaware of the difference. We only need to make sure that no one ever tries to
map those pages from the CPU side.

Migration to and from device memory
===================================

Because the CPU cannot access device memory directly, the device driver must
use hardware DMA or device specific load/store instructions to migrate data.
The migrate_vma_setup(), migrate_vma_pages(), and migrate_vma_finalize()
functions are designed to make drivers easier to write and to centralize common
code across drivers.

Before migrating pages to device private memory, special device private
``struct page`` entries need to be created. These will be used as special
"swap" page table entries so that a CPU process will fault if it tries to
access a page that has been migrated to device private memory.

These can be allocated and freed with::

 struct resource *res;
 struct dev_pagemap pagemap;

 res = request_free_mem_region(&iomem_resource, /* number of bytes */,
                               "name of driver resource");
 pagemap.type = MEMORY_DEVICE_PRIVATE;
 pagemap.range.start = res->start;
 pagemap.range.end = res->end;
 pagemap.nr_range = 1;
 pagemap.ops = &device_devmem_ops;
 memremap_pages(&pagemap, numa_node_id());

 memunmap_pages(&pagemap);
 release_mem_region(pagemap.range.start, range_len(&pagemap.range));

There are also devm_request_free_mem_region(), devm_memremap_pages(),
devm_memunmap_pages(), and devm_release_mem_region() when the resources can
be tied to a ``struct device``.
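
For instance, using the managed variants (a sketch; ``dev`` is assumed to be
the driver's ``struct device`` and ``size`` the number of bytes to hotplug)::

 void *addr;

 res = devm_request_free_mem_region(dev, &iomem_resource, size);
 if (IS_ERR(res))
      return PTR_ERR(res);

 pagemap.type = MEMORY_DEVICE_PRIVATE;
 pagemap.range.start = res->start;
 pagemap.range.end = res->end;
 pagemap.nr_range = 1;
 pagemap.ops = &device_devmem_ops;

 addr = devm_memremap_pages(dev, &pagemap);
 if (IS_ERR(addr))
      return PTR_ERR(addr);
 /* No explicit teardown needed: devm releases everything on driver detach. */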

The overall migration steps are similar to migrating NUMA pages within system
memory (see :ref:`Page migration <page_migration>`) but the steps are split
between device driver specific code and shared common code (a consolidated
sketch follows the list):

1. ``mmap_read_lock()``

   The device driver has to pass a ``struct vm_area_struct`` to
   migrate_vma_setup() so the mmap_read_lock() or mmap_write_lock() needs to
   be held for the duration of the migration.

2. ``migrate_vma_setup(struct migrate_vma *args)``

   The device driver initializes the ``struct migrate_vma`` fields and passes
   the pointer to migrate_vma_setup(). The ``args->flags`` field is used to
   filter which source pages should be migrated. For example, setting
   ``MIGRATE_VMA_SELECT_SYSTEM`` will only migrate system memory and
   ``MIGRATE_VMA_SELECT_DEVICE_PRIVATE`` will only migrate pages residing in
   device private memory. If the latter flag is set, the ``args->pgmap_owner``
   field is used to identify device private pages owned by the driver. This
   avoids trying to migrate device private pages residing in other devices.
   Currently only anonymous private VMA ranges can be migrated to or from
   system memory and device private memory.

   One of the first steps migrate_vma_setup() does is to invalidate other
   devices' MMUs with the ``mmu_notifier_invalidate_range_start()`` and
   ``mmu_notifier_invalidate_range_end()`` calls around the page table
   walks to fill in the ``args->src`` array with PFNs to be migrated.
   The ``invalidate_range_start()`` callback is passed a
   ``struct mmu_notifier_range`` with the ``event`` field set to
   ``MMU_NOTIFY_MIGRATE`` and the ``migrate_pgmap_owner`` field set to
   the ``args->pgmap_owner`` field passed to migrate_vma_setup(). This
   allows the device driver to skip the invalidation callback and only
   invalidate device private MMU mappings that are actually migrating.
   This is explained more in the next section.

   While walking the page tables, a ``pte_none()`` or ``is_zero_pfn()``
   entry results in a valid "zero" PFN stored in the ``args->src`` array.
   This lets the driver allocate device private memory and clear it instead
   of copying a page of zeros. Valid PTE entries to system memory or
   device private struct pages will be locked with ``lock_page()``, isolated
   from the LRU (if system memory since device private pages are not on
   the LRU), unmapped from the process, and a special migration PTE is
   inserted in place of the original PTE.
   migrate_vma_setup() also clears the ``args->dst`` array.

3. The device driver allocates destination pages and copies source pages to
   destination pages.

   The driver checks each ``src`` entry to see if the ``MIGRATE_PFN_MIGRATE``
   bit is set and skips entries that are not migrating. The device driver
   can also choose to skip migrating a page by not filling in the ``dst``
   array for that page.

   The driver then allocates either a device private struct page or a
   system memory page, locks the page with ``lock_page()``, and fills in the
   ``dst`` array entry with::

     dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;

   Now that the driver knows that this page is being migrated, it can
   invalidate device private MMU mappings and copy device private memory
   to system memory or another device private page. The core Linux kernel
   handles CPU page table invalidations so the device driver only has to
   invalidate its own MMU mappings.

   The driver can use ``migrate_pfn_to_page(src[i])`` to get the
   ``struct page`` of the source and either copy the source page to the
   destination or clear the destination device private memory if the pointer
   is ``NULL`` meaning the source page was not populated in system memory.

4. ``migrate_vma_pages()``

   This step is where the migration is actually "committed".

   If the source page was a ``pte_none()`` or ``is_zero_pfn()`` page, this
   is where the newly allocated page is inserted into the CPU's page table.
   This can fail if a CPU thread faults on the same page. However, the page
   table is locked and only one of the new pages will be inserted.
   The device driver will see that the ``MIGRATE_PFN_MIGRATE`` bit is cleared
   if it loses the race.

   If the source page was locked, isolated, etc. the source ``struct page``
   information is now copied to the destination ``struct page``, finalizing
   the migration on the CPU side.

5. Device driver updates device MMU page tables for pages still migrating,
   rolling back pages not migrating.

   If the ``src`` entry still has the ``MIGRATE_PFN_MIGRATE`` bit set, the
   device driver can update the device MMU and set the write enable bit if
   the ``MIGRATE_PFN_WRITE`` bit is set.

6. ``migrate_vma_finalize()``

   This step replaces the special migration page table entry with the new
   page's page table entry and releases the reference to the source and
   destination ``struct page``.

7. ``mmap_read_unlock()``

   The lock can now be released.
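
Putting these steps together, a condensed sketch of migrating a small range of
anonymous memory to device private memory; the driver_alloc_device_page(),
driver_dma_copy(), and driver_clear_device_page() helpers plus the fixed-size
PFN arrays are hypothetical::

 #define NPAGES 64

 int driver_migrate_to_device(struct vm_area_struct *vma, unsigned long start,
                              unsigned long end)
 {
      unsigned long src_pfns[NPAGES] = {};
      unsigned long dst_pfns[NPAGES] = {};
      struct migrate_vma args = {
           .vma   = vma,
           .start = start,
           .end   = end,
           .src   = src_pfns,
           .dst   = dst_pfns,
           .flags = MIGRATE_VMA_SELECT_SYSTEM,
      };
      unsigned long i, npages = (end - start) >> PAGE_SHIFT;
      int ret;

      if (npages > NPAGES)
           return -EINVAL;

      /* Step 1 is done by the caller: mmap_read_lock() is held. */
      ret = migrate_vma_setup(&args);                   /* step 2 */
      if (ret)
           return ret;

      for (i = 0; i < npages; i++) {                    /* step 3 */
           struct page *dpage, *spage;

           if (!(args.src[i] & MIGRATE_PFN_MIGRATE))
                continue;  /* this page is not migrating */

           dpage = driver_alloc_device_page();
           lock_page(dpage);
           spage = migrate_pfn_to_page(args.src[i]);
           if (spage)
                driver_dma_copy(dpage, spage);          /* device DMA copy */
           else
                driver_clear_device_page(dpage);        /* zero PFN case */
           args.dst[i] = migrate_pfn(page_to_pfn(dpage)) |
                         MIGRATE_PFN_LOCKED;
      }

      migrate_vma_pages(&args);                         /* step 4 */
      /* Step 5: update device MMU entries still marked MIGRATE_PFN_MIGRATE. */
      migrate_vma_finalize(&args);                      /* step 6 */
      return 0;
 }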

Memory cgroup (memcg) and rss accounting
========================================

For now, device memory is accounted as any regular page in rss counters (either
anonymous if device page is used for anonymous, file if device page is used for
file backed page, or shmem if device page is used for shared memory). This is a
deliberate choice to keep existing applications, that might start using device
memory without knowing about it, running unimpacted.

A drawback is that the OOM killer might kill an application using a lot of
device memory and not a lot of regular system memory and thus not freeing much
system memory. We want to gather more real world experience on how applications
and systems react under memory pressure in the presence of device memory before
deciding to account device memory differently.


The same decision was made for memory cgroups. Device memory pages are
accounted against the same memory cgroup that a regular page would be
accounted to. This does simplify migration to and from device memory. This
also means that migration back from device memory to regular memory cannot
fail because it would go above the memory cgroup limit. We might revisit this
choice later on once we get more experience in how device memory is used and
its impact on memory resource control.


Note that device memory can never be pinned by a device driver nor through GUP
and thus such memory is always freed upon process exit. Or when the last
reference is dropped in case of shared memory or file backed memory.