============================
Subsystem Trace Points: kmem
============================

The kmem tracing system captures events related to object and page allocation
within the kernel. Broadly speaking there are five major subheadings.

- Slab allocation of small objects of unknown type (kmalloc)
- Slab allocation of small objects of known type
- Page allocation
- Per-CPU Allocator Activity
- External Fragmentation

This document describes what each of the tracepoints is and why they
might be useful.

1. Slab allocation of small objects of unknown type
===================================================
::

  kmalloc       call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmalloc_node  call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kfree         call_site=%lx ptr=%p

Heavy activity for these events may indicate that a specific cache is
justified, particularly if kmalloc slab pages are becoming significantly
internally fragmented as a result of the allocation pattern. By correlating
kmalloc with kfree, it may be possible to identify memory leaks and the
call sites that allocated the leaked memory.

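The kmalloc/kfree correlation described above can be sketched in a few lines of Python. This is only an illustration: it assumes each trace line contains the event name (optionally followed by a colon) and then the ``key=value`` fields listed above, and the helper names ``unfreed_allocations`` and ``bytes_by_call_site`` are hypothetical, not part of any kernel tool. The sample addresses are made up.

```python
import re
from collections import Counter

# Assumed line shape: "<optional prefix> kmalloc: call_site=... ptr=... ..."
EVENT_RE = re.compile(r"\b(kmalloc_node|kmalloc|kfree)\b:?\s")
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def unfreed_allocations(lines):
    """Return {ptr: (call_site, bytes_alloc)} for allocations never freed in the trace."""
    live = {}
    for line in lines:
        event = EVENT_RE.search(line)
        if not event:
            continue
        fields = dict(FIELD_RE.findall(line[event.end():]))
        ptr = fields.get("ptr")
        if ptr is None:
            continue
        if event.group(1) == "kfree":
            # Also ignores frees of objects allocated before tracing started.
            live.pop(ptr, None)
        else:
            live[ptr] = (fields.get("call_site"), int(fields.get("bytes_alloc", 0)))
    return live


def bytes_by_call_site(live):
    """Sum the still-allocated bytes per allocating call site."""
    totals = Counter()
    for call_site, nbytes in live.values():
        totals[call_site] += nbytes
    return totals


# Made-up sample data, for illustration only.
sample = [
    "kmalloc: call_site=ffffffff8110a0b0 ptr=ffff880012340000 bytes_req=100 bytes_alloc=128 gfp_flags=GFP_KERNEL",
    "kmalloc: call_site=ffffffff8110a0b0 ptr=ffff880012340080 bytes_req=100 bytes_alloc=128 gfp_flags=GFP_KERNEL",
    "kfree: call_site=ffffffff8110b000 ptr=ffff880012340000",
]
live = unfreed_allocations(sample)
totals = bytes_by_call_site(live)
print(totals.most_common())
```

An object that is still live when the trace ends is not necessarily leaked; call sites whose outstanding total keeps growing over a long trace are simply the first candidates to inspect.
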

2. Slab allocation of small objects of known type
=================================================
::

  kmem_cache_alloc       call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmem_cache_alloc_node  call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kmem_cache_free        call_site=%lx ptr=%p

These events are similar in usage to the kmalloc-related events except that
it is likely easier to pin the event down to a specific cache. At the time
of writing, no information is available on what slab is being allocated from,
but the call_site can usually be used to extrapolate that information.

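One way to use call_site for this is to group the allocation events by call site and look at the object sizes each site reports. The sketch below assumes the same ``event: key=value`` line shape as the format strings above; ``summarize_call_sites`` is a hypothetical helper and the sample values are invented. A call site that always reports a single bytes_alloc value is plausibly served by one cache, although that is an inference rather than something the events state.

```python
import re
from collections import defaultdict

# Only the allocation events are matched; kmem_cache_free is deliberately skipped.
EVENT_RE = re.compile(r"\b(kmem_cache_alloc_node|kmem_cache_alloc)\b:?\s")
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def summarize_call_sites(lines):
    """Map call_site -> {"count": events seen, "sizes": set of bytes_alloc values}."""
    summary = defaultdict(lambda: {"count": 0, "sizes": set()})
    for line in lines:
        event = EVENT_RE.search(line)
        if not event:
            continue
        fields = dict(FIELD_RE.findall(line[event.end():]))
        if "call_site" not in fields or "bytes_alloc" not in fields:
            continue
        entry = summary[fields["call_site"]]
        entry["count"] += 1
        entry["sizes"].add(int(fields["bytes_alloc"]))
    return dict(summary)


# Made-up sample data, for illustration only.
sample = [
    "kmem_cache_alloc: call_site=ffffffff81200010 ptr=ffff880000001000 bytes_req=184 bytes_alloc=192 gfp_flags=GFP_KERNEL",
    "kmem_cache_alloc: call_site=ffffffff81200010 ptr=ffff8800000010c0 bytes_req=184 bytes_alloc=192 gfp_flags=GFP_KERNEL",
    "kmem_cache_alloc_node: call_site=ffffffff81300020 ptr=ffff880000002000 bytes_req=56 bytes_alloc=64 gfp_flags=GFP_KERNEL node=0",
    "kmem_cache_free: call_site=ffffffff81200099 ptr=ffff880000001000",
]
summary = summarize_call_sites(sample)
for site, entry in sorted(summary.items()):
    print(site, entry["count"], sorted(entry["sizes"]))
```
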

3. Page allocation
==================
::

  mm_page_alloc              page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s
  mm_page_alloc_zone_locked  page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_free               page=%p pfn=%lu order=%d
  mm_page_free_batched       page=%p pfn=%lu order=%d cold=%d

These four events deal with page allocation and freeing. mm_page_alloc is
a simple indicator of page allocator activity. Pages may be allocated from
the per-CPU allocator (high performance) or the buddy allocator.

If pages are allocated directly from the buddy allocator, the
mm_page_alloc_zone_locked event is triggered. This event is important as high
amounts of activity imply high activity on the zone->lock. Taking this lock
impairs performance by disabling interrupts, dirtying cache lines between
CPUs and serialising many CPUs.

When a page is freed directly by the caller, only the mm_page_free event
is triggered. Significant amounts of activity here could indicate that the
callers should be batching their activities.

When pages are freed in batch, the mm_page_free_batched event is also
triggered. Broadly speaking, pages are taken off the LRU in bulk under the
LRU lock and freed in batch with a page list. Significant amounts of activity
here could indicate that the system is under memory pressure and can also
indicate contention on the lruvec->lru_lock.

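A quick way to get a feel for these four events is to tally them over a captured trace. The following sketch assumes each line carries the event name followed by its fields; ``count_page_events`` and ``zone_locked_share`` are hypothetical helpers and the sample lines are invented. The ratio is only a rough indicator: per-CPU refills also raise mm_page_alloc_zone_locked, so the value can exceed 1.

```python
import re
from collections import Counter

PAGE_EVENTS = (
    "mm_page_alloc_zone_locked",
    "mm_page_alloc",
    "mm_page_free_batched",
    "mm_page_free",
)
# \b keeps "mm_page_alloc" from matching inside longer event names.
EVENT_RE = re.compile(r"\b(" + "|".join(PAGE_EVENTS) + r")\b:?\s")


def count_page_events(lines):
    """Count how often each of the four page events appears."""
    counts = Counter()
    for line in lines:
        event = EVENT_RE.search(line)
        if event:
            counts[event.group(1)] += 1
    return counts


def zone_locked_share(counts):
    """zone_locked events per mm_page_alloc event (0.0 if no allocations seen)."""
    allocs = counts["mm_page_alloc"]
    return counts["mm_page_alloc_zone_locked"] / allocs if allocs else 0.0


# Made-up sample data, for illustration only.
sample = [
    "mm_page_alloc: page=ffffea0000001900 pfn=100 order=0 migratetype=0 gfp_flags=GFP_KERNEL",
    "mm_page_alloc: page=ffffea0000001940 pfn=101 order=0 migratetype=0 gfp_flags=GFP_KERNEL",
    "mm_page_alloc_zone_locked: page=ffffea0000001940 pfn=101 order=0 migratetype=0 cpu=1 percpu_refill=0",
    "mm_page_free: page=ffffea0000001900 pfn=100 order=0",
    "mm_page_free_batched: page=ffffea0000001940 pfn=101 order=0 cold=0",
]
counts = count_page_events(sample)
print(dict(counts), zone_locked_share(counts))
```
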

4. Per-CPU Allocator Activity
=============================
::

  mm_page_alloc_zone_locked  page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_pcpu_drain         page=%p pfn=%lu order=%d cpu=%d migratetype=%d

In front of the page allocator is a per-CPU page allocator. It exists only
for order-0 pages, reduces contention on the zone->lock and reduces the
amount of writing on struct page.

When a per-CPU list is empty or pages of the wrong type are allocated,
the zone->lock will be taken once and the per-CPU list refilled. The event
triggered is mm_page_alloc_zone_locked for each page allocated, with the
event indicating whether it is for a percpu_refill or not.

When the per-CPU list is too full, a number of pages are freed, each of
which triggers a mm_page_pcpu_drain event.

The events are emitted individually so that pages can be tracked
between allocation and freeing. A number of drain or refill events that occur
consecutively imply the zone->lock being taken once. Large amounts of per-CPU
refills and drains could imply an imbalance between CPUs where too much work
is being concentrated in one place. It could also indicate that the per-CPU
lists should be a larger size. Finally, large amounts of refills on one CPU
and drains on another could be a factor in causing large amounts of cache
line bounces due to writes between CPUs, and it is worth investigating whether
pages can be allocated and freed on the same CPU through some algorithm change.

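To spot the refill/drain imbalance described above, the two events can be counted per CPU. This is a minimal sketch under the assumption that each line contains the event name followed by the ``cpu=`` and ``percpu_refill=`` fields shown in the format strings; ``per_cpu_activity`` is a hypothetical helper and the sample lines are invented.

```python
import re
from collections import defaultdict

EVENT_RE = re.compile(r"\b(mm_page_alloc_zone_locked|mm_page_pcpu_drain)\b:?\s")
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def per_cpu_activity(lines):
    """Map cpu -> {"refill": n, "drain": n} from refill and drain events."""
    activity = defaultdict(lambda: {"refill": 0, "drain": 0})
    for line in lines:
        event = EVENT_RE.search(line)
        if not event:
            continue
        fields = dict(FIELD_RE.findall(line[event.end():]))
        if "cpu" not in fields:
            continue
        cpu = int(fields["cpu"])
        if event.group(1) == "mm_page_pcpu_drain":
            activity[cpu]["drain"] += 1
        elif fields.get("percpu_refill") == "1":
            # Only zone-locked allocations that refill a per-CPU list count here.
            activity[cpu]["refill"] += 1
    return dict(activity)


# Made-up sample data, for illustration only.
sample = [
    "mm_page_alloc_zone_locked: page=ffffea0000000040 pfn=1 order=0 migratetype=0 cpu=0 percpu_refill=1",
    "mm_page_alloc_zone_locked: page=ffffea0000000080 pfn=2 order=0 migratetype=0 cpu=0 percpu_refill=1",
    "mm_page_alloc_zone_locked: page=ffffea00000000c0 pfn=3 order=0 migratetype=0 cpu=0 percpu_refill=0",
    "mm_page_pcpu_drain: page=ffffea0000000040 pfn=1 order=0 cpu=1 migratetype=0",
]
activity = per_cpu_activity(sample)
for cpu in sorted(activity):
    print(cpu, activity[cpu])
```

A CPU that shows mostly refills while another shows mostly drains matches the cross-CPU pattern the text flags as a possible source of cache line bouncing.
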

5. External Fragmentation
=========================
::

  mm_page_alloc_extfrag  page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d

External fragmentation affects whether a high-order allocation will be
successful or not. For some types of hardware, this is important although
it is avoided where possible. If the system is using huge pages and needs
to be able to resize the pool over the lifetime of the system, this value
is important.

Large numbers of this event imply that memory is fragmenting and
high-order allocations will start failing at some time in the future. One
means of reducing the occurrence of this event is to increase the size of
min_free_kbytes in increments of 3*pageblock_size*nr_online_nodes where
pageblock_size is usually the size of the default hugepage.

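As a worked example of the increment formula, the sketch below computes one step. It assumes that pageblock_size is expressed in kilobytes so that the result can be added to min_free_kbytes directly, and it uses a 2 MiB default hugepage on a two-node machine purely as example values. ``min_free_kbytes_step`` is a hypothetical helper name.

```python
def min_free_kbytes_step(pageblock_kbytes, nr_online_nodes):
    """One increment of min_free_kbytes: 3 * pageblock_size * nr_online_nodes."""
    return 3 * pageblock_kbytes * nr_online_nodes


# Assumed example values: 2 MiB pageblocks (2048 kB) and two online NUMA nodes.
step = min_free_kbytes_step(2048, 2)
print(step)  # 3 * 2048 * 2 = 12288 kB
```

With these assumed values each step is 12288 kB, so raising min_free_kbytes by two increments would add 24576 kB to its current value.
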