==================================================
page owner: Tracking about who allocated each page
==================================================

Introduction
============

page owner tracks who allocated each page.
It can be used to debug memory leaks or to find a memory hog.
When an allocation happens, information about it, such as the call stack
and the order of the pages, is stored in dedicated storage for each page.
When we need to know the status of all pages, we can retrieve and analyze
this information.

Although tracepoints for tracing page allocation/free already exist,
using them to analyze who allocated each page is rather complex. The trace
buffer must be enlarged so that it is not overwritten before the userspace
program is launched. Once launched, the program has to dump the trace
buffer continually for later analysis, which is more likely to change
system behaviour than just keeping the information in memory, and that is
bad for debugging.

page owner can also be used for other purposes. For example, accurate
fragmentation statistics can be obtained from the gfp flag information of
each page. This is already implemented and is activated when page owner is
enabled. Other uses are more than welcome.

page owner is disabled by default. To use it, add "page_owner=on" to your
boot cmdline. If the kernel is built with page owner but page owner is
disabled at runtime because the boot option is not set, the runtime
overhead is marginal. When disabled at runtime, it requires no memory to
store owner information, so there is no runtime memory overhead. In
addition, page owner inserts just two unlikely branches into the page
allocator hotpath; if it is not enabled, allocation proceeds as in a kernel
built without page owner. These two unlikely branches should not affect
allocation performance, especially if the static keys jump label patching
functionality is available. The effect on the kernel's code size is as
follows.

Although enabling page owner increases kernel size by several kilobytes,
most of this code is outside the page allocator and its hot path. Building
the kernel with page owner and turning it on when needed is a good
option for debugging kernel memory problems.

There is one caveat caused by an implementation detail. page owner
stores its information in memory from the struct page extension. On sparse
memory systems this memory is initialized some time after the page allocator
starts, so until initialization many pages can be allocated and
they have no owner information. To fix this up, these early allocated
pages are examined and marked as allocated in the initialization phase.
Although this does not give them the right owner information,
at least we can tell more accurately whether a page is allocated or not.
On a 2GB memory x86-64 VM box, 13343 early allocated pages
are caught and marked, although they are mostly allocated from the struct
page extension feature. After that, no page is left in an
untracked state.

Usage
=====

1) Build user-space helper::

        cd tools/mm
        make page_owner_sort

2) Enable page owner: add "page_owner=on" to boot cmdline.

3) Do the job that you want to debug.

4) Analyze information from page owner::

        cat /sys/kernel/debug/page_owner > page_owner_full.txt
        ./page_owner_sort page_owner_full.txt sorted_page_owner.txt

   The general output of ``page_owner_full.txt`` is as follows::

        Page allocated via order XXX, ...
        PFN XXX ...
        // Detailed stack

        Page allocated via order XXX, ...
        PFN XXX ...
        // Detailed stack


   By default, a full pfn dump is produced. To start from a given pfn,
   page_owner supports fseek::

        FILE *fp = fopen("/sys/kernel/debug/page_owner", "r");
        fseek(fp, pfn_start, SEEK_SET);

   The ``page_owner_sort`` tool ignores ``PFN`` rows, puts the remaining rows
   in buf, uses a regexp to extract the page order value, counts the times
   and pages of each buf, and finally sorts them according to the parameter(s).

   See the result about who allocated each page
   in ``sorted_page_owner.txt``. General output::

        XXX times, XXX pages:
        Page allocated via order XXX, ...
        // Detailed stack

   By default, ``page_owner_sort`` sorts by the times of buf.
   To sort by the number of pages of buf, use the ``-m`` parameter.
   The detailed parameters are:

   fundamental function::

        Sort:
                -a              Sort by memory allocation time.
                -m              Sort by total memory.
                -p              Sort by pid.
                -P              Sort by tgid.
                -n              Sort by task command name.
                -r              Sort by memory release time.
                -s              Sort by stack trace.
                -t              Sort by times (default).
                --sort <order>  Specify sorting order. Sorting syntax is [+|-]key[,[+|-]key[,...]].
                                Choose a key from the **STANDARD FORMAT SPECIFIERS** section. The "+" is
                                optional since the default direction is increasing numerical or
                                lexicographic order. Mixed use of abbreviated and complete forms of
                                keys is allowed.

                Examples:
                                ./page_owner_sort <input> <output> --sort=n,+pid,-tgid
                                ./page_owner_sort <input> <output> --sort=at

   additional function::

        Cull:
                --cull <rules>
                                Specify culling rules. Culling syntax is key[,key[,...]]. Choose a
                                multi-letter key from the **STANDARD FORMAT SPECIFIERS** section.

                <rules> is a single argument in the form of a comma-separated list of
                keys k1,k2, ..., each of which specifies an individual culling rule. The
                recognized keys are described in the **STANDARD FORMAT SPECIFIERS** section
                below. Mixed use of abbreviated and complete forms of keys is allowed.

                Examples:
                                ./page_owner_sort <input> <output> --cull=stacktrace
                                ./page_owner_sort <input> <output> --cull=st,pid,name
                                ./page_owner_sort <input> <output> --cull=n,f

        Filter:
                -f              Filter out the information of blocks whose memory has been released.

        Select:
                --pid <pidlist>         Select by pid. This selects the blocks whose process ID
                                        numbers appear in <pidlist>.
                --tgid <tgidlist>       Select by tgid. This selects the blocks whose thread
                                        group ID numbers appear in <tgidlist>.
                --name <cmdlist>        Select by task command name. This selects the blocks whose
                                        task command names appear in <cmdlist>.

                <pidlist>, <tgidlist>, <cmdlist> are single arguments in the form of a comma-separated list,
                which offers a way to specify individual selecting rules.

                Examples:
                                ./page_owner_sort <input> <output> --pid=1
                                ./page_owner_sort <input> <output> --tgid=1,2,3
                                ./page_owner_sort <input> <output> --name name1,name2

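
The grouping that ``page_owner_sort`` is described as doing above (drop
``PFN`` rows, treat the remaining rows as one buf, extract the page order with
a regexp, count times and pages, then sort) can be sketched as follows. This
is only an illustration in Python, not the tool's actual code: the function
name ``summarize``, the assumption that records are separated by blank lines,
the assumption that one record of order N covers 2**N pages, and the
descending direction of the default "times" ordering are all assumptions made
for the example.

```python
import re
from collections import defaultdict

# Matches the "order XXX" field of a "Page allocated via order XXX, ..." row.
ORDER_RE = re.compile(r"order\s+(\d+)")


def summarize(dump_text):
    """Group records of a page_owner style dump and count times and pages.

    Returns a list of (times, pages, block_text) tuples sorted by times,
    largest first (assumed direction for this sketch).
    """
    totals = defaultdict(lambda: [0, 0])  # block text -> [times, pages]

    # Assumption: records are separated by blank lines.
    for record in re.split(r"\n\s*\n", dump_text.strip()):
        # Ignore PFN rows so that identical allocations compare equal.
        rows = [row for row in record.splitlines()
                if not row.lstrip().startswith("PFN")]
        if not rows:
            continue
        match = ORDER_RE.search(rows[0])
        if match is None:
            continue  # not a "Page allocated via order ..." record
        block = "\n".join(rows)
        totals[block][0] += 1                          # times
        totals[block][1] += 1 << int(match.group(1))   # pages, assumed 2**order

    return sorted(((times, pages, block)
                   for block, (times, pages) in totals.items()),
                  key=lambda entry: entry[0], reverse=True)
```

Sorting on ``entry[1]`` instead of ``entry[0]`` would correspond to the
idea behind the ``-m`` parameter (sort by total memory).
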
STANDARD FORMAT SPECIFIERS
==========================
::

  For --sort option:

        KEY             LONG            DESCRIPTION
        p               pid             process ID
        tg              tgid            thread group ID
        n               name            task command name
        st              stacktrace      stack trace of the page allocation
        T               txt             full text of block
        ft              free_ts         timestamp of the page when it was released
        at              alloc_ts        timestamp of the page when it was allocated
        ator            allocator       memory allocator for pages

  For --cull option:

        KEY             LONG            DESCRIPTION
        p               pid             process ID
        tg              tgid            thread group ID
        n               name            task command name
        f               free            whether the page has been released or not
        st              stacktrace      stack trace of the page allocation
        ator            allocator       memory allocator for pages

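
To make the ``--sort`` syntax ``[+|-]key[,[+|-]key[,...]]`` concrete, the
sketch below parses such a string into (long key name, descending) pairs using
the ``--sort`` key table above. It is an illustration only: the names
``SORT_KEYS`` and ``parse_sort_spec`` are hypothetical, and the way the real
tool reports unknown keys is not modelled here.

```python
# Abbreviated -> long key names, copied from the "--sort" table above.
SORT_KEYS = {
    "p": "pid",
    "tg": "tgid",
    "n": "name",
    "st": "stacktrace",
    "T": "txt",
    "ft": "free_ts",
    "at": "alloc_ts",
    "ator": "allocator",
}


def parse_sort_spec(spec):
    """Parse '[+|-]key[,[+|-]key[,...]]' into [(long_key, descending), ...]."""
    long_names = set(SORT_KEYS.values())
    parsed = []
    for item in spec.split(","):
        # "-" requests decreasing order; "+" (or no prefix) is increasing.
        descending = item.startswith("-")
        key = item[1:] if item and item[0] in "+-" else item
        # Abbreviated and complete forms may be mixed freely.
        name = SORT_KEYS.get(key, key)
        if name not in long_names:
            raise ValueError(f"unknown sort key: {key!r}")
        parsed.append((name, descending))
    return parsed
```

For instance, the string from the first ``--sort`` example,
``n,+pid,-tgid``, parses to increasing ``name``, increasing ``pid`` and
decreasing ``tgid``.
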