Commit | Line | Data |
---|---|---|
f227e04e MR |
1 | .. _page_owner: |
2 | ||
3 | ================================================== | |
16a7ade8 | 4 | page owner: Tracking about who allocated each page |
f227e04e | 5 | ================================================== |
16a7ade8 | 6 | |
f227e04e MR |
7 | Introduction |
8 | ============ | |
16a7ade8 JK |
9 | |
10 | page owner tracks who allocated each page. | |
11 | It can be used to debug memory leaks or to find a memory hog. | |
12 | When an allocation happens, information about it, such as the call stack | |
13 | and the order of pages, is stored in dedicated storage for each page. | |
14 | When we need to know the status of all pages, we can retrieve and analyze | |
15 | this information. | |
16 | ||
17 | Although tracepoints already exist for tracing page allocation/free, | |
18 | using them to analyze who allocated each page is rather complex. We need | |
19 | to enlarge the trace buffer to prevent it from being overwritten before the | |
20 | userspace program launches. The launched program must then continually dump the | |
94ebdd28 | 21 | trace buffer for later analysis, which is more likely to change system behaviour |
16a7ade8 JK |
22 | than just keeping the information in memory, so it is bad for debugging. |
23 | ||
24 | page owner can also be used for various purposes. For example, accurate | |
25 | fragmentation statistics can be obtained through gfp flag information of | |
26 | each page. It is already implemented and activated if page owner is | |
27 | enabled. Other usages are more than welcome. | |
28 | ||
024314d6 YC |
29 | page owner is disabled by default. So, if you'd like to use it, you need |
30 | to add "page_owner=on" to your boot cmdline. If the kernel is built | |
31 | with page owner but it is disabled at runtime because the boot option | |
16a7ade8 JK |
32 | is not set, the runtime overhead is marginal. When disabled at runtime, it |
33 | doesn't require memory to store owner information, so there is no runtime | |
34 | memory overhead. Page owner also inserts just two unlikely branches into | |
7dd80b8a VB |
35 | the page allocator hotpath; if it is not enabled, allocation proceeds |
36 | as in a kernel built without page owner. These two unlikely branches should | |
37 | not affect allocation performance, especially if the static keys jump | |
38 | label patching functionality is available. Following is the kernel's code | |
39 | size change due to this facility. | |
16a7ade8 | 40 | |
0719fdba YC |
41 | Although enabling page owner increases kernel size by several kilobytes, |
42 | most of this code is outside the page allocator and its hot path. Building | |
43 | the kernel with page owner and turning it on when needed is a good | |
44 | option for debugging kernel memory problems. | |
16a7ade8 JK |
45 | |
46 | There is one caveat caused by an implementation detail. page owner | |
47 | stores information in memory from the struct page extension. This memory | |
48 | is initialized some time after the page allocator starts on sparse | |
49 | memory systems, so, until initialization, many pages can be allocated and | |
50 | they would have no owner information. To fix this, these early allocated | |
51 | pages are investigated and marked as allocated in the initialization phase. | |
52 | Although this doesn't mean that they have the right owner information, | |
53 | at least we can tell more accurately whether the page is allocated or not. | |
54 | On a 2GB memory x86-64 VM box, 13343 early allocated pages | |
55 | are caught and marked, although they are mostly allocated from the struct | |
56 | page extension feature. Anyway, after that, no page is left in | |
57 | an untracked state. | |
58 | ||
f227e04e MR |
59 | Usage |
60 | ===== | |
61 | ||
62 | 1) Build user-space helper:: | |
16a7ade8 | 63 | |
16a7ade8 JK |
64 | cd tools/vm |
65 | make page_owner_sort | |
66 | ||
f227e04e | 67 | 2) Enable page owner: add "page_owner=on" to boot cmdline. |
16a7ade8 | 68 | |
59d7cb27 | 69 | 3) Do the job that you want to debug. |
16a7ade8 | 70 | |
f227e04e MR |
71 | 4) Analyze information from page owner:: |
72 | ||
16a7ade8 | 73 | cat /sys/kernel/debug/page_owner > page_owner_full.txt |
5b94ce2f | 74 | ./page_owner_sort page_owner_full.txt sorted_page_owner.txt |
16a7ade8 | 75 | |
18ab3078 | 76 | The general output of ``page_owner_full.txt`` is as follows:: |
f7df2b1c ZW |
77 | |
78 | Page allocated via order XXX, ... | |
79 | PFN XXX ... | |
2e944985 | 80 | // Detailed stack |
f7df2b1c ZW |
81 | |
82 | Page allocated via order XXX, ... | |
83 | PFN XXX ... | |
2e944985 | 84 | // Detailed stack |
8f0efa81 KL |
85 | By default, it does a full pfn dump. To start from a given pfn,
86 | page_owner supports fseek. | |
87 | ||
88 | FILE *fp = fopen("/sys/kernel/debug/page_owner", "r"); | |
89 | fseek(fp, pfn_start, SEEK_SET); | |
f7df2b1c ZW |
90 | |
91 | The ``page_owner_sort`` tool ignores ``PFN`` rows, puts the remaining rows | |
92 | in buf, uses regexp to extract the page order value, counts the times | |
57f2b54a | 93 | and pages of buf, and finally sorts them according to the parameter(s). |
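
The processing described above can be sketched in a few lines of Python. This is a simplified illustration and not the actual ``page_owner_sort`` implementation: the function name ``summarize`` is hypothetical, records are assumed to be separated by blank lines, and the page count of a record is assumed to be ``1 << order``.

```python
import re
from collections import defaultdict

# Matches the "order XXX" part of "Page allocated via order XXX, ..."
ORDER_RE = re.compile(r"order\s+(\d+)")


def summarize(text):
    """Group page_owner records by their text (PFN rows dropped) and count them.

    Returns a list of (times, pages, block_text) tuples, most frequent first.
    """
    totals = defaultdict(lambda: [0, 0])  # block text -> [times, pages]
    for raw in re.split(r"\n\s*\n", text.strip()):
        # Drop the per-page "PFN ..." rows so identical allocations compare equal.
        lines = [ln for ln in raw.splitlines() if not ln.lstrip().startswith("PFN")]
        if not lines:
            continue
        match = ORDER_RE.search(lines[0])
        if match is None:
            continue  # not an allocation record
        block = "\n".join(lines)
        totals[block][0] += 1
        # Assumption: an order-N allocation covers 2**N pages.
        totals[block][1] += 1 << int(match.group(1))
    return sorted(
        ((times, pages, block) for block, (times, pages) in totals.items()),
        key=lambda row: row[0],
        reverse=True,
    )
```

Real records also carry fields such as timestamps, so the real tool has to normalize more than this sketch does before two records can be merged.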
f7df2b1c | 94 | |
f227e04e | 95 | See the result about who allocated each page |
18ab3078 | 96 | in the ``sorted_page_owner.txt``. General output:: |
f7df2b1c ZW |
97 | |
98 | XXX times, XXX pages: | |
99 | Page allocated via order XXX, ... | |
2e944985 | 100 | // Detailed stack |
f7df2b1c ZW |
101 | |
102 | By default, ``page_owner_sort`` sorts according to the times of buf. | |
57f2b54a SH |
103 | If you want to sort by the number of pages of buf, use the ``-m`` parameter. |
104 | The detailed parameters are: | |
105 | ||
5603f9bd | 106 | fundamental function:: |
57f2b54a SH |
107 | |
108 | Sort: | |
109 | -a Sort by memory allocation time. | |
110 | -m Sort by total memory. | |
111 | -p Sort by pid. | |
cf3c2c86 | 112 | -P Sort by tgid. |
194d52d7 | 113 | -n Sort by task command name. |
57f2b54a SH |
114 | -r Sort by memory release time. |
115 | -s Sort by stack trace. | |
116 | -t Sort by times (default). | |
ebbeae36 JY |
117 | --sort <order> Specify sorting order. Sorting syntax is [+|-]key[,[+|-]key[,...]]. |
118 | Choose a key from the **STANDARD FORMAT SPECIFIERS** section. The "+" is | |
119 | optional since default direction is increasing numerical or lexicographic | |
120 | order. Mixed use of abbreviated and complete-form of keys is allowed. | |
121 | ||
122 | Examples: | |
123 | ./page_owner_sort <input> <output> --sort=n,+pid,-tgid | |
124 | ./page_owner_sort <input> <output> --sort=at | |
57f2b54a | 125 | |
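
To make the ``--sort`` syntax concrete, here is a minimal Python sketch of how a string such as ``n,+pid,-tgid`` could be interpreted. It is not taken from ``page_owner_sort`` itself; the function name ``parse_sort_spec`` is hypothetical, and the key table is copied from the STANDARD FORMAT SPECIFIERS section.

```python
# Abbreviated key -> complete form, as listed for the --sort option.
SORT_KEYS = {
    "p": "pid",
    "tg": "tgid",
    "n": "name",
    "st": "stacktrace",
    "T": "txt",
    "ft": "free_ts",
    "at": "alloc_ts",
    "ator": "allocator",
}
LONG_KEYS = set(SORT_KEYS.values())


def parse_sort_spec(spec):
    """Return [(long_key, ascending)] for a spec such as 'n,+pid,-tgid'."""
    result = []
    for token in spec.split(","):
        ascending = True  # "+" is optional; increasing order is the default
        if token.startswith(("+", "-")):
            ascending = token[0] == "+"
            token = token[1:]
        key = SORT_KEYS.get(token, token)  # accept both abbreviated and long forms
        if key not in LONG_KEYS:
            raise ValueError(f"unknown sort key: {token!r}")
        result.append((key, ascending))
    return result
```

With this sketch, ``parse_sort_spec("n,+pid,-tgid")`` yields name and pid in increasing order followed by tgid in decreasing order.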
5603f9bd | 126 | additional function:: |
57f2b54a SH |
127 | |
128 | Cull: | |
9c8a0a8e JY |
129 | --cull <rules> |
130 | Specify culling rules. Culling syntax is key[,key[,...]]. Choose a | |
131 | multi-letter key from the **STANDARD FORMAT SPECIFIERS** section. | |
132 | ||
9c8a0a8e JY |
133 | <rules> is a single argument in the form of a comma-separated list, |
134 | which offers a way to specify individual culling rules. The recognized | |
135 | keywords are described in the **STANDARD FORMAT SPECIFIERS** section below. | |
136 | <rules> can be specified by the sequence of keys k1,k2, ..., as described in | |
137 | the STANDARD FORMAT SPECIFIERS section below. Mixed use of abbreviated and | |
138 | complete-form of keys is allowed. | |
139 | ||
9c8a0a8e JY |
140 | Examples: |
141 | ./page_owner_sort <input> <output> --cull=stacktrace | |
142 | ./page_owner_sort <input> <output> --cull=st,pid,name | |
143 | ./page_owner_sort <input> <output> --cull=n,f | |
57f2b54a SH |
144 | |
145 | Filter: | |
59d7cb27 | 146 | -f Filter out the information of blocks whose memory has been released. |
8ea8613a JY |
147 | |
148 | Select: | |
75382a2d JY |
149 | --pid <pidlist> Select by pid. This selects the blocks whose process ID |
150 | numbers appear in <pidlist>. | |
151 | --tgid <tgidlist> Select by tgid. This selects the blocks whose thread | |
152 | group ID numbers appear in <tgidlist>. | |
153 | --name <cmdlist> Select by task command name. This selects the blocks whose | |
154 | task command names appear in <cmdlist>. | |
155 | ||
156 | <pidlist>, <tgidlist>, <cmdlist> are single arguments in the form of a comma-separated list, | |
157 | which offers a way to specify individual selecting rules. | |
158 | ||
159 | ||
160 | Examples: | |
161 | ./page_owner_sort <input> <output> --pid=1 | |
162 | ./page_owner_sort <input> <output> --tgid=1,2,3 | |
163 | ./page_owner_sort <input> <output> --name name1,name2 | |
9c8a0a8e JY |
164 | |
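
The selection options can be pictured with the following Python sketch. It is illustrative only: ``parse_list`` and ``select_blocks`` are hypothetical names, each block is modelled as a dict with ``pid``, ``tgid`` and ``name`` fields, and combining several options as a logical AND is an assumption that the option descriptions do not state.

```python
def parse_list(arg, convert=str):
    """Split a comma-separated option value such as '1,2,3' into a set."""
    return {convert(item) for item in arg.split(",") if item}


def select_blocks(blocks, pids=None, tgids=None, names=None):
    """Keep the blocks whose fields appear in every list that was given.

    A list left as None places no restriction on that field.
    """
    wanted = {"pid": pids, "tgid": tgids, "name": names}
    return [
        block
        for block in blocks
        if all(
            allowed is None or block[field] in allowed
            for field, allowed in wanted.items()
        )
    ]
```

For example, ``select_blocks(blocks, tgids=parse_list("1,2,3", int))`` mirrors the ``--tgid=1,2,3`` invocation shown in the examples.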
165 | STANDARD FORMAT SPECIFIERS | |
166 | ========================== | |
5603f9bd | 167 | :: |
9c8a0a8e | 168 | |
d1ed51fc | 169 | For --sort option: |
ebbeae36 JY |
170 | |
171 | KEY LONG DESCRIPTION | |
172 | p pid process ID | |
173 | tg tgid thread group ID | |
174 | n name task command name | |
175 | st stacktrace stack trace of the page allocation | |
176 | T txt full text of block | |
177 | ft free_ts timestamp of the page when it was released | |
178 | at alloc_ts timestamp of the page when it was allocated | |
d1ed51fc | 179 | ator allocator memory allocator for pages |
ebbeae36 | 180 | |
d1ed51fc | 181 | For --cull option:
ebbeae36 | 182 | |
9c8a0a8e JY |
183 | KEY LONG DESCRIPTION |
184 | p pid process ID | |
185 | tg tgid thread group ID | |
186 | n name task command name | |
187 | f free whether the page has been released or not | |
ebbeae36 | 188 | st stacktrace stack trace of the page allocation |
d1ed51fc | 189 | ator allocator memory allocator for pages |