| 1 | ================================================== |
| 2 | page owner: Tracking who allocated each page |
| 3 | ================================================== |
| 4 | |
| 5 | Introduction |
| 6 | ============ |
| 7 | |
| 8 | page owner tracks who allocated each page. |
| 9 | It can be used to debug memory leaks or to find a memory hog. |
| 10 | When an allocation happens, information about it, such as the call stack |
| 11 | and the order of the pages, is stored in per-page storage. |
| 12 | When we need to know the status of all pages, we can retrieve and analyze |
| 13 | this information. |
| 14 | |
| 15 | Although we already have tracepoints for tracing page allocation and freeing, |
| 16 | using them to analyze who allocated each page is rather complex. We need |
| 17 | to enlarge the trace buffer to prevent it from being overwritten before the |
| 18 | userspace program is launched. The launched program must then continually dump |
| 19 | the trace buffer for later analysis, which is more likely to change system |
| 20 | behaviour than simply keeping the data in memory, and so is bad for debugging. |
| 21 | |
| 22 | page owner can also be used for various purposes. For example, accurate |
| 23 | fragmentation statistics can be obtained through the gfp flag information of |
| 24 | each page. It is already implemented and activated if page owner is |
| 25 | enabled. Other usages are more than welcome. |
| 26 | |
| 27 | page owner is disabled by default. So, if you'd like to use it, you need |
| 28 | to add "page_owner=on" to your boot cmdline. If the kernel is built |
| 29 | with page owner but it is disabled at runtime because the boot option is |
| 30 | not set, the runtime overhead is marginal. When disabled at runtime, it |
| 31 | doesn't require memory to store owner information, so there is no runtime |
| 32 | memory overhead. Also, page owner inserts just two unlikely branches into |
| 33 | the page allocator hotpath and, if it is not enabled, allocation proceeds |
| 34 | just as in a kernel built without page owner. These two unlikely branches |
| 35 | should not affect allocation performance, especially if the static keys jump |
| 36 | label patching functionality is available. The effect of this facility on |
| 37 | the kernel's code size is described next. |
| 38 | |
| 39 | Although enabling page owner increases kernel size by several kilobytes, |
| 40 | most of this code is outside the page allocator and its hot path. Building |
| 41 | the kernel with page owner and turning it on when needed is a good |
| 42 | way to debug kernel memory problems. |
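
As a small illustration of the boot option described above, the following shell sketch checks whether ``page_owner=on`` is present on the kernel command line. The function name ``page_owner_requested`` is made up for this example, and reading ``/proc/cmdline`` is an assumption about where the command line is exposed; the check only shows that the option was passed, not that the feature initialized successfully.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given kernel command line file
# contains the exact parameter "page_owner=on".
# The path defaults to /proc/cmdline (an assumption of this sketch).
page_owner_requested() {
    cmdline_file="${1:-/proc/cmdline}"
    [ -r "$cmdline_file" ] || return 1
    # Put every parameter on its own line so only a whole word matches.
    tr -s ' \t' '\n\n' < "$cmdline_file" | grep -qx 'page_owner=on'
}

if page_owner_requested; then
    echo "page_owner=on was passed at boot"
else
    echo "page_owner=on not found on the kernel command line"
fi
```

Matching whole parameters avoids false positives from similar strings such as ``page_owner=off``.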
| 43 | |
| 44 | There is one caveat caused by an implementation detail. page owner |
| 45 | stores its information in memory from the struct page extension. On sparse |
| 46 | memory systems, this memory is initialized some time after the page allocator |
| 47 | starts, so, until initialization, many pages can be allocated and |
| 48 | they would have no owner information. To fix this up, these early allocated |
| 49 | pages are examined and marked as allocated during the initialization phase. |
| 50 | Although this doesn't mean that they have the right owner information, |
| 51 | we can at least tell more accurately whether a page is allocated or |
| 52 | not. On a 2GB memory x86-64 VM box, 13343 early allocated pages |
| 53 | are caught and marked, although they are mostly allocated by the struct |
| 54 | page extension feature itself. In any case, after that, no page is left in |
| 55 | an untracked state. |
| 56 | |
| 57 | Usage |
| 58 | ===== |
| 59 | |
| 60 | 1) Build user-space helper:: |
| 61 | |
| 62 | cd tools/mm |
| 63 | make page_owner_sort |
| 64 | |
| 65 | 2) Enable page owner: add "page_owner=on" to boot cmdline. |
| 66 | |
| 67 | 3) Do the job that you want to debug. |
| 68 | |
| 69 | 4) Analyze information from page owner:: |
| 70 | |
| 71 | cat /sys/kernel/debug/page_owner > page_owner_full.txt |
| 72 | ./page_owner_sort page_owner_full.txt sorted_page_owner.txt |
| 73 | |
| 74 | The general output of ``page_owner_full.txt`` is as follows:: |
| 75 | |
| 76 | Page allocated via order XXX, ... |
| 77 | PFN XXX ... |
| 78 | // Detailed stack |
| 79 | |
| 80 | Page allocated via order XXX, ... |
| 81 | PFN XXX ... |
| 82 | // Detailed stack |
| 83 | By default, it dumps all PFNs. To start from a given PFN, |
| 84 | page_owner supports fseek: |
| 85 | |
| 86 | FILE *fp = fopen("/sys/kernel/debug/page_owner", "r"); |
| 87 | fseek(fp, pfn_start, SEEK_SET); |
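
When only a saved text dump is available, a similar "start from a given PFN" effect can be approximated offline. The sketch below assumes the record layout shown above (records separated by blank lines, one row starting with ``PFN`` followed by the number) and assumes the PFN is printed in decimal; a dump that prints it in hexadecimal would need a conversion step first. The function name ``records_from_pfn`` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the records whose PFN is at or above
# a starting PFN. Assumes a decimal PFN in the "PFN <number> ..." row.
# Usage: records_from_pfn <start_pfn> < page_owner_full.txt
records_from_pfn() {
    awk -v start="$1" '
        # Paragraph mode: one record per blank-line-separated block,
        # one field per line.
        BEGIN { RS = ""; FS = "\n"; ORS = "\n\n" }
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^PFN [0-9]+/) {
                    split($i, field, " ")
                    if (field[2] + 0 >= start + 0)
                        print
                    break
                }
            }
        }
    '
}

# Example run, only if a dump from step 4 is present in this directory.
if [ -r page_owner_full.txt ]; then
    records_from_pfn 4096 < page_owner_full.txt
fi
```

Unlike the fseek approach, this still requires the full dump to have been written out first, so it only helps with later analysis.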
| 88 | |
| 89 | The ``page_owner_sort`` tool ignores ``PFN`` rows, stores the remaining rows |
| 90 | of each block in buf, uses a regexp to extract the page order value, counts |
| 91 | the times and pages of each buf, and finally sorts them by the parameter(s). |
| 92 | |
| 93 | The result showing who allocated each page |
| 94 | is in ``sorted_page_owner.txt``. General output:: |
| 95 | |
| 96 | XXX times, XXX pages: |
| 97 | Page allocated via order XXX, ... |
| 98 | // Detailed stack |
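
The grouping described above (drop ``PFN`` rows, extract the order, count times and pages) can be sketched in a few lines of awk. This is only an illustration of the idea, not the tool's actual implementation: the function name ``group_page_owner`` is made up, the order is assumed to appear as ``order <number>`` in the first row of each record, an order-N allocation is counted as 2^N pages, and the groups are printed unsorted.

```shell
#!/bin/sh
# Hypothetical sketch: group identical records (PFN row dropped) and
# print how many times each occurred and how many pages it covers.
# Usage: group_page_owner < page_owner_full.txt
group_page_owner() {
    awk '
        BEGIN { RS = ""; FS = "\n" }     # one blank-line-separated block per record
        {
            key = ""
            order = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^PFN /)
                    continue             # ignore PFN rows
                if (i == 1 && match($i, /order [0-9]+/))
                    order = substr($i, RSTART + 6, RLENGTH - 6) + 0
                key = key $i "\n"
            }
            times[key]++
            pages[key] += 2 ^ order      # an order-N block spans 2^N pages
        }
        END {
            # Iteration order is unspecified, so the output is not sorted.
            for (key in times)
                printf "%d times, %d pages:\n%s\n", times[key], pages[key], key
        }
    '
}

# Example run, only if a dump from step 4 is present in this directory.
if [ -r page_owner_full.txt ]; then
    group_page_owner < page_owner_full.txt
fi
```

Dropping the ``PFN`` row is what lets allocations from the same call path collapse into one group, since the PFN differs for every page.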
| 99 | |
| 100 | By default, ``page_owner_sort`` sorts by the number of times each buf occurs. |
| 101 | To sort by the number of pages of each buf, use the ``-m`` parameter. |
| 102 | The detailed parameters are: |
| 103 | |
| 104 | fundamental function:: |
| 105 | |
| 106 | Sort: |
| 107 | -a Sort by memory allocation time. |
| 108 | -m Sort by total memory. |
| 109 | -p Sort by pid. |
| 110 | -P Sort by tgid. |
| 111 | -n Sort by task command name. |
| 112 | -r Sort by memory release time. |
| 113 | -s Sort by stack trace. |
| 114 | -t Sort by times (default). |
| 115 | --sort <order> Specify sorting order. Sorting syntax is [+|-]key[,[+|-]key[,...]]. |
| 116 | Choose a key from the **STANDARD FORMAT SPECIFIERS** section. The "+" is |
| 117 | optional since the default direction is increasing numerical or lexicographic |
| 118 | order. Mixed use of abbreviated and complete forms of keys is allowed. |
| 119 | |
| 120 | Examples: |
| 121 | ./page_owner_sort <input> <output> --sort=n,+pid,-tgid |
| 122 | ./page_owner_sort <input> <output> --sort=at |
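
The sorting syntax ``[+|-]key[,[+|-]key[,...]]`` can be illustrated with a small shell sketch that splits a specification into keys and directions. It does not reflect how ``page_owner_sort`` parses its arguments internally; the function name ``parse_sort_spec`` and the words ``ascending`` and ``descending`` are chosen only for this example, and keys are not validated against the specifier table.

```shell
#!/bin/sh
# Hypothetical sketch: split a sort specification such as
# "n,+pid,-tgid" into one "<key> <direction>" line per key.
parse_sort_spec() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r item; do
        case "$item" in
            -*) echo "${item#-} descending" ;;
            +*) echo "${item#+} ascending" ;;
            "") ;;                          # skip empty entries
            *)  echo "$item ascending" ;;   # "+" is optional
        esac
    done
}

parse_sort_spec "n,+pid,-tgid"
```

For the first example above, this prints ``n ascending``, ``pid ascending`` and ``tgid descending``, one per line.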
| 123 | |
| 124 | additional function:: |
| 125 | |
| 126 | Cull: |
| 127 | --cull <rules> |
| 128 | Specify culling rules. Culling syntax is key[,key[,...]]. Choose a |
| 129 | multi-letter key from the **STANDARD FORMAT SPECIFIERS** section. |
| 130 | |
| 131 | <rules> is a single argument in the form of a comma-separated list, |
| 132 | which offers a way to specify individual culling rules. The recognized |
| 133 | keywords are described in the **STANDARD FORMAT SPECIFIERS** section below. |
| 134 | <rules> can be specified by the sequence of keys k1,k2, ..., as described in |
| 135 | the **STANDARD FORMAT SPECIFIERS** section below. Mixed use of abbreviated |
| 136 | and complete forms of keys is allowed. |
| 137 | |
| 138 | Examples: |
| 139 | ./page_owner_sort <input> <output> --cull=stacktrace |
| 140 | ./page_owner_sort <input> <output> --cull=st,pid,name |
| 141 | ./page_owner_sort <input> <output> --cull=n,f |
| 142 | |
| 143 | Filter: |
| 144 | -f Filter out the information of blocks whose memory has been released. |
| 145 | |
| 146 | Select: |
| 147 | --pid <pidlist> Select by pid. This selects the blocks whose process ID |
| 148 | numbers appear in <pidlist>. |
| 149 | --tgid <tgidlist> Select by tgid. This selects the blocks whose thread |
| 150 | group ID numbers appear in <tgidlist>. |
| 151 | --name <cmdlist> Select by task command name. This selects the blocks whose |
| 152 | task command name appears in <cmdlist>. |
| 153 | |
| 154 | <pidlist>, <tgidlist>, <cmdlist> are single arguments in the form of a comma-separated list, |
| 155 | which offers a way to specify individual selecting rules. |
| 156 | |
| 157 | |
| 158 | Examples: |
| 159 | ./page_owner_sort <input> <output> --pid=1 |
| 160 | ./page_owner_sort <input> <output> --tgid=1,2,3 |
| 161 | ./page_owner_sort <input> <output> --name name1,name2 |
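
To make the comma-separated list semantics of ``<pidlist>``, ``<tgidlist>`` and ``<cmdlist>`` concrete, here is a minimal shell sketch of an exact-match membership test. The helper name ``in_list`` is hypothetical, and the sketch assumes whole-item matching; it says nothing about how the tool itself compares values.

```shell
#!/bin/sh
# Hypothetical helper: succeed if value $1 is one of the items in the
# comma-separated list $2 (whole-item match, no partial matches).
in_list() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example: which of these pids would a list of "1,2,3" select?
for pid in 1 12 3; do
    if in_list "$pid" "1,2,3"; then
        echo "pid $pid selected"
    else
        echo "pid $pid skipped"
    fi
done
```

Wrapping both the list and the value in commas is what keeps ``12`` from matching an item ``1`` or ``2``.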
| 162 | |
| 163 | STANDARD FORMAT SPECIFIERS |
| 164 | ========================== |
| 165 | :: |
| 166 | |
| 167 | For --sort option: |
| 168 | |
| 169 | KEY   LONG        DESCRIPTION |
| 170 | p     pid         process ID |
| 171 | tg    tgid        thread group ID |
| 172 | n     name        task command name |
| 173 | st    stacktrace  stack trace of the page allocation |
| 174 | T     txt         full text of block |
| 175 | ft    free_ts     timestamp of the page when it was released |
| 176 | at    alloc_ts    timestamp of the page when it was allocated |
| 177 | ator  allocator   memory allocator for pages |
| 178 | |
| 179 | For --cull option: |
| 180 | |
| 181 | KEY   LONG        DESCRIPTION |
| 182 | p     pid         process ID |
| 183 | tg    tgid        thread group ID |
| 184 | n     name        task command name |
| 185 | f     free        whether the page has been released or not |
| 186 | st    stacktrace  stack trace of the page allocation |
| 187 | ator  allocator   memory allocator for pages |