Documentation for /proc/sys/vm/* kernel version 2.6
	(c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>

For general info and legal blurb, please look in README.

==============================================================

This file contains the documentation for the sysctl files in
/proc/sys/vm and is valid for Linux kernel version 2.6.

The files in this directory can be used to tune the operation
of the virtual memory (VM) subsystem of the Linux kernel and
the writeout of dirty data to disk.

Default values and initialization routines for most of these
files can be found in mm/swap.c.

Currently, these files are in /proc/sys/vm:
- overcommit_memory
- overcommit_ratio
- page-cluster
- dirty_ratio
- dirty_background_ratio
- dirty_expire_centisecs
- dirty_writeback_centisecs
- max_map_count
- min_free_kbytes
- percpu_pagelist_fraction
- laptop_mode
- block_dump
- drop-caches
- zone_reclaim_mode
- min_unmapped_ratio
- min_slab_ratio
- panic_on_oom
- oom_kill_allocating_task
- mmap_min_addr
- numa_zonelist_order

==============================================================

dirty_ratio, dirty_background_ratio, dirty_expire_centisecs,
dirty_writeback_centisecs, vfs_cache_pressure, laptop_mode,
block_dump, swap_token_timeout, drop-caches,
hugepages_treat_as_movable:

See Documentation/filesystems/proc.txt

==============================================================

overcommit_memory:

This value contains a flag that enables memory overcommitment.

When this flag is 0, the kernel attempts to estimate the amount
of free memory left when userspace requests more memory.

When this flag is 1, the kernel pretends there is always enough
memory until it actually runs out.

When this flag is 2, the kernel uses a "never overcommit"
policy that attempts to prevent any overcommit of memory.

This feature can be very useful because there are a lot of
programs that malloc() huge amounts of memory "just-in-case"
and don't use much of it.

The default value is 0.

See Documentation/vm/overcommit-accounting and
security/commoncap.c::cap_vm_enough_memory() for more information.

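As a hypothetical illustration of the difference between the modes
(the 64 GB figure below is an arbitrary value assumed to exceed RAM
plus swap): under mode 2 an allocation like this is expected to fail
at malloc() time, while under modes 0 and 1 it will often succeed
and only cause trouble once the pages are actually touched.

  /* Illustrative only: request far more memory than the machine has. */
  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
          size_t huge = (size_t)64 << 30;  /* 64 GB, assumed > RAM + swap */
          void *p = malloc(huge);

          if (!p) {
                  perror("malloc");        /* typical outcome with mode 2 */
                  return 1;
          }
          printf("malloc of %zu bytes succeeded (overcommitted)\n", huge);
          free(p);
          return 0;
  }
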
==============================================================

overcommit_ratio:

When overcommit_memory is set to 2, the committed address
space is not permitted to exceed swap plus this percentage
of physical RAM. See above.

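The resulting limit is roughly swap + RAM * overcommit_ratio / 100.
A small sketch with invented numbers (the kernel's real accounting,
visible as CommitLimit in /proc/meminfo, may differ in details):

  #include <stdio.h>

  int main(void)
  {
          unsigned long ram_kb  = 4UL * 1024 * 1024;  /* 4 GB of RAM      */
          unsigned long swap_kb = 2UL * 1024 * 1024;  /* 2 GB of swap     */
          unsigned long ratio   = 50;                 /* overcommit_ratio */

          unsigned long limit_kb = swap_kb + ram_kb * ratio / 100;
          printf("commit limit: %lu kB\n", limit_kb); /* 4 GB here        */
          return 0;
  }
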
==============================================================

page-cluster:

The Linux VM subsystem avoids excessive disk seeks by reading
multiple pages on a page fault. The number of pages it reads
is dependent on the amount of memory in your machine.

The number of pages the kernel reads in at once is equal to
2 ^ page-cluster. Values above 2 ^ 5 don't make much sense
for swap because we only cluster swap data in 32-page groups.

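The 2 ^ page-cluster relation is easy to tabulate (the default
value has historically been 3, i.e. eight pages per read):

  #include <stdio.h>

  int main(void)
  {
          /* pages read per fault for each page-cluster value */
          for (int pc = 0; pc <= 5; pc++)
                  printf("page-cluster=%d -> %d pages\n", pc, 1 << pc);
          return 0;
  }
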
==============================================================

max_map_count:

This file contains the maximum number of memory map areas a process
may have. Memory map areas are used as a side-effect of calling
malloc, directly by mmap and mprotect, and also when loading shared
libraries.

While most applications need less than a thousand maps, certain
programs, particularly malloc debuggers, may consume lots of them,
e.g., up to one or two maps per allocation.

The default value is 65536.

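A hypothetical way to observe the limit is to keep creating small
mappings until mmap() refuses; with default settings the count
reached is typically close to max_map_count (approximate, since the
stack, heap and shared libraries already occupy some map areas):

  #include <stdio.h>
  #include <unistd.h>
  #include <sys/mman.h>

  int main(void)
  {
          long page = sysconf(_SC_PAGESIZE);
          long n = 0;

          for (;;) {
                  /* alternate protections so adjacent VMAs cannot merge */
                  int prot = (n & 1) ? PROT_READ : PROT_READ | PROT_WRITE;
                  if (mmap(NULL, page, prot,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0) == MAP_FAILED)
                          break;
                  n++;
          }
          printf("created %ld mappings before mmap failed\n", n);
          return 0;
  }
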
==============================================================

min_free_kbytes:

This is used to force the Linux VM to keep a minimum number
of kilobytes free. The VM uses this number to compute a pages_min
value for each lowmem zone in the system. Each lowmem zone gets
a number of reserved free pages, in proportion to its size.

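A sketch of the proportional split with invented zone sizes (the
kernel's actual computation involves per-zone rounding and reserves,
which are ignored here):

  #include <stdio.h>

  int main(void)
  {
          long min_free_kbytes = 1024;
          long zone_pages[] = { 4096, 28672 };   /* e.g. DMA, Normal */
          long total = zone_pages[0] + zone_pages[1];

          for (int i = 0; i < 2; i++)            /* proportional share */
                  printf("zone %d reserve: %ld kB\n", i,
                         min_free_kbytes * zone_pages[i] / total);
          return 0;
  }
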
==============================================================

percpu_pagelist_fraction

This is the fraction of pages in each zone that may at most be
allocated to each per-cpu page list (it sets the high mark,
pcp->high). The minimum value for this is 8, which means that we
do not allow more than 1/8th of the pages in each zone to be
allocated in any single per_cpu_pagelist. This entry only changes
the value of hot per-cpu pagelists. The user can specify a number
like 100 to allocate 1/100th of each zone to each per-cpu page list.

The batch value of each per-cpu pagelist is also updated as a
result. It is set to pcp->high/4. The upper limit of batch is
(PAGE_SHIFT * 8).

The initial value is zero. The kernel does not use this value at
boot time to set the high water marks for each per-cpu page list.

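A worked example of the relations described above, with an invented
zone size and the kernel's clamping simplified:

  #include <stdio.h>

  #define PAGE_SHIFT 12

  int main(void)
  {
          long zone_pages = 262144;            /* a 1 GB zone, 4 kB pages */
          long fraction   = 8;                 /* percpu_pagelist_fraction */

          long high  = zone_pages / fraction;  /* pcp->high = 32768        */
          long batch = high / 4;               /* derived batch value      */

          if (batch > PAGE_SHIFT * 8)          /* upper limit from above   */
                  batch = PAGE_SHIFT * 8;      /* clamped to 96 here       */

          printf("pcp->high = %ld pages, batch = %ld pages\n", high, batch);
          return 0;
  }
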
===============================================================

zone_reclaim_mode:

Zone_reclaim_mode allows someone to set more or less aggressive
approaches to reclaim memory when a zone runs out of memory. If it
is set to zero then no zone reclaim occurs. Allocations will be
satisfied from other zones / nodes in the system.

The value is a bitmask built by ORing together:

1	= Zone reclaim on
2	= Zone reclaim writes dirty pages out
4	= Zone reclaim swaps pages

zone_reclaim_mode is set during bootup to 1 if it is determined
that pages from remote zones will cause a measurable performance
reduction. The page allocator will then reclaim easily reusable
pages (those page cache pages that are currently not used) before
allocating off-node pages.

It may be beneficial to switch off zone reclaim if the system is
used for a file server and all of memory should be used for caching
files from disk. In that case the caching effect is more important
than data locality.

Allowing zone reclaim to write out pages stops processes that are
writing large amounts of data from dirtying pages on other nodes.
Zone reclaim will write out dirty pages if a zone fills up and so
effectively throttle the process. This may decrease the performance
of a single process since it cannot use all of system memory to
buffer the outgoing writes anymore, but it preserves the memory on
other nodes so that the performance of other processes running on
other nodes will not be affected.

Allowing regular swap effectively restricts allocations to the local
node unless explicitly overridden by memory policies or cpuset
configurations.

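Since the file takes a bitmask, the modes combine by ORing; a small
sketch (the macro names below are illustrative, not a user-space API):

  #include <stdio.h>

  #define RECLAIM_ZONE  1   /* run zone reclaim on allocation misses    */
  #define RECLAIM_WRITE 2   /* may write out dirty pages during reclaim */
  #define RECLAIM_SWAP  4   /* may swap pages during reclaim            */

  int main(void)
  {
          int mode = RECLAIM_ZONE | RECLAIM_WRITE;   /* i.e. the value 3 */

          printf("echo %d > /proc/sys/vm/zone_reclaim_mode\n", mode);
          if (mode & RECLAIM_SWAP)
                  printf("swapping during zone reclaim enabled\n");
          return 0;
  }
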
=============================================================

min_unmapped_ratio:

This is available only on NUMA kernels.

A percentage of the total pages in each zone. Zone reclaim will
only occur if more than this percentage of pages are file backed
and unmapped. This is to ensure that a minimal amount of local
pages is still available for file I/O even if the node is
overallocated.

The default is 1 percent.

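A worked illustration of the threshold with invented numbers:

  #include <stdio.h>

  int main(void)
  {
          long zone_pages          = 262144;  /* pages in the zone        */
          long unmapped_file_pages = 4000;    /* file backed and unmapped */
          long min_unmapped_ratio  = 1;       /* percent (the default)    */

          long threshold = zone_pages * min_unmapped_ratio / 100; /* 2621 */

          printf("zone reclaim %s run\n",
                 unmapped_file_pages > threshold ? "may" : "will not");
          return 0;
  }
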
=============================================================

min_slab_ratio:

This is available only on NUMA kernels.

A percentage of the total pages in each zone. When zone reclaim
occurs (i.e. a fallback from the local zone is about to happen),
slabs will be reclaimed if more than this percentage of pages in a
zone are reclaimable slab pages. This ensures that the slab growth
stays under control even in NUMA systems that rarely perform global
reclaim.

The default is 5 percent.

Note that slab reclaim is triggered in a per zone / node fashion.
The process of reclaiming slab memory is currently not node specific
and may not be fast.

=============================================================

panic_on_oom

This enables or disables the panic-on-out-of-memory feature.

If this is set to 0, the kernel will kill some rogue process via
a mechanism called the oom_killer. Usually, the oom_killer can
kill a rogue process and the system will survive.

If this is set to 1, the kernel panics when out-of-memory happens.
However, if a process is limited to certain nodes by mempolicy or
cpusets, and those nodes run out of memory, one process may be
killed by the oom-killer. No panic occurs in this case, because
the memory of other nodes may still be free and the system as a
whole may not yet be in a fatal state.

If this is set to 2, the kernel always panics on out-of-memory,
even in the situation mentioned above.

The default value is 0.
Values 1 and 2 are intended for cluster failover; select the one
that matches your failover policy.

=============================================================

oom_kill_allocating_task

This enables or disables killing the OOM-triggering task in
out-of-memory situations.

If this is set to zero, the OOM killer will scan through the entire
tasklist and select a task based on heuristics to kill. This normally
selects a rogue memory-hogging task that frees up a large amount of
memory when killed.

If this is set to non-zero, the OOM killer simply kills the task that
triggered the out-of-memory condition. This avoids the expensive
tasklist scan.

If panic_on_oom is selected, it takes precedence over whatever value
is used in oom_kill_allocating_task.

The default value is 0.

==============================================================

mmap_min_addr

This file indicates the amount of address space which a user process
will be restricted from mmaping. Since kernel null dereference bugs
could accidentally operate based on the information in the first
couple of pages of memory, userspace processes should not be allowed
to write to them. By default this value is set to 0 and no
protections will be enforced by the security module. Setting this
value to something like 64k will allow the vast majority of
applications to work correctly and provide defense in depth against
future potential kernel bugs.

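As a hypothetical demonstration: with mmap_min_addr raised to 64k, a
fixed mapping below that address should be refused (the exact errno
depends on the kernel and the security module in use):

  #include <stdio.h>
  #include <sys/mman.h>

  int main(void)
  {
          /* try to map a page well below 64k */
          void *p = mmap((void *)0x1000, 4096, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

          if (p == MAP_FAILED)
                  perror("mmap below mmap_min_addr"); /* expected outcome */
          else
                  printf("low mapping succeeded at %p\n", p);
          return 0;
  }
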
==============================================================

numa_zonelist_order

This sysctl is only for NUMA.
'Where the memory is allocated from' is controlled by zonelists.
(This documentation ignores ZONE_HIGHMEM/ZONE_DMA32 for the sake of
 a simpler explanation; where applicable, read ZONE_DMA as ZONE_DMA32.)

In the non-NUMA case, a zonelist for GFP_KERNEL is ordered as follows:
ZONE_NORMAL -> ZONE_DMA
This means that a memory allocation request for GFP_KERNEL will
get memory from ZONE_DMA only when ZONE_NORMAL is not available.

In the NUMA case, you can think of the following two types of order.
Assume a 2-node NUMA system; below are zonelists for Node(0)'s
GFP_KERNEL:

(A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL
(B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA

Type (A) offers the best locality for processes on Node(0), but
ZONE_DMA will be used before ZONE_NORMAL exhaustion. This increases
the possibility of out-of-memory (OOM) in ZONE_DMA, because ZONE_DMA
tends to be small.

Type (B) cannot offer the best locality but is more robust against
OOM of the DMA zone.

Type (A) is called "Node" order. Type (B) is "Zone" order.

"Node order" orders the zonelists by node, then by zone within each
node. Specify "[Nn]ode" for node order.

"Zone order" orders the zonelists by zone type, then by node within
each zone. Specify "[Zz]one" for zone order.

Specify "[Dd]efault" to request automatic configuration.
Autoconfiguration will select "node" order in the following cases:
(1) if the DMA zone does not exist, or
(2) if the DMA zone comprises greater than 50% of the available
    memory, or
(3) if any node's DMA zone comprises greater than 60% of its local
    memory and the amount of local memory is big enough.

Otherwise, "zone" order will be selected. "Default" order is
recommended unless this is causing problems for your
system/application.
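
The order can be changed at runtime by writing one of the keywords
to the sysctl file; a minimal sketch (the file only exists on NUMA
kernels, so the open may fail elsewhere):

  #include <stdio.h>

  int main(void)
  {
          FILE *f = fopen("/proc/sys/vm/numa_zonelist_order", "w");

          if (!f) {
                  perror("fopen");    /* non-NUMA kernels lack this file */
                  return 1;
          }
          fputs("zone\n", f);         /* or "node" / "default" */
          fclose(f);
          return 0;
  }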