===================================
Documentation for /proc/sys/kernel/
===================================

.. See scripts/check-sysctl-docs to keep this up to date


Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>

Copyright (c) 2009, Shen Feng <shen@cn.fujitsu.com>

For general info and legal blurb, please look in :doc:`index`.

------------------------------------------------------------------------------

This file contains documentation for the sysctl files in
``/proc/sys/kernel/`` and is valid for Linux kernel version 2.2.

The files in this directory can be used to tune and monitor
miscellaneous and general things in the operation of the Linux
kernel. Since some of the files *can* be used to screw up your
system, it is advisable to read both documentation and source
before actually making adjustments.

Currently, these files might (depending on your configuration)
show up in ``/proc/sys/kernel``:

.. contents:: :local:


acct
====

::

    highwater lowwater frequency

If BSD-style process accounting is enabled, these values control
its behaviour. If free space on the filesystem where the log lives
goes below ``lowwater`` percent, accounting suspends. If free space
gets above ``highwater`` percent, accounting resumes. ``frequency``
determines how often the amount of free space is checked (value is
in seconds). Default:

::

    4 2 30

That is, suspend accounting if free space drops below 2%; resume it
if it increases to at least 4%; consider information about the amount
of free space valid for 30 seconds.

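The suspend/resume rule is a simple hysteresis between the two thresholds. The shell function below is only an illustrative model of the behaviour described above, not kernel code: ``acct_next`` is a hypothetical name, and the exact boundary comparisons used by the kernel may differ. It prints the next accounting state (``on`` or ``off``) given ``highwater``, ``lowwater``, the current percentage of free space and the current state::

    # acct_next HIGHWATER LOWWATER FREE_PCT STATE   (STATE is "on" or "off")
    # Hypothetical model of the documented rule, not the kernel's code.
    acct_next() {
        if [ "$4" = on ]; then
            # Accounting is running: suspend once free space drops below lowwater.
            if [ "$3" -lt "$2" ]; then echo off; else echo on; fi
        else
            # Accounting is suspended: resume once free space reaches highwater.
            if [ "$3" -ge "$1" ]; then echo on; else echo off; fi
        fi
    }

    # With the defaults (4 2 30), 3% free space keeps whatever state applied before.
    acct_next 4 2 3 on     # prints "on"
    acct_next 4 2 3 off    # prints "off"
    acct_next 4 2 4 off    # prints "on"

The gap between the two thresholds keeps accounting from toggling on every check when free space hovers around a single limit.
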

acpi_video_flags
================

See :doc:`/power/video`. This allows the video resume mode to be set,
in a similar fashion to the ``acpi_sleep`` kernel parameter, by
combining the following values:

= =======
1 s3_bios
2 s3_mode
4 s3_beep
= =======


auto_msgmni
===========

This variable has no effect and may be removed in future kernel
releases. Reading it always returns 0.
Up to Linux 3.17, it enabled/disabled automatic recomputing of
`msgmni`_ upon memory add/remove or upon IPC namespace creation/removal.
Echoing "1" into this file enabled msgmni automatic recomputing.
Echoing "0" turned it off. The default value was 1.


bootloader_type (x86 only)
==========================

This gives the bootloader type number as indicated by the bootloader,
shifted left by 4, and OR'd with the low four bits of the bootloader
version. The reason for this encoding is that this used to match the
``type_of_loader`` field in the kernel header; the encoding is kept for
backwards compatibility. That is, if the full bootloader type number
is 0x15 and the full version number is 0x234, this file will contain
the value 340 = 0x154.

See the ``type_of_loader`` and ``ext_loader_type`` fields in
:doc:`/x86/boot` for additional information.

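The encoding can be reproduced with shell arithmetic. A minimal sketch (the helper names ``bootloader_type_value`` and ``bootloader_type_number`` are hypothetical) that packs the example values from above and recovers the full type number again::

    # bootloader_type_value TYPE VERSION
    # Full type number shifted left by 4, OR'd with the low four bits of VERSION.
    bootloader_type_value() {
        echo $(( ($1 << 4) | ($2 & 0xf) ))
    }

    # bootloader_type_number VALUE: undo the shift to get the full type number.
    bootloader_type_number() {
        printf '0x%x\n' $(( $1 >> 4 ))
    }

    bootloader_type_value 0x15 0x234    # prints 340 (= 0x154)
    bootloader_type_number 340          # prints 0x15

Only the low four bits of the version survive in this value; the complete version is available from ``bootloader_version``.
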

bootloader_version (x86 only)
=============================

The complete bootloader version number. In the example above, this
file will contain the value 564 = 0x234.

See the ``type_of_loader`` and ``ext_loader_ver`` fields in
:doc:`/x86/boot` for additional information.


cap_last_cap
============

Highest valid capability of the running kernel. Exports
``CAP_LAST_CAP`` from the kernel.

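Since capabilities are numbered from 0 up to ``cap_last_cap``, the value is enough to build a bitmask of every capability the running kernel knows about, for example to compare against the capability masks shown in ``/proc/<pid>/status``. A minimal sketch; ``cap_mask`` is a hypothetical helper, and it assumes the value is below 62 so that the shift fits in the shell's signed 64-bit arithmetic::

    # cap_mask LAST_CAP: print a hex mask with bits 0..LAST_CAP set.
    # Assumes LAST_CAP < 62 so the shift stays within 64-bit shell arithmetic.
    cap_mask() {
        printf '0x%x\n' $(( (1 << ($1 + 1)) - 1 ))
    }

    cap_mask 40    # prints 0x1ffffffffff

    # On a live system:
    #   cap_mask "$(cat /proc/sys/kernel/cap_last_cap)"
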

core_pattern
============

``core_pattern`` is used to specify a core dumpfile pattern name.

* max length 127 characters; default value is "core"
* ``core_pattern`` is used as a pattern template for the output
  filename; certain string patterns (beginning with '%') are
  substituted with their actual values.
* backward compatibility with ``core_uses_pid``:

    If ``core_pattern`` does not include "%p" (default does not)
    and ``core_uses_pid`` is set, then .PID will be appended to
    the filename.

* corename format specifiers

    ======== ==========================================
    %<NUL>   '%' is dropped
    %%       output one '%'
    %p       pid
    %P       global pid (init PID namespace)
    %i       tid
    %I       global tid (init PID namespace)
    %u       uid (in initial user namespace)
    %g       gid (in initial user namespace)
    %d       dump mode, matches ``PR_SET_DUMPABLE`` and
             ``/proc/sys/fs/suid_dumpable``
    %s       signal number
    %t       UNIX time of dump
    %h       hostname
    %e       executable filename (may be shortened)
    %E       executable path
    %c       maximum size of core file by resource limit RLIMIT_CORE
    %<OTHER> both are dropped
    ======== ==========================================

* If the first character of the pattern is a '|', the kernel will treat
  the rest of the pattern as a command to run. The core dump will be
  written to the standard input of that program instead of to a file.

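To make the specifier rules concrete, here is a much-simplified shell model of the substitution. It is not the kernel's implementation: ``expand_core_pattern`` is a hypothetical helper that only understands ``%%``, ``%p``, ``%u``, ``%s`` and ``%e``, treats every other specifier like ``%<OTHER>`` (dropping both characters), and ignores both the '|' pipe form and the shortening of ``%e``::

    # expand_core_pattern PATTERN PID UID SIGNAL EXE   (hypothetical model)
    expand_core_pattern() {
        rest=$1 out=
        while [ -n "$rest" ]; do
            ch=${rest%"${rest#?}"}        # first character
            rest=${rest#?}
            if [ "$ch" != % ]; then
                out=$out$ch               # ordinary characters are copied
                continue
            fi
            spec=${rest%"${rest#?}"}      # character after '%', empty at the end
            rest=${rest#?}
            case $spec in
                %) out=$out% ;;
                p) out=$out$2 ;;
                u) out=$out$3 ;;
                s) out=$out$4 ;;
                e) out=$out$5 ;;
                *) ;;                     # '%' and the following character are dropped
            esac
        done
        printf '%s\n' "$out"
    }

    expand_core_pattern 'core.%e.%p' 1234 1000 11 sleep    # prints core.sleep.1234
    expand_core_pattern 'core.%p%'   1234 1000 11 sleep    # prints core.1234
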

core_pipe_limit
===============

This sysctl is only applicable when `core_pattern`_ is configured to
pipe core files to a user space helper (when the first character of
``core_pattern`` is a '|', see above).
When collecting cores via a pipe to an application, it is occasionally
useful for the collecting application to gather data about the
crashing process from its ``/proc/pid`` directory.
In order to do this safely, the kernel must wait for the collecting
process to exit, so as not to remove the crashing process's proc files
prematurely.
This in turn creates the possibility that a misbehaving userspace
collecting process can block the reaping of a crashed process simply
by never exiting.
This sysctl defends against that.
It defines how many concurrent crashing processes may be piped to user
space applications in parallel.
If this value is exceeded, then those crashing processes above that
value are noted via the kernel log and their cores are skipped.
0 is a special value, indicating that unlimited processes may be
captured in parallel, but that no waiting will take place (i.e. the
collecting process is not guaranteed access to ``/proc/<crashing pid>/``).
This value defaults to 0.


core_uses_pid
=============

The default coredump filename is "core". By setting
``core_uses_pid`` to 1, the coredump filename becomes core.PID.
If `core_pattern`_ does not include "%p" (default does not)
and ``core_uses_pid`` is set, then .PID will be appended to
the filename.

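The interaction between ``core_uses_pid`` and "%p" can be sketched as follows. ``core_file_name`` is a hypothetical helper; it leaves all %-specifiers unexpanded and uses a plain substring test for "%p", which may differ from the kernel's own parsing for unusual patterns such as "%%p"::

    # core_file_name PATTERN CORE_USES_PID PID   (hypothetical model)
    core_file_name() {
        case $1 in
            *%p*)
                # The pattern already contains the pid: nothing is appended.
                echo "$1" ;;
            *)
                # Otherwise append ".PID" only when core_uses_pid is non-zero.
                if [ "$2" -ne 0 ]; then echo "$1.$3"; else echo "$1"; fi ;;
        esac
    }

    core_file_name core 1 4242       # prints core.4242
    core_file_name core 0 4242       # prints core
    core_file_name core.%p 1 4242    # prints core.%p
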

ctrl-alt-del
============

When the value in this file is 0, ctrl-alt-del is trapped and
sent to the ``init(1)`` program to handle a graceful restart.
When, however, the value is > 0, Linux's reaction to a Vulcan
Nerve Pinch (tm) will be an immediate reboot, without even
syncing its dirty buffers.

Note:
  When a program (like dosemu) has the keyboard in 'raw'
  mode, the ctrl-alt-del is intercepted by the program before it
  ever reaches the kernel tty layer, and it's up to the program
  to decide what to do with it.


dmesg_restrict
==============

This toggle indicates whether unprivileged users are prevented
from using ``dmesg(8)`` to view messages from the kernel's log
buffer.
When ``dmesg_restrict`` is set to 0 there are no restrictions.
When ``dmesg_restrict`` is set to 1, users must have
``CAP_SYSLOG`` to use ``dmesg(8)``.

The kernel config option ``CONFIG_SECURITY_DMESG_RESTRICT`` sets the
default value of ``dmesg_restrict``.


domainname & hostname
=====================

These files can be used to set the NIS/YP domainname and the
hostname of your box in exactly the same way as the commands
domainname and hostname, i.e.::

    # echo "darkstar" > /proc/sys/kernel/hostname
    # echo "mydomain" > /proc/sys/kernel/domainname

has the same effect as::

    # hostname "darkstar"
    # domainname "mydomain"

Note, however, that the classic darkstar.frop.org has the
hostname "darkstar" and DNS (Internet Domain Name System)
domainname "frop.org", not to be confused with the NIS (Network
Information Service) or YP (Yellow Pages) domainname. These two
domain names are in general different. For a detailed discussion,
see the ``hostname(1)`` man page.


hardlockup_all_cpu_backtrace
============================

This value controls whether the hard lockup detector gathers further
debug information when a hard lockup condition is detected. If
enabled, arch-specific all-CPU stack dumping will be initiated.

= ============================================
0 Do nothing. This is the default behavior.
1 On detection capture more debug information.
= ============================================


hardlockup_panic
================

This parameter can be used to control whether the kernel panics
when a hard lockup is detected.

= ===========================
0 Don't panic on hard lockup.
1 Panic on hard lockup.
= ===========================

See :doc:`/admin-guide/lockup-watchdogs` for more information.
This can also be set using the nmi_watchdog kernel parameter.


hotplug
=======

Path for the hotplug policy agent.
Default value is "``/sbin/hotplug``".


hung_task_panic
===============

Controls the kernel's behavior when a hung task is detected.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

= =================================================
0 Continue operation. This is the default behavior.
1 Panic immediately.
= =================================================


hung_task_check_count
=====================

The upper bound on the number of tasks that are checked.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.


hung_task_timeout_secs
======================

When a task in D state has not been scheduled for more than this
many seconds, a warning is reported.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

0 means an infinite timeout: no checking is done.

Possible values to set are in range {0:``LONG_MAX``/``HZ``}.


hung_task_check_interval_secs
=============================

Hung task check interval. If hung task checking is enabled
(see `hung_task_timeout_secs`_), the check is done every
``hung_task_check_interval_secs`` seconds.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

0 (default) means use ``hung_task_timeout_secs`` as the checking
interval.

Possible values to set are in range {0:``LONG_MAX``/``HZ``}.

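Putting `hung_task_timeout_secs`_ and ``hung_task_check_interval_secs`` together, the effective check interval can be modelled as below. ``hung_task_effective_interval`` is a hypothetical helper that only encodes the rules stated in this document::

    # hung_task_effective_interval TIMEOUT_SECS INTERVAL_SECS   (hypothetical model)
    hung_task_effective_interval() {
        if [ "$1" -eq 0 ]; then
            echo disabled    # a timeout of 0 means no checking is done
        elif [ "$2" -eq 0 ]; then
            echo "$1"        # interval 0: fall back to the timeout
        else
            echo "$2"
        fi
    }

    hung_task_effective_interval 120 0     # prints 120
    hung_task_effective_interval 120 30    # prints 30

    # On a live system (requires CONFIG_DETECT_HUNG_TASK):
    #   hung_task_effective_interval \
    #       "$(cat /proc/sys/kernel/hung_task_timeout_secs)" \
    #       "$(cat /proc/sys/kernel/hung_task_check_interval_secs)"
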

hung_task_warnings
==================

The maximum number of warnings to report. During a check interval,
if a hung task is detected, this value is decreased by 1.
When this value reaches 0, no more warnings will be reported.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

-1: report an infinite number of warnings.

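The counting rule can be illustrated with a small simulation. ``hung_task_reports`` is a hypothetical helper that, given the initial value of ``hung_task_warnings`` and a number of hung-task detections, prints how many of them would produce a warning under the rules above::

    # hung_task_reports WARNINGS DETECTIONS   (hypothetical model)
    hung_task_reports() {
        left=$1 reported=0 n=0
        while [ "$n" -lt "$2" ]; do
            n=$((n + 1))
            if [ "$left" -ne 0 ]; then
                reported=$((reported + 1))
                # -1 means unlimited, so it is never decremented.
                if [ "$left" -gt 0 ]; then
                    left=$((left - 1))
                fi
            fi
        done
        echo "$reported"
    }

    hung_task_reports 10 3    # prints 3
    hung_task_reports 2 5     # prints 2: the budget runs out
    hung_task_reports -1 7    # prints 7
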

hyperv_record_panic_msg
=======================

Controls whether the panic kmsg data should be reported to Hyper-V.

= =========================================================
0 Do not report panic kmsg data.
1 Report the panic kmsg data. This is the default behavior.
= =========================================================


kexec_load_disabled
===================

A toggle indicating if the ``kexec_load`` syscall has been disabled.
This value defaults to 0 (false: ``kexec_load`` enabled), but can be
set to 1 (true: ``kexec_load`` disabled).
Once true, kexec can no longer be used, and the toggle cannot be set
back to false.
This allows a kexec image to be loaded before disabling the syscall,
allowing a system to set up (and later use) an image without it being
altered.
Generally used together with the `modules_disabled`_ sysctl.


kptr_restrict
=============

This toggle indicates whether restrictions are placed on
exposing kernel addresses via ``/proc`` and other interfaces.

When ``kptr_restrict`` is set to 0 (the default), the address is hashed
before printing (this is the equivalent of %p).

When ``kptr_restrict`` is set to 1, kernel pointers printed using the
%pK format specifier will be replaced with 0s unless the user has
``CAP_SYSLOG`` and effective user and group ids are equal to the real
ids.
This is because %pK checks are done at read() time rather than open()
time, so if permissions are elevated between the open() and the read()
(e.g. via a setuid binary) then %pK will not leak kernel pointers to
unprivileged users.
Note, this is a temporary solution only.
The correct long-term solution is to do the permission checks at
open() time.
Consider removing world read permissions from files that use %pK, and
using `dmesg_restrict`_ to protect against uses of %pK in ``dmesg(8)``
if leaking kernel pointer values to unprivileged users is a concern.

When ``kptr_restrict`` is set to 2, kernel pointers printed using
%pK will be replaced with 0s regardless of privileges.


modprobe
========

This gives the full path of the modprobe command which the kernel will
use to load modules. This can be used to debug module loading
requests::

    echo '#! /bin/sh' > /tmp/modprobe
    echo 'echo "$@" >> /tmp/modprobe.log' >> /tmp/modprobe
    echo 'exec /sbin/modprobe "$@"' >> /tmp/modprobe
    chmod a+x /tmp/modprobe
    echo /tmp/modprobe > /proc/sys/kernel/modprobe

This only applies when the *kernel* is requesting that the module be
loaded; it won't have any effect if the module is being loaded
explicitly using ``modprobe`` from userspace.


modules_disabled
================

A toggle value indicating if modules are allowed to be loaded
in an otherwise modular kernel. This toggle defaults to off
(0), but can be set true (1). Once true, modules can be
neither loaded nor unloaded, and the toggle cannot be set back
to false. Generally used with the `kexec_load_disabled`_ toggle.


.. _msgmni:

msgmax, msgmnb, and msgmni
==========================

``msgmax`` is the maximum size of an IPC message, in bytes. 8192 by
default (``MSGMAX``).

``msgmnb`` is the maximum size of an IPC queue, in bytes. 16384 by
default (``MSGMNB``).

``msgmni`` is the maximum number of IPC queues. 32000 by default
(``MSGMNI``).


msg_next_id, sem_next_id, and shm_next_id (System V IPC)
========================================================

These three toggles allow specifying the desired id for the next
allocated IPC object: message, semaphore or shared memory respectively.

By default they are equal to -1, which means generic allocation logic.
Possible values to set are in range {0:``INT_MAX``}.

Notes:
  1) The kernel doesn't guarantee that the new object will have the
     desired id. So, it's up to userspace how to handle an object with
     the "wrong" id.
  2) A toggle with a non-default value will be set back to -1 by the
     kernel after successful IPC object allocation. If an IPC object
     allocation syscall fails, it is undefined if the value remains
     unmodified or is reset to -1.


nmi_watchdog
============

This parameter can be used to control the NMI watchdog
(i.e. the hard lockup detector) on x86 systems.

= =================================
0 Disable the hard lockup detector.
1 Enable the hard lockup detector.
= =================================

The hard lockup detector monitors each CPU for its ability to respond to
timer interrupts. The mechanism utilizes CPU performance counter registers
that are programmed to generate Non-Maskable Interrupts (NMIs) periodically
while a CPU is busy. Hence, the alternative name 'NMI watchdog'.

The NMI watchdog is disabled by default if the kernel is running as a guest
in a KVM virtual machine. This default can be overridden by adding::

    nmi_watchdog=1

to the guest kernel command line (see :doc:`/admin-guide/kernel-parameters`).


numa_balancing
==============

Enables/disables automatic page fault based NUMA memory
balancing. Memory is moved automatically to nodes
that access it often.

On NUMA machines, there is a performance penalty if remote memory is
accessed by a CPU. When this feature is enabled the kernel samples which
task thread is accessing memory by periodically unmapping pages and later
trapping a page fault. At the time of the page fault, it is determined if
the data being accessed should be migrated to a local memory node.

The unmapping of pages and trapping faults incur additional overhead that
ideally is offset by improved memory locality but there is no universal
guarantee. If the target workload is already bound to NUMA nodes then this
feature should be disabled. Otherwise, if the system overhead from the
feature is too high then the rate at which the kernel samples for NUMA
hinting faults may be controlled by the `numa_balancing_scan_period_min_ms,
numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
numa_balancing_scan_size_mb`_, and numa_balancing_settle_count sysctls.


numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
==================================================================================================================================

Automatic NUMA balancing scans a task's address space and unmaps pages to
detect if pages are properly placed or if the data should be migrated to a
memory node local to where the task is running. Every "scan delay" the task
scans the next "scan size" number of pages in its address space. When the
end of the address space is reached the scanner restarts from the beginning.

In combination, the "scan delay" and "scan size" determine the scan rate.
When "scan delay" decreases, the scan rate increases. The scan delay and
hence the scan rate of every task is adaptive and depends on historical
behaviour. If pages are properly placed then the scan delay increases,
otherwise the scan delay decreases. The "scan size" is not adaptive but
the higher the "scan size", the higher the scan rate.

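Taking the description above literally, a task that scans "scan size" megabytes once every "scan delay" milliseconds covers its address space at a rate of size divided by delay. A back-of-the-envelope helper (the name ``numa_scan_rate_mb_s`` is hypothetical, and integer arithmetic truncates) shows how the two values trade off::

    # numa_scan_rate_mb_s SCAN_SIZE_MB SCAN_DELAY_MS   (hypothetical helper)
    # Megabytes scanned per second by one task at a given scan delay.
    numa_scan_rate_mb_s() {
        echo $(( $1 * 1000 / $2 ))
    }

    numa_scan_rate_mb_s 256 1000    # prints 256
    numa_scan_rate_mb_s 256 500     # prints 512: halving the delay doubles the rate

Because the scan delay adapts over time, such a figure is only a snapshot for one moment in a task's life.
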
Higher scan rates incur higher system overhead as page faults must be
trapped and potentially data must be migrated. However, the higher the scan
rate, the more quickly a task's memory is migrated to a local node if the
workload pattern changes, minimising the performance impact of remote
memory accesses. These sysctls control the thresholds for scan delays and
the number of pages scanned.

``numa_balancing_scan_period_min_ms`` is the minimum time in milliseconds to
scan a task's virtual memory. It effectively controls the maximum scanning
rate for each task.

``numa_balancing_scan_delay_ms`` is the starting "scan delay" used for a task
when it initially forks.

``numa_balancing_scan_period_max_ms`` is the maximum time in milliseconds to
scan a task's virtual memory. It effectively controls the minimum scanning
rate for each task.

``numa_balancing_scan_size_mb`` is how many megabytes worth of pages are
scanned for a given scan.


osrelease, ostype & version
===========================

::

  # cat osrelease
  2.1.88
  # cat ostype
  Linux
  # cat version
  #5 Wed Feb 25 21:49:24 MET 1998

The files ``osrelease`` and ``ostype`` should be clear enough.
``version`` needs a little more clarification, however. The '#5' means
that this is the fifth kernel built from this source base and the
date behind it indicates the time the kernel was built.
The only way to tune these values is to rebuild the kernel :-)


overflowgid & overflowuid
=========================

If your architecture did not always support 32-bit UIDs (i.e. arm,
i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to
applications that use the old 16-bit UID/GID system calls, if the
actual UID or GID would exceed 65535.

These sysctls allow you to change the value of the fixed UID and GID.
The default is 65534.

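The mapping described above amounts to the following. ``to_16bit_id`` is a hypothetical helper that prints what an old 16-bit UID/GID system call would report for a given ID, with the current ``overflowuid`` (or ``overflowgid``) value passed in::

    # to_16bit_id ID OVERFLOW_ID   (hypothetical helper)
    to_16bit_id() {
        if [ "$1" -gt 65535 ]; then
            echo "$2"    # does not fit in 16 bits: report the overflow ID
        else
            echo "$1"
        fi
    }

    to_16bit_id 1000 65534      # prints 1000
    to_16bit_id 100000 65534    # prints 65534

    # Using the live value:
    #   to_16bit_id 100000 "$(cat /proc/sys/kernel/overflowuid)"
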

panic
=====

The value in this file determines the behaviour of the kernel on a
panic:

* if zero, the kernel will loop forever;
* if negative, the kernel will reboot immediately;
* if positive, the kernel will reboot after the corresponding number
  of seconds.

When you use the software watchdog, the recommended setting is 60.

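The three cases can be summarised with a small helper; ``describe_panic`` is a hypothetical name used only for illustration::

    # describe_panic VALUE: what the kernel does on panic for this setting.
    describe_panic() {
        if [ "$1" -eq 0 ]; then
            echo "loop forever"
        elif [ "$1" -lt 0 ]; then
            echo "reboot immediately"
        else
            echo "reboot after $1 seconds"
        fi
    }

    describe_panic 60    # prints "reboot after 60 seconds"

    # For the running kernel:
    #   describe_panic "$(cat /proc/sys/kernel/panic)"
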

panic_on_io_nmi
===============

Controls the kernel's behavior when a CPU receives an NMI caused by
an IO error.

= ==================================================================
0 Try to continue operation (default).
1 Panic immediately. The IO error triggered an NMI. This indicates a
  serious system condition which could result in IO data corruption.
  Rather than continuing, panicking might be a better choice. Some
  servers issue this sort of NMI when the dump button is pushed,
  and you can use this option to take a crash dump.
= ==================================================================


panic_on_oops
=============

Controls the kernel's behaviour when an oops or BUG is encountered.

= ===================================================================
0 Try to continue operation.
1 Panic immediately. If the `panic`_ sysctl is also non-zero then the
  machine will be rebooted.
= ===================================================================


panic_on_stackoverflow
======================

Controls the kernel's behavior when it detects an overflow of the
kernel, IRQ or exception stack (user stacks are not covered).
This file shows up if ``CONFIG_DEBUG_STACKOVERFLOW`` is enabled.

= ==========================
0 Try to continue operation.
1 Panic immediately.
= ==========================


panic_on_unrecovered_nmi
========================

The default Linux behaviour on an NMI caused by a memory error or of
unknown origin is to continue operation. For many environments such as
scientific computing it is preferable to take the box out of service
and deal with the error rather than let an uncorrected parity/ECC error
propagate.

A small number of systems do generate NMIs for bizarre random reasons
such as power management, so the default is off. This sysctl works like
the other panic controls in this directory.


panic_on_warn
=============

Calls panic() in the WARN() path when set to 1. This is useful to avoid
a kernel rebuild when attempting to kdump at the location of a WARN().

= ================================================
0 Only WARN(), default behaviour.
1 Call panic() after printing out WARN() location.
= ================================================


panic_print
===========

Bitmask for printing system info when a panic happens. Users can choose
a combination of the following bits:

===== ============================================
bit 0 print all tasks info
bit 1 print system memory info
bit 2 print timer info
bit 3 print locks info if ``CONFIG_LOCKDEP`` is on
bit 4 print ftrace buffer
===== ============================================

For example, to print tasks and memory info on panic, use::

  echo 3 > /proc/sys/kernel/panic_print

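Rather than adding up bit values by hand, the mask can be assembled from the bit descriptions in the table. ``panic_print_mask`` and the names ``tasks``, ``memory``, ``timers``, ``locks`` and ``ftrace`` are hypothetical labels for bits 0 to 4, not an existing interface::

    # panic_print_mask NAME...: print the panic_print value for the named bits.
    panic_print_mask() {
        mask=0
        for item in "$@"; do
            case $item in
                tasks)  mask=$((mask | 1)) ;;    # bit 0
                memory) mask=$((mask | 2)) ;;    # bit 1
                timers) mask=$((mask | 4)) ;;    # bit 2
                locks)  mask=$((mask | 8)) ;;    # bit 3, needs CONFIG_LOCKDEP
                ftrace) mask=$((mask | 16)) ;;   # bit 4
                *) echo "unknown item: $item" >&2; return 1 ;;
            esac
        done
        echo "$mask"
    }

    panic_print_mask tasks memory    # prints 3, the value used above

    # As root:
    #   panic_print_mask tasks memory > /proc/sys/kernel/panic_print
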
81c9d43f | 667 | |
a3cb66a5 SK |
668 | panic_on_rcu_stall |
669 | ================== | |
088e9d25 DBO |
670 | |
671 | When set to 1, calls panic() after RCU stall detection messages. This | |
672 | is useful to define the root cause of RCU stalls using a vmcore. | |
673 | ||
a3cb66a5 SK |
674 | = ============================================================ |
675 | 0 Do not panic() when RCU stall takes place, default behavior. | |
676 | 1 panic() after printing RCU stall messages. | |
677 | = ============================================================ | |
088e9d25 | 678 | |
088e9d25 | 679 | |
a3cb66a5 SK |
680 | perf_cpu_time_max_percent |
681 | ========================= | |
14c63f17 DH |
682 | |
683 | Hints to the kernel how much CPU time it should be allowed to | |
684 | use to handle perf sampling events. If the perf subsystem | |
685 | is informed that its samples are exceeding this limit, it | |
686 | will drop its sampling frequency to attempt to reduce its CPU | |
687 | usage. | |
688 | ||
689 | Some perf sampling happens in NMIs. If these samples | |
690 | unexpectedly take too long to execute, the NMIs can become | |
691 | stacked up next to each other so much that nothing else is | |
692 | allowed to execute. | |
693 | ||
a3cb66a5 SK |
694 | ===== ======================================================== |
695 | 0 Disable the mechanism. Do not monitor or correct perf's | |
696 | sampling rate no matter how much CPU time it takes. | |
14c63f17 | 697 | |
a3cb66a5 SK |
698 | 1-100 Attempt to throttle perf's sample rate to this |
699 | percentage of CPU. Note: the kernel calculates an | |
700 | "expected" length of each sample event. 100 here means | |
701 | 100% of that expected length. Even if this is set to | |
702 | 100, you may still see sample throttling if this | |
703 | length is exceeded. Set to 0 if you truly do not care | |
704 | how much CPU is consumed. | |
705 | ===== ======================================================== | |
14c63f17 | 706 | |
14c63f17 | 707 | |
a3cb66a5 SK |
708 | perf_event_paranoid |
709 | =================== | |
3379e0c3 BH |
710 | |
711 | Controls use of the performance events system by unprivileged | |
0161028b | 712 | users (without CAP_SYS_ADMIN). The default value is 2. |
3379e0c3 | 713 | |
53b95375 | 714 | === ================================================================== |
a3cb66a5 | 715 | -1 Allow use of (almost) all events by all users. |
53b95375 | 716 | |
a3cb66a5 SK |
717 | Ignore mlock limit after perf_event_mlock_kb without |
718 | ``CAP_IPC_LOCK``. | |
53b95375 | 719 | |
a3cb66a5 SK |
720 | >=0 Disallow ftrace function tracepoint by users without |
721 | ``CAP_SYS_ADMIN``. | |
53b95375 | 722 | |
a3cb66a5 | 723 | Disallow raw tracepoint access by users without ``CAP_SYS_ADMIN``. |
3379e0c3 | 724 | |
a3cb66a5 | 725 | >=1 Disallow CPU event access by users without ``CAP_SYS_ADMIN``. |
53b95375 | 726 | |
a3cb66a5 | 727 | >=2 Disallow kernel profiling by users without ``CAP_SYS_ADMIN``. |
53b95375 MCC |
728 | === ================================================================== |
729 | ||
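The thresholds in the table are cumulative. A restriction listed under ``>=N`` applies at level N and at every higher level. The small Python sketch below encodes the table as written and reads the current level, assuming the usual ``/proc`` path. The restriction strings are informal labels:

```python
from __future__ import annotations

from pathlib import Path

# (threshold, restriction) pairs copied from the table: each restriction
# applies to users without CAP_SYS_ADMIN when the level is >= threshold.
PARANOID_RULES = [
    (0, "ftrace function tracepoint"),
    (0, "raw tracepoint access"),
    (1, "CPU event access"),
    (2, "kernel profiling"),
]


def restricted_for_unprivileged(level: int) -> list[str]:
    """Return what the given perf_event_paranoid level denies."""
    return [what for threshold, what in PARANOID_RULES if level >= threshold]


def current_level(path: str = "/proc/sys/kernel/perf_event_paranoid") -> int:
    """Read the current setting (the file holds a single integer)."""
    return int(Path(path).read_text().strip())
```

For example, level 1 still allows kernel profiling for unprivileged users. It denies CPU events and the tracepoint accesses listed under ``>=0``.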
55af7796 | 730 | |
a3cb66a5 SK |
731 | perf_event_max_stack |
732 | ==================== | |
c5dfd78e | 733 | |
a3cb66a5 SK |
734 | Controls the maximum number of stack frames to copy for (``attr.sample_type & |
735 | PERF_SAMPLE_CALLCHAIN``) configured events, for instance, when using | |
736 | '``perf record -g``' or '``perf trace --call-graph fp``'. | |
c5dfd78e ACM |
737 | |
738 | This value can only be changed when no events with callchains enabled are | |
a3cb66a5 | 739 | in use; otherwise, writing to this file will return ``-EBUSY``. |
c5dfd78e ACM |
740 | |
741 | The default value is 127. | |
742 | ||
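The write fails with ``-EBUSY`` while callchain events exist. A tool that changes this limit may therefore want to report that case separately from real errors. This is a minimal Python sketch. ``write_int_sysctl`` is a hypothetical helper name, and the caller is assumed to have permission to write the file:

```python
import errno
import os


def write_int_sysctl(path: str, value: int) -> bool:
    """Write an integer sysctl value.

    Returns False if the kernel rejects the write with EBUSY (for
    perf_event_max_stack: events with callchains are still in use);
    any other error is re-raised.
    """
    data = f"{value}\n".encode()
    fd = os.open(path, os.O_WRONLY)
    try:
        # Send the whole value in a single write() at offset 0, as numeric
        # sysctl entries require (see sysctl_writes_strict).
        os.write(fd, data)
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            return False
        raise
    finally:
        os.close(fd)
    return True
```

For example, ``write_int_sysctl("/proc/sys/kernel/perf_event_max_stack", 255)`` returns ``False`` instead of raising while the limit cannot be changed. The value 255 is arbitrary.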
c5dfd78e | 743 | |
a3cb66a5 SK |
744 | perf_event_mlock_kb |
745 | =================== | |
ac0bb6b7 KK |
746 | |
747 | Controls the size of the per-CPU ring buffer that is not counted against the mlock limit. | |
748 | ||
749 | The value is in KiB; the default is 512 KiB plus one page (516 on systems with 4 KiB pages). | |
750 | ||
ac0bb6b7 | 751 | |
a3cb66a5 SK |
752 | perf_event_max_contexts_per_stack |
753 | ================================= | |
c85b0334 ACM |
754 | |
755 | Controls the maximum number of stack frame context entries for | |
a3cb66a5 SK |
756 | (``attr.sample_type & PERF_SAMPLE_CALLCHAIN``) configured events, for |
757 | instance, when using '``perf record -g``' or '``perf trace --call-graph fp``'. | |
c85b0334 ACM |
758 | |
759 | This value can only be changed when no events with callchains enabled are | |
a3cb66a5 | 760 | in use; otherwise, writing to this file will return ``-EBUSY``. |
c85b0334 ACM |
761 | |
762 | The default value is 8. | |
763 | ||
c85b0334 | 764 | |
a3cb66a5 SK |
765 | pid_max |
766 | ======= | |
1da177e4 | 767 | |
beb7dd86 | 768 | PID allocation wrap value. When the kernel's next PID value |
1da177e4 | 769 | reaches this value, it wraps back to a minimum PID value. |
a3cb66a5 | 770 | PIDs of value ``pid_max`` or larger are not allocated. |
1da177e4 | 771 | |
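The wrap-around rule can be sketched in a few lines of Python. This is a simplified model. ``pid_min`` is a hypothetical parameter standing in for the unspecified minimum PID value, and the sketch ignores that the real allocator also skips PIDs that are still in use:

```python
def next_pid_candidate(last_pid: int, pid_max: int, pid_min: int) -> int:
    """Return the next PID value to try after last_pid.

    pid_min is a hypothetical stand-in for the kernel's minimum PID value
    after a wrap; this sketch does not model PIDs that are still in use.
    """
    candidate = last_pid + 1
    if candidate >= pid_max:  # PIDs of value pid_max or larger are never allocated
        candidate = pid_min   # wrap back to the minimum
    return candidate
```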
1da177e4 | 772 | |
a3cb66a5 SK |
773 | ns_last_pid |
774 | =========== | |
b8f566b0 PE |
775 | |
776 | The last PID allocated in the current PID namespace (the one the task | |
777 | using this sysctl lives in). When selecting a PID for the next task on | |
778 | fork, the kernel tries to allocate a number starting after this one. | |
779 | ||
b8f566b0 | 780 | |
a3cb66a5 SK |
781 | powersave-nap (PPC only) |
782 | ======================== | |
1da177e4 LT |
783 | |
784 | If set, Linux-PPC will use the 'nap' mode of powersaving, | |
785 | otherwise the 'doze' mode will be used. | |
786 | ||
a3cb66a5 | 787 | |
1da177e4 LT |
789 | ||
a3cb66a5 SK |
790 | printk |
791 | ====== | |
1da177e4 | 792 | |
a3cb66a5 SK |
793 | The four values in printk denote: ``console_loglevel``, |
794 | ``default_message_loglevel``, ``minimum_console_loglevel`` and | |
795 | ``default_console_loglevel`` respectively. | |
1da177e4 LT |
796 | |
797 | These values influence printk() behavior when printing or | |
a3cb66a5 | 798 | logging error messages. See '``man 2 syslog``' for more info on |
1da177e4 LT |
799 | the different loglevels. |
800 | ||
a3cb66a5 SK |
801 | ======================== ===================================== |
802 | console_loglevel messages with a higher priority than | |
803 | this will be printed to the console | |
804 | default_message_loglevel messages without an explicit priority | |
805 | will be printed with this priority | |
806 | minimum_console_loglevel minimum (highest) value to which | |
807 | console_loglevel can be set | |
808 | default_console_loglevel default value for console_loglevel | |
809 | ======================== ===================================== | |
1da177e4 | 810 | |
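Reading the file returns the four values on one line. This is a minimal Python sketch that parses them and applies the "higher priority than ``console_loglevel``" rule. It assumes the usual convention that a lower number means a higher priority (0 is the most severe level, as in ``man 2 syslog``):

```python
from typing import NamedTuple


class PrintkLevels(NamedTuple):
    console_loglevel: int
    default_message_loglevel: int
    minimum_console_loglevel: int
    default_console_loglevel: int


def parse_printk(text: str) -> PrintkLevels:
    """Parse the contents of /proc/sys/kernel/printk."""
    fields = [int(field) for field in text.split()]
    if len(fields) != 4:
        raise ValueError(f"expected 4 values, got {len(fields)}")
    return PrintkLevels(*fields)


def goes_to_console(message_level: int, levels: PrintkLevels) -> bool:
    """True if a message at message_level would be printed to the console.

    Lower numbers are higher priorities, so "higher priority than
    console_loglevel" means numerically smaller.
    """
    return message_level < levels.console_loglevel
```

Pass it the contents of ``/proc/sys/kernel/printk``, for example ``parse_printk(open("/proc/sys/kernel/printk").read())``.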
1da177e4 | 811 | |
a3cb66a5 SK |
812 | printk_delay |
813 | ============ | |
807094c0 | 814 | |
a3cb66a5 | 815 | Delay each printk message by ``printk_delay`` milliseconds. |
807094c0 BP |
816 | |
817 | Values from 0 to 10000 are allowed. | |
818 | ||
807094c0 | 819 | |
a3cb66a5 SK |
820 | printk_ratelimit |
821 | ================ | |
1da177e4 | 822 | |
a3cb66a5 | 823 | Some warning messages are rate limited. ``printk_ratelimit`` specifies |
ca30ad85 ON |
824 | the minimum length of time between these messages (in seconds). |
825 | The default value is 5 seconds. | |
1da177e4 LT |
826 | |
827 | A value of 0 will disable rate limiting. | |
828 | ||
1da177e4 | 829 | |
a3cb66a5 SK |
830 | printk_ratelimit_burst |
831 | ====================== | |
1da177e4 | 832 | |
a3cb66a5 | 833 | While `printk_ratelimit`_ bounds the long-term message rate, a burst |
1da177e4 | 834 | of messages is allowed to pass through. ``printk_ratelimit_burst`` |
a3cb66a5 | 835 | specifies the number of messages that can be sent within one |
1da177e4 LT |
836 | `printk_ratelimit`_ period before ratelimiting kicks in. |
837 | ||
ca30ad85 ON |
838 | The default value is 10 messages. |
839 | ||
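The two settings together describe a window-based limiter. This is a simplified Python model that assumes the behaviour of the kernel's generic ``___ratelimit()`` helper in ``lib/ratelimit.c``: at most ``burst`` messages are allowed per ``interval`` seconds. It works in seconds rather than jiffies and omits the kernel's "callbacks suppressed" report:

```python
from typing import Optional


class RateLimit:
    """Window-based limiter: at most `burst` messages per `interval` seconds.

    A simplified model of ___ratelimit(); not the kernel implementation.
    """

    def __init__(self, interval: float, burst: int) -> None:
        self.interval = interval  # printk_ratelimit, in seconds
        self.burst = burst        # printk_ratelimit_burst
        self.begin: Optional[float] = None  # start of the current window
        self.printed = 0          # messages allowed in this window
        self.missed = 0           # messages suppressed in this window

    def allow(self, now: float) -> bool:
        if self.interval == 0:
            return True           # an interval of 0 disables rate limiting
        if self.begin is None:
            self.begin = now
        if now > self.begin + self.interval:
            # The window has expired: start a new one.
            self.begin = now
            self.printed = 0
            self.missed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.missed += 1
        return False
```

With the defaults of 5 seconds and 10 messages, the first 10 calls in a window return ``True``. Later calls return ``False`` until the window expires.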
1da177e4 | 840 | |
a3cb66a5 SK |
841 | printk_devkmsg |
842 | ============== | |
53b95375 | 843 | |
a3cb66a5 | 844 | Control the logging to ``/dev/kmsg`` from userspace: |
53b95375 | 845 | |
a3cb66a5 SK |
846 | ========= ============================================= |
847 | ratelimit default, ratelimited | |
848 | on unlimited logging to /dev/kmsg from userspace | |
849 | off logging to /dev/kmsg disabled | |
850 | ========= ============================================= | |
750afe7b | 851 | |
a3cb66a5 | 852 | The kernel command line parameter ``printk.devkmsg=`` overrides this and is |
750afe7b BP |
853 | a one-time setting until next reboot: once set, it cannot be changed by |
854 | this sysctl interface anymore. | |
855 | ||
750afe7b | 857 | |
a3cb66a5 SK |
858 | |
859 | pty | |
860 | === | |
861 | ||
862 | See Documentation/filesystems/devpts.txt. | |
863 | ||
864 | ||
865 | randomize_va_space | |
866 | ================== | |
1ec7fd50 JK |
867 | |
868 | This option can be used to select the type of process address | |
869 | space randomization that is used in the system, for architectures | |
870 | that support this feature. | |
871 | ||
53b95375 MCC |
872 | == =========================================================================== |
873 | 0 Turn the process address space randomization off. This is the | |
b7f5ab6f HS |
874 | default for architectures that do not support this feature anyway, |
875 | and kernels that are booted with the "norandmaps" parameter. | |
1ec7fd50 | 876 | |
53b95375 | 877 | 1 Make the addresses of mmap base, stack and VDSO page randomized. |
1ec7fd50 | 878 | This, among other things, implies that shared libraries will be |
b7f5ab6f HS |
879 | loaded to random addresses. Also for PIE-linked binaries, the |
880 | location of code start is randomized. This is the default if the | |
a3cb66a5 | 881 | ``CONFIG_COMPAT_BRK`` option is enabled. |
1ec7fd50 | 882 | |
53b95375 | 883 | 2 Additionally enable heap randomization. This is the default if |
a3cb66a5 | 884 | ``CONFIG_COMPAT_BRK`` is disabled. |
b7f5ab6f HS |
885 | |
886 | There are a few legacy applications out there (such as some ancient | |
1ec7fd50 | 887 | versions of libc.so.5 from 1996) that assume that the brk area starts |
b7f5ab6f HS |
888 | just after the end of the code+bss. These applications break when the |
889 | start of the brk area is randomized. There are, however, no known | |
1ec7fd50 | 890 | non-legacy applications that would be broken this way, so for most |
b7f5ab6f HS |
891 | systems it is safe to choose full randomization. |
892 | ||
893 | Systems with ancient and/or broken binaries should be configured | |
a3cb66a5 | 894 | with ``CONFIG_COMPAT_BRK`` enabled, which excludes the heap from process |
b7f5ab6f | 895 | address space randomization. |
53b95375 | 896 | == =========================================================================== |
1ec7fd50 | 897 | |
1ec7fd50 | 898 | |
a3cb66a5 SK |
899 | real-root-dev |
900 | ============= | |
901 | ||
902 | See :doc:`/admin-guide/initrd`. | |
903 | ||
904 | ||
905 | reboot-cmd (SPARC only) | |
906 | ======================= | |
1da177e4 LT |
907 | |
908 | ??? This seems to be a way to give an argument to the Sparc | |
909 | ROM/Flash boot loader. Maybe to tell it what to do after | |
910 | rebooting. ??? | |
911 | ||
1da177e4 | 912 | |
a3cb66a5 SK |
913 | sched_energy_aware |
914 | ================== | |
8d5d0cfb QP |
915 | |
916 | Enables/disables Energy Aware Scheduling (EAS). EAS starts | |
917 | automatically on platforms where it can run (that is, | |
918 | platforms with asymmetric CPU topologies and having an Energy | |
919 | Model available). If your platform happens to meet the | |
920 | requirements for EAS but you do not want to use it, change | |
921 | this value to 0. | |
922 | ||
8d5d0cfb | 923 | |
a3cb66a5 SK |
924 | sched_schedstats |
925 | ================ | |
cb251765 MG |
926 | |
927 | Enables/disables scheduler statistics. Enabling this feature | |
928 | incurs a small amount of overhead in the scheduler but is | |
929 | useful for debugging and performance tuning. | |
930 | ||
cb251765 | 931 | |
a3cb66a5 SK |
932 | seccomp |
933 | ======= | |
934 | ||
935 | See :doc:`/userspace-api/seccomp_filter`. | |
936 | ||
937 | ||
938 | sg-big-buff | |
939 | =========== | |
1da177e4 LT |
940 | |
941 | This file shows the size of the generic SCSI (sg) buffer. | |
942 | You can't tune it just yet, but you could change it at | |
a3cb66a5 SK |
943 | compile time by editing ``include/scsi/sg.h`` and changing |
944 | the value of ``SG_BIG_BUFF``. | |
1da177e4 LT |
945 | |
946 | There shouldn't be any reason to change this value. If | |
947 | you can come up with one, you probably know what you | |
948 | are doing anyway :) | |
949 | ||
1da177e4 | 950 | |
a3cb66a5 SK |
951 | shmall |
952 | ====== | |
358e419f CALP |
953 | |
954 | This parameter sets the total amount of shared memory pages that | |
a3cb66a5 SK |
955 | can be used system wide. Hence, ``shmall`` should always be at least |
956 | ``ceil(shmmax/PAGE_SIZE)``. | |
358e419f | 957 | |
a3cb66a5 SK |
958 | If you are not sure what the default ``PAGE_SIZE`` is on your Linux |
959 | system, you can run the following command:: | |
358e419f | 960 | |
53b95375 | 961 | # getconf PAGE_SIZE |
358e419f | 962 | |
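The lower bound above is a ceiling division. This is a minimal Python sketch that checks it against the live values. It assumes the ``/proc/sys/kernel`` files are readable and uses ``os.sysconf("SC_PAGE_SIZE")``, which reports the same page size as ``getconf PAGE_SIZE``:

```python
import os


def min_shmall(shmmax: int, page_size: int) -> int:
    """Return ceil(shmmax / page_size) using integer arithmetic only."""
    # Integer ceiling division avoids float rounding for very large shmmax.
    return (shmmax + page_size - 1) // page_size


def shmall_is_sufficient(proc: str = "/proc/sys/kernel") -> bool:
    """True if the current shmall is at least ceil(shmmax / PAGE_SIZE)."""
    with open(os.path.join(proc, "shmmax")) as f:
        shmmax = int(f.read())
    with open(os.path.join(proc, "shmall")) as f:
        shmall = int(f.read())
    page_size = os.sysconf("SC_PAGE_SIZE")
    return shmall >= min_shmall(shmmax, page_size)
```

Integer arithmetic is used because ``shmmax`` can be very large, and a float division could round the result.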
358e419f | 963 | |
a3cb66a5 SK |
964 | shmmax |
965 | ====== | |
1da177e4 LT |
966 | |
967 | This value can be used to query and set the run time limit | |
968 | on the maximum shared memory segment size that can be created. | |
807094c0 | 969 | Shared memory segments up to 1Gb are now supported in the |
a3cb66a5 | 970 | kernel. This value defaults to ``SHMMAX``. |
1da177e4 | 971 | |
1da177e4 | 972 | |
a3cb66a5 SK |
973 | shmmni |
974 | ====== | |
975 | ||
fa5b5264 SK |
976 | This value determines the maximum number of shared memory segments. |
977 | 4096 by default (``SHMMNI``). | |
978 | ||
a3cb66a5 SK |
979 | |
980 | shm_rmid_forced | |
981 | =============== | |
b34a6b1d VK |
982 | |
983 | Linux lets you set resource limits, including how much memory one | |
a3cb66a5 | 984 | process can consume, via ``setrlimit(2)``. Unfortunately, shared memory |
b34a6b1d VK |
985 | segments are allowed to exist without association with any process, and |
986 | thus might not be counted against any resource limits. If enabled, | |
987 | shared memory segments are automatically destroyed when their attach | |
988 | count becomes zero after a detach or a process termination. It will | |
989 | also destroy segments that were created, but never attached to, on exit | |
a3cb66a5 | 990 | from the process. The only use left for ``IPC_RMID`` is to immediately |
b34a6b1d VK |
991 | destroy an unattached segment. Of course, this breaks the way things are |
992 | defined, so some applications might stop working. Note that this | |
993 | feature will do you no good unless you also configure your resource | |
a3cb66a5 | 994 | limits (in particular, ``RLIMIT_AS`` and ``RLIMIT_NPROC``). Most systems don't |
b34a6b1d VK |
995 | need this. |
996 | ||
997 | Note that if you change this from 0 to 1, already-created segments that | |
998 | have no users and whose originating process is dead will be destroyed. | |
999 | ||
b34a6b1d | 1000 | |
a3cb66a5 SK |
1001 | sysctl_writes_strict |
1002 | ==================== | |
f4aacea2 KC |
1003 | |
1004 | Control how file position affects the behavior of updating sysctl values | |
a3cb66a5 | 1005 | via the ``/proc/sys`` interface: |
f4aacea2 | 1006 | |
53b95375 MCC |
1007 | == ====================================================================== |
1008 | -1 Legacy per-write sysctl value handling, with no printk warnings. | |
f4aacea2 KC |
1009 | Each write syscall must fully contain the sysctl value to be |
1010 | written, and multiple writes on the same sysctl file descriptor | |
1011 | will rewrite the sysctl value, regardless of file position. | |
53b95375 | 1012 | 0 Same behavior as above, but warn about processes that perform writes |
41662f5c | 1013 | to a sysctl file descriptor when the file position is not 0. |
53b95375 | 1014 | 1 (default) Respect file position when writing sysctl strings. Multiple |
41662f5c KC |
1015 | writes will append to the sysctl value buffer. Anything past the max |
1016 | length of the sysctl value buffer will be ignored. Writes to numeric | |
1017 | sysctl entries must always be at file position 0 and the value must | |
1018 | be fully contained in the buffer sent in the write syscall. | |
53b95375 | 1019 | == ====================================================================== |
f4aacea2 | 1020 | |
f4aacea2 | 1021 | |
a3cb66a5 SK |
1022 | softlockup_all_cpu_backtrace |
1023 | ============================ | |
ed235875 AT |
1024 | |
1025 | This value controls whether the soft lockup detector thread gathers | |
1026 | further debug information when a soft lockup condition is detected. | |
1027 | If enabled, each CPU will be issued an NMI and instructed to capture | |
1028 | a stack trace. | |
1029 | ||
1030 | This feature is only applicable for architectures which support | |
1031 | NMI. | |
1032 | ||
a3cb66a5 SK |
1033 | = ============================================ |
1034 | 0 Do nothing. This is the default behavior. | |
1035 | 1 On detection capture more debug information. | |
1036 | = ============================================ | |
ed235875 | 1037 | |
ed235875 | 1038 | |
0a07bef6 GP |
1039 | softlockup_panic |
1040 | ================= | |
1041 | ||
1042 | This parameter can be used to control whether the kernel panics | |
1043 | when a soft lockup is detected. | |
1044 | ||
1045 | = ============================================ | |
1046 | 0 Don't panic on soft lockup. | |
1047 | 1 Panic on soft lockup. | |
1048 | = ============================================ | |
1049 | ||
1050 | This can also be set using the softlockup_panic kernel parameter. | |
1051 | ||
1052 | ||
a3cb66a5 SK |
1053 | soft_watchdog |
1054 | ============= | |
195daf66 UO |
1055 | |
1056 | This parameter can be used to control the soft lockup detector. | |
1057 | ||
a3cb66a5 SK |
1058 | = ================================= |
1059 | 0 Disable the soft lockup detector. | |
1060 | 1 Enable the soft lockup detector. | |
1061 | = ================================= | |
195daf66 UO |
1062 | |
1063 | The soft lockup detector monitors CPUs for threads that are hogging the CPUs | |
1064 | without rescheduling voluntarily, and thus preventing the 'watchdog/N' threads | |
1065 | from running. The mechanism depends on the CPUs' ability to respond to timer | |
1066 | interrupts, which are needed for the 'watchdog/N' threads to be woken up by | |
a3cb66a5 | 1067 | the watchdog timer function; otherwise, the NMI watchdog (if enabled) can |
195daf66 UO |
1068 | detect a hard lockup condition. |
1069 | ||
195daf66 | 1070 | |
a3cb66a5 SK |
1071 | stack_erasing |
1072 | ============= | |
964c9dff AP |
1073 | |
1074 | This parameter can be used to control kernel stack erasing at the end | |
a3cb66a5 | 1075 | of syscalls for kernels built with ``CONFIG_GCC_PLUGIN_STACKLEAK``. |
964c9dff AP |
1076 | |
1077 | That erasing reduces the information which kernel stack leak bugs | |
1078 | can reveal and blocks some uninitialized stack variable attacks. | |
1079 | The tradeoff is the performance impact: on a single CPU system kernel | |
1080 | compilation sees a 1% slowdown, other systems and workloads may vary. | |
1081 | ||
a3cb66a5 SK |
1082 | = ==================================================================== |
1083 | 0 Kernel stack erasing is disabled, STACKLEAK_METRICS are not updated. | |
1084 | 1 Kernel stack erasing is enabled (default), it is performed before | |
1085 | returning to the userspace at the end of syscalls. | |
1086 | = ==================================================================== | |
1087 | ||
1088 | ||
1089 | stop-a (SPARC only) | |
1090 | =================== | |
964c9dff | 1091 | |
a1ad4f15 SK |
1092 | Controls Stop-A: |
1093 | ||
1094 | = ==================================== | |
1095 | 0 Stop-A has no effect. | |
1096 | 1 Stop-A breaks to the PROM (default). | |
1097 | = ==================================== | |
1098 | ||
1099 | Stop-A is always enabled on a panic, so that the user can return to | |
1100 | the boot PROM. | |
1101 | ||
a3cb66a5 SK |
1102 | |
1103 | sysrq | |
1104 | ===== | |
1105 | ||
1106 | See :doc:`/admin-guide/sysrq`. | |
53b95375 | 1107 | |
964c9dff | 1108 | |
896dd323 | 1109 | tainted |
53b95375 | 1110 | ======= |
1da177e4 | 1111 | |
9c4560e5 KC |
1112 | Non-zero if the kernel has been tainted. The numeric values below can be |
1113 | ORed together. The letters are shown in the "Tainted" line of Oops reports. | |
1114 | ||
53b95375 MCC |
1115 | ====== ===== ============================================================== |
1116 | 1 `(P)` proprietary module was loaded | |
1117 | 2 `(F)` module was force loaded | |
1118 | 4 `(S)` SMP kernel oops on an officially SMP incapable processor | |
1119 | 8 `(R)` module was force unloaded | |
1120 | 16 `(M)` processor reported a Machine Check Exception (MCE) | |
1121 | 32 `(B)` bad page referenced or some unexpected page flags | |
1122 | 64 `(U)` taint requested by userspace application | |
1123 | 128 `(D)` kernel died recently, i.e. there was an OOPS or BUG | |
1124 | 256 `(A)` an ACPI table was overridden by user | |
1125 | 512 `(W)` kernel issued warning | |
1126 | 1024 `(C)` staging driver was loaded | |
1127 | 2048 `(I)` workaround for bug in platform firmware applied | |
1128 | 4096 `(O)` externally-built ("out-of-tree") module was loaded | |
1129 | 8192 `(E)` unsigned module was loaded | |
1130 | 16384 `(L)` soft lockup occurred | |
1131 | 32768 `(K)` kernel has been live patched | |
1132 | 65536 `(X)` auxiliary taint, defined and used by distros | |
1133 | 131072 `(T)` the kernel was built with the struct randomization plugin | |
1134 | ====== ===== ============================================================== | |
896dd323 | 1135 | |
a3cb66a5 | 1136 | See :doc:`/admin-guide/tainted-kernels` for more information. |
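Bit N of the value corresponds to row N of the table (1 is bit 0, 2 is bit 1, and so on), so a value can be decoded into the letters listed above. This is a minimal Python sketch. The letter string is transcribed from the table, and any higher bit is shown as unknown because this table does not cover it. The output is a compact list of set flags, not the exact layout of the Oops "Tainted" line:

```python
# Letters for bits 0..17, transcribed in order from the table above.
TAINT_LETTERS = "PFSRMBUDAWCIOELKXT"


def decode_taint(value: int) -> str:
    """Return the letters of all taint flags set in `value`."""
    letters = [ch for bit, ch in enumerate(TAINT_LETTERS) if (value >> bit) & 1]
    # Bits beyond the table are reported as a hex remainder.
    unknown = value & ~((1 << len(TAINT_LETTERS)) - 1)
    if unknown:
        letters.append(f"?({unknown:#x})")
    return "".join(letters)


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the current taint value."""
    with open(path) as f:
        return int(f.read())
```

For example, a value of 4608 (4096 + 512) decodes to ``"WO"``. This means the kernel issued a warning and an out-of-tree module was loaded.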
1da177e4 | 1137 | |
760df93e | 1138 | |
a3cb66a5 SK |
1139 | threads-max |
1140 | =========== | |
0ec62afe HS |
1141 | |
1142 | This value controls the maximum number of threads that can be created | |
a3cb66a5 | 1143 | using ``fork()``. |
0ec62afe HS |
1144 | |
1145 | During initialization the kernel sets this value such that even if the | |
1146 | maximum number of threads is created, the thread structures occupy only | |
1147 | a part (1/8th) of the available RAM pages. | |
1148 | ||
a3cb66a5 | 1149 | The minimum value that can be written to ``threads-max`` is 1. |
53b95375 | 1150 | |
a3cb66a5 SK |
1151 | The maximum value that can be written to ``threads-max`` is given by the |
1152 | constant ``FUTEX_TID_MASK`` (0x3fffffff). | |
53b95375 | 1153 | |
a3cb66a5 SK |
1154 | If a value outside of this range is written to ``threads-max`` an |
1155 | ``EINVAL`` error occurs. | |
0ec62afe | 1156 | |
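The documented range check can be mirrored in userspace before writing, for example in a configuration tool. This is a minimal Python sketch. ``validate_threads_max`` is a hypothetical helper, and ``ValueError`` stands in for the ``EINVAL`` the kernel would return:

```python
# Bounds for threads-max as documented above.
THREADS_MAX_MIN = 1
FUTEX_TID_MASK = 0x3FFFFFFF  # maximum value, 1073741823


def validate_threads_max(value: int) -> int:
    """Return value if it is in the documented range, else raise ValueError."""
    if not THREADS_MAX_MIN <= value <= FUTEX_TID_MASK:
        raise ValueError(
            f"threads-max must be in [{THREADS_MAX_MIN}, {FUTEX_TID_MASK:#x}], got {value}"
        )
    return value
```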
0ec62afe | 1157 | |
a3cb66a5 SK |
1158 | unknown_nmi_panic |
1159 | ================= | |
760df93e | 1160 | |
807094c0 BP |
1161 | The value in this file determines how unknown NMIs are handled. When the |
1162 | value is non-zero, an unknown NMI is trapped and a panic occurs. At | |
1163 | that time, kernel debugging information is displayed on the console. | |
760df93e | 1164 | |
807094c0 BP |
1165 | For example, the NMI switch that most IA32 servers have fires an unknown |
1166 | NMI. If a system hangs, try pressing the NMI switch. | |
08825c90 | 1167 | |
08825c90 | 1168 | |
a3cb66a5 SK |
1169 | watchdog |
1170 | ======== | |
195daf66 UO |
1171 | |
1172 | This parameter can be used to disable or enable the soft lockup detector | |
a3cb66a5 | 1173 | *and* the NMI watchdog (i.e. the hard lockup detector) at the same time. |
195daf66 | 1174 | |
a3cb66a5 SK |
1175 | = ============================== |
1176 | 0 Disable both lockup detectors. | |
1177 | 1 Enable both lockup detectors. | |
1178 | = ============================== | |
195daf66 UO |
1179 | |
1180 | The soft lockup detector and the NMI watchdog can also be disabled or | |
a3cb66a5 SK |
1181 | enabled individually, using the ``soft_watchdog`` and ``nmi_watchdog`` |
1182 | parameters. | |
1183 | If the ``watchdog`` parameter is read, for example by executing:: | |
195daf66 UO |
1184 | |
1185 | cat /proc/sys/kernel/watchdog | |
1186 | ||
a3cb66a5 SK |
1187 | the output of this command (0 or 1) shows the logical OR of |
1188 | ``soft_watchdog`` and ``nmi_watchdog``. | |
195daf66 | 1189 | |
195daf66 | 1190 | |
a3cb66a5 SK |
1191 | watchdog_cpumask |
1192 | ================ | |
fe4ba3c3 CM |
1193 | |
1194 | This value can be used to control on which CPUs the watchdog may run. | |
a3cb66a5 | 1195 | The default cpumask is all possible cores, but if ``NO_HZ_FULL`` is |
fe4ba3c3 | 1196 | enabled in the kernel config, and cores are specified with the |
a3cb66a5 | 1197 | ``nohz_full=`` boot argument, those cores are excluded by default. |
fe4ba3c3 CM |
1198 | Offline cores can be included in this mask, and if the core is later |
1199 | brought online, the watchdog will be started based on the mask value. | |
1200 | ||
a3cb66a5 | 1201 | Typically this value would only be touched in the ``nohz_full`` case |
fe4ba3c3 CM |
1202 | to re-enable cores that by default were not running the watchdog, |
1203 | if a kernel lockup was suspected on those cores. | |
1204 | ||
1205 | The argument value is the standard cpulist format for cpumasks, | |
1206 | so for example to enable the watchdog on cores 0, 2, 3, and 4 you | |
53b95375 | 1207 | might say:: |
fe4ba3c3 CM |
1208 | |
1209 | echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask | |
1210 | ||
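Cpulist strings like the one in this example can be converted to and from sets of CPU numbers. This is a minimal Python sketch that handles only comma-separated numbers and ``a-b`` ranges, as used in the example above. The kernel's parser may accept further forms:

```python
def parse_cpulist(text: str) -> set:
    """Expand a cpulist such as "0,2-4" into {0, 2, 3, 4}."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpulist(cpus) -> str:
    """Collapse a collection of CPU numbers back into cpulist form."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu  # extend the current run
        else:
            ranges.append([cpu, cpu])  # start a new run
    return ",".join(f"{a}-{b}" if a != b else str(a) for a, b in ranges)
```

Reading ``/proc/sys/kernel/watchdog_cpumask`` and passing the text to ``parse_cpulist`` gives the set of CPUs on which the watchdog may run.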
fe4ba3c3 | 1211 | |
a3cb66a5 SK |
1212 | watchdog_thresh |
1213 | =============== | |
08825c90 LZ |
1214 | |
1215 | This value can be used to control the frequency of hrtimer and NMI | |
1216 | events and the soft and hard lockup thresholds. The default threshold | |
1217 | is 10 seconds. | |
1218 | ||
a3cb66a5 | 1219 | The softlockup threshold is (``2 * watchdog_thresh``). Setting this |
08825c90 | 1220 | tunable to zero will disable lockup detection altogether. |