===================================
Documentation for /proc/sys/kernel/
===================================

.. See scripts/check-sysctl-docs to keep this up to date


Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>

Copyright (c) 2009, Shen Feng <shen@cn.fujitsu.com>

For general info and legal blurb, please look in
Documentation/admin-guide/sysctl/index.rst.

------------------------------------------------------------------------------

This file contains documentation for the sysctl files in
``/proc/sys/kernel/``.

The files in this directory can be used to tune and monitor
miscellaneous and general things in the operation of the Linux
kernel. Since some of the files *can* be used to screw up your
system, it is advisable to read both documentation and source
before actually making adjustments.

Currently, these files might (depending on your configuration)
show up in ``/proc/sys/kernel``:

.. contents:: :local:


acct
====

::

    highwater lowwater frequency

If BSD-style process accounting is enabled, these values control
its behaviour. If free space on the filesystem where the log lives
goes below ``lowwater``\ %, accounting suspends. If free space gets
above ``highwater``\ %, accounting resumes. ``frequency`` determines
how often the amount of free space is checked (value is in
seconds). Default:

::

    4    2    30

That is, suspend accounting if free space drops below 2%; resume it
if it increases to at least 4%; consider information about the amount
of free space valid for 30 seconds.


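As an illustration only, the hypothetical shell function
``describe_acct`` below splits the three values (which may be separated
by spaces or tabs) and restates them in the terms used above. It is a
sketch assuming a POSIX shell, not a tool shipped with the kernel::

    # describe_acct is a hypothetical helper, not part of any kernel tool.
    # Its argument is the contents of /proc/sys/kernel/acct:
    # "highwater lowwater frequency".
    describe_acct() {
        # Deliberately unquoted: let the shell split on spaces or tabs.
        set -- $1
        printf 'suspend below %s%%, resume at %s%%, recheck every %ss\n' \
            "$2" "$1" "$3"
    }

On a system with BSD process accounting configured,
``describe_acct "$(cat /proc/sys/kernel/acct)"`` would print
``suspend below 2%, resume at 4%, recheck every 30s`` for the default
values.

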
acpi_video_flags
================

See Documentation/power/video.rst. This allows the video resume mode to be set,
in a similar fashion to the ``acpi_sleep`` kernel parameter, by
combining the following values:

= =======
1 s3_bios
2 s3_mode
4 s3_beep
= =======

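For example, selecting both ``s3_bios`` and ``s3_mode`` means writing
their bitwise OR, 3. A minimal shell sketch (the variable names are
illustrative, not kernel identifiers)::

    # Bit values from the table above.
    s3_bios=1 s3_mode=2 s3_beep=4

    # Select s3_bios and s3_mode, but not s3_beep.
    flags=$(( s3_bios | s3_mode ))
    echo "$flags"

    # As root, the value could then be applied with:
    #   echo "$flags" > /proc/sys/kernel/acpi_video_flags
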
arch
====

The machine hardware name, the same output as ``uname -m``
(e.g. ``x86_64`` or ``aarch64``).

auto_msgmni
===========

This variable has no effect and may be removed in future kernel
releases. Reading it always returns 0.
Up to Linux 3.17, it enabled/disabled automatic recomputing of
`msgmni`_ upon memory add/remove or upon IPC namespace creation/removal.
Echoing "1" into this file enabled msgmni automatic recomputing.
Echoing "0" turned it off. The default value was 1.


bootloader_type (x86 only)
==========================

This gives the bootloader type number as indicated by the bootloader,
shifted left by 4, and OR'd with the low four bits of the bootloader
version. The reason for this encoding is that this used to match the
``type_of_loader`` field in the kernel header; the encoding is kept for
backwards compatibility. That is, if the full bootloader type number
is 0x15 and the full version number is 0x234, this file will contain
the value 340 = 0x154.

See the ``type_of_loader`` and ``ext_loader_type`` fields in
Documentation/arch/x86/boot.rst for additional information.


bootloader_version (x86 only)
=============================

The complete bootloader version number. In the example above, this
file will contain the value 564 = 0x234.

See the ``type_of_loader`` and ``ext_loader_ver`` fields in
Documentation/arch/x86/boot.rst for additional information.


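The ``bootloader_type`` encoding can be checked with shell arithmetic.
The two functions below are hypothetical helpers written for this
illustration; they reproduce the example values 0x15 and 0x234::

    # Build a bootloader_type value from a full type and a full version:
    # the type shifted left by 4, OR'd with the low 4 bits of the version.
    encode_bootloader_type() {
        echo $(( ($1 << 4) | ($2 & 0xf) ))
    }

    # Split a bootloader_type value back into the type and the low
    # 4 bits of the version (the full version is in bootloader_version).
    decode_bootloader_type() {
        printf 'type=0x%x version_low=0x%x\n' $(( $1 >> 4 )) $(( $1 & 0xf ))
    }

    encode_bootloader_type 0x15 0x234    # prints 340
    decode_bootloader_type 340           # prints type=0x15 version_low=0x4

Only the low four bits of the version survive in ``bootloader_type``,
which is why the complete version number has its own file.

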
bpf_stats_enabled
=================

Controls whether the kernel should collect statistics on BPF programs
(total time spent running, number of times run...). Enabling
statistics causes a slight reduction in performance on each program
run. The statistics can be seen using ``bpftool``.

= ===================================
0 Don't collect statistics (default).
1 Collect statistics.
= ===================================


cad_pid
=======

This is the pid which will be signalled on reboot (notably, by
Ctrl-Alt-Delete). Writing a value to this file which doesn't
correspond to a running process will result in ``-ESRCH``.

See also `ctrl-alt-del`_.


cap_last_cap
============

Highest valid capability of the running kernel. Exports
``CAP_LAST_CAP`` from the kernel.


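One use of ``cap_last_cap`` is to build a mask covering every
capability the running kernel supports. The sketch below assumes a
shell with 64-bit arithmetic and a ``cap_last_cap`` below 63;
``cap_mask`` is a hypothetical helper::

    # cap_mask CAP_LAST_CAP
    # Print the 64-bit mask with bits 0..CAP_LAST_CAP set, as 16 hex
    # digits. Assumes CAP_LAST_CAP < 63 so the shift does not overflow.
    cap_mask() {
        printf '%016x\n' $(( (1 << ($1 + 1)) - 1 ))
    }

    cap_mask 40    # prints 000001ffffffffff

On a live system the argument would be read from
``/proc/sys/kernel/cap_last_cap``; 40 is only an example value. For a
process whose bounding set has not been reduced, the result should
match the ``CapBnd`` line of ``/proc/self/status``.

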
.. _core_pattern:

core_pattern
============

``core_pattern`` is used to specify a core dumpfile pattern name.

* max length 127 characters; default value is "core"
* ``core_pattern`` is used as a pattern template for the output
  filename; certain string patterns (beginning with '%') are
  substituted with their actual values.
* backward compatibility with ``core_uses_pid``:

    If ``core_pattern`` does not include "%p" (default does not)
    and ``core_uses_pid`` is set, then .PID will be appended to
    the filename.

* corename format specifiers

    ======== ==========================================
    %<NUL>   '%' is dropped
    %%       output one '%'
    %p       pid
    %P       global pid (init PID namespace)
    %i       tid
    %I       global tid (init PID namespace)
    %u       uid (in initial user namespace)
    %g       gid (in initial user namespace)
    %d       dump mode, matches ``PR_SET_DUMPABLE`` and
             ``/proc/sys/fs/suid_dumpable``
    %s       signal number
    %t       UNIX time of dump
    %h       hostname
    %e       executable filename (may be shortened, could be changed by prctl etc)
    %f       executable filename
    %E       executable path
    %c       maximum size of core file by resource limit RLIMIT_CORE
    %C       CPU the task ran on
    %<OTHER> both are dropped
    ======== ==========================================

* If the first character of the pattern is a '|', the kernel will treat
  the rest of the pattern as a command to run. The core dump will be
  written to the standard input of that program instead of to a file.


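As a sketch of the pipe mode, assume a hypothetical helper installed as
``/usr/local/sbin/core-helper`` and registered with the pattern
``|/usr/local/sbin/core-helper %p %e``. The expanded specifiers become
the helper's arguments and the core image arrives on its standard
input, so the helper only has to copy stdin somewhere. The body is
written as a shell function here so it can be tried without a real
crash; the ``CORE_DIR`` variable and the paths are assumptions::

    # Register it (as root) with:
    #   echo '|/usr/local/sbin/core-helper %p %e' > /proc/sys/kernel/core_pattern
    # $1 is the expanded %p (pid), $2 the expanded %e (executable name).
    core_helper() {
        dir=${CORE_DIR:-/var/crash}     # assumed output directory
        mkdir -p "$dir" || return 1
        cat > "$dir/core.$2.$1"         # the core image is read from stdin
    }

    # Simulate a crash of "a.out" with pid 1234 using fake data.
    printf 'fake core' | CORE_DIR=/tmp/core-demo core_helper 1234 a.out

A real helper would also need to respect `core_pipe_limit`_ and exit
promptly, since the crashing process may not be reaped until it does.

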
core_pipe_limit
===============

This sysctl is only applicable when `core_pattern`_ is configured to
pipe core files to a user space helper (when the first character of
``core_pattern`` is a '|', see above).
When collecting cores via a pipe to an application, it is occasionally
useful for the collecting application to gather data about the
crashing process from its ``/proc/pid`` directory.
In order to do this safely, the kernel must wait for the collecting
process to exit, so as not to remove the crashing process's proc files
prematurely.
This in turn creates the possibility that a misbehaving userspace
collecting process can block the reaping of a crashed process simply
by never exiting.
This sysctl defends against that.
It defines how many concurrent crashing processes may be piped to user
space applications in parallel.
If this value is exceeded, then those crashing processes above that
value are noted via the kernel log and their cores are skipped.
0 is a special value, indicating that unlimited processes may be
captured in parallel, but that no waiting will take place (i.e. the
collecting process is not guaranteed access to
``/proc/<crashing pid>/``).
This value defaults to 0.


core_sort_vma
=============

The default coredump writes VMAs in address order. By setting
``core_sort_vma`` to 1, VMAs will be written from smallest size
to largest size. This is known to break at least elfutils, but
can be handy when dealing with very large (and truncated)
coredumps where the more useful debugging details are included
in the smaller VMAs.


core_uses_pid
=============

The default coredump filename is "core". By setting
``core_uses_pid`` to 1, the coredump filename becomes core.PID.
If `core_pattern`_ does not include "%p" (default does not)
and ``core_uses_pid`` is set, then .PID will be appended to
the filename.


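The naming rule can be modelled in a few lines of shell.
``core_file_name`` below is a hypothetical helper: it only understands
the ``%p`` specifier and ignores piped patterns (those starting with
'|'), so treat it as an illustration of the rule above rather than a
reimplementation of the kernel's formatting::

    # core_file_name PATTERN CORE_USES_PID PID
    core_file_name() {
        pattern=$1 uses_pid=$2 pid=$3
        # Substitute every %p with the pid (no other specifiers handled).
        name=$(printf '%s' "$pattern" | sed "s/%p/$pid/g")
        case $pattern in
            *%p*) ;;    # pid already in the name: nothing is appended
            *) if [ "$uses_pid" -ne 0 ]; then name=$name.$pid; fi ;;
        esac
        printf '%s\n' "$name"
    }

    core_file_name core 0 1234         # prints core
    core_file_name core 1 1234         # prints core.1234
    core_file_name 'core.%p' 1 1234    # prints core.1234

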
ctrl-alt-del
============

When the value in this file is 0, ctrl-alt-del is trapped and
sent to the ``init(1)`` program to handle a graceful restart.
When, however, the value is > 0, Linux's reaction to a Vulcan
Nerve Pinch (tm) will be an immediate reboot, without even
syncing its dirty buffers.

Note:
  when a program (like dosemu) has the keyboard in 'raw'
  mode, the ctrl-alt-del is intercepted by the program before it
  ever reaches the kernel tty layer, and it's up to the program
  to decide what to do with it.


dmesg_restrict
==============

This toggle indicates whether unprivileged users are prevented
from using ``dmesg(8)`` to view messages from the kernel's log
buffer.
When ``dmesg_restrict`` is set to 0 there are no restrictions.
When ``dmesg_restrict`` is set to 1, users must have
``CAP_SYSLOG`` to use ``dmesg(8)``.

The kernel config option ``CONFIG_SECURITY_DMESG_RESTRICT`` sets the
default value of ``dmesg_restrict``.


domainname & hostname
=====================

These files can be used to set the NIS/YP domainname and the
hostname of your box in exactly the same way as the commands
domainname and hostname, i.e.::

    # echo "darkstar" > /proc/sys/kernel/hostname
    # echo "mydomain" > /proc/sys/kernel/domainname

has the same effect as::

    # hostname "darkstar"
    # domainname "mydomain"

Note, however, that the classic darkstar.frop.org has the
hostname "darkstar" and DNS (Internet Domain Name System)
domainname "frop.org", not to be confused with the NIS (Network
Information Service) or YP (Yellow Pages) domainname. These two
domain names are in general different. For a detailed discussion
see the ``hostname(1)`` man page.


firmware_config
===============

See Documentation/driver-api/firmware/fallback-mechanisms.rst.

The entries in this directory allow the firmware loader helper
fallback to be controlled:

* ``force_sysfs_fallback``, when set to 1, forces the use of the
  fallback;
* ``ignore_sysfs_fallback``, when set to 1, ignores any fallback.


ftrace_dump_on_oops
===================

Determines whether ``ftrace_dump()`` should be called on an oops (or
kernel panic). This will output the contents of the ftrace buffers to
the console. This is very useful for capturing traces that lead to
crashes and outputting them to a serial console.

======================= ===========================================
0                       Disabled (default).
1                       Dump buffers of all CPUs.
2(orig_cpu)             Dump the buffer of the CPU that triggered the
                        oops.
<instance>              Dump the specific instance buffer on all CPUs.
<instance>=2(orig_cpu)  Dump the specific instance buffer on the CPU
                        that triggered the oops.
======================= ===========================================

Dumping multiple instances is also supported; separate the instances
with commas. If the global buffer also needs to be dumped, specify its
dump mode (1/2/orig_cpu) first.

For example, to dump the "foo" and "bar" instance buffers on all CPUs,
use::

    echo "foo,bar" > /proc/sys/kernel/ftrace_dump_on_oops

To dump the global buffer and the "foo" instance buffer on all
CPUs, along with the "bar" instance buffer on the CPU that triggered
the oops, use::

    echo "1,foo,bar=2" > /proc/sys/kernel/ftrace_dump_on_oops

ftrace_enabled, stack_tracer_enabled
====================================

See Documentation/trace/ftrace.rst.


hardlockup_all_cpu_backtrace
============================

This value controls whether the hard lockup detector gathers further
debug information when a hard lockup condition is detected. If
enabled, arch-specific all-CPU stack dumping will be initiated.

= ============================================
0 Do nothing. This is the default behavior.
1 On detection capture more debug information.
= ============================================


hardlockup_panic
================

This parameter can be used to control whether the kernel panics
when a hard lockup is detected.

= ===========================
0 Don't panic on hard lockup.
1 Panic on hard lockup.
= ===========================

See Documentation/admin-guide/lockup-watchdogs.rst for more information.
This can also be set using the nmi_watchdog kernel parameter.


hotplug
=======

Path for the hotplug policy agent.
Default value is ``CONFIG_UEVENT_HELPER_PATH``, which in turn defaults
to the empty string.

This file only exists when ``CONFIG_UEVENT_HELPER`` is enabled. Most
modern systems rely exclusively on the netlink-based uevent source and
don't need this.


hung_task_all_cpu_backtrace
===========================

If this option is set, the kernel will send an NMI to all CPUs to dump
their backtraces when a hung task is detected. This file shows up if
``CONFIG_DETECT_HUNG_TASK`` and ``CONFIG_SMP`` are enabled.

= =============================================================
0 Don't show all CPUs' backtraces when a hung task is detected.
  This is the default behavior.
1 Non-maskably interrupt all CPUs and dump their backtraces
  when a hung task is detected.
= =============================================================


hung_task_panic
===============

Controls the kernel's behavior when a hung task is detected.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

= =================================================
0 Continue operation. This is the default behavior.
1 Panic immediately.
= =================================================


hung_task_check_count
=====================

The upper bound on the number of tasks that are checked.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.


hung_task_detect_count
======================

Indicates the total number of tasks that have been detected as hung since
the system boot.

This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.


hung_task_timeout_secs
======================

When a task in D state has not been scheduled for more than this
many seconds, a warning is reported.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

0 means infinite timeout, no checking is done.

Possible values to set are in range {0:``LONG_MAX``/``HZ``}.


hung_task_check_interval_secs
=============================

Hung task check interval. If hung task checking is enabled
(see `hung_task_timeout_secs`_), the check is done every
``hung_task_check_interval_secs`` seconds.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

0 (default) means use ``hung_task_timeout_secs`` as checking
interval.

Possible values to set are in range {0:``LONG_MAX``/``HZ``}.


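The interplay between the two settings can be summarised as a small
shell function. ``effective_check_interval`` is a hypothetical helper
that follows only the two rules stated above (a timeout of 0 disables
checking; an interval of 0 falls back to the timeout) and does not
model any further adjustment the kernel may make::

    # effective_check_interval TIMEOUT INTERVAL
    # Arguments as read from hung_task_timeout_secs and
    # hung_task_check_interval_secs.
    effective_check_interval() {
        timeout=$1 interval=$2
        if [ "$timeout" -eq 0 ]; then
            echo disabled               # infinite timeout: no checking
        elif [ "$interval" -eq 0 ]; then
            echo "$timeout"             # default: reuse the timeout
        else
            echo "$interval"
        fi
    }

    effective_check_interval 120 0     # prints 120
    effective_check_interval 120 30    # prints 30
    effective_check_interval 0 30      # prints disabled

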
hung_task_warnings
==================

The maximum number of warnings to report. During a check interval
if a hung task is detected, this value is decreased by 1.
When this value reaches 0, no more warnings will be reported.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

-1: report an infinite number of warnings.


hyperv_record_panic_msg
=======================

Controls whether the panic kmsg data should be reported to Hyper-V.

= =========================================================
0 Do not report panic kmsg data.
1 Report the panic kmsg data. This is the default behavior.
= =========================================================


ignore-unaligned-usertrap
=========================

On architectures where unaligned accesses cause traps, and where this
feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN``;
currently, ``arc``, ``parisc`` and ``loongarch``), controls whether all
unaligned traps are logged.

= =============================================================
0 Log all unaligned accesses.
1 Only warn the first time a process traps. This is the default
  setting.
= =============================================================

See also `unaligned-trap`_.

io_uring_disabled
=================

Restricts which processes can create new io_uring instances. Enabling
this shrinks the kernel's attack surface.

= ======================================================================
0 All processes can create io_uring instances as normal. This is the
  default setting.
1 io_uring creation is disabled (io_uring_setup() will fail with
  -EPERM) for unprivileged processes not in the io_uring_group group.
  Existing io_uring instances can still be used. See the
  documentation for io_uring_group for more information.
2 io_uring creation is disabled for all processes. io_uring_setup()
  always fails with -EPERM. Existing io_uring instances can still be
  used.
= ======================================================================


io_uring_group
==============

When io_uring_disabled is set to 1, a process must either be
privileged (CAP_SYS_ADMIN) or be in the io_uring_group group in order
to create an io_uring instance. If io_uring_group is set to -1 (the
default), only processes with the CAP_SYS_ADMIN capability may create
io_uring instances.


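The combined effect of ``io_uring_disabled`` and ``io_uring_group`` can
be summarised as a small decision function. This is a hypothetical
model of the rules described above, not kernel code; the caller
supplies whether the process is in ``io_uring_group`` (pass 0 when the
group is -1) and whether it has CAP_SYS_ADMIN::

    # io_uring_allowed DISABLED IN_GROUP HAS_CAP_SYS_ADMIN
    # Returns success (0) if a new io_uring instance may be created.
    io_uring_allowed() {
        disabled=$1 in_group=$2 cap_sys_admin=$3
        case $disabled in
            0) return 0 ;;    # no restriction
            1) [ "$cap_sys_admin" -eq 1 ] || [ "$in_group" -eq 1 ] ;;
            *) return 1 ;;    # 2: creation disabled for everyone
        esac
    }

    if io_uring_allowed 1 1 0; then echo allowed; else echo denied; fi

Existing instances are unaffected in every case; the function only
answers whether a new one may be created.

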
kexec_load_disabled
===================

A toggle indicating if the syscalls ``kexec_load`` and
``kexec_file_load`` have been disabled.
This value defaults to 0 (false: ``kexec_*load`` enabled), but can be
set to 1 (true: ``kexec_*load`` disabled).
Once true, kexec can no longer be used, and the toggle cannot be set
back to false.
This allows a kexec image to be loaded before disabling the syscall,
allowing a system to set up (and later use) an image without it being
altered.
Generally used together with the `modules_disabled`_ sysctl.

kexec_load_limit_panic
======================

This parameter specifies a limit to the number of times the syscalls
``kexec_load`` and ``kexec_file_load`` can be called with a crash
image. It can only be set with a more restrictive value than the
current one.

== ======================================================
-1 Unlimited calls to kexec. This is the default setting.
N  Number of calls left.
== ======================================================

kexec_load_limit_reboot
=======================

Similar functionality to ``kexec_load_limit_panic``, but for a normal
image.

7984754b | 549 | |
a3cb66a5 SK |
550 | kptr_restrict |
551 | ============= | |
455cd5ab DR |
552 | |
553 | This toggle indicates whether restrictions are placed on | |
a3cb66a5 SK |
554 | exposing kernel addresses via ``/proc`` and other interfaces. |
555 | ||
556 | When ``kptr_restrict`` is set to 0 (the default) the address is hashed | |
557 | before printing. | |
558 | (This is the equivalent to %p.) | |
559 | ||
560 | When ``kptr_restrict`` is set to 1, kernel pointers printed using the | |
561 | %pK format specifier will be replaced with 0s unless the user has | |
562 | ``CAP_SYSLOG`` and effective user and group ids are equal to the real | |
563 | ids. | |
564 | This is because %pK checks are done at read() time rather than open() | |
565 | time, so if permissions are elevated between the open() and the read() | |
566 | (e.g via a setuid binary) then %pK will not leak kernel pointers to | |
567 | unprivileged users. | |
568 | Note, this is a temporary solution only. | |
569 | The correct long-term solution is to do the permission checks at | |
570 | open() time. | |
571 | Consider removing world read permissions from files that use %pK, and | |
572 | using `dmesg_restrict`_ to protect against uses of %pK in ``dmesg(8)`` | |
573 | if leaking kernel pointer values to unprivileged users is a concern. | |
574 | ||
575 | When ``kptr_restrict`` is set to 2, kernel pointers printed using | |
576 | %pK will be replaced with 0s regardless of privileges. | |
577 | ||
578 | ||
modprobe
========

The full path to the usermode helper for autoloading kernel modules,
by default ``CONFIG_MODPROBE_PATH``, which in turn defaults to
"/sbin/modprobe". This binary is executed when the kernel requests a
module. For example, if userspace passes an unknown filesystem type
to mount(), then the kernel will automatically request the
corresponding filesystem module by executing this usermode helper.
This usermode helper should insert the needed module into the kernel.

This sysctl only affects module autoloading. It has no effect on the
ability to explicitly insert modules.

This sysctl can be used to debug module loading requests::

    echo '#! /bin/sh' > /tmp/modprobe
    echo 'echo "$@" >> /tmp/modprobe.log' >> /tmp/modprobe
    echo 'exec /sbin/modprobe "$@"' >> /tmp/modprobe
    chmod a+x /tmp/modprobe
    echo /tmp/modprobe > /proc/sys/kernel/modprobe

Alternatively, if this sysctl is set to the empty string, then module
autoloading is completely disabled. The kernel will not try to
execute a usermode helper at all, nor will it call the
kernel_module_request LSM hook.

If CONFIG_STATIC_USERMODEHELPER=y is set in the kernel configuration,
then the configured static usermode helper overrides this sysctl,
except that the empty string is still accepted to completely disable
module autoloading as described above.

modules_disabled
================

A toggle value indicating if modules are allowed to be loaded
in an otherwise modular kernel. This toggle defaults to off
(0), but can be set true (1). Once true, modules can be
neither loaded nor unloaded, and the toggle cannot be set back
to false. Generally used with the `kexec_load_disabled`_ toggle.


.. _msgmni:

msgmax, msgmnb, and msgmni
==========================

``msgmax`` is the maximum size of an IPC message, in bytes. 8192 by
default (``MSGMAX``).

``msgmnb`` is the maximum size of an IPC queue, in bytes. 16384 by
default (``MSGMNB``).

``msgmni`` is the maximum number of IPC queues. 32000 by default
(``MSGMNI``).

All of these parameters are set per IPC namespace. The maximum number of bytes
in POSIX message queues is limited by ``RLIMIT_MSGQUEUE``. This limit is
respected hierarchically in each user namespace.

msg_next_id, sem_next_id, and shm_next_id (System V IPC)
========================================================

These three toggles allow specifying the desired ID for the next
allocated IPC object: message, semaphore, or shared memory, respectively.

By default they are equal to -1, which means generic allocation logic.
Possible values to set are in range {0:``INT_MAX``}.

Notes:
  1) The kernel doesn't guarantee that the new object will have the
     desired ID. So, it's up to userspace how to handle an object with
     the "wrong" ID.
  2) A toggle with a non-default value will be set back to -1 by the
     kernel after a successful IPC object allocation. If an IPC object
     allocation syscall fails, it is undefined whether the value remains
     unmodified or is reset to -1.


ngroups_max
===========

Maximum number of supplementary groups, i.e. the maximum size which
``setgroups`` will accept. Exports ``NGROUPS_MAX`` from the kernel.


nmi_watchdog
============

This parameter can be used to control the NMI watchdog
(i.e. the hard lockup detector) on x86 systems.

= =================================
0 Disable the hard lockup detector.
1 Enable the hard lockup detector.
= =================================

The hard lockup detector monitors each CPU for its ability to respond to
timer interrupts. The mechanism utilizes CPU performance counter registers
that are programmed to generate Non-Maskable Interrupts (NMIs) periodically
while a CPU is busy. Hence, the alternative name 'NMI watchdog'.

The NMI watchdog is disabled by default if the kernel is running as a guest
in a KVM virtual machine. This default can be overridden by adding::

    nmi_watchdog=1

to the guest kernel command line (see
Documentation/admin-guide/kernel-parameters.rst).


nmi_wd_lpm_factor (PPC only)
============================

Factor to apply to the NMI watchdog timeout (only when ``nmi_watchdog`` is
set to 1). This factor represents the percentage added to
``watchdog_thresh`` when calculating the NMI watchdog timeout during an
LPM. The soft lockup timeout is not impacted.

A value of 0 means no change. The default value is 200, meaning the NMI
watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).


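The arithmetic behind the default can be checked with a short sketch.
``lpm_nmi_timeout`` is a hypothetical helper working in whole seconds;
the kernel's own calculation may differ in units and rounding::

    # lpm_nmi_timeout WATCHDOG_THRESH NMI_WD_LPM_FACTOR
    # NMI watchdog timeout during an LPM: the threshold plus FACTOR
    # percent of it. Integer arithmetic, so fractions are dropped.
    lpm_nmi_timeout() {
        thresh=$1 factor=$2
        echo $(( thresh + thresh * factor / 100 ))
    }

    lpm_nmi_timeout 10 200    # prints 30 (the defaults)
    lpm_nmi_timeout 10 0      # prints 10 (no change)

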
a3cb66a5 SK |
701 | numa_balancing |
702 | ============== | |
10fc05d0 | 703 | |
c574bbe9 HY |
704 | Enables/disables and configures automatic page fault based NUMA memory |
705 | balancing. Memory is moved automatically to nodes that access it often. | |
706 | The value to set can be the result of ORing the following: | |
10fc05d0 | 707 | |
c574bbe9 HY |
708 | = ================================= |
709 | 0 NUMA_BALANCING_DISABLED | |
710 | 1 NUMA_BALANCING_NORMAL | |
711 | 2 NUMA_BALANCING_MEMORY_TIERING | |
712 | = ================================= | |
713 | ||
714 | Set NUMA_BALANCING_NORMAL to optimize page placement among different | |
715 | NUMA nodes and reduce remote memory accesses. On NUMA machines, there is | |
716 | a performance penalty if remote memory is accessed by a CPU. When this | |
717 | feature is enabled, the kernel samples which task thread is accessing | |
718 | memory by periodically unmapping pages and later trapping a page | |
719 | fault. At the time of the page fault, it determines whether the data | |
720 | being accessed should be migrated to a local memory node. | |
10fc05d0 MG |
721 | |
722 | Unmapping pages and trapping faults incur additional overhead that
723 | ideally is offset by improved memory locality, but there is no universal | |
724 | guarantee. If the target workload is already bound to NUMA nodes, this | |
3624ba7b | 725 | feature should be disabled. |
10fc05d0 | 726 | |
c574bbe9 HY |
727 | Set NUMA_BALANCING_MEMORY_TIERING to optimize page placement among
728 | different types of memory (represented as different NUMA nodes) by | |
729 | placing hot pages in the fast memory. This mode is also implemented | |
730 | through page unmapping and page faults. | |
10fc05d0 | 731 | |
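Because the value is a bitwise OR of the modes in the table, it can be computed rather than memorised. A minimal POSIX shell sketch; the variable names mirror the table, and ``numa_balancing_value`` is a hypothetical helper:

```shell
NUMA_BALANCING_NORMAL=1
NUMA_BALANCING_MEMORY_TIERING=2

# OR together the requested mode bits and print the resulting value.
numa_balancing_value() {
    value=0
    for mode in "$@"; do
        value=$(( value | mode ))
    done
    echo "$value"
}

numa_balancing_value "$NUMA_BALANCING_NORMAL" "$NUMA_BALANCING_MEMORY_TIERING"
# Applying the result requires root, e.g.:
#   numa_balancing_value 1 2 > /proc/sys/kernel/numa_balancing
```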
c6833e10 HY |
732 | numa_balancing_promote_rate_limit_MBps |
733 | ====================================== | |
734 | ||
735 | Excessively high promotion/demotion throughput between different memory types
736 | may hurt application latency. This sysctl rate-limits promotion: the | |
737 | per-node maximum promotion throughput, in MB/s, is capped at the | |
738 | set value. | |
739 | ||
740 | A rule of thumb is to set this to less than 1/10 of the PMEM node | |
741 | write bandwidth. | |
742 | ||
e996919b RD |
743 | oops_all_cpu_backtrace |
744 | ====================== | |
60c958d8 GP |
745 | |
746 | If this option is set, the kernel will send an NMI to all CPUs to dump | |
747 | their backtraces when an oops event occurs. It should be used as a last | |
748 | resort in case a panic cannot be triggered (for example, to protect running | |
749 | VMs) or kdump can't be collected. This file shows up if ``CONFIG_SMP`` | |
750 | is enabled. | |
751 | ||
752 | 0: Don't show all CPUs' backtraces when an oops is detected. | |
753 | This is the default behavior. | |
754 | ||
755 | 1: Will non-maskably interrupt all CPUs and dump their backtraces when | |
756 | an oops event is detected. | |
757 | ||
758 | ||
d4ccd54d JH |
759 | oops_limit |
760 | ========== | |
761 | ||
762 | Number of kernel oopses after which the kernel should panic when | |
de92f657 KC |
763 | ``panic_on_oops`` is not set. Setting this to 0 disables checking |
764 | the count. Setting this to 1 has the same effect as setting | |
765 | ``panic_on_oops=1``. The default value is 10000. | |
d4ccd54d JH |
766 | |
767 | ||
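The interaction between ``oops_limit`` and ``panic_on_oops`` can be modelled as a simple decision. This is a hypothetical POSIX shell sketch of the behaviour described above, not the kernel's code; it assumes the count passed in already includes the oops being handled.

```shell
# Hypothetical model: decide whether an oops should panic the kernel.
#   $1 = number of oopses so far (including this one)
#   $2 = oops_limit (0 disables the count check)
#   $3 = panic_on_oops
oops_action() {
    count=$1 limit=$2 panic_on_oops=$3
    if [ "$panic_on_oops" -ne 0 ]; then
        echo panic
    elif [ "$limit" -ne 0 ] && [ "$count" -ge "$limit" ]; then
        echo panic
    else
        echo continue
    fi
}
```

With ``oops_limit`` set to 1, the first oops already panics, which is why that setting behaves like ``panic_on_oops=1``.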
a3cb66a5 SK |
768 | osrelease, ostype & version |
769 | =========================== | |
53b95375 MCC |
770 | |
771 | :: | |
1da177e4 | 772 | |
53b95375 MCC |
773 | # cat osrelease |
774 | 2.1.88 | |
775 | # cat ostype | |
776 | Linux | |
777 | # cat version | |
778 | #5 Wed Feb 25 21:49:24 MET 1998 | |
1da177e4 | 779 | |
a3cb66a5 SK |
780 | The files ``osrelease`` and ``ostype`` should be clear enough. |
781 | ``version`` | |
1da177e4 LT |
782 | needs a little more clarification, however. The '#5' means that
783 | this is the fifth kernel built from this source base and the | |
784 | date after it indicates the time the kernel was built. | |
785 | The only way to tune these values is to rebuild the kernel :-) | |
786 | ||
1da177e4 | 787 | |
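These files carry the same strings that ``uname(1)`` reports, so ``uname -r``, ``uname -s`` and ``uname -v`` should match ``osrelease``, ``ostype`` and ``version`` respectively. A minimal POSIX shell check, assuming a Linux system with ``/proc`` mounted:

```shell
# Compare each /proc/sys/kernel file with the matching uname(1) option.
for pair in osrelease:-r ostype:-s version:-v; do
    file=${pair%%:*}   # part before the colon
    opt=${pair#*:}     # part after the colon
    if [ "$(cat "/proc/sys/kernel/$file")" = "$(uname "$opt")" ]; then
        echo "$file matches uname $opt"
    else
        echo "$file differs from uname $opt"
    fi
done
```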
a3cb66a5 SK |
788 | overflowgid & overflowuid |
789 | ========================= | |
1da177e4 | 790 | |
807094c0 BP |
791 | If your architecture did not always support 32-bit UIDs (i.e. arm,
792 | i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to | |
793 | applications that use the old 16-bit UID/GID system calls, if the | |
794 | actual UID or GID would exceed 65535. | |
1da177e4 LT |
795 | |
796 | These sysctls allow you to change the value of the fixed UID and GID. | |
797 | The default is 65534. | |
798 | ||
1da177e4 | 799 | |
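The mapping can be sketched in POSIX shell, modelled on the kernel's ``high2lowuid()`` macro: any ID with bits set above the low 16 is replaced by the overflow value. ``to_16bit_uid`` is a hypothetical name used only for illustration.

```shell
# Sketch: the value a legacy 16-bit UID syscall would report for a UID.
# $2 defaults to 65534, the documented default of overflowuid.
to_16bit_uid() {
    uid=$1
    overflow=${2:-65534}
    if [ $(( uid & ~0xFFFF )) -ne 0 ]; then
        echo "$overflow"   # does not fit in 16 bits
    else
        echo "$uid"
    fi
}
```

The same logic applies to GIDs with ``overflowgid``.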
a3cb66a5 SK |
800 | panic |
801 | ===== | |
1da177e4 | 802 | |
404347e6 SK |
803 | The value in this file determines the behaviour of the kernel on a |
804 | panic: | |
805 | ||
806 | * if zero, the kernel will loop forever; | |
807 | * if negative, the kernel will reboot immediately; | |
808 | * if positive, the kernel will reboot after the corresponding number | |
809 | of seconds. | |
810 | ||
811 | When you use the software watchdog, the recommended setting is 60. | |
807094c0 | 812 | |
9f318e3f | 813 | |
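The three cases above can be expressed as a small POSIX shell helper that describes the effect of a given value; ``panic_action`` is hypothetical and only restates the documented rules.

```shell
# Describe what the kernel does on panic for a given kernel.panic value.
panic_action() {
    timeout=$1
    if [ "$timeout" -eq 0 ]; then
        echo "loop forever"
    elif [ "$timeout" -lt 0 ]; then
        echo "reboot immediately"
    else
        echo "reboot after $timeout seconds"
    fi
}

# Example (Linux only):
#   panic_action "$(cat /proc/sys/kernel/panic)"
```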
a3cb66a5 SK |
814 | panic_on_io_nmi |
815 | =============== | |
9f318e3f HK |
816 | |
817 | Controls the kernel's behavior when a CPU receives an NMI caused by | |
818 | an IO error. | |
819 | ||
a3cb66a5 SK |
820 | = ================================================================== |
821 | 0 Try to continue operation (default). | |
822 | 1 Panic immediately. The IO error triggered an NMI. This indicates a | |
823 | serious system condition which could result in IO data corruption. | |
824 | Rather than continuing, panicking might be a better choice. Some | |
825 | servers issue this sort of NMI when the dump button is pushed, | |
826 | and you can use this option to take a crash dump. | |
827 | = ================================================================== | |
9f318e3f | 828 | |
807094c0 | 829 | |
a3cb66a5 SK |
830 | panic_on_oops |
831 | ============= | |
1da177e4 LT |
832 | |
833 | Controls the kernel's behaviour when an oops or BUG is encountered. | |
834 | ||
a3cb66a5 SK |
835 | = =================================================================== |
836 | 0 Try to continue operation. | |
837 | 1 Panic immediately. If the `panic`_ sysctl is also non-zero, the | |
838 | machine will be rebooted. | |
839 | = =================================================================== | |
1da177e4 | 840 | |
1da177e4 | 841 | |
a3cb66a5 SK |
842 | panic_on_stackoverflow |
843 | ====================== | |
55af7796 MH |
844 | |
845 | Controls the kernel's behavior when it detects an overflow of the
846 | kernel, IRQ or exception stacks (user stacks are not covered). | |
a3cb66a5 | 847 | This file shows up if ``CONFIG_DEBUG_STACKOVERFLOW`` is enabled. |
55af7796 | 848 | |
a3cb66a5 SK |
849 | = ========================== |
850 | 0 Try to continue operation. | |
851 | 1 Panic immediately. | |
852 | = ========================== | |
55af7796 | 853 | |
55af7796 | 854 | |
a3cb66a5 SK |
855 | panic_on_unrecovered_nmi |
856 | ======================== | |
9e3961a0 PB |
857 | |
858 | The default Linux behaviour on an NMI caused by a memory error or of unknown
859 | origin is to continue operation. In many environments, such as scientific | |
860 | computing, it is preferable to take the box out of service and deal with | |
861 | the error rather than let an uncorrected parity/ECC error propagate. | |
862 | ||
a3cb66a5 | 863 | A small number of systems do generate NMIs for bizarre random reasons
9e3961a0 PB |
864 | such as power management, so the default is off. This sysctl works like
865 | the other panic controls in this directory. | |
866 | ||
9e3961a0 | 867 | |
a3cb66a5 SK |
868 | panic_on_warn |
869 | ============= | |
9e3961a0 PB |
870 | |
871 | Calls panic() in the WARN() path when set to 1. This is useful to avoid | |
872 | a kernel rebuild when attempting to kdump at the location of a WARN(). | |
873 | ||
a3cb66a5 SK |
874 | = ================================================ |
875 | 0 Only WARN(), default behaviour. | |
876 | 1 Call panic() after printing out WARN() location. | |
877 | = ================================================ | |
9e3961a0 | 878 | |
9e3961a0 | 879 | |
a3cb66a5 SK |
880 | panic_print |
881 | =========== | |
81c9d43f FT |
882 | |
883 | Bitmask for printing system info when a panic happens. Users can choose a
884 | combination of the following bits: | |
885 | ||
a3cb66a5 | 886 | ===== ============================================ |
53b95375 MCC |
887 | bit 0 print all tasks info |
888 | bit 1 print system memory info | |
889 | bit 2 print timer info | |
a3cb66a5 | 890 | bit 3 print locks info if ``CONFIG_LOCKDEP`` is on |
53b95375 | 891 | bit 4 print ftrace buffer |
a1ff1de0 | 892 | bit 5 print all printk messages in buffer |
8d470a45 | 893 | bit 6 print all CPUs backtrace (if available in the arch) |
2e3fc6ca | 894 | bit 7 print only tasks in uninterruptible (blocked) state |
a3cb66a5 | 895 | ===== ============================================ |
53b95375 MCC |
896 | |
897 | For example, to print task and memory info on panic, run:: | |
81c9d43f | 898 | |
81c9d43f FT |
899 | echo 3 > /proc/sys/kernel/panic_print |
900 | ||
81c9d43f | 901 | |
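Since each row of the table is one bit, a value for any combination is the OR of ``1 << bit``. A minimal POSIX shell sketch; ``panic_print_mask`` is a hypothetical helper:

```shell
# Build a panic_print value from the bit numbers listed in the table.
panic_print_mask() {
    mask=0
    for bit in "$@"; do
        mask=$(( mask | (1 << bit) ))
    done
    echo "$mask"
}

panic_print_mask 0 1      # tasks + memory info -> 3
panic_print_mask 0 1 6    # plus all CPUs backtrace -> 67
```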
a3cb66a5 SK |
902 | panic_on_rcu_stall |
903 | ================== | |
088e9d25 DBO |
904 | |
905 | When set to 1, calls panic() after RCU stall detection messages. This | |
906 | is useful to determine the root cause of RCU stalls using a vmcore. | |
907 | ||
a3cb66a5 SK |
908 | = ============================================================ |
909 | 0 Do not panic() when RCU stall takes place, default behavior. | |
910 | 1 panic() after printing RCU stall messages. | |
911 | = ============================================================ | |
088e9d25 | 912 | |
81c65365 JS |
913 | max_rcu_stall_to_panic |
914 | ====================== | |
915 | ||
916 | When ``panic_on_rcu_stall`` is set to 1, this value determines the | |
917 | number of times that RCU can stall before panic() is called. | |
918 | ||
919 | When ``panic_on_rcu_stall`` is set to 0, this value has no effect. | |
088e9d25 | 920 | |
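The combined effect of ``panic_on_rcu_stall`` and ``max_rcu_stall_to_panic`` can be modelled as follows. This is a hypothetical POSIX shell sketch of the documented behaviour, assuming the stall count passed in includes the current stall.

```shell
# Hypothetical model: should this RCU stall trigger panic()?
#   $1 = panic_on_rcu_stall
#   $2 = stalls seen so far (including this one)
#   $3 = max_rcu_stall_to_panic
rcu_stall_action() {
    if [ "$1" -eq 1 ] && [ "$2" -ge "$3" ]; then
        echo panic
    else
        echo continue
    fi
}
```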
a3cb66a5 SK |
921 | perf_cpu_time_max_percent |
922 | ========================= | |
14c63f17 DH |
923 | |
924 | Hints to the kernel how much CPU time it should be allowed to | |
925 | use to handle perf sampling events. If the perf subsystem | |
926 | is informed that its samples are exceeding this limit, it | |
927 | will drop its sampling frequency to attempt to reduce its CPU | |
928 | usage. | |
929 | ||
930 | Some perf sampling happens in NMIs. If these samples | |
931 | unexpectedly take too long to execute, the NMIs can become | |
932 | stacked up next to each other so much that nothing else is | |
933 | allowed to execute. | |
934 | ||
a3cb66a5 SK |
935 | ===== ======================================================== |
936 | 0 Disable the mechanism. Do not monitor or correct perf's | |
937 | sampling rate no matter how much CPU time it takes. | |
14c63f17 | 938 | |
a3cb66a5 SK |
939 | 1-100 Attempt to throttle perf's sample rate to this |
940 | percentage of CPU. Note: the kernel calculates an | |
941 | "expected" length of each sample event. 100 here means | |
942 | 100% of that expected length. Even if this is set to | |
943 | 100, you may still see sample throttling if this | |
944 | length is exceeded. Set to 0 if you truly do not care | |
945 | how much CPU is consumed. | |
946 | ===== ======================================================== | |
14c63f17 | 947 | |
14c63f17 | 948 | |
a3cb66a5 SK |
949 | perf_event_paranoid |
950 | =================== | |
3379e0c3 BH |
951 | |
952 | Controls use of the performance events system by unprivileged | |
025b16f8 AB |
953 | users (without CAP_PERFMON). The default value is 2. |
954 | ||
955 | For backward compatibility reasons, access to system performance | |
956 | monitoring and observability remains open to processes with | |
957 | ``CAP_SYS_ADMIN``, but using ``CAP_SYS_ADMIN`` for secure system | |
958 | performance monitoring and observability operations is discouraged | |
959 | in favour of ``CAP_PERFMON``. | |
3379e0c3 | 960 | |
53b95375 | 961 | === ================================================================== |
a3cb66a5 | 962 | -1 Allow use of (almost) all events by all users. |
53b95375 | 963 | |
a3cb66a5 SK |
964 | Ignore mlock limit after perf_event_mlock_kb without |
965 | ``CAP_IPC_LOCK``. | |
53b95375 | 966 | |
a3cb66a5 | 967 | >=0 Disallow ftrace function tracepoint by users without |
025b16f8 | 968 | ``CAP_PERFMON``. |
53b95375 | 969 | |
025b16f8 | 970 | Disallow raw tracepoint access by users without ``CAP_PERFMON``. |
3379e0c3 | 971 | |
025b16f8 | 972 | >=1 Disallow CPU event access by users without ``CAP_PERFMON``. |
53b95375 | 973 | |
025b16f8 | 974 | >=2 Disallow kernel profiling by users without ``CAP_PERFMON``. |
53b95375 MCC |
975 | === ================================================================== |
976 | ||
55af7796 | 977 | |
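Because each level includes the restrictions of the levels below it, the table can be read cumulatively. The POSIX shell sketch below lists the restrictions that apply to users without ``CAP_PERFMON`` at a given level; ``paranoid_restrictions`` is hypothetical, and it does not model the mlock relaxation at -1.

```shell
# List the table's restrictions for users without CAP_PERFMON
# at a given perf_event_paranoid level.
paranoid_restrictions() {
    level=$1
    if [ "$level" -ge 0 ]; then
        echo "no ftrace function tracepoint"
        echo "no raw tracepoint access"
    fi
    if [ "$level" -ge 1 ]; then
        echo "no CPU event access"
    fi
    if [ "$level" -ge 2 ]; then
        echo "no kernel profiling"
    fi
}

# Example (Linux only):
#   paranoid_restrictions "$(cat /proc/sys/kernel/perf_event_paranoid)"
```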
a3cb66a5 SK |
978 | perf_event_max_stack |
979 | ==================== | |
c5dfd78e | 980 | |
a3cb66a5 SK |
981 | Controls the maximum number of stack frames to copy for (``attr.sample_type &
982 | PERF_SAMPLE_CALLCHAIN``) configured events, for instance, when using | |
983 | '``perf record -g``' or '``perf trace --call-graph fp``'. | |
c5dfd78e ACM |
984 | |
985 | This can only be done when no events are in use that have callchains | |
a3cb66a5 | 986 | enabled, otherwise writing to this file will return ``-EBUSY``. |
c5dfd78e ACM |
987 | |
988 | The default value is 127. | |
989 | ||
c5dfd78e | 990 | |
a3cb66a5 SK |
991 | perf_event_mlock_kb |
992 | =================== | |
ac0bb6b7 | 993 | |
751d5b27 | 994 | Controls the size of the per-CPU ring buffer that is not counted against the mlock limit.
ac0bb6b7 KK |
995 | |
996 | The default value is 512 plus the size of one page, in KiB. | |
997 | ||
ac0bb6b7 | 998 | |
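The default can be computed for a given page size. This POSIX shell sketch follows the kernel's initialisation of the value as ``512 + (PAGE_SIZE / 1024)``; ``default_mlock_kb`` is a hypothetical helper, and the page size is queried with ``getconf`` unless passed explicitly.

```shell
# Default perf_event_mlock_kb: 512 KiB plus one page, expressed in KiB.
# $1 overrides the page size in bytes; otherwise getconf is used.
default_mlock_kb() {
    page_size=${1:-$(getconf PAGESIZE)}
    echo $(( 512 + page_size / 1024 ))
}

default_mlock_kb 4096     # -> 516
default_mlock_kb 65536    # -> 576
```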
a3cb66a5 SK |
999 | perf_event_max_contexts_per_stack |
1000 | ================================= | |
c85b0334 ACM |
1001 | |
1002 | Controls the maximum number of stack frame context entries for
a3cb66a5 SK |
1003 | (``attr.sample_type & PERF_SAMPLE_CALLCHAIN``) configured events, for |
1004 | instance, when using '``perf record -g``' or '``perf trace --call-graph fp``'. | |
c85b0334 ACM |
1005 | |
1006 | This can only be done when no events are in use that have callchains | |
a3cb66a5 | 1007 | enabled, otherwise writing to this file will return ``-EBUSY``. |
c85b0334 ACM |
1008 | |
1009 | The default value is 8. | |
1010 | ||
c85b0334 | 1011 | |
57972127 AG |
1012 | perf_user_access (arm64 and riscv only) |
1013 | ======================================= | |
1014 | ||
1015 | Controls user space access for reading perf event counters. | |
e2012600 | 1016 | |
57972127 AG |
1017 | arm64 |
1018 | ===== | |
e2012600 RH |
1019 | |
1020 | The default value is 0 (access disabled). | |
1021 | ||
57972127 AG |
1022 | When set to 1, user space can read performance monitor counter registers |
1023 | directly. | |
1024 | ||
e4624435 | 1025 | See Documentation/arch/arm64/perf.rst for more information. |
e2012600 | 1026 | |
57972127 AG |
1027 | riscv |
1028 | ===== | |
1029 | ||
1030 | When set to 0, user space access is disabled. | |
1031 | ||
1032 | The default value is 1: user space can read the performance monitor counter | |
1033 | registers through perf, and any direct access without perf intervention triggers | |
1034 | an illegal instruction exception. | |
1035 | ||
1036 | When set to 2, legacy mode is enabled (user space has direct access to the cycle | |
1037 | and instret CSRs only). Note that this legacy value is deprecated and will be | |
1038 | removed once all user space applications are fixed. | |
1039 | ||
1040 | Note that the time CSR is always directly accessible to all modes. | |
e2012600 | 1041 | |
a3cb66a5 SK |
1042 | pid_max |
1043 | ======= | |
1da177e4 | 1044 | |
beb7dd86 | 1045 | PID allocation wrap value. When the kernel's next PID value |
1da177e4 | 1046 | reaches this value, it wraps back to a minimum PID value. |
a3cb66a5 | 1047 | PIDs of value ``pid_max`` or larger are not allocated. |
1da177e4 | 1048 | |
1da177e4 | 1049 | |
a3cb66a5 SK |
1050 | ns_last_pid |
1051 | =========== | |
b8f566b0 PE |
1052 | |
1053 | The last PID allocated in the current PID namespace (the one in which
1054 | the task using this sysctl lives). When selecting a PID for the next task on | |
1055 | fork, the kernel tries to allocate the next number after this one. | |
1056 | ||
b8f566b0 | 1057 | |
a3cb66a5 SK |
1058 | powersave-nap (PPC only) |
1059 | ======================== | |
1da177e4 LT |
1060 | |
1061 | If set, Linux-PPC will use the 'nap' mode of powersaving, | |
1062 | otherwise the 'doze' mode will be used. | |
1063 | ||
a3cb66a5 | 1064 | |
1da177e4 LT |
a3cb66a5 SK |
1067 | printk |
1068 | ====== | |
1da177e4 | 1069 | |
a3cb66a5 SK |
1070 | The four values in printk denote: ``console_loglevel``, |
1071 | ``default_message_loglevel``, ``minimum_console_loglevel`` and | |
1072 | ``default_console_loglevel`` respectively. | |
1da177e4 LT |
1073 | |
1074 | These values influence printk() behavior when printing or | |
a3cb66a5 | 1075 | logging error messages. See '``man 2 syslog``' for more info on |
1da177e4 LT |
1076 | the different loglevels. |
1077 | ||
a3cb66a5 SK |
1078 | ======================== ===================================== |
1079 | console_loglevel messages with a higher priority than | |
1080 | this will be printed to the console | |
1081 | default_message_loglevel messages without an explicit priority | |
1082 | will be printed with this priority | |
1083 | minimum_console_loglevel minimum (highest) value to which | |
1084 | console_loglevel can be set | |
1085 | default_console_loglevel default value for console_loglevel | |
1086 | ======================== ===================================== | |
1da177e4 | 1087 | |
1da177e4 | 1088 | |
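Reading ``/proc/sys/kernel/printk`` returns the four values on one line, in the order listed above. A minimal POSIX shell sketch that names them; ``show_printk_levels`` is a hypothetical helper reading from standard input:

```shell
# Split the four printk values into named variables and print them.
# Reads one line such as "4	4	1	7" from standard input.
show_printk_levels() {
    read -r console default_message minimum_console default_console || return 1
    echo "console_loglevel=$console"
    echo "default_message_loglevel=$default_message"
    echo "minimum_console_loglevel=$minimum_console"
    echo "default_console_loglevel=$default_console"
}

# Example (Linux only):
#   show_printk_levels < /proc/sys/kernel/printk
```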
a3cb66a5 SK |
1089 | printk_delay |
1090 | ============ | |
807094c0 | 1091 | |
a3cb66a5 | 1092 | Delay each printk message by ``printk_delay`` milliseconds.
807094c0 BP |
1093 | |
1094 | Values from 0 to 10000 are allowed. | |
1095 | ||
807094c0 | 1096 | |
a3cb66a5 SK |
1097 | printk_ratelimit |
1098 | ================ | |
1da177e4 | 1099 | |
a3cb66a5 | 1100 | Some warning messages are rate limited. ``printk_ratelimit`` specifies |
ca30ad85 ON |
1101 | the minimum length of time between these messages (in seconds). |
1102 | The default value is 5 seconds. | |
1da177e4 LT |
1103 | |
1104 | A value of 0 will disable rate limiting. | |
1105 | ||
1da177e4 | 1106 | |
a3cb66a5 SK |
1107 | printk_ratelimit_burst |
1108 | ====================== | |
1da177e4 | 1109 | |
a3cb66a5 | 1110 | Rather than strictly enforcing one message per `printk_ratelimit`_
1da177e4 | 1111 | seconds, the kernel allows a burst of messages to pass through.
a3cb66a5 | 1112 | ``printk_ratelimit_burst`` specifies the number of messages that can
1da177e4 LT |
1113 | be sent within each `printk_ratelimit`_ interval before ratelimiting kicks in.
1114 | ||
ca30ad85 ON |
1115 | The default value is 10 messages. |
1116 | ||
1da177e4 | 1117 | |
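The interaction between ``printk_ratelimit`` and ``printk_ratelimit_burst`` can be illustrated with a simplified interval/burst model. This POSIX shell sketch is hypothetical: it assumes the window restarts once a full interval has elapsed, takes the current time explicitly as whole seconds, and uses made-up variable and function names.

```shell
# Simplified interval/burst rate limiter (hypothetical model).
RL_INTERVAL=5   # printk_ratelimit, in seconds (0 disables limiting)
RL_BURST=10     # printk_ratelimit_burst
RL_BEGIN=       # start of the current window (unset until first message)
RL_PRINTED=0    # messages let through in the current window

# ratelimit_ok NOW: succeeds if a message at time NOW may be printed.
ratelimit_ok() {
    now=$1
    if [ "$RL_INTERVAL" -eq 0 ]; then
        return 0
    fi
    if [ -z "$RL_BEGIN" ] || [ $(( now - RL_BEGIN )) -ge "$RL_INTERVAL" ]; then
        RL_BEGIN=$now    # start a new window
        RL_PRINTED=0
    fi
    if [ "$RL_PRINTED" -lt "$RL_BURST" ]; then
        RL_PRINTED=$(( RL_PRINTED + 1 ))
        return 0
    fi
    return 1             # burst exhausted for this window
}
```

With the defaults, the first 10 messages in a 5-second window pass and the rest of that window is suppressed.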
a3cb66a5 SK |
1118 | printk_devkmsg |
1119 | ============== | |
53b95375 | 1120 | |
a3cb66a5 | 1121 | Control the logging to ``/dev/kmsg`` from userspace: |
53b95375 | 1122 | |
a3cb66a5 SK |
1123 | ========= ============================================= |
1124 | ratelimit default, ratelimited | |
1125 | on unlimited logging to /dev/kmsg from userspace | |
1126 | off logging to /dev/kmsg disabled | |
1127 | ========= ============================================= | |
750afe7b | 1128 | |
a3cb66a5 | 1129 | The kernel command line parameter ``printk.devkmsg=`` overrides this and is |
750afe7b BP |
1130 | a one-time setting until next reboot: once set, it cannot be changed by |
1131 | this sysctl interface anymore. | |
1132 | ||
750afe7b | 1134 | |
a3cb66a5 SK |
1135 | |
1136 | pty | |
1137 | === | |
1138 | ||
01478b83 | 1139 | See Documentation/filesystems/devpts.rst. |
a3cb66a5 SK |
1140 | |
1141 | ||
0b227076 SK |
1142 | random |
1143 | ====== | |
1144 | ||
1145 | This is a directory, with the following entries: | |
1146 | ||
1147 | * ``boot_id``: a UUID generated the first time this is retrieved, and | |
1148 | unvarying after that; | |
1149 | ||
069c4ea6 JD |
1150 | * ``uuid``: a UUID generated every time this is retrieved (this can |
1151 | thus be used to generate UUIDs at will); | |
1152 | ||
0b227076 SK |
1153 | * ``entropy_avail``: the pool's entropy count, in bits; |
1154 | ||
1155 | * ``poolsize``: the entropy pool size, in bits; | |
1156 | ||
1157 | * ``urandom_min_reseed_secs``: obsolete (used to determine the minimum | |
489c7fc4 JD |
1158 | number of seconds between urandom pool reseeding). This file is |
1159 | writable for compatibility purposes, but writing to it has no effect | |
069c4ea6 | 1160 | on any RNG behavior; |
0b227076 SK |
1161 | |
1162 | * ``write_wakeup_threshold``: when the entropy count drops below this | |
1163 | (as a number of bits), processes waiting to write to ``/dev/random`` | |
489c7fc4 JD |
1164 | are woken up. This file is writable for compatibility purposes, but |
1165 | writing to it has no effect on any RNG behavior. | |
0b227076 | 1166 | |
0b227076 | 1167 | |
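For example, a shell script can read a per-boot identifier or generate a fresh UUID without an external tool. A minimal POSIX shell sketch, assuming a Linux system with ``/proc`` mounted; the function names are hypothetical:

```shell
rnd=/proc/sys/kernel/random

# boot_id stays the same on every read until the next boot.
boot_id() { cat "$rnd/boot_id"; }

# uuid returns a newly generated UUID on every read.
new_uuid() { cat "$rnd/uuid"; }
```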
a3cb66a5 SK |
1168 | randomize_va_space |
1169 | ================== | |
1ec7fd50 JK |
1170 | |
1171 | This option can be used to select the type of process address | |
1172 | space randomization that is used in the system, for architectures | |
1173 | that support this feature. | |
1174 | ||
53b95375 MCC |
1175 | == =========================================================================== |
1176 | 0 Turn the process address space randomization off. This is the | |
b7f5ab6f HS |
1177 | default for architectures that do not support this feature anyway, | |
1178 | and kernels that are booted with the "norandmaps" parameter. | |
1ec7fd50 | 1179 | |
53b95375 | 1180 | 1 Make the addresses of mmap base, stack and VDSO page randomized. |
1ec7fd50 | 1181 | This, among other things, implies that shared libraries will be |
b7f5ab6f HS |
1182 | loaded to random addresses. Also for PIE-linked binaries, the |
1183 | location of code start is randomized. This is the default if the | |
a3cb66a5 | 1184 | ``CONFIG_COMPAT_BRK`` option is enabled. |
1ec7fd50 | 1185 | |
53b95375 | 1186 | 2 Additionally enable heap randomization. This is the default if |
a3cb66a5 | 1187 | ``CONFIG_COMPAT_BRK`` is disabled. |
b7f5ab6f HS |
1188 | |
1189 | There are a few legacy applications out there (such as some ancient | |
1ec7fd50 | 1190 | versions of libc.so.5 from 1996) that assume that the brk area starts
b7f5ab6f HS |
1191 | just after the end of the code+bss. These applications break when the
1192 | start of the brk area is randomized. There are, however, no known | |
1ec7fd50 | 1193 | non-legacy applications that would be broken this way, so for most |
b7f5ab6f HS |
1194 | systems it is safe to choose full randomization. |
1195 | ||
1196 | Systems with ancient and/or broken binaries should be configured | |
a3cb66a5 | 1197 | with ``CONFIG_COMPAT_BRK`` enabled, which excludes the heap from process |
b7f5ab6f | 1198 | address space randomization. |
53b95375 | 1199 | == =========================================================================== |
1ec7fd50 | 1200 | |
1ec7fd50 | 1201 | |
a3cb66a5 SK |
1202 | real-root-dev |
1203 | ============= | |
1204 | ||
2793e19d | 1205 | See Documentation/admin-guide/initrd.rst. |
a3cb66a5 SK |
1206 | |
1207 | ||
1208 | reboot-cmd (SPARC only) | |
1209 | ======================= | |
1da177e4 LT |
1210 | |
1211 | This appears to be a way to pass an argument to the SPARC
1212 | ROM/Flash boot loader, possibly telling it what to do after | |
1213 | rebooting. | |
1214 | ||
1da177e4 | 1215 | |
a3cb66a5 SK |
1216 | sched_energy_aware |
1217 | ================== | |
8d5d0cfb QP |
1218 | |
1219 | Enables/disables Energy Aware Scheduling (EAS). EAS starts | |
1220 | automatically on platforms where it can run (that is, | |
1221 | platforms with asymmetric CPU topologies and having an Energy | |
1222 | Model available). If your platform happens to meet the | |
1223 | requirements for EAS but you do not want to use it, change | |
8f833c82 SH |
1224 | this value to 0. On non-EAS platforms, write operations fail and
1225 | reads don't return anything. | |
8d5d0cfb | 1226 | |
fcb50170 MG |
1227 | task_delayacct |
1228 | =============== | |
1229 | ||
1230 | Enables/disables task delay accounting (see | |
0f60a29c | 1231 | Documentation/accounting/delay-accounting.rst). Enabling this feature incurs |
fcb50170 MG |
1232 | a small amount of overhead in the scheduler but is useful for debugging |
1233 | and performance tuning. It is required by some tools such as iotop. | |
8d5d0cfb | 1234 | |
a3cb66a5 SK |
1235 | sched_schedstats |
1236 | ================ | |
cb251765 MG |
1237 | |
1238 | Enables/disables scheduler statistics. Enabling this feature | |
1239 | incurs a small amount of overhead in the scheduler but is | |
1240 | useful for debugging and performance tuning. | |
1241 | ||
d151a23d SK |
1242 | sched_util_clamp_min |
1243 | ==================== | |
1f73d1ab QY |
1244 | |
1245 | Max allowed *minimum* utilization. | |
1246 | ||
1247 | Default value is 1024, which is the maximum possible value. | |
1248 | ||
1249 | It means that any requested uclamp.min value cannot be greater than | |
1250 | sched_util_clamp_min, i.e., it is restricted to the range | |
1251 | [0:sched_util_clamp_min]. | |
1252 | ||
d151a23d SK |
1253 | sched_util_clamp_max |
1254 | ==================== | |
1f73d1ab QY |
1255 | |
1256 | Max allowed *maximum* utilization. | |
1257 | ||
1258 | Default value is 1024, which is the maximum possible value. | |
1259 | ||
1260 | It means that any requested uclamp.max value cannot be greater than | |
1261 | sched_util_clamp_max, i.e., it is restricted to the range | |
1262 | [0:sched_util_clamp_max]. | |
1263 | ||
d151a23d SK |
1264 | sched_util_clamp_min_rt_default |
1265 | =============================== | |
1f73d1ab QY |
1266 | |
1267 | By default, Linux is tuned for performance, which means that RT tasks always run | |
1268 | at the highest frequency and on the most capable (highest capacity) CPU (in | |
1269 | heterogeneous systems). | |
1270 | ||
1271 | Uclamp achieves this by setting the requested uclamp.min of all RT tasks to | |
1272 | 1024 by default, which effectively boosts the tasks to run at the highest | |
1273 | frequency and biases them to run on the biggest CPU. | |
1274 | ||
1275 | This knob allows admins to change the default behavior when uclamp is being | |
1276 | used. In battery powered devices particularly, running at the maximum | |
1277 | capacity and frequency will increase energy consumption and shorten the battery | |
1278 | life. | |
1279 | ||
1280 | This knob only affects RT tasks whose requested uclamp.min value the user | |
1281 | hasn't modified via the sched_setattr() syscall. | |
1282 | ||
1283 | This knob will not escape the range constraint imposed by sched_util_clamp_min | |
1284 | defined above. | |
1285 | ||
1286 | For example, if:: | |
1287 | ||
1288 | sched_util_clamp_min_rt_default = 800 | |
1289 | sched_util_clamp_min = 600 | |
1290 | ||
1291 | Then the boost will be clamped to 600 because 800 is outside of the permissible | |
1292 | range of [0:600]. This could happen, for instance, if a powersave mode | |
1293 | temporarily restricts all boosts by modifying sched_util_clamp_min. As soon as | |
1294 | this restriction is lifted, the requested sched_util_clamp_min_rt_default | |
1295 | will take effect. | |
cb251765 | 1296 | |
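The clamping rule in the example reduces to taking the smaller of the two values. A minimal POSIX shell sketch; ``rt_effective_boost`` is a hypothetical helper that only restates the documented range constraint:

```shell
# Effective uclamp.min boost for an RT task that has not set its own value:
# the requested default, limited to the range [0:sched_util_clamp_min].
rt_effective_boost() {
    rt_default=$1   # sched_util_clamp_min_rt_default
    clamp_min=$2    # sched_util_clamp_min
    if [ "$rt_default" -gt "$clamp_min" ]; then
        echo "$clamp_min"
    else
        echo "$rt_default"
    fi
}

rt_effective_boost 800 600    # -> 600
```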
a3cb66a5 SK |
1297 | seccomp |
1298 | ======= | |
1299 | ||
2793e19d | 1300 | See Documentation/userspace-api/seccomp_filter.rst. |
a3cb66a5 SK |
1301 | |
1302 | ||
1303 | sg-big-buff | |
1304 | =========== | |
1da177e4 LT |
1305 | |
1306 | This file shows the size of the generic SCSI (sg) buffer. | |
1307 | You can't tune it just yet, but you could change it at | |
a3cb66a5 SK |
1308 | compile time by editing ``include/scsi/sg.h`` and changing
1309 | the value of ``SG_BIG_BUFF``. | |
1da177e4 LT |
1310 | |
1311 | There shouldn't be any reason to change this value. If | |
1312 | you can come up with one, you probably know what you | |
1313 | are doing anyway :) | |
1314 | ||
1da177e4 | 1315 | |
a3cb66a5 SK |
1316 | shmall |
1317 | ====== | |
358e419f | 1318 | |
9220066e AG |
1319 | This parameter sets the total number of shared memory pages that can be used
1320 | inside an IPC namespace. Shared memory pages are counted separately for each IPC | |
1321 | namespace, and the limit is not inherited. ``shmall`` should always be at | |
1322 | least ``ceil(shmmax/PAGE_SIZE)`` so that a segment of the maximum size fits. | |
358e419f | 1323 | |
a3cb66a5 SK |
1324 | If you are not sure what the default ``PAGE_SIZE`` is on your Linux |
1325 | system, you can run the following command:: | |
358e419f | 1326 | |
53b95375 | 1327 | # getconf PAGE_SIZE |
358e419f | 1328 | |
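The lower bound ``ceil(shmmax/PAGE_SIZE)`` can be computed with integer arithmetic. A minimal POSIX shell sketch; ``min_shmall`` is a hypothetical helper, and because shell arithmetic is signed 64-bit, very large ``shmmax`` values (such as the kernel's default) may overflow.

```shell
# Smallest shmall (in pages) that still allows one segment of shmmax bytes.
#   $1 = shmmax in bytes, $2 = page size (defaults to getconf PAGESIZE)
min_shmall() {
    shmmax=$1
    page_size=${2:-$(getconf PAGESIZE)}
    echo $(( (shmmax + page_size - 1) / page_size ))   # ceil(shmmax/PAGE_SIZE)
}

min_shmall 1073741824 4096   # 1 GiB segments with 4 KiB pages -> 262144
```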
9220066e AG |
1329 | To reduce or disable the ability to allocate shared memory, create a
1330 | new IPC namespace, set this parameter to the required value and prohibit the | |
1331 | creation of new IPC namespaces in the current user namespace. Alternatively, | |
1332 | cgroups can be used. | |
358e419f | 1333 | |
a3cb66a5 SK |
1334 | shmmax |
1335 | ====== | |
1da177e4 LT |
1336 | |
1337 | This value can be used to query and set the run time limit | |
1338 | on the maximum shared memory segment size that can be created. | |
807094c0 | 1339 | Shared memory segments up to 1 GB are now supported in the
a3cb66a5 | 1340 | kernel. This value defaults to ``SHMMAX``. |
1da177e4 | 1341 | |
1da177e4 | 1342 | |
a3cb66a5 SK |
1343 | shmmni |
1344 | ====== | |
1345 | ||
fa5b5264 SK |
1346 | This value determines the maximum number of shared memory segments. |
1347 | 4096 by default (``SHMMNI``). | |
1348 | ||
a3cb66a5 SK |
1349 | |
1350 | shm_rmid_forced | |
1351 | =============== | |
b34a6b1d VK |
1352 | |
1353 | Linux lets you set resource limits, including how much memory one | |
a3cb66a5 | 1354 | process can consume, via ``setrlimit(2)``. Unfortunately, shared memory |
b34a6b1d VK |
1355 | segments are allowed to exist without association with any process, and |
1356 | thus might not be counted against any resource limits. If enabled, | |
1357 | shared memory segments are automatically destroyed when their attach | |
1358 | count becomes zero after a detach or a process termination. It will | |
1359 | also destroy segments that were created, but never attached to, on exit | |
a3cb66a5 | 1360 | from the process. The only use left for ``IPC_RMID`` is to immediately |
b34a6b1d VK |
1361 | destroy an unattached segment. Of course, this breaks the way things are |
1362 | defined, so some applications might stop working. Note that this | |
1363 | feature will do you no good unless you also configure your resource | |
a3cb66a5 | 1364 | limits (in particular, ``RLIMIT_AS`` and ``RLIMIT_NPROC``). Most systems don't |
b34a6b1d VK |
1365 | need this. |
1366 | ||
1367 | Note that if you change this from 0 to 1, already created segments | |
1368 | without users and whose creating process is dead will be destroyed. | |


sysctl_writes_strict
====================

Control how file position affects the behavior of updating sysctl values
via the ``/proc/sys`` interface:

== ======================================================================
-1 Legacy per-write sysctl value handling, with no printk warnings.
   Each write syscall must fully contain the sysctl value to be
   written, and multiple writes on the same sysctl file descriptor
   will rewrite the sysctl value, regardless of file position.
 0 Same behavior as above, but warn about processes that perform writes
   to a sysctl file descriptor when the file position is not 0.
 1 (default) Respect file position when writing sysctl strings. Multiple
   writes will append to the sysctl value buffer. Anything past the maximum
   length of the sysctl value buffer will be ignored. Writes to numeric
   sysctl entries must always be at file position 0 and the value must
   be fully contained in the buffer sent in the write syscall.
== ======================================================================
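
The difference between the modes shows up when one open file descriptor receives several ``write()`` calls. The sketch below assumes a POSIX shell in which each ``printf`` builtin call results in a single ``write()``, which is typical but not guaranteed; both function names are hypothetical. ``sysctl_write_chunks`` writes its arguments one after another through a single redirection, so with the default mode 1 a string entry would end up holding the concatenation of the chunks, while with -1 or 0 each chunk would replace the value and only the last one would remain. ``sysctl_write_value`` writes a whole value in one call at file position 0, which is the safe pattern for numeric entries in every mode.

```shell
# Hypothetical helper: write every argument after the first as a separate
# chunk through ONE open file descriptor, so the file position advances
# between writes (assumes one write() per printf call).
sysctl_write_chunks() {
    target=$1
    shift
    {
        for chunk in "$@"; do
            printf '%s' "$chunk"
        done
    } > "$target"
}

# Hypothetical helper: write a complete value with a single printf at file
# position 0, as numeric sysctl entries require.
sysctl_write_value() {
    printf '%s\n' "$2" > "$1"
}
```

Both functions take the target path as their first argument, so they can be tried against an ordinary file first; on a regular file ``sysctl_write_chunks`` always produces the concatenation.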


softlockup_all_cpu_backtrace
============================

This value controls whether the soft lockup detector thread gathers
further debug information when a soft lockup condition is detected.
If enabled, each CPU will be issued an NMI and instructed to capture
a stack trace.

This feature is only applicable to architectures that support NMIs.

= ============================================
0 Do nothing. This is the default behavior.
1 On detection capture more debug information.
= ============================================


softlockup_panic
=================

This parameter can be used to control whether the kernel panics
when a soft lockup is detected.

= ============================================
0 Don't panic on soft lockup.
1 Panic on soft lockup.
= ============================================

This can also be set using the ``softlockup_panic`` kernel parameter.


soft_watchdog
=============

This parameter can be used to control the soft lockup detector.

= =================================
0 Disable the soft lockup detector.
1 Enable the soft lockup detector.
= =================================

The soft lockup detector monitors CPUs for threads that are hogging the CPUs
without rescheduling voluntarily, and thus prevent the 'migration/N' threads
from running, causing the watchdog work to fail to execute. The mechanism
depends on the CPU's ability to respond to timer interrupts, which are needed
for the watchdog timer function to queue the watchdog work. If a CPU cannot
respond to timer interrupts, the NMI watchdog (if enabled) can detect a hard
lockup condition instead.


split_lock_mitigate (x86 only)
==============================

On x86, each "split lock" imposes a system-wide performance penalty. On larger
systems, large numbers of split locks from unprivileged users can result in
denials of service to well-behaved and potentially more important users.

The kernel mitigates these bad users by detecting split locks and imposing
penalties: forcing them to wait and only allowing one core to execute split
locks at a time.

These mitigations can make those bad applications unbearably slow. Setting
``split_lock_mitigate`` to 0 may restore some application performance, but will
also increase system exposure to denial of service attacks from split lock
users.

= ===================================================================
0 Disable the mitigation mode: only log a warning about the split lock
  in the kernel log, which exposes the system to denials of service
  from the split lockers.
1 Enable the mitigation mode (this is the default): penalize the split
  lockers with intentional performance degradation.
= ===================================================================


stack_erasing
=============

This parameter can be used to control kernel stack erasing at the end
of syscalls for kernels built with ``CONFIG_GCC_PLUGIN_STACKLEAK``.

That erasing reduces the information which kernel stack leak bugs
can reveal and blocks some uninitialized stack variable attacks.
The tradeoff is the performance impact: on a single-CPU system, kernel
compilation sees a 1% slowdown; other systems and workloads may vary.

= ====================================================================
0 Kernel stack erasing is disabled, and ``STACKLEAK_METRICS`` are not updated.
1 Kernel stack erasing is enabled (default); it is performed before
  returning to userspace at the end of syscalls.
= ====================================================================


stop-a (SPARC only)
===================

Controls Stop-A:

= ====================================
0 Stop-A has no effect.
1 Stop-A breaks to the PROM (default).
= ====================================

Stop-A is always enabled on a panic, so that the user can return to
the boot PROM.


sysrq
=====

See Documentation/admin-guide/sysrq.rst.


tainted
=======

Non-zero if the kernel has been tainted. The numeric values below can be
ORed together. The letters are shown in the "Tainted" line of Oops reports.

====== ===== ==============================================================
     1 `(P)` proprietary module was loaded
     2 `(F)` module was force loaded
     4 `(S)` kernel running on an out of specification system
     8 `(R)` module was force unloaded
    16 `(M)` processor reported a Machine Check Exception (MCE)
    32 `(B)` bad page referenced or some unexpected page flags
    64 `(U)` taint requested by userspace application
   128 `(D)` kernel died recently, i.e. there was an OOPS or BUG
   256 `(A)` an ACPI table was overridden by user
   512 `(W)` kernel issued warning
  1024 `(C)` staging driver was loaded
  2048 `(I)` workaround for bug in platform firmware applied
  4096 `(O)` externally-built ("out-of-tree") module was loaded
  8192 `(E)` unsigned module was loaded
 16384 `(L)` soft lockup occurred
 32768 `(K)` kernel has been live patched
 65536 `(X)` auxiliary taint, defined for and used by distros
131072 `(T)` kernel was built with the struct randomization plugin
====== ===== ==============================================================

See Documentation/admin-guide/tainted-kernels.rst for more information.

Note:
  Writes to this sysctl interface will fail with ``EINVAL`` if the kernel is
  booted with the command line option ``panic_on_taint=<bitmask>,nousertaint``
  and any of the ORed together values being written to ``tainted`` match the
  bitmask declared on ``panic_on_taint``.
  See Documentation/admin-guide/kernel-parameters.rst for more details on
  that particular kernel command line option and its optional
  ``nousertaint`` switch.
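
Because the value is a bitmask, it can be decoded by testing each bit in turn. A minimal sketch, assuming a POSIX shell; ``decode_taint`` is a hypothetical name, and it only knows the 18 flags in the table above, so any higher bit is reported as ``?`` instead of being decoded:

```shell
# Hypothetical helper: print the letters of the taint flags set in $1,
# lowest bit first, using the table above (bit 0 = P ... bit 17 = T).
decode_taint() {
    taint=$1
    letters=PFSRMBUDAWCIOELKXT
    out=
    bit=0
    while [ "$bit" -lt 18 ]; do
        if [ $(( (taint >> bit) & 1 )) -eq 1 ]; then
            # cut -c is 1-based, so bit N is character N + 1
            out="$out$(printf '%s' "$letters" | cut -c $((bit + 1)))"
        fi
        bit=$((bit + 1))
    done
    # Bits above 131072 are not listed in the table above
    if [ $(( taint >> 18 )) -ne 0 ]; then
        out="$out?"
    fi
    printf '%s\n' "$out"
}
```

For example, ``decode_taint 4097`` prints ``PO`` (1 + 4096). On a live system the input would come from ``cat /proc/sys/kernel/tainted``. The output lists only the set letters; it does not reproduce the exact layout of the "Tainted" line in Oops reports.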


threads-max
===========

This value controls the maximum number of threads that can be created
using ``fork()``.

During initialization the kernel sets this value such that even if the
maximum number of threads is created, the thread structures occupy only
a part (1/8th) of the available RAM pages.

The minimum value that can be written to ``threads-max`` is 1.

The maximum value that can be written to ``threads-max`` is given by the
constant ``FUTEX_TID_MASK`` (0x3fffffff).

If a value outside of this range is written to ``threads-max``, an
``EINVAL`` error occurs.
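
The accepted range can be checked before writing, instead of relying on the ``EINVAL`` error. A minimal sketch for a POSIX shell, assuming only the bounds stated above (``threads_max_valid`` is a hypothetical name):

```shell
# Hypothetical check of the documented range for threads-max:
# 1 <= value <= FUTEX_TID_MASK (0x3fffffff = 1073741823).
threads_max_valid() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # reject empty and non-decimal input
    esac
    [ "$1" -ge 1 ] && [ "$1" -le $((0x3fffffff)) ]
}
```

A write could then be guarded with ``threads_max_valid "$n" && echo "$n" > /proc/sys/kernel/threads-max``.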


timer_migration
===============

When set to a non-zero value, attempt to migrate timers away from idle CPUs to
allow them to remain in low power states longer.

The default is 1 (enabled).


traceoff_on_warning
===================

When set, disables tracing (see Documentation/trace/ftrace.rst) when a
``WARN()`` is hit.


tracepoint_printk
=================

When tracepoints are sent to ``printk()`` (enabled by the ``tp_printk``
boot parameter), this entry provides runtime control::

    echo 0 > /proc/sys/kernel/tracepoint_printk

will stop tracepoints from being sent to ``printk()``, and::

    echo 1 > /proc/sys/kernel/tracepoint_printk

will send them to ``printk()`` again.

This only works if the kernel was booted with ``tp_printk`` enabled.

See Documentation/admin-guide/kernel-parameters.rst and
Documentation/trace/boottime-trace.rst.


unaligned-trap
==============

On architectures where unaligned accesses cause traps, and where this
feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_ALLOW``; currently,
``arc``, ``parisc`` and ``loongarch``), controls whether unaligned traps
are caught and emulated (instead of failing).

= ========================================================
0 Do not emulate unaligned accesses.
1 Emulate unaligned accesses. This is the default setting.
= ========================================================

See also `ignore-unaligned-usertrap`_.


unknown_nmi_panic
=================

The value in this file affects how NMIs are handled. When the value is
non-zero, an unknown NMI is trapped and the kernel panics. At that time,
kernel debugging information is displayed on the console.

For example, the NMI switch that most IA32 servers have generates an
unknown NMI. If a system hangs, try pressing the NMI switch.


unprivileged_bpf_disabled
=========================

Writing 1 to this entry will disable unprivileged calls to ``bpf()``;
once disabled, calling ``bpf()`` without ``CAP_SYS_ADMIN`` or ``CAP_BPF``
will return ``-EPERM``. Once set to 1, this can't be cleared from the
running kernel anymore.

Writing 2 to this entry will also disable unprivileged calls to ``bpf()``;
however, an admin can still change this setting later on, if needed, by
writing 0 or 1 to this entry.

If ``BPF_UNPRIV_DEFAULT_OFF`` is enabled in the kernel config, then this
entry will default to 2 instead of 0.

= =============================================================
0 Unprivileged calls to ``bpf()`` are enabled
1 Unprivileged calls to ``bpf()`` are disabled without recovery
2 Unprivileged calls to ``bpf()`` are disabled
= =============================================================
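
The three values form a small state machine in which 0 and 2 can be changed freely while 1 is sticky. The sketch below models only the value transitions described above, not the privileges needed to write the entry; ``bpf_unpriv_write_allowed`` is a hypothetical name:

```shell
# Hypothetical model of the documented transitions.
# Usage: bpf_unpriv_write_allowed CURRENT NEW  (succeeds if NEW is accepted)
bpf_unpriv_write_allowed() {
    case $2 in
        0|1|2) ;;           # only the three documented values
        *) return 1 ;;
    esac
    if [ "$1" = 1 ]; then
        [ "$2" = 1 ]        # once set to 1, it cannot be cleared or relaxed
    else
        return 0            # from 0 or 2, any documented value is accepted
    fi
}
```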


warn_limit
==========

Number of kernel warnings after which the kernel should panic when
``panic_on_warn`` is not set. Setting this to 0 disables checking
the warning count. Setting this to 1 has the same effect as setting
``panic_on_warn=1``. The default value is 0.
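
Put as a rule: a warning triggers the panic when the running warning count, including that warning, reaches ``warn_limit``, unless the limit is 0. A minimal model of that rule, assuming a POSIX shell (``warn_limit_reached`` is a hypothetical name, and ``panic_on_warn`` is left out because it panics on the first warning regardless):

```shell
# Hypothetical model of the warn_limit rule.
# Usage: warn_limit_reached WARN_COUNT WARN_LIMIT
# Succeeds if WARN_COUNT warnings (including the current one) would cause
# a panic; a limit of 0 disables the check.
warn_limit_reached() {
    [ "$2" -ne 0 ] && [ "$1" -ge "$2" ]
}
```

With a limit of 1 the first warning already matches, which is why it behaves like ``panic_on_warn=1``.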


watchdog
========

This parameter can be used to disable or enable the soft lockup detector
*and* the NMI watchdog (i.e. the hard lockup detector) at the same time.

= ==============================
0 Disable both lockup detectors.
1 Enable both lockup detectors.
= ==============================

The soft lockup detector and the NMI watchdog can also be disabled or
enabled individually, using the ``soft_watchdog`` and ``nmi_watchdog``
parameters.

If the ``watchdog`` parameter is read, for example by executing::

    cat /proc/sys/kernel/watchdog

the output of this command (0 or 1) shows the logical OR of
``soft_watchdog`` and ``nmi_watchdog``.
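
The relationship can be checked by reading all three entries side by side. A sketch for a POSIX shell; ``show_watchdog_state`` is a hypothetical name, and its optional directory argument (defaulting to ``/proc/sys/kernel``) is only there so it can be tried against copies of the files. It returns non-zero if ``watchdog`` does not match the logical OR described above, or if one of the entries cannot be read (as with any entry here, one may be missing depending on the kernel configuration).

```shell
# Hypothetical helper: print the lockup-detector switches and check that
# "watchdog" equals the logical OR of "soft_watchdog" and "nmi_watchdog".
show_watchdog_state() {
    dir=${1:-/proc/sys/kernel}
    soft=$(cat "$dir/soft_watchdog") || return 1
    nmi=$(cat "$dir/nmi_watchdog") || return 1
    both=$(cat "$dir/watchdog") || return 1
    if [ "$soft" -ne 0 ] || [ "$nmi" -ne 0 ]; then
        expected=1
    else
        expected=0
    fi
    printf 'soft_watchdog=%s nmi_watchdog=%s watchdog=%s\n' \
        "$soft" "$nmi" "$both"
    [ "$both" -eq "$expected" ]
}
```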


watchdog_cpumask
================

This value can be used to control on which CPUs the watchdog may run.
The default cpumask is all possible cores, but if ``NO_HZ_FULL`` is
enabled in the kernel config, and cores are specified with the
``nohz_full=`` boot argument, those cores are excluded by default.
Offline cores can be included in this mask, and if the core is later
brought online, the watchdog will be started based on the mask value.

Typically this value would only be touched in the ``nohz_full`` case
to re-enable cores that by default were not running the watchdog,
if a kernel lockup was suspected on those cores.

The argument value is the standard cpulist format for cpumasks,
so for example to enable the watchdog on cores 0, 2, 3, and 4 you
might say::

    echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask
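
A cpulist is a comma-separated list of CPU numbers and inclusive ranges. To see which CPUs a value such as ``0,2-4`` covers, it can be expanded with a few lines of POSIX shell. This is a sketch: ``expand_cpulist`` is a hypothetical name, and it handles only single numbers and ``a-b`` ranges, not any further syntax the kernel's parser may accept.

```shell
# Hypothetical helper: expand a cpulist such as "0,2-4" into "0 2 3 4".
# Only single CPU numbers and inclusive a-b ranges are supported.
expand_cpulist() {
    result=
    old_ifs=$IFS
    IFS=,
    for item in $1; do
        case $item in
            *-*) first=${item%-*}; last=${item#*-} ;;
            *)   first=$item;      last=$item ;;
        esac
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            result="$result${result:+ }$cpu"   # space-separate entries
            cpu=$((cpu + 1))
        done
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}
```

For example, ``expand_cpulist 0,2-4`` prints ``0 2 3 4``, the cores named in the example above.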


watchdog_thresh
===============

This value can be used to control the frequency of hrtimer and NMI
events and the soft and hard lockup thresholds. The default threshold
is 10 seconds.

The softlockup threshold is (``2 * watchdog_thresh``). Setting this
tunable to zero will disable lockup detection altogether.
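
As a worked example, the default ``watchdog_thresh`` of 10 gives a soft lockup threshold of 2 * 10 = 20 seconds. The same rule as a small POSIX shell helper (``softlockup_threshold`` is a hypothetical name), including the special case of 0:

```shell
# Hypothetical helper: print the soft lockup threshold in seconds for a
# given watchdog_thresh (2 * watchdog_thresh), or "disabled" for 0.
softlockup_threshold() {
    if [ "$1" -eq 0 ]; then
        echo disabled
    else
        echo $((2 * $1))
    fi
}
```

On a running system: ``softlockup_threshold "$(cat /proc/sys/kernel/watchdog_thresh)"``.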