===============================
Documentation for /proc/sys/fs/
===============================

kernel version 2.2.10

Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>

Copyright (c) 2009, Shen Feng <shen@cn.fujitsu.com>

For general info and legal blurb, please look in intro.rst.

------------------------------------------------------------------------------

This file contains documentation for the sysctl files in
/proc/sys/fs/ and is valid for Linux kernel version 2.2.

The files in this directory can be used to tune and monitor
miscellaneous and general things in the operation of the Linux
kernel. Since some of the files _can_ be used to screw up your
system, it is advisable to read both documentation and source
before actually making adjustments.

1. /proc/sys/fs
===============

Currently, these files are in /proc/sys/fs:

- aio-max-nr
- aio-nr
- dentry-state
- dquot-max
- dquot-nr
- file-max
- file-nr
- inode-max
- inode-nr
- inode-state
- mount-max
- nr_open
- overflowuid
- overflowgid
- pipe-user-pages-hard
- pipe-user-pages-soft
- protected_fifos
- protected_hardlinks
- protected_regular
- protected_symlinks
- suid_dumpable
- super-max
- super-nr


aio-nr & aio-max-nr
-------------------

aio-nr is the running total of the number of events specified on the
io_setup system call for all currently active aio contexts. If aio-nr
reaches aio-max-nr, io_setup will fail with EAGAIN. Note that
raising aio-max-nr does not result in the pre-allocation or re-sizing
of any kernel data structures.
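
The comparison described above can be sketched in C as follows. This is a minimal userspace sketch, not kernel code. read_sysctl_ulong() and aio_headroom() are hypothetical helper names. The kernel may also charge a context for more events than the nr_events argument passed to io_setup, so the computed headroom should be treated as an optimistic estimate::

    #include <stdio.h>

    /* Read one unsigned integer from a sysctl file such as
     * /proc/sys/fs/aio-nr. Returns 0 on success, -1 on failure. */
    int read_sysctl_ulong(const char *path, unsigned long *out)
    {
        FILE *f = fopen(path, "r");
        if (f == NULL)
            return -1;
        int ok = (fscanf(f, "%lu", out) == 1);
        fclose(f);
        return ok ? 0 : -1;
    }

    /* Events that can still be requested before io_setup() is expected
     * to fail with EAGAIN, based only on aio-nr versus aio-max-nr. */
    unsigned long aio_headroom(unsigned long aio_nr, unsigned long aio_max_nr)
    {
        return aio_nr >= aio_max_nr ? 0 : aio_max_nr - aio_nr;
    }

A monitoring tool could, for example, read both files with read_sysctl_ulong() and warn when aio_headroom() falls below the number of events it intends to request.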


dentry-state
------------

From linux/include/linux/dcache.h::

    struct dentry_stat_t {
        int nr_dentry;
        int nr_unused;
        int age_limit;   /* age in seconds */
        int want_pages;  /* pages requested by system */
        int nr_negative; /* # of unused negative dentries */
        int dummy;       /* Reserved for future use */
    };

Dentries are dynamically allocated and deallocated.

nr_dentry shows the total number of dentries allocated (active + unused).
nr_unused shows the number of dentries that are not actively used, but
are saved in the LRU list for future reuse.

age_limit is the age in seconds after which dcache entries
can be reclaimed when memory is short. want_pages is
nonzero when shrink_dcache_pages() has been called and the
dcache isn't pruned yet.

nr_negative shows the number of unused dentries that are also
negative dentries, which do not map to any files. Instead,
they help speed up the rejection of lookups for non-existent
files requested by users.
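
A userspace tool can read the fields above from /proc/sys/fs/dentry-state. The sketch below assumes that file prints the six values as whitespace-separated integers in the order of the struct. struct dentry_state and parse_dentry_state() are hypothetical names. The fields are read as long so the sketch also works if your kernel declares them wider than int::

    #include <stdio.h>

    /* Hypothetical userspace mirror of the six fields listed above. */
    struct dentry_state {
        long nr_dentry;
        long nr_unused;
        long age_limit;    /* seconds */
        long want_pages;
        long nr_negative;
        long dummy;
    };

    /* Parse the text of /proc/sys/fs/dentry-state.
     * Returns 0 if all six fields were read, -1 otherwise. */
    int parse_dentry_state(const char *text, struct dentry_state *ds)
    {
        int n = sscanf(text, "%ld %ld %ld %ld %ld %ld",
                       &ds->nr_dentry, &ds->nr_unused, &ds->age_limit,
                       &ds->want_pages, &ds->nr_negative, &ds->dummy);
        return n == 6 ? 0 : -1;
    }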


dquot-max & dquot-nr
--------------------

The file dquot-max shows the maximum number of cached disk
quota entries.

The file dquot-nr shows the number of allocated disk quota
entries and the number of free disk quota entries.

If the number of free cached disk quotas is very low and
you have a very large number of simultaneous system users,
you might want to raise the limit.


file-max & file-nr
------------------

The value in file-max denotes the maximum number of file-handles
that the Linux kernel will allocate. When you get lots of error
messages about running out of file handles, you might want to
increase this limit.

Historically, the kernel was able to allocate file handles
dynamically, but not to free them again. The three values in
file-nr denote the number of allocated file handles, the number
of allocated but unused file handles, and the maximum number of
file handles. Linux 2.6 always reports 0 as the number of free
file handles -- this is not an error, it just means that the
number of allocated file handles exactly matches the number of
used file handles.

Attempts to allocate more file descriptors than file-max are
reported with printk; look for "VFS: file-max limit <number>
reached".
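
The three file-nr values can be parsed and turned into "in use" and "remaining" figures. This is a minimal sketch, assuming the file prints exactly three whitespace-separated unsigned integers in the order given above. The struct and function names are hypothetical::

    #include <stdio.h>

    struct file_nr {
        unsigned long allocated;  /* allocated file handles */
        unsigned long unused;     /* allocated but unused; 0 since Linux 2.6 */
        unsigned long max;        /* maximum number of file handles */
    };

    /* Parse /proc/sys/fs/file-nr text. Returns 0 on success. */
    int parse_file_nr(const char *text, struct file_nr *fn)
    {
        return sscanf(text, "%lu %lu %lu",
                      &fn->allocated, &fn->unused, &fn->max) == 3 ? 0 : -1;
    }

    unsigned long file_handles_in_use(const struct file_nr *fn)
    {
        /* Guard against underflow if the counters are momentarily skewed. */
        return fn->unused > fn->allocated ? 0 : fn->allocated - fn->unused;
    }

    unsigned long file_handles_left(const struct file_nr *fn)
    {
        unsigned long used = file_handles_in_use(fn);
        return used >= fn->max ? 0 : fn->max - used;
    }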


nr_open
-------

This denotes the maximum number of file-handles a process can
allocate. The default value is 1024*1024 (1048576), which should be
enough for most machines. The actual limit also depends on the
RLIMIT_NOFILE resource limit.
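
A process works within its RLIMIT_NOFILE limits, and nr_open caps how high the hard limit may be set. The setrlimit(2) manual page documents EPERM for an attempt to raise RLIMIT_NOFILE above the kernel maximum. The sketch below uses only getrlimit(2) and setrlimit(2) to raise the soft limit to the current hard limit, which an unprivileged process is allowed to do. raise_nofile_soft_limit() is a hypothetical name::

    #include <sys/resource.h>

    /* Raise the soft RLIMIT_NOFILE to the current hard limit.
     * Returns the resulting soft limit, or -1 on error. */
    long raise_nofile_soft_limit(void)
    {
        struct rlimit rl;
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;
        rl.rlim_cur = rl.rlim_max;   /* soft may go up to hard without privilege */
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;
        return (long)rl.rlim_cur;
    }

Raising the hard limit itself needs privilege and, as described above, cannot exceed nr_open.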


inode-max, inode-nr & inode-state
---------------------------------

As with file handles, the kernel allocates the inode structures
dynamically, but can't free them yet.

The value in inode-max denotes the maximum number of inode
handlers. This value should be 3-4 times larger than the value
in file-max, since stdin, stdout and network sockets also
need an inode struct to handle them. When you regularly run
out of inodes, you need to increase this value.

The file inode-nr contains the first two items from
inode-state, so we'll skip to that file...

Inode-state contains three actual numbers and four dummies.
The actual numbers are, in order of appearance, nr_inodes,
nr_free_inodes and preshrink.

Nr_inodes stands for the number of inodes the system has
allocated. This can be slightly more than inode-max because
Linux allocates them one pageful at a time.

Nr_free_inodes represents the number of free inodes (?) and
preshrink is nonzero when nr_inodes > inode-max and the
system needs to prune the inode list instead of allocating
more.
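
The layout above (three real values followed by four dummies in inode-state, and only the first two values in inode-nr) can be parsed as shown below. This is a sketch that assumes whitespace-separated integers. The struct and function names are hypothetical::

    #include <stdio.h>

    struct inode_state {
        long nr_inodes;
        long nr_free_inodes;
        long preshrink;
    };

    /* Parse /proc/sys/fs/inode-state: three real values, then four
     * dummies that are read and discarded. Returns 0 on success. */
    int parse_inode_state(const char *text, struct inode_state *st)
    {
        long dummy[4];
        int n = sscanf(text, "%ld %ld %ld %ld %ld %ld %ld",
                       &st->nr_inodes, &st->nr_free_inodes, &st->preshrink,
                       &dummy[0], &dummy[1], &dummy[2], &dummy[3]);
        return n == 7 ? 0 : -1;
    }

    /* Parse /proc/sys/fs/inode-nr, which holds only the first two values. */
    int parse_inode_nr(const char *text, struct inode_state *st)
    {
        return sscanf(text, "%ld %ld",
                      &st->nr_inodes, &st->nr_free_inodes) == 2 ? 0 : -1;
    }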


overflowgid & overflowuid
-------------------------

Some filesystems only support 16-bit UIDs and GIDs, although in Linux
UIDs and GIDs are 32 bits. When one of these filesystems is mounted
with writes enabled, any UID or GID that would exceed 65535 is translated
to a fixed value before being written to disk.

These sysctls allow you to change the value of the fixed UID and GID.
The default is 65534.
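
The translation can be expressed as a single conditional. This is a sketch of the rule as stated above, not the kernel's own macro. to_16bit_id() is a hypothetical name, and its overflow_id argument stands for the current value of overflowuid or overflowgid::

    #include <stdint.h>

    /* Map a 32-bit ID to what a 16-bit-ID filesystem would store:
     * IDs above 65535 are replaced by the configured overflow value. */
    uint16_t to_16bit_id(uint32_t id, uint16_t overflow_id)
    {
        return id > 65535u ? overflow_id : (uint16_t)id;
    }

With the default setting, a file owned by UID 100000 would be written to disk as owned by 65534.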


pipe-user-pages-hard
--------------------

Maximum total number of pages a non-privileged user may allocate for pipes.
Once this limit is reached, no new pipes may be allocated until usage goes
below the limit again. When set to 0, no limit is applied, which is the default
setting.


pipe-user-pages-soft
--------------------

Maximum total number of pages a non-privileged user may allocate for pipes
before the pipe size gets limited to a single page. Once this limit is reached,
new pipes will be limited to a single page in size for this user in order to
limit total memory usage, and trying to increase them using fcntl() will be
denied until usage goes below the limit again. The default value allows
allocating up to 1024 pipes at their default size. When set to 0, no limit is
applied.
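
The interaction of the soft and hard limits for a non-privileged user can be modelled as follows. This is a simplified sketch of the behaviour described in this and the previous section, not the kernel's implementation. The enum and function names are hypothetical, all quantities are in pages, and "reached" is interpreted as "would be exceeded by the new pipe". For example, 1024 pipes of 16 pages each (64 KiB with 4 KiB pages) add up to 16384 pages::

    /* Possible outcomes when a non-privileged user creates a pipe. */
    enum pipe_alloc {
        PIPE_ALLOC_FULL,      /* pipe gets its requested size */
        PIPE_ALLOC_ONE_PAGE,  /* soft limit hit: pipe limited to one page */
        PIPE_ALLOC_DENIED     /* hard limit hit: no new pipe */
    };

    /* user_pages: pages already used by this user's pipes.
     * pipe_pages: pages the new pipe would normally get.
     * soft, hard: the two sysctls; 0 means "no limit". */
    enum pipe_alloc new_pipe_outcome(unsigned long user_pages,
                                     unsigned long pipe_pages,
                                     unsigned long soft, unsigned long hard)
    {
        unsigned long pages = pipe_pages;
        if (soft != 0 && user_pages + pages > soft)
            pages = 1;                    /* shrink to a single page */
        if (hard != 0 && user_pages + pages > hard)
            return PIPE_ALLOC_DENIED;
        return pages == pipe_pages ? PIPE_ALLOC_FULL : PIPE_ALLOC_ONE_PAGE;
    }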


protected_fifos
---------------

The intent of this protection is to avoid unintentional writes to
an attacker-controlled FIFO, where a program expected to create a regular
file.

When set to "0", writing to FIFOs is unrestricted.

When set to "1", don't allow O_CREAT open on FIFOs that we don't own
in world-writable sticky directories, unless they are owned by the
owner of the directory.

When set to "2", it also applies to group-writable sticky directories.

This protection is based on the restrictions in Openwall.


protected_hardlinks
-------------------

A long-standing class of security issues is the hardlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given hardlink (i.e. a
root process follows a hardlink created by another user). Additionally,
on systems without separated partitions, this stops unauthorized users
from "pinning" vulnerable setuid/setgid files against being upgraded by
the administrator, or linking to special files.

When set to "0", hardlink creation behavior is unrestricted.

When set to "1", hardlinks cannot be created by users if they do not
already own the source file, or do not have read/write access to it.

This protection is based on the restrictions in Openwall and grsecurity.
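
The rule for the value "1" can be written as a small predicate. This is a sketch of the rule as stated above only. The kernel applies further checks (for example on setuid files and on capabilities) that are omitted here. hardlink_allowed() is a hypothetical name, and can_read_write stands for "the caller has both read and write access to the source file"::

    #include <stdbool.h>
    #include <sys/types.h>

    bool hardlink_allowed(int protected_hardlinks, uid_t caller_uid,
                          uid_t file_owner, bool can_read_write)
    {
        if (protected_hardlinks == 0)
            return true;             /* unrestricted */
        if (caller_uid == file_owner)
            return true;             /* caller owns the source file */
        return can_read_write;       /* otherwise read/write access is needed */
    }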


protected_regular
-----------------

This protection is similar to protected_fifos, but it
avoids writes to an attacker-controlled regular file, where a program
expected to create one.

When set to "0", writing to regular files is unrestricted.

When set to "1", don't allow O_CREAT open on regular files that we
don't own in world-writable sticky directories, unless they are
owned by the owner of the directory.

When set to "2", it also applies to group-writable sticky directories.
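
The protected_fifos and protected_regular rules share one decision procedure, modelled below for an O_CREAT open of a file that already exists. This is a sketch of the rules as stated in these two sections, not the kernel's code. o_creat_refused() is a hypothetical name, and the caller's identity is reduced to a single UID::

    #include <stdbool.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    /* Returns true if the O_CREAT open would be refused. */
    bool o_creat_refused(mode_t file_mode, uid_t file_owner,
                         mode_t dir_mode, uid_t dir_owner, uid_t caller,
                         int protected_fifos, int protected_regular)
    {
        int level;
        if (S_ISFIFO(file_mode))
            level = protected_fifos;
        else if (S_ISREG(file_mode))
            level = protected_regular;
        else
            return false;              /* other file types are not covered */

        if (level == 0 || !(dir_mode & S_ISVTX))
            return false;              /* protection off, or not a sticky dir */
        if (file_owner == caller || file_owner == dir_owner)
            return false;              /* we own it, or the dir owner does */
        if (dir_mode & S_IWOTH)
            return true;               /* world-writable sticky dir: level >= 1 */
        if ((dir_mode & S_IWGRP) && level >= 2)
            return true;               /* group-writable sticky dir: level 2 */
        return false;
    }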


protected_symlinks
------------------

A long-standing class of security issues is the symlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user). For a likely
incomplete list of hundreds of examples across the years, please see:
http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp

When set to "0", symlink following behavior is unrestricted.

When set to "1", symlinks are permitted to be followed only when outside
a sticky world-writable directory, or when the uid of the symlink and
follower match, or when the directory owner matches the symlink's owner.

This protection is based on the restrictions in Openwall and grsecurity.
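
The three conditions for the value "1" translate directly into a predicate. This is a sketch of the rule as stated above. symlink_follow_allowed() is a hypothetical name, dir_mode and dir_owner describe the directory that contains the symlink, and follower is the UID of the process following it::

    #include <stdbool.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    bool symlink_follow_allowed(int protected_symlinks, mode_t dir_mode,
                                uid_t dir_owner, uid_t link_owner,
                                uid_t follower)
    {
        if (protected_symlinks == 0)
            return true;
        /* The restriction only applies inside sticky, world-writable dirs. */
        if (!((dir_mode & S_ISVTX) && (dir_mode & S_IWOTH)))
            return true;
        if (follower == link_owner)
            return true;                 /* uid of symlink and follower match */
        return dir_owner == link_owner;  /* directory owner owns the symlink */
    }

In the classic attack, root (UID 0) following a link planted in /tmp by UID 1000 is refused, because none of the three conditions holds.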


suid_dumpable
-------------

This value can be used to query and set the core dump mode for setuid
or otherwise protected/tainted binaries. The modes are:

= ========== ===============================================================
0 (default)  traditional behaviour. Any process which has changed
             privilege levels or is execute only will not be dumped.
1 (debug)    all processes dump core when possible. The core dump is
             owned by the current user and no security is applied. This is
             intended for system debugging situations only.
             Ptrace is unchecked.
             This is insecure as it allows regular users to examine the
             memory contents of privileged processes.
2 (suidsafe) any binary which normally would not be dumped is dumped
             anyway, but only if the "core_pattern" kernel sysctl is set to
             either a pipe handler or a fully qualified path. (For more
             details on this limitation, see CVE-2006-2451.) This mode is
             appropriate when administrators are attempting to debug
             problems in a normal environment, and either have a core dump
             pipe handler that knows to treat privileged core dumps with
             care, or a specific directory defined for catching core dumps.
             If a core dump happens without a pipe handler or fully
             qualified path, a message will be emitted to syslog warning
             about the lack of a correct setting.
= ========== ===============================================================
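
Before switching to mode 2, an administrator can check that core_pattern has one of the two accepted forms: a pipe handler starting with "|" or a fully qualified path starting with "/". The sketch below reads /proc/sys/kernel/core_pattern and applies that test. The helper names are hypothetical::

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* True if a core_pattern value is a pipe handler ("|...") or a
     * fully qualified path ("/..."), the two forms mode 2 accepts. */
    bool core_pattern_ok_for_suidsafe(const char *pattern)
    {
        return pattern != NULL && (pattern[0] == '|' || pattern[0] == '/');
    }

    /* Read /proc/sys/kernel/core_pattern into buf without the trailing
     * newline. Returns 0 on success, -1 on failure. */
    int read_core_pattern(char *buf, size_t len)
    {
        FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
        if (f == NULL)
            return -1;
        char *ok = fgets(buf, (int)len, f);
        fclose(f);
        if (ok == NULL)
            return -1;
        buf[strcspn(buf, "\n")] = '\0';
        return 0;
    }

A plain relative pattern such as "core" fails the check, which corresponds to the syslog warning case described in the table.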


super-max & super-nr
--------------------

These numbers control the maximum number of superblocks, and
thus the maximum number of mounted filesystems the kernel
can have. You only need to increase super-max if you need to
mount more filesystems than the current value in super-max
allows you to.


aio-nr & aio-max-nr
-------------------

aio-nr shows the current system-wide number of asynchronous I/O
requests. aio-max-nr allows you to change the maximum value
aio-nr can grow to.


mount-max
---------

This denotes the maximum number of mounts that may exist
in a mount namespace.



2. /proc/sys/fs/binfmt_misc
===========================

Documentation for the files in /proc/sys/fs/binfmt_misc is
in Documentation/admin-guide/binfmt-misc.rst.


3. /proc/sys/fs/mqueue - POSIX message queues filesystem
========================================================

The "mqueue" filesystem provides the necessary kernel features to enable the
creation of a user space library that implements the POSIX message queues
API (as noted by the MSG tag in the POSIX 1003.1-2001 version of the System
Interfaces specification.)

The "mqueue" filesystem contains values for determining/setting the amount of
resources used by the file system.

/proc/sys/fs/mqueue/queues_max is a read/write file for setting/getting the
maximum number of message queues allowed on the system.

/proc/sys/fs/mqueue/msg_max is a read/write file for setting/getting the
maximum number of messages in a queue value. In fact it is the limiting value
for another (user) limit which is set in the mq_open invocation. This attribute
of a queue must be less than or equal to msg_max.

/proc/sys/fs/mqueue/msgsize_max is a read/write file for setting/getting the
maximum message size value (it is every message queue's attribute set during
its creation).

/proc/sys/fs/mqueue/msg_default is a read/write file for setting/getting the
default number of messages in a queue value if the attr parameter of mq_open(2)
is NULL. If it exceeds msg_max, the default value is initialized to msg_max.

/proc/sys/fs/mqueue/msgsize_default is a read/write file for setting/getting
the default message size value if the attr parameter of mq_open(2) is NULL. If
it exceeds msgsize_max, the default value is initialized to msgsize_max.
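
The two rules above (defaults are capped at the corresponding maximum, and a caller-supplied attribute must stay within msg_max and msgsize_max) can be sketched as below. This is a simplified model with hypothetical function names. It only uses the standard struct mq_attr fields mq_maxmsg and mq_msgsize and ignores the different limits that may apply to privileged callers::

    #include <mqueue.h>
    #include <stdbool.h>

    /* Effective default when mq_open(2) is called with a NULL attr:
     * the configured default, capped at the configured maximum. */
    long effective_mq_default(long configured_default, long configured_max)
    {
        return configured_default > configured_max ? configured_max
                                                   : configured_default;
    }

    /* Whether a caller-supplied attribute set fits the system limits. */
    bool mq_attr_within_limits(const struct mq_attr *attr,
                               long msg_max, long msgsize_max)
    {
        return attr->mq_maxmsg > 0 && attr->mq_maxmsg <= msg_max &&
               attr->mq_msgsize > 0 && attr->mq_msgsize <= msgsize_max;
    }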

4. /proc/sys/fs/epoll - Configuration options for the epoll interface
=====================================================================

This directory contains configuration options for the epoll(7) interface.

max_user_watches
----------------

Every epoll file descriptor can store a number of files to be monitored
for event readiness. Each one of these monitored files constitutes a "watch".
This configuration option sets the maximum number of "watches" that are
allowed for each user.
Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes
on a 64-bit one.
The current default value for max_user_watches is 1/32 of the available
low memory, divided by the "watch" cost in bytes.
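
The default-value rule can be turned into arithmetic. The sketch below follows the 1/32 ratio and per-watch costs stated above and takes low memory from sysinfo(2) as total RAM minus high memory. Both function names are hypothetical. The exact formula in your kernel's source may differ from this description, so compare the estimate with the value actually found in /proc/sys/fs/epoll/max_user_watches::

    #include <sys/sysinfo.h>

    /* (low memory / 32) / per-watch cost, as described above. */
    unsigned long long estimate_max_user_watches(unsigned long long lowmem,
                                                 unsigned long long watch_cost)
    {
        return watch_cost ? (lowmem / 32) / watch_cost : 0;
    }

    /* Low memory in bytes according to sysinfo(2); 0 on error. */
    unsigned long long lowmem_bytes(void)
    {
        struct sysinfo si;
        if (sysinfo(&si) != 0)
            return 0;
        return (unsigned long long)(si.totalram - si.totalhigh) * si.mem_unit;
    }

On a 64-bit kernel, estimate_max_user_watches(lowmem_bytes(), 160) gives the figure implied by the text.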