Commit | Line | Data |
---|---|---|
89b408a6 | 1 | .. SPDX-License-Identifier: GPL-2.0 |
1da177e4 | 2 | |
89b408a6 | 3 | ====================== |
1da177e4 LT |
4 | The SGI XFS Filesystem |
5 | ====================== | |
6 | ||
7 | XFS is a high performance journaling filesystem which originated | |
8 | on the SGI IRIX platform. It is completely multi-threaded, can | |
9 | support large files and large filesystems, extended attributes, | |
10 | variable block sizes, is extent based, and makes extensive use of | |
11 | Btrees (directories, extents, free space) to aid both performance | |
12 | and scalability. | |
13 | ||
a10c5d91 | 14 | Refer to the documentation at https://xfs.wiki.kernel.org/ |
1da177e4 LT |
15 | for further details. This implementation is on-disk compatible |
16 | with the IRIX version of XFS. | |
17 | ||
18 | ||
19 | Mount Options | |
20 | ============= | |
21 | ||
22 | When mounting an XFS filesystem, the following options are accepted. | |
23 | ||
fc97bbf3 NS |
24 | allocsize=size |
25 | Sets the buffered I/O end-of-file preallocation size when | |
26 | doing delayed allocation writeout (default size is 64KiB). | |
27 | Valid values for this option are page size (typically 4KiB) | |
28 | through to 1GiB, inclusive, in power-of-2 increments. | |
29 | ||
3e5b7d8b DC |
30 | The default behaviour is for dynamic end-of-file |
31 | preallocation size, which uses a set of heuristics to | |
32 | optimise the preallocation size based on the current | |
33 | allocation patterns within the file and the access patterns | |
89b408a6 | 34 | to the file. Specifying a fixed ``allocsize`` value turns off |
3e5b7d8b DC |
35 | the dynamic behaviour. |
36 | ||
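The ``allocsize`` constraint above can be checked mechanically. The following is a minimal sketch, assuming "power-of-2 increments" means the value itself must be a power of two; the function name and the 4 KiB default page size are illustrative assumptions, not part of XFS.

```python
GIB = 1 << 30  # upper bound stated in the option description


def is_valid_allocsize(size_bytes: int, page_size: int = 4096) -> bool:
    """Return True if size_bytes is a power of two in [page_size, 1 GiB].

    page_size defaults to 4 KiB, the "typical" value named in the text;
    the real page size depends on the machine.
    """
    is_power_of_two = size_bytes > 0 and (size_bytes & (size_bytes - 1)) == 0
    return is_power_of_two and page_size <= size_bytes <= GIB


if __name__ == "__main__":
    for candidate in (2048, 65536, 3 * 4096, GIB, 2 * GIB):
        print(candidate, is_valid_allocsize(candidate))
```

A value that passes this check is only a candidate; the kernel remains the authority on what a given mount accepts.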
89b408a6 | 37 | attr2 or noattr2 |
3e5b7d8b DC |
38 | These options enable/disable an "opportunistic" improvement to |
39 | be made in the way inline extended attributes are stored | |
40 | on-disk. When the new form is used for the first time when | |
89b408a6 | 41 | ``attr2`` is selected (either when setting or removing extended |
3e5b7d8b DC |
42 | attributes) the on-disk superblock feature bit field will be |
43 | updated to reflect this format being in use. | |
44 | ||
45 | The default behaviour is determined by the on-disk feature | |
89b408a6 SE |
46 | bit indicating that ``attr2`` behaviour is active. If either |
47 | mount option is set, then that becomes the new default used | |
3e5b7d8b | 48 | by the filesystem. |
fc97bbf3 | 49 | |
89b408a6 SE |
50 | CRC enabled filesystems always use the ``attr2`` format, and so |
51 | will reject the ``noattr2`` mount option if it is set. | |
d3eaace8 | 52 | |
89b408a6 | 53 | discard or nodiscard (default) |
3e5b7d8b DC |
54 | Enable/disable the issuing of commands to let the block |
55 | device reclaim space freed by the filesystem. This is | |
56 | useful for SSD devices, thinly provisioned LUNs and virtual | |
57 | machine images, but may have a performance impact. | |
58 | ||
89b408a6 SE |
59 | Note: It is currently recommended that you use the ``fstrim`` |
60 | application to ``discard`` unused blocks rather than the ``discard`` | |
3e5b7d8b DC |
61 | mount option because the performance impact of this option |
62 | is quite severe. | |
63 | ||
89b408a6 | 64 | grpid/bsdgroups or nogrpid/sysvgroups (default) |
3e5b7d8b | 65 | These options define what group ID a newly created file |
89b408a6 | 66 | gets. When ``grpid`` is set, it takes the group ID of the |
3e5b7d8b | 67 | directory in which it is created; otherwise it takes the |
89b408a6 SE |
68 | ``fsgid`` of the current process, unless the directory has the |
69 | ``setgid`` bit set, in which case it takes the ``gid`` from the | |
70 | parent directory, and also gets the ``setgid`` bit set if it is | |
3e5b7d8b DC |
71 | a directory itself. |
72 | ||
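The group ID rules above can be summarised as a small decision function. This is a sketch of the documented behaviour only; the names are hypothetical, and the handling of the ``setgid`` bit when ``grpid`` is set is assumed to follow the same rule as the ``setgid`` directory case, which the text does not state explicitly.

```python
from typing import NamedTuple


class NewInodeGroup(NamedTuple):
    gid: int       # group ID assigned to the new file or directory
    setgid: bool   # whether the new inode gets the setgid bit


def new_inode_group(grpid: bool, dir_gid: int, dir_has_setgid: bool,
                    fsgid: int, is_directory: bool) -> NewInodeGroup:
    """Model the grpid/nogrpid group assignment described in the text."""
    # With grpid, or inside a setgid directory, the parent's gid is used.
    inherit = grpid or dir_has_setgid
    gid = dir_gid if inherit else fsgid
    # Assumption: only new directories inside a setgid directory inherit
    # the setgid bit, regardless of the grpid setting.
    setgid = dir_has_setgid and is_directory
    return NewInodeGroup(gid, setgid)


if __name__ == "__main__":
    print(new_inode_group(False, 100, False, 1000, False))
    print(new_inode_group(False, 100, True, 1000, True))
```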
73 | filestreams | |
74 | Make the data allocator use the filestreams allocation mode | |
75 | across the entire filesystem rather than just on directories | |
76 | configured to use it. | |
77 | ||
89b408a6 SE |
78 | ikeep or noikeep (default) |
79 | When ``ikeep`` is specified, XFS does not delete empty inode | |
80 | clusters and keeps them around on disk. When ``noikeep`` is | |
3e5b7d8b DC |
81 | specified, empty inode clusters are returned to the free |
82 | space pool. | |
c99abb8f | 83 | |
89b408a6 SE |
84 | inode32 or inode64 (default) |
85 | When ``inode32`` is specified, it indicates that XFS limits | |
3e5b7d8b DC |
86 | inode creation to locations which will not result in inode |
87 | numbers with more than 32 bits of significance. | |
88 | ||
89b408a6 | 89 | When ``inode64`` is specified, it indicates that XFS is allowed |
3e5b7d8b DC |
90 | to create inodes at any location in the filesystem, |
91 | including those which will result in inode numbers occupying | |
89b408a6 | 92 | more than 32 bits of significance. |
3e5b7d8b | 93 | |
89b408a6 | 94 | ``inode32`` is provided for backwards compatibility with older |
3e5b7d8b DC |
95 | systems and applications, since 64-bit inode numbers might |
96 | cause problems for some applications that cannot handle | |
97 | large inode numbers. If applications are in use which do | |
89b408a6 | 98 | not handle inode numbers bigger than 32 bits, the ``inode32`` |
3e5b7d8b DC |
99 | option should be specified. |
100 | ||
89b408a6 SE |
101 | largeio or nolargeio (default) |
102 | If ``nolargeio`` is specified, the optimal I/O reported in | |
103 | ``st_blksize`` by **stat(2)** will be as small as possible to allow | |
3e5b7d8b DC |
104 | user applications to avoid inefficient read/modify/write |
105 | I/O. This is typically the page size of the machine, as | |
106 | this is the granularity of the page cache. | |
107 | ||
89b408a6 SE |
108 | If ``largeio`` is specified, a filesystem that was created with a |
109 | ``swidth`` specified will return the ``swidth`` value (in bytes) | |
110 | in ``st_blksize``. If the filesystem does not have a ``swidth`` | |
111 | specified but does specify an ``allocsize`` then ``allocsize`` | |
3e5b7d8b | 112 | (in bytes) will be returned instead. Otherwise the behaviour |
89b408a6 | 113 | is the same as if ``nolargeio`` was specified. |
fc97bbf3 | 114 | |
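The ``st_blksize`` selection described above reduces to a short fallback chain. A minimal sketch, assuming all sizes are already expressed in bytes and that an unset ``swidth`` or ``allocsize`` is represented as ``None``; the function and parameter names are illustrative.

```python
from typing import Optional


def reported_blksize(largeio: bool, page_size: int = 4096,
                     swidth_bytes: Optional[int] = None,
                     allocsize_bytes: Optional[int] = None) -> int:
    """Return the st_blksize value the text says stat(2) would report."""
    if largeio:
        if swidth_bytes:           # filesystem created with a swidth
            return swidth_bytes
        if allocsize_bytes:        # no swidth, but allocsize was given
            return allocsize_bytes
    # nolargeio, or largeio with neither value set: typically the page size.
    return page_size


if __name__ == "__main__":
    print(reported_blksize(True, swidth_bytes=1 << 20))
    print(reported_blksize(True, allocsize_bytes=65536))
    print(reported_blksize(False, swidth_bytes=1 << 20))
```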
1da177e4 | 115 | logbufs=value |
3e5b7d8b DC |
116 | Set the number of in-memory log buffers. Valid numbers |
117 | range from 2-8 inclusive. | |
118 | ||
119 | The default value is 8 buffers. | |
120 | ||
121 | If the memory cost of 8 log buffers is too high on small | |
122 | systems, then it may be reduced at some cost to performance | |
89b408a6 | 123 | on metadata intensive workloads. The ``logbsize`` option below |
9ed354b7 | 124 | controls the size of each buffer and so is also relevant to |
3e5b7d8b | 125 | this case. |
1da177e4 LT |
126 | |
127 | logbsize=value | |
3e5b7d8b DC |
128 | Set the size of each in-memory log buffer. The size may be |
129 | specified in bytes, or in kilobytes with a "k" suffix. | |
130 | Valid sizes for version 1 and version 2 logs are 16384 (16k) | |
131 | and 32768 (32k). Valid sizes for version 2 logs also | |
132 | include 65536 (64k), 131072 (128k) and 262144 (256k). The | |
133 | logbsize must be an integer multiple of the log | |
89b408a6 | 134 | stripe unit configured at **mkfs(8)** time. |
3e5b7d8b | 135 | |
559394d3 | 136 | The default value for version 1 logs is 32768, while the |
3e5b7d8b | 137 | default value for version 2 logs is MAX(32768, log_sunit). |
1da177e4 LT |
138 | |
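The ``logbsize`` rules above can be expressed as a small validator. This is a sketch under stated assumptions: ``log_sunit`` is taken to be in bytes, a value of 0 means no log stripe unit is configured, and the helper names are hypothetical.

```python
V1_SIZES = (16384, 32768)
V2_SIZES = V1_SIZES + (65536, 131072, 262144)


def parse_logbsize(text: str) -> int:
    """Parse a size given in bytes or in kilobytes with a "k" suffix."""
    if text.endswith("k"):
        return int(text[:-1]) * 1024
    return int(text)


def default_logbsize(log_version: int, log_sunit: int = 0) -> int:
    """Default from the text: 32768 for v1, MAX(32768, log_sunit) for v2."""
    return 32768 if log_version == 1 else max(32768, log_sunit)


def is_valid_logbsize(size: int, log_version: int, log_sunit: int = 0) -> bool:
    """Check the size list for the log version and the stripe unit multiple."""
    allowed = V1_SIZES if log_version == 1 else V2_SIZES
    if size not in allowed:
        return False
    return log_sunit == 0 or size % log_sunit == 0


if __name__ == "__main__":
    size = parse_logbsize("64k")
    print(size, is_valid_logbsize(size, log_version=2, log_sunit=32768))
```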
139 | logdev=device and rtdev=device | |
140 | Use an external log (metadata journal) and/or real-time device. | |
141 | An XFS filesystem has up to three parts: a data section, a log | |
142 | section, and a real-time section. The real-time section is | |
143 | optional, and the log section can be separate from the data | |
144 | section or contained within it. | |
145 | ||
146 | noalign | |
3e5b7d8b DC |
147 | Data allocations will not be aligned at stripe unit |
148 | boundaries. This is only relevant to filesystems created | |
89b408a6 SE |
149 | with non-zero data alignment parameters (``sunit``, ``swidth``) by |
150 | **mkfs(8)**. | |
1da177e4 LT |
151 | |
152 | norecovery | |
153 | The filesystem will be mounted without running log recovery. | |
154 | If the filesystem was not cleanly unmounted, it is likely to | |
89b408a6 | 155 | be inconsistent when mounted in ``norecovery`` mode. |
1da177e4 | 156 | Some files or directories may not be accessible because of this. |
89b408a6 | 157 | Filesystems mounted ``norecovery`` must be mounted read-only or |
1da177e4 LT |
158 | the mount will fail. |
159 | ||
160 | nouuid | |
3e5b7d8b | 161 | Don't check for double mounted file systems using the file |
89b408a6 SE |
162 | system ``uuid``. This is useful to mount LVM snapshot volumes, |
163 | and often used in combination with ``norecovery`` for mounting | |
3e5b7d8b DC |
164 | read-only snapshots. |
165 | ||
166 | noquota | |
167 | Forcibly turns off all quota accounting and enforcement | |
168 | within the filesystem. | |
1da177e4 | 169 | |
fc97bbf3 | 170 | uquota/usrquota/uqnoenforce/quota |
1da177e4 | 171 | User disk quota accounting enabled, and limits (optionally) |
89b408a6 | 172 | enforced. Refer to **xfs_quota(8)** for further details. |
1da177e4 | 173 | |
fc97bbf3 | 174 | gquota/grpquota/gqnoenforce |
1da177e4 | 175 | Group disk quota accounting enabled and limits (optionally) |
89b408a6 | 176 | enforced. Refer to **xfs_quota(8)** for further details. |
fc97bbf3 NS |
177 | |
178 | pquota/prjquota/pqnoenforce | |
179 | Project disk quota accounting enabled and limits (optionally) | |
89b408a6 | 180 | enforced. Refer to **xfs_quota(8)** for further details. |
1da177e4 LT |
181 | |
182 | sunit=value and swidth=value | |
3e5b7d8b DC |
183 | Used to specify the stripe unit and width for a RAID device |
184 | or a stripe volume. "value" must be specified in 512-byte | |
185 | block units. These options are only relevant to filesystems | |
186 | that were created with non-zero data alignment parameters. | |
187 | ||
89b408a6 | 188 | The ``sunit`` and ``swidth`` parameters specified must be compatible |
3e5b7d8b | 189 | with the existing filesystem alignment characteristics. In |
89b408a6 SE |
190 | general, that means the only valid changes to ``sunit`` are |
191 | increasing it by a power-of-2 multiple. Valid ``swidth`` values | |
192 | are any integer multiple of a valid ``sunit`` value. | |
3e5b7d8b DC |
193 | |
194 | Typically the only time these mount options are necessary is | |
195 | after an underlying RAID device has had its geometry | |
196 | modified, such as adding a new disk to a RAID5 LUN and | |
197 | reshaping it. | |
1da177e4 | 198 | |
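The compatibility rules for ``sunit`` and ``swidth`` can be sketched as follows. This models only the general rule stated above (power-of-2 multiples for ``sunit``, integer multiples of ``sunit`` for ``swidth``); the helper names are hypothetical, and treating an unchanged ``sunit`` (multiple of 1) as valid is an assumption.

```python
SECTOR_BYTES = 512  # mount option values are given in 512-byte units


def bytes_to_units(nbytes: int) -> int:
    """Convert a byte count to the 512-byte units the options expect."""
    if nbytes % SECTOR_BYTES:
        raise ValueError("size is not a multiple of 512 bytes")
    return nbytes // SECTOR_BYTES


def is_compatible_sunit(current_sunit: int, new_sunit: int) -> bool:
    """True if new_sunit is current_sunit times a power of two."""
    if current_sunit <= 0 or new_sunit % current_sunit:
        return False
    factor = new_sunit // current_sunit
    return factor > 0 and (factor & (factor - 1)) == 0


def is_valid_swidth(sunit: int, swidth: int) -> bool:
    """True if swidth is a positive integer multiple of sunit."""
    return sunit > 0 and swidth > 0 and swidth % sunit == 0


if __name__ == "__main__":
    sunit = bytes_to_units(64 * 1024)      # 64 KiB stripe unit
    print(sunit, is_compatible_sunit(sunit, 2 * sunit),
          is_valid_swidth(2 * sunit, 8 * sunit))
```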
fc97bbf3 NS |
199 | swalloc |
200 | Data allocations will be rounded up to stripe width boundaries | |
201 | when the current end of file is being extended and the file | |
202 | size is larger than the stripe width size. | |
203 | ||
3e5b7d8b DC |
204 | wsync |
205 | When specified, all filesystem namespace operations are | |
206 | executed synchronously. This ensures that when the namespace | |
207 | operation (create, unlink, etc.) completes, the change to the | |
208 | namespace is on stable storage. This is useful in HA setups | |
209 | where failover must not result in clients seeing | |
210 | inconsistent namespace presentation during or after a | |
211 | failover event. | |
212 | ||
b96cb835 DW |
213 | Deprecation of V4 Format |
214 | ======================== | |
215 | ||
216 | The V4 filesystem format lacks certain features that are supported by | |
217 | the V5 format, such as metadata checksumming, strengthened metadata | |
218 | verification, and the ability to store timestamps past the year 2038. | |
219 | Because of this, the V4 format is deprecated. All users should upgrade | |
220 | by backing up their files, reformatting, and restoring from the backup. | |
221 | ||
222 | Administrators and users can detect a V4 filesystem by running xfs_info | |
223 | against a filesystem mountpoint and checking for a string containing | |
224 | "crc=". If no such string is found, please upgrade xfsprogs to the | |
225 | latest version and try again. | |
226 | ||
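The ``crc=`` check above can be scripted. The sketch below assumes that ``crc=0`` denotes a V4 filesystem and ``crc=1`` a V5 (metadata checksum) filesystem; the sample text is an abbreviated, illustrative stand-in for real ``xfs_info`` output, and the function name is hypothetical.

```python
import re
from typing import Optional


def detect_format(xfs_info_output: str) -> Optional[str]:
    """Return "V4", "V5", or None when no "crc=" string is present.

    None corresponds to the case where the text advises upgrading
    xfsprogs and trying again.
    """
    match = re.search(r"\bcrc=(\d+)", xfs_info_output)
    if match is None:
        return None
    # Assumption: crc=1 means metadata checksums (V5), crc=0 means V4.
    return "V5" if match.group(1) == "1" else "V4"


if __name__ == "__main__":
    sample = "meta-data=/dev/sdb1 isize=512 agcount=4\n         = crc=1 finobt=1\n"
    print(detect_format(sample))
```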
227 | The deprecation will take place in two parts. Support for mounting V4 | |
228 | filesystems can now be disabled at kernel build time via a Kconfig option. | |
229 | The option will default to yes until September 2025, at which time it | |
230 | will be changed to default to no. In September 2030, support will be | |
231 | removed from the codebase entirely. | |
232 | ||
233 | Note: Distributors may choose to withdraw V4 format support earlier than | |
234 | the dates listed above. | |
3e5b7d8b DC |
235 | |
236 | Deprecated Mount Options | |
237 | ======================== | |
238 | ||
71deb8a5 | 239 | ============================ ================ |
4cf4573d | 240 | Name Removal Schedule |
71deb8a5 | 241 | ============================ ================ |
b96cb835 | 242 | Mounting with V4 filesystem September 2030 |
7ba83850 | 243 | Mounting ascii-ci filesystem September 2030 |
c23c393e PR |
244 | ikeep/noikeep September 2025 |
245 | attr2/noattr2 September 2025 | |
71deb8a5 | 246 | ============================ ================ |
3e5b7d8b | 247 | |
3e5b7d8b | 248 | |
444a7022 ES |
249 | Removed Mount Options |
250 | ===================== | |
3e5b7d8b | 251 | |
89b408a6 | 252 | =========================== ======= |
444a7022 | 253 | Name Removed |
89b408a6 | 254 | =========================== ======= |
4d66ea09 FL |
255 | delaylog/nodelaylog v4.0 |
256 | ihashsize v4.0 | |
257 | irixsgid v4.0 | |
258 | osyncisdsync/osyncisosync v4.0 | |
1c02d502 ES |
259 | barrier v4.19 |
260 | nobarrier v4.19 | |
89b408a6 | 261 | =========================== ======= |
fc97bbf3 | 262 | |
1da177e4 LT |
263 | sysctls |
264 | ======= | |
265 | ||
266 | The following sysctls are available for the XFS filesystem: | |
267 | ||
268 | fs.xfs.stats_clear (Min: 0 Default: 0 Max: 1) | |
fc97bbf3 | 269 | Setting this to "1" clears accumulated XFS statistics |
1da177e4 | 270 | in /proc/fs/xfs/stat. It then immediately resets to "0". |
fc97bbf3 | 271 | |
1da177e4 | 272 | fs.xfs.xfssyncd_centisecs (Min: 100 Default: 3000 Max: 720000) |
3e5b7d8b DC |
273 | The interval at which the filesystem flushes metadata |
274 | out to disk and runs internal cache cleanup routines. | |
1da177e4 | 275 | |
3e5b7d8b DC |
276 | fs.xfs.filestream_centisecs (Min: 1 Default: 3000 Max: 360000) |
277 | The interval at which the filesystem ages filestreams cache | |
278 | references and returns timed-out AGs back to the free stream | |
279 | pool. | |
1da177e4 | 280 | |
3e5b7d8b | 281 | fs.xfs.speculative_prealloc_lifetime |
99528efd | 282 | (Units: seconds Min: 1 Default: 300 Max: 86400) |
3e5b7d8b DC |
283 | The interval at which the background scanning for inodes |
284 | with unused speculative preallocation runs. The scan | |
285 | removes unused preallocation from clean inodes and releases | |
286 | the unused space back to the free pool. | |
1da177e4 | 287 | |
89e0eb8c DW |
288 | fs.xfs.speculative_cow_prealloc_lifetime |
289 | This is an alias for speculative_prealloc_lifetime. | |
290 | ||
1da177e4 LT |
291 | fs.xfs.error_level (Min: 0 Default: 3 Max: 11) |
292 | A volume knob for error reporting when internal errors occur. | |
293 | This will generate detailed messages & backtraces for filesystem | |
294 | shutdowns, for example. Current threshold values are: | |
295 | ||
296 | XFS_ERRLEVEL_OFF: 0 | |
297 | XFS_ERRLEVEL_LOW: 1 | |
298 | XFS_ERRLEVEL_HIGH: 5 | |
299 | ||
167ce4cb | 300 | fs.xfs.panic_mask (Min: 0 Default: 0 Max: 511) |
fc97bbf3 | 301 | Causes certain error conditions to call BUG(). Value is a bitmask; |
de8bd0eb | 302 | OR together the tags which represent errors which should cause panics: |
fc97bbf3 | 303 | |
1da177e4 LT |
304 | XFS_NO_PTAG 0 |
305 | XFS_PTAG_IFLUSH 0x00000001 | |
306 | XFS_PTAG_LOGRES 0x00000002 | |
307 | XFS_PTAG_AILDELETE 0x00000004 | |
308 | XFS_PTAG_ERROR_REPORT 0x00000008 | |
309 | XFS_PTAG_SHUTDOWN_CORRUPT 0x00000010 | |
310 | XFS_PTAG_SHUTDOWN_IOERROR 0x00000020 | |
311 | XFS_PTAG_SHUTDOWN_LOGERROR 0x00000040 | |
de8bd0eb | 312 | XFS_PTAG_FSBLOCK_ZERO 0x00000080 |
d519da41 | 313 | XFS_PTAG_VERIFIER_ERROR 0x00000100 |
1da177e4 | 314 | |
fc97bbf3 | 315 | This option is intended for debugging only. |
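Because ``panic_mask`` is a bitmask, a value can be composed and decoded with ordinary bitwise operations. A minimal sketch using the tag values listed above; the helper names are illustrative and not part of any XFS tool.

```python
# Tag values copied from the table above.
PTAGS = {
    "XFS_PTAG_IFLUSH": 0x001,
    "XFS_PTAG_LOGRES": 0x002,
    "XFS_PTAG_AILDELETE": 0x004,
    "XFS_PTAG_ERROR_REPORT": 0x008,
    "XFS_PTAG_SHUTDOWN_CORRUPT": 0x010,
    "XFS_PTAG_SHUTDOWN_IOERROR": 0x020,
    "XFS_PTAG_SHUTDOWN_LOGERROR": 0x040,
    "XFS_PTAG_FSBLOCK_ZERO": 0x080,
    "XFS_PTAG_VERIFIER_ERROR": 0x100,
}
MAX_MASK = 511  # documented maximum for fs.xfs.panic_mask


def build_panic_mask(*tag_names: str) -> int:
    """OR together the named tags into a single mask value."""
    mask = 0
    for name in tag_names:
        mask |= PTAGS[name]
    return mask


def decode_panic_mask(mask: int) -> list:
    """List the tag names set in mask, in table order."""
    if not 0 <= mask <= MAX_MASK:
        raise ValueError("mask outside the documented 0..511 range")
    return [name for name, bit in PTAGS.items() if mask & bit]


if __name__ == "__main__":
    mask = build_panic_mask("XFS_PTAG_SHUTDOWN_CORRUPT", "XFS_PTAG_VERIFIER_ERROR")
    print(mask, decode_panic_mask(mask))
```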
1da177e4 LT |
316 | |
317 | fs.xfs.irix_symlink_mode (Min: 0 Default: 0 Max: 1) | |
318 | Controls whether symlinks are created with mode 0777 (default) | |
319 | or whether their mode is affected by the umask (irix mode). | |
320 | ||
321 | fs.xfs.irix_sgid_inherit (Min: 0 Default: 0 Max: 1) | |
322 | Controls files created in SGID directories. | |
323 | If the group ID of the new file does not match the effective group | |
fc97bbf3 NS |
324 | ID or one of the supplementary group IDs of the parent dir, the |
325 | ISGID bit is cleared if the irix_sgid_inherit compatibility sysctl | |
1da177e4 LT |
326 | is set. |
327 | ||
fc97bbf3 NS |
328 | fs.xfs.inherit_sync (Min: 0 Default: 1 Max: 1) |
329 | Setting this to "1" will cause the "sync" flag set | |
89b408a6 | 330 | by the **xfs_io(8)** chattr command on a directory to be |
1da177e4 LT |
331 | inherited by files in that directory. |
332 | ||
fc97bbf3 NS |
333 | fs.xfs.inherit_nodump (Min: 0 Default: 1 Max: 1) |
334 | Setting this to "1" will cause the "nodump" flag set | |
89b408a6 | 335 | by the **xfs_io(8)** chattr command on a directory to be |
1da177e4 LT |
336 | inherited by files in that directory. |
337 | ||
fc97bbf3 NS |
338 | fs.xfs.inherit_noatime (Min: 0 Default: 1 Max: 1) |
339 | Setting this to "1" will cause the "noatime" flag set | |
89b408a6 | 340 | by the **xfs_io(8)** chattr command on a directory to be |
1da177e4 | 341 | inherited by files in that directory. |
fc97bbf3 NS |
342 | |
343 | fs.xfs.inherit_nosymlinks (Min: 0 Default: 1 Max: 1) | |
344 | Setting this to "1" will cause the "nosymlinks" flag set | |
89b408a6 | 345 | by the **xfs_io(8)** chattr command on a directory to be |
fc97bbf3 NS |
346 | inherited by files in that directory. |
347 | ||
3e5b7d8b DC |
348 | fs.xfs.inherit_nodefrag (Min: 0 Default: 1 Max: 1) |
349 | Setting this to "1" will cause the "nodefrag" flag set | |
89b408a6 | 350 | by the **xfs_io(8)** chattr command on a directory to be |
3e5b7d8b DC |
351 | inherited by files in that directory. |
352 | ||
fc97bbf3 NS |
353 | fs.xfs.rotorstep (Min: 1 Default: 1 Max: 256) |
354 | In "inode32" allocation mode, this option determines how many | |
355 | files the allocator attempts to allocate in the same allocation | |
356 | group before moving to the next allocation group. The intent | |
357 | is to control the rate at which the allocator moves between | |
358 | allocation groups when allocating extents for new files. | |
3e5b7d8b DC |
359 | |
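The documented limits for the sysctls above lend themselves to a simple range check before a value is written. This sketch copies the (min, default, max) triples from the descriptions in this section; the table is a partial transcription and the helper names are hypothetical.

```python
# (min, default, max) as listed in the sysctl descriptions above.
SYSCTL_RANGES = {
    "fs.xfs.stats_clear": (0, 0, 1),
    "fs.xfs.xfssyncd_centisecs": (100, 3000, 720000),
    "fs.xfs.filestream_centisecs": (1, 3000, 360000),
    "fs.xfs.speculative_prealloc_lifetime": (1, 300, 86400),
    "fs.xfs.error_level": (0, 3, 11),
    "fs.xfs.panic_mask": (0, 0, 511),
    "fs.xfs.rotorstep": (1, 1, 256),
}


def in_range(name: str, value: int) -> bool:
    """True if value lies within the documented bounds for name."""
    low, _default, high = SYSCTL_RANGES[name]
    return low <= value <= high


def centisecs_to_seconds(centisecs: int) -> float:
    """Convert a *_centisecs value to seconds (100 centiseconds per second)."""
    return centisecs / 100


if __name__ == "__main__":
    default = SYSCTL_RANGES["fs.xfs.xfssyncd_centisecs"][1]
    print(centisecs_to_seconds(default), in_range("fs.xfs.rotorstep", 0))
```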
360 | Deprecated Sysctls | |
361 | ================== | |
362 | ||
89e0eb8c DW |
363 | =========================================== ================ |
364 | Name Removal Schedule | |
365 | =========================================== ================ | |
366 | fs.xfs.irix_sgid_inherit September 2025 | |
367 | fs.xfs.irix_symlink_mode September 2025 | |
368 | fs.xfs.speculative_cow_prealloc_lifetime September 2025 | |
369 | =========================================== ================ | |
3e5b7d8b | 370 | |
3e5b7d8b | 371 | |
64af7a6e DC |
372 | Removed Sysctls |
373 | =============== | |
3e5b7d8b | 374 | |
38a449ff | 375 | ============================= ======= |
64af7a6e | 376 | Name Removed |
38a449ff | 377 | ============================= ======= |
4d66ea09 FL |
378 | fs.xfs.xfsbufd_centisec v4.0 |
379 | fs.xfs.age_buffer_centisecs v4.0 | |
38a449ff | 380 | ============================= ======= |
5694fe9a CM |
381 | |
382 | Error handling | |
383 | ============== | |
384 | ||
385 | XFS can act differently according to the type of error found during its | |
386 | operation. The implementation introduces the following concepts to the error | |
387 | handler: | |
388 | ||
389 | -failure speed: | |
390 | Defines how fast XFS should propagate an error upwards when a specific | |
391 | error is found during the filesystem operation. It can propagate | |
392 | immediately, after a defined number of retries, after a set time period, | |
393 | or simply retry forever. | |
394 | ||
395 | -error classes: | |
396 | Specifies the subsystem the error configuration will apply to, such as | |
397 | metadata IO or memory allocation. Different subsystems will have | |
398 | different error handlers for which behaviour can be configured. | |
399 | ||
400 | -error handlers: | |
401 | Defines the behavior for a specific error. | |
402 | ||
89b408a6 | 403 | The filesystem behavior during an error can be set via ``sysfs`` files. Each |
5694fe9a CM |
404 | error handler works independently - the first condition met by an error handler |
405 | for a specific class will cause the error to be propagated rather than reset and | |
406 | retried. | |
407 | ||
408 | The action taken by the filesystem when the error is propagated is context | |
409 | dependent - it may cause a shut down in the case of an unrecoverable error, | |
410 | it may be reported back to userspace, or it may even be ignored because | |
411 | there's nothing useful we can do with the error or anyone we can report it to (e.g. | |
412 | during unmount). | |
413 | ||
414 | The configuration files are organized into the following hierarchy for each | |
415 | mounted filesystem: | |
416 | ||
417 | /sys/fs/xfs/<dev>/error/<class>/<error>/ | |
418 | ||
419 | Where: | |
420 | <dev> | |
421 | The short device name of the mounted filesystem. This is the same device | |
422 | name that shows up in XFS kernel error messages as "XFS(<dev>): ..." | |
423 | ||
424 | <class> | |
425 | The subsystem the error configuration belongs to. As of 4.9, the defined | |
426 | classes are: | |
427 | ||
428 | - "metadata": applies to metadata buffer write IO | |
429 | ||
430 | <error> | |
431 | The individual error handler configurations. | |
432 | ||
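The directory hierarchy above can be assembled programmatically. A minimal sketch; the function name is hypothetical, and the device name ``sda1`` in the usage lines is only an example.

```python
from pathlib import PurePosixPath
from typing import Optional


def error_config_dir(dev: str, error_class: Optional[str] = None,
                     error: Optional[str] = None) -> str:
    """Build /sys/fs/xfs/<dev>/error[/<class>[/<error>]] as a string."""
    path = PurePosixPath("/sys/fs/xfs") / dev / "error"
    if error_class is not None:
        path = path / error_class
        if error is not None:
            path = path / error
    return str(path)


if __name__ == "__main__":
    print(error_config_dir("sda1"))
    print(error_config_dir("sda1", "metadata", "ENODEV"))
```

PurePosixPath is used so that no filesystem access happens; the sketch only formats the path.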
433 | ||
434 | Each filesystem has "global" error configuration options defined in its | |
435 | top-level directory: | |
436 | ||
437 | /sys/fs/xfs/<dev>/error/ | |
438 | ||
439 | fail_at_unmount (Min: 0 Default: 1 Max: 1) | |
440 | Defines the filesystem error behavior at unmount time. | |
441 | ||
442 | If set to a value of 1, XFS will override all other error configurations | |
443 | during unmount and replace them with "immediate fail" characteristics. | |
444 | i.e. no retries, no retry timeout. This will always allow unmount to | |
445 | succeed when there are persistent errors present. | |
446 | ||
447 | If set to 0, the configured retry behaviour will continue until all | |
448 | retries and/or timeouts have been exhausted. This will delay unmount | |
449 | completion when there are persistent errors, and it may prevent the | |
450 | filesystem from ever unmounting fully in the case of "retry forever" | |
451 | handler configurations. | |
452 | ||
806654a9 | 453 | Note: there is no guarantee that fail_at_unmount can be set while an |
89b408a6 | 454 | unmount is in progress. It is possible that the ``sysfs`` entries are |
5694fe9a CM |
455 | removed by the unmounting filesystem before a "retry forever" error |
456 | handler configuration causes unmount to hang, and hence the filesystem | |
457 | must be configured appropriately before unmount begins to prevent | |
458 | unmount hangs. | |
459 | ||
460 | Each filesystem has specific error class handlers that define the error | |
461 | propagation behaviour for specific errors. There is also a "default" error | |
462 | handler defined, which defines the behaviour for all errors that don't have | |
89b408a6 | 463 | specific handlers defined. Where multiple retry constraints are configured for |
5694fe9a CM |
464 | a single error, the first retry configuration that expires will cause the error |
465 | to be propagated. The handler configurations are found in the directory: | |
466 | ||
467 | /sys/fs/xfs/<dev>/error/<class>/<error>/ | |
468 | ||
469 | max_retries (Min: -1 Default: Varies Max: INTMAX) | |
470 | Defines the allowed number of retries of a specific error before | |
471 | the filesystem will propagate the error. The retry count for a given | |
472 | error context (e.g. a specific metadata buffer) is reset every time | |
473 | there is a successful completion of the operation. | |
474 | ||
475 | Setting the value to "-1" will cause XFS to retry forever for this | |
476 | specific error. | |
477 | ||
478 | Setting the value to "0" will cause XFS to fail immediately when the | |
479 | specific error is reported. | |
480 | ||
481 | Setting the value to "N" (where 0 < N < Max) will make XFS retry the | |
482 | operation "N" times before propagating the error. | |
483 | ||
484 | retry_timeout_seconds (Min: -1 Default: Varies Max: 1 day) | |
485 | Defines the amount of time (in seconds) that the filesystem is | |
486 | allowed to retry its operations when the specific error is | |
487 | found. | |
488 | ||
489 | Setting the value to "-1" will allow XFS to retry forever for this | |
490 | specific error. | |
491 | ||
492 | Setting the value to "0" will cause XFS to fail immediately when the | |
493 | specific error is reported. | |
494 | ||
495 | Setting the value to "N" (where 0 < N < Max) will allow XFS to retry the | |
496 | operation for up to "N" seconds before propagating the error. | |
497 | ||
89b408a6 | 498 | **Note:** The default behaviour for a specific error handler is dependent on both |
5694fe9a CM |
499 | the class and error context. For example, the default values for |
500 | "metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults | |
501 | to "fail immediately" behaviour. This is done because ENODEV is a fatal, | |
502 | unrecoverable error no matter how many times the metadata IO is retried. | |
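The interaction between ``max_retries`` and ``retry_timeout_seconds`` can be modelled as "whichever limit expires first wins". The sketch below follows the wording of this section rather than the kernel source; the function name and the way retries and elapsed time are counted are assumptions made for illustration.

```python
RETRY_FOREVER = -1


def should_propagate(retries_done: int, elapsed_seconds: float,
                     max_retries: int, retry_timeout_seconds: int) -> bool:
    """Decide whether a failed operation is propagated or retried again.

    retries_done counts retries already attempted for this error context;
    elapsed_seconds is the time since the first failure.
    """
    retries_exhausted = (max_retries != RETRY_FOREVER
                         and retries_done >= max_retries)
    timeout_expired = (retry_timeout_seconds != RETRY_FOREVER
                       and elapsed_seconds >= retry_timeout_seconds)
    # The first constraint to expire causes propagation.
    return retries_exhausted or timeout_expired


if __name__ == "__main__":
    # Values of 0, as for "metadata/ENODEV", fail on the first error.
    print(should_propagate(0, 0.0, max_retries=0, retry_timeout_seconds=0))
    # Retry forever on both limits: never propagated by this rule.
    print(should_propagate(1000, 1e6, RETRY_FOREVER, RETRY_FOREVER))
```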
f83d436a DW |
503 | |
504 | Workqueue Concurrency | |
505 | ===================== | |
506 | ||
507 | XFS uses kernel workqueues to parallelize metadata update processes. This | |
508 | enables it to take advantage of storage hardware that can service many IO | |
509 | operations simultaneously. This interface exposes internal implementation | |
510 | details of XFS, and as such is explicitly not part of any userspace API/ABI | |
511 | guarantee the kernel may give userspace. These are undocumented features of | |
512 | the generic workqueue implementation XFS uses for concurrency, and they are | |
513 | provided here purely for diagnostic and tuning purposes and may change at any | |
514 | time in the future. | |
515 | ||
516 | The control knobs for a filesystem's workqueues are organized by task at hand | |
517 | and the short name of the data device. They all can be found in: | |
518 | ||
519 | /sys/bus/workqueue/devices/${task}!${device} | |
520 | ||
521 | ================ =========== | |
522 | Task Description | |
523 | ================ =========== | |
524 | xfs_iwalk-$pid Inode scans of the entire filesystem. Currently limited to | |
525 | mount time quotacheck. | |
3fef46fc | 526 | xfs-gc Background garbage collection of disk space that has been |
47bd6d34 DW |
527 | speculatively allocated beyond EOF or for staging copy on |
528 | write operations. | |
f83d436a DW |
529 | ================ =========== |
530 | ||
531 | For example, the knobs for the quotacheck workqueue for /dev/nvme0n1 would be | |
532 | found in /sys/bus/workqueue/devices/xfs_iwalk-1111!nvme0n1/. | |
533 | ||
534 | The interesting knobs for XFS workqueues are as follows: | |
535 | ||
536 | ============ =========== | |
537 | Knob Description | |
538 | ============ =========== | |
539 | max_active Maximum number of background threads that can be started to | |
540 | run the work. | |
541 | cpumask CPUs upon which the threads are allowed to run. | |
542 | nice Relative priority of scheduling the threads. These are the | |
543 | same nice levels that can be applied to userspace processes. | |
8e8794b9 | 544 | ============ =========== |
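The location of an individual workqueue knob follows from the ``${task}!${device}`` naming scheme above. A minimal sketch that only formats the path; the function name is hypothetical, and the task and device in the usage line are the example values from this section.

```python
KNOBS = ("max_active", "cpumask", "nice")  # knobs listed in the table above


def workqueue_knob_path(task: str, device: str, knob: str) -> str:
    """Return /sys/bus/workqueue/devices/<task>!<device>/<knob>."""
    if knob not in KNOBS:
        raise ValueError(f"unknown knob: {knob}")
    return f"/sys/bus/workqueue/devices/{task}!{device}/{knob}"


if __name__ == "__main__":
    print(workqueue_knob_path("xfs_iwalk-1111", "nvme0n1", "max_active"))
```

Since these knobs are explicitly outside any API/ABI guarantee, a script built on this should tolerate the path being absent.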