Commit | Line | Data |
---|---|---|
2640c19d MCC |
1 | .. SPDX-License-Identifier: GPL-2.0 |
2 | ||
3 | ====== | |
962281a7 | 4 | NILFS2 |
2640c19d | 5 | ====== |
962281a7 RK |
6 | |
7 | NILFS2 is a log-structured file system (LFS) supporting continuous | |
8 | snapshotting. In addition to the versioning capability of the entire file | |
9 | system, users can even restore files mistakenly overwritten or | |
10 | destroyed just a few seconds ago. Because NILFS2 maintains consistency | |
11 | like a conventional LFS, it recovers quickly after system | |
12 | crashes. | |
13 | ||
14 | NILFS2 creates checkpoints every few seconds or on a per-synchronous-write | |
15 | basis (unless there is no change). Users can select | |
16 | significant versions among continuously created checkpoints, and can | |
17 | change them into snapshots which will be preserved until they are | |
18 | changed back to checkpoints. | |
19 | ||
20 | There is no limit on the number of snapshots until the volume gets | |
21 | full. Each snapshot is mountable as a read-only file system | |
22 | concurrently with its writable mount, and this feature is convenient | |
23 | for online backup. | |
24 | ||
25 | The userland tools are included in the nilfs-utils package, which is | |
26 | available from the following download page. At least "mkfs.nilfs2", | |
27 | "mount.nilfs2", "umount.nilfs2", and "nilfs_cleanerd" (the so-called | |
28 | cleaner or garbage collector) are required. Details on the tools are | |
29 | described in the man pages included in the package. | |
30 | ||
2640c19d MCC |
31 | :Project web page: https://nilfs.sourceforge.io/ |
32 | :Download page: https://nilfs.sourceforge.io/en/download.html | |
33 | :List info: http://vger.kernel.org/vger-lists.html#linux-nilfs | |
962281a7 RK |
34 | |
35 | Caveats | |
36 | ======= | |
37 | ||
38 | Features which NILFS2 does not support yet: | |
39 | ||
40 | - atime | |
41 | - extended attributes | |
42 | - POSIX ACLs | |
43 | - quotas | |
fb6e7113 | 44 | - fsck |
962281a7 RK |
45 | - defragmentation |
46 | ||
47 | Mount options | |
48 | ============= | |
49 | ||
50 | NILFS2 supports the following mount options: | |
51 | (*) == default | |
52 | ||
2640c19d | 53 | ======================= ======================================================= |
773bc4f3 RK |
54 | barrier(*) This enables/disables the use of write barriers. This |
55 | nobarrier requires an IO stack which can support barriers, and | |
56 | if nilfs gets an error on a barrier write, it will | |
57 | disable barriers again with a warning. | |
277a6a34 RK |
58 | errors=continue Keep going on a filesystem error. |
59 | errors=remount-ro(*) Remount the filesystem read-only on an error. | |
962281a7 RK |
60 | errors=panic Panic and halt the machine if an error occurs. |
61 | cp=n Specify the checkpoint number of the snapshot to be | |
62 | mounted. Checkpoints and snapshots are listed by the lscp | |
63 | user command. Only checkpoints marked as snapshots | |
64 | are mountable with this option. A snapshot is read-only, | |
65 | so a read-only mount option must be specified as well. | |
66 | order=relaxed(*) Apply relaxed order semantics that allow modified data | |
67 | blocks to be written to disk without making a | |
68 | checkpoint if no metadata update is in progress. This mode | |
69 | is equivalent to the ordered data mode of the ext3 | |
70 | filesystem, except that updates to data blocks still | |
71 | preserve atomicity. This improves synchronous | |
72 | write performance for overwriting. | |
73 | order=strict Apply strict in-order semantics that preserve the sequence | |
74 | of all file operations including overwriting of data | |
75 | blocks. This guarantees that no | |
76 | overtaking of events occurs in the recovered file | |
77 | system after a crash. | |
0234576d RK |
78 | norecovery Disable recovery of the filesystem on mount. |
79 | This disables every write access on the device for | |
80 | read-only mounts or snapshots. This option will fail | |
81 | for r/w mounts on an unclean volume. | |
802d3177 RK |
82 | discard This enables/disables the use of discard/TRIM commands. |
83 | nodiscard(*) The discard/TRIM commands are sent to the underlying | |
84 | block device when blocks are freed. This is useful | |
85 | for SSD devices and sparse/thinly-provisioned LUNs. | |
2640c19d | 86 | ======================= ======================================================= |
962281a7 | 87 | |
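As an illustration of how the options in the table combine, the following sketch builds two mount command lines from them. The device and mount point are placeholders, and the commands are only printed, not executed; actually running them requires root privileges and a NILFS2 volume.

```shell
#!/bin/sh
# Placeholder device and mount point; adjust for your system.
dev=/dev/block_device
dir=/dir

# Read-write mount with strict ordering, discard enabled, and panic on error.
opts="order=strict,discard,errors=panic"
mount_cmd="mount -t nilfs2 -o $opts $dev $dir"

# Read-only mount that skips recovery, so nothing is written to the device.
ro_cmd="mount -t nilfs2 -r -o norecovery $dev $dir"

# Print the commands for review instead of running them.
echo "$mount_cmd"
echo "$ro_cmd"
```

Options that are marked as defaults in the table (for example order=relaxed and nodiscard) do not need to be given explicitly.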
d623a942 VD |
88 | Ioctls |
89 | ====== | |
90 | ||
91 | Some NILFS2-specific functionality can be accessed by applications | |
92 | through the system call interface. All NILFS2-specific ioctls are | |
93 | listed in the table below. | |
94 | ||
2640c19d MCC |
95 | Table of NILFS2-specific ioctls: |
96 | ||
97 | ============================== =============================================== | |
d623a942 | 98 | Ioctl Description |
2640c19d | 99 | ============================== =============================================== |
d623a942 VD |
100 | NILFS_IOCTL_CHANGE_CPMODE Change mode of given checkpoint between |
101 | checkpoint and snapshot state. This ioctl is | |
102 | used in chcp and mkcp utilities. | |
103 | ||
104 | NILFS_IOCTL_DELETE_CHECKPOINT Remove checkpoint from NILFS2 file system. | |
105 | This ioctl is used in rmcp utility. | |
106 | ||
107 | NILFS_IOCTL_GET_CPINFO Return info about requested checkpoints. This | |
108 | ioctl is used in lscp utility and by | |
109 | nilfs_cleanerd daemon. | |
110 | ||
111 | NILFS_IOCTL_GET_CPSTAT Return checkpoint statistics. This ioctl is | |
112 | used by lscp, rmcp utilities and by | |
113 | nilfs_cleanerd daemon. | |
114 | ||
115 | NILFS_IOCTL_GET_SUINFO Return segment usage info about requested | |
116 | segments. This ioctl is used in lssu, | |
117 | nilfs_resize utilities and by nilfs_cleanerd | |
118 | daemon. | |
119 | ||
2cc88f3a AR |
120 | NILFS_IOCTL_SET_SUINFO Modify segment usage info of requested |
121 | segments. This ioctl is used by | |
122 | nilfs_cleanerd daemon to skip unnecessary | |
123 | cleaning operation of segments and reduce | |
124 | performance penalty or wear of flash device | |
125 | due to redundant move of in-use blocks. | |
126 | ||
d623a942 VD |
127 | NILFS_IOCTL_GET_SUSTAT Return segment usage statistics. This ioctl |
128 | is used in lssu, nilfs_resize utilities and | |
129 | by nilfs_cleanerd daemon. | |
130 | ||
131 | NILFS_IOCTL_GET_VINFO Return information on virtual block addresses. | |
132 | This ioctl is used by nilfs_cleanerd daemon. | |
133 | ||
134 | NILFS_IOCTL_GET_BDESCS Return information about descriptors of disk | |
135 | block numbers. This ioctl is used by | |
136 | nilfs_cleanerd daemon. | |
137 | ||
138 | NILFS_IOCTL_CLEAN_SEGMENTS Do garbage collection operation in the | |
139 | environment of requested parameters from | |
140 | userspace. This ioctl is used by | |
141 | nilfs_cleanerd daemon. | |
142 | ||
143 | NILFS_IOCTL_SYNC Make a checkpoint. This ioctl is used in | |
144 | mkcp utility. | |
145 | ||
146 | NILFS_IOCTL_RESIZE Resize NILFS2 volume. This ioctl is used | |
147 | by nilfs_resize utility. | |
148 | ||
149 | NILFS_IOCTL_SET_ALLOC_RANGE Define lower limit of segments in bytes and | |
150 | upper limit of segments in bytes. This ioctl | |
151 | is used by nilfs_resize utility. | |
2640c19d | 152 | ============================== =============================================== |
d623a942 | 153 | |
962281a7 RK |
154 | NILFS2 usage |
155 | ============ | |
156 | ||
2640c19d | 157 | To use nilfs2 as a local file system, simply:: |
962281a7 RK |
158 | |
159 | # mkfs -t nilfs2 /dev/block_device | |
160 | # mount -t nilfs2 /dev/block_device /dir | |
161 | ||
162 | This will also invoke the cleaner through the mount helper program | |
163 | (mount.nilfs2). | |
164 | ||
165 | Checkpoints and snapshots are managed by the following commands. | |
166 | Their manpages are included in the nilfs-utils package above. | |
167 | ||
2640c19d | 168 | ==== =========================================================== |
962281a7 RK |
169 | lscp list checkpoints or snapshots. |
170 | mkcp make a checkpoint or a snapshot. | |
171 | chcp change an existing checkpoint to a snapshot or vice versa. | |
172 | rmcp invalidate specified checkpoint(s). | |
2640c19d | 173 | ==== =========================================================== |
962281a7 | 174 | |
2640c19d | 175 | To mount a snapshot:: |
962281a7 RK |
176 | |
177 | # mount -t nilfs2 -r -o cp=<cno> /dev/block_device /snap_dir | |
178 | ||
179 | where <cno> is the checkpoint number of the snapshot. | |
180 | ||
2640c19d | 181 | To unmount the NILFS2 mount point or snapshot, simply:: |
962281a7 RK |
182 | |
183 | # umount /dir | |
184 | ||
185 | Then, the cleaner daemon is automatically shut down by the umount | |
186 | helper program (umount.nilfs2). | |
187 | ||
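The commands above can be strung together into a snapshot workflow. The sketch below is a dry run: it prints each command rather than executing it unless DRY_RUN=0 is set. The device, mount point, and checkpoint number are placeholders, the ``run`` helper is hypothetical, and the argument order for chcp (mode, device, checkpoint number) is an assumption that should be checked against the nilfs-utils man pages.

```shell
#!/bin/sh
# Placeholder names; adjust for your system.
dev=/dev/block_device
snap_dir=/snap_dir
cno=2    # example checkpoint number, as it would be reported by lscp

# Hypothetical helper: print the command by default,
# execute it only when DRY_RUN=0 (requires root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "# $*"
    else
        "$@"
    fi
}

run lscp "$dev"                    # list checkpoints and snapshots
run chcp ss "$dev" "$cno"          # change checkpoint $cno into a snapshot
run mount -t nilfs2 -r -o "cp=$cno" "$dev" "$snap_dir"
run umount "$snap_dir"
run chcp cp "$dev" "$cno"          # change the snapshot back to a checkpoint
```

Changing the snapshot back to a checkpoint in the last step makes it eligible for removal again, since snapshots are preserved only while they remain snapshots.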
188 | Disk format | |
189 | =========== | |
190 | ||
191 | A nilfs2 volume is equally divided into a number of segments except | |
192 | for the super block (SB) and segment #0. A segment is the container | |
193 | of logs. Each log is composed of summary information blocks, payload | |
2640c19d | 194 | blocks, and an optional super root block (SR):: |
962281a7 RK |
195 | |
196 | ______________________________________________________ | |
197 | | |SB| | Segment | Segment | Segment | ... | Segment | | | |
198 | |_|__|_|____0____|____1____|____2____|_____|____N____|_| | |
199 | 0 +1K +4K +8M +16M +24M +(8MB x N) | |
200 | . . (Typical offsets for 4KB-block) | |
201 | . . | |
202 | .______________________. | |
203 | | log | log |... | log | | |
204 | |__1__|__2__|____|__m__| | |
205 | . . | |
206 | . . | |
207 | . . | |
208 | .______________________________. | |
209 | | Summary | Payload blocks |SR| | |
210 | |_blocks__|_________________|__| | |
211 | ||
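The offsets in the figure follow a simple rule, which the sketch below reproduces under the figure's assumptions of 4KB blocks and 8MB segments. The function name ``segment_start`` is hypothetical, and a real volume records its actual block and segment sizes in the super block, so these values are illustrative only.

```shell
#!/bin/sh
# Assumed sizes, taken from the "typical offsets" in the figure.
block_size=4096
segment_size=$((8 * 1024 * 1024))

# Print the byte offset at which segment number $1 starts.
# Segment #0 is shorter than the others: it starts at +4K, after the
# area that holds the super block, while segment N (N >= 1) starts
# at N times the segment size.
segment_start() {
    if [ "$1" -eq 0 ]; then
        echo "$block_size"
    else
        echo $(( $1 * segment_size ))
    fi
}

for n in 0 1 2 3; do
    echo "segment $n starts at byte $(segment_start "$n")"
done
```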
212 | The payload blocks are organized per file, and each file consists of | |
2640c19d | 213 | data blocks and B-tree node blocks:: |
962281a7 RK |
214 | |
215 | |<--- File-A --->|<--- File-B --->| | |
216 | _______________________________________________________________ | |
217 | | Data blocks | B-tree blocks | Data blocks | B-tree blocks | ... | |
218 | _|_____________|_______________|_____________|_______________|_ | |
219 | ||
220 | ||
221 | Since only modified blocks are written to the log, a log may contain | |
222 | files without data blocks or B-tree node blocks. | |
223 | ||
224 | The organization of the blocks is recorded in the summary information | |
225 | blocks, which contain a header structure (nilfs_segment_summary), | |
2640c19d | 226 | per-file structures (nilfs_finfo), and per-block structures (nilfs_binfo)::
962281a7 RK |
227 | |
228 | _________________________________________________________________________ | |
229 | | Summary | finfo | binfo | ... | binfo | finfo | binfo | ... | binfo |... | |
230 | |_blocks__|___A___|_(A,1)_|_____|(A,Na)_|___B___|_(B,1)_|_____|(B,Nb)_|___ | |
231 | ||
232 | ||
233 | The logs include regular files, directory files, symbolic link files | |
d56b699d | 234 | and several meta data files. The meta data files are the files used |
962281a7 | 235 | to maintain file system meta data. The current version of NILFS2 uses |
2640c19d | 236 | the following meta data files:: |
962281a7 RK |
237 | |
238 | 1) Inode file (ifile) -- Stores on-disk inodes | |
239 | 2) Checkpoint file (cpfile) -- Stores checkpoints | |
240 | 3) Segment usage file (sufile) -- Stores allocation state of segments | |
241 | 4) Data address translation file -- Maps virtual block numbers to usual | |
242 | (DAT) block numbers. This file serves to | |
243 | make on-disk blocks relocatable. | |
962281a7 | 244 | |
2640c19d | 245 | The following figure shows a typical organization of the logs:: |
962281a7 RK |
246 | |
247 | _________________________________________________________________________ | |
248 | | Summary | regular file | file | ... | ifile | cpfile | sufile | DAT |SR| | |
249 | |_blocks__|_or_directory_|_______|_____|_______|________|________|_____|__| | |
250 | ||
251 | ||
252 | To span segment boundaries, this sequence of files may be split | |
253 | into multiple logs. A sequence of logs that should be treated as | |
254 | one logical log is delimited with flags marked in the segment | |
255 | summary. The recovery code of nilfs2 looks at this boundary information | |
256 | to ensure atomicity of updates. | |
257 | ||
258 | The super root block is inserted for every checkpoint. It includes | |
259 | three special inodes: those for the DAT, cpfile, and sufile. Inodes | |
260 | of regular files, directories, symlinks and other special files are | |
261 | included in the ifile. The inode of the ifile itself is included in the | |
262 | corresponding checkpoint entry in the cpfile. Thus, the hierarchy | |
2640c19d | 263 | among NILFS2 files can be depicted as follows:: |
962281a7 RK |
264 | |
265 | Super block (SB) | |
266 | | | |
267 | v | |
268 | Super root block (the latest cno=xx) | |
269 | |-- DAT | |
270 | |-- sufile | |
271 | `-- cpfile | |
272 | |-- ifile (cno=c1) | |
273 | |-- ifile (cno=c2) ---- file (ino=i1) | |
274 | : : |-- file (ino=i2) | |
275 | `-- ifile (cno=xx) |-- file (ino=i3) | |
276 | : : | |
277 | `-- file (ino=yy) | |
278 | ( regular file, directory, or symlink ) | |
279 | ||
e63e88bc RK |
280 | For details on the format of each file, see nilfs2_ondisk.h |
281 | in the include/uapi/linux directory. | |
756cbdb3 RK |
282 | |
283 | There are no patents or other intellectual property that we protect | |
284 | with regard to the design of NILFS2. Replicating the design is permitted, | |
285 | in the hope that other operating systems can share (mount, read, | |
286 | write, etc.) data stored in this format. |