Commit | Line | Data |
---|---|---|
fdb05364 GX |
1 | Overview |
2 | ======== | |
3 | ||
4 | EROFS stands for Enhanced Read-Only File System. Unlike other read-only | |
5 | file systems, it is designed for flexibility and scalability while staying | |
6 | simple and high-performance. | |
7 | ||
8 | It is designed as a better filesystem solution for the following scenarios: | |
9 | - read-only storage media or | |
10 | ||
11 | - part of a fully trusted read-only solution, which must be immutable and | |
12 | bit-for-bit identical to the official golden image for a release, for | |
13 | security and other reasons, and | |
14 | ||
15 | - systems that need to save storage space with guaranteed end-to-end | |
16 | performance by using reduced metadata and transparent file compression, | |
17 | especially embedded devices with limited memory (e.g. smartphones). | |
18 | ||
19 | Here are the main features of EROFS: | |
20 | - Little endian on-disk design; | |
21 | ||
22 | - Currently 4KB block size (nobh) and therefore maximum 16TB address space; | |
23 | ||
24 | - Metadata and data can be mixed by design; | |
25 | ||
26 | - 2 inode versions for different requirements: | |
27 |                        v1            v2 | |
28 | Inode metadata size:   32 bytes      64 bytes | |
29 | Max file size:         4 GB          16 EB (also limited by max. vol size) | |
30 | Max uids/gids:         65536         4294967296 | |
31 | File creation time:    no            yes (64 + 32-bit timestamp) | |
32 | Max hardlinks:         65536         4294967296 | |
33 | Metadata reserved:     4 bytes       14 bytes | |
34 | ||
35 | - Support extended attributes (xattrs) as an option; | |
36 | ||
37 | - Support xattr inline and tail-end data inline for all files; | |
38 | ||
39 | - Support transparent file compression as an option: | |
40 | LZ4 algorithm with 4 KB fixed-output compression for high performance; | |
41 | ||
42 | The following git tree provides the file system user-space tools under | |
43 | development (e.g. the formatting tool mkfs.erofs): | |
44 | >> git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git | |
45 | ||
46 | Bug reports and patches are welcome. Please send them to the | |
47 | linux-erofs mailing list: | |
48 | >> linux-erofs mailing list <linux-erofs@lists.ozlabs.org> | |
49 | ||
50 | Note that EROFS is still a work in progress as a Linux staging driver, so | |
51 | Cc'ing the staging mailing list as well is highly recommended: | |
52 | >> Linux Driver Project Developer List <devel@driverdev.osuosl.org> | |
53 | ||
54 | Mount options | |
55 | ============= | |
56 | ||
57 | fault_injection=%d Enable fault injection in all supported types with | |
58 | the specified injection rate. Supported injection types: | |
59 | Type_Name Type_Value | |
60 | FAULT_KMALLOC 0x000000001 | |
61 | (no)user_xattr Set up extended user attributes. Note: xattr is enabled | |
62 | by default if CONFIG_EROFS_FS_XATTR is selected. | |
63 | (no)acl Set up POSIX access control lists. Note: acl is enabled | |
64 | by default if CONFIG_EROFS_FS_POSIX_ACL is selected. | |
65 | ||
66 | On-disk details | |
67 | =============== | |
68 | ||
69 | Summary | |
70 | ------- | |
71 | Different from other read-only file systems, an EROFS volume is designed | |
72 | to be as simple as possible: | |
73 | ||
74 | |-> aligned with the block size | |
75 | ____________________________________________________________ | |
76 | | |SB| | ... | Metadata | ... | Data | Metadata | ... | Data | | |
77 | |_|__|_|_____|__________|_____|______|__________|_____|______| | |
78 | 0 +1K | |
79 | ||
80 | All data areas should be aligned with the block size, but metadata areas | |
81 | need not be. All metadata can currently be viewed in two different spaces: | |
82 | 1. Inode metadata space | |
83 | Each valid inode should be aligned with an inode slot, whose size is fixed | |
84 | (32 bytes) and designed to match the v1 inode size. | |
85 | ||
86 | Each inode can be directly found with the following formula: | |
87 | inode offset = meta_blkaddr * block_size + 32 * nid | |
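
The formula above can be sketched in C as follows. This is only an illustration: the function and macro names are hypothetical (they are not taken from erofs_fs.h), and the block size is assumed to be the 4KB value stated in the feature list.

```c
#include <stdint.h>

#define EROFS_BLOCK_SIZE 4096u /* assumed: the 4KB block size stated above */
#define EROFS_SLOT_SIZE  32u   /* inode slot size, equal to the v1 inode size */

/*
 * Byte offset of inode `nid`, counted from the start of the volume.
 * Hypothetical helper name, used for illustration only.
 */
static uint64_t erofs_inode_offset(uint32_t meta_blkaddr, uint64_t nid)
{
	return (uint64_t)meta_blkaddr * EROFS_BLOCK_SIZE + EROFS_SLOT_SIZE * nid;
}
```

For example, with meta_blkaddr = 2 and nid = 3 the inode starts at byte 2 * 4096 + 32 * 3 = 8288.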
88 | ||
89 | |-> aligned with 8B | |
90 | |-> followed closely | |
91 | + meta_blkaddr blocks |-> another slot | |
92 | _____________________________________________________________________ | |
93 | | ... | inode | xattrs | extents | data inline | ... | inode ... | |
94 | |________|_______|(optional)|(optional)|__(optional)_|_____|__________ | |
95 | |-> aligned with the inode slot size | |
96 | . . | |
97 | . . | |
98 | . . | |
99 | . . | |
100 | . . | |
101 | . . | |
102 | .____________________________________________________|-> aligned with 4B | |
103 | | xattr_ibody_header | shared xattrs | inline xattrs | | |
104 | |____________________|_______________|_______________| | |
105 | |-> 12 bytes <-|->x * 4 bytes<-| . | |
106 | . . . | |
107 | . . . | |
108 | . . . | |
109 | ._______________________________.______________________. | |
110 | | id | id | id | id | ... | id | ent | ... | ent| ... | | |
111 | |____|____|____|____|______|____|_____|_____|____|_____| | |
112 | |-> aligned with 4B | |
113 | |-> aligned with 4B | |
114 | ||
115 | An inode can be 32 or 64 bytes. The two sizes are distinguished by a | |
116 | field common to all inode versions, i_advise: | |
117 | ||
118 | __________________ __________________ | |
119 | | i_advise | | i_advise | | |
120 | |__________________| |__________________| | |
121 | | ... | | ... | | |
122 | | | | | | |
123 | |__________________| 32 bytes | | | |
124 | | | | |
125 | |__________________| 64 bytes | |
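
A minimal sketch of how a reader might pick the inode size from i_advise. The text above only says that i_advise distinguishes the versions; the exact bit layout used here (version flag in bit 0, with 0 meaning v1 and 1 meaning v2) is an assumption and should be checked against erofs_fs.h. The helper name is hypothetical.

```c
#include <stdint.h>

#define I_VERSION_BIT 0 /* assumed position of the version flag in i_advise */

/*
 * Return the on-disk inode size in bytes for an already
 * endian-converted i_advise value.
 */
static unsigned int inode_size_from_advise(uint16_t i_advise)
{
	unsigned int version = (i_advise >> I_VERSION_BIT) & 1u;

	return version ? 64u : 32u; /* v2 : v1 */
}
```

Because the inode slot is 32 bytes, a v2 inode under this layout occupies two consecutive slots.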
126 | ||
127 | Xattrs, extents and inline data follow the corresponding inode with | |
128 | proper alignment, and they can be optional for different data mappings. | |
129 | _Currently_ three valid data mappings are supported: | |
130 | ||
131 | 1) flat file data without data inline (no extent); | |
132 | 2) fixed-output size data compression (must have extents); | |
133 | 3) flat file data with tail-end data inline (no extent); | |
134 | ||
135 | The size of the optional xattrs is indicated by i_xattr_count in the inode | |
136 | header. Large xattrs or xattrs shared by many different files can be | |
137 | stored in shared xattrs metadata rather than inlined right after the inode. | |
138 | ||
139 | 2. Shared xattrs metadata space | |
140 | The shared xattrs space is similar to the inode space above: it starts at | |
141 | the block indicated by xattr_blkaddr, with entries laid out one after | |
142 | another with proper alignment. | |
143 | ||
144 | Each shared xattr can also be found directly with the following formula: | |
145 | xattr offset = xattr_blkaddr * block_size + 4 * xattr_id | |
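
As a rough C sketch of the shared xattr formula: the names below are hypothetical, and the 4KB block size is assumed from the feature list.

```c
#include <stdint.h>

#define EROFS_BLOCK_SIZE  4096u /* assumed: the 4KB block size stated above */
#define EROFS_XATTR_ALIGN 4u    /* shared xattr ids count 4-byte units */

/*
 * Byte offset of shared xattr `xattr_id`, counted from the start
 * of the volume. Hypothetical helper name, for illustration only.
 */
static uint64_t erofs_shared_xattr_offset(uint32_t xattr_blkaddr,
					  uint32_t xattr_id)
{
	return (uint64_t)xattr_blkaddr * EROFS_BLOCK_SIZE +
	       (uint64_t)EROFS_XATTR_ALIGN * xattr_id;
}
```

For example, xattr_blkaddr = 10 and xattr_id = 5 give byte offset 10 * 4096 + 4 * 5 = 40980.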
146 | ||
147 | |-> aligned by 4 bytes | |
148 | + xattr_blkaddr blocks |-> aligned with 4 bytes | |
149 | _________________________________________________________________________ | |
150 | | ... | xattr_entry | xattr data | ... | xattr_entry | xattr data ... | |
151 | |________|_____________|_____________|_____|______________|_______________ | |
152 | ||
153 | Directories | |
154 | ----------- | |
155 | All directories are now organized in a compact on-disk format. Note that | |
156 | each directory block is divided into index and name areas in order to support | |
157 | random file lookup, and all directory entries are _strictly_ recorded in | |
158 | alphabetical order so that an improved prefix binary search algorithm can | |
159 | be used (see the related source code). | |
160 | ||
161 | ___________________________ | |
162 | / | | |
163 | / ______________|________________ | |
164 | / / | nameoff1 | nameoffN-1 | |
165 | ____________.______________._______________v________________v__________ | |
166 | | dirent | dirent | ... | dirent | filename | filename | ... | filename | | |
167 | |___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____| | |
168 | \ ^ | |
169 | \ | * could have | |
170 | \ | trailing '\0' | |
171 | \________________________| nameoff0 | |
172 | ||
173 | Directory block | |
174 | ||
175 | Note that besides giving the offset of the first filename, nameoff0 also | |
176 | indicates the total number of directory entries in this block, because the | |
177 | fixed-size dirents end where the first filename begins. No other on-disk field is needed. | |
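
The entry count can be derived from nameoff0 as sketched below. The sketch assumes a 12-byte dirent with a little-endian 16-bit nameoff at byte offset 8 (after an 8-byte nid), which is not stated in this document and should be checked against struct erofs_dirent in erofs_fs.h. The helper names are hypothetical.

```c
#include <stdint.h>

#define DIRENT_SIZE    12u /* assumed: 8-byte nid, 2-byte nameoff, 2 more bytes */
#define DIRENT_NAMEOFF 8u  /* assumed byte offset of nameoff inside a dirent */

/* Read a little-endian 16-bit value regardless of host endianness. */
static uint16_t le16_at(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Number of directory entries in one directory block: the dirent
 * array ends exactly where the first filename starts (nameoff0).
 */
static unsigned int dirents_in_block(const uint8_t *block)
{
	uint16_t nameoff0 = le16_at(block + DIRENT_NAMEOFF);

	return nameoff0 / DIRENT_SIZE;
}
```

Under these assumptions, a block whose first dirent has nameoff0 = 36 holds 36 / 12 = 3 entries.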
178 | ||
179 | Compression | |
180 | ----------- | |
181 | Currently, EROFS supports 4KB fixed-output clustersize transparent file | |
182 | compression, as illustrated below: | |
183 | ||
184 | |---- Variant-Length Extent ----|-------- VLE --------|----- VLE ----- | |
185 | clusterofs clusterofs clusterofs | |
186 | | | | logical data | |
187 | _________v_______________________________v_____________________v_______________ | |
188 | ... | . | | . | | . | ... | |
189 | ____|____.________|_____________|________.____|_____________|__.__________|____ | |
190 | |-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-| | |
191 | size size size size size | |
192 | . . . . | |
193 | . . . . | |
194 | . . . . | |
195 | _______._____________._____________._____________._____________________ | |
196 | ... | | | | ... physical data | |
197 | _______|_____________|_____________|_____________|_____________________ | |
198 | |-> cluster <-|-> cluster <-|-> cluster <-| | |
199 | size size size | |
200 | ||
201 | Currently each on-disk physical cluster can contain 4KB (un)compressed data | |
202 | at most. For each logical cluster, there is a corresponding on-disk index to | |
203 | describe its cluster type, physical cluster address, etc. | |
204 | ||
205 | See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details. | |
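
Since there is one index per logical cluster, locating the index for a given file position is a simple division. The sketch below shows only that step, assuming the 4KB cluster size stated above. The struct and function names are hypothetical, and the contents of the on-disk index itself are not modelled here.

```c
#include <stdint.h>

#define CLUSTER_SIZE 4096u /* assumed: the 4KB cluster size stated above */

/* Position of a file offset in terms of logical clusters. */
struct lcluster_pos {
	uint64_t lcn; /* logical cluster number, i.e. which index to read */
	uint32_t ofs; /* byte offset inside that logical cluster */
};

static struct lcluster_pos logical_cluster_of(uint64_t file_offset)
{
	struct lcluster_pos pos = {
		file_offset / CLUSTER_SIZE,
		(uint32_t)(file_offset % CLUSTER_SIZE),
	};

	return pos;
}
```

For example, file offset 10000 falls in logical cluster 2 at byte 1808 of that cluster.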
206 |