Commit | Line | Data |
---|---|---|
fdb05364 GX |
1 | Overview |
2 | ======== | |
3 | ||
4 | EROFS stands for Enhanced Read-Only File System. Unlike other read-only | |
5 | file systems, it is designed for flexibility and scalability while | |
6 | remaining simple and high-performance. | |
7 | ||
8 | It is designed as a better filesystem solution for the following scenarios: | |
9 | - read-only storage media or | |
10 | ||
11 | - part of a fully trusted read-only solution, which must be immutable and | |
12 | bit-for-bit identical to the official golden image of a release for | |
13 | security and other reasons, or | |
14 | ||
15 | - systems that need to save storage space with guaranteed end-to-end | |
16 | performance by using reduced metadata and transparent file compression, | |
17 | especially embedded devices with limited memory (e.g. smartphones). | |
18 | ||
19 | Here are the main features of EROFS: | |
20 | - Little endian on-disk design; | |
21 | ||
22 | - Currently 4KB block size (nobh) and therefore maximum 16TB address space; | |
23 | ||
24 | - Metadata and data can be mixed by design; | |
25 | ||
26 | - 2 inode versions for different requirements: | |
27 | v1 v2 | |
28 | Inode metadata size: 32 bytes 64 bytes | |
29 | Max file size: 4 GB 16 EB (also limited by max. vol size) | |
30 | Max uids/gids: 65536 4294967296 | |
31 | File creation time: no yes (64 + 32-bit timestamp) | |
32 | Max hardlinks: 65536 4294967296 | |
33 | Metadata reserved: 4 bytes 14 bytes | |
34 | ||
35 | - Support extended attributes (xattrs) as an option; | |
36 | ||
37 | - Support xattr inline and tail-end data inline for all files; | |
38 | ||
516c115c GX |
39 | - Support POSIX.1e ACLs by using xattrs; |
40 | ||
fdb05364 GX |
41 | - Support transparent file compression as an option: |
42 | LZ4 algorithm with 4 KB fixed-output compression for high performance; | |
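The size limits in the feature list follow from simple arithmetic. The sketch below reproduces them, assuming that block addresses are 32 bits wide (the text implies this with "therefore" but does not state it) and that the v1 and v2 file-size fields are 32 and 64 bits wide respectively. All names are hypothetical and chosen for illustration only.

```python
BLOCK_SIZE = 4 * 1024      # current EROFS block size, per the feature list
BLOCK_ADDR_BITS = 32       # assumption: 32-bit block addresses

def max_volume_bytes(block_size: int = BLOCK_SIZE,
                     addr_bits: int = BLOCK_ADDR_BITS) -> int:
    """Largest address space reachable with addr_bits-wide block addresses."""
    return (1 << addr_bits) * block_size

# Assumed field widths behind the v1/v2 columns of the inode table.
V1_MAX_FILE_BYTES = 1 << 32   # 4 GB
V2_MAX_FILE_BYTES = 1 << 64   # 16 EB
V1_MAX_IDS = 1 << 16          # 65536 uids/gids
V2_MAX_IDS = 1 << 32          # 4294967296 uids/gids

TIB = 1 << 40
print(max_volume_bytes() // TIB, "TiB")
```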
43 | ||
44 | The following git tree provides the file system user-space tools under | |
45 | development (e.g. the formatting tool mkfs.erofs): | |
46 | >> git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git | |
47 | ||
48 | Bug reports and patches are welcome. Please send them to the following | |
49 | linux-erofs mailing list: | |
50 | >> linux-erofs mailing list <linux-erofs@lists.ozlabs.org> | |
51 | ||
fdb05364 GX |
52 | Mount options |
53 | ============= | |
54 | ||
fdb05364 GX |
55 | (no)user_xattr Setup Extended User Attributes. Note: xattr is enabled |
56 | by default if CONFIG_EROFS_FS_XATTR is selected. | |
57 | (no)acl Setup POSIX Access Control List. Note: acl is enabled | |
58 | by default if CONFIG_EROFS_FS_POSIX_ACL is selected. | |
4279f3f9 GX |
59 | cache_strategy=%s Select a strategy for cached decompression from now on: |
60 | disabled: In-place I/O decompression only; | |
61 | readahead: Cache the last incomplete compressed physical | |
62 | cluster for further reading. It still does | |
63 | in-place I/O decompression for the remaining | |
64 | compressed physical clusters; | |
65 | readaround: Cache both ends of incomplete compressed | |
66 | physical clusters for further reading. | |
67 | It still does in-place I/O decompression | |
68 | for the remaining compressed physical clusters. | |
fdb05364 GX |
69 | |
70 | On-disk details | |
71 | =============== | |
72 | ||
73 | Summary | |
74 | ------- | |
75 | Unlike other read-only file systems, an EROFS volume is designed | |
76 | to be as simple as possible: | |
77 | ||
78 | |-> aligned with the block size | |
79 | ____________________________________________________________ | |
80 | | |SB| | ... | Metadata | ... | Data | Metadata | ... | Data | | |
81 | |_|__|_|_____|__________|_____|______|__________|_____|______| | |
82 | 0 +1K | |
83 | ||
84 | All data areas should be aligned with the block size, but metadata areas | |
85 | need not be. All metadata can now be observed in two different spaces (views): | |
86 | 1. Inode metadata space | |
87 | Each valid inode should be aligned with an inode slot, which is a fixed | |
88 | value (32 bytes) and designed to be kept in line with v1 inode size. | |
89 | ||
90 | Each inode can be directly found with the following formula: | |
91 | inode offset = meta_blkaddr * block_size + 32 * nid | |
92 | ||
93 | |-> aligned with 8B | |
94 | |-> followed closely | |
95 | + meta_blkaddr blocks |-> another slot | |
96 | _____________________________________________________________________ | |
97 | | ... | inode | xattrs | extents | data inline | ... | inode ... | |
98 | |________|_______|(optional)|(optional)|__(optional)_|_____|__________ | |
99 | |-> aligned with the inode slot size | |
100 | . . | |
101 | . . | |
102 | . . | |
103 | . . | |
104 | . . | |
105 | . . | |
106 | .____________________________________________________|-> aligned with 4B | |
107 | | xattr_ibody_header | shared xattrs | inline xattrs | | |
108 | |____________________|_______________|_______________| | |
109 | |-> 12 bytes <-|->x * 4 bytes<-| . | |
110 | . . . | |
111 | . . . | |
112 | . . . | |
113 | ._______________________________.______________________. | |
114 | | id | id | id | id | ... | id | ent | ... | ent| ... | | |
115 | |____|____|____|____|______|____|_____|_____|____|_____| | |
116 | |-> aligned with 4B | |
117 | |-> aligned with 4B | |
118 | ||
119 | An inode can be 32 or 64 bytes; the two sizes are distinguished by a field | |
120 | common to all inode versions -- i_advise: | |
121 | ||
122 | __________________ __________________ | |
123 | | i_advise | | i_advise | | |
124 | |__________________| |__________________| | |
125 | | ... | | ... | | |
126 | | | | | | |
127 | |__________________| 32 bytes | | | |
128 | | | | |
129 | |__________________| 64 bytes | |
130 | ||
131 | Xattrs, extents, and inline data follow the corresponding inode with | |
132 | proper alignment, and they are optional depending on the data mapping. | |
133 | _Currently_ there are 3 valid data mappings supported: | |
134 | ||
135 | 1) flat file data without data inline (no extent); | |
136 | 2) fixed-output size data compression (must have extents); | |
137 | 3) flat file data with tail-end data inline (no extent); | |
138 | ||
139 | The size of the optional xattrs is indicated by i_xattr_count in the inode | |
140 | header. Large xattrs or xattrs shared by many different files can be | |
141 | stored in shared xattrs metadata rather than inlined right after the inode. | |
142 | ||
143 | 2. Shared xattrs metadata space | |
144 | The shared xattrs space is similar to the above inode space. It starts at | |
145 | a specific block indicated by xattr_blkaddr, and its entries are laid out | |
146 | one by one with proper alignment. | |
147 | ||
148 | Each shared xattr can also be directly found by the following formula: | |
149 | xattr offset = xattr_blkaddr * block_size + 4 * xattr_id | |
150 | ||
151 | |-> aligned by 4 bytes | |
152 | + xattr_blkaddr blocks |-> aligned with 4 bytes | |
153 | _________________________________________________________________________ | |
154 | | ... | xattr_entry | xattr data | ... | xattr_entry | xattr data ... | |
155 | |________|_____________|_____________|_____|______________|_______________ | |
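The two lookup formulas above (for inodes and for shared xattrs) can be sketched as follows. This is a minimal illustration, not kernel code: the function and constant names are hypothetical, and the 4096-byte block size is taken from the feature list earlier in this document.

```python
BLOCK_SIZE = 4096       # current EROFS block size, per the text
INODE_SLOT_SIZE = 32    # one inode slot, equal to the v1 inode size
XATTR_UNIT = 4          # shared xattrs are addressed in 4-byte units

def inode_offset(meta_blkaddr: int, nid: int,
                 block_size: int = BLOCK_SIZE) -> int:
    """inode offset = meta_blkaddr * block_size + 32 * nid"""
    return meta_blkaddr * block_size + INODE_SLOT_SIZE * nid

def shared_xattr_offset(xattr_blkaddr: int, xattr_id: int,
                        block_size: int = BLOCK_SIZE) -> int:
    """xattr offset = xattr_blkaddr * block_size + 4 * xattr_id"""
    return xattr_blkaddr * block_size + XATTR_UNIT * xattr_id

# Example values (made up): metadata starts at block 1, xattrs at block 2.
print(inode_offset(meta_blkaddr=1, nid=3))
print(shared_xattr_offset(xattr_blkaddr=2, xattr_id=5))
```

Because both formulas are pure arithmetic on an identifier, no lookup table is needed to locate an inode or a shared xattr on disk.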
156 | ||
157 | Directories | |
158 | ----------- | |
159 | All directories are now organized in a compact on-disk format. Note that | |
160 | each directory block is divided into index and name areas in order to support | |
161 | random file lookup, and all directory entries are _strictly_ recorded in | |
162 | alphabetical order in order to support an improved prefix binary search | |
163 | algorithm (see the related source code for details). | |
164 | ||
165 | ___________________________ | |
166 | / | | |
167 | / ______________|________________ | |
168 | / / | nameoff1 | nameoffN-1 | |
169 | ____________.______________._______________v________________v__________ | |
170 | | dirent | dirent | ... | dirent | filename | filename | ... | filename | | |
171 | |___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____| | |
172 | \ ^ | |
173 | \ | * could have | |
174 | \ | trailing '\0' | |
175 | \________________________| nameoff0 | |
176 | ||
177 | Directory block | |
178 | ||
179 | Note that apart from the offset of the first filename, nameoff0 also indicates | |
180 | the total number of directory entries in this block, so there is no need to | |
181 | introduce another on-disk field for it. | |
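The way nameoff0 doubles as an entry count can be sketched as below. The text does not give the dirent layout, so the sketch assumes a 12-byte little-endian dirent (u64 nid, u16 nameoff, u8 file_type, u8 reserved); that layout and all names here are assumptions for illustration and should be checked against erofs_fs.h.

```python
import struct

# Assumed dirent layout (not stated in the text above): little-endian
# u64 nid, u16 nameoff, u8 file_type, u8 reserved = 12 bytes.
DIRENT = struct.Struct("<QHBB")

def parse_dir_block(block: bytes):
    """Return [(name, nid), ...] for one directory block."""
    # The name area starts right after the fixed-size dirents, so the
    # first name offset divided by the dirent size is the entry count.
    _, nameoff0, _, _ = DIRENT.unpack_from(block, 0)
    count = nameoff0 // DIRENT.size
    dirents = [DIRENT.unpack_from(block, i * DIRENT.size)
               for i in range(count)]
    entries = []
    for i, (nid, nameoff, _ftype, _reserved) in enumerate(dirents):
        # A name ends where the next one begins; the last name runs to
        # the end of the block and may carry trailing '\0' padding.
        end = dirents[i + 1][1] if i + 1 < count else len(block)
        name = block[nameoff:end].split(b"\0", 1)[0]
        entries.append((name, nid))
    return entries
```

Names are returned as bytes because file names are not guaranteed to be valid text in any particular encoding.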
182 | ||
183 | Compression | |
184 | ----------- | |
185 | Currently, EROFS supports 4KB fixed-output clustersize transparent file | |
186 | compression, as illustrated below: | |
187 | ||
188 | |---- Variant-Length Extent ----|-------- VLE --------|----- VLE ----- | |
189 | clusterofs clusterofs clusterofs | |
190 | | | | logical data | |
191 | _________v_______________________________v_____________________v_______________ | |
192 | ... | . | | . | | . | ... | |
193 | ____|____.________|_____________|________.____|_____________|__.__________|____ | |
194 | |-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-| | |
195 | size size size size size | |
196 | . . . . | |
197 | . . . . | |
198 | . . . . | |
199 | _______._____________._____________._____________._____________________ | |
200 | ... | | | | ... physical data | |
201 | _______|_____________|_____________|_____________|_____________________ | |
202 | |-> cluster <-|-> cluster <-|-> cluster <-| | |
203 | size size size | |
204 | ||
205 | Currently each on-disk physical cluster can contain 4KB (un)compressed data | |
206 | at most. For each logical cluster, there is a corresponding on-disk index to | |
207 | describe its cluster type, physical cluster address, etc. | |
208 | ||
209 | See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details. | |
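To make the diagram above concrete, the sketch below computes where each variable-length extent begins in logical (decompressed) space, expressed as a logical cluster number plus a clusterofs within that cluster. It only illustrates the geometry; it does not model the on-disk index format, and the function names and sample extent lengths are hypothetical.

```python
CLUSTER_SIZE = 4096   # current fixed cluster size, per the text

def locate(logical_offset: int, cluster_size: int = CLUSTER_SIZE):
    """Split a logical byte offset into (logical cluster number, clusterofs)."""
    return divmod(logical_offset, cluster_size)

def extent_heads(extent_lengths, cluster_size: int = CLUSTER_SIZE):
    """Given the decompressed length of each extent in file order,
    return the (cluster number, clusterofs) where each extent starts."""
    heads = []
    pos = 0
    for length in extent_lengths:
        heads.append(locate(pos, cluster_size))
        pos += length
    return heads

# Three extents of made-up decompressed sizes; each would occupy at most
# one 4KB physical cluster on disk under fixed-output compression.
print(extent_heads([6000, 9000, 5000]))
```

An extent can start in the middle of a logical cluster and span several of them, which is why each logical cluster needs its own index entry.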
210 |