Overview
========

EROFS file-system stands for Enhanced Read-Only File System. Unlike other
read-only file systems, it is designed for flexibility and scalability while
remaining simple and high-performance.

It is designed as a better filesystem solution for the following scenarios:
 - read-only storage media;

 - part of a fully trusted read-only solution, which means it needs to be
   immutable and bit-for-bit identical to the official golden image for
   its releases due to security and other considerations;

 - saving extra storage space with guaranteed end-to-end performance
   by using reduced metadata and transparent file compression, especially
   for embedded devices with limited memory (e.g. smartphones).

Here are the main features of EROFS:
 - Little endian on-disk design;

 - Currently 4KB block size (nobh) and therefore a maximum 16TB address space;

 - Metadata & data can be mixed by design;

 - 2 inode versions for different requirements:
                          v1            v2
   Inode metadata size:   32 bytes      64 bytes
   Max file size:         4 GB          16 EB (also limited by max. vol size)
   Max uids/gids:         65536         4294967296
   File creation time:    no            yes (64 + 32-bit timestamp)
   Max hardlinks:         65536         4294967296
   Metadata reserved:     4 bytes       14 bytes

 - Support extended attributes (xattrs) as an option;

 - Support xattr inline and tail-end data inline for all files;

 - Support transparent file compression as an option:
   LZ4 algorithm with 4 KB fixed-output compression for high performance;

The following git tree provides the file system user-space tools under
development (e.g. the formatting tool mkfs.erofs):
>> git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git

Bugs and patches are welcome. Please send them to the following
linux-erofs mailing list:
>> linux-erofs mailing list   <linux-erofs@lists.ozlabs.org>

Note that EROFS is still a work in progress as a Linux staging driver, so
Cc'ing the staging mailing list as well is highly recommended:
>> Linux Driver Project Developer List <devel@driverdev.osuosl.org>

Mount options
=============

fault_injection=%d     Enable fault injection in all supported types with
                       the specified injection rate. Supported injection types:
                       Type_Name                Type_Value
                       FAULT_KMALLOC            0x000000001
(no)user_xattr         Set up Extended User Attributes. Note: xattr is enabled
                       by default if CONFIG_EROFS_FS_XATTR is selected.
(no)acl                Set up POSIX Access Control List. Note: acl is enabled
                       by default if CONFIG_EROFS_FS_POSIX_ACL is selected.

On-disk details
===============

Summary
-------
Unlike other read-only file systems, an EROFS volume is designed
to be as simple as possible:

                                |-> aligned with the block size
   ____________________________________________________________
  | |SB| | ... | Metadata | ... | Data | Metadata | ... | Data |
  |_|__|_|_____|__________|_____|______|__________|_____|______|
  0 +1K

All data areas should be aligned with the block size, but metadata areas
need not be. All metadata can currently be observed in two different spaces
(views):
 1. Inode metadata space
    Each valid inode should be aligned with an inode slot, which is a fixed
    value (32 bytes) and designed to be kept in line with the v1 inode size.

    Each inode can be found directly with the following formula:
         inode offset = meta_blkaddr * block_size + 32 * nid

                                 |-> aligned with 8B
                                            |-> followed closely
     + meta_blkaddr blocks                                      |-> another slot
      _____________________________________________________________________
     |  ...   | inode |  xattrs  | extents  | data inline | ... | inode ...
     |________|_______|(optional)|(optional)|__(optional)_|_____|__________
              |-> aligned with the inode slot size
                      .          .
                   .                  .
                .                          .
             .                                  .
          .                                          .
        .                                                .
       .____________________________________________________|-> aligned with 4B
       | xattr_ibody_header | shared xattrs | inline xattrs |
       |____________________|_______________|_______________|
       |->    12 bytes    <-|->x * 4 bytes<-|               .
                       .                    .                 .
                  .                         .                   .
              .                             .                     .
            ._______________________________.______________________.
            | id | id | id | id | ...  | id | ent | ... | ent| ... |
            |____|____|____|____|______|____|_____|_____|____|_____|
                                            |-> aligned with 4B
                                                        |-> aligned with 4B

    An inode can be 32 or 64 bytes; the two versions are distinguished by
    i_advise, a field that all inode versions have in common:

          __________________               __________________
         |     i_advise     |             |     i_advise     |
         |__________________|             |__________________|
         |        ...       |             |        ...       |
         |                  |             |                  |
         |__________________| 32 bytes    |                  |
                                          |                  |
                                          |__________________| 64 bytes

    Xattrs, extents and inline data follow the corresponding inode with
    proper alignment, and they may be optional depending on the data
    mapping. _Currently_ a total of 3 valid data mappings are supported:

     1) flat file data without data inline (no extent);
     2) fixed-output size data compression (must have extents);
     3) flat file data with tail-end data inline (no extent);

    The size of the optional xattrs is indicated by i_xattr_count in the
    inode header. Large xattrs or xattrs shared by many different files can
    be stored in the shared xattrs metadata space rather than inlined right
    after the inode.

 2. Shared xattrs metadata space
    The shared xattrs space is similar to the inode space above: it starts
    at a specific block indicated by xattr_blkaddr, and xattrs are stored
    one by one with proper alignment.

    Each shared xattr can also be found directly with the following formula:
         xattr offset = xattr_blkaddr * block_size + 4 * xattr_id

             |-> aligned by 4 bytes
    + xattr_blkaddr blocks                     |-> aligned with 4 bytes
     _________________________________________________________________________
    |  ...   | xattr_entry |  xattr data | ... |  xattr_entry | xattr data  ...
    |________|_____________|_____________|_____|______________|_______________

Directories
-----------
All directories are now organized in a compact on-disk format. Note that
each directory block is divided into index and name areas in order to support
random file lookup, and all directory entries are _strictly_ recorded in
alphabetical order in order to support an improved prefix binary search
algorithm (refer to the related source code).

                  ____________________________
                 /                            |
                /              _______________|________________
               /              /               | nameoff1       | nameoffN-1
  ____________.______________.________________v________________v_________
 | dirent | dirent | ... | dirent | filename | filename | ... | filename |
 |___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
      \                            ^
       \                           |                     * could have
        \                          |                       trailing '\0'
         \_________________________| nameoff0

                             Directory block

Note that apart from the offset of the first filename, nameoff0 also indicates
the total number of directory entries in this block: the index area ends
exactly where the first filename starts, so nameoff0 divided by the dirent
size gives the entry count, and there is no need for another on-disk field.

Compression
-----------
Currently, EROFS supports transparent file compression with a 4KB
fixed-output cluster size, as illustrated below:

         |---- Variable-Length Extent ---|-------- VLE --------|----- VLE -----
         clusterofs                      clusterofs            clusterofs
         |                               |                     |   logical data
_________v_______________________________v_____________________v_______________
... |    .        |             |        .    |             |  .          | ...
____|____.________|_____________|________.____|_____________|__.__________|____
    |-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
         size          size          size          size          size
          .                         .                   .               .
           .                    .                 .              .
            .                .               .              .
      _______._____________._____________._____________._____________________
         ... |             |             |             | ... physical data
      _______|_____________|_____________|_____________|_____________________
             |-> cluster <-|-> cluster <-|-> cluster <-|
                  size          size          size

Currently each on-disk physical cluster can contain at most 4KB of
(un)compressed data. For each logical cluster, there is a corresponding
on-disk index describing its cluster type, physical cluster address, etc.

See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details.
