Commit | Line | Data |
---|---|---|
6ff2deb2 EB |
1 | .. SPDX-License-Identifier: GPL-2.0 |
2 | ||
3 | .. _fsverity: | |
4 | ||
5 | ======================================================= | |
6 | fs-verity: read-only file-based authenticity protection | |
7 | ======================================================= | |
8 | ||
9 | Introduction | |
10 | ============ | |
11 | ||
12 | fs-verity (``fs/verity/``) is a support layer that filesystems can | |
13 | hook into to support transparent integrity and authenticity protection | |
14 | of read-only files. Currently, it is supported by the ext4 and f2fs | |
15 | filesystems. Like fscrypt, not too much filesystem-specific code is | |
16 | needed to support fs-verity. | |
17 | ||
18 | fs-verity is similar to `dm-verity | |
19 | <https://www.kernel.org/doc/Documentation/device-mapper/verity.txt>`_ | |
20 | but works on files rather than block devices. On regular files on | |
21 | filesystems supporting fs-verity, userspace can execute an ioctl that | |
22 | causes the filesystem to build a Merkle tree for the file and persist | |
23 | it to a filesystem-specific location associated with the file. | |
24 | ||
25 | After this, the file is made readonly, and all reads from the file are | |
26 | automatically verified against the file's Merkle tree. Reads of any | |
27 | corrupted data, including mmap reads, will fail. | |
28 | ||
29 | Userspace can use another ioctl to retrieve the root hash (actually | |
ed45e201 EB |
30 | the "fs-verity file digest", which is a hash that includes the Merkle |
31 | tree root hash) that fs-verity is enforcing for the file. This ioctl | |
32 | executes in constant time, regardless of the file size. | |
6ff2deb2 EB |
33 | |
34 | fs-verity is essentially a way to hash a file in constant time, | |
35 | subject to the caveat that reads which would violate the hash will | |
36 | fail at runtime. | |
37 | ||
38 | Use cases | |
39 | ========= | |
40 | ||
41 | By itself, the base fs-verity feature only provides integrity | |
42 | protection, i.e. detection of accidental (non-malicious) corruption. | |
43 | ||
44 | However, because fs-verity makes retrieving the file hash extremely | |
45 | efficient, it's primarily meant to be used as a tool to support | |
46 | authentication (detection of malicious modifications) or auditing | |
47 | (logging file hashes before use). | |
48 | ||
49 | Trusted userspace code (e.g. operating system code running on a | |
50 | read-only partition that is itself authenticated by dm-verity) can | |
51 | authenticate the contents of an fs-verity file by using the | |
52 | `FS_IOC_MEASURE_VERITY`_ ioctl to retrieve its hash, then verifying a | |
53 | digital signature of it. | |
54 | ||
55 | A standard file hash could be used instead of fs-verity. However, | |
56 | this is inefficient if the file is large and only a small portion may | |
57 | be accessed. This is often the case for Android application package | |
58 | (APK) files, for example. These typically contain many translations, | |
59 | classes, and other resources that are infrequently or even never | |
60 | accessed on a particular device. It would be slow and wasteful to | |
61 | read and hash the entire file before starting the application. | |
62 | ||
63 | Unlike an ahead-of-time hash, fs-verity also re-verifies data each | |
64 | time it's paged in. This ensures that malicious disk firmware can't | |
65 | undetectably change the contents of the file at runtime. | |
66 | ||
67 | fs-verity does not replace or obsolete dm-verity. dm-verity should | |
68 | still be used on read-only filesystems. fs-verity is for files that | |
69 | must live on a read-write filesystem because they are independently | |
70 | updated and potentially user-installed, so dm-verity cannot be used. | |
71 | ||
72 | The base fs-verity feature is a hashing mechanism only; actually | |
02ee2316 MZ |
73 | authenticating the files may be done by: |
74 | ||
75 | * Userspace-only | |
76 | ||
77 | * Builtin signature verification + userspace policy | |
78 | ||
79 | fs-verity optionally supports a simple signature verification | |
80 | mechanism where users can configure the kernel to require that | |
81 | all fs-verity files be signed by a key loaded into a keyring; | |
82 | see `Built-in signature verification`_. | |
83 | ||
84 | * Integrity Measurement Architecture (IMA) | |
85 | ||
86 | IMA supports including fs-verity file digests and signatures in the | |
87 | IMA measurement list and verifying fs-verity based file signatures | |
88 | stored as security.ima xattrs, based on policy. | |
89 | ||
6ff2deb2 EB |
90 | |
91 | User API | |
92 | ======== | |
93 | ||
94 | FS_IOC_ENABLE_VERITY | |
95 | -------------------- | |
96 | ||
97 | The FS_IOC_ENABLE_VERITY ioctl enables fs-verity on a file. It takes | |
9303c9d5 | 98 | in a pointer to a struct fsverity_enable_arg, defined as |
6ff2deb2 EB |
99 | follows:: |
100 | ||
101 | struct fsverity_enable_arg { | |
102 | __u32 version; | |
103 | __u32 hash_algorithm; | |
104 | __u32 block_size; | |
105 | __u32 salt_size; | |
106 | __u64 salt_ptr; | |
107 | __u32 sig_size; | |
108 | __u32 __reserved1; | |
109 | __u64 sig_ptr; | |
110 | __u64 __reserved2[11]; | |
111 | }; | |
112 | ||
113 | This structure contains the parameters of the Merkle tree to build for | |
114 | the file, and optionally contains a signature. It must be initialized | |
115 | as follows: | |
116 | ||
117 | - ``version`` must be 1. | |
118 | - ``hash_algorithm`` must be the identifier for the hash algorithm to | |
119 | use for the Merkle tree, such as FS_VERITY_HASH_ALG_SHA256. See | |
120 | ``include/uapi/linux/fsverity.h`` for the list of possible values. | |
121 | - ``block_size`` must be the Merkle tree block size. Currently, this | |
122 | must be equal to the system page size, which is usually 4096 bytes. | |
123 | Other sizes may be supported in the future. This value is not | |
124 | necessarily the same as the filesystem block size. | |
125 | - ``salt_size`` is the size of the salt in bytes, or 0 if no salt is | |
126 | provided. The salt is a value that is prepended to every hashed | |
127 | block; it can be used to personalize the hashing for a particular | |
128 | file or device. Currently the maximum salt size is 32 bytes. | |
129 | - ``salt_ptr`` is the pointer to the salt, or NULL if no salt is | |
130 | provided. | |
131 | - ``sig_size`` is the size of the signature in bytes, or 0 if no | |
132 | signature is provided. Currently the signature is (somewhat | |
133 | arbitrarily) limited to 16128 bytes. See `Built-in signature | |
134 | verification`_ for more information. | |
135 | - ``sig_ptr`` is the pointer to the signature, or NULL if no | |
136 | signature is provided. | |
137 | - All reserved fields must be zeroed. | |
138 | ||
139 | FS_IOC_ENABLE_VERITY causes the filesystem to build a Merkle tree for | |
140 | the file and persist it to a filesystem-specific location associated | |
141 | with the file, then mark the file as a verity file. This ioctl may | |
142 | take a long time to execute on large files, and it is interruptible by | |
143 | fatal signals. | |
144 | ||
145 | FS_IOC_ENABLE_VERITY checks for write access to the inode. However, | |
146 | it must be executed on an O_RDONLY file descriptor and no processes | |
147 | can have the file open for writing. Attempts to open the file for | |
148 | writing while this ioctl is executing will fail with ETXTBSY. (This | |
149 | is necessary to guarantee that no writable file descriptors will exist | |
150 | after verity is enabled, and to guarantee that the file's contents are | |
151 | stable while the Merkle tree is being built over it.) | |
152 | ||
153 | On success, FS_IOC_ENABLE_VERITY returns 0, and the file becomes a | |
154 | verity file. On failure (including the case of interruption by a | |
155 | fatal signal), no changes are made to the file. | |
156 | ||
157 | FS_IOC_ENABLE_VERITY can fail with the following errors: | |
158 | ||
159 | - ``EACCES``: the process does not have write access to the file | |
160 | - ``EBADMSG``: the signature is malformed | |
161 | - ``EBUSY``: this ioctl is already running on the file | |
162 | - ``EEXIST``: the file already has verity enabled | |
163 | - ``EFAULT``: the caller provided inaccessible memory | |
164 | - ``EINTR``: the operation was interrupted by a fatal signal | |
165 | - ``EINVAL``: unsupported version, hash algorithm, or block size; or | |
166 | reserved bits are set; or the file descriptor refers to neither a | |
167 | regular file nor a directory. | |
168 | - ``EISDIR``: the file descriptor refers to a directory | |
169 | - ``EKEYREJECTED``: the signature doesn't match the file | |
170 | - ``EMSGSIZE``: the salt or signature is too long | |
171 | - ``ENOKEY``: the fs-verity keyring doesn't contain the certificate | |
172 | needed to verify the signature | |
173 | - ``ENOPKG``: fs-verity recognizes the hash algorithm, but it's not | |
174 | available in the kernel's crypto API as currently configured (e.g. | |
175 | for SHA-512, missing CONFIG_CRYPTO_SHA512). | |
176 | - ``ENOTTY``: this type of filesystem does not implement fs-verity | |
177 | - ``EOPNOTSUPP``: the kernel was not configured with fs-verity | |
178 | support; or the filesystem superblock has not had the 'verity' | |
179 | feature enabled on it; or the filesystem does not support fs-verity | |
180 | on this file. (See `Filesystem support`_.) | |
181 | - ``EPERM``: the file is append-only; or, a signature is required and | |
182 | one was not provided. | |
183 | - ``EROFS``: the filesystem is read-only | |
184 | - ``ETXTBSY``: someone has the file open for writing. This can be the | |
185 | caller's file descriptor, another open file descriptor, or the file | |
186 | reference held by a writable memory map. | |
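
As a rough illustration of the initialization rules above, the following sketch enables fs-verity with SHA-256, a 4096-byte Merkle tree block size, no salt, and no signature. The helper name ``enable_verity_sha256`` is hypothetical, and the block size assumes a system page size of 4096 bytes; it is a minimal sketch, not a reference implementation.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fsverity.h>

/*
 * Hypothetical helper (not part of any library): enable fs-verity on `path`
 * with SHA-256, 4096-byte Merkle tree blocks, no salt, and no signature.
 * Returns 0 on success or a negative errno value on failure.
 */
static int enable_verity_sha256(const char *path)
{
    struct fsverity_enable_arg arg;
    int fd, ret = 0;

    assert(path != NULL);

    /* The ioctl must be issued on an O_RDONLY file descriptor. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    /* memset() also zeroes the reserved fields, salt_ptr, and sig_ptr. */
    memset(&arg, 0, sizeof(arg));
    arg.version = 1;
    arg.hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
    arg.block_size = 4096; /* assumption: the system page size is 4096 */

    if (ioctl(fd, FS_IOC_ENABLE_VERITY, &arg) != 0)
        ret = -errno;

    close(fd);
    return ret;
}
```

A caller would typically map the negative return value back to the error list above, e.g. treating ``-EEXIST`` as "already enabled".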
187 | ||
188 | FS_IOC_MEASURE_VERITY | |
189 | --------------------- | |
190 | ||
ed45e201 EB |
191 | The FS_IOC_MEASURE_VERITY ioctl retrieves the digest of a verity file. |
192 | The fs-verity file digest is a cryptographic digest that identifies | |
193 | the file contents that are being enforced on reads; it is computed via | |
194 | a Merkle tree and is different from a traditional full-file digest. | |
6ff2deb2 EB |
195 | |
196 | This ioctl takes in a pointer to a variable-length structure:: | |
197 | ||
198 | struct fsverity_digest { | |
199 | __u16 digest_algorithm; | |
200 | __u16 digest_size; /* input/output */ | |
201 | __u8 digest[]; | |
202 | }; | |
203 | ||
204 | ``digest_size`` is an input/output field. On input, it must be | |
205 | initialized to the number of bytes allocated for the variable-length | |
206 | ``digest`` field. | |
207 | ||
208 | On success, 0 is returned and the kernel fills in the structure as | |
209 | follows: | |
210 | ||
211 | - ``digest_algorithm`` will be the hash algorithm used for the file | |
ed45e201 | 212 | digest. It will match ``fsverity_enable_arg::hash_algorithm``. |
6ff2deb2 EB |
213 | - ``digest_size`` will be the size of the digest in bytes, e.g. 32 |
214 | for SHA-256. (This can be redundant with ``digest_algorithm``.) | |
215 | - ``digest`` will be the actual bytes of the digest. | |
216 | ||
217 | FS_IOC_MEASURE_VERITY is guaranteed to execute in constant time, | |
218 | regardless of the size of the file. | |
219 | ||
220 | FS_IOC_MEASURE_VERITY can fail with the following errors: | |
221 | ||
222 | - ``EFAULT``: the caller provided inaccessible memory | |
223 | - ``ENODATA``: the file is not a verity file | |
224 | - ``ENOTTY``: this type of filesystem does not implement fs-verity | |
225 | - ``EOPNOTSUPP``: the kernel was not configured with fs-verity | |
226 | support, or the filesystem superblock has not had the 'verity' | |
227 | feature enabled on it. (See `Filesystem support`_.) | |
228 | - ``EOVERFLOW``: the digest is longer than the specified | |
229 | ``digest_size`` bytes. Try providing a larger buffer. | |
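
The input/output handling of ``digest_size`` can be sketched as follows. The helper ``measure_verity`` and the constant ``MAX_DIGEST_SIZE`` are hypothetical names chosen for this example; 64 bytes is assumed to be enough for the largest digest in use (SHA-512).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fsverity.h>

#define MAX_DIGEST_SIZE 64 /* assumption: large enough for SHA-512 */

/*
 * Hypothetical helper: copy the fs-verity file digest of `path` into `out`
 * (at least MAX_DIGEST_SIZE bytes) and store the algorithm ID in `*alg`.
 * Returns the digest size in bytes, or a negative errno value.
 */
static int measure_verity(const char *path, unsigned char *out,
                          unsigned int *alg)
{
    struct fsverity_digest *d;
    int fd, ret;

    assert(path != NULL && out != NULL && alg != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    /* Allocate the fixed header plus room for the variable-length digest. */
    d = calloc(1, sizeof(*d) + MAX_DIGEST_SIZE);
    if (d == NULL) {
        close(fd);
        return -ENOMEM;
    }

    d->digest_size = MAX_DIGEST_SIZE; /* input: bytes available in digest[] */
    if (ioctl(fd, FS_IOC_MEASURE_VERITY, d) != 0) {
        ret = -errno;
    } else {
        *alg = d->digest_algorithm;
        memcpy(out, d->digest, d->digest_size); /* output: actual size */
        ret = d->digest_size;
    }

    free(d);
    close(fd);
    return ret;
}
```

On a file without verity enabled, this sketch would be expected to return ``-ENODATA``, per the error list above.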
230 | ||
e17fe657 EB |
231 | FS_IOC_READ_VERITY_METADATA |
232 | --------------------------- | |
233 | ||
234 | The FS_IOC_READ_VERITY_METADATA ioctl reads verity metadata from a | |
235 | verity file. This ioctl is available since Linux v5.12. | |
236 | ||
237 | This ioctl allows writing a server program that takes a verity file | |
238 | and serves it to a client program, such that the client can do its own | |
239 | fs-verity compatible verification of the file. This only makes sense | |
240 | if the client doesn't trust the server and if the server needs to | |
241 | provide the storage for the client. | |
242 | ||
243 | This is a fairly specialized use case, and most fs-verity users won't | |
244 | need this ioctl. | |
245 | ||
246 | This ioctl takes in a pointer to the following structure:: | |
247 | ||
622699cf | 248 | #define FS_VERITY_METADATA_TYPE_MERKLE_TREE 1 |
947191ac | 249 | #define FS_VERITY_METADATA_TYPE_DESCRIPTOR 2 |
07c99001 | 250 | #define FS_VERITY_METADATA_TYPE_SIGNATURE 3 |
622699cf | 251 | |
e17fe657 EB |
252 | struct fsverity_read_metadata_arg { |
253 | __u64 metadata_type; | |
254 | __u64 offset; | |
255 | __u64 length; | |
256 | __u64 buf_ptr; | |
257 | __u64 __reserved; | |
258 | }; | |
259 | ||
622699cf EB |
260 | ``metadata_type`` specifies the type of metadata to read: |
261 | ||
262 | - ``FS_VERITY_METADATA_TYPE_MERKLE_TREE`` reads the blocks of the | |
263 | Merkle tree. The blocks are returned in order from the root level | |
264 | to the leaf level. Within each level, the blocks are returned in | |
265 | the same order that their hashes are themselves hashed. | |
266 | See `Merkle tree`_ for more information. | |
e17fe657 | 267 | |
947191ac EB |
268 | - ``FS_VERITY_METADATA_TYPE_DESCRIPTOR`` reads the fs-verity |
269 | descriptor. See `fs-verity descriptor`_. | |
270 | ||
07c99001 EB |
271 | - ``FS_VERITY_METADATA_TYPE_SIGNATURE`` reads the signature which was |
272 | passed to FS_IOC_ENABLE_VERITY, if any. See `Built-in signature | |
273 | verification`_. | |
274 | ||
e17fe657 EB |
275 | The semantics are similar to those of ``pread()``. ``offset`` |
276 | specifies the offset in bytes into the metadata item to read from, and | |
277 | ``length`` specifies the maximum number of bytes to read from the | |
278 | metadata item. ``buf_ptr`` is the pointer to the buffer to read into, | |
279 | cast to a 64-bit integer. ``__reserved`` must be 0. On success, the | |
280 | number of bytes read is returned. 0 is returned at the end of the | |
281 | metadata item. The returned length may be less than ``length``, for | |
282 | example if the ioctl is interrupted. | |
283 | ||
284 | The metadata returned by FS_IOC_READ_VERITY_METADATA isn't guaranteed | |
285 | to be authenticated against the file digest that would be returned by | |
286 | `FS_IOC_MEASURE_VERITY`_, as the metadata is expected to be used to | |
287 | implement fs-verity compatible verification anyway (though absent a | |
288 | malicious disk, the metadata will indeed match). E.g. to implement | |
289 | this ioctl, the filesystem is allowed to just read the Merkle tree | |
290 | blocks from disk without actually verifying the path to the root node. | |
291 | ||
292 | FS_IOC_READ_VERITY_METADATA can fail with the following errors: | |
293 | ||
294 | - ``EFAULT``: the caller provided inaccessible memory | |
295 | - ``EINTR``: the ioctl was interrupted before any data was read | |
296 | - ``EINVAL``: reserved fields were set, or ``offset + length`` | |
297 | overflowed | |
07c99001 EB |
298 | - ``ENODATA``: the file is not a verity file, or |
299 | FS_VERITY_METADATA_TYPE_SIGNATURE was requested but the file doesn't | |
300 | have a built-in signature | |
e17fe657 EB |
301 | - ``ENOTTY``: this type of filesystem does not implement fs-verity, or |
302 | this ioctl is not yet implemented on it | |
303 | - ``EOPNOTSUPP``: the kernel was not configured with fs-verity | |
304 | support, or the filesystem superblock has not had the 'verity' | |
305 | feature enabled on it. (See `Filesystem support`_.) | |
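
Because the semantics resemble ``pread()``, a caller usually loops until the requested range is filled or the end of the metadata item is reached. The sketch below shows one way to do that; ``read_verity_metadata`` is a hypothetical helper, and the example assumes UAPI headers from Linux v5.12 or later so that ``<linux/fsverity.h>`` declares this ioctl.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fsverity.h>

/*
 * Hypothetical helper: read up to `len` bytes of one metadata item (e.g.
 * FS_VERITY_METADATA_TYPE_DESCRIPTOR) from the open verity file `fd` into
 * `buf`, starting at byte `offset` within the item. Returns the number of
 * bytes read (less than `len` only at the end of the item) or a negative
 * errno value.
 */
static long read_verity_metadata(int fd, uint64_t type, uint64_t offset,
                                 void *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL || len == 0);

    while (done < len) {
        struct fsverity_read_metadata_arg arg;
        int n;

        memset(&arg, 0, sizeof(arg)); /* __reserved must be 0 */
        arg.metadata_type = type;
        arg.offset = offset + done;
        arg.length = len - done;
        /* The buffer pointer is passed as a 64-bit integer. */
        arg.buf_ptr = (uint64_t)(uintptr_t)((char *)buf + done);

        n = ioctl(fd, FS_IOC_READ_VERITY_METADATA, &arg);
        if (n < 0)
            return -errno;
        if (n == 0)
            break; /* end of the metadata item */
        done += (size_t)n; /* short reads are allowed; keep going */
    }
    return (long)done;
}
```

Since the returned metadata isn't guaranteed to be authenticated, a client using this helper is still expected to verify what it receives against a trusted file digest.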
306 | ||
6ff2deb2 EB |
307 | FS_IOC_GETFLAGS |
308 | --------------- | |
309 | ||
310 | The existing ioctl FS_IOC_GETFLAGS (which isn't specific to fs-verity) | |
311 | can also be used to check whether a file has fs-verity enabled or not. | |
312 | To do so, check for FS_VERITY_FL (0x00100000) in the returned flags. | |
313 | ||
314 | The verity flag is not settable via FS_IOC_SETFLAGS. You must use | |
315 | FS_IOC_ENABLE_VERITY instead, since parameters must be provided. | |
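
A minimal sketch of the flag check follows. The helper name ``has_verity_flag`` is hypothetical; the fallback definition of ``FS_VERITY_FL`` uses the value quoted above and only applies if the installed headers predate it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

#ifndef FS_VERITY_FL
#define FS_VERITY_FL 0x00100000 /* value quoted in the text above */
#endif

/*
 * Hypothetical helper: returns 1 if `path` has the verity flag set,
 * 0 if it does not, or a negative errno value on failure.
 */
static int has_verity_flag(const char *path)
{
    int fd, flags = 0, ret;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) != 0)
        ret = -errno;
    else
        ret = (flags & FS_VERITY_FL) != 0;

    close(fd);
    return ret;
}
```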
316 | ||
73f0ec02 EB |
317 | statx |
318 | ----- | |
319 | ||
320 | Since Linux v5.5, the statx() system call sets STATX_ATTR_VERITY if | |
321 | the file has fs-verity enabled. This can perform better than | |
322 | FS_IOC_GETFLAGS and FS_IOC_MEASURE_VERITY because it doesn't require | |
323 | opening the file, and opening verity files can be expensive. | |
324 | ||
6ff2deb2 EB |
325 | Accessing verity files |
326 | ====================== | |
327 | ||
328 | Applications can transparently access a verity file just like a | |
329 | non-verity one, with the following exceptions: | |
330 | ||
331 | - Verity files are readonly. They cannot be opened for writing or | |
332 | truncate()d, even if the file mode bits allow it. Attempts to do | |
333 | one of these things will fail with EPERM. However, changes to | |
334 | metadata such as owner, mode, timestamps, and xattrs are still | |
335 | allowed, since these are not measured by fs-verity. Verity files | |
336 | can also still be renamed, deleted, and linked to. | |
337 | ||
338 | - Direct I/O is not supported on verity files. Attempts to use direct | |
339 | I/O on such files will fall back to buffered I/O. | |
340 | ||
341 | - DAX (Direct Access) is not supported on verity files, because this | |
342 | would circumvent the data verification. | |
343 | ||
344 | - Reads of data that doesn't match the verity Merkle tree will fail | |
345 | with EIO (for read()) or SIGBUS (for mmap() reads). | |
346 | ||
347 | - If the sysctl "fs.verity.require_signatures" is set to 1 and the | |
ed45e201 EB |
348 | file is not signed by a key in the fs-verity keyring, then opening |
349 | the file will fail. See `Built-in signature verification`_. | |
6ff2deb2 EB |
350 | |
351 | Direct access to the Merkle tree is not supported. Therefore, if a | |
352 | verity file is copied, or is backed up and restored, then it will lose | |
353 | its "verity"-ness. fs-verity is primarily meant for files like | |
354 | executables that are managed by a package manager. | |
355 | ||
ed45e201 EB |
356 | File digest computation |
357 | ======================= | |
6ff2deb2 EB |
358 | |
359 | This section describes how fs-verity hashes the file contents using a | |
ed45e201 EB |
360 | Merkle tree to produce the digest which cryptographically identifies |
361 | the file contents. This algorithm is the same for all filesystems | |
362 | that support fs-verity. | |
6ff2deb2 EB |
363 | |
364 | Userspace only needs to be aware of this algorithm if it needs to | |
ed45e201 | 365 | compute fs-verity file digests itself, e.g. in order to sign files. |
6ff2deb2 EB |
366 | |
367 | .. _fsverity_merkle_tree: | |
368 | ||
369 | Merkle tree | |
370 | ----------- | |
371 | ||
372 | The file contents are divided into blocks, where the block size is | |
373 | configurable but is usually 4096 bytes. The end of the last block is | |
374 | zero-padded if needed. Each block is then hashed, producing the first | |
375 | level of hashes. Then, the hashes in this first level are grouped | |
376 | into 'blocksize'-byte blocks (zero-padding the ends as needed) and | |
377 | these blocks are hashed, producing the second level of hashes. This | |
378 | proceeds up the tree until only a single block remains. The hash of | |
379 | this block is the "Merkle tree root hash". | |
380 | ||
381 | If the file fits in one block and is nonempty, then the "Merkle tree | |
382 | root hash" is simply the hash of the single data block. If the file | |
383 | is empty, then the "Merkle tree root hash" is all zeroes. | |
384 | ||
385 | The "blocks" here are not necessarily the same as "filesystem blocks". | |
386 | ||
387 | If a salt was specified, then it's zero-padded to the next multiple | |
388 | of the input size of the hash algorithm's compression function, e.g. | |
389 | 64 bytes for SHA-256 or 128 bytes for SHA-512. The padded salt is | |
390 | prepended to every data or Merkle tree block that is hashed. | |
391 | ||
392 | The purpose of the block padding is to cause every hash to be taken | |
393 | over the same amount of data, which simplifies the implementation and | |
394 | keeps open more possibilities for hardware acceleration. The purpose | |
395 | of the salt padding is to make the salting "free" when the salted hash | |
396 | state is precomputed, then imported for each hash. | |
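
The salt padding rule amounts to rounding the salt size up to a multiple of the hash's input block size. A small sketch, using the hypothetical helper name ``padded_salt_size`` and the 32-byte maximum salt size mentioned earlier:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: size of the salt after zero-padding, given the input
 * block size of the hash's compression function (64 for SHA-256, 128 for
 * SHA-512). A salt size of 0 means no salt, so nothing is prepended.
 */
static size_t padded_salt_size(size_t salt_size, size_t hash_input_block_size)
{
    assert(salt_size <= 32); /* current maximum salt size */
    assert(hash_input_block_size > 0);

    /* Round up to the next multiple of the compression block size. */
    return (salt_size + hash_input_block_size - 1) /
           hash_input_block_size * hash_input_block_size;
}
```

For example, an 8-byte salt with SHA-256 occupies one full 64-byte compression block, which is what allows the salted hash state to be precomputed once and reused.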
397 | ||
398 | Example: in the recommended configuration of SHA-256 and 4K blocks, | |
399 | 128 hash values fit in each block. Thus, each level of the Merkle | |
400 | tree is approximately 128 times smaller than the previous, and for | |
401 | large files the Merkle tree's size converges to approximately 1/127 of | |
402 | the original file size. However, for small files, the padding is | |
403 | significant, making the space overhead proportionally larger. | |
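
The level-by-level construction described in this section can be turned into a size estimate. The following sketch counts the Merkle tree blocks for a given file size; ``merkle_tree_blocks`` is a hypothetical helper written for this example, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of Merkle tree blocks needed for a file of
 * `file_size` bytes, following the construction described above. If
 * `num_levels` is non-NULL, it receives the number of tree levels.
 */
static uint64_t merkle_tree_blocks(uint64_t file_size, uint32_t block_size,
                                   uint32_t digest_size,
                                   unsigned int *num_levels)
{
    uint64_t hashes_per_block, blocks, total = 0;
    unsigned int levels = 0;

    assert(digest_size > 0 && block_size >= 2 * digest_size);
    hashes_per_block = block_size / digest_size;

    /* Number of data blocks; a partial last block is zero-padded. */
    blocks = file_size / block_size + (file_size % block_size != 0);

    /*
     * Hash every block of the current level and pack the hashes into the
     * blocks of the next level, until a single block remains.
     */
    while (blocks > 1) {
        blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
        total += blocks;
        levels++;
    }

    if (num_levels != NULL)
        *num_levels = levels;
    return total;
}
```

With SHA-256 and 4K blocks, a 1 GiB file needs 2048 + 16 + 1 = 2065 tree blocks (about 8.07 MiB), close to the 1/127 ratio mentioned above, while a file that fits in a single block needs no tree blocks at all.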
404 | ||
405 | .. _fsverity_descriptor: | |
406 | ||
407 | fs-verity descriptor | |
408 | -------------------- | |
409 | ||
410 | By itself, the Merkle tree root hash is ambiguous. For example, it | |
411 | can't distinguish a large file from a small second file whose data | |
412 | is exactly the top-level hash block of the first file. Ambiguities | |
413 | also arise from the convention of padding to the next block boundary. | |
414 | ||
ed45e201 EB |
415 | To solve this problem, the fs-verity file digest is actually computed |
416 | as a hash of the following structure, which contains the Merkle tree | |
417 | root hash as well as other fields such as the file size:: | |
6ff2deb2 EB |
418 | |
419 | struct fsverity_descriptor { | |
420 | __u8 version; /* must be 1 */ | |
421 | __u8 hash_algorithm; /* Merkle tree hash algorithm */ | |
422 | __u8 log_blocksize; /* log2 of size of data and tree blocks */ | |
423 | __u8 salt_size; /* size of salt in bytes; 0 if none */ | |
bde49334 | 424 | __le32 __reserved_0x04; /* must be 0 */ |
6ff2deb2 EB |
425 | __le64 data_size; /* size of file the Merkle tree is built over */ |
426 | __u8 root_hash[64]; /* Merkle tree root hash */ | |
427 | __u8 salt[32]; /* salt prepended to each hashed block */ | |
428 | __u8 __reserved[144]; /* must be 0's */ | |
429 | }; | |
430 | ||
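Userspace that computes file digests itself needs a byte-exact image of this structure. The sketch below mirrors the layout with fixed-width byte arrays so that the little-endian fields are independent of host endianness; ``struct fsverity_descriptor_image`` and ``descriptor_init`` are hypothetical names for this example, and the resulting 256 bytes are what would then be fed to the chosen hash function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirror of the descriptor; byte arrays avoid padding. */
struct fsverity_descriptor_image {
    uint8_t version;          /* must be 1 */
    uint8_t hash_algorithm;
    uint8_t log_blocksize;
    uint8_t salt_size;
    uint8_t reserved_0x04[4]; /* __le32, must be 0 */
    uint8_t data_size[8];     /* __le64, stored little-endian */
    uint8_t root_hash[64];
    uint8_t salt[32];
    uint8_t reserved[144];    /* must be 0's */
};

_Static_assert(sizeof(struct fsverity_descriptor_image) == 256,
               "descriptor image must be exactly 256 bytes");

/* Hypothetical helper: fill in a descriptor image; unused bytes stay zero. */
static void descriptor_init(struct fsverity_descriptor_image *d,
                            uint8_t hash_algorithm, uint32_t block_size,
                            uint64_t data_size,
                            const uint8_t *root_hash, size_t root_hash_size,
                            const uint8_t *salt, size_t salt_size)
{
    unsigned int i;

    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    assert(root_hash_size <= sizeof(d->root_hash));
    assert(salt_size <= sizeof(d->salt));

    memset(d, 0, sizeof(*d));
    d->version = 1;
    d->hash_algorithm = hash_algorithm;
    while ((1u << d->log_blocksize) < block_size)
        d->log_blocksize++; /* log2 of the block size */
    d->salt_size = (uint8_t)salt_size;
    for (i = 0; i < 8; i++)
        d->data_size[i] = (uint8_t)(data_size >> (8 * i));
    if (root_hash_size != 0)
        memcpy(d->root_hash, root_hash, root_hash_size);
    if (salt_size != 0)
        memcpy(d->salt, salt, salt_size);
}
```

Including ``data_size`` in the hashed structure is what resolves the ambiguity described above: two files with the same root hash but different sizes produce different file digests.
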
6ff2deb2 EB |
431 | Built-in signature verification |
432 | =============================== | |
433 | ||
434 | With CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y, fs-verity supports putting | |
435 | a portion of an authentication policy (see `Use cases`_) in the | |
436 | kernel. Specifically, it adds support for: | |
437 | ||
438 | 1. At fs-verity module initialization time, a keyring ".fs-verity" is | |
439 | created. The root user can add trusted X.509 certificates to this | |
440 | keyring using the add_key() system call, then (when done) | |
441 | optionally use keyctl_restrict_keyring() to prevent additional | |
442 | certificates from being added. | |
443 | ||
444 | 2. `FS_IOC_ENABLE_VERITY`_ accepts a pointer to a PKCS#7 formatted | |
ed45e201 EB |
445 | detached signature in DER format of the file's fs-verity digest. |
446 | On success, this signature is persisted alongside the Merkle tree. | |
6ff2deb2 | 447 | Then, any time the file is opened, the kernel will verify the |
ed45e201 EB |
448 | file's actual digest against this signature, using the certificates |
449 | in the ".fs-verity" keyring. | |
6ff2deb2 EB |
450 | |
451 | 3. A new sysctl "fs.verity.require_signatures" is made available. | |
452 | When set to 1, the kernel requires that all verity files have a | |
ed45e201 | 453 | correctly signed digest as described in (2). |
6ff2deb2 | 454 | |
ed45e201 EB |
455 | fs-verity file digests must be signed in the following format, which |
456 | is similar to the structure used by `FS_IOC_MEASURE_VERITY`_:: | |
6ff2deb2 | 457 | |
9e90f30e | 458 | struct fsverity_formatted_digest { |
6ff2deb2 EB |
459 | char magic[8]; /* must be "FSVerity" */ |
460 | __le16 digest_algorithm; | |
461 | __le16 digest_size; | |
462 | __u8 digest[]; | |
463 | }; | |
464 | ||
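A signing tool has to serialize this structure before passing it to its signature routine. The following sketch builds the byte string; ``format_digest_for_signing`` is a hypothetical helper, and producing the PKCS#7 signature itself is out of scope here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: serialize a formatted digest into `out`.
 * Returns the number of bytes written, or 0 if `out_size` is too small.
 */
static size_t format_digest_for_signing(uint8_t *out, size_t out_size,
                                        uint16_t digest_algorithm,
                                        const uint8_t *digest,
                                        uint16_t digest_size)
{
    size_t total = 8 + 2 + 2 + (size_t)digest_size;

    assert(out != NULL && digest != NULL);
    if (out_size < total)
        return 0;

    memcpy(out, "FSVerity", 8); /* magic, without a NUL terminator */
    out[8] = (uint8_t)(digest_algorithm & 0xff); /* __le16 */
    out[9] = (uint8_t)(digest_algorithm >> 8);
    out[10] = (uint8_t)(digest_size & 0xff); /* __le16 */
    out[11] = (uint8_t)(digest_size >> 8);
    memcpy(out + 12, digest, digest_size);
    return total;
}
```

For a SHA-256 file digest the result is 44 bytes long: 8 bytes of magic, two 2-byte fields, and the 32-byte digest.
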
465 | fs-verity's built-in signature verification support is meant as a | |
466 | relatively simple mechanism that can be used to provide some level of | |
467 | authenticity protection for verity files, as an alternative to doing | |
468 | the signature verification in userspace or using IMA-appraisal. | |
469 | However, with this mechanism, userspace programs still need to check | |
470 | that the verity bit is set, and there is no protection against verity | |
471 | files being swapped around. | |
472 | ||
473 | Filesystem support | |
474 | ================== | |
475 | ||
476 | fs-verity is currently supported by the ext4 and f2fs filesystems. | |
477 | The CONFIG_FS_VERITY kconfig option must be enabled to use fs-verity | |
478 | on either filesystem. | |
479 | ||
480 | ``include/linux/fsverity.h`` declares the interface between the | |
481 | ``fs/verity/`` support layer and filesystems. Briefly, filesystems | |
482 | must provide an ``fsverity_operations`` structure that provides | |
483 | methods to read and write the verity metadata to a filesystem-specific | |
484 | location, including the Merkle tree blocks and | |
485 | ``fsverity_descriptor``. Filesystems must also call functions in | |
486 | ``fs/verity/`` at certain times, such as when a file is opened or when | |
487 | pages have been read into the pagecache. (See `Verifying data`_.) | |
488 | ||
489 | ext4 | |
490 | ---- | |
491 | ||
c0d782a3 | 492 | ext4 supports fs-verity since Linux v5.4 and e2fsprogs v1.45.2. |
6ff2deb2 EB |
493 | |
494 | To create verity files on an ext4 filesystem, the filesystem must have | |
495 | been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on | |
496 | it. "verity" is an RO_COMPAT filesystem feature, so once set, old | |
497 | kernels will only be able to mount the filesystem readonly, and old | |
498 | versions of e2fsck will be unable to check the filesystem. Moreover, | |
499 | currently ext4 only supports mounting a filesystem with the "verity" | |
500 | feature when its block size is equal to PAGE_SIZE (often 4096 bytes). | |
501 | ||
502 | ext4 sets the EXT4_VERITY_FL on-disk inode flag on verity files. It | |
503 | can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be cleared. | |
504 | ||
505 | ext4 also supports encryption, which can be used simultaneously with | |
506 | fs-verity. In this case, the plaintext data is verified rather than | |
ed45e201 EB |
507 | the ciphertext. This is necessary in order to make the fs-verity file |
508 | digest meaningful, since every file is encrypted differently. | |
6ff2deb2 EB |
509 | |
510 | ext4 stores the verity metadata (Merkle tree and fsverity_descriptor) | |
511 | past the end of the file, starting at the first 64K boundary beyond | |
512 | i_size. This approach works because (a) verity files are readonly, | |
513 | and (b) pages fully beyond i_size aren't visible to userspace but can | |
514 | be read/written internally by ext4 with only some relatively small | |
515 | changes to ext4. This approach avoids having to depend on the | |
516 | EA_INODE feature and on rearchitecting ext4's xattr support to | |
517 | support paging multi-gigabyte xattrs into memory, and to support | |
518 | encrypting xattrs. Note that the verity metadata *must* be encrypted | |
519 | when the file is, since it contains hashes of the plaintext data. | |
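
The placement rule can be sketched as a round-up of i_size to a 64K boundary. The helper below is hypothetical, and it assumes that a file size already on a 64K boundary is not pushed to the following boundary; the text above does not settle that edge case.

```c
#include <assert.h>
#include <stdint.h>

#define VERITY_METADATA_ALIGN 65536u /* 64K, from the text above */

/*
 * Hypothetical helper: byte offset at which the verity metadata would start
 * for a file of `i_size` bytes. Assumption: an i_size that is already a
 * multiple of 64K is returned unchanged.
 */
static uint64_t verity_metadata_pos(uint64_t i_size)
{
    assert(i_size <= UINT64_MAX - (VERITY_METADATA_ALIGN - 1));

    return (i_size + VERITY_METADATA_ALIGN - 1) /
           VERITY_METADATA_ALIGN * VERITY_METADATA_ALIGN;
}
```

For a 100000-byte file, this places the metadata at offset 131072, leaving a gap of unused space between the end of the data and the start of the Merkle tree.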
520 | ||
521 | Currently, ext4 verity only supports the case where the Merkle tree | |
522 | block size, filesystem block size, and page size are all the same. It | |
523 | also only supports extent-based files. | |
524 | ||
525 | f2fs | |
526 | ---- | |
527 | ||
c0d782a3 | 528 | f2fs supports fs-verity since Linux v5.4 and f2fs-tools v1.11.0. |
6ff2deb2 EB |
529 | |
530 | To create verity files on an f2fs filesystem, the filesystem must have | |
531 | been formatted with ``-O verity``. | |
532 | ||
533 | f2fs sets the FADVISE_VERITY_BIT on-disk inode flag on verity files. | |
534 | It can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be | |
535 | cleared. | |
536 | ||
537 | Like ext4, f2fs stores the verity metadata (Merkle tree and | |
538 | fsverity_descriptor) past the end of the file, starting at the first | |
539 | 64K boundary beyond i_size. See explanation for ext4 above. | |
540 | Moreover, f2fs supports at most 4096 bytes of xattr entries per inode, | |
541 | which wouldn't be enough for even a single Merkle tree block. | |
542 | ||
543 | Currently, f2fs verity only supports a Merkle tree block size of 4096. | |
544 | Also, f2fs doesn't support enabling verity on files that currently | |
545 | have atomic or volatile writes pending. | |
546 | ||
547 | Implementation details | |
548 | ====================== | |
549 | ||
550 | Verifying data | |
551 | -------------- | |
552 | ||
553 | fs-verity ensures that all reads of a verity file's data are verified, | |
554 | regardless of which syscall is used to do the read (e.g. mmap(), | |
555 | read(), pread()) and regardless of whether it's the first read or a | |
556 | later read (unless the later read can return cached data that was | |
557 | already verified). Below, we describe how filesystems implement this. | |
558 | ||
559 | Pagecache | |
560 | ~~~~~~~~~ | |
561 | ||
08830c8b | 562 | For filesystems using Linux's pagecache, the ``->read_folio()`` and |
704528d8 | 563 | ``->readahead()`` methods must be modified to verify pages before they |
6ff2deb2 EB |
564 | are marked Uptodate. Merely hooking ``->read_iter()`` would be |
565 | insufficient, since ``->read_iter()`` is not used for memory maps. | |
566 | ||
567 | Therefore, fs/verity/ provides a function fsverity_verify_page() which | |
568 | verifies a page that has been read into the pagecache of a verity | |
569 | inode, but is still locked and not Uptodate, so it's not yet readable | |
570 | by userspace. As needed to do the verification, | |
571 | fsverity_verify_page() will call back into the filesystem to read | |
572 | Merkle tree pages via fsverity_operations::read_merkle_tree_page(). | |
573 | ||
574 | fsverity_verify_page() returns false if verification failed; in this | |
575 | case, the filesystem must not set the page Uptodate. Following this, | |
576 | as per the usual Linux pagecache behavior, attempts by userspace to | |
577 | read() from the part of the file containing the page will fail with | |
578 | EIO, and accesses to the page within a memory map will raise SIGBUS. | |
579 | ||
580 | fsverity_verify_page() currently only supports the case where the | |
581 | Merkle tree block size is equal to PAGE_SIZE (often 4096 bytes). | |
582 | ||
583 | In principle, fsverity_verify_page() verifies the entire path in the | |
584 | Merkle tree from the data page to the root hash. However, for | |
585 | efficiency the filesystem may cache the hash pages. Therefore, | |
586 | fsverity_verify_page() only ascends the tree reading hash pages until | |
587 | an already-verified hash page is seen, as indicated by the PageChecked | |
588 | bit being set. It then verifies the path to that page. | |
589 | ||
590 | This optimization, which is also used by dm-verity, results in | |
591 | excellent sequential read performance. This is because usually (e.g. | |
592 | 127 in 128 times for 4K blocks and SHA-256) the hash page from the | |
593 | bottom level of the tree will already be cached and checked from | |
594 | reading a previous data page. However, random reads perform worse. | |
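
The "127 in 128" figure can be reproduced with a simplified userspace model. The sketch below is not kernel code: it assumes a cache that remembers only the most recently used bottom-level hash block, and counts how often a sequential read has to fetch a new one. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model (not kernel code): count how many bottom-level hash
 * blocks must be loaded when data blocks 0..num_data_blocks-1 are read in
 * order, assuming only the most recently used hash block stays cached.
 */
static uint64_t leaf_hash_block_loads(uint64_t num_data_blocks,
                                      uint32_t hashes_per_block)
{
    uint64_t i, loads = 0, cached = UINT64_MAX; /* UINT64_MAX: nothing cached */

    assert(hashes_per_block > 0);

    for (i = 0; i < num_data_blocks; i++) {
        /* The hash of data block i lives in this bottom-level block. */
        uint64_t hash_block = i / hashes_per_block;

        if (hash_block != cached) {
            loads++;
            cached = hash_block;
        }
    }
    return loads;
}
```

With 128 hashes per block, reading 1280 data blocks sequentially loads only 10 hash blocks in this model, so 127 of every 128 data blocks find their hash block already cached. A random access pattern would defeat this single-entry cache far more often.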
595 | ||
596 | Block device based filesystems | |
597 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
598 | ||
599 | Block device based filesystems (e.g. ext4 and f2fs) in Linux also use | |
600 | the pagecache, so the above subsection applies too. However, they | |
601 | also usually read many pages from a file at once, grouped into a | |
602 | structure called a "bio". To make it easier for these types of | |
603 | filesystems to support fs-verity, fs/verity/ also provides a function | |
604 | fsverity_verify_bio() which verifies all pages in a bio. | |
605 | ||
606 | ext4 and f2fs also support encryption. If a verity file is also | |
607 | encrypted, the pages must be decrypted before being verified. To | |
608 | support this, these filesystems allocate a "post-read context" for | |
609 | each bio and store it in ``->bi_private``:: | |
610 | ||
611 | struct bio_post_read_ctx { | |
612 | struct bio *bio; | |
613 | struct work_struct work; | |
614 | unsigned int cur_step; | |
615 | unsigned int enabled_steps; | |
616 | }; | |
617 | ||
618 | ``enabled_steps`` is a bitmask that specifies whether decryption, | |
619 | verity, or both are enabled. After the bio completes, for each needed | |
620 | postprocessing step the filesystem enqueues the bio_post_read_ctx on a | |
621 | workqueue, and then the workqueue work does the decryption or | |
622 | verification. Finally, pages where no decryption or verity error | |
623 | occurred are marked Uptodate, and the pages are unlocked. | |
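
The way ``cur_step`` and ``enabled_steps`` interact can be sketched as a small state machine. The following is a simplified userspace illustration, not the actual ext4 or f2fs code: the step names, their numbering, and the ``post_read_ctx_sketch`` type are assumptions made for the example. It only shows how a bitmask can select which steps run, with decryption ordered before verification:

```c
/* Hypothetical step numbering; each filesystem defines its own. */
enum post_read_step {
	STEP_INITIAL = 0,
	STEP_DECRYPT,
	STEP_VERITY,
	STEP_DONE,
};

/* Reduced stand-in for the post-read context (no bio or work item). */
struct post_read_ctx_sketch {
	unsigned int cur_step;
	unsigned int enabled_steps;	/* bitmask of (1 << STEP_*) */
};

/*
 * Advance to the next enabled step, skipping disabled ones.  Returns
 * the step to run next, or STEP_DONE once no steps remain, at which
 * point the pages would be marked Uptodate (if no error occurred) and
 * unlocked.
 */
static unsigned int next_post_read_step(struct post_read_ctx_sketch *ctx)
{
	while (++ctx->cur_step < STEP_DONE) {
		if (ctx->enabled_steps & (1U << ctx->cur_step))
			return ctx->cur_step;
	}
	return STEP_DONE;
}
```

In a real filesystem each returned step would be queued as workqueue work rather than run inline; the sketch omits that.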
624 | ||
704528d8 | 625 | Files on ext4 and f2fs may contain holes. Normally, ``->readahead()`` |
6ff2deb2 EB |
626 | simply zeroes holes and sets the corresponding pages Uptodate; no bios |
627 | are issued. To prevent this case from bypassing fs-verity, these | |
628 | filesystems use fsverity_verify_page() to verify hole pages. | |
629 | ||
630 | ext4 and f2fs disable direct I/O on verity files, since otherwise | |
631 | direct I/O would bypass fs-verity. (They also do the same for | |
632 | encrypted files.) | |
633 | ||
634 | Userspace utility | |
635 | ================= | |
636 | ||
637 | This document focuses on the kernel, but a userspace utility for | |
638 | fs-verity can be found at: | |
639 | ||
640 | https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git | |
641 | ||
642 | See the README.md file in the fsverity-utils source tree for details, | |
643 | including examples of setting up fs-verity protected files. | |
644 | ||
645 | Tests | |
646 | ===== | |
647 | ||
648 | To test fs-verity, use xfstests. For example, using `kvm-xfstests | |
649 | <https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md>`_:: | |
650 | ||
651 | kvm-xfstests -c ext4,f2fs -g verity | |
652 | ||
653 | FAQ | |
654 | === | |
655 | ||
656 | This section answers frequently asked questions about fs-verity that | |
657 | weren't already directly answered in other parts of this document. | |
658 | ||
659 | :Q: Why isn't fs-verity part of IMA? | |
660 | :A: fs-verity and IMA (Integrity Measurement Architecture) have | |
661 | different focuses. fs-verity is a filesystem-level mechanism for | |
662 | hashing individual files using a Merkle tree. In contrast, IMA | |
663 | defines a system-wide policy that specifies which files are | |
664 | hashed and what to do with those hashes, such as log them, | |
665 | authenticate them, or add them to a measurement list. | |
666 | ||
02ee2316 MZ |
667 | IMA supports the fs-verity hashing mechanism as an alternative |
668 | to full file hashes, for those who want the performance and | |
669 | security benefits of the Merkle tree based hash. However, it | |
670 | doesn't make sense to force all uses of fs-verity to be through | |
671 | IMA. fs-verity already meets many users' needs even as a | |
672 | standalone filesystem feature, and it's testable like other | |
6ff2deb2 EB |
673 | filesystem features, e.g. with xfstests. | |
674 | ||
675 | :Q: Isn't fs-verity useless because the attacker can just modify the | |
676 | hashes in the Merkle tree, which is stored on-disk? | |
677 | :A: To verify the authenticity of an fs-verity file you must verify | |
ed45e201 EB |
678 | the authenticity of the "fs-verity file digest", which |
679 | incorporates the root hash of the Merkle tree. See `Use cases`_. | |
6ff2deb2 EB |
680 | |
681 | :Q: Isn't fs-verity useless because the attacker can just replace a | |
682 | verity file with a non-verity one? | |
683 | :A: See `Use cases`_. In the initial use case, it's really trusted | |
684 | userspace code that authenticates the files; fs-verity is just a | |
685 | tool to do this job efficiently and securely. The trusted | |
686 | userspace code will consider non-verity files to be inauthentic. | |
687 | ||
688 | :Q: Why does the Merkle tree need to be stored on-disk? Couldn't you | |
689 | store just the root hash? | |
690 | :A: If the Merkle tree weren't stored on-disk, then you'd have to | |
691 | compute the entire tree when the file is first accessed, even if | |
692 | just one byte is being read. This is a fundamental consequence of | |
693 | how Merkle tree hashing works. To verify a leaf node, you need to | |
694 | verify the whole path to the root hash, including the root node | |
695 | (the thing which the root hash is a hash of). But if the root | |
696 | node isn't stored on-disk, you have to compute it by hashing its | |
697 | children, and so on until you've actually hashed the entire file. | |
698 | ||
699 | That defeats most of the point of doing a Merkle tree-based hash, | |
700 | since if you have to hash the whole file ahead of time anyway, | |
701 | then you could simply do sha256(file) instead. That would be much | |
702 | simpler, and a bit faster too. | |
703 | ||
704 | It's true that an in-memory Merkle tree could still provide the | |
705 | advantage of verification on every read rather than just on the | |
706 | first read. However, it would be inefficient because every time a | |
707 | hash page gets evicted (you can't pin the entire Merkle tree into | |
708 | memory, since it may be very large), in order to restore it you | |
709 | again need to hash everything below it in the tree. This again | |
710 | defeats most of the point of doing a Merkle tree-based hash, since | |
711 | a single block read could trigger re-hashing gigabytes of data. | |
712 | ||
713 | :Q: But couldn't you store just the leaf nodes and compute the rest? | |
714 | :A: See previous answer; this really just moves up one level, since | |
715 | one could alternatively interpret the data blocks as being the | |
716 | leaf nodes of the Merkle tree. It's true that the tree can be | |
717 | computed much faster if the leaf level is stored rather than just | |
718 | the data, but that's only because each level is less than 1% the | |
719 | size of the level below (assuming the recommended settings of | |
720 | SHA-256 and 4K blocks). For the exact same reason, by storing | |
721 | "just the leaf nodes" you'd already be storing over 99% of the | |
722 | tree, so you might as well simply store the whole tree. | |
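
The "less than 1%" and "over 99%" figures can be checked with a short calculation. The sketch below counts Merkle tree blocks level by level for a given number of data blocks; the function name and signature are hypothetical, and it assumes every level is padded to whole blocks:

```c
#include <stdint.h>

/*
 * Count the hash blocks in a Merkle tree covering num_data_blocks data
 * blocks, where each hash block holds `fanout` hashes (fanout >= 2).
 * Stores the size of the bottom hash level in *bottom_level_blocks and
 * returns the total number of hash blocks across all levels.
 * A file of at most one block needs no hash blocks at all.
 */
static uint64_t merkle_tree_blocks(uint64_t num_data_blocks, uint32_t fanout,
				   uint64_t *bottom_level_blocks)
{
	uint64_t total = 0;
	uint64_t level = num_data_blocks;

	*bottom_level_blocks = 0;
	while (level > 1) {
		/* Each level needs one hash per block of the level below. */
		level = (level + fanout - 1) / fanout;
		if (*bottom_level_blocks == 0)
			*bottom_level_blocks = level;
		total += level;
	}
	return total;
}
```

For a 1 GiB file with 4K blocks and SHA-256 (262144 data blocks, fanout 128), the levels contain 2048, 16, and 1 blocks. The bottom level is 2048 / 262144, about 0.78% of the data size, and makes up 2048 of the 2065 tree blocks, about 99.2% of the tree.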
723 | ||
724 | :Q: Can the Merkle tree be built ahead of time, e.g. distributed as | |
725 | part of a package that is installed to many computers? | |
726 | :A: This isn't currently supported. It was part of the original | |
727 | design, but was removed to simplify the kernel UAPI and because it | |
728 | wasn't a critical use case. Files are usually installed once and | |
729 | used many times, and cryptographic hashing is somewhat fast on | |
730 | most modern processors. | |
731 | ||
732 | :Q: Why doesn't fs-verity support writes? | |
733 | :A: Write support would be very difficult and would require a | |
734 | completely different design, so it's well outside the scope of | |
735 | fs-verity. Write support would require: | |
736 | ||
737 | - A way to maintain consistency between the data and hashes, | |
738 | including all levels of hashes, since corruption after a crash | |
739 | (especially of potentially the entire file!) is unacceptable. | |
740 | The main options for solving this are data journalling, | |
741 | copy-on-write, and log-structured volumes. But it's very hard to | |
742 | retrofit existing filesystems with new consistency mechanisms. | |
743 | Data journalling is available on ext4, but is very slow. | |
744 | ||
59bc120e | 745 | - Rebuilding the Merkle tree after every write, which would be |
6ff2deb2 EB |
746 | extremely inefficient. Alternatively, a different authenticated |
747 | dictionary structure such as an "authenticated skiplist" could | |
748 | be used. However, this would be far more complex. | |
749 | ||
750 | Compare it to dm-verity vs. dm-integrity. dm-verity is very | |
751 | simple: the kernel just verifies read-only data against a | |
752 | read-only Merkle tree. In contrast, dm-integrity supports writes | |
753 | but is slow, is much more complex, and doesn't actually support | |
754 | full-device authentication since it authenticates each sector | |
755 | independently, i.e. there is no "root hash". It doesn't really | |
756 | make sense for the same device-mapper target to support these two | |
757 | very different cases; the same applies to fs-verity. | |
758 | ||
759 | :Q: Since verity files are immutable, why isn't the immutable bit set? | |
760 | :A: The existing "immutable" bit (FS_IMMUTABLE_FL) already has a | |
761 | specific set of semantics which not only make the file contents | |
762 | read-only, but also prevent the file from being deleted, renamed, | |
763 | linked to, or having its owner or mode changed. These extra | |
764 | properties are unwanted for fs-verity, so reusing the immutable | |
765 | bit isn't appropriate. | |
766 | ||
767 | :Q: Why does the API use ioctls instead of setxattr() and getxattr()? | |
768 | :A: Abusing the xattr interface for basically arbitrary syscalls is | |
769 | heavily frowned upon by most of the Linux filesystem developers. | |
770 | An xattr should really just be an xattr on-disk, not an API to | |
771 | e.g. magically trigger construction of a Merkle tree. | |
772 | ||
773 | :Q: Does fs-verity support remote filesystems? | |
774 | :A: Only ext4 and f2fs support is implemented currently, but in | |
775 | principle any filesystem that can store per-file verity metadata | |
776 | can support fs-verity, regardless of whether it's local or remote. | |
777 | Some filesystems may have fewer options of where to store the | |
778 | verity metadata; one possibility is to store it past the end of | |
779 | the file and "hide" it from userspace by manipulating i_size. The | |
780 | data verification functions provided by ``fs/verity/`` also assume | |
781 | that the filesystem uses the Linux pagecache, but both local and | |
782 | remote filesystems normally do so. | |
783 | ||
784 | :Q: Why is anything filesystem-specific at all? Shouldn't fs-verity | |
785 | be implemented entirely at the VFS level? | |
786 | :A: There are many reasons why this is not possible or would be very | |
787 | difficult, including the following: | |
788 | ||
789 | - To prevent bypassing verification, pages must not be marked | |
790 | Uptodate until they've been verified. Currently, each | |
791 | filesystem is responsible for marking pages Uptodate via | |
704528d8 | 792 | ``->readahead()``. Therefore, currently it's not possible for |
6ff2deb2 EB |
793 | the VFS to do the verification on its own. Changing this would |
794 | require significant changes to the VFS and all filesystems. | |
795 | ||
796 | - It would require defining a filesystem-independent way to store | |
797 | the verity metadata. Extended attributes don't work for this | |
798 | because (a) the Merkle tree may be gigabytes, but many | |
799 | filesystems assume that all xattrs fit into a single 4K | |
800 | filesystem block, and (b) ext4 and f2fs encryption doesn't | |
801 | encrypt xattrs, yet the Merkle tree *must* be encrypted when the | |
802 | file contents are, because it stores hashes of the plaintext | |
803 | file contents. | |
804 | ||
805 | So the verity metadata would have to be stored in an actual | |
806 | file. Using a separate file would be very ugly, since the | |
807 | metadata is fundamentally part of the file to be protected, and | |
808 | it could cause problems where users could delete the real file | |
809 | but not the metadata file or vice versa. On the other hand, | |
810 | having it be in the same file would break applications unless | |
811 | filesystems' notion of i_size were divorced from the VFS's, | |
812 | which would be complex and require changes to all filesystems. | |
813 | ||
814 | - It's desirable that FS_IOC_ENABLE_VERITY uses the filesystem's | |
815 | transaction mechanism so that either the file ends up with | |
816 | verity enabled, or no changes were made. Allowing intermediate | |
817 | states to occur after a crash may cause problems. |