/*
 * Copyright (C) 2016 Oracle. All Rights Reserved.
 *
 * Author: Darrick J. Wong <darrick.wong@oracle.com>
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version 2
 * of the License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it would be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
 */
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_log_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_defer.h"
#include "xfs_da_format.h"
#include "xfs_da_btree.h"
#include "xfs_inode.h"
#include "xfs_trans.h"
#include "xfs_inode_item.h"
#include "xfs_bmap.h"
#include "xfs_bmap_util.h"
#include "xfs_error.h"
#include "xfs_dir2.h"
#include "xfs_dir2_priv.h"
#include "xfs_ioctl.h"
#include "xfs_trace.h"
#include "xfs_log.h"
#include "xfs_icache.h"
#include "xfs_pnfs.h"
#include "xfs_refcount_btree.h"
#include "xfs_refcount.h"
#include "xfs_bmap_btree.h"
#include "xfs_trans_space.h"
#include "xfs_bit.h"
#include "xfs_alloc.h"
#include "xfs_quota_defs.h"
#include "xfs_quota.h"
#include "xfs_btree.h"
#include "xfs_reflink.h"

/*
 * Copy on Write of Shared Blocks
 *
 * XFS must preserve "the usual" file semantics even when two files share
 * the same physical blocks. This means that a write to one file must not
 * alter the blocks in a different file; we achieve that with a
 * copy-on-write mechanism. At a high level, when we want to write to a
 * shared block, we allocate a new block, write the data to the new block,
 * and, if that succeeds, map the new block into the file.
 *
 * XFS provides a "delayed allocation" mechanism that defers the allocation
 * of disk blocks to dirty-but-not-yet-mapped file blocks as long as
 * possible. This reduces fragmentation by enabling the filesystem to ask
 * for bigger chunks less often, which is exactly what we want for CoW.
 *
 * The delalloc mechanism begins when the kernel wants to make a block
 * writable (write_begin or page_mkwrite). If the offset is not mapped, we
 * create a delalloc mapping, which is a regular in-core extent, but without
 * a real startblock. (For delalloc mappings, the startblock encodes both
 * a flag that this is a delalloc mapping, and a worst-case estimate of how
 * many blocks might be required to put the mapping into the BMBT.) Delalloc
 * mappings are a reservation against the free space in the filesystem;
 * adjacent mappings can also be combined into fewer, larger mappings.
 *
 * When dirty pages are being written out (typically in writepage), the
 * delalloc reservations are converted into real mappings by allocating
 * blocks and replacing the delalloc mappings with real ones. A single
 * delalloc mapping can be replaced by several real ones if the free space
 * is fragmented.
 *
 * We want to adapt the delalloc mechanism for copy-on-write, since the
 * write paths are similar. The first two steps (creating the reservation
 * and allocating the blocks) are exactly the same as delalloc, except that
 * the mappings must be stored in a separate CoW fork because we do not want
 * to disturb the mapping in the data fork until we're sure that the write
 * succeeded. IO completion in this case is the process of removing the old
 * mapping from the data fork and moving the new mapping from the CoW fork to
 * the data fork. This will be discussed shortly.
 *
 * For now, unaligned directio writes will be bounced back to the page cache.
 * Block-aligned directio writes will use the same mechanism as buffered
 * writes.
 *
 * CoW remapping must be done after the data block write completes,
 * because we don't want to destroy the old data fork map until we're sure
 * the new block has been written. Since the new mappings are kept in a
 * separate fork, we can simply iterate these mappings to find the ones
 * that cover the file blocks that we just CoW'd. For each extent, we
 * unmap the corresponding range in the data fork, map the new range into
 * the data fork, and remove the extent from the CoW fork.
 *
 * Since the remapping operation can be applied to an arbitrary file
 * range, we record the need for the remap step as a flag in the ioend
 * instead of declaring a new IO type. This is required for direct io
 * because we only have one ioend for the whole dio, and we must be able
 * to record the presence of both unwritten blocks and CoW blocks in a
 * single ioend structure. In general, the more ground we can cover with
 * one ioend, the better.
 */
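The reserve / allocate / write / remap-at-completion sequence described in the comment above can be illustrated with a small user-space model. This is a hedged sketch, not XFS code. `struct toy_inode`, `cow_reserve()`, `cow_allocate()`, `cow_write()`, `cow_remap()`, `toy_read()`, `cow_demo()`, the `DELALLOC` sentinel and the bump allocator are hypothetical names invented here. Each "block" holds one character. Real delalloc accounting, BMBT updates, transactions, refcounts and locking are all omitted.

```c
#include <assert.h>
#include <string.h>

/* Toy model of CoW with a separate CoW fork; hypothetical, not kernel code. */
#define NBLOCKS  16      /* size of the toy "disk", in blocks */
#define FILE_LEN 4       /* every toy file is 4 blocks long */
#define MAX_COW  4       /* capacity of the toy CoW fork */
#define DELALLOC (-2L)   /* hypothetical marker: reserved, no real block yet */

static char disk[NBLOCKS];   /* one character of payload per block */
static long next_free = 8;   /* bump allocator: blocks 8..15 start out free */

struct cow_ext {
	long off, len;   /* file range covered, in blocks */
	long start;      /* physical start block, or DELALLOC */
	int used;        /* is this CoW fork slot in use? */
};

struct toy_inode {
	long data_fork[FILE_LEN];      /* file block -> physical block */
	struct cow_ext cow[MAX_COW];   /* pending CoW mappings */
};

/* write_begin/page_mkwrite: reserve a delalloc extent in the CoW fork. */
static int cow_reserve(struct toy_inode *ip, long off, long len)
{
	for (int i = 0; i < MAX_COW; i++) {
		if (!ip->cow[i].used) {
			ip->cow[i] = (struct cow_ext){ off, len, DELALLOC, 1 };
			return 0;
		}
	}
	return -1;   /* toy CoW fork is full */
}

/* Writeback: convert delalloc reservations into real blocks. */
static void cow_allocate(struct toy_inode *ip)
{
	for (int i = 0; i < MAX_COW; i++) {
		struct cow_ext *e = &ip->cow[i];
		if (e->used && e->start == DELALLOC) {
			assert(next_free + e->len <= NBLOCKS);
			e->start = next_free;
			next_free += e->len;
		}
	}
}

/* Data write: lands in the new CoW block, never in the shared block. */
static void cow_write(struct toy_inode *ip, long off, char c)
{
	for (int i = 0; i < MAX_COW; i++) {
		struct cow_ext *e = &ip->cow[i];
		if (e->used && e->start != DELALLOC &&
		    off >= e->off && off < e->off + e->len)
			disk[e->start + (off - e->off)] = c;
	}
}

/*
 * I/O completion: for each allocated CoW extent lying inside
 * [off, off + len), point the data fork at the new blocks and drop the
 * extent from the CoW fork. For brevity, partial overlaps are skipped.
 */
static void cow_remap(struct toy_inode *ip, long off, long len)
{
	for (int i = 0; i < MAX_COW; i++) {
		struct cow_ext *e = &ip->cow[i];
		if (!e->used || e->start == DELALLOC)
			continue;
		if (e->off < off || e->off + e->len > off + len)
			continue;
		for (long b = 0; b < e->len; b++)
			ip->data_fork[e->off + b] = e->start + b;
		e->used = 0;
	}
}

/* Read a whole file through its data fork into a NUL-terminated buffer. */
static void toy_read(const struct toy_inode *ip, char out[FILE_LEN + 1])
{
	for (int i = 0; i < FILE_LEN; i++)
		out[i] = disk[ip->data_fork[i]];
	out[FILE_LEN] = '\0';
}

/* Two files share blocks 0..3; overwrite blocks 1..2 of file a via CoW. */
void cow_demo(char out_a[FILE_LEN + 1], char out_b[FILE_LEN + 1])
{
	struct toy_inode a = { .data_fork = { 0, 1, 2, 3 } };
	struct toy_inode b = { .data_fork = { 0, 1, 2, 3 } };

	memcpy(disk, "ABCD", 4);
	next_free = 8;
	(void)cow_reserve(&a, 1, 2);
	cow_allocate(&a);
	cow_write(&a, 1, 'x');
	cow_write(&a, 2, 'y');
	cow_remap(&a, 1, 2);
	toy_read(&a, out_a);
	toy_read(&b, out_b);
}
```

Calling `cow_demo(a, b)` with two 5-byte buffers leaves `a` as "AxyD" and `b` as "ABCD". File a now maps its blocks 1 and 2 to the newly allocated blocks 8 and 9, while file b still reads the original shared data. The pending mappings stay in the separate `cow` array until `cow_remap()` runs, so the data fork is never touched before the new data has been written.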