xfs: introduce the CoW fork
/*
 * Copyright (C) 2016 Oracle. All Rights Reserved.
 *
 * Author: Darrick J. Wong <darrick.wong@oracle.com>
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version 2
 * of the License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it would be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
 */
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_log_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_defer.h"
#include "xfs_da_format.h"
#include "xfs_da_btree.h"
#include "xfs_inode.h"
#include "xfs_trans.h"
#include "xfs_inode_item.h"
#include "xfs_bmap.h"
#include "xfs_bmap_util.h"
#include "xfs_error.h"
#include "xfs_dir2.h"
#include "xfs_dir2_priv.h"
#include "xfs_ioctl.h"
#include "xfs_trace.h"
#include "xfs_log.h"
#include "xfs_icache.h"
#include "xfs_pnfs.h"
#include "xfs_refcount_btree.h"
#include "xfs_refcount.h"
#include "xfs_bmap_btree.h"
#include "xfs_trans_space.h"
#include "xfs_bit.h"
#include "xfs_alloc.h"
#include "xfs_quota_defs.h"
#include "xfs_quota.h"
#include "xfs_btree.h"
#include "xfs_reflink.h"

/*
 * Copy on Write of Shared Blocks
 *
 * XFS must preserve "the usual" file semantics even when two files share
 * the same physical blocks. This means that a write to one file must not
 * alter the blocks in a different file; we achieve this through a
 * copy-on-write mechanism. At a high level, when we want to write to a
 * shared block, we allocate a new block, write the data to the new block,
 * and, if that succeeds, map the new block into the file.
 *
 * XFS provides a "delayed allocation" mechanism that defers the allocation
 * of disk blocks to dirty-but-not-yet-mapped file blocks as long as
 * possible. This reduces fragmentation by enabling the filesystem to ask
 * for bigger chunks less often, which is exactly what we want for CoW.
 *
 * The delalloc mechanism begins when the kernel wants to make a block
 * writable (write_begin or page_mkwrite). If the offset is not mapped, we
 * create a delalloc mapping, which is a regular in-core extent but without
 * a real startblock. (For delalloc mappings, the startblock encodes both
 * a flag marking this as a delalloc mapping and a worst-case estimate of
 * how many blocks might be required to put the mapping into the BMBT.)
 * Delalloc mappings are a reservation against the free space in the
 * filesystem; adjacent mappings can also be combined into fewer, larger
 * mappings.
 *
 * When dirty pages are being written out (typically in writepage), the
 * delalloc reservations are converted into real mappings by allocating
 * blocks and replacing the delalloc mappings with real ones. A single
 * delalloc mapping can be replaced by several real ones if the free space
 * is fragmented.
 *
 * We want to adapt the delalloc mechanism for copy-on-write, since the
 * write paths are similar. The first two steps (creating the reservation
 * and allocating the blocks) are exactly the same as delalloc, except that
 * the mappings must be stored in a separate CoW fork because we do not want
 * to disturb the mapping in the data fork until we're sure that the write
 * succeeded. IO completion in this case is the process of removing the old
 * mapping from the data fork and moving the new mapping from the CoW fork to
 * the data fork. This will be discussed shortly.
 *
 * For now, unaligned directio writes will be bounced back to the page cache.
 * Block-aligned directio writes will use the same mechanism as buffered
 * writes.
 *
 * CoW remapping must be done after the data block write completes,
 * because we don't want to destroy the old data fork map until we're sure
 * the new block has been written. Since the new mappings are kept in a
 * separate fork, we can simply iterate over these mappings to find the ones
 * that cover the file blocks that we just CoW'd. For each extent, we
 * unmap the corresponding range in the data fork, map the new range into
 * the data fork, and remove the extent from the CoW fork.
 *
 * Since the remapping operation can be applied to an arbitrary file
 * range, we record the need for the remap step as a flag in the ioend
 * instead of declaring a new IO type. This is required for direct IO
 * because we only have one ioend for the whole dio, and we have to be able
 * to remember the presence of both unwritten blocks and CoW blocks with a
 * single ioend structure. Moreover, the more ground we can cover with one
 * ioend, the better.
 */
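The opening paragraph of the comment describes copy-on-write at a high level: allocate a new block, write to it, and only then map it into the file. The following is a minimal user-space sketch of that ordering. It assumes a toy in-memory "disk" with a per-block reference count standing in for shared-extent bookkeeping; `struct toy_disk`, `toy_alloc()` and `toy_cow_write()` are hypothetical names, not XFS functions, and real XFS works on extents rather than single blocks.

```c
#include <assert.h>
#include <string.h>

#define BLKSZ	8	/* hypothetical tiny block size, in bytes */
#define NDISK	8	/* hypothetical number of disk blocks */

/* Hypothetical toy disk: block contents plus a per-block refcount. */
struct toy_disk {
	char		blk[NDISK][BLKSZ];
	unsigned int	refcount[NDISK];	/* 0 means the block is free */
};

/* Allocate the first free block; returns -1 if the disk is full. */
static int toy_alloc(struct toy_disk *d)
{
	for (int i = 0; i < NDISK; i++) {
		if (d->refcount[i] == 0) {
			d->refcount[i] = 1;
			return i;
		}
	}
	return -1;
}

/*
 * Write BLKSZ bytes from @buf to the block that *@map points at. An
 * unshared block is overwritten in place. For a shared block, allocate a
 * new block, write the data there, and only then switch *@map and drop
 * the old reference, so the other owner still sees the original data.
 * Returns 0 on success, or -1 (leaving *@map untouched) if allocation fails.
 */
static int toy_cow_write(struct toy_disk *d, int *map, const char *buf)
{
	int old = *map;
	int new;

	assert(old >= 0 && old < NDISK && d->refcount[old] > 0);
	if (d->refcount[old] == 1) {
		memcpy(d->blk[old], buf, BLKSZ);
		return 0;
	}

	new = toy_alloc(d);
	if (new < 0)
		return -1;
	memcpy(d->blk[new], buf, BLKSZ);	/* 1: write the new block */
	*map = new;				/* 2: map it into the file */
	d->refcount[old]--;			/* 3: release the shared block */
	return 0;
}
```

Because the new block is written before `*map` is switched, a failed allocation leaves the file's mapping exactly as it was, which is the ordering the comment insists on.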
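The delalloc paragraph of the comment notes that a delalloc mapping's startblock encodes both a "this is delalloc" marker and a worst-case estimate of the BMBT blocks the mapping may need. One way to pack both into a single 64-bit value is sketched below. The bit layout and the helper names are illustrative assumptions; XFS has its own helpers for this (for example `isnullstartblock()` and `startblockval()`), whose exact layout may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative layout (an assumption, not the XFS definition): the low
 * DELALLOC_VALBITS bits hold the worst-case number of BMBT blocks
 * reserved for the mapping, and every higher bit is set to mark the
 * startblock as "delalloc". Real disk addresses are assumed never to be
 * large enough to collide with this marker.
 */
#define DELALLOC_VALBITS	17
#define DELALLOC_MASK		(~(uint64_t)0 << DELALLOC_VALBITS)

/* Encode a delalloc startblock carrying an indirect-block estimate. */
static uint64_t delalloc_startblock(uint64_t indlen)
{
	assert(indlen < ((uint64_t)1 << DELALLOC_VALBITS));
	return DELALLOC_MASK | indlen;
}

/* Does this startblock describe a delalloc reservation? */
static bool is_delalloc(uint64_t startblock)
{
	return (startblock & DELALLOC_MASK) == DELALLOC_MASK;
}

/* Recover the worst-case indirect-block estimate. */
static uint64_t delalloc_indlen(uint64_t startblock)
{
	return startblock & ~DELALLOC_MASK;
}
```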
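The delalloc paragraph in the comment also says that adjacent delalloc mappings can be combined into fewer, larger ones. A hedged sketch of that step follows, assuming a hypothetical `struct dres` that tracks only a file range and its reserved indirect-block count, and treating "starts exactly where the other ends" as the only merge condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-core delalloc reservation. */
struct dres {
	uint64_t	startoff;	/* first file block covered */
	uint64_t	blockcount;	/* number of file blocks covered */
	uint64_t	indlen;		/* worst-case BMBT blocks reserved */
};

/*
 * Fold @right into @left if @right starts exactly where @left ends in
 * the file. The merged reservation keeps the sum of both indirect-block
 * estimates. Returns true if the two mappings were combined.
 */
static bool dres_try_merge(struct dres *left, const struct dres *right)
{
	assert(left->blockcount > 0 && right->blockcount > 0);
	if (left->startoff + left->blockcount != right->startoff)
		return false;
	left->blockcount += right->blockcount;
	left->indlen += right->indlen;
	return true;
}
```

Summing the two estimates is the conservative choice for a sketch; an implementation could instead recompute a tighter worst-case estimate for the combined length and release the difference back to free space.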
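The writeback paragraph of the comment points out that one delalloc reservation may become several real mappings when free space is fragmented. The sketch below shows this with a deliberately naive first-fit allocator over a hypothetical array of free extents. `convert_delalloc()`, `struct mapping` and `struct free_extent` are illustrative names, and real XFS allocation policy is far more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A mapping of file blocks to disk blocks. */
struct mapping {
	uint64_t	startoff;	/* file offset, in blocks */
	uint64_t	startblock;	/* disk block */
	uint64_t	blockcount;	/* length, in blocks */
};

/* A hypothetical free-space extent, consumed from its start. */
struct free_extent {
	uint64_t	start;
	uint64_t	len;
};

/*
 * Convert a delalloc reservation covering [@off, @off + @len) into real
 * mappings, carving blocks from the free extents in array order. Returns
 * the number of mappings written to @out, or 0 (with nothing consumed)
 * if there is not enough free space or room in @out.
 */
static size_t convert_delalloc(uint64_t off, uint64_t len,
			       struct free_extent *fs, size_t nfs,
			       struct mapping *out, size_t maxout)
{
	uint64_t need = len;
	size_t nout = 0;

	assert(len > 0);

	/* First pass: check that the request fits before touching anything. */
	for (size_t i = 0; i < nfs && need > 0; i++) {
		if (fs[i].len == 0)
			continue;
		need -= fs[i].len < need ? fs[i].len : need;
		nout++;
	}
	if (need > 0 || nout > maxout)
		return 0;

	/* Second pass: carve out the blocks and record each real mapping. */
	nout = 0;
	for (size_t i = 0; i < nfs && len > 0; i++) {
		uint64_t take = fs[i].len < len ? fs[i].len : len;

		if (take == 0)
			continue;
		out[nout++] = (struct mapping){ off, fs[i].start, take };
		fs[i].start += take;
		fs[i].len -= take;
		off += take;
		len -= take;
	}
	return nout;
}
```

With free extents of 3 and 10 blocks, an 8-block reservation becomes two real mappings (3 blocks, then 5), matching the comment's "several real ones" case.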
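The comment describes CoW IO completion as moving each staged mapping from the CoW fork into the data fork once the write has succeeded, then removing it from the CoW fork. A minimal sketch of that remap step, assuming a hypothetical per-block `struct toy_inode` with one array per fork; XFS itself keeps extents rather than per-block entries and performs the remap inside transactions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBLOCKS	16	/* hypothetical file size, in blocks */
#define HOLE	0	/* hypothetical "no mapping" marker */

/* Hypothetical per-block model of an inode's two forks. */
struct toy_inode {
	uint64_t	data[NBLOCKS];	/* data fork: file block -> disk block */
	uint64_t	cow[NBLOCKS];	/* CoW fork: staged replacement blocks */
};

/*
 * Called after the write to the CoW blocks covering [@off, @off + @len)
 * has completed: replace each data fork entry that has a staged CoW
 * block and clear the CoW fork entry. Returns the number of blocks
 * remapped. (Releasing the old, formerly shared block is not modeled.)
 */
static size_t cow_end_io(struct toy_inode *ip, size_t off, size_t len)
{
	size_t n = 0;

	assert(off <= NBLOCKS && len <= NBLOCKS - off);
	for (size_t b = off; b < off + len; b++) {
		if (ip->cow[b] == HOLE)
			continue;		/* this block was not CoW'd */
		ip->data[b] = ip->cow[b];	/* unmap old, map new */
		ip->cow[b] = HOLE;		/* remove from the CoW fork */
		n++;
	}
	return n;
}
```

Blocks with no CoW fork entry are skipped, reflecting the comment's point that completion only needs to visit the CoW fork mappings that overlap the range just written.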
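The direct IO paragraph of the comment separates block-aligned writes, which use the same mechanism as buffered writes, from unaligned ones, which are bounced back to the page cache. A sketch of that decision, assuming a power-of-two block size; `dio_write_is_block_aligned()`, `choose_cow_write_path()` and `enum cow_write_path` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes for a direct write that hits shared blocks. */
enum cow_write_path {
	COW_PATH_BUFFERED,	/* bounce the write back to the page cache */
	COW_PATH_DIRECT,	/* same CoW mechanism as buffered writes */
};

/* True if both the offset and the length are multiples of @blksize. */
static bool dio_write_is_block_aligned(uint64_t pos, uint64_t count,
				       uint32_t blksize)
{
	uint64_t mask = (uint64_t)blksize - 1;

	assert(blksize != 0 && (blksize & mask) == 0);	/* power of two */
	return ((pos | count) & mask) == 0;
}

/* Pick the write path for a direct write, per the alignment rule. */
static enum cow_write_path choose_cow_write_path(uint64_t pos, uint64_t count,
						 uint32_t blksize)
{
	return dio_write_is_block_aligned(pos, count, blksize) ?
		COW_PATH_DIRECT : COW_PATH_BUFFERED;
}
```

OR-ing the offset and length lets a single mask test check both for alignment, since any low bit set in either value makes the write unaligned.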
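The last paragraph of the comment explains why the need for a CoW remap is recorded as an ioend flag rather than as a separate IO type: a single direct IO ioend may cover both unwritten and CoW blocks. A sketch of that design choice follows, with hypothetical flag names and counters standing in for the real completion work.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ioend flags; the names are illustrative, not XFS's. */
#define IOEND_UNWRITTEN	(1U << 0)	/* range covers unwritten extents */
#define IOEND_COW	(1U << 1)	/* range covers CoW fork extents */

/* Hypothetical completion record for one contiguous file range. */
struct toy_ioend {
	uint64_t	offset;		/* byte offset of the IO */
	uint64_t	size;		/* byte length of the IO */
	unsigned int	flags;		/* IOEND_* completion work needed */
};

/* Counters standing in for the real completion work. */
struct completion_stats {
	unsigned int	remaps;		/* CoW fork -> data fork moves */
	unsigned int	conversions;	/* unwritten -> written conversions */
};

/*
 * Because CoW is a flag rather than a distinct IO type, one ioend (for
 * example, the single ioend of a direct write) can request both kinds
 * of completion work.
 */
static void toy_end_io(const struct toy_ioend *ioend,
		       struct completion_stats *st)
{
	assert(ioend->size > 0);
	if (ioend->flags & IOEND_COW)
		st->remaps++;
	if (ioend->flags & IOEND_UNWRITTEN)
		st->conversions++;
}
```

A single type field could hold only one of the two states; a bitmask lets one ioend request both the remap and the unwritten conversion, which is what lets it cover as much ground as possible.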