Commit | Line | Data |
---|---|---|
f0ba4377 | 1 | ============================== |
e484585e PBG |
2 | Device-mapper snapshot support |
3 | ============================== | |
4 | ||
5 | Device-mapper allows you, without massive data copying: | |
6 | ||
f0ba4377 MCC |
7 | - To create snapshots of any block device, i.e. mountable, saved states of |
8 | the block device which are also writable without interfering with the | |
9 | original content; | |
10 | - To create device "forks", i.e. multiple different versions of the | |
11 | same data stream. | |
12 | - To merge a snapshot of a block device back into the snapshot's origin | |
13 | device. | |
e484585e | 14 | |
d698aa45 MP |
15 | In the first two cases, dm copies only the chunks of data that get |
16 | changed and uses a separate copy-on-write (COW) block device for | |
17 | storage. | |
e484585e | 18 | |
d698aa45 MP |
19 | For snapshot merge the contents of the COW storage are merged back into |
20 | the origin device. | |
e484585e PBG |
21 | |
22 | ||
d698aa45 MP |
23 | There are three dm targets available: |
24 | snapshot, snapshot-origin, and snapshot-merge. | |
e484585e | 25 | |
f0ba4377 | 26 | - snapshot-origin <origin> |
e484585e PBG |
27 | |
28 | which will normally have one or more snapshots based on it. | |
e484585e PBG |
29 | Reads will be mapped directly to the backing device. For each write, the |
30 | original data will be saved in the <COW device> of each snapshot to keep | |
31 | its visible content unchanged, at least until the <COW device> fills up. | |
32 | ||
33 | ||
f0ba4377 | 34 | - snapshot <origin> <COW device> <persistent?> <chunksize> |
2e602385 | 35 | [<# feature args> [<arg>]*] |
e484585e | 36 | |
411f1140 | 37 | A snapshot of the <origin> block device is created. Changed chunks of |
e484585e PBG |
38 | <chunksize> sectors will be stored on the <COW device>. Writes will |
39 | only go to the <COW device>. Reads will come from the <COW device> or | |
40 | from <origin> for unchanged data. <COW device> will often be | |
41 | smaller than the origin and if it fills up the snapshot will become | |
42 | useless and be disabled, returning errors. So it is important to monitor | |
43 | the amount of free space and expand the <COW device> before it fills up. | |
44 | ||
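The read and write rules described above for snapshot-origin and snapshot can be modelled in a few lines. This is only a toy, in-memory sketch: the class `ToySnapshot`, its method names, and the idea of counting capacity in whole chunks are assumptions made for illustration and do not correspond to any kernel interface.

```python
class ToySnapshot:
    """In-memory model of one snapshot of an origin (illustration only)."""

    def __init__(self, origin, cow_capacity):
        self.origin = origin              # list of chunks, shared with the origin
        self.cow = {}                     # chunk index -> chunk held in the COW store
        self.cow_capacity = cow_capacity  # hypothetical capacity, in chunks
        self.valid = True

    def _store(self, index, data):
        # A new chunk that does not fit invalidates the whole snapshot.
        if index not in self.cow and len(self.cow) >= self.cow_capacity:
            self.valid = False
            self.cow.clear()
            return
        self.cow[index] = data

    def origin_write(self, index, data):
        """Write via snapshot-origin: save the old chunk first, then write."""
        if self.valid and index not in self.cow:
            self._store(index, self.origin[index])
        self.origin[index] = data

    def write(self, index, data):
        """Write via snapshot: the data goes only to the COW store."""
        if self.valid:
            self._store(index, data)
        if not self.valid:
            raise OSError("snapshot is invalid")

    def read(self, index):
        """Read via snapshot: changed chunks from the COW, others from origin."""
        if not self.valid:
            raise OSError("snapshot is invalid")
        return self.cow.get(index, self.origin[index])
```

In this model a write to the origin leaves the snapshot's view unchanged, a write to the snapshot leaves the origin unchanged, and once the COW store cannot hold a new chunk every later access to the snapshot fails, which is why the free space needs monitoring.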
45 | <persistent?> is P (Persistent) or N (Not persistent - will not survive | |
b0d3cc01 MS |
46 | a reboot). O (Overflow) can be added as a persistent store option |
47 | to allow userspace to advertise its support for seeing "Overflow" in the | |
48 | snapshot status. So supported store types are "P", "PO" and "N". | |
49 | ||
50 | The difference between persistent and transient is that with transient | |
51 | snapshots less metadata must be saved on disk - it can be kept in | |
52 | memory by the kernel. | |
e484585e | 53 | |
424da29c MP |
54 | When loading or unloading the snapshot target, the corresponding |
55 | snapshot-origin or snapshot-merge target must be suspended. A failure to | |
56 | suspend the origin target could result in data corruption. | |
57 | ||
2e602385 | 58 | Optional features: |
e484585e | 59 | |
2e602385 MS |
60 | discard_zeroes_cow - a discard issued to the snapshot device that |
61 | maps to entire chunks will zero the corresponding exception(s) in | |
62 | the snapshot's exception store. | |
63 | ||
64 | discard_passdown_origin - a discard to the snapshot device is passed | |
65 | down to the snapshot-origin's underlying device. This doesn't cause | |
66 | copy-out to the snapshot exception store because the snapshot-origin | |
67 | target is bypassed. | |
68 | ||
69 | The discard_passdown_origin feature depends on the discard_zeroes_cow | |
70 | feature being enabled. | |
71 | ||
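The argument rules for the snapshot target (the three store types, and discard_passdown_origin requiring discard_zeroes_cow) can be checked before a table line is handed to dmsetup. The sketch below is a hypothetical helper, not part of any existing tool: the function name `snapshot_table` and its parameters are assumptions, and the leading start and length fields follow the layout seen in `dmsetup table` output.

```python
STORE_TYPES = {"P", "PO", "N"}
FEATURES = {"discard_zeroes_cow", "discard_passdown_origin"}


def snapshot_table(start, length, origin, cow, store, chunksize, features=()):
    """Build a 'snapshot' table line, rejecting unsupported combinations."""
    if store not in STORE_TYPES:
        raise ValueError(f"unsupported store type: {store!r}")
    for feature in features:
        if feature not in FEATURES:
            raise ValueError(f"unknown feature: {feature!r}")
    if "discard_passdown_origin" in features and "discard_zeroes_cow" not in features:
        raise ValueError("discard_passdown_origin requires discard_zeroes_cow")

    fields = [start, length, "snapshot", origin, cow, store, chunksize]
    if features:
        # Optional tail: <# feature args> followed by the feature names.
        fields += [len(features), *features]
    return " ".join(str(field) for field in fields)
```

For example, `snapshot_table(0, 2097152, "254:11", "254:12", "P", 16)` yields `0 2097152 snapshot 254:11 254:12 P 16`, and passing both features appends `2 discard_zeroes_cow discard_passdown_origin`.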
e484585e | 72 | |
22608405 LT |
73 | - snapshot-merge <origin> <COW device> <persistent> <chunksize> |
74 | [<# feature args> [<arg>]*] | |
d698aa45 MP |
75 | |
76 | takes the same table arguments as the snapshot target except it only | |
77 | works with persistent snapshots. This target assumes the role of the | |
78 | "snapshot-origin" target and must not be loaded if the "snapshot-origin" | |
79 | is still present for <origin>. | |
80 | ||
81 | Creates a merging snapshot that takes control of the changed chunks | |
82 | stored in the <COW device> of an existing snapshot, through a handover | |
83 | procedure, and merges these chunks back into the <origin>. Once merging | |
84 | has started (in the background) the <origin> may be opened and the merge | |
85 | will continue while I/O is flowing to it. Changes to the <origin> are | |
86 | deferred until the merging snapshot's corresponding chunk(s) have been | |
87 | merged. Once merging has started the snapshot device, associated with | |
88 | the "snapshot" target, will return -EIO when accessed. | |
89 | ||
90 | ||
91 | How snapshot is used by LVM2 | |
92 | ============================ | |
e484585e PBG |
93 | When you create the first LVM2 snapshot of a volume, four dm devices are used: |
94 | ||
95 | 1) a device containing the original mapping table of the source volume; | |
96 | 2) a device used as the <COW device>; | |
97 | 3) a "snapshot" device, combining #1 and #2, which is the visible snapshot | |
98 | volume; | |
99 | 4) the "original" volume (which uses the device number used by the original | |
100 | source volume), whose table is replaced by a "snapshot-origin" mapping | |
101 | from device #1. | |
102 | ||
f0ba4377 | 103 | A fixed naming scheme is used, so with the following commands:: |
e484585e | 104 | |
f0ba4377 MCC |
105 | lvcreate -L 1G -n base volumeGroup |
106 | lvcreate -L 100M --snapshot -n snap volumeGroup/base | |
e484585e | 107 | |
f0ba4377 | 108 | we'll have this situation (with volumes in the above order):: |
e484585e | 109 | |
f0ba4377 | 110 | # dmsetup table|grep volumeGroup |
e484585e | 111 | |
f0ba4377 MCC |
112 | volumeGroup-base-real: 0 2097152 linear 8:19 384 |
113 | volumeGroup-snap-cow: 0 204800 linear 8:19 2097536 | |
114 | volumeGroup-snap: 0 2097152 snapshot 254:11 254:12 P 16 | |
115 | volumeGroup-base: 0 2097152 snapshot-origin 254:11 | |
e484585e | 116 | |
f0ba4377 MCC |
117 | # ls -lL /dev/mapper/volumeGroup-* |
118 | brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real | |
119 | brw------- 1 root root 254, 12 29 ago 18:15 /dev/mapper/volumeGroup-snap-cow | |
120 | brw------- 1 root root 254, 13 29 ago 18:15 /dev/mapper/volumeGroup-snap | |
121 | brw------- 1 root root 254, 10 29 ago 18:14 /dev/mapper/volumeGroup-base | |
e484585e | 122 | |
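The numbers in the table output above can be related back to the lvcreate sizes with a little arithmetic. The following sketch assumes the 512-byte sector unit that device-mapper uses for table offsets and lengths; the helper name `parse_table_line` is hypothetical and only handles lines shaped like the ones shown.

```python
SECTOR_BYTES = 512  # assumed unit for start/length fields in dm tables


def parse_table_line(line):
    """Split '<name>: <start> <length> <target> <args...>' into a dict."""
    name, rest = line.split(":", 1)   # only the first colon ends the name
    fields = rest.split()
    return {
        "name": name.strip(),
        "start": int(fields[0]),
        "sectors": int(fields[1]),
        "target": fields[2],
        "args": fields[3:],
    }


base = parse_table_line("volumeGroup-base-real: 0 2097152 linear 8:19 384")
cow = parse_table_line("volumeGroup-snap-cow: 0 204800 linear 8:19 2097536")
snap = parse_table_line("volumeGroup-snap: 0 2097152 snapshot 254:11 254:12 P 16")

base_bytes = base["sectors"] * SECTOR_BYTES      # the 1G origin
cow_bytes = cow["sectors"] * SECTOR_BYTES        # the 100M COW device
chunk_bytes = int(snap["args"][3]) * SECTOR_BYTES  # <chunksize> of 16 sectors
```

Under that assumption the origin is 2097152 sectors (1 GiB), the COW device is 204800 sectors (100 MiB), and each chunk is 16 sectors (8 KiB). The COW device's offset on 8:19 is also exactly the origin's offset plus its length (384 + 2097152 = 2097536), so in this example the two linear mappings sit back to back.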
d698aa45 MP |
123 | |
124 | How snapshot-merge is used by LVM2 | |
125 | ================================== | |
126 | A merging snapshot assumes the role of the "snapshot-origin" while | |
127 | merging. As such the "snapshot-origin" is replaced with | |
128 | "snapshot-merge". The "-real" device is not changed and the "-cow" | |
129 | device is renamed to <origin name>-cow to aid LVM2's cleanup of the | |
130 | merging snapshot after it completes. The "snapshot" that hands over its | |
131 | COW device to the "snapshot-merge" is deactivated (unless using lvchange | |
132 | --refresh); but if it is left active it will simply return I/O errors. | |
133 | ||
f0ba4377 | 134 | A snapshot will merge into its origin with the following command:: |
d698aa45 | 135 | |
f0ba4377 | 136 | lvconvert --merge volumeGroup/snap |
d698aa45 | 137 | |
f0ba4377 | 138 | we'll now have this situation:: |
d698aa45 | 139 | |
f0ba4377 | 140 | # dmsetup table|grep volumeGroup |
d698aa45 | 141 | |
f0ba4377 MCC |
142 | volumeGroup-base-real: 0 2097152 linear 8:19 384 |
143 | volumeGroup-base-cow: 0 204800 linear 8:19 2097536 | |
144 | volumeGroup-base: 0 2097152 snapshot-merge 254:11 254:12 P 16 | |
d698aa45 | 145 | |
f0ba4377 MCC |
146 | # ls -lL /dev/mapper/volumeGroup-* |
147 | brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real | |
148 | brw------- 1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow | |
149 | brw------- 1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base | |
c53a381e MS |
150 | |
151 | ||
152 | How to determine when a merge is complete | |
153 | ========================================= | |
154 | The snapshot-merge and snapshot status lines end with: | |
f0ba4377 | 155 | |
c53a381e MS |
156 | <sectors_allocated>/<total_sectors> <metadata_sectors> |
157 | ||
158 | Both <sectors_allocated> and <total_sectors> include both data and metadata. | |
159 | During merging, the number of sectors allocated gets smaller and | |
160 | smaller. Merging has finished when the number of sectors holding data | |
161 | is zero, in other words <sectors_allocated> == <metadata_sectors>. | |
162 | ||
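The completion test described above is easy to script. The sketch below parses a status line of the numeric form documented here and applies the <sectors_allocated> == <metadata_sectors> rule; the function names are hypothetical, and any non-numeric status text is simply rejected rather than interpreted.

```python
def parse_snapshot_status(line):
    """Parse '<start> <length> <target> <allocated>/<total> <metadata>'."""
    fields = line.split()
    if len(fields) != 5 or "/" not in fields[3]:
        raise ValueError(f"unexpected status line: {line!r}")
    allocated, total = (int(part) for part in fields[3].split("/"))
    return {
        "target": fields[2],
        "allocated": allocated,
        "total": total,
        "metadata": int(fields[4]),
    }


def data_sectors(line):
    """Sectors holding data: allocated sectors minus metadata sectors."""
    status = parse_snapshot_status(line)
    return status["allocated"] - status["metadata"]


def merge_finished(line):
    """True once only metadata remains allocated in the COW device."""
    return data_sectors(line) == 0


def percent_allocated(line):
    """Share of the COW device in use, as a percentage."""
    status = parse_snapshot_status(line)
    return 100.0 * status["allocated"] / status["total"]
```

A monitoring loop could call `merge_finished()` on the output of `dmsetup status` until it returns true; `percent_allocated()` gives the same kind of figure that lvs reports in its Snap% column.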
f0ba4377 | 163 | Here is a practical example (using a hybrid of lvm and dmsetup commands):: |
c53a381e | 164 | |
f0ba4377 MCC |
165 | # lvs |
166 | LV VG Attr LSize Origin Snap% Move Log Copy% Convert | |
167 | base volumeGroup owi-a- 4.00g | |
168 | snap volumeGroup swi-a- 1.00g base 18.97 | |
c53a381e | 169 | |
f0ba4377 MCC |
170 | # dmsetup status volumeGroup-snap |
171 | 0 8388608 snapshot 397896/2097152 1560 | |
172 | ^^^^ metadata sectors | |
c53a381e | 173 | |
f0ba4377 MCC |
174 | # lvconvert --merge -b volumeGroup/snap |
175 | Merging of volume snap started. | |
c53a381e | 176 | |
f0ba4377 MCC |
177 | # lvs volumeGroup/snap |
178 | LV VG Attr LSize Origin Snap% Move Log Copy% Convert | |
179 | base volumeGroup Owi-a- 4.00g 17.23 | |
c53a381e | 180 | |
f0ba4377 MCC |
181 | # dmsetup status volumeGroup-base |
182 | 0 8388608 snapshot-merge 281688/2097152 1104 | |
c53a381e | 183 | |
f0ba4377 MCC |
184 | # dmsetup status volumeGroup-base |
185 | 0 8388608 snapshot-merge 180480/2097152 712 | |
c53a381e | 186 | |
f0ba4377 MCC |
187 | # dmsetup status volumeGroup-base |
188 | 0 8388608 snapshot-merge 16/2097152 16 | |
c53a381e MS |
189 | |
190 | Merging has finished. | |
191 | ||
f0ba4377 MCC |
192 | :: |
193 | ||
194 | # lvs | |
195 | LV VG Attr LSize Origin Snap% Move Log Copy% Convert | |
196 | base volumeGroup owi-a- 4.00g |