.. SPDX-License-Identifier: GPL-2.0-only

========
dm-clone
========

Introduction
============

dm-clone is a device mapper target which produces a one-to-one copy of an
existing, read-only source device into a writable destination device: It
presents a virtual block device which makes all data appear immediately, and
redirects reads and writes accordingly.

The main use case of dm-clone is to clone a potentially remote, high-latency,
read-only, archival-type block device into a writable, fast, primary-type device
for fast, low-latency I/O. The cloned device is visible/mountable immediately
and the copy of the source device to the destination device happens in the
background, in parallel with user I/O.

For example, one could restore an application backup from a read-only copy,
accessible through a network storage protocol (NBD, Fibre Channel, iSCSI, AoE,
etc.), into a local SSD or NVMe device, and start using the device immediately,
without waiting for the restore to complete.

When the cloning completes, the dm-clone table can be removed altogether and be
replaced, e.g., by a linear table, mapping directly to the destination device.

The dm-clone target reuses the metadata library used by the thin-provisioning
target.

Glossary
========

 Hydration
   The process of filling a region of the destination device with data from
   the same region of the source device, i.e., copying the region from the
   source to the destination device.

Once a region gets hydrated we redirect all I/O regarding it to the destination
device.

Design
======

Sub-devices
-----------

The target is constructed by passing three devices to it (along with other
parameters detailed later):

1. A source device - the read-only device that gets cloned and source of the
   hydration.

2. A destination device - the destination of the hydration, which will become a
   clone of the source device.

3. A small metadata device - it records which regions are already valid in the
   destination device, i.e., which regions have already been hydrated, or have
   been written to directly, via user I/O.

The size of the destination device must be at least equal to the size of the
source device.

Regions
-------

dm-clone divides the source and destination devices into fixed-sized regions.
Regions are the unit of hydration, i.e., the minimum amount of data copied from
the source to the destination device.

The region size is configurable when you first create the dm-clone device. The
recommended region size is the same as the file system block size, which usually
is 4KB. The region size must be between 8 sectors (4KB) and 2097152 sectors
(1GB) and a power of two.
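
These constraints are easy to check in advance. A minimal shell sketch (the
4KB region size below is a hypothetical choice) that converts a region size in
bytes to 512-byte sectors and validates the bounds and power-of-two
requirement:

```shell
# Hypothetical region size of 4 KiB; a sector is 512 bytes.
region_bytes=4096
region_sectors=$(( region_bytes / 512 ))

# Valid region sizes are powers of two between 8 and 2097152 sectors.
valid=no
if [ "$region_sectors" -ge 8 ] && [ "$region_sectors" -le 2097152 ] && \
   [ $(( region_sectors & (region_sectors - 1) )) -eq 0 ]; then
    valid=yes
fi
echo "$region_sectors sectors: valid=$valid"   # prints: 8 sectors: valid=yes
```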

Reads and writes from/to hydrated regions are serviced from the destination
device.

A read to a not yet hydrated region is serviced directly from the source device.

A write to a not yet hydrated region is delayed until the corresponding region
has been hydrated; the hydration of the region starts immediately.

Note that a write request with size equal to region size will skip copying of
the corresponding region from the source device and overwrite the region of the
destination device directly.
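
The skip-copy condition can be sketched with shell arithmetic. The offsets
below are hypothetical and expressed in 512-byte sectors; in addition to the
size matching the region size, the write is assumed here to be aligned to a
region boundary, so that it covers exactly one region:

```shell
region_size=8      # region size in sectors (4 KiB)
write_offset=16    # hypothetical write start, in sectors
write_size=8       # hypothetical write size, in sectors

# The copy from the source device can be skipped only if the write is
# region-aligned and covers the region completely.
if [ $(( write_offset % region_size )) -eq 0 ] && \
   [ "$write_size" -eq "$region_size" ]; then
    action=overwrite       # write directly to the destination device
else
    action=hydrate_first   # delay the write until the region is hydrated
fi
echo "$action"   # prints: overwrite
```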

Discards
--------

dm-clone interprets a discard request to a range that hasn't been hydrated yet
as a hint to skip hydration of the regions covered by the request, i.e., it
skips copying the region's data from the source to the destination device, and
only updates its metadata.

If the destination device supports discards, then by default dm-clone will pass
down discard requests to it.

Background Hydration
--------------------

dm-clone copies continuously from the source to the destination device, until
all of the device has been copied.

Copying data from the source to the destination device uses bandwidth. The user
can set a throttle to prevent more than a certain amount of copying occurring at
any one time. Moreover, dm-clone takes into account user I/O traffic going to
the devices and pauses the background hydration when there is I/O in-flight.

A message `hydration_threshold <#regions>` can be used to set the maximum number
of regions being copied, the default being 1 region.

dm-clone employs dm-kcopyd for copying portions of the source device to the
destination device. By default, we issue copy requests of size equal to the
region size. A message `hydration_batch_size <#regions>` can be used to tune the
size of these copy requests. Increasing the hydration batch size results in
dm-clone trying to batch together contiguous regions, so we copy the data in
batches of this many regions.
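
As a rough illustration of what batching buys (the numbers below are
hypothetical), hydrating N contiguous regions with a batch size of B takes
about ceil(N/B) copy requests instead of N:

```shell
regions=1000    # hypothetical number of contiguous regions to copy
batch_size=64   # hypothetical hydration_batch_size

# Ceiling division: number of dm-kcopyd copy requests needed.
requests=$(( (regions + batch_size - 1) / batch_size ))
echo "$requests copy requests"   # prints: 16 copy requests
```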

When the hydration of the destination device finishes, a dm event will be sent
to user space.

Updating on-disk metadata
-------------------------

On-disk metadata is committed every time a FLUSH or FUA bio is written. If no
such requests are made then commits will occur every second. This means the
dm-clone device behaves like a physical disk that has a volatile write cache. If
power is lost you may lose some recent writes. The metadata should always be
consistent in spite of any crash.

Target Interface
================

Constructor
-----------

  ::

   clone <metadata dev> <destination dev> <source dev> <region size>
         [<#feature args> [<feature arg>]* [<#core args> [<core arg>]*]]

================ ==============================================================
metadata dev     Fast device holding the persistent metadata
destination dev  The destination device, where the source will be cloned
source dev       Read only device containing the data that gets cloned
region size      The size of a region in sectors

#feature args    Number of feature arguments passed
feature args     no_hydration or no_discard_passdown

#core args       An even number of arguments corresponding to key/value pairs
                 passed to dm-clone
core args        Key/value pairs passed to dm-clone, e.g. `hydration_threshold
                 256`
================ ==============================================================

Optional feature arguments are:

==================== =========================================================
no_hydration         Create a dm-clone instance with background hydration
                     disabled
no_discard_passdown  Disable passing down discards to the destination device
==================== =========================================================

Optional core arguments are:

================================ ==============================================
hydration_threshold <#regions>   Maximum number of regions being copied from
                                 the source to the destination device at any
                                 one time, during background hydration.
hydration_batch_size <#regions>  During background hydration, try to batch
                                 together contiguous regions, so we copy data
                                 from the source to the destination device in
                                 batches of this many regions.
================================ ==============================================
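
Putting the pieces together, a shell sketch that assembles a complete table
line (the device paths and the 1048576000-sector device size are hypothetical).
Note that `#core args` counts keys and values separately, so one key/value
pair contributes 2 arguments:

```shell
# Hypothetical devices and a 1048576000-sector (500 GiB) source device.
metadata_dev=/dev/vg/clone-metadata
dest_dev=/dev/vg/clone-dest
source_dev=/dev/nbd0
region_size=8   # sectors

table="0 1048576000 clone $metadata_dev $dest_dev $source_dev $region_size \
1 no_hydration 2 hydration_threshold 256"
echo "$table"
# The table would then be loaded with:
#   dmsetup create clone --table "$table"
```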

Status
------

  ::

   <metadata block size> <#used metadata blocks>/<#total metadata blocks>
   <region size> <#hydrated regions>/<#total regions> <#hydrating regions>
   <#feature args> <feature args>* <#core args> <core args>*
   <clone metadata mode>

======================= =======================================================
metadata block size     Fixed block size for each metadata block in sectors
#used metadata blocks   Number of metadata blocks used
#total metadata blocks  Total number of metadata blocks
region size             Configurable region size for the device in sectors
#hydrated regions       Number of regions that have finished hydrating
#total regions          Total number of regions to hydrate
#hydrating regions      Number of regions currently hydrating
#feature args           Number of feature arguments to follow
feature args            Feature arguments, e.g. `no_hydration`
#core args              Even number of core arguments to follow
core args               Key/value pairs for tuning the core, e.g.
                        `hydration_threshold 256`
clone metadata mode     ro if read-only, rw if read-write

                        In serious cases where even a read-only mode is deemed
                        unsafe no further I/O will be permitted and the status
                        will just contain the string 'Fail'. If the metadata
                        mode changes, a dm event will be sent to user space.
======================= =======================================================
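
The `<#hydrated regions>/<#total regions>` field makes it straightforward to
compute cloning progress. A shell sketch operating on a hypothetical status
line of the shape documented above (in real use the line would come from
`dmsetup status clone`):

```shell
# Hypothetical status line, fields as documented above: metadata block size 8,
# 23/262144 metadata blocks used, region size 8, 65536000/131072000 regions
# hydrated, 1 region hydrating, no feature or core args, metadata mode rw.
status="8 23/262144 8 65536000/131072000 1 0 0 rw"

# Field 4 is <#hydrated regions>/<#total regions>.
progress=$(echo "$status" | awk '{ split($4, r, "/"); printf "%d", r[1] * 100 / r[2] }')
echo "hydration ${progress}% complete"   # prints: hydration 50% complete
```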

Messages
--------

  `disable_hydration`
      Disable the background hydration of the destination device.

  `enable_hydration`
      Enable the background hydration of the destination device.

  `hydration_threshold <#regions>`
      Set background hydration threshold.

  `hydration_batch_size <#regions>`
      Set background hydration batch size.

Examples
========

Clone a device containing a file system
---------------------------------------

1. Create the dm-clone device.

   ::

    dmsetup create clone --table "0 1048576000 clone $metadata_dev $dest_dev \
      $source_dev 8 1 no_hydration"

2. Mount the device and trim the file system. dm-clone interprets the discards
   sent by the file system and it will not hydrate the unused space.

   ::

    mount /dev/mapper/clone /mnt/cloned-fs
    fstrim /mnt/cloned-fs

3. Enable background hydration of the destination device.

   ::

    dmsetup message clone 0 enable_hydration

4. When the hydration finishes, we can replace the dm-clone table with a linear
   table.

   ::

    dmsetup suspend clone
    dmsetup load clone --table "0 1048576000 linear $dest_dev 0"
    dmsetup resume clone

The metadata device is no longer needed and can be safely discarded or reused
for other purposes.
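
One way to decide when it is safe to perform step 4 is to poll the status
until the hydrated region count reaches the total; the sketch below runs the
check against a hypothetical, fully hydrated status line (a dm event is also
sent on completion, so an event-based mechanism such as `dmsetup wait` could
be used instead):

```shell
# Hypothetical status line; in real use: status=$(dmsetup status clone)
status="8 23/262144 8 131072000/131072000 0 0 0 rw"

# Field 4 is <#hydrated regions>/<#total regions>.
hydrated=$(echo "$status" | awk '{ split($4, r, "/"); print r[1] }')
total=$(echo "$status" | awk '{ split($4, r, "/"); print r[2] }')

if [ "$hydrated" -eq "$total" ]; then
    echo "hydration complete; safe to switch to the linear table"
fi
```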

Known issues
============

1. We redirect reads, to not-yet-hydrated regions, to the source device. If
   reading the source device has high latency and the user repeatedly reads from
   the same regions, this behaviour could degrade performance. We should use
   these reads as hints to hydrate the relevant regions sooner. Currently, we
   rely on the page cache to cache these regions, so we hopefully don't end up
   reading them multiple times from the source device.

2. Release in-core resources, i.e., the bitmaps tracking which regions are
   hydrated, after the hydration has finished.

3. During background hydration, if we fail to read the source or write to the
   destination device, we print an error message, but the hydration process
   continues indefinitely, until it succeeds. We should stop the background
   hydration after a number of failures and emit a dm event for user space to
   notice.

Why not...?
===========

We explored the following alternatives before implementing dm-clone:

1. Use dm-cache with cache size equal to the source device and implement a new
   cloning policy:

   * The resulting cache device is not a one-to-one mirror of the source device
     and thus we cannot remove the cache device once cloning completes.

   * dm-cache writes to the source device, which violates our requirement that
     the source device must be treated as read-only.

   * Caching is semantically different from cloning.

2. Use dm-snapshot with a COW device equal to the source device:

   * dm-snapshot stores its metadata in the COW device, so the resulting device
     is not a one-to-one mirror of the source device.

   * No background copying mechanism.

   * dm-snapshot needs to commit its metadata whenever a pending exception
     completes, to ensure snapshot consistency. In the case of cloning, we don't
     need to be so strict and can rely on committing metadata every time a FLUSH
     or FUA bio is written, or periodically, like dm-thin and dm-cache do. This
     improves the performance significantly.

3. Use dm-mirror: The mirror target has a background copying/mirroring
   mechanism, but it writes to all mirrors, thus violating our requirement that
   the source device must be treated as read-only.

4. Use dm-thin's external snapshot functionality. This approach is the most
   promising among all alternatives, as the thinly-provisioned volume is a
   one-to-one mirror of the source device and handles reads and writes to
   un-provisioned/not-yet-cloned areas the same way as dm-clone does.

   Still:

   * There is no background copying mechanism, though one could be implemented.

   * Most importantly, we want to support arbitrary block devices as the
     destination of the cloning process and not restrict ourselves to
     thinly-provisioned volumes. Thin-provisioning has an inherent metadata
     overhead, for maintaining the thin volume mappings, which significantly
     degrades performance.

   Moreover, cloning a device shouldn't force the use of thin-provisioning. On
   the other hand, if we wish to use thin provisioning, we can just use a thin
   LV as dm-clone's destination device.