Commit | Line | Data |
---|---|---|
898bd37a | 1 | ========================================== |
04ccc65c | 2 | Explicit volatile write back cache control |
898bd37a | 3 | ========================================== |
04ccc65c CH |
4 | |
5 | Introduction | |
6 | ------------ | |
7 | ||
8 | Many storage devices, especially in the consumer market, come with volatile | |
9 | write back caches. That means the devices signal I/O completion to the | |
10 | operating system before data actually has hit the non-volatile storage. This | |
11 | behavior obviously speeds up various workloads, but it means the operating | |
12 | system needs to force data out to the non-volatile storage when it performs | |
13 | a data integrity operation like fsync, sync or an unmount. | |
14 | ||
15 | The Linux block layer provides two simple mechanisms that let filesystems | |
16 | control the caching behavior of the storage device. These mechanisms are | |
17 | a forced cache flush, and the Force Unit Access (FUA) flag for requests. | |
18 | ||
19 | ||
20 | Explicit cache flushes | |
21 | ---------------------- | |
22 | ||
28a8f0d3 | 23 | The REQ_PREFLUSH flag can be ORed into the r/w flags of a bio submitted from |
04ccc65c CH |
24 | the filesystem and will make sure the volatile cache of the storage device |
25 | has been flushed before the actual I/O operation is started. This explicitly | |
26 | guarantees that previously completed write requests are on non-volatile | |
28a8f0d3 | 27 | storage before the flagged bio starts. In addition, the REQ_PREFLUSH flag can be |
04ccc65c CH |
28 | set on an otherwise empty bio structure, which causes only an explicit cache |
29 | flush without any dependent I/O. It is recommended to use | |
30 | the blkdev_issue_flush() helper for a pure cache flush. | |
31 | ||
32 | ||
33 | Forced Unit Access | |
898bd37a | 34 | ------------------ |
04ccc65c CH |
35 | |
36 | The REQ_FUA flag can be ORed into the r/w flags of a bio submitted from the | |
37 | filesystem and will make sure that I/O completion for this request is only | |
38 | signaled after the data has been committed to non-volatile storage. | |
39 | ||
40 | ||
41 | Implementation details for filesystems | |
42 | -------------------------------------- | |
43 | ||
28a8f0d3 | 44 | Filesystems can simply set the REQ_PREFLUSH and REQ_FUA bits and do not have to |
04ccc65c | 45 | worry about whether the underlying devices need explicit cache flushing or how |
28a8f0d3 | 46 | the Forced Unit Access is implemented. The REQ_PREFLUSH and REQ_FUA flags |
04ccc65c CH |
47 | may both be set on a single bio. |
48 | ||
49 | ||
c62b37d9 | 50 | Implementation details for bio based block drivers |
04ccc65c CH |
51 | -------------------------------------------------------------- |
52 | ||
28a8f0d3 | 53 | These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit |
04ccc65c CH |
54 | directly below the submit_bio interface. For remapping drivers the REQ_FUA |
55 | bit needs to be propagated to underlying devices, and a global flush needs | |
28a8f0d3 MC |
56 | to be implemented for bios with the REQ_PREFLUSH bit set. For real device |
57 | drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bits | |
58 | on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without | |
04ccc65c CH |
59 | data can be completed successfully without doing any work. Drivers for |
60 | devices with volatile caches need to implement the support for these | |
61 | flags themselves without any help from the block layer. | |
62 | ||
63 | ||
64 | Implementation details for request_fn based block drivers | |
898bd37a | 65 | --------------------------------------------------------- |
04ccc65c CH |
66 | |
67 | For devices that do not support volatile write caches there is no driver | |
28a8f0d3 MC |
68 | support required: the block layer completes empty REQ_PREFLUSH requests before |
69 | entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from | |
04ccc65c CH |
70 | requests that have a payload. For devices with volatile write caches the |
71 | driver needs to tell the block layer that it supports flushing caches by | |
898bd37a | 72 | doing:: |
04ccc65c | 73 | |
2245f6de | 74 | blk_queue_write_cache(sdkp->disk->queue, true, false); |
04ccc65c | 75 | |
3a5e02ce | 76 | and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn. Note that |
28a8f0d3 | 77 | REQ_PREFLUSH requests with a payload are automatically turned into a sequence |
3a5e02ce | 78 | of an empty REQ_OP_FLUSH request followed by the actual write by the block |
04ccc65c | 79 | layer. For devices that also support the FUA bit the block layer needs |
898bd37a | 80 | to be told to pass through the REQ_FUA bit using:: |
04ccc65c | 81 | |
2245f6de | 82 | blk_queue_write_cache(sdkp->disk->queue, true, true); |
04ccc65c CH |
83 | |
84 | and the driver must handle write requests that have the REQ_FUA bit set | |
85 | in prep_fn/request_fn. If the FUA bit is not natively supported the block | |
3a5e02ce | 86 | layer turns it into an empty REQ_OP_FLUSH request after the actual write. |
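
The request sequencing described in the last section can be sketched as a small userspace model. This is not kernel code: the names MODEL_PREFLUSH, MODEL_FUA, model_step and model_decompose() are hypothetical and exist only for illustration, and the flag values are not the kernel's. The model assumes the two booleans correspond to the two arguments passed to blk_queue_write_cache() (volatile write cache present, native FUA supported) and returns the steps a driver would see for one request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model only; not the kernel's definitions. */
#define MODEL_PREFLUSH (1u << 0)
#define MODEL_FUA      (1u << 1)

/* The kinds of work that can reach the driver in this model. */
enum model_step {
    STEP_FLUSH,      /* empty cache flush request */
    STEP_WRITE,      /* plain write */
    STEP_WRITE_FUA   /* write carrying the FUA bit */
};

/*
 * Fill steps[] with the sequence the text describes for one request and
 * return the number of steps (at most 3).
 *
 *   flags     - MODEL_PREFLUSH and/or MODEL_FUA as set by the submitter
 *   has_data  - false for an empty flush-only request
 *   has_cache - device has a volatile write cache
 *   has_fua   - device natively supports FUA writes
 */
static int model_decompose(unsigned int flags, bool has_data,
                           bool has_cache, bool has_fua,
                           enum model_step steps[3])
{
    int n = 0;

    assert(steps != NULL);

    if (!has_cache) {
        /*
         * No volatile cache: both flags are stripped from requests with
         * a payload, and an empty flush completes without any work.
         */
        if (has_data)
            steps[n++] = STEP_WRITE;
        return n;
    }

    /* A preflush becomes an empty flush issued before the write. */
    if (flags & MODEL_PREFLUSH)
        steps[n++] = STEP_FLUSH;

    if (has_data) {
        if ((flags & MODEL_FUA) && has_fua) {
            /* Native FUA: the bit is passed through to the driver. */
            steps[n++] = STEP_WRITE_FUA;
        } else {
            steps[n++] = STEP_WRITE;
            /* Emulated FUA: an empty flush follows the write. */
            if (flags & MODEL_FUA)
                steps[n++] = STEP_FLUSH;
        }
    }
    return n;
}
```

Under these assumptions, a write with both flags set on a device that has a volatile cache but no native FUA support yields three steps (flush, write, flush), while the same write on a device with native FUA yields two (flush, FUA write).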