Commit | Line | Data |
---|---|---|
efc930fa MCC |
1 | .. SPDX-License-Identifier: GPL-2.0 |
2 | ||
e0484344 DH |
3 | ============================== |
4 | Network Filesystem Caching API | |
5 | ============================== | |
2d6fff63 | 6 | |
e0484344 DH |
7 | Fscache provides an API by which a network filesystem can make use of local |
8 | caching facilities. The API is arranged around a number of principles: | |
2d6fff63 | 9 | |
e0484344 DH |
10 | (1) A cache is logically organised into volumes and data storage objects |
11 | within those volumes. | |
2d6fff63 | 12 | |
e0484344 DH |
13 | (2) Volumes and data storage objects are represented by various types of |
14 | cookie. | |
2d6fff63 | 15 | |
e0484344 | 16 | (3) Cookies have keys that distinguish them from their peers. |
2d6fff63 | 17 | |
e0484344 DH |
18 | (4) Cookies have coherency data that allows a cache to determine if the |
19 | cached data is still valid. | |
2d6fff63 | 20 | |
e0484344 | 21 | (5) I/O is done asynchronously where possible. |
2d6fff63 | 22 | |
e0484344 | 23 | This API is used by:: |
2d6fff63 | 24 | |
e0484344 | 25 | #include <linux/fscache.h>. |
2d6fff63 | 26 | |
e0484344 | 27 | .. This document contains the following sections: |
2d6fff63 | 28 | |
e0484344 DH |
29 | (1) Overview |
30 | (2) Volume registration | |
31 | (3) Data file registration | |
32 | (4) Declaring a cookie to be in use | |
33 | (5) Resizing a data file (truncation) | |
34 | (6) Data I/O API | |
35 | (7) Data file coherency | |
36 | (8) Data file invalidation | |
37 | (9) Write back resource management | |
38 | (10) Caching of local modifications | |
39 | (11) Page release and invalidation | |
40 | ||
41 | ||
42 | Overview | |
43 | ======== | |
44 | ||
45 | The fscache hierarchy is organised on two levels from a network filesystem's | |
46 | point of view. The upper level represents "volumes" and the lower level | |
47 | represents "data storage objects". These are represented by two types of | |
48 | cookie, hereafter referred to as "volume cookies" and "cookies". | |
49 | ||
50 | A network filesystem acquires a volume cookie for a volume using a volume key, | |
51 | which represents all the information that defines that volume (e.g. cell name | |
52 | or server address, volume ID or share name). This must be rendered as a | |
53 | printable string that can be used as a directory name (i.e. no '/' characters | |
54 | and shouldn't begin with a '.'). The maximum name length is one less than the | |
55 | maximum size of a filename component (allowing the cache backend one char for | |
56 | its own purposes). | |
57 | ||
58 | A filesystem would typically have a volume cookie for each superblock. | |
59 | ||
60 | The filesystem then acquires a cookie for each file within that volume using an | |
61 | object key. Object keys are binary blobs and only need to be unique within | |
62 | their parent volume. The cache backend is responsible for rendering the binary | |
63 | blob into something it can use and may employ hash tables, trees or whatever to | |
64 | improve its ability to find an object. This is transparent to the network | |
65 | filesystem. | |
66 | ||
67 | A filesystem would typically have a cookie for each inode, and would acquire it | |
68 | in iget and relinquish it when evicting the inode. | |
69 | ||
70 | Once it has a cookie, the filesystem needs to mark the cookie as being in use. | |
71 | This causes fscache to send the cache backend off to look up/create resources | |
72 | for the cookie in the background, to check its coherency and, if necessary, to | |
73 | mark the object as being under modification. | |
74 | ||
75 | A filesystem would typically "use" the cookie in its file open routine and | |
76 | unuse it in file release and it needs to use the cookie around calls to | |
77 | truncate the cookie locally. It *also* needs to use the cookie when the | |
78 | pagecache becomes dirty and unuse it when writeback is complete. This is | |
79 | slightly tricky, and provision is made for it. | |
80 | ||
81 | When performing a read, write or resize on a cookie, the filesystem must first | |
82 | begin an operation. This copies the resources into a holding struct and puts | |
83 | extra pins into the cache to stop cache withdrawal from tearing down the | |
84 | structures being used. The actual operation can then be issued and conflicting | |
85 | invalidations can be detected upon completion. | |
86 | ||
87 | The filesystem is expected to use netfslib to access the cache, but that's not | |
88 | actually required and it can use the fscache I/O API directly. | |
89 | ||
90 | ||
91 | Volume Registration | |
92 | =================== | |
93 | ||
94 | The first step for a network filesystem is to acquire a volume cookie for the | |
95 | volume it wants to access:: | |
96 | ||
97 | struct fscache_volume * | |
98 | fscache_acquire_volume(const char *volume_key, | |
99 | const char *cache_name, | |
100 | const void *coherency_data, | |
101 | size_t coherency_len); | |
102 | ||
103 | This function creates a volume cookie with the specified volume key as its name | |
104 | and notes the coherency data. | |
105 | ||
106 | The volume key must be a printable string with no '/' characters in it. It | |
107 | should begin with the name of the filesystem and should be no longer than 254 | |
108 | characters. It should uniquely represent the volume and will be matched with | |
109 | what's stored in the cache. | |
110 | ||
111 | The caller may also specify the name of the cache to use. If specified, | |
112 | fscache will look up or create a cache cookie of that name and will use a cache | |
113 | of that name if it is online or comes online. If no cache name is specified, | |
114 | it will use the first cache that comes to hand and set the name to that. | |
115 | ||
116 | The specified coherency data is stored in the cookie and will be matched | |
117 | against coherency data stored on disk. The data pointer may be NULL if no data | |
118 | is provided. If the coherency data doesn't match, the entire cache volume will | |
119 | be invalidated. | |
120 | ||
121 | This function can return errors such as EBUSY if the volume key is already in | |
122 | use by an acquired volume or ENOMEM if an allocation failure occurred. It may | |
123 | also return a NULL volume cookie if fscache is not enabled. It is safe to | |
124 | pass a NULL cookie to any function that takes a volume cookie. This will | |
125 | cause that function to do nothing. | |
126 | ||
127 | ||
128 | When the network filesystem has finished with a volume, it should relinquish it | |
129 | by calling:: | |
130 | ||
131 | void fscache_relinquish_volume(struct fscache_volume *volume, | |
132 | const void *coherency_data, | |
133 | bool invalidate); | |
134 | ||
135 | This will cause the volume to be committed or removed, and if committed the | |
136 | coherency data will be set to the value supplied. The amount of coherency data | |
137 | must match the length specified when the volume was acquired. Note that all | |
138 | data cookies obtained in this volume must be relinquished before the volume is | |
139 | relinquished. | |
2d6fff63 DH |
140 | |
141 | ||
e0484344 DH |
142 | Data File Registration |
143 | ====================== | |
2d6fff63 | 144 | |
e0484344 DH |
145 | Once it has a volume cookie, a network filesystem can use it to acquire a |
146 | cookie for data storage:: | |
2d6fff63 DH |
147 | |
148 | struct fscache_cookie * | |
e0484344 DH |
149 | fscache_acquire_cookie(struct fscache_volume *volume, |
150 | u8 advice, | |
402cb8dd DH |
151 | const void *index_key, |
152 | size_t index_key_len, | |
153 | const void *aux_data, | |
154 | size_t aux_data_len, | |
e0484344 | 155 | loff_t object_size); |
2d6fff63 | 156 | |
e0484344 DH |
157 | This creates the cookie in the volume using the specified index key. The index |
158 | key is a binary blob of the given length and must be unique for the volume. | |
159 | This is saved into the cookie. There are no restrictions on the content, but | |
160 | its length shouldn't exceed about three quarters of the maximum filename length | |
161 | to allow for encoding. | |
2d6fff63 | 162 | |
e0484344 DH |
163 | The caller should also pass in a piece of coherency data in aux_data. A buffer |
164 | of size aux_data_len will be allocated and the coherency data copied in. It is | |
165 | assumed that the size is invariant over time. The coherency data is used to | |
166 | check the validity of data in the cache. Functions are provided by which the | |
167 | coherency data can be updated. | |
402cb8dd | 168 | |
e0484344 DH |
169 | The file size of the object being cached should also be provided. This may be |
170 | used to trim the data and will be stored with the coherency data. | |
402cb8dd | 171 | |
e0484344 DH |
172 | This function never returns an error, though it may return a NULL cookie on |
173 | allocation failure or if fscache is not enabled. It is safe to pass in a NULL | |
174 | volume cookie and pass the NULL cookie returned to any function that takes it. | |
175 | This will cause that function to do nothing. | |
402cb8dd | 176 | |
ee1235a9 | 177 | |
e0484344 DH |
178 | When the network filesystem has finished with a cookie, it should relinquish it |
179 | by calling:: | |
2d6fff63 | 180 | |
e0484344 DH |
181 | void fscache_relinquish_cookie(struct fscache_cookie *cookie, |
182 | bool retire); | |
2d6fff63 | 183 | |
e0484344 DH |
184 | This will cause fscache to either commit the storage backing the cookie or |
185 | delete it. | |
94d30ae9 | 186 | |
2d6fff63 | 187 | |
e0484344 DH |
188 | Marking A Cookie In-Use |
189 | ======================= | |
2d6fff63 | 190 | |
e0484344 DH |
191 | Once a cookie has been acquired by a network filesystem, the filesystem should |
192 | tell fscache when it intends to use the cookie (typically done on file open) | |
193 | and should say when it has finished with it (typically on file close):: | |
2d6fff63 | 194 | |
e0484344 DH |
195 | void fscache_use_cookie(struct fscache_cookie *cookie, |
196 | bool will_modify); | |
197 | void fscache_unuse_cookie(struct fscache_cookie *cookie, | |
198 | const void *aux_data, | |
199 | const loff_t *object_size); | |
2d6fff63 | 200 | |
e0484344 DH |
201 | The *use* function tells fscache that it will use the cookie and, additionally, |
202 | indicates whether the user intends to modify the contents locally. If not yet | |
203 | done, this will trigger the cache backend to go and gather the resources it | |
204 | needs to access/store data in the cache. This is done in the background, and | |
205 | so may not be complete by the time the function returns. | |
2d6fff63 | 206 | |
e0484344 DH |
207 | The *unuse* function indicates that a filesystem has finished using a cookie. |
208 | It optionally updates the stored coherency data and object size and then | |
209 | decreases the in-use counter. When the last user unuses the cookie, it is | |
210 | scheduled for garbage collection. If not reused within a short time, the | |
211 | resources will be released to reduce system resource consumption. | |
2d6fff63 | 212 | |
e0484344 DH |
213 | A cookie must be marked in-use before it can be accessed for read, write or |
214 | resize - and an in-use mark must be kept whilst there is dirty data in the | |
215 | pagecache in order to avoid an oops due to trying to open a file during process | |
216 | exit. | |
2d6fff63 | 217 | |
e0484344 DH |
218 | Note that in-use marks are cumulative. For each time a cookie is marked |
219 | in-use, it must be unused. | |
2d6fff63 DH |
220 | |
221 | ||
e0484344 | 222 | Resizing A Data File (Truncation) |
2d6fff63 DH |
223 | ================================= |
224 | ||
e0484344 DH |
225 | If a network filesystem file is resized locally by truncation, the following |
226 | should be called to notify the cache:: | |
2d6fff63 | 227 | |
e0484344 DH |
228 | void fscache_resize_cookie(struct fscache_cookie *cookie, |
229 | loff_t new_size); | |
2d6fff63 | 230 | |
e0484344 DH |
231 | The caller must have first marked the cookie in-use. The cookie and the new |
232 | size are passed in and the cache is synchronously resized. This is expected to | |
233 | be called from the ``->setattr()`` inode operation under the inode lock. | |
2d6fff63 | 234 | |
2d6fff63 | 235 | |
e0484344 DH |
236 | Data I/O API |
237 | ============ | |
2d6fff63 | 238 | |
e0484344 DH |
239 | To do data I/O operations directly through a cookie, the following functions |
240 | are available:: | |
2d6fff63 | 241 | |
e0484344 DH |
242 | int fscache_begin_read_operation(struct netfs_cache_resources *cres, |
243 | struct fscache_cookie *cookie); | |
244 | int fscache_read(struct netfs_cache_resources *cres, | |
245 | loff_t start_pos, | |
246 | struct iov_iter *iter, | |
247 | enum netfs_read_from_hole read_hole, | |
248 | netfs_io_terminated_t term_func, | |
249 | void *term_func_priv); | |
250 | int fscache_write(struct netfs_cache_resources *cres, | |
251 | loff_t start_pos, | |
252 | struct iov_iter *iter, | |
253 | netfs_io_terminated_t term_func, | |
254 | void *term_func_priv); | |
2d6fff63 | 255 | |
e0484344 DH |
256 | The *begin* function sets up an operation, attaching the resources required to |
257 | the cache resources block from the cookie. Assuming it doesn't return an error | |
258 | (for instance, it will return -ENOBUFS if given a NULL cookie, but otherwise do | |
259 | nothing), then one of the other two functions can be issued. | |
2d6fff63 | 260 | |
e0484344 DH |
261 | The *read* and *write* functions initiate a direct-IO operation. Both take the |
262 | previously set up cache resources block, an indication of the start file | |
263 | position, and an I/O iterator that describes the buffer and indicates the amount of | |
264 | data. | |
2d6fff63 | 265 | |
e0484344 DH |
266 | The read function also takes a parameter to indicate how it should handle a |
267 | partially populated region (a hole) in the disk content. This may be to ignore | |
268 | it, skip over an initial hole and place zeros in the buffer or give an error. | |
2d6fff63 | 269 | |
e0484344 DH |
270 | The read and write functions can be given an optional termination function that |
271 | will be run on completion:: | |
2d6fff63 DH |
272 | |
273 | typedef | |
e0484344 DH |
274 | void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error, |
275 | bool was_async); | |
2d6fff63 | 276 | |
e0484344 DH |
277 | If a termination function is given, the operation will be run asynchronously |
278 | and the termination function will be called upon completion. If not given, the | |
279 | operation will be run synchronously. Note that in the asynchronous case, it is | |
280 | possible for the operation to complete before the function returns. | |
2d6fff63 | 281 | |
e0484344 DH |
282 | Both the read and write functions end the operation when they complete, |
283 | detaching any pinned resources. | |
2d6fff63 | 284 | |
e0484344 DH |
285 | The read operation will fail with ESTALE if invalidation occurred whilst the |
286 | operation was ongoing. | |
2d6fff63 | 287 | |
2d6fff63 | 288 | |
e0484344 DH |
289 | Data File Coherency |
290 | =================== | |
2d6fff63 | 291 | |
e0484344 DH |
292 | To request an update of the coherency data and file size on a cookie, the |
293 | following should be called:: | |
2d6fff63 | 294 | |
402cb8dd | 295 | void fscache_update_cookie(struct fscache_cookie *cookie, |
402cb8dd | 296 | const void *aux_data, |
e0484344 | 297 | const loff_t *object_size); |
94d30ae9 | 298 | |
e0484344 | 299 | This will update the cookie's coherency data and/or file size. |
94d30ae9 | 300 | |
ee1235a9 | 301 | |
e0484344 DH |
302 | Data File Invalidation |
303 | ====================== | |
2d6fff63 | 304 | |
e0484344 DH |
305 | Sometimes it will be necessary to invalidate an object that contains data. |
306 | Typically this will be necessary when the server informs the network filesystem | |
307 | of a remote third-party change - at which point the filesystem has to throw | |
308 | away the state and cached data that it had for a file and reload from the | |
309 | server. | |
2d6fff63 | 310 | |
e0484344 DH |
311 | To indicate that a cache object should be invalidated, the following should be |
312 | called:: | |
2d6fff63 | 313 | |
e0484344 DH |
314 | void fscache_invalidate(struct fscache_cookie *cookie, |
315 | const void *aux_data, | |
316 | loff_t size, | |
317 | unsigned int flags); | |
2d6fff63 | 318 | |
e0484344 DH |
319 | This increases the invalidation counter in the cookie to cause outstanding |
320 | reads to fail with -ESTALE, sets the coherency data and file size from the | |
321 | information supplied, blocks new I/O on the cookie and dispatches the cache to | |
322 | go and get rid of the old data. | |
2d6fff63 | 323 | |
e0484344 DH |
324 | Invalidation runs asynchronously in a worker thread so that it doesn't block |
325 | too much. | |
2d6fff63 | 326 | |
2d6fff63 | 327 | |
e0484344 DH |
328 | Write-Back Resource Management |
329 | ============================== | |
2d6fff63 | 330 | |
e0484344 DH |
331 | To write data to the cache from network filesystem writeback, the cache |
332 | resources required need to be pinned at the point the modification is made (for | |
333 | instance when the page is marked dirty) as it's not possible to open a file in | |
334 | a thread that's exiting. | |
2d6fff63 | 335 | |
e0484344 | 336 | The following facilities are provided to manage this: |
2d6fff63 | 337 | |
e0484344 DH |
338 | * An inode flag, ``I_PINNING_FSCACHE_WB``, is provided to indicate that an |
339 | in-use mark is held on the cookie for this inode. It can only be changed if the | |
340 | inode lock is held. | |
2d6fff63 | 341 | |
e0484344 DH |
342 | * A flag, ``unpinned_fscache_wb`` is placed in the ``writeback_control`` |
343 | struct that gets set if ``__writeback_single_inode()`` clears | |
344 | ``I_PINNING_FSCACHE_WB`` because all the dirty pages were cleared. | |
2d6fff63 | 345 | |
e0484344 | 346 | To support this, the following functions are provided:: |
2d6fff63 | 347 | |
8fb72b4a MWO |
348 | bool fscache_dirty_folio(struct address_space *mapping, |
349 | struct folio *folio, | |
350 | struct fscache_cookie *cookie); | |
e0484344 DH |
351 | void fscache_unpin_writeback(struct writeback_control *wbc, |
352 | struct fscache_cookie *cookie); | |
353 | void fscache_clear_inode_writeback(struct fscache_cookie *cookie, | |
354 | struct inode *inode, | |
355 | const void *aux); | |
2d6fff63 | 356 | |
e0484344 | 357 | The *dirty* function, ``fscache_dirty_folio()``, is intended to be called from the filesystem's |
8fb72b4a | 358 | ``dirty_folio`` address space operation. If ``I_PINNING_FSCACHE_WB`` is not |
e0484344 DH |
359 | set, it sets that flag and increments the use count on the cookie (the caller |
360 | must already have called ``fscache_use_cookie()``). | |
2d6fff63 | 361 | |
e0484344 DH |
362 | The *unpin* function is intended to be called from the filesystem's |
363 | ``write_inode`` superblock operation. It cleans up after writing by unusing | |
364 | the cookie if unpinned_fscache_wb is set in the writeback_control struct. | |
2d6fff63 | 365 | |
e0484344 DH |
366 | The *clear* function is intended to be called from the netfs's ``evict_inode`` |
367 | superblock operation. It must be called *after* | |
368 | ``truncate_inode_pages_final()``, but *before* ``clear_inode()``. This cleans | |
369 | up any hanging ``I_PINNING_FSCACHE_WB``. It also allows the coherency data to | |
370 | be updated. | |
2d6fff63 | 371 | |
2d6fff63 | 372 | |
e0484344 DH |
373 | Caching of Local Modifications |
374 | ============================== | |
402cb8dd | 375 | |
e0484344 DH |
376 | If a network filesystem has locally modified data that it wants to write to the |
377 | cache, it needs to mark the pages to indicate that a write is in progress, and | |
378 | if the mark is already present, it needs to wait for it to be removed first | |
379 | (presumably due to an already in-progress operation). This prevents multiple | |
380 | competing DIO writes to the same storage in the cache. | |
2d6fff63 | 381 | |
e0484344 DH |
382 | Firstly, the netfs should determine if caching is available by doing something |
383 | like:: | |
2d6fff63 | 384 | |
e0484344 | 385 | bool caching = fscache_cookie_enabled(cookie); |
ef778e7a | 386 | |
e0484344 DH |
387 | If caching is to be attempted, pages should be waited for and then marked using |
388 | the following functions provided by the netfs helper library:: | |
ef778e7a | 389 | |
e0484344 DH |
390 | void set_page_fscache(struct page *page); |
391 | void wait_on_page_fscache(struct page *page); | |
392 | int wait_on_page_fscache_killable(struct page *page); | |
ef778e7a | 393 | |
e0484344 DH |
394 | Once all the pages in the span are marked, the netfs can ask fscache to |
395 | schedule a write of that region:: | |
ef778e7a | 396 | |
e0484344 DH |
397 | void fscache_write_to_cache(struct fscache_cookie *cookie, |
398 | struct address_space *mapping, | |
399 | loff_t start, size_t len, loff_t i_size, | |
400 | netfs_io_terminated_t term_func, | |
401 | void *term_func_priv, | |
402 | bool caching); | |
ef778e7a | 403 | |
e0484344 DH |
404 | And if an error occurs before that point is reached, the marks can be removed |
405 | by calling:: | |
ef778e7a | 406 | |
2c547f29 | 407 | void fscache_clear_page_bits(struct address_space *mapping, |
e0484344 DH |
408 | loff_t start, size_t len, |
409 | bool caching); | |
ef778e7a | 410 | |
2c547f29 YH |
411 | In these functions, a pointer to the mapping to which the source pages are |
412 | attached is passed in and start and len indicate the size of the region that's | |
413 | going to be written (it doesn't have to align to page boundaries necessarily, | |
414 | but it does have to align to DIO boundaries on the backing filesystem). The | |
415 | caching parameter indicates whether caching should be attempted; if false, the | |
416 | functions do nothing. | |
417 | ||
418 | The write function takes some additional parameters: the cookie representing | |
419 | the cache object to be written to, i_size indicates the size of the netfs file | |
420 | and term_func indicates an optional completion function, to which | |
421 | term_func_priv will be passed, along with the error or amount written. | |
ef778e7a | 422 | |
e0484344 DH |
423 | Note that the write function will always run asynchronously and will unmark all |
424 | the pages upon completion before calling term_func. | |
2d6fff63 | 425 | |
2d6fff63 | 426 | |
e0484344 DH |
427 | Page Release and Invalidation |
428 | ============================= | |
2d6fff63 | 429 | |
e0484344 DH |
430 | Fscache keeps track of whether we have any data in the cache yet for a cache |
431 | object we've just created. It knows it doesn't have to do any reading until it | |
432 | has done a write and then the page it wrote from has been released by the VM, | |
433 | after which it *has* to look in the cache. | |
2d6fff63 | 434 | |
e0484344 | 435 | To inform fscache that a page might now be in the cache, the following function |
fa29000b | 436 | should be called from the ``release_folio`` address space op:: |
2d6fff63 | 437 | |
e0484344 | 438 | void fscache_note_page_release(struct fscache_cookie *cookie); |
2d6fff63 | 439 | |
fa29000b | 440 | if the page has been released (i.e. release_folio returned true). |
2d6fff63 | 441 | |
e0484344 DH |
442 | Page release and page invalidation should also wait for any mark left on the |
443 | page to say that a DIO write is underway from that page:: | |
2d6fff63 | 444 | |
e0484344 DH |
445 | void wait_on_page_fscache(struct page *page); |
446 | int wait_on_page_fscache_killable(struct page *page); | |
2d6fff63 | 447 | |
2d6fff63 | 448 | |
e0484344 DH |
449 | API Function Reference |
450 | ====================== | |
2d6fff63 | 451 | |
e0484344 | 452 | .. kernel-doc:: include/linux/fscache.h |
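
The Volume Registration section constrains the volume key: it must be a printable string, contain no '/' characters, should not begin with a '.', and should be no longer than 254 characters. The following userspace sketch shows one way those rules could be checked before a key is handed to ``fscache_acquire_volume()``. The helper names, the ``EXAMPLE_VOLUME_KEY_MAX`` constant and the ``<fsname>,<cell>,<volume id>`` layout are assumptions made for illustration; they are not part of the fscache API, and a real filesystem would use kernel helpers rather than the C library.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed limit, taken from the "no longer than 254 characters" rule. */
#define EXAMPLE_VOLUME_KEY_MAX 254

/*
 * Hypothetical helper: check a candidate volume key against the rules
 * stated in the Volume Registration section.
 */
static bool example_volume_key_ok(const char *key)
{
	size_t len = strlen(key);
	size_t i;

	if (len == 0 || len > EXAMPLE_VOLUME_KEY_MAX)
		return false;
	if (key[0] == '.')		/* shouldn't begin with a '.' */
		return false;
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)key[i];

		/* Must be printable and usable as a directory name. */
		if (c == '/' || !isprint(c))
			return false;
	}
	return true;
}

/*
 * Hypothetical helper: render "<fsname>,<cell>,<volume id>" into buf.
 * The key begins with the filesystem name, as the text recommends.
 * Returns 0 on success or -1 if the key is truncated or invalid.
 */
static int example_make_volume_key(char *buf, size_t size,
				   const char *fsname, const char *cell,
				   unsigned long vid)
{
	int n = snprintf(buf, size, "%s,%s,%08lx", fsname, cell, vid);

	if (n < 0 || (size_t)n >= size)
		return -1;
	return example_volume_key_ok(buf) ? 0 : -1;
}
```

A key that passes this check could then be passed as ``volume_key``; a key that fails would need to be re-encoded by the filesystem first, for instance by replacing '/' in a share name with another printable character.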