.. SPDX-License-Identifier: GPL-2.0

==============================
Network Filesystem Caching API
==============================

Fscache provides an API by which a network filesystem can make use of local
caching facilities. The API is arranged around a number of principles:

 (1) A cache is logically organised into volumes and data storage objects
     within those volumes.

 (2) Volumes and data storage objects are represented by various types of
     cookie.

 (3) Cookies have keys that distinguish them from their peers.

 (4) Cookies have coherency data that allows a cache to determine if the
     cached data is still valid.

 (5) I/O is done asynchronously where possible.

To use this API, include::

	#include <linux/fscache.h>

.. This document contains the following sections:

	 (1) Overview
	 (2) Volume registration
	 (3) Data file registration
	 (4) Declaring a cookie to be in use
	 (5) Resizing a data file (truncation)
	 (6) Data I/O API
	 (7) Data file coherency
	 (8) Data file invalidation
	 (9) Write back resource management
	(10) Caching of local modifications
	(11) Page release and invalidation


Overview
========

The fscache hierarchy is organised on two levels from a network filesystem's
point of view. The upper level represents "volumes" and the lower level
represents "data storage objects". These are represented by two types of
cookie, hereafter referred to as "volume cookies" and "cookies".

A network filesystem acquires a volume cookie for a volume using a volume key,
which represents all the information that defines that volume (e.g. cell name
or server address, volume ID or share name). This must be rendered as a
printable string that can be used as a directory name (i.e. it contains no '/'
characters and shouldn't begin with a '.'). The maximum name length is one
less than the maximum size of a filename component (allowing the cache backend
one char for its own purposes).

A filesystem would typically have a volume cookie for each superblock.

The filesystem then acquires a cookie for each file within that volume using an
object key. Object keys are binary blobs and only need to be unique within
their parent volume. The cache backend is responsible for rendering the binary
blob into something it can use and may employ hash tables, trees or whatever to
improve its ability to find an object. This is transparent to the network
filesystem.

A filesystem would typically have a cookie for each inode, and would acquire it
in iget and relinquish it when evicting the inode.

Once it has a cookie, the filesystem needs to mark the cookie as being in use.
This causes fscache to send the cache backend off to look up/create resources
for the cookie in the background, to check its coherency and, if necessary, to
mark the object as being under modification.

A filesystem would typically "use" the cookie in its file open routine and
unuse it in file release. It also needs to use the cookie around calls that
truncate the file locally, and it *also* needs to use the cookie when the
pagecache becomes dirty and unuse it when writeback is complete. This last
case is slightly tricky, and provision is made for it.

When performing a read, write or resize on a cookie, the filesystem must first
begin an operation. This copies the resources into a holding struct and puts
extra pins into the cache to stop cache withdrawal from tearing down the
structures being used. The actual operation can then be issued and conflicting
invalidations can be detected upon completion.

The filesystem is expected to use netfslib to access the cache, but that's not
actually required and it can use the fscache I/O API directly.


Volume Registration
===================

The first step for a network filesystem is to acquire a volume cookie for the
volume it wants to access::

	struct fscache_volume *
	fscache_acquire_volume(const char *volume_key,
			       const char *cache_name,
			       const void *coherency_data,
			       size_t coherency_len);

This function creates a volume cookie with the specified volume key as its name
and notes the coherency data.

The volume key must be a printable string with no '/' characters in it. It
should begin with the name of the filesystem and should be no longer than 254
characters. It should uniquely represent the volume and will be matched with
what's stored in the cache.
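
The key rules above can be checked before the key is handed to
``fscache_acquire_volume()``. The following is a minimal userspace sketch, not
kernel code: ``volume_key_is_valid()`` and ``VOLUME_KEY_MAX`` are hypothetical
names introduced here for illustration, and the check also rejects a leading
'.', since the key has to be usable as a directory name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Assumed limit, taken from the "no longer than 254 characters" rule. */
#define VOLUME_KEY_MAX 254

/*
 * Hypothetical helper (not part of fscache): return true if the key is a
 * non-empty printable string, has no '/', does not start with '.', and
 * fits within VOLUME_KEY_MAX characters.
 */
bool volume_key_is_valid(const char *key)
{
	size_t len, i;

	assert(key != NULL);
	len = strlen(key);
	if (len == 0 || len > VOLUME_KEY_MAX)
		return false;
	if (key[0] == '.')
		return false;
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)key[i];

		if (c == '/' || !isprint(c))
			return false;
	}
	return true;
}
```

A key such as ``"afs,example.org,20000"`` passes this check, whereas
``"nfs/export"`` does not because of the '/'.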

The caller may also specify the name of the cache to use. If specified,
fscache will look up or create a cache cookie of that name and will use a cache
of that name if it is online or comes online. If no cache name is specified,
it will use the first cache that comes to hand and set the name to that.

The specified coherency data is stored in the cookie and will be matched
against coherency data stored on disk. The data pointer may be NULL if no data
is provided. If the coherency data doesn't match, the entire cache volume will
be invalidated.

This function can return errors such as EBUSY if the volume key is already in
use by an acquired volume or ENOMEM if an allocation failure occurred. It may
also return a NULL volume cookie if fscache is not enabled. It is safe to
pass a NULL cookie to any function that takes a volume cookie. This will
cause that function to do nothing.


When the network filesystem has finished with a volume, it should relinquish it
by calling::

	void fscache_relinquish_volume(struct fscache_volume *volume,
				       const void *coherency_data,
				       bool invalidate);

This will cause the volume to be committed or removed, and if sealed the
coherency data will be set to the value supplied. The amount of coherency data
must match the length specified when the volume was acquired. Note that all
data cookies obtained in this volume must be relinquished before the volume is
relinquished.


Data File Registration
======================

Once it has a volume cookie, a network filesystem can use it to acquire a
cookie for data storage::

	struct fscache_cookie *
	fscache_acquire_cookie(struct fscache_volume *volume,
			       u8 advice,
			       const void *index_key,
			       size_t index_key_len,
			       const void *aux_data,
			       size_t aux_data_len,
			       loff_t object_size);

This creates the cookie in the volume using the specified index key. The index
key is a binary blob of the given length and must be unique for the volume.
This is saved into the cookie. There are no restrictions on the content, but
its length shouldn't exceed about three quarters of the maximum filename length
to allow for encoding.
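
To see where a three-quarters figure can come from, consider a backend that
renders the binary key with a base64-style encoding, where every 3 bytes become
4 printable characters. This is an assumption made for illustration; the
encoding actually used is up to the cache backend. ``NAME_MAX_ASSUMED``,
``encoded_key_len()`` and ``index_key_fits()`` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed maximum length of a filename component on the backing fs. */
#define NAME_MAX_ASSUMED 255

/* Length of n key bytes after an assumed 3-to-4 (base64-style) expansion. */
size_t encoded_key_len(size_t n)
{
	assert(n <= SIZE_MAX / 2);	/* keep the arithmetic from overflowing */
	return 4 * ((n + 2) / 3);
}

/* True if a key of n bytes still fits in one filename once encoded. */
bool index_key_fits(size_t n)
{
	return n > 0 && encoded_key_len(n) <= NAME_MAX_ASSUMED;
}
```

Under these assumptions a 189-byte key encodes to 252 characters and fits,
while a 192-byte key encodes to 256 characters and does not.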

The caller should also pass in a piece of coherency data in aux_data. A buffer
of size aux_data_len will be allocated and the coherency data copied in. It is
assumed that the size is invariant over time. The coherency data is used to
check the validity of data in the cache. Functions are provided by which the
coherency data can be updated.

The file size of the object being cached should also be provided. This may be
used to trim the data and will be stored with the coherency data.

This function never returns an error, though it may return a NULL cookie on
allocation failure or if fscache is not enabled. It is safe to pass in a NULL
volume cookie and pass the NULL cookie returned to any function that takes it.
This will cause that function to do nothing.


When the network filesystem has finished with a cookie, it should relinquish it
by calling::

	void fscache_relinquish_cookie(struct fscache_cookie *cookie,
				       bool retire);

This will cause fscache to either commit the storage backing the cookie or
delete it.


Marking A Cookie In-Use
=======================

Once a cookie has been acquired by a network filesystem, the filesystem should
tell fscache when it intends to use the cookie (typically done on file open)
and should say when it has finished with it (typically on file close)::

	void fscache_use_cookie(struct fscache_cookie *cookie,
				bool will_modify);
	void fscache_unuse_cookie(struct fscache_cookie *cookie,
				  const void *aux_data,
				  const loff_t *object_size);

The *use* function tells fscache that it will use the cookie and, additionally,
indicates if the user is intending to modify the contents locally. If not yet
done, this will trigger the cache backend to go and gather the resources it
needs to access/store data in the cache. This is done in the background, and
so may not be complete by the time the function returns.

The *unuse* function indicates that a filesystem has finished using a cookie.
It optionally updates the stored coherency data and object size and then
decreases the in-use counter. When the last user unuses the cookie, it is
scheduled for garbage collection. If not reused within a short time, the
resources will be released to reduce system resource consumption.

A cookie must be marked in-use before it can be accessed for read, write or
resize - and an in-use mark must be kept whilst there is dirty data in the
pagecache in order to avoid an oops due to trying to open a file during process
exit.

Note that in-use marks are cumulative. Each time a cookie is marked in-use, it
must later be unused.
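
The cumulative behaviour can be pictured with a small userspace model. This is
only a sketch of the counting rule described above; ``struct cookie_model`` and
the ``model_*`` functions are hypothetical and do not reflect how fscache
stores its state internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; not the real struct fscache_cookie. */
struct cookie_model {
	int  n_active;		/* outstanding in-use marks */
	bool gc_scheduled;	/* set when the last user goes away */
};

/* Models fscache_use_cookie(): each call adds one in-use mark. */
void model_use_cookie(struct cookie_model *c)
{
	c->n_active++;
	c->gc_scheduled = false;	/* reuse cancels pending collection */
}

/* Models fscache_unuse_cookie(): each call removes one in-use mark. */
void model_unuse_cookie(struct cookie_model *c)
{
	assert(c->n_active > 0);	/* every unuse must pair with a use */
	if (--c->n_active == 0)
		c->gc_scheduled = true;
}
```

With two opens of the same file, the first close leaves the model cookie in
use; only the second close schedules it for collection.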


Resizing A Data File (Truncation)
=================================

If a network filesystem file is resized locally by truncation, the following
should be called to notify the cache::

	void fscache_resize_cookie(struct fscache_cookie *cookie,
				   loff_t new_size);

The caller must have first marked the cookie in-use. The cookie and the new
size are passed in and the cache is synchronously resized. This is expected to
be called from the ``->setattr()`` inode operation under the inode lock.


Data I/O API
============

To do data I/O operations directly through a cookie, the following functions
are available::

	int fscache_begin_read_operation(struct netfs_cache_resources *cres,
					 struct fscache_cookie *cookie);
	int fscache_read(struct netfs_cache_resources *cres,
			 loff_t start_pos,
			 struct iov_iter *iter,
			 enum netfs_read_from_hole read_hole,
			 netfs_io_terminated_t term_func,
			 void *term_func_priv);
	int fscache_write(struct netfs_cache_resources *cres,
			  loff_t start_pos,
			  struct iov_iter *iter,
			  netfs_io_terminated_t term_func,
			  void *term_func_priv);

The *begin* function sets up an operation, attaching the resources required to
the cache resources block from the cookie. Assuming it doesn't return an error
(for instance, it will return -ENOBUFS if given a NULL cookie, but otherwise do
nothing), then one of the other two functions can be issued.

The *read* and *write* functions initiate a direct-IO operation. Both take the
previously set up cache resources block, an indication of the start file
position, and an I/O iterator that describes the buffer and indicates the
amount of data.

The read function also takes a parameter to indicate how it should handle a
partially populated region (a hole) in the disk content. The options are to
ignore it, to skip over an initial hole and place zeros in the buffer, or to
give an error.

The read and write functions can be given an optional termination function that
will be run on completion::

	typedef
	void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error,
				      bool was_async);

If a termination function is given, the operation will be run asynchronously
and the termination function will be called upon completion. If not given, the
operation will be run synchronously. Note that in the asynchronous case, it is
possible for the operation to complete before the function returns.
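
A termination function has to separate the two meanings of its second argument.
The sketch below is a userspace illustration that redeclares the typedef so it
can be compiled outside the kernel; ``struct io_result`` and
``record_io_result()`` are hypothetical names, and treating a negative value as
a negated errno is an assumption based on the parameter name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Same shape as the kernel typedef, redeclared here for userspace. */
typedef void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error,
				      bool was_async);

/* Hypothetical per-request record passed as term_func_priv. */
struct io_result {
	bool	done;
	size_t	bytes;	/* valid when error == 0 */
	int	error;	/* 0, or a positive errno value */
};

/* Split transferred_or_error into a byte count or an error code. */
void record_io_result(void *priv, ssize_t transferred_or_error, bool was_async)
{
	struct io_result *res = priv;

	(void)was_async;	/* not needed by this simple recorder */
	assert(res != NULL);
	if (transferred_or_error < 0) {
		res->bytes = 0;
		res->error = (int)-transferred_or_error;
	} else {
		res->bytes = (size_t)transferred_or_error;
		res->error = 0;
	}
	res->done = true;
}
```

Because completion may happen before the issuing call returns, a real handler
must not assume that the caller has finished setting up its own state.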

Both the read and write functions end the operation when they complete,
detaching any pinned resources.

The read operation will fail with -ESTALE if invalidation occurred whilst the
operation was ongoing.


Data File Coherency
===================

To request an update of the coherency data and file size on a cookie, the
following should be called::

	void fscache_update_cookie(struct fscache_cookie *cookie,
				   const void *aux_data,
				   const loff_t *object_size);

This will update the cookie's coherency data and/or file size.


Data File Invalidation
======================

Sometimes it will be necessary to invalidate an object that contains data.
Typically this will be necessary when the server informs the network filesystem
of a remote third-party change - at which point the filesystem has to throw
away the state and cached data that it had for a file and reload from the
server.

To indicate that a cache object should be invalidated, the following should be
called::

	void fscache_invalidate(struct fscache_cookie *cookie,
				const void *aux_data,
				loff_t size,
				unsigned int flags);

This increases the invalidation counter in the cookie to cause outstanding
reads to fail with -ESTALE, sets the coherency data and file size from the
information supplied, blocks new I/O on the cookie and dispatches the cache to
go and get rid of the old data.

Invalidation runs asynchronously in a worker thread so that it doesn't block
too much.


Write-Back Resource Management
==============================

To write data to the cache from network filesystem writeback, the cache
resources required need to be pinned at the point the modification is made (for
instance when the page is marked dirty) as it's not possible to open a file in
a thread that's exiting.

The following facilities are provided to manage this:

 * An inode flag, ``I_PINNING_FSCACHE_WB``, is provided to indicate that an
   in-use mark is held on the cookie for this inode. It can only be changed if
   the inode lock is held.

 * A flag, ``unpinned_fscache_wb``, is placed in the ``writeback_control``
   struct that gets set if ``__writeback_single_inode()`` clears
   ``I_PINNING_FSCACHE_WB`` because all the dirty pages were cleared.

To support this, the following functions are provided::

	bool fscache_dirty_folio(struct address_space *mapping,
				 struct folio *folio,
				 struct fscache_cookie *cookie);
	void fscache_unpin_writeback(struct writeback_control *wbc,
				     struct fscache_cookie *cookie);
	void fscache_clear_inode_writeback(struct fscache_cookie *cookie,
					   struct inode *inode,
					   const void *aux);

The *dirty_folio* function is intended to be called from the filesystem's
``dirty_folio`` address space operation. If ``I_PINNING_FSCACHE_WB`` is not
set, it sets that flag and increments the use count on the cookie (the caller
must already have called ``fscache_use_cookie()``).

The *unpin* function is intended to be called from the filesystem's
``write_inode`` superblock operation. It cleans up after writing by unusing
the cookie if ``unpinned_fscache_wb`` is set in the ``writeback_control``
struct.

The *clear* function is intended to be called from the netfs's ``evict_inode``
superblock operation. It must be called *after*
``truncate_inode_pages_final()``, but *before* ``clear_inode()``. This cleans
up any hanging ``I_PINNING_FSCACHE_WB``. It also allows the coherency data to
be updated.
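
The interplay between the inode flag, the ``writeback_control`` flag and the
cookie's use count can be modelled in userspace as follows. This is a sketch
of the protocol as described above, not kernel code; all ``*_model`` types and
``model_*`` functions are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

struct wb_inode_model   { bool pinning_fscache_wb; };	/* I_PINNING_FSCACHE_WB */
struct wb_control_model { bool unpinned_fscache_wb; };
struct wb_cookie_model  { int n_active; };		/* in-use marks held */

/* Models fscache_dirty_folio(): take one pin per dirty period. */
void model_dirty_folio(struct wb_inode_model *inode,
		       struct wb_cookie_model *cookie)
{
	assert(cookie->n_active > 0);	/* caller already uses the cookie */
	if (!inode->pinning_fscache_wb) {
		inode->pinning_fscache_wb = true;
		cookie->n_active++;
	}
}

/* Models writeback clearing the flag once all dirty pages are clean. */
void model_writeback_done(struct wb_inode_model *inode,
			  struct wb_control_model *wbc)
{
	if (inode->pinning_fscache_wb) {
		inode->pinning_fscache_wb = false;
		wbc->unpinned_fscache_wb = true;
	}
}

/* Models fscache_unpin_writeback(): drop the pin writeback released. */
void model_unpin_writeback(const struct wb_control_model *wbc,
			   struct wb_cookie_model *cookie)
{
	if (wbc->unpinned_fscache_wb)
		cookie->n_active--;
}
```

Dirtying several folios takes only one extra in-use mark in this model, and
that mark is dropped once writeback reports that it cleared the pinning flag.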


Caching of Local Modifications
==============================

If a network filesystem has locally modified data that it wants to write to the
cache, it needs to mark the pages to indicate that a write is in progress, and
if the mark is already present, it needs to wait for it to be removed first
(presumably due to an already in-progress operation). This prevents multiple
competing DIO writes to the same storage in the cache.

Firstly, the netfs should determine if caching is available by doing something
like::

	bool caching = fscache_cookie_enabled(cookie);

If caching is to be attempted, pages should be waited for and then marked using
the following functions provided by the netfs helper library::

	void set_page_fscache(struct page *page);
	void wait_on_page_fscache(struct page *page);
	int wait_on_page_fscache_killable(struct page *page);

Once all the pages in the span are marked, the netfs can ask fscache to
schedule a write of that region::

	void fscache_write_to_cache(struct fscache_cookie *cookie,
				    struct address_space *mapping,
				    loff_t start, size_t len, loff_t i_size,
				    netfs_io_terminated_t term_func,
				    void *term_func_priv,
				    bool caching);

And if an error occurs before that point is reached, the marks can be removed
by calling::

	void fscache_clear_page_bits(struct address_space *mapping,
				     loff_t start, size_t len,
				     bool caching);

In these functions, a pointer to the mapping to which the source pages are
attached is passed in and start and len indicate the size of the region that's
going to be written (it doesn't have to align to page boundaries necessarily,
but it does have to align to DIO boundaries on the backing filesystem). The
caching parameter indicates whether caching should be attempted; if it is
false, the functions do nothing.
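
A caller may want to check a region against the DIO granule before scheduling
the write. The userspace sketch below assumes a power-of-two granule supplied
by the caller; the actual granule depends on the backing filesystem, and
``region_is_dio_aligned()`` and ``region_round_out()`` are hypothetical helpers,
not fscache functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if both start and len are multiples of blksz (a power of two). */
bool region_is_dio_aligned(uint64_t start, uint64_t len, uint64_t blksz)
{
	assert(blksz != 0 && (blksz & (blksz - 1)) == 0);
	return ((start | len) & (blksz - 1)) == 0;
}

/* Expand [start, start + len) outwards to the nearest blksz boundaries. */
void region_round_out(uint64_t *start, uint64_t *len, uint64_t blksz)
{
	uint64_t end;

	assert(blksz != 0 && (blksz & (blksz - 1)) == 0);
	end = (*start + *len + blksz - 1) & ~(blksz - 1);
	*start &= ~(blksz - 1);
	*len = end - *start;
}
```

With an assumed 4096-byte granule, a region starting at 100 with length 5000
is not aligned, and rounding it out gives start 0 and length 8192. Whether the
extra bytes may be written is for the filesystem to decide.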

The write function takes some additional parameters: the cookie representing
the cache object to be written to, i_size indicates the size of the netfs file
and term_func indicates an optional completion function, to which
term_func_priv will be passed, along with the error or amount written.

Note that the write function will always run asynchronously and will unmark all
the pages upon completion before calling term_func.


Page Release and Invalidation
=============================

Fscache keeps track of whether we have any data in the cache yet for a cache
object we've just created. It knows it doesn't have to do any reading until it
has done a write and then the page it wrote from has been released by the VM,
after which it *has* to look in the cache.

To inform fscache that a page might now be in the cache, the following function
should be called from the ``release_folio`` address space op::

	void fscache_note_page_release(struct fscache_cookie *cookie);

if the page has been released (i.e. ``release_folio`` returned true).

Page release and page invalidation should also wait for any mark left on the
page to say that a DIO write is underway from that page::

	void wait_on_page_fscache(struct page *page);
	int wait_on_page_fscache_killable(struct page *page);


API Function Reference
======================

.. kernel-doc:: include/linux/fscache.h