.. SPDX-License-Identifier: GPL-2.0

===================================
File management in the Linux kernel
===================================

This document describes how locking for files (struct file)
and the file descriptor table (struct files_struct) works.

Up until 2.6.12, the file descriptor table was protected
with a lock (files->file_lock) and a reference count (files->count).
->file_lock protected accesses to all the file-related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with the CLONE_FILES flag. Typically
this would be the case for POSIX threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected using
a reference count (->f_count).
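
The following is a minimal userspace sketch of that sharing scheme, not kernel code. It assumes C11 atomics, and struct files_model, files_model_get() and files_model_put() are hypothetical stand-ins for files_struct, the CLONE_FILES sharing path and put_files_struct(). The ->file_lock that guarded the table contents is left out because only the reference counting is shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for files_struct: a share count plus the fd array.
 * The old ->file_lock is omitted; only the reference counting is shown. */
struct files_model {
	atomic_int count;  /* number of tasks sharing this table */
	int max_fds;
	void **fd;         /* "struct file" pointers, indexed by fd */
};

int files_model_frees;  /* how many tables were freed, for observation */

struct files_model *files_model_alloc(int max_fds)
{
	struct files_model *files = malloc(sizeof(*files));

	if (!files)
		return NULL;
	files->fd = calloc((size_t)max_fds, sizeof(*files->fd));
	if (!files->fd) {
		free(files);
		return NULL;
	}
	files->max_fds = max_fds;
	atomic_init(&files->count, 1);  /* the creating task's reference */
	return files;
}

/* Models clone(CLONE_FILES): the new task shares the same table. */
struct files_model *files_model_get(struct files_model *files)
{
	atomic_fetch_add(&files->count, 1);
	return files;
}

/* Models put_files_struct(): the last task to drop its reference frees it. */
void files_model_put(struct files_model *files)
{
	if (atomic_fetch_sub(&files->count, 1) == 1) {
		free(files->fd);
		free(files);
		files_model_frees++;
	}
}
```

A parent and a CLONE_FILES child each drop one reference when they exit; only the second files_model_put() frees the table.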

In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements: the fd sets (open_fds and close_on_exec), the
array of file pointers, the sizes of the sets and the array,
etc. In order for the updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are in a separate structure, struct fdtable.
files_struct contains a pointer to struct fdtable through
which the actual fd table is accessed. Initially the
fdtable is embedded in files_struct itself. On a subsequent
expansion of the fdtable, a new fdtable structure is allocated
and files->fdt is updated to point to the new structure. The old
fdtable structure is freed via RCU, so lock-free readers see
either the old fdtable or the new fdtable, making the update
appear atomic. Here are the locking rules for
the fdtable structure:

1. All references to the fdtable must be done through
   the files_fdtable() macro::

        struct fdtable *fdt;

        rcu_read_lock();

        fdt = files_fdtable(files);
        ....
        if (n < fdt->max_fds)
                ....
        ...
        rcu_read_unlock();

   files_fdtable() uses the rcu_dereference() macro, which takes care of
   the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either the lookup_fd_rcu() or the files_lookup_fd_rcu() API. These
   take care of barrier requirements due to lock-free lookup.

   An example::

        struct file *file;

        rcu_read_lock();
        file = lookup_fd_rcu(fd);
        if (file) {
                ...
        }
        ....
        rcu_read_unlock();

5. Handling of the file structures is special. Since fd look-ups
   (fget()/fget_light()) are lock-free, a look-up may race with
   the last put() operation on the file structure. This is avoided
   by using atomic_long_inc_not_zero() on ->f_count::

        rcu_read_lock();
        file = files_lookup_fd_rcu(files, fd);
        if (file) {
                if (atomic_long_inc_not_zero(&file->f_count))
                        *fput_needed = 1;
                else
                        /* Didn't get the reference, someone's freed */
                        file = NULL;
        }
        rcu_read_unlock();
        ....
        return file;

   atomic_long_inc_not_zero() detects whether the refcount is already zero
   or drops to zero before the increment completes. In that case no
   reference is taken and fget()/fget_light() fails.

6. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using the rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However, it is advisable to use files_fdtable()
   and lookup_fd_rcu()/files_lookup_fd_rcu(), which take care of these issues.

7. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread may expand the table, creating a new
   fdtable and making the earlier fdtable pointer stale.

   For example::

        spin_lock(&files->file_lock);
        fd = locate_fd(files, file, start);
        if (fd >= 0) {
                /* locate_fd() may have expanded fdtable, load the ptr */
                fdt = files_fdtable(files);
                __set_open_fd(fd, fdt);
                __clear_close_on_exec(fd, fdt);
        }
        spin_unlock(&files->file_lock);
        .....

   Since locate_fd() can drop and reacquire ->file_lock,
   the fdtable pointer (fdt) must be loaded after locate_fd() returns.
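
The fdtable rules can be approximated in userspace C11 as follows. This is a sketch under stated assumptions, not kernel code: struct files_model, struct fdtable_model and the model_* functions are hypothetical. An acquire load stands in for rcu_dereference(), a release store for rcu_assign_pointer(), and a spinning atomic_flag plays the role of files->file_lock. model_synchronize_and_free() only stands in for waiting out an RCU grace period, so it is correct only when no reader still holds an old table pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MODEL_EMBEDDED_FDS 4  /* hypothetical size of the embedded table */

/* Hypothetical model of struct fdtable (not the kernel's layout). */
struct fdtable_model {
	int max_fds;
	_Atomic(void *) *fd;                 /* slots hold "struct file" pointers */
	struct fdtable_model *retired_next;  /* links replaced tables */
};

/* Hypothetical model of files_struct. */
struct files_model {
	atomic_flag file_lock;                /* role of files->file_lock */
	_Atomic(struct fdtable_model *) fdt;  /* role of files->fdt */
	struct fdtable_model fdtab;           /* embedded initial table */
	_Atomic(void *) fd_array[MODEL_EMBEDDED_FDS];
	struct fdtable_model *retired;        /* replaced tables awaiting free */
};

void files_model_init(struct files_model *files)
{
	atomic_flag_clear(&files->file_lock);
	for (int i = 0; i < MODEL_EMBEDDED_FDS; i++)
		atomic_init(&files->fd_array[i], NULL);
	files->fdtab.max_fds = MODEL_EMBEDDED_FDS;
	files->fdtab.fd = files->fd_array;
	files->fdtab.retired_next = NULL;
	files->retired = NULL;
	atomic_init(&files->fdt, &files->fdtab);
}

/* Acquire load: userspace stand-in for rcu_dereference() in files_fdtable(). */
struct fdtable_model *model_fdtable(struct files_model *files)
{
	return atomic_load_explicit(&files->fdt, memory_order_acquire);
}

/* Lock-free reader in the spirit of files_lookup_fd_rcu(); kernel code would
 * run it between rcu_read_lock() and rcu_read_unlock(). */
void *model_lookup_fd(struct files_model *files, int fd)
{
	struct fdtable_model *fdt = model_fdtable(files);
	if (fd < 0 || fd >= fdt->max_fds)
		return NULL;
	return atomic_load_explicit(&fdt->fd[fd], memory_order_acquire);
}

/* Caller holds file_lock. Builds a larger copy, then publishes it with a
 * release store (userspace analogue of rcu_assign_pointer()). */
static int model_expand(struct files_model *files, int nr)
{
	struct fdtable_model *old = model_fdtable(files);
	struct fdtable_model *nfdt = malloc(sizeof(*nfdt));
	if (!nfdt || !(nfdt->fd = malloc((size_t)nr * sizeof(*nfdt->fd)))) {
		free(nfdt);
		return -1;
	}
	nfdt->max_fds = nr;
	nfdt->retired_next = NULL;
	for (int i = 0; i < nr; i++) {
		void *f = i < old->max_fds ?
			atomic_load_explicit(&old->fd[i], memory_order_relaxed) : NULL;
		atomic_init(&nfdt->fd[i], f);
	}
	atomic_store_explicit(&files->fdt, nfdt, memory_order_release);
	if (old != &files->fdtab) {  /* the embedded table is never freed */
		old->retired_next = files->retired;
		files->retired = old;
	}
	return 0;
}

/* Writer: install file at fd under file_lock, expanding the table if needed. */
int model_install_fd(struct files_model *files, int fd, void *file)
{
	int ret = 0;
	if (fd < 0)
		return -1;
	while (atomic_flag_test_and_set_explicit(&files->file_lock, memory_order_acquire))
		;  /* spin, like spin_lock() */
	if (fd >= model_fdtable(files)->max_fds)
		ret = model_expand(files, fd + 1);
	if (ret == 0) {
		/* Reload: the expansion may have replaced the table (rule 7). */
		struct fdtable_model *fdt = model_fdtable(files);
		atomic_store_explicit(&fdt->fd[fd], file, memory_order_release);
	}
	atomic_flag_clear_explicit(&files->file_lock, memory_order_release);
	return ret;
}

/* Stand-in for waiting out an RCU grace period, then freeing replaced tables.
 * Only safe once no reader can still hold a pointer to them. */
void model_synchronize_and_free(struct files_model *files)
{
	while (files->retired) {
		struct fdtable_model *old = files->retired;
		files->retired = old->retired_next;
		free(old->fd);
		free(old);
	}
}
```

A reader that loaded the old table before an expansion keeps seeing a complete, consistent table, while new readers see the larger one; that is the "old or new, never half-updated" property the RCU scheme provides. Kernel code should still rely on files_fdtable(), files_lookup_fd_rcu() and friends rather than hand-rolled barriers, as rule 6 advises.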
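
The retry logic behind rule 5 can be shown with a compare-and-swap loop. This userspace C11 sketch reproduces only the documented semantics of atomic_long_inc_not_zero() (increment unless zero, and report whether it did). inc_not_zero(), struct file_model and the model_*_file() helpers are hypothetical, and the sketch does not model how the memory of a struct file stays valid for RCU readers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of atomic_long_inc_not_zero() semantics: increment
 * *v unless it is zero, and report whether the increment happened. */
bool inc_not_zero(atomic_long *v)
{
	long old = atomic_load(v);

	do {
		if (old == 0)
			return false;  /* count already hit zero: object is going away */
		/* On failure, old is refreshed with the current value and re-checked,
		 * so a drop to zero that races with us is detected. */
	} while (!atomic_compare_exchange_weak(v, &old, old + 1));
	return true;
}

/* Hypothetical file object; only the reference count is modelled. */
struct file_model {
	atomic_long f_count;
};

/* Get side of rule 5: returns file with a new reference, or NULL. */
struct file_model *model_get_file(struct file_model *file)
{
	if (file && inc_not_zero(&file->f_count))
		return file;
	return NULL;  /* didn't get the reference, someone's freed it */
}

/* Put side: returns true when the caller dropped the last reference. */
bool model_put_file(struct file_model *file)
{
	return atomic_fetch_sub(&file->f_count, 1) == 1;
}
```

If the last reference is dropped between the lookup and the increment, the compare-and-swap fails, the loop then sees zero and the get fails, which is the case the "Didn't get the reference" branch in the rule 5 example handles.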