Commit | Line | Data |
---|---|---|
4b22ff13 | 1 | |
b6bb226a | 2 | Miscellaneous Device control operations for the autofs kernel module |
4b22ff13 IK |
3 | ==================================================================== |
4 | ||
5 | The problem | |
6 | =========== | |
7 | ||
8 | There is a problem with active restarts in autofs (that is to say | |
9 | restarting autofs when there are busy mounts). | |
10 | ||
11 | During normal operation autofs uses a file descriptor opened on the | |
12 | directory that is being managed in order to be able to issue control | |
13 | operations. Using a file descriptor gives ioctl operations access to | |
14 | autofs specific information stored in the super block. The operations | |
15 | are things such as setting an autofs mount catatonic, setting the | |
16 | expire timeout and requesting expire checks. As is explained below, | |
17 | certain types of autofs triggered mounts can end up covering an autofs | |
18 | mount itself which prevents us being able to use open(2) to obtain a | |
19 | file descriptor for these operations if we don't already have one open. | |
20 | ||
21 | Currently autofs uses "umount -l" (lazy umount) to clear active mounts | |
22 | at restart. While using lazy umount works for most cases, anything that | |
23 | needs to walk back up the mount tree to construct a path, such as | |
24 | getcwd(2) and the proc file system /proc/<pid>/cwd, no longer works | |
25 | because the point from which the path is constructed has been detached | |
26 | from the mount tree. | |
27 | ||
28 | The actual problem with autofs is that it can't reconnect to existing | |
29 | mounts. One might think that just adding the ability to remount | |
30 | autofs file systems would solve it, but alas, that can't work. This is | |
31 | because autofs direct mounts and the implementation of "on demand mount | |
32 | and expire" of nested mount trees have the file system mounted directly | |
33 | on top of the mount trigger directory dentry. | |
34 | ||
35 | For example, there are two types of automount maps, direct (in the kernel | |
36 | module source you will see a third type called an offset, which is just | |
37 | a direct mount in disguise) and indirect. | |
38 | ||
39 | Here is a master map with direct and indirect map entries: | |
40 | ||
41 | /- /etc/auto.direct | |
42 | /test /etc/auto.indirect | |
43 | ||
44 | and the corresponding map files: | |
45 | ||
46 | /etc/auto.direct: | |
47 | ||
48 | /automount/dparse/g6 budgie:/autofs/export1 | |
49 | /automount/dparse/g1 shark:/autofs/export1 | |
50 | and so on. | |
51 | ||
52 | /etc/auto.indirect: | |
53 | ||
54 | g1 shark:/autofs/export1 | |
55 | g6 budgie:/autofs/export1 | |
56 | and so on. | |
57 | ||
58 | For the above indirect map an autofs file system is mounted on /test and | |
59 | mounts are triggered for each sub-directory key by the inode lookup | |
60 | operation. So we see a mount of shark:/autofs/export1 on /test/g1, for | |
61 | example. | |
62 | ||
63 | The way that direct mounts are handled is by making an autofs mount on | |
64 | each full path, such as /automount/dparse/g1, and using it as a mount | |
65 | trigger. So when we walk on the path we mount shark:/autofs/export1 "on | |
66 | top of this mount point". Since these are always directories we can | |
67 | use the follow_link inode operation to trigger the mount. | |
68 | ||
69 | But, each entry in direct and indirect maps can have offsets (making | |
70 | them multi-mount map entries). | |
71 | ||
72 | For example, an indirect mount map entry could also be: | |
73 | ||
74 | g1 \ | |
75 | / shark:/autofs/export5/testing/test \ | |
76 | /s1 shark:/autofs/export/testing/test/s1 \ | |
77 | /s2 shark:/autofs/export5/testing/test/s2 \ | |
78 | /s1/ss1 shark:/autofs/export1 \ | |
79 | /s2/ss2 shark:/autofs/export2 | |
80 | ||
81 | and similarly a direct mount map entry could also be: | |
82 | ||
83 | /automount/dparse/g1 \ | |
84 | / shark:/autofs/export5/testing/test \ | |
85 | /s1 shark:/autofs/export/testing/test/s1 \ | |
86 | /s2 shark:/autofs/export5/testing/test/s2 \ | |
87 | /s1/ss1 shark:/autofs/export2 \ | |
88 | /s2/ss2 shark:/autofs/export2 | |
89 | ||
90 | One of the issues with version 4 of autofs was that, when mounting an | |
91 | entry with a large number of offsets, possibly with nesting, we needed | |
92 | to mount and umount all of the offsets as a single unit. Not really a | |
93 | problem, except for people with a large number of offsets in map entries. | |
94 | This mechanism is used for the well known "hosts" map and we have seen | |
95 | cases (in 2.4) where the available number of mounts is exhausted or | |
96 | where the number of privileged ports available is exhausted. | |
97 | ||
98 | In version 5 we mount only as we go down the tree of offsets and | |
99 | similarly for expiring them which resolves the above problem. There is | |
100 | somewhat more detail to the implementation but it isn't needed for the | |
101 | sake of the problem explanation. The one important detail is that these | |
102 | offsets are implemented using the same mechanism as the direct mounts | |
103 | above and so the mount points can be covered by a mount. | |
104 | ||
105 | The current autofs implementation uses an ioctl file descriptor opened | |
106 | on the mount point for control operations. The references held by the | |
107 | descriptor are accounted for in checks made to determine if a mount is | |
108 | in use, and the descriptor is also used to access autofs file system | |
109 | information held in the mount super block. So the use of a file handle | |
110 | needs to be retained. | |
111 | ||
112 | ||
113 | The Solution | |
114 | ============ | |
115 | ||
116 | To be able to restart autofs leaving existing direct, indirect and | |
117 | offset mounts in place we need to be able to obtain a file handle | |
118 | for these potentially covered autofs mount points. Rather than just | |
119 | implement an isolated operation it was decided to re-implement the | |
120 | existing ioctl interface and add new operations to provide this | |
121 | functionality. | |
122 | ||
123 | In addition, to be able to reconstruct a mount tree that has busy mounts, | |
124 | the uid and gid of the last user that triggered the mount need to be | |
125 | available because these can be used as macro substitution variables in | |
126 | autofs maps. They are recorded at mount request time and an operation | |
127 | has been added to retrieve them. | |
128 | ||
129 | Since we're re-implementing the control interface, a couple of other | |
130 | problems with the existing interface have been addressed. First, when | |
131 | a mount or expire operation completes a status is returned to the | |
132 | kernel by either a "send ready" or a "send fail" operation. The | |
133 | "send fail" operation of the ioctl interface could only ever send | |
134 | ENOENT so the re-implementation allows user space to send an actual | |
135 | status. Second, for those using very large maps, discovering if a | |
136 | mount is present is an expensive operation in user space. Usually this | |
137 | involves scanning /proc/mounts and since it needs to be done quite | |
138 | often it can introduce significant overhead when there are many entries | |
139 | in the mount table. An operation to look up the mount status of a mount | |
140 | point dentry (covered or not) has also been added. | |
141 | ||
142 | Current kernel development policy recommends avoiding the use of the | |
143 | ioctl mechanism in favor of systems such as Netlink. An implementation | |
144 | using this system was attempted to evaluate its suitability and it was | |
145 | found to be inadequate, in this case. The Generic Netlink system was | |
146 | used for this as raw Netlink would lead to a significant increase in | |
147 | complexity. There's no question that the Generic Netlink system is an | |
148 | elegant solution for common case ioctl functions but it's not a complete | |
a33f3224 | 149 | replacement probably because its primary purpose in life is to be a |
4b22ff13 IK |
150 | message bus implementation rather than specifically an ioctl replacement. |
151 | While it would be possible to work around this there is one concern | |
152 | that led to the decision to not use it. This is that the autofs | |
153 | expire in the daemon has become far too complex because umount | |
154 | candidates are enumerated, almost for no other reason than to "count" | |
155 | the number of times to call the expire ioctl. This involves scanning | |
156 | the mount table which has proved to be a big overhead for users with | |
157 | large maps. The best way to improve this is to try to get back to the | |
158 | way the expire was done long ago. That is, when an expire request is | |
159 | issued for a mount (file handle) we should continually call back to | |
160 | the daemon until we can't umount any more mounts, then return the | |
161 | appropriate status to the daemon. At the moment we just expire one | |
162 | mount at a time. A Generic Netlink implementation would exclude this | |
163 | possibility for future development due to the requirements of the | |
164 | message bus architecture. | |
165 | ||
166 | ||
b6bb226a | 167 | autofs Miscellaneous Device mount control interface |
4b22ff13 IK |
168 | ==================================================== |
169 | ||
170 | The control interface is accessed by opening a device node, typically /dev/autofs. | |
171 | ||
172 | All the ioctls use a common structure to pass the needed parameter | |
173 | information and return operation results: | |
174 | ||
175 | struct autofs_dev_ioctl { | |
176 | __u32 ver_major; | |
177 | __u32 ver_minor; | |
178 | __u32 size; /* total size of data passed in | |
179 | * including this struct */ | |
180 | __s32 ioctlfd; /* automount command fd */ | |
181 | ||
88488080 | 182 | /* Command parameters */ |
bf72eda5 TK |
183 | union { |
184 | struct args_protover protover; | |
185 | struct args_protosubver protosubver; | |
186 | struct args_openmount openmount; | |
187 | struct args_ready ready; | |
188 | struct args_fail fail; | |
189 | struct args_setpipefd setpipefd; | |
190 | struct args_timeout timeout; | |
191 | struct args_requester requester; | |
192 | struct args_expire expire; | |
193 | struct args_askumount askumount; | |
194 | struct args_ismountpoint ismountpoint; | |
195 | }; | |
4b22ff13 IK |
196 | |
197 | char path[0]; | |
198 | }; | |
199 | ||
200 | The ioctlfd field is a mount point file descriptor of an autofs mount | |
201 | point. It is returned by the open call and is used by all calls except | |
202 | the check for whether a given path is a mount point, where it may | |
203 | optionally be used to check a specific mount corresponding to a given | |
204 | mount point file descriptor, and when requesting the uid and gid of the | |
205 | last successful mount on a directory within the autofs file system. | |
206 | ||
bf72eda5 TK |
207 | The union is used to communicate parameters and results of calls made |
208 | as described below. | |
4b22ff13 IK |
209 | |
210 | The path field is used to pass a path where it is needed and the size field | |
211 | is used to account for the increased structure length when translating the | |
212 | structure sent from user space. | |
213 | ||
214 | This structure can be initialized before setting specific fields by using | |
215 | the void function call init_autofs_dev_ioctl(struct autofs_dev_ioctl *). | |
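
As an illustration of how the path and size fields fit together, here is
a minimal user space sketch. It assumes the structure and
init_autofs_dev_ioctl() come from <linux/auto_dev-ioctl.h>; the helper
name alloc_dev_ioctl_path() is hypothetical and is not part of the
interface.

```c
#include <stdlib.h>
#include <string.h>
#include <linux/auto_dev-ioctl.h>

/*
 * Allocate a struct autofs_dev_ioctl with room for a trailing path.
 * Hypothetical helper for illustration only. Returns NULL if the
 * allocation fails; the caller frees the result with free().
 */
static struct autofs_dev_ioctl *alloc_dev_ioctl_path(int ioctlfd,
						     const char *path)
{
	size_t plen = strlen(path) + 1;	/* include the terminating NUL */
	size_t size = sizeof(struct autofs_dev_ioctl) + plen;
	struct autofs_dev_ioctl *param = malloc(size);

	if (!param)
		return NULL;

	init_autofs_dev_ioctl(param);	/* fills in version and base size */
	param->size = size;		/* structure plus the path bytes */
	param->ioctlfd = ioctlfd;
	memcpy(param->path, path, plen);
	return param;
}
```

A caller would pass -1 as ioctlfd for operations that identify the mount
by path only.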
216 | ||
217 | All of the ioctls perform a copy of this structure from user space to | |
218 | kernel space and return -EINVAL if the size parameter is smaller than | |
219 | the structure size itself, -ENOMEM if the kernel memory allocation fails | |
220 | or -EFAULT if the copy itself fails. Other checks include a version check | |
221 | of the compiled in user space version against the module version and a | |
222 | mismatch results in a -EINVAL return. If the size field is greater than | |
223 | the structure size then a path is assumed to be present and is checked to | |
224 | ensure it begins with a "/" and is NULL terminated, otherwise -EINVAL is | |
225 | returned. Following these checks, for all ioctl commands except | |
226 | AUTOFS_DEV_IOCTL_VERSION_CMD, AUTOFS_DEV_IOCTL_OPENMOUNT_CMD and | |
227 | AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD the ioctlfd is validated and if it is | |
228 | not a valid descriptor or doesn't correspond to an autofs mount point | |
229 | an error of -EBADF, -ENOTTY or -EINVAL (not an autofs descriptor) is | |
230 | returned. | |
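
The size and path rules above can be restated as a small user space
function. This is only a sketch that mirrors the wording of this
document, not the kernel source; check_dev_ioctl_args() is a
hypothetical name and the version and ioctlfd checks are left out.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Mirror of the size and path checks described in the text.
 * struct_size stands for sizeof(struct autofs_dev_ioctl); path points
 * at the bytes following the structure and is only examined when size
 * is larger than struct_size.
 */
static int check_dev_ioctl_args(size_t size, size_t struct_size,
				const char *path)
{
	size_t path_len;

	if (size < struct_size)
		return -EINVAL;		/* too small to hold the structure */
	if (size == struct_size)
		return 0;		/* no path was supplied */

	path_len = size - struct_size;
	if (path[0] != '/')
		return -EINVAL;		/* path must begin with "/" */
	if (path[path_len - 1] != '\0')
		return -EINVAL;		/* path must be terminated */
	return 0;
}
```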
231 | ||
232 | ||
233 | The ioctls | |
234 | ========== | |
235 | ||
236 | An example of an implementation which uses this interface can be seen | |
237 | in autofs version 5.0.4 and later in file lib/dev-ioctl-lib.c of the | |
238 | distribution tar available for download from kernel.org in directory | |
239 | /pub/linux/daemons/autofs/v5. | |
240 | ||
241 | The device node ioctl operations implemented by this interface are: | |
242 | ||
243 | ||
244 | AUTOFS_DEV_IOCTL_VERSION | |
245 | ------------------------ | |
246 | ||
b6bb226a | 247 | Get the major and minor version of the autofs device ioctl kernel module |
4b22ff13 IK |
248 | implementation. It requires an initialized struct autofs_dev_ioctl as an |
249 | input parameter and sets the version information in the passed in structure. | |
250 | It returns 0 on success or the error -EINVAL if a version mismatch is | |
251 | detected. | |
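
A minimal sketch of the version query follows. It assumes the ioctl
request macro AUTOFS_DEV_IOCTL_VERSION from <linux/auto_dev-ioctl.h>
and a descriptor already opened on the control device;
get_dev_ioctl_version() is a hypothetical helper name.

```c
#include <errno.h>	/* callers inspect errno on failure */
#include <sys/ioctl.h>
#include <linux/auto_dev-ioctl.h>

/*
 * Ask the module for its device ioctl interface version.
 * devfd is a descriptor opened on the control device, typically
 * /dev/autofs. Returns 0 and fills *major and *minor, or returns -1
 * with errno set by ioctl(2).
 */
static int get_dev_ioctl_version(int devfd, unsigned int *major,
				 unsigned int *minor)
{
	struct autofs_dev_ioctl param;

	init_autofs_dev_ioctl(&param);
	if (ioctl(devfd, AUTOFS_DEV_IOCTL_VERSION, &param) == -1)
		return -1;

	*major = param.ver_major;
	*minor = param.ver_minor;
	return 0;
}
```

In user space the -EINVAL of a version mismatch shows up as a return
of -1 with errno set to EINVAL.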
252 | ||
253 | ||
254 | AUTOFS_DEV_IOCTL_PROTOVER_CMD and AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD | |
255 | ------------------------------------------------------------------ | |
256 | ||
b6bb226a | 257 | Get the major and minor version of the autofs protocol understood |
4b22ff13 IK |
258 | by the loaded module. This call requires an initialized struct autofs_dev_ioctl |
259 | with the ioctlfd field set to a valid autofs mount point descriptor | |
bf72eda5 TK |
260 | and sets the requested version number in the version field of struct args_protover |
261 | or the sub_version field of struct args_protosubver. These commands return | |
262 | 0 on success or one of the negative error codes if validation fails. | |
4b22ff13 IK |
263 | |
264 | ||
265 | AUTOFS_DEV_IOCTL_OPENMOUNT and AUTOFS_DEV_IOCTL_CLOSEMOUNT | |
266 | ---------------------------------------------------------- | |
267 | ||
268 | Obtain and release a file descriptor for an autofs managed mount point | |
269 | path. The open call requires an initialized struct autofs_dev_ioctl with | |
df5cbb27 | 270 | the path field set and the size field adjusted appropriately as well |
bf72eda5 TK |
271 | as the devid field of struct args_openmount set to the device number of |
272 | the autofs mount. The device number can be obtained from the mount options | |
273 | shown in /proc/mounts. The close call requires an initialized struct | |
4b22ff13 IK |
274 | autofs_dev_ioctl with the ioctlfd field set to the descriptor obtained |
275 | from the open call. The release of the file descriptor can also be done | |
276 | with close(2) so any open descriptors will also be closed at process exit. | |
277 | The close call is included in the implemented operations largely for | |
278 | completeness and to provide for a consistent user space implementation. | |
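
The open call might be wrapped as in the following sketch. It assumes
the request macro AUTOFS_DEV_IOCTL_OPENMOUNT from
<linux/auto_dev-ioctl.h> and that the caller already knows the device
number; open_autofs_mount() is a hypothetical helper name.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/auto_dev-ioctl.h>

/*
 * Obtain a descriptor for the autofs mount at path, which may be
 * covered by another mount. devfd is open on the control device and
 * devid is the device number of the autofs mount. Returns the new
 * descriptor, or -1 with errno set.
 */
static int open_autofs_mount(int devfd, const char *path, unsigned int devid)
{
	size_t plen = strlen(path) + 1;	/* include the terminating NUL */
	size_t size = sizeof(struct autofs_dev_ioctl) + plen;
	struct autofs_dev_ioctl *param = malloc(size);
	int fd = -1;
	int saved_errno;

	if (!param)
		return -1;

	init_autofs_dev_ioctl(param);
	param->size = size;
	param->openmount.devid = devid;
	memcpy(param->path, path, plen);

	if (ioctl(devfd, AUTOFS_DEV_IOCTL_OPENMOUNT, param) == 0)
		fd = param->ioctlfd;	/* returned by the open call */

	saved_errno = errno;		/* keep errno across free() */
	free(param);
	errno = saved_errno;
	return fd;
}
```

The returned descriptor can later be released with close(2) or with
the close call described above.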
279 | ||
280 | ||
281 | AUTOFS_DEV_IOCTL_READY_CMD and AUTOFS_DEV_IOCTL_FAIL_CMD | |
282 | -------------------------------------------------------- | |
283 | ||
284 | Return mount and expire result status from user space to the kernel. | |
285 | Both of these calls require an initialized struct autofs_dev_ioctl | |
286 | with the ioctlfd field set to the descriptor obtained from the open | |
bf72eda5 TK |
287 | call and the token field of struct args_ready or struct args_fail set |
288 | to the wait queue token number, received by user space in the foregoing | |
289 | mount or expire request. The status field of struct args_fail is set to | |
290 | the errno of the operation. It is set to 0 on success. | |
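
The choice between the two calls can be sketched as below. It assumes
the request macros AUTOFS_DEV_IOCTL_READY and AUTOFS_DEV_IOCTL_FAIL
from <linux/auto_dev-ioctl.h>; prepare_result() is a hypothetical
helper that only fills in the structure and leaves the ioctl(2) call
itself to the caller.

```c
#include <linux/auto_dev-ioctl.h>

/*
 * Fill param for reporting the result of a mount or expire request
 * and return the ioctl request to issue on the control device.
 * status is 0 on success, otherwise the status to hand back to the
 * kernel.
 */
static unsigned long prepare_result(struct autofs_dev_ioctl *param,
				    int ioctlfd, unsigned int token,
				    int status)
{
	init_autofs_dev_ioctl(param);
	param->ioctlfd = ioctlfd;

	if (status == 0) {
		param->ready.token = token;
		return AUTOFS_DEV_IOCTL_READY;
	}

	param->fail.token = token;
	param->fail.status = status;
	return AUTOFS_DEV_IOCTL_FAIL;
}
```

A caller would then issue ioctl(devfd, request, &param) with the
returned request value.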
4b22ff13 IK |
291 | |
292 | ||
293 | AUTOFS_DEV_IOCTL_SETPIPEFD_CMD | |
294 | ------------------------------ | |
295 | ||
296 | Set the pipe file descriptor used for kernel communication to the daemon. | |
297 | Normally this is set at mount time using an option but when reconnecting | |
298 | to an existing mount we need to use this to tell the autofs mount about | |
299 | the new kernel pipe descriptor. In order to protect mounts against | |
300 | incorrectly setting the pipe descriptor we also require that the autofs | |
301 | mount be catatonic (see next call). | |
302 | ||
303 | The call requires an initialized struct autofs_dev_ioctl with the | |
304 | ioctlfd field set to the descriptor obtained from the open call and | |
bf72eda5 TK |
305 | the pipefd field of struct args_setpipefd set to the descriptor of the pipe. |
306 | On success the call also sets the process group id used to identify the | |
307 | controlling process (e.g. the owning automount(8) daemon) to the process | |
308 | group of the caller. | |
4b22ff13 IK |
309 | |
310 | ||
311 | AUTOFS_DEV_IOCTL_CATATONIC_CMD | |
312 | ------------------------------ | |
313 | ||
314 | Make the autofs mount point catatonic. The autofs mount will no longer | |
315 | issue mount requests, the kernel communication pipe descriptor is released | |
316 | and any remaining waits in the queue are released. | |
317 | ||
318 | The call requires an initialized struct autofs_dev_ioctl with the | |
319 | ioctlfd field set to the descriptor obtained from the open call. | |
320 | ||
321 | ||
322 | AUTOFS_DEV_IOCTL_TIMEOUT_CMD | |
323 | ---------------------------- | |
324 | ||
25985edc | 325 | Set the expire timeout for mounts within an autofs mount point. |
4b22ff13 IK |
326 | |
327 | The call requires an initialized struct autofs_dev_ioctl with the | |
328 | ioctlfd field set to the descriptor obtained from the open call. | |
329 | ||
330 | ||
331 | AUTOFS_DEV_IOCTL_REQUESTER_CMD | |
332 | ------------------------------ | |
333 | ||
334 | Return the uid and gid of the last process to successfully trigger a | |
335 | mount on the given path dentry. | |
336 | ||
337 | The call requires an initialized struct autofs_dev_ioctl with the path | |
338 | field set to the mount point in question and the size field adjusted | |
bf72eda5 TK |
339 | appropriately. Upon return the uid field of struct args_requester contains |
340 | the uid and the gid field contains the gid. | |
4b22ff13 IK |
341 | |
342 | When reconstructing an autofs mount tree with active mounts we need to | |
343 | re-connect to mounts that may have used the original process uid and | |
344 | gid (or string variations of them) for mount lookups within the map entry. | |
345 | This call provides the ability to obtain this uid and gid so they may be | |
346 | used by user space for the mount map lookups. | |
347 | ||
348 | ||
349 | AUTOFS_DEV_IOCTL_EXPIRE_CMD | |
350 | --------------------------- | |
351 | ||
352 | Issue an expire request to the kernel for an autofs mount. Typically | |
353 | this ioctl is called until no further expire candidates are found. | |
354 | ||
355 | The call requires an initialized struct autofs_dev_ioctl with the | |
356 | ioctlfd field set to the descriptor obtained from the open call. In | |
841964e8 IK |
357 | addition an immediate expire that's independent of the mount timeout, |
358 | and a forced expire that's independent of whether the mount is busy, | |
359 | can be requested by setting the how field of struct args_expire to | |
360 | AUTOFS_EXP_IMMEDIATE or AUTOFS_EXP_FORCED, respectively. If no | |
bf72eda5 TK |
361 | expire candidates can be found the ioctl returns -1 with errno set to |
362 | EAGAIN. | |
4b22ff13 IK |
363 | |
364 | This call causes the kernel module to check the mount corresponding | |
365 | to the given ioctlfd for mounts that can be expired, issues an expire | |
366 | request back to the daemon and waits for completion. | |
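
Calling the expire ioctl until no candidates remain could look like
the sketch below. It assumes the request macro AUTOFS_DEV_IOCTL_EXPIRE
from <linux/auto_dev-ioctl.h>; expire_until_idle() is a hypothetical
helper name.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/auto_dev-ioctl.h>

/*
 * Repeatedly request expires on the mount identified by ioctlfd.
 * how is 0 for a normal expire or one of the AUTOFS_EXP_* values
 * mentioned in the text. Returns the number of expire requests that
 * succeeded once the kernel reports EAGAIN (no more candidates), or
 * -1 with errno set on any other error.
 */
static int expire_until_idle(int devfd, int ioctlfd, unsigned int how)
{
	int expired = 0;

	for (;;) {
		struct autofs_dev_ioctl param;

		init_autofs_dev_ioctl(&param);
		param.ioctlfd = ioctlfd;
		param.expire.how = how;

		if (ioctl(devfd, AUTOFS_DEV_IOCTL_EXPIRE, &param) == 0) {
			expired++;
			continue;
		}
		return errno == EAGAIN ? expired : -1;
	}
}
```

Because the kernel waits for the daemon to complete each expire
request, a loop like this presumably has to run while something else
in the daemon is servicing requests arriving on the kernel pipe.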
367 | ||
368 | AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD | |
369 | ------------------------------ | |
370 | ||
371 | Checks if an autofs mount point is in use. | |
372 | ||
373 | The call requires an initialized struct autofs_dev_ioctl with the | |
374 | ioctlfd field set to the descriptor obtained from the open call and | |
bf72eda5 TK |
375 | it returns the result in the may_umount field of struct args_askumount, |
376 | 1 if the mount may be umounted (it is not busy) and 0 if it is busy. | |
4b22ff13 IK |
377 | |
378 | ||
379 | AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD | |
380 | --------------------------------- | |
381 | ||
382 | Check if the given path is a mountpoint. | |
383 | ||
384 | The call requires an initialized struct autofs_dev_ioctl. There are two | |
385 | possible variations. Both use the path field set to the path of the mount | |
386 | point to check and the size field adjusted appropriately. One uses the | |
387 | ioctlfd field to identify a specific mount point to check while the other | |
bf72eda5 TK |
388 | variation uses the path and optionally the in.type field of struct args_ismountpoint |
389 | set to an autofs mount type. The call returns 1 if this is a mount point | |
390 | and sets the out.devid field to the device number of the mount and the out.magic | |
391 | field to the relevant super block magic number (described below) or 0 if | |
392 | it isn't a mountpoint. In both cases the device number (as returned | |
393 | by new_encode_dev()) is returned in the out.devid field. | |
4b22ff13 IK |
394 | |
395 | If supplied with a file descriptor we're looking for a specific mount, | |
396 | not necessarily at the top of the mounted stack. In this case the path | |
397 | the descriptor corresponds to is considered a mountpoint if it is itself | |
398 | a mountpoint or contains a mount, such as a multi-mount without a root | |
399 | mount. In this case we return 1 if the descriptor corresponds to a mount | |
400 | point and also return the super magic of the covering mount if there | |
401 | is one or 0 if it isn't a mountpoint. | |
402 | ||
403 | If a path is supplied (and the ioctlfd field is set to -1) then the path | |
404 | is looked up and is checked to see if it is the root of a mount. If a | |
405 | type is also given we are looking for a particular autofs mount and if | |
406 | a match isn't found a fail is returned. If the located path is the | |
407 | root of a mount 1 is returned along with the super magic of the mount | |
408 | or 0 otherwise. |