Commit | Line | Data |
---|---|---|
7063fbf2 | 1 | |
6c28f2c0 | 2 | configfs - Userspace-driven kernel object configuration. |
7063fbf2 JB |
3 | |
4 | Joel Becker <joel.becker@oracle.com> | |
5 | ||
6 | Updated: 31 March 2005 | |
7 | ||
8 | Copyright (c) 2005 Oracle Corporation, | |
9 | Joel Becker <joel.becker@oracle.com> | |
10 | ||
11 | ||
12 | [What is configfs?] | |
13 | ||
14 | configfs is a ram-based filesystem that provides the converse of | |
15 | sysfs's functionality. Where sysfs is a filesystem-based view of | |
16 | kernel objects, configfs is a filesystem-based manager of kernel | |
17 | objects, or config_items. | |
18 | ||
19 | With sysfs, an object is created in kernel (for example, when a device | |
20 | is discovered) and it is registered with sysfs. Its attributes then | |
21 | appear in sysfs, allowing userspace to read the attributes via | |
22 | readdir(3)/read(2). It may allow some attributes to be modified via | |
23 | write(2). The important point is that the object is created and | |
24 | destroyed in kernel, the kernel controls the lifecycle of the sysfs | |
25 | representation, and sysfs is merely a window on all this. | |
26 | ||
27 | A configfs config_item is created via an explicit userspace operation: | |
28 | mkdir(2). It is destroyed via rmdir(2). The attributes appear at | |
29 | mkdir(2) time, and can be read or modified via read(2) and write(2). | |
30 | As with sysfs, readdir(3) queries the list of items and/or attributes. | |
31 | symlink(2) can be used to group items together. Unlike sysfs, the | |
32 | lifetime of the representation is completely driven by userspace. The | |
33 | kernel modules backing the items must respond to this. | |
34 | ||
35 | Both sysfs and configfs can and should exist together on the same | |
36 | system. One is not a replacement for the other. | |
37 | ||
38 | [Using configfs] | |
39 | ||
40 | configfs can be compiled as a module or into the kernel. You can access | |
41 | it by doing | |
42 | ||
43 | mount -t configfs none /config | |
44 | ||
45 | The configfs tree will be empty unless client modules are also loaded. | |
46 | These are modules that register their item types with configfs as | |
47 | subsystems. Once a client subsystem is loaded, it will appear as a | |
48 | subdirectory (or more than one) under /config. Like sysfs, the | |
49 | configfs tree is always there, whether mounted on /config or not. | |
50 | ||
51 | An item is created via mkdir(2). The item's attributes will also | |
52 | appear at this time. readdir(3) can determine what the attributes are, | |
53 | read(2) can query their default values, and write(2) can store new | |
54 | values. Like sysfs, attributes should be ASCII text files, preferably | |
55 | with only one value per file. The same efficiency caveats from sysfs | |
56 | apply. Don't mix more than one attribute in one attribute file. | |
57 | ||
58 | Like sysfs, configfs expects write(2) to store the entire buffer at | |
59 | once. When writing to configfs attributes, userspace processes should | |
60 | first read the entire file, modify the portions they wish to change, and | |
61 | then write the entire buffer back. Attribute files have a maximum size | |
62 | of one page (PAGE_SIZE, 4096 on i386). | |
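
The whole-buffer rule above can be sketched from the userspace side. This is a minimal illustration, not part of configfs: the helper names attr_read_all() and attr_write_all() and the ATTR_MAX constant are assumptions made for the example, and only open(2), read(2), write(2) and close(2) are real interfaces.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define ATTR_MAX 4096	/* assumed page size; the text cites 4096 on i386 */

/* Hypothetical helper: read a whole attribute file; bytes read or -1. */
ssize_t attr_read_all(const char *path, char *buf, size_t buflen)
{
	size_t total = 0;
	ssize_t n = 0;
	int fd;

	assert(path != NULL && buf != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	/* Keep reading until EOF, an error, or a full buffer. */
	while (total < buflen &&
	       (n = read(fd, buf + total, buflen - total)) > 0)
		total += (size_t)n;
	close(fd);
	return n < 0 ? -1 : (ssize_t)total;
}

/* Hypothetical helper: store the entire value with one write(2). */
int attr_write_all(const char *path, const char *buf, size_t len)
{
	ssize_t n;
	int fd;

	assert(path != NULL && buf != NULL);
	if (len > ATTR_MAX)
		return -1;	/* larger than one attribute file can hold */
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	n = write(fd, buf, len);	/* one call carries the whole buffer */
	close(fd);
	return n == (ssize_t)len ? 0 : -1;
}
```

A caller would read the attribute with attr_read_all(), edit the buffer in memory, and hand the complete result to attr_write_all(), rather than issuing several partial writes.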
63 | ||
64 | When an item needs to be destroyed, remove it with rmdir(2). An | |
65 | item cannot be destroyed if any other item has a link to it (via | |
66 | symlink(2)). Links can be removed via unlink(2). | |
67 | ||
68 | [Configuring FakeNBD: an Example] | |
69 | ||
70 | Imagine there's a Network Block Device (NBD) driver that allows you to | |
71 | access remote block devices. Call it FakeNBD. FakeNBD uses configfs | |
72 | for its configuration. Obviously, there will be a nice program that | |
73 | sysadmins use to configure FakeNBD, but somehow that program has to tell | |
74 | the driver about it. Here's where configfs comes in. | |
75 | ||
76 | When the FakeNBD driver is loaded, it registers itself with configfs. | |
77 | readdir(3) sees this just fine: | |
78 | ||
79 | # ls /config | |
80 | fakenbd | |
81 | ||
82 | A fakenbd connection can be created with mkdir(2). The name is | |
83 | arbitrary, but likely the tool will make some use of the name. Perhaps | |
84 | it is a uuid or a disk name: | |
85 | ||
86 | # mkdir /config/fakenbd/disk1 | |
87 | # ls /config/fakenbd/disk1 | |
88 | target device rw | |
89 | ||
90 | The target attribute contains the IP address of the server FakeNBD will | |
91 | connect to. The device attribute is the device on the server. | |
92 | Predictably, the rw attribute determines whether the connection is | |
93 | read-only or read-write. | |
94 | ||
95 | # echo 10.0.0.1 > /config/fakenbd/disk1/target | |
96 | # echo /dev/sda1 > /config/fakenbd/disk1/device | |
97 | # echo 1 > /config/fakenbd/disk1/rw | |
98 | ||
99 | That's it. That's all there is. Now the device is configured, via the | |
100 | shell no less. | |
101 | ||
102 | [Coding With configfs] | |
103 | ||
104 | Every object in configfs is a config_item. A config_item reflects an | |
105 | object in the subsystem. It has attributes that match values on that | |
106 | object. configfs handles the filesystem representation of that object | |
107 | and its attributes, allowing the subsystem to ignore all but the | |
108 | basic show/store interaction. | |
109 | ||
110 | Items are created and destroyed inside a config_group. A group is a | |
111 | collection of items that share the same attributes and operations. | |
112 | Items are created by mkdir(2) and removed by rmdir(2), but configfs | |
113 | handles that. The group has a set of operations to perform these tasks. | |
114 | ||
115 | A subsystem is the top level of a client module. During initialization, | |
116 | the client module registers the subsystem with configfs, and the subsystem | |
117 | appears as a directory at the top of the configfs filesystem. A | |
118 | subsystem is also a config_group, and can do everything a config_group | |
119 | can. | |
120 | ||
121 | [struct config_item] | |
122 | ||
123 | struct config_item { | |
124 | char *ci_name; | |
125 | char ci_namebuf[UOBJ_NAME_LEN]; | |
126 | struct kref ci_kref; | |
127 | struct list_head ci_entry; | |
128 | struct config_item *ci_parent; | |
129 | struct config_group *ci_group; | |
130 | struct config_item_type *ci_type; | |
131 | struct dentry *ci_dentry; | |
132 | }; | |
133 | ||
134 | void config_item_init(struct config_item *); | |
135 | void config_item_init_type_name(struct config_item *, | |
136 | const char *name, | |
137 | struct config_item_type *type); | |
138 | struct config_item *config_item_get(struct config_item *); | |
139 | void config_item_put(struct config_item *); | |
140 | ||
141 | Generally, struct config_item is embedded in a container structure, a | |
142 | structure that actually represents what the subsystem is doing. The | |
143 | config_item portion of that structure is how the object interacts with | |
144 | configfs. | |
145 | ||
146 | Whether statically defined in a source file or created by a parent | |
147 | config_group, a config_item must have one of the _init() functions | |
148 | called on it. This initializes the reference count and sets up the | |
149 | appropriate fields. | |
150 | ||
151 | All users of a config_item should have a reference on it via | |
152 | config_item_get(), and drop the reference when they are done via | |
153 | config_item_put(). | |
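
The embedding and reference-counting pattern described above can be modelled in plain userspace C. This is only a sketch: struct model_item, model_item_get() and model_item_put() are hypothetical stand-ins for the kernel's config_item, config_item_get() and config_item_put(), the simple_child container is illustrative, and the plain int counter is not thread-safe the way the kernel's kref is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct config_item. */
struct model_item {
	int refcount;		/* plain int: not safe across threads */
	void (*release)(struct model_item *item);
};

/* Container structure: the object the subsystem actually cares about. */
struct simple_child {
	struct model_item item;	/* embedded, as the text describes */
	int storeme;
};

/* Recover the container from the embedded item (the container_of() idea). */
static struct simple_child *to_simple_child(struct model_item *item)
{
	return (struct simple_child *)((char *)item -
				       offsetof(struct simple_child, item));
}

int released_count;	/* demo-only counter so the release is observable */

static void simple_child_release(struct model_item *item)
{
	released_count++;
	free(to_simple_child(item));
}

/* Plays the role of an _init() call: the creator holds one reference. */
struct simple_child *simple_child_new(void)
{
	struct simple_child *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	c->item.refcount = 1;
	c->item.release = simple_child_release;
	return c;
}

struct model_item *model_item_get(struct model_item *item)
{
	assert(item->refcount > 0);
	item->refcount++;
	return item;
}

/* Dropping the last reference runs the release method. */
void model_item_put(struct model_item *item)
{
	assert(item->refcount > 0);
	if (--item->refcount == 0)
		item->release(item);
}
```

Every model_item_get() is paired with a model_item_put(); the container is freed only when the final reference goes away.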
154 | ||
155 | By itself, a config_item cannot do much more than appear in configfs. | |
156 | Usually a subsystem wants the item to display and/or store attributes, | |
157 | among other things. For that, it needs a type. | |
158 | ||
159 | [struct config_item_type] | |
160 | ||
161 | struct configfs_item_operations { | |
162 | void (*release)(struct config_item *); | |
163 | ssize_t (*show_attribute)(struct config_item *, | |
164 | struct configfs_attribute *, | |
165 | char *); | |
166 | ssize_t (*store_attribute)(struct config_item *, | |
167 | struct configfs_attribute *, | |
168 | const char *, size_t); | |
169 | int (*allow_link)(struct config_item *src, | |
170 | struct config_item *target); | |
171 | int (*drop_link)(struct config_item *src, | |
172 | struct config_item *target); | |
173 | }; | |
174 | ||
175 | struct config_item_type { | |
176 | struct module *ct_owner; | |
177 | struct configfs_item_operations *ct_item_ops; | |
178 | struct configfs_group_operations *ct_group_ops; | |
179 | struct configfs_attribute **ct_attrs; | |
180 | }; | |
181 | ||
182 | The most basic function of a config_item_type is to define what | |
183 | operations can be performed on a config_item. All items that have been | |
184 | allocated dynamically will need to provide the ct_item_ops->release() | |
185 | method. This method is called when the config_item's reference count | |
186 | reaches zero. Items that wish to display an attribute need to provide | |
187 | the ct_item_ops->show_attribute() method. Similarly, storing a new | |
188 | attribute value uses the store_attribute() method. | |
189 | ||
190 | [struct configfs_attribute] | |
191 | ||
192 | struct configfs_attribute { | |
193 | char *ca_name; | |
194 | struct module *ca_owner; | |
195 | mode_t ca_mode; | |
196 | }; | |
197 | ||
198 | When a config_item wants an attribute to appear as a file in the item's | |
199 | configfs directory, it must define a configfs_attribute describing it. | |
200 | It then adds the attribute to the NULL-terminated array | |
201 | config_item_type->ct_attrs. When the item appears in configfs, the | |
202 | attribute file will appear with the configfs_attribute->ca_name | |
203 | filename. configfs_attribute->ca_mode specifies the file permissions. | |
204 | ||
205 | If an attribute is readable and the config_item provides a | |
206 | ct_item_ops->show_attribute() method, that method will be called | |
207 | whenever userspace asks for a read(2) on the attribute. The converse | |
208 | will happen for write(2). | |
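
The show/store dispatch described above can be sketched in userspace. All names here (model_attr, model_child, model_show(), model_store(), and the "storeme" and "description" attributes) are assumptions for illustration; only the NULL-terminated array and the single show and store method per item follow the text.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-ins; only ca_name mirrors the real structure. */
struct model_attr {
	const char *ca_name;
};

struct model_child {
	int storeme;
};

static struct model_attr attr_storeme = { .ca_name = "storeme" };
static struct model_attr attr_description = { .ca_name = "description" };

/* NULL-terminated, like config_item_type->ct_attrs. */
struct model_attr *model_attrs[] = {
	&attr_storeme,
	&attr_description,
	NULL,
};

/* Find an attribute by file name by walking the array to its NULL end. */
struct model_attr *model_find_attr(const char *name)
{
	for (struct model_attr **a = model_attrs; *a; a++)
		if (strcmp((*a)->ca_name, name) == 0)
			return *a;
	return NULL;
}

/* One show method for the whole item; it switches on the attribute. */
ssize_t model_show(struct model_child *c, struct model_attr *attr, char *page)
{
	assert(page != NULL);
	if (attr == &attr_storeme)
		return sprintf(page, "%d\n", c->storeme);
	return sprintf(page, "A child with one stored value\n");
}

/*
 * One store method; "description" is read-only in this model.
 * The page must be NUL-terminated for strtol().
 */
ssize_t model_store(struct model_child *c, struct model_attr *attr,
		    const char *page, size_t count)
{
	char *end;
	long v;

	if (attr != &attr_storeme)
		return -1;
	v = strtol(page, &end, 10);
	if (end == page)
		return -1;	/* no digits found */
	c->storeme = (int)v;
	return (ssize_t)count;	/* report the whole buffer as consumed */
}
```

The store method returns the full count on success, which matches the expectation that write(2) stores the entire buffer at once.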
209 | ||
210 | [struct config_group] | |
211 | ||
212 | A config_item cannot live in a vacuum. The only way one can be created | |
213 | is via mkdir(2) on a config_group. This will trigger creation of a | |
214 | child item. | |
215 | ||
216 | struct config_group { | |
217 | struct config_item cg_item; | |
218 | struct list_head cg_children; | |
219 | struct configfs_subsystem *cg_subsys; | |
220 | struct config_group **default_groups; | |
221 | }; | |
222 | ||
223 | void config_group_init(struct config_group *group); | |
224 | void config_group_init_type_name(struct config_group *group, | |
225 | const char *name, | |
226 | struct config_item_type *type); | |
227 | ||
228 | ||
229 | The config_group structure contains a config_item. Properly configuring | |
230 | that item means that a group can behave as an item in its own right. | |
231 | However, it can do more: it can create child items or groups. This is | |
232 | accomplished via the group operations specified on the group's | |
233 | config_item_type. | |
234 | ||
235 | struct configfs_group_operations { | |
236 | struct config_item *(*make_item)(struct config_group *group, | |
237 | const char *name); | |
238 | struct config_group *(*make_group)(struct config_group *group, | |
239 | const char *name); | |
240 | int (*commit_item)(struct config_item *item); | |
241 | void (*drop_item)(struct config_group *group, | |
242 | struct config_item *item); | |
243 | }; | |
244 | ||
245 | A group creates child items by providing the | |
246 | ct_group_ops->make_item() method. If provided, this method is called from mkdir(2) in the group's directory. The subsystem allocates a new | |
247 | config_item (or more likely, its container structure), initializes it, | |
248 | and returns it to configfs. Configfs will then populate the filesystem | |
249 | tree to reflect the new item. | |
250 | ||
251 | If the subsystem wants the child to be a group itself, the subsystem | |
252 | provides ct_group_ops->make_group(). Everything else behaves the same, | |
253 | using the group _init() functions on the group. | |
254 | ||
255 | Finally, when userspace calls rmdir(2) on the item or group, | |
256 | ct_group_ops->drop_item() is called. As a config_group is also a | |
53cb4726 | 257 | config_item, a separate drop_group() method is not necessary. |
7063fbf2 JB |
258 | The subsystem must config_item_put() the reference that was initialized |
259 | upon item allocation. If a subsystem has no work to do, it may omit | |
260 | the ct_group_ops->drop_item() method, and configfs will call | |
261 | config_item_put() on the item on behalf of the subsystem. | |
262 | ||
263 | IMPORTANT: drop_item() is void, and as such cannot fail. When rmdir(2) | |
264 | is called, configfs WILL remove the item from the filesystem tree | |
265 | (assuming that it has no children to keep it busy). The subsystem is | |
266 | responsible for responding to this. If the subsystem has references to | |
267 | the item in other threads, the memory is safe. It may take some time | |
268 | for the item to actually disappear from the subsystem's usage. But it | |
269 | is gone from configfs. | |
270 | ||
271 | A config_group cannot be removed while it still has child items. This | |
272 | is implemented in the configfs rmdir(2) code. ->drop_item() will not be | |
273 | called, as the item has not been dropped. rmdir(2) will fail, as the | |
274 | directory is not empty. | |
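
The mkdir(2) and rmdir(2) flow above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: model_group, model_ops, model_mkdir() and model_rmdir() are hypothetical, a child counter stands in for the cg_children list, free() stands in for the default config_item_put(), and the -EPERM and -ENOMEM codes are choices made for the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct model_group;

/* Hypothetical stand-in for configfs_group_operations. */
struct model_ops {
	struct model_group *(*make_group)(struct model_group *parent);
	void (*drop_item)(struct model_group *parent,
			  struct model_group *child);
};

struct model_group {
	struct model_group *parent;	/* like ci_parent */
	int nr_children;		/* stands in for the cg_children list */
	const struct model_ops *ops;
};

/* Model of mkdir(2): the subsystem allocates, the core links. */
int model_mkdir(struct model_group *parent, struct model_group **out)
{
	struct model_group *child;

	if (!parent->ops || !parent->ops->make_group)
		return -EPERM;	/* the model's choice of error code */
	child = parent->ops->make_group(parent);
	if (!child)
		return -ENOMEM;
	child->parent = parent;	/* link only after allocation succeeded */
	parent->nr_children++;
	*out = child;
	return 0;
}

/* Model of rmdir(2): refuse while children exist, then notify. */
int model_rmdir(struct model_group *child)
{
	struct model_group *parent = child->parent;

	assert(parent != NULL);
	if (child->nr_children > 0)
		return -ENOTEMPTY;	/* drop_item() is not called */
	parent->nr_children--;
	child->parent = NULL;
	if (parent->ops->drop_item)
		parent->ops->drop_item(parent, child);	/* cannot fail */
	else
		free(child);	/* stands in for the default put */
	return 0;
}

int drops_seen;	/* demo-only counter */

static struct model_group *demo_make(struct model_group *parent)
{
	struct model_group *g = calloc(1, sizeof(*g));

	if (g)
		g->ops = parent->ops;	/* children may have children */
	return g;
}

static void demo_drop(struct model_group *parent, struct model_group *child)
{
	(void)parent;
	drops_seen++;
	free(child);
}

const struct model_ops demo_ops = {
	.make_group = demo_make,
	.drop_item = demo_drop,
};
```

Removing a group that still has a child returns -ENOTEMPTY without touching drop_item(); once the child is gone the same call succeeds.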
275 | ||
276 | [struct configfs_subsystem] | |
277 | ||
278 | A subsystem must register itself, usually at module_init time. This | |
279 | tells configfs to make the subsystem appear in the file tree. | |
280 | ||
281 | struct configfs_subsystem { | |
282 | struct config_group su_group; | |
283 | struct semaphore su_sem; | |
284 | }; | |
285 | ||
286 | int configfs_register_subsystem(struct configfs_subsystem *subsys); | |
287 | void configfs_unregister_subsystem(struct configfs_subsystem *subsys); | |
288 | ||
289 | A subsystem consists of a toplevel config_group and a semaphore. | |
290 | The group is where child config_items are created. For a subsystem, | |
291 | this group is usually defined statically. Before calling | |
292 | configfs_register_subsystem(), the subsystem must have initialized the | |
293 | group via the usual group _init() functions, and it must also have | |
294 | initialized the semaphore. | |
295 | When the register call returns, the subsystem is live, and it | |
296 | will be visible via configfs. At that point, mkdir(2) can be called and | |
297 | the subsystem must be ready for it. | |
298 | ||
299 | [An Example] | |
300 | ||
301 | The best example of these basic concepts is the simple_children | |
302 | subsystem/group and the simple_child item in configfs_example.c. It | |
303 | shows a trivial object displaying and storing an attribute, and a simple | |
304 | group creating and destroying these children. | |
305 | ||
306 | [Hierarchy Navigation and the Subsystem Semaphore] | |
307 | ||
308 | There is an extra bonus that configfs provides. The config_groups and | |
309 | config_items are arranged in a hierarchy due to the fact that they | |
310 | appear in a filesystem. A subsystem is NEVER to touch the filesystem | |
311 | parts, but the subsystem might be interested in this hierarchy. For | |
312 | this reason, the hierarchy is mirrored via the config_group->cg_children | |
313 | and config_item->ci_parent structure members. | |
314 | ||
315 | A subsystem can navigate the cg_children list and the ci_parent pointer | |
316 | to see the tree created by the subsystem. This can race with configfs' | |
317 | management of the hierarchy, so configfs uses the subsystem semaphore to | |
318 | protect modifications. Whenever a subsystem wants to navigate the | |
319 | hierarchy, it must do so under the protection of the subsystem | |
320 | semaphore. | |
321 | ||
322 | A subsystem will be prevented from acquiring the semaphore while a newly | |
323 | allocated item has not been linked into this hierarchy. Similarly, it | |
324 | will not be able to acquire the semaphore while a dropping item has not | |
325 | yet been unlinked. This means that an item's ci_parent pointer will | |
326 | never be NULL while the item is in configfs, and that an item will only | |
327 | be in its parent's cg_children list for the same duration. This allows | |
328 | a subsystem to trust ci_parent and cg_children while it holds the | |
329 | semaphore. | |
330 | ||
331 | [Item Aggregation Via symlink(2)] | |
332 | ||
333 | configfs provides a simple group via the group->item parent/child | |
334 | relationship. Often, however, a larger environment requires aggregation | |
335 | outside of the parent/child connection. This is implemented via | |
336 | symlink(2). | |
337 | ||
338 | A config_item may provide the ct_item_ops->allow_link() and | |
339 | ct_item_ops->drop_link() methods. If the ->allow_link() method exists, | |
340 | symlink(2) may be called with the config_item as the source of the link. | |
341 | These links are only allowed between configfs config_items. Any | |
342 | symlink(2) attempt outside the configfs filesystem will be denied. | |
343 | ||
344 | When symlink(2) is called, the source config_item's ->allow_link() | |
345 | method is called with itself and a target item. If the source item | |
346 | allows linking to target item, it returns 0. A source item may wish to | |
347 | reject a link if it only wants links to a certain type of object (say, | |
348 | in its own subsystem). | |
349 | ||
350 | When unlink(2) is called on the symbolic link, the source item is | |
351 | notified via the ->drop_link() method. Like the ->drop_item() method, | |
352 | this is a void function and cannot return failure. The subsystem is | |
353 | responsible for responding to the change. | |
354 | ||
355 | A config_item cannot be removed while it links to any other item, nor | |
356 | can it be removed while an item links to it. Dangling symlinks are not | |
357 | allowed in configfs. | |
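
The linking rules in this section can be sketched as a userspace model. The names below (model_item, model_symlink(), model_unlink(), model_can_rmdir()) are hypothetical, the "same subsystem only" policy is just the example the text gives of why a link might be rejected, and the -EPERM and -EBUSY codes are choices made for the model.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for a config_item with link bookkeeping. */
struct model_item {
	const char *subsys;	/* owning subsystem name */
	int links_out;		/* symlinks this item is the source of */
	int links_in;		/* symlinks that point at this item */
};

/* Example ->allow_link() policy: only link within one subsystem. */
static int same_subsys_allow_link(struct model_item *src,
				  struct model_item *target)
{
	return strcmp(src->subsys, target->subsys) == 0 ? 0 : -EPERM;
}

/* Model of symlink(2): the source item decides whether to allow it. */
int model_symlink(struct model_item *src, struct model_item *target)
{
	int ret = same_subsys_allow_link(src, target);

	if (ret)
		return ret;
	src->links_out++;
	target->links_in++;
	return 0;
}

/* Model of unlink(2) on the symlink; the notification cannot fail. */
void model_unlink(struct model_item *src, struct model_item *target)
{
	assert(src->links_out > 0 && target->links_in > 0);
	src->links_out--;
	target->links_in--;
	/* a ->drop_link() notification would be delivered here */
}

/* An item with links in either direction cannot be removed. */
int model_can_rmdir(const struct model_item *item)
{
	return (item->links_out || item->links_in) ? -EBUSY : 0;
}
```

Because both ends of a link block removal, no symlink in this model can ever be left dangling.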
358 | ||
359 | [Automatically Created Subgroups] | |
360 | ||
361 | A new config_group may want to have two types of child config_items. | |
362 | While this could be codified by magic names in ->make_item(), it is much | |
363 | more explicit to have a method whereby userspace sees this divergence. | |
364 | ||
365 | Rather than have a group where some items behave differently than | |
366 | others, configfs provides a method whereby one or many subgroups are | |
367 | automatically created inside the parent at its creation. Thus, | |
368 | mkdir("parent") results in "parent", "parent/subgroup1", up through | |
369 | "parent/subgroupN". Items of type 1 can now be created in | |
370 | "parent/subgroup1", and items of type N can be created in | |
371 | "parent/subgroupN". | |
372 | ||
373 | These automatic subgroups, or default groups, do not preclude other | |
374 | children of the parent group. If ct_group_ops->make_group() exists, | |
375 | other child groups can be created on the parent group directly. | |
376 | ||
377 | A configfs subsystem specifies default groups by filling in the | |
378 | NULL-terminated array default_groups on the config_group structure. | |
379 | Each group in that array is populated in the configfs tree at the same | |
380 | time as the parent group. Similarly, they are removed at the same time | |
381 | as the parent. No extra notification is provided. When a ->drop_item() | |
382 | method call notifies the subsystem the parent group is going away, it | |
383 | also means every default group child associated with that parent group. | |
384 | ||
385 | As a consequence of this, default_groups cannot be removed directly via | |
386 | rmdir(2). They also are not considered when rmdir(2) on the parent | |
387 | group is checking for children. | |
388 | ||
389 | [Committable Items] | |
390 | ||
391 | NOTE: Committable items are currently unimplemented. | |
392 | ||
393 | Some config_items cannot have a valid initial state. That is, no | |
394 | default values can be specified for the item's attributes such that the | |
395 | item can do its work. Userspace must configure one or more attributes, | |
396 | after which the subsystem can start whatever entity this item | |
397 | represents. | |
398 | ||
399 | Consider the FakeNBD device from above. Without a target address *and* | |
400 | a target device, the subsystem has no idea what block device to import. | |
401 | The simple example assumes that the subsystem merely waits until all the | |
402 | appropriate attributes are configured, and then connects. This will, | |
403 | indeed, work, but now every attribute store must check if the attributes | |
404 | are initialized. Every attribute store must fire off the connection if | |
405 | that condition is met. | |
406 | ||
407 | Far better would be an explicit action notifying the subsystem that the | |
408 | config_item is ready to go. More importantly, an explicit action allows | |
3f6dee9b | 409 | the subsystem to provide feedback as to whether the attributes are |
7063fbf2 JB |
410 | initialized in a way that makes sense. configfs provides this as |
411 | committable items. | |
412 | ||
413 | configfs still uses only normal filesystem operations. An item is | |
414 | committed via rename(2). The item is moved from a directory where it | |
415 | can be modified to a directory where it cannot. | |
416 | ||
417 | Any group that provides the ct_group_ops->commit_item() method has | |
418 | committable items. When this group appears in configfs, mkdir(2) will | |
419 | not work directly in the group. Instead, the group will have two | |
420 | subdirectories: "live" and "pending". The "live" directory does not | |
421 | support mkdir(2) or rmdir(2) either. It only allows rename(2). The | |
422 | "pending" directory does allow mkdir(2) and rmdir(2). An item is | |
423 | created in the "pending" directory. Its attributes can be modified at | |
424 | will. Userspace commits the item by renaming it into the "live" | |
d6bc8ac9 | 425 | directory. At this point, the subsystem receives the ->commit_item() |
7063fbf2 JB |
426 | callback. If all required attributes are filled to satisfaction, the |
427 | method returns zero and the item is moved to the "live" directory. | |
428 | ||
429 | As rmdir(2) does not work in the "live" directory, an item must be | |
430 | shut down, or "uncommitted". Again, this is done via rename(2), this | |
431 | time from the "live" directory back to the "pending" one. The subsystem | |
432 | is notified by the ct_group_ops->uncommit_object() method. | |
433 | ||
434 |