.. _list_rcu_doc:

Using RCU to Protect Read-Mostly Linked Lists
=============================================

One of the most common uses of RCU is protecting read-mostly linked lists
(``struct list_head`` in list.h). One big advantage of this approach is
that all of the required memory ordering is provided by the list macros.
This document describes several list-based RCU use cases.


Example 1: Read-mostly list: Deferred Destruction
-------------------------------------------------

A widely used use case for RCU lists in the kernel is lockless iteration
over all processes in the system. ``task_struct::tasks`` represents the
list node that links all the processes. The list can be traversed in
parallel to any list additions or removals.

The traversal of the list is done using ``for_each_process()``, which is
defined by two macros::

	#define next_task(p) \
		list_entry_rcu((p)->tasks.next, struct task_struct, tasks)

	#define for_each_process(p) \
		for (p = &init_task ; (p = next_task(p)) != &init_task ; )

The code traversing the list of all processes typically looks like::

	rcu_read_lock();
	for_each_process(p) {
		/* Do something with p */
	}
	rcu_read_unlock();

The simplified and heavily inlined code for removing a process from a
task list is::

	void release_task(struct task_struct *p)
	{
		write_lock(&tasklist_lock);
		list_del_rcu(&p->tasks);
		write_unlock(&tasklist_lock);
		call_rcu(&p->rcu, delayed_put_task_struct);
	}

When a process exits, ``release_task()`` calls ``list_del_rcu(&p->tasks)``
via __exit_signal() and __unhash_process() under ``tasklist_lock``
writer lock protection. The list_del_rcu() invocation removes
the task from the list of all tasks. The ``tasklist_lock``
prevents concurrent list additions/removals from corrupting the
list. Readers using ``for_each_process()`` are not protected with the
``tasklist_lock``. To prevent readers from noticing changes in the list
pointers, the ``task_struct`` object is freed only after one or more
grace periods elapse, with the help of call_rcu(), which is invoked via
put_task_struct_rcu_user(). This deferring of destruction ensures that
any readers traversing the list will see valid ``p->tasks.next`` pointers
and deletion/freeing can happen in parallel with traversal of the list.
This pattern is also called an **existence lock**, since RCU refrains
from invoking the delayed_put_task_struct() callback function until
all existing readers finish, which guarantees that the ``task_struct``
object in question will remain in existence until after the completion
of all RCU readers that might possibly have a reference to that object.
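
The same deferred-destruction pattern applies to any RCU-protected list.
The following minimal sketch (using a hypothetical ``struct foo`` protected
by an equally hypothetical ``foo_lock``, neither taken from kernel code)
shows its general shape: the updater unlinks the element under a lock,
then passes it to call_rcu() so that the memory is freed only after all
pre-existing readers are done::

	struct foo {
		struct list_head list;
		struct rcu_head rcu;
		int data;
	};

	static LIST_HEAD(foo_list);
	static DEFINE_SPINLOCK(foo_lock);

	static void foo_reclaim(struct rcu_head *rp)
	{
		/* Invoked only after a grace period has elapsed. */
		kfree(container_of(rp, struct foo, rcu));
	}

	static void foo_del(struct foo *fp)
	{
		spin_lock(&foo_lock);
		list_del_rcu(&fp->list);  /* Readers may still reference fp. */
		spin_unlock(&foo_lock);
		call_rcu(&fp->rcu, foo_reclaim);  /* Defer the actual free. */
	}

When the callback does nothing but free the enclosing object, as here, the
call_rcu()/foo_reclaim() pair can be replaced by kfree_rcu(fp, rcu).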


Example 2: Read-Side Action Taken Outside of Lock: No In-Place Updates
----------------------------------------------------------------------

Some reader-writer locking use cases compute a value while holding
the read-side lock, but continue to use that value after that lock is
released. These use cases are often good candidates for conversion
to RCU. One prominent example involves network packet routing.
Because the packet-routing data tracks the state of equipment outside
of the computer, it will at times contain stale data. Therefore, once
the route has been computed, there is no need to hold the routing table
static during transmission of the packet. After all, you can hold the
routing table static all you want, but that won't keep the external
Internet from changing, and it is the state of the external Internet
that really matters. In addition, routing entries are typically added
or deleted, rather than being modified in place. This is a rare example
of the finite speed of light and the non-zero size of atoms actually
helping make synchronization be lighter weight.

A straightforward example of this type of RCU use case may be found in
the system-call auditing support. For example, a reader-writer locked
implementation of ``audit_filter_task()`` might be as follows::

	static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
	{
		struct audit_entry *e;
		enum audit_state state;

		read_lock(&auditsc_lock);
		/* Note: audit_filter_mutex held by caller. */
		list_for_each_entry(e, &audit_tsklist, list) {
			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
				if (state == AUDIT_STATE_RECORD)
					*key = kstrdup(e->rule.filterkey, GFP_ATOMIC);
				read_unlock(&auditsc_lock);
				return state;
			}
		}
		read_unlock(&auditsc_lock);
		return AUDIT_BUILD_CONTEXT;
	}

Here the list is searched under the lock, but the lock is dropped before
the corresponding value is returned. By the time that this value is acted
on, the list may well have been modified. This makes sense, since if
you are turning auditing off, it is OK to audit a few extra system calls.

This means that RCU can be easily applied to the read side, as follows::

	static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
	{
		struct audit_entry *e;
		enum audit_state state;

		rcu_read_lock();
		/* Note: audit_filter_mutex held by caller. */
		list_for_each_entry_rcu(e, &audit_tsklist, list) {
			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
				if (state == AUDIT_STATE_RECORD)
					*key = kstrdup(e->rule.filterkey, GFP_ATOMIC);
				rcu_read_unlock();
				return state;
			}
		}
		rcu_read_unlock();
		return AUDIT_BUILD_CONTEXT;
	}

The read_lock() and read_unlock() calls have become rcu_read_lock()
and rcu_read_unlock(), respectively, and the list_for_each_entry()
has become list_for_each_entry_rcu(). The **_rcu()** list-traversal
primitives add READ_ONCE() and diagnostic checks for incorrect use
outside of an RCU read-side critical section.
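
In recent kernels, list_for_each_entry_rcu() also accepts an optional
lockdep-condition argument, so an updater that already holds the
update-side lock can legitimately share the reader's traversal code.
As a brief sketch (``mylist``, ``my_mutex``, and the ``do_something*()``
calls below are hypothetical)::

	/* Reader: must be in an RCU read-side critical section. */
	rcu_read_lock();
	list_for_each_entry_rcu(e, &mylist, list)
		do_something(e);
	rcu_read_unlock();

	/* Updater: holding my_mutex also makes traversal safe, and the
	 * lockdep expression documents and verifies that fact. */
	mutex_lock(&my_mutex);
	list_for_each_entry_rcu(e, &mylist, list,
				lockdep_is_held(&my_mutex))
		do_something_else(e);
	mutex_unlock(&my_mutex);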

The changes to the update side are also straightforward. A reader-writer lock
might be used as follows for deletion and insertion in these simplified
versions of audit_del_rule() and audit_add_rule()::

	static inline int audit_del_rule(struct audit_rule *rule,
					 struct list_head *list)
	{
		struct audit_entry *e;

		write_lock(&auditsc_lock);
		list_for_each_entry(e, list, list) {
			if (!audit_compare_rule(rule, &e->rule)) {
				list_del(&e->list);
				write_unlock(&auditsc_lock);
				return 0;
			}
		}
		write_unlock(&auditsc_lock);
		return -EFAULT;		/* No matching rule */
	}

	static inline int audit_add_rule(struct audit_entry *entry,
					 struct list_head *list)
	{
		write_lock(&auditsc_lock);
		if (entry->rule.flags & AUDIT_PREPEND) {
			entry->rule.flags &= ~AUDIT_PREPEND;
			list_add(&entry->list, list);
		} else {
			list_add_tail(&entry->list, list);
		}
		write_unlock(&auditsc_lock);
		return 0;
	}

Following are the RCU equivalents for these two functions::

	static inline int audit_del_rule(struct audit_rule *rule,
					 struct list_head *list)
	{
		struct audit_entry *e;

		/* No need to use the _rcu iterator here, since this is the only
		 * deletion routine. */
		list_for_each_entry(e, list, list) {
			if (!audit_compare_rule(rule, &e->rule)) {
				list_del_rcu(&e->list);
				call_rcu(&e->rcu, audit_free_rule);
				return 0;
			}
		}
		return -EFAULT;		/* No matching rule */
	}

	static inline int audit_add_rule(struct audit_entry *entry,
					 struct list_head *list)
	{
		if (entry->rule.flags & AUDIT_PREPEND) {
			entry->rule.flags &= ~AUDIT_PREPEND;
			list_add_rcu(&entry->list, list);
		} else {
			list_add_tail_rcu(&entry->list, list);
		}
		return 0;
	}

Normally, the write_lock() and write_unlock() would be replaced by a
spin_lock() and a spin_unlock(). But in this case, all callers hold
``audit_filter_mutex``, so no additional locking is required. The
auditsc_lock can therefore be eliminated, since use of RCU eliminates the
need for writers to exclude readers.

The list_del(), list_add(), and list_add_tail() primitives have been
replaced by list_del_rcu(), list_add_rcu(), and list_add_tail_rcu().
The **_rcu()** list-manipulation primitives add memory barriers that are
needed on weakly ordered CPUs. The list_del_rcu() primitive omits the
pointer poisoning debug-assist code that would otherwise cause concurrent
readers to fail spectacularly.
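
To see why, compare lightly simplified versions of the two deletion
primitives (based on include/linux/list.h and include/linux/rculist.h):
list_del() poisons both pointers, but list_del_rcu() must leave the
``->next`` pointer intact, because concurrent readers may still be
traversing through the newly removed entry::

	static inline void list_del(struct list_head *entry)
	{
		__list_del_entry(entry);
		entry->next = LIST_POISON1;	/* Would derail RCU readers. */
		entry->prev = LIST_POISON2;
	}

	static inline void list_del_rcu(struct list_head *entry)
	{
		__list_del_entry(entry);
		entry->prev = LIST_POISON2;	/* Readers never follow ->prev. */
	}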

So, when readers can tolerate stale data and when entries are either added or
deleted, without in-place modification, it is very easy to use RCU!


Example 3: Handling In-Place Updates
------------------------------------

The system-call auditing code does not update auditing rules in place. However,
if it did, the reader-writer-locked code to do so might look as follows
(assuming only ``field_count`` is updated, otherwise, the added fields would
need to be filled in)::

	static inline int audit_upd_rule(struct audit_rule *rule,
					 struct list_head *list,
					 __u32 newaction,
					 __u32 newfield_count)
	{
		struct audit_entry *e;

		write_lock(&auditsc_lock);
		/* Note: audit_filter_mutex held by caller. */
		list_for_each_entry(e, list, list) {
			if (!audit_compare_rule(rule, &e->rule)) {
				e->rule.action = newaction;
				e->rule.field_count = newfield_count;
				write_unlock(&auditsc_lock);
				return 0;
			}
		}
		write_unlock(&auditsc_lock);
		return -EFAULT;		/* No matching rule */
	}

The RCU version creates a copy, updates the copy, then replaces the old
entry with the newly updated entry. This sequence of actions, allowing
concurrent reads while making a copy to perform an update, is what gives
RCU (*read-copy update*) its name.

The RCU version of audit_upd_rule() is as follows::

	static inline int audit_upd_rule(struct audit_rule *rule,
					 struct list_head *list,
					 __u32 newaction,
					 __u32 newfield_count)
	{
		struct audit_entry *e;
		struct audit_entry *ne;

		list_for_each_entry(e, list, list) {
			if (!audit_compare_rule(rule, &e->rule)) {
				ne = kmalloc(sizeof(*ne), GFP_ATOMIC);
				if (ne == NULL)
					return -ENOMEM;
				audit_copy_rule(&ne->rule, &e->rule);
				ne->rule.action = newaction;
				ne->rule.field_count = newfield_count;
				list_replace_rcu(&e->list, &ne->list);
				call_rcu(&e->rcu, audit_free_rule);
				return 0;
			}
		}
		return -EFAULT;		/* No matching rule */
	}

Again, this assumes that the caller holds ``audit_filter_mutex``. Normally, the
writer lock would become a spinlock in this sort of code.

The update_lsm_rule() does something very similar, for those who would
prefer to look at real Linux-kernel code.

Another use of this pattern can be found in the Open vSwitch driver's
*connection tracking table* code in ``ct_limit_set()``. The table holds
connection tracking entries and has a limit on the maximum entries. There
is one such table per zone and hence one *limit* per zone. The zones are
mapped to their limits through a hashtable using an RCU-managed hlist for
the hash chains. When a new limit is set, a new limit object is allocated
and ``ct_limit_set()`` is called to replace the old limit object with the
new one using list_replace_rcu(). The old limit object is then freed after
a grace period using kfree_rcu().
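
The following minimal sketch shows the shape of that replace-and-defer-free
idiom; the ``struct zone_limit`` and ``zone_limit_update()`` names are
hypothetical stand-ins, not the actual Open vSwitch data structures::

	struct zone_limit {
		struct hlist_node node;
		struct rcu_head rcu;
		u16 zone;
		u32 limit;
	};

	/* Caller holds the update-side lock serializing all updaters. */
	static int zone_limit_update(struct zone_limit *old, u32 new_limit)
	{
		struct zone_limit *new;

		new = kmalloc(sizeof(*new), GFP_KERNEL);
		if (!new)
			return -ENOMEM;
		*new = *old;			/* Copy... */
		new->limit = new_limit;		/* ...update... */
		hlist_replace_rcu(&old->node, &new->node);	/* ...replace. */
		kfree_rcu(old, rcu);	/* Free old entry after a grace period. */
		return 0;
	}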


Example 4: Eliminating Stale Data
---------------------------------

The auditing example above tolerates stale data, as do most algorithms
that track external state. After all, there is a delay between the time
the external state changes and the time Linux becomes aware of the change,
so as noted earlier, a small quantity of additional RCU-induced staleness
is generally not a problem.

However, there are many examples where stale data cannot be tolerated.
One example in the Linux kernel is the System V IPC (see the shm_lock()
function in ipc/shm.c). This code checks a *deleted* flag under a
per-entry spinlock, and, if the *deleted* flag is set, pretends that the
entry does not exist. For this to be helpful, the search function must
return holding the per-entry spinlock, as shm_lock() does in fact do.

.. _quick_quiz:

Quick Quiz:
	For the deleted-flag technique to be helpful, why is it necessary
	to hold the per-entry lock while returning from the search function?

:ref:`Answer to Quick Quiz <quick_quiz_answer>`

If the system-call audit module were to ever need to reject stale data, one way
to accomplish this would be to add a ``deleted`` flag and a ``lock`` spinlock to the
``audit_entry`` structure, and modify audit_filter_task() as follows::

	static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
	{
		struct audit_entry *e;
		enum audit_state state;

		rcu_read_lock();
		list_for_each_entry_rcu(e, &audit_tsklist, list) {
			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
				spin_lock(&e->lock);
				if (e->deleted) {
					spin_unlock(&e->lock);
					rcu_read_unlock();
					return AUDIT_BUILD_CONTEXT;
				}
				rcu_read_unlock();
				if (state == AUDIT_STATE_RECORD)
					*key = kstrdup(e->rule.filterkey, GFP_ATOMIC);
				return state;	/* Caller must release e->lock. */
			}
		}
		rcu_read_unlock();
		return AUDIT_BUILD_CONTEXT;
	}

The ``audit_del_rule()`` function would need to set the ``deleted`` flag under the
spinlock as follows::

	static inline int audit_del_rule(struct audit_rule *rule,
					 struct list_head *list)
	{
		struct audit_entry *e;

		/* No need to use the _rcu iterator here, since this
		 * is the only deletion routine. */
		list_for_each_entry(e, list, list) {
			if (!audit_compare_rule(rule, &e->rule)) {
				spin_lock(&e->lock);
				list_del_rcu(&e->list);
				e->deleted = 1;
				spin_unlock(&e->lock);
				call_rcu(&e->rcu, audit_free_rule);
				return 0;
			}
		}
		return -EFAULT;		/* No matching rule */
	}

This too assumes that the caller holds ``audit_filter_mutex``.

Note that this example assumes that entries are only added and deleted.
An additional mechanism is required to deal correctly with the
update-in-place performed by audit_upd_rule(). For one thing,
audit_upd_rule() would need to hold the locks of both the old
``audit_entry`` and its replacement while executing the
list_replace_rcu().
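
For illustration only, a sketch of such an audit_upd_rule() (not taken
from the kernel sources) might hold both entries' locks across the
replacement, so that a reader acquiring either entry's lock sees an
accurate ``deleted`` flag::

	static inline int audit_upd_rule(struct audit_rule *rule,
					 struct list_head *list,
					 __u32 newaction,
					 __u32 newfield_count)
	{
		struct audit_entry *e;
		struct audit_entry *ne;

		/* Note: audit_filter_mutex held by caller. */
		list_for_each_entry(e, list, list) {
			if (!audit_compare_rule(rule, &e->rule)) {
				ne = kmalloc(sizeof(*ne), GFP_ATOMIC);
				if (ne == NULL)
					return -ENOMEM;
				audit_copy_rule(&ne->rule, &e->rule);
				ne->rule.action = newaction;
				ne->rule.field_count = newfield_count;
				ne->deleted = 0;
				spin_lock_init(&ne->lock);
				spin_lock(&e->lock);
				spin_lock(&ne->lock);
				list_replace_rcu(&e->list, &ne->list);
				e->deleted = 1;	/* Readers locking e must retry. */
				spin_unlock(&ne->lock);
				spin_unlock(&e->lock);
				call_rcu(&e->rcu, audit_free_rule);
				return 0;
			}
		}
		return -EFAULT;		/* No matching rule */
	}

Because all updaters hold ``audit_filter_mutex``, acquiring the two
per-entry locks in this fixed order cannot deadlock.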


Example 5: Skipping Stale Objects
---------------------------------

For some use cases, reader performance can be improved by skipping
stale objects during read-side list traversal, where stale objects
are those that will be removed and destroyed after one or more grace
periods. One such example can be found in the timerfd subsystem. When a
``CLOCK_REALTIME`` clock is reprogrammed (for example due to setting
of the system time) then all programmed ``timerfds`` that depend on
this clock get triggered and processes waiting on them are awakened in
advance of their scheduled expiry. To facilitate this, all such timers
are added to an RCU-managed ``cancel_list`` when they are set up in
``timerfd_setup_cancel()``::

	static void timerfd_setup_cancel(struct timerfd_ctx *ctx, int flags)
	{
		spin_lock(&ctx->cancel_lock);
		if ((ctx->clockid == CLOCK_REALTIME ||
		     ctx->clockid == CLOCK_REALTIME_ALARM) &&
		    (flags & TFD_TIMER_ABSTIME) && (flags & TFD_TIMER_CANCEL_ON_SET)) {
			if (!ctx->might_cancel) {
				ctx->might_cancel = true;
				spin_lock(&cancel_lock);
				list_add_rcu(&ctx->clist, &cancel_list);
				spin_unlock(&cancel_lock);
			}
		} else {
			__timerfd_remove_cancel(ctx);
		}
		spin_unlock(&ctx->cancel_lock);
	}

When a timerfd is freed (fd is closed), then the ``might_cancel``
flag of the timerfd object is cleared, the object removed from the
``cancel_list`` and destroyed, as shown in this simplified and inlined
version of timerfd_release()::

	int timerfd_release(struct inode *inode, struct file *file)
	{
		struct timerfd_ctx *ctx = file->private_data;

		spin_lock(&ctx->cancel_lock);
		if (ctx->might_cancel) {
			ctx->might_cancel = false;
			spin_lock(&cancel_lock);
			list_del_rcu(&ctx->clist);
			spin_unlock(&cancel_lock);
		}
		spin_unlock(&ctx->cancel_lock);

		if (isalarm(ctx))
			alarm_cancel(&ctx->t.alarm);
		else
			hrtimer_cancel(&ctx->t.tmr);
		kfree_rcu(ctx, rcu);
		return 0;
	}

If the ``CLOCK_REALTIME`` clock is set, for example by a time server, the
hrtimer framework calls ``timerfd_clock_was_set()`` which walks the
``cancel_list`` and wakes up processes waiting on the timerfd. While iterating
the ``cancel_list``, the ``might_cancel`` flag is consulted to skip stale
objects::

	void timerfd_clock_was_set(void)
	{
		ktime_t moffs = ktime_mono_to_real(0);
		struct timerfd_ctx *ctx;
		unsigned long flags;

		rcu_read_lock();
		list_for_each_entry_rcu(ctx, &cancel_list, clist) {
			if (!ctx->might_cancel)
				continue;
			spin_lock_irqsave(&ctx->wqh.lock, flags);
			if (ctx->moffs != moffs) {
				ctx->moffs = KTIME_MAX;
				ctx->ticks++;
				wake_up_locked_poll(&ctx->wqh, EPOLLIN);
			}
			spin_unlock_irqrestore(&ctx->wqh.lock, flags);
		}
		rcu_read_unlock();
	}

The key point is that because RCU-protected traversal of the
``cancel_list`` happens concurrently with object addition and removal,
sometimes the traversal can access an object that has been removed from
the list. In this example, a flag is used to skip such objects.


Summary
-------

Read-mostly list-based data structures that can tolerate stale data are
the most amenable to use of RCU. The simplest case is where entries are
either added or deleted from the data structure (or atomically modified
in place), but non-atomic in-place modifications can be handled by making
a copy, updating the copy, then replacing the original with the copy.
If stale data cannot be tolerated, then a *deleted* flag may be used
in conjunction with a per-entry spinlock in order to allow the search
function to reject newly deleted data.

.. _quick_quiz_answer:

Answer to Quick Quiz:
	For the deleted-flag technique to be helpful, why is it necessary
	to hold the per-entry lock while returning from the search function?

	If the search function drops the per-entry lock before returning,
	then the caller will be processing stale data in any case. If it
	is really OK to be processing stale data, then you don't need a
	*deleted* flag. If processing stale data really is a problem,
	then you need to hold the per-entry lock across all of the code
	that uses the value that was returned.

:ref:`Back to Quick Quiz <quick_quiz>`