Completions - "wait for completion" barrier APIs
================================================

Introduction:
-------------

If you have one or more threads that must wait for some kernel activity
to have reached a point or a specific state, completions can provide a
race-free solution to this problem. Semantically they are somewhat like a
pthread_barrier() and have similar use-cases.

Completions are a code synchronization mechanism which is preferable to any
misuse of locks/semaphores and busy-loops. Any time you think of using
yield() or some quirky msleep(1) loop to allow something else to proceed,
you probably want to look into using one of the wait_for_completion*()
calls and complete() instead.
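
As a rough illustration of replacing a polling loop, here is a minimal userspace sketch. It is not kernel code: it assumes POSIX threads and models a completion as a mutex, a condition variable (standing in for the waitqueue) and a 'done' counter. All names (struct model_completion, the model_*() helpers, struct job, worker(), run_job()) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for 'struct completion': 'done' counts
 * posted completions and the condition variable plays the waitqueue. */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t wait;
	unsigned int done;
};

static void model_init_completion(struct model_completion *x)
{
	pthread_mutex_init(&x->lock, NULL);
	pthread_cond_init(&x->wait, NULL);
	x->done = 0;                    /* "not done" */
}

static void model_complete(struct model_completion *x)
{
	pthread_mutex_lock(&x->lock);
	x->done++;                      /* post one completion */
	pthread_cond_signal(&x->wait);  /* wake one sleeper, if any */
	pthread_mutex_unlock(&x->lock);
}

static void model_wait_for_completion(struct model_completion *x)
{
	pthread_mutex_lock(&x->lock);
	while (x->done == 0)            /* sleep instead of msleep(1)-style polling */
		pthread_cond_wait(&x->wait, &x->lock);
	x->done--;                      /* consume the completion */
	pthread_mutex_unlock(&x->lock);
}

/* Demo state: the worker fills in 'result', then signals 'result_ready'. */
struct job {
	struct model_completion result_ready;
	int result;
};

static void *worker(void *arg)
{
	struct job *j = arg;

	j->result = 42;                 /* the "activity" being waited for */
	model_complete(&j->result_ready);
	return NULL;
}

/* Start the worker, then block until its result is available. */
static int run_job(void)
{
	struct job j;
	pthread_t t;

	model_init_completion(&j.result_ready);
	if (pthread_create(&t, NULL, worker, &j) != 0)
		return -1;
	/* ... non-dependent work could run here ... */
	model_wait_for_completion(&j.result_ready);
	pthread_join(t, NULL);          /* keep 'j' alive until the worker exits */
	pthread_cond_destroy(&j.result_ready.wait);
	pthread_mutex_destroy(&j.result_ready.lock);
	return j.result;
}
```

run_job() returns 42: the caller sleeps in model_wait_for_completion() until the worker has posted its completion, rather than repeatedly waking up to check a flag.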

The advantage of using completions is that they have a well defined, focused
purpose which makes it very easy to see the intent of the code, but they
also result in more efficient code as all threads can continue execution
until the result is actually needed, and both the waiting and the signalling
are highly efficient using low level scheduler sleep/wakeup facilities.

Completions are built on top of the waitqueue and wakeup infrastructure of
the Linux scheduler. The event the threads on the waitqueue are waiting for
is reduced to a simple flag in 'struct completion', appropriately called "done".

As completions are scheduling related, the code can be found in
kernel/sched/completion.c.


Usage:
------

There are three main parts to using completions:

 - the initialization of the 'struct completion' synchronization object,
 - the waiting part through a call to one of the variants of wait_for_completion(),
 - the signaling side through a call to complete() or complete_all().

There are also some helper functions for checking the state of completions.
Note that while initialization must happen first, the waiting and signaling
parts can happen in any order. I.e. it's entirely normal for a thread
to have marked a completion as 'done' before another thread checks whether
it has to wait for it.
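
A single-threaded sketch of the ordering point, using the same kind of hypothetical userspace model (POSIX mutex plus condition variable plus counter; the names are illustrative, not kernel API): because the signaling side increments the counter, a wait that starts afterwards finds it non-zero and returns without sleeping.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace model: the counter remembers completions
 * that were posted before anybody waited. */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t wait;
	unsigned int done;
};

static void model_complete(struct model_completion *x)
{
	pthread_mutex_lock(&x->lock);
	x->done++;
	pthread_cond_signal(&x->wait);
	pthread_mutex_unlock(&x->lock);
}

static void model_wait_for_completion(struct model_completion *x)
{
	pthread_mutex_lock(&x->lock);
	while (x->done == 0)
		pthread_cond_wait(&x->wait, &x->lock);
	x->done--;
	pthread_mutex_unlock(&x->lock);
}

/* Signal first, wait second, in a single thread: if the wait blocked,
 * this function would never return. Returns the final 'done' value. */
static unsigned int signal_then_wait(void)
{
	struct model_completion c;
	unsigned int left;

	pthread_mutex_init(&c.lock, NULL);      /* initialization comes first */
	pthread_cond_init(&c.wait, NULL);
	c.done = 0;

	model_complete(&c);                     /* done: 0 -> 1 */
	model_wait_for_completion(&c);          /* returns at once, done: 1 -> 0 */
	left = c.done;

	pthread_cond_destroy(&c.wait);
	pthread_mutex_destroy(&c.lock);
	return left;
}
```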

To use completions you need to #include <linux/completion.h> and
create a static or dynamic variable of type 'struct completion',
which has only two fields:

	struct completion {
		unsigned int done;
		wait_queue_head_t wait;
	};

This provides the ->wait waitqueue to place tasks on for waiting (if any), and
the ->done completion flag for indicating whether it's completed or not.

Completions should be named to refer to the event that is being synchronized on.
A good example is:

	wait_for_completion(&early_console_added);

	complete(&early_console_added);

Good, intuitive naming (as always) helps code readability. Naming a completion
'complete' is not helpful unless the purpose is super obvious...


Initializing completions:
-------------------------

Dynamically allocated completion objects should preferably be embedded in data
structures that are assured to be alive for the life-time of the function/driver,
to prevent races with asynchronous complete() calls from occurring.

Particular care should be taken when using the _timeout() or _killable()/_interruptible()
variants of wait_for_completion(), as it must be assured that memory de-allocation
does not happen until all related activities (complete() or reinit_completion())
have taken place, even if these wait functions return prematurely due to a timeout
or a signal triggering.

Dynamically allocated completion objects are initialized via a call to
init_completion():

	init_completion(&dynamic_object->done);

In this call we initialize the waitqueue and set ->done to 0, i.e. "not completed"
or "not done".

The re-initialization function, reinit_completion(), simply resets the
->done field to 0 ("not done"), without touching the waitqueue.
Callers of this function must make sure that there are no racy
wait_for_completion() calls going on in parallel.

Calling init_completion() on the same completion object twice is
most likely a bug as it re-initializes the queue to an empty queue and
enqueued tasks could get "lost" - use reinit_completion() in that case,
but be aware of other races.
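
A sketch of the difference, again as a hypothetical userspace model rather than kernel code (model_init_completion(), model_reinit_completion(), model_complete(), model_try_wait() and stale_post_cleared() are illustrative names): the full initialization runs once, and between rounds only the counter is reset, which is safe here only because no waiter is active at that point.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a completion. */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t wait;
	unsigned int done;
};

/* Full setup: lock, "waitqueue" and flag. Do this only once. */
static void model_init_completion(struct model_completion *x)
{
	pthread_mutex_init(&x->lock, NULL);
	pthread_cond_init(&x->wait, NULL);
	x->done = 0;
}

/* Like reinit_completion(): reset the flag only, leaving the lock and the
 * condition variable (and anything queued on it) alone. The caller must
 * guarantee that no waiter runs concurrently. */
static void model_reinit_completion(struct model_completion *x)
{
	x->done = 0;
}

static void model_complete(struct model_completion *x)
{
	pthread_mutex_lock(&x->lock);
	x->done++;
	pthread_cond_signal(&x->wait);
	pthread_mutex_unlock(&x->lock);
}

/* Non-blocking consume, in the spirit of try_wait_for_completion(). */
static bool model_try_wait(struct model_completion *x)
{
	bool ok;

	pthread_mutex_lock(&x->lock);
	ok = x->done > 0;
	if (ok)
		x->done--;
	pthread_mutex_unlock(&x->lock);
	return ok;
}

/* Round 1 posts twice but consumes once; the reinit makes round 2 start
 * from "not done" instead of inheriting the stale post. Returns true if
 * round 2 correctly finds nothing posted. */
static bool stale_post_cleared(void)
{
	struct model_completion c;
	bool fresh;

	model_init_completion(&c);
	model_complete(&c);
	model_complete(&c);
	(void)model_try_wait(&c);       /* round 1 consumes only one post */

	model_reinit_completion(&c);    /* between rounds, no waiters active */
	fresh = !model_try_wait(&c);

	pthread_cond_destroy(&c.wait);
	pthread_mutex_destroy(&c.lock);
	return fresh;
}
```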

For static declaration and initialization, macros are available.

For static (or global) declarations in file scope you can use DECLARE_COMPLETION():

	static DECLARE_COMPLETION(setup_done);
	DECLARE_COMPLETION(setup_done);

Note that in this case the completion is initialized to 'not done' at boot
time (or module load time) and doesn't require an init_completion() call.
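
The static form can be mimicked in userspace, assuming POSIX threads: a hypothetical MODEL_DECLARE_COMPLETION() macro expands to a definition with a compile-time initializer, so the object is usable without any init call, much like DECLARE_COMPLETION(). The macro and helper names are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of a completion. */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t wait;
	unsigned int done;
};

/* Hypothetical analogue of DECLARE_COMPLETION(): the object is fully
 * initialized at compile/load time, so no init call is needed. */
#define MODEL_DECLARE_COMPLETION(name) \
	struct model_completion name = \
		{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

static MODEL_DECLARE_COMPLETION(setup_done);

static void model_complete(struct model_completion *x)
{
	pthread_mutex_lock(&x->lock);
	x->done++;
	pthread_cond_signal(&x->wait);
	pthread_mutex_unlock(&x->lock);
}

/* Non-consuming check, in the spirit of completion_done(). */
static bool model_completion_done(struct model_completion *x)
{
	bool done;

	pthread_mutex_lock(&x->lock);
	done = x->done != 0;
	pthread_mutex_unlock(&x->lock);
	return done;
}

/* Starts out "not done"; one complete() later it is done.
 * Meant to be called once, since it changes the static object. */
static bool static_lifecycle_ok(void)
{
	bool before = model_completion_done(&setup_done);

	model_complete(&setup_done);
	return !before && model_completion_done(&setup_done);
}
```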

When a completion is declared as a local variable within a function,
then the initialization should always use DECLARE_COMPLETION_ONSTACK()
explicitly, not just to make lockdep happy, but also to make it clear
that limited scope has been considered and is intentional:

	DECLARE_COMPLETION_ONSTACK(setup_done);

Note that when using completion objects as local variables you must be
acutely aware of the short lifetime of the function stack: the function
must not return to a calling context until all activities (such as waiting
threads) have ceased and the completion object is completely unused.

To emphasise this again: in particular when using some of the waiting API variants
with more complex outcomes, such as the timeout or signalling (_timeout(),
_killable() and _interruptible()) variants, the wait might return
prematurely while the object might still be in use by another thread - and a return
from the wait_for_completion*() caller function will deallocate the function
stack and cause subtle data corruption if a complete() is done in some
other thread. Simple testing might not trigger these kinds of races.

If unsure, use dynamically allocated completion objects, preferably embedded
in some other long-lived object that has a boringly long lifetime which
exceeds the lifetime of any helper threads using the completion object,
or has a lock or other synchronization mechanism to make sure complete()
is not called on a freed object.
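
One way to follow this advice, sketched as a hypothetical userspace model with POSIX threads (struct request, request_put(), slow_worker() and submit_and_wait() are illustrative names, not kernel API): the completion is embedded in a heap object that carries a reference count, so a waiter that gives up on a timeout drops only its own reference, and the object is freed by whichever side finishes last.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical request object with an embedded completion. Its lifetime
 * is governed by a reference count rather than by the waiter's stack. */
struct request {
	pthread_mutex_t lock;
	pthread_cond_t wait;            /* stands in for the waitqueue */
	unsigned int done;
	int refs;                       /* one for the waiter, one for the worker */
};

static void request_put(struct request *r)
{
	int last;

	pthread_mutex_lock(&r->lock);
	last = (--r->refs == 0);
	pthread_mutex_unlock(&r->lock);
	if (last) {                     /* nobody else can reach 'r' now */
		pthread_cond_destroy(&r->wait);
		pthread_mutex_destroy(&r->lock);
		free(r);
	}
}

static void *slow_worker(void *arg)
{
	struct request *r = arg;
	struct timespec delay = { 0, 50L * 1000 * 1000 };   /* 50 ms */

	nanosleep(&delay, NULL);
	pthread_mutex_lock(&r->lock);
	r->done++;                      /* the complete() step */
	pthread_cond_signal(&r->wait);
	pthread_mutex_unlock(&r->lock);
	request_put(r);                 /* drop the worker's reference */
	return NULL;
}

/* Wait up to timeout_ms (kept at or below 1000 in this sketch).
 * Returns 1 if completed, 0 on timeout, -1 on setup failure. */
static int submit_and_wait(long timeout_ms)
{
	struct request *r = malloc(sizeof(*r));
	struct timespec deadline;
	pthread_t t;
	int rc = 0, completed;

	if (!r)
		return -1;
	pthread_mutex_init(&r->lock, NULL);
	pthread_cond_init(&r->wait, NULL);
	r->done = 0;
	r->refs = 2;
	if (pthread_create(&t, NULL, slow_worker, r) != 0) {
		r->refs = 1;                /* no worker: only our reference */
		request_put(r);
		return -1;
	}
	pthread_detach(t);              /* we may return before the worker */

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += timeout_ms * 1000000L;
	deadline.tv_sec += deadline.tv_nsec / 1000000000L;
	deadline.tv_nsec %= 1000000000L;

	pthread_mutex_lock(&r->lock);
	while (r->done == 0 && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&r->wait, &r->lock, &deadline);
	completed = r->done != 0;
	pthread_mutex_unlock(&r->lock);

	request_put(r);                 /* safe even on timeout: worker holds a ref */
	return completed;
}
```

With a 10 ms timeout the waiter returns 0 well before the 50 ms worker finishes, yet the worker's later completion step still touches valid memory. Had the object lived on the waiter's stack instead, the worker would write into a dead stack frame.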

A naive DECLARE_COMPLETION() on the stack triggers a lockdep warning.


Waiting for completions:
------------------------

For a thread to wait for some concurrent activity to finish, it
calls wait_for_completion() on the initialized completion structure:

	void wait_for_completion(struct completion *done)

A typical usage scenario is:

	CPU#1					CPU#2

	struct completion setup_done;

	init_completion(&setup_done);
	initialize_work(...,&setup_done,...);

	/* run non-dependent code */		/* do setup */

	wait_for_completion(&setup_done);	complete(&setup_done);

This does not imply any particular order between wait_for_completion() and
the call to complete() - if the call to complete() happened before the call
to wait_for_completion() then the waiting side will simply continue
immediately as all dependencies are satisfied; if not, it will block until
completion is signaled by complete().

Note that wait_for_completion() calls spin_lock_irq()/spin_unlock_irq(),
so it can only be called safely when you know that interrupts are enabled.
Calling it from IRQs-off atomic contexts will result in hard-to-detect
spurious enabling of interrupts.

The default behavior is to wait without a timeout and to mark the task as
uninterruptible. wait_for_completion() and its variants are only safe
in process context (as they can sleep), not in atomic context, in
interrupt context, with IRQs disabled, or with preemption disabled - see also
try_wait_for_completion() below for handling completion in atomic/interrupt
context.

As all variants of wait_for_completion() can (obviously) block for a long
time, depending on the nature of the activity they are waiting for, in
most cases you probably don't want to call them while holding mutexes.


wait_for_completion*() variants available:
------------------------------------------

Most of the variants below return a status, and this status should be checked in
most (if not all) cases - in cases where the status is deliberately not checked you
probably want to make a note explaining this (e.g. see
arch/arm/kernel/smp.c:__cpu_up()).

A common problem is storing return values in variables of an unsuitable
type, so take care to assign them to variables of the proper type (e.g. a
signed long for the _interruptible_timeout() variant).

Checks for the specific meaning of return values have also been found
to be quite inaccurate, e.g. constructs like:

	if (!wait_for_completion_interruptible_timeout(...))

... would execute the same code path for successful completion and for the
interrupted case - which is probably not what you want.
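
The following sketch decodes a return value that follows the wait_for_completion_interruptible_timeout() convention (negative on a signal, 0 on timeout, otherwise the remaining jiffies). It is plain userspace C; classify_wait(), naive_timed_out() and MODEL_ERESTARTSYS are hypothetical, with the latter using the value 512 that the kernel's include/linux/errno.h assigns to ERESTARTSYS.

```c
#include <assert.h>

/* ERESTARTSYS is kernel-internal and not in userspace <errno.h>;
 * this sketch uses the kernel's value. */
#define MODEL_ERESTARTSYS 512

enum wait_outcome { WAIT_INTERRUPTED, WAIT_TIMED_OUT, WAIT_COMPLETED };

/* Decode a return value with the interruptible_timeout convention.
 * 'ret' is a signed long so that the negative error code survives. */
static enum wait_outcome classify_wait(long ret)
{
	if (ret < 0)
		return WAIT_INTERRUPTED;    /* -ERESTARTSYS: a signal arrived */
	if (ret == 0)
		return WAIT_TIMED_OUT;
	return WAIT_COMPLETED;          /* ret = jiffies left, at least 1 */
}

/* The construct criticized above: true only for a timeout, so the
 * interrupted and completed cases fall into the same branch. */
static int naive_timed_out(long ret)
{
	return !ret;
}
```

Storing the result in an unsigned variable would turn -ERESTARTSYS into a huge positive number, which a check like the one in classify_wait() would then misreport as a successful completion.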

	int wait_for_completion_interruptible(struct completion *done)

This function marks the task TASK_INTERRUPTIBLE while it is waiting.
If a signal was received while waiting it will return -ERESTARTSYS; 0 otherwise.

	unsigned long wait_for_completion_timeout(struct completion *done, unsigned long timeout)

The task is marked as TASK_UNINTERRUPTIBLE and will wait at most 'timeout'
jiffies. If a timeout occurs it returns 0, else the remaining time in
jiffies (but at least 1).
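
To illustrate the "at least 1" rule, here is a hypothetical userspace model of wait_for_completion_timeout() built on POSIX pthread_cond_timedwait(). It measures time in milliseconds rather than jiffies, and model_wait_timeout() and wait_on_fresh() are illustrative names. Clamping to 1 keeps a completion that arrives right at the deadline from being mistaken for a timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical userspace model of a completion. */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t wait;
	unsigned int done;
};

/* Model of wait_for_completion_timeout(), using milliseconds instead of
 * jiffies: returns 0 on timeout, otherwise the time left (at least 1). */
static unsigned long model_wait_timeout(struct model_completion *x,
					unsigned long timeout_ms)
{
	struct timespec now, deadline;
	unsigned long left = 0;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += (time_t)(timeout_ms / 1000);
	deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&x->lock);
	while (x->done == 0 && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&x->wait, &x->lock, &deadline);
	if (x->done > 0) {
		long ms;

		x->done--;                  /* consume the completion */
		clock_gettime(CLOCK_REALTIME, &now);
		ms = (long)(deadline.tv_sec - now.tv_sec) * 1000L +
		     (deadline.tv_nsec - now.tv_nsec) / 1000000L;
		left = ms > 0 ? (unsigned long)ms : 1;  /* never report 0 */
	}
	pthread_mutex_unlock(&x->lock);
	return left;
}

/* Demo: wait on a fresh object, optionally with one completion already
 * posted (standing in for an earlier complete()). */
static unsigned long wait_on_fresh(int preposted, unsigned long timeout_ms)
{
	struct model_completion c;
	unsigned long ret;

	pthread_mutex_init(&c.lock, NULL);
	pthread_cond_init(&c.wait, NULL);
	c.done = preposted ? 1 : 0;

	ret = model_wait_timeout(&c, timeout_ms);

	pthread_cond_destroy(&c.wait);
	pthread_mutex_destroy(&c.lock);
	return ret;
}
```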

Timeouts are preferably calculated with msecs_to_jiffies() or usecs_to_jiffies(),
to make the code largely HZ-invariant.
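
The arithmetic behind this can be sketched as below: a simplified, hypothetical model of msecs_to_jiffies() that takes HZ as a parameter and rounds up, as the kernel helper does, so that a short non-zero timeout never turns into 0 jiffies. The real helper uses the build-time HZ and also handles overflow and other special cases.

```c
#include <assert.h>

/* Simplified model of msecs_to_jiffies(): convert milliseconds to ticks,
 * rounding up. 'hz' is a parameter only for illustration. */
static unsigned long model_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
	return ((unsigned long)ms * hz + 999) / 1000;
}
```

With this model, 10 ms becomes 10 jiffies at HZ=1000, 3 at HZ=250 and 1 at HZ=100, so the wall-clock wait stays at roughly 10 ms. A hard-coded timeout of 10 jiffies would instead mean 10 ms at HZ=1000 but 100 ms at HZ=100.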

If the returned timeout value is deliberately ignored, a comment should probably explain
why (e.g. see drivers/mfd/wm8350-core.c wm8350_read_auxadc()).

	long wait_for_completion_interruptible_timeout(struct completion *done, unsigned long timeout)

This function takes a timeout in jiffies and marks the task as
TASK_INTERRUPTIBLE. If a signal was received it will return -ERESTARTSYS;
otherwise it returns 0 if the completion timed out, or the remaining time in
jiffies if completion occurred.

Further variants include _killable, which uses TASK_KILLABLE as the
designated task state and will return -ERESTARTSYS if it is interrupted
by a fatal signal, or 0 if completion was achieved. There is a _timeout
variant as well:

	int wait_for_completion_killable(struct completion *done)
	long wait_for_completion_killable_timeout(struct completion *done, unsigned long timeout)

The _io variants, wait_for_completion_io() and wait_for_completion_io_timeout(),
behave the same as their non-_io counterparts, except that the waiting time is
accounted as 'waiting on IO', which has an impact on how the task is accounted
in scheduling/IO stats:

	void wait_for_completion_io(struct completion *done)
	unsigned long wait_for_completion_io_timeout(struct completion *done, unsigned long timeout)


Signaling completions:
----------------------

A thread that wants to signal that the conditions for continuation have been
achieved calls complete() to signal exactly one of the waiters that it can
continue:

	void complete(struct completion *done)

... or calls complete_all() to signal all current and future waiters:

	void complete_all(struct completion *done)

The signaling will work as expected even if completions are signaled before
a thread starts waiting. This is achieved by the waiter "consuming"
(decrementing) the done field of 'struct completion'. Waiting threads are
woken up in the same order in which they were enqueued (FIFO order).

If complete() is called multiple times then this will allow for that number
of waiters to continue - each call to complete() will simply increment the
done field. Calling complete_all() multiple times is a bug though. Both
complete() and complete_all() can be called in IRQ/atomic context safely.
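
The counting rules can be modelled with a plain counter, as in the single-threaded sketch below. It omits the waitqueue and the spinlock, and all names are hypothetical. complete_all() is modelled by saturating the counter at UINT_MAX, a sentinel that waiters never decrement; this mirrors, as far as we know, how recent kernels implement it, but treat it as an assumption of the sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Single-threaded model of the 'done' counter only; the real object also
 * has a waitqueue and a spinlock that serializes these updates. */
struct done_counter {
	unsigned int done;
};

static void model_complete(struct done_counter *x)
{
	if (x->done != UINT_MAX)        /* leave a complete_all() in place */
		x->done++;              /* one more waiter may proceed */
}

static void model_complete_all(struct done_counter *x)
{
	x->done = UINT_MAX;             /* sentinel: everybody may proceed */
}

/* Non-blocking consume: fails where a real waiter would block. */
static bool model_try_wait(struct done_counter *x)
{
	if (x->done == 0)
		return false;
	if (x->done != UINT_MAX)        /* complete_all() is never used up */
		x->done--;
	return true;
}

/* Count how many of 'waiters' get through after 'posts' complete() calls,
 * or after a single complete_all() if 'posts' is negative. */
static int waiters_released(int posts, int waiters)
{
	struct done_counter c = { 0 };
	int released = 0;

	if (posts < 0)
		model_complete_all(&c);
	for (int i = 0; i < posts; i++)
		model_complete(&c);
	for (int i = 0; i < waiters; i++)
		released += model_try_wait(&c);
	return released;
}
```

With five waiters, two complete() calls let exactly two through, while a single complete_all() lets all five through.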

There can only be one thread calling complete() or complete_all() on a
particular 'struct completion' at any time - serialized through the wait
queue spinlock. Any such concurrent calls to complete() or complete_all()
are probably a design bug.

Signaling completion from IRQ context is fine as it will appropriately
lock with spin_lock_irqsave()/spin_unlock_irqrestore() and it will never
sleep.


try_wait_for_completion()/completion_done():
--------------------------------------------

The try_wait_for_completion() function will not put the thread on the wait
queue but rather returns false if it would need to enqueue (block) the thread,
else it consumes one posted completion and returns true.

	bool try_wait_for_completion(struct completion *done)

Finally, to check the state of a completion without changing it in any way,
call completion_done(), which returns false if there are no posted
completions that were not yet consumed by waiters (implying that a waiter
would have to block) and true otherwise:

	bool completion_done(struct completion *done)

Both try_wait_for_completion() and completion_done() are safe to be called in
IRQ or atomic context.
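
The difference between the two helpers is whether they consume a posted completion. A single-threaded, hypothetical model of the counter (no locking, complete_all() ignored, all names illustrative) makes this visible:

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded model of the 'done' counter only. */
struct done_counter {
	unsigned int done;
};

/* Like completion_done(): report, never modify. */
static bool model_completion_done(const struct done_counter *x)
{
	return x->done != 0;
}

/* Like try_wait_for_completion(): consume one posted completion, or
 * return false where a real waiter would have had to block. */
static bool model_try_wait(struct done_counter *x)
{
	if (x->done == 0)
		return false;
	x->done--;
	return true;
}

/* Starting from one posted completion, record four observations as
 * bits 0..3: peek, consume, peek, consume. */
static unsigned int peek_then_consume(void)
{
	struct done_counter c = { 1 };
	unsigned int bits = 0;

	bits |= (unsigned int)model_completion_done(&c) << 0;  /* 1: posted */
	bits |= (unsigned int)model_try_wait(&c) << 1;         /* 1: consumed */
	bits |= (unsigned int)model_completion_done(&c) << 2;  /* 0: none left */
	bits |= (unsigned int)model_try_wait(&c) << 3;         /* 0: would block */
	return bits;
}
```

peek_then_consume() returns 0x3: the first peek and the first try-wait succeed, after which both report that nothing is posted.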