================================================
Completions - "wait for completion" barrier APIs
================================================

Introduction:
-------------

If you have one or more threads that must wait for some kernel activity
to have reached a point or a specific state, completions can provide a
race-free solution to this problem. Semantically they are somewhat like a
pthread_barrier() and have similar use-cases.

Completions are a code synchronization mechanism which is preferable to any
misuse of locks/semaphores and busy-loops. Any time you think of using
yield() or some quirky msleep(1) loop to allow something else to proceed,
you probably want to look into using one of the wait_for_completion*()
calls and complete() instead.

The advantage of using completions is that they have a well defined, focused
purpose which makes it very easy to see the intent of the code. They
also result in more efficient code as all threads can continue execution
until the result is actually needed, and both the waiting and the signalling
are highly efficient using low level scheduler sleep/wakeup facilities.

Completions are built on top of the waitqueue and wakeup infrastructure of
the Linux scheduler. The event the threads on the waitqueue are waiting for
is reduced to a simple flag in 'struct completion', appropriately called "done".

As completions are scheduling related, the code can be found in
kernel/sched/completion.c.


Usage:
------

There are three main parts to using completions:

 - the initialization of the 'struct completion' synchronization object,
 - the waiting part through a call to one of the variants of wait_for_completion(),
 - the signaling side through a call to complete() or complete_all().

There are also some helper functions for checking the state of completions.
Note that while initialization must happen first, the waiting and signaling
part can happen in any order. I.e. it's entirely normal for a thread
to have marked a completion as 'done' before another thread checks whether
it has to wait for it.

To use completions you need to #include <linux/completion.h> and
create a static or dynamic variable of type 'struct completion',
which has only two fields::

	struct completion {
		unsigned int done;
		wait_queue_head_t wait;
	};

This provides the ->wait waitqueue to place tasks on for waiting (if any), and
the ->done completion flag for indicating whether it's completed or not.

Completions should be named to refer to the event that is being synchronized on.
A good example is::

	wait_for_completion(&early_console_added);

	complete(&early_console_added);

Good, intuitive naming (as always) helps code readability. Naming a completion
'complete' is not helpful unless the purpose is super obvious...


Initializing completions:
-------------------------

Dynamically allocated completion objects should preferably be embedded in data
structures that are assured to be alive for the life-time of the function/driver,
to prevent races with asynchronous complete() calls from occurring.

Particular care should be taken when using the _timeout() or _killable()/_interruptible()
variants of wait_for_completion(), as it must be assured that memory de-allocation
does not happen until all related activities (complete() or reinit_completion())
have taken place, even if these wait functions return prematurely due to a timeout
or a signal triggering.

Initialization of dynamically allocated completion objects is done via a call to
init_completion()::

	init_completion(&dynamic_object->done);

In this call we initialize the waitqueue and set ->done to 0, i.e. "not completed"
or "not done".

The re-initialization function, reinit_completion(), simply resets the
->done field to 0 ("not done"), without touching the waitqueue.
Callers of this function must make sure that there are no racy
wait_for_completion() calls going on in parallel.

Calling init_completion() on the same completion object twice is
most likely a bug as it re-initializes the queue to an empty queue and
enqueued tasks could get "lost" - use reinit_completion() in that case,
but be aware of other races.

For static declaration and initialization, macros are available.

For static (or global) declarations in file scope you can use
DECLARE_COMPLETION()::

	static DECLARE_COMPLETION(setup_done);
	DECLARE_COMPLETION(setup_done);

Note that in this case the completion is boot time (or module load time)
initialized to 'not done' and doesn't require an init_completion() call.

When a completion is declared as a local variable within a function,
then the initialization should always use DECLARE_COMPLETION_ONSTACK()
explicitly, not just to make lockdep happy, but also to make it clear
that limited scope has been considered and is intentional::

	DECLARE_COMPLETION_ONSTACK(setup_done);

Note that when using completion objects as local variables you must be
acutely aware of the short life time of the function stack: the function
must not return to a calling context until all activities (such as waiting
threads) have ceased and the completion object is completely unused.

To emphasise this again: in particular when using some of the waiting API variants
with more complex outcomes, such as the timeout or signalling (_timeout(),
_killable() and _interruptible()) variants, the wait might complete
prematurely while the object might still be in use by another thread - and a return
from the wait_for_completion*() caller function will deallocate the function
stack and cause subtle data corruption if a complete() is done in some
other thread. Simple testing might not trigger these kinds of races.

If unsure, use dynamically allocated completion objects, preferably embedded
in some other long lived object that has a boringly long life time which
exceeds the life time of any helper threads using the completion object,
or has a lock or other synchronization mechanism to make sure complete()
is not called on a freed object.

A naive DECLARE_COMPLETION() on the stack triggers a lockdep warning.

Waiting for completions:
------------------------

For a thread to wait for some concurrent activity to finish, it
calls wait_for_completion() on the initialized completion structure::

	void wait_for_completion(struct completion *done)

A typical usage scenario is::

	CPU#1                                   CPU#2

	struct completion setup_done;

	init_completion(&setup_done);
	initialize_work(...,&setup_done,...);

	/* run non-dependent code */            /* do setup */

	wait_for_completion(&setup_done);       complete(&setup_done);

This does not imply any particular order between wait_for_completion() and
the call to complete() - if the call to complete() happened before the call
to wait_for_completion() then the waiting side simply will continue
immediately as all dependencies are satisfied; if not, it will block until
completion is signaled by complete().

Note that wait_for_completion() calls spin_lock_irq()/spin_unlock_irq(),
so it can only be called safely when you know that interrupts are enabled.
Calling it from IRQs-off atomic contexts will result in hard-to-detect
spurious enabling of interrupts.

The default behavior is to wait without a timeout and to mark the task as
uninterruptible. wait_for_completion() and its variants are only safe
in process context (as they can sleep) but not in atomic context,
interrupt context, with disabled IRQs, or with preemption disabled - see also
try_wait_for_completion() below for handling completion in atomic/interrupt
context.

All variants of wait_for_completion() can (obviously) block for a long
time depending on the nature of the activity they are waiting for, so in
most cases you probably don't want to call them with held mutexes.


wait_for_completion*() variants available:
------------------------------------------

The variants below all return a status, and this status should be checked in
most(/all) cases - in cases where the status is deliberately not checked you
probably want to make a note explaining this (e.g. see
arch/arm/kernel/smp.c:__cpu_up()).

A common problem is the careless assignment of return values to variables of
the wrong type, so take care to assign return-values to variables of the
proper type.

Checks for the specific meaning of return values have also been found
to be quite inaccurate, e.g. constructs like::

	if (!wait_for_completion_interruptible_timeout(...))

... would execute the same code path for successful completion and for the
interrupted case - which is probably not what you want::

	int wait_for_completion_interruptible(struct completion *done)

This function marks the task TASK_INTERRUPTIBLE while it is waiting.
If a signal was received while waiting it will return -ERESTARTSYS; 0 otherwise::

	unsigned long wait_for_completion_timeout(struct completion *done, unsigned long timeout)

The task is marked as TASK_UNINTERRUPTIBLE and will wait at most 'timeout'
jiffies. If a timeout occurs it returns 0, else the remaining time in
jiffies (but at least 1).

Timeouts are preferably calculated with msecs_to_jiffies() or usecs_to_jiffies(),
to make the code largely HZ-invariant.

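To see why a hard-coded jiffies count is not HZ-invariant, the arithmetic can be
sketched in userspace. The two helpers below are hypothetical models written for
this illustration, not the kernel's conversion routines; they take the tick rate
as a parameter so that several HZ values can be compared:

```c
/* Hypothetical model: convert milliseconds to ticks at 'hz', rounding up. */
static unsigned long model_msecs_to_jiffies(unsigned long ms, unsigned long hz)
{
	return (ms * hz + 999) / 1000;
}

/* Hypothetical model: how many milliseconds 'ticks' lasts at 'hz'. */
static unsigned long model_jiffies_to_msecs(unsigned long ticks, unsigned long hz)
{
	return ticks * 1000 / hz;
}
```

In this model a 50 ms timeout becomes 5 ticks at HZ=100 but 50 ticks at HZ=1000,
whereas a literal timeout of 5 ticks would last 50 ms at HZ=100 and only 5 ms at
HZ=1000.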
If the returned timeout value is deliberately ignored a comment should probably explain
why (e.g. see drivers/mfd/wm8350-core.c wm8350_read_auxadc())::

	long wait_for_completion_interruptible_timeout(struct completion *done, unsigned long timeout)

This function takes a timeout in jiffies and marks the task as
TASK_INTERRUPTIBLE. If a signal was received it will return -ERESTARTSYS;
otherwise it returns 0 if the completion timed out, or the remaining time in
jiffies if completion occurred.

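The three possible outcomes of wait_for_completion_interruptible_timeout() can
be told apart with a check along the following lines. This is a userspace
sketch for illustration only: the enum and classify_wait_result() are
hypothetical names, and ERESTARTSYS is defined locally because it is a
kernel-internal error code that userspace headers do not provide:

```c
/* Kernel-internal errno value, defined here so the sketch builds in userspace. */
#define ERESTARTSYS 512

/* Hypothetical names for the three outcomes described in the text. */
enum wait_outcome {
	WAIT_COMPLETED,		/* completion was signaled in time */
	WAIT_TIMED_OUT,		/* the timeout expired first */
	WAIT_INTERRUPTED,	/* a signal interrupted the wait */
};

/*
 * Classify the value returned by wait_for_completion_interruptible_timeout().
 * The sign carries information, so the caller must store it in a signed long.
 */
static enum wait_outcome classify_wait_result(long ret)
{
	if (ret < 0)
		return WAIT_INTERRUPTED;	/* e.g. -ERESTARTSYS */
	if (ret == 0)
		return WAIT_TIMED_OUT;
	return WAIT_COMPLETED;			/* ret is the remaining jiffies */
}
```

Storing the return value in an unsigned variable would turn -ERESTARTSYS into a
large positive number, which such a check would then mistake for a successful
completion.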
Further variants include _killable which uses TASK_KILLABLE as the
designated task state and will return -ERESTARTSYS if it is interrupted,
or 0 if completion was achieved. There is a _timeout variant as well::

	int wait_for_completion_killable(struct completion *done)
	long wait_for_completion_killable_timeout(struct completion *done, unsigned long timeout)

The _io variants, wait_for_completion_io() and wait_for_completion_io_timeout(),
behave the same as the non-_io variants, except for accounting waiting time as
'waiting on IO', which has an impact on how the task is accounted in
scheduling/IO stats::

	void wait_for_completion_io(struct completion *done)
	unsigned long wait_for_completion_io_timeout(struct completion *done, unsigned long timeout)


Signaling completions:
----------------------

A thread that wants to signal that the conditions for continuation have been
achieved calls complete() to signal exactly one of the waiters that it can
continue::

	void complete(struct completion *done)

... or calls complete_all() to signal all current and future waiters::

	void complete_all(struct completion *done)

The signaling will work as expected even if completions are signaled before
a thread starts waiting. This is achieved by the waiter "consuming"
(decrementing) the done field of 'struct completion'. Waiting threads
are woken up in the same order in which they were enqueued (FIFO order).

If complete() is called multiple times then this will allow for that number
of waiters to continue - each call to complete() will simply increment the
done field. Calling complete_all() multiple times is a bug though. Both
complete() and complete_all() can be called in IRQ/atomic context safely.

There can only be one thread calling complete() or complete_all() on a
particular 'struct completion' at any time - serialized through the wait
queue spinlock. Any such concurrent calls to complete() or complete_all()
probably are a design bug.

Signaling completion from IRQ context is fine as it will appropriately
lock with spin_lock_irqsave()/spin_unlock_irqrestore() and it will never
sleep.


try_wait_for_completion()/completion_done():
--------------------------------------------

The try_wait_for_completion() function will not put the thread on the wait
queue but rather returns false if it would need to enqueue (block) the thread,
else it consumes one posted completion and returns true::

	bool try_wait_for_completion(struct completion *done)

Finally, to check the state of a completion without changing it in any way,
call completion_done(), which returns false if there are no posted
completions that were not yet consumed by waiters (implying that there are
waiters) and true otherwise::

	bool completion_done(struct completion *done)

Both try_wait_for_completion() and completion_done() are safe to be called in
IRQ or atomic context.
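
The "post" and "consume" accounting of the done field can be illustrated with a
small single-threaded userspace model. It is only a sketch under stated
assumptions: 'struct completion_model' and the model_*() functions are
hypothetical names, there is no waitqueue or locking, and complete_all() is
modeled as a saturated count that is never consumed so that all current and
future waiters may proceed:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model: only the ->done accounting, no waitqueue, no locking. */
struct completion_model {
	unsigned int done;
};

/* Like init_completion(): start in the "not done" state. */
static void model_init(struct completion_model *c)
{
	c->done = 0;
}

/* Like complete(): post one completion, allowing one waiter to continue. */
static void model_complete(struct completion_model *c)
{
	if (c->done != UINT_MAX)
		c->done++;
}

/* Like complete_all(): modeled as a saturated count (an assumption). */
static void model_complete_all(struct completion_model *c)
{
	c->done = UINT_MAX;
}

/* Like try_wait_for_completion(): consume one posted completion or fail. */
static bool model_try_wait(struct completion_model *c)
{
	if (c->done == 0)
		return false;		/* a real waiter would have to block */
	if (c->done != UINT_MAX)
		c->done--;		/* consume one posted completion */
	return true;
}

/* Like completion_done(): report the state without changing it. */
static bool model_completion_done(const struct completion_model *c)
{
	return c->done != 0;
}
```

In this model, two calls to model_complete() let exactly two calls to
model_try_wait() succeed before the third fails, while after
model_complete_all() every further model_try_wait() succeeds.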