===========================================================================
Proper Locking Under a Preemptible Kernel: Keeping Kernel Code Preempt-Safe
===========================================================================

:Author: Robert Love <rml@tech9.net>


Introduction
============

A preemptible kernel creates new locking issues. The issues are the same as
those under SMP: concurrency and reentrancy. Thankfully, the Linux preemptible
kernel model leverages existing SMP locking mechanisms. Thus, the kernel
requires explicit additional locking in very few additional situations.

This document is for all kernel hackers. Code developed in the kernel must
protect the situations described below.


RULE #1: Per-CPU data structures need explicit protection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Two similar problems arise. An example code snippet::

    struct this_needs_locking tux[NR_CPUS];
    tux[smp_processor_id()] = some_value;
    /* task is preempted here... */
    something = tux[smp_processor_id()];

First, since the data is per-CPU, it may not have explicit SMP locking, yet it
still needs protection against preemption. Second, when a preempted task is
finally rescheduled, the previous value of smp_processor_id() may not equal the
current one. You must protect these situations by disabling preemption around
them.

You can also use get_cpu() and put_cpu(): get_cpu() disables preemption and
returns the current CPU number, and put_cpu() re-enables preemption.

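The get_cpu()/put_cpu() pairing can be sketched in plain userspace C. This is
only a model under stated assumptions, not kernel code: every ``model_``
name below is a hypothetical stand-in, and the "scheduler" is reduced to a
function that refuses to move the task while the modeled preempt count is
non-zero.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4

/* Hypothetical userspace stand-ins for kernel state; not kernel code. */
int model_preempt_count;        /* > 0 means preemption is disabled */
int model_current_cpu;          /* CPU the modeled task is running on */
int tux[MODEL_NR_CPUS];         /* per-CPU data, one slot per CPU */

/* Mirrors get_cpu(): disable preemption, then report the current CPU. */
int model_get_cpu(void)
{
    model_preempt_count++;
    return model_current_cpu;
}

/* Mirrors put_cpu(): re-enable preemption. */
void model_put_cpu(void)
{
    assert(model_preempt_count > 0);
    model_preempt_count--;
}

/* Models the scheduler moving the task; allowed only when preemptible. */
int model_try_migrate(int new_cpu)
{
    if (model_preempt_count > 0)
        return 0;               /* preemption disabled: task stays put */
    model_current_cpu = new_cpu;
    return 1;
}

/* Write and read back this CPU's slot with preemption held off. */
int model_update_tux(int value)
{
    int cpu = model_get_cpu();
    int migrated, readback;

    tux[cpu] = value;
    /* A preemption attempt here is refused, so the CPU cannot change. */
    migrated = model_try_migrate((cpu + 1) % MODEL_NR_CPUS);
    readback = tux[model_current_cpu];
    model_put_cpu();

    return !migrated && readback == value;
}
```

Because the write and the read both happen between the two calls, they are
guaranteed to hit the same slot; once model_put_cpu() has run, the task may
be moved again.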

RULE #2: CPU state must be protected.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Under preemption, the state of the CPU must be protected. This is
arch-dependent, but includes CPU structures and state not preserved over a
context switch. For example, on x86, entering and exiting FPU mode is now a
critical section that must occur while preemption is disabled. Consider what
would happen if the kernel is executing a floating-point instruction and is
then preempted. Remember, the kernel does not save FPU state except for user
tasks. Therefore, upon preemption, the FPU registers will be sold to the lowest
bidder. Thus, preemption must be disabled around such regions.

Note that some FPU functions are already explicitly preempt-safe. For example,
kernel_fpu_begin() disables preemption and kernel_fpu_end() re-enables it.
However, fpu__restore() must be called with preemption disabled.


RULE #3: Lock acquire and release must be performed by same task
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


A lock acquired in one task must be released by the same task. This
means you can't do oddball things like acquire a lock and go off to
play while another task releases it. If you want to do something
like this, acquire and release the lock in the same code path and
have the caller wait on an event signaled by the other task.


Solution
========


Data protection under preemption is achieved by disabling preemption for the
duration of the critical region.

::

  preempt_enable()              decrement the preempt counter
  preempt_disable()             increment the preempt counter
  preempt_enable_no_resched()   decrement, but do not immediately preempt
  preempt_check_resched()       if needed, reschedule
  preempt_count()               return the preempt counter

The functions are nestable. In other words, you can call preempt_disable()
n times in a code path, and preemption will not be re-enabled until the n-th
call to preempt_enable(). The preempt statements are defined to nothing if
preemption is not enabled.

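The nesting behaviour follows directly from the counter. The sketch below is
a userspace model, not the kernel implementation: the ``model_`` functions
are hypothetical stand-ins that only track the count, which is enough to show
that an inner disable/enable pair does not re-enable preemption for the outer
region.

```c
#include <assert.h>

/* Hypothetical stand-in for the preempt counter; 0 means preemptible. */
int model_preempt_count;

void model_preempt_disable(void)
{
    model_preempt_count++;
}

void model_preempt_enable(void)
{
    assert(model_preempt_count > 0);    /* unbalanced enable is a bug */
    model_preempt_count--;
}

int model_preemptible(void)
{
    return model_preempt_count == 0;
}

/* A helper that protects its own critical region. */
void model_inner_helper(void)
{
    model_preempt_disable();
    /* ... touch per-CPU data ... */
    model_preempt_enable();
}

/* Returns 1 if preemption stayed disabled until the outermost enable. */
int model_nested_region(void)
{
    int after_inner, after_outer;

    model_preempt_disable();                /* count: 0 -> 1 */
    model_inner_helper();                   /* count: 1 -> 2 -> 1 */
    after_inner = model_preemptible();      /* 0: still disabled */
    model_preempt_enable();                 /* count: 1 -> 0 */
    after_outer = model_preemptible();      /* 1: preemptible again */

    return !after_inner && after_outer;
}
```

This is why a helper can safely disable preemption on its own without
knowing whether its caller has already done so.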
Note that you do not need to explicitly prevent preemption if you are holding
any locks or interrupts are disabled, since preemption is implicitly disabled
in those cases.

But keep in mind that 'irqs disabled' is a fundamentally unsafe way of
disabling preemption: any cond_resched() or cond_resched_lock() might trigger
a reschedule if the preempt count is 0. A simple printk() might trigger a
reschedule. So use this implicit preemption-disabling property only if you
know that the affected codepath does not do any of this. The best policy is to
use this only for small, atomic code that you wrote and which calls no complex
functions.

Example::

    cpucache_t *cc; /* this is per-CPU */
    preempt_disable();
    cc = cc_data(searchp);
    if (cc && cc->avail) {
        __free_block(searchp, cc_entry(cc), cc->avail);
        cc->avail = 0;
    }
    preempt_enable();
    return 0;

Notice how the preemption statements must encompass every reference of the
critical variables. Another example::

    int buf[NR_CPUS];
    set_cpu_val(buf);
    if (buf[smp_processor_id()] == -1) printk(KERN_INFO "wee!\n");
    spin_lock(&buf_lock);
    /* ... */

This code is not preempt-safe, but see how easily we can fix it by simply
moving the spin_lock up two lines.

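The effect of moving the lock can be sketched with a userspace model that
counts per-CPU accesses made while still preemptible. Everything here is an
assumption for illustration: the ``model_`` names are hypothetical, the
stand-in for set_cpu_val() simply stores -1, and the modeled spin lock only
tracks the preempt count, which is the one property of spin_lock() this
example relies on.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4

int model_preempt_count;        /* > 0 means preemption is disabled */
int model_current_cpu;
int model_unsafe_accesses;      /* per-CPU accesses made while preemptible */

struct model_spinlock { int locked; };
struct model_spinlock model_buf_lock;

/* Taking the modeled spin lock also disables preemption. */
void model_spin_lock(struct model_spinlock *lock)
{
    model_preempt_count++;
    assert(!lock->locked);
    lock->locked = 1;
}

void model_spin_unlock(struct model_spinlock *lock)
{
    assert(lock->locked && model_preempt_count > 0);
    lock->locked = 0;
    model_preempt_count--;
}

/* Counts every CPU-id lookup made while the task could be preempted. */
int model_smp_processor_id(void)
{
    if (model_preempt_count == 0)
        model_unsafe_accesses++;
    return model_current_cpu;
}

/* Hypothetical stand-in for set_cpu_val(): marks this CPU's slot. */
void model_set_cpu_val(int *buf)
{
    buf[model_smp_processor_id()] = -1;
}

/* Original ordering: both per-CPU accesses happen before the lock. */
int model_original_path(void)
{
    int buf[MODEL_NR_CPUS] = { 0 };
    int hit;

    model_set_cpu_val(buf);
    hit = (buf[model_smp_processor_id()] == -1);
    model_spin_lock(&model_buf_lock);
    /* ... */
    model_spin_unlock(&model_buf_lock);
    return hit;
}

/* Fixed ordering: the lock is taken two lines earlier. */
int model_fixed_path(void)
{
    int buf[MODEL_NR_CPUS] = { 0 };
    int hit;

    model_spin_lock(&model_buf_lock);
    model_set_cpu_val(buf);
    hit = (buf[model_smp_processor_id()] == -1);
    /* ... */
    model_spin_unlock(&model_buf_lock);
    return hit;
}
```

In the fixed path both per-CPU accesses happen with the lock held, so the
counter stays at zero; the original ordering records two unprotected
accesses.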

Preventing preemption using interrupt disabling
===============================================


It is possible to prevent a preemption event using local_irq_disable() and
local_irq_save(). Note that when doing so, you must be very careful not to
cause an event that would set need_resched and result in a preemption check.
When in doubt, rely on locking or explicit preemption disabling.

Note that in 2.5, interrupt disabling is now only per-CPU (i.e., local).

An additional concern is proper usage of local_irq_disable() and
local_irq_save(). These may be used to protect from preemption; however, on
exit, if preemption may be enabled, a test to see if preemption is required
should be done. If these are called from the spin_lock and read/write lock
macros, the right thing is done. They may also be called within a spin-lock
protected region; however, if they are ever called outside of this context, a
test for preemption should be made. Do note that calls from interrupt context
or bottom halves/tasklets are also protected by preemption locks and so may
use the versions which do not check preemption.
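
The "test for preemption on exit" can be sketched as a userspace model. The
assumptions are significant: all ``model_`` names are hypothetical, the
modeled local_irq_save() returns the saved state instead of filling a flags
argument as the kernel macro does, and "scheduling" is just a counter. The
sketch only illustrates the ordering: restore interrupts first, then check
whether a reschedule became pending inside the region.

```c
#include <assert.h>

int model_irqs_enabled = 1;     /* modeled local interrupt state */
int model_preempt_count;        /* > 0 means preemption is disabled */
int model_need_resched;         /* set when a reschedule is pending */
int model_schedule_calls;       /* how often the modeled scheduler ran */

/* Disable modeled interrupts and hand back the previous state. */
unsigned long model_local_irq_save(void)
{
    unsigned long flags = (unsigned long)model_irqs_enabled;

    model_irqs_enabled = 0;
    return flags;
}

void model_local_irq_restore(unsigned long flags)
{
    model_irqs_enabled = (int)flags;
}

/* Mirrors preempt_check_resched(): reschedule if needed and allowed. */
void model_preempt_check_resched(void)
{
    if (model_need_resched && model_preempt_count == 0 &&
        model_irqs_enabled) {
        model_need_resched = 0;
        model_schedule_calls++;
    }
}

/* An event inside the region, such as a wakeup, requests a reschedule. */
void model_wake_up_other_task(void)
{
    model_need_resched = 1;
}

void model_irq_protected_region(void)
{
    unsigned long flags = model_local_irq_save();

    model_wake_up_other_task();     /* pending, but cannot preempt yet */
    model_local_irq_restore(flags);
    model_preempt_check_resched();  /* the test made on exit */
}
```

Without the final check, the pending reschedule would be left waiting until
some later preemption point.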