Real-Time group scheduling
--------------------------

CONTENTS
========

0. WARNING
1. Overview
  1.1 The problem
  1.2 The solution
2. The interface
  2.1 System-wide settings
  2.2 Default behaviour
  2.3 Basis for grouping tasks
3. Future plans


0. WARNING
==========

Fiddling with these settings can result in an unstable system. The knobs are
root-only and assume root knows what it is doing.

Most notably:

 * Very small values in sched_rt_period_us can result in an unstable
   system when the period is smaller than either the available hrtimer
   resolution or the time it takes to handle the budget refresh itself.

 * Very small values in sched_rt_runtime_us can result in an unstable
   system when the runtime is so small that the system has difficulty
   making forward progress (NOTE: the migration thread and kstopmachine
   are both real-time processes).

1. Overview
===========


1.1 The problem
---------------

Realtime scheduling is all about determinism: a group has to be able to rely on
the amount of bandwidth (e.g. CPU time) being constant. In order to schedule
multiple groups of realtime tasks, each group must be assigned a fixed portion
of the CPU time available. Without a minimum guarantee a realtime group can
obviously fall short. A fuzzy upper limit is of no use since it cannot be
relied upon. That leaves us with just the single fixed portion.

1.2 The solution
----------------

CPU time is divided by specifying how much time can be spent running in a
given period. We allocate this "run time" to each realtime group, and the
other realtime groups are not permitted to use it.

Any time not allocated to a realtime group will be used to run normal priority
tasks (SCHED_OTHER). Any allocated run time that goes unused will also be
picked up by SCHED_OTHER.

Let's consider an example: a fixed-frame-rate realtime renderer must deliver
25 frames a second, which yields a period of 0.04s per frame. Now say it will
also have to play some music and respond to input, leaving around 80% CPU time
dedicated to the graphics. We can then give this group a run time of
0.8 * 0.04s = 0.032s.

This way the graphics group will have a 0.04s period with a 0.032s run time
limit. Now if the audio thread needs to refill the DMA buffer every 0.005s, but
needs only about 3% CPU time to do so, it can make do with
0.03 * 0.005s = 0.00015s. So this group can be scheduled with a period of
0.005s and a run time of 0.00015s.

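The arithmetic in this example can be checked with a few lines of C. This is a
minimal sketch that works in integer microseconds, the unit the kernel
interface uses; the helper names runtime_us_for_share() and bandwidth_ppm()
are hypothetical, and the percentages are the ones from the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: run time (us) for a group that needs `share_pct`
 * percent of the CPU within a period of `period_us` microseconds. */
static int64_t runtime_us_for_share(int64_t period_us, int64_t share_pct)
{
	assert(period_us > 0 && share_pct >= 0 && share_pct <= 100);
	return period_us * share_pct / 100;
}

/* Hypothetical helper: a group's bandwidth in parts per million of one CPU. */
static int64_t bandwidth_ppm(int64_t runtime_us, int64_t period_us)
{
	assert(period_us > 0 && runtime_us >= 0);
	return runtime_us * 1000000 / period_us;
}
```

runtime_us_for_share(40000, 80) gives 32000us (0.032s) for the graphics group
and runtime_us_for_share(5000, 3) gives 150us (0.00015s) for the audio group.
Together they reserve 830000 ppm, i.e. 83% of the CPU, leaving about 17% for
input handling and other SCHED_OTHER work.
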
The remaining CPU time will be used for user input and other tasks. Because
realtime tasks have explicitly allocated the CPU time they need to perform
their tasks, buffer underruns in the graphics or audio can be eliminated.

NOTE: the above example is not fully implemented yet. We still
lack an EDF scheduler to make non-uniform periods usable.


2. The interface
================


2.1 System-wide settings
------------------------

The system-wide settings are configured under the /proc virtual file system:

/proc/sys/kernel/sched_rt_period_us:
  The scheduling period that is equivalent to 100% CPU bandwidth.

/proc/sys/kernel/sched_rt_runtime_us:
  A global limit on how much time realtime scheduling may use. Even without
  CONFIG_RT_GROUP_SCHED enabled, this will limit the time reserved for
  realtime processes. With CONFIG_RT_GROUP_SCHED it signifies the total
  bandwidth available to all realtime groups.

  * Time is specified in microseconds (us) because the interface is s32.
    This gives an operating range from 1us to about 35 minutes.
  * sched_rt_period_us takes values from 1 to INT_MAX.
  * sched_rt_runtime_us takes values from -1 to (INT_MAX - 1).
  * A run time of -1 specifies runtime == period, i.e. no limit.

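The documented ranges can be expressed as simple checks. The function names
below are hypothetical and this is not the kernel's sysctl handler; it only
encodes the bounds listed above and the meaning of a run time of -1.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* sched_rt_period_us: 1 .. INT_MAX */
static bool rt_period_valid(long long period_us)
{
	return period_us >= 1 && period_us <= INT_MAX;
}

/* sched_rt_runtime_us: -1 .. (INT_MAX - 1) */
static bool rt_runtime_valid(long long runtime_us)
{
	return runtime_us >= -1 && runtime_us <= (long long)INT_MAX - 1;
}

/* A run time of -1 means runtime == period, i.e. no limit. */
static long long rt_effective_runtime(long long runtime_us, long long period_us)
{
	assert(rt_period_valid(period_us) && rt_runtime_valid(runtime_us));
	return runtime_us == -1 ? period_us : runtime_us;
}

/* Longest representable period (INT_MAX us), in whole minutes. */
static long long rt_max_period_minutes(void)
{
	return (long long)INT_MAX / 1000000 / 60;
}
```

rt_max_period_minutes() evaluates to 35, matching the "about 35 minutes" upper
bound, and rt_effective_runtime(-1, 1000000) returns the full period of
1000000us.
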

2.2 Default behaviour
---------------------

The default values are 1000000 (1s) for sched_rt_period_us and 950000 (0.95s)
for sched_rt_runtime_us. This leaves 0.05s of every second to be used by
SCHED_OTHER (non-RT tasks). These defaults were chosen so that a runaway
realtime task will not lock up the machine but will leave a little time to
recover it. Setting the runtime to -1 restores the old behaviour (no limit).

By default all bandwidth is assigned to the root group, and new groups get the
period from /proc/sys/kernel/sched_rt_period_us and a run time of 0. If you
want to assign bandwidth to another group, reduce the root group's bandwidth
and assign some or all of the difference to another group.

Realtime group scheduling means you have to assign a portion of total CPU
bandwidth to the group before it will accept realtime tasks. Therefore you will
not be able to run realtime tasks as any user other than root until you have
done that, even if the user has the rights to run processes with realtime
priority!


2.3 Basis for grouping tasks
----------------------------

Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real
CPU bandwidth to task groups.

This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us"
to control the CPU time reserved for each control group.

For more information on working with control groups, you should read
Documentation/cgroup-v1/cgroups.txt as well.

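As a rough illustration of this interface, the sketch below writes and reads
back a group's cpu.rt_runtime_us from C. It assumes CONFIG_RT_GROUP_SCHED,
root privileges and an already mounted cpu controller; where that hierarchy is
mounted varies between systems, so the group directory is a parameter. The
helper names cg_set_rt_runtime() and cg_get_rt_runtime() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Write `runtime_us` to "<cgroup_dir>/cpu.rt_runtime_us".
 * Returns 0 on success and -1 on failure. */
static int cg_set_rt_runtime(const char *cgroup_dir, long long runtime_us)
{
	char path[4096];
	FILE *f;
	int n, ok;

	assert(cgroup_dir != NULL);
	n = snprintf(path, sizeof(path), "%s/cpu.rt_runtime_us", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = fprintf(f, "%lld\n", runtime_us) > 0;
	/* stdio buffers the write, so an error returned by write(2) only
	 * shows up when the buffer is flushed: check fclose() as well. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the current value of "<cgroup_dir>/cpu.rt_runtime_us". */
static int cg_get_rt_runtime(const char *cgroup_dir, long long *runtime_us)
{
	char path[4096];
	FILE *f;
	int n, ok;

	assert(cgroup_dir != NULL && runtime_us != NULL);
	n = snprintf(path, sizeof(path), "%s/cpu.rt_runtime_us", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	ok = fscanf(f, "%lld", runtime_us) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

If the new value would break the limits described below, the kernel rejects
the write and cg_set_rt_runtime() returns -1 instead of silently succeeding.
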
Group settings are checked against the following limits in order to keep the
configuration schedulable:

   \Sum_{i} runtime_{i} / global_period <= global_runtime / global_period

For now, multiplying both sides by global_period simplifies this to just the
following (but see Future plans):

   \Sum_{i} runtime_{i} <= global_runtime

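The simplified limit translates directly into an admission check. This sketch
assumes a finite global_runtime (the -1 "no limit" case is not handled), and
rt_groups_schedulable() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Check  \Sum_{i} runtime_{i} <= global_runtime  for `n` groups whose run
 * times (us) are in runtimes_us[]. */
static bool rt_groups_schedulable(const long long *runtimes_us, size_t n,
				  long long global_runtime_us)
{
	long long sum = 0;

	assert(global_runtime_us >= 0);
	for (size_t i = 0; i < n; i++) {
		assert(runtimes_us[i] >= 0);
		sum += runtimes_us[i];
	}
	return sum <= global_runtime_us;
}
```

With the default global_runtime of 950000us, groups of 750000us and 200000us
pass, while 750000us and 250000us (1000000us in total) are rejected.
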

3. Future plans
===============

There is work in progress to make the scheduling period for each group
("<cgroup>/cpu.rt_period_us") configurable as well.

The constraint on the period is that a subgroup must have a smaller or
equal period to its parent. But realistically it's not very useful _yet_,
as it's prone to starvation without deadline scheduling.

Consider two sibling groups A and B; both have 50% bandwidth, but A's
period is twice the length of B's.

* group A: period=100000us, runtime=50000us
    - this runs for 0.05s once every 0.1s

* group B: period= 50000us, runtime=25000us
    - this runs for 0.025s twice every 0.1s (or once every 0.05s).

This means that currently a while (1) loop in A will run for the full period of
B and can starve B's tasks (assuming they are of lower priority) for a whole
period.

The next project will be SCHED_EDF (Earliest Deadline First scheduling) to bring
full deadline scheduling to the Linux kernel. Scheduling the above groups by
deadline, treating the end of each period as a deadline, will ensure that they
both get their allocated time.

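The starvation scenario and the EDF fix can be reproduced with a toy
simulation. This sketch is not the kernel's scheduler: it assumes 1 ms ticks,
both groups always runnable (the while (1) case), A at the higher fixed
priority, a group being throttled once its budget for the current period is
used up, and a plain earliest-deadline pick as the alternative policy. All
names are hypothetical.

```c
#include <assert.h>

/* One always-runnable RT group in a 1 ms tick simulation. */
struct sim_group {
	int period;   /* period length, ms */
	int runtime;  /* budget granted per period, ms */
	int budget;   /* budget left in the current period, ms */
	int deadline; /* end of the current period (absolute ms) */
	int ran;      /* ms actually run in the current period */
	int min_ran;  /* smallest `ran` over all completed periods */
};

enum policy { FIXED_PRIO_A_FIRST, EDF };

/* Simulate groups A and B from the example on one CPU for `ticks` ms and
 * return the least run time (ms) B received in any one of its periods. */
static int sim_min_runtime_b(enum policy pol, int ticks)
{
	struct sim_group g[2] = {
		{ .period = 100, .runtime = 50 }, /* g[0]: group A */
		{ .period =  50, .runtime = 25 }, /* g[1]: group B */
	};

	assert(ticks > 0 && ticks % 100 == 0); /* end on a common boundary */
	for (int i = 0; i < 2; i++) {
		g[i].budget = g[i].runtime;
		g[i].deadline = g[i].period;
		g[i].min_ran = g[i].runtime;
	}

	for (int t = 0; ; t++) {
		/* At the end of a period: record what the group got, refill. */
		for (int i = 0; i < 2; i++) {
			if (t != g[i].deadline)
				continue;
			if (g[i].ran < g[i].min_ran)
				g[i].min_ran = g[i].ran;
			g[i].budget = g[i].runtime;
			g[i].ran = 0;
			g[i].deadline += g[i].period;
		}
		if (t == ticks)
			break;

		/* Choose among groups with budget left. Fixed priority always
		 * prefers A; EDF prefers the earlier deadline (ties go to A). */
		int pick = -1;
		for (int i = 0; i < 2; i++) {
			if (g[i].budget == 0)
				continue;
			if (pick < 0 ||
			    (pol == EDF && g[i].deadline < g[pick].deadline))
				pick = i;
		}
		if (pick < 0)
			continue; /* both throttled: time goes to SCHED_OTHER */
		g[pick].budget--;
		g[pick].ran++;
	}
	return g[1].min_ran;
}
```

sim_min_runtime_b(FIXED_PRIO_A_FIRST, 200) returns 0: A runs for the whole of
B's period [0, 50ms) and B gets nothing in it. sim_min_runtime_b(EDF, 200)
returns 25, so B receives its full 25ms budget in every one of its periods.
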
Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
the biggest challenge, as the current Linux PI infrastructure is geared towards
the limited static priority levels 0-99. With deadline scheduling you need to
do deadline inheritance (since priority is inversely proportional to the
deadline delta (deadline - now)).

This means the whole PI machinery will have to be reworked, and that is one of
the most complex pieces of code we have.
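To make "deadline inheritance" concrete, here is a deliberately simplified
sketch with hypothetical types: one level of blocking, no locking and no
requeueing. A lock owner runs with the earliest deadline among itself and the
tasks blocked on it, the deadline analogue of boosting to the highest waiter
priority. Unlike a priority in 0-99, the inherited value is an absolute time,
so it does not fit the existing static-priority model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task: an absolute deadline in us (smaller = more urgent). */
struct dl_task {
	long long deadline;
};

/* Deadline a lock owner should run with: the earliest of its own deadline
 * and those of the `nwaiters` tasks blocked on a lock it holds. */
static long long dl_inherited_deadline(const struct dl_task *owner,
				       const struct dl_task *const *waiters,
				       size_t nwaiters)
{
	long long dl;

	assert(owner != NULL);
	dl = owner->deadline;
	for (size_t i = 0; i < nwaiters; i++)
		if (waiters[i]->deadline < dl)
			dl = waiters[i]->deadline;
	return dl;
}
```

An owner with deadline 5000 blocking waiters with deadlines 9000 and 3000 runs
with deadline 3000 until it releases the lock.
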