Commit | Line | Data |
---|---|---|
898bd37a | 1 | ========================== |
aee69d78 PV |
2 | BFQ (Budget Fair Queueing) |
3 | ========================== | |
4 | ||
5 | BFQ is a proportional-share I/O scheduler, with some extra | |
6 | low-latency capabilities. In addition to cgroups support (blkio or io | |
7 | controllers), BFQ's main features are: | |
898bd37a | 8 | |
aee69d78 PV |
9 | - BFQ guarantees a high system and application responsiveness, and a |
10 | low latency for time-sensitive applications, such as audio or video | |
11 | players; | |
12 | - BFQ distributes bandwidth, and not just time, among processes or | |
13 | groups (switching back to time distribution when needed to keep | |
14 | throughput high). | |
15 | ||
43c1b3d6 PV |
16 | In its default configuration, BFQ privileges latency over |
17 | throughput. So, when needed for achieving a lower latency, BFQ builds | |
18 | schedules that may lead to a lower throughput. If your main or only | |
19 | goal, for a given device, is to achieve the maximum-possible | |
20 | throughput at all times, then do switch off all low-latency heuristics | |
233f0bf4 PV |
21 | for that device, by setting low_latency to 0. See Section 3 for |
22 | details on how to configure BFQ for the desired tradeoff between | |
23 | latency and throughput, or on how to maximize throughput. | |
43c1b3d6 | 24 | |
4438cf50 PV |
25 | Like every I/O scheduler, BFQ adds some overhead to per-I/O-request |
26 | processing. To give an idea of this overhead, the total, | |
27 | single-lock-protected, per-request processing time of BFQ---i.e., the | |
28 | sum of the execution times of the request insertion, dispatch and | |
29 | completion hooks---is, e.g., 1.9 us on an Intel Core i7-2760QM@2.40GHz | |
30 | (dated CPU for notebooks; time measured with simple code | |
31 | instrumentation, and using the throughput-sync.sh script of the S | |
32 | suite [3], in performance-profiling mode). To put this result into | |
33 | context, the total, single-lock-protected, per-request execution time | |
34 | of the lightest I/O scheduler available in blk-mq, mq-deadline, is 0.7 | |
35 | us (mq-deadline is ~800 LOC, against ~10500 LOC for BFQ). | |
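
To put these per-request costs in perspective, the following back-of-the-envelope sketch converts them into an upper bound on IOPS. It assumes, purely for illustration, that the scheduler hooks are fully serialized by the single lock and that the rest of the I/O stack costs nothing, so real limits are lower. The function name is hypothetical.

```python
def max_iops_from_overhead(per_request_us: float) -> float:
    """Upper bound on IOPS if every request costs per_request_us of serialized time."""
    if per_request_us <= 0:
        raise ValueError("per-request time must be positive")
    return 1_000_000 / per_request_us


# Per-request processing times quoted above, in microseconds.
bfq_bound = max_iops_from_overhead(1.9)
mq_deadline_bound = max_iops_from_overhead(0.7)

print(f"BFQ bound:         {bfq_bound / 1000:.0f} KIOPS")
print(f"mq-deadline bound: {mq_deadline_bound / 1000:.0f} KIOPS")
```

These bounds only account for the scheduler's own hooks on the CPU used for the measurement above.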
36 | ||
37 | Scheduling overhead further limits the maximum IOPS that a CPU can | |
38 | process (already limited by the execution of the rest of the I/O | |
39 | stack). To give an idea of the limits with BFQ, on slow or average | |
40 | CPUs, here are, first, the limits of BFQ for three different CPUs, on, | |
41 | respectively, an average laptop, an old desktop, and a cheap embedded | |
42 | system, in case full hierarchical support is enabled (i.e., | |
8060c47b | 43 | CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_BFQ_CGROUP_DEBUG is not |
4438cf50 | 44 | set (Section 4-2): |
a33801e8 LM |
45 | - Intel i7-4850HQ: 400 KIOPS |
46 | - AMD A8-3850: 250 KIOPS | |
47 | - ARM Cortex-A53 Octa-core: 80 KIOPS | |
48 | ||
8060c47b | 49 | If CONFIG_BFQ_CGROUP_DEBUG is set (and of course full hierarchical |
a33801e8 LM |
50 | support is enabled), then the sustainable throughput with BFQ |
51 | decreases, because all blkio.bfq* statistics are created and updated | |
52 | (Section 4-2). For BFQ, this leads to the following maximum | |
53 | sustainable throughputs, on the same systems as above: | |
24bfd19b PV |
54 | - Intel i7-4850HQ: 310 KIOPS |
55 | - AMD A8-3850: 200 KIOPS | |
56 | - ARM Cortex-A53 Octa-core: 56 KIOPS | |
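
As a rough illustration of what enabling the debug statistics costs, the sketch below computes the relative drop in sustainable throughput from the two lists of figures above. The dictionary layout and function name are illustrative choices, not part of BFQ.

```python
# KIOPS limits quoted above: (CONFIG_BFQ_CGROUP_DEBUG not set, CONFIG_BFQ_CGROUP_DEBUG set)
limits_kiops = {
    "Intel i7-4850HQ": (400, 310),
    "AMD A8-3850": (250, 200),
    "ARM Cortex-A53 Octa-core": (80, 56),
}


def relative_drop(before: float, after: float) -> float:
    """Percentage by which 'after' is lower than 'before'."""
    return (before - after) / before * 100.0


drops = {cpu: relative_drop(b, a) for cpu, (b, a) in limits_kiops.items()}
for cpu, drop in drops.items():
    print(f"{cpu}: {drop:.1f}% lower with debug stats")
```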
68017e5d PV |
57 | |
58 | BFQ works for multi-queue devices too. | |
aee69d78 | 59 | |
898bd37a | 60 | .. The table of contents follows. Impatient readers can just jump to Section 3. |
aee69d78 | 61 | |
898bd37a | 62 | .. CONTENTS |
aee69d78 | 63 | |
898bd37a MCC |
64 | 1. When may BFQ be useful? |
65 | 1-1 Personal systems | |
66 | 1-2 Server systems | |
67 | 2. How does BFQ work? | |
68 | 3. What are BFQ's tunables and how to properly configure BFQ? | |
69 | 4. BFQ group scheduling | |
70 | 4-1 Service guarantees provided | |
71 | 4-2 Interface | |
aee69d78 PV |
72 | |
73 | 1. When may BFQ be useful? | |
74 | ========================== | |
75 | ||
76 | BFQ provides the following benefits on personal and server systems. | |
77 | ||
78 | 1-1 Personal systems | |
79 | -------------------- | |
80 | ||
81 | Low latency for interactive applications | |
898bd37a | 82 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
aee69d78 PV |
83 | |
84 | Regardless of the actual background workload, BFQ guarantees that, for | |
85 | interactive tasks, the storage device is virtually as responsive as if | |
86 | it was idle. For example, even if one or more of the following | |
87 | background workloads are being executed: | |
898bd37a | 88 | |
aee69d78 PV |
89 | - one or more large files are being read, written or copied, |
90 | - a tree of source files is being compiled, | |
91 | - one or more virtual machines are performing I/O, | |
92 | - a software update is in progress, | |
93 | - indexing daemons are scanning filesystems and updating their | |
94 | databases, | |
898bd37a | 95 | |
aee69d78 PV |
96 | starting an application or loading a file from within an application |
97 | takes about the same time as if the storage device was idle. As a | |
98 | comparison, with CFQ, NOOP or DEADLINE, and in the same conditions, | |
99 | applications experience high latencies, or even become unresponsive | |
100 | until the background workload terminates (also on SSDs). | |
101 | ||
102 | Low latency for soft real-time applications | |
898bd37a | 103 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
aee69d78 PV |
104 | Soft real-time applications, such as audio and video |
105 | players/streamers, also enjoy a low latency and a low drop rate, regardless | |
106 | of the background I/O workload. As a consequence, these applications | |
107 | suffer almost no glitches due to the background workload. | |
108 | ||
109 | Higher speed for code-development tasks | |
898bd37a | 110 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
aee69d78 PV |
111 | |
112 | If some additional workload happens to be executed in parallel, then | |
113 | BFQ executes the I/O-related components of typical code-development | |
114 | tasks (compilation, checkout, merge, ...) much more quickly than CFQ, | |
115 | NOOP or DEADLINE. | |
116 | ||
117 | High throughput | |
898bd37a | 118 | ^^^^^^^^^^^^^^^ |
aee69d78 PV |
119 | |
120 | On hard disks, BFQ achieves up to 30% higher throughput than CFQ, and | |
121 | up to 150% higher throughput than DEADLINE and NOOP, with all the | |
122 | sequential workloads considered in our tests. With random workloads, | |
123 | and with all the workloads on flash-based devices, BFQ achieves, | |
124 | instead, about the same throughput as the other schedulers. | |
125 | ||
126 | Strong fairness, bandwidth and delay guarantees | |
898bd37a | 127 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
aee69d78 PV |
128 | |
129 | BFQ distributes the device throughput, and not just the device time, | |
130 | among I/O-bound applications in proportion to their weights, with any | |
131 | workload and regardless of the device parameters. From these bandwidth | |
132 | guarantees, it is possible to compute tight per-I/O-request delay | |
133 | guarantees by a simple formula. If not configured for strict service | |
134 | guarantees, BFQ switches to time-based resource sharing (only) for | |
135 | applications that would otherwise cause a throughput loss. | |
136 | ||
137 | 1-2 Server systems | |
138 | ------------------ | |
139 | ||
140 | Most benefits for server systems follow from the same service | |
141 | properties as above. In particular, regardless of whether additional, | |
142 | possibly heavy workloads are being served, BFQ guarantees: | |
143 | ||
898bd37a | 144 | * audio and video-streaming with zero or very low jitter and drop |
aee69d78 PV |
145 | rate; |
146 | ||
898bd37a | 147 | * fast retrieval of web pages and embedded objects; |
aee69d78 | 148 | |
898bd37a | 149 | * real-time recording of data in live-dumping applications (e.g., |
aee69d78 PV |
150 | packet logging); |
151 | ||
898bd37a | 152 | * responsiveness in local and remote access to a server. |
aee69d78 PV |
153 | |
154 | ||
155 | 2. How does BFQ work? | |
156 | ===================== | |
157 | ||
158 | BFQ is a proportional-share I/O scheduler, whose general structure, | |
159 | plus a lot of code, are borrowed from CFQ. | |
160 | ||
161 | - Each process doing I/O on a device is associated with a weight and a | |
898bd37a | 162 | `(bfq_)queue`. |
aee69d78 PV |
163 | |
164 | - BFQ grants exclusive access to the device, for a while, to one queue | |
165 | (process) at a time, and implements this service model by | |
166 | associating every queue with a budget, measured in number of | |
167 | sectors. | |
168 | ||
169 | - After a queue is granted access to the device, the budget of the | |
170 | queue is decremented, on each request dispatch, by the size of the | |
171 | request. | |
172 | ||
173 | - The in-service queue is expired, i.e., its service is suspended, | |
174 | only if one of the following events occurs: 1) the queue finishes | |
175 | its budget, 2) the queue empties, 3) a "budget timeout" fires. | |
176 | ||
177 | - The budget timeout prevents processes doing random I/O from | |
178 | holding the device for too long and dramatically reducing | |
179 | throughput. | |
180 | ||
181 | - Actually, as in CFQ, a queue associated with a process issuing | |
182 | sync requests may not be expired immediately when it empties. | |
183 | Instead, BFQ may idle the device for a short time interval, | |
184 | giving the process the chance to go on being served if it issues | |
185 | a new request in time. Device idling typically boosts the | |
2670cd16 PV |
186 | throughput on rotational devices and on non-queueing flash-based |
187 | devices, if processes do synchronous and sequential I/O. In | |
188 | addition, under BFQ, device idling is also instrumental in | |
189 | guaranteeing the desired throughput fraction to processes | |
190 | issuing sync requests (see the description of the slice_idle | |
191 | tunable in this document, or [1, 2], for more details). | |
aee69d78 PV |
192 | |
193 | - With respect to idling for service guarantees, if several | |
194 | processes are competing for the device at the same time, but | |
233f0bf4 PV |
195 | all processes and groups have the same weight, then BFQ |
196 | guarantees the expected throughput distribution without ever | |
197 | idling the device. Throughput is thus as high as possible in | |
198 | this common scenario. | |
aee69d78 | 199 | |
2670cd16 PV |
200 | - On flash-based storage with internal queueing of commands |
201 | (typically NCQ), device idling happens to be always detrimental | |
202 | for throughput. So, with these devices, BFQ performs idling | |
203 | only when strictly needed for service guarantees, i.e., for | |
204 | guaranteeing low latency or fairness. In these cases, overall | |
205 | throughput may be sub-optimal. No solution currently exists to | |
206 | provide both strong service guarantees and optimal throughput | |
207 | on devices with internal queueing. | |
208 | ||
aee69d78 PV |
209 | - If low-latency mode is enabled (default configuration), BFQ |
210 | executes some special heuristics to detect interactive and soft | |
211 | real-time applications (e.g., video or audio players/streamers), | |
212 | and to reduce their latency. The most important action taken to | |
213 | achieve this goal is to give to the queues associated with these | |
214 | applications more than their fair share of the device | |
215 | throughput. For brevity, we simply call "weight-raising" the whole | |
216 | set of actions taken by BFQ to privilege these queues. In | |
217 | particular, BFQ provides a milder form of weight-raising for | |
218 | interactive applications, and a stronger form for soft real-time | |
219 | applications. | |
220 | ||
221 | - BFQ automatically deactivates idling for queues born in a burst of | |
222 | queue creations. In fact, these queues are usually associated with | |
223 | the processes of applications and services that benefit mostly | |
224 | from a high throughput. Examples are systemd during boot, or git | |
225 | grep. | |
226 | ||
227 | - Like CFQ, BFQ merges queues performing interleaved I/O, i.e., | |
228 | performing random I/O that becomes mostly sequential if | |
229 | merged. Unlike CFQ, BFQ achieves this goal with a more | |
230 | reactive mechanism, called Early Queue Merge (EQM). EQM is so | |
231 | responsive in detecting interleaved I/O (cooperating processes), | |
232 | that it enables BFQ to achieve a high throughput, by queue | |
233 | merging, even for queues for which CFQ needs a different | |
234 | mechanism, preemption, to get a high throughput. As such, EQM is a | |
235 | unified mechanism to achieve a high throughput with interleaved | |
236 | I/O. | |
237 | ||
238 | - Queues are scheduled according to a variant of WF2Q+, named | |
239 | B-WF2Q+, and implemented using an augmented rb-tree to preserve an | |
240 | O(log N) overall complexity. See [2] for more details. B-WF2Q+ is | |
233f0bf4 | 241 | also ready for hierarchical scheduling, details in Section 4. |
aee69d78 PV |
242 | |
243 | - B-WF2Q+ guarantees a tight deviation with respect to an ideal, | |
244 | perfectly fair, and smooth service. In particular, B-WF2Q+ | |
245 | guarantees that each queue receives a fraction of the device | |
246 | throughput proportional to its weight, even if the throughput | |
247 | fluctuates, and regardless of: the device parameters, the current | |
248 | workload and the budgets assigned to the queue. | |
249 | ||
250 | - The last property, budget independence (although probably | |
251 | counterintuitive at first), is definitely beneficial, for | |
252 | the following reasons: | |
253 | ||
254 | - First, with any proportional-share scheduler, the maximum | |
255 | deviation with respect to an ideal service is proportional to | |
256 | the maximum budget (slice) assigned to queues. As a consequence, | |
257 | BFQ can keep this deviation tight not only because of the | |
258 | accurate service of B-WF2Q+, but also because BFQ *does not* | |
259 | need to assign a larger budget to a queue to let the queue | |
260 | receive a higher fraction of the device throughput. | |
261 | ||
262 | - Second, BFQ is free to choose, for every process (queue), the | |
263 | budget that best fits the needs of the process, or best | |
264 | leverages the I/O pattern of the process. In particular, BFQ | |
265 | updates queue budgets with a simple feedback-loop algorithm that | |
266 | allows a high throughput to be achieved, while still providing | |
267 | tight latency guarantees to time-sensitive applications. When | |
268 | the in-service queue expires, this algorithm computes the next | |
269 | budget of the queue so as to: | |
270 | ||
271 | - Let large budgets be eventually assigned to the queues | |
272 | associated with I/O-bound applications performing sequential | |
273 | I/O: in fact, the longer these applications are served once | |
274 | they get access to the device, the higher the throughput is. | |
275 | ||
276 | - Let small budgets be eventually assigned to the queues | |
277 | associated with time-sensitive applications (which typically | |
278 | perform sporadic and short I/O), because, the smaller the | |
279 | budget assigned to a queue waiting for service is, the sooner | |
280 | B-WF2Q+ will serve that queue (Subsec 3.3 in [2]). | |
281 | ||
282 | - If several processes are competing for the device at the same time, | |
283 | but all processes and groups have the same weight, then BFQ | |
284 | guarantees the expected throughput distribution without ever idling | |
285 | the device. It uses preemption instead. Throughput is then much | |
286 | higher in this common scenario. | |
287 | ||
288 | - ioprio classes are served in strict priority order, i.e., | |
289 | lower-priority queues are not served as long as there are | |
290 | higher-priority queues. Among queues in the same class, the | |
291 | bandwidth is distributed in proportion to the weight of each | |
292 | queue. A very thin extra bandwidth is however guaranteed to | |
293 | the Idle class, to prevent it from starving. | |
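
The budget-based service model described at the start of this section can be sketched as a toy simulation. This is a minimal illustration under strong simplifying assumptions: it models only the three expiration events (budget exhausted, queue empty, budget timeout) for a single in-service queue, and it ignores idling, weight-raising, queue merging and B-WF2Q+ scheduling. All names (``ToyQueue``, ``serve``, ``per_sector_time``) are hypothetical and do not correspond to BFQ's code.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    name: str
    budget: int  # remaining budget, in sectors
    requests: deque = field(default_factory=deque)  # pending request sizes, in sectors


def serve(queue: ToyQueue, per_sector_time: float, budget_timeout: float):
    """Dispatch requests from the in-service queue until it expires.

    Returns (expiration_reason, sectors_served, elapsed_time).
    """
    served = 0
    elapsed = 0.0
    while True:
        if not queue.requests:
            return "queue empty", served, elapsed
        if elapsed >= budget_timeout:
            return "budget timeout", served, elapsed
        size = queue.requests[0]
        if size > queue.budget:
            return "budget exhausted", served, elapsed
        # Dispatch: the budget is decremented by the size of the request.
        queue.requests.popleft()
        queue.budget -= size
        served += size
        elapsed += size * per_sector_time


sequential = ToyQueue("sequential", budget=16, requests=deque([8, 8, 8]))
sporadic = ToyQueue("sporadic", budget=64, requests=deque([8]))
slow = ToyQueue("slow", budget=1000, requests=deque([8] * 10))

print(serve(sequential, per_sector_time=1.0, budget_timeout=1000.0))
print(serve(sporadic, per_sector_time=1.0, budget_timeout=1000.0))
print(serve(slow, per_sector_time=1.0, budget_timeout=20.0))
```

The third queue stands in for a process whose I/O is served slowly: it is expired by the timeout long before it can consume its budget.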
294 | ||
295 | ||
2670cd16 PV |
296 | 3. What are BFQ's tunables and how to properly configure BFQ? |
297 | ============================================================= | |
aee69d78 | 298 | |
2670cd16 PV |
299 | Most BFQ tunables affect service guarantees (basically latency and |
300 | fairness) and throughput. For full details on how to choose the | |
301 | desired tradeoff between service guarantees and throughput, see the | |
302 | parameters slice_idle, strict_guarantees and low_latency. For details | |
303 | on how to maximize throughput, see slice_idle, timeout_sync and | |
304 | max_budget. The other performance-related parameters have been | |
305 | inherited from, and have been preserved mostly for compatibility with | |
306 | CFQ. So far, no performance improvement has been reported after | |
307 | changing the latter parameters in BFQ. | |
308 | ||
309 | In particular, the tunables back_seek_max, back_seek_penalty, | |
310 | fifo_expire_async and fifo_expire_sync below are the same as in | |
311 | CFQ. Their description is just copied from that for CFQ. Some | |
312 | considerations in the description of slice_idle are copied from CFQ | |
313 | too. | |
aee69d78 PV |
314 | |
315 | per-process ioprio and weight | |
316 | ----------------------------- | |
317 | ||
e21b7a0b AA |
318 | Unless the cgroups interface is used (see "4. BFQ group scheduling"), |
319 | weights can be assigned to processes only indirectly, through I/O | |
320 | priorities, and according to the relation: | |
321 | weight = (IOPRIO_BE_NR - ioprio) * 10. | |
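
The relation above can be tabulated with a few lines of code. The sketch assumes IOPRIO_BE_NR is 8, as defined in the kernel's ioprio header; that value is not stated in this document, and the function name is hypothetical.

```python
IOPRIO_BE_NR = 8  # assumption: number of best-effort priority levels in the kernel headers


def ioprio_to_weight(ioprio: int) -> int:
    """Apply weight = (IOPRIO_BE_NR - ioprio) * 10 for a valid priority level."""
    if not 0 <= ioprio < IOPRIO_BE_NR:
        raise ValueError(f"ioprio must be in 0..{IOPRIO_BE_NR - 1}")
    return (IOPRIO_BE_NR - ioprio) * 10


for ioprio in range(IOPRIO_BE_NR):
    print(f"ioprio {ioprio} -> weight {ioprio_to_weight(ioprio)}")
```

Under this assumption, the highest priority (0) maps to the largest weight and the lowest priority (7) to the smallest.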
322 | ||
323 | Beware that, if low_latency is set, then BFQ automatically raises the | |
324 | weight of the queues associated with interactive and soft real-time | |
325 | applications. Unset this tunable if you need/want to control weights. | |
aee69d78 PV |
326 | |
327 | slice_idle | |
328 | ---------- | |
329 | ||
330 | This parameter specifies how long BFQ should idle for the next I/O | |
331 | request, when certain sync BFQ queues become empty. By default | |
332 | slice_idle is a non-zero value. Idling has a double purpose: boosting | |
333 | throughput and making sure that the desired throughput distribution is | |
334 | respected (see the description of how BFQ works, and, if needed, the | |
335 | papers referred to there). | |
336 | ||
337 | As for throughput, idling can be very helpful on highly seeky media | |
338 | like single spindle SATA/SAS disks where we can cut down on overall | |
339 | number of seeks and see improved throughput. | |
340 | ||
341 | Setting slice_idle to 0 will remove all the idling on queues and one | |
342 | should see an overall improved throughput on faster storage devices | |
2670cd16 PV |
343 | like multiple SATA/SAS disks in hardware RAID configuration, as well |
344 | as flash-based storage with internal command queueing (and | |
345 | parallelism). | |
aee69d78 PV |
346 | |
347 | So depending on storage and workload, it might be useful to set | |
348 | slice_idle=0. In general for SATA/SAS disks and software RAID of | |
349 | SATA/SAS disks keeping slice_idle enabled should be useful. For any | |
350 | configurations where there are multiple spindles behind single LUN | |
2670cd16 PV |
351 | (Host based hardware RAID controller or for storage arrays), or with |
352 | flash-based fast storage, setting slice_idle=0 might end up in better | |
353 | throughput and acceptable latencies. | |
aee69d78 PV |
354 | |
355 | Idling is however necessary to have service guarantees enforced in | |
356 | case of differentiated weights or differentiated I/O-request lengths. | |
357 | To see why, suppose that a given BFQ queue A must get several I/O | |
358 | requests served for each request served for another queue B. Idling | |
359 | ensures that, if A makes a new I/O request slightly after becoming | |
360 | empty, then no request of B is dispatched in the middle, and thus A | |
361 | does not lose the possibility to get more than one request dispatched | |
362 | before the next request of B is dispatched. Note that idling | |
363 | guarantees the desired differentiated treatment of queues only in | |
364 | terms of I/O-request dispatches. To guarantee that the actual service | |
365 | order then corresponds to the dispatch order, the strict_guarantees | |
366 | tunable must be set too. | |
367 | ||
368 | There is an important flipside for idling: apart from the above cases | |
369 | where it is beneficial also for throughput, idling can severely impact | |
370 | throughput. One important case is random workload. Because of this | |
371 | issue, BFQ tends to avoid idling as much as possible, when it is not | |
2670cd16 PV |
372 | beneficial also for throughput (as detailed in Section 2). As a |
373 | consequence of this behavior, and of further issues described for the | |
374 | strict_guarantees tunable, short-term service guarantees may be | |
375 | occasionally violated. And, in some cases, these guarantees may be | |
376 | more important than guaranteeing maximum throughput. For example, in | |
377 | video playing/streaming, a very low drop rate may be more important | |
378 | than maximum throughput. In these cases, consider setting the | |
379 | strict_guarantees parameter. | |
aee69d78 | 380 | |
47cb393e JP |
381 | slice_idle_us |
382 | ------------- | |
383 | ||
384 | Controls the same tuning parameter as slice_idle, but in microseconds. | |
385 | Either tunable can be used to set idling behavior. Afterwards, the | |
386 | other tunable will reflect the newly set value in sysfs. | |
387 | ||
aee69d78 PV |
388 | strict_guarantees |
389 | ----------------- | |
390 | ||
391 | If this parameter is set (default: unset), then BFQ | |
392 | ||
393 | - always performs idling when the in-service queue becomes empty; | |
394 | ||
395 | - forces the device to serve one I/O request at a time, by dispatching a | |
396 | new request only if there is no outstanding request. | |
397 | ||
398 | In the presence of differentiated weights or I/O-request sizes, both | |
399 | the above conditions are needed to guarantee that every BFQ queue | |
400 | receives its allotted share of the bandwidth. The first condition is | |
401 | needed for the reasons explained in the description of the slice_idle | |
402 | tunable. The second condition is needed because all modern storage | |
403 | devices reorder internally-queued requests, which may trivially break | |
404 | the service guarantees enforced by the I/O scheduler. | |
405 | ||
406 | Setting strict_guarantees may evidently affect throughput. | |
407 | ||
408 | back_seek_max | |
409 | ------------- | |
410 | ||
411 | This specifies, given in Kbytes, the maximum "distance" for backward seeking. | |
412 | The distance is the amount of space from the current head location to the | |
413 | sectors that are backward in terms of distance. | |
414 | ||
415 | This parameter allows the scheduler to anticipate requests in the "backward" | |
416 | direction and consider them as being the "next" if they are within this | |
417 | distance from the current head location. | |
418 | ||
419 | back_seek_penalty | |
420 | ----------------- | |
421 | ||
422 | This parameter is used to compute the cost of backward seeking. If the | |
423 | backward distance of a request is just 1/back_seek_penalty of the distance of a "front" | |
424 | request, then the seeking cost of the two requests is considered equivalent. | |
425 | ||
426 | So the scheduler will not bias toward one or the other request (otherwise the scheduler | |
427 | will bias toward the front request). Default value of back_seek_penalty is 2. | |
428 | ||
429 | fifo_expire_async | |
430 | ----------------- | |
431 | ||
432 | This parameter is used to set the timeout of asynchronous requests. Default | |
433 | value of this is 248ms. | |
434 | ||
435 | fifo_expire_sync | |
436 | ---------------- | |
437 | ||
438 | This parameter is used to set the timeout of synchronous requests. Default | |
439 | value of this is 124ms. To favor synchronous requests over asynchronous | |
440 | ones, this value should be decreased relative to fifo_expire_async. | |
441 | ||
442 | low_latency | |
443 | ----------- | |
444 | ||
445 | This parameter is used to enable/disable BFQ's low latency mode. By | |
446 | default, low latency mode is enabled. If enabled, interactive and soft | |
447 | real-time applications are privileged and experience a lower latency, | |
448 | as explained in more detail in the description of how BFQ works. | |
449 | ||
43c1b3d6 | 450 | DISABLE this mode if you need full control over bandwidth |
44e44a1b PV |
451 | distribution. In fact, if it is enabled, then BFQ automatically |
452 | increases the bandwidth share of privileged applications, as the main | |
453 | means to guarantee a lower latency to them. | |
454 | ||
43c1b3d6 PV |
455 | In addition, as already highlighted at the beginning of this document, |
456 | DISABLE this mode if your only goal is to achieve a high throughput. | |
457 | In fact, privileging the I/O of some application over the rest may | |
458 | entail a lower throughput. To achieve the highest-possible throughput | |
459 | on a non-rotational device, setting slice_idle to 0 may be needed too | |
460 | (at the cost of giving up any strong guarantee on fairness and low | |
461 | latency). | |
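
The throughput-oriented settings described above could be applied with a small helper like the one sketched here. It assumes that BFQ is the active scheduler for the device and that its tunables are exposed under ``/sys/block/<device>/queue/iosched/``, a location this document does not itself state. The function names and the ``sysfs_root`` parameter are hypothetical; the latter exists only so the helper can be tried against a scratch directory. Writing to the real sysfs files normally requires root privileges.

```python
from pathlib import Path


def set_bfq_tunable(device: str, name: str, value: int, sysfs_root: str = "/sys") -> Path:
    """Write one tunable of the given block device and return the file written."""
    path = Path(sysfs_root) / "block" / device / "queue" / "iosched" / name
    path.write_text(f"{value}\n")
    return path


def favor_throughput(device: str, non_rotational: bool, sysfs_root: str = "/sys") -> list:
    """Switch off low-latency heuristics; on non-rotational devices also disable idling."""
    written = [set_bfq_tunable(device, "low_latency", 0, sysfs_root)]
    if non_rotational:
        # Gives up strong fairness and low-latency guarantees, as noted above.
        written.append(set_bfq_tunable(device, "slice_idle", 0, sysfs_root))
    return written


# Example call (device name is a placeholder):
#     favor_throughput("sda", non_rotational=True)
```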
462 | ||
aee69d78 PV |
463 | timeout_sync |
464 | ------------ | |
465 | ||
466 | Maximum amount of device time that can be given to a task (queue) once | |
467 | it has been selected for service. On devices with costly seeks, | |
468 | increasing this time usually increases maximum throughput. On the | |
469 | opposite end, increasing this time coarsens the granularity of the | |
470 | short-term bandwidth and latency guarantees, especially if the | |
471 | following parameter is set to zero. | |
472 | ||
473 | max_budget | |
474 | ---------- | |
475 | ||
476 | Maximum amount of service, measured in sectors, that can be provided | |
477 | to a BFQ queue once it is set in service (of course within the limits | |
478 | of the above timeout). According to what is said in the description of | |
479 | the algorithm, larger values increase the throughput in proportion to | |
480 | the percentage of sequential I/O requests issued. The price of larger | |
481 | values is that they coarsen the granularity of short-term bandwidth | |
482 | and latency guarantees. | |
483 | ||
484 | The default value is 0, which enables auto-tuning: BFQ sets max_budget | |
485 | to the maximum number of sectors that can be served during | |
486 | timeout_sync, according to the estimated peak rate. | |
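
The auto-tuning rule above amounts to a simple product, sketched below. The peak rate and timeout used in the example are made-up illustrative values, not BFQ defaults, and the function name is hypothetical; BFQ's actual peak-rate estimation is not modeled.

```python
def autotuned_max_budget(peak_rate_sectors_per_s: float, timeout_sync_ms: float) -> int:
    """Number of sectors that can be served during timeout_sync at the given peak rate."""
    return int(peak_rate_sectors_per_s * timeout_sync_ms / 1000)


# Illustrative values only: a device estimated at 400000 sectors/s.
budget = autotuned_max_budget(400_000, timeout_sync_ms=125)
doubled = autotuned_max_budget(400_000, timeout_sync_ms=250)
print(budget, doubled)
```

The second call shows why raising timeout_sync while leaving max_budget at 0 also raises the effective maximum budget.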
487 | ||
2670cd16 PV |
488 | For specific devices, some users have occasionally reported |
489 | reaching a higher throughput by setting max_budget explicitly, i.e., by | |
490 | setting max_budget to a higher value than 0. In particular, they have | |
491 | set max_budget to higher values than those to which BFQ would have set | |
492 | it with auto-tuning. An alternative way to achieve this goal is to | |
493 | just increase the value of timeout_sync, leaving max_budget equal to 0. | |
494 | ||
aee69d78 PV |
495 | weights |
496 | ------- | |
497 | ||
498 | Read-only parameter, used to show the weights of the currently active | |
499 | BFQ queues. | |
500 | ||
501 | ||
aee69d78 PV |
502 | 4. BFQ group scheduling |
503 | ============================ | |
504 | ||
e21b7a0b AA |
505 | BFQ supports both cgroups-v1 and cgroups-v2 io controllers, namely |
506 | blkio and io. In particular, BFQ supports weight-based proportional | |
507 | share. To activate cgroups support, set CONFIG_BFQ_GROUP_IOSCHED. | |
aee69d78 PV |
508 | |
509 | 4-1 Service guarantees provided | |
510 | ------------------------------- | |
511 | ||
512 | With BFQ, proportional share means true proportional share of the | |
513 | device bandwidth, according to group weights. For example, a group | |
514 | with weight 200 gets twice the bandwidth, and not just twice the time, | |
515 | of a group with weight 100. | |
516 | ||
517 | BFQ supports hierarchies (group trees) of any depth. Bandwidth is | |
518 | distributed among groups and processes in the expected way: for each | |
519 | group, the children of the group share the whole bandwidth of the | |
520 | group in proportion to their weights. In particular, this implies | |
521 | that, for each leaf group, every process of the group receives the | |
522 | same share of the whole group bandwidth, unless the ioprio of the | |
523 | process is modified. | |
524 | ||
525 | The resource-sharing guarantee for a group may partially or totally | |
526 | switch from bandwidth to time, if providing bandwidth guarantees to | |
527 | the group lowers the throughput too much. This switch occurs on a | |
528 | per-process basis: if a process of a leaf group causes throughput loss | |
529 | if served in such a way to receive its share of the bandwidth, then | |
530 | BFQ switches back to just time-based proportional share for that | |
531 | process. | |
532 | ||
533 | 4-2 Interface | |
534 | ------------- | |
535 | ||
536 | To get proportional sharing of bandwidth with BFQ for a given device, | |
537 | BFQ must of course be the active scheduler for that device. | |
538 | ||
539 | Within each group directory, the names of the files associated with | |
540 | BFQ-specific cgroup parameters and stats begin with the "bfq." | |
541 | prefix. So, with cgroups-v1 or cgroups-v2, the full prefix for | |
542 | BFQ-specific files is "blkio.bfq." or "io.bfq." For example, the group | |
543 | parameter to set the weight of a group with BFQ is blkio.bfq.weight | |
544 | or io.bfq.weight. | |
545 | ||
a33801e8 LM |
546 | As for cgroups-v1 (blkio controller), the exact set of stat files |
547 | created, and kept up-to-date by bfq, depends on whether | |
8060c47b | 548 | CONFIG_BFQ_CGROUP_DEBUG is set. If it is set, then bfq creates all |
a33801e8 | 549 | the stat files documented in |
da82c92f | 550 | Documentation/admin-guide/cgroup-v1/blkio-controller.rst. If, instead, |
898bd37a MCC |
551 | CONFIG_BFQ_CGROUP_DEBUG is not set, then bfq creates only the files:: |
552 | ||
553 | blkio.bfq.io_service_bytes | |
554 | blkio.bfq.io_service_bytes_recursive | |
555 | blkio.bfq.io_serviced | |
556 | blkio.bfq.io_serviced_recursive | |
a33801e8 | 557 | |
8060c47b | 558 | The value of CONFIG_BFQ_CGROUP_DEBUG greatly influences the maximum |
a33801e8 LM |
559 | throughput sustainable with bfq, because updating the blkio.bfq.* |
560 | stats is rather costly, especially for some of the stats enabled by | |
8060c47b | 561 | CONFIG_BFQ_CGROUP_DEBUG. |
a33801e8 | 562 | |
aee69d78 PV |
563 | Parameters to set |
564 | ----------------- | |
565 | ||
566 | For each group, there is only the following parameter to set. | |
567 | ||
568 | weight (namely blkio.bfq.weight or io.bfq.weight): the weight of the | |
569 | group inside its parent. Available values: 1..10000 (default 100). The | |
570 | linear mapping between ioprio and weights, described at the beginning | |
571 | of the tunable section, is still valid, but all weights higher than | |
572 | IOPRIO_BE_NR*10 are mapped to ioprio 0. | |
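
A script that sets group weights might start from helpers like the following. This is only a sketch: it picks the file name from the cgroups version, as described in Section 4-2, and checks the 1..10000 range given above. The function names are hypothetical, and nothing is written to any cgroup directory.

```python
def bfq_weight_file(cgroup_version: int) -> str:
    """Name of the BFQ weight file inside a group directory."""
    if cgroup_version == 1:
        return "blkio.bfq.weight"
    if cgroup_version == 2:
        return "io.bfq.weight"
    raise ValueError("cgroup_version must be 1 or 2")


def check_group_weight(weight: int) -> int:
    """Return the weight unchanged if it lies in the documented range 1..10000."""
    if not 1 <= weight <= 10000:
        raise ValueError("group weight must be in 1..10000")
    return weight


print(bfq_weight_file(2), check_group_weight(200))
```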
573 | ||
44e44a1b PV |
574 | Recall that, if low_latency is set, then BFQ automatically raises the |
575 | weight of the queues associated with interactive and soft real-time | |
576 | applications. Unset this tunable if you need/want to control weights. | |
577 | ||
aee69d78 | 578 | |
898bd37a MCC |
579 | [1] |
580 | P. Valente, A. Avanzini, "Evolution of the BFQ Storage I/O | |
aee69d78 PV |
581 | Scheduler", Proceedings of the First Workshop on Mobile System |
582 | Technologies (MST-2015), May 2015. | |
898bd37a | 583 | |
aee69d78 PV |
584 | http://algogroup.unimore.it/people/paolo/disk_sched/mst-2015.pdf |
585 | ||
898bd37a MCC |
586 | [2] |
587 | P. Valente and M. Andreolini, "Improving Application | |
aee69d78 PV |
588 | Responsiveness with the BFQ Disk I/O Scheduler", Proceedings of |
589 | the 5th Annual International Systems and Storage Conference | |
590 | (SYSTOR '12), June 2012. | |
898bd37a | 591 | |
aee69d78 | 592 | Slightly extended version: |
4438cf50 | 593 | |
898bd37a MCC |
594 | http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite-results.pdf |
595 | ||
596 | [3] | |
597 | https://github.com/Algodev-github/S |