Commit | Line | Data |
---|---|---|
9b038086 | 1 | ====================================================== |
faf4db00 TG |
2 | Net DIM - Generic Network Dynamic Interrupt Moderation |
3 | ====================================================== | |
4 | ||
9b038086 | 5 | :Author: Tal Gilboa <talgi@mellanox.com> |
faf4db00 | 6 | |
9b038086 | 7 | .. contents:: :depth: 2 |
faf4db00 | 8 | |
9b038086 JK |
9 | Assumptions |
10 | =========== | |
faf4db00 TG |
11 | |
12 | This document assumes the reader has basic knowledge of network drivers | |
13 | and of interrupt moderation in general. | |
14 | ||
15 | ||
9b038086 JK |
16 | Introduction |
17 | ============ | |
faf4db00 TG |
18 | |
19 | Dynamic Interrupt Moderation (DIM) (in networking) refers to changing the | |
20 | interrupt moderation configuration of a channel in order to optimize packet | |
21 | processing. The mechanism includes an algorithm which decides if and how to | |
22 | change moderation parameters for a channel, usually by performing an analysis on | |
23 | runtime data sampled from the system. Net DIM is such a mechanism. In each | |
24 | iteration of the algorithm, it analyses a given sample of the data, compares it | |
25 | to the previous sample and if required, it can decide to change some of the | |
26 | interrupt moderation configuration fields. The data sample is composed of data | |
27 | bandwidth, the number of packets and the number of events. The time between | |
28 | samples is also measured. Net DIM compares the current and the previous data and | |
29 | returns an adjusted interrupt moderation configuration object. In some cases, | |
30 | the algorithm might decide not to change anything. The configuration fields are | |
31 | the minimum duration (microseconds) allowed between events and the maximum | |
32 | number of wanted packets per event. The Net DIM algorithm prioritizes | |
33 | increasing bandwidth over reducing the interrupt rate. | |
34 | ||
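The relationship between raw counters and the rates that get compared can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct sample`, `struct rates` and `sample_to_rates()` are hypothetical names, not the kernel's definitions, and the counters are assumed to be cumulative and never to decrease.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical raw sample: cumulative counters plus a timestamp. */
struct sample {
    uint64_t time_us;  /* time the sample was taken, in microseconds */
    uint64_t bytes;    /* total bytes seen so far */
    uint64_t packets;  /* total packets seen so far */
    uint64_t events;   /* total events (interrupts) seen so far */
};

/* Hypothetical derived rates, all per millisecond. */
struct rates {
    uint64_t bytes_per_ms;
    uint64_t packets_per_ms;
    uint64_t events_per_ms;
};

/*
 * Turn two raw samples into rates over the time between them.
 * Returns 0 on success, or -1 if no time has elapsed (the rates
 * would be undefined).
 */
static int sample_to_rates(const struct sample *start,
                           const struct sample *end,
                           struct rates *out)
{
    uint64_t delta_us;

    assert(start && end && out);

    if (end->time_us <= start->time_us)
        return -1;

    delta_us = end->time_us - start->time_us;
    out->bytes_per_ms = (end->bytes - start->bytes) * 1000 / delta_us;
    out->packets_per_ms = (end->packets - start->packets) * 1000 / delta_us;
    out->events_per_ms = (end->events - start->events) * 1000 / delta_us;
    return 0;
}
```

Because the time between samples is part of the calculation, two samples with the same counter deltas but different spacing yield different rates, which is why the elapsed time is measured alongside the counters.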
35 | ||
9b038086 JK |
36 | Net DIM Algorithm |
37 | ================= | |
faf4db00 TG |
38 | |
39 | Each iteration of the Net DIM algorithm follows these steps: | |
9b038086 JK |
40 | |
41 | #. Calculates a new data sample. | |
42 | #. Compares it to the previous sample. | |
43 | #. Makes a decision - suggests interrupt moderation configuration fields. | |
44 | #. Schedules a work function, which applies the suggested configuration. | |
faf4db00 TG |
45 | |
46 | The first two steps are straightforward: both the new and the previous data are | |
47 | supplied by the driver registered to Net DIM. The previous data is the new data | |
48 | supplied to the previous iteration. The comparison step checks the difference | |
49 | between the new and previous data and decides on the result of the last step. | |
50 | A step is rated "better" if bandwidth increases and "worse" if | |
51 | bandwidth decreases. If there is no change in bandwidth, the packet rate is | |
52 | compared in a similar fashion - increase == "better" and decrease == "worse". | |
53 | If the packet rate has not changed either, the interrupt rate is | |
54 | compared. Here the algorithm tries to optimize for lower interrupt rate so an | |
55 | increase in the interrupt rate is considered "worse" and a decrease is | |
56 | considered "better". Step #2 has an optimization for avoiding false results: it | |
57 | only considers a difference between samples as valid if it is greater than a | |
58 | certain percentage. Also, since Net DIM does not measure anything by itself, it | |
59 | assumes the data provided by the driver is valid. | |
60 | ||
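The comparison order described above (bandwidth first, then packet rate, then interrupt rate) can be sketched as a small userspace C function. This is a simplified illustration, not the kernel's code: the type names, `compare_rates()` and the 10% threshold are all assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; not the kernel's definitions. */
enum cmp_result { CMP_SAME, CMP_BETTER, CMP_WORSE };

struct rates {
    uint64_t bytes_per_ms;    /* bandwidth */
    uint64_t packets_per_ms;  /* packet rate */
    uint64_t events_per_ms;   /* interrupt rate */
};

/* Assumed threshold: a change counts only above 10% of the old value. */
#define SIGNIFICANT_PERCENT 10

static int is_significant(uint64_t curr, uint64_t prev)
{
    uint64_t diff = curr > prev ? curr - prev : prev - curr;

    /* With no previous value, any non-zero current value is a change. */
    if (prev == 0)
        return curr != 0;

    /* Integer form of diff / prev > SIGNIFICANT_PERCENT / 100. */
    return diff * 100 > prev * SIGNIFICANT_PERCENT;
}

static enum cmp_result compare_rates(const struct rates *curr,
                                     const struct rates *prev)
{
    assert(curr && prev);

    /* Bandwidth has the highest priority: more is better. */
    if (is_significant(curr->bytes_per_ms, prev->bytes_per_ms))
        return curr->bytes_per_ms > prev->bytes_per_ms ?
               CMP_BETTER : CMP_WORSE;

    /* Then packet rate: more is better. */
    if (is_significant(curr->packets_per_ms, prev->packets_per_ms))
        return curr->packets_per_ms > prev->packets_per_ms ?
               CMP_BETTER : CMP_WORSE;

    /* Finally interrupt rate: here fewer is better. */
    if (is_significant(curr->events_per_ms, prev->events_per_ms))
        return curr->events_per_ms < prev->events_per_ms ?
               CMP_BETTER : CMP_WORSE;

    return CMP_SAME;
}
```

With the assumed threshold, a 5% bandwidth gain is treated as noise and the decision falls through to the packet rate and then the interrupt rate, which is how the filter avoids false results.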
61 | Step #3 decides on the suggested configuration based on the result from step #2 | |
62 | and the internal state of the algorithm. The states reflect the "direction" of | |
63 | the algorithm: is it going left (reducing moderation), right (increasing | |
64 | moderation) or standing still. Another optimization is that if a decision | |
65 | to stay still is made multiple times, the interval between iterations of the | |
66 | algorithm would increase in order to reduce calculation overhead. Also, after | |
67 | "parking" on one of the leftmost or rightmost decisions, the algorithm may | |
68 | decide to verify this decision by taking a step in the other direction. This is | |
69 | done in order to avoid getting stuck in a "deep sleep" scenario. Once a | |
70 | decision is made, an interrupt moderation configuration is selected from | |
71 | the predefined profiles. | |
72 | ||
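The "direction" idea can be illustrated with a deliberately reduced state machine in userspace C. Everything here is an assumption made for the sketch: the names, the table size `NUM_PROFILES`, and the rules that "better" keeps the current direction, "worse" reverses it, and a change observed while parked resumes probing to the right. The longer iteration interval and the verification step after parking, both described above, are left out.

```c
#include <assert.h>

#define NUM_PROFILES 5  /* assumed size of the predefined profile table */

/* Hypothetical names; not the kernel's definitions. */
enum step_result { STEP_SAME, STEP_BETTER, STEP_WORSE };
enum direction { DIR_LEFT, DIR_RIGHT, DIR_PARKED };

struct tuner {
    int profile_ix;      /* index into the predefined profiles */
    enum direction dir;  /* left = less moderation, right = more */
};

/* Returns 1 if a new profile was selected, 0 if nothing changed. */
static int tuner_decide(struct tuner *t, enum step_result res)
{
    int next;

    assert(t && t->profile_ix >= 0 && t->profile_ix < NUM_PROFILES);

    /* No measurable difference: stand still. */
    if (res == STEP_SAME) {
        t->dir = DIR_PARKED;
        return 0;
    }

    if (t->dir == DIR_PARKED)
        t->dir = DIR_RIGHT;  /* assumed: leave parking by probing right */
    else if (res == STEP_WORSE)
        t->dir = (t->dir == DIR_LEFT) ? DIR_RIGHT : DIR_LEFT;

    /* Take one step; park instead of walking off the table. */
    next = t->profile_ix + (t->dir == DIR_RIGHT ? 1 : -1);
    if (next < 0 || next >= NUM_PROFILES) {
        t->dir = DIR_PARKED;
        return 0;
    }

    t->profile_ix = next;
    return 1;
}
```

A return value of 1 corresponds to the case where a work function would be scheduled to apply the newly selected profile.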
73 | The last step is to notify the registered driver that it should apply the | |
74 | suggested configuration. This is done by scheduling a work function, defined by | |
75 | the Net DIM API and provided by the registered driver. | |
76 | ||
77 | As you can see, Net DIM itself does not actively interact with the system. It | |
78 | would have trouble making correct decisions if wrong data were supplied to | |
79 | it, and it would be useless if the work function did not apply the suggested | |
80 | configuration. This does, however, allow the registered driver some room for | |
81 | manoeuvre as it may provide partial data or ignore the algorithm suggestion | |
82 | under some conditions. | |
83 | ||
84 | ||
9b038086 JK |
85 | Registering a Network Device to DIM |
86 | =================================== | |
faf4db00 | 87 | |
9b038086 JK |
88 | The Net DIM API exposes the main function net_dim(). |
89 | This function is the entry point to the Net | |
faf4db00 TG |
90 | DIM algorithm and has to be called every time the driver would like to check if |
91 | it should change interrupt moderation parameters. The driver should provide two | |
9b038086 JK |
92 | data structures: :c:type:`struct dim <dim>` and |
93 | :c:type:`struct dim_sample <dim_sample>`. :c:type:`struct dim <dim>` | |
faf4db00 TG |
94 | describes the state of DIM for a specific object (RX queue, TX queue, |
95 | other queues, etc.). This includes the current selected profile, previous data | |
96 | samples, the callback function provided by the driver and more. | |
9b038086 JK |
97 | :c:type:`struct dim_sample <dim_sample>` describes a data sample, |
98 | which will be compared to the data sample stored in :c:type:`struct dim <dim>` | |
99 | in order to decide on the algorithm's next | |
faf4db00 TG |
100 | step. The sample should include bytes, packets and interrupts, measured by |
101 | the driver. | |
102 | ||
103 | In order to use Net DIM from a networking driver, the driver needs to call the | |
104 | main net_dim() function. The recommended method is to call net_dim() on each | |
105 | interrupt. Since Net DIM has built-in moderation and might decide to skip | |
106 | iterations under certain conditions, there is no need to moderate the net_dim() | |
107 | calls as well. As mentioned above, the driver needs to provide an object of type | |
9b038086 JK |
108 | :c:type:`struct dim <dim>` to the net_dim() function call. It is advised for |
109 | each entity using Net DIM to hold a :c:type:`struct dim <dim>` as part of its | |
110 | data structure and use it as the main Net DIM API object. | |
111 | The :c:type:`struct dim_sample <dim_sample>` should hold the latest | |
faf4db00 TG |
112 | byte, packet and interrupt counts. There is no need to perform any calculations; just |
113 | include the raw data. | |
114 | ||
115 | The net_dim() call itself does not return anything. Instead Net DIM relies on | |
116 | the driver to provide a callback function, which is called when the algorithm | |
117 | decides to make a change in the interrupt moderation parameters. This callback | |
118 | will be scheduled and run in a separate thread in order not to add overhead to | |
119 | the data flow. After the work is done, the Net DIM algorithm needs to be set to | |
120 | the proper state in order to move to the next iteration. | |
121 | ||
122 | ||
9b038086 JK |
123 | Example |
124 | ======= | |
faf4db00 TG |
125 | |
126 | The following code demonstrates how to register a driver to Net DIM. The actual | |
127 | usage is not complete but it should make the outline of the usage clear. | |
128 | ||
9b038086 | 129 | .. code-block:: c |
faf4db00 | 130 | |
9b038086 | 131 | #include <linux/dim.h> |
faf4db00 | 132 | |
9b038086 JK |
133 | /* Callback for net DIM to schedule on a decision to change moderation */ |
134 | void my_driver_do_dim_work(struct work_struct *work) | |
135 | { | |
2168da45 JK |
136 | /* Get struct dim from struct work_struct */ |
137 | struct dim *dim = container_of(work, struct dim, | |
138 | work); | |
faf4db00 TG |
139 | /* Do interrupt moderation related stuff */ |
140 | ... | |
141 | ||
142 | /* Signal net DIM work is done and it should move to next iteration */ | |
2168da45 | 143 | dim->state = DIM_START_MEASURE; |
9b038086 | 144 | } |
faf4db00 | 145 | |
9b038086 JK |
146 | /* My driver's interrupt handler */ |
147 | int my_driver_handle_interrupt(struct my_driver_entity *my_entity, ...) | |
148 | { | |
faf4db00 TG |
149 | ... |
150 | /* A struct to hold current measured data */ | |
2168da45 | 151 | struct dim_sample dim_sample; |
faf4db00 TG |
152 | ... |
153 | /* Initialize data sample struct with current data */ | |
2168da45 JK |
154 | dim_update_sample(my_entity->events, |
155 | my_entity->packets, | |
156 | my_entity->bytes, | |
157 | &dim_sample); | |
faf4db00 TG |
158 | /* Call net DIM */ |
159 | net_dim(&my_entity->dim, dim_sample); | |
160 | ... | |
9b038086 | 161 | } |
faf4db00 | 162 | |
9b038086 JK |
163 | /* My entity's initialization function (my_entity was already allocated) */ |
164 | int my_driver_init_my_entity(struct my_driver_entity *my_entity, ...) | |
165 | { | |
faf4db00 TG |
166 | ... |
167 | /* Initialize struct work_struct with my driver's callback function */ | |
168 | INIT_WORK(&my_entity->dim.work, my_driver_do_dim_work); | |
169 | ... | |
9b038086 | 170 | } |