Commit | Line | Data |
---|---|---|
faf4db00 TG |
1 | Net DIM - Generic Network Dynamic Interrupt Moderation |
2 | ====================================================== | |
3 | ||
4 | Author: | |
5 | Tal Gilboa <talgi@mellanox.com> | |
6 | ||
7 | ||
8 | Contents | |
9 | ========= | |
10 | ||
11 | - Assumptions | |
12 | - Introduction | |
13 | - The Net DIM Algorithm | |
14 | - Registering a Network Device to DIM | |
15 | - Example | |
16 | ||
17 | Part 0: Assumptions | |
18 | ====================== | |
19 | ||
20 | This document assumes the reader has basic knowledge of network drivers | |
21 | and of interrupt moderation in general. | |
22 | ||
23 | ||
24 | Part I: Introduction | |
25 | ====================== | |
26 | ||
27 | Dynamic Interrupt Moderation (DIM) (in networking) refers to changing the | |
28 | interrupt moderation configuration of a channel in order to optimize packet | |
29 | processing. The mechanism includes an algorithm which decides if and how to | |
30 | change moderation parameters for a channel, usually by performing an analysis on | |
31 | runtime data sampled from the system. Net DIM is such a mechanism. In each | |
32 | iteration of the algorithm, it analyses a given sample of the data, compares it | |
33 | to the previous sample and if required, it can decide to change some of the | |
34 | interrupt moderation configuration fields. The data sample is composed of data | |
35 | bandwidth, the number of packets and the number of events. The time between | |
36 | samples is also measured. Net DIM compares the current and the previous data and | |
37 | returns an adjusted interrupt moderation configuration object. In some cases, | |
38 | the algorithm might decide not to change anything. The configuration fields are | |
39 | the minimum duration (in microseconds) allowed between events and the maximum | |
40 | number of packets desired per event. The Net DIM algorithm prioritizes | |
41 | increasing bandwidth over reducing the interrupt rate. | |
42 | ||
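As an illustration of the data sample described above, the following is a minimal userspace sketch of how two raw samples and the time between them could be turned into comparable rates. The names struct sample, struct rates and compute_rates() are hypothetical and are not part of the Net DIM API; the per-millisecond unit is likewise an assumption made for the example.

```c
#include <stdint.h>

/* Hypothetical raw sample: cumulative counters plus a timestamp. */
struct sample {
	uint64_t time_us;	/* time the sample was taken, in microseconds */
	uint64_t bytes;		/* cumulative byte counter */
	uint64_t packets;	/* cumulative packet counter */
	uint64_t events;	/* cumulative event (interrupt) counter */
};

/* Hypothetical rates derived from two consecutive samples. */
struct rates {
	uint64_t bytes_per_ms;
	uint64_t packets_per_ms;
	uint64_t events_per_ms;
};

/*
 * Convert the counter deltas between two samples into per-millisecond
 * rates. Returns 0 on success, or -1 if no time elapsed between the
 * samples (in which case no rate can be computed).
 */
static int compute_rates(const struct sample *start,
			 const struct sample *end,
			 struct rates *out)
{
	uint64_t delta_us = end->time_us - start->time_us;

	if (delta_us == 0)
		return -1;

	out->bytes_per_ms = (end->bytes - start->bytes) * 1000 / delta_us;
	out->packets_per_ms = (end->packets - start->packets) * 1000 / delta_us;
	out->events_per_ms = (end->events - start->events) * 1000 / delta_us;
	return 0;
}
```

Working with rates rather than raw counters makes two samples comparable even when the time between them differs.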
43 | ||
44 | Part II: The Net DIM Algorithm | |
45 | =============================== | |
46 | ||
47 | Each iteration of the Net DIM algorithm follows these steps: | |
48 | 1. Calculates a new data sample. | |
49 | 2. Compares it to the previous sample. | |
50 | 3. Makes a decision - suggests interrupt moderation configuration fields. | |
51 | 4. Schedules a work function, which applies the suggested configuration. | |
52 | ||
53 | The first two steps are straightforward: both the new and the previous data are | |
54 | supplied by the driver registered to Net DIM. The previous data is the new data | |
55 | supplied to the previous iteration. The comparison step checks the difference | |
56 | between the new and previous data and decides on the result of the last step | |
57 | taken by the algorithm. A step is rated "better" if bandwidth increased and | |
58 | "worse" if it decreased. If there is no change in bandwidth, the packet rate is | |
59 | compared in a similar fashion - increase == "better" and decrease == "worse". | |
60 | If the packet rate is unchanged as well, the interrupt rate is | |
61 | compared. Here the algorithm tries to optimize for lower interrupt rate so an | |
62 | increase in the interrupt rate is considered "worse" and a decrease is | |
63 | considered "better". Step #2 has an optimization for avoiding false results: it | |
64 | only considers a difference between samples as valid if it is greater than a | |
65 | certain percentage. Also, since Net DIM does not measure anything by itself, it | |
66 | assumes the data provided by the driver is valid. | |
67 | ||
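The comparison order described above (bandwidth first, then packet rate, then interrupt rate, each subject to a significance threshold) can be sketched as follows. This is a simplified userspace illustration, not the kernel implementation: the type and function names are hypothetical, and the 10 percent threshold is an assumed value chosen for the example.

```c
#include <stdint.h>

enum cmp_result { CMP_SAME, CMP_BETTER, CMP_WORSE };

/* Hypothetical per-interval rates, as measured by the driver. */
struct rates {
	uint64_t bytes_per_ms;
	uint64_t packets_per_ms;
	uint64_t events_per_ms;
};

/* Assumed threshold: smaller differences are treated as noise. */
#define SIGNIFICANT_PERCENT 10

/* Nonzero if cur differs from prev by more than the threshold. */
static int is_significant(uint64_t cur, uint64_t prev)
{
	uint64_t diff = cur > prev ? cur - prev : prev - cur;

	if (prev == 0)
		return cur != 0;
	return diff * 100 / prev > SIGNIFICANT_PERCENT;
}

static enum cmp_result compare_rates(const struct rates *cur,
				     const struct rates *prev)
{
	/* Bandwidth has the highest priority: more is better. */
	if (is_significant(cur->bytes_per_ms, prev->bytes_per_ms))
		return cur->bytes_per_ms > prev->bytes_per_ms ?
		       CMP_BETTER : CMP_WORSE;

	/* Then packet rate: more is better. */
	if (is_significant(cur->packets_per_ms, prev->packets_per_ms))
		return cur->packets_per_ms > prev->packets_per_ms ?
		       CMP_BETTER : CMP_WORSE;

	/* Finally interrupt rate: here fewer is better. */
	if (is_significant(cur->events_per_ms, prev->events_per_ms))
		return cur->events_per_ms < prev->events_per_ms ?
		       CMP_BETTER : CMP_WORSE;

	return CMP_SAME;
}
```

Note that the interrupt rate is only consulted when neither bandwidth nor packet rate changed significantly, which reflects the preference for bandwidth over a lower interrupt rate.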
68 | Step #3 decides on the suggested configuration based on the result from step #2 | |
69 | and the internal state of the algorithm. The states reflect the "direction" of | |
70 | the algorithm: is it going left (reducing moderation), right (increasing | |
71 | moderation) or standing still. Another optimization is that if a decision | |
72 | to stay still is made multiple times, the interval between iterations of the | |
73 | algorithm is increased in order to reduce calculation overhead. Also, after | |
74 | "parking" on one of the leftmost or rightmost decisions, the algorithm may | |
75 | decide to verify this decision by taking a step in the other direction. This is | |
76 | done in order to avoid getting stuck in a "deep sleep" scenario. Once a | |
77 | decision is made, an interrupt moderation configuration is selected from | |
78 | the predefined profiles. | |
79 | ||
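The direction-based decision described above can be pictured as a small state machine walking over an ordered profile table. The sketch below is a deliberately simplified userspace illustration: the profile values, the names struct tuner and tuner_step(), and the choice to resume rightwards after parking are all assumptions made for the example. It also omits the longer iteration interval and the verification step mentioned in the text.

```c
#include <stddef.h>

enum cmp_result { CMP_SAME, CMP_BETTER, CMP_WORSE };
enum direction { GOING_LEFT, GOING_RIGHT, PARKED };

/* Hypothetical moderation profile: usecs between events, packets per event. */
struct profile {
	unsigned int usec;
	unsigned int pkts;
};

/* Hypothetical table, ordered from least (left) to most (right) moderation. */
static const struct profile profiles[] = {
	{ 1, 64 }, { 8, 64 }, { 64, 64 }, { 128, 64 }, { 256, 64 },
};
#define NUM_PROFILES (sizeof(profiles) / sizeof(profiles[0]))

struct tuner {
	size_t idx;		/* index of the currently selected profile */
	enum direction dir;	/* current "direction" of the algorithm */
};

/* Update the tuner according to the result of the last comparison. */
static void tuner_step(struct tuner *t, enum cmp_result res)
{
	if (t->dir == PARKED) {
		if (res == CMP_SAME)
			return;		/* nothing changed, stay parked */
		t->dir = GOING_RIGHT;	/* assumed restart direction */
	} else if (res == CMP_SAME) {
		t->dir = PARKED;	/* last step made no difference */
		return;
	} else if (res == CMP_WORSE) {
		/* Last step hurt: turn around. */
		t->dir = (t->dir == GOING_LEFT) ? GOING_RIGHT : GOING_LEFT;
	}

	if (t->dir == GOING_RIGHT && t->idx + 1 < NUM_PROFILES)
		t->idx++;
	else if (t->dir == GOING_LEFT && t->idx > 0)
		t->idx--;
	else
		t->dir = PARKED;	/* reached the edge of the table */
}

/* The profile the tuner currently suggests. */
static const struct profile *tuner_profile(const struct tuner *t)
{
	return &profiles[t->idx];
}
```

Keeping the profiles in a fixed, ordered table means that each decision only has to choose between moving left, moving right or staying put.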
80 | The last step is to notify the registered driver that it should apply the | |
81 | suggested configuration. This is done by scheduling a work function, defined by | |
82 | the Net DIM API and provided by the registered driver. | |
83 | ||
84 | As you can see, Net DIM itself does not actively interact with the system. It | |
85 | would have trouble making correct decisions if the wrong data were supplied to | |
86 | it, and it would be useless if the work function did not apply the suggested | |
87 | configuration. This does, however, allow the registered driver some room for | |
88 | manoeuvre as it may provide partial data or ignore the algorithm suggestion | |
89 | under some conditions. | |
90 | ||
91 | ||
92 | Part III: Registering a Network Device to DIM | |
93 | ============================================== | |
94 | ||
95 | The Net DIM API exposes the main function net_dim(struct net_dim *dim, | |
96 | struct net_dim_sample end_sample). This function is the entry point to the Net | |
97 | DIM algorithm and has to be called every time the driver would like to check if | |
98 | it should change interrupt moderation parameters. The driver should provide two | |
99 | data structures: struct net_dim and struct net_dim_sample. Struct net_dim | |
100 | describes the state of DIM for a specific object (RX queue, TX queue, | |
101 | other queues, etc.). This includes the current selected profile, previous data | |
102 | samples, the callback function provided by the driver and more. | |
103 | Struct net_dim_sample describes a data sample, which will be compared to the | |
104 | data sample stored in struct net_dim in order to decide on the algorithm's next | |
105 | step. The sample should include bytes, packets and interrupts, measured by | |
106 | the driver. | |
107 | ||
108 | In order to use Net DIM from a networking driver, the driver needs to call the | |
109 | main net_dim() function. The recommended method is to call net_dim() on each | |
110 | interrupt. Since Net DIM has built-in moderation and might decide to skip | |
111 | iterations under certain conditions, there is no need to moderate the net_dim() | |
112 | calls as well. As mentioned above, the driver needs to provide an object of type | |
113 | struct net_dim to the net_dim() function call. Each entity using Net DIM is | |
114 | advised to hold a struct net_dim as part of its data structure and use it | |
115 | as the main Net DIM API object. The struct net_dim_sample should hold the latest | |
116 | byte, packet and interrupt counts. There is no need to perform any | |
117 | calculations; just include the raw data. | |
118 | ||
119 | The net_dim() call itself does not return anything. Instead, Net DIM relies on | |
120 | the driver to provide a callback function, which is called when the algorithm | |
121 | decides to make a change in the interrupt moderation parameters. This callback | |
122 | will be scheduled and run in a separate thread in order not to add overhead to | |
123 | the data flow. After the work is done, the Net DIM algorithm needs to be set to | |
124 | the proper state in order to move to the next iteration. | |
125 | ||
126 | ||
127 | Part IV: Example | |
128 | ================= | |
129 | ||
130 | The following code demonstrates how to register a driver to Net DIM. The code | |
131 | is not complete, but it should make the outline of the usage clear. | |
132 | ||
133 | my_driver.c: | |
134 | ||
135 | #include <linux/net_dim.h> | |
136 | ||
137 | /* Callback for net DIM to schedule on a decision to change moderation */ | |
138 | void my_driver_do_dim_work(struct work_struct *work) | |
139 | { | |
140 | 	/* Get struct net_dim from struct work_struct */ | |
141 | 	struct net_dim *dim = container_of(work, struct net_dim, | |
142 | 					   work); | |
143 | 	/* Do interrupt moderation related stuff */ | |
144 | 	... | |
145 | ||
146 | 	/* Signal net DIM work is done and it should move to next iteration */ | |
147 | 	dim->state = NET_DIM_START_MEASURE; | |
148 | } | |
149 | ||
150 | /* My driver's interrupt handler */ | |
151 | int my_driver_handle_interrupt(struct my_driver_entity *my_entity, ...) | |
152 | { | |
153 | 	... | |
154 | 	/* A struct to hold current measured data */ | |
155 | 	struct net_dim_sample dim_sample; | |
156 | 	... | |
157 | 	/* Initiate data sample struct with current data */ | |
158 | 	net_dim_sample(my_entity->events, | |
159 | 		       my_entity->packets, | |
160 | 		       my_entity->bytes, | |
161 | 		       &dim_sample); | |
162 | 	/* Call net DIM */ | |
163 | 	net_dim(&my_entity->dim, dim_sample); | |
164 | 	... | |
165 | } | |
166 | ||
167 | /* My entity's initialization function (my_entity was already allocated) */ | |
168 | int my_driver_init_my_entity(struct my_driver_entity *my_entity, ...) | |
169 | { | |
170 | 	... | |
171 | 	/* Initiate struct work_struct with my driver's callback function */ | |
172 | 	INIT_WORK(&my_entity->dim.work, my_driver_do_dim_work); | |
173 | 	... | |
174 | } |