.. SPDX-License-Identifier: GPL-2.0

=======================
Energy Model of devices
=======================

1. Overview
-----------

The Energy Model (EM) framework serves as an interface between drivers knowing
the power consumed by devices at various performance levels, and the kernel
subsystems willing to use that information to make energy-aware decisions.

The source of the information about the power consumed by devices can vary greatly
from one platform to another. These power costs can be estimated using
devicetree data in some cases. In others, the firmware will know better.
Alternatively, userspace might be best positioned, and so on. To spare each
client subsystem from re-implementing support for every possible source of
information on its own, the EM framework acts as an abstraction layer which
standardizes the format of power cost tables in the kernel, thereby avoiding
redundant work.

The power values might be expressed in micro-Watts or in an 'abstract scale'.
Multiple subsystems might use the EM and it is up to the system integrator to
check that the requirements for the power value scale types are met. An example
can be found in the Energy-Aware Scheduler documentation
Documentation/scheduler/sched-energy.rst. For some subsystems like thermal or
powercap, power values expressed in an 'abstract scale' might cause issues.
These subsystems are more interested in estimating the power used in the past,
thus the real micro-Watts might be needed. An example of these requirements can
be found in the Intelligent Power Allocation documentation in
Documentation/driver-api/thermal/power_allocator.rst.
Kernel subsystems might implement automatic detection to check whether EM
registered devices have inconsistent scales (based on an EM internal flag).
An important thing to keep in mind is that when the power values are expressed
in an 'abstract scale', deriving real energy in micro-Joules is not possible.
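
The scale consistency check mentioned above can be pictured with a small
userspace sketch. It is not kernel code: the demo_em_dev structure and the
demo_em_scales_consistent() helper are hypothetical names invented for this
illustration, and a real subsystem would inspect the EM internal flag rather
than a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered EM device; not a kernel type. */
struct demo_em_dev {
	const char *name;
	bool microwatts;	/* true: real micro-Watts, false: abstract scale */
};

/*
 * Return true when every device in the list reports power on the same scale.
 * An empty or single-entry list is trivially consistent.
 */
bool demo_em_scales_consistent(const struct demo_em_dev *devs, size_t nr)
{
	assert(devs || nr == 0);

	for (size_t i = 1; i < nr; i++) {
		if (devs[i].microwatts != devs[0].microwatts)
			return false;
	}
	return true;
}
```

A client that needs real micro-Watts (thermal or powercap, in the terms used
above) could run such a check once over the devices it cares about and refuse
to start when it fails.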

The figure below depicts an example of drivers (Arm-specific here, but the
approach is applicable to any architecture) providing power costs to the EM
framework, and interested clients reading the data from it::

       +---------------+  +-----------------+  +---------------+
       | Thermal (IPA) |  | Scheduler (EAS) |  |     Other     |
       +---------------+  +-----------------+  +---------------+
               |                   | em_cpu_energy()   |
               |                   | em_cpu_get()      |
               +---------+         |         +---------+
                         |         |         |
                         v         v         v
                        +---------------------+
                        |    Energy Model     |
                        |     Framework       |
                        +---------------------+
                           ^       ^       ^
                           |       |       | em_dev_register_perf_domain()
                +----------+       |       +---------+
                |                  |                 |
        +---------------+  +---------------+  +--------------+
        |  cpufreq-dt   |  |   arm_scmi    |  |    Other     |
        +---------------+  +---------------+  +--------------+
                ^                  ^                 ^
                |                  |                 |
        +--------------+   +---------------+  +--------------+
        | Device Tree  |   |   Firmware    |  |      ?       |
        +--------------+   +---------------+  +--------------+

In case of CPU devices the EM framework manages power cost tables per
'performance domain' in the system. A performance domain is a group of CPUs
whose performance is scaled together. Performance domains generally have a
1-to-1 mapping with CPUFreq policies. All CPUs in a performance domain are
required to have the same micro-architecture. CPUs in different performance
domains can have different micro-architectures.


2. Core APIs
------------

2.1 Config options
^^^^^^^^^^^^^^^^^^

CONFIG_ENERGY_MODEL must be enabled to use the EM framework.


2.2 Registration of performance domains
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Registration of 'advanced' EM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 'advanced' EM gets its name from the fact that the driver is allowed
to provide a more precise power model. It is not limited to a math formula
implemented in the framework (as is the case for the 'simple' EM). It can
better reflect the real power measurements performed for each performance
state. Thus, this registration method should be preferred when accounting for
static power (leakage) in the EM is important.

Drivers are expected to register performance domains into the EM framework by
calling the following API::

	int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
		struct em_data_callback *cb, cpumask_t *cpus, bool microwatts);

Drivers must provide a callback function returning <frequency, power> tuples
for each performance state. The callback function provided by the driver is free
to fetch data from any relevant location (DT, firmware, ...), and by any means
deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
performance domains using a cpumask. For devices other than CPUs, the 'cpus'
argument must be set to NULL.
The last argument, 'microwatts', must be set to the correct value. Kernel
subsystems which use the EM might rely on this flag to check if all EM devices
use the same scale. If there are different scales, these subsystems might
decide to return a warning/error, stop working or panic.
See Section 3. for an example of a driver implementing this
callback, or Section 2.4 for further documentation on this API.
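
To make the callback contract more concrete, here is a minimal userspace
sketch of how a table of <frequency, power> tuples could be collected. It is
only a model of the idea: demo_perf_state, demo_power_cb, demo_active_power()
and demo_build_table() are hypothetical names, the power figures are made up,
and the iteration scheme (asking for one kHz above the previously returned
frequency) is an assumption about how a framework may walk the states, not a
copy of the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry and callback type; the kernel's own differ. */
struct demo_perf_state {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* micro-Watts */
};

typedef int (*demo_power_cb)(unsigned long *uW, unsigned long *khz);

/* Made-up driver data: three performance states in ascending order. */
static const struct demo_perf_state demo_states[] = {
	{  500000,  80000 },
	{ 1000000, 210000 },
	{ 1500000, 450000 },
};

/* Driver side: ceil *khz to the next supported frequency, report its power. */
int demo_active_power(unsigned long *uW, unsigned long *khz)
{
	size_t n = sizeof(demo_states) / sizeof(demo_states[0]);

	for (size_t i = 0; i < n; i++) {
		if (demo_states[i].frequency >= *khz) {
			*khz = demo_states[i].frequency;
			*uW = demo_states[i].power;
			return 0;
		}
	}
	return -1;	/* no state at or above the requested frequency */
}

/*
 * Framework side: query the callback once per state. Each request is one kHz
 * above the previously returned frequency, so the ceiling lookup yields the
 * next state.
 */
int demo_build_table(demo_power_cb cb, struct demo_perf_state *table,
		     size_t nr_states)
{
	unsigned long freq = 0, power = 0;

	assert(cb && table);
	for (size_t i = 0; i < nr_states; i++, freq++) {
		int ret = cb(&power, &freq);

		if (ret)
			return ret;
		table[i].frequency = freq;
		table[i].power = power;
	}
	return 0;
}
```

Calling demo_build_table(demo_active_power, table, 3) fills the three entries
in ascending order, while asking for a fourth state fails because the driver
has nothing above 1500000 kHz. This is why nr_states has to match what the
callback can actually deliver.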

Registration of EM using DT
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The EM can also be registered using the OPP framework and the information in DT
"operating-points-v2". Each OPP entry in DT can be extended with the property
"opp-microwatt" containing a power value in micro-Watts. This OPP DT property
allows a platform to register EM power values which reflect the total power
(static + dynamic). These power values might come directly from
experiments and measurements.

Registration of 'artificial' EM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is an option to provide a custom callback for drivers missing detailed
knowledge about the power value for each performance state. The callback
.get_cost() is optional and provides the 'cost' values used by the EAS.
This is useful for platforms that only provide information on relative
efficiency between CPU types, where one could use the information to
create an abstract power model. But even an abstract power model can
sometimes be hard to fit in, given the input power value size restrictions.
The .get_cost() callback allows drivers to provide 'cost' values which reflect
the efficiency of the CPUs. This makes it possible to give EAS information
which follows a different relation than the one forced by the EM internal
formulas calculating 'cost' values. To register an EM for such a platform, the
driver must set the flag 'microwatts' to 0 and provide both the .get_power()
and .get_cost() callbacks. The EM framework handles such a platform
properly during registration. The flag EM_PERF_DOMAIN_ARTIFICIAL is set for
such a platform. Special care should be taken by other frameworks which use
the EM to test and treat this flag properly.

Registration of 'simple' EM
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 'simple' EM is registered using the framework helper function
cpufreq_register_em_with_opp(). It implements a power model which is tied to
the following math formula::

	Power = C * V^2 * f

The EM which is registered using this method might not correctly reflect the
physics of a real device, e.g. when static power (leakage) is important.
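
As a worked illustration of the formula, the userspace sketch below evaluates
Power = C * V^2 * f with integer arithmetic. The helper name
demo_dynamic_power_uw() and the choice of units (nF, mV, kHz, micro-Watts) are
assumptions made for this example only; they are not taken from the framework.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Dynamic power from P = C * V^2 * f.
 * Units are assumptions of this sketch: capacitance in nF, voltage in mV,
 * frequency in kHz, result in micro-Watts.
 * nF * mV^2 * kHz = 1e-9 * 1e-6 * 1e3 W = 1e-12 W = 1e-6 uW, hence the divisor.
 */
uint64_t demo_dynamic_power_uw(uint64_t cap_nf, uint64_t mv, uint64_t khz)
{
	/* Bounds keep the intermediate product within 64 bits. */
	assert(cap_nf <= 1000 && mv <= 10000 && khz <= 100000000ULL);

	return cap_nf * mv * mv * khz / 1000000ULL;
}
```

With 1 nF, 1000 mV and 1000000 kHz this gives 1000000 uW (1 W). Halving the
voltage at the same frequency yields 250000 uW, a quarter of the power, which
is the V^2 dependency at work. The formula has no term that is independent of
frequency, which is why leakage cannot be represented by it.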


2.3 Accessing performance domains
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are two API functions which provide access to the energy model:
em_cpu_get(), which takes a CPU id as an argument, and em_pd_get(), which takes
a device pointer as an argument. Which interface to use depends on the
subsystem, but in the case of CPU devices both functions return the same
performance domain.

Subsystems interested in the energy model of a CPU can retrieve it using the
em_cpu_get() API. The energy model tables are allocated once upon creation of
the performance domains, and kept in memory untouched.

The energy consumed by a performance domain can be estimated using the
em_cpu_energy() API. The estimation is performed assuming that the schedutil
CPUfreq governor is in use in the case of a CPU device. Currently this
calculation is not provided for other types of devices.
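
The kind of estimation described above can be sketched in userspace as
follows. This is a simplified model under stated assumptions, not the kernel
implementation: demo_ps and demo_pd_energy() are hypothetical names, the
governor is assumed to request 25% headroom over the highest utilization, and
the result is a value proportional to energy in whatever scale the table uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry; not the kernel's struct. */
struct demo_ps {
	uint64_t khz;
	uint64_t power;	/* micro-Watts or abstract units */
};

/*
 * Estimate the energy of one performance domain.
 * table:    performance states in ascending frequency order
 * cpu_cap:  capacity of one CPU at the highest state (e.g. 1024)
 * max_util: highest utilization among the CPUs of the domain
 * sum_util: summed utilization of all CPUs of the domain
 */
uint64_t demo_pd_energy(const struct demo_ps *table, size_t nr,
			uint64_t cpu_cap, uint64_t max_util, uint64_t sum_util)
{
	uint64_t max_khz, want_khz;
	size_t i;

	assert(table && nr > 0 && cpu_cap > 0);
	max_khz = table[nr - 1].khz;

	/* Assumed governor behaviour: request 25% headroom over max_util. */
	want_khz = max_khz * (max_util + max_util / 4) / cpu_cap;

	/* Pick the lowest state that satisfies the request, else the highest. */
	for (i = 0; i < nr - 1; i++) {
		if (table[i].khz >= want_khz)
			break;
	}

	/*
	 * Capacity at this state is cpu_cap * khz / max_khz, so
	 * energy = power * sum_util / capacity-at-state.
	 */
	return table[i].power * sum_util * max_khz / (table[i].khz * cpu_cap);
}
```

For a two-state table of (500000 kHz, 100) and (1000000 kHz, 300) with a CPU
capacity of 1024, a lightly loaded domain (max_util 256, sum_util 512) stays at
the lower state and yields 100, while a busier one (max_util 800, sum_util
1024) needs the higher state and yields 300.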

More details about the above APIs can be found in ``<linux/energy_model.h>``
or in Section 2.4.


2.4 Description details of this API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. kernel-doc:: include/linux/energy_model.h
   :internal:

.. kernel-doc:: kernel/power/energy_model.c
   :export:


3. Example driver
-----------------

The CPUFreq framework supports a dedicated callback for registering
the EM for a given CPU(s) 'policy' object: cpufreq_driver::register_em().
That callback has to be implemented properly for a given driver,
because the framework calls it at the right time during setup.
This section provides a simple example of a CPUFreq driver registering a
performance domain in the Energy Model framework using the (fake) 'foo'
protocol. The driver implements an est_power() function to be provided to the
EM framework::

	-> drivers/cpufreq/foo_cpufreq.c

	static int est_power(struct device *dev, unsigned long *uW,
			unsigned long *KHz)
	{
		long freq, power;

		/* Use the 'foo' protocol to ceil the frequency */
		freq = foo_get_freq_ceil(dev, *KHz);
		if (freq < 0)
			return freq;

		/* Estimate the power cost for the dev at the relevant freq. */
		power = foo_estimate_power(dev, freq);
		if (power < 0)
			return power;

		/* Return the values (power in micro-Watts) to the EM framework */
		*uW = power;
		*KHz = freq;

		return 0;
	}

	static void foo_cpufreq_register_em(struct cpufreq_policy *policy)
	{
		struct em_data_callback em_cb = EM_DATA_CB(est_power);
		struct device *cpu_dev;
		int nr_opp;

		cpu_dev = get_cpu_device(cpumask_first(policy->cpus));

		/* Find the number of OPPs for this policy */
		nr_opp = foo_get_nr_opp(policy);

		/* And register the new performance domain, with 'microwatts' set */
		em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
					    true);
	}

	static struct cpufreq_driver foo_cpufreq_driver = {
		.register_em = foo_cpufreq_register_em,
	};