intel_pstate: Avoid getting stuck in high P-states when idle
authorRafael J. Wysocki <rafael.j.wysocki@intel.com>
Sun, 10 Apr 2016 03:59:10 +0000 (05:59 +0200)
committerRafael J. Wysocki <rafael.j.wysocki@intel.com>
Sun, 10 Apr 2016 03:59:10 +0000 (05:59 +0200)
Jörg Otte reports that commit a4675fbc4a7a (cpufreq: intel_pstate:
Replace timers with utilization update callbacks) caused the CPUs in
his Haswell-based system to stay in the very high frequency region
even if the system is completely idle.

That turns out to be an existing problem in the intel_pstate driver's
P-state selection algorithm for Core processors.  Namely, all
decisions made by that algorithm are based on the average frequency
of the CPU between sampling events and on the P-state requested on
the last invocation, so it may get stuck at a very hight frequency
even if the utilization of the CPU is very low (in fact, it may get
stuck in a inadequate P-state regardless of the CPU utilization).
The only way to kick it out of that limbo is a sufficiently long idle
period (3 times longer than the prescribed sampling interval), but if
that doesn't happen often enough (eg. due to a timing change like
after the above commit), the P-state of the CPU may be inadequate
pretty much all the time.

To address the most egregious manifestations of that issue, reset the
core_busy value used to determine the next P-state to request if the
utilization of the CPU, determined with the help of the MPERF
feedback register and the TSC, is below 1%.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=115771
Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
drivers/cpufreq/intel_pstate.c

index 8b5a415ee14a53d21e9358e2b6707df852e827b8..30fe323c4551b4628de4c5878500d29eab8d57ca 100644 (file)
@@ -1130,6 +1130,10 @@ static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu)
                sample_ratio = div_fp(int_tofp(pid_params.sample_rate_ns),
                                      int_tofp(duration_ns));
                core_busy = mul_fp(core_busy, sample_ratio);
+       } else {
+               sample_ratio = div_fp(100 * cpu->sample.mperf, cpu->sample.tsc);
+               if (sample_ratio < int_tofp(1))
+                       core_busy = 0;
        }
 
        cpu->sample.busy_scaled = core_busy;