rcu: Enable rcu_normal_wake_from_gp on small systems
Automatically enable the rcu_normal_wake_from_gp parameter on
systems with a small number of CPUs. The activation threshold
is set to 16 CPUs.
This helps to reduce a latency of normal synchronize_rcu() API
by waking up GP-waiters earlier and decoupling synchronize_rcu()
callers from regular callback handling.
A benchmark running 64 parallel jobs(system with 64 CPUs) invoking
synchronize_rcu() demonstrates a notable latency reduction with the
setting enabled.
Latency distribution (microseconds):
<default>
0 - 9999 : 1
10000 - 19999 : 4
20000 - 29999 : 399
30000 - 39999 : 3197
40000 - 49999 : 10428
50000 - 59999 : 17363
60000 - 69999 : 15529
70000 - 79999 : 9287
80000 - 89999 : 4249
90000 - 99999 : 1915
100000 - 109999 : 922
110000 - 119999 : 390
120000 - 129999 : 187
...
<default>
<rcu_normal_wake_from_gp>
0 - 9999 : 1
10000 - 19999 : 234
20000 - 29999 : 6678
30000 - 39999 : 33463
40000 - 49999 : 20669
50000 - 59999 : 2766
60000 - 69999 : 183
...
<rcu_normal_wake_from_gp>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>