sched/numa: Do not swap tasks between nodes when spare capacity is available
author Mel Gorman <mgorman@techsingularity.net>
Fri, 20 May 2022 10:35:17 +0000 (11:35 +0100)
committer Peter Zijlstra <peterz@infradead.org>
Mon, 13 Jun 2022 08:29:59 +0000 (10:29 +0200)
If a destination node has spare capacity but there is an imbalance then
two tasks are selected for swapping. If the tasks have no NUMA group
or are within the same NUMA group, it's simply shuffling tasks around
without having any impact on the compute imbalance. Instead, it's just
punishing one task to help another.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20220520103519.1863-3-mgorman@techsingularity.net
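
The rule described in the commit message can be sketched as a standalone predicate. This is a simplified model for illustration only, not the kernel code: `struct numa_group_model` and `swap_is_considered()` are hypothetical names, and only the `numa_type` enumerators mirror identifiers from kernel/sched/fair.c. The sketch assumes, as the patch does, that comparing the two group pointers for equality covers both "same NUMA group" and "neither task has a group" (both pointers NULL).

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the enumerator names used in kernel/sched/fair.c. */
enum numa_type { node_has_spare = 0, node_fully_busy, node_overloaded };

/* Hypothetical stand-in for the kernel's struct numa_group. */
struct numa_group_model {
	int id;
};

/*
 * Hypothetical helper: returns false when the candidate swap should be
 * skipped outright, true when it should still be evaluated further.
 *
 * cur_ng:   group of the task currently running on the destination CPU
 * p_ng:     group of the task being considered for migration
 * dst_type: classification of the destination node
 */
static bool swap_is_considered(const struct numa_group_model *cur_ng,
			       const struct numa_group_model *p_ng,
			       enum numa_type dst_type)
{
	/*
	 * Same group, or no group on either side (both NULL): with spare
	 * capacity on the destination, a swap cannot reduce the imbalance
	 * and only helps one task at the expense of the other.
	 */
	if (cur_ng == p_ng && dst_type == node_has_spare)
		return false;

	return true;
}
```

In the actual patch the skip is expressed as `goto unlock` inside task_numa_compare(); a `true` result here only means the existing improvement calculation would still run, not that the swap is chosen.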
kernel/sched/fair.c

index 51836efe5931d8d7dd5001f3d147421f8cc4012b..23da36c9cacb857713a7d22423001d6da43a50c5 100644
@@ -1790,6 +1790,15 @@ static bool task_numa_compare(struct task_numa_env *env,
         */
        cur_ng = rcu_dereference(cur->numa_group);
        if (cur_ng == p_ng) {
+               /*
+                * Do not swap within a group or between tasks that have
+                * no group if there is spare capacity. Swapping does
+                * not address the load imbalance and helps one task at
+                * the cost of punishing another.
+                */
+               if (env->dst_stats.node_type == node_has_spare)
+                       goto unlock;
+
                imp = taskimp + task_weight(cur, env->src_nid, dist) -
                      task_weight(cur, env->dst_nid, dist);
                /*