sched/fair: Fix warning in bandwidth distribution
author     Josh Don <joshdon@google.com>
           Fri, 22 Sep 2023 23:05:35 +0000 (16:05 -0700)
committer  Ingo Molnar <mingo@kernel.org>
           Sun, 24 Sep 2023 10:08:29 +0000 (12:08 +0200)
commit     2f8c62296b6f656bbfd17e9f1fadd7478003a9d9
tree       0e8239cc8dcdbaf23107992ab5a017e5f0f2af7e
parent     30797bce8ef0c73f0c388148ffac92458533b10e

We've observed the following warning being hit in
distribute_cfs_runtime():

SCHED_WARN_ON(cfs_rq->runtime_remaining > 0)

We have the following race:

 - CPU 0: running bandwidth distribution (distribute_cfs_runtime).
   Inspects the local cfs_rq and makes its runtime_remaining positive.
   However, we defer unthrottling the local cfs_rq until after
   considering all remote cfs_rq's.

 - CPU 1: starts running bandwidth distribution from the slack timer. When
   it finds the cfs_rq for CPU 0 on the throttled list, it observes that
   the cfs_rq is throttled, yet is not on the CSD list, and has a
   positive runtime_remaining, thus triggering the warning in
   distribute_cfs_runtime.
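The interleaving above can be replayed deterministically in a small
userspace model. This is a sketch, not kernel code: struct model_cfs_rq,
warning_fires() and the other helpers are hypothetical stand-ins that keep
only the three properties the race depends on (throttled, already queued
for unthrottle, and runtime_remaining). It shows that once CPU 0 has made
runtime_remaining positive but has not yet recorded the pending unthrottle
anywhere, every check made by CPU 1 passes and the warning condition holds.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of a cfs_rq; not the kernel's struct
 * cfs_rq. Only the state the race depends on is kept.
 */
struct model_cfs_rq {
	bool throttled;              /* still marked throttled */
	bool on_csd_list;            /* already queued for async unthrottle */
	long long runtime_remaining; /* <= 0 while throttled, before a grant */
};

/*
 * What a second distribution effectively checks for each cfs_rq on the
 * throttled list: skip it if it is not throttled or already queued,
 * otherwise expect runtime_remaining <= 0. Returns true when the
 * SCHED_WARN_ON(cfs_rq->runtime_remaining > 0) condition would hold.
 */
static bool warning_fires(const struct model_cfs_rq *rq)
{
	if (!rq->throttled || rq->on_csd_list)
		return false;
	return rq->runtime_remaining > 0;
}

/*
 * CPU 0, behaviour before the fix: grant runtime to the local cfs_rq but
 * defer the unthrottle without recording it anywhere another CPU can see.
 */
static void cpu0_grant_local_unrecorded(struct model_cfs_rq *rq,
					long long runtime)
{
	assert(rq->throttled);
	rq->runtime_remaining += runtime;
	/* The unthrottle itself waits until all remote cfs_rqs are handled. */
}

/* Replays the race single-threaded; true means CPU 1 would warn. */
static bool replay_race_before_fix(void)
{
	struct model_cfs_rq cpu0_rq = {
		.throttled = true,
		.on_csd_list = false,
		.runtime_remaining = -5,
	};

	cpu0_grant_local_unrecorded(&cpu0_rq, 10); /* now +5, still throttled */
	return warning_fires(&cpu0_rq);            /* CPU 1's slack-timer scan */
}
```

In this model, replay_race_before_fix() returns true: the local cfs_rq is
throttled, off the CSD list, and has positive runtime_remaining at the
moment CPU 1 looks at it.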

To fix this, we can rework the local unthrottling logic to put the local
cfs_rq on a local list, so that any future bandwidth distributions will
realize that the cfs_rq is about to be unthrottled.
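A minimal sketch of the fixed ordering, using the same kind of
hypothetical userspace model (none of these names are kernel APIs). It
assumes that the "already queued for unthrottle?" check made by a
concurrent distribution covers the local list as well as the CSD list,
modelled here by a single shared on_unthrottle_list flag. CPU 0 drains its
local list and unthrottles only after the remote cfs_rqs have been handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a cfs_rq; not the kernel's struct. */
struct model_cfs_rq {
	bool throttled;
	/* Queued for unthrottle: on the CSD list or on CPU 0's local list. */
	bool on_unthrottle_list;
	long long runtime_remaining;
	struct model_cfs_rq *next; /* singly linked local list */
};

/* Same check as a concurrent distribution: true means it would warn. */
static bool warning_fires(const struct model_cfs_rq *rq)
{
	if (!rq->throttled || rq->on_unthrottle_list)
		return false;
	return rq->runtime_remaining > 0;
}

/* CPU 0, after the fix: grant runtime, then queue on the local list. */
static void cpu0_grant_local(struct model_cfs_rq *rq, long long runtime,
			     struct model_cfs_rq **local_list)
{
	assert(rq->throttled);
	rq->runtime_remaining += runtime;
	if (rq->runtime_remaining > 0) {
		rq->on_unthrottle_list = true; /* visible to other CPUs */
		rq->next = *local_list;
		*local_list = rq;
	}
}

/* CPU 0, after the remote cfs_rqs: unthrottle everything queued locally. */
static int cpu0_drain_local(struct model_cfs_rq **local_list)
{
	int unthrottled = 0;

	while (*local_list != NULL) {
		struct model_cfs_rq *rq = *local_list;

		*local_list = rq->next;
		rq->next = NULL;
		rq->on_unthrottle_list = false;
		if (rq->throttled) {
			rq->throttled = false; /* stand-in for unthrottling */
			unthrottled++;
		}
	}
	return unthrottled;
}

/* Same interleaving as the race; returns true if CPU 1 would warn. */
static bool replay_race_after_fix(int *unthrottled)
{
	struct model_cfs_rq cpu0_rq = {
		.throttled = true,
		.runtime_remaining = -5,
	};
	struct model_cfs_rq *local_list = NULL;
	bool warned;

	cpu0_grant_local(&cpu0_rq, 10, &local_list);
	warned = warning_fires(&cpu0_rq); /* CPU 1 scans in between */
	*unthrottled = cpu0_drain_local(&local_list);
	return warned;
}
```

In this model the concurrent scan skips CPU 0's cfs_rq instead of warning,
and the deferred local unthrottle still happens exactly once.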

Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230922230535.296350-2-joshdon@google.com
kernel/sched/fair.c