sched_ext: Improve comment on idle_sched_class exception in scx_task_iter_next_locked()
author Tejun Heo <tj@kernel.org>
Tue, 6 Aug 2024 19:40:11 +0000 (09:40 -1000)
committer Tejun Heo <tj@kernel.org>
Tue, 6 Aug 2024 19:40:11 +0000 (09:40 -1000)
scx_task_iter_next_locked() skips tasks whose sched_class is
idle_sched_class. While it has a short comment explaining why it's testing
the sched_class directly instead of using is_idle_task(), the comment
doesn't sufficiently explain what's going on and why. Improve the comment.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: David Vernet <void@manifault.com>
kernel/sched/ext.c

index 09f394bb4889dce3bdcebbb6c9b76245ec672f7a..7837a551022c47c004d9cddc46609a6bdd39d8d6 100644
@@ -1252,8 +1252,29 @@ retry:
 
        while ((p = scx_task_iter_next(iter))) {
                /*
-                * is_idle_task() tests %PF_IDLE which may not be set for CPUs
-                * which haven't yet been onlined. Test sched_class directly.
+                * scx_task_iter is used to prepare and move tasks into SCX
+                * while loading the BPF scheduler and vice-versa while
+                * unloading. The init_tasks ("swappers") should be excluded
+                * from the iteration because:
+                *
+                * - It's unsafe to use __setscheduler_prio() on an init_task to
+                *   determine the sched_class to use as it won't preserve its
+                *   idle_sched_class.
+                *
+                * - ops.init/exit_task() can easily be confused if called with
+                *   init_tasks as they, e.g., share PID 0.
+                *
+                * As init_tasks are never scheduled through SCX, they can be
+                * skipped safely. Note that is_idle_task() which tests %PF_IDLE
+                * doesn't work here:
+                *
+                * - %PF_IDLE may not be set for an init_task whose CPU hasn't
+                *   yet been onlined.
+                *
+                * - %PF_IDLE can be set on tasks that are not init_tasks. See
+                *   play_idle_precise() used by CONFIG_IDLE_INJECT.
+                *
+                * Test for idle_sched_class as only init_tasks are on it.
                 */
                if (p->sched_class != &idle_sched_class)
                        break;
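
The reasoning in the new comment can be illustrated with a small userspace C sketch of why a flag test and a class test can disagree. This is not kernel code: task_model, PF_IDLE_MODEL, the *_class_model objects, the helper functions, and the example task names are simplified stand-ins invented for illustration, and the real definitions in the kernel differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for illustration only, NOT the kernel definitions. */
#define PF_IDLE_MODEL 0x2u

struct sched_class_model { const char *name; };

static const struct sched_class_model idle_class_model = { "idle" };
static const struct sched_class_model rt_class_model   = { "rt" };
static const struct sched_class_model fair_class_model = { "fair" };

struct task_model {
	const char *comm;
	unsigned int flags;
	const struct sched_class_model *sched_class;
};

/* Models the idea behind is_idle_task(): look only at the flag. */
static bool flag_says_idle(const struct task_model *p)
{
	return (p->flags & PF_IDLE_MODEL) != 0;
}

/* Models the check the patch keeps: compare the class pointer. */
static bool class_says_idle(const struct task_model *p)
{
	return p->sched_class == &idle_class_model;
}

/* Count the tasks an iterator would visit under a given skip predicate. */
static size_t count_visited(const struct task_model *tasks, size_t n,
			    bool (*skip)(const struct task_model *))
{
	size_t visited = 0;

	for (size_t i = 0; i < n; i++)
		if (!skip(&tasks[i]))
			visited++;
	return visited;
}

/* Hypothetical task table covering the cases named in the comment. */
static const struct task_model example_tasks[] = {
	{ "swapper/0",     PF_IDLE_MODEL, &idle_class_model }, /* online CPU */
	{ "swapper/1",     0,             &idle_class_model }, /* CPU not yet onlined */
	{ "idle_inject/0", PF_IDLE_MODEL, &rt_class_model },   /* injecting idle time */
	{ "worker",        0,             &fair_class_model },
};
```

In this model the flag test fails to skip swapper/1 and wrongly skips idle_inject/0, while the class test skips exactly the two swappers. That matches the two failure modes the comment lists for %PF_IDLE.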