sched_ext: Fix SCX_TASK_INIT -> SCX_TASK_READY transitions in scx_ops_enable()
author Tejun Heo <tj@kernel.org>
Fri, 27 Sep 2024 20:02:40 +0000 (10:02 -1000)
committer Tejun Heo <tj@kernel.org>
Fri, 27 Sep 2024 20:02:40 +0000 (10:02 -1000)
scx_ops_enable() has two task iteration loops. The first one calls
scx_ops_init_task() on every task and the second switches the eligible ones
into SCX. The first loop left the tasks in SCX_TASK_INIT state and the
second loop then moved each task to READY before switching it into SCX.

The distinction between INIT and READY is only meaningful in the fork path
where it's used to tell whether the task finished forking so that we can
tell ops.exit_task() accordingly. Leaving a task in INIT state between the
two loops is inconsistent with the fork path and incorrect. The following
can be triggered by running a program which keeps toggling a task between
SCHED_OTHER and SCHED_SCX while a scheduler is being enabled:

  sched_ext: Invalid task state transition 1 -> 3 for fish[1526]
  WARNING: CPU: 2 PID: 1615 at kernel/sched/ext.c:3393 scx_ops_enable_task+0x1a1/0x200
  ...
  Sched_ext: qmap (enabling+all)
  RIP: 0010:scx_ops_enable_task+0x1a1/0x200
  ...
   switching_to_scx+0x13/0xa0
   __sched_setscheduler+0x850/0xa50
   do_sched_setscheduler+0x104/0x1c0
   __x64_sys_sched_setscheduler+0x18/0x30
   do_syscall_64+0x7b/0x140
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
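
A minimal sketch of the kind of toggling program described above. It is
not the reproducer that produced this trace. toggle_policy() is a
hypothetical helper, and the policy called SCHED_SCX above is assumed to be
the UAPI SCHED_EXT policy (value 7), defined here only if the libc headers
lack it:

```c
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Assumption: SCHED_EXT is policy 7 in the kernel UAPI headers. */
#ifndef SCHED_EXT
#define SCHED_EXT 7
#endif

/*
 * Hypothetical helper: flip @pid between SCHED_EXT and SCHED_OTHER
 * @iterations times. Returns the number of sched_setscheduler() calls that
 * failed, e.g. with EINVAL on kernels built without sched_ext.
 */
int toggle_policy(pid_t pid, int iterations)
{
	struct sched_param param = { .sched_priority = 0 };
	int failures = 0;

	assert(iterations >= 0);

	for (int i = 0; i < iterations; i++) {
		/* Each switch into SCX goes through __sched_setscheduler(). */
		if (sched_setscheduler(pid, SCHED_EXT, &param) != 0)
			failures++;
		/* Switch back so the next iteration enters SCX again. */
		if (sched_setscheduler(pid, SCHED_OTHER, &param) != 0)
			failures++;
	}
	return failures;
}
```

Running such a loop on one task while a BPF scheduler is being loaded is
what would land a switch into SCX in the window between the two loops.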

Fix it by transitioning to READY in the first loop right after
scx_ops_init_task() succeeds.
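
The effect of the fix can be sketched with a small userspace model. This is
not the kernel code: the model_* names are hypothetical, the numeric state
values are assumed from the "1 -> 3" warning above (INIT = 1, ENABLED = 3),
and the transition rules are a simplified assumption that ENABLED may only
be entered from READY:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed numbering: NONE = 0, INIT = 1, READY = 2, ENABLED = 3. */
enum model_state { MODEL_NONE, MODEL_INIT, MODEL_READY, MODEL_ENABLED };

struct model_task {
	enum model_state state;
};

/*
 * Apply a transition and report whether it was valid. Like a warning, an
 * invalid transition is reported but the new state is still recorded.
 */
static bool model_set_state(struct model_task *t, enum model_state next)
{
	bool valid = true;

	assert(t);

	switch (next) {
	case MODEL_NONE:
		break;
	case MODEL_INIT:
		valid = (t->state == MODEL_NONE);
		break;
	case MODEL_READY:
		valid = (t->state != MODEL_NONE);
		break;
	case MODEL_ENABLED:
		valid = (t->state == MODEL_READY);
		break;
	}
	t->state = next;
	return valid;
}

/*
 * Model a switch into SCX that lands between the two loops.
 * @fixed selects whether the first loop also moves the task to READY.
 * Returns true if the racing switch makes only valid transitions.
 */
bool race_is_clean(bool fixed)
{
	struct model_task t = { .state = MODEL_NONE };

	/* First loop: task initialization succeeds. */
	model_set_state(&t, MODEL_INIT);
	if (fixed)
		model_set_state(&t, MODEL_READY);

	/* Racing switch into SCX tries to enable the task. */
	return model_set_state(&t, MODEL_ENABLED);
}
```

In this model race_is_clean(false) hits the INIT -> ENABLED transition from
the warning, while race_is_clean(true) goes through READY first.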

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
kernel/sched/ext.c

index 00883f3ef6472b25c31cbbb7986ea3060955fd01..e83af19de59d3248a97969b7638304847ede92bf 100644
@@ -5166,6 +5166,8 @@ static int scx_ops_enable(struct sched_ext_ops *ops, struct bpf_link *link)
                        goto err_disable_unlock_all;
                }
 
+               scx_set_task_state(p, SCX_TASK_READY);
+
                put_task_struct(p);
                spin_lock_irq(&scx_tasks_lock);
        }
@@ -5178,7 +5180,7 @@ static int scx_ops_enable(struct sched_ext_ops *ops, struct bpf_link *link)
        WRITE_ONCE(scx_switching_all, !(ops->flags & SCX_OPS_SWITCH_PARTIAL));
 
        /*
-        * We're fully committed and can't fail. The PREPPED -> ENABLED
+        * We're fully committed and can't fail. The task READY -> ENABLED
         * transitions here are synchronized against sched_ext_free() through
         * scx_tasks_lock.
         */
@@ -5190,7 +5192,6 @@ static int scx_ops_enable(struct sched_ext_ops *ops, struct bpf_link *link)
 
                sched_deq_and_put_task(p, DEQUEUE_SAVE | DEQUEUE_MOVE, &ctx);
 
-               scx_set_task_state(p, SCX_TASK_READY);
                __setscheduler_prio(p, p->prio);
                check_class_changing(task_rq(p), p, old_class);