rcu/nocb: Fix rcuog wake-up from offline softirq
authorFrederic Weisbecker <frederic@kernel.org>
Thu, 10 Oct 2024 16:36:09 +0000 (18:36 +0200)
committerNeeraj Upadhyay <neeraj.upadhyay@kernel.org>
Thu, 10 Oct 2024 16:48:19 +0000 (22:18 +0530)
After a CPU has set itself offline and before it eventually calls
rcutree_report_cpu_dead(), there are still opportunities for callbacks
to be enqueued, for example from a softirq. When that happens on NOCB,
the rcuog wake-up is deferred through an IPI to an online CPU in order
not to call into the scheduler and risk arming the RT-bandwidth after
hrtimers have been migrated out and disabled.

But performing a synchronized IPI from a softirq is buggy as reported in
the following scenario:

        WARNING: CPU: 1 PID: 26 at kernel/smp.c:633 smp_call_function_single
        Modules linked in: rcutorture torture
        CPU: 1 UID: 0 PID: 26 Comm: migration/1 Not tainted 6.11.0-rc1-00012-g9139f93209d1 #1
        Stopper: multi_cpu_stop+0x0/0x320 <- __stop_cpus+0xd0/0x120
        RIP: 0010:smp_call_function_single
        <IRQ>
        swake_up_one_online
        __call_rcu_nocb_wake
        __call_rcu_common
        ? rcu_torture_one_read
        call_timer_fn
        __run_timers
        run_timer_softirq
        handle_softirqs
        irq_exit_rcu
        ? tick_handle_periodic
        sysvec_apic_timer_interrupt
        </IRQ>

Fix this with forcing deferred rcuog wake up through the NOCB timer when
the CPU is offline. The actual wake up will happen from
rcutree_report_cpu_dead().

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202409231644.4c55582d-lkp@intel.com
Fixes: 9139f93209d1 ("rcu/nocb: Fix RT throttling hrtimer armed from offline CPU")
Reviewed-by: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
kernel/rcu/tree_nocb.h

index 97b99cd0692323eda39013951ed4e01eb9382a11..16865475120ba38c741aae897a3dc8d99f95d0ee 100644 (file)
@@ -554,13 +554,19 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
                        rcu_nocb_unlock(rdp);
                        wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY,
                                           TPS("WakeLazy"));
-               } else if (!irqs_disabled_flags(flags)) {
+               } else if (!irqs_disabled_flags(flags) && cpu_online(rdp->cpu)) {
                        /* ... if queue was empty ... */
                        rcu_nocb_unlock(rdp);
                        wake_nocb_gp(rdp, false);
                        trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
                                            TPS("WakeEmpty"));
                } else {
+                       /*
+                        * Don't do the wake-up upfront on fragile paths.
+                        * Also offline CPUs can't call swake_up_one_online() from
+                        * (soft-)IRQs. Rely on the final deferred wake-up from
+                        * rcutree_report_cpu_dead()
+                        */
                        rcu_nocb_unlock(rdp);
                        wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE,
                                           TPS("WakeEmptyIsDeferred"));