git.kernel.dk Git - linux-block.git/commit

author	Peter Zijlstra <peterz@infradead.org>
	Fri, 23 May 2025 15:28:00 +0000 (17:28 +0200)
committer	Peter Zijlstra <peterz@infradead.org>
	Wed, 9 Jul 2025 11:40:21 +0000 (13:40 +0200)
commit	570c8efd5eb79c3725ba439ce105ed1bedc5acd9
tree	958350a66f19788cccdee4fb57327be3a5b5a15b
parent	155213a2aed42c85361bf4f5c817f5cb68951c3b

sched/psi: Optimize psi_group_change() cpu_clock() usage

Dietmar reported that commit 3840cbe24cf0 ("sched: psi: fix bogus
pressure spikes from aggregation race") caused a regression for him on
a high context switch rate benchmark (schbench) due to the now
repeated cpu_clock() calls.

In particular the problem is that get_recent_times() will extrapolate
the current state to 'now'. But if an update uses a timestamp from
before the start of the update, it is possible to get two reads
with inconsistent results. It is effectively back-dating an update.
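
The back-dating effect can be illustrated with a small userspace sketch.
This is not the kernel code: struct demo_state, demo_read() and
demo_clear() are hypothetical, simplified stand-ins for a per-cpu state
record, a reader that extrapolates to 'now', and an update that stamps
the state change. Timestamps are plain integers and there is no
concurrency; the point is only the ordering of events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one per-cpu state record. */
struct demo_state {
	bool     active;       /* is the state currently set? */
	uint64_t state_start;  /* timestamp at which it became set */
	uint64_t total;        /* time accumulated by completed intervals */
};

/* Reader: extrapolate the accumulated time up to 'now'. */
static uint64_t demo_read(const struct demo_state *s, uint64_t now)
{
	uint64_t t = s->total;

	if (s->active) {
		assert(now >= s->state_start);
		t += now - s->state_start;
	}
	return t;
}

/* Writer: clear the state, accounting the interval only up to 'stamp'. */
static void demo_clear(struct demo_state *s, uint64_t stamp)
{
	assert(s->active && stamp >= s->state_start);
	s->total += stamp - s->state_start;
	s->active = false;
}

/* Returns true if the second of two successive reads goes backwards. */
static bool demo_backdated_update_regresses(void)
{
	struct demo_state s = { .active = true, .state_start = 0, .total = 0 };
	uint64_t stale = 90;                   /* clock sampled before the update began */
	uint64_t first = demo_read(&s, 100);   /* reader extrapolates to now = 100 */

	demo_clear(&s, stale);                 /* update lands later, but is stamped 90 */

	uint64_t second = demo_read(&s, 110);  /* state is clear, total is only 90 */
	return second < first;
}
```

In this sketch the first read sees 100 and the second sees 90, so a
consumer computing the difference between the two reads would get a
negative delta.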

(note that this all hard-relies on the clock being synchronized across
CPUs -- if this is not the case, all bets are off).

Given this problem and the fact that there are per-group-per-cpu
seqcounts, the commit in question pushed the clock read into the group
iteration, resulting in one cpu_clock() call per level of the group
tree. On architectures where cpu_clock() has appreciable overhead,
this hurts.

Instead move to a per-cpu seqcount, which allows us to have a single
clock read for all group updates, increasing internal consistency and
lowering update overhead. This comes at the cost of a longer update
side (proportional to the tree depth) which can cause the read side to
retry more often.
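
The difference in clock reads can be sketched in userspace as follows.
This is an illustration under stated assumptions, not the patch itself:
demo_clock() is a hypothetical stand-in for cpu_clock() that counts its
calls, DEMO_DEPTH is an assumed tree depth, and the sequence counters
are plain integers (a real seqcount also needs memory barriers, which
are omitted here).

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DEPTH 4  /* hypothetical group tree depth */

static unsigned int demo_clock_reads;  /* number of demo_clock() calls */
static uint64_t demo_fake_time;        /* advancing fake timestamp */

/* Stand-in for cpu_clock(): counts calls and returns an advancing time. */
static uint64_t demo_clock(void)
{
	demo_clock_reads++;
	return ++demo_fake_time;
}

struct demo_group {
	unsigned int seq;    /* per-group sequence count (per-group scheme only) */
	uint64_t     stamp;  /* timestamp of the last update */
};

static unsigned int demo_cpu_seq;  /* single sequence count (per-cpu scheme) */

/* Per-group scheme: every level has its own write section and clock read. */
static void demo_update_per_group(struct demo_group *g, int depth)
{
	assert(depth <= DEMO_DEPTH);
	for (int i = 0; i < depth; i++) {
		g[i].seq++;                 /* begin write section (odd) */
		g[i].stamp = demo_clock();
		g[i].seq++;                 /* end write section (even) */
	}
}

/* Per-cpu scheme: one write section and one clock read cover all levels. */
static void demo_update_per_cpu(struct demo_group *g, int depth)
{
	assert(depth <= DEMO_DEPTH);
	demo_cpu_seq++;                     /* begin write section (odd) */
	uint64_t now = demo_clock();
	for (int i = 0; i < depth; i++)
		g[i].stamp = now;           /* same timestamp for every group */
	demo_cpu_seq++;                     /* end write section (even) */
}

/* Run one update over a fresh group chain and report the clock reads. */
static unsigned int demo_count_reads(void (*update)(struct demo_group *, int))
{
	struct demo_group groups[DEMO_DEPTH] = { 0 };

	demo_clock_reads = 0;
	update(groups, DEMO_DEPTH);
	return demo_clock_reads;
}
```

With the assumed depth of 4, the per-group variant reads the clock four
times and the per-cpu variant once. The per-cpu write section spans the
whole loop, which is the longer update side mentioned above.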

Fixes: 3840cbe24cf0 ("sched: psi: fix bogus pressure spikes from aggregation race")
Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/20250522084844.GC31726@noisy.programming.kicks-ass.net

include/linux/psi_types.h
kernel/sched/psi.c