perf: Timehist account sch delay for scheduled out running
author Fernand Sieber <sieberf@amazon.com>
Tue, 18 Jun 2024 09:03:39 +0000 (11:03 +0200)
committer Namhyung Kim <namhyung@kernel.org>
Tue, 25 Jun 2024 18:06:20 +0000 (11:06 -0700)
When using perf timehist, sch delay is only computed for a waking task,
not for a preempted task. This patch changes sch delay to account for
both. This makes sense because testing a scheduling policy needs to
consider the effect of scheduling delay globally, not only for waking
tasks.

Example of a `perf timehist` report before the patch for `stress` tasks
competing with each other.

Of the three numeric columns, the first is wait time, the second is
sch delay and the third is run time.

1.492060 [0000]  s    stress[81]                          1.999      0.000      2.000      R  next: stress[83]
1.494060 [0000]  s    stress[83]                          2.000      0.000      2.000      R  next: stress[81]
1.496060 [0000]  s    stress[81]                          2.000      0.000      2.000      R  next: stress[83]
1.498060 [0000]  s    stress[83]                          2.000      0.000      1.999      R  next: stress[81]

After the patch, it looks like this (note that sch delay is no longer
zero):

1.492060 [0000]  s    stress[81]                          1.999      1.999      2.000      R  next: stress[83]
1.494060 [0000]  s    stress[83]                          2.000      2.000      2.000      R  next: stress[81]
1.496060 [0000]  s    stress[81]                          2.000      2.000      2.000      R  next: stress[83]
1.498060 [0000]  s    stress[83]                          2.000      2.000      1.999      R  next: stress[81]
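
The accounting behind these numbers can be sketched roughly as follows.
This is a simplified illustration, not the code in builtin-sched.c:
`struct thread_state`, `on_sched_out()`, `on_wakeup()` and
`on_sched_in()` are hypothetical names, and timestamps are assumed to be
plain integers in a single unit. Only the `state == 'R'` branch mirrors
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for perf's per-thread runtime state. */
struct thread_state {
	uint64_t ready_to_run;	/* when the task became runnable, 0 if unknown */
	uint64_t last_time;	/* timestamp of the previous sched-out event */
};

/* Task is switched out at time t; state is its scheduler state letter. */
static void on_sched_out(struct thread_state *ts, char state, uint64_t t)
{
	/*
	 * 'R' means the task was preempted while still runnable, so the
	 * delay clock starts immediately. Otherwise wait for a wakeup.
	 */
	ts->ready_to_run = (state == 'R') ? t : 0;
	ts->last_time = t;
}

/* Task is woken at time t; start the delay clock unless already running. */
static void on_wakeup(struct thread_state *ts, uint64_t t)
{
	if (ts->ready_to_run == 0)
		ts->ready_to_run = t;
}

/* Task is switched in at time t; returns the scheduling delay. */
static uint64_t on_sched_in(struct thread_state *ts, uint64_t t)
{
	/* This sketch assumes events arrive in time order. */
	assert(t >= ts->last_time);

	if (ts->ready_to_run && ts->ready_to_run < t)
		return t - ts->ready_to_run;
	return 0;
}
```

Under these assumptions, a task preempted at time 1000 and switched back
in at time 3000 reports a delay of 2000, which equals its wait time, as
in the report above. A task switched out in a sleeping state reports a
delay measured only from its wakeup.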

Signed-off-by: Fernand Sieber <sieberf@amazon.com>
Reviewed-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240618090339.87482-1-sieberf@amazon.com
tools/perf/Documentation/perf-sched.txt
tools/perf/builtin-sched.c

index a216d2991b191083c03677738dfb8eece2b2c0ff..74c812f7a4a4a02a397b57586d91dc463bdacd0e 100644
@@ -64,8 +64,8 @@ There are several variants of 'perf sched':
     
    By default it shows the individual schedule events, including the wait
    time (time between sched-out and next sched-in events for the task), the
-   task scheduling delay (time between wakeup and actually running) and run
-   time for the task:
+   task scheduling delay (time between runnable and actually running) and
+   run time for the task:
     
                 time    cpu  task name             wait time  sch delay   run time
                              [tid/pid]                (msec)     (msec)     (msec)
index 8cdf18139a7e6b9d784ea91b5da108ee7b772881..aa59f763ca46e427b3b23827fbd981c1bbe5519a 100644
@@ -2659,7 +2659,10 @@ out:
                tr->last_state = state;
 
                /* sched out event for task so reset ready to run time */
-               tr->ready_to_run = 0;
+               if (state == 'R')
+                       tr->ready_to_run = t;
+               else
+                       tr->ready_to_run = 0;
        }
 
        evsel__save_time(evsel, sample->time, sample->cpu);