iolog: regrow logs in iolog_delay()
author: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Thu, 21 Mar 2024 03:10:10 +0000 (12:10 +0900)
committer: Jens Axboe <axboe@kernel.dk>
Thu, 21 Mar 2024 11:57:54 +0000 (05:57 -0600)
Commit b85c01f7e9df ("iolog.c: fix inaccurate clat when replay trace")
triggers the assertion failure below for workloads that replay I/O
asynchronously together with log recording options such as
write_lat_log.

  fio: stat.c:3030: get_cur_log: Assertion `iolog->pending->nr_samples < iolog->pending->max_samples' failed.
  fio: pid=40120, got signal=6

The assertion means that too many samples were recorded in the pending
log space, which holds samples until the log space is next regrown by a
regrow_logs() call. However, regrow_logs() is not called, so the pending
log space runs out.

The triggering commit modified iolog_delay() to call
io_u_queued_complete() so that asynchronous I/Os can complete during the
delays between replayed I/Os. Before that commit, replayed I/Os were not
completed until all I/O units had been consumed. The free I/O unit list
therefore became empty periodically, and wait_for_completion() and
regrow_logs() were called each time. After that commit, the I/O units
are never all consumed, so wait_for_completion() and regrow_logs() are
not called for long periods. Hence the assertion failure.

To avoid the assertion, check in iolog_delay() whether a log regrow is
pending (TD_F_REGROW_LOGS) and, if so, call regrow_logs().
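
The failure mode and the fix described above can be illustrated with a
small standalone model. This is only a sketch, not fio code: the toy_*
names, the structure layout and the capacity-doubling policy are all
hypothetical. It assumes that samples spill into a fixed-size pending
space once the current log is full, and that a regrow step moves them
back and clears a flag standing in for TD_F_REGROW_LOGS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one log; not fio's real structures. */
struct toy_log {
	size_t cur_samples;     /* samples in the current log */
	size_t cur_max;         /* capacity of the current log */
	size_t pending_samples; /* samples parked in the pending space */
	size_t pending_max;     /* capacity of the pending space */
	bool regrow_flag;       /* plays the role of TD_F_REGROW_LOGS */
};

/* Record one sample. Returns false where the assertion would fire. */
static bool toy_add_sample(struct toy_log *l)
{
	if (l->cur_samples < l->cur_max) {
		l->cur_samples++;
		return true;
	}
	/* Current log is full: request a regrow and use the pending space. */
	l->regrow_flag = true;
	if (l->pending_samples >= l->pending_max)
		return false;
	l->pending_samples++;
	return true;
}

/* Grow the current log (assumed policy: double) and drain the pending space. */
static void toy_regrow(struct toy_log *l)
{
	assert(l->pending_samples <= l->pending_max);
	l->cur_samples += l->pending_samples;
	l->cur_max = l->cur_samples * 2;
	l->pending_samples = 0;
	l->regrow_flag = false;
}

/*
 * Model nr_ios completions inside a delay loop. With regrow_in_delay set,
 * the loop checks the flag after each completion, as the fix does.
 * Returns the number of samples recorded before the pending space ran out.
 */
static size_t toy_replay(struct toy_log *l, size_t nr_ios, bool regrow_in_delay)
{
	size_t done = 0;

	for (size_t i = 0; i < nr_ios; i++) {
		if (!toy_add_sample(l))
			break;
		done++;
		if (regrow_in_delay && l->regrow_flag)
			toy_regrow(l);
	}
	return done;
}
```

With a current log of 4 samples and a pending space of 2, the model
stops after 6 samples when the regrow check is absent, and records every
sample when the check runs inside the loop.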

Fixes: b85c01f7e9df ("iolog.c: fix inaccurate clat when replay trace")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240321031011.4140040-2-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
iolog.c

diff --git a/iolog.c b/iolog.c
index f52a9a80f7ef4e9546e1f0acf159a2eb6d3e255f..251e9d7fa2f8c4386b55fb2441229ae3cc9512de 100644
--- a/iolog.c
+++ b/iolog.c
@@ -102,6 +102,8 @@ static void iolog_delay(struct thread_data *td, unsigned long delay)
                ret = io_u_queued_complete(td, 0);
                if (ret < 0)
                        td_verror(td, -ret, "io_u_queued_complete");
+               if (td->flags & TD_F_REGROW_LOGS)
+                       regrow_logs(td);
                if (utime_since_now(&ts) > delay)
                        break;
        }