zxh reports running into a rusage_sem vs stat_mutex deadlock:
"I am running into a deadlock with fio for the following lines:
thread_main
   fio_mutex_down(stat_mutex) at backend.c:1529
helper_thread_main
   fio_mutex_down(td->rusage_sem) at stat.c:1467
thread_main is waiting for stat_mutex, which is already locked by
helper_thread_main in function __show_running_run_stats() at
stat.c:1441. However, the helper_thread_main is waiting for
td->rusage_sem, which is supposed to be unlocked by
check_update_rusage() in do_io() at backend.c:1525 in thread_main.
The issue is not reproducible every time, and I was using a customized
ioengine derived from rbd.c.
Is there any chance that this issue is caused by the customized io
engine? Or is there a way to get around this?"
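Fix this by always updating the rusage stats (calling
check_update_rusage(), which ups td->rusage_sem if the helper thread
asked for an update) before grabbing stat_mutex in thread_main. That
way the helper is never left waiting on rusage_sem while holding
stat_mutex, and the ABBA deadlock is avoided.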
Signed-off-by: Jens Axboe <axboe@fb.com>
+ /*
+ * Make sure we've successfully updated the rusage stats
+ * before waiting on the stat mutex. Otherwise we could have
+ * the stat thread holding stat mutex and waiting for
+ * the rusage_sem, which would never get upped because
+ * this thread is waiting for the stat mutex.
+ */
+ check_update_rusage(td);
+
fio_mutex_down(stat_mutex);
if (td_read(td) && td->io_bytes[DDIR_READ]) {
elapsed = mtime_since_now(&td->start);
do_verify(td, verify_bytes);
+ /*
+ * See comment further up for why this is done here.
+ */
+ check_update_rusage(td);
+
fio_mutex_down(stat_mutex);
td->ts.runtime[DDIR_READ] += mtime_since_now(&td->start);
fio_gettime(&td->start, NULL);