From: Jens Axboe
Date: Mon, 23 Mar 2015 16:19:44 +0000 (-0600)
Subject: Always update rusage before grabbing stat_mutex
X-Git-Tag: fio-2.2.7~12
X-Git-Url: https://git.kernel.dk/?p=fio.git;a=commitdiff_plain;h=40c666c87c2d7751abea8fc2f866404d1efbdee2

Always update rusage before grabbing stat_mutex

zxh reports running into a rusage_sem vs stat_mutex deadlock:

"I am running into a deadlock with fio for the following lines:

thread_main
	fio_mutex_down(stat_mutex) at backend.c:1529
helper_thread_main
	fio_mutex_down(td->rusage_sem) at stat.c:1467

thread_main is waiting for stat_mutex, which is already locked by
helper_thread_main in function __show_running_run_stats() at
stat.c:1441. However, the helper_thread_main is waiting for
td->rusage_sem, which is supposed to be unlocked by
check_update_rusage() in do_io() at backend.c:1525 in thread_main.

The issue is not reproducible every time, and I was using a customized
ioengine derived from rbd.c. Is there any chance that this issue is
caused by the customized io engine? Or is there a way to get around
this?"

Fix this by always updating the rusage stats before grabbing
stat_mutex, so we avoid this ABBA deadlock.

Signed-off-by: Jens Axboe
---

diff --git a/backend.c b/backend.c
index fdb7413d..2be71496 100644
--- a/backend.c
+++ b/backend.c
@@ -1526,6 +1526,15 @@ static void *thread_main(void *data)
 
 		clear_state = 1;
 
+		/*
+		 * Make sure we've successfully updated the rusage stats
+		 * before waiting on the stat mutex. Otherwise we could have
+		 * the stat thread holding stat mutex and waiting for
+		 * the rusage_sem, which would never get upped because
+		 * this thread is waiting for the stat mutex.
+		 */
+		check_update_rusage(td);
+
 		fio_mutex_down(stat_mutex);
 		if (td_read(td) && td->io_bytes[DDIR_READ]) {
 			elapsed = mtime_since_now(&td->start);
@@ -1556,6 +1565,11 @@ static void *thread_main(void *data)
 
 		do_verify(td, verify_bytes);
 
+		/*
+		 * See comment further up for why this is done here.
+		 */
+		check_update_rusage(td);
+
 		fio_mutex_down(stat_mutex);
 		td->ts.runtime[DDIR_READ] += mtime_since_now(&td->start);
 		fio_gettime(&td->start, NULL);
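
The ordering this patch establishes can be illustrated outside of fio with a small pthreads model. This is only a sketch under stated assumptions: the globals, check_update_rusage_sketch(), run_demo() and the "requested" semaphore are hypothetical stand-ins, not fio's actual definitions, and "requested" exists purely to force the interleaving from the report (the helper already holds the mutex when the worker arrives).

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical stand-ins for stat_mutex, td->rusage_sem and the
 * "please update rusage" request; not fio's actual definitions. */
static pthread_mutex_t stat_mutex = PTHREAD_MUTEX_INITIALIZER;
static sem_t rusage_sem;   /* upped by the worker once rusage is updated */
static sem_t requested;    /* demo scaffolding: forces the reported interleaving */
static int update_rusage;  /* set by the helper, cleared by the worker */
static int rusage_updates; /* counts how often the worker refreshed rusage */

/* Worker side: honor a pending request and wake the helper. */
static void check_update_rusage_sketch(void)
{
	if (update_rusage) {
		update_rusage = 0;
		rusage_updates++;
		sem_post(&rusage_sem);
	}
}

/* Helper (stat) thread: holds stat_mutex while waiting for the worker. */
static void *helper_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&stat_mutex);
	update_rusage = 1;
	sem_post(&requested);   /* tell the worker the mutex is now held */
	sem_wait(&rusage_sem);  /* blocks until the worker ups rusage_sem */
	pthread_mutex_unlock(&stat_mutex);
	return NULL;
}

/* Worker thread: update rusage first, only then wait for stat_mutex. */
static void *worker_thread(void *arg)
{
	(void)arg;
	sem_wait(&requested);          /* helper holds stat_mutex from here on */
	check_update_rusage_sketch();  /* without this call both threads block forever */
	pthread_mutex_lock(&stat_mutex);
	pthread_mutex_unlock(&stat_mutex);
	return NULL;
}

/* Runs both threads to completion; returns the number of rusage updates. */
int run_demo(void)
{
	pthread_t helper, worker;

	update_rusage = 0;
	rusage_updates = 0;
	sem_init(&rusage_sem, 0, 0);
	sem_init(&requested, 0, 0);
	pthread_create(&helper, NULL, helper_thread, NULL);
	pthread_create(&worker, NULL, worker_thread, NULL);
	pthread_join(helper, NULL);
	pthread_join(worker, NULL);
	assert(update_rusage == 0); /* the helper's request was honored */
	sem_destroy(&rusage_sem);
	sem_destroy(&requested);
	return rusage_updates;
}
```

Calling run_demo() from a main() built with -pthread on Linux returns once both threads have finished, with exactly one rusage update recorded. Removing the check_update_rusage_sketch() call in worker_thread() reproduces the hang described in the report: the helper sleeps on rusage_sem while holding stat_mutex, and the worker sleeps on stat_mutex.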