Even with the fix that checks whether rusage needs updating right
before we take the stat_mutex lock, we can still deadlock. The
sequence looks like this:

    helper_thread                   job
    lock(stat_mutex);
                                    lock(stat_mutex);
    down(td->rusage_sem);

Now the two threads are stuck in an ABBA deadlock: the helper
thread holds the stat_mutex and waits for the job to update its
rusage, while the job is blocked waiting for the stat_mutex.

Fix this by doing a trylock on the stat_mutex. If the trylock
fails, update rusage if the helper has requested it, sleep for
1 ms, and try again.

Signed-off-by: Jens Axboe <axboe@fb.com>
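
The trylock-and-retry pattern can be modeled outside fio with plain POSIX threads. The sketch below is a hypothetical, self-contained model, not fio code: a `pthread_mutex_t` stands in for `stat_mutex`, a POSIX `sem_t` for `td->rusage_sem`, and `demo_ctx`, `service_rusage_request`, `job_thread`, and `run_demo` are made-up names. The helper's wait is bounded by a 200 ms `sem_timedwait` so that the deadlocking variant terminates, and an extra semaphore forces the bad interleaving shown in the table above.

```c
/* Hypothetical model of the helper/job interaction; not fio code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

struct demo_ctx {
	pthread_mutex_t stat_mutex; /* stands in for fio's stat_mutex */
	sem_t rusage_sem;           /* stands in for td->rusage_sem */
	sem_t job_checked;          /* forces the bad interleaving */
	atomic_int update_rusage;   /* helper sets this to request an update */
	bool use_trylock;           /* false: old code, true: trylock + retry */
};

/* Hypothetical analogue of check_update_rusage(): if an update was
 * requested, "update" rusage and wake the waiting helper. */
static void service_rusage_request(struct demo_ctx *c)
{
	if (atomic_exchange(&c->update_rusage, 0))
		sem_post(&c->rusage_sem);
}

static void *job_thread(void *arg)
{
	struct demo_ctx *c = arg;
	const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000L };

	service_rusage_request(c);  /* nothing requested yet */
	sem_post(&c->job_checked);  /* now let the helper make its request */

	if (c->use_trylock) {
		/* The fix: keep servicing requests until we own the mutex. */
		while (pthread_mutex_trylock(&c->stat_mutex) != 0) {
			service_rusage_request(c);
			nanosleep(&one_ms, NULL);
		}
	} else {
		/* Old code: blocks while the helper holds stat_mutex. */
		pthread_mutex_lock(&c->stat_mutex);
	}
	pthread_mutex_unlock(&c->stat_mutex);
	return NULL;
}

/* Plays the helper thread. Returns true if the wait on rusage_sem timed
 * out, i.e. the job never serviced the request (the ABBA deadlock). */
static bool run_demo(bool use_trylock)
{
	struct demo_ctx c = { .use_trylock = use_trylock };
	struct timespec deadline;
	pthread_t job;
	int rc;

	rc = pthread_mutex_init(&c.stat_mutex, NULL);
	assert(rc == 0);
	sem_init(&c.rusage_sem, 0, 0);
	sem_init(&c.job_checked, 0, 0);
	atomic_init(&c.update_rusage, 0);

	pthread_mutex_lock(&c.stat_mutex);     /* helper: lock(stat_mutex) */
	rc = pthread_create(&job, NULL, job_thread, &c);
	assert(rc == 0);
	while (sem_wait(&c.job_checked) != 0)  /* retry on EINTR */
		;
	atomic_store(&c.update_rusage, 1);     /* request an rusage update */

	/* helper: down(td->rusage_sem), bounded to 200 ms so the demo ends */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += 200000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}
	while ((rc = sem_timedwait(&c.rusage_sem, &deadline)) != 0 && errno == EINTR)
		;

	pthread_mutex_unlock(&c.stat_mutex);   /* let the job finish */
	pthread_join(job, NULL);
	sem_destroy(&c.rusage_sem);
	sem_destroy(&c.job_checked);
	pthread_mutex_destroy(&c.stat_mutex);
	return rc != 0;
}
```

In this model, `run_demo(false)` mirrors the old plain-lock code and reports a timeout, while `run_demo(true)` mirrors the fix and does not, because the job keeps servicing the helper's request while its trylock fails. The 1 ms sleep matches the patch's `usleep(1000)`; `nanosleep` is used here only because `usleep` is not declared under strict POSIX.1-2008.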

 	 * the rusage_sem, which would never get upped because
 	 * this thread is waiting for the stat mutex.
 	 */
-	check_update_rusage(td);
+	do {
+		check_update_rusage(td);
+		if (!fio_mutex_down_trylock(stat_mutex))
+			break;
+		usleep(1000);
+	} while (1);
-	fio_mutex_down(stat_mutex);
 	if (td_read(td) && td->io_bytes[DDIR_READ])
 		update_runtime(td, elapsed_us, DDIR_READ);
 	if (td_write(td) && td->io_bytes[DDIR_WRITE])