With zonemode=zbd, for a multi-job workload using asynchronous I/O
engines with a deep I/O queue depth setting, a job that is building a
batch of asynchronous I/Os to submit may end up waiting for an I/O
target zone lock held by another job that is also preparing a batch.
For small devices with few zones and/or a large number of jobs, such
prepare phase zone lock contention can be frequent enough to end up in a
situation where all jobs are waiting for zone locks held by other jobs
and no I/O being executed (so no zone being unlocked).
Avoid this situation by using pthread_mutex_trylock() instead of
pthread_mutex_lock() and by calling io_u_quiesce() to execute queued
I/O units if locking fails. pthread_mutex_lock() is then called to
lock the desired target zone. The execution of io_u_quiesce() forces
I/O execution progress and so zones to be unlocked, avoiding job
deadlock.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
zbd_check_swd(f);
- pthread_mutex_lock(&zb->mutex);
+ /*
+ * Lock the io_u target zone. The zone will be unlocked if io_u offset
+ * is changed or when io_u completes and zbd_put_io() executed.
+ * To avoid multiple jobs doing asynchronous I/Os from deadlocking each
+ * other waiting for zone locks when building an io_u batch, first
+ * only trylock the zone. If the zone is already locked by another job,
+ * process the currently queued I/Os so that I/O progress is made and
+ * zones unlocked.
+ */
+ if (pthread_mutex_trylock(&zb->mutex) != 0) {
+ if (!td_ioengine_flagged(td, FIO_SYNCIO))
+ io_u_quiesce(td);
+ pthread_mutex_lock(&zb->mutex);
+ }
+
switch (io_u->ddir) {
case DDIR_READ:
if (td->runstate == TD_VERIFYING) {