After commmit
e44163e17796 ("btrfs: explictly delete unused block groups
in close_ctree and ro-remount"), added in the 4.3 merge window, we have
calls to btrfs_delete_unused_bgs() while holding the cleaner_mutex.
This can cause a deadlock with a concurrent block group relocation (when
a filesystem balance or shrink operation is in progress for example)
because btrfs_delete_unused_bgs() locks delete_unused_bgs_mutex and the
relocation path locks first delete_unused_bgs_mutex and then it locks
cleaner_mutex, resulting in a classic ABBA deadlock:
CPU 0 CPU 1
lock fs_info->cleaner_mutex
__btrfs_balance() || btrfs_shrink_device()
lock fs_info->delete_unused_bgs_mutex
btrfs_relocate_chunk()
btrfs_relocate_block_group()
lock fs_info->cleaner_mutex
btrfs_delete_unused_bgs()
lock fs_info->delete_unused_bgs_mutex
Fix this by not taking the cleaner_mutex before calling
btrfs_delete_unused_bgs() because it's no longer needed after
commit
67c5e7d464bc ("Btrfs: fix race between balance and unused block
group deletion"). The mutex fs_info->delete_unused_bgs_mutex, the
spinlock fs_info->unused_bgs_lock and a block group's spinlock are
enough to get correct serialization between tasks running relocation
and unused block group deletion (as well as between multiple tasks
concurrently calling btrfs_delete_unused_bgs()).
This issue was discussed (in the mailing list) during the review of
the patch titled "btrfs: explictly delete unused block groups in
close_ctree and ro-remount" and it was agreed that acquiring the
cleaner mutex had to be dropped after the patch titled
"Btrfs: fix race between balance and unused block group deletion"
got merged (both patches were submitted at about the same time, but
one landed in kernel 4.2 and the other in the 4.3 merge window).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
* block groups queued for removal, the deletion will be
* skipped when we quit the cleaner thread.
*/
- mutex_lock(&root->fs_info->cleaner_mutex);
btrfs_delete_unused_bgs(root->fs_info);
- mutex_unlock(&root->fs_info->cleaner_mutex);
ret = btrfs_commit_super(root);
if (ret)
* groups on disk until we're mounted read-write again
* unless we clean them up here.
*/
- mutex_lock(&root->fs_info->cleaner_mutex);
btrfs_delete_unused_bgs(fs_info);
- mutex_unlock(&root->fs_info->cleaner_mutex);
btrfs_dev_replace_suspend_for_unmount(fs_info);
btrfs_scrub_cancel(fs_info);