From: Filipe Manana Date: Tue, 22 Apr 2025 15:21:25 +0000 (+0100) Subject: btrfs: make extent unpinning more efficient when committing transaction X-Git-Tag: block-6.16-20250606~42^2~55 X-Git-Url: https://git.kernel.dk/?a=commitdiff_plain;h=66864101d1b76f7f3417274a954b160e81903a94;p=linux-block.git btrfs: make extent unpinning more efficient when committing transaction At btrfs_finish_extent_commit() we have this loop that keeps finding an extent range to unpin in the transaction's pinned_extents io tree, caches the extent state and then passes that cached extent state to btrfs_clear_extent_dirty(), which will free that extent state since we clear the only bit it can have set. So on each loop iteration we do a full io tree search and the cached state is used only to avoid having a tree search done by btrfs_clear_extent_dirty(). During the lifetime of a transaction we can pin many thousands of extents, resulting in a large and deep rb tree that backs the io tree. For example, for the following fs_mark run on a 12 cores boxes: $ cat test.sh #!/bin/bash DEV=/dev/nullb0 MNT=/mnt/nullb0 FILES=100000 THREADS=$(nproc --all) echo "performance" | \ tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor mkfs.btrfs -f $DEV mount $DEV $MNT OPTS="-S 0 -L 8 -n $FILES -s 0 -t $THREADS -k" for ((i = 1; i <= $THREADS; i++)); do OPTS="$OPTS -d $MNT/d$i" done fs_mark $OPTS umount $MNT an histogram for the number of ranges (elements) in the pinned extents io tree of a transaction was the following: Count: 76 Range: 5440.000 - 51088.000; Mean: 27354.368; Median: 28312.000; Stddev: 9800.767 Percentiles: 90th: 40486.000; 95th: 43322.000; 99th: 51088.000 5440.000 - 6805.809: 1 ### 6805.809 - 10652.034: 1 ### 10652.034 - 13326.178: 3 ######## 13326.178 - 16671.590: 8 ###################### 16671.590 - 20856.773: 7 #################### 20856.773 - 26092.528: 13 #################################### 26092.528 - 32642.571: 19 ##################################################### 32642.571 - 40836.818: 17 ############################################### 40836.818 - 51088.000: 7 #################### We can improve on this by grabbing the next state before calling btrfs_clear_extent_dirty(), avoiding a full tree search on the next iteration which always has an O(log n) complexity while grabbing the next element (rb_next() rbtree operation) is in the worst case O(log n) too, but very often much less than that, making it more efficient. Here follow histograms for the execution times, in nanoseconds, of btrfs_finish_extent_commit() before and after applying this patch and all the other patches in the same patchset. Before patchset: Count: 32 Range: 3925691.000 - 269990635.000; Mean: 133814526.906; Median: 122758052.000; Stddev: 65776550.375 Percentiles: 90th: 228672087.000; 95th: 265187000.000; 99th: 269990635.000 3925691.000 - 5993208.660: 1 #### 5993208.660 - 75878537.656: 4 ################## 75878537.656 - 115840974.514: 12 ##################################################### 115840974.514 - 176850157.761: 6 ########################### 176850157.761 - 269990635.000: 9 ######################################## After patchset: Count: 32 Range: 1849393.000 - 231491064.000; Mean: 126978584.625; Median: 123732897.000; Stddev: 58007821.806 Percentiles: 90th: 203055491.000; 95th: 219952699.000; 99th: 231491064.000 1849393.000 - 2997642.092: 1 #### 2997642.092 - 88111637.071: 9 ##################################### 88111637.071 - 142818264.414: 9 ##################################### 142818264.414 - 231491064.000: 13 ##################################################### Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba --- diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c index 9266bb39c619..b1b96eb5f64e 100644 --- a/fs/btrfs/extent-io-tree.c +++ b/fs/btrfs/extent-io-tree.c @@ -1921,6 +1921,26 @@ int btrfs_lock_extent_bits(struct extent_io_tree *tree, u64 start, u64 end, u32 return err; } +/* + * Get the extent state that follows the given extent state. + * This is meant to be used in a context where we know no other tasks can + * concurrently modify the tree. + */ +struct extent_state *btrfs_next_extent_state(struct extent_io_tree *tree, + struct extent_state *state) +{ + struct extent_state *next; + + spin_lock(&tree->lock); + ASSERT(extent_state_in_tree(state)); + next = next_state(state); + if (next) + refcount_inc(&next->refs); + spin_unlock(&tree->lock); + + return next; +} + void __cold btrfs_extent_state_free_cachep(void) { btrfs_extent_state_leak_debug_check(); diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h index d8c01d857667..0a18ca9c59c3 100644 --- a/fs/btrfs/extent-io-tree.h +++ b/fs/btrfs/extent-io-tree.h @@ -243,4 +243,7 @@ static inline int btrfs_unlock_dio_extent(struct extent_io_tree *tree, u64 start cached, NULL); } +struct extent_state *btrfs_next_extent_state(struct extent_io_tree *tree, + struct extent_state *state); + #endif /* BTRFS_EXTENT_IO_TREE_H */ diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 7e87ef1bcb52..527bffab75e5 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2818,28 +2818,24 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans) struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_block_group *block_group, *tmp; struct list_head *deleted_bgs; - struct extent_io_tree *unpin; + struct extent_io_tree *unpin = &trans->transaction->pinned_extents; + struct extent_state *cached_state = NULL; u64 start; u64 end; int unpin_error = 0; int ret; - unpin = &trans->transaction->pinned_extents; + mutex_lock(&fs_info->unused_bg_unpin_mutex); + btrfs_find_first_extent_bit(unpin, 0, &start, &end, EXTENT_DIRTY, &cached_state); - while (!TRANS_ABORTED(trans)) { - struct extent_state *cached_state = NULL; - - mutex_lock(&fs_info->unused_bg_unpin_mutex); - if (!btrfs_find_first_extent_bit(unpin, 0, &start, &end, - EXTENT_DIRTY, &cached_state)) { - mutex_unlock(&fs_info->unused_bg_unpin_mutex); - break; - } + while (!TRANS_ABORTED(trans) && cached_state) { + struct extent_state *next_state; if (btrfs_test_opt(fs_info, DISCARD_SYNC)) ret = btrfs_discard_extent(fs_info, start, end + 1 - start, NULL); + next_state = btrfs_next_extent_state(unpin, cached_state); btrfs_clear_extent_dirty(unpin, start, end, &cached_state); ret = unpin_extent_range(fs_info, start, end, true); /* @@ -2858,10 +2854,27 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans) if (!unpin_error) unpin_error = ret; } - mutex_unlock(&fs_info->unused_bg_unpin_mutex); + btrfs_free_extent_state(cached_state); - cond_resched(); + + if (need_resched()) { + btrfs_free_extent_state(next_state); + mutex_unlock(&fs_info->unused_bg_unpin_mutex); + cond_resched(); + cached_state = NULL; + mutex_lock(&fs_info->unused_bg_unpin_mutex); + btrfs_find_first_extent_bit(unpin, 0, &start, &end, + EXTENT_DIRTY, &cached_state); + } else { + cached_state = next_state; + if (cached_state) { + start = cached_state->start; + end = cached_state->end; + } + } } + mutex_unlock(&fs_info->unused_bg_unpin_mutex); + btrfs_free_extent_state(cached_state); if (btrfs_test_opt(fs_info, DISCARD_ASYNC)) { btrfs_discard_calc_delay(&fs_info->discard_ctl);