Btrfs: avoid unnecessary ordered extent cache resets
authorFilipe David Borba Manana <fdmanana@gmail.com>
Fri, 22 Nov 2013 18:54:58 +0000 (18:54 +0000)
committerChris Mason <clm@fb.com>
Tue, 28 Jan 2014 21:19:46 +0000 (13:19 -0800)
After an ordered extent completes, don't blindly reset the
inode's ordered tree last accessed ordered extent pointer.

While running the xfstests I noticed that about 29% of the
time the ordered extent to which tree->last pointed was not
the same as our just completed ordered extent. After that I
ran the following sysbench test (after a prepare phase) and
noticed that about 68% of the time tree->last pointed to
a different ordered extent too.

sysbench --test=fileio --file-num=32 --file-total-size=4G \
    --file-test-mode=rndwr --num-threads=512 \
    --file-block-size=32768 --max-time=60 --max-requests=0 run

Therefore reset tree->last on ordered extent removal only if
it pointed to the ordered extent we're removing from the tree.

Results from 4 runs of the following test before and after
applying this patch:

$ sysbench --test=fileio --file-num=32 --file-total-size=4G \
  --file-test-mode=seqwr --num-threads=512 \
  --file-block-size=32768 --max-time=60 --file-io-mode=sync prepare
$ sysbench --test=fileio --file-num=32 --file-total-size=4G \
  --file-test-mode=seqwr --num-threads=512 \
  --file-block-size=32768 --max-time=60 --file-io-mode=sync run

Before this path:

run 1 - 64.049Mb/sec
run 2 - 63.455Mb/sec
run 3 - 64.656Mb/sec
run 4 - 63.833Mb/sec

After this patch:

run 1 - 66.149Mb/sec
run 2 - 68.459Mb/sec
run 3 - 66.338Mb/sec
run 4 - 66.176Mb/sec

With random writes (--file-test-mode=rndwr) I had huge fluctuations
on the results (+- 35% easily).

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
fs/btrfs/ordered-data.c

index 69582d5b69d1f6064a77a409760a3ba1886b6d92..b8c2ded75fe26ff8251398d5953a0e71e6a8db09 100644 (file)
@@ -520,7 +520,8 @@ void btrfs_remove_ordered_extent(struct inode *inode,
        spin_lock_irq(&tree->lock);
        node = &entry->rb_node;
        rb_erase(node, &tree->tree);
-       tree->last = NULL;
+       if (tree->last == node)
+               tree->last = NULL;
        set_bit(BTRFS_ORDERED_COMPLETE, &entry->flags);
        spin_unlock_irq(&tree->lock);