Btrfs: fix file corruption after cloning inline extents
Using the clone ioctl (or extent_same ioctl, which calls the same extent
cloning function as well) we end up allowing copy an inline extent from
the source file into a non-zero offset of the destination file. This is
something not expected and that the btrfs code is not prepared to deal
with - all inline extents must be at a file offset equals to 0.
For example, the following excerpt of a test case for fstests triggers
a crash/BUG_ON() on a write operation after an inline extent is cloned
into a non-zero offset:
_scratch_mkfs >>$seqres.full 2>&1
_scratch_mount
# Create our test files. File foo has the same 2K of data at offset 4K
# as file bar has at its offset 0.
$XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
-c "pwrite -S 0xbb 4k 2K" \
-c "pwrite -S 0xcc 8K 4K" \
$SCRATCH_MNT/foo | _filter_xfs_io
# File bar consists of a single inline extent (2K size).
$XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
$SCRATCH_MNT/bar | _filter_xfs_io
# Now call the clone ioctl to clone the extent of file bar into file
# foo at its offset 4K. This made file foo have an inline extent at
# offset 4K, something which the btrfs code can not deal with in future
# IO operations because all inline extents are supposed to start at an
# offset of 0, resulting in all sorts of chaos.
# So here we validate that clone ioctl returns an EOPNOTSUPP, which is
# what it returns for other cases dealing with inlined extents.
$CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
$SCRATCH_MNT/bar $SCRATCH_MNT/foo
# Because of the inline extent at offset 4K, the following write made
# the kernel crash with a BUG_ON().
$XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
status=0
exit
The stack trace of the BUG_ON() triggered by the last write is:
[152154.035903] ------------[ cut here ]------------
[152154.036424] kernel BUG at mm/page-writeback.c:2286!
[152154.036424] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc acpi_cpu$
[152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: G W 4.1.0-rc6-btrfs-next-11+ #2
[152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[152154.036424] task:
ffff880429f70990 ti:
ffff880429efc000 task.ti:
ffff880429efc000
[152154.036424] RIP: 0010:[<
ffffffff8111a9d5>] [<
ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
[152154.036424] RSP: 0018:
ffff880429effc68 EFLAGS:
00010246
[152154.036424] RAX:
0200000000000806 RBX:
ffffea0006a6d8f0 RCX:
0000000000000001
[152154.036424] RDX:
0000000000000000 RSI:
ffffffff81155d1b RDI:
ffffea0006a6d8f0
[152154.036424] RBP:
ffff880429effc78 R08:
ffff8801ce389fe0 R09:
0000000000000001
[152154.036424] R10:
0000000000002000 R11:
ffffffffffffffff R12:
ffff8800200dce68
[152154.036424] R13:
0000000000000000 R14:
ffff8800200dcc88 R15:
ffff8803d5736d80
[152154.036424] FS:
00007fbf119f6700(0000) GS:
ffff88043d280000(0000) knlGS:
0000000000000000
[152154.036424] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[152154.036424] CR2:
0000000001bdc000 CR3:
00000003aa555000 CR4:
00000000000006e0
[152154.036424] Stack:
[152154.036424]
ffff8803d5736d80 0000000000000001 ffff880429effcd8 ffffffffa04e97c1
[152154.036424]
ffff880429effd68 ffff880429effd60 0000000000000001 ffff8800200dc9c8
[152154.036424]
0000000000000001 ffff8800200dcc88 0000000000000000 0000000000001000
[152154.036424] Call Trace:
[152154.036424] [<
ffffffffa04e97c1>] lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
[152154.036424] [<
ffffffffa04ea82c>] __btrfs_buffered_write+0x245/0x4c8 [btrfs]
[152154.036424] [<
ffffffffa04ed14b>] ? btrfs_file_write_iter+0x150/0x3e0 [btrfs]
[152154.036424] [<
ffffffffa04ed15a>] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
[152154.036424] [<
ffffffffa04ed2c7>] btrfs_file_write_iter+0x2cc/0x3e0 [btrfs]
[152154.036424] [<
ffffffff81165a4a>] __vfs_write+0x7c/0xa5
[152154.036424] [<
ffffffff81165f89>] vfs_write+0xa0/0xe4
[152154.036424] [<
ffffffff81166855>] SyS_pwrite64+0x64/0x82
[152154.036424] [<
ffffffff81465197>] system_call_fastpath+0x12/0x6f
[152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 59 49 8b 3c 2$
[152154.036424] RIP [<
ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
[152154.036424] RSP <
ffff880429effc68>
[152154.242621] ---[ end trace
e3d3376b23a57041 ]---
Fix this by returning the error EOPNOTSUPP if an attempt to copy an
inline extent into a non-zero offset happens, just like what is done for
other scenarios that would require copying/splitting inline extents,
which were introduced by the following commits:
00fdf13a2e9f ("Btrfs: fix a crash of clone with inline extents's split")
3f9e3df8da3c ("btrfs: replace error code from btrfs_drop_extents")
Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>