Filipe Manana [Thu, 10 Apr 2025 11:59:05 +0000 (12:59 +0100)]
btrfs: add missing error return to btrfs_clear_extent_bit_changeset()
We have a couple error branches where we have an error stored in the 'err'
variable and then jump to the 'out' label, however we don't return that
error, we just return 0. Normally this is not a problem since those error
branches call extent_io_tree_panic() which triggers a BUG() call, however
it's possible to have a rather exotic kernel config with CONFIG_BUG disabled,
in which case the BUG() call does nothing and we fall through. So make sure
to return the error, not just to fix that exotic case but also to make the
code less confusing. While at it also rename the 'err' variable to 'ret'
since this is the style we prefer and use more widely.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
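Note on the change above: the pattern can be sketched in userspace C as follows. This is only an illustration under stated assumptions, not the kernel code: fake_tree_panic() and fake_step() are hypothetical stand-ins, and the panic helper is written as a no-op to model a build where BUG() does nothing.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for extent_io_tree_panic(): modelled as a no-op,
 * as if BUG() had been compiled out. */
static void fake_tree_panic(int err)
{
	(void)err;
}

/* Hypothetical step that can fail. */
static int fake_step(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* Before: the error is stored in 'err' but the function still returns 0. */
static int clear_bits_before(int should_fail)
{
	int err;

	err = fake_step(should_fail);
	if (err) {
		fake_tree_panic(err);
		goto out;
	}
out:
	return 0;
}

/* After: the variable is named 'ret' and it is what gets returned. */
static int clear_bits_after(int should_fail)
{
	int ret;

	ret = fake_step(should_fail);
	if (ret) {
		fake_tree_panic(ret);
		goto out;
	}
out:
	return ret;
}
```

With the no-op panic helper, clear_bits_before(1) reports success while clear_bits_after(1) propagates -ENOMEM to the caller.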
Filipe Manana [Wed, 9 Apr 2025 15:17:16 +0000 (16:17 +0100)]
btrfs: exit after state split error at btrfs_clear_extent_bit_changeset()
If split_state() returned an error we call extent_io_tree_panic() which
will trigger a BUG() call. However if CONFIG_BUG is disabled, which is an
uncommon and exotic scenario, then we fall through and hit a use-after-free
when calling clear_state_bit() since the extent state record which the
local variable 'prealloc' points to was freed by split_state().
So jump to the label 'out' after calling extent_io_tree_panic() and set
the 'prealloc' pointer to NULL since split_state() has already freed it
when it hit an error.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
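Note on the change above: a minimal userspace sketch of the "forget the freed pointer and bail out" pattern. The names fake_split() and clear_range() are hypothetical, and ownership is simplified: here the caller keeps ownership of 'prealloc' on success, which is an assumption of the sketch and not a statement about split_state().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct state {
	int bits;
};

/* Hypothetical split helper: on error it frees 'prealloc', mirroring what
 * the commit message says split_state() does on failure. */
static int fake_split(struct state *prealloc, int should_fail)
{
	if (should_fail) {
		free(prealloc);
		return -EEXIST;
	}
	return 0;
}

/* Returns 0 or a negative errno and never dereferences 'prealloc' after a
 * failed split. */
static int clear_range(int should_fail)
{
	struct state *prealloc = malloc(sizeof(*prealloc));
	int ret;

	if (!prealloc)
		return -ENOMEM;

	ret = fake_split(prealloc, should_fail);
	if (ret) {
		/* Already freed by fake_split(): drop the stale pointer. */
		prealloc = NULL;
		goto out;
	}
	prealloc->bits = 0;	/* only reached when the split succeeded */
out:
	free(prealloc);		/* free(NULL) is a no-op */
	return ret;
}
```

Without the jump to 'out', the write to prealloc->bits would touch freed memory whenever the panic helper returns.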
Filipe Manana [Wed, 9 Apr 2025 14:27:35 +0000 (15:27 +0100)]
btrfs: remove duplicate error check at btrfs_clear_extent_bit_changeset()
There's no need to check if split_state() returned an error twice, instead
unify into a single if statement after setting 'prealloc' to NULL, because
on error split_state() frees the 'prealloc' extent state record.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Mon, 28 Apr 2025 00:46:19 +0000 (10:16 +0930)]
btrfs: get rid of btrfs_read_dev_super()
The function was introduced by commit a512bbf855ff ("Btrfs: superblock
duplication") at the beginning of btrfs.
It left a comment saying we'd need a special mount option to read all
super blocks, but that option has never been implemented and there was no
need/request for it. The check/rescue tools are able to start from a
specific copy and use it as primary eventually.
This means btrfs_read_dev_super() is always reading the first super
block, making all the code finding the latest super block unnecessary.
Just remove that function and replace all call sites with
btrfs_read_disk_super(bdev, 0, false).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Mon, 28 Apr 2025 00:36:50 +0000 (10:06 +0930)]
btrfs: merge btrfs_read_dev_one_super() into btrfs_read_disk_super()
We have two functions to read a super block from a block device:
- btrfs_read_dev_one_super()
Exported from disk-io.c
- btrfs_read_disk_super()
Local to volumes.c
And they have some minor differences:
- btrfs_read_dev_one_super() uses @copy_num
Meanwhile btrfs_read_disk_super() relies on the physical and expected
bytenr passed from the caller.
The parameter list of btrfs_read_dev_one_super() is more user
friendly.
- btrfs_read_disk_super() makes sure the label is NUL terminated
We do not need two different functions doing the same job, so merge the
behavior into btrfs_read_disk_super() by:
- Remove btrfs_read_dev_one_super()
- Export btrfs_read_disk_super()
The name pairs with btrfs_release_disk_super() perfectly.
- Change the parameter list of btrfs_read_disk_super() to mimic
btrfs_read_dev_one_super()
All existing callers are calculating the physical address and expect
bytenr before calling btrfs_read_disk_super() already.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
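Note on the change above: the appeal of a @copy_num based interface is that the caller no longer computes byte offsets itself. A small sketch of the mapping from copy number to byte offset follows. The constants (primary copy at 64KiB, mirrors at 16KiB << (12 * copy_num), at most 3 copies) are reproduced from my understanding of the btrfs on-disk format and should be treated as assumptions here; sb_offset() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define SB_PRIMARY_OFFSET	(64ULL * 1024)	/* assumed: primary copy at 64KiB */
#define SB_MIRROR_SHIFT		12		/* assumed mirror shift */
#define SB_MIRROR_MAX		3		/* assumed maximum number of copies */

/* Map a super block copy number to its byte offset on the device. */
static uint64_t sb_offset(int copy_num)
{
	assert(copy_num >= 0 && copy_num < SB_MIRROR_MAX);

	if (copy_num == 0)
		return SB_PRIMARY_OFFSET;
	return (16ULL * 1024) << (SB_MIRROR_SHIFT * copy_num);
}
```

Under these assumptions copy 0 sits at 64KiB, copy 1 at 64MiB and copy 2 at 256GiB.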
Daniel Vacek [Fri, 25 Apr 2025 07:23:57 +0000 (09:23 +0200)]
btrfs: get rid of goto in alloc_test_extent_buffer()
The `free_eb` label is used only once. Simplify by moving the code in place.
Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Mon, 28 Apr 2025 14:52:57 +0000 (10:52 -0400)]
btrfs: use buffer xarray for extent buffer writeback operations
Currently we have this ugly back and forth with the btree writeback
where we find the folio, find the eb associated with that folio, and
then attempt to writeback. This results in two different paths for
subpage ebs and >= page size ebs.
Clean this up by adding our own infrastructure around looking up tagged
ebs and writing the ebs out directly. This allows us to unify the
subpage and >= pagesize IO paths, resulting in a much cleaner writeback
path for extent buffers.
I ran this through fsperf on a VM with 8 CPUs and 16GiB of RAM. I used
smallfiles100k, but reduced the files to 1k to make it run faster. The
results are as follows, with the statistically significant improvements
marked with *; there were no regressions. fsperf was run with -n 10 for
both runs, so the baseline is the average of 10 runs and the test is the
average of 10 runs.
smallfiles100k results
metric                   baseline       current        stdev      diff
================================================================================
avg_commit_ms               68.58         58.44         3.35   -14.79% *
commits                    270.60        254.70        16.24    -5.88%
dev_read_iops                  48            48            0     0.00%
dev_read_kbytes              1044          1044            0     0.00%
dev_write_iops          866117.90     850028.10     14292.20    -1.86%
dev_write_kbytes      10939976.40   10605701.20    351330.32    -3.06%
elapsed                     49.30            33         1.64   -33.06% *
end_state_mount_ns    41251498.80   35773220.70   2531205.32   -13.28% *
end_state_umount_ns      1.90e+09      1.50e+09  14186226.85   -21.38% *
max_commit_ms                 139        111.60         9.72   -19.71% *
sys_cpu                      4.90          3.86         0.88   -21.29%
write_bw_bytes        42935768.20   64318451.10   1609415.05    49.80% *
write_clat_ns_mean      366431.69     243202.60     14161.98   -33.63% *
write_clat_ns_p50        49203.20         20992       264.40   -57.34% *
write_clat_ns_p99          827392     653721.60     65904.74   -20.99% *
write_io_kbytes           2035940       2035940            0     0.00%
write_iops               10482.37      15702.75       392.92    49.80% *
write_lat_ns_max         1.01e+08      90516129   3910102.06   -10.29% *
write_lat_ns_mean       366556.19     243308.48     14154.51   -33.62% *
As you can see we get about a 33% decrease in runtime, with a 50%
throughput increase, which is pretty significant.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Mon, 28 Apr 2025 14:52:56 +0000 (10:52 -0400)]
btrfs: set DIRTY and WRITEBACK tags on the buffer_tree
In preparation for changing how we do writeout of extent buffers, start
tagging the extent buffer xarray with DIRTY and WRITEBACK to make it
easier to find extent buffers that are in either state.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Mon, 28 Apr 2025 14:52:55 +0000 (10:52 -0400)]
btrfs: convert the buffer_radix to an xarray
In order to fully utilize xarray tagging to improve writeback we need to
convert the buffer_radix to a proper xarray. This conversion is
relatively straightforward as the radix code uses the xarray underneath.
Using xarray directly allows for quite a lot less code.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 29 Apr 2025 10:56:41 +0000 (12:56 +0200)]
btrfs: rename btrfs_discard workqueue to btrfs-discard
We use the "btrfs-" prefix for our workqueues, the discard has
underscore instead of dash, so unify it.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 06:29:14 +0000 (08:29 +0200)]
btrfs: on unknown chunk allocation policy fallback to regular
We have only two chunk allocation policies right now and the
switch/cases don't handle an unknown one properly. The error is in the
impossible category (the policy is stored only in memory), so we don't
have to BUG(); falling back to the regular policy should be safe.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 16:53:59 +0000 (18:53 +0200)]
btrfs: reformat comments in acls_after_inode_item()
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 16:53:58 +0000 (18:53 +0200)]
btrfs: switch int dev_replace_is_ongoing variables/parameters to bool
Both the variable and the parameter are used as logical indicators so
convert them to bool.
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 16:53:57 +0000 (18:53 +0200)]
btrfs: trivial conversion to return bool instead of int
Old code has a lot of int for bool return values, bool is recommended
and done in new code. Convert the trivial cases that do simple 0/false
and 1/true. Function comments are updated if needed.
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Wed, 23 Apr 2025 07:06:14 +0000 (16:36 +0930)]
btrfs: subpage: reject tree blocks which are not nodesize aligned
When btrfs subpage support (fs block < page size) was introduced, a
subpage filesystem would only reject tree blocks which cross page
boundaries.
This used to be a compromise to simplify the tree block handling and
still allowing subpage cases to read some old converted filesystems
which did not have proper chunk alignment.
But in practice, suppose we have the following unaligned tree block on a
64K page sized system:
0 32K 44K 60K 64K
| |///////////////| |
Although btrfs has no problem reading the tree block at [44K, 60K), if
extent allocator is allocating another tree block, it may choose the
range [60K, 74K), as extent allocator has no awareness if it's a subpage
metadata request or not.
Then we'd get -EINVAL from the following sequence:
btrfs_alloc_tree_block()
|- btrfs_reserve_extent()
|  Which returned range [60K, 74K)
|- btrfs_init_new_buffer()
   |- btrfs_find_create_tree_block()
      |- alloc_extent_buffer()
         |- check_eb_alignment()
            Which returned -EINVAL, because the range crosses page
            boundary.
This situation will not fix itself and will most likely end with the fs
being marked read-only.
Thankfully we didn't really get such reports in the real world because:
- The original unaligned tree block is only caused by older
btrfs-convert
That is, versions from before the btrfs-convert rework done in v4.6,
where a converted btrfs filesystem could have metadata block groups
aligned to neither nodesize nor stripe size (64K).
Since btrfs-progs v4.6, all allocated chunks are stripe (64K) aligned,
so the problem no longer occurs.
Considering how old the fix is (v4.6 was released almost 10 years ago)
and that subpage support for btrfs was only introduced in v5.15, it
should be safe to reject those unaligned tree blocks.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
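Note on the change above: the difference between the old check (reject only tree blocks crossing a page boundary) and the new one (reject anything not nodesize aligned) can be sketched as below. This is an illustration only; crosses_page() and nodesize_aligned() are hypothetical helpers, and the 16K nodesize is inferred from the [44K, 60K) example in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KiB(x) ((uint64_t)(x) * 1024)

/* Old subpage rule: reject only if the tree block straddles two pages. */
static bool crosses_page(uint64_t start, uint32_t nodesize, uint32_t page_size)
{
	return start / page_size != (start + nodesize - 1) / page_size;
}

/* New rule: the tree block must start on a nodesize boundary. */
static bool nodesize_aligned(uint64_t start, uint32_t nodesize)
{
	/* nodesize is assumed to be a non-zero power of two */
	assert(nodesize && (nodesize & (nodesize - 1)) == 0);
	return (start & (nodesize - 1)) == 0;
}
```

The block at 44K passes the old check on a 64K page system because it stays inside one page, yet it is not nodesize aligned, so the new check rejects it before the allocator can hand out a neighbour that does cross a page.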
Daniel Vacek [Wed, 23 Apr 2025 08:51:22 +0000 (10:51 +0200)]
btrfs: move folio initialization to one place in attach_eb_folio_to_filemap()
This is just a trivial change. The code looks a bit more readable this way, IMO.
Move initialization of existing_folio to the beginning of the retry loop
so it's set to NULL at one place.
Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 15:57:24 +0000 (17:57 +0200)]
btrfs: raid56: rename parameter err to status in endio helpers
Trivial renames to unify the naming of blk_status_t variables/parameters.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 15:57:23 +0000 (17:57 +0200)]
btrfs: change return type of btrfs_alloc_dummy_sum() to int
The type blk_status_t is from block layer and not related to checksums
in our context. Use int internally and do the conversions to blk_status_t
as needed in btrfs_submit_chunk().
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 15:57:22 +0000 (17:57 +0200)]
btrfs: rename ret2 to ret in btrfs_submit_compressed_read()
We can now rename 'ret2' to 'ret' and use it for generic errors.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 15:57:21 +0000 (17:57 +0200)]
btrfs: rename ret to status in btrfs_submit_compressed_read()
We're using 'status' for the blk_status_t variables, rename 'ret' so we can
use it for generic errors.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 15:57:20 +0000 (17:57 +0200)]
btrfs: simplify reading bio status in end_compressed_writeback()
We don't need to have a separate variable to read the bio status, 'ret'
works for that just fine so remove 'error'.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 15:57:19 +0000 (17:57 +0200)]
btrfs: rename error to ret in btrfs_submit_chunk()
We can now rename 'error' to 'ret' and use it for generic errors.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 15:57:18 +0000 (17:57 +0200)]
btrfs: rename ret to status in btrfs_submit_chunk()
We're using 'status' for the blk_status_t variables, rename 'ret' so we
can use it for proper return type.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 15:57:17 +0000 (17:57 +0200)]
btrfs: change return type of btrfs_bio_csum() to int
The type blk_status_t is from block layer and not related to checksums
in our context. Use int internally and do the conversions to blk_status_t
as needed.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 15:57:16 +0000 (17:57 +0200)]
btrfs: change return type of btree_csum_one_bio() to int
The type blk_status_t is from block layer and not related to checksums
in our context. Use int internally and do the conversions to blk_status_t
as needed in btrfs_bio_csum().
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 15:57:15 +0000 (17:57 +0200)]
btrfs: change return type of btrfs_csum_one_bio() to int
The type blk_status_t is from block layer and not related to checksums
in our context. Use int internally and do the conversions to blk_status_t
as needed in btrfs_bio_csum().
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 15:57:14 +0000 (17:57 +0200)]
btrfs: change return type of btrfs_lookup_bio_sums() to int
The type blk_status_t is from block layer and not related to checksums
in our context. Use int internally and do the conversions to blk_status_t
as needed in btrfs_submit_chunk().
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 15:57:13 +0000 (17:57 +0200)]
btrfs: drop redundant local variable in raid_wait_write_end_io()
The bio status is read only once, no variable needed for that.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 23 Apr 2025 07:30:48 +0000 (09:30 +0200)]
btrfs: merge __setup_root() to btrfs_alloc_root()
There's only one caller of __setup_root() so merge it there.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 22 Apr 2025 15:55:41 +0000 (17:55 +0200)]
btrfs: use unsigned types for constants defined as bit shifts
The unsigned type is a recommended practice (CWE-190, CWE-194) for bit
shifts to avoid problems with potential unwanted sign extensions.
Although there are no such cases in btrfs codebase, follow the
recommendation.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
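Note on the change above: the sign extension hazard the recommendation guards against can be shown with a small userspace sketch. The helper names are hypothetical. INT32_MIN is used to represent "bit 31 set in a signed 32-bit value", since writing 1 << 31 with a signed int is itself not well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Signed flag with bit 31 set, then widened to 64 bits: the sign bit is
 * copied into the upper 32 bits. */
static uint64_t widen_signed_flag(void)
{
	int32_t flag = INT32_MIN;	/* bit pattern 0x80000000 */

	return (uint64_t)flag;
}

/* Unsigned flag with bit 31 set, then widened: upper bits stay zero. */
static uint64_t widen_unsigned_flag(void)
{
	uint32_t flag = 1U << 31;

	return (uint64_t)flag;
}
```

The signed variant widens to 0xffffffff80000000 while the unsigned one stays 0x80000000, which is why constants built from shifts are safer as 1U << n or 1ULL << n.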
David Sterba [Tue, 22 Apr 2025 15:32:17 +0000 (17:32 +0200)]
btrfs: remove unused btrfs_io_stripe::length
First added (but not effectively used) in 02c372e1f016e5 ("btrfs: add
support for inserting raid stripe extents"). The structure is
initialized to zeros so the only use in btrfs_insert_one_raid_extent()
    u64 length = bioc->stripes[i].length;
    struct btrfs_raid_stride *raid_stride = &stripe_extent->strides[i];
    if (length == 0)
        length = bioc->size;
the 'if' always happens.
Last use in 4016358e852861 ("btrfs: remove unused variable length in
btrfs_insert_one_raid_extent()") was an obvious cleanup. It seems to be
safe to remove, raid-stripe-tree works without using it since 6.6.
This was found by tool https://github.com/jirislaby/clang-struct .
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 22 Apr 2025 16:21:51 +0000 (18:21 +0200)]
btrfs: use list_first_entry() everywhere
Using the helper makes it a bit more clear that we're accessing the
first list entry.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 17 Apr 2025 09:17:03 +0000 (11:17 +0200)]
btrfs: convert ASSERT(0) with handled errors to DEBUG_WARN()
The use of ASSERT(0) is maybe useful for some cases but more like a
notice for developers. Assertions can be compiled in independently so
convert it to a debugging helper.
The difference is that it's just a warning and will not end up in BUG().
The converted cases are in connection with proper error handling.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 17 Apr 2025 09:17:02 +0000 (11:17 +0200)]
btrfs: convert WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)) to DEBUG_WARN
Use the conditional warning instead of typing the whole condition.
Optional message is printed where it seems clear what could be the
problem.
Conversion is left out in btree_csum_one_bio() because of the additional
condition.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 17 Apr 2025 09:17:01 +0000 (11:17 +0200)]
btrfs: add debug build only WARN
Add conditional WARN() wrapper that's enabled only in debug build. It
should be used for unexpected conditions that should be noisy. Use it
instead of ASSERT(0). As it will not lead to BUG() make sure that
continuing is still possible, e.g. the error is handled anyway.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 17 Apr 2025 09:17:00 +0000 (11:17 +0200)]
btrfs: use verbose ASSERT() in volumes.c
The file volumes.c has about 40 assertions and half of them are suitable
for ASSERT() with additional data.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 17 Apr 2025 09:16:59 +0000 (11:16 +0200)]
btrfs: enhance ASSERT() to take optional format string
Currently ASSERT() prints the stringified condition and without macro
expansions so simple constants like BTRFS_MAX_METADATA_BLOCKSIZE remain
readable in the output.
There are expressions where we'd like to see the exact values but all we
get is something like:
assertion failed: em->start <= start && start < extent_map_end(em), in fs/btrfs/extent_map.c:613
It would be nice to be able to print any additional information to help
understand the problem. With some preprocessor magic and compile-time
optimizations we can enhance ASSERT to work like that as well:
ASSERT(value > limit, "value=%llu limit=%llu", value, limit);
with free-form printk arguments that will be part of the assertion
message.
Pros:
- helps debugging and understanding reported problems
- the optional format is verified at compile-time
Cons:
- increases the .ko size
- writing the assertion code is repetitive (condition, format, values)
- format and variable type must match (extra lookup)
- needs gcc 8.x and newer, otherwise it's the short format
Recommended use is for non-trivial expressions, so basic ASSERT(value) can be
used for pointers or sometimes integers.
The format has been slightly updated to also print the result of the
evaluation of the condition, appended to the stringified condition as
"condition :: <value>".
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
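Note on the change above: the idea of an assertion macro that accepts an optional printf-style format can be sketched in userspace as follows. This is not the kernel macro: SKETCH_ASSERT(), report() and the exact message layout are assumptions made for illustration, and the sketch records the message in a buffer instead of calling BUG(). It relies on adjacent string literal concatenation ("" followed by the format) so that the format argument may be left out on compilers that accept an empty variadic argument list.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char last_msg[256];

/* Record a failure message; returns 1 if the assertion failed, else 0. */
static int report(int ok, const char *expr, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (ok)
		return 0;

	/* Stringified condition followed by its evaluated result. */
	n = snprintf(last_msg, sizeof(last_msg),
		     "assertion failed: %s :: %d", expr, ok);
	if (fmt[0] != '\0' && n > 0 && (size_t)n + 2 < sizeof(last_msg)) {
		last_msg[n++] = ',';
		last_msg[n++] = ' ';
		va_start(ap, fmt);
		vsnprintf(last_msg + n, sizeof(last_msg) - (size_t)n, fmt, ap);
		va_end(ap);
	}
	return 1;
}

/* "" __VA_ARGS__ turns a missing format into an empty string. */
#define SKETCH_ASSERT(cond, ...) \
	report(!!(cond), #cond, "" __VA_ARGS__)
```

A call such as SKETCH_ASSERT(value > limit, "value=%llu limit=%llu", value, limit) then yields a message carrying both the condition text and the actual values.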
Yangtao Li [Thu, 17 Apr 2025 14:26:49 +0000 (08:26 -0600)]
btrfs: remove BTRFS_REF_LAST from enum btrfs_ref_type
Commit b28b1f0ce44c ("btrfs: delayed-ref: Introduce better documented
delayed ref structures") introduced BTRFS_REF_LAST, which can be used
for sanity checking, e.g. in switch/case or for loops.
In btrfs_ref_type() there is an assertion
ASSERT(ref->type == BTRFS_REF_DATA || ref->type == BTRFS_REF_METADATA);
to validate the values so we don't need the ending enum.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Christoph Hellwig [Wed, 9 Apr 2025 11:10:42 +0000 (13:10 +0200)]
btrfs: use bvec_kmap_local() in btrfs_decompress_buf2page()
This removes the last direct poke into bvec internals in btrfs.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Christoph Hellwig [Wed, 9 Apr 2025 11:10:41 +0000 (13:10 +0200)]
btrfs: scrub: use virtual addresses directly
Instead of the old @page and @page_offset pair inside scrub, here we can
directly use the virtual address for a sector.
This has the following benefits:
- Simplified parameters
A single @kaddr will replace @page and @page_offset.
- No more unnecessary kmap/kunmap calls
Since all pages utilized by scrub are allocated by scrub, and no
highmem is allowed, we do not need to do any kmap/kunmap.
And add an ASSERT() inside the new scrub_stripe_get_kaddr() to
catch any unexpected highmem page.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Wed, 9 Apr 2025 11:10:40 +0000 (13:10 +0200)]
btrfs: raid56: store a physical address in structure sector_ptr
Instead of using a @page + @pg_offset pair inside sector_ptr structure,
use a single physical address instead.
This allows us to grab both the page and offset from a single u64 value.
We still need an extra bool value, @has_paddr, to distinguish
if the sector is properly mapped (as the 0 physical address is totally
valid).
This change doesn't change the size of structure sector_ptr, but reduces
the parameters of several functions.
Note: the original idea and patch is from Christoph Hellwig
(https://lore.kernel.org/linux-btrfs/20250409111055.3640328-7-hch@lst.de/)
but the final implementation is different.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[ Use physical addresses instead to handle highmem. ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
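Note on the change above: how one physical address can stand in for a page + offset pair can be sketched like this. The struct layout, the helper names and the 4KiB page size are assumptions of the sketch, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1ULL << SKETCH_PAGE_SHIFT)

/* Hypothetical sector reference: one address plus a validity flag. */
struct sector_ref {
	uint64_t paddr;
	bool has_paddr;		/* needed because paddr == 0 is a valid address */
};

/* Page frame number that the address falls into. */
static uint64_t sector_pfn(const struct sector_ref *s)
{
	assert(s->has_paddr);
	return s->paddr >> SKETCH_PAGE_SHIFT;
}

/* Byte offset of the sector inside that page. */
static uint32_t sector_offset(const struct sector_ref *s)
{
	assert(s->has_paddr);
	return (uint32_t)(s->paddr & (SKETCH_PAGE_SIZE - 1));
}
```

Both halves of the old pair are recovered with a shift and a mask, and the separate flag keeps an unmapped sector distinguishable from one mapped at address 0.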
Christoph Hellwig [Wed, 9 Apr 2025 11:10:39 +0000 (13:10 +0200)]
btrfs: simplify bvec iteration in index_one_bio()
Flatten the two loops by open coding bio_for_each_segment() and advancing
the iterator one sector at a time.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Qu Wenruo <wqu@suse.com>
[ Fix a bug that @offset is not increased. ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Christoph Hellwig [Wed, 9 Apr 2025 11:10:38 +0000 (13:10 +0200)]
btrfs: move kmapping out of btrfs_check_sector_csum()
Move kmapping the page out of btrfs_check_sector_csum().
This allows using bvec_kmap_local() where suitable and reduces the number
of kmap*() calls in the raid56 code.
This also means btrfs_check_sector_csum() will only accept a properly
kmapped address.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Christoph Hellwig [Wed, 9 Apr 2025 11:10:37 +0000 (13:10 +0200)]
btrfs: pass a physical address to btrfs_repair_io_failure()
Using a physical address has the following advantages:
- All involved callers only need a single pointer
Instead of the old @folio + @offset pair.
- No complex poking into the bio_vec structure
As a bio_vec can be single or multiple paged, grabbing the real page
can be quite complex if the bio_vec is a multi-page one.
Instead bvec_phys() will always give a single physical address, and it
can be easily converted to a page.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Christoph Hellwig [Wed, 9 Apr 2025 11:10:36 +0000 (13:10 +0200)]
btrfs: track the next file offset in struct btrfs_bio_ctrl
The bio implementation is not something we should really mess around with,
and we shouldn't recalculate the pos from the folio over and over.
Instead just track the end of the current bio in logical file offsets
in the btrfs_bio_ctrl, which is much simpler and easier to read.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Christoph Hellwig [Wed, 9 Apr 2025 11:10:35 +0000 (13:10 +0200)]
btrfs: remove the alignment checks in end_bbio_data_read()
end_bbio_data_read() checks that each iterated folio fragment is aligned
and justifies that with block drivers advancing the bio. But block
drivers only advance bi_iter, while end_bbio_data_read() uses
bio_for_each_folio_all() to iterate the immutable bi_io_vec array that
can't be changed by drivers at all.
Furthermore btrfs has already done the alignment check of the file
offset inside submit_one_sector(), and the size is fixed to fs block
size, so there is no need to redo the alignment check inside the
endio function.
So just remove the unnecessary alignment check along with the incorrect
comment.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Charles Han [Thu, 10 Apr 2025 09:07:22 +0000 (17:07 +0800)]
btrfs: update and correct description of btrfs_get_or_create_delayed_node()
The comment mistakenly says the function is returning PTR_ERR instead of
ERR_PTR. Fix it and update it so it's more descriptive.
Signed-off-by: Charles Han <hanchunchao@inspur.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ Enhance the function comment. ]
Signed-off-by: David Sterba <dsterba@suse.com>
Yangtao Li [Mon, 14 Apr 2025 12:52:31 +0000 (06:52 -0600)]
btrfs: simplify return logic from btrfs_delayed_ref_init()
Make this simpler by returning directly when there's no other cleanup
needed.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Yangtao Li [Tue, 15 Apr 2025 03:53:40 +0000 (21:53 -0600)]
btrfs: reuse exit helper for cleanup in btrfs_bioset_init()
Do not duplicate the cleanup after failed initialization
in btrfs_bioset_init() and reuse the exit function btrfs_bioset_exit().
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 15 Apr 2025 07:02:13 +0000 (09:02 +0200)]
btrfs: rename iov_iter iterator parameter in btrfs_buffered_write()
Using 'i' for a parameter is confusing and not conforming to current
preferences, so rename it to 'iter'.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Mon, 17 Mar 2025 07:10:54 +0000 (17:40 +1030)]
btrfs: enable large data folios support for defrag
Currently we reject large folios for defrag gracefully, but the
implementation itself is already mostly large folios compatible.
There are several parts of defrag in btrfs:
- Extent map checking
Aka, defrag_collect_targets(), which prepares a list of target ranges
that should be defragged.
This part is completely folio unrelated, thus it doesn't care about
the folio size.
- Target folio preparation
Aka, defrag_prepare_one_folio(), which locks and reads (if needed) the
target folio.
Since folio read and lock already support large folios, this
part needs only minor changes.
- Redirty the target range of the folio
This is already done in a way supporting large folios.
So it's pretty straightforward to enable large folios for defrag:
- Do not reject large folios for experimental builds
This affects the large folio check inside defrag_prepare_one_folio().
- Wait for ordered extents of the whole folio in
defrag_prepare_one_folio()
- Lock the whole extent range for all involved folios in
defrag_one_range()
- Allow the folios[] array to be partially empty
Since we can have large folios, folios[] will not always be full.
This affects:
* How to allocate folios in defrag_one_range()
Now we cannot use page index, but use the end position of the folio
as an iterator.
* How to free the folios[] array
If we hit an empty slot, it means we have large folios and already
hit the end of the array.
* How to mark the range dirty
Instead of using the page index directly, we have to go through each
folio, and check if the folio covers the defrag target inside
defrag_one_locked_target().
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
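Note on the change above: iterating by "end position of the folio" instead of by page index can be sketched as below. Everything here is hypothetical: struct fake_folio, get_folio() and collect_folios() only model a page cache whose folios all have one given size and are aligned to that size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a folio: file position and size in bytes. */
struct fake_folio {
	uint64_t pos;
	uint64_t size;
};

/* Pretend lookup: returns the size-aligned folio covering 'offset'. */
static struct fake_folio get_folio(uint64_t offset, uint64_t folio_size)
{
	struct fake_folio f = { offset - offset % folio_size, folio_size };

	return f;
}

/* Fill 'out' with the folios covering [start, end); returns slots used. */
static unsigned int collect_folios(uint64_t start, uint64_t end,
				   uint64_t folio_size,
				   struct fake_folio *out, unsigned int max)
{
	unsigned int nr = 0;
	uint64_t cur = start;

	while (cur < end && nr < max) {
		out[nr] = get_folio(cur, folio_size);
		/* Advance to the end of this folio, not to the next page. */
		cur = out[nr].pos + out[nr].size;
		nr++;
	}
	return nr;
}
```

For a 256KiB range, 4KiB folios fill all 64 slots of a page-count sized array, while 64KiB folios use only 4 of them, which is why the array may end up partially empty.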
Qu Wenruo [Mon, 7 Apr 2025 08:43:33 +0000 (18:13 +0930)]
btrfs: prepare compression paths for large data folios
None of the compression algorithms inside btrfs support large folios,
due to the following points:
- btrfs_calc_input_length() assumes page sized folios
- kmap_local_folio() usages are using offset_in_page()
Prepare them to support large data folios by:
- Add a folio parameter to btrfs_calc_input_length()
And use that folio parameter to calculate the correct length.
Since we're here, also add extra ASSERT()s to make sure the parameter
@cur is inside the folio range.
This affects only zlib and zstd. Lzo compresses at most one block at a
time, thus not affected.
- Use offset_in_folio() to calculate the kmap_local_folio() offset
This affects all 3 algorithms.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
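Note on the change above: why a page-based offset breaks with large folios, and how a folio-aware input length could be computed, is sketched below. The helpers are hypothetical and work on file positions only; the 4KiB page size and struct fake_folio are assumptions, and the kernel helpers of similar names have different signatures.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical stand-in for a folio: file position and size in bytes. */
struct fake_folio {
	uint64_t pos;
	uint64_t size;
};

/* Page-based offset: only right when the folio is exactly one page. */
static uint64_t sketch_offset_in_page(uint64_t file_pos)
{
	return file_pos & (SKETCH_PAGE_SIZE - 1);
}

/* Folio-based offset: right for any folio size. */
static uint64_t sketch_offset_in_folio(const struct fake_folio *folio,
				       uint64_t file_pos)
{
	assert(file_pos >= folio->pos && file_pos < folio->pos + folio->size);
	return file_pos - folio->pos;
}

/* Bytes available from 'cur' without leaving the folio or the range. */
static uint64_t sketch_input_length(const struct fake_folio *folio,
				    uint64_t range_end, uint64_t cur)
{
	uint64_t folio_end = folio->pos + folio->size;
	uint64_t end = range_end < folio_end ? range_end : folio_end;

	assert(cur >= folio->pos && cur < folio_end && cur < range_end);
	return end - cur;
}
```

For a 64KiB folio starting at file offset 64KiB, a position 20580 bytes into the folio has a page offset of only 100, so mapping the folio with the page-based offset would address the wrong bytes.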
Filipe Manana [Tue, 8 Apr 2025 16:37:03 +0000 (17:37 +0100)]
btrfs: rename __tree_search() to remove double underscore prefix
There's no need to have a double underscore prefix as there's no variant
of the function without it.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 8 Apr 2025 16:34:24 +0000 (17:34 +0100)]
btrfs: rename __lookup_extent_mapping() to remove double underscore prefix
There's no need to have a double underscore prefix as there's no variant
of the function without it anymore.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 8 Apr 2025 16:31:16 +0000 (17:31 +0100)]
btrfs: rename remaining exported extent map functions
Rename all the exported functions from extent_map.h that don't have a
'btrfs_' prefix in their names, so that they are consistent with all the
other functions, to make it clear they are btrfs specific functions and
to avoid potential name collisions in the future with functions defined
elsewhere in the kernel.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 8 Apr 2025 16:13:12 +0000 (17:13 +0100)]
btrfs: rename functions to allocate and free extent maps
These functions are exported and don't have a 'btrfs_' prefix in their
names, which goes against coding style conventions. Rename them to have
such prefix, making it clear they are from btrfs and avoiding potential
collisions in the future with functions defined elsewhere outside btrfs.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 8 Apr 2025 15:52:09 +0000 (16:52 +0100)]
btrfs: rename extent map functions to get block start, end and check if in tree
These functions are exported and don't have a 'btrfs_' prefix in their
names, which goes against coding style conventions. Rename them to have
such prefix, making it clear they are from btrfs and avoiding potential
collisions in the future with functions defined elsewhere outside btrfs.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 8 Apr 2025 15:41:15 +0000 (16:41 +0100)]
btrfs: rename exported extent map compression functions
These functions are exported and don't have a 'btrfs_' prefix in their
names, which goes against coding style conventions. Rename them to have
such prefix, making it clear they are from btrfs and avoiding potential
collisions in the future with functions defined elsewhere outside btrfs.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 8 Apr 2025 11:08:49 +0000 (12:08 +0100)]
btrfs: tracepoints: remove no longer used tracepoints for eb locking
There are several tracepoints for extent buffer locks that are not used
anymore:
* btrfs_tree_read_unlock_blocking
* btrfs_set_lock_blocking_read
* btrfs_set_lock_blocking_write
* btrfs_tree_read_lock_atomic
These stopped being used after we switched extent buffer locks from a
custom implementation to rw semaphores in commit 196d59ab9ccc ("btrfs:
switch extent buffer tree lock to rw_semaphore").
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 8 Apr 2025 11:01:06 +0000 (12:01 +0100)]
btrfs: tracepoints: add btrfs prefix to names where it's missing
Most of our tracepoints have the 'btrfs_' prefix in their names but a few
of them are missing, making it inconsistent. So add the prefix to the ones
that are missing it, creating consistency, making it clear for users these
are btrfs tracepoints and eventually avoid name collisions with other
tracepoints defined by other kernel subsystems.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 7 Apr 2025 11:15:25 +0000 (12:15 +0100)]
btrfs: make btrfs_find_contiguous_extent_bit() return bool instead of int
The function only needs to return true or false, so there's no need to
return an integer. Currently it returns 0 when a range with the given
bits set is found and 1 when not found, which is also counterintuitive.
So change the function to return a bool instead, returning true when a
range is found and false otherwise. Update the function's documentation
to mention the return value too.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 7 Apr 2025 10:57:05 +0000 (11:57 +0100)]
btrfs: remove double underscore prefix from __set_extent_bit()
Now that set_extent_bit() was renamed to btrfs_set_extent_bit(), there's
no need to have a __set_extent_bit() function, we can just remove the
double underscore prefix, which we try to avoid according to the coding
style conventions.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Fri, 4 Apr 2025 15:45:12 +0000 (16:45 +0100)]
btrfs: rename remaining exported functions from extent-io-tree.h
Rename the remaining exported functions that don't have a 'btrfs_' prefix.
By convention exported functions should have such prefix to make it clear
they are btrfs specific and to avoid collisions with functions from
elsewhere in the kernel.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Fri, 4 Apr 2025 15:31:24 +0000 (16:31 +0100)]
btrfs: rename free_extent_state() to include a btrfs prefix
This is an exported function so it should have a 'btrfs_' prefix by
convention, to make it clear it's btrfs specific and to avoid collisions
with functions from elsewhere in the kernel.
Rename the function to add 'btrfs_' prefix to it.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Fri, 4 Apr 2025 15:07:19 +0000 (16:07 +0100)]
btrfs: rename the functions to count, test and get bit ranges in io trees
These functions are exported so they should have a 'btrfs_' prefix by
convention, to make it clear they are btrfs specific and to avoid
collisions with functions from elsewhere in the kernel.
So add a 'btrfs_' prefix to their names to make it clear they are from
btrfs.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Fri, 4 Apr 2025 11:17:13 +0000 (12:17 +0100)]
btrfs: rename the functions to init and release an extent io tree
These functions are exported so they should have a 'btrfs_' prefix by
convention, to make it clear they are btrfs specific and to avoid
collisions with functions from elsewhere in the kernel.
So add a 'btrfs_' prefix to their name to make it clear they are from
btrfs.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Fri, 4 Apr 2025 11:04:04 +0000 (12:04 +0100)]
btrfs: directly grab inode at __btrfs_debug_check_extent_io_range()
We've tested that we are dealing with io tree that is associated to an
inode (its owner is IO_TREE_INODE_IO), so there's no need to call
btrfs_extent_io_tree_to_inode() in a separate line and we just assign
tree->inode to the local inode variable when we declare it.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Fri, 4 Apr 2025 10:09:07 +0000 (11:09 +0100)]
btrfs: rename the functions to get inode and fs_info from an extent io tree
These functions are exported so they should have a 'btrfs_' prefix by
convention, to make it clear they are btrfs specific and to avoid
collisions with functions from elsewhere in the kernel.
So add a 'btrfs_' prefix to their name to make it clear they are from
btrfs. Also remove the 'const' suffix from extent_io_tree_to_inode_const()
since there's no non-const variant anymore and makes the naming consistent
with extent_io_tree_to_fs_info() (no 'const' suffix and returns a const
pointer).
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 3 Apr 2025 14:19:49 +0000 (15:19 +0100)]
btrfs: rename the functions to search for bits in extent ranges
These functions are exported so they should have a 'btrfs_' prefix by
convention, to make it clear they are btrfs specific and to avoid
collisions with functions from elsewhere in the kernel.
So add a 'btrfs_' prefix to their name to make it clear they are from
btrfs.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 3 Apr 2025 14:00:26 +0000 (15:00 +0100)]
btrfs: rename set_extent_bit() to include a btrfs prefix
This is an exported function so it should have a 'btrfs_' prefix by
convention, to make it clear it's btrfs specific and to avoid collisions
with functions from elsewhere in the kernel.
So rename it to btrfs_set_extent_bit().
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 2 Apr 2025 10:50:08 +0000 (11:50 +0100)]
btrfs: rename the functions to clear bits for an extent range
These functions are exported so they should have a 'btrfs_' prefix by
convention, to make it clear they are btrfs specific and to avoid
collisions with functions from elsewhere in the kernel. One of them has a
double underscore prefix which is also discouraged.
So remove double underscore prefix where applicable and add a 'btrfs_'
prefix to their name to make it clear they are from btrfs.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 31 Mar 2025 14:29:21 +0000 (15:29 +0100)]
btrfs: rename __lock_extent() and __try_lock_extent()
These functions are exported so they should have a 'btrfs_' prefix by
convention, to make it clear they are btrfs specific and to avoid
collisions with functions from elsewhere in the kernel. Their double
underscore prefix is also discouraged.
So remove their double underscore prefix, add a 'btrfs_' prefix to their
name to make it clear they are from btrfs and a '_bits' suffix to avoid
collision with btrfs_lock_extent() and btrfs_try_lock_extent().
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 31 Mar 2025 13:38:24 +0000 (14:38 +0100)]
btrfs: add btrfs prefix to dio lock and unlock extent functions
These functions are exported so they should have a 'btrfs_' prefix by
convention, to make it clear they are btrfs specific and to avoid
collisions with functions from elsewhere in the kernel. So add a prefix to
their name.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 31 Mar 2025 13:23:42 +0000 (14:23 +0100)]
btrfs: add btrfs prefix to main lock, try lock and unlock extent functions
These functions are exported so they should have a 'btrfs_' prefix by
convention, to make it clear they are btrfs specific and to avoid
collisions with functions from elsewhere in the kernel. So add a prefix to
their name.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 7 Apr 2025 16:52:22 +0000 (17:52 +0100)]
btrfs: add btrfs prefix to trace events for extent state alloc and free
These trace events don't have the 'btrfs_' prefix in their name, unlike
the other trace events from extent-io-tree.c. So add the prefix to make
them consistent and follow coding style conventions too.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 3 Apr 2025 14:53:31 +0000 (15:53 +0100)]
btrfs: remove extent_io_tree_to_inode() and is_inode_io_tree()
These functions aren't used outside extent-io-tree.c, but yet one of them
(extent_io_tree_to_inode()) is unnecessarily exported in the header.
Furthermore their single use is in a pattern like this:
    if (is_inode_io_tree(tree))
            foo(extent_io_tree_to_inode(tree), ...);
So we're effectively unnecessarily adding more indirection, checking
twice if tree->owner == IO_TREE_INODE_IO before getting the inode and
doing a non-inline function call to get tree->inode.
Simplify this by removing these helper functions and instead doing
things like this:
    if (tree->owner == IO_TREE_INODE_IO)
            foo(tree->inode, ...);
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 4 Apr 2025 18:19:38 +0000 (20:19 +0200)]
btrfs: tree-checker: more unlikely annotations
Add more unlikely annotations to branches that lead to EUCLEAN, overall
in the tree checker this helps to reorder instructions for the no-error
case.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Thu, 3 Apr 2025 23:40:51 +0000 (10:10 +1030)]
btrfs: use folio_contains() for EOF detection
Currently we use the following pattern to detect if the folio contains
the end of a file:
    if (folio->index == end_index)
            folio_zero_range();
But that only works if the folio is page sized.
For the following case, it will not work and leave the range beyond EOF
uninitialized:
The page size is 4K, and the fs block size is also 4K.
    16K        20K        24K
    |          |     |    |
                     |
                     EOF at 22K
And we have a large folio sized 8K at file offset 16K.
In that case, the old "folio->index == end_index" will not work, thus
the range [22K, 24K) will not be zeroed out.
Fix the following call sites which use the above pattern:
- add_ra_bio_pages()
- extent_writepage()
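The failing case above can be reproduced with a small userspace model.
This is a sketch only: struct model_folio and model_folio_contains() are
hypothetical stand-ins that follow the same idea as folio_contains()
(an index is covered if it falls within the pages spanned by the folio),
and a 4K page size is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed 4K pages, so file offset >> 12 gives the page index. */
#define MODEL_PAGE_SHIFT 12

/* Hypothetical stand-in for struct folio. */
struct model_folio {
	unsigned long index;	/* page index of the first page */
	unsigned long nr_pages;	/* number of pages covered by the folio */
};

/* Old pattern: only true when the folio starts exactly at the EOF page. */
static bool model_old_eof_check(const struct model_folio *folio,
				unsigned long end_index)
{
	return folio->index == end_index;
}

/*
 * New pattern: true when the EOF page is anywhere inside the folio.
 * The unsigned subtraction wraps to a huge value when @index is before
 * the folio, so a single comparison covers both bounds.
 */
static bool model_folio_contains(const struct model_folio *folio,
				 unsigned long index)
{
	return index - folio->index < folio->nr_pages;
}
```

With an 8K folio at file offset 16K (index 4, 2 pages) and EOF at 22K
(end index 5), the old check is false while the new one is true, so only
the new one leads to zeroing the range beyond EOF.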
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Thu, 3 Apr 2025 23:50:21 +0000 (10:20 +1030)]
btrfs: remove unnecessary early exits in delalloc folio lock and unlock
Inside functions unlock_delalloc_folio() and lock_delalloc_folios(), we
have the following early exits:
    if (index == locked_folio->index && end_index == index)
            return;
This allows us to exit early if the range is inside the same locked
folio.
However the current check relies on page sized folios. If we have a large
folio that contains @index but does not start at @index, then the early
exit will no longer trigger.
Furthermore without the above early check, the existing code can handle it
well, as both __process_folios_contig() and lock_delalloc_folios() will
skip any folio page lock/unlock if it's on the locked folio.
Here we remove the early exits and let the existing code handle the
same index case, to make the code a little simpler.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 3 Apr 2025 15:23:41 +0000 (16:23 +0100)]
btrfs: tracepoints: use btrfs_root_id() to get the id of a root
Instead of open coding btrfs_root_id() to get the ID of a root, use the
helper in the trace points, which also makes the code less verbose.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Mon, 17 Mar 2025 07:10:52 +0000 (17:40 +1030)]
btrfs: zlib: prepare copy_data_into_buffer() for large data folios
The function itself is already taking large folios into consideration,
just remove the ASSERT(!folio_test_large()) line.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Mon, 17 Mar 2025 07:10:51 +0000 (17:40 +1030)]
btrfs: subpage: prepare for large data folios
The subpage handling code has two locations not supporting large folios:
- btrfs_attach_subpage()
Which is doing a metadata specific ASSERT() check.
With the future large data folios support, that check would be applied
too broadly. Since it's metadata specific, only run the ASSERT() for
metadata.
- btrfs_subpage_assert()
Just remove the "ASSERT(folio_order(folio) == 0)" check.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Mon, 17 Mar 2025 07:10:50 +0000 (17:40 +1030)]
btrfs: prepare end_bbio_data_write() for large data folios
The function is doing an ASSERT() checking the folio order, but all
later functions are handling large folios properly, thus we can safely
remove that ASSERT().
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Mon, 17 Mar 2025 07:10:49 +0000 (17:40 +1030)]
btrfs: prepare prepare_one_folio() for large data folios
The only blockage is the ASSERT() rejecting large folios, just remove
it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Mon, 17 Mar 2025 07:10:48 +0000 (17:40 +1030)]
btrfs: prepare btrfs_page_mkwrite() for large data folios
The function btrfs_page_mkwrite() has an explicit ASSERT() checking the
folio order.
To make it support large data folios, we need to:
- Remove the ASSERT(folio_order(folio) == 0)
- Use folio_contains() to check if the folio covers the last page
Otherwise the code is already supporting large folios well.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Mon, 17 Mar 2025 07:10:47 +0000 (17:40 +1030)]
btrfs: send: prepare put_file_data() for large data folios
Currently put_file_data() can only accept a page sized folio. However
the function itself is not that complex, it's just copying data from
filemap folio into the send buffer.
Make it support large data folios:
- Change the loop to use file offset instead of page index
- Calculate @pg_offset and @cur_len after getting the folio
- Remove the "WARN_ON(folio_order(folio));" line
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Mon, 17 Mar 2025 07:10:46 +0000 (17:40 +1030)]
btrfs: send: remove the again label inside put_file_data()
The again label is there to retry getting the folio for the current index.
When jumping to that label, the iterator is not advanced.
So it can be replaced by a simple "continue", and the again label removed.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 1 Apr 2025 23:18:12 +0000 (01:18 +0200)]
btrfs: use BTRFS_PATH_AUTO_FREE in btrfs_insert_inode_extref()
This is the trivial pattern for path auto free, initialize at the
beginning and free at the end with simple goto -> return conversions.
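The conversion pattern can be illustrated in userspace with the compiler
cleanup attribute supported by GCC and Clang. This is a sketch, not the
btrfs macro: MODEL_PATH_AUTO_FREE, struct model_path and the model_*()
functions are hypothetical names, and live_paths exists only so that the
automatic free can be observed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct btrfs_path. */
struct model_path {
	int slots[8];
};

/* Number of allocated paths, used only to observe the automatic free. */
static int live_paths;

static struct model_path *model_alloc_path(void)
{
	struct model_path *path = calloc(1, sizeof(*path));

	if (path)
		live_paths++;
	return path;
}

/* Cleanup handlers receive a pointer to the variable leaving scope. */
static void model_free_pathp(struct model_path **pathp)
{
	if (*pathp) {
		live_paths--;
		free(*pathp);
	}
}

/* Declare a path pointer that is freed on every exit from its scope. */
#define MODEL_PATH_AUTO_FREE(name) \
	struct model_path *name __attribute__((cleanup(model_free_pathp))) = NULL

static int model_lookup(int key)
{
	MODEL_PATH_AUTO_FREE(path);

	path = model_alloc_path();
	if (!path)
		return -ENOMEM;

	/* Previously "ret = -ENOENT; goto out;", now a plain return. */
	if (key < 0)
		return -ENOENT;

	path->slots[0] = key;
	return 0;
}
```

Both the error return and the success return release the path without
an explicit free call or an "out" label.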
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 1 Apr 2025 23:18:11 +0000 (01:18 +0200)]
btrfs: use BTRFS_PATH_AUTO_FREE in btrfs_del_inode_extref()
This is the trivial pattern for path auto free, initialize at the
beginning and free at the end with simple goto -> return conversions.
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 1 Apr 2025 23:18:10 +0000 (01:18 +0200)]
btrfs: use BTRFS_PATH_AUTO_FREE in btrfs_encoded_read_inline()
This is the trivial pattern for path auto free, initialize at the
beginning and free at the end with simple goto -> return conversions.
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 1 Apr 2025 23:18:09 +0000 (01:18 +0200)]
btrfs: use BTRFS_PATH_AUTO_FREE in can_nocow_extent()
This is the trivial pattern for path auto free, initialize at the
beginning and free at the end with simple goto -> return conversions.
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 1 Apr 2025 23:18:08 +0000 (01:18 +0200)]
btrfs: use BTRFS_PATH_AUTO_FREE in btrfs_set_inode_index_count()
This is the trivial pattern for path auto free, initialize at the
beginning and free at the end with simple goto -> return conversions.
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 1 Apr 2025 23:18:07 +0000 (01:18 +0200)]
btrfs: use BTRFS_PATH_AUTO_FREE in may_destroy_subvol()
This is the trivial pattern for path auto free, initialize at the
beginning and free at the end with simple goto -> return conversions.
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 1 Apr 2025 23:18:06 +0000 (01:18 +0200)]
btrfs: do more trivial BTRFS_PATH_AUTO_FREE conversions
The most trivial pattern for the auto freeing when the variable is
declared with the macro and the final btrfs_free_path() is removed.
There are almost no goto -> return conversions and there's no other
function cleanup.
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 2 Apr 2025 13:10:53 +0000 (14:10 +0100)]
btrfs: remove redundant record start offset check at test_range_bit()
It's pointless to check if the current record's start offset is greater
than the end offset, as before we just tested if it was greater than the
start offset - and if it's not it means it's less than or equal to the
start offset, so it can not be greater than the end offset, as our start
offset is always smaller than the end offset.
So remove that check and also add an assertion to verify the start offset
is smaller than the end offset.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 2 Apr 2025 12:31:46 +0000 (13:31 +0100)]
btrfs: simplify last record detection at test_range_bit()
The overflow detection for the start offset of the next record is not
really necessary, we can just stop iterating if the current record ends at
or after our end offset. This removes the need to test if the current
record end offset is (u64)-1 and to check if adding 1 to the current
end offset results in 0.
By testing only if the current record ends at or after the end offset, we
also don't need anymore to test the new start offset at the head of the
while loop.
This makes both the source code and assembly code simpler, more efficient
and shorter (reducing the object text size).
Also remove the pointless initialization to NULL of the state variable, as
we don't use it before the first assignment to it. This may help avoid
some warnings with clang tools such as the one reported/fixed by commit
966de47ff0c9 ("btrfs: remove redundant initialization of variables in
log_new_ancestors").
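The simplified termination logic can be sketched over a sorted array in
userspace. This is a model under stated assumptions: struct model_state
and model_test_range_bit() are hypothetical, records are sorted and non
overlapping with inclusive ends, and a linear scan stands in for the
tree search.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent state record, [start, end] inclusive. */
struct model_state {
	uint64_t start;
	uint64_t end;
	uint32_t bits;
};

/* Return true if every byte of [start, end] is covered with @bit set. */
static bool model_test_range_bit(const struct model_state *recs, size_t nr,
				 uint64_t start, uint64_t end, uint32_t bit)
{
	size_t i = 0;

	assert(start < end);

	/* Stand-in for the tree search: first record ending at or after start. */
	while (i < nr && recs[i].end < start)
		i++;

	for (; i < nr; i++) {
		/* A hole before this record means the range is not fully set. */
		if (recs[i].start > start)
			return false;
		if ((recs[i].bits & bit) == 0)
			return false;
		/*
		 * Stop at the record that reaches our end offset. Past this
		 * point the record end is below @end, so end + 1 can not wrap
		 * and no overflow check is needed.
		 */
		if (recs[i].end >= end)
			return true;
		start = recs[i].end + 1;
	}

	/* Ran out of records while still inside the range. */
	return false;
}
```

The interesting case is a last record ending at (u64)-1: the loop
returns before computing end + 1, so no wraparound test is required.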
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 2 Apr 2025 12:23:45 +0000 (13:23 +0100)]
btrfs: remove redundant check at find_first_extent_bit_state()
The tree_search() function always returns an entry that either contains
the search offset or the first entry in the tree that starts after the
offset. So checking at find_first_extent_bit_state() if the returned
entry ends at or after the search offset is pointless. Remove the check.
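The search contract described above can be modeled with a binary search
over a sorted array. This is an illustrative sketch: struct model_state
and model_tree_search() are hypothetical and do not mirror the rb-tree
based implementation, and records are assumed sorted and non overlapping
with inclusive ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent state record, [start, end] inclusive. */
struct model_state {
	uint64_t start;
	uint64_t end;
};

/*
 * Return the record containing @offset, or else the first record that
 * starts after @offset, or NULL if there is no such record.
 */
static const struct model_state *model_tree_search(const struct model_state *recs,
						    size_t nr, uint64_t offset)
{
	size_t lo = 0;
	size_t hi = nr;

	/* Find the first record whose end is at or after @offset. */
	while (lo < hi) {
		const size_t mid = lo + (hi - lo) / 2;

		if (recs[mid].end < offset)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo < nr ? &recs[lo] : NULL;
}
```

Any non-NULL result already satisfies end >= offset by construction, so
a caller re-checking that condition is doing redundant work.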
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 2 Apr 2025 12:07:33 +0000 (13:07 +0100)]
btrfs: fix documentation for tree_search_for_insert()
There are several things wrong with the documentation:
1) At the top it's only mentioned that we search for an entry containing
the given offset, but when such entry does not exist we search for
the first entry that starts and ends after that offset;
2) It mentions that @node_ret and @parent_ret aren't changed if the
returned entry contains the given offset - that is true only if the
returned entry starts exactly at @offset, otherwise those arguments
are changed;
3) It mentions that if no entry containing offset is found then we return
the first entry ending before the offset - that is not true, we return
the first entry that starts and ends after that offset;
4) It also mentions that NULL is never returned. This is false as in case
there's no entry containing offset or any entry that starts and ends
after offset, NULL is returned.
So fix the documentation.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 1 Apr 2025 11:27:38 +0000 (12:27 +0100)]
btrfs: simplify last record detection at test_range_bit_exists()
Instead of keeping track of the minimum start offset of the next record
and detecting overflow every time we update that offset to be the sum of
current record's end offset plus one, we can simply exit when the current
record ends at or beyond our end offset and forget about updating the
start offset on every iteration and testing for it at the top of the loop.
This makes both the source code and assembly code simpler, more efficient
and shorter (reducing the object text size).
Also remove the pointless initialization to NULL of the state variable, as
we don't use it before the first assignment to it. This may help avoid
some warnings with clang tools such as the one reported/fixed by commit
966de47ff0c9 ("btrfs: remove redundant initialization of variables in
log_new_ancestors").
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 1 Apr 2025 15:12:52 +0000 (16:12 +0100)]
btrfs: use clear_extent_bits() instead of clear_extent_bit() where possible
Several places are using clear_extent_bit() and passing a NULL value for
the 'cached' argument, which is pointless as they can use
clear_extent_bits() instead.
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>