linux-block.git
3 weeks agobtrfs: reorganize logic at free_extent_buffer() for better readability
Filipe Manana [Mon, 2 Jun 2025 12:27:49 +0000 (13:27 +0100)]
btrfs: reorganize logic at free_extent_buffer() for better readability

It's hard to read the logic to break out of the while loop since it's a
very long expression consisting of a logical or of two composite
expressions, each one composed by a logical and. Further each one is also
testing for the EXTENT_BUFFER_UNMAPPED bit, making it more verbose than
necessary.

So change from this:

    if ((!test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags) && refs <= 3)
        || (test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags) &&
            refs == 1))
       break;

To this:

    if (test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags)) {
        if (refs == 1)
            break;
    } else if (refs <= 3) {
            break;
    }

At least on x86_64 using gcc 9.3.0, this doesn't change the object size.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: make btrfs_readdir_delayed_dir_index() return a bool instead
Filipe Manana [Sat, 31 May 2025 14:56:46 +0000 (15:56 +0100)]
btrfs: make btrfs_readdir_delayed_dir_index() return a bool instead

There's no need to return errors, all we do is return 1 or 0 depending
on whether we should or should not stop iterating over delayed dir
indexes. So change the function to return bool instead of an int.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: make btrfs_should_delete_dir_index() return a bool instead
Filipe Manana [Sat, 31 May 2025 14:41:41 +0000 (15:41 +0100)]
btrfs: make btrfs_should_delete_dir_index() return a bool instead

There's no need to return errors and we currently return 1 in case the
index should be deleted and 0 otherwise, so change the return type from
int to bool as this is a boolean function and it's more clear this way.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: add details to error messages at btrfs_delete_delayed_dir_index()
Filipe Manana [Fri, 30 May 2025 17:41:18 +0000 (18:41 +0100)]
btrfs: add details to error messages at btrfs_delete_delayed_dir_index()

Update the error messages with:

1) Fix typo in the first one, deltiona -> deletion;

2) Remove redundant part of the first message, the part following the
   comma, and including all the useful information: root, inode, index
   and error value;

3) Update the second message to use more formal language (example 'error'
   instead of 'err'), , remove redundant part about "deletion tree of
   delayed node..." and print the relevant information in the same
   format and order as the first message, without the ugly opening
   parenthesis without a space separating from the previous word.
   This also makes the message similar in format to the one we have at
   btrfs_insert_delayed_dir_index().

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: make btrfs_delete_delayed_insertion_item() return a boolean
Filipe Manana [Fri, 30 May 2025 17:27:07 +0000 (18:27 +0100)]
btrfs: make btrfs_delete_delayed_insertion_item() return a boolean

There's no need to return an integer as all we need to do is return true
or false to tell whether we deleted a delayed item or not. Also the logic
is inverted since we return 1 (true) if we didn't delete and 0 (false) if
we did, which is somewhat counter intuitive. Change the return type to a
boolean and make it return true if we deleted and false otherwise.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: switch del_all argument of replay_dir_deletes() from int to bool
Filipe Manana [Thu, 29 May 2025 16:41:46 +0000 (17:41 +0100)]
btrfs: switch del_all argument of replay_dir_deletes() from int to bool

The argument has boolean semantics, so change its type from int to bool,
making it more clear.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: pass NULL index to btrfs_del_inode_ref() where not needed
Filipe Manana [Thu, 29 May 2025 15:59:07 +0000 (16:59 +0100)]
btrfs: pass NULL index to btrfs_del_inode_ref() where not needed

There are two callers of btrfs_del_inode_ref() that declare a local index
variable and then pass a pointer for it to btrfs_del_inode_ref(), but then
don't use that index at all. Since btrfs_del_inode_ref() accepts a NULL
index pointer, pass NULL instead and stop declaring those useless index
variables.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: allocate scratch eb earlier at btrfs_log_new_name()
Filipe Manana [Sun, 1 Jun 2025 14:39:16 +0000 (15:39 +0100)]
btrfs: allocate scratch eb earlier at btrfs_log_new_name()

Instead of allocating the scratch eb after joining the log transaction,
allocate it before so that we're not delaying log commits for longer
than necessary, as allocating the scratch eb means allocating an
extent_buffer structure, which comes from a dedicated kmem_cache, plus
pages/folios to attach to the eb. Both of these allocations may take time
when we're under memory pressure.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: allocate path earlier at btrfs_log_new_name()
Filipe Manana [Sat, 31 May 2025 15:33:14 +0000 (16:33 +0100)]
btrfs: allocate path earlier at btrfs_log_new_name()

Instead of allocating the path after joining the log transaction, allocate
it before so that we're not delaying log commits for the rare cases where
the allocation takes a significant time (under memory pressure and all
slabs are full, there's the need to allocate a new page, etc).

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: allocate path earlier at btrfs_del_dir_entries_in_log()
Filipe Manana [Thu, 29 May 2025 15:32:37 +0000 (16:32 +0100)]
btrfs: allocate path earlier at btrfs_del_dir_entries_in_log()

Instead of allocating the path after joining the log transaction, allocate
it before so that we're not delaying log commits for the rare cases where
the allocation takes a significant time (under memory pressure and all
slabs are full, there's the need to allocate a new page, etc).

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: assert we join log transaction at btrfs_del_dir_entries_in_log()
Filipe Manana [Thu, 29 May 2025 15:25:29 +0000 (16:25 +0100)]
btrfs: assert we join log transaction at btrfs_del_dir_entries_in_log()

We are supposed to be able to join a log transaction at that point, since
we have determined that the inode was logged in the current transaction
with the call to inode_logged(). So ASSERT() we joined a log transaction
and also warn if we didn't in case assertions are disabled (the kernel
config doesn't have CONFIG_BTRFS_ASSERT=y), so that the issue gets noticed
and reported if it ever happens.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: use btrfs_del_item() at del_logged_dentry()
Filipe Manana [Thu, 29 May 2025 15:12:45 +0000 (16:12 +0100)]
btrfs: use btrfs_del_item() at del_logged_dentry()

There's no need to use btrfs_delete_one_dir_name() at del_logged_dentry()
because we are processing a dir index key which can contain only a single
name, unlike dir item keys which can encode multiple names in case of name
hash collisions. We have explicitly looked up for a dir index key by
calling btrfs_lookup_dir_index_item() and we don't log dir item keys
anymore (since commit 339d03542484 ("btrfs: only copy dir index keys when
logging a directory")). So simplify and use btrfs_del_item() directly
instead of btrfs_delete_one_dir_name().

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: free path sooner at __btrfs_unlink_inode()
Filipe Manana [Thu, 29 May 2025 14:47:24 +0000 (15:47 +0100)]
btrfs: free path sooner at __btrfs_unlink_inode()

After calling btrfs_delete_one_dir_name() there's no need for the path
anymore so we can free it immediately after. After that point we do
some btree operations that take time and in those call chains we end up
allocating paths for these operations, so we're unnecessarily holding on
to the path we allocated early at __btrfs_unlink_inode().

So free the path as soon as we don't need it and add a comment. This
also allows to simplify the error path, removing two exit labels and
returning directly when an error happens.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: assert we join log transaction at btrfs_del_inode_ref_in_log()
Filipe Manana [Wed, 28 May 2025 14:28:26 +0000 (15:28 +0100)]
btrfs: assert we join log transaction at btrfs_del_inode_ref_in_log()

We are supposed to be able to join a log transaction at that point, since
we have determined that the inode was logged in the current transaction
with the call to inode_logged(). So ASSERT() we joined a log transaction
and also warn if we didn't in case assertions are disabled (the kernel
config doesn't have CONFIG_BTRFS_ASSERT=y), so that the issue gets noticed
and reported if it ever happens.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: open code fc_mount() to avoid releasing s_umount rw_sempahore
Al Viro [Tue, 6 May 2025 19:58:26 +0000 (20:58 +0100)]
btrfs: open code fc_mount() to avoid releasing s_umount rw_sempahore

[CURRENT BEHAVIOR]
Currently inside btrfs_get_tree_subvol(), we call fc_mount() to grab a
tree, then re-lock s_umount inside btrfs_reconfigure_for_mount() to
avoid race with remount.

However fc_mount() itself is just doing two things:

1. Call vfs_get_tree()
2. Release s_umount then call vfs_create_mount()

[ENHANCEMENT]
Instead of calling fc_mount(), we can open-code it with vfs_get_tree()
first.
This provides a benefit that, since we have the full control of
s_umount, we do not need to re-lock that rw_sempahore when calling
btrfs_reconfigure_for_mount(), meaning less race between RO/RW remount.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Qu Wenruo <wqu@suse.com>
[ Rework the subject and commit message, refactor the error handling ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Tested-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret in scrub_submit_extent_sector_read()
David Sterba [Fri, 30 May 2025 16:19:06 +0000 (18:19 +0200)]
btrfs: rename err to ret in scrub_submit_extent_sector_read()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret in btrfs_create_common()
David Sterba [Fri, 30 May 2025 16:19:03 +0000 (18:19 +0200)]
btrfs: rename err to ret in btrfs_create_common()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret in btrfs_wait_tree_log_extents()
David Sterba [Fri, 30 May 2025 16:18:57 +0000 (18:18 +0200)]
btrfs: rename err to ret in btrfs_wait_tree_log_extents()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret in btrfs_wait_extents()
David Sterba [Fri, 30 May 2025 16:18:55 +0000 (18:18 +0200)]
btrfs: rename err to ret in btrfs_wait_extents()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret in quota_override_store()
David Sterba [Fri, 30 May 2025 16:18:48 +0000 (18:18 +0200)]
btrfs: rename err to ret in quota_override_store()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret in btrfs_fill_super()
David Sterba [Fri, 30 May 2025 16:18:46 +0000 (18:18 +0200)]
btrfs: rename err to ret in btrfs_fill_super()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret in calc_pct_ratio()
David Sterba [Fri, 30 May 2025 16:18:40 +0000 (18:18 +0200)]
btrfs: rename err to ret in calc_pct_ratio()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret in btrfs_symlink()
David Sterba [Fri, 30 May 2025 16:18:33 +0000 (18:18 +0200)]
btrfs: rename err to ret in btrfs_symlink()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret in btrfs_link()
David Sterba [Fri, 30 May 2025 16:18:31 +0000 (18:18 +0200)]
btrfs: rename err to ret in btrfs_link()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret in btrfs_setattr()
David Sterba [Fri, 30 May 2025 16:18:29 +0000 (18:18 +0200)]
btrfs: rename err to ret in btrfs_setattr()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret in btrfs_init_inode_security()
David Sterba [Fri, 30 May 2025 16:18:18 +0000 (18:18 +0200)]
btrfs: rename err to ret in btrfs_init_inode_security()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret in btrfs_alloc_from_bitmap()
David Sterba [Fri, 30 May 2025 16:18:01 +0000 (18:18 +0200)]
btrfs: rename err to ret in btrfs_alloc_from_bitmap()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret in btrfs_lock_extent_bits()
David Sterba [Fri, 30 May 2025 16:17:58 +0000 (18:17 +0200)]
btrfs: rename err to ret in btrfs_lock_extent_bits()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret in btrfs_try_lock_extent_bits()
David Sterba [Fri, 30 May 2025 16:17:48 +0000 (18:17 +0200)]
btrfs: rename err to ret in btrfs_try_lock_extent_bits()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret2 in btrfs_truncate_inode_items()
David Sterba [Fri, 30 May 2025 16:17:46 +0000 (18:17 +0200)]
btrfs: rename err to ret2 in btrfs_truncate_inode_items()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret2 in btrfs_add_link()
David Sterba [Fri, 30 May 2025 16:17:39 +0000 (18:17 +0200)]
btrfs: rename err to ret2 in btrfs_add_link()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret2 in btrfs_setsize()
David Sterba [Fri, 30 May 2025 16:17:37 +0000 (18:17 +0200)]
btrfs: rename err to ret2 in btrfs_setsize()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret2 in btrfs_search_old_slot()
David Sterba [Fri, 30 May 2025 16:17:26 +0000 (18:17 +0200)]
btrfs: rename err to ret2 in btrfs_search_old_slot()

Unify naming of return value to the preferred way, move the variable to
the closest scope.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret2 in btrfs_search_slot()
David Sterba [Fri, 30 May 2025 16:17:24 +0000 (18:17 +0200)]
btrfs: rename err to ret2 in btrfs_search_slot()

Unify naming of return value to the preferred way, move the variable to
the closest scope.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret2 in search_leaf()
David Sterba [Fri, 30 May 2025 16:17:14 +0000 (18:17 +0200)]
btrfs: rename err to ret2 in search_leaf()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret2 in read_block_for_search()
David Sterba [Fri, 30 May 2025 16:17:11 +0000 (18:17 +0200)]
btrfs: rename err to ret2 in read_block_for_search()

Unify naming of return value to the preferred way.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename err to ret2 in resolve_indirect_refs()
David Sterba [Fri, 30 May 2025 16:17:05 +0000 (18:17 +0200)]
btrfs: rename err to ret2 in resolve_indirect_refs()

Unify naming of return value to the preferred way, move the variable to
the closest scope.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: rename btrfs_subpage structure
Qu Wenruo [Mon, 2 Jun 2025 00:38:53 +0000 (10:08 +0930)]
btrfs: rename btrfs_subpage structure

With the incoming large data folios support, the structure name
btrfs_subpage is no longer correct, as for we can have multiple blocks
inside a large folio, and the block size is still page size.

So to follow the schema of iomap, rename btrfs_subpage to
btrfs_folio_state, along with involved enums.

There are still exported functions with "btrfs_subpage_" prefix, and I
believe for metadata the name "subpage" will stay forever as we will
never allocate a folio larger than nodesize anyway.

The full cleanup of the word "subpage" will happen in much smaller steps
in the future.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: add comments on the extra btrfs specific subpage bitmaps
Qu Wenruo [Mon, 2 Jun 2025 00:38:52 +0000 (10:08 +0930)]
btrfs: add comments on the extra btrfs specific subpage bitmaps

Unlike the iomap_folio_state structure, the btrfs_subpage structure has a
lot of extra sub-bitmaps, namely:

- writeback sub-bitmap
- locked sub-bitmap
  iomap_folio_state uses an atomic for writeback tracking, while it has
  no per-block locked tracking.

  This is because iomap always locks a single folio, and submits dirty
  blocks with that folio locked.

  But btrfs has async delalloc ranges (for compression), which are queued
  with their range locked, until the compression is done, then marks the
  involved range writeback and unlocked.

  This means a range can be unlocked and marked writeback at seemingly
  random timing, thus it needs the extra tracking.

  This needs a huge rework on the lifespan of async delalloc range
  before we can remove/simplify these two sub-bitmaps.

- ordered sub-bitmap
- checked sub-bitmap
  These are for COW-fixup, but as I mentioned in the past, the COW-fixup
  is not really needed anymore and these two flags are already marked
  deprecated, and will be removed in the near future after comprehensive
  tests.

Add related comments to indicate we're actively trying to align the
sub-bitmaps to the iomap ones.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: harden parsing of compression mount options
Daniel Vacek [Mon, 2 Jun 2025 15:53:19 +0000 (17:53 +0200)]
btrfs: harden parsing of compression mount options

Btrfs happily but incorrectly accepts the `-o compress=zlib+foo` and similar
options with any random suffix.

Fix that by explicitly checking the end of the strings.

Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: factor out compression mount options parsing
Daniel Vacek [Mon, 2 Jun 2025 15:53:18 +0000 (17:53 +0200)]
btrfs: factor out compression mount options parsing

There are many options making the parsing a bit lengthy.  Factor the
compress options out into a helper function.  The next patch is going to
harden this function.

Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: constify more pointer parameters
David Sterba [Thu, 15 May 2025 15:03:24 +0000 (17:03 +0200)]
btrfs: constify more pointer parameters

Another batch of pointer parameter constifications. This is for clarity
and minor addition to type safety. There are no observable effects in the
assembly code and .ko measured on release config.

Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: sysfs: track current commit duration in commit_stats
Boris Burkov [Fri, 9 May 2025 00:54:41 +0000 (17:54 -0700)]
btrfs: sysfs: track current commit duration in commit_stats

When debugging/detecting outlier commit stalls, having an indicator that
we are currently in a long commit critical section can be very useful.
Extend the commit_stats sysfs file to also include the current commit
critical section duration.

Since this requires storing the last commit start time, use that rather
than a separate stack variable for storing the finished commit durations
as well.

This also requires slightly moving up the timing of the stats updating
to *inside* the critical section to avoid the transaction T+1 setting
the critical_section_start_time to 0 before transaction T can update its
stats, which would trigger the new ASSERT. This is an improvement in and
of itself, as it makes the stats more accurately represent the true
critical section time. It may be yet better to pull the stats up to where
start_transaction gets unblocked, rather than the next commit, but this
seems like a good enough place as well.

Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: use rb_find_add() in rb_simple_insert()
Pan Chuang [Fri, 16 May 2025 03:03:33 +0000 (11:03 +0800)]
btrfs: use rb_find_add() in rb_simple_insert()

Use the rb-tree helper so we don't open code the search and insert
code.

Signed-off-by: Pan Chuang <panchuang@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: pass struct rb_simple_node pointer directly in rb_simple_insert()
Pan Chuang [Fri, 16 May 2025 03:03:32 +0000 (11:03 +0800)]
btrfs: pass struct rb_simple_node pointer directly in rb_simple_insert()

Replace struct embedding with union to enable safe type conversion in
btrfs_backref_node, tree_block and mapping_node.

Adjust function calls to use the new unified API, eliminating redundant
parameters.

Signed-off-by: Pan Chuang <panchuang@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: use rb_find_add() in btrfs_qgroup_add_swapped_blocks()
Yangtao Li [Fri, 16 May 2025 03:03:31 +0000 (11:03 +0800)]
btrfs: use rb_find_add() in btrfs_qgroup_add_swapped_blocks()

Use the rb-tree helper so we don't open code the search and insert
code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Pan Chuang <panchuang@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: use rb_find() in btrfs_qgroup_trace_subtree_after_cow()
Yangtao Li [Fri, 16 May 2025 03:03:30 +0000 (11:03 +0800)]
btrfs: use rb_find() in btrfs_qgroup_trace_subtree_after_cow()

Use the rb-tree helper so we don't open code the search code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Pan Chuang <panchuang@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: use rb_find_add() in add_qgroup_rb()
Yangtao Li [Fri, 16 May 2025 03:03:29 +0000 (11:03 +0800)]
btrfs: use rb_find_add() in add_qgroup_rb()

Use the rb-tree helper so we don't open code the search and insert
code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Pan Chuang <panchuang@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: use rb_find() in find_qgroup_rb()
Yangtao Li [Fri, 16 May 2025 03:03:28 +0000 (11:03 +0800)]
btrfs: use rb_find() in find_qgroup_rb()

Use the rb-tree helper so we don't open code the search code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Pan Chuang <panchuang@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: use rb_find_add() in insert_ref_entry()
Yangtao Li [Fri, 16 May 2025 03:03:27 +0000 (11:03 +0800)]
btrfs: use rb_find_add() in insert_ref_entry()

Use the rb-tree helper so we don't open code the search and insert
code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Pan Chuang <panchuang@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: use rb_find_add() in insert_root_entry()
Yangtao Li [Fri, 16 May 2025 03:03:26 +0000 (11:03 +0800)]
btrfs: use rb_find_add() in insert_root_entry()

Use the rb-tree helper so we don't open code the search and insert
code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Pan Chuang <panchuang@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: use rb_find() in lookup_root_entry()
Yangtao Li [Fri, 16 May 2025 03:03:25 +0000 (11:03 +0800)]
btrfs: use rb_find() in lookup_root_entry()

Use the rb-tree helper so we don't open code the search code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Pan Chuang <panchuang@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: use rb_find_add() in insert_block_entry()
Yangtao Li [Fri, 16 May 2025 03:03:24 +0000 (11:03 +0800)]
btrfs: use rb_find_add() in insert_block_entry()

Use the rb-tree helper so we don't open code the search and insert
code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Pan Chuang <panchuang@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: use rb_find() in lookup_block_entry()
Yangtao Li [Fri, 16 May 2025 03:03:23 +0000 (11:03 +0800)]
btrfs: use rb_find() in lookup_block_entry()

Use the rb-tree helper so we don't open code the search code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Pan Chuang <panchuang@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: use rb_find_add() in ulist_rbtree_insert()
Yangtao Li [Fri, 16 May 2025 03:03:22 +0000 (11:03 +0800)]
btrfs: use rb_find_add() in ulist_rbtree_insert()

Use the rb-tree helper so we don't open code the search and insert
code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Pan Chuang <panchuang@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: use rb_find() in ulist_rbtree_search()
Yangtao Li [Fri, 16 May 2025 03:03:21 +0000 (11:03 +0800)]
btrfs: use rb_find() in ulist_rbtree_search()

Use the rb-tree helper so we don't open code the search code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Pan Chuang <panchuang@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: use rb_find() in __btrfs_lookup_delayed_item()
Yangtao Li [Fri, 16 May 2025 03:03:20 +0000 (11:03 +0800)]
btrfs: use rb_find() in __btrfs_lookup_delayed_item()

Use the rb-tree helper so we don't open code the search code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Pan Chuang <panchuang@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: use rb_find_add() in btrfs_insert_inode_defrag()
Yangtao Li [Fri, 16 May 2025 03:03:19 +0000 (11:03 +0800)]
btrfs: use rb_find_add() in btrfs_insert_inode_defrag()

Use the rb-tree helper so we don't open code the search and insert
code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Pan Chuang <panchuang@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: fix comment in reserved space warning
Dan Johnson [Tue, 15 Apr 2025 00:25:52 +0000 (17:25 -0700)]
btrfs: fix comment in reserved space warning

mkfs.btrfs up to v4.14 actually can leave a chunk inside the reserved
space when invoked with `-m single`, fixed by 997f9977c24397eb6980bb9
("mkfs: Prevent temporary system chunk to use space in reserved 1M
range") released with v4.15.

Signed-off-by: Dan Johnson <ComputerDruid@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: relocation: simplify unused logic related to LINK_LOWER
Daniel Vacek [Wed, 14 May 2025 13:12:39 +0000 (15:12 +0200)]
btrfs: relocation: simplify unused logic related to LINK_LOWER

btrfs_backref_link_edge() is always called with the LINK_LOWER argument.
We can simplify it and remove the LINK_LOWER and LINK_UPPER macros
completely.

The last call with LINK_UPPER was removed with commit 0097422c0dfe0a
("btrfs: remove clone_backref_node() from relocation").

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: unfold transaction abort at btrfs_insert_one_raid_extent()
Filipe Manana [Mon, 19 May 2025 11:16:10 +0000 (12:16 +0100)]
btrfs: unfold transaction abort at btrfs_insert_one_raid_extent()

We have a common error path where we abort the transaction, but like this
in case we get a transaction abort stack trace we don't know exactly which
previous function call failed. Instead abort the transaction after any
function call that returns an error, so that we can easily identify which
function failed.

Reviewed-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: unfold transaction abort at __btrfs_update_delayed_inode()
Filipe Manana [Mon, 19 May 2025 10:57:13 +0000 (11:57 +0100)]
btrfs: unfold transaction abort at __btrfs_update_delayed_inode()

We have a common error path where we abort the transaction, but like this
in case we get a transaction abort stack trace we don't know exactly which
previous function call failed. Instead abort the transaction after any
function call that returns an error, so that we can easily identify which
function failed.

Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: abort transaction on unexpected eb generation at btrfs_copy_root()
Filipe Manana [Mon, 19 May 2025 10:07:29 +0000 (11:07 +0100)]
btrfs: abort transaction on unexpected eb generation at btrfs_copy_root()

If we find an unexpected generation for the extent buffer we are cloning
at btrfs_copy_root(), we just WARN_ON() and don't error out and abort the
transaction, meaning we allow to persist metadata with an unexpected
generation. Instead of warning only, abort the transaction and return
-EUCLEAN.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: unfold transaction abort at btrfs_copy_root()
Filipe Manana [Mon, 19 May 2025 09:59:18 +0000 (10:59 +0100)]
btrfs: unfold transaction abort at btrfs_copy_root()

Instead of having a common btrfs_abort_transaction() call for when any of
the two btrfs_inc_ref() calls fail, move the btrfs_abort_transaction() to
happen immediately after each one of the calls, so that when analyzing a
stack trace with a transaction abort we know which call failed.

Reviewed-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: move transaction aborts to the error site in add_block_group_free_space()
David Sterba [Sat, 17 May 2025 19:03:57 +0000 (21:03 +0200)]
btrfs: move transaction aborts to the error site in add_block_group_free_space()

Transaction aborts should be done next to the place the error happens,
which was not done in add_block_group_free_space().

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: move transaction aborts to the error site in remove_block_group_free_space()
David Sterba [Sat, 17 May 2025 19:04:10 +0000 (21:04 +0200)]
btrfs: move transaction aborts to the error site in remove_block_group_free_space()

Transaction aborts should be done next to the place the error happens,
which was not done in remove_block_group_free_space().

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: simplify error detection flow during log replay
Filipe Manana [Wed, 21 May 2025 17:18:09 +0000 (18:18 +0100)]
btrfs: simplify error detection flow during log replay

We have this fuzzy logic at btrfs_recover_log_trees() where we don't
abort the transaction and exit immediately after each function call that
returned an error, and instead have if-then-else logic or check if the
previous function call returned success before calling the next function.

Make the flow more straightforward by immediately aborting the transaction
and exiting after each function call failure. This also allows to avoid
two consecutive if statements that test the same conditions:

   if (!ret && wc.stage == LOG_WALK_REPLAY_ALL) {
        (...)
   }

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: remove redundant path release when replaying a log tree
Filipe Manana [Wed, 21 May 2025 16:56:25 +0000 (17:56 +0100)]
btrfs: remove redundant path release when replaying a log tree

There's no need to call btrfs_release_path() before calling
btrfs_init_root_free_objectid() as we have released the path already at
the top of the loop and the previous call to fixup_inode_link_counts()
also releases the path. So remove it to simplify the code.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: abort transaction during log replay if walk_log_tree() failed
Filipe Manana [Wed, 21 May 2025 16:41:18 +0000 (17:41 +0100)]
btrfs: abort transaction during log replay if walk_log_tree() failed

If we failed walking a log tree during replay, we have a missing
transaction abort to prevent committing a transaction where we didn't
fully replay all the changes from a log tree and therefore can leave the
respective subvolume tree in some inconsistent state. So add the missing
transaction abort.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: unfold transaction aborts when replaying log trees
Filipe Manana [Wed, 21 May 2025 16:30:56 +0000 (17:30 +0100)]
btrfs: unfold transaction aborts when replaying log trees

We have a single line doing a transaction abort in case either we got an
error from btrfs_get_fs_root() different from -ENOENT or we got an error
from btrfs_pin_extent_for_log_replay(), making it hard to figure out which
function call failed when looking at a transaction abort massages and
stack trace in dmesg. Change this to have an explicit transaction abort
for each one of the two cases.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: make btrfs_should_periodic_reclaim() static
Johannes Thumshirn [Wed, 21 May 2025 10:14:05 +0000 (12:14 +0200)]
btrfs: make btrfs_should_periodic_reclaim() static

btrfs_should_periodic_reclaim() is not used outside of space-info.c so
make it static and remove the prototype from space-info.h.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
3 weeks agobtrfs: zoned: use filesystem size not disk size for reclaim decision
Johannes Thumshirn [Tue, 20 May 2025 07:20:47 +0000 (09:20 +0200)]
btrfs: zoned: use filesystem size not disk size for reclaim decision

When deciding if a zoned filesystem is reaching the threshold to reclaim
data block groups, look at the size of the filesystem not to potentially
total available size of all drives in the filesystem.

Especially if a filesystem was created with mkfs' -b option, constraining
it to only a portion of the block device, the numbers won't match and
potentially garbage collection is kicking in too late.

Fixes: 3687fcb0752a ("btrfs: zoned: make auto-reclaim less aggressive")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 weeks agobtrfs: unfold transaction abort at clone_copy_inline_extent()
Filipe Manana [Fri, 16 May 2025 18:37:44 +0000 (19:37 +0100)]
btrfs: unfold transaction abort at clone_copy_inline_extent()

We have a common error path where we abort the transaction, but like this
in case we get a transaction abort stack trace we don't know exactly which
previous function call failed. Instead abort the transaction after any
function call that returns an error, so that we can easily identify which
function failed.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 weeks agobtrfs: remove pointless 'out' label from clone_finish_inode_update()
Filipe Manana [Fri, 16 May 2025 18:13:53 +0000 (19:13 +0100)]
btrfs: remove pointless 'out' label from clone_finish_inode_update()

The label is only used once and we can instead return directly where it's
used, besides the fact that all we do under the label is to return the
value of 'ret'. So get rid of the label and return directly.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: David Sterba <dsterba@suse.com>
4 weeks agobtrfs: unfold transaction abort at walk_up_proc()
Filipe Manana [Fri, 16 May 2025 16:32:14 +0000 (17:32 +0100)]
btrfs: unfold transaction abort at walk_up_proc()

Instead of having a common btrfs_abort_transaction() call for when any of
the two btrfs_dec_ref() calls fail, move the btrfs_abort_transaction() to
happen immediately after each one of the calls, so that when analysing a
stack trace with a transaction abort we know which call failed.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: David Sterba <dsterba@suse.com>
4 weeks agobtrfs: unfold transaction abort at __btrfs_inc_extent_ref()
Filipe Manana [Fri, 16 May 2025 16:26:03 +0000 (17:26 +0100)]
btrfs: unfold transaction abort at __btrfs_inc_extent_ref()

Instead of having a common btrfs_abort_transaction() call for when either
insert_tree_block_ref() failed or when insert_extent_data_ref() failed,
move the btrfs_abort_transaction() to happen immediately after each one of
those calls, so that when analysing a stack trace with a transaction abort
we know which call failed.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: David Sterba <dsterba@suse.com>
4 weeks agobtrfs: unfold transaction aborts at btrfs_create_new_inode()
Filipe Manana [Fri, 16 May 2025 16:07:40 +0000 (17:07 +0100)]
btrfs: unfold transaction aborts at btrfs_create_new_inode()

Instead of having a common btrfs_abort_transaction() call for when either
btrfs_orphan_add() failed or when btrfs_add_link() failed, move the
btrfs_abort_transaction() to happen immediately after each one of those
calls, so that when analysing a stack trace with a transaction abort we
know which call failed.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: David Sterba <dsterba@suse.com>
4 weeks agoLinux 6.16-rc7
Linus Torvalds [Sun, 20 Jul 2025 22:18:33 +0000 (15:18 -0700)]
Linux 6.16-rc7

4 weeks agoMerge tag 'trace-v6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Sun, 20 Jul 2025 20:03:31 +0000 (13:03 -0700)]
Merge tag 'trace-v6.16-rc5' of git://git./linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix timerlat with use of FORTIFY_SOURCE

   FORTIFY_SOURCE was added to the stack tracer where it compares the
   entry->caller array to having entry->size elements.

   timerlat has the following:

      memcpy(&entry->caller, fstack->calls, size);
      entry->size = size;

   Which triggers FORTIFY_SOURCE as the caller is populated before the
   entry->size is initialized.

   Swap the order to satisfy FORTIFY_SOURCE logic.

 - Add down_write(trace_event_sem) when adding trace events in modules

   Trace events being added to the ftrace_events array are protected by
   the trace_event_sem semaphore. But when loading modules that have
   trace events, the addition of the events are not protected by the
   semaphore and loading two modules that have events at the same time
   can corrupt the list.

   Also add a lockdep_assert_held(trace_event_sem) to
   _trace_add_event_dirs() to confirm it is held when iterating the
   list.

* tag 'trace-v6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Add down_write(trace_event_sem) when adding trace event
  tracing/osnoise: Fix crash in timerlat_dump_stack()

4 weeks agoMerge tag 'i2c-for-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sun, 20 Jul 2025 19:56:13 +0000 (12:56 -0700)]
Merge tag 'i2c-for-6.16-rc7' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "omap:
   - add missing error check
   - fix PM disable in probe error path

  stm32:
   - unmap DMA buffer on transfer failure
   - use correct device when mapping and unmapping during transfers"

* tag 'i2c-for-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: stm32f7: unmap DMA mapped buffer
  i2c: stm32: fix the device used for the DMA map
  i2c: omap: Fix an error handling path in omap_i2c_probe()
  i2c: omap: Handle omap_i2c_init() errors in omap_i2c_probe()

4 weeks agoMerge tag 'x86-urgent-2025-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 20 Jul 2025 18:27:52 +0000 (11:27 -0700)]
Merge tag 'x86-urgent-2025-07-20' of git://git./linux/kernel/git/tip/tip

Pull x86 bug fix from Thomas Gleixner:
 "A single fix for a GCC wreckage, which emits a KCSAN instrumentation
  call in __sev_es_nmi_complete() despite the function being annotated
  with 'noinstr'.

  As all functions in that source file are noinstr, exclude the whole
  file from KCSAN in the Makefile to cure it"

* tag 'x86-urgent-2025-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Work around broken noinstr on GCC

4 weeks agoMerge tag 'locking-urgent-2025-07-20' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 20 Jul 2025 18:22:05 +0000 (11:22 -0700)]
Merge tag 'locking-urgent-2025-07-20' of git://git./linux/kernel/git/tip/tip

Pull locking fix from Thomas Gleixner:
 "A single fix for the futex selftest code to make 32-bit user space
  work correctly on 64-bit kernels.

  sys_futex_wait() expects a struct __kernel_timespec for the timeout,
  but the selftest uses struct timespec, which is the original 32-bit
  non 2038 compliant variant.

  Fix it up by converting the callsite supplied timespec to a
  __kernel_timespec and hand that into the syscall"

* tag 'locking-urgent-2025-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests/futex: Convert 32-bit timespec to 64-bit version for 32-bit compatibility mode

4 weeks agoMerge tag 'sched-urgent-2025-07-20' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 20 Jul 2025 18:08:51 +0000 (11:08 -0700)]
Merge tag 'sched-urgent-2025-07-20' of git://git./linux/kernel/git/tip/tip

Pull scheduler fix from Thomas Gleixner:
 "A single fix for the scheduler.

  A recent commit changed the runqueue counter nr_uninterruptible to an
  unsigned int. Due to the fact that the counters are not updated on
  migration of a uninterruptble task to a different CPU, these counters
  can exceed INT_MAX.

  The counter is cast to long in the load average calculation, which
  means that the cast expands into negative space resulting in bogus
  load average values.

  Convert it back to unsigned long to fix this.

* tag 'sched-urgent-2025-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Change nr_uninterruptible type to unsigned long

4 weeks agoMerge tag 'hyperv-fixes-signed-20250718' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 20 Jul 2025 16:29:43 +0000 (09:29 -0700)]
Merge tag 'hyperv-fixes-signed-20250718' of git://git./linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - Select use CONFIG_SYSFB only if EFI is enabled (Michael Kelley)

 - An assorted set of fixes to remove warnings for missing export.h
   header inclusion (Naman Jain)

 - An assorted set of fixes for when Linux run as the root partition
   for Microsoft Hypervisor (Mukesh Rathor, Nuno Das Neves, Stanislav
   Kinsburskii)

 - Fix the check for HYPERVISOR_CALLBACK_VECTOR (Naman Jain)

 - Fix fcopy tool to handle irregularities with size of ring buffer
   (Naman Jain)

 - Fix incorrect file path conversion in fcopy tool (Yasumasa Suenaga)

* tag 'hyperv-fixes-signed-20250718' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  tools/hv: fcopy: Fix irregularities with size of ring buffer
  PCI: hv: Use the correct hypercall for unmasking interrupts on nested
  x86/hyperv: Expose hv_map_msi_interrupt()
  Drivers: hv: Use nested hypercall for post message and signal event
  x86/hyperv: Clean up hv_map/unmap_interrupt() return values
  x86/hyperv: Fix usage of cpu_online_mask to get valid cpu
  PCI: hv: Don't load the driver for baremetal root partition
  net: mana: Fix warnings for missing export.h header inclusion
  PCI: hv: Fix warnings for missing export.h header inclusion
  clocksource: hyper-v: Fix warnings for missing export.h header inclusion
  x86/hyperv: Fix warnings for missing export.h header inclusion
  Drivers: hv: Fix warnings for missing export.h header inclusion
  Drivers: hv: Fix the check for HYPERVISOR_CALLBACK_VECTOR
  tools/hv: fcopy: Fix incorrect file path conversion
  Drivers: hv: Select CONFIG_SYSFB only if EFI is enabled

4 weeks agoMerge tag 'usb-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 20 Jul 2025 16:21:53 +0000 (09:21 -0700)]
Merge tag 'usb-6.16-rc6' of git://git./linux/kernel/git/gregkh/usb

Pull USB / Thunderbolt fixes from Greg KH:
 "Here are some USB and Thunderbolt driver fixes for reported problems
  for 6.16-rc6.  Included in here are:

   - Thunderbolt fixes for some much-reported issues

   - dwc2 driver fixes

   - dwc3 driver fixes

   - new usb-serial driver device ids

   - gadgetfs configfs fix

   - musb driver fix

   - USB hub driver fix

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'usb-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: hub: Don't try to recover devices lost during warm reset.
  usb: dwc2: gadget: Fix enter to hibernation for UTMI+ PHY
  usb: dwc3: qcom: Don't leave BCR asserted
  USB: serial: option: add Telit Cinterion FE910C04 (ECM) composition
  USB: serial: ftdi_sio: add support for NDI EMGUIDE GEMINI
  usb: gadget: configfs: Fix OOB read on empty string write
  usb: musb: fix gadget state on disconnect
  USB: serial: option: add Foxconn T99W640
  thunderbolt: Fix bit masking in tb_dp_port_set_hops()
  thunderbolt: Fix wake on connect at runtime

4 weeks agoMerge tag 'tty-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sun, 20 Jul 2025 16:14:32 +0000 (09:14 -0700)]
Merge tag 'tty-6.16-rc6' of git://git./linux/kernel/git/gregkh/tty

Pull serial driver fixes from Greg KH:
 "Here are two serial driver fixes for 6.16-rc6 that do:

   - fix for the serial core OF resource leak

   - pch_uart driver fix for a "incorrect variable" issue

  Both of these have been in linux-next for over a week with no reported
  problems"

* tag 'tty-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  pch_uart: Fix dma_sync_sg_for_device() nents value
  serial: core: fix OF node leak

4 weeks agoMerge tag 'staging-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 20 Jul 2025 16:08:55 +0000 (09:08 -0700)]
Merge tag 'staging-6.16-rc6' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
 "Here are some small driver fixes for the vchiq_arm staging driver:

   - reverts of previous changes that turned out to caused problems.

   - change to prevent a resource leak

  All of these have been in linux-next this week with no reported
  problems"

* tag 'staging-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: vchiq_arm: Make vchiq_shutdown never fail
  Revert "staging: vchiq_arm: Create keep-alive thread during probe"
  Revert "staging: vchiq_arm: Improve initial VCHIQ connect"

4 weeks agoMerge tag 'char-misc-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sun, 20 Jul 2025 16:03:41 +0000 (09:03 -0700)]
Merge tag 'char-misc-6.16-rc7' of git://git./linux/kernel/git/gregkh/char-misc

Pull char / misc / IIO fixes from Greg KH:
 "Here are some char/misc/iio and other driver fixes for 6.16-rc7.
  Included in here are:

   - IIO driver fixes for reported problems

   - Interconnect driver fixes for reported problems

   - nvmem driver fixes

   - bunch of comedi driver fixes for long-term bugs

   - Kconfig dependancy fixes for mux drivers

   - other small driver fixes for reported problems.

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'char-misc-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (35 commits)
  nvmem: layouts: u-boot-env: remove crc32 endianness conversion
  misc: amd-sbi: Explicitly clear in/out arg "mb_in_out"
  misc: amd-sbi: Address copy_to/from_user() warning reported in smatch
  misc: amd-sbi: Address potential integer overflow issue reported in smatch
  comedi: comedi_test: Fix possible deletion of uninitialized timers
  comedi: Fix initialization of data for instructions that write to subdevice
  comedi: Fix use of uninitialized data in insn_rw_emulate_bits()
  comedi: das6402: Fix bit shift out of bounds
  comedi: aio_iiro_16: Fix bit shift out of bounds
  comedi: pcl812: Fix bit shift out of bounds
  comedi: das16m1: Fix bit shift out of bounds
  comedi: Fix some signed shift left operations
  comedi: Fail COMEDI_INSNLIST ioctl if n_insns is too large
  nvmem: imx-ocotp: fix MAC address byte length
  MAINTAINERS: add miscdevice Rust abstractions
  mux: mmio: Fix missing CONFIG_REGMAP_MMIO
  iio: dac: ad3530r: Fix incorrect masking for channels 4-7 in powerdown mode
  iio: adc: ad7380: fix adi,gain-milli property parsing
  iio: adc: ad7949: use spi_is_bpw_supported()
  iio: accel: fxls8962af: Fix use after free in fxls8962af_fifo_flush
  ...

4 weeks agoMerge tag 'spi-fix-v6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brooni...
Linus Torvalds [Sun, 20 Jul 2025 15:58:58 +0000 (08:58 -0700)]
Merge tag 'spi-fix-v6.16-rc6' of git://git./linux/kernel/git/broonie/spi

Pull spi fix from Mark Brown:
 "A fix adding missing validation that 8 bit I/O mode is actually
  supported for the specific device when attempting to use it.

  Anything that runs into this should already have been having problems,
  enforcing this should just make things safer and more obvious"

* tag 'spi-fix-v6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: Add check for 8-bit transfer with 8 IO mode support

4 weeks agoMerge tag 'regmap-fix-v6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 20 Jul 2025 15:56:40 +0000 (08:56 -0700)]
Merge tag 'regmap-fix-v6.16-rc6' of git://git./linux/kernel/git/broonie/regmap

Pull regmap fix from Mark Brown:
 "A fix for a memory leak when we get an error during regmap init for a
  bus that uses free_on_exit to clean up device specific data"

* tag 'regmap-fix-v6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: fix potential memory leak of regmap_bus

4 weeks agoMerge tag 'input-for-v6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 20 Jul 2025 15:53:58 +0000 (08:53 -0700)]
Merge tag 'input-for-v6.16-rc6' of git://git./linux/kernel/git/dtor/input

Pull input fix from Dmitry Torokhov:

 - just a small fixup to the xpad driver correcting the recent addition
   of the Acer NGR200 controller

* tag 'input-for-v6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: xpad - set correct controller type for Acer NGR200

4 weeks agoInput: xpad - set correct controller type for Acer NGR200
Nilton Perim Neto [Sun, 20 Jul 2025 05:07:36 +0000 (22:07 -0700)]
Input: xpad - set correct controller type for Acer NGR200

The controller should have been set as XTYPE_XBOX360 and not XTYPE_XBOX.
Also the entry is in the wrong place. Fix it.

Reported-by: Vicki Pfau <vi@endrift.com>
Signed-off-by: Nilton Perim Neto <niltonperimneto@gmail.com>
Link: https://lore.kernel.org/r/20250708033126.26216-2-niltonperimneto@gmail.com
Fixes: 22c69d786ef8 ("Input: xpad - support Acer NGR 200 Controller")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
4 weeks agoMerge tag 'efi-fixes-for-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 19 Jul 2025 23:27:03 +0000 (16:27 -0700)]
Merge tag 'efi-fixes-for-v6.16-2' of git://git./linux/kernel/git/efi/efi

Pull EFI fix from Ard Biesheuvel:

 - Fix potential memory leak reported by kmemleak

* tag 'efi-fixes-for-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efivarfs: Fix memory leak of efivarfs_fs_info in fs_context error paths

4 weeks agotracing: Add down_write(trace_event_sem) when adding trace event
Steven Rostedt [Sat, 19 Jul 2025 02:31:58 +0000 (22:31 -0400)]
tracing: Add down_write(trace_event_sem) when adding trace event

When a module is loaded, it adds trace events defined by the module. It
may also need to modify the modules trace printk formats to replace enum
names with their values.

If two modules are loaded at the same time, the adding of the event to the
ftrace_events list can corrupt the walking of the list in the code that is
modifying the printk format strings and crash the kernel.

The addition of the event should take the trace_event_sem for write while
it adds the new event.

Also add a lockdep_assert_held() on that semaphore in
__trace_add_event_dirs() as it iterates the list.

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/20250718223158.799bfc0c@batman.local.home
Reported-by: Fusheng Huang(黄富生) <Fusheng.Huang@luxshare-ict.com>
Closes: https://lore.kernel.org/all/20250717105007.46ccd18f@batman.local.home/
Fixes: 110bf2b764eb6 ("tracing: add protection around module events unload")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
4 weeks agoMerge tag 'sched_ext-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 19 Jul 2025 17:40:30 +0000 (10:40 -0700)]
Merge tag 'sched_ext-for-6.16-rc6-fixes' of git://git./linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

 - Fix handling of migration disabled tasks in default idle selection

 - update_locked_rq() called __this_cpu_write() spuriously with NULL
   when @rq was not locked. As the writes were spurious, it didn't break
   anything directly. However, the function could be called in a
   preemptible leading to a context warning in __this_cpu_write(). Skip
   the spurious NULL writes.

 - Selftest fix on UP

* tag 'sched_ext-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: idle: Handle migration-disabled tasks in idle selection
  sched/ext: Prevent update_locked_rq() calls with NULL rq
  selftests/sched_ext: Fix exit selftest hang on UP

4 weeks agoMerge tag 'cgroup-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 19 Jul 2025 17:00:47 +0000 (10:00 -0700)]
Merge tag 'cgroup-for-6.16-rc6-fixes' of git://git./linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
 "An earlier commit to suppress a warning introduced a race condition
  where tasks can escape cgroup1 freezer. Revert the commit and simply
  remove the warning which was spurious to begin with"

* tag 'cgroup-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  Revert "cgroup_freezer: cgroup_freezing: Check if not frozen"
  sched,freezer: Remove unnecessary warning in __thaw_task

4 weeks agoMerge tag 'hwmon-for-v6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 19 Jul 2025 16:51:01 +0000 (09:51 -0700)]
Merge tag 'hwmon-for-v6.16-rc7' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - corsair-cpro: Validate the size of the received input buffer

 - ina238: Report energy in microjoules as expected by the ABI

 - pmbus/ucd9000: Fixed GPIO functionality

* tag 'hwmon-for-v6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (pmbus/ucd9000) Fix error in ucd9000_gpio_set
  hwmon: (ina238) Report energy in microjoules
  hwmon: (corsair-cpro) Validate the size of the received input buffer

4 weeks agoMerge tag 'rust-fixes-6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda...
Linus Torvalds [Sat, 19 Jul 2025 16:22:26 +0000 (09:22 -0700)]
Merge tag 'rust-fixes-6.16-2' of git://git./linux/kernel/git/ojeda/linux

Pull Rust fixes from Miguel Ojeda:
 "Toolchain and infrastructure:

   - Fix build and modpost confusion for the upcoming Rust 1.89.0
     release

   - Clean objtool warning for the upcoming Rust 1.89.0 release by
     adding one more noreturn function

  'kernel' crate:

   - Fix build error when using generics in the 'try_{,pin_}init!'
     macros"

* tag 'rust-fixes-6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
  rust: use `#[used(compiler)]` to fix build and `modpost` with Rust >= 1.89.0
  objtool/rust: add one more `noreturn` Rust function for Rust 1.89.0
  rust: init: Fix generics in *_init! macros

4 weeks agoMerge tag 'vfs-6.16-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Linus Torvalds [Sat, 19 Jul 2025 15:47:59 +0000 (08:47 -0700)]
Merge tag 'vfs-6.16-rc7.fixes' of git://git./linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Fix a memory leak in fcntl_dirnotify()

 - Raise SB_I_NOEXEC on secrement superblock instead of messing with
   flags on the mount

 - Add fsdevel and block mailing lists to uio entry. We had a few
   instances were very questionable stuff was added without either block
   or the VFS being aware of it

 - Fix netfs copy-to-cache so that it performs collection with
   ceph+fscache

 - Fix netfs race between cache write completion and ALL_QUEUED being
   set

 - Verify the inode mode when loading entries from disk in isofs

 - Avoid state_lock in iomap_set_range_uptodate()

 - Fix PIDFD_INFO_COREDUMP check in PIDFD_GET_INFO ioctl

 - Fix the incorrect return value in __cachefiles_write()

* tag 'vfs-6.16-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  MAINTAINERS: add block and fsdevel lists to iov_iter
  netfs: Fix race between cache write completion and ALL_QUEUED being set
  netfs: Fix copy-to-cache so that it performs collection with ceph+fscache
  fix a leak in fcntl_dirnotify()
  iomap: avoid unnecessary ifs_set_range_uptodate() with locks
  isofs: Verify inode mode when loading from disk
  cachefiles: Fix the incorrect return value in __cachefiles_write()
  secretmem: use SB_I_NOEXEC
  coredump: fix PIDFD_INFO_COREDUMP ioctl check

4 weeks agoMerge tag 'v6.16-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 19 Jul 2025 05:32:30 +0000 (22:32 -0700)]
Merge tag 'v6.16-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - fix creating special files to Samba when using SMB3.1.1 POSIX
   Extensions

 - fix incorrect caching on new file creation with directory leases
   enabled

 - two use after free fixes: one in oplock_break and one in async
   decryption

* tag 'v6.16-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  Fix SMB311 posix special file creation to servers which do not advertise reparse support
  smb: invalidate and close cached directory when creating child entries
  smb: client: fix use-after-free in crypt_message when using async crypto
  smb: client: fix use-after-free in cifs_oplock_break