linux-block.git
2 days ago bcachefs: fix the memory leak in exception case
Hongbo Li [Tue, 24 Sep 2024 01:41:46 +0000 (09:41 +0800)]
bcachefs: fix the memory leak in exception case

The pointer clean points to memory allocated by kmemdup(). When the
return value of bch2_sb_clean_validate_late() is nonzero, the memory
pointed to by clean is leaked, so we should free it in this case.

Fixes: a37ad1a3aba9 ("bcachefs: sb-clean.c")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
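
The leak pattern described in the commit above can be sketched in userspace C. This is only an illustration under stated assumptions: memdup(), release(), validate_late(), read_section() and struct section are hypothetical stand-ins, not the bcachefs functions, and malloc() takes the place of kmemdup().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a superblock section. */
struct section {
	int magic;
};

/* Number of buffers currently allocated, so a leak is observable. */
static int live_allocs;

/* Userspace analogue of kmemdup(): allocate and copy len bytes. */
static void *memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p) {
		memcpy(p, src, len);
		live_allocs++;
	}
	return p;
}

static void release(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/* Hypothetical late validation: 0 on success, -1 on bad data. */
static int validate_late(const struct section *s)
{
	return s->magic == 42 ? 0 : -1;
}

/*
 * Returns a private copy of src, or NULL on failure. The copy is
 * released when validation fails, so no error path leaks it.
 */
static struct section *read_section(const struct section *src)
{
	struct section *copy;

	assert(src != NULL);

	copy = memdup(src, sizeof(*src));
	if (!copy)
		return NULL;

	if (validate_late(copy)) {
		release(copy);	/* the kind of step the fix adds */
		return NULL;
	}
	return copy;
}
```

Without the release() call in the failure branch, live_allocs would stay at 1 after a failed validation, which is the userspace equivalent of the leak being fixed.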
2 days ago bcachefs: fast exit when darray_make_room failed
Hongbo Li [Tue, 24 Sep 2024 01:42:24 +0000 (09:42 +0800)]
bcachefs: fast exit when darray_make_room failed

In downgrade_table_extra(), the return value of darray_make_room() must
be checked. When it fails, we should exit immediately.

Fixes: 7773df19c35f ("bcachefs: metadata version bucket_stripe_sectors")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 days ago bcachefs: Fix iterator leak in check_subvol()
Kent Overstreet [Tue, 24 Sep 2024 02:05:14 +0000 (22:05 -0400)]
bcachefs: Fix iterator leak in check_subvol()

A couple small error handling fixes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 days ago bcachefs: Add snapshot to bch_inode_unpacked
Kent Overstreet [Tue, 24 Sep 2024 02:06:04 +0000 (22:06 -0400)]
bcachefs: Add snapshot to bch_inode_unpacked

this allows for various cleanups in fsck

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 days ago bcachefs: assign return error when iterating through layout
Diogo Jahchan Koike [Mon, 23 Sep 2024 22:22:14 +0000 (19:22 -0300)]
bcachefs: assign return error when iterating through layout

syzbot reported a null ptr deref in __copy_user [0]

In __bch2_read_super, when a corrupt backup superblock matches the
default opts offset, no error is assigned to ret and the freed superblock
gets through, possibly being assigned as the best sb in bch2_fs_open and
being later dereferenced, causing a fault. Assign EINVALID to ret when
iterating through layout.

[0]: https://syzkaller.appspot.com/bug?extid=18a5c5e8a9c856944876

Reported-by: syzbot+18a5c5e8a9c856944876@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=18a5c5e8a9c856944876
Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 days ago bcachefs: Fix srcu warning in check_topology
Kent Overstreet [Mon, 23 Sep 2024 22:42:39 +0000 (18:42 -0400)]
bcachefs: Fix srcu warning in check_topology

check_topology doesn't need the srcu lock and doesn't use normal btree
transactions - we can just drop the srcu lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 days ago bcachefs: Fix error path in check_dirent_inode_dirent()
Kent Overstreet [Mon, 23 Sep 2024 22:41:46 +0000 (18:41 -0400)]
bcachefs: Fix error path in check_dirent_inode_dirent()

fsck_err() jumps to the fsck_err label when bailing out; need to make
sure bp_iter was initialized...

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 days ago bcachefs: memset bounce buffer portion to 0 after key_sort_fix_overlapping
Piotr Zalewski [Sun, 22 Sep 2024 15:18:01 +0000 (15:18 +0000)]
bcachefs: memset bounce buffer portion to 0 after key_sort_fix_overlapping

Zero-initialize the part of the allocated bounce buffer that wasn't touched
by the subsequent bch2_key_sort_fix_overlapping() call, to mitigate a later
uninit-value use reported by KMSAN[1].

After applying the patch, the reproducer still triggers a stack overflow[2],
but it seems unrelated to the uninit-value use warning. Further
investigation found that the stack overflow occurs because KMSAN adds
too many function calls[3]. A backtrace of where the stack magic number gets
smashed was added as a reply to the syzkaller thread[3].

It was confirmed that the task's stack magic number gets smashed after the
code path where KMSAN detects the uninit-value use is executed, so it can be
assumed that it doesn't contribute in any way to uninit-value use detection.

[1] https://syzkaller.appspot.com/bug?extid=6f655a60d3244d0c6718
[2] https://lore.kernel.org/lkml/66e57e46.050a0220.115905.0002.GAE@google.com
[3] https://lore.kernel.org/all/rVaWgPULej8K7HqMPNIu8kVNyXNjjCiTB-QBtItLFBmk0alH6fV2tk4joVPk97Evnuv4ZRDd8HB5uDCkiFG6u81xKdzDj-KrtIMJSlF6Kt8=@proton.me

Reported-by: syzbot+6f655a60d3244d0c6718@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6f655a60d3244d0c6718
Fixes: ec4edd7b9d20 ("bcachefs: Prep work for variable size btree node buffers")
Suggested-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Piotr Zalewski <pZ010001011111@proton.me>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 days ago bcachefs: Improve bch2_is_inode_open() warning message
Kent Overstreet [Mon, 23 Sep 2024 21:33:02 +0000 (17:33 -0400)]
bcachefs: Improve bch2_is_inode_open() warning message

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 days ago bcachefs: Add extra padding in bkey_make_mut_noupdate()
Kent Overstreet [Mon, 23 Sep 2024 21:30:59 +0000 (17:30 -0400)]
bcachefs: Add extra padding in bkey_make_mut_noupdate()

This fixes a kasan splat in propagate_key_to_snapshot_leaves() -
varint_decode_fast() does reads (that it never uses) up to 7 bytes past
the end of the integer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
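
To illustrate why a fast integer decoder can need padding after the buffer, here is a minimal sketch. It uses a toy length-prefixed encoding invented for this example, not the bcachefs varint format, and toy_encode(), toy_decode_fast() and DECODE_OVERREAD are hypothetical names. The decoder always loads eight payload bytes and masks off the ones it does not need, so a one-byte value causes a read of seven bytes past its end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Worst-case number of bytes the fast decoder reads past the value. */
#define DECODE_OVERREAD 7

/*
 * Toy encoding: one length byte n (1..8) followed by n little-endian
 * payload bytes. Returns the number of bytes written.
 */
static size_t toy_encode(uint8_t *out, uint64_t v)
{
	size_t n = 1;

	while (n < 8 && (v >> (8 * n)))
		n++;

	out[0] = (uint8_t)n;
	for (size_t i = 0; i < n; i++)
		out[1 + i] = (uint8_t)(v >> (8 * i));
	return 1 + n;
}

/*
 * Fast decode: unconditionally loads 8 payload bytes, then masks the
 * result down to n bytes. The caller must guarantee that the buffer
 * has DECODE_OVERREAD readable bytes after the encoded value.
 * Returns the number of bytes consumed.
 */
static size_t toy_decode_fast(const uint8_t *in, uint64_t *v)
{
	size_t n = in[0];
	uint64_t word = 0;

	assert(n >= 1 && n <= 8);

	for (size_t i = 0; i < 8; i++)
		word |= (uint64_t)in[1 + i] << (8 * i);

	*v = n == 8 ? word : word & ((UINT64_C(1) << (8 * n)) - 1);
	return 1 + n;
}
```

The extra bytes never influence the result, but tools such as KASAN still flag the access if it lands outside the allocation, which is why padding the buffer is a reasonable fix.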
2 days ago bcachefs: Mark inode errors as autofix
Kent Overstreet [Mon, 23 Sep 2024 20:40:47 +0000 (16:40 -0400)]
bcachefs: Mark inode errors as autofix

Most or all errors will be autofix in the future, we're currently just
doing the ones that we know are well tested.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 days ago bcachefs: Fix infinite loop in propagate_key_to_snapshot_leaves()
Kent Overstreet [Mon, 23 Sep 2024 20:39:49 +0000 (16:39 -0400)]
bcachefs: Fix infinite loop in propagate_key_to_snapshot_leaves()

As we iterate we need to mark that we no longer need iterators -
otherwise we'll infinite loop via the "too many iters" check when
there are many snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 days ago bcachefs: Ensure BCH_FS_accounting_replay_done is always set
Kent Overstreet [Sun, 22 Sep 2024 06:10:30 +0000 (02:10 -0400)]
bcachefs: Ensure BCH_FS_accounting_replay_done is always set

if it doesn't get set we'll never be able to flush the btree write
buffer; this only happens in fake rw mode, but prevents us from shutting
down.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: Hold read lock in bch2_snapshot_tree_oldest_subvol()
Ahmed Ehab [Sat, 21 Sep 2024 21:00:36 +0000 (00:00 +0300)]
bcachefs: Hold read lock in bch2_snapshot_tree_oldest_subvol()

Syzbot reports a problem that a warning is triggered due to suspicious
use of rcu_dereference_check(). That is triggered by a call of
bch2_snapshot_tree_oldest_subvol().

The cause of the warning is that inside
bch2_snapshot_tree_oldest_subvol(), snapshot_t() is called which calls
rcu_dereference() that requires a read lock to be held. Also, the call
of bch2_snapshot_tree_next() eventually calls snapshot_t().

To fix this, call rcu_read_lock() before calling snapshot_t(). Then,
release the lock after the termination of the while loop.

Reported-by: <syzbot+f7c41a878676b72c16a6@syzkaller.appspotmail.com>
Signed-off-by: Ahmed Ehab <bottaawesome633@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: return err ptr instead of null in read sb clean
Diogo Jahchan Koike [Tue, 10 Sep 2024 21:18:34 +0000 (18:18 -0300)]
bcachefs: return err ptr instead of null in read sb clean

syzbot reported a null-ptr-deref in bch2_fs_start. [0]

When a sb is marked clean but doesn't have a clean section
bch2_read_superblock_clean returns NULL which PTR_ERR_OR_ZERO
lets through, eventually leading to a null ptr dereference down
the line. Adjust read sb clean to return an ERR_PTR indicating the
invalid clean section.

[0] https://syzkaller.appspot.com/bug?extid=1cecc37d87c4286e5543

Reported-by: syzbot+1cecc37d87c4286e5543@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1cecc37d87c4286e5543
Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
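
The reason NULL slips past PTR_ERR_OR_ZERO can be shown with userspace analogues of the kernel's error-pointer helpers. This is a sketch under stated assumptions: the helpers below are simplified reimplementations written for this example, find_clean_old() and find_clean_new() are hypothetical, and -EINVAL is an assumed error code rather than the one the patch uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace analogues of the kernel's <linux/err.h> helpers. */
static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline long PTR_ERR_OR_ZERO(const void *ptr)
{
	return IS_ERR(ptr) ? PTR_ERR(ptr) : 0;
}

struct clean_section {
	int present;
};

/* Old behaviour: a missing section is reported as NULL. */
static struct clean_section *find_clean_old(struct clean_section *sec)
{
	return sec;	/* NULL when the section is absent */
}

/* New behaviour: a missing section is reported as an error pointer. */
static struct clean_section *find_clean_new(struct clean_section *sec)
{
	return sec ? sec : ERR_PTR(-EINVAL);
}
```

NULL is not in the error-pointer range, so PTR_ERR_OR_ZERO(NULL) evaluates to 0 and a caller that only checks that value treats the missing section as success.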
8 days ago bcachefs: Remove duplicated include in backpointers.c
Yang Li [Mon, 9 Sep 2024 00:58:02 +0000 (08:58 +0800)]
bcachefs: Remove duplicated include in backpointers.c

The header file bbpos.h is included twice in backpointers.c,
so one inclusion can be removed.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=10783
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: Don't drop devices with stripe pointers
Kent Overstreet [Sat, 7 Sep 2024 00:22:26 +0000 (20:22 -0400)]
bcachefs: Don't drop devices with stripe pointers

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices
Kent Overstreet [Fri, 6 Sep 2024 23:14:36 +0000 (19:14 -0400)]
bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices

This factors out ec_stripe_head_devs_update(), which initializes the
bitmap of devices we're allocating from, and runs it every time
c->rw_devs_change_count changes.

We also cancel pending, not allocated stripes, since they may refer to
devices that are no longer available.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: bch_fs.rw_devs_change_count
Kent Overstreet [Fri, 6 Sep 2024 23:12:53 +0000 (19:12 -0400)]
bcachefs: bch_fs.rw_devs_change_count

Add a counter that's incremented whenever rw devices change; this will
be used for erasure coding so that it can keep ec_stripe_head in sync
and not deadlock on a new stripe when a device it wants goes away.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
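
The change-counter idea can be sketched as follows. Only the name rw_devs_change_count comes from the commit; struct fs, struct stripe_head, their other fields, and both functions are hypothetical and exist just to show how a cached device bitmap can be kept in sync with a counter.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEVS 8

/* Hypothetical filesystem state. */
struct fs {
	unsigned rw_devs;		/* bitmap of read-write devices */
	unsigned rw_devs_change_count;	/* bumped on every change */
};

/* Hypothetical consumer that caches the device bitmap. */
struct stripe_head {
	unsigned devs;			/* cached copy of usable devices */
	unsigned seen_change_count;	/* counter value when devs was cached */
};

/* Mark a device read-write or not, and record that something changed. */
static void fs_set_dev_rw(struct fs *c, unsigned dev, bool rw)
{
	assert(dev < MAX_DEVS);

	if (rw)
		c->rw_devs |= 1u << dev;
	else
		c->rw_devs &= ~(1u << dev);
	c->rw_devs_change_count++;
}

/*
 * Refresh the cached bitmap only if the counter moved since the last
 * refresh. Returns true when a refresh happened.
 */
static bool stripe_head_sync(struct stripe_head *h, const struct fs *c)
{
	if (h->seen_change_count == c->rw_devs_change_count)
		return false;

	h->devs = c->rw_devs;
	h->seen_change_count = c->rw_devs_change_count;
	return true;
}
```

Comparing one integer is cheap enough to do on every allocation attempt, so the consumer notices a removed device before it waits on it.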
8 days ago bcachefs: bch2_dev_remove_stripes()
Kent Overstreet [Sun, 1 Sep 2024 22:35:52 +0000 (18:35 -0400)]
bcachefs: bch2_dev_remove_stripes()

We can now correctly force-remove a device that has stripes on it; this
uses the new BCH_SB_MEMBER_INVALID sentinel value.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: bch2_trigger_ptr() calculates sectors even when no device
Kent Overstreet [Sun, 8 Sep 2024 01:51:46 +0000 (21:51 -0400)]
bcachefs: bch2_trigger_ptr() calculates sectors even when no device

This is necessary for erasure coded pointers to devices that have been
removed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: improve error messages in bch2_ec_read_extent()
Kent Overstreet [Sat, 7 Sep 2024 20:31:47 +0000 (16:31 -0400)]
bcachefs: improve error messages in bch2_ec_read_extent()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: improve error message on too few devices for ec
Kent Overstreet [Sun, 1 Sep 2024 21:42:01 +0000 (17:42 -0400)]
bcachefs: improve error message on too few devices for ec

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: improve bch2_new_stripe_to_text()
Kent Overstreet [Sun, 1 Sep 2024 20:45:34 +0000 (16:45 -0400)]
bcachefs: improve bch2_new_stripe_to_text()

also print out the new stripe key

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: ec_stripe_head.nr_created
Kent Overstreet [Sun, 1 Sep 2024 20:44:36 +0000 (16:44 -0400)]
bcachefs: ec_stripe_head.nr_created

additional debug stat

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: bch_stripe.disk_label
Kent Overstreet [Sun, 1 Sep 2024 18:54:42 +0000 (14:54 -0400)]
bcachefs: bch_stripe.disk_label

When reshaping existing stripes, we should keep them on the same target
that they were allocated on; to do this, we need to add a field to the
btree stripe type.

This is a tad awkward, because we only have 8 bits left, and targets are
16 bits - but we only need to store a label, not a full target.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: stripe_to_mem()
Kent Overstreet [Sun, 1 Sep 2024 18:51:45 +0000 (14:51 -0400)]
bcachefs: stripe_to_mem()

factor out a common helper

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: EIO errcode cleanup
Kent Overstreet [Wed, 4 Sep 2024 21:51:47 +0000 (17:51 -0400)]
bcachefs: EIO errcode cleanup

We want to be using private errcodes whenever possible, for better error
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: Rework btree node pinning
Kent Overstreet [Thu, 5 Sep 2024 00:49:37 +0000 (20:49 -0400)]
bcachefs: Rework btree node pinning

In backpointers fsck, we do a sequential scan of one btree, and check
references to another: extents <-> backpointers

Checking references generates random lookups, so we want to pin that
btree in memory (or only a range, if it doesn't fit in ram).

Previously, this was done with a simple check in the shrinker - "if
btree node is in range being pinned, don't free it" - but this generated
OOMs, as our shrinker wasn't well behaved if there was less memory
available than expected.

Instead, we now have two different shrinkers and lru lists; the second
shrinker being for pinned nodes, with seeks set much higher than normal
- so they can still be freed if necessary, but we'll prefer not to.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: split up btree cache counters for live, freeable
Kent Overstreet [Thu, 5 Sep 2024 23:37:56 +0000 (19:37 -0400)]
bcachefs: split up btree cache counters for live, freeable

this is prep for introducing a second live list and shrinker for pinned
nodes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: btree cache counters should be size_t
Kent Overstreet [Thu, 5 Sep 2024 23:25:01 +0000 (19:25 -0400)]
bcachefs: btree cache counters should be size_t

32 bits won't overflow any time soon, but size_t is the correct type for
counting objects in memory.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: Don't count "skipped access bit" as touched in btree cache scan
Kent Overstreet [Wed, 4 Sep 2024 21:19:24 +0000 (17:19 -0400)]
bcachefs: Don't count "skipped access bit" as touched in btree cache scan

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: Failed devices no longer require mounting in degraded mode
Kent Overstreet [Sat, 7 Sep 2024 15:45:21 +0000 (11:45 -0400)]
bcachefs: Failed devices no longer require mounting in degraded mode

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: bch2_dev_rcu_noerror()
Kent Overstreet [Sun, 1 Sep 2024 22:12:26 +0000 (18:12 -0400)]
bcachefs: bch2_dev_rcu_noerror()

bch2_dev_rcu() now properly errors if the device is invalid

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: Progress indicator for extents_to_backpointers
Kent Overstreet [Wed, 28 Aug 2024 00:21:03 +0000 (20:21 -0400)]
bcachefs: Progress indicator for extents_to_backpointers

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: bch2_opts_to_text()
Kent Overstreet [Sun, 8 Sep 2024 00:27:23 +0000 (20:27 -0400)]
bcachefs: bch2_opts_to_text()

Factor out bch2_show_options() into a generic helper, for debugging
option passing issues.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: improve "no device to read from" message
Kent Overstreet [Fri, 6 Sep 2024 22:32:49 +0000 (18:32 -0400)]
bcachefs: improve "no device to read from" message

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: Fix compilation error for bch2_sb_member_alloc
Hongbo Li [Wed, 4 Sep 2024 07:15:32 +0000 (15:15 +0800)]
bcachefs: Fix compilation error for bch2_sb_member_alloc

Fix the following compilation error:

```
fs/bcachefs/sb-members.c: In function ‘bch2_sb_member_alloc’:
fs/bcachefs/sb-members.c:508:2: error: a label can only be part of a statement and a declaration is not a statement
  508 |  unsigned nr_devices = max_t(unsigned, dev_idx + 1, c->sb.nr_devices);
```

Fixes: a7d364a133c7 ("bcachefs: bch2_sb_member_alloc()")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
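
The compiler error above comes from a C rule that held before C23: a label must be followed by a statement, and a declaration is not a statement. A minimal sketch of one common workaround, an empty statement after the label, is shown below. The function nr_devices_after_alloc() and its logic are hypothetical, and the actual patch may have resolved the error differently, for example by moving the declaration.

```c
#include <assert.h>

/*
 * Hypothetical helper: returns how many device slots are needed once
 * dev_idx is in use. A negative dev_idx means "append a new slot".
 */
static unsigned nr_devices_after_alloc(int dev_idx, unsigned nr_devices)
{
	if (dev_idx >= 0)
		goto have_slot;

	dev_idx = (int)nr_devices;	/* no slot requested: append one */
have_slot:
	;	/* empty statement, so the label is not followed by a declaration */
	unsigned needed = (unsigned)dev_idx + 1;

	return needed > nr_devices ? needed : nr_devices;
}
```

Removing the lone semicolon reproduces the "a label can only be part of a statement" error on compilers that follow the pre-C23 grammar.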
8 days ago bcachefs: bch2_sb_member_alloc()
Kent Overstreet [Sun, 1 Sep 2024 22:08:25 +0000 (18:08 -0400)]
bcachefs: bch2_sb_member_alloc()

refactoring

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: bch2_dev_remove_alloc() -> alloc_background.c
Kent Overstreet [Sun, 1 Sep 2024 21:56:27 +0000 (17:56 -0400)]
bcachefs: bch2_dev_remove_alloc() -> alloc_background.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: Move tabstop setup to bch2_dev_usage_to_text()
Kent Overstreet [Wed, 4 Sep 2024 21:51:16 +0000 (17:51 -0400)]
bcachefs: Move tabstop setup to bch2_dev_usage_to_text()

No reason for it not to be where it's needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: Options for recovery_passes, recovery_passes_exclude
Kent Overstreet [Mon, 2 Sep 2024 02:39:42 +0000 (22:39 -0400)]
bcachefs: Options for recovery_passes, recovery_passes_exclude

This adds mount options for specifying recovery passes to run, or
exclude; the immediate need for this is that backpointers fsck is having
trouble completing, so we need a way to skip it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: Use mm_account_reclaimed_pages() when freeing btree nodes
Kent Overstreet [Wed, 4 Sep 2024 19:30:48 +0000 (15:30 -0400)]
bcachefs: Use mm_account_reclaimed_pages() when freeing btree nodes

When freeing in a shrinker callback, we need to notify memory reclaim,
so it knows forward progress has been made.

Normally this is done in e.g. slab code, but we're not freeing through
slab - or rather we are, but these allocations are big, and use the
kmalloc_large() path.

This is really a bug in the slub code, but we're working around it here
for now.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: Use __GFP_ACCOUNT for reclaimable memory
Kent Overstreet [Tue, 3 Sep 2024 21:42:53 +0000 (17:42 -0400)]
bcachefs: Use __GFP_ACCOUNT for reclaimable memory

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: Hook up RENAME_WHITEOUT in rename.
Sasha Finkelstein [Sun, 18 Aug 2024 17:09:02 +0000 (19:09 +0200)]
bcachefs: Hook up RENAME_WHITEOUT in rename.

This is needed for overlayfs, which is used by container managers.

Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: rebalance writes use BCH_WRITE_ONLY_SPECIFIED_DEVS
Kent Overstreet [Sun, 1 Sep 2024 20:55:35 +0000 (16:55 -0400)]
bcachefs: rebalance writes use BCH_WRITE_ONLY_SPECIFIED_DEVS

this was an oversight: rebalance is moving data to a specific device, so
we don't want it falling back to the full filesystem

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: BCH_WRITE_ALLOC_NOWAIT no longer applies to open bucket allocation
Kent Overstreet [Sun, 1 Sep 2024 21:32:22 +0000 (17:32 -0400)]
bcachefs: BCH_WRITE_ALLOC_NOWAIT no longer applies to open bucket allocation

rebalance writes must be BCH_WRITE_ALLOC_NOWAIT because they don't
allocate from the full filesystem - but we don't want spurious
allocation failures due to open buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: fix prototype to bch2_alloc_sectors_start_trans()
Kent Overstreet [Sun, 1 Sep 2024 21:06:28 +0000 (17:06 -0400)]
bcachefs: fix prototype to bch2_alloc_sectors_start_trans()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: kill redundant is_vmalloc_addr()
Kent Overstreet [Sun, 1 Sep 2024 19:09:11 +0000 (15:09 -0400)]
bcachefs: kill redundant is_vmalloc_addr()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: convert __bch2_encrypt_bio() to darray
Kent Overstreet [Sun, 1 Sep 2024 19:33:17 +0000 (15:33 -0400)]
bcachefs: convert __bch2_encrypt_bio() to darray

like the previous patch, kill use of bare arrays; the encryption code
likes to work in big batches, so this is a small performance
improvement.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: do_encrypt() now handles allocation failures
Kent Overstreet [Sun, 1 Sep 2024 19:24:11 +0000 (15:24 -0400)]
bcachefs: do_encrypt() now handles allocation failures

convert to darray, and add a fallback when allocation fails

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 days ago bcachefs: Add pinned to btree cache not freed counters
Kent Overstreet [Sun, 1 Sep 2024 17:36:42 +0000 (13:36 -0400)]
bcachefs: Add pinned to btree cache not freed counters

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Annotate bch_replicas_entry_{v0,v1} with __counted_by()
Thorsten Blum [Mon, 26 Aug 2024 10:11:36 +0000 (12:11 +0200)]
bcachefs: Annotate bch_replicas_entry_{v0,v1} with __counted_by()

Add the __counted_by compiler attribute to the flexible array members
devs to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.

Increment nr_devs before adding a new device to the devs array and
adjust the array indexes accordingly. Add a helper macro for adding a
new device.

In bch2_journal_read(), explicitly set nr_devs to 0.

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
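
The ordering rule described above, incrementing the count before writing the new element, can be sketched in userspace C. COUNTED_BY is a hypothetical stand-in for the kernel's __counted_by macro that expands to nothing when the compiler lacks the attribute, and struct replicas_entry, entry_alloc() and entry_add_dev() are simplified illustrations rather than the bcachefs definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Use the counted_by attribute where the compiler provides it. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

#define MAX_DEVS 16

/* Simplified entry: devs[] holds nr_devs valid elements. */
struct replicas_entry {
	uint8_t data_type;
	uint8_t nr_devs;
	uint8_t devs[] COUNTED_BY(nr_devs);
};

/* Allocate room for MAX_DEVS devices and start with an empty array. */
static struct replicas_entry *entry_alloc(void)
{
	struct replicas_entry *e = malloc(sizeof(*e) + MAX_DEVS);

	if (e) {
		e->data_type = 0;
		e->nr_devs = 0;	/* set the bound explicitly before any access */
	}
	return e;
}

/* Append a device: grow the declared bound first, then store inside it. */
static void entry_add_dev(struct replicas_entry *e, uint8_t dev)
{
	assert(e->nr_devs < MAX_DEVS);

	e->nr_devs++;
	e->devs[e->nr_devs - 1] = dev;
}
```

Writing devs[nr_devs] and incrementing afterwards would index one past the declared bound, which is the access a bounds sanitizer is meant to catch.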
2 weeks ago bcachefs: support idmap mounts
Hongbo Li [Sat, 24 Aug 2024 01:27:24 +0000 (09:27 +0800)]
bcachefs: support idmap mounts

We enable idmapped mounts for bcachefs. Here, we just pass down
the user_namespace argument from the VFS methods to the relevant
helpers.

The idmap test in bcachefs is as follows:

```
1. losetup /dev/loop1 bcachefs.img
2. ./bcachefs format /dev/loop1
3. mount -t bcachefs /dev/loop1 /mnt/bcachefs/
4. ./mount-idmapped --map-mount b:0:1000:1 /mnt/bcachefs /mnt/idmapped1/

ll /mnt/bcachefs
total 2
drwx------. 2 root root    0 Jun 14 14:10 lost+found
-rw-r--r--. 1 root root 1945 Jun 14 14:12 profile

ll /mnt/idmapped1/

total 2
drwx------. 2 1000 1000    0 Jun 14 14:10 lost+found
-rw-r--r--. 1 1000 1000 1945 Jun 14 14:12 profile
```

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Annotate struct bch_xattr with __counted_by()
Thorsten Blum [Sat, 24 Aug 2024 13:57:41 +0000 (15:57 +0200)]
bcachefs: Annotate struct bch_xattr with __counted_by()

Add the __counted_by compiler attribute to the flexible array member
x_name to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Switch gc bucket array to a genradix
Kent Overstreet [Sat, 24 Aug 2024 15:38:21 +0000 (11:38 -0400)]
bcachefs: Switch gc bucket array to a genradix

A user with a 30 tb device is overflowing the INT_MAX limit on vmalloc
allocations...

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: darray: convert to alloc_hooks()
Kent Overstreet [Thu, 22 Aug 2024 07:50:22 +0000 (03:50 -0400)]
bcachefs: darray: convert to alloc_hooks()

better memory allocation profiling support

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Convert to use jiffies macros
Chen Yufan [Thu, 22 Aug 2024 02:57:31 +0000 (10:57 +0800)]
bcachefs: Convert to use jiffies macros

Use jiffies macros instead of using jiffies directly to handle wraparound.

Signed-off-by: Chen Yufan <chenyufan@vivo.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
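
Why a direct comparison of tick counters breaks at wraparound can be shown with a small userspace sketch. tick_after() mirrors the idea behind the kernel's time_after() but is a reimplementation written for this example, and tick_deadline() and tick_after_naive() are hypothetical helpers. The signed-difference trick relies on the usual two's-complement conversion and only works for intervals shorter than half the counter range.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Compute a deadline; unsigned arithmetic wraps by design. */
static unsigned long tick_deadline(unsigned long now, unsigned long timeout)
{
	/* Longer intervals cannot be ordered by a signed difference. */
	assert(timeout <= LONG_MAX);
	return now + timeout;
}

/* Wraparound-safe: true if a is later than b. */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Naive comparison, shown for contrast; wrong across a wraparound. */
static bool tick_after_naive(unsigned long a, unsigned long b)
{
	return a > b;
}
```

With a counter just below ULONG_MAX and a deadline that has wrapped to a small value, the naive form reports the deadline as already passed while the signed-difference form does not.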
2 weeks ago bcachefs: Refactor bch2_bset_fix_lookup_table
Alan Huang [Thu, 15 Aug 2024 15:40:53 +0000 (23:40 +0800)]
bcachefs: Refactor bch2_bset_fix_lookup_table

bch2_bset_fix_lookup_table is too complicated to be easily understood,
and the comment "l now > where" there is also incorrect when where ==
t->end_offset. This patch therefore refactors the function; the idea is
that when where >= rw_aux_tree(b, t)[t->size - 1].offset, we don't need
to adjust the rw aux tree.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Assert that we don't lock nodes when !trans->locked
Kent Overstreet [Sun, 30 Jun 2024 13:25:56 +0000 (09:25 -0400)]
bcachefs: Assert that we don't lock nodes when !trans->locked

We rely on trans->locked to know if a trans has nodes locked for
assertions about deadlocks; there can't be more than one trans in the
same process that is locked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Do not check folio_has_private()
Matthew Wilcox (Oracle) [Tue, 20 Aug 2024 04:10:11 +0000 (05:10 +0100)]
bcachefs: Do not check folio_has_private()

folio_has_private() is an attractive nuisance; filesystem authors
generally don't realise that it actually checks two flags (one of which
is never set by bcachefs).  There's no need to check the private flag at
all; for folios owned by bcachefs, we know that folio->private is NULL
when the private flag is clear and non-NULL when the private flag is set.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: bch2_time_stats_reset()
Kent Overstreet [Mon, 19 Aug 2024 19:33:38 +0000 (15:33 -0400)]
bcachefs: bch2_time_stats_reset()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Drop memalloc_nofs_save() in bch2_btree_node_mem_alloc()
Kent Overstreet [Mon, 19 Aug 2024 19:11:20 +0000 (15:11 -0400)]
bcachefs: Drop memalloc_nofs_save() in bch2_btree_node_mem_alloc()

It's really not needed: the only locks used here are the btree cache
lock, which we drop for GFP_WAIT allocations, and btree node locks - but
we also drop those for GFP_WAIT allocations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Simplify bch2_xattr_emit() implementation
Youling Tang [Thu, 15 Aug 2024 08:57:44 +0000 (16:57 +0800)]
bcachefs: Simplify bch2_xattr_emit() implementation

Use helper functions to make code more readable.

Similar to commit a5488f29835c ("fs: simplify ->listxattr() implementation")

Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: drop unused posix acl handlers
Youling Tang [Thu, 15 Aug 2024 08:57:43 +0000 (16:57 +0800)]
bcachefs: drop unused posix acl handlers

Remove struct nop_posix_acl_{access,default} from bcachefs, which doesn't
depend on the xattr handler in its inode->i_op->listxattr() method in any
way. There's nothing more to do than to simply remove the handlers. They
have been effectively unused ever since we introduced the new posix acl
api. See [1] for details.

Link [1]: https://patchwork.kernel.org/project/linux-fsdevel/cover/20230125-fs-acl-remove-generic-xattr-handlers-v3-0-f760cc58967d@kernel.org/

Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Remove unused parameter
Alan Huang [Wed, 14 Aug 2024 14:20:07 +0000 (22:20 +0800)]
bcachefs: Remove unused parameter

iter here is unused, remove it.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Remove the prev array stuff
Alan Huang [Mon, 12 Aug 2024 09:04:04 +0000 (17:04 +0800)]
bcachefs: Remove the prev array stuff

After reducing the search range when building the aux tree, the prev array
stuff is no longer useful, so remove it.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Minimize the search range used to calculate the mantissa
Alan Huang [Mon, 12 Aug 2024 08:06:09 +0000 (16:06 +0800)]
bcachefs: Minimize the search range used to calculate the mantissa

When the search key's mantissa is larger than the node i's, we know that
the search key is larger than the first key of the cacheline corresponding
to node i, so that when we are calculating the mantissa of right side
nodes of node i, the left side of the search range can be the first key
of node i. Once the search range is minimized, the mantissa we are
calculating can have more useful bits, thus reducing slow path
comparisons. Besides, we can now remove all the prev array stuff.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Convert open-coded extra computation to helper
Alan Huang [Sat, 10 Aug 2024 16:11:46 +0000 (00:11 +0800)]
bcachefs: Convert open-coded extra computation to helper

This patch replaces the open-coded extra computation with eytzinger1_extra.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Remove dead code in __build_ro_aux_tree
Alan Huang [Sat, 10 Aug 2024 15:51:40 +0000 (23:51 +0800)]
bcachefs: Remove dead code in __build_ro_aux_tree

This logic is no longer useful since commit
3ce8b463e3e0 ("bcachefs: kill bset_tree->max_key"), so remove it.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Remove unused parameter of bkey_mantissa_bits_dropped
Alan Huang [Sat, 10 Aug 2024 16:52:25 +0000 (00:52 +0800)]
bcachefs: Remove unused parameter of bkey_mantissa_bits_dropped

The idx parameter of bkey_mantissa_bits_dropped is unused, remove it.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Remove unused parameter of bkey_mantissa
Alan Huang [Sat, 10 Aug 2024 16:52:24 +0000 (00:52 +0800)]
bcachefs: Remove unused parameter of bkey_mantissa

The idx parameter of bkey_mantissa became unused since commit
b904a7991802 ("bcachefs: Go back to 16 bit mantissa bkey floats"),
so remove it.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: bch2_sb_nr_devices()
Kent Overstreet [Thu, 8 Aug 2024 15:40:47 +0000 (11:40 -0400)]
bcachefs: bch2_sb_nr_devices()

factoring out a helper

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: trivial open_bucket_add_buckets() cleanup
Kent Overstreet [Wed, 7 Aug 2024 19:44:57 +0000 (15:44 -0400)]
bcachefs: trivial open_bucket_add_buckets() cleanup

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Fix a spelling error in docs
Xiaxi Shen [Wed, 7 Aug 2024 07:10:05 +0000 (00:10 -0700)]
bcachefs: Fix a spelling error in docs

Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: promote_whole_extents is now a normal option
Kent Overstreet [Thu, 1 Aug 2024 03:56:04 +0000 (23:56 -0400)]
bcachefs: promote_whole_extents is now a normal option

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: Move rebalance_status out of sysfs/internal
Kent Overstreet [Thu, 1 Aug 2024 03:39:49 +0000 (23:39 -0400)]
bcachefs: Move rebalance_status out of sysfs/internal

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks ago bcachefs: remove the unused parameter in macro bkey_crc_next
Julian Sun [Sun, 21 Jul 2024 12:55:20 +0000 (08:55 -0400)]
bcachefs: remove the unused parameter in macro bkey_crc_next

In the macro definition of bkey_crc_next, five parameters
were accepted, but only four of them were used. Let's remove
the unused one.

The patch has only passed compilation tests, but it should be fine.

Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: fix macro definition allocate_dropping_locks
Julian Sun [Sun, 21 Jul 2024 12:45:47 +0000 (08:45 -0400)]
bcachefs: fix macro definition allocate_dropping_locks

The macro allocate_dropping_locks accepts a parameter _trans but never
uses it. Instead, the macro body refers to the variable trans directly,
so it only works when the calling function happens to have a local
variable with that name.

Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: fix macro definition allocate_dropping_locks_errcode
Julian Sun [Sun, 21 Jul 2024 12:44:24 +0000 (08:44 -0400)]
bcachefs: fix macro definition allocate_dropping_locks_errcode

The macro allocate_dropping_locks_errcode accepts a parameter _trans
but never uses it. Instead, the macro body refers to the variable trans
directly, so it only works when the calling function happens to have a
local variable with that name.

Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: remove the unused macro definition
Julian Sun [Sun, 21 Jul 2024 12:43:24 +0000 (08:43 -0400)]
bcachefs: remove the unused macro definition

The macro bch2_kthread_wait_event_ioclock_timeout is no longer used,
so remove it.

The patch has passed a compilation test.

Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: quota_reserve_range() -> for_each_btree_key_in_subvolume_upto
Kent Overstreet [Wed, 17 Jul 2024 17:24:28 +0000 (13:24 -0400)]
bcachefs: quota_reserve_range() -> for_each_btree_key_in_subvolume_upto

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: bch2_folio_set() -> for_each_btree_key_in_subvolume_upto
Kent Overstreet [Wed, 17 Jul 2024 17:34:35 +0000 (13:34 -0400)]
bcachefs: bch2_folio_set() -> for_each_btree_key_in_subvolume_upto

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: range_has_data() -> for_each_btree_key_in_subvolume_upto
Kent Overstreet [Wed, 17 Jul 2024 17:30:23 +0000 (13:30 -0400)]
bcachefs: range_has_data() -> for_each_btree_key_in_subvolume_upto

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: bch2_seek_hole() -> for_each_btree_key_in_subvolume_upto
Kent Overstreet [Wed, 17 Jul 2024 17:28:23 +0000 (13:28 -0400)]
bcachefs: bch2_seek_hole() -> for_each_btree_key_in_subvolume_upto

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: bch2_seek_data() -> for_each_btree_key_in_subvolume_upto
Kent Overstreet [Wed, 17 Jul 2024 17:26:54 +0000 (13:26 -0400)]
bcachefs: bch2_seek_data() -> for_each_btree_key_in_subvolume_upto

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: bch2_xattr_list() -> for_each_btree_key_in_subvolume_upto
Kent Overstreet [Wed, 17 Jul 2024 17:24:28 +0000 (13:24 -0400)]
bcachefs: bch2_xattr_list() -> for_each_btree_key_in_subvolume_upto

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: bch2_readdir() -> for_each_btree_key_in_subvolume_upto
Kent Overstreet [Wed, 17 Jul 2024 17:24:28 +0000 (13:24 -0400)]
bcachefs: bch2_readdir() -> for_each_btree_key_in_subvolume_upto

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: for_each_btree_key_in_subvolume_upto()
Kent Overstreet [Wed, 17 Jul 2024 16:59:51 +0000 (12:59 -0400)]
bcachefs: for_each_btree_key_in_subvolume_upto()

New helper for looping over keys in a given subvolume

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: bch2_fiemap(): call trans_begin() on every loop iter
Kent Overstreet [Wed, 17 Jul 2024 15:50:54 +0000 (11:50 -0400)]
bcachefs: bch2_fiemap(): call trans_begin() on every loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: bchfs_read(): call trans_begin() on every loop iter
Kent Overstreet [Wed, 17 Jul 2024 15:47:01 +0000 (11:47 -0400)]
bcachefs: bchfs_read(): call trans_begin() on every loop iter

Same as the recent change for __bch2_read(); also, kill now unnecessary
btree_trans_too_many_iters() calls.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: kill bch2_btree_iter_peek_and_restart()
Kent Overstreet [Wed, 17 Jul 2024 15:42:11 +0000 (11:42 -0400)]
bcachefs: kill bch2_btree_iter_peek_and_restart()

dead code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: Btree path tracepoints
Kent Overstreet [Wed, 10 Aug 2022 23:57:46 +0000 (19:57 -0400)]
bcachefs: Btree path tracepoints

Fastpath tracepoints, rarely needed, only enabled with
CONFIG_BCACHEFS_PATH_TRACEPOINTS.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: Add check for btree_path ref overflow
Kent Overstreet [Tue, 16 Jul 2024 21:23:10 +0000 (17:23 -0400)]
bcachefs: Add check for btree_path ref overflow

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: Mark bch_inode_info as SLAB_ACCOUNT
Youling Tang [Wed, 3 Jul 2024 07:09:55 +0000 (15:09 +0800)]
bcachefs: Mark bch_inode_info as SLAB_ACCOUNT

After commit 230e9fc28604 ("slab: add SLAB_ACCOUNT flag"), we need to mark
the inode cache as SLAB_ACCOUNT, similar to commit 5d097056c9a0 ("kmemcg:
account for certain kmem allocations to memcg")

Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: allocate inode by using alloc_inode_sb()
Youling Tang [Tue, 16 Jul 2024 02:58:16 +0000 (10:58 +0800)]
bcachefs: allocate inode by using alloc_inode_sb()

The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().

It will also fix [1] to avoid the NULL pointer dereference BUG in
list_lru_add() when CONFIG_MEMCG is enabled.

Links:
[1]: https://lore.kernel.org/all/20589721-46c0-4344-b2ef-6ab48bbe2ea5@linux.dev/
[2]: https://lore.kernel.org/all/7db60e36-9c96-4938-a28d-a9745e287386@linux.dev/

Fixes: 86d81ec5f5f0 ("bcachefs: Mark bch_inode_info as SLAB_ACCOUNT")
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: Opt_durability can now be set via bch2_opt_set_sb()
Kent Overstreet [Mon, 15 Jul 2024 23:54:51 +0000 (19:54 -0400)]
bcachefs: Opt_durability can now be set via bch2_opt_set_sb()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: bch2_opt_set_sb() can now set (some) device options
Kent Overstreet [Mon, 15 Jul 2024 23:26:46 +0000 (19:26 -0400)]
bcachefs: bch2_opt_set_sb() can now set (some) device options

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: data_allowed is now an opts.h option
Kent Overstreet [Mon, 15 Jul 2024 20:53:49 +0000 (16:53 -0400)]
bcachefs: data_allowed is now an opts.h option

need this so cmd_option in userspace can handle it

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 weeks agobcachefs: Annotate struct bucket_array with __counted_by()
Thorsten Blum [Wed, 21 Aug 2024 16:29:22 +0000 (18:29 +0200)]
bcachefs: Annotate struct bucket_array with __counted_by()

Add the __counted_by compiler attribute to the flexible array member
bucket to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>