Kent Overstreet [Tue, 22 Apr 2025 13:02:15 +0000 (09:02 -0400)]
bcachefs: bch2_fsck_err_opt()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 24 Apr 2025 13:27:10 +0000 (09:27 -0400)]
bcachefs: Plumb printbuf through bch2_btree_lost_data()
Part of the ongoing project to improve error messages by building them
up in printbufs and emitting them all at once, so that we can easily see
what events are related in the log.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 24 Apr 2025 13:28:56 +0000 (09:28 -0400)]
bcachefs: kill bch2_run_explicit_recovery_pass_persistent()
No longer has users, so we can kill it and rename
bch2_run_explicit_recovery_pass_persistent_locked().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 24 Apr 2025 13:13:28 +0000 (09:13 -0400)]
bcachefs: Remove redundant calls to btree_lost_data()
The btree node read path calls this before returning the read error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 24 Apr 2025 13:09:56 +0000 (09:09 -0400)]
bcachefs: bch2_btree_lost_data() now handles snapshots tree
We have a consolidated places for "this btree lost data, run this
repair", so use it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 22 Apr 2025 13:16:49 +0000 (09:16 -0400)]
bcachefs: Kill redundant error message in topology repair
The btree node read path already logs btree node read errors, this isn't
needed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 22 Apr 2025 09:49:20 +0000 (05:49 -0400)]
bcachefs: Emit a single log message on data read error
Instead of emitting a message immediately when we get an error in the
read path, and then another at the end if we successfully retry - emit
one single log message before returning from bch2_rbio_retry().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 22 Apr 2025 09:45:48 +0000 (05:45 -0400)]
bcachefs: bch2_io_failures_to_text()
Pretty printer for bch_io_failures, to be used for better read error
messages.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 22 Apr 2025 10:03:33 +0000 (06:03 -0400)]
bcachefs: print_string_as_lines: avoid printing empty line
If the final line in in the message to be printed is blang, don't print
it.
This happens with indented printbufs - after a newline we emit spaces up
to the indent level.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 21 Apr 2025 17:02:51 +0000 (13:02 -0400)]
bcachefs: Make various async objs visible in debugfs
Add async objs list for
- promote_op
- bch_read_bio
- btree_read_bio
- btree_write_bio
This gets us introspection on in-flight async ops, and because under the
hood it uses fast_lists (percpu slot buffer on top of a radix tree),
it'll be fast enough to enable in production.
This will be very helpful for debugging "something got stuck" issues,
which have been cropping up from time to time (in the CI, especially
with folio writeback).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 21 Apr 2025 16:01:50 +0000 (12:01 -0400)]
bcachefs: Async object debugging
Debugging infrastructure for async objs: this lets us easily create
fast_lists for various object types so they'll be visible in debugfs.
Add new object types to the BCH_ASYNC_OBJS_TYPES() enum, and drop a
pretty-printer wrapper in async_objs.c.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 28 Sep 2024 20:22:38 +0000 (16:22 -0400)]
bcachefs: fast_list
A fast "list" data structure, which is actually a radix tree, with an
IDA for slot allocation and a percpu buffer on top of that.
Items cannot be added or moved to the head or tail, only added at some
(arbitrary) position and removed. The advantage is that adding, removing
and iteration is generally lockless, only hitting the lock in ida when
the percpu buffer is full or empty.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 2 Feb 2025 16:23:07 +0000 (11:23 -0500)]
bcachefs: bch2_read_bio_to_text
Pretty printer for struct bch_read_bio.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 21 Apr 2025 16:04:10 +0000 (12:04 -0400)]
bcachefs: bch2_bio_to_text()
Pretty printer for struct bio, to be used for async object debugging.
This is pretty minimal, we'll add more to it as we discover what we
need.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 19 Apr 2025 01:54:12 +0000 (21:54 -0400)]
bcachefs: bch_dev.io_ref -> enumerated_ref
Convert device IO refs to enumerated_refs, for easier debugging of
refcount issues.
Simple conversion: enumerate all users and convert to the new helpers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 18 Apr 2025 18:56:09 +0000 (14:56 -0400)]
bcachefs: bch_fs.writes -> enumerated_refs
Drop the single-purpose write ref code in bcachefs.h, and convert to
enumarated refs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 18 Apr 2025 18:56:09 +0000 (14:56 -0400)]
bcachefs: enumerated_ref.c
Factor out the debug code for rw filesystem refs into a small library.
In release mode an enumerated ref is a normal percpu refcount, but in
debug mode all enumerated users of the ref get their own atomic_long_t
ref - making it much easier to chase down refcount usage bugs for when a
refcount has many users.
For debugging, we have enumerated_ref_to_text(), which prints the
current value of each different user.
Additionally, in debug mode enumerated_ref_stop() has a 10 second
timeout, after which it will dump outstanding refcounts.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 19 Apr 2025 23:13:40 +0000 (19:13 -0400)]
bcachefs: for_each_rw_member_rcu()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Apr 2025 15:27:18 +0000 (11:27 -0400)]
bcachefs: __bch2_fs_read_write() no longer depends on io_ref
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 19 Apr 2025 02:11:15 +0000 (22:11 -0400)]
bcachefs: for_each_online_member_rcu()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Apr 2025 15:23:53 +0000 (11:23 -0400)]
bcachefs: recalc_capacity() no longer depends on io_ref
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 19 Apr 2025 04:57:55 +0000 (00:57 -0400)]
bcachefs: bch2_target_to_text() no longer depends on io_ref
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 14 Mar 2025 13:46:25 +0000 (09:46 -0400)]
bcachefs: bch2_check_rebalance_work()
Add a pass for checking the rebalance_work btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Alan Huang [Fri, 18 Apr 2025 07:52:10 +0000 (15:52 +0800)]
bcachefs: Kill dead code
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 17 Apr 2025 16:42:13 +0000 (12:42 -0400)]
bcachefs: Fix struct with flex member ABI warning
This pops up when buliding in userspace.
The structs aren't actually variable length, but no way to tell the
compiler that...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 20 Apr 2024 21:40:47 +0000 (17:40 -0400)]
docs: bcachefs: idle work scheduling design doc
People have been asking to see the plan for this, so -
bcachefs has various background tasks that need to be scheduled to
balance efficiency, predictability of performance, etc.
The design and philosophy hasn't changed too much since bcache, which
was primarily designed for server usage, with sustained load in mind.
These days we're seeing more desktop usage - where we really want to let
the system idle effictively, to reduce total power usage - while also
still balancing previous concerns, we still want to let work accumulate
to a degree.
This lays out all the requirements and starts to sketch out the
algorithm I have in mind.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 16 Apr 2025 01:35:28 +0000 (21:35 -0400)]
bcachefs: bch2_move_data_btree() can now walk roots
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 13 Apr 2025 20:31:34 +0000 (16:31 -0400)]
bcachefs: bch2_move_data_btree() can move btree nodes
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 3 Apr 2025 23:51:05 +0000 (19:51 -0400)]
bcachefs: plumb btree_id through move_pred_fd
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 3 Apr 2025 23:42:02 +0000 (19:42 -0400)]
bcachefs: Plumb target parameter through btree_node_rewrite_pos()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 3 Apr 2025 23:33:54 +0000 (19:33 -0400)]
bcachefs: export bch2_move_data_phys()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 15 Apr 2025 18:09:34 +0000 (14:09 -0400)]
bcachefs: BCH_MEMBER_RESIZE_ON_MOUNT
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 15 Apr 2025 19:15:36 +0000 (15:15 -0400)]
bcachefs: BCH_FEATURE_small_image
We can't go RW if it's an image file that hasn't been resized.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 3 Apr 2025 18:19:23 +0000 (14:19 -0400)]
bcachefs: BCH_FEATURE_no_alloc_info
If a filesystem is going to only be used read-only, and will be a
deployable image, we can strip out alloc info for a substantial
reduction in metadata size - around half, due to backpointers.
Alloc info will be regenerated on first read-write mount.
Remounting RW is disallowed for now, since we don't yet have
check_allocations running in RW mode.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 16 Apr 2025 13:23:15 +0000 (09:23 -0400)]
bcachefs: Print features on startup with -o verbose
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 16 Apr 2025 10:48:31 +0000 (06:48 -0400)]
bcachefs: Shrink superblock downgrade table
Don't generate entries for versions that won't be able to mount.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 16 Apr 2025 03:35:48 +0000 (23:35 -0400)]
bcachefs: sb_validate() no longer requires members_v1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 15 Apr 2025 23:21:52 +0000 (19:21 -0400)]
bcachefs: Add a recovery pass for making sure root inode is readable
If the root inode/subvolume is unreadable we can repair automatically -
but only if we're still in recovery, so that we can rewind to the
appropriate recovery pass.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 15 Apr 2025 23:15:43 +0000 (19:15 -0400)]
bcachefs: Flag for repair on missing subvolume
Instead of going emegency read only with a bch2_fs_inconsistent() call,
log the error and recovery pass appropriately.
If we're still in recovery it'll be repaired immediately, otherwise
it'll be repaired on the next mount.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 15 Apr 2025 21:31:47 +0000 (17:31 -0400)]
bcachefs: print_str_as_lines() -> print_str()
bch2_print_string_as_lines() is a low level helper that allows messages
longer than 1k to be printed without truncation.
But we should always be printing with the helpers that take a filesystem
object, if we're in fsck they direct output to the userspace process
controlling fsck instead of the dmesg log.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 15 Apr 2025 18:08:42 +0000 (14:08 -0400)]
bcachefs: bch2_dev_missing_bkey()
Part of the ongoing project to kill off bch2_(fs|trans)_inconsistent
calls - they generally need to be replaced with either
- a fsck_err() call that can repair the error, or
- logging an error of the appropriate type in the superblock, and
flagging the appropriate recovery pass to repair the error
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 15 Apr 2025 17:55:16 +0000 (13:55 -0400)]
bcachefs: Simplify bch2_count_fsck_err()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 15 Apr 2025 17:45:39 +0000 (13:45 -0400)]
bcachefs: bch2_run_explicit_recovery_pass_printbuf()
We prefer helpers that emit log messages to printbufs rather than
printing them directly; that way, we can ensure that different log
messages from the same event are grouped together and formatted
appropriately in the dmesg log.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 15 Apr 2025 14:20:46 +0000 (10:20 -0400)]
bcachefs: Incompatible features may now be enabled at runtime
version_upgrade is now a runtime option.
In the future we'll want to add compatible upgrades at runtime, and call
the full check_version_upgrade() when the option changes, but we don't
have compatible optional upgrades just yet.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 15 Apr 2025 13:54:01 +0000 (09:54 -0400)]
bcachefs: Clean up option pre/post hooks, small fixes
The helpers are now:
- bch2_opt_hook_pre_set()
- bch2_opts_hooks_pre_set()
- bch2_opt_hook_post_set
Fix a bug where the filesystem discard option would incorrectly be
changed when setting the device option, and don't trigger rebalance
scans unnecessarily (when options aren't changing).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 13 Apr 2025 12:20:47 +0000 (08:20 -0400)]
bcachefs: Use drop_locks_do() in bch2_inode_hash_find()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 2 Apr 2025 19:12:49 +0000 (15:12 -0400)]
bcachefs: Single device mode
Single device filesystems are now identified by the block device name,
not the UUID - and single device filesystems with the same UUID can be
mounted simultaneously, without any special options.
This allocates a new bit in the superblock, BCH_SB_MULTI_DEVICE, which
indicates whether a filesystem has ever been multi device.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 3 Apr 2025 17:10:03 +0000 (13:10 -0400)]
bcachefs: Initialize c->name earlier on single dev filesystems
On single device filesystems, c->name contains the block device name,
not the UUID.
Initialize this earlier, so that single device mode can use it for
initializing sysfs/debugfs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Alan Huang [Tue, 15 Apr 2025 05:33:07 +0000 (13:33 +0800)]
bcachefs: Simplify logic
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Alan Huang [Tue, 15 Apr 2025 05:33:06 +0000 (13:33 +0800)]
bcachefs: Remove spurious +1/-1 operation
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Alan Huang [Tue, 15 Apr 2025 05:33:04 +0000 (13:33 +0800)]
bcachefs: Kill bch2_trans_unlock_noassert
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 13 Apr 2025 21:59:10 +0000 (17:59 -0400)]
bcachefs: Clean up duplicated code in bch2_journal_halt()
It's now a wrapper around bch2_journal_halt_locked().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 13 Apr 2025 20:29:36 +0000 (16:29 -0400)]
bcachefs: bch2_dev_allocator_set_rw()
Add a helper that lets us change bch_member.data_allowed at runtime.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 13 Apr 2025 17:52:12 +0000 (13:52 -0400)]
bcachefs: bch2_dev_journal_alloc() now respects data_allowed
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 13 Apr 2025 11:47:47 +0000 (07:47 -0400)]
bcachefs: Improve bch2_btree_cache_to_text()
Make the output slightly clearer, and include a counter for "nodes we
couldn't free because we would have gone under our reserve".
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 13 Apr 2025 11:45:13 +0000 (07:45 -0400)]
bcachefs: __btree_node_reclaim_checks()
Factor out a helper so we're not duplicating checks after locking the
btree node.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 13 Apr 2025 11:42:46 +0000 (07:42 -0400)]
bcachefs: kill BTREE_CACHE_NOT_FREED_INCREMENT()
Small cleanup, just always increment the counters.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 6 Apr 2025 17:50:20 +0000 (13:50 -0400)]
bcachefs: Improve opts.degraded
Kill 'opts.very_degraded', and make 'opts.degraded' a persistent option,
stored in the superblock.
It's now an enum, with available choices ask/yes/very/no.
"ask" mode will be handled by the mount helper, for prompting the user
(on a machine used interactively) for whether to do a degraded mount.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 13 Apr 2025 09:23:40 +0000 (05:23 -0400)]
bcachefs: export bch2_chacha20
Needed for userspcae.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Integral [Tue, 8 Apr 2025 10:31:29 +0000 (18:31 +0800)]
bcachefs: indent error messages of invalid compression
This patch uses printbuf_indent_add_nextline() to set a consistent
indentation level for error messages of invalid compression.
In my previous patch [1], the newline is added by using '\n' in
the argument of prt_str(). This patch replaces prt_str() with
prt_printf() to make indentation level work correctly.
Link: https://lore.kernel.org/20250406152659.205997-2-integral@archlinuxcn.org
Signed-off-by: Integral <integral@archlinuxcn.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Integral [Sun, 6 Apr 2025 15:26:59 +0000 (23:26 +0800)]
bcachefs: split error messages of invalid compression into two lines
When an invalid compression type or level is passed as an argument
to `--compression`, two error messages are squashed into one line:
> bcachefs format --compression=lzo bcachefs-comp.img
invalid option: invalid compression typecompression: parse error
> bcachefs format --compression=lz4:16 bcachefs-comp.img
invalid option: invalid compression levelcompression: parse error
To resolve this issue, add a newline character at the end of the
first error message to separate them into two lines.
Signed-off-by: Integral <integral@archlinuxcn.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Integral [Sun, 6 Apr 2025 14:53:28 +0000 (22:53 +0800)]
bcachefs: early return for negative values when parsing BCH_OPT_UINT
Currently, when passing a negative integer as argument, the error
message is "too big" due to casting to an unsigned integer:
> bcachefs format --block_size=-1 bcachefs.img
invalid option: block_size: too big (max 65536)
When negative value in argument detected, return early before
calling bch2_opt_validate().
A new error code `BCH_ERR_option_negative` is added.
Signed-off-by: Integral <integral@archlinuxcn.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 4 Apr 2025 00:56:09 +0000 (20:56 -0400)]
bcachefs: move_data_phys: stats are not required
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 5 Apr 2025 21:36:04 +0000 (17:36 -0400)]
bcachefs: RO mounts now use less memory
Defer memory allocations only needed in RW mode until we actually go RW.
This is part of improved support for RO images.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 5 Apr 2025 23:30:43 +0000 (19:30 -0400)]
bcachefs: Move various init code to _init_early()
_init_early() is for initialization that cannot fail, and often must
happen for teardown partway through initialization to work.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 5 Apr 2025 23:41:35 +0000 (19:41 -0400)]
bcachefs: alphabetize init function calls
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 5 Apr 2025 23:26:19 +0000 (19:26 -0400)]
bcachefs: simplify journal pin initialization
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 5 Apr 2025 23:23:52 +0000 (19:23 -0400)]
bcachefs: btree_io_complete_wq -> btree_write_complete_wq
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 4 Apr 2025 02:30:39 +0000 (22:30 -0400)]
bcachefs: bch2_kvmalloc() mem alloc profiling
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 3 Apr 2025 16:18:39 +0000 (12:18 -0400)]
bcachefs: add missing include
Hygeine, and fix build in userspace.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 2 Apr 2025 18:40:06 +0000 (14:40 -0400)]
bcachefs: bch2_snapshot_table_make_room()
Add a better helper for check_snapshot_exists().
create_snapids() can't be changed to use this, unfortunately, because
the transaction that creates new snapshot will also be inserting other
keys (e.g. root inode) that reference that snapshot ID, and they expect
the snapshot table to already be updated.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 2 Apr 2025 15:59:39 +0000 (11:59 -0400)]
bcachefs: darray: provide typedefs for primitive types
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 2 Apr 2025 21:23:22 +0000 (17:23 -0400)]
bcachefs: reduce new_stripe_alloc_buckets() stack usage
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 31 Mar 2025 21:50:52 +0000 (17:50 -0400)]
bcachefs: alloc_request no longer on stack
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 31 Mar 2025 21:57:06 +0000 (17:57 -0400)]
bcachefs: alloc_request.ptrs2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 31 Mar 2025 21:13:22 +0000 (17:13 -0400)]
bcachefs: alloc_request.ca
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 31 Mar 2025 21:08:43 +0000 (17:08 -0400)]
bcachefs: alloc_request.counters
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 31 Mar 2025 19:52:39 +0000 (15:52 -0400)]
bcachefs: alloc_request.usage
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 31 Mar 2025 21:54:43 +0000 (17:54 -0400)]
bcachefs: alloc_request: deallocate_extra_replicas()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 31 Mar 2025 19:46:45 +0000 (15:46 -0400)]
bcachefs: new_stripe_alloc_buckets() takes alloc_request
More stack usage improvements: instead of creating a new alloc_request
(currently on the stack), save/restore just the fields we need to reuse.
This is a bit tricky, because we're doing a normal alloc_foreground.c
allocation, which calls into ec.c to get a stripe, which then does more
normal allocations - some of the fields get reused, and used
differently.
So we have to save and restore them - but the stack usage improvements
will be well worth it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 31 Mar 2025 19:37:28 +0000 (15:37 -0400)]
bcachefs: bch2_ec_stripe_head_get() takes alloc_request
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 22 Mar 2025 01:13:53 +0000 (21:13 -0400)]
bcachefs: bch2_bucket_alloc_trans() takes alloc_request
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 22 Mar 2025 01:06:43 +0000 (21:06 -0400)]
bcachefs: alloc_request.data_type
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 22 Mar 2025 00:42:42 +0000 (20:42 -0400)]
bcachefs: struct alloc_request
Add a struct for common state for satisfying an on disk allocation,
instead of passing the same long list of items to every function.
This will help with stack usage, performance, and perhaps enable some
code cleanups.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 1 Apr 2025 18:29:31 +0000 (14:29 -0400)]
bcachefs: trace bch2_trans_kmalloc()
We're occasionally seeing the WARN_ON() for bump allocator usage
exceeding BTREE_TRANS_MEM_MAX; add some tracing so we can see what's
going on.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Roxana Nicolescu [Thu, 27 Mar 2025 14:50:09 +0000 (14:50 +0000)]
bcachefs: replace memcpy with memcpy_and_pad for jset_entry_log->d buff
This was achieved before by zero-ing out the source buffer and then
copying the bytes into the destination buffer. This can also be done with
memcpy_and_pad which will zero out only the destination buffer if its
size is bigger than the size of the source buffer. This is already used in
the same way in journal_transaction_name().
Moreover, zero-ing the source buffer was done twice, first in
__bch2_fs_log_msg() and then in bch2_trans_log_msg(). And this method
may also require allocating some extra memory for the source buffer.
In conclusion, using memcpy_and_pad is better even tough the result is
the same because it brings uniformity with what's already used in
journal_transaction_name, it avoids code duplication and reallocating
extra memory.
Signed-off-by: Roxana Nicolescu <nicolescu.roxana@protonmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Roxana Nicolescu [Thu, 27 Mar 2025 14:50:05 +0000 (14:50 +0000)]
bcachefs: replace strncpy() with memcpy_and_pad in journal_transaction_name
Strncpy is now deprecated.
The buffer destination is not required to be NULL-terminated, but we also
want to zero out the rest of the buffer as it is already done in other
places.
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Roxana Nicolescu <nicolescu.roxana@protonmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 11 Mar 2025 13:46:06 +0000 (09:46 -0400)]
bcachefs: Rebalance now skips poisoned extents
Let's not move poisoned extents unnecessarily, since we can't guard
against introducing more bitrot.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 11 Mar 2025 14:02:40 +0000 (10:02 -0400)]
bcachefs: Data move can read from poisoned extents
Now, if an extent is poisoned we can move it even if there was a
checksum error. We'll have to give it a new checksum, but the poison bit
means that userspace will still see the appropriate error when they try
to read it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 10 Mar 2025 18:03:25 +0000 (14:03 -0400)]
bcachefs: Poison extents that can't be read due to checksum errors
Copygc needs to be able to move extents that have bitrotted. We don't
want to delete them - in the future we'll have an API for "read me the
data even if there's checksum errors", and in general we don't want to
delete anything unless the user asks us to.
That will require writing it with a new checksum, which means we can't
forget that there was a checksum error so we return the correct error to
userspace.
Rebalance also wants to skip bad extents; we can now use the poison flag
for that.
This is currently disabled by default, as we want read fua support so
that we can distinguish between transient and permanent errors from the
device. It may be enabled with the module parameter:
poison_extents_on_checksum_error
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 10 Mar 2025 17:33:41 +0000 (13:33 -0400)]
bcachefs: Be precise about bch_io_failures
If the extent we're reading from changes, due to be being overwritten or
moved (possibly partially) - we need to reset bch_io_failures so that we
don't accidentally mark a new extent as poisoned prematurely.
This means we have to separately track (in the retry path) the extent we
previously read from.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 29 Mar 2025 21:59:30 +0000 (17:59 -0400)]
bcachefs: bch2_subvolume_wait_for_pagecache_and_delete() cleanup
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 21 May 2025 04:38:04 +0000 (00:38 -0400)]
bcachefs: Check for casefolded dirents in non casefolded dirs
Check for mismatches between casefold dirents and casefold directories.
A mismatch will cause lookups to fail, as we'll be doing the lookup with
the casefolded name, which won't match the non-casefolded dirent, and
vice versa.
Reported-by: Christopher Snowhill <chris@kode54.net>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 21 May 2025 04:41:07 +0000 (00:41 -0400)]
bcachefs: Fix bch2_dirent_create_snapshot() for casefolding
bch2_dirent_create_snapshot(), used in fsck, neglected to create a
casefolded dirent.
Just move this into dirent_create_key().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 21 May 2025 00:05:45 +0000 (20:05 -0400)]
bcachefs: Fix casefold opt via xattr interface
Changing the casefold option requires extra checks/work - factor out a
helper from bch2_fileattr_set() for the xattr code to use.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 15 May 2025 01:32:40 +0000 (21:32 -0400)]
bcachefs: mkwrite() now only dirties one page
Don't dirty the whole folio - fixes write amplification with
applications doing mmaped writes.
https://www.reddit.com/r/bcachefs/comments/1klzcg1/incredible_amounts_of_write_amplification_when/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 19 May 2025 02:32:30 +0000 (22:32 -0400)]
bcachefs: fix extent_has_stripe_ptr()
This wasn't checking indirect extents.
Fixes: https://github.com/koverstreet/bcachefs/issues/887
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 17 May 2025 22:37:02 +0000 (18:37 -0400)]
bcachefs: Fix bch2_btree_path_traverse_cached() when paths realloced
btree_key_cache_fill() will allocate and traverse another path (for the
underlying btree), so we can't hold pointers to paths across a call - we
have to pass indices.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 14 May 2025 22:53:48 +0000 (18:53 -0400)]
bcachefs: fix wrong arg to fsck_err()
fsck_err() needs the btree transaction passed to it if there is one - so
that it can unlock/relock around prompting userspace for fixing the
error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 9 May 2025 19:05:19 +0000 (15:05 -0400)]
bcachefs: Fix missing commit in backpointer to missing target
Fsck wants to do transaction commits from an outer context; it may have
other repair to do (i.e. duplicate backpointers).
But when calling backpointer_not_found() from runtime code, i.e. runtime
self healing, we should be doing the commit - the outer context expects
to just be doing lookups.
This fixes bugs where we get stuck spinning, reported as "RCU lock hold
time warnings.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>