Kent Overstreet [Thu, 28 Nov 2024 02:58:43 +0000 (21:58 -0500)]
bcachefs: Guard against journal seq overflow
Wraparound is impractical to handle since in various places we use 0 as
a sentinal value - but 64 bits (or 56, because the btree write buffer
steals a few bits) is enough for all practical purposes.
Reported-by: syzbot+73ed43fbe826227bd4e0@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 27 Nov 2024 08:00:54 +0000 (03:00 -0500)]
bcachefs: BCH_FS_recovery_running
If we're autofixing topology errors, we shouldn't shutdown if we're
still in recovery.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 Nov 2024 02:28:07 +0000 (21:28 -0500)]
bcachefs: Make topology errors autofix
These repair paths are well tested, we can repair them without explicit
user intervention
This also tweaks bch2_topology_error() so that we run topology repair if
we're in recovery, not just fsck.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 27 Nov 2024 05:29:52 +0000 (00:29 -0500)]
bcachefs: struct bkey_validate_context
Add a new parameter to bkey validate functions, and use it to improve
invalid bkey error messages: we can now print the btree and depth it
came from, or if it came from the journal, or is a btree root.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 27 Nov 2024 06:03:41 +0000 (01:03 -0500)]
bcachefs: Ignore empty btree root journal entries
There's no reason to treat them as errors: just ignore them, and go with
a previous btree root if we had one.
Reported-by: syzbot+e22007d6acb9c87c2362@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 27 Nov 2024 03:59:27 +0000 (22:59 -0500)]
bcachefs: Fix null ptr deref in btree_path_lock_root()
Historically, we required that all btree node roots point to a valid
(possibly fake) node, but we're improving our ability to continue in the
presence of errors.
Reported-by: syzbot+e22007d6acb9c87c2362@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 27 Nov 2024 02:27:16 +0000 (21:27 -0500)]
bcachefs: Go RW earlier, for normal rw mount
Previously, when mounting read-write after a clean shutdown, we wouldn't
go read-write until after all the recovery passes completed.
Now, go RW early in recovery, the same as any other situation we'll need
to go read-write. This fixes a bug where we discover unlinked inodes
after a clean shutdown: repair fails because we're read only.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 26 Nov 2024 20:16:57 +0000 (15:16 -0500)]
bcachefs: Fix bch2_btree_node_update_key_early()
Fix an assertion pop from the recent btree cache freelist fixes.
Fixes:
baefd3f849ed ("bcachefs: btree_cache.freeable list fixes")
Reported-by: Tyler <th020394@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 Nov 2024 22:03:13 +0000 (17:03 -0500)]
bcachefs: Change "disk accounting version 0" check to commit only
6.11 had a bug where we'd sometimes create disk accounting keys with
version 0, which causes issues for journal replay - but we don't need to
delete existing accounting keys with version 0.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 Nov 2024 07:05:02 +0000 (02:05 -0500)]
bcachefs: Don't try to en/decrypt when encryption not available
If a btree node says it's encrypted, but the superblock never had an
encryptino key - whoops, that needs to be handled.
Reported-by: syzbot+026f1857b12f5eb3f9e9@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 Nov 2024 06:26:56 +0000 (01:26 -0500)]
bcachefs: Fix dup/misordered check in btree node read
We were checking for out of order keys, but not duplicate keys.
Reported-by: syzbot+dedbd67513939979f84f@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 Nov 2024 05:21:27 +0000 (00:21 -0500)]
bcachefs: Bad btree roots are now autofix
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 Nov 2024 04:28:21 +0000 (23:28 -0500)]
bcachefs: Kill bch2_bucket_alloc_new_fs()
The early-early allocation path, bch2_bucket_alloc_new_fs(), is no
longer needed - and inconsistencies around new_fs_bucket_idx have been a
frequent source of bugs.
Reported-by: syzbot+592425844580a6598410@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 Nov 2024 03:57:01 +0000 (22:57 -0500)]
bcachefs: Fix btree node scan when unknown btree IDs are present
btree_root entries for unknown btree IDs are created during recovery,
before reading those btree roots.
But btree_node_scan may find btree nodes with unknown btree IDs when we
haven't seen roots for those btrees.
Reported-by: syzbot+1f202d4da221ec6ebf8e@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 Nov 2024 03:45:25 +0000 (22:45 -0500)]
bcachefs: backpointer_to_missing_ptr is now autofix
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 Nov 2024 03:28:41 +0000 (22:28 -0500)]
bcachefs: Fix accounting_read when we rewind
If we rewind recovery to run topology repair, that causes
accounting_read to run twice.
This fixes accounting being double counted.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 Nov 2024 03:23:41 +0000 (22:23 -0500)]
bcachefs: disk_accounting: bch2_dev_rcu -> bch2_dev_rcu_noerror
Accounting keys that reference invalid devices are corrected by fsck,
they shouldn't cause an emergency shutdown.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 Nov 2024 02:49:08 +0000 (21:49 -0500)]
bcachefs: errcode cleanup: journal errors
Instead of throwing standard error codes, we should be throwing
dedicated private error codes, this greatly improves debugability.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 Nov 2024 01:15:30 +0000 (20:15 -0500)]
bcachefs: Use separate rhltable for bch2_inode_or_descendents_is_open()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 24 Nov 2024 03:12:58 +0000 (22:12 -0500)]
bcachefs: BCH_ERR_btree_node_read_error_cached
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 23 Apr 2024 06:18:18 +0000 (02:18 -0400)]
bcachefs: btree_write_buffer_flush_seq() no longer closes journal
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 22 Nov 2024 01:09:45 +0000 (20:09 -0500)]
bcachefs: discard fastpath now uses bch2_discard_one_bucket()
The discard bucket fastpath previously was using its own code for
discarding buckets and clearing them in the need_discard btree, which
didn't have any of the consistency checks of the main discard path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 23 Nov 2024 21:47:10 +0000 (16:47 -0500)]
bcachefs: Bias reads more in favor of faster device
Per reports of performance issues on mixed multi device filesystems
where we're issuing too much IO to the spinning rust - tweak this
algorithm.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 23 Nov 2024 23:21:12 +0000 (18:21 -0500)]
bcachefs: trivial btree write buffer refactoring
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 23 Nov 2024 21:27:47 +0000 (16:27 -0500)]
bcachefs: Can now block journal activity without closing cur entry
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 15 Nov 2024 02:34:43 +0000 (21:34 -0500)]
bcachefs: New backpointers helpers
- bch2_backpointer_del()
- bch2_backpointer_maybe_flush()
Kill a bit of open coding and make sure we're properly handling the
btree write buffer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 17 Nov 2024 23:26:54 +0000 (18:26 -0500)]
bcachefs: kill bch_backpointer.bucket_offset usage
bch_backpointer.bucket_offset is going away - it's no longer needed
since we no longer store backpointers in alloc keys, the same
information is in the key position itself.
And we'll be reclaiming the space in bch_backpointer for the bucket
generation number.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 18 Nov 2024 05:32:57 +0000 (00:32 -0500)]
bcachefs: Fix check_backpointers_to_extents range limiting
bch2_get_btree_in_memory_pos() will return positions that refer directly
to the btree it's checking will fit in memory - i.e. backpointer
positions, not buckets.
This also means check_bp_exists() no longer has to refer to the device,
and we can delete some code.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 15 Nov 2024 22:36:09 +0000 (17:36 -0500)]
bcachefs: bch_backpointer -> bkey_i_backpointer
Since we no longer store backpointers in alloc keys, there's no reason
not to pass around bkey_i_backpointers; this means we don't have to pass
the bucket pos separately.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 15 Nov 2024 22:45:44 +0000 (17:45 -0500)]
bcachefs: Drop swab code for backpointers in alloc keys
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 15 Nov 2024 21:30:30 +0000 (16:30 -0500)]
bcachefs: bucket_pos_to_bp_end()
Better helpers for iterating over backpointers within a specific bucket
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 18 Nov 2024 05:16:52 +0000 (00:16 -0500)]
bcachefs: check for backpointers to invalid device
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 15 Nov 2024 03:49:40 +0000 (22:49 -0500)]
bcachefs: fix bp_pos_to_bucket_nodev_noerror
_noerror means don't produce inconsistent errors, so it should be using
bch2_dev_rcu_noerror().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 9 Dec 2024 11:18:49 +0000 (06:18 -0500)]
bcachefs: Fix evacuate_bucket tracepoint
86a494c8eef9 ("bcachefs: Kill bch2_get_next_backpointer()") dropped some
things the tracepoint emitted because bch2_evacuate_bucket() no longer
looks at the alloc key - but we did want at least some of that.
We still no longer look at the alloc key so we can't report on the
fragmentation number, but that's a direct function of dirty_sectors and
a copygc concern anyways - copygc should get its own tracepoint that
includes information from the fragmentation LRU.
But we can report on the number of sectors we moved and the bucket size.
Co-developed-by: Piotr Zalewski <pZ010001011111@proton.me>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 17 Nov 2024 07:23:24 +0000 (02:23 -0500)]
bcachefs: fix O(n^2) issue with whiteouts in journal keys
The journal_keys array can't be substantially modified after we go RW,
because lookups need to be able to check it locklessly - thus we're
limited on what we can do when a key in the journal has been
overwritten.
This is a problem when there's many overwrites to skip over for peek()
operations. To fix this, add tracking of ranges of overwrites: we create
a range entry when there's more than one contiguous whiteout.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 17 Nov 2024 19:39:46 +0000 (14:39 -0500)]
bcachefs: btree_and_journal_iter: don't iterate over too many whiteouts when prefetching
To help ameloriate issues with peek operations having to skip over
deletions in the journal - just bail out if all we're doing is
prefetching btree nodes.
Since btree node prefetching runs every time we iterate to a new node,
and has to sequentially scan ahead, this avoids another O(n^2).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 17 Nov 2024 19:20:35 +0000 (14:20 -0500)]
bcachefs: journal keys: sort keys for interior nodes first
There's an unavoidable issue with btree lookups when we're overlaying
journal keys and the journal has many deletions for keys present in the
btree - peek operations will have to iterate over all those deletions to
find the next live key to return.
This is mainly a problem for lookups in interior nodes, if we have to
traverse to a leaf. Looking up an insert position in a leaf (for journal
replay) doesn't have to find the next live key, but walking down the
btree does.
So to ameloriate this, change journal key sort ordering so that we
replay keys from roots and interior nodes first.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 17 Nov 2024 04:54:19 +0000 (23:54 -0500)]
bcachefs: kill bch2_journal_entries_free()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 15 Nov 2024 04:03:40 +0000 (23:03 -0500)]
bcachefs: Don't BUG_ON() when superblock feature wasn't set for compressed data
We don't allocate the mempools for compression/decompression unless we
need them - but that means there's an inconsistency to check for.
Reported-by: syzbot+cb3fbcfb417448cfd278@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 15 Nov 2024 05:52:20 +0000 (00:52 -0500)]
bcachefs: Don't use a shared decompress workspace mempool
gzip and zstd require different decompress workspace sizes, and if we
start with one and then start using the other at runtime we may not get
the correct size
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 17 Nov 2024 02:03:53 +0000 (21:03 -0500)]
bcachefs: compression workspaces should be indexed by opt, not type
type includes lz4 and lz4_old, which do not get different compression
workspaces, and incompressible, a fake type - BCH_COMPRESSION_OPTS() is
the correct enum to use.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 17 Nov 2024 08:31:01 +0000 (03:31 -0500)]
bcachefs: add missing BTREE_ITER_intent
this fixes excessive transaction restarts due to trans_commit having to
upgrade
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 15 Nov 2024 02:53:38 +0000 (21:53 -0500)]
bcachefs: Kill bch2_get_next_backpointer()
Since for quite some time backpointers have only been stored in the
backpointers btree, not alloc keys (an aborted experiment, support for
which has been removed) - we can replace get_next_backpointer() with
simple btree iteration.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 15 Nov 2024 02:28:40 +0000 (21:28 -0500)]
bcachefs: Delete backpointers check in try_alloc_bucket()
try_alloc_bucket() has a "safety" check, which avoids allocating a
bucket if there's any backpointers present.
But backpointers are not the source of truth for live data in a bucket,
the bucket sector counts are; this check was fairly useless, and we're
also deferring backpointers checks from fsck to runtime in the near
future.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 26 Oct 2024 00:41:06 +0000 (20:41 -0400)]
bcachefs: peek_prev_min(): Search forwards for extents, snapshots
With extents and snapshots, for slightly different reasons, we may have
to search forwards to find a key that compares equal to iter->pos (i.e.
a key that peek_prev() should return, as it returns keys <= iter->pos).
peek_slot() does this, and is an easy way to fix this case.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 25 Oct 2024 02:12:37 +0000 (22:12 -0400)]
bcachefs: Implement bch2_btree_iter_prev_min()
A user contributed a filessytem dump, where the dump was actually
corrupted (due to being taken while the filesystem was online), but
which exposed an interesting bug in fsck - reconstruct_inode().
When itearting in BTREE_ITER_filter_snapshots mode, it's required to
give an end position for the iteration and it can't span inode numbers;
continuing into the next inode might mean we start seeing keys from a
different snapshot tree, that the is_ancestor() checks always filter,
thus we're never able to return a key and stop iterating.
Backwards iteration never implemented the end position because nothing
else needed it - except for reconstuct_inode().
Additionally, backwards iteration is now able to overlay keys from the
journal, which will be useful if we ever decide to start doing journal
replay in the background.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 27 Oct 2024 03:25:17 +0000 (23:25 -0400)]
bcachefs: discard_one_bucket() now uses need_discard_or_freespace_err()
More conversion of inconsistent errors to fsck errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 27 Oct 2024 02:21:20 +0000 (22:21 -0400)]
bcachefs: bch2_bucket_do_index(): inconsistent_err -> fsck_err
Factor out a common helper, need_discard_or_freespace_err(), which is
now used by both fsck and the runtime checks, and can repair.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 27 Oct 2024 04:40:43 +0000 (00:40 -0400)]
bcachefs: try_alloc_bucket() now uses bch2_check_discard_freespace_key()
check_discard_freespace_key() was doing all the same checks as
try_alloc_bucket(), but with repair.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 28 Oct 2024 00:47:03 +0000 (20:47 -0400)]
bcachefs: rework bch2_bucket_alloc_freelist() freelist iteration
Prep work for converting try_alloc_bucket() to use
bch2_check_discard_freespace_key().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 27 Oct 2024 04:05:54 +0000 (00:05 -0400)]
bcachefs: kill inconsistent err in invalidate_one_bucket()
Change it to a normal fsck_err() - meaning it'll get repaired at runtime
when that's flipped on.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 21 Oct 2024 00:27:44 +0000 (20:27 -0400)]
bcachefs: Don't delete reflink pointers to missing indirect extents
To avoid tragic loss in the event of transient errors (i.e., a btree
node topology error that was later corrected by btree node scan), we
can't delete reflink pointers to correct errors.
This adds a new error bit to bch_reflink_p, indicating that it is known
to point to a missing indirect extent, and the error has already been
reported.
Indirect extent lookups now use bch2_lookup_indirect_extent(), which on
error reports it as a fsck_err() and sets the error bit, and clears it
if necessary on succesful lookup.
This also gets rid of the bch2_inconsistent_error() call in
__bch2_read_indirect_extent, and in the reflink_p trigger: part of the
online self healing project.
An on disk format change isn't necessary here: setting the error bit
will be interpreted by older versions as pointing to a different index,
which will also be missing - which is fine.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 31 Oct 2024 05:25:09 +0000 (01:25 -0400)]
bcachefs: Reorganize reflink.c a bit
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 29 Oct 2024 03:43:16 +0000 (23:43 -0400)]
bcachefs: Reserve 8 bits in bch_reflink_p
Better repair for reflink pointers, as well as propagating new inode
options to indirect extents, are going to require a few extra bits
bch_reflink_p: so claim a few from the high end of the destination
index.
Also add some missing bounds checking.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 29 Oct 2024 01:27:23 +0000 (21:27 -0400)]
bcachefs: Kill FSCK_NEED_FSCK
If we find an error that indicates that we need to run fsck, we can
specify that directly with run_explicit_recovery_pass().
These are now log_fsck_err() calls: we're just logging in the superblock
that an error occurred - and possibly doing an emergency shutdown,
depending on policy.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 29 Oct 2024 05:17:08 +0000 (01:17 -0400)]
bcachefs: lru errors are expected when reconstructing alloc
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 27 Oct 2024 02:52:06 +0000 (22:52 -0400)]
bcachefs: Delete dead code from bch2_discard_one_bucket()
alloc key validation ensures that if a bucket is in need_discard state
the sector counts are all zero - we don't have to check for that.
The NEED_INC_GEN check appears to be dead code, as well: we only see
buckets in the need_discard btree, and it's an error if they aren't in
the need_discard state.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 27 Oct 2024 03:35:03 +0000 (23:35 -0400)]
bcachefs: bch2_btree_bit_mod_iter()
factor out a new helper, make it handle extents bitset btrees
(freespace).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 12 Nov 2024 08:53:30 +0000 (03:53 -0500)]
bcachefs: delete dead code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 8 Nov 2024 02:50:00 +0000 (21:50 -0500)]
bcachefs: Fix shutdown message
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 8 Nov 2024 00:15:38 +0000 (19:15 -0500)]
bcachefs: Don't use page allocator for sb_read_scratch
Kill another unnecessary dependency on PAGE_SIZE
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Youling Tang [Wed, 16 Oct 2024 01:49:11 +0000 (09:49 +0800)]
bcachefs: Simplify code in bch2_dev_alloc()
- Remove unnecessary variable 'ret'.
- Remove unnecessary bch2_dev_free() operations.
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Youling Tang [Fri, 27 Sep 2024 08:40:42 +0000 (16:40 +0800)]
bcachefs: Remove redundant initialization in bch2_vfs_inode_init()
`inode->v.i_ino` has been initialized to `inum.inum`. If `inum.inum` and
`bi->bi_inum` are not equal, BUG_ON() is triggered in
bch2_inode_update_after_write().
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Youling Tang [Tue, 24 Sep 2024 02:53:50 +0000 (10:53 +0800)]
bcachefs: Removes NULL pointer checks for __filemap_get_folio return values
__filemap_get_folio the return value cannot be NULL, so unnecessary checks
are removed.
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 9 Jul 2024 01:11:34 +0000 (09:11 +0800)]
bcachefs: Add support for FS_IOC_GETFSSYSFSPATH
[TEST]:
```
$ cat ioctl_getsysfspath.c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
int fd;
struct fs_sysfs_path sysfs_path = {};
if (argc != 2) {
fprintf(stderr, "Usage: %s <path_to_file_or_directory>\n", argv[0]);
exit(EXIT_FAILURE);
}
fd = open(argv[1], O_RDONLY);
if (fd == -1) {
perror("open");
exit(EXIT_FAILURE);
}
if (ioctl(fd, FS_IOC_GETFSSYSFSPATH, &sysfs_path) == -1) {
perror("ioctl FS_IOC_GETFSSYSFSPATH");
close(fd);
exit(EXIT_FAILURE);
}
printf("FS_IOC_GETFSSYSFSPATH: %s\n", sysfs_path.name);
close(fd);
return 0;
}
$ gcc ioctl_getsysfspath.c
$ sudo bcachefs format /dev/sda
$ sudo mount.bcachefs /dev/sda /mnt
$ sudo ./a.out /mnt
FS_IOC_GETFSSYSFSPATH: bcachefs/
c380b4ab-fbb6-41d2-b805-
7a89cae9cadb
```
Original patch link:
[1]: https://lore.kernel.org/all/
20240207025624.
1019754-8-kent.overstreet@linux.dev/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Youling Tang <youling.tang@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 9 Jul 2024 01:11:33 +0000 (09:11 +0800)]
bcachefs: Add support for FS_IOC_GETFSUUID
Use super_set_uuid() to set `sb->s_uuid_len` to avoid returning `-ENOTTY`
with sb->s_uuid_len being 0.
Original patch link:
[1]: https://lore.kernel.org/all/
20240207025624.
1019754-2-kent.overstreet@linux.dev/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Youling Tang [Wed, 16 Oct 2024 01:50:26 +0000 (09:50 +0800)]
bcachefs: Correct the description of the '--bucket=size' options
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Integral [Wed, 23 Oct 2024 10:00:33 +0000 (18:00 +0800)]
bcachefs: add support for true/false & yes/no in bool-type options
Here is the patch which uses existing constant table:
Currently, when using bcachefs-tools to set options, bool-type options
can only accept 1 or 0. Add support for accepting true/false and yes/no
for these options.
Signed-off-by: Integral <integral@murena.io>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Acked-by: David Howells <dhowells@redhat.com>
Kent Overstreet [Wed, 6 Nov 2024 18:13:25 +0000 (13:13 -0500)]
bcachefs: Move fsck ioctl code to fsck.c
chardev.c and fs-ioctl.c are not organized by subject; let's try to fix
this.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 26 Oct 2024 02:16:19 +0000 (22:16 -0400)]
bcachefs: Kill unnecessary iter_rewind() in bkey_get_empty_slot()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 25 Oct 2024 05:48:26 +0000 (01:48 -0400)]
bcachefs: Simplify btree_iter_peek() filter_snapshots
Collapse all the BTREE_ITER_filter_snapshots handling down into a single
block; btree iteration is much simpler in the !filter_snapshots case.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 24 Oct 2024 22:39:59 +0000 (18:39 -0400)]
bcachefs: Rename btree_iter_peek_upto() -> btree_iter_peek_max()
We'll be introducing btree_iter_peek_prev_min(), so rename for
consistency.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 26 Oct 2024 02:31:20 +0000 (22:31 -0400)]
bcachefs: Assert that we're not violating key cache coherency rules
We're not allowed to have a dirty key in the key cache if the key
doesn't exist at all in the btree - creation has to bypass the key
cache, so that iteration over the btree can check if the key is present
in the key cache.
Things break in subtle ways if cache coherency is broken, so this needs
an assert.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 27 Oct 2024 23:32:40 +0000 (19:32 -0400)]
bcachefs: bch2_trans_verify_not_unlocked_or_in_restart()
Fold two asserts into one.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 15 Oct 2024 03:52:51 +0000 (23:52 -0400)]
bcachefs: Better in_restart error
We're ramping up on checking transaction restart handling correctness -
so, in debug mode we now save a backtrace for where the restart was
emitted, which makes it much easier to track down the incorrect
handling.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 15 Oct 2024 03:33:57 +0000 (23:33 -0400)]
bcachefs: Assert we're not in a restart in bch2_trans_put()
This always indicates a transaction restart handling bug
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 8 Nov 2024 03:00:05 +0000 (22:00 -0500)]
bcachefs: Fix unhandled transaction restart in evacuate_bucket()
Generally, releasing a transaction within a transaction restart means an
unhandled transaction restart: but this can happen legitimately within
the move code, e.g. when bch2_move_ratelimit() tells us to exit before
we've retried.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 31 Oct 2024 04:25:36 +0000 (00:25 -0400)]
bcachefs: Improved check_topology() assert
On interior btree node updates, we always verify that we're not
introducing topology errors: child nodes should exactly span the range
of the parent node.
single_device.ktest small_nodes has been popping this assert: change it
to give us more information.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 31 Oct 2024 07:39:32 +0000 (03:39 -0400)]
bcachefs: Kill BCH_TRANS_COMMIT_lazy_rw
We unconditionally go read-write, if we're going to do so, before
journal replay: lazy_rw is obsolete.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 31 Oct 2024 07:35:41 +0000 (03:35 -0400)]
bcachefs: Add assert for use of journal replay keys for updates
The journal replay keys mechanism can only be used for updates in early
recovery, when still single threaded.
Add some asserts to make sure we never accidentally use it elsewhere.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Hongbo Li [Tue, 29 Oct 2024 12:54:08 +0000 (20:54 +0800)]
bcachefs: use attribute define helper for sysfs attribute
The sysfs attribute definition has been wrapped into macro:
rw_attribute, read_attribute and write_attribute, we can
use these helpers to uniform the attribute definition.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Hongbo Li [Tue, 29 Oct 2024 12:53:50 +0000 (20:53 +0800)]
bcachefs: remove write permission for gc_gens_pos sysfs interface
The gc_gens_pos is used to show the status of bucket gen gc.
There is no need to assign write permissions for this attribute.
Here we can use read_attribute helper to define this attribute.
```
[Before]
$ ll internal/gc_gens_pos
-rw-r--r-- 1 root root 4096 Oct 28 15:27 internal/gc_gens_pos
[After]
$ ll internal/gc_gens_pos
-r--r--r-- 1 root root 4096 Oct 28 17:27 internal/gc_gens_pos
```
Fixes:
ac516d0e7db7 ("bcachefs: Add the status of bucket gen gc to sysfs")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 29 Oct 2024 03:23:18 +0000 (23:23 -0400)]
bcachefs: Move bch_extent_rebalance code to rebalance.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 26 Oct 2024 05:42:57 +0000 (01:42 -0400)]
bcachefs: Improve trace_rebalance_extent
We now say explicitly which pointers are being moved or compressed
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Oct 2024 01:41:20 +0000 (21:41 -0400)]
bcachefs: Simplify option logic in rebalance
Since bch2_move_get_io_opts() now synchronizes io_opts with options from
bch_extent_rebalance, delete the ad-hoc logic in rebalance.c that
previously did this.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Oct 2024 01:41:20 +0000 (21:41 -0400)]
bcachefs: get_update_rebalance_opts()
bch2_move_get_io_opts() now synchronizes options loaded from the
filesystem and inode (if present, i.e. not walking the reflink btree
directly) with options from the bch_extent_rebalance_entry, updating the
extent if necessary.
Since bch_extent_rebalance tracks where its option came from we can
preserve "inode options override filesystem options", even for indirect
extents where we don't have access to the inode the options came from.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 21 Oct 2024 00:53:53 +0000 (20:53 -0400)]
bcachefs: bch2_write_inode() now checks for changing rebalance options
Previously, BCHFS_IOC_REINHERIT_ATTRS didn't trigger rebalance scans
when changing rebalance options - it had been missed, only the xattr
interface triggered them.
Ideally they'd be done by the transactional trigger, but unpacking the
inode to get the options is too heavy to be done in the low level
trigger - the inode trigger is run on every extent update, since the
bch_inode.bi_journal_seq has to be updated for fsync.
bch2_write_inode() is a good compromise, it already unpacks and repacks
and is not run in any super-fast paths.
Additionally, creating the new rebalance entry to trigger the scan is
now done in the same transaction as the inode update that changed the
options.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Oct 2024 01:41:20 +0000 (21:41 -0400)]
bcachefs: New bch_extent_rebalance fields
- Add more io path options to bch_extent_rebalance
- For each option, track whether it came from the filesystem or the
inode
This will be used for improved rebalance support for reflinked data.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 28 Oct 2024 05:14:53 +0000 (01:14 -0400)]
bcachefs: bch2_prt_csum_opt()
bounds checking helper
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 24 Oct 2024 05:06:53 +0000 (01:06 -0400)]
bcachefs: copygc_enabled, rebalance_enabled now opts.h options
They can now be set at mount time
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Oct 2024 03:26:11 +0000 (23:26 -0400)]
bcachefs: Add bch_io_opts fields for indicating whether the opts came from the inode
This is going to be used in the bch_extent_rebalance improvements, which
propagate io_path options into the extent (important for rebalance,
which needs something present in the extent for transactionally tagging
them in the rebalance_work btree, and also for indirect extents).
By tracking in bch_extent_rebalance whether the option came from the
filesystem or the inode we can correctly handle options being changed on
indirect extents.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Oct 2024 06:28:51 +0000 (02:28 -0400)]
bcachefs: io_opts_to_rebalance_opts()
New helper to simplify bch2_bkey_set_needs_rebalance()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Oct 2024 06:21:28 +0000 (02:21 -0400)]
bcachefs: rename bch_extent_rebalance fields to match other opts structs
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Oct 2024 06:14:53 +0000 (02:14 -0400)]
bcachefs: kill __bch2_bkey_sectors_need_rebalance()
Single caller, fold into bch2_bkey_sectors_need_rebalance()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Oct 2024 05:40:19 +0000 (01:40 -0400)]
bcachefs: kill bch2_bkey_needs_rebalance()
Dead code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Oct 2024 05:32:55 +0000 (01:32 -0400)]
bcachefs: small cleanup for extent ptr bitmasks
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Oct 2024 05:21:43 +0000 (01:21 -0400)]
bcachefs: bch2_io_opts_fixups()
Centralize some io path option fixups - they weren't always being
applied correctly:
- background_compression uses compression if unset
- background_target uses foreground_target if unset
- nocow disables most fancy io path options
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Oct 2024 05:16:16 +0000 (01:16 -0400)]
bcachefs: use bch2_data_update_opts_to_text() in trace_move_extent_fail()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Oct 2024 05:11:29 +0000 (01:11 -0400)]
bcachefs: avoid 'unsigned flags'
flags should have actual types, where possible: fix btree_update.h
helpers
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Thorsten Blum [Sat, 26 Oct 2024 15:47:04 +0000 (17:47 +0200)]
bcachefs: Annotate struct bucket_gens with __counted_by()
Add the __counted_by compiler attribute to the flexible array member b
to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
Use struct_size() to calculate the number of bytes to be allocated.
Update bucket_gens->nbuckets and bucket_gens->nbuckets_minus_first when
resizing.
Compile-tested only.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>