linux-block.git
10 months agobcachefs: Add an option for whether inodes use the key cache
Kent Overstreet [Sun, 13 Jun 2021 21:07:18 +0000 (17:07 -0400)]
bcachefs: Add an option for whether inodes use the key cache

We probably don't ever want to flip this off in production, but it may
be useful for certain kinds of testing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix an allocator shutdown deadlock
Kent Overstreet [Tue, 13 Jul 2021 20:12:00 +0000 (16:12 -0400)]
bcachefs: Fix an allocator shutdown deadlock

On fstest generic/388, we were seeing sporadic deadlocks in the
emergency shutdown, where we'd get stuck shutting down the allocator
because bch2_btree_update_start() -> bch2_btree_reserve_get() allocated
and then deallocated some btree nodes, putting them back on the
btree_reserve_cache, after the allocator shutdown code had already
cleared out that cache.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Add safe versions of varint encode/decode
Kent Overstreet [Tue, 13 Jul 2021 20:03:51 +0000 (16:03 -0400)]
bcachefs: Add safe versions of varint encode/decode

This adds safe versions of bch2_varint_(encode|decode) that don't read
or write past the end of the buffer, or varint being encoded.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
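
To illustrate what a "safe" varint helper looks like, here is a minimal sketch. It uses a generic LEB128-style encoding as a stand-in, not the actual bcachefs on-disk varint format, and the function names are hypothetical. The point is only that both helpers take an end pointer and refuse to touch memory beyond it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative LEB128-style varint; NOT the bcachefs encoding.
 * Returns the number of bytes written, or -1 if the value does not
 * fit in [out, end).
 */
static int varint_encode_safe(uint8_t *out, const uint8_t *end, uint64_t v)
{
	uint8_t *p = out;

	assert(out <= end);
	do {
		if (p >= end)
			return -1;	/* would write past the buffer */
		uint8_t byte = v & 0x7f;
		v >>= 7;
		*p++ = byte | (v ? 0x80 : 0);
	} while (v);

	return (int)(p - out);
}

/*
 * Returns the number of bytes consumed, or -1 if the input is
 * truncated or longer than a 64-bit value allows.
 */
static int varint_decode_safe(const uint8_t *in, const uint8_t *end, uint64_t *v)
{
	const uint8_t *p = in;
	uint64_t ret = 0;
	unsigned shift = 0;

	while (p < end && shift < 64) {
		uint8_t byte = *p++;

		ret |= (uint64_t)(byte & 0x7f) << shift;
		if (!(byte & 0x80)) {
			*v = ret;
			return (int)(p - in);
		}
		shift += 7;
	}
	return -1;	/* ran out of input or of bits */
}
```

A caller that gets -1 back knows the buffer was too short, instead of silently corrupting or reading adjacent memory.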
10 months agobcachefs: Add open_buckets to sysfs
Kent Overstreet [Tue, 13 Jul 2021 03:52:49 +0000 (23:52 -0400)]
bcachefs: Add open_buckets to sysfs

This is to help debug a rare shutdown deadlock in the allocator code -
the btree code is leaking open_buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Ensure bad d_type doesn't oops in bch2_dirent_to_text()
Kent Overstreet [Tue, 13 Jul 2021 03:17:15 +0000 (23:17 -0400)]
bcachefs: Ensure bad d_type doesn't oops in bch2_dirent_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Kick off btree node writes from write completions
Kent Overstreet [Sun, 11 Jul 2021 20:41:14 +0000 (16:41 -0400)]
bcachefs: Kick off btree node writes from write completions

This improves performance by removing the need to wait for the
in-flight btree write to complete before kicking off the next one. It
will be needed to avoid a performance regression with the upcoming patch
to update btree ptrs after every btree write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Mask out unknown compat features when going read-write
Kent Overstreet [Sun, 11 Jul 2021 17:54:07 +0000 (13:54 -0400)]
bcachefs: Mask out unknown compat features when going read-write

Compat features should be cleared if the filesystem was touched by a
version that doesn't support them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Really don't hold btree locks while btree IOs are in flight
Kent Overstreet [Sun, 11 Jul 2021 03:03:15 +0000 (23:03 -0400)]
bcachefs: Really don't hold btree locks while btree IOs are in flight

This is something we've attempted to stick to for quite some time, as it
helps guarantee filesystem latency - but there are a few remaining paths
that this patch fixes.

This is also necessary for an upcoming patch to update btree pointers
after every btree write - since the btree write completion path will now
be doing btree operations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Regularize argument passing of btree_trans
Kent Overstreet [Sun, 11 Jul 2021 03:22:06 +0000 (23:22 -0400)]
bcachefs: Regularize argument passing of btree_trans

btree_trans should always be passed when we have one - iter->trans is
disfavoured. This mainly updates old code in btree_update_interior.c,
some of which predates btree_trans.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: docs: add docs for bch2_trans_reset
Dan Robertson [Thu, 8 Jul 2021 02:31:36 +0000 (22:31 -0400)]
bcachefs: docs: add docs for bch2_trans_reset

Add basic kernel docs for bch2_trans_reset and bch2_trans_begin.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: set disk state should check new_state
Dan Robertson [Thu, 8 Jul 2021 22:15:38 +0000 (18:15 -0400)]
bcachefs: set disk state should check new_state

A new device state that is not a valid state should return -EINVAL
in the disk set state ioctl.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
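
A minimal sketch of the validation described above, assuming a hypothetical state enum with a trailing count value (the real enum and ioctl plumbing live in the bcachefs sources and are not reproduced here):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device states; STATE_NR is the number of valid states. */
enum member_state { STATE_RW, STATE_RO, STATE_FAILED, STATE_SPARE, STATE_NR };

/* Reject out-of-range states before touching the current state. */
static int set_disk_state(enum member_state *cur, unsigned new_state)
{
	assert(cur);
	if (new_state >= STATE_NR)
		return -EINVAL;

	*cur = (enum member_state)new_state;
	return 0;
}
```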
10 months agobcachefs: BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE
Kent Overstreet [Tue, 6 Jul 2021 02:16:02 +0000 (22:16 -0400)]
bcachefs: BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE

Add a new flag to control assertions about updates to internal snapshot
nodes, which normally should not be written to - to be used in an
upcoming patch.

Also do some renaming - trigger_flags is now update_flags.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: bch2_d_types[]
Kent Overstreet [Tue, 6 Jul 2021 02:18:07 +0000 (22:18 -0400)]
bcachefs: bch2_d_types[]

Add readable names for d_type, and use it in dirent_to_text().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix bch2_btree_iter_peek_slot() assertion
Kent Overstreet [Tue, 6 Jul 2021 02:08:28 +0000 (22:08 -0400)]
bcachefs: Fix bch2_btree_iter_peek_slot() assertion

This assertion is checking that what the iterator points to is
consistent with iter->real_pos, and since it's an internal btree
ordering property it should be using bpos_cmp.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Split out SPOS_MAX
Kent Overstreet [Tue, 6 Jul 2021 02:02:07 +0000 (22:02 -0400)]
bcachefs: Split out SPOS_MAX

Internal btree code really wants a POS_MAX with all fields ~0; external
code more likely wants the snapshot field to be 0, because when we're
passing it to bch2_trans_get_iter() it's used for the snapshot we're
operating in, which should be 0 for most btrees that don't use
snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
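
The distinction can be sketched as follows. The struct layout here is a simplified assumption (three plain fields in comparison order), and the comparison helper is a local re-creation written for this example rather than the kernel's definition:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified position: the real struct's field order/packing may differ. */
struct bpos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

/* External callers: snapshot 0 for btrees that don't use snapshots. */
static const struct bpos POS_MAX  = { UINT64_MAX, UINT64_MAX, 0 };
/* Internal btree code: every field ~0, the true maximum position. */
static const struct bpos SPOS_MAX = { UINT64_MAX, UINT64_MAX, UINT32_MAX };

/* Full ordering including the snapshot field. */
static int bpos_cmp(struct bpos l, struct bpos r)
{
	if (l.inode != r.inode)
		return l.inode < r.inode ? -1 : 1;
	if (l.offset != r.offset)
		return l.offset < r.offset ? -1 : 1;
	if (l.snapshot != r.snapshot)
		return l.snapshot < r.snapshot ? -1 : 1;
	return 0;
}
```

Under this ordering POS_MAX sorts strictly before SPOS_MAX, which is why code that needs "the largest possible key" cannot use the snapshot-0 variant.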
10 months agobcachefs: add bcachefs xxhash support
jpsollie [Thu, 17 Jun 2021 11:42:09 +0000 (13:42 +0200)]
bcachefs: add bcachefs xxhash support

xxhash is a much faster algorithm than crc32 and could be used to speed
up checksum calculation. Only the 64-bit variant is supported, as it is
much faster than xxh32 on 64-bit CPUs.

Signed-off-by: jpsollie <janpieter.sollie@edpnet.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Prepare checksums for more advanced algorithms
jpsollie [Thu, 17 Jun 2021 09:29:59 +0000 (11:29 +0200)]
bcachefs: Prepare checksums for more advanced algorithms

Perform abstraction of hash calculation for advanced checksum algorithms.
Algorithms like xxhash do not store their state as a u64 int.

Signed-off-by: jpsollie <janpieter.sollie@edpnet.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Enforce SYS_CAP_ADMIN within ioctls
Tobias Geerinckx-Rice [Sun, 4 Jul 2021 19:35:32 +0000 (21:35 +0200)]
bcachefs: Enforce SYS_CAP_ADMIN within ioctls

bch2_fs_ioctl() didn't distinguish between unsupported ioctls and those
which the current user is unauthorised to perform.  That kept the code
simple but meant that, for example, an unprivileged TIOCGWINSZ ioctl on
a bcachefs file would return -EPERM instead of the expected -ENOTTY.
The same call made by a privileged user would correctly return -ENOTTY.

Fix this discrepancy by moving the check for CAP_SYS_ADMIN into each
privileged ioctl function.

Signed-off-by: Tobias Geerinckx-Rice <me@tobias.gr>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
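
The effect of moving the check can be sketched with a toy dispatcher. The command number and function names are hypothetical, and a boolean stands in for the kernel's capability check:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum { CMD_PRIVILEGED_OP = 1 };	/* hypothetical ioctl number */

/* Each privileged handler checks privilege itself. */
static int privileged_op(bool is_admin)
{
	if (!is_admin)
		return -EPERM;
	return 0;
}

/* The dispatcher no longer checks privilege up front. */
static int fs_ioctl(unsigned cmd, bool is_admin)
{
	switch (cmd) {
	case CMD_PRIVILEGED_OP:
		return privileged_op(is_admin);
	default:
		/* Unknown ioctls get -ENOTTY regardless of privilege. */
		return -ENOTTY;
	}
}
```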
10 months agobcachefs: Fix bch2_btree_iter_peek_prev()
Kent Overstreet [Sun, 4 Jul 2021 03:57:09 +0000 (23:57 -0400)]
bcachefs: Fix bch2_btree_iter_peek_prev()

In !BTREE_ITER_IS_EXTENTS mode, we shouldn't be looking at k->size, i.e.
we shouldn't use bkey_start_pos().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix bch2_acl_chmod() cleanup on error
Dan Robertson [Thu, 24 Jun 2021 01:52:41 +0000 (21:52 -0400)]
bcachefs: Fix bch2_acl_chmod() cleanup on error

Avoid calling kfree on the returned error pointer if
bch2_acl_from_disk fails.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
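
The class of bug fixed here can be shown in userspace. The ERR_PTR helpers below are local re-creations of the kernel idiom written for this example, and acl_from_disk()/acl_chmod() are hypothetical stand-ins, not the bcachefs functions:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's error-pointer helpers. */
static void *ERR_PTR(intptr_t err) { return (void *)err; }
static intptr_t PTR_ERR(const void *p) { return (intptr_t)p; }
static bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical parser: returns an allocation or an error pointer. */
static void *acl_from_disk(bool corrupt)
{
	return corrupt ? ERR_PTR(-EINVAL) : malloc(16);
}

static int acl_chmod(bool corrupt)
{
	void *acl = acl_from_disk(corrupt);
	int ret = 0;

	if (IS_ERR(acl)) {
		ret = (int)PTR_ERR(acl);
		acl = NULL;	/* never hand an error pointer to free() */
	}

	/* ... real work would go here (allocation failure ignored) ... */

	free(acl);	/* free(NULL) is a no-op */
	return ret;
}
```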
10 months agobcachefs: statfs bfree and bavail should be the same
Dan Robertson [Wed, 23 Jun 2021 23:25:00 +0000 (19:25 -0400)]
bcachefs: statfs bfree and bavail should be the same

The value of f_bfree and f_bavail should be the same. The value of
f_bfree is not currently scaled by the availability factor.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Fix shift-by-64 in bch2_bkey_format_validate()
Kent Overstreet [Thu, 24 Jun 2021 17:19:25 +0000 (13:19 -0400)]
bcachefs: Fix shift-by-64 in bch2_bkey_format_validate()

We need to ensure that packed formats can't represent fields larger than
the unpacked format, which is a bit tricky since the calculations can
also overflow a u64. This patch fixes a shift and simplifies the overall
calculations.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
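
Shifting a 64-bit value by 64 is undefined behaviour in C, which is the trap this kind of calculation falls into. A minimal sketch of computing the largest value a field of a given width can hold without ever shifting by 64 (the helper name is hypothetical, and this is not the code from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Largest value representable in `bits` bits, for 0 <= bits <= 64. */
static uint64_t field_max(unsigned bits)
{
	assert(bits <= 64);
	/* Shift amount is 64 - bits, so it is in [0, 63] whenever bits > 0. */
	return bits ? UINT64_MAX >> (64 - bits) : 0;
}
```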
10 months agobcachefs: fix truncate without a size change
Dan Robertson [Mon, 28 Jun 2021 00:54:34 +0000 (20:54 -0400)]
bcachefs: fix truncate without a size change

Do not attempt to shortcut a truncate when the given new size is
the same as the current size. There may be blocks allocated to the
file that extend beyond the i_size. The ctime and mtime should
not be updated in this case.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: fix ifdef for x86_64 asm
Dan Robertson [Sat, 3 Jul 2021 01:22:06 +0000 (21:22 -0400)]
bcachefs: fix ifdef for x86_64 asm

The implementation of prefetch_four_cachelines should use ifdef
CONFIG_X86_64 to conditionally compile x86_64 asm.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: ensure iter->should_be_locked is set
Dan Robertson [Tue, 29 Jun 2021 22:52:13 +0000 (18:52 -0400)]
bcachefs: ensure iter->should_be_locked is set

Ensure that iter->should_be_locked is set to true before we
call bch2_trans_update in __bch2_dev_usrdata_drop.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Fix unused variable warning when !BCACHEFS_DEBUG
Christopher James Halse Rogers [Fri, 25 Jun 2021 01:45:19 +0000 (11:45 +1000)]
bcachefs: Fix unused variable warning when !BCACHEFS_DEBUG

Signed-off-by: Christopher James Halse Rogers <raof@ubuntu.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Use memalloc_nofs_save() in bch2_read_endio()
Kent Overstreet [Wed, 30 Jun 2021 19:44:11 +0000 (15:44 -0400)]
bcachefs: Use memalloc_nofs_save() in bch2_read_endio()

This solves a problematic memory allocation in bch2_bio_uncompress() ->
vmap().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix btree_node_read_all_replicas() error handling
Kent Overstreet [Wed, 23 Jun 2021 01:51:17 +0000 (21:51 -0400)]
bcachefs: Fix btree_node_read_all_replicas() error handling

We weren't checking bch2_btree_node_read_done() for errors, oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Don't loop into topology repair
Kent Overstreet [Wed, 23 Jun 2021 00:44:54 +0000 (20:44 -0400)]
bcachefs: Don't loop into topology repair

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Don't ratelimit certain fsck errors
Kent Overstreet [Mon, 21 Jun 2021 20:28:43 +0000 (16:28 -0400)]
bcachefs: Don't ratelimit certain fsck errors

It's unhelpful if we see "Halting mark and sweep to start topology
repair" but we don't see the error that triggered it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: ensure iter->should_be_locked is set
Dan Robertson [Thu, 17 Jun 2021 03:21:23 +0000 (23:21 -0400)]
bcachefs: ensure iter->should_be_locked is set

Ensure that iter->should_be_locked value is set to true before we
call bch2_trans_update in ec_stripe_update_ptrs.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Don't disable preemption unnecessarily
Kent Overstreet [Fri, 11 Jun 2021 03:34:02 +0000 (23:34 -0400)]
bcachefs: Don't disable preemption unnecessarily

Small improvements to some percpu utility code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Extensive triggers cleanups
Kent Overstreet [Fri, 11 Jun 2021 01:44:27 +0000 (21:44 -0400)]
bcachefs: Extensive triggers cleanups

 - We no longer mark subsets of extents, they're marked like regular
   keys now - which means we can drop the offset & sectors arguments
   to trigger functions
 - Drop other arguments that are no longer needed in various
   places - fs_usage
 - Drop the logic for handling extents in bch2_mark_update() that isn't
   needed anymore, to match bch2_trans_mark_update()
 - Better logic for handling the BTREE_ITER_CACHED_NOFILL case, where we
   don't have an old key to mark

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: fix truncate with ATTR_MODE
Kent Overstreet [Tue, 15 Jun 2021 02:29:54 +0000 (22:29 -0400)]
bcachefs: fix truncate with ATTR_MODE

After the v5.12 rebase, we started oopsing when truncate was passed
ATTR_MODE, due to not passing mnt_userns to setattr_copy(). This
refactors things so that truncate/extend finish by using
bch2_setattr_nonsize(), which solves the problem.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Improve iter->should_be_locked
Kent Overstreet [Mon, 14 Jun 2021 22:16:10 +0000 (18:16 -0400)]
bcachefs: Improve iter->should_be_locked

Adding iter->should_be_locked introduced a regression where it ended up
not being set on the iterator passed to bch2_btree_update_start(), which
is definitely not what we want.

This patch requires it to be set when calling bch2_trans_update(), and
adds various fixups to make that happen.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Kill __btree_delete_at()
Kent Overstreet [Mon, 14 Jun 2021 20:35:03 +0000 (16:35 -0400)]
bcachefs: Kill __btree_delete_at()

With trans->updates2 gone, we can now drop this helper and use
bch2_btree_delete_at() instead.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Make sure bch2_trans_mark_update uses correct iter flags
Kent Overstreet [Mon, 14 Jun 2021 20:32:44 +0000 (16:32 -0400)]
bcachefs: Make sure bch2_trans_mark_update uses correct iter flags

Now that bch2_btree_iter_peek_with_updates() has been removed in favor
of BTREE_ITER_WITH_UPDATES, we need to make sure it's not used where we
don't want it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix a memory leak in dio write path
Kent Overstreet [Mon, 14 Jun 2021 18:47:26 +0000 (14:47 -0400)]
bcachefs: Fix a memory leak in dio write path

Commit c42bca92be928ce7dece5fc04cf68d0e37ee6718 "bio: don't copy bvec
for direct IO" changed bio_iov_iter_get_pages() to point bio->bi_iovec
at the incoming biovec, meaning if we already allocated one, it'll be
leaked.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: fix a possible bcachefs checksum mapping error opt-checksum enum to type...
Janpieter Sollie [Sun, 13 Jun 2021 20:01:08 +0000 (22:01 +0200)]
bcachefs: fix a possible bcachefs checksum mapping error opt-checksum enum to type-checksum enum

This fixes some rare cases where the metadata checksum option specified
may map to the wrong actual checksum type.

Signed-off-by: Janpieter Sollie <janpieter.sollie@edpnet.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Clear iter->should_be_locked in bch2_trans_reset
Kent Overstreet [Sun, 13 Jun 2021 02:33:53 +0000 (22:33 -0400)]
bcachefs: Clear iter->should_be_locked in bch2_trans_reset

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Don't underflow c->sectors_available
Kent Overstreet [Fri, 11 Jun 2021 03:33:27 +0000 (23:33 -0400)]
bcachefs: Don't underflow c->sectors_available

This rarely used error path should've been checking for underflow -
oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Kill bch2_btree_iter_peek_cached()
Kent Overstreet [Fri, 11 Jun 2021 00:15:50 +0000 (20:15 -0400)]
bcachefs: Kill bch2_btree_iter_peek_cached()

It's now been rolled into bch2_btree_iter_peek_slot()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Allow shorter JSET_ENTRY_dev_usage entries
Kent Overstreet [Sat, 12 Jun 2021 21:20:02 +0000 (17:20 -0400)]
bcachefs: Allow shorter JSET_ENTRY_dev_usage entries

If the last entry(ies) would be all zeros, there's no need to write them
out - the read path already handles that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
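
A minimal sketch of the idea, using a flat array of counters as a hypothetical stand-in for the journal entry's payload: the writer trims trailing zeros, and the reader zero-fills whatever was not written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writer side: number of entries worth writing out. */
static size_t trim_trailing_zeros(const uint64_t *v, size_t nr)
{
	while (nr && v[nr - 1] == 0)
		nr--;
	return nr;
}

/* Reader side: missing trailing entries read back as zero. */
static void read_zero_extended(uint64_t *dst, size_t dst_nr,
			       const uint64_t *src, size_t src_nr)
{
	assert(src_nr <= dst_nr);
	memset(dst, 0, dst_nr * sizeof(dst[0]));
	memcpy(dst, src, src_nr * sizeof(src[0]));
}
```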
10 months agobcachefs: mount: fix null deref with null devname
Dan Robertson [Thu, 10 Jun 2021 11:52:42 +0000 (07:52 -0400)]
bcachefs: mount: fix null deref with null devname

 - Fix null deref on mount when given a null device name.
 - Move the dev_name checks to return EINVAL when it is invalid.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Fix null ptr deref when splitting compressed extents
Kent Overstreet [Sat, 12 Jun 2021 19:45:56 +0000 (15:45 -0400)]
bcachefs: Fix null ptr deref when splitting compressed extents

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix overflow in journal_replay_entry_early
Kent Overstreet [Fri, 11 Jun 2021 03:51:09 +0000 (23:51 -0400)]
bcachefs: Fix overflow in journal_replay_entry_early

If the filesystem on disk was used by a version with a larger BCH_DATA_NR
than the currently running version, we don't want this to cause a buffer
overrun.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
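
The usual defence is to clamp the on-disk count to the size of the in-memory array before copying. A sketch, with a hypothetical DATA_NR constant and struct standing in for the real definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { DATA_NR = 4 };	/* hypothetical: data types this build knows about */

struct usage {
	uint64_t sectors[DATA_NR];
};

/* Copy at most DATA_NR entries, however many the on-disk entry carries. */
static void usage_from_disk(struct usage *u, const uint64_t *src, size_t src_nr)
{
	size_t n = src_nr < DATA_NR ? src_nr : DATA_NR;

	assert(u && src);
	memset(u, 0, sizeof(*u));
	memcpy(u->sectors, src, n * sizeof(src[0]));
}
```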
10 months agobcachefs: Always zero memory from bch2_trans_kmalloc()
Kent Overstreet [Mon, 7 Jun 2021 20:50:30 +0000 (16:50 -0400)]
bcachefs: Always zero memory from bch2_trans_kmalloc()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Merging for indirect extents
Kent Overstreet [Sat, 15 May 2021 19:04:08 +0000 (15:04 -0400)]
bcachefs: Merging for indirect extents

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Improved extent merging
Kent Overstreet [Sat, 15 May 2021 04:37:37 +0000 (00:37 -0400)]
bcachefs: Improved extent merging

Previously, checksummed extents could only be merged when the checksum
covered only the currently live data.

xfstest generic/064 creates a test file, then uses finsert calls to
split the extent, then collapse calls to see if they get merged. But
without any reads to trigger the narrow_crcs path, each of the split
extents will still have a checksum for the entire original extent.

This patch improves the extent merge path so that if either of the
extents we're attempting to merge has a checksum that covers the entire
merged extent, we just use that checksum.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Re-implement extent merging in transaction commit path
Kent Overstreet [Thu, 29 Apr 2021 03:52:19 +0000 (23:52 -0400)]
bcachefs: Re-implement extent merging in transaction commit path

We haven't had extent merging in quite some time. It used to be done by
the btree code when sorting btree nodes, but that was eliminated as part
of the work to separate extent handling from core btree code.

This patch re-implements extent merging in the transaction commit path.
We don't currently have the ability to merge reflink pointers, we need
to do some work on the triggers code to be able to do that without
ending up with incorrect refcounts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Refactor extent_handle_overwrites()
Kent Overstreet [Thu, 29 Apr 2021 03:52:19 +0000 (23:52 -0400)]
bcachefs: Refactor extent_handle_overwrites()

Prep work for extent merging

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Clean up key merging
Kent Overstreet [Thu, 29 Apr 2021 03:49:30 +0000 (23:49 -0400)]
bcachefs: Clean up key merging

This patch simplifies the key merging code by getting rid of partial
merges - it's simpler and saner if we just don't merge extents when
they'd overflow k->size.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Kill trans->updates2
Kent Overstreet [Mon, 7 Jun 2021 18:54:56 +0000 (14:54 -0400)]
bcachefs: Kill trans->updates2

Now that extent handling has been lifted to bch2_trans_update(), we
don't need to keep two different lists of updates.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Simplify reflink trigger
Kent Overstreet [Mon, 7 Jun 2021 17:39:21 +0000 (13:39 -0400)]
bcachefs: Simplify reflink trigger

Now that we only mark entire extents, we can ditch the
"reflink_p_frag_references" code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Move extent_handle_overwrites() to bch2_trans_update()
Kent Overstreet [Wed, 2 Jun 2021 04:18:34 +0000 (00:18 -0400)]
bcachefs: Move extent_handle_overwrites() to bch2_trans_update()

This lifts handling of overlapping extents out of __bch2_trans_commit()
and moves it to where we first do the update - which means that
BTREE_ITER_WITH_UPDATES can now work correctly in extents mode.

Also, this patch reworks how extent triggers work: previously, on
partial extent overwrite we would pass this information to the trigger,
telling it what part of the extent was being overwritten. But, this
approach has had too many subtle corner cases - now, we only mark whole
extents, meaning on partial extent overwrite we unmark the old extent
and mark the new extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: bch2_btree_iter_peek_slot() now saves initial position when searching
Kent Overstreet [Sat, 31 Dec 2022 00:15:53 +0000 (19:15 -0500)]
bcachefs: bch2_btree_iter_peek_slot() now saves initial position when searching

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Kill __bch2_btree_iter_peek_slot_extents()
Kent Overstreet [Sat, 31 Dec 2022 00:15:53 +0000 (19:15 -0500)]
bcachefs: Kill __bch2_btree_iter_peek_slot_extents()

This codepath won't just be for extents in the future, it'll also be for
BTREE_ITER_FILTER_SNAPSHOTS mode.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: bch2_btree_iter_peek_slot() now supports BTREE_ITER_WITH_UPDATES
Kent Overstreet [Sat, 31 Dec 2022 03:41:38 +0000 (22:41 -0500)]
bcachefs: bch2_btree_iter_peek_slot() now supports BTREE_ITER_WITH_UPDATES

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: BTREE_ITER_WITH_UPDATES
Kent Overstreet [Fri, 4 Jun 2021 04:29:49 +0000 (00:29 -0400)]
bcachefs: BTREE_ITER_WITH_UPDATES

This drops bch2_btree_iter_peek_with_updates() and replaces it with a
new flag, BTREE_ITER_WITH_UPDATES, and also reworks
bch2_btree_iter_peek_slot() to respect it too.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Child btree iterators
Kent Overstreet [Sat, 20 Mar 2021 19:12:05 +0000 (15:12 -0400)]
bcachefs: Child btree iterators

This adds the ability for btree iterators to own child iterators - to be
used by an upcoming rework of bch2_btree_iter_peek_slot(), so we can
scan forwards while maintaining our current position.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Drop all btree locks when submitting btree node reads
Kent Overstreet [Fri, 9 Apr 2021 02:26:53 +0000 (22:26 -0400)]
bcachefs: Drop all btree locks when submitting btree node reads

As a rule we don't want to be holding btree locks while submitting IO -
this will improve overall filesystem latency.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: More topology repair code
Kent Overstreet [Mon, 7 Jun 2021 17:28:50 +0000 (13:28 -0400)]
bcachefs: More topology repair code

This improves the handling of overlapping btree nodes; now, we handle
the case where one btree node completely overwrites another.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix a buffer overrun
Kent Overstreet [Thu, 10 Jun 2021 17:21:39 +0000 (13:21 -0400)]
bcachefs: Fix a buffer overrun

In make_extent_indirect(), we were allocating too small of a buffer for
the new indirect extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Don't mark superblocks past end of usable space
Kent Overstreet [Wed, 9 Jun 2021 02:50:30 +0000 (22:50 -0400)]
bcachefs: Don't mark superblocks past end of usable space

bcachefs-tools recently started putting a backup superblock at the end
of the device. This causes a problem if the bucket size doesn't divide
the device size - but we can fix it by just skipping marking that part.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
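
The arithmetic can be sketched as follows. The helper name and parameters are hypothetical; it computes how many sectors of a superblock range fall inside whole buckets, treating the trailing partial bucket as unusable:

```c
#include <assert.h>
#include <stdint.h>

/* Sectors of [start, start + len) that lie within whole buckets. */
static uint64_t sectors_to_mark(uint64_t start, uint64_t len,
				uint64_t dev_sectors, uint64_t bucket_sectors)
{
	assert(bucket_sectors != 0);

	/* Round the device size down to a multiple of the bucket size. */
	uint64_t usable_end = dev_sectors / bucket_sectors * bucket_sectors;

	if (start >= usable_end)
		return 0;	/* entirely past usable space: skip marking */

	uint64_t end = start + len;
	if (end > usable_end)
		end = usable_end;
	return end - start;
}
```

For example, with a 1000-sector device and 128-sector buckets the usable space ends at sector 896, so a backup superblock at sector 992 is not marked at all.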
10 months agobcachefs: Fix a spurious debug mode assertion
Kent Overstreet [Tue, 8 Jun 2021 20:29:24 +0000 (16:29 -0400)]
bcachefs: Fix a spurious debug mode assertion

When we switched to using bch2_btree_bset_insert_key() for extents it
turned out it started leaving invalid keys around - of type deleted but
nonzero size - but this is fine (if ugly) because they're never written
out.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix unitialized use of a value
Brett Holman [Sun, 6 Jun 2021 15:29:42 +0000 (09:29 -0600)]
bcachefs: Fix unitialized use of a value

Signed-off-by: Brett Holman <bpholman5@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: do not compile acl mod on minimal config
Dan Robertson [Sat, 5 Jun 2021 23:03:16 +0000 (19:03 -0400)]
bcachefs: do not compile acl mod on minimal config

Do not compile the acl.o target if BCACHEFS_POSIX_ACL is not enabled.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: btree_iter->should_be_locked
Kent Overstreet [Fri, 4 Jun 2021 21:17:45 +0000 (17:17 -0400)]
bcachefs: btree_iter->should_be_locked

Add a field to struct btree_iter for tracking whether it should be
locked - this fixes spurious transaction restarts in
bch2_trans_relock().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Improve btree iterator tracepoints
Kent Overstreet [Fri, 4 Jun 2021 19:18:10 +0000 (15:18 -0400)]
bcachefs: Improve btree iterator tracepoints

This patch adds some new tracepoints to the btree iterator code, and
adds new fields to the existing tracepoints - primarily for the iterator
position.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Preallocate transaction mem
Kent Overstreet [Thu, 3 Jun 2021 03:31:42 +0000 (23:31 -0400)]
bcachefs: Preallocate transaction mem

This helps avoid transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Check for errors from bch2_trans_update()
Kent Overstreet [Wed, 2 Jun 2021 04:15:07 +0000 (00:15 -0400)]
bcachefs: Check for errors from bch2_trans_update()

Upcoming refactoring is going to change bch2_trans_update() to start
returning transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs; Check for allocator thread shutdown
Kent Overstreet [Tue, 1 Jun 2021 00:52:39 +0000 (20:52 -0400)]
bcachefs; Check for allocator thread shutdown

We were missing a kthread_should_stop() check in the loop in
bch2_invalidate_buckets(), very occasionally leading to us getting stuck
while shutting down.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
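
The shape of the fix, sketched in userspace: a C11 atomic flag stands in for kthread_should_stop(), and incrementing a counter stands in for invalidating one bucket. All names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the kthread stop request. */
static atomic_bool stop_requested;

/* Returns how many buckets were processed before stopping. */
static unsigned invalidate_buckets(unsigned nr_buckets)
{
	unsigned done = 0;

	while (done < nr_buckets) {
		/* Without this check the loop cannot be interrupted. */
		if (atomic_load(&stop_requested))
			break;
		done++;	/* stand-in for invalidating one bucket */
	}
	return done;
}
```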
10 months agobcachefs: Journal space calculation fix
Kent Overstreet [Mon, 31 May 2021 04:13:39 +0000 (00:13 -0400)]
bcachefs: Journal space calculation fix

When devices have different bucket sizes, we may accumulate a journal
write that doesn't fit on some of our devices - previously, we'd
underflow when calculating space on that device and then everything
would get weird.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
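
A minimal sketch of the underflow being described, assuming a simplified model where a device's remaining journal space is its bucket size minus the pending write (the real calculation involves more state):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Space left in a journal bucket after a pending write. If the write
 * is larger than the bucket, the device has no space for it; an
 * unchecked unsigned subtraction would wrap to a huge value instead.
 */
static uint64_t dev_space_after_write(uint64_t bucket_sectors,
				      uint64_t write_sectors)
{
	if (write_sectors > bucket_sectors)
		return 0;
	return bucket_sectors - write_sectors;
}
```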
10 months agobcachefs: Don't fragment extents when making them indirect
Kent Overstreet [Sun, 21 Mar 2021 02:14:10 +0000 (22:14 -0400)]
bcachefs: Don't fragment extents when making them indirect

This fixes a "disk usage increased without a reservation" bug, when
reflinking compressed extents. Also, there's no good reason for reflink
to be fragmenting extents anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fsck for reflink refcounts
Kent Overstreet [Sun, 23 May 2021 06:31:33 +0000 (02:31 -0400)]
bcachefs: Fsck for reflink refcounts

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Assorted endianness fixes
Kent Overstreet [Sun, 23 May 2021 21:04:13 +0000 (17:04 -0400)]
bcachefs: Assorted endianness fixes

Found by sparse

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix a deadlock
Kent Overstreet [Mon, 11 Sep 2023 03:33:08 +0000 (23:33 -0400)]
bcachefs: Fix a deadlock

Waiting on a btree node write with btree locks held can deadlock, if the
write errors: the write error path has to do a btree update to drop
the pointer to the replica that errored.

The interior update path has to wait on in flight btree writes before
freeing nodes on disk. Previously, this was done in
bch2_btree_interior_update_will_free_node(), and could deadlock; now, we
just stash a pointer to the node and do it in
btree_update_nodes_written(), just prior to the transactional part of
the update.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Split out btree_error_wq
Kent Overstreet [Fri, 28 May 2021 01:38:00 +0000 (21:38 -0400)]
bcachefs: Split out btree_error_wq

We can't use btree_update_wq because btree updates may be waiting on
btree writes to complete.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Fix pathalogical behaviour with inode sharding by cpu ID
Kent Overstreet [Fri, 28 May 2021 09:06:18 +0000 (05:06 -0400)]
bcachefs: Fix pathalogical behaviour with inode sharding by cpu ID

If the transaction restarts on a different CPU, it could end up needing
to read in a different btree node, which makes another transaction
restart more likely...

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix journal write error path
Kent Overstreet [Fri, 28 May 2021 03:16:25 +0000 (23:16 -0400)]
bcachefs: Fix journal write error path

Journal write errors were racing with the submission path - potentially
causing writes to other replicas to not get submitted.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Reflink refcount fix
Kent Overstreet [Fri, 28 May 2021 01:16:50 +0000 (21:16 -0400)]
bcachefs: Reflink refcount fix

__bch2_trans_mark_reflink_p wasn't always correctly returning the number
of sectors processed - the new logic is a bit more straightforward
overall too.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Add an option to control sharding new inode numbers
Kent Overstreet [Fri, 28 May 2021 00:20:20 +0000 (20:20 -0400)]
bcachefs: Add an option to control sharding new inode numbers

We're seeing a bug where inode creates end up spinning in
bch2_inode_create - disabling sharding will simplify what we're testing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months ago bcachefs: Don't use bch_write_op->cl for delivering completions
Kent Overstreet [Sat, 29 Oct 2022 06:47:33 +0000 (02:47 -0400)]
bcachefs: Don't use bch_write_op->cl for delivering completions

We already had op->end_io as an alternative mechanism to op->cl.parent
for delivering write completions; this switches all code paths to using
op->end_io.

Two reasons:
 - op->end_io is more efficient, due to fewer atomic ops; this completes
   the conversion that was originally only done for the direct IO path.
 - We'll be restructuring the write path to use a different mechanism for
   punting to process context; refactoring to not use op->cl will make
   that easier.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months ago bcachefs: Kill bch_write_op.index_update_fn
Kent Overstreet [Sat, 29 Oct 2022 03:57:01 +0000 (23:57 -0400)]
bcachefs: Kill bch_write_op.index_update_fn

This deletes bch_write_op.index_update_fn: indirect function calls have
gotten considerably more expensive post spectre/meltdown, and we only
have two different index_update_fns - this patch adds a flag to specify
which one to use (normal vs. data move path).
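
A minimal sketch of the idea, not the actual bcachefs code: a flag bit
selects between two directly called functions, so the compiler can emit a
conditional branch instead of an indirect call through a function pointer.
All names here (write_op, WRITE_FLAG_MOVE, index_update_normal,
index_update_move, do_index_update) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: set when the write comes from the data move path. */
#define WRITE_FLAG_MOVE (1U << 0)

struct write_op {
	uint32_t flags;
	int	 updates_done;
};

/* Stand-in for the normal index update; returns 0 so callers can tell. */
static int index_update_normal(struct write_op *op)
{
	op->updates_done++;
	return 0;
}

/* Stand-in for the data move index update; returns 1 so callers can tell. */
static int index_update_move(struct write_op *op)
{
	op->updates_done++;
	return 1;
}

/* Direct calls selected by a flag, replacing a function pointer member. */
static int do_index_update(struct write_op *op)
{
	assert(op != NULL);

	return (op->flags & WRITE_FLAG_MOVE)
		? index_update_move(op)
		: index_update_normal(op);
}
```

With only two possible callees, the flag test is cheap and both targets
stay visible to the compiler for inlining.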

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months ago bcachefs: Inline fastpath of bch2_disk_reservation_add()
Kent Overstreet [Tue, 1 Nov 2022 02:28:09 +0000 (22:28 -0400)]
bcachefs: Inline fastpath of bch2_disk_reservation_add()

The fastpath now doesn't even disable preemption - instead we use a (non
locked) cmpxchg.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months ago bcachefs: Don't use uuid in tracepoints
Kent Overstreet [Thu, 27 May 2021 23:15:44 +0000 (19:15 -0400)]
bcachefs: Don't use uuid in tracepoints

%pU for printing out pointers to uuids doesn't work in perf trace.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months ago bcachefs: Add a tracepoint for copygc waiting
Kent Overstreet [Wed, 26 May 2021 05:03:35 +0000 (01:03 -0400)]
bcachefs: Add a tracepoint for copygc waiting

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months ago bcachefs: Add a cond_resched call to the copygc main loop
Kent Overstreet [Tue, 25 May 2021 22:42:05 +0000 (18:42 -0400)]
bcachefs: Add a cond_resched call to the copygc main loop

We seem to have a bug where the copygc thread ends up spinning and
making the system unusable - this will at least prevent it from locking
up the machine, and it's a good thing to have anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months ago bcachefs: Fix a null ptr deref
Kent Overstreet [Sun, 23 May 2021 22:42:51 +0000 (18:42 -0400)]
bcachefs: Fix a null ptr deref

bch2_btree_iter_peek() won't always return a key - whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months ago bcachefs: Fix an issue with inconsistent btree writes after unclean shutdown
Kent Overstreet [Sun, 23 May 2021 01:43:20 +0000 (21:43 -0400)]
bcachefs: Fix an issue with inconsistent btree writes after unclean shutdown

After unclean shutdown, btree writes may have completed on one device
and not others - and this inconsistency could lead us to writing new
bsets with a gap in our btree node in one of our replicas.

Fortunately, this is only an issue with bsets that are newer than the
most recent journal flush, and we already have a mechanism for detecting
and blacklisting those. We just need to make sure to start new btree
writes after the most recent _non_ blacklisted bset.
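
As a rough illustration only (the structure and names below are
hypothetical, and the real btree node read path is more involved), picking
the point where new writes start could look like this: walk the bsets found
on disk from newest to oldest and resume just past the most recent one that
is not blacklisted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one bset as found in a btree node on disk. */
struct bset_info {
	size_t end_offset;	/* offset just past this bset, in sectors */
	bool   blacklisted;	/* newer than the most recent journal flush */
};

/*
 * Return the offset at which new writes should start: just past the most
 * recent bset that is not blacklisted, or 0 if no bset qualifies.
 * bsets[] is assumed to be ordered oldest to newest.
 */
static size_t next_write_offset(const struct bset_info *bsets, size_t nr)
{
	assert(bsets != NULL || nr == 0);

	for (size_t i = nr; i-- > 0;)
		if (!bsets[i].blacklisted)
			return bsets[i].end_offset;
	return 0;
}
```

Starting there means any blacklisted bsets are overwritten rather than
left as a gap in the node.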

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months ago bcachefs: Improve FS_IOC_GOINGDOWN ioctl
Kent Overstreet [Sun, 23 May 2021 01:13:17 +0000 (21:13 -0400)]
bcachefs: Improve FS_IOC_GOINGDOWN ioctl

We weren't interpreting the flags argument at all.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months ago bcachefs: Add a workqueue for btree io completions
Kent Overstreet [Sat, 22 May 2021 21:37:25 +0000 (17:37 -0400)]
bcachefs: Add a workqueue for btree io completions

Also, clean up workqueue usage - we shouldn't be using system
workqueues; pretty much everything we do needs to be on our own
WQ_MEM_RECLAIM workqueues.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months ago bcachefs: rewrote prefetch asm in gas syntax for clang compatibility
Brett Holman [Fri, 21 May 2021 22:45:38 +0000 (16:45 -0600)]
bcachefs: rewrote prefetch asm in gas syntax for clang compatibility

Signed-off-by: Brett Holman <bpholman5@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months ago bcachefs: Add a debug mode that always reads from every btree replica
Kent Overstreet [Sat, 22 May 2021 03:57:37 +0000 (23:57 -0400)]
bcachefs: Add a debug mode that always reads from every btree replica

There's a new module parameter, verify_all_btree_replicas, that enables
reading from every btree replica when reading in btree nodes and
comparing them against each other. We've been seeing some strange btree
corruption - this will hopefully aid in tracking it down and catching it
more often.
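
The comparison step can be pictured with a small sketch. It assumes the
replicas have already been read into equally sized buffers; the function
name and signature are hypothetical and do not reflect the actual
bcachefs implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Compare every replica buffer against the first one.
 * Returns the index of the first mismatching replica, or -1 if all agree.
 */
static int first_mismatched_replica(const void *const *replicas,
				    size_t nr, size_t len)
{
	assert(replicas != NULL || nr == 0);

	for (size_t i = 1; i < nr; i++)
		if (memcmp(replicas[0], replicas[i], len) != 0)
			return (int) i;
	return -1;
}
```

A mismatch only says that two copies differ, not which one is correct, so
a caller would still need checksums or similar to decide which to trust.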

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months ago bcachefs: Don't repair btree nodes until after interior journal replay is done
Kent Overstreet [Fri, 21 May 2021 20:06:54 +0000 (16:06 -0400)]
bcachefs: Don't repair btree nodes until after interior journal replay is done

We need the btree to be in a consistent state before we can rewrite
btree nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months ago bcachefs: Fix an uninitialized var
Kent Overstreet [Fri, 21 May 2021 00:47:27 +0000 (20:47 -0400)]
bcachefs: Fix an uninitialized var

This fixes a valgrind complaint.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months ago bcachefs: Fix for buffered writes getting -ENOSPC
Kent Overstreet [Thu, 20 May 2021 19:49:23 +0000 (15:49 -0400)]
bcachefs: Fix for buffered writes getting -ENOSPC

Buffered writes may have to increase their disk reservation at btree
update time, due to compression and erasure coding being unpredictable:
O_DIRECT writes should be checking for -ENOSPC, but buffered writes have
already been accepted and should not.
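
A simplified sketch of the distinction, with hypothetical names and much
cruder accounting than the real disk reservation code: a caller that can
still report an error (the O_DIRECT case) gets -ENOSPC when the
reservation does not fit, while a caller that has already accepted the
data (the buffered case) passes a "nofail" flag and is allowed to
overcommit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified space accounting. */
struct space {
	uint64_t capacity;	/* total sectors available */
	uint64_t reserved;	/* sectors reserved so far */
};

/*
 * Reserve additional sectors. Without nofail, fail with -ENOSPC when the
 * reservation would exceed capacity; with nofail, always succeed and
 * record the overcommit.
 */
static int reserve_sectors(struct space *s, uint64_t sectors, bool nofail)
{
	assert(s != NULL);

	if (!nofail && s->reserved + sectors > s->capacity)
		return -ENOSPC;

	s->reserved += sectors;
	return 0;
}
```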

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months ago bcachefs: Fix inode backpointers in RENAME_OVERWRITE
Kent Overstreet [Thu, 20 May 2021 04:09:47 +0000 (00:09 -0400)]
bcachefs: Fix inode backpointers in RENAME_OVERWRITE

When we delete the dirent an inode points to, we need to zero out the
backpointer fields - this was missed in the RENAME_OVERWRITE case.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months ago bcachefs: Make bch2_remap_range respect O_SYNC
Kent Overstreet [Thu, 20 May 2021 01:21:49 +0000 (21:21 -0400)]
bcachefs: Make bch2_remap_range respect O_SYNC

Caught by xfstest generic/628.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months ago bcachefs: Split extents if necessary in bch2_trans_update()
Kent Overstreet [Wed, 19 May 2021 03:17:03 +0000 (23:17 -0400)]
bcachefs: Split extents if necessary in bch2_trans_update()

Currently, we handle multiple overlapping extents in the same
transaction commit by doing fixups in bch2_trans_update() - this patch
extends that to split updates when necessary. The next patch that
changes the reflink code to not fragment extents when making them
indirect will require this.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>