linux-block.git
10 months agosix locks: Delete six_lock_pcpu_free_rcu()
Kent Overstreet [Sat, 27 Aug 2022 19:00:59 +0000 (15:00 -0400)]
six locks: Delete six_lock_pcpu_free_rcu()

Didn't have any users, and wasn't a good idea to begin with - delete it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Add persistent counters for all tracepoints
Kent Overstreet [Sat, 27 Aug 2022 16:48:36 +0000 (12:48 -0400)]
bcachefs: Add persistent counters for all tracepoints

Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Fix bch2_btree_update_start() to return -BCH_ERR_journal_reclaim_would_deadlock
Kent Overstreet [Sat, 27 Aug 2022 16:37:05 +0000 (12:37 -0400)]
bcachefs: Fix bch2_btree_update_start() to return -BCH_ERR_journal_reclaim_would_deadlock

On failure to get a journal pre-reservation because we're called from
journal reclaim we're not supposed to return a transaction restart error
- this fixes a livelock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Improve bch2_btree_node_relock()
Kent Overstreet [Sat, 27 Aug 2022 16:28:09 +0000 (12:28 -0400)]
bcachefs: Improve bch2_btree_node_relock()

This moves the IS_ERR_OR_NULL() check to the inline part, since that's a
fast path event.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Improve trans_restart_journal_preres_get tracepoint
Kent Overstreet [Sat, 27 Aug 2022 16:23:38 +0000 (12:23 -0400)]
bcachefs: Improve trans_restart_journal_preres_get tracepoint

It now includes journal_flags.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Improve btree_node_relock_fail tracepoint
Kent Overstreet [Sat, 27 Aug 2022 16:11:18 +0000 (12:11 -0400)]
bcachefs: Improve btree_node_relock_fail tracepoint

It now prints the error name when the btree node is an error pointer;
also, don't trace failures when the btree node is
BCH_ERR_no_btree_node_up.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Make more btree_paths available
Kent Overstreet [Sat, 27 Aug 2022 14:30:36 +0000 (10:30 -0400)]
bcachefs: Make more btree_paths available

 - Don't decrease BTREE_ITER_MAX when building with CONFIG_LOCKDEP
   anymore. The lockdep table sizes are configurable now, so we no
   longer need this.
 - btree_trans_too_many_iters() is less conservative now. Previously it
   caused a transaction restart if we had used more than
   BTREE_ITER_MAX / 2 paths; the threshold is now BTREE_ITER_MAX - 8.

This helps with excessive transaction restarts/livelocks in the bucket
allocator path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
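
The threshold change described in this commit can be sketched as follows. This is a minimal illustration, not the bcachefs source: EX_BTREE_ITER_MAX and both ex_ functions are hypothetical names, and the value 64 is assumed purely for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, for illustration only; the real constant lives in bcachefs. */
#define EX_BTREE_ITER_MAX 64

/* Old policy: restart once more than half of the paths are in use. */
bool ex_too_many_iters_old(unsigned paths_used)
{
	assert(paths_used <= EX_BTREE_ITER_MAX);
	return paths_used > EX_BTREE_ITER_MAX / 2;
}

/* New policy: restart only when fewer than 8 paths remain. */
bool ex_too_many_iters_new(unsigned paths_used)
{
	assert(paths_used <= EX_BTREE_ITER_MAX);
	return paths_used > EX_BTREE_ITER_MAX - 8;
}
```

With these assumed numbers, a transaction using 40 paths would have been restarted under the old policy but keeps running under the new one.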
10 months agobcachefs: Correctly initialize bkey_cached->lock
Kent Overstreet [Fri, 26 Aug 2022 01:42:46 +0000 (21:42 -0400)]
bcachefs: Correctly initialize bkey_cached->lock

We need to use the right class for some assertions to work correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Track held write locks
Kent Overstreet [Tue, 23 Aug 2022 01:05:31 +0000 (21:05 -0400)]
bcachefs: Track held write locks

The upcoming lock cycle detection code will need to know precisely which
locks every btree_trans is holding, including write locks - this patch
updates btree_node_locked_type to include write locks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Print lock counts in debugfs btree_transactions
Kent Overstreet [Tue, 23 Aug 2022 05:20:24 +0000 (01:20 -0400)]
bcachefs: Print lock counts in debugfs btree_transactions

Improve our debugfs output, to help in debugging deadlocks: this shows,
for every btree node we print, the current number of readers/intent
locks/write locks held.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Switch btree locking code to struct btree_bkey_cached_common
Kent Overstreet [Mon, 22 Aug 2022 17:21:10 +0000 (13:21 -0400)]
bcachefs: Switch btree locking code to struct btree_bkey_cached_common

This is just some type safety cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Track maximum transaction memory
Kent Overstreet [Tue, 23 Aug 2022 01:49:55 +0000 (21:49 -0400)]
bcachefs: Track maximum transaction memory

This patch
 - tracks maximum bch2_trans_kmalloc() memory used in btree_transaction_stats
 - makes it available in debugfs
 - switches bch2_trans_init() to using that for the amount of memory to
   preallocate, instead of the parameter passed in

This drastically reduces transaction restarts, and means we no longer
need to track this in the source code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
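
The idea of sizing the preallocation from observed history, rather than from a caller-supplied hint, might look roughly like the sketch below. All names (ex_trans, ex_trans_stats, ex_trans_init, ex_trans_kmalloc) are hypothetical stand-ins, and the sketch only does the bookkeeping; it allocates no memory. Treating a grown reservation as the costly case is an assumption based on the commit's remark about reduced restarts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-transaction-function statistics. */
struct ex_trans_stats {
	size_t max_mem;		/* largest amount of memory any run has needed */
};

struct ex_trans {
	struct ex_trans_stats *stats;
	size_t mem_bytes;	/* capacity currently reserved */
	size_t mem_top;		/* bytes handed out so far */
};

/* Size the initial reservation from history instead of a parameter. */
void ex_trans_init(struct ex_trans *trans, struct ex_trans_stats *stats)
{
	trans->stats = stats;
	trans->mem_bytes = stats->max_mem;
	trans->mem_top = 0;
}

/*
 * Account for an allocation of 'size' bytes. Returns true if the
 * reservation had to grow, the case this sketch assumes is expensive.
 */
bool ex_trans_kmalloc(struct ex_trans *trans, size_t size)
{
	bool grew = false;

	assert(trans->mem_top <= trans->mem_bytes);
	trans->mem_top += size;
	if (trans->mem_top > trans->mem_bytes) {
		trans->mem_bytes = trans->mem_top;
		grew = true;
	}
	if (trans->mem_top > trans->stats->max_mem)
		trans->stats->max_mem = trans->mem_top;
	return grew;
}
```

In this model the first run of a transaction has to grow its reservation, while later runs start with enough space for the largest request seen so far.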
10 months agosix locks: Improve six_lock_count
Kent Overstreet [Mon, 22 Aug 2022 03:08:53 +0000 (23:08 -0400)]
six locks: Improve six_lock_count

six_lock_count now counts whether a write lock is held, and this patch
also correctly counts six_lock->intent_lock_recurse.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Kill nodes_intent_locked
Kent Overstreet [Sun, 21 Aug 2022 21:20:42 +0000 (17:20 -0400)]
bcachefs: Kill nodes_intent_locked

Previously, we used two different bit arrays for tracking held btree
node locks. This patch switches to an array of two bit integers, which
will let us track, in a future patch, when we hold a write lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
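
Replacing two bit arrays with one array of two-bit integers can be sketched like this. The encoding, the enum names, and the four-level limit are assumptions made for the example; they are not taken from the bcachefs source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: two bits per btree level. */
enum ex_lock_type {
	EX_UNLOCKED	= 0,
	EX_READ		= 1,
	EX_INTENT	= 2,
	EX_WRITE	= 3,	/* the extra state that two bits make room for */
};

#define EX_MAX_LEVELS 4		/* 4 levels * 2 bits fit in one uint8_t */

/* Extract the lock type recorded for 'level'. */
enum ex_lock_type ex_locked_type(uint8_t nodes_locked, unsigned level)
{
	assert(level < EX_MAX_LEVELS);
	return (enum ex_lock_type) ((nodes_locked >> (level * 2)) & 3);
}

/* Return a copy of 'nodes_locked' with the entry for 'level' replaced. */
uint8_t ex_set_locked_type(uint8_t nodes_locked, unsigned level,
			   enum ex_lock_type type)
{
	assert(level < EX_MAX_LEVELS);
	nodes_locked &= (uint8_t) ~(3U << (level * 2));
	nodes_locked |= (uint8_t) ((unsigned) type << (level * 2));
	return nodes_locked;
}
```

A single field now answers both "is this level locked?" and "how?", where the old scheme needed one bit array per lock type.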
10 months agobcachefs: Better use of locking helpers
Kent Overstreet [Sun, 21 Aug 2022 22:17:51 +0000 (18:17 -0400)]
bcachefs: Better use of locking helpers

Held btree locks are tracked in btree_path->nodes_locked and
btree_path->nodes_intent_locked. Upcoming patches are going to change
the representation in struct btree_path, so this patch switches to
proper helpers instead of direct access to these fields.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Reorganize btree_locking.[ch]
Kent Overstreet [Fri, 19 Aug 2022 23:50:18 +0000 (19:50 -0400)]
bcachefs: Reorganize btree_locking.[ch]

Tidy things up a bit before doing more work in this file.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: btree_locking.c
Kent Overstreet [Fri, 19 Aug 2022 19:35:34 +0000 (15:35 -0400)]
bcachefs: btree_locking.c

Start to centralize some of the locking code in a new file; more locking
code will be moving here in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix adding a device with a label
Kent Overstreet [Thu, 18 Aug 2022 21:57:24 +0000 (17:57 -0400)]
bcachefs: Fix adding a device with a label

Device labels are represented as pointers in the member info section: we
need to get and then set the label for it to be kept correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: fsck: Another transaction restart handling fix
Kent Overstreet [Thu, 18 Aug 2022 21:00:12 +0000 (17:00 -0400)]
bcachefs: fsck: Another transaction restart handling fix

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: bch2_btree_delete_range_trans() now returns -BCH_ERR_transaction_restart_nested
Kent Overstreet [Thu, 18 Aug 2022 17:00:26 +0000 (13:00 -0400)]
bcachefs: bch2_btree_delete_range_trans() now returns -BCH_ERR_transaction_restart_nested

The new convention is that functions that handle transaction restarts
within an existing transaction context should return
-BCH_ERR_transaction_restart_nested when they did so, since they
invalidated the outer transaction context.

This also means bch2_btree_delete_range_trans() is changed to only call
bch2_trans_begin() after a transaction restart, not on every loop
iteration.

This is to fix a bug in fsck, in check_inode() when we truncate an inode
with BCH_INODE_I_SIZE_DIRTY set.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Minor transaction restart handling fix
Kent Overstreet [Thu, 18 Aug 2022 02:17:08 +0000 (22:17 -0400)]
bcachefs: Minor transaction restart handling fix

 - fsck_inode_rm() wasn't returning BCH_ERR_transaction_restart_nested
 - change bch2_trans_verify_not_restarted() to call panic() - we don't
   want these errors to be missed

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix bch2_btree_iter_peek_slot() error path
Kent Overstreet [Wed, 17 Aug 2022 21:49:12 +0000 (17:49 -0400)]
bcachefs: Fix bch2_btree_iter_peek_slot() error path

iter->k needs to be consistent with iter->pos - required for
bch2_btree_iter_(rewind|advance) to work correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Another should_be_locked fixup
Kent Overstreet [Tue, 16 Aug 2022 07:08:15 +0000 (03:08 -0400)]
bcachefs: Another should_be_locked fixup

When returning a key from the key cache, in BTREE_ITER_WITH_KEY_CACHE
mode, we don't want to set should_be_locked on iter->path; we're not
returning a key from that path, so we don't need to, and also since we
traversed the key cache iterator before setting should_be_locked on that
path it might be unlocked (if we unlocked, bch2_trans_relock() won't
have relocked it).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: bch2_bkey_packed_to_binary_text()
Kent Overstreet [Sun, 14 Aug 2022 18:44:17 +0000 (14:44 -0400)]
bcachefs: bch2_bkey_packed_to_binary_text()

For debugging the eytzinger search tree code, and low level bkey packing
code, it can be helpful to see things in binary: this patch improves our
helpers for doing so.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Add assertions for unexpected transaction restarts
Kent Overstreet [Thu, 7 Jul 2022 04:37:46 +0000 (00:37 -0400)]
bcachefs: Add assertions for unexpected transaction restarts

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: btree_path_down() optimization
Kent Overstreet [Mon, 15 Aug 2022 22:55:20 +0000 (18:55 -0400)]
bcachefs: btree_path_down() optimization

We should be calling btree_node_mem_ptr_set() before path_level_init(),
since we already touched the key that btree_node_mem_ptr_set() will
modify and path_level_init() will be doing the lookup in the child btree
node we're recursing to.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Always rebuild aux search trees when node boundaries change
Kent Overstreet [Wed, 17 Aug 2022 18:20:48 +0000 (14:20 -0400)]
bcachefs: Always rebuild aux search trees when node boundaries change

Topology repair may change btree node min/max keys: when it does so, we
need to always rebuild eytzinger search trees because nodes directly
depend on those values.

This fixes a bug found by the 'kill_btree_node' test, where we'd pop an
assertion in bch2_bset_search_linear().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Add an overflow check in set_bkey_val_u64s()
Kent Overstreet [Mon, 15 Aug 2022 18:05:44 +0000 (14:05 -0400)]
bcachefs: Add an overflow check in set_bkey_val_u64s()

For now this is just a BUG_ON() - we may want to change this to return
an error in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
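
A minimal sketch of such an overflow check, assuming the key's total size is stored in an 8-bit field counted in 64-bit words. The struct, the header size of 5, and the ex_ names are hypothetical; assert() stands in for BUG_ON(), and the separate predicate shows how an error-returning variant could be built later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_BKEY_U64s 5	/* assumed header size in u64s, illustration only */

struct ex_bkey {
	uint8_t u64s;	/* total key size (header + value) in 64-bit words */
};

/* True if header + value still fits in the 8-bit size field. */
bool ex_bkey_val_fits(unsigned val_u64s)
{
	return val_u64s <= UINT8_MAX - EX_BKEY_U64s;
}

/* Set the value size; abort instead of silently truncating. */
void ex_set_bkey_val_u64s(struct ex_bkey *k, unsigned val_u64s)
{
	assert(ex_bkey_val_fits(val_u64s));	/* stand-in for BUG_ON() */
	k->u64s = (uint8_t) (EX_BKEY_U64s + val_u64s);
}
```

Without the check, a value that is too large would wrap around in the narrow field and the key would claim to be smaller than it is.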
10 months agobcachefs: remove dead whiteout_u64s argument.
Olexa Bilaniuk [Mon, 15 Aug 2022 18:20:22 +0000 (14:20 -0400)]
bcachefs: remove dead whiteout_u64s argument.

Signed-off-by: Olexa Bilaniuk <obilaniu@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Debugfs cleanup
Kent Overstreet [Sun, 14 Aug 2022 20:11:35 +0000 (16:11 -0400)]
bcachefs: Debugfs cleanup

This improves flush_buf() so that it always returns nonzero when we're
done reading and ready to return to userspace, and so that it returns
the value we want to return to userspace (number of bytes read, if there
wasn't an error).

In the future we'll be better abstracting this mechanism and pulling it
out of bcachefs, and using it to replace seq_file.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix bch2_fs_check_snapshots()
Kent Overstreet [Mon, 15 Aug 2022 18:01:56 +0000 (14:01 -0400)]
bcachefs: Fix bch2_fs_check_snapshots()

We were iterating starting at BCACHEFS_ROOT_INO, but snapshots start at
POS_MIN - meaning this code was never getting run.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Reported-by: Olexa Bilaniuk <obilaniu@gmail.com>
10 months agobcachefs: Increment restart count in bch2_trans_begin()
Kent Overstreet [Fri, 12 Aug 2022 16:45:01 +0000 (12:45 -0400)]
bcachefs: Increment restart count in bch2_trans_begin()

Instead of counting transaction restarts, count when the transaction is
restarted: if bch2_trans_begin() was called when the transaction wasn't
restarted we need to ensure restart_count is still incremented.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix assertion in bch2_btree_key_cache_drop()
Kent Overstreet [Fri, 12 Aug 2022 01:06:43 +0000 (21:06 -0400)]
bcachefs: Fix assertion in bch2_btree_key_cache_drop()

Turns out this assertion was something we could legitimately hit - add a
comment describing what's going on, and handle it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Print last line in debugfs/btree_transaction_stats
Kent Overstreet [Fri, 12 Aug 2022 01:06:02 +0000 (21:06 -0400)]
bcachefs: Print last line in debugfs/btree_transaction_stats

We need to turn the flush_buf() thing into a proper API, to replace
seq_file.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Track the maximum btree_paths ever allocated by each transaction
Kent Overstreet [Fri, 12 Aug 2022 00:14:54 +0000 (20:14 -0400)]
bcachefs: Track the maximum btree_paths ever allocated by each transaction

We need a way to check if the machinery for handling btree_paths within
a transaction is behaving reasonably, as it often has not been - we've
had bugs with transaction path overflows caused by duplicate paths and
plenty of other things.

This patch tracks, per transaction fn, the most btree paths ever
allocated by that transaction and makes it available in debugfs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Rename lock_held_stats -> btree_transaction_stats
Kent Overstreet [Thu, 11 Aug 2022 23:36:24 +0000 (19:36 -0400)]
bcachefs: Rename lock_held_stats -> btree_transaction_stats

Going to be adding more things to this in the next patch.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Switch bch2_btree_delete_range() to bch2_trans_run()
Kent Overstreet [Thu, 11 Aug 2022 21:25:25 +0000 (17:25 -0400)]
bcachefs: Switch bch2_btree_delete_range() to bch2_trans_run()

This fixes an assertion about unexpected transaction restarts -
bch2_btree_delete_range_trans() handles transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix btree_path->uptodate inconsistency
Kent Overstreet [Thu, 11 Aug 2022 17:23:04 +0000 (13:23 -0400)]
bcachefs: Fix btree_path->uptodate inconsistency

This fixes an assertion in bch2_btree_path_peek_slot().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix duplicate paths left by bch2_path_put()
Kent Overstreet [Thu, 11 Aug 2022 00:05:14 +0000 (20:05 -0400)]
bcachefs: Fix duplicate paths left by bch2_path_put()

bch2_path_put() is supposed to drop paths that aren't needed on
transaction restart, or to hold locks that we're supposed to keep until
transaction commit: but it was failing to free paths in some cases that
it should have, leading to transaction path overflows with lots of
duplicate paths.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Kill BTREE_ITER_CACHED_(NOFILL|NOCREATE)
Kent Overstreet [Thu, 11 Aug 2022 16:23:21 +0000 (12:23 -0400)]
bcachefs: Kill BTREE_ITER_CACHED_(NOFILL|NOCREATE)

These were used more prior to getting rid of the in-memory bucket arrays
- they don't serve much purpose anymore, and deleting them lets us write
better assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Tracepoint improvements
Kent Overstreet [Wed, 10 Aug 2022 16:42:55 +0000 (12:42 -0400)]
bcachefs: Tracepoint improvements

Our types are exported to the tracepoint code, so it's not necessary to
break things out individually when passing them to tracepoints - we can
also call other functions from TP_fast_assign().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: "Snapshot deletion did not run correctly" should be a fsck err
Kent Overstreet [Thu, 11 Aug 2022 00:22:01 +0000 (20:22 -0400)]
bcachefs: "Snapshot deletion did not run correctly" should be a fsck err

This was noticed when a test hit this error and didn't fail, because
fsck wasn't returning that it fixed errors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: six_lock_counts() is now in six.c
Kent Overstreet [Wed, 10 Aug 2022 16:34:18 +0000 (12:34 -0400)]
bcachefs: six_lock_counts() is now in six.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: BTREE_ITER_NO_NODE -> BCH_ERR codes
Kent Overstreet [Wed, 10 Aug 2022 23:08:30 +0000 (19:08 -0400)]
bcachefs: BTREE_ITER_NO_NODE -> BCH_ERR codes

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Don't set should_be_locked on paths that aren't locked
Kent Overstreet [Wed, 10 Aug 2022 22:55:53 +0000 (18:55 -0400)]
bcachefs: Don't set should_be_locked on paths that aren't locked

It doesn't make any sense to set should_be_locked on btree_paths that
aren't locked, and is often a bug - this patch adds assertions and fixes
some of those bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix missing error handling in bch2_subvolume_delete()
Kent Overstreet [Tue, 9 Aug 2022 17:47:03 +0000 (13:47 -0400)]
bcachefs: Fix missing error handling in bch2_subvolume_delete()

This fixes an assertion when the transaction has been unexpectedly
restarted.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Improve an error message
Kent Overstreet [Sun, 7 Aug 2022 03:02:09 +0000 (23:02 -0400)]
bcachefs: Improve an error message

Update an error message to use bch2_err_str().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Tracepoint improvements
Kent Overstreet [Sun, 7 Aug 2022 17:43:32 +0000 (13:43 -0400)]
bcachefs: Tracepoint improvements

 - use strlcpy(), not strncpy()
 - add tracepoints for btree_path alloc and free
 - give the tracepoint for key cache upgrade fail a proper name
 - add a tracepoint for btree_node_upgrade_fail

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix incorrectly freeing btree_path in alloc path
Kent Overstreet [Fri, 5 Aug 2022 21:08:35 +0000 (17:08 -0400)]
bcachefs: Fix incorrectly freeing btree_path in alloc path

Clearing path->preserve means the path will be dropped in
bch2_trans_begin() - but on transaction restart, we're likely to need
that path again.

This fixes a livelock in the allocation path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix bch2_btree_trans_to_text()
Kent Overstreet [Fri, 5 Aug 2022 15:36:13 +0000 (11:36 -0400)]
bcachefs: Fix bch2_btree_trans_to_text()

bch2_btree_trans_to_text() is used to print btree_transactions owned by
other threads; thus, it needs to be particularly careful. This fixes a
null ptr deref caused by racing with the owning thread changing
path->l[].b.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Add distinct error code for key_cache_upgrade
Kent Overstreet [Thu, 4 Aug 2022 16:46:37 +0000 (12:46 -0400)]
bcachefs: Add distinct error code for key_cache_upgrade

This aids in debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix not punting to workqueue when promoting
Kent Overstreet [Tue, 26 Jul 2022 04:50:25 +0000 (00:50 -0400)]
bcachefs: Fix not punting to workqueue when promoting

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: fsck: Fix nested transaction handling
Kent Overstreet [Fri, 22 Jul 2022 10:57:05 +0000 (06:57 -0400)]
bcachefs: fsck: Fix nested transaction handling

This uses the new trans->restart count to make sure we always correctly
return -BCH_ERR_transaction_restart_nested when we restart a nested
transaction - eliminating some other hacks and preparing for new
assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Add an O_DIRECT option (for userspace)
Kent Overstreet [Thu, 21 Jul 2022 19:41:29 +0000 (15:41 -0400)]
bcachefs: Add an O_DIRECT option (for userspace)

Sometimes we see IO errors due to O_DIRECT alignment issues - having an
option to use buffered IO will be helpful.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Tighten up btree_path assertions
Kent Overstreet [Thu, 21 Jul 2022 13:53:28 +0000 (09:53 -0400)]
bcachefs: Tighten up btree_path assertions

Currently seeing a very rare and difficult to explain btree_path
inconsistency - this patch adds assertions to the only place that seems
to be missing them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: bch2_bucket_alloc_trans_early -> for_each_btree_key_norestart
Kent Overstreet [Sun, 17 Jul 2022 06:46:46 +0000 (02:46 -0400)]
bcachefs: bch2_bucket_alloc_trans_early -> for_each_btree_key_norestart

Nested btree transactions require special care, and an upcoming patch is
going to add assertions to that effect. We don't want to be using them
unnecessarily, so this patch switches bch2_bucket_alloc_trans_early() to not
handle transaction restarts.

This patch also adds a cursor so that on transaction restart we can
continue scanning from where the previous search for an empty bucket
left off.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Fix check_i_sectors()
Kent Overstreet [Wed, 20 Jul 2022 21:35:57 +0000 (17:35 -0400)]
bcachefs: Fix check_i_sectors()

bch2_count_inode_sectors() uses for_each_btree_key() internally, which
handles lock restarts - the lockrestart_do() in check_i_sectors() is
redundant, and buggy here since the count that
bch2_count_inode_sectors() returns was interpreted as an error by
lockrestart_do().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Convert debugfs code to for_each_btree_key2()
Kent Overstreet [Wed, 20 Jul 2022 20:50:26 +0000 (16:50 -0400)]
bcachefs: Convert debugfs code to for_each_btree_key2()

This fixes a bug where we were leaking a transaction restart error to
userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Unit test updates
Kent Overstreet [Wed, 20 Jul 2022 20:25:00 +0000 (16:25 -0400)]
bcachefs: Unit test updates

 - Convert to for_each_btree_key2(), for_each_btree_key_commit(),
   for_each_btree_key_reverse()
 - No more bare bch2_btree_iter_peek(); we're now fault-injecting lock
   restarts, so we always need a lockrestart_do() or equivalent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: for_each_btree_key_reverse()
Kent Overstreet [Wed, 20 Jul 2022 20:13:27 +0000 (16:13 -0400)]
bcachefs: for_each_btree_key_reverse()

This adds a new macro, like for_each_btree_key2(), but for iterating in
reverse order.

Also, change for_each_btree_key2() to properly check the return value of
bch2_btree_iter_advance().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Convert fsck errors to errcode.h
Kent Overstreet [Tue, 19 Jul 2022 21:20:18 +0000 (17:20 -0400)]
bcachefs: Convert fsck errors to errcode.h

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Inject transaction restarts in debug mode
Kent Overstreet [Mon, 18 Jul 2022 00:22:30 +0000 (20:22 -0400)]
bcachefs: Inject transaction restarts in debug mode

In CONFIG_BCACHEFS_DEBUG mode, we'll now randomly issue transaction
restarts - with a decaying probability based on the number of restarts
we've already had, to ensure that transactions eventually make forward
progress. This should help shake out some bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
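
One way to get a decaying injection probability is to halve the acceptance window for every restart the transaction has already taken. The sketch below is only that: the halving schedule, the 50% starting probability, and the function name are assumptions, and the random value is passed in by the caller so the logic stays deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether to inject a restart. 'rnd' is a uniformly distributed
 * 32-bit random number supplied by the caller. The acceptance window
 * starts at half of the 32-bit range and halves with each prior restart,
 * so repeated restarts become rapidly less likely.
 */
bool ex_should_inject_restart(uint32_t rnd, unsigned restart_count)
{
	unsigned shift = restart_count < 31 ? restart_count : 31;
	uint32_t window = (UINT32_C(1) << 31) >> shift;

	assert(window != 0);	/* the shift is capped, so the window never vanishes */
	return rnd < window;
}
```

Because the window shrinks geometrically, the expected number of injected restarts per transaction is bounded, which is what lets transactions eventually make forward progress.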
10 months agobcachefs: EINTR -> BCH_ERR_transaction_restart
Kent Overstreet [Mon, 18 Jul 2022 03:06:38 +0000 (23:06 -0400)]
bcachefs: EINTR -> BCH_ERR_transaction_restart

Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: btree_trans_too_many_iters() is now a transaction restart
Kent Overstreet [Tue, 5 Jul 2022 21:27:44 +0000 (17:27 -0400)]
bcachefs: btree_trans_too_many_iters() is now a transaction restart

All transaction restarts need a tracepoint - this is essential for
debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Prevent a btree iter overflow in alloc path
Kent Overstreet [Tue, 19 Jul 2022 18:51:52 +0000 (14:51 -0400)]
bcachefs: Prevent a btree iter overflow in alloc path

In bch2_bucket_alloc_trans(), we're iterating over buckets - but not
directly with an iterator, since we're iterating over the freespace
btree.

This means that we need to clear iter->path->preserve, otherwise we'll
end up retaining a btree_path for every alloc key we touched - which is
not what we want here.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Use bch2_err_str() in error messages
Kent Overstreet [Mon, 18 Jul 2022 23:42:58 +0000 (19:42 -0400)]
bcachefs: Use bch2_err_str() in error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Improved errcodes
Kent Overstreet [Mon, 18 Jul 2022 02:31:21 +0000 (22:31 -0400)]
bcachefs: Improved errcodes

Instead of overloading standard error codes (EINTR/EAGAIN), and defining
short lists of error codes in multiple places that potentially end up
overlapping & conflicting, we're now going to have one master list of
error codes.

Error codes are defined with an x-macro: thus we also have
bch2_err_str() now.

Also, error codes have a class field. Now, instead of checking for
errors with ==, code should use bch2_err_matches(), which returns true
if the error is equal to or a sub-error of the error class.

This means we can define unique errors for every source location where
an error is generated, which will help improve our error messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
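
The x-macro scheme described above can be sketched in a few lines. The list below is hypothetical (the EX_ERR_ names, the base value 2048, and the table layout are invented for illustration and are not the real bcachefs list); it shows how one master list yields the enum, a name table for an err_str-style helper, and a class table for an err_matches-style helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical master list: x(class, name). Class 0 means "no parent". */
#define EX_ERRCODES()						\
	x(0,				EX_ERR_transaction_restart)	\
	x(EX_ERR_transaction_restart,	EX_ERR_restart_relock)		\
	x(EX_ERR_transaction_restart,	EX_ERR_restart_nested)		\
	x(0,				EX_ERR_fsck)

enum ex_errcode {
	EX_ERR_START = 2048,	/* arbitrary base, chosen to sit above errno values */
#define x(class, err)	err,
	EX_ERRCODES()
#undef x
	EX_ERR_MAX
};

/* Name table generated from the same list. */
static const char * const ex_err_names[] = {
#define x(class, err)	[err - EX_ERR_START - 1] = #err,
	EX_ERRCODES()
#undef x
};

/* Parent class of each error, generated from the same list. */
static const int ex_err_classes[] = {
#define x(class, err)	[err - EX_ERR_START - 1] = class,
	EX_ERRCODES()
#undef x
};

/* Return the name of a private error code; accepts positive or negative. */
const char *ex_err_str(int err)
{
	err = err < 0 ? -err : err;
	if (err <= EX_ERR_START || err >= EX_ERR_MAX)
		return "(unknown error)";
	return ex_err_names[err - EX_ERR_START - 1];
}

/* True if err equals class or is a (transitive) sub-error of it. */
bool ex_err_matches(int err, int class)
{
	err = err < 0 ? -err : err;
	class = class < 0 ? -class : class;
	assert(class > 0);

	while (err > EX_ERR_START && err < EX_ERR_MAX) {
		if (err == class)
			return true;
		err = ex_err_classes[err - EX_ERR_START - 1];
	}
	return err == class;
}
```

A caller would then test `ex_err_matches(ret, EX_ERR_transaction_restart)` instead of comparing with ==, so every restart reason can have its own code without touching the call sites.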
10 months agobcachefs: We can handle missing btree roots for all alloc btrees
Kent Overstreet [Thu, 23 Jun 2022 03:06:16 +0000 (23:06 -0400)]
bcachefs: We can handle missing btree roots for all alloc btrees

We can rebuild alloc info if these btree roots are missing - no need to
bail out and say the filesystem is unrecoverable.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Fix should_invalidate_buckets()
Kent Overstreet [Mon, 18 Jul 2022 02:59:01 +0000 (22:59 -0400)]
bcachefs: Fix should_invalidate_buckets()

Like bch2_copygc_wait_amount, should_invalidate_buckets() needs to try
to ensure that there are always more buckets free than the largest
reserve.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: ec_stripe_bkey_insert() -> for_each_btree_key_norestart()
Kent Overstreet [Mon, 18 Jul 2022 00:08:37 +0000 (20:08 -0400)]
bcachefs: ec_stripe_bkey_insert() -> for_each_btree_key_norestart()

With the upcoming patches to add assertions for incorrect nested
transaction restart handling, this code is now bogus. Switch it to
for_each_btree_key_norestart() so that transaction restarts are only
handled in one place.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Convert erasure coding to for_each_btree_key_commit()
Kent Overstreet [Sun, 17 Jul 2022 04:44:19 +0000 (00:44 -0400)]
bcachefs: Convert erasure coding to for_each_btree_key_commit()

The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Add a counter for btree_trans restarts
Kent Overstreet [Sun, 17 Jul 2022 23:35:38 +0000 (19:35 -0400)]
bcachefs: Add a counter for btree_trans restarts

This will help us improve nested transactions - we need to add
assertions that whenever an inner transaction handles a restart, it
still returns -EINTR to the outer transaction.

This also adds nested_lockrestart_do() and nested_commit_do() which use
the new counters to correctly return -EINTR when the transaction was
restarted.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
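
How a restart counter lets a nested helper report restarts to its caller can be sketched as below. Everything here is a simplified model: the ex_ names, the function-pointer interface, and the point at which the counter is bumped are assumptions, and -EINTR is used as the restart code because that is what this commit's message refers to.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical transaction: only the restart counter matters here. */
struct ex_trans {
	unsigned restart_count;
};

typedef int (*ex_trans_fn)(struct ex_trans *trans, void *arg);

/* Retry fn until it stops asking for a restart, counting each restart. */
int ex_lockrestart_do(struct ex_trans *trans, ex_trans_fn fn, void *arg)
{
	int ret;

	while ((ret = fn(trans, arg)) == -EINTR)
		trans->restart_count++;
	return ret;
}

/*
 * Nested variant: handle restarts internally, but if any happened,
 * still return -EINTR so the outer transaction knows its context
 * was invalidated.
 */
int ex_nested_lockrestart_do(struct ex_trans *trans, ex_trans_fn fn, void *arg)
{
	unsigned before = trans->restart_count;
	int ret = ex_lockrestart_do(trans, fn, arg);

	if (!ret && trans->restart_count != before)
		ret = -EINTR;
	return ret;
}

/* Demo operation: requests a restart until *arg counts down to zero. */
int ex_flaky_op(struct ex_trans *trans, void *arg)
{
	int *remaining = arg;

	(void) trans;
	assert(*remaining >= 0);
	if (*remaining > 0) {
		(*remaining)--;
		return -EINTR;
	}
	return 0;
}
```

Comparing the counter before and after is what makes the check robust: the inner loop may succeed in the end, yet the outer code still learns that a restart occurred.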
10 months agobcachefs: Convert alloc code to for_each_btree_key_commit()
Kent Overstreet [Sun, 17 Jul 2022 04:44:19 +0000 (00:44 -0400)]
bcachefs: Convert alloc code to for_each_btree_key_commit()

The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Convert subvol code to for_each_btree_key_commit()
Kent Overstreet [Sun, 17 Jul 2022 04:44:19 +0000 (00:44 -0400)]
bcachefs: Convert subvol code to for_each_btree_key_commit()

The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Convert bch2_dev_usrdata_drop() to for_each_btree_key2()
Kent Overstreet [Sun, 17 Jul 2022 04:31:40 +0000 (00:31 -0400)]
bcachefs: Convert bch2_dev_usrdata_drop() to for_each_btree_key2()

The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Convert bch2_do_invalidates_work() to for_each_btree_key2()
Kent Overstreet [Sun, 17 Jul 2022 04:31:40 +0000 (00:31 -0400)]
bcachefs: Convert bch2_do_invalidates_work() to for_each_btree_key2()

The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: bch2_trans_run()
Kent Overstreet [Thu, 14 Jul 2022 06:08:58 +0000 (02:08 -0400)]
bcachefs: bch2_trans_run()

This adds a new helper, bch2_trans_run(), that runs a function with a
btree_transaction context but without handling transaction restarts.
We're adding checks for nested transaction restart handling: when an
inner transaction handles a transaction restart it will still have to
return it to the outer transaction, or else assertions will be popped in
the outer transaction.

But some places don't need restart handling at the outer scope, so this
helper does what they need.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Convert bch2_gc_done() for_each_btree_key2()
Kent Overstreet [Sun, 17 Jul 2022 04:44:19 +0000 (00:44 -0400)]
bcachefs: Convert bch2_gc_done() for_each_btree_key2()

This converts bch2_gc_stripes_done() and bch2_gc_reflink_done() to the
new for_each_btree_key_commit() macro.

The new for_each_btree_key2() and for_each_btree_key_commit() macros
handle transaction retries, allowing us to avoid nested transactions -
which we want to avoid since they're tricky to do completely correctly
and upcoming assertions are going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Convert more fsck code to for_each_btree_key2()
Kent Overstreet [Sun, 17 Jul 2022 04:44:19 +0000 (00:44 -0400)]
bcachefs: Convert more fsck code to for_each_btree_key2()

The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Convert more quota code to for_each_btree_key2()
Kent Overstreet [Sun, 17 Jul 2022 04:44:19 +0000 (00:44 -0400)]
bcachefs: Convert more quota code to for_each_btree_key2()

The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Convert bch2_check_lrus() to for_each_btree_key_commit()
Kent Overstreet [Sun, 17 Jul 2022 04:44:19 +0000 (00:44 -0400)]
bcachefs: Convert bch2_check_lrus() to for_each_btree_key_commit()

The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Convert bch2_dev_freespace_init() to for_each_btree_key_commit()
Kent Overstreet [Sun, 17 Jul 2022 04:44:19 +0000 (00:44 -0400)]
bcachefs: Convert bch2_dev_freespace_init() to for_each_btree_key_commit()

The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Convert bch2_do_discards_work() to for_each_btree_key2()
Kent Overstreet [Sun, 17 Jul 2022 04:31:40 +0000 (00:31 -0400)]
bcachefs: Convert bch2_do_discards_work() to for_each_btree_key2()

The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Improve bucket_alloc_fail tracepoint
Kent Overstreet [Mon, 18 Jul 2022 01:40:39 +0000 (21:40 -0400)]
bcachefs: Improve bucket_alloc_fail tracepoint

We should be printing the number of free buckets, not just the number of
available buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: bch2_mark_alloc(): Do wakeups after updating usage
Kent Overstreet [Mon, 18 Jul 2022 01:33:00 +0000 (21:33 -0400)]
bcachefs: bch2_mark_alloc(): Do wakeups after updating usage

We have an obvious wakeup race if we do the wakeup _before_ updating
the counters that the waiter is reading.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: added lock held time stats
Daniel Hill [Thu, 14 Jul 2022 08:33:09 +0000 (20:33 +1200)]
bcachefs: added lock held time stats

We now record the length of time btree locks are held and expose this in debugfs.

Enabled via CONFIG_BCACHEFS_LOCK_TIME_STATS.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: bch2_time_stats_to_text now indents properly
Daniel Hill [Thu, 14 Jul 2022 08:31:36 +0000 (20:31 +1200)]
bcachefs: bch2_time_stats_to_text now indents properly

The printbuf indentation feature doesn't yet work with '\n' and '\t', so we've
replaced all instances of '\n' with prt_newline.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: lock time stats prep work.
Daniel Hill [Thu, 14 Jul 2022 06:58:23 +0000 (18:58 +1200)]
bcachefs: lock time stats prep work.

We need the caller name and a place to store our results, btree_trans provides this.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
10 months agobcachefs: Unlock in bch2_trans_begin() if we've held locks more than 10us
Kent Overstreet [Wed, 13 Jul 2022 10:03:21 +0000 (06:03 -0400)]
bcachefs: Unlock in bch2_trans_begin() if we've held locks more than 10us

We try to ensure we never hold btree locks for too long - bcachefs tries
to be soft realtime. This adds a check when restarting a transaction,
where a transaction restart is cheap - if we've been holding locks for
too long, drop and retake them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
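
The check described above can be sketched roughly as follows. This is a minimal illustration, not the bcachefs implementation: struct demo_trans, demo_trans_begin() and DEMO_MAX_LOCK_HOLD_NS are hypothetical names, timestamps are passed in by the caller rather than read from a clock, and "drop and retake" is reduced to resetting a flag and a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold: 10us, taken from the commit subject. */
#define DEMO_MAX_LOCK_HOLD_NS 10000ULL

/* Hypothetical stand-in for the transaction's lock bookkeeping. */
struct demo_trans {
	bool		locked;		/* do we currently hold locks? */
	uint64_t	lock_start_ns;	/* when the locks were taken */
	unsigned	relock_count;	/* how often we dropped and retook */
};

/*
 * Called at a point where restarting is cheap: if locks have been held
 * longer than the threshold, drop them and take them again so that
 * other threads get a chance to make progress.
 */
static void demo_trans_begin(struct demo_trans *t, uint64_t now_ns)
{
	assert(now_ns >= t->lock_start_ns);

	if (t->locked &&
	    now_ns - t->lock_start_ns > DEMO_MAX_LOCK_HOLD_NS) {
		t->locked = false;		/* drop the locks */
		t->locked = true;		/* retake them */
		t->lock_start_ns = now_ns;	/* hold time starts over */
		t->relock_count++;
	}
}
```

Calling demo_trans_begin() 5us after the locks were taken leaves them alone; calling it 20us after drops and retakes them and resets the hold-time clock.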
10 months agobcachefs: for_each_btree_key2()
Kent Overstreet [Sat, 16 Jul 2022 00:51:09 +0000 (20:51 -0400)]
bcachefs: for_each_btree_key2()

This introduces two new macros for iterating through the btree, with
transaction restart handling:
 - for_each_btree_key2()
 - for_each_btree_key_commit()

Every iteration is now in an implicit transaction, and - as with
lockrestart_do() and commit_do() - returning -EINTR will cause the
transaction to be restarted, at the same key.

This patch converts a bunch of code that was open coding this to these
new macros, saving a substantial amount of code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
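
The restart-at-the-same-key behaviour described above can be sketched with a plain array in place of the btree. This is an illustration under stated assumptions, not the actual macros: struct demo_iter, demo_for_each_key() and the callback are hypothetical, and the only thing taken from the commit message is that a body returning -EINTR is retried at the same key.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a btree iterator: a cursor over an array. */
struct demo_iter {
	const int	*keys;
	size_t		nr;
	size_t		pos;
};

typedef int (*demo_fn)(int key, void *arg);

/*
 * Run fn once per key. If fn returns -EINTR (standing in for a
 * transaction restart), call it again on the same key; any other
 * nonzero return stops the walk and is passed back to the caller.
 */
static int demo_for_each_key(struct demo_iter *iter, demo_fn fn, void *arg)
{
	assert(iter->pos <= iter->nr);

	while (iter->pos < iter->nr) {
		int ret = fn(iter->keys[iter->pos], arg);

		if (ret == -EINTR)
			continue;	/* restart: pos is unchanged */
		if (ret)
			return ret;
		iter->pos++;
	}
	return 0;
}

struct demo_state {
	int calls;
	int sum;
	int restarted;
};

/* Example body: requests one restart on key 2, then succeeds. */
static int demo_sum_with_one_restart(int key, void *arg)
{
	struct demo_state *s = arg;

	s->calls++;
	if (key == 2 && !s->restarted) {
		s->restarted = 1;
		return -EINTR;
	}
	s->sum += key;
	return 0;
}
```

Walking the keys {1, 2, 3} with demo_sum_with_one_restart() calls the body four times (key 2 is visited twice) and still sums each key exactly once, which is the property the callers rely on instead of open coding the retry.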
10 months agobcachefs: Fix repair for extent past end of inode
Kent Overstreet [Sun, 17 Jul 2022 03:31:28 +0000 (23:31 -0400)]
bcachefs: Fix repair for extent past end of inode

When we find an extent past an inode's i_size, we need to do the
deletion in the inode's snapshot (which will emit a whiteout if
necessary); and we also need to note that we now have a key at that
position and snapshot, so that we don't go into an infinite loop.

Also, switch to walking inodes in reverse order, oldest snapshot to
newest, so that we emit the fewest whiteouts possible.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: When fsck finds redundant snapshot keys, trigger snapshots cleanup
Kent Overstreet [Sun, 17 Jul 2022 03:21:15 +0000 (23:21 -0400)]
bcachefs: When fsck finds redundant snapshot keys, trigger snapshots cleanup

Fsck now checks for keys in different snapshot IDs that are now
redundant due to other snapshots being deleted - it needs to for its own
algorithms to not get confused.

When it detects this it should re-run the post snapshot deletion cleanup
- this patch does that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Improve fsck for subvols/snapshots
Kent Overstreet [Thu, 14 Jul 2022 09:44:10 +0000 (05:44 -0400)]
bcachefs: Improve fsck for subvols/snapshots

 - Bunch of refactoring, and move some code out of
   bch2_snapshots_start() and into bch2_snapshots_check(), for consistency
   with the rest of fsck

 - Interior snapshot nodes no longer point to a subvolume; this is so we
   don't end up with dangling subvol references when deleting or require
   scanning the full snapshots btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Improve snapshots_seen
Kent Overstreet [Thu, 14 Jul 2022 06:47:36 +0000 (02:47 -0400)]
bcachefs: Improve snapshots_seen

This makes the snapshots_seen data structure fsck private and improves
it; we now also track the equivalence class for each snapshot id we've
seen, which means we can detect when snapshot deletion hasn't finished
or run correctly (which will otherwise confuse fsck).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix subvol/snapshot deleting in recovery
Kent Overstreet [Thu, 14 Jul 2022 05:10:24 +0000 (01:10 -0400)]
bcachefs: Fix subvol/snapshot deleting in recovery

fsck doesn't want to run while we're cleaning up deleted snapshots - if
that work needs to be done, we want it to have finished before fsck
runs, otherwise fsck will get confused when it finds multiple keys in
the same snapshot ID equivalence class (i.e. the mechanism that
snapshot deletion uses for cleaning up redundant keys).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: fsck_inode_rm() shouldn't delete subvols
Kent Overstreet [Thu, 14 Jul 2022 04:44:09 +0000 (00:44 -0400)]
bcachefs: fsck_inode_rm() shouldn't delete subvols

We should never see an inode marked as unlinked that's a subvolume root
(or a directory) in fsck, but even if we do it's not correct for fsck to
delete the subvolume: subvolumes are owned by dirents, and if we find a
dangling subvolume (not marked as unlinked) we want fsck to reattach it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Switch data_update path to snapshot_id_list
Kent Overstreet [Thu, 14 Jul 2022 06:34:48 +0000 (02:34 -0400)]
bcachefs: Switch data_update path to snapshot_id_list

snapshots_seen is becoming private to fsck, and snapshot_id_list is
actually what the data update path needs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Fix snapshot deletion
Kent Overstreet [Tue, 12 Jul 2022 13:11:52 +0000 (09:11 -0400)]
bcachefs: Fix snapshot deletion

Snapshots being deleted won't in general have a corresponding subvolume:
this fixes a spurious fsck error where we'd complain about a snapshot
pointing to a missing subvolume - but the subvolume had been deleted,
and the snapshot was pending deletion as well.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Rename __bch2_trans_do() -> commit_do()
Kent Overstreet [Wed, 13 Jul 2022 09:25:29 +0000 (05:25 -0400)]
bcachefs: Rename __bch2_trans_do() -> commit_do()

Better/more descriptive naming, and prep for adding
nested_lockrestart_do() and nested_commit_do().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
10 months agobcachefs: Silence some fsck errors when reconstructing alloc info
Kent Overstreet [Tue, 12 Jul 2022 01:06:52 +0000 (21:06 -0400)]
bcachefs: Silence some fsck errors when reconstructing alloc info

There's no need to print fsck errors for errors that are expected, and
the user has already opted to repair.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>