linux-block.git
11 months ago bcachefs: Change when we allow overwrites
Kent Overstreet [Tue, 15 Dec 2020 02:59:33 +0000 (21:59 -0500)]
bcachefs: Change when we allow overwrites

Originally, we'd check for -ENOSPC when getting a disk reservation
whenever the new extent took up more space on disk than the old extent.

Erasure coding screwed this up, because with erasure coding writes are
initially replicated, and then in the background the extra replicas are
dropped when the stripe is created. This means that with erasure coding
enabled, writes will always take up more space on disk than the data
they're overwriting - but, according to POSIX, overwrites aren't
supposed to return ENOSPC.

So, in this patch we fudge things: if the new extent has more replicas
than the _effective_ replicas of the old extent, or if the old extent is
compressed and the new one isn't, we check for ENOSPC when getting the
disk reservation - otherwise, we don't.
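
As a minimal sketch of the rule described above (not the actual bcachefs code; the struct and function names are hypothetical), the decision could look like this:

```c
#include <stdbool.h>

/* Hypothetical summary of an extent; field names are illustrative only. */
struct extent_summary {
	unsigned replicas;	/* for the old extent: its effective replicas */
	bool	 compressed;
};

/*
 * Returns true if getting the disk reservation for this overwrite should
 * be allowed to fail with -ENOSPC, false if it should not check.
 */
bool overwrite_checks_enospc(struct extent_summary old_e,
			     struct extent_summary new_e)
{
	/* More replicas than the old extent effectively had. */
	if (new_e.replicas > old_e.replicas)
		return true;

	/* Old data was compressed, new data is not. */
	if (old_e.compressed && !new_e.compressed)
		return true;

	return false;
}
```

With this shape, a like-for-like overwrite never sees ENOSPC, while an overwrite that genuinely needs more space still does.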

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much
Kent Overstreet [Mon, 21 Dec 2020 22:17:18 +0000 (17:17 -0500)]
bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much

Previously, we were using BTREE_INSERT_USE_RESERVE in a lot of places
where it no longer makes sense.

 - We now have more open_buckets than we used to, and the reserves work
   better, so we shouldn't need to use BTREE_INSERT_USE_RESERVE just
   because we're holding open_buckets pinned anymore.

 - We have the btree key cache for updates to the alloc btree, meaning
   we no longer need the btree reserve to ensure the allocator can make
   forward progress.

This means that we should only need a reserve for btree updates to
ensure that copygc can make forward progress.

Since it's now just for copygc, we can also fold RESERVE_BTREE into
RESERVE_MOVINGGC (the allocator's freelist reserve).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix iterator overflow in move path
Kent Overstreet [Mon, 21 Dec 2020 02:42:19 +0000 (21:42 -0500)]
bcachefs: Fix iterator overflow in move path

The move path was calling bch2_bucket_io_time_reset() for cached
pointers (which it shouldn't have been), and then not calling
bch2_trans_reset() when it got -EINTR (indicating transaction restart).
Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix btree lock being incorrectly dropped
Kent Overstreet [Sun, 20 Dec 2020 02:31:05 +0000 (21:31 -0500)]
bcachefs: Fix btree lock being incorrectly dropped

__btree_trans_get_iter() was using bch2_btree_iter_upgrade, but it
shouldn't have been because on failure bch2_btree_iter_upgrade may drop
locks in other iterators, expecting the transaction to be restarted. But
__btree_trans_get_iter can't return an error to indicate that we need to
restart the transaction - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix for spinning in journal reclaim on startup
Kent Overstreet [Sat, 19 Dec 2020 20:39:10 +0000 (15:39 -0500)]
bcachefs: Fix for spinning in journal reclaim on startup

We normally avoid having too many dirty keys in the btree key cache, to
ensure that we can always shrink our caches to reclaim memory if needed.

But this check was causing us to go into an infinite loop on startup, in
the btree insert path before journal reclaim was started.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix race between journal_seq_copy() and journal_seq_drop()
Kent Overstreet [Wed, 16 Dec 2020 20:41:29 +0000 (15:41 -0500)]
bcachefs: Fix race between journal_seq_copy() and journal_seq_drop()

In bch2_btree_interior_update_will_free_node, we copy the journal pins
from outstanding writes on the btree node we're about to free. But, this
can race with the writes completing, and dropping their journal pins.

To guard against this, just use READ_ONCE() in bch2_journal_pin_copy().
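
A rough userspace sketch of why a single snapshot helps. The struct layout, the macro and pin_copy() are assumptions for illustration, not the real bch2_journal_pin_copy():

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's READ_ONCE(): force exactly one load. */
#define READ_ONCE_U64(x) (*(const volatile uint64_t *)&(x))

/* Hypothetical pin: seq == 0 means "no pin held". */
struct pin {
	uint64_t seq;
};

/*
 * Copy src's pin into dst. The source may be dropped concurrently, so we
 * take one snapshot of src->seq and use only that value for both the
 * check and the copy. Returns true if a pin was copied.
 */
bool pin_copy(struct pin *dst, const struct pin *src)
{
	uint64_t seq = READ_ONCE_U64(src->seq);

	if (!seq)
		return false;	/* source already dropped its pin */

	dst->seq = seq;
	return true;
}
```

Without the single load, the compiler is free to read src->seq once for the check and again for the copy, and the two reads could disagree.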

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Don't write bucket IO time lazily
Kent Overstreet [Sat, 17 Oct 2020 01:39:16 +0000 (21:39 -0400)]
bcachefs: Don't write bucket IO time lazily

With the btree key cache code, we don't need to update the alloc btree
lazily - and this will mean we can remove the bch2_alloc_write() call in
the shutdown path.

Future work: we really need to expand the bucket IO clocks from 16 to 64
bits, so that we don't have to rescale them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add BCH_BKEY_PTRS_MAX
Kent Overstreet [Wed, 16 Dec 2020 19:23:27 +0000 (14:23 -0500)]
bcachefs: Add BCH_BKEY_PTRS_MAX

This now means "the maximum number of pointers within a bkey" - and
bch_devs_list is updated to use it instead of BCH_REPLICAS_MAX, since
stripes can contain more than BCH_REPLICAS_MAX pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Check for duplicate device ptrs in bch2_bkey_ptrs_invalid()
Kent Overstreet [Wed, 16 Dec 2020 19:18:33 +0000 (14:18 -0500)]
bcachefs: Check for duplicate device ptrs in bch2_bkey_ptrs_invalid()

This is something we clearly should be checking for, but weren't -
oops.
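
For illustration only, a duplicate-device check over a key's pointers might be sketched like this. The flat array of device indexes and the function name are assumptions, not the bch2_bkey_ptrs_invalid() implementation:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if any two pointers in the key refer to the same device.
 * devs[] holds one device index per pointer (hypothetical representation).
 */
bool ptrs_have_duplicate_dev(const uint8_t *devs, size_t nr)
{
	bool seen[256] = { false };	/* one slot per possible uint8_t index */

	for (size_t i = 0; i < nr; i++) {
		if (seen[devs[i]])
			return true;
		seen[devs[i]] = true;
	}
	return false;
}
```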

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add some cond_rescheds() in shutdown path
Kent Overstreet [Sun, 13 Dec 2020 21:12:04 +0000 (16:12 -0500)]
bcachefs: Add some cond_rescheds() in shutdown path

Particularly on emergency shutdown we can end up having to clean up a
lot of dirty cached btree keys here.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix btree node merge -> split operations
Kent Overstreet [Fri, 11 Dec 2020 17:02:48 +0000 (12:02 -0500)]
bcachefs: Fix btree node merge -> split operations

If a btree node merge is followed by a split or compact of the parent
node, we could end up with the parent btree node iterator pointing to
the whiteout inserted by the btree node merge operation - the fix is to
ensure that interior btree node iterators always point to the first
non-whiteout.
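
The invariant can be sketched as a small helper that advances past whiteouts. The key layout and helper name are hypothetical, not the real node iterator code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-node key: whiteout marks a deleted position. */
struct node_key {
	int  pos;
	bool whiteout;
};

/*
 * Returns the index of the first non-whiteout key at or after start,
 * or nr if every remaining key is a whiteout.
 */
size_t first_non_whiteout(const struct node_key *keys, size_t nr, size_t start)
{
	while (start < nr && keys[start].whiteout)
		start++;
	return start;
}
```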

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Always check if we need disk res in extent update path
Kent Overstreet [Thu, 10 Dec 2020 18:38:54 +0000 (13:38 -0500)]
bcachefs: Always check if we need disk res in extent update path

With erasure coding, we now have processes in the background that
compact data, causing it to take up less space on disk than when it was
written, or potentially when it was read.

This means that we can't trust the page cache when it says "we have data
on disk taking up x amount of space here" - there's always the potential
to race with background compaction.

To fix this, just check if we need to add to our disk reservation in the
bch2_extent_update() path, in the transaction that will do the btree
update.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Update transactional triggers interface to pass old & new keys
Kent Overstreet [Thu, 10 Dec 2020 18:13:56 +0000 (13:13 -0500)]
bcachefs: Update transactional triggers interface to pass old & new keys

This is needed to fix a bug where we're overflowing iterators within a
btree transaction, because we're updating the stripes btree (to update
block counts) and the stripes btree trigger is unnecessarily updating
the alloc btree - it doesn't need to update the alloc btree when the
pointers within a stripe aren't changing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Only try to get existing stripe once in stripe create path
Kent Overstreet [Wed, 9 Dec 2020 18:39:30 +0000 (13:39 -0500)]
bcachefs: Only try to get existing stripe once in stripe create path

The stripe creation path was too state-machiney: it would always run the
full state machine until it had successfully created a new stripe.

But if we tried to get and reuse an existing stripe after we'd already
allocated some buckets, the buckets we'd allocated might have conflicted
with the blocks in the existing stripe we need to keep - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix __btree_iter_next() when all iters are in use
Kent Overstreet [Wed, 9 Dec 2020 18:34:42 +0000 (13:34 -0500)]
bcachefs: Fix __btree_iter_next() when all iters are in use

Also, print out more information on btree transaction iterator overflow.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix rand_delete() test
Kent Overstreet [Mon, 7 Dec 2020 16:44:12 +0000 (11:44 -0500)]
bcachefs: Fix rand_delete() test

When we didn't find a key to delete we were getting a null ptr deref.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Try to print full btree error message
Kent Overstreet [Sun, 6 Dec 2020 21:30:02 +0000 (16:30 -0500)]
bcachefs: Try to print full btree error message

Metadata corruption bugs are hard to debug if we can't see exactly what
went wrong - try to allocate a bigger buffer so we can print out
everything we have.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Prevent journal reclaim from spinning
Kent Overstreet [Sun, 6 Dec 2020 21:29:13 +0000 (16:29 -0500)]
bcachefs: Prevent journal reclaim from spinning

Without checking if we actually flushed anything, journal reclaim could
still go into an infinite loop while trying to shut down.
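
A toy model of the idea, with hypothetical names and a made-up batch size of 8: the loop stops as soon as a pass flushes nothing, instead of retrying forever.

```c
#include <stddef.h>

/* Hypothetical reclaim state; assumes flushable <= dirty. */
struct reclaim_state {
	size_t dirty;		/* entries still pinning the journal */
	size_t flushable;	/* entries we are actually able to flush */
};

/* Flush up to batch entries; returns how many were flushed. */
size_t flush_some(struct reclaim_state *s, size_t batch)
{
	size_t n = s->flushable < batch ? s->flushable : batch;

	s->flushable -= n;
	s->dirty -= n;
	return n;
}

/* Run reclaim passes until the target is met; returns the pass count. */
size_t reclaim(struct reclaim_state *s, size_t target_dirty)
{
	size_t passes = 0;

	while (s->dirty > target_dirty) {
		size_t flushed = flush_some(s, 8);

		passes++;
		if (!flushed)
			break;	/* no progress possible: bail out, don't spin */
	}
	return passes;
}
```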

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix btree key cache dirty checks
Kent Overstreet [Sun, 6 Dec 2020 02:03:57 +0000 (21:03 -0500)]
bcachefs: Fix btree key cache dirty checks

Had a typo that meant we were triggering journal reclaim _much_ more
aggressively than needed. Also, fix a potential integer overflow.
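
The commit doesn't say where the overflow was. As a generic illustration, assuming a made-up "more than 3/4 of keys dirty" threshold and 32-bit counters, widening before multiplying avoids wraparound:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: is more than 3/4 of the key cache dirty?
 * Both sides are widened to 64 bits first, so neither nr_dirty * 4
 * nor nr_keys * 3 can wrap the way they could in 32-bit arithmetic.
 */
bool key_cache_too_dirty(uint32_t nr_dirty, uint32_t nr_keys)
{
	return (uint64_t) nr_dirty * 4 > (uint64_t) nr_keys * 3;
}
```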

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Be more conservative about journal pre-reservations
Kent Overstreet [Sat, 5 Dec 2020 21:25:05 +0000 (16:25 -0500)]
bcachefs: Be more conservative about journal pre-reservations

 - Try to always keep 1/8th of the journal free, on top of
   pre-reservations
 - Move the check for whether the journal is stuck to
   bch2_journal_space_available, and make it only fire when there aren't
   any journal writes in flight (that might free up space by updating
   last_seq)
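
One possible reading of the first point, as arithmetic. The function and parameter names are hypothetical, and the real calculation in bch2_journal_space_available may differ:

```c
#include <stdint.h>

/*
 * Space that new (non pre-reserved) journal reservations may use:
 * hold back 1/8th of the journal in addition to outstanding
 * pre-reservations, and never return a negative amount.
 */
uint64_t journal_space_for_new_res(uint64_t total, uint64_t used,
				   uint64_t prereserved)
{
	uint64_t held_back = total / 8 + prereserved;
	uint64_t free_space = total > used ? total - used : 0;

	return free_space > held_back ? free_space - held_back : 0;
}
```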

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Don't require flush/fua on every journal write
Kent Overstreet [Sat, 14 Nov 2020 14:59:58 +0000 (09:59 -0500)]
bcachefs: Don't require flush/fua on every journal write

This patch adds a flag to journal entries which, if set, indicates that
they weren't done as flush/fua writes.

 - non flush/fua journal writes don't update last_seq (i.e. they don't
   free up space in the journal), thus the journal free space
   calculations now check whether nonflush journal writes are currently
   allowed (i.e. are we low on free space, or would doing a flush write
   free up a lot of space in the journal)

 - write_delay_ms, the user configurable option for when open journal
   entries are automatically written, is now interpreted as the max
   delay between flush journal writes (default 1 second).

 - bch2_journal_flush_seq_async is changed to ensure a flush write >=
   the requested sequence number has happened

 - journal read/replay must now ignore, and blacklist, any journal
   entries newer than the most recent flush entry in the journal. Also,
   the way the read_entire_journal option is handled has been improved;
   struct journal_replay now has an entry, 'ignore', for entries that
   were read but should not be used.

 - assorted refactoring and improvements related to journal read in
   journal_io.c and recovery.c
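
The read/replay rule above can be sketched as follows. The struct is a simplified stand-in for struct journal_replay (only the 'ignore' field is named in this commit; the rest is assumed), and blacklisting is left out:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical view of a journal entry that was read. */
struct replay_entry {
	uint64_t seq;
	bool	 noflush;	/* written without flush/fua */
	bool	 ignore;	/* read, but must not be replayed */
};

/*
 * Entries are sorted by ascending seq. Everything newer than the most
 * recent flush entry is marked ignore. Returns how many entries remain
 * usable for replay.
 */
size_t mark_unflushed_entries(struct replay_entry *e, size_t nr)
{
	size_t usable = 0;

	for (size_t i = 0; i < nr; i++)
		if (!e[i].noflush)
			usable = i + 1;	/* last flush entry seen so far */

	for (size_t i = 0; i < nr; i++)
		e[i].ignore = i >= usable;

	return usable;
}
```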

Previously, we'd have to issue a flush/fua write every time we
accumulated a full journal entry - typically the bucket size. Now we
need to issue them much less frequently: when an fsync is requested, or
it's been more than write_delay_ms since the last flush, or when we need
to free up space in the journal. This is a significant performance
improvement on many write heavy workloads.
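
The three conditions can be written as one predicate. This is a sketch with hypothetical parameter names; only write_delay_ms comes from the commit message:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Should the next journal write be a flush/fua write?
 * Yes if an fsync asked for it, if more than write_delay_ms has passed
 * since the last flush write, or if we need to free up journal space.
 */
bool journal_write_needs_flush(bool fsync_requested,
			       uint64_t now_ms, uint64_t last_flush_ms,
			       uint64_t write_delay_ms,
			       bool need_journal_space)
{
	return fsync_requested ||
	       now_ms - last_flush_ms > write_delay_ms ||
	       need_journal_space;
}
```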

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Improve journal free space calculations
Kent Overstreet [Sat, 14 Nov 2020 17:29:21 +0000 (12:29 -0500)]
bcachefs: Improve journal free space calculations

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Increase journal pipelining
Kent Overstreet [Fri, 13 Nov 2020 23:36:33 +0000 (18:36 -0500)]
bcachefs: Increase journal pipelining

This patch increases the maximum journal buffers in flight from 2 to 4 -
this will be particularly helpful when in the future we stop requiring
flush+fua for every journal write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Don't issue btree writes that weren't journalled
Kent Overstreet [Thu, 3 Dec 2020 21:20:18 +0000 (16:20 -0500)]
bcachefs: Don't issue btree writes that weren't journalled

If we have an error in the btree interior update path that prevents us
from journalling the update, we can't issue the corresponding btree node
write - we didn't get a journal sequence number that would cause it to
be ignored in recovery.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Check for errors in bch2_journal_reclaim()
Kent Overstreet [Thu, 3 Dec 2020 18:23:58 +0000 (13:23 -0500)]
bcachefs: Check for errors in bch2_journal_reclaim()

If the journal is halted, journal reclaim won't necessarily be able to
make any forward progress, and won't accomplish anything anyways - we
should bail out so that we don't get stuck looping in reclaim when the
caches are too dirty and we should be shutting down.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Flag inodes that had btree update errors
Kent Overstreet [Thu, 3 Dec 2020 19:27:20 +0000 (14:27 -0500)]
bcachefs: Flag inodes that had btree update errors

On write error, the vfs inode's i_size may be inconsistent with the
btree inode's i_size - flag this so we don't have spurious assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Improve some IO error messages
Kent Overstreet [Thu, 3 Dec 2020 18:57:22 +0000 (13:57 -0500)]
bcachefs: Improve some IO error messages

It's useful to know whether an error was for a read or a write - this
also standardizes error messages a bit more.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Refactor filesystem usage accounting
Kent Overstreet [Fri, 13 Nov 2020 23:36:33 +0000 (18:36 -0500)]
bcachefs: Refactor filesystem usage accounting

Various filesystem usage counters are kept in percpu counters, with one
set per in flight journal buffer. Right now all the code that deals with
it assumes that there's only two buffers/sets of counters, but the
number of journal bufs is getting increased to 4 in the next patch - so
refactor that code to not assume a constant.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix spurious alloc errors on forced shutdown
Kent Overstreet [Wed, 2 Dec 2020 23:30:06 +0000 (18:30 -0500)]
bcachefs: Fix spurious alloc errors on forced shutdown

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix some spurious gcc warnings
Kent Overstreet [Thu, 3 Dec 2020 18:09:08 +0000 (13:09 -0500)]
bcachefs: Fix some spurious gcc warnings

These only come up when building in userspace, for some reason.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix journal_flush_seq()
Kent Overstreet [Wed, 2 Dec 2020 20:33:12 +0000 (15:33 -0500)]
bcachefs: Fix journal_flush_seq()

The error check was inverted - leading fsyncs to get stuck and hang,
oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: bch2_trans_get_iter() no longer returns errors
Kent Overstreet [Wed, 2 Dec 2020 04:11:53 +0000 (23:11 -0500)]
bcachefs: bch2_trans_get_iter() no longer returns errors

Since we now always preallocate the maximum number of iterators when we
initialize a btree transaction, getting an iterator never fails - we can
delete a fair amount of error path code.

This patch also simplifies the iterator allocation code a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add error handling to unit & perf tests
Kent Overstreet [Tue, 1 Dec 2020 17:23:55 +0000 (12:23 -0500)]
bcachefs: Add error handling to unit & perf tests

This way, these tests can be used with tests that inject IO errors and
shut down the filesystem.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Journal pin refactoring
Kent Overstreet [Tue, 1 Dec 2020 16:48:08 +0000 (11:48 -0500)]
bcachefs: Journal pin refactoring

This deletes some duplicated code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix for fsck spuriously finding duplicate extents
Kent Overstreet [Tue, 1 Dec 2020 16:42:23 +0000 (11:42 -0500)]
bcachefs: Fix for fsck spuriously finding duplicate extents

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Use BTREE_ITER_PREFETCH in journal+btree iter
Kent Overstreet [Tue, 1 Dec 2020 16:40:59 +0000 (11:40 -0500)]
bcachefs: Use BTREE_ITER_PREFETCH in journal+btree iter

Introducing the journal+btree iter introduced a regression where we
stopped using BTREE_ITER_PREFETCH - this is a performance regression on
rotating disks.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Ensure we always have a journal pin in interior update path
Kent Overstreet [Mon, 30 Nov 2020 07:08:14 +0000 (02:08 -0500)]
bcachefs: Ensure we always have a journal pin in interior update path

For the new nodes an interior btree update makes reachable, updates to
those nodes may be journalled after the btree update starts but before
the transactional part - where we make those nodes reachable. Those
updates need to be kept in the journal until after the btree update
completes, hence we should always get a journal pin at the start of the
interior update.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Change a BUG_ON() to a fatal error
Kent Overstreet [Mon, 30 Nov 2020 07:07:38 +0000 (02:07 -0500)]
bcachefs: Change a BUG_ON() to a fatal error

In the btree key cache code, failing to flush a dirty key is a serious
error, but it doesn't need to be a BUG_ON(), we can stop the filesystem
instead.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix error in filesystem initialization
Kent Overstreet [Mon, 30 Nov 2020 04:48:20 +0000 (23:48 -0500)]
bcachefs: Fix error in filesystem initialization

The rhashtable code doesn't like when we destroy an rhashtable that was
never initialized.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix journal reclaim spinning in recovery
Kent Overstreet [Sun, 29 Nov 2020 22:09:13 +0000 (17:09 -0500)]
bcachefs: Fix journal reclaim spinning in recovery

We can't run journal reclaim until we've finished replaying updates to
interior btree nodes - the check for this was in the wrong place though,
leading to journal reclaim spinning before it was allowed to proceed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix for __readahead_batch getting partial batch
Kent Overstreet [Sun, 29 Nov 2020 21:00:47 +0000 (16:00 -0500)]
bcachefs: Fix for __readahead_batch getting partial batch

We were incorrectly ignoring the return value of __readahead_batch,
leading to a null ptr deref in __bch2_page_state_create().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Optimize bch2_journal_flush_seq_async()
Kent Overstreet [Sat, 21 Nov 2020 00:27:57 +0000 (19:27 -0500)]
bcachefs: Optimize bch2_journal_flush_seq_async()

Avoid taking the journal lock if we don't have to.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Delete dead code
Kent Overstreet [Sat, 21 Nov 2020 03:51:04 +0000 (22:51 -0500)]
bcachefs: Delete dead code

The interior btree node update path has changed, this is no longer
needed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: bch2_btree_delete_range_trans()
Kent Overstreet [Sat, 21 Nov 2020 02:28:55 +0000 (21:28 -0500)]
bcachefs: bch2_btree_delete_range_trans()

This helps reduce stack usage by avoiding multiple btree_trans on the
stack.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Don't use bkey cache for inode update in fsck
Kent Overstreet [Sat, 21 Nov 2020 02:21:28 +0000 (21:21 -0500)]
bcachefs: Don't use bkey cache for inode update in fsck

fsck doesn't know about the btree key cache, and non-cached iterators
aren't cache coherent (yet?)

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix an rcu splat
Kent Overstreet [Fri, 20 Nov 2020 21:12:39 +0000 (16:12 -0500)]
bcachefs: Fix an rcu splat

bch2_bucket_alloc() requires rcu_read_lock() to be held.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Move journal reclaim to a kthread
Kent Overstreet [Fri, 20 Nov 2020 01:55:33 +0000 (20:55 -0500)]
bcachefs: Move journal reclaim to a kthread

This is to make tracing easier.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Throttle updates when btree key cache is too dirty
Kent Overstreet [Fri, 20 Nov 2020 02:40:03 +0000 (21:40 -0500)]
bcachefs: Throttle updates when btree key cache is too dirty

This is needed to ensure we don't deadlock because journal reclaim and
thus memory reclaim isn't making forward progress.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Journal reclaim requires memalloc_noreclaim_save()
Kent Overstreet [Fri, 20 Nov 2020 02:15:39 +0000 (21:15 -0500)]
bcachefs: Journal reclaim requires memalloc_noreclaim_save()

Memory reclaim requires journal reclaim to make forward progress - it's
what cleans our caches - thus, while we're in journal reclaim or holding
the journal reclaim lock we can't recurse into memory reclaim.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Simplify transaction commit error path
Kent Overstreet [Fri, 20 Nov 2020 18:24:51 +0000 (13:24 -0500)]
bcachefs: Simplify transaction commit error path

The transaction restart path traverses all iterators, we don't need to
do it here.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Ensure journal reclaim runs when btree key cache is too dirty
Kent Overstreet [Fri, 20 Nov 2020 00:54:40 +0000 (19:54 -0500)]
bcachefs: Ensure journal reclaim runs when btree key cache is too dirty

Ensuring the key cache isn't too dirty is critical for ensuring that the
shrinker can reclaim memory.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Improve btree key cache shrinker
Kent Overstreet [Thu, 19 Nov 2020 20:38:27 +0000 (15:38 -0500)]
bcachefs: Improve btree key cache shrinker

The shrinker should start scanning for entries that can be freed oldest
to newest - this way, we can avoid scanning a lot of entries that are
too new to be freed.
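
The benefit of scanning oldest to newest is that the scan can stop at the first entry that is too new. A sketch, assuming each entry carries a monotonically increasing stamp (the array representation and all names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * stamps[] is ordered oldest to newest. Entries with a stamp below
 * oldest_unsafe may be freed. Returns how many entries would be freed,
 * capped at nr_to_scan.
 */
size_t shrink_scan(const uint64_t *stamps, size_t nr,
		   uint64_t oldest_unsafe, size_t nr_to_scan)
{
	size_t freed = 0;

	for (size_t i = 0; i < nr && freed < nr_to_scan; i++) {
		if (stamps[i] >= oldest_unsafe)
			break;	/* everything after this is newer still */
		freed++;
	}
	return freed;
}
```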

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: More debug code improvements
Kent Overstreet [Thu, 19 Nov 2020 16:53:38 +0000 (11:53 -0500)]
bcachefs: More debug code improvements

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add a kmem_cache for btree_key_cache objects
Kent Overstreet [Wed, 18 Nov 2020 19:09:33 +0000 (14:09 -0500)]
bcachefs: Add a kmem_cache for btree_key_cache objects

We allocate a lot of these, and we're seeing sporadic OOMs - this will
help with tracking those down.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Be more precise with journal error reporting
Kent Overstreet [Wed, 18 Nov 2020 18:21:59 +0000 (13:21 -0500)]
bcachefs: Be more precise with journal error reporting

We were incorrectly detecting a journal deadlock - the journal filling
up - when only the journal pin fifo had filled up; if the journal pin
fifo is full that just means we need to wait on reclaim.

This plumbs through better error reporting so we can better discriminate
in the journal_res_get path what's going on.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add btree cache stats to sysfs
Kent Overstreet [Fri, 20 Nov 2020 01:13:30 +0000 (20:13 -0500)]
bcachefs: Add btree cache stats to sysfs

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add an ioctl for resizing journal on a device
Kent Overstreet [Mon, 16 Nov 2020 19:23:06 +0000 (14:23 -0500)]
bcachefs: Add an ioctl for resizing journal on a device

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add more debug checks
Kent Overstreet [Mon, 16 Nov 2020 19:16:42 +0000 (14:16 -0500)]
bcachefs: Add more debug checks

Tracking down a bug where we see a btree node pointer in the wrong node.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Dump journal state when the journal deadlocks
Kent Overstreet [Mon, 16 Nov 2020 23:21:55 +0000 (18:21 -0500)]
bcachefs: Dump journal state when the journal deadlocks

Currently tracking down one of these bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Don't use percpu btree_iter buf in userspace
Kent Overstreet [Mon, 16 Nov 2020 23:20:50 +0000 (18:20 -0500)]
bcachefs: Don't use percpu btree_iter buf in userspace

bcachefs-tools doesn't have a real percpu (per thread) implementation
yet.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Set preallocated transaction mem to avoid restarts
Kent Overstreet [Mon, 16 Nov 2020 01:52:55 +0000 (20:52 -0500)]
bcachefs: Set preallocated transaction mem to avoid restarts

Based on observation of tracepoints, this will reduce transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Convert tracepoints to use %ps, not %pf
Kent Overstreet [Mon, 16 Nov 2020 18:06:28 +0000 (13:06 -0500)]
bcachefs: Convert tracepoints to use %ps, not %pf

Symbol decoding was changed from %pf to %ps

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix journal entry repair code
Kent Overstreet [Mon, 16 Nov 2020 17:22:30 +0000 (12:22 -0500)]
bcachefs: Fix journal entry repair code

When we detect bad keys in the journal that have to be dropped, the flow
control was wrong - we ended up not checking the next key in that entry.
Oops.
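
This class of bug is easy to reproduce in a generic form. In the sketch below (plain ints stand in for journal keys, and negative values stand in for "bad" keys; none of this is the real journal code), the important detail is that the index is not advanced after a drop:

```c
#include <stddef.h>
#include <string.h>

/*
 * Drop every bad (negative) key from keys[] in place.
 * Returns the new number of keys.
 */
size_t drop_bad_keys(int *keys, size_t nr)
{
	size_t i = 0;

	while (i < nr) {
		if (keys[i] < 0) {
			/* Close the gap left by the dropped key. */
			memmove(&keys[i], &keys[i + 1],
				(nr - i - 1) * sizeof(keys[0]));
			nr--;
			/* Don't advance: the next key now sits at index i. */
			continue;
		}
		i++;
	}
	return nr;
}
```

Advancing i after the memmove would skip the key that slid into the freed slot, so two bad keys in a row would leave the second one unchecked.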

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add a shrinker for the btree key cache
Kent Overstreet [Thu, 12 Nov 2020 22:19:47 +0000 (17:19 -0500)]
bcachefs: Add a shrinker for the btree key cache

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Take a SRCU lock in btree transactions
Kent Overstreet [Sun, 15 Nov 2020 21:30:22 +0000 (16:30 -0500)]
bcachefs: Take a SRCU lock in btree transactions

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Check for errors from register_shrinker()
Kent Overstreet [Sun, 15 Nov 2020 21:31:58 +0000 (16:31 -0500)]
bcachefs: Check for errors from register_shrinker()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Assorted journal refactoring
Kent Overstreet [Sat, 14 Nov 2020 21:04:30 +0000 (16:04 -0500)]
bcachefs: Assorted journal refactoring

Improved the way we track various state by adding j->err_seq, which
records the first journal sequence number that encountered an error
being written, and j->last_empty_seq, which records the most recent
journal entry that was completely empty.

Also, use the low bits of the journal sequence number to index the
corresponding journal_buf.
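
A minimal sketch of indexing by the low bits of the sequence number,
assuming a power-of-two number of buffers. The struct names, the ring
size of 4 and seq_to_buf() are assumptions made for this example, not
taken from the patch.

```c
#include <stdint.h>

/* Assumed ring size; must be a power of two for the mask to work. */
#define JOURNAL_BUF_NR   4
#define JOURNAL_BUF_MASK (JOURNAL_BUF_NR - 1)

struct journal_buf_sketch {
	uint64_t seq;	/* sequence number currently held in this slot */
};

struct journal_sketch {
	struct journal_buf_sketch buf[JOURNAL_BUF_NR];
};

/* Map a sequence number to its slot using only the low bits. */
static struct journal_buf_sketch *
seq_to_buf(struct journal_sketch *j, uint64_t seq)
{
	return &j->buf[seq & JOURNAL_BUF_MASK];
}
```

Consecutive sequence numbers land in consecutive slots, and no separate
index variable has to be kept in sync with the sequence number.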

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Delete dead journalling code
Kent Overstreet [Sat, 14 Nov 2020 18:12:50 +0000 (13:12 -0500)]
bcachefs: Delete dead journalling code

Usage of the journal has gotten somewhat simpler over time - neat.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improve journal error messages
Kent Overstreet [Fri, 13 Nov 2020 21:19:24 +0000 (16:19 -0500)]
bcachefs: Improve journal error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Be more careful in bch2_bkey_to_text()
Kent Overstreet [Fri, 13 Nov 2020 20:03:34 +0000 (15:03 -0500)]
bcachefs: Be more careful in bch2_bkey_to_text()

This is used to print keys that failed bch2_bkey_invalid(), so be more
careful with k->type.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Inode delete doesn't need to flush key cache anymore
Kent Overstreet [Fri, 13 Nov 2020 21:51:02 +0000 (16:51 -0500)]
bcachefs: Inode delete doesn't need to flush key cache anymore

Inode create checks to make sure the slot doesn't exist in the btree key
cache.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix a btree transaction iter overflow
Kent Overstreet [Fri, 13 Nov 2020 23:30:53 +0000 (18:30 -0500)]
bcachefs: Fix a btree transaction iter overflow

extent_replay_key dates from before putting iterators was required -
fixed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix a 64 bit divide
Kent Overstreet [Fri, 13 Nov 2020 19:49:57 +0000 (14:49 -0500)]
bcachefs: Fix a 64 bit divide

This fixes builds on 32 bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improve journal entry validate code
Kent Overstreet [Fri, 13 Nov 2020 19:39:43 +0000 (14:39 -0500)]
bcachefs: Improve journal entry validate code

Previously, the journal entry read code was changed so that if we got a
journal entry that failed validation, we'd try to use it, preferring to
use a good version from another device if available.

But this left a bug where if an earlier validation check (say, checksum)
failed, the later checks (for last_seq) wouldn't run and we'd end up
using a journal entry with a garbage last_seq field. This fixes that so
that the later validation checks run and if necessary change those
fields to something sensible.
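
The idea of running every check and repairing bad fields can be sketched
as follows. This is not the bcachefs validation code: the struct, the
placeholder checksum and the rule that last_seq may not exceed seq are
assumptions chosen to keep the example small.

```c
#include <stdbool.h>
#include <stdint.h>

struct jset_sketch {
	uint64_t seq;
	uint64_t last_seq;
	uint32_t csum;
};

/* Placeholder checksum, for illustration only. */
static uint32_t sketch_csum(const struct jset_sketch *e)
{
	return (uint32_t)(e->seq * 31 + e->last_seq);
}

/*
 * Returns true only if the entry passed every check. Each check runs
 * even when an earlier one failed, and a field that is clearly bad is
 * clamped so that a caller forced to use the entry sees a sane value.
 */
static bool validate_entry(struct jset_sketch *e)
{
	bool ok = true;

	if (e->csum != sketch_csum(e))
		ok = false;		/* note the failure, but keep going */

	if (e->last_seq > e->seq) {	/* assumed invariant: last_seq <= seq */
		e->last_seq = e->seq;	/* repair to something sensible */
		ok = false;
	}

	return ok;
}
```

An early return after the checksum failure would reproduce the bug: the
caller would still fall back to the entry, but with last_seq unrepaired.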

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Deadlock prevention for ei_pagecache_lock
Kent Overstreet [Wed, 11 Nov 2020 17:33:12 +0000 (12:33 -0500)]
bcachefs: Deadlock prevention for ei_pagecache_lock

In the dio write path, when get_user_pages() invokes the fault handler
we have a recursive locking situation: we have to handle the lock
ordering ourselves or we deadlock. This patch addresses that by
checking for lock ordering violations and doing the unlock/relock
dance if necessary.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Hack around bch2_varint_decode invalid reads
Kent Overstreet [Wed, 11 Nov 2020 17:42:54 +0000 (12:42 -0500)]
bcachefs: Hack around bch2_varint_decode invalid reads

bch2_varint_decode can do reads up to 7 bytes past the end ptr, for the
sake of performance - these extra bytes are always masked off.

This won't be a problem in practice if we make sure to burn 8 bytes in
any buffer that has bkeys in it.
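
A rough sketch of why the padding is needed, under the assumption that
the decoder always loads a fixed 8 bytes and masks off what it does not
want. load_le_masked() is a hypothetical helper written for this
example; it is not bch2_varint_decode().

```c
#include <stdint.h>

/*
 * Read an nbytes-wide (1..8) little-endian integer starting at p.
 * All 8 bytes p[0..7] are always read and the excess is masked off, so
 * the caller must guarantee up to 7 readable bytes past the value.
 */
static uint64_t load_le_masked(const uint8_t *p, unsigned nbytes)
{
	uint64_t v = 0;

	/* Fixed-width load: no branch on nbytes, but it may over-read. */
	for (unsigned i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (8 * i);

	if (nbytes < 8)
		v &= (UINT64_C(1) << (nbytes * 8)) - 1;
	return v;
}
```

If every buffer holding encoded values ends with 8 unused bytes, the
over-read always stays inside the allocation and the masked-off bytes
never affect the result.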

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix missing memalloc_nofs_restore()
Kent Overstreet [Wed, 11 Nov 2020 23:59:41 +0000 (18:59 -0500)]
bcachefs: Fix missing memalloc_nofs_restore()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix btree key cache shutdown
Kent Overstreet [Wed, 11 Nov 2020 22:47:39 +0000 (17:47 -0500)]
bcachefs: Fix btree key cache shutdown

On emergency shutdown, we might still have dirty keys in the btree key
cache that need to be cleaned up properly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Add accounting for dirty btree nodes/keys
Kent Overstreet [Mon, 9 Nov 2020 18:01:52 +0000 (13:01 -0500)]
bcachefs: Add accounting for dirty btree nodes/keys

This lets us improve journal reclaim, so that it now tries to make sure
no more than 3/4s of the btree node cache and btree key cache are dirty
- ensuring the shrinkers can free memory.
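
The 3/4 threshold amounts to simple arithmetic on the two counters. The
helper below is an assumed illustration of that arithmetic only; the
name nr_to_flush() and its signature are not from the patch.

```c
#include <stddef.h>

/*
 * Given how many cache entries are dirty and how many exist in total,
 * return how many must be flushed so that at most 3/4 remain dirty.
 */
static size_t nr_to_flush(size_t nr_dirty, size_t nr_total)
{
	size_t max_dirty = nr_total * 3 / 4;

	return nr_dirty > max_dirty ? nr_dirty - max_dirty : 0;
}
```

Keeping at least a quarter of each cache clean means a shrinker always
has entries it can free without first waiting for writeback.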

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix btree iterator leak
Kent Overstreet [Sat, 7 Nov 2020 21:55:57 +0000 (16:55 -0500)]
bcachefs: Fix btree iterator leak

This fixes an occasional btree transaction iterator overflow.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Inline make_bfloat() into __build_ro_aux_tree()
Kent Overstreet [Sat, 7 Nov 2020 21:16:52 +0000 (16:16 -0500)]
bcachefs: Inline make_bfloat() into __build_ro_aux_tree()

This is a fast path - also, lift out the checks/init for min/max key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: use a radix tree for inum bitmap in fsck
Kent Overstreet [Sat, 7 Nov 2020 18:03:24 +0000 (13:03 -0500)]
bcachefs: use a radix tree for inum bitmap in fsck

The change to use the cpu nr for the high bits of new inode numbers
means that inode numbers are very sparse - we see -ENOMEM during fsck
without this.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: New varints
Kent Overstreet [Fri, 6 Nov 2020 04:39:33 +0000 (23:39 -0500)]
bcachefs: New varints

The previous varint implementation used by the inode code was not
nearly as fast as it could have been, partly because it attempted to
encode integers up to 96 bits (for timestamps), which meant that
encoding and decoding the length required a table lookup.

Instead, we'll just encode timestamps greater than 64 bits as two
separate varints; this will make decoding/encoding of inodes
significantly faster overall.
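
One way to get a table-free length is to store it in the low bits of the
first byte. The sketch below assumes such a scheme (the count of
trailing one bits gives the length, with 9 bytes for a full 64-bit
value); it is an illustration written for this note and may differ from
the format the patch actually uses.

```c
#include <stdint.h>

/* Encoded size: 7 payload bits per byte for 1..8 bytes, otherwise 9. */
static unsigned varint_len(uint64_t v)
{
	unsigned bits = 1;

	while (bits < 64 && (v >> bits))
		bits++;

	unsigned bytes = (bits + 6) / 7;
	return bytes < 9 ? bytes : 9;
}

static unsigned varint_encode(uint8_t *out, uint64_t v)
{
	unsigned bytes = varint_len(v);

	if (bytes < 9) {
		/* Low bits: (bytes - 1) ones, then a zero, then the payload. */
		v = (v << bytes) | ((UINT64_C(1) << (bytes - 1)) - 1);
		for (unsigned i = 0; i < bytes; i++)
			out[i] = (uint8_t)(v >> (8 * i));
	} else {
		out[0] = 0xff;	/* eight ones: raw 64-bit value follows */
		for (unsigned i = 0; i < 8; i++)
			out[1 + i] = (uint8_t)(v >> (8 * i));
	}
	return bytes;
}

static unsigned varint_decode(const uint8_t *in, uint64_t *out)
{
	unsigned bytes = 1;
	uint64_t v = 0;

	/* The length is the number of trailing one bits, plus one. */
	while (bytes < 9 && (in[0] & (1u << (bytes - 1))))
		bytes++;

	if (bytes < 9) {
		for (unsigned i = 0; i < bytes; i++)
			v |= (uint64_t)in[i] << (8 * i);
		v >>= bytes;
	} else {
		for (unsigned i = 0; i < 8; i++)
			v |= (uint64_t)in[1 + i] << (8 * i);
	}
	*out = v;
	return bytes;
}
```

Under this scheme a timestamp wider than 64 bits would simply be split
into two integers and passed through varint_encode() twice, one after
the other in the output buffer.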

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix build warning when CONFIG_BCACHEFS_DEBUG=n
Kent Overstreet [Sat, 7 Nov 2020 17:43:48 +0000 (12:43 -0500)]
bcachefs: Fix build warning when CONFIG_BCACHEFS_DEBUG=n

This function is only used by debug code, but we'd like to always build
it so we know that it does build.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Drop typechecking from bkey_cmp_packed()
Kent Overstreet [Sat, 7 Nov 2020 17:31:20 +0000 (12:31 -0500)]
bcachefs: Drop typechecking from bkey_cmp_packed()

This only did anything in two places, and those can just be replaced
with bkey_cmp_left_packed().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: More inlining in the btree key cache code
Kent Overstreet [Fri, 6 Nov 2020 06:34:41 +0000 (01:34 -0500)]
bcachefs: More inlining in the btree key cache code

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix spurious transaction restarts
Kent Overstreet [Fri, 6 Nov 2020 01:49:08 +0000 (20:49 -0500)]
bcachefs: Fix spurious transaction restarts

The checks for lock ordering violations weren't quite right.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Add a single slot percpu buf for btree iters
Kent Overstreet [Fri, 6 Nov 2020 01:02:01 +0000 (20:02 -0500)]
bcachefs: Add a single slot percpu buf for btree iters

Allocating our array of btree iters is a big enough allocation that it
hits the buddy allocator, and we're seeing lots of lock contention.
Sticking a single element buffer in front of it should help.
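
A userspace sketch of a single-slot buffer in front of an allocator.
It uses one global slot with C11 atomics rather than a percpu variable,
and malloc() stands in for the kernel allocator; ITERS_BYTES,
iters_alloc() and iters_free() are names assumed for this example.

```c
#include <stdatomic.h>
#include <stdlib.h>

#define ITERS_BYTES 4096	/* assumed size of the iterator array */

/* One cached allocation, or NULL if the slot is empty. */
static _Atomic(void *) iters_slot;

static void *iters_alloc(void)
{
	/* Take whatever is in the slot; fall back to the allocator. */
	void *p = atomic_exchange(&iters_slot, NULL);

	return p ? p : malloc(ITERS_BYTES);
}

static void iters_free(void *p)
{
	void *expected = NULL;

	/* Park the buffer in the slot if it is empty, else really free it. */
	if (!atomic_compare_exchange_strong(&iters_slot, &expected, p))
		free(p);
}
```

In the common alloc/free/alloc pattern the second allocation is served
from the slot and never reaches the underlying allocator or its locks.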

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Use attach_page_private and detach_page_private
Matthew Wilcox (Oracle) [Thu, 5 Nov 2020 15:58:38 +0000 (15:58 +0000)]
bcachefs: Use attach_page_private and detach_page_private

These recently added helpers simplify the code.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Remove page_state_init_for_read
Matthew Wilcox (Oracle) [Thu, 5 Nov 2020 15:58:37 +0000 (15:58 +0000)]
bcachefs: Remove page_state_init_for_read

This is dead code; delete the function.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Build fixes for 32bit x86
Kent Overstreet [Thu, 5 Nov 2020 17:16:05 +0000 (12:16 -0500)]
bcachefs: Build fixes for 32bit x86

PAGE_SIZE and size_t are not unsigned longs on 32 bit, annoying...

Also, switch to atomic64_cmpxchg() instead of cmpxchg() for
journal_seq_copy, as atomic64_cmpxchg() has a fallback that uses
spinlocks for when it's not supported.
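
A userspace analogue of a compare-and-exchange loop on a 64 bit
sequence number, written with C11 atomics instead of the kernel's
atomic64_t. It assumes the copy only ever moves the destination
forward; seq_copy_max() is a hypothetical name, not the kernel function.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Raise *dst to at least src; never lowers it. */
static void seq_copy_max(_Atomic uint64_t *dst, uint64_t src)
{
	uint64_t old = atomic_load(dst);

	/*
	 * On failure the exchange reloads 'old' with the current value,
	 * so the loop re-tests old < src before trying again.
	 */
	while (old < src &&
	       !atomic_compare_exchange_weak(dst, &old, src))
		;
}
```

The loop needs a 64 bit compare-and-exchange, which is the operation
that requires a lock-based fallback on 32 bit machines lacking a native
instruction for it.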

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improved inode create optimization
Kent Overstreet [Tue, 3 Nov 2020 04:51:33 +0000 (23:51 -0500)]
bcachefs: Improved inode create optimization

This shards new inodes into different btree nodes by using the processor
ID for the high bits of the new inode number. Much faster than the
previous inode create optimization - this also helps with sharding in
the other btrees that index by inode number.
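
Putting the processor ID in the high bits can be sketched like this.
The bit widths (32 bit inode numbers, 6 shard bits) and the name
make_inum() are assumptions for illustration; the patch may choose them
differently.

```c
#include <stdint.h>

#define INUM_BITS  32	/* assumed inode number width */
#define SHARD_BITS 6	/* assumed: up to 64 shards */
#define SHARD_SHIFT (INUM_BITS - SHARD_BITS)

/*
 * Build an inode number whose high bits identify the creating cpu and
 * whose low bits come from a per-shard counter. The counter wraps
 * within its shard rather than spilling into the shard bits.
 */
static uint64_t make_inum(unsigned cpu, uint64_t counter)
{
	uint64_t shard = cpu & ((1u << SHARD_BITS) - 1);
	uint64_t low   = counter & ((UINT64_C(1) << SHARD_SHIFT) - 1);

	return (shard << SHARD_SHIFT) | low;
}
```

Because keys are sorted by inode number, creates from different cpus
fall into widely separated ranges and so tend to hit different btree
nodes instead of contending on one.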

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Report inode counts via statfs
Kent Overstreet [Tue, 3 Nov 2020 00:49:23 +0000 (19:49 -0500)]
bcachefs: Report inode counts via statfs

Took a while to figure out exactly what statfs wanted...

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: add const annotations to bset.c
Kent Overstreet [Tue, 3 Nov 2020 00:15:18 +0000 (19:15 -0500)]
bcachefs: add const annotations to bset.c

Perhaps a bit silly, but some debug assertions we want to add need const
propagated a bit more.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Don't embed btree iters in btree_trans
Kent Overstreet [Mon, 2 Nov 2020 23:54:33 +0000 (18:54 -0500)]
bcachefs: Don't embed btree iters in btree_trans

These haven't been in use since reallocating iterators was disabled,
and getting rid of them saves us a lot of stack.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Split out debug_check_btree_accounting
Kent Overstreet [Mon, 2 Nov 2020 23:36:08 +0000 (18:36 -0500)]
bcachefs: Split out debug_check_btree_accounting

This check is very expensive.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Drop sysfs interface to debug parameters
Kent Overstreet [Mon, 2 Nov 2020 23:20:44 +0000 (18:20 -0500)]
bcachefs: Drop sysfs interface to debug parameters

It's not used much anymore; the module parameter interface is better.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Minor journal reclaim improvement
Kent Overstreet [Mon, 2 Nov 2020 22:51:38 +0000 (17:51 -0500)]
bcachefs: Minor journal reclaim improvement

With the btree key cache code, journal reclaim now has a lot more work
to do. It could be the case that after journal reclaim has finished one
iteration there's already more work to do, so put it in a loop to check
for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Inode create optimization
Kent Overstreet [Tue, 27 Oct 2020 22:56:21 +0000 (18:56 -0400)]
bcachefs: Inode create optimization

On workloads that do a lot of multithreaded creates all at once, lock
contention on the inodes btree turns out to still be an issue.

This patch adds a small buffer of inode numbers that are known to be
free, so that we can avoid touching the btree on every create. Also,
this changes inode creates to update via the btree key cache for the
initial create.
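
A small buffer of known-free inode numbers might look like the sketch
below. The buffer size, the struct layout and inum_in_use() (a stand-in
for the expensive btree lookup that here pretends every even number is
taken) are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INUM_BUF_NR 8	/* assumed buffer size */

struct inum_buf {
	uint64_t free[INUM_BUF_NR];	/* inode numbers known to be free */
	size_t   nr;			/* how many entries are valid */
	uint64_t next_scan;		/* where the next refill scan starts */
	unsigned refills;		/* times we had to go to the "btree" */
};

/* Stand-in for a btree lookup: pretend every even number is in use. */
static bool inum_in_use(uint64_t inum)
{
	return (inum & 1) == 0;
}

/* The slow path: scan forward until the buffer is full again. */
static void inum_buf_refill(struct inum_buf *b)
{
	b->refills++;
	while (b->nr < INUM_BUF_NR) {
		if (!inum_in_use(b->next_scan))
			b->free[b->nr++] = b->next_scan;
		b->next_scan++;
	}
}

/* The fast path: pop from the buffer, refilling only when it is empty. */
static uint64_t inum_alloc(struct inum_buf *b)
{
	if (!b->nr)
		inum_buf_refill(b);
	return b->free[--b->nr];
}
```

With a buffer of 8, only one create in eight pays for the scan; the
rest are served without touching the shared structure at all.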

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improve check for when bios are physically contiguous
Kent Overstreet [Fri, 30 Oct 2020 21:29:38 +0000 (17:29 -0400)]
bcachefs: Improve check for when bios are physically contiguous

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>