linux-2.6-block.git
11 months ago bcachefs: Fix flushing held btree writes when there's a fs error
Kent Overstreet [Sat, 12 Oct 2019 20:44:44 +0000 (16:44 -0400)]
bcachefs: Fix flushing held btree writes when there's a fs error

Previously, we'd go into an infinite loop.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix iterator counting for reflink pointers (again)
Kent Overstreet [Sat, 12 Oct 2019 18:44:09 +0000 (14:44 -0400)]
bcachefs: Fix iterator counting for reflink pointers (again)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix a debug assertion
Kent Overstreet [Sat, 12 Oct 2019 18:13:45 +0000 (14:13 -0400)]
bcachefs: Fix a debug assertion

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Switch to .iterate_shared for readdir
Kent Overstreet [Fri, 11 Oct 2019 19:14:36 +0000 (15:14 -0400)]
bcachefs: Switch to .iterate_shared for readdir

We definitely don't need an exclusive inode lock for readdir.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix creation of lost+found
Kent Overstreet [Fri, 11 Oct 2019 19:03:32 +0000 (15:03 -0400)]
bcachefs: Fix creation of lost+found

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix a subtle race in the btree split path
Kent Overstreet [Fri, 11 Oct 2019 18:45:22 +0000 (14:45 -0400)]
bcachefs: Fix a subtle race in the btree split path

We have to free the old (in memory) btree node _before_ unlocking the
new nodes - else, some other thread with a read lock on the old node
could see stale data after another thread has already updated the new
node.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Kill bchfs_extent_update()
Kent Overstreet [Wed, 9 Oct 2019 16:50:39 +0000 (12:50 -0400)]
bcachefs: Kill bchfs_extent_update()

The generic IO path now handles inode updates for i_size and i_sectors -
this means we can drop a fair amount of code from fs-io.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Convert bch2_fpunch to bch2_extent_update()
Kent Overstreet [Thu, 10 Oct 2019 16:47:22 +0000 (12:47 -0400)]
bcachefs: Convert bch2_fpunch to bch2_extent_update()

As before - we're moving non-Linux-specific code out of fs-io.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Split out bchfs_extent_update()
Kent Overstreet [Wed, 9 Oct 2019 16:11:00 +0000 (12:11 -0400)]
bcachefs: Split out bchfs_extent_update()

The next few patches will move more of the logic around
i_size/i_sectors updates to io.c and better separate the Linux VFS
specific code from core bcachefs code, to better support the fuse port.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Kill some dependencies on ei_inode
Kent Overstreet [Wed, 9 Oct 2019 15:12:48 +0000 (11:12 -0400)]
bcachefs: Kill some dependencies on ei_inode

Moving bch2_extent_update() to io.c will be greatly simplified if we
no longer have to keep ei_inode.bi_size/bi_sectors up to date.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Check if extending inode differently
Kent Overstreet [Wed, 9 Oct 2019 13:44:36 +0000 (09:44 -0400)]
bcachefs: Check if extending inode differently

In bch2_extent_update(), we have to update the inode if i_size is
changing (the file is being extended) or if i_sectors is changing, but we
want to avoid touching the inode if it's not necessary.

Change sum_sector_overwrites() to also check if there's already data
above where we're writing to - this means we're definitely not extending
the file.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
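
The "data above where we're writing" check in the commit above can be illustrated with a minimal userspace sketch. It assumes extents are half-open sector ranges; struct toy_extent and write_may_extend() are hypothetical names for illustration, not bcachefs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent covering sectors [start, end). */
struct toy_extent {
	uint64_t start;
	uint64_t end;
};

/*
 * Returns true if a write ending at write_end might extend the file,
 * i.e. no existing extent holds data above write_end.
 */
static bool write_may_extend(const struct toy_extent *extents, size_t nr,
			     uint64_t write_end)
{
	assert(extents != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++)
		if (extents[i].end > write_end)
			return false;	/* data already exists above the write */
	return true;
}
```

When the function returns false the inode's size cannot be changing, so under this model the inode update for i_size could be skipped.
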
11 months ago bcachefs: Fix bch2_btree_iter_next() after peek_slot()
Kent Overstreet [Wed, 9 Oct 2019 14:25:32 +0000 (10:25 -0400)]
bcachefs: Fix bch2_btree_iter_next() after peek_slot()

This deserves a unit test.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Refactor bch2_readdir() a bit
Kent Overstreet [Wed, 9 Oct 2019 13:23:30 +0000 (09:23 -0400)]
bcachefs: Refactor bch2_readdir() a bit

The tweaks to ctx->pos handling are also to help the fuse port

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add a lock to bch_page_state
Kent Overstreet [Wed, 9 Oct 2019 13:19:06 +0000 (09:19 -0400)]
bcachefs: Add a lock to bch_page_state

We can't use the page lock to protect it, because on writeback IO error
we need to access the page state before calling end_page_writeback() and
the page lock semantics are completely insane so that deadlocks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix erasure coding disk space accounting
Kent Overstreet [Mon, 7 Oct 2019 19:57:47 +0000 (15:57 -0400)]
bcachefs: Fix erasure coding disk space accounting

Disk space accounting for erasure coding + compression was completely
broken - we need to calculate the parity sectors delta the same way we
calculate disk_sectors, by calculating the old and new usage and
subtracting to get the difference.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix ec_stripes_read()
Kent Overstreet [Wed, 9 Oct 2019 02:56:33 +0000 (22:56 -0400)]
bcachefs: Fix ec_stripes_read()

The bkey_s_c returned by btree_iter_(peek|next) points into the btree
iter type, so advancing the iterator and then using the one previously
returned is a bug...

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
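
The class of bug described in the commit above (a returned key that points into storage owned by the iterator) can be shown with a toy iterator. This is only a sketch: struct toy_iter, toy_iter_peek(), toy_iter_next() and sum_first_two() are hypothetical and much simpler than the real btree iterator.

```c
#include <assert.h>
#include <stddef.h>

struct toy_key {
	int pos;
	int val;
};

/* Hypothetical iterator: peek/next return a pointer to iter->k,
 * which is overwritten every time the iterator advances. */
struct toy_iter {
	const struct toy_key *keys;
	size_t nr;
	size_t idx;
	struct toy_key k;	/* storage the returned pointer refers to */
};

static const struct toy_key *toy_iter_peek(struct toy_iter *iter)
{
	assert(iter != NULL);

	if (iter->idx >= iter->nr)
		return NULL;
	iter->k = iter->keys[iter->idx];
	return &iter->k;
}

static const struct toy_key *toy_iter_next(struct toy_iter *iter)
{
	assert(iter != NULL);

	if (iter->idx < iter->nr)
		iter->idx++;
	return toy_iter_peek(iter);
}

/* Safe pattern: copy the key out before advancing the iterator. */
static int sum_first_two(struct toy_iter *iter)
{
	const struct toy_key *k = toy_iter_peek(iter);
	struct toy_key first;
	const struct toy_key *second;

	if (!k)
		return 0;
	first = *k;			/* copy, do not keep the pointer */
	second = toy_iter_next(iter);
	return first.val + (second ? second->val : 0);
}
```

A pointer saved from toy_iter_peek() and read after toy_iter_next() silently shows the next key, which is the mistake the commit message warns about.
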
11 months ago bcachefs: Limit pointers to being in only one stripe
Kent Overstreet [Tue, 8 Oct 2019 22:45:29 +0000 (18:45 -0400)]
bcachefs: Limit pointers to being in only one stripe

This makes the disk accounting code saner, and it's not clear why we'd
ever want the same data to be in multiple stripes simultaneously.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix bch2_extent_ptr_durability()
Kent Overstreet [Mon, 7 Oct 2019 20:22:35 +0000 (16:22 -0400)]
bcachefs: Fix bch2_extent_ptr_durability()

We were looking up the wrong entry in the stripes radix tree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix bch2_mark_extent()
Kent Overstreet [Wed, 9 Oct 2019 01:33:56 +0000 (21:33 -0400)]
bcachefs: Fix bch2_mark_extent()

If an extent only contained cached or erasure coded pointers, there
won't be any devices in the normal dirty replicas list or an entry to
update.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Initialize journal pad data in bch_replica_entry objects.
Justin Husted [Wed, 9 Oct 2019 02:17:06 +0000 (19:17 -0700)]
bcachefs: Initialize journal pad data in bch_replica_entry objects.

Running the filesystem under valgrind exposed some garbage data being
written to disk in bch2_journal_super_entries_add_common(), in the
portion which encodes bch_replica_entry objects.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix uninitialized data in bch2_gc_btree()
Justin Husted [Wed, 9 Oct 2019 02:16:28 +0000 (19:16 -0700)]
bcachefs: Fix uninitialized data in bch2_gc_btree()

Running the filesystem under valgrind exposed a path where the max_stale
variable in bch2_gc_btree() might not be initialized before use in a
rare case when there are no btree nodes in a transaction.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix incorrect use of bch2_extent_atomic_end()
Kent Overstreet [Mon, 7 Oct 2019 19:09:30 +0000 (15:09 -0400)]
bcachefs: Fix incorrect use of bch2_extent_atomic_end()

bch2_extent_atomic_end counts the number of iterators required for
marking overwrites - but journal replay never marks overwrites, so that
part was incorrect. And counting iterators for the key being inserted
should be unnecessary because we did that prior to the key being
inserted before it was first journalled.

This should fix an iterator overflow bug - the iterators for walking
overwrites were totally unneeded.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Can't be holding read locks while taking write locks
Kent Overstreet [Sat, 5 Oct 2019 00:40:47 +0000 (20:40 -0400)]
bcachefs: Can't be holding read locks while taking write locks

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Don't allocate memory under mark_lock
Kent Overstreet [Fri, 4 Oct 2019 23:14:43 +0000 (19:14 -0400)]
bcachefs: Don't allocate memory under mark_lock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: bch2_extent_atomic_end() now traverses iter
Kent Overstreet [Fri, 4 Oct 2019 21:07:20 +0000 (17:07 -0400)]
bcachefs: bch2_extent_atomic_end() now traverses iter

This fixes a bug in io.c bch2_write_index_default() - it was missing the
traverse call, but bch2_extent_atomic_end returns an error now and can
just call it itself.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Factor out fs-common.c
Kent Overstreet [Wed, 2 Oct 2019 22:35:36 +0000 (18:35 -0400)]
bcachefs: Factor out fs-common.c

This refactoring makes the code easier to understand by separating the
bcachefs btree transactional code from the linux VFS code - but more
importantly, it's also to share code with the fuse port.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Don't use sha256 for siphash str hash key
Kent Overstreet [Fri, 4 Oct 2019 19:58:43 +0000 (15:58 -0400)]
bcachefs: Don't use sha256 for siphash str hash key

With the refactoring that's coming to add fuse support, we want
bch2_hash_info_init() to be cheaper so we don't have to rely on anything
cached besides the inode in the btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Only look up inode io opts in extents btree
Kent Overstreet [Fri, 4 Oct 2019 18:39:38 +0000 (14:39 -0400)]
bcachefs: Only look up inode io opts in extents btree

We currently don't have a way to propagate inode io opts to indirect
extents. This is a problem...

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix deref of error pointer
Kent Overstreet [Fri, 4 Oct 2019 18:38:41 +0000 (14:38 -0400)]
bcachefs: Fix deref of error pointer

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: bch2_inode_peek()/bch2_inode_write()
Kent Overstreet [Tue, 1 Oct 2019 20:51:57 +0000 (16:51 -0400)]
bcachefs: bch2_inode_peek()/bch2_inode_write()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix undefined behaviour
Kent Overstreet [Wed, 2 Oct 2019 13:14:32 +0000 (09:14 -0400)]
bcachefs: Fix undefined behaviour

roundup_pow_of_two(0) is undefined

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
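
One way to avoid the undefined case mentioned above is to define the result for 0 explicitly before rounding. The helper below is a hypothetical userspace sketch, not the kernel macro and not necessarily what this commit does; returning 1 for an input of 0 is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace helper: round n up to the next power of two,
 * with the result for n == 0 defined explicitly.
 */
static uint64_t roundup_pow2_or_one(uint64_t n)
{
	uint64_t r = 1;

	/* Results above 2^63 are not representable in 64 bits. */
	assert(n <= (UINT64_C(1) << 63));

	if (n == 0)
		return 1;	/* assumption: callers want a minimum of 1 */
	while (r < n)
		r <<= 1;
	return r;
}
```
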
11 months ago bcachefs: Fix an error path
Kent Overstreet [Wed, 2 Oct 2019 04:29:37 +0000 (00:29 -0400)]
bcachefs: Fix an error path

It's possible to get -EIO in __btree_iter_traverse_all() after looping,
with orig_iter NULL.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix __bch2_buffered_write() returning -ENOMEM
Kent Overstreet [Tue, 1 Oct 2019 22:51:10 +0000 (18:51 -0400)]
bcachefs: Fix __bch2_buffered_write() returning -ENOMEM

When grab_cache_page_write_begin() fails but we did pin some pages, we
shouldn't return -ENOMEM, we should do a partial write.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
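
The rule stated in the commit above (fail only if nothing was pinned, otherwise do a short write) can be sketched as follows. buffered_write_result() is a hypothetical helper that assumes a page-aligned write; the real function also has to deal with offsets within the first and last page.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical sketch: pages_wanted pages were requested but only
 * pages_pinned could be grabbed. Return the number of bytes that can
 * still be written, or -ENOMEM only if no page was pinned at all.
 */
static long buffered_write_result(size_t pages_pinned, size_t pages_wanted,
				  size_t page_size)
{
	assert(pages_pinned <= pages_wanted);

	if (pages_wanted && !pages_pinned)
		return -ENOMEM;		/* nothing pinned: a real failure */

	/* Some pages pinned: do a short (partial) write instead of failing. */
	return (long)(pages_pinned * page_size);
}
```

A caller that receives a short count is expected to retry the remainder, which is the usual contract for write().
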
11 months ago bcachefs: Trust inode in btree over bch_inode_info
Kent Overstreet [Thu, 26 Sep 2019 03:11:41 +0000 (23:11 -0400)]
bcachefs: Trust inode in btree over bch_inode_info

This is the start of some refactoring work to make less code depend on
the linux VFS - here the inode cache - to make e.g. the fuse port
easier.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix counting iterators for reflink pointers
Kent Overstreet [Tue, 1 Oct 2019 20:29:17 +0000 (16:29 -0400)]
bcachefs: Fix counting iterators for reflink pointers

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Rework btree iterator lifetimes
Kent Overstreet [Fri, 27 Sep 2019 02:21:39 +0000 (22:21 -0400)]
bcachefs: Rework btree iterator lifetimes

The btree_trans struct needs to memoize/cache btree iterators, so that
on transaction restart we don't have to completely redo btree lookups,
and so that we can do them all at once in the correct order when the
transaction had to restart to avoid a deadlock.

This switches the btree iterator lookups to work based on iterator
position, instead of trying to match them up based on the stack trace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
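
Looking up memoized iterators by position, as described in the commit above, might look roughly like the sketch below. Everything here is hypothetical (struct toy_trans, toy_trans_get_iter(), the fixed slot count, and matching on a btree id in addition to the position); it only illustrates reusing an iterator that already sits at the requested position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ITERS 8

struct toy_iter_slot {
	bool     live;
	unsigned btree_id;
	uint64_t pos;
};

struct toy_trans {
	struct toy_iter_slot iters[TOY_MAX_ITERS];
};

/*
 * Return the slot index of an iterator at (btree_id, pos), reusing an
 * existing one if present, else claiming the first free slot.
 * Returns -1 if every slot is in use.
 */
static int toy_trans_get_iter(struct toy_trans *trans,
			      unsigned btree_id, uint64_t pos)
{
	int free_slot = -1;

	assert(trans != NULL);

	for (int i = 0; i < TOY_MAX_ITERS; i++) {
		struct toy_iter_slot *s = &trans->iters[i];

		if (s->live && s->btree_id == btree_id && s->pos == pos)
			return i;	/* reuse the memoized iterator */
		if (!s->live && free_slot < 0)
			free_slot = i;
	}
	if (free_slot >= 0) {
		trans->iters[free_slot].live = true;
		trans->iters[free_slot].btree_id = btree_id;
		trans->iters[free_slot].pos = pos;
	}
	return free_slot;
}
```

Because the match depends only on where the iterator points, a restarted transaction that asks for the same positions gets the same slots back regardless of which call site asks.
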
11 months ago bcachefs: Kill deferred btree updates
Kent Overstreet [Sun, 22 Sep 2019 22:49:16 +0000 (18:49 -0400)]
bcachefs: Kill deferred btree updates

Will be replaced by cached btree iterators

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix for partial buffered writes
Kent Overstreet [Thu, 26 Sep 2019 23:09:08 +0000 (19:09 -0400)]
bcachefs: Fix for partial buffered writes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: BTREE_ITER_SLOTS isn't a type of btree iter
Kent Overstreet [Sun, 22 Sep 2019 23:35:12 +0000 (19:35 -0400)]
bcachefs: BTREE_ITER_SLOTS isn't a type of btree iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Improve error handling for for_each_btree_key_continue()
Kent Overstreet [Wed, 25 Sep 2019 19:57:56 +0000 (15:57 -0400)]
bcachefs: Improve error handling for for_each_btree_key_continue()

Change it to match for_each_btree_key()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Cleanup i_nlink handling
Kent Overstreet [Wed, 25 Sep 2019 20:19:52 +0000 (16:19 -0400)]
bcachefs: Cleanup i_nlink handling

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Trivial cleanup
Kent Overstreet [Wed, 25 Sep 2019 19:26:14 +0000 (15:26 -0400)]
bcachefs: Trivial cleanup

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Convert a BUG_ON() to a warning
Kent Overstreet [Tue, 24 Sep 2019 17:33:11 +0000 (13:33 -0400)]
bcachefs: Convert a BUG_ON() to a warning

We shouldn't ever be writing past i_size - but, apparently there's still
a bug to track down.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Count iterators for reflink_p overwrites correctly
Kent Overstreet [Sun, 22 Sep 2019 21:48:25 +0000 (17:48 -0400)]
bcachefs: Count iterators for reflink_p overwrites correctly

In order to avoid trying to allocate too many btree iterators,
bch2_extent_atomic_end() needs to count how many iterators are going to
be needed for insertions and overwrites - but we weren't counting the
iterators for deleting a reflink_v when the refcount goes to 0.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Drop unnecessary rcu_read_lock()
Kent Overstreet [Sat, 21 Sep 2019 20:30:15 +0000 (16:30 -0400)]
bcachefs: Drop unnecessary rcu_read_lock()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Update path microoptimizations
Kent Overstreet [Sat, 21 Sep 2019 19:29:34 +0000 (15:29 -0400)]
bcachefs: Update path microoptimizations

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Handle bio_iov_iter_get_pages() returning unaligned bio
Kent Overstreet [Sun, 22 Sep 2019 19:02:05 +0000 (15:02 -0400)]
bcachefs: Handle bio_iov_iter_get_pages() returning unaligned bio

If the user buffer isn't aligned to the filesystem block size, on a
large enough IO - where it won't fit into a single bio -
bio_iov_iter_get_pages() won't necessarily return a bio with the proper
alignment.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
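
The commit message above does not say how the unaligned bio is handled. One plausible approach, shown here purely as an assumption, is to trim the IO down to a multiple of the block size and hand the remainder back for the next iteration; trim_to_block_size() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: shrink *len to a multiple of block_size (which
 * must be a power of two) and return how many bytes were trimmed, so
 * the caller can submit them in a later IO.
 */
static size_t trim_to_block_size(size_t *len, size_t block_size)
{
	size_t unaligned;

	assert(len != NULL);
	assert(block_size && !(block_size & (block_size - 1)));

	unaligned = *len & (block_size - 1);
	*len -= unaligned;
	return unaligned;
}
```
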
11 months ago bcachefs: Drop unused arg to bch2_open_buckets_stop_dev()
Kent Overstreet [Fri, 20 Sep 2019 20:17:46 +0000 (16:17 -0400)]
bcachefs: Drop unused arg to bch2_open_buckets_stop_dev()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix validation of replicas entries
Kent Overstreet [Fri, 20 Sep 2019 18:28:35 +0000 (14:28 -0400)]
bcachefs: Fix validation of replicas entries

When an extent is erasure coded, we need to record a replicas entry to
indicate that data is present on the devices that extent has pointers to
- but nr_required should be 0, because it's erasure coded.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add support for FALLOC_FL_INSERT_RANGE
Kent Overstreet [Sat, 7 Sep 2019 22:04:23 +0000 (18:04 -0400)]
bcachefs: Add support for FALLOC_FL_INSERT_RANGE

Somewhat tricky and ugly, because iterating over extents backwards is a
pain.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: bch2_btree_iter_peek_prev()
Kent Overstreet [Sat, 7 Sep 2019 21:17:21 +0000 (17:17 -0400)]
bcachefs: bch2_btree_iter_peek_prev()

Last of the basic operations for iterating forwards and backwards over
the btree: we now have
 - peek(), returns key >= iter->pos
 - next(), returns key >  iter->pos
 - peek_prev(), returns key <= iter->pos
 - prev(), returns key < iter->pos

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
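
The four comparisons listed in the commit above can be tried out on a sorted array of integers. This is a stateless toy model with hypothetical names (toy_peek() and friends, TOY_NO_KEY); the real functions operate on a btree iterator and update its position.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NO_KEY (-1)	/* hypothetical sentinel: no matching key */

/* In all helpers, keys[] is sorted ascending and holds values >= 0. */

/* peek(): smallest key >= pos */
static int toy_peek(const int *keys, size_t nr, int pos)
{
	assert(keys != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++)
		if (keys[i] >= pos)
			return keys[i];
	return TOY_NO_KEY;
}

/* next(): smallest key > pos */
static int toy_next(const int *keys, size_t nr, int pos)
{
	assert(keys != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++)
		if (keys[i] > pos)
			return keys[i];
	return TOY_NO_KEY;
}

/* peek_prev(): largest key <= pos */
static int toy_peek_prev(const int *keys, size_t nr, int pos)
{
	assert(keys != NULL || nr == 0);
	for (size_t i = nr; i-- > 0; )
		if (keys[i] <= pos)
			return keys[i];
	return TOY_NO_KEY;
}

/* prev(): largest key < pos */
static int toy_prev(const int *keys, size_t nr, int pos)
{
	assert(keys != NULL || nr == 0);
	for (size_t i = nr; i-- > 0; )
		if (keys[i] < pos)
			return keys[i];
	return TOY_NO_KEY;
}
```

With keys {10, 20, 30} and a position of 20, the inclusive variants return 20 while next() and prev() return 30 and 10 respectively.
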
11 months ago bcachefs: Don't write past eof
Kent Overstreet [Thu, 19 Sep 2019 22:05:04 +0000 (18:05 -0400)]
bcachefs: Don't write past eof

When converting from PAGE_SIZE to block_size, the .mkwrite path was
missed

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Check for extents past eof correctly
Kent Overstreet [Thu, 19 Sep 2019 20:20:38 +0000 (16:20 -0400)]
bcachefs: Check for extents past eof correctly

bcachefs used to work mostly in terms of PAGE_SIZE, not block size at
the vfs level - but that has since been fixed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Debug assertion improvements
Kent Overstreet [Thu, 19 Sep 2019 20:07:41 +0000 (16:07 -0400)]
bcachefs: Debug assertion improvements

Call bch2_btree_iter_verify from bch2_btree_node_iter_fix(); also verify
in btree_iter_peek_uptodate() that iter->k matches what's in the btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add missing bch2_btree_node_iter_fix() call
Kent Overstreet [Thu, 19 Sep 2019 20:01:32 +0000 (16:01 -0400)]
bcachefs: Add missing bch2_btree_node_iter_fix() call

Any time we're modifying what's in the btree, iterators potentially have
to be updated - this one was exposed by the reflink code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Avoid deadlocking on the allocator
Kent Overstreet [Wed, 18 Sep 2019 23:33:12 +0000 (19:33 -0400)]
bcachefs: Avoid deadlocking on the allocator

The allocator needs to make sure there's buckets available on the
RESERVE_NONE freelist if at all possible - otherwise foreground IO will
get stuck.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: More btree iter improvements
Kent Overstreet [Sat, 7 Sep 2019 23:19:57 +0000 (19:19 -0400)]
bcachefs: More btree iter improvements

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Improve btree_iter_pos_in_node()
Kent Overstreet [Fri, 13 Sep 2019 18:50:02 +0000 (14:50 -0400)]
bcachefs: Improve btree_iter_pos_in_node()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Debug code improvements
Kent Overstreet [Sat, 14 Sep 2019 14:47:14 +0000 (10:47 -0400)]
bcachefs: Debug code improvements

.key_debugcheck no longer needs to take a pointer to the btree node

Also, try to make sure wherever we're inserting or modifying keys in the
btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add missing bch2_btree_node_iter_fix() calls
Kent Overstreet [Sat, 14 Sep 2019 14:45:46 +0000 (10:45 -0400)]
bcachefs: Add missing bch2_btree_node_iter_fix() calls

With multiple iterators, if another iterator points to the key being
modified, we need to call bch2_btree_node_iter_fix() to re-unpack the
key into the iter->k

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Optimize calls to bch2_btree_iter_traverse()
Kent Overstreet [Sun, 8 Sep 2019 18:00:12 +0000 (14:00 -0400)]
bcachefs: Optimize calls to bch2_btree_iter_traverse()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix a typo
Kent Overstreet [Fri, 13 Sep 2019 18:43:34 +0000 (14:43 -0400)]
bcachefs: Fix a typo

_iter, not iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Improved bch2_fcollapse()
Kent Overstreet [Mon, 22 Jul 2019 17:37:02 +0000 (13:37 -0400)]
bcachefs: Improved bch2_fcollapse()

Move extents instead of copying them - this way, we can iterate over
only live extents, not the entire keyspace. Also, this means we can
mostly skip running triggers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: __bch2_btree_node_iter_fix() improvements
Kent Overstreet [Sat, 7 Sep 2019 23:17:40 +0000 (19:17 -0400)]
bcachefs: __bch2_btree_node_iter_fix() improvements

Being more rigorous about noting when the key the iterator currently
points to has changed - which should also give us a nice performance
improvement due to not having to check if we have to skip other bsets
backwards as much.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Do updates in order they were queued up in
Kent Overstreet [Sat, 7 Sep 2019 18:16:00 +0000 (14:16 -0400)]
bcachefs: Do updates in order they were queued up in

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Drop trans arg to bch2_extent_atomic_end()
Kent Overstreet [Sat, 7 Sep 2019 22:03:56 +0000 (18:03 -0400)]
bcachefs: Drop trans arg to bch2_extent_atomic_end()

Just for consistency

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: data move path should not be trying to move reflink_p keys
Kent Overstreet [Sat, 7 Sep 2019 20:13:20 +0000 (16:13 -0400)]
bcachefs: data move path should not be trying to move reflink_p keys

This was spotted when the move_extent() path tried to allocate a bio for
a reflink_p extent, but adding pages to the bio failed because we
overflowed bi_max_vecs. Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix a null ptr deref
Kent Overstreet [Sat, 7 Sep 2019 17:16:41 +0000 (13:16 -0400)]
bcachefs: Fix a null ptr deref

rbio->c wasn't being initialized in the move path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Flush fsck errors when looping in btree gc
Kent Overstreet [Sat, 7 Sep 2019 16:42:27 +0000 (12:42 -0400)]
bcachefs: Flush fsck errors when looping in btree gc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Rebalance now adds replicas if needed
Kent Overstreet [Sat, 7 Sep 2019 16:39:59 +0000 (12:39 -0400)]
bcachefs: Rebalance now adds replicas if needed

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Kill BTREE_INSERT_NOMARK_INSERT
Kent Overstreet [Thu, 5 Sep 2019 17:37:50 +0000 (13:37 -0400)]
bcachefs: Kill BTREE_INSERT_NOMARK_INSERT

Was dead code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix BTREE_INSERT_NOMARK_OVERWRITES
Kent Overstreet [Thu, 29 Aug 2019 17:29:31 +0000 (13:29 -0400)]
bcachefs: Fix BTREE_INSERT_NOMARK_OVERWRITES

bch2_mark_update() was correct, but bch2_trans_mark_update() wasn't
respecting BTREE_INSERT_NOMARK_OVERWRITES - key marking/triggers really
need to be cleaned up.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Improve pointer marking checks and error messages
Kent Overstreet [Thu, 29 Aug 2019 15:34:01 +0000 (11:34 -0400)]
bcachefs: Improve pointer marking checks and error messages

Importantly, we don't want to use bch2_fs_inconsistent_on() for errors
that fsck can repair, because that will just put us in RO mode and
prevent fsck from actually fixing stuff. Probably want to get rid of it
in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Switch reconstruct_alloc to a mount option
Kent Overstreet [Wed, 28 Aug 2019 17:20:31 +0000 (13:20 -0400)]
bcachefs: Switch reconstruct_alloc to a mount option

Right now this is the only way of repairing bucket gens in the future

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix fiemap (again)
Kent Overstreet [Wed, 28 Aug 2019 16:41:45 +0000 (12:41 -0400)]
bcachefs: Fix fiemap (again)

When iterating over reflink pointers, we use the key we just emitted to
set the iterator position - which means we have to set the key's inode
field as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix error message on bucket overflow
Kent Overstreet [Wed, 28 Aug 2019 16:11:39 +0000 (12:11 -0400)]
bcachefs: Fix error message on bucket overflow

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Reflink pointers also have to be remarked if split in journal replay
Kent Overstreet [Wed, 28 Aug 2019 16:05:17 +0000 (12:05 -0400)]
bcachefs: Reflink pointers also have to be remarked if split in journal replay

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fixes for replicas tracking
Kent Overstreet [Thu, 22 Aug 2019 17:20:38 +0000 (13:20 -0400)]
bcachefs: Fixes for replicas tracking

The continue statement in bch2_trans_mark_extent() was wrong - by
bailing out early, we'd be constructing the wrong replicas list to
update. Also, the assertion in update_replicas() was wrong - due to
rounding with compressed extents, it is possible for sectors to be 0
sometimes.

Also, change extent_to_replicas() in replicas.c to match the replicas
list we construct in buckets.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
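
The rounding effect mentioned in the commit above can be reproduced with simple arithmetic. The sketch below assumes disk usage for a partially referenced compressed extent is charged pro rata and rounded up; that formula, and the names disk_sectors() and disk_sectors_delta(), are illustrative assumptions rather than the actual accounting code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: an extent of uncompressed_size sectors occupies
 * compressed_size sectors on disk, and a key references live_sectors
 * of the uncompressed data. Charge disk space pro rata, rounding up.
 */
static int64_t disk_sectors(int64_t live_sectors, int64_t compressed_size,
			    int64_t uncompressed_size)
{
	assert(uncompressed_size > 0);
	assert(live_sectors >= 0 && live_sectors <= uncompressed_size);

	return (live_sectors * compressed_size + uncompressed_size - 1) /
		uncompressed_size;
}

/* Delta = new usage - old usage; it may legitimately come out as 0. */
static int64_t disk_sectors_delta(int64_t old_live, int64_t new_live,
				  int64_t compressed_size,
				  int64_t uncompressed_size)
{
	return disk_sectors(new_live, compressed_size, uncompressed_size) -
	       disk_sectors(old_live, compressed_size, uncompressed_size);
}
```

For a 128-sector extent compressed to 8 sectors, trimming the live range from 8 sectors to 7 leaves the charged usage at 1 sector, so the delta is 0 even though the key changed. An assertion that the delta is never 0 would fire in that case.
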
11 months ago bcachefs: Refactor bch2_alloc_write()
Kent Overstreet [Tue, 27 Aug 2019 21:45:42 +0000 (17:45 -0400)]
bcachefs: Refactor bch2_alloc_write()

Major simplification - gets rid of the need for marking buckets as
dirty, instead we write buckets if the in memory mark is different from
what's in the btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
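
The compare-instead-of-dirty-flag idea from the commit above can be sketched in a few lines. struct toy_mark and toy_alloc_write() are hypothetical and far simpler than the real bucket mark and write path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified bucket mark. */
struct toy_mark {
	uint8_t  gen;
	uint16_t dirty_sectors;
	uint16_t cached_sectors;
};

static int toy_mark_equal(const struct toy_mark *a, const struct toy_mark *b)
{
	return a->gen == b->gen &&
	       a->dirty_sectors == b->dirty_sectors &&
	       a->cached_sectors == b->cached_sectors;
}

/*
 * Bring the btree copy in line with memory. No per-bucket dirty flag is
 * needed: a bucket is written only if the two copies differ.
 * Returns the number of buckets written.
 */
static size_t toy_alloc_write(const struct toy_mark *in_memory,
			      struct toy_mark *in_btree, size_t nr)
{
	size_t written = 0;

	assert((in_memory && in_btree) || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (toy_mark_equal(&in_memory[i], &in_btree[i]))
			continue;
		in_btree[i] = in_memory[i];
		written++;
	}
	return written;
}
```

Running the function twice in a row writes nothing the second time, which is the property a separate dirty flag was previously needed to provide.
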
11 months ago bcachefs: Trust in memory bucket mark
Kent Overstreet [Tue, 27 Aug 2019 21:34:03 +0000 (17:34 -0400)]
bcachefs: Trust in memory bucket mark

This fixes a bug in the journal replay -> extent_replay_key ->
split_compressed path, when we do an update that changes alloc info but
the alloc info in the btree isn't up to date yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Optimize fiemap
Kent Overstreet [Thu, 22 Aug 2019 20:12:28 +0000 (16:12 -0400)]
bcachefs: Optimize fiemap

Reflink caused fiemap performance to regress badly - this gets us back
to where we were.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add a hint for allocating new stripes
Kent Overstreet [Thu, 22 Aug 2019 21:09:16 +0000 (17:09 -0400)]
bcachefs: Add a hint for allocating new stripes

This way we aren't doing a full linear scan every time we create a new
stripe.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Inline some fast paths
Kent Overstreet [Thu, 22 Aug 2019 20:07:37 +0000 (16:07 -0400)]
bcachefs: Inline some fast paths

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Don't flush journal from bch2_vfs_write_inode()
Kent Overstreet [Thu, 22 Aug 2019 20:30:55 +0000 (16:30 -0400)]
bcachefs: Don't flush journal from bch2_vfs_write_inode()

It's only updating timestamps, so this doubly doesn't make sense. fsync
will flush the journal, if necessary.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix a spurious gcc warning
Kent Overstreet [Thu, 22 Aug 2019 20:34:59 +0000 (16:34 -0400)]
bcachefs: Fix a spurious gcc warning

*i is used as an output parameter, but gcc isn't noticing that. Oh well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Handle ec_buf not being page aligned when allocating bio
Kent Overstreet [Thu, 22 Aug 2019 20:41:50 +0000 (16:41 -0400)]
bcachefs: Handle ec_buf not being page aligned when allocating bio

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Update more code for KEY_TYPE_reflink_v
Kent Overstreet [Thu, 22 Aug 2019 20:23:10 +0000 (16:23 -0400)]
bcachefs: Update more code for KEY_TYPE_reflink_v

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Re-enable bkey_debugcheck() in the extent update path
Kent Overstreet [Thu, 22 Aug 2019 15:17:04 +0000 (11:17 -0400)]
bcachefs: Re-enable bkey_debugcheck() in the extent update path

Also, move other update path checks to where they actually check all the
updates (after triggers have run)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Check alignment in write path
Kent Overstreet [Thu, 22 Aug 2019 00:16:42 +0000 (20:16 -0400)]
bcachefs: Check alignment in write path

Also - fix alignment in bch2_set_page_dirty()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix faulty assertion
Kent Overstreet [Thu, 22 Aug 2019 03:52:10 +0000 (23:52 -0400)]
bcachefs: Fix faulty assertion

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix bch2_bkey_narrow_crcs()
Kent Overstreet [Wed, 21 Aug 2019 22:55:07 +0000 (18:55 -0400)]
bcachefs: Fix bch2_bkey_narrow_crcs()

We have to reinitialize ptrs whenever we do something that changes them.
Regression from when the code was converted to be generic across all
keys with pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix bch2_sort_repack_merge()
Kent Overstreet [Wed, 21 Aug 2019 22:35:15 +0000 (18:35 -0400)]
bcachefs: Fix bch2_sort_repack_merge()

bch2_bkey_normalize() modifies the value, and we were modifying the
original value in the src btree node - but, we're called without a write
lock held on the src node. Oops...

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Reflink
Kent Overstreet [Fri, 16 Aug 2019 13:59:56 +0000 (09:59 -0400)]
bcachefs: Reflink

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Refactor bch2_extent_trim_atomic() for reflink
Kent Overstreet [Fri, 16 Aug 2019 13:58:07 +0000 (09:58 -0400)]
bcachefs: Refactor bch2_extent_trim_atomic() for reflink

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Move node iterator fixup to extent_bset_insert()
Kent Overstreet [Tue, 20 Aug 2019 21:46:22 +0000 (17:46 -0400)]
bcachefs: Move node iterator fixup to extent_bset_insert()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix bch2_btree_node_iter_fix()
Kent Overstreet [Tue, 20 Aug 2019 21:43:47 +0000 (17:43 -0400)]
bcachefs: Fix bch2_btree_node_iter_fix()

bch2_btree_node_iter_prev_all() depends on an invariant that wasn't
being maintained for extent leaf nodes - specifically, the node iterator
may not have advanced past any keys that compare after the key the node
iterator points to.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix bch2_btree_node_iter_prev_filter()
Kent Overstreet [Mon, 19 Aug 2019 17:43:01 +0000 (13:43 -0400)]
bcachefs: Fix bch2_btree_node_iter_prev_filter()

bch2_btree_node_iter_prev_filter() tried to be smart about iterating
backwards when skipping over whiteouts/discards - but unfortunately,
doing so can leave the node iterator in an inconsistent state; the sane
solution is to just always iterate backwards one key at a time.

But we compact btree nodes when more than a quarter of the keys are
whiteouts/discards, so the optimization wasn't buying us that much
anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
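
"Iterate backwards one key at a time" while skipping whiteouts, as described in the commit above, reduces to a very small loop. The sketch below uses a flat array and hypothetical names (struct toy_bkey, toy_prev_live(), TOY_NONE); the real node iterator walks several sorted sets of keys at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_bkey {
	int  pos;
	bool whiteout;	/* deleted key that has not been compacted away yet */
};

#define TOY_NONE ((size_t)-1)	/* hypothetical sentinel: no live key found */

/*
 * Starting just before index idx, walk backwards one key at a time and
 * return the index of the first live key, or TOY_NONE.
 */
static size_t toy_prev_live(const struct toy_bkey *keys, size_t nr, size_t idx)
{
	assert(idx <= nr);

	while (idx > 0) {
		idx--;
		if (!keys[idx].whiteout)
			return idx;
	}
	return TOY_NONE;
}
```

The cost is one step per skipped whiteout, which stays small as long as nodes with many whiteouts are compacted.
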
11 months ago bcachefs: Fix __bch2_btree_iter_peek_slot_extents()
Kent Overstreet [Sat, 17 Aug 2019 19:54:48 +0000 (15:54 -0400)]
bcachefs: Fix __bch2_btree_iter_peek_slot_extents()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Improved debug checks
Kent Overstreet [Sat, 17 Aug 2019 19:17:09 +0000 (15:17 -0400)]
bcachefs: Improved debug checks

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Rework calling convention for marking overwrites
Kent Overstreet [Fri, 9 Aug 2019 17:01:10 +0000 (13:01 -0400)]
bcachefs: Rework calling convention for marking overwrites

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>