linux-block.git
11 months ago bcachefs: Use struct_size()
Christophe JAILLET [Sun, 1 Oct 2023 07:13:54 +0000 (09:13 +0200)]
bcachefs: Use struct_size()

Use struct_size() instead of hand writing it.
This is less verbose and more robust.

While at it, prepare for the coming implementation by GCC and Clang of the
__counted_by attribute. Flexible array members annotated with __counted_by
can have their accesses bounds-checked at run-time via
CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for
strcpy/memcpy-family functions).
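
As a rough userspace illustration of what the helper replaces: the hand-written form is typically "sizeof(*p) + n * sizeof(p->entries[0])", which can silently wrap. The sketch below is not the kernel macro; struct entry_list and both function names are hypothetical, and the saturate-to-SIZE_MAX behaviour is modelled with compiler builtins.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure ending in a flexible array member. */
struct entry_list {
	size_t   nr;
	uint64_t entries[];
};

/*
 * Stand-in for struct_size(l, entries, n): header size plus n elements,
 * saturating to SIZE_MAX instead of wrapping on overflow.
 */
static size_t entry_list_bytes(size_t n)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, sizeof(uint64_t), &bytes) ||
	    __builtin_add_overflow(bytes, sizeof(struct entry_list), &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Allocate a list with room for n entries; NULL if the size overflowed. */
static struct entry_list *entry_list_alloc(size_t n)
{
	size_t bytes = entry_list_bytes(n);
	struct entry_list *l;

	if (bytes == SIZE_MAX)
		return NULL;
	l = calloc(1, bytes);
	if (l)
		l->nr = n;
	return l;
}
```

Because an overflowing request saturates rather than wraps, the allocation fails cleanly instead of returning a buffer that is too small.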

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Correctly initialize new buckets on device resize
Kent Overstreet [Thu, 28 Sep 2023 21:57:21 +0000 (17:57 -0400)]
bcachefs: Correctly initialize new buckets on device resize

bch2_dev_resize() was never updated for the allocator rewrite with
persistent freelists, and it wasn't noticed because the tests weren't
running fsck - oops.

Fix this by running bch2_dev_freespace_init() for the new buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix another smatch complaint
Kent Overstreet [Thu, 28 Sep 2023 04:54:12 +0000 (00:54 -0400)]
bcachefs: Fix another smatch complaint

This should be harmless, but initialize last_seq anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Use strsep() in split_devs()
Kent Overstreet [Thu, 28 Sep 2023 04:50:27 +0000 (00:50 -0400)]
bcachefs: Use strsep() in split_devs()

Minor refactoring to fix a smatch complaint.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add iops fields to bch_member
Hunter Shaffer [Mon, 25 Sep 2023 04:46:28 +0000 (00:46 -0400)]
bcachefs: Add iops fields to bch_member

Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Rename bch_sb_field_members -> bch_sb_field_members_v1
Hunter Shaffer [Mon, 25 Sep 2023 04:06:32 +0000 (00:06 -0400)]
bcachefs: Rename bch_sb_field_members -> bch_sb_field_members_v1

Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: New superblock section members_v2
Hunter Shaffer [Mon, 25 Sep 2023 04:02:56 +0000 (00:02 -0400)]
bcachefs: New superblock section members_v2

members_v2 has dynamically resizable entries so that we can extend
bch_member. The members can no longer be accessed with simple array
indexing. Instead, members_v2_get is used to find a member's exact
location within the array and returns a copy of that member.
Alternatively, member_v2_get_mut retrieves a mutable pointer to a member.
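
A minimal sketch of why plain array indexing stops working once entries are resizable: the stride between entries is the on-disk entry size, which may differ from sizeof() of the in-memory struct. The struct layout, field names and the _sketch helpers below are assumptions for illustration, not the actual bcachefs definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory member; later versions append fields at the end. */
struct member {
	uint64_t nbuckets;
	uint32_t iops_read;    /* imagined later addition */
	uint32_t iops_write;   /* imagined later addition */
};

/* Mutable pointer to entry i: step by the on-disk size, not sizeof(struct member). */
static unsigned char *member_get_mut_sketch(unsigned char *data,
					    size_t member_bytes, size_t i)
{
	return data + i * member_bytes;
}

/* Copy of entry i; fields the on-disk entry is too short to hold read as zero. */
static struct member member_get_sketch(const unsigned char *data,
				       size_t member_bytes, size_t i)
{
	struct member m;
	size_t n = member_bytes < sizeof(m) ? member_bytes : sizeof(m);

	memset(&m, 0, sizeof(m));
	memcpy(&m, data + i * member_bytes, n);
	return m;
}
```

Returning a zero-filled copy means code built against a newer, larger struct can still read a section written with smaller entries.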

Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add new helper to retrieve bch_member from sb
Hunter Shaffer [Mon, 25 Sep 2023 03:55:37 +0000 (23:55 -0400)]
bcachefs: Add new helper to retrieve bch_member from sb

Prep work for introducing bch_sb_field_members_v2 - introduce new
helpers that use members_v2 if it exists, otherwise falling back to v1.

Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: bucket_lock() is now a sleepable lock
Kent Overstreet [Wed, 27 Sep 2023 23:51:29 +0000 (19:51 -0400)]
bcachefs: bucket_lock() is now a sleepable lock

fsck_err() may sleep - it takes a mutex and may allocate memory, so
bucket_lock() needs to be a sleepable lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: fix crc32c checksum merge byte order problem
Brian Foster [Wed, 27 Sep 2023 11:23:37 +0000 (07:23 -0400)]
bcachefs: fix crc32c checksum merge byte order problem

An fsstress task on a big endian system (s390x) quickly produces a
bunch of CRC errors in the system logs. Most of these are related to
the narrow CRCs path, but the fundamental problem can be reduced to
a single write and re-read (after dropping caches) of a previously
merged extent.

The key merge path that handles extent merges eventually calls into
bch2_checksum_merge() to combine the CRCs of the associated extents.
This code attempts to avoid a byte order swap by feeding the le64
values into the crc32c code, but the latter casts the resulting u64
value down to a u32, which truncates the high bytes where the actual
crc value ends up. This results in a CRC value that does not change
(since it is merged with a CRC of 0), and checksum failures ensue.

Fix the checksum merge code to swap to cpu byte order on the
boundaries to the external crc code such that any value casting is
handled properly.
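
The truncation can be reproduced on any host by reading the same eight bytes in both byte orders. This is only a sketch of the failure mode described above, not the code in bch2_checksum_merge(); the helper names are made up.

```c
#include <stdint.h>

/* Read 8 bytes the way a little-endian CPU would. */
static uint64_t load_le64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Read the same 8 bytes the way a big-endian CPU would, with no byte swap. */
static uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Store a 32-bit crc into a little-endian 64-bit on-disk field. */
static void store_crc_le64(unsigned char *p, uint32_t crc)
{
	for (int i = 0; i < 8; i++)
		p[i] = i < 4 ? (unsigned char)(crc >> (8 * i)) : 0;
}
```

For a field holding 0xdeadbeef, (uint32_t)load_be64(field) is 0 because the crc bytes land in the high half of the u64, while (uint32_t)load_le64(field) recovers 0xdeadbeef. Swapping to cpu byte order before the cast is what keeps the value intact.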

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix bch2_inode_delete_keys()
Kent Overstreet [Wed, 27 Sep 2023 18:44:56 +0000 (14:44 -0400)]
bcachefs: Fix bch2_inode_delete_keys()

bch2_inode_delete_keys() was using BTREE_ITER_NOT_EXTENTS, on the
assumption that it would never need to split extents.

But that caused a race with extents being split by other threads -
specifically, the data move path. Extents iterators have the iterator
position pointing to the start of the extent, which avoids the race.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Make btree root read errors recoverable
Kent Overstreet [Tue, 26 Sep 2023 21:21:21 +0000 (17:21 -0400)]
bcachefs: Make btree root read errors recoverable

The entire btree will be lost, but that is better than the entire
filesystem not being recoverable.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fall back to requesting passphrase directly
Kent Overstreet [Tue, 26 Sep 2023 21:20:39 +0000 (17:20 -0400)]
bcachefs: Fall back to requesting passphrase directly

We can only do this in userspace, unfortunately - but kernel keyrings
have never seemed to work reliably, so this is a useful fallback.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix looping around bch2_propagate_key_to_snapshot_leaves()
Kent Overstreet [Tue, 26 Sep 2023 21:11:23 +0000 (17:11 -0400)]
bcachefs: Fix looping around bch2_propagate_key_to_snapshot_leaves()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: bch_err_msg(), bch_err_fn() now filters out transaction restart errors
Kent Overstreet [Tue, 26 Sep 2023 20:02:06 +0000 (16:02 -0400)]
bcachefs: bch_err_msg(), bch_err_fn() now filters out transaction restart errors

These errors aren't actual errors, and should never be printed - do this
in the common helpers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Silence transaction restart error message
Kent Overstreet [Tue, 26 Sep 2023 05:39:25 +0000 (01:39 -0400)]
bcachefs: Silence transaction restart error message

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: More assertions for nocow locking
Kent Overstreet [Sun, 24 Sep 2023 20:25:06 +0000 (16:25 -0400)]
bcachefs: More assertions for nocow locking

 - assert in shutdown path that no nocow locks are held
 - check for overflow when taking nocow locks

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: nocow locking: Fix lock leak
Kent Overstreet [Mon, 25 Sep 2023 01:05:50 +0000 (21:05 -0400)]
bcachefs: nocow locking: Fix lock leak

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fixes for building in userspace
Kent Overstreet [Sat, 23 Sep 2023 23:07:16 +0000 (19:07 -0400)]
bcachefs: Fixes for building in userspace

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Ignore unknown mount options
Kent Overstreet [Sat, 23 Sep 2023 22:41:51 +0000 (18:41 -0400)]
bcachefs: Ignore unknown mount options

This makes mount option handling consistent with other filesystems -
options may be handled at different layers, so an option we don't know
about might not be intended for us.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Always check for invalid bkeys in main commit path
Kent Overstreet [Sat, 23 Sep 2023 21:45:03 +0000 (17:45 -0400)]
bcachefs: Always check for invalid bkeys in main commit path

Previously, we would check for invalid bkeys at transaction commit time,
but only if CONFIG_BCACHEFS_DEBUG=y.

This check is important enough to always be on - it appears there's been
corruption making it into the journal that would have been caught by it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Make sure to initialize equiv when creating new snapshots
Kent Overstreet [Sat, 23 Sep 2023 20:55:03 +0000 (16:55 -0400)]
bcachefs: Make sure to initialize equiv when creating new snapshots

Previously, equiv was set in the snapshot deletion path, which is where
it's needed - equiv, for snapshot ID equivalence classes, would ideally
be a private data structure to the snapshot deletion path.

But if a new snapshot is created while snapshot deletion is running,
move_key_to_correct_snapshot() moves a key to snapshot id 0 - oops.

Fixes: https://github.com/koverstreet/bcachefs/issues/593
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix a null ptr deref in bch2_get_alloc_in_memory_pos()
Kent Overstreet [Fri, 22 Sep 2023 18:19:52 +0000 (14:19 -0400)]
bcachefs: Fix a null ptr deref in bch2_get_alloc_in_memory_pos()

Reported-by: smatch
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix changing durability using sysfs
Torge Matthies [Thu, 21 Sep 2023 21:25:54 +0000 (23:25 +0200)]
bcachefs: Fix changing durability using sysfs

Signed-off-by: Torge Matthies <openglfreak@googlemail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: initial freeze/unfreeze support
Brian Foster [Fri, 15 Sep 2023 12:51:54 +0000 (08:51 -0400)]
bcachefs: initial freeze/unfreeze support

Initial support for the vfs superblock freeze and unfreeze
operations. Superblock freeze occurs in stages, where the vfs
attempts to quiesce high level write operations, page faults, fs
internal operations, and then finally calls into the filesystem for
any last stage steps (e.g. log flushing) before marking the
superblock frozen.

The majority of write paths are covered by freeze protection (i.e.
sb_start_write() and friends) in higher level common code, with the
exception of the fs-internal SB_FREEZE_FS stage (i.e.
sb_start_intwrite()). This typically maps to active filesystem
transactions in a manner that allows the vfs to implement a barrier
of internal fs operations during the freeze sequence. This is not a
viable model for bcachefs, however, because it utilizes transactions
both to populate the journal as well as to perform journal reclaim.
This means that mapping intwrite protection to transaction lifecycle
or transaction commit is likely to deadlock freeze, as quiescing the
journal requires transactional operations blocked by the final stage
of freeze.

The flipside of this is that bcachefs does already maintain its own
internal sets of write references for similar purposes, currently
utilized for transitions from read-write to read-only mode. Since
this largely mirrors the high level sequence involved with freeze,
we can simply invoke this mechanism in the freeze callback to fully
quiesce the filesystem in the final stage. This means that while the
SB_FREEZE_FS stage is essentially a no-op, the ->freeze_fs()
callback that immediately follows begins by performing effectively
the same step by quiescing all internal write references.

One caveat to this approach is that without integration of internal
freeze protection, write operations gated on internal write refs
will fail with an internal -EROFS error rather than block on
acquiring freeze protection. IOW, this is roughly equivalent to only
having support for sb_start_intwrite_trylock(), and not the blocking
variant. Many of these paths already use non-blocking internal write
refs and so would map into an sb_start_intwrite_trylock() anyways.
The only instance of this I've been able to uncover that doesn't
explicitly rely on a higher level non-blocking write ref is the
bch2_rbio_narrow_crcs() path, which updates crcs in certain read
cases and which, as Kent has pointed out, isn't critical if it happens
to fail due to read-only status.

Given that, implement basic freeze support as described above and
leave tighter integration with internal freeze protection as a
possible future enhancement. There are multiple potential ideas
worth exploring here. For example, we could implement a multi-stage
freeze callback that might allow bcachefs to quiesce its internal
write references without deadlocks, we could integrate intwrite
protection with bcachefs' internal write references somehow or
another, or perhaps consider implementing blocking support for
internal write refs to be used specifically for freeze, etc. In the
meantime, this enables functional freeze support and the associated
test coverage that comes with it.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: More minor smatch fixes
Kent Overstreet [Wed, 20 Sep 2023 05:32:20 +0000 (01:32 -0400)]
bcachefs: More minor smatch fixes

 - fix a few uninitialized return values
 - return a proper error code in lookup_lostfound()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Minor bch2_btree_node_get() smatch fixes
Kent Overstreet [Wed, 20 Sep 2023 05:31:00 +0000 (01:31 -0400)]
bcachefs: Minor bch2_btree_node_get() smatch fixes

 - it's no longer possible for trans to be NULL
 - also, move "wait for read to complete" to the slowpath,
   __bch2_btree_node_get().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: snapshots: Use kvfree_rcu_mightsleep()
Kent Overstreet [Wed, 20 Sep 2023 05:20:40 +0000 (01:20 -0400)]
bcachefs: snapshots: Use kvfree_rcu_mightsleep()

kvfree_rcu() was renamed - not removed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix strndup_user() error checking
Kent Overstreet [Wed, 20 Sep 2023 05:19:53 +0000 (01:19 -0400)]
bcachefs: Fix strndup_user() error checking

strndup_user() returns an error pointer, not NULL.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: drop journal lock before calling journal_write
Kent Overstreet [Wed, 20 Sep 2023 02:36:30 +0000 (22:36 -0400)]
bcachefs: drop journal lock before calling journal_write

bch2_journal_write() expects process context; it takes journal_lock as
needed.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: bch2_ioctl_disk_resize_journal(): check for integer truncation
Kent Overstreet [Wed, 20 Sep 2023 02:26:18 +0000 (22:26 -0400)]
bcachefs: bch2_ioctl_disk_resize_journal(): check for integer truncation

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix error checks in bch2_chacha_encrypt_key()
Kent Overstreet [Wed, 20 Sep 2023 02:20:25 +0000 (22:20 -0400)]
bcachefs: Fix error checks in bch2_chacha_encrypt_key()

crypto_alloc_sync_skcipher() returns an ERR_PTR, not NULL.
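
For readers unfamiliar with the convention, here is a userspace sketch of error pointers. The three helpers mirror the idea behind the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() in simplified form; alloc_cipher_sketch() and use_cipher_sketch() are hypothetical stand-ins, not the crypto API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in the top page of the address space. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

static int dummy_cipher;

/* Hypothetical allocator that reports failure as an error pointer. */
static void *alloc_cipher_sketch(int fail)
{
	return fail ? ERR_PTR(-ENOMEM) : (void *)&dummy_cipher;
}

static long use_cipher_sketch(int fail)
{
	void *c = alloc_cipher_sketch(fail);

	/* A "!c" test would miss the failure: an error pointer is non-NULL. */
	if (IS_ERR(c))
		return PTR_ERR(c);
	return 0;
}
```

A NULL check on such a return value always passes, so the failed allocation would be dereferenced later instead of being reported.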

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix an overflow check
Kent Overstreet [Wed, 20 Sep 2023 02:18:39 +0000 (22:18 -0400)]
bcachefs: Fix an overflow check

When bucket sector counts were changed from u16s to u32s, a few things
were missed. This fixes an overflow check, and a truncation that
prevented the overflow check from firing.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix copy_to_user() usage in flush_buf()
Kent Overstreet [Tue, 19 Sep 2023 21:09:22 +0000 (17:09 -0400)]
bcachefs: Fix copy_to_user() usage in flush_buf()

copy_to_user() returns the number of bytes that could not be copied - not an
errcode.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: fix race between journal entry close and pin set
Brian Foster [Fri, 15 Sep 2023 12:51:53 +0000 (08:51 -0400)]
bcachefs: fix race between journal entry close and pin set

bcachefs freeze testing via fstests generic/390 occasionally
reproduces the following BUG from bch2_fs_read_only():

  BUG_ON(atomic_long_read(&c->btree_key_cache.nr_dirty));

This indicates that one or more dirty key cache keys still exist
after the attempt to flush and quiesce the fs. The sequence that
leads to this problem actually occurs on unfreeze (ro->rw), and
looks something like the following:

- Task A begins a transaction commit and acquires journal_res for
  the current seq. This transaction intends to perform key cache
  insertion.
- Task B begins a bch2_journal_flush() via bch2_sync_fs(). This ends
  up in journal_entry_want_write(), which closes the current journal
  entry and drops the reference to the pin list created on entry open.
  The pin put pops the front of the journal via fast reclaim since the
  reference count has dropped to 0.
- Task A attempts to set the journal pin for the associated cached
  key, but bch2_journal_pin_set() skips the pin insert because the
  seq of the transaction reservation is behind the front of the pin
  list fifo.

The end result is that the pin associated with the cached key is not
added, which prevents a subsequent reclaim from processing the key
and thus leaves it dangling at freeze time. The fundamental cause of
this problem is that the front of the journal is allowed to pop
before a transaction with outstanding reservation on the associated
journal seq is able to add a pin. The count for the pin list
associated with the seq drops to zero and is prematurely reclaimed
as a result.

The logical fix for this problem lies in how the journal buffer is
managed in similar scenarios where the entry might have been closed
before a transaction with outstanding reservations happens to be
committed.

When a journal entry is opened, the current sequence number is
bumped, the associated pin list is initialized with a reference
count of 1, and the journal buffer reference count is bumped (via
journal_state_inc()). When a journal reservation is acquired, the
reservation also acquires a reference on the associated buffer. If
the journal entry is closed in the meantime, it drops both the pin
and buffer references held by the open entry, but the buffer still
has references held by outstanding reservation. After the associated
transaction commits, the reservation release drops the associated
buffer references and the buffer is written out once the reference
count has dropped to zero.

The fundamental problem here is that the lifecycle of the pin list
reference held by an open journal entry is too short to cover the
processing of transactions with outstanding reservations. The
simplest way to address this is to expand the pin list reference to
the lifecycle of the buffer vs. the shorter lifecycle of the open
journal entry. This ensures the pin list for a seq with outstanding
reservation cannot be popped and reclaimed before all outstanding
reservations have been released, even if the associated journal
entry has been closed for further reservations.

Move the pin put from journal entry close to where final processing
of the journal buffer occurs. Create a duplicate helper to cover the
case where the caller doesn't already hold the journal lock. This
allows generic/390 to pass reliably.
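
The reference lifecycles above can be modelled with a small toy state machine. This is only a sketch of the reasoning, with made-up names and a single sequence number; the real journal code tracks far more state. The pin_follows_buf flag selects between the old behaviour (pin ref dropped at entry close) and the new one (pin ref dropped with the last buffer ref).

```c
#include <stdbool.h>

/* Toy model of one journal sequence number; all names are illustrative. */
struct seq_state {
	int  buf_refs;    /* references on the journal buffer */
	int  pin_refs;    /* references on the pin list for this seq */
	bool reclaimed;   /* pin list popped off the front of the fifo */
};

static void pin_put(struct seq_state *s)
{
	if (--s->pin_refs == 0)
		s->reclaimed = true;
}

static void buf_put(struct seq_state *s, bool pin_follows_buf)
{
	/* New scheme: the open-entry pin ref goes away with the last buffer ref. */
	if (--s->buf_refs == 0 && pin_follows_buf)
		pin_put(s);
}

static void entry_open(struct seq_state *s)
{
	s->buf_refs = 1;
	s->pin_refs = 1;
	s->reclaimed = false;
}

static void res_get(struct seq_state *s)
{
	s->buf_refs++;
}

static void entry_close(struct seq_state *s, bool pin_follows_buf)
{
	/* Old scheme: closing the entry drops the pin ref immediately. */
	if (!pin_follows_buf)
		pin_put(s);
	buf_put(s, pin_follows_buf);
}

/* A pin can only be added while the seq has not been reclaimed. */
static bool pin_set(struct seq_state *s)
{
	if (s->reclaimed)
		return false;
	s->pin_refs++;
	return true;
}

/* Replays the race: task A reserves, task B closes the entry, A then pins. */
static bool race_pin_succeeds(bool pin_follows_buf)
{
	struct seq_state s;
	bool ok;

	entry_open(&s);
	res_get(&s);                        /* task A takes a reservation */
	entry_close(&s, pin_follows_buf);   /* task B closes the entry */
	ok = pin_set(&s);                   /* task A tries to pin its key */
	buf_put(&s, pin_follows_buf);       /* task A releases its reservation */
	return ok;
}
```

In this model race_pin_succeeds(false) fails because the pin list is reclaimed at close time, while race_pin_succeeds(true) succeeds because the pin reference outlives the outstanding reservation.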

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: prepare journal buf put to handle pin put
Brian Foster [Fri, 15 Sep 2023 12:51:52 +0000 (08:51 -0400)]
bcachefs: prepare journal buf put to handle pin put

bcachefs freeze testing has uncovered some raciness between journal
entry open/close and pin list reference count management. The
details of the problem are described in a separate patch. In
preparation for the associated fix, refactor the journal buffer put
path a bit to allow it to eventually handle dropping the pin list
reference currently held by an open journal entry.

Retain the journal write dispatch helper since the closure code is
inlined and we don't want to increase the amount of inline code in
the transaction commit path, but rename the function to reflect
the purpose of final processing of the journal buffer.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: refactor pin put helpers
Brian Foster [Fri, 15 Sep 2023 12:51:51 +0000 (08:51 -0400)]
bcachefs: refactor pin put helpers

We have a couple of journal pin put helpers to handle cases where the
journal lock is already held or not. Refactor the helpers to lock
and reclaim from the highest level and open code the reclaim from
the one caller of the internal variant. The latter call will be
moved into the journal buf release helper in a later patch.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: snapshot: Add missing assignment in bch2_delete_dead_snapshots()
Dan Carpenter [Fri, 15 Sep 2023 12:56:37 +0000 (15:56 +0300)]
bcachefs: snapshot: Add missing assignment in bch2_delete_dead_snapshots()

This code accidentally left out the "ret = " assignment so the errors
from for_each_btree_key2() are not checked.

Fixes: 53534482a250 ("bcachefs: for_each_btree_key2()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: fs-ioctl: Fix copy_to_user() error code
Dan Carpenter [Fri, 15 Sep 2023 12:55:23 +0000 (15:55 +0300)]
bcachefs: fs-ioctl: Fix copy_to_user() error code

The copy_to_user() function returns the number of bytes that it wasn't
able to copy but we want to return -EFAULT to the user.
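
The difference between the two return conventions is easy to see with a mock. The sketch below does not call the real copy_to_user(); mock_copy_to_user() and its "writable" parameter are invented to simulate a partially faulting destination.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Mock of copy_to_user(): copies at most `writable` bytes and returns the
 * number of bytes it could NOT copy (0 on full success).
 */
static unsigned long mock_copy_to_user(void *dst, const void *src,
				       unsigned long n, unsigned long writable)
{
	unsigned long done = n < writable ? n : writable;

	memcpy(dst, src, done);
	return n - done;
}

/* Buggy pattern: the leftover byte count leaks out as a "return code". */
static long ioctl_buggy(void *dst, const void *src, unsigned long n,
			unsigned long writable)
{
	return (long)mock_copy_to_user(dst, src, n, writable);
}

/* Fixed pattern: any non-zero leftover is reported as -EFAULT. */
static long ioctl_fixed(void *dst, const void *src, unsigned long n,
			unsigned long writable)
{
	if (mock_copy_to_user(dst, src, n, writable))
		return -EFAULT;
	return 0;
}
```

With 8 bytes requested and only 5 writable, the buggy variant returns 3, a positive value that callers may mistake for success, while the fixed variant returns -EFAULT.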

Fixes: e0750d947352 ("bcachefs: Initial commit")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: acl: Add missing check in bch2_acl_chmod()
Dan Carpenter [Fri, 15 Sep 2023 12:56:07 +0000 (15:56 +0300)]
bcachefs: acl: Add missing check in bch2_acl_chmod()

The "ret = bkey_err(k);" assignment was accidentally left out so the
call to bch2_btree_iter_peek_slot() is not checked for errors.

Fixes: 53306e096d91 ("bcachefs: Always check for transaction restarts")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: acl: Uninitialized variable in bch2_acl_chmod()
Dan Carpenter [Fri, 15 Sep 2023 12:55:40 +0000 (15:55 +0300)]
bcachefs: acl: Uninitialized variable in bch2_acl_chmod()

The clean up code at the end of the function uses "acl" so it needs
to be initialized to NULL.

Fixes: 53306e096d91 ("bcachefs: Always check for transaction restarts")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix -Wself-assign
Nick Desaulniers [Tue, 19 Sep 2023 20:38:31 +0000 (13:38 -0700)]
bcachefs: Fix -Wself-assign

Fixes the following observed error reported by Nathan on IRC.

  fs/bcachefs/io_misc.c:467:6: error: explicitly assigning value of
  variable of type 'int' to itself [-Werror,-Wself-assign]
    467 |         ret = ret;
        |         ~~~ ^ ~~~

Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Remove duplicate include
Jiapeng Chong [Thu, 14 Sep 2023 06:05:54 +0000 (14:05 +0800)]
bcachefs: Remove duplicate include

./fs/bcachefs/btree_update.h: journal.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6573
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: fix error checking in bch2_fs_alloc()
Dan Carpenter [Thu, 14 Sep 2023 09:47:44 +0000 (12:47 +0300)]
bcachefs: fix error checking in bch2_fs_alloc()

There is a typo here where it uses ";" instead of "?:".  The result is
that bch2_fs_fs_io_direct_init() is called unconditionally and the errors
from it are not checked.
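
The effect of the typo can be shown with a small sketch that relies on the GNU "?:" extension (accepted by GCC and Clang), where "a ?: b" yields a if it is non-zero and b otherwise. The step() helper and both init functions are hypothetical and do not reproduce the actual initialization chain in bch2_fs_alloc().

```c
/* Counts how many init steps actually ran. */
static int calls;

/* Hypothetical init step: records the call and returns the given code. */
static int step(int ret)
{
	calls++;
	return ret;
}

/* Intended chain: stop at the first non-zero (error) return. */
static int init_chained(int r1, int r2, int r3)
{
	calls = 0;
	return step(r1) ?: step(r2) ?: step(r3);
}

/* The typo: ";" ends the chain early, so the last step is detached. */
static int init_with_typo(int r1, int r2, int r3)
{
	int ret;

	calls = 0;
	ret = step(r1) ?: step(r2);
	step(r3);   /* always runs; its return value is dropped */
	return ret;
}
```

With an error in the last step, init_chained(0, 0, -5) returns -5 but init_with_typo(0, 0, -5) returns 0. With an error in the first step, the chained form stops after one call while the typo form still runs the detached step.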

Fixes: 0060c68159fc ("bcachefs: Split up fs-io.[ch]")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reviewed-by: Brian Foster <bfoster@redhat.com>
11 months ago bcachefs: chardev: fix an integer overflow (32 bit only)
Dan Carpenter [Thu, 14 Sep 2023 14:59:10 +0000 (17:59 +0300)]
bcachefs: chardev: fix an integer overflow (32 bit only)

On 32 bit systems, "sizeof(*arg) + replica_entries_bytes" can have an
integer overflow leading to memory corruption.  Use size_add() to
prevent this.
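
A userspace sketch of the saturating addition, assuming the kernel helper's behaviour of returning SIZE_MAX on overflow. size_add_sketch() and add_wrapping() are illustrative names and the implementation uses a compiler builtin rather than the kernel's own macros.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain addition: silently wraps around on overflow. */
static size_t add_wrapping(size_t a, size_t b)
{
	return a + b;
}

/* Stand-in for size_add(): saturate at SIZE_MAX instead of wrapping. */
static size_t size_add_sketch(size_t a, size_t b)
{
	size_t sum;

	if (__builtin_add_overflow(a, b, &sum))
		return SIZE_MAX;
	return sum;
}
```

A wrapped sum yields a small allocation that is then overrun, whereas a saturated SIZE_MAX makes the allocation itself fail.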

Fixes: b44dd3797034 ("bcachefs: Redo filesystem usage ioctls")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: chardev: return -EFAULT if copy_to_user() fails
Dan Carpenter [Thu, 14 Sep 2023 14:58:07 +0000 (17:58 +0300)]
bcachefs: chardev: return -EFAULT if copy_to_user() fails

The copy_to_user() function returns the number of bytes remaining but
we want to return -EFAULT to the user.

Fixes: e0750d947352 ("bcachefs: Initial commit")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Change bucket_lock() to use bit_spin_lock()
Kent Overstreet [Thu, 14 Sep 2023 00:33:06 +0000 (20:33 -0400)]
bcachefs: Change bucket_lock() to use bit_spin_lock()

bucket_lock() previously open coded a spinlock, because we need to cram
a spinlock into a single byte.

But it turns out not all archs support xchg() on a single byte; since we
need struct bucket to be small, this means we have to play fun games
with casts and ifdefs for endianness.

This fixes building on 32 bit arm, and likely other architectures.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-bcachefs@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Kill other unreachable() uses
Kent Overstreet [Thu, 14 Sep 2023 00:39:31 +0000 (20:39 -0400)]
bcachefs: Kill other unreachable() uses

Per the previous commit, bare unreachable() is considered harmful; convert
to BUG().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Remove undefined behavior in bch2_dev_buckets_reserved()
Josh Poimboeuf [Wed, 13 Sep 2023 21:08:29 +0000 (23:08 +0200)]
bcachefs: Remove undefined behavior in bch2_dev_buckets_reserved()

In general it's a good idea to avoid using bare unreachable() because it
introduces undefined behavior in compiled code.  In this case it even
confuses GCC into emitting an empty unused
bch2_dev_buckets_reserved.part.0() function.

Use BUG() instead, which is nice and defined.  While in theory it should
never trigger, if something were to go awry and the BCH_WATERMARK_NR
case were to actually hit, the failure mode is much more robust.
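
A userspace sketch of the pattern, under the assumption that the function is a switch over an enum with a sentinel "number of values" entry. The enum, the values returned and the function name are invented; abort() stands in for BUG().

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical watermark enum with a trailing count entry. */
enum watermark_sketch { WM_LOW, WM_HIGH, WM_NR };

static unsigned reserved_sketch(enum watermark_sketch w)
{
	switch (w) {
	case WM_LOW:
		return 0;
	case WM_HIGH:
		return 8;
	case WM_NR:
		/*
		 * With __builtin_unreachable() here the compiler may emit no
		 * code at all, so execution could run off the end of the
		 * function. A defined failure is reported instead.
		 */
		fprintf(stderr, "invalid watermark\n");
		abort();
	}
	abort();
}
```

The sentinel case should never be hit, but if it is, the program stops at a known point rather than continuing into whatever code follows.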

Fixes the following warnings:

  vmlinux.o: warning: objtool: bch2_bucket_alloc_trans() falls through to next function bch2_reset_alloc_cursors()
  vmlinux.o: warning: objtool: bch2_dev_buckets_reserved.part.0() is missing an ELF size annotation

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Remove a redundant and harmless bch2_free_super() call
Christophe JAILLET [Wed, 13 Sep 2023 16:44:09 +0000 (18:44 +0200)]
bcachefs: Remove a redundant and harmless bch2_free_super() call

Remove a redundant call to bch2_free_super().

This is harmless because bch2_free_super() has a memset() at its end. So
a second call would only lead to kfree(NULL) calls.

Remove the redundant call and only rely on the error handling path.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix use-after-free in bch2_dev_add()
Christophe JAILLET [Wed, 13 Sep 2023 16:44:08 +0000 (18:44 +0200)]
bcachefs: Fix use-after-free in bch2_dev_add()

If __bch2_dev_attach_bdev() fails, bch2_dev_free() is called twice.
Once here and another time in the error handling path.

This leads to several use-after-frees.

Remove the redundant call and only rely on the error handling path.

Fixes: 6a44735653d4 ("bcachefs: Improved superblock-related error messages")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: add module description to fix modpost warning
Brian Foster [Wed, 13 Sep 2023 14:14:30 +0000 (10:14 -0400)]
bcachefs: add module description to fix modpost warning

modpost produces the following warning:

WARNING: modpost: missing MODULE_DESCRIPTION() in fs/bcachefs/bcachefs.o

Add a module description for bcachefs.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Heap allocate btree_trans
Kent Overstreet [Tue, 12 Sep 2023 21:16:02 +0000 (17:16 -0400)]
bcachefs: Heap allocate btree_trans

We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix W=12 build errors
Kent Overstreet [Tue, 12 Sep 2023 22:41:22 +0000 (18:41 -0400)]
bcachefs: Fix W=12 build errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Remove unneeded semicolon
Yang Li [Wed, 13 Sep 2023 00:57:56 +0000 (08:57 +0800)]
bcachefs: Remove unneeded semicolon

./fs/bcachefs/btree_gc.c:1249:2-3: Unneeded semicolon
./fs/bcachefs/btree_gc.c:1521:2-3: Unneeded semicolon
./fs/bcachefs/btree_gc.c:1575:2-3: Unneeded semicolon
./fs/bcachefs/counters.c:46:2-3: Unneeded semicolon

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Add a missing prefetch include
Kent Overstreet [Tue, 12 Sep 2023 22:41:09 +0000 (18:41 -0400)]
bcachefs: Add a missing prefetch include

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months ago bcachefs: Fix -Wcompare-distinct-pointer-types in bch2_copygc_get_buckets()
Nathan Chancellor [Tue, 12 Sep 2023 19:15:44 +0000 (12:15 -0700)]
bcachefs: Fix -Wcompare-distinct-pointer-types in bch2_copygc_get_buckets()

When building bcachefs for 32-bit ARM, there is a warning when using
max() to compare an expression involving 'size_t' with an 'unsigned
long' literal:

  fs/bcachefs/movinggc.c:159:21: error: comparison of distinct pointer types ('typeof (16UL) *' (aka 'unsigned long *') and 'typeof (buckets_in_flight->nr / 4) *' (aka 'unsigned int *')) [-Werror,-Wcompare-distinct-pointer-types]
    159 |         size_t nr_to_get = max(16UL, buckets_in_flight->nr / 4);
        |                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  include/linux/minmax.h:76:19: note: expanded from macro 'max'
     76 | #define max(x, y)       __careful_cmp(x, y, >)
        |                         ^~~~~~~~~~~~~~~~~~~~~~
  include/linux/minmax.h:38:24: note: expanded from macro '__careful_cmp'
     38 |         __builtin_choose_expr(__safe_cmp(x, y), \
        |                               ^~~~~~~~~~~~~~~~
  include/linux/minmax.h:28:4: note: expanded from macro '__safe_cmp'
     28 |                 (__typecheck(x, y) && __no_side_effects(x, y))
        |                  ^~~~~~~~~~~~~~~~~
  include/linux/minmax.h:22:28: note: expanded from macro '__typecheck'
     22 |         (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
        |                    ~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~
  1 error generated.

On 64-bit architectures, size_t is 'unsigned long', so there is no
warning when comparing these two expressions. Use max_t(size_t, ...) for
this situation, eliminating the warning.
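
A minimal userspace sketch of the mismatch and the fix. The max_t() below
is a simplified stand-in for the kernel macro, and types_match() and
nr_to_get() are hypothetical names introduced only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* 1 if both expressions have the same type, 0 otherwise (GCC/Clang builtin).
 * The kernel's __typecheck() compares two pointers instead, so that a
 * mismatch surfaces as the compiler diagnostic quoted above. */
#define types_match(x, y) \
	__builtin_types_compatible_p(__typeof__(x), __typeof__(y))

/* Simplified stand-in for max_t(): cast both operands to one type first.
 * This version may evaluate its arguments more than once. */
#define max_t(type, x, y) ((type)(x) > (type)(y) ? (type)(x) : (type)(y))

/* Hypothetical helper built around the expression from the warning:
 * fetch at least 16 buckets, or a quarter of those in flight. */
static size_t nr_to_get(size_t nr_in_flight)
{
	size_t n = max_t(size_t, 16UL, nr_in_flight / 4);

	assert(n >= 16);	/* the floor always holds */
	return n;
}
```

Because both operands are cast to size_t before the comparison, the result
no longer depends on whether size_t is 'unsigned int' or 'unsigned long'.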

Fixes: dd49018737d4 ("bcachefs: Rhashtable based buckets_in_flight for copygc")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix -Wcompare-distinct-pointer-types in do_encrypt()
Nathan Chancellor [Tue, 12 Sep 2023 19:15:43 +0000 (12:15 -0700)]
bcachefs: Fix -Wcompare-distinct-pointer-types in do_encrypt()

When building bcachefs for 32-bit ARM, there is a warning when using
min() to compare a variable of type 'size_t' with an expression of type
'unsigned long':

  fs/bcachefs/checksum.c:142:22: error: comparison of distinct pointer types ('typeof (len) *' (aka 'unsigned int *') and 'typeof (((1UL) << 12) - offset) *' (aka 'unsigned long *')) [-Werror,-Wcompare-distinct-pointer-types]
    142 |                         unsigned pg_len = min(len, PAGE_SIZE - offset);
        |                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
  include/linux/minmax.h:69:19: note: expanded from macro 'min'
     69 | #define min(x, y)       __careful_cmp(x, y, <)
        |                         ^~~~~~~~~~~~~~~~~~~~~~
  include/linux/minmax.h:38:24: note: expanded from macro '__careful_cmp'
     38 |         __builtin_choose_expr(__safe_cmp(x, y), \
        |                               ^~~~~~~~~~~~~~~~
  include/linux/minmax.h:28:4: note: expanded from macro '__safe_cmp'
     28 |                 (__typecheck(x, y) && __no_side_effects(x, y))
        |                  ^~~~~~~~~~~~~~~~~
  include/linux/minmax.h:22:28: note: expanded from macro '__typecheck'
     22 |         (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
        |                    ~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~
  1 error generated.

On 64-bit architectures, size_t is 'unsigned long', so there is no
warning when comparing these two expressions. Use min_t(size_t, ...) for
this situation, eliminating the warning.

Fixes: 1fb50457684f ("bcachefs: Fix memory corruption in encryption path")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix -Wincompatible-function-pointer-types-strict from key_invalid callbacks
Nathan Chancellor [Tue, 12 Sep 2023 19:15:42 +0000 (12:15 -0700)]
bcachefs: Fix -Wincompatible-function-pointer-types-strict from key_invalid callbacks

When building bcachefs with -Wincompatible-function-pointer-types-strict,
a clang warning designed to catch issues with mismatched function
pointer types, which will be fatal at runtime due to kernel Control Flow
Integrity (kCFI), there are several instances along the lines of:

  fs/bcachefs/bkey_methods.c:118:2: error: incompatible function pointer types initializing 'int (*)(const struct bch_fs *, struct bkey_s_c, enum bkey_invalid_flags, struct printbuf *)' with an expression of type 'int (const struct bch_fs *, struct bkey_s_c, unsigned int, struct printbuf *)' [-Werror,-Wincompatible-function-pointer-types-strict]
    118 |         BCH_BKEY_TYPES()
        |         ^~~~~~~~~~~~~~~~
  fs/bcachefs/bcachefs_format.h:342:2: note: expanded from macro 'BCH_BKEY_TYPES'
    342 |         x(deleted,              0)                      \
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
  fs/bcachefs/bkey_methods.c:117:41: note: expanded from macro 'x'
    117 | #define x(name, nr) [KEY_TYPE_##name]   = bch2_bkey_ops_##name,
        |                                           ^~~~~~~~~~~~~~~~~~~~
  <scratch space>:206:1: note: expanded from here
    206 | bch2_bkey_ops_deleted
        | ^~~~~~~~~~~~~~~~~~~~~
  fs/bcachefs/bkey_methods.c:34:17: note: expanded from macro 'bch2_bkey_ops_deleted'
     34 |         .key_invalid = deleted_key_invalid,             \
        |                        ^~~~~~~~~~~~~~~~~~~

The flags parameter should be of type 'enum bkey_invalid_flags', not
'unsigned int'. Adjust the type everywhere so that there is no more
warning.
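
As a rough illustration, with stand-in types rather than the real bcachefs
definitions (enum check_flags, struct key_ops and the int key are all
assumptions made for this sketch), the callback is declared with exactly
the parameter types of the function pointer it is stored in:

```c
#include <assert.h>

/* Stand-in for enum bkey_invalid_flags; the enumerator is illustrative. */
enum check_flags { CHECK_ON_WRITE = 1 << 0 };

/* The ops table stores callbacks that take the enum type... */
struct key_ops {
	int (*key_invalid)(int key, enum check_flags flags);
};

/* ...so the callback uses that same parameter type. Declaring flags as
 * 'unsigned int' usually still compiles, but the function type then
 * differs from the pointer type, which is what the strict warning flags. */
static int deleted_key_invalid(int key, enum check_flags flags)
{
	(void)flags;		/* unused in this sketch */
	return key < 0 ? -1 : 0;
}

static const struct key_ops deleted_ops = {
	.key_invalid = deleted_key_invalid,
};
```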

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix -Wformat in bch2_bucket_gens_invalid()
Nathan Chancellor [Tue, 12 Sep 2023 19:15:41 +0000 (12:15 -0700)]
bcachefs: Fix -Wformat in bch2_bucket_gens_invalid()

When building bcachefs for 32-bit ARM, there is a compiler warning in
bch2_bucket_gens_invalid() due to use of an incorrect format specifier:

  fs/bcachefs/alloc_background.c:530:10: error: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Werror,-Wformat]
    529 |                 prt_printf(err, "bad val size (%lu != %zu)",
        |                                                ~~~
        |                                                %zu
    530 |                        bkey_val_bytes(k.k), sizeof(struct bch_bucket_gens));
        |                        ^~~~~~~~~~~~~~~~~~~
  fs/bcachefs/util.h:223:54: note: expanded from macro 'prt_printf'
    223 | #define prt_printf(_out, ...)           bch2_prt_printf(_out, __VA_ARGS__)
        |                                                               ^~~~~~~~~~~

On 64-bit architectures, size_t is 'unsigned long', so there is no
warning when using %lu but on 32-bit architectures, size_t is 'unsigned
int'. Use '%zu', the format specifier for 'size_t', to eliminate the
warning.
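
A small userspace sketch of the same rule, using snprintf() in place of
prt_printf(); format_bad_size() is a hypothetical helper, not a bcachefs
function:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a size mismatch message into buf. %zu is the conversion for
 * size_t on every architecture; %lu only matches where size_t happens
 * to be 'unsigned long'. */
static int format_bad_size(char *buf, size_t buflen, size_t got, size_t want)
{
	return snprintf(buf, buflen, "bad val size (%zu != %zu)", got, want);
}
```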

Fixes: 4be0d766a7e9 ("bcachefs: bucket_gens btree")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix -Wformat in bch2_alloc_v4_invalid()
Nathan Chancellor [Tue, 12 Sep 2023 19:15:40 +0000 (12:15 -0700)]
bcachefs: Fix -Wformat in bch2_alloc_v4_invalid()

When building bcachefs for 32-bit ARM, there is a compiler warning in
bch2_alloc_v4_invalid() due to use of an incorrect format specifier:

  fs/bcachefs/alloc_background.c:246:30: error: format specifies type 'unsigned long' but the argument has type 'unsigned int' [-Werror,-Wformat]
    245 |                 prt_printf(err, "bad val size (%u > %lu)",
        |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        |                                                     %u
    246 |                        alloc_v4_u64s(a.v), bkey_val_u64s(k.k));
        |                        ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
  fs/bcachefs/bkey.h:58:27: note: expanded from macro 'bkey_val_u64s'
     58 | #define bkey_val_u64s(_k)       ((_k)->u64s - BKEY_U64s)
        |                                 ^
  fs/bcachefs/util.h:223:54: note: expanded from macro 'prt_printf'
    223 | #define prt_printf(_out, ...)           bch2_prt_printf(_out, __VA_ARGS__)
        |                                                               ^~~~~~~~~~~

This expression is of type 'size_t'. On 64-bit architectures, size_t is
'unsigned long', so there is no warning when using %lu but on 32-bit
architectures, size_t is 'unsigned int'. Use '%zu', the format specifier
for 'size_t' to eliminate the warning.

Fixes: 11be8e8db283 ("bcachefs: New on disk format: Backpointers")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix -Wformat in bch2_btree_key_cache_to_text()
Nathan Chancellor [Tue, 12 Sep 2023 19:15:39 +0000 (12:15 -0700)]
bcachefs: Fix -Wformat in bch2_btree_key_cache_to_text()

When building bcachefs for 32-bit ARM, there is a compiler warning in
bch2_btree_key_cache_to_text() due to use of an incorrect format
specifier:

  fs/bcachefs/btree_key_cache.c:1060:36: error: format specifies type 'size_t' (aka 'unsigned int') but the argument has type 'long' [-Werror,-Wformat]
   1060 |         prt_printf(out, "nr_freed:\t%zu",       atomic_long_read(&c->nr_freed));
        |                                     ~~~         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        |                                     %ld
  fs/bcachefs/util.h:223:54: note: expanded from macro 'prt_printf'
    223 | #define prt_printf(_out, ...)           bch2_prt_printf(_out, __VA_ARGS__)
        |                                                               ^~~~~~~~~~~
  1 error generated.

On 64-bit architectures, size_t is 'unsigned long', so there is no
warning when using %zu but on 32-bit architectures, size_t is
'unsigned int'. Use '%lu' to match the other format specifiers used in
this function for printing values returned from atomic_long_read().

Fixes: 6d799930ce0f ("bcachefs: btree key cache pcpu freedlist")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix -Wformat in bch2_set_bucket_needs_journal_commit()
Nathan Chancellor [Tue, 12 Sep 2023 19:15:38 +0000 (12:15 -0700)]
bcachefs: Fix -Wformat in bch2_set_bucket_needs_journal_commit()

When building bcachefs for 32-bit ARM, there is a compiler warning in
bch2_set_bucket_needs_journal_commit() due to a debug print using the
wrong specifier:

  fs/bcachefs/buckets_waiting_for_journal.c:137:30: error: format specifies type 'size_t' (aka 'unsigned int') but the argument has type 'unsigned long' [-Werror,-Wformat]
    136 |         pr_debug("took %zu rehashes, table at %zu/%zu elements",
        |                                                   ~~~
        |                                                   %lu
    137 |                  nr_rehashes, nr_elements, 1UL << b->t->bits);
        |                                            ^~~~~~~~~~~~~~~~~
  include/linux/printk.h:579:26: note: expanded from macro 'pr_debug'
    579 |         dynamic_pr_debug(fmt, ##__VA_ARGS__)
        |                          ~~~    ^~~~~~~~~~~
  include/linux/dynamic_debug.h:270:22: note: expanded from macro 'dynamic_pr_debug'
    270 |                            pr_fmt(fmt), ##__VA_ARGS__)
        |                                   ~~~     ^~~~~~~~~~~
  include/linux/dynamic_debug.h:250:59: note: expanded from macro '_dynamic_func_call'
    250 |         _dynamic_func_call_cls(_DPRINTK_CLASS_DFLT, fmt, func, ##__VA_ARGS__)
        |                                                                  ^~~~~~~~~~~
  include/linux/dynamic_debug.h:248:65: note: expanded from macro '_dynamic_func_call_cls'
    248 |         __dynamic_func_call_cls(__UNIQUE_ID(ddebug), cls, fmt, func, ##__VA_ARGS__)
        |                                                                        ^~~~~~~~~~~
  include/linux/dynamic_debug.h:224:15: note: expanded from macro '__dynamic_func_call_cls'
    224 |                 func(&id, ##__VA_ARGS__);                       \
        |                             ^~~~~~~~~~~
  1 error generated.

On 64-bit architectures, size_t is 'unsigned long', so there is no
warning when using %zu but on 32-bit architectures, size_t is
'unsigned int'. Use the correct specifier to resolve the warning.

Fixes: 7a82e75ddaef ("bcachefs: New data structure for buckets waiting on journal commit")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix a handful of spelling mistakes in various messages
Colin Ian King [Tue, 12 Sep 2023 08:25:27 +0000 (09:25 +0100)]
bcachefs: Fix a handful of spelling mistakes in various messages

There are several spelling mistakes in error messages. Fix these.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: remove redundant pointer q
Colin Ian King [Tue, 12 Sep 2023 12:37:44 +0000 (13:37 +0100)]
bcachefs: remove redundant pointer q

The pointer q is being assigned a value but it is never read. The
assignment and pointer are redundant and can be removed.
Cleans up clang scan build warning:

fs/bcachefs/quota.c:813:2: warning: Value stored to 'q' is never
read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: remove duplicated assignment to variable offset_into_extent
Colin Ian King [Tue, 12 Sep 2023 12:37:43 +0000 (13:37 +0100)]
bcachefs: remove duplicated assignment to variable offset_into_extent

Variable offset_into_extent is being assigned to zero and a few
statements later it is being re-assigned again to the same value.
The second assignment is redundant and can be removed. Cleans up
clang-scan build warning:

fs/bcachefs/io.c:2722:3: warning: Value stored to 'offset_into_extent'
is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: remove redundant initializations of variables start_offset and end_offset
Colin Ian King [Tue, 12 Sep 2023 12:37:42 +0000 (13:37 +0100)]
bcachefs: remove redundant initializations of variables start_offset and end_offset

The variables start_offset and end_offset are being initialized with
values that are never read, as they are re-assigned later on. The
initializations are redundant and can be removed.

Cleans up clang-scan build warnings:
fs/bcachefs/fs-io.c:243:11: warning: Value stored to 'start_offset' during
its initialization is never read [deadcode.DeadStores]
fs/bcachefs/fs-io.c:244:11: warning: Value stored to 'end_offset' during
its initialization is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: remove redundant initialization of pointer dst
Colin Ian King [Tue, 12 Sep 2023 12:37:41 +0000 (13:37 +0100)]
bcachefs: remove redundant initialization of pointer dst

The pointer dst is being initialized with a value that is never read,
it is being re-assigned later on when it is used in a while-loop.
The initialization is redundant and can be removed.

Cleans up clang-scan build warning:
fs/bcachefs/disk_groups.c:186:30: warning: Value stored to 'dst' during
its initialization is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: remove redundant initialization of pointer d
Colin Ian King [Tue, 12 Sep 2023 12:37:40 +0000 (13:37 +0100)]
bcachefs: remove redundant initialization of pointer d

The pointer d is being initialized with a value that is never read,
it is being re-assigned later on when it is used in a for-loop.
The initialization is redundant and can be removed.

Cleans up clang-scan build warning:
fs/bcachefs/buckets.c:1303:25: warning: Value stored to 'd' during its
initialization is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: trace_read_nopromote()
Kent Overstreet [Tue, 12 Sep 2023 00:44:33 +0000 (20:44 -0400)]
bcachefs: trace_read_nopromote()

Add a tracepoint to print the reason a read wasn't promoted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Log finsert/fcollapse operations
Kent Overstreet [Sun, 10 Sep 2023 23:11:47 +0000 (19:11 -0400)]
bcachefs: Log finsert/fcollapse operations

Now that we have the logged operations btree, we can make
finsert/fcollapse atomic w.r.t. unclean shutdown as well.

This adds bch_logged_op_finsert to represent the state of an finsert or
fcollapse, which is a bit more complicated than truncate since we need
to track our position in the "shift extents" operation.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Log truncate operations
Kent Overstreet [Sun, 10 Sep 2023 20:42:30 +0000 (16:42 -0400)]
bcachefs: Log truncate operations

Previously, we guaranteed atomicity of truncate after unclean shutdown
with the BCH_INODE_I_SIZE_DIRTY flag - which required a full scan of the
inodes btree.

Recently the deleted inodes btree was added so that we no longer have to
scan for deleted inodes, but truncate was unfinished and that change
left it broken.

This patch uses the new logged operations btree to fix truncate
atomicity; we now log an operation that can be replayed at the start of
a truncate.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: BTREE_ID_logged_ops
Kent Overstreet [Sun, 27 Aug 2023 22:27:41 +0000 (18:27 -0400)]
bcachefs: BTREE_ID_logged_ops

Add a new btree for long running logged operations - i.e. for logging
operations that we can't do within a single btree transaction, so that
they can be resumed if we crash.

Keys in the logged operations btree will represent operations in
progress, with the state of the operation stored in the value.
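
The idea can be sketched in userspace as follows. This is only a model
under stated assumptions: struct logged_op and zero_range_logged() are
hypothetical, an in-memory struct stands in for the btree key and value,
and max_steps simulates a crash partway through:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a logged-operation record: its presence says
 * an operation is in progress, its value records how far it has got. */
struct logged_op {
	int    in_progress;
	size_t pos;		/* next element to process */
};

/* Zero data[0..n) one element per "transaction", persisting progress
 * after each step. Returns 1 once the operation has completed, 0 if it
 * was interrupted after max_steps steps. */
static int zero_range_logged(struct logged_op *op, int *data, size_t n,
			     size_t max_steps)
{
	if (!op->in_progress) {		/* fresh start: log the operation */
		op->in_progress = 1;
		op->pos = 0;
	}
	while (op->pos < n) {
		if (!max_steps--)
			return 0;	/* "crash": the log records where we were */
		data[op->pos] = 0;
		op->pos++;
	}
	op->in_progress = 0;		/* done: drop the log entry */
	return 1;
}
```

Calling the function again with the same record resumes from the saved
position instead of starting over.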

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: New io_misc.c helpers
Kent Overstreet [Mon, 4 Sep 2023 09:38:30 +0000 (05:38 -0400)]
bcachefs: New io_misc.c helpers

This pulls the non vfs specific parts of truncate and finsert/fcollapse
out of fs-io.c, and moves them to io_misc.c.

This is prep work for logging these operations, to make them atomic in
the event of a crash.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Break up io.c
Kent Overstreet [Sun, 10 Sep 2023 22:05:17 +0000 (18:05 -0400)]
bcachefs: Break up io.c

More reorganization, this splits up io.c into
 - io_read.c
 - io_misc.c - fallocate, fpunch, truncate
 - io_write.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_trans_update_get_key_cache()
Kent Overstreet [Mon, 11 Sep 2023 23:50:42 +0000 (19:50 -0400)]
bcachefs: bch2_trans_update_get_key_cache()

Factor out a slowpath into a separate function.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: __bch2_btree_insert() -> bch2_btree_insert_trans()
Kent Overstreet [Mon, 11 Sep 2023 23:48:07 +0000 (19:48 -0400)]
bcachefs: __bch2_btree_insert() -> bch2_btree_insert_trans()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Kill incorrect assertion
Kent Overstreet [Mon, 11 Sep 2023 18:34:56 +0000 (14:34 -0400)]
bcachefs: Kill incorrect assertion

In the bch2_fs_alloc() error path we call bch2_fs_free() without setting
BCH_FS_STOPPING - this is fine.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Convert more code to bch_err_msg()
Kent Overstreet [Mon, 11 Sep 2023 05:37:34 +0000 (01:37 -0400)]
bcachefs: Convert more code to bch_err_msg()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Kill missing inode warnings in bch2_quota_read()
Kent Overstreet [Mon, 11 Sep 2023 02:05:50 +0000 (22:05 -0400)]
bcachefs: Kill missing inode warnings in bch2_quota_read()

bch2_quota_read(), when scanning for inodes, may attempt to look up
inodes that have been deleted in the main subvolume - this is not an
error.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix bch_sb_handle type
Kent Overstreet [Sun, 10 Sep 2023 06:13:33 +0000 (02:13 -0400)]
bcachefs: Fix bch_sb_handle type

blk_mode_t was recently introduced; we should be using it now, instead
of fmode_t.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix bch2_propagate_key_to_snapshot_leaves()
Kent Overstreet [Sun, 10 Sep 2023 20:24:02 +0000 (16:24 -0400)]
bcachefs: Fix bch2_propagate_key_to_snapshot_leaves()

When we handle a transaction restart in a nested context, we need to
return -BCH_ERR_transaction_restart_nested because we invalidated the
outer context's iterators and locks.

bch2_propagate_key_to_snapshot_leaves() wasn't doing this; this patch
fixes it to use trans_was_restarted().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix silent enum conversion error
Kent Overstreet [Sun, 10 Sep 2023 01:14:54 +0000 (21:14 -0400)]
bcachefs: Fix silent enum conversion error

This changes mark_btree_node_locked() to take an enum
btree_node_locked_type, not a six_lock_type, since BTREE_NODE_UNLOCKED
is -1 which may cause problems converting back and forth to
six_lock_type if short enums are in use.

With this change, we never store BTREE_NODE_UNLOCKED in a six_lock_type
enum.
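
A sketch of the resulting split, using stand-in names (enum lock_type,
enum node_locked_type and node_is_locked() are assumptions for
illustration, not the kernel definitions):

```c
#include <assert.h>

/* Stand-in for six_lock_type: only real lock states, all non-negative,
 * so a compiler using short enums may pick a small unsigned type. */
enum lock_type { LOCK_READ, LOCK_INTENT, LOCK_WRITE };

/* Separate type for the per-node state, which must also represent
 * "not locked" as -1 and therefore needs a signed representation. */
enum node_locked_type {
	NODE_UNLOCKED      = -1,
	NODE_READ_LOCKED   = LOCK_READ,
	NODE_INTENT_LOCKED = LOCK_INTENT,
	NODE_WRITE_LOCKED  = LOCK_WRITE,
};

/* A node counts as locked for any state other than NODE_UNLOCKED. */
static int node_is_locked(enum node_locked_type t)
{
	return t != NODE_UNLOCKED;
}
```

Since -1 only ever lives in the second enum, it is never squeezed through
a type that might be unable to hold it.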

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Array bounds fixes
Kent Overstreet [Sun, 10 Sep 2023 00:10:11 +0000 (20:10 -0400)]
bcachefs: Array bounds fixes

It's no longer legal to use a zero size array as a flexible array
member - this causes UBSAN to complain.

This patch switches our zero size arrays to normal flexible array
members when possible, and inserts casts in other places (e.g. where we
use the zero size array as a marker partway through an array).
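
For reference, a userspace sketch of a C99 flexible array member. struct
u64_list and u64_list_alloc() are hypothetical; in-kernel code would
normally compute the allocation size with struct_size() rather than by
hand:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* 'vals[]' is a flexible array member: it has no declared size, unlike
 * the older GNU 'vals[0]' idiom that bounds checkers treat as empty. */
struct u64_list {
	size_t   nr;
	uint64_t vals[];
};

/* Allocate a zeroed list with room for nr values; NULL on failure. */
static struct u64_list *u64_list_alloc(size_t nr)
{
	struct u64_list *l;

	/* Reject counts whose byte size would overflow size_t. */
	if (nr > (SIZE_MAX - sizeof(*l)) / sizeof(l->vals[0]))
		return NULL;

	l = calloc(1, sizeof(*l) + nr * sizeof(l->vals[0]));
	if (l)
		l->nr = nr;
	return l;
}
```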

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_acl_to_text()
Kent Overstreet [Fri, 8 Sep 2023 22:14:08 +0000 (18:14 -0400)]
bcachefs: bch2_acl_to_text()

We can now print out acls from bch2_xattr_to_text(), when the xattr
contains an acl.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: restart journal reclaim thread on ro->rw transitions
Brian Foster [Wed, 30 Aug 2023 10:45:59 +0000 (06:45 -0400)]
bcachefs: restart journal reclaim thread on ro->rw transitions

Commit c2d5ff36065a4 ("bcachefs: Start journal reclaim thread
earlier") tweaked reclaim thread management to start a bit earlier
in the mount sequence by moving the start call from
__bch2_fs_read_write() to bch2_fs_journal_start(). This has the side
effect of never starting the reclaim thread on a ro->rw transition,
which can be observed by monitoring reclaim behavior via the
journal_reclaim tracepoints. I.e. once an fs has remounted ro->rw,
we only ever rely on direct reclaim from that point forward.

Since bch2_journal_reclaim_start() properly handles the case where
the reclaim thread has already been created, restore the start call
in the read-write helper. This allows the reclaim thread to start
early when appropriate and also exit/restart on remounts or freeze
cycles. In the latter case it may be possible to simply allow the
task to freeze rather than destroy it, but for now just fix the
immediate bug.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix snapshot_skiplist_good()
Kent Overstreet [Mon, 28 Aug 2023 19:17:31 +0000 (15:17 -0400)]
bcachefs: Fix snapshot_skiplist_good()

We weren't correctly checking snapshot skiplist nodes - we were checking
if they were in the same tree, not if they were an actual ancestor.
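
The difference between the two checks can be sketched like this. The
table layout is an assumption for illustration (IDs index an array,
parent == 0 marks a root), and a node counts as its own ancestor here:

```c
#include <assert.h>

/* Hypothetical in-memory snapshot node. */
struct snap {
	unsigned parent;	/* 0 for a root */
	unsigned tree;		/* which snapshot tree the node belongs to */
};

/* The weaker test: both nodes merely live in the same tree. */
static int same_tree(const struct snap *t, unsigned a, unsigned b)
{
	return t[a].tree == t[b].tree;
}

/* True ancestry: walk parent links from id until we reach ancestor
 * or run off the top of the tree. */
static int is_ancestor(const struct snap *t, unsigned id, unsigned ancestor)
{
	assert(ancestor != 0);
	while (id && id != ancestor)
		id = t[id].parent;
	return id == ancestor;
}
```

Two siblings pass same_tree() but fail is_ancestor(), which is the case
the old check let through.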

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Kill stripe check in bch2_alloc_v4_invalid()
Kent Overstreet [Thu, 24 Aug 2023 21:07:50 +0000 (17:07 -0400)]
bcachefs: Kill stripe check in bch2_alloc_v4_invalid()

Since we set bucket data type to BCH_DATA_stripe based on the data
pointer, not just the stripe pointer, it doesn't make sense to check for
no stripe in the .key_invalid method - this is a situation that
shouldn't happen, but our other fsck/repair code handles it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improve bch2_moving_ctxt_to_text()
Kent Overstreet [Thu, 24 Aug 2023 01:20:42 +0000 (21:20 -0400)]
bcachefs: Improve bch2_moving_ctxt_to_text()

Print more information out about moving contexts - fold in the output of
the redundant bch2_data_jobs_to_text(), and also include information
relevant to whether move_data() should be blocked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Put bkey invalid check in commit path in a more useful place
Kent Overstreet [Wed, 23 Aug 2023 00:29:35 +0000 (20:29 -0400)]
bcachefs: Put bkey invalid check in commit path in a more useful place

When doing updates early in recovery, before we can go RW, we still want
to check that keys are valid at commit time - this moves key invalid
checking to before the "btree updates to journal" path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Always check alloc data type
Kent Overstreet [Tue, 22 Aug 2023 22:48:09 +0000 (18:48 -0400)]
bcachefs: Always check alloc data type

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix a double free on invalid bkey
Kent Overstreet [Tue, 22 Aug 2023 22:47:16 +0000 (18:47 -0400)]
bcachefs: Fix a double free on invalid bkey

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: bch2_propagate_key_to_snapshot_leaves()
Kent Overstreet [Sat, 19 Aug 2023 01:14:33 +0000 (21:14 -0400)]
bcachefs: bch2_propagate_key_to_snapshot_leaves()

If fsck finds a key that needs work done, the primary example being an
unlinked inode that needs to be deleted, and the key is in an internal
snapshot node, we have a bit of a conundrum.

The conundrum is that internal snapshot nodes are shared, and we in
general don't do updates in internal snapshot nodes because there may be
overwrites in some snapshots and not others, and this may affect other
keys referenced by this key (i.e. extents).

For example, we might be seeing an unlinked inode in an internal
snapshot node, but then in one child snapshot the inode might have been
reattached and might not be unlinked. Deleting the inode in the internal
snapshot node would be wrong, because then we'll delete all the extents
that the child snapshot references.

But if an unlinked inode does not have any overwrites in child
snapshots, we're fine: the inode is overwritten in all child snapshots,
so we can do the deletion at the point of commonality in the snapshot
tree, i.e. the node where we found it.

This patch adds a new helper, bch2_propagate_key_to_snapshot_leaves(),
to handle the case where we need to update a key that does have
overwrites in child snapshots: we copy the key to leaf snapshot nodes,
and then rewind fsck and process the needed updates there.

With this, fsck can now always correctly handle unlinked inodes found in
internal snapshot nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Cleanup redundant snapshot nodes
Kent Overstreet [Fri, 18 Aug 2023 02:10:02 +0000 (22:10 -0400)]
bcachefs: Cleanup redundant snapshot nodes

After deleting snapshots, we may be left with a snapshot tree where
some nodes only have one child, and we have a linear chain.

Interior snapshot nodes are never used directly (i.e. they never have
subvolumes that point to them), they are only referred to by child
snapshot nodes - hence, they are redundant.

The existing code talks about redundant snapshot nodes as forming an
equivalence class; i.e. nodes for which snapshot_t->equiv is equal. In a
given equivalence class, we only ever need a single key at a given
position - i.e. multiple versions with different snapshot fields are
redundant.

The existing snapshot cleanup code deletes these redundant keys, but not
redundant nodes. It turns out this is buggy, because we assume that
after snapshot deletion finishes we should only have a single key per
equivalence class, but the btree update path doesn't preserve this -
overwriting keys in old snapshots doesn't check for the equivalence
class being equal, and thus we can end up with duplicate keys in the
same equivalence class and fsck complaining about snapshot deletion not
having run correctly.

The equivalence class notion has been leaking out of the core snapshots
code and into too much other code, i.e. fsck, so this patch takes a
different approach: snapshot deletion now moves keys to the node in an
equivalence class being kept (the leafiest node) and then deletes the
redundant nodes in the equivalence class.

Some work has to be done to correctly delete interior snapshot nodes;
snapshot node depth and skiplist fields for descendent nodes have to be
fixed.
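
One way to picture "the leafiest node" of an equivalence class, assuming
it is found by following only-child links downward. The node layout and
the kept_node() helper are hypothetical, not the bcachefs code:

```c
#include <assert.h>

/* Hypothetical snapshot node: up to two children, 0 meaning "none". */
struct snap_node {
	unsigned children[2];
};

/* Follow a linear chain downward from id. The walk stops at a leaf or
 * at a node with two children, since neither is redundant. */
static unsigned kept_node(const struct snap_node *t, unsigned id)
{
	assert(id != 0);
	for (;;) {
		unsigned a = t[id].children[0];
		unsigned b = t[id].children[1];

		if ((a && b) || (!a && !b))
			return id;	/* branch point or leaf: keep it */
		id = a ? a : b;		/* exactly one child: keep walking */
	}
}
```

Every node visited before the returned one has a single child and would
be a candidate for deletion once its keys have been moved down.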

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix btree write buffer with snapshots btrees
Kent Overstreet [Mon, 21 Aug 2023 23:57:34 +0000 (19:57 -0400)]
bcachefs: Fix btree write buffer with snapshots btrees

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix is_ancestor bitmap
Kent Overstreet [Thu, 13 Jul 2023 06:43:29 +0000 (02:43 -0400)]
bcachefs: Fix is_ancestor bitmap

The is_ancestor bitmap is an optimization for bch2_snapshot_is_ancestor;
once we get sufficiently close to the ancestor ID we're searching for we
test a bitmap.

But initialization of the is_ancestor bitmap was broken; we do it by
using bch2_snapshot_parent(), but we call that on nodes that haven't
been initialized yet with bch2_mark_snapshot().

Fix this by adding a separate loop in bch2_snapshots_read() for
initializing the is_ancestor bitmap, and also add some new debug asserts
for checking this sort of breakage in the future.
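
The ordering problem can be shown with a deliberately simplified model.
Here every node carries a 32-bit bitmap of all its ancestors, whereas the
real bitmap is described above as covering only nearby IDs; the struct
and function names are assumptions for this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical node: parent link plus one bit per ancestor ID. */
struct snap_node {
	unsigned parent;	/* 0 for a root */
	uint32_t is_ancestor;	/* bit n set => node n is an ancestor */
};

static void snaps_init(struct snap_node *t, const unsigned *parents,
		       unsigned nr)
{
	unsigned i, p;

	assert(nr <= 32);	/* IDs must fit in the bitmap */

	/* Pass 1: set every parent link first. */
	for (i = 0; i < nr; i++)
		t[i].parent = parents[i];

	/* Pass 2: only now walk the links, since each walk may pass
	 * through nodes at any index. */
	for (i = 0; i < nr; i++) {
		t[i].is_ancestor = 0;
		for (p = t[i].parent; p; p = t[p].parent)
			t[i].is_ancestor |= UINT32_C(1) << p;
	}
}

/* Bitmap lookup; a node counts as its own ancestor in this sketch. */
static int is_ancestor(const struct snap_node *t, unsigned id, unsigned anc)
{
	return id == anc || ((t[id].is_ancestor >> anc) & 1u);
}
```

Folding both passes into one loop would read parent links of nodes that
have not been filled in yet, which mirrors the bug described above.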

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: move check_pos_snapshot_overwritten() to snapshot.c
Kent Overstreet [Sat, 19 Aug 2023 01:13:44 +0000 (21:13 -0400)]
bcachefs: move check_pos_snapshot_overwritten() to snapshot.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Fix bch2_mount error path
Kent Overstreet [Fri, 18 Aug 2023 21:44:21 +0000 (17:44 -0400)]
bcachefs: Fix bch2_mount error path

In the bch2_mount() error path, we were calling
deactivate_locked_super(), which calls ->kill_sb(), which in our case
was calling bch2_fs_free() without __bch2_fs_stop().

This changes bch2_mount() to just call bch2_fs_stop() directly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Delete a faulty assertion
Kent Overstreet [Fri, 18 Aug 2023 04:05:35 +0000 (00:05 -0400)]
bcachefs: Delete a faulty assertion

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 months agobcachefs: Improve btree_path_relock_fail tracepoint
Kent Overstreet [Fri, 18 Aug 2023 02:04:20 +0000 (22:04 -0400)]
bcachefs: Improve btree_path_relock_fail tracepoint

In https://github.com/koverstreet/bcachefs/issues/450, we're seeing
unexplained btree_path_relock_fail events - according to the information
currently in the tracepoint, it appears the relock should be succeeding.

This adds lock counts to the tracepoint to help track it down.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>