linux-block.git
6 weeks agobcachefs: Clean up option pre/post hooks, small fixes
Kent Overstreet [Tue, 15 Apr 2025 13:54:01 +0000 (09:54 -0400)]
bcachefs: Clean up option pre/post hooks, small fixes

The helpers are now:
- bch2_opt_hook_pre_set()
- bch2_opts_hooks_pre_set()
- bch2_opt_hook_post_set()

Fix a bug where the filesystem discard option would incorrectly be
changed when setting the device option, and don't trigger rebalance
scans unnecessarily (when options aren't changing).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Use drop_locks_do() in bch2_inode_hash_find()
Kent Overstreet [Sun, 13 Apr 2025 12:20:47 +0000 (08:20 -0400)]
bcachefs: Use drop_locks_do() in bch2_inode_hash_find()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Single device mode
Kent Overstreet [Wed, 2 Apr 2025 19:12:49 +0000 (15:12 -0400)]
bcachefs: Single device mode

Single device filesystems are now identified by the block device name,
not the UUID - and single device filesystems with the same UUID can be
mounted simultaneously, without any special options.

This allocates a new bit in the superblock, BCH_SB_MULTI_DEVICE, which
indicates whether a filesystem has ever been multi device.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Initialize c->name earlier on single dev filesystems
Kent Overstreet [Thu, 3 Apr 2025 17:10:03 +0000 (13:10 -0400)]
bcachefs: Initialize c->name earlier on single dev filesystems

On single device filesystems, c->name contains the block device name,
not the UUID.

Initialize this earlier, so that single device mode can use it for
initializing sysfs/debugfs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Simplify logic
Alan Huang [Tue, 15 Apr 2025 05:33:07 +0000 (13:33 +0800)]
bcachefs: Simplify logic

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Remove spurious +1/-1 operation
Alan Huang [Tue, 15 Apr 2025 05:33:06 +0000 (13:33 +0800)]
bcachefs: Remove spurious +1/-1 operation

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Kill bch2_trans_unlock_noassert
Alan Huang [Tue, 15 Apr 2025 05:33:04 +0000 (13:33 +0800)]
bcachefs: Kill bch2_trans_unlock_noassert

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Clean up duplicated code in bch2_journal_halt()
Kent Overstreet [Sun, 13 Apr 2025 21:59:10 +0000 (17:59 -0400)]
bcachefs: Clean up duplicated code in bch2_journal_halt()

It's now a wrapper around bch2_journal_halt_locked().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: bch2_dev_allocator_set_rw()
Kent Overstreet [Sun, 13 Apr 2025 20:29:36 +0000 (16:29 -0400)]
bcachefs: bch2_dev_allocator_set_rw()

Add a helper that lets us change bch_member.data_allowed at runtime.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: bch2_dev_journal_alloc() now respects data_allowed
Kent Overstreet [Sun, 13 Apr 2025 17:52:12 +0000 (13:52 -0400)]
bcachefs: bch2_dev_journal_alloc() now respects data_allowed

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Improve bch2_btree_cache_to_text()
Kent Overstreet [Sun, 13 Apr 2025 11:47:47 +0000 (07:47 -0400)]
bcachefs: Improve bch2_btree_cache_to_text()

Make the output slightly clearer, and include a counter for "nodes we
couldn't free because we would have gone under our reserve".

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: __btree_node_reclaim_checks()
Kent Overstreet [Sun, 13 Apr 2025 11:45:13 +0000 (07:45 -0400)]
bcachefs: __btree_node_reclaim_checks()

Factor out a helper so we're not duplicating checks after locking the
btree node.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: kill BTREE_CACHE_NOT_FREED_INCREMENT()
Kent Overstreet [Sun, 13 Apr 2025 11:42:46 +0000 (07:42 -0400)]
bcachefs: kill BTREE_CACHE_NOT_FREED_INCREMENT()

Small cleanup, just always increment the counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Improve opts.degraded
Kent Overstreet [Sun, 6 Apr 2025 17:50:20 +0000 (13:50 -0400)]
bcachefs: Improve opts.degraded

Kill 'opts.very_degraded', and make 'opts.degraded' a persistent option,
stored in the superblock.

It's now an enum, with available choices ask/yes/very/no.

"ask" mode will be handled by the mount helper, for prompting the user
(on a machine used interactively) for whether to do a degraded mount.
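
A minimal sketch of how a four-way option like this might be parsed from a
string. The enum and function names here are hypothetical, and the numeric
values are an assumption; only the choices ask/yes/very/no come from this
commit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names; only the four choices are taken from the commit. */
enum degraded_mode {
	DEGRADED_ASK,
	DEGRADED_YES,
	DEGRADED_VERY,
	DEGRADED_NO,
	DEGRADED_NR,
};

static const char * const degraded_names[DEGRADED_NR] = {
	[DEGRADED_ASK]  = "ask",
	[DEGRADED_YES]  = "yes",
	[DEGRADED_VERY] = "very",
	[DEGRADED_NO]   = "no",
};

/* Returns the enum value for a string, or -1 if it is not a valid choice. */
static int parse_degraded(const char *s)
{
	assert(s != NULL);

	for (int i = 0; i < DEGRADED_NR; i++)
		if (!strcmp(s, degraded_names[i]))
			return i;
	return -1;
}
```

With this shape, parse_degraded("very") yields DEGRADED_VERY and an unknown
string is rejected rather than silently mapped to a default.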

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: export bch2_chacha20
Kent Overstreet [Sun, 13 Apr 2025 09:23:40 +0000 (05:23 -0400)]
bcachefs: export bch2_chacha20

Needed for userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: indent error messages of invalid compression
Integral [Tue, 8 Apr 2025 10:31:29 +0000 (18:31 +0800)]
bcachefs: indent error messages of invalid compression

This patch uses printbuf_indent_add_nextline() to set a consistent
indentation level for error messages of invalid compression.

In my previous patch [1], the newline is added by using '\n' in
the argument of prt_str(). This patch replaces prt_str() with
prt_printf() to make indentation level work correctly.

Link: https://lore.kernel.org/20250406152659.205997-2-integral@archlinuxcn.org
Signed-off-by: Integral <integral@archlinuxcn.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: split error messages of invalid compression into two lines
Integral [Sun, 6 Apr 2025 15:26:59 +0000 (23:26 +0800)]
bcachefs: split error messages of invalid compression into two lines

When an invalid compression type or level is passed as an argument
to `--compression`, two error messages are squashed into one line:

    > bcachefs format --compression=lzo bcachefs-comp.img
    invalid option: invalid compression typecompression: parse error

    > bcachefs format --compression=lz4:16 bcachefs-comp.img
    invalid option: invalid compression levelcompression: parse error

To resolve this issue, add a newline character at the end of the
first error message to separate them into two lines.

Signed-off-by: Integral <integral@archlinuxcn.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: early return for negative values when parsing BCH_OPT_UINT
Integral [Sun, 6 Apr 2025 14:53:28 +0000 (22:53 +0800)]
bcachefs: early return for negative values when parsing BCH_OPT_UINT

Currently, when passing a negative integer as argument, the error
message is "too big" due to casting to an unsigned integer:

    > bcachefs format --block_size=-1 bcachefs.img
    invalid option: block_size: too big (max 65536)

When a negative value is detected in the argument, return early before
calling bch2_opt_validate().

A new error code `BCH_ERR_option_negative` is added.
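
The idea can be sketched in userspace C as follows. This is not the bcachefs
parser: parse_uint_opt() and the OPT_* codes are hypothetical stand-ins
(OPT_ERR_NEGATIVE playing the role of BCH_ERR_option_negative). It relies on
the fact that strtoull() accepts a leading minus sign and wraps the result,
which is how "-1" would otherwise end up reported as "too big".

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical error codes for this sketch. */
enum { OPT_OK = 0, OPT_ERR_NEGATIVE, OPT_ERR_TOO_BIG, OPT_ERR_PARSE };

static int parse_uint_opt(const char *s, uint64_t max, uint64_t *out)
{
	char *end;

	assert(s != NULL && out != NULL);

	while (*s == ' ')
		s++;

	/* Early return: reject a minus sign before any unsigned conversion. */
	if (*s == '-')
		return OPT_ERR_NEGATIVE;

	errno = 0;
	unsigned long long v = strtoull(s, &end, 10);
	if (errno || end == s || *end)
		return OPT_ERR_PARSE;

	/* Range validation only ever sees values that were really unsigned. */
	if (v > max)
		return OPT_ERR_TOO_BIG;

	*out = v;
	return OPT_OK;
}
```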

Signed-off-by: Integral <integral@archlinuxcn.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: move_data_phys: stats are not required
Kent Overstreet [Fri, 4 Apr 2025 00:56:09 +0000 (20:56 -0400)]
bcachefs: move_data_phys: stats are not required

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: RO mounts now use less memory
Kent Overstreet [Sat, 5 Apr 2025 21:36:04 +0000 (17:36 -0400)]
bcachefs: RO mounts now use less memory

Defer memory allocations only needed in RW mode until we actually go RW.

This is part of improved support for RO images.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Move various init code to _init_early()
Kent Overstreet [Sat, 5 Apr 2025 23:30:43 +0000 (19:30 -0400)]
bcachefs: Move various init code to _init_early()

_init_early() is for initialization that cannot fail, and often must
happen for teardown partway through initialization to work.
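
A rough sketch of the pattern, with hypothetical names (struct fs_sketch,
fs_init_early() and friends are not bcachefs functions): the early step only
puts fields into a known state and cannot fail, so the exit path is safe to
run no matter how far the fallible init got.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct fs_sketch {
	int	*buf_a;
	int	*buf_b;
	bool	early_done;
};

/* Cannot fail: only puts fields into a state that fs_exit() can handle. */
static void fs_init_early(struct fs_sketch *c)
{
	c->buf_a = NULL;
	c->buf_b = NULL;
	c->early_done = true;
}

/* Safe after fs_init_early(), even if fs_init() stopped partway. */
static void fs_exit(struct fs_sketch *c)
{
	assert(c->early_done);
	free(c->buf_a);
	free(c->buf_b);
	c->buf_a = NULL;
	c->buf_b = NULL;
}

/* Fallible init; fail_second simulates an allocation failure. */
static int fs_init(struct fs_sketch *c, bool fail_second)
{
	c->buf_a = malloc(64 * sizeof(int));
	if (!c->buf_a)
		goto err;

	c->buf_b = fail_second ? NULL : malloc(64 * sizeof(int));
	if (!c->buf_b)
		goto err;
	return 0;
err:
	fs_exit(c);
	return -1;
}
```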

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: alphabetize init function calls
Kent Overstreet [Sat, 5 Apr 2025 23:41:35 +0000 (19:41 -0400)]
bcachefs: alphabetize init function calls

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: simplify journal pin initialization
Kent Overstreet [Sat, 5 Apr 2025 23:26:19 +0000 (19:26 -0400)]
bcachefs: simplify journal pin initialization

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: btree_io_complete_wq -> btree_write_complete_wq
Kent Overstreet [Sat, 5 Apr 2025 23:23:52 +0000 (19:23 -0400)]
bcachefs: btree_io_complete_wq -> btree_write_complete_wq

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: bch2_kvmalloc() mem alloc profiling
Kent Overstreet [Fri, 4 Apr 2025 02:30:39 +0000 (22:30 -0400)]
bcachefs: bch2_kvmalloc() mem alloc profiling

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: add missing include
Kent Overstreet [Thu, 3 Apr 2025 16:18:39 +0000 (12:18 -0400)]
bcachefs: add missing include

Hygiene, and fix build in userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: bch2_snapshot_table_make_room()
Kent Overstreet [Wed, 2 Apr 2025 18:40:06 +0000 (14:40 -0400)]
bcachefs: bch2_snapshot_table_make_room()

Add a better helper for check_snapshot_exists().

create_snapids() can't be changed to use this, unfortunately, because
the transaction that creates a new snapshot will also be inserting other
keys (e.g. root inode) that reference that snapshot ID, and they expect
the snapshot table to already be updated.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: darray: provide typedefs for primitive types
Kent Overstreet [Wed, 2 Apr 2025 15:59:39 +0000 (11:59 -0400)]
bcachefs: darray: provide typedefs for primitive types

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: reduce new_stripe_alloc_buckets() stack usage
Kent Overstreet [Wed, 2 Apr 2025 21:23:22 +0000 (17:23 -0400)]
bcachefs: reduce new_stripe_alloc_buckets() stack usage

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: alloc_request no longer on stack
Kent Overstreet [Mon, 31 Mar 2025 21:50:52 +0000 (17:50 -0400)]
bcachefs: alloc_request no longer on stack

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: alloc_request.ptrs2
Kent Overstreet [Mon, 31 Mar 2025 21:57:06 +0000 (17:57 -0400)]
bcachefs: alloc_request.ptrs2

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: alloc_request.ca
Kent Overstreet [Mon, 31 Mar 2025 21:13:22 +0000 (17:13 -0400)]
bcachefs: alloc_request.ca

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: alloc_request.counters
Kent Overstreet [Mon, 31 Mar 2025 21:08:43 +0000 (17:08 -0400)]
bcachefs: alloc_request.counters

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: alloc_request.usage
Kent Overstreet [Mon, 31 Mar 2025 19:52:39 +0000 (15:52 -0400)]
bcachefs: alloc_request.usage

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: alloc_request: deallocate_extra_replicas()
Kent Overstreet [Mon, 31 Mar 2025 21:54:43 +0000 (17:54 -0400)]
bcachefs: alloc_request: deallocate_extra_replicas()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: new_stripe_alloc_buckets() takes alloc_request
Kent Overstreet [Mon, 31 Mar 2025 19:46:45 +0000 (15:46 -0400)]
bcachefs: new_stripe_alloc_buckets() takes alloc_request

More stack usage improvements: instead of creating a new alloc_request
(currently on the stack), save/restore just the fields we need to reuse.

This is a bit tricky, because we're doing a normal alloc_foreground.c
allocation, which calls into ec.c to get a stripe, which then does more
normal allocations - some of the fields get reused, and used
differently.

So we have to save and restore them - but the stack usage improvements
will be well worth it.
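
As an illustration only, the save/restore approach could look roughly like
this. The struct and field names are hypothetical and the nested allocation
is a stand-in loop; the point is that only the reused fields are copied to
the stack, not a whole second request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request; the real fields are not listed in this commit. */
struct req_sketch {
	unsigned nr_replicas;
	unsigned nr_effective;
};

/* Nested allocation that reuses the caller's request. Returns blocks allocated. */
static unsigned stripe_alloc_sketch(struct req_sketch *r, unsigned stripe_blocks)
{
	assert(r != NULL);

	/* Save only the fields the nested allocation is about to overwrite. */
	unsigned saved_replicas  = r->nr_replicas;
	unsigned saved_effective = r->nr_effective;

	/* Reuse the same fields with a different meaning for the stripe. */
	r->nr_replicas  = stripe_blocks;
	r->nr_effective = 0;
	while (r->nr_effective < r->nr_replicas)
		r->nr_effective++;	/* stand-in for allocating one block */

	unsigned got = r->nr_effective;

	/* Restore, so the outer allocation continues where it left off. */
	r->nr_replicas  = saved_replicas;
	r->nr_effective = saved_effective;
	return got;
}
```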

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: bch2_ec_stripe_head_get() takes alloc_request
Kent Overstreet [Mon, 31 Mar 2025 19:37:28 +0000 (15:37 -0400)]
bcachefs: bch2_ec_stripe_head_get() takes alloc_request

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: bch2_bucket_alloc_trans() takes alloc_request
Kent Overstreet [Sat, 22 Mar 2025 01:13:53 +0000 (21:13 -0400)]
bcachefs: bch2_bucket_alloc_trans() takes alloc_request

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: alloc_request.data_type
Kent Overstreet [Sat, 22 Mar 2025 01:06:43 +0000 (21:06 -0400)]
bcachefs: alloc_request.data_type

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: struct alloc_request
Kent Overstreet [Sat, 22 Mar 2025 00:42:42 +0000 (20:42 -0400)]
bcachefs: struct alloc_request

Add a struct for common state for satisfying an on disk allocation,
instead of passing the same long list of items to every function.

This will help with stack usage, performance, and perhaps enable some
code cleanups.
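
A small sketch of the general refactoring, assuming made-up fields (the
struct's real contents are not listed in this message): state that used to
travel as a long argument list is bundled once and passed by pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fields, for illustration only. */
struct alloc_request_sketch {
	unsigned nr_replicas;	/* replicas wanted */
	unsigned nr_effective;	/* replicas allocated so far */
	unsigned data_type;
	unsigned flags;
};

/* Previously this kind of helper would take each item as its own argument. */
static int alloc_one(struct alloc_request_sketch *req)
{
	assert(req != NULL);
	req->nr_effective++;	/* stand-in for picking a bucket */
	return 0;
}

static int alloc_all(struct alloc_request_sketch *req)
{
	while (req->nr_effective < req->nr_replicas) {
		int ret = alloc_one(req);
		if (ret)
			return ret;
	}
	return 0;
}
```

Each call now pushes a single pointer instead of a list of values, which is
where the stack usage and performance benefit would be expected to come from.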

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: trace bch2_trans_kmalloc()
Kent Overstreet [Tue, 1 Apr 2025 18:29:31 +0000 (14:29 -0400)]
bcachefs: trace bch2_trans_kmalloc()

We're occasionally seeing the WARN_ON() for bump allocator usage
exceeding BTREE_TRANS_MEM_MAX; add some tracing so we can see what's
going on.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: replace memcpy with memcpy_and_pad for jset_entry_log->d buff
Roxana Nicolescu [Thu, 27 Mar 2025 14:50:09 +0000 (14:50 +0000)]
bcachefs: replace memcpy with memcpy_and_pad for jset_entry_log->d buff

This was achieved before by zero-ing out the source buffer and then
copying the bytes into the destination buffer. This can also be done with
memcpy_and_pad which will zero out only the destination buffer if its
size is bigger than the size of the source buffer. This is already used in
the same way in journal_transaction_name().

Moreover, zero-ing the source buffer was done twice, first in
__bch2_fs_log_msg() and then in bch2_trans_log_msg(). And this method
may also require allocating some extra memory for the source buffer.

In conclusion, using memcpy_and_pad is better even though the result is
the same, because it brings uniformity with what's already used in
journal_transaction_name, and it avoids code duplication and allocating
extra memory.
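
For readers outside the kernel, the behaviour being relied on can be
approximated in userspace as below. memcpy_and_pad_sketch() is a stand-in
written for this illustration, not the kernel helper itself; it copies the
source bytes and pads only the tail of the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy count bytes from src into dest; if dest is larger, fill the rest of
 * dest with pad. If dest is smaller, copy only what fits.
 */
static void memcpy_and_pad_sketch(void *dest, size_t dest_len,
				  const void *src, size_t count, int pad)
{
	assert(dest != NULL && src != NULL);

	if (dest_len > count) {
		memcpy(dest, src, count);
		memset((char *)dest + count, pad, dest_len - count);
	} else {
		memcpy(dest, src, dest_len);
	}
}
```

The source buffer is never written to or enlarged, which is the saving over
zeroing a padded copy of the source first.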

Signed-off-by: Roxana Nicolescu <nicolescu.roxana@protonmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: replace strncpy() with memcpy_and_pad in journal_transaction_name
Roxana Nicolescu [Thu, 27 Mar 2025 14:50:05 +0000 (14:50 +0000)]
bcachefs: replace strncpy() with memcpy_and_pad in journal_transaction_name

strncpy() is now deprecated.
The destination buffer is not required to be NUL-terminated, but we also
want to zero out the rest of the buffer as it is already done in other
places.

Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Roxana Nicolescu <nicolescu.roxana@protonmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Rebalance now skips poisoned extents
Kent Overstreet [Tue, 11 Mar 2025 13:46:06 +0000 (09:46 -0400)]
bcachefs: Rebalance now skips poisoned extents

Let's not move poisoned extents unnecessarily, since we can't guard
against introducing more bitrot.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Data move can read from poisoned extents
Kent Overstreet [Tue, 11 Mar 2025 14:02:40 +0000 (10:02 -0400)]
bcachefs: Data move can read from poisoned extents

Now, if an extent is poisoned we can move it even if there was a
checksum error. We'll have to give it a new checksum, but the poison bit
means that userspace will still see the appropriate error when they try
to read it.
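
A toy model of that behaviour, under several assumptions: the struct, the
additive checksum and the choice of -EIO are all invented for this sketch and
do not reflect the on-disk format or the actual error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct extent_sketch {
	uint32_t csum;
	bool	 poisoned;
	uint8_t	 data[4];
};

/* Toy checksum: sum of bytes. */
static uint32_t csum_of(const uint8_t *d, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += d[i];
	return sum;
}

/* Userspace-facing read: a poisoned extent fails even if its checksum is valid. */
static int extent_read(const struct extent_sketch *e)
{
	assert(e != NULL);
	if (e->poisoned)
		return -EIO;
	if (csum_of(e->data, sizeof(e->data)) != e->csum)
		return -EIO;
	return 0;
}

/* Data move: remember the bitrot via the poison flag, then re-checksum. */
static void extent_move(struct extent_sketch *e)
{
	uint32_t now = csum_of(e->data, sizeof(e->data));

	if (now != e->csum)
		e->poisoned = true;
	e->csum = now;
}
```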

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Poison extents that can't be read due to checksum errors
Kent Overstreet [Mon, 10 Mar 2025 18:03:25 +0000 (14:03 -0400)]
bcachefs: Poison extents that can't be read due to checksum errors

Copygc needs to be able to move extents that have bitrotted. We don't
want to delete them - in the future we'll have an API for "read me the
data even if there's checksum errors", and in general we don't want to
delete anything unless the user asks us to.

That will require writing it with a new checksum, which means we can't
forget that there was a checksum error so we return the correct error to
userspace.

Rebalance also wants to skip bad extents; we can now use the poison flag
for that.

This is currently disabled by default, as we want read fua support so
that we can distinguish between transient and permanent errors from the
device. It may be enabled with the module parameter:

  poison_extents_on_checksum_error

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Be precise about bch_io_failures
Kent Overstreet [Mon, 10 Mar 2025 17:33:41 +0000 (13:33 -0400)]
bcachefs: Be precise about bch_io_failures

If the extent we're reading from changes, due to being overwritten or
moved (possibly partially) - we need to reset bch_io_failures so that we
don't accidentally mark a new extent as poisoned prematurely.

This means we have to separately track (in the retry path) the extent we
previously read from.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: bch2_subvolume_wait_for_pagecache_and_delete() cleanup
Kent Overstreet [Sat, 29 Mar 2025 21:59:30 +0000 (17:59 -0400)]
bcachefs: bch2_subvolume_wait_for_pagecache_and_delete() cleanup

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Check for casefolded dirents in non casefolded dirs
Kent Overstreet [Wed, 21 May 2025 04:38:04 +0000 (00:38 -0400)]
bcachefs: Check for casefolded dirents in non casefolded dirs

Check for mismatches between casefold dirents and casefold directories.

A mismatch will cause lookups to fail, as we'll be doing the lookup with
the casefolded name, which won't match the non-casefolded dirent, and
vice versa.

Reported-by: Christopher Snowhill <chris@kode54.net>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Fix bch2_dirent_create_snapshot() for casefolding
Kent Overstreet [Wed, 21 May 2025 04:41:07 +0000 (00:41 -0400)]
bcachefs: Fix bch2_dirent_create_snapshot() for casefolding

bch2_dirent_create_snapshot(), used in fsck, neglected to create a
casefolded dirent.

Just move this into dirent_create_key().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Fix casefold opt via xattr interface
Kent Overstreet [Wed, 21 May 2025 00:05:45 +0000 (20:05 -0400)]
bcachefs: Fix casefold opt via xattr interface

Changing the casefold option requires extra checks/work - factor out a
helper from bch2_fileattr_set() for the xattr code to use.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: mkwrite() now only dirties one page
Kent Overstreet [Thu, 15 May 2025 01:32:40 +0000 (21:32 -0400)]
bcachefs: mkwrite() now only dirties one page

Don't dirty the whole folio - fixes write amplification with
applications doing mmaped writes.

https://www.reddit.com/r/bcachefs/comments/1klzcg1/incredible_amounts_of_write_amplification_when/
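
The arithmetic can be sketched like so, assuming a 4096 byte page and a
page-aligned folio; the function name and struct are hypothetical and the
real dirty tracking may be finer grained than this.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u	/* assumed page size */

struct byte_range {
	uint64_t start;
	uint64_t len;
};

/* Range to dirty for a write fault at fault_pos inside the given folio. */
static struct byte_range mkwrite_dirty_range(uint64_t folio_pos,
					     uint64_t folio_size,
					     uint64_t fault_pos)
{
	assert(fault_pos >= folio_pos && fault_pos < folio_pos + folio_size);

	struct byte_range r;

	/* Round down to the page containing the fault; dirty just that page. */
	r.start = fault_pos - fault_pos % PAGE_SIZE_SKETCH;
	r.len   = PAGE_SIZE_SKETCH;
	return r;
}
```

For a fault in a 2 MiB folio this dirties 4 KiB instead of the whole 2 MiB,
i.e. 512 times less data to write back for a single touched page.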

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: fix extent_has_stripe_ptr()
Kent Overstreet [Mon, 19 May 2025 02:32:30 +0000 (22:32 -0400)]
bcachefs: fix extent_has_stripe_ptr()

This wasn't checking indirect extents.

Fixes: https://github.com/koverstreet/bcachefs/issues/887
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: Fix bch2_btree_path_traverse_cached() when paths realloced
Kent Overstreet [Sat, 17 May 2025 22:37:02 +0000 (18:37 -0400)]
bcachefs: Fix bch2_btree_path_traverse_cached() when paths realloced

btree_key_cache_fill() will allocate and traverse another path (for the
underlying btree), so we can't hold pointers to paths across a call - we
have to pass indices.
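
The hazard is the usual one with growable arrays, shown here in a standalone
sketch. The types and functions below are hypothetical stand-ins, not the
btree_trans code: allocating another path may realloc() the array, so the
callee takes an index and re-derives the pointer afterwards.

```c
#include <assert.h>
#include <stdlib.h>

struct path_sketch { int level; };

struct trans_sketch {
	struct path_sketch *paths;
	size_t nr, cap;
};

/* Returns the new path's index, or (size_t)-1 on failure. May move t->paths. */
static size_t path_alloc(struct trans_sketch *t)
{
	if (t->nr == t->cap) {
		size_t cap = t->cap ? t->cap * 2 : 1;
		struct path_sketch *p = realloc(t->paths, cap * sizeof(*p));

		if (!p)
			return (size_t)-1;
		t->paths = p;
		t->cap = cap;
	}
	t->paths[t->nr].level = 0;
	return t->nr++;
}

/* Takes an index, not a pointer: the nested path_alloc() may realloc. */
static int traverse_cached_sketch(struct trans_sketch *t, size_t idx)
{
	assert(idx < t->nr);

	size_t other = path_alloc(t);	/* pointers into t->paths are stale now */
	if (other == (size_t)-1)
		return -1;

	t->paths[idx].level = 1;	/* re-derive from the index after the call */
	return 0;
}
```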

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
7 weeks agobcachefs: fix wrong arg to fsck_err()
Kent Overstreet [Wed, 14 May 2025 22:53:48 +0000 (18:53 -0400)]
bcachefs: fix wrong arg to fsck_err()

fsck_err() needs the btree transaction passed to it if there is one - so
that it can unlock/relock around prompting userspace for fixing the
error.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
7 weeks agobcachefs: Fix missing commit in backpointer to missing target
Kent Overstreet [Fri, 9 May 2025 19:05:19 +0000 (15:05 -0400)]
bcachefs: Fix missing commit in backpointer to missing target

Fsck wants to do transaction commits from an outer context; it may have
other repair to do (e.g. duplicate backpointers).

But when calling backpointer_not_found() from runtime code, i.e. runtime
self healing, we should be doing the commit - the outer context expects
to just be doing lookups.

This fixes bugs where we get stuck spinning, reported as "RCU lock hold
time" warnings.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
7 weeks agobcachefs: Fix accidental O(n^2) in fiemap
Kent Overstreet [Wed, 14 May 2025 17:40:47 +0000 (13:40 -0400)]
bcachefs: Fix accidental O(n^2) in fiemap

Since bch2_seek_pagecache_data() searches for dirty data, we only want
to call it for holes in the extents btree - otherwise we have an
accidental O(n^2), as we repeatedly search the same range.

Reported-by: Marcin Mirosław <marcin@mejor.pl>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
7 weeks agobcachefs: Fix set_should_be_locked() call in peek_slot()
Kent Overstreet [Wed, 14 May 2025 01:14:17 +0000 (21:14 -0400)]
bcachefs: Fix set_should_be_locked() call in peek_slot()

set_should_be_locked() needs to be called before peek_key_cache(), which
traverses other paths and may do a trans unlock/relock.

This fixes an assertion pop in path_peek_slot(), when the path we're
using is unexpectedly not uptodate.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
7 weeks agobcachefs: Fix self deadlock
Alan Huang [Tue, 13 May 2025 10:54:26 +0000 (18:54 +0800)]
bcachefs: Fix self deadlock

Before invoking bch2_accounting_mem_mod_locked() in
bch2_gc_accounting_done(), we have already write-locked mark_lock; then,
in bch2_accounting_mem_insert(), we lock mark_lock again.
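
The shape of the problem, and one common way of avoiding it, can be sketched
with a toy non-recursive lock. Everything here is hypothetical (it is not how
mark_lock is implemented, and not necessarily how this patch fixes it); the
trylock stands in for a blocking lock so the sketch can report the deadlock
instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: a second acquire by the holder cannot succeed. */
struct toy_lock { bool held; };

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct accounting_sketch {
	struct toy_lock mark_lock;
	int nr;
};

/* _locked variant: the caller must already hold mark_lock. */
static void mem_insert_locked(struct accounting_sketch *a)
{
	assert(a->mark_lock.held);
	a->nr++;
}

/*
 * Takes the lock itself. Returns false where a blocking lock would
 * self-deadlock because the caller already holds mark_lock.
 */
static bool mem_insert(struct accounting_sketch *a)
{
	if (!toy_trylock(&a->mark_lock))
		return false;
	mem_insert_locked(a);
	toy_unlock(&a->mark_lock);
	return true;
}
```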

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
7 weeks agobcachefs: Don't set btree nodes as accessed on fill
Kent Overstreet [Tue, 13 May 2025 18:27:01 +0000 (14:27 -0400)]
bcachefs: Don't set btree nodes as accessed on fill

Prevent jobs that do lots of scanning (i.e. evacuate, scrub) from
causing OOMs.

The shrinker code seems to be having issues when it doesn't do any
freeing because it's just flipping off the accessed bit - and the
accessed bit shouldn't be set on first use anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
7 weeks agobcachefs: Fix livelock in journal_entry_open()
Kent Overstreet [Tue, 13 May 2025 16:55:44 +0000 (12:55 -0400)]
bcachefs: Fix livelock in journal_entry_open()

When the journal is low on space, we might do discards from
journal_res_get() -> journal_entry_open().

Make sure we set j->can_discard correctly, so that if we're low on space
but not because discards aren't keeping up we don't livelock.

Fixes: 8e4d28036c29 ("bcachefs: Don't aggressively discard the journal")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
7 weeks agobcachefs: Fix broken btree_path lock invariants in next_node()
Kent Overstreet [Sat, 10 May 2025 18:42:37 +0000 (14:42 -0400)]
bcachefs: Fix broken btree_path lock invariants in next_node()

This fixes btree locking assert pops users were seeing during evacuate:

https://github.com/koverstreet/bcachefs/issues/878

May 09 22:45:02 sharon kernel: bcachefs (68116e25-fa2d-4c6f-86c7-e8b431d792ae):   bch2_btree_insert_node(): node not locked at level 1
May 09 22:45:02 sharon kernel:   bch2_btree_node_rewrite [bcachefs]: watermark=btree no_check_rw alloc l=0-1 mode=none nodes_written=0 cl.remaining=2 journal_seq=0
May 09 22:45:02 sharon kernel:   path: idx   1 ref 1:0   S B btree=alloc level=0 pos 0:3699637:0 0:3698012:1-0:3699637:0 bch2_move_btree.isra.0+0x1db/0x490 [bcachefs] uptodate 0 locks_want 2
May 09 22:45:02 sharon kernel:     l=0 locks intent seq 4 node ffff8bd700c93600
May 09 22:45:02 sharon kernel:     l=1 locks unlocked seq 1712 node ffff8bd6fd5e7a00
May 09 22:45:02 sharon kernel:     l=2 locks unlocked seq 2295 node ffff8bd6cc725400
May 09 22:45:02 sharon kernel:     l=3 locks unlocked seq 0 node 0000000000000000

Evacuate walks btree nodes with bch2_btree_iter_next_node() and rewrites
them; bch2_btree_update_start() upgrades the path to take intent locks
as far as it needs to.

But next_node() does low level unlock/relock calls on individual nodes,
and didn't handle the case where a path is supposed to be holding
multiple intent locks. If a path has locks_want > 1, it needs to be
either holding locks on all the btree nodes (at each level) requested,
or none of them.

Fix this with a bch2_btree_path_downgrade().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
7 weeks agobcachefs: Don't strip rebalance_opts from indirect extents
Kent Overstreet [Sat, 10 May 2025 15:30:21 +0000 (11:30 -0400)]
bcachefs: Don't strip rebalance_opts from indirect extents

Fix bch2_bkey_clear_needs_rebalance(): indirect extents are never
supposed to have bch_extent_rebalance stripped off, because that's how
we get the IO path options when we don't have the original inode it
belonged to.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 weeks agobcachefs: Don't aggressively discard the journal
Kent Overstreet [Wed, 7 May 2025 17:32:15 +0000 (13:32 -0400)]
bcachefs: Don't aggressively discard the journal

We frequently use 'bcachefs list_journal -a' for debugging, as it
provides a record of all btree transactions, and a history of what
happened.

But it's not so useful if we immediately discard journal buckets right
after they're no longer dirty.

This tweaks journal reclaim to only discard when we're low on space,
keeping the journal mostly un-discarded.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 weeks agobcachefs: Ensure superblock gets written when we go ERO
Kent Overstreet [Wed, 7 May 2025 20:54:25 +0000 (16:54 -0400)]
bcachefs: Ensure superblock gets written when we go ERO

When we go emergency read-only, make sure we do a final write_super() to
persist counters and error counts - this can be critical for piecing
together what fsck was doing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 weeks agobcachefs: Filter out harmless EROFS error messages
Kent Overstreet [Wed, 7 May 2025 17:50:00 +0000 (13:50 -0400)]
bcachefs: Filter out harmless EROFS error messages

These just indicate that we're shutting down.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 weeks agobcachefs: journal_shutdown is EROFS, not EIO
Kent Overstreet [Tue, 6 May 2025 04:22:26 +0000 (00:22 -0400)]
bcachefs: journal_shutdown is EROFS, not EIO

We often filter out EROFS errors to avoid log spew after an emergency
shutdown - journal_shutdown is just another emergency shutdown error.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 weeks agobcachefs: Call bch2_fs_start before getting vfs superblock
Kent Overstreet [Mon, 5 May 2025 19:52:57 +0000 (15:52 -0400)]
bcachefs: Call bch2_fs_start before getting vfs superblock

This reverts

1fdbe0b184c8 bcachefs: Make sure c->vfs_sb is set before starting fs

which switched up bch2_fs_get_tree() so that we got a superblock before
calling bch2_fs_start, so that c->vfs_sb would always be initialized
while the filesystem was active.

This turned out not to be necessary, because blk_holder_ops were
implemented using our own locking, not vfs locking.

And this had the side effect of creating a super_block and doing our
full recovery (including potentially fsck) before setting SB_BORN, which
causes things like sync calls to hang until our recovery is finished.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 weeks agobcachefs: fix hung task timeout in journal read
Kent Overstreet [Sun, 4 May 2025 22:46:16 +0000 (18:46 -0400)]
bcachefs: fix hung task timeout in journal read

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 weeks agobcachefs: Add missing barriers before wake_up_bit()
Kent Overstreet [Mon, 5 May 2025 18:13:21 +0000 (14:13 -0400)]
bcachefs: Add missing barriers before wake_up_bit()

wake_up() doesn't require a barrier - but wake_up_bit() does.

This only affected non x86, and primarily led to lost wakeups after
btree node reads.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 weeks agobcachefs: Ensure proper write alignment
Kent Overstreet [Sun, 4 May 2025 20:31:40 +0000 (16:31 -0400)]
bcachefs: Ensure proper write alignment

There was a buggy version of bcachefs-tools which picked misaligned
bucket sizes when formatting, and we're also about to do dynamic block
sizes - which will allow picking logical block size or physical block
size of the device per-write, allowing for better compression ratios at
the cost of slightly worse write performance (i.e. forcing the device to
do RMW or extra buffering).

To account for this, tweak bch2_alloc_sectors_start() to properly align
open_buckets to the blocksize of the write we're about to do.
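
The alignment step itself is simple rounding. A hedged sketch, with a
hypothetical helper name and sector counts chosen for illustration (the
actual change inside bch2_alloc_sectors_start() is not shown in this
message):

```c
#include <assert.h>
#include <stdint.h>

/* Round the usable sector count down to a multiple of the write's block size. */
static uint64_t align_sectors(uint64_t sectors_free, uint64_t block_sectors)
{
	assert(block_sectors > 0);
	return sectors_free - sectors_free % block_sectors;
}
```

With 8-sector blocks, 1001 free sectors become 1000 usable ones, so a write
never starts or ends partway through a block.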

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
8 weeks agobcachefs: Improve want_cached_ptr()
Kent Overstreet [Sun, 4 May 2025 19:01:34 +0000 (15:01 -0400)]
bcachefs: Improve want_cached_ptr()

If promote target isn't set, rebalance should still leave a cached copy
on the faster device.

Fall back to foreground_target if it's set, or allow a cached copy on
any device if neither is set.
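
The fallback order can be sketched as below. This is a simplification with
hypothetical names: targets are modelled as plain integers with 0 meaning
"unset", whereas real targets may name groups of devices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct io_opts_sketch {
	unsigned promote_target;	/* 0 means unset */
	unsigned foreground_target;	/* 0 means unset */
};

/* Should a cached copy be kept on a device belonging to dev_target? */
static bool want_cached_on(const struct io_opts_sketch *o, unsigned dev_target)
{
	assert(o != NULL);

	/* Prefer promote_target, then fall back to foreground_target. */
	unsigned t = o->promote_target ? o->promote_target
				       : o->foreground_target;

	/* Neither set: a cached copy is allowed on any device. */
	return t == 0 || t == dev_target;
}
```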

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months agobcachefs: thread_with_stdio: fix spinning instead of exiting
Kent Overstreet [Sun, 4 May 2025 17:50:09 +0000 (13:50 -0400)]
bcachefs: thread_with_stdio: fix spinning instead of exiting

bch2_stdio_redirect_vprintf() was missing a check for stdio->done, i.e.
exiting.

This caused the thread attempting to print to spin, and since it was
being called from the kthread run by thread_with_stdio, the userspace
side hung as well.

Change it to return -EPIPE - i.e. writing to a pipe that's been closed.
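
A userspace sketch of the check, with a hypothetical buffer struct and
function (this is not the thread_with_stdio code). A caller that retries on
-EAGAIN would spin forever once the reader is gone unless the done flag is
tested first.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct stdio_sketch {
	char	buf[8];
	size_t	used;
	bool	done;	/* set when the other side has exited */
};

/*
 * Returns bytes written, -EPIPE if the other side has exited, or -EAGAIN
 * if the buffer is full and the caller would normally wait and retry.
 */
static int redirect_write(struct stdio_sketch *s, const char *msg, size_t len)
{
	assert(s != NULL && msg != NULL);

	/* Check for exit before waiting for space, so we never spin. */
	if (s->done)
		return -EPIPE;

	size_t space = sizeof(s->buf) - s->used;
	if (!space)
		return -EAGAIN;
	if (len > space)
		len = space;

	memcpy(s->buf + s->used, msg, len);
	s->used += len;
	return (int)len;
}
```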

Reported-by: Jan Solanti <jhs@psonet.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months agobcachefs: Remove incorrect __counted_by annotation
Alan Huang [Thu, 1 May 2025 20:01:31 +0000 (04:01 +0800)]
bcachefs: Remove incorrect __counted_by annotation

This actually reverts 86e92eeeb237 ("bcachefs: Annotate struct bch_xattr
with __counted_by()").

After the x_name, there is a value. According to the discussion[1],
__counted_by assumes that the flexible array member contains exactly
the number of elements specified. Users have now come across
a false positive detection of an out of bounds write caused by
the __counted_by here[2], so revert that.

[1] https://lore.kernel.org/lkml/Zv8VDKWN1GzLRT-_@archlinux/T/#m0ce9541c5070146320efd4f928cc1ff8de69e9b2
[2] https://privatebin.net/?a0d4e97d590d71e1#9bLmp2Kb5NU6X6cZEucchDcu88HzUQwHUah8okKPReEt
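
To make the layout concrete, here is a standalone sketch with hypothetical
names (struct xattr_sketch is not struct bch_xattr). The flexible array holds
the name immediately followed by the value, so a bound equal to the name
length alone would flag the legitimate write of the value as out of bounds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct xattr_sketch {
	uint8_t	 name_len;
	uint16_t val_len;
	/*
	 * Holds name_len bytes of name followed by val_len bytes of value,
	 * so "counted by name_len" would understate its real size.
	 */
	char	 name_and_val[];
};

static struct xattr_sketch *xattr_new(const char *name,
				      const void *val, uint16_t val_len)
{
	size_t name_len = strlen(name);

	assert(name_len <= UINT8_MAX);

	struct xattr_sketch *x = malloc(sizeof(*x) + name_len + val_len);
	if (!x)
		return NULL;

	x->name_len = (uint8_t)name_len;
	x->val_len  = val_len;
	memcpy(x->name_and_val, name, name_len);
	/* This write lands past name_len, which is valid for this layout. */
	memcpy(x->name_and_val + name_len, val, val_len);
	return x;
}

/* The value starts right after the name inside the same array. */
static const char *xattr_val(const struct xattr_sketch *x)
{
	return x->name_and_val + x->name_len;
}
```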

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months agobcachefs: add missing sched_annotate_sleep()
Kent Overstreet [Thu, 1 May 2025 16:38:00 +0000 (12:38 -0400)]
bcachefs: add missing sched_annotate_sleep()

00594 ------------[ cut here ]------------
00594 do not call blocking ops when !TASK_RUNNING; state=2 set at [<000000003e51ef4a>] prepare_to_wait_event+0x5c/0x1c0
00594 WARNING: CPU: 12 PID: 1117 at kernel/sched/core.c:8741 __might_sleep+0x74/0x88
00594 Modules linked in:
00594 CPU: 12 UID: 0 PID: 1117 Comm: umount Not tainted 6.15.0-rc4-ktest-g3a72e369412d #21845 PREEMPT
00594 Hardware name: linux,dummy-virt (DT)
00594 pstate: 60001005 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00594 pc : __might_sleep+0x74/0x88
00594 lr : __might_sleep+0x74/0x88
00594 sp : ffffff80c8d67a90
00594 x29: ffffff80c8d67a90 x28: ffffff80f5903500 x27: 0000000000000000
00594 x26: 0000000000000000 x25: ffffff80cf5002a0 x24: ffffffc087dad000
00594 x23: ffffff80c8d67b40 x22: 0000000000000000 x21: 0000000000000000
00594 x20: 0000000000000242 x19: ffffffc080b92020 x18: 00000000ffffffff
00594 x17: 30303c5b20746120 x16: 74657320323d6574 x15: 617473203b474e49
00594 x14: 0000000000000001 x13: 00000000000c0000 x12: ffffff80facc0000
00594 x11: 0000000000000001 x10: 0000000000000001 x9 : ffffffc0800b0774
00594 x8 : c0000000fffbffff x7 : ffffffc087dac670 x6 : 00000000015fffa8
00594 x5 : ffffff80facbffa8 x4 : ffffff80fbd30b90 x3 : 0000000000000000
00594 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80f5903500
00594 Call trace:
00594  __might_sleep+0x74/0x88 (P)
00594  __mutex_lock+0x64/0x8d8
00594  mutex_lock_nested+0x28/0x38
00594  bch2_fs_ec_flush+0xf8/0x128
00594  __bch2_fs_read_only+0x54/0x1d8
00594  bch2_fs_read_only+0x3e0/0x438
00594  __bch2_fs_stop+0x5c/0x250
00594  bch2_put_super+0x18/0x28
00594  generic_shutdown_super+0x6c/0x140
00594  bch2_kill_sb+0x1c/0x38
00594  deactivate_locked_super+0x54/0xd0
00594  deactivate_super+0x70/0x90
00594  cleanup_mnt+0xec/0x188
00594  __cleanup_mnt+0x18/0x28
00594  task_work_run+0x90/0xd8
00594  do_notify_resume+0x138/0x148
00594  el0_svc+0x9c/0xa0
00594  el0t_64_sync_handler+0x104/0x130
00594  el0t_64_sync+0x154/0x158

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: Fix __bch2_dev_group_set()
Kent Overstreet [Thu, 1 May 2025 02:37:13 +0000 (22:37 -0400)]
bcachefs: Fix __bch2_dev_group_set()

bch2_sb_disk_groups_to_cpu() goes off of the superblock member info, so
we need to set that first.

Reported-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: Kill ERO for i_blocks check in truncate
Kent Overstreet [Thu, 1 May 2025 04:01:29 +0000 (00:01 -0400)]
bcachefs: Kill ERO for i_blocks check in truncate

Replace with logging the error in the superblock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: check for inode.bi_sectors underflow
Kent Overstreet [Thu, 1 May 2025 03:56:00 +0000 (23:56 -0400)]
bcachefs: check for inode.bi_sectors underflow

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: Kill ERO in __bch2_i_sectors_acct()
Kent Overstreet [Thu, 1 May 2025 03:18:49 +0000 (23:18 -0400)]
bcachefs: Kill ERO in __bch2_i_sectors_acct()

We won't be root causing this in the immediate future, and it's fairly
innocuous - so just log it in the superblock.

https://github.com/koverstreet/bcachefs/issues/869

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: readdir fixes
Kent Overstreet [Tue, 29 Apr 2025 18:30:01 +0000 (14:30 -0400)]
bcachefs: readdir fixes

- Don't call bch2_trans_relock() after dir_emit(); taking a transaction
  restart here will cause us to emit the same dirent to userspace twice

- Fix incorrect checking of the return value on dir_emit(): "true" means
  success, keep going, but bch2_dir_emit() needs to return true when
  we're finished iterating.
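
The return-value convention in the second item can be sketched in userspace
as below. All names here are hypothetical; the only thing taken from the
message is that the emit callback returns true on success while the wrapper
returns true when iteration should stop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the readdir context. */
struct dir_ctx_sketch {
	size_t pos;	/* index of the next entry to emit */
	size_t room;	/* entries the caller's buffer can still take */
	size_t emitted;	/* entries delivered so far */
};

/* Same sense as described for dir_emit(): true = success, keep going. */
static bool emit_sketch(struct dir_ctx_sketch *ctx, const char *name)
{
	(void) name;
	if (!ctx->room)
		return false;
	ctx->room--;
	ctx->emitted++;
	return true;
}

/* Wrapper with the inverted sense: true = finished iterating. */
static bool dir_emit_done_sketch(struct dir_ctx_sketch *ctx, const char *name)
{
	bool ok = emit_sketch(ctx, name);

	if (ok)
		ctx->pos++;	/* advance only past entries actually delivered */
	return !ok;
}

static void readdir_sketch(struct dir_ctx_sketch *ctx,
			   const char *const *names, size_t nr)
{
	while (ctx->pos < nr)
		if (dir_emit_done_sketch(ctx, names[ctx->pos]))
			break;
}
```

Because pos only advances after a successful emit, a second call with more
room resumes at the first undelivered entry and nothing is emitted twice.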

https://github.com/koverstreet/bcachefs/issues/867

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: improve missing journal write device error message
Kent Overstreet [Tue, 29 Apr 2025 03:33:06 +0000 (23:33 -0400)]
bcachefs: improve missing journal write device error message

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: Topology error after insert is now an ERO
Kent Overstreet [Tue, 29 Apr 2025 00:38:04 +0000 (20:38 -0400)]
bcachefs: Topology error after insert is now an ERO

A user hit this, and this will naturally be easier to debug if we don't
panic.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: Use bch2_kvmalloc() for journal keys array
Kent Overstreet [Tue, 29 Apr 2025 00:28:58 +0000 (20:28 -0400)]
bcachefs: Use bch2_kvmalloc() for journal keys array

We can hit this limit fairly easily when we have to reconstruct large
amounts of alloc info on large filesystems.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: More informative error message when shutting down due to error
Kent Overstreet [Tue, 29 Apr 2025 00:25:15 +0000 (20:25 -0400)]
bcachefs: More informative error message when shutting down due to error

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: btree_root_unreadable_and_scan_found_nothing autofix for non data btrees
Kent Overstreet [Tue, 29 Apr 2025 00:12:01 +0000 (20:12 -0400)]
bcachefs: btree_root_unreadable_and_scan_found_nothing autofix for non data btrees

If losing a btree won't cause data loss - i.e. it's an alloc btree, or
we can easily reconstruct it - we shouldn't require user action to
continue repair.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: btree_node_data_missing is now autofix
Kent Overstreet [Sat, 26 Apr 2025 13:31:23 +0000 (09:31 -0400)]
bcachefs: btree_node_data_missing is now autofix

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: Don't generate alloc updates to invalid buckets
Kent Overstreet [Mon, 28 Apr 2025 16:11:31 +0000 (12:11 -0400)]
bcachefs: Don't generate alloc updates to invalid buckets

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: Improve bch2_dev_bucket_missing()
Kent Overstreet [Mon, 28 Apr 2025 16:01:51 +0000 (12:01 -0400)]
bcachefs: Improve bch2_dev_bucket_missing()

More useful error message.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: fix bch2_dev_buckets_resize()
Kent Overstreet [Mon, 28 Apr 2025 16:09:53 +0000 (12:09 -0400)]
bcachefs: fix bch2_dev_buckets_resize()

The resize memcpy path was totally busted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: Add upgrade table entry from 0.14
Kent Overstreet [Sat, 26 Apr 2025 15:05:32 +0000 (11:05 -0400)]
bcachefs: Add upgrade table entry from 0.14

There are a few errors that needed to be marked as autofix.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: Run BCH_RECOVERY_PASS_reconstruct_snapshots on missing subvol -> snapshot
Kent Overstreet [Sat, 26 Apr 2025 15:05:32 +0000 (11:05 -0400)]
bcachefs: Run BCH_RECOVERY_PASS_reconstruct_snapshots on missing subvol -> snapshot

Fix this repair path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: Add missing utf8_unload()
Kent Overstreet [Sat, 26 Apr 2025 16:19:47 +0000 (12:19 -0400)]
bcachefs: Add missing utf8_unload()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: Emit unicode version message on startup
Kent Overstreet [Sat, 26 Apr 2025 16:09:33 +0000 (12:09 -0400)]
bcachefs: Emit unicode version message on startup

fstests expects this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: Use generic_set_sb_d_ops for standard casefolding d_ops
Kent Overstreet [Sat, 26 Apr 2025 15:38:58 +0000 (11:38 -0400)]
bcachefs: Use generic_set_sb_d_ops for standard casefolding d_ops

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago bcachefs: Fix losing return code in next_fiemap_extent()
Kent Overstreet [Sun, 27 Apr 2025 00:07:24 +0000 (20:07 -0400)]
bcachefs: Fix losing return code in next_fiemap_extent()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2 months ago Linux 6.15-rc4 v6.15-rc4
Linus Torvalds [Sun, 27 Apr 2025 22:19:23 +0000 (15:19 -0700)]
Linux 6.15-rc4

2 months ago Merge tag 'pci-v6.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Sat, 26 Apr 2025 20:02:36 +0000 (13:02 -0700)]
Merge tag 'pci-v6.15-fixes-3' of git://git./linux/kernel/git/pci/pci

Pull PCI fixes from Bjorn Helgaas:

 - When releasing a start-aligned resource, e.g., a bridge window, save
   start/end/flags for the next assignment attempt; fixes a v6.15-rc1
   regression (Ilpo Järvinen)

 - Move set_pcie_speed.sh from TEST_PROGS to TEST_FILE; fixes a bwctrl
   selftest v6.15-rc1 regression (Ilpo Järvinen)

 - Add Manivannan Sadhasivam as maintainer of native host bridge and
   endpoint drivers (Manivannan Sadhasivam)

 - In endpoint test driver, defer IRQ allocation from .probe() until
   ioctl() to fix a regression on platforms where the Vendor/Device ID
   match doesn't include driver_data (Niklas Cassel)

* tag 'pci-v6.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  misc: pci_endpoint_test: Defer IRQ allocation until ioctl(PCITEST_SET_IRQTYPE)
  MAINTAINERS: Move Manivannan Sadhasivam as PCI Native host bridge and endpoint maintainer
  selftests/pcie_bwctrl: Fix test progs list
  PCI: Restore assigned resources fully after release

2 months ago Merge tag 'nfsd-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Sat, 26 Apr 2025 17:43:03 +0000 (10:43 -0700)]
Merge tag 'nfsd-6.15-2' of git://git./linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Revert a v6.15 patch due to a report of SELinux test failures

* tag 'nfsd-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  Revert "sunrpc: clean cache_detail immediately when flush is written frequently"

2 months ago Merge tag 'x86-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 26 Apr 2025 16:45:54 +0000 (09:45 -0700)]
Merge tag 'x86-urgent-2025-04-26' of git://git./linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

 - Fix 32-bit kernel boot crash if passed physical memory with more than
   32 address bits

 - Fix Xen PV crash

 - Work around build bug in certain limited build environments

 - Fix CTEST instruction decoding in insn_decoder_test

* tag 'x86-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/insn: Fix CTEST instruction decoding
  x86/boot: Work around broken busybox 'truncate' tool
  x86/mm: Fix _pgd_alloc() for Xen PV mode
  x86/e820: Discard high memory that can't be addressed by 32-bit systems

2 months ago Merge tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 26 Apr 2025 16:23:20 +0000 (09:23 -0700)]
Merge tag 'sched-urgent-2025-04-26' of git://git./linux/kernel/git/tip/tip

Pull scheduler fix from Ingo Molnar:
 "Fix sporadic crashes in dequeue_entities() due to ... bad math.

  [ Arguably if pick_eevdf()/pick_next_entity() was less trusting of
    complex math being correct it could have de-escalated a crash into
    a warning, but that's for a different patch ]"

* tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash