Mike Snitzer [Sat, 27 Jan 2024 02:35:46 +0000 (21:35 -0500)]
dm vdo block-map: use uds_log_ratelimit() rather than open code it
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Mike Snitzer [Sat, 27 Jan 2024 02:35:45 +0000 (21:35 -0500)]
dm vdo block-map: fix a few small nits
Rename 'pages' to 'num_pages' in distribute_page_over_waitq().
Update assert message in validate_completed_page() to model others.
Tweak line-wrapping on a comment that was needlessly long.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Mike Snitzer [Sat, 27 Jan 2024 02:18:33 +0000 (21:18 -0500)]
dm vdo: use a proper Makefile for dm-vdo
Requires moving dm-vdo-target.c into drivers/md/dm-vdo/
This change adds a proper drivers/md/dm-vdo/Makefile and eliminates
the abnormal use of patsubst in drivers/md/Makefile -- which was the
cause of at least one build failure that was reported by the upstream
build bot.
Also, split out VDO's drivers/md/dm-vdo/Kconfig and include it from
drivers/md/Kconfig
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Mike Snitzer [Fri, 1 Dec 2023 00:54:56 +0000 (19:54 -0500)]
dm vdo: fix how dm_kcopyd_client_create() failure is checked
dm_kcopyd_client_create() returns an ERR_PTR so its return must be
checked with IS_ERR().
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chung Chung <cchung@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Bruce Johnston [Mon, 20 Nov 2023 22:29:56 +0000 (17:29 -0500)]
dm vdo int-map: remove unused parameter from vdo_int_map_create
Reviewed-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Bruce Johnston [Mon, 20 Nov 2023 22:29:55 +0000 (17:29 -0500)]
dm vdo int-map: rename functions to use a common vdo_int_map preamble
Reviewed-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Bruce Johnston [Mon, 20 Nov 2023 22:29:54 +0000 (17:29 -0500)]
dm vdo dedupe: switch to using int-map instead of pointer-map
Use get_unaligned_le64() on the hash lock's record name to serve as
the key to use with the int hash-map.
Switching to using int hash-map removes the only consumer of pointer
hash-map, as such it is removed.
Reviewed-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Snitzer [Mon, 20 Nov 2023 22:29:20 +0000 (17:29 -0500)]
dm vdo wait-queue: rename to vdo_waitq_dequeue_waiter
Rename vdo_waitq_dequeue_next_waiter to vdo_waitq_dequeue_waiter. The
"next" aspect of returned waiter is implied. "next" also isn't
informative ("oldest" would be). Removing "next_" adds symmetry to
vdo_waitq_enqueue_waiter().
Also fix whitespace and comments from previous waitq commit.
Reviewed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Mike Snitzer [Mon, 20 Nov 2023 22:29:19 +0000 (17:29 -0500)]
dm vdo block-map: optimize enter_zone_read_only_mode
Rather than incrementally dequeue from the zone->flush_waiters
vdo_wait_queue, simply re-initialize it.
Reviewed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Mike Snitzer [Mon, 20 Nov 2023 22:29:18 +0000 (17:29 -0500)]
dm vdo wait-queue: optimize vdo_waitq_dequeue_matching_waiters
Remove temporary 'matched_waiters' waitq and just enqueue matched
waiters directly to the caller provided 'matched_waitq'.
Reviewed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Mike Snitzer [Mon, 20 Nov 2023 22:29:17 +0000 (17:29 -0500)]
dm vdo wait-queue: remove unused debug function vdo_waitq_get_next_waiter
Reviewed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Mike Snitzer [Mon, 20 Nov 2023 22:29:16 +0000 (17:29 -0500)]
dm vdo wait-queue: add proper namespace to interface
Rename various interfaces and structs associated with vdo's wait-queue,
e.g.: s/wait_queue/vdo_wait_queue/, s/waiter/vdo_waiter/, etc.
Now all function names start with "vdo_waitq_" or "vdo_waiter_".
Reviewed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Mike Snitzer [Fri, 25 Aug 2023 19:16:17 +0000 (15:16 -0400)]
dm vdo io-submitter: rename to vdo_submit_vio and submit_data_vio
Rename process_vio_io() to vdo_submit_vio(), and process_data_vio_io() to
submit_data_vio().
Reviewed-by: Susan LeGendre-McGhee <slegendr@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Mike Snitzer [Fri, 25 Aug 2023 18:52:05 +0000 (14:52 -0400)]
dm vdo io-submitter: rename to vdo_submit_data_vio
Rename submit_data_vio_io() to vdo_submit_data_vio().
Reviewed-by: Susan LeGendre-McGhee <slegendr@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Mike Snitzer [Fri, 25 Aug 2023 18:43:30 +0000 (14:43 -0400)]
dm vdo io-submitter: rename to vdo_submit_flush_vio
Rename submit_flush_vio() to vdo_submit_flush_vio().
Reviewed-by: Susan LeGendre-McGhee <slegendr@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Mike Snitzer [Fri, 25 Aug 2023 18:36:46 +0000 (14:36 -0400)]
dm vdo io-submitter: rename to vdo_submit_metadata_vio
Rename submit_metadata_vio() to vdo_submit_metadata_vio().
Reviewed-by: Susan LeGendre-McGhee <slegendr@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Mike Snitzer [Fri, 25 Aug 2023 16:42:25 +0000 (12:42 -0400)]
dm vdo io-submitter: remove get_bio_sector
Just open-code access to bio's sector.
Reviewed-by: Susan LeGendre-McGhee <slegendr@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Matthew Sakai [Fri, 17 Nov 2023 02:32:39 +0000 (21:32 -0500)]
dm vdo: add MAINTAINERS file entry
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 02:28:35 +0000 (21:28 -0500)]
dm vdo: enable configuration and building of dm-vdo
dm-vdo targets are not supported for 32-bit configurations. A vdo target
typically requires 1 to 1.5 GB of memory at any given time, which is likely
a large fraction of the addressable memory of a 32-bit system. At the same
time, the amount of addressable storage attached to a 32-bit system may not
be large enough for deduplication to provide much benefit. Because of these
concerns, 32-bit platforms are deemed unlikely to benefit from using a vdo
target, so dm-vdo is targeted only at 64-bit platforms.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: John Wiele <jwiele@redhat.com>
Signed-off-by: John Wiele <jwiele@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 02:24:06 +0000 (21:24 -0500)]
dm vdo: add the top-level DM target
This adds the dm-vdo target.
The dm-vdo target provides inline deduplication, compression, and
zero-block elimination, allowing applications to consume less actual
storage than a normal target. By layering it with other device mapper
targets, it can add these features to any storage stack. It can also
provide a common deduplication pool for groups of targets. The vdo target
does not protect against data corruption, relying instead on integrity
protection of the storage below it.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Co-developed-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Bruce Johnston <bjohnsto@redhat.com>
Co-developed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 02:12:14 +0000 (21:12 -0500)]
dm vdo: add debugging support
Add support for dumping detailed vdo state to the kernel log via a dmsetup
message. The dump code is not thread-safe and is generally intended for use
only when the vdo is hung.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Co-developed-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Bruce Johnston <bjohnsto@redhat.com>
Co-developed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 02:11:23 +0000 (21:11 -0500)]
dm vdo: add sysfs support for setting parameters and fetching stats
Add data and methods setting run time parameters via sysfs, and to
make state and statistics information available through sysfs.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Co-developed-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 02:10:00 +0000 (21:10 -0500)]
dm vdo: add statistics reporting
Add data and methods to report statisics.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 02:09:25 +0000 (21:09 -0500)]
dm vdo: add the on-disk formats and marshalling of vdo structures
Add data and methods for marshalling and unmarshalling the persistent
metadata.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 02:08:47 +0000 (21:08 -0500)]
dm vdo: add the primary vdo structure
Add the data and methods that manage the dm-vdo target itself. This
includes the overall state of the target and its threads, the state of
the logical volumes, startup, shutdown, and statistics.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 02:08:05 +0000 (21:08 -0500)]
dm vdo: add repair of damaged vdo volumes
When a vdo is restarted after a crash, it will automatically attempt to
recover from its journals.
If a vdo encounters an unrecoverable error, it will enter read-only mode.
This mode indicates that some previously acknowledged data may have been
lost. The vdo may be instructed to rebuild as best it can in order to
return to a writable state. Although some data may be lost, this process
will ensure that the vdo's own metadata is self-consistent.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 02:07:26 +0000 (21:07 -0500)]
dm vdo: add the recovery journal
The recovery journal is used to amortize updates across the block map and
slab depot. Each write request causes an entry to be made in the journal.
Entries are either "data remappings" or "block map remappings." For a data
remapping, the journal records the logical address affected and its old and
new physical mappings. For a block map remapping, the journal records the
block map page number and the physical block allocated for it (block map
pages are never reclaimed, so the old mapping is always 0). Each journal
entry and the data write it represents must be stable on disk before the
other metadata structures may be updated to reflect the operation.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 02:06:33 +0000 (21:06 -0500)]
dm vdo: implement the block map page cache
The set of leaf pages of the block map tree is too large to fit in memory,
so each block map zone maintains a cache of leaf pages. This patch adds the
implementation of that cache.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 02:05:22 +0000 (21:05 -0500)]
dm vdo: add the block map
The block map contains the logical to physical mapping. It can be thought
of as an array with one entry per logical address. Each entry is 5 bytes:
36 bits contain the physical block number which holds the data for the
given logical address, and the remaining 4 bits are used to indicate the
nature of the mapping. Of the 16 possible states, one represents a logical
address which is unmapped (i.e. it has never been written, or has been
discarded), one represents an uncompressed block, and the other 14 states
are used to indicate that the mapped data is compressed, and which of the
compression slots in the compressed block this logical address maps to.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 02:00:27 +0000 (21:00 -0500)]
dm vdo: add the slab depot
Add the data and methods that implement the slab_depot that manages
the allocation of slabs of blocks added by the preceding patches.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 01:58:33 +0000 (20:58 -0500)]
dm vdo: add the block allocators and physical zones
Each slab is independent of every other. They are assigned to "physical
zones" in round-robin fashion. If there are P physical zones, then slab n
is assigned to zone n mod P. The set of slabs in each physical zone is
managed by a block allocator.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 01:57:50 +0000 (20:57 -0500)]
dm vdo: add the slab summary
The slab depot maintains an additional small data structure, the "slab
summary," which is used to reduce the amount of work needed to come back
online after a crash. The slab summary maintains an entry for each slab
indicating whether or not the slab has ever been used, whether it is clean
(i.e. all of its reference count updates have been persisted to storage),
and approximately how full it is. During recovery, each physical zone will
attempt to recover at least one slab, stopping whenever it has recovered a
slab which has some free blocks. Once each zone has some space (or has
determined that none is available), the target can resume normal operation
in a degraded mode. Read and write requests can be serviced, perhaps with
degraded performance, while the remainder of the dirty slabs are recovered.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 01:56:30 +0000 (20:56 -0500)]
dm vdo: add slab structure, slab journal and reference counters
Most of the vdo volume belongs to the slab depot. The depot contains a
collection of slabs. The slabs can be up to 32GB, and are divided into
three sections. Most of a slab consists of a linear sequence of 4K blocks.
These blocks are used either to store data, or to hold portions of the
block map (see subsequent patches). In addition to the data blocks, each
slab has a set of reference counters, using 1 byte for each data block.
Finally each slab has a journal. Reference updates are written to the slab
journal, which is written out one block at a time as each block fills. A
copy of the reference counters is kept in memory, and are written out a
block at a time, in oldest-dirtied-order whenever there is a need to
reclaim slab journal space. The journal is used both to ensure that the
main recovery journal (see subsequent patches) can regularly free up space,
and also to amortize the cost of updating individual reference blocks.
This patch adds the slab structure as well as the slab journal and
reference counters.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 01:13:38 +0000 (20:13 -0500)]
dm vdo: add the compressed block bin packer
When blocks do not deduplicate, vdo will attempt to compress them. Up to 14
compressed blocks may be packed into a single data block (this limitation
is imposed by the block map). The packer implements a simple best-fit
packing algorithm and also manages the formatting and writing of compressed
blocks when bins fill.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 01:13:06 +0000 (20:13 -0500)]
dm vdo: add use of deduplication index in hash zones
Add the data and methods that manage queries to the deduplication
index and the responses from the index.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Co-developed-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Bruce Johnston <bjohnsto@redhat.com>
Co-developed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 01:11:44 +0000 (20:11 -0500)]
dm vdo: add hash locks and hash zones
In order to deduplicate concurrent writes of the same data (to different
locations), data_vios which are writing the same data are grouped together
in a "hash lock," named for and keyed by the hash of the data being
written. Each hash lock is assigned to a hash zone based on a portion of
its hash.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 01:00:53 +0000 (20:00 -0500)]
dm vdo: add the vdo io_submitter
The io_submitter handles bio submission from vdo data store to the storage
below. It will merge bios when possible.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Co-developed-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Bruce Johnston <bjohnsto@redhat.com>
Co-developed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 01:00:21 +0000 (20:00 -0500)]
dm vdo: add flush support
This patch adds support for handling incoming flush and/or FUA bios. Each
such bio is assigned to a struct vdo_flush. These are allocated as needed,
but there is always one kept in reserve in case allocations fail. In the
event of an allocation failure, bios may need to wait for an outstanding
flush to complete.
The logical address space is partitioned into logical zones, each handled
by its own thread. Each zone keeps a list of all data_vios handling write
requests for logical addresses in that zone. When a flush bio is processed,
each logical zone is informed of the flush. When all of the writes which
are in progress at the time of the notification have completed in all
zones, the flush bio is then allowed to complete.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:59:39 +0000 (19:59 -0500)]
dm vdo: add data_vio, the request object which services incoming bios
Add the data and methods that implement the data_vio object that
handles user data bios as they are processed.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Co-developed-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Bruce Johnston <bjohnsto@redhat.com>
Co-developed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:58:59 +0000 (19:58 -0500)]
dm vdo: add vio, the request object for vdo metadata
Add the data and methods that implement the vio object that is basic
unit of I/O in vdo.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Co-developed-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Bruce Johnston <bjohnsto@redhat.com>
Co-developed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:58:24 +0000 (19:58 -0500)]
dm vdo: add administrative state and action manager
This patch adds the admin_state structures which are used to track the
states of individual vdo components for handling of operations like suspend
and resume. It also adds the action manager which is used to schedule and
manage cross-thread administrative and internal operations.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:57:44 +0000 (19:57 -0500)]
dm vdo: implement external deduplication index interface
The deduplication index interface for index clients includes the
deduplication request and index session structures. This is the interface
that the rest of the vdo target uses to make requests, receive responses,
and collect statistics.
This patch also adds sysfs nodes for inspecting various index properties at
runtime.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Signed-off-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Co-developed-by: John Wiele <jwiele@redhat.com>
Signed-off-by: John Wiele <jwiele@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:56:07 +0000 (19:56 -0500)]
dm vdo: implement top-level deduplication index
The top-level deduplication index brings all the earlier components
together. The top-level index creates the separate zone structures that
enable the index to handle several requests in parallel, handles
dispatching requests to the right zones and components, and coordinates
metadata to ensure that it remain consistent. It also coordinates recovery
in the event of an unexpected index failure.
If sparse caching is enabled, the top-level index also handles the
coordination required by the sparse chapter index cache, which (unlike most
index structures) is shared among all zones.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Signed-off-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Co-developed-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:55:23 +0000 (19:55 -0500)]
dm vdo: implement the chapter volume store
The volume store structures manage the reading and writing of chapter
pages. When a chapter is closed, it is packed into a read-only structure,
split across several pages, and written to storage.
The volume store also contains a cache and specialized queues that sort and
batch requests by the page they need, in order to minimize latency and I/O
requests when records have to be read from storage. The cache and queues
also coordinate with the volume index to ensure that the volume does not
waste resources reading pages that are no longer valid.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Signed-off-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Co-developed-by: John Wiele <jwiele@redhat.com>
Signed-off-by: John Wiele <jwiele@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:54:29 +0000 (19:54 -0500)]
dm vdo: implement the open chapter and chapter indexes
Deduplication records are stored in groups called chapters. New records are
collected in a structure called the open chapter, which is optimized for
adding, removing, and sorting records.
When a chapter fills, it is packed into a read-only structure called a
closed chapter, which is optimized for searching and reading. The closed
chapter includes a delta index, called the chapter index, which maps each
record name to the record page containing the record and allows the index
to read at most one record page when looking up a record.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Signed-off-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:53:47 +0000 (19:53 -0500)]
dm vdo: implement the volume index
The volume index is a large delta index that maps each record name to the
chapter which contains the newest record for that name. The volume index
can contain several million records and is stored entirely in memory while
the index is operating, accounting for the majority of the deduplication
index's memory budget.
The volume index is composed of two subindexes in order to handle sparse
hook names separately from regular names. If sparse indexing is not
enabled, the sparse hook portion of the volume index is not used or
instantiated.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Signed-off-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:52:58 +0000 (19:52 -0500)]
dm vdo: implement the delta index
The delta index is a space and memory efficient alternative to a hashtable.
Instead of storing the entire key for each entry, the entries are sorted by
key and only the difference between adjacent keys (the delta) is stored.
If the keys are evenly distributed, the size of the deltas follows an
exponential distribution, and the deltas can use a Huffman code to take up
even less space.
This structure allows the index to use many fewer bytes per entry than a
traditional hash table, but it is slightly more expensive to look up
entries, because a request must read and sum every entry in a list of
deltas in order to find a given record. The delta index reduces this lookup
cost by splitting its key space into many sub-lists, each starting at a
fixed key value, so that each individual list is short.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Signed-off-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:52:09 +0000 (19:52 -0500)]
dm vdo: add deduplication index storage interface
This patch adds infrastructure for managing reads and writes to the
underlying storage layer for the deduplication index. The deduplication
index uses dm-bufio for all of its reads and writes, so part of this
infrastructure is managing the various dm-bufio clients required. It also
adds the buffered reader and buffered writer abstractions, which simplify
reading and writing metadata structures that span several blocks.
This patch also includes structures and utilities for encoding and decoding
all of the deduplication index metadata, collectively called the index
layout.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Signed-off-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Co-developed-by: John Wiele <jwiele@redhat.com>
Signed-off-by: John Wiele <jwiele@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:50:45 +0000 (19:50 -0500)]
dm vdo: add deduplication configuration structures
Add structures which record the configuration of various deduplication
index parameters. This also includes facilities for saving and loading the
configuration and validating its integrity.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Signed-off-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Co-developed-by: John Wiele <jwiele@redhat.com>
Signed-off-by: John Wiele <jwiele@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:49:57 +0000 (19:49 -0500)]
dm vdo: add basic hash map data structures
This patch adds two hash maps, one keyed by integers, the other by
pointers, and also a priority heap. The integer map is used for locking of
logical and physical addresses. The pointer map is used for managing
concurrent writes of the same data, ensuring that those writes are
deduplicated. The priority heap is used to minimize the search time for
free blocks.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:48:41 +0000 (19:48 -0500)]
dm vdo: add specialized request queueing functionality
This patch adds funnel_queue, a mostly lock-free multi-producer,
single-consumer queue. It also adds the request queue used by the dm-vdo
deduplication index, and the work_queue used by the dm-vdo data store. Both
of these are built on top of funnel queue and are intended to support the
dispatching of many short-running tasks. The work_queue also supports
priorities. Finally, this patch adds vdo_completion, the structure which is
enqueued on work_queues.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Co-developed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:47:35 +0000 (19:47 -0500)]
dm vdo: add thread and synchronization utilities
This patch adds utilities for managing and using named threads, as well as
several locking and synchronization utilities. These utilities help dm-vdo
minimize thread transitions and manage interactions between threads.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Signed-off-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Co-developed-by: Bruce Johnston <bjohnsto@redhat.com>
Signed-off-by: Bruce Johnston <bjohnsto@redhat.com>
Co-developed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:46:10 +0000 (19:46 -0500)]
dm vdo: add vdo type declarations, constants, and simple data structures
Add definitions of constants defining the fixed parameters of a VDO
volume, and the default and maximum values of configurable or dynamic
parameters.
Add definitions of internal status codes used for internal
communication within the module and for logging.
Add definitions of types and structs used to manage the processing of
an I/O operation.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:44:01 +0000 (19:44 -0500)]
dm vdo: add basic logging and support utilities
Add various support utilities for the vdo target and deduplication index,
including logging utilities, string and time management, and index-specific
error codes.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Signed-off-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Co-developed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:41:15 +0000 (19:41 -0500)]
dm vdo: add memory allocation utilities
This patch adds standardized allocation macros and memory tracking tools to
track and report any allocated memory that is not freed. This makes it
easier to ensure that the vdo target does not leak memory.
This patch also adds utilities for controlling whether certain threads are
allowed to allocate memory, since memory allocation during certain critical
code sections can cause the vdo target to deadlock.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Michael Sclafani <dm-devel@lists.linux.dev>
Co-developed-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Signed-off-by: Thomas Jaskiewicz <tom@jaskiewicz.us>
Co-developed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:37:30 +0000 (19:37 -0500)]
dm vdo: add the MurmurHash3 fast hashing algorithm
MurmurHash3 is a fast, non-cryptographic, 128-bit hash. It was originally
written by Austin Appleby and placed in the public domain. This version has
been modified to produce the same result on both big endian and little
endian processors, making it suitable for use in portable persistent data.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Ken Raeburn <raeburn@redhat.com>
Co-developed-by: John Wiele <jwiele@redhat.com>
Signed-off-by: John Wiele <jwiele@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Matthew Sakai [Fri, 17 Nov 2023 00:34:28 +0000 (19:34 -0500)]
dm: add documentation for dm-vdo target
This adds the admin-guide documentation for dm-vdo.
vdo.rst is the guide to using dm-vdo. vdo-design is an overview of the
design of dm-vdo.
Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Linus Torvalds [Sun, 4 Feb 2024 12:20:36 +0000 (12:20 +0000)]
Linux 6.8-rc3
Linus Torvalds [Sun, 4 Feb 2024 07:33:01 +0000 (07:33 +0000)]
Merge tag 'for-linus-6.8-rc3' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Miscellaneous bug fixes and cleanups in ext4's multi-block allocator
and extent handling code"
* tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
ext4: make ext4_set_iomap() recognize IOMAP_DELALLOC map type
ext4: make ext4_map_blocks() distinguish delalloc only extent
ext4: add a hole extent entry in cache after punch
ext4: correct the hole length returned by ext4_map_blocks()
ext4: convert to exclusive lock while inserting delalloc extents
ext4: refactor ext4_da_map_blocks()
ext4: remove 'needed' in trace_ext4_discard_preallocations
ext4: remove unnecessary parameter "needed" in ext4_discard_preallocations
ext4: remove unused return value of ext4_mb_release_group_pa
ext4: remove unused return value of ext4_mb_release_inode_pa
ext4: remove unused return value of ext4_mb_release
ext4: remove unused ext4_allocation_context::ac_groups_considered
ext4: remove unneeded return value of ext4_mb_release_context
ext4: remove unused parameter ngroup in ext4_mb_choose_next_group_*()
ext4: remove unused return value of __mb_check_buddy
ext4: mark the group block bitmap as corrupted before reporting an error
ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()
ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()
ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt
ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
...
Linus Torvalds [Sun, 4 Feb 2024 07:26:19 +0000 (07:26 +0000)]
Merge tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
"Five smb3 client fixes, mostly multichannel related:
- four multichannel fixes including fix for channel allocation when
multiple inactive channels, fix for unneeded race in channel
deallocation, correct redundant channel scaling, and redundant
multichannel disabling scenarios
- add warning if max compound requests reached"
* tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: increase number of PDUs allowed in a compound request
cifs: failure to add channel on iface should bump up weight
cifs: do not search for channel if server is terminating
cifs: avoid redundant calls to disable multichannel
cifs: make sure that channel scaling is done only once
Linus Torvalds [Sun, 4 Feb 2024 07:22:51 +0000 (07:22 +0000)]
Merge tag 'xfs-6.8-fixes-2' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:
- Clear XFS_ATTR_INCOMPLETE filter on removing xattr from a node format
attribute fork
- Remove conditional compilation of realtime geometry validator
functions to prevent confusing error messages from being printed on
the console during the mount operation
* tag 'xfs-6.8-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: remove conditional building of rt geometry validator functions
xfs: reset XFS_ATTR_INCOMPLETE filter on node removal
Linus Torvalds [Sun, 4 Feb 2024 07:01:39 +0000 (07:01 +0000)]
Merge tag 'char-misc-6.8-rc3' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are three tiny driver fixes for 6.8-rc3. They include:
- Android binder long-term bug with epoll finally being fixed
- fastrpc driver shutdown bugfix
- open-dice lockdep fix
All of these have been in linux-next this week with no reported
issues"
* tag 'char-misc-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
binder: signal epoll threads of self-work
misc: open-dice: Fix spurious lockdep warning
misc: fastrpc: Mark all sessions as invalid in cb_remove
Linus Torvalds [Sun, 4 Feb 2024 06:58:23 +0000 (06:58 +0000)]
Merge tag 'tty-6.8-rc3' of git://git./linux/kernel/git/gregkh/tty
Pull tty and serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for 6.8-rc3 that
resolve a number of reported issues. Included in here are:
- rs485 flag definition fix that affected the user/kernel abi in -rc1
- max310x driver fixes
- 8250_pci1xxxx driver off-by-one fix
- uart_tiocmget locking race fix
All of these have been in linux-next for over a week with no reported
issues"
* tag 'tty-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: max310x: prevent infinite while() loop in port startup
serial: max310x: fail probe if clock crystal is unstable
serial: max310x: improve crystal stable clock detection
serial: max310x: set default value when reading clock ready bit
serial: core: Fix atomicity violation in uart_tiocmget
serial: 8250_pci1xxxx: fix off by one in pci1xxxx_process_read_data()
tty: serial: Fix bit order in RS485 flag definitions
Linus Torvalds [Sun, 4 Feb 2024 06:52:29 +0000 (06:52 +0000)]
Merge tag 'usb-6.8-rc3' of git://git./linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here are a bunch of small USB driver fixes for 6.8-rc3. Included in
here are:
- new usb-serial driver ids
- new dwc3 driver id added
- typec driver change revert
- ncm gadget driver endian bugfix
- xhci bugfixes for a number of reported issues
- usb hub bugfix for alternate settings
- ulpi driver debugfs memory leak fix
- chipidea driver bugfix
- usb gadget driver fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (24 commits)
USB: serial: option: add Fibocom FM101-GL variant
USB: serial: qcserial: add new usb-id for Dell Wireless DW5826e
USB: serial: cp210x: add ID for IMST iM871A-USB
usb: typec: tcpm: fix the PD disabled case
usb: ucsi_acpi: Quirk to ack a connector change ack cmd
usb: ucsi_acpi: Fix command completion handling
usb: ucsi: Add missing ppm_lock
usb: ulpi: Fix debugfs directory leak
Revert "usb: typec: tcpm: fix cc role at port reset"
usb: gadget: pch_udc: fix an Excess kernel-doc warning
usb: f_mass_storage: forbid async queue when shutdown happen
USB: hub: check for alternate port before enabling A_ALT_HNP_SUPPORT
usb: chipidea: core: handle power lost in workqueue
usb: dwc3: gadget: Fix NULL pointer dereference in dwc3_gadget_suspend
usb: dwc3: pci: add support for the Intel Arrow Lake-H
usb: core: Prevent null pointer dereference in update_port_device_state
xhci: handle isoc Babble and Buffer Overrun events properly
xhci: process isoc TD properly when there was a transaction error mid TD.
xhci: fix off by one check when adding a secondary interrupter.
xhci: fix possible null pointer dereference at secondary interrupter removal
...
Linus Torvalds [Sun, 4 Feb 2024 06:47:45 +0000 (06:47 +0000)]
Merge tag 'i2c-for-6.8-rc3' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixlet from Wolfram Sang:
"MAINTAINERS update to point people to the new tree for i2c host driver
changes"
* tag 'i2c-for-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
MAINTAINERS: Update i2c host drivers repository
Linus Torvalds [Sun, 4 Feb 2024 06:37:38 +0000 (06:37 +0000)]
Merge tag 'dmaengine-fix-6.8' of git://git./linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
"Core:
- fix return value of is_slave_direction() for D2D dma
Driver fixes for:
- Documentaion fixes to resolve warnings for at_hdmac driver
- bunch of fsl driver fixes for memory leaks, and useless kfree
- TI edma and k3 fixes for packet error and null pointer checks"
* tag 'dmaengine-fix-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: at_hdmac: add missing kernel-doc style description
dmaengine: fix is_slave_direction() return false when DMA_DEV_TO_DEV
dmaengine: fsl-qdma: Remove a useless devm_kfree()
dmaengine: fsl-qdma: Fix a memory leak related to the queue command DMA
dmaengine: fsl-qdma: Fix a memory leak related to the status queue DMA
dmaengine: ti: k3-udma: Report short packet errors
dmaengine: ti: edma: Add some null pointer checks to the edma_probe
dmaengine: fsl-dpaa2-qdma: Fix the size of dma pools
dmaengine: at_hdmac: fix some kernel-doc warnings
Linus Torvalds [Sun, 4 Feb 2024 06:35:00 +0000 (06:35 +0000)]
Merge tag 'phy-fixes-6.8' of git://git./linux/kernel/git/phy/linux-phy
Pull phy driver fixes from Vinod Koul:
- TI null pointer dereference
- missing erdes mux entry in lan966x driver
- Return of error code in renesas driver
- Serdes init sequence and register offsets for IPQ drivers
* tag 'phy-fixes-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
phy: ti: phy-omap-usb2: Fix NULL pointer dereference for SRP
phy: lan966x: Add missing serdes mux entry
phy: renesas: rcar-gen3-usb2: Fix returning wrong error code
phy: qcom-qmp-usb: fix serdes init sequence for IPQ6018
phy: qcom-qmp-usb: fix register offsets for ipq8074/ipq6018
Wolfram Sang [Sat, 3 Feb 2024 18:23:41 +0000 (19:23 +0100)]
Merge tag 'i2c-host-fixes-6.8-rc3' of git://git./linux/kernel/git/andi.shyti/linux into i2c/for-current
Just a maintenance patch that updates the repository where the
i2c host and muxes related patches will be collected.
Linus Torvalds [Sat, 3 Feb 2024 12:52:36 +0000 (12:52 +0000)]
Merge tag 'perf-tools-fixes-for-v6.8-1-2024-02-01' of git://git./linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Arnaldo Carvalho de Melo:
"Vendor events:
- Intel Alderlake/Sapphire Rapids metric fixes, the CPU type
("cpu_atom", "cpu_core") needs to be used as a prefix to be
considered on a metric formula, detected via one of the 'perf test'
entries.
'perf test' fixes:
- Fix the creation of event selector lists on 'perf test' entries, by
initializing the sample ID flag, which is done by 'perf record', so
this fix affects only the tests, the common case isn't affected
- Make 'perf list' respect debug settings (-v) to fix its 'perf test'
entry
- Fix 'perf script' test when python support isn't enabled
- Special case 'perf script' tests on s390, where only DWARF call
graphs are supported and only on software events
- Make 'perf daemon' signal test less racy
Compiler warnings/errors:
- Remove needless malloc(0) call in 'perf top' that triggers
-Walloc-size
- Fix calloc() argument order to address error introduced in gcc-14
Build:
- Make minimal shellcheck version to v0.6.0, avoiding the build to
fail with older versions
Sync kernel header copies:
- stat.h to pick STATX_MNT_ID_UNIQUE
- msr-index.h to pick IA32_MKTME_KEYID_PARTITIONING
- drm.h to pick DRM_IOCTL_MODE_CLOSEFB
- unistd.h to pick {list,stat}mount,
lsm_{[gs]et_self_attr,list_modules} syscall numbers
- x86 cpufeatures to pick TDX, Zen, APIC MSR fence changes
- x86's mem{cpy,set}_64.S used in 'perf bench'
- Also, without tooling effects: asm-generic/unaligned.h, mount.h,
fcntl.h, kvm headers"
* tag 'perf-tools-fixes-for-v6.8-1-2024-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (21 commits)
perf tools headers: update the asm-generic/unaligned.h copy with the kernel sources
tools include UAPI: Sync linux/mount.h copy with the kernel sources
perf evlist: Fix evlist__new_default() for > 1 core PMU
tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench'
tools headers x86 cpufeatures: Sync with the kernel sources to pick TDX, Zen, APIC MSR fence changes
tools headers UAPI: Sync unistd.h to pick {list,stat}mount, lsm_{[gs]et_self_attr,list_modules} syscall numbers
perf vendor events intel: Alderlake/sapphirerapids metric fixes
tools headers UAPI: Sync kvm headers with the kernel sources
perf tools: Fix calloc() arguments to address error introduced in gcc-14
perf top: Remove needless malloc(0) call that triggers -Walloc-size
perf build: Make minimal shellcheck version to v0.6.0
tools headers UAPI: Update tools's copy of drm.h headers to pick DRM_IOCTL_MODE_CLOSEFB
perf test shell daemon: Make signal test less racy
perf test shell script: Fix test for python being disabled
perf test: Workaround debug output in list test
perf list: Add output file option
perf list: Switch error message to pr_err() to respect debug settings (-v)
perf test: Fix 'perf script' tests on s390
tools headers UAPI: Sync linux/fcntl.h with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources to pick IA32_MKTME_KEYID_PARTITIONING
...
Linus Torvalds [Fri, 2 Feb 2024 23:32:58 +0000 (15:32 -0800)]
Merge tag 'trace-v6.8-rc2' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing and eventfs fixes from Steven Rostedt:
- Fix the return code for ring_buffer_poll_wait()
It was returing a -EINVAL instead of EPOLLERR.
- Zero out the tracefs_inode so that all fields are initialized.
The ti->private could have had stale data, but instead of just
initializing it to NULL, clear out the entire structure when it is
allocated.
- Fix a crash in timerlat
The hrtimer was initialized at read and not open, but is canceled at
close. If the file was opened and never read the close will pass a
NULL pointer to hrtime_cancel().
- Rewrite of eventfs.
Linus wrote a patch series to remove the dentry references in the
eventfs_inode and to use ref counting and more of proper VFS
interfaces to make it work.
- Add warning to put_ei() if ei is not set to free. That means
something is about to free it when it shouldn't.
- Restructure the eventfs_inode to make it more compact, and remove the
unused llist field.
- Remove the fsnotify*() funtions for when the inodes were being
created in the lookup code. It doesn't make sense to notify about
creation just because something is being looked up.
- The inode hard link count was not accurate.
It was being updated when a file was looked up. The inodes of
directories were updating their parent inode hard link count every
time the inode was created. That means if memory reclaim cleaned a
stale directory inode and the inode was lookup up again, it would
increment the parent inode again as well. Al Viro said to just have
all eventfs directories have a hard link count of 1. That tells user
space not to trust it.
* tag 'trace-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Keep all directory links at 1
eventfs: Remove fsnotify*() functions from lookup()
eventfs: Restructure eventfs_inode structure to be more condensed
eventfs: Warn if an eventfs_inode is freed without is_freed being set
tracing/timerlat: Move hrtimer_init to timerlat_fd open()
eventfs: Get rid of dentry pointers without refcounts
eventfs: Clean up dentry ops and add revalidate function
eventfs: Remove unused d_parent pointer field
tracefs: dentry lookup crapectomy
tracefs: Avoid using the ei->dentry pointer unnecessarily
eventfs: Initialize the tracefs inode properly
tracefs: Zero out the tracefs_inode when allocating it
ring-buffer: Clean ring_buffer_poll_wait() error return
Linus Torvalds [Fri, 2 Feb 2024 23:30:33 +0000 (15:30 -0800)]
Merge tag 'gfs2-v6.8-rc2-revert' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 revert from Andreas Gruenbacher:
"It turns out that the commit to use GL_NOBLOCK flag for non-blocking
lookups has several issues, and not all of them have a simple fix"
* tag 'gfs2-v6.8-rc2-revert' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
Revert "gfs2: Use GL_NOBLOCK flag for non-blocking lookups"
Linus Torvalds [Fri, 2 Feb 2024 20:56:56 +0000 (12:56 -0800)]
Merge tag 'pci-v6.8-fixes-1' of git://git./linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Fix a potential deadlock that was reintroduced by an ASPM revert
merged for v6.8 (Johan Hovold)
- Add Manivannan Sadhasivam as PCI Endpoint maintainer (Lorenzo
Pieralisi)
* tag 'pci-v6.8-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: Add Manivannan Sadhasivam as PCI Endpoint maintainer
PCI/ASPM: Fix deadlock when enabling ASPM
Linus Torvalds [Fri, 2 Feb 2024 20:54:46 +0000 (12:54 -0800)]
Merge tag 'drm-fixes-2024-02-03' of git://anongit.freedesktop.org/drm/drm
Pul drm fixes from Dave Airlie:
"Regular weekly fixes, mostly amdgpu and xe. One nouveau fix is a
better fix for the deadlock and also helps with a sync race we were
seeing.
dma-buf:
- heaps CMA page accounting fix
virtio-gpu:
- fix segment size
xe:
- A crash fix
- A fix for an assert due to missing mem_acces ref
- Only allow a single user-fence per exec / bind.
- Some sparse warning fixes
- Two fixes for compilation failures on various odd combinations of
gcc / arch pointed out on LKML.
- Fix a fragile partial allocation pointed out on LKML.
- A sysfs ABI documentation warning fix
amdgpu:
- Fix reboot issue seen on some 7000 series dGPUs
- Fix client init order for KFD
- Misc display fixes
- USB-C fix
- DCN 3.5 fixes
- Fix issues with GPU scheduler and GPU reset
- GPU firmware loading fix
- Misc fixes
- GC 11.5 fix
- VCN 4.0.5 fix
- IH overflow fix
amdkfd:
- SVM fixes
- Trap handler fix
- Fix device permission lookup
- Properly reserve BO before validating it
nouveau:
- fence/irq lock deadlock fix (second attempt)
- gsp command size fix
* tag 'drm-fixes-2024-02-03' of git://anongit.freedesktop.org/drm/drm: (35 commits)
nouveau: offload fence uevents work to workqueue
nouveau/gsp: use correct size for registry rpc.
drm/amdgpu/pm: Use inline function for IP version check
drm/hwmon: Fix abi doc warnings
drm/xe: Make all GuC ABI shift values unsigned
drm/xe/vm: Subclass userptr vmas
drm/xe: Use LRC prefix rather than CTX prefix in lrc desc defines
drm/xe: Don't use __user error pointers
drm/xe: Annotate mcr_[un]lock()
drm/xe: Only allow 1 ufence per exec / bind IOCTL
drm/xe: Grab mem_access when disabling C6 on skip_guc_pc platforms
drm/xe: Fix crash in trace_dma_fence_init()
drm/amdgpu: Reset IH OVERFLOW_CLEAR bit
drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend
drm/amdgpu: drm/amdgpu: remove golden setting for gfx 11.5.0
drm/amdkfd: reserve the BO before validating it
drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'
drm/amd/display: Fix buffer overflow in 'get_host_router_total_dp_tunnel_bw()'
drm/amd/display: Add NULL check for kzalloc in 'amdgpu_dm_atomic_commit_tail()'
drm/amd: Don't init MEC2 firmware when it fails to load
...
Linus Torvalds [Fri, 2 Feb 2024 20:52:44 +0000 (12:52 -0800)]
Merge tag 'input-for-v6.8-rc2' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a fix for the fix to deal with newer laptops which get confused by
the "GET ID" command when probing for PS/2 keyboards
- a couple of tweaks to i8042 to handle Clevo NS70PU and Lifebook U728
laptops
- a change to bcm5974 to validate that the device has appropriate
endpoints
- an addition of new product ID to xpad driver to recognize Lenovo
Legion Go controllers
- a quirk to Goodix controller to deal with extra GPIO described in
ACPI tables on some devices.
* tag 'input-for-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - add Fujitsu Lifebook U728 to i8042 quirk table
Input: i8042 - fix strange behavior of touchpad on Clevo NS70PU
Input: atkbd - do not skip atkbd_deactivate() when skipping ATKBD_CMD_GETID
Input: atkbd - skip ATKBD_CMD_SETLEDS when skipping ATKBD_CMD_GETID
Input: bcm5974 - check endpoint type before starting traffic
Input: xpad - add Lenovo Legion Go controllers
Input: goodix - accept ACPI resources with gpio_count == 3 && gpio_int_idx == 0
Linus Torvalds [Fri, 2 Feb 2024 20:50:44 +0000 (12:50 -0800)]
Merge tag 'sound-6.8-rc3' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of fixes, mostly device-specific ones:
- Minor PCM core fix for name strings
- ASoC Qualcomm fixes, including DAI support extensions
- ASoC AMD platform updates
- ASoC Allwinner platform updates
- Various ASoC codec fixes for WSA, WCD, ES8326 drivers
- Various HD-audio and USB-audio fixes and quirks
- A series of fixes for Cirrus CS35L56 codecs"
* tag 'sound-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (63 commits)
ALSA: usb-audio: Ignore clock selector errors for single connection
ALSA: hda/realtek: Enable headset mic on Vaio VJFE-ADL
ALSA: hda: cs35l56: Remove unused test stub function
ALSA: hda: cs35l56: Firmware file must match the version of preloaded firmware
ALSA: hda: cs35l56: Fix filename string field layout
ALSA: hda: cs35l56: Fix order of searching for firmware files
ASoC: cs35l56: Allow more time for firmware to boot
ASoC: cs35l56: Load tunings for the correct speaker models
ASoC: cs35l56: Firmware file must match the version of preloaded firmware
ASoC: cs35l56: Fix misuse of wm_adsp 'part' string for silicon revision
ASoC: cs35l56: Fix for initializing ASP1 mixer registers
ALSA: hda: cs35l56: Initialize all ASP1 registers
ASoC: cs35l56: Fix default SDW TX mixer registers
ASoC: cs35l56: Fix to ensure ASP1 registers match cache
ASoC: cs35l56: Remove buggy checks from cs35l56_is_fw_reload_needed()
ASoC: cs35l56: Don't add the same register patch multiple times
ASoC: cs35l56: cs35l56_component_remove() must clean up wm_adsp
ASoC: cs35l56: cs35l56_component_remove() must clear cs35l56->component
ASoC: wm_adsp: Don't overwrite fwf_name with the default
ASoC: wm_adsp: Fix firmware file search order
...
Linus Torvalds [Fri, 2 Feb 2024 20:48:33 +0000 (12:48 -0800)]
Merge tag 'hwmon-for-v6.8-rc3' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- pmbus/mp2975: Fix driver initialization
- gigabyte_waterforce: Add missing unlock in error handling path
* tag 'hwmon-for-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (pmbus/mp2975) Correct comment inside 'mp2975_read_byte_data'
hwmon: (pmbus/mp2975) Fix driver initialization for MP2975 device
hwmon: gigabyte_waterforce: Fix locking bug in waterforce_get_status()
Linus Torvalds [Fri, 2 Feb 2024 20:46:35 +0000 (12:46 -0800)]
Merge tag 'for-v6.8-rc' of git://git./linux/kernel/git/sre/linux-power-supply
Pull power supply fix from Sebastian Reichel:
- qcom_battmgr: revert broken fix
* tag 'for-v6.8-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
Revert "power: supply: qcom_battmgr: Register the power supplies after PDR is up"
Linus Torvalds [Fri, 2 Feb 2024 20:43:51 +0000 (12:43 -0800)]
Merge tag 'iommu-fixes-v6.8-rc2' of git://git./linux/kernel/git/joro/iommu
Pul iommu fixes from Joerg Roedel:
- Make iommu_ops->default_domain work without CONFIG_IOMMU_DMA to fix
initialization of FSL-PAMU devices
- Fix for Tegra fbdev initialization failure
- Fix for a VFIO device unbinding failure on PowerPC
* tag 'iommu-fixes-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
powerpc: iommu: Bring back table group release_ownership() call
drm/tegra: Do not assume that a NULL domain means no DMA IOMMU
iommu: Allow ops->default_domain to work when !CONFIG_IOMMU_DMA
Linus Torvalds [Fri, 2 Feb 2024 18:58:25 +0000 (10:58 -0800)]
Merge tag 'for-6.8/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM ioctl interface to avoid INT_MAX overflow warnings from
kvmalloc by limiting the number of targets and parameter size area.
- Fix DM stats to avoid INT_MAX overflow warnings from kvmalloc by
limiting the number of entries supported.
- Fix DM writecache to support mapping devices larger than 1 TiB by
switching from using kvmalloc_array to vmalloc_array -- which avoids
INT_MAX overflow in kvmalloc_node and associated warnings.
- Remove the (ab)use of tasklets from both the DM crypt and verity
targets. They will be converted to use BH workqueue in future.
* tag 'for-6.8/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-crypt, dm-verity: disable tasklets
dm writecache: allow allocations larger than 2GiB
dm stats: limit the number of entries
dm: limit the number of targets and parameter size area
Linus Torvalds [Fri, 2 Feb 2024 18:52:56 +0000 (10:52 -0800)]
Merge tag 'ata-6.8-rc3' of git://git./linux/kernel/git/libata/linux
Pull ata fix from Niklas Cassel:
- Following up on last week's ASMedia ASM1061 43-bit dma_mask quirk, we
sent an email to ASMedia developers that have previously been active
on the mailing list, asking exactly which SATA controllers that are
affected by this hardware limitation.
We got a reply that it affects all the SATA controllers in the
ASM106x family, thus extend the existing 43-bit dma_mask quirk to
apply to all the affected ASMedia SATA controllers.
* tag 'ata-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ahci: Extend ASM1061 43-bit DMA address quirk to other ASM106x parts
Szilard Fabian [Fri, 2 Feb 2024 18:28:59 +0000 (10:28 -0800)]
Input: i8042 - add Fujitsu Lifebook U728 to i8042 quirk table
Another Fujitsu-related patch.
In the initial boot stage the integrated keyboard of Fujitsu Lifebook U728
refuses to work and it's not possible to type for example a dm-crypt
passphrase without the help of an external keyboard.
i8042.nomux kernel parameter resolves this issue but using that a PS/2
mouse is detected. This input device is unused even when the i2c-hid-acpi
kernel module is blacklisted making the integrated ELAN touchpad
(04F3:3092) not working at all.
So this notebook uses a hid-over-i2c touchpad which is managed by the
i2c_designware input driver. Since you can't find a PS/2 mouse port on this
computer and you can't connect a PS/2 mouse to it even with an official
port replicator I think it's safe to not use the PS/2 mouse port at all.
Signed-off-by: Szilard Fabian <szfabian@bluemarch.art>
Link: https://lore.kernel.org/r/20240103014717.127307-2-szfabian@bluemarch.art
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Linus Torvalds [Fri, 2 Feb 2024 18:49:28 +0000 (10:49 -0800)]
Merge tag 'block-6.8-2024-02-01' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Remove duplicated enums (Guixen)
- Use appropriate controller state accessors (Keith)
- Retryable authentication (Hannes)
- Add missing module descriptions (Chaitanya)
- Fibre-channel fixes for blktests (Daniel)
- Various type correctness updates (Caleb)
- Improve fabrics connection debugging prints (Nitin)
- Passthrough command verbose error logging (Adam)
- Fix for where we set IO priority in the bio for drivers that use
fops->submit_bio() to queue IO, like md/dm etc.
* tag 'block-6.8-2024-02-01' of git://git.kernel.dk/linux: (32 commits)
block: Fix where bio IO priority gets set
nvme: allow passthru cmd error logging
nvme-fc: show hostnqn when connecting to fc target
nvme-rdma: show hostnqn when connecting to rdma target
nvme-tcp: show hostnqn when connecting to tcp target
nvmet-fc: use RCU list iterator for assoc_list
nvmet-fc: take ref count on tgtport before delete assoc
nvmet-fc: avoid deadlock on delete association path
nvmet-fc: abort command when there is no binding
nvmet-fc: do not tack refs on tgtports from assoc
nvmet-fc: remove null hostport pointer check
nvmet-fc: hold reference on hostport match
nvmet-fc: free queue and assoc directly
nvmet-fc: defer cleanup using RCU properly
nvmet-fc: release reference on target port
nvmet-fcloop: swap the list_add_tail arguments
nvme-fc: do not wait in vain when unloading module
nvme-fc: log human-readable opcode on timeout
nvme: split out fabrics version of nvme_opcode_str()
nvme: take const cmd pointer in read-only helpers
...
Linus Torvalds [Fri, 2 Feb 2024 18:45:17 +0000 (10:45 -0800)]
Merge tag 'io_uring-6.8-2024-02-01' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Fix for missing retry for read multishot.
If we trigger the execution of it and there's more than one buffer to
be read, then we don't always read more than the first one. As it's
edge triggered, this can lead to stalls.
- Limit inline receive multishot retries for fairness reasons.
If we have a very bursty socket receiving data, we still need to
ensure we process other requests as well. This is really two minor
cleanups, then adding a way for poll reissue to trigger a requeue,
and then finally having multishot receive utilize that.
- Fix for a weird corner case for non-multishot receive with
MSG_WAITALL, using provided buffers, and setting the length to
zero (to let the buffer dictate the receive size).
* tag 'io_uring-6.8-2024-02-01' of git://git.kernel.dk/linux:
io_uring/net: fix sr->len for IORING_OP_RECV with MSG_WAITALL and buffers
io_uring/net: limit inline multishot retries
io_uring/poll: add requeue return code from poll multishot handling
io_uring/net: un-indent mshot retry path in io_recv_finish()
io_uring/poll: move poll execution helpers higher up
io_uring/rw: ensure poll based multishot read retries appropriately
Linus Torvalds [Fri, 2 Feb 2024 18:40:50 +0000 (10:40 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Two small fixes.
The first one is an alternative fix for the SCS patching problem we
thought we'd fixed in -rc1; it turned out not to be robust with all
toolchains/configs, so this is a revert+retry which has seen some more
testing.
The other one simply removes an unused header file, but I couldn't
resist the negative diffstat.
- Really fix shadow call stack patching with LTO=full
- Remove unused (empty) header file generated from the compat vDSO"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: vdso32: Remove unused vdso32-offsets.h
arm64: scs: Disable LTO for SCS patching code
arm64: Revert "scs: Work around full LTO issue with dynamic SCS"
Werner Sembach [Tue, 5 Dec 2023 16:36:01 +0000 (17:36 +0100)]
Input: i8042 - fix strange behavior of touchpad on Clevo NS70PU
When closing the laptop lid with an external screen connected, the mouse
pointer has a constant movement to the lower right corner. Opening the
lid again stops this movement, but after that the touchpad does no longer
register clicks.
The touchpad is connected both via i2c-hid and PS/2, the predecessor of
this device (NS70MU) has the same layout in this regard and also strange
behaviour caused by the psmouse and the i2c-hid driver fighting over
touchpad control. This fix is reusing the same workaround by just
disabling the PS/2 aux port, that is only used by the touchpad, to give the
i2c-hid driver the lone control over the touchpad.
v2: Rebased on current master
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231205163602.16106-1-wse@tuxedocomputers.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Mikulas Patocka [Wed, 31 Jan 2024 20:57:27 +0000 (21:57 +0100)]
dm-crypt, dm-verity: disable tasklets
Tasklets have an inherent problem with memory corruption. The function
tasklet_action_common calls tasklet_trylock, then it calls the tasklet
callback and then it calls tasklet_unlock. If the tasklet callback frees
the structure that contains the tasklet or if it calls some code that may
free it, tasklet_unlock will write into free memory.
The commits
8e14f610159d and
d9a02e016aaf try to fix it for dm-crypt, but
it is not a sufficient fix and the data corruption can still happen [1].
There is no fix for dm-verity and dm-verity will write into free memory
with every tasklet-processed bio.
There will be atomic workqueues implemented in the kernel 6.9 [2]. They
will have better interface and they will not suffer from the memory
corruption problem.
But we need something that stops the memory corruption now and that can be
backported to the stable kernels. So, I'm proposing this commit that
disables tasklets in both dm-crypt and dm-verity. This commit doesn't
remove the tasklet support, because the tasklet code will be reused when
atomic workqueues will be implemented.
[1] https://lore.kernel.org/all/
d390d7ee-f142-44d3-822a-
87949e14608b@suse.de/T/
[2] https://lore.kernel.org/lkml/
20240130091300.
2968534-1-tj@kernel.org/
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Fixes:
39d42fa96ba1b ("dm crypt: add flags to optionally bypass kcryptd workqueues")
Fixes:
5721d4e5a9cdb ("dm verity: Add optional "try_verify_in_tasklet" feature")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Greg Kroah-Hartman [Fri, 2 Feb 2024 16:36:38 +0000 (08:36 -0800)]
Merge tag 'usb-serial-6.8-rc3' of https://git./linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial device ids for 6.8-rc3
Here are some new device ids for 6.8-rc3.
All have been in linux-next with no reported issues.
* tag 'usb-serial-6.8-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add Fibocom FM101-GL variant
USB: serial: qcserial: add new usb-id for Dell Wireless DW5826e
USB: serial: cp210x: add ID for IMST iM871A-USB
Andreas Gruenbacher [Fri, 2 Feb 2024 16:11:25 +0000 (17:11 +0100)]
Revert "gfs2: Use GL_NOBLOCK flag for non-blocking lookups"
Commit "gfs2: Use GL_NOBLOCK flag for non-blocking lookups" has several
issues, some of which are non-trivial to fix, so revert it for now:
https://lore.kernel.org/gfs2/
20240202050230.GA875515@ZenIV/T/
This reverts commit
dd00aaeb343255a8a30de671bd27bde79a47c8e5.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Vinod Koul [Tue, 30 Jan 2024 16:32:16 +0000 (22:02 +0530)]
dmaengine: at_hdmac: add missing kernel-doc style description
We get following warning with W=1:
drivers/dma/at_hdmac.c:243: warning: Function parameter or struct member 'boundary' not described in 'at_desc'
drivers/dma/at_hdmac.c:243: warning: Function parameter or struct member 'dst_hole' not described in 'at_desc'
drivers/dma/at_hdmac.c:243: warning: Function parameter or struct member 'src_hole' not described in 'at_desc'
drivers/dma/at_hdmac.c:243: warning: Function parameter or struct member 'memset_buffer' not described in 'at_desc'
drivers/dma/at_hdmac.c:243: warning: Function parameter or struct member 'memset_paddr' not described in 'at_desc'
drivers/dma/at_hdmac.c:243: warning: Function parameter or struct member 'memset_vaddr' not described in 'at_desc'
drivers/dma/at_hdmac.c:255: warning: Enum value 'ATC_IS_PAUSED' not described in enum 'atc_status'
drivers/dma/at_hdmac.c:255: warning: Enum value 'ATC_IS_CYCLIC' not described in enum 'atc_status'
drivers/dma/at_hdmac.c:287: warning: Function parameter or struct member 'cyclic' not described in 'at_dma_chan'
drivers/dma/at_hdmac.c:350: warning: Function parameter or struct member 'memset_pool' not described in 'at_dma'
Fix this by adding the required description and also drop unused struct
member 'cyclic' in 'at_dma_chan'
Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20240130163216.633034-1-vkoul@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Shivaprasad G Bhat [Fri, 26 Jan 2024 15:09:18 +0000 (09:09 -0600)]
powerpc: iommu: Bring back table group release_ownership() call
The commit
2ad56efa80db ("powerpc/iommu: Setup a default domain and
remove set_platform_dma_ops") refactored the code removing the
set_platform_dma_ops(). It missed out the table group
release_ownership() call which would have got called otherwise
during the guest shutdown via vfio_group_detach_container(). On
PPC64, this particular call actually sets up the 32-bit TCE table,
and enables the 64-bit DMA bypass etc. Now after guest shutdown,
the subsequent host driver (e.g megaraid-sas) probe post unbind
from vfio-pci fails like,
megaraid_sas 0031:01:00.0: Warning: IOMMU dma not supported: mask 0x7fffffffffffffff, table unavailable
megaraid_sas 0031:01:00.0: Warning: IOMMU dma not supported: mask 0xffffffff, table unavailable
megaraid_sas 0031:01:00.0: Failed to set DMA mask
megaraid_sas 0031:01:00.0: Failed from megasas_init_fw 6539
The patch brings back the call to table_group release_ownership()
call when switching back to PLATFORM domain from BLOCKED, while
also separates the domain_ops for both.
Fixes:
2ad56efa80db ("powerpc/iommu: Setup a default domain and remove set_platform_dma_ops")
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/170628173462.3742.18330000394415935845.stgit@ltcd48-lp2.aus.stglab.ibm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Dave Airlie [Mon, 29 Jan 2024 01:26:45 +0000 (11:26 +1000)]
nouveau: offload fence uevents work to workqueue
This should break the deadlock between the fctx lock and the irq lock.
This offloads the processing off the work from the irq into a workqueue.
Cc: linux-stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/576237/
Dave Airlie [Tue, 30 Jan 2024 03:24:40 +0000 (13:24 +1000)]
nouveau/gsp: use correct size for registry rpc.
Timur pointed this out before, and it just slipped my mind,
but this might help some things work better, around pcie power
management.
Fixes:
8d55b0a940bb ("nouveau/gsp: add some basic registry entries.")
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/576336/
Dave Airlie [Fri, 2 Feb 2024 05:30:16 +0000 (15:30 +1000)]
Merge tag 'amd-drm-fixes-6.8-2024-02-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.8-2024-02-01:
amdgpu:
- Fix reboot issue seen on some 7000 series dGPUs
- Fix client init order for KFD
- Misc display fixes
- USB-C fix
- DCN 3.5 fixes
- Fix issues with GPU scheduler and GPU reset
- GPU firmware loading fix
- Misc fixes
- GC 11.5 fix
- VCN 4.0.5 fix
- IH overflow fix
amdkfd:
- SVM fixes
- Trap handler fix
- Fix device permission lookup
- Properly reserve BO before validating it
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240201184108.4923-1-alexander.deucher@amd.com
Zhang Yi [Sat, 27 Jan 2024 01:58:05 +0000 (09:58 +0800)]
ext4: make ext4_set_iomap() recognize IOMAP_DELALLOC map type
Since ext4_map_blocks() can recognize a delayed allocated only extent,
make ext4_set_iomap() can also recognize it, and remove the useless
separate check in ext4_iomap_begin_report().
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-7-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Zhang Yi [Sat, 27 Jan 2024 01:58:04 +0000 (09:58 +0800)]
ext4: make ext4_map_blocks() distinguish delalloc only extent
Add a new map flag EXT4_MAP_DELAYED to indicate the mapping range is a
delayed allocated only (not unwritten) one, and making
ext4_map_blocks() can distinguish it, no longer mixing it with holes.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Zhang Yi [Sat, 27 Jan 2024 01:58:03 +0000 (09:58 +0800)]
ext4: add a hole extent entry in cache after punch
In order to cache hole extents in the extent status tree and keep the
hole length as long as possible, re-add a hole entry to the cache just
after punching a hole.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-5-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Hans de Goede [Fri, 26 Jan 2024 16:07:24 +0000 (17:07 +0100)]
Input: atkbd - do not skip atkbd_deactivate() when skipping ATKBD_CMD_GETID
After commit
936e4d49ecbc ("Input: atkbd - skip ATKBD_CMD_GETID in
translated mode") not only the getid command is skipped, but also
the de-activating of the keyboard at the end of atkbd_probe(), potentially
re-introducing the problem fixed by commit
be2d7e4233a4 ("Input: atkbd -
fix multi-byte scancode handling on reconnect").
Make sure multi-byte scancode handling on reconnect is still handled
correctly by not skipping the atkbd_deactivate() call.
Fixes:
936e4d49ecbc ("Input: atkbd - skip ATKBD_CMD_GETID in translated mode")
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240126160724.13278-3-hdegoede@redhat.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Hans de Goede [Fri, 26 Jan 2024 16:07:23 +0000 (17:07 +0100)]
Input: atkbd - skip ATKBD_CMD_SETLEDS when skipping ATKBD_CMD_GETID
After commit
936e4d49ecbc ("Input: atkbd - skip ATKBD_CMD_GETID in
translated mode") the keyboard on Dell XPS 13 9350 / 9360 / 9370 models
has stopped working after a suspend/resume.
The problem appears to be that atkbd_probe() fails when called
from atkbd_reconnect() on resume, which on systems where
ATKBD_CMD_GETID is skipped can only happen by ATKBD_CMD_SETLEDS
failing. ATKBD_CMD_SETLEDS failing because ATKBD_CMD_GETID was
skipped is weird, but apparently that is what is happening.
Fix this by also skipping ATKBD_CMD_SETLEDS when skipping
ATKBD_CMD_GETID.
Fixes:
936e4d49ecbc ("Input: atkbd - skip ATKBD_CMD_GETID in translated mode")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: https://lore.kernel.org/linux-input/
0aa4a61f-c939-46fe-a572-
08022e8931c7@molgen.mpg.de/
Closes: https://bbs.archlinux.org/viewtopic.php?pid=
2146300
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218424
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=
2260517
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240126160724.13278-2-hdegoede@redhat.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Zhang Yi [Sat, 27 Jan 2024 01:58:02 +0000 (09:58 +0800)]
ext4: correct the hole length returned by ext4_map_blocks()
In ext4_map_blocks(), if we can't find a range of mapping in the
extents cache, we are calling ext4_ext_map_blocks() to search the real
path and ext4_ext_determine_hole() to determine the hole range. But if
the querying range was partially or completely overlaped by a delalloc
extent, we can't find it in the real extent path, so the returned hole
length could be incorrect.
Fortunately, ext4_ext_put_gap_in_cache() have already handle delalloc
extent, but it searches start from the expanded hole_start, doesn't
start from the querying range, so the delalloc extent found could not be
the one that overlaped the querying range, plus, it also didn't adjust
the hole length. Let's just remove ext4_ext_put_gap_in_cache(), handle
delalloc and insert adjusted hole extent in ext4_ext_determine_hole().
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-4-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Zhang Yi [Sat, 27 Jan 2024 01:58:01 +0000 (09:58 +0800)]
ext4: convert to exclusive lock while inserting delalloc extents
ext4_da_map_blocks() only hold i_data_sem in shared mode and i_rwsem
when inserting delalloc extents, it could be raced by another querying
path of ext4_map_blocks() without i_rwsem, .e.g buffered read path.
Suppose we buffered read a file containing just a hole, and without any
cached extents tree, then it is raced by another delayed buffered write
to the same area or the near area belongs to the same hole, and the new
delalloc extent could be overwritten to a hole extent.
pread() pwrite()
filemap_read_folio()
ext4_mpage_readpages()
ext4_map_blocks()
down_read(i_data_sem)
ext4_ext_determine_hole()
//find hole
ext4_ext_put_gap_in_cache()
ext4_es_find_extent_range()
//no delalloc extent
ext4_da_map_blocks()
down_read(i_data_sem)
ext4_insert_delayed_block()
//insert delalloc extent
ext4_es_insert_extent()
//overwrite delalloc extent to hole
This race could lead to inconsistent delalloc extents tree and
incorrect reserved space counter. Fix this by converting to hold
i_data_sem in exclusive mode when adding a new delalloc extent in
ext4_da_map_blocks().
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-3-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>