Darrick J. Wong [Mon, 4 Nov 2024 04:19:01 +0000 (20:19 -0800)]
xfs: move repair temporary files to the metadata directory tree
Due to resource acquisition rules, we have to create the ondisk
temporary files used to stage a filesystem repair before we can acquire
a reference to the inode that we actually want to repair. Therefore,
we do not know at tempfile creation time whether the tempfile will
belong to the regular directory tree or the metadata directory tree.
This distinction becomes important when the swapext code tries to figure
out the quota accounting of the two files whose mappings are being
swapped. The swapext code assumes that accounting updates are required
for a file if dqattach attaches dquots. Metadir files are never
accounted in quota, which means that swapext must not update the quota
accounting when swapping in a repaired directory/xattr/rtbitmap structure.
Prior to the swapext call, therefore, both files must be marked as
METADIR for dqattach so that dqattach will ignore them. Add support for
a repair tempfile to be switched to the metadir tree and switched back
before being released so that ifree will just free the file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:00 +0000 (20:19 -0800)]
xfs: check the metadata directory inumber in superblocks
When metadata directories are enabled, make sure that the secondary
superblocks point to the metadata directory. This isn't strictly
required because the secondaries are only used to recover damaged
filesystems, and the metadir root inumber is fixed.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:19:00 +0000 (20:19 -0800)]
xfs: scrub metadata directories
Teach online scrub about the metadata directory tree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:59 +0000 (20:18 -0800)]
xfs: fix di_metatype field of inodes that won't load
Make sure that the di_metatype field is at least set plausibly so that
later scrubbers could set the real type.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:59 +0000 (20:18 -0800)]
xfs: adjust parent pointer scrubber for sb-rooted metadata files
Starting with the metadata directory feature, we're allowed to call the
directory and parent pointer scrubbers for every metadata file,
including the ones that are children of the superblock.
For these children, checking the link count against the number of parent
pointers is a bit funny -- there's no such thing as a parent pointer for
a child of the superblock since there's no corresponding dirent. For
purposes of validating nlink, we pretend that there is a parent pointer.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:58 +0000 (20:18 -0800)]
xfs: metadata files can have xattrs if metadir is enabled
If parent pointers are enabled, then metadata files will store parent
pointers in xattrs, just like files in the user visible directory tree.
Therefore, scrub and repair need to handle attr forks for metadata files
on metadir filesystems.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:58 +0000 (20:18 -0800)]
xfs: do not count metadata directory files when doing online quotacheck
Previously, we stated that files in the metadata directory tree are not
counted in the dquot information. Fix the online quotacheck code to
reflect this.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:57 +0000 (20:18 -0800)]
xfs: refactor directory tree root predicates
Metadata directory trees make reasoning about the parent of a file more
difficult. Traditionally, user files are children of sb_rootino, and
metadata files are "children" of the superblock. Now, we add a third
possibility -- some metadata files can be children of sb_metadirino, but
the classic ones (rt free space data and quotas) are left alone.
Let's add some helper functions (instead of open-coding the logic
everywhere) to make scrub logic easier to understand.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:57 +0000 (20:18 -0800)]
xfs: record health problems with the metadata directory
Make a report to the health monitoring subsystem any time we encounter
something in the metadata directory tree that looks like corruption.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:56 +0000 (20:18 -0800)]
xfs: adjust xfs_bmap_add_attrfork for metadir
Online repair might use the xfs_bmap_add_attrfork to repair a file in
the metadata directory tree if (say) the metadata file lacks the correct
parent pointers. In that case, it is not correct to check that the file
is dqattached -- metadata files must be not have /any/ dquot attached at
all. Adjust the assertions appropriately.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:55 +0000 (20:18 -0800)]
xfs: mark quota inodes as metadata files
When we're creating quota files at mount time, make sure to mark them as
metadir inodes if appropriate.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:55 +0000 (20:18 -0800)]
xfs: don't count metadata directory files to quota
Files in the metadata directory tree are internal to the filesystem.
Don't count the inodes or the blocks they use in the root dquot because
users do not need to know about their resource usage. This will also
quiet down complaints about dquot usage not matching du output.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:54 +0000 (20:18 -0800)]
xfs: allow bulkstat to return metadata directories
Allow the V5 bulkstat ioctl to return information about metadata
directory files so that xfs_scrub can find and scrub them, since they
are otherwise ordinary directories.
(Metadata files of course require per-file scrub code and hence do not
need exposure.)
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:53 +0000 (20:18 -0800)]
xfs: advertise metadata directory feature
Advertise the existence of the metadata directory feature; this will be
used by scrub to decide if it needs to scan the metadir too.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:53 +0000 (20:18 -0800)]
xfs: hide metadata inodes from everyone because they are special
Metadata inodes are private files and therefore cannot be exposed to
userspace. This means no bulkstat, no open-by-handle, no linking them
into the directory tree, and no feeding them to LSMs. As such, we mark
them S_PRIVATE, which stops all that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:52 +0000 (20:18 -0800)]
xfs: disable the agi rotor for metadata inodes
Ideally, we'd put all the metadata inodes in one place if we could, so
that the metadata all stay reasonably close together instead of
spreading out over the disk. Furthermore, if the log is internal we'd
probably prefer to keep the metadata near the log. Therefore, disable
AGI rotoring for metadata inode allocations.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:52 +0000 (20:18 -0800)]
xfs: read and write metadata inode directory tree
Plumb in the bits we need to load metadata inodes from a named entry in
a metadir directory, create (or hardlink) inodes into a metadir
directory, create metadir directories, and flag inodes as being metadata
files.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:51 +0000 (20:18 -0800)]
xfs: enforce metadata inode flag
Add checks for the metadata inode flag so that we don't ever leak
metadata inodes out to userspace, and we don't ever try to read a
regular inode as metadata.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:51 +0000 (20:18 -0800)]
xfs: load metadata directory root at mount time
Load the metadata directory root inode into memory at mount time and
release it at unmount time.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:50 +0000 (20:18 -0800)]
xfs: iget for metadata inodes
Create a xfs_trans_metafile_iget function for metadata inodes to ensure
that when we try to iget a metadata file, the inode is allocated and its
file mode matches the metadata file type the caller expects.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:49 +0000 (20:18 -0800)]
xfs: define the on-disk format for the metadir feature
Define the on-disk layout and feature flags for the metadata inode
directory feature. Add a xfs_sb_version_hasmetadir for benefit of
xfs_repair, which needs to know where the new end of the superblock
lies.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:49 +0000 (20:18 -0800)]
xfs: standardize EXPERIMENTAL warning generation
Refactor the open-coded warnings about EXPERIMENTAL feature use into a
standard helper before we go adding more experimental features.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:48 +0000 (20:18 -0800)]
xfs: rename metadata inode predicates
The predicate xfs_internal_inum tells us if an inumber refers to one of
the inodes rooted in the superblock. Soon we're going to have internal
inodes in a metadata directory tree, so this helper should be renamed
to capture its limited scope.
Ondisk inodes will soon have a flag to indicate that they're metadata
inodes. Head off some confusion by renaming the xfs_is_metadata_inode
predicate to xfs_is_internal_inode.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:48 +0000 (20:18 -0800)]
xfs: constify the xfs_inode predicates
Change the xfs_inode predicates to take a const struct xfs_inode pointer
because they do not change the inode.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:47 +0000 (20:18 -0800)]
xfs: constify the xfs_sb predicates
Change the xfs_sb predicates to take a const struct xfs_sb pointer
because they do not change the superblock.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:46 +0000 (20:18 -0800)]
xfs: store a generic group structure in the intents
Replace the pag pointers in the extent free, bmap, rmap and refcount
intent structures with a pointer to the generic group to prepare
for adding intents for realtime groups.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:46 +0000 (20:18 -0800)]
xfs: remove xfs_group_intent_hold and xfs_group_intent_rele
Each of them just has a single caller, so fold them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:45 +0000 (20:18 -0800)]
xfs: add group based bno conversion helpers
Add/move the blocks, blklog and blkmask fields to the generic groups
structure so that code can work with AGs and RTGs by just using the
right index into the array.
Then, add convenience helpers to convert block numbers based on the
generic group. This will allow writing code that doesn't care if it is
used on AGs or the upcoming realtime groups.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:45 +0000 (20:18 -0800)]
xfs: store a generic xfs_group pointer in xfs_getfsmap_info
Replace the pag and rtg pointers with a generic group pointer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:44 +0000 (20:18 -0800)]
xfs: add a generic group pointer to the btree cursor
Replace the pag pointers in the type specific union with a generic
xfs_group pointer. This prepares for adding realtime group support.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:43 +0000 (20:18 -0800)]
xfs: convert busy extent tracking to the generic group structure
Split busy extent tracking from struct xfs_perag into its own private
structure, which can be pointed to by the generic group structure.
Note that this structure is now dynamically allocated instead of embedded
as the upcoming zone XFS code doesn't need it and will also have an
unusually high number of groups due to hardware constraints. Dynamically
allocating the structure this is a big memory saver for this case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:43 +0000 (20:18 -0800)]
xfs: convert extent busy tracepoints to the generic group structure
Prepare for tracking busy RT extents by passing the generic group
structure to the xfs_extent_busy_class tracepoints.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:42 +0000 (20:18 -0800)]
xfs: return the busy generation from xfs_extent_busy_list_empty
This avoid having to poke into the internals of the busy tracking in
xrep_setup_ag_allocbt.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:42 +0000 (20:18 -0800)]
xfs: move the online repair rmap hooks to the generic group structure
Prepare for the upcoming realtime groups feature by moving the online
repair rmap hooks to based to the generic xfs_group structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:41 +0000 (20:18 -0800)]
xfs: move draining of deferred operations to the generic group structure
Prepare supporting the upcoming realtime groups feature by moving the
deferred operation draining to the generic xfs_group structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:41 +0000 (20:18 -0800)]
xfs: mark xfs_perag_intent_{hold,rele} static
These two functions are only used inside of xfs_drain.c, so mark them
static.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:40 +0000 (20:18 -0800)]
xfs: move metadata health tracking to the generic group structure
Prepare for also tracking the health status of the upcoming realtime
groups by moving the health tracking code to the generic xfs_group
structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:39 +0000 (20:18 -0800)]
xfs: switch perag iteration from the for_each macros to a while based iterator
The current for_each_perag* macros are a bit annoying in that they
require the caller to both provide an object and an index iterator, and
also somewhat obsfucate the underlying control flow mechanism.
Switch to open coded while loops using new xfs_perag_next{,_from,_range}
helpers that return the next pag structure to iterate on based on the
previous one or NULL for the loop start.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:39 +0000 (20:18 -0800)]
xfs: add a xfs_group_next_range helper
Add a helper to iterate over iterate over all groups, which can be used
as a simple while loop:
struct xfs_group *xg = NULL;
while ((xg = xfs_group_next_range(mp, xg, 0, MAX_GROUP))) {
...
}
This will be wrapped by the realtime group code first, and eventually
replace the for_each_rtgroup_from and for_each_rtgroup_range helpers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:38 +0000 (20:18 -0800)]
xfs: factor out a generic xfs_group structure
Split the lookup and refcount handling of struct xfs_perag into an
embedded xfs_group structure that can be reused for the upcoming
realtime groups.
It will be extended with more features later.
Note that he xg_type field will only need a single bit even with
realtime group support. For now it fills a hole, but it might be
worth to fold it into another field if we can use this space better.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:37 +0000 (20:18 -0800)]
xfs: factor out a xfs_iwalk_args helper
Add a helper to share more code between xfs_iwalk and xfs_inobt_walk,
and at the same time do away with the extra flags indirect so that
everyone use the same names for the same flags when using the common
iwalk code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:37 +0000 (20:18 -0800)]
xfs: insert the pag structures into the xarray later
Cleaning up is much easier if a structure can't be looked up yet, so only
insert the pag once it is fully set up.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:36 +0000 (20:18 -0800)]
xfs: split xfs_initialize_perag
Factor out a xfs_perag_alloc helper that allocates a single perag
structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:36 +0000 (20:18 -0800)]
xfs: convert remaining trace points to pass pag structures
Convert all tracepoints that take [mp,agno] tuples to take a pag argument
instead so that decoding only happens when tracepoints are enabled and to
clean up the callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:35 +0000 (20:18 -0800)]
xfs: pass the pag to the xrep_newbt_extent_class tracepoints
This requires moving a few of the callsites a little bit to ensure that
we already have the reference, but allows for the decoding to only happen
when tracing is actually enabled, and cleans up the callsites a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:35 +0000 (20:18 -0800)]
xfs: pass the pag to the trace_xrep_calc_ag_resblks{,_btsize} trace points
This requires holding the pag refcount a little longer, but allows for the
decoding to only happen when tracing is actually enabled, and cleans up the
callsites a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:34 +0000 (20:18 -0800)]
xfs: pass objects to the xrep_ibt_walk_rmap tracepoint
Pass the perag structure and the irec so that the decoding is only done
when tracing is actually enabled and the call sites look a lot neater,
and remove the pointless class indirection.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:33 +0000 (20:18 -0800)]
xfs: pass the iunlink item to the xfs_iunlink_update_dinode trace point
So that decoding is only done when tracing is actually enabled and the
call site look a lot neater.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:33 +0000 (20:18 -0800)]
xfs: pass objects to the xfs_irec_merge_{pre,post} trace points
Pass the perag structure and the irec to these tracepoints so that the
decoding is only done when tracing is actually enabled and the call sites
look a lot neater.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:32 +0000 (20:18 -0800)]
xfs: pass a perag structure to the xfs_ag_resv_init_error trace point
And remove the single instance class indirection for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:32 +0000 (20:18 -0800)]
xfs: constify pag arguments to trace points
Trace points never modify their arguments. Mark all the pag objects
passed to trace points. The exception is the xfs_ag_resv_class, which
uses the xfs_perag_resv helper that can't be marked const due to
other users modifying the returned structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:31 +0000 (20:18 -0800)]
xfs: remove the unused xrep_bmap_walk_rmap trace point
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:30 +0000 (20:18 -0800)]
xfs: remove the unused trace_xfs_iwalk_ag trace point
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:30 +0000 (20:18 -0800)]
xfs: remove the mount field from struct xfs_busy_extents
The mount field is only passed to xfs_extent_busy_clear, which never uses
it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:29 +0000 (20:18 -0800)]
xfs: keep a reference to the pag for busy extents
Processing of busy extents requires the perag structure, so keep the
reference while they are in flight.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:29 +0000 (20:18 -0800)]
xfs: pass a pag to xfs_extent_busy_{search,reuse}
Replace the [mp,agno] tuple with the perag structure, which will become
more useful later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:28 +0000 (20:18 -0800)]
xfs: add a xfs_agino_to_ino helper
Add a helpers to convert an agino to an ino based on a pag structure.
This provides a simpler conversion and better type safety compared to the
existing code that passes the mount structure and the agno separately.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:28 +0000 (20:18 -0800)]
xfs: add xfs_agbno_to_fsb and xfs_agbno_to_daddr helpers
Add helpers to convert an agbno to a daddr or fsbno based on a pag
structure.
This provides a simpler conversion and better type safety compared to the
existing code that passes the mount structure and the agno separately.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:27 +0000 (20:18 -0800)]
xfs: remove the agno argument to xfs_free_ag_extent
xfs_free_ag_extent already has a pointer to the pag structure through
the agf buffer. Use that instead of passing the redundant argument,
and do the same for the tracepoint.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:26 +0000 (20:18 -0800)]
xfs: pass a pag to xfs_difree_inode_chunk
We'll want to use more than just the agno field in a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:26 +0000 (20:18 -0800)]
xfs: remove the unused pag_active_wq field in struct xfs_perag
pag_active_wq is only woken, but never waited for.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:25 +0000 (20:18 -0800)]
xfs: remove the unused pagb_count field in struct xfs_perag
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Christoph Hellwig [Mon, 4 Nov 2024 04:18:25 +0000 (20:18 -0800)]
xfs: fix superfluous clearing of info->low in __xfs_getfsmap_datadev
The for_each_perag helpers update the agno passed in for each iteration,
and thus the "if (pag->pag_agno == start_ag)" check will always be true.
Add another variable for the loop iterator so that the field is only
cleared after the first iteration.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 4 Nov 2024 04:18:24 +0000 (20:18 -0800)]
xfs: fix simplify extent lookup in xfs_can_free_eofblocks
In commit
11f4c3a53adde, we tried to simplify the extent lookup in
xfs_can_free_eofblocks so that it doesn't incur the overhead of all the
extra stuff that xfs_bmapi_read does around the iext lookup.
Unfortunately, this causes regressions on generic/603, xfs/108,
generic/219, xfs/173, generic/694, xfs/052, generic/230, and xfs/441
when always_cow is turned on. In all cases, the regressions take the
form of alwayscow files consuming rather more space than the golden
output is expecting. I observed that in all these cases, the cause of
the excess space usage was due to CoW fork delalloc reservations that go
beyond EOF.
For alwayscow files we allow posteof delalloc CoW reservations because
all writes go through the CoW fork. Recall that all extents in the CoW
fork are accounted for via i_delayed_blks, which means that prior to
this patch, we'd invoke xfs_free_eofblocks on first close if anything
was in the CoW fork. Now we don't do that.
Fix the problem by reverting the removal of the i_delayed_blks check.
Cc: <stable@vger.kernel.org> # v6.12-rc1
Fixes:
11f4c3a53adde ("xfs: simplify extent lookup in xfs_can_free_eofblocks")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Linus Torvalds [Mon, 4 Nov 2024 00:05:52 +0000 (14:05 -1000)]
Linux 6.12-rc6
Linus Torvalds [Sun, 3 Nov 2024 20:25:05 +0000 (10:25 -1000)]
Merge tag 'mm-hotfixes-stable-2024-11-03-10-50' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"17 hotfixes. 9 are cc:stable. 13 are MM and 4 are non-MM.
The usual collection of singletons - please see the changelogs"
* tag 'mm-hotfixes-stable-2024-11-03-10-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()
mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes
mm: shrinker: avoid memleak in alloc_shrinker_info
.mailmap: update e-mail address for Eugen Hristev
vmscan,migrate: fix page count imbalance on node stats when demoting pages
mailmap: update Jarkko's email addresses
mm: allow set/clear page_type again
nilfs2: fix potential deadlock with newly created symlinks
Squashfs: fix variable overflow in squashfs_readpage_block
kasan: remove vmalloc_percpu test
tools/mm: -Werror fixes in page-types/slabinfo
mm, swap: avoid over reclaim of full clusters
mm: fix PSWPIN counter for large folios swap-in
mm: avoid VM_BUG_ON when try to map an anon large folio to zero page.
mm/codetag: fix null pointer check logic for ref and tag
mm/gup: stop leaking pinned pages in low memory conditions
Linus Torvalds [Sun, 3 Nov 2024 20:19:34 +0000 (10:19 -1000)]
Merge tag 'phy-fixes-6.12' of git://git./linux/kernel/git/phy/linux-phy
Pull phy fixes from Vinod Koul:
- Qualcomm QMP driver fixes for null deref on suspend, bogus supplies
fix and reset entries fix
- BCM usb driver init array fix
- cadence array offset fix
- starfive link configuration fix
- config dependency fix for rockchip driver
- freescale reset signal fix before pll lock
- tegra driver fix for error pointer check
* tag 'phy-fixes-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
phy: tegra: xusb: Add error pointer check in xusb.c
dt-bindings: phy: qcom,sc8280xp-qmp-pcie-phy: Fix X1E80100 resets entries
phy: freescale: imx8m-pcie: Do CMN_RST just before PHY PLL lock check
phy: phy-rockchip-samsung-hdptx: Depend on CONFIG_COMMON_CLK
phy: ti: phy-j721e-wiz: fix usxgmii configuration
phy: starfive: jh7110-usb: Fix link configuration to controller
phy: qcom: qmp-pcie: drop bogus x1e80100 qref supplies
phy: qcom: qmp-combo: move driver data initialisation earlier
phy: qcom: qmp-usbc: fix NULL-deref on runtime suspend
phy: qcom: qmp-usb-legacy: fix NULL-deref on runtime suspend
phy: qcom: qmp-usb: fix NULL-deref on runtime suspend
dt-bindings: phy: qcom,sc8280xp-qmp-pcie-phy: add missing x1e80100 pipediv2 clocks
phy: usb: disable COMMONONN for dual mode
phy: cadence: Sierra: Fix offset of DEQ open eye algorithm control register
phy: usb: Fix missing elements in BCM4908 USB init array
Linus Torvalds [Sun, 3 Nov 2024 20:15:50 +0000 (10:15 -1000)]
Merge tag 'dmaengine-fix-6.12' of git://git./linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
- TI driver fix to set EOP for cyclic BCDMA transfers
- sh rz-dmac driver fix for handling config with zero address
* tag 'dmaengine-fix-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: ti: k3-udma: Set EOP for all TRs in cyclic BCDMA transfer
dmaengine: sh: rz-dmac: handle configs where one address is zero
Linus Torvalds [Sun, 3 Nov 2024 18:51:53 +0000 (08:51 -1000)]
Merge tag 'driver-core-6.12-rc6' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core revert from Greg KH:
"Here is a single driver core revert for 6.12-rc6. It reverts a change
that came in -rc1 that was supposed to resolve a reported problem, but
caused another one, so revert it for now so that we can get this all
worked out properly in 6.13.
The revert has been in linux-next all week with no reported issues"
* tag 'driver-core-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Revert "driver core: Fix uevent_show() vs driver detach race"
Linus Torvalds [Sun, 3 Nov 2024 18:48:11 +0000 (08:48 -1000)]
Merge tag 'usb-6.12-rc6' of git://git./linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt fixes from Greg KH:
"Here are some small USB and Thunderbolt driver fixes for 6.12-rc6 that
have been sitting in my tree this week. Included in here are the
following:
- thunderbolt driver fixes for reported issues
- USB typec driver fixes
- xhci driver fixes for reported problems
- dwc2 driver revert for a broken change
- usb phy driver fix
- usbip tool fix
All of these have been in linux-next this week with no reported
issues"
* tag 'usb-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: tcpm: restrict SNK_WAIT_CAPABILITIES_TIMEOUT transitions to non self-powered devices
usb: phy: Fix API devm_usb_put_phy() can not release the phy
usb: typec: use cleanup facility for 'altmodes_node'
usb: typec: fix unreleased fwnode_handle in typec_port_register_altmodes()
usb: typec: qcom-pmic-typec: fix missing fwnode removal in error path
usb: typec: qcom-pmic-typec: use fwnode_handle_put() to release fwnodes
usb: acpi: fix boot hang due to early incorrect 'tunneled' USB3 device links
Revert "usb: dwc2: Skip clock gating on Broadcom SoCs"
xhci: Fix Link TRB DMA in command ring stopped completion event
xhci: Use pm_runtime_get to prevent RPM on unsupported systems
usbip: tools: Fix detach_port() invalid port error path
thunderbolt: Honor TMU requirements in the domain when setting TMU mode
thunderbolt: Fix KASAN reported stack out-of-bounds read in tb_retimer_scan()
Yu Zhao [Sat, 19 Oct 2024 01:29:39 +0000 (01:29 +0000)]
mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()
When the MM_WALK capability is enabled, memory that is mostly accessed by
a VM appears younger than it really is, therefore this memory will be less
likely to be evicted. Therefore, the presence of a running VM can
significantly increase swap-outs for non-VM memory, regressing the
performance for the rest of the system.
Fix this regression by always calling {ptep,pmdp}_clear_young_notify()
whenever we clear the young bits on PMDs/PTEs.
[jthoughton@google.com: fix link-time error]
Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com
Fixes:
bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: James Houghton <jthoughton@google.com>
Reported-by: David Stevens <stevensd@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: <stable@vger.kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yu Zhao [Sat, 19 Oct 2024 01:29:38 +0000 (01:29 +0000)]
mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
Patch series "mm: multi-gen LRU: Have secondary MMUs participate in
MM_WALK".
Today, the MM_WALK capability causes MGLRU to clear the young bit from
PMDs and PTEs during the page table walk before eviction, but MGLRU does
not call the clear_young() MMU notifier in this case. By not calling this
notifier, the MM walk takes less time/CPU, but it causes pages that are
accessed mostly through KVM / secondary MMUs to appear younger than they
should be.
We do call the clear_young() notifier today, but only when attempting to
evict the page, so we end up clearing young/accessed information less
frequently for secondary MMUs than for mm PTEs, and therefore they appear
younger and are less likely to be evicted. Therefore, memory that is
*not* being accessed mostly by KVM will be evicted *more* frequently,
worsening performance.
ChromeOS observed a tab-open latency regression when enabling MGLRU with a
setup that involved running a VM:
Tab-open latency histogram (ms)
Version p50 mean p95 p99 max
base 1315 1198 2347 3454 10319
mglru 2559 1311 7399 12060 43758
fix 1119 926 2470 4211 6947
This series replaces the final non-selftest patchs from this series[1],
which introduced a similar change (and a new MMU notifier) with KVM
optimizations. I'll send a separate series (to Sean and Paolo) for the
KVM optimizations.
This series also makes proactive reclaim with MGLRU possible for KVM
memory. I have verified that this functions correctly with the selftest
from [1], but given that that test is a KVM selftest, I'll send it with
the rest of the KVM optimizations later. Andrew, let me know if you'd
like to take the test now anyway.
[1]: https://lore.kernel.org/linux-mm/
20240926013506.860253-18-jthoughton@google.com/
This patch (of 2):
The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful
and become more complicated to properly compute when adding
test/clear_young() notifiers in MGLRU's mm walk.
Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com
Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com
Fixes:
bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: James Houghton <jthoughton@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Stevens <stevensd@google.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Sun, 3 Nov 2024 18:45:03 +0000 (08:45 -1000)]
Merge tag 'char-misc-6.12-rc6' of git://git./linux/kernel/git/gregkh/char-misc
Pull misc driver fixes from Greg KH:
"Here are some small char/misc/iio fixes for 6.12-rc6 that resolve
some reported issues. Included in here are the following:
- small IIO driver fixes for many reported issues
- mei driver fix for a suddenly much reported issue for an "old"
issue.
- MAINTAINERS update for a developer who has moved companies and
forgot to update their old entry.
All of these have been in linux-next this week with no reported
issues"
* tag 'char-misc-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
mei: use kvmalloc for read buffer
MAINTAINERS: add netup_unidvb maintainer
iio: dac: Kconfig: Fix build error for ltc2664
iio: adc: ad7124: fix division by zero in ad7124_set_channel_odr()
staging: iio: frequency: ad9832: fix division by zero in ad9832_calc_freqreg()
docs: iio: ad7380: fix supply for ad7380-4
iio: adc: ad7380: fix supplies for ad7380-4
iio: adc: ad7380: add missing supplies
iio: adc: ad7380: use devm_regulator_get_enable_read_voltage()
dt-bindings: iio: adc: ad7380: fix ad7380-4 reference supply
iio: light: veml6030: fix microlux value calculation
iio: gts-helper: Fix memory leaks for the error path of iio_gts_build_avail_scale_table()
iio: gts-helper: Fix memory leaks in iio_gts_build_avail_scale_table()
Linus Torvalds [Sun, 3 Nov 2024 18:35:29 +0000 (08:35 -1000)]
Merge tag 'input-for-v6.12-rc5' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a fix for regression in input core introduced in 6.11 preventing
re-registering input handlers
- a fix for adp5588-keys driver tyring to disable interrupt 0 at
suspend when devices is used without interrupt
- a fix for edt-ft5x06 to stop leaking regmap structure when probing
fails and to make sure it is not released too early on removal.
* tag 'input-for-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: fix regression when re-registering input handlers
Input: adp5588-keys - do not try to disable interrupt 0
Input: edt-ft5x06 - fix regmap leak when probe fails
Linus Torvalds [Sun, 3 Nov 2024 18:29:02 +0000 (08:29 -1000)]
Merge tag 'kbuild-fixes-v6.12-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix a memory leak in modpost
- Resolve build issues when cross-compiling RPM and Debian packages
- Fix another regression in Kconfig
- Fix incorrect MODULE_ALIAS() output in modpost
* tag 'kbuild-fixes-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
modpost: fix input MODULE_DEVICE_TABLE() built for 64-bit on 32-bit host
modpost: fix acpi MODULE_DEVICE_TABLE built with mismatched endianness
kconfig: show sub-menu entries even if the prompt is hidden
kbuild: deb-pkg: add pkg.linux-upstream.nokerneldbg build profile
kbuild: deb-pkg: add pkg.linux-upstream.nokernelheaders build profile
kbuild: rpm-pkg: disable kernel-devel package when cross-compiling
sumversion: Fix a memory leak in get_src_version()
Linus Torvalds [Sun, 3 Nov 2024 18:26:00 +0000 (08:26 -1000)]
Merge tag 'x86-urgent-2024-11-03' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A trivial compile test fix for x86:
When CONFIG_AMD_NB is not set a COMPILE_TEST of an AMD specific driver
fails due to a missing inline stub. Add the stub to cure it"
* tag 'x86-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/amd_nb: Fix compile-testing without CONFIG_AMD_NB
Linus Torvalds [Sun, 3 Nov 2024 18:22:21 +0000 (08:22 -1000)]
Merge tag 'timers-urgent-2024-11-03' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for posix CPU timers.
When a thread is cloned, the posix CPU timers are not inherited.
If the parent has a CPU timer armed the corresponding tick dependency
in the tasks tick_dep_mask is set and copied to the new thread, which
means the new thread and all decendants will prevent the system to go
into full NOHZ operation.
Clear the tick dependency mask in copy_process() to fix this"
* tag 'timers-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone
Linus Torvalds [Sun, 3 Nov 2024 18:18:28 +0000 (08:18 -1000)]
Merge tag 'sched-urgent-2024-11-03' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
- Plug a race between pick_next_task_fair() and try_to_wake_up() where
both try to write to the same task, even though both paths hold a
runqueue lock, but obviously from different runqueues.
The problem is that the store to task::on_rq in __block_task() is
visible to try_to_wake_up() which assumes that the task is not
queued. Both sides then operate on the same task.
Cure it by rearranging __block_task() so the the store to task::on_rq
is the last operation on the task.
- Prevent a potential NULL pointer dereference in task_numa_work()
task_numa_work() iterates the VMAs of a process. A concurrent unmap
of the address space can result in a NULL pointer return from
vma_next() which is unchecked.
Add the missing NULL pointer check to prevent this.
- Operate on the correct scheduler policy in task_should_scx()
task_should_scx() returns true when a task should be handled by sched
EXT. It checks the tasks scheduling policy.
This fails when the check is done before a policy has been set.
Cure it by handing the policy into task_should_scx() so it operates
on the requested value.
- Add the missing handling of sched EXT in the delayed dequeue
mechanism. This was simply forgotten.
* tag 'sched-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/ext: Fix scx vs sched_delayed
sched: Pass correct scheduling policy to __setscheduler_class
sched/numa: Fix the potential null pointer dereference in task_numa_work()
sched: Fix pick_next_task_fair() vs try_to_wake_up() race
Linus Torvalds [Sun, 3 Nov 2024 18:13:52 +0000 (08:13 -1000)]
Merge tag 'perf-urgent-2024-11-03' of git://git./linux/kernel/git/tip/tip
Pull perf fix from Thomas Gleixner:
"perf_event_clear_cpumask() uses list_for_each_entry_rcu() without
being in a RCU read side critical section, which triggers a
'suspicious RCU usage' warning.
It turns out that the list walk does not be RCU protected because the
write side lock is held in this context.
Change it to a regular list walk"
* tag 'perf-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix missing RCU reader protection in perf_event_clear_cpumask()
Linus Torvalds [Sun, 3 Nov 2024 18:09:25 +0000 (08:09 -1000)]
Merge tag 'irq-urgent-2024-11-03' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
- Fix an off-by-one error in the failure path of msi_domain_alloc(),
which causes the cleanup loop to terminate early and leaking the
first allocated interrupt.
- Handle a corner case in GIC-V4 versus a lazily mapped Virtual
Processing Element (VPE). If the VPE has not been mapped because the
guest has not yet emitted a mapping command, then the set_affinity()
callback returns an error code, which causes the vCPU management to
fail.
Return success in this case without touching the hardware. This will
be done later when the guest issues the mapping command.
* tag 'irq-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v4: Correctly deal with set_affinity on lazily-mapped VPEs
genirq/msi: Fix off-by-one error in msi_domain_alloc()
Masahiro Yamada [Sun, 3 Nov 2024 12:52:57 +0000 (21:52 +0900)]
modpost: fix input MODULE_DEVICE_TABLE() built for 64-bit on 32-bit host
When building a 64-bit kernel on a 32-bit build host, incorrect
input MODULE_ALIAS() entries may be generated.
For example, when compiling a 64-bit kernel with CONFIG_INPUT_MOUSEDEV=m
on a 64-bit build machine, you will get the correct output:
$ grep MODULE_ALIAS drivers/input/mousedev.mod.c
MODULE_ALIAS("input:b*v*p*e*-e*1,*2,*k*110,*r*0,*1,*a*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*2,*k*r*8,*a*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*14A,*r*a*0,*1,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*145,*r*a*0,*1,*18,*1C,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*110,*r*a*0,*1,*m*l*s*f*w*");
However, building the same kernel on a 32-bit machine results in
incorrect output:
$ grep MODULE_ALIAS drivers/input/mousedev.mod.c
MODULE_ALIAS("input:b*v*p*e*-e*1,*2,*k*110,*130,*r*0,*1,*a*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*2,*k*r*8,*a*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*14A,*16A,*r*a*0,*1,*20,*21,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*145,*165,*r*a*0,*1,*18,*1C,*20,*21,*38,*3C,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*110,*130,*r*a*0,*1,*20,*21,*m*l*s*f*w*");
A similar issue occurs with CONFIG_INPUT_JOYDEV=m. On a 64-bit build
machine, the output is:
$ grep MODULE_ALIAS drivers/input/joydev.mod.c
MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*0,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*2,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*8,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*6,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*k*120,*r*a*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*k*130,*r*a*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*k*2C0,*r*a*m*l*s*f*w*");
However, on a 32-bit machine, the output is incorrect:
$ grep MODULE_ALIAS drivers/input/joydev.mod.c
MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*0,*20,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*2,*22,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*8,*28,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*6,*26,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*k*11F,*13F,*r*a*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*k*11F,*13F,*r*a*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*k*2C0,*2E0,*r*a*m*l*s*f*w*");
When building a 64-bit kernel, BITS_PER_LONG is defined as 64. However,
on a 32-bit build machine, the constant 1L is a signed 32-bit value.
Left-shifting it beyond 32 bits causes wraparound, and shifting by 31
or 63 bits makes it a negative value.
The fix in commit
e0e92632715f ("[PATCH] PATCH: 1 line 2.6.18 bugfix:
modpost-64bit-fix.patch") is incorrect; it only addresses cases where
a 64-bit kernel is built on a 64-bit build machine, overlooking cases
on a 32-bit build machine.
Using 1ULL ensures a 64-bit width on both 32-bit and 64-bit machines,
avoiding the wraparound issue.
Fixes:
e0e92632715f ("[PATCH] PATCH: 1 line 2.6.18 bugfix: modpost-64bit-fix.patch")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Masahiro Yamada [Sun, 3 Nov 2024 12:46:50 +0000 (21:46 +0900)]
modpost: fix acpi MODULE_DEVICE_TABLE built with mismatched endianness
When CONFIG_SATA_AHCI_PLATFORM=m, modpost outputs incorect acpi
MODULE_ALIAS() if the endianness of the target and the build machine
do not match.
When the endianness of the target kernel and the build machine match,
the output is correct:
$ grep 'MODULE_ALIAS("acpi' drivers/ata/ahci_platform.mod.c
MODULE_ALIAS("acpi*:APMC0D33:*");
MODULE_ALIAS("acpi*:010601:*");
However, when building a little-endian kernel on a big-endian machine
(or vice versa), the output is incorrect:
$ grep 'MODULE_ALIAS("acpi' drivers/ata/ahci_platform.mod.c
MODULE_ALIAS("acpi*:APMC0D33:*");
MODULE_ALIAS("acpi*:0601??:*");
The 'cls' and 'cls_msk' fields are 32-bit.
DEF_FIELD() must be used instead of DEF_FIELD_ADDR() to correctly handle
endianness of these 32-bit fields.
The check 'if (cls)' was unnecessary; it never became NULL, as it was
the pointer to 'symval' plus the offset to the 'cls' field.
Fixes:
26095a01d359 ("ACPI / scan: Add support for ACPI _CLS device matching")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Dmitry Torokhov [Mon, 28 Oct 2024 05:31:15 +0000 (22:31 -0700)]
Input: fix regression when re-registering input handlers
Commit
d469647bafd9 ("Input: simplify event handling logic") introduced
code that would set handler->events() method to either
input_handler_events_filter() or input_handler_events_default() or
input_handler_events_null(), depending on the kind of input handler
(a filter or a regular one) we are dealing with. Unfortunately this
breaks cases when we try to re-register the same filter (as is the case
with sysrq handler): after initial registration the handler will have 2
event handling methods defined, and will run afoul of the check in
input_handler_check_methods():
input: input_handler_check_methods: only one event processing method can be defined (sysrq)
sysrq: Failed to register input handler, error -22
Fix this by adding handle_events() method to input_handle structure and
setting it up when registering a new input handle according to event
handling methods defined in associated input_handler structure, thus
avoiding modifying the input_handler structure.
Reported-by: "Ned T. Crigler" <crigler@gmail.com>
Reported-by: Christian Heusel <christian@heusel.eu>
Tested-by: "Ned T. Crigler" <crigler@gmail.com>
Tested-by: Peter Seiderer <ps.report@gmx.net>
Fixes:
d469647bafd9 ("Input: simplify event handling logic")
Link: https://lore.kernel.org/r/Zx2iQp6csn42PJA7@xavtug
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Linus Torvalds [Sat, 2 Nov 2024 19:27:11 +0000 (09:27 -1000)]
Merge tag 'nfsd-6.12-3' of git://git./linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix two async COPY bugs found during NFS bake-a-thon
- Fix an svcrdma memory leak
* tag 'nfsd-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
rpcrdma: Always release the rpcrdma_device's xa_array
NFSD: Never decrement pending_async_copies on error
NFSD: Initialize struct nfsd4_copy earlier
Linus Torvalds [Sat, 2 Nov 2024 19:22:16 +0000 (09:22 -1000)]
Merge tag 'xfs-6.12-fixes-6' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
- fix a sysbot reported crash on filestreams
- Reduce cpu time spent searching for extents in a very fragmented FS
- Check for delayed allocations before setting extsize
* tag 'xfs-6.12-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: streamline xfs_filestream_pick_ag
xfs: fix finding a last resort AG in xfs_filestream_pick_ag
xfs: Reduce unnecessary searches when searching for the best extents
xfs: Check for delayed allocations before setting extsize
Linus Torvalds [Sat, 2 Nov 2024 02:05:50 +0000 (16:05 -1000)]
Merge tag 'linux_kselftest-fixes-6.12-rc6' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
- fix syntax error in frequency calculation arithmetic expression in
intel_pstate run.sh
- add missing cpupower dependency check intel_pstate run.sh
- fix idmap_mount_tree_invalid test failure due to incorrect argument
- fix watchdog-test run leaving the watchdog timer enabled causing
system reboot. With this fix, the test disables the watchdog timer
when it gets terminated with SIGTERM, SIGKILL, and SIGQUIT in
addition to SIGINT
* tag 'linux_kselftest-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/watchdog-test: Fix system accidentally reset after watchdog-test
selftests/intel_pstate: check if cpupower is installed
selftests/intel_pstate: fix operand expected error
selftests/mount_setattr: fix idmap_mount_tree_invalid failed to run
Linus Torvalds [Sat, 2 Nov 2024 01:59:46 +0000 (15:59 -1000)]
Merge tag 'rust-fixes-6.12-3' of https://github.com/Rust-for-Linux/linux
Pull rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Avoid build errors with old 'rustc's without LLVM patch version
(important since it impacts people that do not even enable Rust)
- Update LLVM version for 'HAVE_CFI_ICALL_NORMALIZE_INTEGERS' in
'depends on' condition (the fix was eventually backported rather
than land in LLVM 19)"
* tag 'rust-fixes-6.12-3' of https://github.com/Rust-for-Linux/linux:
cfi: tweak llvm version for HAVE_CFI_ICALL_NORMALIZE_INTEGERS
kbuild: rust: avoid errors with old `rustc`s without LLVM patch version
Linus Torvalds [Sat, 2 Nov 2024 01:44:23 +0000 (15:44 -1000)]
Merge tag 'pci-v6.12-fixes-2' of git://git./linux/kernel/git/pci/pci
Pull pci fix from Bjorn Helgaas:
- Enable device-specific ACS-like functionality even if the device
doesn't advertise an ACS capability, which got broken when adding
fancy ACS kernel parameter (Jason Gunthorpe)
* tag 'pci-v6.12-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Fix pci_enable_acs() support for the ACS quirks
Linus Torvalds [Sat, 2 Nov 2024 01:37:09 +0000 (15:37 -1000)]
Merge tag 'drm-fixes-2024-11-02' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Regular fixes pull, nothing too out of the ordinary, the mediatek
fixes came in a batch that I might have preferred a bit earlier but
all seem fine, otherwise regular xe/amdgpu and a few misc ones.
xe:
- Fix missing HPD interrupt enabling, bringing one PM refactor with it
- Workaround LNL GGTT invalidation not being visible to GuC
- Avoid getting jobs stuck without a protecting timeout
ivpu:
- Fix firewall IRQ handling
panthor:
- Fix firmware initialization wrt page sizes
- Fix handling and reporting of dead job groups
sched:
- Guarantee forward progress via WC_MEM_RECLAIM
tests:
- Fix memory leak in drm_display_mode_from_cea_vic()
amdgpu:
- DCN 3.5 fix
- Vangogh SMU KASAN fix
- SMU 13 profile reporting fix
mediatek:
- Fix degradation problem of alpha blending
- Fix color format MACROs in OVL
- Fix get efuse issue for MT8188 DPTX
- Fix potential NULL dereference in mtk_crtc_destroy()
- Correct dpi power-domains property
- Add split subschema property constraints"
* tag 'drm-fixes-2024-11-02' of https://gitlab.freedesktop.org/drm/kernel: (27 commits)
drm/xe: Don't short circuit TDR on jobs not started
drm/xe: Add mmio read before GGTT invalidate
drm/tests: hdmi: Fix memory leaks in drm_display_mode_from_cea_vic()
drm/connector: hdmi: Fix memory leak in drm_display_mode_from_cea_vic()
drm/tests: helpers: Add helper for drm_display_mode_from_cea_vic()
drm/panthor: Report group as timedout when we fail to properly suspend
drm/panthor: Fail job creation when the group is dead
drm/panthor: Fix firmware initialization on systems with a page size > 4k
accel/ivpu: Fix NOC firewall interrupt handling
drm/xe/display: Add missing HPD interrupt enabling during non-d3cold RPM resume
drm/xe/display: Separate the d3cold and non-d3cold runtime PM handling
drm/xe: Remove runtime argument from display s/r functions
drm/amdgpu/smu13: fix profile reporting
drm/amd/pm: Vangogh: Fix kernel memory out of bounds write
Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35"
drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM
drm/tegra: Fix NULL vs IS_ERR() check in probe()
dt-bindings: display: mediatek: split: add subschema property constraints
dt-bindings: display: mediatek: dpi: correct power-domains property
drm/mediatek: Fix potential NULL dereference in mtk_crtc_destroy()
...
Linus Torvalds [Sat, 2 Nov 2024 01:22:57 +0000 (15:22 -1000)]
Merge tag 'cxl-fixes-6.12-rc6' of git://git./linux/kernel/git/cxl/cxl
Pull cxl fixes from Ira Weiny:
"The bulk of these fixes center around an initialization order bug
reported by Gregory Price and some additional fall out from the
debugging effort.
In summary, cxl_acpi and cxl_mem race and previously worked because of
a bus_rescan_devices() while testing without modules built in.
Unfortunately with modules built in the rescan would fail due to the
cxl_port driver being registered late via the build order. Furthermore
it was found bus_rescan_devices() did not guarantee a probe barrier
which CXL was expecting. Additional fixes to cxl-test and decoder
allocation came along as they were found in this debugging effort.
The other fixes are pretty minor but one affects trace point data seen
by user space.
Summary:
- Fix crashes when running with cxl-test code
- Fix Trace DRAM Event Record field decodes
- Fix module/built in initialization order errors
- Fix use after free on decoder shutdowns
- Fix out of order decoder allocations
- Improve cxl-test to better reflect real world systems"
* tag 'cxl-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/test: Improve init-order fidelity relative to real-world systems
cxl/port: Prevent out-of-order decoder allocation
cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
cxl/acpi: Ensure ports ready at cxl_acpi_probe() return
cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()
cxl/port: Fix CXL port initialization order when the subsystem is built-in
cxl/events: Fix Trace DRAM Event Record
cxl/core: Return error when cxl_endpoint_gather_bandwidth() handles a non-PCI device
Linus Torvalds [Fri, 1 Nov 2024 23:41:55 +0000 (13:41 -1000)]
Merge tag 'block-6.12-
20241101' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Fixup for a recent blk_rq_map_user_bvec() patch
- NVMe pull request via Keith:
- Spec compliant identification fix (Keith)
- Module parameter to enable backward compatibility on unusual
namespace formats (Keith)
- Target double free fix when using keys (Vitaliy)
- Passthrough command error handling fix (Keith)
* tag 'block-6.12-
20241101' of git://git.kernel.dk/linux:
nvme: re-fix error-handling for io_uring nvme-passthrough
nvmet-auth: assign dh_key to NULL after kfree_sensitive
nvme: module parameter to disable pi with offsets
block: fix queue limits checks in blk_rq_map_user_bvec for real
nvme: enhance cns version checking
Linus Torvalds [Fri, 1 Nov 2024 23:38:01 +0000 (13:38 -1000)]
Merge tag 'io_uring-6.12-
20241101' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
- Fix not honoring IOCB_NOWAIT for starting buffered writes in terms of
calling sb_start_write(), leading to a deadlock if someone is
attempting to freeze the file system with writes in progress, as each
side will end up waiting for the other to make progress.
* tag 'io_uring-6.12-
20241101' of git://git.kernel.dk/linux:
io_uring/rw: fix missing NOWAIT check for O_DIRECT start write
Linus Torvalds [Fri, 1 Nov 2024 19:04:23 +0000 (09:04 -1000)]
Merge tag 'acpi-6.12-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Make the ACPI CPPC library use a raw spinlock for operations carried
out in scheduler context via the schedutil governor and the ACPI CPPC
cpufreq driver (Pierre Gondois)"
* tag 'acpi-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: CPPC: Make rmw_lock a raw_spin_lock
Linus Torvalds [Fri, 1 Nov 2024 19:03:02 +0000 (09:03 -1000)]
Merge tag 'gpio-fixes-for-v6.12-rc6' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix an uninitialized variable in GPIO swnode code
- add a missing return value check for devm_mutex_init()
- fix an old issue with debugfs output
* tag 'gpio-fixes-for-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: fix debugfs dangling chip separator
gpiolib: fix debugfs newline separators
gpio: sloppy-logic-analyzer: Check for error code from devm_mutex_init() call
gpio: fix uninit-value in swnode_find_gpio
Dave Airlie [Fri, 1 Nov 2024 18:44:02 +0000 (04:44 +1000)]
Merge tag 'drm-xe-fixes-2024-10-31' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix missing HPD interrupt enabling, bringing one PM refactor with it
(Imre / Maarten)
- Workaround LNL GGTT invalidation not being visible to GuC
(Matthew Brost)
- Avoid getting jobs stuck without a protecting timeout (Matthew Brost)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/tsbftadm7owyizzdaqnqu7u4tqggxgeqeztlfvmj5fryxlfomi@5m5bfv2zvzmw
Linus Torvalds [Fri, 1 Nov 2024 18:26:38 +0000 (08:26 -1000)]
Merge tag 'riscv-for-linus-6.11-rc6' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- Avoid accessing the early boot ACPI tables via unsafe memory
attributes, which can result in incorrect ACPI table data appearing.
This can cause all sorts of bad behavior.
- Avoid compiler-inserted library calls in the VDSO.
- GCC+Rust builds have been disabled, to avoid issues related to ISA
string mismatched between the GCC and LLVM Rust implementations.
- The NX flag is now set in the EFI PE/COFF headers, which is necessary
for some distro GRUB versions to boot images.
- A fix to avoid leaking DT node reference counts on ACPI systems
during cache info parsing.
- CPU numbers are now printed as unsigned values during hotplug.
- A pair of build fixes for usused macros, which can trigger warnings
on some configurations.
* tag 'riscv-for-linus-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Remove duplicated GET_RM
riscv: Remove unused GENERATING_ASM_OFFSETS
riscv: Use '%u' to format the output of 'cpu'
riscv: Prevent a bad reference count on CPU nodes
riscv: efi: Set NX compat flag in PE/COFF header
RISC-V: disallow gcc + rust builds
riscv: Do not use fortify in early code
RISC-V: ACPI: fix early_ioremap to early_memremap
riscv: vdso: Prevent the compiler from inserting calls to memset()
Linus Torvalds [Fri, 1 Nov 2024 17:54:11 +0000 (07:54 -1000)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The important one is a change to the way in which we handle protection
keys around signal delivery so that we're more closely aligned with
the x86 behaviour, however there is also a revert of the previous fix
to disable software tag-based KASAN with GCC, since a workaround
materialised shortly afterwards.
I'd love to say we're done with 6.12, but we're aware of some
longstanding fpsimd register corruption issues that we're almost at
the bottom of resolving.
Summary:
- Fix handling of POR_EL0 during signal delivery so that pushing the
signal context doesn't fail based on the pkey configuration of the
interrupted context and align our user-visible behaviour with that
of x86.
- Fix a bogus pointer being passed to the CPU hotplug code from the
Arm SDEI driver.
- Re-enable software tag-based KASAN with GCC by using an alternative
implementation of '__no_sanitize_address'"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: signal: Improve POR_EL0 handling to avoid uaccess failures
firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state()
Revert "kasan: Disable Software Tag-Based KASAN with GCC"
kasan: Fix Software Tag-Based KASAN with GCC
Linus Torvalds [Fri, 1 Nov 2024 17:45:00 +0000 (07:45 -1000)]
Merge tag 'vfs-6.12-rc6.iomap' of gitolite.pub/scm/linux/kernel/git/vfs/vfs
Pull iomap fixes from Christian Brauner:
"Fixes for iomap to prevent data corruption bugs in the fallocate
unshare range implementation of fsdax and a small cleanup to turn
iomap_want_unshare_iter() into an inline function"
* tag 'vfs-6.12-rc6.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
iomap: turn iomap_want_unshare_iter into an inline function
fsdax: dax_unshare_iter needs to copy entire blocks
fsdax: remove zeroing code from dax_unshare_iter
iomap: share iomap_unshare_iter predicate code with fsdax
xfs: don't allocate COW extents when unsharing a hole
Linus Torvalds [Fri, 1 Nov 2024 17:37:10 +0000 (07:37 -1000)]
Merge tag 'vfs-6.12-rc6.fixes' of gitolite.pub/scm/linux/kernel/git/vfs/vfs
Pull filesystem fixes from Christian Brauner:
"VFS:
- Fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP=y is set
- Add a get_tree_bdev_flags() helper that allows to modify e.g.,
whether errors are logged into the filesystem context during
superblock creation. This is used by erofs to fix a userspace
regression where an error is currently logged when its used on a
regular file which is an new allowed mode in erofs.
netfs:
- Fix the sysfs debug path in the documentation.
- Fix iov_iter_get_pages*() for folio queues by skipping the page
extracation if we're at the end of a folio.
afs:
- Fix moving subdirectories to different parent directory.
autofs:
- Fix handling of AUTOFS_DEV_IOCTL_TIMEOUT_CMD ioctl in
validate_dev_ioctl(). The actual ioctl number, not the ioctl
command needs to be checked for autofs"
* tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP
autofs: fix thinko in validate_dev_ioctl()
iov_iter: Fix iov_iter_get_pages*() for folio_queue
afs: Fix missing subdir edit when renamed between parent dirs
doc: correcting the debug path for cachefiles
erofs: use get_tree_bdev_flags() to avoid misleading messages
fs/super.c: introduce get_tree_bdev_flags()
Linus Torvalds [Fri, 1 Nov 2024 17:31:47 +0000 (07:31 -1000)]
Merge tag 'for-6.12-rc5-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more stability fixes. There's one patch adding export of MIPS
cmpxchg helper, used in the error propagation fix.
- fix error propagation from split bios to the original btrfs bio
- fix merging of adjacent extents (normal operation, defragmentation)
- fix potential use after free after freeing btrfs device structures"
* tag 'for-6.12-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix defrag not merging contiguous extents due to merged extent maps
btrfs: fix extent map merging not happening for adjacent extents
btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids()
btrfs: fix error propagation of split bios
MIPS: export __cmpxchg_small()