Linus Torvalds [Tue, 19 Nov 2024 00:32:58 +0000 (16:32 -0800)]
Merge tag 'ext4_for_linus-6.13-rc1' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"A lot of miscellaneous ext4 bug fixes and cleanups this cycle, most
notably in the journaling code, bufered I/O, and compiler warning
cleanups"
* tag 'ext4_for_linus-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (33 commits)
jbd2: Fix comment describing journal_init_common()
ext4: prevent an infinite loop in the lazyinit thread
ext4: use struct_size() to improve ext4_htree_store_dirent()
ext4: annotate struct fname with __counted_by()
jbd2: avoid dozens of -Wflex-array-member-not-at-end warnings
ext4: use str_yes_no() helper function
ext4: prevent delalloc to nodelalloc on remount
jbd2: make b_frozen_data allocation always succeed
ext4: cleanup variable name in ext4_fc_del()
ext4: use string choices helpers
jbd2: remove the 'success' parameter from the jbd2_do_replay() function
jbd2: remove useless 'block_error' variable
jbd2: factor out jbd2_do_replay()
jbd2: refactor JBD2_COMMIT_BLOCK process in do_one_pass()
jbd2: unified release of buffer_head in do_one_pass()
jbd2: remove redundant judgments for check v1 checksum
ext4: use ERR_CAST to return an error-valued pointer
mm: zero range of eof folio exposed by inode size extension
ext4: partial zero eof block on unaligned inode size extension
ext4: disambiguate the return value of ext4_dio_write_end_io()
...
Linus Torvalds [Mon, 18 Nov 2024 22:54:10 +0000 (14:54 -0800)]
Merge tag 'pull-statx' of git://git./linux/kernel/git/viro/vfs
Pull statx updates from Al Viro:
"Sanitize struct filename and lookup flags handling in statx and
friends"
* tag 'pull-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
libfs: kill empty_dir_getattr()
fs: Simplify getattr interface function checking AT_GETATTR_NOSEC flag
fs/stat.c: switch to CLASS(fd_raw)
kill getname_statx_lookup_flags()
io_statx_prep(): use getname_uflags()
Linus Torvalds [Mon, 18 Nov 2024 20:58:23 +0000 (12:58 -0800)]
Merge tag 'pull-ufs' of git://git./linux/kernel/git/viro/vfs
Pull ufs updates from Al Viro:
"ufs cleanups, fixes and folio conversion"
* tag 'pull-ufs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ufs: ufs_sb_private_info: remove unused s_{2,3}apb fields
ufs: Convert ufs_change_blocknr() to take a folio
ufs: Pass a folio to ufs_new_fragments()
ufs: Convert ufs_inode_getfrag() to take a folio
ufs: Convert ufs_extend_tail() to take a folio
ufs: Convert ufs_inode_getblock() to take a folio
ufs: take the handling of free block counters into a helper
clean ufs_trunc_direct() up a bit...
ufs: get rid of ubh_{ubhcpymem,memcpyubh}()
ufs_inode_getfrag(): remove junk comment
ufs_free_fragments(): fix the braino in sanity check
ufs_clusteracct(): switch to passing fragment number
ufs: untangle ubh_...block...(), part 3
ufs: untangle ubh_...block...(), part 2
ufs: untangle ubh_...block...() macros, part 1
ufs: fix ufs_read_cylinder() failure handling
ufs: missing ->splice_write()
ufs: fix handling of delete_entry and set_link failures
Linus Torvalds [Mon, 18 Nov 2024 20:44:25 +0000 (12:44 -0800)]
Merge tag 'pull-xattr' of git://git./linux/kernel/git/viro/vfs
Pull xattr updates from Al Viro:
"Sanitize xattr and io_uring interactions with it, add *xattrat()
syscalls, sanitize struct filename handling in there"
* tag 'pull-xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
xattr: remove redundant check on variable err
fs/xattr: add *at family syscalls
new helpers: file_removexattr(), filename_removexattr()
new helpers: file_listxattr(), filename_listxattr()
replace do_getxattr() with saner helpers.
replace do_setxattr() with saner helpers.
new helper: import_xattr_name()
fs: rename struct xattr_ctx to kernel_xattr_ctx
xattr: switch to CLASS(fd)
io_[gs]etxattr_prep(): just use getname()
io_uring: IORING_OP_F[GS]ETXATTR is fine with REQ_F_FIXED_FILE
getname_maybe_null() - the third variant of pathname copy-in
teach filename_lookup() to treat NULL filename as ""
Linus Torvalds [Mon, 18 Nov 2024 20:24:06 +0000 (12:24 -0800)]
Merge tag 'pull-fd' of git://git./linux/kernel/git/viro/vfs
Pull 'struct fd' class updates from Al Viro:
"The bulk of struct fd memory safety stuff
Making sure that struct fd instances are destroyed in the same scope
where they'd been created, getting rid of reassignments and passing
them by reference, converting to CLASS(fd{,_pos,_raw}).
We are getting very close to having the memory safety of that stuff
trivial to verify"
* tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
deal with the last remaing boolean uses of fd_file()
css_set_fork(): switch to CLASS(fd_raw, ...)
memcg_write_event_control(): switch to CLASS(fd)
assorted variants of irqfd setup: convert to CLASS(fd)
do_pollfd(): convert to CLASS(fd)
convert do_select()
convert vfs_dedupe_file_range().
convert cifs_ioctl_copychunk()
convert media_request_get_by_fd()
convert spu_run(2)
switch spufs_calls_{get,put}() to CLASS() use
convert cachestat(2)
convert do_preadv()/do_pwritev()
fdget(), more trivial conversions
fdget(), trivial conversions
privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget()
o2hb_region_dev_store(): avoid goto around fdget()/fdput()
introduce "fd_pos" class, convert fdget_pos() users to it.
fdget_raw() users: switch to CLASS(fd_raw)
convert vmsplice() to CLASS(fd)
...
Linus Torvalds [Mon, 18 Nov 2024 19:44:06 +0000 (11:44 -0800)]
Merge tag 'vfs-6.13.ecryptfs' of git://git./linux/kernel/git/vfs/vfs
Pull ecryptfs updates from Christian Brauner:
"The folio project is about to remove page->index. This contains the
work required for ecryptfs"
* tag 'vfs-6.13.ecryptfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
ecryptfs: Pass the folio index to crypt_extent()
ecryptfs: Convert lower_offset_for_page() to take a folio
ecryptfs: Convert ecryptfs_decrypt_page() to take a folio
ecryptfs: Convert ecryptfs_encrypt_page() to take a folio
ecryptfs: Convert ecryptfs_write_lower_page_segment() to take a folio
ecryptfs: Convert ecryptfs_write() to use a folio
ecryptfs: Convert ecryptfs_read_lower_page_segment() to take a folio
ecryptfs: Convert ecryptfs_copy_up_encrypted_with_header() to take a folio
ecryptfs: Use a folio throughout ecryptfs_read_folio()
ecryptfs: Convert ecryptfs_writepage() to ecryptfs_writepages()
Linus Torvalds [Mon, 18 Nov 2024 19:30:09 +0000 (11:30 -0800)]
Merge tag 'vfs-6.13.untorn.writes' of git://git./linux/kernel/git/vfs/vfs
Pull vfs untorn write support from Christian Brauner:
"An atomic write is a write issed with torn-write protection. This
means for a power failure or any hardware failure all or none of the
data from the write will be stored, never a mix of old and new data.
This work is already supported for block devices. If a block device is
opened with O_DIRECT and the block device supports atomic write, then
FMODE_CAN_ATOMIC_WRITE is added to the file of the opened block
device.
This contains the work to expand atomic write support to filesystems,
specifically ext4 and XFS. Currently, only support for writing exactly
one filesystem block atomically is added.
Since it's now possible to have filesystem block size > page size for
XFS, it's possible to write 4K+ blocks atomically on x86"
* tag 'vfs-6.13.untorn.writes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
iomap: drop an obsolete comment in iomap_dio_bio_iter
ext4: Do not fallback to buffered-io for DIO atomic write
ext4: Support setting FMODE_CAN_ATOMIC_WRITE
ext4: Check for atomic writes support in write iter
ext4: Add statx support for atomic writes
xfs: Support setting FMODE_CAN_ATOMIC_WRITE
xfs: Validate atomic writes
xfs: Support atomic write for statx
fs: iomap: Atomic write support
fs: Export generic_atomic_write_valid()
block: Add bdev atomic write limits helpers
fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
block/fs: Pass an iocb to generic_atomic_write_valid()
Linus Torvalds [Mon, 18 Nov 2024 19:05:26 +0000 (11:05 -0800)]
Merge tag 'vfs-6.13.tmpfs' of git://git./linux/kernel/git/vfs/vfs
Pull tmpfs case folding updates from Christian Brauner:
"This adds case-insensitive support for tmpfs.
The work contained in here adds support for case-insensitive file
names lookups in tmpfs. The main difference from other casefold
filesystems is that tmpfs has no information on disk, just on RAM, so
we can't use mkfs to create a case-insensitive tmpfs. For this
implementation, there's a mount option for casefolding. The rest of
the patchset follows a similar approach as ext4 and f2fs.
The use case for this feature is similar to the use case for ext4, to
better support compatibility layers (like Wine), particularly in
combination with sandboxing/container tools (like Flatpak).
Those containerization tools can share a subset of the host filesystem
with an application. In the container, the root directory and any
parent directories required for a shared directory are on tmpfs, with
the shared directories bind-mounted into the container's view of the
filesystem.
If the host filesystem is using case-insensitive directories, then the
application can do lookups inside those directories in a
case-insensitive way, without this needing to be implemented in
user-space. However, if the host is only sharing a subset of a
case-insensitive directory with the application, then the parent
directories of the mount point will be part of the container's root
tmpfs. When the application tries to do case-insensitive lookups of
those parent directories on a case-sensitive tmpfs, the lookup will
fail"
* tag 'vfs-6.13.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
tmpfs: Initialize sysfs during tmpfs init
tmpfs: Fix type for sysfs' casefold attribute
libfs: Fix kernel-doc warning in generic_ci_validate_strict_name
docs: tmpfs: Add casefold options
tmpfs: Expose filesystem features via sysfs
tmpfs: Add flag FS_CASEFOLD_FL support for tmpfs dirs
tmpfs: Add casefold lookup support
libfs: Export generic_ci_ dentry functions
unicode: Recreate utf8_parse_version()
unicode: Export latest available UTF-8 version number
ext4: Use generic_ci_validate_strict_name helper
libfs: Create the helper function generic_ci_validate_strict_name()
Linus Torvalds [Mon, 18 Nov 2024 18:50:09 +0000 (10:50 -0800)]
Merge tag 'vfs-6.13.usercopy' of git://git./linux/kernel/git/vfs/vfs
Pull copy_struct_to_user helper from Christian Brauner:
"This adds a copy_struct_to_user() helper which is a companion helper
to the already widely used copy_struct_from_user().
It copies a struct from kernel space to userspace, in a way that
guarantees backwards-compatibility for struct syscall arguments as
long as future struct extensions are made such that all new fields are
appended to the old struct, and zeroed-out new fields have the same
meaning as the old struct.
The first user is sched_getattr() system call but the new extensible
pidfs ioctl will be ported to it as well"
* tag 'vfs-6.13.usercopy' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
sched_getattr: port to copy_struct_to_user
uaccess: add copy_struct_to_user helper
Linus Torvalds [Mon, 18 Nov 2024 18:47:46 +0000 (10:47 -0800)]
Merge tag 'vfs-6.13.pidfs' of git://git./linux/kernel/git/vfs/vfs
Pull pidfs update from Christian Brauner:
"This adds a new ioctl to retrieve information about a pidfd.
A common pattern when using pidfds is having to get information about
the process, which currently requires /proc being mounted, resolving
the fd to a pid, and then do manual string parsing of /proc/N/status
and friends. This needs to be reimplemented over and over in all
userspace projects (e.g.: it has been reimplemented in systemd, dbus,
dbus-daemon, polkit so far), and requires additional care in checking
that the fd is still valid after having parsed the data, to avoid
races.
Having a programmatic API that can be used directly removes all these
requirements, including having /proc mounted.
As discussed at LPC24, add an ioctl with an extensible struct so that
more parameters can be added later if needed. Start with returning
pid/tgid/ppid and some creds unconditionally, and cgroupid optionally"
* tag 'vfs-6.13.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
pidfd: add ioctl to retrieve pid info
Linus Torvalds [Mon, 18 Nov 2024 18:45:06 +0000 (10:45 -0800)]
Merge tag 'vfs-6.13.ovl' of git://git./linux/kernel/git/vfs/vfs
Pull overlayfs updates from Christian Brauner:
"Make overlayfs support specifying layers through file descriptors.
Currently overlayfs only allows specifying layers through path names.
This is inconvenient for users that want to assemble an overlayfs
mount purely based on file descriptors:
This enables user to specify both:
fsconfig(fd_overlay, FSCONFIG_SET_FD, "upperdir+", NULL, fd_upper);
fsconfig(fd_overlay, FSCONFIG_SET_FD, "workdir+", NULL, fd_work);
fsconfig(fd_overlay, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower1);
fsconfig(fd_overlay, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower2);
in addition to:
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "upperdir+", "/upper", 0);
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "workdir+", "/work", 0);
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "lowerdir+", "/lower1", 0);
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "lowerdir+", "/lower2", 0);
There's also a large set of new overlayfs selftests to test new
features and some older properties"
* tag 'vfs-6.13.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests: add test for specifying 500 lower layers
selftests: add overlayfs fd mounting selftests
selftests: use shared header
Documentation,ovl: document new file descriptor based layers
ovl: specify layers via file descriptors
fs: add helper to use mount option as path or fd
Linus Torvalds [Mon, 18 Nov 2024 18:30:29 +0000 (10:30 -0800)]
Merge tag 'vfs-6.13.file' of git://git./linux/kernel/git/vfs/vfs
Pull vfs file updates from Christian Brauner:
"This contains changes the changes for files for this cycle:
- Introduce a new reference counting mechanism for files.
As atomic_inc_not_zero() is implemented with a try_cmpxchg() loop
it has O(N^2) behaviour under contention with N concurrent
operations and it is in a hot path in __fget_files_rcu().
The rcuref infrastructures remedies this problem by using an
unconditional increment relying on safe- and dead zones to make
this work and requiring rcu protection for the data structure in
question. This not just scales better it also introduces overflow
protection.
However, in contrast to generic rcuref, files require a memory
barrier and thus cannot rely on *_relaxed() atomic operations and
also require to be built on atomic_long_t as having massive amounts
of reference isn't unheard of even if it is just an attack.
This adds a file specific variant instead of making this a generic
library.
This has been tested by various people and it gives consistent
improvement up to 3-5% on workloads with loads of threads.
- Add a fastpath for find_next_zero_bit(). Skip 2-levels searching
via find_next_zero_bit() when there is a free slot in the word that
contains the next fd. This improves pts/blogbench-1.1.0 read by 8%
and write by 4% on Intel ICX 160.
- Conditionally clear full_fds_bits since it's very likely that a bit
in full_fds_bits has been cleared during __clear_open_fds(). This
improves pts/blogbench-1.1.0 read up to 13%, and write up to 5% on
Intel ICX 160.
- Get rid of all lookup_*_fdget_rcu() variants. They were used to
lookup files without taking a reference count. That became invalid
once files were switched to SLAB_TYPESAFE_BY_RCU and now we're
always taking a reference count. Switch to an already existing
helper and remove the legacy variants.
- Remove pointless includes of <linux/fdtable.h>.
- Avoid cmpxchg() in close_files() as nobody else has a reference to
the files_struct at that point.
- Move close_range() into fs/file.c and fold __close_range() into it.
- Cleanup calling conventions of alloc_fdtable() and expand_files().
- Merge __{set,clear}_close_on_exec() into one.
- Make __set_open_fd() set cloexec as well instead of doing it in two
separate steps"
* tag 'vfs-6.13.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests: add file SLAB_TYPESAFE_BY_RCU recycling stressor
fs: port files to file_ref
fs: add file_ref
expand_files(): simplify calling conventions
make __set_open_fd() set cloexec state as well
fs: protect backing files with rcu
file.c: merge __{set,clear}_close_on_exec()
alloc_fdtable(): change calling conventions.
fs/file.c: add fast path in find_next_fd()
fs/file.c: conditionally clear full_fds
fs/file.c: remove sanity_check and add likely/unlikely in alloc_fd()
move close_range(2) into fs/file.c, fold __close_range() into it
close_files(): don't bother with xchg()
remove pointless includes of <linux/fdtable.h>
get rid of ...lookup...fdget_rcu() family
Linus Torvalds [Mon, 18 Nov 2024 18:26:49 +0000 (10:26 -0800)]
Merge tag 'vfs-6.13.netfs' of git://git./linux/kernel/git/vfs/vfs
Pull netfs updates from Christian Brauner:
"Various fixes for the netfs library and related infrastructure:
cachefiles:
- Fix a dentry leak in cachefiles_open_file()
- Fix incorrect length return value in
cachefiles_ondemand_fd_write_iter()
- Fix missing pos updates in cachefiles_ondemand_fd_write_iter()
- Clean up in cachefiles_commit_tmpfile()
- Fix NULL pointer dereference in object->file
- Add a memory barrier for FSCACHE_VOLUME_CREATING
netfs:
- Remove call to folio_index()
- Fix a few minor bugs in netfs_page_mkwrite()
- Remove unnecessary references to pages"
* tag 'vfs-6.13.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
netfs/fscache: Add a memory barrier for FSCACHE_VOLUME_CREATING
cachefiles: Fix NULL pointer dereference in object->file
cachefiles: Clean up in cachefiles_commit_tmpfile()
cachefiles: Fix missing pos updates in cachefiles_ondemand_fd_write_iter()
cachefiles: Fix incorrect length return value in cachefiles_ondemand_fd_write_iter()
netfs: Remove unnecessary references to pages
netfs: Fix a few minor bugs in netfs_page_mkwrite()
netfs: Remove call to folio_index()
Linus Torvalds [Mon, 18 Nov 2024 17:54:32 +0000 (09:54 -0800)]
Merge tag 'vfs-6.13.pagecache' of git://git./linux/kernel/git/vfs/vfs
Pull vfs pagecache updates from Christian Brauner:
"Cleanup filesystem page flag usage: This continues the work to make
the mappedtodisk/owner_2 flag available to filesystems which don't use
buffer heads. Further patches remove uses of Private2. This brings us
very close to being rid of it entirely"
* tag 'vfs-6.13.pagecache' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
migrate: Remove references to Private2
ceph: Remove call to PagePrivate2()
btrfs: Switch from using the private_2 flag to owner_2
mm: Remove PageMappedToDisk
nilfs2: Convert nilfs_copy_buffer() to use folios
fs: Move clearing of mappedtodisk to buffer.c
Linus Torvalds [Mon, 18 Nov 2024 17:51:32 +0000 (09:51 -0800)]
Merge tag 'vfs-6.13.rust.file' of git://git./linux/kernel/git/vfs/vfs
Pull vfs rust file abstractions from Christian Brauner:
"This contains the file abstractions needed by the Rust implementation
of the Binder driver and other parts of the kernel.
Let's treat this as a first attempt at getting something working but I
do expect the actual interfaces to change significantly over time.
Simply because we are still figuring out what actually works. But
there's no point in further theorizing. Let's see how it holds up with
actual users"
* tag 'vfs-6.13.rust.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
rust: task: adjust safety comments in Task methods
rust: add seqfile abstraction
rust: file: add abstraction for `poll_table`
rust: file: add `Kuid` wrapper
rust: file: add `FileDescriptorReservation`
rust: security: add abstraction for secctx
rust: cred: add Rust abstraction for `struct cred`
rust: file: add Rust abstraction for `struct file`
rust: task: add `Task::current_raw`
rust: types: add `NotThreadSafe`
Linus Torvalds [Mon, 18 Nov 2024 17:35:30 +0000 (09:35 -0800)]
Merge tag 'vfs-6.13.misc' of git://git./linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Fixup and improve NLM and kNFSD file lock callbacks
Last year both GFS2 and OCFS2 had some work done to make their
locking more robust when exported over NFS. Unfortunately, part of
that work caused both NLM (for NFS v3 exports) and kNFSD (for
NFSv4.1+ exports) to no longer send lock notifications to clients
This in itself is not a huge problem because most NFS clients will
still poll the server in order to acquire a conflicted lock
It's important for NLM and kNFSD that they do not block their
kernel threads inside filesystem's file_lock implementations
because that can produce deadlocks. We used to make sure of this by
only trusting that posix_lock_file() can correctly handle blocking
lock calls asynchronously, so the lock managers would only setup
their file_lock requests for async callbacks if the filesystem did
not define its own lock() file operation
However, when GFS2 and OCFS2 grew the capability to correctly
handle blocking lock requests asynchronously, they started
signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check
for also trusting posix_lock_file() was inadvertently dropped, so
now most filesystems no longer produce lock notifications when
exported over NFS
Fix this by using an fop_flag which greatly simplifies the problem
and grooms the way for future uses by both filesystems and lock
managers alike
- Add a sysctl to delete the dentry when a file is removed instead of
making it a negative dentry
Commit
681ce8623567 ("vfs: Delete the associated dentry when
deleting a file") introduced an unconditional deletion of the
associated dentry when a file is removed. However, this led to
performance regressions in specific benchmarks, such as
ilebench.sum_operations/s, prompting a revert in commit
4a4be1ad3a6e ("Revert "vfs: Delete the associated dentry when
deleting a file""). This reintroduces the concept conditionally
through a sysctl
- Expand the statmount() system call:
* Report the filesystem subtype in a new fs_subtype field to
e.g., report fuse filesystem subtypes
* Report the superblock source in a new sb_source field
* Add a new way to return filesystem specific mount options in an
option array that returns filesystem specific mount options
separated by zero bytes and unescaped. This allows caller's to
retrieve filesystem specific mount options and immediately pass
them to e.g., fsconfig() without having to unescape or split
them
* Report security (LSM) specific mount options in a separate
security option array. We don't lump them together with
filesystem specific mount options as security mount options are
generic and most users aren't interested in them
The format is the same as for the filesystem specific mount
option array
- Support relative paths in fsconfig()'s FSCONFIG_SET_STRING command
- Optimize acl_permission_check() to avoid costly {g,u}id ownership
checks if possible
- Use smp_mb__after_spinlock() to avoid full smp_mb() in evict()
- Add synchronous wakeup support for ep_poll_callback.
Currently, epoll only uses wake_up() to wake up task. But sometimes
there are epoll users which want to use the synchronous wakeup flag
to give a hint to the scheduler, e.g., the Android binder driver.
So add a wake_up_sync() define, and use wake_up_sync() when sync is
true in ep_poll_callback()
Fixes:
- Fix kernel documentation for inode_insert5() and iget5_locked()
- Annotate racy epoll check on file->f_ep
- Make F_DUPFD_QUERY associative
- Avoid filename buffer overrun in initramfs
- Don't let statmount() return empty strings
- Add a cond_resched() to dump_user_range() to avoid hogging the CPU
- Don't query the device logical blocksize multiple times for hfsplus
- Make filemap_read() check that the offset is positive or zero
Cleanups:
- Various typo fixes
- Cleanup wbc_attach_fdatawrite_inode()
- Add __releases annotation to wbc_attach_and_unlock_inode()
- Add hugetlbfs tracepoints
- Fix various vfs kernel doc parameters
- Remove obsolete TODO comment from io_cancel()
- Convert wbc_account_cgroup_owner() to take a folio
- Fix comments for BANDWITH_INTERVAL and wb_domain_writeout_add()
- Reorder struct posix_acl to save 8 bytes
- Annotate struct posix_acl with __counted_by()
- Replace one-element array with flexible array member in freevxfs
- Use idiomatic atomic64_inc_return() in alloc_mnt_ns()"
* tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
statmount: retrieve security mount options
vfs: make evict() use smp_mb__after_spinlock instead of smp_mb
statmount: add flag to retrieve unescaped options
fs: add the ability for statmount() to report the sb_source
writeback: wbc_attach_fdatawrite_inode out of line
writeback: add a __releases annoation to wbc_attach_and_unlock_inode
fs: add the ability for statmount() to report the fs_subtype
fs: don't let statmount return empty strings
fs:aio: Remove TODO comment suggesting hash or array usage in io_cancel()
hfsplus: don't query the device logical block size multiple times
freevxfs: Replace one-element array with flexible array member
fs: optimize acl_permission_check()
initramfs: avoid filename buffer overrun
fs/writeback: convert wbc_account_cgroup_owner to take a folio
acl: Annotate struct posix_acl with __counted_by()
acl: Realign struct posix_acl to save 8 bytes
epoll: Add synchronous wakeup support for ep_poll_callback
coredump: add cond_resched() to dump_user_range
mm/page-writeback.c: Fix comment of wb_domain_writeout_add()
mm/page-writeback.c: Update comment for BANDWIDTH_INTERVAL
...
Linus Torvalds [Mon, 18 Nov 2024 17:33:34 +0000 (09:33 -0800)]
Merge tag 'vfs-6.13.mount.api' of git://git./linux/kernel/git/vfs/vfs
Pull vfs mount api conversions from Christian Brauner:
"Convert adfs, affs, befs, hfs, hfsplus, jfs, and hpfs to the new mount
api"
* tag 'vfs-6.13.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
efs: fix the efs new mount api implementation
ubifs: Convert ubifs to use the new mount API
hpfs: convert hpfs to use the new mount api
jfs: convert jfs to use the new mount api
hfsplus: convert hfsplus to use the new mount api
hfs: convert hfs to use the new mount api
befs: convert befs to use the new mount api
affs: convert affs to use the new mount api
adfs: convert adfs to use the new mount api
Linus Torvalds [Mon, 18 Nov 2024 17:15:39 +0000 (09:15 -0800)]
Merge tag 'vfs-6.13.mgtime' of git://git./linux/kernel/git/vfs/vfs
Pull vfs multigrain timestamps from Christian Brauner:
"This is another try at implementing multigrain timestamps. This time
with significant help from the timekeeping maintainers to reduce the
performance impact.
Thomas provided a base branch that contains the required timekeeping
interfaces for the VFS. It serves as the base for the multi-grain
timestamp work:
- Multigrain timestamps allow the kernel to use fine-grained
timestamps when an inode's attributes is being actively observed
via ->getattr(). With this support, it's possible for a file to get
a fine-grained timestamp, and another modified after it to get a
coarse-grained stamp that is earlier than the fine-grained time. If
this happens then the files can appear to have been modified in
reverse order, which breaks VFS ordering guarantees.
To prevent this, a floor value is maintained for multigrain
timestamps. Whenever a fine-grained timestamp is handed out, record
it, and when later coarse-grained stamps are handed out, ensure
they are not earlier than that value. If the coarse-grained
timestamp is earlier than the fine-grained floor, return the floor
value instead.
The timekeeper changes add a static singleton atomic64_t into
timekeeper.c that is used to keep track of the latest fine-grained
time ever handed out. This is tracked as a monotonic ktime_t value
to ensure that it isn't affected by clock jumps. Because it is
updated at different times than the rest of the timekeeper object,
the floor value is managed independently of the timekeeper via a
cmpxchg() operation, and sits on its own cacheline.
Two new public timekeeper interfaces are added:
(1) ktime_get_coarse_real_ts64_mg() fills a timespec64 with the
later of the coarse-grained clock and the floor time
(2) ktime_get_real_ts64_mg() gets the fine-grained clock value,
and tries to swap it into the floor. A timespec64 is filled
with the result.
- The VFS has always used coarse-grained timestamps when updating the
ctime and mtime after a change. This has the benefit of allowing
filesystems to optimize away a lot metadata updates, down to around
1 per jiffy, even when a file is under heavy writes.
Unfortunately, this has always been an issue when we're exporting
via NFSv3, which relies on timestamps to validate caches. A lot of
changes can happen in a jiffy, so timestamps aren't sufficient to
help the client decide when to invalidate the cache. Even with
NFSv4, a lot of exported filesystems don't properly support a
change attribute and are subject to the same problems with
timestamp granularity. Other applications have similar issues with
timestamps (e.g backup applications).
If we were to always use fine-grained timestamps, that would
improve the situation, but that becomes rather expensive, as the
underlying filesystem would have to log a lot more metadata
updates.
This adds a way to only use fine-grained timestamps when they are
being actively queried. Use the (unused) top bit in
inode->i_ctime_nsec as a flag that indicates whether the current
timestamps have been queried via stat() or the like. When it's set,
we allow the kernel to use a fine-grained timestamp iff it's
necessary to make the ctime show a different value.
This solves the problem of being able to distinguish the timestamp
between updates, but introduces a new problem: it's now possible
for a file being changed to get a fine-grained timestamp. A file
that is altered just a bit later can then get a coarse-grained one
that appears older than the earlier fine-grained time. This
violates timestamp ordering guarantees.
This is where the earlier mentioned timkeeping interfaces help. A
global monotonic atomic64_t value is kept that acts as a timestamp
floor. When we go to stamp a file, we first get the latter of the
current floor value and the current coarse-grained time. If the
inode ctime hasn't been queried then we just attempt to stamp it
with that value.
If it has been queried, then first see whether the current coarse
time is later than the existing ctime. If it is, then we accept
that value. If it isn't, then we get a fine-grained time and try to
swap that into the global floor. Whether that succeeds or fails, we
take the resulting floor time, convert it to realtime and try to
swap that into the ctime.
We take the result of the ctime swap whether it succeeds or fails,
since either is just as valid.
Filesystems can opt into this by setting the FS_MGTIME fstype flag.
Others should be unaffected (other than being subject to the same
floor value as multigrain filesystems)"
* tag 'vfs-6.13.mgtime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: reduce pointer chasing in is_mgtime() test
tmpfs: add support for multigrain timestamps
btrfs: convert to multigrain timestamps
ext4: switch to multigrain timestamps
xfs: switch to multigrain timestamps
Documentation: add a new file documenting multigrain timestamps
fs: add percpu counters for significant multigrain timestamp events
fs: tracepoints around multigrain timestamp events
fs: handle delegated timestamps in setattr_copy_mgtime
timekeeping: Add percpu counter for tracking floor swap events
timekeeping: Add interfaces for handling timestamps with a floor value
fs: have setattr_copy handle multigrain timestamps appropriately
fs: add infrastructure for multigrain timestamps
Linus Torvalds [Sun, 17 Nov 2024 22:15:08 +0000 (14:15 -0800)]
Linux 6.12
Linus Torvalds [Sun, 17 Nov 2024 17:35:51 +0000 (09:35 -0800)]
Merge tag 'x86_urgent_for_v6.12' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Make sure a kdump kernel with CONFIG_IMA_KEXEC enabled and booted on
an AMD SME enabled hardware properly decrypts the ima_kexec buffer
information passed to it from the previous kernel
- Fix building the kernel with Clang where a non-TLS definition of the
stack protector guard cookie leads to bogus code generation
- Clear a wrongly advertised virtualized VMLOAD/VMSAVE feature flag on
some Zen4 client systems as those insns are not supported on client
* tag 'x86_urgent_for_v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix a kdump kernel failure on SME system when CONFIG_IMA_KEXEC=y
x86/stackprotector: Work around strict Clang TLS symbol requirements
x86/CPU/AMD: Clear virtualized VMLOAD/VMSAVE on Zen4 client
Linus Torvalds [Sun, 17 Nov 2024 00:00:38 +0000 (16:00 -0800)]
Merge tag 'mm-hotfixes-stable-2024-11-16-15-33' of git://git./linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"10 hotfixes, 7 of which are cc:stable. All singletons, please see the
changelogs for details"
* tag 'mm-hotfixes-stable-2024-11-16-15-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: revert "mm: shmem: fix data-race in shmem_getattr()"
ocfs2: uncache inode which has failed entering the group
mm: fix NULL pointer dereference in alloc_pages_bulk_noprof
mm, doc: update read_ahead_kb for MADV_HUGEPAGE
fs/proc/task_mmu: prevent integer overflow in pagemap_scan_get_args()
sched/task_stack: fix object_is_on_stack() for KASAN tagged pointers
crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32
mm/mremap: fix address wraparound in move_page_tables()
tools/mm: fix compile error
mm, swap: fix allocation and scanning race with swapoff
Andrew Morton [Sat, 16 Nov 2024 00:57:24 +0000 (16:57 -0800)]
mm: revert "mm: shmem: fix data-race in shmem_getattr()"
Revert
d949d1d14fa2 ("mm: shmem: fix data-race in shmem_getattr()") as
suggested by Chuck [1]. It is causing deadlocks when accessing tmpfs over
NFS.
As Hugh commented, "added just to silence a syzbot sanitizer splat: added
where there has never been any practical problem".
Link: https://lkml.kernel.org/r/ZzdxKF39VEmXSSyN@tissot.1015granger.net
Fixes:
d949d1d14fa2 ("mm: shmem: fix data-race in shmem_getattr()")
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeongjun Park <aha310510@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Sat, 16 Nov 2024 23:14:39 +0000 (15:14 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rmk/linux
Pull ARM fixes from Russell King:
- Fix kernel mapping for XIP kernels
- Fix SMP support for XIP kernels
- Fix complication corner case with CFI
- Fix a typo in nommu code
- Fix cacheflush syscall when PAN is enabled on LPAE platforms
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: fix cacheflush with PAN
ARM: 9435/1: ARM/nommu: Fix typo "absence"
ARM: 9434/1: cfi: Fix compilation corner case
ARM: 9420/1: smp: Fix SMP for xip kernels
ARM: 9419/1: mm: Fix kernel memory mapping for xip kernels
Linus Torvalds [Sat, 16 Nov 2024 23:09:14 +0000 (15:09 -0800)]
Merge tag 'drm-fixes-2024-11-17' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fix from Dave Airlie:
"Alex sent on a last minute revert for a amdgpu/swsmu regression:
- revert patch to fix swsmu regression"
* tag 'drm-fixes-2024-11-17' of https://gitlab.freedesktop.org/drm/kernel:
Revert "drm/amd/pm: correct the workload setting"
Dave Airlie [Sat, 16 Nov 2024 22:12:44 +0000 (08:12 +1000)]
Merge tag 'amd-drm-fixes-6.12-2024-11-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.12-2024-11-16:
amdgpu:
- Revert a swsmu patch to fix a regression
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241116145320.2507156-1-alexander.deucher@amd.com
Linus Torvalds [Sat, 16 Nov 2024 16:12:43 +0000 (08:12 -0800)]
Merge tag 'trace-ringbuffer-v6.12-rc7-2' of git://git./linux/kernel/git/trace/linux-trace
Pull ring buffer fixes from Steven Rostedt:
- Revert: "ring-buffer: Do not have boot mapped buffers hook to CPU
hotplug"
A crash that happened on cpu hotplug was actually caused by the
incorrect ref counting that was fixed by commit
2cf9733891a4
("ring-buffer: Fix refcount setting of boot mapped buffers"). The
removal of calling cpu hotplug callbacks on memory mapped buffers was
not an issue even though the tests at the time pointed toward it. But
in fact, there's a check in that code that tests to see if the
buffers are already allocated or not, and will not allocate them
again if they are. Not calling the cpu hotplug callbacks ended up not
initializing the non boot CPU buffers.
Simply remove that change.
- Clear all CPU buffers when starting tracing in a boot mapped buffer
To properly process events from a previous boot, the address space
needs to be accounted for due to KASLR and the events in the buffer
are updated accordingly when read. This also requires that when the
buffer has tracing enabled again in the current boot that the buffers
are reset so that events from the previous boot do not interact with
the events of the current boot and cause confusing due to not having
the proper meta data.
It was found that if a CPU is taken offline, that its per CPU buffer
is not reset when tracing starts. This allows for events to be from
both the previous boot and the current boot to be in the buffer at
the same time. Clear all CPU buffers when tracing is started in a
boot mapped buffer.
* tag 'trace-ringbuffer-v6.12-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/ring-buffer: Clear all memory mapped CPU ring buffers on first recording
Revert: "ring-buffer: Do not have boot mapped buffers hook to CPU hotplug"
Alex Deucher [Sat, 16 Nov 2024 14:22:14 +0000 (09:22 -0500)]
Revert "drm/amd/pm: correct the workload setting"
This reverts commit
74e1006430a5377228e49310f6d915628609929e.
This causes a regression in the workload selection.
A more extensive fix is being worked on.
For now, revert.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3618
Fixes:
74e1006430a5 ("drm/amd/pm: correct the workload setting")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Linus Torvalds [Fri, 15 Nov 2024 19:44:32 +0000 (11:44 -0800)]
Merge tag 'riscv-for-linus-6.12-rc8' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fix from Palmer Dabbelt:
- A fix for the CPU perf driver that avoids leaking CPU ID references
on systems without snapshot support.
* tag 'riscv-for-linus-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
drivers: perf: Fix wrong put_cpu() placement
Linus Torvalds [Fri, 15 Nov 2024 18:53:42 +0000 (10:53 -0800)]
Merge tag 'drm-fixes-2024-11-16' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Final week of fixes, lots of small amdgpu fixes, some i915 and xe
fixes, the nouveau changes fix a recent regression and some laptop
panel black screens, then a couple of other misc ones.
It's probably a little busier than I'd like, but each fix seems fine.
amdgpu:
- PSR fix
- Panel replay fixes
- DML fix
- vblank power fix
- Fix video caps
- SMU 14.0 fix
- GPUVM fix
- MES 12 fix
- APU carve out fix
- DC vbios fix
- NBIO fix
i915:
- Don't load GSC on ARL-H and ARL-U if too old FW
- Avoid potential OOPS in enabling/disabling TV output
xe:
- Fix unlock on exec ioctl error path
- Fix hibernation on LNL due to ggtt getting lost
- Fix missing runtime PM in OA release
bridge:
- tc358768: Fix DSI command tx
nouveau:
- Fix GSP AUX error handling
- dp: Handle retires for AUX CH transfers with GSP
- fw: Sync DMA after setup
panthor:
- Fix partial BO mappings to GPU
rockchip:
- vop: Avoid null-ptr deref in plane-state check
vmwgfx:
- Avoid null-ptr deref in surface creation"
* tag 'drm-fixes-2024-11-16' of https://gitlab.freedesktop.org/drm/kernel: (27 commits)
drm/bridge: tc358768: Fix DSI command tx
drm/vmwgfx: avoid null_ptr_deref in vmw_framebuffer_surface_create_handle
nouveau/dp: handle retries for AUX CH transfers with GSP.
nouveau: handle EBUSY and EAGAIN for GSP aux errors.
nouveau: fw: sync dma after setup is called.
drm/xe/oa: Fix "Missing outer runtime PM protection" warning
drm/xe: handle flat ccs during hibernation on igpu
drm/xe: improve hibernation on igpu
drm/xe: Restore system memory GGTT mappings
drm/xe: Ensure all locks released in exec IOCTL
drm/panthor: Fix handling of partial GPU mapping of BOs
drm/amd: Fix initialization mistake for NBIO 7.7.0
Revert "drm/amd/display: parse umc_info or vram_info based on ASIC"
drm/amd/display: Fix failure to read vram info due to static BP_RESULT
drm/amdgpu: enable GTT fallback handling for dGPUs only
drm/i915: Grab intel_display from the encoder to avoid potential oopsies
drm/i915/gsc: ARL-H and ARL-U need a newer GSC FW.
drm/amdgpu/mes12: correct kiq unmap latency
drm/amdgpu: fix check in gmc_v9_0_get_vm_pte()
drm/amd/pm: print pp_dpm_mclk in ascending order on SMU v14.0.0
...
Linus Torvalds [Fri, 15 Nov 2024 18:48:28 +0000 (10:48 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
- Revert a change to the VLAN logic, this broke previously working ROCE
configurations
- Fix a memory leak on error unwinding in bnxt_re
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
Revert "RDMA/core: Fix ENODEV error for iWARP test over vlan"
RDMA/bnxt_re: Remove some dead code
RDMA/bnxt_re: Fix some error handling paths in bnxt_re_probe()
Dave Airlie [Fri, 15 Nov 2024 18:31:09 +0000 (04:31 +1000)]
Merge tag 'drm-xe-fixes-2024-11-14' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix unlock on exec ioctl error path (Matthew Brost)
- Fix hibernation on LNL due to ggtt getting lost
(Matthew Brost / Matthew Auld)
- Fix missing runtime PM in OA release (Ashutosh)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/5ntcf2ssmmvo5dsf2mdcee4guwwmpbm3xrlufgt2pdfmznzjo3@62ygo3bxkock
Linus Torvalds [Fri, 15 Nov 2024 18:20:17 +0000 (10:20 -0800)]
Merge tag 'pmdomain-v6.12-rc1-2' of git://git./linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
"pmdomain core:
- Add GENPD_FLAG_DEV_NAME_FW flag to generate unique names
pmdomain providers:
- arm: Use FLAG_DEV_NAME_FW to ensure unique names
- imx93-blk-ctrl: Fix the remove path
arm_scmi/qcom-cpucp:
- Report duplicate OPPs as firmware bugs for arm_scmi
- Skip OPP duplicates for arm_scmi
- Mark the qcom-cpucp mailbox irq with IRQF_NO_SUSPEND flag"
* tag 'pmdomain-v6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag
firmware: arm_scmi: Report duplicate opps as firmware bugs
firmware: arm_scmi: Skip opp duplicates
pmdomain: imx93-blk-ctrl: correct remove path
pmdomain: arm: Use FLAG_DEV_NAME_FW to ensure unique names
pmdomain: core: Add GENPD_FLAG_DEV_NAME_FW flag
Linus Torvalds [Fri, 15 Nov 2024 18:16:12 +0000 (10:16 -0800)]
Merge tag 'mmc-v6.12-rc3-2' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:
- dw_mmc: Revert fix for IDMAC operation with pages bigger than 4K
- sunxi-mmc: Fix A100 compatible description
* tag 'mmc-v6.12-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
Revert "mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K"
mmc: sunxi-mmc: Fix A100 compatible description
Linus Torvalds [Fri, 15 Nov 2024 18:09:38 +0000 (10:09 -0800)]
Merge tag 'sound-6.12' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A few last-minute fixes. All changes are device-specific small fixes
that should be pretty safe to apply"
* tag 'sound-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - update set GPIO3 to default for Thinkpad with ALC1318
ALSA: hda/realtek: fix mute/micmute LEDs for a HP EliteBook 645 G10
ALSA: hda/realtek - Fixed Clevo platform headset Mic issue
ALSA: usb-audio: Fix Yamaha P-125 Quirk Entry
ASoC: max9768: Fix event generation for playback mute
ASoC: intel: sof_sdw: add quirk for Dell SKU
ASoC: audio-graph-card2: Purge absent supplies for device tree nodes
Linus Torvalds [Fri, 15 Nov 2024 18:04:39 +0000 (10:04 -0800)]
Merge tag 'v6.12-p5' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a regression in the MIPS CRC32C code"
* tag 'v6.12-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: mips/crc32 - fix the CRC32C implementation
Linus Torvalds [Fri, 15 Nov 2024 17:59:51 +0000 (09:59 -0800)]
Merge tag 'sched_ext-for-6.12-rc7-fixes-2' of git://git./linux/kernel/git/tj/sched_ext
Pull sched_ext fix from Tejun Heo:
"One more fix for v6.12-rc7
ops.cpu_acquire() was being invoked with the wrong kfunc mask allowing
the operation to call kfuncs which shouldn't be allowed. Fix it by
using SCX_KF_REST instead, which is trivial and low risk"
* tag 'sched_ext-for-6.12-rc7-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: ops.cpu_acquire() should be called with SCX_KF_REST
Linus Torvalds [Fri, 15 Nov 2024 17:45:32 +0000 (09:45 -0800)]
Merge tag 'for-6.12-rc7-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One more fix that seems urgent and good to have in 6.12 final.
It could potentially lead to unexpected transaction aborts, due to
wrong comparison and order of processing of delayed refs"
* tag 'for-6.12-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix incorrect comparison for delayed refs
Dmitry Antipov [Thu, 14 Nov 2024 04:38:44 +0000 (07:38 +0300)]
ocfs2: uncache inode which has failed entering the group
Syzbot has reported the following BUG:
kernel BUG at fs/ocfs2/uptodate.c:509!
...
Call Trace:
<TASK>
? __die_body+0x5f/0xb0
? die+0x9e/0xc0
? do_trap+0x15a/0x3a0
? ocfs2_set_new_buffer_uptodate+0x145/0x160
? do_error_trap+0x1dc/0x2c0
? ocfs2_set_new_buffer_uptodate+0x145/0x160
? __pfx_do_error_trap+0x10/0x10
? handle_invalid_op+0x34/0x40
? ocfs2_set_new_buffer_uptodate+0x145/0x160
? exc_invalid_op+0x38/0x50
? asm_exc_invalid_op+0x1a/0x20
? ocfs2_set_new_buffer_uptodate+0x2e/0x160
? ocfs2_set_new_buffer_uptodate+0x144/0x160
? ocfs2_set_new_buffer_uptodate+0x145/0x160
ocfs2_group_add+0x39f/0x15a0
? __pfx_ocfs2_group_add+0x10/0x10
? __pfx_lock_acquire+0x10/0x10
? mnt_get_write_access+0x68/0x2b0
? __pfx_lock_release+0x10/0x10
? rcu_read_lock_any_held+0xb7/0x160
? __pfx_rcu_read_lock_any_held+0x10/0x10
? smack_log+0x123/0x540
? mnt_get_write_access+0x68/0x2b0
? mnt_get_write_access+0x68/0x2b0
? mnt_get_write_access+0x226/0x2b0
ocfs2_ioctl+0x65e/0x7d0
? __pfx_ocfs2_ioctl+0x10/0x10
? smack_file_ioctl+0x29e/0x3a0
? __pfx_smack_file_ioctl+0x10/0x10
? lockdep_hardirqs_on_prepare+0x43d/0x780
? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
? __pfx_ocfs2_ioctl+0x10/0x10
__se_sys_ioctl+0xfb/0x170
do_syscall_64+0xf3/0x230
entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
</TASK>
When 'ioctl(OCFS2_IOC_GROUP_ADD, ...)' has failed for the particular
inode in 'ocfs2_verify_group_and_input()', corresponding buffer head
remains cached and subsequent call to the same 'ioctl()' for the same
inode issues the BUG() in 'ocfs2_set_new_buffer_uptodate()' (trying
to cache the same buffer head of that inode). Fix this by uncaching
the buffer head with 'ocfs2_remove_from_cache()' on error path in
'ocfs2_group_add()'.
Link: https://lkml.kernel.org/r/20241114043844.111847-1-dmantipov@yandex.ru
Fixes:
7909f2bf8353 ("[PATCH 2/2] ocfs2: Implement group add for online resize")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Reported-by: syzbot+453873f1588c2d75b447@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
453873f1588c2d75b447
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Dmitry Antipov <dmantipov@yandex.ru>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jinjiang Tu [Wed, 13 Nov 2024 08:32:35 +0000 (16:32 +0800)]
mm: fix NULL pointer dereference in alloc_pages_bulk_noprof
We triggered a NULL pointer dereference for ac.preferred_zoneref->zone in
alloc_pages_bulk_noprof() when the task is migrated between cpusets.
When cpuset is enabled, in prepare_alloc_pages(), ac->nodemask may be
¤t->mems_allowed. when first_zones_zonelist() is called to find
preferred_zoneref, the ac->nodemask may be modified concurrently if the
task is migrated between different cpusets. Assuming we have 2 NUMA Node,
when traversing Node1 in ac->zonelist, the nodemask is 2, and when
traversing Node2 in ac->zonelist, the nodemask is 1. As a result, the
ac->preferred_zoneref points to NULL zone.
In alloc_pages_bulk_noprof(), for_each_zone_zonelist_nodemask() finds a
allowable zone and calls zonelist_node_idx(ac.preferred_zoneref), leading
to NULL pointer dereference.
__alloc_pages_noprof() fixes this issue by checking NULL pointer in commit
ea57485af8f4 ("mm, page_alloc: fix check for NULL preferred_zone") and
commit
df76cee6bbeb ("mm, page_alloc: remove redundant checks from alloc
fastpath").
To fix it, check NULL pointer for preferred_zoneref->zone.
Link: https://lkml.kernel.org/r/20241113083235.166798-1-tujinjiang@huawei.com
Fixes:
387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Lobakin <alobakin@pm.me>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yafang Shao [Wed, 13 Nov 2024 15:07:11 +0000 (23:07 +0800)]
mm, doc: update read_ahead_kb for MADV_HUGEPAGE
MADV_HUGEPAGE is a new addition to readahead with behavior distinct from
normal pages. To prevent confusion, we should update the documentation
accordingly.
Link: https://lkml.kernel.org/r/20241113150711.1685-1-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dan Carpenter [Thu, 14 Nov 2024 08:59:32 +0000 (11:59 +0300)]
fs/proc/task_mmu: prevent integer overflow in pagemap_scan_get_args()
The "arg->vec_len" variable is a u64 that comes from the user at the start
of the function. The "arg->vec_len * sizeof(struct page_region))"
multiplication can lead to integer wrapping. Use size_mul() to avoid
that.
Also the size_add/mul() functions work on unsigned long so for 32bit
systems we need to ensure that "arg->vec_len" fits in an unsigned long.
Link: https://lkml.kernel.org/r/39d41335-dd4d-48ed-8a7f-402c57d8ea84@stanley.mountain
Fixes:
52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Andrei Vagin <avagin@google.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Qun-Wei Lin [Wed, 13 Nov 2024 04:25:43 +0000 (12:25 +0800)]
sched/task_stack: fix object_is_on_stack() for KASAN tagged pointers
When CONFIG_KASAN_SW_TAGS and CONFIG_KASAN_STACK are enabled, the
object_is_on_stack() function may produce incorrect results due to the
presence of tags in the obj pointer, while the stack pointer does not have
tags. This discrepancy can lead to incorrect stack object detection and
subsequently trigger warnings if CONFIG_DEBUG_OBJECTS is also enabled.
Example of the warning:
ODEBUG: object
3eff800082ea7bb0 is NOT on stack
ffff800082ea0000, but annotated.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:557 __debug_object_init+0x330/0x364
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc5 #4
Hardware name: linux,dummy-virt (DT)
pstate:
600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __debug_object_init+0x330/0x364
lr : __debug_object_init+0x330/0x364
sp :
ffff800082ea7b40
x29:
ffff800082ea7b40 x28:
98ff0000c0164518 x27:
98ff0000c0164534
x26:
ffff800082d93ec8 x25:
0000000000000001 x24:
1cff0000c00172a0
x23:
0000000000000000 x22:
ffff800082d93ed0 x21:
ffff800081a24418
x20:
3eff800082ea7bb0 x19:
efff800000000000 x18:
0000000000000000
x17:
00000000000000ff x16:
0000000000000047 x15:
206b63617473206e
x14:
0000000000000018 x13:
ffff800082ea7780 x12:
0ffff800082ea78e
x11:
0ffff800082ea790 x10:
0ffff800082ea79d x9 :
34d77febe173e800
x8 :
34d77febe173e800 x7 :
0000000000000001 x6 :
0000000000000001
x5 :
feff800082ea74b8 x4 :
ffff800082870a90 x3 :
ffff80008018d3c4
x2 :
0000000000000001 x1 :
ffff800082858810 x0 :
0000000000000050
Call trace:
__debug_object_init+0x330/0x364
debug_object_init_on_stack+0x30/0x3c
schedule_hrtimeout_range_clock+0xac/0x26c
schedule_hrtimeout+0x1c/0x30
wait_task_inactive+0x1d4/0x25c
kthread_bind_mask+0x28/0x98
init_rescuer+0x1e8/0x280
workqueue_init+0x1a0/0x3cc
kernel_init_freeable+0x118/0x200
kernel_init+0x28/0x1f0
ret_from_fork+0x10/0x20
---[ end trace
0000000000000000 ]---
ODEBUG: object
3eff800082ea7bb0 is NOT on stack
ffff800082ea0000, but annotated.
------------[ cut here ]------------
Link: https://lkml.kernel.org/r/20241113042544.19095-1-qun-wei.lin@mediatek.com
Signed-off-by: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Cc: Andrew Yang <andrew.yang@mediatek.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Casper Li <casper.li@mediatek.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dave Vasilevsky [Tue, 17 Sep 2024 16:37:20 +0000 (12:37 -0400)]
crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32
Fixes boot failures on 6.9 on PPC_BOOK3S_32 machines using Open Firmware.
On these machines, the kernel refuses to boot from non-zero
PHYSICAL_START, which occurs when CRASH_DUMP is on.
Since most PPC_BOOK3S_32 machines boot via Open Firmware, it should
default to off for them. Users booting via some other mechanism can still
turn it on explicitly.
Does not change the default on any other architectures for the
time being.
Link: https://lkml.kernel.org/r/20240917163720.1644584-1-dave@vasilevsky.ca
Fixes:
75bc255a7444 ("crash: clean up kdump related config items")
Signed-off-by: Dave Vasilevsky <dave@vasilevsky.ca>
Reported-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Closes: https://lists.debian.org/debian-powerpc/2024/07/msg00001.html
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jann Horn [Mon, 11 Nov 2024 19:34:30 +0000 (20:34 +0100)]
mm/mremap: fix address wraparound in move_page_tables()
On 32-bit platforms, it is possible for the expression `len + old_addr <
old_end` to be false-positive if `len + old_addr` wraps around.
`old_addr` is the cursor in the old range up to which page table entries
have been moved; so if the operation succeeded, `old_addr` is the *end* of
the old region, and adding `len` to it can wrap.
The overflow causes mremap() to mistakenly believe that PTEs have been
copied; the consequence is that mremap() bails out, but doesn't move the
PTEs back before the new VMA is unmapped, causing anonymous pages in the
region to be lost. So basically if userspace tries to mremap() a
private-anon region and hits this bug, mremap() will return an error and
the private-anon region's contents appear to have been zeroed.
The idea of this check is that `old_end - len` is the original start
address, and writing the check that way also makes it easier to read; so
fix the check by rearranging the comparison accordingly.
(An alternate fix would be to refactor this function by introducing an
"orig_old_start" variable or such.)
Tested in a VM with a 32-bit X86 kernel; without the patch:
```
user@horn:~/big_mremap$ cat test.c
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <err.h>
#include <sys/mman.h>
#define ADDR1 ((void*)0x60000000)
#define ADDR2 ((void*)0x10000000)
#define SIZE 0x50000000uL
int main(void) {
unsigned char *p1 = mmap(ADDR1, SIZE, PROT_READ|PROT_WRITE,
MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED_NOREPLACE, -1, 0);
if (p1 == MAP_FAILED)
err(1, "mmap 1");
unsigned char *p2 = mmap(ADDR2, SIZE, PROT_NONE,
MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED_NOREPLACE, -1, 0);
if (p2 == MAP_FAILED)
err(1, "mmap 2");
*p1 = 0x41;
printf("first char is 0x%02hhx\n", *p1);
unsigned char *p3 = mremap(p1, SIZE, SIZE,
MREMAP_MAYMOVE|MREMAP_FIXED, p2);
if (p3 == MAP_FAILED) {
printf("mremap() failed; first char is 0x%02hhx\n", *p1);
} else {
printf("mremap() succeeded; first char is 0x%02hhx\n", *p3);
}
}
user@horn:~/big_mremap$ gcc -static -o test test.c
user@horn:~/big_mremap$ setarch -R ./test
first char is 0x41
mremap() failed; first char is 0x00
```
With the patch:
```
user@horn:~/big_mremap$ setarch -R ./test
first char is 0x41
mremap() succeeded; first char is 0x41
```
Link: https://lkml.kernel.org/r/20241111-fix-mremap-32bit-wrap-v1-1-61d6be73b722@google.com
Fixes:
af8ca1c14906 ("mm/mremap: optimize the start addresses in move_page_tables()")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Motiejus JakÅ`tys [Tue, 12 Nov 2024 17:16:55 +0000 (19:16 +0200)]
tools/mm: fix compile error
Add a missing semicolon.
Link: https://lkml.kernel.org/r/20241112171655.1662670-1-motiejus@jakstys.lt
Fixes:
ece5897e5a10 ("tools/mm: -Werror fixes in page-types/slabinfo")
Signed-off-by: Motiejus JakÅ`tys <motiejus@jakstys.lt>
Closes: https://github.com/NixOS/nixpkgs/issues/355369
Reviewed-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Wladislav Wiebe <wladislav.kw@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kairui Song [Tue, 12 Nov 2024 08:34:14 +0000 (16:34 +0800)]
mm, swap: fix allocation and scanning race with swapoff
There are two flags used to synchronize allocation and scanning with
swapoff: SWP_WRITEOK and SWP_SCANNING.
SWP_WRITEOK: Swapoff will first unset this flag, at this point any further
swap allocation or scanning on this device should just abort so no more
new entries will be referencing this device. Swapoff will then unuse all
existing swap entries.
SWP_SCANNING: This flag is set when device is being scanned. Swapoff will
wait for all scanner to stop before the final release of the swap device
structures to avoid UAF. Note this flag is the highest used bit of
si->flags so it could be added up arithmetically, if there are multiple
scanner.
commit
5f843a9a3a1e ("mm: swap: separate SSD allocation from
scan_swap_map_slots()") ignored SWP_SCANNING and SWP_WRITEOK flags while
separating cluster allocation path from the old allocation path. Add the
flags back to fix swapoff race. The race is hard to trigger as si->lock
prevents most parallel operations, but si->lock could be dropped for
reclaim or discard. This issue is found during code review.
This commit fixes this problem. For SWP_SCANNING, Just like before, set
the flag before scan and remove it afterwards.
For SWP_WRITEOK, there are several places where si->lock could be dropped,
it will be error-prone and make the code hard to follow if we try to cover
these places one by one. So just do one check before the real allocation,
which is also very similar like before. With new cluster allocator it may
waste a bit of time iterating the clusters but won't take long, and
swapoff is not performance sensitive.
Link: https://lkml.kernel.org/r/20241112083414.78174-1-ryncsn@gmail.com
Fixes:
5f843a9a3a1e ("mm: swap: separate SSD allocation from scan_swap_map_slots()")
Reported-by: "Huang, Ying" <ying.huang@intel.com>
Closes: https://lore.kernel.org/linux-mm/87a5es3f1f.fsf@yhuang6-desk2.ccr.corp.intel.com/
Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dave Airlie [Thu, 14 Nov 2024 20:48:49 +0000 (06:48 +1000)]
Merge tag 'amd-drm-fixes-6.12-2024-11-14' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.12-2024-11-14:
amdgpu:
- PSR fix
- Panel replay fixes
- DML fix
- vblank power fix
- Fix video caps
- SMU 14.0 fix
- GPUVM fix
- MES 12 fix
- APU carve out fix
- DC vbios fix
- NBIO fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241114143401.448210-1-alexander.deucher@amd.com
Dave Airlie [Thu, 14 Nov 2024 20:38:32 +0000 (06:38 +1000)]
Merge tag 'drm-misc-fixes-2024-11-14' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:
bridge:
- tc358768: Fix DSI command tx
nouveau:
- Fix GSP AUX error handling
- dp: Handle retires for AUX CH transfers with GSP
- fw: Sync DMA after setup
panthor:
- Fix partial BO mappings to GPU
rockchip:
- vop: Avoid null-ptr deref in plane-state check
vmwgfx:
- Avoid null-ptr deref in surface creation
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20241114142256.GA86810@2a02-2454-fd5e-fd00-4ce-489-4b34-bd1a.dyn6.pyur.net
Dave Airlie [Thu, 14 Nov 2024 20:18:34 +0000 (06:18 +1000)]
Merge tag 'drm-intel-fixes-2024-11-14' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Don't load GSC on ARL-H and ARL-U if too old FW
- Avoid potential OOPS in enabling/disabling TV output
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZzWksU6CMGLPfjkT@jlahtine-mobl.ger.corp.intel.com
Tejun Heo [Thu, 14 Nov 2024 18:50:58 +0000 (08:50 -1000)]
sched_ext: ops.cpu_acquire() should be called with SCX_KF_REST
ops.cpu_acquire() is currently called with 0 kf_maks which is interpreted as
SCX_KF_UNLOCKED which allows all unlocked kfuncs, but ops.cpu_acquire() is
called from balance_one() under the rq lock and should only be allowed call
kfuncs that are safe under the rq lock. Update it to use SCX_KF_REST.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
Cc: Zhao Mengmeng <zhaomzhao@126.com>
Link: http://lkml.kernel.org/r/ZzYvf2L3rlmjuKzh@slm.duckdns.org
Fixes:
245254f7081d ("sched_ext: Implement sched_ext_ops.cpu_acquire/release()")
Linus Torvalds [Thu, 14 Nov 2024 18:05:33 +0000 (10:05 -0800)]
Merge tag 'net-6.12-rc8' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth.
Quite calm week. No new regression under investigation.
Current release - regressions:
- eth: revert "igb: Disable threaded IRQ for igb_msix_other"
Current release - new code bugs:
- bluetooth: btintel: direct exception event to bluetooth stack
Previous releases - regressions:
- core: fix data-races around sk->sk_forward_alloc
- netlink: terminate outstanding dump on socket close
- mptcp: error out earlier on disconnect
- vsock: fix accept_queue memory leak
- phylink: ensure PHY momentary link-fails are handled
- eth: mlx5:
- fix null-ptr-deref in add rule err flow
- lock FTE when checking if active
- eth: dwmac-mediatek: fix inverted handling of mediatek,mac-wol
Previous releases - always broken:
- sched: fix u32's systematic failure to free IDR entries for hnodes.
- sctp: fix possible UAF in sctp_v6_available()
- eth: bonding: add ns target multicast address to slave device
- eth: mlx5: fix msix vectors to respect platform limit
- eth: icssg-prueth: fix 1 PPS sync"
* tag 'net-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
net: sched: u32: Add test case for systematic hnode IDR leaks
selftests: bonding: add ns multicast group testing
bonding: add ns target multicast address to slave device
net: ti: icssg-prueth: Fix 1 PPS sync
stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines
net: Make copy_safe_from_sockptr() match documentation
net: stmmac: dwmac-mediatek: Fix inverted handling of mediatek,mac-wol
ipmr: Fix access to mfc_cache_list without lock held
samples: pktgen: correct dev to DEV
net: phylink: ensure PHY momentary link-fails are handled
mptcp: pm: use _rcu variant under rcu_read_lock
mptcp: hold pm lock when deleting entry
mptcp: update local address flags when setting it
net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes.
MAINTAINERS: Re-add cancelled Renesas driver sections
Revert "igb: Disable threaded IRQ for igb_msix_other"
Bluetooth: btintel: Direct exception event to bluetooth stack
Bluetooth: hci_core: Fix calling mgmt_device_connected
virtio/vsock: Improve MSG_ZEROCOPY error handling
vsock: Fix sk_error_queue memory leak
...
Linus Torvalds [Thu, 14 Nov 2024 18:00:23 +0000 (10:00 -0800)]
Merge tag 'bcachefs-2024-11-13' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"This fixes one minor regression from the btree cache fixes (in the
scan_for_btree_nodes repair path) - and the shutdown path fix is the
big one here, in terms of bugs closed:
- Assorted tiny syzbot fixes
- Shutdown path fix: "bch2_btree_write_buffer_flush_going_ro()"
The shutdown path wasn't flushing the btree write buffer, leading
to shutting down while we still had operations in flight. This
fixes a whole slew of syzbot bugs, and undoubtedly other strange
heisenbugs.
* tag 'bcachefs-2024-11-13' of git://evilpiepirate.org/bcachefs:
bcachefs: Fix assertion pop in bch2_ptr_swab()
bcachefs: Fix journal_entry_dev_usage_to_text() overrun
bcachefs: Allow for unknown key types in backpointers fsck
bcachefs: Fix assertion pop in topology repair
bcachefs: Fix hidden btree errors when reading roots
bcachefs: Fix validate_bset() repair path
bcachefs: Fix missing validation for bch_backpointer.level
bcachefs: Fix bch_member.btree_bitmap_shift validation
bcachefs: bch2_btree_write_buffer_flush_going_ro()
Steven Rostedt [Thu, 14 Nov 2024 16:28:25 +0000 (11:28 -0500)]
tracing/ring-buffer: Clear all memory mapped CPU ring buffers on first recording
The events of a memory mapped ring buffer from the previous boot should
not be mixed in with events from the current boot. There's meta data that
is used to handle KASLR so that function names can be shown properly.
Also, since the timestamps of the previous boot have no meaning to the
timestamps of the current boot, having them intermingled in a buffer can
also cause confusion because there could possibly be events in the future.
When a trace is activated the meta data is reset so that the pointers of
are now processed for the new address space. The trace buffers are reset
when tracing starts for the first time. The problem here is that the reset
only happens on online CPUs. If a CPU is offline, it does not get reset.
To demonstrate the issue, a previous boot had tracing enabled in the boot
mapped ring buffer on reboot. On the following boot, tracing has not been
started yet so the function trace from the previous boot is still visible.
# trace-cmd show -B boot_mapped -c 3 | tail
<idle>-0 [003] d.h2. 156.462395: __rcu_read_lock <-cpu_emergency_disable_virtualization
<idle>-0 [003] d.h2. 156.462396: vmx_emergency_disable_virtualization_cpu <-cpu_emergency_disable_virtualization
<idle>-0 [003] d.h2. 156.462396: __rcu_read_unlock <-__sysvec_reboot
<idle>-0 [003] d.h2. 156.462397: stop_this_cpu <-__sysvec_reboot
<idle>-0 [003] d.h2. 156.462397: set_cpu_online <-stop_this_cpu
<idle>-0 [003] d.h2. 156.462397: disable_local_APIC <-stop_this_cpu
<idle>-0 [003] d.h2. 156.462398: clear_local_APIC <-disable_local_APIC
<idle>-0 [003] d.h2. 156.462574: mcheck_cpu_clear <-stop_this_cpu
<idle>-0 [003] d.h2. 156.462575: mce_intel_feature_clear <-stop_this_cpu
<idle>-0 [003] d.h2. 156.462575: lmce_supported <-mce_intel_feature_clear
Now, if CPU 3 is taken offline, and tracing is started on the memory
mapped ring buffer, the events from the previous boot in the CPU 3 ring
buffer is not reset. Now those events are using the meta data from the
current boot and produces just hex values.
# echo 0 > /sys/devices/system/cpu/cpu3/online
# trace-cmd start -B boot_mapped -p function
# trace-cmd show -B boot_mapped -c 3 | tail
<idle>-0 [003] d.h2. 156.462395: 0xffffffff9a1e3194 <-0xffffffff9a0f655e
<idle>-0 [003] d.h2. 156.462396: 0xffffffff9a0a1d24 <-0xffffffff9a0f656f
<idle>-0 [003] d.h2. 156.462396: 0xffffffff9a1e6bc4 <-0xffffffff9a0f7323
<idle>-0 [003] d.h2. 156.462397: 0xffffffff9a0d12b4 <-0xffffffff9a0f732a
<idle>-0 [003] d.h2. 156.462397: 0xffffffff9a1458d4 <-0xffffffff9a0d12e2
<idle>-0 [003] d.h2. 156.462397: 0xffffffff9a0faed4 <-0xffffffff9a0d12e7
<idle>-0 [003] d.h2. 156.462398: 0xffffffff9a0faaf4 <-0xffffffff9a0faef2
<idle>-0 [003] d.h2. 156.462574: 0xffffffff9a0e3444 <-0xffffffff9a0d12ef
<idle>-0 [003] d.h2. 156.462575: 0xffffffff9a0e4964 <-0xffffffff9a0d12ef
<idle>-0 [003] d.h2. 156.462575: 0xffffffff9a0e3fb0 <-0xffffffff9a0e496f
Reset all CPUs when starting a boot mapped ring buffer for the first time,
and not just the online CPUs.
Fixes:
7a1d1e4b9639f ("tracing/ring-buffer: Add last_boot_info file to boot instance")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Christian Brauner [Thu, 14 Nov 2024 15:31:27 +0000 (16:31 +0100)]
statmount: retrieve security mount options
Add the ability to retrieve security mount options. Keep them separate
from filesystem specific mount options so it's easy to tell them apart.
Also allow to retrieve them separate from other mount options as most of
the time users won't be interested in security specific mount options.
Link: https://lore.kernel.org/r/20241114-radtour-ofenrohr-ff34b567b40a@brauner
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Takashi Iwai [Thu, 14 Nov 2024 15:40:15 +0000 (16:40 +0100)]
Merge tag 'asoc-fix-v6.12-rc7' of https://git./linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.12
Some last updates for v6.12, one quirk plus a couple of fixes. One is a
minor fix for a relatively obscure driver and the other is a relatively
important fix for boot hangs with some audio graph based cards.
Josef Bacik [Wed, 13 Nov 2024 16:05:13 +0000 (11:05 -0500)]
btrfs: fix incorrect comparison for delayed refs
When I reworked delayed ref comparison in
cf4f04325b2b ("btrfs: move
->parent and ->ref_root into btrfs_delayed_ref_node"), I made a mistake
and returned -1 for the case where ref1->ref_root was > than
ref2->ref_root. This is a subtle bug that can result in improper
delayed ref running order, which can result in transaction aborts.
Fixes:
cf4f04325b2b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node")
CC: stable@vger.kernel.org # 6.10+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Steven Rostedt [Thu, 14 Nov 2024 04:08:39 +0000 (23:08 -0500)]
Revert: "ring-buffer: Do not have boot mapped buffers hook to CPU hotplug"
A crash happened when testing cpu hotplug with respect to the memory
mapped ring buffers. It was assumed that the hot plug code was adding a
per CPU buffer that was already created that caused the crash. The real
problem was due to ref counting and was fixed by commit
2cf9733891a4
("ring-buffer: Fix refcount setting of boot mapped buffers").
When a per CPU buffer is created, it will not be created again even with
CPU hotplug, so the fix to not use CPU hotplug was a red herring. In fact,
it caused only the boot CPU buffer to be created, leaving the other CPU
per CPU buffers disabled.
Revert that change as it was not the culprit of the fix it was intended to
be.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20241113230839.6c03640f@gandalf.local.home
Fixes:
912da2c384d5 ("ring-buffer: Do not have boot mapped buffers hook to CPU hotplug")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Alexandre Ferrieux [Wed, 13 Nov 2024 10:04:28 +0000 (11:04 +0100)]
net: sched: u32: Add test case for systematic hnode IDR leaks
Add a tdc test case to exercise the just-fixed systematic leak of
IDR entries in u32 hnode disposal. Given the IDR in question is
confined to the range [1..0x7FF], it is sufficient to create/delete
the same filter 2048 times to fill it up and get a nonzero exit
status from "tc filter add".
Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20241113100428.360460-1-alexandre.ferrieux@orange.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Francesco Dolcini [Thu, 26 Sep 2024 14:12:46 +0000 (16:12 +0200)]
drm/bridge: tc358768: Fix DSI command tx
Wait for the command transmission to be completed in the DSI transfer
function polling for the dc_start bit to go back to idle state after the
transmission is started.
This is documented in the datasheet and failures to do so lead to
commands corruption.
Fixes:
ff1ca6397b1d ("drm/bridge: Add tc358768 driver")
Cc: stable@vger.kernel.org
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20240926141246.48282-1-francesco@dolcini.it
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240926141246.48282-1-francesco@dolcini.it
Paolo Abeni [Thu, 14 Nov 2024 10:16:30 +0000 (11:16 +0100)]
Merge branch 'bonding-fix-ns-targets-not-work-on-hardware-nic'
Hangbin Liu says:
====================
bonding: fix ns targets not work on hardware NIC
The first patch fixed ns targets not work on hardware NIC when bonding
set arp_validate.
The second patch add a related selftest for bonding.
v4: Thanks Nikolay for the comments:
use bond_slave_ns_maddrs_{add/del} with clear name
fix comments typos
remove _slave_set_ns_maddrs underscore directly
update bond_option_arp_validate_set() change logic
v3: use ndisc_mc_map to convert the mcast mac address (Jay Vosburgh)
v2: only add/del mcast group on backup slaves when arp_validate is set (Jay Vosburgh)
arp_validate doesn't support 3ad, tlb, alb. So let's only do it on ab mode.
====================
Link: https://patch.msgid.link/20241111101650.27685-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hangbin Liu [Mon, 11 Nov 2024 10:16:50 +0000 (10:16 +0000)]
selftests: bonding: add ns multicast group testing
Add a test to make sure the backup slaves join correct multicast group
when arp_validate enabled and ns_ip6_target is set. Here is the result:
TEST: arp_validate (active-backup ns_ip6_target arp_validate 0) [ OK ]
TEST: arp_validate (join mcast group) [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 1) [ OK ]
TEST: arp_validate (join mcast group) [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 2) [ OK ]
TEST: arp_validate (join mcast group) [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 3) [ OK ]
TEST: arp_validate (join mcast group) [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 4) [ OK ]
TEST: arp_validate (join mcast group) [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 5) [ OK ]
TEST: arp_validate (join mcast group) [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 6) [ OK ]
TEST: arp_validate (join mcast group) [ OK ]
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hangbin Liu [Mon, 11 Nov 2024 10:16:49 +0000 (10:16 +0000)]
bonding: add ns target multicast address to slave device
Commit
4598380f9c54 ("bonding: fix ns validation on backup slaves")
tried to resolve the issue where backup slaves couldn't be brought up when
receiving IPv6 Neighbor Solicitation (NS) messages. However, this fix only
worked for drivers that receive all multicast messages, such as the veth
interface.
For standard drivers, the NS multicast message is silently dropped because
the slave device is not a member of the NS target multicast group.
To address this, we need to make the slave device join the NS target
multicast group, ensuring it can receive these IPv6 NS messages to validate
the slave’s status properly.
There are three policies before joining the multicast group:
1. All settings must be under active-backup mode (alb and tlb do not support
arp_validate), with backup slaves and slaves supporting multicast.
2. We can add or remove multicast groups when arp_validate changes.
3. Other operations, such as enslaving, releasing, or setting NS targets,
need to be guarded by arp_validate.
Fixes:
4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jeff Layton [Wed, 13 Nov 2024 14:17:51 +0000 (09:17 -0500)]
fs: reduce pointer chasing in is_mgtime() test
The is_mgtime test checks whether the FS_MGTIME flag is set in the
fstype. To get there from the inode though, we have to dereference 3
pointers.
Add a new IOP_MGTIME flag, and have inode_init_always() set that flag
when the fstype flag is set. Then, make is_mgtime test for IOP_MGTIME
instead.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20241113-mgtime-v1-1-84e256980e11@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Mateusz Guzik [Wed, 13 Nov 2024 15:51:03 +0000 (16:51 +0100)]
vfs: make evict() use smp_mb__after_spinlock instead of smp_mb
It literally directly follows a spin_lock() call.
This whacks an explicit barrier on x86-64.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20241113155103.4194099-1-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Meghana Malladi [Mon, 11 Nov 2024 09:58:42 +0000 (15:28 +0530)]
net: ti: icssg-prueth: Fix 1 PPS sync
The first PPS latch time needs to be calculated by the driver
(in rounded off seconds) and configured as the start time
offset for the cycle. After synchronizing two PTP clocks
running as master/slave, missing this would cause master
and slave to start immediately with some milliseconds
drift which causes the PPS signal to never synchronize with
the PTP master.
Fixes:
186734c15886 ("net: ti: icssg-prueth: add packet timestamping and ptp support")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://patch.msgid.link/20241111095842.478833-1-m-malladi@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Chen Ridong [Tue, 29 Oct 2024 08:34:29 +0000 (08:34 +0000)]
drm/vmwgfx: avoid null_ptr_deref in vmw_framebuffer_surface_create_handle
The 'vmw_user_object_buffer' function may return NULL with incorrect
inputs. To avoid possible null pointer dereference, add a check whether
the 'bo' is NULL in the vmw_framebuffer_surface_create_handle.
Fixes:
d6667f0ddf46 ("drm/vmwgfx: Fix handling of dumb buffers")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241029083429.1185479-1-chenridong@huaweicloud.com
Vitalii Mordan [Fri, 8 Nov 2024 17:33:34 +0000 (20:33 +0300)]
stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines
If the clock dwmac->tx_clk was not enabled in intel_eth_plat_probe,
it should not be disabled in any path.
Conversely, if it was enabled in intel_eth_plat_probe, it must be disabled
in all error paths to ensure proper cleanup.
Found by Linux Verification Center (linuxtesting.org) with Klever.
Fixes:
9efc9b2b04c7 ("net: stmmac: Add dwmac-intel-plat for GBE driver")
Signed-off-by: Vitalii Mordan <mordan@ispras.ru>
Link: https://patch.msgid.link/20241108173334.2973603-1-mordan@ispras.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Luczaj [Sun, 10 Nov 2024 23:17:34 +0000 (00:17 +0100)]
net: Make copy_safe_from_sockptr() match documentation
copy_safe_from_sockptr()
return copy_from_sockptr()
return copy_from_sockptr_offset()
return copy_from_user()
copy_from_user() does not return an error on fault. Instead, it returns a
number of bytes that were not copied. Have it handled.
Patch has a side effect: it un-breaks garbage input handling of
nfc_llcp_setsockopt() and mISDN's data_sock_setsockopt().
Fixes:
6309863b31dd ("net: add copy_safe_from_sockptr() helper")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20241111-sockptr-copy-ret-fix-v1-1-a520083a93fb@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nícolas F. R. A. Prado [Sat, 9 Nov 2024 15:16:32 +0000 (10:16 -0500)]
net: stmmac: dwmac-mediatek: Fix inverted handling of mediatek,mac-wol
The mediatek,mac-wol property is being handled backwards to what is
described in the binding: it currently enables PHY WOL when the property
is present and vice versa. Invert the driver logic so it matches the
binding description.
Fixes:
fd1d62d80ebc ("net: stmmac: replace the use_phy_wol field with a flag")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Link: https://patch.msgid.link/20241109-mediatek-mac-wol-noninverted-v2-1-0e264e213878@collabora.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Breno Leitao [Fri, 8 Nov 2024 14:08:36 +0000 (06:08 -0800)]
ipmr: Fix access to mfc_cache_list without lock held
Accessing `mr_table->mfc_cache_list` is protected by an RCU lock. In the
following code flow, the RCU read lock is not held, causing the
following error when `RCU_PROVE` is not held. The same problem might
show up in the IPv6 code path.
6.12.0-rc5-kbuilder-01145-gbac17284bdcb #33 Tainted: G E N
-----------------------------
net/ipv4/ipmr_base.c:313 RCU-list traversed in non-reader section!!
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by RetransmitAggre/3519:
#0:
ffff88816188c6c0 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x8a/0x290
#1:
ffffffff83fcf7a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_dumpit+0x6b/0x90
stack backtrace:
lockdep_rcu_suspicious
mr_table_dump
ipmr_rtm_dumproute
rtnl_dump_all
rtnl_dumpit
netlink_dump
__netlink_dump_start
rtnetlink_rcv_msg
netlink_rcv_skb
netlink_unicast
netlink_sendmsg
This is not a problem per see, since the RTNL lock is held here, so, it
is safe to iterate in the list without the RCU read lock, as suggested
by Eric.
To alleviate the concern, modify the code to use
list_for_each_entry_rcu() with the RTNL-held argument.
The annotation will raise an error only if RTNL or RCU read lock are
missing during iteration, signaling a legitimate problem, otherwise it
will avoid this false positive.
This will solve the IPv6 case as well, since ip6mr_rtm_dumproute() calls
this function as well.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241108-ipmr_rcu-v2-1-c718998e209b@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wei Fang [Tue, 12 Nov 2024 03:03:47 +0000 (11:03 +0800)]
samples: pktgen: correct dev to DEV
In the pktgen_sample01_simple.sh script, the device variable is uppercase
'DEV' instead of lowercase 'dev'. Because of this typo, the script cannot
enable UDP tx checksum.
Fixes:
460a9aa23de6 ("samples: pktgen: add UDP tx checksum support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/20241112030347.1849335-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Tue, 12 Nov 2024 16:20:00 +0000 (16:20 +0000)]
net: phylink: ensure PHY momentary link-fails are handled
Normally, phylib won't notify changes in quick succession. However, as
a result of commit
3e43b903da04 ("net: phy: Immediately call
adjust_link if only tx_lpi_enabled changes") this is no longer true -
it is now possible that phy_link_down() and phy_link_up() will both
complete before phylink's resolver has run, which means it'll miss that
pl->phy_state.link momentarily became false.
Rename "mac_link_dropped" to be more generic "link_failed" since it will
cover more than the MAC/PCS end of the link failing, and arrange to set
this in phylink_phy_change() if we notice that the PHY reports that the
link is down.
This will ensure that we capture an EEE reconfiguration event.
Fixes:
3e43b903da04 ("net: phy: Immediately call adjust_link if only tx_lpi_enabled changes")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/E1tAtcW-002RBS-LB@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 14 Nov 2024 02:51:09 +0000 (18:51 -0800)]
Merge branch 'mptcp-pm-a-few-more-fixes'
Matthieu Baerts says:
====================
mptcp: pm: a few more fixes
Three small fixes related to the MPTCP path-manager:
- Patch 1: correctly reflect the backup flag to the corresponding local
address entry of the userspace path-manager. A fix for v5.19.
- Patch 2: hold the PM lock when deleting an entry from the local
addresses of the userspace path-manager to avoid messing up with this
list. A fix for v5.19.
- Patch 3: use _rcu variant to iterate the in-kernel path-manager's
local addresses list, when under rcu_read_lock(). A fix for v5.17.
====================
Link: https://patch.msgid.link/20241112-net-mptcp-misc-6-12-pm-v1-0-b835580cefa8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Tue, 12 Nov 2024 19:18:35 +0000 (20:18 +0100)]
mptcp: pm: use _rcu variant under rcu_read_lock
In mptcp_pm_create_subflow_or_signal_addr(), rcu_read_(un)lock() are
used as expected to iterate over the list of local addresses, but
list_for_each_entry() was used instead of list_for_each_entry_rcu() in
__lookup_addr(). It is important to use this variant which adds the
required READ_ONCE() (and diagnostic checks if enabled).
Because __lookup_addr() is also used in mptcp_pm_nl_set_flags() where it
is called under the pernet->lock and not rcu_read_lock(), an extra
condition is then passed to help the diagnostic checks making sure
either the associated spin lock or the RCU lock is held.
Fixes:
86e39e04482b ("mptcp: keep track of local endpoint still available for each msk")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241112-net-mptcp-misc-6-12-pm-v1-3-b835580cefa8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Tue, 12 Nov 2024 19:18:34 +0000 (20:18 +0100)]
mptcp: hold pm lock when deleting entry
When traversing userspace_pm_local_addr_list and deleting an entry from
it in mptcp_pm_nl_remove_doit(), msk->pm.lock should be held.
This patch holds this lock before mptcp_userspace_pm_lookup_addr_by_id()
and releases it after list_move() in mptcp_pm_nl_remove_doit().
Fixes:
d9a4594edabf ("mptcp: netlink: Add MPTCP_PM_CMD_REMOVE")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241112-net-mptcp-misc-6-12-pm-v1-2-b835580cefa8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Tue, 12 Nov 2024 19:18:33 +0000 (20:18 +0100)]
mptcp: update local address flags when setting it
Just like in-kernel pm, when userspace pm does set_flags, it needs to send
out MP_PRIO signal, and also modify the flags of the corresponding address
entry in the local address list. This patch implements the missing logic.
Traverse all address entries on userspace_pm_local_addr_list to find the
local address entry, if bkup is true, set the flags of this entry with
FLAG_BACKUP, otherwise, clear FLAG_BACKUP.
Fixes:
892f396c8e68 ("mptcp: netlink: issue MP_PRIO signals from userspace PMs")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241112-net-mptcp-misc-6-12-pm-v1-1-b835580cefa8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dave Airlie [Mon, 11 Nov 2024 03:41:25 +0000 (13:41 +1000)]
nouveau/dp: handle retries for AUX CH transfers with GSP.
eb284f4b3781 drm/nouveau/dp: Honor GSP link training retry timeouts
tried to fix a problem with panel retires, however it appears
the auxch also needs the same treatment, so add the same retry
wrapper around it.
This fixes some eDP panels after a suspend/resume cycle.
Fixes:
eb284f4b3781 ("drm/nouveau/dp: Honor GSP link training retry timeouts")
Cc: stable@vger.kernel.org
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241111034126.2028401-2-airlied@gmail.com
Dave Airlie [Mon, 11 Nov 2024 03:41:24 +0000 (13:41 +1000)]
nouveau: handle EBUSY and EAGAIN for GSP aux errors.
The upper layer transfer functions expect EBUSY as a return
for when retries should be done.
Fix the AUX error translation, but also check for both errors
in a few places.
Fixes:
eb284f4b3781 ("drm/nouveau/dp: Honor GSP link training retry timeouts")
Cc: stable@vger.kernel.org
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241111034126.2028401-1-airlied@gmail.com
Dave Airlie [Tue, 12 Nov 2024 19:57:03 +0000 (05:57 +1000)]
nouveau: fw: sync dma after setup is called.
When this code moved to non-coherent allocator the sync was put too
early for some firmwares which called the setup function, move the
sync down after the setup function.
Reported-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Tested-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Fixes:
9b340aeb26d5 ("nouveau/firmware: use dma non-coherent allocator")
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241114004603.3095485-1-airlied@gmail.com
Linus Torvalds [Wed, 13 Nov 2024 21:32:51 +0000 (13:32 -0800)]
Merge tag 'pm-6.12-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix a locking issue in the asymmetric CPU capacity setup code in the
intel_pstate driver that may lead to a deadlock if CPU online/offline
runs in parallel with the code in question, which is unlikely but not
impossible (Rafael Wysocki)"
* tag 'pm-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Rearrange locking in hybrid_init_cpu_capacity_scaling()
Linus Torvalds [Wed, 13 Nov 2024 21:28:58 +0000 (13:28 -0800)]
Merge tag 'tpmdd-next-6.12-rc8' of git://git./linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fixes from Jarkko Sakkinen:
"Two bug fixes for TPM bus encryption (the remaining reported issues in
the feature)"
* tag 'tpmdd-next-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: Disable TPM on tpm2_create_primary() failure
tpm: Opt-in in disable PCR integrity protection
Ashutosh Dixit [Sat, 9 Nov 2024 03:20:03 +0000 (19:20 -0800)]
drm/xe/oa: Fix "Missing outer runtime PM protection" warning
Fix the following drm_WARN:
[953.586396] xe 0000:00:02.0: [drm] Missing outer runtime PM protection
...
<4> [953.587090] ? xe_pm_runtime_get_noresume+0x8d/0xa0 [xe]
<4> [953.587208] guc_exec_queue_add_msg+0x28/0x130 [xe]
<4> [953.587319] guc_exec_queue_fini+0x3a/0x40 [xe]
<4> [953.587425] xe_exec_queue_destroy+0xb3/0xf0 [xe]
<4> [953.587515] xe_oa_release+0x9c/0xc0 [xe]
Suggested-by: John Harrison <john.c.harrison@intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Fixes:
e936f885f1e9 ("drm/xe/oa/uapi: Expose OA stream fd")
Cc: stable@vger.kernel.org
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241109032003.3093811-1-ashutosh.dixit@intel.com
(cherry picked from commit
b107c63d2953907908fd0cafb0e543b3c3167b75)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Jarkko Sakkinen [Wed, 13 Nov 2024 18:35:39 +0000 (20:35 +0200)]
tpm: Disable TPM on tpm2_create_primary() failure
The earlier bug fix misplaced the error-label when dealing with the
tpm2_create_primary() return value, which the original completely ignored.
Cc: stable@vger.kernel.org
Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=
1087331
Fixes:
cc7d8594342a ("tpm: Rollback tpm2_load_null()")
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Jarkko Sakkinen [Wed, 13 Nov 2024 05:54:12 +0000 (07:54 +0200)]
tpm: Opt-in in disable PCR integrity protection
The initial HMAC session feature added TPM bus encryption and/or integrity
protection to various in-kernel TPM operations. This can cause performance
bottlenecks with IMA, as it heavily utilizes PCR extend operations.
In order to mitigate this performance issue, introduce a kernel
command-line parameter to the TPM driver for disabling the integrity
protection for PCR extend operations (i.e. TPM2_PCR_Extend).
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Link: https://lore.kernel.org/linux-integrity/20241015193916.59964-1-zohar@linux.ibm.com/
Fixes:
6519fea6fd37 ("tpm: add hmac checks to tpm2_pcr_extend()")
Tested-by: Mimi Zohar <zohar@linux.ibm.com>
Co-developed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Co-developed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Daniel Martín Gómez [Thu, 7 Nov 2024 14:45:38 +0000 (15:45 +0100)]
jbd2: Fix comment describing journal_init_common()
The code indicates that journal_init_common() fills the journal_t object
it returns while the comment incorrectly states that only a few fields are
initialised. Also, the comment claims that journal structures could be
created from scratch which isn't possible as journal_init_common() calls
journal_load_superblock() which loads and checks journal superblock from
disk.
Signed-off-by: Daniel Martín Gómez <dalme@riseup.net>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20241107144538.3544-1-dalme@riseup.net
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Mathieu Othacehe [Wed, 6 Nov 2024 13:47:41 +0000 (14:47 +0100)]
ext4: prevent an infinite loop in the lazyinit thread
Use ktime_get_ns instead of ktime_get_real_ns when computing the lr_timeout
not to be affected by system time jumps.
Use a boolean instead of the MAX_JIFFY_OFFSET value to determine whether
the next_wakeup value has been set. Comparing elr->lr_next_sched to
MAX_JIFFY_OFFSET can cause the lazyinit thread to loop indefinitely.
Co-developed-by: Lukas Skupinski <lukas.skupinski@landisgyr.com>
Signed-off-by: Lukas Skupinski <lukas.skupinski@landisgyr.com>
Signed-off-by: Mathieu Othacehe <othacehe@gnu.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20241106134741.26948-2-othacehe@gnu.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Thorsten Blum [Tue, 5 Nov 2024 10:33:54 +0000 (11:33 +0100)]
ext4: use struct_size() to improve ext4_htree_store_dirent()
Inline and use struct_size() to calculate the number of bytes to
allocate for new_fn and remove the local variable len.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20241105103353.11590-2-thorsten.blum@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Thorsten Blum [Tue, 5 Nov 2024 10:18:14 +0000 (11:18 +0100)]
ext4: annotate struct fname with __counted_by()
Add the __counted_by compiler attribute to the flexible array member
name to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20241105101813.10864-2-thorsten.blum@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Gustavo A. R. Silva [Fri, 1 Nov 2024 20:45:23 +0000 (14:45 -0600)]
jbd2: avoid dozens of -Wflex-array-member-not-at-end warnings
-Wflex-array-member-not-at-end was introduced in GCC-14, and we
are getting ready to enable it, globally.
Use the `DEFINE_RAW_FLEX()` helper for an on-stack definition of
a flexible structure (`struct shash_desc`) where the size of the
flexible-array member (`__ctx`) is known at compile-time, and
refactor the rest of the code, accordingly.
So, with this, fix 77 of the following warnings:
include/linux/jbd2.h:1800:35: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/ZyU94w0IALVhc9Jy@kspp
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Thorsten Blum [Mon, 21 Oct 2024 10:00:57 +0000 (12:00 +0200)]
ext4: use str_yes_no() helper function
Remove hard-coded strings by using the str_yes_no() helper function.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20241021100056.5521-2-thorsten.blum@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Linus Torvalds [Wed, 13 Nov 2024 17:14:19 +0000 (09:14 -0800)]
Merge tag 'bpf-fixes' of git://git./linux/kernel/git/bpf/bpf
Pull bpf fixes from Daniel Borkmann:
- Fix a mismatching RCU unlock flavor in bpf_out_neigh_v6 (Jiawei Ye)
- Fix BPF sockmap with kTLS to reject vsock and unix sockets upon kTLS
context retrieval (Zijian Zhang)
- Fix BPF bits iterator selftest for s390x (Hou Tao)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix mismatched RCU unlock flavour in bpf_out_neigh_v6
bpf: Add sk_is_inet and IS_ICSK check in tls_sw_has_ctx_tx/rx
selftests/bpf: Use -4095 as the bad address for bits iterator
Linus Torvalds [Wed, 13 Nov 2024 17:09:00 +0000 (09:09 -0800)]
Merge tag 'loongarch-fixes-6.12-2' of git://git./linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
- fix possible CPUs setup logical-physical CPU mapping, in order to
avoid CPU hotplug issue
- fix some KASAN bugs
- fix AP booting issue in VM mode
- some trivial cleanups
* tag 'loongarch-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Fix AP booting issue in VM mode
LoongArch: Add WriteCombine shadow mapping in KASAN
LoongArch: Disable KASAN if PGDIR_SIZE is too large for cpu_vabits
LoongArch: Make KASAN work with 5-level page-tables
LoongArch: Define a default value for VM_DATA_DEFAULT_FLAGS
LoongArch: Fix early_numa_add_cpu() usage for FDT systems
LoongArch: For all possible CPUs setup logical-physical CPU mapping
Linus Torvalds [Wed, 13 Nov 2024 16:58:11 +0000 (08:58 -0800)]
Merge tag 'mm-hotfixes-stable-2024-11-12-16-39' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"10 hotfixes, 7 of which are cc:stable. 7 are MM, 3 are not. All
singletons"
* tag 'mm-hotfixes-stable-2024-11-12-16-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: swapfile: fix cluster reclaim work crash on rotational devices
selftests: hugetlb_dio: fixup check for initial conditions to skip in the start
mm/thp: fix deferred split queue not partially_mapped: fix
mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
nommu: pass NULL argument to vma_iter_prealloc()
ocfs2: fix UBSAN warning in ocfs2_verify_volume()
nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint
nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint
mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
mm: count zeromap read and set for swapout and swapin
Al Viro [Sun, 3 Nov 2024 20:20:15 +0000 (15:20 -0500)]
libfs: kill empty_dir_getattr()
It's used only to initialize ->getattr in one inode_operations instance
(empty_dir_inode_operations) and its behaviour had always been equivalent
to what we get with NULL ->getattr.
Just remove that initializer, along with empty_dir_getattr() itself.
While we are at it, the same instance has ->permission initialized to
generic_permission, which is what NULL ->permission ends up doing.
Again, no point keeping it.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stefan Berger [Fri, 1 Nov 2024 19:37:03 +0000 (15:37 -0400)]
fs: Simplify getattr interface function checking AT_GETATTR_NOSEC flag
Commit
8a924db2d7b5 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface
function")' introduced the AT_GETATTR_NOSEC flag to ensure that the
call paths only call vfs_getattr_nosec if it is set instead of vfs_getattr.
Now, simplify the getattr interface functions of filesystems where the flag
AT_GETATTR_NOSEC is checked.
There is only a single caller of inode_operations getattr function and it
is located in fs/stat.c in vfs_getattr_nosec. The caller there is the only
one from which the AT_GETATTR_NOSEC flag is passed from.
Two filesystems are checking this flag in .getattr and the flag is always
passed to them unconditionally from only vfs_getattr_nosec:
- ecryptfs: Simplify by always calling vfs_getattr_nosec in
ecryptfs_getattr. From there the flag is passed to no other
function and this function is not called otherwise.
- overlayfs: Simplify by always calling vfs_getattr_nosec in
ovl_getattr. From there the flag is passed to no other
function and this function is not called otherwise.
The query_flags in vfs_getattr_nosec will mask-out AT_GETATTR_NOSEC from
any caller using AT_STATX_SYNC_TYPE as mask so that the flag is not
important inside this function. Also, since no filesystem is checking the
flag anymore, remove the flag entirely now, including the BUG_ON check that
never triggered.
The net change of the changes here combined with the original commit is
that ecryptfs and overlayfs do not call vfs_getattr but only
vfs_getattr_nosec.
Fixes:
8a924db2d7b5 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface function")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Closes: https://lore.kernel.org/linux-fsdevel/
20241101011724.GN1350452@ZenIV/T/#u
Cc: Tyler Hicks <code@tyhicks.com>
Cc: ecryptfs@vger.kernel.org
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: linux-unionfs@vger.kernel.org
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 20 Oct 2024 00:48:30 +0000 (20:48 -0400)]
fs/stat.c: switch to CLASS(fd_raw)
... and use fd_empty() consistently
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 20 Sep 2024 04:38:59 +0000 (00:38 -0400)]
kill getname_statx_lookup_flags()
LOOKUP_EMPTY is ignored by the only remaining user, and without
that 'getname_' prefix makes no sense.
Remove LOOKUP_EMPTY part, rename to statx_lookup_flags() and make
static. It most likely is _not_ statx() specific, either, but
that's the next step.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 20 Sep 2024 04:26:57 +0000 (00:26 -0400)]
io_statx_prep(): use getname_uflags()
the only thing in flags getname_flags() ever cares about is
LOOKUP_EMPTY; anything else is none of its damn business.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 12 Nov 2024 10:10:04 +0000 (11:10 +0100)]
statmount: add flag to retrieve unescaped options
Filesystem options can be retrieved with STATMOUNT_MNT_OPTS, which
returns a string of comma separated options, where some characters are
escaped using the \OOO notation.
Add a new flag, STATMOUNT_OPT_ARRAY, which instead returns the raw
option values separated with '\0' charaters.
Since escaped charaters are rare, this inteface is preferable for
non-libmount users which likley don't want to deal with option
de-escaping.
Example code:
if (st->mask & STATMOUNT_OPT_ARRAY) {
const char *opt = st->str + st->opt_array;
for (unsigned int i = 0; i < st->opt_num; i++) {
printf("opt_array[%i]: <%s>\n", i, opt);
opt += strlen(opt) + 1;
}
}
Example ouput:
(1) mnt_opts: <lowerdir+=/l\054w\054r,lowerdir+=/l\054w\054r1,upperdir=/upp\054r,workdir=/w\054rk,redirect_dir=nofollow,uuid=null>
(2) opt_array[0]: <lowerdir+=/l,w,r>
opt_array[1]: <lowerdir+=/l,w,r1>
opt_array[2]: <upperdir=/upp,r>
opt_array[3]: <workdir=/w,rk>
opt_array[4]: <redirect_dir=nofollow>
opt_array[5]: <uuid=null>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://lore.kernel.org/r/20241112101006.30715-1-mszeredi@redhat.com
Acked-by: Jeff Layton <jlayton@kernel.org>
[brauner: tweak variable naming and parsing add example output]
Signed-off-by: Christian Brauner <brauner@kernel.org>
Matthew Auld [Tue, 12 Nov 2024 16:28:28 +0000 (16:28 +0000)]
drm/xe: handle flat ccs during hibernation on igpu
Starting from LNL, CCS has moved over to flat CCS model where there is
now dedicated memory reserved for storing compression state. On
platforms like LNL this reserved memory lives inside graphics stolen
memory, which is not treated like normal RAM and is therefore skipped by
the core kernel when creating the hibernation image. Currently if
something was compressed and we enter hibernation all the corresponding
CCS state is lost on such HW, resulting in corrupted memory. To fix this
evict user buffers from TT -> SYSTEM to ensure we take a snapshot of the
raw CCS state when entering hibernation, where upon resuming we can
restore the raw CCS state back when next validating the buffer. This has
been confirmed to fix display corruption on LNL when coming back from
hibernation.
Fixes:
cbdc52c11c9b ("drm/xe/xe2: Support flat ccs")
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3409
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241112162827.116523-2-matthew.auld@intel.com
(cherry picked from commit
c8b3c6db941299d7cc31bd9befed3518fdebaf68)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>