Ming Lei [Mon, 9 Oct 2023 09:33:16 +0000 (17:33 +0800)]
ublk: don't get ublk device reference in ublk_abort_queue()
ublk_abort_queue() is called in ublk_daemon_monitor_work(), in which
it is guaranteed that the device is live because monitor work is
canceled when removing device, so no need to get the device reference.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20231009093324.957829-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Mike Christie [Thu, 12 Oct 2023 15:06:00 +0000 (10:06 -0500)]
ublk: Make ublks_max configurable
We are converting tcmu applications to ublk, but have systems with up
to 1k devices. This patch allows us to configure the ublks_max from
userspace with the ublks_max modparam.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20231012150600.6198-3-michael.christie@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Mike Christie [Thu, 12 Oct 2023 15:05:59 +0000 (10:05 -0500)]
ublk: Limit dev_id/ub_number values
The dev_id/ub_number is used for the ublk dev's char device's minor
number so it has to fit into MINORMASK. This patch adds checks to prevent
userspace from passing a number that's too large and limits what can be
allocated by the ublk_index_idr for the case where userspace has the
kernel allocate the dev_id/ub_number.
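Illustration (not part of the commit): a minimal userspace sketch of the range check described above. The MINORBITS/MINORMASK values mirror include/linux/kdev_t.h. The helper name and the "auto" sentinel are assumptions made for this example and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors MINORBITS/MINORMASK from include/linux/kdev_t.h. */
#define EX_MINORBITS 20
#define EX_MINORMASK ((1U << EX_MINORBITS) - 1)

/* Assumption: an all-ones id means "let the kernel allocate the id". */
#define EX_DEV_ID_AUTO UINT32_MAX

/*
 * Hypothetical helper: returns true when a dev_id passed by userspace
 * can be used as a char device minor number.
 */
bool ex_dev_id_is_valid(uint32_t dev_id)
{
	if (dev_id == EX_DEV_ID_AUTO)
		return true;	/* the allocator picks an id that fits */
	return dev_id <= EX_MINORMASK;
}
```

In the kernel-allocated case, the same upper bound would be passed to the id allocator so that it never hands out an id above the mask.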
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20231012150600.6198-2-michael.christie@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 17 Oct 2023 14:26:38 +0000 (08:26 -0600)]
Merge branch 'for-6.7/io_uring' into for-6.7/block
Merge in io_uring fixes, as the ublk patches simplifying cancelations and
aborts depend on the two patches from Ming adding cancelation support
for uring_cmd.
* for-6.7/io_uring:
io_uring/kbuf: Use slab for struct io_buffer objects
io_uring/kbuf: Allow the full buffer id space for provided buffers
io_uring/kbuf: Fix check of BID wrapping in provided buffers
io_uring/rsrc: cleanup io_pin_pages()
io_uring: cancelable uring_cmd
io_uring: retain top 8bits of uring_cmd flags for kernel internal use
io_uring: add IORING_OP_WAITID support
exit: add internal include file with helpers
exit: add kernel_waitid_prepare() helper
exit: move core of do_wait() into helper
exit: abstract out should_wake helper for child_wait_callback()
io_uring/rw: add support for IORING_OP_READ_MULTISHOT
io_uring/rw: mark readv/writev as vectored in the opcode definition
io_uring/rw: split io_read() into a helper
Jens Axboe [Thu, 12 Oct 2023 17:35:58 +0000 (11:35 -0600)]
Merge tag 'md-next-20231012' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.7/block
Pull MD changes from Song:
"1. Rewrite mddev_suspend(), by Yu Kuai;
2. Simplify md_seq_ops, by Yu Kuai;
3. Reduce unnecessary locking array_state_store(), by Mariusz Tkaczyk."
* tag 'md-next-20231012' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: (23 commits)
md: rename __mddev_suspend/resume() back to mddev_suspend/resume()
md: remove old apis to suspend the array
md: suspend array in md_start_sync() if array need reconfiguration
md/raid5: replace suspend with quiesce() callback
md/md-linear: cleanup linear_add()
md: cleanup mddev_create/destroy_serial_pool()
md: use new apis to suspend array before mddev_create/destroy_serial_pool
md: use new apis to suspend array for ioctls involed array reconfiguration
md: use new apis to suspend array for adding/removing rdev from state_store()
md: use new apis to suspend array for sysfs apis
md/raid5: use new apis to suspend array
md/raid5-cache: use new apis to suspend array
md/md-bitmap: use new apis to suspend array for location_store()
md/dm-raid: use new apis to suspend array
md: add new helpers to suspend/resume and lock/unlock array
md: add new helpers to suspend/resume array
md: replace is_md_suspended() with 'mddev->suspended' in md_check_recovery()
md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log'
md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi'
md/raid1: don't split discard io for write behind
...
Song Liu [Wed, 11 Oct 2023 02:23:32 +0000 (19:23 -0700)]
Merge branch 'md-suspend-rewrite' into md-next
From Yu Kuai, written by Song Liu
Recent tests with raid10 revealed many issues with the following scenarios:
- add or remove disks to the array
- issue io to the array
At first, we fixed each problem independently, accepting that io can run
concurrently with array reconfiguration. However, with more issues being
reported continuously, I am hoping to fix these problems thoroughly.
Refer to how the block layer protects io against queue reconfiguration (for
example, changing the elevator):
blk_mq_freeze_queue
-> wait for all io to be done, and prevent new io to be dispatched
// reconfiguration
blk_mq_unfreeze_queue
I think we can do something similar to synchronize io with array
reconfiguration.
Current synchronization works as the following. For the reconfiguration
operation:
1. Hold 'reconfig_mutex';
2. Check that rdev can be added/removed, one condition is that there is no
IO (for example, check nr_pending).
3. Do the actual operations to add/remove a rdev, one procedure is
set/clear a pointer to rdev.
4. Check if there is still no IO on this rdev, if not, revert the
change.
IO path uses rcu_read_lock/unlock() to access rdev. This has several problems:
- rcu is used wrongly;
- There are lots of places where an old rdev can be read, however,
many places don't handle the old value correctly;
- Between step 3 and 4, if new io is dispatched, NULL will be read for
the rdev, and data will be lost if step 4 failed.
The new synchronization is similar to blk_mq_freeze_queue(). To add or
remove disk:
1. Suspend the array, that is, stop new IO from being dispatched
and wait for inflight IO to finish.
2. Add or remove rdevs to array;
3. Resume the array;
IO path doesn't need to change for now, and all rcu implementation can
be removed.
The main work is divided into 3 steps:
First, make sure the new apis to suspend the array are general:
- make sure suspending the array will wait for io to be done (done by [1]);
- make sure suspending the array can be called for all personalities (done by [2]);
- make sure suspending the array can be called at any time (done by [3]);
- make sure suspending the array doesn't rely on 'reconfig_mutex' (PATCH 3-5);
Second, replace old apis with new apis (PATCH 6-16). Specifically, the
synchronization is changed from:
lock reconfig_mutex
suspend array
make changes
resume array
unlock reconfig_mutex
to:
suspend array
lock reconfig_mutex
make changes
unlock reconfig_mutex
resume array
Finally, for the remaining paths that involve reconfiguration, suspend the
array first (PATCH 11, 12, [4] and PATCH 17).
Preparatory work:
[1] https://lore.kernel.org/all/20230621165110.1498313-1-yukuai1@huaweicloud.com/
[2] https://lore.kernel.org/all/20230628012931.88911-2-yukuai1@huaweicloud.com/
[3] https://lore.kernel.org/all/20230825030956.1527023-1-yukuai1@huaweicloud.com/
[4] https://lore.kernel.org/all/20230825031622.1530464-1-yukuai1@huaweicloud.com/
* md-suspend-rewrite:
md: rename __mddev_suspend/resume() back to mddev_suspend/resume()
md: remove old apis to suspend the array
md: suspend array in md_start_sync() if array need reconfiguration
md/raid5: replace suspend with quiesce() callback
md/md-linear: cleanup linear_add()
md: cleanup mddev_create/destroy_serial_pool()
md: use new apis to suspend array before mddev_create/destroy_serial_pool
md: use new apis to suspend array for ioctls involed array reconfiguration
md: use new apis to suspend array for adding/removing rdev from state_store()
md: use new apis to suspend array for sysfs apis
md/raid5: use new apis to suspend array
md/raid5-cache: use new apis to suspend array
md/md-bitmap: use new apis to suspend array for location_store()
md/dm-raid: use new apis to suspend array
md: add new helpers to suspend/resume and lock/unlock array
md: add new helpers to suspend/resume array
md: replace is_md_suspended() with 'mddev->suspended' in md_check_recovery()
md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log'
md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi'
Yu Kuai [Tue, 10 Oct 2023 15:19:58 +0000 (23:19 +0800)]
md: rename __mddev_suspend/resume() back to mddev_suspend/resume()
Now that the old apis are removed, __mddev_suspend/resume() can be
renamed to their original names.
This is done by:
sed -i "s/__mddev_suspend/mddev_suspend/g" *.[ch]
sed -i "s/__mddev_resume/mddev_resume/g" *.[ch]
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-20-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:57 +0000 (23:19 +0800)]
md: remove old apis to suspend the array
Now that mddev_suspend() and mddev_resume() are not used anywhere, remove
them, and remove 'MD_ALLOW_SB_UPDATE' and 'MD_UPDATING_SB' as well.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-19-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:56 +0000 (23:19 +0800)]
md: suspend array in md_start_sync() if array need reconfiguration
So that io won't run concurrently with array reconfiguration. It's safe to
suspend the array directly because normal io won't rely on
md_start_sync().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-18-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:55 +0000 (23:19 +0800)]
md/raid5: replace suspend with quiesce() callback
raid5 is the only personality that suspends the array in the check_reshape()
and start_reshape() callbacks. Both suspend and the quiesce() callback wait
for all normal io to be done and prevent new io from being dispatched; the
difference is that suspend is implemented in the common layer, while the
quiesce() callback is implemented in raid5.
In order to clean up all the usage of mddev_suspend(), the new api
__mddev_suspend() needs to be called before 'reconfig_mutex' is held,
and it's not good to affect all the personalities in the common layer just
for raid5. Hence replace suspend with the quiesce() callback, to prepare to
remove all the users of mddev_suspend().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-17-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:54 +0000 (23:19 +0800)]
md/md-linear: cleanup linear_add()
Now that the caller already suspends the array, there is no need to suspend
the array in linear_add().
Note that mddev_suspend/resume() is not used anymore.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-16-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:53 +0000 (23:19 +0800)]
md: cleanup mddev_create/destroy_serial_pool()
Now that all the callers, except for stopping the array, already suspend
the array, there is no need to suspend anymore, hence remove the second
parameter.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-15-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:52 +0000 (23:19 +0800)]
md: use new apis to suspend array before mddev_create/destroy_serial_pool
mddev_create/destroy_serial_pool() will be called from several places
where mddev_suspend() will be called later.
Prepare to remove the mddev_suspend() from
mddev_create/destroy_serial_pool().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-14-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:51 +0000 (23:19 +0800)]
md: use new apis to suspend array for ioctls involed array reconfiguration
'reconfig_mutex' will be grabbed before these ioctls; suspend the array
before holding the lock, so that io won't run concurrently with array
reconfiguration through ioctls.
This is not a hot path, so performance is not a concern.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-13-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:50 +0000 (23:19 +0800)]
md: use new apis to suspend array for adding/removing rdev from state_store()
Users can write 'remove' and 're-add' to trigger array reconfiguration
through sysfs; suspend the array in this case so that io won't run
concurrently with array reconfiguration.
And now that all the callers of add_bound_rdev() already suspend the
array, remove mddev_suspend/resume() from add_bound_rdev() as well.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-12-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:49 +0000 (23:19 +0800)]
md: use new apis to suspend array for sysfs apis
Convert to use the new apis in the following sysfs apis:
- level_store
- suspend_lo_store
- suspend_hi_store
- serialize_policy_store
These are not hot paths, so performance is not a concern.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-11-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:48 +0000 (23:19 +0800)]
md/raid5: use new apis to suspend array
Convert to use the new apis; the old apis will be removed eventually.
These are not hot paths, so performance is not a concern.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-10-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:47 +0000 (23:19 +0800)]
md/raid5-cache: use new apis to suspend array
Convert to use the new apis; the old apis will be removed eventually.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-9-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:46 +0000 (23:19 +0800)]
md/md-bitmap: use new apis to suspend array for location_store()
Convert to use the new apis; the old apis will be removed eventually.
This is not a hot path, so performance is not a concern.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-8-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:45 +0000 (23:19 +0800)]
md/dm-raid: use new apis to suspend array
Convert to use the new apis; the old apis will be removed eventually.
These are not hot paths, so performance is not a concern.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-7-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:44 +0000 (23:19 +0800)]
md: add new helpers to suspend/resume and lock/unlock array
The new helpers suspend the array first and then lock the array.
Prepare to refactor from:
mddev_lock/lock_nointr
mddev_suspend
...
mddev_resume
mddev_unlock
to:
mddev_suspend_and_lock/lock_nointr
...
mddev_unlock_and_resume
After all the use cases are refactored, mddev_suspend/resume() will be
removed.
And mddev_suspend_and_lock() will also replace mddev_lock() for the case
that the array will be reconfigured, in order to synchronize with io to
prevent problems in many corner cases.
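Illustration (not part of the commit): a userspace model of the "suspend first, then lock" ordering and its unwinding on failure. The struct and all ex_* names are hypothetical stand-ins for this sketch; they are not the kernel's struct mddev or its helpers, and the real helpers may differ in detail.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the array state; not the kernel's struct mddev. */
struct ex_array {
	int suspended;		/* suspend nesting depth */
	bool locked;		/* models 'reconfig_mutex' being held */
	bool fail_lock;		/* knob: make the lock step fail (e.g. interrupted) */
};

int ex_suspend(struct ex_array *a)
{
	a->suspended++;		/* stop new io, wait for inflight io */
	return 0;
}

void ex_resume(struct ex_array *a)
{
	a->suspended--;
}

int ex_lock(struct ex_array *a)
{
	if (a->fail_lock || a->locked)
		return -1;
	a->locked = true;
	return 0;
}

void ex_unlock(struct ex_array *a)
{
	a->locked = false;
}

/* Suspend first, then lock; undo the suspend if locking fails. */
int ex_suspend_and_lock(struct ex_array *a)
{
	int ret = ex_suspend(a);

	if (ret)
		return ret;
	ret = ex_lock(a);
	if (ret)
		ex_resume(a);
	return ret;
}

/* Release in the reverse order: unlock, then resume. */
void ex_unlock_and_resume(struct ex_array *a)
{
	ex_unlock(a);
	ex_resume(a);
}
```

The point of the ordering is that the lock is only ever held while io is already stopped, and a failed lock attempt leaves the array resumed.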
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-6-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:43 +0000 (23:19 +0800)]
md: add new helpers to suspend/resume array
Advantages for new apis:
- reconfig_mutex is not required;
- the weird logic that suspending the array holds 'reconfig_mutex' for
md_check_recovery() to update the superblock is not needed;
- the special handling, 'pers->prepare_suspend', for raid456 is not
needed;
- It's safe to be called at any time once mddev is allocated, and it's
designed to be used from slow paths where array configuration is changed;
- the new helpers are designed to be called before mddev_lock(), hence
they can be interrupted by the user as well.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-5-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:42 +0000 (23:19 +0800)]
md: replace is_md_suspended() with 'mddev->suspended' in md_check_recovery()
Prepare to cleanup pers->prepare_suspend(), which is used to fix a
deadlock in raid456 by returning error for io that is waiting for
reshape to make progress in mddev_suspend().
This change will allow reshape to make progress while waiting for io to
be done in mddev_suspend() in following patches.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-4-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:41 +0000 (23:19 +0800)]
md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log'
'conf->log' is set with 'reconfig_mutex' grabbed, however, readers are
not protected, hence protect it with READ_ONCE/WRITE_ONCE to prevent
reading abnormal values.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-3-yukuai1@huaweicloud.com
Yu Kuai [Tue, 10 Oct 2023 15:19:40 +0000 (23:19 +0800)]
md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi'
Protect 'suspend_lo' and 'suspend_hi' with READ_ONCE/WRITE_ONCE to prevent
reading abnormal values.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231010151958.145896-2-yukuai1@huaweicloud.com
Yu Kuai [Sat, 7 Oct 2023 11:21:05 +0000 (19:21 +0800)]
md/raid1: don't split discard io for write behind
Currently, discard io is treated the same as normal write io, and for
the write behind case, io size is limited to:
BIO_MAX_VECS * (PAGE_SIZE >> 9)
For 0.5KB sector size and 4KB PAGE_SIZE, this is just 1MB. As a
consequence, if 'WriteMostly' is set on one of the underlying disks,
then discard io will be split into 1MB pieces and it will take a long time
for the discard to finish.
Fix this problem by disabling write behind for discard io.
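Illustration (not part of the commit): the arithmetic behind the 1MB limit, as a small sketch. It assumes BIO_MAX_VECS is 256 and PAGE_SIZE is 4096; the ex_* names are made up for this example.

```c
#include <stdint.h>

#define EX_SECTOR_SHIFT	9	/* 512-byte sectors */
#define EX_BIO_MAX_VECS	256u	/* assumption: value of BIO_MAX_VECS */
#define EX_PAGE_SIZE	4096u	/* assumption: 4KB pages */

/* Write-behind io size limit, in 512-byte sectors. */
uint64_t ex_write_behind_max_sectors(void)
{
	return (uint64_t)EX_BIO_MAX_VECS * (EX_PAGE_SIZE >> EX_SECTOR_SHIFT);
}

/* The same limit in bytes. */
uint64_t ex_write_behind_max_bytes(void)
{
	return ex_write_behind_max_sectors() << EX_SECTOR_SHIFT;
}

/* Number of pieces a discard of 'bytes' is split into under that limit. */
uint64_t ex_discard_split_count(uint64_t bytes)
{
	uint64_t max = ex_write_behind_max_bytes();

	return (bytes + max - 1) / max;
}
```

Under these assumptions the limit is 2048 sectors, and a 1TiB discard would be split into roughly a million pieces, which is why the discard takes so long.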
Reported-by: Roman Mamedov <rm@romanrm.net>
Closes: https://lore.kernel.org/all/6a1165f7-c792-c054-b8f0-1ad4f7b8ae01@ultracoder.org/
Reported-and-tested-by: Kirill Kirilenko <kirill@ultracoder.org>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231007112105.407449-1-yukuai1@huaweicloud.com
Gabriel Krisman Bertazi [Thu, 5 Oct 2023 00:05:31 +0000 (20:05 -0400)]
io_uring/kbuf: Use slab for struct io_buffer objects
The allocation of struct io_buffer for metadata of provided buffers is
done through a custom allocator that directly gets pages and
fragments them. But, slab would do just fine, as this is not a hot path
(in fact, it is a deprecated feature) and, by keeping a custom allocator
implementation we lose benefits like tracking, poisoning,
sanitizers. Finally, the custom code is more complex and requires
keeping the list of pages in struct ctx for no good reason. This patch
cleans this path up and just uses slab.
I microbenchmarked it by forcing the allocation of a large number of
objects with the least number of io_uring commands possible (keeping
nbufs=USHRT_MAX), with and without the patch. There is a slight
increase in time spent in the allocation with slab, of course, but even
when allocating to system resources exhaustion, which is not very
realistic and happened around 1/2 billion provided buffers for me, it
wasn't a significant hit in system time. Especially if we think of a
real-world scenario, an application doing register/unregister of
provided buffers will hit ctx->io_buffers_cache more often than actually
going to slab.
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20231005000531.30800-4-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Gabriel Krisman Bertazi [Thu, 5 Oct 2023 00:05:30 +0000 (20:05 -0400)]
io_uring/kbuf: Allow the full buffer id space for provided buffers
nbufs tracks the number of buffers and not the last bgid. In 16-bit, we
have 2^16 valid buffers, but the check mistakenly rejects the last
bid. Let's fix it to make the interface consistent with the
documentation.
Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20231005000531.30800-3-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Gabriel Krisman Bertazi [Thu, 5 Oct 2023 00:05:29 +0000 (20:05 -0400)]
io_uring/kbuf: Fix check of BID wrapping in provided buffers
Commit 3851d25c75ed0 ("io_uring: check for rollover of buffer ID when
providing buffers") introduced a check to prevent wrapping the BID
counter when sqe->off is provided, but it is too restrictive by one,
rejecting the last possible BID (65534).
For example, the following fails with -EINVAL:
io_uring_prep_provide_buffers(sqe, addr, size, 0xFFFF, 0, 0);
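Illustration (not part of the commit): a sketch of the off-by-one, modeling the check as a comparison of the starting BID plus the buffer count against the 16-bit maximum. The function names and the exact form of the comparison are assumptions for this example; both inputs are assumed to already fit in 16 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the check before the fix: rejects when the sum reaches the max. */
bool ex_bid_range_ok_old(uint32_t bid, uint32_t nbufs)
{
	return bid + nbufs < UINT16_MAX;
}

/* Model of the check after the fix: only rejects when the sum exceeds it. */
bool ex_bid_range_ok_new(uint32_t bid, uint32_t nbufs)
{
	return bid + nbufs <= UINT16_MAX;
}
```

With a starting BID of 0 and 0xFFFF buffers, the highest BID used is 65534; the old form of the check rejects that request while the new form accepts it.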
Fixes: 3851d25c75ed ("io_uring: check for rollover of buffer ID when providing buffers")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20231005000531.30800-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Höppner [Fri, 15 Sep 2023 13:10:01 +0000 (15:10 +0200)]
partitions/ibm: Introduce defines for magic string length values
The length values for volume label type and volume label id are
hard-coded in several places. Provide defines for those values and
replace all occurrences accordingly.
Note that the length is defined and used, and not the size since the
volume label type string and volume label id string are not
nul-terminated.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20230915131001.697070-4-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Höppner [Fri, 15 Sep 2023 13:10:00 +0000 (15:10 +0200)]
partitions/ibm: Replace strncpy() and improve readability
strncpy() is deprecated and needs to be replaced. The volume label
information strings are not nul-terminated and strncpy() can simply be
replaced with memcpy().
To enhance the readability of find_label() alongside this change, the
following improvements are made:
- Introduce the array dasd_vollabels[] containing all information
necessary for the label detection.
- Provide a helper function to obtain an index value corresponding to a
volume label type. This allows the use of a switch statement to reduce
indentation levels.
- The 'temp' variable is used to check against valid volume label types.
In the good case, this variable already contains the volume label type
making it unnecessary to copy the information again from e.g.
label->vol.vollbl. Remove the 'temp' variable and the second copy as
all information is already provided.
- Remove the 'found' variable and replace it with early returns
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20230915131001.697070-3-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Höppner [Fri, 15 Sep 2023 13:09:59 +0000 (15:09 +0200)]
partitions/ibm: Remove unnecessary memset
The data holding the volume label information is zeroed in case no valid
volume label was found. Since the label information isn't used in that
case, zeroing the data doesn't provide any value whatsoever.
Remove the unnecessary memset() call accordingly.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20230915131001.697070-2-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Justin Stitt [Tue, 19 Sep 2023 05:27:45 +0000 (05:27 +0000)]
aoe: replace strncpy with strscpy
`strncpy` is deprecated for use on NUL-terminated destination strings [1].
`aoe_iflist` is expected to be NUL-terminated which is evident by its
use with string apis later on like `strspn`:
| p = aoe_iflist + strspn(aoe_iflist, WHITESPACE);
It also seems `aoe_iflist` does not need to be NUL-padded which means
`strscpy` [2] is a suitable replacement due to the fact that it
guarantees NUL-termination on the destination buffer while not
unnecessarily NUL-padding.
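Illustration (not part of the commit): a userspace sketch of the semantics described above, namely a bounded copy that always NUL-terminates the destination, does not pad the rest of the buffer, and reports truncation. ex_strscpy and EX_E2BIG are hypothetical names for this example; this is not the kernel's implementation.

```c
#include <stddef.h>

#define EX_E2BIG 7	/* stand-in error code for "source did not fit" */

/*
 * Copy src into dst (of 'size' bytes), always NUL-terminating dst when
 * size > 0. Returns the number of characters copied, or -EX_E2BIG if
 * src was truncated. Bytes after the terminator are left untouched.
 */
long ex_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -EX_E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* If we stopped before the end of src, the copy was truncated. */
	return src[i] == '\0' ? (long)i : -EX_E2BIG;
}
```

By contrast, strncpy() leaves the destination unterminated when the source fills the buffer, and zero-fills the remainder when it does not.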
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Cc: Kees Cook <keescook@chromium.org>
Cc: Xu Panda <xu.panda@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230919-strncpy-drivers-block-aoe-aoenet-c-v2-1-3d5d158410e9@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Justin Stitt [Tue, 19 Sep 2023 05:30:35 +0000 (05:30 +0000)]
null_blk: replace strncpy with strscpy
`strncpy` is deprecated for use on NUL-terminated destination strings [1].
We should favor a more robust and less ambiguous interface.
We expect that both `nullb->disk_name` and `disk->disk_name` be
NUL-terminated:
| snprintf(nullb->disk_name, sizeof(nullb->disk_name),
| "%s", config_item_name(&dev->group.cg_item));
...
| pr_info("disk %s created\n", nullb->disk_name);
It seems like NUL-padding may be required due to __assign_disk_name()
utilizing a memcpy as opposed to a `str*cpy` api.
| static inline void __assign_disk_name(char *name, struct gendisk *disk)
| {
| if (disk)
| memcpy(name, disk->disk_name, DISK_NAME_LEN);
| else
| memset(name, 0, DISK_NAME_LEN);
| }
Then we go and print it with `__print_disk_name` which wraps `nullb_trace_disk_name()`.
| #define __print_disk_name(name) nullb_trace_disk_name(p, name)
This function obviously expects a NUL-terminated string.
| const char *nullb_trace_disk_name(struct trace_seq *p, char *name)
| {
| const char *ret = trace_seq_buffer_ptr(p);
|
| if (name && *name)
| trace_seq_printf(p, "disk=%s, ", name);
| trace_seq_putc(p, 0);
|
| return ret;
| }
From the above, we need both 1) a NUL-terminated string and 2) a
NUL-padded string. So, let's use strscpy_pad() as per Kees' suggestion
from v1.
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Cc: Kees Cook <keescook@chromium.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230919-strncpy-drivers-block-null_blk-main-c-v3-1-10cf0a87a2c3@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Joel Granados [Mon, 2 Oct 2023 22:02:03 +0000 (23:02 +0100)]
cdrom: Remove now superfluous sentinel element from ctl_table array
This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels), which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information:
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/).
Remove the sentinel element from cdrom_table.
Signed-off-by: Joel Granados <j.granados@samsung.com>
Link: https://lore.kernel.org/lkml/20231002-jag-sysctl_remove_empty_elem_drivers-v2-1-02dd0d46f71e@samsung.com
Reviewed-by: Phillip Potter <phil@philpotter.co.uk>
Link: https://lore.kernel.org/lkml/20231002214528.15529-1-phil@philpotter.co.uk
Signed-off-by: Phillip Potter <phil@philpotter.co.uk>
Link: https://lore.kernel.org/r/20231002220203.15637-2-phil@philpotter.co.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 3 Oct 2023 00:25:23 +0000 (18:25 -0600)]
io_uring/rsrc: cleanup io_pin_pages()
This function is overly convoluted with a goto error path, and has checks
under the mmap_read_lock() that don't need to be there at all. Rearrange it
a bit so the checks and errors fall out naturally, rather than needing
to jump around for it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 29 Sep 2023 05:58:34 +0000 (23:58 -0600)]
Merge tag 'md-next-20230927' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.7/block
Pull MD updates from Song:
"1. Make rdev add/remove independent from daemon thread, by Yu Kuai;
2. Refactor code around quiesce() and mddev_suspend(), by Yu Kuai."
* tag 'md-next-20230927' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md: replace deprecated strncpy with memcpy
md/md-linear: Annotate struct linear_conf with __counted_by
md: don't check 'mddev->pers' and 'pers->quiesce' from suspend_lo_store()
md: don't check 'mddev->pers' from suspend_hi_store()
md-bitmap: suspend array earlier in location_store()
md-bitmap: remove the checking of 'pers->quiesce' from location_store()
md: don't rely on 'mddev->pers' to be set in mddev_suspend()
md: initialize 'writes_pending' while allocating mddev
md: initialize 'active_io' while allocating mddev
md: delay remove_and_add_spares() for read only array to md_start_sync()
md: factor out a helper rdev_addable() from remove_and_add_spares()
md: factor out a helper rdev_is_spare() from remove_and_add_spares()
md: factor out a helper rdev_removeable() from remove_and_add_spares()
md: delay choosing sync action to md_start_sync()
md: factor out a helper to choose sync action from md_check_recovery()
md: use separate work_struct for md_start_sync()
Mariusz Tkaczyk [Thu, 28 Sep 2023 12:55:17 +0000 (14:55 +0200)]
md: do not require mddev_lock() for all options in array_state_store()
We don't need to lock the device to reject an unsupported request
in array_state_store(). No functional changes intended.
There are differences between ioctl and sysfs handling during stopping.
With this change, it will be easier to add additional steps which need
to be completed before mddev_lock() is taken.
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230928125517.12356-1-mariusz.tkaczyk@linux.intel.com
Ming Lei [Thu, 28 Sep 2023 12:43:25 +0000 (20:43 +0800)]
io_uring: cancelable uring_cmd
A uring_cmd may never complete. In ublk, for example, a uring_cmd isn't
completed until a new block request comes from the ublk block device.
Add cancelable uring_cmd to provide a mechanism for drivers to cancel
pending commands in their own way.
Add the io_uring_cmd_mark_cancelable() API for a driver to mark a command as
cancelable; io_uring will then cancel this command in
io_uring_cancel_generic(). The ->uring_cmd() callback is reused for canceling
the command in the driver's way, so the driver gets notified of the
cancelation by io_uring.
Add the io_uring_cmd_get_task() API to help the driver's cancel handler
deal with the canceling.
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Thu, 28 Sep 2023 12:43:24 +0000 (20:43 +0800)]
io_uring: retain top 8bits of uring_cmd flags for kernel internal use
Retain the top 8 bits of uring_cmd flags for kernel internal use, so that we
can move IORING_URING_CMD_POLLED out of the uapi header.
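Illustration (not part of the commit): a sketch of how a 32-bit flags word can be split so that the top 8 bits stay kernel-internal and userspace is rejected if it sets anything outside the user-visible mask. All EX_* names and the specific bit positions are assumptions for this example, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Top 8 bits of the 32-bit flags word are reserved for kernel use. */
#define EX_KERNEL_FLAG_MASK	0xff000000u

/* Hypothetical user-visible flag and the mask of all user-visible flags. */
#define EX_USER_FLAG_FIXED	(1u << 0)
#define EX_USER_FLAG_MASK	EX_USER_FLAG_FIXED

/* Hypothetical kernel-internal flag living in the reserved range. */
#define EX_KERNEL_FLAG_POLLED	(1u << 31)

/* Flags coming from userspace may only contain user-visible bits. */
bool ex_user_flags_valid(uint32_t flags)
{
	return (flags & ~EX_USER_FLAG_MASK) == 0;
}
```

Because the internal flag sits in the reserved range, it no longer needs to appear in a header shared with userspace, and a userspace attempt to set it is rejected by the mask check.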
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Yu Kuai [Wed, 27 Sep 2023 06:12:41 +0000 (14:12 +0800)]
md: simplify md_seq_ops
Before this patch, the implementation is hacky and hard to understand:
1) md_seq_start sets pos to 1;
2) md_seq_show finds pos is 1, then prints Personalities;
3) md_seq_next finds pos is 1, then updates pos to the first mddev;
4) md_seq_show finds pos is not 1 or 2, shows mddev;
5) md_seq_next finds pos is not 1 or 2, updates pos to the next mddev;
6) loop 4-5 until the last mddev, then md_seq_next updates pos to 2;
7) md_seq_show finds pos is 2, then prints unused devices;
8) md_seq_next finds pos is 2, stops;
This patch removes the magic values and uses seq_list_start/next/stop()
directly, and moves printing "Personalities" to md_seq_start() and
"unused devices" to md_seq_stop():
1) md_seq_start prints Personalities, and then sets pos to the first mddev;
2) md_seq_show shows mddev;
3) md_seq_next updates pos to the next mddev;
4) loop 2-3 until the last mddev;
5) md_seq_stop prints unused devices;
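The simplified flow can be modelled in userspace roughly as follows. This
is only a sketch: struct model_dev, the model_*() functions and the output
strings are hypothetical stand-ins, not the seq_file API or the md code,
and the model is always started from position 0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an mddev on the global list. */
struct model_dev {
	const char *name;
	struct model_dev *next;
};

/* Append s to the NUL-terminated buffer out of capacity len. */
static void emit(char *out, size_t len, const char *s)
{
	size_t used = strlen(out);

	if (used < len)
		snprintf(out + used, len - used, "%s", s);
}

/* start: print the header, then position on the first device. */
struct model_dev *model_start(char *out, size_t len, struct model_dev *head)
{
	emit(out, len, "Personalities\n");
	return head;
}

/* show: print one device. */
void model_show(char *out, size_t len, const struct model_dev *dev)
{
	emit(out, len, dev->name);
	emit(out, len, "\n");
}

/* next: advance to the following device, NULL at the end. */
struct model_dev *model_next(struct model_dev *dev)
{
	return dev->next;
}

/* stop: print the footer. */
void model_stop(char *out, size_t len)
{
	emit(out, len, "unused devices\n");
}

/* Drive the four callbacks the way a sequential reader would. */
void model_walk(char *out, size_t len, struct model_dev *head)
{
	struct model_dev *pos;

	assert(out != NULL && len > 0);
	out[0] = '\0';
	pos = model_start(out, len, head);
	while (pos) {
		model_show(out, len, pos);
		pos = model_next(pos);
	}
	model_stop(out, len);
}
```

No magic position values are needed: the header and footer hang off start
and stop, and show/next only ever see real list entries.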
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230927061241.1552837-3-yukuai1@huaweicloud.com
Yu Kuai [Wed, 27 Sep 2023 06:12:40 +0000 (14:12 +0800)]
md: factor out a helper from mddev_put()
There are no functional changes; this prepares for simplifying md_seq_ops
in the next patch.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230927061241.1552837-2-yukuai1@huaweicloud.com
Coly Li [Fri, 11 Aug 2023 17:05:12 +0000 (01:05 +0800)]
badblocks: switch to the improved badblock handling code
This patch removes the old code of badblocks_set(), badblocks_clear() and
badblocks_check(), and makes them wrappers that call _badblocks_set(),
_badblocks_clear() and _badblocks_check().
With this change the badblocks handling switches to the improved algorithm
in _badblocks_set(), _badblocks_clear() and _badblocks_check().
This patch only contains the deletion of old code; the newly added
code for the improved algorithms is in the previous patches.
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Geliang Tang <geliang.tang@suse.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: NeilBrown <neilb@suse.de>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Cc: Xiao Ni <xni@redhat.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Acked-by: Geliang Tang <geliang.tang@suse.com>
Link: https://lore.kernel.org/r/20230811170513.2300-7-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Coly Li [Fri, 11 Aug 2023 17:05:11 +0000 (01:05 +0800)]
badblocks: improve badblocks_check() for multiple ranges handling
This patch rewrites badblocks_check() in a coding style similar to
_badblocks_set() and _badblocks_clear(). The only difference is that bad
block checking may now handle multiple ranges in the bad table.
If a checking range covers multiple bad block ranges in the bad block table,
as in the following condition (C is the checking range, E1, E2, E3 are
three bad block ranges in the bad block table),
+------------------------------------+
|                 C                  |
+------------------------------------+
   +----+      +----+      +----+
   | E1 |      | E2 |      | E3 |
   +----+      +----+      +----+
The improved badblocks_check() algorithm will divide checking range C
into multiple parts, and handle them in 7 runs of a while-loop,
+--+ +----+ +----+ +----+ +----+ +----+ +----+
|C1| | C2 | | C3 | | C4 | | C5 | | C6 | | C7 |
+--+ +----+ +----+ +----+ +----+ +----+ +----+
     +----+        +----+        +----+
     | E1 |        | E2 |        | E3 |
     +----+        +----+        +----+
And the start LBA and length of range E1 will be set as first_bad and
bad_sectors for the caller.
The return value rule is consistent for multiple ranges. For example if
there are following bad block ranges in bad block table,
Index No.   Start   Len   Ack
    0        400     20    1
    1        500     50    1
    2        650     20    0
the return value, first_bad and bad_sectors from calling badblocks_check()
with different checking ranges can be the following values,
Checking Start, Len   Return Value   first_bad   bad_sectors
     100, 100               0           N/A         N/A
     100, 310               1           400         10
     100, 440               1           400         10
     100, 540               1           400         10
     100, 600              -1           400         10
     100, 800              -1           400         10
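The return value rule can be modelled in userspace as below. This is a
simplified sketch, not the kernel's _badblocks_check(): struct bad_range
and model_check() are hypothetical, the table is assumed sorted and
non-overlapping, and bad_sectors is deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, unpacked form of one bad table entry. */
struct bad_range {
	unsigned long long start;
	unsigned long long len;
	int ack;
};

/*
 * Returns
 *    0  if no bad range overlaps [s, s + sectors)
 *    1  if every overlapping bad range is acknowledged
 *   -1  if at least one overlapping bad range is unacknowledged
 * On a non-zero return, *first_bad holds the start of the first
 * overlapping range.
 */
int model_check(const struct bad_range *tbl, size_t n,
		unsigned long long s, unsigned long long sectors,
		unsigned long long *first_bad)
{
	unsigned long long end = s + sectors;
	int rv = 0;
	size_t i;

	assert(first_bad != NULL);
	for (i = 0; i < n; i++) {
		unsigned long long bstart = tbl[i].start;
		unsigned long long bend = bstart + tbl[i].len;

		if (bend <= s || bstart >= end)
			continue;		/* no overlap with the checking range */
		if (rv == 0)
			*first_bad = bstart;	/* first overlapping range wins */
		if (!tbl[i].ack)
			rv = -1;		/* unacknowledged dominates */
		else if (rv == 0)
			rv = 1;
	}
	return rv;
}
```

Run against the three-entry table above, the model gives the same return
values and first_bad as the listed results for the same checking ranges.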
In order to make code review easier, this patch names the improved bad
block range checking routine _badblocks_check() and does not change the
existing badblocks_check() code yet. A later patch will delete the old code of
badblocks_check() and make it a wrapper that calls _badblocks_check().
Then the newly added code won't be mixed up with the deleted old code, which
is clearer and easier for code review.
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Geliang Tang <geliang.tang@suse.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: NeilBrown <neilb@suse.de>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Cc: Xiao Ni <xni@redhat.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Acked-by: Geliang Tang <geliang.tang@suse.com>
Link: https://lore.kernel.org/r/20230811170513.2300-6-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Coly Li [Fri, 11 Aug 2023 17:05:10 +0000 (01:05 +0800)]
badblocks: improve badblocks_clear() for multiple ranges handling
With the fundamental ideas and helper routines from the badblocks_set()
improvement, clearing bad blocks for multiple ranges is much simpler.
With a similar idea to the badblocks_set() improvement, this patch
simplifies bad block range clearing into 5 situations. No matter how
complicated the clearing condition is, we just look at the head part
of the clearing range together with the relevant already-set bad block
range from the bad block table. The remaining part will be handled in the
next run of the while-loop.
Based on existing helpers added from badblocks_set(), this patch adds
two more helpers,
- front_clear()
Clear the bad block range from bad block table which is front
overlapped with the clearing range.
- front_splitting_clear()
Handle the condition that the clearing range hits middle of an
already set bad block range from bad block table.
Similar to badblocks_set(), the first part of the clearing range is handled
with the relevant bad block range, which is found by prev_badblocks(). In most
cases a valid hint is provided to prev_badblocks() to avoid unnecessary
bad block table iteration.
This patch also explains the detailed algorithm in code comments at the
beginning of badblocks.c, including which five simplified situations are
categorized and how all the bad block range clearing conditions are
handled by these five situations.
Again, in order to make the code review easier and avoid mixing the code
changes together, this patch does not modify badblocks_clear() and
implements another routine called _badblocks_clear() for the improvement.
A later patch will delete the current code of badblocks_clear() and make it
a wrapper of _badblocks_clear(), so the code change is much clearer for
review.
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Geliang Tang <geliang.tang@suse.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: NeilBrown <neilb@suse.de>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Cc: Xiao Ni <xni@redhat.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Acked-by: Geliang Tang <geliang.tang@suse.com>
Link: https://lore.kernel.org/r/20230811170513.2300-5-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Coly Li [Fri, 11 Aug 2023 17:05:09 +0000 (01:05 +0800)]
badblocks: improve badblocks_set() for multiple ranges handling
Recently I received a bug report that current badblocks code does not
properly handle multiple ranges. For example,
badblocks_set(bb, 32, 1, true);
badblocks_set(bb, 34, 1, true);
badblocks_set(bb, 36, 1, true);
badblocks_set(bb, 32, 12, true);
Then indeed badblocks_show() reports,
32 3
36 1
But the expected bad blocks table should be,
32 12
Obviously only the first 2 ranges are merged, and badblocks_set() returns
and ignores the rest of the setting range.
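The expected result can be reproduced with a small userspace model. It is
only a sketch under simplifying assumptions: struct model_table and
model_set() are hypothetical, acknowledgement state and the per-entry
length limit are ignored, and the merge is a plain interval union rather
than the algorithm this patch adds.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX 16	/* hypothetical table capacity */

struct model_table {
	unsigned long long start[MODEL_MAX];
	unsigned long long len[MODEL_MAX];
	size_t count;		/* entries are kept sorted by start */
};

/* Add [s, s + sectors) to the table; returns 0, or -1 if the table is full. */
int model_set(struct model_table *t, unsigned long long s,
	      unsigned long long sectors)
{
	unsigned long long e = s + sectors;
	size_t i = 0, j, k;

	assert(sectors > 0);
	/* Skip ranges that end strictly before the new range starts. */
	while (i < t->count && t->start[i] + t->len[i] < s)
		i++;
	/* Absorb every range that overlaps or touches [s, e). */
	for (j = i; j < t->count && t->start[j] <= e; j++) {
		if (t->start[j] < s)
			s = t->start[j];
		if (t->start[j] + t->len[j] > e)
			e = t->start[j] + t->len[j];
	}
	if (j == i) {
		/* Nothing absorbed: open a slot at index i. */
		if (t->count == MODEL_MAX)
			return -1;
		for (k = t->count; k > i; k--) {
			t->start[k] = t->start[k - 1];
			t->len[k] = t->len[k - 1];
		}
		t->count++;
	} else {
		/* Collapse the absorbed entries i..j-1 into slot i. */
		for (k = j; k < t->count; k++) {
			t->start[i + 1 + k - j] = t->start[k];
			t->len[i + 1 + k - j] = t->len[k];
		}
		t->count -= j - i - 1;
	}
	t->start[i] = s;
	t->len[i] = e - s;
	return 0;
}
```

Replaying the four calls from the bug report against an empty table leaves
a single entry with start 32 and length 12.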
This behavior is improper: if the caller of badblocks_set() wants to set
a range of blocks into the bad blocks table, all of the blocks in the range
should be handled even if an earlier part encounters a failure.
The desired way to set a bad blocks range by badblocks_set() is,
- Set as many blocks of the setting range as possible into the bad blocks
table.
- Merge the bad blocks ranges and occupy as few slots as possible in the bad
blocks table.
- Fast.
Indeed the above proposal is complicated, especially with the following
restrictions,
- The setting bad blocks range can be acknowledged or not acknowledged.
- The bad blocks table size is limited.
- Memory allocation should be avoided.
The basic idea of the patch is to categorize all possible bad blocks
range setting combinations into far fewer, simpler conditions with fewer
special cases. Inside badblocks_set() there is an implicit loop
composed by jumping between labels 're_insert' and 'update_sectors'. No
matter how large the setting bad blocks range is, in every loop just a
minimized range from the head is handled by a pre-defined behavior from
one of the categorized conditions. The logic is simple and the code flow is
manageable.
The different relative layout between the setting range and existing bad
block range are checked and handled (merge, combine, overwrite, insert)
by the helpers in previous patch. This patch is to make all the helpers
work together with the above idea.
This patch only has the algorithm improvement for badblocks_set(). The
following patches contain the improvements for badblocks_clear() and
badblocks_check(). But the algorithm in badblocks_set() is fundamental
and typical; the other improvements in the clear and check routines are
based on all the helpers and ideas in this patch.
In order to make the change clearer for code review, this patch
does not directly modify the existing badblocks_set(), and just adds a new
one named _badblocks_set(). A later patch will remove the existing
badblocks_set() code and make it a wrapper of _badblocks_set(). So
the newly added change won't be mixed with deleted code, and the code review
can be easier.
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Geliang Tang <geliang.tang@suse.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: NeilBrown <neilb@suse.de>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Cc: Wols Lists <antlists@youngman.org.uk>
Cc: Xiao Ni <xni@redhat.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Acked-by: Geliang Tang <geliang.tang@suse.com>
Link: https://lore.kernel.org/r/20230811170513.2300-4-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Coly Li [Fri, 11 Aug 2023 17:05:08 +0000 (01:05 +0800)]
badblocks: add helper routines for badblock ranges handling
This patch adds several helper routines to improve badblock ranges
handling. These helper routines will be used later in the improved
version of badblocks_set()/badblocks_clear()/badblocks_check().
- Helpers prev_by_hint() and prev_badblocks() are used to find the bad
range in the bad table that the searching range starts at or after.
- The following helpers are to decide the relative layout between the
manipulating range and existing bad block range from bad table.
- can_merge_behind()
Return 'true' if the manipulating range can backward merge with the
bad block range.
- can_merge_front()
Return 'true' if the manipulating range can forward merge with the
bad block range.
- can_combine_front()
Return 'true' if two adjacent bad block ranges before the
manipulating range can be merged.
- overlap_front()
Return 'true' if the manipulating range exactly overlaps with the
bad block range in front of its range.
- overlap_behind()
Return 'true' if the manipulating range exactly overlaps with the
bad block range behind its range.
- can_front_overwrite()
Return 'true' if the manipulating range can forward overwrite the
bad block range in front of its range.
- The following helpers are to add the manipulating range into the bad
block table. Different routine is called with the specific relative
layout between the manipulating range and other bad block range in the
bad block table.
- behind_merge()
Merge the manipulating range with the bad block range behind its
range, and return the number of merged length in unit of sector.
- front_merge()
Merge the manipulating range with the bad block range in front of
its range, and return the number of merged length in unit of sector.
- front_combine()
Combine the two adjacent bad block ranges before the manipulating
range into a larger one.
- front_overwrite()
Overwrite part or all of the bad block range which is in front of the
manipulating range. The overwrite may split the existing bad block range
and generate more bad block ranges in the bad block table.
- insert_at()
Insert the manipulating range at a specific location in the bad
block table.
All the above helpers are used in later patches to improve the bad block
ranges handling for badblocks_set()/badblocks_clear()/badblocks_check().
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Geliang Tang <geliang.tang@suse.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: NeilBrown <neilb@suse.de>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Cc: Xiao Ni <xni@redhat.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Acked-by: Geliang Tang <geliang.tang@suse.com>
Link: https://lore.kernel.org/r/20230811170513.2300-3-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Coly Li [Fri, 11 Aug 2023 17:05:07 +0000 (01:05 +0800)]
badblocks: add more helper structure and routines in badblocks.h
This patch adds the following helper structure and routines into
badblocks.h,
- struct badblocks_context
This structure is used in improved badblocks code for bad table
iteration.
- BB_END()
The macro to calculate end LBA of a bad range record from bad
table.
- badblocks_full() and badblocks_empty()
The inline routines to check whether bad table is full or empty.
- set_changed() and clear_changed()
The inline routines to set and clear 'changed' tag from struct
badblocks.
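For illustration, the helpers can be sketched in userspace as below. The
packed record layout used here (length minus one in bits 0..8, start
offset in bits 9..62, acknowledged flag in bit 63) and the table capacity
of 512 are assumptions of this sketch and should be checked against
include/linux/badblocks.h; all MODEL_* and model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed record layout, see the note above. */
#define MODEL_BB_LEN_MASK	0x1ffULL
#define MODEL_BB_OFFSET_MASK	0x7ffffffffffffe00ULL
#define MODEL_BB_ACK_MASK	0x8000000000000000ULL

#define MODEL_BB_OFFSET(x)	(((x) & MODEL_BB_OFFSET_MASK) >> 9)
#define MODEL_BB_LEN(x)		(((x) & MODEL_BB_LEN_MASK) + 1)
#define MODEL_BB_ACK(x)		(!!((x) & MODEL_BB_ACK_MASK))
/* End LBA (exclusive) of a bad range record. */
#define MODEL_BB_END(x)		(MODEL_BB_OFFSET(x) + MODEL_BB_LEN(x))

#define MODEL_MAX_BADBLOCKS	512	/* assumed table capacity */

/* Hypothetical reduction of struct badblocks to the fields used here. */
struct model_badblocks {
	int count;
	int changed;
};

/* Pack one record; len must fit the 9-bit length field. */
uint64_t model_bb_make(uint64_t offset, unsigned int len, bool ack)
{
	assert(len >= 1 && len <= 512);
	return (offset << 9) | (uint64_t)(len - 1) | ((uint64_t)ack << 63);
}

bool model_badblocks_full(const struct model_badblocks *bb)
{
	return bb->count == MODEL_MAX_BADBLOCKS;
}

bool model_badblocks_empty(const struct model_badblocks *bb)
{
	return bb->count == 0;
}

void model_set_changed(struct model_badblocks *bb)
{
	if (bb->changed != 1)
		bb->changed = 1;
}

void model_clear_changed(struct model_badblocks *bb)
{
	if (bb->changed != 0)
		bb->changed = 0;
}
```

Under these assumptions a record for start 400, length 20 has an end LBA
of 420, which is the first sector after the bad range.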
These new helper structure and routines can help to make the code more
clear, they will be used in the improved badblocks code in following
patches.
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Xiao Ni <xni@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Geliang Tang <geliang.tang@suse.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: NeilBrown <neilb@suse.de>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Acked-by: Geliang Tang <geliang.tang@suse.com>
Link: https://lore.kernel.org/r/20230811170513.2300-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Justin Stitt [Mon, 25 Sep 2023 09:49:17 +0000 (09:49 +0000)]
md: replace deprecated strncpy with memcpy
`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
There are three such strncpy uses that this patch addresses:
The respective destination buffers are:
1) mddev->clevel
2) clevel
3) mddev->metadata_type
We expect mddev->clevel to be NUL-terminated due to its use with format
strings:
| ret = sprintf(page, "%s\n", mddev->clevel);
Furthermore, we can see that mddev->clevel is not expected to be
NUL-padded as `md_clean()` merely sets its first byte to NUL -- not
the entire buffer:
| static void md_clean(struct mddev *mddev)
| {
|         mddev->array_sectors = 0;
|         mddev->external_size = 0;
|         ...
|         mddev->level = LEVEL_NONE;
|         mddev->clevel[0] = 0;
|         ...
A suitable replacement for this instance is `memcpy` as we know the
number of bytes to copy and perform manual NUL-termination at a
specified offset. This really decays to just a byte copy from one buffer
to another. `strscpy` is also a reasonable replacement, but using
`slen` as the length argument would result in truncation of the last
byte unless something like `slen + 1` was provided, which isn't the most
idiomatic strscpy usage.
For the next case, the destination buffer `clevel` is expected to be
NUL-terminated based on its usage within kstrtol() which expects
NUL-terminated strings. Note that, in context, this code removes a
trailing newline which is seemingly not required as kstrtol() can handle
trailing newlines implicitly. However, there exists further usage of
clevel (or buf) that would also like to have the newline removed. All in
all, with similar reasoning to the first case, let's just use memcpy as
this is just a byte copy and NUL-termination is handled manually.
The third and final case concerning `mddev->metadata_type` is more or
less the same as the other two. We expect that it be NUL-terminated
based on its usage with seq_printf:
| seq_printf(seq, " super external:%s",
| mddev->metadata_type);
... and we can surmise that NUL-padding isn't required either due to how
it is handled in md_clean():
| static void md_clean(struct mddev *mddev)
| {
|         ...
|         mddev->metadata_type[0] = 0;
|         ...
So really, all these instances have precisely calculated lengths and
purposeful NUL-termination so we can just use memcpy to remove ambiguity
surrounding strncpy.
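The pattern can be shown in a small userspace sketch. copy_level() and its
parameters are hypothetical names used only for illustration; the sketch
assumes the caller passes the input length and that one trailing newline
should be dropped, and it is not the code changed by this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy len bytes of input into dst (capacity dst_size), drop one
 * trailing newline if present, and NUL-terminate by hand.
 * Returns 0 on success, or -EINVAL if the input is empty or does not fit.
 */
int copy_level(char *dst, size_t dst_size, const char *buf, size_t len)
{
	size_t slen = len;

	assert(dst != NULL && buf != NULL);
	if (slen == 0 || slen >= dst_size)
		return -EINVAL;

	memcpy(dst, buf, slen);		/* plain byte copy, no padding */
	if (dst[slen - 1] == '\n')
		slen--;
	dst[slen] = '\0';		/* termination at a known offset */
	return 0;
}
```

Because the length is validated before the copy and the terminator is
written explicitly, nothing depends on how the copy routine treats NUL
bytes or padding.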
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230925-strncpy-drivers-md-md-c-v1-1-2b0093b89c2b@google.com
Kees Cook [Fri, 15 Sep 2023 20:03:28 +0000 (13:03 -0700)]
md/md-linear: Annotate struct linear_conf with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct linear_conf.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
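A userspace sketch of the ordering requirement follows. struct model_conf,
MODEL_COUNTED_BY and the model_conf_*() functions are hypothetical; the
macro expands to the counted_by attribute only when the compiler reports
support for it, and to nothing otherwise.

```c
#include <assert.h>
#include <stdlib.h>

#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define MODEL_COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef MODEL_COUNTED_BY
# define MODEL_COUNTED_BY(member)	/* compiler lacks the attribute */
#endif

/* Hypothetical structure with an annotated flexible array member. */
struct model_conf {
	int nr_disks;
	long disks[] MODEL_COUNTED_BY(nr_disks);
};

struct model_conf *model_conf_alloc(int nr_disks)
{
	struct model_conf *conf;
	int i;

	if (nr_disks <= 0)
		return NULL;
	conf = calloc(1, sizeof(*conf) +
			 (size_t)nr_disks * sizeof(conf->disks[0]));
	if (!conf)
		return NULL;

	conf->nr_disks = nr_disks;	/* set the count first ... */
	for (i = 0; i < nr_disks; i++)
		conf->disks[i] = i;	/* ... and only then touch the array */
	return conf;
}

/* Manual form of the bounds check that the annotation lets tools insert. */
long model_conf_disk(const struct model_conf *conf, int i)
{
	assert(i >= 0 && i < conf->nr_disks);
	return conf->disks[i];
}
```

If the loop ran before nr_disks was assigned, an instrumented build would
see every index as out of bounds, since the count would still be zero.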
Cc: Song Liu <song@kernel.org>
Cc: linux-raid@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230915200328.never.064-kees@kernel.org
Yu Kuai [Fri, 25 Aug 2023 03:09:56 +0000 (11:09 +0800)]
md: don't check 'mddev->pers' and 'pers->quiesce' from suspend_lo_store()
Now that mddev_suspend() doesn't rely on 'mddev->pers' to be set, it's
safe to remove such checking.
This will also allow the array to be suspended even before the array
is run.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825030956.1527023-8-yukuai1@huaweicloud.com
Yu Kuai [Fri, 25 Aug 2023 03:09:55 +0000 (11:09 +0800)]
md: don't check 'mddev->pers' from suspend_hi_store()
Now that mddev_suspend() doesn't rely on 'mddev->pers' to be set, it's
safe to remove such checking.
This will also allow the array to be suspended even before the array
is run.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825030956.1527023-7-yukuai1@huaweicloud.com
Yu Kuai [Fri, 25 Aug 2023 03:09:54 +0000 (11:09 +0800)]
md-bitmap: suspend array earlier in location_store()
Now that mddev_suspend() doesn't rely on 'mddev->pers' to be set, it's
safe to call mddev_suspend() earlier.
This will also help to refactor mddev_suspend() later.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825030956.1527023-6-yukuai1@huaweicloud.com
Yu Kuai [Fri, 25 Aug 2023 03:09:53 +0000 (11:09 +0800)]
md-bitmap: remove the checking of 'pers->quiesce' from location_store()
After commit
4d27e927344a ("md: don't quiesce in mddev_suspend()"),
there is no need to check 'pers->quiesce' before calling
mddev_suspend().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825030956.1527023-5-yukuai1@huaweicloud.com
Yu Kuai [Fri, 25 Aug 2023 03:09:52 +0000 (11:09 +0800)]
md: don't rely on 'mddev->pers' to be set in mddev_suspend()
'active_io' used to be initialized while the array is running, and
'mddev->pers' is set while the array is running as well. Hence caller
must hold 'reconfig_mutex' and guarantee 'mddev->pers' is set before
calling mddev_suspend().
Now that 'active_io' is initialized when mddev is allocated, such
restriction doesn't exist anymore. In the meantime, follow up patches
will refactor mddev_suspend(), hence add checking for 'mddev->pers' to
prevent null-ptr-deref.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825030956.1527023-4-yukuai1@huaweicloud.com
Yu Kuai [Fri, 25 Aug 2023 03:09:51 +0000 (11:09 +0800)]
md: initialize 'writes_pending' while allocating mddev
Currently 'writes_pending' is initialized in pers->run for raid1/5/10,
and it's freed while deleting mddev, instead of in pers->free. pers->run can
be called multiple times before mddev is deleted, and a helper
mddev_init_writes_pending() is used to prevent 'writes_pending' from being
initialized multiple times. This usage is safe but a little weird.
On the other hand, 'writes_pending' is only initialized for raid1/5/10,
however, it's used in the common layer, for example:
array_state_store
 set_in_sync
  if (!mddev->in_sync) -> in_sync is used for all levels
   // access writes_pending
There might be some implicit dependency that I don't recognize to make
sure 'writes_pending' can only be accessed for raid1/5/10, but there are
no comments about that.
By the way, it makes sense to initialize 'writes_pending' in the common layer
because there are already three levels that use it.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825030956.1527023-3-yukuai1@huaweicloud.com
Yu Kuai [Fri, 25 Aug 2023 03:09:50 +0000 (11:09 +0800)]
md: initialize 'active_io' while allocating mddev
'active_io' is used for mddev_suspend() and it's initialized in
md_run(). This requires that 'reconfig_mutex' be held and
"mddev->pers" be set before calling mddev_suspend().
Initialize 'active_io' early so that mddev_suspend() is safe to call
once mddev is allocated. This will be helpful to refactor
mddev_suspend() in following patches.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825030956.1527023-2-yukuai1@huaweicloud.com
Yu Kuai [Fri, 25 Aug 2023 03:16:22 +0000 (11:16 +0800)]
md: delay remove_and_add_spares() for read only array to md_start_sync()
Before this patch, for read-only array:
md_check_recovery() checks that 'MD_RECOVERY_NEEDED' is set, then it will
call remove_and_add_spares() directly to try to remove and add rdevs
from the array.
After this patch:
1) md_check_recovery() checks that 'MD_RECOVERY_NEEDED' is set, that the
worker 'sync_work' is not pending, and that there are rdevs that can be
added or removed, then it will queue new work md_start_sync();
2) md_start_sync() will call remove_and_add_spares() and exit;
This change makes sure that array reconfiguration is independent from the
daemon, and it'll be much easier to synchronize it with io, considering
that io may rely on the daemon thread to be done.
Also fix a problem that 'pers->spare_active' is called after
remove_and_add_spares(), which is the wrong order, because spares must
be activated first, and then remove_and_add_spares() can add spares to the
array, like what the read-write case does:
1) daemon sets 'MD_RECOVERY_RUNNING', registers a new sync thread to do
recovery;
2) recovery is done, md_do_sync() sets 'MD_RECOVERY_DONE' before return;
3) daemon calls 'pers->spare_active', and clears 'MD_RECOVERY_RUNNING';
4) in the next round of the daemon, call remove_and_add_spares() to add
spares to the array.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825031622.1530464-8-yukuai1@huaweicloud.com
Yu Kuai [Fri, 25 Aug 2023 03:16:21 +0000 (11:16 +0800)]
md: factor out a helper rdev_addable() from remove_and_add_spares()
There are no functional changes, just to make the code simpler and
prepare to delay remove_and_add_spares() to md_start_sync().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825031622.1530464-7-yukuai1@huaweicloud.com
Yu Kuai [Fri, 25 Aug 2023 03:16:20 +0000 (11:16 +0800)]
md: factor out a helper rdev_is_spare() from remove_and_add_spares()
There are no functional changes, just to make the code simpler and
prepare to delay remove_and_add_spares() to md_start_sync().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825031622.1530464-6-yukuai1@huaweicloud.com
Yu Kuai [Fri, 25 Aug 2023 03:16:19 +0000 (11:16 +0800)]
md: factor out a helper rdev_removeable() from remove_and_add_spares()
There are no functional changes, just to make the code simpler and
prepare to delay remove_and_add_spares() to md_start_sync().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825031622.1530464-5-yukuai1@huaweicloud.com
Yu Kuai [Fri, 25 Aug 2023 03:16:18 +0000 (11:16 +0800)]
md: delay choosing sync action to md_start_sync()
Before this patch, for read-write array:
1) md_check_recovery() found that something needs to be done, and it'll
try to grab 'reconfig_mutex'. The cases in which md_check_recovery() needs
to do something:
- array is not suspended;
- super_block needs to be updated;
- 'MD_RECOVERY_NEEDED' or 'MD_RECOVERY_DONE' is set;
- unusual case related to safemode;
2) if 'MD_RECOVERY_RUNNING' is not set, and 'MD_RECOVERY_NEEDED' is set,
md_check_recovery() will try to choose a sync action, and then queue a
work md_start_sync().
3) md_start_sync() registers sync_thread;
After this patch,
1) is the same;
2) if 'MD_RECOVERY_RUNNING' is not set, and 'MD_RECOVERY_NEEDED' is set,
queue a work md_start_sync() directly;
3) md_start_sync() will try to choose a sync action, and then register
sync_thread();
Because 'MD_RECOVERY_RUNNING' is cleared when sync_thread is done, 2)
and 3) and md_do_sync() are always run serially and can never run
concurrently, so this change should not introduce any behavior change for now.
Also fix a problem that md_start_sync() can clear 'MD_RECOVERY_RUNNING'
without protection in the error path, which might affect the logic in
md_check_recovery().
The advantage of this change is that array reconfiguration is
independent from the daemon now, and it'll be much easier to synchronize it
with io, considering that io may rely on the daemon thread to be done.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825031622.1530464-4-yukuai1@huaweicloud.com
Yu Kuai [Fri, 25 Aug 2023 03:16:17 +0000 (11:16 +0800)]
md: factor out a helper to choose sync action from md_check_recovery()
There are no functional changes, on the one hand make the code cleaner,
on the other hand prevent following checkpatch error in the next patch to
delay choosing sync action to md_start_sync().
ERROR: do not use assignment in if condition
+ } else if ((spares = remove_and_add_spares(mddev, NULL))) {
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825031622.1530464-3-yukuai1@huaweicloud.com
Yu Kuai [Fri, 25 Aug 2023 03:16:16 +0000 (11:16 +0800)]
md: use separate work_struct for md_start_sync()
It's a little weird to borrow 'del_work' for md_start_sync(), declare
a new work_struct 'sync_work' for md_start_sync().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825031622.1530464-2-yukuai1@huaweicloud.com
Chengming Zhou [Wed, 13 Sep 2023 15:16:16 +0000 (15:16 +0000)]
block/null_blk: add queue_rqs() support
Add batched mq_ops.queue_rqs() support in null_blk for testing. The
implementation is straightforward since null_blk doesn't have commit_rqs().
We simply handle each request one by one; if errors are encountered,
the failed requests are left in the passed-in list and returned to the caller.
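A userspace model of that one-by-one handling is sketched below. struct
model_rq and both functions are hypothetical stand-ins rather than blk-mq
types, and the 'will_fail' field exists only to force an error in the
model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request on a singly linked list. */
struct model_rq {
	int id;
	int will_fail;		/* forces the single-request handler to fail */
	int done;
	struct model_rq *next;
};

/* Stand-in for the driver's single-request path; returns 0 on success. */
int model_queue_rq(struct model_rq *rq)
{
	if (rq->will_fail)
		return -1;
	rq->done = 1;
	return 0;
}

/*
 * Handle every request on *list one by one. Requests that fail are kept,
 * in their original order, and handed back to the caller through *list.
 */
void model_queue_rqs(struct model_rq **list)
{
	struct model_rq *requeue = NULL;
	struct model_rq **lastp = &requeue;
	struct model_rq *rq;

	assert(list != NULL);
	rq = *list;
	while (rq) {
		struct model_rq *next = rq->next;

		rq->next = NULL;
		if (model_queue_rq(rq) != 0) {
			*lastp = rq;		/* append to the requeue list */
			lastp = &rq->next;
		}
		rq = next;
	}
	*list = requeue;
}
```

An empty list on return means the whole batch was accepted; anything left
over is what the caller still has to deal with.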
There is about 3.6% improvement in IOPS of fio/t/io_uring on null_blk
with hw_queue_depth=256 on my test VM, from 1.09M to 1.13M.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230913151616.3164338-6-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chengming Zhou [Wed, 13 Sep 2023 15:16:15 +0000 (15:16 +0000)]
blk-mq: update driver tags request table when start request
Currently we update the driver tags request table in blk_mq_get_driver_tag(),
so drivers that support queue_rqs() have to update that inflight
table by themselves.
Move it to blk_mq_start_request(), which is a better place because it is
where we set up the deadline for the request timeout check. And it's just
where the request becomes inflight.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230913151616.3164338-5-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chengming Zhou [Wed, 13 Sep 2023 15:16:14 +0000 (15:16 +0000)]
blk-mq: support batched queue_rqs() on shared tags queue
Since active requests are now accounted when driver tags are allocated,
we can remove this limit.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230913151616.3164338-4-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chengming Zhou [Wed, 13 Sep 2023 15:16:13 +0000 (15:16 +0000)]
blk-mq: remove RQF_MQ_INFLIGHT
Since the previous patch changed to only account active requests when
we really allocate the driver tag, RQF_MQ_INFLIGHT can be removed
without any double accounting problem.
1. none elevator: the flush request will use the first pending request's
driver tag, so it won't be double accounted.
2. other elevator: the flush request will be accounted when the driver
tag is allocated at issue time, and unaccounted when it puts the driver tag.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230913151616.3164338-3-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chengming Zhou [Wed, 13 Sep 2023 15:16:12 +0000 (15:16 +0000)]
blk-mq: account active requests when get driver tag
There is a limit that batched queue_rqs() can't work on a shared tags
queue, since the accounting of active requests can't be done there.
Currently we account the active requests only in blk_mq_get_driver_tag(),
which is not the time we actually get the driver tag (with none elevator).
To support batched queue_rqs() on a shared tags queue, we move the
accounting of active requests to where we get the driver tag:
1. none elevator: blk_mq_get_tags() and blk_mq_get_tag()
2. other elevator: __blk_mq_alloc_driver_tag()
This is clearer and matches the unaccount side, which just happens
when we put the driver tag.
The other good point is that we don't need the RQF_MQ_INFLIGHT trick
anymore, which was used to avoid double accounting of the flush request.
Now we only account when we actually get the driver tag, so all is good.
We will remove RQF_MQ_INFLIGHT in the next patch.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230913151616.3164338-2-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 10 Jul 2023 22:14:37 +0000 (16:14 -0600)]
io_uring: add IORING_OP_WAITID support
This adds support for a fully async version of waitid(2). If an event
isn't immediately available, wait for a callback to trigger a retry.
The format of the sqe is as follows:
sqe->len The 'which', the idtype being queried/waited for.
sqe->fd The 'pid' (or id) being waited for.
sqe->file_index The 'options' being set.
sqe->addr2 A pointer to siginfo_t, if any, being filled in.
buf_index, addr3, and waitid_flags are reserved/unused for now.
waitid_flags will be used for options for this request type. One
interesting use case may be to add multi-shot support, so that the
request stays armed and posts a notification every time a monitored
process state change occurs.
Note that this does not support rusage, on Arnd's recommendation.
See the waitid(2) man page for details on the arguments.
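As a rough illustration of the field mapping listed above, the sketch
below fills a reduced, hypothetical structure. demo_waitid_sqe and
demo_prep_waitid() are made up for this example and are not the real
struct io_uring_sqe or any library helper; only the field names len, fd,
file_index and addr2 come from the description.

```c
#include <stdint.h>

/* Hypothetical, reduced view of the sqe fields named above. */
struct demo_waitid_sqe {
	uint32_t len;		/* 'which': the idtype being waited for */
	int32_t  fd;		/* 'pid' (or id) being waited for */
	uint32_t file_index;	/* 'options' */
	uint64_t addr2;		/* pointer to siginfo_t, or 0 if none */
};

/* Fill the fields in the same order as the waitid(2) arguments. */
void demo_prep_waitid(struct demo_waitid_sqe *sqe, int which, int id,
		      void *infop, int options)
{
	sqe->len = (uint32_t)which;
	sqe->fd = (int32_t)id;
	sqe->file_index = (uint32_t)options;
	sqe->addr2 = (uint64_t)(uintptr_t)infop;
}
```

A caller would pass the same values it would hand to waitid(2), for
example P_PID, a pid, a siginfo_t pointer and WEXITED.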
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 11 Jul 2023 16:40:31 +0000 (10:40 -0600)]
exit: add internal include file with helpers
Move struct wait_opts and waitid_info into kernel/exit.h, and include
function declarations for the recently added helpers. Make them
non-static as well.
This is in preparation for adding a waitid operation through io_uring.
With the abstracted helpers, this is now possible.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 11 Jul 2023 16:38:23 +0000 (10:38 -0600)]
exit: add kernel_waitid_prepare() helper
Move the setup logic out of kernel_waitid(), and into a separate helper.
No functional changes intended in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 11 Jul 2023 16:34:37 +0000 (10:34 -0600)]
exit: move core of do_wait() into helper
Rather than have a maze of gotos, put the actual logic in __do_wait()
and have do_wait() loop deal with waitqueue setup/teardown and whether
to call __do_wait() again.
No functional changes intended in this patch.
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 11 Jul 2023 16:31:10 +0000 (10:31 -0600)]
exit: abstract out should_wake helper for child_wait_callback()
Abstract out the helper that decides if we should wake up following
a wake_up() callback on our internal waitqueue.
No functional changes intended in this patch.
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 11 Sep 2023 19:35:42 +0000 (13:35 -0600)]
io_uring/rw: add support for IORING_OP_READ_MULTISHOT
This behaves like IORING_OP_READ, except:
1) It only supports pollable files (eg pipes, sockets, etc). Note that
for sockets, you probably want to use recv/recvmsg with multishot
instead.
2) It supports multishot mode, meaning it will repeatedly trigger a
read and fill a buffer when data is available. This allows similar
use to recv/recvmsg but on non-sockets, where a single request will
repeatedly post a CQE whenever data is read from it.
3) Because of #2, it must be used with provided buffers. This is
uniformly true across any request type that supports multishot and
transfers data, with the reason being that it's obviously not
possible to pass in a single buffer for the data, as multiple reads
may very well trigger before an application has a chance to process
previous CQEs and the data passed from them.
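To illustrate point 3, here is a minimal user-space model of why each
completion needs its own buffer. It does not use io_uring at all;
demo_buf_pool and its helpers are hypothetical names that only mimic the
idea of picking a provided buffer per completion.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_NR_BUFS 4
#define DEMO_BUF_LEN 16

/* Hypothetical pool standing in for a set of provided buffers. */
struct demo_buf_pool {
	char data[DEMO_NR_BUFS][DEMO_BUF_LEN];
	int in_use[DEMO_NR_BUFS];
};

/* Pick a free buffer for one completion; -1 if the pool is exhausted. */
int demo_pick_buffer(struct demo_buf_pool *pool)
{
	for (int i = 0; i < DEMO_NR_BUFS; i++) {
		if (!pool->in_use[i]) {
			pool->in_use[i] = 1;
			return i;
		}
	}
	return -1;
}

/* Model one multishot completion: copy data into a freshly picked buffer
 * and return its id, or -1 if no buffer is available. */
int demo_complete_read(struct demo_buf_pool *pool, const char *src, size_t len)
{
	int bid = demo_pick_buffer(pool);

	if (bid < 0)
		return -1;
	if (len > DEMO_BUF_LEN)
		len = DEMO_BUF_LEN;
	memcpy(pool->data[bid], src, len);
	return bid;
}

/* The application hands a buffer back once it has consumed the data. */
void demo_recycle_buffer(struct demo_buf_pool *pool, int bid)
{
	if (bid >= 0 && bid < DEMO_NR_BUFS)
		pool->in_use[bid] = 0;
}
```

Two reads that complete before the application processes either of them
land in different buffers, which a single caller-supplied buffer could
not guarantee.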
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 11 Sep 2023 19:46:07 +0000 (13:46 -0600)]
io_uring/rw: mark readv/writev as vectored in the opcode definition
This is cleaner than gating on the opcode type, particularly as more
read/write type opcodes may be added.
Then we can use that for the data import, and for __io_read() on
whether or not we need to copy state.
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 11 Sep 2023 19:31:56 +0000 (13:31 -0600)]
io_uring/rw: split io_read() into a helper
Add __io_read() which does the bulk of the work, leaving the completion
side to the new io_read(). No functional changes in this patch.
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sun, 17 Sep 2023 21:40:24 +0000 (14:40 -0700)]
Linux 6.6-rc2
Linus Torvalds [Sun, 17 Sep 2023 18:13:37 +0000 (11:13 -0700)]
Merge tag 'x86-urgent-2023-09-17' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- Fix an UV boot crash
- Skip spurious ENDBR generation on _THIS_IP_
- Fix ENDBR use in putuser() asm methods
- Fix corner case boot crashes on 5-level paging
- and fix a false positive WARNING on LTO kernels"
* tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/purgatory: Remove LTO flags
x86/boot/compressed: Reserve more memory for page tables
x86/ibt: Avoid duplicate ENDBR in __put_user_nocheck*()
x86/ibt: Suppress spurious ENDBR
x86/platform/uv: Use alternate source for socket to node data
Linus Torvalds [Sun, 17 Sep 2023 18:10:23 +0000 (11:10 -0700)]
Merge tag 'sched-urgent-2023-09-17' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Fix a performance regression on large SMT systems, an Intel SMT4
balancing bug, and a topology setup bug on (Intel) hybrid processors"
* tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domain
sched/fair: Fix SMT4 group_smt_balance handling
sched/fair: Optimize should_we_balance() for large SMT systems
Linus Torvalds [Sun, 17 Sep 2023 17:59:37 +0000 (10:59 -0700)]
Merge tag 'objtool-urgent-2023-09-17' of git://git./linux/kernel/git/tip/tip
Pull objtool fix from Ingo Molnar:
"Fix a cold functions related false-positive objtool warning that
triggers on Clang"
* tag 'objtool-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix _THIS_IP_ detection for cold functions
Linus Torvalds [Sun, 17 Sep 2023 17:55:35 +0000 (10:55 -0700)]
Merge tag 'core-urgent-2023-09-17' of git://git./linux/kernel/git/tip/tip
Pull WARN fix from Ingo Molnar:
"Fix a missing preempt-enable in the WARN() slowpath"
* tag 'core-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
panic: Reenable preemption in WARN slowpath
Linus Torvalds [Sun, 3 Sep 2023 18:09:56 +0000 (11:09 -0700)]
stat: remove no-longer-used helper macros
The choose_32_64() macros were added to deal with an odd inconsistency
between the 32-bit and 64-bit layout of 'struct stat' way back when in
commit a52dd971f947 ("vfs: de-crapify "cp_new_stat()" function").
Then a decade later Mikulas noticed that said inconsistency had been a
mistake in the early x86-64 port, and shouldn't have existed in the
first place. So commit 932aba1e1690 ("stat: fix inconsistency between
struct stat and struct compat_stat") removed the uses of the helpers.
But the helpers remained around, unused.
Get rid of them.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 17 Sep 2023 17:41:42 +0000 (10:41 -0700)]
Merge tag '6.6-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
"Three small SMB3 client fixes, one to improve a null check and two
minor cleanups"
* tag '6.6-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: fix some minor typos and repeated words
smb3: correct places where ENOTSUPP is used instead of preferred EOPNOTSUPP
smb3: move server check earlier when setting channel sequence number
Linus Torvalds [Sun, 17 Sep 2023 17:38:01 +0000 (10:38 -0700)]
Merge tag '6.6-rc1-ksmbd' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
"Two ksmbd server fixes"
* tag '6.6-rc1-ksmbd' of git://git.samba.org/ksmbd:
ksmbd: fix passing freed memory 'aux_payload_buf'
ksmbd: remove unneeded mark_inode_dirty in set_info_sec()
Linus Torvalds [Sun, 17 Sep 2023 17:33:53 +0000 (10:33 -0700)]
Merge tag 'ext4_for_linus-6.6-rc2' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Regression and bug fixes for ext4"
* tag 'ext4_for_linus-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix rec_len verify error
ext4: do not let fstrim block system suspend
ext4: move setting of trimmed bit into ext4_try_to_trim_range()
jbd2: Fix memory leak in journal_init_common()
jbd2: Remove page size assumptions
buffer: Make bh_offset() work for compound pages
Song Liu [Thu, 14 Sep 2023 17:01:38 +0000 (10:01 -0700)]
x86/purgatory: Remove LTO flags
-flto* implies -ffunction-sections. With LTO enabled, ld.lld generates
multiple .text sections for purgatory.ro:
$ readelf -S purgatory.ro | grep " .text"
[ 1] .text PROGBITS
0000000000000000 00000040
[ 7] .text.purgatory PROGBITS
0000000000000000 000020e0
[ 9] .text.warn PROGBITS
0000000000000000 000021c0
[13] .text.sha256_upda PROGBITS
0000000000000000 000022f0
[15] .text.sha224_upda PROGBITS
0000000000000000 00002be0
[17] .text.sha256_fina PROGBITS
0000000000000000 00002bf0
[19] .text.sha224_fina PROGBITS
0000000000000000 00002cc0
This causes WARNING from kexec_purgatory_setup_sechdrs():
WARNING: CPU: 26 PID: 110894 at kernel/kexec_file.c:919
kexec_load_purgatory+0x37f/0x390
Fix this by disabling LTO for purgatory.
[ AFAICT, x86 is the only arch that supports LTO and purgatory. ]
We could also fix this with an explicit linker script to rejoin .text.*
sections back into .text. However, given the benefit of LTOing purgatory
is small, simply disable the production of more .text.* sections for now.
Fixes: b33fff07e3e3 ("x86, build: allow LTO to be selected")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230914170138.995606-1-song@kernel.org
Kirill A. Shutemov [Fri, 15 Sep 2023 07:02:21 +0000 (10:02 +0300)]
x86/boot/compressed: Reserve more memory for page tables
The decompressor has a hard limit on the number of page tables it can
allocate. This limit is defined at compile-time and will cause boot
failure if it is reached.
The kernel is very strict and calculates the limit precisely for the
worst-case scenario based on the current configuration. However, it is
easy to forget to adjust the limit when a new use-case arises. The
worst-case scenario is rarely encountered during sanity checks.
In the case of enabling 5-level paging, a use-case was overlooked. The
limit needs to be increased by one to accommodate the additional level.
This oversight went unnoticed until Aaron attempted to run the kernel
via kexec with 5-level paging and unaccepted memory enabled.
Update worst-case calculations to include 5-level paging.
To address this issue, let's allocate some extra space for page tables.
128K should be sufficient for any use-case. The logic can be simplified
by using a single value for all kernel configurations.
[ Also add a warning, should this memory run low - by Dave Hansen. ]
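The idea of a fixed pool with a low-memory warning can be sketched as a
simple bump allocator. This is an illustration only: demo_pgt_pool,
demo_alloc_pgt_page() and the DEMO_LOW_PAGES threshold are hypothetical
and do not reflect the actual decompressor code; only the 128K figure is
taken from the text above.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE	4096u
#define DEMO_POOL_SIZE	(128u * 1024u)	/* 128K, as suggested above */
#define DEMO_LOW_PAGES	4u		/* hypothetical warning threshold */

/* Hypothetical fixed-size pool for page-table pages. */
struct demo_pgt_pool {
	unsigned char *base;
	size_t size;
	size_t offset;
	int warned;	/* set once the pool is close to exhaustion */
};

/* Hand out one page, or NULL when the compile-time limit is reached. */
void *demo_alloc_pgt_page(struct demo_pgt_pool *pool)
{
	void *page;

	if (pool->size - pool->offset < DEMO_PAGE_SIZE)
		return NULL;
	page = pool->base + pool->offset;
	pool->offset += DEMO_PAGE_SIZE;
	/* Flag the condition early, before allocation actually fails. */
	if (pool->size - pool->offset < DEMO_LOW_PAGES * DEMO_PAGE_SIZE)
		pool->warned = 1;
	return page;
}
```

With a single generous size for all configurations, the hard failure
path should rarely be hit, and the warning gives notice well before it
is.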
Fixes: 34bbb0009f3b ("x86/boot/compressed: Enable 5-level paging during decompression stage")
Reported-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915070221.10266-1-kirill.shutemov@linux.intel.com
Linus Torvalds [Sat, 16 Sep 2023 22:27:00 +0000 (15:27 -0700)]
Merge tag 'kbuild-fixes-v6.6' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix kernel-devel RPM and linux-headers Deb package
- Fix too long argument list error in 'make modules_install'
* tag 'kbuild-fixes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: avoid long argument lists in make modules_install
kbuild: fix kernel-devel RPM package and linux-headers Deb package
Linus Torvalds [Sat, 16 Sep 2023 19:31:42 +0000 (12:31 -0700)]
vm: fix move_vma() memory accounting being off
Commit 408579cd627a ("mm: Update do_vmi_align_munmap() return
semantics") seems to have updated one of the callers of do_vmi_munmap()
incorrectly: it used to check for the error case (which didn't
change: negative means error).
That commit changed the check to the success case (which did change:
before that commit, 0 was success, and 1 was "success and lock
downgraded". After the change, it's always 0 for success, and the lock
will have been released if requested).
This didn't change any actual VM behavior _except_ for memory accounting
when 'VM_ACCOUNT' was set on the vma. Which made the wrong return value
test fairly subtle, since everything continues to work.
Or rather - it continues to work but the "Committed memory" accounting
goes all wonky (Committed_AS value in /proc/meminfo), and depending on
settings that then causes problems much much later as the VM relies on
bogus statistics for its heuristics.
Revert that one line of the change back to the original logic.
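The difference between the two checks can be shown with a small model.
Everything here is a simplified assumption for illustration: demo_acct
and both helpers are hypothetical, and the accounting adjustment merely
stands in for whatever the error path does in the real code.

```c
/* Hypothetical model of the committed-memory counter. */
struct demo_acct {
	long committed_pages;
};

/* Original logic: only the error case (negative return) adjusts the
 * accounting. */
void demo_after_unmap(struct demo_acct *acct, int unmap_ret, long old_pages)
{
	if (unmap_ret < 0)
		acct->committed_pages += old_pages;
}

/* Inverted logic as described above: the adjustment runs on success
 * (return value 0) instead of on error. */
void demo_after_unmap_buggy(struct demo_acct *acct, int unmap_ret,
			    long old_pages)
{
	if (!unmap_ret)
		acct->committed_pages += old_pages;
}
```

In this model every successful unmap with the inverted check inflates
the counter, while nothing else misbehaves, which matches how subtle the
symptom was.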
Fixes: 408579cd627a ("mm: Update do_vmi_align_munmap() return semantics")
Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Reported-bisected-and-tested-by: Michael Labiuk <michael.labiuk@virtuozzo.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Link: https://lore.kernel.org/all/1694366957@msgid.manchmal.in-ulm.de/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 16 Sep 2023 18:54:48 +0000 (11:54 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"16 small(ish) fixes all in drivers.
The major fixes are in pm8001 (fixes MSI-X issue going back to its
origin), the qla2xxx endianness fix, which fixes a bug on big endian
and the lpfc ones which can cause an oops on module removal without
them"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Prevent use-after-free during rmmod with mapped NVMe rports
scsi: lpfc: Early return after marking final NLP_DROPPED flag in dev_loss_tmo
scsi: lpfc: Fix the NULL vs IS_ERR() bug for debugfs_create_file()
scsi: target: core: Fix target_cmd_counter leak
scsi: pm8001: Setup IRQs on resume
scsi: pm80xx: Avoid leaking tags when processing OPC_INB_SET_CONTROLLER_CONFIG command
scsi: pm80xx: Use phy-specific SAS address when sending PHY_START command
scsi: ufs: core: Poll HCS.UCRDY before issuing a UIC command
scsi: ufs: core: Move __ufshcd_send_uic_cmd() outside host_lock
scsi: qedf: Add synchronization between I/O completions and abort
scsi: target: Replace strlcpy() with strscpy()
scsi: qla2xxx: Fix NULL vs IS_ERR() bug for debugfs_create_dir()
scsi: qla2xxx: Use raw_smp_processor_id() instead of smp_processor_id()
scsi: qla2xxx: Correct endianness for rqstlen and rsplen
scsi: ppa: Fix accidentally reversed conditions for 16-bit and 32-bit EPP
scsi: megaraid_sas: Fix deadlock on firmware crashdump
Linus Torvalds [Sat, 16 Sep 2023 18:49:57 +0000 (11:49 -0700)]
Merge tag 'ata-6.6-rc2' of git://git./linux/kernel/git/dlemoal/libata
Pull ata fixes from Damien Le Moal:
- Fix link power management transitions to disallow unsupported states
(Niklas)
- A small string handling fix for the sata_mv driver (Christophe)
- Clear port pending interrupts before reset, as per AHCI
specifications (Szuying).
Followup fixes for this one are to not clear ATA_PFLAG_EH_PENDING in
ata_eh_reset() to allow EH to continue on with other actions recorded
with error interrupts triggered before EH completes. And an
additional fix to avoid thawing a port twice in EH (Niklas)
- Small code style fixes in the pata_parport driver to silence the
build bot as it keeps complaining about bad indentation (me)
- A fix for the recent CDL code to avoid fetching sense data for
successful commands when not necessary for correct operation (Niklas)
* tag 'ata-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata-core: fetch sense data for successful commands iff CDL enabled
ata: libata-eh: do not thaw the port twice in ata_eh_reset()
ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()
ata: pata_parport: Fix code style issues
ata: libahci: clear pending interrupt status
ata: sata_mv: Fix incorrect string length computation in mv_dump_mem()
ata: libata: disallow dev-initiated LPM transitions to unsupported states
Linus Torvalds [Sat, 16 Sep 2023 18:37:11 +0000 (11:37 -0700)]
Merge tag 'usb-6.6-rc2' of git://git./linux/kernel/git/gregkh/usb
Pull USB fix from Greg KH:
"Here is a single USB fix for a much-reported regression for 6.6-rc1.
It resolves a crash in the typec debugfs code for many systems. It's
been in linux-next with no reported issues, and many people have
reported it resolving their problem with 6.6-rc1"
* tag 'usb-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: ucsi: Fix NULL pointer dereference
Linus Torvalds [Sat, 16 Sep 2023 18:26:52 +0000 (11:26 -0700)]
Merge tag 'driver-core-6.6-rc2' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here is a single driver core fix for a much-reported-by-sysbot issue
that showed up in 6.6-rc1. It's been submitted by many people, all in
the same way, so it obviously fixes things for them all.
Also in here is a single documentation update adding riscv to the
embargoed hardware document in case there are any future issues with
that processor family.
Both of these have been in linux-next with no reported problems"
* tag 'driver-core-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Documentation: embargoed-hardware-issues.rst: Add myself for RISC-V
driver core: return an error when dev_set_name() hasn't happened
Linus Torvalds [Sat, 16 Sep 2023 18:17:19 +0000 (11:17 -0700)]
Merge tag 'char-misc-6.6-rc2' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc fix from Greg KH:
"Here is a single patch for 6.6-rc2 that reverts a 6.5 change for the
comedi subsystem that has ended up being incorrect and caused drivers
that were working for people to no longer be selectable for building
at all.
To fix this, the Kconfig change needs to be reverted and a future set
of fixes for the ioport dependencies will show up in 6.7-rc1 (there's
no rush for them.)
This has been in linux-next with no reported issues"
* tag 'char-misc-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Revert "comedi: add HAS_IOPORT dependencies"
Linus Torvalds [Sat, 16 Sep 2023 18:09:18 +0000 (11:09 -0700)]
Merge tag 'i2c-for-6.6-rc2' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"The main thing is the removal of 'probe_new' because all i2c client
drivers are converted now. Thanks Uwe, this marks the end of a long
conversion process.
Other than that, we have a few Kconfig updates and driver bugfixes"
* tag 'i2c-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: cadence: Fix the kernel-doc warnings
i2c: aspeed: Reset the i2c controller when timeout occurs
i2c: I2C_MLXCPLD on ARM64 should depend on ACPI
i2c: Make I2C_ATR invisible
i2c: Drop legacy callback .probe_new()
w1: ds2482: Switch back to use struct i2c_driver's .probe()
Niklas Cassel [Wed, 13 Sep 2023 15:04:43 +0000 (17:04 +0200)]
ata: libata-core: fetch sense data for successful commands iff CDL enabled
Currently, we fetch sense data for a _successful_ command if either:
1) Command was NCQ and ATA_DFLAG_CDL_ENABLED flag set (flag
ATA_DFLAG_CDL_ENABLED will only be set if the Successful NCQ command
sense data supported bit is set); or
2) Command was non-NCQ and regular sense data reporting is enabled.
This means that case 2) will trigger for a non-NCQ command which has
ATA_SENSE bit set, regardless of whether CDL is enabled or not.
This decision was by design. If the device reports that it has sense data
available, it makes sense to fetch that sense data, since the sk/asc/ascq
could be important information regardless of whether CDL is enabled or not.
However, the fetching of sense data for a successful command is done via
ATA EH. Considering how intricate the ATA EH is, we really do not want to
invoke ATA EH unless absolutely needed.
Before commit 18bd7718b5c4 ("scsi: ata: libata: Handle completion of CDL
commands using policy 0xD") we never fetched sense data for successful
commands.
In order to not invoke the ATA EH unless absolutely necessary, even if the
device claims support for sense data reporting, only fetch sense data for
successful commands (NCQ and non-NCQ) that are using CDL.
[Damien] Modified the check to test the qc flag ATA_QCFLAG_HAS_CDL
instead of the device support for CDL, which is implied for commands
using CDL.
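The resulting decision can be sketched as a single predicate. This is a
hedged illustration, assuming the check combines a per-command CDL flag
with the device's "sense data available" status bit; the DEMO_* values
and demo_fetch_sense_for_success() are hypothetical and are not the
libata definitions.

```c
#include <stdbool.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define DEMO_QCFLAG_HAS_CDL	(1u << 0)	/* command was issued with CDL */
#define DEMO_STATUS_SENSE	(1u << 1)	/* device reports sense data */

/* For a successful command, fetch sense data only if the command used
 * CDL and the device signals that sense data is available. */
bool demo_fetch_sense_for_success(unsigned int qc_flags, unsigned int status)
{
	return (qc_flags & DEMO_QCFLAG_HAS_CDL) &&
	       (status & DEMO_STATUS_SENSE);
}
```

A successful non-CDL command with the sense bit set no longer takes the
EH path in this model, whether it is NCQ or not.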
Fixes: 3ac873c76d79 ("ata: libata-core: fix when to fetch sense data for successful commands")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Niklas Cassel [Wed, 13 Sep 2023 22:19:17 +0000 (00:19 +0200)]
ata: libata-eh: do not thaw the port twice in ata_eh_reset()
Commit 1e641060c4b5 ("libata: clear eh_info on reset completion") added
a workaround that broke the retry mechanism in ATA EH.
Tejun himself suggested to remove this workaround when it was identified
to cause additional problems:
https://lore.kernel.org/linux-ide/20110426135027.GI878@htj.dyndns.org/
He even said:
"Hmm... it seems I wasn't thinking straight when I added that work around."
https://lore.kernel.org/linux-ide/20110426155229.GM878@htj.dyndns.org/
Although removing the workaround solved the issue, the workaround was
kept to avoid "spurious hotplug events during reset", and instead another
workaround was added on top of the existing workaround in commit
8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").
Because these IRQs happened when the port was frozen, we know that they
were actually a side effect of PxIS and IS.IPS(x) not being cleared before
the COMRESET. This is now done in commit 94152042eaa9 ("ata: libahci:
clear pending interrupt status"), so these workarounds can now be removed.
Since commit 1e641060c4b5 ("libata: clear eh_info on reset completion")
has now been reverted, the ATA EH retry mechanism is functional again, so
there is once again no need to thaw the port more than once in
ata_eh_reset().
This reverts "the workaround on top of the workaround" introduced in
commit 8c56cacc724c ("libata: fix unexpectedly frozen port after
ata_eh_reset()").
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Niklas Cassel [Wed, 13 Sep 2023 22:19:16 +0000 (00:19 +0200)]
ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()
ata_scsi_port_error_handler() starts off by clearing ATA_PFLAG_EH_PENDING,
before calling ap->ops->error_handler() (without holding the ap->lock).
If an error IRQ is received while ap->ops->error_handler() is running,
the irq handler will set ATA_PFLAG_EH_PENDING.
Once ap->ops->error_handler() returns, ata_scsi_port_error_handler()
checks if ATA_PFLAG_EH_PENDING is set, and if it is, another iteration
of ATA EH is performed.
The problem is that ATA_PFLAG_EH_PENDING is not only cleared by
ata_scsi_port_error_handler(), it is also cleared by ata_eh_reset().
ata_eh_reset() is called by ap->ops->error_handler(). This additional
clearing done by ata_eh_reset() breaks the whole retry logic in
ata_scsi_port_error_handler(). Thus, if an error IRQ is received while
ap->ops->error_handler() is running, the port will currently remain
frozen and will never get re-enabled.
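The lost-retry problem described above can be reproduced in a tiny
model. The sketch below is an assumption-laden illustration: demo_port
and both functions are hypothetical, and the handler_clears_flag field
merely stands in for the extra clearing done in the reset path.

```c
#include <stdbool.h>

/* Hypothetical model of a port under error handling. */
struct demo_port {
	bool eh_pending;		/* models ATA_PFLAG_EH_PENDING */
	bool frozen;
	int  irqs_to_inject;		/* error IRQs arriving mid-handler */
	bool handler_clears_flag;	/* models the extra clearing */
};

/* Models ap->ops->error_handler(): recover, possibly racing with an IRQ. */
void demo_error_handler(struct demo_port *ap)
{
	ap->frozen = false;		/* recovery thaws the port */
	if (ap->irqs_to_inject > 0) {	/* an error IRQ arrives mid-handler */
		ap->irqs_to_inject--;
		ap->frozen = true;	/* IRQ handler freezes the port */
		ap->eh_pending = true;	/* and requests another EH pass */
	}
	if (ap->handler_clears_flag)
		ap->eh_pending = false;	/* the pending request is lost */
}

/* Models the outer loop: repeat EH while another pass was requested.
 * Returns the number of EH iterations performed. */
int demo_port_error_handler(struct demo_port *ap)
{
	int iterations = 0;

	do {
		ap->eh_pending = false;
		demo_error_handler(ap);
		iterations++;
	} while (ap->eh_pending);
	return iterations;
}
```

Without the extra clearing, one injected IRQ leads to a second pass that
thaws the port; with it, the loop exits after one pass and the port
stays frozen.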
The additional clearing in ata_eh_reset() was introduced in commit
1e641060c4b5 ("libata: clear eh_info on reset completion").
Looking at the original error report:
https://marc.info/?l=linux-ide&m=124765325828495&w=2
We can see the following happening:
[ 1.074659] ata3: XXX port freeze
[ 1.074700] ata3: XXX hardresetting link, stopping engine
[ 1.074746] ata3: XXX flipping SControl
[ 1.411471] ata3: XXX irq_stat=400040 CONN|PHY
[ 1.411475] ata3: XXX port freeze
[ 1.420049] ata3: XXX starting engine
[ 1.420096] ata3: XXX rc=0, class=1
[ 1.420142] ata3: XXX clearing IRQs for thawing
[ 1.420188] ata3: XXX port thawed
[ 1.420234] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
We are not supposed to be able to receive an error IRQ while the port is
frozen (PxIE is set to 0, i.e. all IRQs for the port are disabled).
AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) states:
"Each bit location can be thought of as reporting a '1' if the virtual
"interrupt line" for that port is indicating it wishes to generate an
interrupt. That is, if a port has one or more interrupt status bit set,
and the enables for those status bits are set, then this bit shall be set."
Additionally, AHCI state P:ComInit clearly shows that the state machine
will only jump to P:ComInitSetIS (which sets IS.IPS(x) to '1'), if PxIE.PCE
is set to '1'. In our case, PxIE is set to 0, so IS.IPS(x) won't get set.
So IS.IPS(x) only gets set if both PxIS and PxIE are set.
AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) also states:
"The bits in this register are read/write clear. It is set by the level of
the virtual interrupt line being a set, and cleared by a write of '1' from
the software."
So if IS.IPS(x) is set, you need to explicitly clear it by writing a 1 to
IS.IPS(x) for that port.
Since PxIE is cleared, the only way to get an interrupt while the port is
frozen, is if IS.IPS(x) is set, and the only way IS.IPS(x) can be set when
the port is frozen, is if it was set before the port was frozen.
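The two register rules quoted above can be written down as a small
model. These helpers are hypothetical and operate on plain integers, not
on real AHCI MMIO; they only encode the write-1-to-clear behaviour and
the "status and enable both set" condition as described.

```c
#include <stdint.h>

/* Write-1-to-clear: bits written as 1 are cleared, bits written as 0
 * are left unchanged. Returns the new register value. */
uint32_t demo_w1c_write(uint32_t reg, uint32_t written)
{
	return reg & ~written;
}

/* IS.IPS(x) is raised only if the port has at least one status bit set
 * whose corresponding enable bit is also set. */
int demo_ips_should_set(uint32_t pxis, uint32_t pxie)
{
	return (pxis & pxie) != 0;
}
```

With PxIE at 0, demo_ips_should_set() can never return true, so in this
model a set IS.IPS(x) on a frozen port must predate the freeze and stays
set until software writes a 1 to it.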
However, since commit 737dd811a3db ("ata: libahci: clear pending
interrupt status"), we clear both PxIS and IS.IPS(x) after freezing the
port, but before the COMRESET, so the problem that commit 1e641060c4b5
("libata: clear eh_info on reset completion") fixed can no longer happen.
Thus, revert commit 1e641060c4b5 ("libata: clear eh_info on reset
completion"), so that the retry logic in ata_scsi_port_error_handler()
works once again. (The retry logic is still needed, since we can still
get an error IRQ _after_ the port has been thawed, but before
ata_scsi_port_error_handler() takes the ap->lock in order to check
if ATA_PFLAG_EH_PENDING is set.)
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Linus Torvalds [Sat, 16 Sep 2023 02:22:20 +0000 (19:22 -0700)]
Merge tag 'linux-kselftest-fixes-6.6-rc2' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull more kselftest fixes from Shuah Khan:
"Fixes to user_events test and ftrace test.
The user_events test was enabled by default in Linux 6.6-rc1. The
following fixes are for bugs found since then:
- add checks for dependencies and skip the test if they aren't met.
The user_events test requires root access, and tracefs and
user_events enabled. It leaves tracefs mounted and a fix is in
progress for that missing piece.
- create user_events test-specific Kconfig fragments
ftrace test fixes:
- unmount tracefs for recovering environment. Fix identified during
the above mentioned user_events dependencies fix.
- adds softlink to latest log directory improving usage"
* tag 'linux-kselftest-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: tracing: Fix to unmount tracefs for recovering environment
selftests: user_events: create test-specific Kconfig fragments
ftrace/selftests: Add softlink to latest log directory
selftests/user_events: Fix failures when user_events is not installed