Jens Axboe [Sun, 15 Sep 2024 14:51:20 +0000 (08:51 -0600)]
io_uring/rsrc: get rid of io_mapped_ubuf->folio_mask
We don't really need to cache this, let's reclaim 8 bytes from struct
io_mapped_ubuf and just calculate it when we need it. The only hot path
here is io_import_fixed().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sat, 14 Sep 2024 14:51:15 +0000 (08:51 -0600)]
io_uring: rename "copy buffers" to "clone buffers"
A recent commit added support for copying registered buffers from one
ring to another. But that term is a bit confusing, as no copying of
buffer data is done here. What is being done is simply cloning the
buffer registrations from one ring to another.
Rename it while we still can, so that it's more descriptive. No
functional changes in this patch.
Fixes:
7cc2a6eadcd7 ("io_uring: add IORING_REGISTER_COPY_BUFFERS method")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 11 Sep 2024 19:56:08 +0000 (13:56 -0600)]
io_uring: add IORING_REGISTER_COPY_BUFFERS method
Buffers can get registered with io_uring, which allows to skip the
repeated pin_pages, unpin/unref pages for each O_DIRECT operation. This
reduces the overhead of O_DIRECT IO.
However, registrering buffers can take some time. Normally this isn't an
issue as it's done at initialization time (and hence less critical), but
for cases where rings can be created and destroyed as part of an IO
thread pool, registering the same buffers for multiple rings become a
more time sensitive proposition. As an example, let's say an application
has an IO memory pool of 500G. Initial registration takes:
Got 500 huge pages (each 1024MB)
Registered 500 pages in 409 msec
or about 0.4 seconds. If we go higher to 900 1GB huge pages being
registered:
Registered 900 pages in 738 msec
which is, as expected, a fully linear scaling.
Rather than have each ring pin/map/register the same buffer pool,
provide an io_uring_register(2) opcode to simply duplicate the buffers
that are registered with another ring. Adding the same 900GB of
registered buffers to the target ring can then be accomplished in:
Copied 900 pages in 17 usec
While timing differs a bit, this provides around a 25,000-40,000x
speedup for this use case.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 12 Sep 2024 15:29:29 +0000 (09:29 -0600)]
io_uring/register: provide helper to get io_ring_ctx from 'fd'
Can be done in one of two ways:
1) Regular file descriptor, just fget()
2) Registered ring, index our own table for that
In preparation for adding another register use of needing to get a ctx
from a file descriptor, abstract out this helper and use it in the main
register syscall as well.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 11 Sep 2024 19:54:32 +0000 (13:54 -0600)]
io_uring/rsrc: add reference count to struct io_mapped_ubuf
Currently there's a single ring owner of a mapped buffer, and hence the
reference count will always be 1 when it's torn down and freed. However,
in preparation for being able to link io_mapped_ubuf to different spots,
add a reference count to manage the lifetime of it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 11 Sep 2024 19:52:17 +0000 (13:52 -0600)]
io_uring/rsrc: clear 'slot' entry upfront
No functional changes in this patch, but clearing the slot pointer
earlier will be required by a later change.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Felix Moessbauer [Tue, 10 Sep 2024 17:11:57 +0000 (19:11 +0200)]
io_uring/io-wq: inherit cpuset of cgroup in io worker
The io worker threads are userland threads that just never exit to the
userland. By that, they are also assigned to a cgroup (the group of the
creating task).
When creating a new io worker, this worker should inherit the cpuset
of the cgroup.
Fixes:
da64d6db3bd3 ("io_uring: One wqe per wq")
Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
Link: https://lore.kernel.org/r/20240910171157.166423-3-felix.moessbauer@siemens.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Felix Moessbauer [Tue, 10 Sep 2024 17:11:56 +0000 (19:11 +0200)]
io_uring/io-wq: do not allow pinning outside of cpuset
The io worker threads are userland threads that just never exit to the
userland. By that, they are also assigned to a cgroup (the group of the
creating task).
When changing the affinity of the io_wq thread via syscall, we must only
allow cpumasks within the limits defined by the cpuset controller of the
cgroup (if enabled).
Fixes:
da64d6db3bd3 ("io_uring: One wqe per wq")
Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
Link: https://lore.kernel.org/r/20240910171157.166423-2-felix.moessbauer@siemens.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 10 Sep 2024 14:57:04 +0000 (08:57 -0600)]
io_uring/rw: drop -EOPNOTSUPP check in __io_complete_rw_common()
A recent change ensured that the necessary -EOPNOTSUPP -> -EAGAIN
transformation happens inline on both the reader and writer side,
and hence there's no need to check for both of these anymore on
the completion handler side.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 10 Sep 2024 14:30:57 +0000 (08:30 -0600)]
io_uring/rw: treat -EOPNOTSUPP for IOCB_NOWAIT like -EAGAIN
Some file systems, ocfs2 in this case, will return -EOPNOTSUPP for
an IOCB_NOWAIT read/write attempt. While this can be argued to be
correct, the usual return value for something that requires blocking
issue is -EAGAIN.
A refactoring io_uring commit dropped calling kiocb_done() for
negative return values, which is otherwise where we already do that
transformation. To ensure we catch it in both spots, check it in
__io_read() itself as well.
Reported-by: Robert Sander <r.sander@heinlein-support.de>
Link: https://fosstodon.org/@gurubert@mastodon.gurubert.de/113112431889638440
Cc: stable@vger.kernel.org
Fixes:
a08d195b586a ("io_uring/rw: split io_read() into a helper")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Felix Moessbauer [Mon, 9 Sep 2024 15:00:36 +0000 (17:00 +0200)]
io_uring/sqpoll: do not allow pinning outside of cpuset
The submit queue polling threads are userland threads that just never
exit to the userland. When creating the thread with IORING_SETUP_SQ_AFF,
the affinity of the poller thread is set to the cpu specified in
sq_thread_cpu. However, this CPU can be outside of the cpuset defined
by the cgroup cpuset controller. This violates the rules defined by the
cpuset controller and is a potential issue for realtime applications.
In
b7ed6d8ffd6 we fixed the default affinity of the poller thread, in
case no explicit pinning is required by inheriting the one of the
creating task. In case of explicit pinning, the check is more
complicated, as also a cpu outside of the parent cpumask is allowed.
We implemented this by using cpuset_cpus_allowed (that has support for
cgroup cpusets) and testing if the requested cpu is in the set.
Fixes:
37d1e2e3642e ("io_uring: move SQPOLL thread io-wq forked worker")
Cc: stable@vger.kernel.org # 6.1+
Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
Link: https://lore.kernel.org/r/20240909150036.55921-1-felix.moessbauer@siemens.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sun, 8 Sep 2024 22:34:55 +0000 (16:34 -0600)]
io_uring/eventfd: move refs to refcount_t
atomic_t for the struct io_ev_fd references and there are no issues with
it. While the ref getting and putting for the eventfd code is somewhat
performance critical for cases where eventfd signaling is used (news
flash, you should not...), it probably doesn't warrant using an atomic_t
for this. Let's just move to it to refcount_t to get the added
protection of over/underflows.
Link: https://lore.kernel.org/lkml/202409082039.hnsaIJ3X-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/
202409082039.hnsaIJ3X-lkp@intel.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Anuj Gupta [Mon, 2 Sep 2024 06:21:34 +0000 (11:51 +0530)]
io_uring: remove unused rsrc_put_fn
rsrc_put_fn is declared but never used, remove it.
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://lore.kernel.org/r/20240902062134.136387-3-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Anuj Gupta [Mon, 2 Sep 2024 06:21:33 +0000 (11:51 +0530)]
io_uring: add new line after variable declaration
Fixes checkpatch warning
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://lore.kernel.org/r/20240902062134.136387-2-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 30 Aug 2024 16:52:02 +0000 (10:52 -0600)]
io_uring: add GCOV_PROFILE_URING Kconfig option
If GCOV is enabled and this option is set, it enables code coverage
profiling of the io_uring subsystem. Only use this for test purposes,
as it will impact the runtime performance.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 9 Aug 2024 17:20:45 +0000 (11:20 -0600)]
io_uring/kbuf: add support for incremental buffer consumption
By default, any recv/read operation that uses provided buffers will
consume at least 1 buffer fully (and maybe more, in case of bundles).
This adds support for incremental consumption, meaning that an
application may add large buffers, and each read/recv will just consume
the part of the buffer that it needs.
For example, let's say an application registers 1MB buffers in a
provided buffer ring, for streaming receives. If it gets a short recv,
then the full 1MB buffer will be consumed and passed back to the
application. With incremental consumption, only the part that was
actually used is consumed, and the buffer remains the current one.
This means that both the application and the kernel needs to keep track
of what the current receive point is. Each recv will still pass back a
buffer ID and the size consumed, the only difference is that before the
next receive would always be the next buffer in the ring. Now the same
buffer ID may return multiple receives, each at an offset into that
buffer from where the previous receive left off. Example:
Application registers a provided buffer ring, and adds two 32K buffers
to the ring.
Buffer1 address: 0x1000000 (buffer ID 0)
Buffer2 address: 0x2000000 (buffer ID 1)
A recv completion is received with the following values:
cqe->res 0x1000 (4k bytes received)
cqe->flags 0x11 (CQE_F_BUFFER|CQE_F_BUF_MORE set, buffer ID 0)
and the application now knows that 4096b of data is available at
0x1000000, the start of that buffer, and that more data from this buffer
will be coming. Now the next receive comes in:
cqe->res 0x2010 (8k bytes received)
cqe->flags 0x11 (CQE_F_BUFFER|CQE_F_BUF_MORE set, buffer ID 0)
which tells the application that 8k is available where the last
completion left off, at 0x1001000. Next completion is:
cqe->res 0x5000 (20k bytes received)
cqe->flags 0x1 (CQE_F_BUFFER set, buffer ID 0)
and the application now knows that 20k of data is available at
0x1003000, which is where the previous receive ended. CQE_F_BUF_MORE
isn't set, as no more data is available in this buffer ID. The next
completion is then:
cqe->res 0x1000 (4k bytes received)
cqe->flags 0x10001 (CQE_F_BUFFER|CQE_F_BUF_MORE set, buffer ID 1)
which tells the application that buffer ID 1 is now the current one,
hence there's 4k of valid data at 0x2000000. 0x2001000 will be the next
receive point for this buffer ID.
When a buffer will be reused by future CQE completions,
IORING_CQE_BUF_MORE will be set in cqe->flags. This tells the application
that the kernel isn't done with the buffer yet, and that it should expect
more completions for this buffer ID. Will only be set by provided buffer
rings setup with IOU_PBUF_RING INC, as that's the only type of buffer
that will see multiple consecutive completions for the same buffer ID.
For any other provided buffer type, any completion that passes back
a buffer to the application is final.
Once a buffer has been fully consumed, the buffer ring head is
incremented and the next receive will indicate the next buffer ID in the
CQE cflags.
On the send side, the application can manage how much data is sent from
an existing buffer by setting sqe->len to the desired send length.
An application can request incremental consumption by setting
IOU_PBUF_RING_INC in the provided buffer ring registration. Outside of
that, any provided buffer ring setup and buffer additions is done like
before, no changes there. The only change is in how an application may
see multiple completions for the same buffer ID, hence needing to know
where the next receive will happen.
Note that like existing provided buffer rings, this should not be used
with IOSQE_ASYNC, as both really require the ring to remain locked over
the duration of the buffer selection and the operation completion. It
will consume a buffer otherwise regardless of the size of the IO done.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 27 Aug 2024 14:26:07 +0000 (08:26 -0600)]
io_uring/kbuf: pass in 'len' argument for buffer commit
In preparation for needing the consumed length, pass in the length being
completed. Unused right now, but will be used when it is possible to
partially consume a buffer.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 20 Aug 2024 13:22:53 +0000 (07:22 -0600)]
Revert "io_uring: Require zeroed sqe->len on provided-buffers send"
This reverts commit
79996b45f7b28c0e3e08a95bab80119e95317e28.
Revert the change that restricts a send provided buffer to be zero, so
it will always consume the whole buffer. This is strictly needed for
partial consumption, as the send may very well be a subset of the
current buffer. In fact, that's the intended use case.
For non-incremental provided buffer rings, an application should set
sqe->len carefully to avoid the potential issue described in the
reverted commit. It is recommended that '0' still be set for len for
that case, if the application is set on maintaining more than 1 send
inflight for the same socket. This is somewhat of a nonsensical thing
to do.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 12 Aug 2024 15:25:36 +0000 (09:25 -0600)]
io_uring/kbuf: move io_ring_head_to_buf() to kbuf.h
In preparation for using this helper in kbuf.h as well, move it there and
turn it into a macro.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 12 Aug 2024 15:18:35 +0000 (09:18 -0600)]
io_uring/kbuf: add io_kbuf_commit() helper
Committing the selected ring buffer is currently done in three different
spots, combine it into a helper and just call that.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 8 Aug 2024 18:54:55 +0000 (12:54 -0600)]
io_uring/kbuf: shrink nr_iovs/mode in struct buf_sel_arg
nr_iovs is capped at 1024, and mode only has a few low values. We can
safely make them u16, in preparation for adding a few more members.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 4 Jan 2024 17:46:30 +0000 (10:46 -0700)]
io_uring: wire up min batch wake timeout
Expose min_wait_usec in io_uring_getevents_arg, replacing the pad member
that is currently in there. The value is in usecs, which is explained in
the name as well.
Note that if min_wait_usec and a normal timeout is used in conjunction,
the normal timeout is still relative to the base time. For example, if
min_wait_usec is set to 100 and the normal timeout is 1000, the max
total time waited is still 1000. This also means that if the normal
timeout is shorter than min_wait_usec, then only the min_wait_usec will
take effect.
See previous commit for an explanation of how this works.
IORING_FEAT_MIN_TIMEOUT is added as a feature flag for this, as
applications doing submit_and_wait_timeout() style operations will
generally not see the -EINVAL from the wait side as they return the
number of IOs submitted. Only if no IOs are submitted will the -EINVAL
bubble back up to the application.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 4 Jan 2024 17:17:54 +0000 (10:17 -0700)]
io_uring: add support for batch wait timeout
Waiting for events with io_uring has two knobs that can be set:
1) The number of events to wake for
2) The timeout associated with the event
Waiting will abort when either of those conditions are met, as expected.
This adds support for a third event, which is associated with the number
of events to wait for. Applications generally like to handle batches of
completions, and right now they'd set a number of events to wait for and
the timeout for that. If no events have been received but the timeout
triggers, control is returned to the application and it can wait again.
However, if the application doesn't have anything to do until events are
reaped, then it's possible to make this waiting more efficient.
For example, the application may have a latency time of 50 usecs and
wanting to handle a batch of 8 requests at the time. If it uses 50 usecs
as the timeout, then it'll be doing 20K context switches per second even
if nothing is happening.
This introduces the notion of min batch wait time. If the min batch wait
time expires, then we'll return to userspace if we have any events at all.
If none are available, the general wait time is applied. Any request
arriving after the min batch wait time will cause waiting to stop and
return control to the application.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 4 Jan 2024 15:46:23 +0000 (08:46 -0700)]
io_uring: implement our own schedule timeout handling
In preparation for having two distinct timeouts and avoid waking the
task if we don't need to.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 4 Jan 2024 15:02:59 +0000 (08:02 -0700)]
io_uring: move schedule wait logic into helper
In preparation for expanding how we handle waits, move the actual
schedule and schedule_timeout() handling into a helper.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 15 Aug 2024 22:13:32 +0000 (16:13 -0600)]
io_uring: encapsulate extraneous wait flags into a separate struct
Rather than need to pass in 2 or 3 separate arguments, add a struct
to encapsulate the timeout and sigset_t parts of waiting. In preparation
for adding another argument for waiting.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 7 Aug 2024 14:18:14 +0000 (15:18 +0100)]
io_uring: user registered clockid for wait timeouts
Add a new registration opcode IORING_REGISTER_CLOCK, which allows the
user to select which clock id it wants to use with CQ waiting timeouts.
It only allows a subset of all posix clocks and currently supports
CLOCK_MONOTONIC and CLOCK_BOOTTIME.
Suggested-by: Lewis Baker <lewissbaker@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/98f2bc8a3c36cdf8f0e6a275245e81e903459703.1723039801.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 7 Aug 2024 14:18:13 +0000 (15:18 +0100)]
io_uring: add absolute mode wait timeouts
In addition to current relative timeouts for the waiting loop, where the
timespec argument specifies the maximum time it can wait for, add
support for the absolute mode, with the value carrying a CLOCK_MONOTONIC
absolute time until which we should return control back to the user.
Suggested-by: Lewis Baker <lewissbaker@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4d5b74d67ada882590b2e42aa3aa7117bbf6b55f.1723039801.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 7 Aug 2024 14:18:12 +0000 (15:18 +0100)]
io_uring/napi: postpone napi timeout adjustment
Remove io_napi_adjust_timeout() and move the adjustments out of the
common path into __io_napi_busy_loop(). Now the limit it's calculated
based on struct io_wait_queue::timeout, for which we query current time
another time. The overhead shouldn't be a problem, it's a polling path,
however that can be optimised later by additionally saving the delta
time value in io_cqring_wait().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/88e14686e245b3b42ff90a3c4d70895d48676206.1723039801.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 7 Aug 2024 14:18:11 +0000 (15:18 +0100)]
io_uring/napi: refactor __io_napi_busy_loop()
we don't need to set ->napi_prefer_busy_poll if we're not going to poll,
do the checks first and all polling preparation after.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2ad7ede8cc7905328fc62e8c3805fdb11635ae0b.1723039801.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 9 Aug 2024 16:39:38 +0000 (10:39 -0600)]
io_uring/kbuf: turn io_buffer_list booleans into flags
We could just move these two and save some space, but in preparation
for adding another flag, turn them into flags first.
This saves 8 bytes in struct io_buffer_list, making it exactly half
a cacheline on 64-bit archs now rather than 40 bytes.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 8 Aug 2024 16:42:18 +0000 (10:42 -0600)]
io_uring/net: use ITER_UBUF for single segment send maps
Just like what is being done on the recv side, if we only map a single
segment, then use ITER_UBUF for mapping it. That's more efficient than
using an ITER_IOVEC.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 8 Aug 2024 16:33:16 +0000 (10:33 -0600)]
io_uring/kbuf: use 'bl' directly rather than req->buf_list
req->buf_list is assigned higher up and is safe to use as we remain
within a locked region, as is the 'bl' variable itself from which it
was assigned. To improve readability, use 'bl' directly rather than
get it from the io_kiocb, if we need to increment the head directly
in the buffer selection path. This makes it readily apparent that
it's the same io_buffer_list being used.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Olivier Langlois [Tue, 30 Jul 2024 20:56:06 +0000 (16:56 -0400)]
io_uring: micro optimization of __io_sq_thread() condition
reverse the order of the element evaluation in an if statement.
for many users that are not using iopoll, the iopoll_list will always
evaluate to false after having made a memory access whereas to_submit is
very likely already loaded in a register.
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/052ca60b5c49e7439e4b8bd33bfab4a09d36d3d6.1722374371.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chenliang Li [Wed, 31 Jul 2024 09:01:33 +0000 (17:01 +0800)]
io_uring/rsrc: enable multi-hugepage buffer coalescing
Add support for checking and coalescing multi-hugepage-backed fixed
buffers. The coalescing optimizes both time and space consumption caused
by mapping and storing multi-hugepage fixed buffers.
A coalescable multi-hugepage buffer should fully cover its folios
(except potentially the first and last one), and these folios should
have the same size. These requirements are for easier processing later,
also we need same size'd chunks in io_import_fixed for fast iov_iter
adjust.
Signed-off-by: Chenliang Li <cliang01.li@samsung.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20240731090133.4106-3-cliang01.li@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chenliang Li [Wed, 31 Jul 2024 09:01:32 +0000 (17:01 +0800)]
io_uring/rsrc: store folio shift and mask into imu
Store the folio shift and folio mask into imu struct and use it in
iov_iter adjust, as we will have non PAGE_SIZE'd chunks if a
multi-hugepage buffer get coalesced.
Signed-off-by: Chenliang Li <cliang01.li@samsung.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20240731090133.4106-2-cliang01.li@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Olivier Langlois [Mon, 29 Jul 2024 22:38:33 +0000 (18:38 -0400)]
io_uring: add napi busy settings to the fdinfo output
This info may be useful when attempting to debug a problem involving a
ring using the NAPI feature.
Here is an example of the output:
ip-172-31-39-89 /proc/772/fdinfo # cat 14
pos: 0
flags:
02000002
mnt_id: 16
ino: 10243
SqMask: 0xff
SqHead: 633
SqTail: 633
CachedSqHead: 633
CqMask: 0x3fff
CqHead: 430250
CqTail: 430250
CachedCqTail: 430250
SQEs: 0
CQEs: 0
SqThread: 885
SqThreadCpu: 0
SqTotalTime:
52793826
SqWorkTime:
3590465
UserFiles: 0
UserBufs: 0
PollList:
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=6, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=6, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
CqOverflowList:
NAPI: enabled
napi_busy_poll_to: 1
napi_prefer_busy_poll: true
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bb184f8b62703ddd3e6e19eae7ab6c67b97e1e10.1722293317.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sun, 25 Aug 2024 07:07:11 +0000 (19:07 +1200)]
Linux 6.11-rc5
Linus Torvalds [Sun, 25 Aug 2024 05:20:48 +0000 (17:20 +1200)]
Merge tag 'bcachefs-2024-08-24' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
- assorted syzbot fixes
- some upgrade fixes for old (pre 1.0) filesystems
- fix for moving data off a device that was switched to durability=0
after data had been written to it.
- nocow deadlock fix
- fix for new rebalance_work accounting
* tag 'bcachefs-2024-08-24' of git://evilpiepirate.org/bcachefs: (28 commits)
bcachefs: Fix rebalance_work accounting
bcachefs: Fix failure to flush moves before sleeping in copygc
bcachefs: don't use rht_bucket() in btree_key_cache_scan()
bcachefs: add missing inode_walker_exit()
bcachefs: clear path->should_be_locked in bch2_btree_key_cache_drop()
bcachefs: Fix double assignment in check_dirent_to_subvol()
bcachefs: Fix refcounting in discard path
bcachefs: Fix compat issue with old alloc_v4 keys
bcachefs: Fix warning in bch2_fs_journal_stop()
fs/super.c: improve get_tree() error message
bcachefs: Fix missing validation in bch2_sb_journal_v2_validate()
bcachefs: Fix replay_now_at() assert
bcachefs: Fix locking in bch2_ioc_setlabel()
bcachefs: fix failure to relock in btree_node_fill()
bcachefs: fix failure to relock in bch2_btree_node_mem_alloc()
bcachefs: unlock_long() before resort in journal replay
bcachefs: fix missing bch2_err_str()
bcachefs: fix time_stats_to_text()
bcachefs: Fix bch2_bucket_gens_init()
bcachefs: Fix bch2_trigger_alloc assert
...
Linus Torvalds [Sun, 25 Aug 2024 00:15:04 +0000 (12:15 +1200)]
Merge tag '6.11-rc5-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- query directory flex array fix
- fix potential null ptr reference in open
- fix error message in some open cases
- two minor cleanups
* tag '6.11-rc5-server-fixes' of git://git.samba.org/ksmbd:
smb/server: update misguided comment of smb2_allocate_rsp_buf()
smb/server: remove useless assignment of 'file_present' in smb2_open()
smb/server: fix potential null-ptr-deref of lease_ctx_info in smb2_open()
smb/server: fix return value of smb2_open()
ksmbd: the buffer of smb2 query dir response has at least 1 byte
Linus Torvalds [Sun, 25 Aug 2024 00:05:23 +0000 (12:05 +1200)]
Merge tag 's390-6.11-4' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix KASLR base offset to account for symbol offsets in the vmlinux
ELF file, preventing tool breakages like the drgn debugger
- Fix potential memory corruption of physmem_info during kernel
physical address randomization
- Fix potential memory corruption due to overlap between the relocated
lowcore and identity mapping by correctly reserving lowcore memory
- Fix performance regression and avoid randomizing identity mapping
base by default
- Fix unnecessary delay of AP bus binding complete uevent to prevent
startup lag in KVM guests using AP
* tag 's390-6.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/boot: Fix KASLR base offset off by __START_KERNEL bytes
s390/boot: Avoid possible physmem_info segment corruption
s390/ap: Refine AP bus bindings complete processing
s390/mm: Pin identity mapping base to zero
s390/mm: Prevent lowcore vs identity mapping overlap
Linus Torvalds [Sun, 25 Aug 2024 00:00:16 +0000 (12:00 +1200)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"The important core fix is another tweak to our discard discovery
issues. The off by 512 in logical block count seems bad, but in fact
the inline was only ever used in debug prints, which is why no-one
noticed"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Do not attempt to configure discard unless LBPME is set
scsi: MAINTAINERS: Add header files to SCSI SUBSYSTEM
scsi: ufs: qcom: Add UFSHCD_QUIRK_BROKEN_LSDBS_CAP for SM8550 SoC
scsi: ufs: core: Add a quirk for handling broken LSDBS field in controller capabilities register
scsi: core: Fix the return value of scsi_logical_block_count()
scsi: MAINTAINERS: Update HiSilicon SAS controller driver maintainer
Kent Overstreet [Fri, 23 Aug 2024 19:35:22 +0000 (15:35 -0400)]
bcachefs: Fix rebalance_work accounting
rebalance_work was keying off of the presence of rebelance_opts in the
extent - but that was incorrect, we keep those around after rebalance
for indirect extents since the inode's options are not directly
available
Fixes:
20ac515a9cc7 ("bcachefs: bch_acct_rebalance_work")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 23 Aug 2024 21:38:41 +0000 (17:38 -0400)]
bcachefs: Fix failure to flush moves before sleeping in copygc
This fixes an apparent deadlock - rebalance would get stuck trying to
take nocow locks because they weren't being released by copygc.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Linus Torvalds [Sat, 24 Aug 2024 02:39:18 +0000 (10:39 +0800)]
Merge tag 'cgroup-for-6.11-rc4-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"Three patches addressing cpuset corner cases"
* tag 'cgroup-for-6.11-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Eliminate unncessary sched domains rebuilds in hotplug
cgroup/cpuset: Clear effective_xcpus on cpus_allowed clearing only if cpus.exclusive not set
cgroup/cpuset: fix panic caused by partcmd_update
Linus Torvalds [Sat, 24 Aug 2024 02:35:57 +0000 (10:35 +0800)]
Merge tag 'wq-for-6.11-rc4-fixes' of git://git./linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
"Nothing too interesting. One patch to remove spurious warning and
others to address static checker warnings"
* tag 'wq-for-6.11-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Correct declaration of cpu_pwq in struct workqueue_struct
workqueue: Fix spruious data race in __flush_work()
workqueue: Remove incorrect "WARN_ON_ONCE(!list_empty(&worker->entry));" from dying worker
workqueue: Fix UBSAN 'subtraction overflow' error in shift_and_mask()
workqueue: doc: Fix function name, remove markers
Linus Torvalds [Sat, 24 Aug 2024 02:10:43 +0000 (10:10 +0800)]
Merge tag 'mips-fixes_6.11_1' of git://git./linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- Set correct timer mode on Loongson64
- Only request r4k clockevent interrupt on one CPU
* tag 'mips-fixes_6.11_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: cevt-r4k: Don't call get_c0_compare_int if timer irq is installed
MIPS: Loongson64: Set timer mode in cpu-probe
Linus Torvalds [Sat, 24 Aug 2024 02:03:03 +0000 (10:03 +0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 kvm fixes from Catalin Marinas:
- Don't drop references on LPIs that weren't visited by the vgic-debug
iterator
- Cure lock ordering issue when unregistering vgic redistributors
- Fix for misaligned stage-2 mappings when VMs are backed by hugetlb
pages
- Treat SGI registers as UNDEFINED if a VM hasn't been configured for
GICv3
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
KVM: arm64: Make ICC_*SGI*_EL1 undef in the absence of a vGICv3
KVM: arm64: Ensure canonical IPA is hugepage-aligned when handling fault
KVM: arm64: vgic: Don't hold config_lock while unregistering redistributors
KVM: arm64: vgic-debug: Don't put unmarked LPIs
Linus Torvalds [Sat, 24 Aug 2024 01:03:25 +0000 (09:03 +0800)]
Merge tag 'nfs-for-6.11-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
- Fix rpcrdma refcounting in xa_alloc
- Fix rpcrdma usage of XA_FLAGS_ALLOC
- Fix requesting FATTR4_WORD2_OPEN_ARGUMENTS
- Fix attribute bitmap decoder to handle a 3rd word
- Add reschedule points when returning delegations to avoid soft lockups
- Fix clearing layout segments in layoutreturn
- Avoid unnecessary rescanning of the per-server delegation list
* tag 'nfs-for-6.11-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Avoid unnecessary rescanning of the per-server delegation list
NFSv4: Fix clearing of layout segments in layoutreturn
NFSv4: Add missing rescheduling points in nfs_client_return_marked_delegations
nfs: fix bitmap decoder to handle a 3rd word
nfs: fix the fetch of FATTR4_OPEN_ARGUMENTS
rpcrdma: Trace connection registration and unregistration
rpcrdma: Use XA_FLAGS_ALLOC instead of XA_FLAGS_ALLOC1
rpcrdma: Device kref is over-incremented on error from xa_alloc
Linus Torvalds [Sat, 24 Aug 2024 00:50:21 +0000 (08:50 +0800)]
Merge tag 'v6.11-rc4-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- fix refcount leak (can cause rmmod fail)
- fix byte range locking problem with cached reads
- fix for mount failure if reparse point unrecognized
- minor typo
* tag 'v6.11-rc4-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb/client: fix typo: GlobalMid_Sem -> GlobalMid_Lock
smb: client: ignore unhandled reparse tags
smb3: fix problem unloading module due to leaked refcount on shutdown
smb3: fix broken cached reads when posix locks
Linus Torvalds [Sat, 24 Aug 2024 00:15:21 +0000 (08:15 +0800)]
Merge tag 'input-for-v6.11-rc4' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a tweak to uinput interface to reject requests with abnormally large
number of slots. 100 slots/contacts should be enough for real devices
- support for FocalTech FT8201 added to the edt-ft5x06 driver
- tweaks to i8042 to handle more devices that have issue with its
emulation
- Synaptics touchpad switched to native SMbus/RMI mode on HP Elitebook
840 G2
- other minor fixes
* tag 'input-for-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: himax_hx83112b - fix incorrect size when reading product ID
Input: i8042 - use new forcenorestore quirk to replace old buggy quirk combination
Input: i8042 - add forcenorestore quirk to leave controller untouched even on s3
Input: i8042 - add Fujitsu Lifebook E756 to i8042 quirk table
Input: uinput - reject requests with unreasonable number of slots
Input: edt-ft5x06 - add support for FocalTech FT8201
dt-bindings: input: touchscreen: edt-ft5x06: Document FT8201 support
Input: adc-joystick - fix optional value handling
Input: synaptics - enable SMBus for HP Elitebook 840 G2
Input: ads7846 - ratelimit the spi_sync error message
Linus Torvalds [Sat, 24 Aug 2024 00:10:17 +0000 (08:10 +0800)]
Merge tag 'drm-fixes-2024-08-24' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly fixes. xe and msm are the major groups, with
amdgpu/i915/nouveau having smaller bits. xe has a bunch of hw
workaround fixes that were found to be missing, so that is why there
are a bunch of scattered fixes, and one larger one. But overall size
doesn't look too out of the ordinary.
msm:
- virtual plane fixes:
- drop yuv on hw where not supported
- csc vs yuv format fix
- rotation fix
- fix fb cleanup on close
- reset phy before link training
- fix visual corruption at 4K
- fix NULL ptr crash on hotplug
- simplify debug macros
- sc7180 fix
- adreno firmware name error path fix
amdgpu:
- GFX10 firmware loading fix
- SDMA 5.2 fix
- Debugfs parameter validation fix
- eGPU hotplug fix
i915:
- fix HDCP timeouts
nouveau:
- fix SG_DEBUG crash
xe:
- Fix OA format masks which were breaking build with gcc-5
- Fix opregion leak (Lucas)
- Fix OA sysfs entry (Ashutosh)
- Fix VM dma-resv lock (Brost)
- Fix tile fini sequence (Brost)
- Prevent UAF around preempt fence (Auld)
- Fix DGFX display suspend/resume (Maarten)
- Many Xe/Xe2 critical workarounds (Auld, Ngai-Mint, Bommu, Tejas, Daniele)
- Fix devm/drmm issues (Daniele)
- Fix missing workqueue destroy in xe_gt_pagefault (Stuart)
- Drop HW fence pointer to HW fence ctx (Brost)
- Free job before xe_exec_queue_put (Brost)"
* tag 'drm-fixes-2024-08-24' of https://gitlab.freedesktop.org/drm/kernel: (35 commits)
drm/xe: Free job before xe_exec_queue_put
drm/xe: Drop HW fence pointer to HW fence ctx
drm/xe: Fix missing workqueue destroy in xe_gt_pagefault
drm/amdgpu: fix eGPU hotplug regression
drm/amdgpu: Validate TA binary size
drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1
drm/amdgpu: fixing rlc firmware loading failure issue
drm/xe/uc: Use devm to register cleanup that includes exec_queues
drm/xe: use devm instead of drmm for managed bo
drm/xe/xe2hpg: Add Wa_14021821874
drm/xe: fix WA
14018094691
drm/xe/xe2: Add Wa_15015404425
drm/xe/xe2: Make subsequent L2 flush sequential
drm/xe/xe2lpg: Extend workaround
14021402888
drm/xe/xe2lpm: Extend Wa_16021639441
drm/xe/bmg: implement Wa_16023588340
drm/xe/oa/uapi: Make bit masks unsigned
drm/xe/display: Make display suspend/resume work on discrete
drm/xe: prevent UAF around preempt fence
drm/xe: Fix tile fini sequence
...
Linus Torvalds [Fri, 23 Aug 2024 23:49:14 +0000 (07:49 +0800)]
Merge tag 'block-6.11-
20240823' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith
- Remove unused struct field (Nilay)
- Fix fabrics keep-alive teardown order (Ming)
- Write zeroes fixes (John)
* tag 'block-6.11-
20240823' of git://git.kernel.dk/linux:
nvme: Remove unused field
nvme: move stopping keep-alive into nvme_uninit_ctrl()
block: Drop NULL check in bdev_write_zeroes_sectors()
block: Read max write zeroes once for __blkdev_issue_write_zeroes()
Linus Torvalds [Fri, 23 Aug 2024 23:45:08 +0000 (07:45 +0800)]
Merge tag 'io_uring-6.11-
20240823' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
"Just a single fix for provided buffer validation"
* tag 'io_uring-6.11-
20240823' of git://git.kernel.dk/linux:
io_uring/kbuf: sanitize peek buffer setup
Linus Torvalds [Fri, 23 Aug 2024 23:39:35 +0000 (07:39 +0800)]
Merge tag 'acpi-6.11-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix backlight control on a Dell All In One system where a backlight
controller board is attached to a UART port and the dell-uart
backlight driver binds to it, but the backlight is actually controlled
by other means (Hans de Goede)"
* tag 'acpi-6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Add backlight=native quirk for Dell OptiPlex 7760 AIO
platform/x86: dell-uart-backlight: Use acpi_video_get_backlight_type()
ACPI: video: Add Dell UART backlight controller detection
Linus Torvalds [Fri, 23 Aug 2024 23:26:28 +0000 (07:26 +0800)]
Merge tag 'thermal-6.11-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These fix error handling in the thermal debug code and OF node
reference leaks in the thermal OF driver.
Specifics:
- Use IS_ERR() in checks of debugfs_create_dir() return value instead
of checking it against NULL in the thermal debug code (Yang Ruibin)
- Fix three OF node reference leaks in thermal_of (Krzysztof
Kozlowski)"
* tag 'thermal-6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: of: Fix OF node leak in of_thermal_zone_find() error paths
thermal: of: Fix OF node leak in thermal_of_zone_register()
thermal: of: Fix OF node leak in thermal_of_trips_init() error path
thermal/debugfs: Fix the NULL vs IS_ERR() confusion in debugfs_create_dir()
Linus Torvalds [Fri, 23 Aug 2024 22:58:04 +0000 (06:58 +0800)]
Merge tag 'mmc-v6.11-rc1' of git://git./linux/kernel/git/ulfh/mmc
Pull mmc fixes from Ulf Hansson:
"MMC core:
- Fix NULL dereference for mmc_test on allocation failure
MMC host:
- dw_mmc: Fix support for deferred probe for biu/ciu clocks
- mtk-sd: Fix CMD8 support when fragile tuning settings"
* tag 'mmc-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: mmc_test: Fix NULL dereference on allocation failure
mmc: dw_mmc: allow biu and ciu clocks to defer
mmc: mtk-sd: receive cmd8 data when hs400 tuning fail
Linus Torvalds [Fri, 23 Aug 2024 22:56:06 +0000 (06:56 +0800)]
Merge tag 'spi-fix-v6.11-rc3' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A small collection of fixes here, all driver specific and none of them
too serious. For whatever reason runtime PM seems to have been causing
a bunch of issues recently"
* tag 'spi-fix-v6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: pxa2xx: Move PM runtime handling to the glue drivers
spi: pxa2xx: Do not override dev->platform_data on probe
spi: spi-fsl-lpspi: limit PRESCALE bit in TCR register
spi: spi-cadence-quadspi: Fix OSPI NOR failures during system resume
spi: zynqmp-gqspi: Scale timeout by data size
Linus Torvalds [Fri, 23 Aug 2024 09:48:27 +0000 (17:48 +0800)]
Merge tag 'pwrseq-fixes-for-v6.11-rc5' of git://git./linux/kernel/git/brgl/linux
Pull power sequencing fix from Bartosz Golaszewski:
- request the wlan-enable GPIO "as-is" to fix an issue with the wifi
module being already powered up before linux boots
* tag 'pwrseq-fixes-for-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
power: sequencing: request the WLAN enable GPIO as-is
Linus Torvalds [Fri, 23 Aug 2024 09:43:34 +0000 (17:43 +0800)]
Merge tag 'pmdomain-v6.11-rc2' of git://git./linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
- imx: Remove duplicated clocks for scu power domain
- imx: Wait for SSAR to complete power-on for i.MX93 power domain
* tag 'pmdomain-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: imx: wait SSAR when i.MX93 power domain on
pmdomain: imx: scu-pd: Remove duplicated clocks
Catalin Marinas [Fri, 23 Aug 2024 08:47:39 +0000 (09:47 +0100)]
Merge tag 'kvmarm-fixes-6.11-2' of git://git./linux/kernel/git/kvmarm/kvmarm into for-next/fixes
KVM/arm64 fixes for 6.11, round #2
- Don't drop references on LPIs that weren't visited by the
vgic-debug iterator
- Cure lock ordering issue when unregistering vgic redistributors
- Fix for misaligned stage-2 mappings when VMs are backed by hugetlb
pages
- Treat SGI registers as UNDEFINED if a VM hasn't been configured for
GICv3
* tag 'kvmarm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm:
KVM: arm64: Make ICC_*SGI*_EL1 undef in the absence of a vGICv3
KVM: arm64: Ensure canonical IPA is hugepage-aligned when handling fault
KVM: arm64: vgic: Don't hold config_lock while unregistering redistributors
KVM: arm64: vgic-debug: Don't put unmarked LPIs
KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface
KVM: selftests: arm64: Correct feature test for S1PIE in get-reg-list
KVM: arm64: Tidying up PAuth code in KVM
KVM: arm64: vgic-debug: Exit the iterator properly w/o LPI
KVM: arm64: Enforce dependency on an ARMv8.4-aware toolchain
docs: KVM: Fix register ID of SPSR_FIQ
KVM: arm64: vgic: fix unexpected unlock sparse warnings
KVM: arm64: fix kdoc warnings in W=1 builds
KVM: arm64: fix override-init warnings in W=1 builds
KVM: arm64: free kvm->arch.nested_mmus with kvfree()
Linus Torvalds [Fri, 23 Aug 2024 02:25:29 +0000 (10:25 +0800)]
Merge tag 'ata-6.11-rc5' of git://git./linux/kernel/git/libata/linux
Pull ata fixes from Damien Le Moal:
- Fix the max segment size and max number of segments supported by the
pata_macio driver to fix issues with BIO splitting leading to an
overflow of the adapter DMA table (from Michael)
- Related to the previous fix, change BUG_ON() calls for incorrect
command buffer segmentation into WARN_ON() and an error return (from
Michael)
* tag 'ata-6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: pata_macio: Use WARN instead of BUG
ata: pata_macio: Fix DMA table overflow
Linus Torvalds [Thu, 22 Aug 2024 23:47:01 +0000 (07:47 +0800)]
Merge tag 'net-6.11-rc5' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and netfilter.
Current release - regressions:
- virtio_net: avoid crash on resume - move netdev_tx_reset_queue()
call before RX napi enable
Current release - new code bugs:
- net/mlx5e: fix page leak and incorrect header release w/ HW GRO
Previous releases - regressions:
- udp: fix receiving fraglist GSO packets
- tcp: prevent refcount underflow due to concurrent execution of
tcp_sk_exit_batch()
Previous releases - always broken:
- ipv6: fix possible UAF when incrementing error counters on output
- ip6: tunnel: prevent merging of packets with different L2
- mptcp: pm: fix IDs not being reusable
- bonding: fix potential crashes in IPsec offload handling
- Bluetooth: HCI:
- MGMT: add error handling to pair_device() to avoid a crash
- invert LE State quirk to be opt-out rather then opt-in
- fix LE quote calculation
- drv: dsa: VLAN fixes for Ocelot driver
- drv: igb: cope with large MAX_SKB_FRAGS Kconfig settings
- drv: ice: fi Rx data path on architectures with PAGE_SIZE >= 8192
Misc:
- netpoll: do not export netpoll_poll_[disable|enable]()
- MAINTAINERS: update the list of networking headers"
* tag 'net-6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (82 commits)
s390/iucv: Fix vargs handling in iucv_alloc_device()
net: ovs: fix ovs_drop_reasons error
net: xilinx: axienet: Fix dangling multicast addresses
net: xilinx: axienet: Always disable promiscuous mode
MAINTAINERS: Mark JME Network Driver as Odd Fixes
MAINTAINERS: Add header files to NETWORKING sections
MAINTAINERS: Add limited globs for Networking headers
MAINTAINERS: Add net_tstamp.h to SOCKET TIMESTAMPING section
MAINTAINERS: Add sonet.h to ATM section of MAINTAINERS
octeontx2-af: Fix CPT AF register offset calculation
net: phy: realtek: Fix setting of PHY LEDs Mode B bit on RTL8211F
net: ngbe: Fix phy mode set to external phy
netfilter: flowtable: validate vlan header
bnxt_en: Fix double DMA unmapping for XDP_REDIRECT
ipv6: prevent possible UAF in ip6_xmit()
ipv6: fix possible UAF in ip6_finish_output2()
ipv6: prevent UAF in ip6_send_skb()
netpoll: do not export netpoll_poll_[disable|enable]()
selftests: mlxsw: ethtool_lanes: Source ethtool lib from correct path
udp: fix receiving fraglist GSO packets
...
Linus Torvalds [Thu, 22 Aug 2024 23:43:15 +0000 (07:43 +0800)]
Merge tag 'kbuild-fixes-v6.11-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Eliminate the fdtoverlay command duplication in scripts/Makefile.lib
- Fix 'make compile_commands.json' for external modules
- Ensure scripts/kconfig/merge_config.sh handles missing newlines
- Fix some build errors on macOS
* tag 'kbuild-fixes-v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: fix typos "prequisites" to "prerequisites"
Documentation/llvm: turn make command for ccache into code block
kbuild: avoid scripts/kallsyms parsing /dev/null
treewide: remove unnecessary <linux/version.h> inclusion
scripts: kconfig: merge_config: config files: add a trailing newline
Makefile: add $(srctree) to dependency of compile_commands.json target
kbuild: clean up code duplication in cmd_fdtoverlay
Dave Airlie [Thu, 22 Aug 2024 23:11:52 +0000 (09:11 +1000)]
Merge tag 'drm-xe-fixes-2024-08-22' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
UAPI Changes:
- Fix OA format masks which were breaking build with gcc-5 (Geert)
Driver Changes:
- Fix opregion leak (Lucas)
- Fix OA sysfs entry (Ashutosh)
- Fix VM dma-resv lock (Brost)
- Fix tile fini sequence (Brost)
- Prevent UAF around preempt fence (Auld)
- Fix DGFX display suspend/resume (Maarten)
- Many Xe/Xe2 critical workarounds (Auld, Ngai-Mint, Bommu, Tejas, Daniele)
- Fix devm/drmm issues (Daniele)
- Fix missing workqueue destroy in xe_gt_pagefault (Stuart)
- Drop HW fence pointer to HW fence ctx (Brost)
- Free job before xe_exec_queue_put (Brost)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZsdVe0XI2Pq8C-ON@intel.com
Dave Airlie [Thu, 22 Aug 2024 23:08:18 +0000 (09:08 +1000)]
Merge tag 'drm-misc-fixes-2024-08-22' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:
nouveau:
- firmware: use dma non-coherent allocator
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240822123907.GA234335@localhost.localdomain
Dave Airlie [Thu, 22 Aug 2024 23:05:11 +0000 (09:05 +1000)]
Merge tag 'drm-intel-fixes-2024-08-22' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Fix for HDCP timeouts
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZsbPMm6XfzimmZW0@jlahtine-mobl.ger.corp.intel.com
Dave Airlie [Thu, 22 Aug 2024 22:58:17 +0000 (08:58 +1000)]
Merge tag 'amd-drm-fixes-6.11-2024-08-21' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.11-2024-08-21:
amdgpu:
- GFX10 firmware loading fix
- SDMA 5.2 fix
- Debugfs parameter validation fix
- eGPU hotplug fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240821172810.302416-1-alexander.deucher@amd.com
Jens Axboe [Thu, 22 Aug 2024 22:20:24 +0000 (16:20 -0600)]
Merge tag 'nvme-6.11-2024-08-22' of git://git.infradead.org/nvme into block-6.11
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.11
- Remove unused struct field (Nilay)
- Fix fabrics keep-alive teardown order (Ming)"
* tag 'nvme-6.11-2024-08-22' of git://git.infradead.org/nvme:
nvme: Remove unused field
nvme: move stopping keep-alive into nvme_uninit_ctrl()
Trond Myklebust [Wed, 21 Aug 2024 18:05:02 +0000 (14:05 -0400)]
NFS: Avoid unnecessary rescanning of the per-server delegation list
If the call to nfs_delegation_grab_inode() fails, we will not have
dropped any locks that require us to rescan the list.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Wed, 21 Aug 2024 18:05:01 +0000 (14:05 -0400)]
NFSv4: Fix clearing of layout segments in layoutreturn
Make sure that we clear the layout segments in cases where we see a
fatal error, and also in the case where the layout is invalid.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Wed, 21 Aug 2024 18:05:00 +0000 (14:05 -0400)]
NFSv4: Add missing rescheduling points in nfs_client_return_marked_delegations
We're seeing reports of soft lockups when iterating through the loops,
so let's add rescheduling points.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Jeff Layton [Wed, 21 Aug 2024 12:28:25 +0000 (08:28 -0400)]
nfs: fix bitmap decoder to handle a 3rd word
It only decodes the first two words at this point. Have it decode the
third word as well. Without this, the client doesn't send delegated
timestamps in the CB_GETATTR response.
With this change we also need to expand the on-stack bitmap in
decode_recallany_args to 3 elements, in case the server sends a larger
bitmap than expected.
Fixes:
43df7110f4a9 ("NFSv4: Add CB_GETATTR support for delegated attributes")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Jeff Layton [Thu, 15 Aug 2024 14:18:41 +0000 (10:18 -0400)]
nfs: fix the fetch of FATTR4_OPEN_ARGUMENTS
The client doesn't properly request FATTR4_OPEN_ARGUMENTS in the initial
SERVER_CAPS getattr. Add FATTR4_WORD2_OPEN_ARGUMENTS to the initial
request.
Fixes:
707f13b3d081 (NFSv4: Add support for the FATTR4_OPEN_ARGUMENTS attribute)
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Dave Airlie [Thu, 22 Aug 2024 20:46:28 +0000 (06:46 +1000)]
Merge tag 'drm-msm-fixes-2024-08-19' of https://gitlab.freedesktop.org/drm/msm into drm-fixes
Fixes for v6.11-rc5
1) Fixes from the virtual plane series, namely
- fix the list of formats for QCM2290 since it has no YUV support
- minor fix in dpu_plane_atomic_check_pipe() to check only for csc and
not csc and scaler while allowing yuv formats
- take rotation into account while allocating virtual planes
2) Fix to cleanup FB if dpu_format_populate_layout() fails. This fixes the
warning splat during DRM file closure
3) Fix to reset the phy link params before re-starting link training. This
fixes the 100% link training failure when someone starts modetest while
cable is connected
4) Long pending fix to fix a visual corruption seen for 4k modes. Root-cause
was we cannot support 4k@30 with 30bpp with 2 lanes so this is a critical
fix to use 24bpp for such cases
5) Fix to move dpu encoder's connector assignment to atomic_enable(). This
fixes the NULL ptr crash for cases when there is an atomic_enable()
without atomic_modeset() after atomic_disable() . This happens for
connectors_changed case of crtc. It fixes a NULL ptr crash reported
during hotplug.
6) Fix to simplify DPU's debug macros without which dynamic debug does not
work as expected
7) Fix the highest bank bit setting for sc7180
8) adreno: fix error return if missing firmware-name
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGvxF2p3-AsjUydmSYrA0Vb+Ea7nh3VtNX0pT0Ae_Me-Kw@mail.gmail.com
ChenXiaoSong [Tue, 20 Aug 2024 14:33:15 +0000 (14:33 +0000)]
smb/client: fix typo: GlobalMid_Sem -> GlobalMid_Lock
The comments have typos, fix that to not confuse readers.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Nilay Shroff [Wed, 14 Aug 2024 13:56:50 +0000 (19:26 +0530)]
nvme: Remove unused field
The "name" field in struct nvme_ctrl is unsued so removing it.
This would help save 12 bytes of space for each nvme_ctrl instance
created.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Ming Lei [Tue, 13 Aug 2024 01:35:27 +0000 (09:35 +0800)]
nvme: move stopping keep-alive into nvme_uninit_ctrl()
Commit
4733b65d82bd ("nvme: start keep-alive after admin queue setup")
moves starting keep-alive from nvme_start_ctrl() into
nvme_init_ctrl_finish(), but don't move stopping keep-alive into
nvme_uninit_ctrl(), so keep-alive work can be started and keep pending
after failing to start controller, finally use-after-free is triggered if
nvme host driver is unloaded.
This patch fixes kernel panic when running nvme/004 in case that connection
failure is triggered, by moving stopping keep-alive into nvme_uninit_ctrl().
This way is reasonable because keep-alive is now started in
nvme_init_ctrl_finish().
Fixes:
3af755a46881 ("nvme: move nvme_stop_keep_alive() back to original position")
Cc: Hannes Reinecke <hare@suse.de>
Cc: Mark O'Donovan <shiftee@posteo.net>
Reported-by: Changhui Zhong <czhong@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Alexandra Winter [Wed, 21 Aug 2024 09:13:37 +0000 (11:13 +0200)]
s390/iucv: Fix vargs handling in iucv_alloc_device()
iucv_alloc_device() gets a format string and a varying number of
arguments. This is incorrectly forwarded by calling dev_set_name() with
the format string and a va_list, while dev_set_name() expects also a
varying number of arguments.
Symptoms:
Corrupted iucv device names, which can result in log messages like:
sysfs: cannot create duplicate filename '/devices/iucv/hvc_iucv1827699952'
Fixes:
4452e8ef8c36 ("s390/iucv: Provide iucv_alloc_device() / iucv_release_device()")
Link: https://bugzilla.suse.com/show_bug.cgi?id=1228425
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Thorsten Winkler <twinkler@linux.ibm.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20240821091337.3627068-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Menglong Dong [Wed, 21 Aug 2024 12:32:52 +0000 (20:32 +0800)]
net: ovs: fix ovs_drop_reasons error
There is something wrong with ovs_drop_reasons. ovs_drop_reasons[0] is
"OVS_DROP_LAST_ACTION", but OVS_DROP_LAST_ACTION == __OVS_DROP_REASON + 1,
which means that ovs_drop_reasons[1] should be "OVS_DROP_LAST_ACTION".
And as Adrian tested, without the patch, adding flow to drop packets
results in:
drop at: do_execute_actions+0x197/0xb20 [openvsw (0xffffffffc0db6f97)
origin: software
input port ifindex: 8
timestamp: Tue Aug 20 10:19:17 2024
859853461 nsec
protocol: 0x800
length: 98
original length: 98
drop reason: OVS_DROP_ACTION_ERROR
With the patch, the same results in:
drop at: do_execute_actions+0x197/0xb20 [openvsw (0xffffffffc0db6f97)
origin: software
input port ifindex: 8
timestamp: Tue Aug 20 10:16:13 2024
475856608 nsec
protocol: 0x800
length: 98
original length: 98
drop reason: OVS_DROP_LAST_ACTION
Fix this by initializing ovs_drop_reasons with index.
Fixes:
9d802da40b7c ("net: openvswitch: add last-action drop reason")
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Tested-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240821123252.186305-1-dongml2@chinatelecom.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 22 Aug 2024 20:06:24 +0000 (13:06 -0700)]
Merge tag 'nf-24-08-22' of git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
Patch #1 disable BH when collecting stats via hardware offload to ensure
concurrent updates from packet path do not result in losing stats.
From Sebastian Andrzej Siewior.
Patch #2 uses write seqcount to reset counters serialize against reader.
Also from Sebastian Andrzej Siewior.
Patch #3 ensures vlan header is in place before accessing its fields,
according to KMSAN splat triggered by syzbot.
* tag 'nf-24-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: flowtable: validate vlan header
netfilter: nft_counter: Synchronize nft_counter_reset() against reader.
netfilter: nft_counter: Disable BH in nft_counter_offload_stats().
====================
Link: https://patch.msgid.link/20240822101842.4234-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 22 Aug 2024 20:03:59 +0000 (13:03 -0700)]
Merge branch 'net-xilinx-axienet-multicast-fixes-and-improvements'
Sean Anderson says:
====================
net: xilinx: axienet: Multicast fixes and improvements [part]
====================
First two patches of the series which are fixes.
Link: https://patch.msgid.link/20240822154059.1066595-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Thu, 22 Aug 2024 15:40:56 +0000 (11:40 -0400)]
net: xilinx: axienet: Fix dangling multicast addresses
If a multicast address is removed but there are still some multicast
addresses, that address would remain programmed into the frame filter.
Fix this by explicitly setting the enable bit for each filter.
Fixes:
8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822154059.1066595-3-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Thu, 22 Aug 2024 15:40:55 +0000 (11:40 -0400)]
net: xilinx: axienet: Always disable promiscuous mode
If promiscuous mode is disabled when there are fewer than four multicast
addresses, then it will not be reflected in the hardware. Fix this by
always clearing the promiscuous mode flag even when we program multicast
addresses.
Fixes:
8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822154059.1066595-2-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Krzysztof Kozlowski [Wed, 14 Aug 2024 19:58:23 +0000 (21:58 +0200)]
thermal: of: Fix OF node leak in of_thermal_zone_find() error paths
Terminating for_each_available_child_of_node() loop requires dropping OF
node reference, so bailing out on errors misses this. Solve the OF node
reference leak with scoped for_each_available_child_of_node_scoped().
Fixes:
3fd6d6e2b4e8 ("thermal/of: Rework the thermal device tree initialization")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20240814195823.437597-3-krzysztof.kozlowski@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Krzysztof Kozlowski [Wed, 14 Aug 2024 19:58:22 +0000 (21:58 +0200)]
thermal: of: Fix OF node leak in thermal_of_zone_register()
thermal_of_zone_register() calls of_thermal_zone_find() which will
iterate over OF nodes with for_each_available_child_of_node() to find
matching thermal zone node. When it finds such, it exits the loop and
returns the node. Prematurely ending for_each_available_child_of_node()
loops requires dropping OF node reference, thus success of
of_thermal_zone_find() means that caller must drop the reference.
Fixes:
3fd6d6e2b4e8 ("thermal/of: Rework the thermal device tree initialization")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20240814195823.437597-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Krzysztof Kozlowski [Wed, 14 Aug 2024 19:58:21 +0000 (21:58 +0200)]
thermal: of: Fix OF node leak in thermal_of_trips_init() error path
Terminating for_each_child_of_node() loop requires dropping OF node
reference, so bailing out after thermal_of_populate_trip() error misses
this. Solve the OF node reference leak with scoped
for_each_child_of_node_scoped().
Fixes:
d0c75fa2c17f ("thermal/of: Initialize trip points separately")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20240814195823.437597-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Masahiro Yamada [Sun, 18 Aug 2024 07:07:11 +0000 (16:07 +0900)]
kbuild: fix typos "prequisites" to "prerequisites"
This typo in scripts/Makefile.build has been present for more than 20
years. It was accidentally copy-pasted to other scripts/Makefile.* files.
Fix them all.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Paulo Alcantara [Wed, 21 Aug 2024 03:45:03 +0000 (00:45 -0300)]
smb: client: ignore unhandled reparse tags
Just ignore reparse points that the client can't parse rather than
bailing out and not opening the file or directory.
Reported-by: Marc <1marc1@gmail.com>
Closes: https://lore.kernel.org/r/CAMHwNVv-B+Q6wa0FEXrAuzdchzcJRsPKDDRrNaYZJd6X-+iJzw@mail.gmail.com
Fixes:
539aad7f14da ("smb: client: introduce ->parse_reparse_point()")
Tested-by: Anthony Nandaa (Microsoft) <profnandaa@gmail.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Fri, 16 Aug 2024 21:47:39 +0000 (16:47 -0500)]
smb3: fix problem unloading module due to leaked refcount on shutdown
The shutdown ioctl can leak a refcount on the tlink which can
prevent rmmod (unloading the cifs.ko) module from working.
Found while debugging xfstest generic/043
Fixes:
69ca1f57555f ("smb3: add dynamic tracepoints for shutdown ioctl")
Reviewed-by: Meetakshi Setiya <msetiya@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Alexander Gordeev [Wed, 21 Aug 2024 16:55:07 +0000 (18:55 +0200)]
s390/boot: Fix KASLR base offset off by __START_KERNEL bytes
Symbol offsets to the KASLR base do not match symbol address in
the vmlinux image. That is the result of setting the KASLR base
to the beginning of .text section as result of an optimization.
Revert that optimization and allocate virtual memory for the
whole kernel image including __START_KERNEL bytes as per the
linker script. That allows keeping the semantics of the KASLR
base offset in sync with other architectures.
Rename __START_KERNEL to TEXT_OFFSET, since it represents the
offset of the .text section within the kernel image, rather than
a virtual address.
Still skip mapping TEXT_OFFSET bytes to save memory on pgtables
and provoke exceptions in case an attempt to access this area is
made, as no kernel symbol may reside there.
In case CONFIG_KASAN is enabled the location counter might exceed
the value of TEXT_OFFSET, while the decompressor linker script
forcefully resets it to TEXT_OFFSET, which leads to a sections
overlap link failure. Use MAX() expression to avoid that.
Reported-by: Omar Sandoval <osandov@osandov.com>
Closes: https://lore.kernel.org/linux-s390/ZnS8dycxhtXBZVky@telecaster.dhcp.thefacebook.com/
Fixes:
56b1069c40c7 ("s390/boot: Rework deployment of the kernel image")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Alexander Gordeev [Wed, 21 Aug 2024 16:55:06 +0000 (18:55 +0200)]
s390/boot: Avoid possible physmem_info segment corruption
When physical memory for the kernel image is allocated it does not
consider extra memory required for offsetting the image start to
match it with the lower 20 bits of KASLR virtual base address. That
might lead to kernel access beyond its memory range.
Suggested-by: Vasily Gorbik <gor@linux.ibm.com>
Fixes:
693d41f7c938 ("s390/mm: Restore mapping of kernel image using large pages")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
ChenXiaoSong [Thu, 22 Aug 2024 08:20:54 +0000 (08:20 +0000)]
smb/server: update misguided comment of smb2_allocate_rsp_buf()
smb2_allocate_rsp_buf() will return other error code except -ENOMEM.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
ChenXiaoSong [Thu, 22 Aug 2024 08:20:52 +0000 (08:20 +0000)]
smb/server: remove useless assignment of 'file_present' in smb2_open()
The variable is already true here.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
ChenXiaoSong [Thu, 22 Aug 2024 08:20:51 +0000 (08:20 +0000)]
smb/server: fix potential null-ptr-deref of lease_ctx_info in smb2_open()
null-ptr-deref will occur when (req_op_level == SMB2_OPLOCK_LEVEL_LEASE)
and parse_lease_state() return NULL.
Fix this by check if 'lease_ctx_info' is NULL.
Additionally, remove the redundant parentheses in
parse_durable_handle_context().
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
ChenXiaoSong [Thu, 22 Aug 2024 08:20:50 +0000 (08:20 +0000)]
smb/server: fix return value of smb2_open()
In most error cases, error code is not returned in smb2_open(),
__process_request() will not print error message.
Fix this by returning the correct value at the end of smb2_open().
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Tue, 20 Aug 2024 13:07:38 +0000 (22:07 +0900)]
ksmbd: the buffer of smb2 query dir response has at least 1 byte
When STATUS_NO_MORE_FILES status is set to smb2 query dir response,
->StructureSize is set to 9, which mean buffer has 1 byte.
This issue occurs because ->Buffer[1] in smb2_query_directory_rsp to
flex-array.
Fixes:
eb3e28c1e89b ("smb3: Replace smb2pdu 1-element arrays with flex-arrays")
Cc: stable@vger.kernel.org # v6.1+
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Kent Overstreet [Mon, 19 Aug 2024 20:41:00 +0000 (16:41 -0400)]
bcachefs: don't use rht_bucket() in btree_key_cache_scan()
rht_bucket() does strange complicated things when a rehash is in
progress.
Instead, just skip scanning when a rehash is in progress: scanning is
going to be more expensive (many more empty slots to cover), and some
sort of infinite loop is being observed
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 22 Aug 2024 07:57:39 +0000 (03:57 -0400)]
bcachefs: add missing inode_walker_exit()
fix a small leak
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Paolo Abeni [Thu, 22 Aug 2024 13:24:07 +0000 (15:24 +0200)]
Merge branch 'maintainers-networking-updates'
Simon Horman says:
====================
MAINTAINERS: Networking updates
This series includes Networking-related updates to MAINTAINERS.
* Patches 1-4 aim to assign header files with "*net*' and '*skbuff*'
in their name to Networking-related sections within Maintainers.
There are a few such files left over after this patches.
I have to sent separate patches to add them to SCSI SUBSYSTEM
and NETWORKING DRIVERS (WIRELESS) sections [1][2].
[1] https://lore.kernel.org/linux-scsi/
20240816-scsi-mnt-v1-1-
439af8b1c28b@kernel.org/
[2] https://lore.kernel.org/linux-wireless/
20240816-wifi-mnt-v1-1-
3fb3bf5d44aa@kernel.org/
* Patch 5 updates the status of the JME driver to 'Odd Fixes'
====================
Link: https://patch.msgid.link/20240821-net-mnt-v2-0-59a5af38e69d@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>