linux-block.git
13 months agoio_uring/rsrc: refactor io_rsrc_node_switch
Pavel Begunkov [Tue, 11 Apr 2023 11:06:07 +0000 (12:06 +0100)]
io_uring/rsrc: refactor io_rsrc_node_switch

We use io_rsrc_node_switch() coupled with io_rsrc_node_switch_start()
for a bunch of cases including initialising ctx->rsrc_node, i.e. by
passing NULL instead of rsrc_data. Leave it to only deal with actual
node changing.

For that, first remove it from io_uring_create() and add a function
allocating the first node. Then also remove all calls to
io_rsrc_node_switch() from files/buffers register as we already have a
node installed and it does essentially nothing.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d146fe306ff98b1a5a60c997c252534f03d423d7.1681210788.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/rsrc: zero node's rsrc data on alloc
Pavel Begunkov [Tue, 11 Apr 2023 11:06:06 +0000 (12:06 +0100)]
io_uring/rsrc: zero node's rsrc data on alloc

struct io_rsrc_node::rsrc_data field is initialised on rsrc removal and
shouldn't be used before that, still let's play safe and zero the field
on alloc.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/09bd03cedc8da8a7974c5e6e4bf0489fd16593ab.1681210788.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/rsrc: consolidate node caching
Pavel Begunkov [Tue, 11 Apr 2023 11:06:05 +0000 (12:06 +0100)]
io_uring/rsrc: consolidate node caching

We store one pre-allocated rsrc node in ->rsrc_backup_node, merge it
with ->rsrc_node_cache.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6d5410e51ccd29be7a716be045b51d6b371baef6.1681210788.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/rsrc: add lockdep checks
Pavel Begunkov [Tue, 11 Apr 2023 11:06:04 +0000 (12:06 +0100)]
io_uring/rsrc: add lockdep checks

Add a lockdep chek to make sure that file and buffer updates hold
->uring_lock.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/961bbe6e433ec9bc0375127f23468b37b729df99.1681210788.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: add irq lockdep checks
Pavel Begunkov [Tue, 11 Apr 2023 11:06:03 +0000 (12:06 +0100)]
io_uring: add irq lockdep checks

We don't post CQEs from the IRQ context, add a check catching that.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f23f7a24dbe8027b3d37873fece2b6488f878b31.1681210788.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/kbuf: remove extra ->buf_ring null check
Pavel Begunkov [Tue, 11 Apr 2023 11:06:02 +0000 (12:06 +0100)]
io_uring/kbuf: remove extra ->buf_ring null check

The kernel test robot complains about __io_remove_buffers().

io_uring/kbuf.c:221 __io_remove_buffers() warn: variable dereferenced
before check 'bl->buf_ring' (see line 219)

That check is not needed as ->buf_ring will always be set, so we can
remove it and so silence the warning.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9a632bbf749d9d911e605255652ce08d18e7d2c6.1681210788.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: shut io_prep_async_work warning
Pavel Begunkov [Tue, 11 Apr 2023 11:06:01 +0000 (12:06 +0100)]
io_uring: shut io_prep_async_work warning

io_uring/io_uring.c:432 io_prep_async_work() error: we previously
assumed 'req->file' could be null (see line 425).

Even though it's a false positive as there will not be REQ_F_ISREG set
without a file, let's add a simple check to make the kernel test robot
happy. We don't care about performance here, but assumingly it'll be
optimised out by the compiler.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a6cfbe92c74b789c0b4f046f7f98d19b1ca2e5b7.1681210788.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/uring_cmd: take advantage of completion batching
Jens Axboe [Wed, 12 Apr 2023 18:07:36 +0000 (12:07 -0600)]
io_uring/uring_cmd: take advantage of completion batching

We know now what the completion context is for the uring_cmd completion
handling, so use that to have io_req_task_complete() decide what the
best way to complete the request is. This allows batching of the posted
completions if we have multiple pending, rather than always doing them
one-by-one.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: optimise io_req_local_work_add
Pavel Begunkov [Thu, 6 Apr 2023 13:20:14 +0000 (14:20 +0100)]
io_uring: optimise io_req_local_work_add

Chains of memory accesses are never good for performance.
The req->task->io_uring->in_cancel in io_req_local_work_add() is there
so that when a task is exiting via io_uring_try_cancel_requests() and
starts waiting for completions, it gets woken up by every new task_work
item queued.

Do a little trick by announcing waiting in io_uring_try_cancel_requests(),
making io_req_local_work_add() wake us up. We also need to check for
deferred tw items after prepare_to_wait(TASK_INTERRUPTIBLE);

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fb11597e9bbcb365901824f8c5c2cf0d6ee100d0.1680782017.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: refactor __io_cq_unlock_post_flush()
Pavel Begunkov [Thu, 6 Apr 2023 13:20:13 +0000 (14:20 +0100)]
io_uring: refactor __io_cq_unlock_post_flush()

Separate ->task_complete path in __io_cq_unlock_post_flush().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/baa9b8d822f024e4ee01c40209dbbe38d9c8c11d.1680782017.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: reduce scheduling due to tw
Pavel Begunkov [Thu, 6 Apr 2023 13:20:12 +0000 (14:20 +0100)]
io_uring: reduce scheduling due to tw

Every task_work will try to wake the task to be executed, which causes
excessive scheduling and additional overhead. For some tw it's
justified, but others won't do much but post a single CQE.

When a task waits for multiple cqes, every such task_work will wake it
up. Instead, the task may give a hint about how many cqes it waits for,
io_req_local_work_add() will compare against it and skip wake ups
if #cqes + #tw is not enough to satisfy the waiting condition. Task_work
that uses the optimisation should be simple enough and never post more
than one CQE. It's also ignored for non DEFER_TASKRUN rings.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d2b77e99d1e86624d8a69f7037d764b739dcd225.1680782017.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: inline llist_add()
Pavel Begunkov [Thu, 6 Apr 2023 13:20:11 +0000 (14:20 +0100)]
io_uring: inline llist_add()

We'll need to grab some information from the previous request in the tw
list, inline llist_add(), it'll be used in the following patch.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f0165493af7b379943c792114b972f331e7d7d10.1680782017.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: add tw add flags
Pavel Begunkov [Thu, 6 Apr 2023 13:20:10 +0000 (14:20 +0100)]
io_uring: add tw add flags

We pass 'allow_local' into io_req_task_work_add() but will need more
flags. Replace it with a flags bit field and name this allow_local
flag.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4c0f01e7ef4e6feebfb199093cc995af7a19befa.1680782017.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: refactor io_cqring_wake()
Pavel Begunkov [Thu, 6 Apr 2023 13:20:09 +0000 (14:20 +0100)]
io_uring: refactor io_cqring_wake()

Instead of smp_mb() + __io_cqring_wake() in __io_cq_unlock_post_flush()
use equivalent io_cqring_wake(). With that we can clean it up further
and remove __io_cqring_wake().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/662ee5d898168ac206be06038525e97b64072a46.1680782017.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: optimize local tw add ctx pinning
Pavel Begunkov [Thu, 6 Apr 2023 13:20:08 +0000 (14:20 +0100)]
io_uring: optimize local tw add ctx pinning

We currently pin the ctx for io_req_local_work_add() with
percpu_ref_get/put, which implies two rcu_read_lock/unlock pairs and some
extra overhead on top in the fast path. Replace it with a pure rcu read
and let io_ring_exit_work() synchronise against it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cbdfcb6b232627f30e9e50ef91f13c4f05910247.1680782017.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: move pinning out of io_req_local_work_add
Pavel Begunkov [Thu, 6 Apr 2023 13:20:07 +0000 (14:20 +0100)]
io_uring: move pinning out of io_req_local_work_add

Move ctx pinning from io_req_local_work_add() to the caller, looks
better and makes working with the code a bit easier.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/49c0dbed390b0d6d04cb942dd3592879fd5bfb1b.1680782017.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/uring_cmd: assign ioucmd->cmd at async prep time
Jens Axboe [Wed, 5 Apr 2023 14:21:45 +0000 (08:21 -0600)]
io_uring/uring_cmd: assign ioucmd->cmd at async prep time

Rather than check this in the fast path issue, it makes more sense to
just assign the copy of the data when we're setting it up anyway. This
makes the code a bit cleaner, and removes the need for this check in
the issue path.

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/rsrc: add custom limit for node caching
Pavel Begunkov [Tue, 4 Apr 2023 12:39:57 +0000 (13:39 +0100)]
io_uring/rsrc: add custom limit for node caching

The number of entries in the rsrc node cache is limited to 512, which
still seems unnecessarily large. Add per cache thresholds and set to
to 32 for the rsrc node cache.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d0cd538b944dac0bf878e276fc0199f21e6bccea.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/rsrc: optimise io_rsrc_data refcounting
Pavel Begunkov [Tue, 4 Apr 2023 12:39:56 +0000 (13:39 +0100)]
io_uring/rsrc: optimise io_rsrc_data refcounting

Every struct io_rsrc_node takes a struct io_rsrc_data reference, which
means all rsrc updates do 2 extra atomics. Replace atomics refcounting
with a int as it's all done under ->uring_lock.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e73c3d6820cf679532696d790b5b8fae23537213.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/rsrc: add lockdep sanity checks
Pavel Begunkov [Tue, 4 Apr 2023 12:39:55 +0000 (13:39 +0100)]
io_uring/rsrc: add lockdep sanity checks

We should hold ->uring_lock while putting nodes with io_put_rsrc_node(),
add a lockdep check for that.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b50d5f156ac41450029796738c1dfd22a521df7a.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/rsrc: cache struct io_rsrc_node
Pavel Begunkov [Tue, 4 Apr 2023 12:39:54 +0000 (13:39 +0100)]
io_uring/rsrc: cache struct io_rsrc_node

Add allocation cache for struct io_rsrc_node, it's always allocated and
put under ->uring_lock, so it doesn't need any extra synchronisation
around caches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/252a9d9ef9654e6467af30fdc02f57c0118fb76e.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/rsrc: don't offload node free
Pavel Begunkov [Tue, 4 Apr 2023 12:39:53 +0000 (13:39 +0100)]
io_uring/rsrc: don't offload node free

struct delayed_work rsrc_put_work was previously used to offload node
freeing because io_rsrc_node_ref_zero() was previously called by RCU in
the IRQ context. Now, as percpu refcounting is gone, we can do it
eagerly at the spot without pushing it to a worker.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/13fb1aac1e8d068ad8fd4a0c6d0d157ab61b90c0.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/rsrc: optimise io_rsrc_put allocation
Pavel Begunkov [Tue, 4 Apr 2023 12:39:52 +0000 (13:39 +0100)]
io_uring/rsrc: optimise io_rsrc_put allocation

Every io_rsrc_node keeps a list of items to put, and all entries are
kmalloc()'ed. However, it's quite often to queue up only one entry per
node, so let's add an inline entry there to avoid extra allocations.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c482c1c652c45c85ac52e67c974bc758a50fed5f.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/rsrc: rename rsrc_list
Pavel Begunkov [Tue, 4 Apr 2023 12:39:51 +0000 (13:39 +0100)]
io_uring/rsrc: rename rsrc_list

We have too many "rsrc" around which makes the name of struct
io_rsrc_node::rsrc_list confusing. The field is responsible for keeping
a list of files or buffers, so call it item_list and add comments
around.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3e34d4dfc1fdbb6b520f904ee6187c2ccf680efe.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/rsrc: kill rsrc_ref_lock
Pavel Begunkov [Tue, 4 Apr 2023 12:39:50 +0000 (13:39 +0100)]
io_uring/rsrc: kill rsrc_ref_lock

We use ->rsrc_ref_lock spinlock to protect ->rsrc_ref_list in
io_rsrc_node_ref_zero(). Now we removed pcpu refcounting, which means
io_rsrc_node_ref_zero() is not executed from the irq context as an RCU
callback anymore, and we also put it under ->uring_lock.
io_rsrc_node_switch(), which queues up nodes into the list, is also
protected by ->uring_lock, so we can safely get rid of ->rsrc_ref_lock.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6b60af883c263551190b526a55ff2c9d5ae07141.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/rsrc: protect node refs with uring_lock
Pavel Begunkov [Tue, 4 Apr 2023 12:39:49 +0000 (13:39 +0100)]
io_uring/rsrc: protect node refs with uring_lock

Currently, for nodes we have an atomic counter and some cached
(non-atomic) refs protected by uring_lock. Let's put all ref
manipulations under uring_lock and get rid of the atomic part.
It's free as in all cases we care about we already hold the lock.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/25b142feed7d831008257d90c8b17c0115d4fc15.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: io_free_req() via tw
Pavel Begunkov [Tue, 4 Apr 2023 12:39:48 +0000 (13:39 +0100)]
io_uring: io_free_req() via tw

io_free_req() is not often used but nevertheless problematic as there is
no way to know the current context, it may be used from the submission
path or even by an irq handler. Push it to a fresh context using
task_work.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3a92fe80bb068757e51aaa0b105cfbe8f5dfee9e.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: don't put nodes under spinlocks
Pavel Begunkov [Tue, 4 Apr 2023 12:39:47 +0000 (13:39 +0100)]
io_uring: don't put nodes under spinlocks

io_req_put_rsrc() doesn't need any locking, so move it out of
a spinlock section in __io_req_complete_post() and adjust helpers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d5b87a5f31270dade6805f7acafc4cc34b84b241.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/rsrc: keep cached refs per node
Pavel Begunkov [Tue, 4 Apr 2023 12:39:46 +0000 (13:39 +0100)]
io_uring/rsrc: keep cached refs per node

We cache refs of the current node (i.e. ctx->rsrc_node) in
ctx->rsrc_cached_refs. We'll be moving away from atomics, so move the
cached refs in struct io_rsrc_node for now. It's a prep patch and
shouldn't change anything in practise.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9edc3669c1d71b06c2dca78b2b2b8bb9292738b9.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/rsrc: use non-pcpu refcounts for nodes
Pavel Begunkov [Tue, 4 Apr 2023 12:39:45 +0000 (13:39 +0100)]
io_uring/rsrc: use non-pcpu refcounts for nodes

One problem with the current rsrc infra is that often updates will
generates lots of rsrc nodes, each carry pcpu refs. That takes quite a
lot of memory, especially if there is a stall, and takes lots of CPU
cycles. Only pcpu allocations takes >50 of CPU with a naive benchmark
updating files in a loop.

Replace pcpu refs with normal refcounting. There is already a hot path
avoiding atomics / refs, but following patches will further improve it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e9ed8a9457b331a26555ff9443afc64cdaab7247.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: cap io_sqring_entries() at SQ ring size
Jens Axboe [Thu, 30 Mar 2023 16:05:31 +0000 (10:05 -0600)]
io_uring: cap io_sqring_entries() at SQ ring size

We already do this manually for the !SQPOLL case, do it in general and
we can also dump the ugly min3() in io_submit_sqes().

Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: rename trace_io_uring_submit_sqe() tracepoint
Jens Axboe [Thu, 30 Mar 2023 16:03:41 +0000 (10:03 -0600)]
io_uring: rename trace_io_uring_submit_sqe() tracepoint

It has nothing to do with the SQE at this point, it's a request
submission. While in there, get rid of the 'force_nonblock' argument
which is also dead, as we only pass in true.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: encapsulate task_work state
Pavel Begunkov [Mon, 27 Mar 2023 15:38:15 +0000 (16:38 +0100)]
io_uring: encapsulate task_work state

For task works we're passing around a bool pointer for whether the
current ring is locked or not, let's wrap it in a structure, that
will make it more opaque preventing abuse and will also help us
to pass more info in the future if needed.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1ecec9483d58696e248d1bfd52cf62b04442df1d.1679931367.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: remove extra tw trylocks
Pavel Begunkov [Mon, 27 Mar 2023 15:38:14 +0000 (16:38 +0100)]
io_uring: remove extra tw trylocks

Before cond_resched()'ing in handle_tw_list() we also drop the current
ring context, and so the next loop iteration will need to pick/pin a new
context and do trylock.

The chunk removed by this patch was intended to be an optimisation
covering exactly this case, i.e. retaking the lock after reschedule, but
in reality it's skipped for the first iteration after resched as
described and will keep hammering the lock if it's contended.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1ecec9483d58696e248d1bfd52cf62b04442df1d.1679931367.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/io-wq: drop outdated comment
Jens Axboe [Mon, 27 Mar 2023 19:10:21 +0000 (13:10 -0600)]
io_uring/io-wq: drop outdated comment

Since the move to PF_IO_WORKER, we don't juggle memory context manually
anymore. Remove that outdated part of the comment for __io_worker_idle().

Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: kill unused notif declarations
Pavel Begunkov [Mon, 27 Mar 2023 15:34:48 +0000 (16:34 +0100)]
io_uring: kill unused notif declarations

There are two leftover structures from the notification registration
mechanism that has never been released, kill them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f05f65aebaf8b1b5bf28519a8fdb350e3e7c9ad0.1679924536.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio-wq: Drop struct io_wqe
Gabriel Krisman Bertazi [Wed, 22 Mar 2023 01:16:28 +0000 (22:16 -0300)]
io-wq: Drop struct io_wqe

Since commit 0654b05e7e65 ("io_uring: One wqe per wq"), we have just a
single io_wqe instance embedded per io_wq.  Drop the extra structure in
favor of accessing struct io_wq directly, cleaning up quite a bit of
dereferences and backpointers.

No functional changes intended.  Tested with liburing's testsuite
and mmtests performance microbenchmarks.  I didn't observe any
performance regressions.

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20230322011628.23359-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio-wq: Move wq accounting to io_wq
Gabriel Krisman Bertazi [Wed, 22 Mar 2023 01:16:27 +0000 (22:16 -0300)]
io-wq: Move wq accounting to io_wq

Since we now have a single io_wqe per io_wq instead of per-node, and in
preparation to its removal, move the accounting into the parent
structure.

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20230322011628.23359-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/kbuf: disallow mapping a badly aligned provided ring buffer
Jens Axboe [Fri, 17 Mar 2023 16:42:08 +0000 (10:42 -0600)]
io_uring/kbuf: disallow mapping a badly aligned provided ring buffer

On at least parisc, we have strict requirements on how we virtually map
an address that is shared between the application and the kernel. On
these platforms, IOU_PBUF_RING_MMAP should be used when setting up a
shared ring buffer for provided buffers. If the application is mapping
these pages and asking the kernel to pin+map them as well, then we have
no control over what virtual address we get in the kernel.

For that case, do a sanity check if SHM_COLOUR is defined, and disallow
the mapping request. The application must fall back to using
IOU_PBUF_RING_MMAP for this case, and liburing will do that transparently
with the set of helpers that it has.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: Add KASAN support for alloc_caches
Breno Leitao [Thu, 23 Feb 2023 16:43:53 +0000 (08:43 -0800)]
io_uring: Add KASAN support for alloc_caches

Add support for KASAN in the alloc_caches (apoll and netmsg_cache).
Thus, if something touches the unused caches, it will raise a KASAN
warning/exception.

It poisons the object when the object is put to the cache, and unpoisons
it when the object is gotten or freed.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20230223164353.2839177-2-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: Move from hlist to io_wq_work_node
Breno Leitao [Thu, 23 Feb 2023 16:43:52 +0000 (08:43 -0800)]
io_uring: Move from hlist to io_wq_work_node

Having cache entries linked using the hlist format brings no benefit, and
also requires an unnecessary extra pointer address per cache entry.

Use the internal io_wq_work_node single-linked list for the internal
alloc caches (async_msghdr and async_poll)

This is required to be able to use KASAN on cache entries, since we do
not need to touch unused (and poisoned) cache entries when adding more
entries to the list.

Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20230223164353.2839177-2-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: One wqe per wq
Breno Leitao [Fri, 10 Mar 2023 20:11:07 +0000 (12:11 -0800)]
io_uring: One wqe per wq

Right now io_wq allocates one io_wqe per NUMA node.  As io_wq is now
bound to a task, the task basically uses only the NUMA local io_wqe, and
almost never changes NUMA nodes, thus, the other wqes are mostly
unused.

Allocate just one io_wqe embedded into io_wq, and uses all possible cpus
(cpu_possible_mask) in the io_wqe->cpumask.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20230310201107.4020580-1-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: add support for user mapped provided buffer ring
Jens Axboe [Tue, 14 Mar 2023 17:07:19 +0000 (11:07 -0600)]
io_uring: add support for user mapped provided buffer ring

The ring mapped provided buffer rings rely on the application allocating
the memory for the ring, and then the kernel will map it. This generally
works fine, but runs into issues on some architectures where we need
to be able to ensure that the kernel and application virtual address for
the ring play nicely together. This at least impacts architectures that
set SHM_COLOUR, but potentially also anyone setting SHMLBA.

To use this variant of ring provided buffers, the application need not
allocate any memory for the ring. Instead the kernel will do so, and
the allocation must subsequently call mmap(2) on the ring with the
offset set to:

IORING_OFF_PBUF_RING | (bgid << IORING_OFF_PBUF_SHIFT)

to get a virtual address for the buffer ring. Normally the application
would allocate a suitable piece of memory (and correctly aligned) and
simply pass that in via io_uring_buf_reg.ring_addr and the kernel would
map it.

Outside of the setup differences, the kernel allocate + user mapped
provided buffer ring works exactly the same.

Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/kbuf: rename struct io_uring_buf_reg 'pad' to'flags'
Jens Axboe [Tue, 14 Mar 2023 17:01:45 +0000 (11:01 -0600)]
io_uring/kbuf: rename struct io_uring_buf_reg 'pad' to'flags'

In preparation for allowing flags to be set for registration, rename
the padding and use it for that.

Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/kbuf: add buffer_list->is_mapped member
Jens Axboe [Tue, 14 Mar 2023 16:59:46 +0000 (10:59 -0600)]
io_uring/kbuf: add buffer_list->is_mapped member

Rather than rely on checking buffer_list->buf_pages or ->buf_nr_pages,
add a separate member that tracks if this is a ring mapped provided
buffer list or not.

Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring/kbuf: move pinning of provided buffer ring into helper
Jens Axboe [Tue, 14 Mar 2023 16:55:50 +0000 (10:55 -0600)]
io_uring/kbuf: move pinning of provided buffer ring into helper

In preparation for allowing the kernel to allocate the provided buffer
rings and have the application mmap it instead, abstract out the
current method of pinning and mapping the user allocated ring.

No functional changes intended in this patch.

Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: Adjust mapping wrt architecture aliasing requirements
Helge Deller [Thu, 16 Feb 2023 08:09:38 +0000 (09:09 +0100)]
io_uring: Adjust mapping wrt architecture aliasing requirements

Some architectures have memory cache aliasing requirements (e.g. parisc)
if memory is shared between userspace and kernel. This patch fixes the
kernel to return an aliased address when asked by userspace via mmap().

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoio_uring: avoid hashing O_DIRECT writes if the filesystem doesn't need it
Jens Axboe [Tue, 7 Mar 2023 16:47:20 +0000 (09:47 -0700)]
io_uring: avoid hashing O_DIRECT writes if the filesystem doesn't need it

io_uring hashes writes to a given file/inode so that it can serialize
them. This is useful if the file system needs exclusive access to the
file to perform the write, as otherwise we end up with a ton of io-wq
threads trying to lock the inode at the same time. This can cause
excessive system time.

But if the file system has flagged that it supports parallel O_DIRECT
writes, then there's no need to serialize the writes. Check for that
through FMODE_DIO_PARALLEL_WRITE and don't hash it if we don't need to.

In a basic test of 8 threads writing to a file on XFS on a gen2 Optane,
with each thread writing in 4k chunks, it improves performance from
~1350K IOPS (or ~5290MiB/sec) to ~1410K IOPS (or ~5500MiB/sec).

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agofs: add FMODE_DIO_PARALLEL_WRITE flag
Jens Axboe [Tue, 7 Mar 2023 16:40:28 +0000 (09:40 -0700)]
fs: add FMODE_DIO_PARALLEL_WRITE flag

Some filesystems support multiple threads writing to the same file with
O_DIRECT without requiring exclusive access to it. io_uring can use this
hint to avoid serializing dio writes to this inode, instead allowing them
to run in parallel.

XFS and ext4 both fall into this category, so set the flag for both of
them.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 months agoLinux 6.3-rc5 v6.3-rc5
Linus Torvalds [Sun, 2 Apr 2023 21:29:29 +0000 (14:29 -0700)]
Linux 6.3-rc5

13 months agoMerge tag 'for-6.3-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Sun, 2 Apr 2023 17:57:12 +0000 (10:57 -0700)]
Merge tag 'for-6.3-rc4-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - scan block devices in non-exclusive mode to avoid temporary mkfs
   failures

 - fix race between quota disable and quota assign ioctls

 - fix deadlock when aborting transaction during relocation with scrub

 - ignore fiemap path cache when there are multiple paths for a node

* tag 'for-6.3-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: ignore fiemap path cache when there are multiple paths for a node
  btrfs: fix deadlock when aborting transaction during relocation with scrub
  btrfs: scan device in non-exclusive mode
  btrfs: fix race between quota disable and quota assign ioctls

13 months agoRevert "venus: firmware: Correct non-pix start and end addresses"
Javier Martinez Canillas [Tue, 7 Feb 2023 10:22:54 +0000 (11:22 +0100)]
Revert "venus: firmware: Correct non-pix start and end addresses"

This reverts commit a837e5161cff, which broke probing of the venus
driver, at least on the SC7180 SoC HP X2 Chromebook:

  qcom-venus aa00000.video-codec: Adding to iommu group 11
  qcom-venus aa00000.video-codec: non legacy binding
  qcom-venus aa00000.video-codec: failed to reset venus core
  qcom-venus: probe of aa00000.video-codec failed with error -110

Matthias Kaehlcke also reported that the same change caused a regression
in SC7180 and sc7280, that prevents AOSS from entering sleep mode during
system suspend.  So let's revert this commit for now to fix both issues.

Fixes: a837e5161cff ("venus: firmware: Correct non-pix start and end addresses")
Reported-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 months agoMerge tag 'driver-core-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 2 Apr 2023 17:10:16 +0000 (10:10 -0700)]
Merge tag 'driver-core-6.3-rc5' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are three small changes for 6.3-rc5 semi-related to driver core
  stuff:

   - documentation update where we move the security_bugs file to a more
     relevant location.

   - mdt/spi-nor debugfs memory leak fix that's been floating around for
     a long time and acked by the maintainer

   - cacheinfo bugfix for a regression in 6.3-rc1

  All have been in linux-next with no reported problems"

* tag 'driver-core-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  cacheinfo: Fix LLC is not exported through sysfs
  Documentation/security-bugs: move from admin-guide/ to process/
  mtd: spi-nor: fix memory leak when using debugfs_lookup()

13 months agoMerge tag 'powerpc-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 2 Apr 2023 17:01:56 +0000 (10:01 -0700)]
Merge tag 'powerpc-6.3-4' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix a false positive warning in __pte_needs_flush() (with DEBUG_VM=y)

 - Fix oops when a PF_IO_WORKER thread tries to core dump

 - Don't try to reconfigure VAS when it's disabled

Thanks to Benjamin Gray, Haren Myneni, Jens Axboe, Nathan Lynch, and
Russell Currey.

* tag 'powerpc-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries/vas: Ignore VAS update for DLPAR if copy/paste is not enabled
  powerpc: Don't try to copy PPR for task with NULL pt_regs
  powerpc/64s: Fix __pte_needs_flush() false positive warning

13 months agoMerge tag '6.3-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 1 Apr 2023 21:50:22 +0000 (14:50 -0700)]
Merge tag '6.3-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs client fixes from Steve French:
 "Four cifs/smb3 client (reconnect and DFS related) fixes, including two
  for stable:

   - DFS oops fix

   - DFS reconnect recursion fix

   - An SMB1 parallel reconnect fix

   - Trivial dead code removal in smb2_reconnect"

* tag '6.3-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: get rid of dead check in smb2_reconnect()
  cifs: prevent infinite recursion in CIFSGetDFSRefer()
  cifs: avoid races in parallel reconnects in smb1
  cifs: fix DFS traversal oops without CONFIG_CIFS_DFS_UPCALL

13 months agoMerge tag 'input-for-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor...
Linus Torvalds [Sat, 1 Apr 2023 21:09:51 +0000 (14:09 -0700)]
Merge tag 'input-for-v6.3-rc4' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

 - fixes to ALPS and Focaltech PS/2 drivers dealing with the breakage of
   switching to -funsigned-char

 - quirks to i8042 to better handle Lifebook A574/H and TUXEDO devices

 - a quirk to Goodix touchscreen driver to handle Yoga Book X90F

 - a fix for incorrectly merged patch to xpad game controller driver

* tag 'input-for-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: i8042 - add TUXEDO devices to i8042 quirk tables for partial fix
  Input: alps - fix compatibility with -funsigned-char
  Input: focaltech - use explicitly signed char type
  Input: xpad - fix incorrectly applied patch for MAP_PROFILE_BUTTON
  Input: goodix - add Lenovo Yoga Book X90F to nine_bytes_report DMI table
  Input: i8042 - add quirk for Fujitsu Lifebook A574/H

13 months agoMerge tag 'pinctrl-v6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Sat, 1 Apr 2023 16:47:08 +0000 (09:47 -0700)]
Merge tag 'pinctrl-v6.3-2' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Some pin control fixes for the v6.3 series.

  The most notable and urgent one is probably the AMD fix which affects
  AMD laptops, found by the Chromium people.

  Summary:

   - Fix up the Kconfig options for MediaTek MT7981

   - Fix the irq domain name in the AT91-PIO4 driver

   - Fix some alternative muxing modes in the Ocelot driver

   - Allocate the GPIO numbers dynamically in the STM32 driver

   - Disable and mask interrupts on resume in the AMD driver

   - Fix a typo in the Qualcomm SM8550 pin control device tree bindings"

* tag 'pinctrl-v6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  dt-bindings: pinctrl: qcom,sm8550-lpass-lpi: allow input-enabled and bias-bus-hold
  pinctrl: amd: Disable and mask interrupts on resume
  pinctrl: stm32: use dynamic allocation of GPIO base
  pinctrl: ocelot: Fix alt mode for ocelot
  pinctrl: at91-pio4: fix domain name assignment
  pinctrl: mediatek: fix naming inconsistency
  pinctrl: mediatek: add missing options to PINCTRL_MT7981

13 months agoMerge tag 'kbuild-fixes-v6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 1 Apr 2023 16:25:17 +0000 (09:25 -0700)]
Merge tag 'kbuild-fixes-v6.3-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Fix linux-headers debian package

 - Fix a merge_config.sh error due to a misspelled variable

 - Fix modversion for 32-bit build machines

* tag 'kbuild-fixes-v6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  modpost: Fix processing of CRCs on 32-bit build machines
  scripts: merge_config: Fix typo in variable name.
  kbuild: deb-pkg: set version for linux-headers paths

13 months agoMerge tag 'iommu-fixes-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 1 Apr 2023 16:17:33 +0000 (09:17 -0700)]
Merge tag 'iommu-fixes-6.3-rc4' of git://git./linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Maintainer update for S390 IOMMU driver

 - A fix for the set_platform_dma_ops() call-back in the Exynos
   IOMMU driver

 - Intel VT-d fixes from Lu Baolu:
    - Fix a lockdep splat
    - Fix a supplement of the specification
    - Fix a warning in perfmon code

* tag 'iommu-fixes-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplug
  iommu/vt-d: Allow zero SAGAW if second-stage not supported
  iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc()
  iommu/exynos: Fix set_platform_dma_ops() callback
  MAINTAINERS: Update s390-iommu driver maintainer information

13 months agomedia: i2c: imx290: fix conditional function defintions
Arnd Bergmann [Tue, 7 Feb 2023 16:13:12 +0000 (17:13 +0100)]
media: i2c: imx290: fix conditional function defintions

The runtime suspend/resume functions are only referenced from the
dev_pm_ops, but they use the old SET_RUNTIME_PM_OPS() helper that
requires a __maybe_unused annotation to avoid a warning:

  drivers/media/i2c/imx290.c:1082:12: error: unused function 'imx290_runtime_resume' [-Werror,-Wunused-function]
  static int imx290_runtime_resume(struct device *dev)
             ^
  drivers/media/i2c/imx290.c:1090:12: error: unused function 'imx290_runtime_suspend' [-Werror,-Wunused-function]
  static int imx290_runtime_suspend(struct device *dev)
             ^

Convert this to the new RUNTIME_PM_OPS() helper that so this is not
required.  To improve this further, also use the pm_ptr() helper that
lets the dev_pm_ops get dropped entirely when CONFIG_PM is disabled.

A related mistake happened in the of_match_ptr() macro here, which like
SET_RUNTIME_PM_OPS() requires the match table to be marked as
__maybe_unused, though I could not reproduce building this without
CONFIG_OF.  Remove the of_match_ptr() here as there is no point in
dropping the match table in configurations without CONFIG_OF.

Fixes: 02852c01f654 ("media: i2c: imx290: Initialize runtime PM before subdev")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 months agoMerge tag 'nfs-for-6.3-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Linus Torvalds [Fri, 31 Mar 2023 20:22:14 +0000 (13:22 -0700)]
Merge tag 'nfs-for-6.3-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:

 - Fix shutdown of NFS TCP client sockets

 - Fix hangs when recovering open state after a server reboot

* tag 'nfs-for-6.3-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  SUNRPC: fix shutdown of NFS TCP client socket
  NFSv4: Fix hangs when recovering open state after a server reboot

13 months agoMerge tag 'platform-drivers-x86-v6.3-4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 31 Mar 2023 20:11:06 +0000 (13:11 -0700)]
Merge tag 'platform-drivers-x86-v6.3-4' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:

 - Fix a regression in ideapad-laptop which caused the touchpad to stop
   working after a suspend/resume on some models

 - One other small fix and three hw-id additions

* tag 'platform-drivers-x86-v6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: ideapad-laptop: Stop sending KEY_TOUCHPAD_TOGGLE
  platform/x86: asus-nb-wmi: Add quirk_asus_tablet_mode to other ROG Flow X13 models
  platform/x86: gigabyte-wmi: add support for X570S AORUS ELITE
  platform/x86: gigabyte-wmi: add support for B650 AORUS ELITE AX
  platform/x86/intel/pmc: Alder Lake PCH slp_s0_residency fix

13 months agoMerge tag 'pci-v6.3-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Fri, 31 Mar 2023 20:07:01 +0000 (13:07 -0700)]
Merge tag 'pci-v6.3-fixes-1' of git://git./linux/kernel/git/pci/pci

Pull PCI fix from Bjorn Helgaas:

 - Fix DesignWare PORT_LINK_CONTROL setup, which was corrupted when the
   DT "snps,enable-cdm-check" property was present (Yoshihiro Shimoda)

* tag 'pci-v6.3-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI: dwc: Fix PORT_LINK_CONTROL update when CDM check enabled

13 months agoMerge tag 'regulator-fix-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 31 Mar 2023 20:02:34 +0000 (13:02 -0700)]
Merge tag 'regulator-fix-v6.3-rc4' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
 "Deferred probe fix for v6.3.

  This fixes a rarely triggered issue where we would treat probe
  deferral for clocks as a fatal error in the fixed regulator, causing
  it to fail to retry when it should"

* tag 'regulator-fix-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: Handle deferred clk

13 months agoMerge tag 'block-6.3-2023-03-30' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 31 Mar 2023 19:35:03 +0000 (12:35 -0700)]
Merge tag 'block-6.3-2023-03-30' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe pull request via Christoph:
     - Mark Lexar NM760 as IGNORE_DEV_SUBNQN (Juraj Pecigos)
     - Fix a possible UAF when failing to allocate an TCP io queue (Sagi
       Grimberg)

 - MD pull request via Song:
     - Fix a null pointer deference in 6.3-rc (Yu Kuai)

 - uevent partition fix (Alyssa)

* tag 'block-6.3-2023-03-30' of git://git.kernel.dk/linux:
  nvme-tcp: fix a possible UAF when failing to allocate an io queue
  md: fix regression for null-ptr-deference in __md_stop()
  nvme-pci: mark Lexar NM760 as IGNORE_DEV_SUBNQN
  loop: LOOP_CONFIGURE: send uevents for partitions

13 months agoMerge tag 'io_uring-6.3-2023-03-30' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 31 Mar 2023 19:30:13 +0000 (12:30 -0700)]
Merge tag 'io_uring-6.3-2023-03-30' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

 - Fix a regression with the poll retry, introduced in this merge window
   (me)

 - Fix a regression with the alloc cache not decrementing the member
   count on removal. Also a regression from this merge window (Pavel)

 - Fix race around rsrc node grabbing (Pavel)

* tag 'io_uring-6.3-2023-03-30' of git://git.kernel.dk/linux:
  io_uring: fix poll/netmsg alloc caches
  io_uring/rsrc: fix rogue rsrc node grabbing
  io_uring/poll: clear single/double poll flags on poll arming

13 months agoplatform/x86: ideapad-laptop: Stop sending KEY_TOUCHPAD_TOGGLE
Hans de Goede [Thu, 30 Mar 2023 19:46:44 +0000 (21:46 +0200)]
platform/x86: ideapad-laptop: Stop sending KEY_TOUCHPAD_TOGGLE

Commit 5829f8a897e4 ("platform/x86: ideapad-laptop: Send
KEY_TOUCHPAD_TOGGLE on some models") made ideapad-laptop send
KEY_TOUCHPAD_TOGGLE when we receive an ACPI notify with VPC event bit 5 set
and the touchpad-state has not been changed by the EC itself already.

This was done under the assumption that this would be good to do to make
the touchpad-toggle hotkey work on newer models where the EC does not
toggle the touchpad on/off itself (because it is not routed through
the PS/2 controller, but uses I2C).

But it turns out that at least some models, e.g. the Yoga 7-15ITL5 the EC
triggers an ACPI notify with VPC event bit 5 set on resume, which would
now cause a spurious KEY_TOUCHPAD_TOGGLE on resume to which the desktop
environment responds by disabling the touchpad in software, breaking
the touchpad (until manually re-enabled) on resume.

It was never confirmed that sending KEY_TOUCHPAD_TOGGLE actually improves
things on new models and at least some new models like the Yoga 7-15ITL5
don't have a touchpad on/off toggle hotkey at all, while still sending
ACPI notify events with VPC event bit 5 set.

So it seems best to revert the change to send KEY_TOUCHPAD_TOGGLE when
receiving an ACPI notify events with VPC event bit 5 and the touchpad
state as reported by the EC has not changed.

Note this is not a full revert the code to cache the last EC touchpad
state is kept to avoid sending spurious KEY_TOUCHPAD_ON / _OFF events
on resume.

Fixes: 5829f8a897e4 ("platform/x86: ideapad-laptop: Send KEY_TOUCHPAD_TOGGLE on some models")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217234
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230330194644.64628-1-hdegoede@redhat.com
13 months agoplatform/x86: asus-nb-wmi: Add quirk_asus_tablet_mode to other ROG Flow X13 models
weiliang1503 [Thu, 30 Mar 2023 11:49:43 +0000 (19:49 +0800)]
platform/x86: asus-nb-wmi: Add quirk_asus_tablet_mode to other ROG Flow X13 models

Make quirk_asus_tablet_mode apply on other ROG Flow X13 devices,
which only affects the GV301Q model before.

Signed-off-by: weiliang1503 <weiliang1503@gmail.com>
Link: https://lore.kernel.org/r/20230330114943.15057-1-weiliang1503@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
13 months agoplatform/x86: gigabyte-wmi: add support for X570S AORUS ELITE
Hans de Goede [Fri, 31 Mar 2023 17:31:48 +0000 (19:31 +0200)]
platform/x86: gigabyte-wmi: add support for X570S AORUS ELITE

Add "X570S AORUS ELITE" to known working boards

Reported-by: Brandon Nielsen <nielsenb@jetfuse.net>
Link: https://lore.kernel.org/r/20230331014902.7864-1-nielsenb@jetfuse.net
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
13 months agoMerge tag 'thermal-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 31 Mar 2023 17:23:27 +0000 (10:23 -0700)]
Merge tag 'thermal-6.3-rc5' of git://git./linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
 "These remove two recently added excessive lockdep assertions from the
  sysfs-related thermal code and fix two issues in Intel thermal
  drivers.

  Specifics:

   - Drop two lockdep assertions producing false positive warnings from
     the sysfs-related thermal core code (Rafael Wysocki)

   - Fix handling of two recently added module parameters in the Intel
     powerclamp thermal driver (David Arcari)

   - Fix one more deadlock in the int340x thermal driver (Srinivas
     Pandruvada)"

* tag 'thermal-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: intel: powerclamp: Fix cpumask and max_idle module parameters
  thermal: intel: int340x: processor_thermal: Fix additional deadlock
  thermal: core: Drop excessive lockdep_assert_held() calls

13 months agoMerge tag 'acpi-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 31 Mar 2023 17:18:56 +0000 (10:18 -0700)]
Merge tag 'acpi-6.3-rc5' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
 "Fix a recent regression related to the handling of ACPI notifications
  that made it more likely for ACPI driver callbacks to be invoked in an
  unexpected order and NULL pointers can be dereferenced as a result or
  similar.

  The fix is to modify the global ACPI notification handler so it does
  not invoke driver callbacks at all and allow the device-level
  notification handlers to receive "system" notifications (for the
  drivers that want to receive them)"

* tag 'acpi-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: bus: Rework system-level device notification handling

13 months agoMerge tag 'riscv-for-linus-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 31 Mar 2023 17:15:17 +0000 (10:15 -0700)]
Merge tag 'riscv-for-linus-6.3-rc5' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - A fix for FPU probing in XIP kernels

 - Always enable the alternative framework for non-XIP kernels

* tag 'riscv-for-linus-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: always select RISCV_ALTERNATIVE for non-xip kernels
  RISC-V: add non-alternative fallback for riscv_has_extension_[un]likely()

13 months agoMerge tag 'mips-fixes_6.3_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips...
Linus Torvalds [Fri, 31 Mar 2023 17:12:07 +0000 (10:12 -0700)]
Merge tag 'mips-fixes_6.3_1' of git://git./linux/kernel/git/mips/linux

Pull MIPS fix from Thomas Bogendoerfer:
 "Fix to avoid crash on BCM6358 platforms"

* tag 'mips-fixes_6.3_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  mips: bmips: BCM6358: disable RAC flush for TP1

13 months agoMerge branch 'thermal-intel-fixes'
Rafael J. Wysocki [Fri, 31 Mar 2023 10:02:46 +0000 (12:02 +0200)]
Merge branch 'thermal-intel-fixes'

Merge Intel thermal driver fixes for 6.3-rc5:

 - Fix handling of two recently added module parameters in the Intel
   powerclamp thermal driver (David Arcari).

 - Fix one more deadlock in the int340x thermal driver (Srinivas
   Pandruvada).

* thermal-intel-fixes:
  thermal: intel: powerclamp: Fix cpumask and max_idle module parameters
  thermal: intel: int340x: processor_thermal: Fix additional deadlock

13 months agoiommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplug
Kan Liang [Wed, 29 Mar 2023 13:47:21 +0000 (21:47 +0800)]
iommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplug

A warning can be triggered when hotplug CPU 0.
$ echo 0 > /sys/devices/system/cpu/cpu0/online

 ------------[ cut here ]------------
 Voluntary context switch within RCU read-side critical section!
 WARNING: CPU: 0 PID: 19 at kernel/rcu/tree_plugin.h:318
          rcu_note_context_switch+0x4f4/0x580
 RIP: 0010:rcu_note_context_switch+0x4f4/0x580
 Call Trace:
  <TASK>
  ? perf_event_update_userpage+0x104/0x150
  __schedule+0x8d/0x960
  ? perf_event_set_state.part.82+0x11/0x50
  schedule+0x44/0xb0
  schedule_timeout+0x226/0x310
  ? __perf_event_disable+0x64/0x1a0
  ? _raw_spin_unlock+0x14/0x30
  wait_for_completion+0x94/0x130
  __wait_rcu_gp+0x108/0x130
  synchronize_rcu+0x67/0x70
  ? invoke_rcu_core+0xb0/0xb0
  ? __bpf_trace_rcu_stall_warning+0x10/0x10
  perf_pmu_migrate_context+0x121/0x370
  iommu_pmu_cpu_offline+0x6a/0xa0
  ? iommu_pmu_del+0x1e0/0x1e0
  cpuhp_invoke_callback+0x129/0x510
  cpuhp_thread_fun+0x94/0x150
  smpboot_thread_fn+0x183/0x220
  ? sort_range+0x20/0x20
  kthread+0xe6/0x110
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>
 ---[ end trace 0000000000000000 ]---

The synchronize_rcu() will be invoked in the perf_pmu_migrate_context(),
when migrating a PMU to a new CPU. However, the current for_each_iommu()
is within RCU read-side critical section.

Two methods were considered to fix the issue.
- Use the dmar_global_lock to replace the RCU read lock when going
  through the drhd list. But it triggers a lockdep warning.
- Use the cpuhp_setup_state_multi() to set up a dedicated state for each
  IOMMU PMU. The lock can be avoided.

The latter method is implemented in this patch. Since each IOMMU PMU has
a dedicated state, add cpuhp_node and cpu in struct iommu_pmu to track
the state. The state can be dynamically allocated now. Remove the
CPUHP_AP_PERF_X86_IOMMU_PERF_ONLINE.

Fixes: 46284c6ceb5e ("iommu/vt-d: Support cpumask for IOMMU perfmon")
Reported-by: Ammy Yi <ammy.yi@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230328182028.1366416-1-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230329134721.469447-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
13 months agoiommu/vt-d: Allow zero SAGAW if second-stage not supported
Lu Baolu [Wed, 29 Mar 2023 13:47:20 +0000 (21:47 +0800)]
iommu/vt-d: Allow zero SAGAW if second-stage not supported

The VT-d spec states (in section 11.4.2) that hardware implementations
reporting second-stage translation support (SSTS) field as Clear also
report the SAGAW field as 0. Fix an inappropriate check in alloc_iommu().

Fixes: 792fb43ce2c9 ("iommu/vt-d: Enable Intel IOMMU scalable mode by default")
Suggested-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230318024824.124542-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20230329134721.469447-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
13 months agoiommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc()
Lu Baolu [Wed, 29 Mar 2023 13:47:19 +0000 (21:47 +0800)]
iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc()

The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac932
("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
is used to protect DMAR related global data from DMAR hotplug operations.

Using dmar_global_lock in intel_irq_remapping_alloc() is unnecessary as
the DMAR global data structures are not touched there. Remove it to avoid
below lockdep warning.

 ======================================================
 WARNING: possible circular locking dependency detected
 6.3.0-rc2 #468 Not tainted
 ------------------------------------------------------
 swapper/0/1 is trying to acquire lock:
 ff1db4cb40178698 (&domain->mutex){+.+.}-{3:3},
   at: __irq_domain_alloc_irqs+0x3b/0xa0

 but task is already holding lock:
 ffffffffa0c1cdf0 (dmar_global_lock){++++}-{3:3},
   at: intel_iommu_init+0x58e/0x880

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (dmar_global_lock){++++}-{3:3}:
        lock_acquire+0xd6/0x320
        down_read+0x42/0x180
        intel_irq_remapping_alloc+0xad/0x750
        mp_irqdomain_alloc+0xb8/0x2b0
        irq_domain_alloc_irqs_locked+0x12f/0x2d0
        __irq_domain_alloc_irqs+0x56/0xa0
        alloc_isa_irq_from_domain.isra.7+0xa0/0xe0
        mp_map_pin_to_irq+0x1dc/0x330
        setup_IO_APIC+0x128/0x210
        apic_intr_mode_init+0x67/0x110
        x86_late_time_init+0x24/0x40
        start_kernel+0x41e/0x7e0
        secondary_startup_64_no_verify+0xe0/0xeb

 -> #0 (&domain->mutex){+.+.}-{3:3}:
        check_prevs_add+0x160/0xef0
        __lock_acquire+0x147d/0x1950
        lock_acquire+0xd6/0x320
        __mutex_lock+0x9c/0xfc0
        __irq_domain_alloc_irqs+0x3b/0xa0
        dmar_alloc_hwirq+0x9e/0x120
        iommu_pmu_register+0x11d/0x200
        intel_iommu_init+0x5de/0x880
        pci_iommu_init+0x12/0x40
        do_one_initcall+0x65/0x350
        kernel_init_freeable+0x3ca/0x610
        kernel_init+0x1a/0x140
        ret_from_fork+0x29/0x50

 other info that might help us debug this:

 Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(dmar_global_lock);
                                lock(&domain->mutex);
                                lock(dmar_global_lock);
   lock(&domain->mutex);

                *** DEADLOCK ***

Fixes: 9dbb8e3452ab ("irqdomain: Switch to per-domain locking")
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Tested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230314051836.23817-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20230329134721.469447-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
13 months agoMerge tag 'md-fixes-2023-03-29' of https://git.kernel.org/pub/scm/linux/kernel/git... block-6.3-2023-03-30
Jens Axboe [Fri, 31 Mar 2023 02:29:47 +0000 (20:29 -0600)]
Merge tag 'md-fixes-2023-03-29' of https://git./linux/kernel/git/song/md into block-6.3

Pull MD fix from Song.

* tag 'md-fixes-2023-03-29' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: fix regression for null-ptr-deference in __md_stop()

13 months agoMerge tag 'dma-mapping-6.3-2023-03-31' of git://git.infradead.org/users/hch/dma-mapping
Linus Torvalds [Thu, 30 Mar 2023 23:09:37 +0000 (16:09 -0700)]
Merge tag 'dma-mapping-6.3-2023-03-31' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:

 - fix for swiotlb deadlock due to wrong alignment checks (GuoRui.Yu,
   Petr Tesarik)

* tag 'dma-mapping-6.3-2023-03-31' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: fix slot alignment checks
  swiotlb: use wrap_area_index() instead of open-coding it
  swiotlb: fix the deadlock in swiotlb_do_find_slots

13 months agocifs: get rid of dead check in smb2_reconnect()
Paulo Alcantara [Wed, 29 Mar 2023 20:14:23 +0000 (17:14 -0300)]
cifs: get rid of dead check in smb2_reconnect()

The SMB2_IOCTL check in the switch statement will never be true as we
return earlier from smb2_reconnect() if @smb2_command == SMB2_IOCTL.

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
13 months agocifs: prevent infinite recursion in CIFSGetDFSRefer()
Paulo Alcantara [Wed, 29 Mar 2023 20:14:22 +0000 (17:14 -0300)]
cifs: prevent infinite recursion in CIFSGetDFSRefer()

We can't call smb_init() in CIFSGetDFSRefer() as cifs_reconnect_tcon()
may end up calling CIFSGetDFSRefer() again to get new DFS referrals
and thus causing an infinite recursion.

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
13 months agocifs: avoid races in parallel reconnects in smb1
Paulo Alcantara [Wed, 29 Mar 2023 20:14:21 +0000 (17:14 -0300)]
cifs: avoid races in parallel reconnects in smb1

Prevent multiple threads of doing negotiate, session setup and tree
connect by holding @ses->session_mutex in cifs_reconnect_tcon() while
reconnecting session and tcon.

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
13 months agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Thu, 30 Mar 2023 22:52:45 +0000 (15:52 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Four small fixes, three in drivers. The core fix is yet another
  attempt to insulate us from UFS devices' weird behaviour for VPD
  pages"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: mpt3sas: Don't print sense pool info twice
  scsi: core: Improve scsi_vpd_inquiry() checks
  scsi: megaraid_sas: Fix crash after a double completion
  scsi: megaraid_sas: Fix fw_crash_buffer_show()

13 months agoMerge tag 'nvme-6.3-2023-03-31' of git://git.infradead.org/nvme into block-6.3
Jens Axboe [Thu, 30 Mar 2023 22:39:04 +0000 (16:39 -0600)]
Merge tag 'nvme-6.3-2023-03-31' of git://git.infradead.org/nvme into block-6.3

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 6.3

 - mark Lexar NM760 as IGNORE_DEV_SUBNQN (Juraj Pecigos)
 - fix a possible UAF when failing to allocate an TCP io queue
   (Sagi Grimberg)"

* tag 'nvme-6.3-2023-03-31' of git://git.infradead.org/nvme:
  nvme-tcp: fix a possible UAF when failing to allocate an io queue
  nvme-pci: mark Lexar NM760 as IGNORE_DEV_SUBNQN

13 months agocifs: fix DFS traversal oops without CONFIG_CIFS_DFS_UPCALL
David Disseldorp [Wed, 29 Mar 2023 20:24:06 +0000 (22:24 +0200)]
cifs: fix DFS traversal oops without CONFIG_CIFS_DFS_UPCALL

When compiled with CONFIG_CIFS_DFS_UPCALL disabled, cifs_dfs_d_automount
is NULL. cifs.ko logic for mapping CIFS_FATTR_DFS_REFERRAL attributes to
S_AUTOMOUNT and corresponding dentry flags is retained regardless of
CONFIG_CIFS_DFS_UPCALL, leading to a NULL pointer dereference in
VFS follow_automount() when traversing a DFS referral link:
  BUG: kernel NULL pointer dereference, address: 0000000000000000
  ...
  Call Trace:
   <TASK>
   __traverse_mounts+0xb5/0x220
   ? cifs_revalidate_mapping+0x65/0xc0 [cifs]
   step_into+0x195/0x610
   ? lookup_fast+0xe2/0xf0
   path_lookupat+0x64/0x140
   filename_lookup+0xc2/0x140
   ? __create_object+0x299/0x380
   ? kmem_cache_alloc+0x119/0x220
   ? user_path_at_empty+0x31/0x50
   user_path_at_empty+0x31/0x50
   __x64_sys_chdir+0x2a/0xd0
   ? exit_to_user_mode_prepare+0xca/0x100
   do_syscall_64+0x42/0x90
   entry_SYSCALL_64_after_hwframe+0x72/0xdc

This fix adds an inline cifs_dfs_d_automount() {return -EREMOTE} handler
when CONFIG_CIFS_DFS_UPCALL is disabled. An alternative would be to
avoid flagging S_AUTOMOUNT, etc. without CONFIG_CIFS_DFS_UPCALL. This
approach was chosen as it provides more control over the error path.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Cc: stable@vger.kernel.org
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
13 months agoMerge tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 30 Mar 2023 21:05:21 +0000 (14:05 -0700)]
Merge tag 'net-6.3-rc5' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from CAN and WPAN.

  Still quite a few bugs from this release. This pull is a bit smaller
  because major subtrees went into the previous one. Or maybe people
  took spring break off?

  Current release - regressions:

   - phy: micrel: correct KSZ9131RNX EEE capabilities and advertisement

  Current release - new code bugs:

   - eth: wangxun: fix vector length of interrupt cause

   - vsock/loopback: consistently protect the packet queue with
     sk_buff_head.lock

   - virtio/vsock: fix header length on skb merging

   - wpan: ca8210: fix unsigned mac_len comparison with zero

  Previous releases - regressions:

   - eth: stmmac: don't reject VLANs when IFF_PROMISC is set

   - eth: smsc911x: avoid PHY being resumed when interface is not up

   - eth: mtk_eth_soc: fix tx throughput regression with direct 1G links

   - eth: bnx2x: use the right build_skb() helper after core rework

   - wwan: iosm: fix 7560 modem crash on use on unsupported channel

  Previous releases - always broken:

   - eth: sfc: don't overwrite offload features at NIC reset

   - eth: r8169: fix RTL8168H and RTL8107E rx crc error

   - can: j1939: prevent deadlock by moving j1939_sk_errqueue()

   - virt: vmxnet3: use GRO callback when UPT is enabled

   - virt: xen: don't do grant copy across page boundary

   - phy: dp83869: fix default value for tx-/rx-internal-delay

   - dsa: ksz8: fix multiple issues with ksz8_fdb_dump

   - eth: mvpp2: fix classification/RSS of VLAN and fragmented packets

   - eth: mtk_eth_soc: fix flow block refcounting logic

  Misc:

   - constify fwnode pointers in SFP handling"

* tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits)
  net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow
  net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload
  net: ethernet: mtk_eth_soc: fix flow block refcounting logic
  net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit()
  net: dsa: sync unicast and multicast addresses for VLAN filters too
  net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only
  xen/netback: use same error messages for same errors
  test/vsock: new skbuff appending test
  virtio/vsock: WARN_ONCE() for invalid state of socket
  virtio/vsock: fix header length on skb merging
  bnxt_en: Add missing 200G link speed reporting
  bnxt_en: Fix typo in PCI id to device description string mapping
  bnxt_en: Fix reporting of test result in ethtool selftest
  i40e: fix registers dump after run ethtool adapter self test
  bnx2x: use the right build_skb() helper
  net: ipa: compute DMA pool size properly
  net: wwan: iosm: fixes 7560 modem crash
  net: ethernet: mtk_eth_soc: fix tx throughput regression with direct 1G links
  ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
  ice: add profile conflict check for AVF FDIR
  ...

13 months agoMerge tag 'for-6.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/devic...
Linus Torvalds [Thu, 30 Mar 2023 20:58:12 +0000 (13:58 -0700)]
Merge tag 'for-6.3/dm-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix two DM core bugs in the code that handles splitting "abnormal" IO
   (discards, write same and secure erase) and issuing that IO to the
   correct underlying devices (and offsets within those devices).

* tag 'for-6.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: fix __send_duplicate_bios() to always allow for splitting IO
  dm: fix improper splitting for abnormal bios

13 months agoMerge tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Thu, 30 Mar 2023 20:38:27 +0000 (13:38 -0700)]
Merge tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Daniel Vetter:
 "Two regression fixes in here, otherwise just the usual stuff:

   - i915 fixes for color mgmt, psr, lmem flush, hibernate oops, and
     more

   - amdgpu: dp mst and hibernate regression fix

   - etnaviv: revert fdinfo support (incl drm/sched revert), leak fix

   - misc ivpu fixes, nouveau backlight, drm buddy allocator 32bit
     fixes"

* tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm: (27 commits)
  Revert "drm/scheduler: track GPU active time per entity"
  Revert "drm/etnaviv: export client GPU usage statistics via fdinfo"
  drm/etnaviv: fix reference leak when mmaping imported buffer
  drm/amdgpu: allow more APUs to do mode2 reset when go to S4
  drm/amd/display: Take FEC Overhead into Timeslot Calculation
  drm/amd/display: Add DSC Support for Synaptics Cascaded MST Hub
  drm: test: Fix 32-bit issue in drm_buddy_test
  drm: buddy_allocator: Fix buddy allocator init on 32-bit systems
  drm/nouveau/kms: Fix backlight registration
  drm/i915/perf: Drop wakeref on GuC RC error
  drm/i915/dpt: Treat the DPT BO as a framebuffer
  drm/i915/gem: Flush lmem contents after construction
  drm/i915/tc: Fix the ICL PHY ownership check in TC-cold state
  drm/i915: Disable DC states for all commits
  drm/i915: Workaround ICL CSC_MODE sticky arming
  drm/i915: Add a .color_post_update() hook
  drm/i915: Move CSC load back into .color_commit_arm() when PSR is enabled on skl/glk
  drm/i915: Split icl_color_commit_noarm() from skl_color_commit_noarm()
  drm/i915/pmu: Use functions common with sysfs to read actual freq
  accel/ivpu: Fix IPC buffer header status field value
  ...

13 months agodm: fix __send_duplicate_bios() to always allow for splitting IO
Mike Snitzer [Thu, 30 Mar 2023 19:09:29 +0000 (15:09 -0400)]
dm: fix __send_duplicate_bios() to always allow for splitting IO

Commit 7dd76d1feec70 ("dm: improve bio splitting and associated IO
accounting") only called setup_split_accounting() from
__send_duplicate_bios() if a single bio were being issued. But the case
where duplicate bios are issued must call it too.

Otherwise the bio won't be split and resubmitted (via recursion through
block core back to DM) to submit the later portions of a bio (which may
map to an entirely different target).

For example, when discarding an entire DM striped device with the
following DM table:
 vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
 vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048

Before (broken, discards the first striped target's devices twice):
 device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
 device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
 device-mapper: striped: target_stripe=0, bdev=7:0, start=2049 len=22528
 device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=22528

After (works as expected):
 device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
 device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
 device-mapper: striped: target_stripe=0, bdev=7:2, start=2048 len=22528
 device-mapper: striped: target_stripe=1, bdev=7:3, start=2048 len=22528

Fixes: 7dd76d1feec70 ("dm: improve bio splitting and associated IO accounting")
Cc: stable@vger.kernel.org
Reported-by: Orange Kao <orange@aiven.io>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
13 months agodm: fix improper splitting for abnormal bios
Mike Snitzer [Thu, 30 Mar 2023 18:56:38 +0000 (14:56 -0400)]
dm: fix improper splitting for abnormal bios

"Abnormal" bios include discards, write zeroes and secure erase. By no
longer passing the calculated 'len' pointer, commit 7dd06a2548b2 ("dm:
allow dm_accept_partial_bio() for dm_io without duplicate bios") took a
senseless approach to disallowing dm_accept_partial_bio() from working
for duplicate bios processed using __send_duplicate_bios().

It inadvertently and incorrectly stopped the use of 'len' when
initializing a target's io (in alloc_tio). As such the resulting tio
could address more area of a device than it should.

For example, when discarding an entire DM striped device with the
following DM table:
 vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
 vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048

Before this fix:

 device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=102400
 blkdiscard: attempt to access beyond end of device
 loop0: rw=2051, sector=2048, nr_sectors = 102400 limit=81920

 device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=102400
 blkdiscard: attempt to access beyond end of device
 loop1: rw=2051, sector=2048, nr_sectors = 102400 limit=81920

After this fix;

 device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
 device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872

Fixes: 7dd06a2548b2 ("dm: allow dm_accept_partial_bio() for dm_io without duplicate bios")
Cc: stable@vger.kernel.org
Reported-by: Orange Kao <orange@aiven.io>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
13 months agonet: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow
Felix Fietkau [Thu, 30 Mar 2023 12:08:40 +0000 (14:08 +0200)]
net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow

The cache needs to be flushed to ensure that the hardware stops offloading
the flow immediately.

Fixes: 33fc42de3327 ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230330120840.52079-3-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload
Felix Fietkau [Thu, 30 Mar 2023 12:08:39 +0000 (14:08 +0200)]
net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload

Check for skb metadata in order to detect the case where the DSA header
is not present.

Fixes: 2d7605a72906 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230330120840.52079-2-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: ethernet: mtk_eth_soc: fix flow block refcounting logic
Felix Fietkau [Thu, 30 Mar 2023 12:08:38 +0000 (14:08 +0200)]
net: ethernet: mtk_eth_soc: fix flow block refcounting logic

Since we call flow_block_cb_decref on FLOW_BLOCK_UNBIND, we also need to
call flow_block_cb_incref for a newly allocated cb.
Also fix the accidentally inverted refcount check on unbind.

Fixes: 502e84e2382d ("net: ethernet: mtk_eth_soc: add flow offloading support")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230330120840.52079-1-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: mvneta: fix potential double-frees in mvneta_txq_sw_deinit()
Russell King (Oracle) [Wed, 29 Mar 2023 12:11:17 +0000 (13:11 +0100)]
net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit()

Reported on the Turris forum, mvneta provokes kernel warnings in the
architecture DMA mapping code when mvneta_setup_txqs() fails to
allocate memory. This happens because when mvneta_cleanup_txqs() is
called in the mvneta_stop() path, we leave pointers in the structure
that have been freed.

Then on mvneta_open(), we call mvneta_setup_txqs(), which starts
allocating memory. On memory allocation failure, mvneta_cleanup_txqs()
will walk all the queues freeing any non-NULL pointers - which includes
pointers that were previously freed in mvneta_stop().

Fix this by setting these pointers to NULL to prevent double-freeing
of the same memory.

Fixes: 2adb719d74f6 ("net: mvneta: Implement software TSO")
Link: https://forum.turris.cz/t/random-kernel-exceptions-on-hbl-tos-7-0/18865/8
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1phUe5-00EieL-7q@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: dsa: sync unicast and multicast addresses for VLAN filters too
Vladimir Oltean [Wed, 29 Mar 2023 15:18:21 +0000 (18:18 +0300)]
net: dsa: sync unicast and multicast addresses for VLAN filters too

If certain conditions are met, DSA can install all necessary MAC
addresses on the CPU ports as FDB entries and disable flooding towards
the CPU (we call this RX filtering).

There is one corner case where this does not work.

ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
ip link set swp0 master br0 && ip link set swp0 up
ip link add link swp0 name swp0.100 type vlan id 100
ip link set swp0.100 up && ip addr add 192.168.100.1/24 dev swp0.100

Traffic through swp0.100 is broken, because the bridge turns on VLAN
filtering in the swp0 port (causing RX packets to be classified to the
FDB database corresponding to the VID from their 802.1Q header), and
although the 8021q module does call dev_uc_add() towards the real
device, that API is VLAN-unaware, so it only contains the MAC address,
not the VID; and DSA's current implementation of ndo_set_rx_mode() is
only for VID 0 (corresponding to FDB entries which are installed in an
FDB database which is only hit when the port is VLAN-unaware).

It's interesting to understand why the bridge does not turn on
IFF_PROMISC for its swp0 bridge port, and it may appear at first glance
that this is a regression caused by the logic in commit 2796d0c648c9
("bridge: Automatically manage port promiscuous mode."). After all,
a bridge port needs to have IFF_PROMISC by its very nature - it needs to
receive and forward frames with a MAC DA different from the bridge
ports' MAC addresses.

While that may be true, when the bridge is VLAN-aware *and* it has a
single port, there is no real reason to enable promiscuity even if that
is an automatic port, with flooding and learning (there is nowhere for
packets to go except to the BR_FDB_LOCAL entries), and this is how the
corner case appears. Adding a second automatic interface to the bridge
would make swp0 promisc as well, and would mask the corner case.

Given the dev_uc_add() / ndo_set_rx_mode() API is what it is (it doesn't
pass a VLAN ID), the only way to address that problem is to install host
FDB entries for the cartesian product of RX filtering MAC addresses and
VLAN RX filters.

Fixes: 7569459a52c9 ("net: dsa: manage flooding on the CPU ports")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230329151821.745752-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: dsa: mv88e6xxx: Enable IGMP snooping on user ports only
Steffen Bätz [Wed, 29 Mar 2023 15:01:40 +0000 (12:01 -0300)]
net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only

Do not set the MV88E6XXX_PORT_CTL0_IGMP_MLD_SNOOP bit on CPU or DSA ports.

This allows the host CPU port to be a regular IGMP listener by sending out
IGMP Membership Reports, which would otherwise not be forwarded by the
mv88exxx chip, but directly looped back to the CPU port itself.

Fixes: 54d792f257c6 ("net: dsa: Centralise global and port setup code into mv88e6xxx.")
Signed-off-by: Steffen Bätz <steffen@innosonix.de>
Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230329150140.701559-1-festevam@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge branch 'etnaviv/fixes' of https://git.pengutronix.de/git/lst/linux into drm...
Daniel Vetter [Thu, 30 Mar 2023 18:15:06 +0000 (20:15 +0200)]
Merge branch 'etnaviv/fixes' of https://git.pengutronix.de/git/lst/linux into drm-fixes

- revert gpu time fdinfo support
- reference leak fix on imported buffers

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Lucas Stach <l.stach@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/de8e08c2599ec0e22456ae36e9757b9ff14c2124.camel@pengutronix.de
13 months agothermal: intel: powerclamp: Fix cpumask and max_idle module parameters
David Arcari [Thu, 30 Mar 2023 13:42:18 +0000 (09:42 -0400)]
thermal: intel: powerclamp: Fix cpumask and max_idle module parameters

When cpumask is specified as a module parameter the value is
overwritten by the module init routine.  This can easily be fixed
by checking to see if the mask has already been allocated in the
init routine.

When max_idle is specified as a module parameter a panic will occur.
The problem is that the idle_injection_cpu_mask is not allocated until
the module init routine executes. This can easily be fixed by allocating
the cpumask if it's not already allocated.

Fixes: ebf519710218 ("thermal: intel: powerclamp: Add two module parameters")
Signed-off-by: David Arcari <darcari@redhat.com>
Reviewed-by: Srinivas Pandruvada<srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
13 months agoMerge tag 'amd-drm-fixes-6.3-2023-03-30' of https://gitlab.freedesktop.org/agd5f...
Daniel Vetter [Thu, 30 Mar 2023 17:59:06 +0000 (19:59 +0200)]
Merge tag 'amd-drm-fixes-6.3-2023-03-30' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.3-2023-03-30:

amdgpu:
- Hibernation regression fix

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230330153859.18332-1-alexander.deucher@amd.com
13 months agoMerge tag 'drm-misc-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm-misc...
Daniel Vetter [Thu, 30 Mar 2023 16:56:52 +0000 (18:56 +0200)]
Merge tag 'drm-misc-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Short summary of fixes pull:

 * various ivpu fixes
 * fix nouveau backlight registration
 * fix buddy allocator in 32-bit systems

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230330141006.GA22908@linux-uq9g