linux-2.6-block.git
4 weeks agoMerge branch 'for-6.10/io_uring' into for-next
Jens Axboe [Wed, 17 Apr 2024 14:29:44 +0000 (08:29 -0600)]
Merge branch 'for-6.10/io_uring' into for-next

* for-6.10/io_uring:
  io-wq: Drop intermediate step between pending list and active work
  io-wq: write next_work before dropping acct_lock

4 weeks agoio-wq: Drop intermediate step between pending list and active work
Gabriel Krisman Bertazi [Tue, 16 Apr 2024 02:10:54 +0000 (22:10 -0400)]
io-wq: Drop intermediate step between pending list and active work

next_work is only used to make the work visible for
cancellation. Instead, we can just directly write to cur_work before
dropping the acct_lock and avoid the extra hop.

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20240416021054.3940-3-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio-wq: write next_work before dropping acct_lock
Gabriel Krisman Bertazi [Tue, 16 Apr 2024 02:10:53 +0000 (22:10 -0400)]
io-wq: write next_work before dropping acct_lock

Commit 361aee450c6e ("io-wq: add intermediate work step between pending
list and active work") closed a race between a cancellation and the work
being removed from the wq for execution.  To ensure the request is
always reachable by the cancellation, we need to move it within the wq
lock, which also synchronizes the cancellation.  But commit
42abc95f05bf ("io-wq: decouple work_list protection from the big
wqe->lock") replaced the wq lock here and accidentally reintroduced the
race by releasing the acct_lock too early.

In other words:

        worker                        |     cancellation
work = io_get_next_work()             |
raw_spin_unlock(&acct->lock);         |
                                      |
                                      | io_acct_cancel_pending_work
                                      | io_wq_worker_cancel()
worker->next_work = work              |

Using acct_lock is still enough since we synchronize on it in
io_acct_cancel_pending_work().
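
As a rough illustration of the fixed ordering (the exact io-wq field and
helper names may differ slightly from this sketch):

    raw_spin_lock(&acct->lock);
    work = io_get_next_work(acct, worker);
    if (work)
            /* publish before unlock so cancellation can still find it */
            worker->cur_work = work;
    raw_spin_unlock(&acct->lock);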

Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20240416021054.3940-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoMerge branch 'for-6.10/io_uring' into for-next
Jens Axboe [Mon, 15 Apr 2024 19:06:36 +0000 (13:06 -0600)]
Merge branch 'for-6.10/io_uring' into for-next

* for-6.10/io_uring:
  io_uring/sqpoll: work around a potential audit memory leak

4 weeks agoio_uring/sqpoll: work around a potential audit memory leak
Jens Axboe [Thu, 21 Mar 2024 13:38:38 +0000 (07:38 -0600)]
io_uring/sqpoll: work around a potential audit memory leak

kmemleak complains that there's a memory leak related to connect
handling:

unreferenced object 0xffff0001093bdf00 (size 128):
comm "iou-sqp-455", pid 457, jiffies 4294894164
hex dump (first 32 bytes):
02 00 fa ea 7f 00 00 01 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
backtrace (crc 2e481b1a):
[<00000000c0a26af4>] kmemleak_alloc+0x30/0x38
[<000000009c30bb45>] kmalloc_trace+0x228/0x358
[<000000009da9d39f>] __audit_sockaddr+0xd0/0x138
[<0000000089a93e34>] move_addr_to_kernel+0x1a0/0x1f8
[<000000000b4e80e6>] io_connect_prep+0x1ec/0x2d4
[<00000000abfbcd99>] io_submit_sqes+0x588/0x1e48
[<00000000e7c25e07>] io_sq_thread+0x8a4/0x10e4
[<00000000d999b491>] ret_from_fork+0x10/0x20

which can happen if:

1) The command type does something on the prep side that triggers an
   audit call.
2) The thread hasn't done any operations before this that triggered
   an audit call inside ->issue(), where we have audit_uring_entry()
   and audit_uring_exit().

Work around this by issuing a blanket NOP operation before the SQPOLL
does anything.
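
A minimal sketch of the workaround, assuming it is done by bracketing a
dummy NOP with the existing audit hooks before the SQPOLL thread submits
anything:

    /*
     * Populate the audit context once up front, so that a later
     * prep-side audit allocation is not left dangling.
     */
    audit_uring_entry(IORING_OP_NOP);
    audit_uring_exit(true, 0);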

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoMerge branch 'for-6.10/block' into for-next
Jens Axboe [Mon, 15 Apr 2024 14:12:29 +0000 (08:12 -0600)]
Merge branch 'for-6.10/block' into for-next

* for-6.10/block:
  block: Call blkdev_dio_unaligned() from blkdev_direct_IO()

4 weeks agoblock: Call blkdev_dio_unaligned() from blkdev_direct_IO()
John Garry [Mon, 15 Apr 2024 12:20:20 +0000 (12:20 +0000)]
block: Call blkdev_dio_unaligned() from blkdev_direct_IO()

blkdev_dio_unaligned() is called from __blkdev_direct_IO(),
__blkdev_direct_IO_simple(), and __blkdev_direct_IO_async(), and all these
are only called from blkdev_direct_IO().

Move the blkdev_dio_unaligned() call to the common callsite,
blkdev_direct_IO().

Pass those functions the bdev pointer from blkdev_direct_IO(), as it is
non-trivial to look up.
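
A simplified sketch of the resulting common callsite (the argument order
passed to the three variants is illustrative):

    static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
    {
            struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
            unsigned int nr_pages;

            if (!iov_iter_count(iter))
                    return 0;
            /* one common alignment check instead of one per variant */
            if (blkdev_dio_unaligned(bdev, iocb->ki_pos, iter))
                    return -EINVAL;

            nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
            if (is_sync_kiocb(iocb) && nr_pages <= BIO_MAX_VECS)
                    return __blkdev_direct_IO_simple(iocb, iter, bdev, nr_pages);
            if (nr_pages <= BIO_MAX_VECS)
                    return __blkdev_direct_IO_async(iocb, iter, bdev, nr_pages);
            return __blkdev_direct_IO(iocb, iter, bdev, bio_max_segs(nr_pages));
    }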

Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240415122020.1541594-1-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoMerge branch 'for-6.10/io_uring' into for-next
Jens Axboe [Mon, 15 Apr 2024 14:11:11 +0000 (08:11 -0600)]
Merge branch 'for-6.10/io_uring' into for-next

* for-6.10/io_uring: (72 commits)
  io_uring/notif: shrink account_pages to u32
  io_uring/notif: remove ctx var from io_notif_tw_complete
  io_uring/notif: refactor io_tx_ubuf_complete()
  io_uring: ensure overflow entries are dropped when ring is exiting
  io_uring/timeout: remove duplicate initialization of the io_timeout list.
  io_uring: consolidate overflow flushing
  io_uring: always lock __io_cqring_overflow_flush
  io_uring: open code io_cqring_overflow_flush()
  io_uring: remove extra SQPOLL overflow flush
  io_uring: unexport io_req_cqe_overflow()
  io_uring: separate header for exported net bits
  io_uring/net: set MSG_ZEROCOPY for sendzc in advance
  io_uring/net: get rid of io_notif_complete_tw_ext
  io_uring/net: merge ubuf sendzc callbacks
  io_uring: return void from io_put_kbuf_comp()
  io_uring: remove io_req_put_rsrc_locked()
  io_uring: remove async request cache
  io_uring: turn implicit assumptions into a warning
  io_uring: kill dead code in io_req_complete_post
  io_uring/kbuf: remove dead define
  ...

4 weeks agoMerge branch 'for-6.10/block' into for-next
Jens Axboe [Mon, 15 Apr 2024 14:11:08 +0000 (08:11 -0600)]
Merge branch 'for-6.10/block' into for-next

* for-6.10/block:
  blk-cgroup: use group allocation/free of per-cpu counters API
  btrfs: use bio_list_merge_init
  dm: use bio_list_merge_init
  blk-cgroup: use bio_list_merge_init
  block: add a bio_list_merge_init helper
  blk-throttle: Only use seq_printf() in tg_prfill_limit()
  blk-mq: don't schedule block kworker on isolated CPUs
  brd: Remove use of page->index

4 weeks agoio_uring/notif: shrink account_pages to u32
Pavel Begunkov [Mon, 15 Apr 2024 12:50:13 +0000 (13:50 +0100)]
io_uring/notif: shrink account_pages to u32

->account_pages is the number of pages we account against the user,
derived from an unsigned len; it definitely fits into an unsigned int,
which saves some space in struct io_notif_data.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/19f2687fcb36daa74d86f4a27bfb3d35cffec318.1713185320.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/notif: remove ctx var from io_notif_tw_complete
Pavel Begunkov [Mon, 15 Apr 2024 12:50:12 +0000 (13:50 +0100)]
io_uring/notif: remove ctx var from io_notif_tw_complete

We don't need ctx in the hottest path, i.e. registered buffers,
let's get it only when we need it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e7345e268ffaeaf79b4c8f3a5d019d6a87a3d1f1.1713185320.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/notif: refactor io_tx_ubuf_complete()
Pavel Begunkov [Mon, 15 Apr 2024 12:50:11 +0000 (13:50 +0100)]
io_uring/notif: refactor io_tx_ubuf_complete()

Flip the dec_and_test "if", that makes the function extension easier in
the future.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/43939e2b04dff03bff5d7227c98afedf951227b3.1713185320.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: ensure overflow entries are dropped when ring is exiting
Jens Axboe [Fri, 12 Apr 2024 19:16:20 +0000 (13:16 -0600)]
io_uring: ensure overflow entries are dropped when ring is exiting

A previous consolidation cleanup missed handling the case where the ring
is dying, and __io_cqring_overflow_flush() doesn't flush entries if the
CQ ring is already full. This is fine for the normal CQE overflow
flushing, but if the ring is going away, we need to flush everything,
even if it means simply freeing the overflown entries.
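
A sketch of the intended behaviour; io_cqring_full() and
io_post_overflow_cqe() below are stand-ins for the actual checks and
helpers:

    while (!list_empty(&ctx->cq_overflow_list)) {
            struct io_overflow_cqe *ocqe;

            /* normal flushing stops once the CQ ring has no room left... */
            if (!dying && io_cqring_full(ctx))
                    break;
            ocqe = list_first_entry(&ctx->cq_overflow_list,
                                    struct io_overflow_cqe, list);
            /* ...but on exit, entries are simply dropped instead of posted */
            if (!dying)
                    io_post_overflow_cqe(ctx, ocqe);
            list_del(&ocqe->list);
            kfree(ocqe);
    }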

Fixes: 6c948ec44b29 ("io_uring: consolidate overflow flushing")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/timeout: remove duplicate initialization of the io_timeout list.
Ruyi Zhang [Thu, 11 Apr 2024 05:59:53 +0000 (13:59 +0800)]
io_uring/timeout: remove duplicate initialization of the io_timeout list.

In __io_timeout_prep(), the io_timeout list is initialized twice, so
remove the meaningless second initialization.

Signed-off-by: Ruyi Zhang <ruyi.zhang@samsung.com>
Link: https://lore.kernel.org/r/20240411055953.2029218-1-ruyi.zhang@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: consolidate overflow flushing
Pavel Begunkov [Wed, 10 Apr 2024 01:26:55 +0000 (02:26 +0100)]
io_uring: consolidate overflow flushing

Consolidate __io_cqring_overflow_flush() and io_cqring_overflow_kill()
into a single function, as it once was; it's easier to work with it this
way.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/986b42c35e76a6be7aa0cdcda0a236a2222da3a7.1712708261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: always lock __io_cqring_overflow_flush
Pavel Begunkov [Wed, 10 Apr 2024 01:26:54 +0000 (02:26 +0100)]
io_uring: always lock __io_cqring_overflow_flush

Conditional locking is never great, and in the case of
__io_cqring_overflow_flush(), which is a slow path, it's not justified.
Don't handle IOPOLL separately; always grab uring_lock for overflow
flushing.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/162947df299aa12693ac4b305dacedab32ec7976.1712708261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: open code io_cqring_overflow_flush()
Pavel Begunkov [Wed, 10 Apr 2024 01:26:53 +0000 (02:26 +0100)]
io_uring: open code io_cqring_overflow_flush()

There is only one caller of io_cqring_overflow_flush(), so open code it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a1fecd56d9dba923ed8d4d159727fa939d3baa2a.1712708261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: remove extra SQPOLL overflow flush
Pavel Begunkov [Wed, 10 Apr 2024 01:26:52 +0000 (02:26 +0100)]
io_uring: remove extra SQPOLL overflow flush

Commit c1edbf5f081be ("io_uring: flag SQPOLL busy condition to userspace")
added an extra overflowed CQE flush in the SQPOLL submission path due to
backpressure, which was later removed. Remove the flush and let
io_cqring_wait() / iopoll handle it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2a83b0724ca6ca9d16c7d79a51b77c81876b2e39.1712708261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: unexport io_req_cqe_overflow()
Pavel Begunkov [Wed, 10 Apr 2024 01:26:51 +0000 (02:26 +0100)]
io_uring: unexport io_req_cqe_overflow()

There are no users of io_req_cqe_overflow() apart from io_uring.c, make
it static.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f4295eb2f9eb98d5db38c0578f57f0b86bfe0d8c.1712708261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: separate header for exported net bits
Pavel Begunkov [Tue, 9 Apr 2024 21:05:53 +0000 (14:05 -0700)]
io_uring: separate header for exported net bits

We're exporting some io_uring bits to networking, e.g. for implementing
a net callback for io_uring cmds, but we don't want to expose more than
needed. Add a separate header for networking.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://lore.kernel.org/r/20240409210554.1878789-1-dw@davidwei.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/net: set MSG_ZEROCOPY for sendzc in advance
Pavel Begunkov [Sun, 7 Apr 2024 23:54:57 +0000 (00:54 +0100)]
io_uring/net: set MSG_ZEROCOPY for sendzc in advance

We can set MSG_ZEROCOPY at the preparation step, do it so we don't have
to care about it later in the issue callback.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c2c22aaa577624977f045979a6db2b9fb2e5648c.1712534031.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/net: get rid of io_notif_complete_tw_ext
Pavel Begunkov [Sun, 7 Apr 2024 23:54:56 +0000 (00:54 +0100)]
io_uring/net: get rid of io_notif_complete_tw_ext

io_notif_complete_tw_ext() can be removed and combined with
io_notif_complete_tw to make it simpler without sacrificing
anything.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/025a124a5e20e2474a57e2f04f16c422eb83063c.1712534031.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/net: merge ubuf sendzc callbacks
Pavel Begunkov [Sun, 7 Apr 2024 23:54:55 +0000 (00:54 +0100)]
io_uring/net: merge ubuf sendzc callbacks

Splitting io_tx_ubuf_callback_ext from io_tx_ubuf_callback is a premature
optimisation that doesn't give us much. Merge the functions into
one and reclaim some simplicity back.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d44d68f6f7add33a0dcf0b7fd7b73c2dc543604f.1712534031.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: return void from io_put_kbuf_comp()
Ming Lei [Sun, 7 Apr 2024 13:27:59 +0000 (21:27 +0800)]
io_uring: return void from io_put_kbuf_comp()

The only caller doesn't handle the return value of io_put_kbuf_comp(), so
change its return type into void.

Also follow Jens's suggestion to rename it as io_put_kbuf_drop().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240407132759.4056167-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: remove io_req_put_rsrc_locked()
Pavel Begunkov [Fri, 5 Apr 2024 15:50:05 +0000 (16:50 +0100)]
io_uring: remove io_req_put_rsrc_locked()

io_req_put_rsrc_locked() is a weird shim function around
io_req_put_rsrc(). All calls to io_req_put_rsrc() require holding
->uring_lock, so we can just use it directly.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a195bc78ac3d2c6fbaea72976e982fe51e50ecdd.1712331455.git.asml.silence@gmail.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: remove async request cache
Pavel Begunkov [Fri, 5 Apr 2024 15:50:04 +0000 (16:50 +0100)]
io_uring: remove async request cache

io_req_complete_post() was a sole user of ->locked_free_list, but
since we just gutted the function, the cache is not used anymore and
can be removed.

->locked_free_list served as an asynchronous counterpart of the main
request (i.e. struct io_kiocb) cache for all unlocked cases like io-wq.
Now they're all forced to be completed into the main cache directly,
off of the normal completion path or via io_free_req().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7bffccd213e370abd4de480e739d8b08ab6c1326.1712331455.git.asml.silence@gmail.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: turn implicit assumptions into a warning
Pavel Begunkov [Fri, 5 Apr 2024 15:50:03 +0000 (16:50 +0100)]
io_uring: turn implicit assumptions into a warning

io_req_complete_post() is now io-wq only and shouldn't be used outside
of it, i.e. it relies on io-wq holding a ref for the request, as
explained in a comment below. Let's add a warning to enforce the
assumption and make sure nobody would try to do anything weird.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1013b60c35d431d0698cafbc53c06f5917348c20.1712331455.git.asml.silence@gmail.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: kill dead code in io_req_complete_post
Ming Lei [Fri, 5 Apr 2024 15:50:02 +0000 (16:50 +0100)]
io_uring: kill dead code in io_req_complete_post

Since commit 8f6c829491fe ("io_uring: remove struct io_tw_state::locked"),
io_req_complete_post() is only called from io-wq submit work, where the
request reference is guaranteed to be grabbed and won't drop to zero
in io_req_complete_post().

Kill the dead code, meantime add req_ref_put() to put the reference.

Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1d8297e2046553153e763a52574f0e0f4d512f86.1712331455.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/kbuf: remove dead define
Jens Axboe [Fri, 29 Mar 2024 23:22:27 +0000 (17:22 -0600)]
io_uring/kbuf: remove dead define

We no longer use IO_BUFFER_LIST_BUF_PER_PAGE, kill it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: fix warnings on shadow variables
Jens Axboe [Fri, 29 Mar 2024 23:19:45 +0000 (17:19 -0600)]
io_uring: fix warnings on shadow variables

There are a few of those:

io_uring/fdinfo.c:170:16: warning: declaration shadows a local variable [-Wshadow]
  170 |                 struct file *f = io_file_from_index(&ctx->file_table, i);
      |                              ^
io_uring/fdinfo.c:53:67: note: previous declaration is here
   53 | __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
      |                                                                   ^
io_uring/cancel.c:187:25: warning: declaration shadows a local variable [-Wshadow]
  187 |                 struct io_uring_task *tctx = node->task->io_uring;
      |                                       ^
io_uring/cancel.c:166:31: note: previous declaration is here
  166 |                              struct io_uring_task *tctx,
      |                                                    ^
io_uring/register.c:371:25: warning: declaration shadows a local variable [-Wshadow]
  371 |                 struct io_uring_task *tctx = node->task->io_uring;
      |                                       ^
io_uring/register.c:312:24: note: previous declaration is here
  312 |         struct io_uring_task *tctx = NULL;
      |                               ^

and a simple cleanup gets rid of them. For the fdinfo case, make a
distinction between the file being passed in (for the ring), and the
registered files we iterate. For the other two cases, just get rid of
the shadowed variable; there's no reason to have a new one.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: move mapping/allocation helpers to a separate file
Jens Axboe [Wed, 27 Mar 2024 20:59:09 +0000 (14:59 -0600)]
io_uring: move mapping/allocation helpers to a separate file

Move the related code from io_uring.c into memmap.c. No functional
changes in this patch, just cleaning it up a bit now that the full
transition is done.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: use unpin_user_pages() where appropriate
Jens Axboe [Wed, 13 Mar 2024 21:01:03 +0000 (15:01 -0600)]
io_uring: use unpin_user_pages() where appropriate

There are a few cases of open-coded loops around unpin_user_page(); use
the generic helper instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ring
Jens Axboe [Wed, 13 Mar 2024 02:24:21 +0000 (20:24 -0600)]
io_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ring

Rather than use remap_pfn_range() for this and manually free later,
switch to using vm_insert_page() and have it Just Work.

This requires a bit of effort on the mmap lookup side, as the ctx
uring_lock isn't held, which otherwise protects buffer_lists from being
torn down, and it's not safe to grab from mmap context as that would
introduce an ABBA deadlock between the mmap lock and the ctx uring_lock.
Instead, look up the buffer_list under RCU, as the list is RCU freed
already. Use the existing reference count to determine whether it's
possible to safely grab a reference to it (e.g. if it's not zero already),
and drop that reference when done with the mapping. If the mmap
reference is the last one, the buffer_list and the associated memory can
go away, since the vma insertion has references to the inserted pages at
that point.
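
A sketch of the mmap-side lookup pattern described above (field and
helper names are illustrative):

    struct io_buffer_list *bl;

    rcu_read_lock();
    bl = xa_load(&ctx->io_bl_xa, bgid);
    /* only usable if it hasn't been torn down already */
    if (bl && !atomic_inc_not_zero(&bl->refs))
            bl = NULL;
    rcu_read_unlock();

    /* ... insert bl's pages into the vma ... */

    /* drop the mmap reference when done; the last ref frees the list */
    if (bl && atomic_dec_and_test(&bl->refs))
            io_free_buffer_list(ctx, bl);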

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/kbuf: vmap pinned buffer ring
Jens Axboe [Tue, 12 Mar 2024 16:42:27 +0000 (10:42 -0600)]
io_uring/kbuf: vmap pinned buffer ring

This avoids needing to care about HIGHMEM, and it makes the buffer
indexing easier as both ring provided buffer methods are now virtually
mapped in a contiguous fashion.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: unify io_pin_pages()
Jens Axboe [Wed, 13 Mar 2024 20:58:14 +0000 (14:58 -0600)]
io_uring: unify io_pin_pages()

Move it into io_uring.c where it belongs, and use it in there as well
rather than have two implementations of this.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: use vmap() for ring mapping
Jens Axboe [Wed, 13 Mar 2024 20:10:40 +0000 (14:10 -0600)]
io_uring: use vmap() for ring mapping

This is the last holdout which does odd page checking; convert it to
vmap just like what is done for the non-mmap path.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: get rid of remap_pfn_range() for mapping rings/sqes
Jens Axboe [Wed, 13 Mar 2024 15:56:14 +0000 (09:56 -0600)]
io_uring: get rid of remap_pfn_range() for mapping rings/sqes

Rather than use remap_pfn_range() for this and manually free later,
switch to using vm_insert_pages() and have it Just Work.

If possible, allocate a single compound page that covers the range that
is needed. If that works, then we can just use page_address() on that
page. If we fail to get a compound page, allocate single pages and use
vmap() to map them into the kernel virtual address space.
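
A sketch of the allocation strategy (not the exact kernel code):

    static void *io_pages_map(struct page **pages, int nr_pages)
    {
            gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN;
            struct page *page;
            int i;

            /* try one physically contiguous compound allocation first */
            page = alloc_pages(gfp | __GFP_COMP,
                               get_order((size_t)nr_pages << PAGE_SHIFT));
            if (page) {
                    for (i = 0; i < nr_pages; i++)
                            pages[i] = page + i;
                    return page_address(page);
            }

            /* fall back to single pages stitched together with vmap() */
            for (i = 0; i < nr_pages; i++) {
                    pages[i] = alloc_page(gfp);
                    if (!pages[i])
                            return NULL;    /* caller unwinds */
            }
            return vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
    }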

This just covers the rings/sqes, the other remaining user of the mmap
remap_pfn_range() user will be converted separately. Once that is done,
we can kill the old alloc/free code.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agomm: add nommu variant of vm_insert_pages()
Jens Axboe [Sat, 16 Mar 2024 13:21:43 +0000 (07:21 -0600)]
mm: add nommu variant of vm_insert_pages()

An identical one exists for vm_insert_page(), add one for
vm_insert_pages() to avoid needing to check for CONFIG_MMU in code using
it.
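
Presumably the new variant is just a stub mirroring the existing
vm_insert_page() nommu behaviour; a sketch:

    /* mm/nommu.c */
    int vm_insert_pages(struct vm_area_struct *vma, unsigned long addr,
                        struct page **pages, unsigned long *num)
    {
            return -EINVAL;
    }
    EXPORT_SYMBOL(vm_insert_pages);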

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: Avoid anonymous enums in io_uring uapi
Gabriel Krisman Bertazi [Thu, 28 Mar 2024 21:09:35 +0000 (17:09 -0400)]
io_uring: Avoid anonymous enums in io_uring uapi

While valid C, anonymous enums confuse Cython (the Python to C translator),
as reported by Ritesh (YoSTEALTH) [1]. Since people rely on it when
building against liburing and we want to keep this header in sync with
the library version, let's name the existing enums in the uapi header.

[1] https://github.com/cython/cython/issues/3240
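
For illustration, the change amounts to giving the existing anonymous
enums a name while keeping the constants unchanged (the enum name below
is hypothetical):

    /* before: enum { IORING_MSG_DATA, IORING_MSG_SEND_FD };  (anonymous) */
    enum io_uring_msg_ring_flags {
            IORING_MSG_DATA,
            IORING_MSG_SEND_FD,
    };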

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20240328210935.25640-1-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: use the right type for work_llist empty check
Jens Axboe [Tue, 26 Mar 2024 00:53:33 +0000 (18:53 -0600)]
io_uring: use the right type for work_llist empty check

io_task_work_pending() uses wq_list_empty() on ctx->work_llist, but it's
not an io_wq_work_list, it's a struct llist_head. They both have
->first as head-of-list, and it turns out the checks are identical. But
be proper and use the right helper.
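
A sketch of the corrected check, using the llist helper that matches the
field's actual type:

    static inline int io_task_work_pending(struct io_ring_ctx *ctx)
    {
            /* ctx->work_llist is a struct llist_head, not an io_wq_work_list */
            return task_work_pending(current) || !llist_empty(&ctx->work_llist);
    }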

Fixes: dac6a0eae793 ("io_uring: ensure iopoll runs local task work as well")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: Remove the now superfluous sentinel elements from ctl_table array
Joel Granados [Thu, 28 Mar 2024 15:57:53 +0000 (16:57 +0100)]
io_uring: Remove the now superfluous sentinel elements from ctl_table array

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels), which
reduces the overall build-time size of the kernel and run-time memory
bloat by ~64 bytes per sentinel (for further information, see:
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/).

Remove sentinel element from kernel_io_uring_disabled_table
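
Illustratively, the change just drops the empty terminating entry from
the array; the other fields below are shown as they are expected to
look, but are not the point of the patch:

    static struct ctl_table kernel_io_uring_disabled_table[] = {
            {
                    .procname       = "io_uring_disabled",
                    .data           = &sysctl_io_uring_disabled,
                    .maxlen         = sizeof(sysctl_io_uring_disabled),
                    .mode           = 0644,
                    .proc_handler   = proc_dointvec_minmax,
                    .extra1         = SYSCTL_ZERO,
                    .extra2         = SYSCTL_TWO,
            },
            /* the trailing { } sentinel entry is gone */
    };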

Signed-off-by: Joel Granados <j.granados@samsung.com>
Link: https://lore.kernel.org/r/20240328-jag-sysctl_remset_misc-v1-6-47c1463b3af2@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: Remove unused function
Jiapeng Chong [Thu, 28 Mar 2024 02:23:24 +0000 (10:23 +0800)]
io_uring: Remove unused function

The function is defined in io_uring.c, but not called elsewhere, so
delete the unused function.

io_uring/io_uring.c:646:20: warning: unused function '__io_cq_unlock'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8660
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240328022324.78029-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: re-arrange Makefile order
Jens Axboe [Wed, 27 Mar 2024 04:13:41 +0000 (22:13 -0600)]
io_uring: re-arrange Makefile order

The object list is a bit of a mess, with core and opcode files mixed in.
Re-arrange it so that we have the core bits first, and then opcode
specific files after that.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: refill request cache in memory order
Jens Axboe [Tue, 26 Mar 2024 01:07:22 +0000 (19:07 -0600)]
io_uring: refill request cache in memory order

The allocator will generally return memory in order, but
__io_alloc_req_refill() then adds them to a stack and we'll extract them
in the opposite order. This obviously isn't a huge deal, but:

1) it makes debugging easier when they are in order
2) keeping them in-order is the right thing to do
3) reduces the code for adding them to the stack

Just add them in reverse to the stack.
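
A sketch of the idea: push the bulk-allocated requests onto the cache
stack back to front, so popping them hands them out in allocation
(memory) order. io_req_add_to_cache() here stands for whatever helper
pushes a request onto the cache stack:

    /* reqs[] was just bulk-allocated, lowest address first */
    while (nr--)
            io_req_add_to_cache(reqs[nr], ctx);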

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/poll: shrink alloc cache size to 32
Jens Axboe [Thu, 21 Mar 2024 13:53:07 +0000 (07:53 -0600)]
io_uring/poll: shrink alloc cache size to 32

This should be plenty, rather than the default of 128, and matches what
we have on the rsrc and futex side as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/alloc_cache: switch to array based caching
Jens Axboe [Wed, 20 Mar 2024 21:19:44 +0000 (15:19 -0600)]
io_uring/alloc_cache: switch to array based caching

Currently lists are being used to manage this, but best practice is
usually to have these in an array instead, as that is cheaper to manage.

Outside of that detail, games are also played with KASAN as the list
is inside the cached entry itself.

Finally, all users of this need a struct io_cache_entry embedded in
their struct, which is union'ized with something else in there that
isn't used across the free -> realloc cycle.

Get rid of all of that, and simply have it be an array. This will not
change the memory used, as we're just trading an 8-byte member entry
for the per-elem array size.

This reduces the overhead of the recycled allocations, and it reduces
the amount of code needed to support recycling to about half of
what it currently is.
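
A rough sketch of an array-backed cache of this kind (the real struct
and helpers differ in detail):

    struct io_alloc_cache {
            void            **entries;
            unsigned int    nr_cached;
            unsigned int    max_cached;
            size_t          elem_size;
    };

    static inline bool io_cache_put(struct io_alloc_cache *cache, void *entry)
    {
            if (cache->nr_cached >= cache->max_cached)
                    return false;            /* caller frees it instead */
            cache->entries[cache->nr_cached++] = entry;
            return true;
    }

    static inline void *io_cache_get(struct io_alloc_cache *cache)
    {
            if (!cache->nr_cached)
                    return NULL;             /* caller allocates fresh */
            return cache->entries[--cache->nr_cached];
    }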

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: drop ->prep_async()
Jens Axboe [Tue, 19 Mar 2024 02:48:38 +0000 (20:48 -0600)]
io_uring: drop ->prep_async()

It's now unused, drop the code related to it. This includes the
io_issue_defs->manual_alloc field.

While in there, and since ->async_size is now being used a bit more
frequently and in the issue path, move it to io_issue_defs[].

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/uring_cmd: defer SQE copying until it's needed
Jens Axboe [Wed, 20 Mar 2024 21:23:47 +0000 (15:23 -0600)]
io_uring/uring_cmd: defer SQE copying until it's needed

The previous commit turned on async data for uring_cmd, and did the
basic conversion of setting everything up on the prep side. However, for
a lot of use cases, -EIOCBQUEUED will get returned on issue, as the
operation got successfully queued. For that case, a persistent SQE isn't
needed, as it's just used for issue.

Unless execution goes async immediately, defer copying the double SQE
until it's necessary.

This greatly reduces the overhead of such commands, as evidenced by
a perf diff from before and after this change:

    10.60%     -8.58%  [kernel.vmlinux]  [k] io_uring_cmd_prep

where the prep side drops from 10.60% to ~2%, which is more expected.
Performance also rises from ~113M IOPS to ~122M IOPS, bringing us back
to where it was before the async command prep.
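
A rough sketch of the deferral on the issue side (the async-data field
name "ac->sqes" is illustrative):

    ret = file->f_op->uring_cmd(ioucmd, issue_flags);
    if (ret == -EAGAIN) {
            /* going async: only now take a stable copy of the SQE */
            memcpy(ac->sqes, ioucmd->sqe, uring_sqe_size(req->ctx));
            ioucmd->sqe = ac->sqes;
    }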

Tested-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/uring_cmd: switch to always allocating async data
Jens Axboe [Tue, 19 Mar 2024 02:41:58 +0000 (20:41 -0600)]
io_uring/uring_cmd: switch to always allocating async data

Basic conversion ensuring async_data is allocated off the prep path. Adds
a basic alloc cache as well, as passthrough IO can be quite high in rate.

Tested-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/net: move connect to always using async data
Jens Axboe [Tue, 19 Mar 2024 02:37:22 +0000 (20:37 -0600)]
io_uring/net: move connect to always using async data

While doing that, get rid of io_async_connect and just use the generic
io_async_msghdr. Both of them have a struct sockaddr_storage in there,
and while io_async_msghdr is bigger, if the same type can be used then
the netmsg_cache can get reused for connect as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/rw: add iovec recycling
Jens Axboe [Mon, 18 Mar 2024 22:31:44 +0000 (16:31 -0600)]
io_uring/rw: add iovec recycling

Let the io_async_rw hold on to the iovec and reuse it, rather than always
allocate and free them.

Also enables KASAN for the iovec entries, so that reuse can be detected
even while they are in the cache.

While doing so, shrink io_async_rw by getting rid of the bigger embedded
fast iovec. Since iovecs are being recycled now, shrink it from 8 to 1.
This reduces the io_async_rw size from 264 to 160 bytes, a 40% reduction.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/rw: cleanup retry path
Jens Axboe [Sat, 23 Mar 2024 02:41:18 +0000 (20:41 -0600)]
io_uring/rw: cleanup retry path

We no longer need to gate a potential retry on whether or not the
context matches our original task, as all read/write operations have
been fully prepared upfront. This means there's never any re-import
needed, and hence we can always retry requests.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: get rid of struct io_rw_state
Jens Axboe [Mon, 18 Mar 2024 22:25:58 +0000 (16:25 -0600)]
io_uring: get rid of struct io_rw_state

A separate state struct is not needed anymore, just fold it in with
io_async_rw.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/rw: always setup io_async_rw for read/write requests
Jens Axboe [Mon, 18 Mar 2024 22:13:01 +0000 (16:13 -0600)]
io_uring/rw: always setup io_async_rw for read/write requests

read/write requests try to put everything on the stack, and then alloc
and copy if a retry is needed. This necessitates a bunch of nasty code
that deals with intermediate state.

Get rid of this, and have the prep side setup everything that is needed
upfront, which greatly simplifies the opcode handlers.

This includes adding an alloc cache for io_async_rw, to make it cheap
to handle.

In terms of cost, this should be basically free and transparent. For
the worst case of {READ,WRITE}_FIXED which didn't need it before,
performance is unaffected in the normal peak workload that is being
used to test that. Still runs at 122M IOPS.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/net: drop 'kmsg' parameter from io_req_msg_cleanup()
Jens Axboe [Mon, 18 Mar 2024 19:52:42 +0000 (13:52 -0600)]
io_uring/net: drop 'kmsg' parameter from io_req_msg_cleanup()

Now that iovec recycling is being done, the iovec is no longer being
freed in there. Hence the kmsg parameter is now useless.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/net: add iovec recycling
Jens Axboe [Sat, 16 Mar 2024 21:33:53 +0000 (15:33 -0600)]
io_uring/net: add iovec recycling

Right now the io_async_msghdr is recycled to avoid the overhead of
allocating+freeing it for every request. But the iovec is not included,
hence that will be allocated and freed for each transfer regardless.
This commit enables recycling of the iovec between io_async_msghdr
recycles. This avoids alloc+free for each one if an iovec is used, and
on top of that, it extends the cache hot nature of msg to the iovec as
well.

Also enables KASAN for the iovec entries, so that reuse can be detected
even while they are in the cache.

The io_async_msghdr also shrinks from 376 -> 288 bytes, an 88 byte
saving (or ~23% smaller), as the fast_iovec entry is dropped from 8
entries to a single entry. There's no point keeping a big fast iovec
entry, if iovecs aren't being allocated and freed continually.
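
A rough sketch of what recycling the iovec alongside the msghdr can look
like; the field names and cache helper here are illustrative:

    static void io_netmsg_recycle(struct io_kiocb *req)
    {
            struct io_async_msghdr *hdr = req->async_data;

            /* hang on to hdr->free_iov for the next request instead of freeing it */
            if (io_alloc_cache_put(&req->ctx->netmsg_cache, hdr)) {
                    req->async_data = NULL;
                    return;
            }
            /* cache full: free the iovec and the msghdr as before */
            kfree(hdr->free_iov);
            kfree(hdr);
            req->async_data = NULL;
    }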

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/net: remove (now) dead code in io_netmsg_recycle()
Jens Axboe [Thu, 21 Mar 2024 01:09:50 +0000 (19:09 -0600)]
io_uring/net: remove (now) dead code in io_netmsg_recycle()

All net commands have async data at this point, there's no reason to
check if this is the case or not.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: kill io_msg_alloc_async_prep()
Jens Axboe [Mon, 18 Mar 2024 16:07:37 +0000 (10:07 -0600)]
io_uring: kill io_msg_alloc_async_prep()

We now ONLY call io_msg_alloc_async() from inside prep handling, which
is always locked. No need for this helper anymore, or the check in
io_msg_alloc_async() on whether the ring is locked or not.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/net: get rid of ->prep_async() for send side
Jens Axboe [Mon, 18 Mar 2024 14:09:47 +0000 (08:09 -0600)]
io_uring/net: get rid of ->prep_async() for send side

Move the io_async_msghdr out of the issue path and into prep handling,
since it's now done unconditionally and hence does not need to be part
of the issue path. This means any usage of io_sendrecv_prep_async() and
io_sendmsg_prep_async() goes away, and hence the forced async setup path
is now unified with the normal prep setup.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/net: get rid of ->prep_async() for receive side
Jens Axboe [Mon, 18 Mar 2024 13:36:03 +0000 (07:36 -0600)]
io_uring/net: get rid of ->prep_async() for receive side

Move the io_async_msghdr out of the issue path and into prep handling,
since it's now done unconditionally and hence does not need to be part
of the issue path. This reduces the footprint of the multishot fast
path of multiple invocations of ->issue() per prep, and also means that
using ->prep_async() can be dropped for recvmsg, as this is now done via
setup on the prep side.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/net: always set kmsg->msg.msg_control_user before issue
Jens Axboe [Fri, 12 Apr 2024 18:39:54 +0000 (12:39 -0600)]
io_uring/net: always set kmsg->msg.msg_control_user before issue

We currently set this separately for async/sync entry, but let's just
move it to a generic pre-issue spot and eliminate the difference
between the two.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/net: always setup an io_async_msghdr
Jens Axboe [Sat, 16 Mar 2024 23:26:09 +0000 (17:26 -0600)]
io_uring/net: always setup an io_async_msghdr

Rather than use an on-stack one and then need to allocate and copy if
async execution is required, always grab one upfront. This should be
very cheap, and potentially even have cache hotness benefits for
back-to-back send/recv requests.

For any recv type of request, this is probably a good choice in general,
as it's expected that no data is available initially. For send this is
not necessarily the case, as space in the socket buffer is expected to
be available. However, getting a cached io_async_msghdr is very cheap,
and as it should be cache hot, the difference here is probably negligible,
if any.

A nice side benefit is that io_setup_async_msg can get killed
completely, which has some nasty iovec manipulation code.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/net: unify cleanup handling
Jens Axboe [Tue, 12 Mar 2024 15:38:08 +0000 (09:38 -0600)]
io_uring/net: unify cleanup handling

Now that recv/recvmsg both do the same cleanup, put it in the retry and
finish handlers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/net: switch io_recv() to using io_async_msghdr
Jens Axboe [Tue, 5 Mar 2024 22:39:16 +0000 (15:39 -0700)]
io_uring/net: switch io_recv() to using io_async_msghdr

No functional changes in this patch, just in preparation for carrying
more state than what is available now, if necessary.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/net: switch io_send() and io_send_zc() to using io_async_msghdr
Jens Axboe [Tue, 5 Mar 2024 16:34:21 +0000 (09:34 -0700)]
io_uring/net: switch io_send() and io_send_zc() to using io_async_msghdr

No functional changes in this patch, just in preparation for carrying
more state than what is being done now, if necessary. While unifying
some of this code, add a generic send setup prep handler that they can
both use.

This gets rid of some manual msghdr and sockaddr on the stack, and makes
it look a bit more like the sendmsg/recvmsg variants. Going forward, more
can get unified on top.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/alloc_cache: shrink default max entries from 512 to 128
Jens Axboe [Sun, 17 Mar 2024 00:23:44 +0000 (18:23 -0600)]
io_uring/alloc_cache: shrink default max entries from 512 to 128

In practice, we just need to recycle a few elements for (by far) most
use cases. Shrink the total size down from 512 to 128, which should be
more than plenty.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: remove timeout/poll specific cancelations
Jens Axboe [Sat, 16 Mar 2024 17:10:21 +0000 (11:10 -0600)]
io_uring: remove timeout/poll specific cancelations

For historical reasons these were special cased, as they were the only
ones that needed cancelation. But now we handle cancelations generally,
and hence there's no need to check for these in
io_ring_ctx_wait_and_kill() when io_uring_try_cancel_requests() handles
both these and the rest as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: flush delayed fallback task_work in cancelation
Jens Axboe [Mon, 18 Mar 2024 16:41:25 +0000 (10:41 -0600)]
io_uring: flush delayed fallback task_work in cancelation

Just like we run the inline task_work, ensure we also factor in and
run the fallback task_work.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: clean up io_lockdep_assert_cq_locked
Pavel Begunkov [Mon, 18 Mar 2024 22:00:35 +0000 (22:00 +0000)]
io_uring: clean up io_lockdep_assert_cq_locked

Move CONFIG_PROVE_LOCKING checks inside of io_lockdep_assert_cq_locked()
and kill the else branch.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/bbf33c429c9f6d7207a8fe66d1a5866ec2c99850.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: refactor io_req_complete_post()
Pavel Begunkov [Mon, 18 Mar 2024 22:00:34 +0000 (22:00 +0000)]
io_uring: refactor io_req_complete_post()

Make io_req_complete_post() push all IORING_SETUP_IOPOLL requests to
task_work; it's much cleaner and is what should normally happen. We couldn't
do it before because there was a possibility of looping in

complete_post() -> tw -> complete_post() -> ...

Also, unexport the function and inline __io_req_complete_post().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/ea19c032ace3e0dd96ac4d991a063b0188037014.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: remove current check from complete_post
Pavel Begunkov [Mon, 18 Mar 2024 22:00:33 +0000 (22:00 +0000)]
io_uring: remove current check from complete_post

task_work execution is now always locked, and we shouldn't get into
io_req_complete_post() from task_work. That means that complete_post() is
always called out of the original task context and we don't even need to
check current.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/24ec27f27db0d8f58c974d8118dca1d345314ddc.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: get rid of intermediate aux cqe caches
Pavel Begunkov [Mon, 18 Mar 2024 22:00:32 +0000 (22:00 +0000)]
io_uring: get rid of intermediate aux cqe caches

io_post_aux_cqe(), which is used for multishot requests, delays
completions by putting CQEs into a temporary array for the purpose of
completion lock/flush batching.

DEFER_TASKRUN doesn't need any locking, so for it we can put completions
directly into the CQ and defer post completion handling with a flag.
That leaves !DEFER_TASKRUN, which is not that interesting / hot for
multishot requests, so have conditional locking with deferred flush
for them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/b1d05a81fd27aaa2a07f9860af13059e7ad7a890.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: refactor io_fill_cqe_req_aux
Pavel Begunkov [Mon, 18 Mar 2024 22:00:31 +0000 (22:00 +0000)]
io_uring: refactor io_fill_cqe_req_aux

The restriction on multishot execution context disallowing io-wq is
driven by the rules of io_fill_cqe_req_aux(): it should only be called
in the master task context, either from the syscall path or in task_work.
Since task_work now always takes the ctx lock implying
IO_URING_F_COMPLETE_DEFER, we can just assume that the function is
always called with its defer argument set to true.

Kill the argument. Also rename the function for more consistency, as
"fill" in CQE related functions was usually meant for raw interfaces
that only copy data into the CQ without any locking, while waking the
user and other accounting is what "post" functions take care of.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/93423d106c33116c7d06bf277f651aa68b427328.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: remove struct io_tw_state::locked
Pavel Begunkov [Mon, 18 Mar 2024 22:00:30 +0000 (22:00 +0000)]
io_uring: remove struct io_tw_state::locked

ctx is always locked for task_work now, so get rid of struct
io_tw_state::locked. Note I'm stopping one step before removing
io_tw_state altogether, even though it is now empty, because it still serves the
purpose of indicating which function is a tw callback and forcing users
not to invoke them carelessly out of a wrong context. The removal can
always be done later.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/e95e1ea116d0bfa54b656076e6a977bc221392a4.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring: force tw ctx locking
Pavel Begunkov [Mon, 18 Mar 2024 22:00:29 +0000 (22:00 +0000)]
io_uring: force tw ctx locking

We can run normal task_work without locking the ctx, however we try to
lock anyway and most handlers prefer or require it locked. It might have
been interesting for a multi-submitter ring with high contention completing
async read/write requests via task_work, however that will still need to
go through io_req_complete_post() and potentially take the lock for
rsrc node putting or some other case.

In other words, it's hard to care about it, so always force the locking.
The case described is also affected by various io_uring caches anyway.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/6ae858f2ef562e6ed9f13c60978c0d48926954ba.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/rw: avoid punting to io-wq directly
Pavel Begunkov [Mon, 18 Mar 2024 22:00:28 +0000 (22:00 +0000)]
io_uring/rw: avoid punting to io-wq directly

kiocb_done() shouldn't have to care about specifically redirecting
requests to io-wq. Remove the hop to tw just to then queue io-wq work:
return -EAGAIN and let the core io_uring code handle the offloading.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/413564e550fe23744a970e1783dfa566291b0e6f.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agonvme/io_uring: use helper for polled completions
Jens Axboe [Tue, 19 Mar 2024 23:10:50 +0000 (17:10 -0600)]
nvme/io_uring: use helper for polled completions

NVMe is making up issue_flags, which is a no-no in general, and to make
matters worse, they are completely the wrong ones. For a pure polled
request, which it does check for, we're already inside the
ctx->uring_lock when the completions are run off io_do_iopoll(). Hence
the correct flag would be '0' rather than IO_URING_F_UNLOCKED.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/cmd: document some uring_cmd related helpers
Pavel Begunkov [Mon, 18 Mar 2024 22:00:26 +0000 (22:00 +0000)]
io_uring/cmd: document some uring_cmd related helpers

Add comments warning users that they're only allowed to pass issue_flags
that were given from io_uring.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/82ff8a45f2c3eb5f3a04a33f0692e5e4a1320455.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/cmd: fix tw <-> issue_flags conversion
Pavel Begunkov [Mon, 18 Mar 2024 22:00:25 +0000 (22:00 +0000)]
io_uring/cmd: fix tw <-> issue_flags conversion

!IO_URING_F_UNLOCKED does not translate to availability of the deferred
completion infra, IO_URING_F_COMPLETE_DEFER does; that's what we should
pass and look for to use io_req_complete_defer() and other variants.

Luckily, it's not a real problem as two wrongs actually made it right,
at least as far as io_uring_cmd_work() goes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/aef76d34fe9410df8ecc42a14544fd76cd9d8b9e.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/cmd: kill one issue_flags to tw conversion
Pavel Begunkov [Mon, 18 Mar 2024 22:00:24 +0000 (22:00 +0000)]
io_uring/cmd: kill one issue_flags to tw conversion

io_uring cmd converts struct io_tw_state to issue_flags and later back
to io_tw_state, it's awfully ill-fated, not to mention that intermediate
issue_flags state is not correct.

Get rid of the last conversion, drag through tw everything that came
with IO_URING_F_UNLOCKED, and replace io_req_complete_defer() with a
direct call to io_req_complete_defer(), at least for the time being.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/c53fa3df749752bd058cf6f824a90704822d6bcc.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoio_uring/cmd: move io_uring_try_cancel_uring_cmd()
Pavel Begunkov [Mon, 18 Mar 2024 22:00:23 +0000 (22:00 +0000)]
io_uring/cmd: move io_uring_try_cancel_uring_cmd()

io_uring_try_cancel_uring_cmd() is a part of the cmd handling, so let's
move it closer to all the other cmd bits, into uring_cmd.c.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/43a3937af4933655f0fd9362c381802f804f43de.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 weeks agoLinux 6.9-rc4
Linus Torvalds [Sun, 14 Apr 2024 20:38:39 +0000 (13:38 -0700)]
Linux 6.9-rc4

4 weeks agoMerge tag 'pull-sysfs-annotation-fix' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Apr 2024 18:41:51 +0000 (11:41 -0700)]
Merge tag 'pull-sysfs-annotation-fix' of git://git./linux/kernel/git/viro/vfs

Pull sysfs fix from Al Viro:
 "Get rid of lockdep false positives around sysfs/overlayfs

  syzbot has uncovered a class of lockdep false positives for setups
  with sysfs being one of the backing layers in overlayfs. The root
  cause is that of->mutex allocated when opening a sysfs file read-only
  (which overlayfs might do) is confused with of->mutex of a file opened
  writable (held in write to sysfs file, which overlayfs won't do).

  Assigning them separate lockdep classes fixes that bunch and it's
  obviously safe"

* tag 'pull-sysfs-annotation-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  kernfs: annotate different lockdep class for of->mutex of writable files

4 weeks agoMerge tag 'x86-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 14 Apr 2024 17:48:51 +0000 (10:48 -0700)]
Merge tag 'x86-urgent-2024-04-14' of git://git./linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

 - Follow up fixes for the BHI mitigations code

 - Fix !SPECULATION_MITIGATIONS bug not turning off mitigations as
   expected

 - Work around an APIC emulation bug when the kernel is built with Clang
   and run as a SEV guest

 - Follow up x86 topology fixes

* tag 'x86-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/amd: Move TOPOEXT enablement into the topology parser
  x86/cpu/amd: Make the NODEID_MSR union actually work
  x86/cpu/amd: Make the CPUID 0x80000008 parser correct
  x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI
  x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto
  x86/bugs: Clarify that syscall hardening isn't a BHI mitigation
  x86/bugs: Fix BHI handling of RRSBA
  x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'
  x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES
  x86/bugs: Fix BHI documentation
  x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n
  x86/topology: Don't update cpu_possible_map in topo_set_cpuids()
  x86/bugs: Fix return type of spectre_bhi_state()
  x86/apic: Force native_apic_mem_read() to use the MOV instruction

4 weeks agoMerge tag 'timers-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Apr 2024 17:32:22 +0000 (10:32 -0700)]
Merge tag 'timers-urgent-2024-04-14' of git://git./linux/kernel/git/tip/tip

Pull timer fixes from Ingo Molnar:

 - Address a (valid) W=1 build warning

 - Fix timer self-tests

 - Annotate a KCSAN warning wrt. accesses to the tick_do_timer_cpu
   global variable

 - Address a !CONFIG_BUG build warning

* tag 'timers-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests: kselftest: Fix build failure with NOLIBC
  selftests: timers: Fix abs() warning in posix_timers test
  selftests: kselftest: Mark functions that unconditionally call exit() as __noreturn
  selftests: timers: Fix posix_timers ksft_print_msg() warning
  selftests: timers: Fix valid-adjtimex signed left-shift undefined behavior
  bug: Fix no-return-statement warning with !CONFIG_BUG
  timekeeping: Use READ/WRITE_ONCE() for tick_do_timer_cpu
  selftests/timers/posix_timers: Reimplement check_timer_distribution()
  irqflags: Explicitly ignore lockdep_hrtimer_exit() argument

4 weeks agoMerge tag 'perf-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 14 Apr 2024 17:26:27 +0000 (10:26 -0700)]
Merge tag 'perf-urgent-2024-04-14' of git://git./linux/kernel/git/tip/tip

Pull perf event fix from Ingo Molnar:
 "Fix the x86 PMU multi-counter code returning invalid data in certain
  circumstances"

* tag 'perf-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix out of range data

4 weeks agoMerge tag 'locking-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Apr 2024 17:13:56 +0000 (10:13 -0700)]
Merge tag 'locking-urgent-2024-04-14' of git://git./linux/kernel/git/tip/tip

Pull locking fix from Ingo Molnar:
 "Fix a PREEMPT_RT build bug"

* tag 'locking-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking: Make rwsem_assert_held_write_nolockdep() build with PREEMPT_RT=y

4 weeks agoMerge tag 'irq-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 14 Apr 2024 17:12:34 +0000 (10:12 -0700)]
Merge tag 'irq-urgent-2024-04-14' of git://git./linux/kernel/git/tip/tip

Pull irq fix from Ingo Molnar:
 "Fix a bug in the GIC irqchip driver"

* tag 'irq-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3-its: Fix VSYNC referencing an unmapped VPE on GIC v4.1

4 weeks agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Sun, 14 Apr 2024 17:05:59 +0000 (10:05 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio bugfixes from Michael Tsirkin:
 "Some small, obvious (in hindsight) bugfixes:

   - new ioctl in vhost-vdpa has a wrong # - not too late to fix

   - vhost has apparently been lacking an smp_rmb() - due to code
     duplication :( The duplication will be fixed in the next merge
     cycle, this is a minimal fix

   - an error message in vhost talks about guest moving used index -
     which of course never happens, guest only ever moves the available
     index

   - i2c-virtio didn't set the driver owner so it did not get refcounted
     correctly"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost: correct misleading printing information
  vhost-vdpa: change ioctl # for VDPA_GET_VRING_SIZE
  virtio: store owner from modules with register_virtio_driver()
  vhost: Add smp_rmb() in vhost_enable_notify()
  vhost: Add smp_rmb() in vhost_vq_avail_empty()

4 weeks agoMerge tag 'dma-maping-6.9-2024-04-14' of git://git.infradead.org/users/hch/dma-mapping
Linus Torvalds [Sun, 14 Apr 2024 17:02:40 +0000 (10:02 -0700)]
Merge tag 'dma-maping-6.9-2024-04-14' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:

 - fix up swiotlb buffer padding even more (Petr Tesarik)

 - fix for partial dma_sync on swiotlb (Michael Kelley)

 - swiotlb debugfs fix (Dexuan Cui)

* tag 'dma-maping-6.9-2024-04-14' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: do not set total_used to 0 in swiotlb_create_debugfs_files()
  swiotlb: fix swiotlb_bounce() to do partial sync's correctly
  swiotlb: extend buffer pre-padding to alloc_align_mask if necessary

5 weeks agokernfs: annotate different lockdep class for of->mutex of writable files
Amir Goldstein [Fri, 5 Apr 2024 14:56:35 +0000 (17:56 +0300)]
kernfs: annotate different lockdep class for of->mutex of writable files

The writable file /sys/power/resume may call vfs lookup helpers for
arbitrary paths, and readonly files can be read by overlayfs from vfs
helpers when sysfs is a lower layer of overlayfs.

To avoid a lockdep warning about a circular dependency between the
overlayfs inode lock and the kernfs of->mutex, use a different lockdep
class for writable and readonly kernfs files (a minimal sketch of the
lockdep-class mechanism follows this entry).

Reported-by: syzbot+9a5b0ced8b1bfb238b56@syzkaller.appspotmail.com
Fixes: 0fedefd4c4e3 ("kernfs: sysfs: support custom llseek method for sysfs entries")
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
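
For readers unfamiliar with lockdep classes, a minimal sketch of the
mechanism the fix relies on (not the kernfs patch; the mutex and key
names are made up): locks initialised against different lock_class_key
objects are tracked as separate classes, so dependency chains recorded
for one never alias the other.

#include <linux/lockdep.h>
#include <linux/mutex.h>

/* One key per class: lockdep treats each key as a distinct lock class. */
static struct lock_class_key readonly_file_mutex_key;
static struct lock_class_key writable_file_mutex_key;

static void example_init_file_mutex(struct mutex *m, bool writable)
{
        mutex_init(m);
        /* Re-class the mutex so writable and readonly files can never
         * share a dependency chain in lockdep's view. */
        lockdep_set_class(m, writable ? &writable_file_mutex_key
                                      : &readonly_file_mutex_key);
}
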
5 weeks agoMerge tag 'ata-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Linus Torvalds [Sat, 13 Apr 2024 17:27:58 +0000 (10:27 -0700)]
Merge tag 'ata-6.9-rc4' of git://git./linux/kernel/git/libata/linux

Pull ata fixes from Damien Le Moal:

 - Add the mask_port_map parameter to the ahci driver. This is a
   follow-up to the recent snafu with the ASMedia controller and its
   virtual ports hiding port-multiplier devices. As ASMedia confirmed
   that there is no way to determine whether these slow-to-probe virtual
   ports actually represent the ports of a port-multiplier device, this
   new parameter allows masking ports to significantly speed up probing
   during system boot, resulting in shorter boot times.

 - A fix for the incorrect handling of a port unlock in
   ata_scsi_dev_rescan().

 - Allow command duration limits to be detected for ACS-4 devices, as
   there are such devices out in the field.

* tag 'ata-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata-core: Allow command duration limits detection for ACS-4 drives
  ata: libata-scsi: Fix ata_scsi_dev_rescan() error path
  ata: ahci: Add mask_port_map module parameter

5 weeks agoMerge tag 'zonefs-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal...
Linus Torvalds [Sat, 13 Apr 2024 17:25:32 +0000 (10:25 -0700)]
Merge tag 'zonefs-6.9-rc4' of git://git./linux/kernel/git/dlemoal/zonefs

Pull zonefs fix from Damien Le Moal:

 - Suppress a coccicheck warning using str_plural()

* tag 'zonefs-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: Use str_plural() to fix Coccinelle warning

5 weeks agoMerge tag 'v6.9-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 13 Apr 2024 17:10:18 +0000 (10:10 -0700)]
Merge tag 'v6.9-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - fix for oops in cifs_get_fattr of deleted files

 - fix for the remote open counter going negative in some directory
   lease cases

 - fix for mkfifo to instantiate dentry to avoid possible crash

 - important fix to allow handling key rotation for mount and remount
   (i.e. the increasingly common case where the password used for the
   mount is about to expire and will be replaced by a new password)

* tag 'v6.9-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: fix broken reconnect when password changing on the server by allowing password rotation
  smb: client: instantiate when creating SFU files
  smb3: fix Open files on server counter going negative
  smb: client: fix NULL ptr deref in cifs_mark_open_handles_for_deleted_file()

5 weeks agoata: libata-core: Allow command duration limits detection for ACS-4 drives
Igor Pylypiv [Thu, 11 Apr 2024 20:12:24 +0000 (20:12 +0000)]
ata: libata-core: Allow command duration limits detection for ACS-4 drives

Even though the command duration limits (CDL) feature was first added
in ACS-5 (major version 12), there are some ACS-4 (major version 11)
drives that implement CDL as well.

The IDENTIFY_DEVICE, SUPPORTED_CAPABILITIES, and CURRENT_SETTINGS log
pages are mandatory in the ACS-4 standard, so it should be safe to read
these log pages on older drives implementing ACS-4.

Fixes: 62e4a60e0cdb ("scsi: ata: libata: Detect support for command duration limits")
Cc: stable@vger.kernel.org
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
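
In effect the change relaxes a version threshold. A minimal sketch of
that idea, with a hypothetical helper name and the rest of the CDL probe
logic omitted:

/* Hypothetical helper, not the libata code: ACS-4 reports ATA major
 * version 11 and ACS-5 reports 12, so accepting >= 11 lets CDL
 * detection run on the ACS-4 drives that already implement it. */
static bool example_cdl_detection_allowed(unsigned int ata_major_version)
{
        return ata_major_version >= 11;
}
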
5 weeks agoata: libata-scsi: Fix ata_scsi_dev_rescan() error path
Damien Le Moal [Thu, 11 Apr 2024 23:41:15 +0000 (08:41 +0900)]
ata: libata-scsi: Fix ata_scsi_dev_rescan() error path

Commit 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume")
incorrectly handles failures of scsi_resume_device() in
ata_scsi_dev_rescan(), leading to a double call to
spin_unlock_irqrestore() to unlock a device port. Fix this by redefining
the goto labels used in case of errors and only unlocking the port
scsi_scan_mutex when scsi_resume_device() fails.

Bug found with the Smatch static checker warning:

drivers/ata/libata-scsi.c:4774 ata_scsi_dev_rescan()
error: double unlocked 'ap->lock' (orig line 4757)

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
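
The bug class here is a familiar one: an error path that unlocks a lock
an earlier branch has already dropped. A hedged sketch of the goto-label
pattern the fix relies on, with hypothetical names rather than the
actual libata function:

static int example_rescan(struct example_port *ap)
{
        unsigned long flags;
        int ret;

        spin_lock_irqsave(&ap->lock, flags);
        ret = example_check_port(ap);           /* hypothetical check */
        if (ret)
                goto unlock;            /* lock still held: drop it below */

        spin_unlock_irqrestore(&ap->lock, flags);

        ret = example_resume_device(ap);        /* hypothetical resume step */
        if (ret)
                goto out;               /* lock already dropped: skip the unlock */

        return 0;

unlock:
        spin_unlock_irqrestore(&ap->lock, flags);
out:
        return ret;
}
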
5 weeks agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 12 Apr 2024 20:08:39 +0000 (13:08 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
 "Fix the TLBI RANGE operand calculation causing live migration under
  KVM/arm64 to miss dirty pages due to stale TLB entries"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: tlb: Fix TLBI RANGE operand

5 weeks agoMerge tag 'soc-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Fri, 12 Apr 2024 20:02:27 +0000 (13:02 -0700)]
Merge tag 'soc-fixes-6.9-1' of git://git./linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
 "The device tree changes this time are all for NXP i.MX platforms,
  addressing issues with clocks and regulators on i.MX7 and i.MX8.

  The old OMAP2-based Nokia N8x0 tablet gets a couple of code fixes for
  regressions that came in.

  The ARM SCMI and FF-A firmware interfaces get a couple of minor bug
  fixes.

  A regression fix for RISC-V cache management addresses a problem with
  probe order on SiFive cores"

* tag 'soc-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (23 commits)
  MAINTAINERS: Change Krzysztof Kozlowski's email address
  arm64: dts: imx8qm-ss-dma: fix can lpcg indices
  arm64: dts: imx8-ss-dma: fix can lpcg indices
  arm64: dts: imx8-ss-dma: fix adc lpcg indices
  arm64: dts: imx8-ss-dma: fix pwm lpcg indices
  arm64: dts: imx8-ss-dma: fix spi lpcg indices
  arm64: dts: imx8-ss-conn: fix usb lpcg indices
  arm64: dts: imx8-ss-lsio: fix pwm lpcg indices
  ARM: dts: imx7s-warp: Pass OV2680 link-frequencies
  ARM: dts: imx7-mba7: Use 'no-mmc' property
  arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order
  arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix USB vbus regulator
  arm64: dts: freescale: imx8mp-venice-gw72xx-2x: fix USB vbus regulator
  cache: sifive_ccache: Partially convert to a platform driver
  firmware: arm_scmi: Make raw debugfs entries non-seekable
  firmware: arm_scmi: Fix wrong fastchannel initialization
  firmware: arm_ffa: Fix the partition ID check in ffa_notification_info_get()
  ARM: OMAP2+: fix USB regression on Nokia N8x0
  mmc: omap: restore original power up/down steps
  mmc: omap: fix deferred probe
  ...

5 weeks agoMerge tag 'iommu-fixes-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 12 Apr 2024 19:56:19 +0000 (12:56 -0700)]
Merge tag 'iommu-fixes-v6.9-rc3' of git://git./linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Intel VT-d Fixes:
     - Allocate local memory for PRQ page
     - Fix WARN_ON in iommu probe path
     - Fix wrong use of pasid config

 - AMD IOMMU Fixes:
     - Lock inversion fix
     - Log message severity fix
     - Disable SNP when v2 page-tables are used

 - Mediatek driver:
     - Fix module autoloading

* tag 'iommu-fixes-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Change log message severity
  iommu/vt-d: Fix WARN_ON in iommu probe path
  iommu/vt-d: Allocate local memory for page request queue
  iommu/vt-d: Fix wrong use of pasid config
  iommu: mtk: fix module autoloading
  iommu/amd: Do not enable SNP when V2 page table is enabled
  iommu/amd: Fix possible irq lock inversion dependency issue

5 weeks agoMerge tag 'pci-v6.9-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Fri, 12 Apr 2024 19:47:48 +0000 (12:47 -0700)]
Merge tag 'pci-v6.9-fixes-1' of git://git./linux/kernel/git/pci/pci

Pull pci fixes from Bjorn Helgaas:

 - Revert a quirk that prevented Secondary Bus Reset for LSI / Agere
   FW643.

   We thought the device was broken, but the reset does work correctly
   on other platforms, and the reset avoids leaking data out of VMs
   (Bjorn Helgaas)

 - Update MAINTAINERS to reflect that Gustavo Pimentel is no longer
   reachable (Manivannan Sadhasivam)

* tag 'pci-v6.9-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  Revert "PCI: Mark LSI FW643 to avoid bus reset"
  MAINTAINERS: Drop Gustavo Pimentel as PCI DWC Maintainer