Hao Xu [Fri, 17 Jun 2022 05:04:29 +0000 (13:04 +0800)]
io_uring: kbuf: add comments for some tricky code
Add comments to explain why it is always under uring lock when
incrementing head in __io_kbuf_recycle. Also rectify one comment about
kbuf consuming in the iowq case.
Signed-off-by: Hao Xu <howeyxu@tencent.com>
Link: https://lore.kernel.org/r/20220617050429.94293-1-hao.xu@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:22:12 +0000 (10:22 +0100)]
io_uring: mutex locked poll hashing
Currently we do two extra spin lock/unlock pairs to add a poll/apoll
request to the cancellation hash table and remove it from there.
On the submission side we often already hold ->uring_lock and tw
completion is likely to hold it as well. Add a second cancellation hash
table protected by ->uring_lock. Out of concern for latency, since the
mutex must be held on the completion side, use the new table only in the
following cases:
1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there
is little to no contention and so the main tw handler will almost
always end up grabbing it before calling callbacks.
2) IORING_SETUP_SQPOLL: same as with single issuer, only one task is
a major user of ->uring_lock.
3) apoll: we normally grab the lock on the completion side anyway to
execute the request, so it's free.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1bbad9c78c454b7b92f100bbf46730a37df7194f.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
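The three cases listed in "io_uring: mutex locked poll hashing" can be read as a small predicate. What follows is a minimal userspace sketch, not the kernel code: the flag values, struct demo_ctx and demo_use_locked_table() are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ring setup flags; the values are arbitrary. */
#define DEMO_SETUP_SQPOLL        (1u << 0)
#define DEMO_SETUP_SINGLE_ISSUER (1u << 1)

struct demo_ctx {
	unsigned int flags;
};

/*
 * Decide whether a poll request should go into the mutex-protected table
 * (true) or into the table with per-bucket spinlocks (false).
 */
static bool demo_use_locked_table(const struct demo_ctx *ctx, bool is_apoll)
{
	assert(ctx != NULL);

	/* Case 3: apoll takes the lock on completion anyway. */
	if (is_apoll)
		return true;

	/* Cases 1 and 2: a single task is the main user of the lock. */
	return (ctx->flags & (DEMO_SETUP_SINGLE_ISSUER | DEMO_SETUP_SQPOLL)) != 0;
}
```

Keeping the decision in one helper means the insert and remove paths cannot disagree about which table a request lives in.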
Pavel Begunkov [Thu, 16 Jun 2022 09:22:11 +0000 (10:22 +0100)]
io_uring: propagate locking state to poll cancel
Poll cancellation will soon need to grab ->uring_lock inside, so pass
the locking state, i.e. issue_flags, into the cancellation functions.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b86781d047727c07163443b57551a3fa57c7c5e1.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:22:10 +0000 (10:22 +0100)]
io_uring: introduce a struct for hash table
Instead of passing around a pointer to hash buckets, add a bit of type
safety and wrap it into a structure.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d65bc3faba537ec2aca9eabf334394936d44bd28.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:22:09 +0000 (10:22 +0100)]
io_uring: pass hash table into poll_find
In preparation for having multiple cancellation hash tables, pass a
table pointer into io_poll_find() and other poll cancel functions.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a31c88502463dce09254240fa037352927d7ecc3.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:22:08 +0000 (10:22 +0100)]
io_uring: add IORING_SETUP_SINGLE_ISSUER
Add a new IORING_SETUP_SINGLE_ISSUER flag and the userspace visible part
of it, i.e. put limitations on submitters. Also, don't allow it together
with IOPOLL as we're not going to put it to good use.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4bcc41ee467fdf04c8aab8baf6ce3ba21858c3d4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:22:07 +0000 (10:22 +0100)]
io_uring: use state completion infra for poll reqs
Use io_req_task_complete() for poll request completions, so it can
utilise state completions and save lots of unnecessary locking.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ced94cb5a728d8e386c640d052fd3da3f5d6891a.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:22:06 +0000 (10:22 +0100)]
io_uring: clean up io_ring_ctx_alloc
Add a variable for the number of hash buckets in io_ring_ctx_alloc(),
which makes it more readable.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/993926ed0d614ba9a76b2a85bebae2babcb13983.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:22:05 +0000 (10:22 +0100)]
io_uring: limit the number of cancellation buckets
Don't allocate too many hash/cancellation buckets. Clamp the bucket count
to 8 bits, i.e. 256 * 64B = 16KB. We don't usually have too
many requests, and 256 buckets should be enough, especially since we
do hash search only in the cancellation path.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b9620c8072ba61a2d50eba894b89bd93a94a9abd.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
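The arithmetic in "io_uring: limit the number of cancellation buckets" (8 bits, 256 buckets, 64 bytes each, 16KB total) can be checked with a short sketch. The names below are hypothetical, and the lower bound of 1 bit is an assumption made only so the helper is total; the commit message states only the upper bound.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; 8 bits and 64 bytes are the figures from the message. */
#define DEMO_MAX_HASH_BITS 8u
#define DEMO_BUCKET_BYTES  64u

/* Clamp a requested number of hash bits into [1, DEMO_MAX_HASH_BITS]. */
static unsigned int demo_clamp_hash_bits(unsigned int wanted_bits)
{
	if (wanted_bits < 1u)
		return 1u;	/* assumed lower bound, not from the commit */
	if (wanted_bits > DEMO_MAX_HASH_BITS)
		return DEMO_MAX_HASH_BITS;
	return wanted_bits;
}

/* Memory used by a table with 2^hash_bits buckets. */
static size_t demo_hash_table_bytes(unsigned int hash_bits)
{
	assert(hash_bits <= DEMO_MAX_HASH_BITS);
	return ((size_t)1 << hash_bits) * DEMO_BUCKET_BYTES;
}
```

With the clamp in place, a request for 12 bits yields 8 bits, and demo_hash_table_bytes(8) evaluates to 256 * 64 = 16384 bytes.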
Pavel Begunkov [Thu, 16 Jun 2022 09:22:04 +0000 (10:22 +0100)]
io_uring: clean up io_try_cancel
Get rid of an unnecessary extra goto in io_try_cancel() and simplify the
function.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/48cf5417b43a8386c6c364dba1ad9b4c7382d158.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:22:03 +0000 (10:22 +0100)]
io_uring: pass poll_find lock back
Instead of using implicit knowledge of what is locked or not after
io_poll_find() and co return, pass back a pointer to the locked
bucket, if any. If it is set, the caller must unlock the spinlock.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dae1dc5749aa34367812ecf62f82fd3f053aae44.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hao Xu [Thu, 16 Jun 2022 09:22:02 +0000 (10:22 +0100)]
io_uring: switch cancel_hash to use per entry spinlock
Add a new io_hash_bucket structure so that each bucket in cancel_hash
has a separate spinlock. Using a per-entry lock for cancel_hash removes
some completion lock invocations and removes contention between different
cancel_hash entries.
Signed-off-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/05d1e135b0c8bce9d1441e6346776589e5783e26.1655371007.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
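The idea behind "io_uring: switch cancel_hash to use per entry spinlock" is that each bucket carries its own lock, so operations on different buckets do not contend. The following is a userspace sketch under stated assumptions: it uses a C11 atomic_flag as a stand-in for the kernel spinlock, and every demo_* name is hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_NR_BUCKETS 8u

struct demo_node {
	unsigned long key;
	struct demo_node *next;
};

/* One lock per bucket instead of one lock for the whole table. */
struct demo_bucket {
	atomic_flag lock;
	struct demo_node *head;
};

static struct demo_bucket demo_table[DEMO_NR_BUCKETS];

static void demo_table_init(void)
{
	for (size_t i = 0; i < DEMO_NR_BUCKETS; i++) {
		atomic_flag_clear(&demo_table[i].lock);
		demo_table[i].head = NULL;
	}
}

static void demo_bucket_lock(struct demo_bucket *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void demo_bucket_unlock(struct demo_bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Insert at the head of the bucket selected by the key. */
static void demo_insert(struct demo_node *node)
{
	struct demo_bucket *b;

	assert(node != NULL);
	b = &demo_table[node->key % DEMO_NR_BUCKETS];
	demo_bucket_lock(b);
	node->next = b->head;
	b->head = node;
	demo_bucket_unlock(b);
}

/* Look up a key; only the one bucket it hashes to is locked. */
static struct demo_node *demo_find(unsigned long key)
{
	struct demo_bucket *b = &demo_table[key % DEMO_NR_BUCKETS];
	struct demo_node *n;

	demo_bucket_lock(b);
	for (n = b->head; n != NULL; n = n->next)
		if (n->key == key)
			break;
	demo_bucket_unlock(b);
	return n;
}
```

Two threads working on keys that hash to different buckets never touch the same lock here, which is the contention reduction the commit message describes.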
Hao Xu [Thu, 16 Jun 2022 09:22:01 +0000 (10:22 +0100)]
io_uring: poll: remove unnecessary req->ref set
We now don't need to set req->refcount for poll requests since the
reworked poll code ensures no request release race.
Signed-off-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ec6fee45705890bdb968b0c175519242753c0215.1655371007.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:22:00 +0000 (10:22 +0100)]
io_uring: don't inline io_put_kbuf
io_put_kbuf() is huge, don't bloat the kernel with inlining.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2e21ccf0be471ffa654032914b9430813cae53f8.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:21:59 +0000 (10:21 +0100)]
io_uring: refactor io_req_task_complete()
Clean up io_req_task_complete() and deduplicate io_put_kbuf() calls.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ae3148ac7eb5cce3e06895cde306e9e959d6f6ae.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:21:58 +0000 (10:21 +0100)]
io_uring: kill REQ_F_COMPLETE_INLINE
REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
completion list to io_queue_sqe() as __io_req_complete() is inlined and
we don't want to bloat the kernel.
As now we complete in a more centralised fashion in io_issue_sqe() we
can get rid of the flag and queue to the list directly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/600ba20a9338b8a39b249b23d3d177803613dde4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:21:57 +0000 (10:21 +0100)]
io_uring: rw: delegate sync completions to core io_uring
io_issue_sqe() from the io_uring core knows how to complete requests
based on the returned error code, we can delegate io_read()/io_write()
completion to it. Make kiocb_done() return the right completion
code and propagate it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/32ef005b45d23bf6b5e6837740dc0331bb051bd4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 15 Jun 2022 22:28:17 +0000 (16:28 -0600)]
io_uring: remove unused IO_REQ_CACHE_SIZE defined
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:56 +0000 (17:33 +0100)]
io_uring: don't set REQ_F_COMPLETE_INLINE in tw
io_req_task_complete() enqueues requests for state completion itself, no
need for REQ_F_COMPLETE_INLINE, which only serves the purpose of not
bloating the kernel.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/aca80f71464ad02c06f1311d998a2d6ee0b31573.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:55 +0000 (17:33 +0100)]
io_uring: remove check_cq checking from hot paths
All ctx->check_cq events are slow path, don't test every single flag one
by one in the hot path, but add a common guarding if.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dff026585cea7ff3a172a7c83894a3b0111bbf6a.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
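The "common guarding if" from "io_uring: remove check_cq checking from hot paths" can be sketched as follows. This is an illustration only: the bit names, the error value and struct demo_ctx are hypothetical, and the slow-path actions are placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event bits and error code. */
#define DEMO_CHECK_CQ_OVERFLOW (1u << 0)
#define DEMO_CHECK_CQ_DROPPED  (1u << 1)
#define DEMO_ERR_DROPPED       (-1)

struct demo_ctx {
	unsigned int check_cq;		/* bitmask of pending rare events */
	unsigned int overflow_flushes;	/* counts slow-path flushes */
};

static int demo_enter(struct demo_ctx *ctx)
{
	assert(ctx != NULL);

	/* Hot path: one test covers every rare event at once. */
	if (ctx->check_cq != 0) {
		/* Slow path: only now look at the individual bits. */
		if (ctx->check_cq & DEMO_CHECK_CQ_OVERFLOW) {
			ctx->overflow_flushes++;
			ctx->check_cq &= ~DEMO_CHECK_CQ_OVERFLOW;
		}
		if (ctx->check_cq & DEMO_CHECK_CQ_DROPPED)
			return DEMO_ERR_DROPPED;
	}
	return 0;
}
```

In the common case the function performs a single compare against zero, regardless of how many event bits exist.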
Pavel Begunkov [Wed, 15 Jun 2022 16:33:54 +0000 (17:33 +0100)]
io_uring: never defer-complete multi-apoll
Luckily, nobody completes multi-apoll requests outside the polling
functions, but don't set IO_URING_F_COMPLETE_DEFER in any case as
there is nobody who is catching REQ_F_COMPLETE_INLINE, and so will leak
requests if used.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a65ed3f5effd9321ee06e6edea294a03be3e15a0.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:53 +0000 (17:33 +0100)]
io_uring: inline ->registered_rings
There can be only 16 registered rings, so there is no need to allocate an
array for them separately; store it in tctx instead.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/495f0b953c87994dd9e13de2134019054fa5830d.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:52 +0000 (17:33 +0100)]
io_uring: explain io_wq_work::cancel_seq placement
Add a comment on why we keep ->cancel_seq in struct io_wq_work instead
of struct io_kiocb despite it being needed only by io_uring and not io-wq.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/988e87eec9dc700b5dae933df3aefef303502f6c.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:51 +0000 (17:33 +0100)]
io_uring: move small helpers to headers
There is a bunch of inline helpers that will be useful not only to the
core of io_uring, so move them to headers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/22df99c83723e44cba7e945e8519e64e3642c064.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:50 +0000 (17:33 +0100)]
io_uring: refactor ctx slow data placement
Shove all slow path data at the end of ctx and get rid of extra
indentation.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bcaf200298dd469af20787650550efc66d89bef2.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:49 +0000 (17:33 +0100)]
io_uring: better caching for ctx timeout fields
Following timeout fields access patterns, move all of them into a
separate cache line inside ctx, so they don't interfere with normal
completion caching, especially since timeout removals and completion
are separated and the latter is done via tw.
It also sheds some bytes from io_ring_ctx, 1216B -> 1152B
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4b163793072840de53b3cb66e0c2995e7226ff78.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:48 +0000 (17:33 +0100)]
io_uring: move defer_list to slow data
Draining is a slow path, so move defer_list to the end where slow data lives
inside the context.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e16379391ca72b490afdd24e8944baab849b4a7b.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:47 +0000 (17:33 +0100)]
io_uring: make reg buf init consistent
The default (i.e. empty) state of a registered buffer is dummy_ubuf, so set
it to dummy on init instead of NULL.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c5456aecf03d9627fbd6e65e100e2b5293a6151e.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 1 Jun 2022 18:36:42 +0000 (12:36 -0600)]
io_uring: deprecate epoll_ctl support
As far as we know, nobody ever adopted the epoll_ctl management via
io_uring. Deprecate it now with a warning, and plan on removing it in
a later kernel version. When we do remove it, we can revert the following
commits as well:
39220e8d4a2a ("eventpoll: support non-blocking do_epoll_ctl() calls")
58e41a44c488 ("eventpoll: abstract out epoll_ctl() handler")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/io-uring/CAHk-=wiTyisXBgKnVHAGYCNvkmjk=50agS2Uk6nr+n3ssLZg2w@mail.gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 27 May 2022 16:55:07 +0000 (10:55 -0600)]
io_uring: add support for level triggered poll
By default, the POLL_ADD command does edge triggered poll - if we get
a non-zero mask on the initial poll attempt, we complete the request
successfully.
Support level triggered by always waiting for a notification, regardless
of whether or not the initial mask matches the file state.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 15 Jun 2022 22:27:42 +0000 (16:27 -0600)]
io_uring: move opcode table to opdef.c
We already have the declarations in opdef.h, move the rest into its own
file rather than in the main io_uring.c file.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 13 Jun 2022 13:27:03 +0000 (07:27 -0600)]
io_uring: move read/write related opcodes to its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 26 May 2022 15:44:31 +0000 (09:44 -0600)]
io_uring: move remaining file table manipulation to filetable.c
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 13 Jun 2022 13:12:45 +0000 (07:12 -0600)]
io_uring: move rsrc related data, core, and commands
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 13 Jun 2022 13:07:23 +0000 (07:07 -0600)]
io_uring: split provided buffers handling into its own file
Move both the opcodes related to it, and the internals code dealing with
it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 26 May 2022 02:36:47 +0000 (20:36 -0600)]
io_uring: move cancelation into its own file
This also helps clean up the io_uring.h cancel parts, as we can make
things static in the cancel.c file, mostly.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 26 May 2022 02:31:09 +0000 (20:31 -0600)]
io_uring: move poll handling into its own file
Add an io_poll_issue() rather than export the general task_work locking
and io_issue_sqe(), and put the io_op_defs definition and structure into
a separate header file so that poll can use it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 17:57:03 +0000 (11:57 -0600)]
io_uring: add opcode name to io_op_defs
This kills the last per-op switch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 17:48:35 +0000 (11:48 -0600)]
io_uring: include and forward-declaration sanitation
Remove some dead headers we no longer need, and get rid of the
io_ring_ctx and io_uring_fops forward declarations.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 17:01:04 +0000 (11:01 -0600)]
io_uring: move io_uring_task (tctx) helpers into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 16:40:19 +0000 (10:40 -0600)]
io_uring: move fdinfo helpers to its own file
This also means moving a bit more of the fixed file handling to the
filetable side, which makes sense separately too.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 16:28:04 +0000 (10:28 -0600)]
io_uring: use io_is_uring_fops() consistently
Convert the last spots that check for io_uring_fops to use the provided
helper instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 15:13:39 +0000 (09:13 -0600)]
io_uring: move SQPOLL related handling into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 14:57:27 +0000 (08:57 -0600)]
io_uring: move timeout opcodes and handling into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 14:56:52 +0000 (08:56 -0600)]
io_uring: move our reference counting into a header
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 12:42:08 +0000 (06:42 -0600)]
io_uring: move msg_ring into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 12:25:13 +0000 (06:25 -0600)]
io_uring: split network related opcodes into its own file
While at it, convert the handlers to just use io_eopnotsupp_prep()
if CONFIG_NET isn't set.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 12:12:18 +0000 (06:12 -0600)]
io_uring: move statx handling to its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 12:09:18 +0000 (06:09 -0600)]
io_uring: move epoll handler to its own file
Would be nice to sort out Kconfig for this and not even compile
epoll.c if we don't have epoll configured.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 12:04:14 +0000 (06:04 -0600)]
io_uring: add a dummy -EOPNOTSUPP prep handler
Add it and use it for the epoll handling, if epoll isn't configured.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 11:59:19 +0000 (05:59 -0600)]
io_uring: move uring_cmd handling to its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:54:43 +0000 (21:54 -0600)]
io_uring: split out open/close operations
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:43:10 +0000 (21:43 -0600)]
io_uring: separate out file table handling code
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:28:33 +0000 (21:28 -0600)]
io_uring: split out fadvise/madvise operations
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:25:19 +0000 (21:25 -0600)]
io_uring: split out fs related sync/fallocate functions
This splits out sync_file_range, fsync, and fallocate.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:19:47 +0000 (21:19 -0600)]
io_uring: split out splice related operations
This splits out splice and tee support.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:13:00 +0000 (21:13 -0600)]
io_uring: split out filesystem related operations
This splits out renameat, unlinkat, mkdirat, symlinkat, and linkat.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 17:56:42 +0000 (11:56 -0600)]
io_uring: move nop into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 17:46:43 +0000 (11:46 -0600)]
io_uring: move xattr related opcodes to its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 21:21:00 +0000 (15:21 -0600)]
io_uring: handle completions in the core
Normally request handlers complete requests themselves, unless they
return an error, in which case the core completes the request for
them.
This is awkward for pushing opcode handlers further out, as we don't
want a bunch of inline completion code and we don't want to make the
completion path slower than it is now.
Let the core handle any completion, unless the handler explicitly
asks us not to.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
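The control flow described in "io_uring: handle completions in the core" can be sketched as a core function that completes every request unless the handler opts out. This is a hedged userspace illustration: DEMO_OK, DEMO_SKIP_COMPLETE, struct demo_req and the handlers are hypothetical names, and the sketch assumes the handler stores its result in the request before returning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes a handler can hand back to the core. */
#define DEMO_OK            0
#define DEMO_SKIP_COMPLETE 1

struct demo_req {
	int result;
	bool completed;
	int (*issue)(struct demo_req *req);
};

static void demo_complete(struct demo_req *req, int res)
{
	req->result = res;
	req->completed = true;
}

/* Finishes inline: stores its result and lets the core complete it. */
static int demo_sync_handler(struct demo_req *req)
{
	req->result = 7;
	return DEMO_OK;
}

/* Will complete later on its own, so it asks the core to stay out. */
static int demo_async_handler(struct demo_req *req)
{
	(void)req;
	return DEMO_SKIP_COMPLETE;
}

/* Fails: the core completes the request with the error code. */
static int demo_failing_handler(struct demo_req *req)
{
	(void)req;
	return -22;
}

/* Core: the single place where inline completion happens. */
static void demo_issue(struct demo_req *req)
{
	int ret;

	assert(req != NULL && req->issue != NULL);
	ret = req->issue(req);
	if (ret == DEMO_SKIP_COMPLETE)
		return;
	demo_complete(req, ret < 0 ? ret : req->result);
}
```

Handlers then contain no completion code of their own, which is what makes it practical to move them into separate files.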
Jens Axboe [Tue, 24 May 2022 18:45:38 +0000 (12:45 -0600)]
io_uring: set completion results upfront
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:56:14 +0000 (10:56 -0600)]
io_uring: add io_uring_types.h
This adds definitions of structs that both the core and the various
opcode handlers need to know about.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:26:28 +0000 (10:26 -0600)]
io_uring: define a request type cleanup handler
This can move request type specific cleanup into a private handler,
removing the need for the core io_uring parts to know what types
they are dealing with.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:19:47 +0000 (10:19 -0600)]
io_uring: unify struct io_symlink and io_hardlink
They are really just a subset of each other, just use the one type.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:09:32 +0000 (10:09 -0600)]
io_uring: convert iouring_cmd to io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:06:46 +0000 (10:06 -0600)]
io_uring: convert xattr to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:05:49 +0000 (10:05 -0600)]
io_uring: convert rsrc_update to io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:03:49 +0000 (10:03 -0600)]
io_uring: convert msg and nop to io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:01:47 +0000 (10:01 -0600)]
io_uring: convert splice to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:01:09 +0000 (10:01 -0600)]
io_uring: convert epoll to io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:59:28 +0000 (09:59 -0600)]
io_uring: convert file system request types to use io_cmd_type
This converts statx, rename, unlink, mkdir, symlink, and hardlink to
use io_cmd_type.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:51:05 +0000 (09:51 -0600)]
io_uring: convert madvise/fadvise to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:49:25 +0000 (09:49 -0600)]
io_uring: convert open/close path to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:45:22 +0000 (09:45 -0600)]
io_uring: convert timeout path to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:33:01 +0000 (09:33 -0600)]
io_uring: convert cancel path to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:30:45 +0000 (09:30 -0600)]
io_uring: convert the sync and fallocate paths to use io_cmd_type
They all share the same struct io_sync, convert them to use the
io_cmd_type approach instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:27:38 +0000 (09:27 -0600)]
io_uring: convert net related opcodes to use io_cmd_type
This converts accept, connect, send/recv, sendmsg/recvmsg, shutdown, and
socket to use io_cmd_type.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:24:42 +0000 (09:24 -0600)]
io_uring: remove recvmsg knowledge from io_arm_poll_handler()
There's a special case for recvmsg with MSG_ERRQUEUE set. This is
problematic as it means the core needs to know about this special
request type.
For now, just add a generic flag for it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:16:40 +0000 (09:16 -0600)]
io_uring: convert poll_update path to use io_cmd_type
Remove struct io_poll_update from io_kiocb, and convert the poll path to
use the io_cmd_type approach instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:13:46 +0000 (09:13 -0600)]
io_uring: convert poll path to use io_cmd_type
Remove struct io_poll_iocb from io_kiocb, and convert the poll path to
use the io_cmd_type approach instead.
While at it, rename io_poll_iocb to io_poll which is consistent with the
other request type private structures.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 13 Jun 2022 12:57:44 +0000 (06:57 -0600)]
io_uring: convert read/write path to use io_cmd_type
Remove struct io_rw from io_kiocb, and convert the read/write path to
use the io_cmd_type approach instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 14:32:05 +0000 (08:32 -0600)]
io_uring: add generic command payload type to struct io_kiocb
Each opcode generally has a command structure in io_kiocb which it can
use to store data associated with that request.
In preparation for having the core layer not know about what's inside
these fields, add a generic io_cmd_data type and put it in the union as
well.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 23 May 2022 23:30:37 +0000 (17:30 -0600)]
io_uring: move req async preparation into opcode handler
Define an io_op_def->prep_async() handler and push the async preparation
to there. Since we now have that, we can drop ->needs_async_setup, as
they mean the same thing.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 23 May 2022 23:05:03 +0000 (17:05 -0600)]
io_uring: move to separate directory
In preparation for splitting io_uring up a bit, move it into its own
top level directory. It didn't really belong in fs/ anyway, as it's
not a file system only API.
This adds io_uring/ and moves the core files in there, and updates the
MAINTAINERS file for the new location.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 23 May 2022 22:56:21 +0000 (16:56 -0600)]
io_uring: define a 'prep' and 'issue' handler for each opcode
Rather than have two giant switches for doing request preparation and
then for doing request issue, add a prep and issue handler for each
of them in the io_op_defs[] request definition.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
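Replacing two large switches with per-opcode handlers, as in "io_uring: define a 'prep' and 'issue' handler for each opcode", amounts to a table of function pointers indexed by opcode. The sketch below shows the pattern in plain C; the opcodes, struct demo_op_def and the handlers are hypothetical and much simpler than the kernel's io_op_defs[].

```c
#include <assert.h>
#include <stddef.h>

struct demo_req {
	int opcode;
	int result;
};

/* Per-opcode definition: a name plus prep and issue handlers. */
struct demo_op_def {
	const char *name;
	int (*prep)(struct demo_req *req);
	int (*issue)(struct demo_req *req);
};

enum { DEMO_OP_NOP, DEMO_OP_ANSWER, DEMO_OP_LAST };

static int demo_nop_prep(struct demo_req *req)
{
	req->result = 0;
	return 0;
}

static int demo_nop_issue(struct demo_req *req)
{
	(void)req;
	return 0;
}

static int demo_answer_prep(struct demo_req *req)
{
	req->result = 0;
	return 0;
}

static int demo_answer_issue(struct demo_req *req)
{
	req->result = 42;
	return 0;
}

static const struct demo_op_def demo_op_defs[DEMO_OP_LAST] = {
	[DEMO_OP_NOP] = {
		.name = "NOP",
		.prep = demo_nop_prep,
		.issue = demo_nop_issue,
	},
	[DEMO_OP_ANSWER] = {
		.name = "ANSWER",
		.prep = demo_answer_prep,
		.issue = demo_answer_issue,
	},
};

/* The core looks up the handlers instead of switching on the opcode. */
static int demo_submit(struct demo_req *req)
{
	const struct demo_op_def *def;
	int ret;

	if (req->opcode < 0 || req->opcode >= DEMO_OP_LAST)
		return -1;
	def = &demo_op_defs[req->opcode];
	assert(def->prep != NULL && def->issue != NULL);

	ret = def->prep(req);
	if (ret)
		return ret;
	return def->issue(req);
}
```

Adding an opcode then means adding one table entry and two functions, with no change to demo_submit().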
Jens Axboe [Thu, 21 Jul 2022 14:33:52 +0000 (08:33 -0600)]
Merge branch 'io_uring-5.19' into for-5.20/io_uring
* io_uring-5.19:
io_uring: do not recycle buffer in READV
io_uring: fix free of unallocated buffer list
Dylan Yudaken [Thu, 21 Jul 2022 13:13:25 +0000 (06:13 -0700)]
io_uring: do not recycle buffer in READV
READV cannot recycle buffers as it would lose some of the data required to
reimport that buffer.
Reported-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Fixes: b66e65f41426 ("io_uring: never call io_buffer_select() for a buffer re-select")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220721131325.624788-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dylan Yudaken [Thu, 21 Jul 2022 11:01:15 +0000 (04:01 -0700)]
io_uring: fix free of unallocated buffer list
In the error path of io_register_pbuf_ring, only free bl if it was
allocated.
Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com>
Fixes: c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/all/CANX2M5bXKw1NaHdHNVqssUUaBCs8aBpmzRNVEYEvV0n44P7ioA@mail.gmail.com/
Link: https://lore.kernel.org/all/CANX2M5YiZBXU3L6iwnaLs-HHJXRvrxM8mhPDiMDF9Y9sAvOHUA@mail.gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sun, 17 Jul 2022 20:30:22 +0000 (13:30 -0700)]
Linux 5.19-rc7
Linus Torvalds [Sun, 17 Jul 2022 20:08:03 +0000 (13:08 -0700)]
Merge tag 'drm-intel-fixes-2022-07-17' of git://anongit.freedesktop.org/drm/drm-intel
Pull intel drm build fix from Rodrigo Vivi:
"Our 'dim' flow has a problem with fixes of fixes getting missed. We
need to take a look on that later.
Meanwhile, please allow me to quickly propagate this fix for the
32-bit build issue here upstream"
* tag 'drm-intel-fixes-2022-07-17' of git://anongit.freedesktop.org/drm/drm-intel:
drm/i915/ttm: fix 32b build
Linus Torvalds [Sun, 17 Jul 2022 19:42:57 +0000 (12:42 -0700)]
Merge tag 'perf-tools-fixes-for-v5.19-2022-07-17' of git://git./linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix SIGSEGV when processing syscall args in perf.data files in 'perf
trace'
- Sync kvm, msr-index and cpufeatures headers with the kernel sources
- Fix 'convert perf time to TSC' 'perf test':
- No need to open events twice
- Fix finding correct event on hybrid systems
* tag 'perf-tools-fixes-for-v5.19-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf trace: Fix SIGSEGV when processing syscall args
perf tests: Fix Convert perf time to TSC test for hybrid
perf tests: Stop Convert perf time to TSC test opening events twice
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools headers cpufeatures: Sync with the kernel sources
tools headers UAPI: Sync linux/kvm.h with the kernel sources
Matthew Auld [Tue, 12 Jul 2022 17:40:50 +0000 (18:40 +0100)]
drm/i915/ttm: fix 32b build
Since segment_pages is no longer a compile time constant, it looks like
DIV_ROUND_UP(node->size, segment_pages) breaks the 32b build. Simplest
is just to use the ULL variant, but really we should not need more
than u32 for the page alignment (also we are limited by that due to the
sg->length type), so also make it all u32.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: aff1e0b09b54 ("drm/i915/ttm: fix sg_table construction")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220712174050.592550-1-matthew.auld@intel.com
(cherry picked from commit 9306b2b2dfce6931241ef804783692cee526599c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
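The "make it all u32" approach in "drm/i915/ttm: fix 32b build" keeps the rounding division entirely in 32-bit arithmetic, so no 64-bit division by a runtime value is needed on 32-bit targets. The sketch below is a userspace illustration with hypothetical names; it computes the same rounded-up quotient as DIV_ROUND_UP but uses a remainder test so that the intermediate sum cannot overflow, which is a choice made for this sketch and not something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of segments needed to cover size_pages when each segment holds
 * segment_pages pages, rounded up. All operands are 32-bit.
 */
static uint32_t demo_nr_segments(uint32_t size_pages, uint32_t segment_pages)
{
	assert(segment_pages != 0);
	return size_pages / segment_pages +
	       (size_pages % segment_pages != 0 ? 1u : 0u);
}
```

For example, 10 pages in segments of 4 pages need 3 segments, while 8 pages need exactly 2.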
Linus Torvalds [Sun, 17 Jul 2022 15:34:02 +0000 (08:34 -0700)]
Merge tag 'perf_urgent_for_v5.19_rc7' of git://git./linux/kernel/git/tip/tip
Pull perf fix from Borislav Petkov:
- A single data race fix on the perf event cleanup path to avoid
endless loops due to insufficient locking
* tag 'perf_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix data race between perf_event_set_output() and perf_mmap_close()
Linus Torvalds [Sun, 17 Jul 2022 15:27:30 +0000 (08:27 -0700)]
Merge tag 'x86_urgent_for_v5.19_rc7' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Improve the check whether the kernel supports WP mappings so that it
can accommodate a XenPV guest due to how the latter is setting up the
PAT machinery
- Now that the retbleed nightmare is public, here's the first round of
fallout fixes:
* Fix a build failure on 32-bit due to missing include
* Remove an untraining point in espfix64 return path
* other small cleanups
* tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Remove apostrophe typo
um: Add missing apply_returns()
x86/entry: Remove UNTRAIN_RET from native_irq_return_ldt
x86/bugs: Mark retbleed_strings static
x86/pat: Fix x86_has_pat_wp()
x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit
Linus Torvalds [Sun, 17 Jul 2022 14:58:19 +0000 (07:58 -0700)]
Merge tag 'gpio-fixes-for-v5.19-rc7' of git://git./linux/kernel/git/brgl/linux
Pull gpio fix from Bartosz Golaszewski:
- fix a configfs attribute of the gpio-sim module
* tag 'gpio-fixes-for-v5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: sim: fix the chip_name configfs item
Linus Torvalds [Sun, 17 Jul 2022 14:52:46 +0000 (07:52 -0700)]
Merge tag 'input-for-v5.19-rc6' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- fix Goodix driver to properly behave on the Aya Neo Next
- some more sanity checks in usbtouchscreen driver
- a tweak in wm97xx driver in preparation for remove() to return void
- a clarification in input core regarding units of measurement for
resolution on touch events.
* tag 'input-for-v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: document the units for resolution of size axes
Input: goodix - call acpi_device_fix_up_power() in some cases
Input: wm97xx - make .remove() obviously always return 0
Input: usbtouchscreen - add driver_info sanity check
Linus Torvalds [Sun, 17 Jul 2022 14:45:51 +0000 (07:45 -0700)]
Merge tag 'for-v5.19-rc' of git://git./linux/kernel/git/sre/linux-power-supply
Pull power supply fixes from Sebastian Reichel:
- power-supply core temperature interpolation regression fix for
incorrect boundaries
- ab8500 needs to destroy its work queues in error paths
- Fix old DT refcount leak in arm-versatile
* tag 'for-v5.19-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: supply: core: Fix boundary conditions in interpolation
power/reset: arm-versatile: Fix refcount leak in versatile_reboot_probe
power: supply: ab8500_fg: add missing destroy_workqueue in ab8500_fg_probe
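Note on the interpolation boundary fix above: the summary does not describe
the change itself, so the following is only a generic sketch of a
piecewise-linear table lookup that clamps at both ends, which is the kind of
boundary handling such a fix is concerned with. It is not the power-supply
core code. struct point, interp_clamped() and the sample table are
hypothetical, and the table is assumed to be sorted by strictly ascending x
with at least one entry.

```c
#include <assert.h>

/* Hypothetical table entry: x is the input (e.g. temperature), y the value. */
struct point {
	int x;
	int y;
};

/*
 * Look up x in a table sorted by strictly ascending x (n >= 1).
 * Inputs at or beyond either end of the table are clamped to the first or
 * last entry, so the code never reads t[-1] or t[n] and never extrapolates.
 */
int interp_clamped(const struct point *t, int n, int x)
{
	int i;

	if (x <= t[0].x)
		return t[0].y;
	if (x >= t[n - 1].x)
		return t[n - 1].y;

	/* Find the first entry at or above x; i is in [1, n - 1] here. */
	for (i = 1; i < n; i++)
		if (x <= t[i].x)
			break;

	/* Linear interpolation between t[i - 1] and t[i]. */
	return t[i - 1].y + (t[i].y - t[i - 1].y) * (x - t[i - 1].x) /
			    (t[i].x - t[i - 1].x);
}

/* Small usage example with a made-up three-entry table. */
void interp_clamped_selftest(void)
{
	static const struct point table[] = {
		{ 0, 100 }, { 10, 200 }, { 20, 400 },
	};

	assert(interp_clamped(table, 3, -5) == 100);	/* below range: clamp */
	assert(interp_clamped(table, 3, 15) == 300);	/* inside: interpolate */
	assert(interp_clamped(table, 3, 25) == 400);	/* above range: clamp */
}
```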
Naveen N. Rao [Thu, 7 Jul 2022 09:09:00 +0000 (14:39 +0530)]
perf trace: Fix SIGSEGV when processing syscall args
On powerpc, 'perf trace' is crashing with a SIGSEGV when trying to
process a perf.data file created with 'perf trace record -p':
#0 0x00000001225b8988 in syscall_arg__scnprintf_augmented_string <snip> at builtin-trace.c:1492
#1 syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1492
#2 syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1486
#3 0x00000001225bdd9c in syscall_arg_fmt__scnprintf_val <snip> at builtin-trace.c:1973
#4 syscall__scnprintf_args <snip> at builtin-trace.c:2041
#5 0x00000001225bff04 in trace__sys_enter <snip> at builtin-trace.c:2319
That points to the below code in tools/perf/builtin-trace.c:
/*
* If this is raw_syscalls.sys_enter, then it always comes with the 6 possible
* arguments, even if the syscall being handled, say "openat", uses only 4 arguments
* this breaks syscall__augmented_args() check for augmented args, as we calculate
* syscall->args_size using each syscalls:sys_enter_NAME tracefs format file,
* so when handling, say the openat syscall, we end up getting 6 args for the
* raw_syscalls:sys_enter event, when we expected just 4, we end up mistakenly
* thinking that the extra 2 u64 args are the augmented filename, so just check
* here and avoid using augmented syscalls when the evsel is the raw_syscalls one.
*/
if (evsel != trace->syscalls.events.sys_enter)
augmented_args = syscall__augmented_args(sc, sample, &augmented_args_size, trace->raw_augmented_syscalls_args_size);
As the comment points out, we should not be trying to augment the args
for raw_syscalls. However, when processing a perf.data file, we are not
initializing those properly. Fix the same.
Reported-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220707090900.572584-1-naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
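Note on the perf trace fix above: the quoted comment can be reduced to a size
calculation. The sketch below is a simplified userspace model, not the code
in builtin-trace.c: augmented_bytes() and its parameters are hypothetical,
and the payload is assumed to consist only of u64 syscall arguments (the real
sample layout carries more than that).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the check: how many trailing bytes of the sample
 * payload would be treated as an augmented argument (e.g. a filename).
 *
 * raw_size  - bytes of argument payload actually present in the sample
 * args_size - bytes expected from the per-syscall format file
 *
 * raw_syscalls:sys_enter always carries all 6 possible arguments, so any
 * surplus over args_size there is just unused argument slots and must not
 * be interpreted as augmented data.
 */
size_t augmented_bytes(bool is_raw_sys_enter, size_t raw_size, size_t args_size)
{
	if (is_raw_sys_enter)
		return 0;

	return raw_size > args_size ? raw_size - args_size : 0;
}

/* Usage example for a syscall with 4 arguments, such as openat. */
void augmented_bytes_selftest(void)
{
	const size_t expected = 4 * sizeof(uint64_t);	/* 4 declared args */
	const size_t raw = 6 * sizeof(uint64_t);	/* 6 slots delivered */

	/* Without the guard, 2 spare u64 slots look like augmented data. */
	assert(augmented_bytes(false, raw, expected) == 2 * sizeof(uint64_t));
	/* With the guard, the raw_syscalls event is never augmented. */
	assert(augmented_bytes(true, raw, expected) == 0);
}
```

In this model the crash scenario corresponds to is_raw_sys_enter being false
for an event that is in fact raw_syscalls:sys_enter, which is what happens
when the event used for the comparison is not set up while reading a
perf.data file.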
Adrian Hunter [Wed, 13 Jul 2022 12:34:59 +0000 (15:34 +0300)]
perf tests: Fix Convert perf time to TSC test for hybrid
The test does not always correctly determine the number of events for
hybrids, nor allow for more than 1 evsel when parsing.
Fix by iterating the events actually created and getting the correct
evsel for the events processed.
Fixes: d9da6f70eb235110 ("perf tests: Support 'Convert perf time to TSC' test for hybrid")
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20220713123459.24145-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Wed, 13 Jul 2022 12:34:58 +0000 (15:34 +0300)]
perf tests: Stop Convert perf time to TSC test opening events twice
Do not call evlist__open() twice.
Fixes: 5bb017d4b97a0f13 ("perf test: Fix error message for test case 71 on s390, where it is not supported")
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20220713123459.24145-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>