Pavel Begunkov [Thu, 16 Jun 2022 09:22:07 +0000 (10:22 +0100)]
io_uring: use state completion infra for poll reqs
Use io_req_task_complete() for poll request completions, so it can
utilise state completions and save lots of unnecessary locking.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ced94cb5a728d8e386c640d052fd3da3f5d6891a.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:22:06 +0000 (10:22 +0100)]
io_uring: clean up io_ring_ctx_alloc
Add a variable for the number of hash buckets in io_ring_ctx_alloc(),
makes it more readable.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/993926ed0d614ba9a76b2a85bebae2babcb13983.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:22:05 +0000 (10:22 +0100)]
io_uring: limit the number of cancellation buckets
Don't allocate to many hash/cancellation buckets, there might be too
many, clamp it to 8 bits, or 256 * 64B = 16KB. We don't usually have too
many requests, and 256 buckets should be enough, especially since we
do hash search only in the cancellation path.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b9620c8072ba61a2d50eba894b89bd93a94a9abd.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:22:04 +0000 (10:22 +0100)]
io_uring: clean up io_try_cancel
Get rid of an unnecessary extra goto in io_try_cancel() and simplify the
function.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/48cf5417b43a8386c6c364dba1ad9b4c7382d158.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:22:03 +0000 (10:22 +0100)]
io_uring: pass poll_find lock back
Instead of using implicit knowledge of what is locked or not after
io_poll_find() and co returns, pass back a pointer to the locked
bucket if any. If set the user must to unlock the spinlock.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dae1dc5749aa34367812ecf62f82fd3f053aae44.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hao Xu [Thu, 16 Jun 2022 09:22:02 +0000 (10:22 +0100)]
io_uring: switch cancel_hash to use per entry spinlock
Add a new io_hash_bucket structure so that each bucket in cancel_hash
has separate spinlock. Use per entry lock for cancel_hash, this removes
some completion lock invocation and remove contension between different
cancel_hash entries.
Signed-off-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/05d1e135b0c8bce9d1441e6346776589e5783e26.1655371007.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hao Xu [Thu, 16 Jun 2022 09:22:01 +0000 (10:22 +0100)]
io_uring: poll: remove unnecessary req->ref set
We now don't need to set req->refcount for poll requests since the
reworked poll code ensures no request release race.
Signed-off-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ec6fee45705890bdb968b0c175519242753c0215.1655371007.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:22:00 +0000 (10:22 +0100)]
io_uring: don't inline io_put_kbuf
io_put_kbuf() is huge, don't bloat the kernel with inlining.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2e21ccf0be471ffa654032914b9430813cae53f8.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:21:59 +0000 (10:21 +0100)]
io_uring: refactor io_req_task_complete()
Clean up io_req_task_complete() and deduplicate io_put_kbuf() calls.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ae3148ac7eb5cce3e06895cde306e9e959d6f6ae.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:21:58 +0000 (10:21 +0100)]
io_uring: kill REQ_F_COMPLETE_INLINE
REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
completion list to io_queue_sqe() as __io_req_complete() is inlined and
we don't want to bloat the kernel.
As now we complete in a more centralised fashion in io_issue_sqe() we
can get rid of the flag and queue to the list directly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/600ba20a9338b8a39b249b23d3d177803613dde4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:21:57 +0000 (10:21 +0100)]
io_uring: rw: delegate sync completions to core io_uring
io_issue_sqe() from the io_uring core knows how to complete requests
based on the returned error code, we can delegate io_read()/io_write()
completion to it. Make kiocb_done() to return the right completion
code and propagate it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/32ef005b45d23bf6b5e6837740dc0331bb051bd4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 15 Jun 2022 22:28:17 +0000 (16:28 -0600)]
io_uring: remove unused IO_REQ_CACHE_SIZE defined
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:56 +0000 (17:33 +0100)]
io_uring: don't set REQ_F_COMPLETE_INLINE in tw
io_req_task_complete() enqueues requests for state completion itself, no
need for REQ_F_COMPLETE_INLINE, which is only serve the purpose of not
bloating the kernel.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/aca80f71464ad02c06f1311d998a2d6ee0b31573.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:55 +0000 (17:33 +0100)]
io_uring: remove check_cq checking from hot paths
All ctx->check_cq events are slow path, don't test every single flag one
by one in the hot path, but add a common guarding if.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dff026585cea7ff3a172a7c83894a3b0111bbf6a.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:54 +0000 (17:33 +0100)]
io_uring: never defer-complete multi-apoll
Luckily, nnobody completes multi-apoll requests outside the polling
functions, but don't set IO_URING_F_COMPLETE_DEFER in any case as
there is nobody who is catching REQ_F_COMPLETE_INLINE, and so will leak
requests if used.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a65ed3f5effd9321ee06e6edea294a03be3e15a0.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:53 +0000 (17:33 +0100)]
io_uring: inline ->registered_rings
There can be only 16 registered rings, no need to allocate an array for
them separately but store it in tctx.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/495f0b953c87994dd9e13de2134019054fa5830d.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:52 +0000 (17:33 +0100)]
io_uring: explain io_wq_work::cancel_seq placement
Add a comment on why we keep ->cancel_seq in struct io_wq_work instead
of struct io_kiocb despite it needed only by io_uring but not io-wq.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/988e87eec9dc700b5dae933df3aefef303502f6c.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:51 +0000 (17:33 +0100)]
io_uring: move small helpers to headers
There is a bunch of inline helpers that will be useful not only to the
core of io_uring, move them to headers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/22df99c83723e44cba7e945e8519e64e3642c064.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:50 +0000 (17:33 +0100)]
io_uring: refactor ctx slow data placement
Shove all slow path data at the end of ctx and get rid of extra
indention.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bcaf200298dd469af20787650550efc66d89bef2.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:49 +0000 (17:33 +0100)]
io_uring: better caching for ctx timeout fields
Following timeout fields access patterns, move all of them into a
separate cache line inside ctx, so they don't intervene with normal
completion caching, especially since timeout removals and completion
are separated and the later is done via tw.
It also sheds some bytes from io_ring_ctx, 1216B -> 1152B
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4b163793072840de53b3cb66e0c2995e7226ff78.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:48 +0000 (17:33 +0100)]
io_uring: move defer_list to slow data
draining is slow path, move defer_list to the end where slow data lives
inside the context.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e16379391ca72b490afdd24e8944baab849b4a7b.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:47 +0000 (17:33 +0100)]
io_uring: make reg buf init consistent
The default (i.e. empty) state of register buffer is dummy_ubuf, so set
it to dummy on init instead of NULL.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c5456aecf03d9627fbd6e65e100e2b5293a6151e.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 1 Jun 2022 18:36:42 +0000 (12:36 -0600)]
io_uring: deprecate epoll_ctl support
As far as we know, nobody ever adopted the epoll_ctl management via
io_uring. Deprecate it now with a warning, and plan on removing it in
a later kernel version. When we do remove it, we can revert the following
commits as well:
39220e8d4a2a ("eventpoll: support non-blocking do_epoll_ctl() calls")
58e41a44c488 ("eventpoll: abstract out epoll_ctl() handler")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/io-uring/CAHk-=wiTyisXBgKnVHAGYCNvkmjk=50agS2Uk6nr+n3ssLZg2w@mail.gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 27 May 2022 16:55:07 +0000 (10:55 -0600)]
io_uring: add support for level triggered poll
By default, the POLL_ADD command does edge triggered poll - if we get
a non-zero mask on the initial poll attempt, we complete the request
successfully.
Support level triggered by always waiting for a notification, regardless
of whether or not the initial mask matches the file state.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 15 Jun 2022 22:27:42 +0000 (16:27 -0600)]
io_uring: move opcode table to opdef.c
We already have the declarations in opdef.h, move the rest into its own
file rather than in the main io_uring.c file.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 13 Jun 2022 13:27:03 +0000 (07:27 -0600)]
io_uring: move read/write related opcodes to its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 26 May 2022 15:44:31 +0000 (09:44 -0600)]
io_uring: move remaining file table manipulation to filetable.c
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 13 Jun 2022 13:12:45 +0000 (07:12 -0600)]
io_uring: move rsrc related data, core, and commands
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 13 Jun 2022 13:07:23 +0000 (07:07 -0600)]
io_uring: split provided buffers handling into its own file
Move both the opcodes related to it, and the internals code dealing with
it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 26 May 2022 02:36:47 +0000 (20:36 -0600)]
io_uring: move cancelation into its own file
This also helps cleanup the io_uring.h cancel parts, as we can make
things static in the cancel.c file, mostly.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 26 May 2022 02:31:09 +0000 (20:31 -0600)]
io_uring: move poll handling into its own file
Add a io_poll_issue() rather than export the general task_work locking
and io_issue_sqe(), and put the io_op_defs definition and structure into
a separate header file so that poll can use it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 17:57:03 +0000 (11:57 -0600)]
io_uring: add opcode name to io_op_defs
This kills the last per-op switch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 17:48:35 +0000 (11:48 -0600)]
io_uring: include and forward-declaration sanitation
Remove some dead headers we no longer need, and get rid of the
io_ring_ctx and io_uring_fops forward declarations.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 17:01:04 +0000 (11:01 -0600)]
io_uring: move io_uring_task (tctx) helpers into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 16:40:19 +0000 (10:40 -0600)]
io_uring: move fdinfo helpers to its own file
This also means moving a bit more of the fixed file handling to the
filetable side, which makes sense separately too.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 16:28:04 +0000 (10:28 -0600)]
io_uring: use io_is_uring_fops() consistently
Convert the last spots that check for io_uring_fops to use the provided
helper instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 15:13:39 +0000 (09:13 -0600)]
io_uring: move SQPOLL related handling into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 14:57:27 +0000 (08:57 -0600)]
io_uring: move timeout opcodes and handling into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 14:56:52 +0000 (08:56 -0600)]
io_uring: move our reference counting into a header
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 12:42:08 +0000 (06:42 -0600)]
io_uring: move msg_ring into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 12:25:13 +0000 (06:25 -0600)]
io_uring: split network related opcodes into its own file
While at it, convert the handlers to just use io_eopnotsupp_prep()
if CONFIG_NET isn't set.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 12:12:18 +0000 (06:12 -0600)]
io_uring: move statx handling to its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 12:09:18 +0000 (06:09 -0600)]
io_uring: move epoll handler to its own file
Would be nice to sort out Kconfig for this and don't even compile
epoll.c if we don't have epoll configured.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 12:04:14 +0000 (06:04 -0600)]
io_uring: add a dummy -EOPNOTSUPP prep handler
Add it and use it for the epoll handling, if epoll isn't configured.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 11:59:19 +0000 (05:59 -0600)]
io_uring: move uring_cmd handling to its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:54:43 +0000 (21:54 -0600)]
io_uring: split out open/close operations
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:43:10 +0000 (21:43 -0600)]
io_uring: separate out file table handling code
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:28:33 +0000 (21:28 -0600)]
io_uring: split out fadvise/madvise operations
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:25:19 +0000 (21:25 -0600)]
io_uring: split out fs related sync/fallocate functions
This splits out sync_file_range, fsync, and fallocate.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:19:47 +0000 (21:19 -0600)]
io_uring: split out splice related operations
This splits out splice and tee support.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:13:00 +0000 (21:13 -0600)]
io_uring: split out filesystem related operations
This splits out renameat, unlinkat, mkdirat, symlinkat, and linkat.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 17:56:42 +0000 (11:56 -0600)]
io_uring: move nop into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 17:46:43 +0000 (11:46 -0600)]
io_uring: move xattr related opcodes to its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 21:21:00 +0000 (15:21 -0600)]
io_uring: handle completions in the core
Normally request handlers complete requests themselves, if they don't
return an error. For the latter case, the core will complete it for
them.
This is unhandy for pushing opcode handlers further out, as we don't
want a bunch of inline completion code and we don't want to make the
completion path slower than it is now.
Let the core handle any completion, unless the handler explicitly
asks us not to.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 18:45:38 +0000 (12:45 -0600)]
io_uring: set completion results upfront
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:56:14 +0000 (10:56 -0600)]
io_uring: add io_uring_types.h
This adds definitions of structs that both the core and the various
opcode handlers need to know about.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:26:28 +0000 (10:26 -0600)]
io_uring: define a request type cleanup handler
This can move request type specific cleanup into a private handler,
removing the need for the core io_uring parts to know what types
they are dealing with.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:19:47 +0000 (10:19 -0600)]
io_uring: unify struct io_symlink and io_hardlink
They are really just a subset of each other, just use the one type.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:09:32 +0000 (10:09 -0600)]
io_uring: convert iouring_cmd to io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:06:46 +0000 (10:06 -0600)]
io_uring: convert xattr to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:05:49 +0000 (10:05 -0600)]
io_uring: convert rsrc_update to io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:03:49 +0000 (10:03 -0600)]
io_uring: convert msg and nop to io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:01:47 +0000 (10:01 -0600)]
io_uring: convert splice to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:01:09 +0000 (10:01 -0600)]
io_uring: convert epoll to io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:59:28 +0000 (09:59 -0600)]
io_uring: convert file system request types to use io_cmd_type
This converts statx, rename, unlink, mkdir, symlink, and hardlink to
use io_cmd_type.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:51:05 +0000 (09:51 -0600)]
io_uring: convert madvise/fadvise to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:49:25 +0000 (09:49 -0600)]
io_uring: convert open/close path to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:45:22 +0000 (09:45 -0600)]
io_uring: convert timeout path to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:33:01 +0000 (09:33 -0600)]
io_uring: convert cancel path to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:30:45 +0000 (09:30 -0600)]
io_uring: convert the sync and fallocate paths to use io_cmd_type
They all share the same struct io_sync, convert them to use the
io_cmd_type approach instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:27:38 +0000 (09:27 -0600)]
io_uring: convert net related opcodes to use io_cmd_type
This converts accept, connect, send/recv, sendmsg/recvmsg, shutdown, and
socket to use io_cmd_type.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:24:42 +0000 (09:24 -0600)]
io_uring: remove recvmsg knowledge from io_arm_poll_handler()
There's a special case for recvmsg with MSG_ERRQUEUE set. This is
problematic as it means the core needs to know about this special
request type.
For now, just add a generic flag for it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:16:40 +0000 (09:16 -0600)]
io_uring: convert poll_update path to use io_cmd_type
Remove struct io_poll_update from io_kiocb, and convert the poll path to
use the io_cmd_type approach instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:13:46 +0000 (09:13 -0600)]
io_uring: convert poll path to use io_cmd_type
Remove struct io_poll_iocb from io_kiocb, and convert the poll path to
use the io_cmd_type approach instead.
While at it, rename io_poll_iocb to io_poll which is consistent with the
other request type private structures.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 13 Jun 2022 12:57:44 +0000 (06:57 -0600)]
io_uring: convert read/write path to use io_cmd_type
Remove struct io_rw from io_kiocb, and convert the read/write path to
use the io_cmd_type approach instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 14:32:05 +0000 (08:32 -0600)]
io_uring: add generic command payload type to struct io_kiocb
Each opcode generally has a command structure in io_kiocb which it can
use to store data associated with that request.
In preparation for having the core layer not know about what's inside
these fields, add a generic io_cmd_data type and put in the union as
well.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 23 May 2022 23:30:37 +0000 (17:30 -0600)]
io_uring: move req async preparation into opcode handler
Define an io_op_def->prep_async() handler and push the async preparation
to there. Since we now have that, we can drop ->needs_async_setup, as
they mean the same thing.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 23 May 2022 23:05:03 +0000 (17:05 -0600)]
io_uring: move to separate directory
In preparation for splitting io_uring up a bit, move it into its own
top level directory. It didn't really belong in fs/ anyway, as it's
not a file system only API.
This adds io_uring/ and moves the core files in there, and updates the
MAINTAINERS file for the new location.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 23 May 2022 22:56:21 +0000 (16:56 -0600)]
io_uring: define a 'prep' and 'issue' handler for each opcode
Rather than have two giant switches for doing request preparation and
then for doing request issue, add a prep and issue handler for each
of them in the io_op_defs[] request definition.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 17 Jun 2022 12:30:32 +0000 (06:30 -0600)]
Merge branch 'io_uring-5.19' into for-5.20/io_uring
* io_uring-5.19: (21 commits)
io_uring: recycle provided buffer if we punt to io-wq
io_uring: do not use prio task_work_add in uring_cmd
io_uring: commit non-pollable provided mapped buffers upfront
io_uring: make io_fill_cqe_aux honour CQE32
io_uring: remove __io_fill_cqe() helper
io_uring: fix ->extra{1,2} misuse
io_uring: fill extra big cqe fields from req
io_uring: unite fill_cqe and the 32B version
io_uring: get rid of __io_fill_cqe{32}_req()
io_uring: remove IORING_CLOSE_FD_AND_FILE_SLOT
Revert "io_uring: add buffer selection support to IORING_OP_NOP"
Revert "io_uring: support CQE32 for nop operation"
io_uring: limit size of provided buffer ring
io_uring: fix types in provided buffer ring
io_uring: fix index calculation
io_uring: fix double unlock for pbuf select
io_uring: kbuf: fix bug of not consuming ring buffer in partial io case
io_uring: openclose: fix bug of closing wrong fixed file
io_uring: fix not locked access to fixed buf table
io_uring: fix races with buffer table unregister
...
Jens Axboe [Fri, 17 Jun 2022 12:24:26 +0000 (06:24 -0600)]
io_uring: recycle provided buffer if we punt to io-wq
io_arm_poll_handler() will recycle the buffer appropriately if we end
up arming poll (or if we're ready to retry), but not for the io-wq case
if we have attempted poll first.
Explicitly recycle the buffer to avoid both hanging on to it too long,
but also to avoid multiple reads grabbing the same one. This can happen
for ring mapped buffers, since it hasn't necessarily been committed.
Fixes:
c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers")
Link: https://github.com/axboe/liburing/issues/605
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Thu, 16 Jun 2022 22:53:38 +0000 (15:53 -0700)]
Merge tag 'audit-pr-
20220616' of git://git./linux/kernel/git/pcmoore/audit
Pull audit fix from Paul Moore:
"A single audit patch to fix a problem where we were not properly
freeing memory allocated when recording information related to a
module load"
* tag 'audit-pr-
20220616' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: free module name
Linus Torvalds [Thu, 16 Jun 2022 22:50:36 +0000 (15:50 -0700)]
Merge tag 'selinux-pr-
20220616' of git://git./linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"A single SELinux patch to fix memory leaks when mounting filesystems
with SELinux mount options"
* tag 'selinux-pr-
20220616' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: free contexts previously transferred in selinux_add_opt()
Linus Torvalds [Thu, 16 Jun 2022 18:51:32 +0000 (11:51 -0700)]
Merge tag 'net-5.19-rc3' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Mostly driver fixes.
Current release - regressions:
- Revert "net: Add a second bind table hashed by port and address",
needs more work
- amd-xgbe: use platform_irq_count(), static setup of IRQ resources
had been removed from DT core
- dts: at91: ksz9477_evb: add phy-mode to fix port/phy validation
Current release - new code bugs:
- hns3: modify the ring param print info
Previous releases - always broken:
- axienet: make the 64b addressable DMA depends on 64b architectures
- iavf: fix issue with MAC address of VF shown as zero
- ice: fix PTP TX timestamp offset calculation
- usb: ax88179_178a needs FLAG_SEND_ZLP
Misc:
- document some net.sctp.* sysctls"
* tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
net: axienet: add missing error return code in axienet_probe()
Revert "net: Add a second bind table hashed by port and address"
net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg
net: usb: ax88179_178a needs FLAG_SEND_ZLP
MAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS
ARM: dts: at91: ksz9477_evb: fix port/phy validation
net: bgmac: Fix an erroneous kfree() in bgmac_remove()
ice: Fix memory corruption in VF driver
ice: Fix queue config fail handling
ice: Sync VLAN filtering features for DVM
ice: Fix PTP TX timestamp offset calculation
mlxsw: spectrum_cnt: Reorder counter pools
docs: networking: phy: Fix a typo
amd-xgbe: Use platform_irq_count()
octeontx2-vf: Add support for adaptive interrupt coalescing
xilinx: Fix build on x86.
net: axienet: Use iowrite64 to write all 64b descriptor pointers
net: axienet: make the 64b addresable DMA depends on 64b archectures
net: hns3: fix tm port shapping of fibre port is incorrect after driver initialization
net: hns3: fix PF rss size initialization bug
...
Yang Yingliang [Thu, 16 Jun 2022 06:29:17 +0000 (14:29 +0800)]
net: axienet: add missing error return code in axienet_probe()
It should return error code in error path in axienet_probe().
Fixes:
00be43a74ca2 ("net: axienet: make the 64b addresable DMA depends on 64b archectures")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220616062917.3601-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joanne Koong [Wed, 15 Jun 2022 19:32:13 +0000 (12:32 -0700)]
Revert "net: Add a second bind table hashed by port and address"
This reverts:
commit
d5a42de8bdbe ("net: Add a second bind table hashed by port and address")
commit
538aaf9b2383 ("selftests: Add test for timing a bind request to a port with a populated bhash entry")
Link: https://lore.kernel.org/netdev/20220520001834.2247810-1-kuba@kernel.org/
There are a few things that need to be fixed here:
* Updating bhash2 in cases where the socket's rcv saddr changes
* Adding bhash2 hashbucket locks
Links to syzbot reports:
https://lore.kernel.org/netdev/
00000000000022208805e0df247a@google.com/
https://lore.kernel.org/netdev/
0000000000003f33bc05dfaf44fe@google.com/
Fixes:
d5a42de8bdbe ("net: Add a second bind table hashed by port and address")
Reported-by: syzbot+015d756bbd1f8b5c8f09@syzkaller.appspotmail.com
Reported-by: syzbot+98fd2d1422063b0f8c44@syzkaller.appspotmail.com
Reported-by: syzbot+0a847a982613c6438fba@syzkaller.appspotmail.com
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20220615193213.2419568-1-joannelkoong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dylan Yudaken [Thu, 16 Jun 2022 13:50:11 +0000 (06:50 -0700)]
io_uring: do not use prio task_work_add in uring_cmd
io_req_task_prio_work_add has a strict assumption that it will only be
used with io_req_task_complete. There is a codepath that assumes this is
the case and will not even call the completion function if it is hit.
For uring_cmd with an arbitrary completion function change the call to the
correct non-priority version.
Fixes:
ee692a21e9bf8 ("fs,io_uring: add infrastructure for uring-cmd")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220616135011.441980-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 16 Jun 2022 01:51:11 +0000 (19:51 -0600)]
io_uring: commit non-pollable provided mapped buffers upfront
For recv/recvmsg, IO either completes immediately or gets queued for a
retry. This isn't the case for read/readv, if eg a normal file or a block
device is used. Here, an operation can get queued with the block layer.
If this happens, ring mapped buffers must get committed immediately to
avoid that the next read can consume the same buffer.
Check if we're dealing with pollable file, when getting a new ring mapped
provided buffer. If it's not, commit it immediately rather than wait post
issue. If we don't wait, we can race with completions coming in, or just
plain buffer reuse by committing after a retry where others could have
grabbed the same buffer.
Fixes:
c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers")
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christian Göttsche [Wed, 15 Jun 2022 15:38:39 +0000 (17:38 +0200)]
selinux: free contexts previously transferred in selinux_add_opt()
`selinux_add_opt()` stopped taking ownership of the passed context since
commit
70f4169ab421 ("selinux: parse contexts for mount options early").
unreferenced object 0xffff888114dfd140 (size 64):
comm "mount", pid 15182, jiffies
4295687028 (age 796.340s)
hex dump (first 32 bytes):
73 79 73 74 65 6d 5f 75 3a 6f 62 6a 65 63 74 5f system_u:object_
72 3a 74 65 73 74 5f 66 69 6c 65 73 79 73 74 65 r:test_filesyste
backtrace:
[<
ffffffffa07dbef4>] kmemdup_nul+0x24/0x80
[<
ffffffffa0d34253>] selinux_sb_eat_lsm_opts+0x293/0x560
[<
ffffffffa0d13f08>] security_sb_eat_lsm_opts+0x58/0x80
[<
ffffffffa0af1eb2>] generic_parse_monolithic+0x82/0x180
[<
ffffffffa0a9c1a5>] do_new_mount+0x1f5/0x550
[<
ffffffffa0a9eccb>] path_mount+0x2ab/0x1570
[<
ffffffffa0aa019e>] __x64_sys_mount+0x20e/0x280
[<
ffffffffa1f47124>] do_syscall_64+0x34/0x80
[<
ffffffffa200007e>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
unreferenced object 0xffff888108e71640 (size 64):
comm "fsmount", pid 7607, jiffies
4295044974 (age 1601.016s)
hex dump (first 32 bytes):
73 79 73 74 65 6d 5f 75 3a 6f 62 6a 65 63 74 5f system_u:object_
72 3a 74 65 73 74 5f 66 69 6c 65 73 79 73 74 65 r:test_filesyste
backtrace:
[<
ffffffff861dc2b1>] memdup_user+0x21/0x90
[<
ffffffff861dc367>] strndup_user+0x47/0xa0
[<
ffffffff864f6965>] __do_sys_fsconfig+0x485/0x9f0
[<
ffffffff87940124>] do_syscall_64+0x34/0x80
[<
ffffffff87a0007e>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Cc: stable@vger.kernel.org
Fixes:
70f4169ab421 ("selinux: parse contexts for mount options early")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Christian Göttsche [Wed, 15 Jun 2022 15:44:31 +0000 (17:44 +0200)]
audit: free module name
Reset the type of the record last as the helper `audit_free_module()`
depends on it.
unreferenced object 0xffff888153b707f0 (size 16):
comm "modprobe", pid 1319, jiffies
4295110033 (age 1083.016s)
hex dump (first 16 bytes):
62 69 6e 66 6d 74 5f 6d 69 73 63 00 6b 6b 6b a5 binfmt_misc.kkk.
backtrace:
[<
ffffffffa07dbf9b>] kstrdup+0x2b/0x50
[<
ffffffffa04b0a9d>] __audit_log_kern_module+0x4d/0xf0
[<
ffffffffa03b6664>] load_module+0x9d4/0x2e10
[<
ffffffffa03b8f44>] __do_sys_finit_module+0x114/0x1b0
[<
ffffffffa1f47124>] do_syscall_64+0x34/0x80
[<
ffffffffa200007e>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Cc: stable@vger.kernel.org
Fixes:
12c5e81d3fd0 ("audit: prepare audit_context for use in calling contexts beyond syscalls")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Linus Torvalds [Wed, 15 Jun 2022 21:20:26 +0000 (14:20 -0700)]
Merge tag 'hardening-v5.19-rc3' of git://git./linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- Correctly handle vm_map areas in hardened usercopy (Matthew Wilcox)
- Adjust CFI RCU usage to avoid boot splats with cpuidle (Sami Tolvanen)
* tag 'hardening-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
usercopy: Make usercopy resilient against ridiculously large copies
usercopy: Cast pointer to an integer once
usercopy: Handle vm_map_ram() areas
cfi: Fix __cfi_slowpath_diag RCU usage with cpuidle
Linus Torvalds [Wed, 15 Jun 2022 19:34:19 +0000 (12:34 -0700)]
Merge tag 'tpmdd-next-v5.19-rc3' of git://git./linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fixes from Jarkko Sakkinen:
"Two fixes for this merge window"
* tag 'tpmdd-next-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
certs: fix and refactor CONFIG_SYSTEM_BLACKLIST_HASH_LIST build
certs/blacklist_hashes.c: fix const confusion in certs blacklist
Masahiro Yamada [Sat, 11 Jun 2022 17:22:31 +0000 (02:22 +0900)]
certs: fix and refactor CONFIG_SYSTEM_BLACKLIST_HASH_LIST build
Commit
addf466389d9 ("certs: Check that builtin blacklist hashes are
valid") was applied 8 months after the submission.
In the meantime, the base code had been removed by commit
b8c96a6b466c
("certs: simplify $(srctree)/ handling and remove config_filename
macro").
Fix the Makefile.
Create a local copy of $(CONFIG_SYSTEM_BLACKLIST_HASH_LIST). It is
included from certs/blacklist_hashes.c and also works as a timestamp.
Send error messages from check-blacklist-hashes.awk to stderr instead
of stdout.
Fixes:
addf466389d9 ("certs: Check that builtin blacklist hashes are valid")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Mickaël Salaün <mic@linux.microsoft.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Masahiro Yamada [Sat, 11 Jun 2022 17:22:30 +0000 (02:22 +0900)]
certs/blacklist_hashes.c: fix const confusion in certs blacklist
This file fails to compile as follows:
CC certs/blacklist_hashes.o
certs/blacklist_hashes.c:4:1: error: ignoring attribute ‘section (".init.data")’ because it conflicts with previous ‘section (".init.rodata")’ [-Werror=attributes]
4 | const char __initdata *const blacklist_hashes[] = {
| ^~~~~
In file included from certs/blacklist_hashes.c:2:
certs/blacklist.h:5:38: note: previous declaration here
5 | extern const char __initconst *const blacklist_hashes[];
| ^~~~~~~~~~~~~~~~
Apply the same fix as commit
2be04df5668d ("certs/blacklist_nohashes.c:
fix const confusion in certs blacklist").
Fixes:
734114f8782f ("KEYS: Add a system blacklist keyring")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Mickaël Salaün <mic@linux.microsoft.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Linus Torvalds [Wed, 15 Jun 2022 16:04:55 +0000 (09:04 -0700)]
Merge tag 'fs.fixes.v5.19-rc3' of git://git./linux/kernel/git/brauner/linux
Pull vfs idmapping fix from Christian Brauner:
"This fixes an issue where we fail to change the group of a file when
the caller owns the file and is a member of the group to change to.
This is only relevant on idmapped mounts.
There's a detailed description in the commit message and regression
tests have been added to xfstests"
* tag 'fs.fixes.v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
fs: account for group membership
Duoming Zhou [Tue, 14 Jun 2022 09:25:57 +0000 (17:25 +0800)]
net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg
The skb_recv_datagram() in ax25_recvmsg() will hold lock_sock
and block until it receives a packet from the remote. If the client
doesn`t connect to server and calls read() directly, it will not
receive any packets forever. As a result, the deadlock will happen.
The fail log caused by deadlock is shown below:
[ 369.606973] INFO: task ax25_deadlock:157 blocked for more than 245 seconds.
[ 369.608919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.613058] Call Trace:
[ 369.613315] <TASK>
[ 369.614072] __schedule+0x2f9/0xb20
[ 369.615029] schedule+0x49/0xb0
[ 369.615734] __lock_sock+0x92/0x100
[ 369.616763] ? destroy_sched_domains_rcu+0x20/0x20
[ 369.617941] lock_sock_nested+0x6e/0x70
[ 369.618809] ax25_bind+0xaa/0x210
[ 369.619736] __sys_bind+0xca/0xf0
[ 369.620039] ? do_futex+0xae/0x1b0
[ 369.620387] ? __x64_sys_futex+0x7c/0x1c0
[ 369.620601] ? fpregs_assert_state_consistent+0x19/0x40
[ 369.620613] __x64_sys_bind+0x11/0x20
[ 369.621791] do_syscall_64+0x3b/0x90
[ 369.622423] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 369.623319] RIP: 0033:0x7f43c8aa8af7
[ 369.624301] RSP: 002b:
00007f43c8197ef8 EFLAGS:
00000246 ORIG_RAX:
0000000000000031
[ 369.625756] RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007f43c8aa8af7
[ 369.626724] RDX:
0000000000000010 RSI:
000055768e2021d0 RDI:
0000000000000005
[ 369.628569] RBP:
00007f43c8197f00 R08:
0000000000000011 R09:
00007f43c8198700
[ 369.630208] R10:
0000000000000000 R11:
0000000000000246 R12:
00007fff845e6afe
[ 369.632240] R13:
00007fff845e6aff R14:
00007f43c8197fc0 R15:
00007f43c8198700
This patch replaces skb_recv_datagram() with an open-coded variant of it
releasing the socket lock before the __skb_wait_for_more_packets() call
and re-acquiring it after such call in order that other functions that
need socket lock could be executed.
what's more, the socket lock will be released only when recvmsg() will
block and that should produce nicer overall behavior.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Thomas Osterried <thomas@osterried.de>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reported-by: Thomas Habets <thomas@@habets.se>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Begunkov [Wed, 15 Jun 2022 10:23:07 +0000 (11:23 +0100)]
io_uring: make io_fill_cqe_aux honour CQE32
Don't let io_fill_cqe_aux() post 16B cqes for CQE32 rings, neither the
kernel nor the userspace expect this to happen.
Fixes:
76c68fbf1a1f9 ("io_uring: enable CQE32")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/64fae669fae1b7083aa15d0cd807f692b0880b9a.1655287457.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 10:23:06 +0000 (11:23 +0100)]
io_uring: remove __io_fill_cqe() helper
In preparation for the following patch, inline __io_fill_cqe(), there is
only one user.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/71dab9afc3cde3f8b64d26f20d3b60bdc40726ff.1655287457.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 10:23:05 +0000 (11:23 +0100)]
io_uring: fix ->extra{1,2} misuse
We don't really know the state of req->extra{1,2] fields in
__io_fill_cqe_req(), if an opcode handler is not aware of CQE32 option,
it never sets them up properly. Track the state of those fields with a
request flag.
Fixes:
76c68fbf1a1f9 ("io_uring: enable CQE32")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4b3e5be512fbf4debec7270fd485b8a3b014d464.1655287457.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 10:23:04 +0000 (11:23 +0100)]
io_uring: fill extra big cqe fields from req
The only user of io_req_complete32()-like functions is cmd
requests. Instead of keeping the whole complete32 family, remove them
and provide the extras in already added for inline completions
req->extra{1,2}. When fill_cqe_res() finds CQE32 option enabled
it'll use those fields to fill a 32B cqe.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/af1319eb661b1f9a0abceb51cbbf72b8002e019d.1655287457.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>