io_uring: support deferring submissions to schedule out time

If IORING_SETUP_SCHED_SUBMIT is enabled for !SQPOLL rings,
io_uring_enter(2) syscalls that submit requests will only prepare the
selected requests rather than actually submitting them. Submission
will instead be performed when the task that prepared the IO is
scheduled out. This allows for fairly trivial batching of submission
events, even across multiple io_uring_enter(2) syscalls, and for
submission of already prepared IO at the point where the task is
scheduled out anyway.

This is a WIP patch, more of a proof of concept than anything else. The
schedule out handling isn't particularly nice, as submission can
potentially cause a blocking event. If submission is itself invoked from
schedule and then blocks, we could recurse into schedule. This is handled
by ensuring that the pending submission list is fully cleared before any
of its entries are submitted. Hence if an unlikely recursion event into
schedule were to happen, it would be a no-op as nothing is pending.
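
A minimal userspace sketch of the clear-before-submit scheme described
above, not the code in this patch. It assumes a fixed-size pending array
instead of a kernel list, and every name in it (struct task_ctx,
prep_req, issue_req, sched_out) is hypothetical. Preparing only queues a
request, and the schedule out hook empties the pending set before it
issues anything, so a nested call finds nothing to do:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16
#define MAX_ISSUED  64

/* Hypothetical stand-in for a prepared request, not a kernel structure. */
struct req {
	int id;
	int blocks;	/* nonzero: issuing this request "blocks" */
};

/* Hypothetical per-task state: requests prepared but not yet issued. */
struct task_ctx {
	struct req pending[MAX_PENDING];
	size_t nr_pending;
	int issued[MAX_ISSUED];		/* ids, in the order they were issued */
	size_t nr_issued;
	int flush_calls;		/* how often sched_out() was entered */
	int noop_flushes;		/* entries that found nothing pending */
};

static void sched_out(struct task_ctx *t);

/* Models a deferred submit: the request is only queued, not issued. */
static int prep_req(struct task_ctx *t, int id, int blocks)
{
	if (t->nr_pending == MAX_PENDING)
		return -1;
	t->pending[t->nr_pending++] = (struct req){ .id = id, .blocks = blocks };
	return 0;
}

/* Issue one request; a blocking request re-enters the schedule out hook. */
static void issue_req(struct task_ctx *t, const struct req *r)
{
	assert(t->nr_issued < MAX_ISSUED);
	t->issued[t->nr_issued++] = r->id;
	if (r->blocks)
		sched_out(t);
}

/* Models the schedule out hook: take the whole batch, then issue it. */
static void sched_out(struct task_ctx *t)
{
	struct req batch[MAX_PENDING];
	size_t i, nr = t->nr_pending;

	t->flush_calls++;
	if (!nr) {
		t->noop_flushes++;
		return;
	}
	for (i = 0; i < nr; i++)
		batch[i] = t->pending[i];
	t->nr_pending = 0;	/* cleared before anything is issued */

	for (i = 0; i < nr; i++)
		issue_req(t, &batch[i]);
}
```

In this model, requests queued by several prep_req() calls are issued as
one batch by a single sched_out() call, and the nested sched_out()
triggered by a blocking request returns without issuing anything.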

Beyond doing lazy batch submissions when the task schedules out anyway,
this also allows more precise control over the full list of
pending requests to issue, as they have all been prepared upfront. This
can be advantageous in situations where better decisions can be made
when the kernel knows exactly what is pending on the submission side.

Signed-off-by: Jens Axboe <axboe@kernel.dk>