AgeCommit message (Collapse)Author
2021-03-25io-wq: fix race around pending work on teardownJens Axboe
syzbot reports that it's triggering the warning condition on having pending work on shutdown: WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_destroy fs/io-wq.c:1061 [inline] WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_put+0x153/0x260 fs/io-wq.c:1072 Modules linked in: CPU: 1 PID: 12346 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:io_wq_destroy fs/io-wq.c:1061 [inline] RIP: 0010:io_wq_put+0x153/0x260 fs/io-wq.c:1072 Code: 8d e8 71 90 ea 01 49 89 c4 41 83 fc 40 7d 4f e8 33 4d 97 ff 42 80 7c 2d 00 00 0f 85 77 ff ff ff e9 7a ff ff ff e8 1d 4d 97 ff <0f> 0b eb b9 8d 6b ff 89 ee 09 de bf ff ff ff ff e8 18 51 97 ff 09 RSP: 0018:ffffc90001ebfb08 EFLAGS: 00010293 RAX: ffffffff81e16083 RBX: ffff888019038040 RCX: ffff88801e86b780 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000040 RBP: 1ffff1100b2f8a80 R08: ffffffff81e15fce R09: ffffed100b2f8a82 R10: ffffed100b2f8a82 R11: 0000000000000000 R12: 0000000000000000 R13: dffffc0000000000 R14: ffff8880597c5400 R15: ffff888019038000 FS: 00007f8dcd89c700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055e9a054e160 CR3: 000000001dfb8000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: io_uring_clean_tctx+0x1b7/0x210 fs/io_uring.c:8802 __io_uring_files_cancel+0x13c/0x170 fs/io_uring.c:8820 io_uring_files_cancel include/linux/io_uring.h:47 [inline] do_exit+0x258/0x2340 kernel/exit.c:780 do_group_exit+0x168/0x2d0 kernel/exit.c:922 get_signal+0x1734/0x1ef0 kernel/signal.c:2773 arch_do_signal_or_restart+0x3c/0x610 arch/x86/kernel/signal.c:811 handle_signal_work kernel/entry/common.c:147 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:208 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline] syscall_exit_to_user_mode+0x48/0x180 kernel/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x465f69 which shouldn't happen, but seems to be possible due to a race on whether or not the io-wq manager sees a fatal signal first, or whether the io-wq workers do. If we race with queueing work and then send a fatal signal to the owning task, and the io-wq worker sees that before the manager sets IO_WQ_BIT_EXIT, then it's possible to have the worker exit and leave work behind. Just turn the WARN_ON_ONCE() into a cancelation condition instead. Reported-by: Signed-off-by: Jens Axboe <>
2021-03-24io_uring: do ctx sqd ejection in a clear contextPavel Begunkov
WARNING: CPU: 1 PID: 27907 at fs/io_uring.c:7147 io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147 CPU: 1 PID: 27907 Comm: iou-sqp-27905 Not tainted 5.12.0-rc4-syzkaller #0 RIP: 0010:io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147 Call Trace: io_ring_ctx_wait_and_kill+0x214/0x700 fs/io_uring.c:8619 io_uring_release+0x3e/0x50 fs/io_uring.c:8646 __fput+0x288/0x920 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:140 io_run_task_work fs/io_uring.c:2238 [inline] io_run_task_work fs/io_uring.c:2228 [inline] io_uring_try_cancel_requests+0x8ec/0xc60 fs/io_uring.c:8770 io_uring_cancel_sqpoll+0x1cf/0x290 fs/io_uring.c:8974 io_sqpoll_cancel_cb+0x87/0xb0 fs/io_uring.c:8907 io_run_task_work_head+0x58/0xb0 fs/io_uring.c:1961 io_sq_thread+0x3e2/0x18d0 fs/io_uring.c:6763 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 May happen that last ctx ref is killed in io_uring_cancel_sqpoll(), so fput callback (i.e. io_uring_release()) is enqueued through task_work, and run by same cancellation. As it's deeply nested we can't do parking or taking sqd->lock there, because its state is unclear. So avoid ctx ejection from sqd list from io_ring_ctx_wait_and_kill() and do it in a clear context in io_ring_exit_work(). Fixes: f6d54255f423 ("io_uring: halt SQO submission on ctx exit") Reported-by: Signed-off-by: Pavel Begunkov <> Link: Signed-off-by: Jens Axboe <>
2021-03-22io_uring: fix provide_buffers sign extensionPavel Begunkov
io_provide_buffers_prep()'s "p->len * p->nbufs" to sign extension problems. Not a huge problem as it's only used for access_ok() and increases the checked length, but better to keep typing right. Reported-by: Colin Ian King <> Fixes: efe68c1ca8f49 ("io_uring: validate the full range of provided buffers for access") Signed-off-by: Pavel Begunkov <> Reviewed-by: Colin Ian King <> Link: Signed-off-by: Jens Axboe <>
2021-03-22io_uring: don't skip file_end_write() on reissuePavel Begunkov
Don't miss to call kiocb_end_write() from __io_complete_rw() on reissue. Shouldn't be much of a problem as the function actually does some work only for ISREG, and NONBLOCK won't be reissued. Signed-off-by: Pavel Begunkov <> Link: Signed-off-by: Jens Axboe <>
2021-03-22io_uring: correct io_queue_async_work() tracesPavel Begunkov
Request's io-wq work is hashed in io_prep_async_link(), so as trace_io_uring_queue_async_work() looks at it should follow after prep has been done. Signed-off-by: Pavel Begunkov <> Link: Signed-off-by: Jens Axboe <>
2021-03-21io_uring: don't use {test,clear}_tsk_thread_flag() for currentJens Axboe
Linus correctly points out that this is both unnecessary and generates much worse code on some archs as going from current to thread_info is actually backwards - and obviously just wasteful, since the thread_info is what we care about. Since io_uring only operates on current for these operations, just use test_thread_flag() instead. For io-wq, we can further simplify and use tracehook_notify_signal() to handle the TIF_NOTIFY_SIGNAL work and clear the flag. The latter isn't an actual bug right now, but it may very well be in the future if we place other work items under TIF_NOTIFY_SIGNAL. Reported-by: Linus Torvalds <> Link: Signed-off-by: Jens Axboe <>
2021-03-21io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with ↵io_uring-5.12-2021-03-21Stefan Metzmacher
MSG_WAITALL Without that it's not safe to use them in a linked combination with others. Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE should be possible. We already handle short reads and writes for the following opcodes: - IORING_OP_READV - IORING_OP_READ_FIXED - IORING_OP_READ - IORING_OP_WRITEV - IORING_OP_WRITE_FIXED - IORING_OP_WRITE - IORING_OP_SPLICE - IORING_OP_TEE Now we have it for these as well: - IORING_OP_SENDMSG - IORING_OP_SEND - IORING_OP_RECVMSG - IORING_OP_RECV For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC flags in order to call req_set_fail_links(). There might be applications arround depending on the behavior that even short send[msg]()/recv[msg]() retuns continue an IOSQE_IO_LINK chain. It's very unlikely that such applications pass in MSG_WAITALL, which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'. It's expected that the low level sock_sendmsg() call just ignores MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set SO_ZEROCOPY. We also expect the caller to know about the implicit truncation to MAX_RW_COUNT, which we don't detect. cc: Link: Signed-off-by: Stefan Metzmacher <> Signed-off-by: Jens Axboe <>
2021-03-21io-wq: ensure task is running before processing task_workJens Axboe
Mark the current task as running if we need to run task_work from the io-wq threads as part of work handling. If that is the case, then return as such so that the caller can appropriately loop back and reset if it was part of a going-to-sleep flush. Fixes: 3bfe6106693b ("io-wq: fork worker threads from original task") Signed-off-by: Jens Axboe <>
2021-03-21signal: don't allow STOP on PF_IO_WORKER threadsEric W. Biederman
Just like we don't allow normal signals to IO threads, don't deliver a STOP to a task that has PF_IO_WORKER set. The IO threads don't take signals in general, and have no means of flushing out a stop either. Longer term, we may want to look into allowing stop of these threads, as it relates to eg process freezing. For now, this prevents a spin issue if a SIGSTOP is delivered to the parent task. Reported-by: Stefan Metzmacher <> Signed-off-by: Jens Axboe <> Signed-off-by: "Eric W. Biederman" <>
2021-03-21signal: don't allow sending any signals to PF_IO_WORKER threadsJens Axboe
They don't take signals individually, and even if they share signals with the parent task, don't allow them to be delivered through the worker thread. Linux does allow this kind of behavior for regular threads, but it's really a compatability thing that we need not care about for the IO threads. Reported-by: Stefan Metzmacher <> Signed-off-by: Jens Axboe <>
2021-03-18io_uring: don't leak creds on SQO attach errorio_uring-5.12-2021-03-19Pavel Begunkov
Attaching to already dead/dying SQPOLL task is disallowed in io_sq_offload_create(), but cleanup is hand coded by calling io_put_sq_data()/etc., that miss to put ctx->sq_creds. Defer everything to error-path io_sq_thread_finish(), adding ctx->sqd_list in the error case as well as finish will handle it. Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-18io_uring: use typesafe pointers in io_uring_taskStefan Metzmacher
Signed-off-by: Stefan Metzmacher <> Link: Signed-off-by: Jens Axboe <>
2021-03-18io_uring: remove structures from include/linux/io_uring.hStefan Metzmacher
Link: Signed-off-by: Stefan Metzmacher <> Signed-off-by: Jens Axboe <>
2021-03-18io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() callsStefan Metzmacher
We never want to generate any SIGPIPE, -EPIPE only is much better. Signed-off-by: Stefan Metzmacher <> Link: Signed-off-by: Jens Axboe <>
2021-03-15io_uring: fix sqpoll cancellation via task_workPavel Begunkov
Running sqpoll cancellations via task_work_run() is a bad idea because it depends on other task works to be run, but those may be locked in currently running task_work_run() because of how it's (splicing the list in batches). Enqueue and run them through a separate callback head, namely struct io_sq_data::park_task_work. As a nice bonus we now precisely control where it's run, that's much safer than guessing where it can happen as it was before. Reported-by: Jens Axboe <> Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-15io_uring: add generic callback_head helpersPavel Begunkov
We already have helpers to run/add callback_head but taking ctx and working with ctx->exit_task_work. Extract generic versions of them implemented in terms of struct callback_head, it will be used later. Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-15io_uring: fix concurrent parkingPavel Begunkov
If io_sq_thread_park() of one task got rescheduled right after set_bit(), before it gets back to mutex_lock() there can happen park()/unpark() by another task with SQPOLL locking again and continuing running never seeing that first set_bit(SHOULD_PARK), so won't even try to put the mutex down for parking. It will get parked eventually when SQPOLL drops the lock for reschedule, but may be problematic and will get in the way of further fixes. Account number of tasks waiting for parking with a new atomic variable park_pending and adjust SHOULD_PARK accordingly. It doesn't entirely replaces SHOULD_PARK bit with this atomic var because it's convenient to have it as a bit in the state and will help to do optimisations later. Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-15io_uring: halt SQO submission on ctx exitPavel Begunkov
io_sq_thread_finish() is called in io_ring_ctx_free(), so SQPOLL task is potentially running submitting new requests. It's not a disaster because of using a "try" variant of percpu_ref_get, but is far from nice. Remove ctx from the sqd ctx list earlier, before cancellation loop, so SQPOLL can't find it and so won't submit new requests. Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-15io_uring: replace sqd rw_semaphore with mutexPavel Begunkov
The only user of read-locking of sqd->rw_lock is sq_thread itself, which is by definition alone, so we don't really need rw_semaphore, but mutex will do. Replace it with a mutex, and kill read-to-write upgrading and extra task_work handling in io_sq_thread(). Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-15io_uring: fix complete_post use ctx after freePavel Begunkov
If io_req_complete_post() put not a final ref, we can't rely on the request's ctx ref, and so ctx may potentially be freed while complete_post() is in io_cqring_ev_posted()/etc. In that case get an additional ctx reference, and put it in the end, so protecting following io_cqring_ev_posted(). And also prolong ctx lifetime until spin_unlock happens, as we do with mutexes, so added percpu_ref_get() doesn't race with ctx free. Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-15io_uring: fix ->flags races by linked timeoutsPavel Begunkov
It's racy to modify req->flags from a not owning context, e.g. linked timeout calling req_set_fail_links() for the master request might race with that request setting/clearing flags while being executed concurrently. Just remove req_set_fail_links(prev) from io_link_timeout_fn(), io_async_find_and_cancel() and functions down the line take care of setting the fail bit. Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-14io_uring: convert io_buffer_idr to XArrayJens Axboe
Like we did for the personality idr, convert the IO buffer idr to use XArray. This avoids a use-after-free on removal of entries, since idr doesn't like doing so from inside an iterator, and it nicely reduces the amount of code we need to support this feature. Fixes: 5a2e745d4d43 ("io_uring: buffer registration infrastructure") Cc: Cc: Matthew Wilcox <> Cc: yangerkun <> Reported-by: Hulk Robot <> Signed-off-by: Jens Axboe <>
2021-03-12io_uring: allow IO worker threads to be frozenJens Axboe
With the freezer using the proper signaling to notify us of when it's time to freeze a thread, we can re-enable normal freezer usage for the IO threads. Ensure that SQPOLL, io-wq, and the io-wq manager call try_to_freeze() appropriately, and remove the default setting of PF_NOFREEZE from create_io_thread(). Signed-off-by: Jens Axboe <>
2021-03-12kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezingJens Axboe
Don't send fake signals to PF_IO_WORKER threads, they don't accept signals. Just treat them like kthreads in this regard, all they need is a wakeup as no forced kernel/user transition is needed. Suggested-by: Linus Torvalds <> Signed-off-by: Jens Axboe <>
2021-03-12io_uring: fix OP_ASYNC_CANCEL across tasksio_uring-5.12-2021-03-12Pavel Begunkov
IORING_OP_ASYNC_CANCEL tries io-wq cancellation only for current task. If it fails go over tctx_list and try it out for every single tctx. Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-12io_uring: cancel sqpoll via task_workPavel Begunkov
1) The first problem is io_uring_cancel_sqpoll() -> io_uring_cancel_task_requests() basically doing park(); park(); and so hanging. 2) Another one is more subtle, when the master task is doing cancellations, but SQPOLL task submits in-between the end of the cancellation but before finish() requests taking a ref to the ctx, and so eternally locking it up. 3) Yet another is a dying SQPOLL task doing io_uring_cancel_sqpoll() and same io_uring_cancel_sqpoll() from the owner task, they race for tctx->wait events. And there probably more of them. Instead do SQPOLL cancellations from within SQPOLL task context via task_work, see io_sqpoll_cancel_sync(). With that we don't need temporal park()/unpark() during cancellation, which is ugly, subtle and anyway doesn't allow to do io_run_task_work() properly. io_uring_cancel_sqpoll() is called only from SQPOLL task context and under sqd locking, so all parking is removed from there. And so, io_sq_thread_[un]park() and io_sq_thread_stop() are not used now by SQPOLL task, and that spare us from some headache. Also remove ctx->sqd_list early to avoid 2). And kill tctx->sqpoll, which is not used anymore. Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-12io_uring: prevent racy sqd->thread checksPavel Begunkov
SQPOLL thread to which we're trying to attach may be going away, it's not nice but a more serious problem is if io_sq_offload_create() sees sqd->thread==NULL, and tries to init it with a new thread. There are tons of ways it can be exploited or fail. Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-12io_uring: remove useless ->startup completionPavel Begunkov
We always do complete(&sqd->startup) almost right after sqd->thread creation, either in the success path or in io_sq_thread_finish(). It's specifically created not started for us to be able to set some stuff like sqd->thread and io_uring_alloc_task_context() before following right after wake_up_new_task(). Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-12io_uring: cancel deferred requests in try_cancelPavel Begunkov
As io_uring_cancel_files() and others let SQO to run between io_uring_try_cancel_requests(), SQO may generate new deferred requests, so it's safer to try to cancel them in it. Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-11io_uring: perform IOPOLL reaping if canceler is thread itselfJens Axboe
We bypass IOPOLL completion polling (and reaping) for the SQPOLL thread, but if it's the thread itself invoking cancelations, then we still need to perform it or no one will. Fixes: 9936c7c2bc76 ("io_uring: deduplicate core cancellations sequence") Signed-off-by: Jens Axboe <>
2021-03-11io_uring: force creation of separate context for ATTACH_WQ and non-threadsJens Axboe
Earlier kernels had SQPOLL threads that could share across anything, as we grabbed the context we needed on a per-ring basis. This is no longer the case, so only allow attaching directly if we're in the same thread group. That is the common use case. For non-group tasks, just setup a new context and thread as we would've done if sharing wasn't set. This isn't 100% ideal in terms of CPU utilization for the forked and share case, but hopefully that isn't much of a concern. If it is, there are plans in motion for how to improve that. Most importantly, we want to avoid app side regressions where sharing worked before and now doesn't. With this patch, functionality is equivalent to previous kernels that supported IORING_SETUP_ATTACH_WQ with SQPOLL. Reported-by: Stefan Metzmacher <> Signed-off-by: Jens Axboe <>
2021-03-10io_uring: remove indirect ctx into sqo injectionPavel Begunkov
We use ->ctx_new_list to notify sqo about new ctx pending, then sqo should stop and splice it to its sqd->ctx_list, paired with ->sq_thread_comp. The last one is broken because nobody reinitialises it, and trying to fix it would only add more complexity and bugs. And the first isn't really needed as is done under park(), that protects from races well. Add ctx into sqd->ctx_list directly (under park()), it's much simpler and allows to kill both, ctx_new_list and sq_thread_comp. note: apparently there is no real problem at the moment, because sq_thread_comp is used only by io_sq_thread_finish() followed by parking, where list_del(&ctx->sqd_list) removes it well regardless whether it's in the new or the active list. Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-10io_uring: fix invalid ctx->sq_thread_idlePavel Begunkov
We have to set ctx->sq_thread_idle before adding a ring to an SQ task, otherwise sqd races for seeing zero and accounting it as such. Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-10kernel: make IO threads unfreezable by defaultJens Axboe
The io-wq threads were already marked as no-freeze, but the manager was not. On resume, we perpetually have signal_pending() being true, and hence the manager will loop and spin 100% of the time. Just mark the tasks created by create_io_thread() as PF_NOFREEZE by default, and remove any knowledge of it in io-wq and io_uring. Reported-by: Kevin Locke <> Tested-by: Kevin Locke <> Signed-off-by: Jens Axboe <>
2021-03-10io_uring: always wait for sqd exited when stopping SQPOLL threadJens Axboe
We have a tiny race where io_put_sq_data() calls io_sq_thead_stop() and finds the thread gone, but the thread has indeed not fully exited or called complete() yet. Close it up by always having io_sq_thread_stop() wait on completion of the exit event. Signed-off-by: Jens Axboe <>
2021-03-10io_uring: remove unneeded variable 'ret'Yang Li
Fix the following coccicheck warning: ./fs/io_uring.c:8984:5-8: Unneeded variable: "ret". Return "0" on line 8998 Reported-by: Abaci Robot <> Signed-off-by: Yang Li <> Link: Signed-off-by: Jens Axboe <>
2021-03-10io_uring: move all io_kiocb init early in io_init_req()Jens Axboe
If we hit an error path in the function, make sure that the io_kiocb is fully initialized at that point so that freeing the request always sees a valid state. Signed-off-by: Jens Axboe <>
2021-03-10io-wq: fix ref leak for req in case of exit cancelationsyangerkun
do_work such as io_wq_submit_work that cancel the work may leave a ref of req as 1 if we have links. Fix it by call io_run_cancel. Fixes: 4fb6ac326204 ("io-wq: improve manager/worker handling over exec") Signed-off-by: yangerkun <> Link: Signed-off-by: Jens Axboe <>
2021-03-10io_uring: fix complete_post races for linked reqPavel Begunkov
Calling io_queue_next() after spin_unlock in io_req_complete_post() races with the other side extracting and reusing this request. Hand coded parts of io_req_find_next() considering that io_disarm_next() and io_req_task_queue() have (and safe) to be called with completion_lock held. It already does io_commit_cqring() and io_cqring_ev_posted(), so just reuse it for post io_disarm_next(). Signed-off-by: Pavel Begunkov <> Link: Signed-off-by: Jens Axboe <>
2021-03-10io_uring: add io_disarm_next() helperPavel Begunkov
A preparation patch placing all preparations before extracting a next request into a separate helper io_disarm_next(). Also, don't spuriously do ev_posted in a rare case where REQ_F_FAIL_LINK is set but there are no requests linked (i.e. after cancelling a linked timeout or setting IOSQE_IO_LINK on a last request of a submission batch). Signed-off-by: Pavel Begunkov <> Link: Signed-off-by: Jens Axboe <>
2021-03-10io_uring: fix io_sq_offload_create error handlingPavel Begunkov
Don't set IO_SQ_THREAD_SHOULD_STOP when io_sq_offload_create() has failed on io_uring_alloc_task_context() but leave everything to io_sq_thread_finish(), because currently io_sq_thread_finish() hangs on trying to park it. That's great it stalls there, because otherwise the following io_sq_thread_stop() would be skipped on IO_SQ_THREAD_SHOULD_STOP check and the sqo would race for sqd with freeing ctx. A simple error injection gives something like this. [ 245.463955] INFO: task sqpoll-test-hang:523 blocked for more than 122 seconds. [ 245.463983] Call Trace: [ 245.463990] __schedule+0x36b/0x950 [ 245.464005] schedule+0x68/0xe0 [ 245.464013] schedule_timeout+0x209/0x2a0 [ 245.464032] wait_for_completion+0x8b/0xf0 [ 245.464043] io_sq_thread_finish+0x44/0x1a0 [ 245.464049] io_uring_setup+0x9ea/0xc80 [ 245.464058] __x64_sys_io_uring_setup+0x16/0x20 [ 245.464064] do_syscall_64+0x38/0x50 [ 245.464073] entry_SYSCALL_64_after_hwframe+0x44/0xae Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-10io-wq: remove unused 'user' member of io_wqJens Axboe
Previous patches killed the last user of this, now it's just a dead member in the struct. Get rid of it. Signed-off-by: Jens Axboe <>
2021-03-10io_uring: Convert personality_idr to XArrayMatthew Wilcox (Oracle)
You can't call idr_remove() from within a idr_for_each() callback, but you can call xa_erase() from an xa_for_each() loop, so switch the entire personality_idr from the IDR to the XArray. This manifests as a use-after-free as idr_for_each() attempts to walk the rest of the node after removing the last entry from it. Fixes: 071698e13ac6 ("io_uring: allow registering credentials") Cc: # 5.6+ Reported-by: yangerkun <> Signed-off-by: Matthew Wilcox (Oracle) <> [Pavel: rebased (creds load was moved into io_init_req())] Signed-off-by: Pavel Begunkov <> Link: Signed-off-by: Jens Axboe <>
2021-03-10io_uring: clean R_DISABLED startup messPavel Begunkov
There are enough of problems with IORING_SETUP_R_DISABLED, including the burden of checking and kicking off the SQO task all over the codebase -- for exit/cancel/etc. Rework it, always start the thread but don't do submit unless the flag is gone, that's much easier. Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-10io_uring: fix unrelated ctx reqs cancellationPavel Begunkov
io-wq now is per-task, so cancellations now should match against request's ctx. Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-10io_uring: SQPOLL parking fixesJens Axboe
We keep running into weird dependency issues between the sqd lock and the parking state. Disentangle the SQPOLL thread from the last bits of the kthread parking inheritance, and just replace the parking state, and two associated locks, with a single rw mutex. The SQPOLL thread keeps the mutex for read all the time, except if someone has marked us needing to park. Then we drop/re-acquire and try again. This greatly simplifies the parking state machine (by just getting rid of it), and makes it a lot more obvious how it works - if you need to modify the ctx list, then you simply park the thread which will grab the lock for writing. Fold in fix from Hillf Danton on not setting STOP on a fatal signal. Fixes: e54945ae947f ("io_uring: SQPOLL stop error handling fixes") Signed-off-by: Jens Axboe <>
2021-03-07io_uring: kill io_sq_thread_fork() and return -EOWNERDEAD if the sq_thread ↵Stefan Metzmacher
is gone This brings the behavior back in line with what 5.11 and earlier did, and this is no longer needed with the improved handling of creds not needing to do unshare(). Signed-off-by: Stefan Metzmacher <> Signed-off-by: Jens Axboe <>
2021-03-07io_uring: run __io_sq_thread() with the initial creds from io_uring_setup()Stefan Metzmacher
With IORING_SETUP_ATTACH_WQ we should let __io_sq_thread() use the initial creds from each ctx. Signed-off-by: Stefan Metzmacher <> Signed-off-by: Jens Axboe <>
2021-03-07io-wq: warn on creating manager while exitingPavel Begunkov
Add a simple warning making sure that nobody tries to create a new manager while we're under IO_WQ_BIT_EXIT. That can potentially happen due to racy work submission after final put. Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2021-03-07io_uring: cancel reqs of all iowq's on ring exitPavel Begunkov
io_ring_exit_work() have to cancel all requests, including those staying in io-wq, however it tries only cancellation of current tctx, which is NULL. If we've got task==NULL, use the ctx-to-tctx map to go over all tctx/io-wq and try cancellations on them. Signed-off-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>