AgeCommit message (Collapse)Author
2020-10-12io_uring: Convert advanced XArray uses to the normal API5.8-stableMatthew Wilcox (Oracle)
commit 5e2ed8c4f45093698855b1f45cdf43efbf6dd498 upstream. There are no bugs here that I've spotted, it's just easier to use the normal API and there are no performance advantages to using the more verbose advanced API. Signed-off-by: Matthew Wilcox (Oracle) <> Signed-off-by: Jens Axboe <>
2020-10-12io_uring: Fix XArray usage in io_uring_add_task_fileMatthew Wilcox (Oracle)
commit 236434c3438c4da3dfbd6aeeab807577b85e951a upstream. The xas_store() wasn't paired with an xas_nomem() loop, so if it couldn't allocate memory using GFP_NOWAIT, it would leak the reference to the file descriptor. Also the node pointed to by the xas could be freed between the call to xas_load() under the rcu_read_lock() and the acquisition of the xa_lock. It's easier to just use the normal xa_load/xa_store interface here. Signed-off-by: Matthew Wilcox (Oracle) <> [axboe: fix missing assign after alloc, cur_uring -> tctx rename] Signed-off-by: Jens Axboe <>
2020-10-12io_uring: Fix use of XArray in __io_uring_files_cancelMatthew Wilcox (Oracle)
commit ce765372bc443573d1d339a2bf4995de385dea3a upstream. We have to drop the lock during each iteration, so there's no advantage to using the advanced API. Convert this to a standard xa_for_each() loop. Reported-by: Signed-off-by: Matthew Wilcox (Oracle) <> Signed-off-by: Jens Axboe <>
2020-10-12io_uring: no need to call xa_destroy() on empty xarrayJens Axboe
commit ca6484cd308a671811bf39f3119e81966eb476e3 upstream. The kernel test robot reports this lockdep issue: [child1:659] mbind (274) returned ENOSYS, marking as inactive. [child1:659] mq_timedsend (279) returned ENOSYS, marking as inactive. [main] 10175 iterations. [F:7781 S:2344 HI:2397] [ 24.610601] [ 24.610743] ================================ [ 24.611083] WARNING: inconsistent lock state [ 24.611437] 5.9.0-rc7-00017-g0f2122045b9462 #5 Not tainted [ 24.611861] -------------------------------- [ 24.612193] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 24.612660] ksoftirqd/0/7 [HC0[0]:SC1[3]:HE0:SE0] takes: [ 24.613086] f00ed998 (&xa->xa_lock#4){+.?.}-{2:2}, at: xa_destroy+0x43/0xc1 [ 24.613642] {SOFTIRQ-ON-W} state was registered at: [ 24.614024] lock_acquire+0x20c/0x29b [ 24.614341] _raw_spin_lock+0x21/0x30 [ 24.614636] io_uring_add_task_file+0xe8/0x13a [ 24.614987] io_uring_create+0x535/0x6bd [ 24.615297] io_uring_setup+0x11d/0x136 [ 24.615606] __ia32_sys_io_uring_setup+0xd/0xf [ 24.615977] do_int80_syscall_32+0x53/0x6c [ 24.616306] restore_all_switch_stack+0x0/0xb1 [ 24.616677] irq event stamp: 939881 [ 24.616968] hardirqs last enabled at (939880): [<8105592d>] __local_bh_enable_ip+0x13c/0x145 [ 24.617642] hardirqs last disabled at (939881): [<81b6ace3>] _raw_spin_lock_irqsave+0x1b/0x4e [ 24.618321] softirqs last enabled at (939738): [<81b6c7c8>] __do_softirq+0x3f0/0x45a [ 24.618924] softirqs last disabled at (939743): [<81055741>] run_ksoftirqd+0x35/0x61 [ 24.619521] [ 24.619521] other info that might help us debug this: [ 24.620028] Possible unsafe locking scenario: [ 24.620028] [ 24.620492] CPU0 [ 24.620685] ---- [ 24.620894] lock(&xa->xa_lock#4); [ 24.621168] <Interrupt> [ 24.621381] lock(&xa->xa_lock#4); [ 24.621695] [ 24.621695] *** DEADLOCK *** [ 24.621695] [ 24.622154] 1 lock held by ksoftirqd/0/7: [ 24.622468] #0: 823bfb94 (rcu_callback){....}-{0:0}, at: rcu_process_callbacks+0xc0/0x155 [ 24.623106] [ 24.623106] stack backtrace: [ 24.623454] CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 5.9.0-rc7-00017-g0f2122045b9462 #5 [ 24.624090] Call Trace: [ 24.624284] ? show_stack+0x40/0x46 [ 24.624551] dump_stack+0x1b/0x1d [ 24.624809] print_usage_bug+0x17a/0x185 [ 24.625142] mark_lock+0x11d/0x1db [ 24.625474] ? print_shortest_lock_dependencies+0x121/0x121 [ 24.625905] __lock_acquire+0x41e/0x7bf [ 24.626206] lock_acquire+0x20c/0x29b [ 24.626517] ? xa_destroy+0x43/0xc1 [ 24.626810] ? lock_acquire+0x20c/0x29b [ 24.627110] _raw_spin_lock_irqsave+0x3e/0x4e [ 24.627450] ? xa_destroy+0x43/0xc1 [ 24.627725] xa_destroy+0x43/0xc1 [ 24.627989] __io_uring_free+0x57/0x71 [ 24.628286] ? get_pid+0x22/0x22 [ 24.628544] __put_task_struct+0xf2/0x163 [ 24.628865] put_task_struct+0x1f/0x2a [ 24.629161] delayed_put_task_struct+0xe2/0xe9 [ 24.629509] rcu_process_callbacks+0x128/0x155 [ 24.629860] __do_softirq+0x1a3/0x45a [ 24.630151] run_ksoftirqd+0x35/0x61 [ 24.630443] smpboot_thread_fn+0x304/0x31a [ 24.630763] kthread+0x124/0x139 [ 24.631016] ? sort_range+0x18/0x18 [ 24.631290] ? kthread_create_worker_on_cpu+0x17/0x17 [ 24.631682] ret_from_fork+0x1c/0x28 which is complaining about xa_destroy() grabbing the xa lock in an IRQ disabling fashion, whereas the io_uring uses cases aren't interrupt safe. This is really an xarray issue, since it should not assume the lock type. But for our use case, since we know the xarray is empty at this point, there's no need to actually call xa_destroy(). So just get rid of it. Fixes: 0f2122045b94 ("io_uring: don't rely on weak ->files references") Reported-by: kernel test robot <> Signed-off-by: Jens Axboe <>
2020-10-12io-wq: fix use-after-free in io_wq_worker_runningHillf Danton
commit c4068bf898ddaef791049a366828d9b84b467bda upstream. The smart syzbot has found a reproducer for the following issue: ================================================================== BUG: KASAN: use-after-free in instrument_atomic_write include/linux/instrumented.h:71 [inline] BUG: KASAN: use-after-free in atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline] BUG: KASAN: use-after-free in io_wqe_inc_running fs/io-wq.c:301 [inline] BUG: KASAN: use-after-free in io_wq_worker_running+0xde/0x110 fs/io-wq.c:613 Write of size 4 at addr ffff8882183db08c by task io_wqe_worker-0/7771 CPU: 0 PID: 7771 Comm: io_wqe_worker-0 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 check_memory_region_inline mm/kasan/generic.c:186 [inline] check_memory_region+0x13d/0x180 mm/kasan/generic.c:192 instrument_atomic_write include/linux/instrumented.h:71 [inline] atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline] io_wqe_inc_running fs/io-wq.c:301 [inline] io_wq_worker_running+0xde/0x110 fs/io-wq.c:613 schedule_timeout+0x148/0x250 kernel/time/timer.c:1879 io_wqe_worker+0x517/0x10e0 fs/io-wq.c:580 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Allocated by task 7768: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461 kmem_cache_alloc_node_trace+0x17b/0x3f0 mm/slab.c:3594 kmalloc_node include/linux/slab.h:572 [inline] kzalloc_node include/linux/slab.h:677 [inline] io_wq_create+0x57b/0xa10 fs/io-wq.c:1064 io_init_wq_offload fs/io_uring.c:7432 [inline] io_sq_offload_start fs/io_uring.c:7504 [inline] io_uring_create fs/io_uring.c:8625 [inline] io_uring_setup+0x1836/0x28e0 fs/io_uring.c:8694 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 21: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355 __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422 __cache_free mm/slab.c:3418 [inline] kfree+0x10e/0x2b0 mm/slab.c:3756 __io_wq_destroy fs/io-wq.c:1138 [inline] io_wq_destroy+0x2af/0x460 fs/io-wq.c:1146 io_finish_async fs/io_uring.c:6836 [inline] io_ring_ctx_free fs/io_uring.c:7870 [inline] io_ring_exit_work+0x1e4/0x6d0 fs/io_uring.c:7954 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 The buggy address belongs to the object at ffff8882183db000 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 140 bytes inside of 1024-byte region [ffff8882183db000, ffff8882183db400) The buggy address belongs to the page: page:000000009bada22b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2183db flags: 0x57ffe0000000200(slab) raw: 057ffe0000000200 ffffea0008604c48 ffffea00086a8648 ffff8880aa040700 raw: 0000000000000000 ffff8882183db000 0000000100000002 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8882183daf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8882183db000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8882183db080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8882183db100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8882183db180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== which is down to the comment below, /* all workers gone, wq exit can proceed */ if (!nr_workers && refcount_dec_and_test(&wqe->wq->refs)) complete(&wqe->wq->done); because there might be multiple cases of wqe in a wq and we would wait for every worker in every wqe to go home before releasing wq's resources on destroying. To that end, rework wq's refcount by making it independent of the tracking of workers because after all they are two different things, and keeping it balanced when workers come and go. Note the manager kthread, like other workers, now holds a grab to wq during its lifetime. Finally to help destroy wq, check IO_WQ_BIT_EXIT upon creating worker and do nothing for exiting wq. Cc: # v5.5+ Reported-by: Reported-by: Cc: Pavel Begunkov <> Signed-off-by: Hillf Danton <> Signed-off-by: Jens Axboe <>
2020-10-12io_wq: Make io_wqe::lock a raw_spinlock_tSebastian Andrzej Siewior
commit 95da84659226d75698a1ab958be0af21d9cc2a9c upstream. During a context switch the scheduler invokes wq_worker_sleeping() with disabled preemption. Disabling preemption is needed because it protects access to `worker->sleeping'. As an optimisation it avoids invoking schedule() within the schedule path as part of possible wake up (thus preempt_enable_no_resched() afterwards). The io-wq has been added to the mix in the same section with disabled preemption. This breaks on PREEMPT_RT because io_wq_worker_sleeping() acquires a spinlock_t. Also within the schedule() the spinlock_t must be acquired after tsk_is_pi_blocked() otherwise it will block on the sleeping lock again while scheduling out. While playing with `io_uring-bench' I didn't notice a significant latency spike after converting io_wqe::lock to a raw_spinlock_t. The latency was more or less the same. In order to keep the spinlock_t it would have to be moved after the tsk_is_pi_blocked() check which would introduce a branch instruction into the hot path. The lock is used to maintain the `work_list' and wakes one task up at most. Should io_wqe_cancel_pending_work() cause latency spikes, while searching for a specific item, then it would need to drop the lock during iterations. revert_creds() is also invoked under the lock. According to debug cred::non_rcu is 0. Otherwise it should be moved outside of the locked section because put_cred_rcu()->free_uid() acquires a sleeping lock. Convert io_wqe::lock to a raw_spinlock_t.c Signed-off-by: Sebastian Andrzej Siewior <> Signed-off-by: Jens Axboe <>
2020-10-12io_uring: reference ->nsproxy for file table commandsJens Axboe
commit 9b8284921513fc1ea57d87777283a59b05862f03 upstream. If we don't get and assign the namespace for the async work, then certain paths just don't work properly (like /dev/stdin, /proc/mounts, etc). Anything that references the current namespace of the given task should be assigned for async work on behalf of that task. Cc: # v5.5+ Reported-by: Al Viro <> Signed-off-by: Jens Axboe <>
2020-10-12io_uring: don't rely on weak ->files referencesJens Axboe
commit 0f2122045b946241a9e549c2a76cea54fa58a7ff upstream. Grab actual references to the files_struct. To avoid circular references issues due to this, we add a per-task note that keeps track of what io_uring contexts a task has used. When the tasks execs or exits its assigned files, we cancel requests based on this tracking. With that, we can grab proper references to the files table, and no longer need to rely on stashing away ring_fd and ring_file to check if the ring_fd may have been closed. Cc: # v5.5+ Reviewed-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2020-10-12io_uring: enable task/files specific overflow flushingJens Axboe
commit e6c8aa9ac33bd7c968af7816240fc081401fddcd upstream. This allows us to selectively flush out pending overflows, depending on the task and/or files_struct being passed in. No intended functional changes in this patch. Reviewed-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2020-10-12io_uring: return cancelation status from poll/timeout/files handlersJens Axboe
commit 76e1b6427fd8246376a97e3227049d49188dfb9c upstream. Return whether we found and canceled requests or not. This is in preparation for using this information, no functional changes in this patch. Reviewed-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2020-10-12io_uring: unconditionally grab req->taskJens Axboe
commit e3bc8e9dad7f2f83cc807111d4472164c9210153 upstream. Sometimes we assign a weak reference to it, sometimes we grab a reference to it. Clean this up and make it unconditional, and drop the flag related to tracking this state. Reviewed-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2020-10-12io_uring: stash ctx task reference for SQPOLLJens Axboe
commit 2aede0e417db846793c276c7a1bbf7262c8349b0 upstream. We can grab a reference to the task instead of stashing away the task files_struct. This is doable without creating a circular reference between the ring fd and the task itself. Reviewed-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2020-10-12io_uring: move dropping of files into separate helperJens Axboe
commit f573d384456b3025d3f8e58b3eafaeeb0f510784 upstream. No functional changes in this patch, prep patch for grabbing references to the files_struct. Reviewed-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2020-10-12io_uring: allow timeout/poll/files killing to take task into accountJens Axboe
commit f3606e3a92ddd36299642c78592fc87609abb1f6 upstream. We currently cancel these when the ring exits, and we cancel all of them. This is in preparation for killing only the ones associated with a given task. Reviewed-by: Pavel Begunkov <> Signed-off-by: Jens Axboe <>
2020-10-12io_uring: don't run task work on an exiting taskJens Axboe
commit 6200b0ae4ea28a4bfd8eb434e33e6201b7a6a282 upstream. This isn't safe, and isn't needed either. We are guaranteed that any work we queue is on a live task (and will be run), or it goes to our backup io-wq threads if the task is exiting. Signed-off-by: Jens Axboe <>
2020-10-07Linux 5.8.14Greg Kroah-Hartman
Tested-by: Jon Hunter <> Tested-by: Shuah Khan <> Tested-by: Linux Kernel Functional Testing <> Tested-by: Jeffrin Jose T <> Tested-by: Guenter Roeck <> Link: Signed-off-by: Greg Kroah-Hartman <>
2020-10-07ep_create_wakeup_source(): dentry name can change under you...Al Viro
commit 3701cb59d892b88d569427586f01491552f377b1 upstream. or get freed, for that matter, if it's a long (separately stored) name. Signed-off-by: Al Viro <> Signed-off-by: Greg Kroah-Hartman <>
2020-10-07epoll: EPOLL_CTL_ADD: close the race in decision to take fast pathAl Viro
commit fe0a916c1eae8e17e86c3753d13919177d63ed7e upstream. Checking for the lack of epitems refering to the epoll we want to insert into is not enough; we might have an insertion of that epoll into another one that has already collected the set of files to recheck for excessive reverse paths, but hasn't gotten to creating/inserting the epitem for it. However, any such insertion in progress can be detected - it will update the generation count in our epoll when it's done looking through it for files to check. That gets done under ->mtx of our epoll and that allows us to detect that safely. We are *not* holding epmutex here, so the generation count is not stable. However, since both the update of ep->gen by loop check and (later) insertion into ->f_ep_link are done with ep->mtx held, we are fine - the sequence is grab epmutex bump loop_check_gen ... grab tep->mtx // 1 tep->gen = loop_check_gen ... drop tep->mtx // 2 ... grab tep->mtx // 3 ... insert into ->f_ep_link ... drop tep->mtx // 4 bump loop_check_gen drop epmutex and if the fastpath check in another thread happens for that eventpoll, it can come * before (1) - in that case fastpath is just fine * after (4) - we'll see non-empty ->f_ep_link, slow path taken * between (2) and (3) - loop_check_gen is stable, with ->mtx providing barriers and we end up taking slow path. Note that ->f_ep_link emptiness check is slightly racy - we are protected against insertions into that list, but removals can happen right under us. Not a problem - in the worst case we'll end up taking a slow path for no good reason. Signed-off-by: Al Viro <> Signed-off-by: Greg Kroah-Hartman <>
2020-10-07epoll: replace ->visited/visited_list with generation countAl Viro
commit 18306c404abe18a0972587a6266830583c60c928 upstream. removes the need to clear it, along with the races. Signed-off-by: Al Viro <> Signed-off-by: Greg Kroah-Hartman <>
2020-10-07epoll: do not insert into poll queues until all sanity checks are doneAl Viro
commit f8d4f44df056c5b504b0d49683fb7279218fd207 upstream. Signed-off-by: Al Viro <> Signed-off-by: Greg Kroah-Hartman <>
2020-10-07scsi: sd: sd_zbc: Fix ZBC disk initializationDamien Le Moal
commit 6c5dee18756b4721ac8518c69b22ee8ac0c9c442 upstream. Make sure to call sd_zbc_init_disk() when the sdkp->zoned field is known, that is, once sd_read_block_characteristics() is executed in sd_revalidate_disk(), so that host-aware disks also get initialized. To do so, move sd_zbc_init_disk() call in sd_zbc_revalidate_zones() and make sure to execute it for all zoned disks, including for host-aware disks used as regular disks as these disk zoned model may be changed back to BLK_ZONED_HA when partitions are deleted. Link: Fixes: 5795eb443060 ("scsi: sd_zbc: emulate ZONE_APPEND commands") Cc: <> # v5.8+ Reported-by: Borislav Petkov <> Tested-by: Borislav Petkov <> Reviewed-by: Johannes Thumshirn <> Reviewed-by: Christoph Hellwig <> Signed-off-by: Damien Le Moal <> Signed-off-by: Martin K. Petersen <> Signed-off-by: Greg Kroah-Hartman <>
2020-10-07scsi: sd: sd_zbc: Fix handling of host-aware ZBC disksDamien Le Moal
commit 27ba3e8ff3ab86449e63d38a8d623053591e65fa upstream. When CONFIG_BLK_DEV_ZONED is disabled, allow using host-aware ZBC disks as regular disks. In this case, ensure that command completion is correctly executed by changing sd_zbc_complete() to return good_bytes instead of 0 and causing a hang during device probe (endless retries). When CONFIG_BLK_DEV_ZONED is enabled and a host-aware disk is detected to have partitions, it will be used as a regular disk. In this case, make sure to not do anything in sd_zbc_revalidate_zones() as that triggers warnings. Since all these different cases result in subtle settings of the disk queue zoned model, introduce the block layer helper function blk_queue_set_zoned() to generically implement setting up the effective zoned model according to the disk type, the presence of partitions on the disk and CONFIG_BLK_DEV_ZONED configuration. Link: Fixes: b72053072c0b ("block: allow partitions on host aware zone devices") Cc: <> Reported-by: Borislav Petkov <> Suggested-by: Christoph Hellwig <> Reviewed-by: Christoph Hellwig <> Reviewed-by: Johannes Thumshirn <> Signed-off-by: Damien Le Moal <> Signed-off-by: Martin K. Petersen <> Signed-off-by: Greg Kroah-Hartman <>
2020-10-07drm/i915/gvt: Fix port number for BDW on EDID region setupZhenyu Wang
commit 28284943ac94014767ecc2f7b3c5747c4a5617a0 upstream. Current BDW virtual display port is initialized as PORT_B, so need to use same port for VFIO EDID region, otherwise invalid EDID blob pointer is assigned which caused kernel null pointer reference. We might evaluate actual display hotplug for BDW to make this function work as expected, anyway this is always required to be fixed first. Reported-by: Alejandro Sior <> Cc: Alejandro Sior <> Fixes: 0178f4ce3c3b ("drm/i915/gvt: Enable vfio edid for all GVT supported platform") Reviewed-by: Hang Yuan <> Signed-off-by: Zhenyu Wang <> Link: Signed-off-by: Greg Kroah-Hartman <>
2020-10-07gpiolib: Fix line event handling in syscall compatible modeAndy Shevchenko
[ Upstream commit 5ad284ab3a01e2d6a89be2a8663ae76f4e617549 ] The introduced line event handling ABI in the commit 61f922db7221 ("gpio: userspace ABI for reading GPIO line events") missed the fact that 64-bit kernel may serve for 32-bit applications. In such case the very first check in the lineevent_read() will fail due to alignment differences. To workaround this introduce lineevent_get_size() helper which returns actual size of the structure in user space. Fixes: 61f922db7221 ("gpio: userspace ABI for reading GPIO line events") Suggested-by: Arnd Bergmann <> Signed-off-by: Andy Shevchenko <> Acked-by: Arnd Bergmann <> Tested-by: Kent Gibson <> Signed-off-by: Bartosz Golaszewski <> Signed-off-by: Sasha Levin <>
2020-10-07random32: Restore __latent_entropy attribute on net_rand_stateThibaut Sautereau
[ Upstream commit 09a6b0bc3be793ca8cba580b7992d73e9f68f15d ] Commit f227e3ec3b5c ("random32: update the net random state on interrupt and activity") broke compilation and was temporarily fixed by Linus in 83bdc7275e62 ("random32: remove net_rand_state from the latent entropy gcc plugin") by entirely moving net_rand_state out of the things handled by the latent_entropy GCC plugin. From what I understand when reading the plugin code, using the __latent_entropy attribute on a declaration was the wrong part and simply keeping the __latent_entropy attribute on the variable definition was the correct fix. Fixes: 83bdc7275e62 ("random32: remove net_rand_state from the latent entropy gcc plugin") Acked-by: Willy Tarreau <> Cc: Emese Revfy <> Signed-off-by: Thibaut Sautereau <> Signed-off-by: Linus Torvalds <> Signed-off-by: Sasha Levin <>
2020-10-07pipe: remove pipe_wait() and fix wakeup race with spliceLinus Torvalds
[ Upstream commit 472e5b056f000a778abb41f1e443de58eb259783 ] The pipe splice code still used the old model of waiting for pipe IO by using a non-specific "pipe_wait()" that waited for any pipe event to happen, which depended on all pipe IO being entirely serialized by the pipe lock. So by checking the state you were waiting for, and then adding yourself to the wait queue before dropping the lock, you were guaranteed to see all the wakeups. Strictly speaking, the actual wakeups were not done under the lock, but the pipe_wait() model still worked, because since the waiter held the lock when checking whether it should sleep, it would always see the current state, and the wakeup was always done after updating the state. However, commit 0ddad21d3e99 ("pipe: use exclusive waits when reading or writing") split the single wait-queue into two, and in the process also made the "wait for event" code wait for _two_ wait queues, and that then showed a race with the wakers that were not serialized by the pipe lock. It's only splice that used that "pipe_wait()" model, so the problem wasn't obvious, but Josef Bacik reports: "I hit a hang with fstest btrfs/187, which does a btrfs send into /dev/null. This works by creating a pipe, the write side is given to the kernel to write into, and the read side is handed to a thread that splices into a file, in this case /dev/null. The box that was hung had the write side stuck here [pipe_write] and the read side stuck here [splice_from_pipe_next -> pipe_wait]. [ more details about pipe_wait() scenario ] The problem is we're doing the prepare_to_wait, which sets our state each time, however we can be woken up either with reads or writes. In the case above we race with the WRITER waking us up, and re-set our state to INTERRUPTIBLE, and thus never break out of schedule" Josef had a patch that avoided the issue in pipe_wait() by just making it set the state only once, but the deeper problem is that pipe_wait() depends on a level of synchonization by the pipe mutex that it really shouldn't. And the whole "wait for any pipe state change" model really isn't very good to begin with. So rather than trying to work around things in pipe_wait(), remove that legacy model of "wait for arbitrary pipe event" entirely, and actually create functions that wait for the pipe actually being readable or writable, and can do so without depending on the pipe lock serializing everything. Fixes: 0ddad21d3e99 ("pipe: use exclusive waits when reading or writing") Link: Reported-by: Josef Bacik <> Reviewed-and-tested-by: Josef Bacik <> Signed-off-by: Linus Torvalds <> Signed-off-by: Sasha Levin <>
2020-10-07iommu/amd: Fix the overwritten field in IVMD headerAdrian Huang
[ Upstream commit 0bbe4ced53e36786eafc2ecbf9a1761f55b4a82e ] Commit 387caf0b759a ("iommu/amd: Treat per-device exclusion ranges as r/w unity-mapped regions") accidentally overwrites the 'flags' field in IVMD (struct ivmd_header) when the I/O virtualization memory definition is associated with the exclusion range entry. This leads to the corrupted IVMD table (incorrect checksum). The kdump kernel reports the invalid checksum: ACPI BIOS Warning (bug): Incorrect checksum in table [IVRS] - 0x5C, should be 0x60 (20200717/tbprint-177) AMD-Vi: [Firmware Bug]: IVRS invalid checksum Fix the above-mentioned issue by modifying the 'struct unity_map_entry' member instead of the IVMD header. Cleanup: The *exclusion_range* functions are not used anymore, so get rid of them. Fixes: 387caf0b759a ("iommu/amd: Treat per-device exclusion ranges as r/w unity-mapped regions") Reported-and-tested-by: Baoquan He <> Signed-off-by: Adrian Huang <> Cc: Jerry Snitselaar <> Link: Signed-off-by: Joerg Roedel <> Signed-off-by: Sasha Levin <>
2020-10-07gpio: pca953x: Correctly initialize registers 6 and 7 for PCA957xAndy Shevchenko
[ Upstream commit 8c1f1c34777bddb633d4a068a9c812d29974c6bd ] When driver has been converted to the bitmap API the non-bitmap functions started behaving differently on 32-bit BE architectures since the bytes in two consequent unsigned longs are in different order in comparison to byte array. Hence if the chip had had more than 32 lines the memset() call over it would have not set up upper lines correctly. Although it's currently a theoretical case (no supported chips of this type has 32+ lines), it's better to provide a clean code to avoid people thinking this is okay and potentially producing not fully working things. Fixes: 35d13d94893f ("gpio: pca953x: convert to use bitmap API") Signed-off-by: Andy Shevchenko <> Reviewed-by: Bartosz Golaszewski <> Link: Signed-off-by: Linus Walleij <> Signed-off-by: Sasha Levin <>
2020-10-07pinctrl: mediatek: check mtk_is_virt_gpio input parameterHanks Chen
[ Upstream commit 39c4dbe4cc363accd676162c24b264f44c581490 ] check mtk_is_virt_gpio input parameter, virtual gpio need to support eint mode. add error handler for the ko case to fix this boot fail: pc : mtk_is_virt_gpio+0x20/0x38 [pinctrl_mtk_common_v2] lr : mtk_gpio_get_direction+0x44/0xb0 [pinctrl_paris] Fixes: edd546465002 ("pinctrl: mediatek: avoid virtual gpio trying to set reg") Signed-off-by: Hanks Chen <> Acked-by: Sean Wang <> Singed-off-by: Jie Yang <> Link: Signed-off-by: Linus Walleij <> Signed-off-by: Sasha Levin <>
2020-10-07pinctrl: qcom: sm8250: correct sdc2_clkDmitry Baryshkov
[ Upstream commit 5d8ff95a52c36740bf4e61202d08549e7a9caf20 ] Correct sdc2_clk pin definition (register offset is wrong, verified by the msm-4.19 driver). Fixes: 4e3ec9e407ad ("pinctrl: qcom: Add sm8250 pinctrl driver.") Signed-off-by: Dmitry Baryshkov <> Reviewed-by: Bjorn Andersson <> Acked-by: Manivannan Sadhasivam <> Link: Signed-off-by: Linus Walleij <> Signed-off-by: Sasha Levin <>
2020-10-07autofs: use __kernel_write() for the autofs pipe writingLinus Torvalds
[ Upstream commit 90fb702791bf99b959006972e8ee7bb4609f441b ] autofs got broken in some configurations by commit 13c164b1a186 ("autofs: switch to kernel_write") because there is now an extra LSM permission check done by security_file_permission() in rw_verify_area(). autofs is one if the few places that really does want the much more limited __kernel_write(), because the write is an internal kernel one that shouldn't do any user permission checks (it also doesn't need the file_start_write/file_end_write logic, since it's just a pipe). There are a couple of other cases like that - accounting, core dumping, and splice - but autofs stands out because it can be built as a module. As a result, we need to export this internal __kernel_write() function again. We really don't want any other module to use this, but we don't have a "EXPORT_SYMBOL_FOR_AUTOFS_ONLY()". But we can mark it GPL-only to at least approximate that "internal use only" for licensing. While in this area, make autofs pass in NULL for the file position pointer, since it's always a pipe, and we now use a NULL file pointer for streaming file descriptors (see file_ppos() and commit 438ab720c675: "vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files") This effectively reverts commits 9db977522449 ("fs: unexport __kernel_write") and 13c164b1a186 ("autofs: switch to kernel_write"). Fixes: 13c164b1a186 ("autofs: switch to kernel_write") Reported-by: Ondrej Mosnacek <> Acked-by: Christoph Hellwig <> Acked-by: Acked-by: Ian Kent <> Signed-off-by: Linus Torvalds <> Signed-off-by: Sasha Levin <>
2020-10-07scripts/dtc: only append to HOST_EXTRACFLAGS instead of overwritingUwe Kleine-König
[ Upstream commit efe84d408bf41975db8506d3a1cc02e794e2309c ] When building with $ HOST_EXTRACFLAGS=-g make the expectation is that host tools are built with debug informations. This however doesn't happen if the Makefile assigns a new value to the HOST_EXTRACFLAGS instead of appending to it. So use += instead of := for the first assignment. Fixes: e3fd9b5384f3 ("scripts/dtc: consolidate include path options in Makefile") Signed-off-by: Uwe Kleine-König <> Signed-off-by: Rob Herring <> Signed-off-by: Sasha Levin <>
2020-10-07blk-mq: call commit_rqs while list empty but error happenyangerkun
[ Upstream commit 632bfb6323799c087fcb4108dfe59518609667a7 ] Blk-mq should call commit_rqs once 'bd.last != true' and no more request will come(so virtscsi can kick the virtqueue, e.g.). We already do that in 'blk_mq_dispatch_rq_list/blk_mq_try_issue_list_directly' while list not empty and 'queued > 0'. However, we can seen the same scene once the last request in list call queue_rq and return error like BLK_STS_IOERR which will not requeue the request, and lead that list empty but need call commit_rqs too(Or the request for virtscsi will stay timeout until other request kick virtqueue). We found this problem by do fsstress test with offline/online virtscsi device repeat quickly. Fixes: d666ba98f849 ("blk-mq: add mq_ops->commit_rqs()") Reported-by: zhangyi (F) <> Signed-off-by: yangerkun <> Reviewed-by: Ming Lei <> Signed-off-by: Jens Axboe <> Signed-off-by: Sasha Levin <>
2020-10-07Input: trackpoint - enable Synaptics trackpointsVincent Huang
[ Upstream commit 996d585b079ad494a30cac10e08585bcd5345125 ] Add Synaptics IDs in trackpoint_start_protocol() to mark them as valid. Signed-off-by: Vincent Huang <> Fixes: 6c77545af100 ("Input: trackpoint - add new trackpoint variant IDs") Reviewed-by: Harry Cutts <> Tested-by: Harry Cutts <> Link: Signed-off-by: Dmitry Torokhov <> Signed-off-by: Sasha Levin <>
2020-10-07i2c: npcm7xx: Clear LAST bit after a failed transaction.Tali Perry
[ Upstream commit 8947efc077168c53b84d039881a7c967086a248a ] Due to a HW issue, in some scenarios the LAST bit might remain set. This will cause an unexpected NACK after reading 16 bytes on the next read. Example: if user tries to read from a missing device, get a NACK, then if the next command is a long read ( > 16 bytes), the master will stop reading after 16 bytes. To solve this, if a command fails, check if LAST bit is still set. If it does, reset the module. Fixes: 56a1485b102e (i2c: npcm7xx: Add Nuvoton NPCM I2C controller driver) Signed-off-by: Tali Perry <> Signed-off-by: Wolfram Sang <> Signed-off-by: Sasha Levin <>
2020-10-07i2c: cpm: Fix i2c_ram structureNicolas VINCENT
[ Upstream commit a2bd970aa62f2f7f80fd0d212b1d4ccea5df4aed ] the i2c_ram structure is missing the sdmatmp field mentionned in datasheet for MPC8272 at paragraph 36.5. With this field missing, the hardware would write past the allocated memory done through cpm_muram_alloc for the i2c_ram structure and land in memory allocated for the buffers descriptors corrupting the cbd_bufaddr field. Since this field is only set during setup(), the first i2c transaction would work and the following would send data read from an arbitrary memory location. Fixes: 61045dbe9d8d ("i2c: Add support for I2C bus on Freescale CPM1/CPM2 controllers") Signed-off-by: Nicolas VINCENT <> Acked-by: Jochen Friedrich <> Acked-by: Christophe Leroy <> Signed-off-by: Wolfram Sang <> Signed-off-by: Sasha Levin <>
2020-10-07gpio: aspeed: fix ast2600 bank propertiesTao Ren
[ Upstream commit 3e640b1eec38e4c8eba160f26cba4f592e657f3d ] GPIO_U is mapped to the least significant byte of input/output mask, and the byte in "output" mask should be 0 because GPIO_U is input only. All the other bits need to be 1 because GPIO_V/W/X support both input and output modes. Similarly, GPIO_Y/Z are mapped to the 2 least significant bytes, and the according bits need to be 1 because GPIO_Y/Z support both input and output modes. Fixes: ab4a85534c3e ("gpio: aspeed: Add in ast2600 details to Aspeed driver") Signed-off-by: Tao Ren <> Reviewed-by: Andrew Jeffery <> Signed-off-by: Bartosz Golaszewski <> Signed-off-by: Sasha Levin <>
2020-10-07gpio/aspeed-sgpio: don't enable all interrupts by defaultJeremy Kerr
[ Upstream commit bf0d394e885015941ed2d5724c0a6ed8d42dd95e ] Currently, the IRQ setup for the SGPIO driver enables all interrupts in dual-edge trigger mode. Since the default handler is handle_bad_irq, any state change on input GPIOs will trigger bad IRQ warnings. This change applies sensible IRQ defaults: single-edge trigger, and all IRQs disabled. Signed-off-by: Jeremy Kerr <> Fixes: 7db47faae79b ("gpio: aspeed: Add SGPIO driver") Reviewed-by: Joel Stanley <> Acked-by: Andrew Jeffery <> Signed-off-by: Bartosz Golaszewski <> Signed-off-by: Sasha Levin <>
2020-10-07gpio/aspeed-sgpio: enable access to all 80 input & output sgpiosJeremy Kerr
[ Upstream commit ac67b07e268d46eba675a60c37051bb3e59fd201 ] Currently, the aspeed-sgpio driver exposes up to 80 GPIO lines, corresponding to the 80 status bits available in hardware. Each of these lines can be configured as either an input or an output. However, each of these GPIOs is actually an input *and* an output; we actually have 80 inputs plus 80 outputs. This change expands the maximum number of GPIOs to 160; the lower half of this range are the input-only GPIOs, the upper half are the outputs. We fix the GPIO directions to correspond to this mapping. This also fixes a bug when setting GPIOs - we were reading from the input register, making it impossible to set more than one output GPIO. Signed-off-by: Jeremy Kerr <> Fixes: 7db47faae79b ("gpio: aspeed: Add SGPIO driver") Reviewed-by: Joel Stanley <> Reviewed-by: Andrew Jeffery <> Acked-by: Rob Herring <> Signed-off-by: Bartosz Golaszewski <> Signed-off-by: Sasha Levin <>
2020-10-07gpio: pca953x: Fix uninitialized pending variableYe Li
[ Upstream commit e43c26e12dd49a41cf5a4cd5c5b59a1eb98ed11e ] When pca953x_irq_pending returns false, the pending parameter won't be set. But pca953x_irq_handler continues using this uninitialized variable as pending irqs and will cause problem. Fix the issue by initializing pending to 0. Fixes: 064c73afe738 ("gpio: pca953x: Synchronize interrupt handler properly") Signed-off-by: Ye Li <> Reviewed-by: Andy Shevchenko <> Signed-off-by: Bartosz Golaszewski <> Signed-off-by: Sasha Levin <>
2020-10-07iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate()Yu Kuai
[ Upstream commit 1a26044954a6d1f4d375d5e62392446af663be7a ] if of_find_device_by_node() succeed, exynos_iommu_of_xlate() doesn't have a corresponding put_device(). Thus add put_device() to fix the exception handling for this function implementation. Fixes: aa759fd376fb ("iommu/exynos: Add callback for initializing devices from device tree") Signed-off-by: Yu Kuai <> Acked-by: Marek Szyprowski <> Link: Signed-off-by: Joerg Roedel <> Signed-off-by: Sasha Levin <>
2020-10-07scsi: target: Fix lun lookup for TARGET_SCF_LOOKUP_LUN_FROM_TAG caseSudhakar Panneerselvam
[ Upstream commit 149415586243bd0ea729760fb6dd7b3c50601871 ] transport_lookup_tmr_lun() uses "orig_fe_lun" member of struct se_cmd for the lookup. Hence, update this field directly for the TARGET_SCF_LOOKUP_LUN_FROM_TAG case. Link: Fixes: a36840d80027 ("target: Initialize LUN in transport_init_se_cmd()") Reported-by: Martin Wilck <> Reviewed-by: Mike Christie <> Signed-off-by: Sudhakar Panneerselvam <> Signed-off-by: Martin K. Petersen <> Signed-off-by: Sasha Levin <>
2020-10-07clk: samsung: exynos4: mark 'chipid' clock as CLK_IGNORE_UNUSEDMarek Szyprowski
[ Upstream commit f3bb0f796f5ffe32f0fbdce5b1b12eb85511158f ] The ChipID IO region has it's own clock, which is being disabled while scanning for unused clocks. It turned out that some CPU hotplug, CPU idle or even SOC firmware code depends on the reads from that area. Fix the mysterious hang caused by entering deep CPU idle state by ignoring the 'chipid' clock during unused clocks scan, as there are no direct clients for it which will keep it enabled. Fixes: e062b571777f ("clk: exynos4: register clocks using common clock framework") Signed-off-by: Marek Szyprowski <> Link: Reviewed-by: Krzysztof Kozlowski <> Acked-by: Sylwester Nawrocki <> Signed-off-by: Stephen Boyd <> Signed-off-by: Sasha Levin <>
2020-10-07dmaengine: dmatest: Prevent to run on misconfigured channelVladimir Murzin
[ Upstream commit ce65d55f92a67e247f4d799e581cf9fed677871c ] Andy reported that commit 6b41030fdc79 ("dmaengine: dmatest: Restore default for channel") broke his scripts for the case where "busy" channel is used for configuration with expectation that run command would do nothing. Instead, behavior was (unintentionally) changed to treat such case as under-configuration and progress with defaults, i.e. run command would start a test with default setting for channel (which would use all channels). Restore original behavior with tracking status of channel setter so we can distinguish between misconfigured and under-configured cases in run command and act accordingly. Fixes: 6b41030fdc79 ("dmaengine: dmatest: Restore default for channel") Reported-by: Andy Shevchenko <> Tested-by: Andy Shevchenko <> Signed-off-by: Vladimir Murzin <> Tested-by: Peter Ujfalusi <> Signed-off-by: Andy Shevchenko <> Link: Signed-off-by: Vinod Koul <> Signed-off-by: Sasha Levin <>
2020-10-07clk: tegra: Fix missing prototype for tegra210_clk_register_emc()Thierry Reding
[ Upstream commit 2f878d04218c8b26f6d0ab26955ca6b03848a1ad ] Include the Tegra driver's clk.h to pull in the prototype definition for this function so that compilers don't warn about it being missing. Fixes: 0ac65fc946d3 ("clk: tegra: Implement Tegra210 EMC clock") Reported-by: kernel test robot <> Signed-off-by: Thierry Reding <> Signed-off-by: Sasha Levin <>
2020-10-07clk: tegra: Always program PLL_E when enabledThierry Reding
[ Upstream commit 5105660ee80862b85f7769626d0f936c18ce1885 ] Commit bff1cef5f23a ("clk: tegra: Don't enable already enabled PLLs") added checks to avoid enabling PLLs that have already been enabled by the bootloader. However, the PLL_E configuration inherited from the bootloader isn't necessarily the one that is needed for the kernel. This can cause SATA to fail like this: [ 5.310270] phy phy-sata.6: phy poweron failed --> -110 [ 5.315604] tegra-ahci 70027000.sata: failed to power on AHCI controller: -110 [ 5.323022] tegra-ahci: probe of 70027000.sata failed with error -110 Fix this by always programming the PLL_E. This ensures that any mis- configuration by the bootloader will be overwritten by the kernel. Fixes: bff1cef5f23a ("clk: tegra: Don't enable already enabled PLLs") Reported-by: LABBE Corentin <> Tested-by: Corentin Labbe <> Reviewed-by: Dmitry Osipenko <> Signed-off-by: Thierry Reding <> Signed-off-by: Sasha Levin <>
2020-10-07pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly on readTrond Myklebust
[ Upstream commit ee15c7b53e52fb04583f734461244c4dcca828fa ] While it is true that reading from an unmirrored source always uses index 0, that is no longer true for mirrored sources when we fail over. Fixes: 563c53e73b8b ("NFS: Fix flexfiles read failover") Signed-off-by: Trond Myklebust <> Signed-off-by: Sasha Levin <>
2020-10-07NFSv4.2: fix client's attribute cache management for copy_file_rangeOlga Kornievskaia
[ Upstream commit 16abd2a0c124a6c3543c88ca4c53c997c9fb4114 ] After client is done with the COPY operation, it needs to invalidate its pagecache (as it did no reading or writing of the data locally) and it needs to invalidate it's attributes just like it would have for a read on the source file and write on the destination file. Once the linux server started giving out read delegations to read+write opens, the destination file of the copy_file range started having delegations and not doing syncup on close of the file leading to xfstest failures for generic/430,431,432,433,565. v2: changing cache_validity needs to be protected by the i_lock. Reported-by: Murphy Zhou <> Fixes: 2e72448b07dc ("NFS: Add COPY nfs operation") Signed-off-by: Olga Kornievskaia <> Signed-off-by: Trond Myklebust <> Signed-off-by: Sasha Levin <>
2020-10-07nfs: Fix security label length not being resetJeffrey Mitchell
[ Upstream commit d33030e2ee3508d65db5644551435310df86010e ] nfs_readdir_page_filler() iterates over entries in a directory, reusing the same security label buffer, but does not reset the buffer's length. This causes decode_attr_security_label() to return -ERANGE if an entry's security label is longer than the previous one's. This error, in nfs4_decode_dirent(), only gets passed up as -EAGAIN, which causes another failed attempt to copy into the buffer. The second error is ignored and the remaining entries do not show up in ls, specifically the getdents64() syscall. Reproduce by creating multiple files in NFS and giving one of the later files a longer security label. ls will not see that file nor any that are added afterwards, though they will exist on the backend. In nfs_readdir_page_filler(), reset security label buffer length before every reuse Signed-off-by: Jeffrey Mitchell <> Fixes: b4487b935452 ("nfs: Fix getxattr kernel panic and memory overflow") Signed-off-by: Trond Myklebust <> Signed-off-by: Sasha Levin <>
2020-10-07pinctrl: mvebu: Fix i2c sda definition for 98DX3236Chris Packham
[ Upstream commit 63c3212e7a37d68c89a13bdaebce869f4e064e67 ] Per the datasheet the i2c functions use MPP_Sel=0x1. They are documented as using MPP_Sel=0x4 as well but mixing 0x1 and 0x4 is clearly wrong. On the board tested 0x4 resulted in a non-functioning i2c bus so stick with 0x1 which works. Fixes: d7ae8f8dee7f ("pinctrl: mvebu: pinctrl driver for 98DX3236 SoC") Signed-off-by: Chris Packham <> Reviewed-by: Andrew Lunn <> Link: Signed-off-by: Linus Walleij <> Signed-off-by: Sasha Levin <>