linux-2.6-block.git
5 years agomm: add kiocb_wait_page_async_init() helper
Jens Axboe [Fri, 22 May 2020 16:18:23 +0000 (10:18 -0600)]
mm: add kiocb_wait_page_async_init() helper

Checks if the file supports it, and initializes the values that we need.
Caller passes in 'data' pointer, if any, and the callback function to
be used.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agobtrfs: flag files as supporting buffered async reads
Jens Axboe [Fri, 22 May 2020 16:19:22 +0000 (10:19 -0600)]
btrfs: flag files as supporting buffered async reads

btrfs uses generic_file_read_iter(), which already supports this.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoxfs: flag files as supporting buffered async reads
Jens Axboe [Fri, 22 May 2020 15:27:33 +0000 (09:27 -0600)]
xfs: flag files as supporting buffered async reads

XFS uses generic_file_read_iter(), which already supports this.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: flag block devices as supporting IOCB_WAITQ
Jens Axboe [Fri, 22 May 2020 15:14:08 +0000 (09:14 -0600)]
block: flag block devices as supporting IOCB_WAITQ

Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoext4: flag as supporting buffered async reads
Jens Axboe [Fri, 22 May 2020 15:13:42 +0000 (09:13 -0600)]
ext4: flag as supporting buffered async reads

Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agofs: add FMODE_BUF_RASYNC
Jens Axboe [Fri, 22 May 2020 15:12:51 +0000 (09:12 -0600)]
fs: add FMODE_BUF_RASYNC

If set, this indicates that the file system supports IOCB_WAITQ for
buffered reads.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agomm: support async buffered reads in generic_file_buffered_read()
Jens Axboe [Fri, 22 May 2020 15:18:38 +0000 (09:18 -0600)]
mm: support async buffered reads in generic_file_buffered_read()

Use the async page locking infrastructure, if IOCB_WAITQ is set in the
passed in iocb. The caller must expect an -EIOCBQUEUED return value,
which means that IO is started but not done yet. This is similar to how
O_DIRECT signals the same operation. Once the callback is received by
the caller for IO completion, the caller must retry the operation.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agomm: add support for async page locking
Jens Axboe [Fri, 22 May 2020 15:12:09 +0000 (09:12 -0600)]
mm: add support for async page locking

Normally waiting for a page to become unlocked, or locking the page,
requires waiting for IO to complete. Add support for lock_page_async()
and wait_on_page_locked_async(), which are callback based instead. This
allows a caller to get notified when a page becomes unlocked, rather
than wait for it.

We use the iocb->private field to pass in this necessary data for this
to happen. struct wait_page_key is made public, and we define struct
wait_page_async as the interface between the caller and the core.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agomm: allow read-ahead with IOCB_NOWAIT set
Jens Axboe [Fri, 22 May 2020 14:59:42 +0000 (08:59 -0600)]
mm: allow read-ahead with IOCB_NOWAIT set

The read-ahead shouldn't block, so allow it to be done even if
IOCB_NOWAIT is set in the kiocb.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: read-ahead submission should imply no-wait as well
Jens Axboe [Fri, 22 May 2020 14:55:22 +0000 (08:55 -0600)]
block: read-ahead submission should imply no-wait as well

As read-ahead is opportunistic, don't block for request allocation.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge branch 'io_uring-5.7' into async-buffered
Jens Axboe [Fri, 22 May 2020 15:51:28 +0000 (09:51 -0600)]
Merge branch 'io_uring-5.7' into async-buffered

* io_uring-5.7:
  io_uring: reset -EBUSY error when io sq thread is waken up
  io_uring: don't add non-IO requests to iopoll pending list
  io_uring: don't use kiocb.private to store buf_index
  io_uring: cancel work if task_work_add() fails
  io_uring: remove dead check in io_splice()
  io_uring: fix FORCE_ASYNC req preparation
  io_uring: don't prepare DRAIN reqs twice
  io_uring: initialize ctx->sqo_wait earlier

5 years agoMerge branch 'master' into async-buffered
Jens Axboe [Fri, 22 May 2020 15:51:12 +0000 (09:51 -0600)]
Merge branch 'master' into async-buffered

* master: (36 commits)
  pipe: Fix pipe_full() test in opipe_prep().
  fix multiplication overflow in copy_fdtable()
  vsprintf: don't obfuscate NULL and error pointers
  iommu: Fix deferred domain attachment
  mtd:rawnand: brcmnand: Fix PM resume crash
  mtd: Fix mtd not registered due to nvmem name collision
  mtd: spinand: Propagate ECC information to the MTD structure
  ACPI: EC: PM: Avoid flushing EC work when EC GPE is inactive
  ubi: Fix seq_file usage in detailed_erase_block_info debugfs file
  ubifs: fix wrong use of crypto_shash_descsize()
  MAINTAINERS: add maintainer for mediatek i2c controller driver
  i2c: mux: Replace zero-length array with flexible-array
  i2c: mux: demux-pinctrl: Fix an error handling path in 'i2c_demux_pinctrl_probe()'
  i2c: altera: Fix race between xfer_msg and isr thread
  i2c: algo-pca: update contact email
  i2c: at91: Fix pinmux after devm_gpiod_get() for bus recovery
  ARC: show_regs: avoid extra line of output
  x86/hyperv: Properly suspend/resume reenlightenment notifications
  iommu/amd: Fix get_acpihid_device_id()
  iommu/amd: Fix over-read of ACPI UID from IVRS table
  ...

5 years agoMerge branch 'for-5.8/block' into for-next
Jens Axboe [Fri, 22 May 2020 14:46:38 +0000 (08:46 -0600)]
Merge branch 'for-5.8/block' into for-next

* for-5.8/block:
  block: remove the disk and queue NULL checks in blkdev_issue_flush
  block: remove the error_sector argument to blkdev_issue_flush

5 years agoblock: remove the disk and queue NULL checks in blkdev_issue_flush
Christoph Hellwig [Wed, 13 May 2020 12:36:01 +0000 (14:36 +0200)]
block: remove the disk and queue NULL checks in blkdev_issue_flush

Both of these never can be NULL for a live block device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: remove the error_sector argument to blkdev_issue_flush
Christoph Hellwig [Wed, 13 May 2020 12:36:00 +0000 (14:36 +0200)]
block: remove the error_sector argument to blkdev_issue_flush

The argument isn't used by any caller, and drivers don't fill out
bi_sector for flush requests either.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge branch 'for-5.8/drivers' into for-next
Jens Axboe [Thu, 21 May 2020 14:23:03 +0000 (08:23 -0600)]
Merge branch 'for-5.8/drivers' into for-next

* for-5.8/drivers:
  block: remove ioctl_by_bdev
  s390/dasd: remove ioctl_by_bdev calls
  dasd: refactor dasd_ioctl_information
  loop: Add LOOP_CONFIGURE ioctl
  loop: Clean up LOOP_SET_STATUS lo_flags handling
  loop: Rework lo_ioctl() __user argument casting
  loop: Move loop_set_status_from_info() and friends up
  loop: Factor out configuring loop from status
  loop: Remove figure_loop_size()
  loop: Refactor loop_set_status() size calculation
  loop: Switch to set_capacity_revalidate_and_notify()
  loop: Factor out setting loop device size
  loop: Remove sector_t truncation checks
  loop: Call loop_config_discard() only after new config is applied

5 years agoblock: remove ioctl_by_bdev
Christoph Hellwig [Tue, 19 May 2020 14:33:21 +0000 (16:33 +0200)]
block: remove ioctl_by_bdev

No callers left.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agos390/dasd: remove ioctl_by_bdev calls
Stefan Haberland [Tue, 19 May 2020 14:22:59 +0000 (16:22 +0200)]
s390/dasd: remove ioctl_by_bdev calls

The IBM partition parser requires device type specific information only
available to the DASD driver to correctly register partitions. The
current approach of using ioctl_by_bdev with a fake user space pointer
is discouraged.

Fix this by replacing IOCTL calls with direct in-kernel function calls.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agodasd: refactor dasd_ioctl_information
Christoph Hellwig [Tue, 19 May 2020 14:22:58 +0000 (16:22 +0200)]
dasd: refactor dasd_ioctl_information

Prepare for in-kernel callers of this functionality.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[sth@de.ibm.com: remove leftover kfree]
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoloop: Add LOOP_CONFIGURE ioctl
Martijn Coenen [Wed, 13 May 2020 13:38:45 +0000 (15:38 +0200)]
loop: Add LOOP_CONFIGURE ioctl

This allows userspace to completely setup a loop device with a single
ioctl, removing the in-between state where the device can be partially
configured - eg the loop device has a backing file associated with it,
but is reading from the wrong offset.

Besides removing the intermediate state, another big benefit of this
ioctl is that LOOP_SET_STATUS can be slow; the main reason for this
slowness is that LOOP_SET_STATUS(64) calls blk_mq_freeze_queue() to
freeze the associated queue; this requires waiting for RCU
synchronization, which I've measured can take about 15-20ms on this
device on average.

In addition to doing what LOOP_SET_STATUS can do, LOOP_CONFIGURE can
also be used to:
- Set the correct block size immediately by setting
  loop_config.block_size (avoids LOOP_SET_BLOCK_SIZE)
- Explicitly request direct I/O mode by setting LO_FLAGS_DIRECT_IO
  in loop_config.info.lo_flags (avoids LOOP_SET_DIRECT_IO)
- Explicitly request read-only mode by setting LO_FLAGS_READ_ONLY
  in loop_config.info.lo_flags

Here's setting up ~70 regular loop devices with an offset on an x86
Android device, using LOOP_SET_FD and LOOP_SET_STATUS:

vsoc_x86:/system/apex # time for i in `seq 30 100`;
do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done
    0m03.40s real     0m00.02s user     0m00.03s system

Here's configuring ~70 devices in the same way, but using a modified
losetup that uses the new LOOP_CONFIGURE ioctl:

vsoc_x86:/system/apex # time for i in `seq 30 100`;
do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done
    0m01.94s real     0m00.01s user     0m00.01s system

Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoloop: Clean up LOOP_SET_STATUS lo_flags handling
Martijn Coenen [Wed, 13 May 2020 13:38:44 +0000 (15:38 +0200)]
loop: Clean up LOOP_SET_STATUS lo_flags handling

LOOP_SET_STATUS(64) will actually allow some lo_flags to be modified; in
particular, LO_FLAGS_AUTOCLEAR can be set and cleared, whereas
LO_FLAGS_PARTSCAN can be set to request a partition scan. Make this
explicit by updating the UAPI to include the flags that can be
set/cleared using this ioctl.

The implementation can then blindly take over the passed in flags,
and use the previous flags for those flags that can't be set / cleared
using LOOP_SET_STATUS.

Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoloop: Rework lo_ioctl() __user argument casting
Martijn Coenen [Wed, 13 May 2020 13:38:43 +0000 (15:38 +0200)]
loop: Rework lo_ioctl() __user argument casting

In preparation for a new ioctl that needs to copy_from_user(); makes the
code easier to read as well.

Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoloop: Move loop_set_status_from_info() and friends up
Martijn Coenen [Wed, 13 May 2020 13:38:42 +0000 (15:38 +0200)]
loop: Move loop_set_status_from_info() and friends up

So we can use it without forward declaration. This is a separate commit
to make it easier to verify that this is just a move, without functional
modifications.

Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoloop: Factor out configuring loop from status
Martijn Coenen [Wed, 13 May 2020 13:38:41 +0000 (15:38 +0200)]
loop: Factor out configuring loop from status

Factor out this code into a separate function, so it can be reused by
other code more easily.

Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoloop: Remove figure_loop_size()
Martijn Coenen [Wed, 13 May 2020 13:38:40 +0000 (15:38 +0200)]
loop: Remove figure_loop_size()

This function was now only used by loop_set_capacity(). Just open code
the remaining code in the caller instead.

Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoloop: Refactor loop_set_status() size calculation
Martijn Coenen [Wed, 13 May 2020 13:38:39 +0000 (15:38 +0200)]
loop: Refactor loop_set_status() size calculation

figure_loop_size() calculates the loop size based on the passed in
parameters, but at the same time it updates the offset and sizelimit
parameters in the loop device configuration. That is a somewhat
unexpected side effect of a function with this name, and it is only only
needed by one of the two callers of this function - loop_set_status().

Move the lo_offset and lo_sizelimit assignment back into loop_set_status(),
and use the newly factored out functions to validate and apply the newly
calculated size. This allows us to get rid of figure_loop_size() in a
follow-up commit.

Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoloop: Switch to set_capacity_revalidate_and_notify()
Martijn Coenen [Wed, 13 May 2020 13:38:38 +0000 (15:38 +0200)]
loop: Switch to set_capacity_revalidate_and_notify()

This was recently added to block/genhd.c, and takes care of both
updating the capacity and notifying userspace of the new size.

Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoloop: Factor out setting loop device size
Martijn Coenen [Wed, 13 May 2020 13:38:37 +0000 (15:38 +0200)]
loop: Factor out setting loop device size

This code is used repeatedly.

Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoloop: Remove sector_t truncation checks
Martijn Coenen [Wed, 13 May 2020 13:38:36 +0000 (15:38 +0200)]
loop: Remove sector_t truncation checks

sector_t is now always u64, so we don't need to check for truncation.

Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoloop: Call loop_config_discard() only after new config is applied
Martijn Coenen [Wed, 13 May 2020 13:38:35 +0000 (15:38 +0200)]
loop: Call loop_config_discard() only after new config is applied

loop_set_status() calls loop_config_discard() to configure discard for
the loop device; however, the discard configuration depends on whether
the loop device uses encryption, and when we call it the encryption
configuration has not been updated yet. Move the call down so we apply
the correct discard configuration based on the new configuration.

Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge tag 'fixes-for-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd...
Linus Torvalds [Wed, 20 May 2020 20:23:55 +0000 (13:23 -0700)]
Merge tag 'fixes-for-5.7-rc6' of git://git./linux/kernel/git/mtd/linux

Pull MTD fixes from Richard Weinberger:

 - Fix a PM regression in brcmnand driver

 - Propagate ECC information correctly on SPI-NAND

 - Make sure no MTD name is used multiple time in nvmem

* tag 'fixes-for-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd:rawnand: brcmnand: Fix PM resume crash
  mtd: Fix mtd not registered due to nvmem name collision
  mtd: spinand: Propagate ECC information to the MTD structure

5 years agoMerge tag 'for-linus-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw...
Linus Torvalds [Wed, 20 May 2020 20:07:01 +0000 (13:07 -0700)]
Merge tag 'for-linus-5.7-rc6' of git://git./linux/kernel/git/rw/ubifs

Pull UBI and UBIFS fixes from Richard Weinberger:

 - Correctly set next cursor for detailed_erase_block_info debugfs file

 - Don't use crypto_shash_descsize() for digest size in UBIFS

 - Remove broken lazytime support from UBIFS

* tag 'for-linus-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubi: Fix seq_file usage in detailed_erase_block_info debugfs file
  ubifs: fix wrong use of crypto_shash_descsize()
  ubifs: remove broken lazytime support

5 years agoMerge tag 'for-linus-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Linus Torvalds [Wed, 20 May 2020 19:56:21 +0000 (12:56 -0700)]
Merge tag 'for-linus-5.7-rc6' of git://git./linux/kernel/git/rw/uml

Pull UML fixes from Richard Weinberger:

 - Two missing includes which caused build issues on recent systems

 - Correctly set TRANS_GRE_LEN in our vector network driver

* tag 'for-linus-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Fix typo in vector driver transport option definition
  um: syscall.c: include <asm/unistd.h>
  um: Fix xor.h include

5 years agoMerge tag 'pm-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Wed, 20 May 2020 18:33:30 +0000 (11:33 -0700)]
Merge tag 'pm-5.7-rc7' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "This makes a recently introduced suspend-to-idle wakeup issue on Dell
  XPS13 9360 go away"

* tag 'pm-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: EC: PM: Avoid flushing EC work when EC GPE is inactive

5 years agoMerge tag 'ovl-fixes-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszere...
Linus Torvalds [Wed, 20 May 2020 18:28:35 +0000 (11:28 -0700)]
Merge tag 'ovl-fixes-5.7-rc7' of git://git./linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
 "Fix two bugs introduced in this cycle and one introduced in v5.5"

* tag 'ovl-fixes-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: potential crash in ovl_fid_to_fh()
  ovl: clear ATTR_OPEN from attr->ia_valid
  ovl: clear ATTR_FILE from attr->ia_valid

5 years agopipe: Fix pipe_full() test in opipe_prep().
Tetsuo Handa [Tue, 19 May 2020 23:51:59 +0000 (08:51 +0900)]
pipe: Fix pipe_full() test in opipe_prep().

syzbot is reporting that splice()ing from non-empty read side to
already-full write side causes unkillable task, for opipe_prep() is by
error not inverting pipe_full() test.

  CPU: 0 PID: 9460 Comm: syz-executor.5 Not tainted 5.6.0-rc3-next-20200228-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:rol32 include/linux/bitops.h:105 [inline]
  RIP: 0010:iterate_chain_key kernel/locking/lockdep.c:369 [inline]
  RIP: 0010:__lock_acquire+0x6a3/0x5270 kernel/locking/lockdep.c:4178
  Call Trace:
     lock_acquire+0x197/0x420 kernel/locking/lockdep.c:4720
     __mutex_lock_common kernel/locking/mutex.c:956 [inline]
     __mutex_lock+0x156/0x13c0 kernel/locking/mutex.c:1103
     pipe_lock_nested fs/pipe.c:66 [inline]
     pipe_double_lock+0x1a0/0x1e0 fs/pipe.c:104
     splice_pipe_to_pipe fs/splice.c:1562 [inline]
     do_splice+0x35f/0x1520 fs/splice.c:1141
     __do_sys_splice fs/splice.c:1447 [inline]
     __se_sys_splice fs/splice.c:1427 [inline]
     __x64_sys_splice+0x2b5/0x320 fs/splice.c:1427
     do_syscall_64+0xf6/0x790 arch/x86/entry/common.c:295
     entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reported-by: syzbot+b48daca8639150bc5e73@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=9386d051e11e09973d5a4cf79af5e8cedf79386d
Fixes: 8cefc107ca54c8b0 ("pipe: Use head and tail pointers for the ring, not cursor and length")
Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoMerge branch 'for-5.8/io_uring' into for-next
Jens Axboe [Wed, 20 May 2020 14:41:51 +0000 (08:41 -0600)]
Merge branch 'for-5.8/io_uring' into for-next

* for-5.8/io_uring:
  io_uring: don't submit sqes when ctx->refs is dying

5 years agoio_uring: don't submit sqes when ctx->refs is dying
Xiaoguang Wang [Wed, 20 May 2020 07:35:03 +0000 (15:35 +0800)]
io_uring: don't submit sqes when ctx->refs is dying

When IORING_SETUP_SQPOLL is enabled, io_ring_ctx_wait_and_kill() will wait
for sq thread to idle by busy loop:

    while (ctx->sqo_thread && !wq_has_sleeper(&ctx->sqo_wait))
        cond_resched();

Above loop isn't very CPU friendly, it may introduce a short cpu burst on
the current cpu.

If ctx->refs is dying, we forbid sq_thread from submitting any further
SQEs. Instead they just get discarded when we exit.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoio_uring: reset -EBUSY error when io sq thread is waken up io_uring-5.7 io_uring-5.7-2020-05-22
Xiaoguang Wang [Wed, 20 May 2020 13:24:35 +0000 (21:24 +0800)]
io_uring: reset -EBUSY error when io sq thread is waken up

In io_sq_thread(), currently if we get an -EBUSY error and go to sleep,
we will won't clear it again, which will result in io_sq_thread() will
never have a chance to submit sqes again. Below test program test.c
can reveal this bug:

int main(int argc, char *argv[])
{
        struct io_uring ring;
        int i, fd, ret;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        struct iovec *iovecs;
        void *buf;
        struct io_uring_params p;

        if (argc < 2) {
                printf("%s: file\n", argv[0]);
                return 1;
        }

        memset(&p, 0, sizeof(p));
        p.flags = IORING_SETUP_SQPOLL;
        ret = io_uring_queue_init_params(4, &ring, &p);
        if (ret < 0) {
                fprintf(stderr, "queue_init: %s\n", strerror(-ret));
                return 1;
        }

        fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        iovecs = calloc(10, sizeof(struct iovec));
        for (i = 0; i < 10; i++) {
                if (posix_memalign(&buf, 4096, 4096))
                        return 1;
                iovecs[i].iov_base = buf;
                iovecs[i].iov_len = 4096;
        }

        ret = io_uring_register_files(&ring, &fd, 1);
        if (ret < 0) {
                fprintf(stderr, "%s: register %d\n", __FUNCTION__, ret);
                return ret;
        }

        for (i = 0; i < 10; i++) {
                sqe = io_uring_get_sqe(&ring);
                if (!sqe)
                        break;

                io_uring_prep_readv(sqe, 0, &iovecs[i], 1, 0);
                sqe->flags |= IOSQE_FIXED_FILE;

                ret = io_uring_submit(&ring);
                sleep(1);
                printf("submit %d\n", i);
        }

        for (i = 0; i < 10; i++) {
                io_uring_wait_cqe(&ring, &cqe);
                printf("receive: %d\n", i);
                if (cqe->res != 4096) {
                        fprintf(stderr, "ret=%d, wanted 4096\n", cqe->res);
                        ret = 1;
                }
                io_uring_cqe_seen(&ring, cqe);
        }

        close(fd);
        io_uring_queue_exit(&ring);
        return 0;
}
sudo ./test testfile
above command will hang on the tenth request, to fix this bug, when io
sq_thread is waken up, we reset the variable 'ret' to be zero.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge branch 'master' into for-next
Jens Axboe [Wed, 20 May 2020 13:11:28 +0000 (07:11 -0600)]
Merge branch 'master' into for-next

* master: (376 commits)
  afs: Don't unlock fetched data pages until the op completes successfully
  exfat: fix possible memory leak in exfat_find()
  exfat: use iter_file_splice_write
  Linux 5.7-rc6
  exec: Move would_dump into flush_old_exec
  KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mce
  selftests: mptcp: pm: rm the right tmp file
  dpaa2-eth: properly handle buffer size restrictions
  bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier
  bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range
  bpf: Restrict bpf_probe_read{, str}() only to archs where they work
  USB: gadget: fix illegal array access in binding with UDC
  usb: core: hub: limit HUB_QUIRK_DISABLE_AUTOSUSPEND to USB5534B
  x86: Fix early boot crash on gcc-10, third try
  ARM: dts: iwg20d-q7-dbcm-ca: Remove unneeded properties in hdmi@39
  ARM: dts: renesas: Make hdmi encoder nodes compliant with DT bindings
  arm64: dts: renesas: Make hdmi encoder nodes compliant with DT bindings
  x86/unwind/orc: Fix error handling in __unwind_start()
  MAINTAINERS: Mark networking drivers as Maintained.
  ipmr: Add lockdep expression to ipmr_for_each_table macro
  ...

5 years agoio_uring: don't add non-IO requests to iopoll pending list
Jens Axboe [Wed, 20 May 2020 03:20:27 +0000 (21:20 -0600)]
io_uring: don't add non-IO requests to iopoll pending list

We normally disable any commands that aren't specifically poll commands
for a ring that is setup for polling, but we do allow buffer provide and
remove commands to support buffer selection for polled IO. Once a
request is issued, we add it to the poll list to poll for completion. But
we should not do that for non-IO commands, as those request complete
inline immediately and aren't pollable. If we do, we can leave requests
on the iopoll list after they are freed.

Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Tue, 19 May 2020 23:33:26 +0000 (16:33 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs

Pull vfs fix from Al Viro:
 "Stable fodder fix: copy_fdtable() would get screwed on 64bit boxen
  with sysctl_nr_open raised to 512M or higher, which became possible
  since 2.6.25.

  Nobody sane would set the things up that way, but..."

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix multiplication overflow in copy_fdtable()

5 years agoMerge tag 'arc-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Linus Torvalds [Tue, 19 May 2020 22:40:51 +0000 (15:40 -0700)]
Merge tag 'arc-5.7-rc7' of git://git./linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:

 - fix recent DSP code regression on ARC700 platforms

 - fix thinkos in ICCM/DCCM size checks

 - USB regression fix

 - other small fixes here and there

* tag 'arc-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: show_regs: avoid extra line of output
  ARC: guard dsp early init against non ARCv2
  ARC: [plat-eznps]: Restrict to CONFIG_ISA_ARCOMPACT
  ARC: entry: comment
  arc: remove #ifndef CONFIG_AS_CFI_SIGNAL_FRAME
  arc: ptrace: hard-code "arc" instead of UTS_MACHINE
  ARC: [plat-hsdk]: fix USB regression
  ARC: Fix ICCM & DCCM runtime size checks

5 years agofix multiplication overflow in copy_fdtable()
Al Viro [Tue, 19 May 2020 21:48:52 +0000 (17:48 -0400)]
fix multiplication overflow in copy_fdtable()

cpy and set really should be size_t; we won't get an overflow on that,
since sysctl_nr_open can't be set above ~(size_t)0 / sizeof(void *),
so nr that would've managed to overflow size_t on that multiplication
won't get anywhere near copy_fdtable() - we'll fail with EMFILE
before that.

Cc: stable@kernel.org # v2.6.25+
Fixes: 9cfe015aa424 (get rid of NR_OPEN and introduce a sysctl_nr_open)
Reported-by: Thiago Macieira <thiago.macieira@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
5 years agoio_uring: don't use kiocb.private to store buf_index
Bijan Mottahedeh [Tue, 19 May 2020 21:52:49 +0000 (14:52 -0700)]
io_uring: don't use kiocb.private to store buf_index

kiocb.private is used in iomap_dio_rw() so store buf_index separately.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Move 'buf_index' to a hole in io_kiocb.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 19 May 2020 18:52:24 +0000 (11:52 -0700)]
Merge branch 'i2c/for-current-fixed' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "A set of driver and core fixes as well as MAINTAINER update"

* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: add maintainer for mediatek i2c controller driver
  i2c: mux: Replace zero-length array with flexible-array
  i2c: mux: demux-pinctrl: Fix an error handling path in 'i2c_demux_pinctrl_probe()'
  i2c: altera: Fix race between xfer_msg and isr thread
  i2c: algo-pca: update contact email
  i2c: at91: Fix pinmux after devm_gpiod_get() for bus recovery
  i2c: use my kernel.org address from now on
  i2c: fix missing pm_runtime_put_sync in i2c_device_probe

5 years agoMerge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 19 May 2020 18:48:21 +0000 (11:48 -0700)]
Merge tag 'hyperv-fixes-signed' of git://git./linux/kernel/git/hyperv/linux

Pull hyperv fix from Wei Liu:
 "One patch from Vitaly to fix reenlightenment notifications"

* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Properly suspend/resume reenlightenment notifications

5 years agoMerge tag 'iommu-fixes-v5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 19 May 2020 18:37:11 +0000 (11:37 -0700)]
Merge tag 'iommu-fixes-v5.7-rc6' of git://git./linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:
 "All related to the AMD IOMMU driver:

   - ACPI table parser fix to correctly read the UID of ACPI devices

   - ACPI UID device matching fix

   - Fix deferred device attachment to a domain in kdump kernels when
     the IOMMU driver uses the dma-iommu DMA-API implementation"

* tag 'iommu-fixes-v5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu: Fix deferred domain attachment
  iommu/amd: Fix get_acpihid_device_id()
  iommu/amd: Fix over-read of ACPI UID from IVRS table

5 years agovsprintf: don't obfuscate NULL and error pointers
Ilya Dryomov [Tue, 19 May 2020 11:26:57 +0000 (13:26 +0200)]
vsprintf: don't obfuscate NULL and error pointers

I don't see what security concern is addressed by obfuscating NULL
and IS_ERR() error pointers, printed with %p/%pK.  Given the number
of sites where %p is used (over 10000) and the fact that NULL pointers
aren't uncommon, it probably wouldn't take long for an attacker to
find the hash that corresponds to 0.  Although harder, the same goes
for most common error values, such as -1, -2, -11, -14, etc.

The NULL part actually fixes a regression: NULL pointers weren't
obfuscated until commit 3e5903eb9cff ("vsprintf: Prevent crash when
dereferencing invalid pointers") which went into 5.2.  I'm tacking
the IS_ERR() part on here because error pointers won't leak kernel
addresses and printing them as pointers shouldn't be any different
from e.g. %d with PTR_ERR_OR_ZERO().  Obfuscating them just makes
debugging based on existing pr_debug and friends excruciating.

Note that the "always print 0's for %pK when kptr_restrict == 2"
behaviour which goes way back is left as is.

Example output with the patch applied:

                             ptr         error-ptr              NULL
 %p:            0000000001f8cc5b  fffffffffffffff2  0000000000000000
 %pK, kptr = 0: 0000000001f8cc5b  fffffffffffffff2  0000000000000000
 %px:           ffff888048c04020  fffffffffffffff2  0000000000000000
 %pK, kptr = 1: ffff888048c04020  fffffffffffffff2  0000000000000000
 %pK, kptr = 2: 0000000000000000  0000000000000000  0000000000000000

Fixes: 3e5903eb9cff ("vsprintf: Prevent crash when dereferencing invalid pointers")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoMerge branch 'for-5.8/block' into for-next
Jens Axboe [Tue, 19 May 2020 15:54:32 +0000 (09:54 -0600)]
Merge branch 'for-5.8/block' into for-next

* for-5.8/block:
  block: Remove unused flush_queue_delayed in struct blk_flush_queue
  null_blk: Zero-initialize read buffers in non-memory-backed mode
  block: Document the bio_vec properties
  bio.h: Declare the arguments of the bio iteration functions const
  block: Fix type of first compat_put_{,u}long() argument
  block: merge part_{inc,dev}_in_flight into their only callers
  block: don't call part_{inc,dec}_in_flight for blk-mq devices
  block: move the blk-mq calls out of part_in_flight{,_rw}
  block: mark blk_account_io_completion static
  blk-mq: allow blk_mq_make_request to consume the q_usage_counter reference
  blk-mq: remove a pointless queue enter pair in blk_mq_alloc_request_hctx
  blk-mq: remove a pointless queue enter pair in blk_mq_alloc_request
  blk-mq: move the call to blk_queue_enter_live out of blk_mq_get_request

5 years agoblock: Remove unused flush_queue_delayed in struct blk_flush_queue
Baolin Wang [Sun, 17 May 2020 11:49:41 +0000 (19:49 +0800)]
block: Remove unused flush_queue_delayed in struct blk_flush_queue

The flush_queue_delayed was introdued to hold queue if flush is
running for non-queueable flush drive by commit 3ac0cc450870
("hold queue if flush is running for non-queueable flush drive"),
but the non mq parts of the flush code had been removed by
commit 7e992f847a08 ("block: remove non mq parts from the flush code"),
as well as removing the usage of the flush_queue_delayed flag.
Thus remove the unused flush_queue_delayed flag.

Signed-off-by: Baolin Wang <baolin.wang7@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonull_blk: Zero-initialize read buffers in non-memory-backed mode
Bart Van Assche [Tue, 19 May 2020 04:07:37 +0000 (21:07 -0700)]
null_blk: Zero-initialize read buffers in non-memory-backed mode

This patch suppresses an uninteresting KMSAN complaint without affecting
performance of the null_blk driver if CONFIG_KMSAN is disabled.

Reported-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Alexander Potapenko <glider@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: Document the bio_vec properties
Bart Van Assche [Tue, 19 May 2020 04:07:36 +0000 (21:07 -0700)]
block: Document the bio_vec properties

Since it is nontrivial that nth_page() does not have to be used for a
bio_vec, document this.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
CC: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agobio.h: Declare the arguments of the bio iteration functions const
Bart Van Assche [Tue, 19 May 2020 04:07:35 +0000 (21:07 -0700)]
bio.h: Declare the arguments of the bio iteration functions const

This change makes it possible to pass 'const struct bio *' arguments to
these functions.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: Fix type of first compat_put_{,u}long() argument
Bart Van Assche [Tue, 19 May 2020 04:07:34 +0000 (21:07 -0700)]
block: Fix type of first compat_put_{,u}long() argument

This patch fixes the following sparse warnings:

block/ioctl.c:209:16: warning: incorrect type in argument 1 (different address spaces)
block/ioctl.c:209:16:    expected void const volatile [noderef] <asn:1> *
block/ioctl.c:209:16:    got signed int [usertype] *argp
block/ioctl.c:214:16: warning: incorrect type in argument 1 (different address spaces)
block/ioctl.c:214:16:    expected void const volatile [noderef] <asn:1> *
block/ioctl.c:214:16:    got unsigned int [usertype] *argp
block/ioctl.c:666:40: warning: incorrect type in argument 1 (different address spaces)
block/ioctl.c:666:40:    expected signed int [usertype] *argp
block/ioctl.c:666:40:    got void [noderef] <asn:1> *argp
block/ioctl.c:672:41: warning: incorrect type in argument 1 (different address spaces)
block/ioctl.c:672:41:    expected unsigned int [usertype] *argp
block/ioctl.c:672:41:    got void [noderef] <asn:1> *argp

Fixes: 9b81648cb5e3 ("compat_ioctl: simplify up block/ioctl.c")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: merge part_{inc,dev}_in_flight into their only callers
Christoph Hellwig [Wed, 13 May 2020 10:49:35 +0000 (12:49 +0200)]
block: merge part_{inc,dev}_in_flight into their only callers

part_inc_in_flight and part_dec_in_flight only have one caller each, and
those callers are purely for bio based drivers.  Merge each function into
the only caller, and remove the superflous blk-mq checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: don't call part_{inc,dec}_in_flight for blk-mq devices
Christoph Hellwig [Wed, 13 May 2020 10:49:34 +0000 (12:49 +0200)]
block: don't call part_{inc,dec}_in_flight for blk-mq devices

part_inc_in_flight and part_dec_in_flight are no-ops for blk-mq queues,
so remove the calls in purely blk-mq callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: move the blk-mq calls out of part_in_flight{,_rw}
Christoph Hellwig [Wed, 13 May 2020 10:49:33 +0000 (12:49 +0200)]
block: move the blk-mq calls out of part_in_flight{,_rw}

Don't bother to call part_in_flight / part_in_flight_rw on blk-mq
devices, just call the blk-mq versions directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: mark blk_account_io_completion static
Christoph Hellwig [Wed, 13 May 2020 10:49:32 +0000 (12:49 +0200)]
block: mark blk_account_io_completion static

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblk-mq: allow blk_mq_make_request to consume the q_usage_counter reference
Christoph Hellwig [Sat, 16 May 2020 18:28:01 +0000 (20:28 +0200)]
blk-mq: allow blk_mq_make_request to consume the q_usage_counter reference

blk_mq_make_request currently needs to grab an q_usage_counter
reference when allocating a request.  This is because the block layer
grabs one before calling blk_mq_make_request, but also releases it as
soon as blk_mq_make_request returns.  Remove the blk_queue_exit call
after blk_mq_make_request returns, and instead let it consume the
reference.  This works perfectly fine for the block layer caller, just
device mapper needs an extra reference as the old problem still
persists there.  Open code blk_queue_enter_live in device mapper,
as there should be no other callers and this allows better documenting
why we do a non-try get.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblk-mq: remove a pointless queue enter pair in blk_mq_alloc_request_hctx
Christoph Hellwig [Sat, 16 May 2020 18:28:00 +0000 (20:28 +0200)]
blk-mq: remove a pointless queue enter pair in blk_mq_alloc_request_hctx

No need for two queue references.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblk-mq: remove a pointless queue enter pair in blk_mq_alloc_request
Christoph Hellwig [Sat, 16 May 2020 18:27:59 +0000 (20:27 +0200)]
blk-mq: remove a pointless queue enter pair in blk_mq_alloc_request

No need for two queue references.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblk-mq: move the call to blk_queue_enter_live out of blk_mq_get_request
Christoph Hellwig [Sat, 16 May 2020 18:27:58 +0000 (20:27 +0200)]
blk-mq: move the call to blk_queue_enter_live out of blk_mq_get_request

Move the blk_queue_enter_live calls into the callers, where they can
successively be cleaned up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoiommu: Fix deferred domain attachment
Joerg Roedel [Tue, 19 May 2020 13:03:40 +0000 (15:03 +0200)]
iommu: Fix deferred domain attachment

The IOMMU core code has support for deferring the attachment of a domain
to a device. This is needed in kdump kernels where the new domain must
not be attached to a device before the device driver takes it over.

When the AMD IOMMU driver got converted to use the dma-iommu
implementation, the deferred attaching got lost. The code in
dma-iommu.c has support for deferred attaching, but it calls into
iommu_attach_device() to actually do it. But iommu_attach_device()
will check if the device should be deferred in it code-path and do
nothing, breaking deferred attachment.

Move the is_deferred_attach() check out of the attach_device path and
into iommu_group_add_device() to make deferred attaching work from the
dma-iommu code.

Fixes: 795bbbb9b6f8 ("iommu/dma-iommu: Handle deferred devices")
Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: Tom Murphy <murphyt7@tcd.ie>
Cc: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20200519130340.14564-1-joro@8bytes.org
5 years agomtd:rawnand: brcmnand: Fix PM resume crash
Kamal Dasu [Sat, 2 May 2020 20:41:36 +0000 (16:41 -0400)]
mtd:rawnand: brcmnand: Fix PM resume crash

This change fixes crash observed on PM resume. This bug
was introduced in the change made for flash-edu support.

Fixes: a5d53ad26a8b ("mtd: rawnand: brcmnand: Add support for flash-edu for dma transfers")

Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
5 years agomtd: Fix mtd not registered due to nvmem name collision
Ricardo Ribalda Delgado [Thu, 30 Apr 2020 13:17:21 +0000 (15:17 +0200)]
mtd: Fix mtd not registered due to nvmem name collision

When the nvmem framework is enabled, a nvmem device is created per mtd
device/partition.

It is not uncommon that a device can have multiple mtd devices with
partitions that have the same name. Eg, when there DT overlay is allowed
and the same device with mtd is attached twice.

Under that circumstances, the mtd fails to register due to a name
duplication on the nvmem framework.

With this patch we use the mtdX name instead of the partition name,
which is unique.

[    8.948991] sysfs: cannot create duplicate filename '/bus/nvmem/devices/Production Data'
[    8.948992] CPU: 7 PID: 246 Comm: systemd-udevd Not tainted 5.5.0-qtec-standard #13
[    8.948993] Hardware name: AMD Dibbler/Dibbler, BIOS 05.22.04.0019 10/26/2019
[    8.948994] Call Trace:
[    8.948996]  dump_stack+0x50/0x70
[    8.948998]  sysfs_warn_dup.cold+0x17/0x2d
[    8.949000]  sysfs_do_create_link_sd.isra.0+0xc2/0xd0
[    8.949002]  bus_add_device+0x74/0x140
[    8.949004]  device_add+0x34b/0x850
[    8.949006]  nvmem_register.part.0+0x1bf/0x640
...
[    8.948926] mtd mtd8: Failed to register NVMEM device

Fixes: c4dfa25ab307 ("mtd: add support for reading MTD devices via the nvmem API")
Signed-off-by: Ricardo Ribalda Delgado <ribalda@kernel.org>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
5 years agomtd: spinand: Propagate ECC information to the MTD structure
Miquel Raynal [Wed, 13 May 2020 13:10:29 +0000 (15:10 +0200)]
mtd: spinand: Propagate ECC information to the MTD structure

This is done by default in the raw NAND core (nand_base.c) but was
missing in the SPI-NAND core. Without these two lines the ecc_strength
and ecc_step_size values are not exported to the user through sysfs.

Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI NANDs")
Cc: stable@vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
5 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux...
Linus Torvalds [Mon, 18 May 2020 18:29:21 +0000 (11:29 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/zohar/linux-integrity

Pull integrity fixes from Mimi Zohar:
 "A couple of miscellaneous bug fixes for the integrity subsystem:

  IMA:

   - Properly modify the open flags in order to calculate the file hash.

   - On systems requiring the IMA policy to be signed, the policy is
     loaded differently. Don't differentiate between "enforce" and
     either "log" or "fix" modes how the policy is loaded.

  EVM:

   - Two patches to fix an EVM race condition, normally the result of
     attempting to load an unsupported hash algorithm.

   - Use the lockless RCU version for walking an append only list"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  evm: Fix a small race in init_desc()
  evm: Fix RCU list related warnings
  ima: Fix return value of ima_write_policy()
  evm: Check also if *tfm is an error pointer in init_desc()
  ima: Set file->f_mode instead of file->f_flags in ima_calc_file_hash()

5 years agoMerge tag 'for-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon...
Linus Torvalds [Mon, 18 May 2020 17:33:13 +0000 (10:33 -0700)]
Merge tag 'for-5.7-rc7' of git://git./linux/kernel/git/linkinjeon/exfat

Pull exfat fixes from Namjae Jeon:

 - Fix potential memory leak in exfat_find

 - Set exfat's splice_write to iter_file_splice_write to fix a splice
   failure on direct-opened files

* tag 'for-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: fix possible memory leak in exfat_find()
  exfat: use iter_file_splice_write

5 years agoafs: Don't unlock fetched data pages until the op completes successfully
David Howells [Sun, 17 May 2020 20:21:05 +0000 (21:21 +0100)]
afs: Don't unlock fetched data pages until the op completes successfully

Don't call req->page_done() on each page as we finish filling it with
the data coming from the network.  Whilst this might speed up the
application a bit, it's a problem if there's a network failure and the
operation has to be reissued.

If this happens, an oops occurs because afs_readpages_page_done() clears
the pointer to each page it unlocks and when a retry happens, the
pointers to the pages it wants to fill are now NULL (and the pages have
been unlocked anyway).

Instead, wait till the operation completes successfully and only then
release all the pages after clearing any terminal gap (the server can
give us less data than we requested as we're allowed to ask for more
than is available).

KASAN produces a bug like the following, and even without KASAN, it can
oops and panic.

    BUG: KASAN: wild-memory-access in _copy_to_iter+0x323/0x5f4
    Write of size 1404 at addr 0005088000000000 by task md5sum/5235

    CPU: 0 PID: 5235 Comm: md5sum Not tainted 5.7.0-rc3-fscache+ #250
    Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
    Call Trace:
     memcpy+0x39/0x58
     _copy_to_iter+0x323/0x5f4
     __skb_datagram_iter+0x89/0x2a6
     skb_copy_datagram_iter+0x129/0x135
     rxrpc_recvmsg_data.isra.0+0x615/0xd42
     rxrpc_kernel_recv_data+0x1e9/0x3ae
     afs_extract_data+0x139/0x33a
     yfs_deliver_fs_fetch_data64+0x47a/0x91b
     afs_deliver_to_call+0x304/0x709
     afs_wait_for_call_to_complete+0x1cc/0x4ad
     yfs_fs_fetch_data+0x279/0x288
     afs_fetch_data+0x1e1/0x38d
     afs_readpages+0x593/0x72e
     read_pages+0xf5/0x21e
     __do_page_cache_readahead+0x128/0x23f
     ondemand_readahead+0x36e/0x37f
     generic_file_buffered_read+0x234/0x680
     new_sync_read+0x109/0x17e
     vfs_read+0xe6/0x138
     ksys_read+0xd8/0x14d
     do_syscall_64+0x6e/0x8a
     entry_SYSCALL_64_after_hwframe+0x49/0xb3

Fixes: 196ee9cd2d04 ("afs: Make afs_fs_fetch_data() take a list of pages")
Fixes: 30062bd13e36 ("afs: Implement YFS support in the fs client")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoio_uring: cancel work if task_work_add() fails
Jens Axboe [Mon, 18 May 2020 17:04:17 +0000 (11:04 -0600)]
io_uring: cancel work if task_work_add() fails

We currently move it to the io_wqe_manager for execution, but we cannot
safely do so as we may lack some of the state to execute it out of
context. As we cancel work anyway when the ring/task exits, just mark
this request as canceled and io_async_task_func() will do the right
thing.

Fixes: aa96bf8a9ee3 ("io_uring: use io-wq manager as backup task if task is exiting")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge branch 'for-5.8/io_uring' into for-next
Jens Axboe [Mon, 18 May 2020 15:15:33 +0000 (09:15 -0600)]
Merge branch 'for-5.8/io_uring' into for-next

* for-5.8/io_uring:
  io_uring: async task poll trigger cleanup
  io_uring: add tee(2) support
  splice: export do_tee()
  io_uring: don't repeat valid flag list
  io_uring: rename io_file_put()
  io_uring: remove req->needs_fixed_files
  io_uring: cleanup io_poll_remove_one() logic
  io_uring: file registration list and lock optimization
  io_uring: add IORING_CQ_EVENTFD_DISABLED to the CQ ring flags
  io_uring: add 'cq_flags' field for the CQ ring
  io_uring: allow POLL_ADD with double poll_wait() users
  io_uring: batch reap of dead file registrations
  io_uring: name sq thread and ref completions
  io_uring: remove duplicate semicolon at the end of line

5 years agoMerge branch 'for-5.8/drivers' into for-next
Jens Axboe [Mon, 18 May 2020 15:15:29 +0000 (09:15 -0600)]
Merge branch 'for-5.8/drivers' into for-next

* for-5.8/drivers:
  block/swim3: use set_current_state macro

5 years agoMerge branch 'for-5.8/block' into for-next
Jens Axboe [Mon, 18 May 2020 15:15:25 +0000 (09:15 -0600)]
Merge branch 'for-5.8/block' into for-next

* for-5.8/block:
  blktrace: Report pid with note messages
  block: remove the REQ_NOWAIT_INLINE flag

5 years agoACPI: EC: PM: Avoid flushing EC work when EC GPE is inactive
Rafael J. Wysocki [Fri, 15 May 2020 10:58:19 +0000 (12:58 +0200)]
ACPI: EC: PM: Avoid flushing EC work when EC GPE is inactive

Flushing the EC work while suspended to idle when the EC GPE status
is not set causes some EC wakeup events (notably power button and
lid ones) to be missed after a series of spurious wakeups on the Dell
XPS13 9360 in my office.

If that happens, the machine cannot be woken up from suspend-to-idle
by the power button or lid status change and it needs to be woken up
in some other way (eg. by a key press).

Flushing the EC work only after successful dispatching the EC GPE,
which means that its status has been set, avoids the issue, so change
the code in question accordingly.

Fixes: 7b301750f7f8 ("ACPI: EC: PM: Avoid premature returns from acpi_s2idle_wake()")
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Chris Chiu <chiu@endlessm.com>
5 years agoexfat: fix possible memory leak in exfat_find()
Wei Yongjun [Wed, 6 May 2020 14:25:54 +0000 (14:25 +0000)]
exfat: fix possible memory leak in exfat_find()

'es' is malloced from exfat_get_dentry_set() in exfat_find() and should
be freed before leaving from the error handling cases, otherwise it will
cause memory leak.

Fixes: 5f2aa075070c ("exfat: add inode operations")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
5 years agoexfat: use iter_file_splice_write
Eric Sandeen [Sat, 2 May 2020 01:34:25 +0000 (20:34 -0500)]
exfat: use iter_file_splice_write

Doing copy_file_range() on exfat with a file opened for direct IO leads
to an -EFAULT:

# xfs_io -f -d -c "truncate 32768" \
       -c "copy_range -d 16384 -l 16384 -f 0" /mnt/test/junk
copy_range: Bad address

and the reason seems to be that we go through:

default_file_splice_write
 splice_from_pipe
  __splice_from_pipe
   write_pipe_buf
    __kernel_write
     new_sync_write
      generic_file_write_iter
       generic_file_direct_write
        exfat_direct_IO
         do_blockdev_direct_IO
          iov_iter_get_pages

and land in iterate_all_kinds(), which does "return -EFAULT" for our kvec
iter.

Setting exfat's splice_write to iter_file_splice_write fixes this and lets
fsx (which originally detected the problem) run to success from
the xfstests harness.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
5 years agoLinux 5.7-rc6 v5.7-rc6
Linus Torvalds [Sun, 17 May 2020 23:48:37 +0000 (16:48 -0700)]
Linux 5.7-rc6

5 years agoio_uring: async task poll trigger cleanup
Jens Axboe [Sun, 17 May 2020 23:43:31 +0000 (17:43 -0600)]
io_uring: async task poll trigger cleanup

If the request is still hashed in io_async_task_func(), then it cannot
have been canceled and it's pointless to check. So save that check.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge tag 'for-linus-5.7-2' of git://github.com/cminyard/linux-ipmi
Linus Torvalds [Sun, 17 May 2020 23:07:30 +0000 (16:07 -0700)]
Merge tag 'for-linus-5.7-2' of git://github.com/cminyard/linux-ipmi

Pull IPMI update from Corey Minyard:
 "Convert i2c_new_device() to i2c_new_client_device()

  Wolfram Sang has asked to have this included in 5.7 so the deprecated
  API can be removed next release. There should be no functional
  difference.

  I think that entire this section of code can be removed; it is
  leftover from other things that have since changed, but this is the
  safer thing to do for now. The full removal can happen next release"

* tag 'for-linus-5.7-2' of git://github.com/cminyard/linux-ipmi:
  char: ipmi: convert to use i2c_new_client_device()

5 years agoubi: Fix seq_file usage in detailed_erase_block_info debugfs file
Richard Weinberger [Sat, 2 May 2020 12:48:02 +0000 (14:48 +0200)]
ubi: Fix seq_file usage in detailed_erase_block_info debugfs file

3bfa7e141b0b ("fs/seq_file.c: seq_read(): add info message about buggy .next functions")
showed that we don't use seq_file correctly.
So make sure that our ->next function always updates the position.

Fixes: 7bccd12d27b7 ("ubi: Add debugfs file for tracking PEB state")
Signed-off-by: Richard Weinberger <richard@nod.at>
5 years agoubifs: fix wrong use of crypto_shash_descsize()
Eric Biggers [Sat, 2 May 2020 05:59:45 +0000 (22:59 -0700)]
ubifs: fix wrong use of crypto_shash_descsize()

crypto_shash_descsize() returns the size of the shash_desc context
needed to compute the hash, not the size of the hash itself.

crypto_shash_digestsize() would be correct, or alternatively using
c->hash_len and c->hmac_desc_len which already store the correct values.
But actually it's simpler to just use stack arrays, so do that instead.

Fixes: 49525e5eecca ("ubifs: Add helper functions for authentication support")
Fixes: da8ef65f9573 ("ubifs: Authenticate replayed journal")
Cc: <stable@vger.kernel.org> # v4.20+
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
5 years agoio_uring: remove dead check in io_splice()
Jens Axboe [Sun, 17 May 2020 20:21:38 +0000 (14:21 -0600)]
io_uring: remove dead check in io_splice()

We checked for 'force_nonblock' higher up, so it's definitely false
at this point. Kill the check, it's a remnant of when we tried to do
inline splice without always punting to async context.

Fixes: 2fb3e82284fc ("io_uring: punt splice async because of inode mutex")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoio_uring: add tee(2) support
Pavel Begunkov [Sun, 17 May 2020 11:18:06 +0000 (14:18 +0300)]
io_uring: add tee(2) support

Add IORING_OP_TEE implementing tee(2) support. Almost identical to
splice bits, but without offsets.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agosplice: export do_tee()
Pavel Begunkov [Sun, 17 May 2020 11:18:05 +0000 (14:18 +0300)]
splice: export do_tee()

export do_tee() for use in io_uring

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoio_uring: don't repeat valid flag list
Pavel Begunkov [Sun, 17 May 2020 11:13:42 +0000 (14:13 +0300)]
io_uring: don't repeat valid flag list

req->flags stores all sqe->flags. After checking that sqe->flags are
valid set if IOSQE* flags, no need to double check it, just forward them
all.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoio_uring: rename io_file_put()
Pavel Begunkov [Sun, 17 May 2020 11:13:41 +0000 (14:13 +0300)]
io_uring: rename io_file_put()

io_file_put() deals with flushing state's file refs, adding "state" to
its name makes it a bit clearer. Also, avoid double check of
state->file in __io_file_get() in some cases.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoio_uring: remove req->needs_fixed_files
Pavel Begunkov [Sun, 17 May 2020 11:13:40 +0000 (14:13 +0300)]
io_uring: remove req->needs_fixed_files

A submission is "async" IIF it's done by SQPOLL thread. Instead of
passing @async flag into io_submit_sqes(), deduce it from ctx->flags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoio_uring: cleanup io_poll_remove_one() logic
Jens Axboe [Sun, 17 May 2020 19:54:12 +0000 (13:54 -0600)]
io_uring: cleanup io_poll_remove_one() logic

We only need apoll in the one section, do the juggling with the work
restoration there. This removes a special case further down as well.

No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 17 May 2020 19:33:00 +0000 (12:33 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "Some more clk driver fixes and one core framework fix:

   - A handful of TI driver fixes for bad of_node_put() and incorrect
     parent names

   - Rockchip rk3228 aclk_gpu* creation was interfering with lima GPU
     work so we use a composite clk now

   - Resuming from suspend on Tegra Jetson TK1 was broken because an
     audio PLL calculated an incorrect rate

   - A fix for devicetree probing on IM-PD1 by actually specifying a clk
     name which is required to pass clk registration

   - Avoid list corruption if registration fails for a critical clk"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: ti: clkctrl: convert subclocks to use proper names also
  clk: ti: am33xx: fix RTC clock parent
  clk: ti: clkctrl: Fix Bad of_node_put within clkctrl_get_name
  clk: tegra: Fix initial rate for pll_a on Tegra124
  clk: impd1: Look up clock-output-names
  clk: Unlink clock if failed to prepare or enable
  clk: rockchip: fix incorrect configuration of rk3228 aclk_gpu* clocks

5 years agoMerge tag 'usb-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 17 May 2020 19:31:22 +0000 (12:31 -0700)]
Merge tag 'usb-5.7-rc6' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are a number of USB fixes for 5.7-rc6

  The "largest" in here is a bunch of raw-gadget fixes and api changes
  as the driver just showed up in -rc1 and work has been done to fix up
  some uapi issues found with the original submission, before it shows
  up in a -final release.

  Other than that, a bunch of other small USB gadget fixes, xhci fixes,
  some quirks, andother tiny fixes for reported issues.

  All of these have been in linux-next with no reported issues"

* tag 'usb-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (26 commits)
  USB: gadget: fix illegal array access in binding with UDC
  usb: core: hub: limit HUB_QUIRK_DISABLE_AUTOSUSPEND to USB5534B
  USB: usbfs: fix mmap dma mismatch
  usb: host: xhci-plat: keep runtime active when removing host
  usb: xhci: Fix NULL pointer dereference when enqueuing trbs from urb sg list
  usb: cdns3: gadget: make a bunch of functions static
  usb: mtu3: constify struct debugfs_reg32
  usb: gadget: udc: atmel: Make some symbols static
  usb: raw-gadget: fix null-ptr-deref when reenabling endpoints
  usb: raw-gadget: documentation updates
  usb: raw-gadget: support stalling/halting/wedging endpoints
  usb: raw-gadget: fix gadget endpoint selection
  usb: raw-gadget: improve uapi headers comments
  usb: typec: mux: intel: Fix DP_HPD_LVL bit field
  usb: raw-gadget: fix return value of ep read ioctls
  usb: dwc3: select USB_ROLE_SWITCH
  usb: gadget: legacy: fix error return code in gncm_bind()
  usb: gadget: legacy: fix error return code in cdc_bind()
  usb: gadget: legacy: fix redundant initialization warnings
  usb: gadget: tegra-xudc: Fix idle suspend/resume
  ...

5 years agoMerge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm...
Linus Torvalds [Sun, 17 May 2020 19:23:37 +0000 (12:23 -0700)]
Merge branch 'exec-linus' of git://git./linux/kernel/git/ebiederm/user-namespace

Pull execve fix from Eric Biederman:
 "While working on my exec cleanups I found a bug in exec that I
  introduced by accident a couple of years ago. I apparently missed the
  fact that bprm->file can change.

  Now I have a very personal motive to clean up exec and make it more
  approachable.

  The change is just moving woud_dump to where it acts on the final
  bprm->file not the initial bprm->file. I have been careful and tested
  and verify this fix works"

* 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  exec: Move would_dump into flush_old_exec

5 years agoMerge tag 'objtool-urgent-2020-05-17' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 17 May 2020 19:20:14 +0000 (12:20 -0700)]
Merge tag 'objtool-urgent-2020-05-17' of git://git./linux/kernel/git/tip/tip

Pull x86 stack unwinding fix from Thomas Gleixner:
 "A single bugfix for the ORC unwinder to ensure that the error flag
  which tells the unwinding code whether a stack trace can be trusted or
  not is always set correctly.

  This was messed up by a couple of changes in the recent past"

* tag 'objtool-urgent-2020-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/unwind/orc: Fix error handling in __unwind_start()

5 years agoMerge tag 'x86_urgent_for_v5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 17 May 2020 18:08:29 +0000 (11:08 -0700)]
Merge tag 'x86_urgent_for_v5.7-rc7' of git://git./linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:
 "A single fix for early boot crashes of kernels built with gcc10 and
  stack protector enabled"

* tag 'x86_urgent_for_v5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix early boot crash on gcc-10, third try

5 years agoexec: Move would_dump into flush_old_exec
Eric W. Biederman [Sat, 16 May 2020 21:29:20 +0000 (16:29 -0500)]
exec: Move would_dump into flush_old_exec

I goofed when I added mm->user_ns support to would_dump.  I missed the
fact that in the case of binfmt_loader, binfmt_em86, binfmt_misc, and
binfmt_script bprm->file is reassigned.  Which made the move of
would_dump from setup_new_exec to __do_execve_file before exec_binprm
incorrect as it can result in would_dump running on the script instead
of the interpreter of the script.

The net result is that the code stopped making unreadable interpreters
undumpable.  Which allows them to be ptraced and written to disk
without special permissions.  Oops.

The move was necessary because the call in set_new_exec was after
bprm->mm was no longer valid.

To correct this mistake move the misplaced would_dump from
__do_execve_file into flos_old_exec, before exec_mmap is called.

I tested and confirmed that without this fix I can attach with gdb to
a script with an unreadable interpreter, and with this fix I can not.

Cc: stable@vger.kernel.org
Fixes: f84df2a6f268 ("exec: Ensure mm->user_ns contains the execed files")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
5 years agoio_uring: fix FORCE_ASYNC req preparation
Pavel Begunkov [Sun, 17 May 2020 11:02:12 +0000 (14:02 +0300)]
io_uring: fix FORCE_ASYNC req preparation

As for other not inlined requests, alloc req->io for FORCE_ASYNC reqs,
so they can be prepared properly.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoio_uring: don't prepare DRAIN reqs twice
Pavel Begunkov [Sun, 17 May 2020 11:02:11 +0000 (14:02 +0300)]
io_uring: don't prepare DRAIN reqs twice

If req->io is not NULL, it's already prepared. Don't do it again,
it's dangerous.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoio_uring: initialize ctx->sqo_wait earlier
Jens Axboe [Sun, 17 May 2020 15:20:00 +0000 (09:20 -0600)]
io_uring: initialize ctx->sqo_wait earlier

Ensure that ctx->sqo_wait is initialized as soon as the ctx is allocated,
instead of deferring it to the offload setup. This fixes a syzbot
reported lockdep complaint, which is really due to trying to wake_up
on an uninitialized wait queue:

RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441319
RDX: 0000000000000001 RSI: 0000000020000140 RDI: 000000000000047b
RBP: 0000000000010475 R08: 0000000000000001 R09: 00000000004002c8
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402260
R13: 00000000004022f0 R14: 0000000000000000 R15: 0000000000000000
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 1 PID: 7090 Comm: syz-executor222 Not tainted 5.7.0-rc1-next-20200415-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x188/0x20d lib/dump_stack.c:118
 assign_lock_key kernel/locking/lockdep.c:913 [inline]
 register_lock_class+0x1664/0x1760 kernel/locking/lockdep.c:1225
 __lock_acquire+0x104/0x4c50 kernel/locking/lockdep.c:4234
 lock_acquire+0x1f2/0x8f0 kernel/locking/lockdep.c:4934
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x8c/0xbf kernel/locking/spinlock.c:159
 __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:122
 io_cqring_ev_posted+0xa5/0x1e0 fs/io_uring.c:1160
 io_poll_remove_all fs/io_uring.c:4357 [inline]
 io_ring_ctx_wait_and_kill+0x2bc/0x5a0 fs/io_uring.c:7305
 io_uring_create fs/io_uring.c:7843 [inline]
 io_uring_setup+0x115e/0x22b0 fs/io_uring.c:7870
 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x441319
Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9

Reported-by: syzbot+8c91f5d054e998721c57@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge tag '5.7-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 17 May 2020 04:43:11 +0000 (21:43 -0700)]
Merge tag '5.7-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Three small cifs/smb3 fixes, one for stable"

* tag '5.7-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix leaked reference on requeued write
  cifs: Fix null pointer check in cifs_read
  CIFS: Spelling s/EACCESS/EACCES/

5 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sat, 16 May 2020 20:39:22 +0000 (13:39 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "A new testcase for guest debugging (gdbstub) that exposed a bunch of
  bugs, mostly for AMD processors. And a few other x86 fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mce
  KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c
  KVM: SVM: Disable AVIC before setting V_IRQ
  KVM: Introduce kvm_make_all_cpus_request_except()
  KVM: VMX: pass correct DR6 for GD userspace exit
  KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6
  KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6
  KVM: nSVM: trap #DB and #BP to userspace if guest debugging is on
  KVM: selftests: Add KVM_SET_GUEST_DEBUG test
  KVM: X86: Fix single-step with KVM_SET_GUEST_DEBUG
  KVM: X86: Set RTM for DB_VECTOR too for KVM_EXIT_DEBUG
  KVM: x86: fix DR6 delivery for various cases of #DB injection
  KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly