linux-2.6-block.git
8 years ago Merge branch 'for-4.12/block' into for-next
Jens Axboe [Fri, 21 Apr 2017 17:14:50 +0000 (11:14 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years ago Merge branch 'nvme-4.12' of git://git.infradead.org/nvme into for-4.12/block
Jens Axboe [Fri, 21 Apr 2017 17:14:22 +0000 (11:14 -0600)]
Merge branch 'nvme-4.12' of git://git.infradead.org/nvme into for-4.12/block

Christoph writes:

This is the current NVMe pile: virtualization extensions, lots of FC
updates and various misc bits.  There are a few more FC bits that didn't
make the cut, but we'd like to get this request out before the merge
window for sure.

8 years ago Merge branch 'for-4.12/block' into for-next
Jens Axboe [Fri, 21 Apr 2017 14:46:49 +0000 (08:46 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years ago mtip32xx: fix dereference of stack garbage
Jens Axboe [Fri, 21 Apr 2017 14:46:44 +0000 (08:46 -0600)]
mtip32xx: fix dereference of stack garbage

We need to get the command payload from the request before
we attempt to dereference it.

Fixes: 4dda4735c581 ("mtip32xx: add a status field to struct mtip_cmd")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago nvme: let dm-mpath distinguish nvme error codes
Junxiong Guan [Fri, 21 Apr 2017 10:59:07 +0000 (12:59 +0200)]
nvme: let dm-mpath distinguish nvme error codes

Currently, most IOs that fail with an NVMe error code are retried on
the other path if those IOs return EIO from the NVMe driver. This
patch lets multipath distinguish NVMe media error codes and some
generic or command-specific NVMe error codes so that multipath will
not retry those kinds of IO, which saves bandwidth.
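
As a rough illustration of the idea (not the patch itself), a
status-to-errno translation could look like the sketch below. The enum
names, the helper sketch_nvme_error_status(), and the errno chosen for
each class are assumptions made for illustration; only the status code
values follow the NVMe specification.

```c
#include <errno.h>

/* Hypothetical subset of NVMe status codes (values per the NVMe spec). */
enum sketch_nvme_sc {
	SKETCH_SC_SUCCESS        = 0x0,
	SKETCH_SC_INVALID_OPCODE = 0x1,   /* generic command status */
	SKETCH_SC_INVALID_FIELD  = 0x2,   /* generic command status */
	SKETCH_SC_LBA_RANGE      = 0x80,  /* generic command status */
	SKETCH_SC_WRITE_FAULT    = 0x280, /* media error */
	SKETCH_SC_READ_ERROR     = 0x281, /* media error */
};

/*
 * Map a raw NVMe status to a Linux errno. Only the status code and
 * status code type bits (the low 11 bits) are examined.
 */
static int sketch_nvme_error_status(unsigned int status)
{
	switch (status & 0x7ff) {
	case SKETCH_SC_SUCCESS:
		return 0;
	case SKETCH_SC_WRITE_FAULT:
	case SKETCH_SC_READ_ERROR:
		/* Assumed errno: the media is bad, another path will not help. */
		return -ENODATA;
	case SKETCH_SC_INVALID_OPCODE:
	case SKETCH_SC_INVALID_FIELD:
	case SKETCH_SC_LBA_RANGE:
		/* Assumed errno: the command itself is wrong on every path. */
		return -EINVAL;
	default:
		/* Anything else stays EIO, which multipath may retry. */
		return -EIO;
	}
}
```

With a mapping of this shape, a multipath layer only needs to treat
-EIO as a path failure and can fail the other codes straight back to
the submitter.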

Signed-off-by: Junxiong Guan <guanjunxiong@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
8 years ago nvme/pci: Poll CQ on timeout
Keith Busch [Fri, 24 Feb 2017 22:59:28 +0000 (17:59 -0500)]
nvme/pci: Poll CQ on timeout

If an IO timeout occurs, it's helpful to know if the controller did not
post a completion or the driver missed an interrupt. While we never expect
the latter, this patch will make it possible to tell the difference so
we don't have to guess.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
8 years ago nvmet_fc: Change traddr field separator to a colon
James Smart [Wed, 12 Apr 2017 22:15:18 +0000 (15:15 -0700)]
nvmet_fc: Change traddr field separator to a colon

The FC-NVME spec revised syntax to avoid comma separators.
Sync with the change in the parser for traddr on port attachments.
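
A minimal userspace sketch of parsing the colon-separated form,
assuming the address is laid out as nn-0x<hex>:pn-0x<hex> (node name,
then port name). The function name and the exact layout are
assumptions for illustration and are not taken from the patch.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Parse "nn-0x<hex>:pn-0x<hex>" into node and port names.
 * Returns 0 on success, -1 if the string does not match (for example
 * when the old comma separator is used or trailing text is present).
 */
static int sketch_parse_fc_traddr(const char *s, uint64_t *nn, uint64_t *pn)
{
	unsigned long long node, port;
	int consumed = 0;

	if (sscanf(s, "nn-0x%16llx:pn-0x%16llx%n", &node, &port, &consumed) != 2)
		return -1;
	if (s[consumed] != '\0')
		return -1;

	*nn = node;
	*pn = port;
	return 0;
}
```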

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
8 years ago nvme_fc: Add ls aborts on remote port teardown
James Smart [Tue, 11 Apr 2017 18:35:09 +0000 (11:35 -0700)]
nvme_fc: Add ls aborts on remote port teardown

Remoteport teardown never aborted the LS operations. Add support.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
8 years ago nvme_fc: Move LS's to rport
James Smart [Tue, 11 Apr 2017 18:35:08 +0000 (11:35 -0700)]
nvme_fc: Move LS's to rport

Link LS's on the remoteport rather than the controller. LS's are
between nport's. Makes more sense, especially on async teardown where
the controller is torn down regardless of the LS (LS is more of a notifier
to the target of the teardown), to have them on the remoteport.

While revising ls send/done routines, issues were seen relative to
refcounting and cleanup, especially in async path. Reworked these code
paths.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
8 years ago nvmet_fc: add missing reference in add_port
James Smart [Tue, 11 Apr 2017 18:32:32 +0000 (11:32 -0700)]
nvmet_fc: add missing reference in add_port

Add missing reference in add_port

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
8 years ago nvmet_fc: Rework target side abort handling
James Smart [Tue, 11 Apr 2017 18:32:31 +0000 (11:32 -0700)]
nvmet_fc: Rework target side abort handling

target transport:
----------------------
There are cases when there is a need to abort in-progress target
operations (writedata) so that controller termination or errors can
clean up. That can't happen currently as the abort is another target
op type, so it can't be used till the running one finishes (and it may
not).  Solve by removing the abort op type and creating a separate
downcall from the transport to the lldd to request an io to be aborted.

The transport will abort ios on queue teardown or io errors. In general
the transport tries to call the lldd abort only when the io state is
idle. Meaning: ops that transmit data (readdata or rsp) will always
finish their transmit (or the lldd will see a state on the
link or initiator port that fails the transmit) and the done call for
the operation will occur. The transport will wait for the op done
upcall before calling the abort function, and as the io is idle, the
io can be cleaned up immediately after the abort call. Similarly, ios
that are not waiting for data or transmitting data must be in the nvmet
layer being processed. The transport will wait for the nvmet layer
completion before calling the abort function, and as the io is idle,
the io can be cleaned up immediately after the abort call. As for ops
that are waiting for data (writedata), they may be outstanding
indefinitely if the lldd doesn't see a condition where the initiator
port or link is bad. In those cases, the transport will call the abort
function and wait for the lldd's op done upcall for the operation, where
it will then clean up the io.

Additionally, a new transport upcall was created so that an lldd which
receives an ABTS and matches it to an outstanding request in the
transport can abort that request. The transport expects that any
outstanding op call (readdata or writedata) will be completed by the
lldd and the operation upcall made. The transport doesn't act on the
reported abort (e.g. clean up the io) until an op done upcall occurs, a
new op is attempted, or the nvmet layer completes the io processing.

fcloop:
----------------------
Updated to support the new target apis.
On fcp io aborts from the initiator, the loopback context is updated to
NULL out the half that has completed. The initiator side is immediately
called after the abort request with an io completion (abort status).
On fcp io aborts from the target, the io is stopped and the initiator side
sees it as an aborted io. Target side ops, perhaps in progress while the
initiator side is done, continue but noop the data movement as there's no
structure on the initiator side to reference.

patch also contains:
----------------------
Revised lpfc to support the new abort api

commonized rsp buffer syncing and nulling of private data based on
calling paths.

errors in op done calls don't take action on the fod. They're bad
operations which implies the fod may be bad.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
8 years ago nvme_fcloop: split job struct from transport for req_release
James Smart [Tue, 11 Apr 2017 18:32:30 +0000 (11:32 -0700)]
nvme_fcloop: split job struct from transport for req_release

Current design has the fcloop job struct, used for both initiator and
target processing, allocated as part of the initiator request structure.
On aborts, the initiator side (based on the request) may terminate, yet
the target side wants to continue processing. The target side can't do
that if the initiator side goes away.
Revise fcloop to allocate an independent target side structure when it
starts an io from the initiator.

Added a lock to the request struct as well to synchronize pointer updates
on abort calls.

Modified target downcalls to recognize conditions where the initiator
has aborted the io (thus nulled the pointer between job structs), so
that they avoid referencing sgl lists which are gone and no longer make
upcalls to the initiator.

In conditions where the targetport is no longer connected, have the
initiator return an access failure rather than simulating a command
completion.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
8 years ago nvmet_fc: add req_release to lldd api
James Smart [Tue, 11 Apr 2017 18:32:29 +0000 (11:32 -0700)]
nvmet_fc: add req_release to lldd api

With the advent of the opdone calls changing context, the lldd can no
longer assume that once the op->done call returns for RSP operations
that the request struct is no longer being accessed.

As such, revise the lldd api for a req_release callback that the
transport will call when the job is complete. This will also be used
with abort cases.

Fixed text in api header for change in io complete semantics.

Revised lpfc to support the new req_release api.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
8 years ago nvmet_fc: add target feature flags for upcall isr contexts
James Smart [Tue, 11 Apr 2017 18:32:28 +0000 (11:32 -0700)]
nvmet_fc: add target feature flags for upcall isr contexts

Two new feature flags were added to control whether upcalls to the
transport result in context switches or stay in the calling context.

NVMET_FCTGTFEAT_CMD_IN_ISR:
  By default, if the flag is not set, the transport assumes the
  lldd is in a non-isr context and in the cpu context it should be
  for the io queue. As such, the cmd handler is called directly in the
  calling context.
  If the flag is set, indicating the upcall is an isr context, the
  transport mandates a transition to a workqueue. The workqueue assigned
  to the queue is used for the context.
NVMET_FCTGTFEAT_OPDONE_IN_ISR:
  By default, if the flag is not set, the transport assumes the
  lldd is in a non-isr context and in the cpu context it should be
  for the io queue. As such, the fcp operation done callback is called
  directly in the calling context.
  If the flag is set, indicating the upcall is an isr context, the
  transport mandates a transition to a workqueue. The workqueue assigned
  to the queue is used for the context.
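
The dispatch rule described for both flags can be sketched in plain C
as follows. Everything here is a simplified stand-in: the SKETCH_*
flag values, the fixed-size pending array standing in for a workqueue,
and the function names are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical flag bits standing in for the two feature flags. */
#define SKETCH_FEAT_CMD_IN_ISR    (1u << 0)
#define SKETCH_FEAT_OPDONE_IN_ISR (1u << 1)

#define SKETCH_MAX_PENDING 8

typedef void (*sketch_handler_t)(void *arg);

struct sketch_work {
	sketch_handler_t fn;
	void *arg;
};

/* One io queue with its (simulated) workqueue. */
struct sketch_queue {
	unsigned int features;
	struct sketch_work pending[SKETCH_MAX_PENDING];
	size_t npending;
};

/*
 * Handle an upcall from the lldd. If the relevant *_IN_ISR flag is set
 * the handler is deferred to the queue's work list, otherwise it runs
 * directly in the calling context.
 * Returns 1 if deferred, 0 if run inline, -1 if the work list is full.
 */
static int sketch_upcall(struct sketch_queue *q, unsigned int isr_flag,
			 sketch_handler_t fn, void *arg)
{
	if (q->features & isr_flag) {
		if (q->npending == SKETCH_MAX_PENDING)
			return -1;
		q->pending[q->npending].fn = fn;
		q->pending[q->npending].arg = arg;
		q->npending++;
		return 1;
	}
	fn(arg);
	return 0;
}

/* Stand-in for the workqueue thread: run everything that was deferred. */
static void sketch_run_pending(struct sketch_queue *q)
{
	for (size_t i = 0; i < q->npending; i++)
		q->pending[i].fn(q->pending[i].arg);
	q->npending = 0;
}

/* Example handler: counts how many times it ran. */
static void sketch_count(void *arg)
{
	++*(int *)arg;
}
```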

Updated lpfc for flags

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
8 years ago nvmet: convert from kmap to nvmet_copy_from_sgl
Logan Gunthorpe [Tue, 18 Apr 2017 23:32:15 +0000 (17:32 -0600)]
nvmet: convert from kmap to nvmet_copy_from_sgl

This is safer as it doesn't rely on the data being stored in
a single page in an sgl.

It also aids our effort to start phasing out users of sg_page. See [1].

For this we kmalloc some memory, copy to it and free at the end. Note:
we can't allocate this memory on the stack as the kbuild test robot
reports some frame size overflows on i386.
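
A userspace sketch of the pattern (allocate, copy out of a segmented
list, free when done). The struct sketch_sg type and both helper names
are hypothetical stand-ins, with malloc/free in place of
kmalloc/kfree; the point is only that the copy walks every segment
instead of assuming the data sits in the first one.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one scatterlist entry. */
struct sketch_sg {
	const void *addr;
	size_t len;
};

/*
 * Copy up to len bytes, starting off bytes into the list, to dst.
 * Returns the number of bytes actually copied.
 */
static size_t sketch_copy_from_sgl(const struct sketch_sg *sgl, size_t nents,
				   size_t off, void *dst, size_t len)
{
	size_t copied = 0;

	for (size_t i = 0; i < nents && copied < len; i++) {
		size_t n;

		if (off >= sgl[i].len) {	/* skip segments before off */
			off -= sgl[i].len;
			continue;
		}
		n = sgl[i].len - off;
		if (n > len - copied)
			n = len - copied;
		memcpy((char *)dst + copied, (const char *)sgl[i].addr + off, n);
		copied += n;
		off = 0;
	}
	return copied;
}

/*
 * Heap-allocate a contiguous copy of len bytes. Returns NULL if the
 * allocation fails or the list is too short. The caller frees it.
 */
static void *sketch_dup_from_sgl(const struct sketch_sg *sgl, size_t nents,
				 size_t off, size_t len)
{
	void *buf = malloc(len);

	if (!buf)
		return NULL;
	if (sketch_copy_from_sgl(sgl, nents, off, buf, len) != len) {
		free(buf);
		return NULL;
	}
	return buf;
}
```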

[1] https://lwn.net/Articles/720053/

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
8 years ago nvme: improve performance for virtual NVMe devices
Helen Koike [Mon, 10 Apr 2017 15:51:07 +0000 (12:51 -0300)]
nvme: improve performance for virtual NVMe devices

This change provides a mechanism to reduce the number of MMIO doorbell
writes for the NVMe driver. When running in a virtualized environment
like QEMU, the cost of an MMIO is quite hefty here. The main idea for
the patch is to provide the device with two memory locations:
 1) to store the doorbell values so they can be looked up without the
    doorbell MMIO write
 2) to store an event index.
I believe the doorbell value is obvious, the event index not so much.
Similar to the virtio specification, the virtual device can tell the
driver (guest OS) not to write MMIO unless you are writing past this
value.

FYI: doorbell values are written by the nvme driver (guest OS) and the
event index is written by the virtual device (host OS).
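
The "only write MMIO when writing past the event index" rule can be
sketched with the wrap-safe comparison used by virtio for the same
purpose. The struct and function names below are hypothetical, a
counter stands in for the real MMIO write, and the memory barriers a
real driver would need are left out.

```c
#include <stdint.h>

/*
 * Nonzero if moving the doorbell from old_idx to new_idx stepped past
 * event_idx, using 16-bit wrapping arithmetic.
 */
static int sketch_need_event(uint16_t event_idx, uint16_t new_idx,
			     uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) <
	       (uint16_t)(new_idx - old_idx);
}

/* Hypothetical per-queue state shared between driver and device. */
struct sketch_dbbuf {
	uint16_t shadow_db;	/* written by the driver (guest) */
	uint16_t event_idx;	/* written by the virtual device (host) */
	unsigned int mmio_writes;	/* counts simulated MMIO writes */
};

/* Update the shadow doorbell and ring the real one only when needed. */
static void sketch_ring_doorbell(struct sketch_dbbuf *d, uint16_t new_idx)
{
	uint16_t old_idx = d->shadow_db;

	d->shadow_db = new_idx;
	if (sketch_need_event(d->event_idx, new_idx, old_idx))
		d->mmio_writes++;	/* stands in for the MMIO write */
}
```

In this sketch a device that keeps up with the queue can leave the
event index ahead of the driver, so most doorbell updates touch only
ordinary memory.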

The patch implements a new admin command that will communicate where
these two memory locations reside. If the command fails, the nvme
driver will work as before without any optimizations.

Contributions:
  Eric Northup <digitaleric@google.com>
  Frank Swiderski <fes@google.com>
  Ted Tso <tytso@mit.edu>
  Keith Busch <keith.busch@intel.com>

Just to give an idea of the performance boost with the vendor
extension: running fio [1] with a stock NVMe driver I get about 200K
read IOPs; with my vendor patch I get about 1000K read IOPs. This was
running with a null device, i.e. the backing device simply returned
success on every read IO request.

[1] Running on a 4 core machine:
  fio --time_based --name=benchmark --runtime=30
  --filename=/dev/nvme0n1 --nrfiles=1 --ioengine=libaio --iodepth=32
  --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4
  --rw=randread --blocksize=4k --randrepeat=false

Signed-off-by: Rob Nelson <rlnelson@google.com>
[mlin: port for upstream]
Signed-off-by: Ming Lin <mlin@kernel.org>
[koike: updated for upstream]
Signed-off-by: Helen Koike <helen.koike@collabora.co.uk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
8 years ago nvme/pci: Don't set reserved SQ create flags
Keith Busch [Tue, 4 Apr 2017 22:18:12 +0000 (18:18 -0400)]
nvme/pci: Don't set reserved SQ create flags

The QPRIO field is only valid if weighted round robin arbitration is used,
and this driver doesn't enable that controller configuration option.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
8 years ago Merge branch 'for-4.12/block' into for-next
Jens Axboe [Fri, 21 Apr 2017 13:56:37 +0000 (07:56 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years ago blk-stat: kill blk_stat_rq_ddir()
Jens Axboe [Fri, 21 Apr 2017 13:55:42 +0000 (07:55 -0600)]
blk-stat: kill blk_stat_rq_ddir()

No point in providing and exporting this helper. There's just
one (real) user of it, just use rq_data_dir().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago Merge branch 'for-4.12/block' into for-next
Jens Axboe [Fri, 21 Apr 2017 01:53:33 +0000 (19:53 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years ago nbd: set the max segments to USHRT_MAX
Josef Bacik [Thu, 20 Apr 2017 19:47:01 +0000 (15:47 -0400)]
nbd: set the max segments to USHRT_MAX

I lack the basic understanding of what segments mean, so we were being
limited to 512kib requests even with higher max_sectors sizes set.
Setting the maximum number of segments to unlimited allows us to
actually have arbitrarily large IO's go through NBD.
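
The 512 KiB ceiling is what falls out if one assumes a default of 128
segments and a worst case of one 4 KiB page per segment: 128 * 4096 =
524288 bytes. Both numbers are assumptions used for illustration, and
the helper below is a hypothetical sketch of how the two limits
combine, not code from the patch.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Largest request that satisfies both limits, assuming every segment
 * holds at most seg_bytes and a sector is 512 bytes.
 */
static size_t sketch_max_request_bytes(size_t max_sectors,
				       size_t max_segments, size_t seg_bytes)
{
	size_t by_sectors = max_sectors * 512;
	size_t by_segments = max_segments * seg_bytes;

	return by_sectors < by_segments ? by_sectors : by_segments;
}
```

With max_sectors set for 32 MiB requests, 128 segments still cap the
result at 512 KiB, while USHRT_MAX segments let the sector limit win.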

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago Merge branch 'for-4.12/block' into for-next
Jens Axboe [Thu, 20 Apr 2017 23:28:37 +0000 (17:28 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years ago blk-mq: Remove blk_mq_sched_move_to_dispatch()
Bart Van Assche [Thu, 20 Apr 2017 23:25:19 +0000 (16:25 -0700)]
blk-mq: Remove blk_mq_sched_move_to_dispatch()

commit c13660a08c8b ("blk-mq-sched: change ->dispatch_requests()
to ->dispatch_request()") removed the last user of this function.
Hence also remove the function itself.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago Merge branch 'for-4.12/block' into for-next
Jens Axboe [Thu, 20 Apr 2017 23:23:50 +0000 (17:23 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years ago blk-mq: add might_sleep check to blk_mq_get_driver_tag()
Jens Axboe [Thu, 20 Apr 2017 23:23:13 +0000 (17:23 -0600)]
blk-mq: add might_sleep check to blk_mq_get_driver_tag()

If the caller passes in wait=true, it has to be able to block
for a driver tag. We just had a bug where flush insertion
would block on tag allocation, while we had preempt disabled.
Ensure that we catch cases like that earlier next time.

Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago Merge branch 'for-4.12/block' into for-next
Jens Axboe [Thu, 20 Apr 2017 23:11:10 +0000 (17:11 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years ago blk-mq: Fix poll_stat for new size-based bucketing.
Stephen Bates [Thu, 20 Apr 2017 22:59:11 +0000 (16:59 -0600)]
blk-mq: Fix poll_stat for new size-based bucketing.

Fixes an issue where the size of the poll_stat array in request_queue
does not match the size expected by the new size based bucketing for
IO completion polling.

Fixes: 720b8ccc4500 ("blk-mq: Add a polling specific stats function")
Signed-off-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago Merge branch 'for-4.12/block' into for-next
Jens Axboe [Thu, 20 Apr 2017 22:42:07 +0000 (16:42 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years ago blk-mq: fix schedule-while-atomic with scheduler attached
Jens Axboe [Thu, 20 Apr 2017 22:40:36 +0000 (16:40 -0600)]
blk-mq: fix schedule-while-atomic with scheduler attached

We must have dropped the ctx before we call
blk_mq_sched_insert_request() with can_block=true, otherwise we risk
that a flush request can block on insertion if we are currently out of
tags.

[   47.667190] BUG: scheduling while atomic: jbd2/sda2-8/2089/0x00000002
[   47.674493] Modules linked in: x86_pkg_temp_thermal btrfs xor zlib_deflate raid6_pq sr_mod cdre
[   47.690572] Preemption disabled at:
[   47.690584] [<ffffffff81326c7c>] blk_mq_sched_get_request+0x6c/0x280
[   47.701764] CPU: 1 PID: 2089 Comm: jbd2/sda2-8 Not tainted 4.11.0-rc7+ #271
[   47.709630] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016
[   47.718081] Call Trace:
[   47.720903]  dump_stack+0x4f/0x73
[   47.724694]  ? blk_mq_sched_get_request+0x6c/0x280
[   47.730137]  __schedule_bug+0x6c/0xc0
[   47.734314]  __schedule+0x559/0x780
[   47.738302]  schedule+0x3b/0x90
[   47.741899]  io_schedule+0x11/0x40
[   47.745788]  blk_mq_get_tag+0x167/0x2a0
[   47.750162]  ? remove_wait_queue+0x70/0x70
[   47.754901]  blk_mq_get_driver_tag+0x92/0xf0
[   47.759758]  blk_mq_sched_insert_request+0x134/0x170
[   47.765398]  ? blk_account_io_start+0xd0/0x270
[   47.770679]  blk_mq_make_request+0x1b2/0x850
[   47.775766]  generic_make_request+0xf7/0x2d0
[   47.780860]  submit_bio+0x5f/0x120
[   47.784979]  ? submit_bio+0x5f/0x120
[   47.789631]  submit_bh_wbc.isra.46+0x10d/0x130
[   47.794902]  submit_bh+0xb/0x10
[   47.798719]  journal_submit_commit_record+0x190/0x210
[   47.804686]  ? _raw_spin_unlock+0x13/0x30
[   47.809480]  jbd2_journal_commit_transaction+0x180a/0x1d00
[   47.815925]  kjournald2+0xb6/0x250
[   47.820022]  ? kjournald2+0xb6/0x250
[   47.824328]  ? remove_wait_queue+0x70/0x70
[   47.829223]  kthread+0x10e/0x140
[   47.833147]  ? commit_timeout+0x10/0x10
[   47.837742]  ? kthread_create_on_node+0x40/0x40
[   47.843122]  ret_from_fork+0x29/0x40

Fixes: a4d907b6a33b ("blk-mq: streamline blk_mq_make_request")
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago Merge branch 'for-4.12/block' into for-next
Jens Axboe [Thu, 20 Apr 2017 21:29:54 +0000 (15:29 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years ago blk-mq: Add a polling specific stats function
Stephen Bates [Fri, 7 Apr 2017 12:24:03 +0000 (06:24 -0600)]
blk-mq: Add a polling specific stats function

Rather than bucketing IO statistics based on direction only we also
bucket based on the IO size. This leads to improved polling
performance. Update the bucket callback function and use it in the
polling latency estimation.
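
One way to bucket by both direction and size is to give each
power-of-two size class a pair of buckets, one per direction. The
sketch below assumes 16 buckets, a smallest class of 512 bytes, and a
negative return for "no bucket"; the names and constants are
illustrative rather than quoted from the patch.

```c
#include <stddef.h>

/* Assumed layout: 8 size classes x 2 directions. */
#define SKETCH_POLL_BKTS 16

/* Integer log2, returns -1 for 0. */
static int sketch_ilog2(size_t v)
{
	int l = -1;

	while (v) {
		v >>= 1;
		l++;
	}
	return l;
}

/*
 * dir is 0 for reads and 1 for writes. Requests smaller than 512 bytes
 * get no bucket (-1); oversized requests share the last pair.
 */
static int sketch_poll_bucket(int dir, size_t bytes)
{
	int bucket = dir + 2 * (sketch_ilog2(bytes) - 9);

	if (bucket < 0)
		return -1;
	if (bucket >= SKETCH_POLL_BKTS)
		return dir + SKETCH_POLL_BKTS - 2;
	return bucket;
}
```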

Signed-off-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago blk-stat: convert blk-stat bucket callback to signed
Stephen Bates [Thu, 20 Apr 2017 21:29:16 +0000 (15:29 -0600)]
blk-stat: convert blk-stat bucket callback to signed

In order to allow for filtering of IO based on some other properties
of the request than direction we allow the bucket function to return
an int.

If the bucket callback returns a negative value, do not count it in the
stats accumulation.
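
A minimal sketch of that rule, with hypothetical types and names: the
callback returns an int, and the accumulator silently skips any
request for which it is negative. The bounds check on the upper end is
an extra defensive assumption of this sketch.

```c
#include <stddef.h>

struct sketch_stat {
	unsigned long nr;		/* samples counted */
	unsigned long long sum;		/* sum of sample values */
};

/* Bucket callback: a negative return means "do not count this one". */
typedef int (*sketch_bucket_fn)(const void *rq);

/* Returns 1 if the sample was accumulated, 0 if it was filtered out. */
static int sketch_stat_add(struct sketch_stat *stats, size_t nbuckets,
			   sketch_bucket_fn bucket_fn, const void *rq,
			   unsigned long long value)
{
	int bucket = bucket_fn(rq);

	if (bucket < 0 || (size_t)bucket >= nbuckets)
		return 0;
	stats[bucket].nr++;
	stats[bucket].sum += value;
	return 1;
}

/* Example callback: rq points at a direction int, only reads (0) count. */
static int sketch_reads_only(const void *rq)
{
	return *(const int *)rq == 0 ? 0 : -1;
}
```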

Signed-off-by: Stephen Bates <sbates@raithlin.com>
Fixed up Kyber scheduler stat callback.

Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago Merge branch 'for-4.12/block' into for-next
Jens Axboe [Thu, 20 Apr 2017 18:16:19 +0000 (12:16 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years ago block: remove the errors field from struct request
Christoph Hellwig [Thu, 20 Apr 2017 14:03:16 +0000 (16:03 +0200)]
block: remove the errors field from struct request

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago blktrace: remove the unused block_rq_abort tracepoint
Christoph Hellwig [Thu, 20 Apr 2017 14:03:15 +0000 (16:03 +0200)]
blktrace: remove the unused block_rq_abort tracepoint

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago swim3: remove (commented out) printing of req->errors
Christoph Hellwig [Thu, 20 Apr 2017 14:03:14 +0000 (16:03 +0200)]
swim3: remove (commented out) printing of req->errors

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago ataflop: switch from req->errors to req->error_count
Christoph Hellwig [Thu, 20 Apr 2017 14:03:13 +0000 (16:03 +0200)]
ataflop: switch from req->errors to req->error_count

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago floppy: switch from req->errors to req->error_count
Christoph Hellwig [Thu, 20 Apr 2017 14:03:12 +0000 (16:03 +0200)]
floppy: switch from req->errors to req->error_count

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago block: add a error_count field to struct request
Christoph Hellwig [Thu, 20 Apr 2017 14:03:11 +0000 (16:03 +0200)]
block: add a error_count field to struct request

This is for the legacy floppy and ataflop drivers that currently abuse
->errors for this purpose.  It's stashed away in a union to not grow
the struct size; the other fields are either used by modern drivers
for different purposes or by the I/O scheduler before queuing the I/O
to drivers.
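
The size argument can be checked with a small C11 sketch. The two
structs below are invented stand-ins (their members are not the real
struct request layout); they only show that adding an int to an
existing union of larger members leaves the struct size unchanged.

```c
#include <stddef.h>

/* Hypothetical layout before the change. */
struct sketch_request_before {
	unsigned long deadline;
	union {
		struct {
			void *icq;
			void *priv[2];
		} elv;			/* I/O scheduler, before dispatch */
		struct {
			unsigned int seq;
		} flush;		/* used by other code paths */
	};
};

/* Same layout with the legacy-only counter added to the union. */
struct sketch_request_after {
	unsigned long deadline;
	union {
		struct {
			void *icq;
			void *priv[2];
		} elv;
		struct {
			unsigned int seq;
		} flush;
		int error_count;	/* legacy floppy/ataflop drivers only */
	};
};

/* The new member overlaps existing storage, so the size is unchanged. */
_Static_assert(sizeof(struct sketch_request_after) ==
	       sizeof(struct sketch_request_before),
	       "union member must not grow the struct");
```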

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago blk-mq: simplify __blk_mq_complete_request
Christoph Hellwig [Thu, 20 Apr 2017 14:03:10 +0000 (16:03 +0200)]
blk-mq: simplify __blk_mq_complete_request

Merge blk_mq_ipi_complete_request and blk_mq_stat_add into their only
caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago blk-mq: remove the error argument to blk_mq_complete_request
Christoph Hellwig [Thu, 20 Apr 2017 14:03:09 +0000 (16:03 +0200)]
blk-mq: remove the error argument to blk_mq_complete_request

Now that all drivers that call blk_mq_complete_request have a
->complete callback we can remove the direct call to blk_mq_end_request,
as well as the error argument to blk_mq_complete_request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago xen-blkfront: don't use req->errors
Christoph Hellwig [Thu, 20 Apr 2017 14:03:08 +0000 (16:03 +0200)]
xen-blkfront: don't use req->errors

xen-blkfront is the last user of rq->errors for passing errors back to
blk-mq, and I'd like to get rid of that.  In the longer run the driver
should be moving more of the completion processing into .complete, but
this is the minimal change to move forward for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago mtip32xx: add a status field to struct mtip_cmd
Christoph Hellwig [Thu, 20 Apr 2017 14:03:07 +0000 (16:03 +0200)]
mtip32xx: add a status field to struct mtip_cmd

Instead of using req->errors, which will go away.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago nbd: don't use req->errors
Christoph Hellwig [Thu, 20 Apr 2017 14:03:06 +0000 (16:03 +0200)]
nbd: don't use req->errors

Add a nbd-specific field instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago dm mpath: don't check for req->errors
Christoph Hellwig [Thu, 20 Apr 2017 14:03:05 +0000 (16:03 +0200)]
dm mpath: don't check for req->errors

We'll get all proper errors reported through ->end_io and ->errors will
go away soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago dm rq: don't pass irrelevant error code to blk_mq_complete_request
Christoph Hellwig [Thu, 20 Apr 2017 14:03:04 +0000 (16:03 +0200)]
dm rq: don't pass irrelevant error code to blk_mq_complete_request

dm never uses rq->errors, so there is no need to pass an error argument
to blk_mq_complete_request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago null_blk: don't pass always-0 req->errors to blk_mq_complete_request
Christoph Hellwig [Thu, 20 Apr 2017 14:03:03 +0000 (16:03 +0200)]
null_blk: don't pass always-0 req->errors to blk_mq_complete_request

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago loop: zero-fill bio on the submitting cpu
Christoph Hellwig [Thu, 20 Apr 2017 14:03:02 +0000 (16:03 +0200)]
loop: zero-fill bio on the submitting cpu

In truth I've just audited which blk-mq drivers don't currently have a
complete callback, but I think this change is at least borderline useful.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago scsi: introduce a result field in struct scsi_request
Christoph Hellwig [Thu, 20 Apr 2017 14:03:01 +0000 (16:03 +0200)]
scsi: introduce a result field in struct scsi_request

This passes on the scsi_cmnd result field to users of passthrough
requests.  Currently we abuse req->errors for this purpose, but that
field will go away in its current form.

Note that the old IDE code abuses the errors field in very creative
ways and stores all kinds of different values in it.  I didn't dare
to touch this magic, so the abuses are brought forward 1:1.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago virtio_blk: don't use req->errors
Christoph Hellwig [Thu, 20 Apr 2017 14:03:00 +0000 (16:03 +0200)]
virtio_blk: don't use req->errors

Remove passing req->errors (which at that point is always 0) to
blk_mq_complete_request, and rely on the virtio status code for the
serial number passthrough request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago virtio: fix spelling of virtblk_scsi_request_done
Christoph Hellwig [Thu, 20 Apr 2017 14:02:59 +0000 (16:02 +0200)]
virtio: fix spelling of virtblk_scsi_request_done

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago nvme: make nvme_error_status private
Christoph Hellwig [Thu, 20 Apr 2017 14:02:58 +0000 (16:02 +0200)]
nvme: make nvme_error_status private

Currently it's used by the lightnvm passthrough ioctl, but we'd like to
make it private in preparation for block layer specific error codes.
Lightnvm already returns the real NVMe status anyway, so I think we can
just limit it to returning -EIO for any status set.

This will need a careful audit from the lightnvm folks, though.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago nvme: split nvme status from block req->errors
Christoph Hellwig [Thu, 20 Apr 2017 14:02:57 +0000 (16:02 +0200)]
nvme: split nvme status from block req->errors

We want our own clearly defined error field for NVMe passthrough commands,
and the request errors field is going away in its current form.

Just store the status and result field in the nvme_request field from
hardirq completion context (using a new helper) and then generate a
Linux errno for the block layer only when we actually need it.

Because we can't overload the status value with a negative error code
for cancelled commands we now have a flags field in struct nvme_request
that contains a bit for this condition.
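
A simplified sketch of that split, using invented names throughout:
the completion path only records the raw status and result, a separate
flag bit marks cancellation, and an errno is derived later only where
one is needed. The -EINTR and -EIO choices are assumptions of this
sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bit marking a cancelled command. */
#define SKETCH_REQ_CANCELLED (1u << 0)

struct sketch_nvme_request {
	uint16_t status;	/* raw NVMe status, never a negative errno */
	uint8_t flags;
	uint64_t result;
};

/* Completion-context helper: just record what the device reported. */
static void sketch_end_request(struct sketch_nvme_request *req,
			       uint16_t status, uint64_t result)
{
	req->status = status;
	req->result = result;
}

/* Cancellation is tracked in flags instead of overloading status. */
static void sketch_cancel_request(struct sketch_nvme_request *req)
{
	req->flags |= SKETCH_REQ_CANCELLED;
}

/* Derive a Linux errno only at the point where the caller needs one. */
static int sketch_req_errno(const struct sketch_nvme_request *req)
{
	if (req->flags & SKETCH_REQ_CANCELLED)
		return -EINTR;
	return req->status ? -EIO : 0;
}
```

Passthrough callers can keep reading the untouched status field, while
block layer callers go through the errno helper.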

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago nvme-fc: fix status code handling in nvme_fc_fcpio_done
Christoph Hellwig [Thu, 20 Apr 2017 14:02:56 +0000 (16:02 +0200)]
nvme-fc: fix status code handling in nvme_fc_fcpio_done

nvme_complete_async_event expects the little endian status code
including the phase bit, and a new completion handler I plan to
introduce will do so as well.

Change the status variable into the little endian format with the
phase bit used in the NVMe CQE to fix / enable this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago block: remove the blk_execute_rq return value
Christoph Hellwig [Thu, 20 Apr 2017 14:02:55 +0000 (16:02 +0200)]
block: remove the blk_execute_rq return value

The function only returns -EIO if rq->errors is non-zero, which is not
very useful and lets a large number of callers ignore the return value.

Just let the callers figure out their error themselves.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago pd: don't check blk_execute_rq return value.
Christoph Hellwig [Thu, 20 Apr 2017 14:02:54 +0000 (16:02 +0200)]
pd: don't check blk_execute_rq return value.

The driver never sets req->errors, so blk_execute_rq will always return 0.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago bdi: Drop 'parent' argument from bdi_register[_va]()
Jan Kara [Wed, 12 Apr 2017 10:24:49 +0000 (12:24 +0200)]
bdi: Drop 'parent' argument from bdi_register[_va]()

Drop 'parent' argument of bdi_register() and bdi_register_va().  It is
always NULL.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago block: Remove unused functions
Jan Kara [Wed, 12 Apr 2017 10:24:48 +0000 (12:24 +0200)]
block: Remove unused functions

Now that all backing_dev_info structures are allocated separately, we
can drop some unused functions.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agofs: Remove SB_I_DYNBDI flag
Jan Kara [Wed, 12 Apr 2017 10:24:47 +0000 (12:24 +0200)]
fs: Remove SB_I_DYNBDI flag

Now that all bdi structures filesystems use are properly refcounted, we
can remove the SB_I_DYNBDI flag.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoubifs: Convert to separately allocated bdi
Jan Kara [Wed, 12 Apr 2017 10:24:46 +0000 (12:24 +0200)]
ubifs: Convert to separately allocated bdi

Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.

CC: Richard Weinberger <richard@nod.at>
CC: Artem Bityutskiy <dedekind1@gmail.com>
CC: Adrian Hunter <adrian.hunter@intel.com>
CC: linux-mtd@lists.infradead.org
Acked-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agonfs: Convert to separately allocated bdi
Jan Kara [Wed, 12 Apr 2017 10:24:45 +0000 (12:24 +0200)]
nfs: Convert to separately allocated bdi

Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.

CC: Anna Schumaker <anna.schumaker@netapp.com>
CC: linux-nfs@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoncpfs: Convert to separately allocated bdi
Jan Kara [Wed, 12 Apr 2017 10:24:44 +0000 (12:24 +0200)]
ncpfs: Convert to separately allocated bdi

Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.

CC: Petr Vandrovec <petr@vandrovec.name>
Acked-by: Petr Vandrovec <petr@vandrovec.name>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agonilfs2: Convert to properly refcounting bdi
Jan Kara [Wed, 12 Apr 2017 10:24:43 +0000 (12:24 +0200)]
nilfs2: Convert to properly refcounting bdi

Similarly to set_bdev_super(), NILFS2 just used the block device's
reference to the bdi. Convert it to properly take its own bdi reference.
The reference will get automatically dropped on superblock destruction.

CC: linux-nilfs@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agogfs2: Convert to properly refcounting bdi
Jan Kara [Wed, 12 Apr 2017 10:24:42 +0000 (12:24 +0200)]
gfs2: Convert to properly refcounting bdi

Similarly to set_bdev_super(), GFS2 just used the block device's
reference to the bdi. Convert it to properly take its own bdi reference.
The reference will get automatically dropped on superblock destruction.

CC: Steven Whitehouse <swhiteho@redhat.com>
CC: Bob Peterson <rpeterso@redhat.com>
CC: cluster-devel@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agofuse: Get rid of bdi_initialized
Jan Kara [Wed, 12 Apr 2017 10:24:41 +0000 (12:24 +0200)]
fuse: Get rid of bdi_initialized

It is not needed anymore since bdi is initialized whenever superblock
exists.

CC: Miklos Szeredi <miklos@szeredi.hu>
CC: linux-fsdevel@vger.kernel.org
Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agofuse: Convert to separately allocated bdi
Jan Kara [Wed, 12 Apr 2017 10:24:40 +0000 (12:24 +0200)]
fuse: Convert to separately allocated bdi

Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.

CC: Miklos Szeredi <miklos@szeredi.hu>
CC: linux-fsdevel@vger.kernel.org
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoexofs: Convert to separately allocated bdi
Jan Kara [Wed, 12 Apr 2017 10:24:39 +0000 (12:24 +0200)]
exofs: Convert to separately allocated bdi

Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.

CC: Boaz Harrosh <ooo@electrozaur.com>
CC: Benny Halevy <bhalevy@primarydata.com>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agocoda: Convert to separately allocated bdi
Jan Kara [Wed, 12 Apr 2017 10:24:38 +0000 (12:24 +0200)]
coda: Convert to separately allocated bdi

Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.

CC: Jan Harkes <jaharkes@cs.cmu.edu>
CC: coda@cs.cmu.edu
CC: codalist@coda.cs.cmu.edu
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agomtd: Convert to dynamically allocated bdi infrastructure
Jan Kara [Wed, 12 Apr 2017 10:24:37 +0000 (12:24 +0200)]
mtd: Convert to dynamically allocated bdi infrastructure

MTD already allocates backing_dev_info dynamically. Convert it to use
generic infrastructure for this including proper refcounting. We drop
mtd->backing_dev_info as its only use was to pass mtd_bdi pointer from
one file into another and if we wanted to keep that in a clean way, we'd
have to make mtd hold and drop bdi reference as needed which seems
pointless for passing one global pointer...

CC: David Woodhouse <dwmw2@infradead.org>
CC: Brian Norris <computersforpeace@gmail.com>
CC: linux-mtd@lists.infradead.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoafs: Convert to separately allocated bdi
Jan Kara [Wed, 12 Apr 2017 10:24:36 +0000 (12:24 +0200)]
afs: Convert to separately allocated bdi

Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.

CC: David Howells <dhowells@redhat.com>
CC: linux-afs@lists.infradead.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoecryptfs: Convert to separately allocated bdi
Jan Kara [Wed, 12 Apr 2017 10:24:35 +0000 (12:24 +0200)]
ecryptfs: Convert to separately allocated bdi

Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.

CC: Tyler Hicks <tyhicks@canonical.com>
CC: ecryptfs@vger.kernel.org
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agocifs: Convert to separately allocated bdi
Jan Kara [Wed, 12 Apr 2017 10:24:34 +0000 (12:24 +0200)]
cifs: Convert to separately allocated bdi

Allocate struct backing_dev_info separately instead of embedding it
inside superblock. This unifies handling of bdi among users.

CC: Steve French <sfrench@samba.org>
CC: linux-cifs@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoceph: Convert to separately allocated bdi
Jan Kara [Wed, 12 Apr 2017 10:24:33 +0000 (12:24 +0200)]
ceph: Convert to separately allocated bdi

Allocate struct backing_dev_info separately instead of embedding it
inside client structure. This unifies handling of bdi among users.

CC: Ilya Dryomov <idryomov@gmail.com>
CC: "Yan, Zheng" <zyan@redhat.com>
CC: Sage Weil <sage@redhat.com>
CC: ceph-devel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agobtrfs: Convert to separately allocated bdi
Jan Kara [Wed, 12 Apr 2017 10:24:32 +0000 (12:24 +0200)]
btrfs: Convert to separately allocated bdi

Allocate struct backing_dev_info separately instead of embedding it
inside superblock. This unifies handling of bdi among users.

CC: Chris Mason <clm@fb.com>
CC: Josef Bacik <jbacik@fb.com>
CC: David Sterba <dsterba@suse.com>
CC: linux-btrfs@vger.kernel.org
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years ago9p: Convert to separately allocated bdi
Jan Kara [Wed, 12 Apr 2017 10:24:31 +0000 (12:24 +0200)]
9p: Convert to separately allocated bdi

Allocate struct backing_dev_info separately instead of embedding it
inside session. This unifies handling of bdi among users.

CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Ron Minnich <rminnich@sandia.gov>
CC: Latchesar Ionkov <lucho@ionkov.net>
CC: v9fs-developer@lists.sourceforge.net
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agolustre: Convert to separately allocated bdi
Jan Kara [Wed, 12 Apr 2017 10:24:30 +0000 (12:24 +0200)]
lustre: Convert to separately allocated bdi

Allocate struct backing_dev_info separately instead of embedding it
inside superblock. This unifies handling of bdi among users.

CC: Oleg Drokin <oleg.drokin@intel.com>
CC: Andreas Dilger <andreas.dilger@intel.com>
CC: James Simmons <jsimmons@infradead.org>
CC: lustre-devel@lists.lustre.org
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agofs: Get proper reference for s_bdi
Jan Kara [Wed, 12 Apr 2017 10:24:29 +0000 (12:24 +0200)]
fs: Get proper reference for s_bdi

So far we just relied on block device to hold a bdi reference for us
while the filesystem is mounted. While that works perfectly fine, it is
a bit awkward that we have a pointer to a refcounted structure in the
superblock without proper reference. So make s_bdi hold a proper
reference to block device's BDI. No filesystem using mount_bdev()
actually changes s_bdi so this is safe and will make bdev filesystems
work the same way as filesystems needing to set up their private bdi.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agofs: Provide infrastructure for dynamic BDIs in filesystems
Jan Kara [Wed, 12 Apr 2017 10:24:28 +0000 (12:24 +0200)]
fs: Provide infrastructure for dynamic BDIs in filesystems

Provide helper functions for setting up dynamically allocated
backing_dev_info structures for filesystems and cleaning them up on
superblock destruction.

CC: linux-mtd@lists.infradead.org
CC: linux-nfs@vger.kernel.org
CC: Petr Vandrovec <petr@vandrovec.name>
CC: linux-nilfs@vger.kernel.org
CC: cluster-devel@redhat.com
CC: osd-dev@open-osd.org
CC: codalist@coda.cs.cmu.edu
CC: linux-afs@lists.infradead.org
CC: ecryptfs@vger.kernel.org
CC: linux-cifs@vger.kernel.org
CC: ceph-devel@vger.kernel.org
CC: linux-btrfs@vger.kernel.org
CC: v9fs-developer@lists.sourceforge.net
CC: lustre-devel@lists.lustre.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agobdi: Export bdi_alloc_node() and bdi_put()
Jan Kara [Wed, 12 Apr 2017 10:24:27 +0000 (12:24 +0200)]
bdi: Export bdi_alloc_node() and bdi_put()

MTD will want to call bdi_alloc_node() and bdi_put() directly. Export
these functions.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblock: Unregister bdi on last reference drop
Jan Kara [Wed, 12 Apr 2017 10:24:26 +0000 (12:24 +0200)]
block: Unregister bdi on last reference drop

Most users will want to unregister bdi when dropping last reference to a
bdi. Only a few users (like block devices) want to play more complex
tricks with bdi registration and unregistration. So unregister bdi when
the last reference to bdi is dropped and just make sure we don't
unregister the bdi the second time if it is already unregistered.
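
The behaviour described above can be modelled in userspace roughly as
follows. This is only a sketch: struct obj and the obj_*() helpers are
hypothetical stand-ins, not the bdi API, and the counter exists purely
so the effect can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted object that can be registered. */
struct obj {
	int refcnt;
	bool registered;
};

/* Counts real unregistrations; for demonstration only. */
static int unregister_count;

static struct obj *obj_alloc(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (o)
		o->refcnt = 1;
	return o;
}

static void obj_register(struct obj *o)
{
	o->registered = true;
}

/* Idempotent: unregistering an already unregistered object is a no-op. */
static void obj_unregister(struct obj *o)
{
	if (!o->registered)
		return;
	o->registered = false;
	unregister_count++;
}

/* Drop one reference; the last drop unregisters and frees the object.
 * Returns true if the object was released. */
static bool obj_put(struct obj *o)
{
	assert(o->refcnt > 0);
	if (--o->refcnt > 0)
		return false;
	obj_unregister(o);
	free(o);
	return true;
}
```

Simple users only ever call obj_put(); users that unregister early on
their own still get exactly one unregistration.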

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agobdi: Provide bdi_register_va() and bdi_alloc()
Jan Kara [Wed, 12 Apr 2017 10:24:25 +0000 (12:24 +0200)]
bdi: Provide bdi_register_va() and bdi_alloc()

Add function that registers bdi and takes va_list instead of variable
number of arguments.

Add bdi_alloc() as simple wrapper for NUMA-unaware users allocating BDI.
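
The va_list pattern itself looks roughly like this in plain C. The
names below are hypothetical and only illustrate the split between a
variadic front end and a va_list variant that other wrappers can call:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical object that carries a formatted name. */
struct named {
	char name[32];
};

/* va_list variant: usable from any other variadic wrapper. */
static int named_set_va(struct named *n, const char *fmt, va_list args)
{
	assert(n && fmt);
	return vsnprintf(n->name, sizeof(n->name), fmt, args);
}

/* Variadic front end that simply forwards to the va_list variant. */
static int named_set(struct named *n, const char *fmt, ...)
{
	va_list args;
	int ret;

	va_start(args, fmt);
	ret = named_set_va(n, fmt, args);
	va_end(args);
	return ret;
}
```

A higher-level helper that itself takes "fmt, ..." can then call
named_set_va() directly instead of re-formatting the name first.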

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoMerge branch 'for-4.12/block' into for-next
Jens Axboe [Thu, 20 Apr 2017 15:43:06 +0000 (09:43 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years agoblk-throttle: fix unused variable warning with BLK_DEV_THROTTLING_LOW=n
Jens Axboe [Thu, 20 Apr 2017 15:41:36 +0000 (09:41 -0600)]
blk-throttle: fix unused variable warning with BLK_DEV_THROTTLING_LOW=n

We trigger this warning:

block/blk-throttle.c: In function ‘blk_throtl_bio’:
block/blk-throttle.c:2042:6: warning: variable ‘ret’ set but not used [-Wunused-but-set-variable]
  int ret;
      ^~~

since, if BLK_DEV_THROTTLING_LOW is off, we only assign 'ret' and never
check it.

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoMerge branch 'for-4.12/block' into for-next
Jens Axboe [Thu, 20 Apr 2017 15:39:18 +0000 (09:39 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years agobfq: fix compile error if CONFIG_CGROUPS=n
Jens Axboe [Thu, 20 Apr 2017 15:37:05 +0000 (09:37 -0600)]
bfq: fix compile error if CONFIG_CGROUPS=n

If we don't have CGROUPS enabled, the compile ends in the
following misery:

In file included from ../block/bfq-iosched.c:105:0:
../block/bfq-iosched.h:819:22: error: array type has incomplete element type
 extern struct cftype bfq_blkcg_legacy_files[];
                      ^
../block/bfq-iosched.h:820:22: error: array type has incomplete element type
 extern struct cftype bfq_blkg_files[];
                      ^

Move the declarations under the right ifdef.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoMerge branch 'for-4.12/block' into for-next
Jens Axboe [Thu, 20 Apr 2017 14:19:30 +0000 (08:19 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years agoblock, bfq: don't dereference bic before null checking it
Colin Ian King [Thu, 20 Apr 2017 14:07:18 +0000 (15:07 +0100)]
block, bfq: don't dereference bic before null checking it

The call to bfq_check_ioprio_change will dereference bic, however,
the null check for bic is after this call.  Move the null
check on bic to before the call to avoid any potential null
pointer dereference issues.

Detected by CoverityScan, CID#1430138 ("Dereference before null check")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoMerge branch 'for-4.12/block' into for-next
Jens Axboe [Thu, 20 Apr 2017 14:18:29 +0000 (08:18 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years agoligtnvm: fix double blk_put_queue on same queue
Rakesh Pandit [Thu, 20 Apr 2017 14:17:47 +0000 (08:17 -0600)]
ligtnvm: fix double blk_put_queue on same queue

On an error path in the NVM_DEV_CREATE ioctl, blk_put_queue is called
twice: once via blk_cleanup_queue and again via put_disk.  The
straightforward fix is to clear the queue pointer so that disk_release
never ends up calling blk_put_queue again.

  [  391.808827] WARNING: CPU: 1 PID: 1250 at lib/refcount.c:128 refcount_sub_and_test+0x70/0x80
  [  391.808830] refcount_t: underflow; use-after-free.
  [  391.808832] Modules linked in: nf_conntrack_netbios_ns............
  [  391.809052] CPU: 1 PID: 1250 Comm: nvme Not tainted.........
  [  391.809057] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
             BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
  [  391.809060] Call Trace:
  [  391.809079]  dump_stack+0x63/0x86
  [  391.809094]  __warn+0xcb/0xf0
  [  391.809103]  warn_slowpath_fmt+0x5f/0x80
  [  391.809118]  refcount_sub_and_test+0x70/0x80
  [  391.809125]  refcount_dec_and_test+0x11/0x20
  [  391.809136]  kobject_put+0x1f/0x60
  [  391.809149]  blk_put_queue+0x15/0x20
  [  391.809159]  disk_release+0xae/0xf0
  [  391.809172]  device_release+0x32/0x90
  [  391.809184]  kobject_release+0x6a/0x170
  [  391.809196]  kobject_put+0x2f/0x60
  [  391.809206]  put_disk+0x17/0x20
  [  391.809219]  nvm_ioctl_dev_create.isra.16+0x897/0xa30
  [  391.809236]  nvm_ctl_ioctl+0x23c/0x4c0
  [  391.809248]  do_vfs_ioctl+0xa3/0x5f0
  [  391.809258]  SyS_ioctl+0x79/0x90
  [  391.809271]  entry_SYSCALL_64_fastpath+0x1a/0xa9
  [  391.809280] RIP: 0033:0x7f5d3ef363c7
  [  391.809286] RSP: 002b:00007ffc72ed8d78 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
  [  391.809296] RAX: ffffffffffffffda RBX: 00007ffc72edb552 RCX: 00007f5d3ef363c7
  [  391.809301] RDX: 00007ffc72ed8d90 RSI: 0000000040804c22 RDI: 0000000000000003
  [  391.809306] RBP: 0000000000000001 R08: 0000000000000020 R09: 0000000000000001
  [  391.809311] R10: 000000000000053f R11: 0000000000000206 R12: 0000000000000000
  [  391.809316] R13: 0000000000000000 R14: 00007ffc72edb58d R15: 00007ffc72edb581

Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Fixes: 7d1ef2f408ab "lightnvm: fix cleanup order of disk on init error"
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoMerge branch 'for-4.12/block' into for-next
Jens Axboe [Wed, 19 Apr 2017 23:53:09 +0000 (17:53 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years agoblock: Optimize ioprio_best()
Bart Van Assche [Wed, 19 Apr 2017 21:01:28 +0000 (14:01 -0700)]
block: Optimize ioprio_best()

Since ioprio_best() translates IOPRIO_CLASS_NONE into IOPRIO_CLASS_BE
and since lower numerical priority values represent a higher priority
a simple numerical comparison is sufficient.
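
A userspace sketch of the idea, assuming the class is stored in the
bits above bit 13 and that the classes are numbered NONE=0, RT=1, BE=2,
IDLE=3 (constants reproduced from memory of include/linux/ioprio.h;
treat them as assumptions). ioprio_best_sketch() is a hypothetical
name, not the kernel function:

```c
#include <assert.h>

/* Assumed encoding: class in the high bits, per-class level in the low bits. */
#define IOPRIO_CLASS_SHIFT	13
#define IOPRIO_PRIO_VALUE(class, data)	(((class) << IOPRIO_CLASS_SHIFT) | (data))
#define IOPRIO_PRIO_CLASS(mask)		((mask) >> IOPRIO_CLASS_SHIFT)
#define IOPRIO_NORM		4

enum {
	IOPRIO_CLASS_NONE,
	IOPRIO_CLASS_RT,
	IOPRIO_CLASS_BE,
	IOPRIO_CLASS_IDLE,
};

/* Translate "no priority set" into the default best-effort priority. */
static unsigned short prio_or_default(unsigned short p)
{
	assert(IOPRIO_PRIO_CLASS(p) <= IOPRIO_CLASS_IDLE);
	if (IOPRIO_PRIO_CLASS(p) == IOPRIO_CLASS_NONE)
		return IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
	return p;
}

/* After the translation, the numerically smaller value is the better one. */
static unsigned short ioprio_best_sketch(unsigned short a, unsigned short b)
{
	a = prio_or_default(a);
	b = prio_or_default(b);
	return a < b ? a : b;
}
```

Because the class occupies the most significant bits, RT sorts before
BE and BE before IDLE, and within a class the lower level wins, so no
per-class comparison is needed.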

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Adam Manzanares <adam.manzanares@wdc.com>
Tested-by: Adam Manzanares <adam.manzanares@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblock: Inline blk_rq_set_prio()
Bart Van Assche [Wed, 19 Apr 2017 21:01:27 +0000 (14:01 -0700)]
block: Inline blk_rq_set_prio()

Since only a single caller remains, inline blk_rq_set_prio(). Initialize
req->ioprio even if no I/O priority has been set in the bio nor in the
I/O context.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Adam Manzanares <adam.manzanares@wdc.com>
Tested-by: Adam Manzanares <adam.manzanares@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agolightnvm: Use blk_init_request_from_bio() instead of open-coding it
Bart Van Assche [Wed, 19 Apr 2017 21:01:26 +0000 (14:01 -0700)]
lightnvm: Use blk_init_request_from_bio() instead of open-coding it

This patch changes the behavior of the lightnvm driver as follows:
* REQ_FAILFAST_MASK is set for read-ahead requests.
* If no I/O priority has been set in the bio, the I/O priority is
  copied from the I/O context.
* The rq_disk member is initialized if bio->bi_bdev != NULL.
* The bio sector offset is copied into req->__sector instead of
  retaining the value -1 set by blk_mq_alloc_request().
* req->errors is initialized to zero.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Matias Bjørling <m@bjorling.me>
Cc: Adam Manzanares <adam.manzanares@wdc.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agonull_blk: Use blk_init_request_from_bio() instead of open-coding it
Bart Van Assche [Wed, 19 Apr 2017 21:01:25 +0000 (14:01 -0700)]
null_blk: Use blk_init_request_from_bio() instead of open-coding it

This patch changes the behavior of the null_blk driver for the
LightNVM mode as follows:
* REQ_FAILFAST_MASK is set for read-ahead requests.
* If no I/O priority has been set in the bio, the I/O priority is
  copied from the I/O context.
* The rq_disk member is initialized if bio->bi_bdev != NULL.
* req->errors is initialized to zero.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Matias Bjørling <m@bjorling.me>
Cc: Adam Manzanares <adam.manzanares@wdc.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblock: Export blk_init_request_from_bio()
Bart Van Assche [Wed, 19 Apr 2017 21:01:24 +0000 (14:01 -0700)]
block: Export blk_init_request_from_bio()

Export this function such that it becomes available to block
drivers.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Matias Bjørling <m@bjorling.me>
Cc: Adam Manzanares <adam.manzanares@wdc.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoMerge branch 'for-4.12/block' into for-next
Jens Axboe [Wed, 19 Apr 2017 18:07:48 +0000 (12:07 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years agolightnvm: assume 64-bit lba numbers
Arnd Bergmann [Wed, 19 Apr 2017 17:39:13 +0000 (19:39 +0200)]
lightnvm: assume 64-bit lba numbers

The driver uses both u64 and sector_t to refer to offsets, and assigns between the
two. This causes one harmless warning when sector_t is 32-bit:

drivers/lightnvm/pblk-rb.c: In function 'pblk_rb_write_entry_gc':
include/linux/lightnvm.h:215:20: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
drivers/lightnvm/pblk-rb.c:324:22: note: in expansion of macro 'ADDR_EMPTY'

As the driver is already doing this inconsistently, changing the type
won't make it worse and is an easy way to avoid the warning.
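
The truncation behind the warning can be reproduced in userspace. This
sketch assumes the sentinel is an all-ones 64-bit value; the names
ADDR_EMPTY_SKETCH and sector32_t are stand-ins chosen for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for the driver's "empty address" sentinel. */
#define ADDR_EMPTY_SKETCH	(~0ULL)

/* Stand-in for sector_t on a configuration where it is 32 bits wide. */
typedef uint32_t sector32_t;

/* Storing the sentinel in a 32-bit variable silently drops the top half. */
static void mark_empty32(sector32_t *lba)
{
	assert(lba);
	*lba = (sector32_t)ADDR_EMPTY_SKETCH;
}

/* A 64-bit variable keeps the sentinel intact. */
static void mark_empty64(uint64_t *lba)
{
	assert(lba);
	*lba = ADDR_EMPTY_SKETCH;
}

/* Compare against the full 64-bit sentinel. */
static int is_empty(uint64_t lba)
{
	return lba == ADDR_EMPTY_SKETCH;
}
```

With the 32-bit type the stored value widens back to 0xffffffff, which
no longer compares equal to the sentinel; using a 64-bit type throughout
avoids both the warning and that mismatch.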

Fixes: a4bd217b4326 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoMerge branch 'for-4.12/block' into for-next
Jens Axboe [Wed, 19 Apr 2017 16:27:04 +0000 (10:27 -0600)]
Merge branch 'for-4.12/block' into for-next

8 years agoblock: make __blk_end_bidi_request private
Christoph Hellwig [Wed, 12 Apr 2017 10:13:59 +0000 (12:13 +0200)]
block: make __blk_end_bidi_request private

blk_insert_flush should be using __blk_end_request to start with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblock: remove blk_end_request_cur
Christoph Hellwig [Wed, 12 Apr 2017 10:13:58 +0000 (12:13 +0200)]
block: remove blk_end_request_cur

This function is not used anywhere in the kernel.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>