linux-2.6-block.git
5 years agoMerge branch 'for-5.8/block' into for-next
Jens Axboe [Wed, 13 May 2020 02:36:40 +0000 (20:36 -0600)]
Merge branch 'for-5.8/block' into for-next

* for-5.8/block:
  zonefs: use REQ_OP_ZONE_APPEND for sync DIO
  block: export bio_release_pages and bio_iov_iter_get_pages
  null_blk: Support REQ_OP_ZONE_APPEND
  scsi: sd_zbc: emulate ZONE_APPEND commands
  scsi: sd_zbc: factor out sanity checks for zoned commands
  block: Modify revalidate zones
  block: introduce blk_req_zone_write_trylock
  block: Introduce REQ_OP_ZONE_APPEND
  block: rename __bio_add_pc_page to bio_add_hw_page
  block: provide fallbacks for blk_queue_zone_is_seq and blk_queue_zone_no

5 years agozonefs: use REQ_OP_ZONE_APPEND for sync DIO
Johannes Thumshirn [Tue, 12 May 2020 08:55:54 +0000 (17:55 +0900)]
zonefs: use REQ_OP_ZONE_APPEND for sync DIO

Synchronous direct I/O to a sequential write only zone can be issued using
the new REQ_OP_ZONE_APPEND request operation. As dispatching multiple
BIOs can potentially result in reordering, we cannot support asynchronous
IO via this interface.

We also can only dispatch up to queue_max_zone_append_sectors() via the
new zone-append method and have to return a short write back to user-space
in case an IO larger than queue_max_zone_append_sectors() has been issued.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: export bio_release_pages and bio_iov_iter_get_pages
Johannes Thumshirn [Tue, 12 May 2020 08:55:53 +0000 (17:55 +0900)]
block: export bio_release_pages and bio_iov_iter_get_pages

Export bio_release_pages and bio_iov_iter_get_pages, so they can be used
from modular code.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonull_blk: Support REQ_OP_ZONE_APPEND
Damien Le Moal [Tue, 12 May 2020 08:55:52 +0000 (17:55 +0900)]
null_blk: Support REQ_OP_ZONE_APPEND

Support REQ_OP_ZONE_APPEND requests for null_blk devices with zoned
mode enabled. Use the internally tracked zone write pointer position
as the actual write position and return it using the command request
__sector field in the case of an mq device and using the command BIO
sector in the case of a BIO device.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoscsi: sd_zbc: emulate ZONE_APPEND commands
Johannes Thumshirn [Tue, 12 May 2020 08:55:51 +0000 (17:55 +0900)]
scsi: sd_zbc: emulate ZONE_APPEND commands

Emulate ZONE_APPEND for SCSI disks using a regular WRITE(16) command
with a start LBA set to the target zone write pointer position.

In order to always know the write pointer position of a sequential write
zone, the write pointer of all zones is tracked using an array of 32bits
zone write pointer offset attached to the scsi disk structure. Each
entry of the array indicate a zone write pointer position relative to
the zone start sector. The write pointer offsets are maintained in sync
with the device as follows:
1) the write pointer offset of a zone is reset to 0 when a
   REQ_OP_ZONE_RESET command completes.
2) the write pointer offset of a zone is set to the zone size when a
   REQ_OP_ZONE_FINISH command completes.
3) the write pointer offset of a zone is incremented by the number of
   512B sectors written when a write, write same or a zone append
   command completes.
4) the write pointer offset of all zones is reset to 0 when a
   REQ_OP_ZONE_RESET_ALL command completes.

Since the block layer does not write lock zones for zone append
commands, to ensure a sequential ordering of the regular write commands
used for the emulation, the target zone of a zone append command is
locked when the function sd_zbc_prepare_zone_append() is called from
sd_setup_read_write_cmnd(). If the zone write lock cannot be obtained
(e.g. a zone append is in-flight or a regular write has already locked
the zone), the zone append command dispatching is delayed by returning
BLK_STS_ZONE_RESOURCE.

To avoid the need for write locking all zones for REQ_OP_ZONE_RESET_ALL
requests, use a spinlock to protect accesses and modifications of the
zone write pointer offsets. This spinlock is initialized from sd_probe()
using the new function sd_zbc_init().

Co-developed-by: Damien Le Moal <Damien.LeMoal@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoscsi: sd_zbc: factor out sanity checks for zoned commands
Johannes Thumshirn [Tue, 12 May 2020 08:55:50 +0000 (17:55 +0900)]
scsi: sd_zbc: factor out sanity checks for zoned commands

Factor sanity checks for zoned commands from sd_zbc_setup_zone_mgmt_cmnd().

This will help with the introduction of an emulated ZONE_APPEND command.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: Modify revalidate zones
Damien Le Moal [Tue, 12 May 2020 08:55:49 +0000 (17:55 +0900)]
block: Modify revalidate zones

Modify the interface of blk_revalidate_disk_zones() to add an optional
driver callback function that a driver can use to extend processing
done during zone revalidation. The callback, if defined, is executed
with the device request queue frozen, after all zones have been
inspected.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: introduce blk_req_zone_write_trylock
Johannes Thumshirn [Tue, 12 May 2020 08:55:48 +0000 (17:55 +0900)]
block: introduce blk_req_zone_write_trylock

Introduce blk_req_zone_write_trylock(), which either grabs the write-lock
for a sequential zone or returns false, if the zone is already locked.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: Introduce REQ_OP_ZONE_APPEND
Keith Busch [Tue, 12 May 2020 08:55:47 +0000 (17:55 +0900)]
block: Introduce REQ_OP_ZONE_APPEND

Define REQ_OP_ZONE_APPEND to append-write sectors to a zone of a zoned
block device. This is a no-merge write operation.

A zone append write BIO must:
* Target a zoned block device
* Have a sector position indicating the start sector of the target zone
* The target zone must be a sequential write zone
* The BIO must not cross a zone boundary
* The BIO size must not be split to ensure that a single range of LBAs
  is written with a single command.

Implement these checks in generic_make_request_checks() using the
helper function blk_check_zone_append(). To avoid write append BIO
splitting, introduce the new max_zone_append_sectors queue limit
attribute and ensure that a BIO size is always lower than this limit.
Export this new limit through sysfs and check these limits in bio_full().

Also when a LLDD can't dispatch a request to a specific zone, it
will return BLK_STS_ZONE_RESOURCE indicating this request needs to
be delayed, e.g.  because the zone it will be dispatched to is still
write-locked. If this happens set the request aside in a local list
to continue trying dispatching requests such as READ requests or a
WRITE/ZONE_APPEND requests targetting other zones. This way we can
still keep a high queue depth without starving other requests even if
one request can't be served due to zone write-locking.

Finally, make sure that the bio sector position indicates the actual
write position as indicated by the device on completion.

Signed-off-by: Keith Busch <kbusch@kernel.org>
[ jth: added zone-append specific add_page and merge_page helpers ]
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: rename __bio_add_pc_page to bio_add_hw_page
Christoph Hellwig [Tue, 12 May 2020 08:55:46 +0000 (17:55 +0900)]
block: rename __bio_add_pc_page to bio_add_hw_page

Rename __bio_add_pc_page() to bio_add_hw_page() and explicitly pass in a
max_sectors argument.

This max_sectors argument can be used to specify constraints from the
hardware.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[ jth: rebased and made public for blk-map.c ]
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: provide fallbacks for blk_queue_zone_is_seq and blk_queue_zone_no
Johannes Thumshirn [Tue, 12 May 2020 08:55:45 +0000 (17:55 +0900)]
block: provide fallbacks for blk_queue_zone_is_seq and blk_queue_zone_no

blk_queue_zone_is_seq() and blk_queue_zone_no() have not been called with
CONFIG_BLK_DEV_ZONED disabled until now.

The introduction of REQ_OP_ZONE_APPEND will change this, so we need to
provide noop fallbacks for the !CONFIG_BLK_DEV_ZONED case.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge branch 'for-5.8/block' into for-next
Jens Axboe [Wed, 13 May 2020 02:32:50 +0000 (20:32 -0600)]
Merge branch 'for-5.8/block' into for-next

* for-5.8/block:
  block: add blk_io_schedule() for avoiding task hung in sync dio

5 years agoblock: add blk_io_schedule() for avoiding task hung in sync dio
Ming Lei [Sun, 3 May 2020 01:54:22 +0000 (09:54 +0800)]
block: add blk_io_schedule() for avoiding task hung in sync dio

Sync dio could be big, or may take long time in discard or in case of
IO failure.

We have prevented task hung in submit_bio_wait() and blk_execute_rq(),
so apply the same trick for prevent task hung from happening in sync dio.

Add helper of blk_io_schedule() and use io_schedule_timeout() to prevent
task hung warning.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Cc: Salman Qazi <sqazi@google.com>
Cc: Jesse Barnes <jsbarnes@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge branch 'for-5.8/block' into for-next
Jens Axboe [Wed, 13 May 2020 02:31:58 +0000 (20:31 -0600)]
Merge branch 'for-5.8/block' into for-next

* for-5.8/block:
  block: don't hold part0's refcount in IO path
  block: re-organize fields of 'struct hd_part'
  block: only define 'nr_sects_seq' in hd_part for 32bit SMP
  block: fix use-after-free on cached last_lookup partition

5 years agoblock: don't hold part0's refcount in IO path
Ming Lei [Fri, 8 May 2020 08:17:58 +0000 (16:17 +0800)]
block: don't hold part0's refcount in IO path

gendisk can't be gone when there is IO activity, so not hold
part0's refcount in IO path.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Cc: Yufen Yu <yuyufen@huawei.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: re-organize fields of 'struct hd_part'
Ming Lei [Fri, 8 May 2020 08:17:57 +0000 (16:17 +0800)]
block: re-organize fields of 'struct hd_part'

Put all fields accessed in IO path together at the beginning
of the struct, so that all can be fetched in single cacheline.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Cc: Yufen Yu <yuyufen@huawei.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: only define 'nr_sects_seq' in hd_part for 32bit SMP
Ming Lei [Fri, 8 May 2020 08:17:56 +0000 (16:17 +0800)]
block: only define 'nr_sects_seq' in hd_part for 32bit SMP

The seqcount of 'nr_sects_seq' is only needed in case of 32bit SMP,
so define it just for 32bit SMP.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Cc: Yufen Yu <yuyufen@huawei.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblock: fix use-after-free on cached last_lookup partition
Ming Lei [Fri, 8 May 2020 08:17:55 +0000 (16:17 +0800)]
block: fix use-after-free on cached last_lookup partition

delete_partition() clears the cached last_lookup partition. However the
.last_lookup cache may be overwritten by one IO path after it is cleared
from delete_partition(). Then another IO path may use the cached deleting
partition after hd_struct_free() is called, then use-after-free is triggered
on the cached partition.

Fixes the issue by the following approach:

1) always get the partition's refcount via hd_struct_try_get() before
setting .last_lookup

2) move clearing .last_lookup from delete_partition() to hd_struct_free()
which is the release handle of the partition's percpu-refcount, so that no
IO path can cache deleteing partition via .last_lookup.

It is one candidate approach of Yufen's patch[1] which adds overhead
in fast path by indirect lookup which may introduce one extra cacheline
in IO path. Also this patch relies on percpu-refcount's protection, and
it is easier to understand and verify.

[1] https://lore.kernel.org/linux-block/20200109013551.GB9655@ming.t460p/T/#t

Reported-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge branch 'for-5.8/block' into for-next
Jens Axboe [Wed, 13 May 2020 02:20:34 +0000 (20:20 -0600)]
Merge branch 'for-5.8/block' into for-next

* for-5.8/block:
  block: reset mapping if failed to update hardware queue count

5 years agoblock: reset mapping if failed to update hardware queue count
Weiping Zhang [Wed, 13 May 2020 00:44:05 +0000 (08:44 +0800)]
block: reset mapping if failed to update hardware queue count

When we increase hardware queue count, blk_mq_update_queue_map will
reset the mapping between cpu and hardware queue base on the hardware
queue count(set->nr_hw_queues). The mapping cannot be reset if it
encounters error in blk_mq_realloc_hw_ctxs, but the fallback flow will
continue using it, then blk_mq_map_swqueue will touch a invalid memory,
because the mapping points to a wrong hctx.

blktest block/030:

null_blk: module loaded
Increasing nr_hw_queues to 8 fails, fallback to 1
==================================================================
BUG: KASAN: null-ptr-deref in blk_mq_map_swqueue+0x2f2/0x830
Read of size 8 at addr 0000000000000128 by task nproc/8541

CPU: 5 PID: 8541 Comm: nproc Not tainted 5.7.0-rc4-dbg+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
Call Trace:
dump_stack+0xa5/0xe6
__kasan_report.cold+0x65/0xbb
kasan_report+0x45/0x60
check_memory_region+0x15e/0x1c0
__kasan_check_read+0x15/0x20
blk_mq_map_swqueue+0x2f2/0x830
__blk_mq_update_nr_hw_queues+0x3df/0x690
blk_mq_update_nr_hw_queues+0x32/0x50
nullb_device_submit_queues_store+0xde/0x160 [null_blk]
configfs_write_file+0x1c4/0x250 [configfs]
__vfs_write+0x4c/0x90
vfs_write+0x14b/0x2d0
ksys_write+0xdd/0x180
__x64_sys_write+0x47/0x50
do_syscall_64+0x6f/0x310
entry_SYSCALL_64_after_hwframe+0x49/0xb3

Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Tested-by: Bart van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge branch 'for-5.8/drivers' into for-next
Jens Axboe [Tue, 12 May 2020 17:36:06 +0000 (11:36 -0600)]
Merge branch 'for-5.8/drivers' into for-next

* for-5.8/drivers: (31 commits)
  floppy: suppress UBSAN warning in setup_rw_floppy()
  floppy: add defines for sizes of cmd & reply buffers of floppy_raw_cmd
  floppy: add FD_AUTODETECT_SIZE define for struct floppy_drive_params
  floppy: use print_hex_dump() in setup_DMA()
  floppy: cleanup: make set_fdc() always set current_drive and current_fd
  floppy: cleanup: get rid of current_reqD in favor of current_drive
  floppy: make sure to reset all FDCs upon resume()
  floppy: cleanup: do not iterate on current_fdc in do_floppy_init()
  floppy: cleanup: add a few comments about expectations in certain functions
  floppy: cleanup: do not iterate on current_fdc in DMA grab/release functions
  floppy: cleanup: make get_fdc_version() not rely on current_fdc anymore
  floppy: cleanup: make next_valid_format() not rely on current_drive anymore
  floppy: cleanup: make check_wp() not rely on current_{fdc,drive} anymore
  floppy: cleanup: make fdc_specify() not rely on current_{fdc,drive} anymore
  floppy: cleanup: make fdc_configure() not rely on current_fdc anymore
  floppy: cleanup: make perpendicular_mode() not rely on current_fdc anymore
  floppy: cleanup: make need_more_output() not rely on current_fdc anymore
  floppy: cleanup: make result() not rely on current_fdc anymore
  floppy: cleanup: make output_byte() not rely on current_fdc anymore
  floppy: cleanup: make wait_til_ready() not rely on current_fdc anymore
  ...

5 years agoMerge tag 'floppy-for-5.8' of https://github.com/evdenis/linux-floppy into for-5...
Jens Axboe [Tue, 12 May 2020 17:35:49 +0000 (11:35 -0600)]
Merge tag 'floppy-for-5.8' of https://github.com/evdenis/linux-floppy into for-5.8/drivers

Floppy patches for 5.8

Cleanups:
  - symbolic register names for x86,sparc64,sparc32,powerpc,parisc,m68k
  - split of local/global variables for drive,fdc
  - UBSAN warning suppress in setup_rw_floppy()

Changes were compile tested on arm, sparc64, powerpc, m68k. Many patches
introduce no binary changes by using defines instead of magic numbers.
The patches were also tested with syzkaller and simple write/read/format
tests on real hardware.

Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* tag 'floppy-for-5.8' of https://github.com/evdenis/linux-floppy: (31 commits)
  floppy: suppress UBSAN warning in setup_rw_floppy()
  floppy: add defines for sizes of cmd & reply buffers of floppy_raw_cmd
  floppy: add FD_AUTODETECT_SIZE define for struct floppy_drive_params
  floppy: use print_hex_dump() in setup_DMA()
  floppy: cleanup: make set_fdc() always set current_drive and current_fd
  floppy: cleanup: get rid of current_reqD in favor of current_drive
  floppy: make sure to reset all FDCs upon resume()
  floppy: cleanup: do not iterate on current_fdc in do_floppy_init()
  floppy: cleanup: add a few comments about expectations in certain functions
  floppy: cleanup: do not iterate on current_fdc in DMA grab/release functions
  floppy: cleanup: make get_fdc_version() not rely on current_fdc anymore
  floppy: cleanup: make next_valid_format() not rely on current_drive anymore
  floppy: cleanup: make check_wp() not rely on current_{fdc,drive} anymore
  floppy: cleanup: make fdc_specify() not rely on current_{fdc,drive} anymore
  floppy: cleanup: make fdc_configure() not rely on current_fdc anymore
  floppy: cleanup: make perpendicular_mode() not rely on current_fdc anymore
  floppy: cleanup: make need_more_output() not rely on current_fdc anymore
  floppy: cleanup: make result() not rely on current_fdc anymore
  floppy: cleanup: make output_byte() not rely on current_fdc anymore
  floppy: cleanup: make wait_til_ready() not rely on current_fdc anymore
  ...

5 years agofloppy: suppress UBSAN warning in setup_rw_floppy()
Denis Efremov [Fri, 1 May 2020 13:44:16 +0000 (16:44 +0300)]
floppy: suppress UBSAN warning in setup_rw_floppy()

UBSAN: array-index-out-of-bounds in drivers/block/floppy.c:1521:45
index 16 is out of range for type 'unsigned char [16]'
Call Trace:
...
 setup_rw_floppy+0x5c3/0x7f0
 floppy_ready+0x2be/0x13b0
 process_one_work+0x2c1/0x5d0
 worker_thread+0x56/0x5e0
 kthread+0x122/0x170
 ret_from_fork+0x35/0x40

From include/uapi/linux/fd.h:
struct floppy_raw_cmd {
...
unsigned char cmd_count;
unsigned char cmd[16];
unsigned char reply_count;
unsigned char reply[16];
...
}

This out-of-bounds access is intentional. The command in struct
floppy_raw_cmd may take up the space initially intended for the reply
and the reply count. It is needed for long 82078 commands such as
RESTORE, which takes 17 command bytes. Initial cmd size is not enough
and since struct setup_rw_floppy is a part of uapi we check that
cmd_count is in [0:16+1+16] in raw_cmd_copyin().

The patch adds union with original cmd,reply_count,reply fields and
fullcmd field of equivalent size. The cmd accesses are turned to
fullcmd where appropriate to suppress UBSAN warning.

Link: https://lore.kernel.org/r/20200501134416.72248-5-efremov@linux.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: add defines for sizes of cmd & reply buffers of floppy_raw_cmd
Denis Efremov [Fri, 1 May 2020 13:44:15 +0000 (16:44 +0300)]
floppy: add defines for sizes of cmd & reply buffers of floppy_raw_cmd

Use FD_RAW_CMD_SIZE, FD_RAW_REPLY_SIZE defines instead of magic numbers
for cmd & reply buffers of struct floppy_raw_cmd. Remove local to
floppy.c MAX_REPLIES define, as it is now FD_RAW_REPLY_SIZE.
FD_RAW_CMD_FULLSIZE added as we allow command to also fill reply_count
and reply fields.

Link: https://lore.kernel.org/r/20200501134416.72248-4-efremov@linux.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: add FD_AUTODETECT_SIZE define for struct floppy_drive_params
Denis Efremov [Fri, 1 May 2020 13:44:14 +0000 (16:44 +0300)]
floppy: add FD_AUTODETECT_SIZE define for struct floppy_drive_params

Use FD_AUTODETECT_SIZE for autodetect buffer size in struct
floppy_drive_params instead of a magic number.

Link: https://lore.kernel.org/r/20200501134416.72248-3-efremov@linux.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: use print_hex_dump() in setup_DMA()
Denis Efremov [Fri, 1 May 2020 13:44:13 +0000 (16:44 +0300)]
floppy: use print_hex_dump() in setup_DMA()

Remove pr_cont() and use print_hex_dump() in setup_DMA() to print the
contents of the cmd buffer.

Link: https://lore.kernel.org/r/20200501134416.72248-2-efremov@linux.com
Suggested-by: Joe Perches <joe@perches.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: make set_fdc() always set current_drive and current_fd
Willy Tarreau [Fri, 10 Apr 2020 10:19:04 +0000 (12:19 +0200)]
floppy: cleanup: make set_fdc() always set current_drive and current_fd

When called with a negative drive value, set_fdc() would stick to the
current fdc (which was assumed to reflect the current_drive's FDC). We
do not need this anymore as the last call place with a negative value
was just addressed. Let's make this function always set both current_fdc
and current_drive so that there's no more ambiguity. A few comments
stating this were added to a few non-obvious places.

Link: https://lore.kernel.org/r/20200410101904.14652-3-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: get rid of current_reqD in favor of current_drive
Willy Tarreau [Fri, 10 Apr 2020 10:19:03 +0000 (12:19 +0200)]
floppy: cleanup: get rid of current_reqD in favor of current_drive

This macro equals -1 and is used as an alternative for current_drive when
calling reschedule_timeout(), which in turn needs to remap it. This only
adds obfuscation, let's simply use current_drive.

Link: https://lore.kernel.org/r/20200410101904.14652-2-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: make sure to reset all FDCs upon resume()
Willy Tarreau [Fri, 10 Apr 2020 10:19:02 +0000 (12:19 +0200)]
floppy: make sure to reset all FDCs upon resume()

In floppy_resume() we don't properly reinitialize all FDCs, instead
we reinitialize the current FDC once per available FDC because value
-1 is passed to user_reset_fdc(). Let's simply save the current drive
and properly reinitialize each FDC.

Link: https://lore.kernel.org/r/20200410101904.14652-1-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: do not iterate on current_fdc in do_floppy_init()
Willy Tarreau [Fri, 10 Apr 2020 09:30:23 +0000 (11:30 +0200)]
floppy: cleanup: do not iterate on current_fdc in do_floppy_init()

There's no need to iterate on current_fdc in do_floppy_init() anymore,
in the first case it's only used as an array index to access fdc_state[],
so let's get rid of this confusing assignment. The second case is a bit
trickier because user_reset_fdc() needs to already know current_fdc when
called with drive==-1 due to this call chain:

    user_reset_fdc()
      lock_fdc()
        set_fdc()
           drive<0 ==> new_fdc = current_fdc

Note that current_drive is not used in this code part and may even not
match a unit belonging to current_fdc. Instead of passing -1 we can
simply pass the first drive of the FDC being initialized, which is even
cleaner as it will allow the function chain above to consistently assign
both variables.

Link: https://lore.kernel.org/r/20200410093023.14499-1-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: add a few comments about expectations in certain functions
Willy Tarreau [Tue, 31 Mar 2020 09:40:54 +0000 (11:40 +0200)]
floppy: cleanup: add a few comments about expectations in certain functions

The locking in the driver is far from being obvious, with unlocking
automatically happening at end of operations scheduled by interrupt,
especially for the error paths where one does not necessarily expect
that such an interrupt may be triggered. Let's add a few comments
about what to expect at certain places to avoid misdetecting bugs
which are not.

Link: https://lore.kernel.org/r/20200331094054.24441-24-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: do not iterate on current_fdc in DMA grab/release functions
Willy Tarreau [Tue, 31 Mar 2020 09:40:53 +0000 (11:40 +0200)]
floppy: cleanup: do not iterate on current_fdc in DMA grab/release functions

Both floppy_grab_irq_and_dma() and floppy_release_irq_and_dma() used to
iterate on the global variable while setting up or freeing resources.
Now that they exclusively rely on functions which take the fdc as an
argument, so let's not touch the global one anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-23-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: make get_fdc_version() not rely on current_fdc anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:52 +0000 (11:40 +0200)]
floppy: cleanup: make get_fdc_version() not rely on current_fdc anymore

Now the fdc is passed in argument so that the function does not
use current_fdc anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-22-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: make next_valid_format() not rely on current_drive anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:51 +0000 (11:40 +0200)]
floppy: cleanup: make next_valid_format() not rely on current_drive anymore

Now the drive is passed in argument so that the function does not
use current_drive anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-21-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: make check_wp() not rely on current_{fdc,drive} anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:50 +0000 (11:40 +0200)]
floppy: cleanup: make check_wp() not rely on current_{fdc,drive} anymore

Now the fdc and drive are passed in argument so that the function does
not use current_fdc nor current_drive anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-20-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: make fdc_specify() not rely on current_{fdc,drive} anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:49 +0000 (11:40 +0200)]
floppy: cleanup: make fdc_specify() not rely on current_{fdc,drive} anymore

Now the fdc and drive are passed in argument so that the function does
not use current_fdc nor current_drive anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-19-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: make fdc_configure() not rely on current_fdc anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:48 +0000 (11:40 +0200)]
floppy: cleanup: make fdc_configure() not rely on current_fdc anymore

Now the fdc is passed in argument so that the function does not
use current_fdc anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-18-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: make perpendicular_mode() not rely on current_fdc anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:47 +0000 (11:40 +0200)]
floppy: cleanup: make perpendicular_mode() not rely on current_fdc anymore

Now the fdc is passed in argument so that the function does not
use current_fdc anymore.

It's worth noting that there's still a single raw_cmd pointer
specific to the current fdc. It may make sense to have one per
fdc in the future. In addition, cont->done() still relies on the
current drive and current raw_cmd.

Link: https://lore.kernel.org/r/20200331094054.24441-17-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: make need_more_output() not rely on current_fdc anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:46 +0000 (11:40 +0200)]
floppy: cleanup: make need_more_output() not rely on current_fdc anymore

Now the fdc is passed in argument so that the function does not
use current_fdc anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-16-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: make result() not rely on current_fdc anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:45 +0000 (11:40 +0200)]
floppy: cleanup: make result() not rely on current_fdc anymore

Now the fdc is passed in argument so that the function does not
use current_fdc anymore.

It's worth noting that there's still a single reply_buffer[] which
will store the result for the current fdc. It may or may not make
sense to implement one buffer per fdc in the future.

Link: https://lore.kernel.org/r/20200331094054.24441-15-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: make output_byte() not rely on current_fdc anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:44 +0000 (11:40 +0200)]
floppy: cleanup: make output_byte() not rely on current_fdc anymore

Now the fdc is passed in argument so that the function does not
use current_fdc anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-14-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: make wait_til_ready() not rely on current_fdc anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:43 +0000 (11:40 +0200)]
floppy: cleanup: make wait_til_ready() not rely on current_fdc anymore

Now the fdc is passed in argument so that the function does not
use current_fdc anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-13-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: make show_floppy() not rely on current_fdc anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:42 +0000 (11:40 +0200)]
floppy: cleanup: make show_floppy() not rely on current_fdc anymore

Now the fdc is passed in argument so that the function does not
use current_fdc anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-12-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: make reset_fdc_info() not rely on current_fdc anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:41 +0000 (11:40 +0200)]
floppy: cleanup: make reset_fdc_info() not rely on current_fdc anymore

Now the fdc is passed in argument so that the function does not
use current_fdc anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-11-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: cleanup: make twaddle() not rely on current_{fdc,drive} anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:40 +0000 (11:40 +0200)]
floppy: cleanup: make twaddle() not rely on current_{fdc,drive} anymore

Now the fdc and drive are passed in argument so that the function does
not use current_fdc nor current_drive anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-10-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: use symbolic register names in the x86 port
Willy Tarreau [Tue, 31 Mar 2020 09:40:39 +0000 (11:40 +0200)]
floppy: use symbolic register names in the x86 port

Now we can use FD_STATUS and FD_DATA instead of 4 or 5, let's do
this, and also use STATUS_DMA and STATUS_READY for the status bits.

Link: https://lore.kernel.org/r/20200331094054.24441-9-w@1wt.eu
Cc: x86@kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: use symbolic register names in the sparc64 port
Willy Tarreau [Tue, 31 Mar 2020 09:40:38 +0000 (11:40 +0200)]
floppy: use symbolic register names in the sparc64 port

Now by splitting the base address from the register index we can
use the symbolic register names instead of the hard-coded numeric
values.

Link: https://lore.kernel.org/r/20200331094054.24441-8-w@1wt.eu
Cc: "David S. Miller" <davem@davemloft.net>
[willy: fix printk warnings s/%lx/%x/g in sun_82077_fd_{inb,outb}()]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: use symbolic register names in the sparc32 port
Willy Tarreau [Tue, 31 Mar 2020 09:40:37 +0000 (11:40 +0200)]
floppy: use symbolic register names in the sparc32 port

The sparc port used to be forced to rely on numeric register indexes
with their equivalent in comments. Now that they don't depend on the
IO port we can use their symbolic names.

Link: https://lore.kernel.org/r/20200331094054.24441-7-w@1wt.eu
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: use symbolic register names in the powerpc port
Willy Tarreau [Tue, 31 Mar 2020 09:40:36 +0000 (11:40 +0200)]
floppy: use symbolic register names in the powerpc port

Now we can use FD_STATUS and FD_DATA instead of 4 or 5, let's do
this, and also use STATUS_DMA and STATUS_READY for the status bits.

Link: https://lore.kernel.org/r/20200331094054.24441-6-w@1wt.eu
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: use symbolic register names in the parisc port
Willy Tarreau [Tue, 31 Mar 2020 09:40:35 +0000 (11:40 +0200)]
floppy: use symbolic register names in the parisc port

Now we can use FD_STATUS and FD_DATA instead of 4 or 5, let's do
this, and also use STATUS_DMA and STATUS_READY for the status bits.

Link: https://lore.kernel.org/r/20200331094054.24441-5-w@1wt.eu
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: use symbolic register names in the m68k port
Willy Tarreau [Tue, 31 Mar 2020 09:40:34 +0000 (11:40 +0200)]
floppy: use symbolic register names in the m68k port

Now we can use FD_STATUS and FD_DATA instead of 4 or 5, let's do
this, and also use STATUS_DMA and STATUS_READY for the status bits.

Link: https://lore.kernel.org/r/20200331094054.24441-4-w@1wt.eu
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: add references to 82077's extra registers
Willy Tarreau [Tue, 31 Mar 2020 09:40:33 +0000 (11:40 +0200)]
floppy: add references to 82077's extra registers

This controller provides extra status registers SRA and SRB as well
as a tape drive register (TDR) and a data rate select register (DSR),
which are referenced in the sparc port, so let's have their symbolic
definitions centralized.

Link: https://lore.kernel.org/r/20200331094054.24441-3-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agofloppy: split the base port from the register in I/O accesses
Willy Tarreau [Tue, 31 Mar 2020 09:40:32 +0000 (11:40 +0200)]
floppy: split the base port from the register in I/O accesses

Currently we have architecture-specific fd_inb() and fd_outb() functions
or macros, taking just a port which is in fact made of a base address and
a register. The base address is FDC-specific and derived from the local or
global "fdc" variable through the FD_IOPORT macro used in the base address
calculation.

This change splits this by explicitly passing the FDC's base address and
the register separately to fd_outb() and fd_inb(). It affects the
following archs:
  - x86, alpha, mips, powerpc, parisc, arm, m68k:
    simple remap of port -> base+reg

  - sparc32: use of reg only, since the base address was already masked
    out and the FDC controller is known from a static struct.

  - sparc64: like x86 for PCI, like sparc32 for 82077

Some archs use inline functions and others macros. This was not
unified in order to minimize the number of changes to review. For the
same reason checkpatch still spews a few warnings about things that
were already there before.

The parisc still uses hard-coded register values and could be cleaned up
by taking the register definitions.

The sparc per-controller inb/outb functions could further be refined
to explicitly take an FDC register instead of a port in argument but it
was not needed yet and may be cleaned later.

Link: https://lore.kernel.org/r/20200331094054.24441-2-w@1wt.eu
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Ian Molton <spyro@f2s.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: x86@kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
5 years agoMerge branch 'for-5.8/block' into for-next
Jens Axboe [Mon, 11 May 2020 15:08:44 +0000 (09:08 -0600)]
Merge branch 'for-5.8/block' into for-next

* for-5.8/block:
  bdi: fix up for "remove the name field in struct backing_dev_info"

5 years agobdi: fix up for "remove the name field in struct backing_dev_info"
Stephen Rothwell [Mon, 11 May 2020 04:19:30 +0000 (14:19 +1000)]
bdi: fix up for "remove the name field in struct backing_dev_info"

Fixes: 1cd925d58385 ("bdi: remove the name field in struct backing_dev_info")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge branch 'for-5.8/drivers' into for-next
Jens Axboe [Sat, 9 May 2020 22:18:54 +0000 (16:18 -0600)]
Merge branch 'for-5.8/drivers' into for-next

* for-5.8/drivers: (62 commits)
  nvme: define constants for identification values
  nvmet: align addrfam list to spec
  nvmet: centralize port enable access for configfs
  nvmet: use type-name map for address treq
  nvmet: use type-name map for ana states
  nvmet: use type-name map for address family
  nvmet: add generic type-name mapping
  nvme-multipath: stop using ->queuedata
  nvme-tcp: try to send request in queue_rq context
  nvme-tcp: avoid scheduling io_work if we are already polling
  nvme-tcp: use bh_lock in data_ready
  nvme-pci: align io queue count with allocted nvme_queue in nvme_probe
  nvme-pci: remove last_sq_tail
  nvme-pci: remove volatile cqes
  nvme: flush scan work on passthrough commands
  nvme: clean up error handling in nvme_init_ns_head
  nvme-fc: avoid gcc-10 zero-length-bounds warning
  nvmet: add ns revalidation support
  nvme: consolodate io settings
  nvme: revalidate namespace stream parameters
  ...

5 years agonvme: define constants for identification values
Keith Busch [Fri, 3 Apr 2020 17:53:46 +0000 (10:53 -0700)]
nvme: define constants for identification values

Improve code readability by defining the specification's constants that
the driver is using when decoding identification payloads.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Bart van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvmet: align addrfam list to spec
Chaitanya Kulkarni [Mon, 4 May 2020 08:56:48 +0000 (01:56 -0700)]
nvmet: align addrfam list to spec

With reference to the NVMeOF Specification (page 44, Figure 38)
discovery log page entry provides address family field. We do set the
transport type field but the adrfam field is not set when using loop
transport and also it doesn't have support in the nvme-cli. So when
reading discovery log page with a loop transport it leads to confusing
output.

As per the spec for adrfam value 254 is reserved for Intra Host
Transport i.e. loopback), we add a required macro in the protocol
header file, set default port disc addr entry's adrfam to
NVMF_ADDR_FAMILY_MAX, and update nvmet_addr_family configfs array for
show/store attribute.

Without this patch, setting adrfam to (ipv4/ipv6/ib/fc/loop/" ") we get
following output for nvme discover command from nvme-cli which is
confusing.
trtype:  loop
adrfam:  ipv4
trtype:  loop
adrfam:  ipv6
trtype:  loop
adrfam:  infiniband
trtype:  loop
adrfam:  fibre-channel
trtype:  loop # ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = loop
adrfam:  pci            # <----- pci for loop
trtype:  loop # ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = " "
adrfam:  pci            # <----- pci for unrecognized

This patch fixes above output :-
trtype:  loop
adrfam:  ipv4
trtype:  loop
adrfam:  ipv6
trtype:  loop
adrfam:  infiniband
trtype:  loop
adrfam:  fibre-channel
trtype:  loop           # ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = loop
adrfam:  loop           # <----- loop for loop
trtype:  loop # ${CFGFS_HOME}/config/nvmet/ports/adrfam = " "
adrfam:  unrecognized   # <----- unrecognized when invalid value

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvmet: centralize port enable access for configfs
Chaitanya Kulkarni [Mon, 4 May 2020 08:56:47 +0000 (01:56 -0700)]
nvmet: centralize port enable access for configfs

The configfs attributes which are supposed to set when port is disable
such as addr[addrfam|portid|traddr|treq|trsvcid|inline_data_size|trtype]
has repetitive check and generic error message printing.

This patch creates centralize helper to check and print an error
message that also accepts caller as a parameter. This makes error
message easy to parse for the user, removes the duplicate code and
makes it available for futures such scenarios.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvmet: use type-name map for address treq
Chaitanya Kulkarni [Mon, 4 May 2020 08:56:46 +0000 (01:56 -0700)]
nvmet: use type-name map for address treq

Currently nvmet_addr_treq_[store|show]() uses switch and if else
ladder for address transport requirements to string and reverse
mapping. With addtion of the generic nvmet_type_name_map structure
we can get rid of the switch and if else ladder with string
duplication.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvmet: use type-name map for ana states
Chaitanya Kulkarni [Mon, 4 May 2020 08:56:45 +0000 (01:56 -0700)]
nvmet: use type-name map for ana states

Now that we have a generic type to name map for configfs, get rid of
the nvmet_ana_state_names structure and replace it with newly added
nvmet_type_name_map.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvmet: use type-name map for address family
Chaitanya Kulkarni [Mon, 4 May 2020 08:56:44 +0000 (01:56 -0700)]
nvmet: use type-name map for address family

Right now nvmet_addr_adrfam_[store|show]() uses switch and if else
ladder for address family to string and reverse mapping which also
repeats the strings in show and store function.

With addition of generic nvmet_type_name_map structure we can now get rid
of the switch and if else ladder and string duplication.

Also, we add a newline in before found label in nvmet_addr_trtype_store()
which keeps goto label code consistent with
nvmet_allowed_hosts_drop_link(), nvmet_port_subsys_drop_link() and
nvmet_ana_group_ana_state_store().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvmet: add generic type-name mapping
Chaitanya Kulkarni [Mon, 4 May 2020 08:56:43 +0000 (01:56 -0700)]
nvmet: add generic type-name mapping

This patch adds a new type to name mapping generic structure. It
replaces nvmet_transport_name with new generic mapping structure
nvmet_transport.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme-multipath: stop using ->queuedata
Christoph Hellwig [Sun, 29 Mar 2020 17:41:38 +0000 (19:41 +0200)]
nvme-multipath: stop using ->queuedata

nvme-multipath already uses the gendisk private data, not need to
also set up the request_queue queuedata and use it in one place only.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme-tcp: try to send request in queue_rq context
Sagi Grimberg [Fri, 1 May 2020 21:25:45 +0000 (14:25 -0700)]
nvme-tcp: try to send request in queue_rq context

Today, nvme-tcp automatically schedules a send request
to a workqueue context, which is 1 more than we'd need
in case the socket buffer is wide open.

However, because we have async send activity (as a result
of r2t, or write_space callbacks), we need to synchronize
sends from possibly multiple contexts (ideally all running
on the same cpu though).

Thus, we only try to send directly from queue_rq in cases:
1. the send_list is empty
2. we can send it synchronously (i.e. not from the RX path)
3. we run on the same cpu as the queue->io_cpu to avoid
   contention on the send operation.

Proposed-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme-tcp: avoid scheduling io_work if we are already polling
Sagi Grimberg [Fri, 1 May 2020 21:25:44 +0000 (14:25 -0700)]
nvme-tcp: avoid scheduling io_work if we are already polling

When the user runs polled I/O, we shouldn't have to trigger
the workqueue to generate the receive work upon the .data_ready
upcall. This prevents a redundant context switch when the
application is already polling for completions.

Proposed-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme-tcp: use bh_lock in data_ready
Sagi Grimberg [Thu, 30 Apr 2020 20:59:32 +0000 (13:59 -0700)]
nvme-tcp: use bh_lock in data_ready

data_ready may be invoked from send context or from
softirq, so need bh locking for that.

Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme-pci: align io queue count with allocted nvme_queue in nvme_probe
Weiping Zhang [Sat, 2 May 2020 07:29:41 +0000 (15:29 +0800)]
nvme-pci: align io queue count with allocted nvme_queue in nvme_probe

Since commit 147b27e4bd08 ("nvme-pci: allocate device queues storage
space at probe"), nvme_alloc_queue does not alloc the nvme queues
itself anymore.

If the write/poll_queues module parameters are changed at runtime to
values larger than the number of allocated queues in nvme_probe,
nvme_alloc_queue will access unallocated memory.

Add a new nr_allocated_queues member to struct nvme_dev to record how
many queues were alloctated in nvme_probe to avoid using more than the
allocated queues after a reset following a change to the
write/poll_queues module parameters.

Also add nr_write_queues and nr_poll_queues members to allow refreshing
the number of write and poll queues based on a change to the module
parameters when resetting the controller.

Fixes: 147b27e4bd08 ("nvme-pci: allocate device queues storage space at probe")
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
[hch: add nvme_max_io_queues, update the commit message]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme-pci: remove last_sq_tail
Keith Busch [Mon, 27 Apr 2020 18:54:46 +0000 (11:54 -0700)]
nvme-pci: remove last_sq_tail

The nvme driver does not have enough tags to wrap the queue, and blk-mq
will no longer call commit_rqs() when there are no new submissions to
notify.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme-pci: remove volatile cqes
Keith Busch [Tue, 28 Apr 2020 14:21:56 +0000 (07:21 -0700)]
nvme-pci: remove volatile cqes

The completion queue entry is not volatile once the phase is confirmed.
Remove the volatile keywords and check the phase using the appropriate
READ_ONCE() accessor, allowing the compiler to optimize the remaining
completion path.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: flush scan work on passthrough commands
Keith Busch [Wed, 29 Apr 2020 20:31:23 +0000 (05:31 +0900)]
nvme: flush scan work on passthrough commands

If a passthrough command causes the namespace inventory or capabilities
to change, flush the scan work that handles these changes so the driver
synchronizes with the user command's effects before returning the result
to user space.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: clean up error handling in nvme_init_ns_head
Christoph Hellwig [Wed, 22 Apr 2020 07:59:08 +0000 (09:59 +0200)]
nvme: clean up error handling in nvme_init_ns_head

Use a common label for putting the nshead if needed and only convert
nvme status codes for the one case where it actually is needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme-fc: avoid gcc-10 zero-length-bounds warning
Arnd Bergmann [Thu, 30 Apr 2020 21:30:57 +0000 (23:30 +0200)]
nvme-fc: avoid gcc-10 zero-length-bounds warning

When CONFIG_ARCH_NO_SG_CHAIN is set, op->sgl[0] cannot be dereferenced,
as gcc-10 now points out:

drivers/nvme/host/fc.c: In function 'nvme_fc_init_request':
drivers/nvme/host/fc.c:1774:29: warning: array subscript 0 is outside the bounds of an interior zero-length array 'struct scatterlist[0]' [-Wzero-length-bounds]
 1774 |  op->op.fcp_req.first_sgl = &op->sgl[0];
      |                             ^~~~~~~~~~~
drivers/nvme/host/fc.c:98:21: note: while referencing 'sgl'
   98 |  struct scatterlist sgl[NVME_INLINE_SG_CNT];
      |                     ^~~

I don't know if this is a legitimate warning or a false-positive.
If this is just a false alarm, the warning is easily suppressed
by interpreting the array as a pointer.

Fixes: b1ae1a238900 ("nvme-fc: Avoid preallocating big SGL for data")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvmet: add ns revalidation support
Anthony Iliopoulos [Sun, 19 Apr 2020 23:48:50 +0000 (16:48 -0700)]
nvmet: add ns revalidation support

Add support for detecting capacity changes on nvmet blockdev and file
backed namespaces. This allows for emulating and testing online resizing
of nvme devices and filesystems on top.

Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
[chaitanya: Fix comments posted on V1]
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
[hch: reuse code a bit more]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: consolodate io settings
Keith Busch [Thu, 9 Apr 2020 16:09:08 +0000 (09:09 -0700)]
nvme: consolodate io settings

The stream parameters indicating optimal io settings were just getting
overwritten later. Rearrange the settings so the streams parameters can
be preserved if provided.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: revalidate namespace stream parameters
Keith Busch [Thu, 9 Apr 2020 16:09:07 +0000 (09:09 -0700)]
nvme: revalidate namespace stream parameters

The stream parameters are based on the currently formatted logical block
size. Recheck these parameters on namespace revalidation so the
registered constraints will be accurate.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: consolidate chunk_sectors settings
Keith Busch [Thu, 9 Apr 2020 16:09:06 +0000 (09:09 -0700)]
nvme: consolidate chunk_sectors settings

Move the quirked chunk_sectors setting to the same location as noiob so
one place registers this setting. And since the noiob value is only used
locally, remove the member from struct nvme_ns.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: revalidate after verifying identifiers
Keith Busch [Thu, 9 Apr 2020 16:09:05 +0000 (09:09 -0700)]
nvme: revalidate after verifying identifiers

If the namespace identifiers have changed, skip updating the disk
information, as that will register parameters from a mismatched
namespace.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme-multipath: set bdi capabilities once
Keith Busch [Thu, 9 Apr 2020 16:09:04 +0000 (09:09 -0700)]
nvme-multipath: set bdi capabilities once

The queues' backing device info capabilities don't change with each
namespace revalidation. Set it only when each path's request_queue
is initially added to a multipath queue.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: check namespace head shared property
Keith Busch [Thu, 9 Apr 2020 16:09:02 +0000 (09:09 -0700)]
nvme: check namespace head shared property

Reject a new shared namespace if a duplicate unshared namespace exists.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: always search for namespace head
Keith Busch [Thu, 9 Apr 2020 16:09:01 +0000 (09:09 -0700)]
nvme: always search for namespace head

Even if a namespace reports it is not capable of sharing, search the
subsystem for a matching namespace head. If found, the driver should
reject that namespace since it's coming from an invalid configuration.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: release namespace head reference on error
Keith Busch [Thu, 9 Apr 2020 16:09:00 +0000 (09:09 -0700)]
nvme: release namespace head reference on error

If a namespace identification does not match the subsystem's head for
that NSID, release the reference that was taken when the matching head
was initially found.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: unlink head after removing last namespace
Keith Busch [Thu, 9 Apr 2020 16:08:59 +0000 (09:08 -0700)]
nvme: unlink head after removing last namespace

The driver had been unlinking the namespace head from the subsystem's
list only after the last reference was released, and outside of the
list's subsys->lock protection.

There is no reason to track an empty head, so unlink the entry from the
subsystem's list when the last namespace using that head is removed and
with the mutex lock protecting the list update. The next namespace to
attach reusing the previous NSID will allocate a new head rather than
find the old head with mismatched identifiers.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: remove the magic 1024 constant in nvme_scan_ns_list
Christoph Hellwig [Sat, 4 Apr 2020 08:34:21 +0000 (10:34 +0200)]
nvme: remove the magic 1024 constant in nvme_scan_ns_list

Replace it with a value derived from the identify data and nsid sizes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: avoid an Identify Controller command for each namespace scan
Christoph Hellwig [Sat, 4 Apr 2020 08:31:35 +0000 (10:31 +0200)]
nvme: avoid an Identify Controller command for each namespace scan

The namespace lists are 0-terminated, so we don't really need the NN value
execept for the legacy sequential scan.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: factor out a nvme_ns_remove_by_nsid helper
Christoph Hellwig [Sat, 4 Apr 2020 08:30:32 +0000 (10:30 +0200)]
nvme: factor out a nvme_ns_remove_by_nsid helper

Factor out a piece of deeply indented and logicaly separate code
from nvme_scan_ns_list into a new helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: clean up nvme_scan_work
Christoph Hellwig [Sat, 4 Apr 2020 08:16:03 +0000 (10:16 +0200)]
nvme: clean up nvme_scan_work

Move the check for the supported CNS value into nvme_scan_ns_list, and
limit the life time of the identify controller allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: refine the Qemu Identify CNS quirk
Christoph Hellwig [Sat, 4 Apr 2020 08:11:28 +0000 (10:11 +0200)]
nvme: refine the Qemu Identify CNS quirk

Add a helper to check if we can use Identify CNS values > 1, and refine
the Qemu quirk to not apply to reported versions larger than 1.1, as the
Qemu implementation had been fixed by then.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvmet-fc: slight cleanup for kbuild test warnings
James Smart [Mon, 6 Apr 2020 23:55:34 +0000 (16:55 -0700)]
nvmet-fc: slight cleanup for kbuild test warnings

The kbuild tst robot flagged the following 3 issues:

Case 1)
>> drivers/nvme/target/fc.c:1201:37: warning: Either the condition
>> '!assoc' is redundant or there is possible null pointer dereference:
>> assoc. [nullPointerRedundantCheck]
>>  struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
                                       ^
>> drivers/nvme/target/fc.c:1853:7: note: Assuming that condition '!assoc'
>> is not redundant
>>   if (!assoc)
         ^
>> drivers/nvme/target/fc.c:1850:37: note: Assignment
>> 'assoc=nvmet_fc_find_target_assoc(tgtport,be64_to_cpu(
>>              rqst->associd.association_id))', assigned value is 0
>>   assoc = nvmet_fc_find_target_assoc(tgtport,
                                       ^
>> drivers/nvme/target/fc.c:1896:31: note: Calling function
>> 'nvmet_fc_delete_target_assoc', 1st argument 'assoc' value is 0
>>  nvmet_fc_delete_target_assoc(assoc);
                                 ^

The tool isn't smart enough to see that line 1854 sets a ret value which
thereafter causes the routine to exit. This occurs before any of the assoc
references, so it is not an issue. There are 2 more reportings of this
same failure.

To quiet the tool - rework the if test that does the exit to also
reference assoc.  No change in logic otherwise.

Case 2)
drivers/nvme/target/fc.c:1202:29: warning: The scope of the variable
'queue' can be reduced. [variableScope]
    struct nvmet_fc_tgt_queue *queue;
                               ^

The tool is requesting the variable be declared within the code block
that utilizes it. Ignoring this report as existing code style is fine.

Case 3)
drivers/nvme/target/fc.c:1137:16: warning: Variable 'needrandom' is
assigned a value that is never used. [unreadVariable]
       needrandom = true;
                  ^

Another parsing issue with the tool. Given that parens were not used
with the list_for_each_entry() check, it inadvertantly thinks the
break exited the outer while loop not the inner for loop.

This is not an error. But, added parens to the inner list_for_each_entry()
to quiet the tool and as it is better coding style.

-- james

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reported-by: kbuild test robot <lkp@intel.com>
CC: kbuild test robot <lkp@intel.com>
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvmet-rdma: use SRQ per completion vector
Max Gurtovoy [Wed, 19 Apr 2017 08:56:57 +0000 (11:56 +0300)]
nvmet-rdma: use SRQ per completion vector

In order to save resource allocation and utilize the completion
locality in a better way (compared to SRQ per device that exist today),
allocate Shared Receive Queues (SRQs) per completion vector. Associate
each created QP/CQ with an appropriate SRQ according to the queue index.
This association will reduce the lock contention in the fast path
(compared to SRQ per device solution) and increase the locality in
memory buffers. Add new module parameter for SRQ size to adjust it
according to the expected load. User should make sure the size is >= 256
to avoid lack of resources. Also reduce the debug level of "last WQE
reached" event that is raised when a QP is using SRQ during destruction
process to relief the log.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: remove unused parameter
Keith Busch [Fri, 3 Apr 2020 16:24:09 +0000 (09:24 -0700)]
nvme: remove unused parameter

nvme_alloc_ns_head() doesn't use the 'struct nvme_id_ns' parameter.
Remove it, and update caller accordingly.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agonvme: provide num dword helper
Keith Busch [Fri, 3 Apr 2020 16:24:01 +0000 (09:24 -0700)]
nvme: provide num dword helper

Various nvme commands use a zeroes based number of dwords field. Create
a helper function to convert byte lengths to this format.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agolpfc: nvmet: Add Send LS Request and Abort LS Request support
James Smart [Tue, 31 Mar 2020 16:50:11 +0000 (09:50 -0700)]
lpfc: nvmet: Add Send LS Request and Abort LS Request support

Now that common helpers exist, add the ability to Send an NVME LS Request
and to Abort an outstanding LS Request to the nvmet side of the driver.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agolpfc: nvmet: Add support for NVME LS request hosthandle
James Smart [Tue, 31 Mar 2020 16:50:10 +0000 (09:50 -0700)]
lpfc: nvmet: Add support for NVME LS request hosthandle

As the nvmet layer does not have the concept of a remoteport object, which
can be used to identify the entity on the other end of the fabric that is
to receive an LS, the hosthandle was introduced.  The driver passes the
hosthandle, a value representative of the remote port, with a ls request
receive. The LS request will create the association.  The transport will
remember the hosthandle for the association, and if there is a need to
initiate a LS request to the remote port for the association, the
hosthandle will be used. When the driver loses connectivity with the
remote port, it needs to notify the transport that the hosthandle is no
longer valid, allowing the transport to terminate associations related to
the hosthandle.

This patch adds support to the driver for the hosthandle. The driver will
use the ndlp pointer of the remote port for the hosthandle in calls to
nvmet_fc_rcv_ls_req().  The discovery engine is updated to invalidate the
hosthandle whenever connectivity with the remote port is lost.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agolpfc: nvme: Add Receive LS Request and Send LS Response support to nvme
James Smart [Tue, 31 Mar 2020 16:50:09 +0000 (09:50 -0700)]
lpfc: nvme: Add Receive LS Request and Send LS Response support to nvme

Now that common helpers exist, add the ability to receive NVME LS requests
to the driver. New requests will be delivered to the transport by
nvme_fc_rcv_ls_req().

In order to complete the LS, add support for Send LS Response and send
LS response completion handling to the driver.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agolpfc: Refactor Send LS Response support
James Smart [Tue, 31 Mar 2020 16:50:08 +0000 (09:50 -0700)]
lpfc: Refactor Send LS Response support

Currently, the ability to send an NVME LS response is limited to the nvmet
(controller/target) side of the driver.  In preparation of both the nvme
and nvmet sides supporting Send LS Response, rework the existing send
ls_rsp and ls_rsp completion routines such that there is common code that
can be used by both sides.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agolpfc: Refactor Send LS Abort support
James Smart [Tue, 31 Mar 2020 16:50:07 +0000 (09:50 -0700)]
lpfc: Refactor Send LS Abort support

Send LS Abort support is needed when Send LS Request is supported.

Currently, the ability to abort an NVME LS request is limited to the nvme
(host) side of the driver.  In preparation of both the nvme and nvmet sides
supporting Send LS Abort, rework the existing ls_req abort routines such
that there is common code that can be used by both sides.

While refactoring it was seen the logic in the abort routine was incorrect.
It attempted to abort all NVME LS's on the indicated port. As such, the
routine was reworked to abort only the NVME LS request that was specified.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agolpfc: Refactor Send LS Request support
James Smart [Tue, 31 Mar 2020 16:50:06 +0000 (09:50 -0700)]
lpfc: Refactor Send LS Request support

Currently, the ability to send an NVME LS request is limited to the nvme
(host) side of the driver.  In preparation of both the nvme and nvmet sides
support Send LS Request, rework the existing send ls_req and ls_req
completion routines such that there is common code that can be used by
both sides.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agolpfc: Refactor NVME LS receive handling
James Smart [Tue, 31 Mar 2020 16:50:05 +0000 (09:50 -0700)]
lpfc: Refactor NVME LS receive handling

In preparation for supporting both intiator mode and target mode
receiving NVME LS's, commonize the existing NVME LS request receive
handling found in the base driver and in the nvmet side.

Using the original lpfc_nvmet_unsol_ls_event() and
lpfc_nvme_unsol_ls_buffer() routines as a templates, commonize the
reception of an NVME LS request. The common routine will validate the LS
request, that it was received from a logged-in node, and allocate a
lpfc_async_xchg_ctx that is used to manage the LS request. The role of
the port is then inspected to determine which handler is to receive the
LS - nvme or nvmet. As such, the nvmet handler is tied back in. A handler
is created in nvme and is stubbed out.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agolpfc: Commonize lpfc_async_xchg_ctx state and flag definitions
James Smart [Tue, 31 Mar 2020 16:50:04 +0000 (09:50 -0700)]
lpfc: Commonize lpfc_async_xchg_ctx state and flag definitions

The last step of commonization is to remove the 'T' suffix from
state and flag field definitions.  This is minor, but removes the
mental association that it solely applies to nvmet use.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>