linux-2.6-block.git
5 years agoIB/uverbs: Allow all DESTROY commands to succeed after disassociate
Jason Gunthorpe [Thu, 26 Jul 2018 03:40:20 +0000 (21:40 -0600)]
IB/uverbs: Allow all DESTROY commands to succeed after disassociate

The disassociate function was broken by design because it failed all
commands. This prevents userspace from calling destroy on a uobject after
it has detected a device fatal error and thus reclaiming the resources in
userspace is prevented.

This fix is now straightforward, when anything destroys a uobject that is
not the user the object remains on the IDR with a NULL context and object
pointer. All lookup locking modes other than DESTROY will fail. When the
user ultimately calls the destroy function it is simply dropped from the
IDR while any related information is returned.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Do not block disassociate during write()
Jason Gunthorpe [Thu, 26 Jul 2018 03:40:19 +0000 (21:40 -0600)]
IB/uverbs: Do not block disassociate during write()

Now that all the callbacks are safe to run concurrently with
disassociation this test can be eliminated. The ufile core infrastructure
becomes entirely self contained and is not sensitive to disassociation.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Do not pass struct ib_device to the ioctl methods
Jason Gunthorpe [Thu, 26 Jul 2018 03:40:18 +0000 (21:40 -0600)]
IB/uverbs: Do not pass struct ib_device to the ioctl methods

This does the same as the patch before, except for ioctl. The rules are
the same, but for the ioctl methods the core code handles setting up the
uobject.

- Retrieve the ib_dev from the uobject->context->device. This is
  safe under ioctl as the core has already done rdma_alloc_begin_uobject
  and so CREATE calls are entirely protected by the rwsem.
- Retrieve the ib_dev from uobject->object
- Call ib_uverbs_get_ucontext()

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Do not pass struct ib_device to the write based methods
Jason Gunthorpe [Thu, 26 Jul 2018 03:40:17 +0000 (21:40 -0600)]
IB/uverbs: Do not pass struct ib_device to the write based methods

This is a step to get rid of the global check for disassociation. In this
model, the ib_dev is not proven to be valid by the core code and cannot be
provided to the method. Instead, every method decides if it is able to
run after disassociation and obtains the ib_dev using one of three
different approaches:

- Call srcu_dereference on the udevice's ib_dev. As before, this means
  the method cannot be called after disassociation begins.
  (eg alloc ucontext)
- Retrieve the ib_dev from the ucontext, via ib_uverbs_get_ucontext()
- Retrieve the ib_dev from the uobject->object after checking
  under SRCU if disassociation has started (eg uobj_get)

Largely, the code is all ready for this, the main work is to provide a
ib_dev after calling uobj_alloc(). The few other places simply use
ib_uverbs_get_ucontext() to get the ib_dev.

This flexibility will let the next patches allow destroy to operate
after disassociation.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Lower the test for ongoing disassociation
Jason Gunthorpe [Thu, 26 Jul 2018 03:40:16 +0000 (21:40 -0600)]
IB/uverbs: Lower the test for ongoing disassociation

Commands that are reading/writing to objects can test for an ongoing
disassociation during their initial call to rdma_lookup_get_uobject.  This
directly prevents all of these commands from conflicting with an ongoing
disassociation.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Allow uobject allocation to work concurrently with disassociate
Jason Gunthorpe [Thu, 26 Jul 2018 03:40:15 +0000 (21:40 -0600)]
IB/uverbs: Allow uobject allocation to work concurrently with disassociate

After all the recent structural changes this is now straightforward, hold
the hw_destroy_rwsem across the entire uobject creation. We already take
this semaphore on the success path, so holding it a bit longer is not
going to change the performance.

After this change none of the create callbacks require the
disassociate_srcu lock to be correct.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociate
Jason Gunthorpe [Thu, 26 Jul 2018 03:40:14 +0000 (21:40 -0600)]
IB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociate

After all the recent structural changes this is now straightfoward, hoist
the hw_destroy_rwsem up out of rdma_destroy_explicit and wrap it around
the uobject write lock as well as the destroy.

This is necessary as obtaining a write lock concurrently with
uverbs_destroy_ufile_hw() will cause malfunction.

After this change none of the destroy callbacks require the
disassociate_srcu lock to be correct.

This requires introducing a new lookup mode, UVERBS_LOOKUP_DESTROY as the
IOCTL interface needs to hold an unlocked kref until all command
verification is completed.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Convert 'bool exclusive' into an enum
Jason Gunthorpe [Thu, 26 Jul 2018 03:40:13 +0000 (21:40 -0600)]
IB/uverbs: Convert 'bool exclusive' into an enum

This is more readable, and future patches will need a 3rd lookup type.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Consolidate uobject destruction
Jason Gunthorpe [Thu, 26 Jul 2018 03:40:12 +0000 (21:40 -0600)]
IB/uverbs: Consolidate uobject destruction

There are several flows that can destroy a uobject and each one is
minimized and sprinkled throughout the code base, making it difficult to
understand and very hard to modify the destroy path.

Consolidate all of these into uverbs_destroy_uobject() and call it in all
cases where a uobject has to be destroyed.

This makes one change to the lifecycle, during any abort (eg when
alloc_commit is not called) we always call out to alloc_abort, even if
remove_commit needs to be called to delete a HW object.

This also renames RDMA_REMOVE_DURING_CLEANUP to RDMA_REMOVE_ABORT to
clarify its actual usage and revises some of the comments to reflect what
the life cycle is for the type implementation.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Make the write path destroy methods use the same flow as ioctl
Jason Gunthorpe [Thu, 26 Jul 2018 03:40:11 +0000 (21:40 -0600)]
IB/uverbs: Make the write path destroy methods use the same flow as ioctl

The ridiculous dance with uobj_remove_commit() is not needed, the write
path can follow the same flow as ioctl - lock and destroy the HW object
then use the data left over in the uobject to form the response to
userspace.

Two helpers are introduced to make this flow straightforward for the
caller.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Remove rdma_explicit_destroy() from the ioctl methods
Jason Gunthorpe [Thu, 26 Jul 2018 21:57:56 +0000 (15:57 -0600)]
IB/uverbs: Remove rdma_explicit_destroy() from the ioctl methods

The core code will destroy the HW object on behalf of the method, if the
method provides an implementation it must simply copy data from the stub
uobj into the response. Destroy methods cannot touch the HW object.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA: Fix return code check in rdma_set_cq_moderation
Kamal Heib [Tue, 31 Jul 2018 06:02:36 +0000 (09:02 +0300)]
RDMA: Fix return code check in rdma_set_cq_moderation

The proper return code is "-EOPNOTSUPP" when the modify_cq() callback is
not supported, all drivers should generate this and all users should check
for it when detecting not supported functionality.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com> (for mlx5)
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agordma/cxgb4: Simplify a structure initialization
Bart Van Assche [Tue, 31 Jul 2018 15:51:30 +0000 (08:51 -0700)]
rdma/cxgb4: Simplify a structure initialization

This patch avoids that sparse reports the following warning:

drivers/infiniband/hw/cxgb4/qp.c:2269:34: warning: Using plain integer as NULL pointer

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agordma/cxgb4: Fix SRQ endianness annotations
Bart Van Assche [Tue, 31 Jul 2018 15:25:41 +0000 (08:25 -0700)]
rdma/cxgb4: Fix SRQ endianness annotations

This patch avoids that sparse complains about casts to restricted __be32.

Fixes: a3cdaa69e4ae ("cxgb4: Adds CPL support for Shared Receive Queues")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agordma/cxgb4: Remove a set-but-not-used variable
Bart Van Assche [Tue, 31 Jul 2018 15:08:15 +0000 (08:08 -0700)]
rdma/cxgb4: Remove a set-but-not-used variable

This patch avoids that the following warning is reported when building with
W=1:

drivers/infiniband/hw/cxgb4/cm.c:1860:5: warning: variable 'status' set but not used [-Wunused-but-set-variable]
  u8 status;
     ^~~~~~

Fixes: 6a0b6174d35a ("rdma/cxgb4: Add support for kernel mode SRQ's")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Prefix _ib to IB/RoCE specific functions
Parav Pandit [Sun, 29 Jul 2018 08:53:16 +0000 (11:53 +0300)]
RDMA/core: Prefix _ib to IB/RoCE specific functions

In rdma cm module, functions which are common between IB and iWarp
are named with cma_.
iWarp specific functions are prefixed with cma_iw.
IB specific functions are perfixed with cma_ib.

However some functions in request processing path didn't follow
cma_ib notion. Prefix them with _ib for better code clarity.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Simplify gid type check in cma_acquire_dev()
Parav Pandit [Sun, 29 Jul 2018 08:53:15 +0000 (11:53 +0300)]
RDMA/core: Simplify gid type check in cma_acquire_dev()

cma_add_one() initializes the default GID regardless of device type.
listen_id is bound to a device and an IP address, its GID type is
initialized by cma_acquire_dev().

Therefore a valid default GID type is always available, it is not needed
to check port type during cma_acquire_dev().

Initialize gid type of a cm id when the cm_id is created instead of
doing conditional checks during cma_acquire_dev() and trying to
initialize to 0 during _cma_attach_to_dev().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Avoid holding lock while initializing fields on stack
Parav Pandit [Sun, 29 Jul 2018 08:53:14 +0000 (11:53 +0300)]
RDMA/core: Avoid holding lock while initializing fields on stack

In various functions rdma_cm_event is zero initialized on stack using
memset() while holding lock which is not necessary.
Therefore, don't hold the lock while initializing on stack.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Return bool instead of int
Parav Pandit [Sun, 29 Jul 2018 08:53:13 +0000 (11:53 +0300)]
RDMA/core: Return bool instead of int

Return bool for following internal and inline functions as their
underlying APIs return bool too.

1. cma_zero_addr()
2. cma_loopback_addr()
3. cma_any_addr()
4. ib_addr_any()
5. ib_addr_loopback()

While we are touching cma_loopback_addr(), remove extra white spaces
in it.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/cma: Get rid of 1 bit boolean
Parav Pandit [Sun, 29 Jul 2018 08:53:12 +0000 (11:53 +0300)]
RDMA/cma: Get rid of 1 bit boolean

Arrange fields of cma_req_info structure for efficiency on
stack and get rid of one bit boolean field.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/cma: Constify path record, ib_cm_event, listen_id pointers
Parav Pandit [Sun, 29 Jul 2018 08:53:11 +0000 (11:53 +0300)]
RDMA/cma: Constify path record, ib_cm_event, listen_id pointers

Constify several pointers such as path_rec, ib_cm_event and listen_id
pointers in several functions.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Constify dst_addr argument
Parav Pandit [Sun, 29 Jul 2018 08:53:10 +0000 (11:53 +0300)]
RDMA/core: Constify dst_addr argument

Following APIs are not supposed to modify addr or dest_addr contents.
Therefore make those function argument const for better code
readability.

1. rdma_resolve_ip()
2. rdma_addr_size()
3. rdma_resolve_addr()

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/cma: Simplify rdma_resolve_addr() error flow
Parav Pandit [Sun, 29 Jul 2018 08:53:09 +0000 (11:53 +0300)]
RDMA/cma: Simplify rdma_resolve_addr() error flow

Currently dst address is first set and later on cleared on either of the
3 error conditions are met.
However none of the APIs or checks are supposed to refer to the
destination address of the cm_id.
Therefore, set the destination address after necessary checks pass which
simplifies the error flow.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/cma: Initialize resource type in __rdma_create_id()
Parav Pandit [Sun, 29 Jul 2018 08:53:08 +0000 (11:53 +0300)]
RDMA/cma: Initialize resource type in __rdma_create_id()

Currently rdma_cm_id's resource tracking fields such as owner task and
kern_name and other non resource tracking fields are initialized in
in single function __rdma_create_id().

Therefore, initialize rdma_cm_id's resource type also in same init
function.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Program the tclass and flow label into the hardware
Lijun Ou [Mon, 30 Jul 2018 12:20:30 +0000 (20:20 +0800)]
RDMA/hns: Program the tclass and flow label into the hardware

This was missed in a few places, and was just using 0.

Also correct the spelling of HNS_ROCE_FLOW_LABEL_MASK

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Use macro instead of magic number
Lijun Ou [Mon, 30 Jul 2018 12:20:29 +0000 (20:20 +0800)]
RDMA/hns: Use macro instead of magic number

This patch mainly uses CMD_CSQ_DESC_NUM instead of magic number in order
to improve readability.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Modify qp will return errno when qp type is illegal
Lijun Ou [Mon, 30 Jul 2018 12:20:28 +0000 (20:20 +0800)]
RDMA/hns: Modify qp will return errno when qp type is illegal

Set for ret was missing in the error path here, resulting in incorrect
error code for modify_qp.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Assign the value for vlan field of qp context
Lijun Ou [Mon, 30 Jul 2018 12:20:27 +0000 (20:20 +0800)]
RDMA/hns: Assign the value for vlan field of qp context

This patch mainly fills the correct value into the vlan id field of qp
context as well as update the vlan field name according to the latest
hardware user manual.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Only assgin the fields of the av if IB_QP_AV bit is set
Lijun Ou [Mon, 30 Jul 2018 12:20:25 +0000 (20:20 +0800)]
RDMA/hns: Only assgin the fields of the av if IB_QP_AV bit is set

Only when the IB_QP_AV flag of attr_mask is set is it valid to assign the
related fields of the av into the qp context.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/providers: Remove pointless functions
Kamal Heib [Fri, 27 Jul 2018 18:23:06 +0000 (21:23 +0300)]
RDMA/providers: Remove pointless functions

The rdma core is taking care of return the right error code when the
rdma device callbacks aren't supported.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Check for verbs callbacks before using them
Kamal Heib [Fri, 27 Jul 2018 18:23:05 +0000 (21:23 +0300)]
RDMA/core: Check for verbs callbacks before using them

Make sure the providers implement the verbs callbacks before calling
them, otherwise return -EOPNOTSUPP.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Remove {create,destroy}_ah from mandatory verbs
Kamal Heib [Fri, 27 Jul 2018 18:23:04 +0000 (21:23 +0300)]
RDMA/core: Remove {create,destroy}_ah from mandatory verbs

{create,destroy}_ah aren't mandatory verbs, because not all providers
are implementing them.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/ipoib: Fix check for return code from ib_create_srq
Kamal Heib [Mon, 30 Jul 2018 18:56:44 +0000 (21:56 +0300)]
RDMA/ipoib: Fix check for return code from ib_create_srq

Make sure to check for "-EOPNOTSUPP" instead of "-ENOSYS" which is the
return code from ib_create_srq() in case that it not supported.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/providers: Fix return value from create_srq callbacks
Kamal Heib [Mon, 30 Jul 2018 18:56:43 +0000 (21:56 +0300)]
RDMA/providers: Fix return value from create_srq callbacks

The proper return code is "-EOPNOTSUPP" when the create_srq() callback
is not supported.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx4: Use 4K pages for kernel QP's WQE buffer
Jack Morgenstein [Thu, 26 Jul 2018 07:08:37 +0000 (10:08 +0300)]
IB/mlx4: Use 4K pages for kernel QP's WQE buffer

In the current implementation, the driver tries to allocate contiguous
memory, and if it fails, it falls back to 4K fragmented allocation.

Once the memory is fragmented, the first allocation might take a lot
of time, and even fail, which can cause connection failures.

This patch changes the logic to always allocate with 4K granularity,
since it's more robust and more likely to succeed.

This patch was tested with Lustre and no performance degradation
was observed.

Note: This commit eliminates the "shrinking WQE" feature. This feature
depended on using vmap to create a virtually contiguous send WQ.
vmap use was abandoned due to problems with several processors (see the
commit cited below). As a result, shrinking WQE was available only with
physically contiguous send WQs. Allocating such send WQs caused the
problems described above.
Therefore, as a side effect of eliminating the use of large physically
contiguous send WQs, the shrinking WQE feature became unavailable.

Warning example:
worker/20:1: page allocation failure: order:8, mode:0x80d0
CPU: 20 PID: 513 Comm: kworker/20:1 Tainted: G OE ------------
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<ffffffff81686d81>] dump_stack+0x19/0x1b
[<ffffffff81186160>] warn_alloc_failed+0x110/0x180
[<ffffffff8118a954>] __alloc_pages_nodemask+0x9b4/0xba0
[<ffffffff811ce868>] alloc_pages_current+0x98/0x110
[<ffffffff81184fae>] __get_free_pages+0xe/0x50
[<ffffffff8133f6fe>] swiotlb_alloc_coherent+0x5e/0x150
[<ffffffff81062551>] x86_swiotlb_alloc_coherent+0x41/0x50
[<ffffffffa056b4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core]
[<ffffffffa056b73b>] mlx4_buf_alloc+0x1bb/0x260 [mlx4_core]
[<ffffffffa0b15496>] create_qp_common+0x536/0x1000 [mlx4_ib]
[<ffffffff811c6ef7>] ? dma_pool_free+0xa7/0xd0
[<ffffffffa0b163c1>] mlx4_ib_create_qp+0x3b1/0xdc0 [mlx4_ib]
[<ffffffffa0b01bc2>] ? mlx4_ib_create_cq+0x2d2/0x430 [mlx4_ib]
[<ffffffffa0b21f20>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib]
[<ffffffffa08f152a>] ib_create_qp+0x7a/0x2f0 [ib_core]
[<ffffffffa06205d4>] rdma_create_qp+0x34/0xb0 [rdma_cm]
[<ffffffffa08275c9>] kiblnd_create_conn+0xbf9/0x1950 [ko2iblnd]
[<ffffffffa074077a>] ? cfs_percpt_unlock+0x1a/0xb0 [libcfs]
[<ffffffffa0835519>] kiblnd_passive_connect+0xa99/0x18c0 [ko2iblnd]

Fixes: 73898db04301 ("net/mlx4: Avoid wrong virtual mappings")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Add UVERBS_ATTR_FLAGS_IN to the specs language
Jason Gunthorpe [Thu, 26 Jul 2018 22:37:14 +0000 (16:37 -0600)]
IB/uverbs: Add UVERBS_ATTR_FLAGS_IN to the specs language

This clearly indicates that the input is a bitwise combination of values
in an enum, and identifies which enum contains the definition of the bits.

Special accessors are provided that handle the mandatory validation of the
allowed bits and enforce the correct type for bitwise flags.

If we had introduced this at the start then the kabi would have uniformly
used u64 data to pass flags, however today there is a mixture of u64 and
u32 flags. All places are converted to accept both sizes and the accessor
fixes it. This allows all existing flags to grow to u64 in future without
any hassle.

Finally all flags are, by definition, optional. If flags are not passed
the accessor does not fail, but provides a value of zero.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoRDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const
Bart Van Assche [Wed, 18 Jul 2018 16:25:32 +0000 (09:25 -0700)]
RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const

Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.

To make this possible, only one cast had to be introduce that casts away
constness, namely in rpcrdma_post_recvs(). The only way I can think of to
avoid that cast is to introduce an additional loop in that function or to
change the data type of bad_wr from struct ib_recv_wr ** into int
(an index that refers to an element in the work request list). However,
both approaches would require even more extensive changes than this
patch.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5, ib_post_send(), IB_WR_REG_SIG_MR: Do not modify the 'wr' argument
Bart Van Assche [Wed, 18 Jul 2018 16:25:15 +0000 (09:25 -0700)]
IB/mlx5, ib_post_send(), IB_WR_REG_SIG_MR: Do not modify the 'wr' argument

Since the next patch will constify the wr pointer, do not modify the data
that pointer points at.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA: Constify the argument of the work request conversion functions
Bart Van Assche [Wed, 18 Jul 2018 16:25:14 +0000 (09:25 -0700)]
RDMA: Constify the argument of the work request conversion functions

When posting a send work request, the work request that is posted is not
modified by any of the RDMA drivers. Make this explicit by constifying
most ib_send_wr pointers in RDMA transport drivers.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/iser: Inline two work request conversion functions
Bart Van Assche [Wed, 18 Jul 2018 16:25:13 +0000 (09:25 -0700)]
IB/iser: Inline two work request conversion functions

Since the next patch will change the return type of these functions into a
const pointer and since the iSER driver modifies the work request these
functions return a pointer two, inline two work request conversion
function calls. This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/cache: Restore compatibility for ib_query_gid
Jason Gunthorpe [Fri, 27 Jul 2018 15:48:30 +0000 (09:48 -0600)]
IB/cache: Restore compatibility for ib_query_gid

Code changes in smc have become so complicated this cycle that the RDMA
patches to remove ib_query_gid in smc create too complex merge conflicts.
Allow those conflicts to be resolved by using the net/smc hunks by
providing a compatibility wrapper. During the second phase of the merge
window this wrapper will be deleted and smc updated to use the new API.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Enable modify_cq for uverbs.
Lijun Ou [Wed, 25 Jul 2018 07:29:41 +0000 (15:29 +0800)]
RDMA/hns: Enable modify_cq for uverbs.

The driver implements the modify_cq callback, but did not set the bit to
expose it to userspace.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Update the data type of immediate data
Lijun Ou [Wed, 25 Jul 2018 07:29:40 +0000 (15:29 +0800)]
RDMA/hns: Update the data type of immediate data

Because the data structure of hip08 is little endian, it needs to fix the
immediate field of wqe and cqe into __le32.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Use delay instead of usleep
Lijun Ou [Wed, 25 Jul 2018 07:29:38 +0000 (15:29 +0800)]
RDMA/hns: Use delay instead of usleep

In order to avoid using usleep function in lock function, we use delay
function instead of it.  Besides, it also use brackets for standardized
the computed order.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Add illegal hop_num judgement
Lijun Ou [Wed, 25 Jul 2018 07:29:37 +0000 (15:29 +0800)]
RDMA/hns: Add illegal hop_num judgement

When hop_num is more than three, it need to return -EINVAL.  This patch
fixes it.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Return correct error code from hns_roce_v1_rsv_lp_qp()
Lijun Ou [Wed, 25 Jul 2018 07:29:36 +0000 (15:29 +0800)]
RDMA/hns: Return correct error code from hns_roce_v1_rsv_lp_qp()

When create loop qp fail, it will return the correct result when
modify_qp() fails.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Add 50GE type of hnae3 device match
Lijun Ou [Wed, 25 Jul 2018 07:29:33 +0000 (15:29 +0800)]
RDMA/hns: Add 50GE type of hnae3 device match

This patch adds PCI matching for the hns 50GE NIC.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Do not overwrite the error code during error unwind in hns_roce_init
Lijun Ou [Wed, 25 Jul 2018 07:29:31 +0000 (15:29 +0800)]
RDMA/hns: Do not overwrite the error code during error unwind in hns_roce_init

When init cmq fail in initial flow of RoCE, it should return the errno of
cmq_init function, not of the rest call.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: avoid excessive warning msgs when creating VFs on 2nd port
Qing Huang [Mon, 23 Jul 2018 21:15:08 +0000 (14:15 -0700)]
IB/mlx5: avoid excessive warning msgs when creating VFs on 2nd port

When a CX5 device is configured in dual-port RoCE mode, after creating
many VFs against port 1, creating the same number of VFs against port 2
will flood kernel/syslog with something like
"mlx5_*:mlx5_ib_bind_slave_port:4266:(pid 5269): port 2 already
affiliated."

So basically, when traversing mlx5_ib_dev_list, mlx5_ib_add_slave_port()
repeatedly attempts to bind the new mpi structure to every device on the
list until it finds an unbound device.

Change the log level from warn to dbg to avoid log flooding as the warning
should be harmless.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/usnic: Suppress a compiler warning
Bart Van Assche [Mon, 23 Jul 2018 22:37:01 +0000 (15:37 -0700)]
RDMA/usnic: Suppress a compiler warning

This patch avoids that the following compiler warning is reported when
building with gcc 8 and W=1:

drivers/infiniband/hw/usnic/usnic_fwd.c:95:2: warning: 'strncpy' output may be truncated copying 16 bytes from a string of length 20 [-Wstringop-truncation]
  strncpy(ufdev->name, netdev_name(ufdev->netdev),
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    sizeof(ufdev->name) - 1);
    ~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agonet/xprtrdma: Restore needed argument to ib_post_send
Jason Gunthorpe [Thu, 26 Jul 2018 17:36:50 +0000 (11:36 -0600)]
net/xprtrdma: Restore needed argument to ib_post_send

The call in svc_rdma_post_chunk_ctxt() does actually use bad_wr.

Fixes: ed288d74a9e5 ("net/xprtrdma: Simplify ib_post_(send|recv|srq_recv)() calls")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/cma: Do not ignore net namespace for unbound cm_id
Parav Pandit [Mon, 16 Jul 2018 08:50:13 +0000 (11:50 +0300)]
RDMA/cma: Do not ignore net namespace for unbound cm_id

Currently if the cm_id is not bound to any netdevice, than for such cm_id,
net namespace is ignored; which is incorrect.

Regardless of cm_id bound to a netdevice or not, net namespace must
match. When a cm_id is bound to a netdevice, in such case net namespace
and netdevice both must match.

Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA CM")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/cma: Consider netdevice for RoCE ports
Parav Pandit [Mon, 16 Jul 2018 08:50:12 +0000 (11:50 +0300)]
RDMA/cma: Consider netdevice for RoCE ports

When netdevice is not found for a request, and if it for RoCE port,
currently it allows matching the listener as long as port number matches
by ignoring the netdevice.

Now that we always prefer to have netdevice associated with RoCE, when
netdevice is not found, don't consider RoCE ports.

In other words, a NULL netdevice with RoCE is not acceptable. Therefore,
remove this confusing RoCE port ignorance check.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/core: Introduce and use sgid_attr in CM requests
Parav Pandit [Mon, 16 Jul 2018 08:50:11 +0000 (11:50 +0300)]
IB/core: Introduce and use sgid_attr in CM requests

For RoCE, when CM requests are received for RC and UD connections,
netdevice of the incoming request is unavailable. Because of that CM
requests are always forwarded to init_net namespace.

Now that we have the GID attribute available, introduce SGID attribute in
incoming CM requests and refer to the netdevice of it.  This is similar to
existing SGID attribute field in outgoing CM requests for RC and UD
transports.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/usnic: usnic should not select INFINIBAND_USER_ACCESS
Jason Gunthorpe [Tue, 24 Jul 2018 20:37:52 +0000 (14:37 -0600)]
IB/usnic: usnic should not select INFINIBAND_USER_ACCESS

This driver doesn't provide any kernel services, it only provides
an interface via uverbs, so it should depend on, not select, uverbs
support.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agordma/cxgb4: Add support for kernel mode SRQ's
Raju Rangoju [Wed, 25 Jul 2018 15:52:14 +0000 (21:22 +0530)]
rdma/cxgb4: Add support for kernel mode SRQ's

This patch implements the srq specific verbs such as create/destroy/modify
and post_srq_recv. And adds srq specific structures and defines to t4.h
and uapi.

Also updates the cq poll logic to deal with completions that are
associated with the SRQ's.

This patch also handles kernel mode SRQ_LIMIT events as well as flushed
SRQ buffers

Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agordma/cxgb4: Add support for srq functions & structs
Raju Rangoju [Wed, 25 Jul 2018 15:52:13 +0000 (21:22 +0530)]
rdma/cxgb4: Add support for srq functions & structs

This patch adds kernel mode t4_srq structures and support functions,
uapi structures and defines, as well as firmware work request structures.

Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/core: Remove extra parentheses
Varsha Rao [Wed, 25 Jul 2018 18:43:56 +0000 (20:43 +0200)]
IB/core: Remove extra parentheses

Remove unnecessary parentheses to fix the clang warning of extraneous
parentheses.

Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/ocrdma: Suppress a compiler warning
Bart Van Assche [Wed, 18 Jul 2018 16:01:35 +0000 (09:01 -0700)]
RDMA/ocrdma: Suppress a compiler warning

This patch avoids that the following compiler warning is reported when
building with gcc 8 and W=1:

In function 'ocrdma_mbx_get_ctrl_attribs',
    inlined from 'ocrdma_init_hw' at drivers/infiniband/hw/ocrdma/ocrdma_hw.c:3224:11:
drivers/infiniband/hw/ocrdma/ocrdma_hw.c:1368:3: warning: 'strncpy' output may be truncated copying 31 bytes from a string of length 31 [-Wstringop-truncation]
   strncpy(dev->model_number,
   ^~~~~~~~~~~~~~~~~~~~~~~~~~
    hba_attribs->controller_model_number, 31);
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Fix locking around struct ib_uverbs_file ucontext
Jason Gunthorpe [Tue, 10 Jul 2018 19:43:06 +0000 (13:43 -0600)]
IB/uverbs: Fix locking around struct ib_uverbs_file ucontext

We have a parallel unlocked reader and writer with ib_uverbs_get_context()
vs everything else, and nothing guarantees this works properly.

Audit and fix all of the places that access ucontext to use one of the
following locking schemes:
- Call ib_uverbs_get_ucontext() under SRCU and check for failure
- Access the ucontext through an struct ib_uobject context member
  while holding a READ or WRITE lock on the uobject.
  This value cannot be NULL and has no race.
- Hold the ucontext_lock and check for ufile->ucontext !NULL

This also re-implements ib_uverbs_get_ucontext() in a way that is safe
against concurrent ib_uverbs_get_context() and disassociation.

As a side effect, every access to ucontext in the commands is via
ib_uverbs_get_context() with an error check, or via the uobject, so there
is no longer any need for the core code to check ucontext on every command
call. These checks are also removed.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Use the ucontext from the uobj, not the file
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:22 +0000 (20:55 -0600)]
IB/mlx5: Use the ucontext from the uobj, not the file

This approach matches the standard flow of the typical write method that
relies on the HW object to store the device and the uobject to access the
ucontext.  Avoids the use of the devx_ufile2uctx in several places will
make revising the semantics of ib_uverbs_get_ucontext() in the next patch
simpler.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Move the FD uobj type struct file allocation to alloc_commit
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:21 +0000 (20:55 -0600)]
IB/uverbs: Move the FD uobj type struct file allocation to alloc_commit

Allocating the struct file during alloc_begin creates this strange
asymmetry with IDR, where the FD has two krefs pointing at it during the
pre-commit phase. In particular this makes the abort process for FD very
strange and confusing.

For instance abort currently calls the type's destroy_object twice, and
the fops release once if abort is done. This is very counter intuitive. No
fops should be called until alloc_commit succeeds, and destroy_object
should only ever be called once.

Moving the struct file allocation to the alloc_commit is now simple, as we
already support failure of rdma_alloc_commit_uobject, with all the
required rollback pieces.

This creates an understandable symmetry with IDR and simplifies/fixes the
abort handling for FD types.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Always propagate errors from rdma_alloc_commit_uobject()
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:20 +0000 (20:55 -0600)]
IB/uverbs: Always propagate errors from rdma_alloc_commit_uobject()

The ioctl framework already does this correctly, but the write path did
not. This is trivially fixed by simply using a standard pattern to return
uobj_alloc_commit() as the last statement in every function.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Rework the locking for cleaning up the ucontext
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:19 +0000 (20:55 -0600)]
IB/uverbs: Rework the locking for cleaning up the ucontext

The locking here has always been a bit crazy and spread out, upon some
careful analysis we can simplify things.

Create a single function uverbs_destroy_ufile_hw() that internally handles
all locking. This pulls together pieces of this process that were
sprinkled all over the places into one place, and covers them with one
lock.

This eliminates several duplicate/confusing locks and makes the control
flow in ib_uverbs_close() and ib_uverbs_free_hw_resources() extremely
simple.

Unfortunately we have to keep an extra mutex, ucontext_lock.  This lock is
logically part of the rwsem and provides the 'down write, fail if write
locked, wait if read locked' semantic we require.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Revise and clarify the rwsem and uobjects_lock
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:18 +0000 (20:55 -0600)]
IB/uverbs: Revise and clarify the rwsem and uobjects_lock

Rename 'cleanup_rwsem' to 'hw_destroy_rwsem' which is held across any call
to the type destroy function (aka 'hw' destroy). The main purpose of this
lock is to prevent normal add and destroy from running concurrently with
uverbs_cleanup_ufile()

Since the uobjects list is always manipulated under the 'hw_destroy_rwsem'
we can eliminate the uobjects_lock in the cleanup function. This allows
converting that lock to a very simple spinlock with a narrow critical
section.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Clarify and revise uverbs_close_fd
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:17 +0000 (20:55 -0600)]
IB/uverbs: Clarify and revise uverbs_close_fd

The locking requirements here have changed slightly now that we can rely
on the ib_uverbs_file always existing and containing all the necessary
locking infrastructure.

That means we can get rid of the cleanup_mutex usage (this was protecting
the check on !uboj->context).

Otherwise, follow the same pattern that IDR uses for destroy, acquire
exclusive write access, then call destroy and the undo the 'lookup'.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Revise the placement of get/puts on uobject
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:16 +0000 (20:55 -0600)]
IB/uverbs: Revise the placement of get/puts on uobject

This wasn't wrong, but the placement of two krefs didn't make any
sense. Follow some simple rules.

- A kref is held inside uobjects_list
- A kref is held inside the IDR
- A kref is held inside file->private
- A stack based kref is passed bettwen alloc_begin and
  alloc_abort/alloc_commit

Any place we destroy one of the above pointers, we stick a put,
or 'move' the kref into another pointer.

The key functions have sensible semantics:
- alloc_uobj fully initializes the common members in uobj, including
  the list
- Get rid of the uverbs_idr_remove_uobj helper since IDR remove
  does require put, but it depends on the situation. Later
  patches will re-consolidate this differently.
- alloc_abort always consumes the passed kref, done in the type
- alloc_commit always consumes the passed kref, done in the type
- rdma_remove_commit_uobject always pairs with a lookup_get

After it is all done the only control flow change is to:
- move a get from alloc_commit_fd_uobject to rdma_alloc_commit_uobject
- add a put to remove_commit_idr_uobject
- Consistenly use rdma_lookup_put in rdma_remove_commit_uobject at
  the right place

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Clarify the kref'ing ordering for alloc_commit
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:15 +0000 (20:55 -0600)]
IB/uverbs: Clarify the kref'ing ordering for alloc_commit

The alloc_commit callback makes the uobj visible to other threads,
and it does so using a 'move' semantic of the uobj kref on the stack
into the public storage (eg the IDR, uobject list and file_private_data)

Once this is done another thread could start up and trigger deletion
of the kref. Fortunately cleanup_rwsem happens to prevent this from
being a bug, but that is a fantastically unclear side effect.

Re-organize things so that alloc_commit is that last thing to touch
the uobj, get rid of the sneaky implicit dependency on cleanup_rwsem,
and add a comment reminding that uobj is no longer kref'd after
alloc_commit.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Handle IDR and FD types without truncation
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:14 +0000 (20:55 -0600)]
IB/uverbs: Handle IDR and FD types without truncation

Our ABI for write() uses a s32 for FDs and a u32 for IDRs, but internally
we ended up implicitly casting these ABI values into an 'int'. For ioctl()
we use a s64 for FDs and a u64 for IDRs, again casting to an int.

The various casts to int are all missing range checks which can cause
userspace values that should be considered invalid to be accepted.

Fix this by making the generic lookup routine accept a s64, which does not
truncate the write API's u32/s32 or the ioctl API's s64. Then push the
detailed range checking down to the actual type implementations to be
shared by both interfaces.

Finally, change the copy of the uobj->id to sign extend into a s64, so eg,
if we ever wish to return a negative value for a FD it is carried
properly.

This ensures that userspace values are never weirdly interpreted due to
the various trunctations and everything that is really out of range gets
an EINVAL.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/uverbs: Get rid of null_obj_type
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:13 +0000 (20:55 -0600)]
IB/uverbs: Get rid of null_obj_type

If the method fails after calling rdma_explicit_destroy (eg if
copy_to_user faults) then it will trigger a kernel oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 800000000548d067 P4D 800000000548d067 PUD 54a0067 PMD 0
SMP PTI
CPU: 0 PID: 359 Comm: ibv_rc_pingpong Not tainted 4.18.0-rc1+ #28
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
RIP: 0010:          (null)
Code: Bad RIP value.
RSP: 0018:ffffc900001a3bf0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88000603bd00 RCX: 0000000000000003
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88000603bd00
RBP: 0000000000000001 R08: ffffc900001a3cf8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc900001a3cf0
R13: 0000000000000000 R14: ffffc900001a3cf0 R15: 0000000000000000
FS:  00007fb00dda8700(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 000000000548e004 CR4: 00000000003606b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? rdma_lookup_put_uobject+0x22/0x50 [ib_uverbs]
 ? uverbs_finalize_object+0x3b/0x60 [ib_uverbs]
 ? uverbs_finalize_attrs+0x128/0x140 [ib_uverbs]
 ? ib_uverbs_cmd_verbs+0x698/0x7c0 [ib_uverbs]
 ? find_held_lock+0x2d/0x90
 ? __might_fault+0x39/0x90
 ? ib_uverbs_ioctl+0x111/0x1f0 [ib_uverbs]
 ? do_vfs_ioctl+0xa0/0x6d0
 ? trace_hardirqs_on_caller+0xed/0x180
 ? _raw_spin_unlock_irq+0x24/0x40
 ? syscall_trace_enter+0x138/0x1d0
 ? ksys_ioctl+0x35/0x60
 ? __x64_sys_ioctl+0x11/0x20
 ? do_syscall_64+0x5b/0x1c0
 ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is because the type was replaced with the null_type during explicit
destroy that cannot complete the destruction.

One of the side effects of replacing the type is to make the object
handle totally unreachable - so no other command could attempt to use
it, even though it remains on the uboject list.

We can get the same end result by just fully destroying the object inside
rdma_explicit_destroy and leaving the caller the residual kref for the
uobj with no attached HW object, and no presence in the ubojects list.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
5 years agonet/xprtrdma: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:31 +0000 (09:25 -0700)]
net/xprtrdma: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agonet/smc: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:30 +0000 (09:25 -0700)]
net/smc: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agonet/smc: Remove a WARN_ON() statement
Bart Van Assche [Wed, 18 Jul 2018 16:25:29 +0000 (09:25 -0700)]
net/smc: Remove a WARN_ON() statement

Remove a WARN_ON() statement that verifies something that is guaranteed
by the RDMA API, namely that the failed_wr pointer is not touched if an
ib_post_send() call succeeds and that it points at the failed wr if an
ib_post_send() call fails.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agonet/rds: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:28 +0000 (09:25 -0700)]
net/rds: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agonet/rds: Remove two WARN_ON() statements
Bart Van Assche [Wed, 18 Jul 2018 16:25:27 +0000 (09:25 -0700)]
net/rds: Remove two WARN_ON() statements

Remove two WARN_ON() statements that verify something that is guaranteed
by the RDMA API, namely that the failed_wr pointer is not touched if an
ib_post_send() call succeeds and that it points at the failed wr if an
ib_post_send() call fails.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agonet/9p: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:26 +0000 (09:25 -0700)]
net/9p: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agofs/cifs: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:25 +0000 (09:25 -0700)]
fs/cifs: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agonvmet-rdma: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:24 +0000 (09:25 -0700)]
nvmet-rdma: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agonvme-rdma: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:23 +0000 (09:25 -0700)]
nvme-rdma: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/srpt: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:22 +0000 (09:25 -0700)]
IB/srpt: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/srp: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:21 +0000 (09:25 -0700)]
IB/srp: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/isert: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:20 +0000 (09:25 -0700)]
IB/isert: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/iser: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:19 +0000 (09:25 -0700)]
IB/iser: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/IPoIB: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:18 +0000 (09:25 -0700)]
IB/IPoIB: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:17 +0000 (09:25 -0700)]
RDMA/core: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/core: Allow ULPs to specify NULL as the third ib_post_(send|recv|srq_recv)() argument
Bart Van Assche [Wed, 18 Jul 2018 16:25:16 +0000 (09:25 -0700)]
IB/core: Allow ULPs to specify NULL as the third ib_post_(send|recv|srq_recv)() argument

This patch does not change the behavior of the modified functions.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/rxe: Drop QP0 silently
Zhu Yanjun [Fri, 13 Jul 2018 07:10:20 +0000 (03:10 -0400)]
IB/rxe: Drop QP0 silently

According to "Annex A16: RDMA over Converged Ethernet (RoCE)":

A16.4.3 MANAGEMENT INTERFACES

As defined in the base specification, a special Queue Pair, QP0 is defined
solely for communication between subnet manager(s) and subnet management
agents. Since such an IB-defined subnet management architecture is outside
the scope of this annex, it follows that there is also no requirement that
a port which conforms to this annex be associated with a QP0. Thus, for
end nodes designed to conform to this annex, the concept of QP0 is
undefined and unused for any port connected to an Ethernet network.

CA16-8: A packet arriving at a RoCE port containing a BTH with the
destination QP field set to QP0 shall be silently dropped.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/ipoib: Fix error return code in ipoib_dev_init()
Wei Yongjun [Wed, 11 Jul 2018 13:15:42 +0000 (13:15 +0000)]
IB/ipoib: Fix error return code in ipoib_dev_init()

Fix to return a negative error code from the ipoib_neigh_hash_init()
error handling case instead of 0, as done elsewhere in this function.

Fixes: 515ed4f3aab4 ("IB/IPoIB: Separate control and data related initializations")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Enable driver uapi commands for flow steering
Yishai Hadas [Mon, 23 Jul 2018 12:25:12 +0000 (15:25 +0300)]
IB/mlx5: Enable driver uapi commands for flow steering

Expose the mlx5 flow steering parsing trees, exposing the functionality to
user space.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Add support for a flow table destination for driver flow steering
Yishai Hadas [Mon, 23 Jul 2018 12:25:11 +0000 (15:25 +0300)]
IB/mlx5: Add support for a flow table destination for driver flow steering

Add support to set a destination that is a flow table, this can come from
the DEVX destination.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Support adding flow steering rule by raw description
Yishai Hadas [Mon, 23 Jul 2018 12:25:10 +0000 (15:25 +0300)]
IB/mlx5: Support adding flow steering rule by raw description

Add support to set a public flow steering rule when its destination is a
TIR by using raw specification data.

The logic follows the verbs API but instead of using ib_spec(s) the raw,
device specific, description is used.

This allows supporting specialty matchers without having to define new
matches in the verbs struct based language.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Introduce driver create and destroy flow methods
Yishai Hadas [Mon, 23 Jul 2018 12:25:09 +0000 (15:25 +0300)]
IB/mlx5: Introduce driver create and destroy flow methods

Introduce driver create and destroy flow methods on the uverbs flow
object.

This allows the driver to get its specific device attributes to match the
underlay specification while still using the generic ib_flow object for
cleanup and code sharing.

The IB object's attributes are set via the ib_set_flow() helper function.

The specific implementation for the given specification is added in
downstream patches.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB: Support ib_flow creation in drivers
Yishai Hadas [Mon, 23 Jul 2018 12:25:08 +0000 (15:25 +0300)]
IB: Support ib_flow creation in drivers

This patch considers the case that ib_flow is created by some device
driver with its specific parameters using the KABI infrastructure.

In that case both QP and ib_uflow_resources might not be applicable.
Downstream patches from this series use the above functionality.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Introduce flow steering matcher uapi object
Yishai Hadas [Mon, 23 Jul 2018 12:25:07 +0000 (15:25 +0300)]
IB/mlx5: Introduce flow steering matcher uapi object

Introduce flow steering matcher object and its create and destroy methods.

This matcher object holds some mlx5 specific driver properties that
matches the underlay device specification when an mlx5 flow steering group
is created.

It will be used in downstream patches to be part of mlx5 specific create
flow method.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoMerge branch 'mellanox/mlx5-next' into rdma.git for-next
Jason Gunthorpe [Tue, 24 Jul 2018 19:10:23 +0000 (13:10 -0600)]
Merge branch 'mellanox/mlx5-next' into rdma.git for-next

From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

This is required to resolve dependencies of the next series of RDMA
patches.

* branch 'mellanox/mlx5-next':
  net/mlx5: Add support for flow table destination number
  net/mlx5: Add forward compatible support for the FTE match data
  net/mlx5: Fix tristate and description for MLX5 module
  net/mlx5: Better return types for CQE API
  net/mlx5: Use ERR_CAST() instead of coding it
  net/mlx5: Add missing SET_DRIVER_VERSION command translation
  net/mlx5: Add XRQ commands definitions
  net/mlx5: Add core support for double vlan push/pop steering action
  net/mlx5: Expose MPEGC (Management PCIe General Configuration) structures
  net/mlx5: FW tracer, add hardware structures
  net/mlx5: fix uaccess beyond "count" in debugfs read/write handlers

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agonet/mlx5: Add support for flow table destination number
Yishai Hadas [Tue, 19 Jun 2018 12:23:36 +0000 (15:23 +0300)]
net/mlx5: Add support for flow table destination number

Add support to set a destination from a flow table number.
This functionality will be used in downstream patches from this
series by the DEVX stuff.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agonet/mlx5: Add forward compatible support for the FTE match data
Yishai Hadas [Tue, 19 Jun 2018 12:09:48 +0000 (15:09 +0300)]
net/mlx5: Add forward compatible support for the FTE match data

Use the PRM size including the reserved when working with the FTE
match data.

This comes to support forward compatibility for cases that current
reserved data will be exposed by the firmware by an application that
uses the DEVX API without changing the kernel.

Also drop some driver checks around the match criteria leaving the work
for firmware to enable forward compatibility for future bits there.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoIB/uverbs: Move ib_access_flags and ib_read_counters_flags to uapi
Jason Gunthorpe [Wed, 11 Jul 2018 22:20:44 +0000 (16:20 -0600)]
IB/uverbs: Move ib_access_flags and ib_read_counters_flags to uapi

These constants are used in the ioctl interface so they are part of the
uapi, place them in the correct header for clarity.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoMAINTAINERS: Remove Dave Goodell from the usnic RDMA driver maintainer list
Bart Van Assche [Wed, 18 Jul 2018 16:14:36 +0000 (09:14 -0700)]
MAINTAINERS: Remove Dave Goodell from the usnic RDMA driver maintainer list

The e-mail address dgoodell@exch.cisco.com no longer exists. Additionally,
according to https://www.linkedin.com/in/goodell/ Dave is an Amazon
employee since December 2017. Hence remove his Cisco e-mail address from
the usnic maintainer list.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/bnxt_re: Modify a fall-through annotation
Bart Van Assche [Wed, 18 Jul 2018 15:58:02 +0000 (08:58 -0700)]
RDMA/bnxt_re: Modify a fall-through annotation

This patch avoids that gcc reports the following warning when building
with W=1:

drivers/infiniband/hw/bnxt_re/ib_verbs.c:2404:4: warning: this statement may fall through [-Wimplicit-fallthrough=]

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>