summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-07-28bcache: remove EXPERIMENTAL for Kconfig option 'Asynchronous device ↵for-5.20/drivers-2022-07-29for-5.20/driversColy Li
registration' The "Asynchronous device registration (EXPERIMENTAL)" Kconfig option is for 2+ years, it is used when registration takes too much time for massive amount of cached data, to avoid udev task timeout during boot time. Many users and products enable this Kconfig option for quite long time (e.g. SUSE Linux) and it works as expected and no issue reported. It is time to remove the "EXPERIMENTAL" tag from this Kconfig item. Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20220719042724.8498-2-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-25nbd: add missing definition of pr_fmtYu Kuai
commit 1243172d5894 ("nbd: use pr_err to output error message") tries to define pr_fmt and use short pr_err() to output error message, however, the definition is missed. This patch also remove existing "nbd:" inside pr_err(). Fixes: 1243172d5894 ("nbd: use pr_err to output error message") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220723082427.3890655-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-15null_blk: fix ida error handling in null_add_dev()Dan Carpenter
There needs to be some error checking if ida_simple_get() fails. Also call ida_free() if there are errors later. Fixes: 94bc02e30fb8 ("nullb: use ida to manage index") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/YtEhXsr6vJeoiYhd@kili Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-15null_blk: cleanup null_init_tag_setMing Lei
The passed 'nullb' can be NULL, so cause null ptr reference. Fix the issue, meantime cleanup null_init_tag_set for avoiding to add similar issue in future. Meantime set BLK_MQ_F_NO_SCHED if g_no_sched is true in case of NULL device, same with BLK_MQ_F_TAG_HCTX_SHARED. Cc: Vincent Fu <vincent.fu@samsung.com> Fixes: 37ae152c7a0d ("null_blk: add configfs variables for 2 options") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220715142847.188275-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14Merge tag 'nvme-5.20-2022-07-14' of git://git.infradead.org/nvme into ↵Jens Axboe
for-5.20/drivers Pull NVMe updates from Christoph: "nvme updates for Linux 5.20 - add support for In-Band authentication (Hannes Reinecke) - handle the persistent internal error AER (Michael Kelley) - use in-capsule data for TCP I/O queue connect (Caleb Sander) - remove timeout for getting RDMA-CM established event (Israel Rukshin) - misc cleanups (Joel Granados, Sagi Grimberg, Chaitanya Kulkarni, Guixin Liu, Xiang wangx)" * tag 'nvme-5.20-2022-07-14' of git://git.infradead.org/nvme: (21 commits) nvme-multipath: refactor nvme_mpath_add_disk nvme-apple: use nvme core helper to cancel requests in tagset nvme-pci: use nvme core helper to cancel requests in tagset nvme-tcp: use in-capsule data for I/O connect nvme-rdma: remove timeout for getting RDMA-CM established event nvmet-auth: expire authentication sessions nvmet-auth: Diffie-Hellman key exchange support nvmet: implement basic In-Band Authentication nvmet: parse fabrics commands on io queues nvme-auth: Diffie-Hellman key exchange support nvme: implement In-Band authentication nvme-fabrics: decode 'authentication required' connect error nvme: add definitions for NVMe In-Band authentication lib/base64: RFC4648-compliant base64 encoding crypto: add crypto_has_kpp() crypto: add crypto_has_shash() nvme-loop: use nvme core helpers to cancel all requests in a tagset nvme: fix qid param blk_mq_alloc_request_hctx nvme: remove unused timeout parameter nvme: handle the persistent internal error AER ...
2022-07-12nvme-multipath: refactor nvme_mpath_add_diskJoel Granados
Pass anagrpid as second argument. This is prep patch that allows reusing this function for supporting unknown command sets. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-12nvme-apple: use nvme core helper to cancel requests in tagsetGuixin Liu
Use nvme core helper nvme_cancel_tagset and nvme_cancel_admin_tagset instead of same logic code. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-12nvme-pci: use nvme core helper to cancel requests in tagsetGuixin Liu
Use nvme core helper nvme_cancel_tagset and nvme_cancel_admin_tagset instead of same logic code. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Ruozhu Li <liruozhu@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-12nvme-tcp: use in-capsule data for I/O connectCaleb Sander
Currently, command data is only sent in-capsule on the for admin or I/O commands on queues that indicate support for it. Send fabrics command data in-capsule for I/O queues as well to avoid needing a separate H2CData PDU for the connect command. This is optimization. Without this change, we send the connect command capsule and data in separate PDUs (CapsuleCmd and H2CData), and must wait for the controller to respond with an R2T PDU before sending the H2CData. With the change, we send a single CapsuleCmd PDU that includes the data. This reduces the number of bytes (and likely packets) sent across the network, and simplifies the send state machine handling in the driver. Signed-off-by: Caleb Sander <csander@purestorage.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-12nvme-rdma: remove timeout for getting RDMA-CM established eventIsrael Rukshin
In case many controllers start error recovery at the same time (i.e., when port is down and up), they may never succeed to reconnect again. This is because the target can't handle all the connect requests at three seconds (the arbitrary value set today). Even if some of the connections are established, when a single queue fails to connect, all the controller's queues are destroyed as well. So, on the following reconnection attempts the number of connect requests may remain the same. To fix this, remove the timeout and wait for RDMA-CM event to abort/complete the connect request. RDMA-CM sends unreachable event when a timeout of ~90 seconds is expired. This approach is used at other RDMA-CM users like SRP and iSER at blocking mode. The commit also renames NVME_RDMA_CONNECT_TIMEOUT_MS to NVME_RDMA_CM_TIMEOUT_MS. Signed-off-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-12null_blk: add configfs variables for 2 optionsVincent Fu
Allow setting via configfs these two options: no_sched shared_tag_bitmap Previously these could only be activated as module parameters. Still missing are: shared_tags timeout requeue init_hctx Signed-off-by: Vincent Fu <vincent.fu@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220708174943.87787-3-vincent.fu@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-12null_blk: add module parameters for 4 optionsVincent Fu
Add as module parameters these options: memory_backed discard mbps cache_size Previously these could only be set via configfs. Still missing is bad_blocks. The kernel test robot found a documentation formatting issue in v1 of this patch. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vincent Fu <vincent.fu@samsung.com> Link: https://lore.kernel.org/r/20220708174943.87787-2-vincent.fu@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-07block/rnbd-srv: Replace sess_dev_list with index_idrMd Haris Iqbal
The structure rnbd_srv_session maintains a list and an xarray of rnbd_srv_dev. There is no need to keep both as one of them can serve the purpose. Since one of the places where the lookup of rnbd_srv_dev using rnbd_srv_session is IO path, an xarray would serve us better than a list traversal. Hence remove sess_dev_list from rnbd_srv_session, and replace its uses from xarray. Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20220707143122.460362-3-haris.iqbal@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-07block/rnbd-srv: Set keep_id to true after mutex_trylockMd Haris Iqbal
After setting keep_id if the mutex trylock fails, the keep_id stays set for the rest of the sess_dev lifetime. Therefore, set keep_id to true after mutex_trylock succeeds, so that a failure of trylock does'nt touch keep_id. Fixes: b168e1d85cf3 ("block/rnbd-srv: Prevent a deadlock generated by accessing sysfs in parallel") Cc: gi-oh.kim@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20220707143122.460362-2-haris.iqbal@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06nvmet-auth: expire authentication sessionsHannes Reinecke
Each authentication step is required to be completed within the KATO interval (or two minutes if not set). So add a workqueue function to reset the transaction ID and the expected next protocol step; this will automatically the next authentication command referring to the terminated authentication. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06nvmet-auth: Diffie-Hellman key exchange supportHannes Reinecke
Implement Diffie-Hellman key exchange using FFDHE groups for NVMe In-Band Authentication. This patch adds a new host configfs attribute 'dhchap_dhgroup' to select the FFDHE group to use. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06nvmet: implement basic In-Band AuthenticationHannes Reinecke
Implement NVMe-oF In-Band authentication according to NVMe TPAR 8006. This patch adds three additional configfs entries 'dhchap_key', 'dhchap_ctrl_key', and 'dhchap_hash' to the 'host' configfs directory. The 'dhchap_key' and 'dhchap_ctrl_key' entries need to be in the ASCII format as specified in NVMe Base Specification v2.0 section 8.13.5.8 'Secret representation'. 'dhchap_hash' defaults to 'hmac(sha256)', and can be written to to switch to a different HMAC algorithm. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06nvmet: parse fabrics commands on io queuesHannes Reinecke
Some fabrics commands can be sent via io queues, so add a new function nvmet_parse_fabrics_io_cmd() and rename the existing nvmet_parse_fabrics_cmd() to nvmet_parse_fabrics_admin_cmd(). Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06nvme-auth: Diffie-Hellman key exchange supportHannes Reinecke
Implement Diffie-Hellman key exchange using FFDHE groups for NVMe In-Band Authentication. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06nvme: implement In-Band authenticationHannes Reinecke
Implement NVMe-oF In-Band authentication according to NVMe TPAR 8006. This patch adds two new fabric options 'dhchap_secret' to specify the pre-shared key (in ASCII respresentation according to NVMe 2.0 section 8.13.5.8 'Secret representation') and 'dhchap_ctrl_secret' to specify the pre-shared controller key for bi-directional authentication of both the host and the controller. Re-authentication can be triggered by writing the PSK into the new controller sysfs attribute 'dhchap_secret' or 'dhchap_ctrl_secret'. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06nvme-fabrics: decode 'authentication required' connect errorHannes Reinecke
The 'connect' command might fail with NVME_SC_AUTH_REQUIRED, so we should be decoding this error, too. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06nvme: add definitions for NVMe In-Band authenticationHannes Reinecke
Add new definitions for NVMe In-band authentication as defined in the NVMe Base Specification v2.0. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06lib/base64: RFC4648-compliant base64 encodingHannes Reinecke
Add RFC4648-compliant base64 encoding and decoding routines, based on the base64url encoding in fs/crypto/fname.c. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06crypto: add crypto_has_kpp()Hannes Reinecke
Add helper function to determine if a given key-agreement protocol primitive is supported. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06crypto: add crypto_has_shash()Hannes Reinecke
Add helper function to determine if a given synchronous hash is supported. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06nvme-loop: use nvme core helpers to cancel all requests in a tagsetSagi Grimberg
A helper now exist, no need to open-code the same functionality. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06nvme: fix qid param blk_mq_alloc_request_hctxChaitanya Kulkarni
Only caller of the __nvme_submit_sync_cmd() with qid value not equal to NVME_QID_ANY is nvmf_connect_io_queues(), where qid value is alway set to > 0. [1] __nvme_submit_sync_cmd() callers with qid parameter from :- Caller | qid parameter ------------------------------------------------------ * nvme_fc_connect_io_queues() | nvmf_connect_io_queue() | qid > 0 * nvme_rdma_start_io_queues() | nvme_rdma_start_queue() | nvmf_connect_io_queues() | qid > 0 * nvme_tcp_start_io_queues() | nvme_tcp_start_queue() | nvmf_connect_io_queues() | qid > 0 * nvme_loop_connect_io_queues() | nvmf_connect_io_queues() | qid > 0 When qid value of the function parameter __nvme_submit_sync_cmd() is > 0 from above callers, we use blk_mq_alloc_request_hctx(), where we pass last parameter as 0 if qid functional parameter value is set to 0 with conditional operators, see 1002 :- 991 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd, 992 union nvme_result *result, void *buffer, unsigned bufflen, 993 int qid, int at_head, blk_mq_req_flags_t flags) 994 { 995 struct request *req; 996 int ret; 997 998 if (qid == NVME_QID_ANY) 999 req = blk_mq_alloc_request(q, nvme_req_op(cmd), flags); 1000 else 1001 req = blk_mq_alloc_request_hctx(q, nvme_req_op(cmd), flags, 1002 qid ? qid - 1 : 0); 1003 But qid function parameter value of the __nvme_submit_sync_cmd() will never be 0 from above caller list see [1], and all the other callers of __nvme_submit_sync_cmd() use NVME_QID_ANY as qid value :- 1. nvme_submit_sync_cmd() 2. nvme_features() 3. nvme_sec_submit() 4. nvmf_reg_read32() 5. nvmf_reg_read64() 6. nvmf_ref_write32() 7. nvmf_connect_admin_queue() Remove the conditional operator to pass the qid as 0 in the call to blk_mq_alloc_requst_hctx(). Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06nvme: remove unused timeout parameterChaitanya Kulkarni
The function __nvme_submit_sync_cmd() has following list of callers that sets the timeout value to 0 :- Callers | Timeout value ------------------------------------------------ nvme_submit_sync_cmd() | 0 nvme_features() | 0 nvme_sec_submit() | 0 nvmf_reg_read32() | 0 nvmf_reg_read64() | 0 nvmf_reg_write32() | 0 nvmf_connect_admin_queue() | 0 nvmf_connect_io_queue() | 0 Remove the timeout function parameter from __nvme_submit_sync_cmd() and adjust the rest of code accordingly. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06nvme: handle the persistent internal error AERMichael Kelley
In the NVM Express Revision 1.4 spec, Figure 145 describes possible values for an AER with event type "Error" (value 000b). For a Persistent Internal Error (value 03h), the host should perform a controller reset. Add support for this error using code that already exists for doing a controller reset. As part of this support, introduce two utility functions for parsing the AER type and subtype. This new support was tested in a lab environment where we can generate the persistent internal error on demand, and observe both the Linux side and NVMe controller side to see that the controller reset has been done. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06nvme: remove a double word in a commentXiang wangx
Delete the redundant word 'be'. Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06rnbd-clt: make rnbd_clt_change_capacity return voidGuoqing Jiang
No need to checking the return value, make it return void. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20220706133152.12058-9-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06rnbd-clt: pass sector_t type for resize capacityGuoqing Jiang
Let's change the parameter type to 'sector_t' then we don't need to cast it from rnbd_clt_resize_dev_store, and update rnbd_clt_resize_disk too. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20220706133152.12058-8-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06rnbd-clt: check capacity inside rnbd_clt_change_capacityGuoqing Jiang
Currently, process_msg_open_rsp checks if capacity changed or not before call rnbd_clt_change_capacity while the checking also make sense for rnbd_clt_resize_dev_store, let's move the checking into the function. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20220706133152.12058-7-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06rnbd-clt: adjust the layout of struct rnbd_clt_devGuoqing Jiang
While at it, let re-arrange the struct to remove holes. Before, pahole reports /* size: 232, cachelines: 4, members: 17 */ /* sum members: 224, holes: 2, sum holes: 8 */ /* last cacheline: 40 bytes */ After the change, the report changes to /* size: 224, cachelines: 4, members: 17 */ /* last cacheline: 32 bytes */ Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20220706133152.12058-6-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06rnbd-clt: reduce the size of struct rnbd_clt_devGuoqing Jiang
Previously, both map and remap trigger rnbd_clt_set_dev_attr to set some members in rnbd_clt_dev such as wc, fua and logical_block_size etc, but those members are only useful for map scenario given the setup_request_queue is only called from the path: rnbd_clt_map_device -> rnbd_client_setup_device Since rnbd_clt_map_device frees rsp after rnbd_client_setup_device, we can pass rsp to rnbd_client_setup_device and it's callees, which means queue's attributes can be set directly from relevant members of rsp instead from rnbd_clt_dev. After that, we can kill 11 members from rnbd_clt_dev, and we don't need rnbd_clt_set_dev_attr either. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20220706133152.12058-5-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06rnbd-clt: kill read_only from struct rnbd_clt_devGuoqing Jiang
The member is not needed since we can call get_disk_ro to achieve the same goal. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20220706133152.12058-4-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06rnbd-clt: don't free rsp in msg_open_conf for map scenarioGuoqing Jiang
For map scenario, rsp is freed in two places: 1. msg_open_conf frees rsp if rtrs_clt_request returns 0. 2. Otherwise, rsp is freed by the call sites of rtrs_clt_request. Now, We'd like to control full lifecycle of rsp in rnbd_clt_map_device, with that, it is feasible to pass rsp to rnbd_client_setup_device in next commit. For 1, it is possible to free rsp from the caller of send_usr_msg because of the synchronization of iu->comp.wait. And we put iu later in rnbd_clt_map_device to ensure order of release rsp and iu. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20220706133152.12058-3-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06rnbd-clt: open code send_msg_open in rnbd_clt_map_deviceGuoqing Jiang
Let's open code it in rnbd_clt_map_device, then we can use information from rsp to setup gendisk and request_queue in next commits. After that, we can remove some members (wc, fua and max_hw_sectors etc) from struct rnbd_clt_dev. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20220706133152.12058-2-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-03block: null_blk: Use the bitmap API to allocate bitmapsChristophe JAILLET
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/7c4d3116ba843fc4a8ae557dd6176352a6cd0985.1656864320.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-03Merge branch 'md-next' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.20/drivers Pull MD updates from Song: "1. Improve raid5 lock contention, by Logan Gunthorpe. 2. Misc fixes to raid5, by Logan Gunthorpe. 3. Fix race condition with md_reap_sync_thread(), by Guoqing Jiang." * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: (29 commits) md: Fix spelling mistake in comments md/raid5: Increase restriction on max segments per request md/raid5: Improve debug prints md/raid5: Pivot raid5_make_request() md/raid5: Check all disks in a stripe_head for reshape progress md/raid5: Refactor add_stripe_bio() md/raid5: Keep a reference to last stripe_head for batch md/raid5: Refactor for loop in raid5_make_request() into while loop md/raid5: Move read_seqcount_begin() into make_stripe_request() md/raid5: Drop the do_prepare flag in raid5_make_request() md/raid5: Factor out helper from raid5_make_request() loop md/raid5: Move common stripe get code into new find_get_stripe() helper md/raid5: Move stripe_add_to_batch_list() call out of add_stripe_bio() md/raid5: Refactor raid5_make_request loop md/raid5: Factor out ahead_of_reshape() function md/raid5: Make logic blocking check consistent with logic that blocks md: unlock mddev before reap sync_thread in action_store md: Explicitly create command-line configured devices md: Notify sysfs sync_completed in md_reap_sync_thread() md: Ensure resync is reported after it starts ...
2022-07-03md: Fix spelling mistake in commentsZhang Jiaming
There are 2 spelling mistakes in comments. Fix it. Signed-off-by: Zhang Jiaming <jiaming@nfschina.com> Signed-off-by: Song Liu <song@kernel.org>
2022-07-03md/raid5: Increase restriction on max segments per requestLogan Gunthorpe
The block layer defaults the maximum segments to 128, which means requests tend to get split around the 512KB depending on how many pages can be merged. There's no such restriction in the raid5 code so increase the limit to USHRT_MAX so that larger requests can be sent as one. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org>
2022-07-03md/raid5: Improve debug printsLogan Gunthorpe
Add a debug print for raid5_make_request() so that each request is printed and add the logical sector number to the debug print in __add_stripe_bio(). Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org>
2022-07-03md/raid5: Pivot raid5_make_request()Logan Gunthorpe
raid5_make_request() loops through every page in the request, finds the appropriate stripe and adds the bio for that page in the disk. This causes a great deal of contention on the hash_lock and extra work seeing each stripe must be found once for every data disk. The number of times a stripe must be found can be reduced by pivoting raid5_make_request() so that it loops through every stripe and then loops through every disk in that stripe to see if the bio must be added. This reduces the number of times the hash lock must be taken by a factor equal to the number of data disks. To accomplish this, the logical sectors that have already been added must be tracked. Tracking them is done with a bitmap: the bits for all pages are set at the start of the request and each bit is cleared once the bio is added to a stripe. Finding the next sector to be done is then just a call to find_first_bit() so that sectors that have been done can simply be skipped. One minor downside is that the maximum sectors for a request must be limited so that the bitmap can be appropriately sized on the stack. This limit is arbitrarily chosen to be 256 stripe pages which works out to 1MB if PAGE_SIZE == DEFAULT_STRIPE_SIZE. This doesn't actually restrict the maximum request further seeing the default block queue settings are used which restricts the number of segments to 128 (which results in request sizes that are approximately 512KB). Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org>
2022-07-03md/raid5: Check all disks in a stripe_head for reshape progressLogan Gunthorpe
When testing if a previous stripe has had reshape expand past it, use the earliest or latest logical sector in all the disks for that stripe head. This will allow adding multiple disks at a time in a subesquent patch. To do this cleaner, refactor the check into a helper function called stripe_ahead_of_reshape(). Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
2022-07-03md/raid5: Refactor add_stripe_bio()Logan Gunthorpe
Factor out two helper functions from add_stripe_bio(): one to check for overlap (stripe_bio_overlaps()), and one to actually add the bio to the stripe (__add_stripe_bio()). The latter function will always succeed. This will be useful in the next patch so that overlap can be checked for multiple disks before adding any Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
2022-07-03md/raid5: Keep a reference to last stripe_head for batchLogan Gunthorpe
When batching, every stripe head has to find the previous stripe head to add to the batch list. This involves taking the hash lock which is highly contended during IO. Instead of finding the previous stripe_head each time, store a reference to the previous stripe_head in a pointer so that it doesn't require taking the contended lock another time. The reference to the previous stripe must be released before scheduling and waiting for work to get done. Otherwise, it can hold up raid5_activate_delayed() and deadlock. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Song Liu <song@kernel.org>
2022-07-03md/raid5: Refactor for loop in raid5_make_request() into while loopLogan Gunthorpe
The for loop with retry label can be more cleanly expressed as a while loop by moving the logical_sector increment into the success path. No functional changes intended. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
2022-07-03md/raid5: Move read_seqcount_begin() into make_stripe_request()Logan Gunthorpe
Now that prepare_to_wait() isn't in the way, move read_sequcount_begin() into make_stripe_request(). No functional changes intended. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Song Liu <song@kernel.org>
2022-07-03md/raid5: Drop the do_prepare flag in raid5_make_request()Logan Gunthorpe
prepare_to_wait() can be reasonably called after schedule instead of setting a flag and preparing in the next loop iteration. This means that prepare_to_wait() will be called before read_seqcount_begin(), but there shouldn't be any reason that the order matters here. On the first iteration of the loop prepare_to_wait() is already called first. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Song Liu <song@kernel.org>