vdpa/mlx5: Parallelize device resume
author Dragos Tatulea <dtatulea@nvidia.com>
Fri, 16 Aug 2024 09:01:56 +0000 (12:01 +0300)
committer Michael S. Tsirkin <mst@redhat.com>
Wed, 25 Sep 2024 11:07:42 +0000 (07:07 -0400)
commit 5eb8c7eb1ec74ac6b9e7337674cb7a33e82a1e68
tree 2e109fced0668b194871554e8526a25d12ef3354
parent dcf3eac01f063df0a60ea779399331d2ac535784
vdpa/mlx5: Parallelize device resume

Currently, device resume processes VQs serially. Building on previous
changes that converted VQ operations to the async API, this patch
parallelizes device resume.
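
As a rough illustration of why issuing the per-VQ commands together helps,
the sketch below models the two strategies with a toy cost model. It is not
the mlx5_vdpa driver code: struct vq_cmd, resume_serial(), resume_batched(),
the depth parameter, and the latency values are all hypothetical, and the
"wave" model (up to depth commands in flight, each wave costing as much as
its slowest command) is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one per-VQ resume command (not a driver type). */
struct vq_cmd {
	unsigned int vq_idx;     /* index of the virtqueue being resumed */
	unsigned int latency_us; /* assumed time the device needs for it */
	int done;                /* set once the command has completed */
};

/* Serial resume: issue one command and wait for it before the next.
 * The modeled elapsed time is the sum of all command latencies. */
unsigned int resume_serial(struct vq_cmd *cmds, size_t n)
{
	unsigned int elapsed = 0;

	for (size_t i = 0; i < n; i++) {
		elapsed += cmds[i].latency_us; /* issue, then block until done */
		cmds[i].done = 1;
	}
	return elapsed;
}

/* Batched resume: issue up to `depth` commands without waiting, then wait
 * for that whole wave. Each wave is modeled as costing only as much as its
 * slowest command, because the commands overlap on the device. */
unsigned int resume_batched(struct vq_cmd *cmds, size_t n, size_t depth)
{
	unsigned int elapsed = 0;

	assert(depth > 0);
	for (size_t start = 0; start < n; start += depth) {
		size_t end = (n - start < depth) ? n : start + depth;
		unsigned int wave_max = 0;

		for (size_t i = start; i < end; i++) {
			if (cmds[i].latency_us > wave_max)
				wave_max = cmds[i].latency_us;
			cmds[i].done = 1;
		}
		elapsed += wave_max;
	}
	return elapsed;
}
```

With 32 modeled commands of equal latency and a depth of 8, the batched
variant waits for 4 waves instead of 32 individual completions. The real
gain depends on how much the device actually overlaps the commands.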

For 1 vDPA device x 32 VQs (16 VQPs) attached to a large VM (256 GB RAM,
32 CPUs x 2 threads per core), the device resume time is reduced from
~16 ms to ~4.5 ms.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20240816090159.1967650-8-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
drivers/vdpa/mlx5/net/mlx5_vnet.c