net/mlx5: Fix missing lock on sync reset reload
author Moshe Shemesh <moshe@nvidia.com>
Tue, 30 Jul 2024 06:16:34 +0000 (09:16 +0300)
committer Jakub Kicinski <kuba@kernel.org>
Thu, 1 Aug 2024 01:04:51 +0000 (18:04 -0700)
In the sync reset reload work, when the remote host updates devlink
about the reload actions performed on that host, it does not take the
devlink lock before calling devlink_remote_reload_actions_performed().
This triggers a lock assert like the following:

WARNING: CPU: 4 PID: 1164 at net/devlink/core.c:261 devl_assert_locked+0x3e/0x50

 CPU: 4 PID: 1164 Comm: kworker/u96:6 Tainted: G S      W          6.10.0-rc2+ #116
 Hardware name: Supermicro SYS-2028TP-DECTR/X10DRT-PT, BIOS 2.0 12/18/2015
 Workqueue: mlx5_fw_reset_events mlx5_sync_reset_reload_work [mlx5_core]
 RIP: 0010:devl_assert_locked+0x3e/0x50

 Call Trace:
  <TASK>
  ? __warn+0xa4/0x210
  ? devl_assert_locked+0x3e/0x50
  ? report_bug+0x160/0x280
  ? handle_bug+0x3f/0x80
  ? exc_invalid_op+0x17/0x40
  ? asm_exc_invalid_op+0x1a/0x20
  ? devl_assert_locked+0x3e/0x50
  devlink_notify+0x88/0x2b0
  ? mlx5_attach_device+0x20c/0x230 [mlx5_core]
  ? __pfx_devlink_notify+0x10/0x10
  ? process_one_work+0x4b6/0xbb0
  process_one_work+0x4b6/0xbb0
[…]

Fix it by taking the devlink lock around the call to
devlink_remote_reload_actions_performed().

Fixes: 84a433a40d0e ("net/mlx5: Lock mlx5 devlink reload callbacks")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://patch.msgid.link/20240730061638.1831002-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
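
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index 979c49ae6b5cc987eeac168adfc03f2278a0d91e..b43ca0b762c306112e8ead14a3e7cda8dcdf0738 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c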
@@ -207,6 +207,7 @@ int mlx5_fw_reset_set_live_patch(struct mlx5_core_dev *dev)
 static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev *dev, bool unloaded)
 {
        struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+       struct devlink *devlink = priv_to_devlink(dev);
 
        /* if this is the driver that initiated the fw reset, devlink completed the reload */
        if (test_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags)) {
@@ -218,9 +219,11 @@ static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev *dev, bool unload
                        mlx5_core_err(dev, "reset reload flow aborted, PCI reads still not working\n");
                else
                        mlx5_load_one(dev, true);
-               devlink_remote_reload_actions_performed(priv_to_devlink(dev), 0,
+               devl_lock(devlink);
+               devlink_remote_reload_actions_performed(devlink, 0,
                                                        BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT) |
                                                        BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE));
+               devl_unlock(devlink);
        }
 }
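
For readers unfamiliar with the lock-assert pattern behind the warning above, the following is a minimal user-space sketch, not kernel code. Every name in it (struct fake_devlink, fake_devl_lock(), fake_remote_reload_actions_performed(), and so on) is hypothetical and only mimics the shape of the devlink helpers named in this commit. It assumes a callee that checks whether its caller holds the instance lock and records a warning when it does not, which is roughly what devl_assert_locked() does with a WARN in the kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a devlink instance; NOT the kernel struct. */
struct fake_devlink {
	pthread_mutex_t lock;
	bool locked;                     /* tracked only for the assert helper */
	unsigned long actions_performed; /* bitmask of reported reload actions */
	int assert_warnings;             /* counts "lock assert" hits */
};

/* Illustrative action numbers for this sketch only. */
enum { FAKE_ACTION_DRIVER_REINIT = 1, FAKE_ACTION_FW_ACTIVATE = 2 };
#define FAKE_BIT(n) (1UL << (n))

static void fake_devl_init(struct fake_devlink *dl)
{
	pthread_mutex_init(&dl->lock, NULL);
	dl->locked = false;
	dl->actions_performed = 0;
	dl->assert_warnings = 0;
}

static void fake_devl_lock(struct fake_devlink *dl)
{
	pthread_mutex_lock(&dl->lock);
	dl->locked = true;
}

static void fake_devl_unlock(struct fake_devlink *dl)
{
	assert(dl->locked); /* unlocking an unheld lock is a caller bug */
	dl->locked = false;
	pthread_mutex_unlock(&dl->lock);
}

/* Like a WARN: note the problem and keep going instead of aborting. */
static void fake_devl_assert_locked(struct fake_devlink *dl)
{
	if (!dl->locked)
		dl->assert_warnings++;
}

/* Callee that expects the caller to already hold the instance lock. */
static void fake_remote_reload_actions_performed(struct fake_devlink *dl,
						 unsigned long actions)
{
	fake_devl_assert_locked(dl);
	dl->actions_performed |= actions;
}

/* Shape of the code before the fix: the call is made without the lock. */
static void complete_reload_unlocked(struct fake_devlink *dl)
{
	fake_remote_reload_actions_performed(dl,
		FAKE_BIT(FAKE_ACTION_DRIVER_REINIT) |
		FAKE_BIT(FAKE_ACTION_FW_ACTIVATE));
}

/* Shape of the code after the fix: lock, call, unlock. */
static void complete_reload_locked(struct fake_devlink *dl)
{
	fake_devl_lock(dl);
	fake_remote_reload_actions_performed(dl,
		FAKE_BIT(FAKE_ACTION_DRIVER_REINIT) |
		FAKE_BIT(FAKE_ACTION_FW_ACTIVATE));
	fake_devl_unlock(dl);
}
```

In this sketch, calling complete_reload_unlocked() on a freshly initialized instance bumps assert_warnings once, while complete_reload_locked() leaves the counter unchanged. Both variants still record the actions, which mirrors how a WARN reports a locking bug without stopping execution.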