drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
authorshaoyunl <shaoyun.liu@amd.com>
Sun, 14 Nov 2021 17:38:18 +0000 (12:38 -0500)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Wed, 8 Dec 2021 08:03:19 +0000 (09:03 +0100)
[ Upstream commit 2cf49e00d40d5132e3d067b5aa6d84791929ab15 ]

In SRIOV configuration, the reset may failed to bring asic back to normal but stop cpsch
already been called, the start_cpsch will not be called since there is no resume in this
case.  When reset been triggered again, driver should avoid to do uninitialization again.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c

index 352a32dc609b2026a039caef5a4e40688f9291fb..2645ebc63a14d8a64152159cbba3f11d7e1fb391 100644 (file)
@@ -1207,6 +1207,11 @@ static int stop_cpsch(struct device_queue_manager *dqm)
        bool hanging;
 
        dqm_lock(dqm);
+       if (!dqm->sched_running) {
+               dqm_unlock(dqm);
+               return 0;
+       }
+
        if (!dqm->is_hws_hang)
                unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);
        hanging = dqm->is_hws_hang || dqm->is_resetting;