drm/amdgpu : Add hive ras recovery check
author Asad Kamal <asad.kamal@amd.com>
Thu, 5 Oct 2023 07:40:42 +0000 (15:40 +0800)
committer Alex Deucher <alexander.deucher@amd.com>
Thu, 19 Oct 2023 22:26:51 +0000 (18:26 -0400)
commit 53dd920c1f471a5763c660a7b94fe0aaf746d357
tree 67ee44f4003a8a8bb1e8812fb2c03c2df4a6fe3e
parent 16fb2a41e64e3133e9457c85490f6ee36c2ffaaf

If one device in the hive detects a fatal error,
a RAS recovery reset message must be sent to the
PMFW of every device in the hive. To support
that, add a flag to the hive that indicates it is
undergoing RAS recovery.
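
The idea described above can be illustrated with a small, self-contained userspace sketch. It is not the amdgpu code: `struct hive_sketch`, `device_reset_param()`, `hive_handle_fatal_error()`, and the parameter values 0 and 1 are hypothetical names and values chosen for illustration, and a plain array stands in for the messages sent to each device's firmware. The sketch assumes the flag is set before the per-device resets are issued and cleared afterwards, so that every device, not only the one that saw the error, selects the RAS recovery variant of the reset message.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HIVE_DEVICES 8

/* Hypothetical stand-in for a hive of interconnected devices. */
struct hive_sketch {
    atomic_int ras_recovery;            /* nonzero while the hive is in RAS recovery */
    size_t num_devices;
    int reset_param[MAX_HIVE_DEVICES];  /* last reset parameter "sent" to each device */
};

/*
 * Choose the reset parameter for one device. A device uses the recovery
 * variant (1, an illustrative value) if it saw the fatal error itself or
 * if its hive is currently flagged as undergoing RAS recovery.
 */
int device_reset_param(struct hive_sketch *hive, bool local_fatal_error)
{
    bool recovering = atomic_load(&hive->ras_recovery) != 0;

    return (local_fatal_error || recovering) ? 1 : 0;
}

/*
 * Handle a fatal error reported by any device in the hive: raise the flag,
 * reset every device, then lower the flag again.
 */
void hive_handle_fatal_error(struct hive_sketch *hive)
{
    atomic_store(&hive->ras_recovery, 1);

    for (size_t i = 0; i < hive->num_devices && i < MAX_HIVE_DEVICES; i++) {
        /* Devices that did not detect the error still see the hive flag. */
        hive->reset_param[i] = device_reset_param(hive, false);
    }

    atomic_store(&hive->ras_recovery, 0);
}
```

In this sketch, calling `hive_handle_fatal_error()` on a four-device hive leaves all four `reset_param` entries at 1, while a reset requested outside recovery with no local error yields 0. An atomic flag is used here because, in a real driver, the reset path and the recovery path may run in different contexts.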

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c