accel/amdxdna: Add error handling
author Lizhi Hou <lizhi.hou@amd.com>
Mon, 18 Nov 2024 17:29:41 +0000 (09:29 -0800)
committer Jeffrey Hugo <quic_jhugo@quicinc.com>
Fri, 22 Nov 2024 18:44:47 +0000 (11:44 -0700)
commit 4fd4ca984b833a41f36bf7b2eaa9025377e310d0
tree 8cc115047f720efcc39aa601a8493179201727b8
parent bed4c73e59e8e32a3dd68a5ea755601ab000bf7b

When there is a hardware error, the NPU firmware notifies the host through
a mailbox message. The message includes details of the error, such as the
tile and column indexes where the error occurred.
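The message format itself is not part of this commit message. The sketch below is a minimal illustration that assumes one error message carries a list of per-tile records, each holding a row (tile) index and a column index. It shows how a handler could fold those records into the set of columns that need attention. `struct hyp_npu_error`, `HYP_MAX_COLS`, and `hyp_affected_columns()` are hypothetical names, not the driver's or firmware's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit: columns are tracked as bits in a 32-bit mask. */
#define HYP_MAX_COLS 32u

/* Hypothetical per-tile error record; NOT the real firmware layout. */
struct hyp_npu_error {
	uint8_t row;       /* tile (row) index where the error occurred */
	uint8_t col;       /* column index where the error occurred */
	uint32_t category; /* error category as reported by firmware */
};

/*
 * Fold every record in one error message into a bitmask of affected
 * columns: bit i is set if any record names column i. Several tiles in
 * the same column collapse into a single bit.
 */
static uint32_t hyp_affected_columns(const struct hyp_npu_error *errs,
				     size_t n)
{
	uint32_t mask = 0;

	for (size_t i = 0; i < n; i++) {
		if (errs[i].col >= HYP_MAX_COLS)
			continue; /* skip out-of-range column indexes */
		mask |= 1u << errs[i].col;
	}
	return mask;
}
```

Using a column bitmask here is a design choice for the sketch: the recovery step described next operates per column, so the tile index only matters for reporting.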

The driver starts a thread to handle the NPU error message. The thread
stops the clients that are using the column where the error occurred.
Then the driver resets that column.
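The ordering described above (stop the affected clients first, then reset the column) can be sketched as the body that such a thread might run for one message. This is a simplified, single-threaded illustration under stated assumptions: each client is modeled as owning a contiguous range of columns, "stopping" just sets a flag, and the reset is recorded in a mask. `struct hyp_client`, `struct hyp_device`, `hyp_client_cols()`, and `hyp_handle_error()` are hypothetical and do not correspond to functions in `aie2_error.c`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client owning columns [start_col, start_col + num_cols). */
struct hyp_client {
	uint32_t start_col;
	uint32_t num_cols;
	bool stopped;
};

/* Hypothetical device state; bit i of reset_mask means column i was reset. */
struct hyp_device {
	struct hyp_client *clients;
	size_t nclients;
	uint32_t reset_mask;
};

/* Columns a client occupies, as a bitmask (assumes start_col + num_cols <= 32). */
static uint32_t hyp_client_cols(const struct hyp_client *c)
{
	if (c->num_cols == 0)
		return 0;
	uint32_t span = c->num_cols >= 32 ? UINT32_MAX : (1u << c->num_cols) - 1u;
	return span << c->start_col;
}

/* Handle one error message whose affected columns are given in bad_cols. */
static void hyp_handle_error(struct hyp_device *dev, uint32_t bad_cols)
{
	/* Step 1: stop every client whose columns overlap a failed column. */
	for (size_t i = 0; i < dev->nclients; i++) {
		struct hyp_client *c = &dev->clients[i];

		if (!c->stopped && (hyp_client_cols(c) & bad_cols))
			c->stopped = true; /* stand-in for stopping the client */
	}

	/* Step 2: only after those clients are stopped, reset the columns. */
	dev->reset_mask |= bad_cols; /* stand-in for a per-column reset */
}
```

Clients on unaffected columns keep running; only those sharing a failed column are stopped before the reset, which is why the stop step must complete first.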

Co-developed-by: Min Ma <min.ma@amd.com>
Signed-off-by: Min Ma <min.ma@amd.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241118172942.2014541-10-lizhi.hou@amd.com
drivers/accel/amdxdna/Makefile
drivers/accel/amdxdna/aie2_error.c [new file with mode: 0644]
drivers/accel/amdxdna/aie2_message.c
drivers/accel/amdxdna/aie2_pci.c
drivers/accel/amdxdna/aie2_pci.h