wifi: ath10k: shutdown driver when hardware is unreliable
authorKang Yang <kang.yang@oss.qualcomm.com>
Mon, 23 Jun 2025 02:27:31 +0000 (10:27 +0800)
committerJeff Johnson <jeff.johnson@oss.qualcomm.com>
Mon, 30 Jun 2025 15:45:50 +0000 (08:45 -0700)
commitc256a94d1b1b15109740306f7f2a7c2173e12072
treefbce00eba1b185e590229d7a5ac400923d139e41
parent6e17bbb5a86e6c68d65e38dfc850699e7a0706cb
wifi: ath10k: shutdown driver when hardware is unreliable

In rare cases, ath10k may lose connection with the PCIe bus due to
some unknown reasons, which could further lead to system crashes during
resuming due to watchdog timeout:

ath10k_pci 0000:01:00.0: wmi command 20486 timeout, restarting hardware
ath10k_pci 0000:01:00.0: already restarting
ath10k_pci 0000:01:00.0: failed to stop WMI vdev 0: -11
ath10k_pci 0000:01:00.0: failed to stop vdev 0: -11
ieee80211 phy0: PM: **** DPM device timeout ****
Call Trace:
 panic+0x125/0x315
 dpm_watchdog_set+0x54/0x54
 dpm_watchdog_handler+0x57/0x57
 call_timer_fn+0x31/0x13c

At this point, all WMI commands will timeout and attempt to restart
device. So set a threshold for consecutive restart failures. If the
threshold is exceeded, consider the hardware is unreliable and all
ath10k operations should be skipped to avoid system crash.

fail_cont_count and pending_recovery are atomic variables, and
do not involve complex conditional logic. Therefore, even if recovery
check and reconfig complete are executed concurrently, the recovery
mechanism will not be broken.

Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00288-QCARMSWPZ-1

Signed-off-by: Kang Yang <kang.yang@oss.qualcomm.com>
Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://patch.msgid.link/20250623022731.509-1-kang.yang@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
drivers/net/wireless/ath/ath10k/core.c
drivers/net/wireless/ath/ath10k/core.h
drivers/net/wireless/ath/ath10k/mac.c
drivers/net/wireless/ath/ath10k/wmi.c