powerpc/eeh: No hotplug on permanently removed dev
authorGavin Shan <gwshan@linux.vnet.ibm.com>
Thu, 24 Apr 2014 08:00:19 +0000 (18:00 +1000)
committerBenjamin Herrenschmidt <benh@kernel.crashing.org>
Mon, 28 Apr 2014 07:34:32 +0000 (17:34 +1000)
commitd2b0f6f77ee525811b6efe864efa6a4eb82eea73
tree84205706f9cc2e03425ba3a48edf2a1d527e3267
parent7f52a526f64c69c913f0027fbf43821ff0b3a7d7
powerpc/eeh: No hotplug on permanently removed dev

The issue was detected in a bit complicated test case where
we have multiple hierarchical PEs shown as following figure:

                +-----------------+
                | PE#3     p2p#0  |
                |          p2p#1  |
                +-----------------+
                        |
                +-----------------+
                | PE#4     pdev#0 |
                |          pdev#1 |
                +-----------------+

PE#4 (have 2 PCI devices) is the child of PE#3, which has 2 p2p
bridges. We accidentally had less-known scenario: PE#4 was removed
permanently from the system because of permanent failure (e.g.
exceeding the max allowd failure times in last hour), then we detects
EEH errors on PE#3 and tried to recover it. However, eeh_dev instances
for pdev#0/1 were not detached from PE#4, which was still connected to
PE#3. All of that was because of the fact that we rely on count-based
pcibios_release_device(), which isn't reliable enough. When doing
recovery for PE#3, we still apply hotplug on PE#4 and pdev#0/1, which
are not valid any more. Eventually, we run into kernel crash.

The patch fixes above issue from two aspects. For unplug, we simply
skip those permanently removed PE, whose state is (EEH_PE_STATE_ISOLATED
&& !EEH_PE_STATE_RECOVERING) and its frozen count should be greater
than EEH_MAX_ALLOWED_FREEZES. For plug, we marked all permanently
removed EEH devices with EEH_DEV_REMOVED and return 0xFF's on read
its PCI config so that PCI core will omit them.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
arch/powerpc/include/asm/eeh.h
arch/powerpc/include/asm/ppc-pci.h
arch/powerpc/kernel/eeh_driver.c
arch/powerpc/kernel/eeh_pe.c
arch/powerpc/kernel/pci_of_scan.c
arch/powerpc/platforms/powernv/pci.c