===========================
HPE iLO NMI Watchdog Driver
===========================

for iLO based ProLiant Servers
==============================

Last reviewed: 08/20/2018


The HPE iLO NMI Watchdog driver is a kernel module that provides basic
watchdog functionality and a handler for the iLO "Generate NMI to System"
virtual button.

All references to iLO in this document imply it also works on iLO2 and all
subsequent generations.

Watchdog functionality is enabled like any other common watchdog driver. That
is, an application needs to be started that arms the watchdog timer and then
updates it periodically. A basic application named watchdog-test.c exists in
tools/testing/selftests/watchdog/. Simply compile the C file and run it. If
the system gets into a bad state and hangs, the HPE ProLiant iLO timer
register will not be updated in a timely fashion and a hardware system reset
(also known as an Automatic Server Recovery (ASR) event) will occur.

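As a rough illustration of what such an application does, the sketch below
uses the write-based interface described in
Documentation/watchdog/watchdog-api.rst: opening the device arms the timer,
each write restarts the countdown, and writing the character 'V' just before
close requests a stop ("magic close"). The function name kick_watchdog and its
parameters are hypothetical, and the sketch is not a replacement for
watchdog-test.c. Running it against the real device arms the watchdog, so the
system will reset if the process stalls.

```python
import os
import time


def kick_watchdog(path="/dev/watchdog", kicks=3, interval=1.0, disarm=True):
    """Open the watchdog device, kick it `kicks` times, then close it.

    Opening the device arms the timer; every write restarts the countdown.
    `interval` must stay well below the configured timeout.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        for _ in range(kicks):
            os.write(fd, b"\0")  # any write counts as a keepalive
            time.sleep(interval)
        if disarm:
            # "Magic close": a 'V' written just before close asks the
            # watchdog core to stop the timer. It has no effect when
            # nowayout is set.
            os.write(fd, b"V")
    finally:
        os.close(fd)
```

A real daemon would loop for as long as the system is considered healthy
instead of stopping after a fixed number of kicks.
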
The hpwdt driver also has the following module parameters:

============ ================================================================
soft_margin  allows the user to set the watchdog timer value.
             Default value is 30 seconds.
timeout      an alias of soft_margin.
pretimeout   allows the user to set the watchdog pretimeout value.
             This is the number of seconds before timeout when an
             NMI is delivered to the system. Setting the value to
             zero disables the pretimeout NMI.
             Default value is 9 seconds.
nowayout     basic watchdog parameter that does not allow the timer to
             be restarted or an impending ASR to be escaped.
             Default value is set when compiling the kernel. If it is set
             to "Y", then there is no way of disabling the watchdog once
             it has been started.
kdumptimeout Minimum timeout in seconds to apply upon receipt of an NMI
             before calling panic. (-1) disables the watchdog. When value
             is > 0, the timer is reprogrammed with the greater of
             value or current timeout value.
============ ================================================================

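The kdumptimeout rule in the table can be read as simple arithmetic. The
sketch below is a model of the documented behaviour only, not the driver's
code. The function name timer_after_nmi is hypothetical, and two cases the
table does not spell out are assumptions: a value of zero is taken to leave
the timer unchanged, and any negative value is treated like -1.

```python
def timer_after_nmi(kdumptimeout, current_timeout):
    """Return the timeout in seconds programmed after an NMI.

    Returns None when the watchdog is disabled instead.
    """
    if kdumptimeout < 0:
        # Documented for -1; other negative values are an assumption.
        return None
    if kdumptimeout == 0:
        # Assumption: zero leaves the current timeout untouched.
        return current_timeout
    # Documented: the greater of kdumptimeout and the current timeout.
    return max(kdumptimeout, current_timeout)
```

For example, with a current timeout of 30 seconds, timer_after_nmi(60, 30)
gives 60 and timer_after_nmi(15, 30) gives 30, so a crash dump always gets at
least kdumptimeout seconds before the iLO resets the system.
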
NOTE:
       More information about watchdog drivers in general, including the ioctl
       interface to /dev/watchdog, can be found in
       Documentation/watchdog/watchdog-api.rst and Documentation/IPMI.txt.

Due to limitations in the iLO hardware, the NMI pretimeout, if enabled,
can only be set to 9 seconds. Attempts to set pretimeout to other
non-zero values will be rounded, possibly to zero. Users should verify
the pretimeout value after attempting to set pretimeout or timeout.

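One way to perform that verification without opening /dev/watchdog (which
would arm the timer) is to read the value back from sysfs. The sketch below
assumes the kernel was built with CONFIG_WATCHDOG_SYSFS and that hpwdt is
registered as watchdog0; both are assumptions about the running system, and
the function names are hypothetical.

```python
from pathlib import Path


def read_pretimeout(sysfs_dir="/sys/class/watchdog/watchdog0"):
    """Return the pretimeout, in seconds, reported by the watchdog core."""
    return int(Path(sysfs_dir, "pretimeout").read_text().strip())


def pretimeout_applied(requested, sysfs_dir="/sys/class/watchdog/watchdog0"):
    """Return True if the reported pretimeout equals the requested value."""
    return read_pretimeout(sysfs_dir) == requested
```

If pretimeout_applied(9) returns False after loading the driver with
pretimeout=9, the value was rounded and the pretimeout NMI may be disabled.
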
Upon receipt of an NMI from the iLO, the hpwdt driver will initiate a
panic. This is to allow for a crash dump to be collected. It is incumbent
upon the user to have properly configured the system for kdump.

The default Linux kernel behavior upon panic is to print a kernel tombstone
and loop forever. This is generally not what a watchdog user wants.

For those wishing to learn more, please see:
	- Documentation/admin-guide/kdump/kdump.rst
	- Documentation/admin-guide/kernel-parameters.txt (panic=)
	- Your Linux Distribution specific documentation.

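The panic behavior currently in effect can be checked through the
kernel.panic sysctl, which holds the same setting as the panic= boot
parameter: zero means loop forever, a positive value is the number of seconds
before reboot, and a negative value means reboot immediately. The helper
names below are hypothetical, and the sketch only reports the setting; it
does not change it or account for a configured kdump kernel.

```python
from pathlib import Path


def panic_reboot_delay(sysctl_path="/proc/sys/kernel/panic"):
    """Return the kernel.panic value as an integer number of seconds."""
    return int(Path(sysctl_path).read_text().strip())


def describe_panic_behavior(delay):
    """Translate a kernel.panic value into a human-readable description."""
    if delay == 0:
        return "loop forever after panic"
    if delay > 0:
        return f"reboot {delay} seconds after panic"
    return "reboot immediately after panic"
```

Calling describe_panic_behavior(panic_reboot_delay()) on a system left at the
default reports the loop-forever case described above.
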
If the hpwdt driver does not receive the NMI associated with an expiring
timer, the iLO will proceed to reset the system at timeout if the timer
hasn't been updated.

--

The HPE iLO NMI Watchdog Driver and documentation were originally developed
by Tom Mingarelli.