Commit | Line | Data |
---|---|---|
47bece87 TM |
1 | Last reviewed: 06/02/2009 |
2 | ||
3 | HP iLO2 NMI Watchdog Driver | |
4 | NMI sourcing for iLO2 based ProLiant Servers | |
5 | Documentation and Driver by | |
6 | Thomas Mingarelli <thomas.mingarelli@hp.com> | |
7 | ||
8 | The HP iLO2 NMI Watchdog driver is a kernel module that provides basic | |
9 | watchdog functionality and the added benefit of NMI sourcing. Both the | |
10 | watchdog functionality and the NMI sourcing capability need to be enabled | |
11 | by the user. Remember that the two modes are not dependent on one another. | |
12 | A user can have the NMI sourcing without the watchdog timer and vice versa. | |
13 | ||
14 | Watchdog functionality is enabled like any other common watchdog driver. That | |
15 | is, an application needs to be started that kicks off the watchdog timer. A | |
16 | basic application exists in the Documentation/watchdog/src directory called | |
17 | watchdog-test.c. Simply compile the C file and kick it off. If the system | |
18 | gets into a bad state and hangs, the HP ProLiant iLO 2 timer register will | |
19 | not be updated in a timely fashion and a hardware system reset (also known as | |
20 | an Automatic Server Recovery (ASR)) event will occur. | |
21 | ||
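As a rough illustration of what such an application does, the following is a minimal sketch in C. It is not the watchdog-test.c program itself. The function names (wdt_open, wdt_kick, wdt_run) are hypothetical; the WDIOC_KEEPALIVE ioctl and the "V" magic-close write come from the generic Linux watchdog interface, and the sketch assumes the driver honors magic close and that nowayout is not set.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Open the watchdog device node. Opening it starts the timer.
 * Returns a file descriptor, or -1 on error. */
int wdt_open(const char *path)
{
    return open(path, O_WRONLY);
}

/* Reset ("kick") the timer once. Returns 0 on success, -1 on error. */
int wdt_kick(int fd)
{
    int dummy = 0;

    return ioctl(fd, WDIOC_KEEPALIVE, &dummy);
}

/* Kick the timer `count` times, sleeping `interval_s` seconds after
 * each kick, then try to disarm the timer with the magic close
 * character. Returns 0 on success, -1 on any error. */
int wdt_run(const char *path, int count, unsigned int interval_s)
{
    int rc = 0;
    int fd = wdt_open(path);

    if (fd < 0)
        return -1;

    for (int i = 0; i < count; i++) {
        if (wdt_kick(fd) < 0) {
            close(fd);
            return -1;
        }
        sleep(interval_s);
    }

    /* Assumption: writing 'V' before close disarms the timer unless
     * nowayout is set. Without it, closing the device leaves the
     * timer running and an ASR will eventually follow. */
    if (write(fd, "V", 1) != 1)
        rc = -1;
    close(fd);
    return rc;
}
```

A caller would typically invoke something like wdt_run("/dev/watchdog", n, interval) with an interval comfortably shorter than soft_margin. If the process stops kicking without performing the magic close, the timer expires and the reset described above takes place.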
44df7535 | 22 | The hpwdt driver also has four (4) module parameters. They are the following: |
47bece87 TM |
23 | |
24 | soft_margin - allows the user to set the watchdog timer value | |
25 | allow_kdump - allows the user to save off a kernel dump image after an NMI | |
26 | nowayout - basic watchdog parameter that does not allow the timer to | |
27 | be restarted or an impending ASR to be escaped. | |
44df7535 TM |
28 | priority - determines whether or not the hpwdt driver is first on the |
29 | die_notify list to handle NMIs or last. The default value | |
30 | for this module parameter is 0 or LAST. If the user wants to | |
31 | enable NMI sourcing then reload the hpwdt driver with | |
32 | priority=1 (and boot with nmi_watchdog=0). | |
47bece87 TM |
33 | |
34 | NOTE: More information about watchdog drivers in general, including the ioctl | |
35 | interface to /dev/watchdog, can be found in | |
36 | Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt. | |
37 | ||
44df7535 TM |
38 | The priority parameter was introduced due to other kernel software that relied |
39 | on handling NMIs (like oprofile). Keeping hpwdt's priority at 0 (or LAST) | |
40 | enables the users of NMIs for non-critical events to work as expected. | |
41 | ||
42 | The NMI sourcing capability is disabled by default due to the inability to | |
47bece87 TM |
43 | distinguish between "NMI Watchdog Ticks" and "HW generated NMI events" in the |
44 | Linux kernel. What this means is that the hpwdt NMI handler code is called | |
45 | each time the NMI signal fires. This could amount to several thousand | |
46 | NMIs in a matter of seconds. If a user sees the Linux kernel's "dazed and | |
47 | confused" message in the logs or if the system gets into a hung state, then | |
44df7535 TM |
48 | the hpwdt driver can be reloaded with the "priority" module parameter set |
49 | (priority=1). | |
47bece87 TM |
50 | |
51 | 1. If the kernel has not been booted with nmi_watchdog turned off, then | |
52 | edit /boot/grub/menu.lst and place nmi_watchdog=0 at the end of the | |
53 | currently booting kernel line. | |
54 | 2. Reboot the server. | |
44df7535 TM |
55 | 3. Once the system comes up, run rmmod hpwdt |
56 | 4. insmod /lib/modules/`uname -r`/kernel/drivers/char/watchdog/hpwdt.ko priority=1 | |
47bece87 TM |
57 | |
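Before reloading the driver in steps 3 and 4, it can be useful to confirm that the running kernel really was booted with nmi_watchdog=0. The helper below is a small sketch of such a check in C. The function name is hypothetical, and it assumes the caller has already read the kernel command line (for example from /proc/cmdline) into a string.

```c
#include <string.h>

/* Return 1 if `cmdline` contains the exact token "nmi_watchdog=0"
 * delimited by whitespace or the string boundaries, otherwise 0. */
int cmdline_has_nmi_watchdog_off(const char *cmdline)
{
    static const char token[] = "nmi_watchdog=0";
    const size_t len = sizeof(token) - 1;
    const char *p = cmdline;

    while ((p = strstr(p, token)) != NULL) {
        /* The token must start at the beginning or after whitespace. */
        int starts = (p == cmdline) || p[-1] == ' ' || p[-1] == '\t';
        /* The token must be followed by whitespace or the end. */
        char end = p[len];
        int ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';

        if (starts && ends)
            return 1;
        p += len;
    }
    return 0;
}
```

Matching on token boundaries avoids false positives from values such as nmi_watchdog=01 or from longer parameter names that merely end in the same text.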
58 | Now the hpwdt driver can successfully receive and source the NMI and provide a log | |
59 | message that details the reason for the NMI (as determined by the HP BIOS). | |
60 | ||
61 | Below is a list of NMIs the HP BIOS understands along with the associated | |
62 | code (reason): | |
63 | ||
64 | No source found 00h | |
65 | ||
66 | Uncorrectable Memory Error 01h | |
67 | ||
68 | ASR NMI 1Bh | |
69 | ||
70 | PCI Parity Error 20h | |
71 | ||
72 | NMI Button Press 27h | |
73 | ||
74 | SB_BUS_NMI 28h | |
75 | ||
76 | ILO Doorbell NMI 29h | |
77 | ||
78 | ILO IOP NMI 2Ah | |
79 | ||
80 | ILO Watchdog NMI 2Bh | |
81 | ||
82 | Proc Throt NMI 2Ch | |
83 | ||
84 | Front Side Bus NMI 2Dh | |
85 | ||
86 | PCI Express Error 2Fh | |
87 | ||
88 | DMA controller NMI 30h | |
89 | ||
90 | Hypertransport/CSI Error 31h | |
91 | ||
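For tooling that post-processes logs, the list above can be expressed as a lookup table. The following C sketch does that; the struct and function names are hypothetical and are not taken from the hpwdt driver, and the codes and descriptions are copied from the list above without additions.

```c
#include <stddef.h>

/* One entry per NMI reason code reported by the HP BIOS. */
struct nmi_reason {
    unsigned char code;
    const char *text;
};

static const struct nmi_reason nmi_reasons[] = {
    { 0x00, "No source found" },
    { 0x01, "Uncorrectable Memory Error" },
    { 0x1B, "ASR NMI" },
    { 0x20, "PCI Parity Error" },
    { 0x27, "NMI Button Press" },
    { 0x28, "SB_BUS_NMI" },
    { 0x29, "ILO Doorbell NMI" },
    { 0x2A, "ILO IOP NMI" },
    { 0x2B, "ILO Watchdog NMI" },
    { 0x2C, "Proc Throt NMI" },
    { 0x2D, "Front Side Bus NMI" },
    { 0x2F, "PCI Express Error" },
    { 0x30, "DMA controller NMI" },
    { 0x31, "Hypertransport/CSI Error" },
};

/* Return the description for `code`, or NULL if the code is not
 * in the documented list. */
const char *nmi_reason_text(unsigned char code)
{
    size_t n = sizeof(nmi_reasons) / sizeof(nmi_reasons[0]);

    for (size_t i = 0; i < n; i++) {
        if (nmi_reasons[i].code == code)
            return nmi_reasons[i].text;
    }
    return NULL;
}
```

Returning NULL for unlisted codes (such as 2Eh, which does not appear in the list) lets the caller decide how to report an unknown reason instead of guessing at one.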
92 | ||
93 | ||
94 | -- Tom Mingarelli | |
95 | (thomas.mingarelli@hp.com) |