kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count
author Max Kellermann <max.kellermann@ionos.com>
Sun, 4 May 2025 18:08:30 +0000 (20:08 +0200)
committer Andrew Morton <akpm@linux-foundation.org>
Wed, 21 May 2025 17:48:22 +0000 (10:48 -0700)
commit aaf05e96e93cb9c8fc8d35fc4525715530397655
tree 66a02bb56d03e1cf7a97cbf6cd1dc9e4e10148db
parent cc66e4863ac3a00c709bfcbec685d48c184172b5
kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count

Patch series "sysfs: add counters for lockups and stalls", v2.

Commits 9db89b411170 ("exit: Expose "oops_count" to sysfs") and
8b05aa263361 ("panic: Expose "warn_count" to sysfs") added counters for
oopses and warnings to sysfs, and these two patches do the same for
hard/soft lockups and RCU stalls.

All of these counters are useful for monitoring tools to detect whether
the machine is healthy.  If the kernel has experienced a lockup or a
stall, it's probably due to a kernel bug, and I'd like to detect that
quickly and easily.  There is currently no way to detect that other than
parsing dmesg or observing indirect effects, such as certain tasks not
responding.  But then I need to observe all tasks, and it may take a
while until these effects become visible/measurable.  I'd rather be able
to detect the primary cause more quickly, possibly before everything
falls apart.

This patch (of 2):

There are /proc/sys/kernel/hung_task_detect_count, /sys/kernel/warn_count
and /sys/kernel/oops_count, but there is no userspace-accessible counter
for hard/soft lockups.  Having one is useful for monitoring tools.
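
As a rough illustration of how a monitoring tool might consume the new
files, here is a minimal userspace sketch.  It assumes each file holds a
single decimal integer followed by a newline (as oops_count and warn_count
do) and that a file can be absent, for example on a kernel without this
patch.  The helper names read_counter, read_lockup_counts and new_lockups
are hypothetical and not part of the patch.

```python
from pathlib import Path

# File names taken from the subject of this patch.
COUNTERS = ("hardlockup_count", "softlockup_count")


def read_counter(path):
    """Return the counter value as an int, or None if the file is missing."""
    try:
        # Assumption: the file contains one decimal integer plus a newline.
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None


def read_lockup_counts(base="/sys/kernel"):
    """Read both lockup counters found below `base`."""
    return {name: read_counter(Path(base) / name) for name in COUNTERS}


def new_lockups(previous, current):
    """Return the names of counters that increased between two samples."""
    return [
        name
        for name in COUNTERS
        if previous.get(name) is not None
        and current.get(name) is not None
        and current[name] > previous[name]
    ]


if __name__ == "__main__":
    print(read_lockup_counts())
```

A tool would call read_lockup_counts() periodically and raise an alert
whenever new_lockups() returns a non-empty list, instead of parsing dmesg.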

Link: https://lkml.kernel.org/r/20250504180831.4190860-1-max.kellermann@ionos.com
Link: https://lkml.kernel.org/r/20250504180831.4190860-2-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Cc:
Cc: Corey Minyard <cminyard@mvista.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Documentation/ABI/testing/sysfs-kernel-hardlockup_count [new file with mode: 0644]
Documentation/ABI/testing/sysfs-kernel-softlockup_count [new file with mode: 0644]
kernel/watchdog.c