Check dmesg
Overview
Checks the kernel ring buffer (dmesg) for messages at severity levels emerg, alert, crit, and err. Known false positives and hardware-specific noise are filtered out by default. To clear reported messages after resolving the underlying issue, run "dmesg --clear". Requires root or sudo.
Important Notes:
The reported timestamps may be inaccurate. The time source used for dmesg is not updated after system SUSPEND/RESUME. Timestamps are adjusted according to the current delta between boottime and monotonic clocks, which only works for messages printed after the last resume
The kernel ring buffer is a fixed-size circular buffer. Over time, newer messages overwrite older ones, so errors that have been resolved and whose messages have been overwritten will no longer be reported
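The overwrite behaviour described in the second note can be illustrated with a small model. This is only a sketch: it uses Python's `collections.deque` as a stand-in for the kernel ring buffer, and both the capacity of 3 and the message texts are made up for illustration.

```python
from collections import deque

# A tiny stand-in for the kernel ring buffer: a fixed-size queue that
# silently drops its oldest entry once it is full. The capacity of 3 is
# illustrative only; the real buffer size is set by the kernel.
ring = deque(maxlen=3)

# Hypothetical messages, oldest first.
for message in ["err: disk A", "info: boot done", "err: usb port", "info: link up"]:
    ring.append(message)

# "err: disk A" was the oldest entry and has been overwritten, so a check
# that reads the buffer now would no longer see it.
remaining = list(ring)
print(remaining)
```

The same effect explains why an old error can disappear from the check result without anyone running "dmesg --clear".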
Data Collection:
Executes "dmesg --level=emerg,alert,crit,err --ctime" to read the kernel ring buffer
Known false positives are filtered out by default, including common harmless messages such as "Assuming drive cache: write through", "ioctl error in smb2_get_dfs_refer rc=-5", "shpchp pci_hp_register failed with error -16" on virtualized hosts, and various KVM/EFI/SMBus messages. The bundled default ignore list is annotated inline with the rationale and reference URLs for each entry, so it can be re-evaluated as the plugin matures
Additional messages can be excluded using the --ignore parameter, which accepts Python regular expressions and may be specified multiple times. Once --ignore is given, the user-supplied list replaces the bundled default ignore list, so admins can curate their own catalogue without inheriting the defaults
If more than 10 error lines are found, the output is shortened to the first 5 and last 5 lines
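The filtering and shortening steps above can be sketched as follows. This is not the plugin's actual code: the function names `filter_lines` and `shorten` are hypothetical, the use of `re.search` is an assumption about how patterns are applied, and the input lines are invented for the example.

```python
import re


def filter_lines(lines, ignore_patterns):
    """Drop every line that matches at least one ignore pattern.

    Assumption: patterns are applied with re.search; the plugin may
    match differently.
    """
    compiled = [re.compile(pattern) for pattern in ignore_patterns]
    return [line for line in lines if not any(rx.search(line) for rx in compiled)]


def shorten(lines, limit=10, keep=5):
    """Return all lines if there are at most `limit`, else the first and last `keep`."""
    if len(lines) <= limit:
        return list(lines)
    return list(lines[:keep]) + list(lines[-keep:])


# Hypothetical input: 12 error lines plus one line that should be ignored.
raw = [f"[ts] error {i}" for i in range(12)]
raw.append("[ts] kvm: vcpu0 unhandled rdmsr: 0x1c9")

kept = filter_lines(raw, [r"^.* unhandled (rd|wr)msr: "])
shown = shorten(kept)
print(len(kept), len(shown))
```

Filtering runs before shortening in this sketch, so the 10-line threshold applies to the lines that remain after the ignore list has been applied.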
Fact Sheet
| Fact | Value |
|---|---|
| Check Plugin Download | https://github.com/Linuxfabrik/monitoring-plugins/tree/main/check-plugins/dmesg |
| Nagios/Icinga Check Name | |
| Check Interval Recommendation | Every minute |
| Can be called without parameters | Yes |
| Runs on | Linux |
| Compiled for Windows | No |
Help
usage: dmesg [-h] [-V] [--always-ok] [--ignore IGNORE] [--test TEST]

Checks the kernel ring buffer (dmesg) for messages at severity levels emerg,
alert, crit, and err. Known false positives and hardware-specific noise are
filtered out by default; the filtered count is reported as the `errors`
perfdata so trends can be graphed. To clear reported messages after resolving
the underlying issue, run "dmesg --clear". Note: the kernel ring buffer is a
fixed-size circular buffer, so older messages are overwritten over time, and
timestamps may drift across SUSPEND/RESUME because the time source is not
updated on resume. Requires root or sudo.

options:
  -h, --help       show this help message and exit
  -V, --version    show program's version number and exit
  --always-ok      Always returns OK.
  --ignore IGNORE  Ignore a kernel message matching this Python regular
                   expression. Can be specified multiple times. Specifying
                   this parameter replaces the bundled default ignore list.
                   Example: `--ignore="^.* unhandled (rd|wr)msr: "`.
  --test TEST      For unit tests. Needs "path-to-stdout-file,path-to-stderr-
                   file,expected-retc".
Usage Examples
Run with the bundled defaults:
./dmesg
Suppress noisy ACPI EC method-abort messages with a custom regex instead of the defaults:
./dmesg --ignore="ACPI Error: Aborting method"
Note: specifying --ignore replaces the bundled defaults. To keep the defaults plus an extra pattern, repeat the bundled patterns or wrap them in a single broader regex such as --ignore="(unhandled (rd|wr)msr: |EFI MOKvar)".
Sample output on a host with real errors:
5 errors in Kernel Ring Buffer.
[Mon May 31 18:27:14 2021] x86/cpu: SGX disabled by BIOS
[Sat Jun 5 18:49:50 2021] ACPI Error: Thread 2495397888 cannot release Mutex [ECMX] acquired by thread 1817575424 (20210105/exmutex-378)
[Sat Jun 5 18:49:50 2021] ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20210105/psparse-529)
[Tue Jun 8 18:54:41 2021] usb usb2-port1: Cannot enable. Maybe the USB cable is bad?
[Tue Jun 8 18:54:41 2021] usb usb2-port1: unable to enumerate USB device|'errors'=5;;;0
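The last line of the sample output carries the perfdata after a pipe character, in the usual Nagios form `'label'=value;warn;crit;min;max`. A minimal sketch of how a consumer might split it off is shown below. The function name `split_plugin_output` is hypothetical, and the parser is deliberately simplified: it assumes values have no unit suffix and labels contain no spaces.

```python
def split_plugin_output(output):
    """Split plugin output into (text, metrics).

    Simplified sketch: assumes values carry no unit suffix and labels
    contain no whitespace.
    """
    if "|" not in output:
        return output, {}
    # Use the last pipe, since kernel messages could contain one themselves.
    text, _, perf = output.rpartition("|")
    metrics = {}
    for item in perf.split():
        label, _, rest = item.partition("=")
        fields = rest.split(";")  # value;warn;crit;min;max
        minimum = fields[3] if len(fields) > 3 else ""
        metrics[label.strip("'")] = {
            "value": float(fields[0]),
            "min": float(minimum) if minimum else None,
        }
    return text, metrics


sample = (
    "5 errors in Kernel Ring Buffer.\n"
    "[Tue Jun 8 18:54:41 2021] usb usb2-port1: unable to enumerate USB device"
    "|'errors'=5;;;0"
)
text, metrics = split_plugin_output(sample)
print(metrics)
```

In practice the monitoring system does this parsing itself; the sketch only shows where the `errors` value in the sample comes from.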
States
OK if no emerg, alert, crit, or err messages are found in the kernel ring buffer (after filtering).
CRIT if any such messages are found.
--always-ok suppresses all alerts and always returns OK.
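The state rules above map onto the conventional Nagios exit codes (0 for OK, 2 for CRIT). A minimal sketch, assuming a hypothetical helper named `get_state`; the plugin's own implementation may be structured differently.

```python
# Conventional Nagios plugin exit codes.
STATE_OK = 0
STATE_CRIT = 2


def get_state(error_count, always_ok=False):
    """Return the exit code for a given number of error lines left after filtering."""
    if always_ok or error_count == 0:
        return STATE_OK
    return STATE_CRIT


print(get_state(0), get_state(5), get_state(5, always_ok=True))
```

There is no WARN state in these rules: a single remaining message is enough to go straight to CRIT.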
Perfdata / Metrics
| Name | Description |
|---|---|
| errors | Number of error lines remaining in the ring buffer after the ignore list has been applied. |
Credits, License
Authors: Linuxfabrik GmbH, Zurich
License: The Unlicense, see LICENSE file.