Check systemd-units-failed
Overview
Checks for failed systemd units by running systemctl --state=failed. Reports any unit that is in a failed active state or failed sub state.
When no unit is currently failed, the plugin reads the current boot’s journal for the most recent system unit-failed event and appends it to the output, so the admin sees how long the host has been clean since the last reboot and which unit last broke.
Data Collection:
* Executes `systemctl --state=failed --no-pager --no-legend`.
* When everything is OK, additionally runs `journalctl --output=short-unix --boot=0 _PID=1 --grep="Failed with result"` and picks the last matching line client-side, to read the most recent system-scope unit-failed event of the current boot. Scoping to `--boot=0` keeps the call fast regardless of total journal size. If `journalctl` is unavailable, returns nothing, or its entry cannot be parsed, the suffix is silently omitted.
* Units can be excluded from the check via `--ignore`, which supports glob patterns according to Python’s `fnmatch` module (e.g. `--ignore "sshd@*.service"`).
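The journal lookup described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the function name `last_failure`, the regular expression, and the sample lines are hypothetical, and the assumed line shape (`<epoch.microseconds> <host> systemd[1]: <unit>: Failed with result '<result>'.`) is what `journalctl --output=short-unix` typically prints for such events.

```python
import re
from datetime import datetime, timezone

# Assumed shape of one `journalctl --output=short-unix` line (hypothetical regex):
#   <epoch.microseconds> <host> systemd[1]: <unit>: Failed with result '<result>'.
LINE_RE = re.compile(
    r"^(?P<ts>\d+(?:\.\d+)?)\s+\S+\s+systemd\[1\]:\s+"
    r"(?P<unit>\S+):\s+(?P<msg>Failed with result '[^']*')"
)


def last_failure(journal_output):
    """Return (unit, message, utc_datetime) for the last line, or None."""
    lines = [line for line in journal_output.splitlines() if line.strip()]
    if not lines:
        return None  # journalctl returned nothing: omit the suffix
    match = LINE_RE.match(lines[-1].strip())
    if match is None:
        return None  # entry cannot be parsed: omit the suffix
    when = datetime.fromtimestamp(float(match.group("ts")), tz=timezone.utc)
    return match.group("unit"), match.group("msg"), when


# Made-up sample data for illustration only.
sample = (
    "1777550000.000001 host1 systemd[1]: dhcpd.service: Failed with result 'timeout'.\n"
    "1777550513.250000 host1 systemd[1]: ipmievd.service: Failed with result 'exit-code'.\n"
)
print(last_failure(sample))
print(last_failure(""))
```

Returning `None` for every failure mode mirrors the documented behaviour that the suffix is silently omitted rather than turning the check into an error.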
Fact Sheet
| Fact | Value |
|---|---|
| Check Plugin Download | https://github.com/Linuxfabrik/monitoring-plugins/tree/main/check-plugins/systemd-units-failed |
| Nagios/Icinga Check Name | |
| Check Interval Recommendation | Every minute |
| Can be called without parameters | Yes |
| Runs on | Linux |
| Compiled for Windows | No |
Help
usage: systemd-units-failed [-h] [-V] [--always-ok] [--ignore IGNORE]
                            [--test TEST]

Checks for failed systemd units. Alerts when any unit is in a failed state.
Specific units can be excluded from the check via --ignore with regular
expressions. When no unit is currently failed, reports the most recent system
unit-failed event from the current boot's journal so operators see at a glance
how long the host has been clean since the last reboot.

options:
  -h, --help       show this help message and exit
  -V, --version    show program's version number and exit
  --always-ok      Always returns OK.
  --ignore IGNORE  Unit name to exclude from the check. Can be specified
                   multiple times. Supports glob patterns according to
                   https://docs.python.org/3/library/fnmatch.html. Example:
                   `--ignore "dhcpd.service"`. Default: []
  --test TEST      For unit tests. Needs "path-to-stdout-file,path-to-stderr-
                   file,expected-retc".
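Although the help description mentions regular expressions, the `--ignore` option text and the Data Collection notes both specify `fnmatch` glob patterns. A minimal sketch of such glob-based filtering is shown below; the helper name `filter_ignored` and the sample unit names are hypothetical, and the use of the case-sensitive `fnmatch.fnmatchcase` (rather than `fnmatch.fnmatch`) is an assumption about how the plugin matches.

```python
import fnmatch


def filter_ignored(unit_names, ignore_patterns):
    """Drop every unit whose name matches at least one glob pattern."""
    return [
        name
        for name in unit_names
        if not any(fnmatch.fnmatchcase(name, pattern) for pattern in ignore_patterns)
    ]


# Made-up unit names for illustration only.
failed = ["sshd@1-10.0.0.5:22.service", "ipmievd.service", "dhcpd.service"]
print(filter_ignored(failed, ["sshd@*.service", "dhcpd.service"]))

# Globs are not regular expressions: "." is literal and "*" matches any run
# of characters, so a regex-style pattern such as "sshd@.*" does not match here.
print(fnmatch.fnmatchcase("sshd@1.service", "sshd@.*"))
```

With glob semantics, `sshd@*.service` covers every instance of the templated unit, while a plain name such as `dhcpd.service` matches only that exact unit.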
Usage Examples
./systemd-units-failed --ignore=openipmi.service --ignore=dhcpd.service
./systemd-units-failed --ignore='sshd@*.service'
Output (something is failing):
1 failed unit: ipmievd.service
unit ! load ! active ! sub ! description
----------------+--------+--------+--------+----------------
ipmievd.service ! loaded ! failed ! failed ! Ipmievd Daemon
Output (currently clean, with last failure surfaced from the journal):
Everything is ok. Last failed: `ipmievd.service` with message "Failed with result 'exit-code'" at 2026-04-30 12:01:53 (1W 2D ago)
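The relative age in the clean output (`1W 2D ago`) can be derived from the event timestamp along these lines. This is only a sketch: the function name `humanize`, the limit of two parts, and the suffixes for anything smaller than a day are assumptions, since the example above shows only weeks and days.

```python
# Hypothetical unit table; only "W" and "D" appear in the documented output.
UNITS = (("W", 7 * 86400), ("D", 86400), ("h", 3600), ("m", 60), ("s", 1))


def humanize(seconds, max_parts=2):
    """Render a duration in seconds using at most `max_parts` non-zero units."""
    parts = []
    remaining = int(seconds)
    for suffix, size in UNITS:
        count, remaining = divmod(remaining, size)
        if count:
            parts.append(f"{count}{suffix}")
        if len(parts) == max_parts:
            break
    return " ".join(parts) if parts else "0s"


# Nine days and one hour: the hour is dropped once two parts are collected.
print(humanize(9 * 86400 + 3600))
```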
States
OK if no units are in a failed state.
WARN if at least one unit has a failed active state or failed sub state.
`--always-ok` suppresses all alerts and always returns OK.
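The state rules above reduce to a small decision function. The sketch below assumes the conventional Nagios/Icinga exit codes (0 for OK, 1 for WARN); the names `STATE_OK`, `STATE_WARN` and `get_state` are hypothetical and not taken from the plugin.

```python
STATE_OK = 0    # conventional Nagios/Icinga exit code for OK
STATE_WARN = 1  # conventional Nagios/Icinga exit code for WARNING


def get_state(failed_count, always_ok=False):
    """Map the number of failed units to a check state."""
    if always_ok or failed_count == 0:
        return STATE_OK
    return STATE_WARN


print(get_state(0))                  # no failed units
print(get_state(2))                  # at least one failed unit
print(get_state(2, always_ok=True))  # --always-ok overrides the alert
```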
Perfdata / Metrics
| Name | Type | Description |
|---|---|---|
| systemd-units-failed | Number | Number of failed units |
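For orientation, a metric like the one in the table is usually emitted in the Nagios plugin perfdata format `'label'=value;warn;crit;min;max`. The exact string this plugin prints is not shown here, so the sketch below is an assumption: the helper `perfdata` is hypothetical, and leaving the thresholds empty with a minimum of 0 is an illustrative choice.

```python
def perfdata(label, value, warn="", crit="", minimum="", maximum=""):
    """Format one metric in the Nagios plugin perfdata layout."""
    return f"'{label}'={value};{warn};{crit};{minimum};{maximum}"


# One failed unit, no thresholds, lower bound 0 (assumed values).
print(perfdata("systemd-units-failed", 1, minimum=0))
```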
Troubleshooting
If you cannot fix the underlying issue and simply want to reset the status of a failed unit:
systemctl reset-failed ipmievd.service
Credits, License
Authors: Linuxfabrik GmbH, Zurich
License: The Unlicense, see LICENSE file.