Check journald-query
Overview
Query the systemd journal and alert on any events found. For help on any of the journalctl-specific parameters, have a look at man journalctl
.
How to use the check:
The idea is to always look for the bad entries in the journal. Define a journalctl query that returns results only for error cases, and also only for a specific application, for example.
The check takes a number of the parameters known from
journalctl
. These can be fed with the same values as in the original. For details see the man page ofjournalctl
.So feed the parameters you used to filter your messages with
journalctl
to this check. As soon as results are returned, the check plugin alerts with the desired severity.If no
--priority
is given, the check uses the range--priority=emerg..err
.If no unit or user unit is specified, the check looks for errors in the units present on the most common Linux systems, which are thus found after a fresh installation. To get an idea of which services are handled, have a look at the source code (search for
units = [
).
Hint:
If the initial execution of the check takes more than 10 seconds, the journal is probably too large (which can be checked with the plugin journal-usage). In this case it is recommended to „vacuum“ the journal first.
Fact Sheet
Check Plugin Download |
https://github.com/Linuxfabrik/monitoring-plugins/tree/main/check-plugins/journald-query |
Check Interval Recommendation |
Once a minute |
Can be called without parameters |
Yes |
Available for |
Python 3 |
Requirements |
None |
Help
usage: journald-query3 [-h] [-V] [--always-ok] [--case-sensitive]
[--facility FACILITY] [--identifier IDENTIFIER]
[--grep GREP] [--priority PRIORITY]
[--severity {warn,crit}] [--since SINCE] [--test TEST]
[--unit UNIT] [--user-unit USER_UNIT]
Query the systemd journal and alert on any events found. For help on any of
the journalctl-specific parameters, have a look at `man journalctl`.
options:
-h, --help show this help message and exit
-V, --version show program's version number and exit
--always-ok Always returns OK.
--case-sensitive journalctl: Make pattern matching case sensitive or
case insensitive. Default: None
--facility FACILITY journalctl: Filter output by syslog facility. Takes a
comma-separated list of numbers or facility names.
Default: None
--grep GREP journalctl: Filter output to entries where the
MESSAGE= field matches the specified regular
expression. If the pattern is all lowercase, matching
is case insensitive. Otherwise, matching is case
sensitive. Default: None
--identifier IDENTIFIER
journalctl: Show messages for the specified syslog
identifier. Default: None
--priority PRIORITY journalctl: Filter output by message priorities or
priority ranges. Default: emerg..err
--severity {warn,crit}
Severity for alerts if journalctl returns results. One
of "warn" or "crit". Default: warn
--since SINCE journalctl: Start showing entries on or newer than the
specified date. Default: >= -24h
--test TEST For unit tests. Needs "path-to-stdout-file,path-to-
stderr-file,expected-retc".
--unit UNIT journalctl: Show messages for the specified systemd
unit UNIT|PATTERN. This parameter can be specified
multiple times. Default: None
--user-unit USER_UNIT
journalctl: Show messages for the specified user
session
Usage Examples
Simple call that checks the most common system services for errors of any kind:
./journald-query
Output:
27 events. Latest event at 2022-07-28 15:08:04 from systemd-resolved, level err: Failed to send hostname reply: Transport endpoint is not connected [WARNING].
Attention: Table below is shortened and just shows the 5 newest and the 5 oldest messages.
Timestamp ! Unit ! Prio ! Message
--------------------+------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------
2022-07-28 15:08:04 ! systemd-resolved ! err ! Failed to send hostname reply: Transport endpoint is not connected
2022-07-28 09:27:03 ! dnf-makecache ! err ! Failed to start dnf makecache.
2022-07-28 09:10:55 ! session-c1.scope ! err ! GLib-GObject: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
2022-07-28 09:10:51 ! user@1000 ! err ! Failed to start Application launched by gnome-session-binary.
2022-07-28 09:10:51 ! user@1000 ! err ! Failed to start Application launched by gnome-session-binary.
2022-07-27 20:36:52 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-27 20:36:36 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-27 20:36:36 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-27 20:36:34 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-27 20:36:34 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
Use `journalctl --reverse --priority=emerg..err --since=-24h` as a starting point for debugging. Be aware of the fact that you might see even more messages then, as we apply a lot of unit filters to only get messages from basic system services.
The full command used was:
journalctl --reverse --priority=emerg..err --since=-24h --quiet --output=json --unit="accounts-daemon.service" --unit="acpid.service" --unit="apparmor.service" --unit="apport.service" --unit="auditd.service" --unit="cron.service" --unit="crond.service" --unit="dbus.service" --unit="dracut-*.service" --unit="haveged.service" --unit="ifplugd.service" --unit="ifup@*.service" --unit="init.scope" --unit="irqbalance.service" --unit="iscsid.service" --unit="lvm2-*.service" --unit="lxcfs.service" --unit="mdadm.service" --unit="network.service" --unit="NetworkManager*.service" --unit="open-iscsi.service" --unit="polkit.service" --unit="polkitd.service" --unit="qemu-guest-agent.service" --unit="rsyslog.service" --unit="session-*.scope" --unit="snapd*.service" --unit="ssh.service" --unit="sshd*.service" --unit="sssd.service" --unit="sysstat.service" --unit="systemd-*.service" --unit="user@*.service"
Explicitly search for error messages in the Apache httpd unit only:
./journald-query --unit=httpd --priority=emerg..err --severity=crit
Output:
994 events. Latest event at 2022-07-28 18:00:04 from httpd, level err: [proxy_fcgi:error] [pid 896:tid 929] [client 127.0.0.1:50256] AH01071: Got error 'Primary script unknown' [CRITICAL].
Attention: Table below is shortened and just shows the 5 newest and the 5 oldest messages.
Timestamp ! Unit ! Prio ! Message
--------------------+-------+------+-----------------------------------------------------------------------------------------------------------
2022-07-28 18:00:04 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 929] [client 127.0.0.1:50256] AH01071: Got error 'Primary script unknown'
2022-07-28 17:59:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 927] [client 127.0.0.1:57732] AH01071: Got error 'Primary script unknown'
2022-07-28 17:59:04 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 945] [client 127.0.0.1:53908] AH01071: Got error 'Primary script unknown'
2022-07-28 17:58:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 943] [client 127.0.0.1:56074] AH01071: Got error 'Primary script unknown'
2022-07-28 17:58:04 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 936] [client 127.0.0.1:44684] AH01071: Got error 'Primary script unknown'
2022-07-28 09:45:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 947] [client 127.0.0.1:52536] AH01071: Got error 'Primary script unknown'
2022-07-28 09:45:04 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 940] [client 127.0.0.1:53256] AH01071: Got error 'Primary script unknown'
2022-07-28 09:44:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 938] [client 127.0.0.1:44544] AH01071: Got error 'Primary script unknown'
2022-07-28 09:44:04 ! httpd ! err ! [proxy_fcgi:error] [pid 897:tid 904] [client 127.0.0.1:40142] AH01071: Got error 'Primary script unknown'
2022-07-28 09:43:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 931] [client 127.0.0.1:34050] AH01071: Got error 'Primary script unknown'
The full command used was:
journalctl --reverse --priority=emerg..err --since=-24h --quiet --output=json --unit="httpd.service"
States
Depending on the given
--severity
, returns WARN (default) or CRIT if any entries are found.
Perfdata / Metrics
Name |
Type |
Description |
---|---|---|
journald-query |
Continous Counter |
Number of events found in journald |
Credits, License
Authors: Linuxfabrik GmbH, Zurich
License: The Unlicense, see LICENSE file.