Check journald-query
Overview
Query the systemd journal and alert on any events found. For help on any of the journalctl-specific parameters, have a look at man journalctl
.
How to use the check:
The idea is to always look for the bad entries in the journal. Define a journalctl query that returns results only for error cases, and also only for a specific application, for example.
The check takes a number of the parameters known from
journalctl
. These can be fed with the same values as in the original. For details see the man page ofjournalctl
.So feed the parameters you used to filter your messages with
journalctl
to this check. As soon as results are returned, the check plugin alerts with the desired severity.If no
--priority
is given, the check uses the range--priority=emerg..err
.If no unit or user unit is specified, the check looks for errors in the units present on the most common Linux systems, which are thus found after a fresh installation. To get an idea of which services are handled, have a look at the source code (search for
units = [
).
Hints:
If the initial execution of the check takes more than 10 seconds, the journal is probably too large (which can be checked with the plugin journal-usage). In this case it is recommended to „vacuum“ the journal first.
Fact Sheet
Check Plugin Download |
https://github.com/Linuxfabrik/monitoring-plugins/tree/main/check-plugins/journald-query |
Check Interval Recommendation |
Once a minute |
Can be called without parameters |
Yes |
Compiled for |
Linux |
Help
usage: journald-query [-h] [-V] [--always-ok] [--facility FACILITY]
[--identifier IDENTIFIER]
[--ignore-pattern IGNORE_PATTERN]
[--ignore-regex IGNORE_REGEX] [--priority PRIORITY]
[--severity {warn,crit}] [--since SINCE] [--test TEST]
[--unit UNIT] [--user-unit USER_UNIT]
Query the systemd journal and alert on any events found. For help on any of
the journalctl-specific parameters, see `man journalctl`.
options:
-h, --help show this help message and exit
-V, --version show program's version number and exit
--always-ok Always returns OK.
--facility FACILITY journalctl: Filter output by syslog facility. Takes a
comma-separated list of numbers or facility names.
Default: None
--identifier IDENTIFIER
journalctl: Show messages for the specified syslog
identifier. Default: None
--ignore-pattern IGNORE_PATTERN
Any line containing this case-sensitive pattern on the
MESSAGE field will be ignored (repeating). So, unlike
`journalctl`, you can easily use strings to ignore
certain messages.
--ignore-regex IGNORE_REGEX
Any line matching this Python regex on the MESSAGE
field will be ignored (repeating). So, unlike
`journalctl`, you can easily use a regex to ignore
certain messages. Example: '(?i)linuxfabrik' for a
case-insensitive search for "linuxfabrik".
--priority PRIORITY journalctl: Filter output by message priorities or
priority ranges. Default: emerg..err
--severity {warn,crit}
Severity for alerts if journalctl returns results. One
of "warn" or "crit". Default: warn
--since SINCE journalctl: Start showing entries on or newer than the
specified date. Default: >= -8h
--test TEST For unit tests. Needs "path-to-stdout-file,path-to-
stderr-file,expected-retc".
--unit UNIT journalctl: Show messages for the specified systemd
unit UNIT|PATTERN. This parameter can be specified
multiple times. Default: None
--user-unit USER_UNIT
journalctl: Show messages for the specified user
session unit. This parameter can be specified multiple
times. Default: None
Usage Examples
Simple call that checks the most common system services for errors of any kind:
./journald-query
Output:
27 events. Latest event at 2022-07-28 15:08:04 from systemd-resolved, level err: `Failed to send hostname reply: Transport endpoint is not connected` [WARNING].
Attention: Table below is shortened and just shows the 5 newest and the 5 oldest messages.
Timestamp ! Unit ! Prio ! Message
--------------------+------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------
2022-07-28 15:08:04 ! systemd-resolved ! err ! Failed to send hostname reply: Transport endpoint is not connected
2022-07-28 09:27:03 ! dnf-makecache ! err ! Failed to start dnf makecache.
2022-07-28 09:10:55 ! session-c1.scope ! err ! GLib-GObject: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
2022-07-28 09:10:51 ! user@1000 ! err ! Failed to start Application launched by gnome-session-binary.
2022-07-28 09:10:51 ! user@1000 ! err ! Failed to start Application launched by gnome-session-binary.
2022-07-27 20:36:52 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-27 20:36:36 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-27 20:36:36 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-27 20:36:34 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-27 20:36:34 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
Use `journalctl --reverse --priority=emerg..err --since=-24h` as a starting point for debugging. Be aware of the fact that you might see even more messages then, as we apply a lot of unit filters to only get messages from basic system services.
The full command used was:
journalctl --reverse --priority=emerg..err --since=-24h --unit="accounts-daemon.service" --unit="acpid.service" --unit="apparmor.service" --unit="apport.service" --unit="auditd.service" --unit="cron.service" --unit="crond.service" --unit="dbus.service" --unit="dracut-*.service" --unit="haveged.service" --unit="ifplugd.service" --unit="ifup@*.service" --unit="init.scope" --unit="irqbalance.service" --unit="iscsid.service" --unit="lvm2-*.service" --unit="lxcfs.service" --unit="mdadm.service" --unit="network.service" --unit="NetworkManager*.service" --unit="open-iscsi.service" --unit="polkit.service" --unit="polkitd.service" --unit="qemu-guest-agent.service" --unit="rsyslog.service" --unit="session-*.scope" --unit="snapd*.service" --unit="ssh.service" --unit="sshd*.service" --unit="sssd.service" --unit="sysstat.service" --unit="systemd-*.service" --unit="user@*.service"
Explicitly search for error messages in the Apache httpd unit only:
./journald-query --unit=httpd --priority=emerg..err --severity=crit --ignore-regex='mod_qos.*: Access denied, invalid request line'
Output:
994 events. Latest event at 2022-07-28 18:00:04 from httpd, level err: `[proxy_fcgi:error] [pid 896:tid 929] [client 127.0.0.1:50256] AH01071: Got error 'Primary script unknown'` [CRITICAL].
Attention: Table below is shortened and just shows the 5 newest and the 5 oldest messages.
Timestamp ! Unit ! Prio ! Message
--------------------+-------+------+-----------------------------------------------------------------------------------------------------------
2022-07-28 18:00:04 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 929] [client 127.0.0.1:50256] AH01071: Got error 'Primary script unknown'
2022-07-28 17:59:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 927] [client 127.0.0.1:57732] AH01071: Got error 'Primary script unknown'
2022-07-28 17:59:04 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 945] [client 127.0.0.1:53908] AH01071: Got error 'Primary script unknown'
2022-07-28 17:58:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 943] [client 127.0.0.1:56074] AH01071: Got error 'Primary script unknown'
2022-07-28 17:58:04 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 936] [client 127.0.0.1:44684] AH01071: Got error 'Primary script unknown'
2022-07-28 09:45:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 947] [client 127.0.0.1:52536] AH01071: Got error 'Primary script unknown'
2022-07-28 09:45:04 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 940] [client 127.0.0.1:53256] AH01071: Got error 'Primary script unknown'
2022-07-28 09:44:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 938] [client 127.0.0.1:44544] AH01071: Got error 'Primary script unknown'
2022-07-28 09:44:04 ! httpd ! err ! [proxy_fcgi:error] [pid 897:tid 904] [client 127.0.0.1:40142] AH01071: Got error 'Primary script unknown'
2022-07-28 09:43:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 931] [client 127.0.0.1:34050] AH01071: Got error 'Primary script unknown'
The full command used was:
journalctl --reverse --priority=emerg..err --since=-24h --unit="httpd.service"
States
Depending on the given
--severity
, returns WARN (default) or CRIT if any entries are found.
Perfdata / Metrics
Name |
Type |
Description |
---|---|---|
journald-query |
Number |
Number of events found in journald |
Credits, License
Authors: Linuxfabrik GmbH, Zurich
License: The Unlicense, see LICENSE file.