Check redfish-logservices

Overview

Checks the event log entries exposed under the LogServices of a Redfish-compatible server via the Redfish API and alerts based on the severity of the log entries. By default it reads the System Event Log (SEL). --log-type selects the management controller event log (MEL) or both.

Important Notes:

  • Tested on Dell iDRAC and the DMTF Simulator

  • A check usually completes within a few seconds, but slow or retried requests can make it take longer. The bundled Icinga Director basket allows a runtime timeout of 60 seconds.

  • This check works over both HTTP and HTTPS and uses GET requests only.

  • No additional Python Redfish modules need to be installed.
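The following is a minimal sketch of how --retries and --timeout interact with the 60-second budget. It assumes that the timeout bounds each attempt and that the plugin does not pause between attempts. The plugin's actual retry code is not shown here, and get_with_retries, urllib_fetch and worst_case_seconds are hypothetical names. Under these assumptions, with the defaults a single unresponsive request can take roughly (3 + 1) × 8 = 32 seconds. Two such requests in one run would therefore already exceed the 60 seconds.

```python
import urllib.request
from typing import Callable


def worst_case_seconds(requests: int, retries: int, timeout: float) -> float:
    """Upper bound if every attempt of every request runs into the timeout.

    --retries counts *extra* attempts, so each request is tried retries + 1
    times. Pauses between attempts (if any) are not modelled in this sketch.
    """
    return requests * (retries + 1) * timeout


def get_with_retries(fetch: Callable[[str, float], bytes], url: str,
                     retries: int = 3, timeout: float = 8) -> bytes:
    """Call fetch(url, timeout) up to retries + 1 times, re-raising the last error."""
    attempts = max(retries, 0) + 1  # hypothetical: first try plus the extra attempts
    for attempt in range(attempts):
        try:
            return fetch(url, timeout)
        except OSError:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            if attempt == attempts - 1:
                raise


def urllib_fetch(url: str, timeout: float) -> bytes:
    """A real fetch function: a plain GET, since the plugin only issues GETs."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

Example: get_with_retries(urllib_fetch, "http://127.0.0.1:5000/redfish/v1") tries the service root up to four times before it gives up.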

Data Collection:

  • Reads the service root to detect the vendor, then queries the Managers collection (or Systems on Supermicro) to locate the log service

  • Reads the log entries (the SEL by default, the MEL or both with --log-type) and evaluates each entry’s severity

  • Uses HTTP Basic authentication if --username and --password are provided
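The data collection steps above can be sketched with the standard library: follow the @odata.id links from the service root through the Managers collection and each manager's LogServices down to the Entries, and send an HTTP Basic Authorization header when credentials are given. This is an illustration under assumptions, not the plugin's code. basic_auth_header, make_fetch and walk_log_entries are hypothetical names, and the sketch walks every log service of every manager. It does not reproduce the vendor detection, the Supermicro Systems path, or how the plugin picks the SEL or the MEL.

```python
import base64
import json
import urllib.request
from typing import Callable, Iterator, Optional, Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Value for the Authorization header (HTTP Basic, RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def make_fetch(base_url: str, username: Optional[str] = None,
               password: Optional[str] = None, timeout: float = 8) -> Callable[[str], dict]:
    """Return a fetch(path) function that GETs base_url + path and parses the JSON body."""
    headers = {"Accept": "application/json"}
    if username and password:
        headers["Authorization"] = basic_auth_header(username, password)

    def fetch(path: str) -> dict:
        request = urllib.request.Request(base_url.rstrip("/") + path, headers=headers)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)

    return fetch


def walk_log_entries(fetch: Callable[[str], dict]) -> Iterator[Tuple[str, dict]]:
    """Yield (manager path, log entry) for every entry of every manager log service."""
    root = fetch("/redfish/v1")
    managers = fetch(root["Managers"]["@odata.id"])
    for member in managers.get("Members", []):
        manager_path = member["@odata.id"]
        manager = fetch(manager_path)
        if "LogServices" not in manager:
            continue
        services = fetch(manager["LogServices"]["@odata.id"])
        for service_link in services.get("Members", []):
            service = fetch(service_link["@odata.id"])
            if "Entries" not in service:
                continue
            entries = fetch(service["Entries"]["@odata.id"])
            for entry in entries.get("Members", []):
                # Collections may list links only; resolve them to full entries.
                if "Severity" not in entry and "@odata.id" in entry:
                    entry = fetch(entry["@odata.id"])
                yield manager_path, entry
```

Against the mockup server, list(walk_log_entries(make_fetch("http://127.0.0.1:5000"))) would yield pairs whose first element is a manager path such as /redfish/v1/Managers/BMC. Passing fetch in as a function keeps the walk testable against canned responses, in the same spirit as the plugin's offline fixtures.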

Fact Sheet

Fact                               Value
Check Plugin Download              https://github.com/Linuxfabrik/monitoring-plugins/tree/main/check-plugins/redfish-logservices
Nagios/Icinga Check Name           check_redfish_logservices
Check Interval Recommendation      Every 5 minutes
Can be called without parameters   Yes
Runs on                            Cross-platform
Compiled for Windows               No

Help

usage: redfish-logservices [-h] [-V] [--always-ok]
                           [--cache-expire CACHE_EXPIRE] [--ignore IGNORE]
                           [--insecure] [--log-type {sel,mel,both}]
                           [--match MATCH] [--max-age MAX_AGE] [--no-proxy]
                           [--password PASSWORD] [--retries RETRIES]
                           [--test TEST] [--timeout TIMEOUT] [--url URL]
                           [--username USERNAME]

Checks the event log entries exposed under the LogServices of a Redfish-
compatible server via the Redfish API and alerts based on the severity of the
log entries. By default it reads the System Event Log (SEL); `--log-type`
selects the management controller log (MEL) or both. Entries can be filtered
by regular expression (--match, --ignore), and entries older than --max-age
days can be aged out so a long-since resolved event does not keep the check in
a non-OK state forever.

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  --always-ok           Always returns OK.
  --cache-expire CACHE_EXPIRE
                        The amount of time after which the credential/data
                        cache expires, in minutes. Default: 15
  --ignore IGNORE       Ignore SEL entries whose message matches this Python
                        regular expression. Case-sensitive by default; use
                        `(?i)` for case-insensitive matching. Can be specified
                        multiple times. Example: `--ignore="Log area
                        reset/cleared"`.
  --insecure            This option explicitly allows insecure SSL
                        connections.
  --log-type {sel,mel,both}
                        Which log to read: `sel` (System Event Log, default),
                        `mel` (management controller event log) or `both`.
                        Default: sel
  --match MATCH         Only consider SEL entries whose message matches this
                        Python regular expression. Case-sensitive by default;
                        use `(?i)` for case-insensitive matching. Can be
                        specified multiple times. Example:
                        `--match="(?i)temperature"`.
  --max-age MAX_AGE     Age out SEL entries older than this many days: they
                        are no longer alerted on, only counted in the summary.
                        A controller keeps an entry until the log is cleared,
                        so a long-since resolved event would otherwise keep
                        the check in a non-OK state forever. Default: 0 (0
                        disables aging).
  --no-proxy            Do not use a proxy.
  --password PASSWORD   Redfish API password.
  --retries RETRIES     Number of extra attempts if a request to the Redfish
                        API fails, before the check gives up. Helps against an
                        occasionally slow or flaky management controller.
                        Default: 3
  --test TEST           For unit tests. Needs "path-to-stdout-file,path-to-
                        stderr-file,expected-retc".
  --timeout TIMEOUT     Network timeout in seconds. Default: 8 (seconds)
  --url URL             Redfish API URL. Default: https://localhost:5000
  --username USERNAME   Redfish API username.
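The --match, --ignore and --max-age options described above can be combined roughly as in the following sketch. It assumes several things: re.search semantics, so a pattern may match anywhere in the message; --match keeps an entry if any of its patterns matches; entries dropped by --ignore are not counted anywhere; and Created timestamps carry a timezone. filter_entries and parse_created are hypothetical helpers, not the plugin's code.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple


def parse_created(value: str) -> datetime:
    """Parse a Redfish 'Created' timestamp such as 2012-03-07T14:44:00Z."""
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_entries(entries: Iterable[dict],
                   match: Optional[List[str]] = None,
                   ignore: Optional[List[str]] = None,
                   max_age_days: int = 0,
                   now: Optional[datetime] = None) -> Tuple[List[dict], int]:
    """Return (entries to alert on, number of aged-out entries)."""
    now = now or datetime.now(timezone.utc)
    match_res = [re.compile(p) for p in (match or [])]
    ignore_res = [re.compile(p) for p in (ignore or [])]
    kept, aged_out = [], 0
    for entry in entries:
        message = entry.get("Message", "")
        # --match: keep only entries matching at least one pattern
        if match_res and not any(r.search(message) for r in match_res):
            continue
        # --ignore: drop entries matching any pattern
        if any(r.search(message) for r in ignore_res):
            continue
        # --max-age: older entries are counted, not alerted on (0 disables aging)
        if max_age_days > 0 and "Created" in entry:
            if now - parse_created(entry["Created"]) > timedelta(days=max_age_days):
                aged_out += 1
                continue
        kept.append(entry)
    return kept, aged_out
```

Because patterns are case-sensitive unless they start with (?i), --match="temperature" would not catch a message that spells it "Temperature".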

Usage Examples

./redfish-logservices --url=https://bmc --username=redfish-monitoring --password='linuxfabrik'

Output:

Checked SEL on 1 member. There are critical errors.

/redfish/v1/Managers/BMC
* 2012-03-07T14:44:00Z: System May be Melting [CRITICAL]

States

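  • OK if no log entry has a severity above OK.

  • WARN if at least one log entry has severity “Warning” and none has severity “Critical”.

  • CRIT if at least one log entry has severity “Critical”.

  • Entries excluded by --match or --ignore, or older than --max-age days, do not affect the state.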

  • --always-ok suppresses all alerts and always returns OK.
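The state rules above amount to mapping each entry's Redfish Severity (OK, Warning or Critical) to a Nagios/Icinga return code and taking the worst one. The following is a sketch, not the plugin's code. The constant and function names are hypothetical. Entries with a missing or unknown severity are treated as OK here, which is an assumption. The " [WARNING]" label is also an assumption, because only the " [CRITICAL]" form appears in the usage example.

```python
from typing import Iterable

# Nagios/Icinga plugin return codes
STATE_OK, STATE_WARN, STATE_CRIT = 0, 1, 2
# Hypothetical output labels; only " [CRITICAL]" is shown in the usage example.
STATE_LABELS = {STATE_OK: "", STATE_WARN: " [WARNING]", STATE_CRIT: " [CRITICAL]"}

# Redfish LogEntry Severity values
SEVERITY_TO_STATE = {"OK": STATE_OK, "Warning": STATE_WARN, "Critical": STATE_CRIT}


def entry_state(entry: dict) -> int:
    """Map one entry's Severity to a return code; unknown or missing counts as OK."""
    return SEVERITY_TO_STATE.get(entry.get("Severity"), STATE_OK)


def overall_state(entries: Iterable[dict], always_ok: bool = False) -> int:
    """Worst state over all (already filtered) entries; --always-ok forces OK."""
    worst = max((entry_state(e) for e in entries), default=STATE_OK)
    return STATE_OK if always_ok else worst


def format_entry(entry: dict) -> str:
    """One output line, modelled on the usage example output."""
    return f"* {entry['Created']}: {entry['Message']}{STATE_LABELS[entry_state(entry)]}"
```

Taking the maximum of the return codes works because the codes are ordered by severity (0 < 1 < 2), so one Critical entry outweighs any number of Warning entries.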

Perfdata / Metrics

This plugin does not provide any performance data.

For Maintainers

You don’t need a physical server with a real BMC (the management controller that serves the Redfish API, e.g. HPE iLO or Dell iDRAC) to develop or test this plugin. The official DMTF Redfish mockup server serves a static, read-only Redfish tree (including the manager log service) over plain HTTP, which is exactly what this GET-only plugin needs.

From the repository root, start the mockup server and point the plugin at it:

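# Start the DMTF mockup server in the background. It listens on port 8000
# inside the container, published as port 5000 on the host.
podman run \
    --detach --rm \
    --name lfmp-redfish-mock \
    --publish 5000:8000 \
    docker.io/dmtf/redfish-mockup-server:latest
# Give the server a moment to start.
sleep 3
check-plugins/redfish-logservices/redfish-logservices --url=http://127.0.0.1:5000 --no-proxy
# Stop the container (--rm removes it afterwards).
podman stop lfmp-redfish-mock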

Use http://127.0.0.1:5000 rather than http://localhost:5000, because localhost may resolve to IPv6 (::1) while the published container port is bound to IPv4.
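A fixed sleep 3 can be too short on a slow machine and is wasted time on a fast one. A small poller can instead wait until the mockup actually answers on its service root. It targets 127.0.0.1 for the reason given above. This is an optional, hypothetical helper (wait_for_redfish is not part of the repository). It probes at the HTTP level rather than only opening a TCP connection, because a port published by a rootless container may accept connections before the server inside is listening.

```python
import http.client
import time
import urllib.request


def wait_for_redfish(url: str = "http://127.0.0.1:5000/redfish/v1",
                     timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll url until it answers with HTTP 200 or timeout seconds have passed."""
    # Bypass any configured proxy, similar to the plugin's --no-proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # not ready yet: refused, reset, timed out or an HTTP error status
        time.sleep(interval)
    return False
```

Called as wait_for_redfish() right after podman run, it returns True as soon as the service root responds, or False after 30 seconds.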

The fixtures under unit-test/stdout/ are the raw Redfish responses the plugin walks. Each scenario has one set of files:

  • <scenario>-root: the service root

  • <scenario>-managers: the Managers collection

  • <scenario>-sel: the log entries

To simulate an alert, copy a healthy set and add an entry with a Severity of Critical or Warning to its -sel file. Run the offline test suite with ./run from the unit-test directory.
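Copying a healthy scenario and appending an alert entry can be scripted. The following sketch assumes that a -sel fixture is a JSON LogEntryCollection whose Members hold full LogEntry objects inline. Check an existing fixture before relying on this. add_alert_entry is a hypothetical helper, and the default message and timestamp are simply taken from the usage example.

```python
import json
from pathlib import Path


def add_alert_entry(src: Path, dst: Path, severity: str = "Critical",
                    message: str = "System May be Melting",
                    created: str = "2012-03-07T14:44:00Z") -> None:
    """Copy a healthy -sel fixture to dst and append one entry with the given severity.

    Assumes src is a JSON LogEntryCollection with full LogEntry objects in "Members".
    """
    collection = json.loads(src.read_text(encoding="utf-8"))
    members = collection.setdefault("Members", [])
    next_id = str(len(members) + 1)  # naive: assumes ids 1..n are already taken
    members.append({
        "@odata.id": f"{collection.get('@odata.id', '')}/{next_id}",
        "Id": next_id,
        "Name": f"Log Entry {next_id}",
        "Severity": severity,
        "Created": created,
        "Message": message,
    })
    collection["Members@odata.count"] = len(members)
    dst.write_text(json.dumps(collection, indent=4) + "\n", encoding="utf-8")
```

The source file is only read, so the healthy scenario stays intact. The new scenario still needs its own -root and -managers copies.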

Credits, License