Check redis-status
Overview
Returns information and statistics about a Redis server. Alerts on memory consumption, memory fragmentation, hit rates and more. Connects to Redis via 127.0.0.1:6379 by default.
Hints:
- Tested on Redis 3.0, 3.2, 6.0, 6.2 and 7.0.
- "I'm here to keep you safe, Sam. I want to help you." comes from the character GERTY in the movie "Moon" (2009).
Fact Sheet
Check Plugin Download | https://github.com/Linuxfabrik/monitoring-plugins/tree/main/check-plugins/redis-status |
Check Interval Recommendation | Once a minute |
Can be called without parameters | Yes |
Compiled for | Linux |
Requirements | command-line tool |
Help
    usage: redis-status [-h] [-V] [--always-ok] [-c CRIT] [-H HOSTNAME]
                        [--ignore-maxmemory0] [--ignore-overcommit]
                        [--ignore-somaxconn] [--ignore-sync-partial-err]
                        [--ignore-thp] [-p PASSWORD] [--port PORT]
                        [--socket SOCKET] [--test TEST] [-w WARN]

    Returns information and statistics about a Redis server. Alerts on memory
    consumption, memory fragmentation, hit rates and more.

    options:
      -h, --help            show this help message and exit
      -V, --version         show program's version number and exit
      --always-ok           Always returns OK.
      -c CRIT, --critical CRIT
                            Set the CRIT threshold as a percentage. Default: >=
                            None
      -H HOSTNAME, --hostname HOSTNAME
                            Redis server hostname. Default: 127.0.0.1
      --ignore-maxmemory0   Don't warn about redis' maxmemory=0. Default: False
      --ignore-overcommit   Don't warn about vm.overcommit_memory<>1. Default:
                            False
      --ignore-somaxconn    Don't warn about net.core.somaxconn <
                            net.ipv4.tcp_max_syn_backlog. Default: False
      --ignore-sync-partial-err
                            Don't warn about partial sync errors (because if you
                            have an asynchronous replication, a small number of
                            "denied partial resync requests" might be normal).
                            Default: False
      --ignore-thp          Don't warn about transparent huge page setting.
                            Default: False
      -p PASSWORD, --password PASSWORD
                            Password to use when connecting to the redis server.
      --port PORT           Redis server port. Default: 6379
      --socket SOCKET       Redis server socket (overrides hostname and port).
      --test TEST           For unit tests. Needs "path-to-stdout-file,path-to-
                            stderr-file,expected-retc".
      -w WARN, --warning WARN
                            Set the WARN threshold as a percentage. Default: >= 90
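
The help text says that `--socket` overrides `--hostname` and `--port`. The following is a minimal Python sketch of that precedence, not the plugin's actual code. It assumes the connection is made by calling `redis-cli` (whose `-s`, `-h`, `-p` and `-a` options exist); the helper name `build_redis_cli_args` and the socket path are hypothetical.

```python
def build_redis_cli_args(hostname="127.0.0.1", port=6379, socket=None, password=None):
    """Return an argv list for `redis-cli info` (hypothetical helper)."""
    args = ["redis-cli"]
    if socket:
        # A Unix socket takes precedence over hostname and port.
        args += ["-s", socket]
    else:
        args += ["-h", hostname, "-p", str(port)]
    if password:
        args += ["-a", password]
    args.append("info")
    return args


# Defaults: connect to 127.0.0.1:6379.
print(build_redis_cli_args())
# With a socket, the port argument is ignored (the path is only an example).
print(build_redis_cli_args(socket="/run/redis/redis.sock", port=6380))
```

Returning a list instead of a single string avoids shell quoting issues if the result is passed to `subprocess.run()`.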
Usage Examples
./redis-status --ignore-maxmemory0 --ignore-overcommit --ignore-somaxconn --ignore-sync-partial-err --ignore-thp
Output:
Redis v5.0.3, standalone mode on 127.0.0.1:6379, /etc/redis.conf, up 4m 25s, 100.9% memory usage [WARNING] (9.6MiB/9.5MiB, 9.6MiB peak, 19.6MiB RSS), maxmemory-policy=noeviction, 3 DBs (db0 db3 db4) with 10 keys, 0.0 evicted keys, 0.0 expired keys, hit rate 100.0% (3.0M hits, 0.0 misses), vm.overcommit_memory is not set to 1, kernel transparent_hugepage is not set to "madvise" or "never", net.core.somaxconn (128) is lower than net.ipv4.tcp_max_syn_backlog (256). Sam, I detected a few issues in this Redis instance memory implants:
* High total RSS: This instance has a memory fragmentation and RSS overhead greater than 1.4 (this means that the Resident Set Size of the Redis process is much larger than the sum of the logical allocations Redis performed). This problem is usually due either to a large peak memory (check if there is a peak memory entry above in the report) or may result from a workload that causes the allocator to fragment memory a lot. If the problem is a large peak memory, then there is no issue. Otherwise, make sure you are using the Jemalloc allocator and not the default libc malloc. Note: The currently used allocator is "jemalloc-5.1.0".
I'm here to keep you safe, Sam. I want to help you.
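
The values in the output above (version, mode, databases, key counts) correspond to fields reported by Redis' `INFO` command, which returns `key:value` lines grouped under `# Section` headers. The sketch below shows one way to parse such text in Python. The function names and the sample text are illustrative assumptions, not taken from the plugin.

```python
def parse_info(text):
    """Parse Redis INFO text into {section: {key: value}} (all values stay strings)."""
    result = {}
    section = "default"
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Section header such as "# Memory"
            section = line.lstrip("#").strip().lower()
            result.setdefault(section, {})
            continue
        key, sep, value = line.partition(":")
        if sep:
            result.setdefault(section, {})[key] = value
    return result


def parse_keyspace(keyspace):
    """Turn values like 'keys=10,expires=0,avg_ttl=0' into dicts of ints."""
    dbs = {}
    for db, raw in keyspace.items():
        dbs[db] = {k: int(v) for k, v in (item.split("=", 1) for item in raw.split(","))}
    return dbs


# Hypothetical, shortened INFO output for illustration only.
SAMPLE = """\
# Server
redis_version:5.0.3
redis_mode:standalone

# Keyspace
db0:keys=6,expires=0,avg_ttl=0
db3:keys=3,expires=1,avg_ttl=5000
db4:keys=1,expires=0,avg_ttl=0
"""

info = parse_info(SAMPLE)
dbs = parse_keyspace(info["keyspace"])
key_count = sum(db["keys"] for db in dbs.values())
print(info["server"]["redis_version"], len(dbs), key_count)
```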
States
- WARN or CRIT in case of memory usage above the specified thresholds
- WARN on Redis' maxmemory 0 setting (can be disabled)
- WARN on any memory issues (can be disabled)
- WARN on partial sync errors (can be disabled)
- WARN on bad OS configuration (can be disabled)
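
The rules above can be sketched as a small state function. This is an illustration under stated assumptions, not the plugin's implementation: it uses the conventional monitoring return codes (0 = OK, 1 = WARN, 2 = CRIT), the documented defaults (`>= 90` for WARN, no CRIT threshold), and hypothetical function names.

```python
STATE_OK, STATE_WARN, STATE_CRIT = 0, 1, 2


def get_state(value, warn=90, crit=None):
    """Compare a percentage against '>=' thresholds; None disables a threshold."""
    if crit is not None and value >= crit:
        return STATE_CRIT
    if warn is not None and value >= warn:
        return STATE_WARN
    return STATE_OK


def overall_state(mem_usage, maxmemory, warn=90, crit=None, ignore_maxmemory0=False):
    """Combine the memory threshold check with the maxmemory=0 warning."""
    state = get_state(mem_usage, warn, crit)
    if maxmemory == 0 and not ignore_maxmemory0:
        # The worst state wins; a WARN never downgrades a CRIT.
        state = max(state, STATE_WARN)
    return state


print(get_state(100.9))                                # WARN with the default thresholds
print(overall_state(10, 0))                            # WARN because maxmemory is 0
print(overall_state(10, 0, ignore_maxmemory0=True))    # OK
```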
Perfdata / Metrics
Latest info can be found here.
Name | Type | Description |
---|---|---|
clients_blocked_clients | Number | Number of clients pending on a blocking call |
clients_connected_clients | Number | Number of client connections (excluding connections from replicas) |
cpu_used_cpu_sys | Number | System CPU consumed by the Redis server, which is the sum of system CPU consumed by all threads of the server process (main thread and background threads) |
cpu_used_cpu_sys_children | Number | System CPU consumed by the background processes |
cpu_used_cpu_user | Number | User CPU consumed by the Redis server, which is the sum of user CPU consumed by all threads of the server process (main thread and background threads) |
cpu_used_cpu_user_children | Number | User CPU consumed by the background processes |
db_count | Number | Number of Redis databases |
key_count | Number | Sum of all keys across all databases |
keyspace_<dbname>_keys | Number | The number of keys |
keyspace_<dbname>_expires | Number | The number of keys with an expiration |
keyspace_<dbname>_avg_ttl | Seconds | |
keyspace_hit_rate | Percentage | Percentage of key lookups that are successfully answered by keys in your Redis instance. Generally speaking, a higher cache-hit ratio is better. Note your cache-hit ratio before you make any large configuration change such as adjusting the maxmemory limit, changing your eviction policy, or scaling your instance. After the change, check the cache-hit ratio again to see how it was affected. |
mem_usage | Percentage | Indicates how close your working set size is to reaching the maxmemory limit. Unless the eviction policy is set to noeviction, instance data reaching maxmemory does not always indicate a problem. However, key eviction is a background process that takes time. If you have a high write rate, you could run out of memory before Redis has time to evict keys to free up space. |
memory_maxmemory | Bytes | |
memory_mem_fragmentation_ratio | Number | Ratio between used_memory_rss and used_memory. Note that this does not only include fragmentation, but also other process overheads (see the allocator_* metrics) and overheads like code, shared libraries, stack, etc. Memory fragmentation can cause your Redis instance to run out of memory even when the ratio of used memory to maxmemory is low. It happens when the operating system allocates memory pages that Redis cannot fully utilize after repeated write and delete operations. The accumulation of such pages can result in the system running out of memory and eventually cause the Redis server to crash. |
memory_total_system_memory | Bytes | The total amount of memory that the Redis host has |
memory_used_memory | Bytes | Total number of bytes allocated by Redis using its allocator (either standard libc, jemalloc, or an alternative allocator such as tcmalloc) |
memory_used_memory_lua | Bytes | Number of bytes used by the Lua engine |
memory_used_memory_rss | Bytes | Number of bytes that Redis allocated as seen by the operating system (a.k.a. resident set size). This is the number reported by tools such as top(1) and ps(1) |
persistance_aof_current_rewrite_time_sec | Seconds | Duration of the ongoing AOF rewrite operation, if any |
persistance_aof_rewrite_in_progress | Number | Flag indicating that an AOF rewrite operation is ongoing |
persistance_aof_rewrite_scheduled | Number | Flag indicating that an AOF rewrite operation will be scheduled once the ongoing RDB save is complete |
persistance_loading | Number | Flag indicating that the load of a dump file is ongoing |
persistance_rdb_bgsave_in_progress | Number | Flag indicating that an RDB save is ongoing |
persistance_rdb_changes_since_last_save | Number | Number of changes since the last dump |
persistance_rdb_current_bgsave_time_sec | Seconds | Duration of the ongoing RDB save operation, if any |
replication_connected_slaves | Number | Number of connected replicas |
replication_repl_backlog_histlen | Bytes | Size in bytes of the data in the replication backlog buffer |
replication_repl_backlog_size | Bytes | Total size in bytes of the replication backlog buffer |
server_uptime_in_seconds | Seconds | Number of seconds since Redis server start |
stats_evicted_keys | Continuous Counter | Number of keys evicted due to the maxmemory limit |
stats_expired_keys | Continuous Counter | Total number of key expiration events. If there are no expirable keys, it can be an indication that you are not setting TTLs on keys. In such cases, when your instance data reaches the maxmemory limit, there are no keys to evict, which can result in an out-of-memory condition. If the metric shows many expired keys but you still see memory pressure on your instance, you should lower maxmemory. |
stats_instantaneous_input | Number | The network read rate in KB/sec |
stats_instantaneous_ops_per_sec | Number | Number of commands processed per second |
stats_instantaneous_output | Number | The network write rate in KB/sec |
stats_keyspace_hits | Number | Number of successful lookups of keys in the main dictionary |
stats_keyspace_misses | Number | Number of failed lookups of keys in the main dictionary |
stats_latest_fork_usec | Number | Duration of the latest fork operation in microseconds |
stats_migrate_cached_sockets | Number | The number of sockets open for MIGRATE purposes |
stats_pubsub_channels | Number | Global number of pub/sub channels with client subscriptions |
stats_pubsub_patterns | Number | Global number of pub/sub patterns with client subscriptions |
stats_rejected_connections | Number | Number of connections rejected because of the maxclients limit |
stats_sync_full | Number | The number of full resyncs with replicas |
stats_sync_partial_err | Number | The number of denied partial resync requests |
stats_sync_partial_ok | Number | The number of accepted partial resync requests |
stats_total_commands_processed | Number | Total number of commands processed by the server |
stats_total_connections_received | Number | Total number of connections accepted by the server |
stats_total_net_input_bytes | Bytes | The total number of bytes read from the network |
stats_total_net_output_bytes | Bytes | The total number of bytes written to the network |
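
Three of the metrics above are derived from other `INFO` fields rather than reported directly. The sketch below shows the arithmetic in Python, as an illustration rather than the plugin's own code. The function names are hypothetical, and the handling of the edge cases (no lookups yet, `maxmemory` of 0) is an assumption of this sketch.

```python
def hit_rate(hits, misses):
    """keyspace_hit_rate: share of successful lookups, in percent."""
    total = hits + misses
    # Assumption of this sketch: report 0.0 when no lookup has happened yet.
    return 100.0 * hits / total if total else 0.0


def mem_usage(used_memory, maxmemory):
    """mem_usage: used memory relative to the maxmemory limit, in percent."""
    if maxmemory == 0:
        # maxmemory=0 means "no limit"; this sketch returns None in that case.
        return None
    return 100.0 * used_memory / maxmemory


def fragmentation_ratio(used_memory_rss, used_memory):
    """memory_mem_fragmentation_ratio: RSS divided by allocator-reported usage."""
    return used_memory_rss / used_memory


# Figures taken from the example output in this document (values in MiB):
# 19.6 MiB RSS and 9.6 MiB used memory, 3.0M hits and no misses.
print(round(fragmentation_ratio(19.6, 9.6), 2))
print(hit_rate(3_000_000, 0))
```

A ratio of about 2 is well above the 1.4 mentioned in the "High total RSS" message of the example output.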
Troubleshooting
- vm.overcommit_memory is not set to 1
  sysctl -w vm.overcommit_memory=1
- kernel transparent_hugepage is not set to "madvise"
  echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
- net.core.somaxconn is lower than net.ipv4.tcp_max_syn_backlog
  tcp_max_syn_backlog represents the maximal number of connections in the SYN_RECV queue. somaxconn represents the maximal size of the ESTABLISHED queue and should be greater than tcp_max_syn_backlog, so do something like this:
  sysctl -w net.core.somaxconn=1024; sysctl -w net.ipv4.tcp_max_syn_backlog=512
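
The three OS checks above can be sketched in Python as follows. This is an illustration under assumptions, not the plugin's code. The function names are hypothetical, and the values are passed in as parameters so the sketch runs anywhere. On Linux they would typically be read from `/proc/sys/vm/overcommit_memory`, `/sys/kernel/mm/transparent_hugepage/enabled`, `/proc/sys/net/core/somaxconn` and `/proc/sys/net/ipv4/tcp_max_syn_backlog`.

```python
def active_thp_mode(content):
    """Extract the bracketed (active) value from e.g. 'always [madvise] never'."""
    for token in content.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def os_warnings(overcommit_memory, thp_content, somaxconn, tcp_max_syn_backlog):
    """Return a list of warning messages for the three OS settings."""
    warnings = []
    if overcommit_memory != 1:
        warnings.append("vm.overcommit_memory is not set to 1")
    if active_thp_mode(thp_content) not in ("madvise", "never"):
        warnings.append('kernel transparent_hugepage is not set to "madvise" or "never"')
    if somaxconn < tcp_max_syn_backlog:
        warnings.append(
            f"net.core.somaxconn ({somaxconn}) is lower than "
            f"net.ipv4.tcp_max_syn_backlog ({tcp_max_syn_backlog})"
        )
    return warnings


# Settings matching the example output in this document: all three checks warn.
print(os_warnings(0, "[always] madvise never", 128, 256))
# Settings after applying the fixes above: no warnings.
print(os_warnings(1, "always [madvise] never", 1024, 512))
```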
Credits, License
Authors: Linuxfabrik GmbH, Zurich
License: The Unlicense, see LICENSE file.