Check disk-smart
Overview
Multi HDD/SSD scan. No need to provide any warning/critical thresholds, no need to maintain any disk or property databases, no need for any additional libraries.
This check will scan for devices and attempt to open each device first. If successful, all information for the device will be parsed.
The check calls smartctl, which itself controls the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into most ATA/SATA and SCSI/SAS hard drives and solid-state drives. The purpose of SMART is to monitor the reliability of the hard drive and predict drive failures (from the man page of smartctl).
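The device scan described above can be sketched as follows. This is a minimal illustration, not the plugin's actual implementation: it assumes the device list comes from `smartctl --scan`, whose lines usually look like `/dev/sda -d scsi # /dev/sda, SCSI device`. The function name `parse_scan_output` and the sample text are hypothetical.

```python
def parse_scan_output(text):
    """Return (device, type) tuples from `smartctl --scan` style output."""
    devices = []
    for line in text.splitlines():
        # Drop the trailing "# ..." comment and surrounding whitespace.
        line = line.split('#', 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        device = parts[0]
        dev_type = None
        # The device type, if present, follows the "-d" option.
        if '-d' in parts:
            idx = parts.index('-d')
            if idx + 1 < len(parts):
                dev_type = parts[idx + 1]
        devices.append((device, dev_type))
    return devices


# Hypothetical sample, shaped like typical `smartctl --scan` output.
SAMPLE = """\
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device
"""

if __name__ == '__main__':
    for device, dev_type in parse_scan_output(SAMPLE):
        print(device, dev_type)
```

Each device found this way could then be queried individually, which is where the per-disk runtime mentioned in the hints comes from.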
Hints:
- Needs sudo.
- Running this check only makes sense on hardware using ATA/SATA and/or SCSI/SAS HDDs and SSDs.
- The check tries to identify all disks automatically. Disks without SMART capability can be ignored manually using the --ignore parameter.
- Keep in mind that a smartctl run can take up to one or two seconds per disk, depending on its health and (interface/bus) speed.
- Don't forget to run /usr/sbin/update-smart-drivedb from time to time to get the newest drive database (sometimes there are improvements in how some attributes are interpreted).
- Use --full to also get a warning for notices.
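How a comma-separated --ignore value such as 'sda,sdb' might be applied to the detected disks can be sketched like this. The helper names `parse_ignore` and `filter_disks` are hypothetical, and matching on the device basename is an assumption, not confirmed plugin behavior.

```python
import os


def parse_ignore(value):
    """Turn 'sda,sdb' into ['sda', 'sdb'], dropping empty entries."""
    return [item.strip() for item in value.split(',') if item.strip()]


def filter_disks(devices, ignore):
    """Keep only devices whose basename is not in the ignore list."""
    ignored = set(ignore)
    return [dev for dev in devices if os.path.basename(dev) not in ignored]


if __name__ == '__main__':
    found = ['/dev/sda', '/dev/sdd', '/dev/mmcblk0']
    # Names that match no detected disk (here 'sdbx') have no effect.
    print(filter_disks(found, parse_ignore('sdd,sdbx,mmcblk0')))
```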
Fact Sheet
Check Plugin Download | https://github.com/Linuxfabrik/monitoring-plugins/tree/main/check-plugins/disk-smart
Check Interval Recommendation | Every 8 hours
Can be called without parameters | Yes
Compiled for | Linux, Windows
Help
usage: disk-smart [-h] [-V] [--always-ok] [--full] [--ignore IGNORE]
                  [--test TEST]

This check is some kind of user interface for smartctl, which is a tool for
querying and controlling SMART (Self-Monitoring, Analysis, and Reporting
Technology) data in hard disk and solid-state drives. It allows you to inspect
the drive's SMART data to determine its health.

options:
  -h, --help       show this help message and exit
  -V, --version    show program's version number and exit
  --always-ok      Always returns OK.
  --full           If set, also warn on any assumptions (in GSmartControl
                   stated as "notice" messages), otherwise just warn on "real"
                   SMART issues. Default: False
  --ignore IGNORE  A comma-separated list of disks which should be ignored, in
                   the format 'sda,sdb'. Default: []
  --test TEST      For unit tests. Needs "path-to-stdout-file,path-to-stderr-
                   file,expected-retc".
Usage Examples
./disk-smart --ignore sdd,sdbx,mmcblk0 --full
Output:
Checked 6 disks. There are critical errors.
* sda (Crucial/Micron Client SSDs, Crucial_CT525MX300SSD1, SerNo 1a2b3c4d)
* sdb (Crucial/Micron Client SSDs, Crucial_CT525MX300SSD1, SerNo 1a2b3c4d)
* [CRITICAL] sdc (Seagate IronWolf, ST12000VN0007-2GS116, SerNo 1a2b3c4d)
- The device error log contains records of errors.
- Error Log: Drive is reporting 2 internal errors. Usually this means uncorrectable data loss and similar severe errors. Check the actual errors for details.
- Error Log: Error "Uncorrectable error in data".
- Error Log: Error "Uncorrectable error in data".
- Attributes: Drive has a non-zero Raw value ("5 Reallocated_Sector_Ct"), but there is no SMART warning yet. This could be an indication of future failures and/or potential data loss in bad sectors.
* sdd (Seagate IronWolf, ST12000VN0007-2GS116, SerNo 1a2b3c4d)
- The device error log contains records of errors.
* sde (Seagate IronWolf, ST12000VN0007-2GS116, SerNo 1a2b3c4d)
- The device error log contains records of errors.
* sdf (Seagate IronWolf, ST12000VN0007-2GS116, SerNo 1a2b3c4d)
- The device error log contains records of errors.
States
CRIT, if SMART reports
- any messages in subsection "health"
- drive has a failing pre-fail attribute
- "Address mark not found" in subsection "error_log"
- "Identity not found" in subsection "error_log"
- "Track 0 not found" in subsection "error_log"
- "Uncorrectable error in data" in subsection "error_log"
- SMART status check returned DISK FAILING
WARN, if SMART reports
- failing old-age attribute
- failing pre-fail attribute in the past
- "Command completion timed out" in subsection "error_log"
- "End of media" in subsection "error_log"
- "Interface CRC error" in subsection "error_log"
- Drive is past its estimated lifespan
- Drive is reporting surface errors
UNKNOWN, if smartctl is not found, if errors occur while running smartctl, or if SMART is not available or not supported.
If smartctl reports more than one issue, the worst issue state over all disks is returned.
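The mapping from error log messages to states, and the "worst state wins" rule, can be sketched as follows. The error messages are taken from the lists above. The numeric state values follow the usual Nagios convention, and the severity ranking (CRIT over WARN over UNKNOWN over OK) is an assumption. All names are hypothetical and this is not the plugin's actual code.

```python
# Conventional Nagios return codes.
STATE_OK, STATE_WARN, STATE_CRIT, STATE_UNKNOWN = 0, 1, 2, 3

# Error log messages as listed in the States section.
CRIT_ERRORS = {
    'Address mark not found',
    'Identity not found',
    'Track 0 not found',
    'Uncorrectable error in data',
}
WARN_ERRORS = {
    'Command completion timed out',
    'End of media',
    'Interface CRC error',
}

# Assumed ranking: a higher number means a worse state.
_SEVERITY = {STATE_OK: 0, STATE_UNKNOWN: 1, STATE_WARN: 2, STATE_CRIT: 3}


def state_for_error(message):
    """Map a single error log message to a state."""
    if message in CRIT_ERRORS:
        return STATE_CRIT
    if message in WARN_ERRORS:
        return STATE_WARN
    return STATE_OK


def worst_state(states):
    """Return the worst state of all given states (OK if there are none)."""
    return max(states, key=_SEVERITY.get, default=STATE_OK)


if __name__ == '__main__':
    errors = ['Interface CRC error', 'Uncorrectable error in data']
    print(worst_state(state_for_error(e) for e in errors))
```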
Perfdata / Metrics
Temperatures
Remaining or used Lifetimes
Power On Hours
Power Cycle Counts
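Metrics like these are typically emitted in the Nagios perfdata format `label=value[UOM];warn;crit;min;max`. The sketch below shows one way to build such strings. The helper `perfdata` and the labels used here (`sda_temperature` and so on) are hypothetical and may differ from the labels the plugin really emits.

```python
def perfdata(label, value, uom='', warn='', crit='', minimum='', maximum=''):
    """Format one item as label=value[UOM];warn;crit;min;max."""
    # Labels containing spaces must be wrapped in single quotes.
    if ' ' in label:
        label = "'{}'".format(label)
    fields = [str(x) for x in (warn, crit, minimum, maximum)]
    item = '{}={}{};{}'.format(label, value, uom, ';'.join(fields))
    # Trailing empty fields may be omitted.
    return item.rstrip(';')


if __name__ == '__main__':
    print(' '.join([
        perfdata('sda_temperature', 34),
        perfdata('sda_lifetime_remaining', 87, uom='%', minimum=0, maximum=100),
        perfdata('sda_power_on_hours', 12345, minimum=0),
        perfdata('sda_power_cycle_count', 211, minimum=0),
    ]))
```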
Troubleshooting
- smartctl failed with exit status "Device open failed, device did not return an IDENTIFY DEVICE structure, or device is in a low-power mode."
Run the check with root privileges, for example using sudo.
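The message above corresponds to one bit of the smartctl exit status, which is a bitmask. The following sketch decodes such a status into readable messages. The bit meanings are paraphrased from the smartctl man page, the names `SMARTCTL_EXIT_BITS` and `decode_exit_status` are hypothetical, and the code is an illustration rather than the plugin's implementation.

```python
# Bit number -> meaning, paraphrased from the smartctl man page.
SMARTCTL_EXIT_BITS = {
    0: 'Command line did not parse.',
    1: ('Device open failed, device did not return an IDENTIFY DEVICE '
        'structure, or device is in a low-power mode.'),
    2: 'A SMART or other ATA command failed, or a checksum error occurred.',
    3: 'SMART status check returned "DISK FAILING".',
    4: 'Pre-fail attributes found at or below their threshold.',
    5: 'Some attributes have been at or below their threshold in the past.',
    6: 'The device error log contains records of errors.',
    7: 'The device self-test log contains records of errors.',
}


def decode_exit_status(code):
    """Return the messages for all bits set in a smartctl exit status."""
    return [
        message
        for bit, message in sorted(SMARTCTL_EXIT_BITS.items())
        if code & (1 << bit)
    ]


if __name__ == '__main__':
    # Exit status 2 has only bit 1 set: the device could not be opened.
    for message in decode_exit_status(2):
        print(message)
```

An exit status with bit 1 set is the case described in this section. Running the check with sufficient privileges usually resolves it.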
Credits, License
Authors: Linuxfabrik GmbH, Zurich
License: The Unlicense, see LICENSE file.
Credits: GSmartControl: We re-implemented parts of the logic in Python and used its excellent output.