Smartmontools: Hard Drive Monitoring
Introduction
Smartmontools is a set of tools for analyzing hard drives and their most critical physical characteristics. It consists of two parts: the smartd daemon, which checks the drive parameters every 30 minutes and writes the results to /var/log/syslog, and the smartctl command, which requires root privileges and displays all the information.
Activation / Installation of smartmontools
Debian
Installation requires root privileges. The package name varies depending on your Debian version. The example below is for Sarge.
As you can see, the daemon is not started automatically after installation. To enable it, edit /etc/default/smartmontools and uncomment the lines start_smartd=yes and smartd_opts="--interval=1800":
# Defaults for smartmontools initscript (/etc/init.d/smartmontools)
# This is a POSIX shell fragment

# list of devices you want to explicitly enable S.M.A.R.T. for
# not needed if the device is monitored by smartd
# enable_smart="/dev/hda /dev/hdb"

# uncomment to start smartd on system startup
start_smartd=yes

# uncomment to pass additional options to smartd on startup
smartd_opts="--interval=1800"
To fine-tune the smartmontools configuration, edit the /etc/smartd.conf file and look for the DEVICESCAN line to add your own settings, as in this example:
The DEVICESCAN directive applies the configuration to every hard disk on the system that is detected as SMART compatible. It can be replaced by the name of a single device, /dev/hdx or /dev/sdx.
Adding this line to the configuration file sends an email to admin@domain.com (option -m) using your system's mail command. The -t option tracks changes in the "Pre-fail" and "Old_age" attributes, -H reports a failed health test, and -l reports growth in the error and selftest logs.

You can choose from a range of options to adjust the monitoring to your needs. For example, you can deliberately ignore an attribute using the -I option: adding -I 194 keeps the notifications but ignores changes in attribute number 194 (temperature). The -s option defines the schedule of the self-tests to be performed (version >5.30 required). In this example, we perform a short test (S/) every day at 2 a.m., and a long test every Saturday at 3 a.m.

It's also possible to modify the email that smartd sends in case of failure by creating a script that will be called instead of /bin/mail (option -M exec).
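As a sketch, a directive matching the description above might look like this. The option order and the -s schedule expression are assumptions reconstructed from that description, so check them against the smartd.conf manual page for your version before using them.

```
# /etc/smartd.conf (sketch reconstructed from the description above)
# -H: health test, -l error / -l selftest: growth of the two logs,
# -t: track attribute changes, -I 194: ignore temperature,
# -s: short test daily at 02:00, long test on Saturdays at 03:00,
# -m: address that receives the warnings
DEVICESCAN -H -l error -l selftest -t -I 194 -s (S/../.././02|L/../../6/03) -m admin@domain.com
```

In the -s expression, each test type is followed by month, day of month, day of week (1 is Monday, so 6 is Saturday) and hour; a dot matches any digit.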
FreeBSD
To receive daily emails indicating the state of your disks, add this to the /etc/periodic.conf file:
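A minimal sketch of such an entry, assuming the periodic script installed by the FreeBSD smartmontools port and its daily_status_smart_devices variable. The device names are placeholders to replace with your own disks.

```
# /etc/periodic.conf (sketch; /dev/ad0 and /dev/da0 are placeholder device names)
daily_status_smart_devices="/dev/ad0 /dev/da0"
```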
How should you interpret the values in these reports? A healthy drive shows a stable value, here one that fluctuates only between 246 and 247. If the value suddenly jumps from 247 to 500, this is abnormal behavior.
Using the smartctl command requires root privileges. Let's look at the different attributes of the command.
Now we need to interpret the information, such as disk uptime, temperature, and most importantly for us, errors. For this we mainly look at the last two columns of the attribute table, WHEN_FAILED and RAW_VALUE, and at the section just below it, which reads "SMART Error Log Version: 1" followed by "No Errors Logged" when the drive has recorded no errors.
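Reading the WHEN_FAILED column by eye gets tedious on machines with several disks. The following is a small sketch, not part of smartmontools, that filters the attribute table printed by smartctl -A and keeps only the rows whose WHEN_FAILED column is set. The function name failed_attributes is a hypothetical helper, and the column positions assume the usual ten-column ATA attribute table.

```sh
#!/bin/sh
# Sketch: list attributes from `smartctl -A` output whose WHEN_FAILED column is set.
# failed_attributes is an illustrative helper, not a smartmontools command.

failed_attributes() {
    # Attribute rows start with a numeric ID. Column 9 is WHEN_FAILED
    # ("-" means the attribute never failed) and column 10 is the first
    # token of RAW_VALUE. Header and log lines are skipped.
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $9 != "-" { print $1, $2, $9, $10 }'
}
```

As root, it can be used as smartctl -A /dev/hda | failed_attributes. It prints nothing when no attribute has failed.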
Here we see that the sector reallocation attribute has failed. You should therefore keep an eye on it. If its raw value climbs quickly, take the necessary measures: back up your data and possibly contact support.
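Watching whether the number increases can be automated. The sketch below saves the last reading in a state file and reports when a new reading is higher. The function names and the state-file approach are illustrative assumptions, and the script assumes that attribute ID 5 is Reallocated_Sector_Ct, as it is on most ATA drives.

```sh
#!/bin/sh
# Sketch: report growth of the Reallocated_Sector_Ct raw value between runs.
# Function names and the state file are illustrative assumptions.

# Read `smartctl -A` output on stdin and print the first RAW_VALUE token
# of attribute ID 5 (assumed to be Reallocated_Sector_Ct).
reallocated_raw() {
    awk '$1 == 5 && NF >= 10 { print $10; exit }'
}

# Compare a new reading with the one saved in STATE_FILE, then save the new one.
# Usage: check_growth STATE_FILE NEW_VALUE
check_growth() {
    state_file=$1
    new_value=$2
    # Refuse empty or non-numeric readings (for example if smartctl failed).
    case $new_value in
        ''|*[!0-9]*) echo "ERROR: no numeric reading"; return 1 ;;
    esac
    old_value=$new_value            # first run: nothing to compare against
    if [ -f "$state_file" ]; then
        old_value=$(cat "$state_file")
    fi
    echo "$new_value" > "$state_file"
    if [ "$new_value" -gt "$old_value" ]; then
        echo "WARNING: reallocated sectors grew from $old_value to $new_value"
    else
        echo "OK: reallocated sectors did not grow ($new_value)"
    fi
}
```

Run as root, for example from cron, it could be called as check_growth /var/tmp/hda.realloc "$(smartctl -A /dev/hda | reallocated_raw)", where the state file path is a placeholder.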
Conclusion
Smartmontools is simple to use and very comprehensive. Note, however, that such a tool does not replace the most important thing: regular backup of your data.
FAQ
Problems during updates
Sometimes a package update goes wrong for no obvious reason. The problem is actually quite simple. Just stop the service, which on Debian can be done with the init script named in /etc/default/smartmontools:
/etc/init.d/smartmontools stop