Introduction

Smartmontools is a tool for analyzing hard drives and their most critical physical characteristics. It consists of two parts: smartd daemon, which checks parameters every 30 minutes and writes the results to /var/log/syslog, and the smartctl command which requires root privileges and is used to display all the information.

Activation / Installation of smartmontools

Debian

Installation requires root privileges. The package name varies depending on your Debian version. The example below is for Sarge.

  > aptitude install smartmontools
Lecture des listes de paquets... Fait
Construction de l'arbre des dependances... Fait
Les NOUVEAUX paquets suivants seront installes :
 smartmontools
0 mis a jour, 1 nouvellement installes, 0 a enlever et 60 non mis a jour.
Il est necessaire de prendre 222ko dans les archives.
Apres depaquetage, 508ko d'espace disque supplementaires seront utilises.
Reception de : 1 http://ftp.fr.debian.org unstable/main smartmontools 5.32-3 [222kB]
222ko receptionnes en 0s (272ko/s)
Selection du paquet smartmontools precedemment deselectionne.
(Lecture de la base de données... 67466 fichiers et repertoires deja installes.)
Depaquetage de smartmontools (a partir de .../smartmontools_5.32-3_i386.deb) ...
Parametrage de smartmontools (5.32-3) ...
Not starting S.M.A.R.T. daemon smartd, disabled via /etc/default/smartmontools
  

As you can see, the daemon has not been started immediately. You need to edit /etc/default/smartmontools and uncomment the lines start_smartd=yes and smartd_opts="--interval=1800":

  # Defaults for smartmontools initscript (/etc/init.d/smartmontools)
# This is a POSIX shell fragment

# list of devices you want to explicitly enable S.M.A.R.T. for
# not needed if the device is monitored by smartd
# enable_smart="/dev/hda /dev/hdb"

# uncomment to start smartd on system startup
start_smartd=yes

# uncomment to pass additional options to smartd on startup
smartd_opts="--interval=1800"
  

Once the changes are validated, start the daemon:

  /etc/init.d/smartmontools start
Enabling S.M.A.R.T. for: /dev/hda /dev/hdb.
Starting S.M.A.R.T. daemon: smartd.
23:21 root@revolution /# smartctl -a /dev/hda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
  

The smartd daemon will now regularly check your disk information and record it in your logs:

  cat /var/log/syslog | grep smartd
Mar 17 10:48:34 slut smartd[990]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
  

And there you go, it’s ready.

FreeBSD

To install smartmontools:

  pkg_add -r smartmontool
  

Then start it like this:

  /usr/local/etc/rc.d/smartd start
  

Edit the /usr/local/etc/smartd.conf configuration and add this line (adapting to your email):

  DEVICESCAN -a -m my@mail.com
  

Next, if we want smartd to start at every boot, add this line:

  smartd_enable="YES"
  

Fine Tuning

Debian

To fine-tune the smartmontools configuration, edit the /etc/smartd.conf file and look for the DEVICESCAN line to add your own settings, as in this example:

  DEVICESCAN -H -l error -l selftest -t -f -m admin@webank.fr -M exec /usr/bin/mail -s (S/../.././02|L/../../6/03)
  

The DEVICESCAN directive indicates that you want to apply this configuration to all hard disks detected as SMART compatible on the system. It can be replaced by the name of a device /dev/hdx or /dev/sdx.

  /dev/hda -H -l error -l selftest -t -f -m admin@webank.fr -M exec /usr/bin/mail -s (S/../.././02|L/../../6/03)
/dev/hdc -H -l error -l selftest -t -f -m admin@webank.fr -M exec /usr/bin/mail -s (S/../.././02|L/../../6/03)
  

Adding this line to the configuration file allows sending an email to admin@domain.com using your system’s mail command. The -t option indicates that we want to be informed in case the “Pre-Fail” or “Old-age” attribute shows errors, if the health test (option -H) fails, or if the error and selftest logs evolve (-l). You can choose from a range of options to best adjust according to your needs. For example, you can deliberately ignore an attribute using the -I option. Adding the -I 194 option indicates that we want to receive an email in case of failure but ignoring attribute number 194 (temperature). The -s option allows you to define the periodicity of the tests to be performed (version >5.30 required). In this example, we perform a short test (S/) every day at 2 a.m., and a long test every Saturday at 3 a.m. It’s also possible to modify the email that will be sent by smartd in case of failure by creating a script that will be called instead of /bin/mail.

FreeBSD

To receive daily emails indicating the state of your disks, add this to the /etc/periodic.conf file:

  daily_status_smart_devices="/dev/ad4 /dev/ad6 /dev/ad8 /dev/ad10 /dev/ad12"
  

Obviously, use your own devices.

Diagnostics and Troubleshooting

Since smartd writes to /var/log/syslog, it’s easy to search with a grep command as in the following example:

  > grep smartd /var/log/syslog
Mar 17 10:48:34 slut smartd[990]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Mar 17 10:48:34 slut smartd[990]: Device: /dev/hda, opened
Mar 17 10:48:34 slut smartd[990]: Device: /dev/hda, found in smartd database.
Mar 17 10:48:35 slut smartd[990]: Device: /dev/hda, is SMART capable. Adding to "monitor" list.
Mar 17 10:48:35 slut smartd[990]: Device: /dev/hdb, opened
Mar 17 10:48:35 slut smartd[990]: Device: /dev/hdb, not ATA, no IDENTIFY DEVICE Structure
Mar 17 10:48:35 slut smartd[990]: Monitoring 1 ATA and 0 SCSI devices
Mar 17 10:48:35 slut smartd: Lancement smartd succeeded
Mar 17 10:48:35 slut smartd[2421]: smartd has fork()ed into background mode. New PID=2421.
Mar 17 13:48:35 slut smartd[2421]: Device: /dev/hda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 246 to 247
Mar 17 15:48:35 slut smartd[2421]: Device: /dev/hda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 247 to 246
Mar 17 17:18:35 slut smartd[2421]: Device: /dev/hda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 246 to 247
  

How to interpret these lines? The drive shows a constant value that varies between 246 and 247. If the value suddenly changes from 247 to 500, this is abnormal behavior.

Using the smartctl command requires root privileges. Let’s look at the different attributes of the command.

  smarctl -h

smartctl version 5.33 [i386-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/[1]
Usage: smartctl [options] device
h, --help, --usage
Display this help and exit
i, --info
Show identity information for device
a, --all
Show all SMART information for device
  
  smartctl -i /dev/hda

=== START OF INFORMATION SECTION ===
Device Model:     Maxtor 6E040L0
Serial Number:    E1KTPXFE
Firmware Version: NAR61590
User Capacity:    41,110,142,976 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Thu Mar 17 22:21:52 2005 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
  
  smartctl -a /dev/hda

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                     was completed without error.
                                     Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                     without error or no self-test has ever
                                     been run.
Total time to complete Offline
data collection:                 (1021) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                     Auto Offline data collection on/off support.
                                     Suspend Offline collection upon new
                                     command.
                                     Offline surface scan supported.
                                     Self-test supported.
                                     No Conveyance Self-test supported.
                                     Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                     power-saving mode.
                                     Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                     No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  17) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
3 Spin_Up_Time            0x0027   252   252   063    Pre-fail  Always       -       2463
4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       18
5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
8 Seek_Time_Performance   0x0027   247   238   187    Pre-fail  Always       -       46214
9 Power_On_Minutes        0x0032   241   241   000    Old_age   Always       -       950h+09m
10 Spin_Retry_Count        0x002b   252   252   157    Pre-fail  Always       -       0
11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       22
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       13
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       72
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       31
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       25095
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   251   138   000    Old_age   Always       -       1746
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       137
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   252   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   252   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   187   183   000    Old_age   Offline      -       0
99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1        0        0  Not_testing
  2        0        0  Not_testing
  3        0        0  Not_testing
  4        0        0  Not_testing
  5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
  

Now we need to interpret the information such as disk uptime, temperature, and most importantly for us, errors. For this we mainly observe the last two columns: WHEN_FAILED and RAW_VALUE, and the section just below: SMART Error Log Version: 1 No Errors Logged.

An example:

  5 Reallocated_Sector_Ct 0x0033 016 016 063 Pre-fail Always FAILING_NOW 598
  

Here we see that sector reallocation has failed. You should therefore monitor this part. If the number indicated quickly increases to higher figures, take the necessary measures: back up your data and possibly contact support.

Conclusion

Smartmontools is simple to use and very comprehensive. Note, however, that such a tool does not replace the most important thing: regular backup of your data.

FAQ

Problems during updates

Sometimes during a package update, things may go wrong and you may not know why. The problem is actually quite simple. Just stop the service:

  /etc/init.d/smartmontool stop
  

then restart the update.

The service won’t start

This problem can occur when SMART is simply not enabled. To enable it, just type this command:

  smartctl -s on /dev/sda
  

Then try to start smartmontools:

  /etc/init.d/smartmontools start
  

Resources

Last updated 14 Jan 2011, 20:24 +0200. history