Checking hard disk status on a CentOS server - Alibaba Cloud Developer Community

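On Windows, HD Tune shows the state of a disk, so you find out about problems before the disk dies rather than after. On CentOS, SMART (Self-Monitoring, Analysis and Reporting Technology) provides the same kind of disk health checking through the smartmontools package.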

http://www.smartmontools.org/

The examples below use a Dell R720 server. /dev/sda is an ordinary 1 TB SCSI disk, and /dev/sdd is a RAID 5 array built from three disks.

# df -h    # find the disk device names

# dmesg | grep sdd    # look up the disk in the boot messages

sd 0:2:0:0: [sdd] Attached SCSI disk

# hdparm -I /dev/sda    # show disk hardware details and enabled features; the output is very detailed

Now use smartctl to check the disk status:

 
    # yum install smartmontools    # install smartmontools
    # smartctl -H /dev/sdd         # check the disk health
    smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.10.56-11.el6.centos.alt.x86_64] (local build)
    Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

    SMART Health Status: OK

# smartctl -a /dev/sda    # all SMART information for the disk; same as smartctl --all /dev/sda (use -A for the attribute table only)
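When only one attribute matters (for example the reallocated sector count), the attribute table printed by `smartctl -A` can be filtered with awk. The following is a minimal sketch, assuming an ATA/SATA disk whose table has the usual ten columns with RAW_VALUE last; the function name `attr_raw` is hypothetical, and SCSI/SAS disks print a different layout that this does not handle.

```shell
#!/bin/bash
# attr_raw NAME
# Reads `smartctl -A` output on stdin and prints the first token of the
# RAW_VALUE column for the attribute called NAME (assumed ATA table layout:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE).
attr_raw() {
    awk -v name="$1" '$2 == name { print $10; exit }'
}

# Hypothetical usage on a real disk:
#   smartctl -A /dev/sda | attr_raw Reallocated_Sector_Ct
```

Raw values that contain spaces, such as a temperature followed by a min/max note, are cut at the first token, which is usually the number a monitoring check needs.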

 
    # smartctl -a /dev/sdd
    smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.10.56-11.el6.centos.alt.x86_64] (local build)
    Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

    Vendor:               DELL
    Product:              PERC H310
    Revision:             2.12
    User Capacity:        598,879,502,336 bytes [598 GB]
    Logical block size:   512 bytes
    Logical Unit id:
    Serial number:
    Device type:          disk
    Local Time is:        Wed Jan 14 15:37:39 2015 CST
    Device does not support SMART

    Error Counter logging not supported
    Device does not support Self Test logging

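The output says "Device does not support SMART" because /dev/sdd is the logical volume presented by the PERC RAID controller, not a physical disk. Query the physical disks behind the controller as follows.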

Check the status of the first disk in the RAID 5 array:

# smartctl -a -d megaraid,0 /dev/sdd

Check the second and third disks the same way and, depending on your monitoring setup, feed the results into Nagios or Zabbix alerting:

# smartctl -a -d megaraid,1 /dev/sdd

# smartctl -a -d megaraid,2 /dev/sdd
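The three per-disk queries above can be wrapped in a loop. The sketch below is one possible way to do it, not the author's method: the function names `health_word` and `check_all` are hypothetical, and the `SMARTCTL` variable is an assumption so that the binary path can be overridden. It recognises both the SCSI/SAS line (`SMART Health Status: OK`) and the ATA line (`SMART overall-health self-assessment test result: PASSED`).

```shell
#!/bin/bash
# Path to smartctl; override by exporting SMARTCTL (assumed default location).
SMARTCTL=${SMARTCTL:-/usr/sbin/smartctl}

# Reads smartctl output on stdin and prints the health word (OK, PASSED, ...).
# Prints nothing if no health line is present.
health_word() {
    awk -F': *' '
        /SMART Health Status|SMART overall-health self-assessment test result/ {
            v = $2
            sub(/[ \t\r]+$/, "", v)   # drop trailing whitespace
            print v
            exit
        }'
}

# check_all DEVICE ID...
# Prints one "megaraid,ID: STATUS" line per physical disk ID.
check_all() {
    local dev=$1 id
    shift
    for id in "$@"; do
        printf 'megaraid,%s: %s\n' "$id" \
            "$("$SMARTCTL" -H -d "megaraid,$id" "$dev" | health_word)"
    done
}

# Hypothetical usage for the three-disk array in this article (run as root):
#   check_all /dev/sdd 0 1 2
```

An empty status after the colon means smartctl printed no health line for that ID, which is worth treating as a problem rather than as a healthy disk.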

For everything else smartctl can do, the built-in help is very detailed:

 
    # smartctl -h
    Usage: smartctl [options] device

    ============================================ SHOW INFORMATION OPTIONS =====

      -h, --help, --usage
             Display this help and exit
      -V, --version, --copyright, --license
             Print license, copyright, and version information and exit
      -i, --info
             Show identity information for device
      -g NAME, --get=NAME
             Get device setting: all, aam, apm, lookahead, security, wcache
      -a, --all
             Show all SMART information for device
      -x, --xall
             Show all information for device
      --scan
             Scan for devices
      --scan-open
             Scan for devices and try to open each device
        
    ================================== SMARTCTL RUN-TIME BEHAVIOR OPTIONS =====

      -q TYPE, --quietmode=TYPE                                           (ATA)
             Set smartctl quiet mode to one of: errorsonly, silent, noserial
      -d TYPE, --device=TYPE
             Specify device type to one of: ata, scsi, sat[,auto][,N][+TYPE],
             usbcypress[,X], usbjmicron[,x][,N], usbsunplus, marvell, areca,N/E,
             3ware,N, hpt,L/M/N, megaraid,N, cciss,N, auto, test
      -T TYPE, --tolerance=TYPE                                           (ATA)
             Tolerance: normal, conservative, permissive, verypermissive
      -b TYPE, --badsum=TYPE                                              (ATA)
             Set action on bad checksum to one of: warn, exit, ignore
      -r TYPE, --report=TYPE
             Report transactions (see man page)
      -n MODE, --nocheck=MODE                                             (ATA)
             No check if: never, sleep, standby, idle (see man page)
        
    ============================== DEVICE FEATURE ENABLE/DISABLE COMMANDS =====

      -s VALUE, --smart=VALUE
             Enable/disable SMART on device (on/off)
      -o VALUE, --offlineauto=VALUE                                       (ATA)
             Enable/disable automatic offline testing on device (on/off)
      -S VALUE, --saveauto=VALUE                                          (ATA)
             Enable/disable Attribute autosave on device (on/off)
      -s NAME[,VALUE], --set=NAME[,VALUE]
             Enable/disable/change device setting: aam,[N|off], apm,[N|off],
             lookahead,[on|off], security-freeze, standby,[N|off|now],
             wcache,[on|off]
        
    ======================================= READ AND DISPLAY DATA OPTIONS =====

      -H, --health
             Show device SMART health status
      -c, --capabilities                                                  (ATA)
             Show device SMART capabilities
      -A, --attributes
             Show device SMART vendor-specific Attributes and values
      -f FORMAT, --format=FORMAT                                          (ATA)
             Set output format for attributes: old, brief, hex[,id|val]
      -l TYPE, --log=TYPE
             Show device log. TYPE: error, selftest, selective, directory[,g|s],
                                    xerror[,N][,error], xselftest[,N][,selftest],
                                    background, sasphy[,reset], sataphy[,reset],
                                    scttemp[sts,hist], scttempint,N[,p],
                                    scterc[,N,M], devstat[,N], ssd,
                                    gplog,N[,RANGE], smartlog,N[,RANGE]
      -v N,OPTION , --vendorattribute=N,OPTION                            (ATA)
             Set display OPTION for vendor Attribute N (see man page)
      -F TYPE, --firmwarebug=TYPE                                         (ATA)
             Use firmware bug workaround: none, samsung, samsung2,
                                          samsung3, swapid
      -P TYPE, --presets=TYPE                                             (ATA)
             Drive-specific presets: use, ignore, show, showall
      -B [+]FILE, --drivedb=[+]FILE                                       (ATA)
             Read and replace [add] drive database from FILE
             [default is +/etc/smart_drivedb.h
              and then    /usr/share/smartmontools/drivedb.h]
        
    ============================================ DEVICE SELF-TEST OPTIONS =====

      -t TEST, --test=TEST
             Run test. TEST: offline, short, long, conveyance, force, vendor,N,
                             select,M-N, pending,N, afterselect,[on|off]
      -C, --captive
             Do test in captive mode (along with -t)
      -X, --abort
             Abort any non-captive test on device
        
    =================================================== SMARTCTL EXAMPLES =====

      smartctl --all /dev/hda                    (Prints all SMART information)

      smartctl --smart=on --offlineauto=on --saveauto=on /dev/hda
                                                  (Enables SMART on first disk)

      smartctl --test=long /dev/hda          (Executes extended disk self-test)

      smartctl --attributes --log=selftest --quietmode=errorsonly /dev/hda
                                          (Prints Self-Test & Attribute errors)
      smartctl --all --device=3ware,2 /dev/sda
      smartctl --all --device=3ware,2 /dev/twe0
      smartctl --all --device=3ware,2 /dev/twa0
      smartctl --all --device=3ware,2 /dev/twl0
              (Prints all SMART info for 3rd ATA disk on 3ware RAID controller)
      smartctl --all --device=hpt,1/1/3 /dev/sda
              (Prints all SMART info for the SATA disk attached to the 3rd PMPort
               of the 1st channel on the 1st HighPoint RAID controller)
      smartctl --all --device=areca,3/1 /dev/sg2
              (Prints all SMART info for 3rd ATA disk of the 1st enclosure
               on Areca RAID controller)

http://linux-wiki.cn/wiki/zh-hans/SSD_(%E5%9B%BA%E6%80%81%E7%A1%AC%E7%9B%98)

Nagios setup

The script below checks the RAID 5 array, three disks in total.

 
    root@web:/usr/local/nagios/libexec# vim check_disk_status.sh
    #!/bin/bash
    # Check the SMART health of the three physical disks behind the RAID controller.

    # Nagios plugin exit codes
    STATE_OK=0
    STATE_WARNING=1
    STATE_CRITICAL=2

    SMARTCTL="/usr/sbin/smartctl"
    CHECK_DISK="/dev/sda"

    # megaraid device IDs 0, 1, 2 are reported as disks 1, 2, 3
    for ID in 0 1 2; do
        DISK_HEALTH=$($SMARTCTL -a -d megaraid,$ID $CHECK_DISK | grep "SMART Health Status" | awk '{print $4}')
        if [ "$DISK_HEALTH" = "OK" ] || [ "$DISK_HEALTH" = "PASSED" ]; then
            echo "OK - $CHECK_DISK $((ID + 1)) status is $DISK_HEALTH "
        else
            # Stop at the first unhealthy disk and report CRITICAL to Nagios
            echo "CRITICAL - $CHECK_DISK $((ID + 1)) status is $DISK_HEALTH "
            exit $STATE_CRITICAL
        fi
    done

    exit $STATE_OK

    root@web:/usr/local/nagios/libexec# chmod 755 check_disk_status.sh
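A check like this treats every non-OK result as CRITICAL, including the case where smartctl prints no health line at all (wrong device ID, missing permissions). Nagios plugins conventionally exit with 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN, so an empty status can be reported separately. The sketch below illustrates that mapping; the function names `nagios_state` and `report_disk` are hypothetical and not part of the original script.

```shell
#!/bin/bash
# nagios_state WORD
# Maps a SMART health word to a Nagios exit code:
#   OK or PASSED -> 0 (OK), empty -> 3 (UNKNOWN), anything else -> 2 (CRITICAL).
nagios_state() {
    case "$1" in
        OK|PASSED) echo 0 ;;
        "")        echo 3 ;;
        *)         echo 2 ;;
    esac
}

# report_disk NUMBER WORD
# Prints one status line for the disk and returns the matching exit code.
report_disk() {
    local code
    code=$(nagios_state "$2")
    case "$code" in
        0) echo "OK - disk $1 status is $2" ;;
        3) echo "UNKNOWN - disk $1 returned no SMART health status" ;;
        *) echo "CRITICAL - disk $1 status is $2" ;;
    esac
    return "$code"
}
```

Inside the check script, a call such as `report_disk 1 "$DISK_HEALTH" || exit $?` would stop at the first disk that is not healthy and pass the right code back to NRPE.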

 
    vim /usr/local/nagios/etc/nrpe.cfg
    command[check_disk_status]=/usr/bin/sudo /usr/local/nagios/libexec/check_disk_status.sh

sudo is needed because /usr/sbin/smartctl must run as root to read the disk status.

 
    vim /etc/sudoers
    #Defaults requiretty
    nagios ALL=(ALL) NOPASSWD:/usr/local/nagios/libexec/check_disk_status.sh

Run the following command on the Nagios server to test:

 
    root@nagios:/usr/local/nagios/libexec# ./check_nrpe -H 192.168.2.2 -c check_disk_status
    OK - /dev/sda 1 status is OK
    OK - /dev/sda 2 status is OK
    OK - /dev/sda 3 status is OK

Define the Nagios service

 
    define service{
        use                             linux-service
        host_name                       192_168_2_2
        service_description             check disk status
        check_command                   check_nrpe!check_disk_status
    }