SUMMARY: storage array error notifications

From: Dave Martini 1 (martini@raider.llnl.gov)
Date: Wed Jun 22 2005 - 19:36:52 EDT


Hi, thanks to Tom Jones for sending me the script he uses to monitor
his 3510s. This is exactly what I needed.

Here is Tom's response.

I monitor my 3510s as part of my system monitoring scripts.

Here's a portion from my script, using sccli commands:
        /usr/sbin/sccli $array show disks >$workfile
        cat $workfile >>$logfile
        num_failures=`egrep -i "FAIL|OFFLINE|INIT|REBUILD|BAD|ABSENT|MISS" $workfile | wc -l`
        if [ $num_failures -ne 0 ]; then
                a3510dk_notify
        fi
        num_hotspares=`grep STAND-BY $workfile|wc -l`
        if [ $num_hotspares -ne $installed_spares ]; then
                a3510hs_notify
        fi

I send the sccli output of show disks to a temp file ($workfile) then I cat that
temp file to the logfile.
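
The same redirection should work for other sccli subcommands, because the
shell handles ">" when sccli is run as a one-shot command (as above) rather
than typed at the interactive sccli> prompt. A minimal sketch, assuming sccli
accepts other subcommands in the same "sccli <array> <subcommand>" form used
in the script; save_sccli_output and the SCCLI variable are hypothetical
names, not part of sccli:

```shell
# Path to the CLI; overridable so the function can be tried without an array.
SCCLI=${SCCLI:-/usr/sbin/sccli}

# Hypothetical helper: run one sccli subcommand and save its output to a file.
#   $1 = array device (as passed to sccli in the script above)
#   $2 = output file
#   remaining arguments = the sccli subcommand, e.g. show enclosure-status
save_sccli_output() {
    _dev=$1
    _out=$2
    shift 2
    # The shell, not sccli, performs this redirection.
    "$SCCLI" "$_dev" "$@" > "$_out"
}
```

For example, save_sccli_output "$array" "$workfile" show enclosure-status
would leave the enclosure report in $workfile for grep to search.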

I then check for failures via the egrep against $workfile. If found, it calls
a function that sends me an email and pages me.
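
The notification functions themselves are not shown in the excerpt. A rough
sketch of what one might look like, assuming mailx is available and that the
pager can be reached through an email address; notify_disk_problem, MAILER,
and notify_addr are hypothetical names, and $array and $workfile are the
variables from the script above. Where grep lacks -E, egrep (as in the
script) does the same job:

```shell
# Hypothetical settings: mail command and recipient (could be a pager gateway).
MAILER=${MAILER:-mailx}
notify_addr=${notify_addr:-root}

# Hypothetical stand-in for a function such as a3510dk_notify:
# mail only the suspicious lines from the saved "show disks" output.
notify_disk_problem() {
    {
        echo "Disk problem reported on array: $array"
        echo
        # Same status keywords as the check in the script above.
        grep -E -i "FAIL|OFFLINE|INIT|REBUILD|BAD|ABSENT|MISS" "$workfile"
    } | $MAILER -s "Disk alert: $array" "$notify_addr"
}
```

Sending only the matching lines keeps the message short enough to read on a
pager; the full output is still in the logfile.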

I also check the number of global spares. If there are fewer than 2, I get an
email/page as well.
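
Note that the excerpt compares with -ne, so it alerts on any spare count
other than $installed_spares. A minimal sketch of a strict "fewer than"
check, assuming spares appear as STAND-BY lines in the saved show disks
output; count_hot_spares and spares_below are hypothetical names:

```shell
# Hypothetical helper: count drives listed as STAND-BY in a saved
# "show disks" output file ($1).
count_hot_spares() {
    grep -c "STAND-BY" "$1"
}

# Hypothetical check: succeeds (returns 0) only when the file in $1 lists
# fewer spares than the minimum given in $2.
spares_below() {
    [ "$(count_hot_spares "$1")" -lt "$2" ]
}
```

It could be used as: if spares_below "$workfile" 2; then a3510hs_notify; fi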

Using the above coding, I've successfully detected at least 4 or 5 drive
failures over the past year (we have 8 3510s in production).

Good luck!
Tom Jones

My original question

I have a 3310 storage array with a single RAID controller in it.
I want to be notified when a disk failure occurs.
I didn't see anything in the manual about the system emailing
when there is a disk problem on the 3310.
Does anyone know if there is an email notification built into
the 3310?

If not, I was thinking of writing a script that would go into the
sccli utility, run show enclosure-status, and output this to a file,
although I can't figure out how to output it to a file. I tried
this:

sccli> show enclosure-status >/opt/filemane

but it didn't create the file; it just printed the output to the screen.
Is it possible to output it to a file?

I see that under the Status column it says the word OK.
What will this change to if the status is not OK? Would it say
NOT OK?
I want to know what word to search for.

Any suggestions on how best to do this would be great.
The CLI interface has many options. Another is

sccli> show disks
Ch  Id  Size     Speed  LD      Status    IDs
-----------------------------------------------------------------------
 0   0  68.37GB  160MB  ld0     ONLINE    FUJITSU MAP3735N SUN72G 0401
                                              S/N 00Q0C8HJ
 0   1  68.37GB  160MB  ld0     ONLINE    FUJITSU MAP3735N SUN72G 0401
                                              S/N 00Q0C8NJ
 0   2  68.37GB  160MB  ld0     ONLINE    FUJITSU MAP3735N SUN72G 0401
                                              S/N 00Q0C8J3
 0   3  68.37GB  160MB  ld0     ONLINE    FUJITSU MAP3735N SUN72G 0401
                                              S/N 00Q0C8P5
 0   4  68.37GB  160MB  ld0     ONLINE    FUJITSU MAP3735N SUN72G 0401
                                              S/N 00Q0C8JJ
 0   5  68.37GB  160MB  ld0     ONLINE    FUJITSU MAP3735N SUN72G 0401
                                              S/N 00Q0C8JD
 0   8  68.37GB  160MB  ld0     ONLINE    FUJITSU MAP3735N SUN72G 0401
                                              S/N 00Q0C9C0
 0   9  68.37GB  160MB  ld0     ONLINE    FUJITSU MAP3735N SUN72G 0401
                                              S/N 00Q0C8P9
 0  10  68.37GB  160MB  ld1     ONLINE    FUJITSU MAP3735N SUN72G 0401
                                              S/N 00Q0C8HV
 0  11  68.37GB  160MB  ld1     ONLINE    FUJITSU MAP3735N SUN72G 0401
                                              S/N 00Q0C8NC
 0  12  68.37GB  160MB  GLOBAL  STAND-BY  FUJITSU MAP3735N SUN72G 0401
                                              S/N 00Q0C8NV
 0  13  68.37GB  160MB  GLOBAL  STAND-BY  FUJITSU MAP3735N SUN72G 0401
                                              S/N 00Q0C8JB

Thank You
Dave Martini
LLNL
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


