Problem with RAID array

From: Stan Horwitz (stan@temple.edu)
Date: Mon Oct 15 2007 - 21:29:43 EDT


I have a Sun Fire V480 with two Sun A1000 disk arrays attached to it.
This hardware runs Solaris 9 and EMC NetWorker 7.4. Both arrays are
on the same SCSI chain. This past Saturday morning, the array that
houses the /nsr files failed. Saturday afternoon, the RAID controller
for the array that has /nsr on it was replaced, but I was still
unable to boot the server fully. It went into single-user mode and
said to run fsck, so I entered the root password and ran
"fsck -Y /nsr". This fsck process ran for approx. 24 hours, reporting
"unknown file type" for millions of inodes, interspersed with a few
errors that said "partially allocated inode". Last night, when I found
that fsck had finished, I rebooted the server and both RAID arrays
mounted fine as /nsr and /nsr2, which is what's supposed to happen,
but when I
tried to start NetWorker, a SCSI error appeared on the console and
NetWorker crashed.

Today, we replaced a disk drive in the /nsr array that had a yellow
light on it after the RAID controller was replaced (but not before).
We also replaced the SCSI card in the V480.

I was able to boot to single user mode after replacing the SCSI card
at around 1:00 this afternoon and another "fsck -Y /nsr" completed in
maybe ten minutes. Rebooting the server worked fine at that point and
there were no SCSI errors. I had to recover the NetWorker
configuration files from the Friday morning bootstrap tape and
restart NetWorker because when I tried restarting it just after
rebooting it, some of the resource files it needed were not available
or were corrupt. After I recovered the data from tape and restarted
NetWorker, it worked great and several backups completed fine.

When I left the computer room this afternoon, neither RAID array had
a wrench light lit on it, nor did the V480.

Unfortunately, at around 6:30 tonight, the /var/adm/messages file
started to show RAID errors, and it's showing them every few seconds.
NetWorker is still working, but I can't call up the NSR console. The
/nsr/logs/messages file does not appear to be updated as frequently
as it should. I can tell from my tape library's console that some
tape drives are actively receiving data as I write this. If I do a
"cd /nsr" and then "ls -l", the response from ls takes several
seconds. There's also a core file from tonight in the root directory.

What could be causing these RAID array errors? How do I correct the
problem? This /nsr array is only 15% full, but the disk drive we
removed from it was a 32GB drive and it was replaced with an 18GB
drive, so I am wondering if that's what is causing this issue
tonight. Unfortunately, I know very little about how Solaris works
at this level.

Here's a sample of the errors:

Oct 15 21:01:18 bootz rdriver: [ID 486355 kern.notice] ID [RAIDarray.rdriver.4003] The Array driver is returning an Errored I/O, with errno 5, on gb029_002, Lun 0, sector 7677616
Oct 15 21:01:19 bootz rdriver: [ID 486355 kern.notice] ID [RAIDarray.rdriver.4003] The Array driver is returning an Errored I/O, with errno 5, on gb029_002, Lun 0, sector 7677072
Oct 15 21:01:40 bootz rdriver: [ID 486355 kern.notice] ID [RAIDarray.rdriver.4003] The Array driver is returning an Errored I/O, with errno 5, on gb029_002, Lun 0, sector 7677680
Oct 15 21:02:18 bootz rdriver: [ID 486355 kern.notice] ID [RAIDarray.rdriver.4003] The Array driver is returning an Errored I/O, with errno 5, on gb029_002, Lun 0, sector 2317440
Oct 15 21:02:20 bootz rdriver: [ID 486355 kern.notice] ID [RAIDarray.rdriver.4003] The Array driver is returning an Errored I/O, with errno 5, on gb029_002, Lun 0, sector 2317408
Oct 15 21:03:03 bootz rdriver: [ID 486355 kern.notice] ID [RAIDarray.rdriver.4003] The Array driver is returning an Errored I/O, with errno 5, on gb029_002, Lun 0, sector 7650592
Oct 15 21:03:31 bootz rdriver: [ID 486355 kern.notice] ID [RAIDarray.rdriver.4003] The Array driver is returning an Errored I/O, with errno 5, on gb029_002, Lun 0, sector 2317392
Oct 15 21:03:37 bootz rdriver: [ID 486355 kern.notice] ID [RAIDarray.rdriver.4003] The Array driver is returning an Errored I/O, with errno 5, on gb029_002, Lun 0, sector 7677072
Oct 15 21:04:41 bootz rdriver: [ID 486355 kern.notice] ID [RAIDarray.rdriver.4003] The Array driver is returning an Errored I/O, with errno 5, on gb029_002, Lun 0, sector 7677680
Oct 15 21:04:46 bootz rdriver: [ID 486355 kern.notice] ID [RAIDarray.rdriver.4003] The Array driver is returning an Errored I/O, with errno 5, on gb029_002, Lun 0, sector 2317408
#
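
One way to get a quick overview of entries like the ones above is to
tally the sectors they name and see whether the errors are spread
across the LUN or concentrated in a few regions. The following is a
minimal Python sketch, not something that was run on this server. The
function names (parse_errors, cluster_sectors) and the 100,000-sector
gap threshold are assumptions chosen purely for illustration; only the
errno, device, LUN, and sector values come from the sample itself.

```python
import errno
import re
from collections import Counter

# Tail of each sample entry above. Only the "with errno ..., sector N"
# part is needed, so how the mail client wrapped the lines does not matter.
SAMPLE = """\
with errno 5, on gb029_002, Lun 0, sector 7677616
with errno 5, on gb029_002, Lun 0, sector 7677072
with errno 5, on gb029_002, Lun 0, sector 7677680
with errno 5, on gb029_002, Lun 0, sector 2317440
with errno 5, on gb029_002, Lun 0, sector 2317408
with errno 5, on gb029_002, Lun 0, sector 7650592
with errno 5, on gb029_002, Lun 0, sector 2317392
with errno 5, on gb029_002, Lun 0, sector 7677072
with errno 5, on gb029_002, Lun 0, sector 7677680
with errno 5, on gb029_002, Lun 0, sector 2317408
"""

# Matches the fields that vary between entries.
ENTRY = re.compile(r"with errno (\d+), on (\S+), Lun (\d+), sector (\d+)")


def parse_errors(text):
    """Return (errno, device, lun, sector) tuples found in the text."""
    return [
        (int(err), dev, int(lun), int(sector))
        for err, dev, lun, sector in ENTRY.findall(text)
    ]


def cluster_sectors(sectors, max_gap=100_000):
    """Group distinct sectors whose neighbours are at most max_gap apart.

    max_gap is an arbitrary illustrative threshold, not an array property.
    """
    clusters = []
    for sector in sorted(set(sectors)):
        if clusters and sector - clusters[-1][-1] <= max_gap:
            clusters[-1].append(sector)
        else:
            clusters.append([sector])
    return clusters


if __name__ == "__main__":
    errors = parse_errors(SAMPLE)
    sectors = [e[3] for e in errors]
    for code in sorted({e[0] for e in errors}):
        # errno.errorcode maps a number to its symbolic name on this host.
        print("errno", code, "=", errno.errorcode.get(code, "unknown"))
    print(len(errors), "errors,", len(set(sectors)), "distinct sectors")
    for sector, count in Counter(sectors).most_common():
        print(f"  sector {sector}: {count}")
    for group in cluster_sectors(sectors):
        print("cluster:", group[0], "-", group[-1], f"({len(group)} sectors)")
```

To run this against the live log instead of the sample, the text of
/var/adm/messages could be passed to parse_errors() in place of SAMPLE.
A summary like this only describes where the errors fall; it does not
by itself identify the cause.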

--
Stan Horwitz
stan@temple.edu
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

