SUMMARY Disk failure?

From: Richard Butler (rbutler@ibc.cnr.it)
Date: Mon Jun 18 2007 - 13:06:22 EDT


Hi,

I asked:
> Yesterday I had a disk fail on a SunFire 280R Solaris 8.
> The questions first:
> Can the experts confirm my opinion that this is a hardware problem -
> disk failure?
Thanks to all those who helped (Roger Kynaston, Grant Lowe, Michael
Grice, Brad Morrison and Abhijit Das). The consensus was that this was
indeed a hardware failure of the disk itself rather than the controller,
since a controller fault would also have affected the other drive. This
was confirmed by iostat, which showed multiple hard errors, and by
smartd, which was unable to register the disk
(Device: /dev/rdsk/c1t1d0s0, failed Test Unit Ready [err=-5]) but did
register the good one. smartd is part of the smartmontools package from
SourceForge; if I had had it installed, it might have given me advance
warning of the failure.
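The iostat check can be scripted so that it runs routinely on every server, not just after a failure. Below is a minimal Python sketch, assuming the per-device summary line printed by Solaris `iostat -En` has the form `c1t1d0 Soft Errors: 0 Hard Errors: N Transport Errors: 0`. The function names and the sample text are hypothetical illustrations, not output captured from this machine.

```python
import re
import subprocess

# Assumed shape of the per-device summary line from Solaris `iostat -En`:
#   c1t1d0  Soft Errors: 0 Hard Errors: 12 Transport Errors: 0
ERROR_LINE = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+Transport Errors:\s*(?P<transport>\d+)"
)


def disks_with_hard_errors(iostat_text):
    """Return {device: hard_error_count} for every device with hard errors > 0."""
    failing = {}
    for line in iostat_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match and int(match.group("hard")) > 0:
            failing[match.group("dev")] = int(match.group("hard"))
    return failing


def run_iostat():
    """Capture `iostat -En` output (only works on a host that has iostat)."""
    return subprocess.run(["iostat", "-En"], capture_output=True, text=True).stdout


if __name__ == "__main__":
    # Hypothetical sample text, NOT captured from the SunFire 280R in this thread.
    sample = """\
c1t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST373405FSUN72G
c1t1d0           Soft Errors: 0 Hard Errors: 12 Transport Errors: 3
Vendor: SEAGATE  Product: ST373405FSUN72G
"""
    print(disks_with_hard_errors(sample))  # {'c1t1d0': 12}
```

Running `disks_with_hard_errors(run_iostat())` from cron and mailing any non-empty result would give a crude early warning of the kind smartd provides.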

I also asked:
> Have you any suggestions for recovering data from this drive (I do
> have backups, but I would still lose some important data)?
Suggestions were that I might be able to revive it temporarily by
slapping it and/or putting it in the fridge for a couple of hours. Short
of that, the only options are to give up or to pay for an expensive data
recovery service. I can confirm that both tricks have worked for me with
PC disks in the past. Warning: don't try the fridge trick in a humid
atmosphere, or condensation can cause even worse problems (if that is
possible!).

I tried both methods, but with no luck. I have ordered a new drive and
will recover what I can from backups. In addition, I will probably
install smartd on this and other servers.

Thanks again
Richard Butler

Details of the original question:
>
> Symptoms:
> This machine has two 72G disks (not mirrored), and on rebooting after
> installing the latest recommended patches I got the warning:
> ...
> Jun 14 12:06:30 ed pcisch: [ID 370704 kern.info] PCI-device: SUNW,qlc@4, qlc0
> Jun 14 12:06:30 ed genunix: [ID 936769 kern.info] qlc0 is /pci@8,600000/SUNW,qlc@4
> Jun 14 12:06:30 ed genunix: [ID 936769 kern.info] fp0 is /pci@8,600000/SUNW,qlc@4/fp@0,0
> Jun 14 12:06:31 ed genunix: [ID 405830 kern.warning] WARNING: Device ssd0 failed to power up.
> Jun 14 12:06:32 ed genunix: [ID 749148 kern.warning] WARNING: Please see your system administrator or reboot.
> Jun 14 12:06:32 ed scsi: [ID 799468 kern.info] ssd0 at fp0: name w21000004cf8e7591,0, bus address e8
> Jun 14 12:06:32 ed genunix: [ID 936769 kern.info] ssd0 is /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7591,0
> Jun 14 12:06:32 ed scsi: [ID 365881 kern.info] Vendor 'SEAGATE', product 'ST373405FSUN72G', (unknown capacity)
> Jun 14 12:06:32 ed genunix: [ID 408114 kern.info] /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7591,0 (ssd0) online
> Jun 14 12:06:32 ed scsi: [ID 799468 kern.info] ssd1 at fp0: name w21000004cf8e7555,0, bus address ef
> Jun 14 12:06:32 ed genunix: [ID 936769 kern.info] ssd1 is /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7555,0
> Jun 14 12:06:32 ed scsi: [ID 365881 kern.info] <SUN72G cyl 14087 alt 2 hd 24 sec 424>
> Jun 14 12:06:32 ed genunix: [ID 408114 kern.info] /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7555,0 (ssd1) online
> Jun 14 12:06:32 ed swapgeneric: [ID 308332 kern.info] root on /pci@8,600000/SUNW,qlc@4/fp@0,0/disk@w21000004cf8e7555,0:a fstype ufs
> ...
> And of course all the filesystems on this disk failed to fsck or
> mount.
>
> Using format I can see the bad disk as c1t1d0 (although the
> "Searching for disks..." step seems to take longer than normal):
> 0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
> /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7555,0
> 1. c1t1d0 <drive type unknown>
> /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7591,0
>
> If I go ahead and give the correct geometry I can then see the
> partition table as I had it before the crash.
> Part         Tag  Flag    Cylinders        Size    Blocks
>    0        root   wm       0 -   412    2.00GB    (413/0/0)      4202688
>    1        swap   wu     413 -  1237    4.00GB    (825/0/0)      8395200
>    2      backup   wm       0 - 14086   68.35GB    (14087/0/0)  143349312
>    3  unassigned   wm    1238 -  1240   14.91MB    (3/0/0)          30528
>    4         var   wm    1241 -  2065    4.00GB    (825/0/0)      8395200
>    5  unassigned   wm    2066 -  6187   20.00GB    (4122/0/0)    41945472
>    6         usr   wm    6188 -  8248   10.00GB    (2061/0/0)    20972736
>    7        home   wm    8249 - 14086   28.33GB    (5838/0/0)    59407488
>
> However, I cannot mount any partition:
> mount /dev/dsk/c1t1d0s5 /mnt
> mount: I/O error
> mount: cannot mount /dev/dsk/c1t1d0s5
> _______________________________________________
> sunmanagers mailing list
> sunmanagers@sunmanagers.org
> http://www.sunmanagers.org/mailman/listinfo/sunmanagers
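The kernel log and the format listing in the original question are tied together by the physical device path. The kernel warns that ssd0 failed to power up, the log says ssd0 is .../ssd@w21000004cf8e7591,0, and format lists that same path as c1t1d0, so ssd0 and c1t1d0 are the same failing disk. Below is a minimal Python sketch of that cross-reference. The function names are hypothetical, and the regular expressions assume exactly the line shapes quoted in this thread, with the mail wrapping undone so each record sits on one line.

```python
import re

# Lines copied from the thread, with mail-client wrapping undone.
LOG = """\
Jun 14 12:06:31 ed genunix: [ID 405830 kern.warning] WARNING: Device ssd0 failed to power up.
Jun 14 12:06:32 ed genunix: [ID 936769 kern.info] ssd0 is /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7591,0
Jun 14 12:06:32 ed genunix: [ID 936769 kern.info] ssd1 is /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7555,0
"""

FORMAT_LISTING = """\
0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7555,0
1. c1t1d0 <drive type unknown>
/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf8e7591,0
"""


def instance_paths(log_text):
    """Map driver instance names (ssd0, ssd1, ...) to physical device paths."""
    return dict(re.findall(r"\b(ssd\d+) is (/\S+)", log_text))


def failed_instances(log_text):
    """Instance names the kernel reported as failing to power up."""
    return re.findall(r"Device (ssd\d+) failed to power up", log_text)


def format_paths(listing_text):
    """Map physical device paths to cXtYdZ names from a format disk list."""
    names = re.findall(r"^\s*\d+\.\s+(c\d+t\d+d\d+)\s", listing_text, re.M)
    paths = re.findall(r"^\s*(/pci\S+)", listing_text, re.M)
    return dict(zip(paths, names))


def failed_disk_names(log_text, listing_text):
    """Translate each failed ssdN instance into its cXtYdZ name via the shared path."""
    by_instance = instance_paths(log_text)
    by_path = format_paths(listing_text)
    return [by_path[by_instance[inst]] for inst in failed_instances(log_text)]


print(failed_disk_names(LOG, FORMAT_LISTING))  # ['c1t1d0']
```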
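The recovered partition table is consistent with the geometry format reports for the good disk (<SUN72G cyl 14087 alt 2 hd 24 sec 424>). Each cylinder holds 24 × 424 = 10176 blocks, so every slice's block count is its cylinder count times 10176. Assuming 512-byte sectors, slice 0's 4202688 blocks come to about 2.00 GB (binary gigabytes). Below is a minimal Python check of that arithmetic, built from the table values in the original question; the 512-byte sector size is an assumption.

```python
# Geometry reported by format for the good SUN72G disk: hd 24, sec 424.
HEADS = 24
SECTORS_PER_TRACK = 424
SECTOR_BYTES = 512  # assumption: 512-byte sectors

BLOCKS_PER_CYLINDER = HEADS * SECTORS_PER_TRACK  # 10176

# (slice, tag, first cylinder, last cylinder, blocks shown by format)
SLICES = [
    (0, "root", 0, 412, 4202688),
    (1, "swap", 413, 1237, 8395200),
    (2, "backup", 0, 14086, 143349312),
    (3, "unassigned", 1238, 1240, 30528),
    (4, "var", 1241, 2065, 8395200),
    (5, "unassigned", 2066, 6187, 41945472),
    (6, "usr", 6188, 8248, 20972736),
    (7, "home", 8249, 14086, 59407488),
]


def slice_blocks(first_cyl, last_cyl):
    """Blocks in a slice spanning cylinders first_cyl..last_cyl inclusive."""
    return (last_cyl - first_cyl + 1) * BLOCKS_PER_CYLINDER


def size_gib(blocks):
    """Convert a block count to binary gigabytes (2**30 bytes)."""
    return blocks * SECTOR_BYTES / 2**30


for num, tag, first, last, shown in SLICES:
    computed = slice_blocks(first, last)
    status = "ok" if computed == shown else "MISMATCH"
    print(f"{num} {tag:<10} {computed:>10} blocks {size_gib(computed):7.2f} GiB {status}")
```

Run as-is, every slice reports ok. This fits the observation that entering the correct geometry brought back the original table: the label on the failed disk was intact, and the mount failures came from I/O errors rather than a damaged partition table.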



This archive was generated by hypermail 2.1.7 : Wed Apr 09 2008 - 23:42:04 EDT