SUMMARY: A1000 disks not seen with lad utility

From: Rami Aubourg (rami.aubourg@ifrance.com)
Date: Tue Feb 04 2003 - 11:38:30 EST


Hello, gurus,

First of all, many thanks to all those who replied. Particularly:
Mike Salehi
Dominic Clarke
William Enestvedt

It was a rather tense moment for me, so I appreciated the community's
support all the more.

In fact, the problem was neither with the Ultra10 nor with any
configuration quirks.

All seven of my disks turned out to be physically dead (*Yes. It's
true*), which is why I couldn't see them. We rebuilt everything from a
remote backup, and now it's rolling along fine, apart from some cold
sweat.

What happened was rather incredible. The battery that the A1000 was
plugged into failed during the night, causing the A1000 to be switched
on and off every 5 seconds. That treatment apparently ended up killing
all of my disks. All seven of them.

The odd thing is that I had two SunFire V100s on the same battery, and
they suffered no damage. Neither did the two internal disks of the
Ultra10.

Apparently, that could be because the SunFire and the Ultra10 don't
start their disks right away, since there is some latency in the boot
process before the disks are started. The A1000, on the other hand,
just does what it is told, that is: start the disks, shut them down,
start them again, and so on.

So it's possible to completely wreck a disk array built for maximum
redundancy and data safety under certain conditions, in my case a bad
electrical supply. I believe I'm the only one who's had this kind of
experience. It's one of the very rare bad surprises I've had with Sun
hardware. Any feedback is welcome.

Rami Aubourg

Original post below.

***********************************************************
Hello, gurus,

I had an Ultra10 connected to an A1000 with three mirrored disks. We had
a serious power failure problem last night, and today I can't connect to
the A1000 anymore.
A probe-scsi-all sees the seven channels attached to the three mirrored
disks, plus the hot spare, and that's all. lad says there are no RAID
devices. rm6 saw no disks. The internal disk LEDs light up, but not the
ones on the A1000 corresponding to the disks. We changed the SCSI card,
the SCSI cable, and the controllers on the A1000, and tried putting the
disks in an entirely new A1000. Same effect. The last thing I can
imagine is the terminator, or a deeper problem with the motherboard.
The internal system disks on the Ultra10 are all right.

My colleague is with the people who sold us the array and the Ultra10
right now, with the hardware, trying to fix the problem with them. I'm
setting up another server with yesterday's backup in case everything
else fails. I'm also searching for some clues as to how we could access
the disks on the A1000.

Has anyone had this kind of problem with an A1000 before? Were there
other tools that you used to access the disks? And in case the disks are
all right, is there a way to revert to a normal filesystem and plug them
into another server, without using the A1000?

Thanks in advance,

Rami Aubourg
*****************************************************

-- 
Lost Knowledge Sets Back Civilization

