Boot Disk/IO Board Failure

From: David Price (dprice@plugnpay.com)
Date: Wed Jun 25 2003 - 19:26:28 EDT


E3500, Solaris 2.6

Over the last 2 months we have had repeated problems with the boot disk on
this machine.

It slowly freezes up, progressively refusing commands like 'ls', and then
refuses to shut down.

In particular, the last 2 times this happened, if you tried to do an 'ls' of
/var/spool/mqueue it would hang, never return the prompt, and you could
not escape out. Doing an 'ls' of /var/spool was OK. Weird.

At first we thought it was a bad hard drive, as the root drive is usually no
good when we reboot. Then we thought it was the GBIC and/or fiber cable.

Bit by bit, all have been replaced. No errors were ever reported during
hardware testing at powerup. The "failed components" worked OK when
installed on another server.

The last time this happened, last week, the system could not see the fcal
bus after powering up. After powering down and reseating the I/O board the
system once again recognized the fcal bus. We booted off the mirror,
resynced the main drive and all was well.

It happened again this morning. As has been the case, the only way to shut
down is to cut power.

This time when the system powered up it reported a RAM error on the ASIC
(SOC) and failed the board.

I killed power and tried again. Same error.

I replaced the I/O board and rebooted, and the system came all the way up
with no errors. The boot drive was even OK.

Question.

Has anybody had a similar experience? Can failing RAM on an ASIC (SOC) cause
these types of intermittent errors, or am I causing these false errors by
the impolite way I am shutting down?

Any thoughts appreciated. I am trying to determine if it is safe to sleep
at night again.

Thanks

Dave
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


