Fire 280R hanging

From: Eric Voisard (eric.voisard@atisuher.ch)
Date: Tue Jan 15 2008 - 07:08:22 EST


Hi there,

One of the Sun Fire 280Rs we have at a customer's site has started having
problems. As it apparently wasn't online anymore, we connected to the
serial console (it's headless) and found it sitting at the OK prompt.
Trying to boot from there produced the following:

===
{0} ok
{0} ok boot
Resetting ...
Corrected ECC Error
{0} ok boot
FATAL: OpenBoot initialization sequence prematurely terminated.

FATAL: system is not bootable, boot command is disabled
z~} ok |
@(#)OBP 4.10.11 2003/09/25 11:53 Sun Fire 280R
Power-On Reset
 [then the system booted up to unix...]
===

It then kept running for a couple of hours before freezing up (no OK
prompt available).

Nothing was logged in the system's messages. The computer simply stopped
working without notice. Rebooting in diag mode with diag-level set to
max reported no problems (given the previous ECC message, I was
expecting something related to the memory).

We can't keep the system up and running: it now always fails, after
having worked properly under the same conditions (as a warm standby
server, hence mostly idling) for a couple of years.

I know the firmware is somewhat outdated and there is an OBP patch
available (118323-01: OBP 4.16.4) that fixes a "Corrected ECC Error
during boot" (problem #5018979). However, I have no details, and I'm not
sure it would fix the actual problem rather than just some of the
symptoms. And again: this computer was working...

As I have limited on-site access to this system, I'm trying to gather as
many suggestions as possible before going back there.

I suspect a hardware problem, probably related to the main memory or
the cache. Any opinions?

Many thanks, Eric
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


