SUMMARY: E450 rebooting

From: Connolly, Michael (Michael.Connolly@itt.com)
Date: Thu Mar 20 2003 - 08:27:33 EST


A number of you suggested that possibly something "changed" with the move of
the server and the rack mounting (e.g. memory/CPU came unseated,
borderline temperature issue in the rack, etc.). Also, some recommended
upgrading the OBP and kernel patch (always good advice, but I'm prone to "if
it ain't broke, don't fix it", which is why my patch level is so old). But
the general consensus was that this will be difficult to debug, as there are
no errors being logged or crash dumps generated, so it would be good to
attach a serial console. So my plan (when I get a window of time to shut it
down):

Power off box and re-seat everything
Upgrade OBP to OBP_3.26.0 (from patch 106122-10)
Patch the system (Solaris 8 Recommended Patch Cluster) to the latest,
March/18/03 - a little bothersome as it is SO new but we'll see...
See what happens...
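
Before and after patching, it can help to confirm which kernel patch revision
is actually installed. What follows is only a minimal sketch in Python, under
several assumptions: Python is not part of Solaris 8, so it would have to be
installed separately or run on another host against saved output; the input is
assumed to look like `showrev -p` output, with lines of the form
"Patch: 108528-04 Obsoletes: ... Packages: ..."; the function names and the
sample text are hypothetical; and the target revision passed to needs_update
is a placeholder, not the revision shipped in the March cluster.

```python
import re

# Sun-style patch ID such as "108528-04": six-digit base, two-digit revision.
PATCH_ID = re.compile(r"\b(\d{6})-(\d{2})\b")


def installed_revisions(showrev_text):
    """Return {patch base: highest installed revision} from showrev-style text."""
    revisions = {}
    for line in showrev_text.splitlines():
        # Only the ID right after "Patch:" counts; IDs in the Obsoletes or
        # Requires lists on the same line are ignored.
        if not line.startswith("Patch:"):
            continue
        match = PATCH_ID.search(line)
        if match is None:
            continue
        base, rev = match.group(1), int(match.group(2))
        revisions[base] = max(rev, revisions.get(base, -1))
    return revisions


def needs_update(revisions, base, target_rev):
    """True if the patch is missing or installed at a revision below target_rev."""
    return revisions.get(base, -1) < target_rev


# Hypothetical sample, modelled on the kernel patch level mentioned in this post.
SAMPLE = (
    "Patch: 108528-04 Obsoletes: 108874-01 Requires:  Incompatibles:  "
    "Packages: SUNWcsu, SUNWcsr\n"
    "Patch: 108528-02 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWhea\n"
)

if __name__ == "__main__":
    found = installed_revisions(SAMPLE)
    # 5 is a placeholder target revision, used only for illustration.
    print(found, needs_update(found, "108528", 5))
```

Saving the real `showrev -p` output to a file before and after the patch
cluster and feeding both through the same function would give a simple
before/after record of what changed.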

I also have a fan for the roof of the rack that I will install - this was on
the advice of a Sun reseller prior to this problem - maybe he's clairvoyant
(no, he didn't sell me the fan).

Thanks to:

Sean Berry
Karl Rossing
Robert Wood
Willie Flint
Pascal Grostabussiat
Ann Kurokawa
Octave Orgeron
Laurence Moughan
Joe Fletcher

Regards,

Michael

Original post:

I have an E450 w/20 9GB drives, 2GB RAM, Disksuite 4.2.1 (RAID 0+1) and
Solaris 2.8 108528-04 (old, I know), OBP 3.16.2 2000/01/11 15:42 POST
6.0.9 2000/01/11 15:43. Primary app is Oracle 8.0.6. Well, this box has been
rock solid for 2 years. Now, on March 15 it came to a halt - just stopped.
No apparent crash or reboot; it just hung. I turned the key switch from Locked
to Power On and it automatically re-booted. Checked dmesg and prtdiag and
they show nothing out of the ordinary. Now, ten minutes ago the system
automatically rebooted (I left the key in the Power On position on March
15). Again, checked dmesg and prtdiag and they show nothing out of the
ordinary.
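
With nothing logged at the time of a hang, one cheap way to at least pin down
when the machine stopped is a heartbeat file. The following is a rough sketch
only, not something that was run on this box: it assumes a Python interpreter
is available (not part of Solaris 8), that append_heartbeat is called once a
minute from cron, and that the file path is chosen by the admin. All function
names here are hypothetical.

```python
import time


def append_heartbeat(path, now=None):
    """Append the current Unix time to the heartbeat file (intended for cron)."""
    stamp = int(time.time() if now is None else now)
    with open(path, "a") as handle:
        handle.write(f"{stamp}\n")


def read_heartbeats(path):
    """Read all timestamps back, skipping blank lines."""
    with open(path) as handle:
        return [int(line) for line in handle if line.strip()]


def find_gaps(stamps, interval=60, tolerance=2.0):
    """Return (last_seen, next_seen) pairs where the gap between two
    consecutive heartbeats is larger than interval * tolerance seconds."""
    gaps = []
    for previous, current in zip(stamps, stamps[1:]):
        if current - previous > interval * tolerance:
            gaps.append((previous, current))
    return gaps
```

After an unexplained hang or reboot, the first value of each pair returned by
find_gaps is the last moment the system was known to be alive, which narrows
the window to compare against temperature, cron jobs or Oracle activity.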

All that has been done in the past 2 years:

Added 1GB RAM in December to bring it up to 2GB
Moved machine from 1 building to another in Dec. '02
Changed IP using sys-unconfig in Dec. '02 and February '03
Rack mounted machine in Feb. '03

I should probably download/install the latest patch cluster dated Mar/18/03
- as this is a new one, has anyone loaded it yet? Any problems? (I've never
updated the patches, as in "if it ain't broke, don't fix it".)

As this machine has "behaved" for so long, I'm at a loss as to where to begin
to diagnose, but I am very concerned as this machine hosts Oracle for users
around the world. Any ideas would be helpful. TIA
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


