E3500 crash

From: R. Marc Baldus (rbaldus@e-one.com)
Date: Thu May 16 2002 - 07:54:38 EDT


Help!

ANY thoughts or ideas are welcome.

I have an E3500 that keeps crashing. I have replaced several parts to
no avail. Our support has been working on it; in fact, we have been on
it for *SEVERAL* months. It is a production Oracle 8.1.7 server, so
downtime must be scheduled, but it is doable.

The last time this happened, about a month ago, it called out cpu6 and
J3200 on board 3, so I replaced them. The time before that, the
messages looked much as they do now.
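
One way to compare what each crash calls out is to pull the CPU, board,
and DIMM slots from every [AFT1] warning in /var/adm/messages and tally
them. The following is only a minimal sketch in Python, not anything
supplied by support. It assumes the warnings have the same layout as the
excerpt at the end of this post; the names aft1_events and tally are
made up for illustration, and SAMPLE is copied from that excerpt.

```python
import re
from collections import Counter

# Three entries copied from the messages excerpt in this post.
SAMPLE = """\
May 15 20:48:15 ch1065 SUNW,UltraSPARC-II: [ID 758999 kern.warning]
WARNING: [AFT1] Uncorrectable Memory Error on CPU7 Data access at TL=0,
errID 0x0001b306.218661ea
May 15 20:48:15 ch1065 AFSR 0x00000000.80300000<PRIV,UE,CE> AFAR
0x00000000.3f918c90
May 15 20:48:15 ch1065 UDBH Syndrome 0xaf Memory Module Board 3
J3100 J3200 J3300 J3400 J3500 J3600 J3700 J3800
"""

# Assumed layout: "[AFT1] <description> on CPU<n>" starts an event, and a
# later "Memory Module Board <n> J.... J...." line names the suspects.
EVENT = re.compile(r"\[AFT1\] ([A-Za-z ]+?) on CPU(\d+)")
BOARD = re.compile(r"Memory Module Board (\d+)((?: J\d+)+)")


def aft1_events(text):
    """Return one dict per [AFT1] warning found in messages-log text."""
    flat = " ".join(text.split())  # undo any line wrapping
    matches = list(EVENT.finditer(flat))
    events = []
    for i, m in enumerate(matches):
        # An event runs until the next [AFT1] warning (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(flat)
        board = BOARD.search(flat[m.start():end])
        events.append({
            "error": m.group(1),
            "cpu": int(m.group(2)),
            "board": int(board.group(1)) if board else None,
            "dimms": board.group(2).split() if board else [],
        })
    return events


def tally(events):
    """Count events per (cpu, board) pair."""
    return Counter((e["cpu"], e["board"]) for e in events)


if __name__ == "__main__":
    for (cpu, board), n in tally(aft1_events(SAMPLE)).items():
        print(f"CPU{cpu} board {board}: {n} event(s)")
```

Run against the full messages files from each crash, a tally like this
would show whether the same CPU or board keeps being named.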

Below is some configuration information and an excerpt from the
messages log. Server room temperature is at 68 - 69 degrees.

Thanks in advance,
Marc B.

****************************************
****************************************

E3500
4 - 400MHz 8M CPU ( Made in Canada stamp )
2 - 1GB RAM Modules ( 1GB in bank 0 on board 3 and 1GB in bank 0 on board 5)

****************************************
****************************************

uname -a
SunOS ch1065 5.8 Generic_108528-12 sun4u sparc SUNW,Ultra-Enterprise

****************************************
****************************************

# /usr/plat*/sun4u/sbin/prtdiag -v
System Configuration: Sun Microsystems sun4u 5-slot Sun Enterprise E3500
System clock frequency: 100 MHz
Memory size: 2048Mb

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU  Module    MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 3     6     0      400    8.0    US-II   10.0
 3     7     1      400    8.0    US-II   10.0
 5    10     0      400    8.0    US-II   10.0
 5    11     1      400    8.0    US-II   10.0

========================= Memory =========================

                                              Intrlv.  Intrlv.
Brd  Bank    MB   Status   Condition   Speed  Factor   With
---  -----  ----  -------  ----------  -----  -------  -------
 3     0    1024  Active       OK       60ns   2-way      A
 5     0    1024  Active       OK       60ns   2-way      A

========================= IO Cards =========================

     Bus   Freq
Brd  Type  MHz   Slot  Name                              Model
---  ----  ----  ----  --------------------------------  ----------------------
 1   SBus   25     0   cgsix                             SUNW,501-2325
 1   SBus   25     1   network                           SUNW,sbus-gem
 1   SBus   25     3   SUNW,hme
 1   SBus   25     3   SUNW,fas/sd (block)
 1   SBus   25    13   SUNW,socal/sf (scsi-3)            501-3060
 9   SBus   25     1   QLGC,isp/sd (block)               QLGC,ISP1000
 9   SBus   25     3   SUNW,hme
 9   SBus   25     3   SUNW,fas/sd (block)
 9   SBus   25    13   SUNW,socal/sf (scsi-3)            501-3060

No failures found in System
===========================

No System Faults found
======================

Most recent AC Power Failure:
=============================
Tue Mar 12 13:08:19 2002

========================= Environmental Status =========================
Keyswitch position is in Normal Mode
System Power Status: Redundant
System LED Status:    GREEN     YELLOW     GREEN
Normal                 ON        OFF       BLINKING

Fans:
-----
Unit Status
---- ------
Disk OK

System Temperatures (Celsius):
------------------------------
Brd  State    Current  Min  Max  Trend
---  -------  -------  ---  ---  -----
 1     OK        39     38   41  stable
 3     OK        34     31   36  stable
 5     OK        36     33   37  stable
 9     OK        39     37   41  stable
CLK    OK        35     35   37  stable

Power Supplies:
---------------
Supply Status
--------- ------
1 OK
3 OK
5 OK
PPS OK
     System 3.3v OK
     System 5.0v OK
     Peripheral 5.0v OK
     Peripheral 12v OK
     Auxilary 5.0v OK
     Peripheral 5.0v precharge OK
     Peripheral 12v precharge OK
     System 3.3v precharge OK
     System 5.0v precharge OK
AC Power OK

========================= HW Revisions =========================

ASIC Revisions:
---------------
Brd  FHC  AC  SBus0  SBus1  PCI0  PCI1  FEPS  Board Type      Attributes
---  ---  --  -----  -----  ----  ----  ----  --------------  --------------
 1    1    5    1      1                 22   Dual-SBus-SOC+  100MHz Capable
 3    1    5                                  CPU             100MHz Capable
 5    1    5                                  CPU             100MHz Capable
 9    1    5    1      1                 22   Dual-SBus-SOC+  100MHz Capable

System Board PROM revisions:
----------------------------
Board 1: FCODE 1.8.29 2001/06/18 17:26 iPOST 3.4.29 2001/06/18 17:49
Board 3: OBP 3.2.29 2001/06/18 17:28 POST 3.9.29 2001/06/18 17:50
Board 5: OBP 3.2.29 2001/06/18 17:28 POST 3.9.29 2001/06/18 17:50
Board 9: FCODE 1.8.29 2001/06/18 17:26 iPOST 3.4.29 2001/06/18 17:49

****************************************
****************************************

/var/adm/messages

May 15 20:48:15 ch1065 SUNW,UltraSPARC-II: [ID 758999 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU7 Data access at TL=0, errID 0x0001b306.218661ea
May 15 20:48:15 ch1065 AFSR 0x00000000.80300000<PRIV,UE,CE> AFAR 0x00000000.3f918c90
May 15 20:48:15 ch1065 AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x101429f0
May 15 20:48:15 ch1065 UDBH 0x03af<UE,CE> UDBH.ESYND 0xaf UDBL 0x0189<CE> UDBL.ESYND 0x89
May 15 20:48:15 ch1065 UDBH Syndrome 0xaf Memory Module Board 3 J3100 J3200 J3300 J3400 J3500 J3600 J3700 J3800
May 15 20:48:15 ch1065 SUNW,UltraSPARC-II: [ID 496838 kern.info] [AFT2] errID 0x0001b306.218661ea PA=0x00000000.3f918c90
May 15 20:48:15 ch1065 E$tag 0x00000000.18c007f2 E$State: Exclusive E$parity 0x0c
May 15 20:48:15 ch1065 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x454c4506.0240c6c2
May 15 20:48:15 ch1065 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00130000.10534845
May 15 20:48:15 ch1065 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x4c4c2043.55532e53
May 15 20:48:15 ch1065 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x4d20454c.45060240
May 15 20:48:15 ch1065 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0xc6c20001.00001053
May 15 20:48:15 ch1065 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x48454c4c.20435553
May 15 20:48:15 ch1065 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x2e534d20.454c4506
May 15 20:48:15 ch1065 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x0240c6c1.00810000
May 15 20:48:15 ch1065 unix: [ID 836849 kern.notice]
May 15 20:48:15 ch1065 ^Mpanic[cpu7]/thread=2a1033bdd40:
May 15 20:48:15 ch1065 unix: [ID 385482 kern.notice] [AFT1] errID 0x0001b306.218661ea UE Error(s)
May 15 20:48:15 ch1065 See previous message(s) for details
May 15 20:48:15 ch1065 unix: [ID 100000 kern.notice]
May 15 20:48:15 ch1065 genunix: [ID 723222 kern.notice] 000002a1033bc870 SUNW,UltraSPARC-II:cpu_aflt_log+4e0 (2a1033bc92e, 1, 10147058, 2a1033bcab8, 2a1033bc97b, 10147080)
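
The register values in the log can be picked apart by hand. The sketch
below (Python, illustration only) does two things: it tests the AFSR
value against the three flag bits the log itself prints (PRIV, UE, CE),
and it dumps the eight E$Data words as text. The bit positions 31, 21,
and 20 are inferred from the value 0x80300000 together with the
<PRIV,UE,CE> annotation on the same line, not taken from a manual, and
reading the cache line as big-endian ASCII is an assumption. The
function names are made up for this example.

```python
# Bit positions inferred from AFSR 0x00000000.80300000<PRIV,UE,CE>:
# 0x80000000 is bit 31, 0x00200000 is bit 21, 0x00100000 is bit 20.
AFSR_BITS = {31: "PRIV", 21: "UE", 20: "CE"}

# The eight E$Data words from the [AFT2] lines, offsets 0x00 through 0x38.
ECACHE_WORDS = [
    "0x454c4506.0240c6c2",
    "0x00130000.10534845",
    "0x4c4c2043.55532e53",
    "0x4d20454c.45060240",
    "0xc6c20001.00001053",
    "0x48454c4c.20435553",
    "0x2e534d20.454c4506",
    "0x0240c6c1.00810000",
]


def parse_reg(s):
    """Turn '0xHHHHHHHH.LLLLLLLL' (high word . low word) into a 64-bit int."""
    hi, lo = s[2:].split(".")
    return (int(hi, 16) << 32) | int(lo, 16)


def afsr_flags(s):
    """List the known flag names whose bits are set, highest bit first."""
    value = parse_reg(s)
    return [name for bit, name in sorted(AFSR_BITS.items(), reverse=True)
            if (value >> bit) & 1]


def ascii_dump(words):
    """Render the words as text, with '.' for non-printable bytes."""
    raw = b"".join(parse_reg(w).to_bytes(8, "big") for w in words)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    print(afsr_flags("0x00000000.80300000"))
    print(ascii_dump(ECACHE_WORDS))
```

Read this way, the cache line contains readable fragments such as
"SHELL CUS.SM ELE", which suggests it held application data rather than
code when the error hit, though that reading is a guess.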
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


