E4000_Board_Failure

From: Mr Rene Occelli (rene@polytech.univ-mrs.fr)
Date: Mon Mar 15 2004 - 09:04:50 EST


Hi,
I have an E4000 with 3 boards (6 CPUs at 250 MHz, 1 GB of memory) running Solaris 8.
This machine runs mainly as a NIS and license server.
#uname -a
SunOS Hostname 5.8 Generic_108528-12 sun4u sparc SUNW,Ultra-Enterprise
#cat /etc/release
                        Solaris 8 6/00 s28s_u1wos_08 SPARC
           Copyright 2000 Sun Microsystems, Inc. All Rights Reserved.
                             Assembled 26 April 2000

One Saturday morning the machine rebooted and disabled a board.
There is nothing in /var/adm/messages except:

Date Hostname unix: [ID 796976 kern.notice] System booting after fatal error FATAL
...
Date Hostname fhc: [ID 744982 kern.notice] NOTICE: failed cpu board in slot 3
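
Lines like the two above can be filtered out of the log with a small helper. This is only a minimal sketch: the function name board_faults is hypothetical, and it assumes the log is in the default Solaris location /var/adm/messages unless another file is given.

```shell
# Hypothetical helper: print the fatal-reset and failed-board kernel notices
# from a syslog file. Defaults to /var/adm/messages; pass another file as $1.
board_faults() {
    logfile="${1:-/var/adm/messages}"
    # Keep only lines matching either of the two notices quoted above.
    awk '/System booting after fatal error/ || /failed .* board in slot/' "$logfile"
}
```

Calling board_faults with no argument reads the live log; rotated copies such as /var/adm/messages.0 can be passed explicitly.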

Output of prtdiag -v (partial; I can send the full output on request):

System Configuration: Sun Microsystems sun4u 8-slot Sun Enterprise 4000/5000
System clock frequency: 82 MHz
Memory size: 768Mb

................
No failures found in System
===========================

Detected System Faults
======================
PROM detected failure
        Detected Date

Most recent AC Power Failure:
=============================
xxxxxxxxxxxxxxxxxxxxxxx

========================= Environmental Status =========================
Keyswitch position is in Secure Mode
System Power Status: Redundant
System LED Status:    GREEN     YELLOW     GREEN
WARNING                ON         ON       BLINKING
...........
========================= HW Revisions =========================

ASIC Revisions:
---------------
Brd  FHC  AC  SBus0  SBus1  PCI0  PCI1  FEPS  Board Type  Attributes
---  ---  --  -----  -----  ----  ----  ----  ----------  ----------
 0     1   5                                  CPU         84MHz Capable
 1     1   5      1      1                22  Dual-SBus   84MHz Capable
 2     1   5                                  CPU         100MHz Capable

System Board PROM revisions:
----------------------------
Board 0: OBP 3.2.28 2000/12/20 12:24 POST 3.9.28 2000/12/20 12:29
Board 1: FCODE 1.8.28 2000/12/20 12:20 iPOST 3.4.28 2000/12/20 12:28
Board 2: OBP 3.2.28 2000/12/20 12:24 POST 3.9.28 2000/12/20 12:29

Analysis of most recent Fatal Hardware Watchdog:
======================================================
Log Date: Date

Analysis for Board 3
--------------------
AC: UPA Port A Dtag Parity Error
        The error could be caused by:
                Data Tags for UPA Port A
                Address Controller
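
To look at just this part of a long prtdiag report, the output can be piped through a small filter. A minimal sketch, assuming the section always starts with the heading shown above; the function name watchdog_report is hypothetical.

```shell
# Hypothetical helper: read prtdiag -v output on stdin and print everything
# from the "Fatal Hardware Watchdog" heading to the end of the report.
# Example use on Solaris:
#   /usr/platform/`uname -i`/sbin/prtdiag -v | watchdog_report
watchdog_report() {
    awk '/^Analysis of most recent Fatal Hardware Watchdog/ { show = 1 }
         show == 1 { print }'
}
```

The filter prints nothing if the report contains no watchdog analysis section.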

Question:
How can I find out what is wrong with the board (CPU, memory, bus, ...)?
During boot I can see many tests running (the LEDs are blinking), but no messages appear on the console.
Is this board really dead?
Many thanks
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Rene OCCELLI +
+ Polytech Marseille IUSTI Lab. C.N.R.S. U.M.R. 6595 +
+ Technopole de Chateau Gombert +
+ 5 Rue Enrico FERMI +
+ 13453 MARSEILLE Cedex 13 France +
+ Tel: (33)04 91 10 69 37 04 91 10 69 38 +
+ Fax: (33)04 91 10 69 69 +
+ Email: rene@polytech.univ-mrs.fr +
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


