Alphaserver ES40 crash

From: Vimal Upreti (vimal@iitcoman.com)
Date: Sat Nov 22 2003 - 02:53:51 EST


Hi all,
One of our AlphaServer ES40 systems is crashing intermittently (about
once every day or two). I checked all the logs but couldn't pinpoint the
problem, though it looks like a CPU or motherboard fault. The relevant
part of the /var/adm/messages file follows:

Nov 12 11:16:22 ldcems01 vmunix: Environmental Monitoring Subsystem Configured.
Nov 12 11:16:35 ldcems01 vmunix: SuperLAT. Copyright 1994 Meridian Technology Corp. All rights reserved.
Nov 15 08:34:52 ldcems01 vmunix: Machine Check SYSTEM Fatal Abort
Nov 15 08:34:52 ldcems01 vmunix: Machine check code = 0x100000202
Nov 15 08:34:52 ldcems01 vmunix: Ibox Status = 0000000000000000
Nov 15 08:34:52 ldcems01 vmunix: Dcache Status= 0000000000000000
Nov 15 08:34:52 ldcems01 vmunix: Cbox Address= 0000000000000000
Nov 15 08:34:52 ldcems01 vmunix: Fill Syndrome 1= 0000000000000000
Nov 15 08:34:52 ldcems01 vmunix: Fill Syndrome 0= 0000000000000000
Nov 15 08:34:52 ldcems01 vmunix: Cbox Status = 0000000000000000
Nov 15 08:34:52 ldcems01 vmunix: EV6 captured status of Bcache mode = 0000000000000000
Nov 15 08:34:52 ldcems01 vmunix: EV6 Exception Address= fffffc00002e7f60
Nov 15 08:34:52 ldcems01 vmunix: EV6 Interrupt Enablement and Current Processor mode = 0000007ee0000000
Nov 15 08:34:52 ldcems01 vmunix: EV6 Interrupt Summary Register = 0000000200000000
Nov 15 08:34:52 ldcems01 vmunix: EV6 TBmiss or Fault status = 0000000000000000
Nov 15 08:34:52 ldcems01 vmunix: EV6 PAL Base Address= 0000000000018000
Nov 15 08:34:52 ldcems01 vmunix: EV6 Ibox control = fffffe001e304396
Nov 15 08:34:52 ldcems01 vmunix: EV6 Ibox Process_context= 0000000000000000
Nov 15 08:34:52 ldcems01 vmunix: O/S Summary flag= 0000000000000004
Nov 15 08:34:52 ldcems01 vmunix: Cchip Base Address (phys)= 00000f01a0000000
Nov 15 08:34:52 ldcems01 vmunix: Cchip Device Raw Interrupt Request = 4000000000000000
Nov 15 08:34:52 ldcems01 vmunix: DRIR Register Decode:
Nov 15 08:34:52 ldcems01 vmunix: Bit 62: Error from Pchip 0
Nov 15 08:34:53 ldcems01 vmunix: PCI Device Interrupt Mask= 0000000000000000
Nov 15 08:34:53 ldcems01 vmunix: Cchip Miscellaneous Register= 00000008000000e0
Nov 15 08:34:54 ldcems01 vmunix: Misc Register Decode:
Nov 15 08:34:54 ldcems01 vmunix: Bit 5: Interval Timer Intr Pending to CPU 1
Nov 15 08:34:54 ldcems01 vmunix: Bit 6: Interval Timer Intr Pending to CPU 2
Nov 15 08:34:54 ldcems01 vmunix: Bit 7: Interval Timer Intr Pending to CPU 3
Nov 15 08:34:54 ldcems01 vmunix: Bit 35: CChip Rev (Bit<35>)
Nov 15 08:34:54 ldcems01 vmunix: Cchip Revision: 08
Nov 15 08:34:54 ldcems01 vmunix: ID of CPU performing read: 00
Nov 15 08:34:54 ldcems01 vmunix: Pchip 0 Base Address (phys)= 00000f0180000000
Nov 15 08:34:54 ldcems01 vmunix: Pchip 0 Error Register= 553003c008200400
Nov 15 08:34:54 ldcems01 vmunix: Pchip Error Register Decode:
Nov 15 08:34:54 ldcems01 vmunix: Bit 10: Uncorrectable ECC Error
Nov 15 08:34:54 ldcems01 vmunix: System Address = 0000000003c00820
Nov 15 08:34:55 ldcems01 vmunix: Command: SGTE Read
Nov 15 08:34:55 ldcems01 vmunix: ECC Syndrome: 55
Nov 15 08:34:55 ldcems01 vmunix: Pchip 1 Base Address (phys)= 00000f0380000000
Nov 15 08:34:55 ldcems01 vmunix: Pchip 1 Error Register= 0000000000000000
Nov 15 08:34:55 ldcems01 vmunix: Pchip Error Register Decode:
Nov 15 08:34:55 ldcems01 vmunix: PCI Xaction Start Address= 0000000000000000
Nov 15 08:34:55 ldcems01 vmunix: PCI Command: Interrupt Acknowledge
Nov 15 08:34:55 ldcems01 vmunix: panic (cpu 0): System Uncorrectable Machine Check

Could you please look at this log and suggest a solution?

Thanks & regards.
Vimal
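
For anyone cross-checking the kernel's register decode in the log above, here is a minimal Python sketch that pulls the same fields out of the raw values. The register values are copied from the log. The field positions (syndrome in bits 63:56, command in 55:52, address in 50:16, error flags in 15:0, Cchip revision in 39:32) are assumptions chosen to match what the kernel printed, not taken from the chipset manual, and the helper names are made up for illustration.

```python
# Raw register values copied from the /var/adm/messages excerpt.
PCHIP0_ERROR = 0x553003C008200400
CCHIP_DRIR = 0x4000000000000000
CCHIP_MISC = 0x00000008000000E0


def bits(value, hi, lo):
    """Return the bit field value[hi:lo] (inclusive) as an integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def set_bits(value):
    """Return the positions of all bits that are set in a 64-bit value."""
    return [i for i in range(64) if (value >> i) & 1]


def decode_pchip_error(value):
    """Split a Pchip error register into fields.

    ASSUMPTION: the field boundaries below are inferred from the decode
    lines in the log (ECC Syndrome 55, System Address 3c00820, Bit 10).
    """
    return {
        "syndrome": bits(value, 63, 56),
        "command": bits(value, 55, 52),   # the log labels this "SGTE Read"
        "address": bits(value, 50, 16),
        "error_flags": set_bits(bits(value, 15, 0)),
    }


if __name__ == "__main__":
    fields = decode_pchip_error(PCHIP0_ERROR)
    print("Pchip 0 syndrome : %02x" % fields["syndrome"])
    print("Pchip 0 command  : %x" % fields["command"])
    print("Pchip 0 address  : %016x" % fields["address"])
    print("Pchip 0 flags    :", fields["error_flags"])
    print("DRIR bits set    :", set_bits(CCHIP_DRIR))
    print("MISC bits set    :", set_bits(CCHIP_MISC))
    print("Cchip revision   : %02x" % bits(CCHIP_MISC, 39, 32))
```

Under those assumptions the output agrees with the kernel's own decode: only bit 10 (Uncorrectable ECC Error) is set in the Pchip 0 flags, and bit 62 of the DRIR points at Pchip 0, while the Pchip 1 error register is all zeros.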
 


