CPU Panic

From: J Bacher (jb@jbacher.com)
Date: Fri Aug 10 2007 - 11:25:13 EDT


Tru64 v5.1B
Dual CPU
2G memory

PROBLEM: cpu 0 panic system uncorrectable machine check

ON BOOT: error identifies problem in Bank 3 DIMM 0 Byte 4

NOTES:
  * We have swapped to a different chassis/motherboard/etc., keeping only the
same CPUs and memory
  * CPU0 swapped with spare
  * CPU0's memory swapped with spare
  * We then moved the original CPU0 into the CPU1 slot (still a cpu 0 panic)
  * We reduced the memory to 1G (so that the above DIMM/bank was empty)
  * We have swapped the memory between the two banks, expecting the error to
move; it did not.

We have replaced everything and tried moving the CPUs and memory (cpu0<->cpu1)
to see whether the problem followed them. The system still crashes, and on
every boot we see the same memory error, even after the memory was swapped out
and even when that bank was empty.

MESSAGES:
Aug 10 08:19:18 ns vmunix: Machine check code = 0x100000202
Aug 10 08:19:18 ns vmunix: Ibox Status = 0000000000000000
Aug 10 08:19:18 ns vmunix: Dcache Status = 0000000000000000
Aug 10 08:19:18 ns vmunix: Cbox Address = 0000000000000000
Aug 10 08:19:18 ns vmunix: Fill Syndrome 1 = 0000000000000000
Aug 10 08:19:18 ns vmunix: Fill Syndrome 0 = 0000000000000000
Aug 10 08:19:18 ns vmunix: Cbox Status = 0000000000000000
Aug 10 08:19:18 ns vmunix: EV6 captured status of Bcache mode = 0000000000000000
Aug 10 08:19:19 ns vmunix: EV6 Exception Address = 000003ff81666b10
Aug 10 08:19:19 ns vmunix: EV6 Interrupt Enablement and Current Processor mode = 0000003ee0000008
Aug 10 08:19:19 ns vmunix: EV6 Interrupt Summary Register = 0000000200000000
Aug 10 08:19:19 ns vmunix: EV6 TBmiss or Fault status = 0000000000000000
Aug 10 08:19:19 ns vmunix: EV6 PAL Base Address = 0000000000018000
Aug 10 08:19:19 ns vmunix: EV6 Ibox control = fffffe001e304396
Aug 10 08:19:19 ns vmunix: EV6 Ibox Process_context = 00000a0000000000
Aug 10 08:19:19 ns vmunix: O/S Summary flag = 0000000000000005
Aug 10 08:19:19 ns vmunix: Cchip Base Address (phys) = 00000f01a0000000
Aug 10 08:19:19 ns vmunix: Cchip Device Raw Interrupt Request = 4000000000000000
Aug 10 08:19:19 ns vmunix: DRIR Register Decode:
Aug 10 08:19:19 ns vmunix: Bit 62: Error from Pchip 0
Aug 10 08:19:19 ns vmunix: PCI Device Interrupt Mask = 0000000000000000
Aug 10 08:19:19 ns vmunix: Cchip Miscellaneous Register = 0000000100000000
Aug 10 08:19:19 ns vmunix: Misc Register Decode:
Aug 10 08:19:19 ns vmunix: Bit 32: CChip Rev (Bit<32>)
Aug 10 08:19:19 ns vmunix: Cchip Revision: 01
Aug 10 08:19:19 ns vmunix: ID of CPU performing read: 00
Aug 10 08:19:19 ns vmunix: Pchip 0 Base Address (phys) = 00000f0180000000
Aug 10 08:19:19 ns vmunix: Pchip 0 Error Register = 5d0066a6bc780801
Aug 10 08:19:19 ns vmunix: Pchip Error Register Decode:
Aug 10 08:19:19 ns vmunix: Bit 0: Lost Error
Aug 10 08:19:19 ns vmunix: Bit 11: Correctable ECC Error
Aug 10 08:19:19 ns vmunix: System Address = 0000000066a6bc78
Aug 10 08:19:19 ns vmunix: Command: DMA Read
Aug 10 08:19:19 ns vmunix: ECC Syndrome: 5d
Aug 10 08:19:20 ns vmunix: Pchip 1 Base Address (phys) = 00000f0380000000
Aug 10 08:19:20 ns vmunix: Pchip 1 Error Register = 0000000000000000
Aug 10 08:19:20 ns vmunix: Pchip Error Register Decode:
Aug 10 08:19:20 ns vmunix: PCI Xaction Start Address = 0000000000000000
Aug 10 08:19:20 ns vmunix: PCI Command: Interrupt Acknowledge
Aug 10 08:19:20 ns vmunix: panic (cpu 0): System Uncorrectable Machine Check
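
For anyone cross-checking the decode lines against the raw register values in
the messages above, here is a rough Python sketch. The field positions for the
Pchip error register (flag bits in the low 16 bits, address field in bits
47:16, command in bits 55:52, ECC syndrome in bits 63:56) are an assumption
inferred from this log's own decode output, not taken from chipset
documentation, and the function names are made up for illustration.

```python
def set_bits(value, width=64):
    """Return the positions of all bits set in value, lowest first."""
    return [i for i in range(width) if (value >> i) & 1]


def decode_pchip_error(reg):
    """Split a Pchip error register value into fields.

    Field layout is ASSUMED from the kernel's decode lines in the log,
    not from a hardware manual.
    """
    return {
        "flag_bits": set_bits(reg & 0xFFFF, 16),     # e.g. bit 0, bit 11
        "address_field": (reg >> 16) & 0xFFFFFFFF,   # logged as "System Address"
        "command_field": (reg >> 52) & 0xF,          # logged as "Command"
        "syndrome": (reg >> 56) & 0xFF,              # logged as "ECC Syndrome"
    }


# Raw values copied from the messages above.
DRIR = 0x4000000000000000
CCHIP_MISC = 0x0000000100000000
PCHIP0_ERROR = 0x5D0066A6BC780801

fields = decode_pchip_error(PCHIP0_ERROR)

print("DRIR bits set:      ", set_bits(DRIR))
print("Cchip Misc bits set:", set_bits(CCHIP_MISC))
print("Pchip 0 flag bits:  ", fields["flag_bits"])
print("Pchip 0 address:     %016x" % fields["address_field"])
print("Pchip 0 command:     %x" % fields["command_field"])
print("Pchip 0 syndrome:    %02x" % fields["syndrome"])
```

Under that assumed layout the values line up with what the kernel printed:
DRIR bit 62, Misc bit 32, Pchip 0 flag bits 0 and 11, address field 66a6bc78
and syndrome 5d.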


