DS10 CPU correctable error

From: Daniel Lungu (lungu@nagra.com)
Date: Mon Jul 01 2002 - 12:04:47 EDT


Hello everybody!

I have just run into what looks like a CPU problem on a DS10 that had worked
fine for months.

After a halt command, the SRM console did not come back:

# halt
....Halt completed....
syncing disks... done
CPU 0: Halting... (transferring to monitor)

CP - SAVE_TERM routine to be called
CP - SAVE_TERM exited with hlt_req = 1, r0 = 00000000.00000000

halted CPU 0

halt code = 5
HALT instruction executed
PC = ffffffff002263d0
Resetting I/O buses...
-----frozen-here-----

Then, after a power cycle, I saw the following messages:

2048 Meg of system memory
probing hose 0, PCI
probing PCI-to-ISA bridge, bus 1
probing PCI-to-PCI bridge, bus 2
bus 0, slot 9 -- ewa -- DE500-BA Network Controller
bus 0, slot 11 -- ewb -- DE500-BA Network Controller
bus 0, slot 13 -- dqa -- Acer Labs M1543C IDE
bus 0, slot 13 -- dqb -- Acer Labs M1543C IDE
bus 2, slot 4 -- pka -- NCR 53C895
bus 2, slot 5 -- eia -- DE600-AA
bus 2, slot 6 -- vga -- Permedia - P2V Graphics Controller
bus 0, slot 16 -- pkb -- NCR 53C895
initializing GCT/FRU at 3ff52000

Processor correctable error through vector 630.

Machine Check Logout Frame @ 0x6000 Code = 0x86

Alpha 21264 IPRs (CPU 0):
I_STAT: 0000000000000000 DC_STAT: 0000000000000008
C_ADDR: 0000000000048A40 DC1_SYNDROME: 0000000000000000
DC0_SYNDROME: 0000000000000094 C_STAT: 000000000000000B
C_STS: 000000000000000D MM_STAT: 0000000000000000

Processor correctable error through vector 630.

Machine Check Logout Frame @ 0x6000 Code = 0x86

Alpha 21264 IPRs (CPU 0):
I_STAT: 0000000000000000 DC_STAT: 0000000000000008
C_ADDR: 0000000000048E80 DC1_SYNDROME: 0000000000000000
DC0_SYNDROME: 0000000000000094 C_STAT: 000000000000000B
C_STS: 000000000000000D MM_STAT: 0000000000000000

Processor correctable error through vector 630.

Machine Check Logout Frame @ 0x6000 Code = 0x86

Alpha 21264 IPRs (CPU 0):
I_STAT: 0000000000000000 DC_STAT: 0000000000000008
C_ADDR: 0000000000076900 DC1_SYNDROME: 0000000000000000
DC0_SYNDROME: 0000000000000094 C_STAT: 000000000000000B
C_STS: 0000000000000008 MM_STAT: 0000000000000000
T
Processor correctable error through vector 630.

Machine Check Logout Frame @ 0x6000 Code = 0x86

Alpha 21264 IPRs (CPU 0):
I_STAT: 0000000000000000 DC_STAT: 0000000000000008
C_ADDR: 00000000000637C0 DC1_SYNDROME: 0000000000000000
DC0_SYNDROME: 0000000000000094 C_STAT: 000000000000000B
C_STS: 0000000000000008 MM_STAT: 0000000000000000
esting the System
Testing the Disks (read only)
Testing ei* devices.
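
One way to compare the four logout frames above is to parse the register dumps and list which fields change between reports. The following is only a sketch in Python: the register values are copied from the console output, the helper names (parse_frames, varying_fields) are made up for illustration, and it does not attempt to decode the meaning of any register bit.

```python
import re

# IPR lines copied verbatim from the four logout frames in the console output.
FRAMES = """\
I_STAT: 0000000000000000 DC_STAT: 0000000000000008
C_ADDR: 0000000000048A40 DC1_SYNDROME: 0000000000000000
DC0_SYNDROME: 0000000000000094 C_STAT: 000000000000000B
C_STS: 000000000000000D MM_STAT: 0000000000000000
I_STAT: 0000000000000000 DC_STAT: 0000000000000008
C_ADDR: 0000000000048E80 DC1_SYNDROME: 0000000000000000
DC0_SYNDROME: 0000000000000094 C_STAT: 000000000000000B
C_STS: 000000000000000D MM_STAT: 0000000000000000
I_STAT: 0000000000000000 DC_STAT: 0000000000000008
C_ADDR: 0000000000076900 DC1_SYNDROME: 0000000000000000
DC0_SYNDROME: 0000000000000094 C_STAT: 000000000000000B
C_STS: 0000000000000008 MM_STAT: 0000000000000000
I_STAT: 0000000000000000 DC_STAT: 0000000000000008
C_ADDR: 00000000000637C0 DC1_SYNDROME: 0000000000000000
DC0_SYNDROME: 0000000000000094 C_STAT: 000000000000000B
C_STS: 0000000000000008 MM_STAT: 0000000000000000
"""

# A register name followed by a 16-digit hexadecimal value.
PAIR = re.compile(r"([A-Z0-9_]+):\s+([0-9A-Fa-f]{16})")


def parse_frames(text):
    """Split the dump into one dict per frame; I_STAT starts a new frame."""
    frames = []
    current = {}
    for name, value in PAIR.findall(text):
        if name == "I_STAT" and current:
            frames.append(current)
            current = {}
        current[name] = int(value, 16)
    if current:
        frames.append(current)
    return frames


def varying_fields(frames):
    """Return the names of registers whose value differs between frames."""
    names = frames[0].keys()
    return sorted(n for n in names if len({f[n] for f in frames}) > 1)


frames = parse_frames(FRAMES)
print(len(frames), "frames")
for name in varying_fields(frames):
    print(name, [hex(f[name]) for f in frames])
```

On this data only C_ADDR and C_STS differ from frame to frame; DC_STAT, DC0_SYNDROME and C_STAT are identical in all four reports.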

In case it helps, here is the system configuration:

>>>show config
                        COMPAQ AlphaServer DS10 617 MHz

SRM Console: V5.9-4
PALcode: OpenVMS PALcode V1.90-76, Tru64 UNIX PALcode V1.86-68

Processors
CPU 0 Alpha 21264A-9 617 MHz SROM Revision: V1.18.208
                Bcache size: 2 MB

Core Logic
Cchip DECchip 21272-CA Rev 2
Dchip DECchip 21272-DA Rev 2
Pchip 0 DECchip 21272-EA Rev 2

TIG Rev 2.1
Arbiter Rev 7.30 (0xfe)

MEMORY

Array # Size Base Addr
------- ---------- ---------
   0 1024 MB 000000000
   1 1024 MB 040000000

Total Bad Pages = 0
Total Good Memory = 2048 MBytes
-----cut-here-----
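
As a rough cross-check, the reported C_ADDR values can be compared against the memory array layout shown by "show config". The sketch below is in Python and rests on an assumption that is not confirmed anywhere in this message: that C_ADDR holds a physical address. The names ARRAYS, C_ADDRS and array_for are hypothetical.

```python
MB = 1024 * 1024

# (array number, base address, size in bytes), taken from the MEMORY table.
ARRAYS = [
    (0, 0x000000000, 1024 * MB),
    (1, 0x040000000, 1024 * MB),
]

# C_ADDR values from the four logout frames.
# Assumption: these are physical addresses.
C_ADDRS = [0x48A40, 0x48E80, 0x76900, 0x637C0]


def array_for(addr):
    """Return the number of the memory array containing addr, or None."""
    for index, base, size in ARRAYS:
        if base <= addr < base + size:
            return index
    return None


for addr in C_ADDRS:
    print(hex(addr), "-> array", array_for(addr))
```

Under that assumption all four addresses fall in array 0, within the first megabyte of memory. This alone does not show whether the memory, the Bcache or the CPU itself is at fault.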

I also tried:

>>>clear_error all
>>>init

and got a "processor correctable error" report again.

Does anybody have a clue?

Thanks,
Daniel Lungu


