SUMMARY: DS10 CPU correctable error

From: Daniel Lungu (lungu@nagra.com)
Date: Tue Jul 02 2002 - 03:11:10 EDT


In conclusion, the replies suggest this error can be corrected by replacing the
CPU (including its cache) or the memory. But how could a halt cause such a
disaster?

"This error is usually associated with faulty cache on the CPU card. However, it
could also mean you have faulty memory, despite the 'no bad pages' report. Time
to call maintenance..."

"Get HP/Q to replace the faulting CPU"

Sorry for posting the question twice.

Daniel

Thanks to everybody, and especially to those who replied:

Peter Reynolds
Lucien Hercaud
Jim Caldwell

---------- Original message ----------
Date: Mon, 01 Jul 2002 18:04:47 +0200 (W. Europe Daylight Time)
From: Daniel Lungu <lungu@nagra.com>
To: tru64-unix-managers@ornl.gov
Followup-To: poster
Subject: DS10 CPU correctable error

Hello everybody!

I have just experienced what looks like a CPU problem on a DS10 that worked fine
for months...

After a halt command, the SRM console did not come back:

# halt
....Halt completed....
syncing disks... done
CPU 0: Halting... (transferring to monitor)

CP - SAVE_TERM routine to be called
CP - SAVE_TERM exited with hlt_req = 1, r0 = 00000000.00000000

halted CPU 0

halt code = 5
HALT instruction executed
PC = ffffffff002263d0
Resetting I/O buses...
-----frozen-here-----

Then, after a power cycle I could see the following messages:

2048 Meg of system memory
probing hose 0, PCI
probing PCI-to-ISA bridge, bus 1
probing PCI-to-PCI bridge, bus 2
bus 0, slot 9 -- ewa -- DE500-BA Network Controller
bus 0, slot 11 -- ewb -- DE500-BA Network Controller
bus 0, slot 13 -- dqa -- Acer Labs M1543C IDE
bus 0, slot 13 -- dqb -- Acer Labs M1543C IDE
bus 2, slot 4 -- pka -- NCR 53C895
bus 2, slot 5 -- eia -- DE600-AA
bus 2, slot 6 -- vga -- Permedia - P2V Graphics Controller
bus 0, slot 16 -- pkb -- NCR 53C895
initializing GCT/FRU at 3ff52000

Processor correctable error through vector 630.

Machine Check Logout Frame @ 0x6000 Code = 0x86

Alpha 21264 IPRs (CPU 0):
I_STAT: 0000000000000000 DC_STAT: 0000000000000008
C_ADDR: 0000000000048A40 DC1_SYNDROME: 0000000000000000
DC0_SYNDROME: 0000000000000094 C_STAT: 000000000000000B
C_STS: 000000000000000D MM_STAT: 0000000000000000

Processor correctable error through vector 630.

Machine Check Logout Frame @ 0x6000 Code = 0x86

Alpha 21264 IPRs (CPU 0):
I_STAT: 0000000000000000 DC_STAT: 0000000000000008
C_ADDR: 0000000000048E80 DC1_SYNDROME: 0000000000000000
DC0_SYNDROME: 0000000000000094 C_STAT: 000000000000000B
C_STS: 000000000000000D MM_STAT: 0000000000000000

Processor correctable error through vector 630.

Machine Check Logout Frame @ 0x6000 Code = 0x86

Alpha 21264 IPRs (CPU 0):
I_STAT: 0000000000000000 DC_STAT: 0000000000000008
C_ADDR: 0000000000076900 DC1_SYNDROME: 0000000000000000
DC0_SYNDROME: 0000000000000094 C_STAT: 000000000000000B
C_STS: 0000000000000008 MM_STAT: 0000000000000000
T
Processor correctable error through vector 630.

Machine Check Logout Frame @ 0x6000 Code = 0x86

Alpha 21264 IPRs (CPU 0):
I_STAT: 0000000000000000 DC_STAT: 0000000000000008
C_ADDR: 00000000000637C0 DC1_SYNDROME: 0000000000000000
DC0_SYNDROME: 0000000000000094 C_STAT: 000000000000000B
C_STS: 0000000000000008 MM_STAT: 0000000000000000
esting the System
Testing the Disks (read only)
Testing ei* devices.
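
For anyone comparing the repeated logout frames, a short script can show which registers change from one frame to the next. This is only a minimal sketch in Python: the function names (parse_frames, varying_registers) are made up for illustration, the sample text is two of the frames above pasted in, and the script does not try to decode what any register bit means.

```python
import re

# Matches "NAME: HEXVALUE" pairs as printed in the console's IPR dump.
IPR_RE = re.compile(r"([A-Z0-9_]+):\s+([0-9A-Fa-f]{16})")

def parse_frames(console_text):
    """Split console output into logout frames; return a list of {register: int} dicts."""
    frames = []
    # Everything after each "Machine Check Logout Frame" marker belongs to one frame.
    for chunk in console_text.split("Machine Check Logout Frame")[1:]:
        regs = {name: int(value, 16) for name, value in IPR_RE.findall(chunk)}
        if regs:
            frames.append(regs)
    return frames

def varying_registers(frames):
    """Return the names of registers whose value differs between frames."""
    names = sorted(frames[0]) if frames else []
    return [n for n in names if len({f.get(n) for f in frames}) > 1]

# Two frames copied from the console output (first and third).
sample = """
Machine Check Logout Frame @ 0x6000 Code = 0x86
I_STAT: 0000000000000000 DC_STAT: 0000000000000008
C_ADDR: 0000000000048A40 DC1_SYNDROME: 0000000000000000
DC0_SYNDROME: 0000000000000094 C_STAT: 000000000000000B
C_STS: 000000000000000D MM_STAT: 0000000000000000
Machine Check Logout Frame @ 0x6000 Code = 0x86
I_STAT: 0000000000000000 DC_STAT: 0000000000000008
C_ADDR: 0000000000076900 DC1_SYNDROME: 0000000000000000
DC0_SYNDROME: 0000000000000094 C_STAT: 000000000000000B
C_STS: 0000000000000008 MM_STAT: 0000000000000000
"""

frames = parse_frames(sample)
print(varying_registers(frames))  # prints ['C_ADDR', 'C_STS']
```

On these frames only C_ADDR and C_STS differ; DC_STAT, DC0_SYNDROME and C_STAT are identical every time.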

If this could help:

>>>show config
                        COMPAQ AlphaServer DS10 617 MHz

SRM Console: V5.9-4
PALcode: OpenVMS PALcode V1.90-76, Tru64 UNIX PALcode V1.86-68

Processors
CPU 0 Alpha 21264A-9 617 MHz SROM Revision: V1.18.208
                Bcache size: 2 MB

Core Logic
Cchip DECchip 21272-CA Rev 2
Dchip DECchip 21272-DA Rev 2
Pchip 0 DECchip 21272-EA Rev 2

TIG Rev 2.1
Arbiter Rev 7.30 (0xfe)

MEMORY

Array # Size Base Addr
------- ---------- ---------
   0 1024 MB 000000000
   1 1024 MB 040000000

Total Bad Pages = 0
Total Good Memory = 2048 MBytes
-----cut-here-----
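
A quick way to see which memory array the reported addresses fall into is sketched below in Python. It rests on two assumptions that the console output does not confirm: that C_ADDR holds a physical address, and that the two arrays are laid out linearly (not interleaved) at the base addresses shown by "show config". The name array_for is hypothetical.

```python
MB = 1024 * 1024

# (array number, base address, size in bytes), copied from the "show config" output.
# Assumption: the arrays are contiguous and not interleaved.
ARRAYS = [
    (0, 0x000000000, 1024 * MB),
    (1, 0x040000000, 1024 * MB),
]

def array_for(addr):
    """Return the number of the array containing addr, or None if it is outside both."""
    for number, base, size in ARRAYS:
        if base <= addr < base + size:
            return number
    return None

# C_ADDR values from the four logout frames (assumed to be physical addresses).
for addr in (0x48A40, 0x48E80, 0x76900, 0x637C0):
    print(hex(addr), "-> array", array_for(addr))
```

Under those assumptions all four addresses land in the first 512 KB of array 0. That alone does not tell a cache fault from a memory fault, since a cached block carries the address of the memory it came from.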

I also tried:

>>>clear_error all
>>>init

and got a "processor correctable error" report again.

Does anybody have a clue?

Thanks,
Daniel Lungu


