SUMMARY: Memory/CPU failure ES40

From: Christian Becker (Christian.Becker@Mathematik.Uni-Dortmund.DE)
Date: Wed Aug 07 2002 - 03:33:36 EDT


Hi folks,

My problem with an ES40 (EV67 666 MHz, 8 GB RAM) is the following:

/var/adm/messages:

Jul 22 16:12:15 ragnaroek vmunix: WARNING: too many Processor corrected errors detected on cpu 2. Reporting suspended.

During the memory test the following messages appear:

EV6 Correctable Memory Fill ECC Error on CPU 0
C_ADDR: 00000000A8FC5BC0
C_SYNDROME_1: 0000000000000057
C_SYNDROME_0: 0000000000000000
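
For anyone collecting several of these console reports, here is a minimal Python sketch that pulls the three register values out of a dump like the one above. The helper name parse_ecc_report and the regular expression are illustrative assumptions, not part of any HP/Compaq tool, and the sketch makes no attempt to interpret the syndrome bits.

```python
import re

# Hypothetical pattern: matches "NAME: HEXVALUE" lines as printed by the console.
FIELD_RE = re.compile(
    r"^\s*(C_ADDR|C_SYNDROME_[01]):\s*([0-9A-Fa-f]+)\s*$", re.MULTILINE
)

def parse_ecc_report(text):
    """Return a dict mapping each register name to its integer value."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(text)}

sample = """EV6 Correctable Memory Fill ECC Error on CPU 0
C_ADDR: 00000000A8FC5BC0
C_SYNDROME_1: 0000000000000057
C_SYNDROME_0: 0000000000000000
"""

report = parse_ecc_report(sample)
for name, value in sorted(report.items()):
    print(f"{name} = {value:#x}")
```

Comparing C_ADDR across several reports may show whether the errors cluster at one address or are spread over memory.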

My question: What kind of error (Memory, CPU, Board) is this?

Many thanks for all answers!

Bye,
Christian

-----------------------------------------------------

Jim Lola wrote:

I would recommend that you call in HPQ Field Service to check this
system over. There have been known issues with a specific manufacture
of ES40s. There does exist an HPQ Field Change Order (FCO) to change
the CPU boards for certain ES40s. Do you have such a system? I do not
know nor could I tell you. The HPQ Field Service Engineer would know
this based on the serial number of the CPU boards...

-----------------------------------------------------

Dr Thomas.Blinn wrote:

It's almost certainly a memory error. Authorized HP service depots have
documentation available to them that helps decode the platform specific
messages. If you want to try to diagnose this yourself, you need to get
the hardware service manuals for the system (usually pretty pricey) or
find someone who has a set and will let you copy the relevant info from
them (they are copyrighted works and depending on your country's laws,
doing that may be illegal), or you can try to use trial and error (that
is, swap memory out until you isolate the failing parts). The latter
may be the simplest. I know that I don't have the ES40 service guide,
nor do I even know the part number to order a copy.

Your guess that it's bank 1 based on the C_ADDR assumes that the memory
has not been configured "interleaved" by console firmware. I believe
that on that platform, when all the memories are the same size and all
the banks are fully populated, the memory gets interleaved, but I'd have
to go read the tech manuals to be sure. And I don't have them.
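
The "bank 1" guess can be reproduced with a little arithmetic. The Python sketch below assumes the 8 GB is split into four equal, contiguous (non-interleaved) banks of 2 GB each, numbered from 0. That layout is an assumption made for illustration, not something confirmed from the ES40 manuals, and if the firmware has interleaved the memory the result is meaningless.

```python
GIB = 1 << 30

def guess_bank(c_addr, bank_size_bytes):
    """Map a physical address to a bank index.

    Only valid under the assumption of equal-size banks laid out
    back to back in the address space (no interleaving).
    """
    return c_addr // bank_size_bytes

c_addr = 0x00000000A8FC5BC0   # C_ADDR from the console message
bank_size = 2 * GIB           # assumption: 8 GB in 4 equal banks

print(round(c_addr / GIB, 2))         # offset in GiB: 2.64
print(guess_bank(c_addr, bank_size))  # 1
```

The address sits about 2.64 GB into physical memory, so with 2 GB banks it falls in the second bank (index 1).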

-----------------------------------------------------

Cohen, Andy wrote:

My suggestion would be to contact HPaq (the new HP/Compaq). They'll
want you to send them the binary.errlog. They can then troubleshoot the
problem directly. When we've had memory problems they've only been able
to point the problem to 1 of 2 DIMMs -- they can't pinpoint it to the
exact DIMM. They've then replaced both DIMMs in question.

-----------------------------------------------------

Matthew Wild wrote:

We had similar errors on our 2-CPU ES45, and after analysing the error
log using WEBES and playing around with the memory positions, Compaq
support said that it was the processors causing the problems. It appears
there were early EV67s that had a slight design flaw that could allow
the heatsink/chip/board sandwich to start separating and so cause these
errors. An engineer came round and replaced both our CPUs.

-----------------------------------------------------

Wheelock, Michael D wrote:

I had this symptom on an ES45 system. It is almost certainly a CPU
issue. To my knowledge (and experience with Compaq Services), there is
no really foolproof way to diagnose which CPU is dead. We ended up
swapping them until the errors stopped.

-----------------------------------------------------

alan wrote:

Modern versions of Compaq Analyze should have rules that
can analyze the events and suggest what needs to be fixed.
CA is part of the WEBES kit that is on the Associated
Products CDROMs. Service packs should be available from
the corporate Services web site that will have bug fixes
to the tools and new rules.

-- 
         v          
      ..d8b..       Dipl.inform. Christian Becker 
  ..:::d888b:::..
 :::::d88888b:::::  Institut fuer Angewandte Mathematik & Numerik, LS3
:::::d8888888b::::: Universitaet Dortmund 
::::d888888888b:::: Vogelpothsweg 87, 44227 Dortmund, Germany 
 ::{8888P"::"V8,::  Voicemail: +49 231 755 5934 FAX: +49 231 755 5933 
  :D8P":::::::VD:   mailto:Christian.Becker@mathematik.uni-dortmund.de  
  dP  ```````   Y


This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:48:48 EDT