[SUMMARY] UltraSparcII Ecache parity errors ["CBI event on CPU1" / "*Bad* PSYND=0x0004"]

From: David Foster (foster@dim.ucsd.edu)
Date: Wed Nov 20 2002 - 20:13:55 EST


My apologies: the Manager's List archives were down, so I couldn't
tell that there are already many posts about this.

This is an Ecache (external cache) parity error on the CPU, a known
problem with the UltraSPARC-II CPUs. It can happen, very
intermittently, when the CPU is under heavy load. If it happens
multiple times, Sun will replace the CPU under contract support. I
just heard from a Sun engineer that "best practice" is to wait for 3
occurrences. It has happened once here; they recommended upgrading to
the latest kernel patch (108528-17 for Solaris 8) and seeing whether
the error shows up again. Apparently rev -16 included some fixes to
prevent spurious CPU errors.
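
For anyone who wants to script the patch-level check, here is a minimal
sketch in Python. It assumes the kernel patch level shows up in the
output of `uname -v` as a string like "Generic_108528-17"; the function
name and the way the string is obtained are my own illustration, not
anything Sun supplies. Only the patch ID and the -17 revision come from
the advice above.

```python
import re

# Patch ID and minimum revision as quoted in this summary (Solaris 8).
KERNEL_PATCH_ID = "108528"
MIN_REVISION = 17


def kernel_patch_revision(uname_v, patch_id=KERNEL_PATCH_ID):
    """Return the revision of patch_id found in a `uname -v` style string.

    Returns None when the string does not name that patch at all
    (assumed format: "Generic_<patch id>-<revision>").
    """
    match = re.search(r"Generic_(\d+)-(\d+)", uname_v)
    if match is None or match.group(1) != patch_id:
        return None
    return int(match.group(2))


def is_below_recommended(uname_v, min_revision=MIN_REVISION):
    """True only when the patch is present but older than min_revision."""
    revision = kernel_patch_revision(uname_v)
    return revision is not None and revision < min_revision
```

On a live system the input string would be whatever `uname -v` prints;
here it is simply passed in so the check can be tried anywhere, e.g.
is_below_recommended("Generic_108528-16").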

Apparently this usually hits CPUs with 8 MB of cache, but sometimes
those with 4 MB as well.
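
Since the rule of thumb is to wait for three occurrences before asking
for a replacement, it may help to count events per CPU from
/var/adm/messages. The following Python is only a sketch: it assumes
each syslog message sits on a single line (unwrapped by any mailer) and
that the header line looks like the "[AFT2] errID ... CBI event on
CPU1" line in my original message. The function names and the default
threshold argument are mine.

```python
import re
from collections import defaultdict

# Matches the header line of an event, e.g.
#   [AFT2] errID 0x000644be.021b33e1 CBI event on CPU1
# The follow-up lines (PA=..., E$Data ...) do not contain "event on CPU"
# and are therefore ignored.
AFT_EVENT = re.compile(
    r"\[AFT\d\]\s+errID\s+(0x[0-9a-f]+\.[0-9a-f]+)\s+(\w+) event on CPU(\d+)",
    re.IGNORECASE,
)


def count_events(lines):
    """Return {cpu number: number of distinct errIDs seen for that CPU}."""
    seen = defaultdict(set)
    for line in lines:
        match = AFT_EVENT.search(line)
        if match:
            err_id, _kind, cpu = match.groups()
            # Deduplicate on errID so a repeated line counts once.
            seen[int(cpu)].add(err_id.lower())
    return {cpu: len(ids) for cpu, ids in seen.items()}


def cpus_over_threshold(lines, threshold=3):
    """List CPUs with at least `threshold` distinct events, lowest first."""
    counts = count_events(lines)
    return sorted(cpu for cpu, n in counts.items() if n >= threshold)
```

Typical use would be cpus_over_threshold(open("/var/adm/messages")),
run over however many rotated message files you keep.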

Rant (source anonymous)

   It never ceases to amaze me how well SUN kept the UltraII design
   problems quiet. In effect, virtually a whole year's
   production of chips was broken. A shortcut in the design
   (using parity instead of ECC on the cache) meant that
   thousands of these things had to be replaced. It never
   quite made the news, though, and how loudly did they
   shout about the first Pentium being unable to add up?

Thanks to:

steven.ruby
Ryan Bishop
Will Enestvedt
rene_casalme
Tim Chipman
joe.fletcher

>
> Can anyone help with this, it doesn't look good...
>
> Nov 18 17:31:44 cressida SUNW,UltraSPARC-II: [ID 672871 kern.info] NOTICE: [AFT2] errID 0x000644be.021b33e1 CBI event on CPU1
> Nov 18 17:31:44 cressida SUNW,UltraSPARC-II: [ID 192776 kern.info] [AFT2] errID 0x000644be.021b33e1 PA=0x00000000.00565000
> Nov 18 17:31:44 cressida E$tag 0x00000000.0e40000a E$State: Shared E$parity 0x07
> Nov 18 17:31:44 cressida SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x00000000.00000000
> Nov 18 17:31:44 cressida SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x08): 0x00000000.00080000 *Bad* PSYND=0x0004
> Nov 18 17:31:44 cressida SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x00000000.00000000
>
> Dave
>
>

   << All opinions expressed are mine, not the University's >>

  =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
   David Foster National Center for Microscopy and Imaging Research
    Programmer/Analyst University of California, San Diego
    dfoster@ucsd.edu Department of Neuroscience, Mail 0608
    (858) 534-7968 http://ncmir.ucsd.edu/
  =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

   "The reasonable man adapts himself to the world; the unreasonable one
   persists in trying to adapt the world to himself. Therefore, all progress
   depends on the unreasonable." -- George Bernard Shaw
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


