Info

From: Lars Van-Casteren (lars.van-casteren@aircominternational.com)
Date: Tue Jun 06 2006 - 05:24:33 EDT


Hello List,

I came across this in the message log:

Jun 5 11:27:25 backbase SUNW,UltraSPARC-III+: [ID 934035 kern.info]
NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU2 at TL=0,
errID 0x00112472.93bbef64

Jun 5 11:27:25 backbase AFSR 0x00000002<CE>.00000007 AFAR
0x000000b0.7a160b40

Jun 5 11:27:25 backbase Fault_PC 0x10199ae84 Esynd 0x0007 Slot B:
J7901

Jun 5 11:27:25 backbase SUNW,UltraSPARC-III+: [ID 140262 kern.info]
[AFT0] errID 0x00112472.93bbef64 Corrected Memory Error on Slot B: J7901
is Intermittent

Jun 5 11:27:25 backbase SUNW,UltraSPARC-III+: [ID 590598 kern.info]
[AFT0] errID 0x00112472.93bbef64 Data Bit 47 was in error and corrected

Jun 5 11:27:25 backbase SUNW,UltraSPARC-III+: [ID 873678 kern.info]
[AFT2] errID 0x00112472.93bbef64 PA=0x000000b0.7a160b40

Jun 5 11:27:25 backbase E$tag 0x000002c1.e8010080 E$state_5
Exclusive

Jun 5 11:27:25 backbase SUNW,UltraSPARC-III+: [ID 895151 kern.info]
[AFT2] E$Data (0x00) 0x689b0241.b7b50019 0x0002c102.0005c419 ECC 0x0e7

Jun 5 11:27:25 backbase SUNW,UltraSPARC-III+: [ID 895151 kern.info]
[AFT2] E$Data (0x10) 0x432d2a00.074c4732 0x31313435.000a0000 ECC 0x0dd

Jun 5 11:27:25 backbase SUNW,UltraSPARC-III+: [ID 895151 kern.info]
[AFT2] E$Data (0x20) 0x66670241.29730003 0x0006c503.4206621d ECC 0x1ac

Jun 5 11:27:25 backbase SUNW,UltraSPARC-III+: [ID 895151 kern.info]
[AFT2] E$Data (0x30) 0x00cb0ba6.4f7c8900 0x05c44a53.37040006 ECC 0x1d1

Jun 5 11:27:25 backbase SUNW,UltraSPARC-III+: [ID 929717 kern.info]
[AFT2] D$ data not available

Jun 5 11:27:25 backbase SUNW,UltraSPARC-III+: [ID 335345 kern.info]
[AFT2] I$ data not available

Jun 5 12:23:11 backbase SUNW,UltraSPARC-III+: [ID 777785 kern.info]
NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU0 at TL=0,
errID 0x0011277d.926fee18

Jun 5 12:23:11 backbase AFSR 0x00010002<EMC,CE>.00020007 AFAR
0x000000b0.78170b40 AMBIGUOUS

Jun 5 12:23:11 backbase Fault_PC 0x1011c0234 Esynd 0x0007

Jun 5 12:23:11 backbase SUNW,UltraSPARC-III+: [ID 456856 kern.info]
[AFT0] errID 0x0011277d.926fee18 Data Bit 47 was in error and corrected

Jun 5 12:23:11 backbase SUNW,UltraSPARC-III+: [ID 676017 kern.info]
[AFT2] errID 0x0011277d.926fee18 PA=0x000000b0.78170b40

Jun 5 12:23:11 backbase E$tag 0x000002c1.e0490000 E$state_5
Exclusive

Jun 5 12:23:11 backbase SUNW,UltraSPARC-III+: [ID 895151 kern.info]
[AFT2] E$Data (0x00) 0xc5025047.4a5503c2 0x340802c1.0d06c503 ECC 0x0dd

Jun 5 12:23:11 backbase SUNW,UltraSPARC-III+: [ID 895151 kern.info]
[AFT2] E$Data (0x10) 0x421d4f26.00000241 0x9dae006a.02c10206 ECC 0x0b8

Jun 5 12:23:11 backbase SUNW,UltraSPARC-III+: [ID 895151 kern.info]
[AFT2] E$Data (0x20) 0xc5025047.4a5406c5 0x0250474a.5503c234 ECC 0x032

Jun 5 12:23:11 backbase SUNW,UltraSPARC-III+: [ID 895151 kern.info]
[AFT2] E$Data (0x30) 0x0802c10d.06c50342 0x1d364600.0002419d ECC 0x01a

Jun 5 12:23:11 backbase SUNW,UltraSPARC-III+: [ID 929717 kern.info]
[AFT2] D$ data not available

Jun 5 12:23:11 backbase SUNW,UltraSPARC-III+: [ID 335345 kern.info]
[AFT2] I$ data not available

Jun 5 12:23:11 backbase SUNW,UltraSPARC-III+: [ID 706982 kern.info]
NOTICE: [AFT0] EMC Event detected by CPU0 at TL=0, errID
0x0011277d.926fee18

Jun 5 12:23:11 backbase AFSR 0x00010002<EMC,CE>.00020007 AFAR
0x000000b0.78170b40 AMBIGUOUS

Jun 5 12:23:11 backbase Fault_PC 0x1011c0234 Msynd 0x0002

Jun 5 12:23:11 backbase SUNW,UltraSPARC-III+: [ID 546503 kern.info]
[AFT0] errID 0x0011277d.926fee18 MTAG Check Bit 1 was in error and
corrected

I believe this is a memory error on the first CPU board in the system,
on Slot B: J7901.

However, I have no practical experience yet in judging whether this is
really a hardware fault, whether it could cause problems, or whether the
memory board might fail altogether very soon.

The system seems stable, and none of the applications running on it are
logging errors.

prtdiag reports nothing as faulty.

Can someone tell me whether I shouldn't worry, or whether I should open
a support call to have the board tested / replaced?

Are there any test procedures I can run myself on a production system to
do some memory testing?
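
Short of a real memory test, one thing that should be harmless on a
production box is to tally the corrected errors already logged, to see
whether the same DIMM and data bit keep coming back. Below is a minimal
sketch of that idea in Python. The regular expressions are assumptions
based only on the [AFT0] lines quoted above, and summarize_ce and SAMPLE
are names made up for illustration; other CPU types or Solaris releases
may word these messages differently.

```python
import re
from collections import Counter

# Assumed patterns, derived only from the [AFT0] lines quoted in this post.
ERRID = r"errID (0x[0-9a-f]+\.[0-9a-f]+)"
SLOT_RE = re.compile(ERRID + r" Corrected Memory Error on (Slot \w+: \w+)")
BIT_RE = re.compile(ERRID + r" Data Bit (\d+) was in error and corrected")


def summarize_ce(log_text):
    """Count corrected-error (CE) reports per DIMM slot and per data bit."""
    # Long syslog lines are often folded, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    slots = Counter(m.group(2) for m in SLOT_RE.finditer(flat))
    bits = Counter(int(m.group(2)) for m in BIT_RE.finditer(flat))
    return slots, bits


# Hypothetical sample input: three entries copied from the log above.
SAMPLE = """\
[AFT0] errID 0x00112472.93bbef64 Corrected Memory Error on Slot B: J7901
is Intermittent
[AFT0] errID 0x00112472.93bbef64 Data Bit 47 was in error and corrected
[AFT0] errID 0x0011277d.926fee18 Data Bit 47 was in error and corrected
"""

if __name__ == "__main__":
    slots, bits = summarize_ce(SAMPLE)
    for slot, count in slots.items():
        print(slot, "-", count, "CE report(s)")
    for bit, count in bits.items():
        print("data bit", bit, "-", count, "correction(s)")
```

To use it on real data, the contents of the system log (on Solaris
normally /var/adm/messages) would be passed to summarize_ce in place of
SAMPLE. It only reads text and does not touch the hardware, so it says
nothing about errors that have not been logged yet.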

Thanks!

L
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


