E420R strange crashing!!! HELP!!

From: Michael Schulte (michael.schulte@materna.de)
Date: Thu May 02 2002 - 07:43:27 EDT


Hi Gurus!

I have a very strange problem with an Enterprise 420R (2x 450 MHz
UltraSPARC-II, 1 GB RAM). The server crashes once or twice a week with
the following error message in /var/adm/messages:

May 2 11:18:32 <HOSTNAME> SUNW,UltraSPARC-II: [ID 967545 kern.warning]
WARNING: [AFT1] EDP event on CPU1 Data access at TL=0, errID
0x0000093d.592f9d58
May 2 11:18:32 <HOSTNAME> AFSR 0x00000000.80400200<PRIV,EDP> AFAR
0x00000000.3c865380
May 2 11:18:32 <HOSTNAME> AFSR.PSYND 0x0200(Score 95) AFSR.ETS 0x00
Fault_PC 0x10164880
May 2 11:18:32 <HOSTNAME> UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000
UDBL.ESYND 0x00
May 2 11:18:32 <HOSTNAME> SUNW,UltraSPARC-II: [ID 818911 kern.info]
[AFT2] errID 0x0000093d.592f9d58 PA=0x00000000.3c865380
May 2 11:18:32 <HOSTNAME> E$tag 0x00000000.0bc00790 E$State:
Modified E$parity 0x05 Badlines found=2
May 2 11:18:32 <HOSTNAME> SUNW,UltraSPARC-II: [ID 989652 kern.info]
[AFT2] E$Data (0x00): 0x00000000.00000100 *Bad* PSYND=0x0200
May 2 11:18:32 <HOSTNAME> SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x08): 0x00268819.00268cd0
May 2 11:18:32 <HOSTNAME> SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x10): 0x00000007.0000000f
May 2 11:18:32 <HOSTNAME> SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x18): 0x00000300.039ef040
May 2 11:18:32 <HOSTNAME> SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x20): 0x00000300.02904040
May 2 11:18:32 <HOSTNAME> SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x28): 0x00000000.00000000
May 2 11:18:32 <HOSTNAME> SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x30): 0x00000000.00000000
May 2 11:18:32 <HOSTNAME> SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x38): 0x00000000.00000000
May 2 11:18:32 <HOSTNAME> SUNW,UltraSPARC-II: [ID 530584 kern.info]
[AFT2] errID 0x0000093d.592f9d58 AFAR was derived from E$Tag
May 2 11:18:32 <HOSTNAME> unix: [ID 836849 kern.notice]
May 2 11:18:32 <HOSTNAME> ^Mpanic[cpu1]/thread=30002ae9680:
May 2 11:18:32 <HOSTNAME> unix: [ID 478247 kern.notice] [AFT1] errID
0x0000093d.592f9d58 EDP Error(s)
May 2 11:18:32 <HOSTNAME> See previous message(s) for details
May 2 11:18:32 <HOSTNAME> unix: [ID 100000 kern.notice]
May 2 11:18:32 <HOSTNAME> genunix: [ID 723222 kern.notice]
000002a1007f6df0 SUNW,UltraSPARC-II:cpu_aflt_log+4e0 (2a1007f6eae, 1,
10147be0, 2a1007f7038, 2a1007f6efb, 1May 2 11:18:32 <HOSTNAME> genunix:
[ID 179002 kern.notice] %l0-3: 0000000000000000 000002a1007f7100
0000000000000003 0000000000000010
May 2 11:18:32 <HOSTNAME> %l4-7: 0000000000400000 0000000000400000
0000000000000000 0000000000000000
May 2 11:18:32 <HOSTNAME> genunix: [ID 723222 kern.notice]
000002a1007f7040 SUNW,UltraSPARC-II:cpu_async_error+868 (1, 2a1007f7100,
80400200, 0, 640000080400200, 2a10May 2 11:18:32 <HOSTNAME> genunix:
[ID 179002 kern.notice] %l0-3: 0000000000000001 0000000000000032
0000000000000000 0000000000000000
May 2 11:18:32 <HOSTNAME> %l4-7: 0000000004004208 0000000000000000
0000030002ae9680 000002a1007f7ba0
May 2 11:18:33 <HOSTNAME> genunix: [ID 723222 kern.notice]
000002a1007f7210 unix:prom_rtt+0 (30000f3de18, 30002ae9680, 0, feaba004,
ffbef306, d3d53)
May 2 11:18:33 <HOSTNAME> genunix: [ID 179002 kern.notice] %l0-3:
0000000000000004 0000000000001400 0000000000001603 000000001013f894
May 2 11:18:33 <HOSTNAME> %l4-7: 0000000000000000 0000000000000001
0000000000000000 000002a1007f72c0
May 2 11:18:33 <HOSTNAME> genunix: [ID 723222 kern.notice]
000002a1007f7360 genunix:kmem_cache_alloc+3c0 (30000f3de18, 30000835ba8,
2a1007f7538, 2a1007f7540, 30000f3dMay 2 11:18:33 <HOSTNAME> genunix:
[ID 179002 kern.notice] %l0-3: 0000030000030780 0000000000000000
0000000000000000 0000000000000000
May 2 11:18:33 <HOSTNAME> %l4-7: 0000000000000000 0000000000000000
0000000000000000 0000000000000000
May 2 11:18:33 <HOSTNAME> genunix: [ID 723222 kern.notice]
000002a1007f7440 genunix:lookuppnvp+2cc (2a1007f7890, 0, 1045eaf8,
1045a988, 1, 0)
May 2 11:18:33 <HOSTNAME> genunix: [ID 179002 kern.notice] %l0-3:
000000001016482c 000002a1007f7890 0000000000000000 0000030000f3de18
May 2 11:18:33 <HOSTNAME> %l4-7: 0000030000835aa8 000002a1007f7990
0000030000f3de18 0000000000000000
May 2 11:18:33 <HOSTNAME> genunix: [ID 723222 kern.notice]
000002a1007f7640 genunix:lookuppn+108 (30000f3de18, 0, 2a1007f7888,
30000f3de18, 2a1007f7990, 1)
May 2 11:18:33 <HOSTNAME> genunix: [ID 179002 kern.notice] %l0-3:
0000000000000000 0000030002cd2a98 000002a1007f7890 0000030002e7c480
May 2 11:18:33 <HOSTNAME> %l4-7: 0000000000000000 0000000000000000
0000000000000000 0000000000000000
May 2 11:18:34 <HOSTNAME> genunix: [ID 723222 kern.notice]
000002a1007f7700 genunix:vn_create+c4 (14de08, 0, 1045a988, 0, 0, 0)
May 2 11:18:34 <HOSTNAME> genunix: [ID 179002 kern.notice] %l0-3:
0000000000000000 000000000014de08 0000000000000000 000002a1007f7998
May 2 11:18:34 <HOSTNAME> %l4-7: 0000000000000000 000002a1007f7990
000000007efefeff 0000000000000180
May 2 11:18:34 <HOSTNAME> genunix: [ID 723222 kern.notice]
000002a1007f78b0 genunix:vn_open+d4 (10b, 2, 180, 100, 180, 0)
May 2 11:18:34 <HOSTNAME> genunix: [ID 179002 kern.notice] %l0-3:
00000000fea82294 000000000014de08 0000000000000000 000000000000010b
May 2 11:18:34 <HOSTNAME> %l4-7: 0000000000000000 0000000000000000
0000000000010000 000000007fffffff
May 2 11:18:34 <HOSTNAME> genunix: [ID 723222 kern.notice]
000002a1007f7a20 genunix:copen+94 (14de08, 10b, 180, 10b, 30, ffbeeb1f)
May 2 11:18:34 <HOSTNAME> genunix: [ID 179002 kern.notice] %l0-3:
0000004482001a03 0000000000000016 0000000000000000 0000000000000003
May 2 11:18:34 <HOSTNAME> %l4-7: 0000000000000000 0000000000000000
0000000000000000 0000000000000000
May 2 11:18:34 <HOSTNAME> unix: [ID 100000 kern.notice]
May 2 11:18:34 <HOSTNAME> genunix: [ID 672855 kern.notice] syncing file
systems...
May 2 11:18:35 <HOSTNAME> genunix: [ID 733762 kern.notice] 226
May 2 11:18:37 <HOSTNAME> genunix: [ID 733762 kern.notice] 197
May 2 11:18:38 <HOSTNAME> genunix: [ID 733762 kern.notice] 171
May 2 11:18:48 <HOSTNAME> last message repeated 9 times
May 2 11:18:49 <HOSTNAME> genunix: [ID 616637 kern.notice] cannot sync
-- giving up
May 2 11:18:50 <HOSTNAME> genunix: [ID 353387 kern.notice] dumping to
/dev/dsk/c0t0d0s1, offset 65536
May 2 11:19:11 <HOSTNAME> genunix: [ID 409368 kern.notice] ^M100% done:
13228 pages dumped, compression ratio 3.81,
May 2 11:19:11 <HOSTNAME> genunix: [ID 851671 kern.notice] dump
succeeded
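
For anyone decoding the trace: the AFSR value from the [AFT1] message can
be split into its fields with a few lines of Python. This is only a
sketch. The bit positions for PRIV and EDP and the ETS/PSYND field
boundaries are assumptions chosen to agree with what the kernel printed
above (<PRIV,EDP>, AFSR.PSYND 0x0200, AFSR.ETS 0x00), and the names
parse_dotted_hex and decode_afsr are made up for illustration.

```python
# Sketch: decode the AFSR value from the [AFT1] message above.
# The bit positions are assumptions that match the flags and fields the
# kernel printed (PRIV, EDP, PSYND, ETS); check them against the
# UltraSPARC-II documentation before relying on them.

AFSR_FLAGS = {
    "PRIV": 1 << 31,  # error seen in privileged mode (assumed bit)
    "EDP": 1 << 22,   # external cache data parity error (assumed bit)
}
AFSR_ETS_SHIFT = 16       # assumed: E$ tag syndrome in bits 19..16
AFSR_ETS_MASK = 0xF
AFSR_PSYND_MASK = 0xFFFF  # assumed: parity syndrome in bits 15..0


def parse_dotted_hex(text):
    """Convert the kernel's '0xhhhhhhhh.llllllll' notation to an int."""
    high, low = text.lower().replace("0x", "").split(".")
    return (int(high, 16) << 32) | int(low, 16)


def decode_afsr(text):
    """Return the flag names and the ETS/PSYND fields of an AFSR value."""
    value = parse_dotted_hex(text)
    return {
        "flags": [name for name, bit in AFSR_FLAGS.items() if value & bit],
        "ets": (value >> AFSR_ETS_SHIFT) & AFSR_ETS_MASK,
        "psynd": value & AFSR_PSYND_MASK,
    }


if __name__ == "__main__":
    decoded = decode_afsr("0x00000000.80400200")
    print(decoded["flags"], hex(decoded["ets"]), hex(decoded["psynd"]))
```

With the value from the log this reports the PRIV and EDP flags, an ETS
of 0x0 and a PSYND of 0x200, the same as the kernel's own decoding.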

Normally this would point to a defective CPU1. But...

we have replaced almost everything in this server with spare parts:
mainboard, RAM, main power supply plane, DC-DC converter and (four
times!) the CPUs! Most recently, a Sun support engineer came on site,
replaced CPU1 again and ran the SunVTS software for long-term testing of
the CPUs and the RAM modules. Everything seemed to be fine, but this
morning the server crashed again.
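
To check whether every crash really names the same CPU, the [AFT1]
warnings in the messages files can be tallied by event type and CPU id.
The following is a rough sketch, not a tested tool: the regular
expression is modelled only on the single warning quoted above, and
tally_aft1_events is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the start of an [AFT1] warning as it appears in the log above,
# e.g. "WARNING: [AFT1] EDP event on CPU1 Data access at TL=0".
# Assumption: other event types and CPUs are logged in the same shape.
AFT1_EVENT = re.compile(r"\[AFT1\]\s+(\w+)\s+event\s+on\s+CPU(\d+)")


def tally_aft1_events(lines):
    """Count (event type, CPU id) pairs over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = AFT1_EVENT.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Demo on two lines taken from the trace above; only the first one
    # is an event warning, the second is the panic summary.
    sample = [
        "WARNING: [AFT1] EDP event on CPU1 Data access at TL=0, errID",
        "unix: [ID 478247 kern.notice] [AFT1] errID 0x0000093d.592f9d58",
    ]
    for (event, cpu), count in tally_aft1_events(sample).items():
        print(f"{event} on CPU{cpu}: {count}")
```

On a live system the same function can be fed an open file, for example
the handle returned by open("/var/adm/messages"), since it accepts any
iterable of lines.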

The only original parts left in the box are the fans (working well!),
the power supply modules in the front bays and the box itself.

Sun has no idea (the case has now been open for almost 9 weeks),
we have no idea,
do you have one?

Michael Schulte

P.S.: It's not a software problem, because it even crashed during the
initial installation of Solaris 8!
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


