420r coredump (fw-1, stonebeat)

From: Christofer Olofsson (christofer.olofsson@songnetworks.se)
Date: Wed Dec 18 2002 - 09:42:32 EST


Hi all,

We have a pair of dual-CPU Sun 420R machines that have started to core dump
and reboot. We reported this to Sun, who replaced the CPU (cpu2), but the
problem did not go away.

# uname -a
SunOS 5.7 Generic_106541-23 sun4u sparc SUNW,Ultra-80

Sun Enterprise 420R (2 X UltraSPARC-II 450MHz)
Checkpoint Firewall-1 4.1 SP-6
Stonebeat Fullcluster 2.0 (Build 2035) revision 05-03

This happened yesterday:

Dec 17 14:09:08 unix: BAD TRAP: cpu=2 type=0x31 rp=0x4028b868 addr=0x10 mmu_fsr=0x0
Dec 17 14:09:08 unix: BAD TRAP occurred in module "tcp" due to a NULL pointer dereference.
Dec 17 14:09:08 unix: fw:
Dec 17 14:09:08 unix: trap type = 0x31
Dec 17 14:09:08 unix: addr=0x10
Dec 17 14:09:08 unix: pid=452, pc=0x101f5094, sp=0x4028b8f8, tstate=0x8811001e07, context=0x18b2
Dec 17 14:09:08 unix: g1-g7: 1042f000, 1, 1, 0, 4028bb00, 0, 719550e0
Dec 17 14:09:08 unix: Begin traceback... sp = 4028b8f8
Dec 17 14:09:08 unix: Called from 101f411c, fp=4028b9a8, args=28 e43b7dd0 7010a2b0 1 ef88 705c0680
Dec 17 14:09:08 unix: Called from 100b9dc8, fp=4028ba08, args=71a1cc10 4028bac8 71c65260 71a1cc10 71a1c8f8 71a1c8f8
Dec 17 14:09:08 unix: Called from 100bd7c0, fp=4028ba68, args=101f40d8 0 71a1cc10 71996b58 71a1cc90 4028bac8
Dec 17 14:09:08 unix: Called from 100bdad8, fp=4028bb80, args=0 0 8 71a1d558 0 0
Dec 17 14:09:08 unix: Called from 1009d5f4, fp=4028bbe8, args=0 4028bc50 1000000 b68 5 71a1d558
Dec 17 14:09:08 unix: Called from 10036078, fp=4028bc80, args=14a4 14a4 14a4 19 700ea398 71a1fc80
Dec 17 14:09:08 unix: Called from ff37ad8c, fp=ffbef910, args=19 12d1600 14a4 19 0 ff191fc8
Dec 17 14:09:08 unix: End traceback...
Dec 17 14:09:09 unix: fw_lock: already locked. current = fw_filter (in), previous = fw_filter (out), level=2
Dec 17 14:09:11 unix: panic[cpu2]/thread=719550e0:
Dec 17 14:09:11 unix: trap
Dec 17 14:09:11 unix:
Dec 17 14:09:11 unix: syncing file systems...
Dec 17 14:09:11 unix: 2
Dec 17 14:09:12 unix: done
Dec 17 14:09:13 unix: dumping to /dev/dsk/c0t0d0s1, offset 107741184
Dec 17 14:09:23 unix: 100% done: 5727 pages dumped, compression ratio 2.84,
Dec 17 14:09:23 unix: dump succeeded

From /var/adm/messages:

unix: panic[cpu2]/thread=401a1e60:
savecore: reboot after panic: trap

unix: panic[cpu2]/thread=401a1e60:
savecore: reboot after panic: trap

unix: panic[cpu2]/thread=401a5e60:
savecore: reboot after panic: trap

unix: panic[cpu2]/thread=401a9e60:
savecore: reboot after panic: trap

unix: panic[cpu2]/thread=401a9e60:
savecore: reboot after panic: recursive mutex_enter,lp=70808950 owner=401a9e60 thread=401a9e60

unix: panic[cpu2]/thread=71a65ce0:
savecore: reboot after panic: trap

unix: panic[cpu2]/thread=71a65ce0:
savecore: reboot after panic: kernel heap corruption detected

The different traps have been:

bad trap in module: "znb/genunix/tcp/ip" due to a NULL_POINTER dereference.

We used to have znb quad cards but replaced them with Sun's qfe cards. Before
the switch of network cards the problem was 'bad trap in module: "znb" due to
a NULL_POINTER dereference'. After the change, the other problems described
above started to show up (bad trap in module: "genunix/tcp/ip" due to a
NULL_POINTER dereference).
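
To see how often each kind of panic occurs, the savecore lines from
/var/adm/messages can be tallied. The following is only a minimal sketch in
Python, assuming the "savecore: reboot after panic: ..." line format quoted
above. The function name tally_panics and the sample list are hypothetical
and exist only for illustration; they are not part of any Sun or Check Point
tool.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpts quoted in this message:
#   "savecore: reboot after panic: <reason>[,<details>]"
PANIC_RE = re.compile(r"savecore: reboot after panic: (.+)")


def tally_panics(lines):
    """Count panic reasons, keeping only the text before the first comma."""
    counts = Counter()
    for line in lines:
        match = PANIC_RE.search(line)
        if match:
            # Drop per-incident details such as "lp=... owner=...".
            reason = match.group(1).split(",")[0].strip()
            counts[reason] += 1
    return counts


# Hypothetical sample built from the lines quoted above.
sample = [
    "unix: panic[cpu2]/thread=401a1e60:",
    "savecore: reboot after panic: trap",
    "savecore: reboot after panic: recursive mutex_enter,lp=70808950 owner=401a9e60",
    "savecore: reboot after panic: kernel heap corruption detected",
    "savecore: reboot after panic: trap",
]

for reason, count in tally_panics(sample).most_common():
    print(count, reason)
```

On a live system the same function could be fed the lines of
/var/adm/messages instead of the sample list.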

Do you have any suggestions?

br, Christofer.
_____________________________
* Christofer Olofsson
* Unix Systems Operation
* Song Networks Swedish AB
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers