V880 weirdness (complete this time)

From: NetComrade (netcomrade@bookexchange.net)
Date: Wed Mar 30 2005 - 17:16:21 EST


Sorry, I didn't post all the info the first time.

We have a very strange issue with a V880. Has anyone seen this?

This is pretty much all we see in the logs. It has happened twice (in 2
days), and I don't think we've given it enough time to sort itself out.
I tried to go to the OK prompt and resume, which I don't think was a smart
thing to do (it panicked).

GAB seems to report a high load. However, when I logged in via the terminal
server, 'uptime' showed only 1 for all 3 intervals (1, 5, 15 min). I ran
ifconfig -a, and it hung. Then the machine more or less died (on my attempts
to OK/resume). However, the last time this happened, the load was high,
uptime was about the only thing that I could run, and then it would just
hang.
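
For anyone wanting to compare the two sets of numbers side by side, here is a
minimal Python sketch. It pulls the three load averages out of an 'uptime'
line and flags the windows where the GAB figure is far larger. The sample
'uptime' line is made up for illustration (only the three values of 1 come
from the description above), the GAB figures are the ones in the log further
down, and the function names and the factor of 10 are arbitrary choices. It
also assumes GAB's load figures are on the same scale as those from 'uptime',
which nothing in this post confirms.

```python
import re

# Hypothetical 'uptime' line; only the three load values of 1 come from the post.
SAMPLE_UPTIME = "  3:30pm  up 2 day(s),  1:02,  1 user,  load average: 1.00, 1.00, 1.00"

# 1/5/15 minute load averages as reported by GAB in the log.
GAB_LOADS = (249, 143, 78)

# Accepts "load average:" or "load averages:", with or without commas.
UPTIME_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_uptime(line):
    """Return the (1 min, 5 min, 15 min) load averages from an uptime line."""
    m = UPTIME_RE.search(line)
    if m is None:
        raise ValueError("no load averages found in: %r" % line)
    return tuple(float(x) for x in m.groups())


def disagreement(uptime_loads, gab_loads, factor=10.0):
    """Return the windows (in minutes) where GAB exceeds factor * uptime."""
    windows = (1, 5, 15)
    return [w for w, u, g in zip(windows, uptime_loads, gab_loads)
            if g > factor * u]


if __name__ == "__main__":
    loads = parse_uptime(SAMPLE_UPTIME)
    print("uptime:", loads)
    print("windows that disagree:", disagreement(loads, GAB_LOADS))
```

Note that the two readings here were not taken at the same moment, so a
mismatch by itself does not prove that either number is wrong.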

Another weird thing about this machine: Ctrl-C doesn't work in bash.

It's running Solaris 8 and VCS 1.3.0. It also had Oracle running on it,
but I am pretty sure the db could not have generated so much load, since
it's in very light testing mode.

Let me reiterate that I don't think any of the 'apps' (the Oracle server in
this case) would have been able to create such a huge load.

starts at (prob bigsaint wants to monitor)
15:25:18

sshd[1182]: [ID 800047 auth.error] error: fork: Resource temporarily unavailable
15:25:33
gab: [ID 524258 kern.notice] GAB:20057: Port h process 1262 inactive 7 sec
gab: [ID 524258 kern.notice] GAB:20057: Port h process 1262 inactive 8 sec
gab: [ID 524258 kern.notice] GAB:20057: Port h process 1262 inactive 9 sec
gab: [ID 524258 kern.notice] GAB:20057: Port h process 1262 inactive 10 sec
gab: [ID 524258 kern.notice] GAB:20057: Port h process 1262 inactive 11 sec
gab: [ID 524258 kern.notice] GAB:20057: Port h process 1262 inactive 12 sec
gab: [ID 524258 kern.notice] GAB:20057: Port h process 1262 inactive 13 sec
gab: [ID 524258 kern.notice] GAB:20057: Port h process 1262 inactive 14 sec
gab: [ID 183661 kern.notice] GAB:20058: Port h process 1262: heartbeat failed, killing
gab: [ID 238234 kern.notice] GAB:20059: Port h heartbeat interval 15000 msec. Statistics:
gab: [ID 761221 kern.notice] GAB: Port h: heartbeats in 0 ~ 3000
gab: [ID 761221 kern.notice] GAB: Port h: heartbeats in 3000 ~ 6000
gab: [ID 761221 kern.notice] GAB: Port h: heartbeats in 6000 ~ 9000
gab: [ID 761221 kern.notice] GAB: Port h: heartbeats in 9000 ~ 12000
gab: [ID 761221 kern.notice] GAB: Port h: heartbeats in 12000 ~ 15000
gab: [ID 199900 kern.notice] GAB:20062: process 1262, command line: had
gab: [ID 482222 kern.notice] GAB:20064: state: SRUN (running)
gab: [ID 655754 kern.notice] GAB:20070: total lwp: 1
gab: [ID 851661 kern.notice] GAB:20071: total lwp swapped out: 0
gab: [ID 989485 kern.notice] GAB:20072: lwp information:
gab: [ID 687487 kern.notice] GAB:20073: 1: thread id 1, kthread at 30009354180,
gab: [ID 538219 kern.notice] GAB:20075: TS_SLEEP (waiting an event)
gab: [ID 391709 kern.notice] GAB:20088: System information:
gab: [ID 905529 kern.notice] GAB:20089: number of cpu: 8
gab: [ID 517451 kern.notice] GAB:20090: physical memory: 16359400 K
gab: [ID 501767 kern.notice] GAB:20091: free memory: 11514472 K
gab: [ID 828792 kern.notice] GAB:20092: average free memory in 5 sec: 11514472
gab: [ID 827190 kern.notice] GAB:20093: average free memory in 30 sec: 11514312
gab: [ID 809009 kern.notice] GAB:20094: number of processes: 525
gab: [ID 717043 kern.notice] GAB:20095: load average in 1 min: 249
gab: [ID 210278 kern.notice] GAB:20096: load average in 5 min: 143
gab: [ID 139215 kern.notice] GAB:20097: load average in 15 min: 78
gab: [ID 155039 kern.notice] GAB:20098: pagein rate: 0
gab: [ID 615007 kern.notice] GAB:20099: pageout rate: 0
gab: [ID 376718 kern.notice] GAB:20041: Port h: client process failure: killing process
gab: [ID 184552 kern.notice] GAB:20035: Port h attempting to kill process due
last message repeated 2 times
gab: [ID 294923 kern.notice] GAB:20032: Port h closed
gab: [ID 495678 kern.notice] GAB:20036: Port h gen 8a718316 membership 01
6
unix: [ID 836849 kern.notice]

15:43:20 probably panics due to OK prompt/resume
panic[cpu7]/thread=2a10007dd20:
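
In case it helps anyone sift through a longer messages file, below is a rough
Python sketch that summarizes GAB lines like the ones above. It picks out the
longest reported inactivity, the process count, and the three load averages.
The regular expressions are written only against the message text shown in
this post, so treat them as assumptions about the format and not as a
description of GAB's logging. The function name and the dictionary layout are
invented for the example, and wrapped lines need to be rejoined first.

```python
import re

# A few lines copied from the log above, with wrapped lines rejoined.
SAMPLE = """\
gab: [ID 524258 kern.notice] GAB:20057: Port h process 1262 inactive 7 sec
gab: [ID 524258 kern.notice] GAB:20057: Port h process 1262 inactive 14 sec
gab: [ID 809009 kern.notice] GAB:20094: number of processes: 525
gab: [ID 717043 kern.notice] GAB:20095: load average in 1 min: 249
gab: [ID 210278 kern.notice] GAB:20096: load average in 5 min: 143
gab: [ID 139215 kern.notice] GAB:20097: load average in 15 min: 78
"""

# Patterns inferred from the lines in this post (an assumption, not a spec).
INACTIVE_RE = re.compile(r"GAB:20057: Port (\w+) process (\d+) inactive (\d+) sec")
LOAD_RE = re.compile(r"load average in (\d+) min: (\d+)")
NPROC_RE = re.compile(r"number of processes: (\d+)")


def summarize_gab(text):
    """Collect max inactivity, load averages, and process count from GAB lines."""
    summary = {"max_inactive_sec": 0, "load": {}, "processes": None}
    for line in text.splitlines():
        m = INACTIVE_RE.search(line)
        if m:
            seconds = int(m.group(3))
            summary["max_inactive_sec"] = max(summary["max_inactive_sec"], seconds)
            continue
        m = LOAD_RE.search(line)
        if m:
            # Key is the averaging window in minutes, value is the reported load.
            summary["load"][int(m.group(1))] = int(m.group(2))
            continue
        m = NPROC_RE.search(line)
        if m:
            summary["processes"] = int(m.group(1))
    return summary


if __name__ == "__main__":
    print(summarize_gab(SAMPLE))
```

Lines that match none of the patterns are ignored, so it can be pointed at a
whole messages file once the wrapped lines are rejoined.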
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers