From: Tim Chipman (tchipman@gmail.com)
Date: Mon Oct 15 2007 - 09:22:23 EDT
Hi all,
I've dug through the archives and can't find anything that matches my
issue (there were clearly some picld problems on the v880 in the past,
especially on Solaris 8). If anyone has advice or pointers, it would be
*greatly* appreciated. (SunSolve is also proving difficult, since I
only have free access.)
I've got a v880 running Solaris 9 (recommended patch cluster updated as
of ~Feb-March 2007) which has been quite stable for the past year (i.e.,
since I've been managing it). Now, twice in the last 2 weeks, the
system has shut itself down, with picld initiating the shutdown after
self-diagnosing "problems". The catch is that the problems sound a tad
fishy (MB temp of -50 C).
A typical sequence of logged errors from dmesg is shown below.
Can anyone comment:
* Can I simply disable picld so it doesn't run at boot? Is this prudent?
* Should I apply some patch to make picld behave better on v880 /
Sol9? I found some patches for Sol8 in previous discussion
threads (110460-32), but this patch line appears to have ultimately
been obsoleted by a kernel patch, 108528-29.
* Has anyone else ever seen something like this, or have any other
comments or suggestions?
Many thanks for any / all help,
Tim Chipman
======PASTE===========
Oct 12 07:07:11 Jade picld[75]: [ID 625010 daemon.error] WARNING: Device IO_BRIDGE_PRIM_FAN failure detected
then a bit later,
Oct 12 22:36:55 Jade picld[75]: [ID 916734 daemon.error] CRITICAL : LOW TEMPERATURE DETECTED -50, MB_AMB_TEMPERATURE_SENSOR
[System then shuts itself down]
Oct 12 22:39:22 Jade agent[1041]: [ID 854342 daemon.alert] syslog
Oct 12 22:39:22 agent {received software termination signal}
Oct 12 22:39:22 Jade agent[1041]: [ID 251449 daemon.alert] syslog
Oct 12 22:39:22 agent *** terminating execution ***
Oct 12 22:40:20 Jade syslogd: going down on signal 15
Oct 12 22:40:20 Jade xntpd[413]: [ID 866926 daemon.notice] xntpd exiting on signal 15
Oct 12 22:40:47 Jade genunix: [ID 672855 kern.notice] syncing file systems...
Oct 12 22:40:47 Jade genunix: [ID 904073 kern.notice] done
Oct 15 09:51:28 Jade genunix: [ID 540533 kern.notice] ^MSunOS Release 5.9 Version Generic_122300-03 64-bit
and a root ssh session open at the time shows the following, consistent
with this time frame:
root@Jade # Broadcast Message from root (???) on Jade Fri Oct 12 22:37:01...
The system Jade will be shut down in 1 minute
OVERTEMP condition
Broadcast Message from root (???) on Jade Fri Oct 12 22:37:31...
The system Jade will be shut down in 30 seconds
OVERTEMP condition
Broadcast Message from root (???) on Jade Fri Oct 12 22:37:52...
THE SYSTEM Jade IS BEING SHUT DOWN NOW ! ! !
Log off now or risk your files being damaged
OVERTEMP condition
Hangup
root@Jade # Connection to jade closed by remote host.
====ENDPASTE==========
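For anyone who wants to pull just the picld entries out of a log like the one pasted above, here is a minimal Python sketch. It assumes the standard Solaris syslog line layout seen in the paste and tolerates entries that a mail client has wrapped across lines. The function name `picld_events` and the `PLAUSIBLE_MIN_C` threshold are illustrative assumptions, not anything picld itself provides.

```python
import re

# Leading syslog timestamp, e.g. "Oct 12 22:36:55".
STAMP = r"[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}"
ENTRY_RE = re.compile(
    rf"^(?P<stamp>{STAMP})\s+(?P<host>\S+)\s+picld\[\d+\]:\s+"
    r"\[ID\s+\d+\s+(?P<prio>[\w.]+)\]\s+(?P<msg>.*)$"
)
TEMP_RE = re.compile(r"TEMPERATURE DETECTED\s+(-?\d+),\s*(\w+)")

# Hypothetical sanity floor, not a picld setting: a reading below this in a
# machine room is more likely a sensor or bus glitch than a real temperature.
PLAUSIBLE_MIN_C = 0


def picld_events(text):
    """Return picld entries from syslog text, rejoining wrapped lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if re.match(STAMP, line) or not entries:
            entries.append(line)
        else:
            # No leading timestamp: treat as a continuation of the previous entry.
            entries[-1] += " " + line
    events = []
    for entry in entries:
        m = ENTRY_RE.match(entry)
        if not m:
            continue  # not a picld line
        event = m.groupdict()
        t = TEMP_RE.search(event["msg"])
        if t:
            event["temp_c"] = int(t.group(1))
            event["sensor"] = t.group(2)
            event["suspect"] = event["temp_c"] < PLAUSIBLE_MIN_C
        events.append(event)
    return events
```

On a live system this could be fed the contents of /var/adm/messages, e.g. `picld_events(open("/var/adm/messages").read())`. Any entry with `suspect` set is a reading worth double-checking against the hardware before trusting it.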
-------POSSIBLY UNRELATED - PASTE OF PRTDIAG -V OUTPUT FROM THIS SYSTEM----
(note, after system is rebooted there are no fan failures detected or reported)
root@Jade # prtdiag -v
System Configuration: Sun Microsystems sun4u Sun Fire 880
System clock frequency: 150 MHz
Memory size: 8192 Megabytes
========================= CPUs ===============================================
            Run   E$    CPU      CPU
Brd  CPU    MHz   MB    Impl.    Mask
---  -----  ----  ----  -------  ----
A    0      1200  8.0   US-III+  11.1
B    1      1200  8.0   US-III+  11.1
A    2      1200  8.0   US-III+  11.1
B    3      1200  8.0   US-III+  11.1
========================= Memory Configuration ===============================
           Logical  Logical  Logical
      MC   Bank     Bank     Bank         DIMM    Interleave  Interleaved
Brd   ID   num      size     Status       Size    Factor      with
----  ---  ----     ------   -----------  ------  ----------  -----------
A     0    0        512MB    no_status    256MB   8-way       0
A     0    1        512MB    no_status    256MB   8-way       0
A     0    2        512MB    no_status    256MB   8-way       0
A     0    3        512MB    no_status    256MB   8-way       0
B     1    0        512MB    no_status    256MB   8-way       1
B     1    1        512MB    no_status    256MB   8-way       1
B     1    2        512MB    no_status    256MB   8-way       1
B     1    3        512MB    no_status    256MB   8-way       1
A     2    0        512MB    no_status    256MB   8-way       0
A     2    1        512MB    no_status    256MB   8-way       0
A     2    2        512MB    no_status    256MB   8-way       0
A     2    3        512MB    no_status    256MB   8-way       0
B     3    0        512MB    no_status    256MB   8-way       1
B     3    1        512MB    no_status    256MB   8-way       1
B     3    2        512MB    no_status    256MB   8-way       1
B     3    3        512MB    no_status    256MB   8-way       1
========================= IO Cards =========================
                         Bus  Max
     IO   Port Bus       Freq Bus  Dev,
Brd  Type ID   Side Slot MHz  Freq Func State Name                             Model
---- ---- ---- ---- ---- ---- ---- ---- ----- -------------------------------- ----------------------
I/O  PCI  8    B    0    33   33   5,0  ok    SUNW,jfca/fp (fp)                FCX-6562-L
I/O  PCI  9    B    6    33   33   2,0  ok    ethernet-pci1148,9000.1148.2100.+
I/O  PCI  9    B    4    33   33   4,0  ok    fibre-channel-pci1077,2312.1077.+
No failures found in System
===========================
========================= Environmental Status =========================
System Temperatures (Celsius):
-------------------------------
Device Temperature Status
---------------------------------------
CPU0 50 OK
CPU1 45 OK
CPU2 48 OK
CPU3 47 OK
MB 22 OK
IOB 18 OK
DBP0 19 OK
=================================
Front Status Panel:
-------------------
Keyswitch position: NORMAL
System LED Status:
GEN FAULT REMOVE
[OFF] [OFF]
DISK FAULT POWER FAULT
[OFF] [OFF]
LEFT THERMAL FAULT RIGHT THERMAL FAULT
[OFF] [OFF]
LEFT DOOR RIGHT DOOR
[OFF] [OFF]
=================================
Disk Status:
Presence Fault LED Remove LED
DISK 0: [PRESENT] [OFF] [OFF]
DISK 1: [PRESENT] [OFF] [OFF]
DISK 2: [PRESENT] [OFF] [OFF]
DISK 3: [PRESENT] [OFF] [OFF]
DISK 4: [PRESENT] [OFF] [OFF]
DISK 5: [PRESENT] [OFF] [OFF]
DISK 6: [ EMPTY]
DISK 7: [ EMPTY]
DISK 8: [ EMPTY]
DISK 9: [ EMPTY]
DISK 10: [ EMPTY]
DISK 11: [ EMPTY]
=================================
Fan Bank :
----------
Bank                Speed     Status      Fan State
                    ( RPMS )
----                --------  ---------   ---------
CPU0_PRIM_FAN       1910      [ENABLED]   OK
CPU1_PRIM_FAN       2013      [ENABLED]   OK
CPU0_SEC_FAN        0         [DISABLED]  OK
CPU1_SEC_FAN        0         [DISABLED]  OK
IO0_PRIM_FAN        3061      [ENABLED]   OK
IO1_PRIM_FAN        2912      [ENABLED]   OK
IO0_SEC_FAN         0         [DISABLED]  OK
IO1_SEC_FAN         0         [DISABLED]  OK
IO_BRIDGE_PRIM_FAN  3614      [ENABLED]   OK
IO_BRIDGE_SEC_FAN   0         [DISABLED]  OK
=================================
Power Supplies:
---------------
Supply  Status        Fan Fail  Temp Fail  CS Fail  3.3V  5V  12V  48V
------  ------------  --------  ---------  -------  ----  --  ---  ---
PS0     GOOD                                        6     3   2    3
PS1     GOOD                                        6     3   2    3
PS2     GOOD                                        6     3   2    3
========================= HW Revisions =======================================
System PROM revisions:
----------------------
OBP 4.13.0 2004/01/19 18:26
IO ASIC revisions:
------------------
Port
Brd Model ID Status Version
---- --------------- ---- ------ -------
IB-1 unknown 8 ok 7
IB-1 unknown 9 ok 7
root@Jade #
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers