temperature swings on Sun Blade 1000

From: Sean Walmsley (sean.p.walmsley@nuclearsafetysolutions.com)
Date: Mon May 05 2003 - 21:33:01 EDT


Sun Managers:

  OS: Solaris 8 with 8_Recommended_2003-02-20 patch cluster
  HARDWARE: Sun Blade 1000 w/ 4 GB RAM
  
Two questions I hope you can help me with to get one of our systems
back up and running:

(1)
Has anyone tried one of Sun's new XVR-100 2-D graphics boards in
a SunBlade 1000 successfully? Sun says they work in a Blade 2000 but
won't guarantee they work in the older 1000.
    
(2)
We recently had a Creator 3D graphics board fail in a SunBlade 1000
desktop machine. On further investigation, the unit's /var/adm/messages
file contains:

picld[73]: [ID 402047 daemon.crit] SUNW_piclenvd: pmthr thread creation failed!

and a bunch of:

picld[73]: [ID 845468 daemon.crit] SUNW_piclenvd: 'cpu0' sensor temperature 90 outside safe operating limits (0...88)

Looping the /usr/platform/sun4u/sbin/prtdiag -v command and grepping
for CPU temps and cpu fan speeds (see below) shows that:

  - the CPU fan stays at 19% speed until the CPU die temperature
    exceeds 88 degrees C
  - once the die temperature exceeds 88 degrees C, the system logs the
    overtemp in /var/adm/messages and increases the fan speed to 100%
  - the CPU temperature gradually falls and the fan speed is gradually
    reduced to 19% again
  - the cycle then repeats
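
For anyone who wants to reproduce the polling loop, here is a minimal
Python sketch. It is not the script used for the output at the end of
this message. The two regular expressions are assumptions inferred from
the grepped sample lines in this post (a CPU row ending in die and
ambient temperatures, and a "cpu NN%" fan line); other prtdiag versions
or platforms may format these lines differently.

```python
import re
import subprocess
import time

PRTDIAG = "/usr/platform/sun4u/sbin/prtdiag"  # path quoted in the post

# Assumed layouts, inferred from the grepped sample lines in the post:
#   " 0 900 MHz 8MB US-III 5.11 87 C 24 C"  -> CPU row: die temp, ambient temp
#   "cpu 19%"                                -> CPU fan speed
CPU_ROW = re.compile(r"^\s*\d+\s+\d+\s+MHz\s.*?(\d+)\s+C\s+(\d+)\s+C\s*$")
FAN_ROW = re.compile(r"^\s*cpu\s+(\d+)%\s*$")


def parse_sample(text):
    """Return (die_temp_c, ambient_c, fan_percent) from one prtdiag -v run.

    A value whose line is not found is returned as None. If several CPU
    rows are present, the last one wins.
    """
    die = ambient = fan = None
    for line in text.splitlines():
        m = CPU_ROW.match(line)
        if m:
            die, ambient = int(m.group(1)), int(m.group(2))
            continue
        m = FAN_ROW.match(line)
        if m:
            fan = int(m.group(1))
    return die, ambient, fan


def poll(interval_s=5.0):
    """Run prtdiag -v repeatedly and print one summary line per sample."""
    while True:
        result = subprocess.run([PRTDIAG, "-v"], capture_output=True, text=True)
        print(time.strftime("%H:%M:%S"), *parse_sample(result.stdout))
        time.sleep(interval_s)
```

Calling poll() on the affected machine prints a timestamp followed by
the die temperature, ambient temperature and fan percentage for each
sample. The 5 second interval is an arbitrary choice.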
  
Unfortunately, the script that we use to monitor /var/adm/messages was
not looking for "daemon.crit" entries, so I don't know whether this is a
new problem.
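
A monitoring script can avoid this gap by matching on the severity in
the "[ID nnnnnn facility.level]" tag rather than on specific message
text. The following is a rough Python sketch, not our actual monitoring
script. The set of levels treated as alerts is an assumption and should
be adjusted to local policy.

```python
import re

# Matches the "[ID <msgid> <facility>.<level>]" tag seen in the picld
# lines quoted in this post.
TAG = re.compile(r"\[ID (\d+) (\w+)\.(\w+)\]")

# Severities treated as alerts. This set is an assumption.
ALERT_LEVELS = {"emerg", "alert", "crit", "err"}


def alert_lines(lines, levels=ALERT_LEVELS):
    """Yield (facility, level, line) for each line at an alerting level."""
    for line in lines:
        m = TAG.search(line)
        if m and m.group(3) in levels:
            yield m.group(2), m.group(3), line.rstrip("\n")


def scan(path="/var/adm/messages"):
    """Print every alerting line found in the given log file."""
    with open(path, errors="replace") as f:
        for facility, level, line in alert_lines(f):
            print(f"{facility}.{level}: {line}")
```

Because the filter keys on the level alone, it catches daemon.crit
entries from picld as well as critical messages from any other facility.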

It seems that the ratiometric control of the fans on rising temperature is
not working. I presume that this may be because the picld "pmthr" thread
could not be created (see the first message above).
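
To make the distinction concrete, the Python sketch below contrasts a
proportional (ratiometric) ramp with the step response described above.
It is purely illustrative and is not picld's actual algorithm. The 19%
and 100% speeds and the 88 degree limit come from the observations in
this post; the 60 degree ramp start is a hypothetical value chosen only
for the example.

```python
def proportional_fan_speed(temp_c, ramp_start_c=60.0, limit_c=88.0,
                           min_pct=19.0, max_pct=100.0):
    """Fan speed that rises linearly with temperature.

    ramp_start_c is a hypothetical value, not a documented picld setting.
    """
    if temp_c <= ramp_start_c:
        return min_pct
    if temp_c >= limit_c:
        return max_pct
    frac = (temp_c - ramp_start_c) / (limit_c - ramp_start_c)
    return min_pct + frac * (max_pct - min_pct)


def observed_fan_speed(temp_c, limit_c=88.0, min_pct=19.0, max_pct=100.0):
    """Step response on rising temperature, as observed on this machine:
    minimum speed until the limit is exceeded, then full speed."""
    return max_pct if temp_c > limit_c else min_pct
```

With a working ramp the fan would already be well above 19% at 87
degrees, whereas the observed behaviour leaves it at 19% until the
limit is crossed.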

Has anyone else experienced this problem? If so, any suggestions on
workarounds or fixes would be most welcome.

Thanks - I will summarize.

Sean

BELOW IS THE LOOPED OUTPUT OF PRTDIAG
-------------------------------------

 0 900 MHz 8MB US-III 5.11 87 C 24 C
cpu 19%
 0 900 MHz 8MB US-III 5.11 87 C 24 C
cpu 19%
 0 900 MHz 8MB US-III 5.11 87 C 24 C
cpu 19%
 0 900 MHz 8MB US-III 5.11 87 C 24 C
cpu 19%
 0 900 MHz 8MB US-III 5.11 90 C 24 C
cpu 100%
 0 900 MHz 8MB US-III 5.11 87 C 23 C
cpu 98%
 0 900 MHz 8MB US-III 5.11 87 C 23 C
cpu 96%
 0 900 MHz 8MB US-III 5.11 86 C 23 C
cpu 95%
 0 900 MHz 8MB US-III 5.11 82 C 23 C
cpu 93%
 0 900 MHz 8MB US-III 5.11 81 C 22 C
 .....
=================================================================
Sean Walmsley sean.p.walmsley@nuclearsafetysolutions.com
Nuclear Safety Solutions Ltd. 416-592-4608 (V) 416-592-5528 (F)
700 University Ave M/S H04 J19, Toronto, Ontario, M5G 1X6, CANADA

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


