automatic poweroff after high temperature detected

From: Bob Vickers (bobv@cs.rhul.ac.uk)
Date: Mon Jun 05 2006 - 05:36:47 EDT


Dear All,

This weekend we had an air conditioning failure and the temperature in the
machine room rose dramatically. The result was that every 18 minutes
envmond shut down our AlphaServer 4100, only for it to reboot. This
carried on for about 27 hours until eventually (I presume) the hardware
detected an error and refused to come up.

The machine was still extremely hot the following morning (about 12 hours
later) so I'm not certain the power got turned off even then.

I am worried about the damage this might cause, and also about the heat
being generated, which makes matters worse for the other servers in the
room. Is it possible to configure the system so that when a high
temperature is detected the machine shuts down and powers itself off? In
fact I don't mind if any kind of crash causes a power off, because Tru64
is so reliable we never get crashes (touch wood!).

Our relevant variables are:

# /usr/sbin/envconfig -q
ENVMON_CONFIGURED = 1
ENVMON_GRACE_PERIOD =
ENVMON_MONITOR_PERIOD =
ENVMON_HIGH_THRESH = 40
ENVMON_USER_SCRIPT =

$ /sbin/consvar -l | grep -i boot
auto_action = BOOT
boot_dev = rz25
bootdef_dev = rz25
booted_dev = rz25
boot_file =
booted_file =
boot_osflags = A
booted_osflags = A
boot_reset = OFF

Regards,
Bob Vickers
Dept of Computer Science, Royal Holloway, University of London


