SUMMARY: automatic poweroff after high temperature detected

From: Bob Vickers (bobv@cs.rhul.ac.uk)
Date: Mon Jun 05 2006 - 09:38:55 EDT


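Thanks go to the various people who replied, mostly saying the same thing:
software-triggered poweroff on an AlphaServer 4100 is impossible, but you can
at least make the system halt at the console instead of rebooting after a
failure, as follows (the second command saves the change to non-volatile
storage):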

# /sbin/consvar -s auto_action HALT
# /sbin/consvar -a

I have included Peter Stern's reply verbatim as he went into quite a bit
of detail:

Dear Bob:

I do not think that there is any command which will cause your system to
power off. Even when you shut down the system with 'shutdown -h' it
doesn't power off by itself. I think that the best you can do is:
set auto_action halt

which will stop your system at the console prompt (>>>).
Then at least the CPUs won't be working and generating heat, and the
system won't keep shutting down and rebooting.

And yes, you should be worried about the damage heat may cause.
Hopefully nothing bad happened, but...

By the way, you can set ENVMON_USER_SCRIPT to some script which gets
executed when the system gets hot, e.g. sends you email. Then you can
try to solve the problem or power off the machine until you can.

Regards,
Peter Stern
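
As a rough illustration of Peter's ENVMON_USER_SCRIPT suggestion, here is a
minimal sketch of an alert script. It is untested on Tru64 and rests on some
assumptions: that envmond runs the script with no arguments, that mailx is
the mailer, and that "root" is a sensible recipient. The function names and
the ALERT_TO variable are made up for this example.

```shell
#!/bin/sh
# Hypothetical alert script for ENVMON_USER_SCRIPT (a sketch, not a tested
# Tru64 script). Assumption: envmond runs it with no arguments when the
# high-temperature threshold is crossed.

ALERT_TO="${ALERT_TO:-root}"    # recipient; "root" is only a placeholder

# Build the subject line for the given host name.
alert_subject() {
    echo "envmond: high temperature on $1"
}

# Build the message body for the given host name.
alert_body() {
    echo "envmond reported a high-temperature event on $1."
    echo "Check the machine room cooling, or power the system off by hand."
}

# Mail the alert if mailx is available; otherwise write it to stderr.
send_alert() {
    host=`uname -n`
    subject=`alert_subject "$host"`
    if type mailx >/dev/null 2>&1; then
        alert_body "$host" | mailx -s "$subject" "$ALERT_TO"
    else
        alert_body "$host" >&2
    fi
}

send_alert
```

The script would then be registered with envconfig, probably along the lines
of "/usr/sbin/envconfig -c ENVMON_USER_SCRIPT=/usr/local/sbin/envmon_alert"
(the path is hypothetical; check envconfig(8) for the exact syntax).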

>
> Dear All,
>
> This weekend we had an air conditioning failure and the temperature in the
> machine room rose dramatically. The result was that every 18 minutes
> envmond shut down our AlphaServer 4100, only for it to reboot. This
> carried on for about 27 hours until eventually (I presume) the hardware
> detected an error and refused to come up.
>
> The machine was still extremely hot the following morning (about 12 hours
> later) so I'm not certain the power got turned off even then.
>
> I am worried about the damage that might be caused by this, and also about
> the heat being generated, which worsens the problem for the other servers
> in the room. Is it possible to configure the system so that when a high
> temperature is detected the machine shuts down and powers itself off? In
> fact I don't mind if any kind of crash causes a power off, because Tru64
> is so reliable we never get crashes (touch wood!).
>
> Our relevant variables are:
>
> # /usr/sbin/envconfig -q
> ENVMON_CONFIGURED = 1
> ENVMON_GRACE_PERIOD =
> ENVMON_MONITOR_PERIOD =
> ENVMON_HIGH_THRESH = 40
> ENVMON_USER_SCRIPT =
>
> $ /sbin/consvar -l |grep -i boot
> auto_action = BOOT
> boot_dev = rz25
> bootdef_dev = rz25
> booted_dev = rz25
> boot_file =
> booted_file =
> boot_osflags = A
> booted_osflags = A
> boot_reset = OFF
>
> Regards,
> Bob Vickers
> Dept of Computer Science, Royal Holloway, University of London
>
>


