Hardware Monitoring

From: Jonathan Williams (jonathw@shubertorg.com)
Date: Thu Jan 23 2003 - 14:19:06 EST


Tru64 Unix version 5.1a (no patches) on Alpha ES40s and ES45s

I've sent emails in the past asking about system tuning, performance monitoring,
application monitoring, etc. Now we want to focus on hardware monitoring, so
that we might pick up on an issue before it causes the system to crash. I know
that when a disk drive (for example) is going bad, it might log that there were
recoverable errors writing to it...it might be this way for a long time before
the disk actually crashes--it would be nice to do something before the crash. I
know there are programs out there that can read log files and tell you what
happened after the fact...but what about before? I'm just curious what others
out there are using for preventative hardware maintenance. I've heard of
Compaq Analyze (although I've never used it and am not really aware of its
functionality). I'm also aware of Big Brother and its ability to read through
log files periodically, searching for key words (like error and warning). What
else is out there? I'm interested in anything from the most robust/expensive
down to some simple shell scripts that someone may have written. I would like
something that can keep track of local devices and maybe even devices out on the
SAN. Thank you very much in advance.
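
For reference, the kind of periodic key-word search described above (looking
through log lines for words like error and warning) might be sketched roughly
as follows in Python. The function name scan_lines and the pattern are
illustrative assumptions, not part of Big Brother or Compaq Analyze, and a scan
like this only reports what has already been logged.

```python
import re

# Key words taken from the message ("error", "warning"); matched
# case-insensitively anywhere in the line, so "errors" also counts.
KEYWORDS = re.compile(r"error|warning", re.IGNORECASE)


def scan_lines(lines):
    """Return (line_number, text) pairs for lines containing a key word."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if KEYWORDS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Since an open file iterates line by line, it could be passed directly to
scan_lines, and the whole thing could be run periodically from cron, with the
hits mailed to an administrator.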

Jonathan Williams
Unix Systems Administrator
The Shubert Organization, Inc.


