Re: Hardware monitoring

From: damir delija (ddelija@SRCE.HR)
Date: Sat Dec 07 2002 - 01:23:28 EST


On Fri, Dec 06, 2002 at 10:14:07AM -0600, Bill Verzal wrote:
> Does anyone out there know of a utility that can monitor a server farm
> (pSeries, RS/6000, and SP) and show exceptions (drive failures,
> environmental sensors, etc) and control it all from a central station ?
>
> BV

As far as I know, there is no such thing.
This is too close to a silver bullet.

But some simple, useful improvisations can be used.
There is a whole life cycle for such a home-brew tool.

Basically, the first step is error log / syslog integration
with some swatch-type tool on the central syslog host.
Later come a console, fancy notification, trouble ticket
integration, etc ..., and not to forget the reports which management
likes so much ...
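
A minimal sketch of the swatch-type step, assuming classic
"Mon DD HH:MM:SS host message" syslog lines arriving on the central host.
The rule labels, the patterns, and the names scan() and parse_host() are
hypothetical and only illustrate the idea; real patterns have to come
from your own error log and syslog output.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Hypothetical rules: (label, pattern). The patterns are illustrative only
# and are not taken from real AIX error log or syslog messages.
RULES = [
    ("disk", re.compile(r"disk.*(fail|error)", re.IGNORECASE)),
    ("environment", re.compile(r"(temperature|fan|power supply)", re.IGNORECASE)),
]


class Alert(NamedTuple):
    host: str
    label: str
    line: str


def parse_host(line: str) -> str:
    """Return the host field of a 'Mon DD HH:MM:SS host msg' syslog line."""
    fields = line.split()
    return fields[3] if len(fields) > 3 else "unknown"


def scan(lines: Iterable[str]) -> Iterator[Alert]:
    """Yield one Alert per line, for the first rule that the line matches."""
    for line in lines:
        for label, pattern in RULES:
            if pattern.search(line):
                yield Alert(parse_host(line), label, line.rstrip("\n"))
                break
```

Something like scan(open(path)) over the central log file would feed
whatever console or notification layer comes later in the life cycle.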

If you have an SP, the CWS is an almost perfect destination for this ...
but do not rely on email / page notification only;
there must be a console!
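
One way to read the "there must be a console" rule, as a rough sketch:
every event is recorded on the console first, and email / pager delivery
is only best effort on top of that. The Console class and dispatch()
below are hypothetical names, not part of any existing tool.

```python
from typing import Callable, List


class Console:
    """Keeps every event so an operator can review the full history."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def record(self, event: str) -> None:
        self.events.append(event)


def dispatch(event: str, console: Console,
             notifiers: List[Callable[[str], None]]) -> int:
    """Record the event on the console, then try each notifier.

    Returns the number of notifiers that failed. A failed notifier
    never prevents the event from reaching the console.
    """
    console.record(event)
    failures = 0
    for notify in notifiers:
        try:
            notify(event)
        except Exception:
            failures += 1
    return failures
```

The point of the ordering is that a dead mail relay or pager gateway
costs you a notification, but not the record of the incident.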

On the SP, event monitoring is part of the system itself.
There are about 400 resource variables which can be bound to event
notifications, and there is a direct syslog / error log interface.
Events can be bound directly to recovery scripts (in theory),
but in my experience it is nearly impossible to run
such scripts in the wild without human intervention.
There is a redbook with basic examples of file system monitoring,
arming and rearming events, etc ...
This is part of PSSP, so a nice move would be to include as many
standalone AIX boxes in the SP as possible (there is no need
for switch connections, just for a serial connection).
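
The arming / rearming idea can be sketched generically as below. This is
not the PSSP event management interface, just an illustration of the
mechanism: an event fires once when a value crosses the arm threshold and
cannot fire again until the value has dropped below the rearm threshold.
The class name ThresholdEvent and the 90 / 80 percent figures in the
usage note are assumptions.

```python
from typing import Optional


class ThresholdEvent:
    """Fires once above arm_at, then stays silent until below rearm_at."""

    def __init__(self, arm_at: float, rearm_at: float) -> None:
        if rearm_at >= arm_at:
            raise ValueError("rearm_at must be below arm_at")
        self.arm_at = arm_at
        self.rearm_at = rearm_at
        self.armed = True  # ready to fire

    def observe(self, value: float) -> Optional[str]:
        """Return 'event', 'rearm', or None for one sample."""
        if self.armed and value > self.arm_at:
            self.armed = False
            return "event"
        if not self.armed and value < self.rearm_at:
            self.armed = True
            return "rearm"
        return None
```

For a file system monitor one could create ThresholdEvent(90, 80) and
feed it the percent-used figure at each poll; the gap between the two
thresholds keeps a file system hovering around 90% from generating a
flood of events.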

On other machines it depends on the HW and its capabilities,
but the same mechanism can be used.

You can also modify / write SNMP agents or interfaces for such purposes
(I once did such a thing for Solaris and it was very useful)

Actually,
there is Big Brother as a simple, extensible console
with a huge number of tools on www.deadcat.com;
you can easily do some improvisations.
Also, in the redbook on managing AIX farms there are a few chapters
about SNMP agent integration.
Some good ideas can be found in Sysadmin magazine,
and there were some rumours about an auto-management tool on AIX5L
(more or less binding event actions on ordinary AIX ..)

I hope this helps
PS: this text is a little fuzzy, but I've just finished
handling one such environmental incident ...

Damir Delija



