The AIX Error Logging Facility (explored in the article "The AIX Error Logging Facility", published in the Supplement to the June 2001 issue of Sys Admin Magazine) gives the administrator of an RS/6000 detailed monitoring and reporting of the health of system components, and often warns of impending problems early enough to act before they cause unscheduled downtime or data loss. In addition to error logging and analysis, Error Notification (errnotify) objects let the sysadmin automate troubleshooting and problem resolution, which reduces the time and resources needed to monitor the error log as well as the risk of missing a vital message.
An errnotify object is a "hook" into the error logging facility that causes a program to be executed whenever an error message is recorded that matches user-defined criteria. Using simple "one-liners" or complex scripts, the program can notify the administrator, analyze sense data received from a device, or run system-level diagnostics.
Error Notification (errnotify) objects are installed by creating a text file with the properly formatted contents of the object, and then adding it to the "errnotify" class of the ODM via the "odmadd" command. Each errnotify object should have the following descriptors (other descriptors are available, but they are not required for creating basic errnotify objects):
| Table 1: errnotify object class descriptors |
| Descriptor | Value | Description |
| en_name | Name (text string) | Specifies the name that will be used to identify this object within the errnotify class. |
| en_class | H (hardware errors) | Specifies the class of the error log entries to match. If not included in the object, or if defined as a null string, all classes of errors will be matched. |
| en_crcid | Identifier (text string) | Specifies the unique identifier for error messages to match. Valid identifiers can be viewed with the errpt -t command. |
| en_label | Label (text string) | Specifies the label associated with a particular identifier. Valid labels can be viewed with the errpt -t command. |
| en_persistenceflg | 0 (non-persistent), 1 (persistent) | Specifies whether this object is removed from the errnotify class at system restart (non-persistent objects are removed). Non-persistence is useful for objects that take action on a process identified by its PID. |
| en_pid | Process ID (numeric value) | Specifies a process ID for use in identifying the Error Notification object. Objects that have the en_pid descriptor specified should also have the en_persistenceflg descriptor set to "0". |
| en_rclass | Device Class (text string) | Specifies the device class for hardware resources to match. |
| en_type | INFO (informational), PEND (impending loss of resource), PERM (permanent), TEMP (temporary), UNKN (unknown) | Specifies the severity of error log entries to match. |
| en_method | Path and arguments to an executable program | Specifies the program to be run upon successful match of an error log entry. Parameter expansion for arguments is detailed in Table 2. |
Note that if a descriptor that is used to match part of an error log entry is not included in the object, or if its value is a null string, that descriptor will match all possible values.
The most important descriptor is "en_method", as it holds the command that is executed each time an error is logged that matches the object. A number of parameters are made available to the "en_method", and these may be passed as arguments to the specified program.
The parameters and a description of their contents are:
| Table 2: en_method parameters |
| Parameter | Description |
| $1 | Sequence number from the error log entry |
| $2 | Error ID from the error log entry |
| $3 | Class from the error log entry |
| $4 | Type from the error log entry |
| $5 | Alert flags from the error log entry |
| $6 | Resource name from the error log entry |
| $7 | Resource type from the error log entry |
| $8 | Resource class from the error log entry |
| $9 | Error label from the error log entry |
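To illustrate how these parameters reach a method, here is a minimal shell sketch of a program that accepts all nine arguments in the order given in Table 2 and prints a one-line summary. The function name `summarize_entry` and the output layout are assumptions made for illustration; they are not part of AIX.

```shell
#!/bin/sh
# Hypothetical errnotify method body: prints a one-line summary of the
# nine parameters listed in Table 2. The name summarize_entry and the
# output format are illustrative assumptions, not an AIX interface.
summarize_entry() {
    if [ "$#" -lt 9 ]; then
        echo "usage: summarize_entry seq id class type alert name rtype rclass label" >&2
        return 1
    fi
    # Positional parameters, in the order given in Table 2.
    seq=$1 id=$2 class=$3 type=$4 alert=$5
    name=$6 rtype=$7 rclass=$8 label=$9
    printf 'seq=%s id=%s class=%s type=%s alert=%s resource=%s (%s/%s) label=%s\n' \
        "$seq" "$id" "$class" "$type" "$alert" "$name" "$rtype" "$rclass" "$label"
}
```

If this function were saved in a script that ends with `summarize_entry "$@"`, the "en_method" descriptor would name that script followed by `$1` through `$9`, and each parameter would be filled in from the matching error log entry before the command is run.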
To add an Error Notification object that sends an e-mail to root each time an error of any type is added to the error log, create a file "/tmp/mailroot" with the following contents:
errnotify: |
After saving that file, run the command "odmadd /tmp/mailroot", and the object will be added to the "errnotify" ODM class. To verify that the object was installed correctly, run the command "odmget -q 'en_name=mailroot' errnotify", and the contents of the object will be displayed.
Once the above errnotify object is installed, an e-mail will be sent to root for every new entry in the error log. To confirm this, run the command "errlogger 'this is a test'", and root will receive an e-mail with the subject "errpt: OPMSG", containing the contents of the error log entry.
If, at some point, you wish to remove this object, execute the command "odmdelete -q 'en_name=mailroot' -o errnotify", and the object will be deleted from the ODM.
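The stanza files passed to "odmadd" share a simple layout: the class name followed by one descriptor per line. As a sketch, the helper below prints such a stanza using three of the descriptors from Table 1. The function name `print_stanza` is hypothetical, and the example values are taken from the cinnamon stanza shown in that script's documentation later in this article.

```shell
#!/bin/sh
# Hypothetical helper: prints an errnotify stanza in the layout used by
# the cinnamon example in this article. Only three descriptors are
# emitted; others (en_label, en_class, en_type, ...) can be added the
# same way. The method string must not contain double quotes.
print_stanza() {
    name=$1
    method=$2
    printf 'errnotify:\n'
    printf '\ten_name = "%s"\n' "$name"
    printf '\ten_persistenceflg = 1\n'
    printf '\ten_method = "%s"\n' "$method"
}

# Example: the single quotes keep $1 literal so that it is expanded by
# the error notification facility, not by this shell.
print_stanza "cinnamon" '/usr/local/bin/perl /usr/local/sbin/cinnamon $1'
```

Redirecting the output to a file gives something that can be reviewed by eye and then passed to "odmadd".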
Several of our RS/6000s are deployed as TSM servers and have multiple IBM Magstar 3590 tape drives attached. These "intelligent" tape drives send the host messages about the condition of the drive itself (a "SIM", or Service Information Message) and of the tapes used in it (a "MIM", or Media Information Message). How these messages are processed depends on the type of host to which the drive is connected. On AIX systems, SIM and MIM messages are recorded in the error log, with the identifier "D1A1AE6F" and the label "SIM_MIM_RECORD_3590".
Unfortunately, the information provided in the SIM or MIM message is encoded within a 144-character hexadecimal string, making it difficult to determine whether the message reports damaged media or simply notes that the drive was cleaned. To make these messages more useful, I wrote a script that is invoked by the Error Notification daemon as an errnotify method.
Each time a message is recorded in the error log with the "D1A1AE6F" identifier, the script (named "cinnamon") is invoked, with the sequence number of the error log entry passed as an argument. The script then retrieves the complete entry from the error log and parses the encoded message to determine its severity and contents. If the severity meets or exceeds a specified threshold, an e-mail is sent containing a "readable" version of the error message.
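The parsing step amounts to slicing fixed positions out of the hexadecimal string and looking up their meaning. As a rough shell sketch of one such lookup, the function below extracts the four-character message code at the offset the cinnamon script uses (offset 40, that is, characters 41 through 44) and translates it with the same table as the script. The function name `message_code` is an assumption for illustration.

```shell
#!/bin/sh
# Sketch: look up the SIM/MIM message code, which the cinnamon script
# reads from offset 40 (characters 41-44) of the 144-character detail
# data. The code-to-text table mirrors the one in the Perl script.
message_code() {
    detail=$1
    code=$(printf '%s' "$detail" | cut -c41-44)
    case $code in
        3030) echo "No Message" ;;
        3430) echo "Operator Intervention Required" ;;
        3431) echo "Device Degraded" ;;
        3432) echo "Device Hardware Failure" ;;
        3433) echo "Service Circuits Failed, Operations not Affected" ;;
        3535) echo "Clean Device" ;;
        3537) echo "Device has been cleaned" ;;
        3630) echo "Bad Media, Read-Only Permitted" ;;
        3631) echo "Rewrite Data if Possible" ;;
        3632) echo "Read Data if Possible" ;;
        3634) echo "Bad Media, Cannot Read or Write" ;;
        3732) echo "Replace Cleaner Cartridge" ;;
        *)    echo "UNKNOWN" ;;
    esac
}
```

The other fields (model, microcode level, FID, VOLSER) are extracted the same way from their own offsets, with the extra step of converting each pair of hex digits to an ASCII character.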
To add the object for the cinnamon script, create the text file /tmp/cinnamon with the following contents, and add it to the ODM with the command "odmadd /tmp/cinnamon".
errnotify: |
After adding the above stanza to the errnotify ODM class, each SIM or MIM message that is received at or above the severity threshold defined in the script will be parsed and mailed to the specified addresses.
Subject: SIM posted by rmt6: Device Degraded |
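The severity threshold can be confusing because "1" is the most severe level, so a message qualifies when its severity number is less than or equal to the configured minimum. The shell sketch below shows that comparison for SIMs, using the same hex-to-severity mapping as the cinnamon script. The function names are hypothetical, and unlike the Perl script this sketch simply skips unrecognized codes.

```shell
#!/bin/sh
# Sketch of the SIM severity threshold check. Severity 1 is the most
# severe; a message is mailed when its severity number is less than or
# equal to the threshold. Function names are illustrative assumptions.
sim_severity() {
    case $1 in
        33) echo 1 ;;   # Acute
        32) echo 2 ;;   # Serious
        31) echo 3 ;;   # Moderate
        30) echo 4 ;;   # Service
        *)  return 1 ;; # unrecognized code
    esac
}

# Succeeds (exit status 0) if a SIM with the given two-character hex
# severity code should be mailed under the given threshold.
should_mail_sim() {
    hexcode=$1
    threshold=$2
    sev=$(sim_severity "$hexcode") || return 1
    [ "$sev" -le "$threshold" ]
}
```

With a threshold of "2", for example, `should_mail_sim 33 2` and `should_mail_sim 32 2` succeed, while `should_mail_sim 31 2` fails, so only "Acute" and "Serious" SIMs would be mailed.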
The Error Logging Facility is one of the features that helps AIX stand out from other Unix platforms. By making use of the Error Notification object class, administrators of AIX systems can reduce the time they spend monitoring their systems, automate solutions to common problems, and improve the overall availability of their systems.
"The AIX Error Logging Facility", published in the AIX Supplement to the June 2001 issue of Sys Admin Magazine, is available online at <http://www.samag.com/documents/s=1150/sam0106a/0106a.htm>.
Chapter 4 of the IBM manual "General Programming Concepts: Writing and Debugging Programs" describes Error Notification, and was the primary source of information for this article. It can be read online at <http://www.rs6000.ibm.com/doc_link/en_US/a_doc_lib/aixprggd/genprogc/toc.htm>.
Additional examples of Error Notification object classes can be found in "/usr/samples/findcore", installed by the fileset bos.sysmgt.serv_aid, and in several documents from the IBM TechDocs website at <http://techsupport.services.ibm.com/rs6k/techbrowse/>.
Sandor W. Sklar is a Unix Systems Administrator at Stanford University, in California.
The Object Data Manager ("ODM") is a system-wide database used by AIX to store device configuration, system resource information, installed product data, and other information. It consists of files located in the directories "/usr/lib/objrepos", "/usr/share/lib/objrepos", and "/etc/objrepos", and is organized into "objects" and "classes". The class that stores error notification objects is called "errnotify", and is located in the /usr/lib/objrepos directory.
Administrators should not manipulate the ODM at the file-level; instead, the ODM commands ("odmadd", "odmdelete", "odmshow", etc.) should be used.
#!/usr/local/bin/perl -w
#=====================================================================
# cinnamon -- a perl script that translates the sense data from SIM
# and MIM messages posted by IBM 3590 tape drives into
# human-readable format, and sends the messages via email
#---------------------------------------------------------------------
# $Id: errnotify.html,v 1.1 2001/09/23 05:29:57 ssklar Exp $
#=====================================================================
use strict; $|++;
#=====================================================================
# USER-DEFINABLE VALUES
#=====================================================================
# the variable "$recipient" should be set to a comma-separated list of
# addresses to whom this script will send the parsed SARS email to.
# Note: don't forget to backslash any "@" signs, or the script will die.
# If this variable is not set, all mail will be sent to root.
my $recipient = "";
# the variables "$min_sim_sev" and "$min_mim_sev" should be set to the
# minimum severity value that emails should be sent for. Note that
# "1" is the highest severity for both MIMs and SIMs, while "4" is the
# lowest value for SIMs and "3" is the lowest value for MIMs. If these
# variables are not set, email will be sent for messages at all
# severity levels.
my $min_sim_sev = "";
my $min_mim_sev = "";
#======================================================================
# END OF USER-DEFINABLE VALUES
#======================================================================
#----------------------------------------------------------------------
# error checking and defaults setting ...
#----------------------------------------------------------------------
die "cinnamon is useful only on AIX systems. Sorry.\n" unless ($^O =~ /aix/);
$recipient = "root" unless $recipient;
$min_sim_sev = "4" unless ($min_sim_sev =~ /\d/);
$min_mim_sev = "3" unless ($min_mim_sev =~ /\d/);
#----------------------------------------------------------------------
# the sequence number of the error log entry that we were invoked for
# will be passed as the single argument; make sure that it is nothing
# other than digits ...
#----------------------------------------------------------------------
my $sequence_number = shift;
$sequence_number = "" unless defined $sequence_number;
chomp $sequence_number;
unless ($sequence_number =~ /^\d+$/)
{ die "cinnamon: error log sequence number needed as argument\n" };
#----------------------------------------------------------------------
# read in the full unformatted error log entry with the specified
# sequence number ...
#----------------------------------------------------------------------
open (ERROR, "/usr/bin/errpt -g -l $sequence_number |")
or die "cinnamon: couldn't run errpt: $!";
#------------------------------------------------------------------
# pull out the detail data from the error log entry ...
#------------------------------------------------------------------
my %message;
while (<ERROR>) {
$message{host} = (split)[1], next if /^el_nodeid/;
$message{drive} = (split)[1], next if /^el_resource/;
$message{detail} = (split)[1], last if /^el_detail_data/;
};
close (ERROR);
#------------------------------------------------------------------
# make sure that there are 144 hex digits in $message{detail}; if not,
# something went wrong, so die ...
#------------------------------------------------------------------
die "cinnamon: incomplete or incorrect error log values retrieved\n"
unless (defined $message{detail} && length($message{detail}) == 144);
#------------------------------------------------------------------
# get the "Machine Type" and convert it from hex to ascii ...
#------------------------------------------------------------------
$message{machine_type} = pack ("H*", substr($message{detail}, 128, 10));
#------------------------------------------------------------------
# get the "Model" and convert it from hex to ascii ...
#------------------------------------------------------------------
$message{model} = pack ("H*", substr($message{detail}, 138, 6));
#------------------------------------------------------------------
# get the "Model and Microcode Level" and convert it from hex
# to ascii ...
#------------------------------------------------------------------
$message{mml} = pack ("H*", substr($message{detail}, 32, 8));
#------------------------------------------------------------------
# get the "Message Code" and look up its meaning ...
#------------------------------------------------------------------
my %message_code = (
3030 => "No Message",
3430 => "Operator Intervention Required",
3431 => "Device Degraded",
3432 => "Device Hardware Failure",
3433 => "Service Circuits Failed, Operations not Affected",
3535 => "Clean Device",
3537 => "Device has been cleaned",
3630 => "Bad Media, Read-Only Permitted",
3631 => "Rewrite Data if Possible",
3632 => "Read Data if Possible",
3634 => "Bad Media, Cannot Read or Write",
3732 => "Replace Cleaner Cartridge"
);
$message{code} = $message_code{substr($message{detail}, 40, 4)} || "UNKNOWN";
#------------------------------------------------------------------
# determine if we're dealing with a SIM or a MIM ...
#------------------------------------------------------------------
if (substr($message{detail}, 16, 2) eq "01") {
#--------------------------------------------------------------
# it's a SIM ...
#--------------------------------------------------------------
$message{type} = "SIM";
#--------------------------------------------------------------
# convert the FID Severity Code into something meaningful ...
#--------------------------------------------------------------
my %fid_severity_code = (
33 => "1 -- Acute",
32 => "2 -- Serious",
31 => "3 -- Moderate",
30 => "4 -- Service"
);
$message{severity} = $fid_severity_code{substr($message{detail}, 52, 2)} || "UNKNOWN";
#--------------------------------------------------------------
# if the SIM is less severe than the $min_sim_sev threshold,
# exit now ...
#--------------------------------------------------------------
exit 0 unless (substr($message{severity}, 0, 1) <= $min_sim_sev);
#--------------------------------------------------------------
# get the FID (FRU Identification Number), and convert it from
# hex to ascii ...
#--------------------------------------------------------------
$message{fid} = pack ("H*", substr($message{detail}, 64, 4));
#--------------------------------------------------------------
# get the "First FSC" (Fault Symptom Code), and convert it from
# hex to ascii ...
#--------------------------------------------------------------
$message{first_fsc} = pack ("H*", substr($message{detail}, 68, 8));
#--------------------------------------------------------------
# get the "Last FSC" (Fault Symptom Code), and convert it from
# hex to ascii ...
#--------------------------------------------------------------
$message{last_fsc} = pack ("H*", substr($message{detail}, 76, 8));
} else {
#--------------------------------------------------------------
# it's a MIM ...
#--------------------------------------------------------------
$message{type} = "MIM";
#--------------------------------------------------------------
# convert the MIM Severity Code into something meaningful ...
#--------------------------------------------------------------
my %mim_severity_code = (
31 => "3 -- Moderate: high temporary read or write errors have occurred",
32 => "2 -- Serious: permanent read or write errors have occurred",
33 => "1 -- Acute: tape directory errors have occurred"
);
$message{severity} = $mim_severity_code{substr($message{detail}, 52, 2)} || "UNKNOWN";
#--------------------------------------------------------------
# if the MIM is less severe than the $min_mim_sev threshold,
# exit now ...
#--------------------------------------------------------------
exit 0 unless (substr($message{severity}, 0, 1) <= $min_mim_sev);
#--------------------------------------------------------------
# get the VOLSER (Volume Serial Number), and convert it from
# hex to ascii ...
#--------------------------------------------------------------
$message{volser} = pack ("H*", substr($message{detail}, 68, 12));
};
#------------------------------------------------------------------
# format the data and store it in the array @mail ...
#------------------------------------------------------------------
my @mail;
# the extra newline separates the mail header from the message body
push (@mail, sprintf("Subject: %s posted by %s: %s\n\n", $message{type}, $message{drive}, $message{code}));
push (@mail, sprintf("%-16s: %-20s\n", "Sequence Number", $sequence_number));
push (@mail, sprintf("%-16s: %-20s\n", "Host", $message{host}));
push (@mail, sprintf("%-16s: %-20s\n", "Drive", $message{drive}));
push (@mail, sprintf("%-16s: %-20s\n", "Model", $message{model}));
push (@mail, sprintf("%-16s: %-20s\n", "Microcode", $message{mml}));
push (@mail, sprintf("%-16s: %-20s\n", "Message Type", $message{type}));
push (@mail, sprintf("%-16s: %-20s\n", "Message Code", $message{code}));
push (@mail, sprintf("%-16s: %-20s\n", "Severity", $message{severity}));
if ($message{type} eq "SIM") {
push (@mail, sprintf("%-16s: %-20s\n", "First FSC", $message{first_fsc}));
push (@mail, sprintf("%-16s: %-20s\n", "Last FSC", $message{last_fsc}));
} else {
push (@mail, sprintf("%-16s: %-20s\n", "VOLSER", $message{volser}));
};
push (@mail, "\n\nRaw Sense Data:\n$message{detail}\n" . "-" x 72 . "\n\n");
#------------------------------------------------------------------
# open a pipe to sendmail and send the message ...
#------------------------------------------------------------------
open (SENDMAIL, "|/usr/sbin/sendmail $recipient") or
die "cinnamon: couldn't open sendmail: $!";
print SENDMAIL @mail;
close (SENDMAIL);
exit 0;
#======================================================================
# PROGRAM DOCUMENTATION: Run "perldoc cinnamon" to view ...
#======================================================================
=pod

=head1 NAME

B<cinnamon> -- an errnotify object method that translates the sense data posted to the AIX error log by an IBM 3590 tape drive (a SIM or a MIM) into a readable format, and mails it to a specified address

=head1 DESCRIPTION

B<cinnamon> (so named because I thought it sounded like "sim-mim-mon", my original name for the program) parses and mails AIX error log entries posted with the identifier B<D1A1AE6F>, which is the ERROR ID for B<SIM_MIM_RECORD_3590>.

SIM and MIM records are part of the "Statistical Analysis and Reporting System" (SARS), and are messages created by IBM 3590 tape drives that report on the condition of the drive (a SIM) or of the medium (a MIM). These records are presented by the operating system in different ways. In AIX, SIMs and MIMs are recorded in the error log, with the actual information encoded into a 144-character hexadecimal string.

=head1 CONFIGURATION

There are three user-definable values that can be set at the beginning of this script. If they are not defined, default values will be used, as described below.

=over 4

=item B<$recipient>

The variable B<$recipient> may be set to one or more e-mail addresses to which the output of this script will be mailed. Any "@" signs in the string B<must> be backslash-protected; multiple addresses should be separated by commas, with all addresses inside a single set of double quotes.

If this variable is not set, the output of the script will be mailed to "root".

=item B<$min_sim_sev>

The variable B<$min_sim_sev> defines the lowest severity level of SIM messages that will be parsed and mailed. The severity levels for SIMs range from "4" (a "Service" type message, the lowest severity) to "1" (an "Acute" problem, probably resulting from hardware failure). To have the script parse and mail only SIMs with a severity of "1" or "2", set $min_sim_sev to "2".

If this variable is not set, SIMs of all severity levels will be parsed and mailed.

=item B<$min_mim_sev>

The variable B<$min_mim_sev> defines the lowest severity level of MIM messages that will be parsed and mailed. The severity levels for MIMs range from "3" (a "Moderate", temporary error) to "1" (an "Acute" problem, resulting from tape directory errors). To have the script parse and mail only MIMs with a severity of "1" or "2", set $min_mim_sev to "2".

If this variable is not set, MIMs of all severity levels will be parsed and mailed.

=back

=head1 USAGE

This program is designed to be used as an B<errnotify> method added to the ODM, so that it will be invoked by the system each time an errpt entry is logged that matches the descriptor values of a 3590 SIM or MIM message.

To create the B<errnotify object>, save the following text to the file B</tmp/cinnamon.add>:

    errnotify:
        en_name = "cinnamon"
        en_persistenceflg = 1
        en_label = "SIM_MIM_RECORD_3590"
        en_class = "H"
        en_type = "INFO"
        en_method = "/usr/local/bin/perl /usr/local/sbin/cinnamon $1"

(Note: use the proper paths to your perl executable and to this program in the above "en_method" line.)

After saving the above text, run the command:

    odmadd /tmp/cinnamon.add

The error notification object will be added to the ODM. To verify that the object was added to the ODM properly, run the command:

    odmget -q "en_name='cinnamon'" errnotify

To remove the object from the ODM (why would you want to do that?), run the command:

    odmdelete -q "en_name='cinnamon'" -o errnotify

=head1 AUTHOR

Sandor W. Sklar
Unix Systems Administrator
Stanford University ITSS-CSS

If this script is useful to you, or even if it is of no use to you, or you have some changes/improvements/questions/extra money, please send me an email.

=head1 FOR MORE INFORMATION

Most of the parsing that this script does was derived from the IBM publication "Statistical Analysis and Reporting System User Guide".

Information about creating custom error notification objects can be found in Chapter 4 of the IBM manual "General Programming Concepts: Writing and Debugging Programs".

=head1 COPYRIGHT

This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.

=cut