SUMMARY: Event Messages aren't being directed to the console port

From: Halte, James (JHALTE@ci.tacoma.wa.us)
Date: Mon Oct 15 2007 - 13:48:26 EDT


ORIGINAL EMAIL: For some reason, event messages (syslog, evm, etc.) aren't being sent out the console port/device (COM port 1) of my GS160 server. I'm using Console Works to monitor the messages from this server via the serial COM port. This was all working fine until I rebooted the server for maintenance, and it has now happened on two occasions (once in April and once in Aug. '07). Back in April, rebooting the server a second time fixed the issue. For this recent occurrence, I am holding off on rebooting while I try to determine the root cause.
In addition, I can type an event message at the command line, and the Console Works application then receives the string and alarms on it.

Thanks to Benjamin Ingwersen for his pinpoint accuracy on the cause of this issue. Also, thanks to Maria Gilliland for her responses in support of the Console Works end of this issue.
Ultimately, the solution was to replace the PSM module in QBB1 (Quad Building Block) of my GS160, as the original had a hardware revision that conflicted with the other PSM modules in my system.

I've summarized the email threads below, from start to finish:

Benjamin,
James, did you install any new hardware when you did the maintenance?

RE: Interesting that you say that... The reason for the reboot in April was a QBB (CPU Quad Building Block) replacement. That in turn required an Alpha System Firmware upgrade to Version 7.2.
------------
Benjamin,
Was it an upgrade for speed/cache or just a replacement? Also, is it a mod 8 or mod 16? You may have a rev issue somewhere; let's see if I can help you figure out where. Let me know on the above. Also, did they replace all your cards, or just the system/firebox, or the backplane on one side?

RE: Just a replacement, because I was getting some minor QBB errors being reported, and HP suggested replacement of the PSM module. (Sorry, I wasn't quite clear that it was just a PSM replacement).
------------
Benjamin,
Do a show fru from the console prompt and see what rev your PSM cards are; make sure they match. If they do, check that the global port module cables, front and rear, are seated properly and do not have damaged pins. Chances are your PSM cards are rev b2 and a01, and this is causing a conflict. The only other thing I can think of at the moment is that the switch on the front of one of the PSMs is switched to serve mode. Let me know. Now I'm curious. That's the wonderful thing about these wildfire boxes: they can be a pain to get up and running, but once they are, they stay that way for quite a while!!

RE: Well, it seems that there is only one way to test this...by shutting down to SRM and performing a show fru. By the way, I hope you don't mind me copying one of the other responders to this issue, as well as my Console Works support contact (he is also a tru64-unix-managers@ornl.gov member).
------------

Maria suggested the following:
Have you tried an "echo" command from a normal login?
# echo "alarm string" > /dev/console

Have you tried restarting syslogd?
# kill -HUP pid

I rebooted yesterday afternoon, and this morning I performed the echo test from an xterm directly on the server (not from a CW console window connection). After a few seconds (about 20 sec), the event popped up in CW.
This is a great test of the functionality. Also, I've made no changes to syslog.conf since the installation of CW back in Aug. 2006.

In the meantime, I've tried killing the syslogd process, and this didn't fix anything. I did find one thing: entries going to "syslog.dated/current/user.log" are being sent to the serial port, but entries going to "syslog.dated/current/auth.log" are not.
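
Since the behavior differs by log file (user.log vs. auth.log), one way to narrow it down is to send a tagged test message to each syslog facility and watch which ones show up on the console. The following is only a sketch: the tag "constest", the facility list, and the function names are made up for illustration, and it assumes the local logger command accepts the common -t (tag) and -p (facility.level) options.

```shell
#!/bin/sh
# Sketch: send one tagged test message per syslog facility.
# LOGGER may be overridden (for example LOGGER=echo) for a dry run.
LOGGER=${LOGGER:-logger}
TAG="constest"    # hypothetical tag, chosen only to make the messages easy to spot

# Build the message text for one facility.
test_message() {
    echo "console routing test facility=$1"
}

# Send a notice-level message to each facility in the (illustrative) list.
send_tests() {
    for fac in user auth daemon; do
        $LOGGER -t "$TAG" -p "$fac.notice" "$(test_message "$fac")"
    done
}
```

Calling send_tests as root and then checking the console (or the Console Works window) for the "constest" lines should show which facilities are actually being forwarded.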

Benjamin... Here are the FRU entries for my PSMs:
QBB0.PSM 00 54-25074-02.B02 AY20302076
QBB1.PSM 00 54-25074-01.L01 SM02201595
QBB2.PSM 20 54-25074-02.B01 AY11201316
QBB3.PSM 00 54-25074-02.B01 AY11201273
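
For a system with many modules, the same comparison can be done mechanically. The sketch below is not an HP tool; it assumes the show fru lines have been saved as text in the layout shown above (name, a status field, part number with revision, serial number) and that the part number before the dot, including its "-01"/"-02" suffix, is what needs to match. The function name check_psm_variants is made up for illustration.

```shell
#!/bin/sh
# Sketch: flag PSM entries whose part number differs from the most common one.
# Reads FRU lines such as "QBB0.PSM 00 54-25074-02.B02 AY20302076" on stdin.
check_psm_variants() {
    awk '
    $1 ~ /\.PSM$/ {
        split($3, p, ".")          # p[1] = part number, p[2] = revision
        name[++n] = $1
        part[n] = p[1]
        count[p[1]]++
    }
    END {
        # Find the part number used by the most PSMs.
        for (k in count)
            if (count[k] > best) { best = count[k]; common = k }
        # Report every PSM that does not use it.
        for (i = 1; i <= n; i++)
            if (part[i] != common)
                printf "%s: %s differs from %s\n", name[i], part[i], common
    }'
}
```

Fed the four lines above, this reports only QBB1.PSM, the one with the "-01" part number. Note that it compares part numbers only; the revision after the dot (B01 vs. B02) is ignored.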

------------

RE: Benjamin,

I got your phone message... What you're saying makes sense, in that the PSM with the hardware level of "-01" appears to be out of place. I will place a hardware call with HP to get it replaced, and in turn see if this issue gets resolved.
HP replaced the PSM for QBB1, with a version 54-25074-02.B01, on 9/18/2007. We've not seen a problem since.

-----------------------------------------------------------------------------
Thank you, and please feel free to contact me by phone or email.
 
James Halte
Tacoma Power - T&D EMS Engineer
Office: (253) 502-8094
Cell:    (253) 381-4088
 


