[SUMMARY]: memory channel scary messages on 5.1A

From: speakmaj@mskcc.org
Date: Wed Jun 18 2003 - 14:04:21 EDT


     Hi all
     
     Thanks to Philip, Tom and Alan. The conclusion is that one memory
     channel failed and the cluster failed over to the other one; it has
     nothing to do with link aggregation. I confirmed this by going into
     the room and looking at the LEDs on the back: the primary MC had one
     amber light (bad), the secondary one had two green (good).
     
     Opinion was divided as to whether this merited a service call. As the
     system is still not live, I pulled the cluster down and ran mc_cable
     and mc_diags from firmware. No problems reported. Booting the
     cluster again brought everything back up fine: two pairs of green
     LEDs and nothing bad in the logs. I am putting it down to the MC
     cables needing reseating, construction in the machine room, and the
     fact that it happened on Friday the 13th. At least we know the
     failover works...
     
     John

______________________________ Reply Separator _________________________________
Subject: memory channel scary messages on 5.1A - link aggregation?
Author: speakmaj (speakmaj@mskcc.org) at Internet
Date: 6/16/2003 1:33 AM

     Hello all
     
     We are preparing to move our production database to an Oracle 9i RAC
     cluster of GS80s running Tru64 5.1A PK4. The new cluster seems to be
     running just fine and we have no complaints. Well, just one. Out of
     the blue we noticed the messages below in /var/adm/messages. They
     look scary, although we didn't experience any problems (but then, we
     are not running production yet...). There is nothing within about
     24 hrs either side of this snip of /var/adm/messages. The only
     vaguely notable thing is that a day or so earlier we switched on
     link aggregation on the NICs; after fiddling with the switches a
     bit, it works fine. Does anyone think I should worry about this?
     
     Thanks
     John
     --------------------------------------------------------------------
     
     Jun 13 11:25:34 crdbds1 vmunix: rm_get_errcnt_lock failed on mchan1 ret 2. Clear and restart
     Jun 13 11:25:34 crdbds1 last message repeated 5 times
     Jun 13 11:25:34 crdbds1 vmunix: rmerror_int: mchan1 double failure
     Jun 13 11:25:34 crdbds1 vmunix: rm: logical rail 0 moved from phys_rail 0 offset 0 MB
     Jun 13 11:25:34 crdbds1 vmunix: rm: to phys_rail 1 offset 0 MB
     Jun 13 11:25:34 crdbds1 vmunix: rm_state_change: mchan1 slot 1 offline
     Jun 13 11:25:34 crdbds1 vmunix: rm primary: mchan1, hubslot = 0, phys_rail 0 removed
     Jun 13 11:25:34 crdbds1 vmunix: rm primary: mchan1, hubslot = 0, phys_rail 0 (size 512 MB)
     Jun 13 12:05:28 crdbds1 vmunix: rm_get_errcnt_lock failed on mchan1 ret 2. Clear and restart
     Jun 13 12:05:28 crdbds1 last message repeated 5 times
     Jun 13 12:05:28 crdbds1 vmunix: rmerror_int: mchan1 double failure
     Jun 13 12:05:28 crdbds1 vmunix: rm_state_change: mchan1 slot 1 offline
     Jun 13 12:05:28 crdbds1 vmunix: rm primary: mchan1, hubslot = 0, phys_rail 0 removed
     Jun 13 12:05:28 crdbds1 vmunix: rm primary: mchan1, hubslot = 0, phys_rail 0 (size 512 MB)
     Jun 13 18:15:32 crdbds1 vmunix: rm_get_errcnt_lock failed on mchan1 ret 2. Clear and restart
     Jun 13 18:15:32 crdbds1 last message repeated 5 times
     Jun 13 18:15:32 crdbds1 vmunix: rmerror_int: mchan1 double failure
     Jun 13 18:15:32 crdbds1 vmunix: rm_state_change: mchan1 slot 1 offline
     Jun 13 18:15:32 crdbds1 vmunix: rm primary: mchan1, hubslot = 0, phys_rail 0 removed
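
     For anyone who wants to pick these events out of a longer
     /var/adm/messages, here is a minimal Python sketch. It is an
     illustration only: the function names (unwrap, rail_events) and the
     PATTERNS table are hypothetical, and the regular expressions are
     based solely on the message wording in the excerpt above, so other
     variants of the rm messages will probably not be matched. It also
     joins continuation lines in case the log was wrapped by a mail
     client.

```python
import re

# Syslog-style prefix as it appears in the excerpt: "Jun 13 11:25:34 host rest"
STAMP = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+(.*)$")

# Message shapes copied from the excerpt; this is an assumption, not a
# complete list of what the memory channel driver can log.
PATTERNS = [
    (re.compile(r"rmerror_int: (\S+) double failure"),
     "double failure on {0}"),
    (re.compile(r"logical rail (\d+) moved from\s+phys_rail (\d+)"),
     "logical rail {0} moved off phys_rail {1}"),
    (re.compile(r"rm_state_change: (\S+) slot (\d+) offline"),
     "{0} slot {1} offline"),
]


def unwrap(lines):
    """Join wrapped continuation lines onto the preceding stamped line."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if STAMP.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def rail_events(lines):
    """Return (timestamp, description) pairs for the rail events found."""
    events = []
    for record in unwrap(lines):
        stamped = STAMP.match(record)
        if not stamped:
            continue
        when, _host, message = stamped.groups()
        for pattern, template in PATTERNS:
            found = pattern.search(message)
            if found:
                events.append((when, template.format(*found.groups())))
    return events
```

     Something like rail_events(open("/var/adm/messages")) would then
     list each double failure, rail move and offline transition with its
     timestamp, which makes it easier to see whether the failover
     happened once or keeps recurring.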
     
     
     


