Re: Error log entries following SP Switch Efence

From: JOSEPH KREMBLAS (jkremblas@REDHEARTGIFTS.COM)
Date: Wed Jan 14 2004 - 10:27:52 EST


Simon,

        A transient error would concern me. I recommend looking at the
/var/adm/SPlogs/css/out.top file; it should give you a clearer explanation of
whether the transient error is normal. Also, check for
/var/adm/SPlogs/css/cable_miswire (hopefully it doesn't exist!) on the
primary node.

        Sample out.top file:

S 15 2 tb3 1 0 E01-S17-BH-J8 to E01-N10 -4 R:
     device has been removed from network - faulty
     (link has been removed from network - fenced)

        The -4 error code is normal for a fence operation.
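
        If out.top is long, a small script can pull out the link lines
and their status codes. This is only a sketch: it assumes the status code is
the signed integer immediately before "R:", as in the sample line above, and
treating every nonzero code as worth a look is my assumption, not something
from the PSSP documentation. The names link_status and flagged_links are
made up for illustration.

```python
import re

# Assumption: the status code is the signed integer right before "R:",
# as in the sample out.top line; other layouts are not handled.
STATUS_RE = re.compile(r"\s(-?\d+)\s+R:\s*$")


def link_status(line):
    """Return the integer status code from an out.top link line, or None."""
    match = STATUS_RE.search(line)
    return int(match.group(1)) if match else None


def flagged_links(lines):
    """Yield (line, code) for every link line with a nonzero status code."""
    for line in lines:
        code = link_status(line)
        if code is not None and code != 0:
            yield line.strip(), code


sample = [
    "S 15 2 tb3 1 0 E01-S17-BH-J8 to E01-N10 -4 R:",
    "     device has been removed from network - faulty",
    "     (link has been removed from network - fenced)",
]

for line, code in flagged_links(sample):
    print(code, line)
```

        The indented comment lines carry no "R:" marker, so they are
skipped and only the link line itself is reported.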

        Explanation: The link is not operational and was removed from the
switch network.
        Cause: Either the link is miswired or the link has failed.

        Action: First check the /var/adm/SPlogs/css[0 | 1]/p0 directory for
the existence of a cable_miswire file. If the file exists, verify and
correct all links listed in the file. Then issue the Estart command.

        If the cable_miswire file does not exist, examine the
/var/adm/SPlogs/css[0 | 1]/p0/flt file for entries relating to this link. If
entries are found, verify that the cable is seated at both ends, then run
the Estart command. If the problem persists, contact the IBM Support Center.
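
        The two checks above can be sketched as a small helper. It only
looks at the files and suggests the next step; it does not run Estart. The
function name triage_fenced_link is hypothetical, the log directory is
passed in by the caller, and the idea that flt entries contain the link
label verbatim (e.g. "E01-S17-BH-J8") is an assumption I have not verified.

```python
from pathlib import Path


def triage_fenced_link(log_dir, link_label):
    """Suggest the next step for a fenced link.

    log_dir    -- directory expected to hold cable_miswire and flt
    link_label -- text identifying the link; assumed (not confirmed) to
                  appear verbatim in flt entries
    """
    log_dir = Path(log_dir)

    # Step 1: a cable_miswire file means the wiring must be corrected first.
    if (log_dir / "cable_miswire").exists():
        return "verify and correct all links listed in cable_miswire, then run Estart"

    # Step 2: otherwise look for flt entries that mention this link.
    flt = log_dir / "flt"
    if flt.exists():
        with flt.open(errors="replace") as handle:
            if any(link_label in line for line in handle):
                return "check that the cable is seated at both ends, then run Estart"

    return "no cable_miswire file and no flt entries found for this link"
```

        On the primary node this might be called as
triage_fenced_link("/var/adm/SPlogs/css0/p0", "E01-S17-BH-J8"), with the
directory adjusted to whichever css0 or css1 path applies.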

                --joseph
-----Original Message-----
From: IBM AIX Discussion List [mailto:aix-l@Princeton.EDU] On Behalf Of
Green, Simon
Sent: Wednesday, January 14, 2004 6:37 AM
To: aix-l@Princeton.EDU
Subject: Re: Error log entries following SP Switch Efence

My main interest is not - yet - figuring out what this means, but simply
whether it's normal when you fence a node.

Here's the detailed error report, anyway.

LABEL: TB3_TRANSIENT_RE
IDENTIFIER: E94651FA

Date/Time: Wed Jan 14 10:28:46
Sequence Number: 166211
Machine Id: XXXXXXXXXXXX
Node Id: XXXXXXXX
Class: H
Type: TEMP
Resource Name: css
Resource Class: NONE
Resource Type: NONE
Location: NONE

Description
Switch adapter transient error

Probable Causes
Loose, disconnected or bad switch cable

User Causes
Switch cable loose or disconnected

        Recommended Actions
        Check / reconnect / replace cable if problem persists

Failure Causes
Switch cable faulty

        Recommended Actions
        Check / reconnect / replace cable if problem persists

Detail Data
DETECTING MODULE
PSSP,TB3recovery.c,1.41,660
ERROR ID
6u5ZFd1SbF/.//LM1H0KZb0...................
REFERENCE CODE
7a12...SbF/./AHM1H0KZb0...................
Interrupt source (ISR or MX CFG3)
TBIC intr

Bus Err (DMA CSR, MX MBA_ER, PCI C/S)
        b8000240

> -----Original Message-----
> From: JOSEPH KREMBLAS [mailto:jkremblas@REDHEARTGIFTS.COM]
> Sent: 14 January 2004 13:20
> To: aix-l@Princeton.EDU
> Subject: Re: Error log entries following SP Switch Efence
>
>
> Can you provide the detailed results from errpt? I'd like to see if
> there are any sense error codes for the transient error.


