DCPID - Sample Script - Cluster hangs for over 300 Seconds

From: David.Knight@clubcorp.com
Date: Wed May 12 2004 - 09:49:25 EDT


Managers et al,

        I have been looking into DCPID to profile my kernel as part of my
troubleshooting efforts. If anyone has information on the DCPI (
http://h30097.www3.hp.com/dcpi/ ) commands, I would greatly appreciate
any help with running this utility correctly.

Now to the problem:
        I have a three-member cluster running TruCluster 5.1B with every
patch kit (3) and early release patch installed. The cluster (or sometimes
two members, or even just one) will hang for anywhere from 100 to 300
seconds. During this period everything stops responding; I can't even run
a `pwd` command. I have plenty of collect data from when the problem
happens, but during the hang even collect stops collecting information, so
there is a gap in the logs. I get the messages below from EVM. Some apps
crash and report timeout errors, but Oracle seems to make it through the
hard times. If anyone out there has experienced this or something close to
it, your advice would be greatly appreciated.

Thanks in advance,
David Knight
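
Since collect stops sampling during a hang, the gap between consecutive
samples gives a rough measure of how long each hang lasted. The following
is only a minimal sketch in Python, not a tested tool: it assumes the
sample timestamps have already been exported as text, one per line, in the
same DD-Mon-YYYY HH:MM:SS form that EVM prints. The function names, the
10-second sampling interval, and the tolerance factor are assumptions made
for illustration; adjust them to match how collect was actually run.

```python
from datetime import datetime, timedelta


def parse_times(lines, fmt="%d-%b-%Y %H:%M:%S"):
    """Parse one timestamp per line, skipping blank lines.

    The default format matches strings such as '07-May-2004 17:30:17'.
    %b depends on the locale; English month names are assumed here.
    """
    return [datetime.strptime(line.strip(), fmt) for line in lines if line.strip()]


def find_gaps(timestamps, expected_interval_s=10, tolerance=2.0):
    """Return (start, end, seconds) for each gap between consecutive samples
    that is longer than tolerance * expected_interval_s.

    expected_interval_s is an assumed sampling interval, not a value taken
    from the original post.
    """
    threshold = timedelta(seconds=expected_interval_s * tolerance)
    ordered = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur - prev
        if delta > threshold:
            gaps.append((prev, cur, delta.total_seconds()))
    return gaps
```

Running find_gaps() on the samples from each member separately should
show whether the members stall at the same moment or one after another.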

============================ Syslog event ============================
EVM event name: sys.unix.syslog.daemon

    Syslog daemon events are posted by system daemons to alert the
    administrator to an unusual condition. The user name field usually
    indicates which daemon posted the event. The text of the message
    indicates the reason for the event.

======================================================================

Formatted Message:
    CAAD[1049179]: RTD #0: Action Script
    /var/cluster/caa/script/cluster_lockd.scr(check) timed out! (timeout=60)

Event Data Items:
    Event Name : sys.unix.syslog.daemon
    Priority : 600
    PID : 1049080
    PPID : 1048577
    Event Id : 5547
    Member Id : 2
    Timestamp : 05-Jan-2004 08:14:40
    Host IP address : 10.10.5.140
    Cluster IP address: 10.10.5.151
    Host Name : dalunix140.clubcorp.com
    Cluster Name : dalunixcl
    User Name : root
    Format : CAAD[1049179]: RTD #0: Action Script
                        /var/cluster/caa/script/cluster_lockd.scr(check)
                        timed out! (timeout=60)
    Reference : cat:evmexp.cat:200

Variable Items:
    None

======================================================================

============================ Syslog event ============================
EVM event name: sys.unix.syslog.daemon

    Syslog daemon events are posted by system daemons to alert the
    administrator to an unusual condition. The user name field usually
    indicates which daemon posted the event. The text of the message
    indicates the reason for the event.

======================================================================

Formatted Message:
    CAAD[1573780]: RTD #0: Action Script
    /var/cluster/caa/script/swcc.scr(check) timed out! (timeout=60)

Event Data Items:
    Event Name : sys.unix.syslog.daemon
    Priority : 600
    PID : 1573526
    PPID : 1572865
    Event Id : 14788
    Member Id : 3
    Timestamp : 07-May-2004 17:30:17
    Host IP address : 10.10.5.170
    Cluster IP address: 10.10.5.151
    Host Name : dalunix170.clubcorp.com
    Cluster Name : dalunixcl
    User Name : root
    Format : CAAD[1573780]: RTD #0: Action Script
                        /var/cluster/caa/script/swcc.scr(check) timed out!
                        (timeout=60)
    Reference : cat:evmexp.cat:200

Variable Items:
    None

======================================================================
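
To line the timeouts up against the gaps in the collect data, it can help
to reduce the detailed event text above to one record per timed-out CAA
action script. The following is a rough sketch in Python that assumes the
events have been saved as plain text in exactly the layout shown above.
The function names are hypothetical, and the parser only understands the
"Formatted Message" and "Event Data Items" sections of this particular
layout; it is not a general EVM parser.

```python
import re

# A header line such as "===== Syslog event =====" starts a new event.
HEADER = re.compile(r"^=+ .+ =+$")

# Matches "Action Script <path>(<action>) timed out! (timeout=<n>)".
TIMEOUT = re.compile(
    r"Action Script\s+(?P<script>\S+)\((?P<action>\w+)\)"
    r"\s+timed\s+out!\s*\(timeout=(?P<timeout>\d+)\)"
)

SECTIONS = ("Formatted Message:", "Event Data Items:", "Variable Items:")


def split_events(text):
    """Split the text into lists of lines, one list per event."""
    events, current = [], None
    for line in text.splitlines():
        if HEADER.match(line.strip()):
            if current is not None:
                events.append(current)
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        events.append(current)
    return events


def parse_event(lines):
    """Return (message, fields) for one event.

    Wrapped message lines are joined with single spaces, so a message
    broken across lines is still matched as a whole.
    """
    message_parts, fields, section = [], {}, None
    for line in lines:
        stripped = line.strip()
        if stripped in SECTIONS:
            section = stripped
        elif section == "Formatted Message:":
            if stripped and not stripped.startswith("="):
                message_parts.append(stripped)
        elif section == "Event Data Items:" and ":" in stripped:
            # Split on the first colon only; timestamps contain colons too.
            key, _, value = stripped.partition(":")
            fields[key.strip()] = value.strip()
    return " ".join(message_parts), fields


def find_timeouts(text):
    """Return one dict per event whose message reports a script timeout."""
    hits = []
    for lines in split_events(text):
        message, fields = parse_event(lines)
        match = TIMEOUT.search(message)
        if match:
            hits.append({
                "member": fields.get("Member Id"),
                "host": fields.get("Host Name"),
                "timestamp": fields.get("Timestamp"),
                "script": match.group("script"),
                "action": match.group("action"),
                "timeout_s": int(match.group("timeout")),
            })
    return hits
```

Sorting the resulting records by timestamp and member makes it easier to
see whether the check scripts time out on all members at once or only on
the member that is hanging.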


