TruCluster v5.1A server boot PANIC problem

From: Gergen, Peter (petergergen@kpmg.com.au)
Date: Thu Apr 17 2003 - 08:37:32 EDT


Hi Fellow Managers,

I have created a test cluster environment so that I can test patch kit
upgrades before applying them to production servers.
While setting up this test cluster, I ran into a problem that I had not
seen before.
The cluster consists of 2 x 500au Personal Workstations with KZPSA
controllers connected to an SWXRC-04 (HSZ40 equivalent).
I got a single-node cluster running on v5.1A with PK2. I then added the
second cluster member's boot disk with clu_add_member and booted that
second member while the first was running. This worked, and I was able to
watch the second node being configured and updated. The network
configuration failed, so I completed it manually (with some problems), and
the second node then came up.
I could change this node to run level S and back to run level 3, or to any
other run level and back, as long as the server was not rebooted or shut
down and then restarted. If the server was shut down or rebooted, the
following message appeared and the server panicked and returned to the SRM
prompt:
CNX MGR: Invalid configuration for cluster seq disk
There appears to be a patch for this in v5.1B, but I can find no reference
to it for v5.1A. See the documentation below.
Any assistance in solving this problem would be appreciated.

******************************************************************************************

I did some digging and came up with this, but it is for v5.1B:
From:
http://ftp1.support.compaq.com/public/unix/v5.1b/TruCluster_V5.1B/doc/txt/TCRPAT00005000540.txt
PROBLEM: (93677, 92409, 94911, 92799) (PATCH ID: TCR540-012)
********
PROBLEM: (93677) (PATCH ID: )
This patch improves the responsiveness of EINPROGRESS handling during the
issuing of I/O barriers. The fix removes a possible infinite loop scenario
which could occur due to the deletion of a storage device.
The issue with EINPROGRESS responsiveness is the continued looping while
waiting for a disk structure to become available. No attempts were being
made to force the availability of the disk structure.
In addition, no retry limit was being enforced and no checks were being made
for deleted devices. This combination presents the possibility of infinite
retry attempts.

PROBLEM: (92409) (PATCH ID: )
This patch fixes a CNX manager panic encountered while multiple cluster
nodes are booted simultaneously.
The panic string seen is: CNX MGR: Invalid configuration for cluster seq
disk

From: Patch Summary and Release Notes for Patch Kit 1 for v5.1B
This manual contains information specific to Patch Kit 1 of the Tru64 UNIX
operating system and TruCluster Server software products for Version 5.1B.

Number: Patch 50.00
Abstract: Fixes a regression associated with non-SCSI storage
State: Supersedes Patch 27.00, 28.00, 29.00, 31.00
This patch:
* Fixes a regression associated with non-SCSI storage.
* Improves the responsiveness of EINPROGRESS handling during the
issuing of I/O barriers by removing a possible infinite loop scenario that
could occur due to the deletion of a storage device.
* Fixes a problem that causes a panic with the message "CNX MGR:
Invalid configuration for cluster seq disk" during simultaneous booting of
cluster nodes.
* Fixes a possible race condition between a SCSI reservation conflict
and an I/O drain, which could result in a hang.
* Alleviates a condition in which a cluster member takes an extremely
long time to boot when using LSM.
* Fixes a problem in the cluster kernel where a cluster member panics
while doing remote I/O over the interconnect.
* Corrects an issue to allow the Device Request Dispatcher, DRD, to
retry to get disk attributes when EINPROGRESS is returned from the disk
driver.
* Fixes a problem in which access to the quorum disk can be lost if
the quorum disk is on a parallel SCSI bus and multiple bus resets are
encountered.
******************************************************************************************

Regards

Peter Gergen
Nexus Tru64/HP-UX/Win2K/Oracle System Administrator
Tel: (03) 9288 6236 / 0418 475 575
petergergen@kpmg.com.au
kpmg Melbourne Australia




This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:49:16 EDT