TruCluster 5.1B 'clu_upgrade clean' fails to run

From: SMITH, Gavin (gavin.smith@airwidesolutions.com)
Date: Thu Nov 11 2004 - 12:24:57 EST


Hi Managers,

We're in the process of upgrading several of our 5.1A production systems.
Our lab systems and those we've done to date have been pretty much trouble
free. One particular two-node cluster has been less successful (attempting
to upgrade to 5.1B and Patch Kit #4).

* Firstly, the 'clu_upgrade setup' reported the following errors:

clubase: Entry not found in /cluster/admin/tmp/stanza.stdin.862581

clubase: Entry not found in /cluster/admin/tmp/stanza.stdin.863378

When asked if we wished to proceed, we decided to continue and roll back from
backups if we hit problems.

* After the postinstall stage, when the second member was taken down prior
to rolling, the following error was reported:

Member '1' appears to be booting or is in single-user mode. Either wait for
the node to continue booting, boot the node to multiuser mode, or halt the
node using the shutdown command. After all members are running in multiuser
mode or halted, you can rerun the clu_upgrade command.
/usr/sbin/clu_upgrade[3639]: test: argument expected

* The roll completed successfully, as did the version switch. Once again,
when node 2 was rebooted, the following message was displayed:

Member '1' appears to be booting or is in single-user mode. Either wait for
the node to continue booting, boot the node to multiuser mode, or halt the
node using the shutdown command. After all members are running in multiuser
mode or halted, you can rerun the clu_upgrade command.
/usr/sbin/clu_upgrade[3639]: test: argument expected

This was repeated when member 1 was rebooted:

Member '2' appears to be booting or is in single-user mode. Either wait for
the node to continue booting, boot the node to multiuser mode, or halt the
node using the shutdown command. After all members are running in multiuser
mode or halted, you can rerun the clu_upgrade command.
/usr/sbin/clu_upgrade[3639]: test: argument expected

* Finally, the attempt to run 'clu_upgrade clean' failed with:

*** Error ***
Member '2' must be rebooted before running the 'clean' command.

*** Error ***
Member '1' must be rebooted before running the 'clean' command.
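
A note on the 'test: argument expected' lines in the output above: in sh/ksh
scripts this message typically appears when an unquoted variable expands to
nothing inside a test, so the operator is left without an operand. Whether
that is what happens at line 3639 of clu_upgrade is only an assumption; the
variable and function names below are hypothetical and are not taken from the
script. A minimal sketch of the failure mode and the usual quoting fix:

```shell
#!/bin/sh
# Hypothetical variable; the real name used in clu_upgrade is not known here.
member_state=""

# Unquoted form (left commented out): an empty value disappears entirely,
# so test is handed only '= up' and older shells typically report
# "test: argument expected".
# [ $member_state = "up" ]

# Quoted form: test always receives three arguments, even for an empty value.
check_state() {
    if [ "$1" = "up" ]; then
        echo "up"
    else
        echo "unknown"
    fi
}

check_state "$member_state"
```

If the script really does hit this case, it would suggest that a query for
the other member's state returned nothing at all, which fits the misleading
"appears to be booting" message.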

Several reboots later, inspection of the clu_upgrade script shows that
it's looking for the following files at the end of the 'clean' function:

/cluster/admin/clu_upgrade/switch.completed.member1
/cluster/admin/clu_upgrade/switch.completed.member2
/cluster/admin/clu_upgrade/switch.completed

Neither of the member-specific files is present :(
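
For anyone wanting to check the same thing on their own cluster, a small
read-only sketch that reports which of the three flag files exist. The
directory and file names are the ones listed above; the function name and the
optional directory argument are illustrative assumptions, and nothing here is
part of clu_upgrade itself.

```shell
#!/bin/sh
# Report which of the switch flag files named in the post are present.
# Read-only: it creates and changes nothing.
report_switch_flags() {
    # Default directory is the one from the post; an argument overrides it.
    dir=${1:-/cluster/admin/clu_upgrade}
    for f in switch.completed switch.completed.member1 switch.completed.member2; do
        if [ -f "$dir/$f" ]; then
            echo "present: $f"
        else
            echo "missing: $f"
        fi
    done
}

report_switch_flags "$@"
```

Run with no arguments it inspects /cluster/admin/clu_upgrade; passing another
directory is handy for trying it out away from the cluster.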

The upgrade appears to have worked (both members believe they're running
5.1B and PK4).

Has anyone seen this before, or would anyone like to hazard a guess as to why
we're seeing node communication errors during the clu_upgrade?

I could of course just create the member-specific files it's missing - but
I'd like to get a handle on how it got into this state in the first place...

TIA,

Gavin.

