Patch makes system unusable

From: Rudolf Gabler (rug@usm.uni-muenchen.de)
Date: Sun Apr 01 2007 - 11:52:46 EDT


Hi managers,

We applied the latest jumbo patch to our 5.1B 3-node cluster. The rolling
upgrade went without any errors up to and including the
   clu_upgrade switch
stage. After that stage the nodes have to be rebooted (so that the last
stage, clu_upgrade clean, can be applied).
The leading node was rebooted, and during its boot the remaining nodes crashed
as the leading one attempted to run lsmbstartup.

Now we have the following:

A) We are able to boot the 3 nodes into single-user mode from halt. Any
attempt to run lsmbstartup on all 3 nodes yields
   LSM: Vold is not enabled for transactions
and the nodes remain up.

B) A subsequent bcheckrc tries to start LSM on one of the nodes and crashes
the others.

C) If we use only the 2 nodes that crashed during the reboot mentioned above,
we can start up the cluster, but any attempt to boot the leading node into
the cluster again results in a panic with clsm messages such as
"clsm_state.distribute is false" or "kgs_respond failed with error 6", or in
a crash without a dump on the already running nodes.

Anyone with a clue what we should try?

Rudi Gabler


