SUMMARY: Cluster crash

From: Rudolf Gabler (rug@usm.uni-muenchen.de)
Date: Mon Mar 21 2005 - 09:55:34 EST


Hi netters,

The solution: some directories (specifically the mount points
/cluster/members.../boot_partition) were missing from the backup, because
my backup tool (I'm using TSM) saves only filesystems.
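
A check along these lines could catch such a gap right after a restore.
This is only a minimal sketch, not part of the actual fix: the function
name missing_mount_points and both of its parameters are made up for
illustration, and it assumes the restored cluster_root is mounted somewhere
on the rescue system and that you supply the list of mount points yourself.

```python
import os


def missing_mount_points(restored_root, expected_mount_points):
    """Return the expected mount-point directories absent under restored_root.

    Both arguments are hypothetical inputs: restored_root is wherever the
    restored filesystem is mounted, expected_mount_points is a list of
    absolute paths as they should appear inside that filesystem.
    """
    missing = []
    for mount_point in expected_mount_points:
        # Interpret each expected path relative to the restored tree.
        candidate = os.path.join(restored_root, mount_point.lstrip("/"))
        if not os.path.isdir(candidate):
            missing.append(mount_point)
    return missing
```

Any path the function returns is a directory that would have to be
recreated by hand (an empty directory is enough for a mount point) before
trying to boot.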

Thanks to everybody who helped.

Regards,

Rudi Gabler

-----Original Message-----
From: tru64-unix-managers-owner@ornl.gov
[mailto:tru64-unix-managers-owner@ornl.gov] On Behalf Of Rudolf Gabler
Sent: Friday, 18 March 2005 12:40
To: tru64-unix-managers@ornl.gov
Subject: Cluster crash

Hi managers,

My 3-member cluster under V5.1b PK4 crashed after one member lost one of its
DIMMs. cluster_root was badly broken when I tried to rebuild it with fixfdmn
(from a rescue installation). So I rebuilt cluster_root exactly as
documented in the "Troubleshooting clusters" section of the cluster
administration manual: I recreated it on the same disk and restored a fresh
backup into it. I have a rescue system on which the hardware view is the
same as on the cluster, and from which I can mount any of the filesystems.

When I try to boot the first member with the original configuration,
specifying the major and minor device numbers:

     vmunix cfs:cluster_root_dev1_maj=19 cfs:cluster_root_dev1_min=277 clubase:cluster_expected_votes=1 clubase:cluster_qdisk_votes=0

I get:

     ..
     Waiting for cluster mount to complete
     panic (cpu 0): cfs_issue_localroot_do_mount: namei on boot partition mp failed

The same happens if I boot the first member without the clubase:.. kernel
specifications. In that case the member waits until another member brings
the cluster to quorum, and it crashes with the same message as soon as the
second member reaches cluster connect.

I googled for the cfs_issue error without any success.

Does anyone have any advice?

Best regards,

Rudolf Gabler


