SUMMARY Can't boot

From: Rusdi (rusdi@komputeralif.com)
Date: Thu Oct 24 2002 - 02:46:30 EDT


Thank you very much to:
1. Thomas Sjolshagen
2. Pete Sherwood
3. Stan Horwitz
4. Bluejay Adametz

Original question:
> My 2nd server of my cluster system (2 servers) can't boot and I got
> this error message:
> ........
> ........
> Reservation conflict for dkc 100.1.0.2.0
> Read/Write failed with status 0002 on dra 0.0.0.4.1
> Fail open file for dkc 100.1.0.0
> Hard error -error#5
> What is causing the problem?

Problem solving:
Based on the suggestion from Thomas, I took these steps:
1. Shut down all nodes
2. Shut down the storage array
3. Boot the storage array
4. Boot the 2nd server (the server that had the boot problem)
5. Boot the 1st server
Finally, I could boot my 2nd server and could access the external
storage as well. However, I saw another 'reservation conflict' on the
1st server, but that server could still boot and got past 'bootstrap'.
Tru64 and the cluster s/w run well on the 1st server.
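
To see which devices report a conflict after such a restart, a saved
console log can be scanned for the message. The following is only a
minimal sketch: the pattern is inferred from the single error line quoted
in the original question, and the function name is illustrative, not part
of any Tru64 tool.

```python
import re

# Matches lines such as "Reservation conflict for dkc 100.1.0.2.0".
# The layout is assumed from the one message quoted in this summary.
CONFLICT_RE = re.compile(r"Reservation conflict for (\S+)\s+(\S+)", re.IGNORECASE)


def find_reservation_conflicts(console_text):
    """Return (device, address) pairs for each reservation-conflict line."""
    hits = []
    for line in console_text.splitlines():
        # search() also copes with mail-quote prefixes such as "> ".
        match = CONFLICT_RE.search(line)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


sample = """\
Reservation conflict for dkc 100.1.0.2.0
Read/Write failed with status 0002 on dra 0.0.0.4.1
Fail open file for dkc 100.1.0.0
Hard error -error#5
"""
print(find_reservation_conflicts(sample))
```

Running it on the console output of each member gives a quick list of
the devices to look at.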

Response from Thomas Sjolshagen:

1st e-mail:
The problem is stated in the error text: "reservation conflict" for the
device. Depending on what version of TruCluster you're running, this
could either be a Persistent Reservation problem, which can be cleared
using the cleanPR utility, or, for TCR ASE/PR V1.x, a SCSI RESERVE
problem, which can possibly be solved by shutting down all of the nodes,
then shutting down the storage array, then booting the array and, once it
has booted, booting the nodes. (Also make sure you're running the latest
patch kit for both the ARRAY and the OS/Cluster.)

Word of caution on using the cleanPR utility. It's a V5.x utility and
it's undocumented. It will blow away *any* persistent reserves in your
storage environment and as a result, bad things *can* possibly happen.
You need to exercise extreme caution when using cleanPR and as I said,
to limit the potential for "bad things to happen", have only a single
member of the cluster running when issuing the command (so ensure your
expected votes allow this), then reboot that single node immediately
before you adjust votes and bring the other members back up.
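
The "expected votes" check can be worked through as simple arithmetic.
The sketch below assumes the quorum rule given in the TruCluster Server
V5.x documentation, quorum = round_down((expected votes + 2) / 2); please
verify it against the manual for your version. The function names and the
two-member, one-vote-each, no-quorum-disk configuration are illustrative
assumptions.

```python
def quorum_votes(expected_votes):
    """Quorum as round_down((expected_votes + 2) / 2) (assumed rule)."""
    return (expected_votes + 2) // 2


def can_run_alone(member_votes, expected_votes):
    """True if a single member's votes reach quorum by themselves."""
    return member_votes >= quorum_votes(expected_votes)


# Assumed setup: two members with one vote each and no quorum disk.
print(can_run_alone(1, 2))  # expected votes left at 2
print(can_run_alone(1, 1))  # expected votes lowered to 1
```

Under that assumed rule, a lone member with one vote does not reach
quorum while expected votes is still 2, but does once it is lowered to 1,
which is why the votes have to allow single-member operation before
cleanPR is run.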

2nd e-mail:
So everything is running, and failover between the two is working? If
so, I suspect the reservation conflict is nothing to worry about.
(Perhaps your fstab has an entry trying to mount the file system from
both nodes. Comment out/remove any fstab entries that include the
devices on the shared busses. Let the failover scripts do the
mount/dismount of the devices).
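
The fstab clean-up can be sketched as a small filter. This is only an
illustration: the device names and fstab lines are hypothetical, the
function name is made up for this example, and the code only returns the
edited text, so nothing is written to disk until you have reviewed it.

```python
def comment_out_shared(fstab_text, shared_devices):
    """Return fstab text with entries for shared-bus devices commented out."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Leave blank lines and existing comments untouched.
        if fields and not fields[0].startswith("#") and fields[0] in shared_devices:
            line = "#" + line
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical entries; substitute the real devices on your shared buses.
sample = (
    "/dev/rz0a  /      ufs rw 1 1\n"
    "/dev/rz17c /data  ufs rw 1 2\n"
)
print(comment_out_shared(sample, {"/dev/rz17c"}), end="")
```

Only the line whose first field names a shared device is commented out;
local file systems are left as they are.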

You should verify that you're running the latest patchkit for V4.0f/g
and ASE/PR 1.6, see
http://ftp.support.compaq.com/patches/.new/unix.shtml. If you're not, I
suggest trying to install the patches. We've seen reservation conflict
problems in the past w/the ASE/PR product and any and all fixes we've
submitted can be found in the latest patch kit.
-----------------------------

Best regards to all outstanding members!

Rusdiyanto
