Strange messages during cluster reboot

From: Erdei Tamás (Erdei.Tamas@lnx.hu)
Date: Fri May 30 2003 - 08:21:29 EDT


Hi all,

I am building a new cluster based on two Netra 20/T4 servers and two D2
storage arrays, using Solaris 8, Sun Cluster 3.0 and DiskSuite 4.2.1.
The disks in each D2 box are striped together, and the two boxes are mirrored
onto each other. I think this is a fairly standard setup. (The slicing of the
disks is the default created by SDS, e.g. slice 7 for the SDS metadb (2
clusters) and the rest for slice 0.)
Everything went well during the install: I installed SC 3.0, set the
cluster quorum to one of the disks in the first D2, then set up the disk group
(metaset), metadb, stripes and mirroring. I created and mounted the filesystem
on the new disk group successfully.
The system seems to be working properly, except for some strange system
messages during a node or cluster reboot:

May 30 00:40:27 n2 scsi: WARNING: /pci@8,700000/pci@3/scsi@4/sd@8,0 (sd67):
May 30 00:40:27 n2 Error for Command: read(10) Error Level: Informational
May 30 00:40:27 n2 scsi: Requested Block: 0 Error Block: 0
May 30 00:40:27 n2 scsi: Vendor: FUJITSU Serial Number: 0303X78024
May 30 00:40:27 n2 scsi: Sense Key: Unit Attention
May 30 00:40:27 n2 scsi: ASC: 0x29 (<vendor unique code 0x29>), ASCQ: 0x2, FRU: 0x0

This message gets repeated a few times.
According to a SCSI sense key table, this message only means that the disk was
reset (probably because of the reboot of the other node). The strange thing
is that this message is generated only for the disk that is set as the
cluster quorum.
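
As a rough cross-check of that reading, here is a minimal Python sketch that
pulls the ASC/ASCQ pair out of the last log line and looks it up in a small,
deliberately partial table. The table entries follow the standard SCSI
additional sense code assignments for ASC 0x29; the function names
(parse_asc_line, describe_sense) are made up for this illustration and are not
part of any Solaris or Sun Cluster tool.

```python
import re

# Partial table of standard SCSI ASC/ASCQ assignments for ASC 0x29.
# Only the reset-related codes are listed; this is not a complete table.
ASC_ASCQ = {
    (0x29, 0x00): "Power on, reset, or bus device reset occurred",
    (0x29, 0x01): "Power on occurred",
    (0x29, 0x02): "SCSI bus reset occurred",
    (0x29, 0x03): "Bus device reset function occurred",
    (0x29, 0x04): "Device internal reset",
}

# Matches the "ASC: 0x.., ASCQ: 0x.." fields of an sd driver warning line.
ASC_LINE = re.compile(r"ASC:\s*(0x[0-9a-fA-F]+).*?ASCQ:\s*(0x[0-9a-fA-F]+)")


def parse_asc_line(line):
    """Return (asc, ascq) as integers, or None if the line has no such fields."""
    match = ASC_LINE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def describe_sense(asc, ascq):
    """Look up a human-readable meaning, falling back to the raw codes."""
    return ASC_ASCQ.get((asc, ascq), f"unknown ASC/ASCQ 0x{asc:02x}/0x{ascq:02x}")


# The ASC/ASCQ line from the log, given as a single string.
LOG_LINE = ("May 30 00:40:27 n2 scsi: ASC: 0x29 (<vendor unique code 0x29>), "
            "ASCQ: 0x2, FRU: 0x0")

if __name__ == "__main__":
    asc, ascq = parse_asc_line(LOG_LINE)
    print(describe_sense(asc, ascq))
```

With the logged values (ASC 0x29, ASCQ 0x2) the lookup lands on the bus reset
entry, which matches the interpretation above. It says nothing about why only
the quorum disk reports it.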

Strange messages are generated on the booting node as well:

May 30 00:41:51 n1 cl_runtime: [ID 606467 kern.warning] WARNING: CMM: Initialization for quorum device /dev/did/rdsk/d4s2 failed with error EACCES. Will retry later.
May 30 00:41:53 n1 cl_runtime: [ID 847496 kern.warning] WARNING: CMM: Reading reservation keys from quorum device /dev/did/rdsk/d4s2 failed with error 2.
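
For reference, a small Python sketch that maps the two error values in these
messages to symbolic errno names. It assumes that the bare "2" in the second
message is an ordinary errno value, which the message itself does not state;
if that assumption holds, it corresponds to ENOENT. The helper name errno_name
is hypothetical and only used for this illustration.

```python
import errno
import os


def errno_name(value):
    """Return the symbolic errno name for a value given as text.

    Accepts either a decimal number ("2") or a symbolic name ("EACCES").
    Returns "unknown" if the value is not a known errno on this platform.
    """
    if value.isdigit():
        return errno.errorcode.get(int(value), "unknown")
    return value if hasattr(errno, value) else "unknown"


if __name__ == "__main__":
    # The two error values as they appear in the CMM warnings.
    for raw in ("EACCES", "2"):
        name = errno_name(raw)
        if name == "unknown":
            print(f"{raw}: not a known errno value")
            continue
        number = getattr(errno, name)
        # os.strerror gives the platform's own description of the error.
        print(f"{raw} -> {name} ({number}): {os.strerror(number)}")
```

Read this way, the first message reports a permission failure (EACCES) and the
second a "no such file or directory" style failure (ENOENT). Whether either is
harmless in this context is exactly the open question.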

I am new to Sun Clustering, so I am not sure whether these messages are harmless
and can be ignored, or whether I missed something during the install. Could
someone help me interpret these messages, or confirm that they are harmless?

What I do not understand is that I can only specify the quorum disk, not a
specific slice. The software selects slice 2 automatically, which is the
overlap slice on normal disks, but this slice is not defined on the shared
disks (it has 0 length). So where does the SC software store quorum info
(reservation keys?) on the quorum disk? Does it use cluster 0? Is it
possible that this conflicts with the SDS metadb, which is also stored on
cluster 0?

Besides these strange messages, the cluster seems to work correctly. After the
reboot, both nodes and the quorum disk are online (according to scstat), the
mirror and stripes are in Okay state.

I appreciate any help. I searched through the docs, list archives and usenet
groups, but found nothing relevant.

Thanks,
Tamas

-----------------------------------------------------------
Tamas Erdei E-mail: erdei.tamas@lnx.hu
Systems Engineer
LNX Ltd.
-----------------------------------------------------------
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


