Sun Cluster 2.2 - can't join cluster

From: Stanley, Jon (Jon.Stanley@savvis.net)
Date: Tue Jan 20 2004 - 10:23:58 EST


I can't get a node to rejoin a Sun Cluster 2.2 cluster. The node crashed
due to a failed disk; I got the system back up and ran scadmin startnode,
but it failed. Both nodes run Solaris 2.6 with kernel patch 105181-32 on
E250s, with Ethernet private interconnects and 2x A1000s for the shared
storage. Looking in /var/adm/messages reveals the following relevant
entries. Has anyone seen this before?

Jan 20 14:54:14 i02sv1020.ids2.intelonline.com ID[SUNWcluster.reconf.1200]: Reconfiguration step start started
Jan 20 14:54:14 i02sv1020.ids2.intelonline.com ID[SUNWcluster.sma.smad.1102]: smad: Cluster 'ids2ns' monitoring
Jan 20 14:54:15 i02sv1020.ids2.intelonline.com ID[SUNWcluster.reconf.1201]: Reconfiguration step start completed
Jan 20 14:54:16 i02sv1020.ids2.intelonline.com ID[SUNWcluster.reconf.1200]: Reconfiguration Step 1 started
Jan 20 14:54:16 i02sv1020.ids2.intelonline.com ID[SUNWcluster.reconf.1120]: ids2ns reconfiguration 10 started on i02sv1019.ids2.intelonline.com i02sv1020.ids2.intelonline.com
Jan 20 14:54:16 i02sv1020.ids2.intelonline.com ID[SUNWcluster.sma.smad.1103]: smad: Cluster 'ids2ns' running
Jan 20 14:54:16 i02sv1020.ids2.intelonline.com ID[SUNWcluster.sma.monitor.6010]: node 1 able to communicate with node 0 over net 0
Jan 20 14:54:16 i02sv1020.ids2.intelonline.com ID[SUNWcluster.sma.monitor.6010]: node 1 able to communicate with node 0 over net 1
Jan 20 14:54:16 i02sv1020.ids2.intelonline.com ID[SUNWcluster.sma.smad.1030]: ids2ns net 0 (qfe3:1) selected
Jan 20 14:54:16 i02sv1020.ids2.intelonline.com ID[SUNWcluster.sma.monitor.6011]: net 0 is up
Jan 20 14:54:16 i02sv1020.ids2.intelonline.com ID[SUNWcluster.sma.monitor.6011]: net 1 is up
Jan 20 14:54:17 i02sv1020.ids2.intelonline.com ID[SUNWcluster.reconf.1201]: Reconfiguration Step 1 completed
Jan 20 14:54:18 i02sv1020.ids2.intelonline.com ID[SUNWcluster.reconf.1200]: Reconfiguration Step 2 started
Jan 20 14:54:18 i02sv1020.ids2.intelonline.com ID[SUNWcluster.reconf.1201]: Reconfiguration Step 2 completed
Jan 20 14:54:18 i02sv1020.ids2.intelonline.com ID[SUNWcluster.reconf.1200]: Reconfiguration Step 3 started
Jan 20 14:54:18 i02sv1020.ids2.intelonline.com ID[SUNWcluster.reconf.ccd.1703]: ids2ns starting ccdd.
Jan 20 14:54:33 i02sv1020.ids2.intelonline.com ID[SUNWcluster.reconf.4051]: ccd exited with 255 in cmmstep3
Jan 20 14:54:34 i02sv1020.ids2.intelonline.com ID[SUNWcluster.clustd.transition.4008]: transition 'step3' failed. Child exit status 256
Jan 20 14:54:34 i02sv1020.ids2.intelonline.com ID[SUNWcluster.clustd.signal.4007]: fatal: received signal 15.
Jan 20 14:54:34 i02sv1020.ids2.intelonline.com ID[SUNWcluster.reconf.1200]: Reconfiguration step abort started
Jan 20 14:54:35 i02sv1020.ids2.intelonline.com ID[SUNWcluster.loghost.1010]: Giving up logical host i02lh0011
Jan 20 14:54:37 i02sv1020.ids2.intelonline.com ID[SUNWcluster.loghost.1050]: abort_net method of data service oracle completed successfully.
Jan 20 14:54:37 i02sv1020.ids2.intelonline.com ID[SUNWcluster.loghost.1050]: abort method of data service oracle completed successfully.
Jan 20 14:54:38 i02sv1020.ids2.intelonline.com ID[SUNWcluster.pnm.pnmd.2005]: pnmd daemon is shutting down
Jan 20 14:54:40 i02sv1020.ids2.intelonline.com ID[SUNWcluster.pnm.pnmd.2001]: initial: Bk_gp (nafo0) using primary adp (qfe0)
Jan 20 14:54:40 i02sv1020.ids2.intelonline.com ID[SUNWcluster.pnm.pnmd.2002]: initial: Bk_gp (nafo0) Status (OK); Adp (qfe0) Status (OK)
Jan 20 14:54:40 i02sv1020.ids2.intelonline.com ID[SUNWcluster.reconf.ccd.1701]: ids2ns aborting ccdd.
Jan 20 14:54:55 i02sv1020.ids2.intelonline.com ID[SUNWcluster.reconf.ccd.1702]: ids2ns aborting ccdd completed.
Jan 20 14:54:56 i02sv1020.ids2.intelonline.com ID[SUNWcluster.sma.smad.1105]: smad: Cluster 'ids2ns' no longer running
Jan 20 14:54:56 i02sv1020.ids2.intelonline.com ID[SUNWcluster.sma.smad.5010]: ids2ns net 0 (qfe3:1) de-selected
Jan 20 14:54:56 i02sv1020.ids2.intelonline.com ID[SUNWcluster.reconf.1201]: Reconfiguration step abort completed
Jan 20 14:54:57 i02sv1020.ids2.intelonline.com ID[SUNWcluster.clustd.transition.4010]: cluster aborted on this node (i02sv1020.ids2.intelonline.com)
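Note that "ccd exited with 255" and "Child exit status 256" are not necessarily the same value. Assuming (not confirmed anywhere in these logs) that clustd prints the raw wait(2) status word rather than an exit code, 256 would decode to exit code 1 from the step3 transition. A quick Python check using the standard POSIX decoding macros (the helper name decode_wait_status is hypothetical):

```python
import os


def decode_wait_status(status):
    """Return the child's exit code if `status` is a normal-exit wait(2) status, else None."""
    # Assumption: clustd logs the raw wait status word, not the exit code itself.
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # The child was terminated or stopped by a signal instead of exiting.
    return None


# From: "transition 'step3' failed. Child exit status 256"
print(decode_wait_status(256))  # prints 1
```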

And from /var/opt/SUNWcluster/scadmin.log:

Tue Jan 20 14:54:12 GMT 2004 SUNWcluster.reconf.1340 quorum started in startnode
Tue Jan 20 14:54:12 GMT 2004 SUNWcluster.reconf.1050 quorum completed successfully in startnode
Tue Jan 20 14:54:12 GMT 2004 SUNWcluster.reconf.1340 ccd started in startnode
Tue Jan 20 14:54:12 GMT 2004 SUNWcluster.reconf.1050 ccd completed successfully in startnode
Tue Jan 20 14:54:12 GMT 2004 SUNWcluster.reconf.1340 sma started in startnode
Tue Jan 20 14:54:12 GMT 2004 SUNWcluster.reconf.1050 sma completed successfully in startnode
In parse_cmd_args
Tue Jan 20 14:54:14 GMT 2004 SUNWcluster.reconf.1340 sma started in cmmstart
Tue Jan 20 14:54:14 GMT 2004 SUNWcluster.reconf.1050 sma completed successfully in cmmstart
Tue Jan 20 14:54:14 GMT 2004 SUNWcluster.reconf.1340 ccd started in cmmstart
Tue Jan 20 14:54:15 GMT 2004 SUNWcluster.reconf.1050 ccd completed successfully in cmmstart
Tue Jan 20 14:54:16 GMT 2004 SUNWcluster.reconf.1340 sma started in cmmstep1
Tue Jan 20 14:54:17 GMT 2004 SUNWcluster.reconf.1050 sma completed successfully in cmmstep1
Tue Jan 20 14:54:17 GMT 2004 SUNWcluster.reconf.1340 quorum started in cmmstep1
Tue Jan 20 14:54:17 GMT 2004 SUNWcluster.reconf.1050 quorum completed successfully in cmmstep1
Tue Jan 20 14:54:18 GMT 2004 SUNWcluster.reconf.1340 quorum started in cmmstep2
Tue Jan 20 14:54:18 GMT 2004 SUNWcluster.reconf.1050 quorum completed successfully in cmmstep2
Tue Jan 20 14:54:18 GMT 2004 SUNWcluster.reconf.1340 ccd started in cmmstep3
Jan 20 14:54:33 i02sv1020.ids2.intelonline.com ID[SUNWcluster.ccd.ccdctl.5103]: (error) timed out during RPC request for 'start' transition
Tue Jan 20 14:54:34 GMT 2004 SUNWcluster.reconf.1340 quorum started in cmmabort
Tue Jan 20 14:54:34 GMT 2004 SUNWcluster.reconf.1050 quorum completed successfully in cmmabort
Tue Jan 20 14:54:34 GMT 2004 SUNWcluster.reconf.1340 loghost started in cmmabort
localhost: RPC: Program not registered
Tue Jan 20 14:54:38 GMT 2004 SUNWcluster.reconf.1050 loghost completed successfully in cmmabort
Tue Jan 20 14:54:38 GMT 2004 SUNWcluster.reconf.1340 pnmreconfig started in cmmabort
Tue Jan 20 14:54:40 GMT 2004 SUNWcluster.reconf.1050 pnmreconfig completed successfully in cmmabort
Tue Jan 20 14:54:40 GMT 2004 SUNWcluster.reconf.1340 ccd started in cmmabort
Tue Jan 20 14:54:55 GMT 2004 SUNWcluster.reconf.1050 ccd completed successfully in cmmabort
Tue Jan 20 14:54:55 GMT 2004 SUNWcluster.reconf.1340 rpcbindmon started in cmmabort
Tue Jan 20 14:54:55 GMT 2004 SUNWcluster.reconf.1050 rpcbindmon completed successfully in cmmabort
Tue Jan 20 14:54:55 GMT 2004 SUNWcluster.reconf.1340 hadbms started in cmmabort
Tue Jan 20 14:54:56 GMT 2004 SUNWcluster.reconf.1050 hadbms completed successfully in cmmabort
Tue Jan 20 14:54:56 GMT 2004 SUNWcluster.reconf.1340 netmon started in cmmabort
Tue Jan 20 14:54:56 GMT 2004 SUNWcluster.reconf.1340 scmgr_server started in cmmabort
Tue Jan 20 14:54:56 GMT 2004 SUNWcluster.reconf.1340 sma started in cmmabort
Tue Jan 20 14:54:56 GMT 2004 SUNWcluster.reconf.1050 netmon completed successfully in cmmabort
Tue Jan 20 14:54:56 GMT 2004 SUNWcluster.reconf.1050 scmgr_server completed successfully in cmmabort
Tue Jan 20 14:54:56 GMT 2004 SUNWcluster.reconf.1050 sma completed successfully in cmmabort
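In scadmin.log the entries come in pairs: SUNWcluster.reconf.1340 marks a component starting in a step, and SUNWcluster.reconf.1050 marks it completing successfully. The step that broke is therefore the one with a start and no matching completion (here, ccd in cmmstep3). A minimal Python sketch for spotting that in a longer log; the function name unfinished_steps is hypothetical, and the 1340/1050 pairing is inferred from this log only, not from Sun documentation:

```python
import re

# Assumption: 1340 = "<component> started in <step>",
#             1050 = "<component> completed successfully in <step>".
START_RE = re.compile(r"SUNWcluster\.reconf\.1340\s+(\S+)\s+started\s+in\s+(\S+)")
DONE_RE = re.compile(
    r"SUNWcluster\.reconf\.1050\s+(\S+)\s+completed\s+successfully\s+in\s+(\S+)"
)


def unfinished_steps(log_text):
    """Return (component, step) pairs that logged 'started' but never 'completed successfully'."""
    # Collapse all whitespace so entries wrapped across lines still match.
    text = " ".join(log_text.split())
    started = START_RE.findall(text)
    done = set(DONE_RE.findall(text))
    # Keep log order so the first unfinished step is listed first.
    return [pair for pair in started if pair not in done]


sample = """Tue Jan 20 14:54:18 GMT 2004 SUNWcluster.reconf.1340 quorum started in cmmstep2
Tue Jan 20 14:54:18 GMT 2004 SUNWcluster.reconf.1050 quorum completed successfully in cmmstep2
Tue Jan 20 14:54:18 GMT 2004 SUNWcluster.reconf.1340 ccd started in cmmstep3
"""

print(unfinished_steps(sample))  # prints [('ccd', 'cmmstep3')]
```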

Jon Stanley
Hosting Systems Engineer
SAVVIS Communications
1 SAVVIS Parkway
Town & Country, MO 63017
SAVVIS, The Network That Powers Wall Street(SM)
314-628-7570 (direct)
314-265-4690 (mobile)
pagejon@savvis.net (pager)
866-234-4678 (Toll Free)
jon.stanley@savvis.net
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers