From: Steven Langdale (Langdale_Steven@PERKINS.COM)
Date: Mon Mar 17 2003 - 09:22:18 EST
Are you sure it's a heartbeat problem? You're right, it does look like
that's the problem. Does it work OK if you remove the serial stuff from the
HACMP config?
Thanks
Steven
augusta Zhou <meijun_Zhou@SHANGHAIGM.COM>
Sent by: IBM AIX Discussion List <aix-l@Princeton.EDU>
03/17/2003 07:13
Please respond to IBM AIX Discussion List
To: aix-l@Princeton.EDU
cc:
Subject: HA problem
Perkins: Confidential Green Retain Until: 04/16/2003 Retention Category: G90 - Information and Reports
I ran into a problem when starting HA on the backup machine after HA had
started on the master.
AIX 4.4.3 HA 4.4.0
It seems to be a heartbeat problem.
When I start HA on the master machine, everything looks normal: the volume
group varies on and the service IP replaces the boot IP.
But when I start HA on the backup machine after the master,
lssrc -g cluster shows:
<Test02>/ #lssrc -g cluster
Subsystem         Group            PID     Status
 clstrmgr         cluster          15064   active
 clsmuxpd         cluster          15364   active
It seems HA has started on the backup machine,
but when I check /tmp/hacmp.out, no messages appear.
I found some error messages in /tmp/cm.log:
short mwrite (0/29)
jil_open_heartbeat_path: A file descriptor does not refer to an open file
mwrite: A file descriptor does not refer to an open file.
mwrite: A file descriptor does not refer to an open file.
mwrite: A file descriptor does not refer to an open file.
mwrite: A file descriptor does not refer to an open file.
mwrite: A file descriptor does not refer to an open file.
short mwrite (0/184)
write to jim: A file descriptor does not refer to an open file.
+ callback not invoked for EVENT VOTE message
mwrite: A file descriptor does not refer to an open file.
mwrite: A file descriptor does not refer to an open file.
mwrite: A file descriptor does not refer to an open file.
mwrite: A file descriptor does not refer to an open file.
mwrite: A file descriptor does not refer to an open file.
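The excerpt above is dominated by one repeated message. A minimal sketch for tallying the distinct messages in such a log, assuming a POSIX shell with the standard sort and uniq utilities; the function name summarize_log is hypothetical, and only the path /tmp/cm.log comes from the post:

```shell
# Hypothetical helper: count how often each distinct line occurs in a log,
# most frequent first. Uses only the standard sort and uniq utilities.
summarize_log() {
    log_file=$1
    if [ ! -r "$log_file" ]; then
        echo "cannot read: $log_file" >&2
        return 1
    fi
    sort "$log_file" | uniq -c | sort -rn
}

# Usage on the node that logged the errors (path taken from the post):
# summarize_log /tmp/cm.log
```

A summary like this makes it easier to see whether the file descriptor errors start with the jil_open_heartbeat_path line or appear on their own.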
I have added a tty adapter to HA:
Adapter IP Label            Test01_tty
New Adapter Label           []
Network Type                [rs232]
Network Name                [Test_noip]
Network Attribute           serial
Adapter Function            service
Adapter Identifier          [/dev/tty1]
Adapter Hardware Address    []
Node Name                   [Test01]
The other tty adapter is labeled Test02_tty.
Before that, I tested the serial link with:
<Test01>stty </dev/tty1
<Test02>stty </dev/tty1
The result appeared on both machines:
<Test02>/ #stty </dev/tty1
speed 9600 baud; -parity hupcl
eol2 = ^?
brkint -inpck -istrip icrnl -ixany ixoff onlcr tab3
echo echoe echok
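For comparison, a minimal sketch of a guarded version of the same check, assuming a POSIX shell; the function name check_tty is hypothetical. The test is usually described as running the command on both nodes at about the same time, since each side may wait for the other end of the serial line before printing its settings.

```shell
# Hypothetical wrapper around the stty test from the post.
# It only adds a check that the device is a character special file.
check_tty() {
    dev=$1
    if [ ! -c "$dev" ]; then
        echo "not a character device: $dev"
        return 1
    fi
    # Prints the line settings; may block until the peer node runs the same command.
    stty < "$dev"
}

# Usage on each node (device name taken from the post):
# check_tty /dev/tty1
```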
I cannot get takeover to work between these two machines. After I run smit
clstop on the master, lssrc -g cluster keeps showing the status as "stopping"
until I force the cluster to stop. On the backup, /tmp/hacmp.out shows this:
config_too_long[82] config_too_long[82] expr 2 + 1
CNT=3
config_too_long[83] config_too_long[83] expr 3 * 30 + 360
TIME=450
config_too_long[76] [ 1 ]
config_too_long[78] config_too_long[78] dspmsg scripts.cat 326 WARNING: Cluster Test has been running event 'node_up Test02' for 450 seconds.\n Please check event status. Test node_up Test02 450
MSG=WARNING: Cluster Test has been running event 'node_up Test02' for 450 seconds.
Please check event status.
config_too_long[79] /bin/echo WARNING: Cluster Test has been running event 'node_up Test02' for 450 seconds. Please check event status.
config_too_long[79] 1> /dev/console
config_too_long[80] sleep 30
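The trace shows the warning interval being computed with expr: the counter goes from 2 to 3, and 3 * 30 + 360 gives the 450 seconds reported in the message. A minimal sketch of that arithmetic, assuming the constants 30 and 360 are fixed as they appear in the trace; the function name elapsed_for_count is hypothetical and this is not the actual HACMP script:

```shell
# Hypothetical re-creation of the arithmetic visible in the trace.
# 30 and 360 are copied from the trace lines; nothing else is known
# here about the real config_too_long script.
elapsed_for_count() {
    cnt=$1
    expr "$cnt" \* 30 + 360
}

CNT=$(expr 2 + 1)                  # trace line [82]: CNT=3
TIME=$(elapsed_for_count "$CNT")   # trace line [83]: TIME=450
echo "CNT=$CNT TIME=$TIME"
```

Read this way, the warning means the node_up Test02 event had still not completed after 450 seconds, and the sleep 30 suggests the check repeats every 30 seconds.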
Nothing happens on the master, and nothing happens on the backup.
What's wrong? Can anyone give me a suggestion?
Best Regards.
zhou meijun
IS Department
Shanghai General Motors Co., Ltd.
Tel: (021)28902879
Fax: (021)50317990
E-mail: meijun_zhou@shanghaigm.com
This archive was generated by hypermail 2.1.7 : Wed Apr 09 2008 - 22:16:40 EDT