Re: HA problem

From: Green, Simon (Simon.Green@EU.ALTRIA.COM)
Date: Mon Mar 17 2003 - 05:54:10 EST


I would recommend that you start by doing cluster verification, for both
topology and resource groups.

Simon Green
Altria ITSC Europe s.a.r.l.

AIX-L Archive at http://marc.theaimsgroup.com/?l=aix-l&r=1&w=2
AIX FAQ at http://www.faqs.org/faqs/aix-faq/

N.B. Unsolicited email from vendors will seldom be appreciated.

> -----Original Message-----
> From: augusta Zhou [mailto:meijun_Zhou@SHANGHAIGM.COM]
> Sent: 17 March 2003 07:13
> To: aix-l@Princeton.EDU
> Subject: HA problem
>
>
> I ran into a problem when starting HA on the backup machine after HA had
> already started on the master.
> AIX 4.4.3 HA 4.4.0
> It looks like a heartbeat problem.
> When I start HA on the master machine, everything seems normal: the VG
> varies on and the service IP replaces the boot IP.
> But when I start HA on the backup machine after the master,
> this is what lssrc -g cluster shows:
> C<Test02>/ #lssrc -g cluster
> Subsystem         Group            PID     Status
>  clstrmgr         cluster          15064   active
>  clsmuxpd         cluster          15364   active
>
> It seems HA has started on the backup machine,
> but when I check /tmp/hacmp.out, no messages appear.
> I found some error messages in /tmp/cm.log:
> short mwrite (0/29)
> jil_open_heartbeat_path: A file descriptor does not refer to an open file
> mwrite: A file descriptor does not refer to an open file.
> mwrite: A file descriptor does not refer to an open file.
> mwrite: A file descriptor does not refer to an open file.
> mwrite: A file descriptor does not refer to an open file.
> mwrite: A file descriptor does not refer to an open file.
> short mwrite (0/184)
> write to jim: A file descriptor does not refer to an open file.
> + callback not invoked for EVENT VOTE message
> mwrite: A file descriptor does not refer to an open file.
> mwrite: A file descriptor does not refer to an open file.
> mwrite: A file descriptor does not refer to an open file.
> mwrite: A file descriptor does not refer to an open file.
> mwrite: A file descriptor does not refer to an open file.
>
> I have added a tty adapter to HA:
> Adapter IP Label Test01_tty
> New Adapter Label []
> Network Type [rs232]
> Network Name [Test_noip]
> Network Attribute serial
> Adapter Function service
> Adapter Identifier [/dev/tty1]
> Adapter Hardware Address []
> Node Name [Test01]
>
> Test02_tty is the other tty adapter.
>
> Before that, I tested the serial link by running:
> <Test01>stty </dev/tty1
> <Test02>stty </dev/tty1
> The result appeared on both machines:
> <Test02>/ #stty </dev/tty1
> speed 9600 baud; -parity hupcl
> eol2 = ^?
> brkint -inpck -istrip icrnl -ixany ixoff onlcr tab3
> echo echoe echok
>
> I cannot get takeover to work with these two machines. After smit clstop on
> the master, lssrc -g cluster keeps showing the status "stopping" until I
> force the cluster to stop. On the backup, /tmp/hacmp.out shows:
> config_too_long[82] config_too_long[82] expr 2 + 1
> CNT=3
> config_too_long[83] config_too_long[83] expr 3 * 30 + 360
> TIME=450
> config_too_long[76] [ 1 ]
> config_too_long[78] config_too_long[78] dspmsg scripts.cat 326 WARNING: Cluster Test has been running event 'node_up Test02' for 450 seconds.\n Please check event status. Test node_up Test02 450
> MSG=WARNING: Cluster Test has been running event 'node_up Test02' for 450 seconds.
> Please check event status.
> config_too_long[79] /bin/echo WARNING: Cluster Test has been running event 'node_up Test02' for 450 seconds. Please check event status.
> config_too_long[79] 1> /dev/console
> config_too_long[80] sleep 30
>
> Nothing happens on the master, and nothing happens on the backup.
>
> What's wrong? Can anyone give me a suggestion?
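
For readers following the config_too_long trace quoted above: the trace appears to show a counter incremented once per 30-second sleep, with the reported time computed as CNT * 30 + 360. The sketch below only reproduces that arithmetic; it is not the HACMP script itself. The values 30 and 360 are read off the trace, and the function name elapsed_for_count is hypothetical.

```shell
#!/bin/sh
# Reproduces the arithmetic seen in the quoted config_too_long trace.
# INTERVAL and BASE are taken from the trace ("expr 3 * 30 + 360");
# elapsed_for_count is a hypothetical helper, not part of HACMP.
INTERVAL=30
BASE=360

elapsed_for_count() {
    # $1 is the warning counter (CNT in the trace)
    expr "$1" \* "$INTERVAL" + "$BASE"
}

CNT=2
CNT=$(expr "$CNT" + 1)             # trace line [82]: expr 2 + 1
TIME=$(elapsed_for_count "$CNT")   # trace line [83]: expr 3 * 30 + 360
echo "CNT=$CNT TIME=$TIME"
```

Read this way, a warning reporting 450 seconds suggests the 'node_up Test02' event had not completed after roughly seven and a half minutes, which is consistent with the node never finishing its join.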


