From: Tim Cutts (tjrc@sanger.ac.uk)
Date: Tue Feb 22 2005 - 12:01:07 EST
LSF started misbehaving on one of our clusters, in this case a 6-node
ES45 cluster running Tru64 5.1B PK 2. I discovered that the LSF daemons
could be contacted from outside the cluster, but not from any machine
inside the cluster.
Examining /etc/clua_services, I discovered that it was missing the
lines telling the cluster alias about the ports, so I added the lines:
#
# LSF Ports
#
lim 3879/tcp in_noalias,static
res 3878/tcp in_noalias,static
mbatchd 3881/tcp in_noalias,static
sbatchd 3882/tcp in_noalias,static
mbdquery 40001/tcp in_noalias,static
I then ran 'cluamgr -f' on all nodes, and restarted LSF on all 6 nodes
for good measure.
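To confirm that the entries are well formed and identical on every node, they could be parsed with a short script. What follows is only a minimal sketch, assuming the services-style layout shown above (name, port/protocol, comma-separated attributes). The function name parse_clua_services is hypothetical and is not part of any Tru64 or LSF tooling.

```python
def parse_clua_services(text):
    """Parse services-style lines into {name: (port, proto, attrs)}.

    Assumes each entry looks like 'name port/proto attr1,attr2'.
    Blank lines, '#' comments and malformed lines are skipped.
    """
    entries = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        port, proto = fields[1].split("/", 1)
        attrs = fields[2].split(",") if len(fields) > 2 else []
        entries[fields[0]] = (int(port), proto, attrs)
    return entries


# The entries quoted in the message, copied verbatim.
LSF_ENTRIES = """\
#
# LSF Ports
#
lim 3879/tcp in_noalias,static
res 3878/tcp in_noalias,static
mbatchd 3881/tcp in_noalias,static
sbatchd 3882/tcp in_noalias,static
mbdquery 40001/tcp in_noalias,static
"""

if __name__ == "__main__":
    for name, (port, proto, attrs) in parse_clua_services(LSF_ENTRIES).items():
        print(name, port, proto, ",".join(attrs))
```

Running the same parser over the contents of /etc/clua_services from each node and comparing the results would show whether any node is missing an entry.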
But the strange behaviour still continues. If I try to connect to one
of these ports from outside the cluster it works:
16:52:51 tjrc@ecs4d:~$ telnet ecs2d 3882
Trying 172.17.1.204...
Connected to ecs2d.
But if I try to connect from within the cluster, the operation times
out:
16:53:30 tjrc@ecs2c:~$ telnet ecs2d 3882
Trying 172.17.1.204...
telnet: Unable to connect to remote host: Connection timed out
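The telnet test above can be repeated for all five ports with a short script. This is a minimal sketch, not a definitive diagnostic: the function name probe and the 5-second timeout are illustrative choices, and the port numbers are simply those quoted earlier in this message. It separates a timeout from a refusal, which may help tell silently dropped packets apart from a daemon that is not listening.

```python
import socket
import sys

# Port numbers as quoted in the message.
LSF_PORTS = {
    "lim": 3879,
    "res": 3878,
    "mbatchd": 3881,
    "sbatchd": 3882,
    "mbdquery": 40001,
}


def probe(host, port, timeout=5.0):
    """Attempt a TCP connect; return 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:  # e.g. name resolution failure, no route
        return f"error: {exc}"


if __name__ == "__main__":
    # Hosts to test are given on the command line, e.g. "ecs2d".
    for host in sys.argv[1:]:
        for name, port in LSF_PORTS.items():
            print(f"{host} {name} {port}: {probe(host, port)}")
```

Run once from a machine outside the cluster and once from a cluster member, with the target node as the argument, the two outputs can be compared port by port.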
Any ideas, short of rebooting the cluster, which I am reluctant to do?
Many thanks,
Tim
-- 
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233