Solaris 8 resolver

From: Guðbjörn S. Hreinsson (gsh@centrum.is)
Date: Mon Jul 21 2003 - 12:46:42 EDT


Cheers,

I have three Solaris 8 machines running a multithreaded commercial
SMTP package. A fair number of messages pass through these servers
(280R type machines), and for each message the MAIL FROM is checked
for correctness; i.e. if the domain part of the sending address does
not exist, the message is refused. This is pretty standard today.

My problem lies with DNS or the Solaris 8 resolver. Since I have a lot
of other machines using the same DNS servers without problems, I suspect
the problem is in the resolver or the SMTP package.

The problem manifests itself in that the SMTP server stops processing
incoming connections, ultimately hits the maximum allowed number of
simultaneous connections, and refuses to accept more. The reason for
the hang is always that the server is waiting for a response from the
resolver. This does not happen often, maybe once a week, and we have
three systems, so it's not such a big issue. But it has been ongoing for
some time and requires manual intervention, so I would still like to
resolve it.

The SMTP package will seemingly wait forever for the response. The
server itself is multithreaded, but according to the vendor the resolver
functions are not thread-safe, so they run only a single thread
performing DNS resolutions and hold a lock so that no other thread can
perform a DNS lookup until the current one returns.

According to some threads on this mailing list this is true, i.e. the
_res functions on Solaris 8 are not thread-safe and are serialized by
a global lock.

Furthermore, the vendor maintains that DNS resolutions on Solaris can
take a very long time to time out. I haven't been able to verify this on
the mailing list.
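
One way to keep a hung lookup from stalling the callers is to bound how
long they wait. The following is a minimal Python sketch, not the
vendor's implementation: a single worker thread stands in for the
serialized resolver, and `lookup_with_deadline`, `deadline_s` and the
injectable `resolve` parameter are hypothetical names chosen for
illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Assumption: one worker models the single, lock-protected resolver thread.
_resolver_pool = ThreadPoolExecutor(max_workers=1)


def lookup_with_deadline(domain, deadline_s=10.0, resolve=socket.gethostbyname):
    """Run a blocking lookup, but make the caller wait at most deadline_s.

    Returns ("ok", result), ("timeout", None) or ("error", None).
    """
    future = _resolver_pool.submit(resolve, domain)
    try:
        return ("ok", future.result(timeout=deadline_s))
    except FutureTimeout:
        # No answer in time: the caller is released, the lookup keeps running.
        return ("timeout", None)
    except OSError:
        # The resolver reported a failure (socket.gaierror is an OSError).
        return ("error", None)
```

A caller that gets "timeout" could answer the SMTP client with a
temporary failure instead of holding the connection open. The stuck
lookup still occupies the single worker, so later lookups queue behind
it until it returns; this only bounds the caller's wait.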

I also found in this or other mailing lists that

   - if there are more than 20 concurrent DNS lookups they become
     serialized; strange, and I don't fully believe that
   - all lookups are synchronous and serial, which I think is true,
     albeit not very modern behaviour... I don't know if other OSes'
     resolver functions are any better
   - if there are a lot of DNS lookups, it would be better to turn
     off caching in the nscd daemon
   - 'dns' should be the last item in the nsswitch.conf file, otherwise
     the process making the lookup cannot determine whether a lookup
     failure is permanent or temporary
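
The permanent versus temporary distinction in the last item matters for
the MAIL FROM check: only a permanent failure justifies refusing the
message. As a rough Python sketch (the function name and the
"accept"/"reject"/"tempfail" labels are made up for illustration, and a
real check would query MX records rather than call getaddrinfo):

```python
import socket


def classify_mail_from_domain(domain, resolve=socket.getaddrinfo):
    """Map a lookup outcome to a hypothetical SMTP decision."""
    try:
        resolve(domain, None)
    except socket.gaierror as exc:
        if exc.args[0] == socket.EAI_NONAME:
            # The name does not exist: permanent, refuse the sender.
            return "reject"
        # EAI_AGAIN and anything else: treat as temporary, ask to retry.
        return "tempfail"
    return "accept"
```

Treating every failure other than "name does not exist" as temporary is
the cautious choice here, since a wrongly refused message is worse than
a delayed one.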

The vendor has some mechanisms to control the -res.retry and -res.trans
options. These can also be set in resolv.conf as timeout and retrans.
The default timeout is RES_TIMEOUT (defined in resolv.h as 5).

I have defined two nameservers in /etc/resolv.conf; the default number
of retries is 4 and the timeout is 5 seconds between retries. It seems
to me that a timeout should occur in 5 + 2*5 + 3*5 + 4*5 = 50 seconds
using the default options. Is the resolver then not reliable on Solaris?
Should I really try to modify the defaults in /etc/resolv.conf or in
the SMTP server application?
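
The 50-second figure can be reproduced with a few lines of Python. This
only encodes the model assumed in the paragraph above (the wait grows by
one timeout unit per retry); whether the Solaris resolver actually
schedules its retries this way is not confirmed here, and the number of
nameservers may also affect the total. The function name is
hypothetical.

```python
def linear_backoff_total(timeout_s=5, retries=4):
    """Total wait if attempt k waits k * timeout_s seconds (k = 1..retries)."""
    return sum(timeout_s * attempt for attempt in range(1, retries + 1))


print(linear_backoff_total())      # 5 + 10 + 15 + 20 = 50
print(linear_backoff_total(2, 2))  # 2 + 4 = 6
```

Under the same model, lowering the timeout to 2 seconds and the retries
to 2 would cap the wait at 6 seconds.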

Is anyone willing to verify some of these statements about Solaris
resolver behaviour, or care to make suggestions, specifically about the
behaviour of the SMTP server even if the Solaris resolver routines are
not reliable?

Thanks and sorry for a lengthy letter!
-GSH
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


