Resolv.conf error?

From: Copper, Steve (scopper@westernpower.co.uk)
Date: Fri Sep 21 2007 - 08:49:32 EDT


Hi All,

I have just experienced a problem where my Samba shares to 2 servers
suddenly stopped working. They stopped working because our DNS server
crashed. I am 99.99% sure that this isn't a Samba issue, as all my other
16 servers with identical configuration files were still working, and
when the DNS server came back up the 2 servers' Samba shares started
working again. Stopping and restarting the Samba processes had no
effect.

So this led me to look at the resolv.conf files and this is where I
think that the problem lies. All the servers have a domain statement
followed by 2 nameserver entries, as follows:

domain aaaaa.co.uk
nameserver 1.1.1.1
nameserver 2.2.2.2

And it was the 1.1.1.1 server which crashed. The 2 servers whose Samba
shares stopped working had 1.1.1.1 as the first entry followed by
2.2.2.2; all the others, which still worked, have 2.2.2.2 as the first
entry followed by 1.1.1.1.
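
To compare the nameserver order across all 18 servers without reading each file by eye, something like the following could be used. It is only a sketch: the function name parse_resolv_conf is made up for illustration, it assumes Python is available somewhere the files can be copied to, and it assumes lines starting with "#" or ";" are comments, which may not match the Tru64 resolver exactly.

```python
def parse_resolv_conf(text):
    """Return (domain, [nameservers in listed order]) from resolv.conf text."""
    domain = None
    nameservers = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and (assumed) comment lines.
        if not fields or fields[0].startswith(("#", ";")):
            continue
        if fields[0] == "domain" and len(fields) > 1:
            domain = fields[1]
        elif fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
    return domain, nameservers


# The example file from this message.
sample = """domain aaaaa.co.uk
nameserver 1.1.1.1
nameserver 2.2.2.2
"""

domain, nameservers = parse_resolv_conf(sample)
print(domain, nameservers)
```

Running this over a copy of each server's /etc/resolv.conf and grouping hosts by the first nameserver entry would confirm whether the 2 affected servers really are the only ones listing 1.1.1.1 first.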

So this leads me to conclude that somehow the resolv.conf "process"
isn't working as designed, since the text below, taken from the man page
on resolv.conf, indicates that if the first server (1.1.1.1) isn't
available then it will move on to the second server (2.2.2.2).

......
  nameserver Address

  Internet address (in dot notation) of a name server that the resolver
  should query. Up to MAXNS (currently 3) name servers may be listed, one
  per keyword. If there are multiple servers, the resolver library
  queries them in the order listed. If no nameserver entries are present,
  the default is to use the name server on the local machine. (The
  algorithm used is to try a name server, and if the query times out, try
  the next, until out of name servers, then repeat trying all the name
  servers until a maximum number of retries are made).
......
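
The algorithm quoted above can be sketched roughly as follows. This is only an illustration of the man page wording, not the actual Tru64 resolver code: the function names, the retry count of 4, and the returned address are all assumptions, and the real DNS query is replaced by a stand-in function in which 1.1.1.1 always times out.

```python
def resolve_with_failover(name, nameservers, query, retries=4):
    """Try each nameserver in the order listed; repeat the list up to `retries` times.

    `query(server, name)` is supplied by the caller and either returns an
    answer or raises TimeoutError. Returns (answer, servers_tried).
    """
    tried = []
    for _ in range(retries):
        for server in nameservers:
            tried.append(server)
            try:
                return query(server, name), tried
            except TimeoutError:
                continue  # query timed out: move on to the next nameserver
    raise TimeoutError("no nameserver answered for %s" % name)


# Hypothetical stand-in for a real query: 1.1.1.1 is down, 2.2.2.2 answers.
def fake_query(server, name):
    if server == "1.1.1.1":
        raise TimeoutError(server)
    return "10.0.0.5"  # made-up address, for illustration only


answer, tried = resolve_with_failover("fileserver", ["1.1.1.1", "2.2.2.2"], fake_query)
print(answer, tried)
```

If the resolver behaves as documented, a server listing 1.1.1.1 first should still get an answer, but only after waiting out one timeout on the dead server for every lookup; a server listing 2.2.2.2 first never waits. Whether that extra delay is enough to upset Samba is not something this sketch can show.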

All the nsswitch.conf and svc.conf files are exactly the same on all 18
servers, so I doubt they are the problem, although here are the contents
of the svc.conf file and the nsswitch.conf file:

aliases=local,bind
auth=local
group=local,yp
hosts=local,bind
netgroup=yp
networks=local
passwd=local,yp
protocols=local
rpc=local
services=local

And

aliases: files dns
auth_default: files
auth_devassign: files
auth_files: files
auth_prpasswd: files
auth_ttys: files
group: compat
group_compat: nis
hosts: files dns
netgroup: nis
networks: files
passwd: compat
passwd_compat: nis
protocols: files
rpc: files
services: files

Has anyone seen this behaviour before, where the resolver does not move
on to the next nameserver entry, and does anyone know what the cause is?
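
One way to check each nameserver on its own, bypassing resolv.conf ordering entirely, would be to send a query straight to each address. The sketch below assumes Python is available on a machine that can reach both servers; the function names, the 2 second timeout, and the fixed query ID are illustrative choices, not anything taken from the Tru64 resolver.

```python
import socket
import struct


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of `hostname`."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Each label is length-prefixed; a zero byte ends the name.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in hostname.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", 1, 1)  # type A, class IN
    return header + question


def probe(server, hostname, timeout=2.0):
    """Return True if `server` replies to a query for `hostname` within `timeout`."""
    packet = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable-host errors
            return False
    # A genuine reply echoes the query ID in its first two bytes.
    return reply[:2] == packet[:2]


# Example use (needs network access, so it is left commented out):
# for server in ("1.1.1.1", "2.2.2.2"):
#     print(server, probe(server, "aaaaa.co.uk"))
```

A False result for the first server together with True for the second would at least confirm that the second nameserver was reachable and answering at the time the shares failed.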

We are running Tru64 5.1B with PK4.

Thanks in advance.

Regards
Steve Copper



