more info: Random network/NIS issue

From: Software Groups (sfgroups@gmail.com)
Date: Mon Aug 07 2006 - 23:29:26 EDT


It looks like the problem is with one host: when batch jobs are running, it
loses its network connection. From that server I was not able to ping the
default gateway.

Here is the netstat -s command output. Does this output look normal?

RAWIP
        rawipInDatagrams = 5247 rawipInErrors = 0
        rawipInCksumErrs = 0 rawipOutDatagrams = 4887
        rawipOutErrors = 0

UDP
        udpInDatagrams =669682 udpInErrors = 0
        udpOutDatagrams =700570 udpOutErrors = 0

TCP tcpRtoAlgorithm = 4 tcpRtoMin = 400
        tcpRtoMax = 60000 tcpMaxConn = -1
        tcpActiveOpens =387486 tcpPassiveOpens =2236130
        tcpAttemptFails = 4052 tcpEstabResets = 28710
        tcpCurrEstab = 658 tcpOutSegs =3686501853
        tcpOutDataSegs =3565412936 tcpOutDataBytes =3371633015
        tcpRetransSegs =13935950 tcpRetransBytes =2783222380
        tcpOutAck =117283172 tcpOutAckDelayed =16590179
        tcpOutUrg = 0 tcpOutWinUpdate = 2949
        tcpOutWinProbe = 2939 tcpOutControl =5247874
        tcpOutRsts = 7436 tcpOutFastRetrans =10390254
        tcpInSegs =2247106769
        tcpInAckSegs =1882098886 tcpInAckBytes =3905074533
        tcpInDupAck =91856682 tcpInAckUnsent = 0
        tcpInInorderSegs =976875420 tcpInInorderBytes =2970183904
        tcpInUnorderSegs = 19513 tcpInUnorderBytes =23708915
        tcpInDupSegs =117640 tcpInDupBytes =8101130
        tcpInPartDupSegs = 584 tcpInPartDupBytes =444224
        tcpInPastWinSegs = 19 tcpInPastWinBytes =1832077384
        tcpInWinProbe = 0 tcpInWinUpdate = 2932
        tcpInClosed = 120 tcpRttNoUpdate =2792622
        tcpRttUpdate =1877052564 tcpTimRetrans =20154403
        tcpTimRetransDrop = 2235 tcpTimKeepalive =16408674
        tcpTimKeepaliveProbe=6915641 tcpTimKeepaliveDrop = 741
        tcpListenDrop = 1032 tcpListenDropQ0 = 0
        tcpHalfOpenDrop = 0 tcpOutSackRetrans = 52308

IPv4 ipForwarding = 2 ipDefaultTTL = 255
        ipInReceives =2053070322 ipInHdrErrors = 0
        ipInAddrErrors = 0 ipInCksumErrs = 0
        ipForwDatagrams = 0 ipForwProhibits = 0
        ipInUnknownProtos =8831687 ipInDiscards = 0
        ipInDelivers =2244655136 ipOutRequests =3507582785
        ipOutDiscards = 8828 ipOutNoRoutes = 113
        ipReasmTimeout = 60 ipReasmReqds = 26
        ipReasmOKs = 26 ipReasmFails = 0
        ipReasmDuplicates = 0 ipReasmPartDups = 0
        ipFragOKs = 26 ipFragFails = 0
        ipFragCreates = 71 ipRoutingDiscards = 0
        tcpInErrs = 0 udpNoPorts =1359609
        udpInCksumErrs = 0 udpInOverflows = 0
        rawipInOverflows = 0 ipsecInSucceeded = 0
        ipsecInFailed = 0 ipInIPv6 = 0
        ipOutIPv6 = 0 ipOutSwitchIPv6 = 810
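
To make the output above easier to judge, here is a rough sketch (not a tested tool) that pulls the "name = value" pairs out of netstat -s text and computes a ratio such as retransmitted segments over data segments sent. The function names (parse_counters, ratio, deltas) are made up for illustration, and the sample text is just a few lines copied from the output above.

```python
import re

# Matches "name = value" and "name =value" pairs as printed by netstat -s.
PAIR = re.compile(r"([A-Za-z][A-Za-z0-9]*)\s*=\s*(-?\d+)")


def parse_counters(text):
    """Return {counter_name: int} for every 'name = value' pair in text."""
    return {name: int(value) for name, value in PAIR.findall(text)}


def ratio(counters, numerator, denominator):
    """Return numerator/denominator as a float, or None if it cannot be computed."""
    den = counters.get(denominator, 0)
    if numerator not in counters or den == 0:
        return None
    return counters[numerator] / den


def deltas(before, after):
    """Return the counters that changed between two snapshots (after - before)."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


# A few lines copied from the netstat -s output in this message.
SAMPLE = """
TCP tcpRtoAlgorithm = 4 tcpRtoMin = 400
        tcpOutDataSegs =3565412936 tcpOutDataBytes =3371633015
        tcpRetransSegs =13935950 tcpRetransBytes =2783222380
        tcpTimKeepaliveProbe=6915641 tcpTimKeepaliveDrop = 741
"""

if __name__ == "__main__":
    counters = parse_counters(SAMPLE)
    r = ratio(counters, "tcpRetransSegs", "tcpOutDataSegs")
    print("retransmitted/data segments: %.2f%%" % (100 * r))
```

These totals are cumulative since boot, so they say little about a 10pm to 3am window by themselves. Saving one snapshot before the batch jobs start and one during the outage, then passing both to deltas(), would show which counters actually move while the problem is happening.
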
Thanks

On 8/7/06, Software Groups <sfgroups@gmail.com> wrote:
> Hi Managers,
>
> We have about 50 Solaris servers set up with NIS and autofs. This
> setup has been working fine for about 5 years.
>
> Now the problem is that on a few servers we are getting the "NIS server not
> responding" error message. This happens only from 10pm to 3am; the
> problem lasts from about 10 minutes to 2 hours, and after that all services
> work fine. During the daytime we are not having any problems.
>
> All servers are running Solaris 8. The NIS master server patch level is
> Generic_117350-27; the NIS slave server patch level is Generic_117350-33.
>
> On one of the NIS client servers, we ran a shell script to ping another
> server; during the NIS problem time we got 100% ping packet loss. We
> opened a ticket with the network team; they checked the Cisco switch.
> 1. They didn't find any error messages on the switch
> 2. Servers are connected using a VLAN
> 3. No firewall between servers.
>
> Some servers have two network interfaces, one for the normal network and
> the other for the backup network. The ifconfig command shows the same MAC
> address for both interfaces (hme0, hme1). The network team said it might be
> an issue related to the MAC addresses.
>
> 1. In syslog I didn't see any IP address conflict or network
> connection UP/DOWN error messages.
> 2. eeprom shows local-mac-address?=false
> 3. Server load is normal during that time.
> 4. NIS master server uptime is 183 days; client server uptime is 30 to 50
> days
>
> I am looking for some troubleshooting steps to resolve this issue.
>
> 1. Is this related to the server patch level?
> 2. Why does this problem occur only at night, at random times?
> 3. These servers have been running for years; in the last couple of months
> we installed only OS patches on these servers. Other than the patches,
> nothing has changed in this configuration.
> 4. If it is a network switch issue, is there a way to capture some debug
> messages on the host side?
>
> Thanks
>
>
> --
> Software Groups (SFG)
> http://www.sfgroups.com
>
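
The ping check described in the quoted message could be extended to record timestamps, so that outage windows can be lined up against the batch job schedule. The following is only a sketch under assumptions: it relies on the Solaris form `ping host timeout` (other systems need different arguments), and the function names and the host names "gateway" and "nis-master" are hypothetical placeholders.

```python
import subprocess
import time


def probe(host, timeout_s=5):
    """Return True if one ping to host succeeds.

    Assumes the Solaris command form `ping host timeout`.
    """
    result = subprocess.run(
        ["ping", host, str(timeout_s)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(hosts, rounds, interval_s=10, probe_fn=probe,
          clock=time.time, sleep=time.sleep):
    """Probe each host once per round; return (timestamp, host, ok) tuples."""
    log = []
    for i in range(rounds):
        for host in hosts:
            log.append((clock(), host, probe_fn(host)))
        if i + 1 < rounds:
            sleep(interval_s)
    return log


def outages(log):
    """Collapse a log into (host, first_fail_ts, last_fail_ts) windows."""
    open_windows = {}  # host -> (first failure time, latest failure time)
    windows = []
    for ts, host, ok in log:
        if not ok:
            first, _ = open_windows.get(host, (ts, ts))
            open_windows[host] = (first, ts)
        elif host in open_windows:
            first, last = open_windows.pop(host)
            windows.append((host, first, last))
    # Outages still in progress when the log ends.
    for host, (first, last) in open_windows.items():
        windows.append((host, first, last))
    return windows


if __name__ == "__main__":
    # Hypothetical host names; replace with the real gateway and NIS server.
    log = watch(["gateway", "nis-master"], rounds=6, interval_s=10)
    for host, first, last in outages(log):
        print(host, time.ctime(first), "to", time.ctime(last))
```

Probing the default gateway and the NIS server separately helps tell a host-wide network loss apart from a problem reaching NIS alone.
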

--
Software Groups (SFG)
http://www.sfgroups.com
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

