Summary: slow tcp response from sendmail

From: Jay R. Wren (jrwren@oakland.edu)
Date: Thu May 09 2002 - 17:32:39 EDT


Well, I received some excellent responses, most of which could have been
avoided if I had been more descriptive in my original email.

Suggestions were:
Reverse DNS was slow or not available, thus causing the problem.
Our sendmail was using ORBS and its DNS lookups were unavailable.
Slow network due to mismatched autonegotiation.
High load average causing sendmail to stop responding (see the QueueLA
and RefuseLA options).

Unfortunately none of these were the problem.

What I did finally notice was that 'sysconfig -q socket' showed
sobacklog_drops counting upward very quickly. I looked this up in
the Performance Tuning manual and the Internet Server Best Practices
docs. They were unclear, but I got the impression that I should max out
both somaxconn and sominconn. somaxconn was already at the
maximum value of 65535, so I set sominconn to 65535 as well. This didn't
seem to help.
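
For anyone who wants to watch the counter rather than eyeball it, here is
a rough sketch. It assumes 'sysconfig -q socket' prints one "name = value"
pair per line and that a POSIX shell is available. The function names
get_drops and drops_delta are made up for illustration.

```shell
#!/bin/sh
# Read `sysconfig -q socket` style output on stdin and print the value
# of sobacklog_drops. Assumes lines of the form "name = value".
get_drops() {
    awk '$1 == "sobacklog_drops" { print $3 }'
}

# Print how much the counter grew between two samples.
# $1 = earlier sample, $2 = later sample
drops_delta() {
    echo $(( $2 - $1 ))
}

# Possible use on the affected host (not executed here):
#   before=$(sysconfig -q socket | get_drops)
#   sleep 10
#   after=$(sysconfig -q socket | get_drops)
#   echo "sobacklog_drops grew by $(drops_delta "$before" "$after") in 10s"
```

A delta that keeps growing while the load is low matches what I was seeing.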

I called Compaq Support. We have a silver agreement, so I figured I had
better use it.

After some discussion they agreed that maxing out sominconn and
somaxconn were good steps, but I was still having this problem. It turns
out that there is a known bug with closing a socket. I have no idea why
it reared its ugly head now. We had been running fine for months, with no
recent changes. We moved to PK1 and applied a special patch referred to us
by Compaq Support. This fixed the problem.

It turns out the sendmail 'server' process that listens on the socket
was going into state 'U', as shown in the ps output. Using dbx, the phone
support person helped me verify that the process was hung in the
soclose call. Unfortunately I don't remember the exact dbx commands used to
verify this. It was something along the lines of:
# dbx /vmunix
(dbx) set $pid=8080
(dbx) t
....output here...

where 8080 is the pid of the uninterruptible process.
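
To find the pid to hand to dbx, something like the following sketch could
pick out processes in state 'U'. It assumes ps output with a header line
followed by "pid state command" columns. The function name
find_uninterruptible is made up, and the ps invocation in the comment is
an assumption about the available options.

```shell
#!/bin/sh
# Read "pid state command" lines on stdin, skip the header line, and
# print the pid and command of every process whose state starts with U.
find_uninterruptible() {
    awk 'NR > 1 && $2 ~ /^U/ { print $1, $3 }'
}

# Possible use (assumed ps options, not executed here):
#   ps -e -o pid,state,comm | find_uninterruptible
```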

Thanks to all for suggestions.

--
Jay R. Wren
Systems Programmer
Oakland University
Rochester, Michigan

Jay R. Wren wrote:
>> Hello,
>> Today we started noticing slow tcp response from sendmail running on 
>> Tru64 5.1A. The behavior is basically slow to no(timed out) response 
>> on tcp port 25. The logs don't show anything. It is a 2 processor DS20 
>> with a load avg of around 1. The CPU's don't do much.
>>
>> vmstat, iostat, netstat all show normal, expected information. There 
>> are no page outs. Disk IO seems normal. There are no network errors or 
>> collisions.
>>
>> The mailq has about 1700 messages waiting in it, but most are to 
>> deferred hosts, etc. There is usually around 21 sendmail processes 
>> running, processing the queue at any given time.
>>
>> Even a 'telnet localhost 25' will occasionally time out, even though 
>> now it is 5:30pm, off our peak hours, so our load is under .5.
>>
>> If anyone has any clues or pointers I would appreciate it.
>>
>> -- 
>> Jay R. Wren
>> Systems Programmer
>> Oakland University
>> Rochester, Michigan
>>