weekly system slowdown

From: Dirk Kleinhesselink (dkleinh@phy.ucsf.edu)
Date: Wed Oct 06 2004 - 19:00:35 EDT


I have a two-member Tru64 5.1A PK4 cluster connected to an external HSG80
RAID controller. For the last several weeks, every Wednesday afternoon my
users complain of poor system response. They are running experiments
which require a timely response in writing files over the network. A
couple of weeks back it was so bad they really couldn't work. At that
time, after much checking into the system, I was able to determine that
there was a huge file stuck in the mail queue that the mail system kept
redelivering to the recipients AND the Samba service (2.2.8a) was under
some kind of denial-of-service attack. At that time, the server's load
averages were 30-50. After purging the mail, stopping the Samba service,
and replacing it with 2.2.11, the load average dropped to 0-2. However,
I am still getting complaints that, while the response is much better,
it is still occasionally slow, and in particular things seem to become
much slower every Wednesday afternoon. I've looked through the cron
files and nothing starts on Wednesday at noon or in the afternoon, nor
do I see much of anything in the process table except for many
[icssvr_daemon_fr] and [icssvr_daemon_pe] processes.

Today I killed Samba, the mail service, and the web service, and that
seemed to be of no avail to the users running experiments. They are
running on client machines, writing over NFS to the server. Their home
directories are mounted from the server. I'm at a loss to figure out
what's going on. Does anyone have any idea what might be causing these
problems?

Thanks,
Dirk


