Tracking down a memory leak

From: stan (stanb@panix.com)
Date: Tue May 10 2005 - 19:40:12 EDT


I've got a machine that has suddenly developed a fairly large memory leak
(roughly 1 MB/hour). The machine is running:

SunOS AW0550 5.8 Generic_108528-13 sun4u sparc SUNW,Sun-Blade-100

Now, in the past I've used this little script to catch the culprit process:

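#!/bin/sh
# Append one timestamped snapshot to the report: every process's
# virtual size (VSZ, in KB), PID and command name, largest first.
OUTFILE=/tmp/siz.rpt

date >> "$OUTFILE"
ps -ef -o vsz -o pid -o comm | sort -n -r >> "$OUTFILE"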

I process the resulting data file with this Perl script:

#!/bin/perl -w

use strict;

# main()

my $pass = 1;
my $records = 0;
my $line_no = 0;
my $a_record;
my @rval;
my @memused;
my @proccnt;
my %record;
my @results;
my @pass_date;
my $field_qty;
my $lkey;
my $mkey;
my $proc_plus_pid;
my $k;
my $v;
my $date;
my $l_size;
{
        open FILE, "< siz.rpt" or die "Cannot open siz.rpt: $!\n";
        LINE: while (<FILE>)
        {
                $line_no++;
                next LINE if /^$/; # skip blank lines

                chomp; # remove the newline that came from the file
                $a_record = $_;
                $a_record =~ s/^\s+//;

                # Parse the fields of each line into individual
                # records
                (@rval)=split(/\s+/,$a_record);
                $field_qty = scalar(@rval);
                if ($rval[0] eq "VSZ")
                {
                        $pass++;
                        next LINE;
                }
                if ($field_qty != 3)
                {
                        if ($field_qty == 6)
                        {
                                $date = $a_record;
                        }
                        next LINE;
                }
                else
                {
                    # Good data, having passed all the tests:
                    # put it into the hash (associative array)
                    $memused[$pass] += $rval[0];
                    $proccnt[$pass]++;
                    $pass_date[$pass] = $date;
                    $proc_plus_pid = $rval[2].':'.$rval[1];
                    $record{$proc_plus_pid}{"SIZE"}[$pass] = $rval[0];
                    $record{$proc_plus_pid}{"DATE"}[$pass] = $date;
                    if (!$record{$proc_plus_pid}{LPASS})
                    {
                        # first pass in which this process was seen
                        $record{$proc_plus_pid}{LPASS} = $pass;
                        $record{$proc_plus_pid}{ISIZE} = $rval[0];
                    }
                    # last pass in which this process was seen
                    $record{$proc_plus_pid}{HPASS} = $pass;
                    if ($rval[0] != $record{$proc_plus_pid}{ISIZE})
                    {
                        # Virtual size (VSZ) of this process has changed:
                        # flag it for review
                        $record{$proc_plus_pid}{FOOBAR} = 1;
                    }
                }
        }
        # print selected results: every process whose VSZ changed,
        # with the date of each change
        foreach $lkey (sort keys(%record))
        {
                if ($record{$lkey}{FOOBAR}) {
                    print "$lkey\n";
                    my $zat = $record{$lkey}{LPASS};
                    my $zto = $record{$lkey}{HPASS};
                    $l_size = 0;
                    while ($zat <= $zto) {
                        my $size = $record{$lkey}{SIZE}[$zat];
                        # skip passes in which this process was not seen
                        if (defined $size && $l_size != $size)
                        {
                            print "$record{$lkey}{DATE}[$zat] size $size\n";
                            $l_size = $size;
                        }
                        $zat++;
                    }
                }
        }
        print "\n\nMemory Usage:\n";
        print "Pass\tDate\t\t\t\tProcs\tMemory\n";
        my $zat=1;
        while ($proccnt[$zat]) {
          print "$zat\t$pass_date[$zat]\t$proccnt[$zat]\t$memused[$zat]\n";
          $zat++;
        }
}

Which generates output like this:

AW0550# cd /tmp
AW0550# /t4.pl
/opt/fox/hstorian/bin/sampling_ctl:2005
Tue May 10 15:00:00 GMT 2005 size 10080
Tue May 10 17:00:00 GMT 2005 size 10104
/usr/local/bin/ntop:1045
Tue May 10 15:00:00 GMT 2005 size 29528
Tue May 10 17:00:00 GMT 2005 size 29536
/usr/sbin/syslogd:983
Tue May 10 15:00:00 GMT 2005 size 3384
Tue May 10 18:00:00 GMT 2005 size 3392
mibiisa:1838
Tue May 10 15:00:00 GMT 2005 size 2432
Tue May 10 16:00:00 GMT 2005 size 2440

Memory Usage:
Pass    Date                            Procs   Memory
1       Tue May 10 15:00:00 GMT 2005    137     672784
2       Tue May 10 16:00:00 GMT 2005    137     672792
3       Tue May 10 17:00:00 GMT 2005    137     672824
4       Tue May 10 18:00:00 GMT 2005    139     688048
5       Tue May 10 19:00:00 GMT 2005    137     672832
AW0550#
script done on Tue May 10 19:30:02 2005
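As a cross-check on the per-process listing above, a minimal sh/awk sketch can rank processes by how much their VSZ (in KB) changed between the first and last snapshot in which each appears. It reads the same siz.rpt file the collector writes and, like the Perl script, keys each process on COMMAND:PID. The function name vsz_growth and the AWK override are illustrative assumptions, not part of the original scripts. On Solaris, setting AWK=nawk avoids the limitations of the old /usr/bin/awk.

```shell
#!/bin/sh
# Hypothetical helper: rank processes in the collector's report file by the
# change in VSZ (KB) between the first and last snapshot that contains them.
# Records are "VSZ PID COMMAND" lines; the key is COMMAND:PID, as in the Perl
# script, so a restarted daemon shows up as a separate entry.
# Set AWK=nawk on Solaris, where the old /usr/bin/awk is limited.
vsz_growth() {
    "${AWK:-awk}" '
        NF == 3 && $1 ~ /^[0-9]+$/ {       # skips date lines and the ps header
            key = $3 ":" $2
            if (!(key in first))
                first[key] = $1             # VSZ in the first snapshot seen
            last[key] = $1                  # VSZ in the latest snapshot seen
        }
        END {
            for (key in first)
                if (last[key] != first[key])
                    printf "%d\t%s\n", last[key] - first[key], key
        }' "${1:-/tmp/siz.rpt}" | sort -n -r    # largest growth first
}
```

Usage: AWK=nawk vsz_growth /tmp/siz.rpt (or vsz_growth - to read standard input). For the data behind the output above, it should list /opt/fox/hstorian/bin/sampling_ctl:2005 first at 24 KB, with ntop, syslogd and mibiisa at 8 KB each.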

The problem is, this time I don't see a culprit.
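To put a number on that: the Memory Usage totals are summed VSZ in KB, and from pass 1 to pass 5 they grow from 672784 to 672832, i.e. 48 KB in four hours, or about 12 KB/hour, against a leak of roughly 1 MB/hour (about 1024 KB/hour). Pass 4 is higher, but it also counts two extra processes. The size changes reported for the four processes above add up to the same 48 KB (24 + 8 + 8 + 8). A minimal sh/awk sketch of that arithmetic, reading the Perl script's output and assuming one pass per hour as the timestamps suggest (the function name vsz_rate is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: average change in total VSZ per pass, taken from the
# "Memory Usage" rows printed by the Perl script (pass, date, procs, memory).
# Assumes one pass per hour, so the result is reported in KB/hour.
vsz_rate() {
    "${AWK:-awk}" '
        $1 ~ /^[0-9]+$/ && NF >= 4 {   # summary rows start with the pass number
            if (n == 0) { p0 = $1; m0 = $NF }
            p1 = $1; m1 = $NF; n++
        }
        END {
            if (p1 > p0)
                printf "%d KB/hour\n", (m1 - m0) / (p1 - p0)
        }'
}
```

For example, /t4.pl | vsz_rate prints 12 KB/hour for the five passes shown above. Only the first and last passes enter the result, so a transient spike such as pass 4 does not affect it.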

Is it possible that something in the kernel is allocating memory that ps
won't report? If so, how can I examine this?

Thanks.

-- 
U.S. Encouraged by Vietnam Vote - Officials Cite 83% Turnout Despite Vietcong Terror 
- New York Times 9/3/1967
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


This archive was generated by hypermail 2.1.7 : Wed Apr 09 2008 - 23:30:41 EDT