Re: accessing a system with high load

From: Holger.VanKoll@SWISSCOM.COM
Date: Thu Feb 27 2003 - 11:32:13 EST


>It's perhaps worth pointing out that WLM's memory management ...
 
This is the point. If you use WLM, ulimit, schedtune, nice or whatever, you can restrict CPU or REAL memory.
That doesn't help if responses take hours or connections don't get established because of too much paging:
by restricting memory you make things (paging) worse,
and restricting CPU won't help, as the blocked processes (waiting for page-ins) don't consume CPU.
 
 
I see two ways to go:
1) increase paging space, and increase npswarn and npskill
2) find some way to use pinned memory. I don't see how to use plock() with ksh/ps/kill etc., as I don't have the source to add plock() and link them statically.
 
So 1) is probably the way to go.
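To make option 1) a bit more concrete: according to the vmtune documentation, npswarn and npskill are counts of free paging-space pages; below npswarn the kernel sends SIGDANGER to processes, and below npskill it starts killing processes. The sketch below only models that threshold check (the names are hypothetical, and the kernel's policy for choosing which processes to signal is not modelled), plus the arithmetic for how much paging space npskill keeps in reserve, assuming 4 KB pages.

```c
/* Hypothetical model of the vmtune paging-space thresholds.
   Both thresholds are counts of free 4 KB paging-space pages. */
enum ps_state { PS_OK, PS_WARN, PS_KILL };

/* Classify the number of free paging-space pages.
   Assumes npskill < npswarn, as in a sensible configuration. */
enum ps_state classify_paging(unsigned long free_pages,
                              unsigned long npswarn,
                              unsigned long npskill)
{
    if (free_pages < npskill)
        return PS_KILL;   /* processes start being killed */
    if (free_pages < npswarn)
        return PS_WARN;   /* SIGDANGER is sent */
    return PS_OK;
}

/* Paging space (in MB) still free when killing starts,
   assuming 4 KB pages. */
unsigned long npskill_reserve_mb(unsigned long npskill)
{
    return npskill * 4 / 1024;
}
```

For example, raising npskill from 1024 to 4096 pages raises that reserve from 4 MB to 16 MB, at the price of processes being killed earlier.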
 
One could also think about what to do once you are in that situation and do get a prompt.
ps and kill sound good, but if the processes are blocked due to paging, the kill signals don't get delivered (not even -9; and I'd prefer -17, which is SIGSTOP on AIX).
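A minimal sketch of stopping a runaway process instead of killing it outright: SIGSTOP (17 on AIX, but the number differs between systems, so the code uses the name) cannot be caught or ignored, and a stopped process no longer touches its pages, so it can be inspected and later resumed with SIGCONT or killed. On a thrashing system the signal still only takes effect once the target gets scheduled, as described above. The helper names are hypothetical, and child_is_stopped() only works for processes you started yourself.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stop a runaway process so it no longer touches (and pages in) memory.
   Returns 0 on success, -1 on failure. */
int freeze_process(pid_t pid)
{
    if (kill(pid, SIGSTOP) == -1) {
        fprintf(stderr, "SIGSTOP %ld: %s\n", (long)pid, strerror(errno));
        return -1;
    }
    return 0;
}

/* Let a frozen process continue. */
int thaw_process(pid_t pid)
{
    return kill(pid, SIGCONT);
}

/* Get rid of it for good; SIGKILL also ends a stopped process. */
int kill_process(pid_t pid)
{
    return kill(pid, SIGKILL);
}

/* For a child of this process only: wait until it is reported stopped. */
int child_is_stopped(pid_t child)
{
    int status;

    if (waitpid(child, &status, WUNTRACED) == -1)
        return 0;
    return WIFSTOPPED(status);
}
```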
 
I will think about how to improve response time once I have a shell.
Maybe vmtune with maxperm low, minfree and maxfree high, strict_maxperm enabled, and perhaps raising npskill could give you some "air to breathe".
schedtune -f (the delay before a failed fork() is retried) could also be worth a try.
If anyone has an idea, feel free to post.
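schedtune -f controls how long the kernel waits before retrying a failed fork(). The same idea can be applied in user space, for instance in a small emergency launcher: retry fork() a few times with a delay as long as the failure looks like a resource shortage (EAGAIN or ENOMEM, the errors POSIX specifies for that case). A sketch with hypothetical function names, assuming /bin/sh is available to run the command.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Retry fork() up to 'tries' times, sleeping 'delay_ms' milliseconds
   between attempts, as long as the error looks transient. */
pid_t fork_with_retry(int tries, long delay_ms)
{
    struct timespec delay;
    int i;

    delay.tv_sec = delay_ms / 1000;
    delay.tv_nsec = (delay_ms % 1000) * 1000000L;

    for (i = 0; i < tries; i++) {
        pid_t pid = fork();
        if (pid != -1)
            return pid;          /* 0 in the child, child's pid in the parent */
        if (errno != EAGAIN && errno != ENOMEM)
            break;               /* not a resource shortage: give up */
        nanosleep(&delay, NULL);
    }
    return -1;
}

/* Run 'cmd' with /bin/sh via fork_with_retry(); returns its exit status,
   or -1 if it could not be started or did not exit normally. */
int run_with_retry(const char *cmd, int tries, long delay_ms)
{
    int status;
    pid_t pid = fork_with_retry(tries, delay_ms);

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);              /* exec failed */
    }
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```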
 
If you want to test this on a test node, compile the following and start some (hundreds of) instances of it:
#include <stdio.h>
#include <stdlib.h>

#define MEMS      1000   /* number of blocks to allocate */
#define BLOCKSIZE 5000   /* bytes per block, about 5 MB per instance */

int main(void)
{
    char *mem[MEMS];
    int i, nblocks;
    size_t u;

    /* allocate up to MEMS blocks, stopping at the first failure so that
       no uninitialised pointer is used later */
    for (nblocks = 0; nblocks < MEMS; nblocks++) {
        mem[nblocks] = malloc(BLOCKSIZE);
        if (mem[nblocks] == NULL) {
            puts("malloc failed");
            break;
        }
    }

    printf("allocated %lu MB, now filling memory\n",
           (unsigned long)nblocks * BLOCKSIZE / 1024 / 1024);

    /* touch every byte over and over, so all pages stay in the working
       set and the system has to keep paging them in and out */
    for (;;) {
        puts("filling memory");
        for (i = 0; i < nblocks; i++)
            for (u = 0; u < BLOCKSIZE; u++)
                mem[i][u] = 0;

        puts("reading memory");
        for (i = 0; i < nblocks; i++)
            for (u = 0; u < BLOCKSIZE; u++)
                if (mem[i][u] == 1)
                    puts("strange...");
    }

    return 0;   /* never reached */
}
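Starting "some hundreds" of instances by hand gets tedious. A minimal launcher sketch, with hypothetical function names, which assumes you pass the path of the compiled test program; it stops early when fork() fails, which is exactly what to expect once the test node runs out of paging space.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start 'n' copies of the program at 'path' with argument vector 'argv'.
   Child pids are stored in 'pids'; returns how many were started. */
int start_instances(const char *path, char *const argv[], int n, pid_t pids[])
{
    int started;

    for (started = 0; started < n; started++) {
        pid_t pid = fork();
        if (pid == -1) {
            perror("fork");      /* expected once paging space runs out */
            break;
        }
        if (pid == 0) {
            execv(path, argv);
            _exit(127);          /* exec failed */
        }
        pids[started] = pid;
    }
    return started;
}

/* Wait for all started instances; returns how many exited with status 0. */
int wait_instances(const pid_t pids[], int n)
{
    int i, ok = 0, status;

    for (i = 0; i < n; i++)
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    return ok;
}
```

Since the test program loops forever, you would normally kill the instances (e.g. SIGKILL on each stored pid) rather than call wait_instances(); the latter is mainly useful for checking the launcher with a short-lived program.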

 

 
 
 

        -----Original Message-----
        From: Green, Simon [mailto:Simon.Green@EU.ALTRIA.COM]
        Sent: Thursday, February 27, 2003 4:37 PM
        To: aix-l@Princeton.EDU
        Subject: Re: accessing a system with high load
        
        
        I did wonder about that. I think you'd have to deny the offending process all resources. That wouldn't improve things, but at least it would stop things getting worse. It might be quite difficult to do, however, unless you set some ludicrously low hard limit for CPU. You could automate it, but it might be tricky to identify the task. (Unless you have a pretty good idea of what that process is, in which case why hasn't it been fixed? :-))
         
        It's perhaps worth pointing out that WLM's memory management wouldn't help in the slightest and could make things worse. It restricts a process's use of real memory, not paging space, so if anything paging would increase.
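The "ludicrously low hard limit for CPU" can be sketched with setrlimit(RLIMIT_CPU), which has to be set in the process itself (or inherited from its parent) before the program starts; this is an illustration with hypothetical function names, not a WLM configuration. It also shows Holger's point: a process that is merely waiting (sleeping, or blocked on page-ins) uses almost no CPU time, so the limit never triggers.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run 'path' with soft and hard CPU-time limits of 'seconds'.
   Returns the child's wait status, or -1 on error. */
int run_with_cpu_limit(const char *path, char *const argv[], rlim_t seconds)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        struct rlimit rl;
        rl.rlim_cur = seconds;   /* SIGXCPU is sent when this is exceeded */
        rl.rlim_max = seconds;   /* and the limit cannot be raised again */
        if (setrlimit(RLIMIT_CPU, &rl) == -1)
            _exit(126);
        execv(path, argv);
        _exit(127);              /* exec failed */
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;
}

/* True if the wait status shows the child was ended by the CPU limit. */
int killed_by_cpu_limit(int status)
{
    return WIFSIGNALED(status) &&
           (WTERMSIG(status) == SIGXCPU || WTERMSIG(status) == SIGKILL);
}
```

For example, "sh -c 'while :; do :; done'" started with a 1-second limit is terminated, while the same limit on "sh -c 'sleep 2'" lets it exit normally.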

                -----Original Message-----
                From: Ferenc Gyurcsan [mailto:fgyurcsa@AVAILANT.COM]
                Sent: 27 February 2003 15:25
                To: aix-l@Princeton.EDU
                Subject: Re: accessing a system with high load
                
                
                Haven't tried it yet, but maybe you could set up WLM and allow the application process only a given amount of resources whenever some kind of login is running as well?
                 
                --Ferenc

                        -----Original Message-----
                        From: Green, Simon [mailto:Simon.Green@EU.ALTRIA.COM]
                        Sent: Thursday, February 27, 2003 9:41 AM
                        To: aix-l@Princeton.EDU
                        Subject: Re: accessing a system with high load
                        
                        
                        Once things get bad, I don't think that there's much you can do.
                        Sometimes, an existing telnet session will survive and can be used to kill off high memory users.
                         
                        I think that the best way to deal with this sort of situation is through monitoring, with an alarm sent to a pager or something if paging space utilisation reaches, say, 80%.
                         
                        That's what we do, using Patrol. We've still been caught out a couple of times, though, on one of our QA systems. (The application has a memory leak which hasn't been sorted out yet. Every now and then it goes berserk and eats all the memory.)
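Without Patrol, a home-grown check could be a small program run from cron. The sketch below assumes the two-line output format of AIX's lsps -s (a header line, then the total size in MB and the percentage used) and takes the alarm threshold (e.g. 80%) as a parameter; the function names are hypothetical, and what to do when the alarm fires (mail, pager) is left out.

```c
#include <stdio.h>
#include <string.h>

/* Parse the output of AIX "lsps -s", assumed to look like:
       Total Paging Space   Percent Used
             512MB               1%
   Returns 0 and fills in the total (MB) and percentage used,
   or -1 if the text does not have that shape. */
int parse_lsps_summary(const char *text, unsigned long *total_mb, int *pct_used)
{
    const char *nl = strchr(text, '\n');   /* skip the header line */

    if (nl == NULL)
        return -1;
    if (sscanf(nl + 1, " %luMB %d%%", total_mb, pct_used) != 2)
        return -1;
    return 0;
}

/* Read the whole output from a stream, e.g. one opened with
   popen("lsps -s", "r") on AIX, and parse it. */
int read_lsps_summary(FILE *fp, unsigned long *total_mb, int *pct_used)
{
    char buf[512];
    size_t n = fread(buf, 1, sizeof buf - 1, fp);

    buf[n] = '\0';
    return parse_lsps_summary(buf, total_mb, pct_used);
}

/* Nonzero if usage has reached the alarm threshold (e.g. 80%). */
int paging_alarm(int pct_used, int threshold_pct)
{
    return pct_used >= threshold_pct;
}
```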
                         
                        You could create a small paging space somewhere, not normally swapped on, and activate it when things get bad. That would give you a breathing space to take care of the problem without the risk of your paging spaces creeping up in size. Since it would just be for emergencies, you could allocate it on a disk which already had a paging space, if necessary. Obviously you still need some sort of monitoring and alarm system.
                        
                        

                        Simon Green
                        Altria ITSC Europe s.a.r.l.
                        
                        AIX-L Archive at http://marc.theaimsgroup.com/?l=aix-l&r=1&w=2
                        AIX FAQ at http://www.faqs.org/faqs/aix-faq/
                        
                        N.B. Unsolicited email from vendors will seldom be appreciated.

                                -----Original Message-----
                                From: Holger.VanKoll@SWISSCOM.COM [mailto:Holger.VanKoll@SWISSCOM.COM]
                                Sent: 27 February 2003 14:15
                                To: aix-l@Princeton.EDU
                                Subject: accessing a system with high load
                                
                                

                                Hello,

                                I am thinking about what to do to ensure access to a system where some application uses so much paging space that connections (telnet/ssh/getty) can't be made anymore (fork fails).

                                AIX 5.1 has the ability (shconf) to take certain actions if processes above a certain priority don't get CPU anymore.
                                Also, one could start a high-priority ssh daemon at boot.
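Starting a daemon at raised priority boils down to lowering its nice value before exec(), since the nice value is kept across exec(). A sketch with hypothetical function names (the sshd path in the comment is only an example); as the next paragraph says, this only helps when the problem is CPU starvation, not paging.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Set this process's nice value; lowering it (raising priority)
   needs root. Returns 0 on success, -1 on failure. */
int set_nice(int value)
{
    return setpriority(PRIO_PROCESS, 0, value);
}

/* Read this process's nice value. getpriority() may legitimately
   return -1, so errno has to be cleared and checked. */
int get_nice(int *value)
{
    errno = 0;
    *value = getpriority(PRIO_PROCESS, 0);
    return (*value == -1 && errno != 0) ? -1 : 0;
}

/* Set the nice value, then replace this process with the daemon
   (e.g. path "/usr/sbin/sshd", which is only an example); the nice
   value is kept across exec. Returns only on failure. */
int exec_with_nice(int value, const char *path, char *const argv[])
{
    if (set_nice(value) == -1) {
        perror("setpriority");
        return -1;
    }
    execv(path, argv);
    perror("execv");
    return -1;
}
```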

                                That's fine, but it only solves the problem when applications consume too much CPU. It doesn't help if they consume too much paging space.

                                As far as I can see, even ulimit/WLM offers no way to solve this problem.

                                I could try to start sshd with plock(), but that would only get sshd up and running... any command started from there would still fail (fork: not enough memory available).
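plock() is the System V call mentioned here; as a portable stand-in, this sketch uses POSIX mlockall(), assuming it is available at your AIX level. It also illustrates the limitation described: POSIX specifies that memory locks are not inherited by a child created with fork() and are removed by exec(), so a pinned sshd does not make the shells it starts pinned as well. The function names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Lock all current and future pages of this process into real memory
   so it cannot be paged out. Usually needs root (or a large enough
   memory-lock resource limit). Returns 0 on success, -1 on failure
   with errno preserved. */
int pin_self(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
        int err = errno;
        fprintf(stderr, "mlockall: %s\n", strerror(err));
        errno = err;
        return -1;
    }
    return 0;
}

/* Undo pin_self(). Children created with fork() never had the lock,
   and exec() drops it, so commands started from a pinned daemon run
   unpinned. */
int unpin_self(void)
{
    return munlockall();
}
```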

                                So far, I see no other possibility than to increase paging space and set high values for npswarn and npskill (vmtune).
                                The only disadvantage I currently see is more disk usage for paging space.

                                What do you think / what do you do to ensure access to a high-paging system?



This archive was generated by hypermail 2.1.7 : Wed Apr 09 2008 - 22:16:37 EDT