SUMMARY: ES40 Hardware or Software?

From: jim caldwell (caldwell@heimdal.compchem.ucsf.edu)
Date: Thu Aug 01 2002 - 14:36:54 EDT


It turns out that my problem machine had "dropped" 2 cpus
in the interval since I last power cycled it. It HAD crashed
a couple of times but restarted by itself, apparently OK.

I power cycled it and all 4 cpus came up. Why it dropped
the 2 cpus is still a mystery but now that I know that this can
happen I'll actually check the /var/adm/messages file after any
machine crashes and reboots.

Thanks to Paul A Sand, who suggested using

/usr/sbin/pset_info

to probe how many cpus the system was actually using. Only 2 showed
on the problem machine, while the other 8 machines all showed 4.
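
For anyone who wants to automate this check after a reboot, here is a rough
Python sketch, not something I actually ran on the ES40s. It shows why 4
CPU-bound processes on 2 working cpus each sit near 50% in "top", and flags a
machine that reports fewer cpus than expected. The names per_process_share,
missing_cpus and EXPECTED_CPUS are made up for illustration, and os.cpu_count()
is only a stand-in for pset_info; whether it reflects a dropped cpu on Tru64 is
an assumption.

```python
import os


def per_process_share(nprocs, ncpus):
    """Fraction of one cpu each CPU-bound process can expect to get."""
    if nprocs <= 0 or ncpus <= 0:
        raise ValueError("nprocs and ncpus must be positive")
    # With more processes than cpus, the scheduler time-slices them.
    return min(1.0, ncpus / nprocs)


def missing_cpus(expected, online):
    """Number of cpus that should be present but are not reported."""
    return max(0, expected - online)


if __name__ == "__main__":
    EXPECTED_CPUS = 4  # hypothetical: each ES40 here is a 4-processor box

    # Stand-in for /usr/sbin/pset_info (assumption, see note above).
    online = os.cpu_count() or 0

    dropped = missing_cpus(EXPECTED_CPUS, online)
    if dropped:
        print("WARNING: %d of %d cpus not reported" % (dropped, EXPECTED_CPUS))
    else:
        print("cpu count looks OK: %d reported" % online)
```

With 4 processes on 2 cpus, per_process_share(4, 2) gives 0.5, which lines up
with the 49.9% I saw on the problem machine; with all 4 cpus back it gives 1.0.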

thanks,
jim

On Wed, 31 Jul 2002, jim caldwell wrote:

>
> Hi Managers,
>
> I have 9 ES40's running Tru64 5.1, all 4 processor machines,
> each with 1GB or more of memory.
>
> The problem is that when I run a small 4 processor MPI (fortran)
> job on any one of 8 of the machines all is fine: "top" shows 4 processes
> running at 99.9% cpu active each cpu.
>
> HOWEVER, on the 9th machine, the identical job runs 4 processes at
> 49.9% cpu active each cpu. I can take the job and set the number of
> processors down to 2 and I get 2 processes running at 99.9% cpu active
> each.
>
> I'm completely baffled.
>
> thanks,
> jim
>
> ----------------------------------------------------------------------------
> James W. Caldwell (voice) 415-476-8603
> Department of Pharmaceutical Chemistry (fax) 415-502-1411
> Mail Stop 0446 (email) caldwell@heimdal.ucsf.edu
> 513 Parnassus Avenue
> University of California
> San Francisco, CA 94143-0446
> ----------------------------------------------------------------------------
>
>



