ADDITIONAL SUMMARY: Problems with 5.1B PK4 causing machine to hang

From: Chad W Baker (Chad_W_Baker@raytheon.com)
Date: Fri Feb 11 2005 - 16:05:51 EST


Hello,
A little more investigation turned up a problem with Clearcase V5.0: a
defect that can cause a system hang in kernel virtual memory when running
the Clearcase V5.0 MVFS. The IBM defect number is RATLC00728925. The fix is
included in patch 41. After installing this, I was able to build software
in Clearcase without the machine hanging.

Chad

----- Forwarded by Chad W Baker/RES/Raytheon/US on 02/11/2005 03:59 PM -----

From: Chad W Baker <Chad_W_Baker@raytheon.com>
Sent by: tru64-unix-managers-owner@ornl.gov
To: tru64-unix-managers@ornl.gov
Date: 02/08/2005 03:25 PM
Subject: SUMMARY: Problems with 5.1B PK4 causing machine to hang

Hello Managers,

Here is the response I received from Dr. Thomas Blinn. The fix looks like
it's scheduled for the next patch kit release. The original question follows.

"There was a change in V5.1B PK4 (aka BL25) that can cause certain
programs to get an "interrupt" return status with no evidence of
an abnormal condition otherwise. There is not yet a patch that
undoes this for PK4 in general, but it may be the root cause of
your problem. It will be changed back to the pre-PK4 behavior
in the next patch kit, but that will be a while yet. We have seen
the change break some of the shell scripts on SierraCluster
systems, and we've seen it break the ladebug debugger, I would
not be surprised if it
also breaks, for instance, Clearcase; it may be causing some of
the "wait" system calls to return unexpected status for child
processes that are exiting, and I suspect it's load dependent
as it involves "race" conditions; it's especially likely that
it will impact multi-threaded applications more than simple
classic UNIX applications, and I would not be surprised if at
least part of the Clearcase tool suite is multi-threaded."

Chad
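
Dr. Blinn's point about "wait" calls returning an unexpected interrupt
status can be made concrete with a small sketch. The Python below is
illustrative only and is not taken from Clearcase or Tru64: the names
run_command and wait_for_child are hypothetical, and it assumes a POSIX
system with fork. It shows the defensive pattern a build tool can use: if
the wait itself is interrupted, the child has not been reaped yet, so the
wait is retried rather than treated as a failure. (Python 3.5 and later
already retry interrupted system calls internally, so the explicit retry is
written out here only to make the pattern visible.)

```python
import os


def wait_for_child(pid):
    """Wait for child `pid`, retrying if the wait itself is interrupted.

    Returns the exit code, or the negated signal number if the child
    was killed by a signal. (Hypothetical helper for illustration.)
    """
    while True:
        try:
            _, status = os.waitpid(pid, 0)
        except InterruptedError:
            # EINTR: the wait was interrupted, not the child. The child
            # has not been reaped yet, so simply wait again.
            continue
        if os.WIFEXITED(status):
            return os.WEXITSTATUS(status)
        if os.WIFSIGNALED(status):
            return -os.WTERMSIG(status)
        # Any other status (not expected with options=0): keep waiting.


def run_command(argv):
    """Fork, exec `argv` in the child, and return its decoded status."""
    pid = os.fork()
    if pid == 0:
        # Child process: replace ourselves with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed; 127 is the shell convention
            # for "command not found".
            os._exit(127)
    return wait_for_child(pid)
```

For example, run_command(["make", "all"]) would return make's exit code. A
caller that instead treated an interrupted wait as a failed or finished
child could retry or abort spuriously, which may resemble the kind of
load-dependent misbehavior described above.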

#######################################################################

Hello Managers,

I have a collection of Alpha machines - ES40s, AS4100s, and XP1000s - all
running Tru64 5.1B. Before last Friday, there was a mix of patch kit
versions from 2 to 4. Last Friday, I upgraded our build and development
environment from PK3 to PK4. Once I did that, we were no longer able to
build any software: the servers and workstations got into a loop and never
came out. From what I can tell, no specific piece of code being built was
causing the problem (the same code built successfully on PK2 and PK3). I
also do not think it's the same process that hangs each time, but I can't
be sure.

I've installed PK4 over PK3 and also installed PK4 on a fresh OS install --
same result. Our configuration management software is Clearcase, so all of
the makes and builds are done through that. As a side note, I was able to run
configure and make/build gcc without a problem, so I'm guessing it's not
something in the compiler that's causing this problem.

Has anyone seen problems similar to this, or does anyone know what's causing
the hangs or how they can be fixed?

Thanks in advance,
Chad


