SUMMARY: disklabel hangs

From: Andrew Raine (Andrew.Raine@mrc-dunn.cam.ac.uk)
Date: Tue Feb 22 2005 - 06:40:35 EST


Oops - I replied to one of my correspondents on-list instead of
directly. My apologies...

Dear Colleagues,

Thanks again for the rapid replies!

[Original problem appended]

B L Venkatesh and Zoong Pham asked some sensible questions, and John
Lanier gave me a neat dbx trick to set the processes to "Interruptible"
(ps -Amo pid,ppid,rssize,comm,state,cputime,pcpu,psr showed them as
Uninterruptible), but they immediately revert to Uninterruptible as soon
as I try to interrupt them!
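
To pick the stuck processes out of a long ps listing, the output can be
filtered on the state column. This is a minimal sketch, assuming a POSIX
awk and assuming that ps reports uninterruptible processes with a state
beginning with "U" (worth checking against the ps reference page).
filter_state is a hypothetical helper, not a system command.

```shell
# Hypothetical helper: keep the header row plus every row whose second
# column (the process state) starts with the given letter.
# Expects "pid state comm" rows on stdin.
filter_state() {
    awk -v want="$1" 'NR == 1 || substr($2, 1, 1) == want'
}

# Assumed usage (state letter "U" is an assumption for Tru64):
#   ps -A -o pid,state,comm | filter_state U
```

The state column is placed second so that command names containing
spaces cannot shift it.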

Tom Blinn's answer is probably definitive:

    Sounds like a bug. I vaguely remember some similar scenario but it
    was a long time ago and I believe we fixed that particular case,
    but you may have stumbled on a different one.
    
    You are likely to need to reboot to clear this. Since you have a
    cluster, I'd recommend you NOT try to do the disklabel from the
    other system, or you'll probably have hung disklabel processes
    on both systems.
    
    I can't think of anything you did wrong. I wonder if you could
    do I/O to the device with, say, "dd", or that would get hung as
    well.
    
    It could even be an MSA1000 problem, not a host software issue.
    
    Just can't say without something like a forced crash to look at
    to see where the disklabel process is hung (probably waiting for
    some mass storage subsystem operation to complete that may never
    complete).
    
    There were mass storage bugs in V5.1B PK3 that are fixed in PK4,
    but there are different issues with PK4 that you'd need fixes for
    before I'd encourage you to move forward.
    
    Tom
    
So it looks like I'm stuck for the moment - I'll need to reboot, and I
probably can't use storage on the MSA1000 without doing some
significant work!
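
For anyone wanting to try the "dd" test Tom suggests, here is a minimal
sketch. It assumes a POSIX-style shell and that the raw device for the
new volume is /dev/rdisk/dsk32c (an assumption; only /dev/disk/dsk32c
appears in the hwmgr output below). probe_read is a hypothetical helper,
not a Tru64 command. It starts a small read-only dd in the background, so
that the login shell is not itself blocked, and reports if the read has
not finished within a time limit. If the dd is stuck in the kernel it
will most likely be unkillable as well, so this only detects a hang; it
does not clear one.

```shell
# Hypothetical helper: probe_read DEVICE [SECONDS]
# Returns 0 if a small read completes, 1 if dd fails,
# 2 if dd is still running after the time limit.
probe_read() {
    dev=$1
    limit=${2:-10}

    # Read-only test: 16 blocks of 512 bytes, discarded to /dev/null.
    dd if="$dev" of=/dev/null bs=512 count=16 2>/dev/null &
    pid=$!

    waited=0
    # kill -0 sends no signal; it only tests whether the process exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            echo "dd (pid $pid) still running after ${limit}s: possibly hung"
            return 2
        fi
        sleep 1
        waited=`expr "$waited" + 1`
    done

    # dd has exited; collect its exit status.
    if wait "$pid"; then
        echo "read from $dev completed"
        return 0
    else
        echo "read from $dev failed"
        return 1
    fi
}
```

Under the same assumptions it would be called as
"probe_read /dev/rdisk/dsk32c 30". Given Tom's warning about the cluster,
it would be prudent to run it only on the node that already has the hung
disklabel.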

As I was only wanting to use this storage temporarily, I think I'll
look at other easier ways of doing it!

Thanks again everybody!

Andrew

--
Dr. Andrew Raine, Head of IT, MRC Dunn Human Nutrition Unit, 
Wellcome Trust/MRC Building, Hills Road, Cambridge, CB2 2XY, UK
phone: +44 (0)1223 252830   fax: +44 (0)1223 252835
web: www.mrc-dunn.cam.ac.uk email: Andrew.Raine@mrc-dunn.cam.ac.uk
> Dear Managers,
> 
> On a 2-node cluster, running Tru64 PK3, with most of the storage on an
> HSG80...
> 
> We tried to add some storage from our MSA1000, which is sitting on the
> same SAN.  We created the volume, made it visible to the cluster:
> 
> beta # hwmgr -v d
>  HWID: Device Name          Mfg      Model            Location
>  ------------------------------------------------------------------------------
> <cut>
>   230: /dev/cport/scp1               MSA1000          bus-1-targ-3-lun-0
>   231: /dev/disk/dsk32c     COMPAQ   MSA1000 VOLUME   bus-1-targ-3-lun-9
> 
> and then tried to put a disklabel on it:
> 
> beta # disklabel -w dsk32
> 
> but this has hung, and can't be killed!
> 
> So can anyone tell me:
> 
> (a)  What we did wrong?
> 
> (b)  Whether there is anything we can do to recover the situation
>      without rebooting?
> 
> Many thanks!
> 
> Andrew
> 
> 