help with 2104 raid-5 set drive failure, please!

From: Sandor W. Sklar (ssklar@STANFORD.EDU)
Date: Tue Jul 02 2002 - 12:53:04 EDT


Yesterday afternoon, I got the dreaded "LVM_MISSPVADDED" error log
entry. One of the hdisks, a raid5 set in a 2104 frame, disappeared
(well, went to the 'defined' state).

I went into smitty pdam, and saw that one drive in one of the two
raid-5 sets attached to scraid0 had a status of failed. For some
reason, this failure (one disk out of six in the raid set) caused the
PV to go "missing". The missing PV was hdisk13: lsdev -Cc disk
showed it in the "defined" state, and lspv did not show it at all.
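
For anyone checking for the same symptom, here is a minimal sketch that
picks out disks sitting in the "defined" state. It assumes the usual
lsdev -Cc disk column layout (name, state, location, description); the
helper name defined_disks is hypothetical, and it only filters text, so
it changes nothing on the system.

```shell
#!/bin/sh
# Hypothetical helper: print the names of disks whose state column
# (second field of lsdev-style output read on stdin) is "Defined".
# Assumes fields are whitespace-separated: name, state, location, description.
defined_disks() {
    awk '$2 == "Defined" { print $1 }'
}

# Intended use on the affected host (not executed by this snippet):
#   lsdev -Cc disk | defined_disks
```

In the situation described above, this would be expected to list hdisk13
and nothing else, since hdisk12 stayed available.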

I varied off the VG containing the missing PV (the VG name is
"scsiraid") and attempted to reconstruct the raid set with the failed
drive; that action failed. I then removed the failed drive from the
2104 array and replaced it with a known good drive module out of an
unused 2104. The system did not acknowledge the change in drives:
it continued to report the drive as "failed", and reported the serial
number of the broken, removed disk when displaying VPD.

I then added a "hot spare" to the raid set, and the system
immediately began "reconstructing" the raid set. It completed in
about two hours, leaving the status of the set as "optimal", but
hdisk13 was still in the "defined" state.

I then did "mkdev -l hdisk13", and it failed with the error:

Method error (/usr/lib/methods/cfgscraid -l scraid0 ):
         0514-061 Cannot find a child device.

I also ran 'cfgmgr', which produced the same error. IBM then told me
to do a "rmdev -dl hdisk13", which I did (perhaps foolishly). The
hdisk was deleted (as expected), and no longer showed up anywhere.

When I run cfgmgr now, I get the same error message as above. Note
that the other raid5 set (hdisk12) on that scraid adapter is
functioning normally: I can vary on the volume group that contains
it, and aside from the missing volume, hdisk12 works as expected.

At this point, IBM has asked me to upload a snap-ball so they can
look at it. Sadly, they are as lost as I am.

Any pearls of wisdom out there? (aside from, in the future, don't be
cheap, and go with the EMC disk :-)

-s-

--
   Sandor W. Sklar  -  Unix Systems Administrator  -  Stanford University ITSS
   Non impediti ratione cogitationis.     http://whippet.stanford.edu/~ssklar/

