SUMMARY ADDENDUM: LSM recovering after error OK, what about FAILING in volprint?

From: Martin Rønde Andersen - mail.dk (martin.roende@mail.dk)
Date: Mon Jan 14 2008 - 15:35:35 EST


Below is the correct syntax for fixing this issue.
It is the physical disk name you should use.

Here is also a snippet from a 5.1A release note.
http://www.helsinki.fi/atk/unix/dec_manuals/DOC_51A/HTML/ARHH1DTE/NWRKSCXX.HTM

-----------------------------------------------------------------------------------------------------

      3.6.1 LSM Commands Fail When Disk Left in Failing State

When a node boots into an existing cluster and has connectivity to a
failed device, it automatically brings the device online and
reestablishes the associations with appropriate disk media records.
After this process, the disk is occasionally left in the failing state,
which prevents the disk from being used when space is requested by
commands such as volassist.

If this situation occurs, you must manually turn off the disk's failing
state, as follows:

----------------------------------------------------------------------------
# voledit set failing=off device_name

Output of volprint before the fix:
dm dsk9 dsk9 sliced 4096 53301919 FAILING

node1:/> voledit set failing=off dsk9

Output of volprint after the fix is now fine:

dm dsk9 dsk9 sliced 4096 53301919 -
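
For the two FAILING rootdg disks in the original post below, the same fix
would look roughly like this (a sketch only, assuming the disk group is
rootdg and the disk media names are dsk3g and dsk3h; the voledit man page
excerpt quoted further down shows that -g diskgroup and more than one name
are accepted):

# volprint -g rootdg | grep FAILING              <-- list records still flagged
# voledit -g rootdg set failing=off dsk3g dsk3h  <-- clear the flag on the disk names, not the volumes
# volprint -g rootdg | grep FAILING              <-- should now print nothing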

Best regards, Martin Rønde Andersen

Martin Rønde Andersen - mail.dk wrote:
> John's finding answers my question:
> (why didn't I look there... ? ;-) )
>
> # man voledit # !!!
>
>
>
> Lanier, John wrote:
>> Hello,
>>
>> I found the following while looking at "man voledit":
>>
>>
>> ...
>>
>> /sbin/voledit [-g diskgroup] [-e pattern] [-vpsdGrf] set attribute=value... [name...]
>>
>>
>> ...
>>
>>
>> failing
>>     Sets (on) or clears (off) the disk failing flag. If the failing
>>     flag is set for a disk, the disk space is not used as free space
>>     or used by the hot-sparing facility.
>>
>>
>> ...
>>
>>
>> So the syntax looks to be as follows:
>>
>>
>> ...
>>
>> dm dsk3g dsk3g simple 4096 15918532 FAILING <--
>> dm dsk3h dsk3h simple 4096 15918532 FAILING <--
>>
>> ...
>>
>>
>> /sbin/voledit set failing=off cluster_usrvol <--uses "dsk3g"
>> /sbin/voledit set failing=off cluster_varvol <--uses "dsk3h"
>>
>>
>> Hope this helps,
>>
>> --John Lanier
>>
>>
>> -----Original Message-----
>> From: tru64-unix-managers-owner@ornl.gov
>> [mailto:tru64-unix-managers-owner@ornl.gov] On Behalf Of "Martin
>> Rønde Andersen - mail.dk"
>> Sent: Saturday, January 12, 2008 1:54 PM
>> To: tru64-unix-managers@ornl.gov
>> Subject: LSM recovering after error OK, what about FAILING in volprint ?
>>
>> Hello all ..
>>
>> I have the following problem:
>>
>> After this cluster had problems in the dual HSG80 SAN, the disks
>> are in place again.
>> BUT two partitions on the one side have a FAILING beside the definition.
>> How do I get rid of the FAILING?
>>
>> I assume that things are OK, looking down the list of ACTIVE plexes,
>> but I want to clear the errors, because I fear that they will give
>> me problems when I want to move the rootvol, usrvol and varvol with
>> cfs commands.
>>
>> Moreover, where does this come from?
>>
>> I also have a datadg disk with the same phenomenon.
>>
>> Here is the volprint output for rootdg:
>>
>> volprint rootdg
>>
>> ------------------------------------------------------------------------
>>
>> DG NAME NCONFIG NLOG MINORS GROUP-ID
>> DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
>> V NAME USETYPE KSTATE STATE LENGTH READPOL PREFPLEX
>> PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
>> SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
>>
>> dg rootdg default default 0 1081008785.1026.bivaclu
>>
>> dm dsk3b dsk3b simple 4096 2995888 -
>> dm dsk3g dsk3g simple 4096 15918532 FAILING
>> dm dsk3h dsk3h simple 4096 15918532 FAILING
>> dm dsk24b dsk24b simple 4096 2995888 -
>> dm dsk24g dsk24g simple 4096 15918532 -
>> dm dsk24h dsk24h simple 4096 15918532 -
>>
>> v cluster_rootvol cluroot ENABLED ACTIVE 2995888 SELECT -
>> pl cluster_rootvol-01 cluster_rootvol ENABLED ACTIVE 2995888 CONCAT - RW
>> sd dsk24b-01 cluster_rootvol-01 dsk24b 0 2995888 0 dsk24b ENA
>> pl cluster_rootvol-02 cluster_rootvol ENABLED ACTIVE 2995888 CONCAT - RW
>> sd dsk3b-01 cluster_rootvol-02 dsk3b 0 2995888 0 dsk3b ENA
>>
>> v cluster_usrvol fsgen ENABLED ACTIVE 15918532 SELECT -
>> pl cluster_usrvol-01 cluster_usrvol ENABLED ACTIVE 15918532 CONCAT - RW
>> sd dsk24g-01 cluster_usrvol-01 dsk24g 0 15918532 0 dsk24g ENA
>> pl cluster_usrvol-02 cluster_usrvol ENABLED ACTIVE 15918532 CONCAT - RW
>> sd dsk3g-01 cluster_usrvol-02 dsk3g 0 15918532 0 dsk3g ENA
>>
>> v cluster_varvol fsgen ENABLED ACTIVE 15918532 SELECT -
>> pl cluster_varvol-01 cluster_varvol ENABLED ACTIVE 15918532 CONCAT - RW
>> sd dsk24h-01 cluster_varvol-01 dsk24h 0 15918532 0 dsk24h ENA
>> pl cluster_varvol-02 cluster_varvol ENABLED ACTIVE 15918532 CONCAT - RW
>> sd dsk3h-01 cluster_varvol-02 dsk3h 0 15918532 0 dsk3h ENA


