terrible lsm problem (lost volume info)

From: Piotr Grzybowski (merlin@abakus.if.uj.edu.pl)
Date: Thu Aug 29 2002 - 16:30:47 EDT


Hullo Everyone!

 I am having a serious problem and great difficulty fighting
it. I know that I can count on everyone on this mailing list,
and to tell you the truth, I do count on you!

 What was the source of the problem:
 - I shut down the machine, which runs Digital UNIX 5.0 with LSM
 support.
 - I connected a second SCSI hdd, without changing the connection of the first one.
 - The second hdd does not work anyway (there is FAT on it, but it is not
 really formatted, so it is not usable anyway).
 - When I switched the machine back on, the system on the
 hdd which was left untouched refused to boot.

 What are the symptoms:
 - The boot disk is dkc600; it is a new 36 GB IBM SCSI disk.
 When booting normally (boot -fl A dkc600) the /var, /usr and /proc
 file systems are not mounted.
 - The single-user boot (boot -fl s dkc600) is possible. Only
 the root_domain is mounted. When I try `mount /usr`
 I get an error saying that /dev/vol/rootdg/usrvol is not
 a valid device, or something like that; in any case it cannot be mounted.
 The same goes for /var. Both /usr and /var are under LSM control.

 What has been done:
 - I was able to do/discover the following:
   - When booting to single-user mode, and I think also to
   multiuser mode, the devices are _NOT_ associated with device
   special files(!); `hwmgr -view devices` shows no device special file name
   for any of the devices.
   - A call to `dsfmgr -N` fixes that problem, but the names
   which get associated with the devices are _different_ (!!):
   the cdrom was cdrom0a, and it is now cdrom0c, no matter what
   I do with the other disk. I believe that it is the same for all
   the devices: they all became 'c', dsk0a is now dsk0c, etc.
   (I have not yet tried to delete the device special files
   and recreate them (cd /dev && rm -Rf * && dsfmgr -C).)
   - The rootdg cannot be read from disk. I was able to:
     - put vold in the enabled state (`vold`)
     - add the disk to volboot (`voldctl add disk dsk0c`)
     - init the rootdg and import it
       (`voldg init rootdg`, `voldg -o convert_old -f import rootdg`)
     - list the disks known to LSM (`voldisk list`): there is _NO_
       dsk0c in that list, and there is only one simple type disk,
       dsk0g, dsk0b,d,e, and dsk1 - not online, and not under
       LSM control
     - add disks to the rootdg (`voldg adddisk dsk0g=dsk0g`)
     - try volrecover (no effect)
     - try volprint (0 volumes 0 plexes found (!!!))
     - all of that was possible only after changing
       lsm_rootdev_is_volume=1 to =0 in sysconfigtab
    - The disklabel on dsk0c tells me that the root partition
      (a) is an AdvFS partition, but I still do not understand
      how it can be mounted when
      /etc/fdmns/root_domain/rootdg.root_domain points to
      /dev/vol/rootdg/rootvol, which is _NOT_ a valid device file
      (!!!!)
    - The ed editor is for those who like to edit files A LOT.
 - None of the things I have done is saved after a reboot, apart from
   voldctl init,add,rm .
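
 The sysconfigtab change mentioned above (lsm_rootdev_is_volume=1 to =0)
 can also be made without ed. The following is only a minimal sketch: the
 function name set_rootdev_flag and the file paths are made up for
 illustration, it assumes a sed that understands POSIX character classes,
 and it writes to a copy instead of touching the original file.

```shell
# Hypothetical helper: write a copy of a sysconfigtab-style file in which
# lsm_rootdev_is_volume is changed from 1 to 0. The source file is not modified.
# Usage: set_rootdev_flag /etc/sysconfigtab /tmp/sysconfigtab.new
set_rootdev_flag() {
    src=$1
    dst=$2
    # Keep the leading indentation and any blanks around '=' (captured in \1),
    # and replace only a value of exactly 1 at the end of the line.
    sed 's/^\([[:space:]]*lsm_rootdev_is_volume[[:space:]]*=[[:space:]]*\)1[[:space:]]*$/\10/' "$src" > "$dst"
}
```

 The copy can be compared with the original (for example with diff) before
 it is moved into place.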

 What has happened to the configuration?
 - I believe that the rootdg information is somehow lost.
 - LSM cannot find any volumes on the disks that are seen by
   `voldisk list`.
 - LSM does not see dsk0c (voldisk list).
 - Something is very, very wrong with the association of device special files
   to the devices: at every reboot I must run `dsfmgr -N` to
   make the devices have device special files.
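
 Since nothing survives a reboot, it may help to capture the state after
 each boot so that two boots can be compared. The sketch below only runs
 commands already mentioned in this post (`hwmgr -view devices`,
 `voldisk list`, `volprint`); the function name collect_state and the log
 path are hypothetical, and it assumes a POSIX shell.

```shell
# Hypothetical helper: run each diagnostic command from this post, if it is
# installed, and write its output under a header line into the given log file.
# Usage: collect_state /tmp/lsm_state.log
collect_state() {
    log=$1
    : > "$log"    # start with an empty log
    for cmd in "hwmgr -view devices" "voldisk list" "volprint"; do
        echo "=== $cmd ===" >> "$log"
        tool=${cmd%% *}    # first word of the command line
        if command -v "$tool" >/dev/null 2>&1; then
            # Intentionally unquoted so the string splits into command and arguments.
            $cmd >> "$log" 2>&1
        else
            echo "(not found: $tool)" >> "$log"
        fi
    done
}
```

 Two such logs, taken before and after `dsfmgr -N`, can then be compared
 with diff.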

 What are my questions?
 - How do I find the info about rootdg on the disk?
 - How do I make the disk become dsk0a, the way it was before?
 - How do I make LSM see the volumes on the disk, and how
   do I make LSM see the correct disk?
 - What kind of system goes crazy, and crashes, after adding
   and removing a neutral hdd?
 - What should I do? (Reformatting /usr and /var is _OUT_ of
   the question; I have invested two weeks of work, 8 hours per day,
   into this system.)
 - Who was brave enough to get this far into this post and read all that?

 Summary
 - I am grateful for the time you have given me
   just by reading this. I would like to thank you all; even if
   we do not find the solution to this problem, you are always here
   to _give_ help and listen to someone's problems and questions.

Thank You tru64-managers!

Yours,
 pg


