SUMMARY: domain panic

From: Cohen, Andy (Andy.Cohen@cognex.com)
Date: Fri Jul 16 2004 - 15:12:54 EDT


whew! More outstanding help from the list!

Basically the suggestions were to run /sbin/advfs/verify and/or /sbin/advfs/fixfdmn. I did both, but it was fixfdmn that did the trick for me.
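For the archive, here is a sketch of the two suggested invocations. The commands are built and echoed rather than executed, since both tools are Tru64-specific and operate on a dismounted AdvFS domain; home_domain is the domain from my original question below.

```shell
# Sketch only: build and echo the two suggested commands instead of
# running them. Both live in /sbin/advfs on Tru64 and operate on a
# dismounted AdvFS domain.
DOMAIN=home_domain                       # domain from this thread
CHECK="/sbin/advfs/verify $DOMAIN"       # kernel-based check (can re-panic)
REPAIR="/sbin/advfs/fixfdmn $DOMAIN"     # user-space repair -- what worked
echo "$CHECK"
echo "$REPAIR"
```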

... and to those that kindly suggested I RTFM -- I would've if I had one :-)

The most information came from Derek Haining:

+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_
Domain panics render the filesystem unavailable until the next mount.
Sometimes you must reboot the system before you can attempt to mount
the filesystem again.

At this point I would recommend commenting this filesystem out of /etc/fstab, rebooting the system, and then checking the filesystem using /sbin/advfs/fixfdmn.
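A hedged sketch of that fstab step, demonstrated on a throwaway copy rather than the real /etc/fstab. The /home mount point and home_domain name are from this thread; the root_domain line is a made-up placeholder for context.

```shell
# Comment the panicked domain's entry out of fstab. Demonstrated on a
# sample copy in /tmp; on the real system, back up /etc/fstab first and
# edit it in place the same way.
cat > /tmp/fstab.sample <<'EOF'
root_domain#root        /       advfs   rw 0 1
home_domain#home        /home   advfs   rw 0 2
EOF
# Prefix the home_domain line with '#' so it is skipped at the next boot.
sed 's|^home_domain#|#&|' /tmp/fstab.sample > /tmp/fstab.new
cat /tmp/fstab.new
```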

Allow me to give you a few tips on fixfdmn if you haven't used it before. First, I always recommend running it in "no fix" mode first. This is done using the "-n" flag.

Once fixfdmn is done examining the domain (which must not be in use at the time), you should examine the output log file. This is named something like:

        fixfdmn.kingdom_domain.log

:)
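Derek's "no fix" workflow, sketched as echoed commands. The log name follows the fixfdmn.<domain>.log pattern he describes, and home_domain is the domain from this thread.

```shell
# Dry-run sketch: -n reports problems without changing anything; the
# log lands in the current directory as fixfdmn.<domain>.log.
# Commands are echoed here rather than executed.
DOMAIN=home_domain
DRYRUN="/sbin/advfs/fixfdmn -n $DOMAIN"
LOG="fixfdmn.$DOMAIN.log"
echo "$DRYRUN"      # 1. examine only, change nothing
echo "more $LOG"    # 2. review the problems it reported
```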

I expect that you will find that it attempts to clear 512 pages of the transaction log file. This is normal and, in fact, is always done (unless you have a really old copy of fixfdmn). Look for problems that fixfdmn finds and attempts to correct. It may simply be that the problem that caused you difficulty in removing the volume also caused problems updating the other data structures that keep information about the domain, such as the number of volumes in use. If that is all it is, this should be a simple fix.

Anyway, if fixfdmn finds problems, then you should probably repair the domain by running fixfdmn without the "-n" flag.
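The repair pass, again sketched as echoed commands: the same invocation without -n, followed by a remount once the fstab entry has been restored (/home is the mount point from this thread).

```shell
# Repair-pass sketch: fixfdmn without -n actually applies the fixes.
# Afterwards, restore (uncomment) the fstab entry and remount.
DOMAIN=home_domain
REPAIR="/sbin/advfs/fixfdmn $DOMAIN"
echo "$REPAIR"        # apply the fixes for real
echo "mount /home"    # after uncommenting the entry in /etc/fstab
```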

You could try running verify, but verify mounts the domain on a hidden mount point and uses the kernel to check and/or fix the domain. If the meta-data on the disk is corrupted, the kernel will simply force another domain panic. That is why we wrote fixfdmn -- to get away from the restrictions imposed by the kernel. We *know* the meta-data could be corrupt. We're trying to fix it, for goodness' sake! :)
+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_

but thanks also to Graham Allen, Jenny Butler, George Banane, and Kevin Raubenolt.

Andy

ORIGINAL QUESTION
=================
I removed a volume from a domain and now the domain has panicked:

The domain was:

root@thor==> showfdmn -k home_domain

               Id              Date Created  LogPgs  Version  Domain Name
3d0e2d5a.0007c3dd  Mon Jun 17 14:41:30 2002     512        4  home_domain

  Vol    1K-Blks       Free  % Used  Cmode  Rblks  Wblks  Vol Name
   1L     835760     710560     15%     on    256    256  /dev/disk/dsk12a
   2      839472     735432     12%     on    256    256  /dev/disk/dsk12b
   3      839472     722672     14%     on    256    256  /dev/disk/dsk12d
   4      839472     724952     14%     on    256    256  /dev/disk/dsk12e
   5      835840     721952     14%     on    256    256  /dev/disk/dsk12f
   6     3548920    3142696     11%     on    256    256  /dev/disk/dsk13b
   7     7125776    7109096      0%     on    256    256  /dev/disk/dsk15b  <===
   8     3567800    3239152      9%     on    256    256  /dev/disk/dsk13a
   9     3548920    3216272      9%     on    256    256  /dev/disk/dsk13d
  10     3548920    3202840     10%     on    256    256  /dev/disk/dsk13e
  11     3567968    3146144     12%     on    256    256  /dev/disk/dsk13f
       ---------  ---------  ------
        29098320   26671768      8%

I issued:

rmvol /dev/disk/dsk15b

and removed the volume.

Now I can't do anything with /home. This message is in /var/adm/messages:

Jul 15 15:07:57 thor vmunix:
Jul 15 15:07:57 thor vmunix: bs_inherit - bmtr_get_rec failed, return code = -1043
Jul 15 15:07:57 thor vmunix: AdvFS Domain Panic; Domain home_domain Id 0x3d0e2d5a.0007c3dd
Jul 15 15:07:57 thor vmunix: An AdvFS domain panic has occurred due to either a metadata write error or an internal inconsistency. This domain is being rendered inaccessible.
Jul 15 15:07:57 thor vmunix: Please refer to guidelines in AdvFS Guide to File System Administration regarding what steps to take to recover this domain.

In /etc/fdmns/home_domain:

lrwxr-xr-x 1 root system 16 Nov 12 2002 dsk12a -> /dev/disk/dsk12a
lrwxr-xr-x 1 root system 16 Nov 12 2002 dsk12b -> /dev/disk/dsk12b
lrwxr-xr-x 1 root system 16 Nov 12 2002 dsk12d -> /dev/disk/dsk12d
lrwxr-xr-x 1 root system 16 Nov 12 2002 dsk12e -> /dev/disk/dsk12e
lrwxr-xr-x 1 root system 16 Nov 12 2002 dsk12f -> /dev/disk/dsk12f
lrwxr-xr-x 1 root system 16 Jul 14 15:20 dsk13a -> /dev/disk/dsk13a
lrwxr-xr-x 1 root system 16 Jul 14 15:31 dsk13b -> /dev/disk/dsk13b
lrwxr-xr-x 1 root system 16 Jul 14 15:31 dsk13d -> /dev/disk/dsk13d
lrwxr-xr-x 1 root system 16 Jul 14 15:31 dsk13e -> /dev/disk/dsk13e
lrwxr-xr-x 1 root system 16 Jul 14 15:31 dsk13f -> /dev/disk/dsk13f

I tried advscan:

root@thor==> /sbin/advfs/advscan -a -f home_domain

Scanning devices /dev/rdisk/dsk0 /dev/rdisk/dsk4 /dev/rdisk/dsk12 /dev/rdisk/dsk13 /dev/rdisk/dsk5
               /dev/rdisk/dsk6 /dev/rdisk/dsk7 /dev/rdisk/dsk1 /dev/rdisk/dsk8

Attempting to fix link/dev_count for domain

       home_domain

Nothing to fix

but it still doesn't work.

How can I fix this? Running salvage seems like overkill. Can I remove the links in /etc/fdmns/home_domain and recreate home_domain from scratch using the same configuration?

Thank you!!

Andy

Andy Cohen
Database Systems Administrator
Cognex Corporation
1 Vision Drive
Natick, MA 01760



This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:50:04 EDT