SUMMARY: AdvFS crashes AS 2000

From: Harald.Knipp@bfa.de
Date: Wed Aug 21 2002 - 06:32:40 EDT


Hi again,

First, thanks for the replies to:

alan@nabeth.cxo.cpqcorp.net
Dr Thomas Blinn and
Peter Gergen

In short, my question was whether one could restore a two-volume AdvFS domain
with a crashed disk by dd-ing the contents of the old disk to the new one.
The second part was: why could deleting the domain entries in /etc/fdmns and
restoring them manually result in a server crash (see my original posting
at the end of this mail)?

The short summary for both parts is:

DON'T DO THIS!

Dr. Tom and Alan pointed out that even if the domain was unmounted,
something could still be active on it. They were right: the "something"
was verify. Verify mounts the filesets it checks, which you can see if you
issue a 'mount' while verify is running. If verify crashes, those temporary
mounts are still there ...

On using dd to clone a crashed disk, Dr. Tom wrote:

"You have to assume that your old disk corrupted some of the data, and
that even though the service tech managed to "dd" the contents
of the old disk onto a new disk, that doesn't mean that your
data was valid. That's probably what caused AdvFS "verify"
to fail as it did. "

He also pointed out that "The old AdvFS code was often buggy."
(And yes, I know that Tru64 4.0E is no longer supported, but that's a
management decision. Recently they allowed me to upgrade two servers to
5.1A: two out of about one hundred ... !)

He (and Peter Gergen) also said that it's safest to shut down to
single-user mode for this kind of work, but that wasn't possible because
of the distance to the server.

The "restoring an AdvFS domain" issue was best summarized by Peter Gergen:
- Back up if possible
- disklabel the disks to delete any domain information
- Recreate the domain
- Get the data back from backup
And his conclusion:

"I recommend not trying to fix a corrupted advfs filesystem - too many
headaches in the past and ended up doing the lines mentioned above anyway."

... Yes.

Thanks again

Harald

__________________

Hi managers,

I have a serious problem here with an AS 2000 running Tru64 V4.0E.
On Sunday one of the disks in a two-volume AdvFS domain crashed.
We called Compaq, and the technician who replaced the disk managed to get
the old disk running again, so he copied the old disk with
dd if=/dev/rrzOLD of=/dev/rrzNEW.
This morning I tried to mount the file domain, and that worked.
After that I ran verify -q my_domain.
After a load of errors, verify crashed with a memory fault!

After several attempts to get the domain working, I tried to delete the
/etc/fdmns/my_domain directory (with the domain unmounted, of course) in
order to recreate it manually. This has worked several times in the past.
But as soon as I submitted the rm -rf my_domain command, the server crashed.
The domain looked like this:
# ls /etc/fdmns/my_domain
my_domain:
rz9c rz10c

Any idea why a simple rm could crash a whole server?
I read in another posting here that cloning AdvFS volumes with dd is not
recommended, so that might be the cause of the problems with verify.

Anyway, the server is down now (that's why I can't be more specific about
the verify errors), and I have to wait for a Compaq technician again,
because the server is more than 300 km away and there is no local Unix
manager there. So I'm just trying to collect some information...

TIA

Harald

Harald.Knipp@bfa.de


