SUMMARY: Question on Tru64 5.1 and HSG80 snapshots (8.7 does it :)

From: Martin Petder (martin@kungla.ee)
Date: Tue Oct 15 2002 - 11:19:55 EDT


Hello again

Several people confirmed the problem: with HSG80 and Tru64 5.x (and
also Solaris 8 and 9) the snapshot feature of ACS 8.6S-* is not usable,
and at worst it can corrupt your data and your storage configuration :)

No patches exist (yet?) to solve this problem, and it seems that the
only solution is to upgrade to ACS 8.7.

We acquired 8.7 last week and have been testing it since, with no
problems whatsoever on Tru64 5.1 PK3 (testing with Solaris 9 will
start tomorrow). So I think anybody who plans to use snapshots with
those operating systems (and is still using 8.6) should have a long
talk with their customer rep :)

So far we have done ~70 snapshots in a row, of different data sizes,
with 8.7 and none of the problems have recurred.

But there are some other issues with hardware cloning (physical and
logical), as some of the responses have pointed out. As I understand
it, Tru64 5.1B will include a 'freeze' function for AdvFS that is
specifically meant to remedy them. The same functionality is probably
required when using the Veritas filesystem. Namely, under high load on
the filesystem, expect to see a lot of hardware 'aborted command' error
messages on both the storage and the server side. We haven't
encountered any data corruption yet, but data loss might be a
possibility.
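
The ordering that such a 'freeze' function implies can be sketched as
below. This is only an illustration under stated assumptions: the three
callables (freeze, take_snapshot, thaw) are hypothetical placeholders
for whatever site-specific commands quiesce the filesystem, trigger the
controller snapshot and resume I/O. No real Tru64 or HSG80 command is
invoked here.

```python
from typing import Callable, List


def snapshot_with_freeze(
    freeze: Callable[[], None],
    take_snapshot: Callable[[], None],
    thaw: Callable[[], None],
) -> None:
    """Quiesce the filesystem, take the snapshot, then always thaw.

    All three callables are hypothetical hooks supplied by the caller.
    """
    freeze()  # stop new writes from reaching the unit
    try:
        take_snapshot()  # controller-side snapshot of the quiesced unit
    finally:
        thaw()  # resume I/O even if the snapshot step raised an error


if __name__ == "__main__":
    # Dry run with stand-in hooks that only record the call order.
    log: List[str] = []
    snapshot_with_freeze(
        lambda: log.append("freeze"),
        lambda: log.append("snapshot"),
        lambda: log.append("thaw"),
    )
    print(log)
```

The try/finally is the point of the sketch: a failed snapshot must not
leave the filesystem frozen.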

Anyways - thank you for your kind words and I hope the information given
can help you get the backups running :)

Sincerely
Martin Petder

---
-------- Original Message --------
Subject: Question on Tru64 5.1 and HSG80 snapshots
Date: Tue, 08 Oct 2002 14:12:36 +0300
From: Martin Petder <martin@kungla.ee>
Organization: Kungla Dialoog C.P.
To: tru64-unix-managers@ornl.gov
Followup-To: poster

Hello all

Having ended up in a little discussion here with HP support, I would
like to ask for opinions on whether it's possible to use the HSG80
snapshot feature with Tru64 Unix 5.1 if the ACS version is 8.6S.

Our tests seem to indicate, and HP is trying to tell us, that it's
impossible (i.e. the SCSI Unit Attention conditions reported by HSG80
after completion of a snapshot would be interpreted by Tru64 5.1 as
SCSI error messages and initiate an error recovery from the Alpha,
which in turn would mess up the HSG80: disable the snapshot, log some
ugly messages to fmu and in extreme cases crash the storage).

But HP is relying on only one document to say this (the release notes
for Enterprise Volume Manager 2.0C, Tru64 platform agent); it's not
mentioned anywhere else (not even in the release notes of the later
version, 2.0D). We are not using EVM. And all other HPQ material seems
to indicate no problems with snapshots and Tru64...

So the question is: has anybody anywhere been able to do a snapshot with
HSG80, ACS 8.6S(-*) and Tru64 5.1 with AdvFS? Or are there any relevant
patches to apply? :)

The environment in question is:
- MA8000 storage (2xHSG80 with ACS 8.6S-8, ~10 units altogether),
     transparent failover mode, SCSI-3.
- ES40 with Tru64 5.1, PK1, no clustering, Oracle 8.1.7, 2xKGPSA
- 2xSanSwitch 8EL, interconnected.
- Other servers connected to storage:
     - 1xWin2000 (Proliant, 1xKGPSA) server
     - 3xLinux server (Proliant, 1xQla2200)
     - 1xAS1000, Tru64 5.1, PK1, no clustering, Oracle 8.1.7, 1xKGPSA

So far, in ~80% of cases, trying to take snapshots of units connected
to the ES40 results in "Snapshot disabled", sometimes crashing one of
the HSGs.

When trying to take snapshots of other units (not connected to the
Alphas) we get a frozen CLI (both in SWCC and on the serial port) and
the original units and snapshot units are disabled. It doesn't happen
when the Alphas are not running.

So - hoping for any ideas :)
---
A little thing that changed our perspective on this error:

Another MA8000 here, connected only to a Sun Solaris 9 server, has
started to show the same problems. So this is probably related to all
(?) of the newest Unixes.

In both the Tru64 and the Solaris 9 case, the snapshot-related errors
seem to start after a number of snapshots in a row.

And a clarification on the configuration list (the cache is mirrored).

The environment in question is:
- MA8000 storage (2xHSG80 with ACS 8.6S-8, ~10 units altogether),
     transparent failover mode, SCSI-3, MIRRORED CACHE.
- ES40 with Tru64 5.1, PK1, no clustering, Oracle 8.1.7, 2xKGPSA
- 2xSanSwitch 8EL, interconnected.
- Other servers connected to storage:
     - 1xWin2000 (Proliant, 1xKGPSA) server
     - 3xLinux server (Proliant, 1xQla2200)
     - 1xAS1000, Tru64 5.1, PK1, no clustering, Oracle 8.1.7, 1xKGPSA
---
-- 
Sincerely
Martin Petder
========================
Kungla Dialoog C.P.
tel. +372 6 115 300
fax  +372 6 115 301
e-mail martin@kungla.ee
========================

