SUMMARY: ACS 8.7s snapshot option

From: Jean-François Blanchet (Jfblanchet@dgeq.qc.ca)
Date: Fri Nov 08 2002 - 09:53:13 EST


Thanks to Thomas Sjolshagen, Paul Thompson and Martin Petder.

I will do my own tests next week and will write a second summary with the results.

Here are the messages I received:

From Paul:

I have been running on 8.7S for about a month with no issues.

It resolved various issues we had with persistent reservation from 8.6.

We had the option of installing the latest patches to 8.6 (and there were a bunch of them from our patch level) or moving to 8.7 and the latter seemed the winner. From someone else on the manager's list, it seemed that even the latest of 8.6 did not resolve their persistent reservation issues.

The rolling upgrade worked as advertised.

From Thomas:

ACS 8.7s w/PK3 for Tru64 UNIX V5.1A should be both stable and provide usable snapshot functionality. You are correct in that the combination of V5.1A and ACS 8.6 was somewhat lackluster (for lack of a more politically correct term). We're also about to introduce a set of utilities in Tru64 UNIX that will enable hardware-based multi-LUN (volume) AdvFS domain snapshots as well as LSM support for HW-based cloning/snapshotting.

----------------------------------------------

I asked Thomas whether it is possible to take two snapshots at the same time on the same system (not on the same unit).

Response from Thomas:

You can do 1 snapshot per unit exported to a system. They'll all show up
as LUNs w/"ready-made filesystems" on them (you have to do the '#mkdir
domainname_clone' in /etc/fdmns and softlink the new devicenames into
the directory) and can be mounted. The limit is theoretically 65K(ish)
LUNs per HBA on the system; we've not actually tested this limit,
but the infrastructure is there and we've got customers who've created
10000 LUNs in their SAN (>8000 of them were snapshots/clones).
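
The manual step Thomas mentions (creating a directory under /etc/fdmns and linking the snapshot's device names into it) could be sketched as the shell function below. This is only an illustration: the domain name 'data_domain', the device path '/dev/disk/dsk12c' and the FDMNS_ROOT override (added so the function can be tried in a scratch directory) are assumptions, not values from the original message.

```sh
#!/bin/sh
# Sketch: register a snapshot LUN as a new AdvFS domain by hand.
# Hypothetical names: "data_domain", /dev/disk/dsk12c, FDMNS_ROOT.

# link_clone_domain DOMAIN DEVICE...
# Creates <root>/<DOMAIN>_clone and symlinks each DEVICE into it.
# <root> defaults to /etc/fdmns; set FDMNS_ROOT to try it elsewhere.
link_clone_domain() {
    root="${FDMNS_ROOT:-/etc/fdmns}"
    domain="$1"
    shift
    clone_dir="$root/${domain}_clone"

    # Fail if the clone directory already exists, rather than reuse it.
    mkdir "$clone_dir" || return 1

    # One symlink per volume, named after the device special file.
    for dev in "$@"; do
        ln -s "$dev" "$clone_dir/$(basename "$dev")" || return 1
    done

    # Print the directory that was created.
    echo "$clone_dir"
}
```

For example, 'link_clone_domain data_domain /dev/disk/dsk12c' would create /etc/fdmns/data_domain_clone containing a link named dsk12c. Mounting the resulting domain is left out because the exact mount options depend on the Tru64 UNIX version and patch kit.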

From Martin:

Well-well, I'll tell engineers here to let you know next week how our
tests are proceeding.

First - NEVER EVEN THINK ABOUT 8.6S-8 - it has specific bugs that'll
crash the storage badly with new unixes. The worst thing that happened to
us was complete loss of the HSG configuration on one of the controllers and
corruption on the other - many hours of manual restore followed :)

Things clear about ACS8.7 for now:
- Snapshot creation and deletion works perfectly! (Unlike 8.6.)
- YOU MUST UPGRADE ALL FIRMWARES TO ONES INDICATED IN RELEASE NOTES!!!
    (can't stress the importance of that enough :)
    (All firmwares == HBAs, Switches, Disks and - preferably -
    steamd)
- Better to use freezefs/thawfs while taking a snapshot.
- There is no non-destructive upgrade of disks possible in MA*000
    (We upgraded the disk firmware by erasing whole storagesets, taking
    the disks out into a Proliant server, upgrading them there, putting
    the disks back and restoring the storagesets, and found out that
    there is no 'initialize nodestroy' anymore... but the HSG recognized
    the metadata and we left it without reinitialization... :)
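
As an illustration of the freezefs/thawfs advice above, the wrapper below freezes a filesystem, runs whatever command triggers the snapshot, and thaws the filesystem again even if that command fails. It is a sketch under assumptions: how the snapshot is actually created on the HSG is site specific, so it is passed in as an arbitrary command, and the FREEZE_CMD/THAW_CMD overrides are hypothetical hooks added only so the logic can be tried without a Tru64 system.

```sh
#!/bin/sh
# Sketch: keep the filesystem frozen only for the duration of the snapshot.
# Hypothetical hooks: FREEZE_CMD and THAW_CMD (default: freezefs, thawfs).

# snapshot_frozen MOUNTPOINT COMMAND [ARGS...]
# COMMAND is whatever creates the snapshot at your site.
snapshot_frozen() {
    mnt="$1"
    shift

    # Suspend I/O to the filesystem; give up if that fails.
    "${FREEZE_CMD:-freezefs}" "$mnt" || return 1

    # Run the caller-supplied snapshot command and remember its status.
    "$@"
    rc=$?

    # Always thaw, even when the snapshot command failed.
    "${THAW_CMD:-thawfs}" "$mnt" || return 1

    return "$rc"
}
```

A call would look like 'snapshot_frozen /data my_snapshot_script', where 'my_snapshot_script' stands for your own procedure for creating the snapshot unit on the controller.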

Things unclear and messy for us now:
- Reading the snapshot at high speed causes the snapshot unit to go into
a loop (first there is a big reduction in performance, then (maybe) the
original unit and the snapshot unit hang together with the HSG controls
(via steamd and serial)).
- Having a snapshot decreases the overall speed of the units on the
controller by at least 50%, sometimes more.
- There might be a requirement to "format" the snapshot space beforehand -
in our case we first created a normal unit on the snapshot area, wrote it
full from /dev/zero, deleted the unit, created the snapshot (without
initializing the area in between) and it resolved some of the troubles.
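
The "write it full from /dev/zero" step could look roughly like the sketch below. The target path and the optional block count are assumptions for illustration; on real hardware the target would be the device special file of the temporary unit, and everything on it is overwritten.

```sh
#!/bin/sh
# Sketch: overwrite a temporary unit with zeros before reusing its space.
# Hypothetical: the TARGET you pass in; nothing here comes from the post.

# zero_fill TARGET [BLOCKS]
# Writes 1 MB blocks of zeros to TARGET. With BLOCKS given, stops after
# that many blocks; without it, dd runs until the target is full (and
# then normally exits non-zero with an out-of-space error).
zero_fill() {
    target="$1"
    blocks="${2:-}"
    if [ -n "$blocks" ]; then
        dd if=/dev/zero of="$target" bs=1024k count="$blocks"
    else
        dd if=/dev/zero of="$target" bs=1024k
    fi
}
```

Passing a small block count against a scratch file is a safe way to check the command line before pointing it at a real unit.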

Some of those things we observed while using old firmware on
switches & disks, and now we are retesting everything again.... :)

 

My message :

I want to know if anybody uses snapshots (ACS 8.7s) with Tru64 5.1A in a production environment without trouble.

We ran a lot of tests with version 8.6s-8 of ACS and concluded that it was not ready for production. After reading some comments about version 8.7s, I'm not sure I want to test it on our production system.

Is it safe to test/use this feature?
I would like to hear about both working cases and bad experiences.


