Brief on HSG80 SCSI-3 to SCSI-2 reconfiguration

From: Davis, Alan (Davis@tessco.com)
Date: Mon May 12 2003 - 12:53:34 EDT


This is, at this point, probably only good for historical documentation. I
wrote it up after reconfiguring our SAN to allow both v4.0x and v5.1 servers
to connect to the same fabric. Reconfiguration in the other direction, from
SCSI-2 to SCSI-3, is much less problematic and is documented in the manuals.

The saga of a SAN reconfiguration...

Keeping in mind that we only have 8 UNIX systems to put on the SAN
together, some of the lessons learned may apply to others in a similar
situation. I would hesitate to do this with a large server farm.

The root cause of this reconfiguration is that the target date for upgrading
our production Oracle Apps servers to v5.1 has come and gone, and I needed to
be able to move forward with the SAN deployment with a mix of v4.0F and v5.1
systems. The SAN was originally configured with only v5.1 systems attached,
and the v4.0F systems were to be upgraded before attaching them. The SAN now
had to be reconfigured so that both OS's could attach to it.

The SAN configuration rules, among other things, require that the HSG80's be
configured in SCSI-2 mode with transparent failover and only one HBA per
host. This meant downgrading the HSG80 from SCSI-3 mode and multi-bus
failover. Neither of these is covered in any of the manuals or whitepapers.

I logged a call with Compaq Services, StorageWorks support and was told that
I should delete all the connections and units prior to changing the HSG80
settings. Further discussions with remote support and our local Field Circus
engineer made it clear that upgrading to the latest revisions of all the
bits and pieces would be very advisable.

These upgrades consisted of:
    patching 6 v5.1 DS10L's from pk2 to pk3 to get the v1.29 emx driver
    patching 2 v4.0F systems, 1 DS20 and 1 AS4100, from pk5 to pk6 for the
        emx update
    upgrading the firmware on all KGPSA's to 3.18a4
    upgrading the FC SAN switches from 2.17 to 2.19g
    upgrading the HSG80's ACS from v8.5F-1 to v8.6F-1 (requires a card change)
    patching the HSG80's from v8.6F-1 to v8.6F-2

Two of the DS10L's were production web servers and needed to be back in
service as quickly as possible. The rest of the systems could be down for
several hours on a Sunday without seriously affecting operations.

The order of the updates may or may not be critical, but it seemed to be
a good idea to get the OS and KGPSA's updated first, then the switch and
finally the HSG80's. The switch to SCSI-2 and transparent failover would be
last.

The HSG80 8.6F-1 cards were ordered in advance. The KGPSA and FC switch
firmware were downloaded from the Compaq support website. The KGPSA fw
was put onto floppy, the FC fw was put on a UNIX system for upload.

Full backups were made of all the systems prior to beginning the upgrades.

The OS patches went on easily. The KGPSA firmware was more difficult.
The readability of the floppy varied from system to system. Several
DS10L's required numerous tries to be able to read it and one refused
even after repeated efforts. For this system the fw was burned onto a CD
and finally loaded from there. The other problems stemmed from the
differences in getting into and out of the AlphaBIOS/ARC console on the
various systems. Using the RCM (remote console) command "reset" was the most
consistent way of exiting ARC.

The FC fw file is loaded using rcp and requires that the copy not prompt
for a password. On the UNIX system there must be an entry for the FC switch
in the hosts file and an entry in the $HOME/.rhosts file. I wasn't able to
get it to work with the "admin" username in the .rhosts file, so only the
hostname of the switch was used.
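
The .rhosts requirement can be checked before starting the download. The
following is a minimal sketch, assuming the usual .rhosts line layout of a
hostname optionally followed by a username. The switch hostname "fcsw1" and
the function names are hypothetical, and the code only inspects text; it does
not contact the switch.

```python
def rhosts_entries(text):
    """Parse .rhosts content into (host, user) tuples; user is None when absent."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        host = fields[0]
        user = fields[1] if len(fields) > 1 else None
        entries.append((host, user))
    return entries


def has_host_only_entry(text, host):
    """True if `host` appears on a line by itself, with no username."""
    return (host, None) in rhosts_entries(text)


# "fcsw1" is a hypothetical switch hostname used only for illustration.
print(has_host_only_entry("fcsw1\n", "fcsw1"))        # host-only form, as used here
print(has_host_only_entry("fcsw1 admin\n", "fcsw1"))  # host plus user form, which did not work here
```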

The web interface worked well to upload the new code. A switch reboot is
required to activate the new fw. This will interrupt access to any disks
or hosts served solely by that switch. This means that any systems that
must stay online must have an alternate path to disks on the SAN or a
non-SAN mirror of the disks.
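
Which hosts a switch reboot will take off their disks can be worked out on
paper beforehand. As a rough sketch of that check, with hypothetical host and
switch names and a made-up layout (not our actual fabric):

```python
def hosts_losing_access(paths, local_mirror, switch):
    """Return hosts that lose all access to their data while `switch` reboots.

    paths: dict mapping host -> set of switches it can reach its disks through
    local_mirror: set of hosts that hold a non-SAN mirror of their data
    """
    affected = []
    for host, switches in sorted(paths.items()):
        sole_path = switches == {switch}
        if sole_path and host not in local_mirror:
            affected.append(host)
    return affected


# Hypothetical layout, for illustration only.
paths = {
    "web1": {"sw1"},          # single path, but has a local mirror
    "web2": {"sw1", "sw2"},   # alternate path through sw2
    "db1": {"sw1"},           # single path, no mirror
}
local_mirror = {"web1"}
print(hosts_losing_access(paths, local_mirror, "sw1"))  # ['db1']
```

Any host the function returns should be halted before the switch is rebooted.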

Similarly, the HSG80 update requires that at least one of the controllers
be offline at a time. For this upgrade all systems on the SAN were either
halted or had non-SAN mirrors. More on that later.

The replacement procedure for the ACS cards is straightforward and presented
no surprises.

Applying the patch to bring the ACS up to -2 was beset with problems. SWCC
v2.2 seems to have problems with the controller software update process and
was abandoned in favor of manually entering the patch via the CLCP utility
on the controller.

The process is tedious and error-prone, but is easily explained and has
good error checking. The only problem came from the version verification.
The patch listing printed out the version as V85F. When this was entered
into the CLCP, however, the current version of the card was displayed as
(V85F  ), with two trailing spaces. The CLCP wouldn't load the code due to a
version mismatch. The difference is easy to see here, but during the loading
process it isn't nearly as clear. The solution was to enter the version as
"V85F<sp><sp>". This satisfied CLCP and the patch went on cleanly afterwards.
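
The mismatch is the ordinary exact-string-comparison trap. A small
illustration follows, assuming the version field is a fixed six characters
wide (inferred only from "V85F" plus two spaces; the real CLCP field width is
not documented here) and using a hypothetical helper name:

```python
def clcp_version_field(version, width=6):
    """Pad a version string with trailing spaces to a fixed width.

    width=6 is an assumption inferred from "V85F" plus two spaces.
    """
    if len(version) > width:
        raise ValueError("version longer than field width")
    return version.ljust(width)


listed = "V85F"      # as printed in the patch listing
on_card = "V85F  "   # as displayed by CLCP, with two trailing spaces

print(listed == on_card)                      # False
print(clcp_version_field(listed) == on_card)  # True
print(repr(clcp_version_field(listed)))       # 'V85F  '
```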

The final steps were to, at last, reconfigure the HSG80 to achieve the
ultimate goal, heterogeneous OS SAN access.

The HSG80's were in SCSI-3 and multi-bus failover mode. It took a number of
attempts and several controller reboots to find the right sequence of
steps:
    Set all port 1 connections to use unit_offsets between 0 and 99
    Set all port 2 connections to use unit_offsets between 100 and 199
    Set this nofailover
    Manually restart other by pressing the restart button
    Set failover copy=this

At this point any units above 99 are only visible on port 2 of the bottom
HSG80 and units 0-99 are only visible on port 1 of the top controller.
This will affect which units are accessible to which systems if the switches
aren't meshed. There are explanations of the different configuration options
in the Heterogenous SAN Implementation whitepaper.
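
The unit-to-port split described above can be written down as a quick lookup
when planning unit numbers. This is only a sketch: the port ranges are the
ones given above, the LUN calculation assumes the HSG80 convention that a
connection sees a unit at LUN = unit number minus unit_offset, and the
function names are hypothetical.

```python
def port_for_unit(unit):
    """Port on which a unit is visible in transparent failover, per the ranges above."""
    if 0 <= unit <= 99:
        return 1
    if 100 <= unit <= 199:
        return 2
    raise ValueError(f"unit D{unit} is outside 0-199")


def lun_for_unit(unit, unit_offset):
    """LUN a connection sees for a unit, assuming LUN = unit number - unit_offset."""
    lun = unit - unit_offset
    if lun < 0:
        raise ValueError("unit is below the connection's unit_offset")
    return lun


for unit, offset in [(1, 0), (101, 100)]:
    print(f"D{unit}: port {port_for_unit(unit)}, LUN {lun_for_unit(unit, offset)}")
```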

If at all possible, shut down all systems attached to the HSG, even if no
disks are being presented. The switch from SCSI-3 to SCSI-2 affects the
initialization of the UNIX emx device driver. It doesn't seem to cause
any data loss, but any AdvFS domains will panic if an I/O is attempted. LSM
will not catch the errors until it is too late. Rebooting the system will
bring all the disks back online, but it's less alarming to have them boot
cleanly into SCSI-2 mode.

    Shut down all systems attached to the HSG80, if possible.
    Set this SCSI-2
    Restart other
    Restart this
    Reboot connected systems.

The new disks should now be accessible, provided that the unit/LUN naming
rules are followed.

One nice surprise was that units that were deleted and re-added at a
lower unit number retained their WWN and reattached to the host without
changing their dsk number.

There were some connections that had to be recreated. Deleting them from
the HSG and issuing a "hwmgr -scan scsi" or "scsimgr -scan_bus bus=N"
created new !NEWCONnn connections. The unit_offset was updated for these
and new, more meaningful, names were given to them. The disks were then
recognized by the host.

The recommendation from Compaq Service to delete all units and connections
seems to stem from this requirement to reconfigure them. Identifying which
ones must change isn't difficult, and changing only those limits the effort
needed to complete the reconfiguration.

Regarding the non-SAN mirrors used to keep the production systems available
while the SAN was offline: LSM was used to mirror the 18 GB of data to the
H partition of the 30 GB internal disk on each DS10L. When the SAN was
offline, LSM continued serving the data from the internal mirror. After the
SCSI-3 to SCSI-2 configuration change each DS10L was rebooted and LSM
automatically recovered the DISABLED STALE plexes from the internal mirror.
The internal mirrors were then removed, leaving the system running strictly
on the SAN RAID1 production disks.


This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:49:18 EDT