A5000: cannot bring A loop online (flickering lightning bolt)

From: Wesley W. Garland (wes@page.ca)
Date: Mon Jan 12 2004 - 12:14:31 EST


Hi, Sun Managers!

Hopefully someone out there can help me figure out what's going on.

I have an A5000 and an A5200, connected to four 420Rs with two FC/100Ps each. There are two mirrors of data with SVM (SDS for Solaris 9); half of each mirror is in one SENA enclosure. One mirrored pair is on A channels, one mirrored pair is on B channels. The 420s multi-initiate so that two standby machines can replace their opposing two machines by taking over/releasing SVM disksets (importing/deporting DGs for you Veritas folks). Oh, the A5x00s are configured in single loop mode.

I recently rebooted one of the 420s, and the other 420 on the same fibre loops lost the ability to talk to its disks until I rebooted it -- bad, bad, very bad! I learned via a Sun FIN that the FC/100P is not compatible with the A5x00 in a multi-initiator configuration due to a problem with LIP -- GREAT. The reactive options in the FIN say either "Don't do that!" or "Stick some hubs in the way".

Shortly after this multi-initiator problem arose, I noticed that I was not able to get access to the A channel of the A5000 from either host on it, although the B channel is working fine. I initially chalked this up to a hung 'luxadm probe' on one of those boxes which I couldn't kill (even with -9), and didn't want to reboot, so I left it alone.

Since I was going to the data center anyway, I looked in on those machines to double-check the fibre configuration (thinking evil trolls might have knocked my cables loose somehow), and noticed the A5000 reporting via the front panel LCD that one of the interface boards (IBs) had failed.

So, along with ordering Sun FC100 Hubs to fix the first problem, I also ordered an IB for the A5000.

Fast forward a couple of days, and I've got all my parts in hand and a scheduled maintenance window. I powered down the SENAs and replaced the A5000's IB (is that hot-pluggable? The service manual doesn't seem to say either way). I hooked up the hubs, powered things back up, and verified that the application software was working correctly. Maintenance window over.

THEN I noticed that the A0 status "lightning bolt" on the A5000 LCD panel is flickering (the hub is now plugged into A0). GREAT. Sure enough, I can sometimes see the enclosure with 'luxadm probe', but 'luxadm display' hangs for a long time and then fails.
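
To put a number on "sometimes", a polling loop along these lines might help. This is only a minimal sketch under assumptions: it assumes Python 3 is available on the host, that 'luxadm' is on the PATH, and that 'luxadm probe' exits non-zero (or hangs) when the loop is down; I have not verified the exit-code behaviour. The name poll_command and its parameters are made up for illustration. Note also that if the child gets wedged the way my earlier unkillable 'luxadm probe' did, the timeout cannot reap it and the script itself will block.

```python
import subprocess
import time


def poll_command(cmd, attempts=10, timeout_s=30.0, pause_s=0.0):
    """Run cmd repeatedly and count successful, failed and timed-out runs.

    cmd is an argument list, e.g. ["luxadm", "probe"] (assumed to be on PATH).
    A run counts as "ok" only if it exits with status 0 within timeout_s.
    """
    counts = {"ok": 0, "failed": 0, "timeout": 0}
    for _ in range(attempts):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
            counts["ok" if result.returncode == 0 else "failed"] += 1
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child on timeout; an unkillable
            # child would make this call block instead.
            counts["timeout"] += 1
        time.sleep(pause_s)
    return counts


# Hypothetical usage on one of the 420Rs:
# print(poll_command(["luxadm", "probe"], attempts=20, timeout_s=60.0, pause_s=5.0))
```

Running the same loop before and after each cable, GBIC, or port swap would at least show whether a change made the loop more or less stable, rather than relying on watching the LCD.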

I swapped fibre cables (Sun branded, even!); I swapped GBICs and ports at the hub; I switched the hub to A1 (with a different GBIC) on the A5000. No luck. Oh, I have green lights on all the equipment on the GBIC side; it's just the stupid lightning bolt which is flickering.

Any ideas out there in Sun Manager land?

The following possibilities have crossed my mind:
 - Fresh IB was also bad (or installed wrong? Is that even possible?)
 - New hub is bad
 - A5000 chassis suddenly went bad
 - Evil leprechauns are hiding in the rack and taunting me

Any experience out there? Should I just truck out a whole new enclosure and swap the disks? (The enclosure is VERY difficult to remove from the cabinet... two hours to swap, at least). Are there any diagnostic steps I'm missing, that can be done without disrupting the other three hubs/loops?

I have a spare A5200 in my lab which I can use, and lots and lots of spare GBICs and fibre, but no spare hubs or switches. The equipment is also about 135-150 miles from here and mission-critical. This means that I have to stay up until the VERY wee hours of the morning before I'm allowed to fix it, which I absolutely detest (I am not a young man anymore!).

Oh, I guess I have one more question: is it possible to get at those disks using the B channel? Previous experience tells me no, because the WWN will change (22* to 21*?), which will make SDS very, very angry. (No DID or multipathing without SunCluster, AFAIK.) And of course "switch it, and see!" is not an option for this environment.

Thanks for any info.
Wes

--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102