From: Wesley W. Garland (wes@page.ca)
Date: Wed Jan 14 2004 - 14:06:29 EST
Hi, Sun Managers!
The problem has been solved. It turns out that the replacement IB (Interface Board) was either defective or the wrong revision. Here's where it gets interesting: it was actually the A5200 I was having problems with, not the A5000. Whoops. :) By the time I'd gotten to the data center, it was no longer flickering the lightning bolt, but reporting a failed IB.
The replacement IB, which the vendor said would work with either the A5000 or the A5200 (but invoiced as "A5000 IB"), is stamped with Sun Part Number 340-4069-04 and stickered "-06 REV 52". To fix it, I used an IB from my lab A5200, marked with the same part number but stickered "-07 REV 50". Ironically enough, the one from the lab is date coded 98/51 while the replacement is date coded 99/47.
I also found that one of the IBM GBICs connected to the hub for that channel (going to an HBA) had failed. I wonder if the flickering lightning-bolt state is hard on the equipment, or if there are gremlins in the system?
I received some excellent advice when trying to fix this problem:
Octave Orgeron:
- Double-check firmware revisions in HBA, A5000, IB.
- Double-check GBIC with a loopback cable.
- Patch matrix for A5x00, HBAs, etc. here: http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=finfodoc%2F43212&zone_110=43212
Scott Mickey:
- Try the A5200 IB from your lab. (Good advice!)
- Note that while the Sun part numbers for A5000 and A5200 IBs are the
same, I think the revision levels are different, so IBs from
A5000s should not be deployed in A5200s. (I think we have a winner!)
- Did you know that many datacenters replace their fibre once a year?
(No, I didn't. I think mine will, too; we only rolled out FCAL in Sep/03.)
- A5000 Configuration Guide: http://docs-pdf.sun.com/805-0264-15/805-0264-15.pdf
- The Sun X6732A hub is actually a Vixel 1000 (they even say "Vixel" on the bottom). The Vixel manual is here: http://www.sms.com/support/Vixel/Rapport%201000/InstallGuide_00041017-001_D.pdf
- You should power up the Vixel hub before the rest of the equipment. (I didn't know that, but I had been doing it that way "by luck", since the hubs have no power switches.)
- Check your logs for messages (wow, they filled up /var/adm!):
Jan 11 09:51:18 zaphod scsi: [ID 243001 kern.info] /pci@1f,4000/SUNW,ifp@2 (ifp0):
Jan 11 09:51:18 zaphod Loop reconfigure in progress
Jan 11 09:51:18 zaphod scsi: [ID 243001 kern.info] /pci@1f,4000/SUNW,ifp@2 (ifp0):
Jan 11 09:51:18 zaphod LIP reset occured; cause f801
Jan 11 09:51:18 zaphod scsi: [ID 243001 kern.info] /pci@1f,4000/SUNW,ifp@2 (ifp0):
Jan 11 09:51:18 zaphod Loop reconfigure done
Jan 11 09:51:18 zaphod scsi: [ID 243001 kern.info] /pci@1f,4000/SUNW,ifp@2 (ifp0):
Jan 11 09:51:18 zaphod LIP occured; cause f801
Jan 11 09:51:18 zaphod scsi: [ID 243001 kern.info] /pci@1f,4000/SUNW,ifp@2 (ifp0):
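Rather than scrolling through a /var/adm full of these, the LIP events can be tallied per controller. The following is a minimal Python sketch, assuming the log has the layout shown in the excerpt: a header line ending in the controller instance, such as "(ifp0):", followed by a line carrying the message text. The names summarize_lips and print_report and the regular expressions are my own illustration, not part of any Sun tool.

```python
import re
from collections import Counter

# Header line naming the controller instance, e.g.
#   "... /pci@1f,4000/SUNW,ifp@2 (ifp0):"
HEADER_RE = re.compile(r"\(([a-z]+\d+)\):\s*$")

# Message line carrying a LIP event, e.g. "LIP reset occured; cause f801".
# "occured" is matched literally because that is how the driver spells it.
LIP_RE = re.compile(r"\b(LIP(?: reset)?) occured; cause ([0-9a-fA-F]+)")


def summarize_lips(lines):
    """Count LIP events per (instance, kind, cause) in syslog lines.

    Assumes each message line follows the header line naming its
    controller instance, as in the excerpt; events seen before any
    header are attributed to "unknown".
    """
    counts = Counter()
    instance = "unknown"
    for line in lines:
        header = HEADER_RE.search(line)
        if header:
            instance = header.group(1)
            continue
        event = LIP_RE.search(line)
        if event:
            counts[(instance, event.group(1), event.group(2).lower())] += 1
    return counts


def print_report(counts):
    """Print one line per (instance, kind, cause), most frequent first."""
    for (instance, kind, cause), n in counts.most_common():
        print(f"{instance}: {kind} (cause {cause}) x {n}")
```

For example, print_report(summarize_lips(open("/var/adm/messages", errors="replace"))) would, on just the excerpt above, report one "LIP reset" and one "LIP" event with cause f801 on ifp0. A controller whose count keeps climbing points at the loop (IB, GBIC, cable, or hub port) worth checking first.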
Also, I learned one more tidbit from the A5000 troubleshooting PDF: you're supposed to use the GBIC ports on the Vixel hubs in a particular order to prevent signal degradation. I didn't change any of my running hubs (which are using ports 1, 5, and 6), but I clustered the hub connected to the broken IB so that it uses ports 3, 4, and 5, just in case.
Thanks a million, guys!
Wes
--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102
--
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers