SWXRC-04 - StorageWorks - Controller

From: Martin Koerfer (koerfer@mpch-mainz.mpg.de)
Date: Wed Aug 04 2004 - 05:35:06 EDT


Replies to my mail (quoted below):

From R. Cortegoso:
Try "set failover copy=this" on the working SWXRC-controller
- I had already tried this, with the following result:

Cannot set failover between SWXRC and HSZ40 controllers
controller misconfiguered

From Peter Reynolds:
- Besides some ideas about the "invalid cache" message, he pointed out the following:

However there is one major worry that I have with your problem, and that is
that if the unit has lost information such as the prompt there may be a
fault with the actual controller itself. This information is retained by a
battery backup cell on the board, and this has a finite life (usually about
5 years). If this cell is failing you will continue to get these errors, and
you can't just replace the cell. If it is removed, the board will lose all
its internal configuration information, the most important of which is the
serial number of the board. This can only be recovered by using the
'dangerous' command, and the name for that speaks for itself. If this is the
case it would be better to either replace the board, or call for outside
support.

- Given the behaviour of the "set failover" command, I suspect that exactly
this has happened.
Now, using the 'dangerous' command, how can I recover the internal controller
configuration, given that I know the serial number of the board?

Thanks

Martin

-------------------------------
Dear Managers,

After a lot of work I succeeded in running TruCluster 5.1A PK6 on AlphaServer
2100A/1000A systems with a StorageWorks shelf using dual-redundant SWXRC-04
(HSZ40-type) controllers.

- A few days ago, one of the controllers became inaccessible.
- Swapping the flash cards between the two controllers did not change anything.
- "run fmu" reported the last failure code "02d72390" as a "cache battery failure".
After replacing the cache batteries, the failed controller could be accessed
again, but I got the following error:

An unexpected buckcheck occurred during last failure processing:
Last failure code: 02D72390
This controller has an invalid cache module
Invalid Cache -- CLI command set reduced

and the prompt changed from "SWXRC>" to "HSZ>".

- The two controllers were no longer able to work together.
- It seems that the "failed" controller had lost its "mind"!

By removing the "HSZ" controller, I was able to set up my system again with the
previously working "SWXRC" controller, but I lost the redundancy!

Does anybody know how I can restore the configuration of the "failed"
controller so that it reappears as an "SWXRC" controller?

Any help would be appreciated.

Thanks in advance
Martin Körfer

-- 
Dr. Martin Körfer
MPI für Chemie
EDV
J.J.Becherweg 27
D-55128 Mainz
Tel. -49-6131/305541
Fax. -49-6131/305318


This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:50:06 EDT