HSZ 40 / DEC Alpha Cluster / Problem after power failure

From: Christian Wessely (christian.wessely@uni-graz.at)
Date: Fri Dec 17 2004 - 02:11:28 EST


The complete output of SHOW THIS FULL from the HSZ40 is at the bottom of this message.

Hello Admin wizards,

After almost two years of trouble-free operation we suffered a power
failure yesterday morning.
Even though we have a UPS connected, the system did not shut down
cleanly as it was supposed to after the 15-minute period. So I now have
the following problem:

The server itself boots normally but does not find the connected HSZ40
(dual-redundant) controllers, or more precisely the raidsets defined on
them, which used to appear as HSZ40#raid -> /dev/rrz17g and so on. I
already found that the symlinks in /etc/fdmns were missing; I recreated
them by unpacking the fdmns directory from a tar file, but without
success. Am I supposed to run MAKEDEV in /dev/ again?
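
A minimal sketch of how the restored links could be checked, assuming the
usual AdvFS layout in which each domain is a directory under /etc/fdmns
holding symlinks to block devices. The function name find_dangling_links
is hypothetical, and the script only reports links whose target device
node does not exist; it repairs nothing.

```python
import os


def find_dangling_links(fdmns_root):
    """Return (link_path, target) pairs for symlinks whose target is missing.

    Assumes a layout of <fdmns_root>/<domain>/<symlink to device>.
    Returns an empty list if fdmns_root is not a directory.
    """
    dangling = []
    if not os.path.isdir(fdmns_root):
        return dangling
    for domain in sorted(os.listdir(fdmns_root)):
        domain_dir = os.path.join(fdmns_root, domain)
        if not os.path.isdir(domain_dir):
            continue
        for name in sorted(os.listdir(domain_dir)):
            link = os.path.join(domain_dir, name)
            # os.path.exists() follows the link, so it is False when the
            # device node the link points to is absent.
            if os.path.islink(link) and not os.path.exists(link):
                dangling.append((link, os.readlink(link)))
    return dangling


if __name__ == "__main__":
    for link, target in find_dangling_links("/etc/fdmns"):
        print(f"{link} -> {target} (target missing)")
```

If this lists links whose targets are absent from /dev, the device special
files themselves are missing, not just the links.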

With a laptop connected to the controllers, one of them says
--------------------------------------------------
This controller has an invalid cache module
Controllers misconfigured. Type SHOW THIS_CONTROLLER
Power Supply failure cleared.
Invalid cache -- CLI command set reduced. Type SHOW THIS_CONTROLLER.
Please see user guide to determine corrective action
--------------------------------------------------
The other one seems to be OK and shows the raidsets and units as it
should.
The user guide suggests using the command
CLEAR_ERRORS INVALID_CACHE THIS_CONTROLLER

but the device responds with
HSZ_UNTEN > CLEAR_ERRO
Incomplete command

It won't accept complete commands but cuts the input off after an
unpredictable length. The same happens if I try to issue
SET FAILOVER COPY=OTHER: it breaks off after SET FAILO and complains
about an incomplete command ...

I wonder why, and how I can get out of this mess ....

I would be grateful for any hint.

Regards
CW

--------SHOW THIS FULL----------------------------------
HSZ_UNTEN > show this full
%CER--HSZ_UNTEN > --13-JAN-1946 04:33:29 (time not set)-- Invalid cache -- CLI
command set reduced. Type SHOW THIS_CONTROLLER. Please see user guide to
determine corrective action
HSZ_UN
Controller:
         HSZ40 ZG62003200 Firmware V31Z-4, Hardware A01
         Configured for dual-redundancy with ZG65008815
             Controllers misconfigured -- other controller not in failover, a
             SET FAILOVER COPY= is required to re-synchronize controllers
         SCSI address 7
         Time: NOT SET
Host port:
         SCSI target(s) (1, 2, 3, 4), Preferred target(s) (1, 2, 3, 4)
         TRANSFER_RATE_REQUESTED = 10MHZ
Cache:
         32 megabyte write cache, version 2
         Cache is INVALID. Cache containing unflushed data
          has been removed from this controller
         Unknown unflushed data in cache
         CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
         CACHE_UPS
         Host Functionality Mode = A
Licensing information:
         RAID (RAID Option) is ENABLED, license key is VALID
         WBCA (Writeback Cache Option) is ENABLED, license key is VALID
         MIRR (Disk Mirroring Option) is ENABLED, license key is VALID
Extended information:
         Terminal speed 9600 baud, eight bit, no parity, 1 stop bit
         Operation control: 00000000 Security state code: 16536
         Configuration backup enabled on 12 devices
This controller has an invalid cache module
Controllers misconfigured. Type SHOW THIS_CONTROLLER
Invalid cache -- CLI command set reduced. Type SHOW THIS_CONTROLLER.
Please see user guide to determine corrective action
HSZ_UNTEN >


