Power supply confusion.

From: Lawrence, Kenneth E ERDC-CHL-MS Contractor (Kenneth.E.Lawrence@erdc.usace.army.mil)
Date: Wed Jul 09 2003 - 17:58:05 EDT


Hello Admins,

I have an AS4100 running 4.0E with patches.
It has all 4 CPUs, 2 GB of memory,
and 2 power supplies (specifically 0 and 2).

The system panicked this morning.
The "show power" command at the SRM prompt points to an intermittent problem
with power supply 2. I am willing to believe this.

BUT, dia says:

  System type register x00000016 Alpha 4000/1200 Series
  Number of CPUs (mpnum) x00000004
  CPU logging event (mperr) x00000000

  Event validity 1. O/S claims event is valid
  Event severity 1. Severe Priority
  Entry type 100. CPU Machine Check Errors

  CPU Minor class 2. 660 Entry

  Software Flags x0000000300000000
                                       IOD 0 Register Subpkt Pres
                                       IOD 1 Register Subpkt Pres
  Active CPUs x0000000F
  Hardware Rev x00000000
  System Serial Number <Deleted>
  Module Serial Number
  Module Type x0000
  System Revision x00000000

  Machine Check Reason x0208 Fatal Environmental Event Interrupt
                                       
                                       
  Environmental Entry ---> System Environmental Register Follows

  ======================== =====================================

  Sys Environmental Regs x000017CB Function Reg<15:8>: x00000017
                                       Failure Reg <7:0> : x000000CB
>> Invalid Pwr Supply 0 Status Bits Sequence
>> Power Supply 1 Present and Ok
>> Invalid Pwr Supply 2 Status Bits Sequence
                                       System Fans are OK
>> PROBLEM with CPU Fan 0 and 2
                                       Temperature is OK

  PALcode Revision Palcode Rev: 1.21-26

As you can see, the decoded Environmental Register bits claim that power
supplies 0 and 2 have invalid status bits and power supply 1 (which doesn't
exist) is present and ok.
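
For anyone who wants to check the field split by hand, here is a minimal sketch in Python. It assumes only what the dia output itself shows: that the 16-bit value x17CB is divided into Function Reg<15:8> and Failure Reg<7:0>. The function name split_env_register is made up for illustration, and the sketch deliberately does not assign a meaning to any individual bit, since the bit definitions are not given in the output above.

```python
def split_env_register(value):
    """Split a 16-bit environmental register value into its two byte fields.

    Field boundaries follow the labels in the dia output:
    Function Reg<15:8> and Failure Reg<7:0>.
    """
    function_reg = (value >> 8) & 0xFF  # upper byte, bits <15:8>
    failure_reg = value & 0xFF          # lower byte, bits <7:0>
    return function_reg, failure_reg


# Value taken from the "Sys Environmental Regs" line in the dia output.
function_reg, failure_reg = split_env_register(0x17CB)

# Print each field in hex and as a bit pattern (bit 7 on the left),
# so the raw bits can be compared against whatever bit definitions
# the hardware documentation gives.
print(f"Function Reg<15:8>: x{function_reg:02X} = {function_reg:08b}")
print(f"Failure Reg <7:0> : x{failure_reg:02X} = {failure_reg:08b}")
```

This reproduces the x17 and xCB fields that dia reports, so the split itself looks consistent. Whether the per-bit messages are decoded correctly would need the register definition to confirm.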

So the real question is: does this suggest that the monitoring circuits
themselves are giving false readings? Or should I trust the SRM and buy a
replacement power supply?

As a follow-up question: does anyone know where I can find a quality, yet
inexpensive, replacement power supply?

Thanks!!

Ken Lawrence <mailto:lawrenk@wes.army.mil>
BAE Systems
USAE-Engineer Research & Development Center
Coastal & Hydraulics Lab
Unix System Administrator
601.634.3813


