SUMMARY: DS20 won't boot

From: Bill Sadvary (sadvary@dickinson.edu)
Date: Mon Jun 21 2004 - 07:22:02 EDT


Most replies suggested reseating the CPU board and memory (which I did, with
no luck) and pulling the PCI cards one at a time and trying to boot.

After pulling the SAN card, it did boot to SRM. I swapped in a card from
an unused system, reconfig'd the SAN access and now all is fine!

A thanks goes out to:

Thomas.Blinn
Kjell Andresen
Peter Reynolds
Tom Traina

My original post is below along with the first three replies.

-Bill Sadvary

---------- Original message ----------

I had our DS20E shut down for a couple of hours and now it won't boot. The
four LEDs on the front panel all flash on, then #2 and #4 shut off for half
a second, then all come on and stay on. This all happens within a
second or so.

I get nothing on the graphics monitor (I'm almost certain the console is
set to graphics) and nothing on COM1's terminal connection.

The Systems Features Board says all is OK, the two pwr supplies, the
system fans, the CPU fan and the temperature. There is no spare pwr supply
installed.

I connected a terminal up to the CPU diagnostics port and I get...

DP264...V00005201.01.000000567ace.0018.02.000007d1.03.0373bef8c1.05.04..
0620000000.14#0000000000000204#.15.00900000.17.

I press enter and never get the SRM prompt.

It has only one CPU, boots off an internal drive, has storage on a SAN
(the SAN switch port does not give a green status like it should).

Any hardware gurus out there with any ideas?

Thanks,
-Bill

----------Replies-----------

Dr Thomas.Blinn
---------------
I have the DS20E maintenance guide but it's in the office and I'm
at home.

I'd start by reseating the PCI modules and things like the CPU
and memories. If you are able to talk to the management board
then the serial hardware is working to some extent, but that's
no guarantee that COM1 is really working. But if the power-on
self-test (the software that runs before the SRM console) is
managing to load the SRM console, and the hardware's connected
and working, then even if the console is set to graphics, you
should be able to get the SRM prompt on COM1 by pressing Enter
a few times (unless auto_action is set to boot, in which case
it's going to try to boot, and if the problem is the SAN card,
which it sounds like it might be, then you might have to break
it out of that mode).

It does sound to me like something is "hanging" the system, so
try removing PCI options until you can get the SRM console to
talk to you through the serial port (COM1). If you can't get
that to work, you've got a problem in the motherboard, the CPU,
or the memory. If you can get the "bare" system board to talk
to you, then you can start adding the options back in to see if
the hang comes back; that will help isolate the problem.
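The strip-down-then-add-back procedure above can be written out as a small loop. This is only a sketch of the reasoning: `console_responds` is a hypothetical callback standing in for the manual step (power up with exactly these PCI options installed and watch COM1 for the SRM prompt), and the card names in the example are made up.

```python
from typing import Callable, Optional, Sequence


def isolate_hanging_option(
    options: Sequence[str],
    console_responds: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Return the first option whose re-insertion brings the hang back.

    console_responds(installed) is a hypothetical stand-in for the manual
    test: boot with exactly `installed` fitted and check for the SRM prompt.
    """
    # Step 1: bare system board, no PCI options at all.
    if not console_responds([]):
        # Still hangs with nothing fitted: motherboard, CPU, or memory.
        raise RuntimeError("bare system hangs: suspect motherboard, CPU or memory")

    # Step 2: add options back one at a time until the hang returns.
    installed = []
    for option in options:
        installed.append(option)
        if not console_responds(list(installed)):
            return option  # this card brought the hang back
    return None  # all options refitted and the console still answers
```

For example, `isolate_hanging_option(["graphics", "san_hba", "nic"], lambda s: "san_hba" not in s)` returns `"san_hba"`, which is the outcome Bill describes in his summary.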

Good luck..

Tom

Kjell Andresen
--------------
http://h18002.www1.hp.com/alphaserver/download/ds20e_reference_d.pdf
Page 8-3:

All 4: Starting console
#2+#4 off: Setting memory low limit
All 4 (again): Probing I/O

NOTE: The first two LED patterns (LEDs 1-4 on, followed by LEDs 1-3 on
      and LED 4 off) are identical to the last two patterns, but
      represent different startup phases. Observe the LED pattern on
      power-up to ensure that the first two patterns execute
      successfully. If power-up does not succeed, and a LED pattern is
      lit that is the same as one of the first two patterns, the
      problem lies with one of the last two phases of the power-up
      sequence.
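Because the all-on pattern appears twice, the final LED pattern alone doesn't tell you where the system stopped; you need the order in which the patterns appeared. The sketch below encodes only the three entries quoted from page 8-3 (the manual's full sequence is longer, so this table is deliberately partial) and walks the observed patterns in order.

```python
# LED states as (LED1, LED2, LED3, LED4); True means lit.
ALL_ON = (True, True, True, True)
TWO_FOUR_OFF = (True, False, True, False)

# Only the three phases quoted above from page 8-3; not the full table.
PHASES = [
    (ALL_ON, "Starting console"),
    (TWO_FOUR_OFF, "Setting memory low limit"),
    (ALL_ON, "Probing I/O"),
]


def last_phase_reached(observed):
    """Match observed patterns, in order, against PHASES.

    The position in the sequence, not the pattern alone, decides the phase,
    since ALL_ON means "Starting console" first and "Probing I/O" later.
    """
    phase = None
    for step, pattern in enumerate(observed):
        if step >= len(PHASES) or PHASES[step][0] != pattern:
            raise ValueError(f"unexpected pattern at step {step}: {pattern}")
        phase = PHASES[step][1]
    return phase
```

Feeding in the sequence from the original post (all on, #2 and #4 off, all on and staying on) gives `"Probing I/O"`, consistent with the suggestion to look at the I/O cards.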

Any beep codes? --> p.8-2

Table 8-1 Error Beep Codes

Beeps  Message/Meaning                               Action to Repair
-----  --------------------------------------------  ---------------------------
1-2-3  Indicates fail-safe booter startup. The       Update the firmware.
       firmware in flash ROM is unavailable and      See Section 8.11.
       fail-safe booter has begun running.
4      No valid header in ROM. Loading entire ROM.   Replace the ROM.
       The header in the ROM is not valid.
6      Memory error detected. A checksum error       Check memory configuration.
       occurred after the ROM image was copied       Reseat or replace DIMM.
       into memory. Either memory is misconfigured
       or a memory DIMM needs to be reseated.
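The beep-code table above is a plain lookup, sketched here as a dictionary keyed by the beep pattern as written in the manual. The wording is condensed from the table; the fallback message for unlisted codes is my own addition, not something the manual specifies.

```python
# Condensed from the beep-code table above: pattern -> (meaning, action).
BEEP_CODES = {
    "1-2-3": ("Fail-safe booter started; flash ROM firmware unavailable",
              "Update the firmware (see Section 8.11)"),
    "4": ("No valid header in ROM; loading entire ROM",
          "Replace the ROM"),
    "6": ("Checksum error after ROM image was copied into memory",
          "Check memory configuration; reseat or replace DIMM"),
}


def beep_action(pattern: str) -> str:
    """Describe a beep pattern and the repair action the table lists."""
    meaning, action = BEEP_CODES.get(
        pattern, ("Unknown beep code", "Consult the service guide"))
    return f"{meaning}. Action: {action}."
```

For instance, `beep_action("4")` returns `"No valid header in ROM; loading entire ROM. Action: Replace the ROM."`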
This is as far as I got; I hope it gives you enough information to get a
bit further.

I didn't find any information about the CPU diagnostics port in the manual.

I'd remove the I/O cards and try without the SAN card first.

Regards,
Kjell Andresen Systems administrator, University of Oslo, Norway
                Center for Information Technology Services and
                Department of Geosciences

Peter Reynolds
--------------
According to the DS20E service guide, the problem you may have can be
caused by a defective DIMM or B-cache, or a faulty PCI card. In addition,
a faulty CPU or main logic board can pass its POST but fail to start the
SRM console.

Is the LED on the floppy drive illuminated? If it is, the firmware is
corrupted.

Is there a 'beep' code? 1-2-3 indicates that the firmware in the flash
ROM is unavailable, and the firmware needs to be updated/replaced. 4
means the ROM has failed and needs replacing, 6 means that there was a
ROM checksum error, and the fault lies with memory. Either memory is
misconfigured or a faulty DIMM needs to be reseated or replaced.

To recover from the firmware being unavailable, you need a diskette
containing the file PC264SRM.ROM, renamed to DP264SRM.ROM. If the floppy
activity LED is on, then the fail-safe loader is already active and
expecting this diskette. Insert the disk, then reset the system.
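Preparing that diskette amounts to copying the SRM image onto it under the new name. A minimal sketch, assuming you have already downloaded PC264SRM.ROM and the diskette is mounted somewhere on a workstation; both paths are hypothetical arguments, and the only facts taken from the reply above are the two file names.

```python
import shutil
from pathlib import Path


def make_failsafe_diskette(rom_image: Path, floppy_root: Path) -> Path:
    """Copy PC264SRM.ROM onto the diskette as DP264SRM.ROM.

    rom_image and floppy_root are hypothetical: where the downloaded image
    lives and where the diskette is mounted on the machine preparing it.
    """
    if rom_image.name.upper() != "PC264SRM.ROM":
        raise ValueError(f"expected PC264SRM.ROM, got {rom_image.name}")
    # The fail-safe loader looks for this name, per the reply above.
    target = floppy_root / "DP264SRM.ROM"
    shutil.copyfile(rom_image, target)  # copies contents only, not metadata
    return target
```

Copying rather than renaming in place leaves the downloaded original untouched, so the same image can be reused if the diskette turns out to be bad.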

Hope this helps - I'm sorry I can't be more specific.



This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:50:01 EDT