Tru64 DS10 466 hangs - 53C895 controller?

From: Christopher C. Stevenson (csteven@physics.mun.ca)
Date: Fri Oct 25 2002 - 09:40:23 EDT


Greetings,

One of our DS10s has become flaky. I suspect the 53C895 SCSI
controller, since the boot hangs when vmunix checks scsi0, but it may be
something else: POST and the system checks complete without errors, and
vmunix starts to load as in a normal boot before it fails.

Some weeks ago the machine kept answering pings and kept its NFS
connections up, to the point where a mirror machine could still actively
see its exported disks, but any and all attempts at interactive sessions
with it (rsh/login, telnet, ftp, including the console) would freeze. A
power cycle was the only way to attempt recovery. The machine would
successfully pass the system, disk and network tests, start loading
vmunix, otherwise boot as normal, and then freeze immediately after
reporting the IDE controller that the CD-ROM sits on:

(power on; blue screen)
Testing the System
waiting for pka0.7.0.14.0 to poll...
Testing the Disks
dqa0.0.0.13.0 has no media present or is disabled via the RUN/STOP switch
file open failed for dqa0.0.0.13.0
Testing the Network
System Temperature is 35 degrees
initializing GCT/FRU at 3ff5a000

COMPAQ AlphaServer DS10 466 MHz Console V5.7-8, Apr 12 2000 11:20:07

>>> b
(booting dka0.0.0.14.0 -file vmunix -flags a)

[...]
Loading vmunix
[...]
PALcode: UNIX version 1.72-59
[...]
tu1: auto negotiation off: selecting 100BaseTC (UTP) port: half duplex
ata0 at pci0 slot 13
ata0: ACER M1543C
----solid freeze----

(the next lines would have been "scsi0 at ata0 slot 0", "rz0 at scsi0...", etc.)

(press halt button)

CP - SAVE_TERM routine to be called
CP - SAVE_TERM exited with hlt_req = 1, r0 = 00000000.00000000

halted CPU 0
halt code = 1
operator initiated halt
PC = fffffc00002d5e58
>>>
>>> show config
...PCI Hose 00
Bus 00 Slot 01: Acer Labs M1543C USB
Bus 00 Slot 07: Acer Labs M1543C
                        Bridge to Bus 1, ISA
Bus 00 Slot 09: DE500-BA Network Controller
                ewa0.0.0.9.0 08-00-2B-87-25-26
Bus 00 Slot 11: DE500-BA Network Controller
                ewb0.0.0.11.0 08-00-2B-87-26-D0
Bus 00 Slot 13: Acer Labs M1543C IDE
                dqa.0.0.13.0
                dqa0.0.0.13.0 Compaq CRD-8402B
Bus 00 Slot 14: NCR 53C895
                pka0.7.0.14.0 SCSI Bus ID 7
                dka0.0.0.14.0 COMPAQ BD018734A4
Bus 00 Slot 17: ELSA GLoria Synergy
>>>
>>> test

(and the system will not come back with ^C, or even with a push of
the halt button; only a power cycle works)

A few weeks ago, opening up the box and reseating the controller
seemed to do the trick (or perhaps letting it cool to ambient temperature
was what did it?), which was an unnerving fix. Sure enough, yesterday it
acted up again. This time reseating did nothing, and there was little warning.

Any ideas? Apologies for the lengthy message.

Chris

======================================================================
Christopher C Stevenson, C4063 office: (709) 737-2624
Dept. of Physics & Physical Oceanography fax: (709) 737-8739
Memorial University of Newfoundland
St. John's, Newfoundland, CANADA A1B 3X7
URL: http://www.physics.mun.ca/~csteven
======================================================================

"We are all in the gutter, but some of us are looking at the stars."
                        -- Oscar Wilde


