Hard Lock/SCSI CAM ERROR/5.1B/ES40/HSZ70

From: David Knight (dknight@fitzandfloyd.com)
Date: Mon Aug 25 2003 - 11:17:46 EDT


Managers,
        My Alpha server (ES40 / HSZ70 (RA7000) / 5.1B PK4) hard-locked this morning: the keyboard and console were not responding, the system did not answer ping, and the halt button had no effect. I hit the restart button on the ES40, and on reboot (during the firmware check) I received memory errors on the LCD; the system then stopped booting before I ever got my console. I then performed a cold boot with the halt button in to get to the SRM. At the SRM I ran test mem, etc., and received no errors. I then continued booting the system (rc3) with success. I have no errors in any of my OS logs or alert logs, and no core files. The only errors I found were in my binary error log (below). The errors refer to SCSI CAM bus 0, target 1, LUN 0, which is on my HSZ70. On the RA7000, the show command reports the state as good, with no errors on this LUN/target (R5). Correct me if I'm wrong, but I wouldn't expect SCSI CAM errors to cause a system lock; I would think I would at least get a kernel panic out of the deal.
Any thoughts/leads on my issue would be greatly appreciated.

Thanks,
David Knight

UERF:

----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 2263.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Sun Aug 24 04:44:36 2003
OCCURRED ON SYSTEM alpha0
SYSTEM ID x000D0022
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000000
----- UNIT INFORMATION -----
CLASS x0037
SUBSYSTEM x0037
BUS # x0000
                              x0008 LUN x0
                                        TARGET x1

_____________________________________________________

======================= Binary Error Log event =======================
EVM event name: sys.unix.binlog.hw.scsi

    Binary error log events are posted through the binlogd daemon, and
    stored in the binary error log file, /var/adm/binary.errlog. This
    event is used to report all SCSI device errors, including disk,
    tape, HSZ raid events and adapter errors.

    Action: Use Compaq Analyze or DECevent to read and analyze the
    system error log to determine if a SCSI device may need to be
    replaced.

======================================================================

Formatted Message:
    SCSI event

Event Data Items:
    Event Name : sys.unix.binlog.hw.scsi
    Priority : 700
    PID : 466
    PPID : 1
    Event Id : 1660
    Timestamp : 25-Aug-2003 06:03:04
    Host IP address : 10.34.80.2
    Host Name : alpha0
    User Name : root
    Format : SCSI event
    Reference : cat:evmexp.cat:300

Variable Items:
    subid_class (INT32) = 199
    subid_num (INT32) = 0
    subid_unit_num (INT32) = 8
    subid_type (INT32) = 34
    binlog_event (OPAQUE) = [OPAQUE VALUE: 1224 bytes]

============================ Translation =============================
Sequence number of error: -129694387
Time of error entry: 25-Aug-2003 06:03:04
Host name: alpha0

SCSI CAM ERROR PACKET
SCSI device class: DEC SIM
Bus Number: 0
Target number: 1
Lun Number: 0

Name of routine that logged the event: ss_perform_timeout
Event information: timeout on disconnected request

                ############### Entry End ###############

Event information: Active CCB at time of error

                ############### Entry End ###############

======================================================================
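
For anyone who needs to tally entries like the one above across a large log, here is a minimal Python sketch that pulls the bus/target/LUN, the logging routine, and the event text out of a translated SCSI CAM error packet. It assumes the translation has already been saved as plain text in the "Label: value" layout shown in this post; the function name summarize_cam_packet and the SAMPLE string are illustrative only and are not part of any Tru64 tool.

```python
# Sketch: extract key fields from a translated "SCSI CAM ERROR PACKET".
# Assumes plain-text "Label: value" lines as in the translation above.
# summarize_cam_packet and SAMPLE are hypothetical names for illustration.

SAMPLE = """\
SCSI CAM ERROR PACKET
SCSI device class: DEC SIM
Bus Number: 0
Target number: 1
Lun Number: 0

Name of routine that logged the event: ss_perform_timeout
Event information: timeout on disconnected request
"""


def summarize_cam_packet(text):
    """Return a dict with bus, target, lun, routine and event fields."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Label: value" line
        # Keep the first occurrence of each label; later "Event information"
        # lines (e.g. "Active CCB at time of error") are ignored here.
        fields.setdefault(label.strip().lower(), value.strip())

    return {
        "bus": int(fields["bus number"]),
        "target": int(fields["target number"]),
        "lun": int(fields["lun number"]),
        "routine": fields.get("name of routine that logged the event"),
        "event": fields.get("event information"),
    }


if __name__ == "__main__":
    print(summarize_cam_packet(SAMPLE))
```

Grouping the resulting dicts by (bus, target, lun) is one way to check whether the timeouts are confined to a single unit or spread across the bus.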


