RA7000 I/O errors

From: Bryan Daniel (Bryan.Daniel@ucfv.bc.ca)
Date: Thu May 22 2003 - 17:49:41 EDT


Managers:

I have an RA7000 RAID array with HSZ70 controllers attached to a DS20.
Some time ago one of the controllers failed and has not yet been
replaced, so we have been running on a single controller. That has
worked well enough, but last week and again today we experienced I/O
errors on one of the RAID sets, which repeatedly wrote this error to
the messages log:

May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for block(0x31a440, 0x31a440) on device 19,162
May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for block(0x3065c0, 0x3065c0) on device 19,162
May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for block(0x595440, 0x595440) on device 19,162

The only way I found to clear the error was to reboot the server.

What I am wondering is whether this could be caused by overrunning the
single remaining controller. It seems to occur only when we have
sustained high I/O, such as a database export. Any advice would be
helpful.
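
One way to check whether the errors really line up with periods of heavy
I/O is to tally the "Deferring I/O" messages per minute and per device.
The following is a minimal sketch, not a tested tool: it assumes each
message sits on a single line in the log, and the function name
tally_deferred_io is made up for illustration. The errno 5 in these
messages is EIO, the generic I/O error.

```python
import re
from collections import Counter

# Matches one "Deferring I/O" message on a single (unwrapped) log line.
# The timestamp group keeps month, day, hour and minute; seconds are dropped.
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d):\d\d\s+(?P<host>\S+)\s+vmunix:\s+"
    r"Deferring I/O \(errno (?P<errno>\d+)\) for\s+"
    r"block\((?P<block>0x[0-9a-f]+),\s*0x[0-9a-f]+\)\s+"
    r"on device (?P<major>\d+),(?P<minor>\d+)"
)


def tally_deferred_io(lines):
    """Count deferred-I/O messages per (minute, (major, minor)) pair."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            device = (int(match.group("major")), int(match.group("minor")))
            counts[(match.group("stamp"), device)] += 1
    return counts


# The three messages quoted in this post, one per line.
sample = [
    "May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for "
    "block(0x31a440, 0x31a440) on device 19,162",
    "May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for "
    "block(0x3065c0, 0x3065c0) on device 19,162",
    "May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for "
    "block(0x595440, 0x595440) on device 19,162",
]

counts = tally_deferred_io(sample)
for (minute, device), n in sorted(counts.items()):
    print(minute, device, n)
```

To run it against a real log, pass an open file instead of the sample
list; the log path depends on the local syslog configuration. Comparing
the busy minutes with the start and end times of the database export
would show whether the two coincide.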

The message below was sent to the root account.

Thank you,
Bryan Daniel
Systems Administrator
University College of the Fraser Valley
Abbotsford, BC Canada

Subject: EVM ALERT [700]: SCSI event

======================= Binary Error Log event =======================
EVM event name: sys.unix.binlog.hw.scsi

    Binary error log events are posted through the binlogd daemon, and
    stored in the binary error log file, /var/adm/binary.errlog. This
    event is used to report all SCSI device errors, including disk,
    tape, HSZ raid events, and adapter errors.

======================================================================

Formatted Message:
    SCSI event

Event Data Items:
    Event Name : sys.unix.binlog.hw.scsi
    Priority : 700
    PID : 326
    PPID : 1
    Event Id : 2054
    Timestamp : 21-May-2003 13:43:33
    Host IP address : 198.162.97.2
    Host Name : whistler
    User Name : root
    Format : SCSI event
    Reference : cat:evmexp.cat:300

Variable Items:
    subid_class (INT32) = 199
    subid_num (INT32) = 2
    subid_unit_num (INT32) = 2047
    subid_type (INT32) = 55
    binlog_event (OPAQUE) = [OPAQUE VALUE: 360 bytes]

============================ Translation =============================
Sequence number of error: 2099250215
Time of error entry: 21-May-2003 13:43:33
Host name: whistler

SCSI CAM ERROR PACKET
Controller type: DISK
SCSI device class: UNKNOWN
Bus Number: 2
Target number: 7
Lun Number: 7

Name of routine that logged the event: isp_reinit
Event information: Beginning Adapter/Chip reinitialization (0x1)
======================================================================


