Sol8 and EVA hangs

From: Eugene Schmidt (fereug@acute.co.za)
Date: Fri Oct 08 2004 - 19:43:32 EDT


Hi Sun Managers

Hope someone has seen this one and can help please?

Customer has an E4500 running Solaris 8, with two newly attached EVA disk
arrays connected via two QLogic 2200 SBus HBAs. Testing was 100% successful
and fast.

Secure Path 3.0D is loaded for channel failover.

Started experiencing hangs today. What had changed? The server was rebooted
this morning; no changes were made prior to the reboot.

Initially no errors in /var/adm/messages, but after a second reboot, errors
started appearing:

Oct 8 11:00:41 proddb scsi: [ID 243001 kern.warning] WARNING: /swsp@0,2/ssd@0,1 (ssd5):
Oct 8 11:00:41 proddb SCSI transport failed: reason 'aborted': retrying command
Oct 8 11:09:00 proddb scsi: [ID 243001 kern.warning] WARNING: /swsp@0,2/ssd@0,0 (ssd4):
Oct 8 11:09:00 proddb SCSI transport failed: reason 'aborted': retrying command
Oct 8 11:58:52 proddb scsi: [ID 243001 kern.warning] WARNING: /swsp@0,2/ssd@0,0 (ssd4):
Oct 8 11:58:52 proddb SCSI transport failed: reason 'aborted': retrying command
Oct 8 12:11:13 proddb scsi: [ID 243001 kern.warning] WARNING: /swsp@0,2/ssd@0,0 (ssd4):

Disks c7t0d0 and c7t0d1 are hanging. The c6 disks perform beautifully.

Switch logs and EVA logs show nothing.

No other error messages except those shown above.
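
To see at a glance which devices are logging these warnings, the messages
file can be tallied per device. The following is only a rough sketch: the
function name and the sample text are made up for illustration, and it
assumes the warning layout shown above, where the device path appears either
on the WARNING line itself or on the line directly after it.

```python
import re
from collections import Counter

# Device part of a warning, e.g. "/swsp@0,2/ssd@0,1 (ssd5):"
DEVICE_RE = re.compile(r"(/\S+)\s+\((ssd\d+)\):")


def count_warnings(log_text):
    """Count kern.warning entries per (device path, ssd instance)."""
    counts = Counter()
    prev_was_header = False
    for line in log_text.splitlines():
        is_header = "kern.warning" in line and "WARNING:" in line
        if is_header or prev_was_header:
            m = DEVICE_RE.search(line)
            if m:
                counts[(m.group(1), m.group(2))] += 1
                is_header = False  # device was found, nothing left to wait for
        # Only the line directly after a header may carry a wrapped device path.
        prev_was_header = is_header
    return counts


# Hypothetical excerpt in the same layout as the messages quoted in the post.
SAMPLE = """\
Oct 8 11:00:41 proddb scsi: [ID 243001 kern.warning] WARNING:
/swsp@0,2/ssd@0,1 (ssd5):
Oct 8 11:00:41 proddb SCSI transport failed: reason 'aborted':
retrying command
Oct 8 11:09:00 proddb scsi: [ID 243001 kern.warning] WARNING:
/swsp@0,2/ssd@0,0 (ssd4):
"""

if __name__ == "__main__":
    for (path, inst), n in count_warnings(SAMPLE).most_common():
        print(f"{n:3d}  {inst}  {path}")
```

In practice the text would come from /var/adm/messages rather than from an
inline string.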

Mounting a disk read-only and putting heavy I/O on it reproduces the problem.

Also, iostat shows the disk as 100% busy, with no I/O passing through. The
hsx device (the current path) shows the same hung state:
"9 9 17 66
                    extended device statistics
    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hsx1
    ....
    0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 hsx813
    .....
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
    0.0 0.8 0.0 0.4 0.0 0.0 0.0 13.9 0 1 c0t1d0
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c6t0d0
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c6t0d1
    0.0 4.2 0.0 18.6 0.0 0.0 0.0 0.4 0 0 c6t0d2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c6t0d3
    0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 c7t0d0
    0.0 0.0 ...
"

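The "100% busy but nothing moving" pattern can be picked out of iostat output
mechanically. This is a minimal sketch, assuming the 11-column extended
layout shown above with the device name in the last column; the function name
and the threshold logic are illustrative choices, not a standard tool.

```python
def find_stalled(iostat_text):
    """Return devices that are 100% busy with a queued command but no throughput."""
    stalled = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue  # banner, elided ("....") or truncated line
        try:
            nums = [float(f) for f in fields[:10]]
        except ValueError:
            continue  # column header line
        r_s, w_s, _kr, _kw, _wait, actv, _wsvc, _asvc, _pct_w, pct_b = nums
        if pct_b >= 100 and r_s == 0 and w_s == 0 and actv > 0:
            stalled.append(fields[10])
    return stalled


# A few rows copied from the iostat output in the post.
SAMPLE = """\
    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hsx1
    0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 hsx813
    0.0 4.2 0.0 18.6 0.0 0.0 0.0 0.4 0 0 c6t0d2
    0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 c7t0d0
"""

if __name__ == "__main__":
    print(find_stalled(SAMPLE))
```

A device flagged over several consecutive intervals is a stronger signal than
a single sample, since one interval can legitimately show a long-running
command.
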
Below are the lengthy config files, as installed by the install script.

Promise a summary.

Thx

E Schmidt
==========

"spmgr" display shows the following config:
# spmgr display
  Server: acproddb10 Report Created: Fri, Oct 08 16:34:46 2004
  Command: spmgr display
  = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
  Storage: 5000-1FE1-5002-81C0
  Load Balance: Off Auto-restore: Off
  Path Verify: On Verify Interval: 30
  HBAs: qla2200-0 qla2200-2
  Controller: P5849D5AAPW01O, Operational
               P5849D5AAPW038, Operational
  Devices: c6t0d0 c6t0d1 c6t0d2 c6t0d3

  TGT/LUN Device WWLUN_ID #_Paths
    0/ 0 c6t0d0 6005-08B4-0001-3879-0000-D000-0150-0000 4

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPW01O no
                      hsx-1-37-1 qla2200-0 no Active
                      hsx-3655-36-1 qla2200-2 no Available

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPW038 no
                      hsx-204-38-1 qla2200-0 no Standby
                      hsx-3858-39-1 qla2200-2 no Standby

  TGT/LUN Device WWLUN_ID #_Paths
    0/ 1 c6t0d1 6005-08B4-0001-3879-0000-D000-0153-0000 4

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPW01O no
                      hsx-2-37-2 qla2200-0 no Standby
                      hsx-3656-36-2 qla2200-2 no Standby

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPW038 no
                      hsx-205-38-2 qla2200-0 no Active
                      hsx-3859-39-2 qla2200-2 no Available

  TGT/LUN Device WWLUN_ID #_Paths
    0/ 2 c6t0d2 6005-08B4-0001-3879-0000-D000-0156-0000 4

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPW01O no
                      hsx-3-37-3 qla2200-0 no Active
                      hsx-3657-36-3 qla2200-2 no Available

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPW038 no
                      hsx-206-38-3 qla2200-0 no Standby
                      hsx-3860-39-3 qla2200-2 no Standby

  TGT/LUN Device WWLUN_ID #_Paths
    0/ 3 c6t0d3 6005-08B4-0001-3879-0000-D000-0164-0000 4

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPW01O no
                      hsx-4-37-4 qla2200-0 no Standby
                      hsx-3658-36-4 qla2200-2 no Standby

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPW038 no
                      hsx-207-38-4 qla2200-0 no Active
                      hsx-3861-39-4 qla2200-2 no Available

  Storage: 5000-1FE1-5002-2510
  Load Balance: Off Auto-restore: Off
  Path Verify: On Verify Interval: 30
  HBAs: qla2200-0 qla2200-2
  Controller: P5849D5AAPC09X, Operational
               P5849D5AAPC09E, Operational
  Devices: c7t0d0 c7t0d1 c7t0d2 c7t0d3

  TGT/LUN Device WWLUN_ID #_Paths
    0/ 0 c7t0d0 6005-08B4-0001-24D1-0000-A000-0193-0000 4

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPC09X no
                      hsx-813-33-1 qla2200-0 no Standby
                      hsx-4467-32-1 qla2200-2 no Standby

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPC09E YES
                      hsx-1016-34-1 qla2200-0 no Active
                      hsx-4670-35-1 qla2200-2 no Available

  TGT/LUN Device WWLUN_ID #_Paths
    0/ 1 c7t0d1 6005-08B4-0001-24D1-0000-A000-0196-0000 4

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPC09X no
                      hsx-814-33-2 qla2200-0 no Active
                      hsx-4468-32-2 qla2200-2 no Available

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPC09E no
                      hsx-1017-34-2 qla2200-0 no Standby
                      hsx-4671-35-2 qla2200-2 no Standby

  TGT/LUN Device WWLUN_ID #_Paths
    0/ 2 c7t0d2 6005-08B4-0001-24D1-0000-A000-0199-0000 4

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPC09X no
                      hsx-815-33-3 qla2200-0 no Standby
                      hsx-4469-32-3 qla2200-2 no Standby

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPC09E YES
                      hsx-1018-34-3 qla2200-0 no Active
                      hsx-4672-35-3 qla2200-2 no Available

  TGT/LUN Device WWLUN_ID #_Paths
    0/ 3 c7t0d3 6005-08B4-0001-24D1-0000-A000-01A7-0000 4

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPC09X no
                      hsx-816-33-4 qla2200-0 no Active
                      hsx-4470-32-4 qla2200-2 no Available

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPC09E no
                      hsx-1019-34-4 qla2200-0 no Standby
                      hsx-4673-35-4 qla2200-2 no Standby
======== END OF OUTPUT ============
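
In the listing above, every path marked Active goes through qla2200-0, with
qla2200-2 only ever Available or Standby. Whether that matters for the hang is
not established here, but it is easy to check mechanically. The sketch below
parses "spmgr display" text into (device, path instance, HBA, status) rows;
the function names are invented for illustration, and the parser tolerates the
Path_Status value landing on the following line, as can happen when output is
wrapped by a mail client.

```python
import re
from collections import Counter

# "0/ 0 c7t0d0 <WWLUN_ID> 4" style line that introduces a LUN
DEVICE_RE = re.compile(r"^\s*\d+/\s*\d+\s+(c\d+t\d+d\d+)\s")
# "hsx-813-33-1 qla2200-0 no [Status]" style line describing one path
PATH_RE = re.compile(r"(hsx-\d+-\d+-\d+)\s+(\S+)\s+(\S+)(?:\s+(\S+))?\s*$")


def parse_paths(text):
    """Return a list of (device, path_instance, hba, status) tuples."""
    rows = []
    device = None
    pending = None  # path row still waiting for a wrapped status line
    for line in text.splitlines():
        stripped = line.strip()
        if pending is not None and stripped:
            rows.append(pending + (stripped,))
            pending = None
            continue
        m = DEVICE_RE.match(line)
        if m:
            device = m.group(1)
            continue
        m = PATH_RE.search(line)
        if m:
            inst, hba, _preferred, status = m.groups()
            if status is None:
                pending = (device, inst, hba)
            else:
                rows.append((device, inst, hba, status))
    return rows


def active_paths_per_hba(rows):
    """Count Active paths per HBA."""
    return Counter(hba for _dev, _inst, hba, status in rows if status == "Active")


# One LUN copied from the spmgr output in the post, in wrapped form.
SAMPLE = """\
    0/ 0 c7t0d0 6005-08B4-0001-24D1-0000-A000-0193-0000 4

          P5849D5AAPC09X no
                      hsx-813-33-1 qla2200-0 no
Standby
                      hsx-4467-32-1 qla2200-2 no
Standby

          P5849D5AAPC09E YES
                      hsx-1016-34-1 qla2200-0 no Active
                      hsx-4670-35-1 qla2200-2 no
Available
"""

if __name__ == "__main__":
    rows = parse_paths(SAMPLE)
    print(active_paths_per_hba(rows))
```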

Entries in /etc/system:
* Start of CPQhsv edits. DO NOT DELETE THIS LINE
forceload: drv/clone
set maxphys=8388608
set sd:sd_max_throttle=32
set sd:sd_io_time=180
* End of CPQhsv edits. DO NOT DELETE THIS LINE
* Start of HPfcraid edits. DO NOT DELETE THIS LINE
forceload: drv/clone
forceload: drv/ssd
set maxphys=8388608
set sd:sd_max_throttle=32
set sd:sd_io_time=180
set ssd:ssd_max_throttle=32
set ssd:ssd_io_time=180
* End of HPfcraid edits. DO NOT DELETE THIS LINE

set shmsys:shminfo_shmmax=4194304000
------- EOF ---------------
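
The CPQhsv and HPfcraid blocks above both set maxphys and the sd tunables. The
values happen to agree, but when two installers edit the same file it can be
worth confirming that. A small sketch, assuming the usual /etc/system
conventions ("set name=value" lines, "*" for comments); the function name is
illustrative only.

```python
def find_repeated_sets(system_text):
    """Map each tunable that is set more than once to its list of values."""
    seen = {}
    for line in system_text.splitlines():
        line = line.strip()
        if not line.startswith("set "):
            continue  # comments ("*"), forceload lines, blanks
        name, _, value = line[4:].partition("=")
        seen.setdefault(name.strip(), []).append(value.strip())
    return {name: values for name, values in seen.items() if len(values) > 1}


# The /etc/system entries quoted in the post.
SAMPLE = """\
* Start of CPQhsv edits. DO NOT DELETE THIS LINE
forceload: drv/clone
set maxphys=8388608
set sd:sd_max_throttle=32
set sd:sd_io_time=180
* End of CPQhsv edits. DO NOT DELETE THIS LINE
* Start of HPfcraid edits. DO NOT DELETE THIS LINE
forceload: drv/clone
forceload: drv/ssd
set maxphys=8388608
set sd:sd_max_throttle=32
set sd:sd_io_time=180
set ssd:ssd_max_throttle=32
set ssd:ssd_io_time=180
* End of HPfcraid edits. DO NOT DELETE THIS LINE

set shmsys:shminfo_shmmax=4194304000
"""

if __name__ == "__main__":
    for name, values in find_repeated_sets(SAMPLE).items():
        note = "same value" if len(set(values)) == 1 else "CONFLICT"
        print(f"{name}: {values} ({note})")
```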

Entries in /kernel/drv/ssd.conf:
#
# Copyright (c) 1995-1999 by Sun Microsystems, Inc.
# All rights reserved.
#
#ident "@(#)ssd.conf 1.9 99/07/29 SMI"

name="ssd" parent="SUNW,pln" port=0 target=0;
....
name="ssd" parent="SUNW,pln" port=0 target=15;
name="ssd" parent="SUNW,pln" port=1 target=0;
name="ssd" parent="SUNW,pln" port=1 target=1;
.....
   ditto port=1 to port=5, with target=0 thru target=15
.....
name="ssd" parent="SUNW,pln" port=5 target=15;
name="ssd" parent="sf" target=0;
name="ssd" parent="fp" target=0;
name="ssd" parent="ifp" target=127;
name="ssd" parent="scsi_vhci" target=0;
---EOF --------------
/kernel/drv/hsx.conf:
#
# Compaq StorageWorks Secure Path
# hsx.conf - Hardware Configuration file for hsx, a Disk Array Block
# SCSI Target driver. Refer to the driver.conf(4) manpage
# for more information on the syntax of this file.
#
# name "hsx" - required
# class "scsi" - required
# target SCSI target-ID
# lun SCSI logical unit number
# qdepth depth of command queue (1,..,64)
# parent restrict parent HBA
# preferred this path is preferred for a controller when load
# balancing is disabled
#
# If no "parent=" qualifier is present, all SCSI-HBA adapters in
# the system will attempt to attach an HSX instance at the indicated
# target/lun on the SCSI bus.
#
# HSX will only attach device instances for Compaq StorageWorks HSx80
# disk array targets. The SD device will also want to claim these
# targets. Explicit use of "parent=" in sd.conf may be required to
# resolve conflicts.
#
# Each HSX instance found will result in a path being provided via
# the misc/path driver.
name="hsx" parent="qla2200" target=37 lun=0 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=1 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=2 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=3 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=4 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=5 qdepth=32;
.... etc.,
for targets 32 to 39 (although not in sequence), lun=0 thru 202
============= EOF

Contents of /kernel/drv/qla2300.conf

# Number of times to retry a SCSI queue full error.
# Range: 0 - 255
hba0-queue-full-retry-count=16;

# Amount of time to delay after a SCSI queue full error before
# starting any new I/O commands.
# Range: 0 - 255 seconds
hba0-queue-full-retry-delay=2;

# Maximum fibre channel frame size.
# Range: 512, 1024 or 2048 bytes
hba0-max-frame-length=1024;

# Maximum number of commands queued on each logical unit.
# Range: 1 - 65535
hba0-execution-throttle=16;

# Number of port login retry attempts.
# Range: 0 - 255
hba0-login-retry-count=8;

# Enable/disable the use adapter hard loop ID address on the fibre
# channel bus.
# 0 = disable, 1 = enabled
hba0-enable-adapter-hard-loop-ID=0;

# Adapter hard loop ID address to use on the fibre channel bus.
# Range: 0 - 125
hba0-adapter-hard-loop-ID=0;

# Enable/disable the use LIP reset for loop reset.
# 0 = disable, 1 = enabled
hba0-enable-LIP-reset=0;

# Enable/disable the use LIP full login for loop reset.
# 0 = disable, 1 = enabled
hba0-enable-LIP-full-login=1;

# Enable/disable the use of target reset for loop reset.
# 0 = disable, 1 = enabled
hba0-enable-target-reset=0;

# Amount of time to delay after a loop reset for starting any new
# I/O commands.
# Range: 0 - 255 seconds
hba0-reset-delay=5;

# Number of times to retry a port that is not responding.
# Range: 0 - 255
hba0-port-down-retry-count=90;

# Maximum number of LUNs to scan for, if a device does not
# support SCSI Report LUNs command.
# Range: 1 - 256
hba0-maximum-luns-per-target=8;

# Connection options.
# 0 = loop only
# 1 = point-to-point only
# 2 = loop preferred, otherwise point-to-point
# 3 = point-to-point preferred, otherwise loop
hba0-connection-options=1;

# Fibre Channel tape support enable/disable.
# 0 = disable, 1 = enabled
hba0-fc-tape=1;

# PCI latency timer.
# Range: 0 - 0xF8
# Default: 0x40
hba0-pci-latency-timer=0x40;

# During link down conditions enable/disable the reporting of
# errors.
# 0 = disabled, 1 = enable
hba0-link-down-error=1;

# Amount of time to wait for loop to come up after it has gone down
# before reporting I/O errors.
# Range: 0 - 240 seconds
hba0-link-down-timeout=10;

# Persistent binding only option.
# 0 = Reports to OS discovery of binded and non-binded devices
# 1 = Reports to OS discovery of persistent binded devices only
hba0-persistent-binding-configuration=1;

# Fast error reporting to Solaris, enabled/disabled.
# 0 = disabled, 1 = enable
hba0-fast-error-reporting=0;

# Enable extended logging.
# 0 = disabled, 1 = enable
hba0-extended-logging=0;

#####################################################################
# WARNING: Beginning of Configuration Data stored by the QLogic #
# Applications. Consult documentation before editing #
# any data passed this text. #
#####################################################################

# CPQ installation changes made.

# CPQswsp: start of Secure Path edits. Caution: do not remove! This line is used by pkgadd/pkgrm.

hba0-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
hba2-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
hba0-SCSI-target-id-38-fibre-channel-port-name="50001FE1500281CC";
hba2-SCSI-target-id-38-fibre-channel-port-name="50001FE1500281CC";
hba0-SCSI-target-id-36-fibre-channel-port-name="50001FE1500281C8";
hba2-SCSI-target-id-36-fibre-channel-port-name="50001FE1500281C8";
hba0-SCSI-target-id-39-fibre-channel-port-name="50001FE1500281CD";
hba2-SCSI-target-id-39-fibre-channel-port-name="50001FE1500281CD";
hba0-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
hba2-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
hba0-SCSI-target-id-34-fibre-channel-port-name="50001FE15002251C";
hba2-SCSI-target-id-34-fibre-channel-port-name="50001FE15002251C";
hba0-SCSI-target-id-32-fibre-channel-port-name="50001FE150022518";
hba2-SCSI-target-id-32-fibre-channel-port-name="50001FE150022518";
hba0-SCSI-target-id-35-fibre-channel-port-name="50001FE15002251D";
hba2-SCSI-target-id-35-fibre-channel-port-name="50001FE15002251D";

# CPQswsp: end of Secure Path edits. Caution: do not remove! This line is used by pkgadd/pkgrm.
=========== EOF =====================
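
In the persistent-binding entries above, each SCSI target ID is bound on both
hba0 and hba2 to the same port name. A quick consistency check along those
lines might look like the sketch below. The helper names are hypothetical, and
treating "same WWPN on both HBAs" as the expected state is an assumption drawn
only from the pattern in this file.

```python
import re
from collections import defaultdict

BIND_RE = re.compile(
    r'^hba(\d+)-SCSI-target-id-(\d+)-fibre-channel-port-name="([0-9A-Fa-f]{16})";'
)


def parse_bindings(conf_text):
    """Map target id -> {hba instance: port WWN}."""
    bindings = defaultdict(dict)
    for line in conf_text.splitlines():
        m = BIND_RE.match(line.strip())
        if m:
            hba, target, wwpn = int(m.group(1)), int(m.group(2)), m.group(3).upper()
            bindings[target][hba] = wwpn
    return dict(bindings)


def inconsistent_targets(bindings, hbas=(0, 2)):
    """Targets not bound on every listed HBA, or bound to differing WWPNs."""
    bad = []
    for target, per_hba in sorted(bindings.items()):
        if set(per_hba) != set(hbas) or len(set(per_hba.values())) != 1:
            bad.append(target)
    return bad


# A few binding lines copied from the post.
SAMPLE = """\
hba0-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
hba2-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
hba0-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
hba2-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
"""

if __name__ == "__main__":
    print(inconsistent_targets(parse_bindings(SAMPLE)))
```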
/kernel/drv/swsp.conf
# Compaq StorageWorks Secure Path
# swsp.conf - Configuration file for swsp
#
# use swsp.conf to configure which arrays can be controlled by Secure Path
# add one entry of the following form per array:
# name="swsp" class="root" portid=0 reg=0x0,0x(instance+1),0x1
# instance=(instance #) array-name="ARRAY_WWID";
#
# configurable parameters can be set globally, or on an array basis by
# adding one of path-verify, path-verify-period load-balance or auto-restore
# to the line defining the array instance, or on a line by itself (for global)
#
# path-verify=?
# 1= path-verification enabled
# 0= path-verification disabled
# path-verify-period=X
# X = number of seconds between path verification attempts
#
# load-balance=?
# 1= enabled
# 0= disabled
#
# auto-restore=?
# 1= enabled
# 0= disabled
#
path-verify=1;
name="swsp" class="root" portid=0 reg=0x0,0x1,0x1 instance=0
array-name="5000-1FE1-5002-81C0";
wwlid-0-0="6005-08B4-0001-3879-0000-D000-0150-0000@0,0";
wwlid-0-1="6005-08B4-0001-3879-0000-D000-0153-0000@0,1";
wwlid-0-2="6005-08B4-0001-3879-0000-D000-0156-0000@0,2";
wwlid-0-3="6005-08B4-0001-3879-0000-D000-0164-0000@0,3";
name="swsp" class="root" portid=0 reg=0x0,0x2,0x1 instance=1
array-name="5000-1FE1-5002-2510";
wwlid-1-0="6005-08B4-0001-24D1-0000-A000-0193-0000@0,0";
wwlid-1-1="6005-08B4-0001-24D1-0000-A000-0196-0000@0,1";
wwlid-1-2="6005-08B4-0001-24D1-0000-A000-0199-0000@0,2";
wwlid-1-3="6005-08B4-0001-24D1-0000-A000-01A7-0000@0,3";
======================== EOF ========================================
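
The warnings in /var/adm/messages name /swsp@0,2. One possible reading, which
the post does not confirm, is that the "0,2" unit address corresponds to the
first two reg cells of an swsp.conf entry (reg=0x0,0x2,0x1, instance 1). Under
that assumption, the sketch below maps unit addresses to array names; the
function name is made up for illustration.

```python
import re

# Matches "reg=0x0,0x2,0x1 instance=1 array-name="...";" even when the entry
# is split across two lines. The commented template line does not match,
# because its second cell is not a plain hex number.
ENTRY_RE = re.compile(
    r'reg=0x([0-9A-Fa-f]+),0x([0-9A-Fa-f]+),0x[0-9A-Fa-f]+\s+'
    r'instance=(\d+)\s+array-name="([^"]+)"'
)


def arrays_by_unit_address(conf_text):
    """Map 'hi,lo' unit address -> (instance, array name).

    ASSUMPTION: the unit address is formed from the first two reg cells.
    """
    result = {}
    for hi, lo, instance, array in ENTRY_RE.findall(conf_text):
        unit = f"{int(hi, 16):x},{int(lo, 16):x}"
        result[unit] = (int(instance), array)
    return result


# The two array entries copied from swsp.conf in the post.
SAMPLE = """\
# name="swsp" class="root" portid=0 reg=0x0,0x(instance+1),0x1
path-verify=1;
name="swsp" class="root" portid=0 reg=0x0,0x1,0x1 instance=0
array-name="5000-1FE1-5002-81C0";
name="swsp" class="root" portid=0 reg=0x0,0x2,0x1 instance=1
array-name="5000-1FE1-5002-2510";
"""

if __name__ == "__main__":
    print(arrays_by_unit_address(SAMPLE).get("0,2"))
```

If the assumption holds, /swsp@0,2 would be the array 5000-1FE1-5002-2510,
which spmgr lists with devices c7t0d0 through c7t0d3, consistent with the
disks reported as hanging.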
=====================================================================