Summary: Sol8 and EVA hangs

From: Eugene Schmidt (fereug@acute.co.za)
Date: Tue Nov 02 2004 - 18:33:31 EST


Hi Everybody

Long overdue summary.

No applicable answers were received, although there seemed to be some
interest in this topic.

Anyway, the Solaris system was healthy; the failure was way downstream in
the SAN infrastructure (a fibre cable between switches). Somehow this slipped
past the SAN supplier and was only found after it started impacting other
servers. So much for logs...

After the fibre was replaced, the errors stopped.

Best regards

Eugene
===============================================

Hope someone has seen this one and can help please?

Customer has an E4500 running Solaris 8, with 2 x EVA disk arrays newly
attached via two QLogic 2200 SBus HBAs. Testing passed 100% and was fast.

Secure Path 3.0D is loaded for channel failover.

The system started hanging today. What had changed? It was rebooted this
morning, with no changes made prior to the reboot.

Initially no errors in /var/adm/messages, but after a second reboot, errors
started appearing:

Oct 8 11:00:41 proddb scsi: [ID 243001 kern.warning] WARNING: /swsp@0,2/ssd@0,1 (ssd5):
Oct 8 11:00:41 proddb SCSI transport failed: reason 'aborted': retrying command
Oct 8 11:09:00 proddb scsi: [ID 243001 kern.warning] WARNING: /swsp@0,2/ssd@0,0 (ssd4):
Oct 8 11:09:00 proddb SCSI transport failed: reason 'aborted': retrying command
Oct 8 11:58:52 proddb scsi: [ID 243001 kern.warning] WARNING: /swsp@0,2/ssd@0,0 (ssd4):
Oct 8 11:58:52 proddb SCSI transport failed: reason 'aborted': retrying command
Oct 8 12:11:13 proddb scsi: [ID 243001 kern.warning] WARNING: /swsp@0,2/ssd@0,0 (ssd4):
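
For anyone chasing something similar, here is a rough Python sketch that counts the "SCSI transport failed" warnings per device, to see at a glance which ssd instances are affected. It is only a sketch: the function name and the sample constant are made up for illustration, and it assumes the two-line message layout shown above (it flattens whitespace first, so mail-wrapped text works too).

```python
import re
from collections import Counter

# One warning = device path, instance name, then the failure reason.
# The optional group skips the syslog timestamp and hostname that
# prefix the continuation line.
PATTERN = re.compile(
    r"WARNING: (\S+) \((\w+)\): "
    r"(?:\w{3} +\d+ [\d:]+ \S+ )?"
    r"SCSI transport failed: reason '([^']+)'"
)

# Sample copied from the messages quoted in this post (last entry is cut off).
SAMPLE_LOG = """\
Oct 8 11:00:41 proddb scsi: [ID 243001 kern.warning] WARNING:
/swsp@0,2/ssd@0,1 (ssd5):
Oct 8 11:00:41 proddb SCSI transport failed: reason 'aborted':
retrying command
Oct 8 11:09:00 proddb scsi: [ID 243001 kern.warning] WARNING:
/swsp@0,2/ssd@0,0 (ssd4):
Oct 8 11:09:00 proddb SCSI transport failed: reason 'aborted':
retrying command
Oct 8 11:58:52 proddb scsi: [ID 243001 kern.warning] WARNING:
/swsp@0,2/ssd@0,0 (ssd4):
Oct 8 11:58:52 proddb SCSI transport failed: reason 'aborted':
retrying command
Oct 8 12:11:13 proddb scsi: [ID 243001 kern.warning] WARNING:
/swsp@0,2/ssd@0,0 (ssd4):
"""


def count_transport_failures(text):
    """Return a Counter keyed by (instance, device path, reason)."""
    flat = " ".join(text.split())  # undo line wrapping
    counts = Counter()
    for path, instance, reason in PATTERN.findall(flat):
        counts[(instance, path, reason)] += 1
    return counts


for key, n in sorted(count_transport_failures(SAMPLE_LOG).items()):
    print(n, *key)
```

The truncated last entry has no failure line after it, so it is not counted.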

Disks c7t0d0 and c7t0d1 are hanging. The c6 disks perform beautifully.

Switch logs and EVA logs show nothing.

No other error messages except those shown above.

Mounting the disk read-only and putting heavy I/O on it reproduces the problem.

Also, iostat shows the disk as 100% busy, with no I/O passing through. The
hsx device for the current path shows the same hung state:
"9 9 17 66
                    extended device statistics
    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hsx1
    ....
    0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 hsx813
    .....
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
    0.0 0.8 0.0 0.4 0.0 0.0 0.0 13.9 0 1 c0t1d0
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c6t0d0
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c6t0d1
    0.0 4.2 0.0 18.6 0.0 0.0 0.0 0.4 0 0 c6t0d2
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c6t0d3
    0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 c7t0d0
    0.0 0.0 ...
"
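
The pattern to look for in that output is a device at 100 %b with an active command queued (actv > 0) but zero reads and writes per second. A small Python sketch of that check follows. It assumes the eleven-column extended statistics layout shown above; the function name and sample constant are hypothetical.

```python
# Sample rows copied from the iostat output quoted in this post.
SAMPLE_IOSTAT = """\
    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hsx1
    0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 hsx813
    0.0 4.2 0.0 18.6 0.0 0.0 0.0 0.4 0 0 c6t0d2
    0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 c7t0d0
"""


def find_stuck_devices(iostat_text, busy_threshold=100):
    """Return devices that are busy with queued work but move no I/O."""
    stuck = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue  # not a statistics row
        try:
            values = [float(f) for f in fields[:10]]
        except ValueError:
            continue  # header line
        reads, writes, actv, busy = values[0], values[1], values[5], values[9]
        if busy >= busy_threshold and reads == 0 and writes == 0 and actv > 0:
            stuck.append(fields[10])
    return stuck


print(find_stuck_devices(SAMPLE_IOSTAT))
```

On the rows above this picks out hsx813 and c7t0d0 and leaves the working c6 disk alone.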

Below are the lengthy config files, as installed by the install script.

Promise a summary.

Thx

E Schmidt
==========

"spmgr" display shows the following config:
# spmgr display
  Server: acproddb10 Report Created: Fri, Oct 08 16:34:46 2004
  Command: spmgr display
  = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
  Storage: 5000-1FE1-5002-81C0
  Load Balance: Off Auto-restore: Off
  Path Verify: On Verify Interval: 30
  HBAs: qla2200-0 qla2200-2
  Controller: P5849D5AAPW01O, Operational
               P5849D5AAPW038, Operational
  Devices: c6t0d0 c6t0d1 c6t0d2 c6t0d3

  TGT/LUN Device WWLUN_ID #_Paths
    0/ 0 c6t0d0 6005-08B4-0001-3879-0000-D000-0150-0000 4

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPW01O no
                      hsx-1-37-1 qla2200-0 no Active
                      hsx-3655-36-1 qla2200-2 no Available

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPW038 no
                      hsx-204-38-1 qla2200-0 no Standby
                      hsx-3858-39-1 qla2200-2 no Standby

  TGT/LUN Device WWLUN_ID #_Paths
    0/ 1 c6t0d1 6005-08B4-0001-3879-0000-D000-0153-0000 4

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPW01O no
                      hsx-2-37-2 qla2200-0 no Standby
                      hsx-3656-36-2 qla2200-2 no Standby

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPW038 no
                      hsx-205-38-2 qla2200-0 no Active
                      hsx-3859-39-2 qla2200-2 no Available

  TGT/LUN Device WWLUN_ID #_Paths
    0/ 2 c6t0d2 6005-08B4-0001-3879-0000-D000-0156-0000 4

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPW01O no
                      hsx-3-37-3 qla2200-0 no Active
                      hsx-3657-36-3 qla2200-2 no Available

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPW038 no
                      hsx-206-38-3 qla2200-0 no Standby
                      hsx-3860-39-3 qla2200-2 no Standby

  TGT/LUN Device WWLUN_ID #_Paths
    0/ 3 c6t0d3 6005-08B4-0001-3879-0000-D000-0164-0000 4

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPW01O no
                      hsx-4-37-4 qla2200-0 no Standby
                      hsx-3658-36-4 qla2200-2 no Standby

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPW038 no
                      hsx-207-38-4 qla2200-0 no Active
                      hsx-3861-39-4 qla2200-2 no Available

  Storage: 5000-1FE1-5002-2510
  Load Balance: Off Auto-restore: Off
  Path Verify: On Verify Interval: 30
  HBAs: qla2200-0 qla2200-2
  Controller: P5849D5AAPC09X, Operational
               P5849D5AAPC09E, Operational
  Devices: c7t0d0 c7t0d1 c7t0d2 c7t0d3

  TGT/LUN Device WWLUN_ID #_Paths
    0/ 0 c7t0d0 6005-08B4-0001-24D1-0000-A000-0193-0000 4

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPC09X no
                      hsx-813-33-1 qla2200-0 no Standby
                      hsx-4467-32-1 qla2200-2 no Standby

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPC09E YES
                      hsx-1016-34-1 qla2200-0 no Active
                      hsx-4670-35-1 qla2200-2 no Available

  TGT/LUN Device WWLUN_ID #_Paths
    0/ 1 c7t0d1 6005-08B4-0001-24D1-0000-A000-0196-0000 4

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPC09X no
                      hsx-814-33-2 qla2200-0 no Active
                      hsx-4468-32-2 qla2200-2 no Available

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPC09E no
                      hsx-1017-34-2 qla2200-0 no Standby
                      hsx-4671-35-2 qla2200-2 no Standby

  TGT/LUN Device WWLUN_ID #_Paths
    0/ 2 c7t0d2 6005-08B4-0001-24D1-0000-A000-0199-0000 4

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPC09X no
                      hsx-815-33-3 qla2200-0 no Standby
                      hsx-4469-32-3 qla2200-2 no Standby

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPC09E YES
                      hsx-1018-34-3 qla2200-0 no Active
                      hsx-4672-35-3 qla2200-2 no Available

  TGT/LUN Device WWLUN_ID #_Paths
    0/ 3 c7t0d3 6005-08B4-0001-24D1-0000-A000-01A7-0000 4

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPC09X no
                      hsx-816-33-4 qla2200-0 no Active
                      hsx-4470-32-4 qla2200-2 no Available

          Controller Path_Instance HBA Preferred? Path_Status
          P5849D5AAPC09E no
                      hsx-1019-34-4 qla2200-0 no Standby
                      hsx-4673-35-4 qla2200-2 no Standby
======== END OF OUTPUT ============
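
With that many paths it is easier to tally the path states per HBA than to read them by eye. A rough Python sketch follows. It assumes each path line carries the four fields seen in the display above (path instance, HBA, preferred flag, status) and flattens whitespace so wrapped output still parses. The function name and the sample constant are made up for illustration; this is not part of spmgr.

```python
import re
from collections import Counter

# path instance, HBA, preferred flag, status
PATH_LINE = re.compile(r"(hsx-\d+-\d+-\d+) (\S+) (\S+) (\w+)")

# One LUN copied from the display quoted in this post (mail-wrapped form).
SAMPLE_DISPLAY = """\
    0/ 0 c7t0d0 6005-08B4-0001-24D1-0000-A000-0193-0000 4

          Controller Path_Instance HBA Preferred?
Path_Status
          P5849D5AAPC09X no
                      hsx-813-33-1 qla2200-0 no
Standby
                      hsx-4467-32-1 qla2200-2 no
Standby

          Controller Path_Instance HBA Preferred?
Path_Status
          P5849D5AAPC09E YES
                      hsx-1016-34-1 qla2200-0 no Active
                      hsx-4670-35-1 qla2200-2 no
Available
"""


def tally_path_states(display_text):
    """Count paths per (HBA, status) pair."""
    flat = " ".join(display_text.split())  # undo line wrapping
    tally = Counter()
    for _path, hba, _preferred, status in PATH_LINE.findall(flat):
        tally[(hba, status)] += 1
    return tally


for (hba, status), n in sorted(tally_path_states(SAMPLE_DISPLAY).items()):
    print(hba, status, n)
```

In the full display above, every Active path sits on qla2200-0 and every Available path on qla2200-2, which fits Load Balance being Off.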

Entries in /etc/system:
* Start of CPQhsv edits. DO NOT DELETE THIS LINE
forceload: drv/clone
set maxphys=8388608
set sd:sd_max_throttle=32
set sd:sd_io_time=180
* End of CPQhsv edits. DO NOT DELETE THIS LINE
* Start of HPfcraid edits. DO NOT DELETE THIS LINE
forceload: drv/clone
forceload: drv/ssd
set maxphys=8388608
set sd:sd_max_throttle=32
set sd:sd_io_time=180
set ssd:ssd_max_throttle=32
set ssd:ssd_io_time=180
* End of HPfcraid edits. DO NOT DELETE THIS LINE

set shmsys:shminfo_shmmax=4194304000
------- EOF ---------------
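
Since two install scripts both edited /etc/system, a quick check for tunables that are set more than once may be worthwhile. The Python sketch below only handles plain "set name=value" lines and treats lines starting with "*" as comments, as in the entries above. The function name and sample constant are hypothetical.

```python
from collections import defaultdict

# Entries copied from the /etc/system excerpt quoted in this post.
SAMPLE_SYSTEM = """\
* Start of CPQhsv edits. DO NOT DELETE THIS LINE
forceload: drv/clone
set maxphys=8388608
set sd:sd_max_throttle=32
set sd:sd_io_time=180
* End of CPQhsv edits. DO NOT DELETE THIS LINE
* Start of HPfcraid edits. DO NOT DELETE THIS LINE
forceload: drv/clone
forceload: drv/ssd
set maxphys=8388608
set sd:sd_max_throttle=32
set sd:sd_io_time=180
set ssd:ssd_max_throttle=32
set ssd:ssd_io_time=180
* End of HPfcraid edits. DO NOT DELETE THIS LINE

set shmsys:shminfo_shmmax=4194304000
"""


def duplicate_settings(system_text):
    """Map each tunable that is set more than once to its list of values."""
    seen = defaultdict(list)
    for raw in system_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*") or not line.startswith("set "):
            continue  # blank, comment, or a non-"set" directive
        name, sep, value = line[4:].partition("=")
        if sep:
            seen[name.strip()].append(value.strip())
    return {name: vals for name, vals in seen.items() if len(vals) > 1}


for name, values in duplicate_settings(SAMPLE_SYSTEM).items():
    print(name, values)
```

On the entries above it reports maxphys, sd:sd_max_throttle and sd:sd_io_time, each set twice to the same value (once in the CPQhsv block and once in the HPfcraid block).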

Entries in /kernel/drv/ssd.conf:
#
# Copyright (c) 1995-1999 by Sun Microsystems, Inc.
# All rights reserved.
#
#ident "@(#)ssd.conf 1.9 99/07/29 SMI"

name="ssd" parent="SUNW,pln" port=0 target=0;
....
name="ssd" parent="SUNW,pln" port=0 target=15;
name="ssd" parent="SUNW,pln" port=1 target=0;
name="ssd" parent="SUNW,pln" port=1 target=1;
.....
   ditto port=1 to port=5, with target=0 thru target=15
.....
name="ssd" parent="SUNW,pln" port=5 target=15;
name="ssd" parent="sf" target=0;
name="ssd" parent="fp" target=0;
name="ssd" parent="ifp" target=127;
name="ssd" parent="scsi_vhci" target=0;
---EOF --------------
/kernel/drv/hsx.conf:
#
# Compaq StorageWorks Secure Path
# hsx.conf - Hardware Configuration file for hsx, a Disk Array Block
# SCSI Target driver. Refer to the driver.conf(4) manpage
# for more information on the syntax of this file.
#
# name "hsx" - required
# class "scsi" - required
# target SCSI target-ID
# lun SCSI logical unit number
# qdepth depth of command queue (1,..,64)
# parent restrict parent HBA
# preferred this path is preferred for a controller when load
# balancing is disabled
#
# If no "parent=" qualifier is present, all SCSI-HBA adapters in
# the system will attempt to attach an HSX instance at the indicated
# target/lun on the SCSI bus.
#
# HSX will only attach device instances for Compaq StorageWorks HSx80
# disk array targets. The SD device will also want to claim these
# targets. Explicit use of "parent=" in sd.conf may be required to
# resolve conflicts.
#
# Each HSX instance found will result in a path being provided via
# the misc/path driver.
name="hsx" parent="qla2200" target=37 lun=0 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=1 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=2 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=3 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=4 qdepth=32;
name="hsx" parent="qla2200" target=37 lun=5 qdepth=32;
.... etc,
For targets = 32 to 39 (although not in sequence) , lun= 0 thru 202
============= EOF

Contents of /kernel/drv/qla2300.conf

# Number of times to retry a SCSI queue full error.
# Range: 0 - 255
hba0-queue-full-retry-count=16;

# Amount of time to delay after a SCSI queue full error before
# starting any new I/O commands.
# Range: 0 - 255 seconds
hba0-queue-full-retry-delay=2;

# Maximum fibre channel frame size.
# Range: 512, 1024 or 2048 bytes
hba0-max-frame-length=1024;

# Maximum number of commands queued on each logical unit.
# Range: 1 - 65535
hba0-execution-throttle=16;

# Number of port login retry attempts.
# Range: 0 - 255
hba0-login-retry-count=8;

# Enable/disable the use adapter hard loop ID address on the fibre
# channel bus.
# 0 = disable, 1 = enabled
hba0-enable-adapter-hard-loop-ID=0;

# Adapter hard loop ID address to use on the fibre channel bus.
# Range: 0 - 125
hba0-adapter-hard-loop-ID=0;

# Enable/disable the use LIP reset for loop reset.
# 0 = disable, 1 = enabled
hba0-enable-LIP-reset=0;

# Enable/disable the use LIP full login for loop reset.
# 0 = disable, 1 = enabled
hba0-enable-LIP-full-login=1;

# Enable/disable the use of target reset for loop reset.
# 0 = disable, 1 = enabled
hba0-enable-target-reset=0;

# Amount of time to delay after a loop reset for starting any new
# I/O commands.
# Range: 0 - 255 seconds
hba0-reset-delay=5;

# Number of times to retry a port that is not responding.
# Range: 0 - 255
hba0-port-down-retry-count=90;

# Maximum number of LUNs to scan for, if a device does not
# support SCSI Report LUNs command.
# Range: 1 - 256
hba0-maximum-luns-per-target=8;

# Connection options.
# 0 = loop only
# 1 = point-to-point only
# 2 = loop preferred, otherwise point-to-point
# 3 = point-to-point preferred, otherwise loop
hba0-connection-options=1;

# Fibre Channel tape support enable/disable.
# 0 = disable, 1 = enabled
hba0-fc-tape=1;

# PCI latency timer.
# Range: 0 - 0xF8
# Default: 0x40
hba0-pci-latency-timer=0x40;

# During link down conditions enable/disable the reporting of
# errors.
# 0 = disabled, 1 = enable
hba0-link-down-error=1;

# Amount of time to wait for loop to come up after it has gone down
# before reporting I/O errors.
# Range: 0 - 240 seconds
hba0-link-down-timeout=10;

# Persistent binding only option.
# 0 = Reports to OS discovery of binded and non-binded devices
# 1 = Reports to OS discovery of persistent binded devices only
hba0-persistent-binding-configuration=1;

# Fast error reporting to Solaris, enabled/disabled.
# 0 = disabled, 1 = enable
hba0-fast-error-reporting=0;

# Enable extended logging.
# 0 = disabled, 1 = enable
hba0-extended-logging=0;

#####################################################################
# WARNING: Beginning of Configuration Data stored by the QLogic #
# Applications. Consult documentation before editing #
# any data passed this text. #
#####################################################################

# CPQ installation changes made.

# CPQswsp: start of Secure Path edits. Caution: do not remove! This line is used by pkgadd/pkgrm.

hba0-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
hba2-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
hba0-SCSI-target-id-38-fibre-channel-port-name="50001FE1500281CC";
hba2-SCSI-target-id-38-fibre-channel-port-name="50001FE1500281CC";
hba0-SCSI-target-id-36-fibre-channel-port-name="50001FE1500281C8";
hba2-SCSI-target-id-36-fibre-channel-port-name="50001FE1500281C8";
hba0-SCSI-target-id-39-fibre-channel-port-name="50001FE1500281CD";
hba2-SCSI-target-id-39-fibre-channel-port-name="50001FE1500281CD";
hba0-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
hba2-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
hba0-SCSI-target-id-34-fibre-channel-port-name="50001FE15002251C";
hba2-SCSI-target-id-34-fibre-channel-port-name="50001FE15002251C";
hba0-SCSI-target-id-32-fibre-channel-port-name="50001FE150022518";
hba2-SCSI-target-id-32-fibre-channel-port-name="50001FE150022518";
hba0-SCSI-target-id-35-fibre-channel-port-name="50001FE15002251D";
hba2-SCSI-target-id-35-fibre-channel-port-name="50001FE15002251D";

# CPQswsp: end of Secure Path edits. Caution: do not remove! This line is used by pkgadd/pkgrm.
=========== EOF =====================
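
The persistent binding entries above come in pairs, one per HBA, and both HBAs should map a given target ID to the same port name. A small Python sketch to verify that is given below. It assumes the entry format shown above and the two HBA instances hba0 and hba2; the function names and the sample constant are made up for illustration.

```python
import re
from collections import defaultdict

BINDING = re.compile(
    r'^(hba\d+)-SCSI-target-id-(\d+)-fibre-channel-port-name='
    r'"([0-9A-Fa-f]{16})";'
)

# A few entries copied from the config quoted in this post.
SAMPLE_CONF = """\
hba0-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
hba2-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
hba0-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
hba2-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
"""


def bindings_by_target(conf_text):
    """Return {target_id: {hba_name: port_name}} for all binding lines."""
    table = defaultdict(dict)
    for line in conf_text.splitlines():
        match = BINDING.match(line.strip())
        if match:
            hba, target, wwpn = match.groups()
            table[int(target)][hba] = wwpn.upper()
    return dict(table)


def inconsistent_targets(table, hbas=("hba0", "hba2")):
    """List target IDs missing on an HBA or bound to differing port names."""
    bad = []
    for target, per_hba in sorted(table.items()):
        wwpns = {per_hba.get(hba) for hba in hbas}
        if len(wwpns) != 1 or None in wwpns:
            bad.append(target)
    return bad


print(inconsistent_targets(bindings_by_target(SAMPLE_CONF)))
```

An empty list means every target ID is bound identically on both HBAs, which is the case for the sample entries.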
/kernel/drv/swsp.conf
# Compaq StorageWorks Secure Path
# swsp.conf - Configuration file for swsp
#
# use swsp.conf to configure which arrays can be controlled by Secure Path
# add one entry of the following form per array:
# name="swsp" class="root" portid=0 reg=0x0,0x(instance+1),0x1
# instance=(instance #) array-name="ARRAY_WWID";
#
# configurable parameters can be set globally, or on an array basis by
# adding one of path-verify, path-verify-period load-balance or auto-restore
# to the line defining the array instance, or on a line by itself (for global)
#
# path-verify=?
# 1= path-verification enabled
# 0= path-verification disabled
# path-verify-period=X
# X = number of seconds between path verification attempts
#
# load-balance=?
# 1= enabled
# 0= disabled
#
# auto-restore=?
# 1= enabled
# 0= disabled
#
path-verify=1;
name="swsp" class="root" portid=0 reg=0x0,0x1,0x1 instance=0
array-name="5000-1FE1-5002-81C0";
wwlid-0-0="6005-08B4-0001-3879-0000-D000-0150-0000@0,0";
wwlid-0-1="6005-08B4-0001-3879-0000-D000-0153-0000@0,1";
wwlid-0-2="6005-08B4-0001-3879-0000-D000-0156-0000@0,2";
wwlid-0-3="6005-08B4-0001-3879-0000-D000-0164-0000@0,3";
name="swsp" class="root" portid=0 reg=0x0,0x2,0x1 instance=1
array-name="5000-1FE1-5002-2510";
wwlid-1-0="6005-08B4-0001-24D1-0000-A000-0193-0000@0,0";
wwlid-1-1="6005-08B4-0001-24D1-0000-A000-0196-0000@0,1";
wwlid-1-2="6005-08B4-0001-24D1-0000-A000-0199-0000@0,2";
wwlid-1-3="6005-08B4-0001-24D1-0000-A000-01A7-0000@0,3";
======================== EOF ========================================
=====================================================================
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


