Problem with 3510 FC Array State Replicas

From: Thor Newman (Thor@airg.com)
Date: Thu Mar 17 2005 - 19:49:20 EST


Hello managers,

I am in the process of building a Sun Fire E2900 with 3510 FC RAID arrays
for external storage, connected via 2 Gb/s host adapters (SG-XPCI2FC-QF2).
Solaris itself is installed on the E2900's two internal HDDs; the RAID
arrays are intended for database partitions.

Each 3510 presents its hardware RAID LUNs to a separate controller in the
E2900, and I then use Solaris VM to mirror the LUNs.

The problem I am having is not with the mirroring itself, which works
fine; it is with the 'metadb' state replicas placed on the 3510 LUNs.

I have state replicas on the two internal HDDs, and these are fine. I set
aside a small slice (20 MB or so) on each LUN for additional state
replicas and added them with metadb without any problem. The replicas
were added, the VM objects were created, the partitions mounted,
everyone was happy, and all was well.

The problem arose after rebooting the server: the 3510 state replicas
failed or were otherwise reported as corrupted. The server would not
boot unaided, but dropped to single-user mode for maintenance. I had to
delete the state replicas on the 3510 devices to clear the error and
complete the boot. I went through this twice, verifying patches,
drivers, etc. each time.

- The replicas on the E2900 internal HDDs are always fine.

- The internal drives themselves are mirrored with VM, as are the RAID
LUNs.

- There are no issues with the setup as long as I don't add
state replicas to the 3510 devices; if I do, all is well until a reboot.

- When I reboot, the server prompts for maintenance and reports
problems with the 3510 objects and corruption of the 3510 state
replicas. Deleting the 3510 state replicas clears the issues, and the
server continues booting. The 3510 metadevices then mount correctly, and
no data is lost or corrupted.
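The boot behaviour described above is consistent with the Solaris Volume Manager replica quorum rule: as documented for SVM, the system boots to multiuser mode only when a majority (half plus one) of all state database replicas can be read. The minimal Python sketch below works through that arithmetic. The replica counts are assumptions, not measurements: two replicas on the internal disks plus one per 3510 LUN (four LUNs, per the format listing further down), with the 3510 replicas taken to be unreadable at boot time.

```python
def min_replicas_for_boot(total: int) -> int:
    """Smallest number of readable replicas that forms a majority (half + 1)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return total // 2 + 1


def boots_multiuser(total: int, available: int) -> bool:
    """True if `available` of `total` replicas form a strict majority.

    Assumes the SVM rule that a multiuser boot needs a majority of the
    state database replicas; otherwise the boot stops for maintenance.
    """
    if not 0 <= available <= total:
        raise ValueError("need 0 <= available <= total")
    return available >= min_replicas_for_boot(total)


# Assumed layouts: 2 internal replicas, plus 0 or 4 on the 3510 LUNs,
# with any 3510 replicas unreadable when the md driver checks them.
scenarios = {
    "internal only (2 of 2 readable)": (2, 2),
    "internal + 4 on 3510, 3510 unreadable (2 of 6)": (6, 2),
}

for label, (total, available) in scenarios.items():
    verdict = "multiuser" if boots_multiuser(total, available) else "maintenance"
    print(f"{label}: need {min_replicas_for_boot(total)} -> {verdict}")
```

Even with only one replica per array (2 of 4 readable), exactly half is not a majority. Under this rule, unreadable 3510 replicas alone would be enough to stop an unattended boot, whatever the reason they cannot be read.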

So the problem appears to be the state replicas alone. The 3510 is
supposed to interoperate with Solaris VM, but perhaps only when
configured as JBOD? Is there a problem putting a state replica on a
slice of a HW RAID LUN? Driver issues? Any suggestions / notes /
recommendations / links / etc. will be greatly appreciated.

Here is a sample of the errors I get:

Mar 17 19:39:44 e2900-01 genunix: [ID 936769 kern.info] ssd1 is /ssm@0,0/pci@18,700000/SUNW,qlc@1/fp@0,0/ssd@w216000c0ff884c46,1
Mar 17 19:39:44 e2900-01 scsi: [ID 799468 kern.info] ssd0 at fp0: name w216000c0ff884c46,0, bus address a7
Mar 17 19:39:44 e2900-01 genunix: [ID 936769 kern.info] ssd0 is /ssm@0,0/pci@18,700000/SUNW,qlc@1/fp@0,0/ssd@w216000c0ff884c46,0
Mar 17 19:39:44 e2900-01 genunix: [ID 408114 kern.info] /ssm@0,0/pci@18,700000/SUNW,qlc@1/fp@0,0/ssd@w216000c0ff884c46,1 (ssd1) online
Mar 17 19:39:44 e2900-01 genunix: [ID 454863 kern.info] dump on /dev/md/dsk/d60 size 8193 MB
Mar 17 19:39:46 e2900-01 pseudo: [ID 129642 kern.info] pseudo-device: devinfo0
Mar 17 19:39:46 e2900-01 genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo@0
Mar 17 19:39:46 e2900-01 genunix: [ID 511416 kern.notice] e_ddi_get_dev_info: Illegal major device number <-1>
Mar 17 19:39:46 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d69: B_FAILFAST I/O disabled
Mar 17 19:39:46 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d70: open error on (Unavailable)
Mar 17 19:39:46 e2900-01 genunix: [ID 511416 kern.notice] e_ddi_get_dev_info: Illegal major device number <-1>
Mar 17 19:39:46 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d69: B_FAILFAST I/O disabled
Mar 17 19:39:46 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d70: open error on (Unavailable)
Mar 17 19:39:46 e2900-01 genunix: [ID 511416 kern.notice] e_ddi_get_dev_info: Illegal major device number <-1>
Mar 17 19:39:46 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d79: B_FAILFAST I/O disabled
Mar 17 19:39:46 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d80: open error on (Unavailable)
Mar 17 19:39:46 e2900-01 genunix: [ID 511416 kern.notice] e_ddi_get_dev_info: Illegal major device number <-1>
Mar 17 19:39:46 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d79: B_FAILFAST I/O disabled
Mar 17 19:39:46 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d80: open error on (Unavailable)
Mar 17 19:39:46 e2900-01 pseudo: [ID 129642 kern.info] pseudo-device: sgfru0
Mar 17 19:39:46 e2900-01 genunix: [ID 936769 kern.info] sgfru0 is /pseudo/sgfru@0
Mar 17 19:39:47 e2900-01 genunix: [ID 511416 kern.notice] e_ddi_get_dev_info: Illegal major device number <-1>
Mar 17 19:39:47 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d69: B_FAILFAST I/O disabled
Mar 17 19:39:47 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d70: open error on (Unavailable)
Mar 17 19:39:47 e2900-01 genunix: [ID 511416 kern.notice] e_ddi_get_dev_info: Illegal major device number <-1>
Mar 17 19:39:47 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d69: B_FAILFAST I/O disabled
Mar 17 19:39:47 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d70: open error on (Unavailable)
Mar 17 19:39:47 e2900-01 genunix: [ID 511416 kern.notice] e_ddi_get_dev_info: Illegal major device number <-1>
Mar 17 19:39:47 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d69: B_FAILFAST I/O disabled
Mar 17 19:39:47 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d70: open error on (Unavailable)
Mar 17 19:39:47 e2900-01 genunix: [ID 511416 kern.notice] e_ddi_get_dev_info: Illegal major device number <-1>
Mar 17 19:39:47 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d69: B_FAILFAST I/O disabled
Mar 17 19:39:47 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d70: open error on (Unavailable)
Mar 17 19:39:47 e2900-01 genunix: [ID 511416 kern.notice] e_ddi_get_dev_info: Illegal major device number <-1>
Mar 17 19:39:47 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d79: B_FAILFAST I/O disabled
Mar 17 19:39:47 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d80: open error on (Unavailable)
Mar 17 19:39:47 e2900-01 genunix: [ID 511416 kern.notice] e_ddi_get_dev_info: Illegal major device number <-1>
Mar 17 19:39:47 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d79: B_FAILFAST I/O disabled
Mar 17 19:39:47 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d80: open error on (Unavailable)
Mar 17 19:39:47 e2900-01 genunix: [ID 511416 kern.notice] e_ddi_get_dev_info: Illegal major device number <-1>
Mar 17 19:39:47 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d79: B_FAILFAST I/O disabled
Mar 17 19:39:47 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d80: open error on (Unavailable)
Mar 17 19:39:47 e2900-01 genunix: [ID 511416 kern.notice] e_ddi_get_dev_info: Illegal major device number <-1>
Mar 17 19:39:47 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d79: B_FAILFAST I/O disabled
Mar 17 19:39:47 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d80: open error on (Unavailable)
Mar 17 19:40:27 e2900-01 genunix: [ID 511416 kern.notice] e_ddi_get_dev_info: Illegal major device number <-1>
Mar 17 19:40:27 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d69: B_FAILFAST I/O disabled
Mar 17 19:40:27 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d70: open error on (Unavailable)
Mar 17 19:40:27 e2900-01 genunix: [ID 511416 kern.notice] e_ddi_get_dev_info: Illegal major device number <-1>
Mar 17 19:40:27 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d69: B_FAILFAST I/O disabled
Mar 17 19:40:27 e2900-01 genunix: [ID 511416 kern.notice] e_ddi_get_dev_info: Illegal major device number <-1>
Mar 17 19:40:27 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d70: open error on (Unavailable)
Mar 17 19:40:27 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d79: B_FAILFAST I/O disabled
Mar 17 19:40:27 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d80: open error on (Unavailable)
Mar 17 19:40:27 e2900-01 genunix: [ID 511416 kern.notice] e_ddi_get_dev_info: Illegal major device number <-1>
Mar 17 19:40:27 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d79: B_FAILFAST I/O disabled
Mar 17 19:40:27 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d80: open error on (Unavailable)
Mar 17 19:41:47 e2900-01 in.routed[141]: [ID 238047 daemon.warning] interface ce1 to 10.10.10.22 turned off
Mar 17 19:41:47 e2900-01 in.routed[141]: [ID 300549 daemon.warning] interface ce1 to 10.10.10.22 restored
Mar 17 19:41:48 e2900-01 genunix: [ID 454863 kern.info] dump on /dev/dsk/c1t0d0s7 size 8193 MB
Mar 17 19:41:50 e2900-01 pseudo: [ID 129642 kern.info] pseudo-device: tod0
Mar 17 19:41:50 e2900-01 genunix: [ID 936769 kern.info] tod0 is /pseudo/tod@0
Mar 17 19:41:51 e2900-01 pseudo: [ID 129642 kern.info] pseudo-device: pm0
Mar 17 19:41:51 e2900-01 genunix: [ID 936769 kern.info] pm0 is /pseudo/pm@0
Mar 17 19:41:52 e2900-01 metadevadm: [ID 209699 daemon.error] Invalid device relocation information detected in Solaris Volume Manager
Mar 17 19:41:52 e2900-01 metadevadm: [ID 912841 daemon.error] Please check the status of the following disk(s):
Mar 17 19:41:52 e2900-01 metadevadm: [ID 702911 daemon.error] c3t40d0
Mar 17 19:41:52 e2900-01 metadevadm: [ID 702911 daemon.error] c3t40d1
Mar 17 19:41:53 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d80: open error on (Unavailable)
Mar 17 19:41:53 e2900-01 last message repeated 1 time
Mar 17 19:41:53 e2900-01 md: [ID 255104 kern.notice] NOTICE: md_probe_one: err 6 mnum 70
Mar 17 19:41:53 e2900-01 md: [ID 255104 kern.notice] NOTICE: md_probe_one: err 6 mnum 80
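Most of the log above is the same few md_mirror messages repeated. A quick way to see which metadevices are involved is to tally them per name and map each submirror back to its slice. The sketch below does that in Python. It assumes the message formats shown above (the colon after "md" appears in the NOTICE lines but not the WARNING lines), and it hard-codes the submirror-to-slice mapping from the metastat output later in this post. In this excerpt, the only submirrors mentioned are d69 and d79, which both sit on controller c3. That matches the two disks metadevadm names, though it does not by itself explain why.

```python
import re
from collections import Counter

# Unwrapped excerpt of the boot log above; paste the full log to tally it all.
SAMPLE_LOG = """\
Mar 17 19:39:46 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d69: B_FAILFAST I/O disabled
Mar 17 19:39:46 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d70: open error on (Unavailable)
Mar 17 19:39:46 e2900-01 md_mirror: [ID 437521 kern.info] NOTICE: md: d79: B_FAILFAST I/O disabled
Mar 17 19:39:46 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d80: open error on (Unavailable)
Mar 17 19:39:47 e2900-01 md_mirror: [ID 976326 kern.warning] WARNING: md d70: open error on (Unavailable)
"""

# Matches "NOTICE: md: d69:" and "WARNING: md d70:" (colon after "md" optional).
MD_RE = re.compile(r"md_mirror: \[ID \d+ [\w.]+\] \w+: md:? (d\d+):")

# Submirror -> slice, transcribed from the metastat output in this post.
SUBMIRROR_SLICE = {
    "d69": "c3t40d0s6", "d71": "c5t40d0s6",
    "d79": "c3t40d1s6", "d81": "c5t40d1s6",
}


def tally_metadevices(log: str) -> Counter:
    """Count md_mirror messages per metadevice name."""
    counts = Counter()
    for line in log.splitlines():
        match = MD_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def controllers_of_failing_submirrors(counts: Counter) -> list:
    """Controllers (e.g. 'c3') hosting any submirror that appears in the log."""
    return sorted({SUBMIRROR_SLICE[d].split("t")[0]
                   for d in counts if d in SUBMIRROR_SLICE})


counts = tally_metadevices(SAMPLE_LOG)
print(dict(counts))
print(controllers_of_failing_submirrors(counts))
```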

The OS is Solaris 9 with the latest recommended patch cluster (as of
Monday, 7th March):

SunOS e2900-01 5.9 Generic_118558-03 sun4u sparc SUNW,Netra-T12

All documented drivers, patches, etc. specific to the host adapter cards
have been applied:

system      SUNWqlc     Qlogic ISP 2200/2202 Fibre Channel Device Driver
system      SUNWqlcx    Qlogic ISP 2200/2202 Fibre Channel Device Driver (64 bit)

This patch bundle was generated by PatchPro.

Please refer to the README file within each patch for installation
instructions. To properly patch your system, the following patches
should be installed in the listed order:

  1) 113039-08
  2) 111847-08   !!! SEE README !!!
  3) 113040-11
  4) 114878-08
  5) 113043-09
  6) 114476-04   !!! SEE README !!!
  7) 114478-06
  8) 113041-07   !!! SEE README !!!
  9) 113042-09

The RAID LUNs are all correctly seen by the OS:

# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>  [ INTERNAL HDD ]
          /ssm@0,0/pci@18,600000/scsi@2/sd@0,0
       1. c1t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>  [ INTERNAL HDD ]
          /ssm@0,0/pci@18,600000/scsi@2/sd@1,0
       2. c3t40d0 <SUN-StorEdge3510-327R cyl 52723 alt 2 hd 127 sec 64>  [ RAID LUN ]
          /ssm@0,0/pci@18,700000/SUNW,qlc@1/fp@0,0/ssd@w216000c0ff884c46,0
       3. c3t40d1 <SUN-StorEdge3510-327R cyl 34873 alt 2 hd 64 sec 64>  [ RAID LUN ]
          /ssm@0,0/pci@18,700000/SUNW,qlc@1/fp@0,0/ssd@w216000c0ff884c46,1
       4. c5t40d0 <SUN-StorEdge3510-327R cyl 52723 alt 2 hd 127 sec 64>  [ RAID LUN ]
          /ssm@0,0/pci@19,700000/SUNW,qlc@3/fp@0,0/ssd@w216000c0ff88431c,0
       5. c5t40d1 <SUN-StorEdge3510-327R cyl 34873 alt 2 hd 64 sec 64>  [ RAID LUN ]
          /ssm@0,0/pci@19,700000/SUNW,qlc@3/fp@0,0/ssd@w216000c0ff88431c,1
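The boot log refers to the 3510 LUNs by driver instance (ssd0, ssd1) and physical path rather than by cXtYdZ name. The small Python sketch below matches the two. It assumes the `ssd@w<WWN>,<LUN>` path form shown in the listing above and extracts the target WWN and LUN number. For example, it resolves the path logged for ssd1 to c3t40d1, which places the ssd instances seen in the log on the c3 array. The helper names are hypothetical.

```python
import re

# Physical paths copied from the format listing above.
FORMAT_PATHS = {
    "c3t40d0": "/ssm@0,0/pci@18,700000/SUNW,qlc@1/fp@0,0/ssd@w216000c0ff884c46,0",
    "c3t40d1": "/ssm@0,0/pci@18,700000/SUNW,qlc@1/fp@0,0/ssd@w216000c0ff884c46,1",
    "c5t40d0": "/ssm@0,0/pci@19,700000/SUNW,qlc@3/fp@0,0/ssd@w216000c0ff88431c,0",
    "c5t40d1": "/ssm@0,0/pci@19,700000/SUNW,qlc@3/fp@0,0/ssd@w216000c0ff88431c,1",
}

# Trailing "ssd@w<hex WWN>,<LUN>" component of an ssd physical path.
SSD_RE = re.compile(r"ssd@w([0-9a-f]+),(\d+)$")


def wwn_and_lun(path: str) -> tuple:
    """Return (target WWN, LUN) from an ssd physical device path."""
    match = SSD_RE.search(path)
    if match is None:
        raise ValueError(f"not an ssd@w<WWN>,<LUN> path: {path}")
    return match.group(1), int(match.group(2))


def disk_for_path(path: str) -> str:
    """Find the cXtYdZ name whose physical path has the same WWN and LUN."""
    target = wwn_and_lun(path)
    for disk, disk_path in FORMAT_PATHS.items():
        if wwn_and_lun(disk_path) == target:
            return disk
    raise KeyError(target)


# Path reported for ssd1 in the boot log.
print(disk_for_path("/ssm@0,0/pci@18,700000/SUNW,qlc@1/fp@0,0/ssd@w216000c0ff884c46,1"))
```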

Each INTERNAL disk has a state replica on slice 6; these are active in
metadb and work without any problems. I'm afraid I don't have metadb
output with the 3510 slices added.

# metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c1t0d0s6
     a    p  luo        16              8192            /dev/dsk/c1t1d0s6

Each RAID LUN has space for a state replica on slice 7 (sample follows):

Total disk cylinders available: 52723 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       0                0         (0/0/0)             0
  1       swap    wu       0                0         (0/0/0)             0
  2     backup    wu       0 - 52722      204.34GB    (52723/0/0) 428532544
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6        usr    wm       0 - 52716      204.32GB    (52717/0/0) 428483776
  7 unassigned    wm   52717 - 52722       23.81MB    (6/0/0)         48768

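As a sanity check on space, the slice 7 figures can be reproduced from the drive geometry and compared with the replica size metadb reports for the internal disks. The Python sketch below rests on three assumptions: 512-byte blocks, the "hd 127 sec 64" geometry that format prints for c3t40d0, and that additional replicas on one slice would be laid out back to back after the 16-block offset. It reproduces the 48768-block (about 23.81 MB) slice 7 and shows room for several 8192-block replicas, so lack of space seems an unlikely cause.

```python
SECTOR_BYTES = 512           # bytes per block
HEADS, SECTORS = 127, 64     # "hd 127 sec 64" in the format listing for c3t40d0
REPLICA_FIRST_BLOCK = 16     # "first blk" column of the metadb output above
REPLICA_BLOCKS = 8192        # "block count" column of the metadb output above


def slice_blocks(first_cyl: int, last_cyl: int) -> int:
    """Blocks in a slice spanning cylinders first_cyl..last_cyl inclusive."""
    return (last_cyl - first_cyl + 1) * HEADS * SECTORS


def replicas_that_fit(blocks: int) -> int:
    """Replicas that fit back to back after the first-block offset (assumed layout)."""
    return max(0, (blocks - REPLICA_FIRST_BLOCK) // REPLICA_BLOCKS)


s7 = slice_blocks(52717, 52722)   # slice 7: cylinders 52717 - 52722
s6 = slice_blocks(0, 52716)       # slice 6: cylinders 0 - 52716
print(s7, round(s7 * SECTOR_BYTES / 2**20, 2), replicas_that_fit(s7))
print(s6)
```

Both block counts agree with the Blocks column of the partition table, which is a useful cross-check that the geometry assumption holds for this LUN.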
The LUNs themselves are happily ensconced in VM objects (d70 and d80):

d70: Mirror
    Submirror 0: d69
      State: Okay
    Submirror 1: d71
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 428483776 blocks (204 GB)

d69: Submirror of d70
    State: Okay
    Size: 428483776 blocks (204 GB)
    Stripe 0:
        Device      Start Block  Dbase  State  Reloc  Hot Spare
        c3t40d0s6          0     No     Okay   Yes

d71: Submirror of d70
    State: Okay
    Size: 428483776 blocks (204 GB)
    Stripe 0:
        Device      Start Block  Dbase  State  Reloc  Hot Spare
        c5t40d0s6          0     No     Okay   Yes

d80: Mirror
    Submirror 0: d79
      State: Okay
    Submirror 1: d81
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 142798848 blocks (68 GB)

d79: Submirror of d80
    State: Okay
    Size: 142798848 blocks (68 GB)
    Stripe 0:
        Device      Start Block  Dbase  State  Reloc  Hot Spare
        c3t40d1s6          0     No     Okay   Yes

d81: Submirror of d80
    State: Okay
    Size: 142798848 blocks (68 GB)
    Stripe 0:
        Device      Start Block  Dbase  State  Reloc  Hot Spare
        c5t40d1s6          0     No     Okay   Yes

 - thor
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers