SUMMARY: hardware RAID domain panics

From: Neil R. Smith (neils@ariel.met.tamu.edu)
Date: Fri Apr 09 2004 - 16:27:08 EDT


Summarized issue:
ES45, Tru64 5.1A (no patches)
External Hardware RAID: Western Scientific F4 Tornado RAID IDE-SCSI
    3TB partitioned and presented to Tru64 as 2TB and 1.3TB LUNs. Each
    LUN was set up as a single AdvFS domain with a single fileset.

The domains worked as configured until they filled to ~40% and ~56% of
capacity, whereupon AdvFS I/O errors began, followed in short order by
domain panics and withdrawal of the domains from service.

fixfdmn showed the following:

fixfdmn -n d12
fixfdmn: Checking the RBMT.
fixfdmn: Can't read page at block -660733904 on '/dev/disk/dsk12c'.
fixfdmn: Invalid argument
fixfdmn: Error correcting the RBMT.

Was this OS or hardware related?

Additional evidence came later from examining the disklabel I had applied:
#           size      offset  fstype  fsize  bsize  cpg  # ~Cyl values
  a:      131072           0  unused      0      0       #     0 - 7
  b:      262144      131072  unused      0      0       #     8 - 23
  c: -1651834880           0  AdvFS                       #     0 - 161323
  d:           0           0  unused      0      0       #     0 - 0
  e:           0           0  unused      0      0       #     0 - 0
  f:           0           0  unused      0      0       #     0 - 0
  g:  1321369600      393216  unused      0      0       #    24 - 80673
  h:  1321369600  1321762816  unused      0      0       # 80674 - 161323
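
A label like this can be screened mechanically for impossible values.
The following is only a minimal sketch in Python, not a Tru64 tool: the
helper name suspicious_partitions is hypothetical, and the sample rows
are copied from the label above with the wrapped lines rejoined and the
empty d, e and f rows left out.

```python
import re

# Partition rows copied from the label above (wrapped lines rejoined).
LABEL = """\
  a: 131072 0 unused 0 0
  b: 262144 131072 unused 0 0
  c: -1651834880 0 AdvFS
  g: 1321369600 393216 unused 0 0
  h: 1321369600 1321762816 unused 0 0
"""

# Matches "<letter>: <size> <offset> <fstype>" at the start of a row.
ROW = re.compile(r"^\s*([a-h]):\s+(-?\d+)\s+(-?\d+)\s+(\S+)")


def suspicious_partitions(label_text):
    """Return (name, size, offset) for rows with a negative size or offset."""
    bad = []
    for line in label_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip headers, comments and blank lines
        name, size, offset = m.group(1), int(m.group(2)), int(m.group(3))
        if size < 0 or offset < 0:
            bad.append((name, size, offset))
    return bad


print(suspicious_partitions(LABEL))
```

Only the c partition is flagged here; a size or offset can never
legitimately be negative, so any hit means the label deserves a closer
look before the disk is put into service.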
-------------------------

Answer: The problem was twofold, and was not hardware related.
        1. Patch Kit 3, at minimum, is required for its AdvFS fixes.
           (I have installed Patch Kit 6 for 5.1A.)
        2. The disklabel applied to the LUNs was wrong, as hinted
           by the negative-integer partition sizes in the label.
           I had applied a default disklabel by doing
                disklabel -rw dsk12
           This is wrong! I should have used the following syntax,
           which forces disklabel to query the disk (in this case
           the hardware RAID controller) for disk info:
                disklabel -rwt advfs dsk12 junk
           where 'junk' is any name not found in /etc/disktab.
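
One plausible reading of the negative size, offered as an assumption
and not something confirmed in the replies, is that the sector count
no longer fits in a signed 32-bit integer and wraps when printed. The
Python sketch below reinterprets the c value as unsigned and checks it
against the g and h rows of the label; the 512-byte sector size is
also an assumption.

```python
SECTOR_BYTES = 512            # assumed sector size
INT32_MAX = 2**31 - 1         # largest value a signed 32-bit field can hold


def as_unsigned32(value):
    """Reinterpret a signed 32-bit value as unsigned (two's complement)."""
    return value & 0xFFFFFFFF


# Values taken from the disklabel shown earlier.
c_size = as_unsigned32(-1651834880)
g_size, g_off = 1321369600, 393216
h_size, h_off = 1321369600, 1321762816

print(c_size)                       # sectors, read as unsigned
print(c_size > INT32_MAX)           # True if it overflows a signed field
print(c_size * SECTOR_BYTES)        # bytes, under the 512-byte assumption
print(g_off + g_size == h_off)      # does g end exactly where h starts?
print(h_off + h_size == c_size)     # does h end at the unsigned size of c?
```

Under these assumptions the unsigned c size is 2643132416 sectors,
about 1.35e12 bytes, which is in line with the 1.3TB LUN, and the g and
h partitions tile the disk exactly up to that size.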

Many thanks to:
John Farmer
Bob Harris
Robert Collins
Alan Rollow

-- 
Neil R. Smith, Comp. Sys. Mngr.		neils@tamu.edu
Dept. Atmospheric Sci., Texas A&M Univ.	979/845-6272 FAX:979/862-4466

