FOLLOWUP Salvage question and SUMMARY: System crashing trying to access RAID

From: Mike Hudson (mjhudson@uwaterloo.ca)
Date: Mon Mar 03 2003 - 11:57:52 EST


SUMMARY

Thanks to Tom Blinn, Alan Rollow and Pat O'Brien. In the end I also called
HP software support.

The problem below *seems* to be due to corruption of the AdvFS file
system (as opposed to a hardware problem).

One suggestion - fixfdmn - is not available for 4.0E. HP support felt
that if the problem is so bad that verify crashes, then fixfdmn probably
would not work anyway.

So I have to resort to the salvage command. In a few tests, at least, it
seemed to work without crashing the system.

The problem is that the RAID is huge (1 TB), and when I copy directories
to (much smaller) disks, the latter tend to fill up.

NEW QUESTION

Is there any way to get salvage to do an "ls" or "du"? The salvage man
page doesn't say anything about this -- so I guess it's a long shot, but I
thought I'd ask.

Thanks
Mike

On Tue, 25 Feb 2003, Mike Hudson wrote:

> Hello managers
>
> Due to a series of power failures (while I was out of town) I am now
> having problems with my DS20E running 4.0E.
>
> I have a third-party RAID which appears to the DS20E as a single large
> disk connected via SCSI. The RAID has several partitions which use
> AdvFS. Externally the RAID seems fine - all lights are green and there
> are no error messages on its LCD console.
>
> I can boot the DS20E to single-user mode fine (the system disks are
> on local disks). The system sees the RAID device when I boot up, and
> /sbin/advfs/advscan of the device detects the partitions there.
>
> When I try to access the RAID using /sbin/advfs/verify or mount
> the system crashes with
>
> "trap: invalid memory read access ...
>
> kernel panic"
>
> (more messages with hex numbers of the fault flash by but they are only on
> the console and do not seem to be logged anywhere).
>
> and then reboots to single-user mode.
>
> I have tried switching the RAID to a different SCSI controller but the
> problem persists ...
>
> I have tried rebooting with a generic kernel ... no luck there.
>
> Any ideas for at least starting points about how to diagnose this problem?
>
> I don't know if the data on disk is corrupt or if there is a problem on
> the system reading it ... I don't have a lot of experience managing this
> system (I am not a full-time system manager, but rather a professor!).
>
> Thanks in advance
> Mike


