Write errors

From: Charles Ballowe (hangman@steelballs.org)
Date: Fri Sep 26 2003 - 10:41:07 EDT


The other day the DBAs called to tell me they were getting write errors on
some files. Digging through the event log with evmget, I found errors like:
25-Sep-2003 01:18:26 sys.unix.hw.error_counter_changed.disk._hwid.86 200 oraproddb A change has occurred in an error counter for device (HWID=86 lid=16)

I placed a support call with HP and sent them the binary.errlog. From it they
were able to determine which disk in that RAID set was having problems, and
that disk was replaced.
The DBAs are still having problems, but nothing new is being logged to the
errlog. I don't believe this is a hardware problem at this point, so I'm
wondering whether some error state could be stored in the kernel that won't
be cleared until the system reboots or some other action is taken. Is there
something else I should be looking for?

Any thoughts on how to get past this would be appreciated.
The system is Tru64 V5.1A PK5 and is clustered; both members are GS-80s.

-Charlie



This archive was generated by hypermail 2.1.7 : Sat Apr 12 2008 - 10:49:36 EDT