Another quick question

From: Didier Godefroy (ldg@ulysium.net)
Date: Thu Jul 27 2006 - 13:14:36 EDT


After having so much trouble lately with one system, I'm getting to the
point where I don't really know what the real cause of the problem is.

A system that's been running fine for years is now constantly failing from a
SCSI event. I haven't recently changed any system software; it's been
running Tru64 5.1 since it came out years ago, and no hardware was changed.
At first it started having hard drive failures, but it turned out none of
the drives are really dead; they work just fine on other machines.
So I thought maybe the SCSI controller was failing intermittently,
which would seem reasonable.
However, after the system went down several times for the same reason,
always some SCSI event, I replaced the whole machine, moving only the hard
drives to the other machine. Those hard drives are not the original ones;
all were swapped out, and the replacements were fully tested and working
fine on other machines before being put in the failing one.
So now all the hardware is different: none of the original hardware was kept,
and only the original software was moved over as is (from LSM mirroring).
That means all that system has in common with the original one now is the
software/data contents on the drives.
Even the SCSI controller in the new machine is different: it was a
built-in one before, and now it's a PCI-based one (KZPBA-CX).
And this system is continuing to fail from a SCSI event, now on that
PCI-based controller, which came from another fully working system.

How could such failures continue to happen like this, with all different
hardware which tests good without the software residing on those drives?

Could such SCSI events be wrongly triggered by some faulty software?
Is it possible the software is seeing SCSI event errors when there are
really none?
Could that be caused by some corruption in some part of the system software?
(SCSI drivers/kernel...)

(I know, it looks like several questions, but it's kind of the same one,
isn't it?)

-- 
Didier Godefroy
mailto:dg@ulysium.net

