From: Tom Combs (combs@magnet.fsu.edu)
Date: Wed Dec 17 2003 - 15:29:15 EST
Hello,
First off, let me thank everyone for your responses to my 'high
availability' question. It has been a busy week and I have not
had time to digest all of it or respond with my thanks.
Now for my current question... I have a Sun Ultra 10 running
Solaris 8 that is a file server to about 30 other computers, most
being Red Hat and a few being Solaris. I started getting errors
like this in my logs:
Dec 17 13:53:57 fangio scsi: [ID 107833 kern.warning] WARNING:
    /pci@1f,0/pci@1/IntraServer,Ultra2-scsi@1/sd@2,0 (sd3):
Dec 17 13:53:57 fangio Error for Command: read(10)
    Error Level: Fatal
Dec 17 13:53:57 fangio scsi: [ID 107833 kern.notice] Requested Block:
    12001584 Error Block: 12001614
Dec 17 13:53:57 fangio scsi: [ID 107833 kern.notice] Vendor: SEAGATE
    Serial Number: 3JA2JM1Z
Dec 17 13:53:57 fangio scsi: [ID 107833 kern.notice] Sense Key: Media Error
Dec 17 13:53:57 fangio scsi: [ID 107833 kern.notice] ASC: 0x11 (unrecovered
    read error), ASCQ: 0x0, FRU: 0xe4
I put in a brand new drive and restored the files, and everything looked
fine until the nightly dump, when I got the errors again. By doing finer
and finer selective dumps on directories and files, I isolated the problem
to a single ASCII text file. If I tried to ufsdump, cat or mv the file, I
would get errors. I removed the file and was able to dump the file system
without any problems.
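The narrowing-down described above can also be done in one pass by reading every file and recording which ones fail. The following is only a minimal sketch, assuming Python 3 is available on the server; the function name `find_unreadable_files` and its `opener` parameter are hypothetical and exist purely for illustration. Files that were read recently may be served from cache rather than from the disk, so a clean result is not proof that the media is good.

```python
import os


def find_unreadable_files(root, opener=open, chunk_size=1 << 20):
    """Read every regular file under root; return (path, error) for failures.

    `opener` is a hypothetical hook (defaults to the built-in open) so the
    failure path can be exercised without a real bad disk.
    """
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip symlinks, devices, and FIFOs; only regular files are read.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with opener(path, "rb") as handle:
                    # Read through to end of file so every block is requested.
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                # A media error normally surfaces here as an I/O error.
                failures.append((path, exc))
    return failures
```

Calling `find_unreadable_files` on the mount point of the suspect file system and printing the returned pairs would list each file whose read failed along with the error the operating system reported.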
Note that I first got these errors on a five-year-old drive. I
put in a new drive (the one in the above errors), then formatted, newfs'd
and fsck'd it. I then restored files from backup tape. On subsequent
dumps, the SCSI errors returned until I removed one specific file. I would
think that the errors would be due to faulty media or hardware, but after
all I've been through I'm beginning to wonder if this has to be the case.
Is it possible for a file to be corrupt enough to cause SCSI disk errors?
I know this doesn't sound very plausible but I've been through a lot with
this problem and I am now reaching. In the process of isolating the
problem, I have swapped the entire Ultra 10 box, the SCSI cables, the
terminator, and the SCSI controllers, and updated the SCSI drivers, in
addition to installing the new drive. All to no avail until I removed this
one particular file.
Thanks for your thoughts. --Tom Combs
-- 
Tom Combs                                E-mail: combs@magnet.fsu.edu
National High Magnetic Field Laboratory  Phone: (850) 644-1657
1800 E. Paul Dirac Drive
Tallahassee, FL 32310
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers