Disk error caused by a file?

From: Tom Combs (combs@magnet.fsu.edu)
Date: Wed Dec 17 2003 - 15:29:15 EST


Hello,

  First off, let me thank everyone for your responses to my 'high
  availability' question. It has been a busy week and I have not
  had time to digest all of it or respond with my thanks.
  
  Now for my current question... I have a Sun Ultra 10 running
  Solaris 8 that is a file server to about 30 other computers, most
  being Red Hat and a few being Solaris. I started getting errors
  like this in my logs:
  
  Dec 17 13:53:57 fangio scsi: [ID 107833 kern.warning] WARNING: /pci@1f,0/pci@1/IntraServer,Ultra2-scsi@1/sd@2,0 (sd3):
  Dec 17 13:53:57 fangio   Error for Command: read(10)   Error Level: Fatal
  Dec 17 13:53:57 fangio scsi: [ID 107833 kern.notice]   Requested Block: 12001584   Error Block: 12001614
  Dec 17 13:53:57 fangio scsi: [ID 107833 kern.notice]   Vendor: SEAGATE   Serial Number: 3JA2JM1Z
  Dec 17 13:53:57 fangio scsi: [ID 107833 kern.notice]   Sense Key: Media Error
  Dec 17 13:53:57 fangio scsi: [ID 107833 kern.notice]   ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0xe4
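
  As a rough illustration of how to read the "Requested Block" and
  "Error Block" fields in the log above, the sketch below pulls both
  numbers out of the log line and reports how far into the request the
  read failed. The function name `error_offset` and the 512-byte block
  size are assumptions made for this example, not something stated in
  the log itself.

```python
import re

# Log line copied from the excerpt above, with the wrapped halves joined.
LOG_LINE = (
    "Dec 17 13:53:57 fangio scsi: [ID 107833 kern.notice] "
    "Requested Block: 12001584 Error Block: 12001614"
)

# Matches the two block numbers regardless of the spacing between fields.
BLOCK_RE = re.compile(r"Requested Block:\s*(\d+)\s+Error Block:\s*(\d+)")


def error_offset(line, block_size=512):
    """Return (blocks, bytes) between the requested block and the failing one.

    block_size=512 is an assumption for this sketch; returns None when the
    line does not carry both fields.
    """
    match = BLOCK_RE.search(line)
    if match is None:
        return None
    requested, failed = int(match.group(1)), int(match.group(2))
    blocks = failed - requested
    return blocks, blocks * block_size


print(error_offset(LOG_LINE))  # (30, 15360)
```

  Under that block-size assumption, the read failed 30 blocks (15360
  bytes) past the start of the request.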
      
      
  I put in a brand new drive and restored the files, and everything looked
  fine until the nightly dump, when I got the errors again. By doing finer
  and finer selective dumps on directories and files, I isolated the problem
  to a single ASCII text file. If I tried to ufsdump, cat or mv the file, I
  would get errors. Once I removed the file, I was able to dump the file
  system without any problems.
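
  The manual narrowing-down described above can also be approximated by
  reading every file in the tree and noting which ones fail. The sketch
  below is one way to do that in present-day Python 3, not what was
  actually run here; the function name `find_unreadable` and the 1 MiB
  chunk size are hypothetical choices for the example.

```python
import os


def find_unreadable(root, chunk_size=1 << 20):
    """Yield (path, error) for each regular file under root that cannot be
    read from start to end.

    A read that hits a bad block normally surfaces as an OSError, so any
    file reported here is a candidate for the one triggering the SCSI
    errors. chunk_size is an arbitrary choice for this sketch.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, devices and FIFOs; only plain files are read.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    # Read to end of file and discard the data.
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                yield path, exc
```

  Looping over `find_unreadable(some_directory)` and printing each pair
  lists the suspect files. Permission errors are reported as well, so it
  makes sense to run it as a user who can read the whole tree.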
  
  Note that I first got these errors on a five-year-old drive. I put in a
  new drive (the one in the errors above), then formatted, newfs'd and
  fsck'd it. I then restored the files from backup tape. On subsequent
  dumps, the SCSI errors returned until I removed one specific file. I would
  think that the errors must be due to faulty media or hardware, but after
  all I've been through I'm beginning to wonder whether that has to be the
  case.
  
  Is it possible for a file to be corrupt enough to cause SCSI disk errors?
  I know this doesn't sound very plausible, but I've been through a lot with
  this problem and I am now reaching. In the process of isolating the
  problem, I have swapped the entire Ultra 10 box, the SCSI cables, the
  terminator and the SCSI controllers, and updated the SCSI drivers, in
  addition to installing the new drive. None of it helped until I removed
  this one particular file.

  Thanks for your thoughts. --Tom Combs
  
  
  

--
Tom Combs                                      E-mail: combs@magnet.fsu.edu
National High Magnetic Field Laboratory        Phone:  (850) 644-1657
1800 E. Paul Dirac Drive                       Tallahassee, FL 32310
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

