LSM raid5 volume causing errors

From: Didier Godefroy (ldg@ulysium.net)
Date: Mon Feb 28 2005 - 17:19:41 EST


I was just working on an LSM raid5 issue, for which I posted my summary a
day or so ago. I couldn't work on the production system again to find the
answers to my problem, so I set up a test bed to experiment with, and
that's how I found out how raid5 is actually handled by LSM. But I am now
bumping against a problem. Although it's not on a production system, it
puzzles me, and I'd like to understand what causes it in case the same
thing shows up later on the production system.

The test bed is set up with a couple of 4 GB drives for the system (5.1B
with PK1), mirrored, plus four 2 GB drives for the raid5 set, all on a
single bus on a KZPBA-CX controller. All drives are StorageWorks SBBs.

That system isn't on any network, and I only applied patch kit 1 from the
CD, so perhaps this is due to a bug that was fixed in a later patch kit.
What happens is that when I fill that raid5 volume (with an AdvFS domain
mounted on it) with a bunch of files to see how it performs, many errors
are reported: a SCSI CAM ERROR PACKET on each and every disk involved in
the raid5 set, including the one holding the log plex. The reported error
is "timeout on disconnected request" from the routine "ss_perform_timeout".

I have the log plex on drive 0, which is one of the two disks of the
mirrored system. Since the two were of slightly different sizes, one of
them had enough unused room to make one more partition, and I put the
raid5 log plex on it.

I found that when I remove the log plex from the raid5 set, leaving it
running on just the 4 columns on the 4 drives, the errors don't show up and
it's actually pretty fast. Moving the log plex didn't help: when I added
one more drive (a 10k rpm) and put the raid5 log plex on that, the errors
were reported for that drive as well.

What is causing this?
Can that be prevented?
Does it have anything to do with the specific choice of hardware?

I have 2 KZPBA-CX controllers in that test box and I tried them both, but
it made no difference. It always works fine, with no errors, when no log
plex is added to the raid5 set.

-- 
Didier Godefroy
mailto:dg@ulysium.net

