ZFS problems with bad disk

From: Michael Hase (michael@six.de)
Date: Wed Aug 09 2006 - 06:15:39 EDT


Hi,

last weekend I discovered some strange behaviour of zfs regarding bad
disks. In a test environment with an e420 running Solaris 10 6/06 and
an a5000 (split loop, jni hbas) I created a zfs raidz pool with 5 36gb
disks. No problems so far. Then I did some stress tests with bonnie,
with great performance for a software raid5 implementation on slow cpus:
about 60 mb/sec write and 120 mb/sec read (according to mpstat
strictly cpu bound on all 4 cpus, about 80-90% sys time). That's more
than ok for zfs, especially when compared with svm raid5. With faster
cpus I think one could get full bus- or platter-limited performance out
of a simple jbod enclosure.
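For reference, the setup and test run looked roughly like this (the
c#t#d# device names below are placeholders, not the actual paths on my
box):

```shell
# Build a raidz pool from the five disks (placeholder device names;
# take the real ones from `format` or `luxadm probe` output).
zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0

# Stress test with bonnie; size the test file well past RAM so the
# ARC cannot serve the whole working set from cache.
bonnie -d /tank -s 2048

# In a second terminal, confirm the run is cpu bound.
mpstat 5
```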

Now the bad part. One of those 5 disks is known to go flaky after some
days of uptime, which makes it a great testing device. So without
any heavy i/o traffic this disk went offline; it doesn't have any bad
blocks. luxadm then showed the device as bypassed.

Then I tried to check the storage pool with zpool status. The command
did not return; it simply hung. Another df command: also hung. Even
luxadm didn't work anymore in this state. Every command that tried to
access the filesystems or devices behind the dead disk did not work. The
box itself worked ok, ssh logins were no problem. It seems that zfs
does not give up accessing this dead device, and as it's dead there is
no chance of a response (see the SCSI transport failed messages
below). Shutdown did not work either, and I was a bit surprised that
reboot didn't work as well. The only way out of this kind of deadlock
was going to the ok prompt and doing a sync, see messages below.

After the reboot everything looked good; the filesystem in the zpool
was available in degraded mode as expected, and zpool status recognized
the dead disk as "unavailable".
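From that state, swapping in a good disk should just be the usual
procedure (again, both device names below are placeholders):

```shell
# Confirm the pool came up degraded but usable.
zpool status -v tank

# Replace the dead disk with a spare; resilvering starts
# automatically (placeholder device names).
zpool replace tank c1t2d0 c1t5d0

# Watch the resilver progress.
zpool status tank
```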

I found a case with similar error messages

http://mail.opensolaris.org/pipermail/zfs-discuss/2006-April/001768.html

albeit on nevada. There it was not a bad disk but tampering with the
underlying device, and the result was a real kernel panic. In my case
on Solaris 10 6/06 there was no panic, but the behaviour of zfs is imho
still not appropriate for a production box.

Does anybody know of this kind of problem? How is zfs supposed to
handle bad disks? Any tuning possible, maybe timeout settings?
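I haven't found anything tunable at the zfs level in this release; the
only knobs I know of sit below it in the ssd driver. Something like the
following in /etc/system might make the transport give up on a dead
device faster, though I haven't verified that it helps with the hang
(tunable names and values are from memory and would need checking
against the driver documentation before use):

```
* /etc/system fragment (illustrative, unverified)
* shorten the per-command timeout from the 60 second default to 20s
set ssd:ssd_io_time=0x14
* fewer retries before a command is failed back up the stack
set ssd:ssd_retry_count=3
```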

Here is the relevant part of /var/adm/messages:

Aug 6 06:09:41 tycho scsi: [ID 107833 kern.warning] WARNING: /pci@1f,2000/SUNW,jfca@1/fp@0,0/ssd@w22000020371a7cd1,0 (ssd23):
Aug 6 06:09:41 tycho offline
Aug 6 06:09:41 tycho scsi: [ID 107833 kern.warning] WARNING: /pci@1f,2000/SUNW,jfca@1/fp@0,0/ssd@w22000020371a7cd1,0 (ssd23):
Aug 6 06:09:41 tycho SCSI transport failed: reason 'tran_err': giving up
...
Aug 6 06:34:07 tycho scsi: [ID 107833 kern.warning] WARNING: /pci@1f,2000/SUNW,jfca@1/fp@0,0/ssd@w22000020371a7cd1,0 (ssd23):
Aug 6 06:34:07 tycho offline
Aug 6 06:34:07 tycho scsi: [ID 107833 kern.warning] WARNING: /pci@1f,2000/SUNW,jfca@1/fp@0,0/ssd@w22000020371a7cd1,0 (ssd23):
Aug 6 06:34:07 tycho SCSI transport failed: reason 'tran_err': giving up
Aug 6 06:34:19 tycho offline
Aug 6 06:34:19 tycho scsi: [ID 107833 kern.warning] WARNING: /pci@1f,2000/SUNW,jfca@1/fp@0,0/ssd@w22000020371a7cd1,0 (ssd23):
Aug 6 06:34:19 tycho i/o to invalid geometry
...
Aug 6 06:52:14 tycho unix: [ID 836849 kern.notice]
Aug 6 06:52:14 tycho ^Mpanic[cpu1]/thread=2a1002efcc0:
Aug 6 06:52:14 tycho unix: [ID 879351 kern.notice] sync initiated
Aug 6 06:52:14 tycho unix: [ID 100000 kern.notice]
Aug 6 06:52:14 tycho unix: [ID 839527 kern.notice] sched:
Aug 6 06:52:14 tycho unix: [ID 294280 kern.notice] software trap 0x7f
Aug 6 06:52:14 tycho unix: [ID 101969 kern.notice] pid=0, pc=0xf00515d8, sp=0x2a1002eeec1, tstate=0x8800001400, context=0x0
Aug 6 06:52:14 tycho unix: [ID 743441 kern.notice] g1-g7: 1047374, 0, 183d400, 0, 1074000, 0, 2a1002efcc0
Aug 6 06:52:14 tycho unix: [ID 100000 kern.notice]
Aug 6 06:52:14 tycho genunix: [ID 723222 kern.notice] 00000000fff83d10 unix:sync_handler+138 (30000e78000, 3, 1, 1074c00, 1, 1814000)
Aug 6 06:52:14 tycho genunix: [ID 179002 kern.notice] %l0-3: 000000000184c3c0 000000000184c000 000000000000017f 0000000001843c00
Aug 6 06:52:14 tycho %l4-7: 0000000000000000 000000000183d400 000000000000000a 000000000180fc00
Aug 6 06:52:14 tycho genunix: [ID 723222 kern.notice] 00000000fff83de0 unix:vx_handler+80 (fff49a00, 181c658, 0, 0, 181c760, f003acd5)
Aug 6 06:52:14 tycho genunix: [ID 179002 kern.notice] %l0-3: 000000000181c760 0000000000000000 0000000000000001 0000000000000001
Aug 6 06:52:14 tycho %l4-7: 0000000001810400 00000000f0000000 0000000001000000 0000000001016ca0
Aug 6 06:52:14 tycho genunix: [ID 723222 kern.notice] 00000000fff83e90 unix:callback_handler+20 (fff49a00, fffd6280, 0, 0, 0, 0)
Aug 6 06:52:14 tycho genunix: [ID 179002 kern.notice] %l0-3: 0000000000000016 00000000fff83741 00000000f0000000 00000000fffe0000
Aug 6 06:52:14 tycho %l4-7: 00000000f0051584 0000000000000000 0000000000000000 00000000fffeefa0
Aug 6 06:52:14 tycho unix: [ID 100000 kern.notice]
Aug 6 06:52:14 tycho genunix: [ID 672855 kern.notice] syncing file systems...
Aug 6 06:52:14 tycho genunix: [ID 904073 kern.notice] done
Aug 6 06:52:14 tycho genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c0t0d0s1, offset 859111424, content: kernel
Aug 6 06:52:14 tycho genunix: [ID 409368 kern.notice] ^M100% done: 420767 pages dumped, compression ratio 9.62,
Aug 6 06:52:14 tycho genunix: [ID 851671 kern.notice] dump succeeded
Aug 6 06:56:16 tycho genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_118833-17 64-bit

Any ideas?

Cheers,
Michael

-- 
i.A. Michael Hase              Six Offene Systeme GmbH
michael@six.de                 Am Wallgraben 99
http://www.six.de              70565 Stuttgart
phone +49 711 99091 62         Germany
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
