SATA-RAID and ZFS: Infortrend/ADVUNI OXYGENRAID: top or flop?

SATA RAID devices have become quite popular as they carry a low price tag for the storage size offered. SATA disks are said to have a much lower input/output-operations-per-second capability compared to SCSI/SAS or FC devices. Public opinion is also that SATA devices fail much earlier than their SCSI/FC counterparts.

And then there is Sun's new filesystem and volume manager: ZFS. ZFS is also capable of creating RAID groups of its own - how does it perform with SATA devices?

I had the opportunity to have an Infortrend aka Advanced Unibyte OXYGENRAID device with 16 spindles of 750 GB each for three days to test in ZFS environments. Not all tests I wanted to make could be done in that short time. This is not meant as scientific work, because I did not have the time to take three or four results per configuration to estimate the error range of my results. So take the following numbers as a first impression.

So let's begin.

Test equipment

  • Sun X4200 M2 dual-Opteron server with 20 GB RAM, 10 GB of it used for the ZFS ARC cache
  • 2 Sun (QLogic) FC cards installed, 4 Gbit/s capable, connected through a SAN switch fabric to the RAID device
  • SAN switch fabric without other traffic (isolated)
  • Solaris 10 x86 with kernel patch 127112-11 installed (all relevant ZFS patches as of 2008/05/12).
  • Advanced Unibyte (OEM Infortrend) OXYGENRAID, Firmware 4F614, 16 x 750 GB SATA Seagate ST3750640AS Barracuda 7200.10

This was a real-world test, so the caches on the system (ARC) and on the RAID device were ON. I explicitly did not want a lab benchmark with all caches turned off.
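
Limiting the ARC to a fixed size is typically done via a tunable in /etc/system; here is a minimal sketch, assuming the 10 GB cap mentioned above was set this way (the exact value and method are my assumption, they were not documented in the test setup):

  * Cap the ZFS ARC at 10 GiB (10 * 1024^3 bytes = 0x280000000) - value assumed
  set zfs:zfs_arc_max = 0x280000000

The setting takes effect at the next reboot.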

Test configurations

The following configurations were tested (a zpool sketch for the ZFS-side setups follows the list):
  • 1x1: single disk, configured as JBOD/non-RAID, for reference
  • 1x2r1: RAID1 (mirror) with 2 disks
  • 1x2m: ZFS mirror with 2 disks as JBOD/NRAID
  • 1x3r5: RAID5 with 3 disks
  • 4x3r5: 4 RAID5 sets with 3 disks each, striped via ZFS
  • 1x6r5: RAID5 with 6 disks
  • 1x6z1: ZFS raidz1 with 6 disks (JBOD/NRAID)
  • 2x6r5: 2 RAID5 sets with 6 disks each, striped via ZFS
  • 1x12r5: RAID5 with 12 disks
  • 1x12z1: ZFS raidz1 with 12 disks (JBOD/NRAID)
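
As an illustration, a minimal sketch of how the ZFS-side setups might have been created - the pool name 'tank' and the MPxIO device names are hypothetical, not taken from the actual test:

  # 1x2m: ZFS mirror over two JBOD/NRAID LUNs (device names hypothetical)
  zpool create tank mirror c4t0d0 c4t1d0

  # 1x6z1: ZFS raidz1 over six JBOD/NRAID LUNs
  zpool create tank raidz1 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0

  # 4x3r5: plain ZFS stripe over four hardware RAID5 LUNs
  zpool create tank c4t0d0 c4t1d0 c4t2d0 c4t3d0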

All devices were accessed via Sun's scsi_vhci (MPxIO) driver, with a logical block size setting of 20 (2^20 bytes = 1 MB). For the 12-disk raidz1 test, 6 disks were configured per Infortrend controller, because one Infortrend controller can only handle up to 8 LUNs.
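
To check the MPxIO view of the LUNs and their paths, the stock Solaris 10 multipathing tools can be used; a brief sketch (output formats may vary by patch level):

  # list all multipathed logical units known to scsi_vhci
  mpathadm list lu

  # show how native device names map to their MPxIO names
  stmsboot -L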

A side note on the raidz1/JBOD configurations: I wanted to know whether raidz1 is usable with this RAID device when the disks are simply configured as JBOD. I did not want to test the usability of ZFS' RAID implementation in general. Keep in mind that Infortrend's NRAID/JBOD implementation is part of these numbers, too.

Test method

Sun's filebench tool was used to generate the results.

The following filebench personalities and parameters were used:

  • multistreamread
    • $filesize=10g
  • multistreamwrite
    • $filesize=10g
  • varmail
    • $filesize=10000
    • $nfiles=100000
    • $nthreads=60
  • oltp
    • $filesize=4g

All tests were run for 300 seconds ("run 300").
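
For illustration, a minimal sketch of such a filebench session for the varmail case - the target directory is my assumption, the parameters are the ones listed above:

  filebench> load varmail
  filebench> set $dir=/tank/test
  filebench> set $filesize=10000
  filebench> set $nfiles=100000
  filebench> set $nthreads=60
  filebench> run 300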

ZFS was used as filesystem in all scenarios.

multistreamread

[Graph: multistreamread - throughput (MB/s) per configuration]

The result is not very impressive: more spindles mean more I/O bandwidth. More interesting is the fact that ZFS' mirror performs better with sequential reads than the RAID1 mirror implemented in the RAID device. A 12-disk RAID5 configuration performs a little better than a 4-way ZFS stripe of 3-disk RAID5 sets, but the differences are not substantial. It seems that 85 MB/s is somewhat of a bottleneck for this RAID device. Note that ZFS' own RAID configurations show a slightly lower bandwidth.

The next graph compares the CPU time used:

[Graph: multistreamread - CPU time per configuration]

Again - nothing impressive. As ZFS' RAID implementation runs on the system's CPUs, it uses more CPU time than the configurations where the RAID logic is handled by the RAID device.

multistreamwrite

[Graph: multistreamwrite - throughput (MB/s) per configuration]

Write numbers are more interesting - it seems that our RAID device does writes very well, better than reads. Note that even with a RAID1 (mirror) configuration, write bandwidth is higher than in the single-disk configuration. ZFS' mirror does not enhance write bandwidth in any way. Infortrend's write-back cache seems to do a very good job.

[Graph: multistreamwrite - CPU time per configuration]

The CPU time numbers are similar to the read numbers; again the CPU overhead of ZFS' RAID implementations is visible.

varmail

The varmail scenario is a heavy random I/O scenario with many files in one directory and concurrent access by many threads. If you want to use a RAID system for more than single-user video streaming, read on.

[Graph: varmail - total operations per second per configuration]

The graph above shows the total number of operations per second measured. If you prefer to see the actual read/write operations, this is the graph for you:

[Graph: varmail - read/write operations per second per configuration]

Main results:

  1. ZFS' mirror does not enhance performance.
  2. More spindles mean more operations per time unit.
  3. ZFS' concatenation/stripe algorithm introduces a performance decrease.
  4. ZFS' RAID implementation in conjunction with Infortrend's JBOD setting is not recommendable for this scenario.

[Graph: varmail - latency (ms) per configuration]

More spindles effectively reduce latency. ZFS' RAID latency is higher than Infortrend's. 19 ms is the lower limit - five times the value Seagate states for one of these disks (Barracuda 7200.10).
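
Such latencies can be cross-checked from the host while a test is running; a small sketch using standard Solaris tools (the pool name 'tank' is my assumption):

  # active service time (asvc_t, in ms) and queue depths per LUN
  iostat -xnz 5

  # per-vdev operations and bandwidth of the pool, sampled every 5 seconds
  zpool iostat -v tank 5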

The amount of CPU time used by ZFS' RAID implementation is not a big hit (times in microseconds):

[Graph: varmail - CPU time per operation (µs) per configuration]

oltp

Now for a more complex scenario: "oltp" simulates a transactional database (like Oracle, Postgres, ...) with (very) small database updates, a common shared memory region, and a transaction log file. 230 threads run in parallel. First result: do not use this RAID for that kind of workload - if you have to, use many spindles - and immediately bury any idea of using ZFS' RAID implementation raidz1 with this Infortrend device.

[Graph: oltp - operations per second per configuration]

  1. raidz1 (ZFS raid) does not scale at all with the number of spindles.
  2. ZFS' concatenation/stripe algorithm performs very well with this kind of workload.
  3. These Seagate SATA disks seem to be able to handle 100 oltp ops per second. FC and SAS disks should handle more than 100.

The last graph shows CPU usage:

[Graph: oltp - CPU usage per configuration]

CPU time overhead of ZFS raidz1 is considerably high for this kind of workload.

Notes

In conjunction with this Infortrend device, ZFS raidz1 performs very badly for anything but sequential access patterns. As many sources on the internet state that ZFS raidz IS in fact very fast, the culprit is probably a rather dull JBOD implementation in this Infortrend device. Since it is sold as a hardware RAID device, nobody is expected to use it as a JBOD enclosure - that is not the job of this box.

For comparison, I would like to run this test on the same host system with other devices. The main problem: I don't have access to other FC storage devices at the moment. I'll have an old LSI Fibre Channel disk system with real FC disks next week, so I will be able to run filebench tests against it.

The tests were done in the datacenter of the University of Konstanz.


2 Comments

By Brett Morrow on July 2, 2008 5:00 PM

Thank you for posting your results. I have been working with these Infortrend RAIDs and ZFS for a while now. We got really low performance until we added:

* NOTE: Cache flushing is commonly done as part of the ZIL operations.
* While disabling cache flushing can, at times, make sense, disabling the
* ZIL does not.
* If you tune this parameter, please reference this URL in shell
* script or in an /etc/system comment.
* www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#FLUSH
set zfs:zfs_nocacheflush = 1

to /etc/system. I had talked to Infortrend about how to turn off the syncs on their end (to ignore them, as other RAIDs allow), but they say there is no way, so I have to live on the danger side with a good UPS to back it up :)


By Pascal Gienger on July 2, 2008 7:26 PM

We applied that setting as well (not only for this test); otherwise the numbers would have been abysmal. I also made a note about that setting on this blog.

The device is running fairly well; we use it as a big mail store for a campus mail system (University of Konstanz, Germany).
