Yenya's World

Sat, 18 Dec 2010


As some of you know, I am not a huge fan of hardware RAID. In my opinion, software RAID can be faster in most cases. This is mainly because the operating system these days has a buffer cache several orders of magnitude larger, which means more room for sorting, reordering, and prioritizing requests, tailored to the individual disks. Moreover, filesystems like XFS can be taught about the RAID structure of the underlying block device, and can optimize their requests based on this knowledge.
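As a sketch of what "teaching" XFS about the RAID layout looks like, the stripe geometry can be passed to mkfs.xfs; the values below are illustrative assumptions for a 10-disk RAID-10 with 64 KiB chunks, not the exact parameters used in the tests:

```shell
# Illustrative only: tell mkfs.xfs about the RAID geometry of /dev/md0.
# su = stripe unit (the md chunk size), sw = number of data-bearing
# stripe units per full stripe (5 for a 10-disk RAID-10).
mkfs.xfs -d su=64k,sw=5 /dev/md0
```

On md devices, recent mkfs.xfs versions usually detect this geometry automatically; the `-d su=...,sw=...` options are the manual override.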

The advantages of the hardware approach lie elsewhere: a HW RAID box is usually well integrated with the hardware, so for example the disk slot numbers actually match the numbers reported by the storage controller software, and it can have a battery-backed cache. On the other hand, SW RAID is better integrated with the operating system, which can see the SMART health data of the disks, uses standard means of reporting drive failures, and so on. HW RAID controllers differ on a vendor-by-vendor basis in how they report, how they are configured, etc.
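For example, with SW RAID the OS can query a member disk's SMART data directly with smartctl from smartmontools (the device name here is illustrative):

```shell
# Read the SMART health verdict and the attribute table of one member disk.
smartctl -H -A /dev/sda

# Disks hidden behind some RAID HBAs need an explicit addressing mode,
# e.g. for an LSI MegaRAID controller:
#   smartctl -H -A -d megaraid,0 /dev/sda
```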

Yesterday I was able to verify the above claims on real iron: I have a box with an LSI Fusion MPT SAS controller and several 2TB WDC RE-4 drives. So I configured a HW RAID-10 volume using 10 disks, and then a SW RAID-10 volume using 10 disks. The initial measurements (taken after the RAID resync had finished) are here:

time mkfs.ext4 /dev/md0  # SW RAID
real	8m4.783s
user	0m9.255s
sys	2m30.107s

time mkfs.ext4 -F /dev/sdb # HW RAID
real	22m13.503s
user	0m9.763s
sys	2m51.371s

time sh -c 'dd if=/dev/zero bs=1M count=10000 of=/hwraid/bigfile; sync'
real	1m22.967s
user	0m0.005s
sys	0m11.898s

time sh -c 'dd if=/dev/zero bs=1M count=10000 of=/swraid/bigfile; sync'
real	0m36.771s
user	0m0.008s
sys	0m11.224s
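For reference, a SW RAID-10 array like the one tested can be assembled with mdadm roughly as follows; the device names and chunk size are illustrative, not the exact ones used above:

```shell
# Create a 10-disk RAID-10 md array (hypothetical member devices).
mdadm --create /dev/md0 --level=10 --raid-devices=10 \
      --chunk=64 /dev/sd[b-k]1

# Watch the initial resync progress before benchmarking:
cat /proc/mdstat
```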

I plan to do more tests (with XFS and BTRFS) later.

Section: /computers (RSS feed) | Permanent link | 8 writebacks

8 replies for this story:

Polish wrote: more test cases

Try using fio with an iometer-style test - it measures IOPS and latency.
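A hedged sketch of such a fio run (the job parameters and file path are illustrative; fio reports both IOPS and completion latency):

```shell
# Random-read job roughly in the spirit of an iometer access pattern.
fio --name=randread --filename=/swraid/testfile --size=4g \
    --rw=randread --bs=4k --ioengine=libaio --iodepth=32 \
    --direct=1 --runtime=60 --time_based --group_reporting
```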

Pavel Janík wrote:

I always remember your stance on software RAID whenever a large SW RAID array is resyncing and the machine is SLLLLOW like hell for a long time. :-)

Mirek Suchy wrote: Phoronix test suite

So it seems sometimes HW RAID is better and sometimes SW RAID. Can you test it with the Phoronix Test Suite, so we can see some graphs and more data, and better decide when SW or HW is the better choice?

tecik wrote: really HW raid?

I have experience with the LSI Fusion MPT SAS controller (SAS1068E). This pseudo-HW-RAID controller is the same as or worse than SW RAID, yes. Maybe I'm wrong, but it is missing cache memory and a real CPU, so the resync load falls on the host system (its CPU and memory). Can you try real HW RAID controllers like the PERC 5/6 (or the newest H700/H800)? From my experience there is a really huge difference, and the numbers will be really different.

Yenya wrote: Re: tecik

This one definitely has memory and a CPU, and it probably does the resync itself (at least after I created the RAID-10 volume on it, the HDD LEDs blinked for about a day). And no, I don't have dozens of HW RAID controllers to experiment with.

Yenya wrote: Re: Pavel Janík

The problem you experienced was a poor interaction between the Linux CFQ I/O scheduler and the RAID-5 resync. After switching to the deadline scheduler, both interactivity and the RAID rebuild time improved considerably. Anyway, I am not saying HW RAID has to be slower than SW RAID in all cases, just that it is possible for a SW RAID implementation to be faster on the same hardware than any HW RAID theoretically can be, unless the HW RAID controller has at least as much cache as the OS can dedicate to its own buffer cache.
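The scheduler switch mentioned above can be done per block device at runtime (the device name is illustrative):

```shell
# Show the available schedulers; the active one is in brackets.
cat /sys/block/sda/queue/scheduler

# Switch this disk to the deadline elevator at runtime:
echo deadline > /sys/block/sda/queue/scheduler

# Or set it system-wide at boot with the kernel parameter elevator=deadline
```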

Polish wrote: ssd will be challenge

Rotational disks are slow in terms of IOPS. The quality of the I/O subsystem should be more visible on SSDs (flash- or RAM-based), where even a little extra latency leads to a big performance degradation.

Jiri Horky wrote: Well..

Well, I would be very careful about making any final statements about which is better. I have been benchmarking the SAS1064ET controller on 20+ IBM dx360 M3 nodes [24 cores with HT, 48 GB RAM] against SW RAID [RAID 0]. What we have seen is that sequential and also random writes with iozone using 512K blocks were approx. 15-20% better with SW RAID [reads roughly the same], in both the 1-thread test and the 24-thread throughput test, BUT our real application (mostly random reads) has shown that SW RAID can be as much as 3x slower in such scenarios. As this was quite a surprise, I tried to play with the chunk size setting, but I was unable to match the HW RAID performance. So we decided to go with HW RAID. I have a report with all the details if anyone is interested...


Jan "Yenya" Kasprzak