Sat, 18 Dec 2010
HW RAID versus SW RAID
As some of you know, I am not a huge fan of hardware RAID. In my opinion, software RAID can be faster in most cases, mainly because the operating system these days has a buffer cache several orders of magnitude larger than the controller's. This means more space for sorting, reordering, and prioritizing requests, fine-tailored to the individual disks. Moreover, filesystems like XFS can be taught about the RAID structure of the underlying block device, and can optimize requests based on this knowledge.
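To illustrate telling XFS about the RAID geometry: mkfs.xfs accepts a stripe unit (su) and stripe width (sw). The sketch below is a hypothetical example, assuming a 10-disk RAID-10 with a 64 KiB chunk (the chunk size and device name are assumptions, not taken from the post):

```shell
# Hypothetical geometry: 10 drives in RAID-10 = 5 mirrored pairs,
# so 5 data-bearing stripes; stripe unit = chunk size.
CHUNK_KB=64
DATA_DISKS=5
STRIPE_WIDTH_KB=$((CHUNK_KB * DATA_DISKS))
echo "su=${CHUNK_KB}k sw=${DATA_DISKS} (stripe width ${STRIPE_WIDTH_KB} KiB)"

# The corresponding mkfs invocation would then be (do not run blindly,
# it destroys data on /dev/md0):
#   mkfs.xfs -d su=${CHUNK_KB}k,sw=${DATA_DISKS} /dev/md0
```

With a correct su/sw, XFS aligns allocations to full stripes, avoiding read-modify-write cycles on partial-stripe writes.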
The advantages of the hardware approach are elsewhere: a HW RAID box is usually well tied to the hardware, so for example the disk slot numbers actually match the numbers reported by the storage controller software, it can have a battery-backed cache, and so on. On the other hand, SW RAID is better tied to the operating system, which can see the SMART health data of the disks, uses the standard means of reporting drive failures, etc. HW RAID controllers differ on a vendor-by-vendor basis in reporting, configuration, and management.
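As an illustration of that OS-side visibility, with SW RAID the standard tools see both the member disks and the array state directly (a sketch; the device names are assumptions, and root privileges are needed):

```shell
# Query the SMART health of an individual member disk
# (requires the smartmontools package):
#   smartctl -H /dev/sda

# Inspect array state with the standard interfaces:
#   cat /proc/mdstat
#   mdadm --detail /dev/md0

# mdadm can also report drive failures by mail through its monitor mode:
#   mdadm --monitor --scan --mail=root@localhost --daemonise
```

Behind a HW RAID controller, the same information is typically reachable only through the vendor's own management tool, if at all.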
Yesterday I was able to verify the above claims on real iron: I have a box with an LSI Fusion MPT SAS controller and several 2TB WDC RE4 drives. So I configured a HW RAID-10 volume using 10 disks, and then a SW RAID-10 volume using 10 disks. The initial measurements (taken after the RAID resync had finished) are here:
time mkfs.ext4 /dev/md0     # SW RAID
real    8m4.783s
user    0m9.255s
sys     2m30.107s

time mkfs.ext4 -F /dev/sdb  # HW RAID
real    22m13.503s
user    0m9.763s
sys     2m51.371s

time sh -c 'dd if=/dev/zero bs=1M count=10000 of=/hwraid/bigfile; sync'
real    1m22.967s
user    0m0.005s
sys     0m11.898s

time sh -c 'dd if=/dev/zero bs=1M count=10000 of=/swraid/bigfile; sync'
real    0m36.771s
user    0m0.008s
sys     0m11.224s
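When repeating measurements like these, it helps to flush and drop the page cache between runs, so one run's cached data does not skew the next (a sketch; needs root, and the file path merely mirrors the test above):

```shell
# Write out dirty pages; harmless to run any time.
sync

# Drop the page cache, dentries, and inodes before each timed run
# (root only):
#   echo 3 > /proc/sys/vm/drop_caches

# Then repeat a timed run, e.g.:
#   time sh -c 'dd if=/dev/zero bs=1M count=10000 of=/swraid/bigfile; sync'
```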
I plan to do more tests (with XFS and BTRFS) later.
8 replies for this story:
Polish wrote: more test cases
Try fio with an iometer-style workload - it measures IOPS and latency.
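A fio run along these lines could measure the IOPS and latency the commenter asks about (a hedged sketch; the target path, queue depth, and runtime are assumptions, and fio must be installed):

```shell
# Random-read IOPS/latency test, bypassing the page cache with O_DIRECT.
# fio prints IOPS and completion-latency statistics per job.
#   fio --name=randread --ioengine=libaio --direct=1 \
#       --rw=randread --bs=4k --iodepth=32 \
#       --runtime=60 --time_based \
#       --size=4g --filename=/swraid/fio.testfile
```

Using direct I/O here matters: with the buffer cache in play, a random-read test would partly measure the OS cache rather than the RAID implementation.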
Pavel Janík wrote:
I always remember your stance on software RAID whenever ftp.linux.cz is resyncing its large SW RAID array and is SLLLLOW like hell for a long time. :-)
Mirek Suchy wrote: Phoronix test suite
So it seems that sometimes HW RAID is better and sometimes SW RAID. Can you test it with the Phoronix Test Suite: http://phoronix-test-suite.com/ so we can see some graphs and more data, and better decide when SW or HW is better?
tecik wrote: really HW RAID?
I have experience with the LSI Fusion MPT SAS controller (SAS1068E). This pseudo-HW-RAID controller is the same as or worse than SW RAID, yes. Maybe I'm wrong, but it lacks its own memory and CPU, so the resync load falls on the host system (its CPU and memory). Can you try real HW RAID controllers like the PERC 5/6 (or the newest H700/H800)? From my experience there is a really huge difference, and the numbers will be really different.
Yenya wrote: Re: tecik
This one definitely has memory and a CPU, and probably does the resync itself (at least after I created the RAID-10 volume on it, the HDD LEDs blinked for about a day). No, I don't have dozens of HW RAID controllers for experimenting.
Yenya wrote: Re: Pavel Janík
The problem you experienced was a poor interaction between the Linux CFQ I/O scheduler and the RAID-5 resync. After switching to the deadline I/O scheduler, both interactivity and the RAID rebuild time became much better. Anyway, I am not saying HW RAID has to be slower than SW RAID in all cases, just that it is possible to have an implementation of SW RAID which is faster on the same hardware than any HW RAID theoretically can be, unless the HW RAID controller cache is at least as large as what the OS can dedicate to its own buffer cache.
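The scheduler switch mentioned above can be done at runtime through sysfs (a sketch; the device name is an assumption and the write needs root):

```shell
# Show the available schedulers; the active one is shown in brackets:
#   cat /sys/block/sdb/queue/scheduler

# Switch the disk to the deadline elevator:
#   echo deadline > /sys/block/sdb/queue/scheduler
```

The change takes effect immediately and applies per block device, so each RAID member disk has to be switched individually.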
Polish wrote: ssd will be challenge
Rotational disks are slow in IOPS. The quality of the I/O subsystem should be more visible on SSDs (flash- or RAM-based), where even a little added delay should lead to a big performance degradation.
Jiri Horky wrote: Well..
Well, I would be very careful about making any final statement about which is better. I have been benchmarking the SAS1064ET controller on 20+ IBM dx360 M3 nodes [24 cores with HT, 48 GB RAM] against SW RAID [RAID-0]. What we saw is that sequential and also random writes using iozone with 512K blocks were approx. 15-20% better with SW RAID [reads were roughly the same], in both the 1-thread test and the 24-thread throughput test. BUT our real application (mostly random reads) has shown that SW RAID can be as much as 3x slower in such scenarios. As this was quite a surprise, I tried to play with the chunk size setting, but I was unable to match the HW RAID performance. So we decided to go with HW RAID. I have a report with all the details if anyone is interested.