Yenya's World

Tue, 11 Jul 2006

3ware Disk Latency

Odysseus seems to be pretty stable on the new hardware. However, one problem remains: with the new 3ware 9550SX disk controller, the drives show much higher latency than they did with the older controller (7508).

The system apparently has a bigger overall throughput, but the latency sucks. It is most visible on Qmail: with the old setup, Qmail was able to send about 2-4k individual mails per 5 minutes. With the new setup, this number is in the low hundreds of messages per 5 minutes. At this speed, Odysseus cannot even keep up with the incoming queue. After the new HW was installed, the mail queue was delayed by several days(!).

I have found this two-year-old message on LKML, where they try to solve the same problem with disk latency. It seems that the 3ware driver allows up to 254 requests in flight to a single SCSI target, while the kernel's block layer queue (nr_requests) is only 128 requests deep. This means that the controller pulls all the outstanding requests to itself, and the kernel's block request scheduler never gets an opportunity to reorder or prioritize anything.
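To make the comparison above concrete, here is a minimal Python sketch that reads both limits from sysfs and reports whether the device can swallow the whole block layer queue. It assumes a Linux system where the disk shows up as a SCSI device, so that /sys/block/<dev>/device/queue_depth and /sys/block/<dev>/queue/nr_requests both exist. The function names, the sysfs_root parameter (there only so the code can be tried against a fake directory tree) and the "scheduler_starved" rule of thumb are my own illustration, not something taken from the LKML thread.

```python
from pathlib import Path


def read_int(path):
    """Read a single integer from a sysfs-style attribute file."""
    return int(Path(path).read_text().strip())


def queue_report(dev, sysfs_root="/sys"):
    """Compare the per-device queue depth with the block layer queue size.

    dev is a block device name such as "sda" (an assumption; use whatever
    name the 3ware unit has on your system).
    """
    block = Path(sysfs_root) / "block" / dev
    device_depth = read_int(block / "device" / "queue_depth")
    nr_requests = read_int(block / "queue" / "nr_requests")
    return {
        "device_depth": device_depth,
        "nr_requests": nr_requests,
        # Rule of thumb only: if the device accepts at least as many
        # requests as the block layer can queue, nothing is left for
        # the I/O scheduler to reorder.
        "scheduler_starved": device_depth >= nr_requests,
    }


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, queue_report(name))
```

Run it with one or more device names as arguments. With the numbers quoted above (254 versus 128), the report would flag the scheduler as starved.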

So I have lowered the per-target number of requests to 4, and disabled NCQ on the most latency-sensitive drives (i.e. those which carry the /var volume), and the performance looks much better now. I think the main difference between the old HW and the new one is that the new controller has a much bigger cache, so it can keep more requests in flight. As a result, the kernel scheduler cannot prioritize the requests it considers important, and the overall latency goes up.
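For illustration, lowering the per-target limit can be sketched in Python as below. This assumes the limit is exposed as the standard SCSI sysfs attribute /sys/block/<dev>/device/queue_depth and that the script runs as root; the function name and the sysfs_root parameter are hypothetical helpers of mine. Disabling NCQ is done through the controller's own tools and is not shown here. The setting does not survive a reboot, so it would have to be reapplied from a boot script.

```python
from pathlib import Path


def set_queue_depth(dev, depth, sysfs_root="/sys"):
    """Set the per-device queue depth and return the previous value.

    dev is a block device name such as "sda" (an assumption); depth is
    the new maximum number of requests in flight to that device.
    """
    if depth < 1:
        raise ValueError("queue depth must be at least 1")
    attr = Path(sysfs_root) / "block" / dev / "device" / "queue_depth"
    old = int(attr.read_text().strip())
    # Writing the number to the attribute is the same as
    # "echo N > .../queue_depth" from a root shell.
    attr.write_text(f"{depth}\n")
    return old


if __name__ == "__main__":
    import sys

    # Example invocation (hypothetical script name): set_depth.py sda 4
    device, new_depth = sys.argv[1], int(sys.argv[2])
    previous = set_queue_depth(device, new_depth)
    print(f"{device}: queue_depth {previous} -> {new_depth}")
```

The previous value is returned so that the change can be undone easily if the lower depth hurts throughput too much.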

I hope I have solved the latency problem for now, but the FTP server load is usually lower during the summer holidays, so the problem may return once the load goes up again.


About:

Yenya's World: Linux and beyond - Yenya's blog.


Jan "Yenya" Kasprzak
