Yenya's World

Tue, 19 Feb 2008

The Cost of Flexibility (and Cleanliness)

In the previously mentioned distributed computing project, I am trying to do something like the following code:

sub parse_file {
    my $fh = shift;
    while (my $parsed_data
            = nontrivial_get_data_from($fh)) {

The nontrivial_get_data_from($fh) code is indeed non-trivial (in the terms of lines of code, not necessarily in the terms of CPU time), while handle($parsed_data) is pretty straightforward. Now the problem is that I want to use this non-trivial code with different handle($parsed_data) routines (for example, printing out the $parsed_data for testing purposes). A natural way would be to implement a pure virtual class in which the $self->handle($parsed_data) routine would be called inside the parse_file() method, and which the programmer would subclass, providing different $self->handle() implementations.

I have found that using a subclassed method $self->handle() instead of putting the handling code directly into parse_file() costs about 14 % of time (the dirty inlined code took 35 seconds on the test data set, while the nice and clean subclassed one took 40 seconds).

So, my dear Perl gurus, how would you implement this? I need to call different code in the innermost loop of the program, and just factoring it out into the subroutine (or a virtual method) costs me about 14 % of time. Maybe some clever eval { } and precompiling different instances of parse_file()? In fact I don't really need the flexibility of objects: I need only a single implementation of handle($parsed_data) in a single program run, but I want to be able to use a different handle() code with the same parse_file() code base called from different programs.

Section: /computers (RSS feed) | Permanent link | 6 writebacks

Thu, 14 Feb 2008

AMD versus Intel

For a long time, we have been using AMD Opterons and Athlons 64 for our web and application servers. Everybody says that Intel has made a big progress in the last year or so, so I wondered whether an Intel architecture would be better than AMD one for our upcoming distributed computing project. Usually all benchmarks display things like raw memory throughput, encoding/decoding video (which can be done using the SIMD instructions), etc. However, how would the architectures perform on heavily branched code?

A part of our project is sorting big quantities of data. We have chunks consisting of 4-byte key, 4-byte value pairs, which have to be sorted according to the key. Since the data is being generated relatively slowly, I have decided to pre-sort it using a bucket sort into a set of 256 (for now) bucket files, then sort each bucket file separately, and finally concatenate the results. I have tried to measure how long the "sort all buckets" step will take on a single core:

Machinecc -Oscc -O6cc -O6 w/o memcpy()
Athlon64 FX-51
2.2 GHz/1 MB L2
Athlon64 x2 5600+
2.8 GHz/2x 1 MB L2
Pentium D
3.0 GHz/2 MB L2

The first two variants used memcpy() inside the quicksort routine for swapping the two entries (in order to be prepared for possible future variable data size), the last one used single 64-bit instruction instead. There are two interesting observations there:

Another interesting part was the cache size effect: four biggest buckets had 1088232, 1046624, 872792, and 776224 bytes, respectively. Sorting those four buckets took 2.26, 2.22, 0.63, and 0.21 seconds (on the above FX-51 machine). This means that somewhere around 800 KB of data size, the algorithm could no longer fit the data into the L2 cache, resulting in a big slowdown: these four buckets together took more time to sort than the remaining 254 buckets, even though they contain only 2.23 % of the total data size. I guess I will just use more (512?) buckets in the production version.

So, what is your experience with compiler optimization settings, and with speed of various CPUs and architectures?

Section: /computers (RSS feed) | Permanent link | 3 writebacks

Thu, 07 Feb 2008

QR Code on a Business Card

It seems more and more phones are able to read the "2-dimensional bar codes", such as QR Code. While these codes have a widespread use mainly in Japan, they will probably become popular here as well. I have been thinking about puting one to my business card (actually, Mirka was the one who got the idea first).

QR Code with this blog URL The problem is that there apparently is no standard way of formatting contact data in the QR code. All readers I have seen so far handle them as plain text, searching for embedded phone numbers and URLs the same way as they do when displaying any other plain text data (like SMS messages).

For example, the generator allows to create codes for URLs, phone numbers, or plain text. However, the code for an URL looks exactly the same as an equivalent code for plain text with that URL, and the code for a phone number is the same as the code of the "TEL:that_phone_number" plain text string.

I have not found how to encode the information "the phone number of Kasprzak, Jan is +420123456789" in a way which the scanner inside the phone can understand (i.e. by adding the contact to the phonebook, including my long-and-complicated surname). Does my lazyweb know how to do this?

Section: /computers (RSS feed) | Permanent link | 1 writebacks

Wed, 06 Feb 2008

Event Library

For a distributed computing project of mine I have been looking for a lightweight library for event-driven programming. For Perl, I have found EV (with EV::Watcher::Bind add-on). Now the question is what to use for C.

I have decided to not use glib, because it is too heavyweight (with its own object system, etc.). The next choice was libev, because this is what the perl EV module is based on. And another one was libevent, which is already included in Fedora. Both libev and libevent are still being actively developed.

The most interesting thing is probably that in the libev documentation the author says that the glorified kqueue API on *BSD is broken on most *BSDs (scroll down to the EVBACKEND_KQUEUE text).

I will probably use libev: it even provides an API-compatible wrapper to libevent-based applications (does anybody know whether it is also ABI-compatible?), and according to its author, it is faster than libevent, even when using the libevent API wrapper. For further reading on why an advanced kernel API is needed instead of select(2) or poll(2) see The C10k problem article.

Section: /computers (RSS feed) | Permanent link | 0 writebacks

Fri, 01 Feb 2008

Spell-Checking in VIM

Mirek has asked me how do I spell-check my (preferably TeX) documents. Well, I don't, but his question made me wonder how spell-checking in Vim is done.

screenshot of spell-checking

Deny has suggested this howto by Pavel Satrapa. It is basically correct, but the details are outdated. Moreover, I need something more: I usually write in three modes: English, Czech, and Czech in US-ASCII (without diacritics). So I need a Czech dictionary in ASCII.

Firstly, here is the Czech dictionary for Vim. Put it into ~/.vim/spell/ (or to a system-wide $VIMRUNTIME/spell/ - see :echo $VIMRUNTIME for the exact location).

For Czech ASCII spell-checking, I have downloaded the data (you need the file). Unpack it, convert cs_CZ.dic and cs_CZ.aff into ASCII (iconv -f iso-8859-2 -t us-ascii//TRANSLIT), rename them to (say) csa.dic and csa.aff, run the :mkspell csa command in vim, and move the resulting csa.utf-8.spl file to the same directory as the original cs.utf-8.spl.

Now set Vim to use these dictionaries using "set spelllang=cs,en,csa" and "set spell" commands in your ~/.vimrc, and you should be able to test it. Remember to disable the "csa" language when writing a document which should be strictly with diacritics. Further reading: Vim documentation: spell.

Section: /computers (RSS feed) | Permanent link | 2 writebacks


Yenya's World: Linux and beyond - Yenya's blog.


RSS feed

Jan "Yenya" Kasprzak

The main page of this blog



Blog roll:

alphabetically :-)