Yenya's World

Wed, 29 Nov 2006

The Opinion of Novell Employees

Over the last few weeks I have wondered how Novell (ex-SUSE) Linux engineers feel about the Novell-Microsoft partnership. Some of them say it directly, while others use carefully chosen words. GregKH's essay (the latter link) is really worth reading. It reminds me of the times of the communist regime in Czechoslovakia, when we were not allowed to state our opinions directly, so we had to use very careful words, from which the intended meaning could be guessed. Bad memories, oh well.

Section: /computers (RSS feed) | Permanent link | 2 writebacks

Thu, 23 Nov 2006

32-bit Galeon

I use an AMD64-based workstation, and from time to time I need to view data in proprietary formats, such as Flash. The Flash player is not available for Linux/AMD64, only for Linux/i386. The Flash plug-in is a shared library, so as a 32-bit object it can be used only with a 32-bit executable, not with my 64-bit Galeon. I used to run flashplayer remotely from my laptop, but now I have decided I need it on my workstation as well.

Fortunately, there is a 32-bit version of Galeon in Fedora Extras. So I installed it, and at the cost of keeping both 32-bit and 64-bit shared libraries in memory, it works, even with the Flash plug-in.

There was a problem, though: I use SCIM (with Anthy) for input of Japanese characters. It did not work with the 32-bit Galeon. I looked into the Red Hat bugzilla, and found bug #215583 there. In the discussion, Jens Petersen recommends using scim-bridge, which allows the input methods to be run over a socket, in a separate address space. So it works even when the input method runs on a different architecture (AMD64 in my case) than the main application (Galeon/i386 in my case). I just had to install both the 32-bit and 64-bit scim-bridge-gtk modules and the scim-bridge package, and set the GTK_IM_MODULE environment variable to scim-bridge instead of scim.

So now I have both the Japanese input method and Flash working in Galeon. Nevertheless, I hope there will be a full-featured and free Flash implementation soon.

Section: /computers (RSS feed) | Permanent link | 1 writebacks

Wed, 22 Nov 2006

CPAN Bugs

I am migrating the IS MU cluster to Fedora 6 - the front-end servers are already migrated, and now I am working on migrating the web cluster. This is a bigger change than it may seem, because we are also migrating the old 32-bit system to a fully 64-bit one. So I am recompiling everything from Apache to Perl. While testing the Perl modules I found the following regressions:

The Text::Tabs module does not work correctly on UTF-8 characters. Here is the test case - the output is misaligned:

perl -CS -Mutf8 -MText::Tabs -e \
	 'print expand("\taa\t.\n\t\x{010a}\x{010a}\t."), "\n"'

I did not dig into this further; replacing Tabs.pm with the one from the older version worked.
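For reference, this is what the test case above is expected to produce when tabs are expanded correctly: the dot should land in the same column on both lines. The sketch is in Python rather than Perl, and relies on str.expandtabs counting characters. The byte-counting variant (expand_by_bytes, a hypothetical helper) only illustrates one way such a misalignment could arise; it is a guess, not a diagnosis of Text::Tabs.

```python
def expand_by_bytes(line: str, tabsize: int = 8) -> str:
    """Expand tabs while (wrongly) measuring the width in UTF-8 bytes."""
    out = ""
    width = 0  # measured in bytes, which is the deliberate mistake here
    for ch in line:
        if ch == "\t":
            pad = tabsize - (width % tabsize)
            out += " " * pad
            width += pad
        else:
            out += ch
            width += len(ch.encode("utf-8"))
    return out


ascii_line = "\taa\t."
utf8_line = "\t\u010a\u010a\t."  # the same two lines as in the Perl test case

good_ascii = ascii_line.expandtabs(8)   # counts characters
good_utf8 = utf8_line.expandtabs(8)     # counts characters
bad_utf8 = expand_by_bytes(utf8_line)   # counts bytes

# Column of the dot in each variant
print(good_ascii.index("."), good_utf8.index("."), bad_utf8.index("."))
```

With character counting both dots end up in column 16; with byte counting the second line is two columns short, because each of the two accented letters takes two bytes in UTF-8.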

Another regression was in Crypt::Cracklib, which we use for generating and checking passwords. I have seen random crashes when using this module. The problem was that it did not #include the <crack.h> header file, so the compiler assumed that an undeclared function returned int, where in fact it returned a pointer. On AMD64 the integer is smaller than the pointer, so the pointer got truncated. I solved this by modifying Makefile.PL in the following way:

                $include = $_;
                if (-f "$include/packer.h") {
+                        if (-f "$include/crack.h") {
+                                $incfile .= ' -Dcrack=1';
+                        }
                        last;
                } elsif (-f "$include/crack.h") {
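The truncation itself can be simulated in a few lines. This is a Python model of what happens to a 64-bit pointer squeezed through a signed 32-bit int, not the actual C code of the module; the function name and both addresses are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def through_implicit_int(pointer: int) -> int:
    """Model a 64-bit pointer returned via a function assumed to return int."""
    low = pointer & MASK32      # only the low 32 bits fit into an int
    if low & 0x80000000:        # int is signed: a set top bit means negative
        low -= 1 << 32
    return low & MASK64         # sign-extended back to pointer width


low_ptr = 0x0000000000601040    # hypothetical address below 2 GB
high_ptr = 0x00007F3A92345678   # hypothetical address high in the address space

print(hex(through_implicit_int(low_ptr)))    # unchanged
print(hex(through_implicit_int(high_ptr)))   # upper half lost
```

In this model a pointer that fits into 31 bits survives intact, while a higher one comes back as a different address. That would be consistent with crashes that look random, since they would depend on where the allocation happened to land.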

In both cases I have contacted the authors, but it would be nice to have a bug-tracking system for CPAN modules. Is there anything like that available? If there isn't, I hope at least this blog entry helps somebody to solve the problems I had with those two modules - especially Crypt::Cracklib, which has not been updated for years.

Section: /computers (RSS feed) | Permanent link | 2 writebacks

Tue, 21 Nov 2006

Linux Weekend 2006

The Linux Weekend, a conference organized by CZLUG in Prague, has been officially announced.

I will have a presentation about (guess what?) spam, but there will be more interesting talks as well ;-).

Section: /computers (RSS feed) | Permanent link | 0 writebacks

Thu, 16 Nov 2006

Up-to-date Knowledge

It took me a while to figure out that prelink changed my executable files and libraries yesterday, and that RPM is now prelink-aware. Vašek added an interesting comment to which I want to reply in this blog post:

Yes, prelink. It should be written at the top of rpm man page (maybe also bold and red) that prelink modifies binaries and RPM knows about it. Most people find differences and then hunt ghosts - just like you. I only wonder how a guru like you do not know this. :-)

Well, the simple answer is that I am by no means a guru :-) The more complicated answer is that I learned RPM long ago (pretty much in depth, I even wrote a series of articles about it), when it did not have prelink support.

This is a general problem with IT knowledge: it is often not so hard to gain the knowledge, but it is much harder to keep up with ongoing changes. I think it is because there are manuals and tutorials for beginners, but almost nothing about what has changed in - for example - the last two years. I don't count Changelogs, because they are clogged with changes at the micro-architectural level, which are of no interest after a year or two.

It is hard to keep the knowledge up-to-date even when you actually still use the system in question: I build RPM packages occasionally, yet this was the first time I came across the prelink support in RPM. It is the same with Perl, for example. I use Perl almost daily, yet many features used in Perl Best Practices were new to me (such as using "our $var;" instead of "use vars qw($var);").

Also, in this particular case it probably would not have helped to have a big fat warning near the top of the rpm(1) manpage - I think I would not have consulted this manpage in this situation. Using strace(1) is more general :-). Keeping the knowledge up-to-date is pretty hard - for example, I skip most of the content of the Linux Journal, because it repeats at length what I already know, and should there be anything new for me, it is deeply buried in facts I already know.

How do you keep up with the current development in IT, my dear lazyweb?

Section: /personal (RSS feed) | Permanent link | 2 writebacks

Wed, 15 Nov 2006

Hunting Ghosts

Today I worked on synchronizing filesystems on some of our high-availability systems. We use a custom-made rsync-based setup for checking for differences between filesystems in a cluster. One of the hosts in an H-A pair had been down for a while because of faulty hardware, so I had to manually check whether the changes on the active system could be propagated to the backup as well. I synchronized the filesystems, and switched the load to the newly plugged-in host (because it is faster than the other one). Just to be sure, I re-ran the checks, and was surprised: some files were now different on the new host.

What was worse, the set of files which were different was a bit suspicious: bash, login, tcpdump, some other utils and libraries, including those which are run every time the system boots (such as heartbeat and its libraries). I ran "rpm -V", just to be sure the files were different from those in the RPM database, but it reported that all files were OK and matched the database. I took the clean RPMs from the FTP file repository, and the files in question were shorter in the package than on my filesystem. I thought: are current rootkits so smart that they modify the RPM database, and so stupid that "ls -l" still can tell the difference?

"rpm -qlv bash|grep /bin/bash" showed that the size in the RPM database differed from the size of the file itself, yet "rpm -V bash" said the package was perfectly OK. Strange. So I suspected the rpm program had been modified as well (even though it did not show up in the list of modified files). To prove this, I used strace. On a clean system its output was shorter, and the difference was that on a modified system rpm spawned some more threads/processes. "strace -f" then showed the guilty party - the rpm command executed prelink on each modified binary.

So I had been hunting ghosts all the time: the files in question had simply not been prelinked yet, or the prelinking info had been overwritten (or not overwritten, I don't know) by my synchronization scripts. After running "/etc/cron.daily/prelink" on a "modified" system both filesystems look the same. Problem solved.

For a long time I wondered how prelinking can be done without modifying the binary (and thus breaking the packaging system). The answer for rpm appears to be: the package manager needs to know about prelinking as well. I have to find some time to read Jakub's prelink paper (PDF). Back to serious work now.

Section: /computers (RSS feed) | Permanent link | 3 writebacks

Tue, 14 Nov 2006

Postfix Filtering

When I wrote about Postfix on the IS MU mailserver, I promised more details about my life with Postfix. So here we are:

I think one of the biggest pros of Postfix is that it has sane defaults. The only things you have to configure are those which are special for your setup. Now the cons: the biggest problem of Postfix is probably its filtering mechanism. When you want to do for example virus or spam filtering inside the SMTP session, there are three ways to do it:

The first one is the policy server - this is a server which listens on a socket (often run by the master process), reads requests from the SMTP server, and sends its verdicts back to it (a verdict can be DUNNO, DISCARD, REJECT, etc). It is possible to have a policy server for every part of the SMTP conversation - MAIL FROM, RCPT TO, DATA, end of DATA, etc. Policy servers are great for implementing greylisting, for example. However, there is one rather stupid property of the policy servers, which renders them unusable for virus or spam filtering: even the end-of-DATA policy server gets only the envelope information, and not the message itself. Why implement an end-of-DATA policy server at all, when it doesn't get any new information apart from what is already available in the DATA stage?

The second method is Milter (sendmail-compatible mail filtering interface). However, it does not have a native library for writing milters, and it requires sendmail to be installed and configured as well. Blehhh.

The last method of filtering is an SMTP proxy - you can write an SMTP proxy, and Postfix's SMTP server forwards the message to the proxy, which can then do any filtering/discarding/rejecting according to its policy. If the message is to be passed back to Postfix for further handling, the proxy should send it over SMTP as well (the recommended configuration is to run another SMTP daemon bound to another port on the loopback interface). This is poorly documented, because nobody knows whether Postfix's SMTP server passes all commands to the proxy (and thus which features of [E]SMTP the proxy should implement), or whether the proxy gets the message in some simplified form.

I am not aware of any other filtering mechanism which would not involve writing my own SMTP server and client, and which would allow me to filter messages (decide whether to reject or discard the message, or pass it in, maybe with an added header). As a performance bonus, it would be nice if the filtering mechanism allowed the message to be accessed directly as a file inside the queue, without copying data and sending it back.
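To make the policy-server protocol mentioned above more concrete, here is a minimal sketch of the request/verdict exchange, with a toy greylisting decision on top. It is written in Python and leaves out the socket handling. The function names, the 300-second delay and the sample addresses are assumptions of this sketch. The request is a block of name=value lines ended by an empty line, and the reply is a single action= line followed by an empty line.

```python
GREYLIST_DELAY = 300  # seconds; an assumed value for this sketch


def parse_request(text: str) -> dict:
    """Parse one policy request: name=value lines up to the first empty line."""
    attrs = {}
    for line in text.splitlines():
        if not line:
            break
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs


def decide(attrs: dict, first_seen: dict, now: float) -> str:
    """Toy greylisting: defer a triplet until it has been known long enough."""
    triplet = (
        attrs.get("client_address", ""),
        attrs.get("sender", ""),
        attrs.get("recipient", ""),
    )
    seen = first_seen.setdefault(triplet, now)
    if now - seen >= GREYLIST_DELAY:
        return "DUNNO"  # no objection, let the other restrictions decide
    return "DEFER_IF_PERMIT Greylisted, please try again later"


def format_reply(action: str) -> str:
    """A reply is one action= line followed by an empty line."""
    return f"action={action}\n\n"


request = (
    "request=smtpd_access_policy\n"
    "protocol_state=RCPT\n"
    "client_address=192.0.2.1\n"
    "sender=a@example.org\n"
    "recipient=b@example.net\n"
    "\n"
)
seen = {}
first = format_reply(decide(parse_request(request), seen, now=1000.0))
retry = format_reply(decide(parse_request(request), seen, now=1000.0 + GREYLIST_DELAY))
print(first, retry, sep="")
```

Note that everything the decision can look at is envelope-level data such as the client address, sender and recipient. There is no attribute carrying the message itself, which is exactly the limitation described above.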

Section: /computers (RSS feed) | Permanent link | 0 writebacks

Sat, 11 Nov 2006

Postgrey statistics

I wrote about an increase in spam volume in the previous month, and also about installing Postfix on the IS MU mailserver. Here are the results after 10 days:

The total volume of my spam last month was lower than expected - only 1.3GB in 94k messages, so the increase was 1.6 times in size and 1.3 times in number of messages. However, the total amount of spam going to other mailboxes in our domain has increased by 50% or more.

I also wrote that I have installed Postgrey on the IS MU mailserver, hoping that greylisting would deflect a majority of spam from IS MU. However, I did not find a decent statistics package for Postgrey, so I hacked up a simple script to generate statistics from the Postgrey log. It is too simple and possibly not entirely correct, because Postgrey does not provide much info in its log. Here are the stats from the first 10 days:

Greylisting stats from Nov  1 00:00:00
                  till Nov 10 15:38:03

Messages
total:                   2072453
accepted immediately:     179692 (  8.7%)
delayed:                   22473 (  1.1%)
blocked:                 1870288 ( 90.2%)

Greylisting delay (avg):       4399s delayed
                                550s all

SMTP servers:
                                   hosts   messages
No message accepted:              231940    1840367
No message graylisted:              1364     163021
All accepted (maybe delayed):       7750      21281 = 33243-11962
More blocked than accepted:         2112      34531 = 40432-5901

So it seems that 90% of messages are spam which can be blocked by greylisting. Nice. The next problem is to recognize spam in the remaining 10%.
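The percentages in the "Messages" part of the table can be re-derived from the raw counts. A small Python check follows; it is not the script that generated the stats, and the variable names are made up for this sketch.

```python
# Message counts copied from the table above
counts = {
    "accepted immediately": 179692,
    "delayed": 22473,
    "blocked": 1870288,
}

total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(f"{'total:':<24}{total:>8}")
for name, pct in shares.items():
    print(f"{name + ':':<24}{counts[name]:>8} ({pct:5.1f}%)")
```

The three categories add up exactly to the reported total of 2072453, and the rounded shares match the table: 8.7%, 1.1% and 90.2%.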

Section: /computers (RSS feed) | Permanent link | 0 writebacks

Fri, 10 Nov 2006

What is on your flashdisk?

From time to time I think about finding the ultimate Linux distribution suitable for running from a USB flash disk. I have a 512MB flash disk, which should be big enough for basic tasks such as fsck, ssh, rsync, and maybe mutt and links. Maybe something like Slax?

There is however a drawback: I sometimes need to use my USB key as a raw device: I simply dd(1) a bootable image of something into it. Usually it is a diskboot.img file from Fedora, which I then use for installing/upgrading a computer, or even with the rescue feature of Fedora Core installer as a rescue disk (it requires the rest of the distro being available over the Net, though). Yesterday I even put an image of a DOS floppy to it, and used it to flash a new BIOS to a mainboard.

So needing a raw device too often has prevented me from using a "permanent" live distribution on my USB key. The problem may be solved with partitioning the device and installing a boot loader (can a DOS floppy image be booted when put on a partition, e.g. by GRUB?), but I don't know whether it is worth the trouble (and I may need a raw device for something else later). I also don't know if all BIOSes support booting from a partitioned removable device (as opposed to the raw device as a whole).

What is on your USB key? If it is some kind of a live distribution, do you also use it as an install disk or for flashing BIOSes?

Section: /computers (RSS feed) | Permanent link | 2 writebacks

Thu, 09 Nov 2006

My Next Phone

OK, I think we probably have a winner. For some time I have been thinking about replacing my aging Nokia 6310i with something newer, more capable, and preferably Linux-based. I have been looking at some Motorolas, but it seems that this one is way better: It is truly open, including the whole kernel and hardware drivers (it only runs a proprietary GSM stack on a separate CPU), it has GPS, and it allows users to run X11 apps (unlike Motorola, where user apps are mostly limited to that slow Java). I hope it will be released soon and I hope I will find somebody to order it from.

Section: /computers (RSS feed) | Permanent link | 2 writebacks

Tue, 07 Nov 2006

TMOU 8

On Friday and Saturday, we took part in TMOU 8. It was an interesting experience, because it was my first TMOU with snow. In contrast to what some other teams think, my opinion is that snow is not much of a problem - for example heavy rain with temperatures around 0°C could be much worse.

Our performance was similar to the previous years - slow start, good middle part, and a bad end: the first stage definitely took more time than we wanted, and we were somewhere between 20th and 30th place (out of 210 teams). During the first stages of the outdoor part of the game we were even somewhere between 4th and 10th place. But then, at stage 10, we tried many different approaches, while the solution was pretty simple. It just required looking at the task from a different point of view. Stage 11 was similar - we knew the basic principle, but an inaccuracy on our side prevented us from solving the puzzle sooner. We then finished at stage 13 out of 15 on Saturday noon, in 22nd place according to unofficial statistics.

I think we have proved again that even though we may not always find the solution quickly, we are still able to use a different point of view and solve the puzzle after some time. However, this year's TMOU did not contain many interesting puzzles. Most of them only required a lot of work, while the basic principle could be guessed quickly. Anyway, here is the description of TMOU 8 as seen by the team coredump.

Section: /personal (RSS feed) | Permanent link | 0 writebacks

Energy drinks

Does anybody know how energy drinks are supposed to work? Last weekend we took part in TMOU 8. I have bought - probably for the first time in my life - an energy drink (its name was "Kamikaze" - lacking any other reasonable criteria for choosing an energy drink, I have bought the one with Kanji on it :-) in case I would get tired.

It was a bit cold during the night (I guess it was about -8 degrees Celsius), and at 4am I decided that this was the right time for my Kamikaze. The result of drinking this very cold thing was strange: I almost instantly felt very cold, and very sleepy. So I lay down and fell asleep, while my teammates were trying to solve stage 10.

Interestingly enough, when I awoke maybe half an hour or an hour later, I felt good, and I solved the puzzle at stage 10 in a very short time. Was it an effect of the energy drink? I don't think so - I remember TMOU in previous years, where a short sleep provided a big improvement in our performance as well, without any energy drink.

Do energy drinks actually work? What is your experience with them?

Section: /world (RSS feed) | Permanent link | 5 writebacks

Fri, 03 Nov 2006

Time versus direction

The topic of the previous Japanese lesson was directions and time. I have found out that Japanese uses the word/kanji "前" (read "mae") for both "in front of" and "ago". So, "20 years ago" is "二十年前", as in "the time we are talking about had 20 years in front of it to the present time".

It is interesting that the same concept is in Czech as well - we say "před 20 lety", where the word "před" can be used as "in front of" as well. On the other hand, in Russian they use "тому назад", which literally means "back from now". So they are talking about the past from the present point of view, while Czechs and Japanese talk about the past from the point of view of that past.

But all those languages have a similar view of the flow of the time - the past is "behind us, back from us", while the future is "in front of us". I wonder whether this concept has been developed independently, or whether this shows some kind of common roots of all those languages.

Section: /world (RSS feed) | Permanent link | 2 writebacks

Thu, 02 Nov 2006

Regexp of the day

Regular expressions are widely used in text manipulation, such as parsing e-mail addresses. I tried to use the Email::Address module for it (it allows the address to be modified, unlike a similar module - Mail::Address). My task was to qualify unqualified addresses, i.e. to change the address

AYANAMI Rei <rei>

to something like

AYANAMI Rei <rei@nerv.gov.jp>

However, Email::Address cannot parse unqualified addresses (unlike Mail::Address, as I have discovered later :-). Fortunately, Email::Address provides the regular expression for matching the address, and it can be modified. So the solution would be easy, wouldn't it? Here we go - the regexp in question is the following (line-wrapped for convenience):

$ perl -MEmail::Address -e 'print $Email::Address::angle_addr'
(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|
(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xis
m:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s
*\)\s*)|\s+)*<(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*
(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-
xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[
^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7
F()<>\[\]:;@\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,."\s]+)*)(?-
xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(
?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\
]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+
)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]
+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?
-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*))
)*\s*\)\s*)|\s+)*"(?-xism:(?-xism:[^\\"])|(?-xism:\\(?-xism:[^\x0
A\x0D])))+"(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+
))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-
xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))
*\s*\)\s*)|\s+)*))\@(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?
-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xi
sm:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\
x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7F(
)<>\[\]:;@\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,."\s]+)*)(?-xi
sm:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-
xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+
))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*
)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)
)|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-x
ism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*
\s*\)\s*)|\s+)*\[(?:\s*(?-xism:(?-xism:[^\[\]\\])|(?-xism:\\(?-xi
sm:[^\x0A\x0D]))))*\s*\](?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xis
m:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\
s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)
)*\s*\)\s*)))*\s*\)\s*)|\s+)*)))>(?-xism:(?-xism:\s*\((?:\s*(?-xi
sm:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:
\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A
\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)

My dear lazyweb, who will be the first to find out how to edit this regexp to parse unqualified addresses as well?

Update - Fri, 03 Nov 2006: Another solution

I have used Mail::Address, and I create a new object as a replacement whenever I have to replace an unqualified address with the qualified one.
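The parse-then-rebuild approach from the update can be sketched as follows. Python's email.utils stands in for Mail::Address here, which is an assumption of this sketch and not what the post used; the helper name qualify is made up as well.

```python
from email.utils import formataddr, parseaddr


def qualify(address: str, default_domain: str) -> str:
    """Append default_domain to an address that has no domain part."""
    name, addr = parseaddr(address)
    if addr and "@" not in addr:
        addr = f"{addr}@{default_domain}"
    # Build a fresh address instead of editing the original string
    return formataddr((name, addr))


print(qualify("AYANAMI Rei <rei>", "nerv.gov.jp"))
```

An address that is already qualified passes through unchanged, because the domain is only appended when no "@" is present.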

Section: /computers (RSS feed) | Permanent link | 2 writebacks

Wed, 01 Nov 2006

Spam, spam, spam

It seems that the total volume of spam has increased radically during the last month. My own spambox is bigger than before - the previous spambox had 770MB and 56k messages, the current one has over a gigabyte and 71k messages (we rotate spamboxes on the 10th of each month, so the current one will probably be about 50% bigger than it is now). So it would be an increase in spam by almost a factor of two in the last month. I am getting more than 3500 spam messages a day, one spam every 24 seconds! And this is just the spam recognized by my spam filter, I guess 20-50 messages get through each day.
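The back-of-the-envelope numbers above can be checked in a few lines of Python. The figures are taken from the paragraph, with "over a gigabyte" rounded down to 1000 MB, so the size factor is a lower bound; the variable names are made up for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60

spam_per_day = 3500
seconds_between = SECONDS_PER_DAY / spam_per_day

# Previous spambox vs. the current one projected to be 50% bigger at rotation
prev_mb, prev_msgs = 770, 56_000
cur_mb, cur_msgs = 1000, 71_000   # "over a gigabyte" taken as 1000 MB
growth = 1.5                      # "about 50% bigger than it is now"

size_factor = cur_mb * growth / prev_mb
count_factor = cur_msgs * growth / prev_msgs

print(f"one spam every {seconds_between:.1f} s")
print(f"size x{size_factor:.2f}, messages x{count_factor:.2f}")
```

At 3500 messages a day the interval comes out a little under 25 seconds, and both projected factors land around 1.9, in line with "almost a factor of two".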

The IS MU mailserver could not cope with this volume of spam, and the CPU load has caused huge delays in message delivery. So, as a temporary measure, I have switched off some spam filtering features (causing a big uproar amongst users[1]), and I have started to reimplement the server part of the mailserver.

I have replaced Qmail with Postfix (expect more about life with Postfix in the next blog post :-), added PostGrey, and rewritten the delivery mechanism so that our entire cluster is used for spam filtering (instead of the mailserver only). I have also added the ClamAV antivirus scanner.

So, the current IS MU mailserver should be an order of magnitude faster than before, and it will be even more spam-resistant because of additional antispam measures such as greylisting. It took me about three weeks to redesign and reimplement it, but I think we are prepared for the next wave of spam.

Footnote [1]: Of course, when somebody complains "Fix your f*cking spam filter, I receive five spams a day!", I can always reply "Lucky you, I get 3500+ spams daily." :-)

Section: /computers (RSS feed) | Permanent link | 11 writebacks

About:

Yenya's World: Linux and beyond - Yenya's blog.

Jan "Yenya" Kasprzak