Yenya's World

Tue, 28 Feb 2006

The NTP server and pool

After several years I have reviewed the configuration of our time server. I contacted an NTP admin at CESNET (our ISP), and he pointed me to several stratum 1 NTP servers (most of them GPS-based, but one of them is based on a cesium atomic clock). So we now have a fairly stable stratum 2 NTP server, synchronized with about six stratum 1 servers, some of them outside the Czech Republic.
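
For illustration, the core of such a setup is just a handful of lines in ntp.conf. A minimal sketch, with placeholder hostnames rather than the servers we actually use:

# /etc/ntp.conf - a stratum 2 server synchronized against stratum 1 servers
driftfile /var/lib/ntp/drift

# Upstream stratum 1 servers (placeholder names); "iburst" speeds up
# the initial synchronization
server stratum1-a.example.cz iburst
server stratum1-b.example.cz iburst
server stratum1-gps.example.org iburst

# Let clients query the time, but not modify the server configuration
restrict default nomodify notrap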

I have also written documentation for our users, and set up the IS MU servers to synchronize against our NTP server.

I tried to enable the X.509-based signatures of NTP data, but did not find any meaningful documentation - the "official NTP documentation" is rather confusing to me - even the NTP FAQ was more helpful. The best documentation about NTP servers is probably Sun's "Basic NTP Administration and Architecture" (the link is to a PDF document). However, this file documents an older version of the NTP server, without advanced features like asymmetric cryptography.

I have added our NTP server to the public NTP server pool (which has the pleasant side effect that we now get free remote monitoring of our NTP server's quality).

Section: /computers (RSS feed) | Permanent link | 3 writebacks

3 replies for this story:

adelton wrote:

Hmmm, is a cron job better than running ntpd?

Yenya wrote: Cron job?

What cron job? ntpdate? Running ntpd costs another process in memory. However, ntpd can adjust the time by incremental skewing, which (unlike ntpdate) does not confuse time-sensitive apps (such as Oracle backup, as our DBA said), because with ntpd the system time never skips back.

adelton wrote: Re: Cron job?

Yes, ntpd instead of a cron job starting ntpdate. As for ntpd never setting the time back, that only holds if you use the -x option.

Mon, 27 Feb 2006

Scripts from the past

On Friday I was trying to modify the way the IS MU cluster synchronizes its time - it turned out that the new servers in the IS MU cluster skew the clock - it runs about 6.5s a day faster than the reference time. (The servers are based on Tyan Tomcat boards, which are - with the exception of this problem - excellent boards.)

I have looked into the script we use to glue rdate(1), hwclock(8), and system logging together. There were the following lines written near the top of the script:

# Simple front_end to rdate (1)
# Nastaveni casu po siti. ("Setting the time over the network.")
# Jan Kasprzak 18.10. 1994
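
The body of such a script needs only a few lines. A sketch of the idea (an illustrative reconstruction, not the original 1994 script; the hostname is a placeholder):

#!/bin/sh
# Set the system time over the network, save it to the hardware
# clock, and log the result. Illustrative reconstruction only.
SERVER=timeserver.example.org

if rdate -s "$SERVER"; then
	hwclock --systohc
	logger -t settime "time synchronized against $SERVER"
else
	logger -t settime "rdate against $SERVER failed"
fi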

It seems that even in 1994 there were rdate(1) and hwclock(8), and I used them both on Linux systems. How old is the oldest software you wrote which is still running on your computers?

Section: /computers (RSS feed) | Permanent link | 1 writeback

1 reply for this story:

Vašek Stodůlka wrote: My oldest

Wow, my oldest untouched code is totem.nazelenelouce.net from 1999. And it seems as funny to me now as it did in 1999. :)

After the Sendvič 2006

Well, 8th place at this year's Sendvič is worse than expected. However, I think we did well - there were tasks I think we would not have been able to solve within the two-hour limit anyway. I can recommend looking at the crossword puzzle (task 303). Most of it can be solved even by English speakers (only C, D, E, L, M, and N are related to the Czech language). The funniest one is probably E (at least we had a good laugh when we figured out what it was).

Section: /personal (RSS feed) | Permanent link | 0 writebacks

Thu, 23 Feb 2006

Logical puzzle

My brother sent me a link to an interesting logical puzzle (in Flash, sorry). If you don't speak Japanese, here are the rules:

Good luck - and try to measure how much time you needed to work out the solution, and how many steps the solution took.

Section: /personal (RSS feed) | Permanent link | 1 writebacks

1 replies for this story:

Stepan wrote: how many steps

Good brain-teaser! I needed 17 steps to solve it...

MusicPD over the Net

I use MusicPD to maintain my audio file collection. It is nice, but it can only play music to the locally connected speakers. So it is impossible to listen to the music played by mpd from the wireless-connected laptop in the next room.

One possible solution would be to share the music collection over NFS or SMB. Another would be to use a remote esd sound daemon and the libao output of mpd.

Nevertheless, I have found something even better: the latest development version of mpd supports, besides the ALSA, OSS, and libao outputs, also an Icecast output. So I have created an Icecast server for our home LAN, and voilà - instant music on the laptop, with no local storage required. I have used the mpd and Icecast howto.
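
On the mpd side this boils down to a single audio_output block in mpd.conf. A minimal sketch (host, mount point, and password are placeholders, and the exact option names may differ in the development version):

audio_output {
	type     "shout"
	name     "Home LAN stream"
	host     "localhost"
	port     "8000"
	mount    "/mpd.ogg"
	password "hackme"
	quality  "5.0"
	format   "44100:16:2"
}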

It works well, but I still have to read the documentation in order to figure out proper authentication and administration of the Icecast server, and to find out whether Icecast also supports FLAC streams (my current stream uses Ogg/Vorbis).

Section: /computers (RSS feed) | Permanent link | 1 writeback

1 reply for this story:

oozy wrote: mpd

This is so great! All my dreams about music listening and music management have come true ;). Thanks for the links!

Tue, 21 Feb 2006

Per-list spam filters

In January I received more than 40,000 spam messages. Most of them were dropped by my spam filter, but the number of messages which got through to my inbox is still high. I have found that my spam filter is not working efficiently, especially on messages sent through mailing lists or aliases. I think the range of message formats, languages, encodings and so on is too broad for my spam filters.

For example, in the CRM114 Mailfilter HOWTO the author writes that when comparing the spam and non-spam databases using the cssdiff utility, the databases are quite different:

Note that there's a big difference between the two files; in this case there are about 10 times as many differences between the two files as there are similarities. That's pretty much typical.

Well, I have tried to run cssdiff on my CRM114 databases, and I got about the same number of differences as similarities, not ten times more differences than similarities, as the CRM114 author had. This means that my spam is too similar to my non-spam. Or maybe some spam going through a particular mail alias is too similar to the legitimate mail from some other alias or mailing list.

I am subscribed to many mailing lists, and I am a member of some well-known mail aliases at the University. I think some of these addresses receive mail with unique features. For example, the linux-kernel mailing list receives almost no legitimate mail in HTML or in Czech, but occasionally somebody has a signature in Spanish or Portuguese. On the other hand, the mail alias info(a)fi.muni.cz gets many messages in Czech and Slovak, often HTML-encoded, containing "suspicious" words like "account number" (for an admission fee), etc. But no Spanish, and almost no English messages.

It would probably make sense to have a special spam classifier database for each mailing list or alias I am a member of. The drawback of this approach is that each of these databases would have to be taught the new types of spam separately. Or maybe the spam corpus for each of those addresses could be shared, and only the non-spam corpus kept separate for each address. This would probably also require some special handling, such as removing the mailing list headers/footers before classification and before learning. On the positive side, a per-mailing-list spam corpus could be used for filtering the mail before it enters the listserver queue (for lists which I administer).
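
A purely hypothetical procmail sketch of such routing - "classify" stands for whatever CRM114 wrapper would be used, the header pattern is just an example, and I assume the wrapper adds an X-CRM114-Status header, as the stock CRM114 mailfilter does:

# Mail from linux-kernel goes through its own database ...
:0 fw
* ^List-Id:.*linux-kernel\.vger\.kernel\.org
| classify --db $HOME/.crm114/linux-kernel.css

# ... and anything not classified above goes through the default one.
# ("classify" is a hypothetical wrapper, not a real CRM114 command.)
:0 fw
* ! ^X-CRM114-Status:
| classify --db $HOME/.crm114/default.css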

What do you think about it? Does anybody use a separate spam filter database for each e-mail source?

Section: /computers (RSS feed) | Permanent link | 4 writebacks

4 replies for this story:

Milan Zamazal wrote:

Personally, I wouldn't bother with separate databases. Spam is spam, regardless of its source, and the classifier should recognize it. I'd suggest rebuilding your databases, possibly using different classification and/or learning methods. Note that CRM114 implements several classification methods, they are often improved, and the recent (January) release contains a new mailtrainer script. Just make double sure that you make no mistake when rebuilding the database (learning spam as non-spam and vice versa); that may confuse the classifier a lot.

Yenya wrote: Spam is spam, but ...

... but non-spam differs between various sources (and is often more consistent within one source, such as a mailing list). So I think it may help - the classifier would then have a bigger "distance" between (general) spam and (specialized) non-spam. I'll have a look at the new CRM114. I rebuilt the databases ~2 months ago (finding a few errors inside my spam corpus, of course).

Milan Zamazal wrote:

I guess the idea may work well for better detection of ordinary messages, but it may not help with the special cases. Hard to say without more knowledge about the misclassified mails. BTW, if you'd like to employ your sed skills, you may analyse the misclassifications by running crm with the `-T' option :-). In any case, I'm interested in the results; please write about them if you have any.

Yenya wrote: Special cases

Well, it seems that some sources generate almost exclusively "special" messages. And I feel bad teaching CRM114 that this is not spam, because I know that if it came through another alias/mailing list, it would definitely be spam. These sources are simply different from my other mail: consider prihlaska @ our domain - there are questions about admissions, account numbers, multipart/alternative mails (which are banned from almost every serious mailing list), etc.

Wed, 15 Feb 2006

Why Qmail should not be used

In the local Linux mailing list somebody asked which software he should choose for a mail server. I wrote a lengthy followup on why Qmail should not be used as an MTA for new installations. I think it is a good idea to rephrase it here. This comes from a person who maintains several Qmail installations, including the linux.cz listserver. It is not the result of short-term anger with Qmail, but rather a thought-out opinion, formed over years of Qmail usage.

First, some advantages which Qmail has:

And now the shortcomings of Qmail:

The last problem is the worst one, because it is a design problem - it is a shortcoming of Qmail's modularity. It cannot be solved without rewriting qmail-smtpd (but that would require access to the database of users/aliases/etc., so qmail-smtpd would have to leave its chroot jail).

So my recommendation is: do not use Qmail for new installations. Choose Postfix or something like that instead.
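
For comparison, Postfix rejects unknown local recipients already during the SMTP dialogue. The relevant knobs in main.cf look roughly like this (a sketch of the stock settings, not a complete configuration):

# Reject mail for nonexistent local users at RCPT TO time
local_recipient_maps = unix:passwd.byname $alias_maps

# And do not accept mail we are not responsible for
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination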

Section: /computers (RSS feed) | Permanent link | 4 writebacks

4 replies for this story:

Bjarne Hansen wrote: Mr

Well, I don't know how easy Qmail is to configure, but it sounds pretty bad - it sounds to me like exim is a much better option - it rejects at the SMTP port, and it can be configured to stop receiving at some level of load. After doing some config changes and studying a little bit, I think I like it. Actually I was looking for some info on whether Qmail would do the same - but it sounds like no. So I must agree: today, with all the spammers and stuff, you really need to be able to put limits on different aspects.

Yenya wrote:

Yes, I think exim or postfix are better choices in the current state of the Net.

Mysidia wrote: Qmail is open source

Public domain - see here: http://cr.yp.to/qmail/dist.html. So if you use Qmail, you have the possibility of forking the development, fixing problems you find, and making your own software release.

Yenya wrote: Re: Qmail is open source

Well, it has been public domain only for the last year or so. Before that, its license was not open source compatible. Try looking into the source code (hardcoded constants everywhere, the standard library almost never used, etc.). I would rather spend my life doing something more useful than resurrecting a project which has been dead for several years and has been surpassed by other projects.

Tue, 14 Feb 2006

Apache2 CPU time

Yesterday we had a huge load peak at IS MU (seminar registration or something like that). It turned out that we had a few suboptimal settings in our cluster configuration. I have spent the last night benchmarking and tweaking Apache.

[Graph: Apache2 CPU time]

Probably the most interesting change, visible even on an MRTG graph, is that I have changed the SSL session cache from DBM to shared memory, which on our dual-core systems seems to help a bit. In the above graph the change was made at 13:00 (the last hour on the right side). It is quite clear that both the system time (orange area) and the user time (red line) are lower in the last hour. I made the following change in httpd.conf:

-SSLSessionCache dbm:log-mu/ssl_gcache_data
-SSLMutex file:log-mu/ssl_mutex
+SSLSessionCache shm:log-mu/ssl_gcache_data(512000)
+SSLMutex sysvsem

Section: /computers (RSS feed) | Permanent link | 0 writebacks

Fri, 10 Feb 2006

Sendvič 2006

Like in the last two years, we will take part in Sendvič 2006 - an on-line puzzle-solving competition. Our team, coredump, did not do badly in the previous years.

Sendvič differs from TMOU and similar competitions in that it is strictly on-line, and it is very time-limited. It is similar in style to the qualification for TMOU 7. This year the organizers of the game asked whether we would object if next year's game were in English - so maybe next year we will have even more teams to compete with.

After two third places, I hope we have a good chance to win this year :-).

Section: /personal (RSS feed) | Permanent link | 3 writebacks

3 replies for this story:

Spes wrote: Heh ...

... you think that you have a good change before every game :-))

Spes wrote:

chance not change, srry

Yenya wrote: Good chance

Yes, I do :-)

Thu, 09 Feb 2006

Van Jacobson's network channels

In DaveM's blog there is an interesting followup to Van Jacobson's talk (slides in PDF) at linux.conf.au.

Van Jacobson suggests that the kernel networking stack should be reworked around channels (one-way lock-free queues) of packets, with the parsing and handling of network packets done as near to the end of the "food supply chain" as possible (i.e., in the user-space apps, if possible). He also gives numbers showing the better scalability of this approach. Scalability is important especially on SMP, NUMA, and multi-core systems, which are becoming more and more common these days.

While this approach is definitely interesting, Van Jacobson leaves out an important problem - how can security be enforced? When any app is allowed to send arbitrary packets (because it does user-space TCP), how can it be kept from interfering with other apps, disrupting other TCP connections, and so on? DaveM's suggestion is to implement "channel-based" TCP in the kernel, with a tiny packet classifier which maps packets from the device's input channel to the channel of the particular socket. The TCP handling would then be done in the context of that particular process (yet in kernel space).

Van Jacobson's measurements suggest that this way, TCP processing on an SMP box can be six times faster (and essentially lock-free) than in the current kernel (while he also acknowledges that the Linux net stack is already the fastest and most complete networking stack of any OS). There is also a followup in last week's LWN "Kernel" section.
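
The channel itself is a simple data structure. A minimal user-space sketch in C of such a one-way lock-free queue (single producer, single consumer, written with C11 atomics - an illustration of the idea, not code from the talk):

#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

#define RING_SIZE 256            /* must be a power of two */

struct channel {
	_Atomic size_t head;     /* written by the producer only */
	_Atomic size_t tail;     /* written by the consumer only */
	void *slot[RING_SIZE];
};

/* Producer side: returns 0 when the channel is full. */
static int channel_put(struct channel *ch, void *pkt)
{
	size_t head = atomic_load_explicit(&ch->head, memory_order_relaxed);
	size_t tail = atomic_load_explicit(&ch->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return 0;        /* full */
	ch->slot[head & (RING_SIZE - 1)] = pkt;
	atomic_store_explicit(&ch->head, head + 1, memory_order_release);
	return 1;
}

/* Consumer side: returns NULL when the channel is empty. */
static void *channel_get(struct channel *ch)
{
	size_t tail = atomic_load_explicit(&ch->tail, memory_order_relaxed);
	size_t head = atomic_load_explicit(&ch->head, memory_order_acquire);
	void *pkt;

	if (head == tail)
		return NULL;     /* empty */
	pkt = ch->slot[tail & (RING_SIZE - 1)];
	atomic_store_explicit(&ch->tail, tail + 1, memory_order_release);
	return pkt;
}

int main(void)
{
	static struct channel ch;
	int x = 42;

	channel_put(&ch, &x);
	printf("got %d\n", *(int *)channel_get(&ch));
	return 0;
}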

Section: /computers (RSS feed) | Permanent link | 0 writebacks

Mod_perl2 and autoflush

Another "interesting" behaviour of mod_perl2: some of our web applications set the Perl autoflush variable ($| = 1) in order to send at least partial output to client, when generating the full output can be lengthy. It seems that mod_perl2 does not like autoflush on the output filehandle.

When autoflush is enabled, Apache sends only the first HTTP header to the client, and the rest of the headers, together with the page body, are sent as the HTTP response body. Moreover, an "internal server error" message is appended afterwards.

After numerous tests I have figured out that the problem is indeed in the autoflush feature, and that disabling autoflush, or at least setting it only after all the headers have been sent, fixes the problem. I have no idea why autoflush during header output is broken in mod_perl2.
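
A minimal sketch of the workaround, as a plain CGI-style script (the output is illustrative):

#!/usr/bin/perl
# Keep autoflush off while the headers go out, and enable it
# only for the (potentially slow) body.
use strict;
use warnings;

print "Content-Type: text/html\r\n\r\n";  # headers sent with $| still 0

$| = 1;                                   # autoflush only from now on
for my $chunk (1 .. 10) {
	print "chunk $chunk<br>\n";       # reaches the client immediately
	sleep 1;                          # stands for the slow generation
}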

Section: /computers (RSS feed) | Permanent link | 0 writebacks

Wed, 08 Feb 2006

Learning sed(1)

A few days ago I got some free time, which I decided to spend reading the sed(1) documentation. I had used sed for tasks like s/Bill/Linus/g before, and I vaguely knew it could do something more.

Well, it turned out that the sed language is not very complicated, yet it is more powerful than I expected. It can group commands into blocks, and it can do both conditional and unconditional branches. With these constructs, the language is powerful enough to emulate a Turing machine.

Just for fun I wrote a simple sed script which works as a context grep(1) - i.e., it prints some context as well as the matched line (in this case, two lines above and two lines below). It has some bugs (it does not work as expected when the searched string is on two consecutive lines, for example).

#!/bin/sed -nf
# Context grep - sample script written by Jan "Yenya" Kasprzak
# The sample searched string is "gopher" here

# Keep the last three lines in the hold buffer
H;x;s/.*\n\([^\n]*\n[^\n]*\n[^\n]*\)$/\1/;x

# If found, print the buffer, the next two lines, and an empty line
/gopher/{
x;p;x;n;N;p
a\

}

It can be tested with the following commands:

$ chmod +x ./agrep.sed
$ ./agrep.sed /etc/passwd

For things like this, Perl or AWK would probably be a better tool, but nevertheless, using sed for something beyond the classical s/Bill/Linus/g task can be a nice mental exercise. Hmm, sed golf, anyone?

Section: /computers (RSS feed) | Permanent link | 0 writebacks

Tue, 07 Feb 2006

Computer-generated playlists

I own a decent collection of music (mostly CDs, but also vinyl and a few tapes), and I wonder whether there is a way to make the computer select what to play in an intelligent (= better than random) manner. I don't have time to "just listen" to one of my favourite albums these days; I just want some background music at work or in the car.

There are "social-network" sites such as last.fm or iRATE radio, but I think I want something different. Last.fm just suggest what other music the user may like, but I have to get it myself. iRATE radio is limited to a freely-distributable music.

Another approach is song-rating systems like PyTone (stupid name, isn't it? guess which language it is written in :-), LongPlayer, or IMMS. These can select "what to play next", but unlike the "social network" sites, they cannot suggest "what else I might like". Moreover, none of these tools seems to work with the player I use, mpd.

I think what I want is to add a rating system to the mpd back-end, and maybe minimal rating support to the network protocol and clients (just a "now play by ratings from all available songs" command, and maybe "display/change the rating of this song"). Mpd is probably flexible enough to make this work even as a separate client/daemon. The in-server solution would allow some neat things, though - such as per-user ratings, and selecting what to play based on the ratings of all currently logged-in users.

The sad thing is that I don't have time to implement this. Anyone interested? I have created a topic for a bachelor's thesis (authenticated page for MU students only, sorry) about this, so maybe some student will take it as an interesting challenge.

Section: /computers (RSS feed) | Permanent link | 0 writebacks

Sat, 04 Feb 2006

SCP only

Everybody uses SSH for secure remote logins and file transfers. The problem is that SSH sometimes allows too much. It would be nice to have "scp-only" accounts, which could transfer files, but not run remote processes, forward ports, and so on.

I have recently found that there is a nice and clean solution to this problem: scponly is a program which can be used as the login shell for "file-transfer-only" accounts. It is also compatible with many other means of transferring files over SSH, such as rsync+ssh, Subversion over SSH, and so on.
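
Setting it up amounts to registering the shell and creating the account. A sketch, assuming scponly was installed as /usr/bin/scponly:

# Register scponly as a valid login shell (path is an assumption)
echo /usr/bin/scponly >> /etc/shells

# Create a file-transfer-only account using it
useradd -m -s /usr/bin/scponly transfer
passwd transfer

# scp and rsync-over-ssh now work for this account, while an
# interactive "ssh transfer@server" session gets refused.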

Section: /computers (RSS feed) | Permanent link | 0 writebacks

Fri, 03 Feb 2006

Ekiga

From time to time I try to play with IP telephony (I have mentioned it before). Now I have found that the GnomeMeeting team has made big progress since then: they renamed the project to Ekiga, and they are heading towards the 2.0 release.

I have installed Ekiga on my computers, and made a pair of accounts in their SIP directory at ekiga.net. Yes, Ekiga now also supports SIP in addition to H.323, which makes it an ultimate client for heterogeneous environments. The ekiga.net service even has some test numbers for testing things like echo, remote latency, and so on.

From my first tests it seems that Ekiga has at least as good sound level detection as Skype. The client also supports silence detection and echo cancellation. However, the latter was too weak for an environment with a huge echo, such as my laptop, which has the speakers located within a few centimeters of the microphone. With headphones, the calls were much more pleasant.

I have also tried calls from behind a 1:N NAT (the laptop in my home network), with partial success. I used the STUN server at ekiga.net, and surprisingly enough, I was able to call the internal softphone from the outside network. The other direction, however, had problems: the voice from the internal host to the outside network suffered from too much jitter and packet loss, and the voice in the opposite direction did not work at all. I have to do more tests; maybe I have something set up incorrectly.

So it seems IP telephony is getting pretty usable. Now I have to obtain a public phone number (either a fully public one from Netbox - my home ISP - or a "partly-public" one from CESNET, which would be suitable at least for calls to/from the University and other academic institutions).

Section: /computers (RSS feed) | Permanent link | 0 writebacks

Thu, 02 Feb 2006

Moving to Apache 2

In the last few days we have finally moved IS MU to Apache 2.0. Well, in the meantime the 2.2 branch has been released, but we will try 2.0 first. I wrote about some problems with mod_perl2 before; today I will add a few more:

In spite of the above problems, we are now running Apache 2.0 on our production servers, and we can move on to upgrading our infrastructure to use UTF-8 even at the HTTP server layer (our DB is already in UTF-8, as I wrote before).

Section: /computers (RSS feed) | Permanent link | 0 writebacks
