Yenya's World

Wed, 29 Nov 2006

The Opinion of Novell Employees

During the last few weeks I have wondered how Novell (ex-SUSE) Linux engineers feel about the Novell-Microsoft partnership. Some of them say it directly, while others use carefully chosen words. GregKH's essay (the latter link) is really worth reading. It reminds me of the times of the communist regime in Czechoslovakia, when we were not allowed to state our opinions directly, so we had to use very careful words, from which the intended meaning could be guessed. Bad memories, oh well.

Section: /computers (RSS feed) | Permanent link | 2 writebacks

2 replies for this story:

Martiner wrote:

Link http://pavelmachek.livejournal.com/30914.html is not valid ("Error No such entry.")

Yenya wrote: Works for me

Strange - the URL http://pavelmachek.livejournal.com/30914.html works for me (even from links/lynx and even from the other OS, which must not be named).

Thu, 23 Nov 2006

32-bit Galeon

I use an AMD64-based workstation, and from time to time I need to view data in proprietary formats, such as Flash. The Flash player is not available for Linux/AMD64, only for Linux/i386. The Flash plug-in is a shared library, so as a 32-bit object it can be used only with a 32-bit executable, not with my 64-bit Galeon. I used to run flashplayer remotely from my laptop, but now I have decided I need it on my workstation as well.

Fortunately, there is a 32-bit version of Galeon in Fedora Extras. So I have installed it, and at the cost of keeping both 32-bit and 64-bit shared libraries in memory, it works, even with the Flash plugin.

There was a problem, though: I use SCIM (with Anthy) for input of Japanese characters. It did not work with the 32-bit Galeon. I looked into the Red Hat bugzilla and found bug #215583 there. In the discussion, Jens Petersen recommends using scim-bridge, which allows the input methods to run over a socket, in a separate address space. So it works even when the input method runs on a different architecture (AMD64 in my case) than the main application (Galeon/i386 in my case). I just had to install both the 32-bit and 64-bit scim-bridge-gtk modules and the scim-bridge package, and set the GTK_IM_MODULE environment variable to scim-bridge instead of scim.

So now I have both the Japanese input method and Flash working in Galeon. Nevertheless, I hope there will be a full-featured and free Flash implementation soon.

Section: /computers (RSS feed) | Permanent link | 1 writeback

1 reply for this story:

Vasek Stodulka wrote:

I think that gnash will be the impulse for an open-source Adobe plugin. Have a look at Java - once GNU Classpath finally became usable, Sun released Java under the GPL. Is it only a coincidence?

Wed, 22 Nov 2006

CPAN Bugs

I am migrating the IS MU cluster to Fedora 6 - the front-end servers are already migrated, and now I am working on migrating the web cluster. This is a bigger change than it may seem, because we are also migrating the old 32-bit system to a fully 64-bit one. So I am recompiling everything from Apache to Perl. While testing the Perl modules I found the following regressions:

The Text::Tabs module does not work correctly on UTF-8 characters. Here is the test case - the output is misaligned:

perl -CS -Mutf8 -MText::Tabs -e \
	 'print expand("\taa\t.\n\t\x{010a}\x{010a}\t."), "\n"'

I did not dig into this further; replacing Tabs.pm with the one from the older version worked.
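One way this kind of misalignment can arise is when tab stops are computed from byte counts instead of character counts. That is only an assumption here, not a diagnosis of Text::Tabs. The Python sketch below is not the module's code; `expand_tabs` and `utf8_len` are names made up for the illustration, and it reuses the strings from the test case above.

```python
def expand_tabs(line, tabstop=8, width=lambda ch: 1):
    """Expand tabs to spaces; `width` says how many columns one character takes."""
    out, col = [], 0
    for ch in line:
        if ch == "\t":
            pad = tabstop - col % tabstop   # spaces needed to reach the next tab stop
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += width(ch)
    return "".join(out)


def utf8_len(ch):
    """Column width if bytes are (wrongly) counted instead of characters."""
    return len(ch.encode("utf-8"))


ascii_row = "\taa\t."
accented_row = "\t\u010a\u010a\t."

print(expand_tabs(ascii_row).index("."))                      # 16
print(expand_tabs(accented_row).index("."))                   # 16
print(expand_tabs(accented_row, width=utf8_len).index("."))   # 14
```

With byte counting, each two-byte character advances the column by two, so the second tab gets too little padding and the dot lands two columns early.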

Another regression was in Crypt::Cracklib, which we use for generating and checking passwords. I have seen random crashes when using this module. The problem was that it did not #include the <crack.h> header file, so the compiler assumed that an undeclared function returns int when in fact it returns a pointer. On AMD64 an int is 32 bits wide while a pointer is 64 bits wide, so the pointer got truncated. I solved this by modifying Makefile.PL in the following way:

                $include = $_;
                if (-f "$include/packer.h") {
+                        if (-f "$include/crack.h") {
+                                $incfile .= ' -Dcrack=1';
+                        }
                        last;
                } elsif (-f "$include/crack.h") {
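The truncation itself can be modelled with plain integer arithmetic. The Python sketch below only mimics what happens to a 64-bit pointer value that is squeezed through a 32-bit signed int and widened again; the addresses are made-up examples, not values taken from the crashes, and `through_int32` is a hypothetical helper name.

```python
def through_int32(pointer):
    """Model a 64-bit pointer returned through an implicitly declared int function."""
    low = pointer & 0xFFFFFFFF          # only the low 32 bits survive
    if low & 0x80000000:                # int is signed, so widening sign-extends
        low -= 1 << 32
    return low & 0xFFFFFFFFFFFFFFFF     # the value as seen again as a 64-bit pointer


small_ptr = 0x0000000000602010          # hypothetical address below 2 GB
heap_ptr = 0x00007F3A1C2D4E50           # hypothetical address far above 4 GB
high_bit_ptr = 0x00007F3A9C2D4E50       # same, but with bit 31 set

print(hex(through_int32(small_ptr)))     # 0x602010
print(hex(through_int32(heap_ptr)))      # 0x1c2d4e50
print(hex(through_int32(high_bit_ptr)))  # 0xffffffff9c2d4e50
```

Small addresses survive the round trip unchanged, while larger ones come back pointing somewhere else, which would be consistent with crashes that look random.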

In both cases I have contacted the authors, but it would be nice to have a bug-tracking system for CPAN modules. Is there anything like that available? If there isn't, I hope at least this blog entry helps somebody to solve the problems I had with those two modules - especially since Crypt::Cracklib has not been updated for years.

Section: /computers (RSS feed) | Permanent link | 2 writebacks

2 replies for this story:

Adelton wrote: rt.cpan.org

Not that it would help much in getting the authors to do the fix for you if they are not responding to regular emails, but you'll at least have a URL to point to.

Adelton wrote:

... hmm, but Text::Tabs is a standard module, so it should go the perlbug way. The problem is also present on a 32-bit system; it is not 64-bit specific. I've also filed bug 217833 against RHEL 5 Beta 2 to have it tracked from Red Hat's side as well.

Tue, 21 Nov 2006

Linux Weekend 2006

The Linux Weekend, a conference organized by CZLUG in Prague, has been officially announced.

I will have a presentation about (guess what?) spam, but there will be more interesting talks as well ;-).

Section: /computers (RSS feed) | Permanent link | 0 writebacks

0 replies for this story:

Thu, 16 Nov 2006

Up-to-date Knowledge

It took me a while to figure out that prelink changed my executable files and libraries yesterday, and that RPM is now prelink-aware. Vašek added an interesting comment to which I want to reply in this blog post:

Yes, prelink. It should be written at the top of the rpm man page (maybe also bold and red) that prelink modifies binaries and RPM knows about it. Most people find differences and then hunt ghosts - just like you. I only wonder how a guru like you does not know this. :-)

Well, the simple answer is that I am by no means a guru :-) The more complicated answer is that I learned RPM long ago (pretty much in depth, I even wrote a series of articles about it), when it did not have prelink support.

This is a general problem with IT knowledge: it is often not so hard to gain the knowledge, but it is much harder to keep up with ongoing changes. I think it is because there are manuals and tutorials for beginners, but almost nothing about what has changed in - for example - the last two years. I don't count Changelogs, because they are clogged with changes at the micro-architectural level, which are of no interest after a year or two.

It is hard to keep the knowledge up-to-date even when you actually still use the system in question: I build RPM packages occasionally, yet this was the first time I came across the prelink support in RPM. It is the same with Perl, for example. I use Perl almost daily, yet many features used in Perl Best Practices were new to me (such as using "our $var;" instead of "use vars qw($var);").

Also, in this particular case it probably would not have helped to have a big fat warning near the top of the rpm(1) manpage - I think I would not have consulted this manpage in this situation. Using strace(1) is more general :-). Keeping the knowledge up-to-date is pretty hard - for example, I skip most of the content of the Linux Journal, because it repeats at length what I already know, and should there be anything new for me, it is deeply buried in facts I already know.

How do you keep up with the current development in IT, my dear lazyweb?

Section: /personal (RSS feed) | Permanent link | 2 writebacks

2 replies for this story:

Vasek Stodulka wrote:

I gave up trying to know everything a long time ago. For example, the last time I made an idiot of myself was when I claimed that "for i in *;" is not safe because of the maximum command line length. These things happen, especially to me. :-)

Milan Zamazal wrote:

ANSI Common Lisp hasn't been changed since it was adopted, and user-visible changes in Emacs are carefully documented in the NEWS file. So there's no problem keeping up with the most important development in IT. :-)) Seriously, you are lucky you just don't keep up with new development; there are pieces of free software that are hard to avoid using (such as some hardware drivers) and that don't contain real documentation at all, so it's difficult to get any knowledge about them. I try to do two things to make things better: 1. keeping the manuals of the software I write up to date, including documenting user-visible changes in NEWS; 2. documenting some of my user experience on my web pages so that other users have at least a chance to find useful information on the Internet and do not have to learn it the hard way.

Wed, 15 Nov 2006

Hunting Ghosts

Today I worked on synchronizing filesystems on some of our high-availability systems. We use a custom-made rsync-based setup for checking for differences between filesystems in a cluster. One of the hosts in an H-A pair had been down for a while because of faulty hardware, so I had to manually check whether the changes on the active system could be propagated to the backup as well. I synchronized the filesystems and switched the load to the newly plugged-in host (because it is faster than the other one). Just to be sure, I re-ran the checks, and was surprised: some files were now different on the new host.

What was worse, the set of files which were different was a bit suspicious: bash, login, tcpdump, some other utils and libraries, including those which are run every time the system boots (such as heartbeat and its libraries). I ran "rpm -V", just to be sure the files were different from those in the RPM database, but it reported that all files were OK and matched the database. I took the clean RPMs from the FTP repository, and the files in question were shorter in the package than on my filesystem. I thought: are current rootkits so smart that they modify the RPM database, and so stupid that "ls -l" can still tell the difference?

"rpm -qlv bash|grep /bin/bash" showed a different size in the RPM database than the file itself had, yet "rpm -V bash" said the package was perfectly OK. Strange. So I suspected that the rpm program had been modified as well (even though it did not show up in the list of modified files). To prove this, I used strace. On a clean system its output was shorter, and the difference was that on a modified system rpm spawned some more threads/processes. "strace -f" then showed the guilty party - the rpm command executed prelink on each modified binary.

So I had been hunting ghosts all the time: the files in question had simply not been prelinked yet, or the prelinking info had been overwritten (or not overwritten, I don't know) by my synchronization scripts. After running "/etc/cron.daily/prelink" on a "modified" system both filesystems look the same. Problem solved.

For a long time I wondered how prelinking can be done without modifying the binary (and thus breaking the packaging system). The answer for rpm appears to be: the package manager needs to know about prelinking as well. I have to find some time to read Jakub's prelink paper (PDF). Back to serious work now.

Section: /computers (RSS feed) | Permanent link | 3 writebacks

3 replies for this story:

Vasek Stodulka wrote:

Yes, prelink. It should be written at the top of the rpm man page (maybe also bold and red) that prelink modifies binaries and RPM knows about it. Most people find differences and then hunt ghosts - just like you. I only wonder how a guru like you does not know this. :-)

Yenya wrote: Too much knowledge

I think hunting ghosts is not bad per se, provided that I find the right answer after all. It is probably that I know about many things that can go wrong, so it takes time to find the right one. Last week one of my students asked me to find out why he could not log in via KDM any more after installing some completely unrelated package. I traced the X startup scripts and so on, and it took me at least a quarter of an hour before I ran "df /" and discovered that his root filesystem was full...

Peter Kruty wrote: knowledge

Right, there is so much to know about a live Linux system that it sometimes takes time to find the right source of the problem.

Tue, 14 Nov 2006

Postfix Filtering

When I wrote about Postfix on the IS MU mailserver, I promised more details about my life with Postfix. So here we are:

I think one of the biggest pros of Postfix is that it has sane defaults. The only things you have to configure are those which are special for your setup. Now the cons: the biggest problem of Postfix is probably its filtering mechanism. When you want to do for example virus or spam filtering inside the SMTP session, there are three ways to do it:

The first one is the policy server - this is a server which listens on a socket (often run by the master process), reads requests from the SMTP server, and sends its verdicts back (a verdict can be DUNNO, DISCARD, REJECT, etc.). It is possible to have a policy server for every part of the SMTP conversation - MAIL FROM, RCPT TO, DATA, end of DATA, etc. Policy servers are great for implementing greylisting, for example. However, there is one rather stupid property of policy servers which renders them unusable for virus or spam filtering: even the end-of-DATA policy server gets only the envelope information, and not the message itself. Why implement an end-of-DATA policy server at all, when it doesn't get any new information apart from what is already available in the DATA stage?

The second method is Milter (sendmail-compatible mail filtering interface). However, it does not have a native library for writing milters, and it requires sendmail to be installed and configured as well. Blehhh.

The last method of filtering is an SMTP proxy - you can write an SMTP proxy, and Postfix's SMTP server forwards the message to the proxy, which can then do any filtering/discarding/rejecting according to its policy. If the message is to be passed back to Postfix for further handling, the proxy should send it over SMTP as well (the recommended configuration is to run another SMTP daemon bound to another port on the loopback interface). This is poorly documented, so it is not clear whether Postfix's SMTP server passes all commands to the proxy (and thus which features of [E]SMTP the proxy should implement), or whether the proxy gets the message in some simplified form.

I am not aware of any other filtering mechanism, which would not include writing its own SMTP server and client, and which would allow me to filter messages (decide whether to reject or discard the message, or pass it in, maybe with an added header). As a performance-wise bonus, it would be nice if the filtering mechanism allowed the message to be accessed directly as a file inside the queue, without copying data and sending it back.
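To make the policy-server limitation concrete, here is a rough Python sketch of the request/reply exchange, with the socket handling left out. It assumes the usual Postfix policy delegation format (name=value lines ended by an empty line, answered with an action= line followed by an empty line). The greylisting logic, the 300-second delay and the sample addresses are illustrative assumptions, not the setup used on the IS MU mailserver.

```python
GREYLIST_DELAY = 300  # seconds; an arbitrary value chosen for this sketch


def parse_request(text):
    """Parse one policy request: name=value lines ended by an empty line."""
    attrs = {}
    for line in text.splitlines():
        if not line:
            break
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs


def greylist(attrs, seen, now):
    """Return the reply for one request, remembering when a triplet was first seen."""
    triplet = (attrs.get("client_address"), attrs.get("sender"), attrs.get("recipient"))
    first_seen = seen.setdefault(triplet, now)
    if now - first_seen >= GREYLIST_DELAY:
        return "action=dunno\n\n"
    return "action=defer_if_permit Greylisted, please try again later\n\n"


request = (
    "request=smtpd_access_policy\n"
    "protocol_state=RCPT\n"
    "client_address=192.0.2.1\n"
    "sender=rei@example.org\n"
    "recipient=shinji@example.com\n"
    "\n"
)
seen = {}
attrs = parse_request(request)
print(greylist(attrs, seen, now=0.0), end="")     # first attempt is deferred
print(greylist(attrs, seen, now=600.0), end="")   # retry after the delay passes
```

Everything the decision function sees is envelope data, which is enough for greylisting but gives a virus or spam scanner nothing to inspect.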

Section: /computers (RSS feed) | Permanent link | 0 writebacks

0 replies for this story:

Sat, 11 Nov 2006

Postgrey statistics

I wrote about an increase of spam volume in the previous month, and also about installing Postfix on the IS MU mailserver. Here are the results after 10 days:

The total volume of my spam last month was lower than expected - only 1.3GB in 94k messages, so the increase was 1.6 times in size and 1.3 times in number of messages. However, the total amount of spam going to other mailboxes in our domain has increased by 50% or more.

I also wrote that I have installed Postgrey on the IS MU mailserver, hoping that greylisting would deflect a majority of spam from IS MU. However, I did not find a decent statistics package for Postgrey, so I hacked up a simple script to generate statistics from Postgrey. It is too simple and possibly not entirely correct, because Postgrey does not provide much info in its log. Here are the stats from the first 10 days:

Greylisting stats from Nov  1 00:00:00
                  till Nov 10 15:38:03

Messages
total:                   2072453
accepted immediately:     179692 (  8.7%)
delayed:                   22473 (  1.1%)
blocked:                 1870288 ( 90.2%)

Greylisting delay (avg):       4399s delayed
                                550s all

SMTP servers:
                                   hosts   messages
No message accepted:              231940    1840367
No message graylisted:              1364     163021
All accepted (maybe delayed):       7750      21281 = 33243-11962
More blocked than accepted:         2112      34531 = 40432-5901

So it seems that 90% of messages are spam which can be blocked by greylisting. Nice. The next problem is to recognize spam in the remaining 10%.
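The percentages in the Messages part of the report can be rechecked from the raw counts. This is a small Python sketch, not the script that produced the report; `shares` is a helper name invented here.

```python
def shares(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


messages = {
    "accepted immediately": 179692,
    "delayed": 22473,
    "blocked": 1870288,
}

print(sum(messages.values()))   # 2072453
print(shares(messages))
```

The three counts add up to the reported total, and the accepted and delayed messages together make up just under 10% of it.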

Section: /computers (RSS feed) | Permanent link | 0 writebacks

0 replies for this story:

Fri, 10 Nov 2006

What is on your flashdisk?

From time to time I think about finding the ultimate Linux distribution suitable for running from a USB flash disk. I have a 512MB flash disk, which should be big enough for basic tasks such as fsck, ssh, rsync, and maybe mutt and links. Maybe something like Slax?

There is a drawback, however: I sometimes need to use my USB key as a raw device: I simply dd(1) a bootable image of something onto it. Usually it is a diskboot.img file from Fedora, which I then use for installing/upgrading a computer, or even, with the rescue feature of the Fedora Core installer, as a rescue disk (it requires the rest of the distro to be available over the Net, though). Yesterday I even put an image of a DOS floppy on it, and used it to flash a new BIOS to a mainboard.

So needing a raw device too often has prevented me from using a "permanent" live distribution on my USB key. The problem may be solved by partitioning the device and installing a boot loader (can a DOS floppy image be booted when put on a partition, e.g. by GRUB?), but I don't know whether it is worth the trouble (and I may need a raw device for something else later). I also don't know if all BIOSes support booting from a partitioned removable device (as opposed to the raw device as a whole).

What is on your USB key? If it is some kind of live distribution, do you also use the key as an install disk or for flashing BIOSes?

Section: /computers (RSS feed) | Permanent link | 2 writebacks

2 replies for this story:

Satai wrote: Why USB

Why do you use a USB drive? The probability that you can boot from a CD on a randomly chosen computer seems to be much higher.

Yenya wrote: Re: Why USB

Nope. Almost none of my servers has a CD-ROM drive. Every one of them has an on-board USB port with a BIOS that supports USB boot. Also, a USB stick is much smaller and less prone to physical errors (unlike a CD, which can easily be scratched or damaged in another way).

Thu, 09 Nov 2006

My Next Phone

OK, I think we probably have a winner. For some time I have been thinking about replacing my aging Nokia 6310i with something newer, more capable, and preferably Linux-based. I have been looking at some Motorolas, but it seems that this one is way better: It is truly open, including the whole kernel and hardware drivers (it only runs a proprietary GSM stack on a separate CPU), it has GPS, and it allows users to run X11 apps (unlike Motorola, where user apps are mostly limited to that slow Java). I hope it will be released soon and I hope I will find somebody to order it from.

Section: /computers (RSS feed) | Permanent link | 2 writebacks

2 replies for this story:

Milan Zamazal wrote:

Being recently equipped with a Linux PDA I can see the following problem: I can either use the Qtopia platform that is stable and comfortable but of limited features and incompatible with most free applications (which might turn the Greenphone project into just another substrate of proprietary systems), or an X11 platform that can run a lot of applications that are unstable and inconvenient to use on a mobile device. What we need is a free stable X11 platform customized for mobile devices. Hopefully this project could help with this. But what else is this project good for? With the small and possibly fragile display and no keyboard (really?) it can hardly be much more than, ehm, a phone. What would be the added value to persuade me to upgrade my old Ericsson T39?

Yenya wrote: Why a smartphone?

This one has GPS, for example. And it can probably play films - think of playing Krteček to your bored kid when waiting for a meal in a restaurant or during any other kind of waiting :-). This is what I use my Palm T5 for (aside from being a nice OGG player).

Tue, 07 Nov 2006

TMOU 8

On Friday and Saturday, we took part in TMOU 8. It was an interesting experience, because it was my first TMOU with snow. In contrast to what some other teams think, my opinion is that snow is not much of a problem - for example heavy rain with temperatures around 0°C could be much worse.

Our performance was similar to the previous years - slow start, good middle part, and a bad end: the first stage definitely took more time than we wanted, and we were somewhere between 20th and 30th place (out of 210 teams). During the first stages of the outdoor part of the game we were even somewhere between 4th and 10th place. But then, at stage 10, we tried many different approaches, while the solution was pretty simple. It just required looking at the task from a different point of view. Stage 11 was similar - we knew the basic principle, but an inaccuracy on our side prevented us from solving the puzzle sooner. We then finished at stage 13 out of 15 on Saturday noon, in 22nd place according to unofficial statistics.

I think we have again proved that even though we may not always find the solution quickly, we are still able to use a different point of view and solve the puzzle after some time. However, this year's TMOU did not contain many interesting puzzles. Most of them only required a lot of work, while the basic principle could be guessed quickly. Anyway, here is the description of TMOU 8 as seen by the team coredump.

Section: /personal (RSS feed) | Permanent link | 0 writebacks

0 replies for this story:

Energy drinks

Does anybody know how energy drinks are supposed to work? Last weekend we took part in TMOU 8. I bought - probably for the first time in my life - an energy drink (its name was "Kamikaze" - lacking any other reasonable criteria for choosing an energy drink, I bought the one with Kanji on it :-) in case I got tired.

It was a bit cold during the night (I guess it was about -8 degrees Celsius), and at 4am I decided that this was the right time for my Kamikaze. The result of drinking this very cold thing was strange: I almost instantly felt very cold, and very sleepy. So I lay down and fell asleep, while my teammates were trying to solve stage 10.

Interestingly enough, when I awoke maybe half an hour or an hour later, I felt good, and I solved the puzzle at stage 10 in a very short time. Was it an effect of the energy drink? I don't think so - I remember TMOU in previous years, where a short sleep provided a big improvement in our performance as well, without any energy drink.

Do energy drinks actually work? What is your experience with them?

Section: /world (RSS feed) | Permanent link | 5 writebacks

5 replies for this story:

Vašek Stodůlka wrote:

Energy drinks are obsolete; the current hit is energy ampoules. You can buy "Speed 8" at every gas station. I got one with the Heroes of Might and Magic V game (labeled "Elixir of endurance" :), but I have not used it yet. It looks like it should really work - it is packaged like a drug, and a booklet warns you about mixing it with alcohol, negative symptoms, etc. Energy drinks are designed to quickly top up the glucose level in your muscles, while in TMOU we mostly used our brains. I do not know if the brain is "powered" by glucose too, but I think it works differently and classical energy drinks do not work.

Yenya wrote: Glucose

Interesting. I had some saccharide pills with me (I don't know right now whether it was glucose or fructose) - it was good because it was possible to eat them even though I had problems with eating other food such as rolls or cheese. But then, if glucose is the main component, wouldn't a bar of chocolate have the same effect? I know, it is probably saccharose, but it is close enough [Note: for TMOU, look for chocolates which are edible even at 0°C]

Vasek Stodulka wrote:

Glucose (and fructose, it is chemically the same, IMHO) goes from the stomach and small intestine directly to the blood and to the muscles; saccharose needs to be converted to glucose first. In elementary school we learned that glucose can reach the muscles within 15 minutes after eating, while saccharose takes longer - about an hour. It means that chocolate also works, but with a delay. In principle, you can also eat some honey. :)

x wrote:

isn't Nutella the best chocolate for these temperatures?

davro wrote:

The brain is powered only by glucose. There is a very important parameter which shows how fast sugars from food are transferred to the blood - GI (glycemic index). Pure glucose gets into the blood quickly and is used as the reference food (BTW: beer has GI 103 - so drink it if you need fast glucose transfer :-). Chocolate is not as good at it, because it contains lots of fat (but on the other hand, it is more complex - it gives you energy for a longer time), and of course it contains legal drugs :-). I wonder whether good clothing is more important to your brain performance than glucose? If you are cold, a lot of sugar is spent on thermoregulation.

Fri, 03 Nov 2006

Time versus direction

The topic of the previous Japanese lesson was directions and time. I have found out that the Japanese use the word/kanji "前" (read "mae") for both "in front of" and "ago". So, "20 years ago" is "二十年前", as in "the time we are talking about had 20 years in front of it to the present time".

It is interesting that the same concept is in Czech as well - we say "před 20 lety", where the word "před" can be used as "in front of" as well. On the other hand, in Russian they use "тому назад", which literally means "back from now". So they are talking about the past from the present point of view, while Czechs and Japanese talk about the past from the point of view of that past.

But all those languages have a similar view of the flow of time - the past is "behind us, back from us", while the future is "in front of us". I wonder whether this concept has been developed independently, or whether this shows some kind of common roots of all those languages.

Section: /world (RSS feed) | Permanent link | 2 writebacks

2 replies for this story:

wrote:

I can see the Czech, but not the Russian.

Yenya wrote: use Unicode fonts

You have to set your browser to use UTF-8 by default, and UTF-8 fonts by default.

Thu, 02 Nov 2006

Regexp of the day

Regular expressions are widely used in text manipulation, such as parsing e-mail addresses. I tried to use the Email::Address module for it (it allows the address to be modified, unlike a similar module - Mail::Address). My task was to qualify unqualified addresses, i.e. to change the address

AYANAMI Rei <rei>

to something like

AYANAMI Rei <rei@nerv.gov.jp>

However, Email::Address cannot parse unqualified addresses (unlike Mail::Address, as I discovered later :-). Fortunately, Email::Address provides the regular expression for matching the address, and it can be modified. So the solution would be easy, wouldn't it? Here we go - the regexp in question is the following (line-wrapped for convenience):

$ perl -MEmail::Address -e 'print $Email::Address::angle_addr'
(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|
(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xis
m:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s
*\)\s*)|\s+)*<(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*
(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-
xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[
^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7
F()<>\[\]:;@\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,."\s]+)*)(?-
xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(
?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\
]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+
)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]
+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?
-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*))
)*\s*\)\s*)|\s+)*"(?-xism:(?-xism:[^\\"])|(?-xism:\\(?-xism:[^\x0
A\x0D])))+"(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+
))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-
xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))
*\s*\)\s*)|\s+)*))\@(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?
-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xi
sm:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\
x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7F(
)<>\[\]:;@\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,."\s]+)*)(?-xi
sm:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-
xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+
))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*
)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)
)|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-x
ism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*
\s*\)\s*)|\s+)*\[(?:\s*(?-xism:(?-xism:[^\[\]\\])|(?-xism:\\(?-xi
sm:[^\x0A\x0D]))))*\s*\](?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xis
m:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\
s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)
)*\s*\)\s*)))*\s*\)\s*)|\s+)*)))>(?-xism:(?-xism:\s*\((?:\s*(?-xi
sm:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:
\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A
\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)

My dear lazyweb, who will be the first to find out how to edit this regexp to parse unqualified addresses as well?

UPDATE 2006/11/03: Another solution
I have used Mail::Address, and I create a new object as a replacement whenever I have to replace an unqualified address with the qualified one.
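The qualifying step itself is small once the header is split into a display name and an address. The Python sketch below uses neither the Mail::Address nor the Email::Address API, and its pattern deliberately handles only the simple "name <address>" form rather than the full RFC 2822 grammar; `qualify` and `ANGLE_ADDR` are names invented for the illustration.

```python
import re

# Deliberately simple: "display name <address>", nothing like full RFC 2822 parsing.
ANGLE_ADDR = re.compile(r"^(?P<name>.*?)\s*<(?P<addr>[^<>]+)>\s*$")


def qualify(header, domain):
    """Append @domain to an angle-bracketed address that has no domain part."""
    match = ANGLE_ADDR.match(header)
    if match is None:
        return header                      # not in the simple form; leave untouched
    name, addr = match.group("name"), match.group("addr")
    if "@" not in addr:
        addr = f"{addr}@{domain}"
    return f"{name} <{addr}>" if name else f"<{addr}>"


print(qualify("AYANAMI Rei <rei>", "nerv.gov.jp"))
print(qualify("AYANAMI Rei <rei@nerv.gov.jp>", "nerv.gov.jp"))
```

Both calls print the qualified form; an address that already has a domain is passed through unchanged.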

Section: /computers (RSS feed) | Permanent link | 2 writebacks

2 replies for this story:

Vasek Stodulka wrote:

It would be interesting to know whether this is human-written or machine-generated. :)

Adelton wrote: Matching unqualified

Put ( before that \@ on line 19 and replace )))> with )))?)> on line -4.

Wed, 01 Nov 2006

Spam, spam, spam

It seems that the total volume of spam has increased radically during the last month. My own spambox is bigger than before - the previous spambox had 770MB and 56k messages, the current one has over a gigabyte and 71k messages (we rotate spamboxes on the 10th of each month, so the current one will probably be about 50% bigger than it is now). So it would be an increase in spam by almost a factor of two in the last month. I am getting more than 3500 spam messages a day, one spam every 24 seconds! And this is just the spam recognized by my spam filter; I guess 20-50 messages get through each day.
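The two estimates above are easy to recheck. A quick Python calculation, using only the rounded figures quoted in this paragraph:

```python
SECONDS_PER_DAY = 24 * 60 * 60

previous_msgs = 56_000                 # previous spambox
current_msgs = 71_000                  # current spambox so far
projected_msgs = current_msgs * 1.5    # "about 50% bigger than it is now"

print(round(projected_msgs / previous_msgs, 2))   # 1.9
print(round(SECONDS_PER_DAY / 3500, 1))           # 24.7
```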

The IS MU mailserver could not cope with this volume of spam, and the CPU load has caused huge delays in message delivery. So, as a temporary measure, I have switched off some spam filtering features (causing a big uproar amongst users[1]), and I have started to reimplement the server part of the mailserver.

I have replaced Qmail with Postfix (expect more about life with Postfix in the next blog post :-), added PostGrey, and rewritten the delivery mechanism so that our entire cluster is used for spam filtering (instead of the mailserver only). I have also added the ClamAV antivirus scanner.

So, the current IS MU mailserver should be an order of magnitude faster than before, and it will be even more spam-resistant because of additional antispam measures such as greylisting. It took me about three weeks to redesign and reimplement it, but I think we are prepared for the next wave of spam.

Footnote [1]: Of course, when somebody complains "Fix your f*cking spam filter, I receive five spams a day!", I can always reply "Lucky you, I get 3500+ spams daily." :-)

Section: /computers (RSS feed) | Permanent link | 11 writebacks

11 replies for this story:

Anydot wrote: Congratz

That you chose Postfix

thingie wrote: Of course

Ready for the previous war...?

Honza Holčapek wrote: Poor you

3500+ spams a day is a terrible number. Just a minor note: shouldn't you use "fix your f*ucking" instead of "fix your f*ucked"?

Milan Zamazal wrote:

Greylisting is a good antispam method as it puts the burden on spammers too. Just beware of relays sending mail to you -- greylisting them burdens just you and the relay instead of spammers (well, preferably motivating the relay admins to apply effective antispam means on their sites too). Non-filtering relays require special handling such as combining greylisting with other methods (for instance I'm going to apply razor on debian.org relays which are responsible for most of the spam passing through my primary shield).

Yenya wrote: Re: Of course

What war?

Yenya wrote: Re: Poor you

Thanks, fixed (my English is rather poor, I know :-).

Yenya wrote: Relays

Yes, I know - relays are Evil(tm) and should be avoided.

Honza Holčapek wrote: Re: Re: Poor you

Definitely not, your English is pretty good, and I mean it.

thingie wrote: Re: Re: Of course

Sort of a proverb. You are ready to block spam you've already got. But what about the spam that is going to come tomorrow? Even spammers have to hate spam, I think.

Adelton wrote: Greylisting stats?

Yenya, have you got some statistics of the percentage of emails caught by greylisting?

Yenya wrote: Re: Greylisting stats?

Sorry, I don't have any. I am looking for a statistics package for Postgrey, I don't have time to write it myself.

