Yenya's World

Mon, 26 Feb 2007

Superlinear Algorithms

A while ago I wrote about the Email::Address Perl module for parsing addresses in the e-mail headers (such as To:, Cc:, etc.). We use it in production system, but on Thursday we have got a problem, which I narrowed to an inoptimality in this module. The mail delivery daemon (which - amongst other things - parses the mail message) seemed to be stuck in an infinite loop.

Later I found out that the message in question contained an overly long To: header - it had more than 300 mail addresses there. I have investigated this further, and it seems that the Email::Address module has complexity far above O(n):

# of addressesparsing time
600.13 s
650.13 s
700.29 s
750.45 s
800.79 s
850.58 s
900.86 s
951.63 s
1002.38 s
1053.17 s
1104.02 s
1155.97 s
12011.37 s
125230.89 s
1301045.07 s

So it was definitely impossible for it to parse a header with >300 addresses in a reasonable amount of time. The alternative module, Mail::Address, is unusable on this particular message as well - it crashes perl with SIGSEGV :-(.

As a workaround, I wanted to truncate the header somewhere, but in order to truncate the header exactly between e-mail addresses, I would have to parse the header. Which is what I has been doing, and it took too much time. A nice Catch 22, isn't it? So I have invented a heuristics where I truncate the header after a comma followed by a newline. When this is not enough, I use just a sole comma as a separator, and when even this is not enough, I truncate the header to a fixed number of characters. So far it seems to be sufficient.

Section: /computers (RSS feed) | Permanent link | 3 writebacks


Yenya's World: Linux and beyond - Yenya's blog.


RSS feed

Jan "Yenya" Kasprzak

The main page of this blog



Blog roll:

alphabetically :-)