Mon, 26 Feb 2007
A while ago I wrote about the
Email::Address Perl module for parsing addresses
in the e-mail headers (such as
We use it in production system, but on Thursday we have got a problem,
which I narrowed to an inoptimality in this module. The mail delivery daemon
(which - amongst other things - parses the mail message) seemed to be
stuck in an infinite loop.
Later I found out that the message in question contained an overly long
To: header - it had more than 300 mail addresses there.
I have investigated this further, and it seems that the
Email::Address module has complexity far above O(n):
|# of addresses||parsing time|
So it was definitely impossible for it to parse a header with >300
addresses in a reasonable amount of time.
The alternative module,
Mail::Address, is unusable
on this particular message as well - it crashes
with SIGSEGV :-(.
As a workaround, I wanted to truncate the header somewhere, but in order to truncate the header exactly between e-mail addresses, I would have to parse the header. Which is what I has been doing, and it took too much time. A nice Catch 22, isn't it? So I have invented a heuristics where I truncate the header after a comma followed by a newline. When this is not enough, I use just a sole comma as a separator, and when even this is not enough, I truncate the header to a fixed number of characters. So far it seems to be sufficient.