Yenya's World

Tue, 30 May 2006

MIME::Words and UTF-8

We use the MIME::Words package from CPAN to encode and decode RFC 1522-style e-mail headers (texts like =?UTF-8?Q?something=20something?=). A long time ago I found a bug in this package: when two adjacent words are encoded, the whitespace between them has to be included in the first or the second encoded word, because whitespace between two adjacent encoded words is discarded during decoding. While moving our system to UTF-8, I decided to install a newer version of the MIME::Words module, and I wondered whether this bug had been fixed.
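To make the bug concrete, here is a minimal sketch in Python rather than Perl. It uses the standard library's email.header.decode_header as a stand-in for any conforming decoder; it is not MIME::Words, and the decode() helper and the sample header strings are my own illustration.

```python
from email.header import decode_header


def decode(header: str) -> str:
    """Decode a header value containing encoded words into a plain string."""
    parts = []
    for data, charset in decode_header(header):
        if isinstance(data, bytes):
            # Encoded words come back as raw bytes plus their charset name.
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)
    return "".join(parts)


# Two encoded words separated only by a space: the decoder drops that space.
naive = "=?UTF-8?Q?P=C5=99=C3=ADli=C5=A1?= =?UTF-8?Q?=C5=BElu=C5=A5ou=C4=8Dk=C3=BD?="

# The same text with the space carried inside the first encoded word
# ("_" stands for a space in Q encoding): the space survives decoding.
fixed = "=?UTF-8?Q?P=C5=99=C3=ADli=C5=A1_?= =?UTF-8?Q?=C5=BElu=C5=A5ou=C4=8Dk=C3=BD?="

print(repr(decode(naive)))
print(repr(decode(fixed)))
```

The first header decodes with the two words run together, the second with the space intact, which is exactly why an encoder has to put the separating whitespace into one of the encoded words.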

In the manpage, they wrote:

It does not comply with the RFC-1522 rules regarding the use of encoded words in message headers. You may want to roll your own variant, using encoded_mimeword(), for your application. Thanks to Jan Kasprzak for reminding me about this problem.

So they did not fix the problem I reported 3-5 years ago; they just acknowledged its existence (even with my name :-). The module also does not handle multi-byte characters (in UTF-8 strings) correctly, and defaults to the ISO-8859-1 encoding instead.

I have decided to fix this module, solving both the problem of two adjacent encoded words and the problems of encoding from and decoding to multi-byte strings. Here is the patch for MIME::Words and UTF-8. Hopefully they will apply it soon.
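For illustration only, here is a rough Python sketch of the two ideas behind such a fix. It is not the patch itself and not the MIME::Words API; the function names q_encode_word, encode_header and decode are hypothetical. It assumes words are separated by single spaces, encodes only words containing non-ASCII characters, and ignores the line length limit for encoded words.

```python
from email.header import decode_header


def q_encode_word(text: str, charset: str = "UTF-8") -> str:
    """Q-encode one chunk of text as a single encoded word."""
    out = []
    # Convert characters to bytes first, so a multi-byte character
    # becomes several =XX escapes inside the same encoded word.
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch == " ":
            out.append("_")
        elif byte < 128 and ch.isalnum():
            out.append(ch)
        else:
            out.append("=%02X" % byte)
    return "=?%s?Q?%s?=" % (charset, "".join(out))


def encode_header(text: str, charset: str = "UTF-8") -> str:
    """Encode non-ASCII words, keeping spaces between adjacent encoded words."""
    result = []
    prev_encoded = False
    for word in text.split(" "):
        needs_encoding = any(ord(c) > 127 for c in word)
        if needs_encoding:
            # The space between two adjacent encoded words is discarded on
            # decoding, so carry it inside the second encoded word.
            chunk = " " + word if prev_encoded else word
            result.append(q_encode_word(chunk, charset))
        else:
            result.append(word)
        prev_encoded = needs_encoding
    return " ".join(result)


def decode(header: str) -> str:
    """Decode with the standard library, used here only to check round trips."""
    parts = []
    for data, charset in decode_header(header):
        if isinstance(data, bytes):
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)
    return "".join(parts)


subject = "Příliš žluťoučký kůň"
encoded = encode_header(subject)
print(encoded)
print(decode(encoded) == subject)
```

Attaching the space to the second word rather than the first is an arbitrary choice in this sketch; either placement keeps the whitespace through decoding.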

