Yenya's World

Thu, 03 Apr 2008

EXIF Comment

For an internal project, we need to store comments inside JPEG images. I think the EXIF tag UserComment suits our purpose: we also need text in the Czech language, and the alternative tag, ImageDescription, is strictly US-ASCII only. Nevertheless, a problem remains in the character set area.

The EXIF standard (PDF warning, look at the 34th page, numbered "page 28" near the bottom) defines the UserComment data such that the first 8 bytes contain the charset info (the strings "ASCII", "JIS", or "UNICODE", padded to 8 bytes with null bytes), followed by the comment data. The problem is what "UNICODE" means: is it UTF-8, UTF-16, or something else?
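To make the layout described above concrete, here is a minimal sketch in Python (standard library only, chosen instead of Perl so that it runs without extra modules). The helper names build_user_comment and split_user_comment are hypothetical, and the sketch only handles the three charset strings mentioned above. It works on the raw tag value, not on a JPEG file.

```python
# Charset identifiers from the text above, each padded to exactly 8 bytes with NULs.
CHARSET_IDS = {
    "ASCII": b"ASCII\x00\x00\x00",
    "JIS": b"JIS\x00\x00\x00\x00\x00",
    "UNICODE": b"UNICODE\x00",
}


def build_user_comment(charset, payload):
    """Prefix already-encoded comment bytes with the 8-byte charset header."""
    return CHARSET_IDS[charset] + payload


def split_user_comment(raw):
    """Split a raw UserComment value into (charset name, payload bytes)."""
    if len(raw) < 8:
        raise ValueError("UserComment is shorter than the 8-byte header")
    header, payload = raw[:8], raw[8:]
    # Strip the NUL padding to recover the charset name.
    name = header.rstrip(b"\x00").decode("ascii")
    return name, payload


# Assumption for illustration: the payload is stored as UTF-8 bytes.
raw = build_user_comment("UNICODE", "Příliš žluťoučký kůň".encode("utf-8"))
name, payload = split_user_comment(raw)
print(name, payload)
```

The header says nothing about which Unicode encoding the payload uses, which is exactly the ambiguity discussed here: the splitter can recover the name "UNICODE", but decoding the payload still requires a guess.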

I have tried to set the comment using the Exiv2 utility, and to read it back with the Image::ExifTool Perl library. The following code prints the raw UserComment value (i.e. the string "UNICODE\0my_own_comment_as_utf8_bytes"):

#!/usr/bin/perl -w
use strict;
use Image::ExifTool;

# PrintConv => 0 returns the raw tag value instead of the print-converted one.
my $info = Image::ExifTool::ImageInfo("exif_comment.jpg",
       { Charset => "UTF8", PrintConv => 0 });
print $info->{UserComment}, "\n";

However, with PrintConv=>1 it prints garbage, so the UNICODE charset in EXIF probably means something other than UTF-8.
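One possible explanation for the garbage can be sketched in Python. This is a guess, not something confirmed here: it assumes the reading side treats a UNICODE payload as UTF-16 (little-endian is picked arbitrarily for the sketch), while the file actually holds UTF-8 bytes.

```python
text = "Příliš žluťoučký kůň"

# What the raw dump suggests is stored after the 8-byte header: UTF-8 bytes.
stored = text.encode("utf-8")

# Hypothetical reader that assumes UNICODE means UTF-16 (little-endian).
# errors="replace" keeps the sketch from raising on the odd trailing byte.
misread = stored.decode("utf-16-le", errors="replace")

# The same reader recovers the text when the payload really is UTF-16.
round_trip = text.encode("utf-16-le").decode("utf-16-le")

print(misread == text)      # the mismatched decode does not give the text back
print(round_trip == text)   # the matched encode/decode pair does
```

If this assumption holds, writing the comment as UTF-16 would make the print-converted value readable, but that would have to be checked against the actual tools.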

JPEG with EXIF comment

So, what does your favourite image handling program display as the EXIF UserComment for the above image? It should read: "Příliš žluťoučký kůň. こんにちは。".

Section: /computers

About:

Yenya's World: Linux and beyond - Yenya's blog.


Jan "Yenya" Kasprzak
