Yenya's World

Thu, 03 Apr 2008

EXIF Comment

For an internal project, we need to store comments inside JPEG images. I think the EXIF tag UserComment is suitable for our purpose: we also need text in Czech, and the alternative tag, ImageDescription, is limited to US-ASCII. Nevertheless, a problem remains in the character set area.

The EXIF standard (PDF warning, look at 34th page, numbered "page 28" near the bottom) defines the UserComment data such that the first 8 bytes contain the charset info (strings "ASCII", "JIS", or "UNICODE" padded to 8 bytes with null bytes), and then the comment data. The problem is what "UNICODE" means. Is it UTF-8, UTF-16, or what?
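
The layout described above can be sketched in a few lines. This is only an illustration in Python (not the Perl used elsewhere in this post); the function name split_user_comment is made up for the example, and it deliberately leaves the payload undecoded, since how to decode it is exactly the open question.

```python
def split_user_comment(raw):
    """Split a raw UserComment value into (charset code, payload bytes)."""
    if len(raw) < 8:
        raise ValueError("UserComment is shorter than the 8-byte charset header")
    # The first 8 bytes hold the charset name padded with NUL bytes.
    code = raw[:8].rstrip(b"\x00").decode("ascii")
    return code, raw[8:]


# Sample value built by hand: header "UNICODE\0" followed by UTF-8 bytes.
raw = b"UNICODE\x00" + "Příliš".encode("utf-8")
code, payload = split_user_comment(raw)
print(code)          # charset code taken from the header
print(len(payload))  # number of payload bytes that still need decoding
```

The header only names the charset family; for "UNICODE" the caller still has to decide which encoding to apply to the payload.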

I have tried to set the comment using the Exiv2 utility and to read it with the Image::ExifTool Perl library. The following code prints the raw UserComment value (i.e. the string "UNICODE\0my_own_comment_as_utf8_bytes"):

#!/usr/bin/perl -w
use strict;
use Image::ExifTool;

# PrintConv => 0 returns the raw tag value instead of the converted one
my $info = Image::ExifTool::ImageInfo("exif_comment.jpg",
       { Charset => "UTF8", PrintConv => 0 });
print $info->{UserComment}, "\n";

However, with PrintConv=>1 it prints garbage, so probably the UNICODE charset in EXIF means something different than UTF-8.
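
One possible explanation, offered as a guess rather than a confirmed diagnosis: a reader that treats "UNICODE" as UTF-16 would pair up the UTF-8 bytes into unrelated 16-bit code units. A small Python sketch of that effect, with the byte order picked arbitrarily as big-endian:

```python
text = "Příliš žluťoučký kůň. こんにちは。"
payload = text.encode("utf-8")  # the bytes that were written into the tag

# Decoding with the same encoding that was used for writing round-trips cleanly.
as_utf8 = payload.decode("utf-8")

# Assumption: the reader expects UTF-16 (big-endian here, chosen arbitrarily).
# errors="replace" lets decoding continue past undecodable or truncated input.
as_utf16 = payload.decode("utf-16-be", errors="replace")

print(as_utf8 == text)   # does the UTF-8 reading match the original text?
print(as_utf16 == text)  # does the UTF-16 reading match the original text?
print(as_utf16)          # what the mismatched reading looks like
```

This says nothing about what Image::ExifTool actually does internally; it only shows that the same bytes read under a different Unicode encoding look like garbage.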

JPEG with EXIF comment

So, what does your favourite image handling program display as the EXIF UserComment for the above image? It should read: "Příliš žluťoučký kůň. こんにちは。".


4 replies for this story:

misch wrote:

Firefox Exif Viewer 1.40 says:

User Comment (Hex) = 0x55,0x4e,0x49,0x43,0x4f,0x44,0x45,0x00,0x50,0xc5,0x99,0xc3,0xad,0x6c,0x69,0xc5,0xa1,0x20,0xc5,0xbe,0x6c,0x75,0xc5,0xa5,0x6f,0x75,0xc4,0x8d,0x6b,0xc3,0xbd,0x20,0x6b,0xc5,0xaf,0xc5,0x88,0x2e,0x20,0xe3,0x81,0x93,0xe3,0x82,0x93,0xe3,0x81,0xab,0xe3,0x81,0xa1,0xe3,0x81,0xaf,0xe3,0x80,0x82
User Comment Character Code = Unicode

So it recognizes Unicode text, but displays it as raw data :-(

Věroš wrote:

We have been using EXIF heavily at Cestovatel for more than two years. Most of the images are commented in Zoner Photo Studio, and their UNICODE EXIF is usually saved as UTF-16. BTW: Try XMP ( http://www.adobe.com/products/xmp/ ). It's an XML-based solution, so you don't have to bother with encoding.

Milan Zamazal wrote:

exiv2 displays it correctly; showfoto/digikam displays empty rectangles in place of all characters. Other programs I've tried either don't display user comments at all or display them as ordinary unknown tags (in hex).

Yenya wrote: OK, next try

OK, next try - this time in UTF-16. Exiv2 does not display it correctly, Image::ExifTool does. Please reload the above image and retry with it. Thanks!

About:

Yenya's World: Linux and beyond - Yenya's blog.

Jan "Yenya" Kasprzak
