Monday, January 16, 2006

EXIF tags


Some images come with embedded metadata, such as EXIF tags or XMP. Images from AntWeb are a good example. These tags can be viewed with programs such as Adobe Photoshop, or with utilities such as EXIF Viewer.

So, an obvious step (assuming we start using a triple store as a backend for iSpecies, and/or provide the results of a query in RDF) would be to extract metadata from EXIF tags. For example, the image http://www.antweb.org/images/casent0100367/casent0100367_p_1_low.jpg of Proceratium google in AntWeb has the following metadata:


File name: casent0100367_p_1_low.jpg
File size: 17811 bytes (0x0, infbpp, 0x)
EXIF Summary:

Camera-Specific Properties:

Camera Software: EXIFutils V2.5.7
Photographer: April Nobile

Image-Specific Properties:

Image Created: 2005:09:27 09:54:34
Comment: Attribution-NonCommercial-ShareAlike Creative Commons License

Other Properties:

Exif IFD Pointer: 196
Exif Version: 2.20


Hence, we could extract the relevant bits (author, date, copyright) and store those. This could be done in bulk using a tool such as ExifTool.
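
As a rough illustration of the extraction step, here is a minimal Python sketch that parses a "Key: Value" dump like the one above and turns the author, date, and license text into triples. The choice of Dublin Core properties (dc:creator, dc:date, dc:rights) and all function names are my assumptions for the sake of the example, not something iSpecies actually does. In practice the dump would come from a tool such as ExifTool rather than a hard-coded string.

```python
from datetime import datetime

# Sample dump copied from the listing above (a real pipeline would read tool output).
EXIF_DUMP = """\
Camera Software: EXIFutils V2.5.7
Photographer: April Nobile
Image Created: 2005:09:27 09:54:34
Comment: Attribution-NonCommercial-ShareAlike Creative Commons License
"""

# Assumed mapping from the viewer's labels to Dublin Core properties.
FIELD_TO_PREDICATE = {
    "Photographer": "dc:creator",
    "Image Created": "dc:date",
    "Comment": "dc:rights",
}


def parse_dump(text):
    """Split 'Key: Value' lines into a dict, ignoring lines without a value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def exif_date_to_iso(value):
    """Convert EXIF's 'YYYY:MM:DD HH:MM:SS' to ISO 8601."""
    return datetime.strptime(value, "%Y:%m:%d %H:%M:%S").isoformat()


def to_triples(image_url, fields):
    """Build (subject, predicate, object) tuples for the fields we care about."""
    triples = []
    for label, predicate in FIELD_TO_PREDICATE.items():
        if label in fields:
            value = fields[label]
            if predicate == "dc:date":
                value = exif_date_to_iso(value)
            triples.append((image_url, predicate, value))
    return triples


IMAGE_URL = "http://www.antweb.org/images/casent0100367/casent0100367_p_1_low.jpg"
triples = to_triples(IMAGE_URL, parse_dump(EXIF_DUMP))
for triple in triples:
    print(triple)
```

The image URL serves as the subject of each triple, so nothing beyond the file itself is needed to describe it.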

The example of AntWeb does show one weakness of free-text metadata. The image is licensed under "Attribution-NonCommercial-ShareAlike Creative Commons License". I'm assuming this is Attribution-NonCommercial-ShareAlike 2.0, but without a URL it is a faff to work this out. Ah, looking at the AntBase pages for individual specimens, it's actually 1.0. Yes, it's pretty obvious, but it still requires string matching. These things need to be computer readable as well, and versioned (for example, which version of this license was intended?).
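
To make the string-matching problem concrete, here is a small Python sketch that tries to turn a free-text license statement into a Creative Commons URL. The lookup table and the function name are hypothetical, and it assumes the license elements appear in the text in the same order as in the URL. The point is that without a version number in the text, the function has to give up.

```python
import re

# Creative Commons license URLs need both a license code and a version.
CC_URL = "http://creativecommons.org/licenses/{code}/{version}/"

# Assumed lookup from license element names (as written in free text) to URL codes.
ELEMENT_CODES = {
    "attribution": "by",
    "noncommercial": "nc",
    "sharealike": "sa",
    "noderivs": "nd",
}


def license_url(text):
    """Return a CC license URL, or None if the text does not pin down a version."""
    words = re.findall(r"[a-z]+", text.lower())
    codes = [ELEMENT_CODES[w] for w in words if w in ELEMENT_CODES]
    version = re.search(r"\b(\d+\.\d+)\b", text)
    if not codes or version is None:
        return None  # cannot tell which license (or which version) was meant
    return CC_URL.format(code="-".join(codes), version=version.group(1))


# The comment embedded in the AntWeb image carries no version, so no URL can be built.
print(license_url("Attribution-NonCommercial-ShareAlike Creative Commons License"))
# With the version stated, the match is unambiguous.
print(license_url("Attribution-NonCommercial-ShareAlike 1.0"))
```

Storing the license URL in the image in the first place would make all of this guesswork unnecessary.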



For the photographer (such as AntWeb's April Nobile), it might be useful to create a FOAF file to link to, so that we have metadata about the creator of the images.
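
A FOAF description of this kind can be very small. The Python sketch below renders one in Turtle syntax, using the FOAF terms foaf:Person, foaf:name, and foaf:made. The person URI is a made-up placeholder under example.org and the function name is hypothetical; a real file would use a URI that the photographer (or AntWeb) controls.

```python
FOAF_TEMPLATE = """\
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<{person_uri}> a foaf:Person ;
    foaf:name "{name}" ;
    foaf:made <{image_url}> .
"""


def foaf_turtle(person_uri, name, image_url):
    """Render a tiny FOAF description of a photographer in Turtle syntax."""
    # Escape backslashes and double quotes so the name stays a valid Turtle literal.
    safe_name = name.replace("\\", "\\\\").replace('"', '\\"')
    return FOAF_TEMPLATE.format(
        person_uri=person_uri, name=safe_name, image_url=image_url
    )


# The person URI is a placeholder, not a real identifier.
doc = foaf_turtle(
    "http://example.org/people/april-nobile#me",
    "April Nobile",
    "http://www.antweb.org/images/casent0100367/casent0100367_p_1_low.jpg",
)
print(doc)
```

The image metadata could then point at the person URI instead of repeating the photographer's name as a plain string.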

4 comments:

Anonymous said...

EXIF is a rather old, rather limited, albeit widely implemented, image metadata standard. More heavy-duty ones, including some which have XML representations and have been accepted by various actual standards bodies, are sketched in my chapter on image metadata in the recently published ENBI publication "Best Practices for Imaging Type Specimens." Late versions of most of the articles from this book are at http://wiki.cs.umb.edu/twiki/bin/view/BDEI/ENBITypeImaging

FWIW, the major modern standards for image metadata are heavyweight, industrial-grade standards with support in the major image processing tools, unlike the RDF attempts at image metadata, which to the best of my knowledge remain "examples". RDF has a (substantial) virtue in its extensibility, but it has no support in mature image processing tools (AFAIK). What this means to me is that it will remain a good tool for image discovery but not, judging by acceptance, for content description. Oops. Am I spouting off in the wrong blog?

Roderic Page said...

Sure, EXIF is limited. My post was motivated by the perception that adding metadata to things can be tedious: we are all in a hurry and have too much to do, and if the kinds of projects we are talking about are going to fly, ideally they will be as easy to use as possible.

In the case of images, if the relevant metadata is stored in the image file itself (in whatever format, so long as it can be extracted), then this makes life simpler. Uploading a bunch of annotated images can be as simple as dumping them in a folder on a web site and publishing the URL. The images can then be sucked up, the metadata extracted and dumped into a triple store. The person providing the images doesn't have to do anything else, and most importantly doesn't duplicate any effort. If the image has metadata, they don't need to fill in any tedious web forms. EXIF is just one example of embedded metadata.

In the end it's all about simplicity and scalability. Without these, it will nae fly.

Anonymous said...

We are in perfect agreement. I forgot to mention that JPEG2000 has a much better story than JPEG about embedded metadata, whether in EXIF, DIG35, RDF, or anything else. Really, my only point is that there are no (known to me) image management and processing tools in wide use by biologists, say as wide as Photoshop or NIHImage, that manage embedded RDF metadata. I hope I'm wrong, because apparently I have been volunteered to help produce an LSID resolver for images. All pointers would be appreciated, as would be any reference to an RDF representation of DIG35. To me that is important, because DIG35 has a sucky story on content metadata and extending it with RDF would be wonderful.
