Tuesday, August 29, 2006

Extracting DOIs

DOIs are pretty cool, so I spent a little time this evening working out how to extract DOIs from Google Scholar results for journals hosted by Springer, JSTOR, and J-Stage. I've also added code to extract Serial Item and Contribution Identifiers (SICIs) from JSTOR URLs. SICI is NISO standard Z39.56.
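As a rough illustration of the idea (not the code I actually wrote), here is a minimal Python sketch that pulls a DOI or a SICI out of a link. The regular expression is a common heuristic for DOI-shaped strings rather than an official grammar, the function names `extract_doi` and `extract_sici` are made up for this example, and the assumption that JSTOR links carry the SICI in a `sici=` query parameter should be checked against real URLs.

```python
import re
from urllib.parse import unquote

# Heuristic only: a DOI starts with "10.", a numeric registrant prefix,
# a slash, and then a suffix. We stop at whitespace, quotes, and the
# characters that usually end a URL path or query value.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"?#&]+')

# Assumption: the SICI is carried in a "sici=" query parameter.
SICI_PARAM = re.compile(r'[?&]sici=([^&#]+)')


def extract_doi(url):
    """Return the first DOI-like string found in a URL, or None."""
    decoded = unquote(url)  # DOIs are often percent-encoded inside links
    match = DOI_PATTERN.search(decoded)
    if match is None:
        return None
    # Drop trailing punctuation that is unlikely to belong to the DOI.
    return match.group(0).rstrip('.,;')


def extract_sici(url):
    """Return the percent-decoded value of the sici= parameter, or None."""
    match = SICI_PARAM.search(url)
    if match is None:
        return None
    return unquote(match.group(1))


if __name__ == "__main__":
    # Placeholder URLs, not real article links.
    print(extract_doi("http://example.org/content/10.1007/s00000-000-0000-0"))
    print(extract_sici("http://example.org/sici?sici=0000-0000(200001)1%3A1%3C1%3ATITLE%3E2.0.CO%3B2-X"))
```

The SICI is matched with a regular expression rather than a general query-string parser because SICIs contain semicolons, which some parsers treat as parameter separators.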

The point of this exercise is to get DOIs for as many articles as possible, because DOIs are the GUID of choice for publications, and we can retrieve metadata for a DOI either directly using CrossRef's OpenURL resolver, or via Connotea. This will make life easier for the next step, namely aggregating literature into a triple store.
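For the CrossRef route, a lookup boils down to building a query URL around the DOI. The sketch below only constructs that URL and makes no network request. The endpoint and the `pid`, `id`, and `noredirect` parameter names are as I understand the CrossRef OpenURL interface, so treat them as assumptions and check them against CrossRef's documentation; `crossref_openurl_query` and the e-mail address are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for CrossRef's OpenURL resolver.
CROSSREF_OPENURL = "https://www.crossref.org/openurl"


def crossref_openurl_query(doi, account_email):
    """Build (but do not send) an OpenURL metadata query for a DOI."""
    params = {
        "pid": account_email,   # assumed: identifies the querying account
        "id": "doi:" + doi,     # the DOI, prefixed with the "doi:" scheme
        "noredirect": "true",   # assumed: return metadata, do not redirect
    }
    # urlencode percent-encodes characters such as ":" "/" and "@".
    return CROSSREF_OPENURL + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder DOI and address, for illustration only.
    print(crossref_openurl_query("10.1234/abc", "me@example.org"))
```

The response would then need to be parsed into whatever fields the triple store expects, which is outside the scope of this sketch.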