Saturday, July 08, 2006

Adding sources to iSpecies

One issue which comes up every so often is how to add data sources to iSpecies. At present iSpecies queries NCBI, Yahoo images, and Google Scholar, each source requiring different code to make the query and handle the response. If adding new sources requires writing code specific to that source then iSpecies would rapidly become a nightmare (leaving aside the issue that until iSpecies is multithreaded, adding additional sources slows everything down -- see my earlier post about the need for speed).

One solution is to develop a standard search interface and ask data sources to adopt it. The obvious candidate is OpenSearch, which I've already touched on over at iPhylo. OpenSearch is appealing because it is no more difficult than serving RSS feeds, and because it is based on RSS it can be integrated into a range of tools, such as Amazon's A9 and Internet Explorer 7.
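To make the appeal concrete, here is a minimal Python sketch of how a single piece of code could handle results from any source that speaks OpenSearch. It is not what iSpecies actually does. The sample response, the example.org URLs, and the function name parse_opensearch_rss are all made up for illustration, and I'm assuming the OpenSearch 1.1 namespace (a given source may use an earlier version with a different namespace).

```python
import xml.etree.ElementTree as ET

# Assumption: the source uses the OpenSearch 1.1 response namespace.
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Hypothetical response from a data source; a real client would fetch this over HTTP.
SAMPLE_RESPONSE = """<rss version="2.0"
     xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Example source: Apis mellifera</title>
    <opensearch:totalResults>2</opensearch:totalResults>
    <item>
      <title>Record one</title>
      <link>http://example.org/record/1</link>
      <description>First hypothetical record</description>
    </item>
    <item>
      <title>Record two</title>
      <link>http://example.org/record/2</link>
      <description>Second hypothetical record</description>
    </item>
  </channel>
</rss>"""


def parse_opensearch_rss(xml_text):
    """Parse an OpenSearch RSS response into a plain dictionary.

    The same function works for every source that returns this format,
    so no source-specific response handling is needed.
    """
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    total = channel.findtext("{%s}totalResults" % OPENSEARCH_NS)
    items = []
    for item in channel.findall("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "description": item.findtext("description"),
        })
    return {
        "total": int(total) if total is not None else None,
        "items": items,
    }


result = parse_opensearch_rss(SAMPLE_RESPONSE)
for hit in result["items"]:
    print(hit["title"], hit["link"])
```

With something like this in place, adding a source would mean adding its search URL to a list rather than writing a new parser.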

At a minimum, it would be useful if sources supported OpenSearch. It would also be useful if they supported RSS to serve individual records. This is handy because NCBI links to numerous sources via LinkOut, and hence we could avoid the overhead of doing a search if we can retrieve the record directly (i.e., if NCBI has a link then I already know the information exists).

In saying "RSS", I should stress that I really mean RSS 1.0 (i.e., RDF). RSS 2.0 and Atom are a lot less useful in the long run, because it is RSS 1.0 that can be integrated into a triple store, which opens up a world of cool things (e.g., aggregating data and performing queries on that data).
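As a rough illustration of why this matters, the Python sketch below treats RSS 1.0 items as (subject, predicate, object) triples, merges two feeds into one set, and runs a simple pattern query over the result. Everything here is hypothetical: the two feeds, the example.org URLs, and the helper names rss1_to_triples and match are invented for the example. It also handles only the simplest case (items with an rdf:about attribute and literal-valued child elements); a real triple store and a proper RDF/XML parser would do far more.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"

# Two hypothetical RSS 1.0 feeds from different sources.
FEED_A = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/a/1">
    <title>Honey bee specimen</title>
    <dc:subject>Apis mellifera</dc:subject>
  </item>
</rdf:RDF>"""

FEED_B = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/b/7">
    <title>Honey bee image</title>
    <dc:subject>Apis mellifera</dc:subject>
  </item>
  <item rdf:about="http://example.org/b/8">
    <title>Bumble bee image</title>
    <dc:subject>Bombus terrestris</dc:subject>
  </item>
</rdf:RDF>"""


def rss1_to_triples(xml_text):
    """Turn each RSS 1.0 item into (subject, predicate, literal) triples.

    Simplification: only literal-valued child elements are handled.
    """
    triples = set()
    root = ET.fromstring(xml_text)
    for item in root.findall("{%s}item" % RSS):
        subject = item.get("{%s}about" % RDF)
        for prop in item:
            # ElementTree tags look like "{namespace}local";
            # joining the two parts gives the predicate URI.
            predicate = prop.tag[1:].replace("}", "")
            if prop.text and prop.text.strip():
                triples.add((subject, predicate, prop.text.strip()))
    return triples


def match(store, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Aggregating sources is just a set union.
store = rss1_to_triples(FEED_A) | rss1_to_triples(FEED_B)

# Query across both sources: which records are about Apis mellifera?
hits = sorted(t[0] for t in match(store, p=DC + "subject", o="Apis mellifera"))
print(hits)
```

The point is that once every source emits triples, merging sources and querying across them needs no knowledge of where each record came from.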