Tuesday, March 25, 2008

Wikipedia on iSpecies


I've added snippets from Wikipedia to iSpecies results, in part inspired by FreeBase. This makes use of the XML export format. For example, the URL http://en.wikipedia.org/wiki/Special:Export/Luzon_Montane_Forest_Mouse returns XML, with the wiki markup enclosed in the tags <text xml:space="preserve"></text>. I use some simple regular expressions to strip out some of the markup, including the taxobox, then grab the first 100 words of the article to display on the iSpecies page (together with a link to the original article).
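
To give a flavour of that step, here is a minimal sketch in Python. It is not the actual iSpecies code: the function names and the particular regular expressions are my own assumptions for illustration, and real wiki markup has many cases (tables, image captions, nested links) that this does not handle.

```python
import re
import xml.etree.ElementTree as ET


def extract_wikitext(export_xml):
    """Return the contents of the first <text> element in the export XML."""
    root = ET.fromstring(export_xml)
    for elem in root.iter():
        # Tags are namespaced ("{namespace}text"), so compare the local name only.
        if elem.tag.rsplit("}", 1)[-1] == "text":
            return elem.text or ""
    return ""


def strip_markup(wikitext):
    """Crudely remove templates (including the taxobox), refs, links and quotes."""
    # Remove {{...}} templates, innermost first, until none are left.
    previous = None
    while previous != wikitext:
        previous = wikitext
        wikitext = re.sub(r"\{\{[^{}]*\}\}", "", wikitext)
    # Remove self-closing refs first, then paired refs, then any other tags.
    wikitext = re.sub(r"<ref[^>]*/>", "", wikitext)
    wikitext = re.sub(r"<ref[^>]*>.*?</ref>", "", wikitext, flags=re.DOTALL)
    wikitext = re.sub(r"<[^>]+>", "", wikitext)
    # [[target|label]] becomes label, [[target]] becomes target.
    wikitext = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", wikitext)
    # Drop the '' and ''' used for italics and bold.
    wikitext = re.sub(r"'{2,}", "", wikitext)
    return wikitext


def first_words(text, limit=100):
    """Return the first `limit` whitespace-separated words of `text`."""
    return " ".join(text.split()[:limit])


def snippet_from_export(export_xml, limit=100):
    """Export XML in, plain-text snippet of at most `limit` words out."""
    return first_words(strip_markup(extract_wikitext(export_xml)), limit)
```

Calling snippet_from_export() on the XML returned by a Special:Export URL gives the text to show on the page; fetching the XML is left out of the sketch.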

Because a species may have multiple names, we need to handle redirection. For example, the URL http://en.wikipedia.org/wiki/Special:Export/Apomys_datae returns
<text xml:space="preserve">#Redirect [[Luzon Montane Forest Mouse]]</text>

which tells us that the content is to be found at http://en.wikipedia.org/wiki/Special:Export/Luzon_Montane_Forest_Mouse.
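
Following the redirect could look something like the Python sketch below. Again this is an illustration, not the iSpecies code: the helper names, the hop limit of 5, and the idea of passing in a fetch_wikitext function (which takes a title and returns the wiki markup for it) are all assumptions of mine.

```python
import re
import urllib.parse

# Matches "#Redirect [[Target]]" at the start of the markup, in any letter case.
REDIRECT_RE = re.compile(r"^\s*#redirect\s*:?\s*\[\[([^\]|#]+)", re.IGNORECASE)


def redirect_target(wikitext):
    """Return the title a redirect page points to, or None if not a redirect."""
    match = REDIRECT_RE.match(wikitext)
    return match.group(1).strip() if match else None


def export_url(title):
    """Build the Special:Export URL for a page title (spaces become underscores)."""
    name = urllib.parse.quote(title.strip().replace(" ", "_"))
    return "http://en.wikipedia.org/wiki/Special:Export/" + name


def resolve_title(title, fetch_wikitext, max_hops=5):
    """Follow redirects from `title`; return (final_title, wikitext).

    `fetch_wikitext` is a caller-supplied function (hypothetical here) that
    takes a title and returns the wiki markup of that page.
    """
    seen = set()
    for _ in range(max_hops):
        if title in seen:
            raise ValueError("redirect loop at " + title)
        seen.add(title)
        wikitext = fetch_wikitext(title)
        target = redirect_target(wikitext)
        if target is None:
            return title, wikitext
        title = target
    raise ValueError("too many redirects")
```

The hop limit and the set of visited titles are only there so that a chain of redirects, or two pages redirecting to each other, can't keep the script going forever.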

There's still some polishing to do, but the Wikipedia snippets add something to the iSpecies results.