<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-18671685</id><updated>2011-04-22T05:16:52.658+01:00</updated><category term='XML'/><category term='Wikipedia'/><title type='text'>iSpecies</title><subtitle type='html'>A record of the development of the iSpecies search engine.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>39</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-18671685.post-8363255126460274097</id><published>2008-06-10T23:29:00.001+01:00</published><updated>2008-06-10T23:32:24.765+01:00</updated><title type='text'>Offline</title><content type='html'>iSpecies was off-line for a few hours today. 
I moved it from a local folder in my user folder to the &lt;font face="Courier"&gt;/Library/WebServer&lt;/font&gt; folder on the web server, and associated ispecies.org with its own IP address (although it is still served from the same machine). Glasgow University's DNS seems to take a while to update, so the site appeared to be broken. A quick external check using &lt;a href="http://network-tools.com/default.asp?prog=lookup&amp;host=ispecies.org"&gt;Network-Tools.com&lt;/a&gt; confirmed that ispecies.org had the new IP address, but locally it was still resolving to the holding page of &lt;a href="http://www.123-reg.co.uk/"&gt;123-reg&lt;/a&gt;, with whom I registered the domain. By fussing with the &lt;font face="Courier"&gt;VirtualHost&lt;/font&gt; directive in the Apache httpd.conf file, I managed to get it working again:&lt;br /&gt;&lt;pre style="border: 1px solid #c7cfd5;background: #f1f5f9;padding:15px;"&gt;NameVirtualHost 130.209.46.63&lt;br /&gt;&amp;lt;VirtualHost 130.209.46.63&amp;gt;&lt;br /&gt;   DocumentRoot "/Library/WebServer/ispecies"&lt;br /&gt;   ServerName ispecies.org&lt;br /&gt;   ServerSignature email&lt;br /&gt;   DirectoryIndex index.php index.html index.htm index.shtml&lt;br /&gt;   LogLevel warn&lt;br /&gt;   HostNameLookups off&lt;br /&gt;   &amp;lt;Directory "/Library/WebServer/ispecies"&amp;gt;&lt;br /&gt;      allow from all&lt;br /&gt;      Options +Indexes&lt;br /&gt;   &amp;lt;/Directory&amp;gt;&lt;br /&gt;&amp;lt;/VirtualHost&amp;gt;&lt;/pre&gt;&lt;br /&gt;The only difference users may notice is that the URLs will now always start with http://ispecies.org.</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/8363255126460274097/comments/default' title='Post 
Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=8363255126460274097' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/8363255126460274097'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/8363255126460274097'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2008/06/offline.html' title='Offline'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-373737525580032559</id><published>2008-03-25T16:04:00.004Z</published><updated>2008-03-25T16:23:23.798Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Wikipedia'/><category scheme='http://www.blogger.com/atom/ns#' term='XML'/><title type='text'>Wikipedia on iSpecies</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Gct8lVAxKqQ/R-kiy04mbPI/AAAAAAAAALc/AdegAYqZ7O0/s1600-h/Nohat-logo-nowords-bgwhite-200px.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_Gct8lVAxKqQ/R-kiy04mbPI/AAAAAAAAALc/AdegAYqZ7O0/s320/Nohat-logo-nowords-bgwhite-200px.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5181711102851312882" /&gt;&lt;/a&gt;&lt;br /&gt;I've added snippets from &lt;a href="http://en.wikipedia.org/wiki/Main_Page"&gt;Wikipedia&lt;/a&gt; to iSpecies results, in part inspired by &lt;a href="http://www.freebase.com/"&gt;Freebase&lt;/a&gt;. This makes use of Wikipedia's &lt;a href="http://en.wikipedia.org/wiki/Special:Export"&gt;XML export format&lt;/a&gt;. 
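&lt;br /&gt;&lt;br /&gt;Before the examples, here is one way to sketch the processing described in this post in Python: pulling the wiki text out of the export XML, stripping the taxobox and other markup, keeping the first 100 words, and following #Redirect pages. It is a minimal sketch under stated assumptions, not the code iSpecies actually runs. The regular expressions are simplified stand-ins, the function names and User-Agent string are hypothetical, and references and image links are not cleaned up properly.&lt;pre style="border: 1px solid #c7cfd5;background: #f1f5f9;padding:15px;"&gt;
```python
import re
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EXPORT_BASE = "http://en.wikipedia.org/wiki/Special:Export/"

# Matches "#Redirect [[Target]]" (any capitalisation of #redirect)
REDIRECT = re.compile(r"\s*#redirect\s*\[\[([^\]|#]+)", re.IGNORECASE)


def export_url(title):
    """Build a Special:Export URL, with spaces turned into underscores."""
    return EXPORT_BASE + urllib.parse.quote(title.replace(" ", "_"))


def wikitext_from_export(xml_bytes):
    """Return the contents of the first text element in an export document."""
    root = ET.fromstring(xml_bytes)
    for elem in root.iter():
        # Export XML is namespaced, so compare only the local part of the tag
        if elem.tag.rsplit("}", 1)[-1] == "text":
            return elem.text or ""
    return None


def redirect_target(wikitext):
    """Return the target title if the wiki text is a redirect, else None."""
    match = REDIRECT.match(wikitext)
    return match.group(1).strip() if match else None


def snippet(wikitext, n_words=100):
    """Strip templates (taxobox included), links and bold/italic quotes, then truncate."""
    text = wikitext
    while True:
        # Delete innermost {{...}} first, so nested templates disappear too
        stripped = re.sub(r"\{\{[^{}]*\}\}", "", text)
        if stripped == text:
            break
        text = stripped
    # [[Target|label]] becomes label, [[Target]] becomes Target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    text = text.replace("'''", "").replace("''", "")
    return " ".join(text.split()[:n_words])


def fetch_wikitext(title):
    """Download one export page (needs network access)."""
    request = urllib.request.Request(
        export_url(title), headers={"User-Agent": "ispecies-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return wikitext_from_export(response.read())


def resolve(title, fetch=fetch_wikitext, max_redirects=3):
    """Follow redirect pages until real content is found; return (title, text)."""
    for _ in range(max_redirects + 1):
        wikitext = fetch(title)
        target = redirect_target(wikitext or "")
        if target is None:
            return title, wikitext
        title = target
    raise RuntimeError("too many redirects ending at " + title)
```
&lt;/pre&gt;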
For example, the URL &lt;a href="http://en.wikipedia.org/wiki/Special:Export/Luzon_Montane_Forest_Mouse"&gt;http://en.wikipedia.org/wiki/Special:Export/Luzon_Montane_Forest_Mouse&lt;/a&gt; returns XML, with the wiki markup enclosed in the tags &amp;lt;text xml:space="preserve"&amp;gt;&amp;lt;/text&amp;gt;. I use some simple regular expressions to strip out some of the markup, including the &lt;a href="http://en.wikipedia.org/wiki/Wikipedia:TAXOBOX"&gt;taxobox&lt;/a&gt;, then I grab the first 100 words of the article to display on the iSpecies page (together with a link to the original article).&lt;br /&gt;&lt;br /&gt;Because a species may have multiple names, we need to handle redirection. For example, the URL &lt;a href="http://en.wikipedia.org/wiki/Special:Export/Apomys_datae"&gt;http://en.wikipedia.org/wiki/Special:Export/Apomys_datae&lt;/a&gt; returns&lt;br /&gt;&lt;pre style="border: 1px solid #c7cfd5;background: #f1f5f9;padding:15px;"&gt;&amp;lt;text xml:space="preserve"&amp;gt;#Redirect [[Luzon Montane Forest Mouse]]&amp;lt;/text&amp;gt;&lt;/pre&gt;&lt;br /&gt;which tells us that the content is to be found at &lt;a href="http://en.wikipedia.org/wiki/Special:Export/Luzon_Montane_Forest_Mouse"&gt;http://en.wikipedia.org/wiki/Special:Export/Luzon_Montane_Forest_Mouse&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;There's still some polishing to do, but the Wikipedia snippets are a useful addition to the iSpecies results.</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/373737525580032559/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=373737525580032559' title='34 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/18671685/posts/default/373737525580032559'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/373737525580032559'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2008/03/wikipedia-on-ispecies.html' title='Wikipedia on iSpecies'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_Gct8lVAxKqQ/R-kiy04mbPI/AAAAAAAAALc/AdegAYqZ7O0/s72-c/Nohat-logo-nowords-bgwhite-200px.jpg' height='72' width='72'/><thr:total>34</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-2279268323798462594</id><published>2007-08-30T15:22:00.000+01:00</published><updated>2007-08-30T16:46:48.120+01:00</updated><title type='text'>Maps, and a Google tweak</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_Gct8lVAxKqQ/RtbS4Wm8lRI/AAAAAAAAAFM/yddprEpSTqg/s1600-h/widgetIcon.png"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://2.bp.blogspot.com/_Gct8lVAxKqQ/RtbS4Wm8lRI/AAAAAAAAAFM/yddprEpSTqg/s320/widgetIcon.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5104499093254477074" /&gt;&lt;/a&gt;&lt;br /&gt;Today I stumbled across the &lt;a href="http://www.apple.com/downloads/dashboard/information/speciesdistributionmap.html"&gt;Species Distribution Widget&lt;/a&gt; from GBIF (written by Tim Robertson and Dave Martin). For Mac OS X 10.4 users, this provides a cool way to quickly get a distribution map for a taxon. 
Given that &lt;a href="http://www.apple.com/macosx/features/dashboard/"&gt;Apple Dashboard widgets&lt;/a&gt; are essentially JavaScript and HTML, it occurred to me to reverse-engineer the widget to see what it did. To look inside the widget, just Ctrl-click on the widget icon and select &lt;strong&gt;Show Package Contents&lt;/strong&gt;; the contents open in a Finder window.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Gct8lVAxKqQ/RtbXS2m8lTI/AAAAAAAAAFc/MiUNKIo4W70/s1600-h/folder2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_Gct8lVAxKqQ/RtbXS2m8lTI/AAAAAAAAAFc/MiUNKIo4W70/s400/folder2.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5104503946567521586" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The guts of the widget are in the &lt;strong&gt;scripts&lt;/strong&gt; folder, which contains a JavaScript file. The widget calls the URL &lt;strong&gt;http://data.gbif.org/species/taxonName/ajax/returnType/concept/view/ajaxMapUrls/provider/1/?query=&lt;/strong&gt;, to which is appended the taxon name you are searching for. Back comes the result in XML. 
For example, &lt;a href="http://data.gbif.org/species/taxonName/ajax/returnType/concept/view/ajaxMapUrls/provider/1/?query=Apus+apus" target="_blank"&gt;searching for &lt;i&gt;Apus apus&lt;/i&gt;&lt;/a&gt; returns:&lt;br /&gt;&lt;pre style="border: 1px solid #c7cfd5;background: #f1f5f9;padding:15px;"&gt;&amp;lt;taxons&amp;gt;&lt;br /&gt; &amp;lt;taxon&amp;gt;&lt;br /&gt;  &amp;lt;name&amp;gt;Apus apus&amp;lt;/name&amp;gt;&lt;br /&gt;  &amp;lt;commonName&amp;gt;Common swift&amp;lt;/commonName&amp;gt;&lt;br /&gt;  &amp;lt;key&amp;gt;13836131&amp;lt;/key&amp;gt;&lt;br /&gt;  &amp;lt;url&amp;gt;species/13836131/overviewMap.png&amp;lt;/url&amp;gt;&lt;br /&gt; &amp;lt;/taxon&amp;gt;&lt;br /&gt;&amp;lt;/taxons&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(Shouldn't "taxons" be "taxa"?) The URL of the corresponding map is given in the &amp;lt;url&amp;gt; tag. Append this to &lt;strong&gt;http://data.gbif.org/&lt;/strong&gt; and you have the URL for the image of the map. For example, here's the map for &lt;i&gt;Apus apus&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Gct8lVAxKqQ/RtbYyGm8lUI/AAAAAAAAAFk/jJBYfh3Rxto/s1600-h/overviewMap.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_Gct8lVAxKqQ/RtbYyGm8lUI/AAAAAAAAAFk/jJBYfh3Rxto/s400/overviewMap.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5104505582950061378" /&gt;&lt;/a&gt;&lt;br /&gt;I've added code to do this to iSpecies, so it now features maps from GBIF. 
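&lt;br /&gt;&lt;br /&gt;The two steps can be sketched in Python: append the taxon name to the query URL, then prefix the relative url value with http://data.gbif.org/. This is a minimal sketch that assumes the response looks exactly like the Apus apus example above. It is not the code added to iSpecies, and the function names are hypothetical.&lt;pre style="border: 1px solid #c7cfd5;background: #f1f5f9;padding:15px;"&gt;
```python
import urllib.parse
import xml.etree.ElementTree as ET

# Query URL used by the widget, as described above (the service may no longer respond)
QUERY_BASE = ("http://data.gbif.org/species/taxonName/ajax/returnType/concept"
              "/view/ajaxMapUrls/provider/1/?query=")
GBIF_BASE = "http://data.gbif.org/"


def query_url(taxon_name):
    """Append the taxon name to the query URL; spaces become '+'."""
    return QUERY_BASE + urllib.parse.quote_plus(taxon_name)


def map_urls(xml_bytes):
    """Return (name, map image URL) pairs from a taxons response."""
    root = ET.fromstring(xml_bytes)
    results = []
    for taxon in root.iter("taxon"):
        relative = taxon.findtext("url")
        if relative:
            # The url element is relative to http://data.gbif.org/
            results.append((taxon.findtext("name"),
                            urllib.parse.urljoin(GBIF_BASE, relative)))
    return results
```
&lt;/pre&gt;Fetching the XML itself would be a call such as urllib.request.urlopen(query_url(name)).read(), although the data.gbif.org service described here may no longer respond.&lt;br /&gt;&lt;br /&gt;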
I've also finally tweaked the Google code to stop mangling UTF-8 characters.</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/2279268323798462594/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=2279268323798462594' title='61 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/2279268323798462594'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/2279268323798462594'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2007/08/maps-and-google-tweak.html' title='Maps, and a Google tweak'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_Gct8lVAxKqQ/RtbS4Wm8lRI/AAAAAAAAAFM/yddprEpSTqg/s72-c/widgetIcon.png' height='72' width='72'/><thr:total>61</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-716871994961433071</id><published>2007-03-05T15:38:00.000Z</published><updated>2007-03-05T15:44:46.613Z</updated><title type='text'>5 Ways to Mix, Rip, and Mash Your Data</title><content type='html'>&lt;a href="http://www.simon.rycroft.name/"&gt;Simon Rycroft&lt;/a&gt; spotted a comparison of mashup tools by Nick Gonzalez, entitled &lt;a href="http://www.techcrunch.com/2007/03/02/5-ways-to-mix-rip-and-mash-your-data/"&gt;5 Ways to Mix, Rip, and Mash Your 
Data&lt;/a&gt;.&lt;br /&gt;&lt;blockquote&gt;Call them pipes, teqlos, dapps, modules, mashups or whatever else but fact is that recently we have seen a good number of new services that allow developers and users to build mini-apps and mashups that mix and re-mix data. Here we run through 5 applications that allow you to mix, rip and mash your data, looking at the data input, output, REST support, suggested use, and required skill level.&lt;/blockquote&gt;&lt;br /&gt;&lt;a href="http://www.techcrunch.com/wp-content/mashfeatcomp.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px;" src="http://www.techcrunch.com/wp-content/mashfeatcomp.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;Clearly, these tools are attracting a lot of attention.</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/716871994961433071/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=716871994961433071' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/716871994961433071'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/716871994961433071'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2007/03/5-ways-to-mix-rip-and-mash-your-data.html' title='5 Ways to Mix, Rip, and Mash Your Data'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-117294883084006711</id><published>2007-03-03T19:07:00.000Z</published><updated>2007-03-03T19:07:10.880Z</updated><title type='text'>Wikis and the future of iSpecies</title><content type='html'>So, where next for iSpecies? An obvious route seems to be adding a wiki, something I've discussed on &lt;a href="http://semant.blogspot.com/2006/08/wikis_28.html"&gt;SemAnt&lt;/a&gt;. Imagine pre-populating a wiki with search results from iSpecies, especially if we drilled down using the links in the NCBI search results to extract further content, and made use of the improved mapping between NCBI and TreeBASE names (&lt;a href="http://linnaeus.zoology.gla.ac.uk/~rpage/tbmap/"&gt;TBMap&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;A few things have stopped me from implementing this. One is the problem that wikis are (usually) just unstructured text. However, semantic wikis are starting to emerge (e.g., &lt;a href="http://wiki.ontoworld.org/wiki/Semantic_MediaWiki"&gt;Semantic MediaWiki&lt;/a&gt; and &lt;a href="http://www.liminalzone.org/Rhizome"&gt;Rhizome&lt;/a&gt;; I'll be adding links to more at &lt;a href="http://del.icio.us/rdmpage/semantic-wiki"&gt;del.icio.us/rdmpage/semantic-wiki&lt;/a&gt;). Using a semantic wiki means we can enter structured information and render it as RDF, which would make it potentially a great way to capture basic facts (triples) about a taxon while still having human-readable and editable documents.&lt;br /&gt;&lt;br /&gt;I've been pondering this, and toying with either writing something myself or using an off-the-shelf solution. It's likely that I'll write something, because I want to link it to a triple store, and I want to pre-populate the wiki as much as possible.&lt;br /&gt;&lt;br /&gt;One minor thing that has been holding me back is thinking about URLs to link to the content. 
For example, I'd like to be able to do the following:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Link to a page by either a unique numerical identifier (e.g., "wiki/0342001") or a name (e.g., "wiki/Physeter catodon"). If the user enters the numerical version, they get directed to the text identifier.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;If a name is a synonym, redirect the user to the page for the accepted name. For example, "wiki/Physeter macrocephalus" would redirect to "wiki/Physeter catodon".&lt;/li&gt;&lt;br /&gt;&lt;li&gt;If the name is a homonym, display a disambiguation page listing the different taxa with that name.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;If a user creates a URL that doesn't exist, the wiki would offer to make a new page, after checking that the URL tag is a scientific name (say by using uBio's &lt;a href="http://www.ubio.org/index.php?pagename=xml_services"&gt;XML web service&lt;/a&gt;).&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;I've been learning about the joys of Apache's mod_rewrite, which looks like a nice way to deal with some of these issues. For example, this .htaccess file handles both numerical and text identifiers:&lt;br /&gt;&lt;pre style="border: 1px solid #c7cfd5;background: #f1f5f9;padding:15px;"&gt;# Turn on the rewriting engine&lt;br /&gt;RewriteEngine On&lt;br /&gt;# Don't rewrite calls to the script itself (otherwise the rules would loop)&lt;br /&gt;RewriteRule ^get\.php$        -                  [L]&lt;br /&gt;# URL is numerical id&lt;br /&gt;RewriteRule ^([0-9]+)$        get.php?id=$1      [L]&lt;br /&gt;# URL is tag name&lt;br /&gt;RewriteRule ^([A-Za-z].*)$    get.php?name=$1    [L]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Then, the code in get.php would display the appropriate page. If the parameter is a numerical id, it's a simple database lookup (numerical identifiers are great because databases handle them easily, and they can be stored without worrying about issues such as capitalisation and punctuation). 
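&lt;br /&gt;&lt;br /&gt;Mirroring the rewrite rules in a few lines of Python is a cheap way to check which query string a given path would be sent to. This is a sketch only. It assumes fully anchored patterns (get\.php, [0-9]+ and [A-Za-z].*, so that an empty path is not treated as an id) and takes the path relative to the wiki folder. It also ignores the finer details of how Apache re-runs rules after an internal rewrite. The function name is hypothetical.&lt;pre style="border: 1px solid #c7cfd5;background: #f1f5f9;padding:15px;"&gt;
```python
import re

# Patterns tried in order; the first match wins, like the [L] flag
RULES = [
    (re.compile(r"^get\.php$"), None),                   # leave the script itself alone
    (re.compile(r"^([0-9]+)$"), "get.php?id={}"),        # numerical identifier
    (re.compile(r"^([A-Za-z].*)$"), "get.php?name={}"),  # name, e.g. "Physeter catodon"
]


def rewrite(path):
    """Return the rewritten target for a path, or the path itself if no rule applies."""
    for pattern, target in RULES:
        match = pattern.match(path)
        if match:
            return path if target is None else target.format(match.group(1))
    return path
```
&lt;/pre&gt;Paths that match no rule, such as the empty path, are passed through unchanged.&lt;br /&gt;&lt;br /&gt;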
If it's a name, we follow the steps outlined above to handle synonyms, homonyms, and new pages.&lt;br /&gt;&lt;br /&gt;The point of this is that we get clean URLs, but users can still link using natural URLs like those in Wikipedia and Wikispecies. Given this, why don't I just use Wikispecies? Because it's not a semantic wiki, so I don't gain anything from locking information up in that format.</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/117294883084006711/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=117294883084006711' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/117294883084006711'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/117294883084006711'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2007/03/wikis-and-future-of-ispecies.html' title='Wikis and the future of iSpecies'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-117217738698512622</id><published>2007-02-22T20:38:00.000Z</published><updated>2007-02-22T20:49:46.996Z</updated><title type='text'>RSSBus</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.rssbus.com/img/rssbus.gif"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; 
cursor:hand;" src="http://www.rssbus.com/img/rssbus.gif" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;David Shorthouse alerted me to &lt;a href="http://www.rssbus.com" target="_new"&gt;RSSBus&lt;/a&gt;, which is similar to Yahoo's &lt;a href="http://pipes.yahoo.com"&gt;Pipes&lt;/a&gt;, but Lance Robinson (the "Tech Evangelist" at RSSBus) &lt;a href="http://blog.rssbus.com/default.aspx"&gt;argues&lt;/a&gt; that their product is much better. What is RSSBus?&lt;br /&gt;&lt;blockquote&gt;RSSBus is a Really Simple Service Bus that uses the RSS protocol as the main interchange mechanism. RSS is an extensible protocol used to exchange Feeds of Items. Normally these are news items or blog postings, but they don't have to be: RSS Feeds may be augmented through standard RSS extensions to exchange any type of data.&lt;br /&gt;&lt;br /&gt;RSSBus is a collection of tools and services that simplify the process of creating RSS Feeds with rich data extensions. Feeds are generated from RSSBus Connectors, reusable code modules that convert data into feeds. They do so by communicating with RSSBus over defined interfaces (please refer to our RSSBus Connectors Reference for details on building custom connectors).&lt;br /&gt;&lt;br /&gt;RSSBus provides an infrastructure for generating, maintaining, combining, manipulating, and visualizing Feeds. Items and Feeds are orchestrated by the RSSBus Engine and together help create a loosely integrated application architecture which we like to refer to as RSS Web.&lt;/blockquote&gt; &lt;br /&gt;David says he has managed to recreate iSpecies on his desktop with RSSBus, which sounds cool. So far RSSBus is a Windows-only tool, although there is code for other platforms listed on the &lt;a href="http://blog.rssbus.com/default.aspx"&gt;blog&lt;/a&gt;. There is also a &lt;a href="http://www.rssbus.com/docs/rssbus.aspx"&gt;white paper&lt;/a&gt;. 
&lt;br /&gt;It looks like the &lt;a href="http://www.blogger.com/comment.g?blogID=18671685&amp;postID=116507514354753306"&gt;conversation on OpenSearch, RSS, and biodiversity informatics&lt;/a&gt; has only just started.</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/117217738698512622/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=117217738698512622' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/117217738698512622'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/117217738698512622'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2007/02/rssbus.html' title='RSSBus'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-116507514354753306</id><published>2006-12-02T15:59:00.000Z</published><updated>2006-12-02T16:22:36.790Z</updated><title type='text'>Open Search and The Nearctic Spider Database - almost there</title><content type='html'>As &lt;a href="http://mailman.nhm.ku.edu/pipermail/taxacom/2006-December/025007.html" target="_blank"&gt;announced&lt;/a&gt; on TAXACOM, David Shorthouse has added an OpenSearch interface to his really nice &lt;a href="http://canadianarachnology.dyndns.org/data/canada_spiders/" target="_blank"&gt;Nearctic Spider Database&lt;/a&gt;. 
As I've noted previously (see &lt;a href="http://ispecies.blogspot.com/2006/07/adding-sources-to-ispecies.html"&gt;Adding sources to iSpecies&lt;/a&gt; and &lt;a href="http://ispecies.blogspot.com/2006/09/opensearch-and-ispecies.html"&gt;OpenSearch and iSpecies&lt;/a&gt;), &lt;a href="http://opensearch.a9.com/" target="_blank"&gt;OpenSearch&lt;/a&gt; seems an obvious candidate for a simple way to add search functionality to biodiversity web sites.&lt;br /&gt;&lt;a href="http://www.spiderling.de/arages/Fotogalerie/Enoplognatha_latimana_1024.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px;" src="http://www.spiderling.de/arages/Fotogalerie/Enoplognatha_latimana_1024.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;The interface, which is generated by some software called &lt;a href="http://www.wrensoft.com/zoom/" target="_blank"&gt;Zoom Search&lt;/a&gt;, is &lt;a href="http://canadianarachnology.dyndns.org/data/canada_spiders/search/search.xml" target="_blank"&gt;here&lt;/a&gt;. As an example, &lt;a href="http://canadianarachnology.dyndns.org/data/canada_spiders/search/search.cgi?zoom_query=Enoplognatha+latimana&amp;zoom_xml=1" target="_blank"&gt;here is a query&lt;/a&gt; for the spider &lt;i&gt;Enoplognatha latimana&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;But...&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Having an easy way to search a site using a URL API such as OpenSearch is great, but the results come back as RSS 2.0, which carries very little structured information. 
For example, here's an extract:&lt;br /&gt;&lt;br /&gt;&lt;font face="Courier"&gt;&lt;br /&gt;&amp;lt;item&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;lt;title&amp;gt;The Nearctic Spider Database: Enoplognatha latimana Hippa &amp;#38; Oksala, 1982 Description&amp;lt;/title&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;lt;link&amp;gt;http://canadianarachnology.dyndns.org/data/spiders/7561&amp;lt;/link&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;lt;description&amp;gt;THERIDIIDAE: Enoplognatha latimana taxonomic and natural history description in the Nearctic Spider Database.&amp;lt;/description&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;lt;zoom:context&amp;gt; ... Descriptions Home Search: Register Log in Enoplognatha latimana Hippa&amp;#38; Oksala, 1982 Temporary ...  2007 Arachnid Calendar FAMILY: THERIDIIDAE Sundevall, 1833 Genus: Enoplognatha Pavesi, 1880 ...&amp;lt;/zoom:context&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;lt;zoom:termsMatched&amp;gt;2&amp;lt;/zoom:termsMatched&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;lt;zoom:score&amp;gt;1804&amp;lt;/zoom:score&amp;gt;&lt;br /&gt;&amp;lt;/item&amp;gt;&lt;br /&gt;&lt;/font&gt;&lt;br /&gt;&lt;br /&gt;This information is intended to be displayed in a feed reader, and hence viewed by a human. But what if I want to put this information in a database, or combine it with other data sources in a mashup such as iSpecies? Then I have to scrape information out of free-form text. In other words, I'm no further forward than if I scraped the original web page.&lt;br /&gt;&lt;br /&gt;If we want to make the information accessible to a computer, then we need something else. 
RDF is the obvious way forward.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;The difference that RDF makes&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;To illustrate the difference, let's search for images of the same spider (&lt;i&gt;Enoplognatha latimana&lt;/i&gt;) using my OpenSearch wrapper for Yahoo's image search (described in &lt;a href="http://ispecies.blogspot.com/2006/09/opensearch-and-ispecies.html"&gt;OpenSearch and iSpecies&lt;/a&gt;). Here is the &lt;a href="http://darwin.zoology.gla.ac.uk/cgi-bin/yahoo.cgi?q=Enoplognatha%20latimana" target="_blank"&gt;query&lt;/a&gt;. This feed is formatted as RSS 1.0, and I can view it in a feed reader such as &lt;a href="http://www.newsgator.com/NGOLProduct.aspx?ProdID=NetNewsWire" target="_blank"&gt;NetNewsWire&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://photos1.blogger.com/x/blogger/4123/605/1600/175367/rss.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/x/blogger/4123/605/320/822212/rss.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;But because the feed is RSS 1.0, and therefore RDF, it contains lots of information about the image in a form that software can easily consume. 
&lt;br /&gt;&lt;br /&gt;&lt;font face="Courier"&gt;&lt;br /&gt;&amp;lt;foaf:Image rdf:about="http://www.spiderling.de/arages/Fotogalerie/Enoplognatha_latimana_1024.jpg"&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;lt;dc:type&amp;gt;image&amp;lt;/dc:type&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;lt;dc:title&amp;gt;Enoplognatha_latimana_1024.jpg&amp;lt;/dc:title&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;lt;dc:description&amp;gt;&amp;lt;/dc:description&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;lt;dc:subject&amp;gt;Enoplognatha latimana&amp;lt;/dc:subject&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;lt;dc:source&amp;gt;http://www.spiderling.de/arages/Verbreitungskarten/ENO_LAT0.HTM&amp;lt;/dc:source&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;lt;dc:format&amp;gt;image/jpeg&amp;lt;/dc:format&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;lt;foaf:thumbnail rdf:resource="http://re3.mm-a1.yimg.com/image/206564554"/&amp;gt;&lt;br /&gt;&amp;lt;/foaf:Image&amp;gt;&lt;br /&gt;&lt;/font&gt;&lt;br /&gt;&lt;br /&gt;In this example, I use the &lt;a href="http://xmlns.com/foaf/0.1/" target="_blank"&gt;FOAF&lt;/a&gt; and Dublin Core vocabularies. These are widely used, making it easy to integrate this information into a larger database, such as a triple store. To my mind, this is the way forward. We need to move beyond making data accessible only to people, and make it &lt;strong&gt;accessible to computers&lt;/strong&gt;. Once we do this, we can start to aggregate and query the huge amounts of data on the web (as exemplified by David's wonderful site on spiders). 
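&lt;br /&gt;&lt;br /&gt;To make the triple-store point concrete, here is the foaf:Image description above flattened into (subject, predicate, object) triples, with a tiny pattern-matching query written in Python. It is a toy sketch, not an RDF parser or a real triple store; in practice a library such as rdflib would do the parsing and storage. The function name is hypothetical, and the empty dc:description is left out.&lt;pre style="border: 1px solid #c7cfd5;background: #f1f5f9;padding:15px;"&gt;
```python
# Namespace URIs for the vocabularies used in the example
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

IMAGE = "http://www.spiderling.de/arages/Fotogalerie/Enoplognatha_latimana_1024.jpg"

# The foaf:Image element implies an rdf:type triple; each child element adds one more
triples = {
    (IMAGE, RDF_TYPE, FOAF + "Image"),
    (IMAGE, DC + "type", "image"),
    (IMAGE, DC + "title", "Enoplognatha_latimana_1024.jpg"),
    (IMAGE, DC + "subject", "Enoplognatha latimana"),
    (IMAGE, DC + "source", "http://www.spiderling.de/arages/Verbreitungskarten/ENO_LAT0.HTM"),
    (IMAGE, DC + "format", "image/jpeg"),
    (IMAGE, FOAF + "thumbnail", "http://re3.mm-a1.yimg.com/image/206564554"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples that fit a pattern, sorted; None acts as a wildcard."""
    return sorted(
        (ts, tp, to) for (ts, tp, to) in store
        if s in (None, ts) and p in (None, tp) and o in (None, to)
    )


# Which images depict Enoplognatha latimana?
images = [s for s, _, _ in match(triples, p=DC + "subject", o="Enoplognatha latimana")]
```
&lt;/pre&gt;Because the subject is the image URL, triples about the same image from any other source can simply be added to the same set and queried together.&lt;br /&gt;&lt;br /&gt;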
And once we do that, we may discover all sorts of things that we don't yet know (see &lt;a href="http://semant.blogspot.com/2006/07/disconnected-databases.html"&gt;Disconnected databases&lt;/a&gt; and &lt;a href="http://semant.blogspot.com/2006/06/discovering-new-things.html"&gt;Discovering new things&lt;/a&gt;).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-116507514354753306?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/116507514354753306/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=116507514354753306' title='41 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/116507514354753306'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/116507514354753306'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/12/open-search-and-nearctic-spider.html' title='Open Search and The Nearctic Spider Database - almost there'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>41</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-116283389034168256</id><published>2006-11-06T17:16:00.000Z</published><updated>2006-11-06T17:24:50.586Z</updated><title type='text'>Identification service</title><content type='html'>&lt;a href="http://www.flickr.com/photos/globalvoyager/" target="_blank"&gt;Nick Hobgood&lt;/a&gt; emailed me asking whether iSpecies supports requests for identifications.
In other words, is this fish &lt;i&gt;Rudarius minutus&lt;/i&gt;?&lt;br /&gt;&lt;br /&gt;&lt;a href="http://static.flickr.com/118/285519423_3b9f4edc65.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px;" src="http://static.flickr.com/118/285519423_3b9f4edc65.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;iSpecies doesn't support requests, but it strikes me that it would be useful to have a place where such requests could be directed. The &lt;a href="http://mailman.nhm.ku.edu/mailman/listinfo/taxacom"&gt;TAXACOM&lt;/a&gt; mailing list is one place I've seen requests made, but a mailing list is probably not the best forum. An interesting idea to pursue...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-116283389034168256?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/116283389034168256/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=116283389034168256' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/116283389034168256'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/116283389034168256'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/11/identification-service.html' title='Identification service'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32'
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-115745322022869270</id><published>2006-09-05T11:08:00.000+01:00</published><updated>2006-09-05T11:47:00.263+01:00</updated><title type='text'>OpenSearch and iSpecies</title><content type='html'>I've mentioned OpenSearch in an &lt;a href="http://ispecies.blogspot.com/2006/07/adding-sources-to-ispecies.html"&gt;earlier post&lt;/a&gt;, in the context of adding new sources to iSpecies. But it's slowly dawned on me that what I should be doing is wrapping the sources I &lt;strong&gt;currently&lt;/strong&gt; use in OpenSearch as well. That way, every data source would have a consistent query interface and a consistent return format. If we ensure the latter is RDF, then we get aggregation "for free".&lt;br /&gt;&lt;a href="http://www.parasiticplants.siu.edu/Viscaceae/images/Atta.mex.draw.JPEG"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px;" src="http://www.parasiticplants.siu.edu/Viscaceae/images/Atta.mex.draw.JPEG" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;So, I've made a start. First up is Yahoo's image search, which I've wrapped as http://darwin.zoology.gla.ac.uk/cgi-bin/yahoo.cgi. To get a result, append "?q=" followed by the search terms. Try an &lt;a href="http://darwin.zoology.gla.ac.uk/cgi-bin/yahoo.cgi?q=Atta%20mexicana" target="_new"&gt;example search&lt;/a&gt; for images of the ant &lt;i&gt;Atta mexicana&lt;/i&gt;. Note that at present I only support the OpenSearch return format, not the query format (that'll come later). The query result is RSS 1.0 because RSS 1.0 is RDF (RSS 2.0 and Atom aren't, and hence for my purposes are beside the point).
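&lt;br /&gt;&lt;br /&gt;The full OpenSearch query format would describe the wrapper with a URL template in which the client substitutes the search terms. Here is a minimal Python sketch of that substitution. It assumes an OpenSearch 1.1-style {searchTerms} slot. Both the template string and the fill_template helper are hypothetical; only the wrapper's base URL comes from this post.&lt;br /&gt;&lt;pre&gt;
```python
from urllib.parse import quote

# Assumption: an OpenSearch 1.1-style URL template in which the client
# replaces {searchTerms}; the wrapper itself simply takes ?q=.
YAHOO_IMAGES_TEMPLATE = "http://darwin.zoology.gla.ac.uk/cgi-bin/yahoo.cgi?q={searchTerms}"


def fill_template(template, terms):
    """Percent-encode the search terms and substitute them into the template."""
    # quote() turns the space in a binomial into %20.
    return template.replace("{searchTerms}", quote(terms))


print(fill_template(YAHOO_IMAGES_TEMPLATE, "Atta mexicana"))
```
&lt;/pre&gt;For "Atta mexicana" this produces the same URL as the example search link above. Because the template is data rather than code, adding another OpenSearch source would, in principle, only mean adding another template.&lt;br /&gt;&lt;br /&gt;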
The upshot is that I can now use this search in other projects, and making a better iSpecies becomes simply a case of adding a bunch of OpenSearch sources together.&lt;br /&gt;&lt;br /&gt;Generating the RSS proved "fun", but the feed now validates as RDF, although &lt;a href="http://www.feedvalidator.org/"&gt;Feed Validator&lt;/a&gt; grumbles slightly. It's all a bit of a black art, but I had to nest the RDF payload in &amp;lt;content:item&amp;gt; tags, like this:&lt;br /&gt;&lt;pre&gt;&amp;lt;content:item&amp;gt;&lt;br /&gt;&amp;lt;foaf:Image rdf:about="http://www.par...x.draw.JPEG"&amp;gt;&lt;br /&gt;  &amp;lt;dc:type&amp;gt;image&amp;lt;/dc:type&amp;gt;&lt;br /&gt;  &amp;lt;dc:title&amp;gt;Atta.mex.draw.JPEG&amp;lt;/dc:title&amp;gt;&lt;br /&gt;  &amp;lt;dc:description&amp;gt;Leaf-cutter ants (Atta mexicana ) ... &amp;lt;/dc:description&amp;gt;&lt;br /&gt;  &amp;lt;dc:subject&amp;gt;Atta mexicana&amp;lt;/dc:subject&amp;gt;&lt;br /&gt;  &amp;lt;dc:source&amp;gt;http://www.parasiticplants.siu.edu/Viscaceae&amp;lt;/dc:source&amp;gt;&lt;br /&gt;  &amp;lt;dc:format&amp;gt;image/jpeg&amp;lt;/dc:format&amp;gt;&lt;br /&gt;  &amp;lt;foaf:thumbnail rdf:resource="http://mud.mm-a5.yimg.com/image/2050519657"/&amp;gt;&lt;br /&gt;&amp;lt;/foaf:Image&amp;gt;&lt;br /&gt;&amp;lt;/content:item&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-115745322022869270?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/115745322022869270/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=115745322022869270' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/115745322022869270'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/18671685/posts/default/115745322022869270'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/09/opensearch-and-ispecies.html' title='OpenSearch and iSpecies'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-115712785965719665</id><published>2006-09-01T16:08:00.000+01:00</published><updated>2006-09-01T17:25:45.950+01:00</updated><title type='text'>More DOIs</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.scielo.br/img/en/fbpelogp.gif"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://www.scielo.br/img/en/fbpelogp.gif" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;Following on from an &lt;a href="http://ispecies.blogspot.com/2006/08/extracting-dois.html"&gt;earlier post&lt;/a&gt;, I've now added DOI extraction for &lt;a href="http://www.scielo.br/scielo.php/lng_en"&gt;SciELO&lt;/a&gt;, which hosts Brazilian publications, and &lt;a href="http://www.taylorandfrancis.com/"&gt;Taylor and Francis&lt;/a&gt;.
This was motivated by searching iSpecies for the ant &lt;a href="http://darwin.zoology.gla.ac.uk/~rpage/ispecies/?q=Trachymyrmex+opulentus"&gt;&lt;i&gt;Trachymyrmex opulentus&lt;/i&gt;&lt;/a&gt;, for which only papers hosted by these two publishers appear in the search results.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.evergreen.edu/ants/Genera/trachymyrmex/species/opulentus/INBIOCRI001238138_l.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px;" src="http://www.evergreen.edu/ants/Genera/trachymyrmex/species/opulentus/INBIOCRI001238138_l.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Again, we are reduced to screen scraping (sigh). Why oh why don't the people who design these web sites get their act together and embed useful information in the HTML, rather than assume that only humans will make use of these pages?&lt;br /&gt;&lt;br /&gt;One provider that is clued up is Ingenta. For example, take a look at the &lt;a href="http://www.ingentaconnect.com/content/tandf/snfe/2003/00000038/00000002/art00006"&gt;HTML&lt;/a&gt; for the article "Influence of Topography on the Distribution of Ground-Dwelling Ants in an Amazonian Forest" (&lt;a href="http://dx.doi.org/10.1076/snfe.38.2.115.15923"&gt;doi:10.1076/snfe.38.2.115.15923&lt;/a&gt;) on the Ingenta site (Firefox and Camino users can see the source &lt;a href="view-source:http://www.ingentaconnect.com/content/tandf/snfe/2003/00000038/00000002/art00006"&gt;here&lt;/a&gt;). Embedded in the &amp;lt;meta&amp;gt; tags is all sorts of metadata, including the DOI:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;lt;meta name="DC.identifier" scheme="URI" &lt;br /&gt;   content="info:doi/10.1076/snfe.38.2.115.15923"/&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The use of consistently formatted tags makes data extraction much easier. 
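&lt;br /&gt;&lt;br /&gt;Reading a tag like that needs no screen-scraping heuristics. Here is a minimal sketch using the html.parser module from Python's standard library. It is an illustration, not iSpecies code. It assumes the DOI appears as a DC.identifier value beginning with info:doi/, as in the Ingenta example, and it ignores any other identifiers a page may carry.&lt;br /&gt;&lt;pre&gt;
```python
from html.parser import HTMLParser

DOI_PREFIX = "info:doi/"


class MetaDOIParser(HTMLParser):
    """Collect DOIs from Dublin Core identifier meta tags whose content
    starts with info:doi/ (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.dois = []

    def handle_starttag(self, tag, attrs):
        # Self-closing meta tags also arrive here via handle_startendtag.
        if tag != "meta":
            return
        attributes = dict(attrs)  # attribute names arrive lower-cased
        content = attributes.get("content") or ""
        if attributes.get("name") == "DC.identifier" and content.startswith(DOI_PREFIX):
            self.dois.append(content[len(DOI_PREFIX):])


def dois_from_html(html_text):
    """Return the DOIs declared in the page's DC.identifier meta tags."""
    parser = MetaDOIParser()
    parser.feed(html_text)
    parser.close()
    return parser.dois
```
&lt;/pre&gt;Given the &amp;lt;meta&amp;gt; tag shown above, dois_from_html should return 10.1076/snfe.38.2.115.15923, ready to be resolved through dx.doi.org.&lt;br /&gt;&lt;br /&gt;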
Of course, it's no surprise that Ingenta do this well (check out their &lt;a href="http://allmyeye.blogspot.com/"&gt;blog&lt;/a&gt;).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-115712785965719665?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/115712785965719665/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=115712785965719665' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/115712785965719665'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/115712785965719665'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/09/more-dois.html' title='More DOIs'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-115688802702256084</id><published>2006-08-29T22:31:00.000+01:00</published><updated>2006-08-29T22:47:07.043+01:00</updated><title type='text'>Extracting DOIs</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.springerlink.com/images/springerlink-logo.gif"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://www.springerlink.com/images/springerlink-logo.gif" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://www.jstage.jst.go.jp/images/logo.gif"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://www.jstage.jst.go.jp/images/logo.gif" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;DOIs are pretty cool, so I spent a little time this evening working out how to extract DOIs from Google Scholar results for journals hosted by &lt;a href="http://www.springerlink.com/"&gt;Springer&lt;/a&gt;, &lt;a href="http://www.jstor.org/"&gt;JSTOR&lt;/a&gt;, and &lt;a href="http://www.jstage.jst.go.jp/browse/"&gt;J-Stage&lt;/a&gt;. I've also added code to extract Serial Item and Contribution Identifiers (SICIs) from JSTOR URLs. SICI is NISO standard &lt;a href="http://www.niso.org/standards/standard_detail.cfm?std_id=530"&gt;Z39.56&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The point of this exercise is to get DOIs for as many articles as possible, because DOIs are the GUID of choice for publications, and we can retrieve metadata for a DOI either directly, using CrossRef's &lt;a href="http://www.crossref.org/02publishers/openurl_info.html"&gt;OpenURL resolver&lt;/a&gt;, or via &lt;a href="http://www.connotea.org"&gt;Connotea&lt;/a&gt;.
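&lt;br /&gt;&lt;br /&gt;A regular expression is one generic way to pull DOIs out of scraped URLs and text. The pattern below is a commonly used one for modern DOIs. It is an assumption here, not the code iSpecies uses. Some older DOIs contain characters it does not allow, and SICIs would need their own parser.&lt;br /&gt;&lt;pre&gt;
```python
import re

# Assumption: a commonly used pattern for modern DOIs, "10." + a 4 to 9
# digit registrant code + "/" + suffix.  Some older DOIs use characters it
# does not allow, and SICIs need their own parser.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(text):
    """Return the DOIs found in text (e.g. a URL), in order, without duplicates."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not the DOI.
        doi = match.group(0).rstrip(".,;:")
        if doi not in found:
            found.append(doi)
    return found


print(extract_dois("http://dx.doi.org/10.1111/j.1744-7429.2005.37_04_01.x"))
```
&lt;/pre&gt;Each DOI found this way can then be passed to a resolver to fetch its metadata.&lt;br /&gt;&lt;br /&gt;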
This will make life easier for the next step, namely aggregating literature into a triple store.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-115688802702256084?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/115688802702256084/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=115688802702256084' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/115688802702256084'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/115688802702256084'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/08/extracting-dois.html' title='Extracting DOIs'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-115514062809311490</id><published>2006-08-09T17:11:00.000+01:00</published><updated>2006-08-09T17:23:48.136+01:00</updated><title type='text'>Add to Connotea</title><content type='html'>I finally got around to adding an &amp;quot;Add to Connotea&amp;quot; button &lt;img style="cursor: pointer;" src="http://darwin.zoology.gla.ac.uk/%7Erpage/ispecies/images/connotea.png" alt="" border="0" /&gt; to the Google Scholar results, based on code from &lt;a href="http://postgenomic.com"&gt;Postgenomic&lt;/a&gt;.
The code is a simple bit of Javascript:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;lt;a style="cursor:pointer;" onclick="javascript:&lt;br /&gt;u='http://dx.doi.org/10.1111/j.1744-7429.2005.37_04_01.x';&lt;br /&gt;a=false;&lt;br /&gt;x=window;&lt;br /&gt;e=x.encodeURIComponent;&lt;br /&gt;d=document;&lt;br /&gt;w=open('http://www.connotea.org/addpopup?continue=confirm&lt;br /&gt;&amp;amp;uri='+e(u),'add','width=660,height=300,scrollbars,resizable');&lt;br /&gt;void(x.setTimeout('w.focus()',200));"&amp;gt;&lt;br /&gt;&amp;lt;img src="images/connotea.png" border="0" &lt;br /&gt;alt="add bookmark to connotea" align="absmiddle"&amp;gt;&amp;lt;/a&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;where &lt;a href="http://dx.doi.org/10.1111/j.1744-7429.2005.37_04_01.x"&gt;http://dx.doi.org/10.1111/j.1744-7429.2005.37_04_01.x&lt;/a&gt; is the URI of the article being added.&lt;br /&gt;&lt;br /&gt;Now a click brings up &lt;a href="http://www.connotea.org"&gt;Connotea&lt;/a&gt; and you can add a paper you've found using iSpecies. 
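&lt;br /&gt;&lt;br /&gt;The link could also be generated server-side rather than in the browser. Here is a minimal Python sketch that builds the same pop-up URL. The connotea_add_url helper is hypothetical, and the endpoint and parameter names are simply copied from the JavaScript above.&lt;br /&gt;&lt;pre&gt;
```python
from urllib.parse import urlencode

# The endpoint and parameter names are copied from the JavaScript above.
# Connotea may no longer be running, so this is illustrative only.
CONNOTEA_ADD = "http://www.connotea.org/addpopup"


def connotea_add_url(article_uri):
    """Build the Add to Connotea pop-up URL for an article URI."""
    # urlencode() escapes ':' and '/' in the URI (as %3A and %2F), which is
    # what encodeURIComponent does in the browser version.
    query = urlencode({"continue": "confirm", "uri": article_uri})
    return CONNOTEA_ADD + "?" + query


print(connotea_add_url("http://dx.doi.org/10.1111/j.1744-7429.2005.37_04_01.x"))
```
&lt;/pre&gt;Escaping the article URI matters: without it, the slashes and colon in the DOI link would be read as part of the Connotea URL itself.&lt;br /&gt;&lt;br /&gt;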
At present this only works for papers where I've extracted a DOI.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-115514062809311490?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/115514062809311490/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=115514062809311490' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/115514062809311490'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/115514062809311490'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/08/add-to-connotea.html' title='Add to Connotea'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-115231967138272827</id><published>2006-07-08T01:47:00.000+01:00</published><updated>2006-07-08T01:47:51.423+01:00</updated><title type='text'>Adding sources to iSpecies</title><content type='html'>&lt;img src="http://opensearch.a9.com/aggregator.gif" align="right" width="300"/&gt;One issue which comes up every so often is how to add data sources to iSpecies. At present iSpecies queries NCBI, Yahoo images, and Google Scholar, each source requiring different code to make the query and handle the response. 
If every new source required its own code to query it and parse its results, iSpecies would rapidly become a nightmare (leaving aside the issue that until iSpecies is multithreaded, adding sources slows everything down; see &lt;a href="http://ispecies.blogspot.com/2006/03/towards-faster-ispecies-building.html"&gt;my earlier post&lt;/a&gt; about the need for speed). &lt;br /&gt;&lt;br /&gt;One solution is to develop a standard search interface and ask data sources to adopt it. The obvious candidate is &lt;a href="http://opensearch.a9.com/"&gt;OpenSearch&lt;/a&gt;, which I've already touched on over at &lt;a href="http://iphylo.blogspot.com/2006/03/opensearch-and-ie7.html"&gt;iPhylo&lt;/a&gt;. OpenSearch is appealing because it is no more difficult than serving RSS feeds, and because it is based on RSS it can be integrated into a range of tools, such as Amazon's &lt;a href="http://a9.com"&gt;A9&lt;/a&gt; and &lt;a href="http://www.daveyp.com/blog/index.php/archives/70/"&gt;Internet Explorer 7&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;At a minimum, it would be useful if sources supported OpenSearch. It would also be useful if they served individual records as RSS. This matters because NCBI links to numerous sources via LinkOut, so we could avoid the overhead of a search by retrieving the record directly (if NCBI has a link, then I already know the information exists).&lt;br /&gt;&lt;br /&gt;When I say "RSS", I should stress that I really mean RSS 1.0 (i.e., RDF).
RSS 2.0 and Atom are a lot less useful in the long run, because RSS 1.0 can be integrated into a triple store, which opens up a world of cool things (i.e., aggregating data and performing queries on that data).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-115231967138272827?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/115231967138272827/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=115231967138272827' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/115231967138272827'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/115231967138272827'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/07/adding-sources-to-ispecies.html' title='Adding sources to iSpecies'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-114829409139144760</id><published>2006-05-22T11:34:00.000+01:00</published><updated>2006-05-23T09:15:10.206+01:00</updated><title type='text'>Why Google is good for science...maybe</title><content type='html'>&lt;br /&gt;&lt;img align="right" width="128" src="http://images.the-scientist.com/graphics/interface/toptoolbar/tslogo.gif"/&gt;I just noticed this piece written by Jeff Perkel in January who, after "poking around" the iSpecies blog, wrote &lt;a href="http://www.the-scientist.com/blog/display/22999/"&gt;Why Google is 
good for science&lt;/a&gt;. Well, yes and no. On the one hand it's fabulous, but on the other hand they can play rough. For example, iSpecies used &lt;a href="http://scholar.google.com"&gt;Google Scholar&lt;/a&gt; to find scientific papers for a species name. The traffic was pretty minimal in the scheme of things, but Google have now blocked iSpecies (and as a consequence my whole University - gulp!) from accessing Google Scholar. &lt;br /&gt;&lt;img align="right" width="128" src="http://scholar.google.com/intl/en/images/scholar_logo.gif"/&gt;&lt;br /&gt;&lt;br /&gt;Before anybody says, "but you got what you deserved because you broke Google's &lt;a href="http://www.google.com/terms_of_service.html"&gt;Terms of Service&lt;/a&gt;", &lt;strike&gt;I think in this case they are simply being lazy. If Google truly cared about making Google Scholar useful, they'd create an API. Because they haven't, I had to resort to screen scraping their unbelievably awful HTML (and I'm not the only one). The cost of setting up an API along the lines of the &lt;a href="http://www.google.com/apis/"&gt;one available&lt;/a&gt; for the main Google search engine would be trivial. &lt;/strike&gt;&lt;br /&gt;&lt;br /&gt;Having vented my spleen, I've since learned that the reason (as I should have guessed) is "intellectual property". Google Scholar's agreements with the publishers it indexes prevent Google from making it available other than through the web site. Thanks to Rebecca Shapley for clarifying this. Once again, scientists are being ill served by our publishers. Perhaps somebody needs to set up an Open Source/Open Access equivalent of Google Scholar.&lt;br /&gt;&lt;br /&gt;This is what I originally wrote, which is perhaps another reason publishers don't want Google Scholar to have an API:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;There would also be a potential market.
In the UK we rate our research based on a number of factors including journal impact factor, as part of the gargantuan &lt;a href="http://www.rae.ac.uk/"&gt;Research Assessment Exercise&lt;/a&gt;. Impact factors are supplied by ISI, and Google Scholar results &lt;a href="http://www.int-res.com/articles/esep/2005/E65.pdf"&gt;compare well with that source&lt;/a&gt;. Just think of the possibilities of a service that used Google Scholar to rate scientists' output. It could even be part of a service like &lt;a href="http://www.linkedin.com/home?trk=logo"&gt;LinkedIn&lt;/a&gt;, which I stumbled on via Pierre's blog on &lt;a href="http://plindenbaum.blogspot.com/2006/05/ncbi-pubmed-rss-feeds-geotagging.html"&gt;geotagging RSS feeds&lt;/a&gt; (which is a whole separate issue).&lt;br /&gt;&lt;img align="right" width="128" src="http://www.linkedin.com/img/logos/logo.gif"/&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-114829409139144760?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/114829409139144760/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=114829409139144760' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/114829409139144760'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/114829409139144760'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/05/why-google-is-good-for-sciencemaybe.html' title='Why Google is good for science...maybe'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-114587724007684122</id><published>2006-04-24T12:09:00.000+01:00</published><updated>2006-04-24T12:14:00.103+01:00</updated><title type='text'>iSpecies down</title><content type='html'>As &lt;a href="http://iphylo.blogspot.com/2006/04/darwin-hacked.html"&gt;reported on iPhylo&lt;/a&gt;, the machine running iSpecies was hacked last week. It's taking a while to rebuild things, but iSpecies is now running on &lt;a href="http://linnaeus.zoology.gla.ac.uk/~rpage/ispecies"&gt;another machine&lt;/a&gt; until the hacked machine can be rebuilt. My apologies for any inconvenience. Apart from issues of backing up, things take time to restore because the original machine ran an old version (4.2) of PHP, and the new machine uses PHP 5.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-114587724007684122?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/114587724007684122/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=114587724007684122' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/114587724007684122'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/114587724007684122'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/04/ispecies-down.html' title='iSpecies down'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-114279624660121438</id><published>2006-03-19T19:24:00.001Z</published><updated>2008-05-01T07:22:13.096+01:00</updated><title type='text'>Building the encyclopedia of life</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_Gct8lVAxKqQ/SBlhW8LJCcI/AAAAAAAAAMM/u8rTjAbWfVM/s1600-h/LocalImage.jpeg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://bp3.blogger.com/_Gct8lVAxKqQ/SBlhW8LJCcI/AAAAAAAAAMM/u8rTjAbWfVM/s320/LocalImage.jpeg" border="0" alt=""id="BLOGGER_PHOTO_ID_5195290691886451138" /&gt;&lt;/a&gt;&lt;br /&gt;iSpecies is very limited in the sources it uses, and also in what it extracts from its sources. The sources it does query contain a wealth of information. As an example, GenBank sequence &lt;a href="http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&amp;val=7108724"&gt;AF131710&lt;/a&gt; from &lt;em&gt;Ligophorus mugilinus&lt;/em&gt; has the following information about this animal:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;FEATURES             Location/Qualifiers&lt;br /&gt;     source          1..374&lt;br /&gt;                     /organism="Ligophorus mugilinus"&lt;br /&gt;                     /mol_type="genomic DNA"&lt;br /&gt;                     /specific_host="Mugil cephalus"&lt;br /&gt;                     /db_xref="taxon:92200"&lt;br /&gt;                     /country="France"&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Note the tags "/specific_host" and "/country". By parsing this record we learn that this organism is found in France, and is hosted by &lt;em&gt;Mugil cephalus&lt;/em&gt;. &lt;br /&gt;&lt;br /&gt;In the same way, the Google Scholar results could be more effectively used. 
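&lt;br /&gt;&lt;br /&gt;The GenBank parsing step described above can be sketched in a few lines of Python. This is a hypothetical helper, not iSpecies code. It assumes the fixed-column layout of GenBank flat files (feature keys in column 6, qualifiers in column 22), with each qualifier value on a single line.&lt;br /&gt;&lt;pre&gt;
```python
import re

# Hypothetical helper: collect /key="value" qualifiers from the "source"
# feature of a GenBank flat file.  Assumes the standard column layout
# (feature keys start in column 6, qualifiers in column 22) and one-line
# values; continuation lines of long values are ignored.
FEATURE_KEY = re.compile(r"^ {5}(\S+)")
QUALIFIER = re.compile(r'^ {21}/(\w+)(?:="?(.*?)"?)?\s*$')


def source_qualifiers(flatfile_text):
    qualifiers = {}
    in_source = False
    for line in flatfile_text.splitlines():
        key = FEATURE_KEY.match(line)
        if key:
            if in_source:
                break  # the next feature ends the source block
            in_source = key.group(1) == "source"
            continue
        if in_source:
            qualifier = QUALIFIER.match(line)
            if qualifier:
                # Flag qualifiers such as /focus have no value.
                qualifiers[qualifier.group(1)] = qualifier.group(2) or ""
    return qualifiers


INDENT = " " * 21  # qualifier column
excerpt = "\n".join([
    "FEATURES             Location/Qualifiers",
    " " * 5 + "source          1..374",
    INDENT + '/organism="Ligophorus mugilinus"',
    INDENT + '/specific_host="Mugil cephalus"',
    INDENT + '/country="France"',
])
print(source_qualifiers(excerpt))
```
&lt;/pre&gt;Run on this excerpt of AF131710, specific_host gives Mugil cephalus and country gives France, ready to be stored as facts about the organism.&lt;br /&gt;&lt;br /&gt;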
In many cases we could follow the Google Scholar links to get abstracts of articles, then use literature data mining techniques (e.g., &lt;a href="http://bioinformatics.oxfordjournals.org/cgi/content/abstract/18/12/1553"&gt;Hirschman et al.&lt;/a&gt;) to extract information on the organism's ecology, etc.&lt;br /&gt;&lt;br /&gt;Extracting this sort of information would be one way to automate the construction of an encyclopedia of life.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-114279624660121438?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/114279624660121438/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=114279624660121438' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/114279624660121438'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/114279624660121438'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/03/building-encyclopedia-of-life.html' title='Building the encyclopedia of life'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp3.blogger.com/_Gct8lVAxKqQ/SBlhW8LJCcI/AAAAAAAAAMM/u8rTjAbWfVM/s72-c/LocalImage.jpeg' height='72'
width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-114277005397394879</id><published>2006-03-19T12:07:00.000Z</published><updated>2006-03-19T15:30:37.646Z</updated><title type='text'>Towards a faster iSpecies: building libxml and libxslt on Mac OS X</title><content type='html'>iSpecies is written in PHP, and calls a Perl CGI script (to query Google Scholar). This works, but is a bit slow. It also limits what we can do. For example, it would be cool to make the search multithreaded so that the different sources are queried at the same time. This becomes a major issue if we want to "drill down." For instance, if a taxon exists in NCBI, it would be useful to visit all the LinkOut resources and collect whatever information they make available. Likewise, Google Scholar results contain links to publishers that could be explored further (such as extracting bibliographic information from RIS files, or RSS feeds such as those available for &lt;a href="http://www.ldodds.com/blog/archives/000169.html"&gt;Ingenta-hosted journals&lt;/a&gt;). All of this would delay displaying search results to the user, especially if we have to visit one link after another.&lt;br /&gt;&lt;br /&gt;Multithreading would help, but PHP doesn't support it, hence I'm toying with moving to C++ and building a "proper" application (I don't do Java). This means I need XML, XPath, and XSLT libraries for C/C++, and finding them has been, ahem, interesting. I was going to use Sablotron (which I use in my PHP 4 and Perl work), but its documentation is just awful (where are some nice examples?). I'll probably use &lt;a href="http://xmlsoft.org/"&gt;libxml&lt;/a&gt; and &lt;a href="http://xmlsoft.org/XSLT/"&gt;libxslt&lt;/a&gt;. These come with Mac OS X 10.3.9 (I do my development on a G4 iBook, before moving stuff to a Linux box), but Apple hasn't compiled libxml with XPath support (sigh).
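&lt;br /&gt;&lt;br /&gt;The concurrent querying described in the first paragraph can be sketched in Python, as an illustration of the idea rather than the C++ application contemplated here. In this sketch each source is a hypothetical URL template with a {searchTerms} slot. The fetch function is passed in so the example runs without network access. Threads suit this job because each worker spends most of its time waiting on the network.&lt;br /&gt;&lt;pre&gt;
```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote

# Hypothetical source list: each source is just a URL template with a
# {searchTerms} slot.  Only the Yahoo wrapper URL comes from this blog;
# the example.org entry is a placeholder.
SOURCES = {
    "yahoo_images": "http://darwin.zoology.gla.ac.uk/cgi-bin/yahoo.cgi?q={searchTerms}",
    "placeholder": "http://example.org/search?q={searchTerms}",
}


def query_all(sources, name, fetch, max_workers=4):
    """Query every source concurrently; return {source name: fetched body}.

    fetch takes a URL and returns its body (in real use, something built on
    urllib.request.urlopen); it is passed in so the sketch runs offline.
    """
    urls = {src: tmpl.replace("{searchTerms}", quote(name))
            for src, tmpl in sources.items()}
    # Threads suit this job: each worker spends its time waiting on the network.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {src: pool.submit(fetch, url) for src, url in urls.items()}
        return {src: future.result() for src, future in futures.items()}


def fake_fetch(url):
    return "fetched " + url


print(query_all(SOURCES, "Atta mexicana", fake_fetch))
```
&lt;/pre&gt;The total wait is then roughly that of the slowest source rather than the sum of all of them. Drilling down into LinkOut resources would just mean submitting more URLs to the same pool.&lt;br /&gt;&lt;br /&gt;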
I built libxml 2.2.63 OK, but libxslt 1.1.15 needed a little hand holding because of the presence of Apple's libxml. The following does the trick:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;./configure --with-libxml-prefix=/usr/local&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;This tells configure to use the version of libxml I installed in /usr/local. Now, once I get my head around &lt;a href="http://curl.haxx.se/"&gt;libcurl&lt;/a&gt; I'll try and build something and see if we can speed up iSpecies.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-114277005397394879?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/114277005397394879/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=114277005397394879' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/114277005397394879'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/114277005397394879'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/03/towards-faster-ispecies-building.html' title='Towards a faster iSpecies: building libxml and libxslt on Mac OS X'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-114104717244012040</id><published>2006-02-27T13:32:00.000Z</published><updated>2006-02-27T13:32:52.660Z</updated><title type='text'>Silobreaker</title><content 
type='html'>&lt;p&gt;&lt;img src="http://www.silobreaker.com/corporate/res/images/menu/SiloBreakerLogo_small.gif " align="right" /&gt;&lt;a href="http://www.silobreaker.com/corporate/#"&gt;Silobreaker&lt;/a&gt; looks to be a very cool way of exploring information. Facetted browsing is an old idea, but this looks like it might actually make it fun.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;(Via &lt;a href="http://rafaelsidi.blogspot.com/2005/10/silobreaker.html"&gt;Really Simple Sidi (RSS)&lt;/a&gt;.)&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-114104717244012040?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/114104717244012040/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=114104717244012040' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/114104717244012040'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/114104717244012040'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/02/silobreaker.html' title='Silobreaker'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113932524063094041</id><published>2006-02-07T15:05:00.000Z</published><updated>2006-02-07T15:14:04.520Z</updated><title type='text'>Tag cloud</title><content type='html'>I've added a simple "tag cloud" showing frequency of the top 30 searches in iSpecies. 
It's a bit ugly, but you get the idea. You see the tag cloud if you go to &lt;a href="http://ispecies.org"&gt;iSpecies&lt;/a&gt; directly (i.e., no search term). &lt;br /&gt;&lt;br /&gt;I made use of a nice article on &lt;a href="http://thraxil.org/users/anders/posts/2005/12/13/scaling-tag-clouds/"&gt;scaling tag clouds&lt;/a&gt; by &lt;a href="http://thraxil.org/users/anders/"&gt;Anders Pearson&lt;/a&gt;. He describes a simple function to scale the tags. I put this into an Excel spreadsheet as a quick hack (in other words, the tag cloud isn't dynamic yet).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113932524063094041?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113932524063094041/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113932524063094041' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113932524063094041'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113932524063094041'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/02/tag-cloud.html' title='Tag cloud'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113932052669259247</id><published>2006-02-07T13:49:00.000Z</published><updated>2006-02-07T13:55:26.706Z</updated><title type='text'>Programmable Web</title><content type='html'>&lt;a 
href="http://www.programmableweb.com/images/ProgrammableWebLogo.gif"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 100px;" src="http://www.programmableweb.com/images/ProgrammableWebLogo.gif" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;iSpecies makes it onto &lt;a href="http://www.programmableweb.com/urlDetail?linkID=434"&gt;Programmable Web&lt;/a&gt;. This site has all sorts of useful information on Web 2.0 and mashups, see also &lt;a href="http://mashupfeed.com/"&gt;Mashup Feed&lt;/a&gt;. Spotted by &lt;a href="http://www.simon.rycroft.name/"&gt;Simon Rycroft&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113932052669259247?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113932052669259247/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113932052669259247' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113932052669259247'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113932052669259247'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/02/programmable-web.html' title='Programmable Web'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113857721900192666</id><published>2006-01-29T23:06:00.000Z</published><updated>2006-01-29T23:27:22.043Z</updated><title type='text'>Automatic 
extraction of references from a paper</title><content type='html'>One goal for iSpecies would be integrating taxonomic literature into the output. This has been motivated by &lt;a href="http://www.antbase.de/interview.html"&gt;Donat Agosti's&lt;/a&gt; efforts to make the taxonomic literature for ants available (see his &lt;a href="http://www.nature.com/nature/journal/v439/n7075/full/439392a.html"&gt;letter to &lt;em&gt;Nature&lt;/em&gt;&lt;/a&gt; about copyright and biopiracy &lt;a href="http://dx.doi.org/10.1038/439392a"&gt;doi:10.1038/439392a&lt;/a&gt;). For example, we can take a paper marked up in an XML schema such as the &lt;a href="http://research.amnh.org/informatics/taxlit/schemas"&gt;TaxonX Treatment Markup&lt;/a&gt;, extract the treatments of a name, and insert these into a triple store that iSpecies can query. For a crude example, search iSpecies for the "Google ant" &lt;a href="http://darwin.zoology.gla.ac.uk/~rpage/ispecies/?q=Proceratium+google"&gt;&lt;em&gt;Proceratium google&lt;/em&gt;&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Now, marking up documents by hand (which is what Donat does) is tedious in the extreme. How can we automate this? In particular, I'd like to automate extracting taxonomic names and references to other papers. The first can be facilitated by taxonomic name servers, particularly uBio's &lt;a href="http://names.mbl.edu/soap/finditSOAP.php"&gt;FindIT&lt;/a&gt; SOAP service. Extracting references seems more of a challenge, but tonight I stumbled across &lt;a href="http://paracite.eprints.org/"&gt;ParaCite&lt;/a&gt;, which looks like it might do the trick. There is Perl code available from CPAN (although when I tried this on Mac OS X 10.3.9 using cpan it failed to build) and from the &lt;a href="http://paracite.eprints.org/developers/downloads.html"&gt;downloads&lt;/a&gt; section of ParaCite. I grabbed Biblio-Citation-Parser-1.10, installed the dependencies via cpan, then built Biblio::Citation::Parser, and so far it looks promising. 
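&lt;br /&gt;&lt;br /&gt;ParaCite's parser is written in Perl. To show the flavour of the problem, here is a deliberately crude Python sketch that splits a reference written in one assumed style (authors, year, title, journal, volume: pages) into fields. The pattern, the field names, and the made-up example reference are illustrative assumptions only. They do not describe how ParaCite works.&lt;pre&gt;
```python
import re

# Hypothetical pattern for ONE reference style, e.g.
#   Smith, J. 1999. A new ant from Borneo. Journal of Ant Studies 12: 1-10.
# Groups, in order: authors, year, title, journal, volume, first page, last page.
PATTERN = re.compile(
    r"^(.+?)\s+\(?(\d{4})\)?\.\s+(.+?)\.\s+(.+?)\s+(\d+)\s*:\s*(\d+)\s*-\s*(\d+)\.?$"
)
FIELDS = ("authors", "year", "title", "journal", "volume", "spage", "epage")


def parse_reference(text):
    """Return a dict of bibliographic fields, or None if the text does not fit the pattern."""
    match = PATTERN.match(text.strip())
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


if __name__ == "__main__":
    # A made-up reference, used only to exercise the pattern.
    print(parse_reference(
        "Smith, J. 1999. A new ant from Borneo. Journal of Ant Studies 12: 1-10."))
```
&lt;/pre&gt;A title containing an abbreviation such as "sp. nov." would already be split in the wrong place. That is a good argument for trying a dedicated tool such as ParaCite rather than hand-written patterns.&lt;br /&gt;&lt;br /&gt;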
If references can be readily extracted from taxonomic markup, then this tool could pull out the bibliographic information, and we could then look up the references both in taxon-specific databases such as AntBase and in Google Scholar.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113857721900192666?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113857721900192666/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113857721900192666' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113857721900192666'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113857721900192666'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/01/automatic-extraction-of-references.html' title='Automatic extraction of references from a paper'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113815564708511058</id><published>2006-01-25T02:20:00.000Z</published><updated>2006-01-25T02:22:04.313Z</updated><title type='text'>Google Maps Mania: North American Bird Watching Google Maps Mashup</title><content type='html'>&lt;a href="http://static.flickr.com/21/89266397_4d6810c7e3_m.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 320px;" src="http://static.flickr.com/21/89266397_4d6810c7e3_m.jpg" border="0" 
alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a href="http://googlemapsmania.blogspot.com/2006/01/north-american-bird-watching-google.html"&gt;Google Maps Mania: North American Bird Watching Google Maps Mashup&lt;/a&gt; notes the very slick combination of Google Maps and Flash to display ranges of North American birds at &lt;a href="http://www.geobirds.com/index.php?option=com_staticxt&amp;staticfile=local.html"&gt;GeoBirds.com&lt;/a&gt;. The mashup uses data from the &lt;a href="http://www.mbr-pwrc.usgs.gov/bbs/"&gt;USGS Breeding Bird Survey&lt;/a&gt; and the Audobon Society's &lt;a href="http://www.audubon.org/bird/cbc/"&gt;Christmas Bird Count&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113815564708511058?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113815564708511058/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113815564708511058' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113815564708511058'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113815564708511058'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/01/google-maps-mania-north-american-bird.html' title='Google Maps Mania: North American Bird Watching Google Maps Mashup'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113801612697613589</id><published>2006-01-23T11:31:00.000Z</published><updated>2006-01-23T11:35:26.986Z</updated><title type='text'>Google, Yahoo, and the death of taxonomy?</title><content type='html'>I posted this on my &lt;a href="http://iphylo.blogspot.com"&gt;iPhylo blog&lt;/a&gt;, but since it is more relevant to iSpecies, and indeed the talk is the reason I built iSpecies, maybe it belongs here (see, I'm so self-absorbed I've started to blog my blogs - sad).&lt;br /&gt;&lt;br /&gt;On Wednesday, December 7th, I gave a talk at the &lt;a href="http://www.systass.org/"&gt;Systematics Association's&lt;/a&gt; AGM in London, with the slightly tongue-in-cheek title &lt;em&gt;Google, Yahoo, and the end of taxonomy?&lt;/em&gt;. It summarises some of the ideas that led me to create &lt;a href="http://ispecies.org"&gt;iSpecies.org&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;For fun I've made a &lt;a href="http://darwin.zoology.gla.ac.uk/~rpage/talks/nov2.mov"&gt;QuickTime movie&lt;/a&gt; of the presentation. Sadly there is no sound. 
Be warned that if you are offended by even mild nudity, this talk is not for you.&lt;br /&gt;&lt;br /&gt;The presentation style was inspired by Dick Hardt's wonderful presentation on &lt;a href="http://www.identity20.com/media/OSCON2005/"&gt;Identity 2.0&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113801612697613589?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113801612697613589/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113801612697613589' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113801612697613589'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113801612697613589'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/01/google-yahoo-and-death-of-taxonomy.html' title='Google, Yahoo, and the death of taxonomy?'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113778109358328644</id><published>2006-01-20T17:48:00.000Z</published><updated>2006-01-20T19:01:26.953Z</updated><title type='text'>Identifiers for publications</title><content type='html'>Despite my enthusiasm for LSIDs, here are some thoughts on identifiers for publications. Say you want to set up a bibliographic database. 
How do you generate stable identifiers for the contents?&lt;br /&gt;&lt;br /&gt;There's an interesting -- if dated -- review by the &lt;a href="http://www.nla.gov.au/initiatives/persistence/PIcontents.html"&gt;National Library of Australia&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://www.handle.net"&gt;Handle System&lt;/a&gt; generates Globally Unique Identifiers (GUIDs), such as &lt;a href="hdl:2246/3615"&gt;hdl:2246/3615&lt;/a&gt; (which can be resolved in Firefox if you have the &lt;a href="http://www.handle.net/resolver/mozilla/"&gt;HDL/DOI extension&lt;/a&gt;). Handles can also be resolved with URLs, e.g. &lt;a href="http://digitallibrary.amnh.org/dspace/handle/2246/3615"&gt;http://digitallibrary.amnh.org/dspace/handle/2246/3615&lt;/a&gt; and &lt;a href="http://hdl.handle.net/2246/3615"&gt;http://hdl.handle.net/2246/3615&lt;/a&gt;. &lt;a href="http://www.dspace.org/"&gt;DSpace&lt;/a&gt; uses handles.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.doi.org"&gt;DOIs&lt;/a&gt; deserve serious consideration, despite the costs, especially if the goal is to make literature more widely available. With DOIs, metadata will go into &lt;a href="http://www.crossref.org/"&gt;CrossRef&lt;/a&gt;, and publishers will be able to use that to add URLs to their electronic publications. That means people reading papers online will have immediate access to the papers in your database. Apart from cost, copyright is an issue (is the material you are serving copyrighted by somebody else?), and recent papers will already have DOIs; having more than one identifier for the same paper is not ideal. &lt;br /&gt;&lt;br /&gt;If Handles or DOIs aren't what you want to use, then some sort of persistent URL is an option. Such URLs can serve dynamically generated content even if they look like static URLs. 
For background see &lt;a href="http://www.devarticles.com/c/a/Apache/Using-ForceType-For-Nicer-Page-URLs/2/"&gt;Using ForceType For Nicer Page URLs - Implementing ForceType sensibly&lt;/a&gt; and &lt;a href="http://evolt.org/article/Making_clean_URLs_with_Apache_and_PHP/18/22880/index.html"&gt; Making "clean" URLs with Apache and PHP&lt;/a&gt;. To do this in Apache you need a .htaccess file in the web folder, e.g.:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;# AcceptPathInfo On is for Apache 2.x, don't use for Apache 1.x&lt;br /&gt;&amp;lt;Files uri&amp;gt;&lt;br /&gt;#   AcceptPathInfo On&lt;br /&gt;    ForceType application/x-httpd-php&lt;br /&gt;&amp;lt;/Files&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;You need to ensure that .htaccess can override FileInfo, e.g. have this in httpd.conf:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;lt;Directory "/Users/rpage/Sites/iphylo"&amp;gt;&lt;br /&gt;    Options Indexes MultiViews&lt;br /&gt;    AllowOverride FileInfo&lt;br /&gt;    Order allow,deny&lt;br /&gt;    Allow from all&lt;br /&gt;&amp;lt;/Directory&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This would mean that http://localhost/~rpage/iphylo/uri/234 would execute the file uri (which does not have a PHP extension). The file would look something like this:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;lt;?php&lt;br /&gt;&lt;br /&gt; // Parse URL to extract URI&lt;br /&gt; $uri = $_SERVER["SCRIPT_URL"];&lt;br /&gt; $uri = str_replace ($_SERVER["SCRIPT_NAME"] . '/', '', $uri);&lt;br /&gt; &lt;br /&gt; // Check for any prefixes, such as "rdf" or "rss" which will flag the&lt;br /&gt; // format to return&lt;br /&gt; // Check that it is indeed a URI&lt;br /&gt; // Lookup in our database&lt;br /&gt; // Display result&lt;br /&gt; ?&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Lastly, &lt;a href="http://www.cdlib.org/inside/diglib/ark/"&gt;ARK&lt;/a&gt; is another option, which is essentially a URL but it copes with the potential loss of a server. 
It comes from the &lt;a href="http://www.cdlib.org/"&gt;California Digital Library&lt;/a&gt;. I'm not sure how widely this has been adopted. My sense is it hasn't been, although the &lt;a href="http://pbj.ctlt.wsu.edu/cornish/archive/2005/12/11/8584.aspx"&gt;Northwest Digital Archives&lt;/a&gt; is looking at it.&lt;br /&gt;&lt;br /&gt;If cost and hassle are a consideration, I'd go for clean URLs. If you wanted a proper bibliographic archive system I'd consider setting up a DSpace installation. One argument I found interesting in the Australian review is that Handles and DOIs resolve to a URL that may be very different to the identifier, and if people copy the URL in the location bar they won't have copied the GUID, which somewhat defeats the point. In other words, if they are going to store the identifiers, say in a database, they need to get the identifier, not the URL.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113778109358328644?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113778109358328644/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113778109358328644' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113778109358328644'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113778109358328644'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/01/identifiers-for-publications.html' title='Identifiers for publications'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113776650834814364</id><published>2006-01-20T14:15:00.000Z</published><updated>2006-01-20T14:15:08.353Z</updated><title type='text'>Creative Commons Welcome Pack</title><content type='html'>&lt;div style="float: right; margin-left: 10px; margin-bottom: 10px;"&gt; &lt;a href="http://www.flickr.com/photos/cmiller/87784936/" title="photo sharing"&gt;&lt;img src="http://static.flickr.com/37/87784936_d88d4a3e91_m.jpg" alt="" style="border: solid 2px #000000;" /&gt;&lt;/a&gt; &lt;br /&gt; &lt;span style="font-size: 0.9em; margin-top: 0px;"&gt;  &lt;a href="http://www.flickr.com/photos/cmiller/87784936/"&gt;Creative Commons Welcome Pack&lt;/a&gt;  &lt;br /&gt;  Originally uploaded by &lt;a href="http://www.flickr.com/people/cmiller/"&gt;Carlfish&lt;/a&gt;. &lt;/span&gt;&lt;/div&gt;I want one!&lt;br clear="all" /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113776650834814364?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113776650834814364/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113776650834814364' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113776650834814364'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113776650834814364'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/01/creative-commons-welcome-pack.html' title='Creative Commons Welcome Pack'/><author><name>Rod 
Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113776651888355433</id><published>2006-01-20T14:09:00.000Z</published><updated>2006-01-20T14:15:18.896Z</updated><title type='text'>Using RSS feeds to notify clients when data changes</title><content type='html'>I was iChatting with David Remsen earlier this morning, and he suggested using RSS feeds as a way for data providers to "notify" clients when their data has changed (e.g., if they've added some new names). Nice idea: it frees the client (such as a triple store) from having to download the entire data set every time, or to compute the difference between the data held locally and the data held by the remote source. Turns out that &lt;a href="http://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers"&gt;HTTP conditional GET&lt;/a&gt; can be used to tell if something has changed.&lt;br /&gt;&lt;br /&gt;So, the idea is that a data source timestamps its data, and when data is modified it adds the modified records to its RSS feed. 
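&lt;br /&gt;&lt;br /&gt;On the data source's side, HTTP conditional GET can be sketched in Python. This is a minimal illustration under stated assumptions, not any particular server's API. conditional_get is a hypothetical helper, request headers are looked up with exact-case keys, and weak ETags (W/...) are ignored. The feed advertises Last-Modified and ETag headers. When a polling client echoes them back in If-Modified-Since or If-None-Match and nothing has changed, the source answers 304 Not Modified with no body.&lt;pre&gt;
```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(feed_body, last_modified, etag, request_headers):
    """Decide whether to send the feed (200) or just 304 Not Modified.

    feed_body:       the RSS document as bytes
    last_modified:   datetime with tzinfo=timezone.utc of the latest change
    etag:            quoted string identifying this version of the feed
    request_headers: dict of the client's request headers (exact-case keys)
    Returns a (status, headers, body) tuple.
    """
    headers = {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "ETag": etag,
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    if if_none_match is not None:
        # If-None-Match takes precedence; "*" matches any current version.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        not_modified = etag in tags or "*" in tags
    elif if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds first.
        not_modified = not (last_modified.replace(microsecond=0) > since)
    else:
        not_modified = False

    if not_modified:
        # The client's copy is current: send headers only, no body.
        return 304, headers, b""
    return 200, headers, feed_body


if __name__ == "__main__":
    changed = datetime(2006, 1, 20, 14, 9, 0, tzinfo=timezone.utc)
    first = conditional_get(b"rss", changed, '"v1"', {})
    print(first[0])  # 200: no validators, so the whole feed is sent
    again = conditional_get(b"rss", changed, '"v1"',
                            {"If-Modified-Since": first[1]["Last-Modified"]})
    print(again[0])  # 304: nothing has changed since the last poll
```
&lt;/pre&gt;A client that polls often therefore only pays for a full download after the feed has actually changed.&lt;br /&gt;&lt;br /&gt;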
The data consumer periodically checks the RSS feed, and if it has changed it grabs the feed and stores the new data (which, in the case of a triple store, can be easily parsed into a suitable form).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113776651888355433?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113776651888355433/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113776651888355433' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113776651888355433'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113776651888355433'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/01/using-rss-feeds-to-notify-clients-when.html' title='Using RSS feeds to notify clients when data changes'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113746049944447592</id><published>2006-01-17T01:11:00.000Z</published><updated>2006-01-17T01:14:59.453Z</updated><title type='text'>Says it all...</title><content type='html'>&lt;a href="http://photos1.blogger.com/blogger/4123/605/1600/dilbert2005121017631.2.gif"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/4123/605/400/dilbert2005121017631.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;Why I avoid discussions of standards 
like the plague...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113746049944447592?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113746049944447592/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113746049944447592' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113746049944447592'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113746049944447592'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/01/says-it-all.html' title='Says it all...'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113740637463837478</id><published>2006-01-16T09:55:00.000Z</published><updated>2006-01-24T21:31:42.773Z</updated><title type='text'>EXIF tags</title><content type='html'>&lt;a href="http://photos1.blogger.com/blogger/4123/605/1600/exif.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/4123/605/320/exif.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;Some images come with embedded metadata, such as &lt;a href="http://www.exif.org/"&gt;EXIF&lt;/a&gt; tags or &lt;a href="http://www.adobe.com/products/xmp/main.html"&gt;XMP&lt;/a&gt;. Images from &lt;a href="http://www.antweb.org/"&gt;AntWeb&lt;/a&gt; are a good example. 
These tags can be viewed by various programs, such as Adobe Photoshop, or utilities such as &lt;a href="http://homepage.mac.com/aozer/EV/"&gt;EXIF Viewer&lt;/a&gt;, seen here.&lt;br /&gt;&lt;br /&gt;So, an obvious step (assuming we start using a triple store as a backend for iSpecies, and/or provide the results of a query in RDF) would be to extract metadata from EXIF tags. For example, the image &lt;a href="http://www.antweb.org/images/casent0100367/casent0100367_p_1_low.jpg"&gt;http://www.antweb.org/images/casent0100367/casent0100367_p_1_low.jpg&lt;/a&gt; of &lt;em&gt;Proceratium google&lt;/em&gt; in AntWeb has the following metadata:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt; File name: casent0100367_p_1_low.jpg&lt;br /&gt; File size: 17811 bytes (0x0, infbpp, 0x)&lt;br /&gt; EXIF Summary: &lt;br /&gt;&lt;br /&gt;Camera-Specific Properties:&lt;br /&gt;&lt;br /&gt; Camera Software: EXIFutils V2.5.7&lt;br /&gt; Photographer: April Nobile&lt;br /&gt;&lt;br /&gt;Image-Specific Properties:&lt;br /&gt;&lt;br /&gt; Image Created: 2005:09:27 09:54:34&lt;br /&gt; Comment:  Attribution-NonCommercial-ShareAlike Creative Commons License&lt;br /&gt;&lt;br /&gt;Other Properties:&lt;br /&gt;&lt;br /&gt; Exif IFD Pointer: 196&lt;br /&gt; Exif Version: 2.20&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Hence, we could extract the relevant bits (author, date, copyright) and store those. This could be done in bulk using a tool such as &lt;a href="http://www.sno.phy.queensu.ca/~phil/exiftool/"&gt;ExifTool&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The example of AntWeb does show one weakness of free-text metadata. The image is licensed under "Attribution-NonCommercial-ShareAlike Creative Commons License". &lt;strike&gt;I'm assuming this is &lt;a href="http://creativecommons.org/licenses/by-nc-sa/2.0/"&gt;Attribution-NonCommercial-ShareAlike 2.0&lt;/a&gt;, but without a URL it is a faff to work this out. 
&lt;/strike&gt; Ah, looking at the AntBase pages for individual specimens, it's actually &lt;a href="http://creativecommons.org/licenses/by-nc-sa/1.0/"&gt;1.0&lt;/a&gt;. Yes, it's pretty obvious, but it still requires string matching. These things need to be computer readable as well, and versioned (for example, which version of this license was intended?).&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.antweb.org/images/april.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px;" src="http://www.antweb.org/images/april.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;For the photographer (such as AntWeb's April Nobile - seen here), it might be useful to create a FOAF file to link to, so that we have metadata about the creator of the images.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113740637463837478?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113740637463837478/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113740637463837478' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113740637463837478'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113740637463837478'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/01/exif-tags.html' title='EXIF tags'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113681685610964059</id><published>2006-01-09T14:27:00.000Z</published><updated>2006-01-16T19:10:27.266Z</updated><title type='text'>From the Blogsphere</title><content type='html'>Some interesting comments on iSpecies and related matters:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.bio-itworld.com/newsletters/insideit/2006/01/11/17508/view"&gt;Hip Hop Offers Lessons on Life Science Data Integration&lt;/a&gt;&lt;br /&gt;(Great title)&lt;br /&gt;&lt;br /&gt;&lt;a href="http://mattdowling.blogspot.com/2005/12/ispeciesorg-and-census-of-marine-life.html"&gt;Ontogeny: iSpecies.org and Census of Marine Life Update&lt;/a&gt;&lt;br /&gt;&lt;blockquote&gt;I searched on Solenopsis invicta and thought the return information was great.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://scilib.typepad.com/science_library_pad/2006/01/open_data_and_o.html"&gt;open data and open APIs enable scientific mashups&lt;/a&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;The biodiversity community is one group working to develop such services. To demonstrate the principle, Roderic Page of the University of Glasgow, UK, built what he describes as a "toy" — a mashup called Ispecies.org (http://darwin.zoology.gla.ac.uk/~rpage/ispecies). If you type in a species name it builds a web page for it showing sequence data from GenBank, literature from Google Scholar and photos from a Yahoo image search. If you could pool data from every museum or lab in the world, "you could do amazing things", says Page.&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;a href="http://blog.uwinnipeg.ca/loomware/archives/001663.html"&gt;Loomware&lt;/a&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;This is too cool - some interesting examples of "Scientific mashups" from Richard Akerman. 
I followed the iSpecies link and was blown away by the data that was returned. I studied a damselfly called Argia vivida for my graduate degree and way back then finding data on the bug was not always easy. Searching the species in iSpecies bring up a TaxID number to the NCBI Taxonomy Browser, a list of papers from Google Scholar and images from Yahoo Images. This is an excellent example of what we have all been waiting for with the promise of the Web and web services in particular. The concept of a page for every species known is a dream come true for science, so this one is worth watching.&lt;br /&gt;&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113681685610964059?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113681685610964059/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113681685610964059' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113681685610964059'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113681685610964059'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/01/from-blogsphere.html' title='From the Blogsphere'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113656466422508746</id><published>2006-01-06T16:24:00.000Z</published><updated>2006-01-06T16:24:24.253Z</updated><title 
type='text'>AntWeb-Google Earth Map</title><content type='html'>&lt;div style="float: right; margin-left: 10px; margin-bottom: 10px;"&gt; &lt;a href="http://www.flickr.com/photos/gisuser/49512446/" title="photo sharing"&gt;&lt;img src="http://static.flickr.com/27/49512446_1bb3f596cd_m.jpg" alt="" style="border: solid 2px #000000;" /&gt;&lt;/a&gt; &lt;br /&gt; &lt;span style="font-size: 0.9em; margin-top: 0px;"&gt;  &lt;a href="http://www.flickr.com/photos/gisuser/49512446/"&gt;AntWeb- Google Earth Map&lt;/a&gt;  &lt;br /&gt;  Originally uploaded by &lt;a href="http://www.flickr.com/people/gisuser/"&gt;GISuser.com&lt;/a&gt;. &lt;/span&gt;&lt;/div&gt;This is a nice example of the kind of thing that can be done when georeferenced specimen data are readily available. Need to think about doing this for iSpecies.&lt;br /&gt;&lt;br /&gt;For more on AntWeb and Google Earth, visit &lt;a href="http://www.antweb.org/google_earth.jsp"&gt;here&lt;/a&gt;.&lt;br clear="all" /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113656466422508746?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113656466422508746/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113656466422508746' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113656466422508746'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113656466422508746'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/01/antweb-google-earth-map.html' title='AntWeb-Google Earth Map'/><author><name>Rod 
Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113641470491735841</id><published>2006-01-04T22:41:00.000Z</published><updated>2006-01-04T22:45:04.926Z</updated><title type='text'>Nature on mashups</title><content type='html'>&lt;em&gt;Nature&lt;/em&gt; has an &lt;a href="http://www.nature.com/nature/journal/v439/n7072/full/439006a.html"&gt;article&lt;/a&gt; by &lt;a href="http://declanbutler.info/blog/"&gt;Declan Butler&lt;/a&gt; on "mashups", which mentions &lt;a href="http://ispecies.org/"&gt;iSpecies&lt;/a&gt;, and also Donat Agosti's work on AntWeb and Antbase.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113641470491735841?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113641470491735841/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113641470491735841' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113641470491735841'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113641470491735841'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2006/01/nature-on-mashups.html' title='Nature on mashups'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113545007428960485</id><published>2005-12-24T18:41:00.000Z</published><updated>2005-12-24T18:47:54.300Z</updated><title type='text'>The Modern Palimpsest</title><content type='html'>&lt;a href="http://www.coasterphotos.co.uk/images/apus.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px;" src="http://www.coasterphotos.co.uk/images/apus.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;Leigh Dodds included iSpecies in his recent post on the scientific paper as &lt;a href="http://www.ldodds.com/blog/archives/000264.html"&gt;modern palimpsest&lt;/a&gt;. He writes:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;iSpecies is a nice example of a science "mashup" that illustrates an alternative search interface for finding related content. I used the false results that can appear when performing simple keyword searches to reinforce the need for standard identifiers. 
(The need for a common, scoped identifier for authors, is a particular hobby horse of mine).&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Among the false results is an image of an oil tanker (among other things) that Yahoo returns when searching for "Apus apus".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113545007428960485?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113545007428960485/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113545007428960485' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113545007428960485'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113545007428960485'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2005/12/modern-palimpsest.html' title='The Modern Palimpsest'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113535790760938434</id><published>2005-12-23T17:06:00.000Z</published><updated>2005-12-23T17:11:47.616Z</updated><title type='text'>Character coding issues with Google Scholar</title><content type='html'>Tanja points out that results for articles with German titles can look awful (e.g., try searching on &lt;a href="http://darwin.zoology.gla.ac.uk/~rpage/ispecies/?q=Erica+inflata"&gt;Erica inflata&lt;/a&gt;). This is a problem with Google Scholar, which corrupts the characters. 
To verify this, do the search &lt;a href="http://scholar.google.com/scholar?hl=en&amp;lr=&amp;safe=off&amp;c2coff=1&amp;q=Erica+inflata&amp;btnG=Search"&gt;directly in Google Scholar&lt;/a&gt;. A workaround, if one had time, would be to screen-scrape some of the source sites. For example, Springer's web site could be scraped to get the correct title and a DOI. One more thing for the to-do list...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113535790760938434?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113535790760938434/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113535790760938434' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113535790760938434'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113535790760938434'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2005/12/character-coding-issues-with-google.html' title='Character coding issues with Google Scholar'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113533399676014403</id><published>2005-12-23T10:28:00.000Z</published><updated>2005-12-23T10:33:16.773Z</updated><title type='text'>NatureServe</title><content type='html'>&lt;a href="http://www.natureserve.org/"&gt;NatureServe&lt;/a&gt;, a "non-profit conservation organization that provides the scientific 
information and tools needed to help guide effective conservation action" have announced an XML schema for their proposed &lt;a href="http://services.natureserve.org/"&gt;web service&lt;/a&gt;. NatureServe's focus (I think) is on rare and endangered species in North America, but some of their data and/or schema may be useful for iSpecies.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113533399676014403?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113533399676014403/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113533399676014403' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113533399676014403'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113533399676014403'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2005/12/natureserve.html' title='NatureServe'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113476428599937473</id><published>2005-12-16T20:16:00.000Z</published><updated>2005-12-16T20:18:06.010Z</updated><title type='text'>Science NetWatch</title><content type='html'>&lt;span style="font-style:italic;"&gt;Science&lt;/span&gt; magazine's &lt;a href="http://www.sciencemag.org/content/vol310/issue5755/netwatch.dtl"&gt;NetWatch column&lt;/a&gt; for 16 December 2005 mentions iSpecies.&lt;div 
class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113476428599937473?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113476428599937473/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113476428599937473' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113476428599937473'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113476428599937473'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2005/12/science-netwatch.html' title='Science NetWatch'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113342791388840203</id><published>2005-12-01T09:05:00.000Z</published><updated>2005-12-01T09:05:13.916Z</updated><title type='text'>The Simplicity of the Semantic Web</title><content type='html'>&lt;div style="float: right; margin-left: 10px; margin-bottom: 10px;"&gt; &lt;a href="http://www.flickr.com/photos/kinetisonic/6119376/" title="photo sharing"&gt;&lt;img src="http://static.flickr.com/3/6119376_a4f5ef4e93_m.jpg" alt="" style="border: solid 2px #000000;" /&gt;&lt;/a&gt; &lt;br /&gt; &lt;span style="font-size: 0.9em; margin-top: 0px;"&gt;  &lt;a href="http://www.flickr.com/photos/kinetisonic/6119376/"&gt;The Simplicity of the Semantic Web&lt;/a&gt;  &lt;br /&gt;  Originally uploaded by &lt;a 
href="http://www.flickr.com/people/kinetisonic/"&gt;kinetisonic&lt;/a&gt;. &lt;/span&gt;&lt;/div&gt;This is simply to remind me of where iSpecies needs to be heading...&lt;br clear="all" /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113342791388840203?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113342791388840203/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113342791388840203' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113342791388840203'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113342791388840203'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2005/12/simplicity-of-semantic-web.html' title='The Simplicity of the Semantic Web'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113125326470099041</id><published>2005-11-06T04:52:00.000Z</published><updated>2005-11-06T05:01:04.706Z</updated><title type='text'>Nodalpoint notices</title><content type='html'>iSpecies was blogged by "Greg" at &lt;a href="http://www.nodalpoint.org/node/1737"&gt;Nodalpoint&lt;/a&gt;, which carried some &lt;a href="http://www.nodalpoint.org/node/1571"&gt;earlier discussion&lt;/a&gt; about LSIDs.&lt;br /&gt;&lt;br /&gt;Curiously, LSIDs pop up again:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;What is also cool is that each species has 
RDF formated metadata associated with it via an LSID, see here for an example. It would be nice if each species had its own permanent URL, which would be arguably more useful than an LSID, but I won't go there :)&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Actually, iSpecies doesn't link to LSIDs for taxon names (that's done through one of my other toys, the &lt;a href="http://darwin.zoology.gla.ac.uk/~rpage/portal"&gt;Taxonomic Search Engine&lt;/a&gt;). Perhaps it's time to link these two toys together?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113125326470099041?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113125326470099041/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113125326470099041' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113125326470099041'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113125326470099041'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2005/11/nodalpoint-notices.html' title='Nodalpoint notices'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113119187542210200</id><published>2005-11-05T11:51:00.000Z</published><updated>2005-11-05T11:57:55.423Z</updated><title type='text'>Being too careful</title><content type='html'>Managed to figure out why the Google Scholar results would sometimes 
appear and sometimes not. The server hosting iSpecies uses PortSentry to detect and block port scans. PortSentry decided that the Glasgow University proxy was evil. Our proxy has three IP addresses, two of which were blocked by PortSentry. If the DNS resolved the proxy address to a blocked IP, the Google Scholar Perl script would fail (as it connects to the outside world via the proxy). If the DNS happened to resolve it to the unblocked IP, it would work. Found this out by changing &lt;pre&gt;use LWP;&lt;/pre&gt; to &lt;pre&gt;use LWP::Debug '+';&lt;/pre&gt; and printing out the response status_line.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113119187542210200?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113119187542210200/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113119187542210200' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113119187542210200'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113119187542210200'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2005/11/being-too-careful.html' title='Being too careful'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18671685.post-113119082249869728</id><published>2005-11-05T11:34:00.000Z</published><updated>2005-11-05T11:45:50.103Z</updated><title type='text'>iSpecies 
launched</title><content type='html'>&lt;a href="http://iSpecies.org"&gt;iSpecies&lt;/a&gt; is a very simple test of E O Wilson's idea of a web page for each species. The data displayed are generated "on the fly" by querying other data sources, such as NCBI, Yahoo Images, and Google Scholar. The site was announced on &lt;a href="http://listserv.nhm.ku.edu/archives/taxacom.html"&gt;TAXACOM&lt;/a&gt; and the &lt;a href="http://listserv.nhm.ku.edu/archives/tdwg-sdd.html"&gt;Taxonomic Databases Working Group - Structure of Descriptive Data&lt;/a&gt; lists on 2 November 2005. It was blogged by &lt;a href="http://www.ldodds.com/blog/archives/000246.html"&gt;Leigh Dodds&lt;/a&gt; and &lt;a href="http://dannyayers.com/archives/2005/11/03/bits-2/"&gt;Danny Ayers&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18671685-113119082249869728?l=ispecies.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ispecies.blogspot.com/feeds/113119082249869728/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18671685&amp;postID=113119082249869728' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113119082249869728'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18671685/posts/default/113119082249869728'/><link rel='alternate' type='text/html' href='http://ispecies.blogspot.com/2005/11/ispecies-launched.html' title='iSpecies launched'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>8</thr:total></entry></feed>
