Monday, March 05, 2007

5 Ways to Mix, Rip, and Mash Your Data

Spotted by Simon Rycroft: Nick Gonzalez has a comparison of mashup tools entitled 5 Ways to Mix, Rip, and Mash Your Data.
Call them pipes, teqlos, dapps, modules, mashups or whatever else but fact is that recently we have seen a good number of new services that allow developers and users to build mini-apps and mashups that mix and re-mix data. Here we run through 5 applications that allow you to mix, rip and mash your data, looking at the data input, output, REST support, suggested use, and required skill level.


Clearly, this stuff is attracting a lot of attention.

Saturday, March 03, 2007

Wikis and the future of iSpecies

So, where next for iSpecies? An obvious route seems to be adding a Wiki, something I've discussed on SemAnt. Imagine pre-populating a Wiki with search results from iSpecies, especially if we drilled down using the links in the NCBI search results to extract further content, and made use of the improved mapping between NCBI and TreeBASE names (TBMap).

A few things have stopped me from implementing this. One is the problem that wikis are (usually) just unstructured text. However, semantic wikis are starting to emerge (e.g., Semantic MediaWiki and Rhizome -- I'll be adding links to more at del.icio.us/rdmpage/semantic-wiki). Using a semantic wiki means we can enter structured information and render it as RDF, which would make it potentially a great way to capture basic facts (triples) about a taxon -- for example, that Physeter catodon belongs to the family Physeteridae -- while still having human-readable and editable documents.

I've been pondering this, and toying with either writing something myself or using an off-the-shelf solution. It's likely that I'll write something, because I want to link it to a triple store, and I want to pre-populate the wiki as much as possible.
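
As a rough sketch of what linking to a triple store might involve at its simplest -- and this is purely an illustration, with the SQLite backend, table layout, and column names being placeholders rather than a design decision -- something like the following (PHP, using PDO) would do:

<?php
// Sketch: the simplest possible triple store -- one row per
// (subject, predicate, object) statement. SQLite via PDO is just a
// placeholder backend for illustration.

$db = new PDO('sqlite:triples.db');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE IF NOT EXISTS triples (
    subject   TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object    TEXT NOT NULL
)');

// Pre-populate with a fact harvested from a search
$insert = $db->prepare('INSERT INTO triples VALUES (?, ?, ?)');
$insert->execute(array('Physeter catodon', 'memberOf', 'Physeteridae'));

// A wiki page for a taxon can then list everything known about it
$select = $db->prepare('SELECT predicate, object FROM triples WHERE subject = ?');
$select->execute(array('Physeter catodon'));
foreach ($select->fetchAll(PDO::FETCH_ASSOC) as $row) {
    echo $row['predicate'] . ': ' . $row['object'] . "\n";
}
?>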

One minor thing that has been holding me back is thinking about URLs to link to the content. For example, I'd like to be able to do the following:
  • Link to a page by either a unique numerical identifier (e.g., "wiki/0342001") or a name (e.g., "wiki/Physeter catodon"). If the user enters the numerical version, they get redirected to the text identifier.

  • If a name is a synonym, redirect the user to the page for the accepted name. For example, "wiki/Physeter macrocephalus" would redirect to "wiki/Physeter catodon".

  • If the name is a homonym, display a disambiguation page listing the different taxa with that name.

  • If a user creates a URL that doesn't exist, the wiki would offer to make a new page, after checking that the URL tag is a scientific name (say by using uBio's XML web service).
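
For that last check, here's a sketch of what asking uBio might look like in PHP. The service URL, parameter names, and response elements below are assumptions written from memory, so the real NameBank web service documentation should be checked before using any of this:

<?php
// Sketch: does this URL tag look like a known scientific name?
// The uBio endpoint, parameters, and XML element names are assumptions.
// Requires allow_url_fopen for simplexml_load_file() on a remote URL.

function is_scientific_name($name, $keyCode)
{
    $url = 'http://www.ubio.org/webservices/service.php'
         . '?function=namebank_search'
         . '&searchName=' . urlencode($name)
         . '&keyCode=' . urlencode($keyCode);

    $xml = @simplexml_load_file($url);
    if ($xml === false) {
        return false;    // treat an unreachable service as "not a name"
    }

    // Assume any returned record counts as a match
    $matches = $xml->xpath('//scientificNames');
    return count($matches) > 0;
}

$tag     = 'Physeter catodon';     // the URL tag the user asked for
$keyCode = 'YOUR_UBIO_KEYCODE';    // uBio issues these per user (assumption)

if (is_scientific_name($tag, $keyCode)) {
    // offer to create the new wiki page
}
?>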


I've been learning about the joys of Apache's mod_rewrite, which looks like a nice way to deal with some of these issues. For example, this .htaccess file handles both numerical and text identifiers.

RewriteEngine On
# Don't mess with the actual script call
RewriteRule ^get\.php - [L]
# URL is a numerical id
RewriteRule ^([0-9]+)$ get.php?id=$1 [L]
# URL is a tag name
RewriteRule ^([A-Za-z].*)$ get.php?name=$1 [L]
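
With these rules in place (assuming the .htaccess file sits in the wiki directory), a request for wiki/0342001 is handled as get.php?id=0342001, while wiki/Physeter catodon is handled as get.php?name=Physeter catodon.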

Then the code in get.php displays the appropriate page. If the parameter is a numerical id, it's a simple database lookup (numerical identifiers are great because databases handle them easily, and they can be stored without worrying about issues such as capitalisation and punctuation). If it's a name, we follow the steps outlined above to handle synonyms, etc.
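
As a sketch of how get.php might dispatch on these two cases -- the helper functions (lookup_by_id, resolve_name, and so on) are placeholders for illustration, not existing code:

<?php
// Sketch of get.php: dispatch on numerical id versus name.
// lookup_by_id(), resolve_name() and the show_* functions are placeholders.

if (isset($_GET['id'])) {
    // Numerical identifier: simple database lookup, then redirect the
    // user to the canonical, name-based URL
    $page = lookup_by_id((int) $_GET['id']);
    if ($page) {
        header('Location: /wiki/' . rawurlencode($page['name']));
        exit;
    }
    show_not_found();
} elseif (isset($_GET['name'])) {
    $name = $_GET['name'];
    $taxa = resolve_name($name);    // all taxa that use this name string

    if (count($taxa) == 0) {
        offer_new_page($name);              // after the uBio name check above
    } elseif (count($taxa) > 1) {
        show_disambiguation_page($taxa);    // homonym: list the alternatives
    } elseif ($taxa[0]['accepted_name'] != $name) {
        // Synonym: redirect to the accepted name's page
        header('Location: /wiki/' . rawurlencode($taxa[0]['accepted_name']));
        exit;
    } else {
        show_page($taxa[0]);
    }
}
?>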

The point of this is that we get clean URLs, but users can still link using natural URLs like those in Wikipedia and Wikispecies. Given this, why don't I just use Wikispecies? Well, because it's not a semantic wiki, so I don't gain anything by locking information up in that format.