Saturday, March 03, 2007

Wikis and the future of iSpecies

So, where next for iSpecies? An obvious route seems to be adding a Wiki, something I've discussed on SemAnt. Imagine pre-populating a Wiki with search results from iSpecies, especially if we drilled down using the links in the NCBI search results to extract further content, and made use of the improved mapping between NCBI and TreeBASE names (TBMap).

A few things have stopped me from implementing this. One is the problem that wikis are (usually) just unstructured text. However, semantic wikis are starting to emerge (e.g., Semantic MediaWiki and Rhizome -- I'll be adding links to more). Using a semantic wiki means we can enter structured information and render it as RDF, which would make it potentially a great way to capture basic facts (triples) about a taxon while still having human-readable and editable documents.
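
As a rough illustration of what "basic facts (triples)" about a taxon could look like, here is a minimal Python sketch that writes a couple of statements as N-Triples. The example.org URIs and the property names are placeholders made up for this example; they are not an existing vocabulary, and a real semantic wiki would generate its own.

```python
# Hypothetical base URIs, invented for this sketch only.
TAXON = "http://example.org/taxon/"
PROP = "http://example.org/property/"

def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"

def literal(value):
    """Quote a plain string as an N-Triples literal, escaping \\ and "."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

whale = uri(TAXON + "0342001")
facts = [
    (whale, uri(PROP + "scientificName"), literal("Physeter catodon")),
    (whale, uri(PROP + "synonym"), literal("Physeter macrocephalus")),
]
print(to_ntriples(facts))
```

Each line of output is one subject-predicate-object statement, which is the form a triple store would ingest, while the wiki page itself stays readable and editable.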

I've been pondering this, and toying with either writing something myself, or using an off the shelf solution. It's likely that I'll write something myself, because I want to link it to a triple store, and I want to pre-populate the wiki as much as possible.

One minor thing that has been holding me back is thinking about URLs to link to the content. For example, I'd like to be able to do the following:
  • Link to a page by either a unique numerical identifier (e.g., "wiki/0342001") or a name (e.g., "wiki/Physeter catodon"). If the user enters the numerical version, they are redirected to the text identifier.

  • If a name is a synonym, redirect the user to the page for the name it is a synonym of. For example, "wiki/Physeter macrocephalus" would redirect to "wiki/Physeter catodon".

  • If the name is a homonym, display a disambiguation page listing the different taxa with that name.

  • If a user creates a URL that doesn't exist, the wiki would offer to make a new page, after checking that the URL tag is a scientific name (say by using uBio's XML web service).
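
The four rules above can be sketched as a single lookup function. This is only a sketch in Python with in-memory dictionaries standing in for the database: the function names, the returned action labels, and the crude name check (a stand-in for a real call to something like uBio's service) are all assumptions of this example. "Morus" is used as the homonym because it is both a plant genus and a bird genus.

```python
import re

# Toy data standing in for the wiki's database (illustrative only).
PAGES = {"0342001": "Physeter catodon"}  # numerical id -> page name
SYNONYMS = {"Physeter macrocephalus": "Physeter catodon"}
HOMONYMS = {"Morus": ["Morus (plant genus)", "Morus (bird genus)"]}

def looks_like_scientific_name(name):
    """Crude placeholder for a real name check via a web service."""
    return re.fullmatch(r"[A-Z][a-z]+( [a-z]+){0,2}", name) is not None

def resolve(tag):
    """Return (action, target) for the text that follows 'wiki/'."""
    if re.fullmatch(r"[0-9]+", tag):
        # Numerical id: send the user on to the text identifier.
        name = PAGES.get(tag)
        return ("redirect", name) if name else ("not_found", tag)
    if tag in SYNONYMS:
        return ("redirect", SYNONYMS[tag])
    if tag in HOMONYMS:
        return ("disambiguate", HOMONYMS[tag])
    if tag in PAGES.values():
        return ("show", tag)
    if looks_like_scientific_name(tag):
        return ("offer_new_page", tag)
    return ("not_found", tag)

for tag in ["0342001", "Physeter macrocephalus", "Morus", "Balaenoptera musculus"]:
    print(tag, "->", resolve(tag))
```

Checking synonyms and homonyms before offering a new page means a mistyped or outdated name never silently creates a duplicate page.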

I've been learning about the joys of Apache's mod_rewrite, which looks like a nice way to deal with some of these issues. For example, this .htaccess file handles both numerical and text identifiers.

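# Turn on the rewrite engine (needed before any RewriteRule takes effect)
RewriteEngine On
# Don't mess with the actual script call
RewriteRule ^get\.php - [L]
# URL is numerical id (one or more digits)
RewriteRule ^([0-9]+)$ get.php?id=$1 [L]
# URL is tag name (starts with a letter)
RewriteRule ^([A-Za-z].*)$ get.php?name=$1 [L]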

Then, the code in get.php would display the appropriate page. If the parameter is a numerical id, it's a simple database lookup (numerical identifiers are great because databases handle them easily, and they can be stored without worrying about issues such as capitalisation and punctuation). If it's a name we follow the steps outlined above to handle synonyms, etc.
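
To see which branch a given URL would take, the routing can be mimicked outside Apache. The Python sketch below is only an approximation of mod_rewrite's behaviour: it tries three patterns in order (leave the script alone, all digits, starts with a letter) against the path as a .htaccess file would see it, without a leading slash. The function name and the patterns are this sketch's own rendering of the rules.

```python
import re

# Rules are tried in order; a template of None means "leave the path untouched".
RULES = [
    (re.compile(r"^get\.php"), None),
    (re.compile(r"^([0-9]+)$"), "get.php?id={}"),
    (re.compile(r"^([A-Za-z].*)$"), "get.php?name={}"),
]

def rewrite(path):
    """Return the rewritten target for a request path; the first match wins."""
    for pattern, template in RULES:
        match = pattern.match(path)
        if match:
            if template is None:
                return path
            return template.format(match.group(1))
    # No rule matched: the request passes through unchanged.
    return path

for path in ["0342001", "Physeter catodon", "get.php"]:
    print(repr(path), "->", repr(rewrite(path)))
```

Requiring at least one digit in the second pattern keeps an empty path from being treated as a numerical id.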

The point of this is that we get clean URLs, but users can still link using natural URLs like those in Wikipedia and Wikispecies. Given this, why don't I use Wikispecies? Well, because it's not a semantic wiki, so I don't gain anything from locking information up in this format.


Sarah said...

What about using iSpecies to make a field guide? I'm not a coder, so some of your discussion is a bit beyond me. I am an educator/ scientist, hard at work developing an ecology learning/ invasive species monitoring program for 7th and 8th graders in Maine, where every 7th and 8th grader has a laptop computer thanks to the Maine Learning Technology Initiative (think of 32,000 data collectors spread around a state the size of Ireland). We're developing software for the laptops to help the students follow protocols in the field. We have a cadre of scientists who are interested in the data students collect (these scientists and a group of educators have helped us design the program). What we really need now is an electronic species identification tool. I'd love to be able to download iSpecies searches onto a laptop (preferably re-configured to focus on the data/ images most useful for identifying the species), disconnect from the internet, go in the field and have all that data with me... oh, and this has to be easy for teachers and or middle school students to do on their own. Ideas?

justg said...

sarah: You said: "make a field guide". If you are interested in identification, are you aware of the dichotomous keys available online:
There are data dumps of the entire Wikimedia projects available for offline use too:

Rod Page said...

Sarah, this would be fun, although a lot of the information iSpecies finds is only available online. That said, one could imagine grabbing the images, and perhaps a map via GBIF, and any papers that were online as PDFs, and using that as a starting point.
To do this off-line (e.g., in the field), I guess we need some simple software to display the results. Of course, we could use a web browser (and browse the content off-line). Most of the software that drives wikis can work equally well off-line, just so long as you don't look up remote web sites.

Alternatively, perhaps something like a personal wiki would work. VoodooPad is an example of the sort of software I mean. This looks very easy to use, so that students could add content easily.

Have you contacted Bob Morris at the Electronic Field Guide? Bob is based at the University of Massachusetts Boston, and has spent a lot of time working on field guides.

Sarah said...

Hello justg and rod,

I just took a quick look at your links, justg, and I think both services as they are won't quite work. With invasive species, we're looking at a few plants, a few crabs, a few crayfish, so the existing dichotomous keys are too narrow. A total wiki data dump would be too broad... no quick answer, but thanks. There might be something we can use there.

Rod, you make a fine point about the online nature of the iSpecies content (this is iSpecies' strength, after all). What I was thinking was more along the lines of running special searches along the lines of iSpecies but customized for school children (i.e., pictures, size ranges, defining characteristics, growth characteristics, other identifiable traits, etc.), downloading them, and then making them searchable offline.

And yes, I'm in touch with Bob Morris. His EFG solution relies on Apache, and the MLTI administration is understandably committed to keeping their laptop image simple, i.e., without even a disabled version of Apache running. Another hurdle for us is that, at least as of yet, his interface is not customizable by anyone but his programmer, and the look and feel and interface that we envision is quite different. So I'm looking for alternatives while I discuss possibilities with Bob.

I haven't seen Voodoopad yet. I'll take a look at that next.

Thanks very much for your thoughts!

Donat Agosti said...

If you are interested in getting access to the original content of descriptions, you can get all of those available right now via two RSS feeds. Picking up Terry's idea, the SRS now provides two feeds:
- including links to the XML documents (link to TaxonX will be done soon)
- including links to the HTML documents, as displayed in the SRS search results

This way, you know what we have. You then can get the xml versions.

All of them have an LSID, and we work on getting a resolver set up.

Also we issue LSIDs for the materialsCitation (the former collecting events) and serve them via TAPIR to GBIF. So you then could also pick up individual records published in books.

For me, this seems one way to bring the content of the publications into the semantic web environment. If more people picked this up, we would quickly have a huge base to populate iSpecies etc.


Recovering Algebraist said...

Describes a just-finishing project in our graduate software engineering course. It adds a button to Semantic MediaWiki (SMW) that lets you auto-generate MW links to a bunch of stuff discovered by several (presently hard-coded) searches, including some obvious ones like Google and some less obvious ones. Most interesting is that it can do so to terms expressed in an SMW's ontology. To me this is interesting because SMW is supposed to be able to import OWL ontologies.

I had my hands on this yesterday, and as student projects go it seemed pretty solid. I'm looking into the code this weekend. My colleague whose students did this agreed with me that it should be possible to refactor their code and produce an API to the search driver that would allow you to specify search mechanisms at configuration time, including semantically driven ones, e.g. with SPARQL queries instead of just text matching as in the present version.

Best of all will be if Rod takes it and does this... :-)

The students told me that they will be contributing it to the MW community mechanisms. I believe it has a GPL3 license on it.

Bob Morris

Recovering Algebraist said...

The new version EFG2 of the UMB Electronic Field Guide production software supports a template-based system that allows authors to tailor the look and feel of their guides, including multiple deployments against different templates from the same descriptive data. The template system itself is extensible in two ways, with difficulty depending on your IT resources. The first is that all the CSS in the templates currently in the library can be edited, allowing for trivial presentation changes for typography, colors, etc. The second is that the XSLT that transforms the XML form of species descriptions into template data can be edited.

As with the original EFG, there is a documented way to have among your descriptive data a parameterized query to services anywhere on the web to bring other resources (e.g. pictures) into your EFG. The reason this is implemented on an Apache Tomcat server--whether connected to the internet or not--is that arbitrary applications can make a request for the constructed descriptions as XML. Actually, our own HTML-based instances do this, and make the HTML via the template system.

The code is licensed under a GPL open source license, and is available on our web site. There is a Windows installer that is quite solid, and we are building a Linux distribution now. We have no current funding, so this is a slow, volunteer effort. From experience with the previous version, we expect that the software will run fine on recent versions of OS X.

Bob Morris