Monday, May 22, 2006

Why Google is good for science...maybe


I just noticed this piece from January by Jeff Perkel who, after "poking around" the iSpecies blog, wrote Why Google is good for science. Well, yes and no. On the one hand it's fabulous, but on the other hand they can play rough. For example, iSpecies used Google Scholar to find scientific papers for a species name. The traffic was pretty minimal in the scheme of things, but Google have now blocked iSpecies (and, as a consequence, my whole University - gulp!) from accessing Google Scholar.


Before anybody says, "but you got what you deserved because you broke Google's Terms of Service", I think in this case they are simply being lazy. If Google truly cared about making Google Scholar useful, they'd create an API. Because they haven't, I had to resort to screen scraping their unbelievably awful HTML (and I'm not the only one). The cost of setting up an API along the lines of the one available for the main Google search engine would be trivial.

Now that I've vented my spleen, it turns out the reason -- as I should have guessed -- is "intellectual property". Google Scholar's agreements with the publishers that it indexes prevent Google from making it available other than through the web site. Thanks to Rebecca Shapley for clarifying this. Once again, scientists are being ill served by our publishers. Perhaps somebody needs to set up an Open Source/Open Access equivalent of Google Scholar.

This is what I originally wrote, which perhaps is another reason publishers don't want Google Scholar having an API:

There would also be a potential market. In the UK we rate our research based on a number of factors including journal impact factor, as part of the gargantuan Research Assessment Exercise. Impact factors are supplied by ISI, and Google Scholar results compare well with that source. Just think of the possibilities of a service that used Google Scholar to rate scientists' output. It could even be part of a service like LinkedIn, which I stumbled on via Pierre's blog on geotagging RSS feeds (which is a whole separate issue).