Friday, December 23, 2005

Character coding issues with Google Scholar

Tanja points out that results for articles with German titles can look awful (e.g., try searching on Erica inflata). The problem lies with Google Scholar itself, which corrupts the characters in these titles. To verify this, run the same search directly in Google Scholar. A workaround, if one had time, would be to screen-scrape some of the source sites. For example, Springer's web site could be scraped to get the correct title and a DOI. One more thing for the to-do list...
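As a rough idea of what the scraping workaround might look like, here is a minimal Python sketch. It assumes the publisher's article page carries the title and DOI in `<meta name="citation_title">` and `<meta name="citation_doi">` tags; that is an assumption about the page markup, not something checked against Springer's site. The sample page, title, and DOI below are made up for illustration, and fetching the real page is left out.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> tags from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        # Keep only the first value seen for each citation_* tag.
        if name.startswith("citation_") and content is not None:
            self.meta.setdefault(name, content)


def extract_title_and_doi(html_text):
    """Return (title, doi) from already-decoded HTML; None where a tag is missing."""
    parser = CitationMetaParser()
    parser.feed(html_text)
    return parser.meta.get("citation_title"), parser.meta.get("citation_doi")


# Hypothetical page: the title and DOI here are invented for the example.
SAMPLE_PAGE = """<html><head>
<meta name="citation_title" content="Ein erfundener Titel mit Umlauten: äöü">
<meta name="citation_doi" content="10.1234/example.5678">
</head><body></body></html>"""

title, doi = extract_title_and_doi(SAMPLE_PAGE)
print(title)
print(doi)
```

The title scraped this way could then replace the corrupted one from Google Scholar, provided the page is decoded with the character encoding it actually declares.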