
Following on from an earlier post, I've now added DOI extraction for SciELO, which hosts Brazilian publications, and Taylor and Francis. This was motivated by searching iSpecies for the ant Trachymyrmex opulentus, for which only papers hosted by these two publishers appear in the search results.

Again, we are reduced to screen scraping (sigh). Why oh why don't the people who design these web sites get their act together and embed useful information in the HTML, rather than assuming that only humans will make use of these pages?
One provider that is clued up is Ingenta. For example, take a look at the HTML for the article "Influence of Topography on the Distribution of Ground-Dwelling Ants in an Amazonian Forest" (doi:10.1076/snfe.38.2.115.15923) on the Ingenta site (Firefox and Camino users can see the source here). Embedded in the <meta> tags is all sorts of metadata, including the DOI:
<meta name="DC.identifier" scheme="URI"
content="info:doi/10.1076/snfe.38.2.115.15923"/>
The use of consistently formatted tags makes data extraction much easier. Of course, it's no surprise that Ingenta do this well (check out their blog).
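
To make the point concrete, here is a minimal sketch of how little code is needed when the DOI sits in a consistently formatted <meta> tag. It uses only Python's standard library and is not the code iSpecies actually runs; the names DOIMetaParser and extract_dois are made up for illustration, and it assumes the page follows the DC.identifier / info:doi/ pattern shown above.

```python
from html.parser import HTMLParser


class DOIMetaParser(HTMLParser):
    """Collect DOIs from <meta name="DC.identifier" content="info:doi/..."> tags.

    Hypothetical helper: assumes the Dublin Core pattern seen on the Ingenta page.
    """

    PREFIX = "info:doi/"

    def __init__(self):
        super().__init__()
        self.dois = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<meta ... />) are also routed here by HTMLParser.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content") or ""
        if name.lower() == "dc.identifier" and content.startswith(self.PREFIX):
            # Strip the "info:doi/" prefix to leave the bare DOI.
            self.dois.append(content[len(self.PREFIX):])


def extract_dois(html):
    """Return every DOI found in DC.identifier meta tags of an HTML string."""
    parser = DOIMetaParser()
    parser.feed(html)
    parser.close()
    return parser.dois


# The meta tag quoted above, wrapped in a minimal page.
sample = """<html><head>
<meta name="DC.identifier" scheme="URI"
content="info:doi/10.1076/snfe.38.2.115.15923"/>
</head><body></body></html>"""

print(extract_dois(sample))
```

Compare that with screen scraping, where each publisher needs its own hand-tuned pattern that breaks whenever the page layout changes.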
