Sunday, March 19, 2006

Towards a faster iSpecies: building libxml and libxslt on Mac OS X

iSpecies is written in PHP, and calls a Perl CGI script (to query Google Scholar). This works, but is a bit slow. It also puts limits on what we can do. For example, it would be cool to make the search multithreaded so that the different sources are queried at the same time. This becomes a major issue if we want to "drill down." For example, if a taxon exists in NCBI, it would be useful to visit all the LinkOut resources and collect whatever information they make available. Likewise, Google Scholar results contain links to publishers that could be explored further (such as extracting bibliographic information from RIS files, or RSS feeds such as those available for Ingenta-hosted journals). All of this would delay displaying search results to the user, especially if we have to visit one link after another.

Multithreading would help, but PHP doesn't support it, hence I'm toying with moving to C++ and building a "proper" application (I don't do Java). This means I need XML, XPath, and XSLT libraries for C/C++, and finding them has been, ahem, interesting. I was going to use Sablotron (which I use in my PHP 4 and Perl work), but its documentation is just awful (where are some nice examples?). I'll probably use libxml and libxslt instead. These come with Mac OS X 10.3.9 (I do my development on a G4 iBook, before moving stuff to a Linux box), but Apple hasn't compiled libxml with XPath support (sigh). I built libxml 2.2.63 OK, but libxslt 1.1.15 needed a little hand-holding because Apple's libxml is also installed. The following does the trick:


./configure --with-libxml-prefix=/usr/local


This tells configure to use the version of libxml I installed in /usr/local rather than Apple's. Now, once I get my head around libcurl, I'll try to build something and see if we can speed up iSpecies.

4 comments:

Anonymous said...

Have you considered Ruby? It has multithreading, good XML/XSL support, interactive execution, and was originally designed as a Perl successor. Plus, all the cool kids are using it.

samantha said...

In my experience, doing this in C++ will slow you down even more than Java would. Your best trade-off between programming productivity and performance would most likely be Python. There is plenty of server-side, multithreaded, etc. Python code out there. Code any really critical bottlenecks in C/C++, but otherwise don't hamstring your creativity with such languages.

Anonymous said...

Ruby? Why not Python? Yet more cool kids are using it... ;-)
