Managed to figure out why the Google Scholar results would sometimes appear and sometimes not. The server hosting iSpecies uses PortSentry to detect and block port scans, and PortSentry decided that the Glasgow University proxy was evil. Our proxy has three IP addresses, two of which PortSentry had blocked. The Google Scholar Perl script connects to the outside world via the proxy. So it failed whenever DNS resolved the proxy's hostname to one of the blocked IPs, and it worked whenever DNS happened to return the unblocked one. Found this out by changing
use LWP;
to
use LWP::Debug '+';
and printing out the response status_line.
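A more direct way to spot this kind of partial blocking is to resolve every address behind the proxy's hostname and try each one in turn from the server that runs the script. Below is a minimal Perl sketch of that idea. It is not the script iSpecies uses. `probe_addresses` is a hypothetical helper, and the sketch makes several assumptions: it handles IPv4 only, it uses only core modules (Socket, IO::Socket::INET), and it treats a successful TCP connect as "not blocked". An address that PortSentry has blocked would most likely time out or be rejected, depending on how the block is applied.

```perl
use strict;
use warnings;
use Socket qw(inet_ntoa);
use IO::Socket::INET;

# Hypothetical helper for illustration; not part of LWP or iSpecies.
# Resolves every IPv4 address behind $host (round-robin DNS may return
# several) and attempts a TCP connection to each one on $port.
# Returns a list of hash refs: { ip => ..., ok => 1|0, error => ... }.
sub probe_addresses {
    my ($host, $port, $timeout) = @_;
    $timeout //= 5;    # seconds to wait for each connect attempt

    # In list context gethostbyname returns
    # ($name, $aliases, $addrtype, $length, @packed_addresses).
    my (undef, undef, undef, undef, @packed) = gethostbyname($host);
    die "Cannot resolve $host\n" unless @packed;

    my @results;
    for my $ip (map { inet_ntoa($_) } @packed) {
        my $sock = IO::Socket::INET->new(
            PeerAddr => $ip,
            PeerPort => $port,
            Proto    => 'tcp',
            Timeout  => $timeout,
        );
        if ($sock) {
            close $sock;    # we only wanted to know the connect succeeded
            push @results, { ip => $ip, ok => 1, error => '' };
        }
        else {
            # IO::Socket::INET reports connect failures in $@ (and $!).
            push @results, { ip => $ip, ok => 0, error => ($@ || "$!") };
        }
    }
    return @results;
}
```

For example, calling `probe_addresses('proxy.example.ac.uk', 8080)` returns one record per resolved IP. The hostname and port in that call are placeholders, not Glasgow's real proxy settings. A result with two failures and one success would match the intermittent behaviour described above. Testing each address separately avoids depending on whichever IP the resolver happens to hand out first.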
1 comment:
I'm missing a bit of context here. Do you have a Perl script for parsing Google Scholar results?