Perl, Selenium, and ASP.NET
I've done a lot of web scraping with Perl over the years, but I hadn't experienced anything quite like the "Next page" link that ASP.NET threw at me this week. The opposite of REST, ASP.NET's ctlPagePlaceHolder makes even the simplest navigation beyond the reach of WWW::Mechanize, as far as I can tell. Luckily, Selenium came to my rescue.
If you haven't experienced Selenium automation, it's quite impressive. From a normal shell window (OS X Terminal.app in my case) you launch a Java server
selenium-remote-control-1.0.3/selenium-server-1.0.3
$ java -jar selenium-server.jar
and then use WWW::Selenium in your Perl program. As your Perl program runs, it launches Firefox on your local workstation and performs whatever commands you issue. In my case this was simply "click 'Next Page'". :)
The Selenium IDE Firefox plugin is great for quickly mocking up what commands you need. Once it's working, drop those commands into your program to get the job done.
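For the curious, here is a rough sketch of what such a program might look like. It is not my exact script: the 'link=Next Page' locator, the five-page cap, and passing the start URL on the command line are all assumptions for illustration, and it presumes the Selenium server from above is listening on its default port 4444. It also assumes each click triggers a full page load; if the site swaps content in via Ajax, wait_for_page_to_load will time out and you would need a different wait.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Click through a paginated listing and return the HTML of each page.
# $sel is a started WWW::Selenium object (or anything with the same methods).
sub collect_pages {
    my ($sel, $max_pages) = @_;
    my @pages = ($sel->get_html_source);    # the page we are already on
    # Assumed locator: a link whose visible text is "Next Page".
    while (@pages < $max_pages && $sel->is_element_present('link=Next Page')) {
        $sel->click('link=Next Page');
        $sel->wait_for_page_to_load(30000); # timeout in milliseconds
        push @pages, $sel->get_html_source;
    }
    return @pages;
}

# Only talk to the Selenium server when a start URL is given, e.g.
#   perl scrape.pl http://www.example.com/listing.aspx   (placeholder URL)
if (my $start_url = shift @ARGV) {
    require WWW::Selenium;
    my $sel = WWW::Selenium->new(
        host        => 'localhost',   # where selenium-server.jar is running
        port        => 4444,          # Selenium RC default port
        browser     => '*firefox',
        browser_url => $start_url,
    );
    $sel->start;                      # launches Firefox
    $sel->open($start_url);
    my @pages = collect_pages($sel, 5);
    printf "Fetched %d page(s)\n", scalar @pages;
    $sel->stop;                       # closes the browser
}
```

Keeping the clicking loop in its own sub means you can paste in whatever locator Selenium IDE recorded for you without touching the setup code.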
It's disturbing to me that ASP.NET is so convoluted by default that I need Selenium for some operations. (I can't imagine anyone went out of their way to make automation of their site this difficult intentionally.) But I sure am glad it's available when I need it. What should have been ?page=2 turned into me harnessing untold volumes of source code (Selenium + Firefox!) to get the job done.
Oh well. Such is the way of the world sometimes. :)
The nice weather is holding out here in Omaha, Nebraska, USA. Still motorcycle weather, and the trees are pretty in the light industrial park Mutation Grid now calls home.
Have you read my tutorial Using WWW::Selenium To Test Or Automate An Ajax Website? Some of the problems I outlined with getting Selenium running are no longer valid (they have been resolved) but the rest should still be useful.
Presumably WWW::Mechanize::Firefox would be similarly helpful.
Chickenfoot, while not Perl, offers alternative web scraping powers when normal approaches don't work.