Perl, Selenium, and ASP.NET
I've done a lot of web scraping with Perl over the years, but I hadn't experienced anything quite like the "Next page" link that ASP.NET threw at me this week. The opposite of REST, ASP.NET's ctlPagePlaceHolder puts even the simplest navigation beyond the reach of WWW::Mechanize, as far as I can tell. Luckily, Selenium came to my rescue.
If you haven't experienced Selenium automation, it's quite impressive. From a normal shell window (OS X Terminal.app in my case), you launch a Java server:
selenium-remote-control-1.0.3/selenium-server-1.0.3 $ java -jar selenium-server.jar
and then use WWW::Selenium in your Perl program. As your Perl program runs, it launches Firefox on your local workstation and performs whatever commands you issue. In my case this was simply "click 'Next Page'". :)
The Selenium IDE Firefox plugin is great for quickly mocking up what commands you need. Once it's working, drop those commands into your program to get the job done.
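For the curious, here is a minimal sketch of what such a program could look like. It is not the script I actually ran: the URL passed on the command line, the 'link=Next Page' locator, the 50-page cap, and the collect_pages helper are all placeholders for illustration. It also assumes each click triggers a full page load; a site that updates the page in place would need a different wait.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the HTML of each page, clicking the "next" link between pages.
# $sel is any object offering the WWW::Selenium methods used below.
# (collect_pages is a hypothetical helper, not part of WWW::Selenium.)
sub collect_pages {
    my ($sel, $next_locator, $max_pages) = @_;
    my @pages;
    while (1) {
        push @pages, $sel->get_html_source;
        last if @pages >= $max_pages;                         # safety cap
        last unless $sel->is_element_present($next_locator);  # no more pages
        $sel->click($next_locator);
        $sel->wait_for_page_to_load(30_000);                  # milliseconds
    }
    return @pages;
}

# Usage: perl next_page.pl http://www.example.com/listing.aspx
if (my $url = shift @ARGV) {
    require WWW::Selenium;    # loaded only when actually driving a browser
    my $sel = WWW::Selenium->new(
        host        => 'localhost',
        port        => 4444,          # selenium-server.jar's default port
        browser     => '*firefox',
        browser_url => $url,
    );
    $sel->start;
    $sel->open($url);
    my @pages = collect_pages($sel, 'link=Next Page', 50);   # assumed link text
    $sel->stop;
    printf "Fetched %d page(s)\n", scalar @pages;
}
```

With the Selenium server from above already running, you would start this with the address of the first page as its only argument, then watch Firefox click its way through the site.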
It's disturbing to me that ASP.NET is so convoluted by default that I need Selenium for some operations. (I can't imagine anyone intentionally went out of their way to make automating their site this difficult.) But I sure am glad Selenium is available when I need it. What should have been ?page=2 turned into me harnessing untold volumes of source code (Selenium + Firefox!) to get the job done.
Oh well. Such is the way of the world sometimes. :)
The nice weather is holding out here in Omaha, Nebraska, USA. Still motorcycle weather, and the trees are pretty in the light industrial park Mutation Grid now calls home.