Virtual Spring Cleaning (part 3 of XX) wherein one release begets another

The ambush of WWW::Mechanize::Chrome shows more fallout before the module itself has been released. The module is one in a long line of browser automation modules I wrote, starting with WWW::Mechanize::Shell, reaching its breakthrough with WWW::Mechanize::Firefox and continuing from WWW::Mechanize::PhantomJS to WWW::Mechanize::Chrome.

My approach for shared code between WWW::Mechanize::Firefox and WWW::Mechanize::PhantomJS was to have module files that were copied between the two distributions, as the code was too short to merit a release on its own. One exemplary case is WWW::Mechanize::Plugin::Selector, which is basically a role that all three modules share. It requires an ->xpath method and provides the translation of HTML CSS selectors to the corresponding XPath queries, as implemented by HTML::Selector::XPath.
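To make the shape of such a role concrete, here is a minimal sketch, not the code shipped in the distribution. The package names My::Selector::Role and My::Fake::Mech are hypothetical, and the stand-in ->xpath simply returns the query it receives so that the translation is visible. It assumes HTML::Selector::XPath is installed and uses its documented ->new(...)->to_xpath interface.

```perl
use strict;
use warnings;
use HTML::Selector::XPath;

package My::Selector::Role;    # hypothetical name, for illustration only

# Translate a CSS selector to XPath and hand it to the consumer's ->xpath.
# The consuming class is required to provide ->xpath itself.
sub selector {
    my ($self, $css, %options) = @_;
    my $xpath = HTML::Selector::XPath->new($css)->to_xpath;
    return $self->xpath($xpath, %options);
}

package My::Fake::Mech;        # hypothetical stand-in for a real Mechanize class
our @ISA = ('My::Selector::Role');

sub new { return bless {}, shift }

# A real implementation would query the page; this one echoes the query.
sub xpath {
    my ($self, $query, %options) = @_;
    return $query;
}

package main;

my $mech = My::Fake::Mech->new;
print $mech->selector('p.content'), "\n";
```

Any class that brings its own ->xpath can pull in ->selector this way, which is why the code is easy to share between distributions.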

The usage of the new function is also fairly obvious. The following call returns all paragraph nodes from the HTML document that have a class of content:

my @text = $mech->selector('p.content');

In theory, the module also works with WWW::Mechanize::Pluggable, but I think WWW::Mechanize::Pluggable is missing a proper ->xpath method to use this plugin. A plugin providing that method should be trivially implementable using HTML::TreeBuilder::XPath if you're willing to forgo HTML 5 tags, or using HTML::HTML5::Parser.
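As a rough idea of what an ->xpath method on top of HTML::TreeBuilder::XPath could look like, here is a sketch under stated assumptions. The class My::Static::Page is hypothetical and is not part of WWW::Mechanize::Pluggable. It parses a fixed HTML string instead of a fetched page and takes a single XPath string, without the extra options a full implementation would likely accept.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

package My::Static::Page;      # hypothetical name, for illustration only

sub new {
    my ($class, $html) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_content($html);    # parse the complete document in one go
    return bless { tree => $tree }, $class;
}

# Return all nodes matching the XPath query (a list in list context).
sub xpath {
    my ($self, $query) = @_;
    return $self->{tree}->findnodes($query);
}

package main;

my $page = My::Static::Page->new(
    '<html><body><p class="content">Hello</p><p>Other</p></body></html>'
);
my @nodes = $page->xpath('//p[@class="content"]');
print $_->as_text, "\n" for @nodes;
```

With a method like this in place, a selector role only has to translate the CSS selector and delegate to it.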

The code implementing the functionality is embarrassingly short, but I hope that other WWW::Mechanize or web scraper implementations also use it or are inspired by it to implement a similar facility.


About Max Maischein

I'm the Treasurer for the Frankfurt Perlmongers e.V. I have organized Perl events including 7 German Perl Workshops and one YAPC::Europe.