Last Minute: HTML::Element::Replacer
As my regular readers might remember, I finished my August assignment on the 28th of September at 3 AM. I sent the email to Neil Bowers, noting it was probably too late to get a proper September assignment. Surprisingly, Neil replied:
Well, you’re on an unbeaten run so far, so if you want a September one, with 4 days left, I’ll assign you one. Want one? :-)
I imagined 4 days (I could only count 3, but hey) with maybe another two weeks of “sticking” with the assignment, and replied with Yes.
Being Busy
The same day at 3 PM, I got my assignment: HTML::Element::Replacer. It was a public holiday here and we went to our friends’ son’s birthday party, so I didn’t have enough time to even have a look. The next day, there was Prague.pm’s emergency social meeting with people from Brno.pm. I returned home around midnight, but despite having drunk many beers, I didn’t want to go to bed. You probably already know — I started hacking.
Generated Files
The first thing I noticed after cloning my fork of the GitHub repository was the presence of generated files in it. The Makefile was there, as was a file generated by the tests, and even the whole blib/ directory! Don’t keep generated files in version control, I said to myself, and created my first commit.
Test Failure
But I felt like I could do better. I checked the testers’ reports and discovered failures on Perl 5.18 and later. The failing test compared two XML files as strings, but the order of a node’s attributes was different:
# Failed test 'HTML'
# at t/01-replacer.t line 28.
# got: '<table>
# <tr scla="top" /="/"></tr>
# <tr scla="mid">
# <td kmap="brand">schlitz</td>
# <td kmap="age">young</td>
# </tr>
# <tr scla="mid">
# <td kmap="brand">lowenbrau</td>
# <td kmap="age">24</td>
# </tr>
# <tr scla="mid">
# <td kmap="brand">miller</td>
# <td kmap="age">17</td>
# </tr>
# <tr /="/" scla="bot"></tr>
# </table>
# '
# expected: '<table>
# <tr scla="top" /="/"></tr>
# <tr scla="mid">
# <td kmap="brand">schlitz</td>
# <td kmap="age">young</td>
# </tr>
# <tr scla="mid">
# <td kmap="brand">lowenbrau</td>
# <td kmap="age">24</td>
# </tr>
# <tr scla="mid">
# <td kmap="brand">miller</td>
# <td kmap="age">17</td>
# </tr>
# <tr scla="bot" /="/"></tr>
# </table>
# '
Do you see it? No? diff can help you:
< # <tr /="/" scla="bot"></tr>
---
> # <tr scla="bot" /="/"></tr>
Wait, wait… Is slash a valid attribute name in HTML at all? HTML::TreeBuilder seems to have its own problems here with parsing XML-like self-closing tags. But without the problems, each element would have had only one attribute, and a different bug wouldn’t have been revealed.
It was clear to me that the ordering of attributes was a consequence of the hash order randomisation introduced in Perl 5.18. HTML::PrettyPrinter has no option to specify how to order attributes, so I decided to drop the dependency and use the HTML::Element::as_HTML method from the HTML::Tree distribution, which the module already used for other things.
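To illustrate the underlying problem, here is a minimal sketch in core Perl. It is not the code from the module or from HTML::Tree; start_tag is a hypothetical helper made up for this example. The idea is that a serialiser which walks a hash in its internal order can emit attributes differently from run to run, while sorting the keys first yields one stable string that is safe to compare in a test.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper (not part of any CPAN module): build a start tag
# with its attributes in sorted order, so the resulting string does not
# depend on Perl's internal hash ordering.
sub start_tag {
    my ($name, %attr) = @_;
    my $attrs = join '',
                map { sprintf ' %s="%s"', $_, $attr{$_} }
                sort keys %attr;
    return "<$name$attrs>";
}

# The same attributes passed in two different orders
# produce the same serialised tag.
print start_tag('tr', 'scla', 'bot', '/', '/'), "\n";
print start_tag('tr', '/', '/', 'scla', 'bot'), "\n";
```

Both calls print the same line, because the keys are sorted before the tag is built. Iterating over keys %attr without the sort would give no such guarantee on Perl 5.18 and later.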
After the second commit, I finally got tired, so I created a pull request from the commits (I probably should have created two separate PRs, but it was too late). Two days after the assignment, I was ready for October!
Afterthought
I usually use XML::XSH2 for XML handling. Here’s how you can mimic 01-replacer.t in it:
#!/usr/bin/perl
use warnings;
use strict;
use XML::XSH2;
my @data = (
    { brand => 'schlitz',   age => '"young"' },
    { brand => 'lowenbrau', age => 24 },
    { brand => "O'Hara",    age => 17 },
);
xsh 'open :F html t/html/replacer/replacer.initial';
xsh '$replace = //tr[@scla="mid"][1]';
for my $struct (@data) {
    xsh '$new := xcopy :r $replace before .';
    for my $attr (keys %$struct) {
        $XML::XSH2::Map::attr  = $attr;
        $XML::XSH2::Map::value = $struct->{$attr};
        xsh 'set $new/td[@kmap=$attr] $value';
    }
}
xsh 'delete $replace ; save :b ;';