Last Minute: HTML::Element::Replacer

As my regular readers might remember, I finished my August assignment on the 28th of September at 3 AM. I sent an email to Neil Bowers, noting that it was probably too late to get a proper September assignment. Surprisingly, Neil replied:

Well, you’re on an unbeaten run so far, so if you want a September one, with 4 days left, I’ll assign you one. Want one? :-)

I imagined 4 days (I could only count 3, but hey), plus maybe another two weeks of "sticking" with the assignment, and replied "Yes".

Being Busy

The same day at 3 PM, I got my assignment: HTML::Element::Replacer. It was a public holiday here and we went to our friends' son's birthday party, so I didn't even have time to take a look. The next day, Prague.pm held an emergency social meeting with people from Brno.pm. I returned home around midnight, but despite having drunk many beers, I didn't want to go to bed. You can probably guess what happened: I started hacking.

Generated Files

The first thing I noticed after cloning my fork of the GitHub repository was that it contained generated files: the Makefile, a file generated by the tests, and even the whole blib/ directory! "Don't keep generated files in version control," I said to myself, and created my first commit.

Test Failure

But I felt I could do better. I checked the CPAN Testers reports and discovered failures on Perl 5.18 and newer. The failing test compared two HTML documents as strings, but the order of a node's attributes differed:

#   Failed test 'HTML'
#   at t/01-replacer.t line 28.
#          got: '<table>
#   <tr scla="top" /="/"></tr>
#   <tr scla="mid">
#     <td kmap="brand">schlitz</td>
#     <td kmap="age">young</td>
#   </tr>
#   <tr scla="mid">
#     <td kmap="brand">lowenbrau</td>
#     <td kmap="age">24</td>
#   </tr>
#   <tr scla="mid">
#     <td kmap="brand">miller</td>
#     <td kmap="age">17</td>
#   </tr>
#   <tr /="/" scla="bot"></tr>
# </table>
# '
#     expected: '<table>
#   <tr scla="top" /="/"></tr>
#   <tr scla="mid">
#     <td kmap="brand">schlitz</td>
#     <td kmap="age">young</td>
#   </tr>
#   <tr scla="mid">
#     <td kmap="brand">lowenbrau</td>
#     <td kmap="age">24</td>
#   </tr>
#   <tr scla="mid">
#     <td kmap="brand">miller</td>
#     <td kmap="age">17</td>
#   </tr>
#   <tr scla="bot" /="/"></tr>
# </table>
# '

Do you see it? No? diff can help you:

< #   <tr /="/" scla="bot"></tr>
---
> #   <tr scla="bot" /="/"></tr>
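
Both lines carry exactly the same attributes; only their order differs. One possible alternative to changing the serialiser is to make the test compare attributes as a hash instead of as text. Below is a minimal sketch using Test::More's is_deeply, which compares hashes key by key. It assumes a naive regex is good enough for simple double-quoted tags like these; tag_attributes is a hypothetical helper, not part of HTML::Element::Replacer or its test suite.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Test::More tests => 2;

# Hypothetical helper: collect the name="value" pairs of one start tag
# into a hash. The regex is deliberately naive and only handles
# double-quoted values like the ones in the failing test.
sub tag_attributes {
    my ($tag) = @_;
    my %attr = $tag =~ /([^\s<>=]+)="([^"]*)"/g;
    return \%attr;
}

my $got      = '<tr /="/" scla="bot"></tr>';
my $expected = '<tr scla="bot" /="/"></tr>';

# Comparing the serialised strings is sensitive to attribute order...
isnt($got, $expected, 'the strings differ');

# ...while is_deeply compares the hashes key by key, so order is irrelevant.
is_deeply(tag_attributes($got), tag_attributes($expected),
          'the attributes are the same');
```

A real test would rather parse the whole document with a proper HTML parser; the regex only keeps the sketch free of non-core dependencies.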

Wait, wait… Is a slash even a valid attribute name in HTML? HTML::TreeBuilder seems to have its own problems with parsing XML-like self-closing tags here. But without that problem, each element would have had only one attribute, and the ordering bug would never have been revealed.

It was clear to me that the attribute order was a consequence of the hash order randomisation introduced in Perl 5.18. HTML::PrettyPrinter has no option to specify how attributes should be ordered, so I decided to drop the dependency and use the as_HTML method of HTML::Element instead, from the HTML::Tree distribution that the module already used for other things.
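
The general rule is that hash order should never leak into serialised output. Here is a minimal pure-Perl sketch of that idea; it makes no claim about how HTML::Element or HTML::PrettyPrinter are implemented. start_tag is a hypothetical helper that sorts the attribute names before joining them, so its result does not depend on the hash seed or the Perl version.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: serialise a start tag from a tag name and a hash
# of attributes. Iterating over "keys %$attr" directly would follow the
# randomised hash order of Perl 5.18+, so the names are sorted first.
sub start_tag {
    my ($name, $attr) = @_;
    my $html = "<$name";
    for my $key (sort keys %$attr) {
        # Escape & first, then the double quote delimiting the value.
        (my $value = $attr->{$key}) =~ s/&/&amp;/g;
        $value =~ s/"/&quot;/g;
        $html .= qq( $key="$value");
    }
    return "$html>";
}

my %attr = (scla => 'bot', class => 'row', id => 'last');
print start_tag('tr', \%attr), "\n";
```

It prints <tr class="row" id="last" scla="bot"> on every run. Alphabetical order is simply the cheapest stable order; preserving the original source order would require recording it separately, e.g. in an array.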

After the second commit, I finally got tired, so I created a pull request from both commits (I should probably have created two separate PRs, but it was too late). Two days after getting the assignment, I was ready for October!

Afterthought

I usually use XML::XSH2 for XML handling. Here’s how you can mimic 01-replacer.t in it:

#!/usr/bin/perl
use warnings;
use strict;
use XML::XSH2;

my @data = ( { brand => 'schlitz',   age => '"young"' },
             { brand => 'lowenbrau', age => 24 },
             { brand => "O'Hara",    age => 17},
           );

xsh 'open :F html t/html/replacer/replacer.initial';
xsh '$replace = //tr[@scla="mid"][1]';
for my $struct (@data) {
    xsh '$new := xcopy :r $replace before .';
    for my $attr (keys %$struct) {
        $XML::XSH2::Map::attr  = $attr;
        $XML::XSH2::Map::value = $struct->{$attr};
        xsh 'set $new/td[@kmap=$attr] $value';
    }
}
xsh 'delete $replace ; save :b ;';


About E. Choroba

I blog about Perl.