XML::CompactTree::XS Works Again

The Report

In a GitHub issue discussion by my former colleagues, I noticed weird errors being reported. I was curious about the cause, but not enough to dig deeper, and as nothing happened, I forgot about it.

But then they asked me for a favour, and I agreed to help them with a project. But, to my surprise, I wasn’t able to install the tools I originally helped to create, getting the same error as the poor GitHub guy. Quick check of the CPAN Testers showed problems in Treex::PML, one of the underlying libraries, all the way down to 5.20.0. A typical report would look like this:

# Testing Treex::PML 2.18, Perl 5.020003, /home/stro/perl/5.20.3/bin/perl
t/00-load.t ....... ok
Modification of a read-only value attempted at .pml_compile.d/annotation__generated_read_container@terminal.type line 16.
# Looks like you planned 126 tests but ran 3.
# Looks like your test exited with 25 just after 3.
t/01-pml.t ........ 
Dubious, test returned 25 (wstat 6400, 0x1900)
Failed 123/126 subtests 
t/02-csts.t ....... ok
t/03-fs.t ......... ok
t/04-xslt.t ....... ok
t/boilerplate.t ... ok
t/manifest.t ...... skipped: Author tests not required for installation
t/pod-coverage.t .. ok
t/pod.t ........... ok
t/url.t ........... ok

Test Summary Report
-------------------
t/01-pml.t      (Wstat: 6400 Tests: 3 Failed: 0)
  Non-zero exit status: 25
  Parse errors: Bad plan.  You planned 126 tests but ran 3.

The library manipulates an XML format. For each data file, you first supply a schema which tells the library how to read and validate the data. The code is very hard to debug, as it uses a highly efficient optimisation: The schema is used to generate the code to process the data. So, when parsing an XML element containing positive integer, you don’t have to check its type in the schema, the generated code already knows it expects a positive integer and possibly throws the appropriate exception if it’s not there.

The First Fix

Fortunately, there’s a debug option to dump all the generated code to files that you can inspect. With some trial and error, I found the error was coming from the following lines (simplified):

use constant XAT_ATTRS => 3;
# ...
sub {
  my ($p) = @_;
  my $A = $p->[XAT_ATTRS];
  my $c = $p->[XAT_CHILDREN];
  my (%s, $k, $v, $content, @a_rest);
  if ($A) {
    while (@$A) {
      $k = shift @$A;
      $v = shift @$A;
      if (exists $attributes{$k}) {
        $s{$k} = $v;
      } else {
        push @a_rest, $k, $v;
      }
    }
  }
  $p->[XAT_ATTRS] = \@a_rest;  # <- HERE

How can $p->[3] be read-only? You can easily get the error if you assign a reference to @_ to $p:

sub { $p = \@_; $p->[3] = [] }->(1, 2, 3, 4);

But in the generated code above, that wasn’t the case. I played with the code for a while, but I didn’t find anything.

I blindly tried to fix the problem by not using @_ directly, I only made a shallow copy of it:

sub {
    my $p = [ @{ +shift } ];

And to my surprise, the test passed!

I wrote an email to my ex-colleagues stating I had a fix, and released a new version of the library to CPAN. But in the back of my head, I knew this wasn’t the real solution. I didn’t know what was broken, and I didn’t know how the shallow copy fixed it.

The Proper Fix

The next evening, I decided to examine the problem once again. When playing with the tests, I noticed XML::CompactTree::XS was used, and I tried to replace it with its pure Perl implementation—and, to my surprise, there was no failure.

Therefore, the problem had to lie in the XS code. I could barely read XS, but I recognised (the source was rather short, fortunately) that the array reference that was later called $p was coming from the XS. It was created by pushing various stuff to kind of an array:

av_push(prev, newSViv(XML_READER_TYPE_DOCUMENT));
// ...
av_push(prev, Char2SV(xmlTextReaderConstEncoding(reader)));
// ...
av_push(top, newRV_noinc((SV*)prev));
// etc.

So, I googled for “perl 5.20 av_push”, hoping a delta would mention a change in the behaviour or something. And to my surprise, the third hit came form perlguts and read:

perlguts. Perl 5 version 24.0 documentation. Go to top • Download PDF ...... In perl 5.20, storing &PL_sv_undef will create a read-only element, because the ...

Read-only element? That was my problem! I checked the code and found the culprit:

av_push(av, &PL_sv_undef); /* no attributes */

Perlguts tells us to use newSV(0) instead, so I tried that—and all the tests passed.

Leaving Module::Build

I uploaded a new version of the XS module to CPAN, and fixed the Treex::PML distribution to depend on this new version. But after a short while, I got an e-mail from Slaven Rezić: on his smoker machines, Treex::PML was still failing.

It turned out only the pure Perl XML::CompactTree was listed as a dependency, the XS variant (even if in the correct version) was only recommended. If a smoker had the old XS library installed, it still used it and failed.

It wasn’t that hard to change the Build.PL script to check whether the wrong XS dependency was present, and change the recommendation into a requirement in such a case:

if (eval {
    require XML::CompactTree::XS;
    $XML::CompactTree::XS::VERSION lt $MINIMAL_COMPACTTREE_XS;
}) {
    delete $recommends->{'XML::CompactTree::XS'};
    $requires->{'XML::CompactTree::XS'} = $MINIMAL_COMPACTTREE_XS;
}

But I wasn’t able to do the same in the generated Makefile.PL. I needed the Makefile.PL to be dynamic in the same way as Build.PL, but I didn’t find a way how to tell Module::Build to include code into the generated file.

I didn’t want to create both the files manually, as that would mean keeping them in sync, leading to weird errors when I forget. Therefore, after several hours, I decided to use only ExtUtils::MakeMaker, and let Module::Build go.

Summary aka tl;dr

"Modification of a read-only value attempted" appearing in Perl 5.20+ where older Perls worked OK might be caused by using PL_sv_undef in an array or hash within XS code.

P. S. Thanks Slaven and all the testers for their help.

Leave a comment

About E. Choroba

user-pic I blog about Perl.