June 2010 Archives

"block sequence entries are not allowed in this context"

I have a couple hundred thousand YAML files that I created with plain, ol' YAML.pm. I created these just before I realized how slow YAML.pm is, but I have the files already. Processing them with YAML.pm is really, really slow, so I wanted to see how much faster the other YAML modules might be.

My problem, which Google doesn't know much about (yet), is that YAML::XS, the faster parser I want to use, complains "block sequence entries are not allowed in this context" when I try to parse these files. YAML.pm (really old, but pure Perl) and YAML::Syck (deprecated, uses YAML 1.0) don't complain. YAML::XS is based on libyaml, an implementation that actually conforms to the YAML 1.1 specification. I didn't create the files with YAML::XS, though, so I have lines like:

cpplast: -
cppminus: -

In YAML 1.1, those lines should be something like:

cpplast: '-'
cppminus: '-'
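
Before converting anything, it can help to know how many files actually contain such lines. The following is a rough sketch, not a YAML parser: it assumes the problem lines are simple one-line `key: -` pairs like the ones above, and `has_bare_dash_value` is a name made up for this illustration. It can miss dashes in other contexts and can misfire on text inside multi-line scalars, so treat its output as an estimate.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from any YAML module): returns 1 if the line
# looks like a mapping entry whose entire value is one unquoted dash,
# such as "cpplast: -", and 0 otherwise. This is a text heuristic only.
sub has_bare_dash_value {
    my( $line ) = @_;
    return $line =~ m/\A\s*[^\s:]+:[ \t]+-[ \t]*$/ ? 1 : 0;
}

# Print the name of each file given on the command line that has at
# least one suspicious line.
for my $file ( @ARGV ) {
    open my $fh, '<', $file or do {
        warn "Could not open $file: $!\n";
        next;
    };

    while( my $line = <$fh> ) {
        next unless has_bare_dash_value( $line );
        print "$file\n";
        last;    # one hit is enough to flag the file
    }

    close $fh;
}
```

Files that the scan does not flag may still have other incompatibilities, so this only narrows down where to look.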

Those are literal dashes and they shouldn't be YAML syntax. Here's a bit of code that lets me choose which YAML module to load and then dump the same hash ref with it:

#!/usr/bin/perl

use 5.010;

load_yaml_module( $ARGV[0] );

print Dump( {
    foo => '-',
    bar => '--',
    cat => 'Buster',
    }
    );

sub load_yaml_module
    {
    my( $module ) = shift;

    say "Loading $module...";
    my $loaded = eval "use $module; 1;";

    die "Could not load $module: $@\n" unless $loaded;
    die "Did not get a Dump from $module\n"
        unless defined &Dump;

    return 1;
    }

Now I try it with a few different YAML implementations, and I see that it's the pure Perl YAML.pm that's creating the unquoted - value:

$ perl yaml-dump YAML
Loading YAML...
---
bar: --
cat: Buster
foo: -

$ perl yaml-dump YAML::XS
Loading YAML::XS...
---
bar: --
cat: Buster
foo: '-'

$ perl yaml-dump YAML::Syck
Loading YAML::Syck...
--- 
bar: --
cat: Buster
foo: "-"

So, when I try to use YAML::XS to parse the files created by YAML.pm, it complains "block sequence entries are not allowed in this context" because it treats that single dash as the YAML indicator that starts an entry in a block sequence, and a sequence entry isn't allowed right after a mapping key on the same line.
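
The same comparison can be made on the loading side. This is a minimal sketch that feeds the old-style lines to whichever of the two modules are installed and reports what happens; `try_load` is a hypothetical helper written for this illustration, and the exact error text depends on the installed versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The kind of text YAML.pm wrote: bare dashes as mapping values.
my $old_style = "cpplast: -\ncppminus: -\n";

# Hypothetical helper: try to parse $text with $module's Load function
# and return a short description of the outcome.
sub try_load {
    my( $module, $text ) = @_;

    return 'not installed' unless eval "require $module; 1";

    my $load = $module->can( 'Load' )
        or return 'has no Load function';

    my $ok = eval { $load->( $text ); 1 };
    return 'parsed' if $ok;

    my $error = $@;
    $error =~ s/\s+\z//;    # trim the trailing newline from the message
    return "failed: $error";
}

print "$_: ", try_load( $_, $old_style ), "\n" for qw( YAML YAML::XS );
```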

So, now I just have to convert all the old files to YAML that YAML::XS can read so I can use the speediest parser. That means I have to load each file with the old YAML.pm and then write the same data back with YAML::XS, with something like:

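use strict;
use warnings;

use YAML     qw( LoadFile );
use YAML::XS qw( Dump );

# Parse with the old, forgiving YAML.pm
my $yaml = LoadFile( $ARGV[0] );

# Overwrite the same file with YAML::XS output
open my $fh, '>', $ARGV[0]
    or die "Could not open $ARGV[0] for writing: $!\n";

print $fh Dump( $yaml );
close $fh or die "Could not close $ARGV[0]: $!\n";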
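
With a couple hundred thousand files and a run that takes days, an interrupted write could leave a file truncated. One way to guard against that, sketched here under assumptions, is to write the new text to a temporary file and rename it over the original only after the write succeeds. `convert_file` is a hypothetical helper, not part of any YAML module; it takes the loader and dumper as code references so it doesn't care which modules supply them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Basename qw( dirname );
use File::Temp     qw( tempfile );

# Hypothetical helper. $load takes a file name and returns a data
# structure; $dump takes that structure and returns the new text.
sub convert_file {
    my( $file, $load, $dump ) = @_;

    # Do the risky work first. If either step dies, the original
    # file has not been touched.
    my $text = $dump->( $load->( $file ) );

    # Put the temporary file in the same directory so that rename()
    # stays on one filesystem.
    my( $fh, $temp ) = tempfile( DIR => dirname( $file ) );

    print {$fh} $text or die "Could not write $temp: $!\n";
    close $fh         or die "Could not close $temp: $!\n";

    # Note: File::Temp creates files with restrictive permissions,
    # so the replaced file may end up with a different mode.
    rename $temp, $file
        or die "Could not rename $temp to $file: $!\n";

    return 1;
}

# Possible use with the modules from this post (assumes both are
# installed):
#
#   use YAML     ();
#   use YAML::XS ();
#   convert_file( $_, \&YAML::LoadFile, \&YAML::XS::Dump ) for @ARGV;
```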

Now I just have to endure the super slow YAML.pm for another pass (about two full days of churning) so I can use the faster stuff for real processing.

About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).