"block sequence entries are not allowed in this context"
I have a couple hundred thousand YAML files that I created with plain, ol' YAML.pm. I created these just before I realized how slow YAML.pm is, but I have the files already. Processing them with YAML.pm is really, really slow, so I wanted to see how much faster the other YAML modules might be.
My problem, which Google doesn't know much about (yet), is that the stricter, faster parsers such as YAML::XS complain "block sequence entries are not allowed in this context" when I try to parse these files, while YAML.pm (really old, but pure Perl) and YAML::Syck (deprecated, uses YAML 1.0) don't. YAML::XS is based on libyaml, an implementation that actually conforms to the YAML 1.1 specification. I didn't create the files with YAML::XS, though, so I have lines like:
cpplast: -
cppminus: -
In YAML 1.1, those lines should be something like:
cpplast: '-'
cppminus: '-'
Those are literal dashes, and they shouldn't be read as YAML syntax. Here's a bit of code that loads whichever YAML module I name on the command line and uses its Dump to serialize the same hash ref:
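#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
# Load the YAML module named on the command line, then dump a sample hash ref with it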
load_yaml_module( $ARGV[0] );
print Dump( {
foo => '-',
bar => '--',
cat => 'Buster',
}
);
sub load_yaml_module
{
my( $module ) = shift;
say "Loading $module...";
my $loaded = eval "use $module; 1;";
die "Could not load $module: $@\n" unless $loaded;
die "Did not get a Dump from $module\n"
unless defined &Dump;
return 1;
}
Now I try it with a few different YAML implementations, and I see that the pure Perl YAML.pm is the one leaving that - value unquoted:
$ perl yaml-dump YAML
Loading YAML...
---
bar: --
cat: Buster
foo: -
$ perl yaml-dump YAML::XS
Loading YAML::XS...
---
bar: --
cat: Buster
foo: '-'
$ perl yaml-dump YAML::Syck
Loading YAML::Syck...
---
bar: --
cat: Buster
foo: "-"
So, when I try to use YAML::XS to parse the files created by YAML.pm, it complains "block sequence entries are not allowed in this context" because it reads that lone dash as the YAML indicator that starts a new entry in a block sequence (a list), and a sequence entry can't start on the same line as a mapping key.
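To see the failure in isolation, here's a minimal sketch that hands YAML::XS both forms of the line and traps the error with eval. It assumes YAML::XS is installed; the two sample documents are made up for the demonstration, and the exact wording of the error comes from libyaml, so it may differ between versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use YAML::XS qw( Load );

# The unquoted dash that YAML.pm writes, and the quoted one YAML::XS writes
my %docs = (
    unquoted => "---\nfoo: -\n",
    quoted   => "---\nfoo: '-'\n",
);

my %result;
for my $name ( sort keys %docs ) {
    # Load dies on a parse error, so trap it and keep the message
    my $data = eval { Load( $docs{$name} ) };
    $result{$name} = defined $data
        ? "foo is '$data->{foo}'"
        : "error: $@";
    print "$name: $result{$name}\n";
}
```

The quoted document should load with foo set to a single dash, while the unquoted one should end up in the error branch with libyaml's complaint.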
So, now I just have to convert all the old files to the latest YAML so I can use the speediest parser. That means I have to use the old YAML.pm to load the file then write the same data back with YAML::XS with something like:
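use strict;
use warnings;
use YAML     qw( LoadFile );   # slow, but it reads what YAML.pm wrote
use YAML::XS qw( Dump );       # fast, and it quotes the lone dashes
my $file = $ARGV[0];
my $data = LoadFile( $file );
open my $fh, '>', $file or die "Could not open $file: $!\n";
print $fh Dump( $data );
close $fh or die "Could not close $file: $!\n";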
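Run over a couple hundred thousand files, that snippet needs a loop that keeps going when a single file fails. Here's a rough sketch of one, under some assumptions: the files all live under one directory and end in .yml, and convert_tree is a hypothetical name. It writes each new file next to the old one and renames it into place only after YAML::XS can read back what it just dumped, so a bad file or a crash doesn't leave a half-written original.

```perl
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
use File::Find qw( find );
use YAML       qw( LoadFile );
use YAML::XS   qw( Dump Load );

# Hypothetical driver: convert every *.yml file under $dir (an assumption)
sub convert_tree {
    my( $dir ) = @_;
    my( $converted, $failed ) = ( 0, 0 );

    my $wanted = sub {
        my $file = $File::Find::name;
        return unless -f $file && $file =~ /\.yml\z/;

        my $ok = eval {
            # List context keeps every document if a file has more than one
            my $yaml  = Dump( LoadFile( $file ) );  # slow read, fast write
            my @check = Load( $yaml );              # dies if YAML::XS can't read it back
            my $tmp   = "$file.tmp";
            open my $fh, '>:raw', $tmp or die "Could not open $tmp: $!\n";
            print {$fh} $yaml;                      # Dump already returns UTF-8 bytes
            close $fh or die "Could not close $tmp: $!\n";
            rename $tmp, $file or die "Could not rename $tmp: $!\n";
            1;
        };
        if( $ok ) { $converted++ }
        else      { $failed++; warn "$file: $@" }
    };

    find( { wanted => $wanted, no_chdir => 1 }, $dir );
    return ( $converted, $failed );
}

if( @ARGV ) {
    my( $ok, $bad ) = convert_tree( $ARGV[0] );
    say "Converted $ok files, $bad failures";
}
```

Something like `perl convert-tree data/` would run it, where the script name and directory are placeholders. The failure count and warnings give you a list of files to look at by hand instead of a run that dies halfway through.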
Now I just have to endure the super slow YAML.pm for another pass (about two full days of churning) so I can use the faster stuff for real processing.
How about just converting them to JSON so you don't have to endure YAML again?
Yes, I'm going to convert them to JSON too, but I need to fix up a lot of other stuff that expects them to be YAML before then. The conversion to the latest YAML format is a quick bit of code that I start running and forget about for two days.
Also, as I started converting some of the data to JSON, I remembered that some of it has serialized Perl objects (some of which contain other objects). That makes it a pain in the ass to convert to JSON. It can be done, but I wish I had an option to simply ignore the blessing and use the reference anyway.
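One way to get that "ignore the blessing" behavior is to make an unblessed deep copy before encoding. JSON::PP's allow_blessed option encodes objects as null, and convert_blessed wants a TO_JSON method in each class, so neither quite does it. This sketch uses only core modules; unbless_copy is a hypothetical helper name, it only rebuilds hash- and array-based objects (other reference types pass through unchanged), and it doesn't guard against circular references. The CPAN module Data::Structure::Util also has an unbless function, if you'd rather not roll your own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP ();
use Scalar::Util qw( reftype );

# Hypothetical helper: deep-copy a structure, dropping every blessing
# on hash- and array-based objects. No cycle detection.
sub unbless_copy {
    my( $thing ) = @_;
    my $type = reftype( $thing );
    return $thing unless defined $type;    # plain scalar, nothing to do

    if( $type eq 'HASH' ) {
        return +{ map { ( $_ => unbless_copy( $thing->{$_} ) ) } keys %$thing };
    }
    if( $type eq 'ARRAY' ) {
        return [ map { unbless_copy( $_ ) } @$thing ];
    }
    return $thing;    # SCALAR, CODE, GLOB, ... left for the encoder to judge
}

# Made-up sample data: an object that contains another object
my $data = {
    name  => 'Buster',
    owner => bless( { pets => [ bless( { kind => 'cat' }, 'Pet' ) ] }, 'Person' ),
};

my $json = JSON::PP->new->canonical->encode( unbless_copy( $data ) );
print "$json\n";
```

Because it copies rather than modifying in place, the original objects stay blessed for any code that still needs them.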
I can understand that reading with the old library and writing with the new one is the most complete and correct way to fix this, but if you are sure that the line you've shown is the only problem, isn't a well-constructed sed command or Perl one-liner enough to just alter the files?
Well, the assumption there is "if you are sure", and I am not. My solution is the one that requires the least of my attention and the least potential for screwing up, so that one wins.
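For comparison, here's roughly what such a one-liner could look like, written out as a small Perl sub so it's easier to read. It's a hypothetical sketch that only quotes a lone dash at the end of a "key: -" line. Note what it leaves alone: the sample also has a dash as a list item (- -), which YAML.pm would presumably write unquoted too, and the pattern doesn't touch it. Every case like that is one more thing you have to be sure about.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical quick fix: quote a lone dash that ends a "key: -" line.
# It knows nothing about YAML structure, so anything this pattern
# misses (or wrongly matches) slips through silently.
sub quote_lone_dashes {
    my( $text ) = @_;
    $text =~ s/^([ \t]*[^\s:#'"][^:\n]*:[ \t]+)-[ \t]*$/${1}'-'/mg;
    return $text;
}

my $before = "---\nbar: --\ncat: Buster\nfoo: -\nlist:\n  - -\n";
my $after  = quote_lone_dashes( $before );
print $after;
```

The foo line comes out quoted and bar keeps its double dash, but the list item is left exactly as it was.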