On YAML and YAML::XS inconsistencies

Personally, I'd prefer YAML over JSON for local config data any time, even though JSON is secure by default and YAML is insecure. YAML is readable and writable. It's better than .ini, .json and .xml.

But Houston, we have a problem, and we have had it for a long time. I'll fix it.

We have the unique advantage that the spec author and maintainer, Ingy, is from the perl world and maintains the two standard libraries: YAML, the PP (pure perl) variant, and YAML::XS, the fast XS variant based on LibYAML.

This would be an advantage if those two libraries agreed on their interpretation and implementation of the spec. They do not.

Historically, the YAML library has been used as the default reader for the CPAN .yml preferences, and CPAN::Meta::YAML, a fork of YAML::Tiny that is in core, is used to read and write the packages' META.yml files.

The basic idea is to use the fastest library available, with a PP fallback for systems which don't have the fast variant. perl5 core does not ship a proper fast library for JSON or YAML, so you have to stick with the 10x slower PP variants. cperl will ship with YAML::XS and Cpanel::JSON::XS in core, so there this problem is gone.
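The "fastest available, PP fallback" selection is just an ordered list of candidates that is tried until one loads; in Perl that is typically an `eval { require ... }` chain. The same logic as a minimal, language-neutral sketch in Python (the helper name `load_first_available` is made up for illustration):

```python
import importlib
from types import ModuleType
from typing import List, Sequence


def load_first_available(candidates: Sequence[str]) -> ModuleType:
    """Import and return the first module in `candidates` that loads.

    Order the list from fastest to slowest, so the pure (slow) fallback
    is only used when no compiled variant is installed.
    """
    errors: List[str] = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append(f"{name}: {exc}")
    raise ImportError("no usable module: " + "; ".join(errors))
```

For example, `load_first_available(["hypothetical_fast_yaml", "json"])` returns the stdlib `json` module when the (hypothetical) fast module is not installed. Raising only after every candidate failed keeps all the import errors together in one message.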

But we still have the YAML problem:

YAML, the default reader for CPAN, refuses to read .yml files produced by YAML::XS. You can only set YAML::Syck as yaml_module in ~/.cpan/CPAN/MyConfig.pm; using YAML::XS will get you into trouble. But YAML::Syck is no longer maintained. It was written by _why the lucky stiff, also the author of potion, the VM for p2. It still kinda works, and it behaves better than YAML::XS, but it would be better to replace libsyck with libyaml after all, and to get the YAML maintainers to fix their mess.

The fault is in the YAML::XS (i.e. LibYAML) dumper and in the YAML loader.

YAML::XS writes .yml files which YAML cannot read. YAML supports scalars, arrays (called sequences) and hashes (called mappings). The problem is the interpretation of the spec in its current version 1.2, section 6.1 Indentation Spaces.

YAML::XS writes the elements of a sequence without indentation, while YAML and all other YAML libraries expect an indent.

For example, for {author => ['perl5-porters@perl.org']} YAML::XS writes:

author:
- perl5-porters@perl.org

while all other libraries and the spec insist on at least one space before the -, the seq sibling:

author:
    - perl5-porters@perl.org
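To check which of the two layouts a given file uses, a quick sketch can compare the column of each `-` with the column of its key. It is a hypothetical helper that only understands the simple `key:` followed by `- item` shape shown above, not full YAML:

```python
from typing import List, Tuple


def seq_indent_styles(text: str) -> List[Tuple[str, str]]:
    """For each `key:` line directly followed by a `- item` line, report
    whether the dash is 'compact' (same column as the key, as YAML::XS
    writes it) or 'indented' (further right than the key)."""
    lines = [
        line for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")  # skip blanks/comments
    ]
    styles: List[Tuple[str, str]] = []
    for prev, cur in zip(lines, lines[1:]):
        head = prev.strip()
        item = cur.lstrip(" ")
        if not head.endswith(":") or not (item == "-" or item.startswith("- ")):
            continue  # not a key opening a block sequence
        key_col = len(prev) - len(prev.lstrip(" "))
        dash_col = len(cur) - len(item)
        if dash_col == key_col:
            styles.append((head[:-1], "compact"))
        elif dash_col > key_col:
            styles.append((head[:-1], "indented"))
    return styles
```

Run on the first snippet it reports `[("author", "compact")]`, on the second `[("author", "indented")]`.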

"Each node must be indented further than its parent node. All sibling nodes must use the exact same indentation level. However the content of each sibling node may be further indented independently." http://yaml.org/spec/1.2/spec.html#id2777534

In the meantime, however, all other YAML loaders have come to accept Ingy's interpretation of the seq indentation level and do accept the missing seq indent. Only YAML does not: it throws a map error. This is certainly a bug in the YAML loader.
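What "accepting the missing seq indent" means for a loader can be shown on a deliberately tiny subset: a flat mapping whose values are plain scalars or block sequences of scalars. This is a minimal sketch, not how YAML or LibYAML are implemented; `load_simple` and its error texts are made up. `strict=True` mimics the rejection described above for YAML, the default lenient mode behaves like the other loaders:

```python
from typing import Dict, List, Optional, Union

Value = Union[str, List[str]]


def load_simple(text: str, strict: bool = False) -> Dict[str, Value]:
    """Load a flat mapping of scalars and block sequences of scalars.

    A '-' must not be left of its key. With strict=True it must also not
    sit in the key's own column; with strict=False that is accepted.
    """
    data: Dict[str, Value] = {}
    current: Optional[List[str]] = None  # the sequence being filled, if any
    key_col = 0
    for lineno, raw in enumerate(text.splitlines(), 1):
        body = raw.strip()
        if not body or body.startswith("#") or body == "---":
            continue  # blank line, comment or document marker
        col = len(raw) - len(raw.lstrip(" "))
        if body == "-" or body.startswith("- "):
            if current is None:
                raise ValueError(f"line {lineno}: sequence entry without a key")
            if col < key_col or (strict and col == key_col):
                raise ValueError(f"line {lineno}: invalid element in map")
            current.append(body[1:].strip())
            continue
        if col != 0:
            raise ValueError(f"line {lineno}: nested mappings are out of scope")
        key, sep, value = body.partition(":")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'key: value'")
        value = value.strip()
        if value:
            data[key.strip()] = value
            current = None
        else:
            current = []  # a block sequence should follow
            data[key.strip()] = current
            key_col = col
    return data
```

Both layouts from above load to the same data in the default mode; only `strict=True` rejects the YAML::XS layout.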

Remember that YAML is the default reader in the CPAN config; all it needs to do is load the YAML. And that is broken.

All this has been known for a long time: Szabo wrote about the inconsistencies, and p5p put a variant of the better YAML::Tiny into core as CPAN::Meta::YAML. That is fine, but in the long run a fast library in core is preferable, and that's what I'm doing for cperl.

So what needs to be done:

  1. Change yaml_module in ~/.cpan/CPAN/MyConfig.pm to CPAN::Meta::YAML, YAML::Syck or YAML::XS. All of these can read those YAML files. YAML cannot, until it's fixed.

  2. Fix YAML::XS to dump seq elements with indentation, as all the other YAML libraries do and as the spec says. I'm working on that.

  3. Fix YAML to accept seq elements without indentation, so that it can read old YAML::XS files. I'm working on that.

  4. Fix YAML::XS to accept spec-violating elements in a new NonStrict mode, because the other libraries write such elements, and a YAML loader should optionally be non-fatal on illegal control chars, illegal UTF-8 characters and such. All other YAML loaders silently replace illegal elements with undef. I'm working on that in https://github.com/ingydotnet/yaml-libyaml-pm/issues/44
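Item 2 boils down to one knob in the emitter: how far the `-` of a sequence is indented under its key. A minimal sketch for the same flat subset as before (scalars written unquoted, so values that would need quoting are out of scope; `dump_simple` and `seq_indent` are hypothetical names, not the YAML::XS API):

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]


def dump_simple(data: Dict[str, Value], seq_indent: int = 2) -> str:
    """Dump a flat mapping of scalars and lists of scalars.

    seq_indent=0 gives the compact layout YAML::XS writes today;
    any positive value indents the '-' under its key.
    """
    pad = " " * seq_indent
    out: List[str] = []
    for key, value in data.items():
        if isinstance(value, list):
            if not value:
                out.append(f"{key}: []")  # an empty block sequence would read back as null
                continue
            out.append(f"{key}:")
            out.extend(f"{pad}- {item}" for item in value)
        else:
            out.append(f"{key}: {value}")
    return "\n".join(out) + "\n"
```

`dump_simple({"author": ["perl5-porters@perl.org"]}, seq_indent=0)` reproduces the first snippet above; the default `seq_indent=2` yields the indented form.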

Ingy insists that all other libraries are broken and produce wrong YAML. That would be acceptable if the libraries and the spec were at least consistent. They are not. And historically all successful YAML readers have been non-fatal.

cpanel_json_xs now has the options yaml, yaml-xs, yaml-tiny, and yaml-syck to use those libraries for reading and writing from the command line. This way you can easily demonstrate the various inconsistencies:

cpanel_json_xs -f yaml -t yaml-xs <META.yml >XSMETA.yml
cpanel_json_xs -f yaml -t yaml    <XSMETA.yml

YAML Error: Invalid element in map
  Code: YAML_LOAD_ERR_BAD_MAP_ELEMENT

And you can try all the other combinations, which mostly work.
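To go through all combinations rather than one at a time, a small driver can run every writer/reader pair. A sketch under stated assumptions: it uses only the `-f`/`-t` options and the four format names given above, reads the input once with `-f yaml` as the first command does, and assumes cpanel_json_xs exits non-zero when the reader dies with an error like the one shown:

```python
import itertools
import subprocess
from typing import List, Tuple

# Reader/writer names as listed for cpanel_json_xs above.
FORMATS = ["yaml", "yaml-xs", "yaml-tiny", "yaml-syck"]


def conversion_pairs(formats: List[str] = FORMATS) -> List[Tuple[str, str]]:
    """Every (writer, reader) combination, including each library with itself."""
    return list(itertools.product(formats, repeat=2))


def roundtrip_ok(src: bytes, writer: str, reader: str) -> bool:
    """Dump `src` with `writer`, then try to load the result with `reader`."""
    try:
        dumped = subprocess.run(
            ["cpanel_json_xs", "-f", "yaml", "-t", writer],
            input=src, capture_output=True, check=True,
        ).stdout
        subprocess.run(
            ["cpanel_json_xs", "-f", reader, "-t", "yaml"],
            input=dumped, capture_output=True, check=True,
        )
    except subprocess.CalledProcessError:
        return False  # assumes a failed dump or load exits non-zero
    return True


def main(path: str = "META.yml") -> None:
    """Print one line per writer/reader pair for the given file."""
    with open(path, "rb") as fh:
        meta = fh.read()
    for writer, reader in conversion_pairs():
        status = "ok" if roundtrip_ok(meta, writer, reader) else "FAIL"
        print(f"written by {writer:<9} read by {reader:<9} {status}")
```

Calling `main()` in a distribution directory prints a 4x4 matrix; the `yaml-xs` writer with the `yaml` reader is the pair expected to fail.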

For YAML::XS the following needs to be done: with NonStrict, allow character errors (control, unicode), throw a warning, replace the bad element with the partial read or undef, and continue parsing. This way you lose data, but NonStrict is optional and meant as a fallback for local configuration files, which are better read partially than not at all. We cannot lose everything on roundtrips.
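For a single scalar, the NonStrict recovery could look like the following minimal sketch. Assumptions: `nonstrict_scalar` is a hypothetical helper, not part of YAML::XS or LibYAML; Python's `None` stands in for Perl's undef; the illegal characters are those outside YAML 1.2's c-printable production; and "partial read" is taken to mean the text before the first bad byte or character.

```python
import re
import warnings
from typing import Optional

# Characters outside YAML 1.2's c-printable set: C0 controls other than
# TAB/LF/CR, DEL, C1 controls other than NEL (U+0085), U+FFFE and U+FFFF.
# (Surrogates cannot survive strict UTF-8 decoding, so they are not listed.)
_NON_PRINTABLE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x84\x86-\x9f\ufffe\uffff]")


def nonstrict_scalar(raw: bytes, where: str = "scalar") -> Optional[str]:
    """Decode one scalar NonStrict-style: warn, keep the partial read, carry on.

    Clean input is returned unchanged. At the first invalid UTF-8 byte or
    illegal control character the value is cut off; if nothing is left,
    None (undef) is returned instead of raising.
    """
    damaged = False
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        warnings.warn(f"{where}: invalid UTF-8 at byte {exc.start}, truncated")
        text = raw[: exc.start].decode("utf-8")  # bytes before the error are valid
        damaged = True
    bad = _NON_PRINTABLE.search(text)
    if bad:
        warnings.warn(
            f"{where}: illegal control character U+{ord(bad.group()):04X}, truncated"
        )
        text = text[: bad.start()]
        damaged = True
    if damaged and not text:
        return None
    return text
```

A loader would call this per scalar and keep going, so one damaged value costs that value, not the whole file.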

6 Comments

try toml, it's great

Is this improvement available on CPAN, too?

About Reini Urban

Working at cPanel on cperl, B::C (the perl-compiler), parrot, B::Generate, cygwin perl and more guts, keeping the system alive.