On YAML and YAML::XS inconsistencies
Personally I'd prefer YAML over JSON for local config data any time, even if JSON is secure by default and YAML is insecure.
YAML is readable and writable. It's better than .ini, .json and .xml.
But Houston, we have a problem, and we have had it for a long time. I'll fix it.
We have the unique advantage that the spec author and maintainer, Ingy, is from the perl world and maintains the two standard libraries: YAML, the PP (pure perl) variant, and YAML::XS, the fast XS variant based on LibYAML.
This would be an advantage if those two libraries agreed on their interpretation and implementation of the spec. They do not.
Historically the YAML library is used as the default reader for CPAN .yml preferences, and a fork of YAML::Tiny, CPAN::Meta::YAML, which is in core, is used to read and write the package META.yml files.
The basic idea is to use the fastest library available and use a PP fallback for systems which don't have the fast variant. perl5 core does not ship a proper fast library for JSON and YAML, so you have to stick to the 10x slower PP variants. cperl will ship with YAML::XS and Cpanel::JSON::XS in core, so there this problem is gone.
But we still have the YAML problem:
YAML, the default reader for CPAN, refuses to read .yml files produced by YAML::XS. You can only set YAML::Syck as yaml_module in ~/.cpan/CPAN/MyConfig.pm; using YAML::XS will get you into trouble. But YAML::Syck is not maintained anymore. It was written by _why the lucky stiff, also the author of potion, the VM for p2. It still kinda works, and it behaves better than YAML::XS, but it would be better to replace libsyck by libyaml after all, and get the YAML maintainers to fix their mess.
The fault is in the YAML::XS (i.e. LibYAML) dumper and in the YAML loader.
YAML::XS writes .yml files which YAML cannot read.
YAML supports scalars, arrays (called sequences) and hashes (called maps).
The current problem is the interpretation of section 6.1, Indentation Spaces, of the spec in the current version 1.2.
YAML::XS writes the elements of a sequence without indent, while YAML and all other YAML libraries expect an indent.
I.e. YAML::XS writes for {author => ['perl5-porters@perl.org']}
author:
- perl5-porters@perl.org
while all other libraries and the spec insist on at least a space before the -, the seq sibling:
author:
  - perl5-porters@perl.org
"Each node must be indented further than its parent node. All sibling nodes must use the exact same indentation level. However the content of each sibling node may be further indented independently." http://yaml.org/spec/1.2/spec.html#id2777534
But in the meantime all other YAML loaders came to accept Ingy's interpretation of the seq indentation level, and do accept the missing seq indent. Only YAML does not: it throws a MAP error. This is certainly a YAML loader bug.
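The claim that other loaders accept both layouts can be checked with a core module. This is a minimal sketch, assuming CPAN::Meta::YAML (the core fork of YAML::Tiny mentioned above) is installed; the helper name first_author is made up for this illustration and is not part of any library.

```perl
use strict;
use warnings;
use CPAN::Meta::YAML;

# Hypothetical helper: parse a YAML string and return the first author
# entry, or nothing if the text could not be parsed.
sub first_author {
    my ($text) = @_;
    my $docs = eval { CPAN::Meta::YAML->read_string($text) } or return;
    return $docs->[0]{author}[0];
}

# The same data in both layouts: without and with the seq indent.
my $indentless = "---\nauthor:\n- perl5-porters\@perl.org\n";
my $indented   = "---\nauthor:\n  - perl5-porters\@perl.org\n";

for my $text ($indentless, $indented) {
    my $author = first_author($text);
    print defined $author ? "$author\n" : "parse error\n";
}
```

If the loader behaves as described above, both iterations print the same address. Swapping in another loader is the quickest way to see which ones reject the indentless form.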
Remember that YAML is the default reader in the CPAN config. All it needs to do is load the YAML, and that is broken.
All this has been known for a long time: Szabo wrote about the inconsistencies, and p5p put a variant of the better YAML::Tiny into core as CPAN::Meta::YAML. This is fine, but in the long run a fast library in core is preferred, and that's what I'm doing for cperl.
So what needs to be done:
Change yaml_module in ~/.cpan/CPAN/MyConfig.pm to either CPAN::Meta::YAML, YAML::Syck or YAML::XS. All these can read those YAML files. YAML cannot, until it's fixed.
Fix YAML::XS to dump seq elements with indentation, as all the other YAML libraries do, and as the spec says. I'm working on that.
Fix YAML to accept seq elements without indentation, to be able to read old YAML::XS files. I'm working on that.
Fix YAML::XS to accept spec-violating elements in a new NonStrict mode, because the other libraries write those elements, and a YAML loader should be optionally non-fatal on illegal control chars, illegal utf-8 characters and such. All other YAML loaders silently replace illegal elements with undef. I'm working on that in https://github.com/ingydotnet/yaml-libyaml-pm/issues/44
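For the first item, the setting is one key in the $CPAN::Config hash. The fragment below is only a sketch: it assumes the usual generated layout of MyConfig.pm, picks YAML::Syck purely as an example, and leaves out every other key.

```perl
# ~/.cpan/CPAN/MyConfig.pm (fragment, not a complete file)
$CPAN::Config = {
  # ... other settings unchanged ...
  'yaml_module' => q[YAML::Syck],   # or CPAN::Meta::YAML, or YAML::XS
  # ... other settings unchanged ...
};
1;
```

From inside the cpan shell, the same change should be possible with o conf yaml_module YAML::Syck followed by o conf commit, which avoids editing the file by hand.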
Ingy insists that all other libraries are broken and produce wrong YAML. That would be acceptable if the libraries and the spec were at least consistent. They are not. And historically all successful YAML readers are non-fatal.
cpanel_json_xs now has the options yaml, yaml-xs, yaml-tiny, and yaml-syck to use those libraries for reading and writing from the command line. This way you can easily prove the various inconsistencies.
cpanel_json_xs -f yaml -t yaml-xs <META.yml >XSMETA.yml
cpanel_json_xs -f yaml -t yaml <XSMETA.yml
YAML Error: Invalid element in map
Code: YAML_LOAD_ERR_BAD_MAP_ELEMENT
And you can try all other variants, which do work mostly.
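The same roundtrip can be tried from Perl directly. This is a rough sketch, assuming both YAML::XS and YAML are installed (it reports 'skipped' otherwise); the sub name roundtrip_status is made up for this illustration. Whether the load fails depends on the installed versions of both modules, so no particular outcome is promised here.

```perl
use strict;
use warnings;

# Hypothetical helper: dump a structure with YAML::XS, then try to load
# the resulting text with YAML.pm.
# Returns 'skipped' if either module is missing, 'ok' if the load
# succeeded, or 'failed: <message>' if YAML.pm rejected the text.
sub roundtrip_status {
    my ($data) = @_;
    return 'skipped' unless eval { require YAML::XS; require YAML; 1 };
    my $text   = YAML::XS::Dump($data);
    my $loaded = eval { YAML::Load($text) };
    return defined $loaded ? 'ok' : "failed: $@";
}

print roundtrip_status({ author => ['perl5-porters@perl.org'] }), "\n";
```

Replacing YAML::Load with the Load function of another library gives the other combinations.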
For YAML::XS the following needs to be done: With NonStrict, allow character errors (control, unicode), throw a warning, replace by the partial read or undef, and continue parsing. This way you lose data, but NonStrict is optional and a fallback for local configuration files, which are better read partially than not at all. We cannot lose everything on roundtrips.
Relevant old ticket (same issue at its core): https://rt.cpan.org/Ticket/Display.html?id=41463
try toml, it's great
toml is great for your private config files, but we need to solve the YAML mess for perl5 core.
Given that libyaml is broken and YAML.pm is broken, we are only left with YAML::Syck and YAML::Tiny (CPAN::Meta::YAML).
Next blog post about fixing libsyck and merging with upstream.
Fixed the first part $YAML::XS::IndentlessMap with https://github.com/rurban/yaml-libyaml-pm/commit/7f1af737489f11cdd9f7139fe989c0f4d4fbc004
Now only $YAML::XS::NonStrict is needed to pass the validator tests, and YAML.pm needs to accept YAML::XS IndentlessMap.
And with the latest commit in https://github.com/rurban/yaml-libyaml-pm/commits/cperl-core
YAML::XS is back again in the modern ages.
Add native DumpFile, LoadFile
support filename in error messages.
Add dumper options: Indent, BestWidth, Canonical, Unicode,
Encoding, LineBreak, OpenEnded (kept defaults)
Add loader option: NonStrict, Encoding (kept defaults)
Fix default emitter_set_width (2 => 80)
Is this improvement available on CPAN, too?