YAML::PP Final Grant Report
This is the final report on my TPF grant "Complete YAML::PP".
It's a final report, but it's also a TODO report.
I did way more work on YAML::XS and YAML.pm than I would have thought. Both modules will stay around. YAML::XS because of its speed and its usage of the widely used (or ported) libyaml; YAML.pm because for simple data it mostly just works, so there is at least no urgent need for changing to a YAML 1.2 processor.
I think fixing some annoying bugs and incompatibilities improved the state of YAML in Perl a lot.
This is something that was not part of my initial grant proposal.
On the other hand, while implementing YAML::PP, I learned more about YAML and saw the need to implement things in a more generic way. If done right, you can do cool things with it. If you are the kind of person who likes to write their programs in Latin, you might appreciate that you can now tell the YAML loader to read Roman numerals into integers (and the other way around)!
This takes more time to implement and is one of the reasons why loading and dumping generic perl objects is not implemented yet.
Report
Up to the March 2018 report, I worked on YAML-related things for about 305 hours. Adding the time since then (and excluding the time spent in Oslo in April), I think it's safe to say I spent at least 340 hours in total. That's a bit more than two months of full-time work.
In its current state, you can rely on the documented YAML::PP functionality, but not so much on its API. Be aware that the usage might change.
Pin the used YAML::PP version, and run your test suite when upgrading. (You have a test suite, right?)
I will do my best to document breaking changes in the Changelog.
Deliverables
Here is a comparison of my original plans to what I've done.
Complete YAML::PP::Parser
Original
- Flow Style
- Flow Nodes as mapping keys
- Line and Column Numbers for error messages
Status
Most of flow style is done.
The remaining cases are rarely used edge cases, and all of them can be avoided:
- Empty nodes where a comma or ] directly follows a tag or anchor, for example [&anchor,] or {foo:,}
- Implicit mappings in flow sequences: [a: 1, b: 2] == [{a: 1},{b: 2}]
- Explicit keys in flow collections: [ ? key : value ]
- "Empty" documents (two document end markers, "...")
- Unquoted strings ending with ":", e.g. foo::: bar equals "foo::": bar
- No space after the colon when the key is quoted: {"foo":23}
Flow nodes as mapping keys are not really relevant for perl, since they can't be loaded into native hashes.
[a, b]: [1, 2]
Most error messages include line and column numbers, and the lexer reports which tokens were expected and which it got instead.
Currently there are 21 failing parsing tests according to the YAML Test Matrix. To see what to avoid you can view every failing test case.
Most of the parsing is done via a grammar, but there is still also manual parsing going on that should be transferred to the grammar in the future.
YAML::PP::Loader/Constructor
Original
- Implement loading of Tags and blessing into objects
- Provide a possibility for safe loading
- Ideally provide a way to only load certain tags
Status
Currently you can load scalars into objects or transform data by providing a regex or a list of strings, and/or a tag name. You can provide a code reference which receives the original YAML scalar and its style as arguments. There's an example in the distribution that has a little templating feature and can load external vars.
Safe loading is the default. I added an option to detect or reject cyclic references.
This kind of custom loading is not yet possible for mappings or sequences. Because YAML supports cyclic references, that can get quite complicated.
Originally I planned to implement just one standard Schema plus the generic perl objects. It would have been easier to hardcode this, but I decided to make it more generic. It should end up as powerful as PyYAML, for example. I regularly see questions asked that are using PyYAML's features, so I think Perl should also have something like this. After all, it's Perl!
So instead of hardcoding one schema, I implemented all three YAML 1.2 schemas in a generic way. You can load and dump data structures using the Failsafe, JSON and Core schemas. The only other YAML processor I know of that can load different schemas is js-yaml, but it currently supports only Failsafe and Core.
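To illustrate the difference between the three schemas, here is a minimal sketch that loads the same document once per schema. It assumes a YAML::PP version whose constructor accepts a schema option as an array reference with the names Failsafe, JSON and Core; the document content and variable names are made up for this example.

```perl
use strict;
use warnings;
use YAML::PP;

# The same document, loaded once per YAML 1.2 schema.
my $yaml = <<'EOM';
number: 23
flag: true
nothing: ~
EOM

my %loaded;
for my $schema (qw/ Failsafe JSON Core /) {
    # Assumption: the schema is selected by name via the constructor.
    my $ypp = YAML::PP->new( schema => [$schema] );
    $loaded{$schema} = $ypp->load_string($yaml);
}

# Failsafe only knows strings, mappings and sequences, so "~" stays a string.
# Core resolves "~" to null, which becomes undef in Perl.
for my $schema (qw/ Failsafe JSON Core /) {
    my $value = $loaded{$schema}{nothing};
    printf "%-8s 'nothing' is %s\n", $schema,
        defined $value ? "the string '$value'" : 'undef';
}
```

Which schema to pick depends on where the data comes from: Failsafe is the most predictable, while Core is closest to what people usually expect from YAML.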
Adding the YAML 1.1 types to YAML::PP to be able to load 1.1 documents should be easy.
For loading generic perl objects, I have to add the custom loading for mappings and sequences first.
Emitter/Dumper
Original
- Write YAML::PP::Emitter
- Write YAML::PP::Dumper/Deconstructor
Status
The Dumper is able to dump all data structures except objects or things like typeglobs or coderefs.
The Emitter is able to output all test cases (that the parser can parse) correctly, except for folded block scalars. Since folded block scalars aren't used by default, you should be able to use it correctly for all data.
I wrote YAML::PP::Representer, which is the opposite of YAML::PP::Constructor. It is responsible for the schema, which means deciding whether something is an integer, float, boolean, undef or string, and telling the emitter whether it has to be quoted or not.
YAML Test Suite
The test suite and related projects are slowly attracting more developers. I think it's a success, but there's still a lot of work.
Currently the following projects are using the test suite: Nim NimYAML, Perl YAML::PP, Perl YAML::Pegex, C libyaml, Go go-yaml, Java SnakeYAML, JavaScript yaml, Haskell HsYAML.
I added 64 tests and fixed existing tests, especially the tests for JSON comparison, since I was the only one using these tests until recently.
Additional work
YAML::XS
I added $YAML::XS::LoadBlessed, so you can now (quite) safely load YAML from untrusted sources.
You can now serialize booleans with the $YAML::XS::Boolean option, enabling you to exchange data with JSON modules and others that use booleans.
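As a rough sketch of how the two options can be combined, the following loads a document that asks for a blessed hash and contains a boolean, then passes the result on to a JSON module. It assumes a YAML::XS version recent enough to have both options; the class name Some::Class and the keys are invented for this example.

```perl
use strict;
use warnings;
use YAML::XS ();
use JSON::PP ();

# A document that asks to be blessed into a (hypothetical) class.
my $yaml = <<'EOM';
--- !!perl/hash:Some::Class
name: example
active: true
EOM

my $data = do {
    # Do not bless loaded data into classes named in the YAML.
    local $YAML::XS::LoadBlessed = 0;
    # Load true/false as JSON::PP boolean objects.
    local $YAML::XS::Boolean = 'JSON::PP';
    YAML::XS::Load($yaml);
};

# $data is a plain hash reference, and the boolean survives the
# round trip into JSON as a real JSON boolean.
print ref($data), "\n";
print JSON::PP->new->canonical->encode($data), "\n";
```

Using local keeps both settings limited to this one load, so other code that uses YAML::XS in the same process is not affected.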
I fixed a bug with loading many regexes in one YAML file, which resulted in a segfault.
Loading and dumping the same regex multiple times no longer makes the regex grow.
In the test matrix, YAML::XS is now very close to PyYAML.
YAML.pm
YAML.pm is still widely used and quite incompatible with other YAML processors. One reason is that it was written for YAML 1.0. But there were also bugs and problems which I was able to fix.
I added $YAML::LoadBlessed, so all YAML modules on CPAN are now safe regarding loading objects.
Other changes:
- Fixed a problematic regex for parsing quoted strings
- Added support for trailing comments. So far I know of one CPAN module that broke because of that, but fortunately it was only the test suite that needed a patch: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=898561 Thanks to the Debian crew! If you are using MARC::Transform, be sure to add quotes around strings containing <space>#
- Fixed two problems with mapping keys that equal = or start with =<space>
- Fixed the same bug as in YAML::XS with growing regexes
- Fixed bug when loading top level scalars with multiple spaces
- Support compact nested sequences
- Support zero indented sequences
Especially the last two increase interoperability with modules like YAML::XS.
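The last two items can be sketched as follows. This assumes a YAML.pm release that contains these changes (the changelog post listed below refers to 1.25); the data itself is made up.

```perl
use strict;
use warnings;
use YAML ();

# Compact nested sequences: the inner sequence starts on the same
# line as the outer sequence entry.
my $nested = YAML::Load(<<'EOM');
---
- - a
  - b
- - c
EOM

# Zero indented sequences: the sequence entries are not indented
# relative to their mapping key.
my $config = YAML::Load(<<'EOM');
---
list:
- x
- y
EOM

print scalar( @{ $nested->[0] } ), " items in the first inner list\n";
print join( ',', @{ $config->{list} } ), "\n";
```

Other processors such as YAML::XS commonly emit both forms, which is why accepting them matters for interoperability.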
YAML Editor
The YAML Editor consists of a Docker image with 17 different YAML processors built in, several programs/scripts that parse/load/emit/dump YAML, and some clever vimscript to play with several frameworks at once.
I did several small fixes for the existing views, added new ones and improved documentation a bit.
Recently, Herbert Riedel, a Haskell developer, joined us on IRC. He is writing a YAML parser/loader on top of the YAML Reference Parser written by Oren Ben-Kiki, and Ingy added it to the YAML Editor. This is really helpful because we can now easily see how the reference parser parses the test cases. You can see it in the test matrix now; it's currently the parser and loader that passes the most test cases.
YAML Test Matrix
I think the test matrix is an important part of the test infrastructure because it visualizes the test suite and also gives a quick overview of existing YAML processors.
I added an overview page to quickly compare all the processors to each other: https://matrix.yaml.io/
I added the results for the invalid tests to the overview.
I added a page to it that shows all test cases highlighted, so people can get a very quick impression of what the test suite contains, instead of having to browse all test cases manually. Highlighting was done with YAML::PP. It's also searchable.
Although recently Ingy changed the test suite to the new TestML format which is now processed by nodejs, a lot of the test infrastructure is powered by perl.
Blog posts
I wrote five blog posts, and I think the tutorials have already been very helpful, at least to me. When someone asks about YAML on IRC or Stack Overflow, I can often just give a small example and then refer to one of the articles.
- Strings in YAML - To Quote or not to Quote
- Introduction to YAML Schemas and Tags
- Safely load untrusted YAML in Perl
  - Please note the comment section. I made a mistake in the original post and edited it later.
- Perl Toolchain Summit
- YAML.pm 1.25 Changelog
Talks
I gave a 40 minute talk on The state of the YAML at TPC in Amsterdam.
Feedback showed that it was a bit too much theory for most of the audience.
At the London Perl Workshop in November I gave a more practical 20 minute talk about YAML - Where and how to use? What's new? (Video), and a Lightning Talk YAML::PP - Just another YAML Framework? (Video). I got positive feedback for those.
Thanks to everyone who gave feedback; it helps me improve my talks.
All past reports
- August/September 2017
- October 2017
- November 2017
- December 2017
- January 2018
- February 2018
- March 2018
- For April, see the Perl Toolchain Summit report
Thanks to...
- Ingy for inventing YAML and creating the YAML Editor and YAML Test Suite
- Felix Krause for helping me understand the spec
- My Grant Manager Mark for helping me with my reports
- The Perl Foundation for supporting this, and the sponsors supporting the Perl Foundation