YAML::PP Grant Report February 2018

Hi there,

I had another busy month and only hacked a bit. I was even so busy that I forgot to use my time tracker, so I estimate about 20 hours.

See also my previous reports on blogs.perl.org (Aug/Sep, Oct, Nov, Dec, Jan).

Blog

I wrote a post about how to Safely load untrusted YAML in Perl

Another blog post I have wanted to write for a while now is about how to quote (or not quote) strings in YAML. In one of my previous reports I added a mini tutorial on this, but that was really very basic.

I have not found a complete reference of string types for YAML anywhere, so the only documentation is really the spec itself. That's not very useful for users.

So I wrote this: Strings in YAML - To Quote or not to Quote

YAML::PP

Cyclic references

I continued refactoring the Constructor to be able to use tags also for sequences and mappings.

While doing that, I realized that I can detect cyclic references. Sometimes, for example when loading untrusted YAML, you don't want them.

In the next version (and on github master branch), you can do:

my $ypp = YAML::PP->new( cyclic_refs => 'fatal' ); # loading a cyclic reference will die

There are some other values for this option, but I'm not sure if they'll stay like they are: warn, ignore (both will just add an undef instead of the ref to the data structure), and allow (default).

It would also be nice to automatically weaken the reference when loading, but that seems to be a bit more complicated. When I encounter such a ref, I would have to use weaken on the actual item in the final data structure. At that point, however, I only have a variable holding the reference, and it has not been added to the data structure yet. Maybe I need some post processing for that.
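To illustrate why weakening the variable is not enough, here is a minimal sketch in plain Perl that uses only the core module Scalar::Util. It is not YAML::PP code; the names $parent and $ref are made up for the example. It relies on the documented behaviour that a copy of a weak reference is a strong reference again.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

# A node that is supposed to point back to itself (a cyclic reference)
my $parent = { name => 'parent' };

# The loader only holds the reference in a variable at this point.
# Weakening that variable ...
my $ref = $parent;
weaken($ref);

# ... does not help, because copying it into the data structure
# creates a strong reference again.
$parent->{self} = $ref;
my $copy_is_weak = isweak( $parent->{self} ) ? 1 : 0;

# The slot inside the final data structure has to be weakened itself.
weaken( $parent->{self} );
my $slot_is_weak = isweak( $parent->{self} ) ? 1 : 0;

print "copy weak: $copy_is_weak, slot weak: $slot_is_weak\n";
```

So whatever does the weakening has to run after the value has been stored in its final place, which is why some kind of post processing step looks necessary.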

Custom constructors

I implemented a kind of proof of concept for preserving key order for hashes.

My first draft looks like this:

use YAML::PP;
use Tie::IxHash;
my $ypp = YAML::PP->new;
$ypp->schema->add_map_resolver(
    tag => 'tag:yaml.org,2002:map',
    on_create => sub {
        my %hash;
        tie(%hash, 'Tie::IxHash');
        return \%hash;
    },
);

I think that's as easy as it gets from a user's point of view. The internal API for that needs more work though. It's in a branch.

One of the more complicated things is cyclic references. When I iterate over the parser events and get a reference to one of the parent nodes, those nodes aren't finished yet. Still, I have to add a reference to them. That means that I have to have an existing data structure I can refer to, even if the values haven't been filled in.

This gets really funny when you want to serialize objects, for example. Let's say you want to serialize graphs, and their nodes are hash objects.

Additionally a graph can be cyclic. When loading this, the callback needs to create an empty Graph Node object so that I can refer to it later.

Since YAML::PP doesn't know it's filling an object instead of a plain hash, I need to provide a callback (something like on_key_value) that takes the data from the constructor and adds it to the Node object. That's because the class might want to check the content of the data it gets, so that a wrong YAML file doesn't create an invalid object.

Fun!
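Here is a rough sketch of that two-step idea in plain Perl, without any YAML::PP code. Everything in it is an assumption for illustration: the class Graph::Node, its set method (standing in for something like on_key_value) and the %anchor hash are hypothetical and not an existing API.

```perl
use strict;
use warnings;

# Hypothetical node class, not part of YAML::PP
package Graph::Node {
    use Scalar::Util qw(blessed);

    sub new {
        my ($class) = @_;
        # Created empty, so that aliases can refer to it right away
        return bless { name => undef, edges => [] }, $class;
    }

    # Stand-in for an on_key_value style callback: the class checks
    # every key/value pair before accepting it
    sub set {
        my ( $self, $key, $value ) = @_;
        if ( $key eq 'name' ) {
            die "name must be a plain string\n"
                if not defined $value or ref $value;
            $self->{name} = $value;
        }
        elsif ( $key eq 'edges' ) {
            die "edges must be a list of Graph::Node objects\n"
                if ref($value) ne 'ARRAY'
                or grep { !( blessed($_) && $_->isa('Graph::Node') ) } @$value;
            $self->{edges} = $value;
        }
        else {
            die "unknown key '$key'\n";
        }
        return $self;
    }
}

# Simulates loading something like: &n1 { name: n1, edges: [ *n1 ] }
my %anchor;

# Step 1: the mapping starts, so create the empty object and register
# it under its anchor before any content is known
my $node = Graph::Node->new;
$anchor{n1} = $node;

# Step 2: the key/value pairs arrive; the alias *n1 already resolves
# to the unfinished object
$node->set( name  => 'n1' );
$node->set( edges => [ $anchor{n1} ] );
my $is_cyclic = $node->{edges}[0] == $node ? 1 : 0;

# A wrong YAML file is rejected instead of creating an invalid object
my $ok = eval { Graph::Node->new->set( colour => 'red' ); 1 };
my $error = $ok ? '' : $@;

print "cyclic: $is_cyclic, error: $error";
```

The point of the split is that the object exists before its content does, while the class still gets to validate everything that is put into it.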

PyYAML, by the way, works a bit differently. For custom constructors, it creates a DOM-like data structure. While this can also be useful, in most cases you just want to receive the child nodes already resolved. PyYAML also seems a bit limited for custom constructors in other regards, but I still have to play with it.

libyaml

YAML allows colons in plain strings, as long as no space (or other special character) follows. So you can write:

url: http://example.org/

Inside a flow collection, this is also allowed, but that's actually our interpretation of the YAML 1.1 spec. The 1.1 spec has a mistake here, which Felix Krause explained to us. YAML 1.2, however, is clear on that. So this should be allowed too:

flow sequence: [http://example.org/, http://yaml.org/]
flow mapping: { url: http://example.org/ }

libyaml doesn't allow this. PyYAML has been fixed regarding that.

I made a pull request for libyaml to allow that.

It will still fail on edge cases like {foo:} (which should be the same as { foo: null }), but that's harder to fix. (It's C!)

I'm not sure yet if or when this will be merged. So until this is released and integrated into YAML::XS, you still have to quote such strings when using YAML::XS.

About tinita

just another perl punk