YAML::PP Grant Report January 2018

Hi,

thanks for reading my report. This one will be quite short because I have been busy with other things, and a lot of time went into discussions rather than coding. I never logged time for discussions.

I worked on YAML::PP for about 10 hours.

See also my previous reports on blogs.perl.org (Aug/Sep, Oct, Nov, Dec).

YAML::PP

I have been tweaking the Schema interface a little bit. To add a tag, the syntax now looks like this:

# Plain string 'null' is resolved to undef, with or without
# a '!!null' tag
$schema->add_resolver(
    tag => 'tag:yaml.org,2002:null',
    match => [ equals => null => undef ],
);
# Empty scalar is only resolved to undef with the '!!null' tag
$schema->add_resolver(
    tag => 'tag:yaml.org,2002:null',
    match => [ equals => '' => undef ],
    implicit => 0,
);

The syntax will probably change, though. I also want to define tags for sequences and mappings, not only for scalars, and the tag could also be a regex in the future.
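To make the effect of the implicit flag concrete, here is a minimal, self-contained sketch of how such a resolver table could behave. My::MiniSchema and its resolve method are hypothetical names made up for this illustration. This is not the YAML::PP::Schema implementation, and it only supports the 'equals' match type.

```perl
use strict;
use warnings;

# Hypothetical mini-schema for illustration only,
# NOT the real YAML::PP::Schema code.
package My::MiniSchema;

sub new {
    my ($class) = @_;
    return bless { resolvers => [] }, $class;
}

# Register a resolver. 'implicit' defaults to true, meaning the
# resolver also applies to scalars that carry no tag.
sub add_resolver {
    my ($self, %args) = @_;
    my ($type, $string, $value) = @{ $args{match} };
    die "only 'equals' is supported in this sketch" unless $type eq 'equals';
    push @{ $self->{resolvers} }, {
        tag      => $args{tag},
        string   => $string,
        value    => $value,
        implicit => exists $args{implicit} ? $args{implicit} : 1,
    };
    return $self;
}

# Resolve a scalar; $tag is undef for an untagged scalar.
# Returns (1, $value) on a match and (0, $scalar) otherwise.
sub resolve {
    my ($self, $scalar, $tag) = @_;
    for my $r (@{ $self->{resolvers} }) {
        next unless $r->{string} eq $scalar;
        if (defined $tag) {
            next unless $tag eq $r->{tag};
        }
        else {
            next unless $r->{implicit};
        }
        return (1, $r->{value});
    }
    return (0, $scalar);
}

package main;

my $null_tag = 'tag:yaml.org,2002:null';
my $schema   = My::MiniSchema->new;
$schema->add_resolver(tag => $null_tag, match => [ equals => null => undef ]);
$schema->add_resolver(tag => $null_tag, match => [ equals => '' => undef ], implicit => 0);

my ($ok1, $v1) = $schema->resolve('null');          # untagged 'null'
my ($ok2, $v2) = $schema->resolve('');              # untagged empty scalar
my ($ok3, $v3) = $schema->resolve('', $null_tag);   # empty scalar with !!null

print "untagged 'null' resolved:      ", ($ok1 ? 'yes' : 'no'), "\n";
print "untagged empty resolved:       ", ($ok2 ? 'yes' : 'no'), "\n";
print "tagged empty (!!null) resolved: ", ($ok3 ? 'yes' : 'no'), "\n";
```

In this sketch, the untagged 'null' and the tagged empty scalar both resolve to undef, while the untagged empty scalar stays an empty string.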

I added YAML::PP::Representer, so YAML::PP::Dumper delegates to YAML::PP::Representer and YAML::PP::Emitter. Now this matches the loading stack, where YAML::PP::Loader delegates to YAML::PP::Constructor and YAML::PP::Parser.

The Schema will be an attribute of the YAML::PP object itself and will be passed to the various classes it delegates to.

I started restructuring the Constructor, so that I will also be able to change the behaviour of loading sequences and mappings via Schema definitions. Before, the constructing was done using perl's reference features, to use as little memory and CPU as possible, but this was very inflexible.

This is not yet in master but in the schema branch.

I also started to add some fixes to YAML::PP::Emitter. This is in the emitter branch.

I have an idea how I can add a feature to preserve hash key order via a schema plugin, after I have restructured the Constructor to allow calling your own constructor on sequences and mappings. This constructor would then just use a tied hash that keeps the keys in their original order when loading a mapping. Currently only YAML.pm supports preserving key order, also via tied hashes.

This way I don't have to add loads of options to the constructor (like preserve_key_order), but can implement a plugin. Adding this flexibility will allow people to implement all kinds of tweaks without the need for YAML::PP core to support them.
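As a rough illustration of the tied-hash idea, here is a minimal sketch of a hash that remembers insertion order. My::OrderedHash and construct_ordered_mapping are hypothetical names invented for this example. They are not part of YAML::PP or YAML.pm, and the plugin interface that would call such a constructor does not exist yet.

```perl
use strict;
use warnings;

# Hypothetical tied-hash class that remembers insertion order.
package My::OrderedHash;

sub TIEHASH {
    my ($class) = @_;
    return bless { data => {}, order => [], iter => 0 }, $class;
}

sub STORE {
    my ($self, $key, $value) = @_;
    # Record the position of a key only the first time it is stored
    push @{ $self->{order} }, $key unless exists $self->{data}{$key};
    $self->{data}{$key} = $value;
}

sub FETCH  { my ($self, $key) = @_; return $self->{data}{$key} }
sub EXISTS { my ($self, $key) = @_; return exists $self->{data}{$key} }

sub DELETE {
    my ($self, $key) = @_;
    @{ $self->{order} } = grep { $_ ne $key } @{ $self->{order} };
    return delete $self->{data}{$key};
}

sub CLEAR {
    my ($self) = @_;
    %{ $self->{data} }  = ();
    @{ $self->{order} } = ();
}

# Iteration walks the recorded order instead of perl's internal order
sub FIRSTKEY {
    my ($self) = @_;
    $self->{iter} = 0;
    return $self->{order}[0];
}

sub NEXTKEY {
    my ($self) = @_;
    return $self->{order}[ ++$self->{iter} ];
}

package main;

# Stand-in for what a custom mapping constructor might do:
# it receives the key/value pairs in document order.
sub construct_ordered_mapping {
    my (@pairs) = @_;
    tie my %hash, 'My::OrderedHash';
    while (my ($key, $value) = splice @pairs, 0, 2) {
        $hash{$key} = $value;
    }
    return \%hash;
}

my $mapping = construct_ordered_mapping(zebra => 1, apple => 2, mango => 3);
my @keys    = keys %$mapping;
print join(', ', @keys), "\n";   # prints "zebra, apple, mango"
```

The trade-off is that every access to a tied hash goes through a method call, which is slower than a plain hash. That is one more reason to keep this in an optional plugin rather than in the default constructor.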

Currently I have more ideas than time to implement them, but I hope that will change a bit in a couple of weeks.

The London Perl Workshop 2017 Videos are online now, and you can find my talks here:

Regarding the JSON compatibility in the talk, it seems I was mistaken about the key length limit, but I'll come back to that below.

Discussions

The maintainer of go-yaml joined our yaml-test-suite team, and we talked a bit about how to fix various issues, mainly the JSON comparison data.

We started to fix things.

Another thing I was wondering about was JSON compatibility. YAML 1.2 was created to fix some mistakes in the 1.1 Spec, but also to be completely compatible with JSON.

However, if you tell people "YAML is a strict superset of JSON", you should add that many processors still implement 1.1 (or even 1.0).

With these parsers, you cannot reliably parse arbitrary JSON.

Also, even 1.2 parsers might not be able to do that, because they don't parse everything correctly. That's why we hope the yaml-test-suite will be a huge win for the future.

On the other hand, it should be noted that these are rare cases, and most JSON you are dealing with can probably be parsed by most YAML parsers.

I'm currently trying to find out which changes were actually made for 1.2.

I'm told that these changes ensure compatibility with JSON, but I want to understand them myself, and ideally we can create a page on GitHub so that everybody can read about them. The YAML 1.2 Spec does not list them in detail.

I'll just mention an example.

YAML 1.2 has a length limit of 1024 characters for implicit keys. The keys in the following map are called implicit because the parser only knows that they are keys once it reaches the colon:

key1: one
key2: two

These keys, quoted or plain, must not be longer than 1024 characters. But in Flow Mappings this limit does not apply:

# valid
{ "key longer than 1024 characters": value }

So I was mistaken about that. The good thing is that I now know the production in the Spec that proves this - I think!

I have also been taking part in libyaml discussions, and I found another bug in YAML.pm.

Also a Debian maintainer opened a YAML::XS pull request and suggested an environment variable switch to be able to build YAML::XS with the system libyaml. We're currently working on it, mainly trying to move the bundled libyaml into its own directory and ignore it if requested.

ToDos

I have spent quite a lot of time not only on YAML::PP, but also on related projects.

It's taking longer than estimated. Another reason for that is that I don't want to quickly hardcode the features that I need for this Grant to be finished; I want the design to be flexible from the beginning.

Flow Collections

I need to implement a couple more things in the parser for Flow Style Collections.

Sequences and Mappings as mapping keys

Since this is not natively supported in Perl, it's probably not that important to get it working soon, but I can think of use cases where you'd want to be able to parse this and use your own constructor.

Add Schema for Dumper

You can correctly use the three YAML 1.2 Schemas for the Loader, but the Dumper/Representer currently is pretty simple and will not always generate correct output regarding scalar types/quotes.

Support loading and dumping generic perl objects

To be able to use YAML::PP as a replacement for existing modules, I should implement all types of perl objects, including code references.

That's it, thanks for reading and see you in four weeks!

About tinita

just another perl punk