Reusing data with YAML Anchors, Aliases and Merge Keys

By tinita on May 15, 2019 12:41 AM

I just added a feature called "Merge Keys" to YAML::PP. No other Perl YAML module supports this so far. It lets you merge mappings defined elsewhere in your YAML document into other mappings. Here is a short example:

---
defaults: &defaults
  A: 1
  B: 2
mapping:
  << : *defaults
  A: 23
  C: 99

# same as:
mapping:
  B: 2
  A: 23
  C: 99

If you don't know the &defaults/*defaults notation, here is a basic introduction before I explain the merge keys.

YAML Anchors and Aliases

The &name is an anchor. It can be added to any mapping, sequence or scalar. It's similar to references in Perl 5.

Later in the YAML Document you can refer to a previously defined anchor with an alias *name, like a Perl reference.

---
invoice:
  billing address: &address1
    name: Santa Claus
    street: Santa Claus Lane
    city: North Pole
  shipping address: *address1

You can also use it for scalars:

---
name1: &name Larry Wall
name2: *name

The YAML specification requires that the loaded data is aliased, so if you change $data->{name1}, $data->{name2} would change too. Perl's YAML modules implement this differently. YAML::XS is currently the only module that can do this for scalars, because it uses real Perl aliases. YAML.pm, YAML::Syck and YAML::PP only use references, so aliasing only works for data that are references (mappings and sequences).
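The difference between shared references and copied scalars can be sketched in plain Perl, without any YAML module. The data below is built by hand to mimic what a reference-based loader would return for the two examples above; it is an illustration under that assumption, not the output of a particular module.

```perl
use strict;
use warnings;

# Hand-built stand-in for the loaded invoice: both keys hold the
# same hash reference, as a reference-based loader would produce.
my $address = {
    name   => 'Santa Claus',
    street => 'Santa Claus Lane',
    city   => 'North Pole',
};
my $invoice = {
    'billing address'  => $address,
    'shipping address' => $address,
};

# A change made through one key is visible through the other.
$invoice->{'billing address'}{city} = 'Rovaniemi';
print "shipping city: ", $invoice->{'shipping address'}{city}, "\n";

# Without real aliases, a scalar is just copied.
my $names = { name1 => 'Larry Wall' };
$names->{name2} = $names->{name1};
$names->{name1} = 'Someone Else';
print "name2: ", $names->{name2}, "\n";   # still 'Larry Wall'
```

With real aliases, the last line would print the changed name as well.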

I want to add real aliases to YAML::PP in the future, but it's not a top priority.

YAML Merge Keys

Merge Keys were added for YAML 1.1.

This is the example from the specification:

---
- &CENTER { x: 1, y: 2 }
- &LEFT { x: 0, y: 2 }
- &BIG { r: 10 }
- &SMALL { r: 1 }

# All the following maps are equal:

- # Explicit keys
  x: 1
  y: 2
  r: 10
  label: center/big

- # Merge one map
  << : *CENTER
  r: 10
  label: center/big

- # Merge multiple maps
  << : [ *CENTER, *BIG ]
  label: center/big

- # Override
  << : [ *BIG, *LEFT, *SMALL ]
  x: 1
  label: center/big

The specification doesn't say anything about the location of the merge key. In the example it is always on top, and looking at already existing implementations, it actually doesn't matter where it is located. Actual values in the mapping will always override the merged ones.
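The override rules can be sketched in plain Perl. The helper merge_mapping below is hypothetical and only models the semantics on ordinary hashes; it is not how YAML::PP implements merging. It assumes the rule from the merge key definition that, in a sequence of mappings, earlier entries take precedence over later ones.

```perl
use strict;
use warnings;

# Hypothetical helper: $explicit holds the mapping's own keys,
# @sources the mappings listed under the merge key, in order.
sub merge_mapping {
    my ($explicit, @sources) = @_;
    my %result;
    # Walk the sources from last to first, so that earlier
    # sources overwrite later ones.
    for my $source (reverse @sources) {
        %result = (%result, %$source);
    }
    # The mapping's own keys always win over merged ones.
    return { %result, %$explicit };
}

my %CENTER = (x => 1, y => 2);
my %LEFT   = (x => 0, y => 2);
my %BIG    = (r => 10);
my %SMALL  = (r => 1);

# The "Override" case: << : [ *BIG, *LEFT, *SMALL ] plus x and label
my $override = merge_mapping(
    { x => 1, label => 'center/big' },
    \%BIG, \%LEFT, \%SMALL,
);

print join(', ', map { "$_=$override->{$_}" } sort keys %$override), "\n";
# prints: label=center/big, r=10, x=1, y=2
```

Because the explicit keys are applied last, the position of the merge key inside the mapping makes no difference to the result.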

The merge value does not have to be an alias, so the "Merge one map" example could theoretically also look like this:

- # Merge one map
  << : { x: 1, y: 2 }
  r: 10
  label: center/big

You might wonder how you would use a literal mapping key containing the string << that is not a merge key. You can do that by quoting it. A merge key works like other special values. Compare the following theoretical examples:

---
implicit boolean:   true
explicit boolean:   !!bool true
string:             'true'
implicit null:      null
explicit null:      !!null 'null'
string:             'null'
implicit merge key: <<
explicit merge key: !!merge '<<'
string:             '<<'

That means a merge key is created with the tag !!merge, but it is automatically resolved if the string is <<.
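A small sketch of the difference between a plain and a quoted << key, assuming YAML::PP v0.015 or later is installed and the Merge schema is enabled. The key names merged and literal are made up for this example.

```perl
use strict;
use warnings;
use YAML::PP;

my $yp = YAML::PP->new( schema => [qw/ JSON Merge /] );

my $data = $yp->load_string(<<'EOM');
---
merged:
  << : { x: 1 }
  y: 2
literal:
  '<<': { x: 1 }
  y: 2
EOM

# The plain << is resolved as a merge key, so x ends up in the mapping.
print join(',', sort keys %{ $data->{merged} }), "\n";

# The quoted '<<' stays an ordinary string key.
print join(',', sort keys %{ $data->{literal} }), "\n";
```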

Once you know how to use it, it's a pretty convenient syntax. In other languages with YAML processors that support this, merge keys are widely used.

Unfortunately the implementation is not trivial. It would be trivial for a basic processor where most things are hardcoded.

In YAML::PP I provide support for your own callbacks that handle loading mappings. Additionally I have to add a way to handle duplicate keys, since they should be forbidden by default.

That means, a generic implementation requires some work.

Additionally, most other YAML tags only operate on the node itself, for example !!null, !!int etc. The !!merge tag changes the behaviour of the parent mapping, which is why I had to add special handling for it.

The internal API for this is not finished, but you can use merge keys since YAML::PP v0.015:

my $yp = YAML::PP->new( schema => [qw/ JSON Merge /] );
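As a sketch, loading the short example from the top of this post could look like this. It assumes YAML::PP v0.015 or later is installed and uses load_string, the regular YAML::PP method for loading a document from a string.

```perl
use strict;
use warnings;
use YAML::PP;

my $yp = YAML::PP->new( schema => [qw/ JSON Merge /] );

my $data = $yp->load_string(<<'EOM');
---
defaults: &defaults
  A: 1
  B: 2
mapping:
  << : *defaults
  A: 23
  C: 99
EOM

# B comes from the merged defaults, A is overridden by the mapping itself.
my $mapping = $data->{mapping};
print "$_: $mapping->{$_}\n" for sort keys %$mapping;
# A: 23
# B: 2
# C: 99
```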

Feedback welcome, and happy hacking!

Tagged as:

YAML

7 Comments

Kevin Goess | May 20, 2019 6:49 PM

> Merge Keys were added for YAML 1.1

I was under the impression that these were a "proposed extension" and not actually added to the 1.1 spec, is that not correct?

And have you looked at a patch to add this feature to YAML::XS?

tinita replied to comment from Kevin Goess | May 21, 2019 10:51 AM

That's right, they are not "really" official, and Ingy would actually like to get rid of them, but they were implemented in several YAML processors, and people like to use them, so I decided to implement them optionally.

Adding it to YAML::XS would be possible I guess. Just a bit of work because of, well, XS... And Ingy should be ok with adding it before I start doing any work.

If you want to get some of the speed of C, you could also try YAML::PP::LibYAML, where you could use the merge key feature already.

Let me know if you have more questions or suggestions (preferably via github, email or IRC because I don't get notifications here on blogs.perl.org).

tinita replied to comment from tinita | May 21, 2019 10:56 AM

Oh, actually, I *do* get comment notifications now. Nice!

Aristotle | May 22, 2019 7:57 PM

On newer Perls with refaliasing, it should be trivial to implement scalar aliases. YAML::PP could use that.

On older Perls, Perl-based but non-pure-Perl YAML implementations can use Array::RefElem as a lightweight form of the refaliasing feature. Its drawback compared to refaliasing is that it only works on a slot in a container – which is a limitation when used in Perl code for aliasing scalars, but I think there is no possibility of encountering aliases in YAML in any other circumstance than inside slots within a container, so Array::RefElem should actually be 100% sufficient for that. The advantage of RefElem is that it only exposes three extremely basic perlguts functions, so it is effectively guaranteed to never break and to be 100% backward and forward compatible – unlike every other non-core aliasing module I know of.

tinita | May 25, 2019 1:49 AM

Yeah, I read about the refaliasing feature.

For older perls I had a look at Data::Alias. But I didn't know about Array::RefElem before.
With "slot in a container" you mean, that I can only add an alias to an array or hash element?

I'm not yet sure that the implementation will be trivial, because at the time I parse and store the alias, it is stored in a temporary structure, which will be rearranged during the constructing process.

It's implemented like that because I provide callbacks for custom constructors. A custom constructor for a mapping will get the keys and values as an arrayref, so that it can a) receive the original order, b) be able to do something with keys that are not strings, and c) do something else.

I guess that problem has to be solved in any case - for refaliasing and for Array::RefElem.

tinita replied to comment from Aristotle | May 25, 2019 1:51 AM

My previous reply was a reply to your comment, Aristotle. Due to the commenting bug here that got lost.

Aristotle replied to comment from tinita | May 26, 2019 11:21 PM

> I’m not yet sure that the implementation will be trivial

Yes, I only meant trivial in the sense that you don’t have to resort to any trickery or low-level wizardry or XS or anything like that to create aliases of scalars to each other. A simple core-language construct is all you need. I did not mean to imply that the feature as a whole will be trivial to implement.

> With "slot in a container" you mean, that I can only add an alias to an array or hash element?

Exactly. It lets you make the value of an array index or hash key into an alias to another scalar, so that this scalar is then stored in multiple places and changing it in any of those places changes all the other places too, without any de-/referencing.

This is a limitation, of course, that the module only lets you do that with array/hash slots – but the upside is that despite being an XS module it’s completely zero magic and effectively unbreakable.

(Data::Alias is the opposite, it tries to get around all limitations and be as DWIMmy as possible, which has resulted in it being broken for long stretches of time. I used to be a fan, but that experience has turned me off, and I no longer love its interface either. Now that we have refaliasing in core, I really can’t recommend it for anything.)
