Reusing data with YAML Anchors, Aliases and Merge Keys

I just added a feature called "Merge Keys" to YAML::PP. No other Perl YAML module supports this so far. With merge keys you can merge mappings defined elsewhere in your YAML document into other mappings. Here is a short example:

defaults: &defaults
  A: 1
  B: 2
mapping:
  << : *defaults
  A: 23
  C: 99

# same as:
mapping:
  B: 2
  A: 23
  C: 99

If you don't know the &defaults/*defaults notation, here follows a basic introduction before I explain the merge keys.

YAML Anchors and Aliases

The &name is an anchor. It can be added to any mapping, sequence or scalar. It's similar to references in Perl 5.

Later in the YAML document you can refer to a previously defined anchor with an alias *name, much like using a Perl reference a second time.

  billing address: &address1
    name: Santa Claus
    street: Santa Claus Lane
    city: North Pole
  shipping address: *address1

You can also use it for scalars:

name1: &name Larry Wall
name2: *name

The YAML specification requires that the loaded data is aliased, so if you change $data->{name1}, $data->{name2} changes too. This is implemented differently in Perl's YAML modules. YAML::XS is currently the only module that can do this for scalars, because it uses real Perl aliases. YAML::Syck and YAML::PP only use references, so it works only for data that are references (mappings and sequences).
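The difference is plain Perl behaviour and can be seen without any YAML module. This is a minimal sketch, assuming a loader that stores the same reference for aliased mappings and copies the value for aliased scalars. The hash below is built by hand for illustration and is not the output of a particular module:

```perl
use strict;
use warnings;

# Hand-built stand-in for loaded data; not produced by a YAML module.
my %address = ( name => 'Santa Claus', city => 'North Pole' );
my $data = {
    'billing address'  => \%address,    # anchor &address1
    'shipping address' => \%address,    # alias *address1: the same reference
    name1              => 'Larry Wall', # anchor &name
    name2              => 'Larry Wall', # alias *name: only a copy of the string
};

# Both keys point at one hash, so a change shows up under both.
$data->{'billing address'}{city} = 'Lapland';
print "shipping city: ", $data->{'shipping address'}{city}, "\n";

# Without real aliases the two scalars are independent copies.
$data->{name1} = 'Someone Else';
print "name2: ", $data->{name2}, "\n";
```

The shipping city follows the change, while name2 keeps its old value. A module with real scalar aliases would update name2 as well.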

I want to add real aliases to YAML::PP in the future, but it's not a top priority.

YAML Merge Keys

Merge Keys were added for YAML 1.1.

This is the example from the specification:

- &CENTER { x: 1, y: 2 }
- &LEFT { x: 0, y: 2 }
- &BIG { r: 10 }
- &SMALL { r: 1 }

# All the following maps are equal:

- # Explicit keys
  x: 1
  y: 2
  r: 10
  label: center/big

- # Merge one map
  << : *CENTER
  r: 10
  label: center/big

- # Merge multiple maps
  << : [ *CENTER, *BIG ]
  label: center/big

- # Override
  << : [ *BIG, *LEFT, *SMALL ]
  x: 1
  label: center/big

The specification doesn't say anything about the location of the merge key. In the example it is always on top, and looking at already existing implementations, it actually doesn't matter where it is located. Actual values in the mapping will always override the merged ones.
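The override rules can be sketched in plain Perl on data that is already loaded. The helper name merge_mapping is hypothetical and this is not how YAML::PP implements the feature. The rule that earlier maps in a merge sequence win over later ones comes from the merge key specification rather than from this post:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of YAML::PP: resolve a '<<' entry in a
# hash that has already been loaded into plain Perl data.
sub merge_mapping {
    my ($map) = @_;
    my %explicit = %$map;
    my $merge    = delete $explicit{'<<'};
    return \%explicit unless defined $merge;

    # The merge value is either one mapping or a sequence of mappings.
    my @sources = ref($merge) eq 'ARRAY' ? @$merge : ($merge);

    my %result;
    # Apply sources last to first, so earlier ones override later ones.
    for my $source ( reverse @sources ) {
        %result = ( %result, %$source );
    }

    # Explicit keys win over everything that was merged in.
    return { %result, %explicit };
}

my %left  = ( x => 0, y => 2 );
my %big   = ( r => 10 );
my %small = ( r => 1 );

# Same shape as the "Override" entry in the specification example.
my $merged = merge_mapping({
    '<<'  => [ \%big, \%left, \%small ],
    x     => 1,
    label => 'center/big',
});

print join( ', ', map { "$_=$merged->{$_}" } sort keys %$merged ), "\n";
# prints: label=center/big, r=10, x=1, y=2
```

Because the '<<' entry is taken out of the hash before anything else happens, its position among the other keys has no influence on the result.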

The merge value does not have to be an alias, so the "Merge one map" could theoretically also look like this:

- # Merge one map
  << : { x: 1, y: 2 }
  r: 10
  label: center/big

You might wonder how you would use a literal mapping key containing the string << that is not a merge key. You can do that by quoting it. A merge key works like other special values. Compare the following theoretical examples:

implicit boolean:   true
explicit boolean:   !!bool true
string:             'true'
implicit null:      null
explicit null:      !!null 'null'
string:             'null'
implicit merge key: <<
explicit merge key: !!merge '<<'
string:             '<<'

That means a merge key is created with the tag !!merge, but it is automatically resolved if the string is <<.

Once you know how to use it, it's a pretty convenient syntax. In other languages with YAML processors that support them, merge keys are widely used.

Unfortunately the implementation is not trivial. It would be trivial for a basic processor where most things are hardcoded.

In YAML::PP I am providing support for your own callbacks that handle loading mappings. Additionally I have to add a possibility to handle duplicate keys, since they should be forbidden by default.

That means, a generic implementation requires some work.

Additionally, most other YAML tags only operate on the node itself, for example !!null, !!int etc. The !!merge tag changes the behaviour of the parent mapping, which is why I had to add special handling for it.

The internal API for this is not finished, but you can use merge keys since YAML::PP v0.015:

my $yp = YAML::PP->new( schema => [qw/ JSON Merge /] );
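A fuller sketch of how that constructor might be used, assuming YAML::PP 0.015 or later is installed from CPAN. The key name mapping and the variable names are only illustrative:

```perl
use strict;
use warnings;
use YAML::PP;    # assumption: version 0.015 or later is installed

my $yaml = <<'EOM';
defaults: &defaults
  A: 1
  B: 2
mapping:
  << : *defaults
  A: 23
  C: 99
EOM

# The Merge schema has to be enabled explicitly, next to the JSON schema.
my $yp   = YAML::PP->new( schema => [qw/ JSON Merge /] );
my $data = $yp->load_string($yaml);

for my $key ( sort keys %{ $data->{mapping} } ) {
    printf "%s: %s\n", $key, $data->{mapping}{$key};
}
```

If merging behaves as described above, the loop should list A as 23, B as 2 and C as 99, with no << key left in the loaded hash.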

Feedback welcome, and happy hacking!


> Merge Keys were added for YAML 1.1

I was under the impression that these were a "proposed extension" and not actually added to the 1.1 spec, is that not correct?

And have you looked at a patch to add this feature to YAML::XS?

On newer Perls with refaliasing, it should be trivial to implement scalar aliases. YAML::PP could use that.

On older Perls, Perl-based but non-pure-Perl YAML implementations can use Array::RefElem as a lightweight form of the refaliasing feature. Its drawback compared to refaliasing is that it only works on a slot in a container – which is a limitation when used in Perl code for aliasing scalars, but I think there is no possibility of encountering aliases in YAML in any other circumstance than inside slots within a container, so Array::RefElem should actually be 100% sufficient for that. The advantage of RefElem is that it only exposes three extremely basic perlguts functions, so it is effectively guaranteed to never break and to be 100% backward and forward compatible – unlike every other non-core aliasing module I know of.

> I’m not yet sure that the implementation will be trivial

Yes, I only meant trivial in the sense that you don’t have to resort to any trickery or low-level wizardry or XS or anything like that to create aliases of scalars to each other. A simple core-language construct is all you need. I did not mean to imply that the feature as a whole will be trivial to implement.

> With "slot in a container" you mean, that I can only add an alias to an array or hash element?

Exactly. It lets you make the value of an array index or hash key into an alias to another scalar, so that this scalar is then stored in multiple places and changing it in any of those places changes all the other places too, without any de-/referencing.

This is a limitation, of course, that the module only lets you do that with array/hash slots – but the upside is that despite being an XS module it’s completely zero magic and effectively unbreakable.

(Data::Alias is the opposite, it tries to get around all limitations and be as DWIMmy as possible, which has resulted in it being broken for long stretches of time. I used to be a fan, but that experience has turned me off, and I no longer love its interface either. Now that we have refaliasing in core, I really can’t recommend it for anything.)

About tinita

just another perl punk,