Reusing data with YAML Anchors, Aliases and Merge Keys
I just added a feature called "Merge Keys" to YAML::PP. No Perl YAML module supports this so far. With it, you can merge mappings defined elsewhere in your YAML document into other mappings. Here is a short example:
---
defaults: &defaults
  A: 1
  B: 2
mapping:
  << : *defaults
  A: 23
  C: 99
# same as:
mapping:
  B: 2
  A: 23
  C: 99
If you don't know the &defaults/*defaults notation, here follows a basic introduction before I explain the merge keys.
YAML Anchors and Aliases
The &name is an anchor. It can be added to any mapping, sequence or scalar.
It's similar to references in Perl 5.
Later in the YAML document you can refer to a previously defined anchor with an alias *name, like a Perl reference.
---
invoice:
  billing address: &address1
    name: Santa Claus
    street: Santa Claus Lane
    city: North Pole
  shipping address: *address1
You can also use it for scalars:
---
name1: &name Larry Wall
name2: *name
The YAML specification requires that the loaded data is aliased, so if you change $data->{name1}, $data->{name2} would change too. This is implemented differently in Perl's YAML modules.
YAML::XS is currently the only module that can do this for scalars, because it uses real Perl aliases.
YAML.pm, YAML::Syck and YAML::PP only use references, so it works only for data which are references.
I want to add real aliases to YAML::PP in the future, but it's not a top priority.
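To make the difference concrete, here is a minimal sketch in plain Perl. It does not call any YAML module; the data structures are built by hand as a model of what a reference-based loader hands back, and the changed values are made up for illustration.

```perl
use strict;
use warnings;

# Hand-built model of:
#   billing address: &address1 { ... }
#   shipping address: *address1
# A reference-based loader stores the same hashref in both slots.
my %address = ( name => 'Santa Claus', city => 'North Pole' );
my $invoice = {
    'billing address'  => \%address,
    'shipping address' => \%address,    # same reference, not a copy
};

# Hand-built model of:
#   name1: &name Larry Wall
#   name2: *name
# A plain scalar is copied, so the two slots are independent.
my $name  = 'Larry Wall';
my $names = { name1 => $name, name2 => $name };

$invoice->{'billing address'}{city} = 'Rovaniemi';
$names->{name1} = 'Someone Else';

printf "shipping city: %s\n", $invoice->{'shipping address'}{city};  # Rovaniemi
printf "name2: %s\n",         $names->{name2};                       # Larry Wall
```

The change to the billing address is visible through the shipping address, but the change to name1 does not reach name2. Real aliases would make the scalar case behave like the hash case.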
YAML Merge Keys
Merge Keys were added for YAML 1.1.
This is the example from the specification:
---
- &CENTER { x: 1, y: 2 }
- &LEFT { x: 0, y: 2 }
- &BIG { r: 10 }
- &SMALL { r: 1 }
# All the following maps are equal:
- # Explicit keys
  x: 1
  y: 2
  r: 10
  label: center/big
- # Merge one map
  << : *CENTER
  r: 10
  label: center/big
- # Merge multiple maps
  << : [ *CENTER, *BIG ]
  label: center/big
- # Override
  << : [ *BIG, *LEFT, *SMALL ]
  x: 1
  label: center/big
The specification doesn't say anything about the location of the merge key. In the example it is always on top, and looking at already existing implementations, it actually doesn't matter where it is located. Actual values in the mapping will always override the merged ones.
The merge value does not have to be an alias, so the "Merge one map" could theoretically also look like this:
- # Merge one map
  << : { x: 1, y: 2 }
  r: 10
  label: center/big
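The precedence rules can be sketched in a few lines of plain Perl. The helper merge_mapping is hypothetical and not part of any YAML module. It assumes the rule from the YAML 1.1 merge key definition that, in a list of merged maps, earlier maps override later ones, and that explicit keys override everything merged.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any YAML module.
# $sources:  arrayref of hashrefs given as the value of "<<"
# $explicit: hashref of the keys written directly in the mapping
sub merge_mapping {
    my ($sources, $explicit) = @_;
    my %result;
    # Walk the merge list back to front so that earlier maps win.
    for my $source (reverse @$sources) {
        %result = (%result, %$source);
    }
    # Explicit keys always override merged ones.
    return { %result, %$explicit };
}

my %CENTER = ( x => 1, y => 2 );
my %LEFT   = ( x => 0, y => 2 );
my %BIG    = ( r => 10 );
my %SMALL  = ( r => 1 );

# - # Override
#   << : [ *BIG, *LEFT, *SMALL ]
#   x: 1
#   label: center/big
my $override = merge_mapping(
    [ \%BIG, \%LEFT, \%SMALL ],
    { x => 1, label => 'center/big' },
);

print join(', ', map { "$_=$override->{$_}" } sort keys %$override), "\n";
# label=center/big, r=10, x=1, y=2
```

Here r comes from BIG (it is listed before SMALL), y comes from LEFT, and the explicit x: 1 replaces the x: 0 merged from LEFT, so the result equals the "Explicit keys" map.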
You might wonder how you would use a literal mapping key containing the string << that is not a merge key. You can do that by quoting it. A merge key works like other special values. Compare the following theoretical examples:
---
implicit boolean: true
explicit boolean: !!bool true
string: 'true'
implicit null: null
explicit null: !!null 'null'
string: 'null'
implicit merge key: <<
explicit merge key: !!merge '<<'
string: '<<'
That means a merge key is created with the tag !!merge, but it is automatically resolved if the string is <<.
Once you know how to use it, it's a pretty convenient syntax. In other languages where we have YAML processors that support this, merge keys are widely used.
Unfortunately the implementation is not trivial. It would be trivial for a basic processor where most things are hardcoded.
In YAML::PP I am providing support for your own callbacks that handle loading mappings. Additionally I have to add a possibility to handle duplicate keys, since they should be forbidden by default.
That means, a generic implementation requires some work.
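To illustrate why this takes some care, here is a rough sketch in plain Perl of a mapping constructor that forbids duplicate keys but still lets explicit keys override merged ones. Everything in it is hypothetical: construct_mapping is not the YAML::PP API, and the merge key is modelled simply as the string '<<' with a hashref value, whereas a real loader would decide based on the resolved tag.

```perl
use strict;
use warnings;

# Hypothetical mapping constructor; it does not use the YAML::PP API.
# $pairs is a flat arrayref [ key, value, key, value, ... ] in document order.
sub construct_mapping {
    my ($pairs) = @_;
    my (%explicit, %merged);
    for (my $i = 0; $i < @$pairs; $i += 2) {
        my ($key, $value) = @$pairs[ $i, $i + 1 ];
        if ($key eq '<<' and ref $value eq 'HASH') {
            # Merged keys are collected separately, wherever "<<" appears.
            %merged = (%$value, %merged);
            next;
        }
        # Only keys written directly in the mapping count as duplicates.
        die "Duplicate key '$key'\n" if exists $explicit{$key};
        $explicit{$key} = $value;
    }
    # Explicit keys override merged ones without raising an error.
    return { %merged, %explicit };
}

# The merge key sits in the middle here; the result is the same as on top.
my $map = construct_mapping([ A => 23, '<<' => { A => 1, B => 2 }, C => 99 ]);
print join(', ', map { "$_=$map->{$_}" } sort keys %$map), "\n";
# A=23, B=2, C=99

my $ok = eval { construct_mapping([ A => 1, A => 2 ]); 1 };
print "rejected: $@" unless $ok;
# rejected: Duplicate key 'A'
```

The point of the sketch is only that merged keys and explicit keys need to be tracked separately, otherwise every override would look like a duplicate.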
Additionally, most other YAML tags only operate on the node itself, for example !!null, !!int etc. The !!merge tag changes the behaviour of the parent mapping, which is why I had to add special handling for it.
The inner API for this is not finished, but you can use merge keys since YAML::PP v0.015:
my $yp = YAML::PP->new( schema => [qw/ JSON Merge /] );
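A slightly fuller sketch, loading the example from the top of this post. It assumes YAML::PP v0.015 or later is installed and uses only the constructor shown above plus load_string.

```perl
use strict;
use warnings;
use YAML::PP;

my $yaml = <<'EOM';
---
defaults: &defaults
  A: 1
  B: 2
mapping:
  << : *defaults
  A: 23
  C: 99
EOM

# The Merge schema has to be enabled explicitly.
my $yp   = YAML::PP->new( schema => [qw/ JSON Merge /] );
my $data = $yp->load_string($yaml);

for my $key (sort keys %{ $data->{mapping} }) {
    print "$key: $data->{mapping}{$key}\n";
}
```

This should print A: 23, B: 2 and C: 99, one per line, matching the "same as" mapping in the first example.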
Feedback welcome, and happy hacking!
> Merge Keys were added for YAML 1.1
I was under the impression that these were a "proposed extension" and not actually added to the 1.1 spec, is that not correct?
And have you looked at a patch to add this feature to YAML::XS?
That's right, they are not "really" official, and Ingy would actually like to get rid of them, but they were implemented in several YAML processors, and people like to use them, so I decided to implement them optionally.
Adding it to YAML::XS would be possible I guess. Just a bit of work because of, well, XS... And Ingy should be ok with adding it before I start doing any work.
If you want to get some of the speed of C, you could also try YAML::PP::LibYAML, where you could use the merge key feature already.
Let me know if you have more questions or suggestions (preferably via github, email or IRC because I don't get notifications here on blogs.perl.org).
Oh, actually, I *do* get comment notifications now. Nice!
On newer Perls with refaliasing, it should be trivial to implement scalar aliases. YAML::PP could use that.
On older Perls, Perl-based but non-pure-Perl YAML implementations can use Array::RefElem as a lightweight form of the refaliasing feature. Its drawback compared to refaliasing is that it only works on a slot in a container – which is a limitation when used in Perl code for aliasing scalars, but I think there is no possibility of encountering aliases in YAML in any other circumstance than inside slots within a container, so Array::RefElem should actually be 100% sufficient for that. The advantage of RefElem is that it only exposes three extremely basic perlguts functions, so it is effectively guaranteed to never break and to be 100% backward and forward compatible – unlike every other non-core aliasing module I know of.
Yeah, I read about the refaliasing feature.
For older perls I had a look at Data::Alias. But I didn't know about Array::RefElem before.
By "slot in a container" you mean that I can only add an alias to an array or hash element?
I'm not yet sure that the implementation will be trivial, because at the time I parse and store the alias, it is stored in a temporary structure, which will be rearranged during the constructing process.
It's implemented like that because I provide callbacks for custom constructors. A custom constructor for a mapping will get the keys and values as an arrayref, so that it can a) receive the original order, b) be able to do something with keys that are not strings, and c) do something else.
I guess that problem has to be solved in any case - for refaliasing and for Array::RefElem.
My previous reply was a reply to your comment, Aristotle. Due to the commenting bug here, that got lost.
Yes, I only meant trivial in the sense that you don’t have to resort to any trickery or low-level wizardry or XS or anything like that to create aliases of scalars to each other. A simple core-language construct is all you need. I did not mean to imply that the feature as a whole will be trivial to implement.
Exactly. It lets you make the value of an array index or hash key into an alias to another scalar, so that this scalar is then stored in multiple places and changing it in any of those places changes all the other places too, without any de-/referencing.
This is a limitation, of course, that the module only lets you do that with array/hash slots – but the upside is that despite being an XS module it’s completely zero magic and effectively unbreakable.
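For reference, a minimal sketch of aliasing a hash slot. It uses the core refaliasing feature (experimental, Perl 5.22 or later) rather than Array::RefElem, and the hash and its values are made up for illustration.

```perl
use v5.22;
use strict;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

my %data = ( name1 => 'Larry Wall' );

# Make the name2 slot an alias of the scalar stored in the name1 slot.
\$data{name2} = \$data{name1};

# Both keys now refer to one and the same scalar.
$data{name1} = 'Someone Else';
print "$data{name2}\n";    # Someone Else
```

No dereferencing is involved when reading or writing either slot, which is what a loader would need for scalar aliases like name1/name2.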
(Data::Alias is the opposite, it tries to get around all limitations and be as DWIMmy as possible, which has resulted in it being broken for long stretches of time. I used to be a fan, but that experience has turned me off, and I no longer love its interface either. Now that we have refaliasing in core, I really can’t recommend it for anything.)