SUSE Hackweek Day 2 - YAML::PP !include
In this post I'll talk about what I hacked on during the second day of SUSE Hackweek.
Including files in YAML
YAML already allows reusing data with so-called anchors and aliases:
---
invoice:
  billing address: &address # define anchor
    name: Santa Claus
    street: Santa Claus Lane
    city: North Pole
  shipping address: *address # use alias
But sometimes you have data that you want to reuse in several YAML documents, or maybe your files are very big and it gets hard to edit them.
I was planning to implement !include for YAML::PP at some point.
Recently, Martin Barth approached me on IRC and asked me whether he would be able to read RAML files with YAML::PP.
RAML files are YAML 1.2, and the only non-standard tag they use is !include.
It was already possible to implement that with the previous version of YAML::PP, but the API for that is not yet stable and not documented.
Also, when including other files, you have to take care of several things, so it's better to have a standard plugin.
I had started working on this a while before Hackweek, but there were still some issues I had to think about, and I created a working version this week.
You can now use includes in YAML::PP v0.017 and adjust them to your needs, if necessary. The plugin is called YAML::PP::Schema::Include.
Let's take the example from above:
--- # invoice.yaml
invoice:
  billing address: &address !include includes/santa-claus-address.yaml
  shipping address: *address
--- # includes/santa-claus-address.yaml
name: Santa Claus
street: Santa Claus Lane
city: North Pole
You can read that with the following code:
use YAML::PP;
use YAML::PP::Schema::Include;
# create an instance of the Include class
my $include = YAML::PP::Schema::Include->new;
# Add it as a schema for YAML::PP
my $yp = YAML::PP->new( schema => ['JSON', $include] );
# let the $include instance know of the YAML::PP object
# I might remove the need for this eventually
$include->yp($yp);
my $invoice = $yp->load_file("invoice.yaml");
The filename specified in the !include tag is relative to the currently processed file. You can also recursively include files in the included files.
Circular includes are detected, and YAML::PP will die when it encounters one.
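To illustrate the idea behind circular include detection, here is a minimal sketch in plain Perl. It is not the code YAML::PP uses; the %includes table and the resolve_includes function are hypothetical and only stand in for real files that include each other.

```perl
use strict;
use warnings;

# Hypothetical in-memory "files": each name maps to the names it includes.
my %includes = (
    'invoice.yaml' => ['address.yaml'],
    'address.yaml' => [],
    'a.yaml'       => ['b.yaml'],
    'b.yaml'       => ['a.yaml'],
);

# Walk the includes depth-first. @chain holds the files that are
# currently being processed; meeting one of them again is a cycle.
sub resolve_includes {
    my ($file, @chain) = @_;
    die "Circular include: @chain $file\n"
        if grep { $_ eq $file } @chain;
    my @loaded = ($file);
    for my $child (@{ $includes{$file} || [] }) {
        push @loaded, resolve_includes($child, @chain, $file);
    }
    return @loaded;
}
```

Calling resolve_includes('invoice.yaml') returns both filenames, while resolve_includes('a.yaml') dies because a.yaml shows up again in its own chain.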
By default, absolute filenames and .. are forbidden, so the following won't work:
---
passwd: !include /etc/passwd
---
passwd: !include ../../../../../../etc/passwd
If you want to allow that, you have to instantiate the object like this:
my $include = YAML::PP::Schema::Include->new(
    allow_absolute => 1,
);
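What such a check could look like is sketched below with the core module File::Spec. This is an assumption about the general approach, not the plugin's actual implementation; check_include_path is a hypothetical helper name, and only the allow_absolute option is taken from the plugin.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: reject absolute filenames and any '..' component
# unless the caller explicitly allows them.
sub check_include_path {
    my ($filename, %opt) = @_;
    return 1 if $opt{allow_absolute};
    die "Absolute filenames not allowed: $filename\n"
        if File::Spec->file_name_is_absolute($filename);
    my (undef, $dirs, $file) = File::Spec->splitpath($filename);
    die "'..' not allowed: $filename\n"
        if grep { $_ eq File::Spec->updir } File::Spec->splitdir($dirs), $file;
    return 1;
}
```

With this sketch, includes/santa-claus-address.yaml passes, while both /etc/passwd and ../../../../../../etc/passwd die unless allow_absolute => 1 is passed.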
Another possibility is to specify the include path yourself, so that the included files will be searched for in this directory. You can even specify a list of directories:
my $include = YAML::PP::Schema::Include->new(
    paths => ['/path/one', '/path/two'],
);
The loader will iterate through the paths until it finds the specified file.
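The lookup can be pictured roughly like this. It is a sketch under the assumption that the first match wins; find_in_paths is a hypothetical helper, not a function of YAML::PP.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first existing candidate for $filename
# among the given directories, or undef if no directory contains it.
sub find_in_paths {
    my ($filename, @paths) = @_;
    for my $dir (@paths) {
        my $candidate = File::Spec->catfile($dir, $filename);
        return $candidate if -f $candidate;
    }
    return;
}
```

The order of the directories matters here: if the file exists in both /path/one and /path/two, the copy in /path/one is returned.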
Apparently that's not enough for reading RAML files. According to the specification, an include can also load the raw content of files that are not RAML; it depends on the content of the file. Only if the file has a RAML directive will it be loaded as YAML.
Here is a simpler example of how you can handle the loading yourself:
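my $include = YAML::PP::Schema::Include->new(
    loader => sub {
        # the loader receives the YAML::PP object and the full filename
        my ($yp, $filename) = @_;
        if ($filename =~ m/\.txt$/) {
            # open file and just return text
            open my $fh, '<', $filename
                or die "Could not open '$filename': $!";
            my $text = do { local $/; <$fh> };
            close $fh;
            return $text;
        }
        else {
            # default behaviour
            return $yp->load_file($filename);
        }
    },
);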
All the details about finding the full filename and detecting circular includes will still be done for you.
The only thing not (yet) working with this is this RAML syntax:
type: !include elements.xsd#Foo
I'm thinking of a way to get this working.
The implementation
It was a bit tricky to get this working.
An instance of YAML::PP and its helper objects keep the state of parsing. We need a fresh object for including a file, but this object should have the same configuration as the root object.
So I added a clone method to several YAML::PP::* classes, which creates new objects with the same options as the original objects. I think that's as efficient as it gets. I reuse the YAML::PP->schema object, because it is static, and building it is the part of creating a completely new object that costs the most time.
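The cloning idea can be sketched with a small stand-alone class. My::Loader and its fields are hypothetical and do not mirror YAML::PP's internals; the sketch only shows the pattern of copying the options, sharing the static schema and starting with empty parsing state.

```perl
use strict;
use warnings;

package My::Loader;    # hypothetical class, not part of YAML::PP

sub new {
    my ($class, %args) = @_;
    return bless {
        schema  => $args{schema},                     # static, expensive to build
        options => { %{ $args{options} || {} } },     # shallow copy of the options
        state   => [],                                # per-parse state, never shared
    }, $class;
}

# Create a new object with the same configuration but fresh parsing state.
sub clone {
    my ($self) = @_;
    return (ref $self)->new(
        schema  => $self->{schema},     # reuse the very same schema object
        options => $self->{options},    # copied again inside new()
    );
}

package main;
```

A clone created in the middle of loading one file can then load an included file without disturbing the state of the outer object.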