SUSE Hackweek Day 2 - YAML::PP !include

In this post I'll talk about what I hacked on during the second day of SUSE Hackweek.

Including files in YAML

YAML already allows reusing data with so-called anchors and aliases:

---
invoice:
  billing address: &address      # define anchor
    name: Santa Claus
    street: Santa Claus Lane
    city: North Pole
  shipping address: *address     # use alias

But sometimes you have data that you want to reuse in several YAML documents, or maybe your files are very big and it gets hard to edit them.

I was planning to implement !include for YAML::PP at some point.

Recently, Martin Barth approached me on IRC and asked me whether he would be able to read RAML files with YAML::PP.

RAML files are YAML 1.2, and the only non-standard tag they use is !include.

It was already possible to implement that with the previous version of YAML::PP, but the API for that is not yet stable and not documented.

Also, when including other files, you have to take care of several things, so it's better to have a standard plugin.

So I had started working on this a while before hackweek, but there were still some issues I had to think through, and I finished a working version this week.

You can now use includes in YAML::PP v0.017 and adjust them to your needs, if necessary. The plugin is called YAML::PP::Schema::Include.

Let's take the example from above:

--- # invoice.yaml
invoice:
  billing address: &address !include includes/santa-claus-address.yaml
  shipping address: *address

--- # includes/santa-claus-address.yaml
name: Santa Claus
street: Santa Claus Lane
city: North Pole

You can read that with the following code:

# create an instance of the Include class
my $include = YAML::PP::Schema::Include->new;

# Add it as a schema for YAML::PP
my $yp = YAML::PP->new( schema => ['JSON', $include] );

# let the $include instance know of the YAML::PP object
# I might remove the need for this eventually
$include->yp($yp);

my $invoice = $yp->load_file("invoice.yaml");

The filename specified in the !include tag is relative to the currently processed filename. You can also recursively include files in the included files.

Circular includes are detected, and YAML::PP will die when it runs into one.
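
To illustrate the idea, here is a minimal sketch of how such a check can work. It is not the actual YAML::PP code. It keeps a hash of files whose load has started but not yet finished, and dies when one of them shows up again. The names `%in_progress` and `load_with_includes` and the in-memory `%files` table (standing in for real files) are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for files on disk: filename => list of includes
my %files = (
    'a.yaml' => ['b.yaml'],
    'b.yaml' => ['c.yaml'],
    'c.yaml' => ['a.yaml'],              # points back to a.yaml
    'd.yaml' => ['e.yaml', 'e.yaml'],    # including a file twice is fine
    'e.yaml' => [],
);

my %in_progress;    # files whose load has started but not finished

sub load_with_includes {
    my ($filename) = @_;
    die "Circular include '$filename'\n" if $in_progress{$filename};
    $in_progress{$filename} = 1;
    my @loaded = ($filename);
    # use eval so the marker is removed even if a nested load dies
    my $ok = eval {
        push @loaded, load_with_includes($_) for @{ $files{$filename} };
        1;
    };
    my $error = $@;
    delete $in_progress{$filename};
    die $error unless $ok;
    return @loaded;
}

my @order = load_with_includes('d.yaml');

my $circular_error = '';
eval { load_with_includes('a.yaml'); 1 } or $circular_error = $@;

print "loaded: @order\n";
print "error: $circular_error";
```

Because the marker is deleted once a file is done, including the same file twice in a row is allowed. Only a file that includes itself, directly or indirectly, triggers the error.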

By default, absolute filenames and .. are forbidden, so the following won't work:

---
passwd: !include /etc/passwd
---
passwd: !include ../../../../../../etc/passwd

If you want to allow that, you have to instantiate the object like this:

my $include = YAML::PP::Schema::Include->new(
    allow_absolute => 1,
);

Another possibility is to specify the include path yourself, so that the included files will be searched for in this directory. You can even specify a list of directories:

my $include = YAML::PP::Schema::Include->new(
    paths => ['/path/one', '/path/two'],
);

The loader will iterate through the paths until it finds the specified file.
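
The lookup rules described so far can be sketched with core modules only. This is an illustration under assumptions, not the module's real code: `resolve_include` is a hypothetical helper, and the exact behaviour and error messages of YAML::PP::Schema::Include may differ.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper, not part of YAML::PP: find $filename in @$paths
sub resolve_include {
    my ($filename, $paths, $allow_absolute) = @_;
    if (File::Spec->file_name_is_absolute($filename)) {
        die "Absolute filenames not allowed\n" unless $allow_absolute;
        return $filename;
    }
    my @parts = File::Spec->splitdir($filename);
    my $has_updir = grep { $_ eq File::Spec->updir } @parts;
    if ($has_updir and not $allow_absolute) {
        die "'..' not allowed in '$filename'\n";
    }
    for my $dir (@$paths) {
        my $candidate = File::Spec->catfile($dir, @parts);
        return $candidate if -e $candidate;    # first match wins
    }
    die "File '$filename' not found\n";
}

# Two temporary directories; only the second one contains the file
my @paths  = (tempdir(CLEANUP => 1), tempdir(CLEANUP => 1));
my $target = File::Spec->catfile($paths[1], 'address.yaml');
open my $fh, '>', $target or die "Could not create '$target': $!";
print {$fh} "city: North Pole\n";
close $fh;

my $found = resolve_include('address.yaml', \@paths);

my @errors;
for my $bad ('/etc/passwd', '../../etc/passwd', 'missing.yaml') {
    eval { resolve_include($bad, \@paths); 1 } or push @errors, $@;
}

print "found: $found\n";
print "error: $_" for @errors;
```

The demo part assumes a Unix-like system, where /etc/passwd counts as an absolute filename.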

Apparently that's not enough for reading RAML files. The specification says that an include can also just load the raw content of files that are not RAML. It depends on the content of the file: only if it has a RAML directive will it be loaded as YAML.

Here is a simpler example of how you can handle the loading yourself:

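    my $include = YAML::PP::Schema::Include->new(
        loader => sub {
            # the callback receives a YAML::PP object and the full filename
            my ($yp, $filename) = @_;
            if ($filename =~ m/\.txt$/) {
                # open file and just return text
                open my $fh, '<', $filename
                    or die "Could not open '$filename': $!";
                local $/;    # slurp mode: read the whole file at once
                my $text = <$fh>;
                close $fh;
                return $text;
            }
            else {
                # default behaviour
                return $yp->load_file($filename);
            }
        },
    );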

All the details about finding the full filename and detecting circular includes will still be done for you.
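
For the RAML case, a loader could decide based on the file content instead of the extension. The following is only a sketch under one assumption: a file counts as RAML when its first line starts with `#%RAML`. The name `raml_loader` is made up for this example.

```perl
use strict;
use warnings;

# Sketch of a loader callback; it takes the same two arguments as the
# loader above. Assumption: a RAML file starts with a "#%RAML" line.
sub raml_loader {
    my ($yp, $filename) = @_;
    open my $fh, '<', $filename or die "Could not open '$filename': $!";
    my $first_line = <$fh>;
    if (defined $first_line and $first_line =~ m/^#%RAML\b/) {
        close $fh;
        return $yp->load_file($filename);    # parse the file as YAML
    }
    # anything else: return the raw content as a string
    local $/;    # slurp the rest of the file
    my $rest = <$fh>;
    close $fh;
    return join '', grep { defined } $first_line, $rest;
}
```

It would be passed to the constructor as `loader => \&raml_loader`.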

The only thing that does not work (yet) is this RAML syntax:

type: !include elements.xsd#Foo

I'm thinking of a way to get this working.

The implementation

It was a bit tricky to get this working.

An instance of YAML::PP, together with its helper objects, holds the state of the current parse. Including a file needs a fresh object, but that object should have the same configuration as the root object.

So I added a clone method to several YAML::PP::* classes, which creates a new object with the same options as the original. I think that's as efficient as it gets. I reuse the YAML::PP->schema object because it is static, and building it is the most expensive part of creating a completely new object.
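
The pattern can be sketched with a small stand-alone class. `My::Loader` and its fields are hypothetical and only mirror the idea described here, not the real YAML::PP internals: options are copied, the schema is shared, and the parsing state starts out empty.

```perl
use strict;
use warnings;

# Hypothetical class for illustration; not the real YAML::PP internals
package My::Loader;

sub new {
    my ($class, %args) = @_;
    return bless {
        schema  => $args{schema},                     # expensive to build, static
        options => { %{ $args{options} || {} } },     # shallow copy
        stack   => [],                                # per-document parsing state
    }, $class;
}

sub clone {
    my ($self) = @_;
    return (ref $self)->new(
        schema  => $self->{schema},     # shared with the original
        options => $self->{options},    # copied inside new()
    );
}

package main;

my $root = My::Loader->new(
    schema  => { tags => ['!include'] },
    options => { indent => 2 },
);
push @{ $root->{stack} }, 'invoice.yaml';    # root is in the middle of parsing

my $fresh = $root->clone;
printf "same schema: %s\n", $fresh->{schema} == $root->{schema} ? 'yes' : 'no';
printf "fresh state: %s\n", @{ $fresh->{stack} } ? 'no' : 'yes';
```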
