Your own template engine in 4 flavors. With Benchmarks!

This time on blog I'll show you how to write your own template engine - with syntax and behavior tailored for your needs. And we'll do it in four different ways to analyze pros and cons of each approach as well as code speed and complexity. Our sample task for today is to compose password reminder text for user, which can then be sent by email.

use v6;

my $template = q{
    Hi [VARIABLE person]!

    You can change your password by visiting [VARIABLE link] .
    
    Best regards.
};

my %fields = (
    'person' => 'John',
    'link' => 'http://example.com'
);

So we decided how our template syntax should look like and for starter we'll do trivial variables (although that's not very precise name because variables in templates are almost always immutable).
We also have data to populate template fields. Let's get started!

1. Substitutions

sub substitutions ( $template is copy, %fields ) {
    for %fields.kv -> $key, $value {
        $template ~~ s:g/'[VARIABLE ' $key ']'/$value/;
    }
    return $template;
}

say substitutions($template, %fields);

Yay, works:

    Hi John!

    You can change your password by visiting http://example.com .

    Best regards.

Now it is time to benchmark it to get some baseline for different approaches:

use Bench;

my $template_short = $template;
my %fields_short = %fields;

my $template_long = join(
    ' lorem ipsum ', map( { '[VARIABLE ' ~ $_ ~ ']' }, 'a' .. 'z')
) x 100;
my %fields_long = ( 'a' .. 'z' ) Z=> ( 'lorem ipsum' xx * );

my $b = Bench.new;
$b.timethese(
    1000,
    {
        'substitutions_short' => sub {
            substitutions( $template_short, %fields_short )
        },
        'substitutions_long' => sub {
            substitutions( $template_long, %fields_long )
        },
    }
);

Benchmark in this post will tests two cases for each approach. Our template from example is "short" case. And there is "long" case with 62KB template containing 2599 text fragments and 2600 variables filled by 26 fields. So here are the results:

Timing 1000 iterations of substitutions_long, substitutions_short...
substitutions_long: 221.1147 wallclock secs @ 4.5225/s (n=1000)
substitutions_short: 0.1962 wallclock secs @ 5097.3042/s (n=1000)

Whoa! That is a serious penalty for long templates. And the reason for that is because this code has three serious flaws - original template is destroyed during variables evaluation and therefore it must be copied each time we want to reuse it, template text is parsed multiple times and output is rewritten every time after populating each variable. But we can do better...

2. Substitution

sub substitution ( $template is copy, %fields ) {
    $template ~~ s:g/'[VARIABLE ' (\w+) ']'/{ %fields{$0} }/;
    return $template;
}

This time we have single substitution. Variable name is captured and we can use it to get field value on the fly. Benchmarks:

Timing 1000 iterations of substitution_long, substitution_short...
substitution_long: 71.6882 wallclock secs @ 13.9493/s (n=1000)
substitution_short: 0.1359 wallclock secs @ 7356.3411/s (n=1000)

Mediocre boost. We have less penalty on long templates because text is not parsed multiple times. However remaining flaws from previous approach still apply and regexp engine still must do plenty of memory reallocations for each piece of template text replaced.

Also it won't allow our template engine to gain new features - like conditions or loops - in the future because it is very hard to parse nested tags in single regexp. Time for completely different approach...

3. Grammars and direct Actions

If you are not familiar with Perl 6 grammars and Abstract Syntax Tree concept you should study official documentation first.

grammar Grammar {
    regex TOP { ^ [  |  ]* $ }
    regex text { <-[ [ ] >+ }
    regex variable { '[VARIABLE ' $=(\w+) ']' }
}

class Actions {
    
    has %.fields is required;
    
    method TOP ( $/ ) {
        make [~]( map { .made }, $/{'chunk'} );
    }
    method text ( $/ ) {
        make ~$/;
    }
    method variable ( $/ ) {
        make %.fields{$/{'name'}};
    }
    
}

sub grammar_actions_direct ( $template, %fields ) {
    my $actions = Actions.new( fields => %fields );
    return Grammar.parse($template, :$actions).made;
}

The most important thing is defining our template syntax as a grammar. Grammar is just a set of named regular expressions that can call each other. On "TOP" (where parsing starts) we see that our template is composed of chunks. Each chunk can be text or variable. Regexp for text matches everything until it hits variable start ('[' character, let's assume it is forbidden in text to make things more simple). Regexp for variable should look familiar from previous approaches, however now we capture variable name in named way instead of positional.

Action class has methods that are called whenever regexp with corresponding name is matched. When called, method gets match object ($/) from this regexp and can "make" something from it. This "made" something will be seen by upper level method when it is called. For example our "TOP" regexp calls "text" regexp which matches "Hi " part of template and calls "text" method. This "text" method just "make"s this matched string for later use. Then "TOP" regexp calls "variable" regexp which matches "[VARIABLE name]" part of template. Then "variable" method is called and it checks in match object for variable name and "makes" value of this variable from %fields hash for later use. This continues until end of template string. Then "TOP" regexp is matched and "TOP" method is called. This "TOP" method can access array of text or variable "chunks" in match object and see what was "made" for those chunks earlier. So all it has to do is to "make" those values concatenated together. And finally we get this "made" template from "parse" method. So let's look at benchmarks:

Timing 1000 iterations of grammar_actions_direct_long, grammar_actions_direct_short...
grammar_actions_direct_long: 149.5412 wallclock secs @ 6.6871/s (n=1000)
grammar_actions_direct_short: 0.2405 wallclock secs @ 4158.1981/s (n=1000)

We got rid of two more flaws from previous approaches. Original template is not destroyed when fields are filled and that means less memory copying. There is also no reallocation of memory during substitution of each field because now every action method just "make"s strings to be joined later. And we can easily extend our template syntax by adding loops, conditions and more features just by throwing some regexps into grammar and defining corresponding behavior in actions. Unfortunately we see some performance regression and this happens because every time template is processed it is parsed, match objects are created, parse tree is built and it has to track all those "make"/"made" values when it is collapsed to final output. But that was not our final word...

4. Grammars and closure Actions

Finally we reached "boss level", where we have to exterminate last and greatest flaw - re-parsing.
The idea is to use grammars and actions like in previous approach, but this time instead of getting direct output we want to generate executable and reusable code that works like this under the hood:

sub ( %fields ) {
    return join '',
        sub ( %fields ) { return "Hi "}.( %fields ),
        sub ( %fields ) { return %fields{'person'} }.( %fields ),
        ...
}

That's right, we will be converting our template body to a cascade of subroutines.
Each time this cascade is called it will get and propagate %fields to deeper subroutines.
And each subroutine is responsible for handling piece of template matched by single regexp in grammars. We can reuse grammar from previous approach and modify only actions:

class Actions {
    
    method TOP ( $/ ) {
        my @chunks = $/{'chunk'};
        make sub ( %fields ) {
           return [~]( map { .made.( %fields ) }, @chunks );
        };
    }
    method text ( $/ ) {
        my $text = ~$/;
        make sub ( %fields ) {
            return $text;
        };
    }
    method variable ( $/ ) {
        my $name = $/{'name'};
        make sub ( %fields  ) {
            return %fields{$name}
        };
    }
    
}

sub grammar_actions_closures ( $template, %fields ) {
    state %cache{Str};
    my $closure = %cache{$template} //= Grammar.parse(
        $template, actions => Actions.new
    ).made;
    return $closure( %fields );
}

Now every action method instead of making final output makes a subroutine that will get %fields and do final output later. To generate this cascade of subroutines template must be parsed only once. Once we have it we can call it with different set of %fields to populate in our template variables. Note how Object Hash %cache is used to determine if we already have subroutines tree for given $template. Enough talking, let's crunch some numbers:

Timing 1000 iterations of grammar_actions_closures_long, grammar_actions_closures_short...
grammar_actions_closures_long: 22.0476 wallclock secs @ 45.3563/s (n=1000)
grammar_actions_closures_short: 0.0439 wallclock secs @ 22778.8885/s (n=1000)

Nice result! We have extensible template engine that is 4 times faster for short templates and 10 times faster for long templates than our initial approach. And yes, there is bonus level...

4.1. Grammars and closure Actions in parallel

Last approach opened a new optimization possibility. If we have subroutines that will generate our template why not run them in parallel? So let's modify our action "TOP" method to process text and variable chunks simultaneously:

method TOP ( $/ ) {
    my @chunks = $/{'chunk'};
    make sub ( %fields ) {
       return [~]( @chunks.hyper.map( {.made.( %fields ) } ).list );
    };
}

Such optimization will shine if your template engine must do some lengthy operations to generate chunk of final output, for example execute heavy database query or call some API. It is perfectly fine to ask for data on the fly to populate template, because in feature rich template engine you may not be able to predict and generate complete set of data needed beforehand, like we did with our %fields. Use this optimization wisely - for fast subroutines you will see a performance drop because cost of sending and retrieving chunks to/from threads will be higher that just executing them in serial on single core.

Which approach should I use to implement my own template engine?

That depends how much you can reuse templates. For example if you send one password reminder per day - go for simple substitution and reach for grammar with direct actions if you need more complex features. But if you are using templates for example in PSGI processes to display hundreds of pages per second for different users then grammar and closure actions approach wins hands down.

You can download all approaches with benchmarks in single file here.

To be continued?

If you like this brief introduction to template engines and want to see more complex features like conditions of loops implemented leave a comment under this article on blogs.perl.org or send me a private message on irc.freenode.net #perl6 channel (nick: bbkr).

Leave a comment

About Pawel bbkr Pabian

user-pic GitHub LinkedIn