5.10 regex features: Build a nested structure while matching
In trying to learn 5.10 grammar-like regex features and wrote a program that, while matching simple Lisp-like constructs, builds up a data structure. This involves recursing into subpatterns, but I couldn't find a way to assemble the resulting tree from the bottom up. Therefore I used a stack.
Do you know of a better way of building the data structure with the given regex/grammar?
(I'm aware of Parse::RecDescent and Regexp::Grammars, but wanted to do this as simply as possible.)
#!/usr/bin/env perl
# Exercising perl 5.10 regular expression features:
# Return a tree structure while matching simple Lisp-like constructs
use warnings;
use strict;
use 5.010;
use Test::More;
use Test::Differences;
sub parse {
my $string = shift;
# 'my @S' produces 'Variable "@S" will not stay shared' and failures
our @S = ();
# 'my @t' produces 'Can't localize lexical variable @t'
our @t;
# Use a stack because I can't see a way of localizing variables when
# recursing into subpatterns.
state $re = qr{
(?&tree)
(?(DEFINE)
(?<tree>
\(
(?{push @S, 'MARK'})
(?&element) (?: \s+ (?&element) )*
\)
(?{
local @t;
while ((my $el = pop @S) ne 'MARK') { unshift @t, $el }
push @S, \@t;
})
)
(?<element> (?&value) | (?&tree) )
(?<value> (\w+) (?{ push @S, $^N }) )
)
}x;
$string =~ $re;
pop @S;
}
sub check {
my ($string, $expect) = @_;
eq_or_diff parse($string), $expect, $string;
}
check('(print)', ['print']);
check('(print foo bar baz)', [qw(print foo bar baz)]);
check('(print (foo 1 2) (bar 3 (baz 4 5)))',
[ 'print', [ 'foo', '1', '2' ], [ 'bar', '3', [ 'baz', '4', '5' ] ] ]);
check('((foo (bar 1)) baz)', [ [ 'foo', [ 'bar', '1' ] ], 'baz' ]);
done_testing;
You can get the code at https://gist.github.com/1257636 .
Leave a comment