Perl and Parsing 10: "Use" the Easier Way
the previous post in this series,
I described the way
the Traditional Perl Parser (TPP) parses Perl's
This post will describe an IMHO attractive alternative approach,
using my own
Marpa::XS has a prototype Perl parser as part of its test suite.
The TPP Way
If that previous post is not fresh in your mind, it may be useful at this point to look back at it. Summarizing, the TPP uses the "Ruby Slippers" method: It first simplifies the grammar so that, while convenient for the parser, the grammar falls short of describing the actual language. The lexer then fakes the input to the parser so that the parser's "wishful thinking" comes true.
In the TPP, the parser requires all
take this form:
use_statement ::= USE WORD WORD LIST
WORD's may be either versions or
names of modules -- the semantics
looks at the internal representations to sort this out once
the parsing is finished.
The Marpa::Perl Way
Marpa is a lot better than
TPP's LALR parse engine at the Ruby Slippers thing.
Marpa will tell its lexers exactly what it wants at every point, whereas
the TPP's lexer can expect no help from the LALR parse engine
and is forced to do its own parsing of
But in Marpa::Perl, I decided NOT to use the Ruby Slippers
to parse Perl's
Marpa is good at the Ruby Slippers, but it also allows even better and easier
ways to tackle this problem.
Marpa can parse far more grammars than the TPP's parse engine.
Marpa can even parse ambigious grammars.
comes in three forms, and it is easy to create a rule for
use_statement ::= USE NAME VERSION LIST
use_statement ::= USE VERSION
use_statement ::= USE NAME LIST
usestatement -- so natural that in fact, it's almost exactly the way the Perl documentation DOES describe it. The one difference is that the Perl documentation lists separate forms for
usestatements with and without
LIST's of arguments. This is useful for clarity, but
LIST's can be empty. Separate rules in the parser's BNF dedicated to module requests without the
LIST's would create a useless and inefficient ambiguity.
The three rules above demonstrate ambiguity of a helpful kind.
The TPP parses this as a request to use
the (so far non-existent) version 5 or higher of the
use Fatal 5;
According to both TPP's and Marpa::Perl's BNF,
however, it could also be parsed as a request to
Fatal to wrap a Perl subroutine named
In fact, that's exactly how the TPP will interpret
use Fatal +5;
usestatement that way whenever possible. When a
usestatement cannot possibly be the long form of a module request, Marpa::Perl tries to parse it as a perl version request. As a last resort, Marpa::Perl interprets a
usestatement as the short form (name only) of a module request.
By using this rule ranking,
Marpa::Perl's lexer can avoid the issue of determining what is a
version number and what is a name.
Marpa allows tokens to be
Marpa::Perl's lexer can tell the parser that
v5.0 is BOTH a name and a version,
and let the parser sort things out.
LALR is, to say the least,
not tolerant of
the TPP's lexer must tackle the issue
of identifying versions head-on.
When it comes to parsing Perl's
I believe I have shown that Marpa can do it more easily.
in Marpa required the three BNF rules shown,
plus arranging to have the lexer recognize that numbers
and v-strings can be either a version or a name,
which is done with
a simple loop over a fixed list.
the corresponding code in
and you'll see some of the most difficult logic
that you're likely to find anywhere.
Perhaps more significant is what Marpa::Perl could do that the TPP
does not and could not do.
Imagine, for example, that you've written a module to make available
the regular expression logic as implemented in the classic
AT&T UNIX version 8,
affectionately remembered as v8 UNIX.
Imagine further that you decide, unhappily, to name this module,
In that case you can simply forget about loading it with TPP's
which will insist that
v8 is a version number,
even in a case like this:
use v8 0 qw(re);
just shown can only have one parse.
A human being can quickly determine what that one parse must be.
Why have we come to believe this is too much
to expect of our parsing algorithms?
"Traditional Perl Parser": In this series, I call the Perl interpreter's parser, the one whose implementation centers on the files toke.c and perly.y, the Traditional Perl Parser, or TPP. Most others just call it the Perl parser, but in this series I discuss its alternatives (Marpa::Perl and PPI) a lot. This little bit of special terminology saves confusion.
"names of modules": Actually, these can also be pragma names, a fact that, for brevity's sake, I will not mention again.
"last resort": Actually, the TPP accepts some undocumented syntaxes. This gets into strange territory. The TPP parses
as a request to use version 2.0 or higher of the
use v2 Fatal;
Fatalmodule. You could take the point of view that the TPP should reject statements like this as syntax errors, in which case TPP's behavior is a bug. For the purposes of this blog post, I ignore this issue.
"a simple loop over a fixed list": For the code see pperl/Marpa/Perl.pm in the latest developer's version of Marpa::XS . The code discussed begins at line 1200.