Perl and Parsing 9: "Use" and the Ruby Slippers
In this post, I talk about how
Perl 5 parses its
is implemented with
what I have named
"Ruby Slippers" parsing.
The idea is that you parse with a convenient grammar,
but one which is too simple to actually describe the language
you are parsing.
For example, if you are parsing HTML, the grammar might assume
all start tags have end tags.
Whenever the simplified grammar has trouble parsing, the lexer fixes the situation by pretending the input is what the parser wants to see. The parser is like Dorothy in the Wizard of Oz, who really would like to be back in Kansas. The lexer is like the good witch, Glenda, who assures Dorothy that, because of her Ruby Slippers, Dorothy really can be wherever she wants to be.
Few "new" programming ideas are so new that they have no precedent in previous practice. Perl 5 put the Ruby Slippers technique to work well before I described and named it. Its code captures the two essential elements of Ruby Slippers parsing.
The Syntax of the
As a reminder, the
use statement comes in several forms.
Most of them are module
that is, they request the loading of a module.
In the long form of the module request, a version number
is specified as well. The version number is usually
interpreted as the minimum acceptable version of
use Module VERSIONFor example,
The short form of the module request is a module request with no version specified.use List::Util 1.21;
use ModuleFor example,
Module requests, both short form and long form, also allow a list argument.
use Module VERSION LIST use Module LISTHere's an example of the short form with a list of arguments.
Because a number can be either a version or a single item list argument, module requests of the long form are potentially ambiguous. That is,use List::Util qw(reduce shuffle);
could be a request to use at least version 42 of the Fatal module. It can also be a request to load any version of the Fatal module, catching errors for the function nameduse Fatal v42;
v42. As implemented, Perl disambiguates these in the lexer. It parses the
usestatement as the long form, with version specified, whenever possible. The above line, for example, will complain that there is no version 42 of the
If you want to use the
with a function that you happened to have named
v42, you can take advantage of the fact
that the lexer's idea of whether a lexeme is a version number
or not is a guess.
This guess is based entirely on the first character or,
in the case of a v-string, the first two.
use Fatal +'v42';
Perl 5 has
to live within the limits of LALR parsing.
So you would think what I've already described
would be living plenty dangerously enough
for its designers.
But there is in fact
an additional form of the
use VERSIONFor example,
This is the perl version request form. The example requires that the version of Perl used be at least 5.010.use 5.010;
Out Come the Ruby Slippers
So, to parse all these different forms, what is in Perl's BNF?
The BNF in perly.y has a single rule for
one that specifies only the long form of the
The Perl 5 BNF rule
is the equivalent of
use WORD WORD LISTHere I have
VERSION, because that is what perly.y has. In this context, a
WORDis a token which can be either a version number or the name of a module.
Of the five forms of the
command, only one
is represented in the BNF.
Of couse, a LIST can be empty,
and that accounts for two of the missing variants.
But one BNF rule still accounts
for three different syntaxes:
- the long form of the module request,
- the short form of the module request, and
- the perl version request.
Collapsing three syntaxes into one rule makes for some gruesome code within toke.c, but Perl 5 has no better choice. Perl 5 is committed to LALR parsing. Restricting the grammar to a single BNF rule is the best hope of making sure the grammar does not go beyond LALR.
So the Perl 5 BNF assumes, contrary to fact, that Perl
use statements always contain both module name
Here we see the first part of the Ruby Slippers strategy:
When a feature of the actual language is inconvenient
the Ruby Slippers allow you to simply pretend it does not
Perl 5's parser will only handle grammars that are LALR,
but the Perl 5 language is not LALR.
Using the Ruby Slippers approach,
the Perl 5 BNF pretends that the language is LALR.
Making Wishes Come True
The second part of the Ruby Slippers strategy requires that wishful thinking be made to come true. That's easy in Marpa, where the parse engine knows, and can tell the lexer, exactly what it is wishing for and where. yacc is much less context-aware. As far as yacc is concerned, if the lexer really loved it, it would know what it wants.
The Perl 5 lexer winds up having to work very hard
to make this particular relationship work.
It looks ahead at the two
tokens and decides what to
do based on what it sees.
This amounts to figuring out in the lexer which variant of the
use statement is actually being used.
In effect Perl 5 parses every
use statement twice:
once in the lexer, and then one more time with yacc.
If the lexer sees that the
statement is the short form,
then it invents a second
to fill in for the missing one.
The lexer make the short form
look to yacc as if it was the full form.
This is the classic Ruby Slippers approach.
The perl version form of the
use statement is essentially a completely different
statement with the same keyword.
As its first step, it also does the classic Ruby Slippers
move, only in reverse.
The lexer reads the version as the first
WORD is then invented, with
As a final step, Perl 5 needs to distinguish
perl version requests from module load requests.
Here things get quite hackish.
Because the lexer is actually creating the Perl scalars,
it has full control over their internal representations:
the semantics can rely on the internal representations of
Perl 5 assumes that
are version numbers if and only if they have numeric
After the lexer reworks its input,
the parser always sees two
exactly one of which is a version number.
on whether that version number is
the first or second
Perl 5 decides whether a
use statement is
a module request or a perl version request.
Desperate men do desperate things.
"exactly what it is wishing for": As currently implemented, Marpa has a convenient call that, at any point in the parse, will return the list of expected terminals. An lexer willing to use the debug and trace functions can find a lot more information: which rules are in progress, how far they have been recognized, where they began, etc., etc. I could create convenient calls to access this information as well, but so far the list of expected terminals has been all that my lexers care to know.
"module": Actually, the
usestatement implements pragmas as well as modules, so the module request forms are also pragma invocations, and module names can be pragma names.