Developing parsers incrementally with Marpa
Marpa::XS is a general context-free parser. What does that mean? For a grammar writer, it means that he doesn't need to worry that the next rule he adds to the grammar is the one that makes it hit the invisible wall that most other parser generators set up. If you can write it in BNF, Marpa will parse it. Which makes Marpa::XS good at incremental development.
As you develop your Marpa grammar, you can track the tradeoffs you are making between features and efficiency. Chances are, if your grammar is unambiguous or lightly ambiguous, there are no tradeoffs -- you're getting everything you want in linear time. Marpa is linear for every class of grammar currently in practical use.
Marpa is also linear with many ambiguous grammars and, in the worst case, Marpa's time complexity is what is accepted as optimal in practice. Whatever the time complexity that you're seeing with Marpa, it's probably as good or better than you're going to get from another parser generator.
One way to start the semantics
As you build your grammar out, of course,
you'll need a semantics.
I often start my Marpa semantics with a single routine,
something like:
sub My_Actions::do_what_I_mean {
# The first argument is the per-parse variable.
# At this stage, just throw it away
shift;
# Throw away any undef's
my @children = grep { defined } @_;
# Return what's left
return scalar @children > 1 ? \@children : shift @children;
}
When defining the grammar,
I set do_what_I_mean
as the default action:
default_action => 'My_Actions::do_what_I_mean',
This one function is enough to get me started
working out the semantics of my parse tree.
At every node of the parse tree,
it throws away all the undefined child values.
If no child value remains, an undef
is returned.
If only one child value remains, it is the result.
If more than one child value remains,
a reference to an array of them is returned.
The result, when handed to Data::Dumper,
is a reasonable representation of your parse tree.
The "per-parse variable" is also thrown away. By default, this is an empty hash which can be used as a kind of global scratchpad in the parse. This will be useful, for example, if your language has a symbol table.
The do_what_I_mean
function produces
a first cut at a parse tree,
which I can hack away at.
For example, if I'm writing a calculator, I can change the actions
rule by rule to perform the calculations.
Your do_what_I_mean
function might vary,
depending on your application.
You might find it is better to keep all the undefineds, for example.
Also, once your grammar is complete,
you should check to see if you can do without a default_action
.
If you can, eliminating it is an efficiency gain.
This efficiency gain will be even bigger in Marpa::R2,
Marpa's next release.
I have read it somewhere that C++ has undecidable grammar. Does it mean that it cannot be parsed using Marpa?
@Jakub: C++ is not one of the languages Wikipedia lists as having an undecidable syntax. (The list is at the end of this section.) Marpa really isn't an issue here either way -- Marpa cannot parse 100% of an undecidable language, but neither can anything else. Incidentally, with undecidable languages, the important thing is what portion of the programs you'd actually write are undecidable. The trick (yet another thing to be credited to Larry Wall) is to get the power that comes with undecidability, without ever having to actually pay the price.