(my) Marpa Best Practices
Marpa is a great, really great, piece of software that deserves to be used by everybody who wants to do serious (let's say it frankly: professional) and innovative parsing. I feel that the Perl language is very lucky to have been chosen by its author, Jeffrey Kegler, as the main frontend. But seriously, this is the only module that is a true BNF parser: none of the other CPAN modules that contain the BNF keyword are. Marpa brings innovative, cutting-edge ways of thinking, and is written by a person who is clearly brilliant in both programming and language theory, and very responsive on the marpa-parser Google group.
My first module using Marpa was MarpaX::Languages::C::AST, and I should have blogged about it at that time. Jeffrey did it instead, so it is time for me to start blogging about Marpa too!
As in any programming language, Marpa has its own syntax for writing grammars. Naturally, this syntax is itself BNF, and it can be viewed here.
Before writing other posts on ECMAScript, I'd like to very briefly recall the main Marpa features:
- A grammar is written in BNF, following Marpa's BNF syntax.
- A grammar may contain two distinct embedded grammars: G1 and G0.
- G1 is like any other BNF grammar you can find all over the world. G1 rules use "::=" as the separator between a production's LHS (Left Hand Side) and its RHS (Right Hand Side), which is made of other productions or terminals.
- G0 is used to define "lexemes". Lexemes are what Lex/Flex people would call the terminals.
- G0 can have its own grammar, whose separator is "~" instead of "::=".
- Marpa natively brings the notion of events all over the place: prediction, completion and nulled events.
- Any external scanner can be used together with Marpa, and this is how we plug the "business" logic into the "technical" tool that Marpa is.
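To make the G1/G0 split concrete, here is a small grammar written in Marpa's BNF style, together with a minimal Python sketch that sorts its rules by separator. This is an illustration only, assuming one rule per line: it is not how Marpa itself reads grammars, it ignores the other statement forms of the DSL (:start, :discard, :lexeme, events, adverbs...), and split_levels is a hypothetical helper.

```python
# A tiny grammar in Marpa's BNF style (illustrative only):
# G1 rules use "::=", G0 rules use "~".
SAMPLE = r"""
expression ::= term_many
term_many  ::= term+
term       ::= NUMBER
NUMBER     ~ _digit_many
_digit_many ~ [0-9]+
"""


def split_levels(dsl: str) -> dict[str, list[tuple[str, str]]]:
    """Hypothetical helper: sort one-rule-per-line DSL text into G1 and G0."""
    levels: dict[str, list[tuple[str, str]]] = {"G1": [], "G0": []}
    for line in dsl.splitlines():
        line = line.strip()
        if not line:
            continue
        # Check "::=" first, so that a G1 rule whose RHS contains a
        # literal '~' is not mistaken for a G0 rule.
        if "::=" in line:
            lhs, rhs = line.split("::=", 1)
            levels["G1"].append((lhs.strip(), rhs.strip()))
        elif "~" in line:
            lhs, rhs = line.split("~", 1)
            levels["G0"].append((lhs.strip(), rhs.strip()))
    return levels


levels = split_levels(SAMPLE)
print([lhs for lhs, _ in levels["G1"]])  # ['expression', 'term_many', 'term']
print([lhs for lhs, _ in levels["G0"]])  # ['NUMBER', '_digit_many']
```

Being a sketch, it would still misclassify a G0 rule containing a quoted '::=', which Marpa's real DSL parser of course handles.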
Here are the conventions I feel most comfortable with when writing, developing and maintaining my grammars:
- A lexeme name is always written using characters matching the class [A-Z0-9_], but never starts with '_'.
- Any G0 rule that is not a lexeme starts with '_'.
- Any G1 rule name should contain at least two characters, at least one of them in the range [a-z].
- Named events pausing before a lexeme should be written like: ^LEXEME
- Named events pausing after a lexeme should be written like: LEXEME$
- Nulled G1 events should be written like: Rule
- Predicted G1 events should be written like: ^Rule
- Completed G1 events should be written like: Rule$
- Predicted lexeme named events should be written like: ^^LEXEME
- Suffix _any, if present, for a rule that repeats zero or more times
- Suffix _many, if present, for a rule that repeats one or more times
- Suffix _maybe, if present, for a rule that is optional (i.e. nullable)
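These naming rules are mechanical enough to be checked by a script. Here is a minimal sketch in Python (chosen for brevity; nothing here uses Marpa's API), assuming you already know at which level (lexeme, G0 or G1) a name is declared, since a name like _digit_many would pass the G1 check too. All function names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical helpers encoding the naming conventions above.
LEXEME_RE = re.compile(r"[A-Z0-9][A-Z0-9_]*")  # [A-Z0-9_], never a leading '_'
SUFFIXES = {"_any": "zero or more", "_many": "one or more", "_maybe": "optional"}


def is_lexeme_name(name: str) -> bool:
    return LEXEME_RE.fullmatch(name) is not None


def is_g0_internal_name(name: str) -> bool:
    # A G0 rule that is not a lexeme starts with '_'.
    return len(name) > 1 and name.startswith("_")


def is_g1_name(name: str) -> bool:
    # At least two characters, at least one of them in [a-z].
    return len(name) >= 2 and re.search(r"[a-z]", name) is not None


def repetition(name: str) -> Optional[str]:
    """Return what the rule's suffix says about it, or None if no suffix."""
    for suffix, meaning in SUFFIXES.items():
        if name.endswith(suffix):
            return meaning
    return None


print(is_lexeme_name("NUMBER"), is_g1_name("NUMBER"))  # True False
print(repetition("term_many"))                          # one or more
```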
Take care with ^^LEXEME and ^LEXEME: they are VERY different. ^^LEXEME is really a prediction: nothing guarantees the lexeme will truly be read. ^LEXEME, on the other hand, means that Marpa found the lexeme, and you are paused just before Marpa processes it.
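Because the event names are decorated consistently, a pause handler can tell what kind of event it received from its name alone. The hypothetical Python decoder below sketches this, assuming event names follow exactly the conventions listed earlier (lexeme names in [A-Z0-9_], G1 rule names containing a lowercase letter). Note that ^^ has to be tested before ^, otherwise a mere prediction would be mistaken for a lexeme that was actually found.

```python
import re
from typing import Optional, Tuple

LEXEME_RE = re.compile(r"[A-Z0-9][A-Z0-9_]*")


def _level(name: str) -> Optional[str]:
    """Guess the level of a symbol from its name alone (convention-based)."""
    if LEXEME_RE.fullmatch(name):
        return "lexeme"
    if len(name) >= 2 and re.search(r"[a-z]", name):
        return "g1"
    return None


def decode_event(event: str) -> Optional[Tuple[str, str]]:
    """Hypothetical decoder: return (kind, symbol), or None if unrecognized."""
    # "^^" first: ^^LEXEME is only a prediction, it may never be read.
    if event.startswith("^^"):
        name = event[2:]
        return ("lexeme predicted", name) if _level(name) == "lexeme" else None
    if event.startswith("^"):
        name = event[1:]
        kinds = {"lexeme": "paused before lexeme", "g1": "G1 predicted"}
    elif event.endswith("$"):
        name = event[:-1]
        kinds = {"lexeme": "paused after lexeme", "g1": "G1 completed"}
    else:
        # Undecorated names are nulled G1 events.
        name = event
        kinds = {"g1": "G1 nulled"}
    kind = kinds.get(_level(name))
    return (kind, name) if kind else None


print(decode_event("^^NUMBER"))  # ('lexeme predicted', 'NUMBER')
print(decode_event("^NUMBER"))   # ('paused before lexeme', 'NUMBER')
```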
Marpa Rocks, trust me.