A grammar that exemplifies, describes and parses itself
[ This is cross-posted from the Ocean of Awareness blog. ]
I've written a grammar in Marpa's new BNF interface, to parse Marpa's new BNF interface. In the 70's, when I learned parsing theory, this was a very fashionable thing to do, perhaps because yacc had done it, in Appendix B of the original 1975 paper. By 1979, Hoftstadter's book Godel-Escher-Bach (GEB) was out, and the next year it took the Pulitzer for General Nonfiction. Self-description, recursion, self-reference, self-embedding, you (preferably autologically) name it, these things were all the rage.
Reading code that is at once both self-example and self-description still holds a certain magic for me. Regular expressions cannot describe themselves. Recursive descent parsers are hand-written in another general-purpose language, so there can be no concise self-description. Ironically, yacc actually cannot parse its own description language. ("Ironically" is the word used in the paper.) Like almost all useful grammars, yacc's description language goes beyond the capabilities of yacc's LALR parser, and a lexer hack is needed to make the code in Appendix B work.
Marpa is a general BNF parser and requires no special hacks to parse the following efficiently:
rules ::= rule+ action => do_rules rule ::= empty_rule | priority_rule | quantified_rule priority_rule ::= lhs op_declare priorities action => do_priority_rule empty_rule ::= lhs op_declare adverb_list action => do_empty_rule quantified_rule ::= lhs op_declare name quantifier adverb_list action => do_quantified_rule priorities ::= alternatives+ separator => op_tighter proper => 1 action => do_discard_separators alternatives ::= alternative+ separator => op_eq_pri proper => 1 action => do_discard_separators alternative ::= rhs adverb_list action => do_alternative adverb_list ::= adverb_item* action => do_adverb_list adverb_item ::= action | left_association | right_association | group_association | separator_specification | proper_specification action ::= kw_action op_arrow name action => do_action left_association ::= kw_assoc op_arrow kw_left action => do_left_association right_association ::= kw_assoc op_arrow kw_right action => do_right_association group_association ::= kw_assoc op_arrow kw_group action => do_group_association separator_specification ::= kw_separator op_arrow name action => do_separator_specification proper_specification ::= kw_proper op_arrow boolean action => do_proper_specification lhs ::= name action => do_lhs rhs ::= names quantifier ::= op_star | op_plus names ::= name+ action => do_array name ::= bare_name | reserved_word | quoted_name name ::= bracketed_name action => do_bracketed_name reserved_word ::= kw_action | kw_assoc | kw_separator | kw_proper | kw_left | kw_right | kw_group
The conventions are standard or transparent. The "::=" symbol separates the left and right hand sides of rules. The "|" symbol separates alternative right hand sides. The "*" and "+" are quantifiers, similar to those in regular expressions, and indicate, respectively, zero or more repetitions and one or more repetitions of the preceding symbol. Adverbs take the form "keyword => value", and indicate semantics or the style of sequence separation. Full documentation can be found here.
Self-parsing compiler compilers ruled the earth in the age of bellbottoms. Self-parsing has lasted better, but not by much. When some years I wrote a self-describing language as an interface to Marpa, it seemed to confuse people. They wondered what Marpa did -- parsing your own description did not seem to be about doing anything. These days my examples feature a lot of calculators. ("Ironically", Hofstadter seems to have had the same problem with GEB -- he felt that people did not understand what his book was saying -- even those who liked it.)
But ideas from Larry Wall and Peter Stuifzand have re-ignited my interest in self-parsing. And this time the self-parsing parser was written with a specific purpose. I plan to enhance this language. I have found that the convenience of this interface more than compensates for the circular dependency issues. The BNF source in this post is the source for its own parser, and I plan to use it to produce improved versions of itself.
Comments on this post can be sent to the Marpa Google Group: