Marpa: Practical General BNF Parsing

With a regular expression engine, there are expectations. You feed a regular expression to the RE engine, it parses with it. That simple.

A general BNF parser is one which fulfills the same expectation for BNF. Write your language in BNF, you got a parser. But it hasn't been that simple.

The guys who write the textbooks have pushed general BNF parsing for years. Improvements in these algorithms have pushed the speeds down to linear or close to it for the kinds of language in practical use.

But the general parsing algorithms have languished on the textbook pages. And I did find it wasn't quite as easy as the academics suggested. There were some obstacles that they didn't forsee. But bottom line, they were right. General BNF parsing is practical.

Marpa is a practical (if at this point alpha) general BNF parser generator. I have used Marpa to write a practical parser for a non-trivial task -- HTML parsing. That HTML parser is Marpa::UrHTML, and I now use Marpa::UrHTML in tasks I do routinely.

Over the next weeks, I will do phased releases of Marpa. The HTML parser, Marpa::UrHTML, is documented and ready to use in the first release. Over the next few weeks, I will document Marpa, the parse engine itself.

Right now the Marpa parse engine is "pure Perl", and speeds are on the order of PPI. This is acceptable for many applications, but Marpa can do better. Marpa has as its parse engine a mathematical algorithm that lends itself to conversion to C. Further down the road, I'll write a Marpa C library, and an XS wrapper.

2 Comments

Have you looked at OMeta? The author seems to have solved the left-recursion problem for PEG grammars. I would like to see a port of OMeta to Perl.

About Jeffrey Kegler

user-pic I blog about Perl, with a focus on parsing and Marpa, my parsing algorithm based on Jay Earley's.