Marpa: Practical General BNF Parsing

By Jeffrey Kegler on December 31, 2009 3:29 PM

With a regular expression engine, there are expectations. You feed a regular expression to the RE engine, it parses with it. That simple.

A general BNF parser is one which fulfills the same expectation for BNF. Write your language in BNF, you got a parser. But it hasn't been that simple.

The guys who write the textbooks have pushed general BNF parsing for years. Improvements in these algorithms have pushed the speeds down to linear or close to it for the kinds of language in practical use.

But the general parsing algorithms have languished on the textbook pages. And I did find it wasn't quite as easy as the academics suggested. There were some obstacles that they didn't forsee. But bottom line, they were right. General BNF parsing is practical.

Marpa is a practical (if at this point alpha) general BNF parser generator. I have used Marpa to write a practical parser for a non-trivial task -- HTML parsing. That HTML parser is Marpa::UrHTML, and I now use Marpa::UrHTML in tasks I do routinely.

Over the next weeks, I will do phased releases of Marpa. The HTML parser, Marpa::UrHTML, is documented and ready to use in the first release. Over the next few weeks, I will document Marpa, the parse engine itself.

Right now the Marpa parse engine is "pure Perl", and speeds are on the order of PPI. This is acceptable for many applications, but Marpa can do better. Marpa has as its parse engine a mathematical algorithm that lends itself to conversion to C. Further down the road, I'll write a Marpa C library, and an XS wrapper.

2 comments

2 Comments

Randal L. Schwartz | January 1, 2010 1:16 PM

Have you looked at OMeta? The author seems to have solved the left-recursion problem for PEG grammars. I would like to see a port of OMeta to Perl.

Jeffrey Kegler replied to comment from Randal L. Schwartz | January 1, 2010 2:25 PM

Marpa loves to left-recurse -- it left-recurses in linear time, unlike Ometa.

And Marpa will handle anything you can write in BNF (albeit non necessarily in linear time). In Marpa you can start out describing your problem in a completely natural form and get it working. Then, as needed, selectively remove ambiguities and right recursions.

This has in my work turned out to be more of an advantage than you'd expect. It is very handy to be able to put anything at all in BNF into a parser and have it run. It's quite possible that a major use for Marpa will be as development tool for people working out grammars for YACC/bison/yapp, and maybe this will be the case for Ometa/PEG also.

In a bit I will finish the Marpa documents, and you can check it out.

About Jeffrey Kegler

I blog about Perl, with a focus on parsing and Marpa, my parsing algorithm based on Jay Earley's.

More info »

Ocean of Awareness