The solved problem that isn't, is

By Jeffrey Kegler on August 12, 2012 8:07 PM

In the title of an excellent blog post, Laurence Tratt calls parsing, "the solved problem that isn't". I thought this phrase captured the current situation in parsing theory and practice very nicely. In stating that parsing is not a solved problem, Tratt realized he was taking on a consensus. But the consensus is fading -- for example, neither side in the interchange between Might/Darais and Russ Cox expresses complete contentment with the state of the art.

What would be a real solution to the parsing problem? I wish to suggest that Marpa is that solution. I say that based on a list of features. Marpa is the first parser to have all of these features, and I claim they are enough to justify the assertion that, with Marpa, parsing is no longer an unsolved problem. Specifically,

Marpa parses everything you can write in BNF.
Marpa parses in times considered theoretically optimal. For unambiguous grammars, Marpa is never worse than O(n²). For ambiguous grammars, Marpa is never worse than O(n³). Marpa never goes exponential.
Marpa parses all classes of grammar in practical use today in linear time, O(n). Marpa is linear for all LR-regular grammars. The LR-regular grammars include regular expressions, LL(k) for all k, and LR(k) for all k.
A serious practical issue has been parse-time error detection. Marpa breaks new ground here. Marpa is fully aware, at every point in the parse, of all the rules it is parsing, how far into them it has proceeded, and of what tokens it expects next. This information is available to the application conveniently and efficiently.
Marpa parsers do not need to be handwritten. Marpa is available as an open-source library. It is written in C, and the C library can be used directly or via a Perl interface.
For general BNF parsing, the user does not need to craft a lookahead or backtracking strategy -- Marpa does not use lookahead and never backtracks.
Marpa's complexity and correctness claims come with the traditional theoretical apparatus of proofs based on prior literature.
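
To make the error-detection point concrete, here is a minimal sketch of a textbook Earley recognizer in Python. It is not Marpa's implementation or its API: Marpa is based on Earley's algorithm but differs from this sketch, and the names `earley_sets`, `expected_tokens`, `first_error` and `accepted` are hypothetical, chosen only for illustration. The sketch also assumes a grammar with no empty rules. What it is meant to show is the shape of the idea: each Earley item records a rule, how far into that rule the parse has proceeded, and where the rule started, so the set of tokens expected next can be read off at any position, even for an ambiguous, left-recursive grammar.

```python
from collections import namedtuple

# An Earley item: a rule (lhs -> rhs), how far into it we are (dot),
# and the input position where this rule instance started (origin).
Item = namedtuple("Item", "lhs rhs dot origin")

def earley_sets(grammar, start, tokens):
    """Build one Earley set per input position.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples).
    Assumption: no rule has an empty right-hand side.
    """
    sets = [[] for _ in range(len(tokens) + 1)]

    def add(i, item):
        if item not in sets[i]:
            sets[i].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        j = 0
        while j < len(sets[i]):          # sets[i] may grow while we scan it
            item = sets[i][j]
            j += 1
            if item.dot < len(item.rhs):
                sym = item.rhs[item.dot]
                if sym in grammar:       # predict: expand the nonterminal
                    for rhs in grammar[sym]:
                        add(i, Item(sym, rhs, 0, i))
                elif i < len(tokens) and tokens[i] == sym:   # scan a token
                    add(i + 1, item._replace(dot=item.dot + 1))
            else:                        # complete: advance waiting parents
                for parent in sets[item.origin]:
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        add(i, parent._replace(dot=parent.dot + 1))
    return sets

def expected_tokens(grammar, items):
    """Terminals that appear immediately after the dot in a set of items."""
    return sorted({it.rhs[it.dot] for it in items
                   if it.dot < len(it.rhs) and it.rhs[it.dot] not in grammar})

def first_error(grammar, sets):
    """Return (position, expected tokens) for the first rejected token, or None."""
    for i, items in enumerate(sets):
        if not items:
            return i - 1, expected_tokens(grammar, sets[i - 1])
    return None

def accepted(sets, start):
    """True if some start rule is complete and spans the whole input."""
    return any(it.lhs == start and it.dot == len(it.rhs) and it.origin == 0
               for it in sets[-1])

# An ambiguous, left-recursive expression grammar, written as plain BNF rules.
GRAMMAR = {"E": [("E", "+", "E"), ("E", "*", "E"), ("n",)]}

good = earley_sets(GRAMMAR, "E", ["n", "+", "n", "*", "n"])
bad = earley_sets(GRAMMAR, "E", ["n", "+", "+"])
print(accepted(good, "E"), expected_tokens(GRAMMAR, good[1]))
print(first_error(GRAMMAR, bad))
```

In this sketch the second input is rejected at its third token, and the recognizer can say at that point that only `n` was acceptable. No lookahead or backtracking is involved; the expectation falls out of the items already built.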

In his post, Tratt focuses his discontent on the problem of "language composition" -- the problem of combining two grammars into one. Tratt knew that an efficient and practical general BNF parser, like Marpa, would make language composition easy. But he was not aware that any such parser existed. Language composition is a topic to which I hope to return.

Tagged as: Earley, Marpa, parser, parsing

1 Comment

jeandamiendurand.myopenid.com | December 13, 2012 2:03 PM

Marpa is revolutionary in at least one world that I know well, the Perl language, where the module is simply a good binding to the C version, which is itself clearly very fast, robust and portable. A masterpiece, and it deserves a big community.

About Jeffrey Kegler

I blog about Perl, with a focus on parsing and Marpa, my parsing algorithm based on Jay Earley's.

Ocean of Awareness