Announcing Marpa::XS 0.016000

By Jeffrey Kegler on October 14, 2011 8:10 PM

I released Marpa::XS 0.016000 a week ago and the cpantesters results look excellent. With this release, my conversion of Marpa from Perl to C is finished. A lot of Perl code remains, to be sure, but all of it is code that arguably belongs in some kind of higher-level language.

This release was checked for leaks and other memory issues. The couple of issues that turned up were fixed.

What is Marpa?

Marpa is an advance over recursive descent and yacc. I hope the Marpa algorithm will become the standard parser for problems too big for regular expressions.

Marpa parses in linear time, all classes of grammar that are currently in practical use.
The importance of parse-time debugging is often underestimated. Marpa's parse-time error detection breaks new ground -- Marpa is fully aware of exactly where in the parse it is at all times, and of exactly what input it expects and why. This means parse-time error detection, once a desperate last resort, now can be used as a parsing technique in itself.
Marpa is a general BNF parser -- that means if you feed it anything written in BNF, it "just parses" it. This includes grammars which are left-recursive, right-recursive and ambiguous -- even infinitely ambiguous.
Marpa never goes exponential -- worst case, even for highly ambiguous grammars, is O(n**3), which is considered optimal.

What is next with Marpa?

At this point, little remains to be done before a a 1.000000 beta release of Marpa::XS. Once Marpa::XS does go beta, I expect to be able to keep its interface stable.

Inside Marpa::XS, the portions converted to C amount to a complete, if low-level, parsing library. The libmarpa library does not, however, have the documentation that you'd expect in a library being released on its own.

Also, frankly, before writing the documentation, I need to redo the interface to libmarpa. As long as the interface to libmarpa was a strictly internal affair, I didn't worry about it much -- I made the first cut at a design, and got it working. I then looked at the result. If the design was so awful that it got in the way of features or was an efficiency issue, I fixed it. If ugliness or awkwardness was its main issue, I moved on. With the first pass and a few trial applications behind me, I now know how the libmarpa interface should look.

Once I have libmarpa finished and documented, the next step will be Marpa::Thin. These days a lot of people like to hack interfaces. The Marpa project is about the parse engine -- when it comes to interfaces, I want to put everyone else on a level playing field with me, if not get out of the game altogether. Marpa::Thin will be a "thin" Perl interface to libmarpa.

Currently, I plan to create a SWIG interface to libmarpa, which means that any environment that SWIG supports (and there are a lot of them) will have access to the low-level Perl library. I have mixed feelings about Marpa leaving its Perl home, but I think most Perlers share my belief -- we contribute to the Perl community, not because it is a goal in itself, but because it is the best way to contribute to a larger community.

Limitations and Comparisons

Currently, the major limitation of Marpa::XS is that it is alpha. Development is well advanced, but the interface remains subject to change. For a comparison of Marpa to other parsers, one which is careful to point out situations where older parsing algorithms may still be superior, see the "Limitations and Comparisons" section in my announcement of Marpa::XS 0.008000.

Notes

"higher-level language": Of course, depending on the application, the ideal "high-level language" may be C. But I feel no real need to convert the code for the user interfaces to C, and the scripts for building and testing really should be in a highly portable high-level language.
"in linear time": To be specific, Marpa parses any LR-regular grammar in linear time -- O(n). LR-regular is a vast class of grammars that includes LALR, LR(k) for all k, and LL(k) for all k. This means that Marpa parses, in linear time, every grammar parsed by yacc, recursive descent and regular expressions.
"considered optimal": The phrase "considered optimal" elides some irrelevant bits of theory. It would be a bit of a surprise if it is possible to do general BNF parsing in O(n), but nobody has proved that it can't be done. The Valiant algorithm parses general BNF and is O(n**2.376...) or better. But Valiant's algorithm is only faster for huge problems, and for those it needs a machine with many terabytes of main memory to deliver on its speed claims. So it won't be competing with Marpa any time soon.