Marpa::XS release candidate now available

By Jeffrey Kegler on December 11, 2011 9:17 PM

I am very happy to announce that the latest release of Marpa::XS is a release candidate for the first full release, Marpa::XS 1.000000. Most user's experience with the previous beta releases seems to have been trouble-free. The one significant issue that was identified was a failure to properly evaluate null symbols under an unusual combination of circumstances. This problem (a one line error in the C rewrite of the parse engine) is fixed in this release. Unusual as the issue is, when it does occur it results in a parse failure, so that I recommend that all users of Marpa::XS upgrade to the latest release.

Marpa::XS is being kept stable. Bug fixes, even of minor and cosmetic bugs, will be made, as will changes that improve maintainability. But no new features will be added. Interface changes will be especially avoided.

What is Marpa?

Marpa is an advance over recursive descent and yacc. I hope the Marpa algorithm will become the standard parser for problems too big for regular expressions.

Marpa parses, in linear time, those classes of grammar that are currently in practical use.
The importance of parse-time debugging is often underestimated. Marpa's parse-time error detection breaks new ground -- Marpa is fully aware of exactly where in the parse it is at all times, and of exactly what input it expects and why. This means parse-time error detection, once a desperate last resort, now can be used as a parsing technique in itself.
Marpa is a general BNF parser -- that means if you feed it anything written in BNF, it "just parses" it. This includes grammars which are left-recursive, right-recursive and ambiguous -- even infinitely ambiguous.
Marpa never goes exponential -- worst case, even for highly ambiguous grammars, is O(n**3), which is considered optimal.

New with this release

Since the beta release, several bugs have been fixed, The most important one was a failure to properly evaluate null symbols under certain unusual circumstances. This problem, identified and described by Tomáš Jirotka, is fixed in this latest release.

In some previous Marpa::XS releases, the documentation, while part of the distribution, did not install automatically. As of this release, that problem is fixed. The documentation now installs, as it should, along with the rest of Marpa::XS.

No interface has been deprecated since Marpa::XS went beta -- the interface has remained stable. But many interfaces deprecated BEFORE Marpa went beta were used in the test suite. To make the test suite more useful for readers, I eliminated deprecated practices except in code whose purpose it is to test that deprecated practice. Where tests continue to use a deprecated practice, comments explicitly point this out.

What is next with Marpa?

Based on the feedback, I have confidence that Marpa::XS have been extensively used and found reliable. With the fixes for this release, I expect that Marpa::XS can be taken out of beta and into a full 1.000000 release shortly.

Development of new features for Marpa continues, but in another distribution: Marpa::R2. This isolates Marpa::XS users from the accidental changes and bugs that can be the side effect of active development.

Limitations

Marpa::XS installation requires a C compiler, and many of the GNU utilities and libraries as well. Marpa::XS has been tested on a wide variety of POSIX systems. In theory Marpa::XS is NOT restricted to POSIX systems -- all the tools it uses have Windows versions, for example. However, Marpa::XS has yet to be installed on a non-POSIX system.

Notes

"in linear time": To be specific, Marpa parses any LR-regular grammar in linear time -- O(n). LR-regular is a vast class of grammars that includes LALR, LR(k) for all k, and LL(k) for all k. This means that Marpa parses, in linear time, every class of grammar parsed by yacc, recursive descent and regular expressions.
"considered optimal": The phrase "considered optimal" elides some irrelevant bits of theory. It would be a bit of a surprise if it is possible to do general BNF parsing in O(n), but nobody has proved that it can't be done. The Valiant algorithm parses general BNF and is O(n**2.373...) or better. But Valiant's algorithm is only faster for huge problems, and for those it needs a machine with many terabytes of main memory to deliver on its speed claims. So it won't be competing with Marpa any time soon.
"unusual circumstances": I call this bug, the Jirotka issue, in gratitude to its discoverer (and because it is otherwise hard to describe). Those curious can look in Marpa::XS's test suite, at the regression test I created for the Jirotka issue. Most users will want to upgrade to the latest release and remain unaware that this issue ever existed.

In internalese, reproducing the Jirotka issue requires, at a minimum, a rule which creates a Leo chain with multiple rules and which has a certain pattern of nullables. The bug itself boiled down to using the wrong variable at a single point in the code for handling nullables and Leo chains. This incorrect variable often had either the same value as the correct variable, and or a value which had the same effect. This meant that, even if a test exercised Leo chains with multiple rules, and even if the nullables fell into the right positions, that test would not necessary surface the Jirotka issue.

My actual fix was not just that one line, but included some refactoring. The refactoring made the logic clearer, so that I could examine the code carefully, think the issue through, and assure myself that no other problems lurked in nearby or associated logic.

A bug report like Tomáš Jirotka's is a good sign -- it indicates that Marpa::XS has been very thoroughly exercised. This bug report also shows off a nice property of Marpa. Since Marpa parses every BNF grammar, it is very easy to determine whether Marpa's failure to parse a grammar is a bug or not -- if Marpa does not parse it, it's a bug. Most parsers in current use parse only a subset of the grammars that can be written in BNF, and whether a particular grammar is one of those in the subset can be very difficult to determine.
"GNU utilities and libraries": These dependences can be an inconvenience, I admit, but the alternative is installing my attempt to portably re-create all the things the GNU people have developed. I think that it is clear that the GNU software is the easier and more reliable alternative.

If you browse the package, you may see that it uses TeX as well. TeX is ONLY needed is you are working on libmarpa, the highly mathematical, low-level core library that provides the parse engine. To do this, you'd need to have studied a lot of the mathematics of parsing -- and you'd understand why I feel forced to do the documentation in TeX. All the non-mathematical parts are either in Perl, or in C code which can be read and changed on systems which do not have TeX installed.