Progress on Marpa::XS
A Faster Marpa
I've been converting Marpa from Pure Perl to C. For those who don't know, Marpa is a parser generator that parses any grammar you can write in BNF. If the grammar is one that can be parsed by yacc, by recursive descent, or as a regular expression, Marpa parses it in linear time.
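For readers who haven't seen Marpa, here is a minimal sketch of driving it from Perl, patterned on the documented Marpa::XS interface. The particular grammar, action names, and token stream are my own illustration, not anything from this post.

    use strict;
    use warnings;
    use Marpa::XS;

    # An illustrative grammar: sums of numbers, written as BNF-style rules.
    my $grammar = Marpa::XS::Grammar->new(
        {   start          => 'Expression',
            actions        => 'My_Actions',
            default_action => 'first_arg',
            rules          => [
                # Expression ::= Expression '+' Term
                {   lhs    => 'Expression',
                    rhs    => [qw(Expression Plus Term)],
                    action => 'do_add',
                },
                # Expression ::= Term
                { lhs => 'Expression', rhs => ['Term'] },
                # Term ::= Number
                { lhs => 'Term', rhs => ['Number'] },
            ],
        }
    );
    $grammar->precompute();

    my $recce = Marpa::XS::Recognizer->new( { grammar => $grammar } );

    # The token stream for "1 + 2 + 3"; Plus and Number are terminals
    # because they never appear on a left-hand side.
    $recce->read( 'Number', 1 );
    $recce->read( 'Plus',   '+' );
    $recce->read( 'Number', 2 );
    $recce->read( 'Plus',   '+' );
    $recce->read( 'Number', 3 );

    my $value_ref = $recce->value;
    print ${$value_ref}, "\n";    # prints 6

    # Semantics: action names resolve to subs in the 'actions' package.
    # The first argument is a per-parse variable, which this sketch ignores.
    sub My_Actions::do_add    { my ( undef, $left, undef, $right ) = @_; return $left + $right }
    sub My_Actions::first_arg { my ( undef, $value ) = @_; return $value }

Note that the first rule is left-recursive, which would be fatal to a recursive descent parser; Marpa takes it in stride.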
Marpa's core algorithm makes no system calls. It is almost 100% pointer twiddling, with no floating point and very little integer arithmetic. It's as if it were made to order to show C in its very best light.
I expected converting the Perl implementation to C to improve speed by two orders of magnitude, and first results are close -- the speedup for the code converted to C is 95 to 1. The converted code remains wrapped in Perl. The Perl wrapper handles the I/O, the user interface, and roughly the last third of the core algorithm, which I have yet to rewrite in C.
The speedup for an actual application depends on how much Perl "wrappering" the application layers on top of the Marpa parse engine. But every test in the test suite is several times faster for XS than for Pure Perl. Ultimately, I will make Marpa's core algorithm available as libmarpa, a C library with a documented interface. If you write a wrapper for libmarpa in C, you will be able to get the full speedup, which, with a few more tweaks, I expect to be better than 100 to 1.
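If you want a rough sense of the Pure-Perl-versus-XS difference on a workload of your own, something like the following sketch with Perl's core Benchmark module would do. The parse_with() helper is hypothetical, written for this illustration, and the sketch assumes Marpa and Marpa::XS can be loaded side by side; if they cannot, time them in separate runs.

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);
    use Marpa;        # the Pure Perl engine
    use Marpa::XS;    # the engine with the C core

    # Hypothetical helper: build the same grammar in whichever package
    # is named, parse a fixed 1000-token input, return the parse value.
    sub parse_with {
        my ($pkg) = @_;
        my $grammar = "${pkg}::Grammar"->new(
            {   start          => 'Seq',
                actions        => 'My_Actions',
                default_action => 'count_items',
                rules          => [
                    # Seq ::= Item+  (a sequence rule)
                    { lhs => 'Seq', rhs => ['Item'], min => 1 },
                ],
            }
        );
        $grammar->precompute();
        my $recce = "${pkg}::Recognizer"->new( { grammar => $grammar } );
        $recce->read( 'Item', 1 ) for 1 .. 1000;
        return ${ $recce->value };
    }

    # Default action: first argument is the per-parse variable,
    # the rest are the item values, so this counts the items.
    sub My_Actions::count_items { return scalar(@_) - 1 }

    # Run each engine for at least 5 CPU seconds and compare rates.
    cmpthese( -5,
        {   pure_perl => sub { parse_with('Marpa') },
            xs        => sub { parse_with('Marpa::XS') },
        }
    );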
Looking Onward: An "Official" Alpha Release
At this moment, Marpa::XS is available in developer's releases only. I prefer that most users look at Marpa instead. Marpa is Pure Perl. It is also alpha, but it is stable at this point.
But with this very measurable speedup, Marpa::XS now offers added value over Marpa, and I am turning to work on an "official alpha" release of Marpa::XS. Such a release would still not be suitable for use in production situations, and its interface would be subject to change.
Notes
Note 1: Technically, the grammars that Marpa parses in linear time are the LR-regular grammars. These include the LR(k) grammars and the LL(k) grammars, for all k.
Note 2: The claim of O(n) time complexity is for the underlying algorithm. Actual implementations tend to pick those data structures which are fastest in practice, instead of on paper. Pure Perl implementations, of course, must rely on the internal implementation of Perl's native data structures.
The relationship between theoretical time complexity and actual implemented speed in our field is like that between quantum mechanics and Newtonian mechanics in physics. There are effects back and forth, and a realm where each clearly rules. There are also specialists in each, who tend to avoid looking at the scales where one set of rules ceases to apply and the other kicks in.
That relationship interests me. In Marpa's XS implementation, I have control of the data structures, and I am using the implementation to work out in detail the proofs of time complexity for Earley's algorithm and Leo's modification of it. In the course of this, I hope to make a full accounting of any differences between theoretical and implemented time complexity.
"Looking Onward: An "Official" Alpha Release"
Geez... do you work for Google or something? This phrase distances you from your code in so many ways ("onward", scare quotes, "alpha") that it's hard to imagine using it.
I would love to work for Google.
I have been criticized for being over-cautious in describing my releases. On the other hand, if you have the choice of using code written by someone who is over-cautious, or by someone who is over-enthusiastic, which would you pick?