Marpa is now O(n) for Right Recursions
There's news with the latest version of Marpa (0.102000). Marpa now parses grammars with right-recursions in linear time (O(n)). (Marpa already handled left-recursion in linear time.)
This means that Marpa is now O(n) for all LR-regular grammars. LR-regular means LR with infinite lookahead using regular expressions. That a big class of grammars. It obviously includes all the LR(k) grammars, and therefore everything parsed by Yapp, yacc, and bison. LR-regular grammars also include everything parseable by recursive descent, PEGs, and other LL(k) grammars. LR-regular definitely includes all regular expressions.
Marpa's O(n) behavior has another nice feature. When it does not parse in O(n) time, it still parses. Some parser generators always parse quickly, because when they can't parse quickly, they don't parse at all. Marpa will parse anything you can write in BNF, even highly ambiguous grammars, and the absolute worst case is cubic (O(n**3)).
In my last post, I explained that the previous release of Marpa could parse unusually large classes of grammars in linear time, and that the right recursive cases where Marpa was not linear could usually be worked around. In fact, my experience had been that working around a right recursion was easy, so I'd never bothered looking hard at the issue.
Sometimes, writing a long explanation of why a limitation does not matter makes me think: Perhaps it does matter enough to take a second look. And take a second look is what I did.
A 1991 article by Joop Leo had laid out a modification to Earley's algorithm (the basis of Marpa) which was O(n) for all LR-regular grammars. Problem was, Marpa already incorporated other, major, enhancements to Earley's from another article, this one by Aycock and Horspool and dating to 2002. Were the two modifications compatible?
They are. And Marpa 0.102000 is the result. CPAN and the Perl community has it, and everybody else will have it when they borrow from us.
It's very nice to see this sort of cutting edge "academic" work being done in Perl.
I was interested in Marpa when I first heard about it and I'll definitely take a look at it again in the future for some personal projects I have in mind.
I'm curious regarding Marpa real-world performance, i.e., benchmarks. Does it fare well against the other Perl alternatives?
Would Marpa fare well against other Perl alternatives in benchmarks? My fast, biased, answer is yes. Speed can be compared to PPI.
A fast way to test this for yourself is to take the utilities in Marpa::HTML and run them on sample Web pages. Marpa::HTML uses Marpa and I think you'll find speed of those utilties quite acceptable. It actually is comparable to some rendering engines.
Where applications for restricted grammars already have carefully optimized parsers, Marpa will not be the best tool. XML, for example, has a grammar which was designed to be easy to parse and parsers specially written to crunch it. A general BNF parser like Marpa is never going to beat that combination. The best that can be hoped is to get into the same ballpark.
I'm planning to convert Marpa to XS. The code is mainly pointer-twiddling, so it should do well when converted to C. At that point, I hope speeds will go from acceptable to impressive.
Very nice, I'll keep an eye on your project. :)