Marpa::XS is now 1.000000
Marpa::XS is now 1.000000. Marpa::XS is the current lead implementation of Marpa, an algorithm that I hope will become standard for those parsing problems which are too complex for regular expressions. Apparently quite a number of people have put the beta to use. Feedback has been positive -- often extremely so.
What is Marpa?
Marpa is a general BNF parser -- it parses anything you can write in BNF, no exceptions. Left-recursion, right-recursion, ambiguity and even infinite ambiguity, you name it, Marpa parses it. If the grammar is of a class in practical use, Marpa parses it in linear time -- O(n).
Marpa's parse-time error detection is a breakthrough. When previous parsers failed, they often offered very little clue as to why. Marpa knows exactly what input it expects and why. Marpa is always fully aware of exactly where it is in the parse, in terms of the rules of the grammar, and it can share that information with the application. So good is Marpa at error detection, once considered a desperate last resort, that error detection can be used as a parsing technique in itself.
While Marpa is intended to computer with production parsers, it does have special advantages for developers and experimenters. Marpa is highly tolerant of difficult grammars -- it parses all of them, and in times which are considered optimal.
New with this release
For Marpa::XS 1.000000, only the version number and the README file were changed from the previous, beta, release.
What is next with Marpa?
Marpa::XS is aimed at users who want a stable platform for applications. To ensure the stability of Marpa::XS, active development of Marpa is moving into a new fork: Marpa::R2. This will isolate Marpa::XS users from the accidental changes and bugs that can be the side effect of active development.
Initially, changes to Marpa::XS will be restricted to bug fixes and those justified from a maintainability standpoint. The feature set will be kept stable. (As it stands, Marpa::XS is much more fully featured than competing parsers.) If I enhance the features of Marpa::XS, the new features will be back-ported from Marpa's active development forks, and I will preserve backward compatibility.
Limitations
Marpa::XS is, as the name suggests, XS only -- installation requires access to a C compiler, and to many of the GNU utilities and libraries as well. Marpa::XS has been tested on a wide variety of POSIX systems. In theory Marpa::XS is NOT restricted to POSIX systems -- all the tools it uses have Windows versions, for example. However, Marpa::XS has not, to my knowledge, been installed on a non-POSIX system.
Notes
- "in linear time": To be specific, Marpa parses any LR-regular grammar in linear time -- O(n). LR-regular is a vast class of grammars that includes LALR, LR(k) for all k, and LL(k) for all k. This means that Marpa parses, in linear time, every class of grammar parsed by yacc, recursive descent and regular expressions.
- "considered optimal": The phrase "considered optimal" elides some irrelevant bits of theory. I would be mildly surprised if it turns out that there is an O(n) algorithm for general BNF parsing, but nobody has proved that such a thing cannot exist. And there is an algorithm which, in theory, beats Marpa's O(n**3) worst case. The Valiant algorithm parses general BNF and is O(n**2.373...) or better. But Valiant's algorithm is only faster for huge problems, and for those it needs a machine with many terabytes of main memory to deliver on its speed claims. So it won't be competing with Marpa any time soon.
- "GNU utilities and libraries":
These dependences can be an inconvenience, I admit, but
the alternative is installing
my attempt to portably re-create
all the things the GNU people have developed.
I think that it is clear that the GNU software is the easier
and more reliable alternative.
If you browse the package, you may see that it uses TeX as well. TeX is ONLY needed is you are working on libmarpa, the highly mathematical, low-level core library that provides the parse engine. To do this, you'd need to have studied a lot of the mathematics of parsing -- and you'd understand why I feel forced to do the documentation in TeX. All the non-mathematical parts are either in Perl, or in C code which can be read and changed on systems which do not have TeX installed.
Hi Jeffrey
While I think of it - has anyone coded a parser for Graphviz's
DOT language?
And, would it be that hard?
Seconded. I'm longing for more examples aside from the HTML parser. Preferably simpler ones, included under examples/ or demo/ in the distribution. Compare, for one, with Regexp::Grammar's distribution:
http://cpansearch.perl.org/src/DCONWAY/Regexp-Grammars-1.014/demo/
The examples help *a lot* for casual parser writers.
@Ron: This is the first I've heard of Graphviz's DOT, but based on a quick look, my guess is parsing the DOT language with Marpa::XS would be quite straightforward. The DOT language seems well-documented: There is BNF for the grammar and a description of how to do the lexing. The grammar looks like something Marpa should have no problem with.
As for the lexing, there are DOT modules on CPAN, which at 1st glance seem to do lexing. Hopefully, those could be leveraged to provide a lexer to a Marpa-powered parser.
@Steven: One thing I did before taking Marpa::XS 1.000000 was to remove all the deprecated techniques out of the tests in the t directory (except in the deprecated*.t tests). All the other test files are safe to use as examples. Most suitable for that purpose are perhaps ah_numeric.t, counter.t, equation.t, minus.t, pascal.t, randal.t, timeflies.t and wall.t. The .t files are cluttered, because I usually test not just output, but diagnostics (a practice I highly recommend, by the way.)
Marpa was a very long (4+ full time years) and complex project, and until it was stable, I could not be sure that any demos I did would still run -- unless of course I included them in the test suite. This is perhaps not exactly what you wanted, but I hope it helps.
Ah, the test files of course. It's a start, thanks! :)
The big problem with using Marpa::XS is that anything useful I build on top of it will inherit the same platform limitations, and so will everything that uses my things as well, and so on recursively.
That's quite a big problem, and the reason why high quality modules like SVN::Core are never really adopted.
I would suggest that whatever is needed to get it working on all platforms, it is very very much worth trying.
Windows for example does already have dmake, and gnu make, and gcc, and other utilities. But some of the stuff around the edges aren't available and my be worth some evil hackery in order to get it working.
@Adam: Dunno what you mean by "stuff around the edges". Throughout my work on Marpa::XS, I made sure to avoid POSIX dependencies. And with anything I used, I ALWAYS checked to see that it was available on Windows. But I don't have access to a Windows machine, so I was not able to test.
There's only two non-core Perl dependencies, Glib and ExtUtils::PkgConfig, and they run on Windows. gcc is NOT required -- The C code for Marpa::XS is C89, and compiles clean with "-ansi -pedantic". File handling is done only during build, and uses File::Spec, etc., etc. My hope and expectation is that no "evil hackery" is needed to get Marpa::XS onto Windows.
@Steven: Also, some very nice examples are starting to appear on the net already, like this
very nice-looking CSS parser by Wolfgang Kinkeldei.
If it is supposed to work on Windows, then something isn't quite right.
I think the initial problem is you haven't listed ExtUtils::PkgConfig as a dependency (needs to be a configure_requires one)
@adam: Mysterious. I went in to make the fix, and found that ExtUtils::PkgConfig already *IS* one of the configure_requires prereqs. It shows in the META.json.
@Steven: I have added a section to the Marpa web page which has examples of Marpa parsers written by others. Frankly, I find the Marpa parsers that others write often look more elegant than my own efforts.