Making the parsing game safe

By Jeffrey Kegler on March 10, 2012 7:49 PM

In previous posts, I've talked about Marpa as an alternative to other parsers. In this one, I want to talk about Marpa as an alternative for problems where parsing has been avoided.

Because parsing HAS been avoided in the past. And for good reason. If you were drawn by the allure of domain-specific languages, or yielded to the siren call of language-oriented programming, you plunged headlong toward two pitfalls:

Your parser might not parse your grammar. Which you might discover at any point in incremental development. Or when a vital maintenance change came along.
Your input might not parse, and your parse engine might leave you with no easy way to find out what the problem is. Maybe your input was wrong, maybe your grammar was wrong, or maybe you've simply hit the limits of that parse engine. When it came to debugging, taking a language-based approach was a bit like deciding to write your problem up in P''.

By approaching your problem as ANYTHING but a parsing problem, you avoided these two pitfalls. Ambitious programmers, after a few encounters with the traditional parsing tools, would learn this. And the next time they dreamed up an elegant little DSL to finesse their design issues, they would wake themselves up and decide that they ain't that desperate yet.

Changing the parsing game

With Marpa in the parsing game, the rules are different. Now, anything you can write in BNF will parse. If your grammar falls into anything close to one of the classes of grammar currently in practical use, Marpa parses in linear time. If there's a problem, Marpa tells you exactly what it was looking for and why it was looking for it.
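To make "tells you exactly what it was looking for" concrete, here is a minimal sketch of a textbook Earley recognizer in Python. It is not Marpa's API or implementation; the names `Item`, `recognize` and `GRAMMAR` are made up for illustration, and the sketch assumes a grammar with no empty rules. It accepts an ambiguous, left-recursive grammar as written, and when the input fails it reports the position and the terminals that would have been accepted there.

```python
from collections import namedtuple

# An Earley item: the rule "lhs -> rhs", a dot position, and the input
# position (origin) where recognition of this rule began.
Item = namedtuple("Item", "lhs rhs dot origin")


def recognize(grammar, start, tokens):
    """Return (ok, position, expected_terminals).

    grammar maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of grammar is treated as a terminal.
    Assumption of this sketch: the grammar has no empty rules.
    """
    sets = [set() for _ in range(len(tokens) + 1)]
    sets[0] = {Item(start, rhs, 0, 0) for rhs in grammar[start]}
    for pos in range(len(tokens) + 1):
        worklist = list(sets[pos])
        expected = set()  # terminals that would be accepted at pos
        while worklist:
            item = worklist.pop()
            if item.dot == len(item.rhs):
                # Completion: advance the items that were waiting for item.lhs.
                new = [Item(w.lhs, w.rhs, w.dot + 1, w.origin)
                       for w in sets[item.origin]
                       if w.dot < len(w.rhs) and w.rhs[w.dot] == item.lhs]
            elif item.rhs[item.dot] in grammar:
                # Prediction: start every rule of the awaited nonterminal.
                sym = item.rhs[item.dot]
                new = [Item(sym, rhs, 0, pos) for rhs in grammar[sym]]
            else:
                # The dot is in front of a terminal: record what we want.
                expected.add(item.rhs[item.dot])
                new = []
            for n in new:
                if n not in sets[pos]:
                    sets[pos].add(n)
                    worklist.append(n)
        if pos == len(tokens):
            ok = any(i.lhs == start and i.origin == 0 and i.dot == len(i.rhs)
                     for i in sets[pos])
            return ok, pos, sorted(expected)
        # Scanning: move the dot over the current token where it matches.
        sets[pos + 1] = {Item(i.lhs, i.rhs, i.dot + 1, i.origin)
                         for i in sets[pos]
                         if i.dot < len(i.rhs) and i.rhs[i.dot] == tokens[pos]}
        if not sets[pos + 1]:
            return False, pos, sorted(expected)


# An ambiguous, left-recursive grammar: E ::= E + E | E * E | n
GRAMMAR = {"E": [("E", "+", "E"), ("E", "*", "E"), ("n",)]}

print(recognize(GRAMMAR, "E", ["n", "+", "n", "*", "n"])[0])  # True
print(recognize(GRAMMAR, "E", ["n", "+", "*"]))  # (False, 2, ['n'])
```

The second call fails at token 2, and the recognizer can say that only an `n` would have been acceptable there. A production parser needs far more than this (empty rules, evaluation, efficiency), but the diagnostic falls out of the Earley sets themselves.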

The Interpreter pattern, domain-specific languages, and language-oriented programming are all immensely powerful techniques. Almost any problem CAN be seen as the domain of a language. In practice, less powerful techniques are often a better fit. And with the traditional language-writing tools, it was a rare problem indeed for which a DSL was seen to justify the risk and effort.

Since it was so hard to create a new language, reuse of languages was emphasized instead. We've gotten used to the idea of leveraging existing languages, even ones which are a very poor fit to our problems, because the alternative was even worse.

More and better DSLs could breathe new life into our programming tools and our programming methods. And now that the parsing game has become easier, DSLs are within reach in cases where they were not before.

Note

"parsing has been avoided": For the purposes of this post, I do not include regular expressions as "parsing solutions". This post focuses on languages, and the term "language" is usually avoided when describing anything within the restricted syntax accepted by regular expressions. However, regular expressions do largely avoid the pitfalls described in this post, and that goes a long way to explain their popularity.

Tagged as:

Marpa Earley parsing parser

6 Comments

Alberto Simões | March 12, 2012 6:16 AM

Unfortunately, Marpa doesn’t pass all its tests. I was preparing a test report, but it seems you already have plenty to work with (http://static.cpantesters.org/distro/M/Marpa.html). I am writing an article on parser generators in Perl, and I would like to include Marpa in my comparison.

Thanks.

Jeffrey Kegler | March 12, 2012 7:35 AM

@Alberto: The distribution you looked at is the legacy, deprecated "bare name" Marpa. The official, stable version is Marpa::XS. (I left "bare name" Marpa on CPAN as a convenience for its users, but perhaps at this point I should remove it.)

I follow closely all FAIL, UNKNOWN and NA test reports for Marpa. To my knowledge, there are none which indicate Marpa issues. Specifically,

Carp very recently changed the punctuation of an error message. Marpa's test suite tests its diagnostics, and the punctuation change causes these tests to fail. The failure is harmless, and a forced install should work fine. Fixes to Marpa::XS and Marpa::PP will be released very shortly.
Some issues in some versions of Perl 5.15, a development release, have affected Marpa. These have been fixed by the Perl team, but the failing test reports still can be found in the test matrices.

Marpa::XS requires CPAN's Glib and GNU's glib. Smokers should report the absence of Glib or glib as an "UNKNOWN", and most do. But some report it as a "FAIL". The "fix" is to install glib.

Joel Berger replied to comment from Jeffrey Kegler | March 12, 2012 8:33 AM

Jeffrey, if you are thinking of "removing" Marpa, might you consider using the bare namespace as a distribution point, à la Text::CSV? As you probably know, when you use Text::CSV it loads Text::CSV_XS if available and otherwise loads Text::CSV_PP, which comes packaged with the bare-named module. Perhaps Marpa could provide Marpa::PP but load Marpa::XS if available? Just a suggestion, but it would certainly be less confusing. If you feel you must remove Marpa and leave the namespace empty, perhaps place a simple pod document there explaining where users should go to find Marpa; see, for example, the Alien distribution, which is simply some pod describing the namespace.

Jeffrey Kegler replied to comment from Joel Berger | March 12, 2012 3:35 PM

@Joel: I've pretty much decided to remove the "bare name" Marpa version. I won't put in a replacement -- the way CPAN works, on upgrade it would overwrite existing installations with either the new semantics or the pod stub, which might be a disaster for some users.

Steven Haryanto | March 12, 2012 4:34 PM

+1 for Joel's suggestion. I like how JSON picks JSON::XS or JSON::PP depending on availability; this way, it's still possible to explicitly choose an implementation if the user wants. I don't think Marpa's userbase is extensive yet, so it should be possible to deal with this change somewhat gracefully. But the decision rests solely in the author's hands.

Jeffrey Kegler replied to comment from Steven Haryanto | March 12, 2012 4:50 PM

@Steven, Joel: thanks for the feedback.

Actually, I have experience with the "XS if possible, PP if not" implementation: Marpa::HTML does exactly that. In the Marpa context, this has proved very problematic. Because Marpa::XS has a significant non-Perl "alien" dependency, "XS if possible, PP if not" in practice becomes pretty much equivalent to "almost always PP". This is particularly so in the cpantesters environment, where 90% of the testing is of the PP version, which is a QA disaster.

Also, Marpa::XS and Marpa::PP have differences, both subtle and not. Which means an application might work or not depending on which it got. I'm unlikely to have the cycles anytime soon to bring them into sync.

Btw, my current priority is to package the core of Marpa to make it possible for Perl mavens to create Marpa interfaces to their liking.

About Jeffrey Kegler

I blog about Perl, with a focus on parsing and Marpa, my parsing algorithm based on Jay Earley's.

Ocean of Awareness