Making the parsing game safe

In previous posts, I've talked about Marpa as an alternative to other parsers. In this one, I want to talk about Marpa as an alternative approach to problems where parsing has traditionally been avoided.

Parsing HAS been avoided in the past, and for good reason. If you were drawn by the allure of domain-specific languages, or yielded to the siren call of language-oriented programming, you plunged headlong toward two pitfalls:

  • Your parser generator might not accept your grammar. You might discover this at any point during incremental development, or only when a vital maintenance change came along.
  • Your input might not parse, and your parse engine might leave you with no easy way to find out what the problem is. Maybe your input was wrong, maybe your grammar was wrong, or maybe you've simply hit the limits of that parse engine. When it came to debugging, taking a language-based approach was a bit like deciding to write your problem up in P''.

By approaching your problem as ANYTHING but a parsing problem, you avoided these two pitfalls. Ambitious programmers, after a few encounters with the traditional parsing tools, would learn this. And the next time they dreamed up an elegant little DSL to finesse their design issues, they would wake themselves up and decide that they ain't that desperate yet.

Changing the parsing game

With Marpa in the parsing game, the rules are different. Now anything you can write in BNF will parse. If your grammar belongs to, or comes anywhere close to, one of the grammar classes currently in practical use, Marpa parses it in linear time. If there's a problem, Marpa tells you exactly what it was looking for and why it was looking for it.
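Marpa is based on Jay Earley's algorithm, and the first and last claims above are properties that algorithm already has: it accepts any context-free grammar written in BNF, left recursion and ambiguity included, and at the point where the input stops matching it knows the full set of tokens it was prepared to accept. The following is a minimal Python sketch of a plain Earley recognizer, not Marpa and not Marpa's API. It assumes a hypothetical tuple-based grammar format with no empty rules, a pre-tokenized input list, and a hypothetical `earley_recognize` function; it also omits the optimizations Marpa layers on top of Earley's algorithm, so it does not carry Marpa's linear-time guarantee.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: each nonterminal maps to its alternatives.
# Any symbol that is not a key of the grammar is a terminal token.
# Assumption: no empty (epsilon) rules, which keeps completion simple.
Grammar = Dict[str, List[Tuple[str, ...]]]
# An Earley item: (lhs, rhs, dot position, index of the set where it started)
Item = Tuple[str, Tuple[str, ...], int, int]
Failure = Tuple[int, Set[str]]


def expected_terminals(grammar: Grammar, items: Set[Item]) -> Set[str]:
    """Terminals that some item in this Earley set is waiting for."""
    return {rhs[dot] for (_, rhs, dot, _) in items
            if dot < len(rhs) and rhs[dot] not in grammar}


def earley_recognize(grammar: Grammar, start: str,
                     tokens: List[str]) -> Tuple[bool, Optional[Failure]]:
    """Return (True, None) if tokens derive from start; otherwise
    (False, (position, expected terminals at that position))."""
    sets: List[Set[Item]] = [set() for _ in range(len(tokens) + 1)]
    sets[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}

    for i in range(len(tokens) + 1):
        worklist = list(sets[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            new_items: List[Item] = []
            if dot < len(rhs) and rhs[dot] in grammar:
                # Prediction: start every alternative of the next nonterminal.
                new_items = [(rhs[dot], alt, 0, i) for alt in grammar[rhs[dot]]]
            elif dot == len(rhs):
                # Completion: advance items in the origin set that were
                # waiting for this lhs. Without empty rules, origin < i,
                # so that set is already finished.
                new_items = [(l2, r2, d2 + 1, o2)
                             for (l2, r2, d2, o2) in sets[origin]
                             if d2 < len(r2) and r2[d2] == lhs]
            for item in new_items:
                if item not in sets[i]:
                    sets[i].add(item)
                    worklist.append(item)

        if i == len(tokens):
            break
        # Scanning: move past the next token wherever it was expected.
        for (lhs, rhs, dot, origin) in sets[i]:
            if dot < len(rhs) and rhs[dot] == tokens[i]:
                sets[i + 1].add((lhs, rhs, dot + 1, origin))
        if not sets[i + 1]:
            # Nothing accepted tokens[i]: report what was expected instead.
            return False, (i, expected_terminals(grammar, sets[i]))

    final = sets[len(tokens)]
    if any(lhs == start and dot == len(rhs) and origin == 0
           for (lhs, rhs, dot, origin) in final):
        return True, None
    return False, (len(tokens), expected_terminals(grammar, final))


# A left-recursive expression grammar, written directly in BNF style.
GRAMMAR: Grammar = {
    "Expr": [("Expr", "+", "Term"), ("Term",)],
    "Term": [("Term", "*", "Factor"), ("Factor",)],
    "Factor": [("(", "Expr", ")"), ("num",)],
}

if __name__ == "__main__":
    print(earley_recognize(GRAMMAR, "Expr", ["num", "+", "num", "*", "num"]))
    print(earley_recognize(GRAMMAR, "Expr", ["num", "+", "*"]))
```

The example grammar is left-recursive, which a naive recursive-descent parser would loop on, yet it is accepted exactly as written. For `num + num * num` the recognizer returns `(True, None)`; for `num + *` it stops at token position 2 and reports that it was expecting either `(` or `num` there, which is the kind of "what was I looking for" answer described above.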

The Interpreter pattern, domain-specific languages, and language-oriented programming are all immensely powerful techniques. Almost any problem CAN be seen as the domain of a language. In practice, less powerful techniques are often a better fit. And with the traditional language-writing tools, it was a rare problem indeed for which a DSL justified the risk and effort.

Since creating a new language was so hard, the emphasis fell on reusing existing ones instead. We've gotten used to the idea of leveraging existing languages, even ones that are a very poor fit for our problems, because the alternative was even worse.

More and better DSLs could breathe new life into our programming tools and our programming methods. And now that the parsing game has become easier, DSLs are within reach in cases where they were not before.

Note

"parsing has been avoided": For the purposes of this post, I do not count regular expressions as "parsing solutions". This post focuses on languages, and the term "language" is usually avoided when describing anything within the restricted syntax accepted by regular expressions. However, regular expressions do largely avoid the pitfalls described in this post, and that goes a long way toward explaining their popularity.

Comments

Unfortunately, Marpa doesn't pass all of its tests. I was preparing a test report, but it seems you already have plenty to work with (http://static.cpantesters.org/distro/M/Marpa.html). I am writing an article on parser generators in Perl, and I would like to include Marpa in my comparison.

Thanks.

Jeffrey, if you are thinking of "removing" Marpa, might you consider using the bare namespace as a distribution point, à la Text::CSV? As you probably know, when you use Text::CSV, it loads Text::CSV_XS if available and otherwise loads Text::CSV_PP, which comes packaged with the bare-named module. Perhaps Marpa could provide Marpa::PP but load Marpa::XS if available? Just a suggestion, but it would certainly be less confusing. If you feel you must remove Marpa and leave the namespace empty, perhaps include a simple pod document explaining where users should go to find Marpa; see, for example, the Alien distribution, which is simply some pod describing the namespace.

+1 for Joel's suggestion. I like how JSON picks JSON::XS or JSON::PP depending on availability; this way, it's still possible to explicitly choose an implementation if the user wants to. I don't think Marpa's user base is extensive yet, so it should be possible to handle this change fairly gracefully. But the decision rests solely in the author's hands.


About Jeffrey Kegler

I blog about Perl, with a focus on parsing and Marpa, my parsing algorithm based on Jay Earley's.