Perl's Pegex Module: a great way to parse files by creating grammars

By Vikas Kumar on August 3, 2016 9:38 PM

We recently came across Pegex and found it to be an interesting module for parsing text data. Instead of using regular expressions directly, the user writes a grammar for the data to be parsed. The parsed data can be converted automatically to a native Perl object or, if the user prefers, a Pegex::Receiver class can supply actions that handle each rule of the grammar as it is parsed.

Pegex uses Parsing Expression Grammars (PEG), a formalism in which a grammar is unambiguous by construction: alternatives are tried in order and the first one that matches wins, so each parsed string has at most one valid parse tree. Because Pegex converts the rules of the grammar to regular expressions, it is a greedy parser.
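
The single-parse-tree property comes from PEG's ordered choice. As a minimal sketch of that behaviour, here is a tiny hand-written matcher in plain Python. It is not Pegex and does not use any Pegex API; the helper names `literal`, `choice`, and `sequence` are made up for this illustration.

```python
# Illustration of PEG ordered choice. Plain Python, NOT Pegex;
# literal/choice/sequence are hypothetical helper names.

def literal(text):
    """Build a matcher for `text`: returns the end position, or None on failure."""
    def parse(s, pos):
        return pos + len(text) if s.startswith(text, pos) else None
    return parse

def choice(*alternatives):
    """Ordered choice: the first alternative that succeeds wins, for good."""
    def parse(s, pos):
        for alt in alternatives:
            end = alt(s, pos)
            if end is not None:
                return end  # later alternatives are never tried
        return None
    return parse

def sequence(*parts):
    """Match each part in turn, failing as soon as one part fails."""
    def parse(s, pos):
        for part in parts:
            pos = part(s, pos)
            if pos is None:
                return None
        return pos
    return parse

# rule: ("a" / "ab") "c"
rule = sequence(choice(literal("a"), literal("ab")), literal("c"))

print(rule("ac", 0))   # 2
print(rule("abc", 0))  # None: "a" wins the choice, then "c" fails against "b"
```

The second call fails even though `"ab" "c"` would match the input, because the choice is never revisited once `"a"` has succeeded. That is what keeps a PEG unambiguous, and it is also why the order of alternatives matters when writing a grammar.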

In this blog post we demonstrate how to use Pegex to parse an /etc/hosts file on Linux and have the result converted into Perl objects automatically, without creating any object by hand.

For more details, check out my original blog post.

Tagged as:

pegex

6 Comments

Ron Savage | August 4, 2016 12:21 AM

Before you waste any more time with PEG, read https://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2015/03/peg.html, understand why it's dangerous, and switch to https://metacpan.org/release/Marpa-R2.

Matt S Trout (mst) | August 4, 2016 4:07 PM

I'm afraid that article doesn't seem like a convincing argument unless you accept the author's premises a priori. Certainly the second example (A = "a"A"a"/"aa") was entirely obvious to me, and yet was clearly chosen to try and demonstrate a problem.

I think a clearer comparison of the strengths and weaknesses of PEG and Marpa, preferably with input from fans of both styles, would be a useful thing, but dismissing somebody's working code as a waste of time doesn't really count.

-- mst

Ron Savage | August 5, 2016 12:04 AM

Matt

Sure, but you're judging it from the point of view of an expert. My aim is to warn beginners of the dangers.

Jeffrey Kegler | August 5, 2016 1:04 AM

Thanks for looking at my piece. Most users will read the PEG A = "a"A"a"/"aa" as if it were BNF or a regex, in which case the number of "a"s clearly must be a multiple of two. But to match in PEG, the count must be a power of two.

This was not obvious to the author or referees of the article from which I took the example. It is shown by formulating and solving a recurrence. The mathematician John von Neumann was famed for his ability to formulate and solve these off the top of his head, but I wonder whether, at this point in my article, you perhaps read me a bit hastily, and whether you would not agree with Redziejowski (http://www.romanredz.se/papers/FI2008.pdf, p. 4) that this in fact "really defies intuition".
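
The power-of-two claim can also be checked mechanically. The following is a minimal sketch in plain Python, not Pegex and not taken from the cited paper, that applies PEG semantics (ordered choice, with no second attempt at a rule once it has returned a result) to A = "a"A"a"/"aa". The names `match_A` and `accepts` are invented for the illustration.

```python
# Hand-written matcher for the PEG rule  A = "a" A "a" / "aa".
# Plain Python, NOT Pegex; match_A and accepts are hypothetical names.

def match_A(s, pos=0):
    """Return the position where A stops matching from `pos`, or None."""
    # First alternative: "a" A "a"
    if s.startswith("a", pos):
        inner = match_A(s, pos + 1)
        # A gives exactly one answer; if the trailing "a" is missing,
        # the whole alternative fails rather than retrying A differently.
        if inner is not None and s.startswith("a", inner):
            return inner + 1
    # Second alternative: "aa"
    if s.startswith("aa", pos):
        return pos + 2
    return None

def accepts(n):
    """True if A consumes a string of exactly n 'a' characters in full."""
    return match_A("a" * n) == n

matching = [n for n in range(1, 40) if accepts(n)]
print(matching)  # [2, 4, 8, 16, 32]
```

A string of six "a"s shows where the multiple-of-two reading breaks down: the inner A stops after consuming only part of the remaining input, so `match_A("aaaaaa")` returns 4 and two characters are left unmatched.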

Matt S Trout (mst) | August 7, 2016 1:18 PM

Ron: If the article was insufficient for *me* to understand why you think PEG is "dangerous", I fail to see how it's going to help beginners at all.

Jeffrey: Nah, I read it as a regex-like thing and got the right answer immediately - a *multiple* of two would obviously require "aa"A/"aa" or similar, no? I think perhaps it might be an interesting exercise for you to show a few more "intuition defying" examples to a few perl hackers some time and see whether it truly defies intuition in general or merely *an* intuition, and we have a different one.

-- mst

Jeffrey Kegler replied to comment from Matt S Trout (mst) | August 9, 2016 8:07 PM

Intuitions certainly can differ. I may add more about PEG to my "Parsing: A Timeline" piece (http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2014/09/chron.html). PEG's algorithm in fact goes back to the pre-YACC compiler-compilers of the 1960s.

About Vikas Kumar

I use Perl.