July 2013 Archives

Parsing Ada Lovelace

[ This is cross-posted from the Ocean of Awareness blog. ]

The application

Abstract Syntax Forests (ASF's) are my most recent project. I am adding ASF's to my Marpa parser. Marpa has long supported ambiguous parsing, and allowed users to iterate through, and examine, all the parses of an ambiguous parse. This was enough for most applications.

Even applications which avoid ambiguity benefit from better ways to detect and locate it. And there are applications that require the ability to select among and manipulate very large sets of ambiguous parses. Prominent among these is Natural Language Processing (NLP). This post will introduce an experiment. Marpa in fact seems to have some potential for NLP.

Writing an efficient ASF in not a simple matter. The naive implementation is to generate complete set of fully expanded abstract syntax trees (AST's). This approach consumes resources that can become exponential in the size of the input. Translation: the naive implementation quickly becomes unuseably slow. Marpa optimizes by aggressively identifying identical subtrees of the AST's. Especially in highly ambiguous parses, many subtrees are identical, and this optimization is often a big win.

About Jeffrey Kegler

user-pic I blog about Perl, with a focus on parsing and Marpa, my parsing algorithm based on Jay Earley's.