A Lexer for Marpa::R2

By rns on November 9, 2012 12:02 PM

This proof-of-concept lexer (module, test) extracts literals (and other terminals) from a Marpa::R2::Grammar and turns them into regexes, which are then used to tokenize string input for a Marpa::R2::Recognizer.
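
The idea of turning terminals into regexes can be sketched as follows. This is not the module's code. It is a minimal Python illustration that assumes the literals and the non-literal terminal patterns have already been pulled out of the grammar into two dicts. The token names (`EQ`, `ASSIGN`, `NUM`, ...) and the helpers `build_lexer` and `tokenize` are hypothetical.

```python
import re

def build_lexer(literals, patterns):
    """Combine literal and pattern terminals into one alternation regex.

    literals: dict of token name -> literal text (hypothetical example data)
    patterns: dict of token name -> regex source for non-literal terminals
    Token names must be valid Python identifiers to serve as group names.
    """
    alternatives = []
    # Longest literals first, so that '==' is tried before '='.
    for name, text in sorted(literals.items(), key=lambda kv: -len(kv[1])):
        alternatives.append(f"(?P<{name}>{re.escape(text)})")
    # Pattern terminals come after the literals (pattern sources should
    # not contain capturing groups of their own).
    for name, source in patterns.items():
        alternatives.append(f"(?P<{name}>{source})")
    return re.compile("|".join(alternatives))

def tokenize(lexer, text):
    """Yield (token_name, value) pairs, skipping whitespace between tokens."""
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = lexer.match(text, pos)
        if m is None:
            raise ValueError(f"no token matches at {pos}: {text[pos:pos + 10]!r}")
        # lastgroup is the name of the alternative that matched.
        yield m.lastgroup, m.group()
        pos = m.end()

if __name__ == "__main__":
    lexer = build_lexer({"EQ": "==", "ASSIGN": "=", "PLUS": "+"},
                        {"NUM": r"\d+", "NAME": r"[A-Za-z_]\w*"})
    print(list(tokenize(lexer, "x = 1 + 22")))
```

Each (name, value) pair would then become one token read by the recognizer. Sorting literals by length is one simple way to keep a shorter literal from shadowing a longer one that shares its prefix.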

It is made possible by the check_terminal(), rule_ids(), and rule() accessors that Marpa::R2::Grammar provides.
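
The walk over the grammar that these accessors allow might look roughly like the sketch below. It is written in Python for illustration only. The three callables are plain stand-ins named after the Marpa::R2::Grammar accessors, not the Perl API itself. They assume that `rule()` returns the LHS symbol name followed by the RHS symbol names. The toy grammar is hypothetical.

```python
def collect_terminals(rule_ids, rule, check_terminal):
    """Return the distinct terminal symbols used on any rule's RHS,
    in first-seen order.

    rule_ids, rule and check_terminal are Python stand-ins named after
    the Marpa::R2::Grammar accessors (an assumption for illustration).
    """
    terminals, seen = [], set()
    for rule_id in rule_ids():
        lhs, *rhs = rule(rule_id)  # LHS first, then the RHS symbol names
        for symbol in rhs:
            if symbol not in seen and check_terminal(symbol):
                seen.add(symbol)
                terminals.append(symbol)
    return terminals

# Hypothetical toy grammar: expr ::= expr PLUS term | term ; term ::= NUM
RULES = [("expr", "expr", "PLUS", "term"), ("expr", "term"), ("term", "NUM")]
TERMINALS = {"PLUS", "NUM"}

if __name__ == "__main__":
    print(collect_terminals(lambda: range(len(RULES)),
                            lambda i: RULES[i],
                            lambda s: s in TERMINALS))
```

The resulting list of terminal names is what a lexer would need to map to regexes: literals directly, other terminals through some pattern supplied by the user.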

It works because the token names passed to the read() method of Marpa::R2::Recognizer must be terminal symbols of the Marpa::R2::Grammar.

A literal (and, in some cases, a terminal) fits the definition of a token perfectly, so tokenizing input by (pre-)splitting it on literals (terminals) looks like the obvious approach. However, I was unable to find any definitive references on this, so I feel a bit uneasy about whether it will work in general.
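
One case where plain splitting on literals can go wrong is a literal that is a prefix of a longer token, such as a keyword `if` at the start of an identifier `iffy`. The Python sketch below contrasts first-match splitting with maximal munch (longest match wins, earlier definition wins ties). Maximal munch is a standard lexing rule swapped in here for illustration. Nothing in the post says the module uses it. The token names `IF` and `NAME` are hypothetical.

```python
import re

def tokenize(token_defs, text, longest=True):
    """Lex text into (name, value) pairs.

    token_defs: list of (token_name, compiled_regex), literals listed first.
    longest=False: take the first definition that matches (plain splitting).
    longest=True: take the longest match; on a tie the earlier entry wins.
    """
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for name, regex in token_defs:
            m = regex.match(text, pos)
            if not m or m.end() == pos:  # ignore failures and empty matches
                continue
            if best is None or (longest and m.end() > best[1].end()):
                best = (name, m)
            if not longest:
                break
        if best is None:
            raise ValueError(f"no token matches at position {pos}")
        name, m = best
        tokens.append((name, m.group()))
        pos = m.end()
    return tokens

DEFS = [
    ("IF", re.compile(re.escape("if"))),    # a keyword literal
    ("NAME", re.compile(r"[A-Za-z_]\w*")),  # a non-literal terminal
]

if __name__ == "__main__":
    print(tokenize(DEFS, "iffy", longest=False))  # literal splits the word
    print(tokenize(DEFS, "iffy"))                 # one NAME token
    print(tokenize(DEFS, "if x"))                 # keyword kept on a tie
```

With first-match splitting, `iffy` comes out as `IF` followed by `NAME` (`fy`), which a grammar would most likely reject. Maximal munch keeps `iffy` whole while still lexing a bare `if` as the keyword.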

But more testing will show the truth, I think. :)


Tagged as:

grammar, input, lexer, literals, Marpa, splitting, terminals, tokenizing, tokens


About rns

I blog about Perl.
