A Lexer for Marpa::R2

This proof-of-concept lexer (module, test) extracts literals (and other terminals) from a Marpa::R2::Grammar and turns them into regexes that tokenize string input for Marpa::R2::Recognizer.

It is made possible by the check_terminal(), rule_ids(), and rule() accessors that Marpa::R2::Grammar provides.
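To illustrate how those three accessors can be combined, here is a minimal sketch rather than the code of the module itself. The helper name terminals_of() is hypothetical, and the sketch assumes it is given a precomputed Marpa::R2::Grammar (or any object with the same three methods).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Marpa::R2): collect the names of the
# terminal symbols that appear on the right-hand side of a grammar's rules.
# Assumption: $grammar is a precomputed Marpa::R2::Grammar, or any object
# that provides rule_ids(), rule() and check_terminal().
sub terminals_of {
    my ($grammar) = @_;
    my %seen;
    for my $rule_id ( $grammar->rule_ids() ) {
        # rule() returns the symbol names of the rule, LHS first.
        my ( $lhs, @rhs ) = $grammar->rule($rule_id);
        for my $symbol (@rhs) {
            $seen{$symbol} = 1 if $grammar->check_terminal($symbol);
        }
    }
    # Call this in list context to get the sorted, de-duplicated names.
    return sort keys %seen;
}
```

Something like my @terminals = terminals_of($grammar); would then give the list of names from which regexes can be built.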

It works because the token names passed to the read() method of Marpa::R2::Recognizer must be terminal symbols of the Marpa::R2::Grammar.
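The idea of turning literals into a regex and splitting the input on them can be sketched as follows. This is only an illustration in core Perl, not the actual module: build_lexer_regex() and tokenize() are hypothetical names, the literal list is made up, and the 'Number' fallback name is an assumption standing in for whatever non-literal terminal the grammar defines.

```perl
use strict;
use warnings;

# Hypothetical helper: build one alternation regex from literal terminals.
# Longer literals are tried first, so that '**' wins over '*'.
sub build_lexer_regex {
    my @literals = @_;
    my $alternation = join '|',
        map { quotemeta($_) }
        sort { length($b) <=> length($a) || $a cmp $b } @literals;
    return qr/($alternation)/;
}

# Hypothetical helper: split the input on the literals and return
# [ token_name, token_value ] pairs. A literal is used as its own token
# name; everything else gets $default_name (an assumption of this sketch).
sub tokenize {
    my ( $input, $regex, $default_name ) = @_;
    my @tokens;
    # The capture group in $regex makes split() keep the literals.
    for my $piece ( split $regex, $input ) {
        $piece =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $piece eq '';
        my $name = $piece =~ /\A$regex\z/ ? $piece : $default_name;
        push @tokens, [ $name, $piece ];
    }
    return @tokens;
}

my $regex  = build_lexer_regex( '+', '*', '**', '(', ')' );
my @tokens = tokenize( '2 ** (3 + 4)', $regex, 'Number' );
print "$_->[0]\t$_->[1]\n" for @tokens;
```

Each [ name, value ] pair has the shape that read() expects, so the pairs could be fed to the recognizer one by one, provided every name really is a terminal of the grammar.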

A literal (and, in some cases, a terminal) fits the definition of a token perfectly, so tokenizing input by (pre-)splitting it on literals (terminals) looks like the obvious thing to do. Still, I was unable to find anything definitive on this, so I feel a bit uneasy about whether it would work in general.

But more testing will show the truth, I think. :)

About rns

I blog about Perl.