A Lexer for Marpa::R2
This proof-of-concept lexer (module, test) extracts literals (and other terminals) from a Marpa::R2::Grammar and turns them into regexes that tokenize string input for a Marpa::R2::Recognizer.
It is made possible by the check_terminal(), rule_ids(), and rule() accessors provided by Marpa::R2::Grammar.
It works because the token names passed to read() on a Marpa::R2::Recognizer must be terminal symbols of the Marpa::R2::Grammar.
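To make the idea concrete, here is a minimal, Marpa-free sketch in Python. It is not the module's code, only an illustration of the technique under stated assumptions. The RULES list is a hypothetical stand-in for the kind of information rule_ids() and rule() expose (a left-hand side followed by right-hand-side symbols). "Terminal" is approximated as "a symbol that never appears on a left-hand side", which is an assumption replacing check_terminal(). TERMINAL_PATTERNS is an invented table for terminals that are not literals.

```python
import re

# Hypothetical grammar: each rule is (lhs, rhs_1, rhs_2, ...).
# This only mimics the shape of rule data; it is not Marpa's API.
RULES = [
    ("expr", "expr", "+", "term"),
    ("expr", "term"),
    ("term", "term", "*", "factor"),
    ("term", "factor"),
    ("factor", "(", "expr", ")"),
    ("factor", "number"),
]

# Assumed regexes for non-literal terminals. They must not contain
# capturing groups and must not match the empty string.
TERMINAL_PATTERNS = {"number": r"\d+"}


def terminals(rules):
    """Return symbols that never occur as an LHS, in first-seen order."""
    lhs = {rule[0] for rule in rules}
    seen = []
    for rule in rules:
        for sym in rule[1:]:
            if sym not in lhs and sym not in seen:
                seen.append(sym)
    return seen


def build_lexer(rules, patterns):
    """Build one alternation regex with a capturing group per terminal."""
    names = terminals(rules)
    # Longer literals go first so that a longer literal wins over its prefix.
    literals = sorted((n for n in names if n not in patterns),
                      key=len, reverse=True)
    named = [n for n in names if n in patterns]
    ordered = literals + named
    parts = [patterns[n] if n in patterns else re.escape(n) for n in ordered]
    regex = re.compile("|".join(f"({p})" for p in parts))
    return ordered, regex


def tokenize(text, rules=RULES, patterns=TERMINAL_PATTERNS):
    """Split text into (token_name, token_value) pairs, skipping whitespace."""
    ordered, regex = build_lexer(rules, patterns)
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = regex.match(text, pos)
        if m is None or m.end() == pos:
            raise ValueError(f"no terminal matches at offset {pos}")
        # lastindex is the number of the group that matched.
        tokens.append((ordered[m.lastindex - 1], m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for name, value in tokenize("2 * (3 + 41)"):
        print(name, value)
```

Each (token_name, token_value) pair is the kind of thing one would then hand to the recognizer's read(). Skipping whitespace and preferring longer literals are design choices of this sketch, not something the post specifies.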
A literal (and, in some cases, any terminal) fits the definition of a token perfectly, so tokenizing input by (pre-)splitting it on literals and terminals looks like the obvious approach. However, I was unable to find definitive references confirming this, so I feel a bit uneasy about whether it would work in general.
But more testing will show the truth, I think. :)