What! No Lexer?
To those who have noted that Marpa::XS does not come with a lexer, I'd respond that, in a very real sense, it does -- Perl. Perl 5 is a powerful lexical analyzer.
If you're trying to figure out how to write your first Marpa parser, I'd recommend a close look at Wolfgang Kinkeldei's recent posting about his Marpa-powered CSS parser. Wolfgang lays his parser out in a very elegant fashion, and I find his code makes an excellent template.
Especially nice-looking is Wolfgang's lexer. Wolfgang follows one of the two main strategies for lexical analysis in Perl: he consumes the input using substitution (s/ ... / ... /) commands.
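As a rough illustration of the substitution strategy, here is a minimal sketch. It is not Wolfgang's code: the function name lex_by_substitution and the token rules for a tiny arithmetic language are assumptions made up for this example. Each successful s/// removes the matched token from the front of a working copy of the input.

```perl
use strict;
use warnings;

# Hypothetical lexer for a tiny arithmetic language (illustration only).
# Each rule strips one token from the front of the working copy of the input.
sub lex_by_substitution {
    my ($input) = @_;    # a copy; the caller's string is left untouched
    my @tokens;
    while ( length $input ) {
        if    ( $input =~ s{\A\s+}{} )           { next }    # skip whitespace
        elsif ( $input =~ s{\A(\d+)}{} )         { push @tokens, [ NUMBER => $1 ] }
        elsif ( $input =~ s{\A([-+*/()])}{} )    { push @tokens, [ OP => $1 ] }
        else {
            die "Cannot lex input starting at: " . substr( $input, 0, 10 ) . "\n";
        }
    }
    return \@tokens;
}
```

Because the input shrinks as it is consumed, the original offsets are gone by the time an error is detected, which is the main trade-off of this style.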
The other strategy is to use the Perl regex search position to track the progress of the lexical analysis. In the search-position strategy, your cases consist of a lot of match commands using the \G anchor and the gc modifiers: m/\G ... /gc.
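A minimal sketch of the search-position strategy might look like the following. The function name lex_by_position and the token classes are hypothetical, chosen only for this example. The \G anchor pins each match to the current search position, and the c modifier keeps that position intact when a match fails, so the next case can be tried at the same spot.

```perl
use strict;
use warnings;

# Hypothetical lexer (illustration only): the input is never modified;
# pos($input) records how far the scan has progressed.
sub lex_by_position {
    my ($input) = @_;
    my @tokens;
    pos($input) = 0;
    while ( pos($input) < length $input ) {
        if    ( $input =~ m/\G\s+/gc )              { next }    # skip whitespace
        elsif ( $input =~ m/\G(\d+)/gc )            { push @tokens, [ NUMBER => $1 ] }
        elsif ( $input =~ m/\G([A-Za-z_]\w*)/gc )   { push @tokens, [ NAME => $1 ] }
        elsif ( $input =~ m{\G([-+*/=()])}gc )      { push @tokens, [ OP => $1 ] }
        else {
            die "Unexpected character at offset " . pos($input) . "\n";
        }
    }
    return \@tokens;
}
```

Since the input string survives, the offset of a failure is available directly from pos().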
An excellent tutorial on this kind of lexing, albeit in a non-Marpa context, can be found in Mark Jason Dominus's book, Higher Order Perl. Mark's coverage of lexing is in Chapter 8, "Parsing", on pages 359-375. Mark's book can be read on-line. I highly recommend Mark's book and own a paper copy.
Actually, regular expressions are well within Marpa's capabilities, and lexical analysis could be done in Marpa. But a look at Mark's and Wolfgang's code should convince you that lexical analysis is easy to do in Perl.
Thanks for the flowers. The code snippet you are referring to is only a very quick and dirty first try I made using Marpa. Nice to hear that you like the lexer. However, with this approach it is hard to tell the line number of the source code where an error occurred. But we would not use Perl if no solution existed... By expanding the regex patterns like
the source-text will not get destroyed and the position where the last-matching regex stopped can get queried using
This makes error reporting possible.
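One way this kind of error reporting could be done is sketched below. This is an assumption about the general idea, not the commenter's actual code: it assumes patterns anchored with \G and using the gc modifiers, and reads the position with Perl's built-in pos(). The helper line_and_column and the toy scanner check_digits_only are hypothetical names invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 0-based character offset into a
# 1-based (line, column) pair by looking at the text before the offset.
sub line_and_column {
    my ( $text, $offset ) = @_;
    my $before = substr( $text, 0, $offset );
    my $line   = 1 + ( $before =~ tr/\n// );    # newlines seen so far
    my $column = $offset - ( rindex( $before, "\n" ) + 1 ) + 1;
    return ( $line, $column );
}

# Toy scanner (illustration only): accepts digits and whitespace,
# and reports the location of the first thing it cannot match.
sub check_digits_only {
    my ($source) = @_;
    pos($source) = 0;
    while ( pos($source) < length $source ) {
        next if $source =~ m/\G\s+/gc;
        next if $source =~ m/\G\d+/gc;
        my ( $line, $column ) = line_and_column( $source, pos($source) );
        return "error at line $line, column $column";
    }
    return "ok";
}
```

The source text is left intact, so the offset where the last successful match stopped can be turned into a line number at any time.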
I like Marpa very much and have experimented even further. The scanner I am currently using is part of an experiment in reading SCSS syntax, which is a superset of CSS. It can be found here:
https://github.com/wki/CSS-SCSS/blob/master/lib/CSS/SCSS/Parser/Scanner.pm
Thanks for having created Marpa.
Well, I don't know if there is a language like BNF / ABNF for describing tokens, or whether BNF can be used for that, but automatic generation of a lexer from rules is what I meant when asking about a lexer.
@Jakub: Generators of lexical analyzers have traditionally used regular expression notation as their description language. lex and flex are examples. This means that they don't really look all that different from Perl lexical analyzers, like Mark Dominus's and Wolfgang Kinkeldei's. Any regular expression can be rewritten in BNF, so BNF *could* be used for a lexical analyzer, but the result would probably be a step backward in clarity.
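To make the comparison with lex-style tools concrete, here is a minimal sketch of a lexer built from a table of rules. It is not part of Marpa: the generator make_lexer, the convention that a rule named SKIP is matched but not emitted, and the sample rules are all assumptions made up for this example. The rule table is essentially what a lex specification contains, written as Perl regexes.

```perl
use strict;
use warnings;

# Hypothetical generator (illustration only): takes [ name, regex ] pairs,
# tried in order, and returns a closure that tokenizes a string.
# Rules should not match the empty string.
sub make_lexer {
    my @rules = @_;
    return sub {
        my ($input) = @_;
        my @tokens;
        pos($input) = 0;
      TOKEN: while ( pos($input) < length $input ) {
            for my $rule (@rules) {
                my ( $name, $regex ) = @$rule;
                if ( $input =~ m/\G($regex)/gc ) {
                    push @tokens, [ $name, $1 ] unless $name eq 'SKIP';
                    next TOKEN;
                }
            }
            die "No rule matches at offset " . pos($input) . "\n";
        }
        return \@tokens;
    };
}

# Sample rule table, loosely CSS-flavored.
my $lexer = make_lexer(
    [ SKIP   => qr/\s+/ ],
    [ NUMBER => qr/\d+/ ],
    [ IDENT  => qr/[A-Za-z_]\w*/ ],
    [ PUNCT  => qr/[;:{}]/ ],
);
```

Calling $lexer->("a { b: 10; }") returns a reference to an array of [ name, text ] pairs, which could then be fed to a parser one token at a time.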
@Jeffrey: Thanks for the response.
It would be nice to have, in the Marpa documentation, a full example of generating a lexer and a parser, for example from a description in some Internet RFC (an email address, perhaps?).
BTW, I wonder how hard it would be to write an ABNF-to-Marpa parser using Marpa...