Marpa version of Perl6 Advent Calendar, Day 18
The Perl 6 Advent Calendar, Day 18, in addition to showing Perl 6's built-in grammar facility, addressed a fundamental aspect of text processing: native Unicode support in a grammar.
Indeed, text processing implies a character-oriented framework. The Perl 6 example was an occasion to test Marpa::R2 and to produce a tiny tutorial with it.
A card is a face followed immediately by a suit.
Perl6's definition:
token face {:i <[2..9]> | 10 | j | q | k | a }
proto token suit {*}
token suit:sym<♥> {<sym>}
token suit:sym<♦> {<sym>}
token suit:sym<♣> {<sym>}
token suit:sym<♠> {<sym>}
UPDATE [23 December 2013] character class version:
token face {:i <[2..9 jkqa]> | 10 }
token suit {<[♥♦♣♠]>}
Marpa::R2's definitions:
face ~ [2-9] | '10' | 'j' | 'q' | 'k' | 'a'
suit ~ '♥' | '♦' | '♣' | '♠'
UPDATE [23 December 2013] character class version:
face ~ [2-9jqka] | '10'
suit ~ [♥♦♣♠]
The rest is decoration: the hands definition is a classic left-recursive rule, and whitespace is discarded automatically with Marpa::R2's :discard rule:
:start ::= deal
deal ::= hands
hands ::= hand | hands ';' hand
hand ::= card card card card card
card ~ face suit
face ~ [2-9jqkaä]:i | '10' # With ä and :i, to show that case-folding works
suit ~ [♥♦♣♠] # Unicode in the grammar
WS ~ [\s]
:discard ~ WS
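What is the difference between '::=' and '~'? They define two sub-grammars. Rules written with '~' belong to the lexical grammar (named G0) and produce the tokens that the structural grammar, written with '::=' (named G1), consumes. The :discard rule is applied automatically, and only between G1 tokens: whitespace is skipped between cards, never inside one.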
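To make the G0/G1 split concrete, here is a minimal sketch, not taken from the original gist. It assumes a reduced grammar in which a hand is only two cards, and `parses` is a hypothetical helper introduced for illustration. It relies on `read()` throwing an exception when no token matches and on `value()` returning undef when there is no parse. As noted at the end of this article, Unicode character classes need a recent Marpa::R2.

```perl
use strict;
use warnings;
use utf8;
use Marpa::R2;

# Assumption: a reduced two-card hand, just enough to show where
# whitespace is allowed. Rule names follow the article's grammar.
my $dsl = <<'END_OF_DSL';
:start ::= hand
hand ::= card card
card ~ face suit
face ~ [2-9jqka] | '10'
suit ~ [♥♦♣♠]
WS ~ [\s]
:discard ~ WS
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

# Hypothetical helper: returns 1 if $input parses as a hand, 0 otherwise.
sub parses {
    my ($input) = @_;
    my $recce = Marpa::R2::Scanless::R->new( { grammar => $grammar } );

    # read() dies when no token matches; value() returns undef when
    # the input was tokenized but does not form a complete hand.
    my $value_ref = eval { $recce->read( \$input ); $recce->value() };
    return defined $value_ref ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $input ( 'a♥ 7♦', 'a♥7♦', 'a ♥ 7♦' ) {
    print "'$input': ", ( parses($input) ? 'parses' : 'rejected' ), "\n";
}
```

The first two inputs should parse, because whitespace between two card tokens is optional and discarded. The third should be rejected: `card` is a G0 rule, so no whitespace is skipped between `face` and `suit`.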
The Perl 6 Advent post wants to check for duplicate cards. We will tell Marpa::R2 to generate an event each time a card is completed. A Marpa event is triggered once and only once for each instance of that event. This is expressed as follows:
:lexeme ~ <card> pause => after event => 'card'
and Marpa will pause as soon as this event happens; it is up to the application to resume.
So the full Marpa grammar is:
:start ::= deal
deal ::= hands
hands ::= hand | hands ';' hand
hand ::= card card card card card
card ~ face suit
face ~ [2-9jqkaä]:i | '10' # With ä and :i, to show that case-folding works
suit ~ [♥♦♣♠] # Unicode in the grammar
WS ~ [\s]
:lexeme ~ <card> pause => after event => 'card'
:discard ~ WS
and the parsing logic is:
#
# 1: Parse with Marpa::R2::Scanless::R read()
#
do {
#
# 2: Paused by event on card: get literal and check for duplicate
#
#
# 3: Resume parsing with Marpa::R2::Scanless::R resume()
#
} while (!end of input);
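The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of the gist: `first_duplicate` is a hypothetical helper, cards are lower-cased before comparison (an assumption, so that "J♥" and "j♥" count as the same card), and error handling is limited to letting Marpa's own exceptions propagate. Events are checked after every `read()` or `resume()`, including the last one, so that a duplicate at the very end of the input is not missed.

```perl
use strict;
use warnings;
use utf8;
use Marpa::R2;

my $dsl = <<'END_OF_DSL';
:start ::= deal
deal ::= hands
hands ::= hand | hands ';' hand
hand ::= card card card card card
card ~ face suit
face ~ [2-9jqkaä]:i | '10'
suit ~ [♥♦♣♠]
WS ~ [\s]
:lexeme ~ <card> pause => after event => 'card'
:discard ~ WS
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

# Hypothetical helper: returns the first duplicated card, or undef if
# there is none. Dies if the input is not a valid deal.
sub first_duplicate {
    my ($input) = @_;
    my $recce  = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
    my $length = length $input;
    my %seen;

    # 1: read() returns at the first pause, or at end of input
    my $pos = $recce->read( \$input );
    while (1) {

        # 2: for each 'card' event, get the literal and check it
        for my $event ( @{ $recce->events() } ) {
            my ($name) = @{$event};
            next if $name ne 'card';
            my ( $start, $span_length ) = $recce->pause_span();

            # Assumption: compare cards case-insensitively
            my $card = lc $recce->literal( $start, $span_length );
            return $card if $seen{$card}++;
        }
        last if $pos >= $length;

        # 3: resume parsing where the recognizer paused
        $pos = $recce->resume();
    }
    defined $recce->value() or die "Input is not a complete deal\n";
    return;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $deal ( 'a♥ 7♦ 8♣ j♥ 2♠', 'a♥ a♥ 7♦ 8♣ j♥' ) {
    my $duplicate = first_duplicate($deal);
    print "$deal: ",
        ( defined $duplicate ? "duplicate $duplicate" : 'no duplicates' ), "\n";
}
```

Returning at the first duplicate keeps the sketch short; an application that wants to report every duplicate could collect them in a list and keep resuming instead.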
A full working example, with error handling, is at this gist.
Note: you will need Marpa::R2 >= 2.077_013. Sorry about that, but the Perl 6 Advent use case revealed a Unicode issue in Marpa that was only recently fixed.
"Perl6's definition: ...
Marpa::R2's definitions: suit ~ [♥♦♣♠] ..."
I'm pretty sure the Perl 6 grammar could *also* have used a single token (with a character class) for the suits.
Afaik, separating alternations into multiple tokens connected to a proto token only serves to make a grammar easier to extend via inheritance and easier to debug/introspect.
Agreed, the suit and face tokens are more verbose than needed. Here are some more concise definitions:
token suit {}
token face {:i | 10 }
Nice work with the Marpa example.
Err, not that concise:
You are right. I quoted the article as-is, and this gives the impression that Perl 6 does not support character classes. That is definitely not fair, since I gave the character class version afterwards for Marpa.
I will do an UPDATE section.