Marpa version of Perl6 Advent Calendar, Day 18

The Perl 6 Advent Calendar, Day 18, in addition to showing Perl 6's built-in grammar facility, addressed a fundamental aspect of text processing: native Unicode support in a grammar.

Indeed, text processing implies a character-oriented framework. The Perl 6 example was an occasion to test Marpa::R2 and to produce a tiny tutorial with it.

A card is a face followed immediately by a suit.



Perl6's definition:

token face {:i <[2..9]> | 10 | j | q | k | a }
proto token suit {*}
token suit:sym<♥> {<sym>}
token suit:sym<♦> {<sym>}
token suit:sym<♣> {<sym>}
token suit:sym<♠> {<sym>}


UPDATE [23 December 2013], character class version:


token face {:i <[2..9 jkqa]> | 10 }
token suit {<[♥♦♣♠]>}





Marpa::R2's definitions:

face ~ [2-9] | '10' | 'j' | 'q' | 'k' | 'a'
suit ~ '♥' | '♦' | '♣' | '♠'


UPDATE [23 December 2013], character class version:


face ~ [2-9jqka] | '10'
suit ~ [♥♦♣♠]





The rest is decoration: the hands definition is a classic left-recursive rule, and whitespace is discarded automatically by Marpa::R2's :discard rule:

:start ::= deal
deal ::= hands
hands ::= hand | hands ';' hand
hand ::= card card card card card
card ~ face suit
face ~ [2-9jqkaä]:i | '10' # With ä and :i, to show case-folding work
suit ~ [♥♦♣♠] # Unicode in the grammar
WS ~ [\s]
:discard ~ WS
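
As a rough cross-check of what the face and suit lexemes accept, the same character classes can be tried as an ordinary Perl regex. This is only a plain-regex stand-in, not Marpa: the name $card_re and the sample strings are made up for this sketch, and "use utf8" is assumed so that the literal ♥♦♣♠ and ä in the source are read as characters.

```perl
use strict;
use warnings;
use utf8;    # the source contains literal ♥♦♣♠ and ä

# Plain-regex approximation of: card ~ face suit
# (illustrative only; Marpa's lexer is not involved here)
my $card_re = qr/\A (?: [2-9jqkaä] | 10 ) [♥♦♣♠] \z/xi;

# Sample strings chosen for this sketch
my @samples  = ('10♥', 'Q♠', 'Ä♦', '1♥', 'k ♣');
my %accepted = map { $_ => ( $_ =~ $card_re ? 1 : 0 ) } @samples;

binmode STDOUT, ':encoding(UTF-8)';
for my $s (@samples) {
    printf "%s: %s\n", $s, $accepted{$s} ? 'card' : 'not a card';
}
```

The /i flag plays the role of the :i modifier, so an upper-case Ä is accepted, while a face separated from its suit by a space is not, since a card is a face followed immediately by a suit.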

The difference between '::=' and '~'? They define two sub-grammars: the '~' grammar (named G0) produces tokens, which the '::=' grammar (named G1) consumes. The :discard rule is applied automatically, and only in the G1 grammar.


The Perl 6 Advent post wants to check for duplicate cards. We tell Marpa::R2 to generate an event each time a card is completed. A Marpa event fires once and only once for each instance that occurs. This is expressed as follows:


:lexeme ~ <card> pause => after event => 'card'

Marpa will then pause as soon as this event happens; it is up to the application to resume.

So the full Marpa grammar is:


:start ::= deal
deal ::= hands
hands ::= hand | hands ';' hand
hand ::= card card card card card
card ~ face suit
face ~ [2-9jqkaä]:i | '10' # With ä and :i, to show case-folding work
suit ~ [♥♦♣♠] # Unicode in the grammar
WS ~ [\s]

:lexeme ~ <card> pause => after event => 'card'
:discard ~ WS

and the parsing logic is:


#
# 1: Parse with Marpa::R2::Scanless::R read()
#
do {
    #
    # 2: Paused by the event on card: get the literal and check for a duplicate
    #
    #
    # 3: Resume parsing with Marpa::R2::Scanless::R resume()
    #
} while (!end of input);
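
Step 2 of the outline above does not depend on Marpa and can be sketched on its own. The helper below is hypothetical: is_duplicate and %seen are names invented here, not part of Marpa::R2 or of the gist. It would receive the literal of each card whenever the 'card' event pauses the parse. Lower-casing the literal is an assumption, made so that A♥ and a♥ count as the same card, given the :i modifier in the grammar.

```perl
use strict;
use warnings;
use utf8;    # the source contains literal suit characters

my %seen;

# Hypothetical helper: true if this card literal was already seen.
# lc() is an assumption of this sketch, so that 'A♥' equals 'a♥'.
sub is_duplicate {
    my ($card) = @_;
    return $seen{ lc($card) }++ ? 1 : 0;
}

# Stand-in for the literals that the paused parse would hand over
my @cards = ('a♥', 'A♥', '7♦', '8♣', 'j♥');
my @dups  = grep { is_duplicate($_) } @cards;

binmode STDOUT, ':encoding(UTF-8)';
print "Duplicate card: $_\n" for @dups;
```

A hash keyed by the card literal keeps the check at one lookup per event, which fits a loop that is resumed after every card.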

A full working example, with error handling, is at this gist.

5 Comments

"Perl6's definition: ...
Marpa::R2's definitions: suit ~ [♥♦♣♠] ..."

I'm pretty sure the Perl 6 grammar could *also* have used a single token (with a character class) for the suits.

Afaik, separating alternations into multiple tokens connected to a proto token, only serves to make a grammar easier to extend via inheritance and easier to debug/introspect.

Agree, suit and face tokens are more verbose than needed. Here's some more concise definitions:

token suit {}
token face {:i | 10 }

Nice work with the Marpa example.

Err, not that concise:


token suit {<[♥♦♣♠]>}
token face {:i <[2..9 jkqa]> | 10 }

About Jean-Damien Durand

About::Me::And::Perl