Marpa version of Perl6 Advent Calendar, Day 18

By Jean-Damien Durand on December 21, 2013 7:21 AM under Marpa, Parsing

The Perl 6 Advent Calendar, Day 18, in addition to showing Perl 6's built-in grammar facility, addressed a fundamental aspect of text processing: native Unicode support in a grammar.

Indeed, text processing implies a character-oriented framework. The Perl 6 example was an occasion to test Marpa::R2 and to produce a tiny tutorial with it.

A card is a face followed immediately by a suit.

Perl6's definition:



token face {:i <[2..9]> | 10 | j | q | k | a }

proto token suit {*}

    token suit:sym<♥>  {<sym>}

    token suit:sym<♦>  {<sym>}

    token suit:sym<♣>  {<sym>}

    token suit:sym<♠>  {<sym>}

UPDATE [23 December 2013]: character class version:



token face {:i <[2..9 jkqa]> | 10 }

token suit {<[♥♦♣♠]>}

Marpa::R2's definitions:



face ~ [2-9] | '10' | 'j' | 'q' | 'k' | 'a'

suit ~ '♥' | '♦' | '♣' | '♠'

UPDATE [23 December 2013]: character class version:



face ~ [2-9jqka] | '10'

suit ~ [♥♦♣♠]
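For comparison only, the character-class rules can be approximated by a single Perl 5 regex that recognizes exactly one card. This is a sketch, not how Marpa::R2 works internally; `$card_re` and `is_card` are hypothetical names, and `/i` stands in for Perl 6's `:i`.

```perl
use strict;
use warnings;
use utf8;                               # suit symbols are literal UTF-8 in this file
binmode STDOUT, ':encoding(UTF-8)';

# Perl 5 regex mirroring the character-class rules:
#   face ~ [2-9jqka] | '10'     (/i stands in for Perl 6's :i)
#   suit ~ [♥♦♣♠]
# \A and \z anchor one card; nothing may sit between face and suit.
my $card_re = qr/\A(?:[2-9jqka]|10)[♥♦♣♠]\z/i;

# Hypothetical helper: 1 if the string is exactly one card, else 0
sub is_card { return $_[0] =~ $card_re ? 1 : 0 }

for my $s ('2♥', '10♣', 'Q♦', 'a♠', 'J ♥', '1♥', '11♣') {
    printf "%-4s => %s\n", $s, is_card($s) ? 'card' : 'not a card';
}
```

Note that 'J ♥' is rejected: "a face followed immediately by a suit" leaves no room for whitespace.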

The rest is decoration: the hands definition is a classic left-recursive rule, and whitespace is discarded automatically with Marpa::R2's :discard rule:



:start ::= deal

deal ::= hands

hands ::= hand | hands ';' hand

hand ::= card card card card card

card ~ face suit

face ~ [2-9jqkaä]:i | '10' # With ä and :i, to show that case folding works

suit ~ [♥♦♣♠]           # Unicode in the grammar

WS ~ [\s]

:discard ~ WS

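What is the difference between '::=' and '~'? They belong to two sub-grammars: the '~' grammar (named G0) produces tokens, which the '::=' grammar (named G1) then uses. The :discard rule is applied automatically, but only between G1 symbols: whitespace may separate cards, yet inside a G0 rule such as card ~ face suit nothing may intervene, which is exactly what "a face followed immediately by a suit" requires.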
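To make the G0/G1 split concrete, here is a hypothetical two-level recognizer written by hand in plain Perl. It is not Marpa::R2 (which is a general parser, not a hand-written loop); it only mimics the division of labour: level 1 turns characters into lexemes and skips whitespace between them, level 2 checks that the lexemes form hands of five cards separated by ';'. The names `lexemes` and `parse_deal` are assumptions for illustration.

```perl
use strict;
use warnings;
use utf8;

# Level 1, like G0: characters -> lexemes. Whitespace is skipped only
# *between* lexemes (the role of ":discard ~ WS"); a card is matched as one
# unit, so no whitespace can appear inside it.
my $card_re = qr/(?:[2-9jqkaä]|10)[♥♦♣♠]/i;

sub lexemes {
    my ($input) = @_;
    my @found;
    pos($input) = 0;
    while (1) {
        $input =~ /\G\s+/gc;                            # discard whitespace
        last if pos($input) >= length $input;
        if    ($input =~ /\G($card_re)/gc) { push @found, $1 }
        elsif ($input =~ /\G;/gc)          { push @found, ';' }
        else  { die 'No lexeme at position ' . pos($input) . "\n" }
    }
    return @found;
}

# Level 2, like G1: hands ::= hand | hands ';' hand, where a hand is
# exactly five card lexemes. Returns the hands as array refs.
sub parse_deal {
    my @hands = ([]);
    for my $lexeme (lexemes($_[0])) {
        if ($lexeme eq ';') { push @hands, [] }
        else                { push @{ $hands[-1] }, $lexeme }
    }
    for my $hand (@hands) {
        die "A hand needs exactly 5 cards\n" unless @$hand == 5;
    }
    return \@hands;
}
```

For instance, parse_deal("2♥ 5♥ 7♦ 8♣ 9♠; 10♥ J♥ Q♥ K♥ A♥") returns two hands, while "2 ♥ 5♥ 7♦ 8♣ 9♠" dies at position 0, because the space splits the card before level 2 ever sees it.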

Perl6's advent wants to check for duplicate cards. We will ask Marpa::R2 to generate an event each time a card has been read. A Marpa event fires once, and only once, for each occurrence of what it watches. This is declared as follows:



:lexeme ~ <card> pause => after event => 'card'

and Marpa will always pause as soon as this event happens; it is then up to the application to resume parsing.

So the full Marpa grammar is:



:start ::= deal

deal ::= hands

hands ::= hand | hands ';' hand

hand ::= card card card card card

card ~ face suit

face ~ [2-9jqkaä]:i | '10' # With ä and :i, to show that case folding works

suit ~ [♥♦♣♠]           # Unicode in the grammar

WS ~ [\s]

:lexeme ~ <card> pause => after event => 'card'

:discard ~ WS

and the parsing logic is:



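    # Assumes $dsl holds the full grammar above and $input the decoded text to check
    use Marpa::R2;
    my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });
    my $recce   = Marpa::R2::Scanless::R->new({ grammar => $grammar });
    my %seen;
    #
    # 1: Parse with Marpa::R2::Scanless::R read()
    #
    my $pos = $recce->read(\$input);
    while (1) {
        #
        # 2: Paused by event on card: get the literal and check for duplicates
        #    (lower-cased, since face matching is case-insensitive)
        #
        for my $event (@{ $recce->events() }) {
            next unless $event->[0] eq 'card';
            my $card = $recce->literal($recce->pause_span());
            warn "Duplicate card: $card\n" if $seen{ lc $card }++;
        }
        last if $pos >= length $input;
        #
        # 3: Resume parsing with Marpa::R2::Scanless::R resume()
        #
        $pos = $recce->resume();
    }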
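The pause/resume control flow can also be illustrated without Marpa::R2 at all. In this hypothetical stand-in, `next_card` plays the part of read()/resume() with pause => after: it scans forward and returns control to the caller right after each card. The caller checks the card and calls `next_card` again to "resume". Unlike the real grammar, this sketch does not check hand sizes, and treating 'J♥' and 'j♥' as the same card is an assumption.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical, Marpa-free stand-in for the read()/resume() loop.
# next_card() scans from the current pos() of the string, skips whitespace
# and ';' separators (no hand-size checking), and "pauses" by returning the
# card just read; it returns undef at end of input.
sub next_card {
    my ($input_ref) = @_;
    $$input_ref =~ /\G[\s;]+/gc;                         # skip separators
    return if (pos($$input_ref) // 0) >= length $$input_ref;
    return $1 if $$input_ref =~ /\G((?:[2-9jqkaä]|10)[♥♦♣♠])/gci;
    die 'Unexpected input at position ' . (pos($$input_ref) // 0) . "\n";
}

# Application side: inspect each card while "paused", then loop to "resume".
# Literals are lower-cased because face matching is case-insensitive.
sub find_duplicates {
    my ($input) = @_;
    my (%seen, @duplicates);
    while (defined(my $card = next_card(\$input))) {
        my $key = lc $card;
        push @duplicates, $card if $seen{$key}++;
    }
    return @duplicates;
}
```

For example, find_duplicates("2♥ 5♥ 7♦ 8♣ 9♠; 10♥ J♥ Q♥ K♥ 2♥") returns the second '2♥'.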

A full working example, with error handling, is at this gist.


5 Comments

Jean-Damien Durand | December 21, 2013 8:14 AM

Note: you will need Marpa::R2 >= 2.077_013. Sorry about that, but the Perl 6 advent use case with Marpa revealed a Unicode issue that was only recently fixed.

smls | December 23, 2013 12:28 AM

"Perl6's definition: ...
Marpa::R2's definitions: suit ~ [♥♦♣♠] ..."

I'm pretty sure the Perl 6 grammar could *also* have used a single token (with a character class) for the suits.

Afaik, separating alternations into multiple tokens connected to a proto token, only serves to make a grammar easier to extend via inheritance and easier to debug/introspect.

david.warring replied to comment from smls | December 23, 2013 2:25 AM

Agree, suit and face tokens are more verbose than needed. Here's some more concise definitions:

token suit {}
token face {:i | 10 }

Nice work with the Marpa example.

david.warring | December 23, 2013 2:44 AM

Err, not that concise:



    token suit {<[♥♦♣♠]>}

    token face {:i <[2..9 jkqa]> | 10 }

Jean-Damien Durand replied to comment from smls | December 23, 2013 6:34 PM

You are right. I quoted the article as-is, and this gives the impression that Perl6 does not support character classes, which is definitely not fair since I gave the character class version afterwards with Marpa.

I will do an UPDATE section.
