Yet another BNF: Extended Marpa Scanless InterFace
This post introduces another BNF, namely MarpaX::ESLIF. As the name suggests, it is largely inspired by Marpa::R2's BNF and aims to extend the latter.
The intent was to provide the following features:
- native regular expressions
- support for syntactic exceptions
- externalized data reader in a streaming-compatible architecture
- unlimited number of sub-grammars
Regular expressions are handled by a built-in version of PCRE2.
Although it looks like Marpa's BNF, it is not fully backward compatible with it! I invite readers to read the Introduction, which covers the architecture and the main features, as well as the BNF itself.
The inner implementation is an XS proxy to a complete C library, namely c-marpaESLIF, built on top of Marpa::R2's core engine.
This could never have existed without the remarkable Marpa library, copyrighted by Jeffrey, whom I applaud here for his fantastic work, which deserves a wide audience IMHO.
I have also uploaded a JSON parser, MarpaX::ESLIF::ECMA404, to give a concrete example of how MarpaX::ESLIF works. If these packages can boost Marpa's reputation, great.
Impatient readers? Here is an ESLIF version of the JSON grammar, hopefully correct ;-)
# ----------------------------
# JSON Grammar as per ECMA-404
# ----------------------------
#
# Default action is to propagate the first RHS value
#
:default ::= action => ::shift
#
# JSON starting point is value
#
:start ::= value
#
# ----------------------------
# I explicitly expose string grammar for one reason: inner string
# elements have specific actions
# ----------------------------
#
object ::= '{' members '}' action => ::copy[1]
members ::= pairs* separator => ',' action => members
pairs ::= string ':' value action => pairs
array ::= '[' elements ']' action => ::copy[1]
elements ::= value* separator => ',' action => array_ref
value ::= string
| number
| object
| array
| 'true'
| 'false'
| 'null'

# -------------------------
# Insignificant whitespace
# -------------------------
:discard ::= /[\x{9}\x{A}\x{D}\x{20}]*/

# -----------
# JSON string
# -----------
# Executed in the top grammar and not as a lexeme.
# This is why we temporarily shut down :discard in it.
#
string ::= '"' discardOff chars '"' discardOn action => ::copy[2]
discardOff ::=
discardOn ::=
event :discard[on] = nulled discardOn
event :discard[off] = nulled discardOff
chars ::= filled
filled ::= char+ action => ::concat
chars ::= action => empty_string
char ::= [^"\\[:cntrl:]]
| '\\' '"' action => ::copy[1]
| '\\' '\\' action => ::copy[1]
| '\\' '/' action => ::copy[1]
| '\\' 'b' action => backspace_character
| '\\' 'f' action => formfeed_character
| '\\' 'n' action => newline_character
| '\\' 'r' action => return_character
| '\\' 't' action => tabulation_character
| '\\' 'u' /[[:xdigit:]]{4}/ action => hex2codepoint_character

# ------------------------------------------
# JSON number: defined as a single terminal.
# ECMA-404 numbers are 100% compliant with Perl number syntax AFAIK.
# -------------------------------------------------------------------
#
number ::=
/\-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?/
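Since the `number` rule is a single regular-expression terminal, it is easy to sanity-check outside of ESLIF. The following is only a sketch, not part of MarpaX::ESLIF: it writes the ECMA-404 number syntax (optional minus, integer part without leading zeros, optional fraction, optional exponent) as a Python `re` pattern and tries it on a few inputs. The names `JSON_NUMBER` and `is_json_number` are made up for this illustration, and Python's `re` engine stands in for PCRE2, which is an assumption that holds for this simple pattern.

```python
import re

# ECMA-404 number syntax, written for Python's re module (an assumption:
# ESLIF itself uses PCRE2, but this pattern uses only features common to both).
JSON_NUMBER = re.compile(
    r"-?"                   # optional minus sign
    r"(?:0|[1-9][0-9]*)"    # integer part: a lone 0, or no leading zero
    r"(?:\.[0-9]+)?"        # optional fraction: at least one digit after the dot
    r"(?:[eE][+-]?[0-9]+)?" # optional exponent: at least one digit
)


def is_json_number(text: str) -> bool:
    """Return True when the whole string is a number per ECMA-404."""
    return JSON_NUMBER.fullmatch(text) is not None


if __name__ == "__main__":
    samples = ["0", "-0", "42", "-12.5", "1e10", "6.02E+23",
               "", "-", "01", "1.", ".5", "1e", "+1"]
    for sample in samples:
        verdict = "accepted" if is_json_number(sample) else "rejected"
        print(f"{sample!r:>12} -> {verdict}")
```

Running the same inputs through whatever pattern is used in the `number` terminal is a quick way to spot a quantifier that is too permissive, for instance one that lets the terminal match an empty string.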