parser updates

Over the last months I have been working on a new perl11 VM, p2. In several perl11 meetings we identified a number of problems with the current architecture, and came to conclusions similar to those of the parrot discussions a decade earlier.

Not only is the VM (the bytecode interpreter) horribly designed, as previously observed by Gregg & Ertl 2003; the parser is also a hopelessly tangled and unmaintainable beast. And any future VM should be able to parse and run perl5 and perl6 together, which is why we reserved use v6; and use v5;
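To make the use v5; / use v6; idea concrete, here is a minimal Python sketch of how a VM front end might split a source file at those declarations and route each chunk to a different parser. All names (split_by_dialect, parse_perl5, parse_perl6, parse_mixed) are hypothetical, the two parsers are placeholders, and the regex is deliberately naive: it ignores strings, heredocs, POD and version forms like use v5.10;.

```python
import re

# Naive detector for "use v5;" / "use v6;" at the start of a line.
# Assumption: declarations never appear inside strings, heredocs or POD.
VERSION_DECL = re.compile(r"^\s*use\s+v([56])\s*;", re.MULTILINE)

def split_by_dialect(source, default="5"):
    """Return a list of (dialect, code) pairs in source order."""
    chunks = []
    dialect = default
    pos = 0
    for m in VERSION_DECL.finditer(source):
        code = source[pos:m.start()]
        if code.strip():
            chunks.append((dialect, code))
        dialect = m.group(1)  # switch dialect for the following code
        pos = m.end()
    tail = source[pos:]
    if tail.strip():
        chunks.append((dialect, tail))
    return chunks

def parse_perl5(code):
    # Placeholder for a real perl5 parser.
    return ("p5-ast", code.strip())

def parse_perl6(code):
    # Placeholder for a real perl6 parser.
    return ("p6-ast", code.strip())

PARSERS = {"5": parse_perl5, "6": parse_perl6}

def parse_mixed(source):
    """Dispatch every chunk to the parser for its declared dialect."""
    return [PARSERS[d](code) for d, code in split_by_dialect(source)]
```

For example, parse_mixed('print "a";\nuse v6;\nsay "b";\n') yields a p5 chunk followed by a p6 chunk. A real VM would of course let the perl5 parser itself recognize the declaration instead of pre-scanning the text.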

Any new perl VM, such as parrot, nqp with the JVM or other backends, niecza or p2, needs to be able to parse both. perl6 cannot afford to leave perl5 aside, even if it's a much nicer language. That's why parrot was the first to come up with the PGE-based parser framework, which made it super easy for other languages to target parrot in its first years.

Parrot's PGE library is based on PEG (Parsing Expression Grammar), a newer parser formalism, different from the old yacc-generated or hand-written recursive descent parsers. The only PEG library for C is peg/leg by Ian Piumarta, which p2 also uses in the form of a fork with some extensions by why the lucky stiff, renamed to "greg" and subsequently used for other little languages as well. _why advanced it from 0.1.9 to 0.2.2 with potion, I advanced it to 0.2.3 for p2, Amos Wenger advanced it to 0.4.3 for his ooc language by adding error blocks and fixing some bugs, and today I advanced it to 0.4.5.

This is only greg, the parser generator, not the p5 or p6 syntax itself.

Larry's perl6/std with the viv metacompiler contains the canonical Perl6 grammar, and now also a Perl5 grammar. It is written in perl6, and interpreted and compiled in perl5 (via viv).

Flavio Glock wrote hand-written p5 and p6 parsers for perlito, and those parsers really show off: they look much nicer, more readable and more maintainable than table-driven parsers such as ours, whether based on yacc (perl5) or peg (std, p6, p2).

Using a PGE-style library means that you can interpret the parser state machine at run-time, which easily allows parser extensions, so-called macros. A standalone parser generator, such as yacc, marpa or greg/peg/leg, just generates C code for the parser state machine, and extending such a state machine dynamically is a not yet solved problem. Ian Piumarta and his idst crew work on such interpreted but efficient parsers, e.g. by jitting the state machine as done in maru. That is something like a jitted regex engine, just a bit more advanced, as a regex is just a small subset of a general parser.
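The difference between an interpreted grammar and a generated one can be sketched in a few lines of Python. This is a toy PEG-style interpreter, not PGE or greg: the grammar is plain data (a dict of rules, each a list of alternatives), so a "macro" can add a rule alternative while the program runs, which a parser compiled ahead of time to C cannot do without being regenerated. Ref, match, parses and the toy grammar are hypothetical; there is no memoization, and left-recursive rules are not supported.

```python
class Ref:
    """Reference to another rule by name (anything else is a literal)."""
    def __init__(self, name):
        self.name = name

def match(grammar, rule, text, pos=0):
    """Try the alternatives of `rule` in order (PEG ordered choice).
    Return the end position of the first alternative that matches, or None.
    Note: left-recursive rules would recurse forever here."""
    for alternative in grammar[rule]:
        p = pos
        for item in alternative:
            if isinstance(item, Ref):
                p = match(grammar, item.name, text, p)
            elif text.startswith(item, p):
                p = p + len(item)
            else:
                p = None
            if p is None:
                break  # this alternative failed, try the next one
        else:
            return p  # every item of the alternative matched
    return None

def parses(grammar, rule, text):
    """True if `rule` consumes the whole input."""
    return match(grammar, rule, text) == len(text)

# The grammar is ordinary data, interpreted at run time.
grammar = {
    "stmt": [["print ", Ref("expr"), ";"]],
    "expr": [["1"], ["2"]],
}
```

Appending ["unless ", Ref("expr"), ";"] to grammar["stmt"] at run time makes "unless 1;" parse, with no regeneration step. The price is speed, which is what jitting the state machine, as mentioned above for maru, tries to win back.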

Building on that, I believe that the new regex engine should be built into the VM, in the way the LPeg library for Lua provides a general PEG-based matcher which can be used to implement the simpler pcre library. I can really feel the pain of normal programmers who need to use the old-style perl, grep and sed regular expressions, when they could use a richer language as in LPeg or lisp matchers.
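As a rough illustration of why a general PEG matcher covers much of what regexes are used for and more, here are a few LPeg-inspired combinators in Python. They are not LPeg's API; lit, seq, choice, star, empty, full_match and balanced are made-up names. The same small toolkit expresses a regex-like pattern (a+b) and a recursive one (balanced parentheses) that a classic regular expression cannot describe.

```python
# Each pattern is a function (text, pos) -> new position, or None on failure.

def lit(s):
    def p(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return p

def seq(*pats):
    def p(text, pos):
        for pat in pats:
            pos = pat(text, pos)
            if pos is None:
                return None
        return pos
    return p

def choice(*pats):
    # PEG ordered choice: the first alternative that matches wins.
    def p(text, pos):
        for pat in pats:
            r = pat(text, pos)
            if r is not None:
                return r
        return None
    return p

def star(pat):
    # Greedy repetition that never gives input back (PEG semantics).
    def p(text, pos):
        while True:
            r = pat(text, pos)
            if r is None or r == pos:
                return pos
            pos = r
    return p

def empty(text, pos):
    return pos

def full_match(pat, text):
    return pat(text, 0) == len(text)

# Regex-style pattern a+b, written as: "a" "a"* "b"
a_plus_b = seq(lit("a"), star(lit("a")), lit("b"))

# Balanced parentheses, not expressible as a classic regular expression:
#   balanced <- "(" balanced ")" balanced / ""
def balanced(text, pos):
    return choice(seq(lit("("), balanced, lit(")"), balanced), empty)(text, pos)
```

One caveat: PEG repetition is greedy and never gives characters back, so seq(star(lit("a")), lit("a")) never matches anything, unlike the regex a*a. Implementing pcre-compatible semantics on top of a PEG engine therefore takes some care.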

I am of the opinion that an extendable parser needs to be LR-based, like yacc, and not based on PEG and its ordered rules. Only with LR can you easily add rule alternatives without destroying the fragile order of evaluation in a PEG. With a PEG you'd need to place the new macro rule at the right position manually.
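The ordering problem can be shown with a toy example in Python. Suppose a macro adds the keyword printf to a rule that already accepts print. In a PEG the choice commits to the first alternative that matches, so appending printf after print makes printf(); unparseable, and the new alternative has to be inserted before print by hand. The any_alternative_statement helper is not an LR parser; it only mimics the order-independence of alternatives in a context-free grammar such as a yacc one. All names here are hypothetical.

```python
def ordered_choice(alternatives):
    """PEG ordered choice over literals: commit to the first one that matches."""
    def pattern(text, pos):
        for alt in alternatives:
            if text.startswith(alt, pos):
                return pos + len(alt)
        return None
    return pattern

def statement(keywords):
    """Toy rule: statement <- keyword "();" with keyword as a PEG choice."""
    keyword = ordered_choice(keywords)
    def parse(text):
        pos = keyword(text, 0)
        # Once the choice has committed, a later failure does not retry
        # the remaining alternatives (no backtracking into the choice).
        return pos is not None and text.startswith("();", pos) and pos + 3 == len(text)
    return parse

def any_alternative_statement(keywords):
    """Context-free style: accept if ANY alternative yields a full parse,
    so the order of the alternatives does not matter. Not an LR parser."""
    def parse(text):
        return any(text == kw + "();" for kw in keywords)
    return parse

base = ["print"]
appended = base + ["printf"]  # macro alternative appended at the end
inserted = ["printf"] + base  # macro alternative placed before "print" by hand
```

Here statement(appended)("printf();") is False, while statement(inserted)("printf();") and any_alternative_statement(appended)("printf();") are both True: only the PEG version cares where the new alternative goes.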

So using greg is effectively a dead end, and I'd need to start extending yacc at some point, adding a yacc library and a yacc run-time. That means something like hooking the generated state machine into my VM, or jitting the states. The Java-based parser generators, like ANTLR, have a huge advantage there.

4 Comments

I've been following your progress on github, and saw that you hadn't made any commits in a while. When you say "greg is a dead end", does this mean that your work trying to use Potion/P2 as a back end for Perl is also a dead end? Are you starting over?

I agree with the principle that any significant new Perl VM should be able to handle both Perl 6 and Perl 5 syntax, made easy in principle by dispatching on the simple reserved declarations use v6 or use v5 in code. This is also a principle that I included in my new programming language, such that code written in it is required to declare what version(s) of the language it is known to conform to or not conform to. Having language-identifying declarations common in code makes it much easier to support multi-language programs or compilers.

I'm Darren Duncan, BTW; I don't know what's up with blogs.perl.org's interaction with Google identities in the prior comment.

About Reini Urban

Working at cPanel on cperl, B::C (the perl-compiler), parrot, B::Generate, cygwin perl and more guts, keeping the system alive.