First p2 milestone passed: parse 50% of perl5

By Reini Urban on September 11, 2013 11:23 PM

This week I spent some time fixing the remaining p2 parser problems. I had set my first goal to parse 50% of perl5 syntax by summer 2013. I want to show some real code and benchmarks at the upcoming YAPC::Asia in Tokyo.

After the YAPC::EU in Kiev I first spent some time fixing B::C for 5.16, 5.18 and blead (PADLIST, COW and bytecode compiler regressions). In the meantime my partner in crime goccy in Tokyo made progress with his parser, compiler and the new llvm backend. His parser is hand-written and I'm still fighting with greg, but my code is still much smaller and more elegant.

I spent most of the last days fixing the expression parser and the parsing of function and method calls. Every expr is a list (TUP for tuple), and calls can look like call arg1, args... (implicit list context), call call args... (nested calls), call(list) (explicit lists), and similarly for methods. Add indirect method calls to the mix, plus the fact that p2 does only methods, no function calls. Every call is a method on some object; user-provided subs not in a class (or package) are stored globally and act on the global 'Lobby' object.

expr = c:method           { $$ = PN_AST(EXPR, c) }
    | c:calllist          { $$ = PN_AST(EXPR, c) }
    | c:call e:expr       { $$ = PN_AST(EXPR, PN_PUSH(PN_S(e,0), PN_S(c,0))); }
    | c:call l:listexprs  { $$ = PN_SHIFT(PN_S(l,0));
            if (!PN_S(l, 0)) { PN_SRC(c)->a[1] = PN_SRC($$); }
            $$ = PN_PUSH(PN_TUP($$), c); }
    | e:opexpr            { $$ = PN_AST(EXPR, PN_TUPIF(e)) }
    | c:call              { $$ = PN_AST(EXPR, c) }
    | e:atom              { $$ = PN_AST(EXPR, PN_TUPIF(e)) }

calllist = m:name - list-start - list-end
           { PN_SRC(m)->a[1] = PN_SRC(PN_AST(LIST, PN_NIL)); $$ = PN_TUP(m) }
         | m:name - l:list -
           { PN_SRC(m)->a[1] = PN_SRC(l); $$ = PN_TUP(m) }
         | m:name - list-start l:callexprs list-end -
           { PN_SRC(m)->a[1] = PN_SRC(PN_AST(LIST, l)); $$ = PN_TUP(m) }
call = m:name - { $$ = PN_TUP(m) }
method = v:methlhs - arrow m:name - l:list -
         { PN_SRC(m)->a[1] = PN_SRC(l); $$ = PN_PUSH(PN_TUPIF(v), m) }
       | v:methlhs - arrow m:name -
         { $$ = PN_PUSH(PN_TUPIF(v), m) }

methlhs is a name or $scalar. The biggest problems were some missing whitespace and the differences in signature parsing: the weird potion way compiles an expr to a sig in the compiler, whereas in the new p2 way the parser already generates proper signatures. The PN_SHIFT(PN_S(l,0)) orgy above in call listexprs moves the object from the first arg of the call to the front, which is the indirect method call. I'm not happy with that.
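The object shift for indirect method calls can be illustrated outside the grammar. The following is a minimal Python sketch, not p2's actual C code: the (name, args) message shape and the function name rewrite_indirect_call are hypothetical and chosen only for illustration.

```python
# Hypothetical, simplified AST: a message is (name, args) and an expression
# is a list of messages applied left to right. This does not mirror p2's
# internal C structures.

def rewrite_indirect_call(name, args):
    """Turn `name obj arg...` into the message chain `obj -> name(arg...)`."""
    if not args:
        raise ValueError("indirect call needs an object as first argument")
    obj, rest = args[0], list(args[1:])  # shift the object off the arg list
    return [(obj, None), (name, rest)]   # object first, then the method message


# `new Foo 1, 2` is first seen as the call "new" with args (Foo, 1, 2)
print(rewrite_indirect_call("new", ["Foo", 1, 2]))
```

The point of the sketch is only the reordering: the first argument becomes the receiver and the remaining arguments stay with the method message.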

All my perl5 tests pass now, which means there are a lot of new features to explore, like declaring default parameters and calling with named parameters (no need for hash abuse anymore).

$ cat test/closures/named.pl
sub min ($x, $y) { $y - $x }
@b = (99, 98, 97);
$b[1] = "XXX";
(1, min($y=12, $x=89), $b[2], $b[1]) #=> (1, -77, 97, XXX)

$ cat test/closures/default.pl
sub min ($x=0, $y=1) { $y - $x }
(min(), min(1), min(0,1), min($y=0), min, min->arity, min->minargs)
#=> (1, 0, 1, 1, sub($x:=0,$y:=1), 2, 0)

And this is the p2 parse tree

$ bin/p2 -Dv test/closures/default.pl

-- parsed --
code (assign (expr (msg ("min")) expr (proto (list ($x, 58, 0, $y, 58, 1) block (expr (minus (msg ("$y") msg ("$x"))))))), expr (list (expr (msg ("min" list undef undef)), expr (msg ("min" list (expr (value (1))) undef)), expr (msg ("min" list (expr (value (0)), expr (value (1))) undef)), expr (msg ("min" list (assign (expr (msg ("$y")) expr (value (0)))) undef)), expr (msg ("min")), expr (msg ("min"), msg ("arity")), expr (msg ("min"), msg ("minargs")))))

The sig ($x=0, $y=1) is parsed to ($x, 58, 0, $y, 58, 1), 58 being ord(':'). potion uses = for type assignment and := for defaults, hence : instead of =. The third element of each sig entry (here 0 and 1) denotes the default value, which can only be an immediate value for now. It could also be an expression, but I dislike the idea. It looks like action at a distance.
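To make the flat encoding concrete, here is a small Python sketch that decodes such a (name, 58, default) list and binds positional and named arguments against it. It is an illustration under assumptions, not p2's implementation: the helper names decode_sig and bind_args are made up, and the sketch only covers signatures where every parameter has a default, because the encoding of parameters without defaults is not shown above.

```python
COLON = ord(":")  # 58, the marker between a parameter name and its default


def decode_sig(flat):
    """Split a flat (name, 58, default, ...) list into (name, default) pairs."""
    if len(flat) % 3 != 0:
        raise ValueError("expected (name, marker, default) triples")
    params = []
    for i in range(0, len(flat), 3):
        name, marker, default = flat[i:i + 3]
        if marker != COLON:
            raise ValueError("expected default marker 58, got %r" % (marker,))
        params.append((name, default))
    return params


def bind_args(params, positional=(), named=None):
    """Bind positional args first, then named args, then defaults."""
    named = dict(named or {})
    if len(positional) > len(params):
        raise TypeError("too many arguments")
    env = {}
    for i, (name, default) in enumerate(params):
        if i < len(positional):
            env[name] = positional[i]
        elif name in named:
            env[name] = named.pop(name)
        else:
            env[name] = default
    if named:
        raise TypeError("unknown or duplicate named parameters: %s" % sorted(named))
    return env


def min_body(env):
    # Body of `sub min ($x=0, $y=1) { $y - $x }`
    return env["$y"] - env["$x"]


params = decode_sig(["$x", 58, 0, "$y", 58, 1])
print(len(params))                                            # arity
print(min_body(bind_args(params)))                            # min()
print(min_body(bind_args(params, (1,))))                      # min(1)
print(min_body(bind_args(params, (0, 1))))                    # min(0,1)
print(min_body(bind_args(params, named={"$y": 12, "$x": 89})))  # named call
```

Binding by name before falling back to the default is what makes a call like min($y=12, $x=89) independent of argument order.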

Currently all variables and subs are lexical only; work on dynamic symbol lookup is still in a branch. And no example from the shootout benchmark works yet, as I haven't implemented for loops yet, recursive function calls are buggy, and similar stuff.

Since the underlying parse tree and vm code is 1:1 the same as for potion, the benchmarks are the same as for potion, i.e. typically 30x faster than perl5 code.

4 Comments

Roland Lammel | September 12, 2013 8:33 AM

I'm really impressed by how much steam the perl core projects are having currently. I'm really looking forward to see your next milestone (and those of MoarVM and the JVM stuff)!

Great work and thanks for sharing your progress.

Reini Urban | September 16, 2013 12:01 AM

Fixed the benchmark examples now.

fib.pl did not work because there was a bytecode compiler problem with the representation of 0 (as 1), which was interpreted as true instead of false (issue #24).

And nbody had a wrong algorithm. Works now.

But nbody with bytecode is currently about 2x slower than perl5, and jitted there is a stack corruption problem somewhere. This is disappointing.
Why is the bytecode so slow? It should be at least faster than perl5.
1. There are no compiler optimizations implemented, esp. constant folding.
2. There's no array access op. Everything goes through a method call, which is very dynamic and nice for adding a tie or overload interface later, but slow if untied. It should be optimized at least for the constant array index case (aelemfast). Ditto for hashes.
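
For readers unfamiliar with constant folding, the following Python sketch shows the general technique on a toy tuple AST. It is not the p2 or potion compiler: the node shape (op, left, right), the op names and the function fold are assumptions made for this illustration.

```python
import operator

# Toy AST: a node is a tuple (op, child, ...); leaves are numbers (constants)
# or strings (variable names). The op names are illustrative only.
FOLDABLE = {"add": operator.add, "minus": operator.sub, "times": operator.mul}


def fold(node):
    """Recursively replace ops on two constant operands with their value."""
    if not isinstance(node, tuple):
        return node  # leaf: constant or variable name
    op, *kids = node
    kids = [fold(k) for k in kids]
    if (op in FOLDABLE and len(kids) == 2
            and all(isinstance(k, (int, float)) for k in kids)):
        return FOLDABLE[op](*kids)  # computed once, at compile time
    return (op, *kids)


print(fold(("minus", ("times", 2, 3), 1)))
print(fold(("minus", "$y", ("add", 1, 2))))
```

Subtrees made only of constants collapse to a single value, while anything touching a variable is kept as an op for run time.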

Reini Urban | September 16, 2013 1:25 AM

Spoke too early. Found out why my bytecode was slow. It's again 2-3x faster than perl5.

I added default signatures and named calls, and had to add run-time checks for signatures for every function call.
The check was done very prematurely; now it's a lot faster again.

I also added tuple overallocation to help the GC a bit. The current GC kicks in at every single memory change, but should only run periodically, triggered by a periodic timer or memory segfault.

Reini Urban | September 29, 2013 1:02 AM

Unfortunately I ran out of time in the 40 minutes at #yapcasia and I couldn't demo the new debugger I wrote on the plane. I can single-step through the code and do eval, but not yet access the lexical variables in scope.

About Reini Urban

Working at cPanel on cperl, B::C (the perl-compiler), parrot, B::Generate, cygwin perl and more guts, keeping the system alive.