Perl 6 Core Hacking: QASTalicious
Read this article on Rakudo.Party
Over the past month, I spent some time in Rakudo's QAST land writing a few optimizations, fixing bugs involving warnings, as well as squashing a monster hive of 10 thunk scoping bugs with a single commit. In today's article, we'll go over that last feat in detail, as well as learn what QAST is and how to work with it.
PART I: The QAST
"QAST" stands for "Q" Abstract Syntax Tree. The "Q" is there because it's comes after letter "P", and "P" used to be in "PAST" to stand for "Parrot", the name of an earlier, experimental Perl 6 implementation (or rather, its virtual machine). Let's see what QAST is all about!
Dumping QAST
Every Rakudo Perl 6 program
compiles down to a tree of QAST nodes and you can dump that tree if you specify
--target=ast
or --target=optimize
command line option to perl6
when
compiling a program or a module:
$ perl6 --target=ast -e 'say "Hello, World!"'
[...]
- QAST::Op(call &say) <sunk> :statement_id<?> say \"Hello, World!\"
- QAST::Want <wanted> Hello, World!
- QAST::WVal(Str)
- Ss
- QAST::SVal(Hello, World!)
[...]
The difference between the --target=ast
and --target=optimize
is that
the former shows the QAST tree as soon as it has been generated, while
the later shows the QAST tree after the static optimizer has had a go at it.
While the command line option gives you the QAST for the entire program
(excluding modules pre-compiled separately), each
QAST::Node
object has a .dump
method you can use to dump specific QAST pieces of interest
from within Rakudo's source code.
For example, to examine the QAST generated by the statement
token, I'd
find method statement
in src/Perl6/Actions.nqp
and stick nqp::say('statement QAST: ' ~ $past.dump)
close to the end of
the method.
Since Rakudo's compilation takes a couple of minutes for each go, I like to key my debug dumps on env variables, like this:
nqp::atkey(nqp::getenvhash(),'ZZ1') && nqp::say('ZZ1: something or other');
...
nqp::atkey(nqp::getenvhash(),'ZZ2') && nqp::say('ZZ2: something else');
Then, I can execute the compiled ./perl6
as if I didn't add anything, and
enable my dumps by running ZZ1=1 ./perl6
, ZZ2=1 ./perl6
, or both dumps
at the same time with ZZ1=1 ZZ2=1 ./perl6
.
Viewing QAST
Looking at the output of --target
dumps in the terminal is sufficient
for a quickie glance at the trees, but for extra assistance you can install
CoreHackers::Q
module that
brings in q
command line utility.
Simply prefix your regular perl6
invocation with q a
or q o
to produce
--target=ast
and --target=optimize
QAST dumps respectively. The program
will generate out.html
file in the current directory:
$ q a perl6 -e 'say "Hello, World!"'
$ firefox out.html
Pop open the generated HTML file and reap these benefits:
- Color-coded QAST nodes
- Color hints for sunk nodes
- Ctrl+Click on any node to collapse it
- Muted view of QAST::Want alternatives, makes it easier to ignore them
Eventually, I hope to extend this tool and make it more helpful, but at the time of this writing, that's all it does.
The QAST Forest
There are four main files in
rakudo's source
where you'd expect to be working with QAST
nodes: src/Perl6/Grammar.nqp
, src/Perl6/Actions.nqp
, src/Perl6/World.nqp
,
and src/Perl6/Optimizer.nqp
. If you're using Z-Script
utility, you can even run z q
command to open these
four files in Atom
editor.
Grammar.nqp
is the Perl 6 grammar. Actions.nqp
are the actions for it.
World.nqp
contains all sorts of helpful routines used by both Grammar.nqp
and Actions.nqp
that access them via the $*W
dynamic variable containing a
Perl6::World
object.
Lastly, Optimizer.nqp
contains Rakudo's static optimizer.
The root (of all evil) is the QAST::Node
object, with all the other QAST nodes
being its subclasses. Let's review some of the popular ones:
QAST::Op
QAST::Op
nodes are the workhorse of the QAST world. The :op
named argument
specifies the name of an NQP op or the name of a Rakudo's NQP
extension op
and its children are the arguments:
Here's a say
op printing a string value:
QAST::Op.new: :op<say>,
QAST::SVal.new: :value('Hello, World!');
And here's a QAST node for a call
op that calls Perl 6's infix:<+>
operator; notice how the name of the routine we call is given via :name
named
argument:
QAST::Op.new: :op<call>, :name('&infix:<+>'),
QAST::IVal.new( :value(2)),
QAST::IVal.new: :value(2)
QAST::*Val
The QAST::SVal
, QAST::IVal
, QAST::NVal
, and QAST::WVal
nodes, specify
string, integer, float, and "World" object values respectively. The first three
are the "unboxed" raw values, while World objects are everything else, such
as DateTime
,
Block
, or
Str
objects.
QAST::Want
Some of the objects can be represented by multiple QAST::*Val
nodes, where
the most appropriate value is used depending on what is wanted in the
current context. QAST::Want
node contains these alternatives, interleaved
with string markers indicating what those alternatives are.
For example, numeric value 42
in Perl 6 could be wanted as an object
to call some method on, or as a raw value to be assigned to a native
int
variable. The QAST::Want
node for it would look like this:
QAST::Want.new:
QAST::WVal.new(:value($Int-obj))),
'Ii',
QAST::IVal.new: :value(42)
The $Int-obj
above would contain an instance of
Int
type with value set to 42
. The
Ii
marker indicates the following alternative is an integer value and we
provide a QAST::IVal
object containing it. The other possible markers are
Nn
(float), Ss
(string), and v
(void context) alternatives.
When these nodes are later converted to bytecode, the most appropriate value will be selected, with the first child being the "default" value, to be used when none of the available alternatives make the cut.
QAST::Var
These nodes are used for variables and parameters. The :name
named argument
specifies the name of the variable and :scope
its scope:
QAST::Op.new: :op('bind'),
QAST::Var.new(:name<$x>, :scope<lexical>, :decl<var>, :returns(int)),
QAST::IVal.new: :value(0)
The :decl
named arg is present when the node is used for the variable's
declaration (when it's absent, we simply reference the variable) and its
value dictates
what sort of variable it is: var
for variables and param
for routine
parameters. Several other :decl
types, as well as optional arguments
specifying additional configuration of the variable exist. You can find them
discussed in the QAST
documentation
QAST::Stmt
/ QAST::Stmts
These are statement grouping constructs. For example, here, the truthy branch
of an nqp::if
contains three nqp::say
statements, all grouped inside
QAST::Stmts
:
QAST::Op.new: :op<if>,
QAST::IVal.new(:value(42)),
QAST::Stmts.new(
QAST::Op.new( :op<say>, QAST::SVal.new: :value<foo>),
QAST::Op.new( :op<say>, QAST::SVal.new: :value<bar>),
QAST::Op.new: :op<say>, QAST::SVal.new: :value<ber>),
QAST::Op.new: :op<say>, QAST::SVal.new: :value<meow>,
The singular QAST::Stmt
is similar. The difference is it
marks a register allocation boundary, beyond which, any temporaries are
free to be reused. When used correctly, this alternative can result in better
code generation.
QAST::Block
This node is both a unit of invocation and a unit of lexical scoping. For
example, code sub foo { say "hello" }
might compile to a QAST::Block
like this:
Block (:cuid(1)) <wanted> :IN_DECL<sub> { say \"hello\" }
[...]
Stmts <wanted> say \"hello\"
Stmt <wanted final> say \"hello\"
Want <wanted>
Op (call &say) <wanted> :statement_id<?> say \"hello\"
Want <wanted> hello
WVal (Str)
- Ss
SVal (hello)
- v
Op (p6sink)
Op (call &say) <wanted> :statement_id<?> say \"hello\"
Want <wanted> hello
WVal (Str)
- Ss
SVal (hello)
[...]
Each block demarcates a lexical scope boundary—this detail comes into play in Part II of this article, when we'll be going over a fix for a bug.
Others
A few more QAST nodes exist. They're out of scope of this article, but you may wish to read the documentation or, since some of them are not appear in those docs, go straight to the source.
Executing QAST Trees
Having a decent familarity with
nqp ops
(as well as Rakudo's nqp extensions) is helpful when working with
QAST. A sharp eye would notice in QAST dumps that many QAST::Op
nodes
correspond to nqp::*
op calls, where :op
named argument specifies the name
of the op.
When writing large QAST trees, it's handy to write them down using pure NQP ops first, and then translate the result into a tree of QAST node objects. Let's look at a simplified example:
nqp::if(
nqp::isgt_n(nqp::rand_n(1e0), .5e0),
nqp::say('Glass half full'),
nqp::say('Glass half empty'));
We have NQP op, so we'll start with QAST::Op
node, using 'if'
as the
value for :op
. The op takes three positional arguments—the three
ops used for the conditional, the truthy branch, and the falsy branch.
Some of the ops also take float and string values, so we'll use QAST::NVal
and QAST::SVal
nodes for those. The result is:
QAST::Op.new(:op('if'),
QAST::Op.new(:op('isgt_n'),
QAST::Op.new(:op('rand_n'),
QAST::NVal.new(:value(1e0))
),
QAST::NVal.new(:value(.5e0))
),
QAST::Op.new(:op('say'),
QAST::SVal.new(:value('Glass half full'))
),
QAST::Op.new(:op('say'),
QAST::SVal.new(:value('Glass half empty'))
)
)
I find it easier to track the tree's nesting by using parentheses only when necessary, preferring colon method call syntax whenever possible:
QAST::Op.new: :op<if>,
QAST::Op.new(:op<isgt_n>,
QAST::Op.new(:op<rand_n>,
QAST::NVal.new: :value(1e0)),
QAST::NVal.new: :value(.5e0)),
QAST::Op.new(:op<say>,
QAST::SVal.new: :value('Glass half full')),
QAST::Op.new: :op<say>,
QAST::SVal.new: :value('Glass half empty')
If a .new
is followed by a colon, there aren't any
more nodes on the same level. If .new
is followed by an opening parentheses,
there are more sister nodes yet to come.
Due to Rakudo's lengthy compilation, it can be handy to execute your QAST tree
without having to stick it into src/Perl6/Actions.nqp
or similar file first.
To some extent, it's possible to do that with a regular Perl 6 program. We'll
simply access Perl6::World
object in $*W
variable inside a BEGIN
block,
where it still exists, and call .compile_time_evaluate
method, giving it
an empty variable as the first positional (it expects a Match
object for
the tree) and our QAST tree as the second positional:
use QAST:from<NQP>;
BEGIN $*W.compile_time_evaluate: $,
QAST::Op.new: :op<if>,
QAST::Op.new(:op<isgt_n>,
QAST::Op.new(:op<rand_n>,
QAST::NVal.new: :value(1e0)),
QAST::NVal.new: :value(.5e0)),
QAST::Op.new(:op<say>,
QAST::SVal.new: :value('Glass half full')),
QAST::Op.new: :op<say>,
QAST::SVal.new: :value('Glass half empty')
The one caveat with this method is we're using full-blown Perl 6 language,
whereas in src/Perl6/Actions.nqp
and related files, as .nqp
extension
suggests, we're using NQP language only. Keep an eye out for strange
explosions; it's possible your QAST tree that explodes in Perl 6 will compile
just fine in the land of pure NQP.
Annotating QAST Nodes
All QAST nodes support annotations that allow you to attach an arbitrary value
to a node and then read that value elsewhere. To add an annotation, use
.annotate
method, which takes two positional arguments—a string containing
name of the annotation and the value to attach to it—and returns that
value. Recent
versions of NQP also have .annotate_self
method that works the same, except
it returns the QAST node itself:
$qast.annotate_self('foo', 42).annotate: 'bar', 'meow';
Later, you can read that value using .ann
method that takes the name of
the annotation as the argument. If the annotation doesn't exist,
NQPMu
is returned instead:
note($qast.ann: 'foo'); # OUTPUT: «42»
You can also check for whether an annotation merely exists using .has_ann
method that returns 1
(true) or 0
(false):
note($qast.has_ann: 'bar'); # OUTPUT: «1»
Or dump all of the annotations on the node (to prevent potential flood of output, most values will be dumped as simply a question mark):
note($qast.dump_annotations); # OUTPUT: « :bar<?> :foo<?>»);
Lasty, to clear all annotations on the node, simply call .clear_annotations
method.
Mutating QAST Nodes
A handy thing to do with QAST node objects is to mutate them into something
better. That's essentially all the static optimizer in
src/Perl6/Optimizer.nqp
does. Named arguments can be mutated by calling
them as methods and providing a value. For example, $qast.op('callstatic')
will change the value of :op
from whatever it is to callstatic
. Positional
arguments can be altered by re-assignment to a positional index, as well
as shift
, push
, unshift
, pop
operations performed either via method
calls with those names or nqp::
ops. Some nodes also support nqp::elems
calls on them, which is slightly faster than the generic pattern of +@($qast)
that can be used on all nodes to find out the number of children a node
contains.
As an exercise, let's write a small optimization: some operations, like
$foo < $bar < $ber
compile to nqp::chain
ops. That is so even if
we have only two children, e.g. $foo < $bar
. In such cases, rewriting
the op to be nqp::call
has performance advantages: not only nqp::call
on its own is a little bit faster than nqp::chain
, the static optimizer knows
how to do further optimizations on nqp::call
ops.
Let's take a look at what both 2-child and 2+-child nqp::chain
chains look
like:
$ perl6 --target=ast -e '2 < 3 < 4; 2 < 3'
The first statement compiled to this (I removed QAST::Want
s for clarity):
- QAST::Op(chain &infix:«<») :statement_id<?> <
- QAST::Op(chain &infix:«<») <wanted> <
- QAST::IVal(2)
- QAST::IVal(3)
- QAST::IVal(4)
And the second one to:
- QAST::Op(chain &infix:«<») :statement_id<?> <
- QAST::IVal(2)
- QAST::IVal(3)
Thus, to target our optimization correctly, we need to ensure neither child of
our chain
op is a chain
op. In addition, we need to ensure that the
op we're optimizing is not itself a child of another chain
op.
Raking the code of the optimizer, we can spot that
chain depth is already tracked via $!chain_depth
attribute, so we merely need
to ensure we're at the first link of the chain. The code then becomes:
$qast.op: 'call'
if nqp::istype($qast, QAST::Op)
&& $qast.op eq 'chain'
&& $!chain_depth == 1
&& ! (nqp::istype($qast[0], QAST::Op) && $qast[0].op eq 'chain')
&& ! (nqp::istype($qast[1], QAST::Op) && $qast[1].op eq 'chain');
Once we find a chain
QAST::Op
, we index into it and use nqp::istype
to
check the type of kid nodes, and if those happen to be QAST::Op
nodes,
we ensure the :op
parameter is not a chain
op. If all of the conditions are met, we simply call .op
method on our node with value 'call'
to convert
it into a call
op.
We then stick our optimization early enough into .visit_op
method of the optimizer and its later portions will
further optimize our call
.
A fairly easy and straightforward optimization that can bring a lot of benefit.
PART II: A Thunk in The Trunk
Note: it took me three evenings to debug and fix the following tickets. To learn the solution I tried many dead ends that I won't be covering, to keep you from getting bored, and instead will instantly jump to conclusions. The point I'm making is that fixing core bugs is a lot easier than may seem from reading this article—you just need to be willing to spend some time on them.
Now that we have some familiarity with QAST, let's try to fix a
bug that existed in Rakudo v2018.01.30.ga.5.c.2398.cc
and earlier. The ticket in question is
R#1212, that shows
the following problem:
$ perl6 -e 'say <a b c>[$_ xx 2] with 1'
Use of Nil in string context
in block at -e line 1
Unable to call postcircumfix [ (Any) ] with a type object
Indexing requires a defined object
in block <unit> at -e line 1
It looks like the $_
topical variable inside the indexing brackets fails
to get the value from with
statement modifier and ends up being undefined.
Sounds like a challenge!
It's A Hive!
Both with
and xx
operator create thunks (thunks are like blocks of code,
without having explicit blocks in the code; this, for example, lets
rand xx 10
to produce 10 different random values; rand
is thunked and the
thunk is called for each iteration). This reminded me of some other tickets
I've seen, so I went to fail.rakudo.party and
looked through open tickets for anything that mentioned thunking or wrong
scoping.
I ended up with a list of 7 tickets, and with the help of dogbert++ later increased the number to 9, which with the original Issue gives us a total of 10 different manifestations of a bug. The other tickets are RT#130575, RT#132337, RT#131548, RT#132211, RT#126569, RT#128054, RT#126413, RT#126984, and RT#132172. Quite a bug hive!
Test It Out
Our starting point is to cover each manifestation of the bug with a test.
Make all the test pass and you know you've fixed the bug, plus you already
have something to place into roast, to
cover the tickets. My tests ended up looking like this, where I've used
gather
/take
duo to capture
what the tickets' code printed to the screen:
use Test;
plan 1;
subtest 'thunking closure scoping' => {
plan 10;
# https://github.com/rakudo/rakudo/issues/1212
is-deeply <a b c>[$_ xx 2], <b b>.Seq, 'xx inside `with`' with 1;
# RT #130575
is-deeply gather {
sub itcavuc ($c) { try {take $c} andthen 42 };
itcavuc $_ for 2, 4, 6;
}, (2, 4, 6).Seq, 'try with block and andthen';
# RT #132337
is-deeply gather {
sub foo ($str) { { take $str }() orelse Nil }
foo "cc"; foo "dd";
}, <cc dd>.Seq, 'block in a sub with orelse';
# RT #131548
is-deeply gather for ^7 {
my $x = 1;
1 andthen $x.take andthen $x = 2 andthen $x = 3 andthen $x = 4;
}, 1 xx 7, 'loop + lexical variable plus chain of andthens';
# RT #132211
is-deeply gather for <a b c> { $^v.uc andthen $v.take orelse .say },
<a b c>.Seq, 'loop + andthen + orelse';
# RT #126569
is-deeply gather { (.take xx 10) given 42 }, 42 xx 10,
'parentheses + xx + given';
# RT #128054
is-deeply gather { take ("{$_}") for <aa bb> }, <aa bb>.Seq,
'postfix for + take + block in a string';
# RT #126413
is-deeply gather { take (* + $_)(32) given 10 }, 42.Seq,
'given + whatever code closure execution';
# RT #126984
is-deeply gather {
sub foo($x) { (* ~ $x)($_).take given $x }; foo(1); foo(2)
}, ("11", "22").Seq, 'sub + given + whatevercode closure execution';
# RT #132172
is-deeply gather { sub {
my $ver =.lines.uc with "totally-not-there".IO.open
orelse "meow {$_ ~~ Failure}".take and return 42;
}() }, 'meow True'.Seq, 'sub with `with` + orelse + block interpolation';
}
When I brought up the first bug in our dev
chatroom, jnthn++ pointed
out that such bugs are often due to mis-scoped blocks, as p6capturelex
op that's involved needs to be called in the immediate outer of the block
it references.
Looking through the tickets, I also spotted skids++'s note that changing
a conditional for statement_id
in block migrator predicate
fixed one of the tickets.
This wasn't the full story of the fix, as the many still-failing tests showed, but it
was a good start.
What's Your Problem?
In order to find the best solution for a bug, it's important to understand
what exactly is the problem. We know mis-scoped blocks are the cause of the
bug, so lets grab each of our tests, dump their QAST (--target=ast
), and
write out how mis-scoped the blocks are.
To make it easier to match the QAST::Block
s with the QAST::WVal
s
referencing them, I made a modification to QAST::Node.dump
to include CUID numbers and statement_id
annotations in the dumps.
Going through mosts of the buggy code chunks, we have these results:
is-deeply <a b c>[$_ xx 2], <b b>.Seq, 'xx inside `with`' with 1;
# QAST for `xx` is ALONGSIDE RHS `andthen` thunk, but needs to be INSIDE
is-deeply gather {
sub itcavuc ($c) { try {take $c} andthen 42 };
itcavuc $_ for 2, 4, 6;
}, (2, 4, 6).Seq, 'try with block and andthen';
# QAST for try block is INSIDE RHS `andthen` thunk, but needs to be ALONGSIDE
is-deeply gather {
sub foo ($str) { { take $str }() orelse Nil }
foo "cc"; foo "dd";
}, <cc dd>.Seq, 'block in a sub with orelse';
# QAST for block is INSIDE RHS `andthen` thunk, but needs to be ALONGSIDE
is-deeply gather for ^7 {
my $x = 1;
1 andthen $x.take andthen $x = 2 andthen $x = 3 andthen $x = 4;
}, 1 xx 7, 'loop + lexical variable plus chain of andthens';
# each andthen thunk is nested inside the previous one, but all need to be
# ALONGSIDE each other
is-deeply gather for <a b c> { $^v.uc andthen $v.take orelse .say },
<a b c>.Seq, 'loop + andthen + orelse';
# andthen's block is INSIDE orelse's but needs to be ALONGSIDE each other
is-deeply gather { (.take xx 10) given 42 }, 42 xx 10,
'parentheses + xx + given';
# .take thunk is ALONGSIDE given's thunk, but needs to be INSIDE of it
is-deeply gather { take ("{$_}") for <aa bb> }, <aa bb>.Seq,
'postfix for + take + block in a string';
# the $_ is ALONGSIDE `for`'s thunk, but needs to be INSIDE
is-deeply gather { take (* + $_)(32) given 10 }, 42.Seq,
'given + whatever code closure execution';
# the WhateverCode ain't got no statement_id and is ALONGSIDE given
# block but needs to be INSIDE of it
So far, we can see a couple of patterns:
xx
andWhateverCode
thunks don't get migrated, even though they shouldandthen
thunks get migrated, even though they shouldn't
The first one is fairly straightforward. Looking at the QAST dump, we see
xx
thunk has a higher statement_id
than the block it was meant to be in.
This is what skids++'s hint addresses, so we'll
change the statement_id
conditional from
==
to >=
to look for statement IDs higher than our current one as well,
since those would be from any substatements, such as our xx
inside the
positional indexing operator:
($b.ann('statement_id') // -1) >= $migrate_stmt_id
The cause is very similar for the WhateverCode
case, as it's missing
statement_id
annotation altogether, so we'll just annotate
the generated QAST::Block
with the statement ID. Some basic detective work
gives us the location where that node is created: we search src/Perl6/Actions.nqp
for word "whatever"
until we spot
whatever_curry
method and in its guts we find the QAST::Block
we want. For
the statement ID, we'll grep the source for statement_id
:
$ grep -FIRn 'statement_id' src/Perl6/
src/Perl6/Actions.nqp:1497: $past.annotate('statement_id', $id);
src/Perl6/Actions.nqp:2326: $_.annotate('statement_id', $*STATEMENT_ID);
src/Perl6/Actions.nqp:2488: -> $b { ($b.ann('statement_id') // -1) == $stmt.ann('statement_id') });
src/Perl6/Actions.nqp:9235: && ($b.ann('statement_id') // -1) >= $migrate_stmt_id
src/Perl6/Actions.nqp:9616: ).annotate_self: 'statement_id', $*STATEMENT_ID;
src/Perl6/World.nqp:256: $pad.annotate('statement_id', $*STATEMENT_ID);
From the output, we can see the ID is stored in $*STATEMENT_ID
dynamic
variable, so we'll use that for our annotation on the
WhateverCode
's QAST::Block
:
my $block := QAST::Block.new(
QAST::Stmts.new(), $past
).annotate_self: 'statement_id', $*STATEMENT_ID;
Let's compile and run our bug tests. If you're using
Z-Script, you can re-compile Rakudo
by running z
command with no arguments:
$ z
[...]
$ ./perl6 bug-tests.t
1..1
1..10
ok 1 - xx inside `with`
not ok 2 - try with block and andthen
# Failed test 'try with block and andthen'
# at bug-tests.t line 10
# expected: $(2, 4, 6)
# got: $(2, 2, 4)
not ok 3 - block in a sub with orelse
# Failed test 'block in a sub with orelse'
# at bug-tests.t line 16
# expected: $("cc", "dd")
# got: $("cc", "cc")
not ok 4 - loop + lexical variable plus chain of andthens
# Failed test 'loop + lexical variable plus chain of andthens'
# at bug-tests.t line 22
# expected: $(1, 1, 1, 1, 1, 1, 1)
# got: $(1, 4, 3, 3, 3, 3, 3)
not ok 5 - loop + andthen + orelse
# Failed test 'loop + andthen + orelse'
# at bug-tests.t line 28
# expected: $("a", "b", "c")
# got: $("a", "a", "a")
ok 6 - parentheses + xx + given
ok 7 - postfix for + take + block in a string
ok 8 - given + whatever code closure execution
ok 9 - sub + given + whatevercode closure execution
not ok 10 - sub with `with` + orelse + block interpolation
# Failed test 'sub with `with` + orelse + block interpolation'
# at bug-tests.t line 49
# expected: $("meow True",)
# got: $("meow False",)
# Looks like you failed 5 tests of 10
not ok 1 - thunking closure scoping
# Failed test 'thunking closure scoping'
# at bug-tests.t line 3
# Looks like you failed 1 test of 1
Looks like that fixed half of the issues already. That's pretty good!
Extra Debugging
Let's now look at the remaining failures and figure out why block migration isn't how we want it in those cases. To assists with our sleuthing efforts, let's make a couple of changes to produce more debugging info.
First, let's modify QAST::Node.dump
method in NQP's repo
to dump the value of in_stmt_mod
annotation, by telling it to dump out
the value verbatim if the key is in_stmt_mod
:
if $k eq 'IN_DECL' || $k eq 'BY' || $k eq 'statement_id'
|| $k eq 'in_stmt_mod' {
...
Next, let's go to sub migrate_blocks
in Actions.nqp
and add a bunch of debug dumps inside most of the
conditionals.
This will let us track when a block is compared and to see whether migration
occurs. As mentioned earlier, I like to key my dumps on env vars using
nqp::getenvhash
op, so after modifications my migrate_blocks
routine
looks like this; note the use of .dump
method to dump QAST node guts
(tip: .dump
method also exists on Perl6::Grammar
's match objects!):
sub migrate_blocks($from, $to, $predicate?) {
my @decls := @($from[0]);
my int $n := nqp::elems(@decls);
my int $i := 0;
while $i < $n {
my $decl := @decls[$i];
if nqp::istype($decl, QAST::Block) {
nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: -----------------');
nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: trying to grab ' ~ $decl.dump);
nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: to move to ' ~ $to.dump);
if !$predicate || $predicate($decl) {
nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: grabbed');
$to[0].push($decl);
@decls[$i] := QAST::Op.new( :op('null') );
}
nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: -----------------');
}
elsif (nqp::istype($decl, QAST::Stmt) || nqp::istype($decl, QAST::Stmts)) &&
nqp::istype($decl[0], QAST::Block) {
nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: -----------------');
nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: trying to grab ' ~ $decl[0].dump);
nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: to move to ' ~ $to.dump);
if !$predicate || $predicate($decl[0]) {
nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: grabbed');
$to[0].push($decl[0]);
$decl[0] := QAST::Op.new( :op('null') );
}
nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: -----------------');
}
elsif nqp::istype($decl, QAST::Var) && $predicate && $predicate($decl) {
$to[0].push($decl);
@decls[$i] := QAST::Op.new( :op('null') );
}
$i++;
}
}
After making the changes, we need to recompile both NQP and Rakudo. With
Z-Script, we can just run z n
to do that:
$ z n
[...]
Now, we'll grab the first failing code and take a look at its QAST. I'm
going to use the CoreHackers::Q
tool:
$ q a ./perl6 -e '
sub itcavuc ($c) { try {say $c} andthen 42 };
itcavuc $_ for 2, 4, 6;'
$ firefox out.html
We can see that our buggy say
call lives in QAST::Block
with cuid 1
,
which gets called from within QAST::Block
with cuid 3
, but is actually
located within QAST::Block
with cuid 2
:
- QAST::Block(:cuid(3)) <wanted> :statement_id<1>
:count<?> :signatured<?> :IN_DECL<sub>
:in_stmt_mod<0> :code_object<?>
:outer<?> { try {say $c} andthen 42 }
[...]
- QAST::Block(:cuid(2)) <wanted> :statement_id<2>
:count<?> :in_stmt_mod<0> :code_object<?> :outer<?>
[...]
- QAST::Block(:cuid(1)) <wanted> :statement_id<2>
:IN_DECL<> :in_stmt_mod<0> :code_object<?>
:also_uses<?> :outer<?> {say $c}
[...]
- QAST::Op(call &say) say $c
[...]
- QAST::Op(p6typecheckrv)
[...]
- QAST::WVal(Block :cuid(1))
Looks like cuid 2
block steals our cuid 1
block. Let's enable the
debug env var and look at the dumps to see why exactly:
$ ZZ=1 ./perl6 -e '
sub itcavuc ($c) { try {say $c} andthen 42 };
itcavuc $_ for 2, 4, 6;'
ZZ1: -----------------
ZZ1: trying to grab - QAST::Block(:cuid(1)) <wanted>
:statement_id<2> :IN_DECL<> :in_stmt_mod<0> :code_object<?>
:also_uses<?> :outer<?> {say $c}
[...]
ZZ1: to move to - QAST::Block :statement_id<2>
:in_stmt_mod<0> :outer<?>
ZZ1: grabbed
ZZ1: -----------------
We can see the theft in progress. Let's take a look at our migration predicate again:
! $b.ann('in_stmt_mod')
&& ($b.ann('statement_id') // -1) >= $migrate_stmt_id
In the dump we can see in_stmt_mod
is false. Were it set to a true value,
the block would not be migrated—exactly what we're trying to accomplish.
Let's investigate the in_stmt_mod
annotation, to see when it gets set:
$ G 'in_stmt_mod' src/Perl6/Actions.nqp
2327: $_.annotate('in_stmt_mod', $*IN_STMT_MOD);
9206: !$b.ann('in_stmt_mod') && ($b.ann('statement_id') // -1) >= $migrate_stmt_id
$ G '$*IN_STMT_MOD' src/Perl6/Grammar.nqp
1200: :my $*IN_STMT_MOD := 0; # are we inside a statement modifier?
1328: :my $*IN_STMT_MOD := 0;
1338: | <EXPR> :dba('statement end') { $*IN_STMT_MOD := 1 }
Looks like it's a marker for statement modifier conditions. Statement modifiers
have a lot of relevance to our andthen
thunks, because $foo with $bar
gets
turned into $bar andthen $foo
during parsing. Since, as we can see in
src/Perl6/Grammar.nqp
, in_stmt_mod
annotation gets set for with
statement modifiers, we can hypothesize that if we turn
our buggy andthen
into a with
, the bug will disappear:
$ ./perl6 -e 'sub itcavuc ($c) { 42 with try {say $c} };
itcavuc $_ for 2, 4, 6;'
2
4
6
And indeed it does! Then, we have a way forward: we need to set in_stmt_mod
annotation to a truthy value for just the first argument of andthen
(and its relatives notandthen
and orelse
).
Glancing at the Grammar it doesn't look
like it
immediatelly offers a similar opportunity for how in_stmt_mod
is set for the
with
statement modifier. Let's approach it differently. Since we care about
this when thunks are created, let's watch for andthen
QAST inside
sub thunkity_thunk
in Actions,
then descend into its first kid and add
the in_stmt_mod
annotation by cheating and using the past_block
annotation
on QAST::WVal
with the thunk that contains the reference to QAST::Block
we wish to annotate. The code will look something like this:
sub mark_blocks_as_andnotelse_first_arg($ast) {
if $ast && nqp::can($ast, 'ann') && $ast.ann('past_block') {
$ast.ann('past_block').annotate: 'in_stmt_mod', 1;
}
elsif nqp::istype($ast, QAST::Op)
|| nqp::istype($ast, QAST::Stmt)
|| nqp::istype($ast, QAST::Stmts) {
mark_blocks_as_andnotelse_first_arg($_) for @($ast)
}
}
sub thunkity_thunk($/,$thunky,$past,@clause) {
[...]
my $andnotelse_thunk := nqp::istype($past, QAST::Op)
&& $past.op eq 'call'
&& ( $past.name eq '&infix:<andthen>'
|| $past.name eq '&infix:<notandthen>'
|| $past.name eq '&infix:<orelse>');
while $i < $e {
my $ast := @clause[$i];
$ast := $ast.ast if nqp::can($ast,'ast');
mark_blocks_as_andnotelse_first_arg($ast)
if $andnotelse_thunk && $i == 0;
[...]
First, we rake $past
argument given to thunkity_thunk
for a QAST::Op
for nqp::call
that calls one of our ops—when we found one, we set
a variable to a truthy value. Then, in the loop, when we're iterating over
the first child node ($i == 0
) of these ops, we'll pass its QAST to our
newly minted mark_blocks_as_andnotelse_first_arg
routine, inside of which
we recurse over any ops that can have kids and mark anything that
has past_block
annotation with truthy in_stmt_mod
annotation.
Let's compile our concoction and give the tests another run. Once again, I'm using Z-Script to recompile Rakudo:
$ z
[...]
$ ./perl6 bug-tests.t
1..1
1..10
ok 1 - xx inside `with`
ok 2 - try with block and andthen
ok 3 - block in a sub with orelse
not ok 4 - loop + lexical variable plus chain of andthens
# Failed test 'loop + lexical variable plus chain of andthens'
# at bug-tests.t line 23
# expected: $(1, 1, 1, 1, 1, 1, 1)
# got: $(1, 4, 3, 3, 3, 3, 3)
ok 5 - loop + andthen + orelse
ok 6 - parentheses + xx + given
ok 7 - postfix for + take + block in a string
ok 8 - given + whatever code closure execution
ok 9 - sub + given + whatevercode closure execution
not ok 10 - sub with `with` + orelse + block interpolation
# Failed test 'sub with `with` + orelse + block interpolation'
# at bug-tests.t line 50
# expected: $("meow True",)
# got: $("meow False",)
# Looks like you failed 2 tests of 10
not ok 1 - thunking closure scoping
# Failed test 'thunking closure scoping'
# at bug-tests.t line 4
# Looks like you failed 1 test of 1
We got closer to the goal, with 80% of the tests now passing! In the first remaining failure, we already know from our original examination that chained andthen thunks get nested when they should not—we haven't done anything to fix that yet. Let's take care of that first.
Playing Chinese Food Mind Games
Looking back out at the fixes we applied already, we have a marker for when
we're working with andthen
or its sister ops: the $andnotelse_thunk
variable. It seems fairly straight-forward that if we don't want the thunks
of these ops to migrate, we just need to annotate them appropriately and stick
the check for that annotation into the migration predicate.
In Grammar.nqp
, we can see our ops are configured with the .b
thunky, so we'll locate that branch in
sub thunkity_thunk
and pass $andnotelse_thunk
variable as a new named param
to the make_topic_block_ref
block maker:
...
elsif $type eq 'b' { # thunk and topicalize to a block
unless $ast.ann('bare_block') || $ast.ann('past_block') {
$ast := block_closure(make_topic_block_ref(@clause[$i],
$ast, :$andnotelse_thunk,
migrate_stmt_id => $*STATEMENT_ID));
}
$past.push($ast);
}
...
The block maker) will shove it into the migration predicate, so our block maker code becomes this:
sub make_topic_block_ref(
$/, $past, :$copy, :$andnotelse_thunk, :$migrate_stmt_id,
) {
my $block := $*W.push_lexpad($/);
# Add annotation to thunks of our ops:
$block.annotate: 'andnotelse_thunk', 1 if $andnotelse_thunk;
$block[0].push
QAST::Var.new( :name('$_'), :scope('lexical'), :decl('var') );
$block.push($past);
$*W.pop_lexpad();
if nqp::defined($migrate_stmt_id) {
migrate_blocks($*W.cur_lexpad(), $block, -> $b {
! $b.ann('in_stmt_mod')
# Don't migrate thunks of our ops:
&& ! $b.ann('andnotelse_thunk')
&& ($b.ann('statement_id') // -1) >= $migrate_stmt_id
});
}
...
One more compilation cycle and test run:
$ z
[...]
$ ./perl6 bug-tests.t
1..1
1..10
ok 1 - xx inside `with`
ok 2 - try with block and andthen
ok 3 - block in a sub with orelse
ok 4 - loop + lexical variable plus chain of andthens
ok 5 - loop + andthen + orelse
ok 6 - parentheses + xx + given
ok 7 - postfix for + take + block in a string
ok 8 - given + whatever code closure execution
ok 9 - sub + given + whatevercode closure execution
not ok 10 - sub with `with` + orelse + block interpolation
# Failed test 'sub with `with` + orelse + block interpolation'
# at bug-tests.t line 50
# expected: $("meow True",)
# got: $("meow False",)
# Looks like you failed 1 test of 10
not ok 1 - thunking closure scoping
# Failed test 'thunking closure scoping'
# at bug-tests.t line 4
# Looks like you failed 1 test of 1
So close! Just a single test failure remains. Let's give it a close look.
Within and Without
Let's repeat our procedure of dumping QASTs as well as enabing the ZZ
env
var and looking at what's causing the thunk mis-migration. I'm going to run
a slightly simplified version of the failing test, to keep the cruft out of
QAST dumps. If you're following along, when looking at full QAST dump keep in
mind what I mentioned earlier: with
gets rewritten into andthen
op call
during parsing.
$ q a ./perl6 -e '.uc with +"a" orelse "meow {$_ ~~ Failure}".say and 42'
$ firefox out.html
- QAST::Block(:cuid(4)) :in_stmt_mod<0>
[...]
- QAST::Block(:cuid(1)) :statement_id<1> :in_stmt_mod<1>
[...]
- QAST::Op(chain &infix:<~~>) <wanted> :statement_id<2> ~~
- QAST::Var(lexical $_) <wanted> $_
- QAST::WVal(Failure) <wanted> Failure
- QAST::Block(:cuid(2)) :statement_id<1>
:in_stmt_mod<1> :andnotelse_thunk<1>
[...]
- QAST::Op(callmethod Stringy) <wanted>
- QAST::Op(call) <wanted> {$_ ~~ Failure}
- QAST::Op(p6capturelex) <wanted> :code_object<?>
- QAST::Op(callmethod clone)
- QAST::WVal(Block)
$ ZZ=1 ./perl6 -e '.uc with +"a" orelse "meow {$_ ~~ Failure}".say and 42'
[...]
ZZ1: -----------------
ZZ1: trying to grab - QAST::Block(:cuid(1))
:statement_id<1> :in_stmt_mod<1>
[...]
ZZ1: to move to - QAST::Block
:statement_id<1> :andnotelse_thunk<1> :in_stmt_mod<1>
[...]
ZZ1: -----------------
Although QAST::WVal
lacks .past_block
annotation and so doesn't show
the block's CUID in the dump, just by reading the code dumped around that
QAST, we can
see that the CUID-less block is our QAST::Block :cuid(1)
, whose immediate
outer is
QAST::Block :cuid(4)
, yet it's called from within
QAST::Block :cuid(2)
. It's supposed to get migrated, but that migration
never happens, as we can see when we use the ZZ
env var to enable our
debug dumps in the sub migrate_blocks
.
We can see why. Here's our current migration predicate (where $b
is the
examined block, which in our case is QAST::Block :cuid(1)
):
! $b.ann('in_stmt_mod')
&& ! $b.ann('andnotelse_thunk')
&& ($b.ann('statement_id') // -1) >= $migrate_stmt_id
The very first condition prevents our migration, as our block has truthy
in_stmt_mod
annotation, because it's part of the with
's condition.
At the same time,
it does need to be migrated because it's part of the andthen
thunk that's
inside the statement modifier!
Since we already have $andnotelse_thunk
variable in the vicinity of the
migration predicate we can use it to tell us whether we're migrating for the
benefit of our andthen
thunk and not the statement modifier. However,
recall that we've used
the very same in_stmt_mod
annotation to mark the first argument of andthen
and its brother ops. We need to alter that first.
And so, the sub mark_blocks_as_andnotelse_first_arg
we added earlier becomes:
sub mark_blocks_as_andnotelse_first_arg($ast) {
if $ast && nqp::can($ast, 'ann') && $ast.ann('past_block') {
$ast.ann('past_block').annotate: 'in_stmt_mod_andnotelse', 1;
}
...
And then we tweak the migration predicate
to watch for this altered annotation and
to consider the value of $andnotelse_thunk
variable:
migrate_blocks($*W.cur_lexpad(), $block, -> $b {
( (! $b.ann('in_stmt_mod_andnotelse') && $andnotelse_thunk)
|| (! $b.ann('in_stmt_mod') && ! $andnotelse_thunk)
)
&& ($b.ann('statement_id') // -1) >= $migrate_stmt_id
&& ! $b.has_ann('andnotelse_thunk')
});
Thus, we migrate all the blocks with statement_id
equal to or higher than
ours and are all of the following:
- Not thunks of actual
andthen
,notandthen
, ororelse
- Not thunks inside a statement modifier, unless they're inside
thunks of
andthen
or related ops - If we're considering migrating them inside one of the
andthen
's thunks, then also not part of the first argument toandthen
(or related ops), .
That's a fancy-pants predicate. Let's compile and see if it gets the job done:
$ z
[...]
$ ./perl6 bug-tests.t
1..1
1..10
ok 1 - xx inside `with`
ok 2 - try with block and andthen
ok 3 - block in a sub with orelse
ok 4 - loop + lexical variable plus chain of andthens
ok 5 - loop + andthen + orelse
ok 6 - parentheses + xx + given
ok 7 - postfix for + take + block in a string
ok 8 - given + whatever code closure execution
ok 9 - sub + given + whatevercode closure execution
ok 10 - sub with `with` + orelse + block interpolation
ok 1 - thunking closure scoping
Success! Now, let's remove all of the debug statements we added. Then,
recompile and run make stresstest
, to ensure we did not break anything else.
With Z-Script, we can do all that by
just running z ss
:
$ z ss
[...]
All tests successful.
Files=1287, Tests=153127, 159 wallclock secs (21.40 usr 3.27 sys + 3418.56 cusr 179.32 csys = 3622.55 CPU)
Result: PASS
All is green. We can now commit our fix to Rakudo's repo, then commit our tests to the roast repo, and all that remains is closing those 10 tickets we fixed!
Job well done.
Conclusion
Today, we learned quite a bit about QAST: the Abstract Syntax Trees Perl 6 code compiles to in the Rakudo compiler. We examined the common types of QAST and how to create, annotate, mutate, execute, and dump them for examination.
In the second part of the article, we applied our new knowledge to fix a hive of mis-scoped thunking bugs that plagued various Perl 6 constructs. We introspected the generated QAST nodes to specially annotate them, and then used those annotations to reconfigure migration predicate, so that it migrates the blocks correctly.
Hopefully, this knowledge inspires you to fix the many other bugs we have on the RT tracker as well as our GitHub Issue tracker
-Ofun
Leave a comment