Itch.scratch()

In writing my past few blog entries I’ve repeatedly come across a situation that Raku doesn’t handle as well as I could wish. It’s a little thing, but like so many other little things, its presence is a source of minor but persistent irritation.

In my previous entry I used some Raku code that illustrates the point perfectly. I needed to build an object for every value in the @values array, or a single special object if @values was empty:

    for @values Z $label,"",* -> ($value, $label) {
        Result.new:
            desc  => "$label ($param.name() = $value)",
            value => timed { $block($value) },
            check => { last if .timing > TIMEOUT }
    }
    if !@values {
        Result.new:
            desc  => $label,
            value => timed { $block(Empty) }
    }

At almost the same time, in other (non-blog) code I was writing, I needed exactly the same construction...to do something with every element of an array, or something different if the array had no elements:

    for @errors -> $error {
        note $error if DEBUG;
        LAST die X::CompilationFailed.new( :@errors );
    }
    if !@errors {
        note 'Compilation complete' if DEBUG;
        return $compilation;
    }

These are just two examples of a surprisingly common situation: the need to iterate through a list...or else do something special if the list is empty. In other words: if the loop doesn’t iterate, do this instead.

There are several other ways I could have written those loops. For example, I could have prefixed the for with a do, thereby converting the loop into an expression. Then I could append an or and the special case code, so that code would be executed if the first do was false, which would happen if the loop didn’t iterate at all:

    do for @values Z $label,"",* -> ($value, $label) {
        Result.new:
            desc  => "$label ($param.name() = $value)",
            value => timed { $block($value) },
            check => { last if .timing > TIMEOUT }
    } or
        Result.new:
            desc  => $label,
            value => timed { $block(Empty) };


    do for @errors -> $error {
        note $error if DEBUG;
        LAST die X::CompilationFailed.new( :@errors );
    } or do {
        note 'Compilation complete' if DEBUG;
        return $compilation;
    }

That certainly works, and it eliminates the repeated testing of @values or @errors, but it’s aesthetically unsatisfying, besides making the code less readable.
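To see the trick in isolation, here is a minimal self-contained sketch. The report subroutine and its messages are hypothetical stand-ins for the real code above; the only thing being demonstrated is that a do for yields a list, and that an empty list is false:

```raku
# Hypothetical helper: collects the messages instead of printing them,
# so the two code paths are easy to compare.
sub report(@errors) {
    my @log;

    # 'do for' turns the loop into an expression yielding a list of the
    # block's results. An empty list is false, so the 'or' branch only
    # runs when the loop never iterated.
    do for @errors -> $error {
        @log.push: "error: $error";
    } or do {
        @log.push: 'no errors';
    };

    return @log;
}

say report(('missing semicolon', 'unknown type'));
say report(());
```

The first call only ever takes the loop branch, and the second only the or branch, which is exactly the “if the loop doesn’t iterate, do this instead” behaviour we are after.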

I could improve the readability (though not the aesthetics) by hoisting the “if-there-were-no-iterable-values” test to the top, like so:

    if @values {
        for @values Z $label,"",* -> ($value, $label) {
            Result.new:
                desc  => "$label ($param.name() = $value)",
                value => timed { $block($value) },
                check => { last if .timing > TIMEOUT }
        }
    }
    else {
        Result.new:
            desc  => $label,
            value => timed { $block(Empty) }
    }


    if @errors {
        for @errors -> $error {
            note $error if DEBUG;
            LAST die X::CompilationFailed.new( :@errors );
        }
    }
    else {
        note 'Compilation complete' if DEBUG;
        return $compilation;
    }

...but that just underscores the absurdity of needing to test the state of the iterated array twice within the first two lines.

However, it does suggest a cleaner solution: one that eliminates repetition, and maximizes readability. A solution whose only drawback is that it’s impossible in standard Raku.

That solution is: for loops should be able to have an else block!

An else block executes when the preceding if or when block doesn’t. In just the
same way, it ought to be possible to append an else block to a for,
so that the else block executes when the preceding loop block doesn’t.

If that were possible in Raku, then my two pieces of code would simplify to:

    for @values Z $label,"",* -> ($value, $label) {
        Result.new:
            desc  => "$label ($param.name() = $value)",
            value => timed { $block($value) },
            check => { last if .timing > TIMEOUT }
    }
    else {
        Result.new:
            desc  => $label,
            value => timed { $block(Empty) }
    }


    for @errors -> $error {
        note $error if DEBUG;
        LAST die X::CompilationFailed.new( :@errors );
    }
    else {
        note 'Compilation complete' if DEBUG;
        return $compilation;
    }

There’s just that very minor problem of its not being valid Raku syntax (or semantics).
But, as usual, that’s not really much of a problem at all in Raku.
To solve it, we just redefine the for keyword...

To replace the standard definition of for we need to tell the compiler two things: what the new definition looks like, and how it works. In other words, we need to define how to recognize the new for syntax, and how to convert that new syntax into an “abstract syntax tree” of opcodes that the compiler can optimize and execute. And, of course, we also need to tell the compiler to use these new components instead of the standard ones.

In Raku, the grammar and semantics we are using to interpret any part of the code is known as a “sublanguage”, or “slang” for short. A typical Raku program consists of a number of slangs braided together: the main Raku sublanguage, the Pod documentation sublanguage, the string sublanguage, the regex sublanguage, etc. The objects implementing these various active sublanguages are available through a compile-time variable: $*LANG.

In this instance, we need to augment the main Raku slang, so we create a role with the new grammar rule for the extended for syntax, then mix that new syntax into the existing grammar. Likewise, we need to extend the actions the compiler takes on encountering the new syntax, so we create a second role specifying those actions, and later mix it into the existing actions.

In its simplest form, the new grammar rule looks like this:

    # Encapsulate new grammar component in a composable role...
    role ForElse::Grammar {

        # Replacement 'for' syntax...
        rule statement_control:sym<for> {
              <sym>   <xblock(2)>
            [ 'else'  <pblock(0)> ]?
        }
    }

We declare the role (ForElse::Grammar) that will contain the new grammar rule,
then declare the rule itself. The rule’s name is: statement_control:sym<for>,
which tells the compiler that it’s a statement-level control structure, introduced by the symbol for. In the body of the rule, we first match that symbol (<sym>), followed by an “expression-block” (<xblock(2)>). An expression block is simply a shorthand for matching a non-optional expression, followed by a non-optional block. The 2 passed into the call to xblock tells the subrule that the block it matches must contain a topic variable of some kind (because for loops always set a topic variable, and we might as well enforce that when the source code is being parsed).

After parsing the for component, we now want to allow an optional else, so we specify that as a literal ('else'), after which we expect a parameterized block (<pblock(0)>). The zero argument tells the subrule that the else block is not expected to have a topic variable. Then we wrap the entire else syntax in non-capturing brackets ([...]) and make it optional (?).

Note that we don’t need to specify rules for <xblock> and <pblock>. Their rules are already defined in the standard Raku grammar, to which we will eventually be adding this new statement_control:sym<for> rule.
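If you’d like to experiment with the shape of that rule without touching the compiler, a toy grammar is enough. This is only a sketch: ToyFor, expr and block are invented for the illustration and are vastly simpler than the real <xblock> and <pblock> subrules in the Raku grammar.

```raku
# A toy stand-in for the real grammar (all names here are hypothetical).
grammar ToyFor {
    rule TOP { <statement> }

    proto rule statement {*}

    # Same shape as the real rule: keyword, expression, block,
    # then an optional 'else' followed by another block...
    rule statement:sym<for> {
          <sym>   <expr> <block>
        [ 'else'  <elseblock=.block> ]?
    }

    token expr  { '@' \w+ }            # just an array name
    token block { '{' <-[}]>* '}' }    # braces with no nesting
}

my $with    = ToyFor.parse('for @values { .say } else { say "none" }');
my $without = ToyFor.parse('for @values { .say }');

say $with<statement><elseblock>.Str;
say $without<statement><elseblock>.defined;
```

The second parse still succeeds, because the bracketed else component is optional; it simply has no elseblock capture, which is what the actions will later test for.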

This two-line rule is sufficient to parse every valid for...else, but it will also successfully parse several other invalid constructs. So we next add in a small number of extra components to prevent that. The extended version of the rule looks like this:

    rule statement_control:sym<for> {
        <sym><.kok> {}
        <.vetPerl5Syntax>
        <xblock(2)> {}
        [ 'else' <elseblock=.pblock(0)> ]?
    }

    rule vetPerl5Syntax {
        [ <?before 'my'? '$'\w+\s+'(' >
            <.typed_panic: 'X::Syntax::P5'> ]?
        [ <?before '(' <.EXPR>? ';' <.EXPR>? ';' <.EXPR>? ')' >
            <.obs('C-style "for (;;)" loop', '"loop (;;)"')> ]?
    }

The call to the <.kok> subrule is used to check that, having matched the initial 'for', those three characters really do constitute a keyword that’s okay. For example, if the 'for' is followed by a =>, then it’s not the start of a for loop, but the key of a pair. Similarly, if the 'for' is immediately followed by an opening parenthesis, then it’s not the start of a for loop; it’s a call to some function named &for. The <.kok> subrule (once again inherited from the standard Raku grammar) does various lookaheads to check for these and other edge-cases, and fails if any is found.

The empty braces ({}) after the call to <.kok> are there to indicate the end of longest-token matching within that part of the rule. The issue here is that other statement-control symbols might be defined by other people wanting to modify the code, and the grammar needs to know which one to select if two or more of them match. In general, when considering a set of alternatives within a regex or rule, Raku takes the alternative that matches the longest substring (not the first alternative that matches, like in Perl). This is known as “longest token matching” or LTM for short.

When the grammar is trying to decide between our new for...else syntax, and (say) someone else’s for...otherwise syntax, we don’t want it selecting ours just because ours matched more total characters. We want it to select whichever syntax is more appropriate. So we need to stop the LTM evaluator from considering the entire match, and only consider the keyword. There are several ways to signal “end of LTM”, but the shortest and easiest is just to insert an empty code block (i.e. {}) into the rule. Which is what we’ve done here.
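The effect is easy to observe in an ordinary regex. The following sketch uses made-up patterns that have nothing to do with the for rule; it just contrasts longest-token alternation (|), first-match alternation (||), and an alternative whose declarative prefix is cut short by an empty block:

```raku
my $text = 'foobar';

# '|' is longest-token alternation: the longer alternative wins,
# even though it is listed second.
my $ltm   = ~($text ~~ / 'foo' | 'foobar' /);

# '||' is sequential alternation: the first alternative that matches wins.
my $first = ~($text ~~ / 'foo' || 'foobar' /);

# The empty block ends the declarative prefix of the first alternative,
# so LTM only "sees" 'foo' (3 characters) there and prefers 'foob' (4).
my $cut   = ~($text ~~ / 'foo' {} 'bar' | 'foob' /);

say $ltm;     # foobar
say $first;   # foo
say $cut;
```

Without the {} in the third pattern, the first alternative would be compared on all six characters and would win instead.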

The next addition to the rule is a call to the <.vetPerl5Syntax> subrule:

    rule vetPerl5Syntax {
        [ <?before 'my'? '$'\w+\s+'(' >
            <.typed_panic: 'X::Syntax::P5'> ]?
        [ <?before '(' <.EXPR>? ';' <.EXPR>? ';' <.EXPR>? ')' >
            <.obs('C-style "for (;;)" loop', '"loop (;;)"')> ]?
    }

This call was added because the standard Raku grammar always looks particularly
closely at for loops to make sure that someone hasn’t accidentally used one of
the two older Perl 5 syntaxes. If the subrule looks ahead and finds a my
and/or a variable immediately after the for, and then an opening parenthesis
(<?before 'my'? '$'\w+\s+'(' >), it concludes that it’s seeing a Perl 5 for
and throws an X::Syntax::P5 exception. If it looks ahead and finds a pair of
parentheses containing three expressions separated by semi-colons
(<?before '(' <.EXPR>? ';' <.EXPR>? ';' <.EXPR>? ')' >),
it concludes that it’s seeing a Perl 5 C-style for loop and warns the user
to replace it with a loop instead.
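The same lookahead technique works in any regex. As a rough sketch (the c-style-for regex and the looks-c-style subroutine are hypothetical, and crude character classes stand in for the real <.EXPR> subrule), we could detect a C-style loop header like this:

```raku
# Hypothetical detector: matches 'for' only when it is followed by
# parentheses containing exactly two semicolons. The <?before ...>
# assertion looks ahead without consuming any of the input.
my regex c-style-for {
    'for' \s*
    <?before '(' <-[;)]>* ';' <-[;)]>* ';' <-[;)]>* ')' >
}

sub looks-c-style(Str $line --> Bool) {
    so $line ~~ / ^ \s* <c-style-for> /
}

say looks-c-style('for (my $i = 0; $i < 10; $i++) {');
say looks-c-style('for (@values) {');
say looks-c-style('for @values -> $value {');
```

Only the first header contains the two semicolons the lookahead demands, so only that one is flagged.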

Finally, we modify the call to <pblock(0)> like so: <elseblock=.pblock(0)>
This causes any match by the <pblock(0)> call to be stored under the key 'elseblock' instead of the key 'pblock'. That will subsequently improve the readability of our else-processing code.

Once these extra checks and balances are in place, our statement_control:sym<for> rule is ready to be added to the current slang. If we did so, the compiler would now be able to recognize for...else constructs, but we’d see no useful effect from its doing so. That’s because we haven’t yet told it how to convert the new for...else syntax into executable opcodes.

To tell it that, we declare a second role (so we can later mix it into the existing
compiler actions). In that role we specify a method of the appropriate name,
which the compiler will then call automatically every time it successfully parses
with our new statement_control:sym<for> rule:

    # Encapsulate new actions for new 'for' syntax...
    role ForElse::Actions {
        use nqp;
        use QAST:from<NQP>;

        # Utility function...
        sub lookup(Mu \match, \key) {
            nqp::atkey(
                nqp::findmethod(match, 'hash')(match),
                key
            ).?ast
        }

        # New actions when a 'for' is parsed...
        method statement_control:sym<for> (Mu $match) {
            my $forloop := callsame;
            if lookup($match, 'elseblock') -> $elseblock {
                $match.make:
                    QAST::Op.new: :op<unless>, $forloop,
                    QAST::Op.new: :op<call>,   $elseblock
            }
        }
    }

The first thing we do in our new action role is to load the facilities of the nqp and QAST modules. NQP is the “Not Quite Perl 6” subset of Raku in which the majority of the Raku compiler is written. QAST is the “Quisquous Abstract Syntax Tree” representation of opcodes and arguments to which all Raku code is reduced within the compiler. As we’re effectively upgrading the compiler to handle our new syntax, we’re going to need to access those syntactic components via NQP commands. And, to implement our new behaviour, we’re going to need to build a suitable QAST structure.

First we build a simple utility function (lookup) that takes a pattern match from the grammar and attempts to retrieve the abstract syntax tree of a particular named submatch from within that match. Note that, because this code will be inserted into the compiler,
it can’t rely on the usual Raku data structures and access methods being available.
Instead, we need to use the underlying NQP access functions. In this case, the utility first locates the function that extracts the hash-like component of the match object (nqp::findmethod(match, 'hash')), then calls that hash-extractor function on the match data structure ((match)), then does a key-lookup into the resulting hash (nqp::atkey(..., key)) then attempts to retrieve the abstract syntax tree associated with that match (.?ast).

Once we have this ability to extract particular components from a grammar match, we can write a method that pulls out the various pieces of a for...else match and rearranges them into a suitable QAST representation. That method has to have the same name as the rule whose match it is processing, so we declare:

    method statement_control:sym<for> (Mu $match) {...}

The method takes as its only argument the match object produced by the corresponding grammar rule. We declare that parameter to be of type Mu (the root type of the entire Raku hierarchy) because it’s an NQP object and the Raku type system won’t pass it otherwise. Note that we can’t just omit the type declaration from the $match parameter, because then it would default to type Any, which would be too specific in this case.

Once we have the match object, the first thing we need to do in order to turn the parsed match into a suitable QAST object is to convert the for component. But the standard Raku parser already knows how to do that, so we can just tell it to fall back on the previous behaviour...by invoking callsame.

That redispatched call will return a QAST object representing the for loop, and will also install that same QAST object as the new abstract syntax tree for the $match object.
As we may need to override that behaviour (if there is an else involved), we keep the for loop’s QAST object, by aliasing it to $forloop:

    my $forloop := callsame;

Then we need to discover whether the parser actually did find an else block,
which we do by looking for a capture named 'elseblock' within the $match object
(lookup($match, 'elseblock')). If there was an else after the for, we need to
build a QAST structure that executes the else block only if the for loop didn’t execute.
In pseudocode, that’s:

    unless forloop
      call elseblock

And in QAST it’s exactly the same:

    QAST::Op.new: :op<unless>, $forloop,
    QAST::Op.new: :op<call>,   $elseblock

That is, we build a new QAST unless operation (QAST::Op.new: :op<unless>), passing it its two required operands:

  • a QAST object representing the condition to be tested,

  • a QAST object representing what to do if that condition is false.

In this case, the first argument (the condition) is the QAST object we got back from callsame: the QAST object that implements the entire for loop (i.e. $forloop).
The second argument (what to do) is a new QAST object implementing a call to the else block (i.e. QAST::Op.new: :op<call>, $elseblock).
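For comparison, the same two-operand structure can be approximated in everyday Raku, with no QAST in sight. This is merely a sketch of the semantics, not what the compiler actually generates, and iterate-or-else is a hypothetical name:

```raku
# Hypothetical helper mirroring the 'unless' node: the first operand is
# the result of running the loop, and the second is a call to the
# fallback block, made only when that result is false (i.e. empty).
sub iterate-or-else(@items, &body, &otherwise) {
    my $results = do for @items -> $item { body($item) };
    otherwise() unless $results;
    return $results;
}

iterate-or-else((1, 2, 3), { say "item $_" }, { say 'nothing to do' });
iterate-or-else((),        { say "item $_" }, { say 'nothing to do' });
```

The first call runs the body three times and never touches the fallback; the second runs only the fallback.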

Once we’ve built that QAST structure, we simply install it as the abstract syntax tree for the original match ($match.make: ...).

And that’s it. When the new statement_control:sym<for> rule in the extended grammar successfully matches, the compiler will invoke the equivalent statement_control:sym<for> method in the extended actions, which will convert the parsed syntax into a QAST implementing the extended behaviour.

Provided, of course, we actually extend the grammar and its actions.
Which we haven’t done yet.

But, like most things in Raku, actually extending the grammar and its actions is not hard to do. Our goal is to have a module (let’s call it: Slang::ForElse) that installs our new slang within any lexical scope where it’s use’d:

    {
        use Slang::ForElse;

        for @values {
            .say;
        }
        else {
            say 'No values';
        }
    }

That module will need to modify the $*LANG object to install an augmented main grammar and the corresponding extended actions. Specifically, the module will need to call the $*LANG object’s .define_slang method, passing it the name of the slang to be modified (in this case: the "MAIN" slang), and the new grammar and actions to be installed for that slang.

We’re going to need to call that .define_slang method every time the module is use’d,
so we should put the call in the module’s EXPORT subroutine. Like so:

    sub EXPORT () {

        $*LANG.define_slang:
            "MAIN",
            $*LANG.slangs<MAIN>         but ForElse::Grammar,
            $*LANG.slangs<MAIN-actions> but ForElse::Actions;

        return hash();
    }

The subroutine calls the .define_slang method of the $*LANG object, requesting it to update the definitions of the "MAIN" slang. The second argument is the grammar to be installed, which is just the current "MAIN" grammar ($*LANG.slangs<MAIN>), but with the new for...else grammar mixed into it (but ForElse::Grammar). The third argument is the actions object to be installed, which is just the current "MAIN-actions" object ($*LANG.slangs<MAIN-actions>) but with our new for...else actions
mixed in (but ForElse::Actions).

Finally, we have to make sure that the EXPORT subroutine returns an empty hash (hash()), to tell the compiler that we’re not actually exporting anything here.

And that’s it. In less than 25 lines, we’ve modified the syntax and semantics of Raku to add a “missing” construct. It would be no harder to extend or modify the language in other ways to scratch other itches. And because each extension or modification is performed lexically, by mixing new behaviours into the existing slangs, these extra features are likely to play nicely with one another.

For example, we could load both Slang::ForElse and Slang::SQL and write:

    use Slang::ForElse;
    use Slang::SQL;

    sql drop table if exists stuff;

    sql create table if not exists stuff (
        id  integer,
        sid varchar(32)
    );

    for @ids {
        sql insert into stuff (id, sid)
            values (?, ?); with ($_, ('A'..'Z').pick(16).join);
    }
    else {
        sql insert into stuff (id, sid)
            values (?, ?); with (99, 'default');
    }

    sql select * from stuff order by id asc; do -> $row {
        "{$row<id>}\t{$row<sid>}".say;
    };

The ability to define and deploy lexically scoped slangs makes Raku highly future-proof. Anything we forgot to add to Raku in the original design (such as for...else!) can easily be added later if needed.

And slangs also have the potential to make Raku highly interoperable with other tools.
For example, Raku could potentially become the ultimate “glue language”, by allowing us to switch into slangs that look suspiciously like other programming languages, whenever those languages might be more convenient to code in:

    for @values -> \value {
        use Slang::Python;
        from math import floor, sqrt

        def fac(n):
            step = lambda x: 1 + (x<<2) - ((x>>1)<<1)
            maxq = long(floor(sqrt(n)))
            d = 1
            q = n % 2 == 0 and 2 or 3
            while q <= maxq and n % q != 0:
                q = step(d)
                d += 1
            return q <= maxq and [q] + fac(n//q) or [n]

        print(fac(value))
    }
    else {
        use Slang::Ruby;
        require 'io/console'
        print "No values. Continue?"

        loop do
            case $stdin.getch
                when "Y" then return
                when "N" then break
                else print "\rNo values. Continue? [YN]"
            end
        end
    }

Unfortunately, those particular slang modules don’t exist yet, but fully functional Raku interfaces to both Python and Ruby (and to Perl and Lua and Scheme and Go and C) are already available, so writing fully integrated slangs for each of them would be just a simple matter of metaprogramming. ;-)

Damian

2 Comments

I knew I had seen this construct before.

In any case, your posts never fail to amaze me. Thanks!

I love the basic construct, but I feel reusing the keyword 'else' could lead to confusion when quickly scanning the code. Perhaps something slightly distinct, 'forelse' or 'felse'?

But then I used to ridicule 'foreach' as too long, when you can type 'for'.

About Damian Conway

I blog about Perl.