Perl 6 Archives

The Rakudo Book Project

Read this article on Rakudo.Party

When I first joined the Rakudo project, we used to say "there are none right now; check back in a year" whenever someone asked for a book about the language. Today, there's a whole website for picking out a book, and the number of available books seems to multiply every time I look at it.

Still, I feel something is amiss when I talk to folks on our support chat, when I read blog posts about the language, or when I look at our official language documentation. And it's due to that feeling that I wish to join the Rakudo book-writing club and write a few books of my own. I dub it: The Rakudo Book Project.


The Books

The Rakudo Book Project involves 3 main books—The White Book, The Gray Book, and The Black Book—as well as 2 half-books—The Green Book and The Cracked Book.

The White Book will aim to provide introductory material to the Rakudo language. The target audience will benefit from prior programming experience, but it won't be strictly necessary for computer-savvy people. The target audience is "adept beginners", as some might call them.

The book will cover most of Rakudo's features a typical Rakudo programmer might use in their projects, but it won't cover every little thing about each of them. By the end of the book, the readers will have written several programming projects and will be comfortable making useful, real-world Rakudo programs. More in-depth coverage of the language will be provided by The Gray Book, which is what The White Book's readers would read next. The Black Book will reach even deeper, exploring all of the arcane constructs. The progression through the books can be thought of as a plant growing in a flower pot. Initially, the roots extend through a large area of the pot, but they don't go all the way to all the walls and are rather sparse. As the plant grows, more and more roots shoot out, covering more and more volume of the pot. The same is true of the books: while reading The White Book alone will let the plant survive, the root coverage will be sparse. However, by the end of The Black Book, the reader will be an expert Rakudo programmer.

Those three books are the core of my planned project. They're supplemented by two half-books on each end of the knowledge spectrum. The Green Book will target absolute programming beginners and get them up to speed just enough so they would be able to comfortably continue their learning using The White Book. On the other end of the spectrum is The Cracked Book. It's a half-book that follows The Black Book and won't provide more advanced techniques per se, but rather arcane "hacks" or even "bad ideas" that one might not wish to use in real-life code but which nevertheless provide some insight into the language.

The Cracked Book is as yet only a faint glimmer of an idea. Whether it will actually be made will depend on how much more I want to say after The Black Book is complete. The Green Book is currently a bit amorphous as well. I have a 12-year-old sibling interested in computers, so The Green Book might end up being a Rakudo For Kids.

The likely order in which the books will be produced is White, Gray, Green, Black, and Cracked. It's an ambitious plan, and so I won't be making any promises for producing more than one book at a time. Thus, the current aim is to produce just The White Book.

The Price

The digital versions of the books will be available for free.

Since Rakudo development can always use more funding, I plan to run crowd-funding campaigns during each book's development. 100% of the collected funds will be used to sponsor Rakudo work (sponsoring someone other than me, of course). The campaigns will start once half of the target book has been created, and the backers will get early preview digital copies as the book is developed further, as well as honourable mentions as Rakudo sponsors in the book itself.

Thus, the first Rakudo Core Fundraiser will launch once I have the first half of The White Book finished. I'm hoping that will happen soon.

The Why

Other than the obvious reason why people write books—giving an alternate take on the material—I'd like to do this to cross an item off my bucket list. Having written a terrible non-fiction book, a lackluster fiction book, and a decent illustrated children's book, I hope to add a great technical book to the list, to complete it. I figure, with 5 books to attempt it with, I'll be successful.

As for my alternate take, I hope to squash the myth that Rakudo is too big to learn, as well as carve out a well-defined path for learners to follow. Just as I could make a living 10 years ago, when I barely spoke English, so can a beginner Rakudo programmer make useful programs with rudimentary knowledge of the language. The key is to not try to learn everything at once and to have a definite path to walk. Hence the 5 separate books.

I'm hoping at the end of this journey I will have accomplished all of these goals.

See you at the first Rakudo Core Fundraiser.

Perl 6: Seqs, Drugs, And Rock'n'Roll (Part 2)

Read this article on Perl6.Party

This is the second part in the series! Be sure you read Part I first, where we discuss what Seqs are and how to .cache them.

Today, we'll take the Seq apart and see what's up in it; what drives it; and how to make it do exactly what we want.

PART II: That Iterated Quickly

The main piece that makes a Seq do its thing is an object that does the Iterator role. It's this object that knows how to generate the next value, whenever we try to pull a value from a Seq, or push all of its values somewhere, or simply discard all of the remaining values.

Keep in mind that you never need to use Iterator's methods directly when making use of a Seq as a source of values. They are called indirectly under the hood in various Perl 6 constructs. The usual case for calling those methods yourself is when we're making an Iterator that's fed by another Iterator, as we'll see.

Pull my finger...

In its most basic form, an Iterator object needs to provide only one method: .pull-one

my $seq := Seq.new: class :: does Iterator {
    method pull-one {
        return $++ if $++ < 4;
        IterationEnd
    }
}.new;

.say for $seq;

# OUTPUT:
# 0
# 1
# 2
# 3

Above, we create a Seq using its .new method, which expects an instantiated Iterator. For that we use an anonymous class that does the Iterator role and provides a single .pull-one method, which uses a pair of anonymous state variables to generate 4 numbers, one per call, and then returns the IterationEnd constant to signal that the Iterator does not have any more values to produce.
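If the pair of anonymous state variables looks cryptic, here's the idiom in isolation (a quick demo, not part of the Seq machinery); each bare $ is a separate state variable that persists between calls:

sub demo { say "value: {$++}   count: {$++}" }
demo() for ^3;

# OUTPUT:
# value: 0   count: 0
# value: 1   count: 1
# value: 2   count: 2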

The Iterator protocol forbids attempting to fetch more values from an Iterator once it has generated the IterationEnd value, so your Iterator's methods may assume they'll never get called again past that point.

Meet the rest of the gang

The Iterator role defines several more methods, all of which are optional to implement, and most of which have some sort of default implementation. The extra methods are there for optimization purposes that let you take shortcuts depending on how the sequence is iterated over.

Let's build a Seq that hashes a bunch of data using the Crypt::Bcrypt module (run zef install Crypt::Bcrypt to install it). We'll start with the most basic Iterator that provides just a .pull-one method, and then we'll optimize it to perform better in different circumstances.

use Crypt::Bcrypt;

sub hash-it (*@stuff) {
    Seq.new: class :: does Iterator {
        has @.stuff;
        method pull-one {
            @!stuff ?? bcrypt-hash @!stuff.shift, :15rounds
                    !! IterationEnd
        }
    }.new: :@stuff
}

my $hashes := hash-it <foo bar ber>;
for $hashes {
    say "Fetched value #{++$} {now - INIT now}";
    say "\t$_";
}

# OUTPUT:
# Fetched value #1 2.26035863
#     $2b$15$ZspycxXAHoiDpK99YuMWqeXUJX4XZ3cNNzTMwhfF8kEudqli.lSIa
# Fetched value #2 4.49311657
#     $2b$15$GiqWNgaaVbHABT6yBh7aAec0r5Vwl4AUPYmDqPlac.pK4RPOUNv1K
# Fetched value #3 6.71103435
#     $2b$15$zq0mf6Qv3Xv8oIDp686eYeTixCw1aF9/EqpV/bH2SohbbImXRSati

In the above program, we wrapped all the Seq-making machinery inside a sub called hash-it. We slurp all the positional arguments given to that sub and instantiate a new Seq with an anonymous class as the Iterator. We use the attribute @!stuff to store the stuff we need to hash. In the .pull-one method we check if we still have @!stuff to hash; if we do, we shift a value off @!stuff and hash it, using 15 rounds to make the hashing algo take some time. Lastly, we added a say statement to measure how long the program has been running at each iteration, using two now calls, one of which is run with the INIT phaser. From the output, we see it takes about 2.2 seconds to hash a single string.
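As an aside, the now - INIT now idiom works because phasers can be used as expressions: INIT now evaluates to the Instant captured at program start, so subtracting it from a fresh now gives the elapsed run time. For example:

sleep 0.5;
say now - INIT now; # OUTPUT: ≈0.5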

Skipping breakfast

Using a for loop is not the only way to use the Seq returned by our hashing routine. What if some user doesn't care about the first few hashes? For example, they could write a piece of code like this:

my $hash = hash-it(<foo bar ber>).skip(2).head;
say "Made hash {now - INIT now}";
say bcrypt-match 'ber', $hash;

# OUTPUT:
# Made hash 6.6813790
# True

We've used the Crypt::Bcrypt module's bcrypt-match routine to ensure the hash we got matches our third input string, and it does. But look at the timing in the output: it took 6.7s to produce that single hash!

In fact, things look worse the more items the user tries to skip. If the user calls our hash-it with a ton of items and then tries to .skip the first 1,000,000 elements to get at the 1,000,001st hash, they'll be waiting about 25 days (1,000,000 × ~2.2 s) for that single hash to be produced!

The reason is our basic Iterator only knows how to .pull-one, so the skip operation still generates the hashes, just to discard them. Since the values our Iterator generates do not depend on previous values, we can implement one of the optimizing methods to skip iterations cheaply:

use Crypt::Bcrypt;

sub hash-it (*@stuff) {
    Seq.new: class :: does Iterator {
        has @.stuff;
        method pull-one {
            @!stuff ?? bcrypt-hash @!stuff.shift, :15rounds
                    !! IterationEnd
        }
        method skip-one {
            return False unless @!stuff;
            @!stuff.shift;
            True
        }
    }.new: :@stuff
}

my $hash = hash-it(<foo bar ber>).skip(2).head;
say "Made hash {now - INIT now}";
say bcrypt-match 'ber', $hash;

# OUTPUT:
# Made hash 2.2548012
# True

We added a .skip-one method to our Iterator that, instead of hashing a value, simply discards it. It needs to return a truthy value if it was able to skip a value (i.e. we had a value we'd otherwise generate in .pull-one, but we skipped it), or a falsy value if there weren't any values to skip.

Now, the .skip method called on our Seq uses our new .skip-one method to cheaply skip through 2 items and then uses .pull-one to generate the third hash. Look at the timing now: 2.2s; the time it takes to generate a single hash.

However, we can kick it up a notch. While we won't notice a difference with our 3-item Seq, the user who attempts to skip 1,000,000 items won't see just the 2.2s it takes to generate the 1,000,001st hash: they'd also have to wait through 1,000,000 calls to .skip-one and @!stuff.shift. To optimize skipping over a bunch of items, we can implement the .skip-at-least method (for brevity, just our Iterator class is shown):

class :: does Iterator {
    has @.stuff;
    method pull-one {
        @!stuff
            ?? bcrypt-hash( @!stuff.shift, :15rounds )
            !! IterationEnd
    }
    method skip-one {
        return False unless @!stuff;
        @!stuff.shift;
        True
    }
    method skip-at-least (Int \n) {
        n == @!stuff.splice: 0, n
    }
}

The .skip-at-least method takes an Int number of items to skip. It should skip as many as it can, and return a truthy value if it was able to skip that many items, or a falsy value if the number of skipped items was fewer. Now, the user who skips 1,000,000 items will only have to suffer through a single .splice call.

For the sake of completeness, there's another skipping method defined by Iterator: .skip-at-least-pull-one. It follows the same semantics as .skip-at-least, except with .pull-one semantics for return values. Its default implementation involves just calling those two methods, short-circuiting and returning IterationEnd if the .skip-at-least returned a falsy value, and that default implementation is very likely good enough for all Iterators. The method exists as a convenience for users who call methods on Iterators directly, and (at the moment) it's not used in core Rakudo Perl 6 by any methods that can be called on users' Seqs.
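Roughly, that default can be pictured like this (a sketch based on the semantics just described, not necessarily the core source):

method skip-at-least-pull-one (Int \n) {
    self.skip-at-least(n) ?? self.pull-one !! IterationEnd
}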

A so, so count...

There are two more optimization methods—.bool-only and .count-only—that do not have a default implementation. The first one returns True or False, depending on whether there are still items that can be generated by the Iterator (True if yes). The second one returns the number of items the Iterator can still produce. Importantly, these methods must be able to do that without exhausting the Iterator. In other words, after finding these methods implemented, the user of our Iterator can call them and afterwards should still be able to .pull-one all of the items, as if the methods were never called.

Let's make an Iterator that will take an Iterable and .rotate it once per iteration of our Iterator until its tail becomes its head. Basically, we want this:

.say for rotator 1, 2, 3, 4;

# OUTPUT:
# [2 3 4 1]
# [3 4 1 2]
# [4 1 2 3]

This Iterator will serve our purpose of studying the two methods. For a less "made-up" example, try to find the implementations of the iterators for the combinations and permutations routines in the Perl 6 compiler's source code.

Here's a sub that creates our Seq with our shiny Iterator along with some code that operates on it and some timings for different stages of the program:

sub rotator (*@stuff) {
    Seq.new: class :: does Iterator {
        has int $!n;
        has int $!steps = 1;
        has     @.stuff is required;

        submethod TWEAK { $!n = @!stuff − 1 }

        method pull-one {
            if $!n-- > 0 {
                LEAVE $!steps = 1;
                [@!stuff .= rotate: $!steps]
            }
            else {
                IterationEnd
            }
        }
        method skip-one {
            $!n > 0 or return False;
            $!n--; $!steps++;
            True
        }
        method skip-at-least (Int \n) {
            if $!n > all 0, n {
                $!steps += n;
                $!n     −= n;
                True
            }
            else {
                $!n = 0;
                False
            }
        }
    }.new: stuff => [@stuff]
}

my $rotations := rotator ^5000;

if $rotations {
    say "Time after getting Bool: {now - INIT now}";

    say "We got $rotations.elems() rotations!";
    say "Time after getting count: {now - INIT now}";

    say "Fetching last one...";
    say "Last one's first 5 elements are: $rotations.tail.head(5)";
    say "Time after getting last elem: {now - INIT now}";
}

# OUTPUT:
# Time after getting Bool: 0.0230339
# We got 4999 rotations!
# Time after getting count: 26.04481484
# Fetching last one...
# Last one's first 5 elements are: 4999 0 1 2 3
# Time after getting last elem: 26.0466234

First things first, let's take a look at what we're doing in our Iterator. We take an Iterable (in the rotator ^5000 call further down, we use a Range object out of which we can milk 5000 elements in this case), shallow-clone it (using the [ ... ] operator), and keep that clone in the @!stuff attribute of our Iterator. During object instantiation, we also save how many items @!stuff has in it into the $!n attribute, inside the TWEAK submethod.

For each .pull-one of the Iterator, we .rotate our @!stuff attribute, storing the rotated result back in it, as well as making a shallow clone of it, which is what we return for the iteration.

We also already implemented the .skip-one and .skip-at-least optimization methods, where we use a private $!steps attribute to alter how many steps the next .pull-one will .rotate our @!stuff by. Whenever .pull-one is called, we simply reset $!steps to its default value of 1 using the LEAVE phaser.

Let's check out how this thing performs! We store our precious Seq in $rotations variable that we first check for truthiness, to see if it has any elements in it at all; then we tell the world how many rotations we can fish out of that Seq; lastly, we fetch the last element of the Seq and (for screen space reasons) print the first 5 elements of the last rotation.

All three steps—checking .Bool, checking .elems, and fetching the last item with .tail—are timed, and the results aren't that pretty. While .Bool completed relatively quickly, the .elems call took ages (26s)! That's actually not all of the damage. Recall from PART I of this series that both .Bool and .elems cache the Seq unless special methods are implemented in the Iterator. This means each of those rotations we made is still there in memory, using up space for nothing! What are we to do? Let's try implementing those special methods .Bool and .elems are looking for!

The only thing we need to change is to add two extra methods to our Iterator that determine how many elements we can generate (.count-only) and whether we have any elements to generate (.bool-only):

method count-only { $!n     }
method bool-only  { $!n > 0 }

For the sake of completeness, here is our previous example, with these two methods added to our Iterator:

sub rotator (*@stuff) {
    Seq.new: class :: does Iterator {
        has int $!n;
        has int $!steps = 1;
        has     @.stuff is required;

        submethod TWEAK { $!n = @!stuff − 1 }

        method count-only { $!n     }
        method bool-only  { $!n > 0 }

        method pull-one {
            if $!n-- > 0 {
                LEAVE $!steps = 1;
                [@!stuff .= rotate: $!steps]
            }
            else {
                IterationEnd
            }
        }
        method skip-one {
            $!n > 0 or return False;
            $!n--; $!steps++;
            True
        }
        method skip-at-least (Int \n) {
            if $!n > all 0, n {
                $!steps += n;
                $!n     −= n;
                True
            }
            else {
                $!n = 0;
                False
            }
        }
    }.new: stuff => [@stuff]
}

my $rotations := rotator ^5000;

if $rotations {
    say "Time after getting Bool: {now - INIT now}";

    say "We got $rotations.elems() rotations!";
    say "Time after getting count: {now - INIT now}";

    say "Fetching last one...";
    say "Last one's first 5 elements are: $rotations.tail.head(5)";
    say "Time after getting last elem: {now - INIT now}";
}

# OUTPUT:
# Time after getting Bool: 0.0087576
# We got 4999 rotations!
# Time after getting count: 0.00993624
# Fetching last one...
# Last one's first 5 elements are: 4999 0 1 2 3
# Time after getting last elem: 0.0149863

The code is nearly identical, but look at those sweet, sweet timings! Our entire program runs about 1,733 times faster, because our Seq can figure out whether and how many elements it has without having to iterate or rotate anything. The .tail call sees our optimization (side note: that's actually very recent) and it too doesn't have to iterate over anything; it can just use our .skip-at-least optimization to skip to the end. And last but not least, our Seq is no longer being cached, so the only things kept around in memory are the things we care about. It's a huge win-win-win for very little extra code.

But wait... there's more!

Push it real good...

The Seqs we looked at so far did heavy work: each value took a relatively long time to generate. However, Seqs are quite versatile, and at times you'll find that generating a value is cheaper than the overhead of calling .pull-one and storing that value somewhere. For cases like that, there are a few more methods we can implement to make our Seq perform better.

For the next example, we'll stick with the basics. Our Iterator will generate a sequence of positive even numbers up to the wanted limit. Here's what the call to the sub that makes our Seq looks like:

say evens-up-to 20; # OUTPUT: (2 4 6 8 10 12 14 16 18)

And here's all of the code for it. The particular operation we'll be performing is storing all the values in an Array, by assigning to it:

sub evens-up-to {
    Seq.new: class :: does Iterator {
        has int $!n = 0;
        has int $.limit is required;
        method pull-one { ($!n += 2) < $!limit ?? $!n !! IterationEnd }
    }.new: :$^limit
}

my @a = evens-up-to 1_700_000;

say now - INIT now; # OUTPUT: 1.00765440

For a limit of 1.7 million, the code takes around a second to run. However, all we do in our Iterator is add some numbers together, so a lot of the time is likely lost in .pull-oneing the values and adding them to the Array, one by one.

In cases like this, implementing a custom .push-all method in our Iterator can help. The method receives one argument that is a reification target. We're pretty close to bare "metal" now, so we can't do anything fancy with the reification target object other than call the .push method on it with a single value to add to the target. The .push-all method always returns IterationEnd, since it exhausts the Iterator, so we'll just pop that value right into the return value of the method's Signature:

sub evens-up-to {
    Seq.new: class :: does Iterator {
        has int $!n = 0;
        has int $.limit is required;
        method pull-one {
            ($!n += 2) < $!limit ?? $!n !! IterationEnd
        }
        method push-all (\target --> IterationEnd) {
            target.push: $!n while ($!n += 2) < $!limit;
        }
    }.new: :$^limit
}

my @a = evens-up-to 1_700_000;
say now - INIT now; # OUTPUT: 0.91364949

Our program is now 10% faster; not a lot. However, since we're doing all the work in .push-all now, we no longer need to maintain state between calls, so we can shave off a bit of time by using lexical variables instead of accessing the object's attributes all the time. We'll make them native int types for even more speed. Also, (at least currently) the += metaoperator is more expensive than a simple assignment and a regular +; since we're trying to squeeze out every last bit of juice here, let's take advantage of that as well. So what we have now is this:

sub evens-up-to {
    Seq.new: class :: does Iterator {
        has int $!n = 0;
        has int $.limit is required;
        method pull-one {
            ($!n += 2) < $!limit ?? $!n !! IterationEnd
        }
        method push-all (\target --> IterationEnd) {
            my int $limit = $!limit;
            my int $n     = $!n;
            target.push: $n while ($n = $n + 2) < $limit;
            $!n = $n;
        }
    }.new: :$^limit
}

my @a = evens-up-to 1_700_000;
say now - INIT now; # OUTPUT: 0.6688109

There we go. Now our program is 1.5 times faster than the original, thanks to .push-all. The gain isn't as dramatic as what we saw with other methods, but it can come in quite handy when you need it.

There are a few more .push-* methods you can implement to, for example, do something special when your Seq is used in code like...

for $your-seq -> $a, $b, $c { ... }

...where the Iterator would be asked to .push-exactly three items. The idea behind them is similar to .push-all: you push stuff onto the reification target. Their utility and performance gains are even smaller and useful only in particular situations, so I won't be covering them in detail.
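For a flavour of the protocol, here's a rough sketch of what .push-exactly-style semantics look like (an assumption based on the description above; the actual core signature and return convention may differ):

method push-exactly (\target, int $count) {
    my int $pushed = 0;
    while $pushed < $count {
        my \value := self.pull-one;
        # exhausted before we could push $count items
        return IterationEnd if value =:= IterationEnd;
        target.push: value;
        $pushed = $pushed + 1;
    }
    $count
}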

It's worth noting that .push-all can be used only with Iterators that are not lazy, since... well... it expects you to push all the items. And what exactly are lazy Iterators? I'm so glad you asked!

A quick brown fox jumped over the lazy Seq

Let's pare our previous Seq that generates even numbers down to the basics. Let's make it generate an infinite list of even numbers, using an anonymous state variable:

sub evens {
    Seq.new: class :: does Iterator {
        method pull-one { $ += 2 }
    }.new
}

put evens

Since the list is infinite, it'd take us an infinite time to fetch all the values. So what exactly happens when we run the code above? It... quite predictably hangs when the put routine is called; it sits and patiently waits for our infinite Seq to complete. The same issue occurs when trying to assign our Seq to a @-sigiled variable:

my @evens = evens # hangs

Or even when trying to pass our Seq to a sub with a slurpy parameter:

sub meows (*@evens) { say 'Got some evens!' }
meows evens # hangs

That's quite an annoying problem. Fortunately, there's a very easy solution for it. But first, a minor detour to the land of naming clarification!

A rose by any other name would laze as sweet

In Perl 6, some things are, or can be made, "lazy". While the name evokes the concept of on-demand ("lazy") evaluation, which is ubiquitous in Perl 6, things that are lazy in Perl 6 aren't just about that. If something is-lazy, it means it always wants to be evaluated lazily, fetching only as many items as needed, even in "mostly lazy" Perl 6 constructs that would otherwise eagerly consume even from sources that do on-demand generation.

For example, a sequence of lines read from a file would want to be lazy, as reading them all in at once has the potential to use up all the RAM. An infinite sequence would also want to be is-lazy because an eager evaluation would cause it to hang, as the sequence never completes.

So a thing that is-lazy in Perl 6 can be thought of as being infinite. Sometimes it actually will be infinite, but even if it isn't, it being lazy means it has similar consequences if used eagerly (too much CPU time used, too much RAM, etc).
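A couple of quick probes illustrate the distinction (outputs assume current Rakudo behaviour; the lazy statement prefix shown below forces lazy treatment):

say (1 .. 10).is-lazy; # OUTPUT: False
say (1 .. ∞).is-lazy;  # OUTPUT: True

# doesn't hang; the lazy array reifies on demand:
my @evens = lazy (1 .. ∞).grep: * %% 2;
say @evens[^5]; # OUTPUT: (2 4 6 8 10)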


Now back to our infinite list of even numbers. It sounds like all we have to do is make our Seq lazy, and we do that by implementing an .is-lazy method on our Iterator that simply returns True:

sub evens {
    Seq.new: class :: does Iterator {
        method pull-one { $ += 2 }
        method is-lazy (--> True) {}
    }.new
}

sub meows (*@evens) { say 'Got some evens!' }

put         evens; # OUTPUT: ...
my @evens = evens; # doesn't hang
meows       evens; # OUTPUT: Got some evens!

The put routine now detects it's dealing with something terribly long and just outputs some dots. Assignment to an Array no longer hangs (and will instead reify on demand). And the call to a sub with a slurpy doesn't hang either and will also reify on demand.

There's one more Iterator optimization method left that we should discuss...

A Sinking Ship

Perl 6 has sink context, similar to "void" context in other languages, which means a value is being discarded:

42;

# OUTPUT:
# WARNINGS for ...:
# Useless use of constant integer 42 in sink context (line 1)

The constant 42 in the above program is in sink context—its value isn't used by anything—and since it's nearly pointless to have it like that, the compiler warns about it.

Not all sinkage is bad, however, and sometimes you may find that gorgeous Seq on which you worked so hard is ruthlessly being sunk by the user! Let's take a look at what happens when we sink one of our previous examples, the Seq that generates even numbers up to a limit:

sub evens-up-to {
    Seq.new: class :: does Iterator {
        has int $!n = 0;
        has int $.limit is required;
        method pull-one {
            ($!n += 2) < $!limit ?? $!n !! IterationEnd
        }
    }.new: :$^limit
}

evens-up-to 5_000_000; # sink our Seq

say now - INIT now; # OUTPUT: 5.87409072

Ouch! Iterating our Seq has no side-effects outside of the Iterator that it uses, which means it took the program almost six seconds to do absolutely nothing.

We can remedy the situation by implementing our own .sink-all method. Its default implementation .pull-ones until the end of the Seq (since iteration may have useful side effects), which is not what we want for our Seq. Roughly, the default behaves like this sketch (my reading of the described semantics, not necessarily the core source):
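method sink-all (--> IterationEnd) {
    Nil until self.pull-one =:= IterationEnd
}

So let's implement a .sink-all that does nothing!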

sub evens-up-to {
    Seq.new: class :: does Iterator {
        has int $!n = 0;
        has int $.limit is required;
        method pull-one {
            ($!n += 2) < $!limit ?? $!n !! IterationEnd
        }
        method sink-all(--> IterationEnd) {}
    }.new: :$^limit
}

evens-up-to 5_000_000; # sink our Seq

say now - INIT now; # OUTPUT: 0.0038638

We added a single line of code and made our program 1,520 times faster—the perfect speed up for a program that does nothing!

However, doing nothing is not the only thing .sink-all is good for. Use it for cleanup that would usually be done at the end of iteration (e.g. closing a file handle the Iterator was using). Or simply set the state of the system to what it would be at the end of the iteration (e.g. .seek a file handle to the end, for a sunk Seq that produces lines from it). Or, as an alternative idea, how about warning the user their code might contain an error:

sub evens-up-to {
    Seq.new: class :: does Iterator {
        has int $!n = 0;
        has int $.limit is required;
        method pull-one {
            ($!n += 2) < $!limit ?? $!n !! IterationEnd
        }
        method sink-all(--> IterationEnd) {
            warn "Oh noes! Looks like you sunk all the evens!\n"
                ~ 'Why did you make them in the first place?'
        }
    }.new: :$^limit
}

evens-up-to 5_000_000; # sink our Seq

# OUTPUT:
# Oh noes! Looks like you sunk all the evens!
# Why did you make them in the first place?
# ...

That concludes our discussion on optimizing your Iterators. Now, let's talk about using Iterators others have made.

It's a marathon, not a sprint

With all the juicy knowledge about Iterators and Seqs we now possess, we can probably see how this piece of code manages to work without hanging, despite being given an infinite Range of numbers:

.say for ^∞ .grep(*.is-prime).map(* ~ ' is a prime number').head: 5;

# OUTPUT:
# 2 is a prime number
# 3 is a prime number
# 5 is a prime number
# 7 is a prime number
# 11 is a prime number

The infinite Range probably is-lazy. That .grep probably .pull-ones until it finds a prime number. The .map .pull-ones each of the .grep's values and modifies them, and .head allows at most 5 values to be .pull-oned from it.

In short, what we have here is a pipeline of Seqs and Iterators, where the Iterator of the next Seq is based on the Iterator of the previous one. For our study purposes, let's cook up a Seq of our own that combines all of the steps above:

sub first-five-primes (*@numbers) {
    Seq.new: class :: does Iterator {
        has     $.iter;
        has int $!produced = 0;
        method pull-one {
            $!produced++ == 5 and return IterationEnd;
            loop {
                my $value := $!iter.pull-one;
                return IterationEnd if $value =:= IterationEnd;
                return "$value is a prime number" if $value.is-prime;
            }
        }
    }.new: iter => @numbers.iterator
}

.say for first-five-primes ^∞;

# OUTPUT:
# 2 is a prime number
# 3 is a prime number
# 5 is a prime number
# 7 is a prime number
# 11 is a prime number

Our sub slurps up its positional arguments and then calls the .iterator method on the @numbers Iterable. This method is available on all Perl 6 objects and lets us interface with the object using Iterator methods directly.

We save the @numbers's Iterator in one of the attributes of our Iterator as well as create another attribute to keep track of how many items we produced. In the .pull-one method, we first check whether we already produced the 5 items we need to produce, and if not, we drop into a loop that calls .pull-one on the other Iterator, the one we got from @numbers Array.

We recently learned that if the Iterator does not have any more values for us, it will return the IterationEnd constant. A constant whose job is to signal the end of iteration is finicky to deal with, as you can imagine. To detect it, we need to ensure we use the binding (:=), not the assignment (=), operator when storing the value we get from .pull-one. Assignment would place the value into a fresh Scalar container, and the container identity (=:=) check would then no longer match the IterationEnd constant itself; in other words, we can't stuff the value we .pull-one into just any container we please.

In our example program, if we do find that we received IterationEnd from the source Iterator, we simply return it to indicate we're done. If not, we repeat the process until we find a prime number, which we then put into our desired string and that's what we return from our .pull-one.

All the rest of the Iterator methods we've learned about can be called on the source Iterator in a similar fashion to how we called .pull-one in our example.
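For example, one small delegation we could add (a sketch; whether it's desirable for a sequence capped at 5 items is debatable) is propagating laziness from the source Iterator:

method is-lazy { $!iter.is-lazy }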

Conclusion

Today, we've learned a whole ton of stuff! We now know that Seqs are powered by Iterator objects and we can make custom iterators that generate any variety of values we can dream about.

The most basic Iterator has only a .pull-one method that generates a single value and returns IterationEnd when it has no more values to produce. It's not permitted to call .pull-one again once it has generated IterationEnd, so we can write our .pull-one methods with the expectation that this will never happen.

There are plenty of optimization opportunities a custom Iterator can take advantage of. If it can cheaply skip through items, it can implement .skip-one or .skip-at-least methods. If it can know how many items it'll produce, it can implement .bool-only and .count-only methods that can avoid a ton of work and memory use when only certain values of a Seq are needed. And for squeezing the very last bit of performance, you can take advantage of .push-all and other .push-* methods that let you push values onto the target directly.

When your Iterator .is-lazy, things will treat it with extra care and won't try to fetch all of the items at once. And we can use the .sink-all method to avoid work or warn the user of potential mistakes in their code, when our Seq is being sunk.

Lastly, since we know how to make Iterators and what their methods do, we can make use of Iterators coming from other sources and call methods on them directly, manipulating them just how we want to.

We now have all the tools to work with Seq objects in Perl 6. In PART III of this series, we'll learn how to compactify all of that knowledge and skillfully build Seqs with just a line or two of code, using the sequence operator.

Stay tuned!

-Ofun

Perl 6: Seqs, Drugs, And Rock'n'Roll

Read this article on Perl6.Party

I vividly recall my first steps in Perl 6 were just a couple of months before the first stable release of the language in December 2015. Around that time, Larry Wall was making a presentation and showed a neat feature—the sequence operator—and it got me amazed about just how powerful the language is:

# First 12 even numbers:
say (2, 4 … ∞)[^12];      # OUTPUT: (2 4 6 8 10 12 14 16 18 20 22 24)

# First 10 powers of 2:
say (2, 2², 2³ … ∞)[^10]; # OUTPUT: (2 4 8 16 32 64 128 256 512 1024)

# First 13 Fibonacci numbers:
say (1, 1, *+* … ∞)[^13]; # OUTPUT: (1 1 2 3 5 8 13 21 34 55 89 144 233)

The ellipsis (…) is the sequence operator, and the stuff it makes is the Seq object. And now, a year and a half after Perl 6's first release, I hope to pass on my amazement to a new batch of future Perl 6 programmers.

This is a 3-part series. In PART I of this article, we'll talk about what Seqs are and how to make them without the sequence operator. In PART II, we'll look at the thing behind the curtain of Seqs: the Iterator type, and how to make Seqs from our own Iterators. Lastly, in PART III, we'll examine the sequence operator in all of its glory.

Note: I will be using all sorts of fancy Unicode operators and symbols in this article. If you don't like them, consult the Texas Equivalents page for the equivalent ASCII-only way to type those elements.

PART I: What the Seq is all this about?

Seq stands for Sequence, and the Seq object provides a one-shot way to iterate over a sequence of stuff. New values can be generated on demand—in fact, it's perfectly possible to create infinite sequences—and already-generated values are discarded, never to be seen again; although there's a way to cache them, as we'll see.

Sequences are driven by Iterator objects that are responsible for generating values. However, in many cases you don't have to create Iterators directly or use their methods while iterating a Seq. There are several ways to make a Seq, and in this section we'll talk about the gather/take construct.

I gather you'll take us to...

The gather statement and take routine are similar to "generators" and the "yield" statement in some other languages:

my $seq-full-of-sunshine := gather {
    say  'And nobody cries';
    say  'there’s only butterflies';

    take 'me away';
    say  'A secret place';
    say  'A sweet escape';

    take 'meee awaaay';
    say  'To better days'    ;

    take 'MEEE AWAAAAYYYY';
    say  'A hiding place';
}

Above, we have a code block with lines of song lyrics, some of which we say (print to the screen) and others we take (to be gathered). Just like .say, .take can be used as either a method or a subroutine; there's no real difference, merely convenience.
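For example, these two produce identical one-item Seqs:

my $a := gather take 42; # subroutine form
my $b := gather 42.take; # method form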

Now, let's iterate over $seq-full-of-sunshine and watch the output:

for $seq-full-of-sunshine {
    ENTER say '▬▬▶ Entering';
    LEAVE say '◀▬▬ Leaving';

    say "❚❚ $_";
}

# OUTPUT:
# And nobody cries
# there’s only butterflies
# ▬▬▶ Entering
# ❚❚ me away
# ◀▬▬ Leaving
# A secret place
# A sweet escape
# ▬▬▶ Entering
# ❚❚ meee awaaay
# ◀▬▬ Leaving
# To better days
# ▬▬▶ Entering
# ❚❚ MEEE AWAAAAYYYY
# ◀▬▬ Leaving
# A hiding place

Notice how the say statements we had inside the gather block didn't actually get executed until we needed to iterate over a value taken after those particular say lines. The block got stopped and then continued only when more values from the Seq were requested. The last say call didn't have any more takes after it, so it got executed when the iterator was asked for more values after the last take.

That's exceptional!

The take routine works by throwing a CX::Take control exception that will percolate up the call stack until something takes care of it. This means you can feed a gather not just from an immediate block, but from a bunch of different sources, such as routine calls:

multi what's-that (42)                     { take 'The Answer'            }
multi what's-that (Int $ where *.is-prime) { take 'Tis a prime!'          }
multi what's-that (Numeric)                { take 'Some kind of a number' }

multi what's-that   { how-good-is $^it                   }
sub how-good-is ($) { take rand > ½ ?? 'Tis OK' !! 'Eww' }

my $seq := gather map &what's-that, 1, 31337, 42, 'meows';

.say for $seq;

# OUTPUT:
# Some kind of a number
# Tis a prime!
# The Answer
# Eww

Once again, we iterated over our new Seq with a for loop, and you can see that takes called from different multies and even nested sub calls still delivered the values to our gather successfully.

The only limitation is you can't gather takes done in another Promise or in code manually cued in the scheduler:

gather await start take 42;
# OUTPUT:
# Tried to get the result of a broken Promise
#   in block <unit> at test.p6 line 2
#
# Original exception:
#     take without gather

gather $*SCHEDULER.cue: { take 42 }
await Promise.in: 2;
# OUTPUT: Unhandled exception: take without gather

However, nothing's stopping you from using a Channel to proxy your data to be taken in a react block.

my Channel $chan .= new;
my $promise = start gather react whenever $chan { .take }

say "Sending stuff to Channel to gather...";
await start {
    $chan.send: $_ for <a b c>;
    $chan.close;
}
dd await $promise;

# OUTPUT:
# Sending stuff to Channel to gather...
# ("a", "b", "c").Seq

Or gathering takes from within a Supply:

my $supply = supply {
    take 42;
    emit 'Took 42!';
}

my $x := gather react whenever $supply { .say }
say $x;

# OUTPUT: Took 42!
# (42)

Stash into the cache

I mentioned earlier that Seqs are one-shot Iterables that can be iterated only once. So what exactly happens when we try to iterate one a second time?

my $seq := gather take 42;
.say for $seq;
.say for $seq;

# OUTPUT:
# 42
# This Seq has already been iterated, and its values consumed
# (you might solve this by adding .cache on usages of the Seq, or
# by assigning the Seq into an array)

An X::Seq::Consumed exception gets thrown. In fact, Seqs do not even do the Positional role, which is why we didn't use the @ sigil, which type-checks for Positional, on the variables we stored Seqs in.

The Seq is deemed consumed whenever something asks it for its Iterator after another thing grabbed it, like the for loop would. For example, even if in the first for loop above we would've iterated over just 1 item, we wouldn't be able to resume taking more items in the next for loop, as it'd try to ask for the Seq's iterator that was already taken by the first for loop.
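Here's a quick demonstration of that point:

my $seq := gather { take 42; take 70 };
for $seq { last }  # the loop grabs the Seq's iterator; we stop after 1 item
.say for $seq;     # throws X::Seq::Consumed: the iterator was already taken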

As you can imagine, having Seqs always be one-shot would be somewhat of a pain in the butt. A lot of times you can afford to keep the entire sequence around, which is the price for being able to access its values more than once, and that's precisely what the Seq.cache method does:

my $seq := gather { take 42; take 70 };
$seq.cache;

.say for $seq;
.say for $seq;

# OUTPUT:
# 42
# 70
# 42
# 70

As long as you call .cache before you fetch the first item of the Seq, you're good to go iterating over it until the heat death of the Universe (or until its cache noms all of your RAM). However, often you do not even need to call .cache yourself.

Many methods, such as .elems and .Bool, will automatically .cache the Seq for you.

There's one more nicety with Seqs losing their one-shotness that you may see referred to as PositionalBindFailover. It's a role that indicates to the parameter binder that the type can still be converted into a Positional, even when it doesn't do the Positional role. In plain English, it means you can do this:

sub foo (@pos) { say @pos[1, 3, 5] }

my $seq := 2, 4 … ∞;
foo $seq; # OUTPUT: (4 8 12)

We have a sub that expects a Positional argument and we give it a Seq, which isn't Positional, yet it all works out because the binder .caches our Seq and uses the List returned by the .cache method as the Positional, thanks to the Seq doing the PositionalBindFailover role.

Last, but not least, if you don't care about all of your Seq's values being generated and cached right there and then, you can simply assign it to a @ sigiled variable, which will reify the Seq and store it as an Array:

my @stuff = gather {
    take 42;
    say "meow";
    take 70;
}

say "Starting to iterate:";
.say for @stuff;

# OUTPUT:
# meow
# Starting to iterate:
# 42
# 70

From the output, we can see say "meow" was executed on assignment to @stuff and not when we actually iterated over the value in the for loop.

Conclusion

In Perl 6, Seqs are one-shot Iterables that don't keep their values around, which makes them very useful for iterating over huge, or even infinite, sequences. However, it's perfectly possible to cache Seq values and re-use them, if that is needed. In fact, many of the Seq's methods will automatically cache the Seq for you.

There are several ways to create Seqs, one of which is to use the gather and take combo, where a gather block will stop its execution and continue it only when more values are needed.

In parts II and III, we'll look at other, more exciting, ways of creating Seqs. Stay tuned!

-Ofun

Perl 6 Release Quality Assurance: Full Ecosystem Toaster

Read this article on Perl6.Party

As some recall, Rakudo's 2017.04 release was somewhat of a trainwreck. It was clear the quality assurance of releases needed to be kicked up a notch. So today, I'll talk about what progress we've made in that area.

Define The Problem

A particular problem that plagued the 2017.04 release was big changes and refactors made in the compiler that passed all of the 150,000+ stresstests, yet still caused issues in some ecosystem modules and users' code.

The upcoming 2017.06 has many, many more big changes:

  • IO::ArgFiles were entirely replaced with the new IO::CatHandle implementation
  • IO::Socket got a refactor and sync sockets no longer use libuv
  • IO::Handle got a refactor with encoding and sync IO no longer uses libuv
  • Sets/Bags/Mixes got optimization polish and op semantics finalizations
  • Proc was refactored to be in terms of Proc::Async

The IO and Proc stuff is especially impactful, as it affects precomp and module loading as well. Merely passing the stresstests just wouldn't give me enough peace of mind for a solid release. It was time to extend the testing.

Going All In

The good news is I didn't actually have to write any new tests. With 836 modules in the Perl 6 ecosystem, the tests were already there for the taking. Best of all, they were mostly written without bias from implementation knowledge of core code, and they carry the personal style variations of hundreds of different coders. This is all perfect for catching regressions in core code. The only problem is running all of that.

While there's a budding effort to get CPANTesters to smoke Perl 6 dists, it's not quite the data I need. I need to smoke a whole ton of modules on a particular pre-release commit, while also smoking them on a previous release on the same box, eliminating setup issues that might contribute to failures, as well as ensuring the results are for the same versions of the modules.

My first crude attempt involved firing up a 32-core Google Compute Engine VM and writing a 60-line script that launched 836 Proc::Asyncs—one for each module.

Other than chewing through 125 GB of RAM with a single Perl 6 program, the experiment didn't yield any useful data. Each module had to wait for locks before being installed, and all the Procs were asking zef to install to the same location, so dependency handling was iffy. I needed a more refined solution...

Procs, Kernels, and Murder

So, I started to polish my code. First, I wrote the Proc::Q module, which let me queue up a bunch of Procs and scale the number of them running at the same time, based on the number of cores the box had. The Supply.throttle core feature made the job a piece of cake.
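To give a flavour of the approach, here's a minimal sketch (my own illustration with hypothetical module names, not Proc::Q's actual code) of using Supply.throttle to cap how many jobs run at once:

my @modules = <Foo Bar Baz>; # hypothetical module names
my $results = Supply.from-list(@modules).throttle: 8, -> $module {
    # at most 8 of these blocks run simultaneously
    await Proc::Async.new('zef', 'install', $module).start;
    $module
};
react whenever $results { say "Finished testing $_" }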

However, some modules are naughty or broken, and I needed a way to kill Procs that take too long to run. Alas, I discovered that Proc::Async.kill had a bug in it, where trying to simultaneously kill a bunch of Procs was failing. After some digging, I found out the cause was that the $*KERNEL.signal method .kill was using isn't actually thread-safe, and the bug was due to a data race in the initialization of the signal table.

After refactoring Kernel.signal, and fixing Proc::Async.kill, I released Proc::Q module—my first module to require (at the time) the bleedest of bleeding edges: a HEAD commit.

Going Atomic

After cooking up boilerplate DB and Proc::Q code, I was ready to toast the ecosystem. However, it appeared zef wasn't designed, or at least well-tested, for scenarios where up to 40 instances were running module installations simultaneously. I was getting JSON errors from reading ecosystem JSON, broken cache files (due to lack of file locking), and false positives in installations because modules claimed they were already installed.

I initially attempted to solve the JSON errors by looking at an Issue in the ecosystem repo about the updater script not writing atomically. However, even after fixing the updater script, I was still getting invalid JSON errors from zef when reading ecosystem data.

It might be due to something in zef, but instead of investigating further, I followed ugexe++'s advice and told zef not to fetch the ecosystem in each Proc. The broken cache issues were similarly eliminated by disabling caching support. And the false positives were eliminated by telling each zef instance to install the tested module into a separate location.

The final solution involved programmatically editing zef's config file before a toast run to disable auto-updates of CPAN and p6c ecosystem data; then, in the individual Procs, the zef module install command ended up being:

«zef --/cached --debug install "$module" "--install-to=inst#$where"»

Where $where is a per-module, per-Rakudo-commit location. The final issue was flaky test runs, which I resolved by re-testing failed modules one more time, to see if the new run succeeds.

Time is Everything

The toasting of the entire ecosystem on the HEAD and 2017.05 releases took about three hours on a 24-core VM when running unattended. Watching over the run and killing the few hanging modules at the end, without waiting for them to time out, brings a single-commit run down to about 65 minutes.

I also did a toast run on a 64-core VM...

Overall, the run took me 50 minutes, and I had to manually kill some modules' tests. However, looking at CPU utilization charts, it seems the run sat idle for dozens of minutes before I came along to kill stuff.

So I think after some polish of avoiding hanging modules and figuring out why (apparently) Proc::Async.kill still doesn't kill everything, the runs can be entirely automated and a single run can be completed in about 20-30 minutes.

This means that even with last-minute big changes pushed to Rakudo, I can still toast the entire ecosystem reasonably fast, detect any potential regressions, fix them, and re-test again.

Reeling In The Catch

The Toaster database is available for viewing at toast.perl6.party. As more commits get toasted, they get added to the database. I plan to clear them out after each release.

The toasting runs I did so far weren't just a chance to play with powerful hardware. The very first issue was detected when toasting Clifford module.

The issue had to do with Lists of Pairs with the same keys being coerced into a MixHash, when the final accumulated weight was zero. The issue was introduced on June 7th, and it took me about an hour of digging through the module's guts to find it. Considering it's quite an edge case, I imagine without the toaster runs it would have taken a lot longer to identify this bug. lizmat++ squashed it hours after identification and it never made it into any releases.

The other issue detected by toasting had to do with the VM-backed decoder serialization introduced during the IO refactor; jnthn++ fixed it a day after detection. One more bug had to do with the Proc refactor making Proc not synchronous enough. It was mercilessly squashed, while fixing a couple of longstanding issues with Proc.

All of these issues weren't detected by the 150,000+ tests in the testsuite, and while an argument can be made that the tests are sparse in places, there's no doubt the Toaster has paid for the effort of making it by catching bugs that might've otherwise made it into the release.

The Future

The future plans for the Toaster would be first to make it toast on more platforms, like Windows and MacOS. Eventually, I hope to make toast runs continuous, on less-powerful VMs that are entirely automated. An IRC bot would watch for any failures and report them to the dev channel.

Conclusion

The ecosystem Toaster lets core devs test a Rakudo commit on hundreds of software pieces, made by hundreds of different developers, all within a single hour. During its short existence, the Toaster already found issues with ecosystem infrastructure, highly-multi-threaded Perl 6 programs, as well as detected regressions and new bugs that we were able to fix before the release.

The extra testing lets core devs deliver higher-quality releases, which makes Perl 6 more trustworthy to use in production-quality software. The future will see the Toaster improved to test on a wider range of systems, as well as being automated for continued extended testing.

And most importantly, the Toaster makes it possible for any Perl 6 programmer to help core development of Perl 6, by simply publishing a module.

-Ofun

COMPLETION Report / Perl 6 IO TPF Grant

This document is the May 2017 progress report for the TPF Standardization, Test Coverage, and Documentation of Perl 6 I/O Routines grant. I believe I have reasonably satisfied the goals of the grant and consider it completed. This is the final report, and it may reference some of the work/commits previously mentioned in monthly reports.

Thank You!

I'd like to thank all the donors that support The Perl Foundation who made this grant possible. It was a wonderful learning experience for me, and it brings me joy to look back and see Perl 6 improved due to this grant.

Thank You!

Completeness Criteria

Here are the original completeness criteria (in bold) that are listed on the original grant proposal and my comments on their status:

  • rakudo repository will contain the IO Action Plan document and it will be fully implemented. The promised document exists. It's fully implemented except for three items that I listed on the IO Action Plan, but which are currently a bit beyond my skill level to implement. I hope to do them eventually, but outside the scope of this grant. They are:
    • IO::Handle's Closed status. My original proposal would cause some performance issues, so it was decided to improve MoarVM errors instead.
    • Optimize multiple stat calls. This involves creating a new nqp op, with code for it implemented in MoarVM and JVM backends.
    • Use typed exceptions instead of X::AdHoc. I made typed exceptions be thrown wherever I could. The rest require VM-level exceptions and are on the same level as the handle closed status issue (first item above).
  • All of the I/O routines will have tests in roast and documented on docs.perl6.org. If any of the currently implemented but unspecced routines are decided against being included in Perl 6 Language, their implementation will no longer be available in Rakudo. To the best of my knowledge, this is completed in full.
  • The test coverage tool will report all I/O routines as covered and the information will be visible on perl6.wtf (Perl 6's Wonderful Test Files) website. Note: due to current experimental status of the coverage tool, its report may still show some lines or conditionals untested despite them actually being tested; however, it will show the lines where routines' names are specified as covered. To the best of my knowledge, all IO routines currently have tests covering them. Due to its experimental status, the coverage tool shows some attributes as uncovered. I did manually verify all the attributes/routines whose names the tool shows as uncovered contain tests for them. One exception is IO::Notification type (and IO::Path.watch method). While it has full coverage for OSX operating system, it lacks it for other OSes. I tried writing some tests for it, but it looks like the behaviour of the nqp op handling these is broken on Linux and the class needs more work.

Extra Deliverables

I produced these extra deliverables while working on the grant:

  • The Definitive I/O Guide. Providing tutorial-like documentation for Perl 6's I/O, including documenting some of the bad practices I noticed in the ecosystem (and even a Perl 6 book!) and the correct way to perform those tasks. (N.B. as I write this report, the guide could still use a few extra sections to be considered "The Definitive"; I'll write them in upcoming weeks)
  • Performance improvements. I made 23 performance-enhancing commits, with many commits making things more than 200% faster, and with the highest improvement making a routine 6300% faster.
  • Trait::IO module. Provides does auto-close pseudo-trait to simplify closing of IO handles.
  • IO::Path::ChildSecure module. Due to large ecosystem usage, IO::Path.child was left as is until 6.d language, at which point it will be made secure (as outlined in the IO Plan). This module provides the secure version in the mean time.
  • IO::Dir module. Provides IO::Path.dir-like functionality, with ability to close open directory without needing to fully exhaust the returned Seq.
  • Die module. Implements Perl-5-like behaviour for &die routine.
  • The "Map of Perl 6 Routines" (or rather the "table") is available on map.perl6.party with its code in perl6/routine-map repo. In near future, I plan to use it to identify incorrect or incomplete entries in our documentation

In addition, I plan to complete these modules some time in the future; the ideas for them were birthed while working on the grant:

  • NL module. Targeted for use in one-liners, the module will provide a $*NL dynvar that behaves like Perl 5's $. variable (providing the current $*ARGFILES's file's line number). Its implementation became possible thanks to the newly-implemented IO::CatHandle type.
  • FastIO module. A re-imagination of core IO, the biggest part of which will be the removal of (user-exposed) use of IO::Spec::* types and the $*SPEC variable, which—it is believed—will provide improved performance over core IO. The module is a prototype for some of the proposals that were made during the IO grant, and if it offers significant improvements over core IO, its ideas will be used by core IO in future language versions.

Work Performed in May

For the work done in May, many of my commits involved going through the IO routine list and adding missing tests and documentation, along with fixing bugs (and reporting new ones I found).

The major work was the implementation of the IO::CatHandle class, which fixed all of the bugs and NYIs with $*ARGFILES. This work saw the addition of 372 lines of code, 800 lines of tests, and 793 lines of documentation.
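
If you've not met it before, IO::CatHandle seamlessly reads several sources as if they were a single handle. A minimal sketch (the filenames are made up):

my $kitty = IO::CatHandle.new: 'file-1.txt', 'file-2.txt';
.say for $kitty.lines;   # lines of file-1.txt, then lines of file-2.txt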

Work by Other Core Members

jnthn++ completed the handle encoding refactor that will eventually let us get rid of using libuv for synchronous IO and, more importantly, allow us to support user-defined encoders/decoders.

Along with fixing a bunch of bugs, this work altered the performance landscape for IO operations (i.e. some operations may now be a bit faster, others a bit slower), though overall the performance appeared to stay the same.

Tickets Fixed

Grant Commits

During this grant, I've made 417 commits: 134 Rakudo commits + 23 performance-enhancing Rakudo commits + 114 Perl 6 Specification commits + 146 documentation commits.

Performance Rakudo Commits

I've made 23 performance-enhancing commits to Rakudo's repository:

  • 4032953 Make IO::Handle.open 75% faster
  • dcf1bb2 Make IO::Spec::Unix.rel2abs 35% faster
  • c13480c IO::Path.slurp: make 12%-35% faster; propagate Failures
  • 0e36bb2 Make IO::Spec::Win32!canon-cat 2.3x faster
  • c6fd736 Make IO::Spec::Win32.is-absolute about 63x faster
  • 894ba82 Make IO::Spec::Win32.split about 82% faster
  • 277b6e5 Make IO::Spec::Unix.rel2abs 2.9x faster
  • 74680d4 Make IO::Path.is-absolute about 80% faster
  • ff23416 Make IO::Path.is-relative about 2.1x faster
  • d272667 Make IO::Spec::Unix.join about 40% faster
  • 50429b1 Make IO::Handle.put($x) about 5%-35% faster
  • 204ea59 Make &say(**@args) 70%− faster
  • 6d7fc8e Make &put(**@args) up to 70% faster
  • 76af536 Make 1-arg IO::Handle.say up to 2x faster
  • aa72bde Remove dir's :absolute and :Str; make up to 23% faster
  • 48cf0e6 Make IO::Spec::Cygwin.is-absolute 21x faster
  • c96727a Fix combiners on SPEC::Win32.rel2abs; make 6% faster
  • 0547979 Make IO::Spec::Unix.path consistent and 4.6x faster
  • 8992af1 Fix IO::Spec::Win32.path and make 26x faster
  • 7d6fa73 Make IO::Spec::Win32.catpath 47x faster
  • 494659a Make IO::Spec::Win32.join 26x faster
  • 6ca702f Make IO::Spec::Unix.splitdir 7.7x faster
  • 2816ef7 Make IO::Spec::Win32.splitdir 25x faster

Non-Performance Rakudo Commits

Other than the perf commits, I've also made 134 commits to the Rakudo repository:

  • dd4dfb1 Fix crash in IO::Special .WHICH/.Str
  • 76f7187 Do not cache IO::Path.e results
  • 212cc8a Remove IO::Path.Bridge
  • a01d679 Remove IO::Path.pipe
  • 55abc6d Improve IO::Path.child perf on *nix
  • 4fdebc9 Make IO::Spec::Unix.split 36x Faster
  • 0111f10 Make IO::Spec::Unix.catdir 3.9x Faster
  • fa9aa47 Make R::I::SET_LINE_ENDING_ON_HANDLE 4.1x Faster
  • c360ac2 Fix smartmatch of Cool ~~ IO::Path
  • 0c7e4a0 Do not capture args in .IO method
  • 9d8d7b2 Log all changes to plan made during review period
  • 87987c2 Remove role IO and its .umask method
  • 36ad92a Remove 15 methods from IO::Handle
  • a5800a1 Implement IO::Handle.spurt
  • aa62cd5 Remove &tmpdir and &homedir
  • a0ef2ed Improve &chdir, &indir, and IO::Path.chdir
  • ca1acb7 Fix race in &indir(IO::Path …)
  • 2483d68 Fix regression in &chdir's failure mode
  • 5464b82 Improve &*chdir
  • 4c31903 Add S32-io/chdir-process.t to list of test files to run
  • cb27bce Clean up &open and IO::Path.open
  • 099512b Clean up and improve all spurt routines
  • b62d1a7 Give $*TMPDIR a container
  • b1e7a01 Implement IO::Path.extension 2.0
  • 15a25da Fix ambiguity in empty extension vs no extension
  • 50aea2b Restore IO::Handle.IO
  • 966a7e3 Implement IO::Path.concat-with
  • 94a6909 Clean up IO::Spec::Unix.abs2rel a bit
  • a432b3d Remove IO::Path.abspath (part 2)
  • 954e69e Fix return value of IO::Special methods
  • 67f06b2 Run S32-io/io-special.t test file
  • a0b82ed Make IO::Path::* actually instantiate a subclass
  • 0c8bef5 Implement :parent in IO::Spec::Cygwin.canonpath
  • 0a442ce Remove type constraint in IO::Spec::Cygwin.canonpath
  • b4358af Delete code for IO::Spec::Win32.catfile
  • e681498 Make IO::Path throw when path contains NUL byte
  • 6a8d63d Implement :completely param in IO::Path.resolve
  • b6838ee Remove .f check in .z
  • 184d499 Make IO::Handle.Supply respect handle's mode
  • f1b4af7 Implement IO::Handle.slurp
  • 90da80f Rework read methods in IO::Path/IO::Handle
  • 8c09c84 Fix symlink and link routines
  • da1dea2 Fix &symlink and &link
  • 7f73f92 Make IO::Path.new-from-absolute-path private
  • ff97083 Straighten up rename, move, and copy
  • 0d9ecae Remove multi-dir &mkdir
  • 6ee71c2 Coerce mode in IO::Path.mkdir to Int
  • d46e8df Add IO::Pipe .path and .IO methods
  • c01ebea Make IO::Path.mkdir return invocant on success
  • 1f689a9 Fix up IO::Handle.Str
  • 490ffd1 Do not use self.Str in IO::Path errors
  • 40217ed Swap .child to .concat-with in all the guts
  • fd503f8 Revert "Remove role IO and its .umask method"
  • c95c4a7 Make IO::Path/IO::Special do IO role
  • 214198b Implement proper args for IO::Handle.lock
  • 9a2446c Move Bool return value to signature
  • 51e4629 Amend rules for last part in IO::Path.resolve
  • b8458d3 Reword method child for cleaner code
  • 1887114 Implement IO::Path.child-secure
  • 9d8e391 Fix IO::Path.resolve with combiners; timotimo++
  • 0b5a41b Rename IO::Path.concat-with to .add
  • a98b285 Remove IO::Path.child-secure
  • 8bacad8 Implement IO::Path.sibling
  • 7112a08 Add :D on invocant for file tests
  • b2a64a1 Fix $*CWD inside IO::Path.dir's :test Callable
  • 6fa4bbc Straighten out &slurp/&spurt/&get/&getc/&close
  • 34b58d1 Straighten out &lines/&words
  • d0cd137 Make dir take any IO(), not just Cool
  • 7412184 Make $*HOME default to Nil, not Any
  • 475d9bc Fix display of backslashes in IO::Path.gist
  • 6ef2abd Revert "Fix display of backslashes in IO::Path.gist"
  • 134efd8 Fix .perl for IO::Path and subclasses
  • 69320e7 Fix .IO on :U of IO::Path subclasses
  • eb8d006 Make IO::Handle.iterator a private lines iterator
  • 08a8075 Fix IO::Path.copy/move when source/target are same
  • 973338a Fix IO::Handle.comb/.split; make them .slurp
  • b43ed18 Make IO::Handle.flush fail with typed exceptions
  • 276d4a7 Remove .tell info in IO::Handle.gist
  • f4309de Fix IO::Spec::Unix.is-absolute for combiners on /
  • 06d8800 Fix crash when setting .nl-in ...
  • 7e9496d Make IO::Handle.encoding settable via .new
  • 95e49dc Make IO::Handle.open respect attribute values
  • 6ed14ef Remove :directory from IO::Spec::*.split
  • 9021a48 Make IO::Path.parts a Map instead of Hash
  • a282b8c Fix IO::Handle.perl.EVAL roundtrippage
  • a412788 Make IO::Path.resolve set CWD to $!SPEC.dir-sep
  • 84502dc Implement $limit arg for IO::Handle.words
  • 613bdcf Make IO::Handle.print/.put sig consistent
  • 0646d3f Allow no-arg &prompt
  • 4a8aa27 Implement IO::CatHandle.close
  • 4ad8b17 Implement IO::CatHandle.get
  • 3b668b6 Implement IO::CatHandle.getc
  • 25b664a Implement IO::CatHandle.words
  • 7ebc386 Implement IO::CatHandle.slurp
  • 52b34b7 Implement IO::CatHandle.comb/.split
  • beaa925 Implement IO::CatHandle.read
  • ccc90fd Implement IO::CatHandle.readchars
  • 40f4dc9 Implement IO::CatHandle.Supply
  • 0c9aea7 Implement IO::CatHandle.encoding
  • ee1e185 Implement IO::CatHandle.eof
  • 80686a7 Implement IO::CatHandle.t/.path/.IO/.native-descriptor
  • 993de50 Implement IO::CatHandle.gist/.Str/.opened/.open
  • 677c4ea Implement IO::CatHandle.lock/.unlock/.seek/.tell
  • e657ed1 Implement IO::CatHandle.chomp/.nl-in
  • a452e42 Implement IO::CatHandle.on-switch
  • f539a62 Swap IO::ArgFiles to IO::CatHandle impl
  • fa7aa1c Implement IO::CatHandle.perl method
  • 21fd2c4 Remove IO::Path.watch
  • 65941b2 Revert "Remove IO::Path.watch"
  • a47a78f Remove useless :SPEC/:CWD on some IO subs
  • d13d9c2 Throw out IO::Path.int

Perl 6 Specification Commits

I've made 114 commits to the Perl 6 Specification (roast) repository:

  • 63370fe Test IO::Special .WHICH/.Str do not crash
  • 465795c Test IO::Path.lines(*) does not crash
  • 091931a Expand &open tests
  • 8d6ca7a Cover IO::Path.ACCEPTS
  • 14b6844 Use Numeric instead of IO role in dispatch test
  • 5a7a365 Expand IO::Spec::*.tmpdir tests
  • f48198f Test &indir
  • bd46836 Amend &indir race tests
  • 04333b3 Test &indir fails with non-existent paths by default
  • 73a5448 Remove two fudged &chdir tests
  • 86f79ce Expand &chdir tests
  • 430ab89 Test &*chdir
  • 86c5f9c Delete qp{} tests
  • 3c4e81b Test IO::Path.Str works as advertised
  • ba3e7be Merge S32-io/path.t and S32-io/io-path.t
  • 79ff022 Expand &spurt and IO::Path.spurt tests
  • 1d4e881 Test $*TMPDIR can be temped
  • b23e53e Test IO::Path.extension
  • 2f09f18 Fix incorrect test
  • 305f206 Test empty-string extensions in IO::Path.extension
  • 0e47f25 Test IO::Path.concat-with
  • e5dc376 Expand IO::Path.accessed tests
  • 43ec543 Cover methods of IO::Special
  • bd8d167 Test IO::Path::* instantiate a subclass
  • d8707e7 Cover IO::Spec::Unix.basename
  • c3c51ed Cover IO::Spec::Win32.basename
  • 896033a Cover IO::Spec::QNX.canonpath
  • 7c7fbb4 Cover :parent arg in IO::Spec::Cygwin.canonpath
  • 8f73ad8 Change \0 roundtrip test to \t roundtrip test
  • b16fbd3 Add tests to check nul byte is rejected
  • ee7f05b Move is-path sub to top so it can be reused
  • a809f0f Expand IO::Path.resolve tests
  • feecaf0 Expand file tests
  • a4c53b0 Use bin IO::Handle to test its .Supply
  • 7e4a2ae Swap .slurp-rest to .slurp
  • d4353b6 Rewrite .l on broken symlinks test
  • 416b746 Test symlink routines
  • 8fa49e1 Test link routines
  • 637500d Spec IO::Pipe.path/.IO returns IO::Path type object
  • 64ff572 Cover IO::Path/IO::Pipe's .Str/.path/.IO
  • 4194755 Test IO::Handle.lock/.unlock
  • a716962 Amend rules for last part in IO::Path.resolve
  • f3c5dae Test IO::Path.child-secure
  • 92217f7 Test IO::Path.child-secure with combiners
  • 39677c4 IO::Path.concat-with got renamed to .add
  • 7a063b5 Fudge .child-secure tests
  • 3b36d4d Test IO::Path.sibling
  • 41b7f9f Test $*CWD in IO::Path.dir(:test) Callable
  • 18d9c04 Cover IO::Handle.spurt
  • 8f78ca6 Test &words with IO::ArgFiles
  • ea137f6 Cover IO::Handle.tell
  • 71a6423 Add $*HOME tests
  • 95d68a2 Test IO::Path.gist does escapes of backslashes
  • de89d25 Revert "Test IO::Path.gist does escapes of backslashes"
  • 9e8b154 Test IO::Handle.close can be...
  • 853f76f Test IO::Pipe.close returns pipe's Proc
  • d543e75 Test IO::Handle.DESTROY closes the handle
  • 1ed18b4 Add test for .perl.EVAL roundtrip with combiners
  • 704210c Test we can roundtrip IO::Path.perl
  • 2689eb1 Test .IO on :U of IO::Path subclasses
  • 40353f1 Test for IO::Handle:D { ... } loops over handle
  • 4fdb850 Test IO::Path.copy/move when source/target are same
  • 98917dc Test IO::Path.dir's absoluteness behaviour
  • 71eebc7 Test IO::Spec::Unix.extension
  • 4495615 Test IO::Handle.flush
  • 60f5a6d Test IO::Handle.t when handle is a TTY
  • 31e3993 Test IO::Path*.gist
  • c481433 Test .is-absolute method for / with combiners
  • 8ee0a0a Test IO::Spec::Win32.rel2abs with combiners
  • a41027f Test IO::Handle.nl-in can be set
  • e82b798 Test IO::Handle.open respects attributes
  • 2c29150 Test IO::Handle.nl-in attribute
  • 03ce93b Test IO::Handle.encoding can be set
  • 8ae81c0 Test no-arg candidate of &note
  • fb61306 Test IO::Path.parts attribute
  • 7266522 Test return type of IO::Spec::Unix.path
  • 6ac3b4a Test IO::Spec::Win32.path
  • dbbea15 Test IO::Handle.perl.EVAL roundtrips
  • 5eb513c Test IO::Path.resolve sets CWD to $!SPEC.dir-sep
  • b0c4a7a Test &words, IO::Handle.words, and IO::Path.words
  • f3d1f67 Test $limit arg with &lines/IO::*.lines
  • 4f5589b Add test for handle leak in IO::Path.lines
  • 4d0f97a Add &put/IO::Handle.put tests
  • 125fe18 Add &prompt tests
  • 939ca8d Test IO::CatHandle.close
  • 9833012 Test IO::CatHandle.get
  • 2f65a72 Test IO::CatHandle.getc
  • a4a7eaa Test IO::CatHandle.words
  • 1131c09 Add &put/IO::Handle.put tests
  • 80de9b6 Add &prompt tests
  • bacfd9f Test IO::CatHandle.slurp
  • e78e3c0 Test IO::CatHandle.comb/.split
  • f1c1125 Test IO::CatHandle.read
  • e9e78e1 Test IO::CatHandle.readchars
  • 0479087 Test IO::CatHandle.Supply
  • 71953e3 Test IO::CatHandle.encoding
  • db4847e Test IO::CatHandle.eof
  • 175ba45 Test IO::CatHandle.t/.path/.IO/.native-descriptor
  • c6cc66a Test IO::CatHandle.gist/.Str/.opened/.open
  • dcdac1a Test IO::CatHandle.lock/.unlock/.seek/.tell
  • f48c26e Test IO::CatHandle.chomp/.nl-in
  • 8afd758 Test IO::CatHandle.DESTROY
  • c7eff2b Test IO::CatHandle.on-switch
  • e87e20d Test IO::CatHandle.next-handle
  • 28717f0 Test IO::CatHandle.perl method
  • 432bf94 Test IO::Path.watch
  • ce1b637 Test IO::Handle.say
  • 0bb6298 Test IO::Handle.print-nl
  • 47c88ab Test IO::Pipe.proc attribute
  • 945621d Test IO::Path.SPEC attribute
  • 5fb4b63 Test IO::Path.CWD/.path attributes
  • d0e5701 Test IO::Path.Numeric and other .numeric methods
  • 94d7133 Test 0-arg &say/&put/&print
  • 38c61cd Test &slurp() and &slurp(IO::Handle)

Perl 6 Documentation Commits

I've made 146 commits to the Perl 6 Documentation repository:

  • fd7a41b Improve code example
  • 110efb4 No need for .ends-with
  • 69d32da Remove IO::Handle.z
  • d02ae7d Remove IO::Handle.rw and .rwx
  • ccae74a Fix incorrect information for IO::Path.absolute
  • 3cf943d Expand IO::Path.relative
  • cc496eb Remove mention of IO.umask
  • 335a98d Remove mention of role IO
  • cc6539b Remove 8 methods from IO::Handle
  • 0511e07 Document IO::Spec::*.tmpdir
  • db36655 Remove tip to use $*SPEC to detect OS
  • 839a6b3 Expand docs for $*HOME and $*TMPDIR
  • d050d4b Remove IO::Path.chdir prose
  • 1d0e433 Document &chdir
  • 3fdc6dc Document &*chdir
  • e1a299c Reword "defined as" for &*chdir
  • e5225be Fix URL to &*chdir
  • bf377c7 Document &indir
  • 5aa614f Improve suggestion for Perl 5's opendir
  • a53015a Clarify value of IO::Path.path
  • bdd18f1 Fix desc of IO::Path.Str
  • b78d4fd Include type names in links to methods
  • b8fba97 Point out my $*CWD = chdir … is an error
  • d5abceb Write docs for all spurt routines
  • b9e692e Document new IO::Path.extension
  • 65cc372 Document IO::Path.concat-with
  • 24a6ea9 Toss all of the TODO methods in IO::Spec*
  • 1f75ddc Document IO::Spec*.abs2rel
  • cc62dd2 Kill IO::Path.abspath
  • 1973010 Document IO::Path.ACCEPTS
  • b3a9324 Expand/fix up IO::Path.accessed
  • 1cd7de0 Fix up type graph
  • 56256d0 Minor formatting improvements in IO::Special
  • 184342c Document IO::Special.what
  • 6bd0f98 Dissuade readers from using IO::Spec*
  • 7afd9c4 Remove unrelated related classes
  • a43ecb9 Document IO::Path's $.SPEC and $.CWD
  • e9b6809 Document IO::Path::* subclasses
  • 9102b51 Fix up IO::Path.basename
  • 5c1d3b6 Document IO::Spec::Unix.basename
  • a1cb80b Document IO::Spec::Win32.basename
  • 28b6283 Document IO::Spec::*.canonpath
  • 50e5565 Document IO::Spec::*.catdir and .catfile
  • dbdc995 Document IO::Spec::*.catpath
  • 0ca2295 Reword/expand IO::Path intro prose
  • 45e84ad Move IO::Path.path to attributes
  • b9de84f Remove DateTime tutorial from IO::Path docs
  • 69b2082 Document IO::Path.chdir
  • d436f3c Document IO::Spec::* don't do any validation
  • 4090446 Improve chmod docs
  • 1527d32 Document :completely arg to IO::Path.resolve
  • 372545c Straighten up file test docs
  • a30fae6 Avoid potential confusion with use of word "object"
  • 2aa3c9f Document new behaviour of IO::Handle.Supply
  • 56b50fe Document IO::Handle.slurp
  • 017acd4 Improve docs for IO::Path.slurp
  • 0f49bb5 List Rakudo-supported encodings in open()
  • e60da5c List utf-* alias examples too since they're common
  • f83f78c Use idiomatic Perl 6 in example
  • fff866f Fix docs for symlink/link routines
  • aeeec94 Straighten up copy, move, rename
  • 923ea05 Straighten up mkdir docs
  • 47b0526 Explicitly spell out caveats of IO::Path.Str
  • 60b9227 Change return value for mkdir
  • 8d95371 Expand IO::Handle/IO::Pipe.path docs
  • fd8a5ed Document IO::Pipe.path
  • bd4fa68 Document IO::Handle/IO::Pipe.IO
  • 2aaf12a Document IO::Handle.Str
  • 53f2b99 Document role IO's new purpose
  • 160c6a2 Document IO::Handle.lock/.unlock
  • 3145979 Document IO::Path.child-secure
  • c5524ef Rename IO::Path.concat-with to .add
  • 81a5806 Amend IO::Path.resolve: :completely
  • 6ca67e4 Start sketching out Definitive IO Guide™
  • b9c9117 Toss IO::Path.child-secure
  • 61cb776 Document IO::Path.sibling
  • 0fc39a6 Fix typegraph
  • 9a63dc4 Document IO::Path.cleanup
  • 2387ce3 Re-write IO::Handle.close docs
  • 0def0d1 Amend IO::Handle.close docs
  • c7e32e2 Document IO::Spec::Unix.curupdir
  • fe489dc Document IO::Spec::Unix.curdir
  • 83d5de0 Document IO::Spec::Unix.updir
  • 4804128 Document IO::Handle.DESTROY
  • c991862 Add warning to dir about...
  • eca21ff Document copy/move behaviour for same target/source
  • 6c2b8b2 Document IO::Path/IO::Handle.comb
  • fb29e04 Include exception used in IO::Path.resolve
  • 69d473f Document IO::Spec::*.devnull
  • 994d671 List IO::Dir as one of the means...
  • 4432ef3 Finish up IO::Path.dir docs
  • 64355c8 Document IO::Spec::*.dir-sep
  • 914c100 Finish up IO::Path.dirname
  • 8d5e31c Document IO::Handle.encoding
  • d5c36aa Finish off IO::Handle.eof
  • e9de97e Document IO::Spec::*.extension
  • bf7ec00 Document IO::Handle.flush
  • 25bce38 Document IO::Path.succ
  • 8233960 Improve IO::Handle.t docs
  • b4006a2 Be explicit what IO::Handle.opened returns
  • c4f27a7 Document IO::Path.pred
  • 860333f Remove entirely-invented "File test operators"
  • ab0bd7a Document IO::Path.Numeric/.Int
  • 4f81f08 Improve IO::Handle.get docs
  • c45d389 Finish off IO::Handle.getc/&getc docs
  • a4012e0 Document IO::Handle.gist
  • d15b0c7 Document IO::Path.gist
  • 1cf6932 Document IO::Spec::*.is-absolute
  • 4e88b84 Finish up IO::Path.is-absolute
  • 497e7f7 Finish off IO::Path.is-relative
  • f7e75c1 Document IO::Handle.nl-in
  • e309ddd Finish up &note
  • 81900cb Finish off IO::Path.parent
  • 59cbc38 Finish off IO::Path.parts
  • b99a666 Finish off IO::Path.path/.IO
  • b070999 Document IO::Spec::*.path
  • bace8ff Document IO::Path*.perl
  • dfdd845 Add "The Basics" section to TDIOG
  • cdc701e Add "What's an IO::Path Anyway?" section to TDIOG
  • 0d6d058 Add "Writing into files" Section to TDIOG
  • a6365f3 Document IO::Handle.words/&words
  • 2e25c82 Document IO::Spec::*.join
  • 49e58bd Document IO::Handle.lines
  • 1744820 Document IO::Path.lines
  • f3f70a0 Document IO::Path.words
  • 509f0e8 Fix incorrect suggested routine
  • a6f1cbf Fix up IO::Handle.print
  • 8f53830 Fix up IO::Handle.print-nl
  • dc50211 Fix &prompt
  • 98965b3 Fix up IO::Handle.split
  • bd702e2 Fix up IO::Handle.comb
  • 6dd92b8 Document IO::CatHandle
  • edeb069 Document IO::Path.split
  • 2d96596 Document IO::Spec::*.split
  • 129c097 Document IO::Spec::*.splitdir
  • b946960 Document IO::Spec::*.splitpath
  • dcd7490 Fix rmdir docs
  • 2a7bd17 Document IO::Spec::*.rel2abs
  • f45241f Document IO::Spec::*.rootdir
  • 70a80ec Document IO::Handle.put
  • 6f58ed0 Polish IO::Handle.say
  • 3790a0f Polish &put/&print/&say
  • ebb6f53 Document IO::Handle.nl-out attribute
  • 53c9c91 Document IO::Handle.chomp attribute
  • ca2a3a0 Improve &open/IO::Handle.open docs
  • 856e846 Add Reading From Files section to TDIOG

Perl 6 Core Hacking: Where's Da Sauce, Boss?

Read this article on Perl6.Party

Imagine you were playing with Perl 6 and you came across a buglet or you were having some fun with the Perl 6 bug queue—you'd like to debug a particular core subroutine or method, so where's the source for it at?

Asked such a question, you might be told it's in the Rakudo compiler's GitHub repository. Depending on how deep down the rabbit hole you wish to go, you may also stop by NQP's repo, which is a subset of Perl 6 used to implement Rakudo, or MoarVM's repo, which is the leading virtual machine Perl 6 runs on.

The answer is fine, but we can do better. We'd like to know exactly where da sauce is.

Stick to The Basics

The most obvious way is to just use the grep command in the source repository. The code is likely in the src/ directory, or more specifically src/core.

We'll use a regex that catches sub, method, and multi keywords. For example, here's our search for path sub or method:

$ grep -nER '^\s*(multi|sub|method|multi sub|multi method)\s+path' src/core

src/core/Cool.pm:229:    method path() { self.Stringy.IO }
src/core/CompUnit/Repository/Locally.pm:26:    method path-spec(CompUnit::Repository::Locally:D:) {
src/core/CompUnit/Repository/AbsolutePath.pm:46:    method path-spec() {
src/core/CompUnit/Repository/NQP.pm:32:    method path-spec() {
src/core/CompUnit/Repository/Perl5.pm:46:    method path-spec() {
src/core/CompUnit/PrecompilationStore/File.pm:93:    method path(CompUnit::PrecompilationId $compiler-id,
src/core/CompUnit/PrecompilationUnit.pm:17:    method path(--> IO::Path) { ... }
src/core/IO/Spec/Win32.pm:58:    method path {
src/core/IO/Spec/Unix.pm:61:    method path {
src/core/IO/Handle.pm:714:    method path(IO::Handle:D:)            { $!path.IO }

It's not too terrible, but it's a rather blunt tool. We have these problems:

  • There are false positives; we have several path-spec methods found
  • It doesn't tell us which of the results is for the actual method we have in our code. There's Cool, IO::Spec::Unix, and IO::Handle, all with a method path in them. If I call "foo".IO.path, which of those gets called?

The last one is particularly irksome, but luckily Perl 6 can tell us where the source is from. Let's ask it!

But here's line number... So code me maybe

The Code class, from which all subs and methods inherit, provides the .file and .line methods that tell us the file (and line number) where that particular Code is defined:

say "The code is in {.file} on line {.line}" given &foo;

sub foo {
    say 'Hello world!';
}

# OUTPUT:
# The code is in test.p6 on line 3

That looks nice and simple, but it gets more awkward with methods:

class Kitty {
    method meow {
        say 'Meow world!';
    }
}

say "The code is in {.file} on line {.line}" given Kitty.^can('meow')[0];

# OUTPUT:
# The code is in test.p6 on line 2

We've got the extra cruft of the .^can metamodel call, which returns a list of Method objects. Above, we use the first one to get the .file and .line number from, but is it really the method we were looking for? Take a look at this example:

class Cuddly {
    method meow ('meow', 'meow') {
        say 'Meow meow meow!';
    }
}

class Kitty is Cuddly {
    multi method meow ('world') {
        say 'Meow world!';
    }

    multi method meow ('meow') {
        say 'Meow meow';
    }
}

We have a method meow in one class and in another class we have two multi methods meow. How can we print the location of the last method, the one that takes a single 'meow' as an argument?

First, let's take a gander at all the items .^can returns:

say Kitty.^can('meow');
# OUTPUT:
# (meow meow)

Wait a minute, we have three methods in our code, so how come we only have two meows in the output? Let's print the .file and .line for both meows:

for 0, 1 {
    say "The code is in {.file} on line {.line}"
        given Kitty.^can('meow')[$_];
}
# OUTPUT:
# The code is in gen/moar/m-CORE.setting on line 587
# The code is in test.p6 on line 2

The second meow gives us a sane result; it's our method defined in class Cuddly. The first one, however, gives us some weird file.

What's happening here is that the line is referencing the proto for the multis. Since in this case, instead of providing our own proto, we use the autogenerated one, the referenced file has nothing to do with our code. We could, of course, add a proto into the code, but then the line number would still reference the proto, not the last meow method. Is there anything that we can do?

You .cando It!

The Routine class, from which both Method and Sub classes inherit, provides the .cando method. Given a Capture, it returns a list of candidates that can handle it, with the narrowest candidate first in the list, and since the returned object is a Code, we can query its specific .file and .line:

class Cuddly {
    method meow ('meow', 'meow') {
        say 'Meow meow meow!';
    }
}

class Kitty is Cuddly {
    multi method meow ('world') {
        say 'Meow world!';
    }

    multi method meow ('meow') {
        say 'Meow meow';
    }
}

my $code = gather {
    for Kitty.^can('meow') -> $meth {
        .take for $meth.cando: \(Kitty, 'meow');
    }
}

say "The code is in {.file} on line {.line}" with $code[0];

# OUTPUT:
# The code is in test.p6 on line 12

Hooray! We got the correct location of the multi we wanted. We still have our two classes with three meow methods total. On lines 17–21, we loop over the two meow Methods the .^can metamodel call gives us. For each of them we call the .cando method with the Capture that matches the multi we want (note that we do need to provide the needed object as the first argument of the Capture). We then .take all found candidates to gather them into the $code variable.

The first value we get is the narrowest candidate and is good 'nuf for us, so we call the .file and .line on it, which gives us the location we were looking for. Sounds like we nailed this .file and .line business down rather well. Let's dive into the core, shall we?

Can't see the core files for the setting

If this is the first time you're seeing the printout of .file/.line for some core stuff, you're in for a surprise. Actually, we've already seen the surprise, but you may have thought it to be a fluke:

say "{.file}:{.line}" given &say;
# OUTPUT:
# gen/moar/m-CORE.setting:29038

All of the nice, good looking files you see in src/core in the repo actually get compiled into one giant file called the "setting." My current setting is 40,952 lines long and the .line of core subs and methods refers to one of those thousands of lines.

Now sure, we could pop the setting open and watch our editor grind to a stuttering halt (I'm looking at you, Atom!). However, that doesn't help us find the right repo file to edit if we want to make changes to how it works. So what do we do?

A keen eye will look at the contents of the setting or at the file that generates it and notice that for each of the separate files in the repo, the setting has this type of comment before the contents of the file are inserted into the setting:

#line 1 src/core/core_prologue.pm

This means if we're clever enough, we can write a sub that translates a line number in the setting to the separate file we can locate in the repo. Here's a plan of action: we pop open the setting file and read it line by line. When we encounter one of the above comments, we make a note of which file we're in as well as how many lines deep in the setting we're currently at.

The location of the setting file may differ, depending on how you installed Perl 6, but on my system (I use rakudobrew), it's in $*EXECUTABLE.parent.parent.parent.child('gen/moar/m-CORE.setting'), so the code for finding the actual file that defines our core sub or method is this:

sub real-location-for ($wanted) {
    state $setting = $*EXECUTABLE.parent.parent.parent.child: 'gen/moar/m-CORE.setting';
    my ($cur-line-num, $offset) = 0, 0;
    my $file;
    for $setting.IO.lines -> $line {
        return %( :$file, :line($cur-line-num - $offset), )
            if ++$cur-line-num == $wanted;

        if $line ~~ /^ '#line 1 ' $<file>=\S+/ {
            $file   = $<file>;
            $offset = $cur-line-num + 1;
        }
    };
    fail 'Were not able to find location in setting.';
}

say "{.<file>}:{.<line>}" given real-location-for &say.line;


# OUTPUT:
# src/core/io_operators.pm:17

The $wanted contains the setting line number given to us by .line call and the $cur-line-num contains the number of the current line we're examining. We loop until the $cur-line-num reaches $wanted and return a Hash with the results. For each line that matches our special comment, we store the real name of the file the code is from into $file and store the $offset of the first line of the code in that file. Once done, we simply subtract the $offset from the setting $cur-line-num and we get the line number in the source file.

This is pretty awesome and useful, but it's still not what I had in mind when I said we wanted to know exactly where da sauce is. I don't want to clone the repo, navigate to the file, and open my editor. I want to just look at the code.

If it's worth doing, it's worth overdoing

There's one place where we can stare at Rakudo's source code until it blushes and looks away: GitHub. Since our handy sub gives us a filename and a line number, we can construct a URL that points to a specific file and line in the source code, like this one, for example: https://github.com/rakudo/rakudo/blob/nom/src/core/Str.pm#L16

There's an obvious problem with such an approach: the URL points to the master branch (called nom, for "New Object Model," in Rakudo). Commits go into the repo daily, and unless we rebuild our Perl 6 several times a day, there's a good chance the location our GitHub URL points to is wrong.

Not only do we have to point to a specific file and line number, we have to point to the right commit too. On GitHub's end, it's easy: we just replace nom in the URL with the appropriate commit number—we just need Rakudo to tell us what that number is.

The two dynamic variables $*VM and $*PERL contain some juicy information. By introspecting them, we can locate some useful info and what looks like commit prefix parts in version numbers:

say $*VM.^methods;
# (BUILD platform-library-name Str gist config prefix precomp-ext
# precomp-target precomp-dir name auth version signature desc)

say $*VM.version;
# v2016.06

say $*PERL.^methods;
# (BUILD VMnames DISTROnames KERNELnames Str gist compiler name auth version
# signature desc)

say $*PERL.compiler.^methods;
# (BUILD build-date Str gist id release codename name auth version
# signature desc)

say $*PERL.compiler.version;
# v2016.06.10.g.7.cff.429

Rakudo is a compiler and so we're interested in the value of $*PERL.compiler.version. It contains the major release version, followed by g, followed by the commit prefix of this particular build. The prefix is split up on number-letter boundaries, so we'll need to join up all the bits and split on g. But, take a look at $*VM.version, which is the version of the virtual machine we're running the code on. There aren't any gs and commits in it and for a good reason: it's a tagged major release, and the name of the tag is the version. The same will occur for Rakudo on release builds, like the ones shipped with Rakudo Star. So we'll need to check for such edge cases and this is the code:

my $where = .Str ~~ /g/
    ?? .parts.join.split("g")[*-1]
    !! .Str
given $*PERL.compiler.version;

Given a $*PERL.compiler.version: if it contains the letter g, we join up the version bits, split on g, and the last portion will be our commit prefix; if it doesn't contain the letter g, then we're dealing with a release tag, so we'll take it as-is. All said and done, our code for locating the source becomes this:

my $where = .Str ~~ /g/
    ?? .parts.join.split("g")[*-1]
    !! .Str
given $*PERL.compiler.version;

say [~] 'https://github.com/rakudo/rakudo/blob/',
        $where, '/', .<file>, '#L', .<line>
given real-location-for &say.line;

# OUTPUT:
# https://github.com/rakudo/rakudo/blob/c843682/src/core/io_operators.pm#L17

Hey! Awesome! We got a link that points to the correct commit and file! Let celebrations begin! Wait. What? You followed the link and noticed the line number is not quite right? What gives? Did we mess up our algorithm?

Crank Up The Insanity

If you take a look again at the script that generates the setting file, you'll notice it strips things: comments and special backend-specific chunks of code.

There are two ways to fix this. The sane approach would be to commit a change that would make that script insert an empty line for each line it skips and then pretend that we didn't commit that just to make our personal project work. Then, there's the Zoffix Way to fix this: we got the GitHub link, so why don't we fetch that code and figure out what the right line number is. Hey! That second way sounds much more fun! Let's do just that!

The one link we've seen so far is this: https://github.com/rakudo/rakudo/blob/c843682/src/core/io_operators.pm#L17. It's not quite what we want, since it's got HTML and bells and whistles in it. We want raw code, and GitHub does offer that at a slightly different URL: https://raw.githubusercontent.com/rakudo/rakudo/c843682/src/core/io_operators.pm. The plan of action then becomes (a rough sketch of the adjustment step follows the list):

  • Get the line number in the setting
  • Use our real-location-for sub to get the filename and sorta-right line number in a source file
  • Get the commit our compiler was built with
  • Generate a GitHub URL for raw code for that file on that commit and fetch that code
  • Use the same algorithm as in the setting generating script to convert the code we fetched into the version that lives in our setting, while keeping track of the number of lines we strip
  • When we reach the correct line number in the converted file, we adjust the original line number we had by the number of lines we stripped
  • Generate a regular GitHub URL to the commit, file, and corrected line number
  • ???
  • Profit!
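
Here's a rough sketch of that adjustment step. This is not the module's actual code; for simplicity it pretends the only lines the setting generator strips are full-line comments:

sub adjusted-line ($raw-url, $sorta-right-line) {
    # fetch the raw code from GitHub; shelling out to curl for brevity
    my @lines = qqx{curl -s $raw-url}.lines;
    my $stripped = 0;
    my $kept     = 0;
    for @lines -> $line {
        # pretend the generator strips only full-line comments
        $line ~~ /^ \s* '#' / ?? $stripped++ !! $kept++;
        # once enough surviving lines have been seen, the real line number
        # is the sorta-right one plus everything stripped before it
        return $sorta-right-line + $stripped if $kept == $sorta-right-line;
    }
    fail 'ran out of lines before reaching the target';
}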

I could go over the code, but it's just a dumb, unfun algorithm, and most importantly, you don't need to know it. Because... there's a module that does just that!

What Sorcery Is This?

The module is called CoreHackers::Sourcery and when you use it, it'll augment the Code class and all core classes that inherit from it with a .sourcery method, as well as provide a sourcery subroutine.

So, to get the location of the code for say sub, just run:

use CoreHackers::Sourcery;
&say.sourcery.put;

# OUTPUT:
# src/core/io_operators.pm:20 https://github.com/rakudo/rakudo/blob/c843682/src/core/io_operators.pm#L20

That gives us the correct location of the proto. We can either pop open a file in a repo checkout or view the code at the provided GitHub URL.

Want to get the location of a specific multi? There's no need to mess with .cando! The arguments you give to the .sourcery method will be used to select the best matching multi, so to find the location of the say multi that will handle say "foo" call, just run:

&say.sourcery("foo").put;

# OUTPUT:
# src/core/io_operators.pm:22 https://github.com/rakudo/rakudo/blob/c843682/src/core/io_operators.pm#L22

That covers the subs. For methods, you can go with the whole .^can meta dance, but we like simple things, and so we'll use the subroutine form of sourcery:

put sourcery Int, 'abs';         # method of a type object
put sourcery 42,  'split';       # method of an Int object
put sourcery 42,  'base', \(16); # best candidate for `base` method called with 16 as arg

This is pretty handy. And the whole hitting the GitHub thing? The module will cache the code fetched from GitHub, so things like this won't take forever:

put "Int.{.name} is at {.sourcery}" for Int.^methods;

However, if you do actually run that code, after some output you'll be greeted with this error:

# Method 'sourcery' not found for invocant of class 'Method+{Callable[Bool:D]}'
#   in block  at test.p6 line 1
#   in block <unit> at test.p6 line 1

The class it mentions is not a pure Method object, but one with a mixin in it. While CoreHackers::Sourcery recomposes all core subclasses of the Code class after augmenting it, it doesn't do that for such mixins, so you'd have to recompose them yourself:

for Int.^methods {
    .WHAT.^compose;
    put "Int.{.name} is at {.sourcery}" ;
}

Or better still, just use the subroutine form of sourcery:

put "Int.{.name} is at {sourcery $_}" for Int.^methods;

Do It For Me

For most stuff, we wouldn't want to do a whole bunch of typing to use a module and call subs and then copy/paste URLs or filenames. You'll notice sourcery returns a list of two items: the filename and the URL. This means we can make some nice and short aliases to call it and automatically pop open either our editor or web browser:

$ alias sourcery='perl6 -MCoreHackers::Sourcery -MMONKEY-SEE-NO-EVAL \
    -e '\''run "atom", "/home/zoffix/rakudo/" \
        ~ EVAL "sourcery(@*ARGS[0])[0]" '\'''

$ alias sourcery-web='perl6 -MCoreHackers::Sourcery -MMONKEY-SEE-NO-EVAL \
    -e '\''run "firefox", EVAL "sourcery(@*ARGS[0])[1]" '\'''

# opens Atom editor at the spot to edit code for Int.base
$  sourcery 'Int, "base"'

# opens Firefox, showing code for Int.base
$  sourcery-web 'Int, "base"'

We EVAL the argument we give to these aliases, so be careful with them. For the sourcery alias, we run the Atom editor and give it the file to open. I prepended the location of my local Rakudo checkout, but you'd use yours. Most editors support the file:line-number format to open files at a particular spot; if yours doesn't, modify the command.

For sourcery-web, we use the URL returned by sourcery and open the Firefox browser at that location. And just like that, with a few keystrokes, we can jump in to view or edit the code for a particular core sub or method in Rakudo!

Conclusion

We've learned where Rakudo's source lives, how to find the commit the current compiler is built off, and how to locate the source code for a particular sub or method in a giant file called the setting. We then further hacked away the inconveniences by getting to the actual place in the source code we can edit, culminating with a shiny module and a couple of handy command line aliases.

Happy hacking!

UPDATE 2016.08.05

Inspired by this blog post, lizmat++ has changed the setting generation script to not skip any lines, so making adjustments to line numbers by fetching source from GitHub is no longer necessary, as the line numbers match up with the original source.

Hacking on The Rakudo Perl 6 Compiler: Mix Your Fix

Read this article on Perl6.Party

While testing a fix for one of the Less Than Awesome behaviours in standalone Signature objects, I came across a bugglet. Smartmatching two Signatures throws, while spilling a bit of the guts:

<Zoffix> m: my $m = method ($a: $b) { }; say $m.signature ~~ :($a, $b);
<camelia> rakudo-moar 46838d: OUTPUT«Method 'type' not found for invocant of class 'Any'␤ in block at line 1␤␤»

So I figured I'll write about fixing it, 'cause hacking on internals is lots of fun. Let's roll!

Golf It Down

The less code there is to reproduce the bug, the fewer places there are for that bug to hide. We have a detached method and then we smartmatch its signature against something else. Let's try to golf it down a bit and smartmatch two Signatures, without involving a method:

<Zoffix> m: :($a, $b) ~~ :($a, $b);
<camelia> rakudo-moar 46838d: ( no output )

The bug disappeared, so perhaps our Signature on the left doesn't contain the stuff that triggers the bug. Let's dump the signature of the method to see what we should match against:

<Zoffix> m: my $m = method ($a: $b) { }; say $m.signature
<camelia> rakudo-moar 46838d: OUTPUT«($a: $b, *%_)␤»

Aha! It has a slurpy hash: *%_. Let's try matching a Signature with a slurpy in it:

<Zoffix> m: :(*%) ~~ :();
<camelia> rakudo-moar 46838d: OUTPUT«Method 'type' not found for invocant of class 'Any'␤ in block at line 1␤␤»

And there we go: hole in three. Let's proceed.

Roast It

There's an official Perl 6 test suite that Rakudo must pass to be called a Perl 6 compiler. Since we got a bug on our hands, we should add a test for it to the test suite to ensure it doesn't rear its ugly head again.

A copy of the roast repo gets automatically cloned into t/spec when you run make spectest in Rakudo's checkout. If you don't have a commit bit, you can just change the remote/branch of that checkout to your fork:

cd t/spec
git remote rm origin
git remote add origin https://github.com/YOURUSERNAME/roast
git checkout your-branch
cd ../..

It may be tricky to figure out which file to put the test in if you're new. You can always ask the good folks on irc.freenode.net/#perl6 for advice. In this case, I'll place the test into S06-signature/outside-subroutine.t.

While not required, I find it helpful to open a ticket for the bug. This way I can reference it in my fix in the compiler repo, I can reference it in the commit to the test repo, and people get a place to tell me why I'm being stupid, when I am. I opened this bug as RT#128795.

Now, for the code of the test itself. I'll adjust the plan at the top of the file to include however many tests I'm writing—in this case one. I'll use the lives-ok test sub and stick our buggy golfed code into it. Here's the diff of the changes to the file; note the reference to the ticket number in the comment before the test:

@@ -1,7 +1,7 @@
 use v6;
 use Test;

-plan 3;
+plan 4;

 # RT #82946
 subtest 'signature binding outside of routine calls' => {
@@ -25,4 +25,7 @@ subtest 'smartmatch on signatures with literal strings' => {
 # RT #128783
 lives-ok { EVAL ’:($:)‘ }, ’signature marker is allowed in bare signature‘;

+# RT #128795
+lives-ok { :(*%) ~~ :() }, 'smartmatch with no slurpy on right side';
+
 # vim: ft=perl6

Run the file now to ensure the test fails. Hint: some files have fudging; explaining it is out of the scope of this article, but if you notice failures you're not expecting, look it up.

$ make t/spec/S06-signature/outside-subroutine.t
...
Test Summary Report
-------------------
t/spec/S06-signature/outside-subroutine.t (Wstat: 256 Tests: 4 Failed: 1)
  Failed test:  4
  Non-zero exit status: 1

With the test in place, it's time to look at some source code. Let the bug hunt begin!

Make it Saucy

Our bug involves the smartmatch operator, which aliases the left side to the topic variable $_ and calls the .ACCEPTS method on the right side with it.
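
As a quick refresher, these two lines print the same result, because the smartmatch is (roughly) sugar for calling .ACCEPTS on the right-hand side:

say :($a, $b) ~~ :($a, $b);           # the sugared form
say :($a, $b).ACCEPTS( :($a, $b) );   # roughly what it does under the hood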

In Rakudo's repo, the directory src/core contains most of the built-in types in separate files named after those types, so we'll just pop open src/core/Signature.pm in the editor and locate the definition of method ACCEPTS.

There are actually four multis for ACCEPTS. Here's the full code. Don't try to understand all of it, just note its size.

multi method ACCEPTS(Signature:D: Capture $topic) {
    nqp::p6bool(nqp::p6isbindable(self, nqp::decont($topic)));
}

multi method ACCEPTS(Signature:D: @topic) {
    self.ACCEPTS(@topic.Capture)
}

multi method ACCEPTS(Signature:D: %topic) {
    self.ACCEPTS(%topic.Capture)
}

multi method ACCEPTS(Signature:D: Signature:D $topic) {
    my $sclass = self.params.classify({.named});
    my $tclass = $topic.params.classify({.named});
    my @spos := $sclass{False} // ();
    my @tpos := $tclass{False} // ();

    while @spos {
        my $s;
        my $t;
        last unless @tpos && ($t = @tpos.shift);
        $s=@spos.shift;
        if $s.slurpy or $s.capture {
            @spos=();
            @tpos=();
            last;
        }
        if $t.slurpy or $t.capture {
            return False unless any(@spos) ~~ {.slurpy or .capture};
            @spos=();
            @tpos=();
            last;
        }
        if not $s.optional {
            return False if $t.optional
        }
        return False unless $t ~~ $s;
    }
    return False if @tpos;
    if @spos {
        return False unless @spos[0].optional or @spos[0].slurpy or @spos[0].capture;
    }

    for flat ($sclass{True} // ()).grep({!.optional and !.slurpy}) -> $this {
        my $other;
        return False unless $other=($tclass{True} // ()).grep(
            {!.optional and $_ ~~ $this });
        return False unless +$other == 1;
    }

    my $here=($sclass{True}:v).SetHash;
    my $hasslurpy=($sclass{True} // ()).grep({.slurpy});
    $here{@$hasslurpy} :delete;
    $hasslurpy .= Bool;
    for flat @($tclass{True} // ()) -> $other {
        my $this;

        if $other.slurpy {
            return False if any($here.keys) ~~ -> Any $_ { !(.type =:= Mu) };
            return $hasslurpy;
        }
        if $this=$here.keys.grep( -> $t { $other ~~ $t }) {
            $here{$this[0]} :delete;
        }
        else {
            return False unless $hasslurpy;
        }
    }
    return False unless self.returns =:= $topic.returns;
    True;
}


The error we get from the bug mentions a .type method call, and there is one such call in the code above (close to the end). In this case, there's quite a bit of code to sort through. It would be nice to be able to play around with it, stick in a couple of dd or say calls to dump out variables, right?

That approach, however, is somewhat annoying because after each change we have to recompile the entire Rakudo. On the meatiest box I got, it takes about 60 seconds. Not the end of the world, but there's a way to make things lightning fast!

Mix Your Fix

We need to fix a bug in a method of a class. Another way to think of it is: we need to replace a broken method with a working one. Signature class is just like any other class, so if we want to replace one of its methods, we can just mix in a role!

The broken ACCEPTS will continue to live in the compiler, and we'll pop open a separate playground file and define a role—let's call it FixedSignature—in it. To get our new-and-improved ACCEPTS method into standalone signature objects, we'll use the but operator to mix the FixedSignature in.

Here's the role, the mixing in, and the code that triggers the bug. I'll leave out the method bodies for brevity, but they are the same as in the code above.

role FixedSignature {
    multi method ACCEPTS(Signature:D: Capture $topic)     { #`(redacted for brevity) }
    multi method ACCEPTS(Signature:D: @topic)             { #`(redacted for brevity) }
    multi method ACCEPTS(Signature:D: %topic)             { #`(redacted for brevity) }
    multi method ACCEPTS(Signature:D: Signature:D $topic) { #`(redacted for brevity) }
}

my $a = :(*%) but FixedSignature;
my $b = :()   but FixedSignature;

say $a ~~ $b;

There are two more things we need to do for our role to work properly. First, we're dealing with multis and right now the multis in our role are creating ambiguities with the multis in the original Signature class. To avoid that, we'll define a proto:

proto method ACCEPTS (|) { * }

Since the code is using some NQP, we also need to bring those features into our playground file with the role. Just add the appropriate pragma at the top of the file:

use MONKEY-GUTS;

With these modifications, our final test file becomes the following:

use MONKEY-GUTS;

role FixedSignature {
    proto method ACCEPTS (|) { * }

    multi method ACCEPTS(Signature:D: Capture $topic)     { #`(redacted for brevity) }
    multi method ACCEPTS(Signature:D: @topic)             { #`(redacted for brevity) }
    multi method ACCEPTS(Signature:D: %topic)             { #`(redacted for brevity) }
    multi method ACCEPTS(Signature:D: Signature:D $topic) { #`(redacted for brevity) }
}

my $a = :(*%) but FixedSignature;
my $b = :()   but FixedSignature;

say $a ~~ $b;

And with this trick in place, we now have a rapid-fire weapon to hunt down the bug with—the changes we make compile instantly.

Pull The Trigger

Now, we can debug the code just like any other. I prefer applying liberal amounts of dd (or say) calls and dumping out the variables to ensure their contents match expectations.

The .type method call our error message mentions is in this line:

return False if any($here.keys) ~~ -> Any $_ { !(.type =:= Mu) };

It calls it on the keys of $here, so let's dump the $here before that statement:

...
dd $here
return False if any($here.keys) ~~ -> Any $_ { !(.type =:= Mu) };
...
# OUTPUT:
# SetHash $here = SetHash.new(Any)

Here's our offending Any; let's go up a bit and dump $here right where it's defined:

...
my $here=$sclass{True}.SetHash;
dd $here;
...
# OUTPUT:
# SetHash $here = SetHash.new(Any)

It's still there, and for a good reason. If we trace the creation of $sclass, we'll see it's this:

my $sclass = self.params.classify({.named});

The params of the Signature on the right of the smartmatch get classified based on whether they are named or not. The named parameters will be inside a list under the True key of $sclass. Since we do not have any named params, there won't be such a key, and we can verify that with this bit of code:

:().params.classify(*.named).say
# OUTPUT:
# {}
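
For contrast, here's roughly what the classification looks like when named parameters are present (a quick snippet of mine; the gist of the output is approximate):

:($pos, :$named).params.classify(*.named).say;
# OUTPUT (roughly):
# {False => [$pos], True => [:$named]}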

When we go to define $here, we get an Any from $sclass{True}, since that key doesn't exist, and when we call .SetHash on it, we get our problematic SetHash object with an Any in it. And so, we have our fix for the bug: ensure the True key in $sclass is actually there before creating a SetHash out of its value:

my $here=($sclass{True}:v).SetHash;
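
The :v adverb does the heavy lifting here: on a non-existent key it returns Empty rather than Any, so nothing bogus ends up in the SetHash. A quick demonstration (my own snippet, not part of the patch):

my %h;
dd %h<nope>.SetHash;        # SetHash.new(Any): the Any sneaks in
dd (%h<nope>:v).SetHash;    # SetHash.new(): missing key yields Empty, no Any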

Add that to our playground file with the FixedSignature role in it, run it, and verify the fix works. Now, simply transplant the fix back into src/core/Signature.pm and then compile the compiler.

perl Configure.pl --gen-moar --gen-nqp --backends=moar
make
make test
make install

Verify our fix worked before we proceed onto the final stages:

$ make t/spec/S06-signature/outside-subroutine.t
...
All tests successful.
Files=1, Tests=4,  1 wallclock secs ( 0.03 usr  0.00 sys +  0.32 cusr  0.02 csys =  0.37 CPU)
Result: PASS

A Clean Kill

So far, all we know is the bug we found was fixed and the tests we wrote for it pass. However, before we ship our fix, we must ensure we didn't break anything else. There are other devs working from the same repo and you'll be interfering with their work if you break stuff.

Run the full Roast test suite with the make spectest command. You can use the TEST_JOBS environment variable to specify the number of simultaneous tests. Generally, a value slightly higher than the number of available cores works the fastest... and cores make all the difference. On the 24-core VM I cut releases on, the spectest completes in about 1 minute and 15 seconds. On my 2-core web server, it takes about 25 minutes. You get the idea.

TEST_JOBS=28 make spectest
...
All tests successful.
Files=1111, Tests=52510, 82 wallclock secs (13.09 usr 2.44 sys + 1517.34 cusr 97.67 csys = 1630.54 CPU)
Result: PASS

Once the spectest completes and we have a clean bill of health, we're ready to ship our fix. Commit the Rakudo fix, then go into t/spec and commit the roast fix:

git commit -m 'Fix Smartmatch with two signatures, only one of which has slurpy hash' \
           -m 'Fixes RT#128795' src/core/Signature.pm
git push

cd t/spec
git commit -m 'smartmatch on signature with no slurpy on right side does not crash' \
           -m 'RT#128795' S06-signature/outside-subroutine.t
git push

If you're pushing to your fork of these projects, you have to go the extra step and submit a Pull Request (just go to your fork and GitHub should display a button just for that).

And we're done! Celebrate with the appropriate amount of fun.

Conclusion

Rakudo bugs can be easy to fix, requiring not much more than knowledge of Perl 6. To fix them, you don't need to re-compile the entire compiler; you can instead define a small role with the method you're trying to fix, and modify and recompile just that.

It's important to add tests for the bug into the official test suite and it's also important to run the full spectest after you fix the bug. But most important of all, is to have fun fixing it.

-Ofun

IRC::Client: Perl 6 Multi-Server IRC (or Awesome Async Interfaces with Perl 6)

Read this article on Perl6.Party

I wrote my first Perl 6 program—a New Years IRC Party bot—around Christmas, 2015. The work included releasing the IRC::Client module, and given my virginity with the language and blood alcohol level appropriate for the Holiday Season, the module ended up sufficiently craptastic.

Recently, I needed a tool for some Perl 6 bug queue work, so I decided to lock myself up for a weekend and re-design and re-write the module from scratch. Multiple people bugged me to do so over the past months, so I figured I'd also write a tutorial for how to use the module—as an apology for being a master procrastinator. And should IRC be of no interest to you, I hope the tutorial will prove useful as a general example of async, non-blocking interfaces in Perl 6.

The Basics

To create an IRC bot, instantiate an IRC::Client object, giving it some basic info, and call the .run method. Implement all of the functionality you need as classes with method names matching the events you want to listen to and hand those in via the .plugins attribute. When an IRC event occurs, it's passed to all of the plugins, in the order you specify them, stopping if a plugin claims it handled the event.

Here's a simple IRC bot that responds to being addressed in-channel, notices, and private messages sent to it. The response is the uppercased original message the bot received:

use IRC::Client;
.run with IRC::Client.new:
    :nick<MahBot>
    :host<irc.freenode.net>
    :channels<#perl6>
    :debug
    :plugins(class { method irc-to-me ($_) { .text.uc } })

And here's what the bot looks like when running:

<Zoffix> MahBot, I ♥ you!
<MahBot> Zoffix, I ♥ YOU!

The :nick, :host, and :channels are the nick for your bot, the server it should connect to, and the channels it should join. The :debug controls how much debugging output to display; we'll set it to value 1 here, for sparse debug output, just to see what's happening. Tip: install the optional Terminal::ANSIColor module to make the debug output purty.

For the .plugins attribute, we hand in an anonymous class. If you have multiple plugins, just shove them all in, in the order you want them to receive events in:

:plugins(PlugFirst.new, PlugSecond.new(:conf), class { ... })

The plugin class of our uppercasing bot has a single method that listens to irc-to-me event, triggered whenever the bot is addressed in-channel or is sent a private message or notice. It receives a single argument: one of the objects that does the IRC::Client::Message role. We stick it into the $_ topical variable to save a bit of typing.

We reply to the event by returning a value from the method. The original text is contained inside the .text attribute of the message object, so we'll call .uc method on it to uppercase the content and that's what our reply will be.

As awesome as our uppercasing bot is, it's as useful as an air conditioner on a polar expedition. Let's teach it some tricks.

Getting Smarter

We'll call our new plugin Trickster and it'll respond to commands time—that will give the local time and date—and temp—that will convert temperature between Fahrenheit and Celsius. Here's the code:

use IRC::Client;

class Trickster {
    method irc-to-me ($_) {
        given .text {
            when /time/ { DateTime.now }
            when /temp \s+ $<temp>=\d+ $<unit>=[F|C]/ {
                when $<unit> eq 'F' { "That's {($<temp> - 32) × .5556}°C" }
                default             { "That's { $<temp> × 1.8 + 32   }°F" }
            }
            'huh?'
        }
    }
}

.run with IRC::Client.new:
    :nick<MahBot>
    :host<irc.freenode.net>
    :channels<#perl6>
    :debug
    :plugins(Trickster)

<Zoffix> MahBot, time
<MahBot> Zoffix, 2016-07-23T19:00:15.795551-04:00
<Zoffix> MahBot, temp 42F
<MahBot> Zoffix, That's 5.556°C
<Zoffix> MahBot, temp 42C
<MahBot> Zoffix, That's 107.6°F
<Zoffix> MahBot, I ♥ you!
<MahBot> Zoffix, huh?

The code is trivial: we pass the given text over a couple of regexes. If it contains the word time, we return the current time. If it contains the word temp, we do the appropriate math, based on whether the given number is postfixed by an F or a C. And if no matches happen, we end up returning the inquisitive huh?.

There's an obvious problem with this new and improved plugin: the bot no longer loves me! And while I'll survive the heartache, I doubt any other plugin will teach the bot to love again, as Trickster consumes all irc-to-me events, even if it doesn't recognize any of the commands it can handle. Let's fix that!

Passing The Buck

There's a special value that can be returned by an event handler to signal that it did not handle the event and that the event should be propagated to further plugins and event handlers. That value is provided by the .NEXT attribute offered by the IRC::Client::Plugin role, which a plugin must do in order to obtain that attribute. The role is automatically exported when you use IRC::Client.

Let's look at some code utilizing that special value. Note that since .NEXT is an attribute and we can't look up attributes on type objects, you need to go the extra step and instantiate your plugin classes when giving them to :plugins.

use IRC::Client;

class Trickster does IRC::Client::Plugin {
    method irc-to-me ($_) {
        given .text {
            when /time/ { DateTime.now }
            when /temp \s+ $<temp>=\d+ $<unit>=[F|C]/ {
                when $<unit> eq 'F' { "That's {($<temp> - 32) × .5556}°C" }
                default             { "That's { $<temp> × 1.8 + 32   }°F" }
            }
            $.NEXT;
        }
    }
}

class BFF does IRC::Client::Plugin {
    method irc-to-me ($_) {
        .text ~~ /'♥'/ ?? 'I ♥ YOU!' !! $.NEXT
    }
}

.run with IRC::Client.new:
    :nick<MahBot>
    :host<irc.freenode.net>
    :channels<#perl6>
    :debug
    :plugins(Trickster.new, BFF.new)

<Zoffix> MahBot, time
<MahBot> Zoffix, 2016-07-23T19:37:45.788272-04:00
<Zoffix> MahBot, temp 42F
<MahBot> Zoffix, That's 5.556°C
<Zoffix> MahBot, temp 42C
<MahBot> Zoffix, That's 107.6°F
<Zoffix> MahBot, I ♥ you!
<MahBot> Zoffix, I ♥ YOU!

We now have two plugins that both subscribe to the irc-to-me event. The :plugins attribute receives the Trickster plugin first, so its event handler will be run first. If the received text does not match either of Trickster's regexes, it returns $.NEXT from the method.

That signals the Client Object to go hunting for other handlers, so it gets to BFF's irc-to-me handler. There, we reply if the input contains a heart; if not, we return $.NEXT here too.

While the bot got its sunny disposition back, it did so at the cost of quite a bit of extra typing. What can we do about that?

Multify All The Things!

Perl 6 supports multi-dispatch as well as type constraints in signatures. On top of that, smartmatch against IRC::Client's message objects that have a .text attribute uses the value of that attribute. Combine all three of those features and you end up with ridiculously concise code:

use IRC::Client;
class Trickster {
    multi method irc-to-me ($ where /time/) { DateTime.now }
    multi method irc-to-me ($ where /temp \s+ $<temp>=\d+ $<unit>=[F|C]/) {
        $<unit> eq 'F' ?? "That's {($<temp> - 32) × .5556}°C"
                       !! "That's { $<temp> × 1.8 + 32   }°F"
    }
}

class BFF { method irc-to-me ($ where /'♥'/) { 'I ♥ YOU!' } }

.run with IRC::Client.new:
    :nick<MahBot>
    :host<irc.freenode.net>
    :channels<#perl6>
    :debug
    :plugins(Trickster, BFF)

<Zoffix> MahBot, time
<MahBot> Zoffix, 2016-07-23T19:59:44.481553-04:00
<Zoffix> MahBot, temp 42F
<MahBot> Zoffix, That's 5.556°C
<Zoffix> MahBot, temp 42C
<MahBot> Zoffix, That's 107.6°F
<Zoffix> MahBot, I ♥ you!
<MahBot> Zoffix, I ♥ YOU!

Outside of the signature, we no longer have any need for the message object, so we use the anonymous $ parameter in its place. We then type-constrain that parameter with a regex match, and so the method will be called only if the text of the message matches that regex. Since no methods will be called on failed matches, we no longer have to mess around with the whole $.NEXT business or compose any roles into our plugins.

The bodies of our methods each have a single statement that produces the response value for the event. In the temperature converter, we use the ternary operator to select which formula to use for the conversion, depending on the unit requested, and yes, the $<unit> and $<temp> captures created in the signature type constraint match are available in the method's body.
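
The same where-constrained dispatch works in plain multi subs, so it's easy to experiment with outside of a bot. A tiny standalone example:

multi greet ($ where /morning/) { 'Good morning!' }
multi greet ($ where /night/)   { 'Good night!'   }
say greet 'night night';  # OUTPUT: Good night!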

An Eventful Day

Along with standard named and numerical IRC protocol events, IRC::Client offers convenience events. One of them we've already seen: the irc-to-me event. Such events are layered, so one IRC event can trigger several of IRC::Client's events. For example, if someone addresses our bot in a channel, the following chain of events will be fired:

irc-addressed  ▶  irc-to-me  ▶  irc-privmsg-channel  ▶  irc-privmsg  ▶  irc-all

The events are ordered from "narrowest" to "widest": irc-addressed can be triggered only in-channel, when our bot is addressed; irc-to-me can also be triggered via notice and private message, so it's wider; irc-privmsg-channel includes all channel messages, so it's wider still; and irc-privmsg also includes private messages to our bot. The chain ends with the widest event of them all: irc-all.

If a plugin's event handler returns any value other than $.NEXT, later events in the event chain won't be fired, just as plugins later in the plugin chain won't be tried for the same reason. Each event is tried on all of the plugins, before attempting to handle a wider event.
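
To see that ordering in action, here's a sketch of two plugins listening at different "widths." If Narrow's constraint matches an addressed message, the chain stops there; otherwise, the Client Object keeps widening until Wide's irc-privmsg-channel handler gets its chance:

use IRC::Client;

class Narrow {
    # Fires only when the bot is addressed in-channel:
    method irc-addressed ($ where /ping/) { 'pong' }
}

class Wide {
    # Fires for channel messages a narrower handler didn't consume:
    method irc-privmsg-channel ($ where /ping/) { 'I saw a ping fly by!' }
}

.run with IRC::Client.new:
    :nick<MahBot>
    :host<irc.freenode.net>
    :channels<#perl6>
    :plugins(Narrow, Wide)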

By setting the :debug attribute to level 3 or higher, you'll see the emitted events in the debug output. For example, you can watch our bot attempt to handle the unknown command blarg and then process the time command via the irc-to-me event handler we defined.

All of IRC::Client's events have the irc- prefix, so you can freely define auxiliary methods in your plugin without worrying about conflicting with event handlers. Speaking of emitting things...

Keep 'Em Commin'

Responding to commands is sweet and all, but many bots will likely want to generate some output of their own volition. As an example, let's write a bot that will annoy us whenever we have unread GitHub notifications!

use IRC::Client;
use HTTP::Tinyish;
use JSON::Fast;

class GitHub::Notifications does IRC::Client::Plugin {
    has Str  $.token  = %*ENV<GITHUB_TOKEN>;
    has      $!ua     = HTTP::Tinyish.new;
    constant $API_URL = 'https://api.github.com/notifications';

    method irc-connected ($) {
        start react {
            whenever self!notification.grep(* > 0) -> $num {
                $.irc.send: :where<Zoffix>
                            :text("You have $num unread notifications!")
                            :notice;
            }
        }
    }

    method !notification {
        supply {
            loop {
                my $res = $!ua.get: $API_URL, :headers{ :Authorization("token $!token") };
                $res<success> and emit +grep *.<unread>, |from-json $res<content>;
                sleep $res<headers><X-Poll-Interval> || 60;
            }
        }
    }
}

.run with IRC::Client.new:
    :nick<MahBot>
    :host<irc.freenode.net>
    :channels<#perl6>
    :debug
    :plugins(GitHub::Notifications.new)

[00:25:41] -MahBot- Zoffix, You have 20 unread notifications!
[00:26:41] -MahBot- Zoffix, You have 19 unread notifications!

We create a GitHub::Notifications class that does the IRC::Client::Plugin role. That role gives us the $.irc attribute, which is the IRC::Client object we'll use to send messages to us on IRC.

Aside from the irc-connected method, the class is just like any other: a public $.token attribute for our GitHub API token, a private $!ua attribute that keeps our HTTP User Agent object around, and a private notification method, where all the action happens.

Inside notification, we create a Supply that will emit the number of unread notifications we have. It does so by using an HTTP::Tinyish object to access a GitHub API endpoint. When a request succeeds, it parses the returned JSON and greps the message list for any items with the unread property set to true. The prefix + operator converts the list to an Int that is the total number of items found, which is what we emit from our supply.

The irc-connected event handler gets triggered when we successfully connect to an IRC server. In it, we start an event loop that reacts whenever we receive the current unread messages count from the supply given by our notification method. Since we're only interested in cases where we do have unread messages, we also pop a grep on the supply to filter out the cases without any messages (yes, we could avoid emitting those in the first place, but I'm showing off Perl 6 here 😸). And once we do have unread messages, we simply call IRC::Client's .send method, asking it to send us an IRC notice with the total number of unread messages. Pure awesomeness!
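
The grep-on-a-Supply trick is plain Perl 6, so it's easy to try on its own:

my $notifications = Supply.from-list(0, 2, 0, 5);
$notifications.grep(* > 0).tap: { say "You have $_ unread notifications!" };

# OUTPUT:
# You have 2 unread notifications!
# You have 5 unread notifications!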

Don't Wait Up

We've covered the cases where we have an asynchronous supply of values we send to IRC, and where we reply to a command right away. However, it's not uncommon for a bot command to take some time to execute. In those cases, we don't want the bot to lock up while the command is doing its thing.

Thanks to Perl 6's excellent concurrency primitives, it doesn't have to! If an event handler returns a Promise, the Client Object will use its .result as the reply when it is kept. This means that in order to make our blocking event handler non-blocking, all we have to do is wrap its body in a start { ... } block. What could be simpler?
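
Here's a sketch of that difference, using a hypothetical plugin that takes 10 seconds to compute its reply. The only change between the blocking and non-blocking versions is the start keyword:

class Fetcher {
    # Blocking version; the bot would stall for the full 10 seconds:
    # method irc-to-me ($ where /fetch/) { sleep 10; 'All done!' }

    # Non-blocking version; we return a Promise right away and the
    # Client Object replies with its .result once it's kept:
    method irc-to-me ($ where /fetch/) { start { sleep 10; 'All done!' } }
}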

As an example, let's write a bot that will respond to the bash command. The bot will fetch bash.org/?random1, parse the quotes out of the HTML, and keep them in a cache. When the command is triggered, the bot will hand out one of the quotes, repeating the fetching when the cache runs out. In particular, we don't want the bot to block while retrieving and parsing the web page. Here's the full code:

use IRC::Client;
use Mojo::UserAgent:from<Perl5>;

class Bash {
    constant $BASH_URL = 'http://bash.org/?random1';
    constant $cache    = Channel.new;
    has        $!ua    = Mojo::UserAgent.new;

    multi method irc-to-me ($ where /bash/) {
        start $cache.poll or do { self!fetch-quotes; $cache.poll };
    }

    method !fetch-quotes {
        # Fetch the quotes page, grab the text of each .qt paragraph, flatten
        # each multi-line quote into a single line, and feed them all to the cache:
        $cache.send: $_
            for $!ua.get($BASH_URL).res.dom.find('.qt').each».all_text».lines».join('  ');
    }
}

.run with IRC::Client.new:
    :nick<MahBot>
    :host<irc.freenode.net>
    :channels<#perl6>
    :debug
    :plugins(Bash.new)

<Zoffix> MahBot, bash
<MahBot> Zoffix, <Time> that reminds me of when Manning and I installed OS/2 Warp4 on a box and during the install routine it said something to the likes of 'join the hundreds of people on the internet'

For page fetching needs, I chose Perl 5's Mojo::UserAgent, since it has an HTML parser built-in. The :from<Perl5> adverb indicates to the compiler that we want to load a Perl 5, not Perl 6, module.

Since we're multi-threading, we'll use a Channel as a thread-safe queue for our caching purposes. We subscribe to the irc-to-me event where the text contains the word bash. When the event handler is triggered, we pop out to a new thread using the start keyword. Then we .poll our cache and use the cached value if we have one; otherwise, the logic moves on to the do block that calls the fetch-quotes private method and, when that completes, polls the cache once more, getting a fresh quote. All said and done, a quote will be the result of the Promise we return from the event handler.

The fetch-quotes method fires up our Mojo::UserAgent object, which fetches the random quotes page from the website and finds all HTML elements that have class="qt" on them—those are the paragraphs with quotes. Then we use a hyper method call to convert those paragraphs to just text, and that final list is fed to our $cache Channel via a for loop. And there you go: we non-blockingly connected our bot to the cesspit of the IRC world. And speaking of things you may want to filter...

Watch Your Mouth!

Our bot would get banned rather quickly if it spewed enormous amounts of output into channels. An obvious solution is to include logic in our plugins that would use a pastebin if the output is too large. However, it's pretty impractical to add such a thing to every plugin we write. Luckily, IRC::Client has support for filters!

For any method that issues a NOTICE or PRIVMSG IRC command, IRC::Client will pass the output through the filters given to it via the :filters attribute. This means we can set up a filter that will automatically pastebin large output, regardless of which plugin it comes from.

We'll re-use our bash.org quote bot, except this time it will pastebin large quotes to Shadowcat pastebin. Let's look at some code!

use IRC::Client;
use Pastebin::Shadowcat;
use Mojo::UserAgent:from<Perl5>;

class Bash {
    constant $BASH_URL = 'http://bash.org/?random1';
    constant $cache    = Channel.new;
    has        $!ua    = Mojo::UserAgent.new;

    multi method irc-to-me ($ where /bash/) {
        start $cache.poll or do { self!fetch-quotes; $cache.poll };
    }

    method !fetch-quotes {
        $cache.send: $_
            for $!ua.get($BASH_URL).res.dom.find('.qt').each».all_text;
    }
}

.run with IRC::Client.new:
    :nick<MahBot>
    :host<irc.freenode.net>
    :channels<#zofbot>
    :debug
    :plugins(Bash.new)
    :filters(
        -> $text where .lines > 1 || .chars > 300 {
            Pastebin::Shadowcat.new.paste: $text.lines.join: "\n";
        }
    )

<Zoffix> MahBot, bash
<MahBot> Zoffix, <intuit> hmm maybe sumtime next week i will go outside'
<Zoffix> MahBot, bash
<MahBot> Zoffix, http://fpaste.scsys.co.uk/528741

The code that does all the filtering work is small enough that it's easy to miss—it's the last 5 lines in the program above. The :filters attribute takes a list of Callables, and here we're passing a pointy block. In its signature we constrain the text to be more than 1 line or more than 300 characters long, so our filter will run only when those criteria are met. Inside the block, we simply use the Pastebin::Shadowcat module to throw the output onto the pastebin. Its .paste method returns the URL of the newly-created paste, which is what our filter will replace the original content with. Pretty awesome!

It Spreads Like Butter

In the past, when I used other IRC client tools, whenever someone asked me to place my bots on other servers, the procedure was simple: copy over the code to another directory, change config, and you're done. It almost made sense that a new server would mean a "new" bot: different channels, different nicknames, and so on.

In Perl 6's IRC::Client, I tried to re-imagine things a bit: a server is merely another identifier for a message, along with a channel or nickname. This means connecting your bot to multiple servers is as simple as adding new server configurations via the :servers attribute:

use IRC::Client;

class BFF {
    method irc-to-me ($ where /'♥'/) { 'I ♥ YOU!' }
}

.run with IRC::Client.new:
    :debug
    :plugins(BFF)
    :nick<MahBot>
    :channels<#zofbot>
    :servers(
        freenode => %(
            :host<irc.freenode.net>,
        ),
        local => %(
            :nick<P6Bot>,
            :channels<#zofbot #perl6>,
            :host<localhost>,
        )
    )

[on Freenode server]
<ZoffixW> MahBot, I ♥ you
<MahBot> ZoffixW, I ♥ YOU!

[on local server]
<ZoffixW> P6Bot, I ♥ you
<P6Bot> ZoffixW, I ♥ YOU!

Notice that our plugin remains entirely oblivious that it's being run on multiple servers. Its replies get redirected to the correct server, and IRC::Client still executes its method handler in a thread-safe way.

In IRC::Client's constructor we added a :servers attribute that takes a Hash. Its keys are server labels and its values are server-specific configurations that override the global settings. So the freenode server gets its :nick and :channels from the :nick and :channels attributes we give to IRC::Client, while the local server overrides those with its own values.

The debug output now has server labels printed, to indicate which server each event applies to.

And so, by simply telling the bot to connect to another server, we made it multi-server, without making any changes to our plugins. But what do we do when we do want to talk to a specific server?

Send It That Way

When the bot is .run, the Client Object changes the values of the :servers attribute to be IRC::Client::Server objects. Those stringify to the label of the server they represent, and we can get them either from the .server attribute of the Message Object or the .servers hash attribute of the Client Object. Client Object methods such as .send or .join take an optional :server argument that controls which server the message will be sent to; it defaults to the value *, which means "send to every server".

Here's a bot that connects to two servers and joins several channels. Whenever it sees a channel message, it forwards it to all other channels and sends a private message to user Zoffix on the server designated by the label local.

use IRC::Client;

class Messenger does IRC::Client::Plugin {
    method irc-privmsg-channel ($e) {
        for $.irc.servers.values -> $server {
            for $server.channels -> $channel {
                next if $server eq $e.server and $channel eq $e.channel;

                $.irc.send: :$server, :where($channel), :text(
                    "$e.nick() over at $e.server.host()/$e.channel() says $e.text()"
                );
            }
        }

        $.irc.send: :where<Zoffix>
                    :text('I spread the messages!')
                    :server<local>;
    }
}

.run with IRC::Client.new:
    :debug
    :plugins[Messenger.new]
    :nick<MahBot>
    :channels<#zofbot>
    :servers{
        freenode => %(
            :host<irc.freenode.net>,
        ),
        local => %(
            :nick<P6Bot>,
            :channels<#zofbot #perl6>,
            :host<localhost>,
        )
    }

[on Freenode server/#zofbot]
<ZoffixW> Yey!
[on local server/#zofbot]
<P6Bot> ZoffixW over at irc.freenode.net/#zofbot says Yey!
[on local server/#perl6]
<P6Bot> ZoffixW over at irc.freenode.net/#zofbot says Yey!
[on local server/ZoffixW private message queue]
<P6Bot> I spread the messages!

We subscribe to the irc-privmsg-channel event and, when it's triggered, we loop over all the servers. For each server, we loop over all of the connected channels and use the $.irc.send method to send a message to that particular channel and server, unless the server and channel are the same as where the message originated.

The message itself calls .nick, .channel, and .server.host methods on the Message Object to identify the sender and origin of the message.

Conclusion

Perl 6 offers powerful concurrency primitives, dispatch methods, and introspection that let you build awesome non-blocking, event-based interfaces. One of them is IRC::Client, which lets you talk to IRC networks. It's here. It's ready. Use it!

Perl 6 Hands-On Workshop: Weatherapp (Part 3)

Read this article on Perl6.Party

Be sure to read Part 1 and Part 2 of this workshop first.

There is black box testing, glass box testing, unit testing, integration testing, functional testing, system testing, end-to-end testing, sanity testing, regression testing, acceptance testing, load testing, stress testing, performance testing, usability testing, and many more types of testing.

I'll leave it for people with thicker glasses to explain all of the types. Today, we'll write tests that ensure our weather reporting module works as expected, and as a bonus, you get to pick your own label for what type of tests these are. Let's dive in!

TDD

TDD (Test-Driven Development) is where you write a bunch of tests before you write the actual code, ensure they fail—because code to satisfy them isn't there yet—and then you write code until the tests succeed. Now you can safely refactor your code or add new features without worrying you'll break something. Rinse and repeat.

Not only do you avoid having to convince yourself to bother writing tests after your code seems to work, you also get a feel for how comfortable your interface is to use before you even create it.

Testing Modules

Perl 6 comes with a number of standard modules included, one of which is a module called Test that we'll use. The Ecosystem also has dozens of other test-related modules, and we'll use two of them: Test::When and Test::META.

Test provides all the generic testing routines we'll use, Test::When will let us run specific types of tests only when the user has actually agreed to run them, and Test::META will keep an eye on the sanity of our distribution's META file (more on that later).

To install Test::When and Test::META, run zef install Test::When Test::META or panda install Test::When Test::META, depending on which module manager you're using.

Testing Files

Our testing files are named with the extension .t and go into the t/ directory. They will be automatically discovered and run by module managers during the installation of our module.

You are free to organize your tests under subdirectories; they will still be found automatically. It's also common to prefix the names of tests with a sequential number, e.g. 00-init.t, 01-methods.t, etc. This is more of an organizational practice, and in no way should tests in one file depend on whether tests in another file ran first.

Boilerplate

use Test;

use My::Module;
is get-stuff(), 'the right stuff', 'The stuff we received is correct';

done-testing;

# or

use Test;

plan 1;

use My::Module;
is get-stuff(), 'the right stuff', 'The stuff we received is correct';

The two versions above differ in that the first doesn't care how many tests you run and the second expects exactly one test to run. The former knows all tests ran when done-testing is called while the latter counts how many ran and complains if the count doesn't match the plan.

The version without a plan is generally easier to use, especially in a highly collaborative environment where multiple people might be adding tests to the file, where keeping an accurate test count becomes annoying. The one thing to be careful about with the planless method is this:

my @results = get-results;
for @results.kv -> $i, $v {
    is $v, 'expected', "result #{$i+1} is correct";
}

These tests will pass regardless of how many results are in @results, even if there are none! So we should add an extra test that ensures @results contains the correct number of results:

is @results.elems, 5, 'got five results';

Our Files

We'll create two test files and our directory structure will look like this:

t
├── 01-use.t
├── author
│   └── 01-meta.t
├── key
└── online
    └── 01-weather-for.t

We placed our META file test into an author subdirectory because that test is useful only for us, not the user, so there's no point in requiring them to install the extra modules. The same logic applies to other tests, like ones that check documentation completeness or any other test whose failure does not mean the module itself is broken. No one wants their build to stall just because you didn't document a new experimental method, so we should avoid running those tests on the installer's machine.

Our main test file goes into the online directory, as it will be run only when the installer requests online tests. The names of these subdirectories are arbitrary; their existence is purely for organizational purposes. Whether the tests are actually run is controlled by the Test::When module.

Last but not least, we have the key file containing our API key. This way, we don't hardcode it into any one test, it's more obvious that this sort of data is present in our codebase, and we know where to go if we have to replace it (even if we add multiple files that need the key). Depending on the service you are using, you may choose to make the key entirely private and ask the installer to enter their own key. Some services offer tester keys or sandboxed endpoints precisely for the purposes of users running tests.
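
Reading the key in a test is then a one-liner (the path is relative to the distribution's root, which is where the tests get run from):

my $key = 't/key'.IO.slurp.trim;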

The 01-use.t and author/01-meta.t tests are rather unspectacular.

# t/01-use.t
use Test;

use-ok 'WebService::Weather';

done-testing;

We call use-ok, which tests whether the module can be loaded, and we give it the name of our future module as the argument. Generally, this test isn't needed, since you're going to use your module to bring in the functionality for testing anyway. In this particular case, however, all of our other tests may get skipped (if the installer doesn't ask for author/online tests), resulting in Result: NOTESTS output, which I don't entirely trust all module installers to know to interpret as success.

The Meta file test is just a copy-paste from the docs, which works for any distribution:

# t/author/01-meta.t
use Test::When <author>;
use Test;
use Test::META;

meta-ok;

done-testing;

In both tests we include the Test module and call done-testing at the end. In the Meta file test we've used use Test::When <author> to indicate this test is an author test, and we'll need to set an environmental variable for it to run—more on that later.

Main Test

To write the main test, we'll peek at what sort of values the API returns and try to model them. We need to strike a balance between knowing we received a legit value from our subroutine or method, while not making the test so precise that it fails the minute the valid value we receive decides to wear a hat and put on makeup.

Here's the code for the test:

# t/online/01-weather-for.t
use Test::When <online>;
use Test;
use WebService::Weather;

for ('London'), ('London', 'ca') -> $args {
    subtest {
        my $result = weather-for |$args;

        isa-ok $result, 'WebService::Weather::Result',
            'result is of a correct data type';

        does-ok $result."$_"(), Numeric, "$_ is numerical"
            for <temp wind precip>;

        cmp-ok $result.temp,   &[<],  70,   'temperature is not too high';
        cmp-ok $result.temp,   &[>],  -100, 'temperature is not too low';
        cmp-ok $result.wind,   &[<],  120,  'wind speed is not too high';
        cmp-ok $result.wind,   &[>=], 0,    'wind speed is not too low';
        cmp-ok $result.precip, &[<],  3200, 'precipitation is not too high';
        cmp-ok $result.precip, &[>=], 0,    'precipitation is not too low';
    }, "Testing with args: $args";
}

isa-ok weather-for('blargs' x 12), Failure,
    'we get a Failure for unknown city';

done-testing;

We use Test::When to mark this test as requiring an active Internet connection, so the test will only run when the installer explicitly requests to do so via an environmental variable. We also use the module we'll make.

In the first for loop, we iterate over two sets of arguments: city only and city + country. The loop executes a subtest on each iteration, delineating our results nicely in the output. When we call weather-for, we Slip each set of arguments in and save the return value into our $result.

We follow the interface described in our DESIGN doc to write the tests for the result. It needs to be an object and it has .temp, .wind, and .precip methods and their values are Numeric.

The isa-ok sub tests our result is of the correct class and does-ok sub checks all of the return values do the Numeric role—note how we simply used another for loop there, to avoid duplicating the test code.

The last segment of the test uses a bunch of cmp-ok tests to check the sanity of the range of the returned values. Since we don't know what the weather will be like on the day we run the test, we can't check for exact values. I've consulted the list of weather records to get an idea of the range of values to expect.

Lastly, outside our main for loop, we have one more test that gives weather-for a garbage city name and tests that it returns a Failure object.

We're done with our tests, so let's commit them:

git add t
git commit -m 'Write tests'
git push

Your distribution structure should look something like this now.

Extra Testing

Our tests did not test absolutely everything that can be tested. What happens when a city is an empty string? What happens when it's not a string? What happens when we give a garbage value for the country? What happens when network connection fails?

We could add that, but keep one thing in mind: tests are code and code needs maintenance. If adding a couple lines of code to your program requires you to also dig through thousands of lines of tests, you're going to have a bad day.

So how much testing is enough? It depends on the type of the software you're writing. If your software failing will result in the loss of human life (e.g. medical software) or loss of a large investment (e.g. software for space probes) you better make sure you test every possible case. On the other end, if you're writing a cowsay clone, you may scrimp on tests for the sake of easier maintenance.

Running The Tests

To run the tests, we use the prove command and pass perl6 as the executable to use. Since the modules we're writing tend to live in the lib/ directory, we should also pass the -I command line switch to include that directory in the module search path. We'll also tell prove to find test files recursively and be verbose with its output. Thus, the full command is:

prove -e 'perl6 -Ilib' -vr t/

Where t/ is the directory with our tests, but we can give it individual test files as well. For convenience, I aliased the above command in my .bash_aliases file:

alias prove6="prove -e 'perl6 -Ilib' -vr"

And then I just use it as

prove6 t/

Try running the tests right now. Unsurprisingly, they fail!

...
# Failed test 'The module can be use-d ok'
...

These failures will be our instructions on what to do next while implementing the module, which we'll cover in the next post!

Refining the Design

At this point, we've gotten a feel for using the code we haven't even written yet, and code at this stage is much cheaper to change than code we've already written and shipped. Does anything feel off or awkward to use? Are we missing anything? Does anything seem redundant? If yes, we should probably alter our design.

Three things jump out with our weather module:

  • We don't know why we failed. Was the city name wrong? Did the service change and now we're not giving it the correct arguments? Was it a network error? Perhaps we should add some exception classes and throw one of them, depending on the error (see the sketch after this list).
  • We don't know whether we got the weather for the correct city. Calling with ('London') gives weather for London in Britain, but calling with ('London', 'ca') gives weather for London in Ontario, Canada. Perhaps, we could add a .location method to our result object that would return City + Country of the actual location we received the weather for.
  • An astute reader will notice we never specced how weather-for obtains the API key! There are several approaches. We can specify it on the use line or call a key subroutine and store it in a class variable—both of which will restrict your program to use just one API key. Another way may be to pass a :key named argument to weather-for or even redesign the interface to be Object Oriented, with key specified as an attribute to the WebService::Weather object.
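
As a sketch of that first item, the exception classes might eventually look something like this (hypothetical names; not yet part of our design):

class X::WebService::Weather is Exception {}

class X::WebService::Weather::City is X::WebService::Weather {
    method message { 'Could not find weather for the requested city' }
}

class X::WebService::Weather::Network is X::WebService::Weather {
    method message { 'Could not reach the weather service' }
}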

Homework

Several problems with our code and design were brought up in this article: we don't know how to specify the API key to use, the tests don't test for everything, and we could use some extra features, such as precise failure mode indicators and providing the location in the result.

Try to alter the design and modify the tests to accommodate that stuff.

Conclusion

Today, we broke ground by laying down the first code for our app. This code tests the functionality of the actual app code we're yet to write.

Ensuring your code works is important, and having automated tests do that for you lets you modify your code without fear that you'll break something. The amount of tests you write depends on the type of your application. Tests require maintenance, so you need to strike a balance between having your application work "correctly enough" and adding extra maintenance work for yourself.

In the next post, we'll write the actual code to fetch weather information. Get excited!

Perl 6 Hands-On Workshop: Weatherapp (Part 2)

Read this article on Perl6.Party

Be sure to read Part 1 of this workshop first.

Imagine writing 10,000 lines of code and then throwing it all away. Turns out when the client said "easy to use," they meant being able to access the app without a password, but you took it to mean a "smart" UI that figures out user's setup and stores it together with their account information. Ouch.

The last largish piece of code where I didn't bother writing design docs was 948 lines of code and documentation. That doesn't include a couple of supporting plugins and programs I wrote using it. I had to blow it all up and re-start from scratch. There weren't any picky clients involved. The client was me and in the first 10 seconds of using that code in a real program, I realized it sucked. Don't be like me.

Today, we'll write a detailed design for our weather reporting program. There are plenty of books and opinions on the subject, so I won't tell you how you should do design. I'll tell you how I do it and if at the end you decide that I'm an idiot, well... at least I tried.

The Reason

It's pretty easy to convince yourself that writing design docs is a waste of time. The reason for that feeling is that design future-proofs the software, proving useful only after months or years, and we, squishy sacks of snot and muscle, really like the type of stuff we can touch, see, and run right away. However, unless you can hold all the workings of your program in your head at once, you'll benefit from jotting down the design first. Here are some of the arguments against doing so that I've heard from others or thought about myself:

It's more work / More time consuming

That's only true if you consider the amount of work done today or in the next couple of weeks. Unless it's a one-off script that can die after that time, you'll have to deal with new features being added, current features being modified, the appearance of new technologies, and the deprecation of old ones.

If you never sat down and actively thought about how your piece of software will handle those things, it'll be more work to change them later on, because you'll have to change the architecture of already-written code and that might mean rewriting the entire program in extreme cases.

There are worse fates than a rewrite, however. How about being stuck with awful software for a decade or more? It does everything you want it to, if you add a couple of convoluted hacks no one knows how to maintain. You can't really change it, because it's a lot of work and too many things depend on it working the way it is right now. Sure, the interface is abhorrent, but at least it works. And you can pretend that piece of code doesn't really exist, until you have to add a new feature to it.

Yeah, tell it to my boss!

You tell them! Listen, if your boss tells you to write a complicated program in one hour... which parts of it would you leave unimplemented, for the client to complain about? Which parts of it would you leave buggy? Which parts of it would you leave non-secure?

Because you're doing the same thing when you don't bother with the design, don't bother with the tests, and don't bother with the documentation. The only difference is that the moment when people find out how screwed everyone is lies further in the future, which lets you delude yourself into thinking those parts can be omitted.

Just as you would tell your boss they aren't giving you enough time in the case I described above, tell them the same if you don't have the time to write down the design or the docs. If they insist the software must be finished sooner, explain the repercussions of omitting the steps you plan to omit, so that when shit hits the fan, it's on them.

I think better in code

This is the trap I myself used to fall into more often than I care to admit. You start writing your "design" by explaining which class goes where and which methods it has and... five minutes in you realize writing all that in code is more concise anyway, so you abandon the idea and start programming.

The cause of that is a design too focused on the code and not enough on the purpose and goals. The more of the design you can write without relying on specific details of an implementation, the more robust your application will be, and, as time passes and technologies come and go, what your app is supposed to do remains clear and in human language. That's not to say there's no place for code in the design. The detailed interface is good to have, and larger software should have its guts designed too. However, try to write your design as something you'd give to a competent programmer to implement, rather than step-by-step instructions that even an idiot could follow and end up with a program.

To give you a real-world example: 8–10 years ago, the biggest argument I had with other web developers was the width of the website. You see, 760–780 pixel maximum width was the golden standard, because some people had 800x600 monitor resolutions and so, if you account for the scrollbar's width, the 780 pixel website fit perfectly without horizontal scrolling. I was of the opinion that it was time for those people to move on to higher resolutions, and often used 900 pixel widths... or even 1000px, when I was feeling especially rebellious.

Now, imagine implementation-specific design docs that address that detail: "The website must be 780 pixels in width." That made sense in the past, but is completely ludicrous today. A better phrasing would've been "The website must avoid horizontal scrolling."

The benefits

Along with the aforementioned benefits of having a written design document, there are two more obvious ones: tests and user documentation.

A well-written and complete design document is the human-language version of decent machine-language tests. It's easier to do TDD (Test Driven Development), which we'll do in the next post in this series, and your tests are less reliant on the specifics of the implementation, so that they don't falsely blow up every time you make a change.

Also, a huge chunk of your design document can be re-used for user documentation. We'll see that first-hand when we get to that part.

The Design

By this point, we have two groups of readers: those who are convinced we need a design and those who need to keep track of the line count of their programs to cry about when they have to rewrite them from scratch, (well, three groups: those who already think I'm an idiot).

We'll pop open DESIGN.md that we started in Part 1 and add our detailed design to it.

Throw Away Your Code

The best code is not the most clever, most documented, or most tested. It's the one that's easiest to throw away and replace. And you can add and remove features and react to technology changes by throwing away code and replacing it with better code. Since replacing the entire program each time is expensive, we need to construct our program out of pieces each of which is easy to throw away and replace.

Our weather program is something we want to run from a command line. If we shove all of our code into a script, we're faced with a problem tomorrow, when we decide to share our creation with our friends in the form of a web application.

We can avoid that issue by packing all functionality into a module that provides a function. A tiny script can call that function and print the output to the terminal, and a web application can provide the output to the web browser.
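
In other words, the command line front-end could be as thin as this sketch (assuming the weather-for function we'll design below):

# bin/weatherapp: a thin wrapper over the module
use WebService::Weather;

sub MAIN (Str $city, Str $country?) {
    my $w = $country ?? weather-for($city, $country) !! weather-for($city);
    printf "%d℃, %dmm precip/3hr, wind %dm/s\n", $w.temp, $w.precip, $w.wind;
}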

We have another weakness on the other end of the app: the weather service we use. It's entirely out of our control whether it continues to be fast enough and cheap enough, or continues to exist at all. A dozen now-defunct pastebin modules I wrote are a testament to how frequently a service can disappear.

We have to reduce the amount of code we'd need to replace, should OpenWeatherMap disappear. We can do that by creating an abstraction of what a weather service is like and implementing as much as we can inside that abstraction, leaving only the crucial bits in an OpenWeatherMap-specific class.
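
In code terms, that abstraction might shape up something like this (a hypothetical sketch; we'll settle on the real names later):

role Weather::Service {
    # Every concrete service must provide the service-specific fetching:
    method fetch (Str $city, Str $country) { ... }

    # ...while shared logic (argument handling, result building) lives here...
}

class Weather::Service::OpenWeatherMap does Weather::Service {
    method fetch (Str $city, Str $country) {
        # the only code we'd have to replace if OpenWeatherMap disappears
    }
}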

Let's write the general outline into our DESIGN.md:

# GENERAL OUTLINE

The implementation is a module that provides a function to retrieve
weather information. The currently supported service
is [OpenWeatherMap](www.openweathermap.org), but the implementation
must allow for easy replacement of services.

Details

Let's put on the shoes of someone who will be using our code and think about the easiest and least error-prone way to do so.

First, what will a call to our function look like? The API tells us all we need is a city name, and if we want to include a country, we just plop its ISO code in after the city, separated with a comma. So, how about this:

my $result = weather-for 'Brampton,ca';

While this will let us write the simplest implementation—we just hand over the given argument to the API—I am not a fan of it. It merges two distinct units of information into one, so any calls where the arguments are stored in variables would have to use a join or string interpolation. Should we choose to make a specific country the default one, we'd have to mess around inspecting the given argument to see whether it already includes a country. Lastly, city names can get rather weird... what happens if a user supplies a city name with a comma in it? The API doesn't address that possibility, so my choice would be to strip commas from city names, which is easiest to do when it's a separate variable. Thus, I'll alter what the call looks like to this:

my $result = weather-for 'Brampton', 'ca';
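
Inside weather-for, the comma-stripping then stays trivial. A sketch of the idea, not the final implementation:

sub weather-for (Str:D $city is copy, Str:D $country = '') {
    $city .= subst(',', '', :g);  # commas in city names would confuse the API
    # ...fetch the data and return a Weather::Result...
}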

As for the return value, we'll return a Weather::Result object. I'll go over what objects are when we write the code. For now, you can think of them as things you can send a message to (by calling a method on it) and sometimes get a useful message back in return. So, if I want to know the temperature, I can call my $t = $weather-object.temp and get a number in $t; and I don't care at all how that value is obtained.
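
Concretely, such a class could start out as small as this sketch (illustrative only; we'll flesh out the real thing when we write the code):

class Weather::Result {
    has Numeric $.temp;    # degrees Celsius
    has Numeric $.precip;  # mm per 3 hours (rain + snow combined)
    has Numeric $.wind;    # meters per second
}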

Our generic Weather::Result object will have a method for each piece of information we're interested in: temperature, information on precipitation, and wind speed. Looking at the available information given by the API, we can merge the amount of rain and the amount of snow into a single method, and for wind I'll only use the speed value itself and not the direction, thus a potential use for our function could look like this:

printf "Current weather is: %d℃, %dmm precip/3hr, wind %dm/s\n",
    .temp, .precip, .wind given weather-for <Brampton ca>;

Looks awesome to me! Let's write all of this into our DESIGN.md:

# INTERFACE DETAILS

## EXPORTED SUBROUTINES

### `weather-for`

    my $result = weather-for 'Brampton', 'ca';

    printf "Current weather is: %d℃, %dmm precip/3hr, wind %dm/s\n",
        .temp, .precip, .wind given $result;

Takes two positional arguments—name of the city and [ISO country
code](http://www.nationsonline.org/oneworld/country_code_list.htm)—to
provide weather information for. The country is optional and by default is
not specified.

Returns a `Weather::Result` object on success, otherwise returns
a `Failure`. The object provides these methods:

#### `.temp`

    say "Current temperature is $result.temp()℃"

Takes no arguments. Returns the `Numeric` temperature in degrees Celsius.

#### `.precip`

    say "Expected to receive $result.precip()mm/3hr of percipitation";

Takes no arguments. Returns the `Numeric` amount of precipitation in
millimeters per three hours.

#### `.wind`

    say "Wind speed is $result.wind()m/s";

Takes no arguments. Returns the `Numeric` wind speed in meters per second.

Great! The interface is done. And the best thing is we can add extra methods to the object we return, to add useful functionality, which brings me to the next part:

Overengineering

It's easy for programmers to overengineer their software. Unlike building a larger house, there's no extra lumber needed to build a larger program. And it's easy to fall into the trap of adding numerous useless features to your software that make it more complicated and more difficult to maintain, without adding any measurable amount of usefulness.

Some examples are:

  • Accepting multiple types of input (Array, Hash, scalars), just because you can.
  • Returning multiple types of output, just because you can figure out what is most likely expected, based on the input or the calling context.
  • Providing both object-oriented and functional interfaces, just because some people like one or the other.
  • Adding a feature, just because it's only a couple of lines of code to add it.
  • Providing detailed settings or configuration, just because...

Note that none of the above features are inherently bad. It's the reasons for why they are added that suck. All of those items make your program more complex, which translates to: more bugs, more code to maintain, more code to write to replicate the interface should the implementation change, and last but not least, more documentation for the user to sift through! It's critical to evaluate the merits of each addition and to justify the extra cost of having it included.

My favourite example of overengineering is the WeChall wargaming website. I'm pretty sure there's a button somewhere on it that will mow my lawn... I just have to find it first.

If I have some "cool" ideas for what my module XYZ can do, I usually simply make sure they're possible to add with my current design, and then... I leave them alone until someone asks me for them.

An astute reader will notice our weather-for can only do metric units and that the wind speed doesn't include the direction, even though the API provides other units and extra information. Well, that's all our fictional client asked for. The code is easy to implement and the entire documentation fits onto half a screen.

If in the future weather-for needs to return Imperial units, we'll simply make it accept an :imperial named argument that will switch it into Imperial units mode. If we ever need the wind direction as well, no problem: we'll just add it as an extra method on the Weather::Result object.
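
If that day ever comes, the call might grow by no more than an adverb (hypothetical):

my $result = weather-for 'Brampton', 'ca', :imperial;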

Do less. Be lazy. In programming, that's a virtue.

Our Repo

Our repository now contains a completed DESIGN.md file. Commit what we wrote today:

git add DESIGN.md
git commit -m 'Write detailed design'
git push

I created a GitHub repo for this project, so you can follow along and ensure you have all the files.

Homework

Amend the design to include either of these features (or both): (1) make it possible for weather-for to use either metric or Imperial units, depending on what the user wants; (2) make it possible to give weather-for actual names for countries rather than ISO country codes.

If you're feeling particularly adventurous, design a Web application that will use our module.

Conclusion

Today, we've learned how to think about the design of software before we create it. It's useful to have the design written down in human language, as that's easier to understand and cheaper to change than code. We wrote the design for our weather application and are now ready to get down and dirty and start writing some code. Coming up next: Tests!

Update: Part 3 is now available!

About Zoffix Znet

I blog about Perl.