Perl 6 Archives

Perl 6 Core Hacking: Where's Da Sauce, Boss?

Read this article on Perl6.Party

Imagine you were playing with Perl 6 and you came across a buglet or you were having some fun with the Perl 6 bug queue—you'd like to debug a particular core subroutine or method, so where's the source for it at?

Asked such a question, you might be told it's in Rakudo compiler's GitHub repository. Depending on how deep down the rabbit hole you wish to go, you may also stop by NQP's repo, which is a subset of Perl 6 that's used in Rakudo, or the MoarVM's repo, which is the leading virtual machine Perl 6 runs on.

The answer is fine, but we can do better. We'd like to know exactly where da sauce is.

Stick to The Basics

The most obvious way is to just use grep command in the source repository. The code is likely in src/ directory, or src/core more specifically.

We'll use a regex that catches sub, method, and multi keywords. For example, here's our search for path sub or method:

$ grep -nER '^\s*(multi|sub|method|multi sub|multi method)\s+path' src/core

src/core/Cool.pm:229:    method path() { self.Stringy.IO }
src/core/CompUnit/Repository/Locally.pm:26:    method path-spec(CompUnit::Repository::Locally:D:) {
src/core/CompUnit/Repository/AbsolutePath.pm:46:    method path-spec() {
src/core/CompUnit/Repository/NQP.pm:32:    method path-spec() {
src/core/CompUnit/Repository/Perl5.pm:46:    method path-spec() {
src/core/CompUnit/PrecompilationStore/File.pm:93:    method path(CompUnit::PrecompilationId $compiler-id,
src/core/CompUnit/PrecompilationUnit.pm:17:    method path(--> IO::Path) { ... }
src/core/IO/Spec/Win32.pm:58:    method path {
src/core/IO/Spec/Unix.pm:61:    method path {
src/core/IO/Handle.pm:714:    method path(IO::Handle:D:)            { $!path.IO }

It's not too terrible, but it's a rather blunt tool. We have these problems:

  • There are false positives; we have several path-spec methods found
  • It doesn't tell us which of the results is for the actual method we have in our code. There's Cool, IO::Spec::Unix, and IO::Handle all with method path in them. If I call "foo".IO.path, which of those get called?

The last one is particularly irksome, but luckily Perl 6 can tell us where the source is from. Let's ask it!

But here's line number... So code me maybe

The Code class from which all subs and methods inherit provides .file and .line methods that tell which file that particular Code is defined in, including the line number:

say "The code is in {.file} on line {.line}" given &foo;

sub foo {
    say 'Hello world!';
}

# OUTPUT:
# The code is in test.p6 on line 3

That looks nice and simple, but it gets more awkward with methods:

class Kitty {
    method meow {
        say 'Meow world!';
    }
}

say "The code is in {.file} on line {.line}" given Kitty.^can('meow')[0];

# OUTPUT:
# The code is in test.p6 on line 2

We got extra cruft of the .^can metamodel call, which returns a list of Method objects. Above we use the first one to get the .file and .line number from, but is it really the method we were looking for? Take a look at this example:

class Cuddly {
    method meow ('meow', 'meow') {
        say 'Meow meow meow!';
    }
}

class Kitty is Cuddly {
    multi method meow ('world') {
        say 'Meow world!';
    }

    multi method meow ('meow') {
        say 'Meow meow';
    }
}

We have a method meow in one class and in another class we have two multi methods meow. How can we print the location of the last method, the one that takes a single 'meow' as an argument?

First, let's take a gander at all the items .^can returns:

say Kitty.^can('meow');
# OUTPUT:
# (meow meow)

Wait a minute, we have three methods in our code, so how come we only have two meows in the output? Let's print the .file and .line for both meows:

for 0, 1 {
    say "The code is in {.file} on line {.line}"
        given Kitty.^can('meow')[$_];
}
# OUTPUT:
# The code is in gen/moar/m-CORE.setting on line 587
# The code is in test.p6 on line 2

The second meow gives us a sane result; it's our method defined in class Cuddly. The first one, however, gives us some weird file.

What's happening here is the line is referencing the proto for the multies. Since in this case instead of providing our own proto we use the autogenerated one, the referenced file has nothing to do with our code. We can, of course, add a proto into the code, but then the line number would still reference the proto, not the last meow method. Is there anything that we can do?

You .cando It!

The Routine class, from which both Method and Sub classes inherit, provides the .cando method. Given a Capture, it returns a list of candidates that can handle it, with the narrowest candidate first in the list, and since the returned object is a Code, we can query its specific .file and .line:

class Cuddly {
    method meow ('meow', 'meow') {
        say 'Meow meow meow!';
    }
}

class Kitty is Cuddly {
    multi method meow ('world') {
        say 'Meow world!';
    }

    multi method meow ('meow') {
        say 'Meow meow';
    }
}

my $code = gather {
    for Kitty.^can('meow') -> $meth {
        .take for $meth.cando: \(Kitty, 'meow');
    }
}

say "The code is in {.file} on line {.line}" with $code[0];

# OUTPUT:
# The code is in test.p6 on line 12

Hooray! We got the correct location of the multi we wanted. We still have our two classes with three meow methods total. On line 17–21 we loop over the two meow Methods the .^can metamodel call gives us. For each of them we call the .cando method with the Capture that matches the multi we want (note that we do need to provide the needed object as the first argument of the Capture). We then .take all found candidates to gather them into the $code variable.

The first value we get is the narrowest candidate and is good 'nuf for us, so we call the .file and .line on it, which gives us the location we were looking for. Sounds like we nailed this .file and .line business down rather well. Let's dive into the core, shall we?

Can't see the core files for the setting

If this is the first time you're to see the print out of the .file/.line for some core stuff, you're in for a surprise. Actually, we've already seen the surprise, but you may have thought it to be a fluke:

say "{.file}:{.line}" given &say;
# OUTPUT:
# gen/moar/m-CORE.setting:29038

All of the nice, good looking files you see in src/core in the repo actually get compiled into one giant file called the "setting." My current setting is 40,952 lines long and the .line of core subs and methods refers to one of those thousands of lines.

Now sure, we could pop the setting open and watch our editor grind to a stuttering halt (I'm looking at you, Atom!). However, that doesn't help us find the right repo file to edit if we want to make changes to how it works. So what do we do?

A keen eye will look at the contents of the setting or at the file that generates it and notice that for each of the separate files in the repo, the setting has this type of comment before the contents of the file are inserted into the setting:

#line 1 src/core/core_prologue.pm

This means if we're clever enough, we can write a sub that translates a line number in the setting to the separate file we can locate in the repo. Here's a plan of action: we pop open the setting file and read it line by line. When we encounter one of the above comments, we make a note of which file we're in as well as how many lines deep in the setting we're currently at.

The location of the setting file may differ, depending on how you installed Perl 6, but on my system (I use rakudobrew), it's in $*EXECUTABLE.parent.parent.parent.child('gen/moar/m-CORE.setting'), so the code for finding the actual file that defines our core sub or method is this:

sub real-location-for ($wanted) {
    state $setting = $*EXECUTABLE.parent.parent.parent.child: 'gen/moar/m-CORE.setting';
    my ($cur-line-num, $offset) = 0, 0;
    my $file;
    for $setting.IO.lines -> $line {
        return %( :$file, :line($cur-line-num - $offset), )
            if ++$cur-line-num == $wanted;

        if $line ~~ /^ '#line 1 ' $<file>=\S+/ {
            $file   = $<file>;
            $offset = $cur-line-num + 1;
        }
    };
    fail 'Were not able to find location in setting.';
}

say "{.<file>}:{.<line>}" given real-location-for &say.line;


# OUTPUT:
# src/core/io_operators.pm:17

The $wanted contains the setting line number given to us by .line call and the $cur-line-num contains the number of the current line we're examining. We loop until the $cur-line-num reaches $wanted and return a Hash with the results. For each line that matches our special comment, we store the real name of the file the code is from into $file and store the $offset of the first line of the code in that file. Once done, we simply subtract the $offset from the setting $cur-line-num and we get the line number in the source file.

This is pretty awesome and useful, but it's still not what I had in mind when I said we wanted to know exactly where da sauce is. I don't want to clone the repo and go to the repo and open my editor. I want to just look at code.

If it's worth doing, it's worth overdoing

There's one place where we can stare at Rakudo's source code until it blushes and looks away: GitHub. Since our handy sub gives us a filename and a line number, we can construct a URL that points to a specific file and line in the source code, like this one, for example: https://github.com/rakudo/rakudo/blob/nom/src/core/Str.pm#L16

There's an obvious problem with such an approach: the URL points to the master branch (called nom, for "New Object Model," in Rakudo). Commits go into the repo daily, and unless we rebuild our Perl 6 several times a day, there's a good chance the location our GitHub URL points to is wrong.

Not only do we have to point to a specific file and line number, we have to point to the right commit too. On GitHub's end, it's easy: we just replace nom in the URL with the appropriate commit number—we just need Rakudo to tell us what that number is.

The two dynamic variables $*VM and $*PERL contain some juicy information. By introspecting them, we can locate some useful info and what looks like commit prefix parts in version numbers:

say $*VM.^methods;
# (BUILD platform-library-name Str gist config prefix precomp-ext
# precomp-target precomp-dir name auth version signature desc)

say $*VM.version;
# v2016.06

say $*PERL.^methods;
# (BUILD VMnames DISTROnames KERNELnames Str gist compiler name auth version
# signature desc)

say $*PERL.compiler.^methods;
# (BUILD build-date Str gist id release codename name auth version
# signature desc)

say $*PERL.compiler.version;
# v2016.06.10.g.7.cff.429

Rakudo is a compiler and so we're interested in the value of $*PERL.compiler.version. It contains the major release version, followed by g, followed by the commit prefix of this particular build. The prefix is split up on number-letter boundaries, so we'll need to join up all the bits and split on g. But, take a look at $*VM.version, which is the version of the virtual machine we're running the code on. There aren't any gs and commits in it and for a good reason: it's a tagged major release, and the name of the tag is the version. The same will occur for Rakudo on release builds, like the ones shipped with Rakudo Star. So we'll need to check for such edge cases and this is the code:

my $where = .Str ~~ /g/
    ?? .parts.join.split("g")[*-1]
    !! .Str
given $*PERL.compiler.version;

given a $*PERL .compiler .version, if it contains letter g, join up version bits, split on g, and the last portion will be our commit prefix; if it doesn't contain letter g, then we're dealing with a release tag, so we'll take it as-is. All said and done, our code for locating source becomes this:

my $where = .Str ~~ /g/
    ?? .parts.join.split("g")[*-1]
    !! .Str
given $*PERL.compiler.version;

say [~] 'https://github.com/rakudo/rakudo/blob/',
        $where, '/', .<file>, '#L', .<line>
given real-location-for &say.line;

# OUTPUT:
# https://github.com/rakudo/rakudo/blob/c843682/src/core/io_operators.pm#L17

Hey! Awesome! We got a link that points to the correct commit and file! Let celebrations begin! Wait. What? You followed the link and noticed the line number is not quite right? What gives? Did we mess up our algorithm?

Crank Up The Insanity

If you take a look again at the script that generates the setting file, you'll notice it strips things: comments and special backend-specific chunks of code.

There are two ways to fix this. The sane approach would be to commit a change that would make that script insert an empty line for each line it skips and then pretend that we didn't commit that just to make our personal project work. Then, there's the Zoffix Way to fix this: we got the GitHub link, so why don't we fetch that code and figure out what the right line number is. Hey! That second way sounds much more fun! Let's do just that!

The one link we've seen so far is this: https://github.com/rakudo/rakudo/blob/c843682/src/core/iooperators.pm#L17. It's not quite what we want, since it's got HTML and bells and whistles in it. We want raw code and GitHub does offer that at a slightly different URL: https://raw.githubusercontent.com/rakudo/rakudo/c843682/src/core/iooperators.pm. The plan of action then becomes:

  • Get the line number in the setting
  • Use our real-location-for sub to get the filename and sorta-right line number in a source file
  • Get the commit our compiler was built with
  • Generate a GitHub URL for raw code for that file on that commit and fetch that code
  • Use the same algorithm as in the setting generating script to convert the code we fetched into the version that lives in our setting, while keeping track of the number of lines we strip
  • When we reach the correct line number in the converted file, we adjust the original line number we had by the number of lines we stripped
  • Generate a regular GitHub URL to the commit, file, and corrected line number
  • ???
  • Profit!

I could go over the code, but it's just a dumb, unfun algorithm, and most importantly, you don't need to know it. Because... there's a module that does just that!

What Sorcery Is This?

The module is called CoreHackers::Sourcery and when you use it, it'll augment the Code class and all core classes that inherit from it with .sourcery method, as well as provide a sourcery subroutine.

So, to get the location of the code for say sub, just run:

use CoreHackers::Sourcery;
&say.sourcery.put;

# OUTPUT:
# src/core/io_operators.pm:20 https://github.com/rakudo/rakudo/blob/c843682/src/core/io_operators.pm#L20

That gives us the correct location of the proto. We can either pop open a file in a repo checkout or view the code at the provided GitHub URL.

Want to get the location of a specific multi? There's no need to mess with .cando! The arguments you give to the .sourcery method will be used to select the best matching multi, so to find the location of the say multi that will handle say "foo" call, just run:

&say.sourcery("foo").put;

# OUTPUT:
# src/core/io_operators.pm:22 https://github.com/rakudo/rakudo/blob/c843682/src/core/io_operators.pm#L22

That covers the subs. For methods, you can go with the whole .^can meta dance, but we like simple things, and so we'll use the subroutine form of sourcery:

put sourcery Int, 'abs';         # method of a type object
put sourcery 42,  'split';       # method of an Int object
put sourcery 42,  'base', \(16); # best candidate for `base` method called with 16 as arg

This is pretty handy. And the whole hitting the GitHub thing? The module will cache the code fetched from GitHub, so things like this won't take forever:

put "Int.{.name} is at {.sourcery}" for Int.^methods;

However, if you do actually run that code, after some output you'll be greeted with this error:

# Method 'sourcery' not found for invocant of class 'Method+{Callable[Bool:D]}'
#   in block  at test.p6 line 1
#   in block <unit> at test.p6 line 1

The class it mentions is not a pure Method object, but has a mixin in it. While CoreHackers::Sourcery recomposes all core subclasses of Code class after augmenting it, it doesn't do that for such mixes, so you'd have to recompose them yourself:

for Int.^methods {
    .WHAT.^compose;
    put "Int.{.name} is at {.sourcery}" ;
}

Or better still, just use the subroutine form of sourcery:

put "Int.{.name} is at {sourcery $_}" for Int.^methods;

Do It For Me

For most stuff, we wouldn't want to do a whole bunch of typing to use a module and call subs and then copy/paste URLs or filenames. You'll notice sourcery returns a list of two items: the filename and the URL. This means we can make some nice and short aliases to call it and automatically pop open either our editor or web browser:

$ alias sourcery='perl6 -MCoreHackers::Sourcery -MMONKEY-SEE-NO-EVAL \
    -e '\''run "atom", "/home/zoffix/rakudo/" \
        ~ EVAL "sourcery(@*ARGS[0])[0]" '\'''

$ alias sourcery-web='perl6 -MCoreHackers::Sourcery -MMONKEY-SEE-NO-EVAL \
    -e '\''run "firefox", EVAL "sourcery(@*ARGS[0])[1]" '\'''

# opens Atom editor at the spot to edit code for Int.base
$  sourcery 'Int, "base"'

# opens Firefox, showing code for Int.base
$  sourcery 'Int, "base"'

We EVAL the argument we give to these aliases, so be careful with them. For sourcery alias, we run the Atom editor and give it the file to open. I prepended the location of my local Rakudo checkout, but you'd use yours. Most editors support opening file:line-number format to open files at a particular spot; if yours doesn't, modify the command.

For sourcery-web we use the URL returned by sourcery and open Firefox browser at this location. And just like that, with a few keystrokes, we can jump in to view or edit the code for a particular core sub or method in Rakudo!

Conclusion

We've learned where Rakudo's source lives, how to find the commit the current compiler is built off, and how to locate the source code for a particular sub or method in a giant file called the setting. We then further hacked away the inconveniences by getting to the actual place in the source code we can edit, culminating with a shiny module and a couple of handy command line aliases.

Happy hacking!

UPDATE 2016.08.05

Inspired by this blog post, lizmat++ has changed the setting generation script to not skip any lines, so making adjustments to line numbers by fetching source from GitHub is no longer necessary, as the line numbers match up with the original source.

Hacking on The Rakudo Perl 6 Compiler: Mix Your Fix

Read this article on Perl6.Party

While testing a fix for one of the Less Than Awesome behaviours in standalone Signature objects, I came across a bugglet. Smartmatching two Signatures throws, while spilling a bit of the guts:

<Zoffix> m: my $m = method ($a: $b) { }; say $m.signature ~~ :($a, $b);
<camelia> rakudo-moar 46838d: OUTPUT«Method 'type' not found for invocant of class 'Any'␤ in block at line 1␤␤»

So I figured I'll write about fixing it, 'cause hacking on internals is lots of fun. Let's roll!

Golf It Down

The less code there is to reproduces the bug, the fewer places there are for that bug to hide. We have a detached method and then we smartmatch its signature against something else. Let's try to golf it down a bit and smartmatch two Signatures, without involving a method:

<Zoffix> m: :($a, $b) ~~ :($a, $b);
<camelia> rakudo-moar 46838d: ( no output )

The bug disappeared, so perhaps out Signature on the left doesn't contain the stuff that triggers the bug. Let's dump the signature of the method to see what we should match against:

<Zoffix> m: my $m = method ($a: $b) { }; say $m.signature <camelia> rakudo-moar 46838d: OUTPUT«($a: $b, *%_)␤»

Aha! It has a slurpy hash: *%_. Let's try matching a Signature with a slurpy in it:

<Zoffix> m: :(*%) ~~ :();
<camelia> rakudo-moar 46838d: OUTPUT«Method 'type' not found for invocant of class 'Any'␤ in block at line 1␤␤»

And there we go: hole in three. Let's proceed.

Roast It

There's an official Perl 6 test suite that Rakudo must pass to be called a Perl 6 compiler. Since we got a bug on our hands, we should add a test for it to the test suite to ensure it doesn't rear its ugly head again.

The copy of the repo gets automatically cloned into t/spec when you run make spectest in Rakudo's checkout. If you don't have a commit bit, you can just change the remote/branch of that checkout to your fork:

cd t/spec
git remote rm origin
git remote add origin https://github.com/YOURUSERNAME/roast
git checkout your-branch
cd ../..

It may be tricky to figure out which file to put the test in, if you're new. You can always ask the good folks on irc.freenode.net/#perl6 for advice. In this case, I'll place the test into S06-signature/outside-subroutine.t

While not required, I find it helpful to open a ticket for the bug. This way I can reference it in my fix in the compiler repo, I can reference it in the commit to the test repo, and people get a place where to tell me why I'm being stupid when I am. I opened this bug as RT#128795.

Now, for the code of the test itself. I'll adjust the plan at the top of the file to include however many tests I'm writing—in this case one. I'll use the lives-ok test sub and stick our buggy golfed code into it. Here's the diff of the changes to the file; note the reference to the ticket number in the comment before the test:

@@ -1,7 +1,7 @@
  use v6;
  use Test;

 -plan 3;
 +plan 4;

  # RT #82946
  subtest 'signature binding outside of routine calls' => {
 @@ -25,4 +25,7 @@ subtest 'smartmatch on signatures with literal strings' => {
  # RT #128783
  lives-ok { EVAL ’:($:)‘ }, ’signature marker is allowed in bare signature‘;

 +# RT #128795
 +lives-ok { :(*%)~~ :() }, 'smartmatch with no slurpy on right side';
 +
  # vim: ft=perl6

Run the file now to ensure the test fails. Hint: some files have fudging; explaining it is out of the scope of this article, but if you notice failures you're not expecting, look it up.

$ make t/spec/S06-signature/outside-subroutine.t
...
Test Summary Report
-------------------
t/spec/S06-signature/outside-subroutine.t (Wstat: 256 Tests: 4 Failed: 1)
  Failed test:  4
  Non-zero exit status: 1

With the test in place, it's time to look at some source code. Let the bug hunt begin!

Make it Saucy

Our bug involves a Smartmatch operator, which aliases the left side to the topic variable $_ and calls .ACCEPTS method on the right side with it. Both of our sides are Signature objects, so let's pop open Rakudo's sauce code for that class.

In the Rakudo's repo, directory src/core/ contains most of the built in types in separate files named after those types, so we'll just pop open src/core/Signature.pm in the editor and locate the definition of method ACCEPTS.

There are actually four multis for ACCEPTS. Here's the full code. Don't try to understand all of it, just note its size.

``` multi method ACCEPTS(Signature:D: Capture $topic) { nqp::p6bool(nqp::p6isbindable(self, nqp::decont($topic))); }

multi method ACCEPTS(Signature:D: @topic) {
    self.ACCEPTS(@topic.Capture)
}

multi method ACCEPTS(Signature:D: %topic) {
    self.ACCEPTS(%topic.Capture)
}

multi method ACCEPTS(Signature:D: Signature:D $topic) {
    my $sclass = self.params.classify({.named});
    my $tclass = $topic.params.classify({.named});
    my @spos := $sclass{False} // ();
    my @tpos := $tclass{False} // ();

    while @spos {
        my $s;
        my $t;
        last unless @tpos && ($t = @tpos.shift);
        $s=@spos.shift;
        if $s.slurpy or $s.capture {
            @spos=();
            @tpos=();
            last;
        }
        if $t.slurpy or $t.capture {
            return False unless any(@spos) ~~ {.slurpy or .capture};
            @spos=();
            @tpos=();
            last;
        }
        if not $s.optional {
            return False if $t.optional
        }
        return False unless $t ~~ $s;
    }
    return False if @tpos;
    if @spos {
        return False unless @spos[0].optional or @spos[0].slurpy or @spos[0].capture;
    }

    for flat ($sclass{True} // ()).grep({!.optional and !.slurpy}) -> $this {
        my $other;
        return False unless $other=($tclass{True} // ()).grep(
            {!.optional and $_ ~~ $this });
        return False unless +$other == 1;
    }

    my $here=($sclass{True}:v).SetHash;
    my $hasslurpy=($sclass{True} // ()).grep({.slurpy});
    $here{@$hasslurpy} :delete;
    $hasslurpy .= Bool;
    for flat @($tclass{True} // ()) -> $other {
        my $this;

        if $other.slurpy {
            return False if any($here.keys) ~~ -> Any $_ { !(.type =:= Mu) };
            return $hasslurpy;
        }
        if $this=$here.keys.grep( -> $t { $other ~~ $t }) {
            $here{$this[0]} :delete;
        }
        else {
            return False unless $hasslurpy;
        }
    }
    return False unless self.returns =:= $topic.returns;
    True;
}

```

The error we get from the bug mentions .type method call and there is one such method call in the code above (close to the end of it). In this case, there's quite a bit of code to sort through. It would be nice to be able to play around with it, stick a couple of dd or say calls to dump out variables, right?

That approach, however, is somewhat annoying because after each change we have to recompile the entire Rakudo. On the meatiest box I got, it takes about 60 seconds. Not the end of the world, but there's a way to make things lightning fast!

Mix Your Fix

We need to fix a bug in a method of a class. Another way to think of it is: we need to replace a broken method with a working one. Signature class is just like any other class, so if we want to replace one of its methods, we can just mix in a role!

The broken ACCEPTS will continue to live in the compiler, and we'll pop open a separate playground file and define a role—let's calls it FixedSignature—in it. To get our new-and-improved ACCEPTS method in standalone signature objects, we'll use the but operator to mix the FixedSignature in.

Here's the role, the mixing in, and the code that triggers the bug. I'll leave out method bodies for brieviety, but there's they are the same as in the code above.

role FixedSignature {
    multi method ACCEPTS(Signature:D: Capture $topic)     { #`(redacted for brevity) }
    multi method ACCEPTS(Signature:D: @topic)             { #`(redacted for brevity) }
    multi method ACCEPTS(Signature:D: %topic)             { #`(redacted for brevity) }
    multi method ACCEPTS(Signature:D: Signature:D $topic) { #`(redacted for brevity) }
}

my $a = :(*%) but FixedSignature;
my $b = :()   but FixedSignature;

say $a ~~ $b;

There are two more things we need to do for our role to work properly. First, we're dealing with multis and right now the multis in our role are creating ambiguities with the multis in the original Signature class. To avoid that, we'll define a proto:

proto method ACCEPTS (|) { * }

Since the code is using some NQP, we also need to bring in those features into our playground file with the role. Just add the appropriate pragma at the top of the file:

use MONKEY-GUTS;

With these modifications, our final test file becomes the following:

use MONKEY-GUTS;

role FixedSignature {
    proto method ACCEPTS (|) { * }

    multi method ACCEPTS(Signature:D: Capture $topic)     { #`(redacted for brevity) }
    multi method ACCEPTS(Signature:D: @topic)             { #`(redacted for brevity) }
    multi method ACCEPTS(Signature:D: %topic)             { #`(redacted for brevity) }
    multi method ACCEPTS(Signature:D: Signature:D $topic) { #`(redacted for brevity) }
}

my $a = :(*%) but FixedSignature;
my $b = :()   but FixedSignature;

say $a ~~ $b;

And with this trick in place, we now have a rapid-fire weapon to hunt down the bug with—the changes we make compile instantly.

Pull The Trigger

Now, we can debug the code just like any other. I prefer applying liberal amounts of dd (or say) calls and dumping out the variables to ensure their contents match expectations.

The .type method call our error message mentions is in this line:

return False if any($here.keys) ~~ -> Any $_ { !(.type =:= Mu) };

It calls it on the keys of $here, so let's dump the $here before that statement:

...
dd $here
return False if any($here.keys) ~~ -> Any $_ { !(.type =:= Mu) };
...
# OUTPUT:
# SetHash $here = SetHash.new(Any)

Here's our offending Any, let's go up a bit and dump the $here right where it's defined:

...
my $here=$sclass{True}.SetHash;
dd $here;
...
# OUTPUT:
# SetHash $here = SetHash.new(Any)

It's still there, and for a good reason. If we trace the creation of $sclass, we'll see it's this:

my $sclass = self.params.classify({.named});

The params of the Signature on the right of the smartmatch get classified based on whether they are named or not. The named parameters will be inside a list under the True key of $sclass. Since we do not have any named params, there won't be such a key, and we can verify that with this bit of code:

:().params.classify(*.named).say
# OUTPUT:
# {}

When we go to define $here, we get an Any from $sclass{True}, since that key doesn't exist, and when we call .SetHash on it, we get our problematic Sethash object with an Any in it. And so, we have our fix for the bug: ensure the True key in $sclass is actually there before creating a SetHash out of its value:

my $here=($sclass{True}:v).SetHash;

Add that to our playground file with the FixedSignature role in it, run it, and verify the fix works. Now, simply transplant the fix back into src/core/Signature.pm and then compile the compiler.

perl Configure.pl --gen-moar --gen-nqp --backends=moar
make
make test
make install

Verify our fix worked before we proceed onto the final stages:

$ make t/spec/S06-signature/outside-subroutine.t
...
All tests successful.
Files=1, Tests=4,  1 wallclock secs ( 0.03 usr  0.00 sys +  0.32 cusr  0.02 csys =  0.37 CPU)
Result: PASS

A Clean Kill

So far, all we know is the bug we found was fixed and the tests we wrote for it pass. However, before we ship our fix, we must ensure we didn't break anything else. There are other devs working from the same repo and you'll be interfering with their work if you break stuff.

Run the full Roast test suite with make spectest command. You can use the TEST_JOBS environmental variable to specify the number of simultaneous tests. Generally a value slightly higher than the available cores works the fastest... and cores make all the difference. On my 24-core VM I cut releases on, the spectest completes in about 1 minute and 15 seconds. On my 2-core web server, it takes about 25 minutes. You get the idea.

TEST_JOBS=28 make spectest
...
All tests successful.
Files=1111, Tests=52510, 82 wallclock secs (13.09 usr 2.44 sys + 1517.34 cusr 97.67 csys = 1630.54 CPU)
Result: PASS

Once the spectest completes and we have the clean bill of health, we're ready to ship our fix. Commit the Rakudo fix, then go into t/spec and commit the Roast fix:

git commit -m 'Fix Smartmatch with two signatures, only one of which has slurpy hash' \
           -m 'Fixes RT#128795' src/core/Signature.pm
git push

cd t/spec
git commit -m 'smartmatch on signature with no slurpy on right side does not crash' \
           -m 'RT#128795' S06-signature/outside-subroutine.t
git push

If you're pushing to your fork of these projects, you have to go the extra step and submit a Pull Request (just go to your fork and GitHub should display a button just for that).

And we're done! Celebrate with the appropriate amount of fun.

Conclusion

Rakudo bugs can be easy to fix, requiring not much more than knowledge of Perl 6. To fix them, you don't need to re-compile the entire compiler, but can instead define a small role with a method you're trying to fix and modify and recompile just that.

It's important to add tests for the bug into the official test suite and it's also important to run the full spectest after you fix the bug. But most important of all, is to have fun fixing it.

-Ofun

IRC::Client: Perl 6 Multi-Server IRC (or Awesome Async Interfaces with Perl 6)

Read this article on Perl6.Party

I wrote my first Perl 6 program—a New Years IRC Party bot—around Christmas, 2015. The work included releasing the IRC::Client module, and given my virginity with the language and blood alcohol level appropriate for the Holiday Season, the module ended up sufficiently craptastic.

Recently, I needed a tool for some Perl 6 bug queue work, so I decided to lock myself up for a weekend and re-design and re-write the module from scratch. Multiple people bugged me to do so over the past months, so I figured I'd also write a tutorial for how to use the module—as an apology for being a master procrastinator. And should IRC be of no interest to you, I hope the tutorial will prove useful as a general example of async, non-blocking interfaces in Perl 6.

The Basics

To create an IRC bot, instantiate an IRC::Client object, giving it some basic info, and call the .run method. Implement all of the functionality you need as classes with method names matching the events you want to listen to and hand those in via the .plugins attribute. When an IRC event occurs, it's passed to all of the plugins, in the order you specify them, stopping if a plugin claims it handled the event.

Here's a simple IRC bot that responds to being addressed in-channel, notices, and private messages sent to it. The response is the uppercased original message the bot received:

use IRC::Client;
.run with IRC::Client.new:
    :nick<MahBot>
    :host<irc.freenode.net>
    :channels<#perl6>
    :debug
    :plugins(class { method irc-to-me ($_) { .text.uc } })

And here's what the bot looks like when running:

<Zoffix> MahBot, I ♥ you!
<MahBot> Zoffix, I ♥ YOU!

The :nick, :host, and :channels are the nick for your bot, the server it should connect to, and channels it should join. The :debug controls how much debugging output to display. We'll set it to value 1 here, for sparse debug output, just to see what's happening. Tip: install the optional Terminal::ANSIColor module to make debug output purty:

For the .plugins attribute, we hand in an anonymous class. If you have multiple plugins, just shove them all in in the order you want them to receive events in:

:plugins(PlugFirst.new, PlugSecond.new(:conf), class { ... })

The plugin class of our uppercasing bot has a single method that listens to irc-to-me event, triggered whenever the bot is addressed in-channel or is sent a private message or notice. It receives a single argument: one of the objects that does the IRC::Client::Message role. We stick it into the $_ topical variable to save a bit of typing.

We reply to the event by returning a value from the method. The original text is contained inside the .text attribute of the message object, so we'll call .uc method on it to uppercase the content and that's what our reply will be.

As awesome as our uppercasing bot is, it's as useful as an air conditioner on a polar expedition. Let's teach it some tricks.

Getting Smarter

We'll call our new plugin Trickster and it'll respond to commands time—that will give the local time and date—and temp—that will convert temperature between Fahrenheit and Celsius. Here's the code:

use IRC::Client;

class Trickster {
    method irc-to-me ($_) {
        given .text {
            when /time/ { DateTime.now }
            when /temp \s+ $<temp>=\d+ $<unit>=[F|C]/ {
                when $<unit> eq 'F' { "That's {($<temp> - 32) × .5556}°C" }
                default             { "That's { $<temp> × 1.8 + 32   }°F" }
            }
            'huh?'
        }
    }
}

.run with IRC::Client.new:
    :nick<MahBot>
    :host<irc.freenode.net>
    :channels<#perl6>
    :debug
    :plugins(Trickster)

<Zoffix> MahBot, time
<MahBot> Zoffix, 2016-07-23T19:00:15.795551-04:00
<Zoffix> MahBot, temp 42F
<MahBot> Zoffix, That's 5.556°C
<Zoffix> MahBot, temp 42C
<MahBot> Zoffix, That's 107.6°F
<Zoffix> MahBot, I ♥ you!
<MahBot> Zoffix, huh?

The code is trivial: we pass the given text over a couple of regexes. If it contains word time, we return the current time. If it contains word temp we do the appropriate math, based on whether the given number is postfixed by an F or a C. And if no matches happen, we end up returning the inquisitive huh?.

There's an obvious problem with this new and improved plugin: the bot no longer loves me! And while I'll survive the heartache, I doubt any other plugin will teach the bot to love again, as Trickster consumes all irc-to-me events, even if it doesn't recognize any of the commands it can handle. Let's fix that!

Passing The Buck

There's a special value that can be returned by the event handler to signal that it did not handle the event and that it should be propagated to further plugins and event handlers. That value is provided by the .NEXT attribute offered by the IRC::Client::Plugin role, which a plugin does to obtain that attribute. The role is automatically exported when you use IRC::Client.

Let's look at some code utilizing that special value. Note that since .NEXT is an attribute and we can't look up attributes on type objects, you need to go the extra step and instantiate your plugin classes when giving them to :plugins.

use IRC::Client;

class Trickster does IRC::Client::Plugin {
    method irc-to-me ($_) {
        given .text {
            when /time/ { DateTime.now }
            when /temp \s+ $<temp>=\d+ $<unit>=[F|C]/ {
                when $<unit> eq 'F' { "That's {($<temp> - 32) × .5556}°C" }
                default             { "That's { $<temp> × 1.8 + 32   }°F" }
            }
            $.NEXT;
        }
    }
}

class BFF does IRC::Client::Plugin {
    method irc-to-me ($_) {
        when .text ~~ /'♥'/ { 'I ♥ YOU!' };
        $.NEXT;
    }
}

.run with IRC::Client.new:
    :nick<MahBot>
    :host<irc.freenode.net>
    :channels<#perl6>
    :debug
    :plugins(Trickster.new, BFF.new)

<Zoffix> MahBot, time
<MahBot> Zoffix, 2016-07-23T19:37:45.788272-04:00
<Zoffix> MahBot, temp 42F
<MahBot> Zoffix, That's 5.556°C
<Zoffix> MahBot, temp 42C
<MahBot> Zoffix, That's 107.6°F
<Zoffix> MahBot, I ♥ you!
<MahBot> Zoffix, I ♥ YOU!

We now have two plugins that both subscribe to irc-to-me event. The :plugins attribute receives Trickster plugin first, so its event handler will be run first. If the received text does not match either of the Trickster's regexes, it returns $.NEXT from the method.

That signals the Client Object to go hunting for other handlers, so it gets to BFF's irc-to-me handler. There, we reply if the input contains a heart, if not, we pre-emptively return $.NEXT here too.

While the bot got its sunny disposition back, it did so at the cost of quite a bit of extra typing. What can we do about that?

Multify All The Things!

Perl 6 supports multi-dispatch as well as type constraints in signatures. On top of that, smartmatch against IRC::Client's message objects that have a .text attribute uses the value of that attribute. Combine all three of those features and you end up with ridiculously concise code:

use IRC::Client;
class Trickster {
    multi method irc-to-me ($ where /time/) { DateTime.now }
    multi method irc-to-me ($ where /temp \s+ $<temp>=\d+ $<unit>=[F|C]/) {
        $<unit> eq 'F' ?? "That's {($<temp> - 32) × .5556}°C"
                       !! "That's { $<temp> × 1.8 + 32   }°F"
    }
}

class BFF { method irc-to-me ($ where /'♥'/) { 'I ♥ YOU!' } }

.run with IRC::Client.new:
    :nick<MahBot>
    :host<irc.freenode.net>
    :channels<#perl6>
    :debug
    :plugins(Trickster, BFF)

<Zoffix> MahBot, time
<MahBot> Zoffix, 2016-07-23T19:59:44.481553-04:00
<Zoffix> MahBot, temp 42F
<MahBot> Zoffix, That's 5.556°C
<Zoffix> MahBot, temp 42C
<MahBot> Zoffix, That's 107.6°F
<Zoffix> MahBot, I ♥ you!
<MahBot> Zoffix, I ♥ YOU!

Outside of the signature, we no longer have any need for the message object, so we use the anonymous $ parameter in its place. We then type-constrain that parameter with a regex match, and so the method will be called only if the text of the message matches that regex. Since no methods will be called on failed matches, we no longer have to mess around with the whole $.NEXT business or compose any roles into our plugins.

The bodies of our methods each have a single statement that produces the response value for the event. In the temperature converter, we use the ternary operator to select which formula to use for the conversion, depending on the unit requested, and yes, the $<unit> and $<temp> captures created in the signature type constraint match are available in the method's body.

An Eventful Day

Along with standard named and numerical IRC protocol events, IRC::Client offers convenience events. One of them we've already seen: the irc-to-me event. Such events are layered, so one IRC event can trigger several IRC::Client's events. For example, if someone addresses our bot in a channel, the following chain of events will be fired:

irc-addressed  ▶  irc-to-me  ▶  irc-privmsg-channel  ▶  irc-privmsg  ▶  irc-all

The events are ordered from "narrowest" to "widest": irc-addressed can be triggered only in-channel, when our bot is addressed; irc-to-me can also be triggered via notice and private message, so it's wider; irc-privmsg-channel includes all channel messages, so it's wider still; and irc-privmsg also includes private messages to our bot. The chain ends by the widest event of them all: irc-all.

If a plugin's event handler returns any value other than $.NEXT, later events in the event chain won't be fired, just as plugins later in the plugin chain won't be tried for the same reason. Each event is tried on all of the plugins, before attempting to handle a wider event.

By setting the :debug attribute to level 3 or higher, you'll get emitted events in the debug output. Here's our bot attempting to handle unknown command blarg and then processing command time handled by irc-to-me event handler we defined:

All of IRC::Client's events have irc- prefix, so you can freely define auxiliary methods in your plugin, without worrying about conflicting with event handlers. Speaking of emitting things...

Keep 'Em Commin'

Responding to commands is sweet and all, but many bots will likely want to generate some output out of their own volition. As an example, let's write a bot that will annoy us whenever we have unread GitHub notifications!

use IRC::Client;
use HTTP::Tinyish;
use JSON::Fast;

class GitHub::Notifications does IRC::Client::Plugin {
    has Str  $.token  = %*ENV<GITHUB_TOKEN>;
    has      $!ua     = HTTP::Tinyish.new;
    constant $API_URL = 'https://api.github.com/notifications';

    method irc-connected ($) {
        start react {
            whenever self!notification.grep(* > 0) -> $num {
                $.irc.send: :where<Zoffix>
                            :text("You have $num unread notifications!")
                            :notice;
            }
        }
    }

    method !notification {
        supply {
            loop {
                my $res = $!ua.get: $API_URL, :headers{ :Authorization("token $!token") };
                $res<success> and emit +grep *.<unread>, |from-json $res<content>;
                sleep $res<headers><X-Poll-Interval> || 60;
            }
        }
    }
}

.run with IRC::Client.new:
    :nick<MahBot>
    :host<irc.freenode.net>
    :channels<#perl6>
    :debug
    :plugins(GitHub::Notifications.new)

[00:25:41] -MahBot- Zoffix, You have 20 unread notifications!
[00:26:41] -MahBot- Zoffix, You have 19 unread notifications!

We create GitHub::Notifications class that does the IRC::Client::Plugin role. That role gives us the $.irc attribute, which is the IRC::Client object we'll use to send messages to us on IRC.

Aside from irc-connected method, the class is just like any other: a public $.token attribute for our GitHub API token, a private $!ua attribute that keeps our HTTP User Agent object around, and a private notification method, where all the action happens.

Inside notification, we create a Supply that will emit the number of unread notifications we have. It does so by using an HTTP::Tinyish object to access a GitHub API endpoint. On line 24, it parses the JSON returned by successful requests, and greps the message list for any items with unread property set to true. The prefix + operator converts the list to an Int that is total items found, which is what we emit from our supply.

The irc-connected event handler gets triggered when we successfully connect to an IRC server. In it, we start an event loop that reacts whenever we receive the current unread messages count from our supply given by notifications method. Since we're only interested in cases where we do have unread messages, we also pop a grep on the supply to filter out the cases without any messages (yes, we could avoid emitting those in the first place, but I'm showing off Perl 6 here 😸). And once we do have unread messages, we simply call IRC::Client's .send method, asking it to send us an IRC notice with the total number of unread messages. Pure awesomeness!

Don't Wait Up

We've covered the cases where we either have an asynchronous supply of values we sent to IRC or where we reply to a command right away. It's not uncommon for a bot command to take some time to execute. In those cases, we don't want the bot to lock up while the command is doing its thing.

Thanks to Perl 6's excellent concurrency primitives, it doesn't have to! If an event handler returns a Promise, the Client Object will use its .result as the reply when it is kept. This means that in order to make our blocking event handler non-blocking, all we have to do is wrap its body in a start { ... } block. What could be simpler?

As an example, let's write a bot that will respond to bash command. The bot will fetch bash.org/?random1, parse out the quotes from the HTML, and keep them in the cache. When the command is triggered, the bot will hand out one of the quotes, repeating the fetching when the cache runs out. In particular, we don't want the bot to block while retrieving and parsing the web page. Here's the full code:

use IRC::Client;
use Mojo::UserAgent:from<Perl5>;

class Bash {
    constant $BASH_URL = 'http://bash.org/?random1';
    constant $cache    = Channel.new;
    has        $!ua    = Mojo::UserAgent.new;

    multi method irc-to-me ($ where /bash/) {
        start $cache.poll or do { self!fetch-quotes; $cache.poll };
    }

    method !fetch-quotes {
        $cache.send: $_
            for $!ua.get($BASH_URL).res.dom.find('.qt').each».all_text.lines.join: '  ';
    }
}

.run with IRC::Client.new:
    :nick<MahBot>
    :host<irc.freenode.net>
    :channels<#perl6>
    :debug
    :plugins(Bash.new)

<Zoffix> MahBot, bash
<MahBot> Zoffix, <Time> that reminds me of when Manning and I installed OS/2 Warp4 on a box and during the install routine it said something to the likes of 'join the hundreds of people on the internet'

For page fetching needs, I chose Perl 5's Mojo::UserAgent, since it has an HTML parser built-in. The :from<Perl5> adverb indicates to the compiler that we want to load a Perl 5, not Perl 6, module.

Since we're multi-threading, we'll use a Channel as a thread-safe queue for our caching purposes. We subscribe to the irc-to-me event where text contains word bash. When the event handler is triggered, we pop out to a new thread using the start keyword. Then we .poll our cache and use the cached value if we have one, otherwise, the logic will move onto the do block that that calls the fetch-quotes private method and when that completes, polls the cache once more, getting a fresh quote. All said and done, a quote will be the result of the Promise we return from the event handler.

The fetch-quotes method fires up our Mojo::UserAgent object that fetches the random quotes page from the website, finds all HTML elements that have class="qt" on them—those are paragraphs with quotes. Then, we use a hyper method call to convert those paragraphs to just text and that final list is fed to our $cache Channel via a for loop. And there you go, we non-blockingly connected our bot to the cesspit of the IRC world. And speaking of things you may want to filter...

Watch Your Mouth!

Our bot would get banned rather quickly if it spewed enormous amounts of output into channels. An obvious solution is to include logic in our plugins that would use a pastebin if the output is too large. However, it's pretty impractical to add such a thing to every plugin we write. Luckily, IRC::Client has support for filters!

For any method that issues a NOTICE or PRIVMSG IRC command, IRC::Client will pass the output through classes given to it via :filters attribute. This means we can set up a filter that will automatically pastebin large output, regardless of what plugin it comes from.

We'll re-use our bash.org quote bot, except this time it will pastebin large quotes to Shadowcat pastebin. Let's look at some code!

use IRC::Client;
use Pastebin::Shadowcat;
use Mojo::UserAgent:from<Perl5>;

class Bash {
    constant $BASH_URL = 'http://bash.org/?random1';
    constant $cache    = Channel.new;
    has        $!ua    = Mojo::UserAgent.new;

    multi method irc-to-me ($ where /bash/) {
        start $cache.poll or do { self!fetch-quotes; $cache.poll };
    }

    method !fetch-quotes {
        $cache.send: $_
            for $!ua.get($BASH_URL).res.dom.find('.qt').each».all_text;
    }
}

.run with IRC::Client.new:
    :nick<MahBot>
    :host<irc.freenode.net>
    :channels<#zofbot>
    :debug
    :plugins(Bash.new)
    :filters(
        -> $text where .lines > 1 || .chars > 300 {
            Pastebin::Shadowcat.new.paste: $text.lines.join: "\n";
        }
    )

<Zoffix> MahBot, bash
<MahBot> Zoffix, <intuit> hmm maybe sumtime next week i will go outside'
<Zoffix> MahBot, bash
<MahBot> Zoffix, http://fpaste.scsys.co.uk/528741

The code that does all the filtering work is small enough that it's easy to miss—it's the last 5 lines in the program above. The :filters attribute takes a list of Callables, and here we're passing a pointy block. In its signature we constraint the text to be more than 1 line or more than 300 characters long, so our filter will be run only when those criteria are met. Inside the block, we simply use the Pastebin::Shadowcat module to throw the output onto the pastebin. Its .paste method returns the URL of the newly-created paste, which is what our filter will replace the original content with. Pretty awesome!

It Spreads Like Butter

In the past, when I used other IRC client tools, whenever someone asked me to place my bots on other servers, the procedure was simple: copy over the code to another directory, change config, and you're done. It almost made sense that a new server would mean a "new" bot: different channels, different nicknames, and so on.

In Perl 6's IRC::Client, I tried to re-imagine things a bit: a server is merely another identifier for a message, along with a channel or nickname. This means connecting your bot to multiple servers is as simple as adding new server configuration via :servers attribute:

use IRC::Client;

class BFF {
    method irc-to-me ($ where /'♥'/) { 'I ♥ YOU!' }
}

.run with IRC::Client.new:
    :debug
    :plugins(BFF)
    :nick<MahBot>
    :channels<#zofbot>
    :servers(
        freenode => %(
            :host<irc.freenode.net>,
        ),
        local => %(
            :nick<P6Bot>,
            :channels<#zofbot #perl6>,
            :host<localhost>,
        )
    )

[on Freenode server]
<ZoffixW> MahBot, I ♥ you
<MahBot> ZoffixW, I ♥ YOU!

[on local server]
<ZoffixW> P6Bot, I ♥ you
<P6Bot> ZoffixW, I ♥ YOU!

First, our plugin remains oblivious that it's being run on multiple servers. Its replies get redirected to the correct server and IRC::Client still executes its method handler in a thread-safe way.

In the IRC::Client's constructor we added :servers attribute that takes a Hash. The keys of this Hash are servers' labels and values are server-specific configurations that override global settings. So freenode server gets its :nick and :channels from the :nick and :channels attributes we give to IRC::Client, while the local server overrides those with its own values.

The debug output now has server lables printed, to indicate to which server the event applies:

And so, but simply telling the bot to connect to another server, we made it multi-server, without making any changes to our plugins. But what do we do when we do want to talk to a specific server?

Send It That Way

When the bot is .run, the Client Object changes the values of :servers attribute to be IRC::Client::Server objects. Those stringify to the label for the server they represent and we can get them either from the .server attribute of the Message Object or .servers hash attribute of the Client Object. Client Object methods such as .send or .join take an optional server attribute that controls which server the message will be sent to and defaults to value *, which means send to every server.

Here's a bot that connects to two servers and joins several channels. Whenever it sees a channel message, it forwards it to all other channels and sends a private message to user Zoffix on server designated by label local.

use IRC::Client;

class Messenger does IRC::Client::Plugin {
    method irc-privmsg-channel ($e) {
        for $.irc.servers.values -> $server {
            for $server.channels -> $channel {
                next if $server eq $e.server and $channel eq $e.channel;

                $.irc.send: :$server, :where($channel), :text(
                    "$e.nick() over at $e.server.host()/$e.channel() says $e.text()"
                );
            }
        }

        $.irc.send: :where<Zoffix>
                    :text('I spread the messages!')
                    :server<local>;
    }
}

.run with IRC::Client.new:
    :debug
    :plugins[Messenger.new]
    :nick<MahBot>
    :channels<#zofbot>
    :servers{
        freenode => %(
            :host<irc.freenode.net>,
        ),
        local => %(
            :nick<P6Bot>,
            :channels<#zofbot #perl6>,
            :host<localhost>,
        )
    }

[on Freenode server/#zofbot]
<ZoffixW> Yey!
[on local server/#zofbot]
<P6Bot> ZoffixW over at irc.freenode.net/#zofbot says Yey!
[on local server/#perl6]
<P6Bot> ZoffixW over at irc.freenode.net/#zofbot says Yey!
[on local server/ZoffixW private message queue]
<P6Bot> I spread the messages!

We subscribe to the irc-privmsg-channel event and when it's triggered, we loop over all the servers. For each server, we loop over all of the connected channels and use $.irc.send method to send a message to that particular channel and server, unless the server and channel are the same as where the message originated.

The message itself calls .nick, .channel, and .server.host methods on the Message Object to identify the sender and origin of the message.

Conclusion

Perl 6 offers powerful concurrency primitives, dispatch methods, and introspection that lets you build awesome non-blocking, event-based interfaces. One of them is IRC::Client that lets you use IRC networks. It's here. It's ready. Use it!

Perl 6 Hands-On Workshop: Weatherapp (Part 3)

Read this article on Perl6.Party

Be sure to read Part 1 and Part 2 of this workshop first.

There is black box testing, glass box testing, unit testing, integration testing, functional testing, system testing, end-to-end testing, sanity testing, regression testing, acceptance testing, load testing, stress testing, performance testing, usability testing, and many more types of testing.

I'll leave it for people with thicker glasses to explain all of the types. Today, we'll write tests that ensure our weather reporting module works as expected, and as a bonus, you get to pick your own label for what type of tests these are. Let's dive in!

TDD

TDD (Test-Driven Development) is where you write a bunch of tests before you write the actual code, ensure they fail—because code to satisfy them isn't there yet—and then you write code until the tests succeed. Now you can safely refactor your code or add new features without worrying you'll break something. Rinse and repeat.

Not only do you avoid having to convince yourself to bother writing tests after your code seems to work, you also get a feel for how comfortable your interface is to use before you even create it.

Testing Modules

Perl 6 comes with a number of standard modules included, one of which is a module called Test that we'll use. The Ecosystem also has dozens of other test related modules and we'll use two called Test::When and Test::META

Test provides all the generic testing routines we'll use, Test::When will let us watch for when the user actually agreed to run specific types of tests, and Test::META will keep an eye on the sanity of our distribution's META file (more on that later).

To install Test::When and Test::META, run zef install Test::When Test::META or panda install Test::When Test::META, depending on which module manager you're using.

Testing Files

Our testing files are named with the extension .t and go into t/ directory. They will be automatically discovered and run by module managers during installation of our module.

You are free to organize your tests under subdirectories; they will still be automatically found. It's also common to prefix the names of tests with a sequential number, e.g. 00-init.t, 01-methods.t, etc. It's more of an organizational practice and in no way should your tests in one file depend on whether tests in another file ran first.

Boilerplate

use Test;

use My::Module;
is get-stuff(), 'the right stuff', 'The stuff we received is correct';

done-testing;

# or

use Test;

plan 1;

use My::Module;
is get-stuff(), 'the right stuff', 'The stuff we received is correct';

The two versions above differ in that the first doesn't care how many tests you run and the second expects exactly one test to run. The former knows all tests ran when done-testing is called while the latter counts how many ran and complains if the count doesn't match the plan.

The version without a plan is generally easier to use, especially in a highly collaborative environment where multiple people might be adding tests to the file, so keeping an accurate test count becomes annoying. The one thing to be careful with planless method is this:

my @results = get-results;
for @results.kv -> $i, $v {
    is $v, 'expected', "result #{$i+1} is correct";
}

This test will run correctly regardless of how many results we get in @results, even if it is none! We should add an additional test that ensures @results contains the correct number of results:

is @results.elems, 5, 'got five results';

Our Files

We'll create two test files and our directory structure will look like this:

t
├── key
├── 01-use.t
├── author
│   └── 01-meta.t
└── online
    └── 01-weather-for.t

We placed our META file test into an author subdirectory because that test is useful only for us and not the user, so there's no point in having to require them to install the extra modules. The same logic should apply to other tests, like ones that test documentation completeness or any other test failing which does not mean the module itself is broken. No one wants their build to stall just because you didn't document a new experimental method, so we should avoid running those tests on the installer's machine.

Our main test file goes into online directory, as it will be run only when the installer requests online tests. The names of these subdirectories are arbitrary and their existence is purely for organizational purposes. Whether the tests are actually run is controlled by Test::When module.

Last but not least, we have the key file containing our API key. This way, we don't hardcode it into any one test, it's more obvious that this sort of data is present in our codebase, and we know where to go if we have to replace it (even if we add multiple files that need the key). Depending on the service you are using, you may choose to make the key entirely private and ask the installer to enter their own key. Some services offer tester keys or sandboxed endpoints precisely for the purposes of users running tests.

The 01-use.t and author/01-meta.t tests are rather unspectacular.

# t/01-use.t
use Test;

use-ok 'WebService::Weather';

done-testing;

We call use-ok that tests whether the module can be loaded and we give it the name of our future module as the argument. Generally, this test isn't needed, since you're going to use your module to bring in the functionality for testing anyway. In this particular case, however, all of our other tests may get skipped (installer doesn't ask for author/online tests), resulting in Result: NOTESTS output, which I don't entirely trust for all module installers to know to interpret as success.

The Meta file test is just a copy-paste from the docs, which works for any distribution:

# t/author/01-meta.t
use Test::When <author>;
use Test;
use Test::META;

meta-ok;

done-testing;

In both tests we include Test module and call done-testing at the end. In the Meta file test we've used use Test::When <author> to indicate this test is an author test and we'll need to set an environmental variable for it to run—more on that later.

Main Test

To write the main test, we'll peak into what sort of values the API returns and try to model them. We need to strike a balance between knowing we received a legit value from our subroutine or method, while not making the test so precise that it fails the minute the valid value we receive decided to wear a hat and put on makeup.

Here's the code for the test:

# t/online/01-weather-for.t
use Test::When <online>;
use Test;
use WebService::Weather;

for ('London'), ('London', 'ca') -> $args {
    subtest {
        my $result = weather-for |$args;

        isa-ok $result, 'WebService::Weather::Result',
            'result is of a correct data type';

        does-ok $result."$_"(), Numeric, "$_ is numerical"
            for <temp wind precip>;

        cmp-ok $result.temp,   &[<],  70,   'temperature is not too high';
        cmp-ok $result.temp,   &[>],  -100, 'temperature is not too low';
        cmp-ok $result.wind,   &[<],  120,  'wind speed is not too high';
        cmp-ok $result.wind,   &[>=], 0,    'wind speed is not too low';
        cmp-ok $result.precip, &[<],  3200, 'precipitation is not too high';
        cmp-ok $result.precip, &[>=], 0,    'precipitation is not too low';
    }, "Testing with args: $args";
}

isa-ok weather-for('blargs' x 12), Failure,
    'we get a Failure for unknown city';

done-testing;

We use Test::When to mark this test as requiring an active Internet connection, so the test will only run when the installer explicitly requests to do so via an environmental variable. We also use the module we'll make.

In the first for loop, we're iterating over two sets of arguments: city only and city + country. The loop executes a subtest on each iteration, delineating our results in the output nicely. When we call weather-for we Slip each set of arguments in and save the return value into our $result.

We follow the interface described in our DESIGN doc to write the tests for the result. It needs to be an object and it has .temp, .wind, and .precip methods and their values are Numeric.

The isa-ok sub tests our result is of the correct class and does-ok sub checks all of the return values do the Numeric role—note how we simply used another for loop there, to avoid duplicating the test code.

The last segment of the test uses a bunch of cmp-ok tests to check the sanity of the range of the returned values. Since we don't know what the weather is like on the day we're running the test, we can't check for the exact values. I've consulted with the list of weather records to get an idea for the range of the values we're expecting.

Lastly, outside our main for loop, we have one more test that gives weather-for a garbage city name and tests that it returns a Failure object.

We're done with our tests, so let's commit them:

git add t
git commit -m 'Write tests'
git push

Your distribution structure should look something like this now.

Extra Testing

Our tests did not test absolutely everything that can be tested. What happens when a city is an empty string? What happens when it's not a string? What happens when we give a garbage value for the country? What happens when network connection fails?

We could add that, but keep one thing in mind: tests are code and code needs maintenance. If adding a couple lines of code to your program requires you to also dig through thousands of lines of tests, you're going to have a bad day.

So how much testing is enough? It depends on the type of the software you're writing. If your software failing will result in the loss of human life (e.g. medical software) or loss of a large investment (e.g. software for space probes) you better make sure you test every possible case. On the other end, if you're writing a cowsay clone, you may scrimp on tests for the sake of easier maintenance.

Running The Tests

To run the tests, we use the prove command and pass perl6 as executable to use. Since the modules we're writing tend to live in lib/ directory, we should also pass the -I command line switch to include that directory in the module search path. We'll also tell it to find test files recursively and be verbose with its output. Thus, the full command is:

prove -e 'perl6 -Ilib' -vr t/

Where t/ is the directory with our tests, but we can give it individual test files as well. For convenience, I aliased the above command in my .bash_aliases file:

alias prove6="prove -e 'perl6 -Ilib' -vr"

And then I just use it as

prove6 t/

Try running the tests right now. Unsurprisingly, they fail!

...
# Failed test 'The module can be use-d ok'
...

These failures will be our instructions on what to do next while implementing the module, which we'll cover in the next post!

Refining the Design

At this point, we got a feel for using the code we haven't even written yet and that type of code is much cheaper to change than one we've written and shipped. Does anything feel off or awkward to use? Are we missing anything? Does anything seem redundant? If yes, we probably should alter our design.

Three things jump out with our weather module:

  • We don't know why we failed. Was the city name wrong? Did the service change and now we're not giving it the correct arguments? Was it a network error? Perhaps, we should add some exception classes and throw one of them, depending on the error.
  • We don't know whether we got the weather for the correct city. Calling with ('London') gives weather for London in Britain, but calling with ('London', 'ca') gives weather for London in Ontario, Canada. Perhaps, we could add a .location method to our result object that would return City + Country of the actual location we received the weather for.
  • An astute reader will notice we never specced how weather-for obtains the API key! There are several approaches. We can specify it on the use line or call a key subroutine and store it in a class variable—both of which will restrict your program to use just one API key. Another way may be to pass a :key named argument to weather-for or even redesign the interface to be Object Oriented, with key specified as an attribute to the WebService::Weather object.

Homework

Several problems with our code/design were brought up in this articles: we don't know how to specify the API key to use, tests don't test for everything, and we could use some extra features, such as precise failure mode indicators and providing the location of in the result.

Try to alter the design and modify the tests to accommodate that stuff.

Conclusion

Today, we broke ground by laying down the first code for our app. This code tests the functionality of the actual app code we're yet to write.

Ensuring your code works is important and having automated tests do that for you lets you modify your code without fear that you'll break something. The amount of tests you write depends on the type of your application. As tests require maintenance and you need to strike a balance between having your application work "correctly enough" and adding extra maintenance work for you.

In the next post, we'll write the actual code to fetch weather information. Get excited!

Perl 6 Hands-On Workshop: Weatherapp (Part 2)

Read this article on Perl6.Party

Be sure to read Part 1 of this workshop first.

Imagine writing 10,000 lines of code and then throwing it all away. Turns out when the client said "easy to use," they meant being able to access the app without a password, but you took it to mean a "smart" UI that figures out user's setup and stores it together with their account information. Ouch.

The last largish piece of code where I didn't bother writing design docs was 948 lines of code and documentation. That doesn't include a couple of supporting plugins and programs I wrote using it. I had to blow it all up and re-start from scratch. There weren't any picky clients involved. The client was me and in the first 10 seconds of using that code in a real program, I realized it sucked. Don't be like me.

Today, we'll write a detailed design for our weather reporting program. There are plenty of books and opinions on the subject, so I won't tell you how you should do design. I'll tell you how I do it and if at the end you decide that I'm an idiot, well... at least I tried.

The Reason

It's pretty easy to convince yourself that writing design docs is a waste of time. The reason for such a feeling is because design future-proofs the software, proving useful after months or years, and us, squishy sacks of snot and muscle, really like the type of stuff we can touch, see, and run right away. However, unless you can hold all the workings of your program entirely into your head at once, you'll benefit from jotting down the design first. Here are some of the arguments against doing so I heard from others or thought about myself:

It's more work / More time consuming

That's only true if you consider the amount of work done today or in the next couple of weeks. Unless it's a one-off script that can die after that time, you'll have to deal with new features added, current features modified, appearance of new technologies and deprecation of old ones.

If you never sat down and actively thought about how your piece of software will handle those things, it'll be more work to change them later on, because you'll have to change the architecture of already-written code and that might mean rewriting the entire program in extreme cases.

There are worse fates than a rewrite, however. How about being stuck with awful software for a decade or more? It does everything you want it to, if you add a couple of convoluted hacks no one knows how to maintain. You can't really change it, because it's a lot of work and too many things depend on it working the way it is right now. Sure, the interface is abhorrent, but at least it works. And you can pretend that piece of code doesn't really exist, until you have to add a new feature to it.

Yeah, tell it to my boss!

You tell them! Listen, if your boss tells you to write a complicated program in one hour... which parts of it would you leave unimplemented, for the client to complain about? Which parts of it would you leave buggy? Which parts of it would you leave non-secure?

Because you're doing the same thing when you don't bother with the design, don't bother with the tests, and don't bother with the documentation. The only difference is the time when people find out how screwed everyone is is further in the future, which lets you delude yourself into thinking those parts can be omitted.

Just as you would tell your boss they aren't giving you enough time in the case I described above, tell the same if you don't have the time to write down the design or the docs. If they insist the software must get finished sooner, explain to them the repercussion of omitting the steps you plan to omit, so that when shit hits the fan, it's on them.

I think better in code

This is the trap I myself used to fall into more often than I care to admit. You start writing your "design" by explaining which class goes where and which methods it has and... five minutes in you realize writing all that in code is more concise anyway, so you abandon the idea and start programming.

The cause for that is your design is too detailed on the code and not enough on the purpose and goals. The more of the design you can write without having to rely on specific details of an implementation, the more robust your application will be and, as time passes and technologies come and go, what your app is supposed to do remains clear and in human language. That's not to say there's no place for code in the design. The detailed interface is good to have and larger software should have its guts designed too. However, try to write your design as something you'd give to a competent programmer to implement, rather than step-by-step instructions that even an idiot could follow and end up with a program.

To give you a real-world example: 8–10 years ago, the biggest argument I had with other web developers was the width of the website. You see, 760–780 pixel maximum width was the golden standard, because some people had 800x600 monitor resolutions and so, if you account for the scrollbar's width, the 780 pixel website fit perfectly without horizontal scrolling. I was of the opinion that it was time for those people to move on to higher resolutions, and often used 900 pixel widths... or even 1000px, when I was feeling especially rebellious.

Now, imagine implementation-specific design docs that address that detail: "The website must be 780 pixels in width." Made sense in the past, but is completely ludicrous today. A better phrasing should've been "The website must avoid horizontal scrolling."

The benefits

Along with the aforementioned benefits of having a written design document, there are another two that are more obvious: tests and user documentation.

A well-written and complete design document is the human-language version of decent machine-language tests. It's easier to do TDD (Test Driven Development), which we'll do in the next post in this series, and your tests are less reliant on the specifics of the implementation, so that they don't falsely blow up every time you make a change.

Also, a huge chunk of your design document can be re-used for user documentation. We'll see that first-hand we we get to that part.

The Design

By this point, we have two groups of readers: those who are convinced we need a design and those who need to keep track of the line count of their programs to cry about when they have to rewrite them from scratch, (well, three groups: those who already think I'm an idiot).

We'll pop open DESIGN.md that we started in Part 1 and add our detailed design to it.

Throw Away Your Code

The best code is not the most clever, most documented, or most tested. It's the one that's easiest to throw away and replace. And you can add and remove features and react to technology changes by throwing away code and replacing it with better code. Since replacing the entire program each time is expensive, we need to construct our program out of pieces each of which is easy to throw away and replace.

Our weather program is something we want to run from a command line. If we shove all of our code into a script, we're faced with a problem tomorrow, when we decide to share our creation with our friends in a form of a web application.

We can avoid that issue by packing all functionality into a module that provides a function. A tiny script can call that function and print the output to the terminal and a web application can provide the output to the web browser.

We have another weakness on the other end of the app: the weather service we use. It's entirely out of our control whether it continues to be fast enough and cheap enough or exists at all. A dozen of now-defuct pastebin modules I wrote are a testament to how frequently a service can disappear.

We have to reduce the amount of code we'd need to replace, should OpenWeatherMap disappear. We can do that by creating an abstraction of what a weather service is like and implementing as much as we can inside that abstraction, leaving only the crucial bits in an OpenWeatherMap-specific class.

Let's write the general outline into our DESIGN.md:

# GENERAL OUTLINE

The implementation is a module that provides a function to retrieve
weather information. Currently supported service
is [OpenWeatherMap](www.openweathermap.org), but the implementation
must allow for easy replacement of services.

Details

Let's put on the shoes of someone who will be using our code and think about the easiest and least error-prone way to do so.

First, how will a call to our function look like? The API tells us all we need is a city name, and if we want to include a country, just plop its ISO code in after the city, separated with a comma. So, how about this:

my $result = weather-for 'Brampton,ca';

While this will let us write the simplest implementation—we just hand over the given argument to the API—I am not a fan of it. It merges two distinct units of information into one, so any calls where the arguments are stored in variables would have to use a join or string interpolation. Should we choose to make a specific country the default one, we'd have to mess around inspecting the given argument to see whether it already includes a country. Lastly, city names can get rather weird... what happens if a user supplies a city name with a comma in it? The API doesn't address that possibility, so my choice would be to strip commas from city names, which is easiest to do when it's a separate variable. Thus, I'll alter what the call looks like to this:

my $result = weather-for 'Brampton', 'ca';

As for the return value, we'll return a Weather::Result object. I'll go over what objects are when we write the code. For now, you can think of them as things you can send a message to (by calling a method on it) and sometimes get a useful message back in return. So, if I want to know the temperature, I can call my $t = $weather-object.temp and get a number in $t; and I don't care at all how that value is obtained.

Our generic Weather::Result object will have a method for each piece of information we're interested in: temperature, information on precipitation, and wind speed. Looking at the available information given by the API, we can merge the amount of rain and the amount of snow into a single method, and for wind I'll only use the speed value itself and not the direction, thus a potential use for our function could look like this:

printf "Current weather is: %d℃, %dmm precip/3hr, wind %dm/s\n",
    .temp, .precip, .wind given weather-for <Brampton ca>;

Looks awesome to me! Let's write all of this into our DESIGN.md:

# INTERFACE DETAILS

## EXPORTED SUBROUTINES

### `weather-for`

    my $result = weather-for 'Brampton', 'ca';

    printf "Current weather is: %d℃, %dmm precip/3hr, wind %dm/s\n",
        .temp, .precip, .wind given $result;

Takes two positional arguments—name of the city and [ISO country
code](http://www.nationsonline.org/oneworld/country_code_list.htm)—to
provide weather information for. The country is optional and by default is
not specified.

Returns a `Weather::Result` object on success, otherwise returns
a `Failure`. The object provides these methods:

#### `.temp`

    say "Current temperature is $result.temp()℃"

Takes no arguments. Returns the `Numeric` temperature in degrees Celcius.

#### `.precip`

    say "Expected to receive $result.precip()mm/3hr of percipitation";

Takes no arguments. Returns the `Numeric` amount of precipitation in
millimeters per three hours.

#### `.wind`

    say "Wind speed is $result.wind()m/s";

Takes no arguments. Returns the `Numeric` wind speed in meters per second.

Great! The interface is done. And the best thing is we can add extra methods to the object we return, to add useful functionality, which brings me to the next part:

Overengineering

It's easy for programmers to overengineer their software. Unlike building a larger house, there's no extra lumber needed to build a larger program. And it's easy to fall into the trap of adding numerious useless features to your software that make it more complicated and more difficult to maintain, without adding any measurable amount of usefulness.

Some examples are:

  • Accepting multiple types of input (Array, Hash, scalars), just because you can.
  • Returning multiple types of output, just because you can figure out what is most likely expected, based on the input or the calling context.
  • Providing both object-oriented and functional interfaces, just because some people like one or the other.
  • Adding a feature, just because it's only a couple of lines of code to add it.
  • Providing detailed settings or configuration, just because...

Note that none of the above features are inherently bad. It's the reasons for why they are added that suck. All of those items make your program more complex, which translates to: more bugs, more code to maintain, more code to write to replicate the interface should the implementation change, and last but not least, more documentation for the user to sift through! It's critical to evaluate the merits of each addition and to justify the extra cost of having it included.

My favourite example of overengineering is WeChall wargaming website. I'm pretty sure there's a button that will make that site mow my lawn... I just have to find it first:

If I have some "cool" ideas for what my module XYZ can do, I usually simply make sure they're possible to add with my current design, and then... I leave them alone until someone asks me for them.

An astute reader will notice our weather-for can only do metric units or that the wind speed doesn't include the direction, even though the API provides other units and extra information. Well, that's all our fictional client asked for. The code is easy to implement and the entire documentation fits onto half a screen.

If in the future weather-for needs to return Imperial units, we'll simply make it accept :imperial positional argument that will switch it into Imerial units mode. If we ever need wind direction as well, no problem, just add it as an extra method in Weather::Result object.

Do less. Be lazy. In programming, that's a virtue.

Our Repo

Our repository now contains completed DESIGN.md file with our design. Commit what we wrote today:

git add DESIGN.md
git commit -m 'Write detailed design'
git push

I created a GitHub repo for this project, so you can follow along and ensure you have all the files.

Homework

Amend the design to include either of these features (or both): (1) make it possible for weather-for to use both metric or Imperial units, depending on what the user wants; (2) Make it possible to give weather-for actual names for countries rather than ISO country codes.

If you're feeling particularly adventurous, design a Web application that will use our module.

Conclusion

Today, we've learned how to think about the design of software before we create it. It's useful to have the design written down in human language, as that's easier to understand and cheaper to change than code. We wrote the design for our weather applications and are now ready to get down and dirty and start writing some code. Coming up next: Tests!

Update: Part 3 is now available!

Perl 6 Hands-On Workshop: Weatherapp (Part 1)

Read this article on Perl6.Party

Welcome to the Perl 6 Hands-On Workshop, or Perl 6 HOW, where instead of learning about a specific feature or technique of Perl 6, we'll be learning to build entire programs or modules.

Knowing a bunch of method calls won't make you a good programmer. In fact, actually writing the code of your program is not where you spend most of your time. There're requirements, design, documentation, tests, usability testing, maintenance, bug fixes, distribution of your code, and more.

This Workshop will cover those areas. But by no means should you accept what you learn as authoritative commandments, but rather as reasoned tips. It's up to you to think about them and decide whether to adopt them.

Project: "Weatherapp"

In this installment of Perl 6 HOW we'll learn how to build an application that talks to a Web service using its API (Application Programming Interface). The app will tell us weather at a location we provide. Sounds simple enough! Let's jump in!

Preparation

I'll be using Linux with bash shell. If you aren't, you can get a good distro and run it in VirtualBox, or just try to find what the equivalent commands are on your OS. It should be fairly easy.

I'll also be using git for version control. It's not required that you use this type of version control and you can skip all the git commands in the text. However, using version control lets you play around with your code and not worry about breaking everything, especially when you store your repository somewhere online. I highly recommend you familiarize yourself with it.

To start, we'll create an empty directory Weatherapp and initialize a new git repository inside:

mkdir Weatherapp
cd Weatherapp
git init

Design Docs: "Why?"

Before we write down a single line of code we need to know a clear answer for what problem we're trying to solve. The statement "tell weather" is ridiculously vague. Do we need real-time, satellite-tracked wind speeds and pressures or is it enough to have temperature alone for one day for just the locations within United States? The answer will drastically change the amount of code written and the web service we'll choose to use—and some of those are rather expensive.

Let's write the first bits of our design docs: the purpose of the code. This helps define the scope of the project and lets us evaluate what tools we'll need for it and whether it is at all possible to implement.

I'll be using Markdown for all the docs. Let's create DESIGN.md file in our app's directory and write out our goal:

# Purpose

Provide basic information on the current weather for a specified location.
The information must be available for as many countries as possible and
needs to include temperature, possibility of precipitation, wind
speed, humidex, and windchill. The information is to be provided
for the current day only (no hourly or multi-day forecasts).

And commit it:

git add DESIGN.md
git commit -m 'Start basic design document'
git push

With that single paragraph, we significantly clarified what we expect our app to be able to do. Be sure to pass it by your client and resolve all ambiguities. At times, it'll feel like you're just annoying them with questions answers to which should be "obvious," but a steep price tag for your work is more annoying. Besides, your questions can often bring to light things the client haven't even though of.

Anyway, time to go shopping!

Research and Prior Art

Before we write anything, let's see if someone already wrote it for us. Searching the ecosystem for weather gives zero results, at the time of this writing, so it looks like if we want this in pure Perl 6, we have to write it from scratch. Lack of Perl 6 implementation doesn't always mean you have to write anything, however.

Use multiple languages

What zealots who endlessly diss everything that isn't their favourite language don't tell you is their closet is full of reinvented wheels, created for no good reason. In Perl 6, you can use C libraries with NativeCall, most of Perl 5 modules with Inline::Perl5, and there's a handful of other Inlines in the ecosystem, including Python and Ruby. When instead of spending several weeks designing, writing, and testing code you can just use someone's library that did all that already, you are the winner!

That's not to say such an approach is always the best one. First, you're adding extra dependencies that can't be automatically installed by panda or zef. The C library you used might not be available at all for the system you're deploying your code on. Inline::Perl5 requires perl compiled with -fPIC, which may not be the case on user's box. And your client may refuse to involve Python without ever giving you a reason why. How to approach this issue is a decision you'll have to make yourself, as a programmer.

Steal Borrow Ideas

Even if you choose not to include other languages (and we won't, for the purposes of this tutorial), it's always a good idea to take a look at prior art. Not only can we see how other people solved similar problems and use their ideas, but for our program we can also see which weather Web service people tend to use.

The two pieces we'll examine are Perl 5's Weather::Underground and Weather::OpenWeatherMap. They use different services, return their results in different formats (Perl 5's native data structures vs. objects), and the data contains varying amounts of detail.

I like ::OpenWeatherMap's approach of returning results as objects, since the data can be abstracted and we can add useful methods if we ever need to, however, traversing its documentation is more difficult than that of ::Underground—even more so for someone not overly familiar with Object Orientation. So in our program, I think we can return objects, but we'll have fewer of them. We'll think more about that during design stage.

Also, the implementations suggest none of the two Web services offer humidex or windchill values. We'll ask our client how badly they need those values and try to find another service, if they are absolutely required. A more common case will be where the more expensive plan of the service offers a feature while the free or less expensive doesn't. The client will have to decide how much they wish to splurge in that case.

Weather::Underground's service seems to offer more types of data, so let's look at it first. Even if we don't need those right now, we might in the future. The website is pretty slow, has two giant ads, has poor usability, and the availability of the API isn't apparent right away (the link to it is in the footer). While those aren't direct indicators of the quality of the service, there tends to be at least some correlation.

When we get to the API service level options, we see the free version has rather low limits: 10 requests per minute up to a maximum of 500 per day. If you actually try to sign up to the site, you'll encounter a bug where you have to fill out your app's details twice. And the docs? They aren't terrible, but it took me a bit to find the description of request parameters. Also, none of the response parameters are explained and we're left to wonder what estimated is and what its properties are when it's not empty, for example. The API actually does offer windchill and "heat index," but in a sample response their values are "N/A". Are they ever available? Overall, I'd try to avoid this service if I have a choice.

Up next, Weather::OpenWeatherMap's service—www.openweathermap.org. Nicer and faster website, unintrusive ads, the API link is right in the navigation and leads to a clear summary of the APIs offered. The free version limits are much better too: 60 requests per minute, without a daily limit. Signing up for the API key is simpler as well—no annoying email confirmations. And docs are excellent. Even though humidex and wind chill aren't available, the docs explicitly state how many worldwide weather stations the site offers, while Wunderground's site mentions worldwidedness as an afterthought inside a broken <dfn> hovering over which pops up Definition not found message.

The winner is clear: OpenWeatherMap. Sign up for an account and generate an API key for us to use. You can use a throwaway email address for registration, if you prefer.

Alternatively, try finding yet another web service that's better suited for our weather application!

Homework

By choosing OpeanWeathMap service we had to abandon providing humidex and wind chill in our data that we originally wrote in our design doc. We pretended our client OKed changing the requirements of the app, so we need to update our docs to reflect that.

You can also update it to reflect some other service you found. Perhaps, don't mention the specific data types, but rather the purpose of the weather information. Is it for a city dweller to know what to wear in the morning? Is it for a farmer to know when to sow the crops? Or is the data to be used in a research project?

Conclusion

Today, we started our design docs by defining the scope of our application. We then looked at prior art written in Perl 6 (found none) and other languages. We evaluated to services that provide weather data on their potential quality, reliability, query limits, and feature sets.

At this point we have: start of the design doc, chosen service provider, and API key for it. In the next post, we'll write detailed design and tests for our app.

See you then!

Update: Part 2 is now available!

Perl 6 .polymod: Break Up a Number Into Denominations

Read this article on Perl6.Party and play with code examples right in your browser!

Back in the day, I wrote Perl 5 module Number::Denominal that breaks up a number into "units," say, 3661 becomes '1 hour, 1 minute, and 1 second'. I felt it was the pinnacle of achievement and awesome to boot. Later, I ported that module to Perl 6, and recently I found out that Perl 6 actually has .polymod method built in, which makes half of my cool module entirely useless.

Today, we'll examine what .polymod does and how to use it. And then I'll talk a bit about my reinvented wheel as well.

Denominated

The .polymod method takes a number of divisors and breaks up its invocant into pieces:

my $seconds = 1 * 60*60*24 # days
            + 3 * 60*60    # hours
            + 4 * 60       # minutes
            + 5;           # seconds

say $seconds.polymod: 60, 60;
say $seconds.polymod: 60, 60, 24;

# OUTPUT:
# (5 4 27)
# (5 4 3 1)

The divisors we pass as arguments in this case are time related: 60 (seconds per minute), 60 (minutes per hour), and 24 (hours in a day). From the smallest unit, we're progressing to the largest one, with the numbers being how many of the unit in question fit into the next larger unit.

Matching up the output to the expression we assigned to $seconds we can see that output also progresses—same as input divisors—from smallest unit to largest: 5 seconds, 4 minutes, 3 hours, and 1 day.

Notice how in the first call, we did not specify a divisor for hours-in-a-day, and so we got our days expressed as hours (24 hours for one day, plus the 3 hours we had originally). So this form of .polymod simply uses up all the divisors and the number of returned items is one more than the number of given divisors.

Handmade

Another code example useful to understanding of .polymod is one showing the previous calculation done with a loop instead, without involving .polymod:

my $seconds = 2 * 60*60*24 # days
            + 3 * 60*60    # hours
            + 4 * 60       # minutes
            + 5;           # seconds

my @pieces;
for 60, 60, 24 -> $divisor {
    @pieces.push: $seconds mod $divisor;
    $seconds div= $divisor
}
@pieces.push: $seconds;

say @pieces;

# OUTPUT:
# [5 4 3 2]

For each of the divisors, we take the remainder of integer division of $seconds and the divisor being processed and then change the $seconds to the integer divison between $seconds and that divisor.

To Infinity and Beyond!

Perl 6 is advanced enough to have infinite things in it without blowing up and that's accomplished with lazy lists. When the divisors given to .polymod method are in a lazy list, it'll run until the remainder is zero and not through the whole list:

say 120.polymod:      10¹, 10², 10³, 10⁴, 10⁵;
say 120.polymod: lazy 10¹, 10², 10³, 10⁴, 10⁵;
say 120.polymod:      10¹, 10², 10³ … ∞;

# OUTPUT:
# (0 12 0 0 0 0)
# (0 12)
# (0 12)

In the first call, we have a series of numbers increasing by a power of 10. The output of that call includes 4 trailing zeros, because .polymod evaluated each divisor. In the second call, we explicitly create a lazy list using lazy keyword and now we have just two items in the returned list.

The first divisor (10) results in zero remainder, which is our first item in the returned list, and integer division changes our 120 to just 12 for the next divisor. The remainder of division of 12 by 100 is 12, which is our second item in the returned list. Now, integer division of 12 by 100 is zero, which stops the execution of .polymod and gives us our two-item result.

In the last call, we use an ellipsis, which is the sequence operator, to create the same series of numbers increasing by a power of 10, except this time that series is infinite. Since it's lazy, the result is, once again, just two elements.

Zip It, Lock It, Put It In The Pocket

Numbers alone are great and all, but aren't too descriptive about the units they represent. Let's use a Zip meta operator, to fix that issue:

my @units  = <ng μg mg g kg>;
my @pieces = 42_666_555_444_333.polymod: 10³ xx ∞;

say @pieces Z~ @units;
# OUTPUT:
# (333ng 444μg 555mg 666g 42kg)

For the purposes of our calculation, I'll be breaking up forty two trillion, six hundred sixty six billion, five hundred fifty five million, four hundred forty four thousand, three hundred and thirty three (😸) nanograms into several larger units.

We store unit names in the array @units. Then, we call .polymod on our huge number and give it an infinite list with number 1000 for each divisor and store what it gives us in @pieces.

The Zip meta operator one-by-one takes elements from lists on the left and right hand sides and applies to them the operator given to it. In this case, we're using the string concatenation operator (~), and thus our final result is a list of strings with numbers and units.

That Denominated Quickly

You're not limited to just Ints for both the invocant and the divisors, but can use others too. In this mode, regular division and not integer one will be used with the divisors and the remainder of the division will be simply subtracted. Note that this mode is triggered by the invocant not being an Int, so if it is, simply coerce it into a Rat, a Num, or anything else that does the Real role:

say ⅔.polymod: ⅓;

say 5.Rat.polymod: .3, .2;
say 3.Rat.polymod: ⅔, ⅓;

# OUTPUT:
# (0 2)
# (0.2 0 80)
# (0.333333 0 12)

In the first call, our invocant is already a Rat, so we can just call .polymod and be done with it. In the second and third, we start off with Ints, so we coerce them into Rats. The reason I didn't use a Num here is because it adds floating point math noise into the results, which Rats can often avoid:

say 5.Num.polymod: .3, .2;
say 3.Num.polymod: ⅔, ⅓;

# OUTPUT:
# (0.2 0.199999999999999 79)
# (0.333333333333333 2.22044604925031e-16 12)

This imprecision of floating point math is also something to be very careful about when using lazy list mode of .polymod, since it may never reach exact zero (at least at the time of this writing). For example, on my machine this is a nearly infinite loop as the numbers fluctuate wildly. Change put to say to print the first hundred numbers:

put 4343434343.Num.polymod: ⅓ xx ∞

Making it Human

All we've seen so far is nice and useful, but Less Than Awesome when we want to present the results to squishy humans. Even if we use the Zip meta operator to add the units, we're still not handling the differences between singular and plural names for units, for example. Luckily, some crazy guy wrote a module to help us: Number::Denominate.

use Number::Denominate;

my $seconds = 1 * 60*60*24 # days
            + 3 * 60*60    # hours
            + 4 * 60       # minutes
            + 5;           # seconds

say denominate $seconds;
say denominate $seconds, :set<weight>;

# OUTPUT:
# 1 day, 3 hours, 4 minutes, and 5 seconds
# 97 kilograms and 445 grams

By default, the module uses time units and the first call to denominate gives us a nice, pretty string. Several sets of units are pre-defined and in the second call we use the weight unit set.

You can even define your own units:

say denominate 449, :units( foo => 3, <bar boors> => 32, 'ber' );

# OUTPUT:
# 4 foos, 2 boors, and 1 ber

The module offers precision control and a couple of other options, and I encourage you to check out the docs if denominating things is what you commonly do.

Conclusion

Perl 6's built-in .polymod method is a powerful tool for breaking up numbers into denominations. You can use it on Ints or other types of numbers, with latter allowing for use of non-integer divisors. You can alter the mode of its operation by providing the divisors as an infinite list. Lastly, module Number::Denominate can assist with presenting your denominated number in a human-friendly fashion.

Enjoy!

"Anguish": Invisible Programming Language and Invisible Data Theft

DISCLAIMER: data theft is a serious crime in many jurisdictions. The author does not condone or encourage anyone to break laws. The information provided here is for educational purposes only.

PART I: Anguish: The Invisible Programming Language

You may be familiar with funky esoteric languages like Ook or even Whitespace. Those are fun and neat, but I've decided to dial up the crazy a notch and make a completely invisible programming language!

I named it Anguish and, based on my quick googling, I may be a lone wolf at this depth of insanity. In this article, I'll describe the language, go over my implementation of its interpreter, and then talk about some security implications that come with invisible code.

The Code

Here's an Anguish program that prints Hello World:

‌⁣‌⁣‍⁣⁣⁡⁢​⁠‌‍⁣⁣‍⁡⁡⁡⁡⁡⁡⁡⁡‌⁠⁡⁡⁡⁡‌⁠⁡⁡⁠⁡⁡⁡⁠⁡⁡⁡⁠⁡​​​​⁢‍⁠⁡⁠⁡⁠⁢⁠⁠⁡‌​‍​⁢‍⁠⁠⁣⁠⁢⁢⁢⁣⁡⁡⁡⁡⁡⁡⁡⁣⁣⁡⁡⁡⁣⁠⁠⁣​⁢⁣​⁣⁡⁡⁡⁣⁢⁢⁢⁢⁢⁢⁣⁢⁢⁢⁢⁢⁢⁢⁢⁣⁠⁠⁡⁣⁠⁡⁡⁣

Here's another one that reads in a 4-character string and prints it back out:

⁠⁠⁠​​​⁣⁠⁣⁠⁣⁠⁣

Here's code for a full-featured web browser:

OK, the last one I lied about, but the first two are real programs and, if your Unicode support is decent, completely invisible to the human eye (as opposed to, say, spaces and tabs, which are "transparent").

Anguish is based on Brainf#%k (BF) except instead of using visible characters, it uses invisible ones. This also means we can easily convert any BF program into an Anguish one using this simple one-liner:

perl -C -pi -e 'tr/><+.,[]-/\x{2060}\x{200B}\x{2061}\x{2063}\x{FEFF}\x{200C}\x{200D}\x{2062}/'

Here's the character mapping I chose with BF operators on the left and Anguish versions of them on the right:

>   [⁠] U+2060 WORD JOINER [Cf]
<   [​] U+200B ZERO WIDTH SPACE [Cf]
+   [⁡] U+2061 FUNCTION APPLICATION [Cf]
-   [⁢] U+2062 INVISIBLE TIMES [Cf]
.   [⁣] U+2063 INVISIBLE SEPARATOR [Cf]
,   [] U+FEFF ZERO WIDTH NO-BREAK SPACE [Cf]
[   [‌] U+200C ZERO WIDTH NON-JOINER [Cf]
]   [‍] U+200D ZERO WIDTH JOINER [Cf]

These are—by far—not the only invisible Unicode characters and my choice was more or less arbitrary. However, most of the ones I chose can actually be abused into Perl 6 terms and operators, which I'll show in Part II.

The Interpreter

For the interpreter, I chose the awesome Perl 6 programming language and I merely copied over the guts of my Inline::Brainf#%k Perl 6 module and changed it to look for Anguish characters.

You can get the full distro in my repo. Here, I'm replicating the main code:

01: unit module Acme::Anguish;
02: use Term::termios;
03:
04: sub anguish (Str:D $code) is export {
05:     check-matching-loop $code;
06:     my $saved-term;
07:     try {
08:         $saved-term = Term::termios.new(:1fd).getattr;
09:         given Term::termios.new(:1fd).getattr {
10:             .makeraw;
11:             .setattr(:DRAIN);
12:         }
13:     };
14:     LEAVE { try $saved-term.setattr(:DRAIN) }
15:
16:     my @code    = $code.NFC.map(*.chr).grep:
17:                     * eq "\x2062"|"\x200B"|"\x2060"
18:                         |"\x2061"|"\x2063"|"\xFEFF"|"\x200C"|"\x200D";
19:     my $ꜛ       = 0;
20:     my $cursor  = 0;
21:     my $stack   = Buf[uint8].new: 0;
22:     loop {
23:         given @code[$cursor] {
24:             when "\x2060" { $stack.append: 0 if $stack.elems == ++$ꜛ;       }
25:             when "\x200B" { $ꜛ--; fail "Negative cell pointer\n" if $ꜛ < 0; }
26:             when "\x2061" { $stack[$ꜛ]++;               }
27:             when "\x2062" { $stack[$ꜛ]--;               }
28:             when "\x2063" { $stack[$ꜛ].chr.print;       }
29:             when "\xFEFF" { $stack[$ꜛ] = $*IN.getc.ord; }
30:             when "\x200C" {
31:                 $cursor++; next if $stack[$ꜛ];
32:                 loop {
33:                     state $level = 1;
34:                     $level++ if @code[$cursor] eq "\x200C";
35:                     $level-- if @code[$cursor] eq "\x200D";
36:                     last unless $level;
37:                     $cursor++;
38:                 }
39:             }
40:             when "\x200D" {
41:                 unless $stack[$ꜛ] { $cursor++; next; }
42:                 loop {
43:                     state $level = 1;
44:                     $cursor--;
45:                     $level-- if @code[$cursor] eq "\x200C";
46:                     $level++ if @code[$cursor] eq "\x200D";
47:                     last unless $level;
48:                 }
49:             }
50:         }
51:         last if ++$cursor > @code.elems;
52:     }
53: }
54:
55: sub check-matching-loop ($code) {
56:     my $level = 0;
57:     for $code.NFC.map: *.chr {
58:         $level++ if $_ eq "\x200C";
59:         $level-- if $_ eq "\x200D";
60:         fail qq{Closing "\\x200D" found without matching "\\x200C"\n}
61:             if $level < 0;
62:         LAST { fail 'Unmatched \\x200C \\x200D' if $level > 0 }
63:     }
64: }

On line 5 (and 55-64), we simply check our loops are matching. Lines 7-14 set the terminal into non-buffering mode so we can read input by characters. On lines 16-21, we prepare our code, stack, and pointers. And the loop on lines 22-52 simply iterates over the Anguish code and does things according to the operator being processed.

One thing to note is lines 16-18, as well as line 57. You'll notice the curious use of .NFC method. It converts our input code into Normal Form Composed.

Perl 6 has advanced Unicode support and, under normal use, the characters we're attempting to go over would be made into graphemes in strings and some of the codepoints we're abusing would get "merged" together when we loop over them. The same would happen with my .grep on line 16, had I used a regex, as in my BF interpreter. To avoid the creation of graphemes, I used eq against a Junction instead.

This wraps it up for the Anguish language and those with intent can go and try to write a full-featured browser in it now. As for the rest of us, let's abuse our invisible Unicode chars some more and steal some data!

PART II: Invisible Data Theft

The beauty of the invisible Anguish characters we used is they aren't "spacey", but are formatting characters. This means in Perl 6 we can abuse them and create invisible terms and operators. The innocuous version may look rather cute:

sub infix:<⁣> { $^a + $^b };
say 2⁣2;

# OUTPUT:
# 4

Here is where I placed the INVISIBLE SEPARATOR character that produced the effect:

sub infix:<<U+2063>> { $^a + $^b };
say 2<U+2063>2;

If we now consider the expression:

my $x = 42;

We can silently add code to that expression that will steal the assigned value. We'll create a very loose invisible prefix operator and pop it at the start of the line. Let's observe the results:

sub prefix:<⁣> is tighter(&infix:<or>) { say $^a };
⁣my $x = 42;

# OUTPUT
# 42

Again, here's the visible version of the program, with the placement of the invisible char included:

sub prefix:<<U+2063>> is tighter(&infix:<or>) { say $^a };
<U+2063>my $x = 42;

Let's get evil!

Exporting Malicious Operators

Now, if we just plop down our data thieving code in the middle of an important piece of software, someone will likely notice it. Instead, we'll insert it into and export from some auxiliary module no one's likely to start poking in. We'll also disguise our code with a clever comment to make it look rather innocent:

# In SomethingInnocent.pm6:
unit module SomethingInnocent;

... code ...

# Debugging helper
sub prefix:<⁣> is tighter(&infix:<or>) is export {spurt 'DEBUG.txt', $^a, :append};

... code ...

It's a debug helper and it just prints into a DEBUG.txt file. Feels like something that could easily slip in. Once again, I'm using U+2063 character for the name of the operator. Alright, now we're set to steal some data from an important piece of code:

# In ReallyImportantAndSecretCode.p6
use SomethingInnocent;
⁣my $credit_card = '3333-4444-4444-4444'; # pretend this is coming in from DB

As with the earlier example, I've inserted U+2063 character right before my in this code. It's our malicious operator that gets automatically imported from SomethingInnocent. When the code is run, our operator gets called with the value of $credit_card and we dump it to our secret file DEBUG.txt. Data theft completed.

Wait a minute! What about git commits?

It's true, the change we made in ReallyImportantAndSecretCode.p6 will show up as a changed line in the commit... but it's a change involving an invisible character. Depending on the tooling used, it might just look like ditched whitespace at the end of the line. It's certainly not something I'd pay too much attention to were I reviewing the commit. While my command line tools revealed the invisible characters as their Unicode numbers, here's what my adding a bunch of invisible characters to text looks like on GitHub:

Conclusion

Anguish is a language for true computer masochists who would love to question whether their program actually exists. However, the language does point out to us the reality of existence of Unicode characters that make sense in one domain but are outright dangerous in another. We already avoid some abusable characters in domain names and it's time to apply the same practice in other domains, such as programming languages.

Perl 6 Is Slower Than My Fat Momma!

I notice several groups of people: folks who wish Perl 6's performance weren't mentioned; folks who are confused about Perl 6's perfomance; folks who gleefully chuckle at Perl 6's performance, reassured the threat to their favourite language XYZ hasn't arrived yet.

So I'm here to talk about the elephant in the room and get the first group out of hiding and more at ease, I'll explain things to the second group, and to the third group... well, this post isn't about them.

Why is it slow?

The simplest answer: Perl 6 is brand new. It's not the next Perl, but a brand new language in the Perl family. The language spec was finished less than 4 months ago (Dec 25, 2015). While some optimization has been done, the core team focused on getting things right first. It's simply unrealistic to evaluate Perl 6's performance as that of an extremely polished product at this time.

The second part of the answer: Perl 6 is big. It's easy to come up with a couple of one-liners that are much faster in other languages. However, a Perl 6 one-liner loads the comprehensive object model, list tools, set tools, large arsenal of async and concurrency tools... When in a real program you have to load a dozen of modules in language XYZ, but can still stay with bare Perl 6 to get same features, that's when performance starts to even out.

What can you do about it?

Now that we got things right, we can focus on making them fast. Perl 6 uses a modern compiler, so in theory it can be optimized quite a lot. It remains to be seen whether theory will match reality, but looking through numerous optimization commits made since the start of 2016, many stand out by the boosts they bring in:

Thus, the answer is: we're working on it... and we're making good progress.

What can I do about it?

I'll mention three main things to keep in mind when trying to get your code to perform better: pre-compilation, native types, and of course, concurrency.

Pre-Compilation

Currently, a large chunk of slowness you may notice comes from parsing and compiling code. Luckily, Perl 6 automagically pre-compiles modules, as can be seen here, with a large Foo.pm6 module I'm including:

$ perl6 -I. -MFoo --stagestats -e ''
Stage start      :   0.000
Stage parse      :   4.262
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   0.002
Stage mast       :   0.013
Stage mbc        :   0.000
Stage moar       :   0.000

$ perl6 -I. -MFoo --stagestats -e ''
Stage start      :   0.000
Stage parse      :   0.413
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   0.002
Stage mast       :   0.013
Stage mbc        :   0.000
Stage moar       :   0.000

The first run was a full run that pre-compiled my module, but the second one already had the pre-compiled Foo.pm6 available and the parse stage went down from 4.262 seconds to 0.413: a 1031% start-up improvement.

Modules you install from the ecosystem get pre-compiled during installation, so you don't have to worry about them. When writing your own modules, however, they will be automatically re-pre-compiled every time you change their code. If you make a change before each time you run the program, it's easy to get the impression your code is not performing well, even though the compilation penalty won't affect the program once you're done tinkering with it.

Just keep that in mind.

Native Types

Perl 6 has several "native" machine types that can offer performance boosts in some cases:

my Int $x = 0;
$x++ while $x < 30000000;
say now - INIT now;

# OUTPUT:
# 4.416726

my int $x = 0;
$x++ while $x < 30000000;
say now - INIT now;

# OUTPUT:
# 0.1711660

That's a 2580% boost we achieved by simply switching our counter to a native int type.

The available types are: int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, num, num32, and num64. The number in the type name signifies the available bits, with the numberless types being platform-dependent.

They aren't a magical solution to every problem, and won't offer huge improvements in every case, but keep them in mind and look out for cases where they can be used.

Concurrency

Perl 6 makes it extremely easy to utilize multi-core CPUs using high-level APIs like Promises, Supplies, and Channels. Where language XYZ is fast, but lacks ease of concurrency, Perl 6 can end up the winner in peformance by distributing work over multiple cores.

I won't go into details—you can consult the documentation or watch my talk that mentions them (slides here). I will show an example, though:

await (
    start { say "One!";   sleep 1; },
    start { say "Two!";   sleep 1; },
    start { say "Three!"; sleep 1; },
);
say now - INIT now;

# OUTPUT:
# One!
# Three!
# Two!
# 1.00665192

We use the start keyword to create three Promises and then use the await keyword to wait for all of them to complete. Inside our Promises, we print out a string and then sleep for at least one second.

The result? Our program has three operations that take at least 1 second each, yet the total runtime was just above 1 second. From the output, we can see it's not in order, suggesting code was executed on multiple cores.

That was quite easy, but we can crank it up a notch and use a HyperSeq to transform ordinary code into concurrent code with a single method call:

for (1..4).race( batch => 1 ) {
    say "Doing $_";
    sleep 1;
}
say "Code took {now - INIT now} seconds to run";

# OUTPUT:
# Doing 1
# Doing 3
# Doing 2
# Doing 4
# Code took 1.0090415 seconds to run

We had a list of 4 items to work with. We looped over each of them and performed an expensive operation (in this case, a 1-second sleep). To modify our code to be faster, we simply called the .race method on our list of 4 items to get a Hyper Sequence. Our loop remains the same, but it's now executing in a concurrent manner, as can be seen from the output: items are out of order and our total runtime was just over 1 second, despite a total of 4 seconds of sleep.

If the default batch size of 64 is suitable for you, it means you can go from a plain loop to a concurrent loop by simply typing 5 characters (. r a c e).

Let's See Some Benchmarks

I won't show you any. There's hardly any sense in benchmarking entire languages. Clever one-liners can be written to support one point of view or another, but they simply abstract a problem into a simplistic singularity. Languages are different and they have vastly different tool kits to solve similar problems. Would you choose code that completes in 1 second and takes you 40 minutes to write or code that completes in 2 seconds, yet takes you 10 minutes to write? The choice depends on the type of application you're writing.

Conclusion

Perl 6 is a brand new product, so it doesn't make sense to compare it against software that existed for decades. It is being actively improved and, at least in theory, it should become performant on the level similar to other competing languages.

You don't have to wait for that to happen, however. Thanks to Perl 6's pre-compilation of modules, support of native types, and superb concurrency primitives you can substantially improve the performance of your code right now.

Some may disagree that Perl 6 is slow, some may find it faster than another language, and some may say Perl 6 is slower than my fat momma.

Who's to decide for you? Only you yourself can.

Perl 6 Types: Made for Humans

In my first college programming course, I was taught that Pascal language has Integer, Boolean, and String types among others. I learned the types were there because computers were stupid. While dabbling in C, I learned more about what int, char, and other vermin looked like inside the warm, buzzing metal box under my desk.

Perl 5 didn’t have types, and I felt free as a kid on a bike, rushing through the wind, going down a slope. No longer did I have to cram my mind into the narrow slits computer hardware dictated me to. I had data and I could do whatever I wanted with it, as long as I didn’t get the wrong kind of data. And when I did get it, I fell off my bike and skinned my knees.

With Perl 6, you can have the cake and eat it too. You can use types or avoid them. You can have broad types that accept many kinds of values or narrow ones. And you can enjoy the speed of types that represent the mind of the machine, or you can enjoy the precision of your own custom types that represent your mind, the types made for humans.

Gradual Typing

my       $a = 'whatever';
my Str   $b = 'strings only';
my Str:D $c = 'defined strings only';
my int   $d = 16; # native int

sub foo ($x) { $x + 2 }
sub bar (Int:D $x) returns Int { $x + 2 }

Perl 6 has gradual typing, which means you can either use types or avoid them. So why bother with them at all?

First, types restrict the range of values that can be contained in your variable, accepted by your method or sub or returned by them. This functions both as data validation and as a safety net for garbage data generated by incorrect code.

Also, you can get better performance and reduced memory usage when using native, machine-mind types, providing they’re the appropriate tool for your data.

Built-In Types

There’s a veritable smörgåsbord of built-in types in Perl 6. If the thing your subroutine does makes sense to be done only on integers, use an Int for your parameters. If negatives don’t make sense either, limit the range of values even further and use a UInt—an unsigned Int. On the other hand, if you want to handle a broader range, Numeric type may be more appropriate.

If you want to drive closer to the metal, Perl 6 also offers a range of native types that map into what you’d normally find with, say, C. Using these may offer performance improvements or lower memory usage. The available types are: int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, num, num32, and num64. The number in the type name signifies the available bits, with the numberless types being platform-dependent.

Sub-byte types such as int1, int2, and int4 are planned to be implemented in the future as well.

Smileys

multi foo (Int:U $x) { 'Y U NO define $x?'         }
multi foo (Int:D $x) { "The square of $x is {$x²}" }

my Int $x;
say foo $x;
$x = 42;
say foo $x;

# OUTPUT:
# Y U NO define $x?
# The square of 42 is 1764

Smileys are :U, :D, or :_ appended to the type name. The :_ is the default you get when you don’t specify a smiley. The :U specifies undefined values only, while :D specifies defined values only.

This can be useful to detect whether a method is called on the class or on the instance by having two multies with :U and :D on the invocant. And if you work at a nuclear powerplant, ensuring your rod insertion subroutine never tries to insert by an undefined amount is also a fine thing, I imagine.

Subsets: Tailor-Made Types

Built-in types are cool and all, but most of the data programmers work with doesn’t match them precisely. That’s where Perl 6 subsets come into play:

subset Prime of Int where *.is-prime;
my Prime $x = 3;
$x = 11; # works
$x = 4;  # Fails with type mismatch

Using the subset keyword, we created a type called Prime on the fly. It’s a subset of Int, so anything that’s non-Int doesn’t fit the type. We also specify an additional restriction with the where keyword; that restriction being that .is-prime method called on the given value must return a true value.

With that single line of code, we created a special type and can use it as if it were built-in! Not only can we use it to specify the type of variables, sub/method parameters and return values, but we can test arbitrary values against it with the smartmatch operator, just as we can with built-in types:

subset Prime of Int where *.is-prime;
say "It's an Int"  if 'foo' ~~ Int;   # false, it's a Str
say "It's a prime" if 31337 ~~ Prime; # true, it's a prime number

Is your “type” a one-off thing you just want to apply to a single variable? You don’t need to declare a separate subset at all! Just use the where keyword after the variable and you’re good to go:

multi is-a-prime (Int $ where *.is-prime --> 'Yup' ) {}
multi is-a-prime (Any                    --> 'Nope') {}

say is-a-prime 3;     # Yup
say is-a-prime 4;     # Nope
say is-a-prime 'foo'; # Nope

The --> in the signature above is just another way to indicate the return type, or in this case, a concrete returned value. So we have two multies with different signatures. First one takes an Int that is a prime number and the second one takes everything else. With exactly zero code in the bodies of our multies we wrote a subroutine that can tell you whether a number is prime!!

Pack it All Up for Reuse

What we’ve learned so far is pretty sweet, but sweet ain’t awesome! You may end up using some of your custom types quite frequently. Working at a company where product numbers can be at most 20 characters, following some format? Perfect! Let’s create a subtype just for that:

subset ProductNumber of Str where { .chars <= 20 and m/^ \d**3 <[-#]>/ };
my ProductNumber $num = '333-FOOBAR';

This is great, but we don’t want to repeat this subset stuff all over the place. Let’s shove it into a separate module we can use. I’ll create /opt/local/Perl6/Company/Types.pm6 because /opt/local/Perl6 is the path included in module search path for all the apps I write for this fictional company. Inside this file, we’ll have this code:

unit module Company::Types;
my package EXPORT::DEFAULT {
    subset ProductNumber of Str where { .chars <= 20 and m/^ \d**3 <[-#]>/ };
}

We name our module and let our shiny subsets be exported by default. What will our code look like now? It’ll look pretty sweet—no, wait, AWESOME—this time:

use Company::Types;
my ProductNumber $num1 = '333-FOOBAR'; # succeeds
my ProductNumber $num2 = 'meow';       # fails

And so, with a single use statement, we extended Perl 6 to provide custom-tailored types for us that match perfectly what we want our data to be like.

Awesome Error Messages for Subsets

If you’ve been actually trying out all these examples, you may have noticed a minor flaw. The error messages you get are Less Than Awesome:

Type check failed in assignment to $num2;
expected Company::Types::EXPORT::DEFAULT::ProductNumber but got Str ("meow")
in block <unit> at test.p6 line 3

When awesome is the goal, you certainly have a way to improve those messages. Pop open our Company::Types file again, and extend the where clause of our ProductNumber type to include an awesome error message:

subset ProductNumber of Str where {
    .chars <= 20 and m/^ \d**3 <[-#]>/
        or warn 'ProductNumber type expects a string at most 20 chars long'
            ~ ' with the first 4 characters in the format of \d\d\d[-|#]'
};

Now, whenever the thing doesn’t match our type, the message will be included before the Type check... message and the stack trace, providing more info on what sort of stuff was expected. You can also call fail instead of warn here, if you wish, in which case the Type check... message won’t be printed, giving you more control over the error the user of your code receives.

Conclusion

Perl 6 was made for humans to tell computers what to do, not for computers to restrict what is possible. Using types catches programming errors and does data validation, but you can abstain from using types when you don’t want to or when the type of data you get is uncertain.

You have the freedom to refine the built-in types to represent exactly the data you’re working with and you can create a module for common subsets. Importing such a module lets you write code as if those custom types were part of Perl 6 itself.

The Perl 6 technology lets you create types that are made for Humans. And it’s about time we started telling computers what to do.

UPDATE:

Perl 6 will actually evaluate your where expression when checking types even for optional parameters. This can be quite annoying, due to “uninitialized” values being compared. I wrote Subset::Helper to make it easier to create subsets that solves that issue, and it provides an easy way to add awesome error messages too.

About Zoffix Znet

user-pic I blog about Perl.