April 2016 Archives

Perl 6: The S/// Operator

Coming from a Perl 5 background, my first experience with Perl 6's non-destructive substitution operator S/// looked something like this:

(artist's impression)

You'll fare better, I'm sure. Not only have the error messages improved, but I'll also explain everything right here and now.

The Smartmatch

I had issues because, seeing familiar-looking operators, I simply translated Perl 5's binding operator (=~) to Perl 6's smartmatch operator (~~) and expected things to work. The S/// operator was not documented and, combined with the confusing (at the time) warning message, this was the source of my pain:

my $orig = 'meowmix';
my $new = $orig ~~ S/me/c/;
say $new;

# OUTPUT warning:
# Smartmatch with S/// can never succeed

The old warning suggests the ~~ operator is the wrong choice here and it is. The ~~ isn't the equivalent of Perl 5's =~. It aliases the left hand side to $_, evaluates the right hand side, and then calls .ACCEPTS($_) on it. That is all there is to its magic.

So what's actually happening in the example above:

  • By the time we get to S///, $orig is aliased to $_
  • The S/// non-destructively executes substitution on $_ and returns the resulting string. This is what the smartmatch will operate on
  • The smartmatch, following the rules for match of Str against Str, will give True or False depending on whether substitution happened (True, confusingly, meaning it didn't)

At the end of it all, we aren't getting what we actually want: the version of the string with substitution applied.
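To make those three steps concrete, here is a rough hand-written expansion of the smartmatch above. The names $rhs and $result are mine, for illustration only, and this is a simplification of what the compiler really does:

```perl6
my $orig = 'meowmix';

# Roughly what `$orig ~~ S/me/c/` does, spelled out by hand
my $result = do given $orig {     # alias the left-hand side to $_
    my $rhs = S/me/c/;            # evaluate the right-hand side: 'cowmix'
    $rhs.ACCEPTS($_);             # Str against Str is a plain string comparison
};

say $result;
# False, because 'cowmix' is not equal to 'meowmix'
```

The substituted string exists only for a moment as $rhs and is then thrown away in favour of a Bool.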

With The Given

Now that we know that S/// always works on $_ and returns the result, it's easy to come up with a whole bunch of ways that set $_ to our original string and gather back the return value of S///, but let's look at just a couple of them:

my $orig = 'meowmix';
my $new = S/me/c/ given $orig;
say $orig;
say $new;

my @orig = <meow cow sow vow>;
my @new = do for @orig { S/\w+ <?before 'ow'>/w/ };
say @orig;
say @new;

# OUTPUT:
# meowmix
# cowmix
# [meow cow sow vow]
# [wow wow wow wow]

The first one operates on a single value. We use the postfix form of the given block, which lets us avoid the curlies (you can use with in place of given with the same results). From the output, you can see the original string remained intact.

The second example operates on a whole bunch of strings from an Array and we use the do keyword to execute a regular for loop (that aliases to $_ in this case) and assign the result to the @new array. Again, the output shows the originals were not touched.
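Two more variations on the same idea, as a small sketch: the with form mentioned above, and .map in place of the do for loop. The /ow/ to oo substitution is made up for the example:

```perl6
my $orig = 'meowmix';

# `with` topicalizes like `given` (and only runs when $orig is defined)
my $new = S/me/c/ with $orig;

# .map sets $_ for each element, so S/// fits in the block as-is
my @new = <meow cow sow vow>.map: { S/ow/oo/ };

say $orig;  # meowmix
say $new;   # cowmix
say @new;   # [meoo coo soo voo]
```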

Adverbs

The S/// operator—just like s/// and some methods—lets you use regex adverbs:

given 'Lörem Ipsum Dolor Sit Amet' {
    say S:g      /m/g/;  # Löreg Ipsug Dolor Sit Aget
    say S:i      /l/b/;  # börem Ipsum Dolor Sit Amet
    say S:ii     /l/b/;  # Börem Ipsum Dolor Sit Amet
    say S:mm     /o/u/;  # Lürem Ipsum Dolor Sit Amet
    say S:nth(2) /m /g/; # Lörem Ipsug Dolor Sit Amet
    say S:x(2)   /m /g/; # Löreg Ipsug Dolor Sit Amet
    say S:ss/Ipsum Dolor/Gipsum\nColor/; # Lörem Gipsum Color Sit Amet
    say S:g:ii:nth(2) /m/g/; # Lörem Ipsug Dolor Sit Amet
}

As you can see, they are in the form of :foo that is added after the S part of the operator. You can use whitespace liberally and several adverbs can be used at the same time. Here are their meanings:

  • :g (long alternative: :global), global match: replace all occurrences
  • :i, case-insensitive match
  • :ii (long alternative: :samecase), preserve case: regardless of the case of the letter used as a substitute, the original case of the letter being replaced will be used
  • :mm (long alternative: :samemark), preserve mark: in the example above, the diaeresis that was on letter o was preserved and applied to the replacement letter u
  • :nth(n), replace only the nth occurrence
  • :x(n), replace at most n occurrences (mnemonic: "x as in times")
  • :ss (long alternative: :samespace), preserve space type: the type of whitespace character is preserved, regardless of whitespace characters used in the replacement string. In the example above, we replaced with a new line, but the original space was kept
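Here is a short sketch of stacking a couple of those adverbs on one operator. The sample string and variable names are invented for the example:

```perl6
my $text = 'Foo BOO foo';

# :g and :i together: every o, in either case, becomes a zero
my $all    = S:g:i /o/0/      given $text;

# :i and :nth together: only the second f (ignoring case) is replaced
my $second = S:i:nth(2) /f/g/ given $text;

say $all;     # F00 B00 f00
say $second;  # Foo BOO goo
```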

Method Form

The S/// operator is nice, but using it is somewhat awkward at times. Don't fear, Perl 6 provides the .subst method for all your substitution needs and delightful .subst/.substr confusion. Here's what its use looks like:

say 'meowmix'.subst: 'me', 'c';
say 'meowmix'.subst: /m./, 'c';

# OUTPUT:
# cowmix
# cowmix

The method takes either a regex or a plain string as the first positional argument, which is the thing it'll look for in its invocant. The second argument is the replacement string.

You can use the adverbs as well, by simply listing them as named arguments, with a slight caveat. In S/// form, adverbs :ss and :ii imply the presence of :s (make whitespace significant) and :i (case-insensitive match) adverbs respectively. In method form, you have to apply those to the regex itself:

given 'Lorem Ipsum Dolor Sit Amet' {
    say .subst: /:i l/, 'b', :ii;
    say .subst: /:s Ipsum Dolor/, "Gipsum\nColor", :ss;
}

# OUTPUT:
# Borem Ipsum Dolor Sit Amet
# Lorem Gipsum Color Sit Amet
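The adverbs that don't need any help from the regex can be passed straight to the method. A minimal sketch, with variable names chosen just for this example:

```perl6
my $name = 'meowmix';

my $all    = $name.subst: 'm', 'M', :g;              # every m
my $second = $name.subst: /m/, 'M', :nth(2);         # only the second m
my $two    = $name.subst: /<[aeiou]>/, '*', :x(2);   # the first two vowels

say $all;     # MeowMix
say $second;  # meowMix
say $two;     # m**wmix
```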

Method Form Captures

Captures aren't alien to substitution operations, so let's try one out with the method call form of substitution:

say 'meowmix'.subst: /me (.+)/, "c$0";

# OUTPUT:
# Use of Nil in string context  in block <unit> at test.p6 line 1
# c

Not quite what we were looking for. Our replacement string is constructed even before it reaches the .subst method and the $0 variable inside of it actually refers to whatever it is before the method call, not the capture in the .subst regex. So how do we fix this?

The second argument to .subst can also take a Callable. Inside of it, you can use the $0, $1, ... $n variables the way they were meant to and get correct values from captures:

say 'meowmix'.subst: /me (.+)/, -> { "c$0" };

# OUTPUT:
# cowmix

Here, we've used a pointy block for our Callable, but WhateverCode and subs will work too. They will be called for each substitution, with the Match object passed as the first positional argument, if you need to access it.
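A few variations on that theme, sketched with made-up variable names: a pointy block that takes the Match as a parameter, a WhateverCode, and, for comparison, the operator form, where the replacement part is evaluated only after the match, so $0 can be used directly:

```perl6
# The Match object arrives as the first positional argument
my $pointy   = 'meowmix'.subst: /me (.+)/, -> $m { "c$m[0]" };

# A WhateverCode gets the Match too; .uc stringifies it first
my $whatever = 'meowmix'.subst: /m/, *.uc, :g;

# Operator form: $0 in the replacement refers to this match's capture
my $operator = S/me (.+)/c$0/ given 'meowmix';

say $pointy;    # cowmix
say $whatever;  # MeowMix
say $operator;  # cowmix
```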

Conclusion

The S/// operator in Perl 6 is the brother of the s/// operator: instead of modifying the original string, it copies it, modifies the copy, and returns the modified version. The way to use this operator differs from the way the non-destructive substitution operator works in Perl 5. As an alternative, a method version, .subst, is available as well. Both the method and the operator form of substitution can take a number of adverbs that modify their behaviour to suit your needs.

Extra-Typical Perl 6

Have you ever grabbed an Int and thought, "Boy! I sure would enjoy having an .even method on it!" Before you beg the core developers on IRC to add it to Perl 6, let's review some user-space recourse available to you.

My Grandpa Left Me a Fortune

One way to go is to define your own custom Int-like class that knows how to perform the .even method. You don't have to reinvent the wheel, just inherit from the Int type. You can mix and match core Perl 6 types and roles any way that pleases you.

class BetterInt is Int {
    method even { self %% 2 }
}

my BetterInt $x .= new: 42;
say $x.even;

$x .= new: 71;
say $x.even;

say $x + 42;

# OUTPUT:
# True
# False
# 113

We created a BetterInt class and inherited from Int using is Int trait. The class body has just the extra method even we want to add. Using such a class requires a bit of extra code, however. The my BetterInt $x part restricts $x to contain objects of just BetterInt or subclasses. The .= new: 42 in this case is the same as = BetterInt.new: 42 (it's a shorthand method-call-assign notation, same as += is a shorthand to add to original value).

If we ever want to change the value, we have to do the same .= new: trick again to get a BetterInt inside of our container or else we'll get a fatal error. The good news, however, is that math operators work just fine on our new class, and it's even accepted by anything that wants to have an Int. Here's a sub that expects an Int but happily gobbles up our BetterInt:

sub foo (Int $x) { say "\$x is $x" }

my BetterInt $x .= new: 42;
foo $x;

# OUTPUT:
# $x is 42

But... But... But...

Another option is to mix in a role. The but infix operator creates a copy of an object and does just that:

my $x = 42 but role { method even { self %% 2 } };
say $x.even;

# OUTPUT:
# True

The role doesn't have to be inlined, of course. Here's another example that uses a pre-defined role and also shows that our object is indeed a copy:

role Better {
    method better { 'Yes, I am better' }
}

class Foo {
    has $.attr is rw
}

my $original = Foo.new: :attr<original>;

my $copy = $original but Better;
$copy.attr = 'copy';

say $original.attr;  # still 'original'
say $copy.attr;      # this one is 'copy'

say $copy.better;
say $original.better; # fatal error: can't find method

# OUTPUT:
# original
# copy
# Yes, I am better
# Method 'better' not found for invocant of class 'Foo'
#   in block <unit> at test.p6 line 18

This is great and all, but as far as our original goal is concerned, this solution is rather weak:

my $x = 42 but role { method even { self %% 2 } };
say $x.even; # True
$x = 72;
say $x.even; # No such method

The role is mixed into our object stored inside the container, so as soon as we put a new value into the container, our fancy-pants .even method is gone, unless we mix in the role again.
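If you do go this route, naming the role at least keeps the re-mixing short. A small sketch, where the role name Even is my own choice:

```perl6
role Even { method even { self %% 2 } }

my $x = 42 but Even;
my $first = $x.even;

$x = 71 but Even;       # a new value needs the role mixed in again
my $second = $x.even;

say $first;   # True
say $second;  # False
say $x + 1;   # 72
```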

Sub it in

Did you know you can call subs as methods? It's pretty neat! You receive the object as the first positional parameter and you can even continue the method chain, with a caveat that you can't break up those chains onto multiple lines if the &sub method call doesn't remain on the first line:

sub even { $^a %% 2 };
say 42.&even.uc;

# OUTPUT:
# TRUE

This does serve as a decent way to add extra functionality to core types. The $^a inside our sub's definition refers to the first parameter (the object we're making the call on) and the entire sub can be written differently as sub ($x) { $x %% 2 }. And, of course, your sub-now-method can take arguments too.
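Here is what passing extra arguments and chaining such calls can look like. The subs divisible-by and double are hypothetical helpers written for this sketch:

```perl6
sub divisible-by ($n, $d) { $n %% $d }   # invocant first, then the argument
my &double = sub ($x) { $x * 2 };        # a sub stored in a variable works too

my $by-seven = 42.&divisible-by(7);
my $chained  = 21.&double.&divisible-by(5);

say $by-seven;    # True
say $chained;     # False
say 10.&double;   # 20
```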

Here Be Dragons

The docs for what I'm about to describe contain the words "don't do this" at the beginning. No matter what the JavaScript folks might tell you, augmenting native types is dangerous, because you're affecting all parts of your program, even modules that don't see your augmentation.

Now that I have the right to tell you 'I told you so' when the nuclear plant you work at melts down, let's see some code:

# Foo.pm6
unit module Foo;
sub fob is export {
    say 42.even;
}

# Bar.pm6
unit module Bar;
use MONKEY-TYPING;
augment class Int {
    method even { self %% 2 }
}

# test.p6
use Foo;
use Bar;

say 72.even;
fob;

# OUTPUT:
# True
# True

All of the action is happening inside Bar.pm6. First, we have to write a use MONKEY-* declaration, which is there to tell us we're doing something dangerous. Next, we use keyword augment before class Int to indicate we want to augment the existing class. Our augmentation adds method even that tells whether the Int is an even number.

Now, let's look at the whole program. We have module Foo that gives us one sub that simply prints the result of a call of .even on 42 (which is an Int). We use Foo BEFORE we use Bar, the module with our augmentation. Lastly, in our script, we call method .even on an Int and then make a call to the sub exported by Foo.

The scary thing? It all works! Both 72 in our main script and 42 inside the sub in Foo now have method .even, all thanks to our augmentation we performed inside Bar.pm6. We got what we wanted originally, but it's a dangerous method to use.

Evil Flows Through Me

If you're still reading this, that means you're not above messing everything up, core or not. We augmented an Int type, but our numbers can exist as types other than that. Let's augment the Cool type to cover all of 'em:

use MONKEY-TYPING;
augment class Cool {
    method even { self %% 2 }
}

.say for 72.even, '72'.even, pi.even, ½.even;

# OUTPUT:
# Method 'even' not found for invocant of class 'Int'
# in block <unit> at test.p6 line 8

Oops. That didn't work, did it? As soon as we hit our first attempt to call .even (on Int 72), the program crashed. The reason for that is all the types that derive from Cool were already composed by the time we augmented Cool. So, to make it work we have to re-compose them with .^compose Meta Object Protocol method:

use MONKEY-TYPING;
augment class Cool {
    method even { self %% 2 }
}

.^compose for Int, Num, Rat, Str, IntStr, NumStr, RatStr;

.say for 72.even, '72'.even, pi.even, ½.even;

# OUTPUT:
# True
# True
# False
# False

It worked! Now Int, Num, Rat, Str, IntStr, NumStr, RatStr types have an .even method (note: those aren't the only types that inherit Cool)! This is both exquisitely evil and pleasantly awesome.

Conclusion

When extending functionality of Perl 6's core types or any other class, you have several options. You can use a subclass with is Class. You can mix in a role with but Role. You can call subroutines as methods with $object.&sub. Or you can come to the dark side and use augmentation.

Perl 6—There Is More Than One Way To Extend it.

Perl 6: Comb It!

In Perl 5, I always appreciated the convenience of constructs like these two:

my @things = $text =~ /thing/g;
my %things = $text =~ /(key)...(value)/g;

You take some nice, predictable text, pop a regex next to it, and BOOM! You get a nice list of things or a pretty hash. Magical!

There are some similarities to this construct in Perl 6, but if you're a new programmer, with Perl 5 background, there might be some confusion. First, using several captures doesn't result in nice hashes right off the bat. Second, you don't get strings, you get Match objects.

While Matches are fine, let's look at a tool more suited for the job: The comb

Plain Ol' Characters

You can use comb as a subroutine or as a method. In its basic form, comb simply breaks up strings into characters:

'foobar moobar 駱駝道bar'.comb.join('|').say;
'foobar moobar 駱駝道bar'.comb(6).join('|').say;

# OUTPUT:
# f|o|o|b|a|r| |m|o|o|b|a|r| |駱|駝|道|b|a|r
# foobar| mooba|r 駱駝道b|ar

Without arguments, you get individual characters. Supply an integer and you'll get a list of strings at most that many characters long, receiving a shorter string when there are not enough characters left. This method is also about 30x faster than using a regex for the job.
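For completeness, a quick sketch of both calling styles. The sample strings are mine; in the sub form the matcher comes first and the string second:

```perl6
my @chars  = 'camel'.comb;            # method form, one character each
my @chunks = 'abcdefgh'.comb(3);      # chunks of up to 3 characters
my @words  = comb(/\w+/, 'a;bc;d');   # sub form with a regex matcher

say @chars;   # [c a m e l]
say @chunks;  # [abc def gh]
say @words;   # [a bc d]
```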

Limits

You can also provide a second integer, the limit, to indicate that you want at most that many items in the final list:

'foobar moobar 駱駝道bar'.comb(1, 5).join('|').say;
'foobar moobar 駱駝道bar'.comb(6, 2).join('|').say;

# OUTPUT:
# f|o|o|b|a
# foobar| mooba

This applies to all forms of using comb, not just the one shown above.
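For instance, a limit combined with a regex or a plain string matcher might look like this (sample data invented for the sketch):

```perl6
my @digits = 'a1b2c3d4'.comb(/\d/, 2);    # regex matcher, at most 2 items
my @pairs  = 'abcabcabc'.comb('ab', 2);   # Str matcher, at most 2 items

say @digits;  # [1 2]
say @pairs;   # [ab ab]
```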

Counting Things

The comb also takes a regular Str as an argument, returning a list of matches containing... that string. So this is useful to get the total number of times the substring appears inside a string:

'The 🐈 ran after a 🐁, but the 🐁 ran away'.comb('🐈').Int.say;
'The 🐈 ran after a 🐁, but the 🐁 ran away'.comb('ran').Int.say;

# OUTPUT:
# 1
# 2
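If coercing a list to Int reads a bit too magical, .elems says the same thing more explicitly. A tiny sketch with a made-up string:

```perl6
my $text  = 'banana bandana';
my $count = $text.comb('an').elems;   # number of items comb returned

say $count;  # 4
```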

Simple Matching

Moving onto the realm of regexes, there are several ways to obtain what you want using comb. The simplest way is to just match what you want. The entire match will be returned as an item by the comb:

'foobar moobar 駱駝道bar'.comb(/<[a..z]>+ 'bar'/).join('|').say;

# OUTPUT:
# foobar|moobar

The bar with Rakuda-dō Japanese characters did not match our a through z character class and so was excluded from the list.

The wildcard match can be useful, but sometimes you don't want to include the wildcard in the resulting strings... Well, good news!

Limit What's Captured

You could use look-around assertions but an even simpler way is to use <( and )> regex capture markers (<( is similar to \K in Perl 5):

'moo=meow ping=pong'.comb(/\w+    '=' <( \w**4/).join('|').say; # values
'moo=meow ping=pong'.comb(/\w+ )> '='    \w**4/).join('|').say; # keys

# OUTPUT:
# meow|pong
# moo|ping

You can use one or the other or both of them. <( will exclude from the match anything described before it and )> anything that follows it. That is, /'foo' <('bar')> 'ber'/ will match things containing foobarber, but the string returned from comb would only be bar.
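Here is that foobarber regex run through comb, plus one more case in the same spirit. The input strings are invented for the sketch:

```perl6
# Only 'foobarber' satisfies the whole regex; just the marked part is returned
my @inner  = 'foobarber foobar'.comb(/'foo' <( 'bar' )> 'ber'/);

# Require the '=' but keep only the digits after it
my @prices = 'tea=3 coffee=5'.comb(/ '=' <( \d+ /);

say @inner;   # [bar]
say @prices;  # [3 5]
```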

Multi Captures

As powerful as comb has been so far, we still haven't seen the counterpart to Perl 5's way of fishing key/value pairs out of text using a regex. We won't be able to achieve the same clarity and elegance, but we can still use comb... we'll just ask it to give us Match objects:

my %things = 'moo=meow ping=pong'.comb(/(\w+) '=' (\w+)/, :match)».Slip».Str;
say %things;

# OUTPUT:
# moo => meow, ping => pong

Let's break that code down: it uses the same old .comb to look for a sequence of word characters, followed by the = character, followed by another sequence of word characters. We use () parentheses to capture both of those sequences in separate captures. Also, notice we added :match argument to .comb, this causes it to return a list of Match objects instead of strings. Next, we use two hyper operators (») to first convert the Matches to Slips, which gives us a list of captures, but they're still Match objects, which is why we convert them to Str as well.

An even more verbose, but much clearer, method is to use named captures instead and then .map them into Pairs:

my %things = 'moo=meow ping=pong'
    .comb(/$<key>=\w+ '=' $<value>=\w+/, :match)
    .map({ .<key> => .<value>.Str });
say %things;

# OUTPUT:
# moo => meow, ping => pong

Lastly, an astute reader will remember I mentioned at the beginning that simply using Perl 5's method will result in a list of Match objects... the same Match objects we're asking .comb to give us above. Thus, you can also write the above code like this, without .comb:

my %things = ('moo=meow ping=pong' ~~ m:g/(\w+) '=' (\w+)/)».Slip».Str;
say %things;

# OUTPUT:
# moo => meow, ping => pong
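One more middle-ground variant, as a sketch: keep the positional captures, but build the Pairs explicitly in a .map. The pointy-block parameter $m is just a name I picked for the Match object:

```perl6
my %things = 'moo=meow ping=pong'
    .comb(/(\w+) '=' (\w+)/, :match)
    .map: -> $m { ~$m[0] => ~$m[1] };   # stringify both captures into a Pair

say %things<moo>;   # meow
say %things<ping>;  # pong
```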

Conclusion

We've learned how to break up a string into bits any way we want to. Be it one or more characters. Be it simple strings or regex matches. Be it partial captures or multiple ones. You can use comb for all. Combined with .rotor, the power is limitless.

The other thing we also are certain of: nothing beats Perl 5's concise my %things = $text =~ /(key)...(value)/g;

Perl 6 Is Slower Than My Fat Momma!

I notice several groups of people: folks who wish Perl 6's performance weren't mentioned; folks who are confused about Perl 6's performance; folks who gleefully chuckle at Perl 6's performance, reassured the threat to their favourite language XYZ hasn't arrived yet.

So I'm here to talk about the elephant in the room: to get the first group out of hiding and more at ease, and to explain things to the second group. As for the third group... well, this post isn't about them.

Why is it slow?

The simplest answer: Perl 6 is brand new. It's not the next Perl, but a brand new language in the Perl family. The language spec was finished less than 4 months ago (Dec 25, 2015). While some optimization has been done, the core team focused on getting things right first. It's simply unrealistic to evaluate Perl 6's performance as that of an extremely polished product at this time.

The second part of the answer: Perl 6 is big. It's easy to come up with a couple of one-liners that are much faster in other languages. However, a Perl 6 one-liner loads the comprehensive object model, list tools, set tools, a large arsenal of async and concurrency tools... When in a real program you have to load a dozen modules in language XYZ, but can still stay with bare Perl 6 to get the same features, that's when performance starts to even out.

What can you do about it?

Now that we got things right, we can focus on making them fast. Perl 6 uses a modern compiler, so in theory it can be optimized quite a lot. It remains to be seen whether theory will match reality, but looking through the numerous optimization commits made since the start of 2016, many stand out by the boosts they bring in.

Thus, the answer is: we're working on it... and we're making good progress.

What can I do about it?

I'll mention three main things to keep in mind when trying to get your code to perform better: pre-compilation, native types, and of course, concurrency.

Pre-Compilation

Currently, a large chunk of slowness you may notice comes from parsing and compiling code. Luckily, Perl 6 automagically pre-compiles modules, as can be seen here, with a large Foo.pm6 module I'm including:

$ perl6 -I. -MFoo --stagestats -e ''
Stage start      :   0.000
Stage parse      :   4.262
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   0.002
Stage mast       :   0.013
Stage mbc        :   0.000
Stage moar       :   0.000

$ perl6 -I. -MFoo --stagestats -e ''
Stage start      :   0.000
Stage parse      :   0.413
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   0.002
Stage mast       :   0.013
Stage mbc        :   0.000
Stage moar       :   0.000

The first run was a full run that pre-compiled my module, but the second one already had the pre-compiled Foo.pm6 available and the parse stage went down from 4.262 seconds to 0.413: roughly a 10x start-up improvement.

Modules you install from the ecosystem get pre-compiled during installation, so you don't have to worry about them. When writing your own modules, however, they will be automatically re-pre-compiled every time you change their code. If you make a change before each time you run the program, it's easy to get the impression your code is not performing well, even though the compilation penalty won't affect the program once you're done tinkering with it.

Just keep that in mind.

Native Types

Perl 6 has several "native" machine types that can offer performance boosts in some cases:

my Int $x = 0;
$x++ while $x < 30000000;
say now - INIT now;

# OUTPUT:
# 4.416726

my int $x = 0;
$x++ while $x < 30000000;
say now - INIT now;

# OUTPUT:
# 0.1711660

That's roughly a 26x speed-up (4.416726 seconds down to 0.171166) we achieved by simply switching our counter to a native int type.

The available types are: int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, num, num32, and num64. The number in the type name signifies the available bits, with the numberless types being platform-dependent.

They aren't a magical solution to every problem, and won't offer huge improvements in every case, but keep them in mind and look out for cases where they can be used.
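A typical spot for them is a hot loop that only ever deals with machine-sized integers. This is a minimal sketch with invented variable names, and I make no timing claims about it:

```perl6
# Both the accumulator and the counter stay native, so no Int objects
# need to be created inside the loop
my int $sum = 0;
my int $i   = 0;
while $i < 1_000 {
    $sum = $sum + $i;
    $i   = $i + 1;
}

say $sum;  # 499500
```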

Concurrency

Perl 6 makes it extremely easy to utilize multi-core CPUs using high-level APIs like Promises, Supplies, and Channels. Where language XYZ is fast, but lacks ease of concurrency, Perl 6 can end up the winner in performance by distributing work over multiple cores.

I won't go into details—you can consult the documentation or watch my talk that mentions them (slides here). I will show an example, though:

await (
    start { say "One!";   sleep 1; },
    start { say "Two!";   sleep 1; },
    start { say "Three!"; sleep 1; },
);
say now - INIT now;

# OUTPUT:
# One!
# Three!
# Two!
# 1.00665192

We use the start keyword to create three Promises and then use the await keyword to wait for all of them to complete. Inside our Promises, we print out a string and then sleep for at least one second.

The result? Our program has three operations that take at least 1 second each, yet the total runtime was just above 1 second. From the output, we can see it's not in order, suggesting code was executed on multiple cores.
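Promises can also hand back values, which await returns in the same order the Promises were given to it. A short sketch, where the doubling is just a stand-in for real work and $n is a name I picked:

```perl6
# One Promise per number; each sleeps briefly, then returns a value
my @promises = (1..3).map: -> $n { start { sleep 0.1; $n * 2 } };

# await blocks until all are kept and returns their results in order
my @results = await @promises;

say @results;  # [2 4 6]
```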

That was quite easy, but we can crank it up a notch and use a HyperSeq to transform ordinary code into concurrent code with a single method call:

for (1..4).race( batch => 1 ) {
    say "Doing $_";
    sleep 1;
}
say "Code took {now - INIT now} seconds to run";

# OUTPUT:
# Doing 1
# Doing 3
# Doing 2
# Doing 4
# Code took 1.0090415 seconds to run

We had a list of 4 items to work with. We looped over each of them and performed an expensive operation (in this case, a 1-second sleep). To modify our code to be faster, we simply called the .race method on our list of 4 items to get a Hyper Sequence. Our loop remains the same, but it's now executing in a concurrent manner, as can be seen from the output: items are out of order and our total runtime was just over 1 second, despite a total of 4 seconds of sleep.

If the default batch size of 64 is suitable for you, it means you can go from a plain loop to a concurrent loop by simply typing 5 characters (. r a c e).
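When the order of results matters, the sibling method .hyper takes the same batch argument but keeps the output in the original order. A minimal sketch (the squaring is only a placeholder for an expensive operation):

```perl6
# Work is split into batches of 2 items that may run on different threads,
# yet the collected results keep their original order
my @squares = (1..8).hyper(batch => 2).map(* ** 2);

say @squares;  # [1 4 9 16 25 36 49 64]
```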

Let's See Some Benchmarks

I won't show you any. There's hardly any sense in benchmarking entire languages. Clever one-liners can be written to support one point of view or another, but they simply abstract a problem into a simplistic singularity. Languages are different and they have vastly different tool kits to solve similar problems. Would you choose code that completes in 1 second and takes you 40 minutes to write or code that completes in 2 seconds, yet takes you 10 minutes to write? The choice depends on the type of application you're writing.

Conclusion

Perl 6 is a brand new product, so it doesn't make sense to compare it against software that has existed for decades. It is being actively improved and, at least in theory, it should become performant on a level similar to other competing languages.

You don't have to wait for that to happen, however. Thanks to Perl 6's pre-compilation of modules, support of native types, and superb concurrency primitives you can substantially improve the performance of your code right now.

Some may disagree that Perl 6 is slow, some may find it faster than another language, and some may say Perl 6 is slower than my fat momma.

Who's to decide for you? Only you yourself can.

Perl 6 Types: Made for Humans

In my first college programming course, I was taught that Pascal language has Integer, Boolean, and String types among others. I learned the types were there because computers were stupid. While dabbling in C, I learned more about what int, char, and other vermin looked like inside the warm, buzzing metal box under my desk.

Perl 5 didn’t have types, and I felt free as a kid on a bike, rushing through the wind, going down a slope. No longer did I have to cram my mind into the narrow slits computer hardware dictated me to. I had data and I could do whatever I wanted with it, as long as I didn’t get the wrong kind of data. And when I did get it, I fell off my bike and skinned my knees.

With Perl 6, you can have the cake and eat it too. You can use types or avoid them. You can have broad types that accept many kinds of values or narrow ones. And you can enjoy the speed of types that represent the mind of the machine, or you can enjoy the precision of your own custom types that represent your mind, the types made for humans.

Gradual Typing

my       $a = 'whatever';
my Str   $b = 'strings only';
my Str:D $c = 'defined strings only';
my int   $d = 16; # native int

sub foo ($x) { $x + 2 }
sub bar (Int:D $x) returns Int { $x + 2 }

Perl 6 has gradual typing, which means you can either use types or avoid them. So why bother with them at all?

First, types restrict the range of values that can be contained in your variable, accepted by your method or sub or returned by them. This functions both as data validation and as a safety net for garbage data generated by incorrect code.

Also, you can get better performance and reduced memory usage when using native, machine-mind types, providing they’re the appropriate tool for your data.
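Here is the safety net in action, as a small sketch reusing the bar sub from above. The variables $input and $outcome are made up for the example, and the untyped $input is there so that the mismatch shows up at run time:

```perl6
sub bar (Int:D $x) returns Int { $x + 2 }

my $good = bar 40;

my $input   = 'forty';                        # garbage data sneaking in
my $outcome = (try bar $input) // 'rejected'; # the call dies; try turns that into Nil

say $good;     # 42
say $outcome;  # rejected
```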

Built-In Types

There’s a veritable smörgåsbord of built-in types in Perl 6. If the thing your subroutine does makes sense to be done only on integers, use an Int for your parameters. If negatives don’t make sense either, limit the range of values even further and use a UInt—an unsigned Int. On the other hand, if you want to handle a broader range, Numeric type may be more appropriate.

If you want to drive closer to the metal, Perl 6 also offers a range of native types that map into what you’d normally find with, say, C. Using these may offer performance improvements or lower memory usage. The available types are: int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, num, num32, and num64. The number in the type name signifies the available bits, with the numberless types being platform-dependent.

Sub-byte types such as int1, int2, and int4 are planned to be implemented in the future as well.

Smileys

multi foo (Int:U $x) { 'Y U NO define $x?'         }
multi foo (Int:D $x) { "The square of $x is {$x²}" }

my Int $x;
say foo $x;
$x = 42;
say foo $x;

# OUTPUT:
# Y U NO define $x?
# The square of 42 is 1764

Smileys are :U, :D, or :_ appended to the type name. The :_ is the default you get when you don’t specify a smiley. The :U specifies undefined values only, while :D specifies defined values only.

This can be useful to detect whether a method is called on the class or on the instance by having two multies with :U and :D on the invocant. And if you work at a nuclear powerplant, ensuring your rod insertion subroutine never tries to insert by an undefined amount is also a fine thing, I imagine.
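The class-versus-instance trick can be sketched like this. Cat and its speak method are hypothetical; the smiley goes on the invocant, which is the part of the signature before the colon:

```perl6
class Cat {
    multi method speak (Cat:U:) { 'I am the idea of a cat' }  # called on the type object
    multi method speak (Cat:D:) { 'Meow' }                    # called on an instance
}

my $on-class    = Cat.speak;
my $on-instance = Cat.new.speak;

say $on-class;     # I am the idea of a cat
say $on-instance;  # Meow
```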

Subsets: Tailor-Made Types

Built-in types are cool and all, but most of the data programmers work with doesn’t match them precisely. That’s where Perl 6 subsets come into play:

subset Prime of Int where *.is-prime;
my Prime $x = 3;
$x = 11; # works
$x = 4;  # Fails with type mismatch

Using the subset keyword, we created a type called Prime on the fly. It’s a subset of Int, so anything that’s non-Int doesn’t fit the type. We also specify an additional restriction with the where keyword; that restriction being that .is-prime method called on the given value must return a true value.

With that single line of code, we created a special type and can use it as if it were built-in! Not only can we use it to specify the type of variables, sub/method parameters and return values, but we can test arbitrary values against it with the smartmatch operator, just as we can with built-in types:

subset Prime of Int where *.is-prime;
say "It's an Int"  if 'foo' ~~ Int;   # false, it's a Str
say "It's a prime" if 31337 ~~ Prime; # true, it's a prime number
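Since a subset smartmatches like any other type, it also slots into things such as grep. A quick sketch with a subset named Even, invented for this example:

```perl6
subset Even of Int where * %% 2;

my @evens = (1..10).grep(Even);   # each value is smartmatched against the subset

say @evens;        # [2 4 6 8 10]
say 7   ~~ Even;   # False
say 'x' ~~ Even;   # False
```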

Is your “type” a one-off thing you just want to apply to a single variable? You don’t need to declare a separate subset at all! Just use the where keyword after the variable and you’re good to go:

multi is-a-prime (Int $ where *.is-prime --> 'Yup' ) {}
multi is-a-prime (Any                    --> 'Nope') {}

say is-a-prime 3;     # Yup
say is-a-prime 4;     # Nope
say is-a-prime 'foo'; # Nope

The --> in the signature above is just another way to indicate the return type, or in this case, a concrete returned value. So we have two multies with different signatures. First one takes an Int that is a prime number and the second one takes everything else. With exactly zero code in the bodies of our multies we wrote a subroutine that can tell you whether a number is prime!!
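The same inline where works on ordinary, named parameters as well. In this sketch the sub halve and the fallback message are my own inventions:

```perl6
sub halve (Int $n where * %% 2) { $n div 2 }

my $ok = halve 10;

my $odd      = 7;
my $rejected = (try halve $odd) // 'odd numbers are rejected';

say $ok;        # 5
say $rejected;  # odd numbers are rejected
```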

Pack it All Up for Reuse

What we’ve learned so far is pretty sweet, but sweet ain’t awesome! You may end up using some of your custom types quite frequently. Working at a company where product numbers can be at most 20 characters, following some format? Perfect! Let’s create a subtype just for that:

subset ProductNumber of Str where { .chars <= 20 and m/^ \d**3 <[-#]>/ };
my ProductNumber $num = '333-FOOBAR';

This is great, but we don’t want to repeat this subset stuff all over the place. Let’s shove it into a separate module we can use. I’ll create /opt/local/Perl6/Company/Types.pm6 because /opt/local/Perl6 is the path included in module search path for all the apps I write for this fictional company. Inside this file, we’ll have this code:

unit module Company::Types;
my package EXPORT::DEFAULT {
    subset ProductNumber of Str where { .chars <= 20 and m/^ \d**3 <[-#]>/ };
}

We name our module and let our shiny subsets be exported by default. What will our code look like now? It’ll look pretty sweet—no, wait, AWESOME—this time:

use Company::Types;
my ProductNumber $num1 = '333-FOOBAR'; # succeeds
my ProductNumber $num2 = 'meow';       # fails

And so, with a single use statement, we extended Perl 6 to provide custom-tailored types for us that match perfectly what we want our data to be like.

Awesome Error Messages for Subsets

If you’ve been actually trying out all these examples, you may have noticed a minor flaw. The error messages you get are Less Than Awesome:

Type check failed in assignment to $num2;
expected Company::Types::EXPORT::DEFAULT::ProductNumber but got Str ("meow")
in block <unit> at test.p6 line 3

When awesome is the goal, you certainly have a way to improve those messages. Pop open our Company::Types file again, and extend the where clause of our ProductNumber type to include an awesome error message:

subset ProductNumber of Str where {
    .chars <= 20 and m/^ \d**3 <[-#]>/
        or warn 'ProductNumber type expects a string at most 20 chars long'
            ~ ' with the first 4 characters in the format of \d\d\d[-|#]'
};

Now, whenever the thing doesn’t match our type, the message will be included before the Type check... message and the stack trace, providing more info on what sort of stuff was expected. You can also call fail instead of warn here, if you wish, in which case the Type check... message won’t be printed, giving you more control over the error the user of your code receives.

Conclusion

Perl 6 was made for humans to tell computers what to do, not for computers to restrict what is possible. Using types catches programming errors and does data validation, but you can abstain from using types when you don’t want to or when the type of data you get is uncertain.

You have the freedom to refine the built-in types to represent exactly the data you’re working with and you can create a module for common subsets. Importing such a module lets you write code as if those custom types were part of Perl 6 itself.

The Perl 6 technology lets you create types that are made for Humans. And it’s about time we started telling computers what to do.

UPDATE:

Perl 6 will actually evaluate your where expression when checking types even for optional parameters. This can be quite annoying, due to “uninitialized” values being compared. I wrote Subset::Helper to make it easier to create subsets that solve that issue, and it provides an easy way to add awesome error messages too.

About Zoffix Znet

I blog about Perl.