Perl 6: Comb It!
In Perl 5, I always appreciated the convenience of constructs like these two:
my @things = $text =~ /thing/g;
my %things = $text =~ /(key)...(value)/g;
You take some nice, predictable text, pop a regex next to it, and BOOM! You get a nice list of things or a pretty hash. Magical!
There are some similarities to this construct in Perl 6, but if you're a new programmer with a Perl 5 background, there might be some confusion. First, using several captures doesn't result in nice hashes right off the bat. Second, you don't get strings, you get Match objects.
While Matches are fine, let's look at a tool more suited for the job: the comb.
Plain Ol' Characters
You can use comb as a subroutine or as a method. In its basic form, comb simply breaks up strings into characters:
'foobar moobar 駱駝道bar'.comb.join('|').say;
'foobar moobar 駱駝道bar'.comb(6).join('|').say;
# OUTPUT:
# f|o|o|b|a|r| |m|o|o|b|a|r| |駱|駝|道|b|a|r
# foobar| mooba|r 駱駝道b|ar
Without arguments, you get individual characters. Supply an integer and you'll get a list of strings at most that many characters long, receiving a shorter string when there are not enough characters left. This method is also about 30x faster than using a regex for the job.
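The examples above all use the method form. Here's a small sketch of the subroutine form next to it, assuming a current Rakudo where the sub takes the chunk size first and the string second. The variable names are mine, just for illustration:

```raku
# Method form, as used throughout this post
my @by-method = 'foobar'.comb(3);

# Subroutine form: the chunk size comes first, the string second
my @by-sub = comb(3, 'foobar');

say @by-method.join('|');
say @by-sub.join('|');
# OUTPUT:
# foo|bar
# foo|bar
```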
Limits
You can also provide a second integer, the limit, to indicate that you want at most that many items in the final list:
'foobar moobar 駱駝道bar'.comb(1, 5).join('|').say;
'foobar moobar 駱駝道bar'.comb(6, 2).join('|').say;
# OUTPUT:
# f|o|o|b|a
# foobar| mooba
This applies to all forms of using comb, not just the one shown above.
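To make that concrete, here's a rough sketch of the limit used with the Str and regex forms of comb that the sections further down cover. The variable names are just for illustration:

```raku
my $text = 'foobar moobar 駱駝道bar';

# Regex matcher with a limit of 1: only the first match is returned
my @first-word = $text.comb(/<[a..z]>+ 'bar'/, 1);

# Plain Str matcher with a limit of 2: stop after two occurrences of 'bar'
my @two-bars = $text.comb('bar', 2);

say @first-word.join('|');
say @two-bars.join('|');
# OUTPUT:
# foobar
# bar|bar
```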
Counting Things
comb also takes a regular Str as an argument, returning a list of matches containing... that string. So this is useful to get the total number of times the substring appears inside a string:
'The 🐈 ran after a 🐁, but the 🐁 ran away'.comb('🐈').Int.say;
'The 🐈 ran after a 🐁, but the 🐁 ran away'.comb('ran').Int.say;
# OUTPUT:
# 1
# 2
Simple Matching
Moving on to the realm of regexes, there are several ways to obtain what you want using comb. The simplest way is to just match what you want. The entire match will be returned as an item by comb:
'foobar moobar 駱駝道bar'.comb(/<[a..z]>+ 'bar'/).join('|').say;
# OUTPUT:
# foobar|moobar
The bar with the Rakuda-dō Japanese characters did not match our a through z character class and so was excluded from the list.
The wildcard match can be useful, but sometimes you don't want to include the wildcard in the resulting strings... Well, good news!
Limit What's Captured
You could use look-around assertions, but an even simpler way is to use the <( and )> regex capture markers (<( is similar to \K in Perl 5):
'moo=meow ping=pong'.comb(/\w+ '=' <( \w**4/).join('|').say; # values
'moo=meow ping=pong'.comb(/\w+ )> '=' \w**4/).join('|').say; # keys
# OUTPUT:
# meow|pong
# moo|ping
You can use one or the other or both of them. <( will exclude from the match anything described before it and )> anything that follows it. That is, /'foo' <('bar')> 'ber'/ will match things containing foobarber, but the string returned from comb would only be bar.
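Here's that last regex in action, as a quick sketch. The input string and the variable name are made up for the example; the second word is there only to show that a non-matching foobarbaz contributes nothing:

```raku
# Only the part between <( and )> ends up in the result
my @inner = 'foobarber foobarbaz'.comb(/'foo' <('bar')> 'ber'/);

say @inner.join('|');
# OUTPUT:
# bar
```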
Multi Captures
As powerful as comb has been so far, we still haven't seen the counterpart to Perl 5's way of fishing key/value pairs out of text using a regex. We won't be able to achieve the same clarity and elegance, but we can still use comb... we'll just ask it to give us Match objects:
my %things = 'moo=meow ping=pong'.comb(/(\w+) '=' (\w+)/, :match)».Slip».Str;
say %things;
# OUTPUT:
# moo => meow, ping => pong
Let's break that code down. It uses the same old .comb to look for a sequence of word characters, followed by the = character, followed by another sequence of word characters. We use () parentheses to capture both of those sequences in separate captures. Also, notice we added the :match argument to .comb; this causes it to return a list of Match objects instead of strings. Next, we use two hyper operators (») to first convert the Matches to Slips, which gives us a list of captures, but they're still Match objects, which is why we convert them to Str as well.
An even more verbose, but much clearer, method is to use named captures instead and then .map them into Pairs:
my %things = 'moo=meow ping=pong'
    .comb(/$<key>=\w+ '=' $<value>=\w+/, :match)
    .map({ .<key> => .<value>.Str });
say %things;
# OUTPUT:
# moo => meow, ping => pong
Lastly, an astute reader will remember I mentioned at the beginning that simply using Perl 5's method will result in a list of Match objects... the same Match objects we're asking .comb to give us above. Thus, you can also write the above code like this, without .comb:
my %things = ('moo=meow ping=pong' ~~ m:g/(\w+) '=' (\w+)/)».Slip».Str;
say %things;
# OUTPUT:
# moo => meow, ping => pong
Conclusion
We've learned how to break up a string into bits any way we want to. Be it one or more characters. Be it simple strings or regex matches. Be it partial captures or multiple ones. You can use comb for all of them. Combined with .rotor, the power is limitless.
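As one small taste of that combination, here's a sketch that combs out every run of word characters and then lets .rotor group them two at a time. The variable name and the => separator are only for illustration:

```raku
# Grab every run of word characters, then group them into sublists of two
my @pairs = 'moo=meow ping=pong'.comb(/\w+/).rotor(2);

# Each element of @pairs is now a two-item list: (key, value)
say @pairs.map(*.join('=>')).join('|');
# OUTPUT:
# moo=>meow|ping=>pong
```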
The other thing we also are certain of: nothing beats Perl 5's concise
my %things = $text =~ /(key)...(value)/g;
CIVash suggested on IRC that instead of .comb with :match, you can just use the .match method: 'moo=meow ping=pong'.match(/(\w+) '=' (\w+)/, :g)
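A rough sketch of how that suggestion could be turned into a hash, assuming .match with :g hands back a list of Match objects whose positional captures are reachable as .[0] and .[1]. The .map step and the variable names are my own addition, not part of the suggestion:

```raku
# .match with :g returns every match rather than just the first one
my @matches = 'moo=meow ping=pong'.match(/(\w+) '=' (\w+)/, :g);

# Turn each Match into a key => value Pair, stringifying both captures
my %things = @matches.map({ .[0].Str => .[1].Str });

say %things<moo>;
say %things<ping>;
# OUTPUT:
# meow
# pong
```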
Maybe Hash assignment could detect a Match object and flatten it, to preserve the
my %things = $text =~ m:g/(key)...(value)/;
idiom? It is too useful to be removed.