Perl 6: The S/// Operator
Coming from a Perl 5 background, my first experience with Perl 6's non-destructive substitution operator S///
looked something like this:
[Image: artist's impression]
You'll fare better, I'm sure. Not only have the error messages improved, but I'll also explain everything right here and now.
The Smartmatch
The reason I had issues was that, seeing familiar-looking operators, I simply translated Perl 5's binding operator (=~) to Perl 6's smartmatch operator (~~) and expected things to work. The S/// operator was not documented and, combined with the confusing (at the time) warning message, this was the source of my pain:
my $orig = 'meowmix';
my $new = $orig ~~ S/me/c/;
say $new;
# OUTPUT warning:
# Smartmatch with S/// can never succeed
The old warning suggests the ~~ operator is the wrong choice here, and it is.
The ~~ operator isn't the equivalent of Perl 5's =~. It aliases the left-hand side to $_, evaluates the right-hand side, and then calls .ACCEPTS($_) on it. That is all there is to its magic.
So what's actually happening in the example above:
- By the time we get to S///, $orig is aliased to $_.
- The S/// operator non-destructively executes the substitution on $_ and returns the resulting string. This is what the smartmatch will operate on.
- The smartmatch, following the rules for matching a Str against a Str, will give True or False depending on whether the substitution happened: the resulting string's .ACCEPTS compares it to $_ for equality, so True, confusingly, means nothing was replaced.
At the end of it all, we aren't getting what we actually want: the version of the string with substitution applied.
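To make the three steps above concrete, here is a rough hand-written equivalent of $orig ~~ S/me/c/. It's a simplified sketch, not how Rakudo actually compiles the smartmatch, and the sub name smartmatch-with-S is made up for illustration:

```raku
# Hypothetical helper spelling out `$orig ~~ S/me/c/` step by step
# (a simplified sketch, not Rakudo's actual code path)
sub smartmatch-with-S(Str $orig) {
    given $orig {                   # 1. alias $orig to $_
        my $new = S/me/c/;          # 2. the RHS runs S/// on $_ and returns a new Str
        return $new.ACCEPTS($_);    # 3. Str.ACCEPTS: True only if the strings are equal
    }
}

say smartmatch-with-S('meowmix');   # False: 'cowmix' differs, so a substitution DID happen
say smartmatch-with-S('dogfood');   # True: nothing matched, the string came back unchanged
```

Either way, the substituted string itself is thrown away; only the Bool comes out.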
With The Given
Now that we know that S/// always works on $_ and returns the result, it's easy to come up with a whole bunch of ways that set $_ to our original string and gather back the return value of S///, but let's look at just a couple of them:
my $orig = 'meowmix';
my $new = S/me/c/ given $orig;
say $orig;
say $new;
my @orig = <meow cow sow vow>;
my @new = do for @orig { S/\w+ <?before 'ow'>/w/ };
say @orig;
say @new;
# OUTPUT:
# meowmix
# cowmix
# [meow cow sow vow]
# [wow wow wow wow]
The first one operates on a single value. We use the postfix (statement modifier) form of given, which lets us avoid the curlies (you can use with in place of given with the same results, as long as the value is defined). From the output, you can see the original string remained intact.
The second example operates on a whole bunch of strings from an Array: we use the do keyword to execute a regular for loop (which aliases each element to $_ in this case) and assign the result to the @new array. Again, the output shows the originals were not touched.
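For completeness, here's a sketch of two more ways to set $_, reusing the same strings: the postfix with mentioned above, and a .map call in place of the do for loop:

```raku
my $orig = 'meowmix';
my @orig = <meow cow sow vow>;

# Postfix `with` topicalizes $_ just like `given` (here $orig is defined)
my $new = S/me/c/ with $orig;

# .map also aliases each element to $_, so S/// works inside it directly
my @new = @orig.map: { S/\w+ <?before 'ow'>/w/ };

say $orig;   # meowmix
say $new;    # cowmix
say @orig;   # [meow cow sow vow]
say @new;    # [wow wow wow wow]
```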
Adverbs
The S/// operator, just like s/// and some methods, lets you use regex adverbs:
given 'Lörem Ipsum Dolor Sit Amet' {
say S:g /m/g/; # Löreg Ipsug Dolor Sit Aget
say S:i /l/b/; # börem Ipsum Dolor Sit Amet
say S:ii /l/b/; # Börem Ipsum Dolor Sit Amet
say S:mm /o/u/; # Lürem Ipsum Dolor Sit Amet
say S:nth(2) /m /g/; # Lörem Ipsug Dolor Sit Amet
say S:x(2) /m /g/; # Löreg Ipsug Dolor Sit Amet
say S:ss/Ipsum Dolor/Gipsum\nColor/; # Lörem Gipsum Color Sit Amet
say S:g:ii:nth(2) /m/g/; # Lörem Ipsug Dolor Sit Amet
}
As you can see, they take the form of :foo and are added after the S part of the operator. You can use whitespace liberally, and several adverbs can be used at the same time. Here are their meanings:
- :g (long alternative :global): global match; replace all occurrences
- :i: case-insensitive match
- :ii (long alternative :samecase): preserve case; regardless of the case of the letter used as a substitute, the original case of the letter being replaced will be used
- :mm (long alternative :samemark): preserve mark; in the example above, the diaeresis that was on the letter o was preserved and applied to the replacement letter u
- :nth(n): replace only the nth occurrence
- :x(n): replace at most n occurrences (mnemonic: "x as in times")
- :ss (long alternative :samespace): preserve space type; the type of whitespace character is preserved, regardless of the whitespace characters used in the replacement string. In the example above, we replaced with a newline, but the original space was kept
Method Form
Operator S/// is nice, but using it is somewhat awkward at times. Don't fear: Perl 6 provides the .subst method for all your substitution needs, along with delightful .subst/.substr confusion. Here's what its use looks like:
say 'meowmix'.subst: 'me', 'c';
say 'meowmix'.subst: /m./, 'c';
# OUTPUT:
# cowmix
# cowmix
The method takes either a regex or a plain string as the first positional argument, which is the thing it'll look for in its invocant. The second argument is the replacement string.
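The difference between the two kinds of first argument matters as soon as the needle contains regex metacharacters. A small illustration (the 'a.b.c' string is just an example):

```raku
# A plain-string needle is matched literally...
say 'a.b.c'.subst: '.', '-';   # a-b.c
# ...whereas in a regex needle `.` means "any character"
say 'a.b.c'.subst: /./, '-';   # -.b.c
```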
You can use the adverbs as well, by simply listing them as named arguments (Bool flags such as :g, or values such as :nth(2)), with a slight caveat. In S/// form, the :ss and :ii adverbs imply the presence of the :s (make whitespace significant) and :i (case-insensitive match) adverbs, respectively. In method form, you have to apply those to the regex itself:
given 'Lorem Ipsum Dolor Sit Amet' {
say .subst: /:i l/, 'b', :ii;
say .subst: /:s Ipsum Dolor/, "Gipsum\nColor", :ss;
}
# OUTPUT:
# Borem Ipsum Dolor Sit Amet
# Lorem Gipsum Color Sit Amet
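Adverbs that take a value, such as :nth and :x, are passed the same way, as named arguments. A short sketch, reusing the string from the S/// examples with /m/ as the needle:

```raku
given 'Lörem Ipsum Dolor Sit Amet' {
    # Flag and value-taking adverbs are both passed as named arguments
    say .subst: /m/, 'g', :g;        # Löreg Ipsug Dolor Sit Aget
    say .subst: /m/, 'g', :nth(2);   # Lörem Ipsug Dolor Sit Amet
    say .subst: /m/, 'g', :x(2);     # Löreg Ipsug Dolor Sit Amet
}
```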
Method Form Captures
Captures aren't alien to substitution operations, so let's try one out with the method call form of substitution:
say 'meowmix'.subst: /me (.+)/, "c$0";
# OUTPUT:
# Use of Nil in string context in block <unit> at test.p6 line 1
# c
Not quite what we were looking for. Our replacement string is constructed before it even reaches the .subst method, and the $0 variable inside of it actually refers to whatever $0 held before the method call, not to the capture in the .subst regex. So how do we fix this?
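Before fixing it, a quick way to convince yourself that $0 comes from an earlier match rather than from the .subst regex is to run an unrelated match first (the 'xyz' string is arbitrary):

```raku
# An unrelated match sets $/, so $0 now holds ｢y｣
'xyz' ~~ /(y)/;

# "c$0" is interpolated *before* .subst is called, using that earlier $0
say 'meowmix'.subst: /me (.+)/, "c$0";   # cy
```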
The second argument to .subst can also be a Callable. Inside of it, you can use the $0, $1, ... $n variables the way they were meant to be used and get correct values from the captures:
say 'meowmix'.subst: /me (.+)/, -> { "c$0" };
# OUTPUT:
# cowmix
Here, we've used a pointy block for our Callable, but WhateverCode and subs will work too. They will
be called for each substitution, with the Match
object passed
as the first positional argument, if you need to access it.
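Here's a sketch of both variants: a pointy block that takes the Match as a parameter and indexes into it, and a WhateverCode that, combined with :g, gets called once per substitution:

```raku
# The Callable receives the Match object as its first positional argument
say 'meowmix'.subst: /me (.+)/, -> $m { "c$m[0]" };   # cowmix

# A WhateverCode works too; with :g it is called for each match ("me", then "mi")
say 'meowmix'.subst: /m./, *.uc, :g;                   # MEowMIx
```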
Conclusion
The S/// operator in Perl 6 is the brother of the s/// operator: instead of modifying the original string, it copies it, modifies the copy, and returns the modified version. The way to use this operator differs from the way the non-destructive substitution operator works in Perl 5. As an alternative, a method version, .subst, is available as well. Both the method and the operator forms of substitution can take a number of adverbs that modify their behaviour to suit your needs.
In other words, is six's S/// an equivalent of five's s///r? Yes. It just takes more adverbs than Perl 5's s/// and is used differently than Perl 6's s///.