Splitting on a change in Perl6
I had more thoughts about splitting on character changes, partly thanks to a mailing list thread, which led to more questions.
As a prelude, here are regexps that split when the string changes to the letter B:
# If the cursor is before B, and is after a character, and is not after B, split
say "ABBBCDEEF".split(/<?before (B) {} ><?after . ><!after B >/).perl
("A", "BBBCDEEF").Seq
# If the cursor is before B, and is after a character class excluding B, split
say "ABBBCDEEF".split(/<?before (B) {} ><?after <-[B]> >/).perl
("A", "BBBCDEEF").Seq
And here are regexps that split when the string changes from the letter B:
# If the cursor is after B, and is before a character, and is not before B, split
say "ABBBCDEEF".split(/<?after (B) {} ><?before . ><!before B >/).perl
("ABBB", "CDEEF").Seq
# If the cursor is after B, and is before a character class excluding B, split
say "ABBBCDEEF".split(/<?after (B) {} ><?before <-[B]> >/).perl
("ABBB", "CDEEF").Seq
One of these approaches worked for me in the general case:
# If the cursor is before a character, save it in $c,
# then check that the cursor is NOT after the $c we saved
say "ABBBCDEEF".split(/<?before (.) {} :my $c=$0; ><?after . ><!after $c >/).perl
("A", "BBB", "C", "D", "EE", "F").Seq
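For comparison, here is a sketch that gets the same runs without split at all, by matching each run directly with comb. It assumes the usual idiom of publishing a top-level capture with {} so that $0 can be used as a backreference later in the same regex; unlike the examples above, the capture is not inside a lookaround.

```raku
# Match one character, publish it with {}, then take as many
# repeats of that same character as follow it: one match per run.
my $s = "ABBBCDEEF";
say $s.comb(/ (.) {} $0* /).perl;
```

This trades the zero-width "where does a run end?" question for the positive "what does a run look like?" one, which sidesteps lookarounds entirely.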
Yay! But what's wrong with each of these variations?
# 1. The empty braces publish $0 etc, so why do I need $c?
say "ABBBCDEEF".split(/<?before (.) {} ><?after . ><!after $0 >/).perl
("A", "B", "B", "B", "C", "D", "E", "E", "F").Seq # Not the answer I wanted
# 2. Try saving the character the cursor is after, and compare it with the character the cursor is before
# Why does this split everywhere instead of only on character transitions?
say "ABBBCDEEF".split(/<?after (.) {} :my $c=$0; ><?before . ><!before $c >/).perl
("A", "B", "B", "B", "C", "D", "E", "E", "F").Seq # Not the answer I wanted
# 3. Use a negated character class, anything but $c
# How does one specify a class of everything except what's in $c? Not like this...
say "ABBBCDEEF".split(/<?before (.) {} :my $c=$0; ><?after <- $c >>).perl
I'd like to know what's going on in 1, 2 and 3 above, especially #3.
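On #3, one way to express "a character other than $c" without a character class is a negative lookahead followed by a plain `.`. A minimal sketch, with $c hard-coded to "B" purely for illustration:

```raku
my $c = "B";   # hypothetical: the character to exclude
# <!before $c> rejects positions where $c comes next;
# . then consumes the single character known not to be $c.
say "ABBBCDEEF".comb(/ <!before $c> . /).join;
```

The lookbehind form of the same pairing, `<?after . ><!after $c >`, is what the working general-case regex above already uses.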
O M G, Perl6 regular expressions come with <same> as a built-in rule (it matches between two identical characters), so now a very brief answer is:
say "ABBBCDEEF".split(/<!same>/);
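Because <!same> also succeeds at the very start and end of the string, where there is no pair of characters to compare, split may hand back an empty string at each end (the earlier regexps avoid this with <?after .> and <?before (.)>). A sketch that asks split to drop those with its :skip-empty named argument:

```raku
# <!same> is zero-width and succeeds wherever the characters on either
# side differ, and at both ends of the string; :skip-empty discards
# any empty strings produced at those end positions.
say "ABBBCDEEF".split(/<!same>/, :skip-empty).perl;
```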
I am still interested to know what's going on with questions 1, 2 and 3, and am considering opening an issue or two against character classes or their documentation.