TWC 119: Les Nybb and the Arrhythmic Trio

In which Raku solutions give shape to Perl solutions, and vice versa, and then Raku does what Raku does best.

Task 1: Swap Nibbles - basic and extended solutions in Raku and Perl.

Task 2: Sequence of symbols 123 without adjacent 1’s. Solutions in Raku and Perl, then a radically different approach that I would have never discovered in anything but Raku.

Special thanks to TheYeti of CodeFights (now CodeSignal), who invited me to start participating in the Weekly Challenge way back in October 2019. That site no longer has messaging or forums, and I don’t know how to let them know directly, but this week is my response to their invitation. Thank you!

TWC Task #1, Swap Nibbles

Observations:

  • A nibble (which I learned as “nybble”) is 4 bits, which is exactly one hex digit. So, in any approach that starts with translating to a binary string, I should consider hexadecimal instead of binary, in case it is clearer or more concise to code for 1|2 hex digits instead of groups of 4|8 bits.
  • The task deliberately avoids discussing N>=256, so we have some freedom to decide on behavior if we also want to solve for an extended domain. I may choose differently for Perl than for Raku.

Raku

See Util’s Raku solution for complete program.

multi sub nib ( Int $n where ^256 ) {
    return ( ($n +& 0xF0) +> 4 )
         + ( ($n +& 0x0F) +< 4 );
}

multi sub nib ( Int $n where * >= 0 ) {
    my Str $hex = $n.base(16);
    $hex [R~]= '0' if $hex.chars !%% 2;
    return $hex.comb(2)».flip.join.parse-base(16);
}

Thoughts, and notable elements:

  • multi allows subroutines with “duplicate” names; the ambiguity is resolved by comparing the subs’ parameter lists to the arguments used in the actual subroutine call.
  • ^256 means the range 0..255
  • “Bit-twiddling” is an approach from the C language. Masking with +& 0xF0 is not strictly needed, since +> will right-shift those low bits into the bit bucket. I kept it there for clarity of intent.
  • To extend the task past 255 (for my own curiosity), I decided to swap nybbles in each byte. This is different from my decision in the Perl code.
  • The second nib sub would work just fine by itself. I took advantage of Raku’s multi to separate the solution for the task from my solution to the (invented) extended task.
  • .comb(N) splits a string into a list of sub-strings of length N.
  • .flip reverses a string; 'XYZ'.flip gives you 'ZYX'.
  • ». makes a method (that would work on one thing) work on each thing in a list.
  • .join, without a parameter, joins its list using the empty string as a separator.
  • !%% N means “is not evenly divisible by N”, and returns True or False, without needing to tell you what the remainder would be.
  • Pre-pending zero when $hex is an odd length is the easiest way to assure alignment before breaking into 2-character substrings.
  • The R meta-operator reverses the operands of its operator. 4 / 5 is 0.80, but 4 R/ 5 is the same as 5 / 4, or 1.25 .
  • Just like $foo ~= 9 would append nine to the contents of foo, $foo [R~]= 9 will prepend nine. R is my least favorite meta-operator, but I have wanted a prepending op for a long time, and I think this could become a good idiom in Raku. Maybe.
  • I could have used .fmt or sprintf instead of .base(16). Those would have given opportunity to specify that I want a leading zero, without needing that extra line of [R~]=. However, how can we tell how many digits to request? We cannot just say “even number of digits”. We have to calculate the size, which is 2*$n.log(16²).ceiling. That code is long enough to need its own line, so nothing is gained in brevity, and some clarity would be lost.
  • The whole sub could be in one expression by using sprintf’s .* syntax for separate specification of size, which automatically has leading-zero turned on for hex digits: return sprintf( '%.*X', 2*$n.log(16²).ceiling, $n ).comb(2)».flip.join.parse-base(16);. Now, let’s pretend I never wrote that.

Also, in the code (shown in the GitHub link above, not shown here) that auto-runs a test suite when no argument is given, I use two tricks:

  • Since 0x65 and 101 are just two different ways of “spelling” the exact same number, I use hex format for the test data. That allows a human to easily calculate by hand the expected result.
  • Since nib(nib(X)) should equal X, any pairs of input and expected output should also be tested in reverse. Instead of repeating the is line with reverse arguments, I kept it DRY (Don’t Repeat Yourself) by combining the “hyper-method” syntax with the .antipair method that swaps key-and-value in a Pair object.

Perl

See Util’s Perl solution for complete program.

sub nib ( $n ) {
    return   ($n & 0xFFFFFF00)
         + ( ($n & 0xF0) >> 4 )
         + ( ($n & 0x0F) << 4 );
}

See Util’s Perl bigint solution for complete program.

use bigint;
sub nib ( $n ) {
    return   ($n & ~0xFF)
         + ( ($n &  0xF0) >> 4 )
         + ( ($n &  0x0F) << 4 );
}

Thoughts, and notable elements:

  • Clearly, this Perl code was influenced by my Raku solution.
  • Instead of exchanging nybbles within every byte, I decided to only swap within the last byte. This is different from my decision in the Raku code.
  • I used 0xFFFFFF00 in my original code, thinking that the mask is large enough to cover all input below bigint range, and that I would figure out how big to extend it later, when I would add larger test cases and use bigint. Then I forgot to extend it, and got failures in the new larger test cases. I learned the trick from Abigail of how to make the mask however-big-you-need-it: bitwise negation via ~0xFF.
  • No 7+ character English words can be made from only the letters A-F, perl -wlnE 'print if /^[a-f]{6,}$/i' unixdict.txt , drastically limiting our options for test constants more clever than 0xDEADBEEF .

TWC Task #2, Sequence without 1-on-1

Observations:

  • Scanning through 1..∞ , and grep’ing for the right pattern, will be dirt-simple to write, and dog-slow to run. At N=258, you are already wasting 99.78% of the numbers to get ‘121212’, and the percentage of waste goes up as N grows. In Big-O notation, I think this approach is O(N²). The highest test number we are given is 60, so both Perl and Raku will complete it in sub-second speed. I think I can do better, so I will see how much performance I can gain without completely abandoning clarity.
  • Changing to base-4 (and filtering on !/0|11/) will greatly reduce the percentage of waste, but still have a worse-than-linear performance. I can’t decide if it’s Big-O just has a much smaller vanished constant, or is still N-squared for some lowered value of “squared”. :^)
  • Base-3 seems ideal, but is unworkable because we will need to tr/012/123/, and no standard base has leading zeros, so we cannot generate any number with a leading 1.
  • 1,2,3 are not “numbers” in this problem; they are “symbols”. The task could have specified A,B,C or Foo,Bar,Baz or ζ,η,θ and (after tr/// or s///g) the algorithms would be the same. I found it convenient to use 0,1,2 in some circular (modulo) manipulations, to use “first symbol”,”second symbol”,”third symbol” in my thinking, and 1,2,3 mainly in the output.
  • All the combinations of symbols of length 5 can be generated from all of length 4, by pre-pending 1 to all of length 4, then 2 to all of length 4, then 3 to all of length 4, then filtering out any with adjacent ones. Because you can generate everything of length N from length N-1, I started thinking of each of those groups of same-length combinations as a “generation”.
  • The numbers that represent the start of each generation (like where the final length-5 value 33333 increments to the first length-6 value 121212) are: 1,4,12,34,94,258,706,1930,5274,14410,.... That is the OEIS sequence A293005, which gives a formula that will quickly generate the rest that generation-size sequence, but I did not find it helpful in producing individual elements of the generations.
  • The count of members of each generation are: 1,3,8,22,60,164,448,1224,3344,9136,.... That is the OEIS sequence A028859, of which the first comment is “Number of words of length n without adjacent 0’s from the alphabet {0,1,2}”. Bingo! Also, the partial sums of this sequence produce A293005 above.
  • ::: (In writing this blog post, I now see that A028859 points us to the fxtbook: “Matters Computational” (formerly titled “Algorithms for Programmers”), section 14.9: “Strings with no two consecutive zeros”. My 2 A.M. scanning of that section yields no insights, but then I don’t understand Generating Functions even when I am awake.)
  • Raku has the Xop meta-operator, which is ideal for combining 1,2,3 with a prior generation to create the next generation.
  • Raku’s ... sequence operator is designed to concisely support this “generate the next from the prior” computation, and allow it to be “lazy” (deferred until needed, and generating as much as demanded).

Raku

See Util’s Raku solution for complete program.

sub s123 ( Int $n where * > 0 ) {
    sub next_generation ( @a ) {
        return [ ( <1 2 3> X~ @a ).grep: {!/11/} ];
    }
    constant @s = ( [""], &next_generation ... * ).map(*.<>).flat;
    return @s[$n];
}

Notable:

  • The “decontainerizing” operator .<> has to be mapped to undo the “hard” containerization that happens to each generation. Otherwise, .flat will fail to flatten them all into one nice long lazy stream.
  • We cannot change .map(*.<>) to ».<>, because the hyper-method syntax also implies that the method can be run on any element of the list in any order (the programmer is promising that parallel processing or out-of-order processing will cause no problems). This is not completely compatible with lazily-generated infinite lists, so the compiler disallows it (at least for now).
  • The @s array is auto-caching. For example, after s123(10_000) is calculated, s123(9_000) can be returned without calculation.
  • The only filtering needed the /11/ specified in the task. There is no “waste” from generating non-123 digits.
  • The performance is now linear: O(N). Whatever time it takes to calculate s123(999), it should only take 1000x as long to calculate s123(999*1000). That is a big win, compared to O(N²) which would take a 1000*1000x as long instead of just 1000x.

Perl

See Util’s Perl solution for complete program.

As I was translating the Raku solution to Perl, I re-read the grep: {!/11/} code and had an epiphany. If /11/ does not already exist in a generation, then the only place it can occur during the production of the next generation is in those with a leading-1. If I keep each generation sub-grouped by leading digit, I can code to leave out the group containing leading-1 while creating the 1-group of the next generation. This will remove the need for the grep completely, leaving only the direct generation of the combinations wanted by the task. I implemented this idea only in the Perl code.

sub s123 ( $n ) {
    state @s;
    state $last = [ [], [], [""] ];
    while ( not defined $s[$n] ) {
        push @s,            @{$last->[0]},@{$last->[1]},@{$last->[2]};

        $last = [
            [ map { "1$_" }               @{$last->[1]},@{$last->[2]} ],
            [ map { "2$_" } @{$last->[0]},@{$last->[1]},@{$last->[2]} ],
            [ map { "3$_" } @{$last->[0]},@{$last->[1]},@{$last->[2]} ],
        ];
    }
    return $s[$n];
}

I would have then used the sub-group idea to improve the Raku code, but the underlying idea pointed to deeper understanding.

More observations, and analysis:

  • I want to use the same approach here that I would use to convert to base-N for any N: modulo, subtract, divide. This gives you one digit at a time, building from smallest digit to largest, in O(log N) time. That cannot work here, because the smallest digit is arrhythmic; every time a /11/ pattern occurs at a higher level, it disrupts the even rhythm of 123123123123... that would allow the modulo operation to give the right answer.
  • Digit-at-a-time can be processed from the other direction. We usually don’t, because of the extra effort. For example, converting 361 to base 5, we first ask “what is the highest power of 5 that is not bigger than our target?” 5**4=625 is bigger. 5**3==125 is not bigger. Now, how many 125’s can we get out of 361? We can extract 2 of them, 2*125==250, and 361-250=111, so the first digit will be 2, and we continue the process with the remaining 111. We continue on, not until the remainder is zero, but until we calculate 5**N all the way down to N==0 and so have handled the “ones” place. We could do the same thing here, if we just had a way to tell (ahead of time) what input numbers represented each “left-most digit”, like 3xxxx, 2xxxx, and 1xxxx. Also, we need the “distance” between 3xxxx and 2xxxx (and 2->1 and 1->0) in the source numbers.
  • Instead of producing every number in every generation, we can calculate the counts in each generation using the sub-group idea from above; something like raku -e 'my @s1 = [0,0,1], { [ @^a.sum - @^a[0], @^a.sum, @^a.sum ] } ... *; say .raku for @s1.head(5);'

    [0, 0, 1] [1, 1, 1] [2, 3, 3] [6, 8, 8] [16, 22, 22] [44, 60, 60]

  • Using a flattened copy of those 3-sub-grouped counts, and a partial sum of the flattened copy, we can scan to find the first partial sum that cannot fit our target, and from its position calculate both the digit it should correspond to and the amount of the original target that it represents. For example:

@s2 is a flattened copy of @s1: 0, 0, 1, 1, 1, 1, 2, 3, 3, 6, 8, 8, 16, 22, 22, 44, 60, 60, …

@s3 is the partial sum of @s2: 0, 0, 1, 2, 3, 4, 6, 9, 12, 18, 26, 34, 50, 72, 94, 138, 198, 258, …

200 is the target, and falls between @s3[16]==198 and @s3[17]==258; 17 % 3 == 2, so the first digit will be 2+1==”3”, and we will be subtracting from 200 the values of @s2[17]==60 (to reduce 200 from representation 3xxxx to 2xxxx), @s2[16]==60 (to reduce from representation 2xxxx to 1xxxx), and @s2[15]==44 (to reduce from representation 1xxxx to xxxx). 60+60+44==164. 200-164==36. So, first digit is “3”, and the remaining target is 36.

36 is the target, and falls between @s3[11]==34 and @s3[12]==50; 12 % 3 == 0, so the next digit will be 0+1==”1”, and we will be subtracting from 36 the values of @s2[12]==16 (to reduce 36 from representation 1xxx to xxx). 36-16==20 So, next digit is “1”, and the remaining target is 20.

20 is the target, and falls between @s3[9]==18 and @s3[10]==26; 10 % 3 == 1, so the next digit will be 1+1==”2”, and we will be subtracting from 20 the values of @s2[10]==8 (to reduce 20 from representation 2xx to 1xx) and @s2[9]==6 (to reduce from representation 1xx to xx). 6+8==14 20-14==6 So, next digit is “2”, and the remaining target is 6.

And so on, until target is zero.

Raku

See Util’s Raku solution for complete program.

sub s123 ( Int $n is copy where * > 0 ) {
    constant @s1 = [0,0,1], { my \s = @^a.sum; [ s - @^a[0], s, s ] } ... *;
    constant @s2 = @s1.map(*.<>).flat;
    constant @s3 = [\+] @s2;

    return join '', gather {
        while $n > 0 {
            my $k   = @s3.first: :k, * > $n;
            my $pos = $k % 3;

            take $pos + 1; # Digit

            $n -= @s2[ $k + (-$pos .. 0) ].sum
        }

        # Should be impossible, but I cannot prove it.
        die "NEGATIVE N: ", $n if $n < 0;
    }
}

Using even the fastest linear Perl code, s123(10**120) would take longer than the heat death of the universe, if my laptop had enough memory.

Using this new logarithmic Raku code, s123(10**120) takes less than 1 second.

Perl

See Util’s Perl solution for complete program.

This Perl code is a straight translation of the Raku logN code, except that @s1 is skipped in favor of generating @s2 directly. In Raku, generating @s2 directly is much harder to understand, due to shifting positions of “the 3 values of the last generation” as each element has to look-back a different offset amount. This is the first situation that I have seen Perl’s “flatten everything always” approach allow for clearly better code.

sub s2 ( $n ) {
    state $s2 = [ 0, 0, 1 ];
    while ( not defined $s2->[$n] ) {
        my $s = sum0 @{$s2}[-3,-2,-1];
        push @{$s2}, $s - $s2->[-3], $s, $s;
    }
    return $s2->[$n];
}
sub s3 ( $n ) {
    state $s3 = [ s2(0) + s2(1) ];
    while ( not defined $s3->[$n] ) {
        push @{$s3}, $s3->[-1] + s2($#{$s3} + 1);
    }
    return $s3->[$n];
}
sub s123 ( $n ) {
    my $r;
    while ( $n > 0 ) {
        my $k = first { s3($_) > $n } 0..4200; # Enough for 10**600
        my $pos = $k % 3;

        $r .= $pos + 1; # Digit
        $n -= sum0 map { s2($_) } ($k-$pos) .. $k;
    }

    # Should be impossible, but I cannot prove it.
    die "NEGATIVE N: ", $n if $n < 0;

    return $r;
}

In closing, I want to point out that although all my Raku code is cleaner than its Perl equivalent, that is due in large part to my solutions having leaned on features that Raku added to its Perl origins, so translating them back to Perl makes them “wordier”.

The place where Raku was indispensable was in the support for higher-level thinking and lazy infinite lists.

Re-reading my logN Raku code, the three constant lazy infinite arrays, and the block that uses them, feel elegant, almost obvious. In reality, it was a treacherous, tortuous trek to tease that algorithm from the torrent of tri-digits. Not remotely obvious. Only elegant at the end.

I can now recreate this code in many languages, but I could not have created it in any language I know, except Raku.

/me sleeps

1 Comment

Thanks for your participation.

Leave a comment

About Bruce Gray

user-pic "Util" on IRC and PerlMonks. Frequent speaker on Perl and Raku, but infrequent blogger.