Perl Weekly Challenge 26: Common Letters and Mean Angles

These are some answers to the Week 26 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (September 22, 2019). This blog post offers some solutions to this challenge. Please don't read on if you intend to complete the challenge on your own, which you're strongly encouraged to do.

Challenge # 1: Common Letters Count

Create a script that accepts two strings, let us call it, “stones” and “jewels”. It should print the count of “alphabet” from the string “stones” found in the string “jewels”. For example, if your stones is “chancellor” and “jewels” is “chocolate”, then the script should print “8”. To keep it simple, only A-Z,a-z characters are acceptable. Also make the comparison case sensitive.

We're given two strings and need to find out how many characters of the second string can be found in the first string.

Common Letters Count in Perl 5

This is straight forward. Our script should be given two arguments (else we abort the program). We split the first string into individual letters and store them in the %letters hash. Note that we filter out any character not in the [A-Za-z] character class. Then we split the second string into individual letters, keep only letters found in the %letters hash and finally coerce the resulting list of letters in a scalar context to transform it in a letter count (note that the scalar keyword isn't really needed here, as we have a scalar context anyway, but I included it to make it easier to understand).

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

@ARGV == 2 or die "This script needs two strings are parameters";
my ($str1, $str2) = @ARGV;
my %letters = map {$_ => 1} grep /[A-Za-z]/, split "", $str1;
my $count = scalar grep { exists $letters{$_}} split "", $str2;
say "$str2 has $count letters from $str1";

Running the program:

$ perl count_letters.pl chocolate chancellor
chancellor has 8 letters from chocolate

$ perl count_letters.pl chancellor chocolate
chocolate has 8 letters from chancellor

$ perl count_letters.pl chancellor CHOCOLATE
CHOCOLATE has 0 letters from chancellor

We get the expected result. The last test shows that the comparison is case-sensitive, as requested in the specification.

Common Letters Count in Perl 6

We will use more or less the same idea as in P5, except that we'll use a set instead of a hash for storing unique letters of the first string.

use v6;

sub MAIN (Str $str1, Str $str2) {
    my $letters = $str1.comb.grep( /<[A..Za..z]>/ ).Set;
    my $count = $str2.comb.grep( { $_ (elem) $letters} ).elems;
    say "$str2 has $count letters from $str1";
}

This works as expected:

$ perl6 count_letters.p6 chocolate chancellor
chancellor has 8 letters from chocolate

$ perl6 count_letters.p6 chocolate CHANCELLOR
CHANCELLOR has 0 letters from chocolate

Mean Angles

Create a script that prints mean angles of the given list of angles in degrees. Please read wiki page that explains the formula in details with an example.

In mathematics, a mean of circular quantities is a mean which is sometimes better-suited for quantities like angles, day times, and fractional parts of real numbers. This is necessary since most of the usual means may not be appropriate on circular quantities. For example, the arithmetic mean of 0° and 360° is 180°, which is misleading because for most purposes 360° is the same thing as 0°.

A common formula for the mean of a list of angles is:

angle-mean.jpg

We just need to apply the formula, after having converted the input values from degrees to radians.

The Wikipedia page has the following example, that we will use in our tests: consider the following three angles as an example: 10, 20, and 30 degrees. Intuitively, calculating the mean would involve adding these three angles together and dividing by 3, in this case indeed resulting in a correct mean angle of 20 degrees. By rotating this system anticlockwise through 15 degrees the three angles become 355 degrees, 5 degrees and 15 degrees. The naive mean is now 125 degrees, which is the wrong answer, as it should be 5 degrees.

Mean Angles in Perl 5

There are a number of modules that could be used here to convert degrees to radians and radians to degrees, to compute arithmetic means and perhaps even to compute directly mean angles. But that wouldn't be a challenge if we were just using modules to dodge the real work.

So I wrote the deg2rad and rad2deg subroutines to do the angle unit conversions, and computed the arithmetic means of sines and cosines in a for loop.

As I do not have a use for such a program, I will implement the necessary subroutine and just use them in a series of tests.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use constant PI => atan2(1, 0) * 2;
use Test::More;
plan tests => 9;


sub deg2rad { return $_[0] * PI /180; }
sub rad2deg { return $_[0] * 180 / PI }

sub mean {
    my @angles = map { deg2rad $_ } @_;
    my $count = @angles;
    my ($sum_sin, $sum_cos) = (0, 0);
    for my $angle (@angles) {
        $sum_sin += sin $angle;
        $sum_cos += cos $angle;
    }
    return rad2deg atan2 $sum_sin/$count, $sum_cos/$count;
}

is deg2rad(0), 0, "To rad: 0 degree";
is deg2rad(90), PI/2, "To rad: 90 degrees";
is deg2rad(180), PI, "To rad: 180 degrees";
is rad2deg(PI/2), 90, "To degrees: 90 degrees";
is rad2deg(PI), 180, "To degrees: 180 degrees";
is deg2rad(rad2deg(PI)), PI, "Roundtrip rad -> deg -> rad";
is rad2deg(deg2rad(90)), 90, "Roundtrip deg -> rad -> deg";
is mean(10, 20, 30), 20, "Mean of 10, 20, 30 degrees";
is mean(355, 5, 15), 5, "Mean of 355, 5, 15 degrees";

Running the tests displays the following:

$ perl angle-mean.pl
1..9
ok 1 - To rad: 0 degree
ok 2 - To rad: 90 degrees
ok 3 - To rad: 180 degrees
ok 4 - To degrees: 90 degrees
ok 5 - To degrees: 180 degrees
ok 6 - Roundtrip rad -> deg -> rad
ok 7 - Roundtrip deg -> rad -> deg
ok 8 - Mean of 10, 20, 30 degrees
ok 9 - Mean of 355, 5, 15 degrees

Update: As pointed out in a comment by Saif below, there is no need to divide both arguments of the atan2 built-in function: these arguments represent the abscissa and the ordinate of a point in the plan. Whether the two Cartesian coordinates are divided by count or not does not change the resulting polar angle calculated by atan2. Thus, we don't need to perform this division, and we don't even need the $count variable. The mean subroutine can be simplified as follows:

sub mean {
    my @angles = map { deg2rad $_ } @_;
    my ($sum_sin, $sum_cos) = (0, 0);
    for my $angle (@angles) {
        $sum_sin += sin $angle;
        $sum_cos += cos $angle;
    }
    return rad2deg atan2 $sum_sin, $sum_cos;
}

The tests display the same results as before.

End update.

Mean Angles in Perl 6

We will use essentially the same idea as in P5.

use v6;
use Test;

sub deg2rad (Numeric $deg) { return $deg * pi /180; }
sub rad2deg (Numeric $rad) { return $rad * 180 / pi }

sub mean (*@degrees) {
    my @radians = map { deg2rad $_ }, @degrees;
    my $count = @radians.elems;
    my $avg-sin = ([+] @radians.map( {sin $_})) / $count; 
    my $avg-cos = ([+] @radians.map( {cos $_})) / $count; 
    return rad2deg atan2 $avg-sin, $avg-cos;
}
plan 9;
is deg2rad(0), 0, "To rad: 0 degree";
is deg2rad(90), pi/2, "To rad: 90 degrees";
is deg2rad(180), pi, "To rad: 180 degrees";
is rad2deg(pi/2), 90, "To degrees: 90 degrees";
is rad2deg(pi), 180, "To degrees: 180 degrees";
is deg2rad(rad2deg(pi)), pi, "Roundtrip rad -> deg -> rad";
is rad2deg(deg2rad(90)), 90, "Roundtrip deg -> rad -> deg";
is-approx mean(10, 20, 30), 20, "Mean of 10, 20, 30 degrees";
is-approx mean(355, 5, 15), 5, "Mean of 355, 5, 15 degrees";

And this is the output produced when running the script:

perl6  angle-mean.p6
1..9
ok 1 - To rad: 0 degree
ok 2 - To rad: 90 degrees
ok 3 - To rad: 180 degrees
ok 4 - To degrees: 90 degrees
ok 5 - To degrees: 180 degrees
ok 6 - Roundtrip rad -> deg -> rad
ok 7 - Roundtrip deg -> rad -> deg
ok 8 - Mean of 10, 20, 30 degrees
ok 9 - Mean of 355, 5, 15 degrees

Note that I had to use the is-approx function of the Test module (instead of the simple is function) for tests computing the mean because I would otherwise get failed tests due to rounding issues:

# Failed test 'Mean of 10, 20, 30 degrees'
# at angle-mean.p6 line 22
# expected: '20'
#      got: '19.999999999999996'
not ok 9 - Mean of 355, 5, 15 degrees

As you can see, the program computes 19.999999999999996, where I expect 20, which is nearly the same numeric value.

I actually expected similar problems with Perl 5, but, for some reason, it did not occur. Perhaps the P5 Test::More module has a built-in approximate numeric comparison that silently takes care of such problems.

Update: as note above in the P5 section of this task following Saif's comment, we don't really need to divide the arguments of the atan2 built-in function by the number of angles. The mean subroutine can be simplified as follows:

sub mean (*@degrees) {
    my @radians = map { deg2rad $_ }, @degrees;
    my $sum-sin = [+] @radians.map( {sin $_}); 
    my $sum-cos = [+] @radians.map( {cos $_}); 
    return rad2deg atan2 $sum-sin, $sum-cos;
}

The tests display the same results as before.

End update.

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, September, 29. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge 25: Pokémon Sequence and Chaocipher

These are some answers to the Week 25 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (September 15, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Sequence of Pokémon Names

Generate a longest sequence of the following English Pokemon names where each name starts with the last letter of previous name.ù

audino bagon baltoy banette bidoof braviary bronzor carracosta charmeleon cresselia croagunk darmanitan deino emboar emolga exeggcute gabite girafarig gulpin haxorus heatmor heatran ivysaur jellicent jumpluff kangaskhan kricketune landorus ledyba loudred lumineon lunatone machamp magnezone mamoswine nosepass petilil pidgeotto pikachu pinsir poliwrath poochyena porygon2 porygonz registeel relicanth remoraid rufflet sableye scolipede scrafty seaking sealeo silcoon simisear snivy snorlax spoink starly tirtouga trapinch treecko tyrogue vigoroth vulpix wailord wartortle whismur wingull yamask

First, an assumption: each name in the sequence must appear only once, because if there could be duplicates, then it wouldn't be difficult to find an infinite cyclical sequence and easily win the prize for the longest sequence. Therefore, when we use a name at some point in a sequence, it should be removed from the list of names authorized afterwards in the same sequence. We also assume that the longest sequence means the sequence with the largest number of names (not the largest number of letters). One comment, finally: one of the Pokémons is named "porygon2"; since no name starts with a digit, this name cannot be used within a sequence, but at best as the final item of a sequence.

Longest Sequence of Pokémons in Perl 5

The first version of my program did not handle the case where there are several sequences, but it still printed the largest sequence count each time it was updated. And it appeared immediately that there were many sequences (1248) with the highest count (23 names). So I changed the code to record all the sequences with the highest count.

The first thing that the program does is to populate a hash with arrays of words starting with the same letter (that letter being the key in the hash). This way, when we look for a successor in a sequence, we only look at names stating with the right letter. The program also maintains a $seen hash reference to filter out names that have already been used in a sequence.

The program is using brute force, i.e. trying every legal sequence. Each time we've found a sequence that can no longer be augmented, we need to backtrack. The easiest way to implement a backtracking algorithm is to use recursion. So, our search_seq calls itself recursively each time we want to add a new name to a sequence.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my @names = qw /audino bagon baltoy banette bidoof braviary bronzor 
                carracosta charmeleon cresselia croagunk darmanitan 
                deino emboar emolga exeggcute gabite girafarig 
                gulpin haxorus heatmor heatran ivysaur jellicent 
                jumpluff kangaskhan kricketune landorus ledyba 
                loudred lumineon lunatone machamp magnezone 
                mamoswine nosepass petilil pidgeotto pikachu pinsir 
                poliwrath poochyena porygon2 porygonz registeel 
                relicanth remoraid rufflet sableye scolipede scrafty 
                seaking sealeo silcoon simisear snivy snorlax spoink
                starly tirtouga trapinch treecko tyrogue vigoroth 
                vulpix wailord wartortle whismur wingull yamask/;

my %name_by_letter;
for my $name (@names) {
    my $start_letter = substr $name, 0, 1;
    push @{$name_by_letter{$start_letter}}, $name;
}

my @best_seq;
my $best_count = 0;
for my $name (@names) {
    search_seq( [$name], {$name => 1} );
}
say "BEST SEQUENCES: ";
for my $item (@best_seq) {
   say "@$item";
}
say "Number of sequences: ", scalar @best_seq;

sub search_seq {
    my ($current_seq, $seen) = @_;
    my $last_name = $current_seq->[-1];
    my $last_letter = substr $last_name, -1, 1;
    my @next_candidates = grep { not exists $seen->{$_} }   
        @{$name_by_letter{$last_letter}};
    if (scalar @next_candidates == 0) {
        my $count = scalar @$current_seq;
        if ($count > $best_count) {
            @best_seq = ($current_seq);
            $best_count = $count;
        } elsif ($count == $best_count) {
            push @best_seq, $current_seq;
        }
    } else {
        for my $name (@next_candidates) {
            my %local_seen = %$seen;
            $local_seen{$name} = 1;
            search_seq ([@$current_seq, $name], {%local_seen});
        }
    }
}

As already mentioned, the best sequence count is 23 names, and the program detects 1248 sequences with that name count. So, I will provide only a few lines of the output:

machamp petilil landorus scrafty yamask kricketune emboar registeel loudred darmanitan nosepass simisear relicanth heatmor rufflet trapinch haxorus seaking girafarig gabite exeggcute emolga audino
machamp petilil landorus scrafty yamask kricketune emboar registeel loudred darmanitan nosepass simisear rufflet trapinch heatmor relicanth haxorus seaking girafarig gabite exeggcute emolga audino
machamp petilil landorus scrafty yamask kricketune emboar relicanth haxorus simisear rufflet trapinch heatmor registeel loudred darmanitan nosepass seaking girafarig gabite exeggcute emolga audino
machamp petilil landorus scrafty yamask kricketune emboar relicanth heatmor registeel loudred darmanitan nosepass simisear rufflet trapinch haxorus seaking girafarig gabite exeggcute emolga audino
machamp petilil landorus scrafty yamask kricketune emboar relicanth heatmor rufflet trapinch haxorus simisear registeel loudred darmanitan nosepass seaking girafarig gabite exeggcute emolga audino
machamp petilil landorus scrafty yamask kricketune emboar rufflet trapinch haxorus simisear relicanth heatmor registeel loudred darmanitan nosepass seaking girafarig gabite exeggcute emolga audino

The program runs in about 35 seconds. This is sort of acceptable, but still a bit too long in my view. The main problem is that adding just a few dozen names is very likely to make the performance totally unacceptable. I can think about a few micro-optimizations, but I'm not really interested with these. And I don't see any simple way to significantly improve performance. Well, yes, there might be a possibility: each time we explore a name, we could keep track of the longest sequence it has generated, so that when we explore a new name and find the first name, we could use the longest sequence. But it's not very easy, because that works only if that longest sequence does not have any of the names previously used. Overall, I'm not really convinced we would be able to add full longest subsequences very often.

I tried to pass the last added name as a parameter to the search_seq subroutine to avoid having to look for it in the current sequence, but that doesn't run any faster (possibly even slightly slower); it seems that the overhead of the additional argument is larger than the cost of dereferencing the last item of the current sequence. I also tried to populate a hash giving directly a list of possible successors for each name in the list (to avoid having to check repeatedly the last letter of the last word added), but that does not bring any significant speed improvement.

Longest Sequence of Pokémons in Perl 6

We'll use basically the same algorithm in Perl 6. Using sets or similar collections (in fact sethashes), with the mathematical set operations, will make the syntax a bit easier.

use v6;

my @names = < audino bagon baltoy banette bidoof braviary bronzor  
                carracosta charmeleon cresselia croagunk darmanitan 
                deino emboar emolga exeggcute gabite girafarig 
                gulpin haxorus heatmor heatran ivysaur jellicent 
                jumpluff kangaskhan kricketune landorus ledyba 
                loudred lumineon lunatone machamp magnezone 
                mamoswine nosepass petilil pidgeotto pikachu pinsir 
                poliwrath poochyena porygon2 porygonz registeel 
                relicanth remoraid rufflet sableye scolipede scrafty 
                seaking sealeo silcoon simisear snivy snorlax spoink
                starly tirtouga trapinch treecko tyrogue vigoroth 
                vulpix wailord wartortle whismur wingull yamask >;

my %name-by-letter;
for @names -> $name {
    my $start-letter = substr $name, 0, 1;
    push %name-by-letter{$start-letter}, $name;
}

my @best-seq;
my $best-count = 0;
for @names -> $name {
    search-seq( [$name], $name.SetHash );
}
say "BEST SEQUENCES: ";
for @best-seq -> $item {
   say "$item";
}
say "Number of sequences: ", @best-seq.elems;
say now - INIT now;

sub search-seq (@current-seq, $seen) {
    my $last-name = @current-seq[*-1];
    my $last-letter = substr $last-name, *-1, 1;
    my @next-candidates = grep {defined $_}, # Remove empty slots
        (@(%name-by-letter{$last-letter}) (-) $seen).keys;
    if ( @next-candidates.elems == 0) {
        my $count = @current-seq.elems;
        if $count > $best-count {
            @best-seq = @current-seq;
            $best-count = $count;
        } elsif ($count == $best-count) {
            push @best-seq, @current-seq;
        }
    } else {
        for @next-candidates -> $name {
            my @new-seq = | @current-seq, $name;
            search-seq( @new-seq, $seen ∪ $name.SetHash );
        }
    }
}

Again, we copy only a small fraction of the output:

machamp petilil landorus seaking girafarig gabite exeggcute emboar rufflet trapinch heatmor registeel loudred darmanitan nosepass simisear relicanth haxorus scrafty yamask kricketune emolga audino
machamp petilil landorus seaking girafarig gabite exeggcute emboar rufflet trapinch haxorus simisear relicanth heatmor registeel loudred darmanitan nosepass scrafty yamask kricketune emolga audino
machamp petilil landorus seaking girafarig gabite exeggcute emboar rufflet trapinch haxorus simisear relicanth heatmor registeel loudred darmanitan nosepass snivy yamask kricketune emolga audino
machamp petilil landorus seaking girafarig gabite exeggcute emboar rufflet trapinch haxorus simisear relicanth heatmor registeel loudred darmanitan nosepass starly yamask kricketune emolga audino

So this works, but the Perl 6 program now runs in more than 8 minutes. I have to think harder about optimizations or preferably a better algorithm.

Update Sept. 11: In his comment below, Timo Paulssen suggested that the grep in this statement:

    my @next-candidates = grep {defined $_}, # Remove empty slots
        (@(%name-by-letter{$last-letter}) (-) $seen).keys;

is slowing down significantly the program. For some reason, the correction he suggested wasn't really successful (I probably did something wrong), but removing the grep by changing the statement to this:

    my @next-candidates = %name-by-letter{$last-letter} ??
        (@(%name-by-letter{$last-letter}) (-) $seen).keys !! ();

reduced the execution time to four and a half minutes. I don't understand why this simple grep is taking so much time (not far from half of the total time), but that's a very good improvement.

I also tried to populate a hash giving directly a list of possible successors for each name in the list (to avoid having to check repeatedly the last letter of the last word added), but that does not bring any significant speed improvement (a win of about ten seconds).

End update.

I'll still try to think about a better algorithm, if time permits, and come back if I find something of interest.

Challenge 2: Implementation of Chaocypher

Create script to implement Chaocipher. Please checkout wiki page for more information.

According to the linked Wikipedia page, the Chaocipher is a cipher method invented by John Francis Byrne in 1918 and described in his 1953 autobiographical Silent Years. He believed Chaocipher was simple, yet unbreakable. He offered cash rewards for anyone who could solve it. In May 2010, the Byrne family donated all Chaocipher-related papers and artifacts to the National Cryptologic Museum in Ft. Meade, Maryland, USA. This led to the disclosure of the Chaocipher algorithm in a paper entitled Chaocypher Revealed: the Algorithm (2010), by Moshe Rubin.

How the Chaocipher works

The Chaocipher system consists of two alphabets, with the "right" alphabet used for locating the plaintext letter while the other ("left") alphabet is used for reading the corresponding ciphertext letter. In other words, the basis of the method is a simple substitution. The novel idea in the Chaocipher algorithm, however, is that the two alphabets are slightly modified after each input plaintext letter is enciphered. This leads to nonlinear and highly diffused alphabets as encryption progresses.

Although Byrne had in mind a physical model with rotating wheels, we will follow Rubin's algorithmic explanation of the method and represent each of the two alphabets as a 26-character string consisting of a permutation of the standard alphabet, for example:

            +            *
LEFT (ct):  HXUCZVAMDSLKPEFJRIGTWOBNYQ 
RIGHT (pt): PTLNBQDEOYSFAVZKGJRIHWXUMC

The place marked with a + sign and a * sign are called by Byrne the zenith and nadir points and they correspond to the first and the fourteenth positions in the alphabet. They are important for the alphabet permutation that will be performed after each ciphering and deciphering step.

The right alphabet (bottom) is used for finding the plain text letter, while the left alphabet (top) is used for finding the corresponding cipher text letter.

To encipher the plaintext letter "A," we simply look for this letter in the right alphabet and take the corresponding letter ("P") in the left alphabet (ct and pt stand for cipher text and plain text).

Each time a letter has been encrypted (or decrypted), we proceed with permutations of the alphabets. To permute the left alphabet, we will:

  • Shift the whole alphabet cyclically, so that the letter just enciphered ("P") is moved to the zenith (first) position;

    LEFT (ct):  PEFJRIGTWOBNYQHXUCZVAMDSLK
    

    Remove temporarily the letter in the second position (or zenith + 1), "E" in our example, leaving a "hole" in this position:

    LEFT (ct):  P.FJRIGTWOBNYQHXUCZVAMDSLK
    

    Shift one position to the left all letters between the second position and the nadir position, leaving a hole in the nadir position:

    LEFT (ct):  PFJRIGTWOBNYQ.HXUCZVAMDSLK
    
  • And finally insert the letter that has been removed ("E") in the nadir position:

    LEFT (ct):  PFJRIGTWOBNYQEHXUCZVAMDSLK
    

    Permuting the right alphabet is a similar process, but with some small but important differences that I will not describe here: please refer to Rubin's document to find the details.

After the permutation of the right alphabet, the two alphabets look like this:

LEFT (ct):  PFJRIGTWOBNYQEHXUCZVAMDSLK
RIGHT (pt): VZGJRIHWXUMCPKTLNBQDEOYSFA

With these new alphabets, we are now ready to encrypt the second letter of the plain text. Then we permute again both alphabets and proceed with the third letter of the plain text. And so on.

Deciphering the cipher text is the same process, except of course that we need to locate the first letter of the cipher text in the left alphabet and pick up the corresponding letter in the right alphabet. Alphabet permutations then follow exactly the same rules as when enciphering the plain text.

The strength of the Chaocipher is that the encryption key (the two alphabets) is changed each time a letter of the input text is processed, and the way it is changed depends on the content of the input message. In effect, this is an advanced form of an autokey cipher that is very difficult to break.

Chaocipher Implementation in Perl 5

For our alphabets, we could use strings of characters, arrays of letters or even hashes. Operations on strings of characters are usually reasonably fast and efficient, so I settled for that. Since both alphabets need to be permuted at the same time, I decided to write only one subroutine (permute_alphabets) to permute both alphabets at the same time: at least, there is no risk to permute one and forget to permute the other. I included some tests based on Rubin's paper examples.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Data::Dumper;

sub permute_alphabets {
    my ($left, $right, $pos) = @_;
    my $newleft = substr ($left, $pos) . substr $left, 0, $pos;
    $newleft = substr ($newleft, 0, 1) . substr ($newleft, 2, 12) 
               . substr ($newleft, 1, 1) . substr $newleft, 14;

    my $newright = substr ($right, $pos+1) . substr $right, 0, $pos+1;
    $newright = substr ($newright, 0, 2) . substr ($newright, 3, 11) 
                . substr ($newright, 2, 1) . substr $newright, 14;
    return ($newleft, $newright);
}

sub run_tests {
    use Test::More; # Minimal tests for providing an example
    plan tests => 4;
    my $left  = 'HXUCZVAMDSLKPEFJRIGTWOBNYQ';
    my $right = 'PTLNBQDEOYSFAVZKGJRIHWXUMC';
    my $position = index $right, 'A';
    my ($newleft, $newright) = permute_alphabets $left, $right, 
        $position;
    is $newleft, 'PFJRIGTWOBNYQEHXUCZVAMDSLK', 
        "Left alphabet: $newleft";
    is $newright, 'VZGJRIHWXUMCPKTLNBQDEOYSFA', 
        "Right alphabet: $newright";
    my $plaintext = "WELLDONEISBETTERTHANWELLSAID";
    my $ciphertext = encipher($plaintext, $left, $right);
    is $ciphertext, 'OAHQHCNYNXTSZJRRHJBYHQKSOUJY', 
        "Testing enciphering: $ciphertext";
    my $deciphered = decipher($ciphertext, $left, $right);
    is $deciphered, $plaintext, "Roundtrip: $deciphered";
}

sub encipher {
    my ($plaintext, $left, $right) = @_;
    my $ciphertext = "";
    my @letters = split //, $plaintext;
    for my $let (@letters) {
        my $position = index $right, $let;
        $ciphertext .= substr $left, $position, 1;
        ($left, $right) = permute_alphabets ($left, $right, 
            $position);
    }
    return $ciphertext;
}

sub decipher {
    my ($ciphertext, $left, $right) = @_;
    my $plaintext = "";
    my @letters = split //, $ciphertext;
    for my $let (@letters) {
        my $position = index $left, $let;
        $plaintext .= substr $right, $position, 1;
        ($left, $right) = permute_alphabets ($left, $right, 
            $position);
    }
    return $plaintext;
}

if (@ARGV == 0) {
    run_tests;
} else {
    die "Invalid number of arguments: we need 4 arguments.\n" 
        unless @ARGV == 4;
    my ($mode, $text, $left, $right) = @ARGV;
    if ($mode eq 'encipher') {
        say encipher($text, $left, $right);
    } elsif ($mode eq 'decipher') {
        say decipher($text, $left, $right);
    } else {
        die "Invalid mode: must be 'encipher' or 'decipher'.\n";
    }
}

We can either launch the program without any argument to run the tests, or pass four arguments (mode, text, left alphabet and right alphabet) to encipher or decipher the text.

This is an example of the output:

$ perl chaocipher.pl
1..4
ok 1 - Left alphabet: PFJRIGTWOBNYQEHXUCZVAMDSLK
ok 2 - Right alphabet: VZGJRIHWXUMCPKTLNBQDEOYSFA
ok 3 - Testing enciphering: OAHQHCNYNXTSZJRRHJBYHQKSOUJY
ok 4 - Roundtrip: WELLDONEISBETTERTHANWELLSAID

$ perl chaocipher.pl encipher WELLDONEISBETTERTHANWELLSAID HXUCZVAMDSLKPEFJRIGTWOBNYQ PTLNBQDEOYSFAVZKGJRIHWXUMC
OAHQHCNYNXTSZJRRHJBYHQKSOUJY

$ perl chaocipher.pl decipher OAHQHCNYNXTSZJRRHJBYHQKSOUJY HXUCZVAMDSLKPEFJRIGTWOBNYQ PTLNBQDEOYSFAVZKGJRIHWXUMC
WELLDONEISBETTERTHANWELLSAID

In a real life program, we would need to do a few other things to make it more robust, including especially a proper validation of the arguments (only upper case ASCII letters in the input text, complete alphabets, etc.). This is left as an exercise to the reader ).

Chaocipher Implementation in Perl 6

Besides the minor syntax differences between Perl 5 and Perl 6, there no reason to change the way the enciphering/deciphering algorithm operates. We will use multi MAIN subroutines to decide on whether to run tests or to process a string passed to the program. There are a couple of minor added features. We declare an uppercase subset of the string type to provide a better (still limited) validation of subroutine arguments. And we fold the case of the program arguments to what is needed.

use v6;
subset UcStr of Str where /^<[A..Z]>+$/;

sub permute-alphabets (UcStr $left is copy, UcStr $right is copy, UInt $pos) {
    $left = substr($left, $pos) ~ substr $left, 0, $pos;
    $left = substr($left, 0, 1) ~ substr($left, 2, 12) 
            ~ substr($left, 1, 1) ~ substr $left, 14;

    $right = substr($right, $pos+1) ~ substr $right, 0, $pos+1;
    $right = substr($right, 0, 2) ~ substr($right, 3, 11) 
             ~ substr($right, 2, 1) ~ substr $right, 14;
    return ($left, $right);
}

sub run_tests {
    use Test; 
    plan 4;
    my $left  = 'HXUCZVAMDSLKPEFJRIGTWOBNYQ';
    my $right = 'PTLNBQDEOYSFAVZKGJRIHWXUMC';
    my $position = index $right, 'A';
    my ($newleft, $newright) = permute-alphabets $left, $right,
        $position;
    is $newleft, 'PFJRIGTWOBNYQEHXUCZVAMDSLK', 
        "Left alphabet: $newleft";
    is $newright, 'VZGJRIHWXUMCPKTLNBQDEOYSFA', 
        "Right alphabet: $newright";
    my $plaintext = "WELLDONEISBETTERTHANWELLSAID";
    my $ciphertext = encipher($plaintext, $left, $right);
    is $ciphertext, 'OAHQHCNYNXTSZJRRHJBYHQKSOUJY', 
        "Testing enciphering: $ciphertext";
    my $deciphered = decipher($ciphertext, $left, $right);
    is $deciphered, $plaintext, "Roundtrip: $deciphered";
}

sub encipher (UcStr $plaintext, UcStr $left is copy, UcStr $right is copy) {
    my $ciphertext = "";
    for $plaintext.comb -> $let {
        my $position = index $right, $let;
        $ciphertext ~= substr $left, $position, 1;
        ($left, $right) = permute-alphabets $left, $right,
            $position;
    }
    return $ciphertext;
}

sub decipher (UcStr $ciphertext, UcStr $left is copy, UcStr $right is copy) {
    my $plaintext = "";
    for $ciphertext.comb -> $let {
        my $position = index $left, $let;
        $plaintext ~= substr $right, $position, 1;
        ($left, $right) = permute-alphabets $left, $right, 
            $position;
    }
    return $plaintext;
}

multi MAIN () {
    run_tests;
} 
multi MAIN (Str $mode, Str $text, Str $left, Str $right) {  
    if $mode.lc eq 'encipher' {
        say encipher $text.uc, $left.uc, $right.uc;
    } elsif $mode.lc eq 'decipher' {
        say decipher $text.uc, $left.uc, $right.uc;
    } else {
        die "Invalid mode $mode: must be 'encipher' or 'decipher'.\n";
    }
}

And this is a sample output with various arguments:

$ perl6 chaocipher.p6 encipher WELLDONEISBETTERTHANWELLSAID HXUCZVAMDSLKPEFJRIGTWOBNYQ PTLNBQDEOYSFAVZKGJRIHWXUMC
OAHQHCNYNXTSZJRRHJBYHQKSOUJY

$ perl6 chaocipher.p6  decipher OAHQHCNYNXTSZJRRHJBYHQKSOUJY HXUCZVAMDSLKPEFJRIGTWOBNYQ PTLNBQDEOYSFAVZKGJRIHWXUMC
WELLDONEISBETTERTHANWELLSAID

$ perl6 chaocipher.p6
1..4
ok 1 - Left alphabet: PFJRIGTWOBNYQEHXUCZVAMDSLK
ok 2 - Right alphabet: VZGJRIHWXUMCPKTLNBQDEOYSFA
ok 3 - Testing enciphering: OAHQHCNYNXTSZJRRHJBYHQKSOUJY
ok 4 - Roundtrip: WELLDONEISBETTERTHANWELLSAID

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, September, 22. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 24: Smallest Script and Inverted Index

These are some answers to the Week 24 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (September 8 , 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Smallest Script With No Execution Error

Create a smallest script in terms of size that on execution doesn’t throw any error. The script doesn’t have to do anything special. You could even come up with the smallest one-liner.

I was first puzzled by this strange specification. Can it be that we really want a script that does nothing? Does it have to be the shortest possible script.

Well, after reading again, yes, it seems so.

I'll go for one-liners.

My script in Perl 5:

$ perl -e ''

Just in case there is any doubt, we can check the return value under Bash to confirm that there was no error:

$ echo $?
0

And this is my script in Perl 6:

$ perl6 -e ''

Note that, in both Perl 5 and Perl 6, creating an empty file and using it as a parameter to the perl or perl6 command line would work just as well, for example:

$ perl6 my-empty-file.pl

And that's it for the first challenge. Boy, that was a quick one.

Inverted Index

Create a script to implement full text search functionality using Inverted Index. According to wikipedia:

In computer science, an inverted index (also referred to as a postings file or inverted file) is a database index storing a mapping from content, such as words or numbers, to its locations in a table, or in a document or a set of documents (named in contrast to a forward index, which maps from documents to content). The purpose of an inverted index is to allow fast full-text searches, at a cost of increased processing when a document is added to the database.

Inverted Index in Perl 5

I do not find the Wikipedia explanation to be very clear, but I'll implement the following: I have on my file system a directory containing about 500 Perl scripts (with a '.pl' extension). My program will read all these files (line by line), split the lines into words and keep only words containing only alphanumerical characters (to get rid of operators and variables names with sigils) and with a length of at least 3 such characters. These words will be used to populate a hash (actually a HoH), so that for each such word, I'll be able to directly look up the name of all the files where this word is used.

This is fairly simple:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @files = glob "./*.pl";
my %dict;
for my $file (@files) {
    open my $IN, "<", $file or die "Cannot open $file $!";
    while (my $line = <$IN>) {
        my @words = grep { /^\w{3,}$/ } split /\s+/, $line;;
        $dict{$_}{$file} = 1 for @words;
    }
    close $IN;
}
print Dumper \%dict;

The output has a bit less than 20,000 lines, which read in part as follows:

'checkdir' => {
                './monitor_files.pl' => 1,
                './monitor_files2.pl' => 1
              },
'start' => {
             './solver.pl' => 1,
             './url_regex.pl' => 1,
             './teams.pl' => 1,
             './test_start.pl' => 1,
             './markov_analysis.pl' => 1
           },
'1000' => {
            './first.pl' => 1,
            './jam1.pl' => 1
          },
'Minimal' => {
               './vigenere.pl' => 1
             },
'last' => {
            './strong_primes.pl' => 1,
            './pm_1196078.pl' => 1,
            './bench_lazy_map.pl' => 1,
            './inter_pairs.pl' => 1,
            './ladder2.pl' => 1,
            './perfect.pl' => 1,
            './homophones.pl' => 1,
            './pairs.pl' => 1,
            (...)

It wouldn't be difficult to store the output into a text file (that can then be reloaded into a Perl script hash) or into a database, or to find some other way of making the data persistent, but I have little use for such an index and the challenge specification does not request anything of that type. So, I will not try to go further.

Inverted Index in Perl 6

We'll do the same thing in Perl 6, but with another directory containing about 350 Perl 6 programs (with ".p6" or ".pl6" extensions).

use v6;

my @files = grep { /\.p6$/ or /\.pl6$/ }, dir('.');
my %dict;
for @files -> $file {
    for $file.IO.lines.words.grep({/^ \w ** 3..* $/}) -> $word {
        %dict{$word}{$file} = True;
    }
}
.say for %dict{'given'}.keys;

The program duly prints out the list of files with the given keyword:

$ perl6 inverted-index.p6
mult_gram.p6
calc_grammar.pl6
calculator-exp.pl6
VMS_grammar.p6
ana2.p6
calc_grammar2.pl6
ArithmAction.pl6

[... lines omitted for brevity]

normalize_url.p6
calculator.p6
arithmetic.pl6
json_grammar_2.pl6
point2d.pl6
arithmetic2.pl6
forest.p6

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, September 15. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 23: Difference Series and Prime Factorization

These are some answers to the Week 23 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (September 1, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: nth Order Difference Series

Create a script that prints nth order forward difference series. You should be a able to pass the list of numbers and order number as command line parameters. Let me show you with an example.

Suppose we have list (X) of numbers: 5, 9, 2, 8, 1, 6 and we would like to create 1st order forward difference series (Y). So using the formula Y(i) = X(i+1) - X(i), we get the following numbers: (9-5), (2-9), (8-2), (1-8), (6-1). In short, the final series would be: 4, -7, 6, -7, 5. If you noticed, it has one less number than the original series. Similarly you can carry on 2nd order forward difference series like: (-7-4), (6+7), (-7-6), (5+7) => -11, 13, -13, 12.

nth Order Difference Series in Perl 5

For this, we will write a simple fwd_diff subroutine to compute the first order difference series of the list of values passed as arguments to it. We do that with map on the indexes of the arguments list (starting at 1).

Then, we use a for loop to call this subroutine the number of times required by the first parameter (the order) passed to the script. Note that if the order is larger than the count of the other items passed to the script, then we cannot compute the result.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub fwd_diff {
    return map $_[$_] - $_[$_ - 1], 1..$#_;
}

my ($order, @values) = @ARGV;
my $count = scalar @values;
if ($count <= $order) {
    die "Can't calculate ${order}th series of $count values";
}
my @result = @values;
for (1..$order) {
    @result = fwd_diff @result;
}
say "${order} forward diff of @values is: @result";

Testing with 6 values the forward difference series with orders 1 to 6 displays the following output:

$ perl  fwd_diff.pl 1 5 9 2 8 1 6
1th forward diff of 5 9 2 8 1 6 is: 4 -7 6 -7 5

$ perl  fwd_diff.pl 2 5 9 2 8 1 6
2th forward diff of 5 9 2 8 1 6 is: -11 13 -13 12

$ perl  fwd_diff.pl 3 5 9 2 8 1 6
3th forward diff of 5 9 2 8 1 6 is: 24 -26 25

$ perl  fwd_diff.pl 4 5 9 2 8 1 6
4th forward diff of 5 9 2 8 1 6 is: -50 51

$ perl  fwd_diff.pl 5 5 9 2 8 1 6
5th forward diff of 5 9 2 8 1 6 is: 101

$ perl  fwd_diff.pl 6 5 9 2 8 1 6
Can't calculate 6th series of 6 values at fwd_diff.pl line 13.

nth Order Difference Series in Perl 6

I would have liked to be able to use a pointy block syntax with two parameters, but that does not work because the loop will consume two values at each step, as shown under the REPL:

> for <5 9 2 8 1 6> -> $a, $b {say $b - $a}
4
6
5

So we would need to pre-process the input data in order to get twice all values except those at both ends of the input list.

We'll use the rotor built-in function

These are two examples using rotor under the REPL:

> <5 9 2 8 1 6>.rotor(1)
((5) (9) (2) (8) (1) (6))
> <5 9 2 8 1 6>.rotor(2)
((5 9) (2 8) (1 6))

In these examples, rotor groups the elements of the invocant into groups of 1 and 2 elements respectively.

The rotor method can take as parameter a key-value pair, whose value (the second item) specifies a gap between the various matches:

> (1..10).rotor(2 => 1)
((1 2) (4 5) (7 8))

As you can see, we obtain pairs of values, with a gap of 1 between the pairs (item 3, 6 and 9 are omitted from the list). Now, the gap can also be negative and, with a gap of -1, we get all successive pairs from the range:

> <5 9 2 8 1 6>.rotor(2 => -1)
((5 9) (9 2) (2 8) (8 1) (1 6))

This is exactly what we need: we can now subtract the first item from the second one in each sublist.

Continuing under the REPL, we can define the fwd-diff subroutine and use it as follows:

> sub fwd-diff (*@in) { map {$_[1] - $_[0]},  (@in).rotor(2 => -1)}
&fwd-diff
> say fwd-diff <5 9 2 8 1 6>
[4 -7 6 -7 5]
>

OK, enough experimenting with the REPL, we now know how to solve the challenge and can write our program:

use v6;

sub fwd-diff (*@in) { 
    map {$_[1] - $_[0]},  (@in).rotor(2 => -1)
}
sub MAIN (Int $order, *@values) {
    if @values.elems <= $order {
        die "Can't compute {$order}th series of {@values.elems} values";
    }
    my @result = @values;
    for 1 .. $order {
        @result = fwd-diff @result;
    }
    say "{$order}th forward diff of @values[] is: @result[]";
}

Testing with 6 values the forward difference series with orders 1 to 6 displays the following output:

$ fwd-diff.p6 1 5 9 2 8 1 6
1th forward diff of 5 9 2 8 1 6 is: 4 -7 6 -7 5

$ fwd-diff.p6 2 5 9 2 8 1 6
2th forward diff of 5 9 2 8 1 6 is: -11 13 -13 12

$ fwd-diff.p6 3 5 9 2 8 1 6
3th forward diff of 5 9 2 8 1 6 is: 24 -26 25

$ fwd-diff.p6 4 5 9 2 8 1 6
4th forward diff of 5 9 2 8 1 6 is: -50 51

$ fwd-diff.p6 5 5 9 2 8 1 6
5th forward diff of 5 9 2 8 1 6 is: 101

$ fwd-diff.p6 6 5 9 2 8 1 6
Can't compute 6th series of 6 values
  in sub MAIN at fwd-diff.p6 line 9
  in block <unit> at fwd-diff.p6 line 1

Note that I was hoping to get rid of the if @values.elems <= $order test and related die block by using a constraint in the signature of the MAIN subroutine, for example something like this:

sub MAIN (Int $order, *@values where @values.elems > $order) { # ...

but that does not appear to work properly.

Prime Factorization

Create a script that prints Prime Decomposition of a given number. The prime decomposition of a number is defined as a list of prime numbers which when all multiplied together, are equal to that number. For example, the Prime decomposition of 228 is 2,2,3,19 as 228 = 2 * 2 * 3 * 19.

Prime Factorization in Perl 5

The simplest way to solve this challenge is called trial division, i.e. to divide the input number by successive integers until the result is 1. This may appear to be a silly brute force approach, but it turns out to be fairly fast even for the largest integers that Perl 5 can natively handle (there is nothing in the challenge specification that says that we should be able to handle very large numbers). The only performance enhancements that we'll do here is to test even division by 2 and then only by successive odd numbers, and exit the loop when $div becomes too large. I thought for a moment that it would be worth to test only prime numbers, but first finding prime numbers would take more time than what we are likely to win.

We store the prime factors in a hash, with the key being a factor and the value the number of times this factor is a divisor of the input number.

The fact that factors are taken out of the number $num in ascending order garantees the list will only contain primes.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my $num = shift;
my %factors;
while ($num % 2 == 0) {
    $factors{2} ++;
    $num /= 2;
}
my $div = 1;
while (1) {
    $div += 2;
    while ($num % $div == 0) {
        $factors{$div} ++;
        $num /= $div;
    }
    last if $num == 1;
    ++$factors{$div} and last if $div * 2  > $num;
}
for my $fact (sort { $a <=> $b } keys %factors) {
    say "$fact ** $factors{$fact}";
}

This is the output for some test values:

$ perl prime-fact.pl 12
2 ** 2
3 ** 1

$ perl prime-fact.pl 1200
2 ** 4
3 ** 1
5 ** 2

$ perl prime-fact.pl 1280
2 ** 8
5 ** 1

$ time perl prime-fact.pl 128089876
2 ** 2
463 ** 1
69163 ** 1

real    0m0,055s
user    0m0,015s
sys     0m0,030s


$ time perl prime-fact.pl 1280898769976
2 ** 3
7 ** 2
1783 ** 1
1832641 ** 1

real    0m0,118s
user    0m0,078s
sys     0m0,030s

As we can see on the last test, even for a number with 13 digits and one relatively large prime factor, the computation takes less than 0.2 second. With larger numbers having large prime factors, this might take a few seconds, but that's OK, I'm satisfied with that.

Prime Factorization in Perl 6

Perl 6 has a fast is-prime built-in routine that we can use to build a lazy infinite list of prime numbers, so that we can try even division by prime factors only.

use v6;

my @primes = grep {.is-prime}, 1..*;

sub MAIN (UInt $num is copy) {
    my %factors;
    for @primes -> $div {
        while ($num %% $div) {
            %factors{$div}++;
            $num div= $div;
        }
        last if $num == 1;
        ++%factors{$num} and last if $num.is-prime;
    }
    for sort {$^a <=> $^b}, keys %factors -> $fact {
        say "$fact ** %factors{$fact}";
    }
    say now - INIT now; # timings
}

Note that this line:

++%factors{$num} and last if $num.is-prime;

isn't really needed but brings a significant performance enhancement when the last factor to be found is very large, as it can be seen in the last three tests below (in such cases, Perl 6 runs significantly faster than Perl 5):

$ perl6 prime-fact.p6 12
2 ** 2
3 ** 1
0.0129253

$ perl6 prime-fact.p6 1200
2 ** 4
3 ** 1
5 ** 2
0.01692924

$ perl6 prime-fact.p6 1280
2 ** 8
5 ** 1
0.01294

$ perl6 prime-fact.p6 128089876
2 ** 2
463 ** 1
69163 ** 1
0.052831

$
$ perl6 prime-fact.p6 1280898769976
2 ** 3
7 ** 2
1783 ** 1
1832641 ** 1
0.1106868

$ perl6 prime-fact.p6 128089876997685
3 ** 1
5 ** 1
29 ** 1
37 ** 1
179 ** 1
44460137 ** 1
0.051871

perl6 prime-fact.p6 12808987699768576
2 ** 8
509 ** 1
98300801969 ** 1
0.0469033

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, September, 8. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 22: Sexy Prime Pairs and Compression Algorithm

These are some answers to the Week 22 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (August 25, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Sexy Prime Pairs

Write a script to print first 10 Sexy Prime Pairs. Sexy primes are prime numbers that differ from each other by 6. For example, the numbers 5 and 11 are both sexy primes, because 11 - 5 = 6. The term “sexy prime” is a pun stemming from the Latin word for six: sex. For more information, please checkout wiki page.

My first question, when reading this definition, was whether sexy primes had to be consecutive prime numbers. The example provided (as well as those found in the the Wikipedia page) shows that it needs not be the case: 5 and 11 are not consecutive primes (since 7 is also prime). If sexy primes had to be consecutive primes, then the first such pair would be (23, 29). With that answer to my question, it seems to me that all we need to do is to look at each prime number p and check whether p + 6 is prime (and stop as soon as we have 10 sexy pairs.

Note that (1, 7) is not a sexy prime pair (despite having a gap of 6), because 1 is not considered to be a prime number. Therefore, to avoid the risk of finding a false sexy prime pair, we will start our search with number 2.

Sexy Prime Pairs in Perl 6

We first build a lazy infinite list @sexy-primes of prime numbers such that each such prime + 6 is also prime, and then print the pairs:

use v6;

my @sexy-primes = grep { .is-prime and ($_ + 6).is-prime}, (2, 3, *+2 ... Inf);
say "@sexy-primes[$_] ", @sexy-primes[$_] + 6 for ^10;

Note that, as a basis for finding the primes, we use a sequence operator with an explicit generator in order to check parity only for odd numbers. This avoids useless computations on even numbers which cannot be prime (except for 2). This might be considered premature optimization (and we all know what Donald Knuth said about premature optimization). Well, yes, but, at the same time, I don't like to let my programs do unnecessary work.

And this prints:

$ perl6 sexy-pairs.p6
5 11
7 13
11 17
13 19
17 23
23 29
31 37
37 43
41 47
47 52

This program is so short that we can easily get rid of the @sexy-primes temporary array and transform the script into a Perl6 one-liner:

$ perl6 'say "$_ ", $_+6 for (2...*).grep({.is-prime && ($_ + 6).is-prime})[^10];'
5 11
7 13
11 17
13 19
17 23
23 29
31 37
37 43
41 47
47 53

Sexy Prime Pairs in Perl 5

Since we know from our tests with Perl 6 that we're not going to look for prime numbers much larger than 50, we don't need to try hard to optimize the primality check subroutine as we've done in some previous weekly challenges. Our is_prime subroutine will simply test all possible factors between 2 and the square root of the number being checked.

Since Perl 5 doesn't have infinite lists, we will just use an infinite while loop instead and break out of it with a last statement once we've found what we need.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub is_prime{
    my $num = shift;
    for my $i (2 .. $num ** .5) {
        return 0 if $num % $i == 0;
    }
    return 1;
}

my ($candidate, $count) = (2, 0);
while (1) {
    if (is_prime $candidate and is_prime $candidate + 6) {
        say "$candidate ", $candidate + 6;
        $count ++
    }
    last if $count >= 10;
    $candidate ++;
}

And this prints out the same output as our P6 programs, no point of repeating it here.

Note that the program runs in 0.08 second, there was really no need to try to optimize performance.

Lempel–Ziv–Welch (LZW) Compression

Write a script to implement Lempel–Ziv–Welch (LZW) compression algorithm. The script should have method to encode/decode algorithm. The wiki page explains the compression algorithm very nicely.

Lempel–Ziv–Welch (LZW) is a lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978.

The scenario described by Welch encodes sequences of 8-bit data as fixed-length 12-bit codes. The codes from 0 to 255 represent 1-character sequences consisting of the corresponding 8-bit character, and the codes 256 through 4095 are created in a dictionary for sequences encountered in the data as it is encoded. At each stage in compression, input bytes are gathered into a sequence until the next character would make a sequence with no code yet in the dictionary. The code for the sequence (without that character) is added to the output, and a new code (for the sequence with that character) is added to the dictionary.

For encoding (or, really, compressing) a string, we buffer input the characters in a sequence (note that we use here the variables names from the Wikipedia page to facilitate understanding) until the next is not in the %dict hash. Emit the code for , and add plus the next character to the hash. Start buffering again with the next character. Concretely, we first populate the %dict hash with the single possible letters. Then, we traverse the input string character by character and build the sequence as long as it exists in the dict hash. When the new sequence to be built does not exist in the hash, we add the previous sequence to the result, add the new one to the hash and start a new sequence with the last visited character.

For decoding (decompressing), we use the same initial hash as when encoding (we don't need the final hash, so we don't need to transmit the dictionary, which can be hard coded). Additional entries can be reconstructed as they are always simply concatenations of previous entries. Concretely, we populate %dict hash as before, but inverting keys and values. Then we go through the codes one by one; if a code exists in the hash, we just convert it and add it to the output; else, we build the new sequence, add it to the output and add the sequence concatenated with the sequence's first character to the hash.

There is nothing specific to either Perl 5 or Perl 6 in the above explanations, so they apply to both our P5 and P6 implementations below.

LZW Compression in Perl 6

For a start, we will use an input string ('TOBEORNOTTOBEORTOBEOR...') consisting only of capital letters ('A'..'Z'), as in the Wikipedia article, and populate our initial hash %dict with corresponding numeric codes between 0 and 25.

use v6;

constant $start-dict-size = 26;

sub encode (Str $in) {
    my %dict = map { $_[0] => $_[1] }, 
        ( ('A'..'Z') Z (^$start-dict-size) );
    my $ω = "";
    my @result = gather {
        for $in.comb -> $c {
            my $ωc = $ω ~ $c;
            if %dict{$ωc}:exists {
                $ω = $ωc;
            } else {
                take %dict{$ω};
                %dict{$ωc} = +%dict;
                $ω = $c;
            }
        }
        take %dict{$ω} if $ω.chars;
    }
    # say %dict;
    return @result;
}
sub decode (@encoded) {
    my $dict-size = $start-dict-size;
    my %dict = map { $_[1] => $_[0] }, 
        ( ('A'..'Z') Z (^$start-dict-size) );
    my $ω = %dict{shift @encoded};
    my @result = gather {
        take $ω; 
        for @encoded -> $i {
            my $str;
            if %dict{$i}:exists {
                $str = %dict{$i};
            } elsif  $i == $dict-size {
                $str = $ω ~ $ω.substr(0,1) 
            }
            take $str;
            %dict{$dict-size++} = $ω ~ $str.substr(0,1);
            $ω = $str;
        }
    }
    return join "", @result;
}

my $input_str = 'TOBEORNOTTOBETOBEORNOTTOBETOBEORNOTTOBE';
my @encoded = encode $input_str;
say @encoded;
say decode @encoded;

Running this code produces a correct round trip and displays the following output:

$ perl6 LZW_compression.p6
[19 14 1 4 14 17 13 14 19 26 28 35 29 31 33 37 37 30 32 34 27 4]
TOBEORNOTTOBETOBEORNOTTOBETOBEORNOTTOBE

The encoded (compressed) code has 22 numbers that could each be encoded over 6 bits, so that's a total of 132 bits. The input string had 39 bytes, i.e. 312 bits. In other words, we obtain a compression ratio of 2.36. Admittedly, we could have used a fixed-length encoding scheme and encoded each character of the input string over 5 bits, which would have led to a total of 195 bits, leading to a compression ratio of 1.6. We still get an LZW compression ratio which is 1.47 times better than a fixed-length encoding.

The reason for this better compression ratio is that many of our numeric codes represent two letters of the input, and some of them even more letters; for example, numeric code (35) stands for 3 letters, "TOB", and code 37 stands for 4 letters, "TOBE":

19 14 1 4 14 17 13 14 19 26 28 35  29 31 33 37   37   30 32 34 27 4
T  O  B E O  R  N  O  T  TO BE TOB EO RN OT TOBE TOBE OR NO TT OB E

Encoding only ASCII capital letters is of course very limited. Leaving aside Unicode, we would like at least to be able to compress bytes encoded over 256 bits. For this, we only need to change the $start-dict-size constant to 256 and to populate the initial %dict hash accordingly. For example, this way for the encode subroutine:

my %dict = map { .chr => $_ }, ^$start-dict-size;

And this way in the decode subroutine:

my %dict = map { $_ => .chr }, ^$start-dict-size;

The compressed code still has 22 numbers, but the compression rate would fall down, because these numbers would now need to be encoded over more bits:

[84 79 66 69 79 82 78 79 84 256 258 265 259 261 263 267 267 260 262 264 257 69]

And we can now compress data not comprising only of capital ASCII letters. For example, with the following input string:

To be or not to be, to be or not to be, that's the question

we obtain the following output:

perl6 LZW_compression.p6
[84 111 32 98 101 32 111 114 32 110 111 116 32 116 257 259 44 268 270 260 262 264 266 
273 258 101 272 116 104 97 116 39 115 268 104 260 113 117 101 115 116 105 111 110]
To be or not to be, to be or not to be, that's the question

LZW Compression in Perl 5

As for the P6 implementation, we will use an input string ('TOBEORNOTTOBE...') consisting only of capital letters ('A'..'Z'), as in the Wikipedia article, and populate our initial hash %dict with corresponding numeric codes between 0 and 25. Translating the Perl 6 implementation into Perl 5 is a bit tedious because of all these small pesky syntax differences between P5 and P6, but is conceptually a piece of cake. Here we go:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw /say/;
use constant start_dict_size => 256;
use utf8;

sub encode {
    my $in = shift;
    my %dict = map { chr $_ => $_ } 0 .. start_dict_size - 1;
    my $ω = "";
    my @result;

    for my $c (split //, $in) {
        my $ωc = $ω . $c;
        if (exists $dict{$ωc}) {
            $ω = $ωc;
        } else {
            push @result, $dict{$ω};
            $dict{$ωc} = scalar keys %dict;
            $ω = $c;
        }
    }
    push @result, $dict{$ω} if length $ω;
    return @result;
}
sub decode {
    my @encoded = @_;
    my $dict_size = start_dict_size;
    my %dict = map { $_ => chr } 0 .. start_dict_size - 1;;
    my $ω = $dict{shift @encoded};
    my @result = ($ω); 
    for my $i (@encoded) {
        my $str;
        if (exists $dict{$i}) {
            $str = $dict{$i};
        } elsif  ($i == $dict_size) {
            $str = $ω . substr $ω, 0, 1; 
        } else { die "Error on $i" }
        push @result, $str;
        $dict{$dict_size++} = $ω . substr $str, 0, 1;
        $ω = $str;
    }
    return join "", @result;
}

my $input_str = 'TOBEORNOTTOBETOBEORNOTTOBETOBEORNOTTOBE';
my @encoded = encode $input_str;
say "@encoded";
say decode(@encoded);

The round trip works as before:

$ perl LZW_compression.pl
84 79 66 69 79 82 78 79 84 256 258 265 259 261 263 267 267 260 262 264 257 69
TOBEORNOTTOBETOBEORNOTTOBETOBEORNOTTOBE

(The numerical codes are not the same as in the original P6 implementation because we've used directly a starting %dict hash with 256 entries, but they are the same as in out second P6 test.)

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, September 1. And, please, also spread the word about the Perl Weekly Challenge if you can.