Perl Weekly Challenge 25: Pokémon Sequence and Chaocipher

These are some answers to the Week 25 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (September 15, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Sequence of Pokémon Names

Generate a longest sequence of the following English Pokemon names where each name starts with the last letter of previous name.ù

audino bagon baltoy banette bidoof braviary bronzor carracosta charmeleon cresselia croagunk darmanitan deino emboar emolga exeggcute gabite girafarig gulpin haxorus heatmor heatran ivysaur jellicent jumpluff kangaskhan kricketune landorus ledyba loudred lumineon lunatone machamp magnezone mamoswine nosepass petilil pidgeotto pikachu pinsir poliwrath poochyena porygon2 porygonz registeel relicanth remoraid rufflet sableye scolipede scrafty seaking sealeo silcoon simisear snivy snorlax spoink starly tirtouga trapinch treecko tyrogue vigoroth vulpix wailord wartortle whismur wingull yamask

First, an assumption: each name in the sequence must appear only once, because if there could be duplicates, then it wouldn't be difficult to find an infinite cyclical sequence and easily win the prize for the longest sequence. Therefore, when we use a name at some point in a sequence, it should be removed from the list of names authorized afterwards in the same sequence. We also assume that the longest sequence means the sequence with the largest number of names (not the largest number of letters). One comment, finally: one of the Pokémons is named "porygon2"; since no name starts with a digit, this name cannot be used within a sequence, but at best as the final item of a sequence.

Longest Sequence of Pokémons in Perl 5

The first version of my program did not handle the case where there are several sequences, but it still printed the largest sequence count each time it was updated. And it appeared immediately that there were many sequences (1248) with the highest count (23 names). So I changed the code to record all the sequences with the highest count.

The first thing that the program does is to populate a hash with arrays of words starting with the same letter (that letter being the key in the hash). This way, when we look for a successor in a sequence, we only look at names stating with the right letter. The program also maintains a $seen hash reference to filter out names that have already been used in a sequence.

The program is using brute force, i.e. trying every legal sequence. Each time we've found a sequence that can no longer be augmented, we need to backtrack. The easiest way to implement a backtracking algorithm is to use recursion. So, our search_seq calls itself recursively each time we want to add a new name to a sequence.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my @names = qw /audino bagon baltoy banette bidoof braviary bronzor 
                carracosta charmeleon cresselia croagunk darmanitan 
                deino emboar emolga exeggcute gabite girafarig 
                gulpin haxorus heatmor heatran ivysaur jellicent 
                jumpluff kangaskhan kricketune landorus ledyba 
                loudred lumineon lunatone machamp magnezone 
                mamoswine nosepass petilil pidgeotto pikachu pinsir 
                poliwrath poochyena porygon2 porygonz registeel 
                relicanth remoraid rufflet sableye scolipede scrafty 
                seaking sealeo silcoon simisear snivy snorlax spoink
                starly tirtouga trapinch treecko tyrogue vigoroth 
                vulpix wailord wartortle whismur wingull yamask/;

my %name_by_letter;
for my $name (@names) {
    my $start_letter = substr $name, 0, 1;
    push @{$name_by_letter{$start_letter}}, $name;
}

my @best_seq;
my $best_count = 0;
for my $name (@names) {
    search_seq( [$name], {$name => 1} );
}
say "BEST SEQUENCES: ";
for my $item (@best_seq) {
   say "@$item";
}
say "Number of sequences: ", scalar @best_seq;

sub search_seq {
    my ($current_seq, $seen) = @_;
    my $last_name = $current_seq->[-1];
    my $last_letter = substr $last_name, -1, 1;
    my @next_candidates = grep { not exists $seen->{$_} }   
        @{$name_by_letter{$last_letter}};
    if (scalar @next_candidates == 0) {
        my $count = scalar @$current_seq;
        if ($count > $best_count) {
            @best_seq = ($current_seq);
            $best_count = $count;
        } elsif ($count == $best_count) {
            push @best_seq, $current_seq;
        }
    } else {
        for my $name (@next_candidates) {
            my %local_seen = %$seen;
            $local_seen{$name} = 1;
            search_seq ([@$current_seq, $name], {%local_seen});
        }
    }
}

As already mentioned, the best sequence count is 23 names, and the program detects 1248 sequences with that name count. So, I will provide only a few lines of the output:

machamp petilil landorus scrafty yamask kricketune emboar registeel loudred darmanitan nosepass simisear relicanth heatmor rufflet trapinch haxorus seaking girafarig gabite exeggcute emolga audino
machamp petilil landorus scrafty yamask kricketune emboar registeel loudred darmanitan nosepass simisear rufflet trapinch heatmor relicanth haxorus seaking girafarig gabite exeggcute emolga audino
machamp petilil landorus scrafty yamask kricketune emboar relicanth haxorus simisear rufflet trapinch heatmor registeel loudred darmanitan nosepass seaking girafarig gabite exeggcute emolga audino
machamp petilil landorus scrafty yamask kricketune emboar relicanth heatmor registeel loudred darmanitan nosepass simisear rufflet trapinch haxorus seaking girafarig gabite exeggcute emolga audino
machamp petilil landorus scrafty yamask kricketune emboar relicanth heatmor rufflet trapinch haxorus simisear registeel loudred darmanitan nosepass seaking girafarig gabite exeggcute emolga audino
machamp petilil landorus scrafty yamask kricketune emboar rufflet trapinch haxorus simisear relicanth heatmor registeel loudred darmanitan nosepass seaking girafarig gabite exeggcute emolga audino

The program runs in about 35 seconds. This is sort of acceptable, but still a bit too long in my view. The main problem is that adding just a few dozen names is very likely to make the performance totally unacceptable. I can think about a few micro-optimizations, but I'm not really interested with these. And I don't see any simple way to significantly improve performance. Well, yes, there might be a possibility: each time we explore a name, we could keep track of the longest sequence it has generated, so that when we explore a new name and find the first name, we could use the longest sequence. But it's not very easy, because that works only if that longest sequence does not have any of the names previously used. Overall, I'm not really convinced we would be able to add full longest subsequences very often.

I tried to pass the last added name as a parameter to the search_seq subroutine to avoid having to look for it in the current sequence, but that doesn't run any faster (possibly even slightly slower); it seems that the overhead of the additional argument is larger than the cost of dereferencing the last item of the current sequence. I also tried to populate a hash giving directly a list of possible successors for each name in the list (to avoid having to check repeatedly the last letter of the last word added), but that does not bring any significant speed improvement.

Longest Sequence of Pokémons in Perl 6

We'll use basically the same algorithm in Perl 6. Using sets or similar collections (in fact sethashes), with the mathematical set operations, will make the syntax a bit easier.

use v6;

my @names = < audino bagon baltoy banette bidoof braviary bronzor  
                carracosta charmeleon cresselia croagunk darmanitan 
                deino emboar emolga exeggcute gabite girafarig 
                gulpin haxorus heatmor heatran ivysaur jellicent 
                jumpluff kangaskhan kricketune landorus ledyba 
                loudred lumineon lunatone machamp magnezone 
                mamoswine nosepass petilil pidgeotto pikachu pinsir 
                poliwrath poochyena porygon2 porygonz registeel 
                relicanth remoraid rufflet sableye scolipede scrafty 
                seaking sealeo silcoon simisear snivy snorlax spoink
                starly tirtouga trapinch treecko tyrogue vigoroth 
                vulpix wailord wartortle whismur wingull yamask >;

my %name-by-letter;
for @names -> $name {
    my $start-letter = substr $name, 0, 1;
    push %name-by-letter{$start-letter}, $name;
}

my @best-seq;
my $best-count = 0;
for @names -> $name {
    search-seq( [$name], $name.SetHash );
}
say "BEST SEQUENCES: ";
for @best-seq -> $item {
   say "$item";
}
say "Number of sequences: ", @best-seq.elems;
say now - INIT now;

sub search-seq (@current-seq, $seen) {
    my $last-name = @current-seq[*-1];
    my $last-letter = substr $last-name, *-1, 1;
    my @next-candidates = grep {defined $_}, # Remove empty slots
        (@(%name-by-letter{$last-letter}) (-) $seen).keys;
    if ( @next-candidates.elems == 0) {
        my $count = @current-seq.elems;
        if $count > $best-count {
            @best-seq = @current-seq;
            $best-count = $count;
        } elsif ($count == $best-count) {
            push @best-seq, @current-seq;
        }
    } else {
        for @next-candidates -> $name {
            my @new-seq = | @current-seq, $name;
            search-seq( @new-seq, $seen ∪ $name.SetHash );
        }
    }
}

Again, we copy only a small fraction of the output:

machamp petilil landorus seaking girafarig gabite exeggcute emboar rufflet trapinch heatmor registeel loudred darmanitan nosepass simisear relicanth haxorus scrafty yamask kricketune emolga audino
machamp petilil landorus seaking girafarig gabite exeggcute emboar rufflet trapinch haxorus simisear relicanth heatmor registeel loudred darmanitan nosepass scrafty yamask kricketune emolga audino
machamp petilil landorus seaking girafarig gabite exeggcute emboar rufflet trapinch haxorus simisear relicanth heatmor registeel loudred darmanitan nosepass snivy yamask kricketune emolga audino
machamp petilil landorus seaking girafarig gabite exeggcute emboar rufflet trapinch haxorus simisear relicanth heatmor registeel loudred darmanitan nosepass starly yamask kricketune emolga audino

So this works, but the Perl 6 program now runs in more than 8 minutes. I have to think harder about optimizations or preferably a better algorithm.

Update Sept. 11: In his comment below, Timo Paulssen suggested that the grep in this statement:

    my @next-candidates = grep {defined $_}, # Remove empty slots
        (@(%name-by-letter{$last-letter}) (-) $seen).keys;

is slowing down significantly the program. For some reason, the correction he suggested wasn't really successful (I probably did something wrong), but removing the grep by changing the statement to this:

    my @next-candidates = %name-by-letter{$last-letter} ??
        (@(%name-by-letter{$last-letter}) (-) $seen).keys !! ();

reduced the execution time to four and a half minutes. I don't understand why this simple grep is taking so much time (not far from half of the total time), but that's a very good improvement.

I also tried to populate a hash giving directly a list of possible successors for each name in the list (to avoid having to check repeatedly the last letter of the last word added), but that does not bring any significant speed improvement (a win of about ten seconds).

End update.

I'll still try to think about a better algorithm, if time permits, and come back if I find something of interest.

Challenge 2: Implementation of Chaocypher

Create script to implement Chaocipher. Please checkout wiki page for more information.

According to the linked Wikipedia page, the Chaocipher is a cipher method invented by John Francis Byrne in 1918 and described in his 1953 autobiographical Silent Years. He believed Chaocipher was simple, yet unbreakable. He offered cash rewards for anyone who could solve it. In May 2010, the Byrne family donated all Chaocipher-related papers and artifacts to the National Cryptologic Museum in Ft. Meade, Maryland, USA. This led to the disclosure of the Chaocipher algorithm in a paper entitled Chaocypher Revealed: the Algorithm (2010), by Moshe Rubin.

How the Chaocipher works

The Chaocipher system consists of two alphabets, with the "right" alphabet used for locating the plaintext letter while the other ("left") alphabet is used for reading the corresponding ciphertext letter. In other words, the basis of the method is a simple substitution. The novel idea in the Chaocipher algorithm, however, is that the two alphabets are slightly modified after each input plaintext letter is enciphered. This leads to nonlinear and highly diffused alphabets as encryption progresses.

Although Byrne had in mind a physical model with rotating wheels, we will follow Rubin's algorithmic explanation of the method and represent each of the two alphabets as a 26-character string consisting of a permutation of the standard alphabet, for example:

            +            *
LEFT (ct):  HXUCZVAMDSLKPEFJRIGTWOBNYQ 
RIGHT (pt): PTLNBQDEOYSFAVZKGJRIHWXUMC

The place marked with a + sign and a * sign are called by Byrne the zenith and nadir points and they correspond to the first and the fourteenth positions in the alphabet. They are important for the alphabet permutation that will be performed after each ciphering and deciphering step.

The right alphabet (bottom) is used for finding the plain text letter, while the left alphabet (top) is used for finding the corresponding cipher text letter.

To encipher the plaintext letter "A," we simply look for this letter in the right alphabet and take the corresponding letter ("P") in the left alphabet (ct and pt stand for cipher text and plain text).

Each time a letter has been encrypted (or decrypted), we proceed with permutations of the alphabets. To permute the left alphabet, we will:

  • Shift the whole alphabet cyclically, so that the letter just enciphered ("P") is moved to the zenith (first) position;

    LEFT (ct):  PEFJRIGTWOBNYQHXUCZVAMDSLK
    

    Remove temporarily the letter in the second position (or zenith + 1), "E" in our example, leaving a "hole" in this position:

    LEFT (ct):  P.FJRIGTWOBNYQHXUCZVAMDSLK
    

    Shift one position to the left all letters between the second position and the nadir position, leaving a hole in the nadir position:

    LEFT (ct):  PFJRIGTWOBNYQ.HXUCZVAMDSLK
    
  • And finally insert the letter that has been removed ("E") in the nadir position:

    LEFT (ct):  PFJRIGTWOBNYQEHXUCZVAMDSLK
    

    Permuting the right alphabet is a similar process, but with some small but important differences that I will not describe here: please refer to Rubin's document to find the details.

After the permutation of the right alphabet, the two alphabets look like this:

LEFT (ct):  PFJRIGTWOBNYQEHXUCZVAMDSLK
RIGHT (pt): VZGJRIHWXUMCPKTLNBQDEOYSFA

With these new alphabets, we are now ready to encrypt the second letter of the plain text. Then we permute again both alphabets and proceed with the third letter of the plain text. And so on.

Deciphering the cipher text is the same process, except of course that we need to locate the first letter of the cipher text in the left alphabet and pick up the corresponding letter in the right alphabet. Alphabet permutations then follow exactly the same rules as when enciphering the plain text.

The strength of the Chaocipher is that the encryption key (the two alphabets) is changed each time a letter of the input text is processed, and the way it is changed depends on the content of the input message. In effect, this is an advanced form of an autokey cipher that is very difficult to break.

Chaocipher Implementation in Perl 5

For our alphabets, we could use strings of characters, arrays of letters or even hashes. Operations on strings of characters are usually reasonably fast and efficient, so I settled for that. Since both alphabets need to be permuted at the same time, I decided to write only one subroutine (permute_alphabets) to permute both alphabets at the same time: at least, there is no risk to permute one and forget to permute the other. I included some tests based on Rubin's paper examples.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Data::Dumper;

sub permute_alphabets {
    my ($left, $right, $pos) = @_;
    my $newleft = substr ($left, $pos) . substr $left, 0, $pos;
    $newleft = substr ($newleft, 0, 1) . substr ($newleft, 2, 12) 
               . substr ($newleft, 1, 1) . substr $newleft, 14;

    my $newright = substr ($right, $pos+1) . substr $right, 0, $pos+1;
    $newright = substr ($newright, 0, 2) . substr ($newright, 3, 11) 
                . substr ($newright, 2, 1) . substr $newright, 14;
    return ($newleft, $newright);
}

sub run_tests {
    use Test::More; # Minimal tests for providing an example
    plan tests => 4;
    my $left  = 'HXUCZVAMDSLKPEFJRIGTWOBNYQ';
    my $right = 'PTLNBQDEOYSFAVZKGJRIHWXUMC';
    my $position = index $right, 'A';
    my ($newleft, $newright) = permute_alphabets $left, $right, 
        $position;
    is $newleft, 'PFJRIGTWOBNYQEHXUCZVAMDSLK', 
        "Left alphabet: $newleft";
    is $newright, 'VZGJRIHWXUMCPKTLNBQDEOYSFA', 
        "Right alphabet: $newright";
    my $plaintext = "WELLDONEISBETTERTHANWELLSAID";
    my $ciphertext = encipher($plaintext, $left, $right);
    is $ciphertext, 'OAHQHCNYNXTSZJRRHJBYHQKSOUJY', 
        "Testing enciphering: $ciphertext";
    my $deciphered = decipher($ciphertext, $left, $right);
    is $deciphered, $plaintext, "Roundtrip: $deciphered";
}

sub encipher {
    my ($plaintext, $left, $right) = @_;
    my $ciphertext = "";
    my @letters = split //, $plaintext;
    for my $let (@letters) {
        my $position = index $right, $let;
        $ciphertext .= substr $left, $position, 1;
        ($left, $right) = permute_alphabets ($left, $right, 
            $position);
    }
    return $ciphertext;
}

sub decipher {
    my ($ciphertext, $left, $right) = @_;
    my $plaintext = "";
    my @letters = split //, $ciphertext;
    for my $let (@letters) {
        my $position = index $left, $let;
        $plaintext .= substr $right, $position, 1;
        ($left, $right) = permute_alphabets ($left, $right, 
            $position);
    }
    return $plaintext;
}

if (@ARGV == 0) {
    run_tests;
} else {
    die "Invalid number of arguments: we need 4 arguments.\n" 
        unless @ARGV == 4;
    my ($mode, $text, $left, $right) = @ARGV;
    if ($mode eq 'encipher') {
        say encipher($text, $left, $right);
    } elsif ($mode eq 'decipher') {
        say decipher($text, $left, $right);
    } else {
        die "Invalid mode: must be 'encipher' or 'decipher'.\n";
    }
}

We can either launch the program without any argument to run the tests, or pass four arguments (mode, text, left alphabet and right alphabet) to encipher or decipher the text.

This is an example of the output:

$ perl chaocipher.pl
1..4
ok 1 - Left alphabet: PFJRIGTWOBNYQEHXUCZVAMDSLK
ok 2 - Right alphabet: VZGJRIHWXUMCPKTLNBQDEOYSFA
ok 3 - Testing enciphering: OAHQHCNYNXTSZJRRHJBYHQKSOUJY
ok 4 - Roundtrip: WELLDONEISBETTERTHANWELLSAID

$ perl chaocipher.pl encipher WELLDONEISBETTERTHANWELLSAID HXUCZVAMDSLKPEFJRIGTWOBNYQ PTLNBQDEOYSFAVZKGJRIHWXUMC
OAHQHCNYNXTSZJRRHJBYHQKSOUJY

$ perl chaocipher.pl decipher OAHQHCNYNXTSZJRRHJBYHQKSOUJY HXUCZVAMDSLKPEFJRIGTWOBNYQ PTLNBQDEOYSFAVZKGJRIHWXUMC
WELLDONEISBETTERTHANWELLSAID

In a real life program, we would need to do a few other things to make it more robust, including especially a proper validation of the arguments (only upper case ASCII letters in the input text, complete alphabets, etc.). This is left as an exercise to the reader ).

Chaocipher Implementation in Perl 6

Besides the minor syntax differences between Perl 5 and Perl 6, there no reason to change the way the enciphering/deciphering algorithm operates. We will use multi MAIN subroutines to decide on whether to run tests or to process a string passed to the program. There are a couple of minor added features. We declare an uppercase subset of the string type to provide a better (still limited) validation of subroutine arguments. And we fold the case of the program arguments to what is needed.

use v6;
subset UcStr of Str where /^<[A..Z]>+$/;

sub permute-alphabets (UcStr $left is copy, UcStr $right is copy, UInt $pos) {
    $left = substr($left, $pos) ~ substr $left, 0, $pos;
    $left = substr($left, 0, 1) ~ substr($left, 2, 12) 
            ~ substr($left, 1, 1) ~ substr $left, 14;

    $right = substr($right, $pos+1) ~ substr $right, 0, $pos+1;
    $right = substr($right, 0, 2) ~ substr($right, 3, 11) 
             ~ substr($right, 2, 1) ~ substr $right, 14;
    return ($left, $right);
}

sub run_tests {
    use Test; 
    plan 4;
    my $left  = 'HXUCZVAMDSLKPEFJRIGTWOBNYQ';
    my $right = 'PTLNBQDEOYSFAVZKGJRIHWXUMC';
    my $position = index $right, 'A';
    my ($newleft, $newright) = permute-alphabets $left, $right,
        $position;
    is $newleft, 'PFJRIGTWOBNYQEHXUCZVAMDSLK', 
        "Left alphabet: $newleft";
    is $newright, 'VZGJRIHWXUMCPKTLNBQDEOYSFA', 
        "Right alphabet: $newright";
    my $plaintext = "WELLDONEISBETTERTHANWELLSAID";
    my $ciphertext = encipher($plaintext, $left, $right);
    is $ciphertext, 'OAHQHCNYNXTSZJRRHJBYHQKSOUJY', 
        "Testing enciphering: $ciphertext";
    my $deciphered = decipher($ciphertext, $left, $right);
    is $deciphered, $plaintext, "Roundtrip: $deciphered";
}

sub encipher (UcStr $plaintext, UcStr $left is copy, UcStr $right is copy) {
    my $ciphertext = "";
    for $plaintext.comb -> $let {
        my $position = index $right, $let;
        $ciphertext ~= substr $left, $position, 1;
        ($left, $right) = permute-alphabets $left, $right,
            $position;
    }
    return $ciphertext;
}

sub decipher (UcStr $ciphertext, UcStr $left is copy, UcStr $right is copy) {
    my $plaintext = "";
    for $ciphertext.comb -> $let {
        my $position = index $left, $let;
        $plaintext ~= substr $right, $position, 1;
        ($left, $right) = permute-alphabets $left, $right, 
            $position;
    }
    return $plaintext;
}

multi MAIN () {
    run_tests;
} 
multi MAIN (Str $mode, Str $text, Str $left, Str $right) {  
    if $mode.lc eq 'encipher' {
        say encipher $text.uc, $left.uc, $right.uc;
    } elsif $mode.lc eq 'decipher' {
        say decipher $text.uc, $left.uc, $right.uc;
    } else {
        die "Invalid mode $mode: must be 'encipher' or 'decipher'.\n";
    }
}

And this is a sample output with various arguments:

$ perl6 chaocipher.p6 encipher WELLDONEISBETTERTHANWELLSAID HXUCZVAMDSLKPEFJRIGTWOBNYQ PTLNBQDEOYSFAVZKGJRIHWXUMC
OAHQHCNYNXTSZJRRHJBYHQKSOUJY

$ perl6 chaocipher.p6  decipher OAHQHCNYNXTSZJRRHJBYHQKSOUJY HXUCZVAMDSLKPEFJRIGTWOBNYQ PTLNBQDEOYSFAVZKGJRIHWXUMC
WELLDONEISBETTERTHANWELLSAID

$ perl6 chaocipher.p6
1..4
ok 1 - Left alphabet: PFJRIGTWOBNYQEHXUCZVAMDSLK
ok 2 - Right alphabet: VZGJRIHWXUMCPKTLNBQDEOYSFA
ok 3 - Testing enciphering: OAHQHCNYNXTSZJRRHJBYHQKSOUJY
ok 4 - Roundtrip: WELLDONEISBETTERTHANWELLSAID

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, September, 22. And, please, also spread the word about the Perl Weekly Challenge if you can.

7 Comments

Hi laurent,

you can get the perl6 version of the pokemon calculator a good bit faster by getting rid of the grep for empty keys.

the reason there ever are undefined values in the list is that getting %name-by-letter{$last-letter} when there's nothing in there for that given letter, you'll get (Any), which when turned into a list will become [(Any)], which upsets the rest of the function.

My simple solution was to use the with conditional like so:

my @next-candidates = ($_.list (-) $seen).keys with %name-by-letter{$last-letter};

In my timings the program goes (for only the first 20 names as start points) from 1m10s down to 47s.

cheers
  - Timo

Hi Laurent,

One of the optimization I made is to do it backwards.

Generate list of names ending with a letter which is NOT a starting letter of any name in the whole pokemon name list. These will be the last name candidates

The last name candidates can be further reduced by getting the name with most number of branching (its first char is same as last char of name in pokemon name list)

This will result in fewer starting points, and will reduce recursions.

Hi Laurent, You are right! Thanks for pointing out

-Yet

Leave a comment

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.